Annals of the American Thoracic Society. 2024 Dec 1;21(12):1670–1677. doi: 10.1513/AnnalsATS.202405-457OC

Interstitial Lung Disease Patients’ Global Impressions of Symptoms, Severity Ratings, and Meaningfulness of Changes Over Time

Jeffrey J Swigris, Joseph B Pryor, Kerri I Aronson, Taylor A Guess, Joshua J Solomon
PMCID: PMC12392340  PMID: 39133575

Abstract

Rationale

In interstitial lung disease (ILD), symptoms drive impairments in health-related quality of life. Patient-reported outcome measures (PROMs) can assess whether interventions change symptom severity. The meaningfulness of change in a PROM score is estimated by anchoring it to a related variable for which meaningful change has been previously established. Patient global impressions of severity (PGISs) are single-item PROMs that may make trustworthy anchors, but, for ILD, the meaningfulness of change in PGIS items for shortness of breath (SOB), cough, and fatigue/low energy is unknown.

Objectives

To improve understanding of how patients with ILD rate and categorize symptoms, how differing levels of symptom severity affect lived experiences, and how patients derive and apply meaningfulness to change in symptoms.

Methods

We used one-on-one interviews and an electronic survey to collect data from patients with various forms of ILD. Interviews were conducted to provide richness and context to survey responses. We conducted certain analyses with respondents stratified by supplemental oxygen use.

Results

Interviewees (N = 18) confirmed SOB, cough, and fatigue/low energy as the most bothersome symptoms of ILD. Among 298 survey respondents, on a PGIS for SOB with a 0–4 numeric rating scale, those who used supplemental O2 had, on average, more severe SOB than nonusers, and the largest proportion of respondents considered a 2-point change meaningful for worsening (45.5%) or improvement (47.2%). On a PGIS for SOB with a five-option ordinal response scale, the largest proportions considered a 1-category change meaningful for worsening (49.8%) and a 2-category change meaningful for improvement (42.3%); for cough frequency on a five-option ordinal response scale, the largest proportions considered a 1-category change meaningful for worsening (48.2%) or improvement (45.0%). Survey responses comparing SOB at the present time with SOB 3 months earlier (patient global impressions of change) were biased toward the present state.

Conclusions

PGISs can be used as anchors for meaningful change analyses of PROMs that assess SOB or cough in patients with ILD. Patient global impressions of change demonstrate present-state bias and should not be used. Patients’ descriptions paint a vivid picture of lived experience with varying levels of symptom severity and can help contextualize change scores.

Keywords: ILD, symptoms, patient global impressions, meaningful change


Three symptoms (shortness of breath [SOB], cough, and fatigue/low energy) impair health-related quality of life in patients with interstitial lung disease (ILD) (1). Patient-reported outcome measures (PROMs) are administered in drug trials or other studies to assess how patients/subjects feel and function in their daily lives.

Validation is a process through which we gain understanding of the meaning (and meaningfulness to patients) of a PROM’s score: what a respondent with a given score “looks like” and how their daily life is altered when scores change by varying magnitudes. A PROM’s scores are valid if the inferences made about the respondent, based on those scores, are informative and accurate; thus, validation is about the score, not the measure.

The meaningfulness of scores and of score changes is typically estimated as part of PROM validation analysis; these estimates describe the change in a PROM’s score that corresponds to a meaningful alteration in how a patient feels or functions. The terminology for this concept has been confusing, with labels including “minimal clinically important difference” and “meaningful within-patient change”; currently, the preferred term is “meaningful score difference” (MSD) (2). The MSD is frequently derived as a threshold, so a PROM change score that exceeds the MSD represents a meaningful change to the patient/respondent.
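
To make the threshold idea concrete, the following minimal Python sketch labels one respondent's change against an MSD. The function name, its inclusive switch, and the convention that higher scores mean more severe symptoms are illustrative assumptions, not part of any published PROM scoring manual. By default it follows the wording above (a change must exceed the MSD); some analyses instead treat a change equal to the MSD as meaningful, which inclusive=True reproduces.

```python
def classify_change(baseline: float, follow_up: float, msd: float,
                    inclusive: bool = False) -> str:
    """Label a within-patient change against an MSD threshold (hypothetical helper).

    Assumes higher scores mean more severe symptoms, so a positive change
    (follow-up minus baseline) is a worsening.
    """
    change = follow_up - baseline
    magnitude = abs(change)
    # By default the change must exceed the MSD; inclusive=True also accepts equality.
    meaningful = magnitude >= msd if inclusive else magnitude > msd
    if change == 0 or not meaningful:
        return "no meaningful change"
    return "meaningful worsening" if change > 0 else "meaningful improvement"


# Example: a 0-4 symptom rating with an assumed MSD of 2 points.
print(classify_change(1, 4, msd=2))                  # meaningful worsening
print(classify_change(0, 2, msd=2))                  # no meaningful change
print(classify_change(0, 2, msd=2, inclusive=True))  # meaningful worsening
```

Whether a change equal to the MSD counts is a design decision that should be fixed before analysis, because it changes which respondents are classified as having changed meaningfully.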

There is no consensus on how MSD thresholds for a PROM’s score should be estimated (3–6). Estimation most often involves tying PROM change scores to anchor variables, which are other metrics that measure (or are hypothesized to measure) the same construct as the PROM. In analyses aimed at estimating a PROM score’s MSD, only anchors with established MSDs should be used (7).
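
One common anchor-based approach, among the several that exist given the lack of consensus noted above, is to take the mean PROM change among respondents whose anchor changed by at least a chosen cutoff. The sketch below assumes paired per-respondent change scores (follow-up minus baseline, higher meaning worse) and uses made-up numbers; the function name and cutoffs are hypothetical and are not the method of any specific study.

```python
from statistics import mean


def anchor_mean_change(prom_change, anchor_change, cutoff):
    """Mean PROM change among respondents whose anchor worsened by >= cutoff.

    Hypothetical helper. Both sequences hold per-respondent change scores
    (follow-up minus baseline), where higher values mean worse symptoms.
    """
    selected = [p for p, a in zip(prom_change, anchor_change) if a >= cutoff]
    if not selected:
        raise ValueError("no respondents met the anchor cutoff")
    return mean(selected)


# Illustrative (not study) data: PROM change scores and paired anchor changes.
prom = [2, 5, 8, 11]
anchor = [0, 1, 2, 2]
print(anchor_mean_change(prom, anchor, cutoff=2))  # 9.5
print(anchor_mean_change(prom, anchor, cutoff=1))  # 8
```

The estimate depends directly on the anchor cutoff, which is why the anchor's own MSD must be established before it is used this way.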

Patient global impressions of severity (PGISs) or change (PGICs) are single-item PROMs aimed at capturing symptom severity or change, and they are recommended for consideration for use as anchors in PROM validation analysis (2). However, in ILD, MSDs for PGI symptom severity items have not been established. Without this critical information, there can be little confidence in the accuracy of MSDs derived for PROMs using PGI items as anchors. In this mixed-methods study, we used interviews and surveys to achieve three objectives: 1) to examine how patients with ILD describe, rate, and categorize symptoms; 2) to assess how patients discern and conceptualize changes in symptoms; and 3) to establish MSDs for SOB and cough PGIS items.

Methods

Survey

The English-language survey was developed in and distributed via REDCap (see data supplement). The link to the survey was presented to patients by emailing it directly or by posting it in advocacy group newsletters or on websites (see data supplement). Because of the length of the survey, fatigue/low energy was only briefly touched on, and there were no items focused on meaningful change of this symptom.

Interviews

Semistructured interviews were conducted to enhance richness and contextualize survey responses. The interview cohort comprised a convenience sample of English-speaking patients with various forms of ILD recruited from the Center for ILD at National Jewish Health (Denver, CO). Interviewees did not complete the survey. See data supplement for more information on the interview, including the guide.

Analysis and Ethics Approval

Microsoft Excel was used to organize interview data. Immersion in and familiarization with the data were achieved by reading and rereading full transcripts and during data organization. Data were summarized using the interview question as the organizational unit. The analytic approach was explicit rather than latent: the “codes” were determined a priori to be severity ratings rather than actively developed during analysis. We intended all patient quotes/descriptors to be visible and to stand on their own rather than being integrated into higher-order themes.

SAS software (version 9.4) was used for all other analyses. Demographic and clinical data were summarized and tabulated. A χ² test (or Fisher’s exact test, as appropriate) was used to analyze categorical data. With logistic regression, we examined associations between predictors and survey responses while controlling for potentially influential variables. We conducted some analyses with interviewees and survey respondents stratified by supplemental oxygen (O2) use: continuous or with exertion (users) versus no daytime use (nonusers).

Interviewees were compensated with gift cards; survey respondents were not. The protocol (HS-4137) was reviewed, approved, and deemed to qualify for exemption from federal regulations by the National Jewish Health Institutional Review Board.

Results

Table 1 summarizes characteristics of interviewees (N = 18) and survey respondents (N = 298). Because of how the survey was administered, we could not calculate the response rate. There were 592 clicks to open the survey.

Table 1.

Clinical characteristics of interviewees and survey respondents

| Characteristic | Interviewees (n = 18) | Survey Respondents (n = 298) |
|---|---|---|
| Age, yr | 67.8 ± 9.4 | 70.5 ± 9.4 |
| Female sex* | 10 (56%) | 133 (45%) |
| Country* | | |
|   United States | 18 (100%) | 215 (72%) |
|   Australia | | 21 (7%) |
|   Canada | | 3 (1%) |
|   Ireland | | 1 (<1%) |
|   United Kingdom | | 53 (18%) |
|   Other | | 4 (2%) |
| Diagnosis | | |
|   IPF | 3 (17%) | 202 (68%) |
|   HP | 6 (33%) | 27 (9%) |
|   Other | 3 (17%) | 40 (13%) |
|   CTD-ILD | 6 (33%) | 28 (9%) |
|     RA-ILD | 2 | |
|     Anti-synthetase syndrome | 3 | |
|     SSc-ILD | 1 | |
| Daytime supplemental O2 use | | |
|   No | 10 (56%) | 166 (56%) |
|   Yes | 8 (44%) | 132 (44%) |

Definition of abbreviations: CTD-ILD = connective tissue disease–related interstitial lung disease; HP = hypersensitivity pneumonitis; IPF = idiopathic pulmonary fibrosis; RA-ILD = rheumatoid arthritis interstitial lung disease; SSc-ILD = systemic sclerosis–related interstitial lung disease.

* One missing for survey group.

The survey did not ask patients with CTD-ILD whether their ILD was fibrotic, but it did for HP.

Most Bothersome Symptoms of ILD According to Interviewees

To set the stage, we started by asking interviewees to identify the most bothersome symptom of their lung disease. Fifteen of 18 volunteered SOB, which they described as “shortness of breath” (69-year-old man with rheumatoid arthritis ILD, no O2 use), “hard to breathe” (60-year-old woman with anti-synthetase syndrome, continuous O2 use), “[low] lung intake” (64-year-old man with familial pulmonary fibrosis, no O2 use), “drains my lungs” (66-year-old man with idiopathic pulmonary fibrosis [IPF], continuous O2), “gotta stop and catch my breath” (67-year-old woman with hypersensitivity pneumonitis [HP], sleep O2 use), “not being able to breathe” (73-year-old woman with idiopathic nonspecific interstitial pneumonia, continuous O2), “a little labored on my breathing” (59-year-old man with HP, no O2 use), “inability to get as much breath as I need” (57-year-old woman with HP, no O2 use), and “lack of oxygen” (71-year-old woman with HP, continuous O2). Four interpreted “symptoms” broadly and mentioned O2. Two mentioned cough, and three mentioned “major fatigue,” “lack of energy,” or “tire out easily.”

When asked directly, 5 of 17 interviewees denied cough altogether, and 7 (41%) rated their cough severity a 0 during the previous week on a 0–4 numeric rating scale (NRS). Those who mentioned cough said they cough “if I try to take a deep breath”; “more with activity”; “[in] fits”; “not very much… and when I do, it’s kind of dry”; and “[in] episodes where I felt like, you know, it’s gonna snap my back.”

When asked directly, 16 of 18 interviewees described fatigue and/or low energy and described it as “…[feeling like] I could do this, but you can’t. You’re tired, you can’t”; “my energy level is definitely dropped”; “low energy means I don’t wanna go to the gym… but I’ll read a book or watch a movie”; “at the end of day, you’re tired. You just want to rest, you know? That has nothing to do with being short of breath”; “just the blah feeling. You just don’t feel any energy”; and “I think that low energy probably more… captures how I feel than fatigue.”

SOB Severity

Survey

On a 0–4 NRS, most survey respondents (52.0%) categorized response options 1 and 2 as mild, 3 as moderate, and 4 as severe (Figure 1). This categorization scheme did not differ between O2 users and nonusers (P = 0.88). O2 users had more severe SOB than nonusers (Fisher’s exact test P < 0.0001; Figure 2). Most respondents (82%) believed that the points at which they placed the mild-to-moderate and moderate-to-severe transitions on the 0–4 NRS for the SOB PGIS would be the same on a 0–4 NRS PGIS for cough severity.

Figure 1.

Mild, moderate, and severe categorization of (A) 0–4 and (B) 0–5 numeric rating scales for patient global impression of severity for shortness of breath. Shown are the numbers and percentages of survey respondents who categorized the numeric rating scale as shown. Color changes represent thresholds between categories (e.g., in A, the “N = 129, 52.0%” group considered scores of 1 and 2 as mild [green], 3 as moderate [blue], and 4 as severe [red]).

Figure 2.

Patient global impression of severity for shortness of breath for (A) supplemental O2 users (continuous or exertion) and (B) nonusers (none or sleep only). Fisher’s exact test P < 0.0001 for comparison between O2 users and nonusers. CTD = connective tissue disease; HP = hypersensitivity pneumonitis; IPF = idiopathic pulmonary fibrosis; Oth = other.

We asked survey respondents to write their own item to give to patients with ILD to assess SOB severity during the previous week. Responses covered a range of concepts (Table E1 in the data supplement). Of the 281 responses, 94 dealt with assessing SOB during various activities (climbing stairs was mentioned most frequently, n = 12).

Interviews

To add granularity, Table 2 shows interviewees’ descriptions of ratings of 1, 2, 3, and 4 on the 0–4 NRS PGIS for SOB. In general, interviewees considered a severity of 1 mild, something “I can feel,” occurring with various activities but easily controlled and not terribly limiting. With increasing severity, in general, SOB was present with lower-demand activities, forced patients to stop and catch their breath more frequently, and lengthened recovery time.

Table 2.

Quotes from interviewees about dyspnea severity ratings on a 0–4 scale

When shortness of breath is rated a ____…
  1 2 3 4
24/7 O2 use
  • I wouldn’t fight to breathe. I would be okay.

  • …just get a little shortness breath, and then it’s gone. Then it’s… under control… But a 1 doesn’t feel bad. It’s just kind of that warning signal to slow down a little bit, but I can still maneuver…

  • But if I’m huffing and puffing and filling a cart with things and carrying it up to the building… if I’m walking and talking, I can get short of breath…

  • …you can still carry on even though you feel short of breath.

  • …you take out the trash… and then you start back and you go to suckin’ wind… I have to stop and go back to breathing before I can do anything else…

  • …went into [store] … and I had a cart, and I put the [liquid oxygen] … in the cart… and I had to stop every, say, a hundred feet or so to… to settle myself down… that was a 2.

  • Probably if I had to do box breathing for more than 10 minutes or 15.

  • I can’t do a lot… just doing simple stuff, it’s hard to breathe… just getting up and walking in the house, prepping all the vegetables… peeling, you know, washing, peeling…

  • …you’re just miserable, and it doesn’t matter how much you turn the air up or turn it down it. It don’t help.

  • …gotta stop and take a deep breath pretty frequently… can’t walk 50 yards… without stopping… when I’m working or doing some chores around the… house.

  • …longer period that I cannot catch a breath.

  • I would have to stop… and I’d be leaned over…

  • …take me an hour to get ready. and I would have to sit down between every activity of daily life washing my face… sit down to brush my teeth, to dry off. I just have to rest in between every little thing…

  • I’m thinking, well, I’m taking in more oxygen. I should be recovering faster…

  • …divide up a chore into specific spots… maybe only go halfway to the mailbox and… and lean on a ski pole or something, but…

  • …and a 4, I would be disturbing everybody around me with my gasping…

  • …inability to breathe [and having] panic… so I would say, it’s the loss of reasoning and causing one to go into fear mode.

  • …sometimes my recovery can take almost a half an hour…

No daytime O2 use
  • But, you know, I don’t feel like it’s super bad. I mean, I can feel it…

  • …I’ve gotta stop and catch my breath.

  • …go up the steps a couple of times, and I’m like, Oh, I gotta… I guess… Hold on just a minute… I gotta catch my breath…

  • A little labored on my breathing when we do the walk… It’s up and down and and… and… we’ll take our breaks.

  • …a 1, it would have to be like every time I, you know, take a walk or anything I get out of breath… sometimes when I go up the steps, I get out of breath.

  • [it would feel like walking] much faster so that would probably take me… I don’t often outpace myself much, but I would guess it would be about an 88 [saturation] … uh, where I would be put me in the 2 level.

  • I’m having occasional symptoms at times when I’m not feeling like I’m breathing like I should be.

  • …as soon as I climb the little bit of a hill before I hit the smooth part of the walk… um… I might have to just stand and rest…

  • …bad, but no [wouldn’t need] oxygen.

  • … just going up the stairs once or twice, and feeling like… Oh, that’s… that’s a lot of exertion… That’s probably where 3 would be.

  • I’ll get stuff and cover plants to do all this, and it can be pretty strenuous, and I can outrun my, uh [body’s], oxygen supply.

  • like I’m having more difficulty getting my breath, and then the numbers are probably… uh, matching that… constantly having to stop and concentrate to breathe and make sure my numbers are there.

  • …eating breakfast this morning, I just felt shorter of breath than I would normally… activities such as bending over, I get short of breath… the normal activities of daily living are all limited, you know, going up and down stairs…

  • I would say 4 is severe, and I would be on oxygen…

  • …it’s like I’m gasping… to my mind… I need to have oxygen.

  • Let’s say I had to climb 5 flights of stairs at a relatively rapid rate.

  • 4 is, I can’t keep them [saturation] up. I need oxygen.

  • …if I walk across the room and I get short of breath.

  • …just normal activities, you know, going to the grocery store, to the gym, where I just couldn’t do the activity or walk that I would have to sit down…

  • …[felt like] I was trying to choke air.

Change in SOB Severity

Survey

For the PGIC item for SOB (“How would you rate the severity of your shortness of breath now compared with 3 months ago?”, with seven response options: much/moderately/minimally worse, no different, minimally/moderately/much better), worsening was reported by a greater proportion of O2 users than nonusers (P = 0.001; Figure E1). Table E2 shows the biased relationship between current SOB severity and PGIC ratings: respondents with more severe current SOB were more likely to respond that their SOB had worsened during the previous 3 months.

Interviews

Interviewees were presented the same PGIC item for SOB severity as the survey and asked to read it out loud and “think aloud” as they decided on their responses. Slightly more than half of interviewees recalled thinking about the exact month it was 3 months before the interview (December; “Well, I’m going back to the holidays…”). Despite the wording of the question, the others did not call to mind a specific time frame; one used “just a general feel.” Some interviewees reflected on a specific activity they performed routinely and used it as a barometer of change over time (“I’m thinking of examples like… how far can I walk the dog”), but most, including those who recalled the specific month, did not. A few interviewees recalled their pulmonary function tests (“3 months ago… the [pulmonary function tests] or whatever… they were, in that period, more difficult”); others reflected on the amount of oxygen they were using (“I go by how much air I’m using”) or on data from their exercise equipment (“So… over that time period, I’ve increased my pedaling rate, and I have noticed that my heart rate seems to be to be better”).

Meaningfulness of Change in SOB Severity

Survey

The survey presented a hypothetical scenario: imagine that, at baseline and again 3 months later, you were asked to use the 0–4 scale to respond to the question, “Over the last week, how severe has your shortness of breath been?” Follow-up questions asked how many points the two ratings would have to differ for respondents to consider the change a meaningful improvement or worsening. On the 0–4 NRS, nearly half of respondents considered 2 points meaningful for improvement (n = 135; 47.2%) and for worsening (n = 130; 45.5%) (Table 3). Among all respondents, the weighted κ for agreement between responses for what constitutes meaningful improvement and worsening was 0.62 (95% confidence limits, 0.54–0.70). Agreement was similar in subgroups stratified by O2 use (users, 0.60; nonusers, 0.63).
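
Because each respondent gave one answer for meaningful improvement and one for meaningful worsening, agreement between the two answers can be summarized with a weighted κ. The sketch below computes a linearly weighted κ with weights w_ij = 1 − |i − j|/(k − 1) (the Cicchetti–Allison form). The study does not state which weighting scheme its SAS analysis used, so the weights, the helper name, and the example answers are assumptions; confidence limits are omitted.

```python
def weighted_kappa(ratings_a, ratings_b, categories):
    """Linearly weighted kappa for two paired ordinal ratings (hypothetical helper).

    Weights use the linear (Cicchetti-Allison) form w_ij = 1 - |i - j| / (k - 1),
    an assumption: the weighting scheme used in the study is not stated.
    """
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Cross-tabulate the paired answers (rows: ratings_a, columns: ratings_b).
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[pos[a]][pos[b]] += 1
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return 1 - abs(i - j) / (k - 1)

    # Weighted observed agreement and weighted agreement expected by chance.
    observed = sum(weight(i, j) * counts[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


# Made-up answers: points needed for meaningful improvement vs worsening (0-4 NRS).
improvement = [2, 2, 1, 3, 2, 1]
worsening = [2, 1, 1, 3, 2, 2]
print(weighted_kappa(improvement, worsening, categories=[1, 2, 3, 4]))
```

Linear weights give partial credit to near-misses (for example, 1 point for improvement but 2 for worsening), which suits ordered response options; quadratic (Fleiss–Cohen) weights are a common alternative and would give a different value.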

Table 3.

Meaningfulness of 3-month change in shortness of breath patient global impression of severity with response options of 0–4 or “not at all” to “very severe”

Scenario: “Over the last week, how severe has your shortness of breath been?” answered at baseline and again 3 months later.

Response options 0, 1, 2, 3, 4:

| Meaningful change | 1 point | 2 points | 3 points | 4 points |
|---|---|---|---|---|
| Worsening: all (N = 286) | 104 (36.4%) | 130 (45.5%) | 39 (13.6%) | 13 (4.6%) |
| Worsening: O2 users (n = 125) | 46 (36.8%) | 54 (43.2%) | 18 (14.4%) | 7 (5.6%) |
| Worsening: no O2 (n = 161) | 58 (36.0%) | 76 (47.2%) | 21 (13.0%) | 6 (3.7%) |
| Improvement: all (N = 286) | 97 (33.9%) | 135 (47.2%) | 42 (14.7%) | 12 (4.2%) |
| Improvement: O2 users (n = 125) | 35 (28.0%) | 55 (44.0%) | 28 (22.4%) | 7 (5.6%) |
| Improvement: no O2 (n = 161) | 62 (38.5%) | 80 (49.7%) | 14 (8.7%) | 5 (3.1%) |

Response options “not at all,” “mild,” “moderate,” “severe,” “very severe”:

| Meaningful change | 1 category | 2 categories | 3 categories | 4 categories |
|---|---|---|---|---|
| Worsening: all (N = 279) | 139 (49.8%) | 100 (35.8%) | 26 (9.3%) | 14 (5.0%) |
| Worsening: O2 users (n = 122) | 61 (50.0%) | 38 (31.2%) | 14 (11.5%) | 9 (7.4%) |
| Worsening: no O2 (n = 157) | 78 (49.7%) | 62 (39.5%) | 12 (7.6%) | 5 (3.2%) |
| Improvement: all (N = 279) | 116 (41.6%) | 118 (42.3%) | 31 (11.1%) | 14 (5.0%) |
| Improvement: O2 users (n = 122) | 46 (37.7%) | 50 (41.0%) | 17 (13.9%) | 9 (7.4%) |
| Improvement: no O2 (n = 157) | 70 (44.6%) | 68 (43.3%) | 14 (8.9%) | 5 (3.2%) |

“O2 users” indicates respondents who used supplemental oxygen during the day; “no O2” indicates no O2 use or use during sleep only. For the 0–4 numeric rating scale, the survey stated that “0” means “not at all” and higher numbers correspond to greater severity. For the 0–4 numeric rating scale, the weighted κ among all respondents for agreement between responses for what constitutes meaningful improvement and worsening was 0.62 (95% confidence limits, 0.54–0.70); agreement was similar in subgroups stratified by O2 use (users, 0.60; nonusers, 0.63). For the “not at all” to “very severe” scale, the weighted κ among all respondents for agreement between responses for meaningful improvement and worsening shortness of breath severity (e.g., 1 category for both improvement and worsening, or 2 categories for both) was 0.81 (95% confidence limits, 0.74–0.88).

A majority of survey respondents (70–77%) affirmed that it did not matter where the baseline response was (0, 1, 2, 3, or 4): over 3 months, all 1-point improvements or worsenings in SOB were equally meaningful (e.g., a 3-month worsening from 0 to 1 carries the same meaning as a worsening from 3 to 4). The same was true for 2- and 3-point changes on the 0–4 NRS for SOB.

When the survey posed the same hypothetical scenario but with response options of “not at all,” “mild,” “moderate,” “severe,” and “very severe” instead of 0–4, the largest proportions of respondents considered 1-category worsenings, but 2-category improvements, meaningful (Table 3). Among all respondents, the weighted κ for agreement between responses for meaningful improvement and worsening SOB severity (e.g., 1 category for both improvement and worsening, or 2 categories for both) was 0.81 (95% confidence limits, 0.74–0.88). The largest number of respondents in any off-diagonal cell was 28 (10%): they considered a 1-category worsening meaningful but required a 2-category improvement for it to be meaningful. Three quarters of respondents considered all 1-category worsenings equivalent; the same was true for all 2- and 3-category worsenings. However, as with the 0–4 NRS, a number of survey respondents stated that not all 1-category worsenings are the same, as one respondent wrote on a write-in item: “any jump to [very severe] is worse than any other increase.” In a multivariable logistic regression model, sex, O2 use, diagnosis (IPF yes/no), and age were not associated with a response of nonequivalence among all 1- or 2-point or 1- or 2-category worsenings (Table E3).

Interviews

In contrast to the previous findings, five of the six interviewees who were asked (and as many as 30% of survey respondents) considered a 1-point worsening on the 0–4 NRS PGIS for SOB meaningful but said the weight of that meaningfulness depended on the location on the scale. Generally, any worsening (1- or 2-point) over 3 months that landed them closer to a score of 4 was more meaningful (e.g., a change from 2 to 3 was more meaningful than one from 1 to 2). From the interviews: “The higher the numbers go… I’m getting closer to not being able to breathe at all”; “I would say the 2 to 4 would be a bigger jump than the 1 to 3 to me”; “The higher the number, the more severe. There’s, to me, there’s a big difference between 2 and 3 and 4.”

Cough Severity

Survey

In contrast to the 0–4 NRS for SOB, cough severity was categorized using a 0–5 NRS. A total of 33.3% of survey respondents categorized 1 and 2 as mild, 3 as moderate, and 4 and 5 as severe. A nearly equal proportion (32.2%) categorized 1 and 2 as mild, 3 and 4 as moderate, and 5 as severe (Figure 1). Nearly 70% of O2 users and 81% of nonusers rated the severity of cough absent or mild, and nearly 40% in each group rated the frequency of their cough during the previous week as occurring “sometimes” (Figure E2). The majority (82%) of respondents believed the positions of their mild-to-moderate and moderate-to-severe transitions on the 0–5 NRS for cough severity would be the same for a 0–5 NRS PGIS for fatigue/low energy severity.

Interviews

Table E4 shows interviewees’ descriptions of ratings of 1, 2, 3, and 4 on the 0–4 NRS PGIS for cough severity. (Note: this is different from the 0–5 NRS on the survey.) In general, interviewees considered cough severity of 1 mild, occurring occasionally, and noninterfering. Several interviewees suggested the “depth” of the cough (“coming up from deep within my lungs”) as, at least partially, driving the severity. Severe cough was one that occurred frequently, was interfering with activities, and was often continuous to the point that patients could not “talk, breathe, eat” or “grab a breath.” Table E5 shows the relationship between perceived cough frequency and perceived cough severity during the previous week.

Meaningfulness of Change in Cough Frequency

Survey

For the hypothetical 3-month change scenario for cough frequency (at baseline and 3 months later, “Over the last week, how frequently have you coughed?”, with response options of “none,” “rarely,” “sometimes,” “often,” or “very frequently”), the largest proportions of respondents (Table 4) considered a 1-category change meaningful for improvement (45.0%) and worsening (48.2%); fewer considered a 2-category change meaningful for improvement (38.9%) and worsening (37.9%). The weighted κ for agreement between ratings for improvement and worsening was 0.77 (95% confidence limits, 0.67–0.86).

Table 4.

Meaningfulness of 3-month change in cough frequency patient global impression of severity with an ordinal response scale from “not at all” to “very frequently”

Scenario: “Over the last week, how frequently have you coughed?” answered at baseline and again 3 months later, with response options “not at all,” “rarely,” “sometimes,” “often,” and “very frequently.”

| Meaningful change | 1 category | 2 categories | 3 categories | 4 categories |
|---|---|---|---|---|
| Worsening: all (N = 280) | 135 (48.2%) | 106 (37.9%) | 27 (9.6%) | 12 (4.3%) |
| Worsening: O2 users (n = 123) | 61 (49.6%) | 44 (35.8%) | 13 (10.6%) | 5 (4.0%) |
| Worsening: no O2 (n = 157) | 74 (47.1%) | 62 (39.5%) | 14 (8.9%) | 7 (4.5%) |
| Improvement: all (N = 280) | 126 (45.0%) | 109 (38.9%) | 29 (10.4%) | 16 (5.7%) |
| Improvement: O2 users (n = 123) | 52 (42.3%) | 44 (35.8%) | 17 (13.8%) | 10 (8.1%) |
| Improvement: no O2 (n = 157) | 74 (47.1%) | 65 (41.4%) | 12 (7.6%) | 6 (3.8%) |

“O2 users” indicates respondents who used supplemental oxygen during the day; “no O2” indicates no O2 use or use during sleep only. The weighted κ for agreement between ratings for improvement and worsening was 0.77 (95% confidence limits, 0.67–0.86).

Fatigue/Low Energy Severity

Survey

Most survey respondents rated the severity of their fatigue/low energy during the previous week mild or moderate (Figure E3).

Interviews

Table E6 shows interviewees’ descriptions of ratings of 1, 2, 3, or 4 on the 0–4 NRS PGIS for fatigue/low energy severity. In general, interviewees considered a severity of 1 as not interfering with things they liked or needed to do. Severe fatigue/low energy meant “[feeling] exhausted”; there was an emotional component at this level, with interviewees “not caring”; not feeling like “accomplishing anything”; or “just [wanting to] sit on the couch.”

Discussion

We interviewed and surveyed patients with various forms of ILD and learned how they describe, rate, and categorize symptoms and how they assess the meaningfulness of change in symptoms over a 3-month period. This information allowed us to generate first-ever MSD determinations for PGIS items for SOB and cough in this target population. Interviews confirmed that patients with all forms of ILD (IPF and non-IPF fibrosing ILD) are bothered by the same three symptoms and that the impact of those symptoms on wellbeing does not appear to differ by diagnosis. To our knowledge, our interviews also provide, among other things, the largest set of qualitative data describing various levels of severity of SOB, cough, and fatigue/low energy in patients with ILD.

Through the survey, we learned from patients (the true ILD experts) that, on an SOB PGIS item with a 0–4 NRS, the largest share of respondents considered a 2-point improvement or 2-point worsening over 3 months a meaningful change. However, for an SOB PGIS item with a 5-option ordinal response scale (ORS; from “not at all” to “very severe”), the MSDs were 2 categories for improvement and 1 category for worsening. Because substantial proportions of respondents answered differently, if these PGIS items are used as anchors for PROM MSD estimation, it is reasonable to conduct analyses treating both 1- and 2-unit changes in the anchor as meaningful. For a cough frequency PGIS with a 5-option ORS (from “never” to “very frequently”), a 1-category change could be considered meaningful for improvement or worsening; as with SOB, a 2-category change could also be analyzed. Regardless of symptom, for any PGIS anchor, investigators must select one cutoff a priori for primary analyses and use any other cutoff(s) only for exploratory purposes. Because >80% of the interviewees we asked, as well as a substantial minority of survey respondents, said not all 1-unit worsenings were equivalent, it will be informative to show results for subgroups defined by baseline PGIS rating.
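
The recommendation above (one prespecified anchor cutoff for the primary analysis, other cutoffs exploratory, results shown by baseline PGIS rating) can be sketched as follows. This is an illustrative sketch under assumptions: the helper name, the choice of 2 points as primary and 1 point as exploratory, the worsening-only direction, the mean-change estimator, and the data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def msd_by_baseline(anchor_baseline, anchor_change, prom_change, cutoff):
    """Mean PROM change among anchor-worsened respondents, per baseline anchor rating.

    Hypothetical helper; change scores are follow-up minus baseline, higher = worse.
    Baseline strata with no respondent meeting the cutoff are omitted.
    """
    groups = defaultdict(list)
    for base, a_chg, p_chg in zip(anchor_baseline, anchor_change, prom_change):
        if a_chg >= cutoff:
            groups[base].append(p_chg)
    return {base: mean(values) for base, values in sorted(groups.items())}


PRIMARY_CUTOFF = 2          # fixed before analysis (assumed value)
EXPLORATORY_CUTOFFS = (1,)  # reported only as exploratory (assumed value)

# Illustrative data: baseline 0-4 PGIS rating, PGIS change, PROM change.
baseline = [0, 0, 1, 1, 2]
pgis_change = [2, 1, 2, 2, 1]
prom_change = [6, 3, 8, 10, 4]

print("primary:", msd_by_baseline(baseline, pgis_change, prom_change, PRIMARY_CUTOFF))
for c in EXPLORATORY_CUTOFFS:
    print("exploratory:", msd_by_baseline(baseline, pgis_change, prom_change, c))
```

Keeping the primary cutoff as a named constant set before any data are examined makes the a priori choice explicit; on a 0–4 scale, high baseline ratings leave little room to worsen, so some strata will be small or empty.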

One drawback to using single-item PROMs (like PGI items or visual analogue scales) as endpoints themselves is loss of granularity. For example, what drives cough severity ratings? Survey data suggest a relationship between cough frequency and severity, but the interview data indicate that even infrequent coughing episodes that induce SOB likely contribute. Because numerals cannot be applied to ORS options (the “distance” between ORS response options is likely different from the distance between consecutive numbers), we did not conduct statistical analyses of this relationship. However, the data in Table E5 show a convincing but not perfect relationship between cough frequency and severity.

In theory, PGIC responses should be equally (but oppositely) correlated with past and present status, but this is rarely the case. Most often, PGIC responses exhibit present-state bias (8, 9): they have an inappropriately strong correlation with the present state and a far lower-than-expected correlation with past state. Our results confirmed present-state bias in SOB PGIC and—particularly when combined with the somewhat haphazard way many interviewees appeared to develop their responses in the think-aloud exercise—would argue strongly against the use of PGICs as anchors in MSD analyses in ILD.
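
A simple way to look for present-state bias is to correlate PGIC ratings with both the past and the present severity ratings. If a PGIC (coded so that higher means more worsening) tracked true change, the two correlations would be roughly equal in size and opposite in sign; a much stronger correlation with the present state suggests bias. The sketch below uses Pearson correlation and toy data chosen for illustration; the helper names are hypothetical, and this is not the study's own analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def present_state_check(pgic, past, present):
    """Return (r with past severity, r with present severity) for PGIC ratings."""
    return pearson(pgic, past), pearson(pgic, present)


# Toy severity ratings (0-4) 3 months ago and now; past and present are uncorrelated.
past = [0, 0, 4, 4]
present = [0, 4, 0, 4]

# A PGIC that reflects true change: correlations equal in size, opposite in sign.
unbiased = [p - q for p, q in zip(present, past)]
print(present_state_check(unbiased, past, present))  # about (-0.707, 0.707)

# A PGIC that simply echoes current severity: strong present-state bias.
biased = present[:]
print(present_state_check(biased, past, present))    # (0.0, 1.0)
```

Pearson correlation treats the ratings as interval data; with ordinal PGIC categories, a rank-based correlation is a reasonable alternative, but the comparison of the two correlations works the same way.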

The interview and survey data generated here will support the move to greater patient centeredness in trial endpoints (10). What will propel such advancement is increased confidence that candidate PROM endpoints have scores that are reliable, valid, and responsive to change. Even more important is improved interpretability of how changes in a PROM’s scores relate to patients’ experiences (2). Several methods can generate such interpretability, but the most widely used involves relating a PROM’s scores to other trustworthy outcome assessment tools (i.e., anchors) whose MSDs have already been established. For ILD, PGIS items for SOB, cough, and fatigue/low energy are attractive anchors because they are brief, easily understood, capture important concepts, and can be administered in various formats at times when other study data are collected. Until now, sponsors and ILD investigators using PGISs in their trials had to assume what score change is meaningful to patients. Those assumptions no longer need to be made: the experts have spoken.

Our study has limitations. Most respondents were from the United States, and we received relatively few surveys from patients with non-IPF diagnoses. The lack of heterogeneity precluded subgroup analyses by country. The interviews and surveys were not identical, but there was extensive overlap and the responses were generally similar, supporting the internal validity of the results. We cannot assess the fidelity of self-reported demographic and clinical information provided by survey respondents, nor can we confirm that any respondent turned in only one survey. We did not ask for ethnicity and/or race, which was a missed opportunity.

Because interviews were conducted over Zoom and the survey was delivered electronically, only patients with internet access could participate. Recruiting through patient advocacy groups selected for a particularly engaged patient group, likely with high health literacy and perhaps a keen understanding of the complex terminology associated with ILD; a broader advertisement campaign could have mitigated this bias. The survey was in English, so only English speakers/readers could respond. Many patients are used to filling out questionnaires about symptoms but not questions about response options or items that require intense reflection, as many of our items did; this could have lowered the response rate and imposed a response burden that influenced answers. All of these raise concern about whether the results would apply to the broader population of patients with ILD. Nonetheless, with more than 300 participants, including 18 interviewees, the sample is reasonably large.

An inductive approach to the qualitative analysis, with organic and open development of codes and themes (using multiple coders), could generate a rich conceptual framework for ILD symptom severity, but that was not the intent of our analysis. We did not administer other PROMs to use as anchors for respondents’ meaningfulness thresholds; however, we believe the patients’ responses are the gold standard for assessing meaningfulness. The analysis for present-state bias of the PGIC could have been enhanced by collecting PGISs at baseline and then resurveying to collect PGIS and PGIC 3 months later.

Our results should not be viewed as valid for other PGI wordings, time frames, or patients outside the target population. For example, although the modified Medical Research Council dyspnea scale could be considered a 0–4 PGIS for dyspnea and is likely used by many ILD practitioners in the clinic, additional research would be needed to determine whether it is a suitable dyspnea anchor and what its MSD is. Recognizing these limitations, we believe our results advance the field to a new level of understanding of patients’ perceptions of symptom severity and meaningfulness of change. Investigators now have disease-specific guidance to incorporate in the planning of any studies in which PROMs and PGIS items are included. Future research could target diagnostic subgroups (to determine whether there are differences between them) and patients from other countries and cultures to assess whether our results hold there.

Acknowledgments

The authors thank the Pulmonary Fibrosis Foundation, PF Warriors, Tam Corte in Australia, Chris Ryerson in Canada, and Phil Molyneaux in England for their assistance in distributing the survey, as well as all the patients who completed it.

Footnotes

Author Contributions: Conceptualization: J.J. Swigris and K.I.A. Analysis: J.J. Swigris and J.B.P. Interpretation of data: J.J. Swigris, J.B.P., K.I.A., T.A.G., and J.J. Solomon. Initial draft of the manuscript: J.J. Swigris. Critical input, review, editing, and approval of the final draft: J.J. Swigris, J.B.P., K.I.A., T.A.G., and J.J. Solomon.

This article has a data supplement, which is accessible at the Supplements tab.

Author disclosures are available with the text of this article at www.atsjournals.org.

References

1. Wijsenbeek M, Molina-Molina M, Chassany O, Fox J, Galvin L, Geissler K, et al. Developing a conceptual model of symptoms and impacts in progressive fibrosing interstitial lung disease to evaluate patient-reported outcome measures. ERJ Open Res 2022;8:00681-2021. doi: 10.1183/23120541.00681-2021.
2. U.S. Department of Health and Human Services and the Food and Drug Administration (CDER). Guidance for industry, Food and Drug Administration staff, and other stakeholders: patient-focused drug development: incorporating clinical outcome assessments into endpoints for regulatory decision-making. Silver Spring, MD: U.S. Food and Drug Administration; 2023.
3. Peipert JD, Hays RD, Cella D. Likely change indexes improve estimates of individual change on patient-reported outcomes. Qual Life Res 2023;32:1341–1352. doi: 10.1007/s11136-022-03200-4.
4. Terluin B, Eekhout I, Terwee C, De Vet H. Minimal important change (MIC) based on a predictive modeling approach was more precise than MIC based on ROC analysis. J Clin Epidemiol 2015;68:1388–1396. doi: 10.1016/j.jclinepi.2015.03.015.
5. Terluin B, Eekhout I, Terwee CB. The anchor-based minimal important change, based on receiver operating characteristic analysis or predictive modeling, may need to be adjusted for the proportion of improved patients. J Clin Epidemiol 2017;83:90–100. doi: 10.1016/j.jclinepi.2016.12.015.
6. Wyrwich KW, Norman GR. The challenges inherent with anchor-based approaches to the interpretation of important change in clinical outcome assessments. Qual Life Res 2023;32:1239–1246. doi: 10.1007/s11136-022-03297-7.
7. Devji T, Carrasco-Labra A, Qasim A, Phillips M, Johnston BC, Devasenapathy N, et al. Evaluating the credibility of anchor based estimates of minimal important differences for patient reported outcomes: instrument development and reliability study. BMJ 2020;369:m1714. doi: 10.1136/bmj.m1714.
8. Guyatt GH, Norman GR, Juniper EF, Griffith LE. A critical look at transition ratings. J Clin Epidemiol 2002;55:900–908. doi: 10.1016/s0895-4356(02)00435-3.
9. Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol 1997;50:869–879. doi: 10.1016/s0895-4356(97)00097-8.
10. Aronson KI, Danoff SK, Russell AM, Ryerson CJ, Suzuki A, Wijsenbeek MS, et al. Patient-centered outcomes research in interstitial lung disease: an official American Thoracic Society Research Statement. Am J Respir Crit Care Med 2021;204:e3–e23. doi: 10.1164/rccm.202105-1193ST.
