J Card Fail. Author manuscript; available in PMC: 2011 Aug 1.
Published in final edited form as: J Card Fail. 2010 Apr 8;16(8):659–668. doi: 10.1016/j.cardfail.2010.03.002

Development and validation of a computer adaptive test for measuring dyspnea in heart failure

Bernice Ruo 1, Seung Choi 2, David Baker 1, Kathleen Grady 3, David Cella 2
PMCID: PMC2913136  NIHMSID: NIHMS188643  PMID: 20670845

Abstract

Background

Dyspnea is a common symptom among patients with heart failure. Currently, there is no standardized, rapid, precise method to assess dyspnea.

Methods and Results

From a review of the literature, we pooled questions from various questionnaires assessing dyspnea. A total of 201 patients with heart failure completed all questions in the preliminary item bank. Each item asks how much shortness of breath the patient had when doing an activity. Medical charts were reviewed for hospitalization within 1 or 3 months of completing the questions. We created a dyspnea item bank of 44 items. Computer adaptive tests (CATs) generated from this item bank can assess dyspnea by administering an average of 10 questions. Simulated CAT scores were generated for comparison with item bank scores; the CAT scores had a correlation of 0.98 with item bank scores. Logistic regression models predicting the probability of hospitalization from the dyspnea score were statistically significant (p<0.05). A 5-point score increase was associated with a 32% increase in the odds of hospitalization within 1 month and a 20% increase in the odds of hospitalization within 3 months.

Conclusions

This computer-based tool for dyspnea assessment achieves precision similar to that of answering the entire dyspnea item bank, with less patient burden.

BACKGROUND

Dyspnea is the symptom most commonly reported by patients with heart failure (HF).1, 2 Ideally, we would like to be able to quickly and accurately assess dyspnea. Despite many advances in patient-reported outcomes research, a need exists for a user-friendly, brief, and precise instrument to measure changes in dyspnea. Better symptom assessment could help: 1) identify HF patients at high risk for hospitalization for targeted intervention such as disease management programs; and 2) improve assessment of treatment efficacy such as in the evaluation of outcomes in clinical trials.

Patient-reported outcomes (e.g., symptom status, functional ability, and perceived well-being) broadly comprise endpoints that are derived from patients and reflect the patients’ perspective on their health. Many aspects of patients’ subjective experience that are important targets for disease interventions, such as symptom severity and frequency, are not measured by radiographic, laboratory, or other clinical data. Patient-reported outcome measurement is particularly important in clinical trials, in which a lack of improvement in traditional clinical measures may mask a meaningful benefit that accurate patient-reported outcomes assessment could capture. In recognition of the importance of patient-reported outcomes, the NIH created a Roadmap initiative for the assessment of patient-reported chronic disease outcomes, including the creation of a network, the Patient-Reported Outcomes Measurement Information System (PROMIS).3

There are several methods of assessing HF symptoms.4–8 However, each has limitations for assessing dyspnea (Table 1). Health-related quality of life instruments have prognostic significance, since poorer scores are associated with increased risk of hospitalization or death.9, 10 Because these instruments include a few questions assessing symptoms, it is possible that reducing symptoms such as dyspnea may reduce adverse events.

Table 1.

Examples of current HF or dyspnea assessment tools

Assessment tool | Brief description | Limitations
New York Heart Association classification4 | 4 broad categories of functional status | Provider determined rather than patient reported; does not specifically assess dyspnea
Baseline Dyspnea Index8 | 3 questions assessing functional impairment, magnitude of task, and effort | Seven response categories reduce reproducibility
Modified Medical Research Council Dyspnea scale20 | Categorizes patients into one of five categories | Each category spans a wide range of variation
Oxygen Cost Diagram21 | A visual analogue scale with intermediate descriptors | High time burden required for explanation; not user-friendly
Chronic HF Questionnaire dyspnea subscale6 | Patients select the 5 most important activities from a list and rate the level of dyspnea for each | Difficult to perform inter-subject comparisons
Minnesota Living with Heart Failure questionnaire7 | “Did your heart failure prevent you from living as you wanted during the last month by making you short of breath?” | Includes only one question focusing specifically on dyspnea
Kansas City Cardiomyopathy questionnaire (select questions)5 | Includes questions regarding activity limitations due to heart failure | Some questions do not distinguish between shortness of breath and fatigue
Kansas City Cardiomyopathy questionnaire (select questions)5 Includes questions regarding activity limitations by heart failure Some questions do not distinguish between shortness of breath and fatigue

Because health-related quality of life is a complex construct with many components, changes in symptoms may go undetected. With instruments that report an overall score, a poor health-related quality of life score does not imply that the patient has any specific symptom.11 Detecting symptom changes is important for clinical decision making, for example to adjust diuretic dosing or to determine whether an intervention reduces symptoms.

Current questionnaires that address HF symptoms do not distinguish among the different symptoms of HF. For example, the Chronic Heart Failure Questionnaire asks how much certain activities are limited “due to heart failure.”6 It is therefore difficult to tell whether a patient is limited by shortness of breath, fatigue, or emotional distress.

Barriers to using current measures of HF symptoms include their imprecision and length. Surveys must include a large number of questions to measure symptoms precisely among patients with wide-ranging disease severity. It is difficult to balance the precision gained from a full range of questions against the practical appeal of a brief questionnaire. In addition to creating respondent burden, lengthy questionnaires can contribute to problems with missing data.

The most efficient survey tools are computer adaptive tests.12, 13 Computer adaptive testing (CAT) can obtain a precise measurement in a short time as the selection of subsequent questions is influenced by the individual’s responses to the prior questions. This allows for questions to be tailored to each patient’s ability level.

Questions for computer adaptive tests (CATs) are drawn from an item bank. An item bank contains questions focusing on one specific content area, such as dyspnea. Item banking uses item response theory-based methods to select questions from multiple questionnaires and to place them on a common scale.11

In CAT, questions are administered until either a specified standard error is reached or a pre-set maximum number of questions have been administered. This typically results in a test that is much shorter than paper-administered tests.12, 13 On average, about 50% of testing time can be eliminated using CAT.12

CATs generated from a pre-calibrated item bank can give a score that is comparable to the score obtained from the entire set of questions in the item bank.14 Thus, CAT allows for the administration of fewer questions, while maintaining measurement precision. Item response theory has been applied to health status measurement15 with subsequent studies supporting the expectations that item response theory would reduce patient burden16 and that item banks would enhance measurement precision.17

Our goal in this project was to create an item bank based on physical activities to assess dyspnea in a standardized fashion and to employ CAT to maximize information gathered while minimizing patient burden.

METHODS (Figure 1)

Figure 1. Outline of steps used to develop a dyspnea computer adaptive test

Each participant gave informed consent and the study protocol was approved by the Northwestern University institutional review board.

Developing the initial dyspnea item bank

Item development for the dyspnea item bank began with a list of activities being developed for dyspnea assessment in chronic obstructive pulmonary disease.18, 19 In brief, this list of activities was generated from a review of various dyspnea questionnaires in the pulmonary literature. Activities were initially categorized (“binned”) into conceptually similar groups. These “bins” were used to cluster highly redundant questions within larger concepts, enabling selection of the most desirable item context, linguistic expression, and response options. Next, highly redundant items within bins were sorted out (“winnowed”) by the item review panel, who selected from the available items in each bin the one that offered the clearest expression of the content. Study and discussion of bins at this point sometimes led to modification of item wording, and several new items were written to fill conceptual gaps. After a complete round of winnowing, the item review panel re-examined items for redundancy, clarity, and translatability.19

For our preliminary item bank, we started with this pooled and refined list of activities because it encompasses the range of physical activities that an individual might perform in the course of a usual day. We believed that these physical activities would not differ by disease process. To confirm this, we reviewed these items with patients with heart failure and with heart failure experts, and we separately calibrated and created a dyspnea item bank for patients with heart failure.

To do this, we first verified that these activities encompassed those included in the Baseline Dyspnea Index,8 the modified Medical Research Council Dyspnea scale,20 the Oxygen Cost Diagram,21 the Minnesota Living with Heart Failure questionnaire,7 the Chronic Heart Failure Questionnaire dyspnea subscale,6 and the Kansas City Cardiomyopathy questionnaire.5 We added one activity, “sleeping.” Then, to ensure that the preliminary set of questions captured the regular activities patients performed, we conducted one-on-one interviews with 20 heart failure patients, asking whether the activities were relevant and whether there were additional activities they commonly did that were not listed. If more than half the patients identified an activity as of little or no relevance, the activity was eligible for elimination pending the results of the expert review. We also verified comprehension of the question stem and response options. These adult patients were recruited from the Heart Failure clinic at Northwestern Medical Faculty Foundation and varied in age, gender, race/ethnicity, and disease severity.

Five expert clinicians who treat patients with HF reviewed the list of items to assess content validity and adequate breadth of coverage. Each was asked which activities he or she considered most important to ask about. In deciding which activities to include in the preliminary item bank, we did not seek agreement among the experts; rather, we wanted to ensure that we did not eliminate any questions that patients deemed less relevant but clinicians considered important. In this development phase, if any expert marked an activity as among the most important to ask, we retained it in the preliminary item bank for the next phase of testing, even if patients had rated it as having little or no relevance. We chose this method to allow the broadest inclusion of activities for the subsequent phase of field testing. In addition, experts were asked whether any activities should be added and for suggestions regarding the wording of the question and response options.
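The retention rule described in the two preceding paragraphs can be written down compactly. The sketch below is a hypothetical illustration, not the study's software: `retain_activity` and its arguments are our own names, and we assume that a relevance rating of 0 (not relevant) or 1 (a little relevant) on the 0–3 interview scale counts as “little or no relevance.”

```python
from typing import Sequence


def retain_activity(patient_ratings: Sequence[int], expert_flags: Sequence[bool]) -> bool:
    """Hypothetical sketch of the preliminary screening rule for one activity.

    patient_ratings: interview relevance ratings
        (0 = not relevant, 1 = a little, 2 = somewhat, 3 = very relevant)
    expert_flags: one entry per expert; True if that expert listed the
        activity among the most important to ask about
    """
    # ASSUMPTION: ratings of 0 or 1 count as "little or no relevance".
    low = sum(1 for r in patient_ratings if r <= 1)
    eligible_for_removal = low > len(patient_ratings) / 2
    # A single expert endorsement overrides patient-based elimination.
    return (not eligible_for_removal) or any(expert_flags)
```

Under this rule, an activity such as washing your face, rated low by most patients but named as important by one expert, is kept, while an activity rated low and named by no expert is dropped.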

Field testing to develop dyspnea item bank

Study site, subjects, and sample size

Participants were recruited from the Heart Failure clinic at Northwestern Medical Faculty Foundation and the Cardiology clinics at Evanston Hospital. Eligible participants consisted of adult patients (age ≥21) with systolic and/or diastolic HF. Besides direct physician referral to the study, we also identified potentially eligible participants as those with a diagnosis code of HF (ICD9 428.XX, 398.91, 402.01, 402.11, 402.91, 404.01, 404.03, 404.11, 404.13, 404.91, or 404.93) documented for prior visits or in their medical problem list in their medical records. After agreement by the patients’ physicians, eligible patients were invited by a research assistant to participate in the study when they came to clinic for their next scheduled appointment. The diagnoses and types of HF were confirmed by more comprehensive chart review of co-morbidities and left ventricular ejection fraction after patients consented to participate in the study. Non-ambulatory patients were excluded because a six-minute walk test was included as a measure of functional performance.

Prior studies have reported sample size requirements as small as 150 when less complex item response theory models are used to construct item banks.22, 23 We therefore aimed to recruit at least 200 participants.

Data collection from heart failure patients

Preliminary item bank

Each participant was asked, “Over the past 7 days, how short of breath did you get with each of these activities?” (no shortness of breath; mildly short of breath; moderately short of breath; severely short of breath; I did not do this activity in the past 7 days). Patients who responded “I did not do this activity in the past 7 days” were then asked why they had not done this activity during the past 7 days (Figure 2). The response “I have stopped trying or knew I could not do this activity because of shortness of breath” was considered equivalent to the most severe response category, severely short of breath. We asked participants whether the questions were clear and, if not, which question was unclear and why.
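The scoring of these answers can be sketched as a small recoding function onto the 1–4 scale used in Table 3. This is an illustrative sketch with hypothetical names; the text does not say how “did not do” answers given for other reasons were scored, so treating them as missing is an assumption, labeled as such in the code.

```python
from typing import Optional

# Hypothetical labels for the answer options in the question above.
SEVERITY = {
    "no shortness of breath": 1,
    "mildly short of breath": 2,
    "moderately short of breath": 3,
    "severely short of breath": 4,
}
NOT_DONE = "I did not do this activity in the past 7 days"
STOPPED_BECAUSE_OF_SOB = (
    "I have stopped trying or knew I could not do this activity "
    "because of shortness of breath"
)


def recode(answer: str, reason: Optional[str] = None) -> Optional[int]:
    """Map one answer (plus follow-up reason, if any) to the 1-4 severity scale."""
    if answer in SEVERITY:
        return SEVERITY[answer]
    if answer == NOT_DONE:
        if reason == STOPPED_BECAUSE_OF_SOB:
            return 4  # scored as "severely short of breath", as stated in the text
        return None  # ASSUMPTION: other reasons are treated as missing
    raise ValueError(f"unrecognized answer: {answer!r}")
```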

Figure 2. Question and response choices for dyspnea item bank activities

The research assistant read each question aloud and then recorded the patient’s responses using a hand-held tablet computer to facilitate data entry.

Overall shortness of breath question

Patients were asked to rank their shortness of breath on a scale of 0 to 10, where 0 is no shortness of breath and 10 is the worst shortness of breath imaginable.

Functional performance: Six-minute walk distance

In accordance with a standardized protocol24, participants were instructed to walk up and down a 100-foot hallway for six minutes, covering as much distance as possible during that time. The distance (in feet) walked during the six minutes was recorded.

Medical chart review

The clinics at Northwestern Medical Faculty Foundation and NorthShore University HealthSystem use an electronic medical record (Epic; Epic Systems Corporation; Madison, Wisconsin). Using the electronic medical record, we abstracted medication lists and echocardiographic data. A research study coordinator performed the chart review blinded to dyspnea score. We also determined whether each participant had been hospitalized within 1 or 3 months of enrollment in the study by reviewing documentation of any inpatient stays in the primary hospital of record. Any inpatient stay listed under the “encounters tab” in the electronic medical record was considered a hospitalization.

Statistical methods

Item response theory model selection

We used a two-parameter model, the graded rating scale model25. Since all the questions in the dyspnea item bank have a single set of response choices, the graded rating scale model is efficient, especially for sample sizes in the hundreds. It assumes that the distances between response choice thresholds (e.g. moderate shortness of breath vs severe shortness of breath) are similar for all questions. For each question, the model estimates one location parameter and one discrimination parameter. The location parameter places each question on the dyspnea continuum, where lower scores represent dyspnea with more exertion and higher scores represent dyspnea with less exertion. The discrimination parameter measures the ability of the question to separate patients into different dyspnea levels.
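To make the model concrete, the sketch below computes category probabilities for one item under one common parameterization of a graded rating scale model: the cumulative probability of responding in category k or higher is a logistic function of a(θ − b − τ_k), with a single set of increasing thresholds τ_k shared by all items. Sign conventions differ across software (including Parscale), and all parameter values here are hypothetical. A higher item location b means the item is endorsed at higher levels only at higher θ, i.e., the activity produces dyspnea only in patients with dyspnea at lower exertion.

```python
import math
from typing import List, Sequence


def grsm_category_probs(theta: float, a: float, b: float,
                        taus: Sequence[float]) -> List[float]:
    """Category probabilities for categories 0..len(taus) of one item.

    theta: latent dyspnea level (logit scale)
    a: item discrimination; b: item location
    taus: increasing category thresholds shared by ALL items
    """
    # Cumulative probabilities P(X >= k), with P(X >= 0) = 1 and P(X >= K+1) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b - t))) for t in taus] + [0.0]
    # Probability of exactly category k is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(taus) + 1)]


def expected_category(theta: float, a: float, b: float, taus: Sequence[float]) -> float:
    """Model-expected response (0-based category) at a given theta."""
    return sum(k * p for k, p in enumerate(grsm_category_probs(theta, a, b, taus)))
```

For example, `grsm_category_probs(0.0, 1.5, 0.0, [-1.0, 0.0, 1.0])` gives probabilities that sum to 1 and are symmetric around the middle categories, and the expected category rises as θ increases.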

Data analysis to create item bank and CATs

Psychometric properties of each item and of the total scale were determined using conventional methods (Cronbach’s alpha and item-total correlations) and item response theory-based statistics.26 Parscale,27 a software package for rating scale analysis, was used for the graded rating scale analyses and the creation of the item bank. Dyspnea item bank scores were expressed in T-score units with a mean of 50 and SD of 10. We evaluated the concurrent validity of the dyspnea item bank score by examining its correlations with the 6-minute walk distance and with responses to the overall shortness of breath question.
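The conventional statistics named above can be sketched in a few lines of standard-library Python. This is a hypothetical illustration, not the study's analysis code: it assumes complete response data, uses the corrected item-total correlation (item versus the sum of the remaining items), which is one common variant the text does not specify, and assumes the T-score is a linear rescaling of a standardized latent score θ.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(responses[0])
    columns = list(zip(*responses))
    sum_item_var = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(responses: Sequence[Sequence[float]], j: int) -> float:
    """Correlation of item j with the sum of all OTHER items."""
    item = [row[j] for row in responses]
    rest = [sum(row) - row[j] for row in responses]
    return pearson_r(item, rest)


def t_score(theta: float) -> float:
    """ASSUMPTION: theta is standardized (mean 0, SD 1) in the calibration sample."""
    return 50.0 + 10.0 * theta
```

Population variances are used throughout; because the same n/(n − 1) factor would appear in both the numerator and denominator of alpha, using sample variances instead gives the same result.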

Based on an algorithm28 developed by the Center On Outcomes, Research and Education at NorthShore University HealthSystem, software can be programmed to administer questions from an item bank to participants. The algorithm uses a Bayesian estimation procedure to calculate the trait estimate (in this case, dyspnea level) given the items answered, and it also takes assessment length into account. CAT is typically terminated by a combination of stopping rules: when a specified standard error is reached, when a pre-specified maximum number of items has been administered, or when no remaining items would contribute additional information to score estimation. The standard error stopping criterion was set at 0.3, and the minimum and maximum numbers of items to administer were set at 5 and 15, respectively.
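The adaptive loop and its stopping rules can be illustrated with a minimal sketch. It is not the Center On Outcomes, Research and Education algorithm. We assume expected a posteriori (EAP) scoring with a standard normal prior as the Bayesian estimation step, use the posterior SD as the standard error, choose the unused item with maximum Fisher information at the current estimate, and use graded-rating-scale items with hypothetical parameters and 0-based response categories. Only the stopping values (standard error 0.3, 5 to 15 items) come from the text; `min_info` is a hypothetical cutoff standing in for “no remaining items will contribute additional information.”

```python
import math
from typing import Callable, List, Sequence, Tuple

Item = Tuple[float, float, Sequence[float]]  # (discrimination a, location b, shared thresholds)
GRID = [-4.0 + 0.05 * i for i in range(161)]  # quadrature grid over theta for EAP


def category_probs(theta: float, a: float, b: float, taus: Sequence[float]) -> List[float]:
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b - t))) for t in taus] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(taus) + 1)]


def item_information(theta: float, item: Item, h: float = 1e-4) -> float:
    """Fisher information sum_k P_k'(theta)^2 / P_k(theta); derivative by central difference."""
    a, b, taus = item
    p = category_probs(theta, a, b, taus)
    p_lo = category_probs(theta - h, a, b, taus)
    p_hi = category_probs(theta + h, a, b, taus)
    return sum(((hi - lo) / (2 * h)) ** 2 / pk for pk, lo, hi in zip(p, p_lo, p_hi) if pk > 0)


def eap_estimate(answers: List[Tuple[int, int]], bank: Sequence[Item]) -> Tuple[float, float]:
    """Posterior mean and SD of theta under a N(0, 1) prior (ASSUMED Bayesian step)."""
    log_w = []
    for t in GRID:
        lw = -0.5 * t * t
        for idx, x in answers:
            a, b, taus = bank[idx]
            lw += math.log(max(category_probs(t, a, b, taus)[x], 1e-300))
        log_w.append(lw)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]  # rescale before exponentiating to avoid underflow
    total = sum(w)
    m = sum(t * wi for t, wi in zip(GRID, w)) / total
    var = sum((t - m) ** 2 * wi for t, wi in zip(GRID, w)) / total
    return m, math.sqrt(var)


def run_cat(bank: Sequence[Item], respond: Callable[[int], int], se_stop: float = 0.3,
            min_items: int = 5, max_items: int = 15, min_info: float = 1e-3):
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    answers: List[Tuple[int, int]] = []
    remaining = set(range(len(bank)))
    while remaining and len(answers) < max_items:
        if len(answers) >= min_items and se <= se_stop:
            break  # precision target reached
        best = max(remaining, key=lambda i: item_information(theta, bank[i]))
        if item_information(theta, bank[best]) < min_info:
            break  # no remaining item adds meaningful information
        remaining.remove(best)
        answers.append((best, respond(best)))
        theta, se = eap_estimate(answers, bank)
    return theta, se, [i for i, _ in answers]
```

`respond` is any callback that returns the patient's answer to the item with the given index; in a live test it would present the question, and in a simulation it can look the answer up. For example, with a hypothetical 21-item bank of equally spaced locations and a simulated respondent who always gives the most probable answer at θ = 1, the test stops after between 5 and 15 items with a positive dyspnea estimate.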

Analysis of association with outcomes

We used correlations to examine the relationship between dyspnea item bank score and 6 minute walk distance. We used analysis of variance (ANOVA) to compare mean dyspnea scores between patients who were or were not hospitalized in the 1 or 3 months following completion of the questions. We examined the magnitude of association between the dyspnea item bank score and hospitalization in 1 or 3 months using unadjusted logistic regression.

CAT simulation

We conducted a post-hoc CAT simulation using the data collected from the patients recruited for development of the dyspnea item bank. For each patient, the simulated CAT score is the score that would have been obtained had the CAT selected questions for that individual, using the responses the individual actually gave to those questions. We examined the number of items administered by the CAT and compared each patient’s simulated CAT score with his or her item bank score.
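In a post-hoc simulation no new questions are asked: each time the CAT selects an item, the patient's already recorded answer to that item is looked up. A minimal sketch of that idea, and of the two summaries reported in the Results (mean number of items administered and the correlation between CAT and full-bank scores), follows. The function names are hypothetical; the responder is meant to be passed to any CAT routine that requests answers through a callback.

```python
import math
from statistics import mean
from typing import Callable, Dict, Sequence


def replay_responder(full_responses: Sequence[int]) -> Callable[[int], int]:
    """Answer item i with the response the patient already gave to the full bank."""
    return lambda item_index: full_responses[item_index]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarize_simulation(cat_scores: Sequence[float], bank_scores: Sequence[float],
                         items_used: Sequence[int]) -> Dict[str, float]:
    """Mean CAT length and agreement between simulated CAT and full-bank scores."""
    return {
        "mean_items": mean(items_used),
        "r": pearson_r(cat_scores, bank_scores),
    }
```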

RESULTS

Dyspnea item bank development

We pooled and winnowed questions from various questionnaires, resulting in 59 activities. From the one-on-one patient interviews, we identified 5 activities (entertaining friends at home, dining out, visiting friends, washing your face, raising arms overhead) with a median relevance rating of ≤1 (0 = not relevant, 1 = a little relevant, 2 = somewhat relevant, 3 = very relevant). One of these activities (washing your face) was named by one expert clinician as important. The other four activities were therefore removed, and 55 activities were retained. No additional activities were identified from the patient interviews or by expert clinician input.

For item bank calibration, we recruited 201 patients with heart failure. Their average age was 59 years (SD 14); 50% were female, 49% were White, and 42% were African American (Table 2). More than half of the patients had a left ventricular ejection fraction of less than 40% on echocardiography. Most participants had beta-blockers (89%) and angiotensin converting enzyme inhibitors (69%) on their medication lists. Six-minute walk distances ranged from 77 to 2154 feet, with a median of 1204 feet.

Table 2.

Patient characteristics (n=201)

% (unless otherwise specified)
Age, range (mean, SD) 23–88 (59,14)
Female 50%
Race/Ethnicity
 White 49%
 Black 42%
Co-morbidities (n=196)
 Hypertension 61.2%
 Diabetes mellitus 26.0%
 Coronary disease 39.7%
 Pulmonary disease 16.8%
 Osteoarthritis 12.8%
 Peripheral vascular disease 2.5%
Hematocrit (n=187)
 Normal range (≥ 34) 75.9%
 Slightly low (30–33.9) 20.3%
 Low (< 30) 3.7%
Glomerular filtration rate (GFR) (n=195)
 ≥ 60 56.4%
 40–59.9 28.2%
 <40 15.4%
Left ventricular ejection fraction < 40% (n=185) 63.5%
Medications (n =196)
 Beta-blocker 90.8%
 ACE inhibitor or angiotensin receptor blocker 85.2%
 Diuretic 65.3%
 Digoxin 16.3%
 Aspirin 51.0%

During item bank calibration analyses, we dropped 11 items. These activities were removed because either 1) more than 50% of patients did not do these activities for reasons other than shortness of breath; 2) not all response categories were represented; or 3) the item did not fit in the dyspnea item bank continuum that was generated.
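The first two exclusion rules can be checked directly from the response data, as in the hypothetical sketch below. Here `None` stands for “did not do this activity for a reason other than shortness of breath” (an assumed coding), and responses use the 1–4 scale of Table 3. The third rule, fit to the dyspnea continuum, requires the calibrated item response theory model and is not checked.

```python
from typing import List, Optional, Tuple


def screen_item(responses: List[Optional[int]], n_categories: int = 4) -> Tuple[bool, str]:
    """Apply exclusion rules 1 and 2 to one item's responses across patients."""
    n = len(responses)
    missing = sum(1 for r in responses if r is None)
    if missing > n / 2:
        return False, "rule 1: >50% did not do the activity for other reasons"
    observed = {r for r in responses if r is not None}
    if observed != set(range(1, n_categories + 1)):
        return False, "rule 2: not all response categories represented"
    # Rule 3 (fit to the dyspnea continuum) needs the IRT model; not checked here.
    return True, "passes rules 1 and 2"
```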

The final item bank used to generate the computer adaptive test includes 44 activities (Table 3). The activities in the final item bank range from the activity reflecting dyspnea only with high levels of exertion (walking faster than your usual speed for more than 1 mile) to the activity reflecting dyspnea at low levels of exertion (sleeping). Higher dyspnea item bank scores reflect symptoms at lower levels of exertion.

Table 3.

Dyspnea item bank scores and corresponding level of dyspnea with various activities

Activity (in order of decreasing level of exertion) Item bank score
25 30 35 40 45 50 55 60 65 70 75 80 85 90
Walking faster than your usual speed for >1 mile 1 2 2 3 4 4 4 4 4 4 4 4 4 4
Walking faster than your usual speed for 1 mile 1 1 2 2 3 4 4 4 4 4 4 4 4 4
Walking faster than your usual speed for 1/2 mile 1 1 2 2 2 3 4 4 4 4 4 4 4 4
Lifting more than 20 pounds 1 1 1 2 2 3 4 4 4 4 4 4 4 4
Walking 1 mile on flat ground without stopping 1 1 1 2 2 2 4 4 4 4 4 4 4 4
Walking up 20 stairs without stopping 1 1 1 2 2 2 4 4 4 4 4 4 4 4
Carrying something weighing 10–20 pounds 1 1 1 1 2 2 3 4 4 4 4 4 4 4
Hurrying such as to catch a bus 1 1 1 1 2 2 3 3 4 4 4 4 4 4
Walking faster than your usual speed for 50 steps 1 1 1 1 2 2 3 4 4 4 4 4 4 4
Lifting 10–20 pounds 1 1 1 1 2 2 3 4 4 4 4 4 4 4
Walking 1/2 mile on flat ground without stopping 1 1 1 1 2 2 3 3 4 4 4 4 4 4
Sexual activity 1 1 1 1 1 2 2 3 4 4 4 4 4 4
Walking up 10 stairs without stopping 1 1 1 1 1 2 2 3 4 4 4 4 4 4
Being angry or upset 1 1 1 1 1 1 2 2 3 4 4 4 4 4
Sweeping or mopping 1 1 1 1 1 1 2 2 3 4 4 4 4 4
Scrubbing the floor or counter 1 1 1 1 1 1 2 2 4 4 4 4 4 4
Walking 50 steps on flat ground without stopping 1 1 1 1 1 1 2 2 3 4 4 4 4 4
Picking up 5–10 pounds 1 1 1 1 1 1 2 2 3 4 4 4 4 4
Carrying something weighing 5–10 pounds 1 1 1 1 1 1 2 2 3 4 4 4 4 4
Walking up 5 stairs without stopping 1 1 1 1 1 1 2 2 3 4 4 4 4 4
Playing with children or grandchildren 1 1 1 1 1 1 2 2 2 4 4 4 4 4
Putting on socks or stockings 1 1 1 1 1 1 2 2 2 3 4 4 4 4
Shopping (such as for groceries) 1 1 1 1 1 1 2 2 2 3 4 4 4 4
Bending over 1 1 1 1 1 1 2 2 2 3 4 4 4 4
Making a bed 1 1 1 1 1 1 2 2 2 3 4 4 4 4
Talking while walking 1 1 1 1 1 1 1 2 2 3 4 4 4 4
Dressing yourself without help 1 1 1 1 1 1 1 2 2 3 4 4 4 4
Going out socially 1 1 1 1 1 1 1 2 2 3 4 4 4 4
Taking a shower 1 1 1 1 1 1 1 2 2 3 3 4 4 4
Taking a bath without assistance 1 1 1 1 1 1 1 2 2 3 4 4 4 4
Walking 10 steps on flat ground without stopping 1 1 1 1 1 1 1 2 2 3 3 4 4 4
Light home repair 1 1 1 1 1 1 1 2 2 2 4 4 4 4
Getting in or out of a car 1 1 1 1 1 1 1 2 2 2 3 4 4 4
Carrying something weighing less than 5 pounds 1 1 1 1 1 1 1 1 2 2 3 4 4 4
Picking up less than 5 pounds 1 1 1 1 1 1 1 1 2 2 3 4 4 4
Preparing meals 1 1 1 1 1 1 1 1 2 2 3 4 4 4
Washing dishes 1 1 1 1 1 1 1 1 2 2 3 3 4 4
Standing for at least 5 minutes 1 1 1 1 1 1 1 1 1 2 2 3 4 4
Attending religious services 1 1 1 1 1 1 1 1 1 2 2 2 4 4
Brushing teeth 1 1 1 1 1 1 1 1 1 2 2 2 3 4
Washing face 1 1 1 1 1 1 1 1 1 1 2 2 3 4
Working at a desk or table 1 1 1 1 1 1 1 1 1 1 2 2 4 4
Eating 1 1 1 1 1 1 1 1 1 1 1 2 4 4
Sleeping 1 1 1 1 1 1 1 1 1 1 1 2 4 4

(1= no shortness of breath; 2= mildly short of breath; 3=moderately short of breath; 4=severely short of breath)

The internal consistency reliability coefficient (Cronbach’s coefficient alpha) was 0.98. Item-total correlations ranged from 0.43 (eating) to 0.82 (going out socially), with a median of 0.72. The standard error is larger at the extremes of the scale, where fewer items are located; dyspnea item bank scores are reported in T-score units with a mean of 50 and SD of 10. Dyspnea item bank scores were highly correlated (r = 0.94) with the mean item scores.
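The link between item location and precision can be made explicit with standard item response theory results. Assuming the T-score is a linear rescaling of a standardized latent score θ (mean 0, SD 1 in the calibration sample), and writing $I_i(\theta)$ for the information contributed by item $i$:

$$
\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{\sum_{i} I_i(\hat{\theta})}}, \qquad T = 50 + 10\,\theta \;\Rightarrow\; \mathrm{SE}(\hat{T}) = 10\,\mathrm{SE}(\hat{\theta})
$$

Item information is concentrated near each item's location, so where few items are located the summed information is small and the standard error is large. Under the same assumption, and if the CAT's 0.3 stopping criterion is expressed on the θ scale (an assumption; the text does not state the scale), that criterion corresponds to about 3 T-score points and a reliability of roughly $1 - 0.3^2 = 0.91$.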

The concurrent validity of the dyspnea item bank score, evaluated by examining its correlations with two measures (the 6-minute walk distance and the overall shortness of breath question), was good. As expected, there was a moderate negative relationship (r = −0.56, p value <0.001) between 6-minute walk distance and dyspnea item bank scores (Figure 3): patients with higher dyspnea item bank scores generally walked shorter distances within the time allowed.

Figure 3. Comparison of dyspnea item bank scores and six-minute walk distances

We compared responses to the single-item question on overall shortness of breath with the dyspnea item bank score to examine the validity of the item bank score and its gain in precision over the single item. Because of sparse observations in the extreme categories, the top three categories of the overall shortness of breath scale (8, 9, and 10) were combined into one category. The overall shortness of breath question was strongly correlated with the dyspnea item bank score (r = 0.76, p value <0.001); thus, approximately 58% of the variance in the dyspnea item bank score was explained by the single shortness of breath item. However, the single-item question discriminated poorly among patients with a wide range of dyspnea as reflected by their item bank scores (Figure 4). For example, patients who rated their dyspnea as 3 on the single-item question had item bank scores ranging from 35 to 70. One patient may rate his dyspnea as 3 on the 0 to 10 scale when he becomes short of breath walking one mile without stopping, whereas another may also rate his shortness of breath as 3 when he becomes short of breath walking 10 steps.
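The proportion of variance explained follows from squaring the correlation coefficient:

$$
r^2 = 0.76^2 = 0.5776 \approx 58\%
$$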

Figure 4. Comparison of dyspnea item bank scores and overall shortness of breath scores

This is a box-and-whisker plot in which the solid circle represents the median. The box represents the interquartile range (IQR), spanning the 25th through 75th percentiles. The whiskers extend from each end of the box to the most extreme observed value lying within 1.5 times the IQR of that end of the box. Open circles represent outliers: any observation lying more than 1.5 times the IQR below the first quartile or more than 1.5 times the IQR above the third quartile.
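The outlier rule in this legend is the usual 1.5 × IQR (Tukey) fence. A minimal sketch follows; quartiles are computed with `statistics.quantiles` (default “exclusive” method, Python 3.8+), which may differ slightly from the quartile definition used by the software that drew Figure 4.

```python
from statistics import quantiles
from typing import Dict, List, Sequence, Tuple


def tukey_summary(values: Sequence[float]) -> Dict[str, object]:
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers: List[float] = [v for v in values if v < lower_fence or v > upper_fence]
    # Whiskers end at the most extreme OBSERVED values inside the fences.
    whiskers: Tuple[float, float] = (min(inside), max(inside))
    return {"q1": q1, "q3": q3, "whiskers": whiskers, "outliers": outliers}
```

For the values 1 through 8 plus 100, the quartiles are 2.5 and 7.5, the upper fence is 15, the whiskers end at 1 and 8, and 100 is plotted as an outlier.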

The validity of the item bank scores was also supported by the relationship to hospitalization outcomes data. The mean dyspnea score for participants who were hospitalized during the month following completion of the dyspnea questions was significantly higher (56.8, SD 12.1, n=18) than that of participants who were not hospitalized within 1 month (49.4, SD 11.4, n=176, p<0.01). Similarly, those hospitalized within 3 months had higher dyspnea scores (53.9, SD 12.1, n=37) compared with those who were not hospitalized in that time frame (49.2, SD 11.4, n=157, p<0.05). Dyspnea score was a significant predictor of hospitalization at 1 or 3 months in logistic regression models (p<0.05). A one-point increase (i.e. 1/10 SD) in dyspnea score was associated with a 5.7% increased odds of hospitalization within 1 month and a 3.6% increased odds of hospitalization within 3 months. A five-point increase (i.e. ½ SD) in dyspnea score was associated with a 31.8% increased odds of hospitalization within 1 month and a 19.6% increased odds of hospitalization within 3 months. If a prospective study confirms these findings, this would be important prognostic information.
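Because the models are logistic, the one-point and five-point odds ratios are linked by exponentiation. If β is the log-odds coefficient per point of dyspnea score, the odds ratio for a Δ-point increase is:

$$
\mathrm{OR}_{\Delta} = e^{\beta \Delta} = \left(\mathrm{OR}_{1}\right)^{\Delta}, \qquad 1.057^{5} \approx 1.32, \qquad 1.036^{5} \approx 1.19
$$

These reproduce the reported five-point increases (31.8% and 19.6%) to within the rounding of the one-point estimates.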

CAT simulation

The CAT simulation showed a substantial reduction in test length without compromising measurement precision. The simulated CAT would have asked an average of 9.8 items, a reduction of more than 75% from the 44-item bank. Figure 5 shows an example of how the next question is selected based on the response to the prior question. Figure 6 shows a scatter plot of the CAT scores against the corresponding dyspnea item bank scores. The dyspnea CAT scores were comparable to those based on all 44 items (r = 0.98, p value <0.001).
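The size of the reduction follows directly from the mean CAT length and the size of the item bank:

$$
1 - \frac{9.8}{44} \approx 1 - 0.223 = 0.777
$$

that is, a reduction of roughly 78%.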

Figure 5. Example of subsequent question selection depending on response to the prior question in CAT

Figure 6. Comparison of dyspnea item bank scores and CAT scores

DISCUSSION

We created a dyspnea item bank consisting of 44 questions. Dyspnea item bank scores range from approximately 25 to 90, with higher scores representing dyspnea with less physical exertion. The item bank scores have good concurrent validity, as shown by their correlations with six-minute walk distance and with ratings of overall shortness of breath on a 0–10 scale. They are also significantly associated with the risk of hospitalization within 1 or 3 months. Using this dyspnea item bank, we then developed a CAT to assess dyspnea. This brief, precise tool selects which questions to administer from the item bank and administers an average of 10 questions, yielding a score with precision similar to that obtained when all 44 questions are administered.

The dyspnea CAT is simpler than other dyspnea indices8, 21 because every item uses the same question stem and response options. In addition, because the dyspnea CAT requires on average only 10 questions, it is shorter than disease-specific health-related quality of life questionnaires such as the Kansas City Cardiomyopathy Questionnaire and the Minnesota Living with Heart Failure Questionnaire. Like disease-specific health-related quality of life questionnaires, the dyspnea CAT is moderately correlated with six-minute walk distance. This is expected, since dyspnea is a key symptom in heart failure and symptoms are a component of health-related quality of life. It is important to note that one assessment is not meant to replace the other: the dyspnea CAT assesses more specific content, the symptom of dyspnea, rather than the broader construct of quality of life. Thus, the dyspnea CAT could complement quality of life assessment. If both are administered at the same time, the CAT could help identify whether a patient’s symptoms are driving poor quality of life, as opposed to other factors such as social limitation.

The dyspnea CAT could be administered by different modalities such as a computer kiosk, voice-activated telephone administration, or a web-based system. CATs are scored in real-time and results may be presented immediately to the physician and patient, enabling them to have a focused discussion regarding treatment options.29, 30 Prior studies with other CATs have shown that patients have reported that these discussions improve communication with providers31 and may encourage better care.32

Potential applications

Clinical decision making

This tool has many potential uses in clinical decision making by helping clinicians track symptoms more accurately. It may help detect clinical deterioration among patients with established heart failure: one study found that increases in body weight and serum B-type natriuretic peptide level, in isolation, are not sensitive markers of clinical deterioration.33 Better tracking of dyspnea may also allow more accurate HF staging and better judgment of the appropriateness of, or need for, interventions such as cardiac resynchronization therapy.

Disease management programs

A systematic review of multidisciplinary disease management programs showed that trials of interventions that included patient education and symptom monitoring reduced hospital readmissions and mortality.34 A dyspnea CAT completed through voice-activated telephone calls could provide closer follow-up of patients' symptoms. This method of monitoring dyspnea could also lower the costs of heart failure disease management programs, for which regular monitoring by home visits, telephone calls, or clinic visits may be prohibitively expensive.

Evaluation of new therapies or interventions

Besides its clinical utility, this dyspnea assessment tool could be useful as a patient-reported outcome measure in research evaluating new therapies or interventions; for example, it could be included as an outcome in heart failure clinical trials of new medications.

Limitations

This tool also has limitations. Administering the CAT requires a computer. Because the dyspnea CAT has so far been interviewer-administered, so that patients did not need to be computer literate, future studies will be needed to confirm that patients can easily enter their own responses. Item bank scores below 30 are less precise, as reflected by their larger standard errors. However, this may not be clinically significant, since a patient with a very low score has minimal dyspnea. The upper range of the dyspnea item bank contains few activities, since most items in the bank cluster in the middle range. However, there may be no activities that involve less exertion than sleeping, an item already included in this item bank; once patients have dyspnea at rest or during sleep, it may not be possible to differentiate their dyspnea severity further. Although the relationship between overall shortness of breath and the median dyspnea item bank score was linear, there was a wide range of dyspnea item bank scores at each level of overall shortness of breath. This likely reflects the imprecision of the 0–10 scale, which provides respondents with no word anchors for the middle of the scale. By contrast, the dyspnea item bank score is more precise because it helps patients define their dyspnea in terms of explicit activities. Using chart review at the patient's primary hospital to assess hospitalization rates also has limitations: it may underestimate the frequency of hospitalization, because patients may have been hospitalized at other hospitals. Finally, because deaths were not ascertained using death indices, some patients with no record of hospitalization may have died.
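Why scores in a sparsely populated range carry larger standard errors can be made explicit with a short derivation. Under item response theory, the standard error of a score is approximately the reciprocal square root of the test information at that score, and each item contributes information mainly near its own location. The block below assumes a generalized partial credit parameterization and a T-score metric (T = 50 + 10θ); both are assumptions for illustration rather than details confirmed in this section, and the information values in the worked example are hypothetical.

```latex
% Information of item i with categories k = 0, ..., m_i (GPCM, assumed)
I_i(\theta) = a_i^2 \left[ \sum_{k=0}^{m_i} k^2 P_{ik}(\theta)
            - \Big( \sum_{k=0}^{m_i} k \, P_{ik}(\theta) \Big)^2 \right]

% Test information over administered items, and the resulting standard errors
I(\theta) = \sum_i I_i(\theta), \qquad
\mathrm{SE}(\hat\theta) \approx \frac{1}{\sqrt{I(\hat\theta)}}, \qquad
\mathrm{SE}_T = 10 \, \mathrm{SE}(\hat\theta)

% Worked example (hypothetical values):
% I = 25 gives SE = 1/5 = 0.2, so SE_T = 2
% I = 4  gives SE = 1/2 = 0.5, so SE_T = 5
```

Where few items are located, the summed information I(θ) is small, so the standard error is correspondingly large; adding items in that range is the usual way to raise precision there.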

Given the prevalence of heart failure, we can strive to improve quality of care, optimize resource use, and reduce the impact of heart failure on patients’ health, quality of life, and longevity. This dyspnea item bank and CAT could help achieve these goals by providing a more accurate assessment of symptoms.

Acknowledgments

This project was supported by the American Heart Association (Scientist Development Award 0630156N) and a grant from the NIH (K23HL085766).

We thank Neerja Khurana and Tiffany Simpson for their help with data collection. We would also like to thank Rachel Hanrahan and Rumi Semer for their help with programming. Bernice Ruo, MD, MAS has received support from the American Heart Association (Scientist Development Award 0630156N) and is currently supported by a grant from the NIH (K23HL085766).

Footnotes

DISCLOSURES

None

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Adamson PB, Magalski A, Braunschweig F, et al. Ongoing right ventricular hemodynamics in heart failure: clinical value of measurements derived from an implantable monitoring system. J Am Coll Cardiol. 2003 Feb 19;41(4):565–571. doi: 10.1016/s0735-1097(02)02896-6.
2. Wilson JR, Rayos G, Yeoh TK, Gothard P, Bak K. Dissociation between exertional symptoms and circulatory function in patients with heart failure. Circulation. 1995 Jul 1;92(1):47–53. doi: 10.1161/01.cir.92.1.47.
3. Fries JF, Bruce B, Cella D. The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clin Exp Rheumatol. 2005 Sep-Oct;23(5 Suppl 39):S53–57.
4. New York Heart Association. Nomenclature and Criteria for Diagnosis of Diseases of the Heart and Blood Vessels. New York: New York Heart Association; 1963.
5. Green CP, Porter CB, Bresnahan DR, Spertus JA. Development and evaluation of the Kansas City Cardiomyopathy Questionnaire: a new health status measure for heart failure. J Am Coll Cardiol. 2000 Apr;35(5):1245–1255. doi: 10.1016/s0735-1097(00)00531-3.
6. Guyatt GH, Nogradi S, Halcrow S, Singer J, Sullivan MJ, Fallen EL. Development and testing of a new measure of health status for clinical trials in heart failure. J Gen Intern Med. 1989 Mar-Apr;4(2):101–107. doi: 10.1007/BF02602348.
7. Rector TS, Kubo SH, Cohn JN. Patients’ Self Assessment of Their Congestive Heart Failure: Content, reliability, and validity of a new measure, the Minnesota Living with Heart Failure Questionnaire. Heart Failure. 1987;3:198–209.
8. Mahler DA, Weinberg DH, Wells CK, Feinstein AR. The measurement of dyspnea. Contents, interobserver agreement, and physiologic correlates of two new clinical indexes. Chest. 1984 Jun;85(6):751–758. doi: 10.1378/chest.85.6.751.
9. Alla F, Briancon S, Guillemin F, et al. Self-rating of quality of life provides additional prognostic information in heart failure. Insights into the EPICAL study. Eur J Heart Fail. 2002 Jun;4(3):337–343. doi: 10.1016/s1388-9842(02)00006-5.
10. Heidenreich PA, Spertus JA, Jones PG, et al. Health status identifies heart failure outpatients at risk for hospitalization or death. J Am Coll Cardiol. 2006 Feb 21;47(4):752–756. doi: 10.1016/j.jacc.2005.11.021.
11. Fayers PM, Machin D. Quality of life: Assessment, analysis and interpretation. John Wiley & Sons Ltd; 2000.
12. Hambleton RK, Slater SC. Item response theory models and testing practices: current international status and future directions. European Journal of Psychological Assessment. 1997;13(1):21–28.
13. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000 Sep;38(9 Suppl):II28–42. doi: 10.1097/00005650-200009002-00007.
14. Weiss DJ. Improving Measurement Quality and Efficiency with Adaptive Testing. Applied Psychological Measurement. 1982;6(4):473–492.
15. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Qual Life Res. 1997 Aug;6(6):595–600. doi: 10.1023/a:1018420418455.
16. Jenkinson C, Fitzpatrick R, Garratt A, Peto V, Stewart-Brown S. Can item response theory reduce patient burden when measuring health status in neurological disorders? Results from Rasch analysis of the SF-36 physical functioning scale (PF-10). J Neurol Neurosurg Psychiatry. 2001 Aug;71(2):220–224. doi: 10.1136/jnnp.71.2.220.
17. Lai JS, Dineen K, Reeve BB, et al. An item response theory-based pain item bank can enhance measurement precision. J Pain Symptom Manage. 2005 Sep;30(3):278–288. doi: 10.1016/j.jpainsymman.2005.03.009.
18. Victorson D, Cella D, Yount S, Anton S, Hamilton A. Development of a patient-driven conceptual framework of dyspnea and functional limitations in COPD. Value in Health. 2009;12(6):1018–1025. doi: 10.1111/j.1524-4733.2009.00547.x.
19. Choi SW, Victorson DE, Yount S, Anton S, Hamilton A, Cella D. Development of a Conceptual Framework and Calibrated Item Banks to Measure Patient Reported Dyspnea Severity and Related Functional Limitations. Manuscript under review.
20. Brooks SC. Surveillance for respiratory hazards. ATS News. 1982;8:12–16.
21. McGavin CR, Artvinli M, Naoe H, McHardy GJ. Dyspnoea, disability, and distance walked: comparison of estimates of exercise performance in respiratory disease. Br Med J. 1978 Jul 22;2(6132):241–243. doi: 10.1136/bmj.2.6132.241.
22. Linacre J. Sample size and item calibration stability. Rasch Measurement Transactions. 1994;7(4):328.
23. Wright B, Tennant A. Sample size again. Rasch Measurement Transactions. 1996;9(4):468.
24. Guyatt GH, Sullivan MJ, Thompson PJ, et al. The 6-minute walk: a new measure of exercise capacity in patients with chronic heart failure. Can Med Assoc J. 1985 Apr 15;132(8):919–923.
25. Muraki E. Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement. 1990;14(1):59–71.
26. Lord FM. Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Erlbaum; 1980.
27. PARSCALE 4: IRT based test scoring and item analysis for graded items and rating scales [computer program]. Version. Chicago: Scientific Software International, Inc; 2003.
28. Choi SW. Firestar: Computerized Adaptive Testing Simulation Program for Polytomous IRT Models. Applied Psychological Measurement. 2009;33:644–645.
29. Cella D, Chang CH. A discussion of item response theory and its applications in health status assessment. Med Care. 2000 Sep;38(9 Suppl):II66–72. doi: 10.1097/00005650-200009002-00010.
30. Wolfe F, Pincus T. Listening to the patient: a practical guide to self-report questionnaires in clinical care. Arthritis Rheum. 1999 Sep;42(9):1797–1808. doi: 10.1002/1529-0131(199909)42:9<1797::AID-ANR2>3.0.CO;2-Q.
31. Detmar SB, Muller MJ, Schornagel JH, Wever LD, Aaronson NK. Health-related quality-of-life assessments and patient-physician communication: a randomized controlled trial. JAMA. 2002 Dec 18;288(23):3027–3034. doi: 10.1001/jama.288.23.3027.
32. Jacobsen PB, Davis K, Cella D. Assessing quality of life in research and clinical practice. Oncology (Williston Park). 2002 Sep;16(9 Suppl 10):133–139.
33. Lewin J, Ledwidge M, O’Loughlin C, McNally C, McDonald K. Clinical deterioration in established heart failure: what is the value of BNP and weight gain in aiding diagnosis? Eur J Heart Fail. 2005 Oct;7(6):953–957. doi: 10.1016/j.ejheart.2005.06.003.
34. Holland R, Battersby J, Harvey I, Lenaghan E, Smith J, Hay L. Systematic review of multidisciplinary interventions in heart failure. Heart. 2005 Jul;91(7):899–906. doi: 10.1136/hrt.2004.048389.
