Abstract
Objective:
To assess the psychometric properties of the consideRATE questions, a measure of serious illness experience.
Methods:
We recruited people at least 50 years old via paid panels online, with US-Census-based quotas. We randomized participants to a patient experience story at two time points. They completed a series of measures, including the consideRATE questions. We assessed convergent (Pearson’s correlation), discriminative (one-way ANOVA with Tukey’s test for multiple comparisons) and divergent (Pearson’s correlation) validity. We also assessed intra-rater reliability (intra-class correlation) and responsiveness to change (t-tests).
Results:
We included 809 individuals in our analysis. We established convergent validity (r=0.77; p<0.001), discriminative validity (bad vs. neutral stories [mean diff=0.4; p<0.001]; neutral vs. good stories [mean diff=1.3; p<0.001]) and moderate divergent validity (r=0.57; p<0.001). We established responsiveness to change between all stories (bad/good [mean diff=1.52; p<0.001]; good/bad [mean diff=−1.68; p<0.001]; neutral/bad [mean diff=−0.57; p<0.001]; good/neutral [mean diff=−1.11; p<0.001]; neutral/good [mean diff=1.1; p<0.001]) except bad/neutral (mean diff=0.4; p=0.07). Intra-rater reliability was demonstrated between time points (r=0.77; p<0.001).
Conclusions:
The consideRATE questions are reliable and valid in a simulated online test.
Practice implications:
The consideRATE questions may be a practical way to measure serious illness experience and the effectiveness of interventions to improve it.
Keywords: Serious illness, patient-reported outcome measure, patient-reported experience measure, psychometric assessment, survey
1. Introduction
People with serious illnesses do not have experiences that match their expectations [1,2]. Efforts to improve the care of people with serious illnesses and those under the care of palliative clinicians are underway [3–5]. Measuring the experiences of people with serious illnesses, including those near death, has also been of interest.
Existing measures, however, are not specific enough to clearly capture the patient experience. Instead, they are typically diluted with harder outcomes like pain or place of death, making it difficult to isolate the patient experience [6–9]. Measures that do focus narrowly on patient experience are often too laborious to implement reasonably in routine care [10].
A brief and psychometrically sound measure of serious illness experience that captures patients’ priorities and can facilitate assessment and improvement of care for this population has been lacking [8,9,11]. Although there are many measures close to these goals, none are brief enough for deployment in routine care [7–10]. Also, none are based purely on the priorities of people who are seriously ill.
We developed a measure for this purpose, called the consideRATE questions, which is less than half the length of the existing measure most likely to fulfill these needs [6]. Our measure is based on eight discrete elements of care that matter most to patients when they are seriously ill, supplemented with icons to improve understandability. Our community-engaged research process, which is detailed in a separate manuscript, included: development based on the elements of serious illness care that patients feel are most important; cognitive interviews with patients, families and clinicians; pilot testing with patients and families [6].
We explored the validity and reliability of the consideRATE questions using an online sample of members of the public exposed to simulated clinical experiences.
2. Methods
2.1. Design
This study was a two-timepoint survey using population-based quota sampling. We used the Checklist for Reporting Results of Internet E-Surveys (CHERRIES), which is available in supplemental file 1 [12]. Dartmouth’s Committee for the Protection of Human Subjects (CPHS) approved this study (CPHS #STUDY00030668).
2.2. Participants
We invited English-speaking adults residing in the US to participate in our survey. Participants were required to be 50 or older because older adults are more likely to have serious illnesses that might result in death; this made the population in our study demographically similar to the population of people who have serious illnesses [13]. We set race/ethnicity and education quotas based on the US population during the 2010 census [14].
2.3. Survey Development
We created an online survey using Qualtrics [15]. We piloted the survey with colleagues and designed it based on previously-conducted online validation surveys of experience measures [16,17]. Qualtrics personnel also reviewed our survey for functionality.
2.4. Survey Elements
2.4.1. Participant Characteristics
We asked participants to provide gender, age, education, race and ethnicity using generic items. We also used the Single Item Health Literacy screener and asked about experiences with the health system in the last year [18–20].
We added “prefer not to say” for demographic and background questions and forced responses to these questions.
2.4.2. Simulated Hospital Experience Stories.
We instructed participants: 1) to read one of three hospital experience stories, and 2) to imagine they were the patient in the story (Table 1). The stories had Flesch-Kincaid Grade Level scores of 6.3, calculated via readable.com, meaning most sixth graders could understand them [21]. We did not specify the patient’s gender or age. The stories recounted a patient’s experience from hospital admission through one night and day, during which the patient expressed preferences about, and requested, needed services. Preferences and interactions concerned the priorities that patients care most about when they are seriously ill, as well as the constructs featured in the consideRATE questions [6]. We developed the stories for the simulation scenarios based on the authors’ experiences observing seriously ill patient interactions and consulted closely with two palliative clinicians (K.K. and M.M.) and an oncologist (G.W.), as well as the family representatives (J.C. and D.W.M.) on our team.
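For readers who want to reproduce the readability check, here is a minimal sketch using the open-source textstat Python package. This is an assumption for illustration only; the authors report using readable.com, and the excerpt below is a single line from the bad story rather than a full story.

```python
# Minimal readability sketch. Assumption: textstat's Flesch-Kincaid grade
# approximates the readable.com score reported above.
import textstat

excerpt = (
    "It was late at night when I got to the hospital. "
    "When I saw the nurse, I told him I had pain in my belly."
)
print(textstat.flesch_kincaid_grade(excerpt))  # the full stories scored 6.3
```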
Table 1.
Simulated patient experience stories and consideRATE scores
| mean consideRATE questions score at T1 story type and story content | ||||
|---|---|---|---|---|
| Care element | Story set-up | Bad 1.64 (n=272) | Neutral 2.09 (264) | Good 3.42 (n=273) |
| 1.8 | 2.27 | 3.31 | ||
| Physical problems | It was late at night when I got to the hospital. When I saw the nurse, I told him I had pain in my belly. | The nurse didn’t respond to my concern and walked away. | The nurse gave me medication and told me it would go away in an hour or so. | The nurse reassured me, gave me medication, and checked my pain levels after an hour or so. |
| 1.62 | 2.12 | 3.51 | ||
| Feelings | When the doctor came by the next morning, I told her I was feeling sad to be in the hospital again. | She said that many people do not like being in the hospital, and there was nothing she could do about it. | She was sorry to hear that I was sad and said these feelings usually go away in a few days. | She asked me a few questions, listened to me, and we came up with a plan to help me feel better. |
| 1.76 | 2.11 | 3.32 | ||
| Surroundings | I woke up in the middle of the night in my hospital room, and I couldn’t get back to sleep. I was cold, so I asked the nurse if he could turn up the heat. | He said it was the usual temperature for the hospital and didn’t offer anything to help. | He said he was sorry, but he couldn’t control the heat at the hospital, and he didn’t offer me anything to help. | He said he was sorry that I felt cold and brought me an extra blanket. |
| 1.62 | 2.03 | 3.51 | ||
| What matters most | I also told the doctor it mattered to me to be at my son’s wedding in a month. | She said that we needed to deal with my problems now before talking about my future plans. | She said she understood why it was important, but we needed to deal with my problems now. | She said she understood why it was important. We talked about ways I might be able to get to the wedding. |
| 1.62 | 2.08 | 3.47 | ||
| Plans | Later that day, when the doctor came back to check on me, I asked about my plans for when I leave the hospital, like where I'd get care. | She said it was not her job to think about those kinds of plans. | She said it was too soon to try and think about those kinds of plans, and that we would try and talk about it when things become clearer. | She described a few options and asked me which seemed to fit best with my ideas about my future. |
| 1.6 | 2.21 | 3.35 | ||
| Affairs | I added that I was worried about whether my wishes would be followed if I was so sick I couldn’t speak for myself. | The doctor said they would decide what care I would get. | The doctor said I could put my wishes in a legal document so they would have it if I couldn’t speak for myself. | The doctor asked whether I needed more advice about my wishes. We came up with a plan together. |
| 1.64 | 1.98 | 3.3 | ||
| What to Expect | I also asked what my chances are of surviving this illness. | She didn’t answer my concerns about survival and changed the subject. | She said that she was sorry because it was difficult to predict survival at this time, and it was best to talk about it another time. | She said that my concerns about survival were very important to her. She took the time to talk about it with me. |
In the bad story, the clinicians neither acknowledged nor addressed patient preferences and requests. In the neutral story, clinicians recognized concerns but did not adequately address them. In the good story, clinicians acknowledged and addressed concerns (Table 1).
2.4.3. Measures
We included three measures in our survey, detailed below: the consideRATE questions [6], the CANHELP Lite [10], and the OPENness to Feedback question [22]. The entire survey is available in supplemental file 2.
2.4.3.1. The consideRATE questions
The consideRATE questions are a 9-item measure of serious illness experience that takes about 2.5 minutes to complete. Only seven items of the measure are scored, on a four-point Likert-like scale from 1 (very bad) to 4 (very good). We based the measure on the elements of serious illness experience that are most important to people who are seriously ill [6,23]. We developed it using user-centered design and community-engaged research approaches [6,24,25]. Questions are supplemented with icons to improve understandability (Textbox 1); these icons were refined during user testing [6].
Importantly, the consideRATE questions are not a uni-dimensional scale, meaning it would not be unexpected for a patient to mark some items as very good and other items as very bad [26,27]. Additionally, the questions are formative, rather than reflective, meaning the measure comprises discrete items that will not necessarily co-vary [28].
We asked participants to complete the consideRATE questions after they read a randomly assigned patient experience story (Table 1). We excluded item 9, “would you like to share your name with us,” because it was not relevant to our research questions and would have unnecessarily compromised participant privacy. We retained question eight about “any other things.” The consideRATE questions have a four-option Likert-type response scale: very bad (1 point), bad (2 points), good (3 points) and very good (4 points), with no neutral option. Scores are reported both as overall means and as item means.
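As a minimal scoring sketch, assume responses arrive as a table with one column per scored item (the column names item_1 through item_7 are hypothetical) coded 1 (very bad) to 4 (very good), with blanks and “doesn’t apply” stored as missing values:

```python
import numpy as np
import pandas as pd

items = [f"item_{i}" for i in range(1, 8)]  # the seven scored items
df = pd.DataFrame(
    [[1, 2, 1, 1, 2, 1, 1],          # hypothetical participant responses
     [3, 3, 4, np.nan, 3, 3, 4],     # NaN = blank or "doesn't apply"
     [2, 2, 3, 2, 2, np.nan, 2]],
    columns=items,
)

item_means = df[items].mean()           # item means, skipping missing values
df["overall"] = df[items].mean(axis=1)  # per-participant overall mean
print(item_means.round(2), df["overall"].round(2), sep="\n")
```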
2.4.3.2. Other Patient Serious Illness Experience Measure
We asked participants to complete the CANHELP Lite, a validated, 21-item, patient-facing serious illness experience measure designed to measure patient satisfaction near the end of life [10]. The underlying constructs in the consideRATE questions are also represented in the CANHELP Lite, although the items are worded differently and were derived using a different process [6]. Comparing scores on the two measures allowed us to assess construct validity, specifically convergent validity [10,27]. The CANHELP Lite has a five-option, Likert-type response scale: not at all, not very, somewhat, very and completely [10]. We calculated CANHELP Lite scores as overall means. We did not alter the CANHELP Lite response choices by adding a “prefer not to say” option, but all items were optional. We did not require a minimum number of completed items for computing an overall score.
2.4.3.3. Hospital OPENness to Feedback Measure
We then asked participants to complete a single-item question concerning whether the hospital staff was open to feedback. This question measures a theoretically different construct than the consideRATE questions, and we used it to assess divergent validity, another form of construct validity [27]. Our colleagues used the OPENness to Feedback measure in a similar validation study [22]. Likert-type response choices include: not at all interested, slightly interested, moderately interested, very interested, and extremely interested. We scored this question by overall means as well. The OPENness to Feedback response choices did not include a “prefer not to say” option, but we did not force a response; participants could leave the item blank.
2.4.4. Procedures
2.4.4.1. Recruitment
Through Qualtrics (www.qualtrics.com), an Internet-based survey site, we engaged two Internet panel companies, which announced our survey to prospective participants online. Survey access was by invitation only, based on existing panel membership. Although invitation-only approaches are more rigorous than traditional open surveys, they are not closed surveys. Some participants received invitations via email, others via websites. Internet panel companies are specifically designed to connect individuals with research or marketing projects, so visitors are generally seeking or taking surveys and other research opportunities. An example announcement banner is available in the CHERRIES supplement in supplemental file 1.
2.4.4.2. Incentives
Qualtrics offered incentives to panelists, and participants chose their incentives. Some chose travel points; others chose different forms of compensation, like points in online games. The amount of payment also varied.
2.4.4.3. Consent
The survey link took participants to an information screen about the study’s purpose, expected completion time, and data storage procedures, as well as study team contact information. See supplemental file 2. After the information screen, participants had the option to consent to continue or to decline. Participation was voluntary.
2.4.4.4. Survey Completion
Consented participants entered the survey and saw a brief description of the survey steps, including background questions, patient story, and questions about the patient story. Participants then completed demographic questions, as well as the health literacy screener and the question about their experience with the healthcare system.
We randomly assigned each participant to one of the three patient stories: bad, neutral, or good. We did not alter the order or presentation of any other items. After reading the story, participants completed the three measures in succession: the consideRATE questions, the CANHELP Lite, and the OPENness to Feedback question. Participants could not take the survey more than once.
The survey was brief, spanning eight screens, and included only the minimum number of questions needed to report results appropriately. We therefore did not need to use adaptive questioning. We enabled participants to go back in the survey flow using a back button. We presented survey instruments in their entirety, without splitting them across screens.
After two weeks, we offered all participants an opportunity to retake the survey. They were re-randomized and therefore had a one-third chance of receiving the same story or a two-thirds chance of receiving a different story.
2.4.4.5. Data Protection
We stored our data in a database managed by Qualtrics, which only allows authorized users. Qualtrics protects the data with standard firewalls and IT security procedures.
As an additional precaution, we did not collect or store participants’ IP addresses. Qualtrics, however, checked every IP address using digital fingerprinting technology, in order to ensure each participant only had one opportunity to complete our survey. Responses were automatically collected.
2.5. Statistical Analysis
2.5.1. Sample Size and Power Calculations
We estimated the needed sample sizes using pilot data from the development of the consideRATE questions [6]. We determined we would need at least 17 participants at time point two to detect a change of 0.5 (a point halfway between bad and good) at 95% power, and 28 participants to detect a change of 0.35. We aimed for 31 participants per story at time point two.
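A sketch of how such a calculation can be reproduced with statsmodels follows; the paired t-test framing, two-sided alpha of 0.05, and pilot standard deviation are all assumptions, since the paper reports only the detectable differences and resulting sample sizes.

```python
# Power-analysis sketch. Assumption: a pilot SD of roughly 0.57 on the
# 1-4 scale, back-solved so the result matches the reported n of 17.
from statsmodels.stats.power import TTestPower

sd = 0.57
n = TTestPower().solve_power(effect_size=0.5 / sd, alpha=0.05, power=0.95)
print(round(n))  # ≈ 17 participants to detect a 0.5 change at 95% power
```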
2.5.2. Data Exclusion.
We planned to exclude participants who completed the survey in less than half the median time (i.e., speeders), as well as those who completed the open-text data fields with nonsense or gibberish. Additionally, we excluded individuals who began the survey while the overall quota or their demographic quota was open but finished it after the quotas closed. We only analyzed the remaining complete responses.
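The speeder rule translates directly into a filter. Here is a sketch, assuming completion times live in a hypothetical duration_seconds column:

```python
import pandas as pd

# Hypothetical completion times in seconds for five respondents.
responses = pd.DataFrame({"duration_seconds": [300, 420, 90, 510, 360]})

cutoff = responses["duration_seconds"].median() / 2  # half the median: 180
clean = responses[responses["duration_seconds"] >= cutoff]
print(clean)  # the 90-second "speeder" is dropped
```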
2.5.3. Planned Analyses
We performed statistical analyses to assess the psychometric properties of the consideRATE questions, including discriminative validity, convergent validity, divergent validity, intra-rater reliability and responsiveness to change [16,29]. We planned to evaluate the scores using overall means or item-level means (Table 2).
Table 2.
Psychometric analysis plan
| | Description | Comparisons | Statistical test |
|---|---|---|---|
| Validity tests | | | |
| Discriminative | The ability of the measure to produce low scores when the construct in question is absent or low and high scores when the construct is present | Mean consideRATE questions scores for bad, neutral and good stories at T1 | One-way ANOVA with Tukey’s test for multiple comparisons |
| Convergent | Whether a measure captures the same construct as a similar established instrument | Mean consideRATE questions scores (T1) and mean CANHELP Lite scores (T1) | Pearson’s correlation |
| Divergent | Whether a measure captures a different construct than a dissimilar established instrument | Mean consideRATE questions scores (T1) and OPENness to Feedback scores (T1) | Pearson’s correlation |
| Responsiveness | How well an instrument can detect a change in a construct | Mean consideRATE questions scores for participants who received different stories at T1 and T2 | t-test |
| Reliability test | | | |
| Intra-rater¹ | How stable scores are at different points in time with the same raters | Mean consideRATE questions scores for participants who received the same stories at T1 and T2 | Intra-class correlation coefficient, two-way mixed-effects [44] |
We do not report on internal consistency or principal components analysis of the consideRATE questions because the questions represent a formative model of the serious illness experience [30].
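To make the plan in Table 2 concrete, the sketch below runs the discriminative, convergent and responsiveness tests with scipy and statsmodels. The data are random stand-ins generated to resemble the reported arm means, not study data, and the paper does not specify whether its t-tests were paired or independent; a paired test is shown here because participants were re-measured at time point two.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)

# Stand-in T1 overall scores per story arm, shaped to resemble Table 4.
bad = rng.normal(1.7, 0.85, 272).clip(1, 4)
neutral = rng.normal(2.1, 0.81, 264).clip(1, 4)
good = rng.normal(3.4, 0.56, 273).clip(1, 4)

# Discriminative validity: one-way ANOVA, then Tukey's HSD comparisons.
print(stats.f_oneway(bad, neutral, good))
scores = np.concatenate([bad, neutral, good])
arms = ["bad"] * 272 + ["neutral"] * 264 + ["good"] * 273
print(pairwise_tukeyhsd(scores, arms))

# Convergent (or divergent) validity: Pearson's r against another measure.
comparator = scores + rng.normal(0, 0.6, scores.size)  # stand-in CANHELP Lite
print(stats.pearsonr(scores, comparator))

# Responsiveness: paired t-test on T1 vs. T2 scores for participants whose
# assigned story changed between time points.
t1 = rng.normal(1.4, 0.51, 19)  # e.g., bad story at T1
t2 = rng.normal(1.7, 0.71, 19)  # neutral story at T2
print(stats.ttest_rel(t1, t2))
```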
3. Results
3.1. Participant Flow and Characteristics
Five thousand four hundred fifteen individuals visited the first page of our survey, and 809 completed the entire survey satisfactorily (Figure 1). Some participants were excluded based on suspicious response patterns. We estimate that between 54,160 and 67,700 people were invited to the survey, so the 5,415 unique visitors to the first page represent roughly 8–10% of those invited. Of those who visited the first page of our survey, 97% consented to participate, excluding those who interacted with the first page but did not meet our demographic quotas. Thirty-seven percent of those who consented, or 809 individuals, completed the time point one survey. Of those 809, 275 completed the survey at time point two.
Figure 1:
Study participant flow diagram
We summarize the characteristics of participants at time point one (Table 3). Race and ethnicity, as well as education levels, were consistent with the US population; age was not [14].
Table 3.
Participant characteristics at time point 1
| Story | |||||||
|---|---|---|---|---|---|---|---|
| Bad (n=272) | Neutral (n=264) | Good (n=273) | |||||
| Num. (%)* | Num. (%) | Num. (%) | |||||
| Demographic characteristics | |||||||
| Age | |||||||
| 50–59 | 71 | (26.1) | 79 | (29.9) | 80 | (29.3) | |
| 60–69 | 123 | (45.2) | 122 | (46.2) | 134 | (49.1) | |
| 70–79 | 72 | (26.5) | 52 | (19.7) | 51 | (18.7) | |
| 80–89 | 6 | (2.2) | 9 | (3.4) | 8 | (2.9) | |
| 90 or older | 0 | (0.0) | 2 | (0.8) | 0 | (0.0) | |
| Gender | |||||||
| Female | 143 | 126 | 107 | ||||
| Race/Ethnicity | |||||||
| Asian | 13 | (4.8) | 12 | (4.5) | 9 | (3.3) | |
| Black or African American | 29 | (10.7) | 24 | (9.1) | 32 | (11.7) | |
| Hispanic, Latino/a or Spanish origin | 42 | (15.4) | 30 | (11.4) | 36 | (13.2) | |
| White or Caucasian | 192 | (70.6) | 191 | (72.3) | 203 | (74.4) | |
| Other | 5 | (1.8) | 9 | (3.4) | 1 | (0.4) | |
| Education | |||||||
| Less than high school degree | 14 | (5.1) | 17 | (6.4) | 10 | (3.7) | |
| High school degree or equivalent (i.e. GED) | 66 | (24.3) | 63 | (23.9) | 67 | (24.2) | |
| Some college | 81 | (29.8) | 74 | (28.0) | 88 | (32.2) | |
| 4-year degree | 62 | (22.8) | 62 | (23.5) | 65 | (23.8) | |
| Graduate degree | 42 | (15.4) | 41 | (15.5) | 35 | (12.8) | |
| Doctoral degree | 6 | (2.2) | 7 | (2.7) | 6 | (2.2) | |
| Healthcare characteristics | |||||||
| Health literacy* (Chew’s Single Item Health Literacy Screener) [18,20] |||||||
| Low | 108 | (40.6) | 108 | (41.5) | 102 | (38.2) | |
| Healthcare visits in the past year | |||||||
| Primary care provider | 250 | (91.9) | 237 | (89.8) | 256 | (93.8) | |
| Healthcare specialist | 182 | (66.9) | 164 | (62.1) | 191 | (70.0) | |
| Urgent care | 55 | (20.2) | 59 | (23.3) | 60 | (22.0) | |
| Emergency room | 56 | (20.6) | 57 | (21.6) | 53 | (19.4) | |
| Hospital | 38 | (14.0) | 41 | (15.5) | 38 | (13.9) | |
* Percentages are calculated within each story group (column percentages)
3.2. Measure And Item Characteristics
Item non-response rates were 5% across all items. “Doesn’t apply” responses ranged from 1% for the “what matters most to you” question to 5% for the “your surroundings” question.
3.3. Reliability And Validity
The consideRATE questions demonstrated reliability and validity (Table 4).
Table 4.
Reliability and validity of the consideRATE questions
| Validity tests | Mean (SD) | N | Statistics | P-value | Interpretation | Validity interpretation |
|---|---|---|---|---|---|---|
| Discriminative | | 809 | F(2, 799) = 373.583 | <0.001 | Can discriminate between bad, neutral and good stories | |
| Bad (T1) vs. | 1.7 (0.85) | 272 | | | | |
| Neutral (T1) | 2.1 (0.81) | 264 | mean diff = 0.4 | <0.001 | | Yes |
| Good (T1) | 3.4 (0.56) | 273 | mean diff = 1.7 | <0.001 | | Yes |
| Neutral (T1) vs. | | | | | | |
| Good (T1) | | | mean diff = 1.3 | <0.001 | | Yes |
| Convergent | | | r = 0.77 | <0.001 | Shows high positive correlation with an existing validated measure including similar constructs [31,32] | Yes |
| consideRATE (T1) vs. | 2.4 (1.04) | | | | | |
| CANHELP Lite (T1) | 2.7 (1.28) | 802 | | | | |
| Divergent | | | r = 0.57 | <0.001 | Shows moderate positive correlation with an existing measure including a dissimilar construct, suggesting divergent validity [31,32] | Possibly |
| consideRATE (T1) vs. | 2.4 (1.04) | 801 | | | | |
| OPENness to Feedback (T1) | 2.7 (1.54) | 801 | | | | |
| Responsiveness | | | | | Shows responsiveness to change between all stories, except from the bad story to the neutral story | |
| Bad (T1) vs. | 1.4 (0.51) | | | | | |
| Neutral (T2) | 1.7 (0.71) | 19 | t = 1.94; mean diff = 0.4 | 0.070 | | No |
| Good (T2) | 3.5 (0.51) | 29 | t = −6.23; mean diff = 1.52 | <0.001 | | Yes |
| Neutral (T1) vs. | 2.1 (1.01) | | | | | |
| Bad (T2) | 1.5 (0.67) | 29 | t = 3.72; mean diff = 0.57 | 0.001 | | Yes |
| Good (T2) | 3.3 (0.80) | 28 | t = −4.39; mean diff = 1.1 | <0.001 | | Yes |
| Good (T1) vs. | 3.5 (0.53) | | | | | |
| Bad (T2) | 1.8 (0.86) | 26 | t = 7.72; mean diff = −1.68 | <0.001 | | Yes |
| Neutral (T2) | 2.3 (0.88) | 24 | t = 5.24; mean diff = −1.11 | <0.001 | | Yes |

| Reliability tests | Mean (SD) | N | Statistics | P-value | Interpretation | Reliable |
|---|---|---|---|---|---|---|
| Intra-rater overall | | | | | Shows good reliability when raters are exposed to the same story twice | |
| consideRATE (T1) vs. | 2.3 (1.04) | | r = 0.77 (single); F(df1, df2) = 73 (1, 2) | <0.001 | | Yes |
| consideRATE (T2) | 2.3 (1.12) | 74 | | | | |
| Intra-rater by story | | | | | | |
| Bad (T1) vs. | 1.5 (0.53) | | r = 0.74 (single); F(df1, df2) = 23 (1, 2) | <0.001 | Shows good reliability when the same raters receive a bad story [44] | Yes |
| Bad (T2) | 1.6 (0.80) | 23 | | | | |
| Neutral (T1) vs. | 2.3 (0.97) | | r = 0.54 (single); F(df1, df2) = 14 (1, 2) | 0.001 | Shows moderate reliability when the same raters receive a neutral story [44] | Yes |
| Neutral (T2) | 2.0 (0.85) | 29 | | | | |
| Good (T1) vs. | 3.3 (0.60) | | r = 0.32 (single); F(df1, df2) = 20 (1, 2) | 0.051 | Shows poor reliability when the same raters receive a good story [44] | No |
| Good (T2) | 3.5 (0.54) | 21 | | | | |
Discriminative Validity.
The consideRATE questions demonstrated discriminative validity: mean scores rose significantly from participants assigned bad stories (mean 1.7) to neutral stories (mean 2.1) to good stories (mean 3.4) at time point one (Figure 2).
Figure 2:
Mean consideRATE questions scores by scenario
3.4. Convergent Validity
Mean scores from the consideRATE questions at time point one showed a high positive correlation with the CANHELP Lite (r=0.77; p<0.001), suggesting the measures address the same constructs [31,32].
3.5. Divergent Validity
Mean scores from the consideRATE questions showed a moderate positive correlation with the OPENness to Feedback measure at time point one (r=0.57; p <0.001). This finding suggests the measures do not include the same constructs, although this is open to interpretation [32].
3.6. Responsiveness To Change
The consideRATE questions responded to change between most stories from time point one to time point two. From the bad story at time point one to the neutral story at time point two, the measure was not responsive. The inverse transition, from the neutral story at time point one to the bad story at time point two, did show statistically significant change.
3.7. Intra-Rater Reliability
The consideRATE questions demonstrated intra-rater reliability when participants received the same story at time point one and time point two (r=0.77). This finding suggests limited measurement error with the same rater [29].
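As a sketch of the two-way mixed-effects, single-measures ICC computation, the third-party pingouin package can be used on long-format data; the column names below are illustrative, and the ratings are hypothetical stand-ins rather than study data.

```python
import pandas as pd
import pingouin as pg

# Hypothetical repeat ratings: each participant saw the same story at T1
# and T2; the time point plays the role of the "rater".
df = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "timepoint": ["T1", "T2"] * 5,
    "score": [1.4, 1.6, 2.1, 2.0, 3.3, 3.4, 2.6, 2.4, 1.9, 2.2],
})

icc = pg.intraclass_corr(data=df, targets="participant",
                         raters="timepoint", ratings="score")
# ICC3 ("single fixed raters") corresponds to the two-way mixed-effects,
# single-measures model cited in Table 2 [44].
print(icc.set_index("Type").loc["ICC3", ["ICC", "CI95%"]])
```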
3.8. Subgroup Analyses
We demonstrated the reliability and validity of the consideRATE questions for individuals with low health literacy and low education, although we could not perform the entire battery of assessments due to lower participant numbers. We detailed the results of these analyses in supplemental file 3.
4. Discussion and Conclusion
4.1. Discussion
The consideRATE questions are valid and reliable in a test using simulated patient experiences. Using simulations with members of the general public at least 50 years old, we established discriminative and convergent validity. Responsiveness and intra-rater reliability, along with divergent validity, were also demonstrated, with limitations, suggesting the questions can measure serious illness experience for individuals with diverse backgrounds and experiences.
We developed the consideRATE questions, a generic and flexible measure of serious illness experience, using community-engaged research methods, including theoretically grounded cognitive interviews and piloting [6]. Our previous work has demonstrated that the questions can be completed easily and quickly, in less than 3 minutes, without disrupting workflow. Short tools like this measure can reduce respondent burden and make interpretation and implementation easier [33,34].
Additionally, the consideRATE questions can seamlessly follow one patient throughout his or her illness, from home hospice to clinic to intensive care unit. They can also be completed by family members as proxies when the patient is unable to speak for themselves, limiting missing or difficult-to-interpret data.
The consideRATE questions are intended to provide real-time feedback on routine care to clinical teams on the serious illness experiences of their patients, for instance routine distribution after visits at an oncology clinic [6]. Additionally, they could be used as instruments to assess serious illness experience in research, quality improvement and registries.
The choice to use simulated patient experiences carries strengths and weaknesses. Simulated encounters present an opportunity to assess discriminative validity in a controlled environment, a task that would be difficult with real patients and real experiences [35–37]. Additionally, simulated experiences are not burdensome for actual patients and their families and were not unnecessarily resource-intensive [16]. We also believe that not specifying gender, age, race or ethnicity in our patient stories limited rater bias and helped participants imagine themselves as the patient in the stories. Validity and reliability are limited to these simulated patient experiences. Using written patient experience stories, although helpful for limiting bias, may have limited the ability of low-literacy individuals to participate fully in the study. Audio-recorded patient experience stories may have helped mitigate the access issue for people with low literacy, but would have weakened our protection against rater bias by unnecessarily introducing the demographics of the patient in the story.
Historically, there have been concerns about both the quality of responses and the representativeness of respondents who participate as members of online panels [38], particularly because it is not possible to detail the demographic characteristics of those who decline to participate. However, Walter et al. conducted a meta-analysis in 2019 comparing the reliability of data from online panels to direct recruitment methods [39]. They concluded that although data from online samples cannot be representative of the broader US population, such precision may not be necessary for early-stage psychometric testing, especially given their finding that psychometric and criterion validity analyses were consistent between online and conventional surveys [39].
Additional psychometric assessments are underway for the consideRATE questions, using patients and family members recruited in clinical settings.
In a previous search, we identified 93 patient-reported measures relating to serious illness, dying, or death [6]. None of these existing instruments were brief patient experience measures [6]. The measure most comparable to ours is the CANHELP Lite, which Heyland and colleagues derived from the longer CANHELP questionnaire and validated [10]. The original CANHELP has 37 items and takes 40–60 minutes to complete, while the shorter version has 21 questions [10]. Although the authors do not report completion time for the abridged measure, it is likely at least 20 minutes. CANHELP Lite has both patient and family member versions [10]. Scores on the CANHELP Lite correlate with scores on the original CANHELP questionnaire (r>0.90 across domains), and the measure has a moderate correlation with a global rating of satisfaction with care (r=0.51) [10]. Additionally, the CANHELP Lite has internal consistency coefficients ranging from 0.68 to 0.93 across domains [10]. The authors did not assess discriminative or divergent validity, responsiveness, or intra-rater reliability.
4.2. Practice Implications
As the population ages, there is an imperative to improve serious illness care systematically. With improvements to policy and practice in mind, efforts are underway to measure the serious illness experience and other related metrics. Measurement can improve clinical outcomes [40]. With support from the Gordon and Betty Moore Foundation, the National Committee for Quality Assurance is building an “accountability program” based on measurement of serious illness care [41].
Similarly, the Center to Advance Palliative Care is expanding the National Palliative Care Registry, also funded by the Moore Foundation, to connect and compare data across care settings, from community clinics to more extensive health care systems [42].
Long-term, we believe the consideRATE questions will be useful for assessing the effectiveness of serious illness interventions, like guides, trainings and templates.
4.3. Conclusions
A brief, valid and reliable measure of serious illness experience could improve clinical practice and simultaneously assist in testing the effectiveness of interventions. The consideRATE questions could fulfill that role.
Supplementary Material
Textbox 1. The consideRATE questions items.
Please do not use without permission.
How would you rate our attention to your physical problems? Things like pain, dry mouth or trouble breathing
How would you rate our attention to your feelings? Things like feeling sad, worried or like a burden
How would you rate our attention to your surroundings? Things like noise, light and warmth
How would you rate our respect for what matters most to you? Things like values, preferences about care or important activities
How would you rate our communication about your plans? Things like medicines, procedures or place of care
How would you rate our attention to your affairs? Things like a will, finances or advance directives for care
How would you rate our communication about what you can expect? Things like illness getting worse or time left to live
Are there any other things you want to share? Write them here or on the next page (not scored)
Would you like to share your name with us? If not, you will stay nameless (not scored)
Disclosures and Acknowledgements
Development of this manuscript was supported, in part, by the T32 Research Fellowship in Geriatric Mental Health Services Research (T32 MH19132; Bruce).
Dr. MacMartin has nothing to disclose. Dr. Barnato has nothing to disclose.
Dr. Saunders reports holding copyright in The Considerate Suite.
Dr. Durand reports fees from EBSCO Health and ACCESS Community Health Network and reports holding copyright in The Considerate Suite.
Dr. Kirkland reports holding copyright in The Considerate Suite.
Dr. Elwyn reports royalties from Oxford University Press and Radcliffe Press, ownership of &think LLC, SHARPNetwork LLC, and fees from ACCESS Community Health Network, Chicago Federally Qualified Medical Centers, EBSCO Health, Bind Insurance, PatientWisdom Inc, abridge AI Inc. He also reports holding copyright in The Considerate Suite.
Importantly, we thank our patient and family partners, David Wilson Milne and Joan Collison for their input throughout the consideRATE suite studies.
We also thank Gabrielle Stevens, PhD, for her counsel concerning psychometric validation and SPSS. We acknowledge the SPSS and statistical support of Jianjun Hua, PhD, and the psychometric support of Greg McHugo, PhD. We thank Savannah Braun of Qualtrics LLC for her support with panel recruitment, and we acknowledge the contributions of Garrett Wasp, MD, for reviewing our patient experience stories and Ashleigh Jaggars for assisting with setting up our survey.
Footnotes
CRediT authorship contribution statement
Catherine H. Saunders: conceptualization, methodology, formal analysis, investigation, data curation, writing – original draft. Marie-Anne Durand: conceptualization, writing – review & editing, supervision. Kathryn Kirkland: conceptualization, writing – review & editing. Meredith MacMartin: conceptualization, writing – review & editing. Amber Barnato: writing – review & editing. Glyn Elwyn: conceptualization, methodology, writing – review & editing.
O’Malley and colleagues proposed a new method for establishing inter- and intra-rater reliability using Bayesian analysis. For the purposes of this project, we elected to use established methods, using intra-class correlations [43].
Notably, intra-rater reliability is a test of whether scores remain constant over time when stories remain constant, while responsiveness to change is a measure of whether scores change over time when stories change.
References
- [1].Dzul-Church V, Cimino JW, Adler SR, Wong P, Anderson WG, “I’m sitting here by myself …”: experiences of patients with serious illness at an Urban Public Hospital, J. Palliat. Med. 13 (2010) 695–701. 10.1089/jpm.2009.0352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Singer AE, Meeker D, Teno JM, Lynn J, Lunney JR, Lorenz KA, Symptom trends in the last year of life from 1998 to 2010: a cohort study, Ann. Intern. Med. 162 (2015) 175–183. 10.7326/M13-1609. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Brighton LJ, Koffman J, Hawkins A, McDonald C, O’Brien S, Robinson V, Khan SA, George R, Higginson IJ, Selman LE, A Systematic Review of End-of-Life Care Communication Skills Training for Generalist Palliative Care Providers: Research Quality and Reporting Guidance, J. Pain Symptom Manage. 54 (2017) 417–425. 10.1016/j.jpainsymman.2017.04.008. [DOI] [PubMed] [Google Scholar]
- [4].Clark D, From margins to centre: a review of the history of palliative care in cancer, Lancet Oncol. 8 (2007) 430–438. 10.1016/S1470-2045(07)70138-9. [DOI] [PubMed] [Google Scholar]
- [5].Phongtankuel V, Meador L, Adelman RD, Roberts J, Henderson CR Jr, Mehta SS, Del Carmen T, Reid MC, Multicomponent Palliative Care Interventions in Advanced Chronic Diseases: A Systematic Review, Am. J. Hosp. Palliat. Care. 35 (2018) 173–183. 10.1177/1049909116674669. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Saunders CH, Durand M-A, Scalia P, Kirkland KB, MacMartin MA, Barnato AE, Milne DW, Collison J, Jaggars A, Butt T, Wasp G, Nelson E, Elwyn G, User-centered design of the consideRATE questions, a measure of people’s experiences when they are seriously ill, J. Pain Symptom Manage. (2020). 10.1016/j.jpainsymman.2020.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Hearn J, Higginson IJ, Development and validation of a core outcome measure for palliative care: the palliative care outcome scale. Palliative Care Core Audit Project Advisory Group, Qual. Health Care. 8 (1999) 219–227. 10.1136/qshc.8.4.219. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Kearns T, Cornally N, Molloy W, Patient reported outcome measures of quality of end-of-life care: A systematic review, Maturitas. 96 (2017) 16–25. 10.1016/j.maturitas.2016.11.004. [DOI] [PubMed] [Google Scholar]
- [9].Lendon JP, Ahluwalia SC, Walling AM, Lorenz KA, Oluwatola OA, Anhang Price R, Quigley D, Teno JM, Measuring Experience With End-of-Life Care: A Systematic Literature Review, J. Pain Symptom Manage. 49 (2015) 904–15.e1–3. 10.1016/j.jpainsymman.2014.10.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Heyland DK, Jiang X, Day AG, Cohen SR, Canadian Researchers at the End of Life Network (CARENET), The development and validation of a shorter version of the Canadian Health Care Evaluation Project Questionnaire (CANHELP Lite): a novel tool to measure patient and family satisfaction with end-of-life care, J. Pain Symptom Manage. 46 (2013) 289–297. 10.1016/j.jpainsymman.2012.07.012. [DOI] [PubMed] [Google Scholar]
- [11].LaVela SL, Gallan A, Evaluation and Measurement of Patient Experience, (2014). https://papers.ssrn.com/abstract=2643249 (accessed April 18, 2019).
- [12].Eysenbach G, Improving the quality of Web surveys: the Checklist for Reporting Results of Internet E-Surveys (CHERRIES), J. Med. Internet Res. 6 (2004) e34. 10.2196/jmir.6.3.e34. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Cancer mortality by age, Cancer Research UK. (2015). www.cancerresearchuk.org (accessed July 3, 2019). [Google Scholar]
- [14].U.S. Census Bureau QuickFacts: United States, Census Bureau QuickFacts. (n.d.). https://www.census.gov/quickfacts/fact/table/US/PST045218 (accessed October 8, 2019). [Google Scholar]
- [15].Qualtrics XM - Experience Management Software, Qualtrics. (n.d.). https://www.qualtrics.com/ (accessed January 28, 2020). [Google Scholar]
- [16].Barr PJ, Thompson R, Walsh T, Grande SW, Ozanne EM, Elwyn G, The psychometric properties of CollaboRATE: a fast and frugal patient-reported measure of the shared decision-making process, J. Med. Internet Res. 16 (2014) e2. 10.2196/jmir.3085. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Elwyn G, Thompson R, John R, Grande SW, Developing IntegRATE: a fast and frugal patient-reported measure of integration in health care delivery, Int. J. Integr. Care. 15 (2015) e008. 10.5334/ijic.1597. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Chew LD, Bradley KA, Boyko EJ, Brief questions to identify patients with inadequate health literacy, Fam. Med. 36 (2004) 588–594. https://www.ncbi.nlm.nih.gov/pubmed/15343421. [PubMed] [Google Scholar]
- [19].Morris NS, MacLean CD, Chew LD, Littenberg B, The Single Item Literacy Screener: evaluation of a brief instrument to identify limited reading ability, BMC Fam. Pract. 7 (2006) 21. 10.1186/1471-2296-7-21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Chew LD, Griffin JM, Partin MR, Noorbaloochi S, Grill JP, Snyder A, Bradley KA, Nugent SM, Baines AD, Vanryn M, Validation of screening questions for limited health literacy in a large VA outpatient population, J. Gen. Intern. Med. 23 (2008) 561–566. 10.1007/s11606-008-0520-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Measure the Readability of Text - Text Analysis Tools - Unique readability tools to improve your writing! App.readable.com, (n.d.). https://app.readable.com/text/ (accessed October 16, 2021).
- [22].Thompson R, Stevens G, Elwyn G, Measuring patient experiences of integration in health care delivery: Psychometric validation of IntegRATE under controlled conditions, J. Patient Exp. 8 (2021) 23743735211007346. 10.1177/23743735211007346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Virdun C, Luckett T, Davidson PM, Phillips J, Dying in the hospital setting: A systematic review of quantitative studies identifying the elements of end-of-life care that patients and their families rank as being most important, Palliat. Med. 29 (2015) 774–796. 10.1177/0269216315583032. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [24].Wood LE, Semi-structured interviewing for user-centered design, Interactions. 4 (1997) 48–61. 10.1145/245129.245134. [DOI] [Google Scholar]
- [25].Chadia A M-KD, User-Centered Design, Encyclopedia of Human-Computer Interaction. (2004). [Google Scholar]
- [26].Streiner DL, Norman GR, Cairney J, Health measurement scales: a practical guide to their development and use, n.d.
- [27].Health measurement scales: a practical guide to their development and use (5th edition), Aust. N. Z. J. Public Health. 40 (2016) 294–295. 10.1111/1753-6405.12484. [DOI] [PubMed] [Google Scholar]
- [28].Borsboom D, Mellenbergh GJ, van Heerden J, The theoretical status of latent variables, Psychol. Rev. 110 (2003) 203–219. 10.1037/0033-295X.110.2.203. [DOI] [PubMed] [Google Scholar]
- [29].Streiner DL, Norman GR, Health measurement scales: a practical guide to their development and use, 2008. 10.1378/chest.96.5.1161. [DOI]
- [30].Coltman T, Devinney TM, Midgley DF, Venaik S, Formative versus reflective measurement models: Two applications of formative measurement, J. Bus. Res. 61 (2008) 1250–1262. 10.1016/j.jbusres.2008.01.013. [DOI] [Google Scholar]
- [31].Hinkle DE, Wiersma W, Jurs SG, Applied statistics for the behavioral sciences, Houghton Mifflin, Boston, 1988. [Google Scholar]
- [32].Post MW, What to Do With “Moderate” Reliability and Validity Coefficients?, Arch. Phys. Med. Rehabil. 97 (2016) 1051–1052. 10.1016/j.apmr.2016.04.001. [DOI] [PubMed] [Google Scholar]
- [33].Bowling A, Just one question: If one question works, why ask several?, J. Epidemiol. Community Health. 59 (2005) 342–345. 10.1136/jech.2004.021204. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [34].Littman AJ, White E, Satia JA, Bowen DJ, Kristal AR, Reliability and validity of 2 single-item measures of psychosocial stress, Epidemiology. 17 (2006) 398–403. 10.1097/01.ede.0000219721.89552.51. [DOI] [PubMed] [Google Scholar]
- [35].Elwyn G, Barr PJ, Grande SW, Thompson R, Walsh T, Ozanne EM, Developing CollaboRATE: a fast and frugal patient-reported measure of shared decision making in clinical encounters, Patient Educ. Couns. 93 (2013) 102–107. 10.1016/j.pec.2013.05.009. [DOI] [PubMed] [Google Scholar]
- [36].Cohen DS, Colliver JA, Marcy MS, Fried ED, Swartz MH, Psychometric properties of a standardized-patient checklist and rating-scale form used to assess interpersonal and communication skills, Acad. Med. 71 (1996) S87–9. https://www.ncbi.nlm.nih.gov/pubmed/8546794. [DOI] [PubMed] [Google Scholar]
- [37].Melbourne E, Sinclair K, Durand M-A, Légaré F, Elwyn G, Developing a dyadic OPTION scale to measure perceptions of shared decision making, Patient Educ. Couns. 78 (2010) 177–183. 10.1016/j.pec.2009.07.009. [DOI] [PubMed] [Google Scholar]
- [38].Porter COLH, Outlaw R, Gale JP, Cho TS, The Use of Online Panel Data in Management Research: A Review and Recommendations, J. Manage. 45 (2019) 319–344. 10.1177/0149206318811569. [DOI] [Google Scholar]
- [39].Walter SL, Seibert SE, Goering D, O’Boyle EH, A Tale of Two Sample Sources: Do Results from Online Panel Data and Conventional Data Converge?, J. Bus. Psychol. 34 (2019) 425–452. 10.1007/s10869-018-9552-y. [DOI] [Google Scholar]
- [40].Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, O’Brien MA, Johansen M, Grimshaw J, Oxman AD, Audit and feedback: effects on professional practice and healthcare outcomes, in: Ivers N (Ed.), Cochrane Database of Systematic Reviews, John Wiley & Sons, Ltd, Chichester, UK, 2012: p. CD000259. 10.1002/14651858.CD000259.pub3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [41].Henry M, Hudson Scholle S, Briefer French J, Accountability for the Quality of Care Provided to People with Serious Illness, J. Palliat. Med. 21 (2018) S68–S73. 10.1089/jpm.2017.0603. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [42].Kamal AH, Kirkland KB, Meier DE, Morgan TS, Nelson EC, Pantilat SZ, A Person-Centered, Registry-Based Learning Health System for Palliative Care: A Path to Coproducing Better Outcomes, Experience, Value, and Science, J. Palliat. Med. 21 (2018) S61–S67. 10.1089/jpm.2017.0354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [43].Bobak CA, Barr PJ, O’Malley AJ, Estimation of an inter-rater intra-class correlation coefficient that overcomes common assumption violations in the assessment of health measurement scales, BMC Med. Res. Methodol. 18 (2018) 93. 10.1186/s12874-018-0550-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [44].Koo TK, Li MY, A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research, J. Chiropr. Med. 15 (2016) 155–163. 10.1016/j.jcm.2016.02.012. [DOI] [PMC free article] [PubMed] [Google Scholar]