Author manuscript; available in PMC: 2023 Jun 14.
Published in final edited form as: Qual Life Res. 2021 Apr 9;30(8):2363–2374. doi: 10.1007/s11136-021-02824-2

Psychometric evaluation of a patient-reported item bank for healthcare engagement

Benjamin D Schalet 1, Steven Reise 2, Donna M Zulman 4,5, Eleanor Lewis 4, Rachel Kimerling 3,4
PMCID: PMC10262960  NIHMSID: NIHMS1797918  PMID: 33835412

Abstract

Purpose:

Healthcare engagement is a core measurement target for efforts to improve healthcare systems. This construct is broadly defined as the extent to which healthcare services represent collaborative partnerships with patients. Previous qualitative work operationalized healthcare engagement as generalized self-efficacy in four related subdomains: self-management, collaborative communication, health information use, and healthcare navigation. Building on this work, our objective was to establish a healthcare engagement instrument that is sufficiently unidimensional to yield a single score.

Method:

We conducted cognitive interviews followed by a nationwide mail survey of US Veterans Affairs (VA) healthcare users. Data were collected on 49 candidate healthcare engagement items, as well as measures of self-efficacy for managing symptoms, provider communication, and perceived access. Items were subjected to exploratory bifactor, statistical learning, and IRT analyses.

Results:

Cognitive interviews were completed by 56 patients, and 9,552 VA healthcare users with chronic conditions completed the mail survey. Participants were mostly white and male, but with sizable minority participation. Psychometric analyses and content considerations reduced the item pool to 23 items, which demonstrated a strong general factor (OmegaH of .89). IRT analyses revealed a high level of reliability across the trait range, and little DIF across groups. Most health information use items were removed during analyses, suggesting a more independent role for this domain.

Conclusion:

We provide quantitative evidence for a relatively unidimensional measure of healthcare engagement. Although developed with VA healthcare users, the measure is intended for general use. Future work includes short-form development and validation with other patient groups.

Keywords: healthcare engagement, patient-reported outcomes, providers, healthcare system, communication, healthcare navigation

Introduction

Healthcare engagement is one of the core measurement targets for efforts to improve the performance of health systems [1]. This construct is broadly defined as the extent to which healthcare services represent collaborative partnerships between patients and the providers and systems that deliver care [2, 3]. Engagement is distinct from other patient-focused constructs like patient experience or satisfaction, which address how consumers feel about their care rather than what they do to interact with their care. There are explicit financial incentives for healthcare engagement in publicly funded health care, through legislation such as the Medicare Access and CHIP Reauthorization Act (MACRA). The ongoing transformation in healthcare – emphasizing organizational accountability to promote population health – has galvanized efforts to promote patient engagement through patient-centered care. Patient-centered care aligns services with patient knowledge, skills, social circumstances, and preferences for their care, and therefore depends on patient participation in processes such as goal setting, shared decision-making, and self-management. These participatory behaviors optimize the benefit one receives from healthcare services, enhancing the efficiency and effectiveness of care [4]. To date, however, much research on strategies to enhance patient engagement infers, rather than measures, healthcare engagement. A reliable and valid patient-reported measure of healthcare engagement would provide actionable data to evaluate health systems and help providers tailor care.

Like many psychological constructs, healthcare engagement is multifaceted and complex, with a range of approaches to operationalize the construct. Candidate measures have emerged, but require several subscales to cover the construct [5], or focus on patient traits, such as psychological readiness, motivation, or skills as a proxy for engagement [6, 7]. Efforts to operationalize the reciprocal nature of patient engagement have identified salient participatory behaviors. The engagement behavior framework [8] established a comprehensive inventory of patient participatory healthcare behaviors across the continuum of preparation, interaction, and follow-up. Subsequent research investigating barriers and facilitators to patient engagement suggests that the behaviors patients describe as most important for benefiting from health care are consistent across heterogeneous populations and settings [9–15].

Our theoretical model for the present study organizes these behaviors into four inter-related domains [16]. We conceptualize patient engagement as the generalized self-efficacy across: (1) self-management, including health promotion, illness management, and treatment plan adherence; (2) collaborative communication, including behaviors such as asking questions or sharing feedback to establish shared goals and partnerships; (3) health information use, or behaviors required to seek, evaluate, and apply health information; and (4) healthcare navigation, consisting of behaviors related to maneuvering systems to obtain needed care. Most theoretical models maintain that engagement behavior is influenced by policy, organizational, and provider contexts [2, 17], but these elements do not map easily to a person-level measurement model. However, self-efficacy judgements reflect individual behavioral competencies within these organizational or relational contexts, because self-efficacy is a context-dependent self-assessment of behavioral capability. In other words, better engagement occurs where providers or systems make engagement behaviors easier to perform. The application of self-efficacy theory lends predictive utility and a framework that adapts well to measurement efforts [18]. In addition, self-efficacy can generalize across behaviors that require similar competencies, or that occur within shared contexts [19]. In this way, cumulative self-efficacy judgements for engagement behaviors can place individuals along a single continuum that reflects the propensity to engage with care.

The goal of the current study was to establish a patient-reported outcome measure that reflects a patient’s propensity to engage with health care. We aimed to establish an item bank on healthcare engagement that is sufficiently unidimensional to yield a single score, is free from differential item functioning based on race, gender, or chronic condition, and maximizes parsimony and precision across the continuum of the latent construct.

Methods

Item pool development

We developed a Healthcare Engagement item bank using methods established by the Patient-Reported Outcomes Measurement Information System (PROMIS) [20, 21]. We previously validated the healthcare engagement construct using a literature review, expert reviews, and qualitative concept elicitation interviews with Veterans [16]. Though this work was conducted in the US Veterans Affairs (VA) healthcare system and our primary goal was to develop a measure that resonated with Veteran experiences, a secondary objective was to ensure that the construct would generalize beyond the VA healthcare system.

The item pool for this study (49 items) was generated by mapping items from publicly available measures of similar constructs and writing new items to achieve construct coverage. All items were written or edited to a 6th grade reading level. Cognitive interviews with 56 patients included think-aloud responses, prompts for comprehension, retrieval, judgement, or other difficulties responding to items, and relevance [22]. The a priori standard for item inclusion was three consecutive interviews establishing comprehension, relevance, and no difficulties. Minor wording changes were based on consensus of study interviewers and the principal investigator; major wording changes were addressed in dedicated item review meetings of the study staff, investigators, and consulting primary care or mental health providers.

Participants and Procedures

We conducted a nationwide mail survey of 9,552 Veteran users of 136 Veterans Affairs (VA) medical centers in the continental US (response rate 38%). Our sampling priorities for this initial quantitative validation were: a) variability across the full range of the construct; and b) representation of diverse clinical samples from target populations for the measure’s use [23]. The sampling frame was identified from administrative data in the VA Corporate Data Warehouse [24]. We sampled from the VA primary care population, which is predominantly male and older, with poorer health status and less racial diversity than other healthcare populations [25]. Inclusion criteria were: Veteran, aged 18–80, White or Black race (Asian, Pacific Islander, and Native American Veterans are each ≤ 1% of users, and accuracy in public sector administrative data is poor [26]), a valid mailing address in the medical record, at least one past-year primary care visit to the assigned primary care panel, and a diagnosis of one of four prevalent chronic conditions (diabetes, hypertension, depression, PTSD), as indicated by one inpatient stay or two outpatient visits on different dates of service. Exclusion criteria were active diagnoses of dementia or psychotic disorders, or a 90-day risk of hospitalization or death in the top 5% as indicated by the VA’s Care Assessment Needs score [27]. Stratified sampling was used to ensure sufficient cell size for subgroup comparisons and DIF analyses. Sampling was stratified by chronic condition cohort, and within cohorts to oversample women, Black Veterans, and Hispanic Veterans. Cohorts were also balanced on health status, as indicated by risk adjustment scores [28]. Study information sheets were mailed with the surveys in lieu of consent forms; return of a completed survey was considered implied consent. The IRB at Stanford University School of Medicine approved the study. Data were collected October 2018 – January 2019.

Measures

Candidate Healthcare Engagement items.

We tested 49 candidate items representing each of the four domains of the engagement construct [16]: self-management (14 items), collaborative communication (12 items), health information use (12 items), and healthcare navigation (11 items). The measure asks respondents to evaluate the truth of each statement based on the “current time” and based on the respondent’s care from any doctor or provider. Candidate items were provisionally scored on a 5-point Likert-type scale, with response options of “Not true”, “A little true”, “Somewhat true”, “Mostly true,” and “Very true.”

Concurrent validity indicators.

The PROMIS Self-Efficacy for Managing Chronic Conditions – Managing Symptoms Short Form 4a [29] is a 4-item instrument that addresses managing symptoms during daily activities, interference with relationships, managing symptoms in public, and managing symptoms with a provider. T-scores were obtained via the HealthMeasures scoring service, and Cronbach’s alpha was .88 in this study. The Consumer Assessment of Healthcare Providers and Systems (CAHPS) Provider Communication measure [30] is a 4-item measure comprising patient ratings of how well a provider explains things, listens carefully, shows respect, and spends sufficient time with the patient. Top-box scoring methods were used (proportion of items with top scores), and Cronbach’s alpha was .92 for this measure. Perceived access to care was measured using a single item scored on a 5-point Likert scale [31] that captures the patient’s global assessment of how well the care they received met their needs, representing the inverse of unmet need for care.
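As a concrete illustration of the top-box scoring mentioned above, the sketch below computes the proportion of items answered in the most favorable category. The 1–4 response coding is illustrative only, not the official CAHPS specification:

```python
# CAHPS-style "top-box" scoring: the proportion of a respondent's items
# answered in the most favorable category. The 1-4 coding here is an
# assumption for illustration, not the official CAHPS implementation.

TOP_BOX = 4  # highest response category in this illustrative coding

def top_box_score(responses):
    """Proportion of items with the most favorable response."""
    return sum(1 for r in responses if r == TOP_BOX) / len(responses)

print(top_box_score([4, 4, 3, 4]))  # -> 0.75
```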

Data Analysis

Our analysis strategy was to pare down the item pool to reflect the latent construct of healthcare engagement. Though our theoretical model comprised 4 inter-related domains of engagement behaviors, engagement was hypothesized to stem from generalized self-efficacy across these behaviors [16], making the common variance across domains our primary interest, rather than comprehensive coverage of each domain that would result in separate scales or subscales. We assumed an effect-indicator model from the beginning of the analysis [32], and therefore the analysis strategy also targeted removal of smaller, extraneous factors in an effort to identify a sufficiently unidimensional set of items that could form a single scale. A different set of theoretical assumptions, such as creating an inventory of knowledge and skills, could be consistent with a causal-indicator model, where the focus of item selection would be to maximize the unique content represented by each subdomain [33]. While there are merits to each approach, we felt the latter would have contributed to proliferation of similar constructs in the field, overemphasized each subdomain’s independence, increased respondent burden, and had relatively limited utility in health care systems due to the need to reliably assess and interpret multiple domains and scores.

Preliminary analyses examined classical item-level statistics, polychoric correlations, and hierarchical cluster analysis [34] to identify poorly functioning items. Next, we conducted iterative exploratory bifactor analyses using the psych package in R. We started by selecting 4 subfactors corresponding to the 4 hypothesized domains of the construct, but considered solutions with 3 subfactors as well. (For a detailed illustration of how to apply bifactor modelling to patient experience data, see Reise, Morizot, & Hays, 2007 [35].) We evaluated dimensionality based on general factor saturation and computed Omega-Hierarchical (Omega-H), where values > 0.70 suggest sufficient unidimensionality for IRT modeling [36]. In addition, we calculated explained common variance (ECV), the relative strength of a general factor compared to group factors, targeting ECV values > 0.60 as a tentative benchmark [36].
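Both dimensionality indices can be computed directly from a bifactor loading matrix. The paper's analyses used the psych package in R; below is a minimal Python sketch with made-up loadings for a hypothetical 6-item, 2-group-factor solution:

```python
# Omega-hierarchical and explained common variance (ECV) from bifactor
# loadings. A sketch with illustrative (made-up) loadings; the paper's
# analyses used psych::omega() in R.

def omega_h(gen, spec):
    """gen: general-factor loadings; spec: one loading list per group
    factor (zeros for items not loading on that factor)."""
    var_total = sum(gen) ** 2                       # general-factor variance
    for s in spec:
        var_total += sum(s) ** 2                    # group-factor variance
    # unique variance per item: 1 minus all squared loadings
    uniq = [1 - g ** 2 - sum(s[i] ** 2 for s in spec)
            for i, g in enumerate(gen)]
    var_total += sum(uniq)
    return sum(gen) ** 2 / var_total

def ecv(gen, spec):
    """Share of common variance attributable to the general factor."""
    common_gen = sum(g ** 2 for g in gen)
    common_spec = sum(l ** 2 for s in spec for l in s)
    return common_gen / (common_gen + common_spec)

# hypothetical loadings: 6 items, 2 group factors of 3 items each
g = [0.7, 0.6, 0.65, 0.7, 0.6, 0.55]
s = [[0.4, 0.3, 0.35, 0, 0, 0],
     [0, 0, 0, 0.3, 0.25, 0.4]]
print(round(omega_h(g, s), 2), round(ecv(g, s), 2))  # -> 0.75 0.78
```

By the paper's benchmarks, this hypothetical solution (Omega-H > 0.70, ECV > 0.60) would be judged sufficiently unidimensional for IRT modeling.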

As a final step, we used a statistical learning theory (SLT) approach to optimize the parsimony of the item pool, minimize expected prediction error and incorporate criterion validity into the item selection process [37]. SLT methods have good potential to inform the development of latent trait scales. The measurement of complex constructs, such as patient engagement, requires investigators to balance the theoretical and conceptual basis of the construct with empirical results in making judgements in the presence of structural ambiguity [38]. Well-documented pitfalls are associated with too great an emphasis on optimizing inter-item correlations, such as the proliferation of reliable but narrow scales or subscales that under-represent a construct [39].

In our context, the SLT procedure helped us identify items that could be deleted because they were not sufficiently related to criterion variables. We derived a cross-validated set of items that correlated with the validity indicators for each of the three domains of the model using BISCUIT (Best Items Scale that is Cross-Validated, Unit-weighted, Informative and Transparent), which has been shown to produce parsimonious models with comparable levels of predictive accuracy, even at high levels of missing data, when compared to other statistical learning techniques [40]. The tuning parameter was selected to exert a moderate degree of influence on item selection, identifying the best 20 items for each criterion, with 1000 bootstrap replications. All items that appeared in optimal scales for all 3 validity indicators in > 1 replication were retained in the item pool.

To illustrate the structure of our final set of items prior to IRT analyses, we ran a confirmatory bifactor model corresponding to the final structure determined by exploratory factor analyses. For this final model, we report Omega-H, ECV, Comparative Fit Index (CFI), the Tucker Lewis Index (TLI), and the root mean square error of approximation (RMSEA). Next, we estimated IRT parameters using the graded response model (GRM) [41], as well as the corresponding IRT-based bifactor model with three subfactors [42]. We did this with the understanding that unidimensional parameters might still be distorted by subfactors evident in a bifactor model. Therefore, we applied a procedure to convert conditional general factor slopes into marginal slopes, using formulas 10 and 11 provided in a didactic article by Toland et al. [43], citing detailed work on these procedures [44]. Once in marginal form, the slopes and location parameters from the bifactor model can be fairly compared to unidimensional models, and may be scored “as if” they were from unidimensional models.
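For reference, the graded response model expresses each item's category probabilities as differences of adjacent cumulative logistic curves. A minimal Python sketch with an illustrative slope and thresholds (not the paper's estimates, which were fit with mirt in R):

```python
import math

# Graded response model (GRM) category probabilities: P(X = k) is the
# difference between adjacent cumulative curves P(X >= k). Parameters
# below are illustrative, not the paper's estimates.

def grm_probs(theta, a, b):
    """theta: latent trait; a: discrimination; b: ordered location
    (threshold) parameters, len K-1 for K response categories."""
    cum = [1 / (1 + math.exp(-a * (theta - bk))) for bk in b]
    star = [1.0] + cum + [0.0]          # P(X >= 0) = 1, P(X >= K) = 0
    return [star[k] - star[k + 1] for k in range(len(star) - 1)]

# 4 response categories (3 thresholds, as after collapsing the lowest
# two categories), with an illustrative slope a = 1.8
probs = grm_probs(theta=0.0, a=1.8, b=[-1.5, -0.7, 0.2])
print([round(p, 3) for p in probs])     # probabilities sum to 1
```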

Differential item functioning (DIF) analysis compared race (Black/African-American vs. White), age (65 and above vs. below 65), mental health vs. physical health cohorts, and financial strain. We applied logistic ordinal regression with IRT scoring, commonly used in the development of PROMIS instruments [45]. We used the chi-squared likelihood-ratio statistic as the initial DIF detection criterion (alpha < 0.01), and a cut-off of McFadden pseudo R2Δ ≥ 0.02 in model comparisons to determine substantial DIF, a reasonable threshold used in the development of self-reported health outcomes [46].
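The two-part decision rule (a significant likelihood-ratio test plus a McFadden pseudo-R2 change of at least 0.02) can be sketched as follows. The log-likelihood values are hypothetical, and the chi-squared critical value assumes alpha = 0.01 with 2 degrees of freedom; this is an illustration of the rule, not the lordif implementation:

```python
# DIF decision rule in the lordif style: flag an item only if the
# likelihood-ratio test comparing nested ordinal logistic models is
# significant AND the change in McFadden's pseudo-R2 reaches 0.02.
# All log-likelihoods below are hypothetical, for illustration only.

def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R2 relative to an intercept-only model."""
    return 1 - ll_model / ll_null

def dif_flag(ll_null, ll_base, ll_full, chi2_crit=9.21, r2_crit=0.02):
    """ll_base: model without group terms; ll_full: with group main
    effect and interaction (chi2_crit = alpha .01 critical value, 2 df)."""
    lr = 2 * (ll_full - ll_base)                      # LR statistic
    delta_r2 = (mcfadden_r2(ll_full, ll_null)
                - mcfadden_r2(ll_base, ll_null))
    return lr > chi2_crit and delta_r2 >= r2_crit

# significant LR test but trivial effect size -> no substantial DIF
print(dif_flag(ll_null=-5000.0, ll_base=-4200.0, ll_full=-4195.0))  # -> False
```

This mirrors the pattern reported in the Results: statistical flags alone are not treated as substantial DIF unless the pseudo-R2 change criterion is also met.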

All analyses were conducted in R, using the psych package [47] and lavaan [48] for factor analyses, mirt [49] for IRT analyses, and lordif [45] for DIF analyses.

Results

Participant Characteristics

In total, 9,569 individuals responded to the survey. Participants who returned the survey, did not opt out, and completed at least 50% of the healthcare engagement item pool were considered complete responders and included in the study (N = 9,552). Non-responders included Veterans who opted out of the survey (N = 1,022), returned a blank questionnaire (N = 15), and did not return anything (N = 14,584). Ineligible Veterans included those who were deceased (N = 9) or whose address was undeliverable (N = 304). The response rate was approximately 38%.

Table 1 shows the demographic and clinical characteristics of the sample used for item analysis. Participants were mostly white and male, but with sizable minority participation (i.e., 27% Black, 15% Hispanic/Latinx).

Table 1.

Demographic and clinical variables for Veteran sample (N = 9552)

Variable N %
Age
 < 50 1548 16
 50–64 2863 30
 65–74 4432 46
 ≥75 709 7
Female 2134 22
Race/Ethnicity
 Hispanic / Latinx 1442 15
 Non-Hispanic Black or African American 2584 27
 Non-Hispanic White 5440 57
 Other/Unknown 85 <1
Married / cohabitating 5958 62
Education
 High school / GED 2369 25
 Some college or technical school 4070 43
 College graduate or higher 3045 32
Financial Strain
 Comfortable 2579 27
 Income can provide for basic needs 4507 47
 Difficult/Very difficult to get by on present income 2350 25
Self-Rated Health Status
 Poor 647 7
 Fair 3091 32
 Good 3609 38
 Very Good 1590 17
 Excellent 242 3
Any past-year mental health diagnosis 6668 70

Note: Race/Ethnicity, Education, Financial Strain, Self-Rated Health Status categories add to less than 9552 due to missing data.

Initial item analysis

Thirteen pairs of items in the polychoric correlation matrix were highly correlated (r > .70); several items appeared in multiple pairs. After removing 8 items, no highly correlated pairs remained. These redundant pairs were expected because we tested several items with alternate wording. Next, we examined how individual items (or small groups of items) might separate from larger sets of items with a hierarchical item cluster plot [34]. This plot showed final clustering of a pair of items with narrow content (social support) and a single item that had poor wording and the lowest cluster loading (i.e., corrected item-total correlation). We deleted these 3 items, and an additional item with redundant wording, leaving 39 items in the pool at this stage.

Exploratory bifactor analysis

We removed a total of nine items in this step, based on empirical and conceptual considerations. The bifactor models with four subfactors identified 3 items with low general factor loadings (< .50) or high specific factor loadings (>.50). These items focused on individual health behavior maintenance. In addition, we noticed that a single specific factor was driven by 3 items from the “health information use” domain, also showing high specific factor loadings and low general factor loadings, suggesting a potential for multidimensionality. They focused on independent health information use (e.g., “It is easy to find information on my own that helps me manage my symptoms”). All other items in this domain showed lower loadings on the “health information use” factor (< .35). We opted to remove these 3 items, and model 3 subfactors representing the remaining subdomains. This decision to shift from 4 to 3 subfactors was based not only on these empirical considerations, but on the overarching conceptual assumption that the construct of engagement is essentially collaborative, and that the multidimensionality of items addressing independent skills was inconsistent with this assumption.

In the subsequent 3-factor bifactor model, we identified 3 more items with general factor loadings of .50 or less. These items addressed resilient self-efficacy for independent self-management behaviors (e.g., “I attend all of my appointments, even when life gets busy or stressful”) and were removed based on similar empirical considerations and conceptual assumptions. The resulting 30 item set showed a high proportion of general factor variance relative to specific factor variance (item-level proportions ranging from .58 to .83), with overall indices of OmegaH as .79 and ECV .69. These values suggest sufficient levels of unidimensionality and that the items mostly reflect a single construct [36].

Criterion Validity

Finally, we derived a cross-validated set of items that correlated with the validity indicators for each of the three domains of the model using BISCUIT [40]. As noted above, we ran models separately for each of the three criterion variables: CAHPS communication, PROMIS self-management, and perceived access (as a proxy for healthcare navigation, since few measures exist), and selected all items retained in all three models. Average item-criterion correlations for these items ranged from .29 to .61 (SD = .01). Three items were not retained in any model (two items from the communication subfactor and one item from the self-management subfactor), and an additional 4 items were retained in some, but not all, models (two from the communication subfactor, and two from the self-management subfactor), yielding a final item pool of 23 items. This final item set correlated with criterion variables at .62, .64, and .65 for communication, self-management, and perceived access, respectively. Table 2 shows the full item text of the final 23 items, as well as the frequency of endorsement for each response option.

Table 2.

Final item text for the Healthcare Engagement Measure and item frequencies

Frequency of Endorsement (%)

Item Key # Item Text Not at all true A little bit true Somewhat true Mostly true Very true
1 If I think my treatment plan needs to change, I have no problem bringing it up with my provider. 4 6 14 29 48
2 When I need more information, I ask, even when my provider is in a rush. 4 6 14 32 44
3 I make sure I understand all of my test results. 2 4 12 31 51
4 I know I can express my doubts, even when my provider might disagree. 4 7 17 32 40
5 If I didn’t think a treatment was working, I would tell my provider. 1 3 6 23 67
6 It is easy to find the health care resources I need (such as classes, support groups). 13 16 25 25 21
7 I know I can think through the pros and cons when I need to make a choice about my health. 2 6 17 34 40
8 I know I can get the information I need about the pros and cons of treatments. 3 8 19 35 34
9 I always know who to contact when I have a health issue. 4 8 15 32 42
10 It is easy for me to refill medications on time. 3 5 10 28 53
11 I know I can get the health care services I need, even if I must arrange it myself. 8 8 16 31 37
12 I know I can get a provider to deal with my main health concerns. 6 9 15 30 41
13 I can make sure my concerns are fully addressed before I leave appointments. 3 7 16 36 38
14 I know I can find a way to get in touch with my provider or care team when I need to. 6 8 14 27 45
15 When I need information about my care, like test results, I can get it easily. 6 9 17 30 39
16 I have a provider who I can trust to act in my best interests. 5 6 12 25 53
17 I can get the care I need without getting discouraged. 1 4 13 42 39
18 I know I can always follow my doctor’s instructions. 1 4 15 33 47
19 Learning more about my health issues helps me manage them better. 6 13 24 35 22
20 Even if I am tired or in pain, I know I can stick to my treatment plan. 5 12 26 32 25
21 I have clear goals to improve my health. 2 8 20 34 35
22 Monitoring how well my treatments are working helps me get the most out of my care. 3 9 22 40 26
23 I know I can get myself to keep doing the things that keep me healthy, even when life gets challenging. 4 6 14 29 48

Note. Missing frequencies ranged between 1 and 3 % (mean was 1.5% across items). When used in practice, the first and second categories should be collapsed. Specifically, item scores assigned to each categories should be as follows 1 = Not at all true, 1 = A little bit true, 2 = Somewhat true, 3 = Mostly true, and 4 = Very true.

Bifactor model of final 23 item set

A bifactor EFA model of the final item set showed that the highest subfactor loadings largely corresponded to our a priori classification. Exceptions were two items that focused on making sure health concerns are addressed (#12 and #13); these loaded on the navigation factor instead of communication. In addition, three items originally conceptualized as health information (#3, #7, #8) were distributed across the three subfactors.

To provide detail on the structure of our selected items, we ran a final confirmatory bifactor model on a subset of complete data (N = 8426), specifying ordered categorical data, and using the EFA result to assign items to factors. This model showed relatively high general factor indices, with OmegaH of .89 and ECV of .79. Fit statistics were: CFI = 0.996, TLI = 0.995, RMSEA = 0.044 [90% CI: 0.045, 0.046]. Table 3 shows these loadings and the item-level ECV. As Table 3 shows, item #8 had the highest general factor loading (.81) and could be said to represent the construct (“I know I can get the information I need about the pros and cons of treatments”).

Table 3.

Confirmatory bifactor model of the final 23-item set and item-level explained common variance (I-ECV)

Item Key # Truncated item text General Factor Navigation Factor Self-Management Factor Communication Factor Item-ECV
14 get in touch w provider 0.66 0.53 0 0 0.61
12 provider main concerns 0.71 0.46 0 0 0.70
15 get info on care easily 0.68 0.41 0 0 0.73
16 provider best interests 0.67 0.41 0 0 0.73
11 services I need 0.58 0.28 0 0 0.81
13 concerns fully addressed 0.76 0.28 0 0 0.88
17 not discouraged 0.77 0.28 0 0 0.88
9 know who to contact 0.70 0.26 0 0 0.88
10 easy to refill 0.55 0.25 0 0 0.83
6 easy to find resources 0.62 0.24 0 0 0.87
8 info on treatment 0.81 0.12 0 0 0.98
23 things to keep healthy 0.60 0 0.48 0 0.61
21 goals to improve health 0.57 0 0.45 0 0.62
19 learning about health 0.61 0 0.37 0 0.73
20 stick to treatment plan 0.59 0 0.36 0 0.73
22 monitoring treatment 0.65 0 0.34 0 0.78
18 follow dr. instructions 0.61 0 0.20 0 0.91
7 think through pros/cons 0.67 0 0.20 0 0.92
5 tell my provider 0.73 0 0 0.39 0.78
1 bringing up plan change 0.69 0 0 0.38 0.77
4 express doubts 0.71 0 0 0.36 0.80
2 ask for more information 0.73 0 0 0.33 0.83
3 understand results 0.76 0 0 0.18 0.95
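The item-level ECV in Table 3 is each item's squared general-factor loading divided by its total common variance (general plus specific). A minimal sketch reproducing two rows of the table:

```python
# Item-level explained common variance (I-ECV), as reported in Table 3:
# the share of an item's common variance attributable to the general
# factor. Loadings below are taken from Table 3 rows for items 14 and 8.

def item_ecv(gen_loading, spec_loading):
    g2, s2 = gen_loading ** 2, spec_loading ** 2
    return g2 / (g2 + s2)

print(round(item_ecv(0.66, 0.53), 2))  # item 14 -> 0.61
print(round(item_ecv(0.81, 0.12), 2))  # item 8  -> 0.98
```

High I-ECV values (as for item #8) indicate items whose responses are driven almost entirely by the general engagement factor.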

Item response theory (IRT) model

We first ran a nominal response IRT model to examine category functioning within the items. Results indicated the lowest two categories (“Not at all true” and “A little bit true”) did not reliably distinguish engagement levels among respondents. Consequently, we collapsed these two categories in subsequent models such that 3 thresholds are estimated for 4 categories of item data (for additional detail, see Reise et al., under review).

We proceeded with estimating item parameters for our 23 item set with both a unidimensional and bifactor model. Because slopes from the two models cannot be fairly compared [43], we converted the bifactor conditional slopes into marginal slopes. The resulting marginal slopes were highly correlated with the unidimensional slopes (r = .86), but slightly lower on average, showing a mean slope value of 1.67 (range 1.67 to 2.61) compared to 1.83 (range 1.30 to 2.54) for the pure unidimensional solution. Given this slight difference, we decided to use the converted bifactor parameters, which have more accurate (higher) standard errors [43]. The mean location value of the final item parameter set (averaged across the 3 values for each item) was −0.76 (range −1.57 to 0.14). The lowest location value was −2.68 and the highest was 1.25. Figure 1 shows the average location parameter across the items for the final item parameter set. This plot illustrates that item #5 (“If I didn’t think a treatment was working, I would tell my provider”) was easiest for participants to endorse, but item #6 (“It is easy to find the health care resources I need (such as classes, support groups)”) was the hardest. In other words, any given participant was more likely to rate #5 as true compared to #6.

Figure 1. Plot of Healthcare Engagement Measure items, ordered by average location parameter.


Minimum and maximum values of the lines represent the lowest and highest value of the location parameters associated with each item. Dot size is proportional to the discrimination value. Truncated item text is shown.

Figure 2 shows the marginal reliability curve across the latent trait, customarily denoted theta, which has a population mean of 0 and a population standard deviation of 1. For the x-axis of Figure 2, we converted theta to T-scores (theta × 10 + 50). The estimated reliability is .90 or higher across T-scores from 22 to 65. Figure 3 shows the distribution of IRT-based scores for the current sample, using our estimated item parameters and Bayes expected a posteriori (EAP) scoring [50]. The scores mostly form a bell-like shape, with the exception of a small subset of persons with very high levels of healthcare engagement.
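Putting these pieces together, EAP scoring integrates a standard-normal prior against the GRM likelihood over a quadrature grid and converts the posterior mean to the T-score metric (theta × 10 + 50). A minimal Python sketch with hypothetical item parameters (the paper's scoring used mirt in R):

```python
import math

# Bayes expected a posteriori (EAP) scoring on the T-score metric,
# sketched with hypothetical GRM item parameters (slope a, ordered
# thresholds b), not the paper's estimates.

def grm_probs(theta, a, b):
    """GRM category probabilities for one item."""
    cum = [1 / (1 + math.exp(-a * (theta - bk))) for bk in b]
    star = [1.0] + cum + [0.0]
    return [star[k] - star[k + 1] for k in range(len(star) - 1)]

def eap_tscore(responses, items, grid=None):
    """responses: 0-based category index per item; items: (a, b) tuples."""
    grid = grid or [q / 10 for q in range(-40, 41)]  # theta from -4 to 4
    num = den = 0.0
    for theta in grid:
        like = math.exp(-0.5 * theta ** 2)           # N(0,1) prior weight
        for resp, (a, b) in zip(responses, items):
            like *= grm_probs(theta, a, b)[resp]     # GRM likelihood
        num += theta * like
        den += like
    return (num / den) * 10 + 50                     # theta -> T-score

# two hypothetical 4-category items
items = [(1.8, [-1.5, -0.7, 0.2]), (1.6, [-2.0, -1.0, -0.1])]
high = eap_tscore([3, 3], items)   # top category on both items
low = eap_tscore([0, 0], items)    # bottom category on both items
print(round(high, 1), round(low, 1))
```

As expected, consistently endorsing the top categories yields a T-score above the population mean of 50, and the bottom categories a score below it.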

Figure 2. Reliability estimates for the final 23-item set across the trait range, based on the Graded Response Model.

Figure 3. Histogram of Bayes expected a posteriori (EAP)-based T-scores for the final 23-item set across the trait range.

DIF analysis

None of the DIF analysis results exceeded (or approached) the cut-off of McFadden pseudo R2Δ ≥ 0.02 for any of the items, suggesting that the item parameters are valid for the different demographic and clinical groups examined.

Discussion

In an effort to address the lack of pragmatic, predictive, and publicly available measures that can guide patient engagement efforts, we developed an item bank to measure patient engagement in healthcare, building on prior conceptual and qualitative work [16]. Our measure scales patients on their propensity to engage with healthcare by measuring self-efficacy for engagement behaviors. Although the items are sufficiently unidimensional to represent engagement as a single score, three subdomain concepts were still evident: Healthcare Navigation, Collaborative Communication, and Self-Management. Theoretically, engagement scale scores can be thought of as the generalized self-efficacy across these three domains of healthcare behaviors. Indeed, patients who can effectively navigate their health system, establish a collaborative relationship with their provider characterized by effective communication, and self-manage chronic conditions receive optimal benefits from their healthcare services. The final Healthcare Engagement Measure items demonstrated excellent reliability across the range of trait scores, including the lower range, where discriminating engagement levels is likely to be most relevant. We found no evidence of substantial DIF across age, race, gender, financial strain, or the presence of a mental health condition. The absence of item bias in these domains is encouraging in the development of an instrument that could be used with diverse populations, including in public sector health care settings, and with a range of health conditions.

Our results reaffirmed that collaborative interaction is at the core of healthcare engagement. Exploratory bifactor modeling identified several items addressing independent skills or attributes in the health information use and self-management domains that loaded highly on their specific factor but poorly on the general factor. While these items measured their narrower domains well, they were not strong indicators of the latent engagement construct. This was especially the case for the health information domain, from which only three items were ultimately retained. These retained health information use items showed strong loadings on the general factor, while also having interpretable (albeit very small) loadings on subfactors. For example, understanding test results (#3) might involve communication with the provider, and finding information about one’s treatment (#8) also reflects healthcare navigation.

The absorption of health information into the other three subfactors is consistent with results from our qualitative work [16]: individuals with low health literacy demonstrated high engagement in the presence of collaborative communication. Being able to reliably turn to a provider for support can engender proxy efficacy, or a sense of efficacy that is socially mediated, gained from relationships with others who possess helpful expertise or agency. We do not discount the value that individual health literacy has to promote self-efficacy for engagement behaviors, but posit that individual strengths and vulnerabilities may be indirect indicators of engagement to the extent that these skills facilitate collaborative interaction with providers and systems of care. Our results are also consistent with efforts to promote engagement by system redesign to reduce the health literacy demands on patients [51]. Interventions such as the use of infographics, teach-back, decision aids, or team-based medical homes can enhance self-efficacy for engagement behaviors by decreasing task demands and making it easier for patients to participate in their care.

We used innovative methods to incorporate validity indicators into item selection, applying statistical learning techniques (SLT) to identify the most parsimonious yet predictive set of items. We implemented analyses that optimized out-of-sample predictive validity as an exploratory check on potential measurement bias stemming from judgment calls made in item selection and factor structure. Based on these results, we reduced the item bank from 30 to 23 items, made small but meaningful improvements in unidimensionality indices, and enhanced confidence in predictive validity.

Our results should be interpreted in light of several considerations. First, this initial calibration was conducted with users of VA healthcare. The Veterans Health Administration is the largest integrated healthcare system in the US and, like other public sector settings and systems, provides care to lower-income patients with high rates of mental health and chronic conditions [25]. While subsequent evaluation with users from other healthcare settings will lend more confidence to generalizability, especially in more racially diverse settings, initial work within the VA healthcare system has the potential to support applicability to vulnerable populations. Application to vulnerable populations is essential for a measure of healthcare engagement, as settings that serve greater proportions of low-SES or minority patients deliver less patient-centered care and fare worse under alternative payment models, both within and outside the VA system [52–54]. Second, a scale of 23 items is too long to be practical for many clinical and research purposes. Though the calibrated item bank lends itself to administration via computerized adaptive testing, the next step in our research is to develop a short form that can be more easily administered and interpreted at the point of care. Validation of the short form against indicators of engagement and healthcare quality measures will also be informative.

Efforts to measure healthcare engagement are particularly relevant for the increasing number of healthcare systems whose financial performance rests on effective population health management, that is, promoting and maintaining the health of a defined group of patients [55]. A reliable and valid patient-reported measure of healthcare engagement could be used in risk stratification models to enhance population health management through tailored care, to derive methods for equitable financial reimbursement, or as an outcome measure for organizational efforts to promote healthcare engagement. The precision, predictive validity, and lack of DIF with respect to race, ethnicity, or socioeconomic status for this item bank suggest considerable promise for these applications.

Acknowledgements

This manuscript is supported by grants #I21HX001855 and 1I01HX002317 from the United States (US) Department of Veterans Affairs Health Services Research and Development Service. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.

Footnotes

Declarations

Availability of data and material

Data will be made available in accordance with the data management plan submitted to the VA Office of Research & Development. Within one year of publication of manuscripts addressing the aims of the grant, investigators will make a deidentified, anonymized dataset available to the public.

Code availability

Code for the analyses presented in this paper will be made available upon request.

Conflicts of interest/Competing interests

The authors declare no conflict of interest.

References

  • 1.Blumenthal D, & McGinnis JM (2015). Measuring Vital Signs: An IOM Report on Core Metrics for Health and Health Care Progress. JAMA, 313(19), 1901–1902. 10.1001/jama.2015.4862 [DOI] [PubMed] [Google Scholar]
  • 2.Carman KL, Dardess P, Maurer M, Sofaer S, Adams K, Bechtel C, & Sweeney J. (2013). Patient and family engagement: a framework for understanding the elements and developing interventions and policies. Health Aff (Millwood), 32(2), 223–31. 10.1377/hlthaff.2012.1133 [DOI] [PubMed] [Google Scholar]
  • 3.Frampton SB, Guastello S, Hoy L, Naylor M, Sheridan S, & Johnston-Fleece M. (2017). Harnessing Evidence and Experience to Change Culture: A Guiding Framework for Patient and Family Engaged Care. NAM Perspectives, 7(1), 1–37. 10.31478/201701f [DOI] [Google Scholar]
  • 4.Berwick DM, Nolan TW, & Whittington J. (2008). The triple aim: care, health, and cost. Health Aff (Millwood), 27(3), 759–69. 10.1377/hlthaff.27.3.759 [DOI] [PubMed] [Google Scholar]
  • 5.Castellon-Lopez Y, Skrine Jeffers K, Duru OK, Moreno G, Moin T, Grotts J, … Hays RD (2020). Psychometric Properties of the Altarum Consumer Engagement (ACE) Measure of Activation in Patients with Prediabetes. Journal of General Internal Medicine, 35(11), 3159–3165. 10.1007/s11606-020-05727-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Graffigna G, Barello S, Bonanomi A, & Lozza E. (2015). Measuring patient engagement: Development and psychometric properties of the Patient Health Engagement (PHE) Scale. Frontiers in Psychology, 6. 10.3389/fpsyg.2015.00274 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Wasson JH, & Coleman EA (2014). Health confidence: an essential measure for patient engagement and better practice. Fam Pract Manag, 21(5), 8–12. [PubMed] [Google Scholar]
  • 8.Gruman J, Rovner MH, French ME, Jeffress D, Sofaer S, Shaller D, & Prager DJ (2010). From patient education to patient engagement: implications for the field of patient education. Patient Educ Couns, 78(3), 350–6. 10.1016/j.pec.2010.02.002 [DOI] [PubMed] [Google Scholar]
  • 9.Tzeng HM, & Marcus Pierson J. (2017). Measuring patient engagement: which healthcare engagement behaviours are important to patients? J Adv Nurs, 73(7), 1604–1609. 10.1111/jan.13257 [DOI] [PubMed] [Google Scholar]
  • 10.Powell RE, Doty A, Casten RJ, Rovner BW, & Rising KL (2016). A qualitative analysis of interprofessional healthcare team members’ perceptions of patient barriers to healthcare engagement. BMC Health Services Research, 16. 10.1186/s12913-016-1751-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Tai-Seale M, Sullivan G, Cheney A, Thomas K, & Frosch DL (2016). The language of engagement: “Aha!” moments from engaging patients and community partners in two pilot projects of the patient-centered outcomes research institute. The Permanente Journal, 20(2), 89–92. 10.7812/TPP/15-123 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Grant RW, Altschuler A, Uratsu CS, Sanchez G, Schmittdiel JA, Adams AS, & Heisler M. (2017). Primary care visit preparation and communication for patients with poorly controlled diabetes: A qualitative study of patients and physicians. Prim Care Diabetes, 11(2), 148–153. 10.1016/j.pcd.2016.11.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Austin EJ, Lee JR, Bergstedt B, Mitchell AI, Javid SH, Ko CW, & Gore JL (2020). “Help me figure this out”: Qualitative explorations of patient experiences with cancer pathology reports. Patient Education and Counseling. 10.1016/j.pec.2020.07.020 [DOI] [PubMed] [Google Scholar]
  • 14.Smith SK, Dixon A, Trevena L, Nutbeam D, & McCaffery KJ (2009). Exploring patient involvement in healthcare decision making across different education and functional health literacy groups. Social Science & Medicine, 69(12), 1805–1812. 10.1016/j.socscimed.2009.09.056 [DOI] [PubMed] [Google Scholar]
  • 15.Bokhour BG, Cohn ES, Cortes DE, Solomon JL, Fix GM, Elwy AR, … Kressin NR (2012). The role of patients’ explanatory models and daily-lived experience in hypertension self-management. J Gen Intern Med, 27(12), 1626–34. 10.1007/s11606-012-2141-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Kimerling R, Lewis ET, Javier SJ, & Zulman DM (2019). Opportunity or Burden? A Behavioral Framework for Patient Engagement. Medical Care. 10.1097/MLR.0000000000001240 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Bombard Y, Baker GR, Orlando E, Fancott C, Bhatia P, Casalino S, … Pomey M-P (2018). Engaging patients to improve quality of care: a systematic review. Implementation Science, 13(1), 1–22. 10.1186/s13012-018-0784-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Bandura A. (2006). Guide for constructing self-efficacy scales. In Pajares F. & Urdan T. (Eds.), Self-efficacy beliefs of adolescents (pp. 307–337). Greenwich, Connecticut: Information Age Publishing. [Google Scholar]
  • 19.Bandura A. (2004). Health promotion by social cognitive means. Health Educ Behav, 31(2), 143–64. 10.1177/1090198104263660 [DOI] [PubMed] [Google Scholar]
  • 20.Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, … Group, P. C. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care, 45(5 Suppl 1), S22–31. 10.1097/01.mlr.0000250483.85507.04 [DOI] [PubMed] [Google Scholar]
  • 21.Cella D, Riley W, Stone A, Rothrock N, Reeve B, Yount S, … Group, P. C. (2010). The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol, 63(11), 1179–94. 10.1016/j.jclinepi.2010.04.011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.DeWalt DA, Rothrock N, Yount S, Stone AA, & PROMIS Cooperative Group. (2007). Evaluation of item candidates: the PROMIS qualitative item review. Medical care, 45(5 Suppl 1), S12–S21. 10.1097/01.mlr.0000254567.79743.e2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Clark LA, & Watson D. (1995). Constructing validity: basic issues in objective scale development. Psychological Assessment, 7(3), 309–319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Fihn SD, Francis J, Clancy C, Nielson C, Nelson K, Rumsfeld J, … Graham GL (2014). Insights From Advanced Analytics At The Veterans Health Administration. Health Affairs, 33(7), 1203–1211. 10.1377/hlthaff.2014.0054 [DOI] [PubMed] [Google Scholar]
  • 25.Wong ES, Wang V, Liu CF, Hebert PL, & Maciejewski ML (2015). Do Veterans Health Administration Enrollees Generalize to Other Populations? Med Care Res Rev. 10.1177/1077558715617382 [DOI] [PubMed] [Google Scholar]
  • 26.Hernandez SE, Sylling PW, Mor MK, Fine MJ, Nelson KM, Wong ES, … Hebert PL (2019). Developing an Algorithm for Combining Race and Ethnicity Data Sources in the Veterans Health Administration. Military Medicine, 185(3–4), e495–e500. 10.1093/milmed/usz322 [DOI] [PubMed] [Google Scholar]
  • 27.Nelson KM, Chang ET, Zulman DM, Rubenstein LV, Kirkland FD, & Fihn SD (2019). Using Predictive Analytics to Guide Patient Care and Research in a National Health System. Journal of General Internal Medicine, 34(8), 1379–1380. 10.1007/s11606-019-04961-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Wagner TH, Almenoff P, Francis J, Jacobs J, & Pal Chee C. (2018). Assessment of the Medicare Advantage Risk Adjustment Model for Measuring Veterans Affairs Hospital Performance. JAMA network open, 1(8), e185993. 10.1001/jamanetworkopen.2018.5993 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Gruber-Baldini AL, Velozo C, Romero S, & Shulman LM (2017). Validation of the PROMIS® measures of self-efficacy for managing chronic conditions. Quality of Life Research, 26(7), 1915–1924. 10.1007/s11136-017-1527-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Dyer N, Sorra JS, Smith SA, Cleary PD, & Hays RD (2012). Psychometric Properties of the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Clinician and Group Adult Visit Survey. Medical Care, 50(11), 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Kimerling R, Pavao J, Greene L, Karpenko J, Rodriguez A, Saweikis M, & Washington DL (2015). Access to mental health care among women Veterans: is VA meeting women’s needs? Med Care, 53(4 Suppl 1), S97–S104. 10.1097/MLR.0000000000000272 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Bollen K, & Lennox R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305–314. 10.1037/0033-2909.110.2.305 [DOI] [Google Scholar]
  • 33.Streiner DL, Norman GR, & Cairney J. (2015). Health measurement scales: a practical guide to their development and use. Oxford University Press, USA. [Google Scholar]
  • 34.Revelle W. (1979). Hierarchical Cluster Analysis And The Internal Structure Of Tests. Multivariate Behav Res, 14(1), 57–74. 10.1207/s15327906mbr1401_4 [DOI] [PubMed] [Google Scholar]
  • 35.Reise SP, Morizot J, & Hays RD (2007). The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Quality of Life Research, 16(1), 19–31. 10.1007/s11136-007-9183-7 [DOI] [PubMed] [Google Scholar]
  • 36.Reise SP, Bonifay WE, & Haviland MG (2013). Scoring and modeling psychological measures in the presence of multidimensionality. Journal of Personality Assessment, 95(2), 129–140. 10.1080/00223891.2012.725437 [DOI] [PubMed] [Google Scholar]
  • 37.Chapman BP, Weiss A, & Duberstein P. (2016). Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development. Psychological methods, 21(4), 603–620. 10.1037/met0000088 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Reise SP, Moore TM, & Haviland MG (2010). Bifactor models and rotations: exploring the extent to which multidimensional data yield univocal scale scores. J Pers Assess, 92(6), 544–59. 10.1080/00223891.2010.496477 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Smits N, van der Ark LA, & Conijn JM (2018). Measurement versus prediction in the construction of patient-reported outcome questionnaires: can we have our cake and eat it? Quality of Life Research, 27(7), 1673–1682. 10.1007/s11136-017-1720-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Elleman LG, McDougald SK, Condon DM, & Revelle W. (in press). That takes the BISCUIT: A comparative study of predictive accuracy and parsimony of four statistical learning techniques in personality data, with data missingness conditions. European Journal of Personality Assessment. 10.31234/osf.io/tuqap [DOI] [Google Scholar]
  • 41.Samejima F. (1968). Estimation of latent ability using a response pattern of graded scores. ETS Research Bulletin Series, 1968(1), i–169. 10.1002/j.2333-8504.1968.tb00153.x [DOI] [Google Scholar]
  • 42.Gibbons RD, Bock RD, Hedeker D, Weiss DJ, Segawa E, Bhaumik DK, … Stover A. (2007). Full-Information Item Bifactor Analysis of Graded Response Data. Applied Psychological Measurement, 31(1), 4–19. 10.1177/0146621606289485 [DOI] [Google Scholar]
  • 43.Toland MD, Sulis I, Giambona F, Porcu M, & Campbell JM (2017). Introduction to bifactor polytomous item response theory analysis. Journal of School Psychology, 60, 41–63. 10.1016/j.jsp.2016.11.001 [DOI] [PubMed] [Google Scholar]
  • 44.Stucky BD, Thissen D, & Edelen MO (2013). Using logistic approximations of marginal trace lines to develop short assessments. Applied Psychological Measurement, 37(1), 41–57. 10.1177/0146621612462759 [DOI] [Google Scholar]
  • 45.Choi SW, Gibbons LE, & Crane PK (2011). lordif : An R Package for Detecting Differential Item Functioning Using Iterative Hybrid Ordinal Logistic Regression/Item Response Theory and Monte Carlo Simulations. Journal of Statistical Software, 39(8). 10.18637/jss.v039.i08 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Hays RD, Calderón JL, Spritzer KL, Reise SP, & Paz SH (2018). Differential item functioning by language on the PROMIS® physical functioning items for children and adolescents. Quality of Life Research, 27(1), 235–247. 10.1007/s11136-017-1691-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Revelle WR (2017). psych: Procedures for personality and psychological research. [Google Scholar]
  • 48.Rosseel Y. (2012). lavaan : An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2). 10.18637/jss.v048.i02 [DOI] [Google Scholar]
  • 49.Chalmers RP (2012). mirt : A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6). 10.18637/jss.v048.i06 [DOI] [Google Scholar]
  • 50.Bock RD, & Mislevy RJ (1982). Adaptive EAP Estimation of Ability in a Microcomputer Environment. Applied Psychological Measurement, 6(4), 431–444. 10.1177/014662168200600405 [DOI] [Google Scholar]
  • 51.Koh HK, Brach C, Harris LM, & Parchman ML (2013). A proposed “health literate care model” would constitute a systems approach to improving patients’ engagement in care. Health Aff (Millwood), 32(2), 357–67. 10.1377/hlthaff.2012.1205 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Roberts ET, Zaslavsky AM, Barnett ML, Landon BE, Ding L, & McWilliams JM (2018). Assessment of the Effect of Adjustment for Patient Characteristics on Hospital Readmission Rates: Implications for Pay for Performance. JAMA internal medicine, 178(11), 1498–1507. 10.1001/jamainternmed.2018.4481 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Shakir M, Armstrong K, & Wasfy JH (2018). Could Pay-for-Performance Worsen Health Disparities? J Gen Intern Med, 33(4), 567–569. 10.1007/s11606-017-4243-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Hausmann LRM, Canamucio A, Gao S, Jones AL, Keddem S, Long JA, & Werner R. (2017). Racial and Ethnic Minority Concentration in Veterans Affairs Facilities and Delivery of Patient-Centered Primary Care. Population Health Management, 20(3), 189–198. 10.1089/pop.2016.0053 [DOI] [PubMed] [Google Scholar]
  • 55.Ginsburg PB (2013). Achieving health care cost containment through provider payment reform that engages patients and providers. Health Affairs (Project Hope), 32(5), 929–934. 10.1377/hlthaff.2012.1007 [DOI] [PubMed] [Google Scholar]
