Abstract
Objective:
Low health literacy is a concern among US Veterans. In this study, we evaluated NoteAid, a system that provides lay definitions for medical jargon terms in EHR notes, to help Veterans comprehend those notes. We expected that low initial scores for Veterans would be improved by using NoteAid.
Materials and Methods:
We recruited Veterans from the Amazon Mechanical Turk (MTurk) crowd work platform, along with non-Veterans from MTurk as a control group for comparison. Veteran participants were randomly split into control and intervention groups, while non-Veteran participants were recruited into mutually exclusive control or intervention tasks on the MTurk platform. We showed participants de-identified EHR notes and asked them to answer comprehension questions about the notes. Participants in the intervention group saw EHR note content processed with NoteAid; NoteAid was not available to participants in the control group.
Results:
We recruited 94 Veterans and 181 non-Veterans. NoteAid led to a significant improvement in comprehension scores for non-Veterans but not for Veterans. Comparing Veterans recruited via MTurk with non-Veterans recruited via MTurk, we found that without NoteAid, Veterans have significantly higher raw scores than non-Veterans. This difference is not significant with NoteAid.
Discussion:
That Veterans outperform a comparable population of non-Veterans is a surprising outcome. Without NoteAid, Veterans' scores on the test are already high, thereby minimizing the ability of an intervention such as NoteAid to improve performance. The health literacy of Veterans has been an open question; we show here that Veterans score higher than a comparable non-Veteran population.
Conclusion:
Veterans on MTurk do not see improved scores when using NoteAid, but they already score high on the test, significantly higher than non-Veterans. When evaluating NoteAid, population specifics need to be considered, as performance may vary across groups. Future work is needed to investigate the effectiveness of NoteAid with local Veteran populations and to develop a more difficult test for assessing groups with higher health literacy.
Keywords: Health literacy, Health information technology, Electronic health records
1. INTRODUCTION
1.1. Background and Significance
Health literacy is defined as “how well an individual understands their health condition and information provided by health care providers” [1]. Low health literacy is a widespread issue among patients, especially among Veterans of the United States Military (“Veterans”) who face physical, mental, and social issues that make them a vulnerable population [2]. Low health literacy has been called a “major problem” and has been linked to health issues for patients and higher costs for healthcare centers [3]. Patients with low health literacy are at increased risk of adverse health outcomes. Low health literacy has been associated with higher mortality in the elderly, less awareness of current health conditions, and increased fear of cancer progression [4]–[9]. One study estimates that only 45% of Veterans have adequate health literacy [2].
Identifying patients with low health literacy and providing resources they can use to improve their understanding are critical areas in population health research [10]–[12]. If a patient is unable to understand the details of their medical history or current condition, they may not be able to take steps to improve the condition and may not follow up as recommended [5]. Low health literacy has been associated with a variety of adverse effects for patients [3]–[7], [9].
One area of health literacy that has become more critical in recent years is eHealth literacy [13], [14]. eHealth literacy has been defined as “the ability to seek, find, understand, and appraise health information from electronic sources and apply the knowledge gained to addressing or solving a health problem” [15]. With the recent mandate granting patients online access to their medical records via patient portals, there is a growing need for eHealth literacy measurement instruments and interventions to improve eHealth literacy. One assessment tool is the ComprehENotes test [16], which tests the ability of patients to understand the free-text notes in electronic health records (EHRs). ComprehENotes questions were created by physicians and medical researchers and evaluated using Item Response Theory (IRT) [17]. Subsequent research demonstrated that NoteAid, an educational intervention tool that automatically defines medical terms, improved patient comprehension as measured using ComprehENotes [18]–[20]. NoteAid augments a key digital health intervention [21], namely EHRs, and ComprehENotes is a targeted assessment of computer and health literacy, placing these two tools firmly within the realm of eHealth literacy [15].
Veterans are an understudied group concerning health literacy [2], [22], [23]. While prior work has studied the use of eHealth tools by Veterans [23]–[27], to our knowledge, no study of eHealth literacy interventions has focused directly on Veterans’ comprehension of their EHR notes. In addition, the question of Veterans having higher or lower eHealth literacy than the general population has not been studied. Prior work comparing Veterans to non-Veterans regarding health numeracy has been mixed [5], [28].
Veterans are a vulnerable population, with a number of demographic and socioeconomic factors that put them at risk of poor health outcomes [2], [29]. 40% of Veterans report poor or inadequate health literacy [29]. Prior work showing that NoteAid improves comprehension across a number of groups [20] suggests that it should also be helpful for Veterans. We expect low scores for Veterans without NoteAid, and significant improvements with NoteAid. Our goal with this work is to answer two research questions related to Veterans:
RQ1: Are Veterans’ scores on ComprehENotes quantitatively different from non-Veterans’?
RQ2: Do Veterans benefit from tools such as NoteAid in the same way that non-Veterans do?
2. MATERIALS AND METHODS
The IRB at the University of Massachusetts approved the work in this study. All participants provided informed consent before participating.
2.1. NoteAid
NoteAid is a natural language processing system for identifying medical terminology in EHR notes and providing lay-language definitions. NoteAid consists of the CoDeMed repository of lay definitions for medical jargon [18] and the MedLink system for matching definitions to jargon terms in EHR notes. Users can enter text into NoteAid. The text is processed and displayed to the user with definitions embedded as tooltip text. Users can hover their mouse over terms to view definitions (Figure 1).
Figure 1.

An example of medical terminology definition using NoteAid. Terms that are defined by NoteAid are highlighted in blue. When a user hovers over a highlighted term with their mouse, the definition is presented as tooltip text. In this example, the definition of ESG is shown in the gray pop-up bubble.
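The matching-and-tooltip behavior described above can be sketched in a few lines of Python. The dictionary entries, function name, and HTML markup below are illustrative assumptions for exposition only, not NoteAid's actual CoDeMed/MedLink implementation:

```python
import re

# Hypothetical lay-definition entries; the real CoDeMed repository is far larger.
LAY_DEFINITIONS = {
    "hypertension": "high blood pressure",
    "dyspnea": "shortness of breath",
}

def embed_definitions(note_text: str) -> str:
    """Wrap each known jargon term in a tooltip span (longest terms first,
    so multi-word terms are not clobbered by shorter substrings)."""
    result = note_text
    for term in sorted(LAY_DEFINITIONS, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        result = pattern.sub(
            lambda m: f'<span class="noteaid" title="{LAY_DEFINITIONS[term]}">{m.group(0)}</span>',
            result,
        )
    return result
```

In the actual system, the `title` attribute (or equivalent tooltip text) is what the browser displays when the user hovers over a highlighted term, as in Figure 1.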
2.2. ComprehENotes
The ComprehENotes test is the first test to directly measure people’s ability to comprehend EHR notes [16]. Prior work used the ComprehENotes test to demonstrate that the NoteAid medical term definition system effectively improves note comprehension [19], [20]. The ComprehENotes test consists of 14 passages taken from de-identified EHR notes. Each of these passages contains a section in boldface and is followed by three options that attempt to paraphrase or explain the bolded section. The test takers are asked to identify the best explanation from the options. A group of physicians and non-clinical medical researchers wrote the passages and options in the ComprehENotes test using Sentence Verification Technique (SVT) [30], [31]. SVT is a technique for assessing reading comprehension where individuals read a snippet of text and select from multiple options the text that has the same meaning. Prior work analyzed these questions using IRT to examine the psychometric properties of the items and the overall test using responses collected from the Amazon Mechanical Turk crowdsourcing platform (MTurk) [16]. The MTurk platform facilitates participant recruitment for user studies, data annotation, and other research tasks [32]–[34]. MTurk has been used to successfully and effectively obtain data in various research areas [35], and allows for attention-check questions to ensure that participants provide high-quality data [36]. MTurk also offers a more diverse set of participants than student recruitment or local community recruitment [32], [37]–[39]. Participants received the 14 passage-question pairs one at a time, and we collected responses directly on our server.
2.2.1. Web implementation of ComprehENotes
In this work, we implemented the ComprehENotes test as a web application. This implementation allowed for flexibility in delivery and intervention modifications. The control version of the test showed users a brief introductory paragraph and collected demographic information. We then administered the test one question at a time in randomized order. The test included quality control questions to ensure that MTurk respondents completed the task to the best of their ability.
As in our prior work, we implemented the NoteAid web interface as a built-in feature of the ComprehENotes test to remove the need for the user to navigate to the NoteAid page to obtain definitions. We instead defined the terms directly on the ComprehENotes page for ease of use.
In the intervention arm, we pre-processed each ComprehENotes passage with NoteAid, and embedded the results into the test web application directly. We added definitions of terms to the web application as tooltip text. We underlined terms with available definitions. The definition would automatically display when a user hovered over a defined term (Figure 2). The introductory text described this behavior so that the individuals were aware of the definitions and knew how to access them.
Figure 2.

Example of a ComprehENotes test question with embedded NoteAid definitions.
2.3. Data Collection
2.3.1. Veteran Recruitment
To recruit Veterans on MTurk, we used a previously developed screening tool for identifying Veterans [40]. The screener consists of several questions targeted toward individuals with military experience and two attention-check questions to ensure that workers are paying attention (Figure 3). While non-Veterans may answer these questions correctly, it is improbable. Prior work has shown that this screener is effective for recruiting Veterans on MTurk [40].
Figure 3.

Screenshot showing the Veteran screener as adapted from the literature. The screener is dynamic; in this example, we show the possible ranks for question 5 when “Army” is selected for question 1 [40].
We implemented the screener as part of the ComprehENotes web application. The screener is dynamic, in particular for question 5, which asks the user to order enlisted ranks by seniority. Because enlisted ranks vary by military branch, the app automatically populates the appropriate ranks for sorting based on the branch selected in question 1.* Potential workers were first shown the Veteran screener for this specific recruitment task. As in prior studies, workers who answered four or more of the non-attention-check questions correctly could continue to the task as described above [40]; otherwise, the program informed them that they did not qualify for this particular task. Workers who answered both attention-check questions incorrectly could not continue. After passing the screener, workers were randomly assigned to either the control group or the NoteAid treatment group.
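The screener's pass rule and dynamic rank population can be sketched as follows. The rank lists and function names are illustrative assumptions, not the exact screener content from [40]:

```python
# Illustrative subset of enlisted ranks by branch; the real screener
# populates the full rank list for the branch chosen in question 1.
RANKS_BY_BRANCH = {
    "Army": ["Private", "Corporal", "Sergeant", "Sergeant Major"],
    "Navy": ["Seaman", "Petty Officer", "Chief Petty Officer"],
}

def ranks_for_branch(branch: str) -> list[str]:
    """Dynamically populate the rank-ordering question (question 5)."""
    return RANKS_BY_BRANCH.get(branch, [])

def passes_screener(num_knowledge_correct: int, num_attention_correct: int) -> bool:
    """Pass rule as described: at least 4 of the non-attention-check
    questions correct, and not both attention checks missed."""
    return num_knowledge_correct >= 4 and num_attention_correct >= 1
```

For example, a worker who answered five knowledge questions correctly but missed both attention checks would not pass, while one with four knowledge questions and one attention check correct would continue to the task.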
2.3.2. Non-Veteran Recruitment
To compare Veterans to a non-Veteran population, we employed data collected from workers from the general MTurk population. While MTurk is not an ideal venue for testing individuals with low eHealth literacy [16], its use ensures a fair comparison between Veterans and non-Veterans for this part of the study by removing recruitment location as a confounding factor. We recruited non-Veterans from MTurk in sequential batches. First, we recruited MTurk workers for the control group. Once the control group task was completed, we recruited the NoteAid treatment group. MTurk workers who participated in the control group were excluded from participating in the treatment group.
2.4. Data Analysis
We fit the data to a generalized linear mixed model (GLMM), where we modeled the individual participants’ total scores from the 14 items with a binomial distribution. The probability of correctly answering a question is linked through a logistic link function to the four groups defined by crossing Veteran status (Veteran and non-Veteran) and condition (with and without NoteAid). We model individual differences in this probability within the groups through a random effect.
To address the first research question, we test the difference between Veterans and non-Veterans under the control and the NoteAid conditions separately. To address the second research question, we test the difference between the NoteAid and the control conditions for Veterans and non-Veterans separately and then test whether this intervention effect differs across Veteran and non-Veteran cohorts. We use Bonferroni correction for multiplicity control of all five tests. We repeated these tests with demographic variables showing significant variations across groups included in the GLMM.
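The Bonferroni step above can be illustrated with a short sketch. The raw p-values here are made up for illustration; the adjustment is simply p_adj = min(1, m·p) for m = 5 planned tests:

```python
def bonferroni_adjust(p_values):
    """Bonferroni correction: multiply each raw p-value by the number
    of tests and cap the result at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Illustrative (made-up) raw p-values for the five planned comparisons.
raw = [0.004, 0.30, 0.008, 0.25, 0.40]
adjusted = bonferroni_adjust(raw)
# e.g. the first test: 0.004 * 5 = 0.02, still below the 0.05 threshold
```

A test is then declared significant only if its adjusted p-value remains below the chosen alpha level (0.05 throughout the Results section).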
3. RESULTS
3.1. Demographics
Table 1 shows the demographic differences between the Veteran and non-Veteran cohorts on MTurk, and between the control and NoteAid conditions within each of the two cohorts.
Table 1.
Demographic information of study participants.
| Characteristic | Veteran Control (N=54) | Veteran NoteAid (N=40) | Veteran Total (N=94) | Non-Veteran Control (N=92) | Non-Veteran NoteAid (N=89) | Non-Veteran Total (N=181) | Overall (N=275) |
|---|---|---|---|---|---|---|---|
| Age | |||||||
| 18–21 | 0 (0%) | 1 (2.5%) | 1 (1.1%) | 2 (2.2%) | 1 (1.1%) | 3 (1.7%) | 4 (1.5%) |
| 22–34 | 21 (38.9%) | 16 (40.0%) | 37 (39.4%) | 50 (54.3%) | 55 (61.8%) | 105 (58.0%) | 142 (51.6%) |
| 35–44 | 18 (33.3%) | 14 (35.0%) | 32 (34.0%) | 25 (27.2%) | 21 (23.6%) | 46 (25.4%) | 78 (28.4%) |
| 45–54 | 11 (20.4%) | 6 (15.0%) | 17 (18.1%) | 10 (10.9%) | 8 (9.0%) | 18 (9.9%) | 35 (12.7%) |
| 55–64 | 2 (3.7%) | 2 (5.0%) | 4 (4.3%) | 5 (5.4%) | 3 (3.4%) | 8 (4.4%) | 12 (4.4%) |
| 65 and over | 2 (3.7%) | 1 (2.5%) | 3 (3.2%) | 0 (0%) | 1 (1.1%) | 1 (0.6%) | 4 (1.5%) |
| Education | |||||||
| High School | 10 (18.5%) | 5 (12.5%) | 15 (16.0%) | 23 (25.0%) | 26 (29.2%) | 49 (27.1%) | 64 (23.3%) |
| Associates | 14 (25.9%) | 7 (17.5%) | 21 (22.3%) | 16 (17.4%) | 20 (22.5%) | 36 (19.9%) | 57 (20.7%) |
| Bachelors | 26 (48.1%) | 20 (50.0%) | 46 (48.9%) | 47 (51.1%) | 39 (43.8%) | 86 (47.5%) | 132 (48.0%) |
| Masters | 4 (7.4%) | 8 (20.0%) | 12 (12.8%) | 6 (6.5%) | 4 (4.5%) | 10 (5.5%) | 22 (8.0%) |
| Race | |||||||
| African American | 4 (7.4%) | 4 (10.0%) | 8 (8.5%) | 8 (8.7%) | 9 (10.1%) | 17 (9.4%) | 25 (9.1%) |
| Asian | 3 (5.6%) | 1 (2.5%) | 4 (4.3%) | 7 (7.6%) | 1 (1.1%) | 8 (4.4%) | 12 (4.4%) |
| Hispanic | 3 (5.6%) | 5 (12.5%) | 8 (8.5%) | 15 (16.3%) | 6 (6.7%) | 21 (11.6%) | 29 (10.5%) |
| White | 44 (81.5%) | 30 (75.0%) | 74 (78.7%) | 62 (67.4%) | 73 (82.0%) | 135 (74.6%) | 209 (76.0%) |
| Gender | |||||||
| Female | 14 (25.9%) | 15 (37.5%) | 29 (30.9%) | 37 (40.2%) | 40 (44.9%) | 77 (42.5%) | 106 (38.5%) |
| Male | 40 (74.1%) | 24 (60.0%) | 64 (68.1%) | 55 (59.8%) | 49 (55.1%) | 104 (57.5%) | 168 (61.1%) |
| Other | 0 (0%) | 1 (2.5%) | 1 (1.1%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (0.4%) |
| Profession | |||||||
| Medical Student | 1 (1.9%) | 0 (0%) | 1 (1.1%) | 3 (3.3%) | 2 (2.2%) | 5 (2.8%) | 6 (2.2%) |
| Other Role in Healthcare | 10 (18.5%) | 6 (15.0%) | 16 (17.0%) | 8 (8.7%) | 11 (12.4%) | 19 (10.5%) | 35 (12.7%) |
| Nurse | 2 (3.7%) | 1 (2.5%) | 3 (3.2%) | 2 (2.2%) | 5 (5.6%) | 7 (3.9%) | 10 (3.6%) |
| Physician | 1 (1.9%) | 3 (7.5%) | 4 (4.3%) | 4 (4.3%) | 0 (0%) | 4 (2.2%) | 8 (2.9%) |
| Other | 40 (74.1%) | 30 (75.0%) | 70 (74.5%) | 75 (81.5%) | 71 (79.8%) | 146 (80.7%) | 216 (78.5%) |
To investigate whether Veteran and non-Veteran groups differ in their demographics, we conducted χ2 tests of independence with Monte Carlo p-value for gender, race, and profession; we used generalized linear models to test age and education’s linear and nonlinear components in predicting Veteran status. Before these tests, we combined extreme categories of age with their adjacent categories, we gave the remaining categories scores of 26.5, 40, 50, and 60, we dichotomized professions as medical and non-medical professions, and we removed the case with non-binary gender. “Other Role in Healthcare” represents respondents who work in another profession in healthcare. For multiplicity control, we used Bonferroni adjustment proportionally with df of the tests. We detected a linear age effect (adjusted p<0.05). In addition, we fit a logistic regression model with all demographic variables dichotomized and their interactions included to predict Veteran status. We found that this model was significantly different from a model with intercept only (p<0.001). These results suggest that Veterans and non-Veterans differ in their demographic distributions, and the difference lies in that the Veteran group is older than the non-Veteran group.
We conducted similar analyses to test whether participants in the control and NoteAid conditions differ in their demographics for the Veteran group. We detected no significant effect. This result confirms the successful randomization of the Veterans. We obtained similar results for the non-Veterans even though they were not randomized but were instead recruited in sequential, mutually exclusive batches.
3.2. Statistical Analysis (without Adjustment for Demographic Variables)
Table 2 shows the summary statistics from the GLMM. We observe that non-Veteran performance with NoteAid is similar to that of Veterans under both the control and the NoteAid conditions; the non-Veteran control performance appears lower than the other three groups.
Table 2:
Summary statistics for the four groups from the GLMM. The group mean logits are the fixed effects in the model, and they correspond to the median proportion of correct responses. The random effect has an estimated SD of 1.327. Note the CIs are one-at-a-time CIs.
| Groups | Mean logit: Estimate | Mean logit: SE | Median proportion correct: Estimate (95% CI) |
|---|---|---|---|
| Non-Veteran control | 1.615 | 0.166 | 0.83 (0.78, 0.87) |
| Non-Veteran NoteAid | 2.281 | 0.179 | 0.91 (0.87, 0.93) |
| Veteran control | 2.370 | 0.231 | 0.91 (0.87, 0.94) |
| Veteran NoteAid | 2.558 | 0.272 | 0.93 (0.88, 0.96) |
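The mapping from the group mean logits in Table 2 to the median proportions of correct responses is the inverse logistic link, which can be checked directly:

```python
import math

def logit_to_prob(logit: float) -> float:
    """Inverse logistic link: convert a group mean logit to the
    median proportion of correct responses."""
    return 1.0 / (1.0 + math.exp(-logit))

# Group mean logits from Table 2
groups = {
    "Non-Veteran control": 1.615,
    "Non-Veteran NoteAid": 2.281,
    "Veteran control": 2.370,
    "Veteran NoteAid": 2.558,
}
for name, logit in groups.items():
    print(f"{name}: {logit_to_prob(logit):.2f}")
# prints 0.83, 0.91, 0.91, 0.93 — matching the Estimate column of Table 2
```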
Table 3 contains the five comparisons conducted to address the two research questions.
Table 3:
Summary of comparisons among the four groups. Cohen’s d is the estimated fixed effect divided by the estimated SD of the random effect (1.327). Difference in effects is the interaction contrast between cohort and intervention. CIs are simultaneous.
| Comparisons | Mean logit: Estimate | Mean logit: SE | Cohen’s d | Odds ratio: Estimate | Odds ratio: 95% CI |
|---|---|---|---|---|---|
| Intervention effect in non-Veterans | 0.667 | 0.240 | 0.50 | 1.95 | (1.05, 3.61) |
| Intervention effect in Veterans | 0.188 | 0.352 | 0.14 | 1.21 | (0.49, 2.99) |
| Cohort effect without NoteAid | 0.756 | 0.280 | 0.57 | 2.13 | (1.04, 4.38) |
| Cohort effect with NoteAid | 0.277 | 0.321 | 0.21 | 1.32 | (0.58, 3.01) |
| Difference in effects | 0.479 | 0.426 | 0.36 | 1.61* | (0.54, 4.84) |
* This is the ratio of two odds ratios.
Regarding the effect of NoteAid: it improves the median performance of non-Veterans, increasing the odds of correctly answering a question by 95% (adjusted p<0.05) and increasing the group mean logit by 0.5 group SD. In contrast, the effect of NoteAid on the median performance of Veterans is not significant (OR = 1.21, Cohen’s d = 0.14, adjusted p>0.05). While this appears to suggest a differential effect of NoteAid across Veteran and non-Veteran groups, the difference in effects is not significant (ratio of ORs = 1.61, Cohen’s d = 0.36, adjusted p>0.05). With regard to our second research question: the difference in the effects of NoteAid between the Veteran and non-Veteran groups is not significant.
Regarding the cohort difference, Veterans and non-Veterans show a significant difference in performance without NoteAid: the median Veteran has more than twice the odds of correctly answering a question compared with the median non-Veteran, and their logit is more than half a group SD higher (adjusted p<0.05). In contrast, with NoteAid, the two cohorts do not show a significant difference in performance (OR = 1.32, Cohen’s d = 0.21, adjusted p>0.05). To answer our first research question: without NoteAid there is a significant difference in performance between Veterans and non-Veterans; with NoteAid, that difference is no longer significant.
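The odds ratios and Cohen's d values reported in Table 3 follow directly from the mean-logit estimates and the estimated random-effect SD (1.327); a quick check:

```python
import math

SD_RANDOM_EFFECT = 1.327  # estimated SD of the GLMM random effect

def logit_to_or(logit_diff: float) -> float:
    """Convert a difference in mean logits to an odds ratio."""
    return math.exp(logit_diff)

def cohens_d(logit_diff: float) -> float:
    """Cohen's d as defined in Table 3: fixed effect / random-effect SD."""
    return logit_diff / SD_RANDOM_EFFECT

# Intervention effect in non-Veterans (Table 3): 0.667 in mean logit
print(round(logit_to_or(0.667), 2))  # 1.95
print(round(cohens_d(0.667), 2))     # 0.5
# Cohort effect without NoteAid: 0.756 in mean logit
print(round(logit_to_or(0.756), 2))  # 2.13
print(round(cohens_d(0.756), 2))     # 0.57
```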
3.3. Statistical Analysis with Linear Trends of Age and Education Adjusted
Due to the significant age difference between the Veteran and non-Veteran groups, we further investigated whether any of the results above can be attributed to that age difference. Education would also differ significantly between the two groups without multiplicity adjustment, so we investigated it as well. As in the earlier analysis of demographic differences, we combined extreme categories of age with their adjacent categories and gave the remaining categories scores of 26.5, 40, 50, and 60; we coded education levels with equally spaced values.
We then added these variables to the GLMM. Age is a significant predictor: an increase of 10 years of age is associated with a 0.41 increase in logit, or a 50% increase in odds (p<0.05). Education is not significant (p>0.05).
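As a quick check of the reported age effect, exponentiating the logit increase recovers the odds multiplier:

```python
import math

# Age effect from the adjusted GLMM: +0.41 in logit per 10 years of age.
odds_multiplier = math.exp(0.41)
# odds_multiplier is approximately 1.51, i.e. roughly a 50% increase in odds
```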
Table 4 contains the same five comparisons as Table 3, now with age and education adjusted. Again, NoteAid improves the median performance of non-Veterans, almost doubling the odds of correctly answering a question (adjusted p<0.05) and increasing the group mean logit by more than half of the group SD. In contrast, the effect of NoteAid on the median performance of Veterans is not significant (OR = 1.29, Cohen’s d = 0.19, adjusted p>0.05). While this appears to suggest a differential effect of NoteAid across Veteran and non-Veteran groups, the difference in effects is not significant (ratio of ORs = 1.53, Cohen’s d = 0.32, adjusted p>0.05). These conclusions are qualitatively unchanged from Table 3; in particular, there remains a significant improvement for non-Veterans when using NoteAid.
Table 4:
Summary of comparisons among the four groups. Cohen’s d is the estimated fixed effect divided by the estimated SD of the random effect (1.327) from the GLMM without demographic adjustment. Difference in effects is the interaction contrast between cohort and intervention. CIs are simultaneous.
| Comparisons | Mean logit: Estimate | Mean logit: SE | Cohen’s d | Odds ratio: Estimate | Odds ratio: 95% CI |
|---|---|---|---|---|---|
| Intervention effect in non-Veterans | 0.686 | 0.231 | 0.52 | 1.99 | (1.09, 3.60) |
| Intervention effect in Veterans | 0.258 | 0.342 | 0.19 | 1.29 | (0.54, 3.12) |
| Cohort effect without NoteAid | 0.601 | 0.272 | 0.45 | 1.82 | (0.91, 3.68) |
| Cohort effect with NoteAid | 0.172 | 0.317 | 0.13 | 1.19 | (0.52, 2.69) |
| Difference in effects | 0.428 | 0.414 | 0.32 | 1.53* | (0.53, 4.46) |
* This is the ratio of two odds ratios.
Regarding the cohort difference, Veterans and non-Veterans no longer show a significant difference in performance without NoteAid (OR = 1.82, Cohen’s d = 0.45, adjusted p>0.05), nor with NoteAid (OR = 1.19, Cohen’s d = 0.13, adjusted p>0.05). Regarding our first research question: with age and education adjusted, there is no significant difference in performance between the Veteran and non-Veteran cohorts, with or without NoteAid.
4. DISCUSSION
Our results confirm prior work demonstrating the effectiveness of NoteAid in improving EHR note comprehension [11], [19]. Giving participants access to lay definitions of medical terms improves comprehension among non-Veterans. For Veterans, however, this effect is not significant. This is a surprising result, especially given our expectations based on the low health literacy of Veterans reported in prior work [29]. The null result is likely due to a ceiling effect: Veterans' scores without NoteAid are already so high that an intervention can have only a limited impact on performance.
Our data do not provide conclusive evidence for any differential effect of NoteAid across Veteran and non-Veteran groups. That said, our results comparing Veterans to non-Veterans found that without NoteAid, Veterans significantly outperform non-Veterans. This result could be due to age differences. Our demographic analyses indicated that the Veteran group was significantly older than the non-Veteran group. The majority of non-Veterans were between 22 and 34 years old. There were more Veterans in the 35–44 and 45–54 age buckets. While research shows that health literacy and age are negatively correlated in the elderly [41], it could be that middle-aged Veterans have more experience with the healthcare system than the younger non-Veterans. When we adjust for age and education, the difference between cohorts without NoteAid is no longer significant.
Demonstrating that Veterans’ health literacy differs significantly from non-Veterans’ is an important result. By recruiting both Veterans and non-Veterans from MTurk, we removed recruitment location as a confounding factor. Therefore, we assume that the differences in performance can be attributed to the difference between the Veteran and non-Veteran populations. Identifying differences between Veterans and non-Veterans in health literacy validates the need for research into Veteran-specific assessments and interventions for low health literacy. However, the observed ceiling effect suggests that future work should recruit Veterans at US Department of Veterans Affairs (VA) clinics. More research is needed to determine whether the high scores are an artifact of the Veterans’ ability to access and take the test on an online platform such as MTurk.
In this work, we utilized a previously developed screener for Veterans [40]. To the best of our knowledge, this screener is not widely used in the literature. However, we found it useful for filtering out non-Veteran MTurk workers during the Veteran recruitment portion of our study. While there is a risk that non-Veterans could pass themselves off as Veterans in order to complete the task, we believe this is unlikely. This study shows that the screener is effective and can be used in future studies.
4.1. Limitations
There are several limitations to this study. The first is the limited information we have about the Veteran participants. For example, we do not know how long it has been since participants were discharged from service, how long they served, or what they did post-discharge. We were, however, able to collect a number of demographic characteristics. Because participants recruited from MTurk remain anonymous, we cannot obtain their health records. To recruit enough Veterans for our analyses, we relied on MTurk and the web screener as in prior work. Future work analyzing the effect of NoteAid on local Veteran populations may be able to leverage health records at the VA for more detailed demographic information.
We acknowledge that recruiting Veterans from MTurk may skew results toward those individuals who can navigate the MTurk web page. However, we account for this by collecting responses from a comparable group of non-Veterans via the Mechanical Turk platform. This way, both cohorts have a similar degree of technical competency, and our base population is consistent, except for the fact that one group is Veterans and one is not. While there could be more fine-grained considerations of what may or may not increase an individual’s note comprehension for both Veterans and non-Veterans, our data analyses accounted for between-group differences in demographics. Our main goal with this study was to determine whether there was a significant difference in the effectiveness of NoteAid between Veterans and non-Veterans.
Our comparison of the two groups shows that the Veterans group is older than the non-Veterans group, but the other demographic characteristics are similar. Age may be a factor that affects participants’ performance, which is worth investigating in future work. Our conclusions regarding Veteran status should be understood as a description of the difference between the Veteran and non-Veteran groups accessible online, not as evidence of a causal relationship between serving in the army and eHealth literacy.
Our study population’s education levels are higher than those of the broader population, likely because MTurk workers are, on average, more highly educated than the general population. Since we recruited both Veterans and non-Veterans from MTurk, education levels are consistent across groups, and we were able to recruit a suitable number of participants for each group. On the other hand, one limitation of our study is that recruiting MTurk workers may miss populations with low health literacy; recruiting such populations (e.g., from hospitals, VA centers, and churches) is future work.
Lastly, an assumption of this work is that individuals (Veterans and non-Veterans) attempt to manage their own healthcare. However, for many individuals, family members or caregivers assist or take care of healthcare-related tasks for them. In this work, we assess the EHR note comprehension of the patients themselves; future work may evaluate the comprehension of caregivers if they are the ones who handle the healthcare affairs of the patient.
5. CONCLUSION
In this work, we examined two research questions regarding Veterans and low health literacy. We found that Veterans’ scores on the ComprehENotes test are significantly higher than non-Veterans’ when no intervention is provided. Most Veteran participants were older adult males, a population typically associated with lower health literacy, and this result contradicts several studies in the literature reporting that Veterans have lower health literacy than comparable non-Veteran cohorts.
We did not find a significant difference in the effects of NoteAid on scores between the two groups. This result is consistent with the use of NoteAid as a general-purpose tool for identifying and defining medical jargon. Future work can investigate deploying NoteAid alongside systems such as the Department of Veterans Affairs’ My HealtheVet patient portal.
Low health literacy is a significant problem, and interventions to improve health literacy are needed. Using NoteAid to define medical terms significantly improves ComprehENotes scores for many groups. More work is required to study the impact of NoteAid on local populations, particularly Veterans. The Veterans who can navigate MTurk and complete tasks online may not represent Veterans as a whole, just as the non-Veteran MTurk population is not representative of the larger population; therefore, studies in local hospitals and VA centers with Veterans are needed. Because of the observed ceiling effect for Veterans, future work should develop population-specific, specialized tests of EHR note comprehension; for certain groups, harder questions may be necessary to establish an appropriate baseline. We also effectively utilized a previously developed Veteran screener tool [40]. Using the screener, we were able to recruit a large number of Veterans directly on MTurk, and it can be used in future research for the same purpose.
Highlights.
A web-based screener is an efficient and effective way to identify Veterans on crowdsourcing platforms.
Veterans score higher on a test of electronic health record literacy than a comparable lay population.
Using the NoteAid tool for defining medical jargon leads to similar scores for Veterans and non-Veterans.
SUMMARY TABLE.
NoteAid can improve electronic health record note comprehension for a variety of user groups.
Veterans are a vulnerable population and report lower levels of health literacy than the wider population.
Veterans score higher than a comparable lay population in electronic health record note comprehension.
ACKNOWLEDGEMENTS
The authors thank the anonymous participants who completed the Amazon Mechanical Turk tasks and the subjects who participated in the community hospital tasks. This work was supported in part by grant R01LM012817 from the National Institutes of Health (NIH). The content is solely the responsibility of the authors and does not represent the views of the NIH.
Abbreviations:
- EHR
Electronic Health Record
- IRT
Item Response Theory
- SVT
Sentence Verification Technique
- MTurk
Amazon Mechanical Turk
- GLMM
Generalized Linear Mixed Model
- VA
US Department of Veterans Affairs
Footnotes
We only use military branch to dynamically populate the screener questions. Future work investigating differences in EHR note comprehension by military branch could leverage this information.
Author Statement
John Patrick Lalor: Conceptualization, Methodology, Software, Validation, Writing – Original Draft, Visualization, Data Curation
Hao Wu: Conceptualization, Methodology, Software, Validation, Writing – Original Draft, Visualization
Kathleen Mazor: Conceptualization, Writing – Review and Editing
Hong Yu: Conceptualization, Writing – Review and Editing, Funding Acquisition, Data Curation
Conflict of Interest Statement
The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
• John Lalor
• Hao Wu
• Kathleen Mazor
• Hong Yu
REFERENCES
- [1]. Walters M and Korshak L, "Health Literacy to Achieve Health Equity in Minority Veterans," 2020. https://www.va.gov/HEALTHEQUITY/Health_Literacy_to_Achieve_Health_Equity_In_Minority_Veterans.asp (accessed Dec. 20, 2022).
- [2]. Rodríguez V et al., "Health Literacy, Numeracy, and Graphical Literacy Among Veterans in Primary Care and Their Effect on Shared Decision Making and Trust in Physicians," J Health Commun, vol. 18, no. Suppl 1, pp. 273–289, Dec. 2013, doi: 10.1080/10810730.2013.829137.
- [3]. Vastag B, "Low health literacy called a major problem," JAMA, vol. 291, no. 18, pp. 2181–2182, May 2004, doi: 10.1001/jama.291.18.2181.
- [4]. Sudore RL et al., "Limited literacy and mortality in the elderly: the health, aging, and body composition study," J Gen Intern Med, vol. 21, no. 8, pp. 806–812, Aug. 2006, doi: 10.1111/j.1525-1497.2006.00539.x.
- [5]. Schapira MM et al., "The development and validation of the hypertension evaluation of lifestyle and management knowledge scale," J Clin Hypertens (Greenwich), vol. 14, no. 7, pp. 461–466, Jul. 2012, doi: 10.1111/j.1751-7176.2012.00619.x.
- [6]. Chapman K, Abraham C, Jenkins V, and Fallowfield L, "Lay understanding of terms used in cancer consultations," Psychooncology, vol. 12, no. 6, pp. 557–566, Sep. 2003, doi: 10.1002/pon.673.
- [7]. Lerner EB, Jehle DV, Janicke DM, and Moscati RM, "Medical communication: do our patients understand?," Am J Emerg Med, vol. 18, no. 7, pp. 764–766, Nov. 2000, doi: 10.1053/ajem.2000.18040.
- [8]. Reading SR et al., "Health Literacy and Awareness of Atrial Fibrillation," Journal of the American Heart Association, vol. 6, no. 4, p. e005128, Apr. 2017, doi: 10.1161/JAHA.116.005128.
- [9]. Halbach SM et al., "Health literacy and fear of cancer progression in elderly women newly diagnosed with breast cancer—A longitudinal analysis," Patient Education and Counseling, vol. 99, no. 5, pp. 855–862, May 2016, doi: 10.1016/j.pec.2015.12.012.
- [10]. Jeppesen KM, Coyle JD, and Miser WF, "Screening Questions to Predict Limited Health Literacy: A Cross-Sectional Study of Patients With Diabetes Mellitus," Ann Fam Med, vol. 7, no. 1, pp. 24–31, Jan. 2009, doi: 10.1370/afm.919.
- [11]. Polepalli Ramesh B, Houston T, Brandt C, Fang H, and Yu H, "Improving Patients' Electronic Health Record Comprehension with NoteAid," Stud Health Technol Inform, vol. 192, pp. 714–718, 2013.
- [12]. Ylitalo KR, Meyer MRU, Lanning BA, During C, Laschober R, and Griggs JO, "Simple screening tools to identify limited health literacy in a low-income patient population," Medicine (Baltimore), vol. 97, no. 10, Mar. 2018, doi: 10.1097/MD.0000000000010110.
- [13]. Noblin AM, Wan TTH, and Fottler M, "The Impact of Health Literacy on a Patient's Decision to Adopt a Personal Health Record," Perspect Health Inf Manag, vol. 9, no. Fall, Oct. 2012. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3510648/ (accessed Jan. 04, 2018).
- [14]. Giudice PD et al., "Correlation Between eHealth Literacy and Health Literacy Using the eHealth Literacy Scale and Real-Life Experiences in the Health Sector as a Proxy Measure of Functional Health Literacy: Cross-Sectional Web-Based Survey," Journal of Medical Internet Research, vol. 20, no. 10, p. e281, 2018, doi: 10.2196/jmir.9401.
- [15]. Norman CD and Skinner HA, "eHealth Literacy: Essential Skills for Consumer Health in a Networked World," Journal of Medical Internet Research, vol. 8, no. 2, p. e506, Jun. 2006, doi: 10.2196/jmir.8.2.e9.
- [16]. Lalor JP, Wu H, Chen L, Mazor KM, and Yu H, "ComprehENotes, an Instrument to Assess Patient Reading Comprehension of Electronic Health Record Notes: Development and Validation," Journal of Medical Internet Research, vol. 20, no. 4, p. e139, 2018, doi: 10.2196/jmir.9380.
- [17]. Baker FB and Kim S-H, Item Response Theory: Parameter Estimation Techniques, Second Edition. CRC Press, 2004.
- [18]. Chen J et al., "A Natural Language Processing System That Links Medical Terms in Electronic Health Record Notes to Lay Definitions: System Development Using Physician Reviews," Journal of Medical Internet Research, vol. 20, no. 1, p. e26, 2018, doi: 10.2196/jmir.8669.
- [19]. Lalor JP, Woolf B, and Yu H, "Improving Electronic Health Record Note Comprehension With NoteAid: Randomized Trial of Electronic Health Record Note Comprehension Interventions With Crowdsourced Workers," J Med Internet Res, vol. 21, no. 1, p. e10793, Jan. 2019, doi: 10.2196/10793.
- [20]. Lalor JP, Hu W, Tran M, Wu H, Mazor KM, and Yu H, "Evaluating the Effectiveness of NoteAid in a Community Hospital Setting: Randomized Trial of Electronic Health Record Note Comprehension Interventions With Patients," Journal of Medical Internet Research, vol. 23, no. 5, p. e26354, May 2021, doi: 10.2196/26354.
- [21]. Benny ME, Kabakian-Khasholian T, El-Jardali F, and Bardus M, "Application of the eHealth Literacy Model in Digital Health Interventions: Scoping Review," Journal of Medical Internet Research, vol. 23, no. 6, p. e23473, Jun. 2021, doi: 10.2196/23473.
- [22]. Wimer C, Shipman D, and Lea L, "Diabetes: Health Literacy Education Improves Veteran Outcomes," Fed Pract, vol. 34, no. 1, pp. 32–36, Jan. 2017.
- [23]. Denneson LM, Pisciotta M, Hooker ER, Trevino A, and Dobscha SK, "Impacts of a web-based educational program for veterans who read their mental health notes online," J Am Med Inform Assoc, vol. 26, no. 1, pp. 3–8, Jan. 2019, doi: 10.1093/jamia/ocy134.
- [24]. Haun JN et al., "Evaluating User Experiences of the Secure Messaging Tool on the Veterans Affairs' Patient Portal System," Journal of Medical Internet Research, vol. 16, no. 3, p. e75, 2014, doi: 10.2196/jmir.2976.
- [25]. Whealin JM, Jenchura EC, Wong AC, and Zulman DM, "How Veterans With Post-Traumatic Stress Disorder and Comorbid Health Conditions Utilize eHealth to Manage Their Health Care Needs: A Mixed-Methods Analysis," Journal of Medical Internet Research, vol. 18, no. 10, p. e280, 2016, doi: 10.2196/jmir.5594.
- [26]. Bouhaddou O et al., "Translating standards into practice: Experience and lessons learned at the Department of Veterans Affairs," Journal of Biomedical Informatics, vol. 45, no. 4, pp. 813–823, Aug. 2012, doi: 10.1016/j.jbi.2012.01.003.
- [27]. Gibson B, Butler J, Doyon K, Ellington L, Bray BE, and Zeng Q, "Veterans Like Me: Formative evaluation of a patient decision aid design," Journal of Biomedical Informatics, vol. 71, pp. S46–S52, Jul. 2017, doi: 10.1016/j.jbi.2016.09.007.
- [28]. Fagerlin A, Zikmund-Fisher BJ, Ubel PA, Jankovic A, Derry HA, and Smith DM, "Measuring Numeracy without a Math Test: Development of the Subjective Numeracy Scale," Med Decis Making, vol. 27, no. 5, pp. 672–680, Sep. 2007, doi: 10.1177/0272989X07304449.
- [29]. Rasu R, Bawa W, Hu A, Sharma R, Stahnke A, and Burros S, "Evaluation of Health Literacy in Veteran Affairs Outpatient Population: A Focus on Patient Self-Perceived Health Status," J Community Med Health Educ, vol. 8, no. 3, 2018, doi: 10.4172/2161-0711.1000613.
- [30]. Royer JM, Hastings CN, and Hook C, "A Sentence Verification Technique for Measuring Reading Comprehension," Journal of Reading Behavior, vol. 11, no. 4, pp. 355–363, Dec. 1979, doi: 10.1080/10862967909547341.
- [31]. Royer JM, Greene BA, and Sinatra GM, "The Sentence Verification Technique: A Practical Procedure for Testing Comprehension," Journal of Reading, vol. 30, no. 5, pp. 414–422, 1987.
- [32]. Difallah D, Filatova E, and Ipeirotis P, "Demographics and Dynamics of Mechanical Turk Workers," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, Feb. 2018, pp. 135–143, doi: 10.1145/3159652.3159661.
- [33]. Leroy G, Kauchak D, and Mouradi O, "A user-study measuring the effects of lexical simplification and coherence enhancement on perceived and actual text difficulty," International Journal of Medical Informatics, vol. 82, no. 8, pp. 717–730, Aug. 2013, doi: 10.1016/j.ijmedinf.2013.03.001.
- [34]. Abraham J et al., "Exploring patient perspectives on telemedicine monitoring within the operating room," International Journal of Medical Informatics, vol. 156, p. 104595, Dec. 2021, doi: 10.1016/j.ijmedinf.2021.104595.
- [35]. Sheehan KB, "Crowdsourcing research: Data collection with Amazon's Mechanical Turk," Communication Monographs, vol. 85, no. 1, pp. 140–156, Jan. 2018, doi: 10.1080/03637751.2017.1342043.
- [36]. Hauser DJ and Schwarz N, "Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants," Behav Res, vol. 48, no. 1, pp. 400–407, Mar. 2016, doi: 10.3758/s13428-015-0578-z.
- [37]. Ipeirotis PG, Provost F, and Wang J, "Quality Management on Amazon Mechanical Turk," in Proceedings of the ACM SIGKDD Workshop on Human Computation, New York, NY, USA, 2010, pp. 64–67, doi: 10.1145/1837885.1837906.
- [38]. Huff C and Tingley D, "'Who are these people?' Evaluating the demographic characteristics and political preferences of MTurk survey respondents," Research & Politics, vol. 2, no. 3, p. 2053168015604648, Jul. 2015, doi: 10.1177/2053168015604648.
- [39]. Chambers S, Nimon K, and Anthony-McMann P, "A Primer for Conducting Survey Research using MTurk: Tips for the Field," International Journal of Adult Vocational Education and Technology (IJAVET), vol. 7, no. 2, pp. 54–73, 2016, doi: 10.4018/IJAVET.2016040105.
- [40]. Lynn BM-D and Morgan JK, "Using Amazon's Mechanical Turk (MTurk) to recruit military veterans: Issues and suggestions," The Military Psychologist, vol. 31, no. 3, pp. 8–14, 2016.
- [41]. Baker DW, Gazmararian JA, Sudano J, and Patterson M, "The association between age and health literacy among elderly persons," J Gerontol B Psychol Sci Soc Sci, vol. 55, no. 6, pp. S368–374, Nov. 2000.
