Radiology: Imaging Cancer
2024 Feb 2;6(2):e230086. doi: 10.1148/rycan.230086

Evaluating the Use of ChatGPT to Accurately Simplify Patient-centered Information about Breast Cancer Prevention and Screening

Hana L Haver 1, Anuj K Gupta 1, Emily B Ambinder 1, Manisha Bahl 1, Eniola T Oluyemi 1, Jean Jeudy 1, Paul H Yi 1
PMCID: PMC10988327  PMID: 38305716

Abstract

Purpose

To evaluate the use of ChatGPT as a tool to simplify answers to common questions about breast cancer prevention and screening.

Materials and Methods

In this retrospective, exploratory study, ChatGPT was requested to simplify responses to 25 questions about breast cancer to a sixth-grade reading level in March and August 2023. Simplified responses were evaluated for clinical appropriateness. All original and simplified responses were assessed for reading ease on the Flesch Reading Ease Index and for readability on five scales: Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, Automated Readability Index, and the Simple Measure of Gobbledygook (ie, SMOG) Index. Mean reading ease, readability, and word count were compared between original and simplified responses using paired t tests. McNemar test was used to compare the proportion of responses with adequate reading ease (score of 60 or greater) and readability (sixth-grade level).

Results

ChatGPT improved mean reading ease (original responses, 46 vs simplified responses, 70; P < .001) and readability (original, grade 13 vs simplified, grade 8.9; P < .001) and decreased word count (original, 193 vs simplified, 173; P < .001). Ninety-two percent (23 of 25) of simplified responses were considered clinically appropriate. All 25 (100%) simplified responses met criteria for adequate reading ease, compared with only two of 25 original responses (P < .001). Two of the 25 simplified responses (8%) met criteria for adequate readability.

Conclusion

ChatGPT simplified answers to common breast cancer screening and prevention questions by improving the readability by four grade levels, though the potential to produce incorrect information necessitates physician oversight when using this tool.

Keywords: Mammography, Screening, Informatics, Breast, Education, Health Policy and Practice, Oncology, Technology Assessment

Supplemental material is available for this article.

© RSNA, 2023





Summary

ChatGPT simplified responses to questions about breast cancer prevention and screening by improving reading ease and readability while maintaining overall clinical appropriateness, though its potential for producing responses containing incorrect information necessitates ongoing physician oversight.

Key Points

  • Original responses by ChatGPT to breast cancer screening and prevention questions had a readability level of grade 13, exceeding the nationally recommended grade 6 level.

  • ChatGPT simplified text responses when prompted to rewrite text to a sixth-grade level, with an average readability level of grade 9 (two of 25 [8%] responses reached the sixth-grade level, and 25 of 25 [100%] reached adequate reading ease).

  • A total of 92% of the simplified responses were deemed clinically accurate by a fellowship-trained breast imaging radiologist.

Introduction

The U.S. Department of Health and Human Services has identified health literacy as a central focus in its Healthy People 2030 initiative to “eliminate health disparities, achieve health equity, and attain health literacy to improve the health and well-being of all” (1). Definitions of health literacy have expanded to encompass both personal and organizational health literacy: health care organizations bear responsibility for providing individuals equitable access to information, and individuals must be able to find, understand, and use this information to inform their decision-making (1). It is therefore imperative that available information on this topic be both clinically appropriate and accessible.

Accessing the internet is a common way that American adults seek patient-centered health information, such as on a web-based patient education site (2) or on social media. The provision of medical care and medical information is shifting toward empowering the patient with accessible information for their individualized needs. The average U.S. adult reads at an eighth-grade level (3), and the American Medical Association (AMA) recommends that patient-facing written material be at or below the sixth-grade reading level (4). The National Institutes of Health recommends designing written materials for those with limited health literacy to ensure that everyone can understand the message being conveyed (5). Previous studies have used thresholds ranging from grade 6 to grade 8 and below for adequate readability (6,7).

Health literacy has been positively associated with cancer screening program adherence (8,9), which is known to improve breast cancer outcomes such as decreased mortality and increased life years (10). Women with low reading ability have been found to have less knowledge of mammography, including the reasons for obtaining a screening mammogram (11). Low health literacy has been associated with decreased rates of mammography screening; one study found that screening rates were 29% among women with low health literacy (12), compared with the nationwide screening rate of 65.6% estimated by the Centers for Disease Control and Prevention (13). Women with low health literacy tend to have decreased screening rates and report a greater number of perceived barriers to obtaining screening, such as lack of knowledge about examination logistics and fear of a positive result (14).

Readability refers to the academic grade level required to read and comprehend written text and is a commonly used measure for assessing patient education materials. It is well documented that patient education materials in radiology often exceed the recommended sixth-grade reading level, and when material is more difficult to read it can be harder to understand (15–17). ChatGPT is a generative pretrained transformer (ie, GPT) developed by OpenAI that produces natural language text in a dialogue-based format. The tool has gained popularity owing to its performance on natural language processing tasks; within radiology, ChatGPT has shown potential to facilitate automatic summarization (18), generating a condensed version of text input (18,19), though this capability has not been thoroughly evaluated.

In this study, we evaluated ChatGPT's ability to generate patient education material in a chatbot-style interaction in which a patient asks questions about breast cancer prevention and screening and ChatGPT performs automated summarization and simplification of the language to improve readability of the information.

Materials and Methods

Breast Cancer Questions

This retrospective study received an exemption from our institutional review board. We used a set of 25 basic questions about breast cancer prevention and screening previously developed by Haver et al (20) to evaluate ChatGPT; in that study, ChatGPT answered 88% of these questions accurately and appropriately. The questions were developed from fundamental concepts in the Breast Imaging Reporting and Data System (BI-RADS) atlas (21) and submitted from the perspective of a patient asking basic questions, without modifying the prompt wording to influence the response. The same three answers previously generated for each question in the original study were used to account for potential variation among responses, including appropriate, inappropriate, and inconsistent responses. These answers served as the basis for our evaluation of reading ease and readability.

Reading Ease and Readability Evaluation of Original Responses

To evaluate the comprehensibility of ChatGPT's responses to the breast cancer prevention and screening questions, we objectively measured reading ease and readability grade level using text-based algorithms that account for textual elements such as numbers of words and syllables. These measures are well established in the health literacy literature for these purposes, including in radiology (16,22–25). Reading ease refers to how easy a passage is to read and is determined by the Flesch Reading Ease Index (26,27), which yields a score ranging from 0 to 100, with 60 and greater representing text that is easy to read for a general audience. Readability scores yield the education grade level generally required to understand a text and can be determined using the following five scales: Flesch-Kincaid Grade Level, Gunning Fog Index (28), Coleman-Liau Index (29), Automated Readability Index (26), and Simple Measure of Gobbledygook (SMOG) Index (30). These scales use different metrics, such as numbers of characters, words, sentences, and paragraphs, to estimate the academic grade level required to read a given text. The Flesch-Kincaid Grade Level and the Flesch Reading Ease Index have been more widely reported, though all have been used in previous reports (16). Given that the scales are calculated using different formulas, we assigned each text a score that reflected the mean grade level across the five scales to reduce variability among responses. Based on national recommendations for patient education material to be written at or below the sixth-grade level and prior work evaluating readability in radiology educational materials (31), we defined adequate readability according to the threshold of grade 6 and below (AMA recommendation) (4). An example of a readability evaluation for a sample text is provided in Table 1.
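The formulas underlying these scales are published and straightforward to compute. As an illustration only (the study used a web-based calculator, not this code), a minimal Python sketch of two of the measures, the Flesch Reading Ease Index and the Flesch-Kincaid Grade Level, might look like the following; the vowel-group syllable counter is a rough heuristic, whereas production tools use pronunciation dictionaries.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels, with a crude
    # correction for a silent trailing "e". Real readability tools use
    # pronunciation dictionaries for higher accuracy.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    # Split into sentences and words with simple patterns (a sketch;
    # abbreviations such as "U.S." would over-split sentences here).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    return {
        # Flesch Reading Ease: 0-100; 60 and greater is considered
        # easy to read for a general audience.
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: estimated U.S. school grade.
        "fk_grade": 0.39 * wps + 11.8 * spw - 15.59,
    }
```

Longer sentences and more polysyllabic words lower the reading ease score and raise the grade level, which is why simplification tends to shorten sentences and substitute plainer vocabulary.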

Table 1:

Example of Readability Score for ChatGPT-generated Text


We evaluated the reading ease and readability of each of the three original answers to all 25 questions about breast cancer screening and prevention that were generated in a previous study (20). The responses were copied without formatting into a Google Docs file to remove associated elements that might interfere with a plain text readability assessment (16,22–25). The responses were entered individually into a text analysis program (https://www.webfx.com/tools/read-able/), which reports scores for the Flesch Reading Ease Index, Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, Automated Readability Index, and SMOG Index. Each response was scored on all scales, and these values were included in the reading ease and readability analysis.

ChatGPT Translation Process

Using the web-based free version of ChatGPT (GPT-3.5; https://chat.openai.com/auth/login) available in March 2023, we submitted the following prompt, “Rewrite the following prompt at a 6th grade reading comprehension level,” followed by each originally generated response to summarize and simplify the text at a patient-friendly reading level and at the recommended level for readability among the average population (4). To evaluate the consistency of the rewritten text, the simplifications were performed twice more in August 2023 (GPT-3.5). An example of the text inputs and outputs using this workflow is shown in Figure 1. Each text input was simplified three separate times.
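The simplification workflow can be sketched in code. This is an illustrative reconstruction, not the study's actual procedure (the authors worked in the ChatGPT web interface); the `ask_chatgpt` callable is a hypothetical stand-in for however the model is invoked.

```python
# Instruction prepended to each originally generated response, per the study.
SIMPLIFY_PREFIX = "Rewrite the following prompt at a 6th grade reading comprehension level"

def build_simplify_prompt(original_response: str) -> str:
    # Combine the fixed instruction with one original ChatGPT response.
    return f"{SIMPLIFY_PREFIX}: {original_response}"

def simplify_responses(responses, ask_chatgpt, n_trials=3):
    # Each original response was simplified in three separate trials to
    # capture run-to-run variation in the model's output. `ask_chatgpt`
    # is a hypothetical callable mapping a prompt string to a reply.
    return {
        resp: [ask_chatgpt(build_simplify_prompt(resp)) for _ in range(n_trials)]
        for resp in responses
    }
```

Keeping the instruction fixed and varying only the appended response text mirrors the study's design of not engineering the prompt beyond a single plain request.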

Figure 1:

Schematic diagram of method to generate example responses from ChatGPT to common questions about breast cancer prevention and screening. Here, “What are the symptoms of breast cancer?” was input to ChatGPT three times, and each response was recorded verbatim. Subsequently, each generated response was resubmitted to ChatGPT with the prompt, “Rewrite at a 6th grade reading comprehension level,” followed by the original generated text three separate times. The rewritten texts were then recorded verbatim.


Clinical Appropriateness, Reading Ease, and Readability of Simplified Responses

The clinical appropriateness of the generated responses and the corresponding rewritten simplified versions was assessed by a fellowship-trained, board-certified breast imaging radiologist (H.L.H., 2 years of posttraining experience), using the method described previously by Haver et al (20). Reading ease and readability were then evaluated for the simplified responses in the same manner as described for the original responses.

Statistical Analysis

Mean readability scores and grades were calculated along with SDs. Paired t tests were performed using the mean reading ease scores (Flesch Reading Ease Index), readability (Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, Automated Readability Index, and SMOG), and word count for each answer in its original version and those that were simplified by ChatGPT. A mean readability score was calculated for the answers to each question, which was based on the mean scores of the five readability scales for each of the three answers. McNemar test was performed using the proportion of original and simplified responses that met the criteria for adequate reading ease (60 and greater) and readability (grade 6 and below). All statistical tests were performed using Prism (version 9.5.1; GraphPad Software). A P value less than .05 was considered significant.
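The study ran these tests in GraphPad Prism; an equivalent sketch in Python (assuming SciPy is available) follows. The exact McNemar test is implemented here as a two-sided binomial test on the discordant pairs, which is one standard formulation.

```python
from scipy import stats

def compare_scores(original_scores, simplified_scores):
    # Paired t test on per-question scores (original vs simplified),
    # matching the paired design: each question contributes one
    # original and one simplified value.
    result = stats.ttest_rel(original_scores, simplified_scores)
    return result.statistic, result.pvalue

def mcnemar_exact(b: int, c: int) -> float:
    # Exact McNemar test on the discordant pairs:
    #   b = questions adequate only in the original responses,
    #   c = questions adequate only in the simplified responses.
    # Equivalent to a two-sided binomial test with p = 0.5 on the
    # b + c discordant pairs; concordant pairs do not enter the test.
    return stats.binomtest(b, b + c, 0.5).pvalue
```

For example, with 2 of 25 original and 25 of 25 simplified responses meeting the reading ease threshold, the discordant counts are b = 0 and c = 23, which yields a P value well below .001, consistent with the reported result.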

Results

Reading Ease and Readability Evaluation of Original Responses

Reading ease for each question was determined by the mean Flesch Reading Ease score of the three original ChatGPT responses; the mean across all 25 questions was 46 ± 11 (SD), which is considered difficult to read (Table 2). Two of the 25 questions (8%) met the threshold for reading ease (Flesch Reading Ease score greater than or equal to 60). Readability for each question was determined by the mean of the readability scores for the three original responses (Table S1). The mean across all 25 questions was grade 13.1 ± 2.1, and none of the responses had a mean readability score that met the threshold for readability by an average audience (grade 6 and below). The average word count of the original responses was 193.

Table 2:

Appropriateness and Readability of Original and Simplified Responses Generated by ChatGPT in Response to Questions about Breast Cancer Prevention and Screening


Comparing Reading Ease and Readability of the Original and Simplified Responses

After simplification by ChatGPT, the mean reading ease was 70 ± 6.9, which is considered fairly easy to read, and the mean readability was grade 8.9 ± 1.4, which exceeds the recommended grade 6 threshold (Table S2). The average word count of the simplified responses was 172. Reading ease and readability data for the simplified responses are shown in Table 2 and compared with the original responses in Table S3. The differences in mean Flesch Reading Ease score and mean readability score (Fig 2) between the original and simplified responses for the 25 questions were statistically significant (both P < .001). Mean word count decreased by 21 words (P < .001). One hundred percent (25 of 25) of simplified responses had a Flesch Reading Ease score greater than or equal to 60, and 8% (two of 25) were determined to be at or below a sixth-grade level. The difference in the proportion of responses meeting the criteria for adequate reading ease between original and simplified responses was statistically significant (P < .001), while there was no evidence of a difference in readability proportions (Table 3).

Figure 2:

Mean readability scores (Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, Automated Readability Index, and Simple Measure of Gobbledygook [SMOG] Index) of responses by ChatGPT to 25 questions on breast cancer prevention and screening topics in the original and simplified (sixth-grade reading level) versions. Standard error mean values are shown. * indicates statistically significant (P < .05) difference in mean readability between the simplified and original texts.


Table 3:

Comparison of Adequate Reading Ease and Readability of Original and Simplified Responses by ChatGPT


ChatGPT Translation and Clinical Appropriateness of the Simplified Responses

The fellowship-trained breast radiologist deemed 23 of 25 (92%) simplified responses clinically appropriate (Table 2). When considering only the 22 of 25 (88%) original responses determined to be clinically appropriate in the previous study (20), 21 of 22 (95%) of the simplified answers maintained clinical appropriateness. The accuracy of the generated information was assessed during the evaluation of the simplified responses, each of which was generated in three separate trials. Examples of pre- and posttransformation samples are shown in Figure 1.

Discussion

This study demonstrated that ChatGPT's responses to common questions about breast cancer prevention and screening were written at a high reading comprehension level (mean, grade 13). We found that prompting ChatGPT to rewrite its responses at a sixth-grade reading level improved both reading ease and readability while maintaining overall accuracy of the responses; for example, readability improved by a mean of four grade levels, to the ninth-grade level (P < .001). The readability level of these original ChatGPT-generated responses is higher than the previously reported readability of web-based information on breast cancer risk assessment (32), information on breast imaging provided by the American College of Radiology–designated Breast Imaging Centers of Excellence (15), and websites from various medical organizations with information on breast lesions and associated procedural terms (22), which reported mean readability scores ranging from 11.7 to 12.4.

Socioeconomic factors are known to play a role in fostering health care disparities, and disadvantaged social and socioeconomic conditions have been found to contribute to low health literacy (33). In cancer screening programs, having an adequate level of health literacy was linked to adherence to screening participation (8). As such, initiatives such as Healthy People 2030 (1) in the United States and those of the World Health Organization (34) have identified health literacy as a modifiable factor that can empower individuals with improved access to health information, ultimately promoting health equity. To access relevant health information, sources need to contain appropriate content at a manageable reading comprehension level.

Previous studies have found that patient-facing educational materials in radiology in English are written at a reading comprehension level that exceeds both the estimated reading comprehension level of the average American adult and the AMA-recommended sixth-grade reading level (15–17). With recent advances in large language models (LLMs), we sought to evaluate the readability of information generated by the publicly available LLM ChatGPT in response to common questions about breast cancer prevention and screening. A previous study by Haver et al (20) found that the recommendations made by ChatGPT to such questions were appropriate 88% of the time. In the present study, we used the same prompt methods to show that the simplified responses were 92% clinically appropriate among all questions and 95% appropriate among the initially appropriate responses. The answers to “How can I prevent breast cancer?” and “Where can I get screened for breast cancer?” had been inconsistent, but after simplification, they were determined to be appropriate. The answer to “Do I need to plan my screening mammogram around my COVID vaccination?” was initially inappropriate and remained inappropriate after simplification. Notably, responses to “My radiology report has a BI-RADS score, what is that?” originally generated appropriate information, but following simplification, the answers were considered inconsistent per breast imager review (Table 4). Though this technology should not be used by patients without physician oversight at this time, the automatic generation of simplified information may help institutions and clinicians looking to develop patient-facing materials with greater readability.

Table 4:

ChatGPT Responses to Simplification Prompt in Three Independent Trials with Reviewer Grades and Explanation of Inconsistencies


To evaluate the ability of ChatGPT to rewrite responses in simpler language, we calculated the readability of each rewritten response in three trials for all 25 questions. Aggregating the scores for three trials of all questions across the five quantifiable scales, we found a mean grade level of 8.9. Though the rewritten responses required a reading level higher than sixth grade, readability was improved compared with other known sources of patient information on breast cancer in English, which ranged from grade 11.0 to 12.4 (15,22,32). Thus, although ChatGPT improved reading ease and readability, it did not reach the grade 6 reading level requested in the prompt; one possible explanation is that the training data on these topics is predominantly difficult to read. Though ChatGPT did not simplify the answers to the extent to which it was prompted, these early findings demonstrate the potential to automate summarization and simplification of patient-facing educational materials.

In the context of breast cancer screening, increasing the readability of patient-facing material has been shown to improve patient follow-up for mammography recalls (7). ChatGPT and other LLMs are very early in their development; we therefore believe that their abilities will continue to improve as the technologies further develop and as prompt engineering matures as a discipline, both of which should be topics for future study within radiology. One recent report on the readability of responses by ChatGPT to 13 questions about cancer misconceptions found a mean Flesch-Kincaid Grade Level of 12.0 (35), consistent with the responses in the present study, which also had a mean of 12.0 on the same scale. Notably, although there were two questions whose answers were initially inconsistent, one set of answers became appropriate after simplification and the other became inappropriate after simplification. Twenty-three of 25 (92%) simplified responses maintained the appropriateness of the content in the original responses. While the overall accuracy and appropriateness of health information appear to be maintained after simplification by ChatGPT, there remains a risk that misinformation could be conveyed to patients, and physician oversight will remain crucial to the safe development and deployment of these tools.

Limitations of the present study included the continuously evolving nature of LLMs such as ChatGPT, which was accessed between March and August 2023; future iterations of the software and LLMs released by other companies may yield different outcomes and could be the subject of future study as this technology evolves. The prompts were not modified in the present study, to simulate the experience of a patient asking ChatGPT questions about breast cancer prevention and screening, and the small sample size used in this exploratory study serves to demonstrate a potential application of this technology; further study with a larger sample is planned. Each prompt was submitted three times, and no two responses were identical. An additional limitation was that the random seed was not controlled, which limits the exact reproducibility of the results. An important limitation in the statistical analysis was the underlying assumption that ChatGPT's responses, to the same or different prompts, are independent of one another, which may not be true given the model's ability to learn over time. For example, the P value in Table S3 represents a comparison between the mean readability scores, and the independent responses were more variable. The readability scales themselves introduce variability, as they are based on unique weightings and evaluations of textual elements, such as number of syllables and number of words. No single readability scale is universally adopted by the health care community, making comparison of readability results challenging; this is why we used multiple scales and aggregated the scores from the five scales to facilitate a more uniform comparison of readability, as done in previous work (16). There is an ongoing discussion about bias in the context of LLMs that may have implications for uses such as in the present study, as real patients have varied linguistic features that likely have varied representation in the training data. Another limitation was that this study was performed in English only; future studies of information generated in other languages should be considered, given that 22% of households in the United States have at least one member who speaks a language other than English at home (36). Further study may also include asking ChatGPT to write responses with specific parameters to improve readability, such as low word count and a low number of polysyllabic words. Finally, we used only a single prompt to simplify the responses, albeit one that is intuitive and straightforward; as prompt engineering becomes a more established field, future work should evaluate how different prompts impact both the readability and accuracy of patient education materials in radiology.

In summary, we demonstrated that ChatGPT provides largely accurate recommendations to common questions about breast cancer prevention and screening, though readability levels remain too high for the average U.S. English-speaking adult. Using a simple prompt, ChatGPT was able to simplify these texts with improvements in readability by a mean of four grade levels while maintaining the overall accuracy of the recommendations. The potential for ChatGPT and other LLMs to improve patient education in radiology represents an exciting step toward improving health literacy and health equity, though future study and validation are needed as these technologies rapidly improve and develop, as addressing the safety of such tools is paramount to meaningful application in a patient care setting.

Authors declared no funding for this work.

Disclosures of conflicts of interest: H.L.H. No relevant relationships. A.K.G. No relevant relationships. E.B.A. No relevant relationships. M.B. Grant from the National Institutes of Health (grant no. K08CA241365), paid to author's institution; consulting fees paid to author from 2nd.MD, Hologic, and Lunit; associate editor for Radiology and Radiology: Artificial Intelligence. E.T.O. Received research funding from GE Healthcare as a recipient of the Association of University Radiologists GE Radiology Research Academic Fellowship award. J.J. No relevant relationships. P.H.Y. Associate editor for Radiology: Artificial Intelligence.

Abbreviations:

AMA
American Medical Association
BI-RADS
Breast Imaging Reporting and Data System
LLM
large language model
SMOG
Simple Measure of Gobbledygook

References

  • 1. Health Literacy in Healthy People 2030 - Healthy People 2030 . health.gov . https://health.gov/healthypeople/priority-areas/health-literacy-healthy-people-2030. Accessed March 1, 2023.
  • 2. Demiris G , Afrin LB , Speedie S , et al . Patient-centered applications: use of information technology to promote disease management and wellness. A white paper by the AMIA knowledge in motion working group . J Am Med Inform Assoc 2008. ; 15 ( 1 ): 8 – 13 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Kutner M , Greenberg E , Jin Y , Paulsen C . The Health Literacy of America's Adults: Results From the 2003 National Assessment of Adult Literacy (NCES 2006–483) . U.S. Department of Education . Washington, DC: : National Center for Education Statistics; , 2003. . [Google Scholar]
  • 4. Weiss B . Health Literacy: A Manual for Clinicians . American Medical Association/American Medical Association Foundation; ; 2003. . [Google Scholar]
  • 5. Clear & Simple . National Institutes of Health . https://www.nih.gov/institutes-nih/nih-office-director/office-communications-public-liaison/clear-communication/clear-simple. Published 2015. Accessed May 14, 2023.
  • 6. Kressin NR , Gunn CM , Battaglia TA . Content, readability, and understandability of dense breast notifications by state . JAMA 2016. ; 315 ( 16 ): 1786 – 1788 . [DOI] [PubMed] [Google Scholar]
  • 7. Nguyen DL, Harvey SC, Oluyemi ET, Myers KS, Mullen LA, Ambinder EB. Impact of improved screening mammography recall lay letter readability on patient follow-up. J Am Coll Radiol 2020;17(11):1429–1436.
  • 8. Baccolini V, Isonne C, Salerno C, et al. The association between adherence to cancer screening programs and health literacy: a systematic review and meta-analysis. Prev Med 2022;155:106927.
  • 9. Oldach BR, Katz ML. Health literacy and cancer screening: a systematic review. Patient Educ Couns 2014;94(2):149–157.
  • 10. Mandelblatt JS, Stout NK, Schechter CB, et al. Collaborative modeling of the benefits and harms associated with different U.S. breast cancer screening strategies. Ann Intern Med 2016;164(4):215–225.
  • 11. Davis TC, Arnold C, Berkel HJ, Nandy I, Jackson RH, Glass J. Knowledge and attitude on screening mammography among low-literate, low-income women. Cancer 1996;78(9):1912–1920.
  • 12. Komenaka IK, Nodora JN, Hsu C-H, et al. Association of health literacy with adherence to screening mammography guidelines. Obstet Gynecol 2015;125(4):852–859.
  • 13. National Center for Health Statistics. Health, United States, 2020-2021: Table CanBrTest. https://www.cdc.gov/nchs/data/hus/2020-2021/CanBrTest.pdf. Accessed November 27, 2023.
  • 14. Poon PKM, Tam KW, Lam T, et al. Poor health literacy associated with stronger perceived barriers to breast cancer screening and overestimated breast cancer risk. Front Oncol 2023;12:1053698.
  • 15. Choudhery S, Xi Y, Chen H, et al. Readability and quality of online patient education material on websites of breast imaging centers. J Am Coll Radiol 2020;17(10):1245–1251.
  • 16. Bange M, Huh E, Novin SA, Hui FK, Yi PH. Readability of patient education materials from RadiologyInfo.org: has there been progress over the past 5 years? AJR Am J Roentgenol 2019;213(4):875–879.
  • 17. Hansberry DR, Agarwal N, Baker SR. Health literacy and online educational resources: an opportunity to educate patients. AJR Am J Roentgenol 2015;204(1):111–116.
  • 18. Ismail A, Ghorashi NS, Javan R. New horizons: the potential role of OpenAI's ChatGPT in clinical radiology. J Am Coll Radiol 2023;20(7):696–698.
  • 19. Shen Y, Heacock L, Elias J, et al. ChatGPT and other large language models are double-edged swords. Radiology 2023;307(2):e230163.
  • 20. Haver HL, Ambinder EB, Bahl M, Oluyemi ET, Jeudy J, Yi PH. Appropriateness of breast cancer prevention and screening recommendations provided by ChatGPT. Radiology 2023;307(4):e230424.
  • 21. D'Orsi CJ, Sickles EA, Mendelson EB, Morris EA. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. Reston, Va: American College of Radiology; 2013.
  • 22. Miles RC, Baird GL, Choi P, Falomo E, Dibble EH, Garg M. Readability of online patient educational materials related to breast lesions requiring surgery. Radiology 2019;291(1):112–118.
  • 23. Novin SA, Huh EH, Bange MG, Hui FK, Yi PH. Readability of Spanish-language patient education materials from RadiologyInfo.org. J Am Coll Radiol 2019;16(8):1108–1113.
  • 24. Yi PH, Golden SK, Harringa JB, Kliewer MA. Readability of lumbar spine MRI reports: will patients understand? AJR Am J Roentgenol 2019;212(3):602–606.
  • 25. Yi PH, Yi MM, Nguyen JC. Readability of online information related to pediatric radiation safety from societal websites. AJR Am J Roentgenol 2018;211(5):1128–1134.
  • 26. Kincaid J, Fishburne R, Rogers R, Chissom BS. Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel. Millington, TN: Naval Technical Training Command, Research Branch; 1975.
  • 27. Flesch R. A new readability yardstick. J Appl Psychol 1948;32(3):221–233.
  • 28. Gunning R. The technique of clear writing. New York, NY: McGraw-Hill; 1952.
  • 29. Coleman M, Liau TL. A computer readability formula designed for machine scoring. J Appl Psychol 1975;60(2):283–284.
  • 30. McLaughlin G. SMOG grading–a new readability formula. J Read 1969;12(8):639–646. https://www.jstor.org/stable/40011226.
  • 31. Rooney MK, Santiago G, Perni S, et al. Readability of patient education materials from high-impact medical journals: a 20-year analysis. J Patient Exp 2021;8:2374373521998847.
  • 32. Lamb LR, Baird GL, Roy IT, Choi PHS, Lehman CD, Miles RC. Are English-language online patient education materials related to breast cancer risk assessment understandable, readable, and actionable? Breast 2022;61:29–34.
  • 33. Stormacq C, Van den Broucke S, Wosinski J. Does health literacy mediate the relationship between socioeconomic status and health disparities? Integrative review. Health Promot Int 2019;34(5):e1–e17.
  • 34. World Health Organization. Improving health literacy. https://www.who.int/health-promotion/enhanced-wellbeing/improving-health-literay. Accessed March 6, 2023.
  • 35. Johnson SB, King AJ, Warner EL, Aneja S, Kann BH, Bylund CL. Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information. JNCI Cancer Spectr 2023;7(2):pkad015.
  • 36. Dietrich S, Hernandez E. Language Use in the United States: 2019. https://www.census.gov/content/dam/Census/library/publications/2022/acs/acs-50.pdf. Accessed August 22, 2023.

Articles from Radiology: Imaging Cancer are provided here courtesy of Radiological Society of North America