Abstract
Introduction: Health literacy is a critical determinant of a patient’s overall health status, and studies have demonstrated a consistent link between poor health literacy and negative health outcomes. The Centers for Disease Control and Prevention (CDC) and the National Institutes of Health (NIH) advise that patient educational materials (PEMs) should be written at an eighth-grade reading level or lower, matching the average reading level of adult Americans. The purpose of this study was to investigate the ability of generative artificial intelligence (AI) to edit PEMs from orthopaedic institutions to meet the CDC and NIH guidelines.
Methods: PEMs about lateral epicondylitis (LE) from the top 25 ranked orthopaedic institutions from the 2022 U.S. News & World Report Best Hospitals Specialty Ranking were gathered. ChatGPT Plus (version 4.0) was then instructed to rewrite PEMs on LE from these institutions to comply with CDC and NIH-recommended guidelines. Readability scores were calculated for the original and rewritten PEMs, and paired t-tests were used to determine statistical significance.
Results: Analysis of the original and edited PEMs about LE revealed significant reductions in reading grade level and word count of 3.70 ± 1.84 (p<0.001) and 346.72 ± 364.63 (p<0.001), respectively.
Conclusion: Our study demonstrated generative AI’s ability to rewrite PEMs about LE at a reading comprehension level that conforms to the CDC and NIH guidelines. Hospital administrators and orthopaedic surgeons should consider the findings of this study and the potential utility of artificial intelligence when crafting PEMs of their own.
Keywords: health literacy, patient education materials, chatgpt, lateral epicondylitis, orthopedic surgery
Introduction
Health literacy, or the ability of patients to understand health-related information, stands as one of the most significant indicators of an individual's overall health [1]. Past research has consistently linked low health literacy to negative health outcomes [2-3]. Global internet usage has surged more than 11-fold in recent years [3], presenting healthcare providers with a new opportunity to disseminate information to their patients. This shift towards online patient education requires physicians to be aware of the average patient's reading comprehension abilities to ensure the highest quality of patient education material (PEM).
Data from the National Center for Education Statistics indicate that the average American adult reads at about an eighth-grade level [4]. Considering this, both the Centers for Disease Control and Prevention (CDC) and the National Institutes of Health (NIH) advise that PEMs be written at no higher than an eighth-grade reading level [5-6]. However, research indicates that many hospital and physician websites offer PEMs at reading levels that far exceed these recommendations [7-10]. This underscores the pressing need for PEMs to be revised to a level accessible and understandable to the general population.
Lateral epicondylitis (LE), commonly known as “tennis elbow,” represents a form of tendinosis of the extensor muscles of the forearm and is one of the most common sources of lateral elbow pain [11]. This condition has traditionally been linked to physical activities that require repetitive motion of the arm and wrist, such as playing tennis or using a screwdriver at work [12]. Consequently, blue-collar workers whose jobs predominantly involve manual labour may be at particularly high risk for developing LE [11-13]. This raises concerns about the accessibility and effectiveness of PEMs targeted to individuals suffering from LE.
One potential solution to this discrepancy between the CDC’s recommendations and the actual readability of available online PEMs involves utilising artificial intelligence (AI). In response to the skyrocketing global interest in AI, many companies are exploring innovative ways to introduce AI-based technology into everyday life. One notable example is OpenAI with its widely used online chatbot, Chat Generative Pretrained Transformer (ChatGPT). ChatGPT is a natural language processing tool that engages users in a conversational dialogue [14]. While the full versatility of ChatGPT is still being explored, it attracts over one hundred million weekly users who utilise it for professional, educational, and recreational purposes [14-16].
The goal of this study is to explore the ability of ChatGPT to generate PEMs on LE that are comprehensible to most Americans, given the average national reading grade level. We hypothesise that, by incorporating recommendations proposed by the CDC and NIH, ChatGPT will be able to improve the readability of these PEMs in an effort to bridge the gap in health literacy between patient and provider.
Materials and methods
The top 25 orthopaedic institutions, as ranked by the 2022 U.S. News & World Report Best Hospitals Specialty Ranking, were selected for this study. The websites of these institutions were searched for PEMs pertaining to LE. The content from each website was copied and preserved for analysis, excluding any audiovisual multimedia such as pictures, diagrams, and videos. Institutions lacking relevant PEMs on LE were excluded from the study.
Each institution’s PEMs were individually uploaded to ChatGPT Plus (version 4.0). ChatGPT was prompted with a message to rewrite the PEM with the following parameters: (1) limit the total number of polysyllabic words to less than 30, (2) limit sentences to less than 10 words, (3) limit paragraphs to less than five sentences, (4) eliminate as much medical jargon as possible without compromising accuracy, (5) when eliminating medical jargon is not possible, provide a brief explanation of the relevant concept, and (6) overall, rewrite this as if you were speaking to an eighth grader. These parameters were based on pertinent recommendations set forth by both the CDC’s Simply Put and the NIH’s Clear & Simple documents [5-6]. A senior orthopaedic surgery resident reviewed all rewritten PEMs to verify the accuracy of the information presented.
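For teams looking to scale this workflow beyond the ChatGPT Plus web interface used in this study, the same prompt could, in principle, be issued programmatically. The sketch below is an illustrative assumption rather than part of our methods: it uses the OpenAI Python SDK, and the model name and helper function are placeholders.

```python
# Hypothetical automation of the rewriting step; the study itself used the
# ChatGPT Plus (GPT-4) web interface, so the client, model name, and helper
# below are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REWRITE_INSTRUCTIONS = (
    "Rewrite the following patient education material so that: "
    "(1) it contains fewer than 30 polysyllabic words; "
    "(2) each sentence is shorter than 10 words; "
    "(3) each paragraph has fewer than five sentences; "
    "(4) medical jargon is removed wherever accuracy allows; "
    "(5) any unavoidable jargon is briefly explained; and "
    "(6) overall, it reads as if written for an eighth grader."
)

def rewrite_pem(original_text: str) -> str:
    """Return a simplified rewrite of one institution's PEM."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": REWRITE_INSTRUCTIONS},
            {"role": "user", "content": original_text},
        ],
    )
    return response.choices[0].message.content
```

Any such automated output would still require the same expert accuracy review described above.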
Readability scores for both the original PEMs and those rewritten by ChatGPT were calculated using the Readability Formulas website [17]. For each set of PEMs, readability was assessed using seven distinct formulas: Gunning Fog, Flesch-Kincaid Grade Level, Coleman-Liau Index, Simple Measure of Gobbledygook (SMOG) Index, Automated Readability Index, Linsear Write Formula, and FORCAST Readability Formula. These formulas evaluate different elements of text composition, including total word count, the presence of polysyllabic words, and overall language complexity, to derive a readability score indicative of the grade-level comprehension the text requires (Table 1). The average readability score and word count were calculated for each institution's PEMs. To assess the significance of the differences between the original and ChatGPT-revised PEMs, paired t-tests were conducted. Statistical analyses were performed using SPSS software (Version 29.0.0 (241); IBM Corp., Armonk, NY), with a p-value of 0.05 or lower indicating statistical significance.
Table 1. Description of readability tools and their corresponding formulas.
| Readability Tool | Formula |
| --- | --- |
| Gunning Fog | Grade level = 0.4 x (average sentence length + percentage of hard words) |
| Flesch-Kincaid Grade Level | Grade level = (0.39 x average sentence length) + (11.8 x average # of syllables per word) - 15.59 |
| Coleman-Liau Index | Grade level = 0.0588 x (average # of letters per 100 words) - 0.296 x (average # of sentences per 100 words) - 15.8 |
| SMOG Index | Grade level = 3 + square root of polysyllable count |
| Automated Readability Index | Grade level = 4.71 x (characters/words) + 0.5 x (words/sentences) - 21.43 |
| Linsear Write Formula | n = [(words of two syllables or fewer x 1) + (words of three or more syllables x 3)] / number of sentences; if n > 20, grade level = n/2; if n ≤ 20, grade level = (n - 2)/2 |
| FORCAST Readability Formula | Grade level = 20 - [(# of single-syllable words x 150 / # of words) / 10] |
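As a concrete illustration of how two of the Table 1 formulas operate, the sketch below computes the Flesch-Kincaid Grade Level and the simplified SMOG Index for a block of text. This is not the tool used in the study (readability was scored with the Readability Formulas website); in particular, the vowel-group syllable counter is a rough assumption.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups (a heuristic, not exact)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level, per the formula in Table 1."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

def smog_grade(text: str) -> float:
    """Simplified SMOG Index: 3 + square root of the polysyllabic word count."""
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 3 + polysyllables ** 0.5
```

Longer sentences and jargon-heavy, polysyllabic wording raise both scores, which is why the rewriting prompt targets sentence length and polysyllabic word counts directly.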
Results
Twenty-two of the initial 25 orthopaedic institutions contained educational material related to LE on their websites. In the original, unedited PEM cohort, only six institutions obtained average readability scores below the eighth-grade reading level. The average reading grade level of all institutions’ original PEMs was 9.81 ± 1.76, and the average word count was 600.68 ± 409.44 words. Following ChatGPT’s edits to the original PEMs, all 22 rewritten PEMs obtained average readability scores below the eighth-grade reading level. The total average readability score for all rewritten PEMs was 6.12 ± 0.97 (Figure 1), with an average word count of 253.96 ± 100.76 (Figure 2). By utilising ChatGPT to rewrite the original PEMs, a reduction of 3.70 ± 1.84 (p<0.001) reading grade levels and 346.72 ± 364.63 (p<0.001) words was achieved (Table 2). The senior orthopaedic surgery resident validated the accuracy of the information in the PEMs rewritten by ChatGPT.
Table 2. Results for paired t-test.
Results of paired t-tests determining the significance of the differences between the original and rewritten PEMs in (a) average readability score and (b) total word count. PEM: patient educational material.
|  | Mean Reduction | Standard Deviation | 95% CI (Lower) | 95% CI (Upper) | p-value |
| --- | --- | --- | --- | --- | --- |
| Readability score | 3.70 | 1.84 | 2.93 | 4.46 | <0.001 |
| Word count | 346.72 | 364.63 | 196.21 | 497.24 | <0.001 |
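For readers who prefer an open-source route, the paired comparison reported in Table 2 could be reproduced along the following lines, assuming the per-institution readability scores (or word counts) are available as paired sequences. The study's analysis was run in SPSS; this scipy-based function is an illustrative sketch, and the function name is ours.

```python
# Illustrative re-implementation of the Table 2 analysis (the study used SPSS).
import numpy as np
from scipy import stats

def paired_comparison(original, rewritten, confidence=0.95):
    """Mean reduction, standard deviation, 95% CI, and paired t-test p-value."""
    diff = np.asarray(original, dtype=float) - np.asarray(rewritten, dtype=float)
    mean, sd = diff.mean(), diff.std(ddof=1)
    sem = sd / np.sqrt(diff.size)
    ci = stats.t.interval(confidence, df=diff.size - 1, loc=mean, scale=sem)
    _, p_value = stats.ttest_rel(original, rewritten)
    return {"mean_reduction": mean, "sd": sd, "ci_95": ci, "p_value": p_value}
```

Called on the paired readability scores of the 22 institutions, such a function would return values analogous to the first row of Table 2.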
Figure 1. The average readability scores of patient education materials related to lateral epicondylitis from 25 of the top nationally ranked orthopaedic institutions, before and after generative AI-assisted editing.
AI: artificial intelligence.
Figure 2. The average word count of patient education materials related to lateral epicondylitis from 25 of the top nationally ranked orthopaedic institutions, before and after generative AI-assisted editing.
AI: artificial intelligence.
Discussion
Our study examined patient educational material provided by 25 distinct orthopaedic institutions. Among them, 22 included PEMs pertaining to lateral epicondylitis. Only six of these institutions initially had PEMs written at an eighth-grade reading level or lower. With minimal guidance, ChatGPT successfully revised the original PEMs, reducing the average reading grade level and word count to meet the guidelines recommended by the CDC and NIH. These findings suggest that ChatGPT shows promise in enhancing the comprehensibility of PEMs for their target audience.
With the widespread availability of the internet at home, at work, and on the move, accessing PEMs has become more convenient than ever. Despite this improved access, patient understanding of these materials has not seen a corresponding increase. Previous studies have shown a significant discrepancy between the CDC and NIH-recommended guidelines for writing PEMs and the true readability of PEMs published by healthcare institutions [18-20]. The implications of these studies are profound, as there is a well-documented correlation between low health literacy and worse clinical outcomes [1-3,21]. Moreover, a study investigating the impact of health literacy on patient satisfaction in surgical settings revealed that patients with higher health literacy tended to report greater satisfaction with their surgeries compared to those with lower health literacy levels [22]. Health literacy has also been shown to be positively correlated with patient compliance [23]. Overall, PEMs that adhere to CDC and NIH guidelines not only improve patient outcomes, satisfaction, and compliance but also empower patients to make informed decisions about their healthcare. Artificial intelligence may be the solution to writing simpler and more effective PEMs.
In recent decades, there has been significant progress in artificial intelligence, with ChatGPT leading the way. These advancements have led to ChatGPT’s widespread integration across various sectors, including the medical field. A recent study on ChatGPT's ability to develop customised obesity treatment plans revealed its proficiency in crafting individualised strategies tailored to the specific requirements of each person [24]. Another study examining ChatGPT’s ability to manage obstructive sleep apnoea (OSA) determined that it demonstrated potential as a valuable resource for OSA diagnoses [25]. These studies showcase the multifaceted utility of ChatGPT within the healthcare sector, displaying its potential to revolutionise personalised treatment strategies for conditions like obesity and OSA. Building on this foundation, our study indicates that ChatGPT's capabilities extend further, demonstrating its capacity to craft PEMs that abide by the standards set by the CDC and NIH. This is not a standalone finding. Previous studies have found that ChatGPT is not only capable of generating text-based information for specific audiences but that its content is also preferred over content generated by humans [26-27]. This not only highlights ChatGPT's versatility but also underscores its potential role in enhancing patient comprehension and empowerment in medical decision-making [28-29].
There are a few noteworthy limitations to the present study. First, the CDC guidelines suggest considering demographic factors such as race, gender, and ethnicity when writing PEMs, which can be challenging when requesting assistance from a language processing model such as ChatGPT. Additionally, our evaluation of reading comprehension relied solely on reading grade-level scores and excluded videos, pictures, and other visual media included on institutions’ websites. To our knowledge, there is no reliable metric that would have allowed us to objectively compare audiovisual media between hospital websites. Furthermore, ChatGPT is limited in its ability to incorporate, analyse, and regenerate pictures or diagrams, which restricted our methodology to text-based material. Ultimately, these visual tools may alter and potentially improve the viewer’s understanding of the information provided. Potential avenues for future research include using AI to generate improved audiovisual multimedia in the healthcare space. Despite these limitations, our study is the first to successfully explore ChatGPT’s ability to generate PEMs that adhere to the CDC and NIH guidelines.
Conclusions
Our study determined that ChatGPT was capable of rewriting PEMs about LE at a reading comprehension level that abides by the CDC and NIH guidelines. Hospital administrators and orthopaedic surgeons should consider the findings of this study and the potential utility of artificial intelligence when crafting PEMs of their own.
Disclosures
Human subjects: All authors have confirmed that this study did not involve human participants or tissue.
Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue.
Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following:
Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work.
Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work.
Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.
Author Contributions
Concept and design: Michael J. Miskiewicz
Acquisition, analysis, or interpretation of data: Michael J. Miskiewicz, Salvatore Capotosto, Christian Leonardo, Kenny Ling, Dorian Cohen, David Komatsu, Edward D. Wang
Drafting of the manuscript: Michael J. Miskiewicz, Salvatore Capotosto, Christian Leonardo
Critical review of the manuscript for important intellectual content: Michael J. Miskiewicz, Salvatore Capotosto, Kenny Ling, Dorian Cohen, David Komatsu, Edward D. Wang
Supervision: Michael J. Miskiewicz, Christian Leonardo, David Komatsu, Edward D. Wang
References
- 1. Exploring the potential of Chat GPT in personalized obesity treatment. Arslan S. Ann Biomed Eng. 2023;51:1887–1888. doi: 10.1007/s10439-023-03227-9.
- 2. Relationship between oral health literacy and oral health status. Baskaradoss JK. BMC Oral Health. 2018;18:172. doi: 10.1186/s12903-018-0640-1.
- 3. Low health literacy and health outcomes: an updated systematic review. Berkman ND, Sheridan SL, Donahue KE, Halpern DJ, Crotty K. Ann Intern Med. 2011;155:97–107. doi: 10.7326/0003-4819-155-2-201107190-00005.
- 4. Osteotomy around the knee: assessment of quality, content and readability of online information. Broderick JM, McCarthy A, Hogan N. Knee. 2021;28:139–150. doi: 10.1016/j.knee.2020.11.010.
- 5. Lateral epicondylitis (tennis elbow). Buchanan BK, Varacallo M. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2023.
- 6. Simply Put: a guide for creating easy-to-understand materials. Centers for Disease Control and Prevention, U.S. Department of Health and Human Services; 2010. Accessed January 2023. https://www.cdc.gov/healthliteracy/pdf/Simply_Put.pdf
- 7. Readability and quality of online patient education material on websites of Breast Imaging Centers. Choudhery S, Xi Y, Chen H, et al. J Am Coll Radiol. 2020;17:1245–1251. doi: 10.1016/j.jacr.2020.04.016.
- 8. Readability of online patient material provided by Reflux Centers in the United States. Ekkel E, Seeras K. Am Surg. 2023;89:2782–2784. doi: 10.1177/00031348211050828.
- 9. GPT-4 technical report. OpenAI; 2024. Accessed February 2024. https://cdn.openai.com/papers/gpt-4.pdf
- 10. Readability of online foot and ankle surgery patient education materials. Hartnett DA, Philips AP, Daniels AH, Blankenhorn BD. Foot Ankle Spec. 2022:19386400221116463. doi: 10.1177/19386400221116463.
- 11. Work-related risk factors for lateral epicondylitis and other cause of elbow pain in the working population. Herquelot E, Bodin J, Roquelaure Y, et al. Am J Ind Med. 2013;56:400–409. doi: 10.1002/ajim.22140.
- 12. Management of lateral epicondylitis. Lenoir H, Mares O, Carlier Y. Orthop Traumatol Surg Res. 2019;105:0–6. doi: 10.1016/j.otsr.2019.09.004.
- 13. Evaluation of a Chat GPT generated patient information leaflet about laparoscopic cholecystectomy. Lockie E, Choi J. ANZ J Surg. 2024;94:353–355. doi: 10.1111/ans.18834.
- 14. Readability and suitability of online patient education materials for glaucoma. Martin CA, Khan S, Lee R, Do AT, Sridhar J, Crowell EL, Bowden EC. Ophthalmol Glaucoma. 2022;5:525–530. doi: 10.1016/j.ogla.2022.03.004.
- 15. Health literacy and adherence to medical treatment in chronic and acute illness: a meta-analysis. Miller TA. Patient Educ Couns. 2016;99:1079–1086. doi: 10.1016/j.pec.2016.01.020.
- 16. Chat GPT for the management of obstructive sleep apnea: do we have a polar star? Mira FA, Favier V, Dos Santos Sobreira Nunes H, et al. Eur Arch Otorhinolaryngol. 2024;281:2087–2093. doi: 10.1007/s00405-023-08270-9.
- 17. Evaluation of readability of patient education materials on lateral epicondylitis (tennis elbow) from the top 25 orthopedic institutions. Miskiewicz M, Capotosto S, Wang ED. JSES Int. 2023;7:877–880. doi: 10.1016/j.jseint.2023.05.006.
- 18. Using ChatGPT for writing articles for patients' education for dermatological diseases: a pilot study. Mondal H, Mondal S, Podder I. Indian Dermatol Online J. 2023;14:482–486. doi: 10.4103/idoj.idoj_72_23.
- 19. A pilot study on the capability of artificial intelligence in preparation of patients' educational materials for Indian public health issues. Mondal H, Panigrahi M, Mishra B, Behera JK, Mondal S. J Family Med Prim Care. 2023;12:1659–1662. doi: 10.4103/jfmpc.jfmpc_262_23.
- 20. Digest of education statistics: 2021. National Center for Education Statistics; 2021. Accessed February 2024. https://nces.ed.gov/programs/digest/d21/
- 21. Clear & Simple: developing effective print materials for low-literate readers (publication no. 95-3594). National Institutes of Health, National Cancer Institute; 1994. Accessed November 2022. https://www.nih.gov/institutes-nih/nih-office-director/office-communications-public-liaison/clear-communication/clear-simple
- 22. Readability of patient educational materials in sports medicine. Ó Doinn T, Broderick JM, Clarke R, Hogan N. Orthop J Sports Med. 2022;10:23259671221092356. doi: 10.1177/23259671221092356.
- 23. Introducing ChatGPT. OpenAI; 2024. Accessed February 2024. https://openai.com/index/chatgpt/
- 24. The causal pathways linking health literacy to health outcomes. Paasche-Orlow MK, Wolf MS. Am J Health Behav. 2007;31:0–26. doi: 10.5555/ajhb.2007.31.supp.S19.
- 25. Readability Scoring System. Readability Formulas; 2024. Accessed February 2024. https://readabilityformulas.com/readability-scoring-system.php#formulaResults
- 26. Relationship of preventive health practices and health literacy: a national study. White S, Chen J, Atchison R. Am J Health Behav. 2008;32:227–242. doi: 10.5555/ajhb.2008.32.3.227.
- 27. Health literacy assessment and patient satisfaction in surgical practice. Yim CK, Shumate L, Barnett SH, Leitman IM. Ann Med Surg (Lond). 2018;35:25–28. doi: 10.1016/j.amsu.2018.08.022.
- 28. Generative AI in healthcare: an implementation science informed translational path on application, integration and governance. Reddy S. Implement Sci. 2024;19:27. doi: 10.1186/s13012-024-01357-9.
- 29. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. Alowais SA, Alghamdi SS, Alsuhebany N, et al. BMC Med Educ. 2023;23:689. doi: 10.1186/s12909-023-04698-z.


