Medicine. 2025 Oct 24;104(43):e45135. doi: 10.1097/MD.0000000000045135

Comprehensibility and readability of selected artificial intelligence chatbots in providing uveitis-related information

Halil İbrahim Sönmezoğlu a,*, Büşra Güner Sönmezoğlu b, Mustafa Hüseyin Temel c, Burçin Çakir d
PMCID: PMC12558299  PMID: 41137349

Abstract

This study aims to evaluate and compare the quality and comprehensibility of responses generated by 5 artificial intelligence chatbots – ChatGPT-4, Claude, Mistral, Grok, and Google PaLM – to the most frequently asked questions about uveitis. Google Trends was employed to identify significant phrases associated with uveitis. The 25 most frequently searched terms were identified, and the eligible terms were entered into each artificial intelligence chatbot. The responses were evaluated using 3 distinct tools: the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Simple Measure of Gobbledygook (SMOG) index, and the Automated Readability Index (ARI). The 3 most frequently searched terms were “uveitis eye,” “anterior uveitis,” and “uveitis symptoms.” Among the chatbots evaluated, GPT-4 demonstrated the lowest ARI and SMOG scores (P < .001). Regarding the PEMAT-P, Mistral scored the lowest in understandability, while Grok achieved the highest score for actionability (P < .001). All chatbots except Mistral exhibited high understandability scores. GPT-4 had the lowest SMOG and ARI scores among the chatbots evaluated, making its responses the easiest to read. Chatbot technology holds significant potential to enhance healthcare information dissemination and facilitate better patient understanding. While chatbots can effectively provide information on health topics such as uveitis, further improvement is needed to maximize their efficacy and accessibility.

Keywords: artificial intelligence, ChatGPT-4, readability, understandability, uveitis

1. Introduction

Uveitis is a potentially serious cause of vision impairment and can substantially reduce patients’ quality of life.[1] Uveitis is estimated to account for about 5% to 10% of visual impairment worldwide.[2] In addition, approximately 35% of patients with uveitis experience marked deterioration of vision, and some progress to permanent blindness.[3,4] It is therefore important that patients fully understand the disease process and adhere adequately to treatment. In this context, artificial intelligence (AI) chatbots, which have become increasingly popular in recent years, can also be used to provide information about uveitis and to support patient education.

AI chatbots are now widely used for patient education, chronic disease management, mental health support, and self-diagnosis, with evidence of increasing adoption in both developed and developing countries.[5,6]

AI chatbots are software programs designed for natural language interaction with users, behaving like virtual assistants on forums and web-based applications.[7] These chatbots are used across various sectors, including customer service, healthcare interactions, and symptom identification. In healthcare, AI chatbots can help users decide whether to see a doctor.[8]

Patient education is another main area where AI may be utilized, for example, providing information about diseases and medication use, offering diagnostic or therapeutic recommendations, triaging patients, explaining the risks and benefits of surgery, managing preoperative anxiety, and providing accurate postoperative care and follow-up instructions.[9]

Given the level of trust patients have in patient education materials, it is essential that such content be clear and accurate. The information provided by freely accessible AI chatbots is very important, as it enables patients to play an active role in their own healthcare, and therefore must be readable, understandable, and accurate.[10] Therefore, in our study, we examined the quality and readability of the responses provided by 5 different chatbots to keywords related to uveitis obtained from Google Trends (GT).

Google searches exhibit a significant association with current events, particularly emphasizing health and medical inquiries. Public sentiment on healthcare information may also be discerned from GT and has proven particularly beneficial during recent outbreaks and epidemics.[11]

Although several studies have compared the readability and quality of informative texts generated by AI chatbots on various health conditions, including uveitis, this article compares the comprehensibility, readability, and quality of texts generated by 5 different AI chatbots in response to frequently asked questions about uveitis obtained from GT, and aims to identify their deficiencies.[12–14]

2. Methods

This cross-sectional study was conducted on May 1, 2024, in the Ophthalmology Department of Serdivan State Hospital. The study did not require ethics committee approval since it did not collect data directly through methods such as questionnaires, interviews, or focus groups, did not analyze personal data, and did not involve in vivo or in vitro experiments on subjects.[15–17]

GT was used to identify the most frequently searched keywords related to uveitis (https://trends.google.com/). Prior to initiating the searches, all personal browser data were deleted to prevent potential bias. “Uveitis” was entered into GT, with the search scope set to all categories, worldwide, from 2004 to the present. The 25 most frequently searched terms related to the topic were identified. “Uveitis ojo” and “Que es uveitis” were excluded because they were not in English, “Anterior” and “Glaukoma” were excluded because they were not related to uveitis, and “Uveitis dogs” and “Uveitis in dogs” were excluded because they did not concern humans. The remaining 19 terms were entered into each chatbot verbatim. Table 1 shows the 19 most frequent searches about uveitis according to Google Trends data from 2004 to 2024.

Table 1.

The 19 most frequent searches about uveitis according to Google Trends data from 2004 to 2024.

Rank Keyword Relevance
1 Uveitis eye 100
2 Anterior uveitis 87
3 Uveitis Symptoms 39
4 Uveitis Treatment 38
5 Uveitis eyes 27
6 Uveitis Posterior 25
7 Uveitis in eye 25
8 Uveitis Causes 25
9 What is uveitis 22
10 Iritis 21
11 Uveitis cause 18
12 Uveitis pain 17
13 Arthritis 16
14 Uveitis Ocular 15
15 Acute uveitis 13
16 Uveitis ICD-10 12
17 Uveitis meaning 12
18 Spondylitis 12
19 Ankylosing spondylitis uveitis 10
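
As an illustration of the keyword-identification step described above, the following is a minimal sketch using the unofficial pytrends package to retrieve related queries for “uveitis.” This is an assumption for illustration only; the study itself used the Google Trends web interface manually, and pytrends was not part of the authors’ workflow.

```python
# Illustrative sketch only: the study used the Google Trends web interface manually.
# Assumes the unofficial "pytrends" package, which is not part of the study.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=0)
# Worldwide, all categories, 2004 to the study date (matching the settings described above).
pytrends.build_payload(["uveitis"], cat=0, geo="", timeframe="2004-01-01 2024-05-01")

# related_queries() returns, per keyword, "top" and "rising" DataFrames
# with the query text and a 0-100 relative relevance value.
top_queries = pytrends.related_queries()["uveitis"]["top"]
print(top_queries.head(25))
```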

Search queries were systematically entered into 5 AI chatbots: GPT-4 (https://chat.openai.com/), Claude-3 (https://claude.ai/), Grok (https://grok.x.ai/), Mistral Large (https://mistral.ai/), and Google PaLM 2 (https://ai.google/palm2/). Each query was processed on a separate web page to ensure distinct separation and enhance the analytical process. Individual accounts were created for interacting with each AI chatbot to maintain clear differentiation. Prior to starting the searches, all browser data were thoroughly erased. The chatbot responses were recorded for subsequent evaluation of their quality and clarity.

To evaluate the understandability and actionability of the healthcare information provided by each chatbot, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) was utilized.[9] Two experienced ophthalmologists, BGS and HIS, jointly discussed and determined the PEMAT-P scores. When they could not reach a joint decision, a third specialist (MHT) was consulted. The responses were scored without the raters knowing which chatbot had provided which answer.

The PEMAT-P is applicable to both physical and digital materials. Understandability refers to the ease with which individuals from various backgrounds and levels of health literacy can comprehend, analyze, and articulate the primary message presented in the materials. Actionability assesses how easily consumers can determine the appropriate course of action based on the provided information. The PEMAT-P consists of 24 items: 15 rated on a 2-point scale (0 indicating disagreement and 1 indicating agreement) and 9 rated on a 3-point scale (0 indicating disagreement, 1 indicating agreement, and 2 indicating the item is not applicable). Higher scores indicate greater understandability or actionability. Materials scoring below 70% are considered to lack understandability or actionability, whereas scores of 70% or higher indicate the materials are understandable or actionable.[18] The PEMAT-P measures not only the readability of a material but also how easily patients can understand the information (understandability) and how easily they can apply it to improve their health (actionability). These 2 dimensions are the cornerstones of effective patient education materials. Unlike traditional formulas, the PEMAT-P evaluates not only the content itself but also structural and visual elements, taking into account how the patient perceives features such as the layout of the text, font, visual-text relationship, headings, and paragraphs.[18]
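
In the PEMAT scoring approach, the points for the applicable items are summed and divided by the number of applicable items, with items marked “not applicable” dropped from the denominator. A minimal sketch of this calculation, using hypothetical ratings rather than study data, is shown below.

```python
# Minimal sketch of PEMAT-P percentage scoring. Assumes items marked
# "not applicable" (None here) are excluded from the denominator.
from typing import Optional, Sequence

def pemat_score(ratings: Sequence[Optional[int]]) -> float:
    """ratings: 1 = agree, 0 = disagree, None = not applicable."""
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("No applicable items were rated.")
    return 100.0 * sum(applicable) / len(applicable)

# Hypothetical example: 10 items rated, 2 not applicable, 6 of the remaining 8 agreed.
ratings = [1, 1, 1, 1, 1, 1, 0, 0, None, None]
score = pemat_score(ratings)
label = "understandable/actionable" if score >= 70 else "not understandable/actionable"
print(f"{score:.0f}% -> {label}")  # 75% -> understandable/actionable
```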

The Simple Measure of Gobbledygook (SMOG) index assesses the number of polysyllabic words (words containing 3 or more syllables) and the total number of sentences in a text, rating the text according to the complexity of its vocabulary and sentences.[19] The SMOG grade is calculated as 1.043 × √(number of polysyllabic words × 30 / number of sentences) + 3.1291. It is a readability framework that estimates how many years of education the average person needs to understand a text and is best suited to texts of 30 sentences or more. Although SMOG is widely used in general, healthcare is the sector in which it is applied most, and research comparing different readability formulas has supported its medical use; for example, 1 study assessing online Parkinson disease information called SMOG the “gold standard.”[20]
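
The SMOG calculation described above can be sketched in a few lines of Python. The syllable counter below is a rough vowel-group heuristic introduced for illustration; it is not the tool used in the study, and dedicated readability software counts syllables more precisely.

```python
import math
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels; real tools use dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_index(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # SMOG grade = 1.043 * sqrt(polysyllables * 30 / sentences) + 3.1291
    return 1.043 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

print(round(smog_index("Anterior uveitis is an inflammation of the uvea. It needs prompt treatment."), 1))
```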

The Automated Readability Index (ARI) estimates the United States grade level required to read a piece of text. It is similar to other readability formulas, but rather than counting syllables it counts characters: the more characters a word has, the harder it is considered to be. It also counts sentences, which sets it apart from some other formulas. The ARI produces a score based on the average number of characters per word and the average number of words per sentence, and it is well suited to technical writing. The ARI is calculated as 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43.[19] Both SMOG and ARI belong to a group of traditional readability formulas that aim to predict the grade level required to understand a text. Studies show that, although their average scores are generally similar, they can differ by up to 2 grade levels for the same text, especially for technical or complex material. Differences may arise because each formula counts words and sentences differently and handles punctuation marks, abbreviations, and numbers differently.[21]
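
A corresponding sketch of the ARI formula is given below; the tokenization is a simple regular-expression heuristic used for illustration rather than the exact counting rules of dedicated readability tools.

```python
import re

def automated_readability_index(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\S+", text)
    # Count only letters and digits as characters, ignoring punctuation.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    # ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    return 4.71 * (characters / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43

print(round(automated_readability_index("Anterior uveitis is an inflammation of the uvea. It needs prompt treatment."), 1))
```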

Statistical analysis was conducted using SPSS version 26.0 (IBM, New York, USA). Normality of the data was assessed using the Shapiro–Wilk test. Continuous data were summarized using the minimum, maximum, mean, median, and standard deviation, whereas categorical data were summarized by their frequencies. The Kruskal–Wallis test was employed to evaluate differences among groups, and the Bonferroni correction was applied in the subgroup analyses. A significance level of .05 was used, corresponding to a 95% confidence level.
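
As an illustration of this comparison pipeline, the sketch below runs a Kruskal–Wallis test followed by Bonferroni-corrected pairwise comparisons. The article does not name the pairwise post hoc test, so Mann–Whitney U tests with a Bonferroni-adjusted alpha are assumed here, and the score values are placeholders rather than study data (the study itself used SPSS, not Python).

```python
# Sketch of a Kruskal-Wallis test with Bonferroni-corrected pairwise comparisons.
# Assumption: pairwise Mann-Whitney U tests as the post hoc step; values are placeholders.
from itertools import combinations
from scipy import stats

smog_scores = {
    "GPT-4":    [9.1, 8.7, 10.2, 9.5, 8.9],
    "Claude-3": [12.4, 13.1, 11.9, 12.8, 13.4],
    "Mistral":  [11.6, 12.5, 11.9, 12.0, 12.9],
}

h_stat, p_value = stats.kruskal(*smog_scores.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_value:.4f}")

pairs = list(combinations(smog_scores, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted significance threshold
for a, b in pairs:
    _, p = stats.mannwhitneyu(smog_scores[a], smog_scores[b], alternative="two-sided")
    print(f"{a} vs {b}: P = {p:.4f} ({'significant' if p < alpha else 'not significant'})")
```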

3. Results

The 3 most commonly queried terms were “Uveitis eye,” “Anterior uveitis,” and “Uveitis Symptoms” (Table 1). Figure 1 shows global search interest in uveitis in different regions from 2004 to 2024, excluding locations with low search volume, using data from GT.

Figure 1.

Google Trends data showing global search interest in uveitis across different regions from 2004 to 2024, excluding locations with low search volumes.

The study’s results indicated a statistically significant difference (P < .001) between the SMOG scores of the chatbots. Bonferroni-corrected subgroup analysis confirmed the statistical significance of these disparities, revealing significant variation in SMOG scores between Mistral and Google PaLM, Mistral and ChatGPT-4, Claude-3 and PaLM, Claude-3 and ChatGPT-4, as well as ChatGPT-4 and Grok. ChatGPT-4 received the lowest SMOG score, while Claude-3 received the highest (P < .001), indicating significant differences in performance.

The study’s findings likewise demonstrated statistically significant differences in the ARI scores across the evaluated chatbots. Bonferroni-corrected analysis confirmed the statistical significance of these disparities, revealing significant variations in ARI scores between Mistral and Google PaLM, Mistral and ChatGPT-4, Claude-3 and PaLM, Claude-3 and ChatGPT-4, as well as ChatGPT-4 and Grok. ChatGPT-4 attained the lowest ARI score, while Claude-3 achieved the highest (P < .001). Thus, according to both the ARI and SMOG scores, ChatGPT-4 had the best readability, while Claude-3 had the worst.

The SMOG and ARI scores were found to be strongly correlated (P < .001, r = 0.946).
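
The correlation check can be sketched as follows; the article does not specify the correlation coefficient used, so Spearman’s rank correlation is assumed here, and the values are placeholders rather than study data.

```python
# Sketch of the SMOG-ARI correlation check. Assumption: Spearman's rank correlation;
# the score lists below are placeholders, not the study's data.
from scipy import stats

smog = [9.4, 12.5, 9.8, 12.4, 10.5, 11.2]
ari = [10.8, 14.1, 11.0, 13.9, 12.1, 12.6]

rho, p = stats.spearmanr(smog, ari)
print(f"r = {rho:.3f}, P = {p:.4f}")
```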

The chatbots demonstrated substantial differences in their PEMAT-P scores (P < .001). The analysis of PEMAT-P understandability scores, using the Bonferroni correction, revealed that Mistral had significantly lower scores compared to the other chatbots. Furthermore, a notable difference was observed when the chatbots were compared based on their PEMAT-P actionability scores (P < .001). Subgroup analysis revealed significant differences, with Claude-3 exhibiting significantly lower scores compared to the other chatbots, and PaLM also showing significantly lower scores relative to the others.

Mean, standard deviation, minimum, maximum, median, and interquartile range (IQR) values of the PEMAT-P actionability (PEMAT-A), PEMAT-P understandability (PEMAT-U), SMOG, and ARI scores of the 5 chatbots are shown in Table 2.

Table 2.

Mean, standard deviation, minimum, maximum, median, and IQR values of the PEMAT-A, PEMAT-U, SMOG, and ARI scores of the 5 chatbots.

Mistral Claude-3 PaLM GPT-4 Grok P
The Simple Measure of Gobbledygook (SMOG) Mean ± SD 11.84 ± 1.98 12.35 ± 2.24 9.84 ± 1.78 9.06 ± 1.64 11.20 ± 1.80 <.001 *
Min–max 7.15–14.92 8.29–16.56 5.83–12.59 4.06–11.57 8.41–13.88
Median 12.48 12.52 9.79 9.45 10.46
IQR (10.2–13.3) (10.7–14) (8.7–11.2) (8.2–10) (9.7–12.8)
The Automated Readability Index (ARI) Mean ± SD 13 ± 2.3 14.22 ± 2.2 11.42 ± 2 10.36 ± 1.83 12.47 ± 1.83 <.001 †
Min–max 7–17 11–19 6–15 4–12 10–15
Median 14 14 11 11 12
IQR (12–14) (13–16) (11–13) (10–12) (11–14)
PEMAT-P Understandability (%) Mean ± SD 54.31 ± 9.18 75.26 ± 8.3 73.63 ± 6 71.94 ± 9.62 68.74 ± 7.84 <.001 ‡
Min–max 40–73 55–82 67–82 58–82 55–82
Median 53 80 78 73 67
IQR (46–60) (70–82) (67–78) (64–82) (64–73)
PEMAT-P Actionability (%) Mean ± SD 33.68 ± 14.98 14.73 ± 16.11 3.15 ± 10.02 32.63 ± 11.94 40 ± 14.90 <.001 §
Min–max 20–60 0–40 0–40 20–60 20–60
Median 40 20 0 40 40
IQR (20–40) (0–20) (0–0) (20–40) (20–60)

ARI = automated readability index, GPT = generative pretrained transformer, IQR = interquartile range, PEMAT-P = patient education materials assessment tool for printable materials, SD = standard deviation, SMOG = simple measure of Gobbledygook index.

* Differences are between Mistral and Google PaLM, Mistral and ChatGPT-4, Claude-3 and PaLM, Claude-3 and ChatGPT-4, and ChatGPT-4 and Grok.

† Differences are between Mistral and ChatGPT-4, Claude-3 and PaLM, Claude-3 and ChatGPT-4, and ChatGPT-4 and Grok.

‡ Differences are between Mistral and the other chatbots.

§ Differences are between Claude-3 and the other chatbots, and between Google PaLM and the other chatbots.

4. Discussion

Our study evaluated the answers provided by 5 different chatbots to questions about uveitis in terms of readability. Notably, none of the chatbots achieved the desired 8th grade reading level.[22] Additionally, the chatbot responses demonstrated low actionability scores on the PEMAT test, while all chatbots except Mistral exhibited high understandability scores.

Given the possibility of leveraging AI to access health information, acquire health-related knowledge, and make treatment decisions, chatbot responses should meet high standards of readability, quality, and comprehensibility. Health literacy empowers patients to engage actively in medical decision-making processes.[23] Conversely, inadequate readability may impair health literacy, diminishing patients’ capacity to comprehend health information and engage in informed medical decision-making.[24,25] Our study’s analysis of SMOG and ARI metrics revealed that ChatGPT-4 had the best readability, while Claude-3 had the worst. Notably, all of the chatbot responses exceeded the recommended reading grade level, meaning they were more difficult to read than recommended. Addressing this challenge may require implementing more advanced natural language processing methods to reduce the complexity of the language. Furthermore, incorporating user feedback mechanisms and algorithms to adjust readability could aid in generating content that is more accessible to the target audience. Several factors contribute to ChatGPT-4’s high readability. These include a comprehensive parameter set, a large number of users and collaborating experts who provide continuous feedback for its training, advanced reasoning and instruction-following capabilities, more up-to-date training data, and insights from the practical applications of previous models integrated into GPT-4’s security research and monitoring system. All of these factors may have contributed to ChatGPT-4 providing more accurate responses.[16]

In our study, the most frequently searched terms related to uveitis obtained from GT were entered directly into the chatbots without any guiding questions. Therefore, the responses may not have reached the desired levels of quality and readability; with guiding questions, it might be possible to obtain higher-quality and more readable texts. We chose to enter the search terms directly on the assumption that users might pose these exact queries to chatbots.

There are several studies in the literature that have examined the performance of AI systems in ophthalmology. For example, Kianian et al.[26] compared the readability of educational materials on uveitis generated by ChatGPT and Bard, finding that ChatGPT produced more accessible content. Rasmussen et al.[27] found that over half of ChatGPT’s responses to inquiries regarding vernal keratoconjunctivitis were either entirely accurate or contained only negligible and harmless errors. Yilmaz and Doğan’s comparative analysis revealed that chatbot responses regarding cataracts were more detailed and accurate in content than the information provided on the American Academy of Ophthalmology’s website.[28] Lim et al.’s[16] comparative analysis of chatbot responses related to myopia care specifically highlighted the potential of ChatGPT-4.0 to provide accurate and comprehensive answers to myopia-related queries. The cumulative findings from these studies demonstrate that AI chatbots hold significant potential for ophthalmological educational materials despite their current limitations.

There are several studies in the literature on the readability and quality of patient education materials, AI chatbot responses, and online resources in other medical disciplines that have produced findings consistent with the current investigation. For instance, Temel et al.’s examination of ChatGPT’s responses to frequently asked questions about spinal cord injury revealed that the AI’s outputs lacked sufficient readability and quality.[29] Şahin et al.[30] evaluated the readability and quality of 5 AI-based chatbots focused on erectile dysfunction, finding that none met the required standards. Similarly, Srinivasan et al.[31] compared the readability of patient education materials generated by GPT-3.5, GPT-4, and Bard, as well as online institutional resources, and highlighted the potential of large language models to enhance the readability of bariatric surgery patient education materials. The results of these studies indicate that advanced natural language processing methods to simplify language, user feedback mechanisms to adjust readability, and interdisciplinary collaboration among AI developers, healthcare professionals, and linguists to create more accurate, accessible, and readable content are urgently needed.

Chatbots’ Terms of Use and Privacy Policies explicitly state that chatbots are not intended to provide medical advice, diagnosis, or treatment and should not be used as a substitute for professional healthcare. These disclaimers are crucial for understanding the limitations of chatbots’ functionality in the medical domain. Despite this, the natural language fluency of chatbots’ responses can create a false impression of clinical authority, potentially leading users to over-rely on AI-generated information. This disconnect introduces ethical concerns around user safety and informed use. Moreover, the privacy policy cautions users against entering sensitive health information, raising questions about data protection when health-related queries are used in research. From an ethical standpoint, these terms necessitate that researchers emphasize the experimental and informational nature of chatbots, ensuring that neither the study nor its audience misconstrue the tool as a validated clinical support system.[32,33]

A low actionability score in chatbots means that they lack clear, practical instructions that users can follow to make informed decisions or take next steps. Studies evaluating AI chatbots’ responses to common cancer-related queries show that actionability scores are consistently low, with median scores ranging from 20% to 40% in validated evaluation tools, suggesting that users may struggle to understand what actions to take based on the chatbot’s recommendations.[34]

In our study, although Grok had the highest actionability score, it was still not at the desired level. This limitation is significant because actionable guidance is crucial for effective health communication, especially for patients seeking help with complex medical issues. Chatbots with low actionability are of questionable value as a primary information resource and should be used as complementary tools rather than primary sources of medical advice. Improving actionability is essential to ensure that chatbots can truly support users in making informed health decisions.

There are several limitations of this study that need to be acknowledged. Firstly, the scope of the analysis was constrained to the initial 25 search terms, which may have restricted the breadth of the study and potentially overlooked other pertinent queries related to uveitis. Secondly, the exclusive focus on English keywords could limit the generalizability of the findings to non-English-speaking populations, potentially reducing the applicability of the results in a global context. Additionally, the study utilized specific AI chatbots available at the time, but as this technology rapidly evolves, newer models may exhibit divergent performance characteristics. Furthermore, the investigation did not account for the varying user experiences and interactions with the chatbots, which could influence perceptions of the quality and readability of the responses. Moreover, as the assessments of the text quality generated by chatbots were performed by 2 ophthalmologists, the outcomes may fluctuate based on the assessors’ expertise, knowledge, and biases.

Future research should consider these factors and explore broader, more inclusive approaches to better understand and enhance the effectiveness of AI chatbots in healthcare education.

5. Conclusion

This study revealed statistically significant differences in readability and understandability scores among 5 commonly used chatbots. According to SMOG and ARI, ChatGPT-4 showed the best readability, while Claude-3 showed the poorest, but none of the chatbots achieved the required reading level. Chatbots other than Mistral demonstrated a high level of understandability. Claude-3 and PaLM received significantly lower actionability scores. These results indicate that AI models should be trained on medical literature, use advanced natural language processing techniques, incorporate user feedback mechanisms, and benefit from interdisciplinary collaboration to ensure the safe and effective use of AI-based tools in healthcare settings.

Author contributions

Conceptualization: Halil İbrahim Sönmezoğlu.

Data curation: Halil İbrahim Sönmezoğlu, Büşra Güner Sönmezoğlu.

Formal analysis: Halil İbrahim Sönmezoğlu, Büşra Güner Sönmezoğlu.

Investigation: Mustafa Hüseyin Temel.

Supervision: Mustafa Hüseyin Temel.

Writing – original draft: Halil İbrahim Sönmezoğlu.

Writing – review & editing: Mustafa Hüseyin Temel, Burçin Çakir.

Abbreviations:

AI
artificial intelligence
ARI
automated readability index
GPT
generative pretrained transformer
IQR
interquartile range
PEMAT-P
patient education materials assessment tool for printable materials
SD
standard deviation
SMOG
simple measure of Gobbledygook index

Since the research did not involve any procedures on living organisms or human data, ethical approval was not required.

The authors have no funding and conflicts of interest to disclose.

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

How to cite this article: Sönmezoğlu Hİ, Güner Sönmezoğlu B, Temel MH, Çakir B. Comprehensibility and readability of selected artificial intelligence chatbots in providing uveitis-related information. Medicine 2025;104:43(e45135).

Contributor Information

Büşra Güner Sönmezoğlu, Email: busra-gnr1@hotmail.com.

Mustafa Hüseyin Temel, Email: mhuseyintemel@gmail.com.

Burçin Çakir, Email: b_koklu@yahoo.com.

References

  • [1].Zhang Z, Griva K, Rojas-Carabali W, et al. Psychosocial well-being and quality of life in uveitis: a review. Ocul Immunol Inflamm. 2024;32:1380–94. [DOI] [PubMed] [Google Scholar]
  • [2].Miserocchi E, Fogliato G, Modorati G, Bandello F. Review on the worldwide epidemiology of uveitis. Eur J Ophthalmol. 2013;23:705–17. [DOI] [PubMed] [Google Scholar]
  • [3].de Smet MD, Taylor SR, Bodaghi B, et al. Understanding uveitis: the impact of research on visual outcomes. Prog Retin Eye Res. 2011;30:452–70. [DOI] [PubMed] [Google Scholar]
  • [4].Cunningham ET, Zierhut M. Vision loss in uveitis. Ocul Immunol Inflamm. 2021;29:1037–9. [DOI] [PubMed] [Google Scholar]
  • [5].Clark M, Bailey S. Chatbots in health care: connecting patients to information: emerging health technologies. In: CADTH Horizon Scans. Canadian Agency for Drugs and Technologies in Health; 2024. [PubMed] [Google Scholar]
  • [6].Wah JNK. Revolutionizing e-health: the transformative role of AI-powered hybrid chatbots in healthcare solutions. Front Public Health. 2025;13:1530799. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Pérez-Soler S, Juarez-Puerta S, Guerra E, de Lara J. Choosing a chatbot development tool. IEEE Softw. 2021;38:94–103. [Google Scholar]
  • [8].Athota L, Shukla VK, Pandey N, Rana A. Chatbot for healthcare system using artificial intelligence. In: 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (trends and future directions)(ICRITO). IEEE; 2020:619–622. [Google Scholar]
  • [9].Ittarat M, Cheungpasitporn W, Chansangpetch S. Personalized care in eye health: exploring opportunities, challenges, and the road ahead for chatbots. J Pers Med. 2023;13:1679. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Khan S, Moon J, Martin CA, et al. Readability and suitability of online uveitis patient education materials. Ocul Immunol Inflamm. 2024;32:1175–9. [DOI] [PubMed] [Google Scholar]
  • [11].Mehra M, Brody PA, Mehrotra S, Sakhalkar O, Maugans T. Google Trends™ and quality of information analyses of Google™ searches pertaining to concussion. Neurotrauma Rep. 2023;4:159–70. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Marshall RF, Mallem K, Xu H, et al. Investigating the accuracy and completeness of an artificial intelligence large language model about uveitis: an evaluation of ChatGPT. Ocul Immunol Inflamm. 2024;32:2052–5. [DOI] [PubMed] [Google Scholar]
  • [13].Demir S. Comparison of ChatGPT-4o, Google Gemini 1.5 Pro, Microsoft Copilot Pro, and ophthalmologists in the management of uveitis and ocular inflammation: a comparative study of large language models. J Fr Ophtalmol. 2025;48:104468. [DOI] [PubMed] [Google Scholar]
  • [14].Zhao FF, He HJ, Liang JJ, et al. Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3. Eye (Lond). 2025;39:1132–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Bagcier F, Yurdakul OV, Temel MH. Quality and readability of online information on myofascial pain syndrome. J Bodyw Mov Ther. 2021;25:61–6. [DOI] [PubMed] [Google Scholar]
  • [16].Lim ZW, Pushpanathan K, Yew SME, et al. Benchmarking large language models’ performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine. 2023;95:104770. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Ning Y, Teixayavong S, Shang Y, et al. Generative artificial intelligence and ethical considerations in health care: a scoping review and ethics checklist. Lancet Digit Health. 2024;6:e848–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [18].Asupoto O, Anwar S, Wurcel AG. A health literacy analysis of online patient-directed educational materials about mycobacterium avium complex. J Clin Tuberc Other Mycobact Dis. 2024;35:100424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Aslan A, Musmar B, Mamilly A, et al. The complexity of online patient education materials about interventional neuroradiology procedures published by major academic institutions. Cureus. 2023;15:e34233. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Fitzsimmons PR, Michael BD, Hulley JL, Scott GO. A readability assessment of online Parkinson’s disease information. J R Coll Physicians Edinb. 2010;40:292–6. [DOI] [PubMed] [Google Scholar]
  • [21].Ocmen E, Erdemir I, Aksu Erdost H, Hanci V. Assessing parental comprehension of online resources on childhood pain. Medicine (Baltimore). 2024;103:e38569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Oliffe M, Thompson E, Johnston J, Freeman D, Bagga H, Wong PKK. Assessing the readability and patient comprehension of rheumatology medicine information sheets: a cross-sectional health literacy study. BMJ Open. 2019;9:e024582. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Rooney MK, Santiago G, Perni S, et al. Readability of patient education materials from high-impact medical journals: a 20-year analysis. J Patient Exp. 2021;8:2374373521998847. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Horner SD, Surratt D, Juliusson S. Improving readability of patient education materials. J Community Health Nurs. 2000;17:15–23. [DOI] [PubMed] [Google Scholar]
  • [25].Kim W, Kim I, Baltimore K, Imtiaz AS, Bhattacharya BS, Lin L. Simple contents and good readability: improving health literacy for LEP populations. Int J Med Inform. 2020;141:104230. [DOI] [PubMed] [Google Scholar]
  • [26].Kianian R, Sun D, Crowell EL, Tsui E. The use of large language models to generate education materials about uveitis. Ophthalmol Retina. 2024;8:195–201. [DOI] [PubMed] [Google Scholar]
  • [27].Rasmussen MLR, Larsen AC, Subhi Y, Potapenko I. Artificial intelligence-based ChatGPT chatbot responses for patient and parent questions on vernal keratoconjunctivitis. Graefes Arch Clin Exp Ophthalmol. 2023;261:3041–3. [DOI] [PubMed] [Google Scholar]
  • [28].Yilmaz IBE, Doğan L. Talking technology: exploring chatbots as a tool for cataract patient education. Clin Exp Optom. 2025;108:56–64. [DOI] [PubMed] [Google Scholar]
  • [29].Temel MH, Erden Y, Bağcier F. Information quality and readability: ChatGPT’s responses to the most common questions about spinal cord injury. World Neurosurg. 2024;181:e1138–44. [DOI] [PubMed] [Google Scholar]
  • [30].Şahin MF, Ateş H, Keleş A, et al. Responses of five different artificial intelligence chatbots to the top searched queries about erectile dysfunction: a comparative analysis. J Med Syst. 2024;48:38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Srinivasan N, Samaan JS, Rajeev ND, Kanu MU, Yeo YH, Samakar K. Large language models and bariatric surgery patient education: a comparative readability analysis of GPT-3.5, GPT-4, Bard, and online institutional resources. Surg Endosc. 2024;38:2522–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].Erren TC, Lewis P, Shaw DM. Brave (in a) new world: an ethical perspective on chatbots for medical advice. Front Public Health. 2023;11:1254334. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Garcia Valencia OA, Suppadungsuk S, Thongprayoon C, et al. Ethical implications of chatbot utilization in nephrology. J Pers Med. 2023;13:1363. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Pan A, Musheyev D, Bockelman D, Loeb S, Kabarriti AE. Assessment of artificial intelligence chatbot responses to top searched queries about cancer. JAMA Oncol. 2023;9:1437–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
