Author manuscript; available in PMC: 2026 May 1.
Published in final edited form as: Curr Opin Urol. 2025 Feb 12;35(3):219–223. doi: 10.1097/MOU.0000000000001267

Artificial Intelligence and Patient Education

Olivia Paluszek 1, Stacy Loeb 1
PMCID: PMC11964839  NIHMSID: NIHMS2052808  PMID: 39945126

Abstract

Purpose of review:

Artificial intelligence (AI) chatbots are increasingly used as a source of information. Our objective was to review the literature on their use for patient education in urology.

Recent findings:

There are many published studies examining the quality of AI chatbots, most commonly ChatGPT. In many studies, responses from chatbots had acceptable accuracy but were written at a difficult reading level without specific prompts to enhance readability. A few studies have examined AI chatbots for other types of patient education, such as creating lay summaries of research publications or generating handouts.

Summary:

Artificial intelligence chatbots may provide an adjunctive source of patient education in the future, particularly if prompted to provide results with better readability. In addition, they may be used to rapidly generate lay research summaries, leaflets or other patient education materials for final review by experts.

Keywords: artificial intelligence, patient education, urology

Introduction

Effective health communications are important to increase public knowledge and awareness, prompt action, and refute myths.(1) Most U.S. adults get health information online; however, there is significant misinformation about urological topics.(2)

Recently, artificial intelligence (AI) chatbots have emerged as a source of online information. As of February 2024, 23% of U.S. adults had used ChatGPT, up from 18% in July 2023.(3) The purpose of this review was to summarize published studies about AI for patient education in urology.

Methods

PubMed was searched on December 20, 2024, for “artificial intelligence patient education urology,” yielding 49 results. After excluding 17 articles that were not relevant and 5 reviews, 27 articles with relevant primary data were included.
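
For readers who wish to reproduce or update this search programmatically, the same query can be run against the NCBI E-utilities. The sketch below uses Biopython’s Entrez module and is illustrative only; the contact email and result cap are placeholders, and the review itself screened results manually.

```python
from Bio import Entrez  # Biopython wrapper for the NCBI E-utilities

Entrez.email = "reviewer@example.org"  # placeholder; NCBI asks for a real contact address

# Same free-text query used in this review, run against PubMed.
handle = Entrez.esearch(
    db="pubmed",
    term="artificial intelligence patient education urology",
    retmax=100,  # arbitrary cap; the December 2024 search returned 49 records
)
record = Entrez.read(handle)
handle.close()

print(record["Count"])        # total number of matching records
print(record["IdList"][:10])  # first PubMed IDs for manual relevance screening
```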

Results

Many studies evaluated the quality of urology patient education from publicly available AI chatbots, most commonly ChatGPT. Chen et al. generated responses to 11 common stress incontinence questions using ChatGPT 3.5.(4) Overall response quality was moderate to moderately high (DISCERN score 3.73/5). Overall accuracy was 88%, but actionability was poor (PEMAT score 18 of 100) and the reading level was college graduate.

Shayegh et al. assessed the accuracy of ChatGPT responses to 10 common questions about inflatable penile prostheses (IPP).(5) Urologists classified 70% of responses as “excellent,” 20% as “satisfactory with minimal clarification,” and 10% as “unsatisfactory.” ChatGPT failed to include statistical data, and the authors discuss how addressing this shortcoming would provide a more valuable tool for education about procedures.

Gokmen et al. evaluated ChatGPT 4.0 responses to 1073 male infertility questions from AUA/ASRM guidelines.(6) Accuracy for true/false questions was initially 92% and increased to 94% after 60 days. Similarly, accuracy increased from 85% to 89% for multiple choice and from 78% to 86% for open-ended questions. ChatGPT improved at answering more difficult questions and incorporating guideline-based data over time.

Gibson et al. asked ChatGPT-4 about 8 common prostate cancer questions from Google Trends.(7) Urologists rated responses as good quality (by modified DISCERN and Global Quality Scores), with 79.44% understandability (modified PEMAT-AI). On a 5-point Likert scale, mean accuracy was 3.96, safety 4.32, appropriateness 4.45, actionability 4.05, and effectiveness 4.09. However, responses averaged an 11th grade reading level, limiting their value given the association between poor health literacy and worse quality of life.(8)

Mershon et al. assessed responses of ChatGPT 3.5 to 15 questions about renal cell carcinoma from professional society guidelines.(9) ChatGPT responses were rated as generally accurate (mean 3.64/5) and useful (3.58/5) by both urologists and non-clinicians. However, clinicians were neutral about its usefulness for educating patients.

Other studies compared ChatGPT to materials from urological organizations. Gabriel et al. examined ChatGPT 3.5 responses to 14 questions about complications of robotic prostatectomy.(10) Overall, 92.9% of responses were appropriate and pertinent, and there was 78.6% concordance with a leaflet from the British Association of Urological Surgeons (BAUS). While new research is not yet incorporated into the ChatGPT database, the authors note that many leaflets are only updated periodically.

Shah et al. compared patient education about erectile dysfunction, premature ejaculation, low testosterone, sperm retrieval, penile augmentation and male infertility from the Urology Care Foundation (UCF) versus ChatGPT 3.5.(11) Accuracy was comparable, and content from ChatGPT was rated more comprehensive. However, both sources exceeded the recommended reading level. Readability of some responses was improved with prompts to “explain it to me like I am in sixth grade.” The authors suggest that organizations involve physicians to create patient education materials and then use AI chatbots to improve readability, or vice versa.
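
As a rough illustration of this workflow, the sketch below appends the sixth-grade readability prompt reported by Shah et al. to a patient question. It assumes the OpenAI Python SDK with a chat model as a stand-in for the consumer ChatGPT interface used in the study, so the model name and prompt wrapper are assumptions rather than the authors’ exact setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is configured

READABILITY_SUFFIX = "Explain it to me like I am in sixth grade."  # prompt phrasing from Shah et al.

def ask(question: str, simplify: bool = False) -> str:
    """Return a chatbot answer, optionally with the readability suffix appended."""
    content = f"{question} {READABILITY_SUFFIX}" if simplify else question
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for ChatGPT 3.5; the study used the web interface
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content

question = "What are the treatment options for low testosterone?"
print(ask(question))                 # default answer, typically above the recommended reading level
print(ask(question, simplify=True))  # simplified answer for comparison and expert review
```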

Johnson et al. examined ChatGPT versus leaflets regarding pelvic floor surgery from the International Urogynecological Association (IUGA).(12) Multiple urogynecologists rated accuracy and completeness of information from ChatGPT (3/2023 version) as similar to the leaflets, and metrics improved on a rerun after 3 months (p<0.01). However, ChatGPT responses had significantly lower understandability and actionability than IUGA leaflets.

Studies also compared different ChatGPT versions for urology patient education. Ergin et al. evaluated ChatGPT 3.5 and 4.0 responses to 92 questions about male hypogonadism, erectile dysfunction and sexual desire disorder, ejaculatory dysfunction, penile abnormalities, priapism, and male infertility.(13) Although urologists rated both versions as providing mostly satisfactory answers, responses were often missing information. For ChatGPT 3.5, 50% of responses were completely correct, 33.6% were missing information, 13.3% mixed accurate and misleading content, and 3.3% were completely incorrect (vs. 57.6%, 32.2%, 11.1%, and 2.2% for ChatGPT 4.0). The authors conclude that ChatGPT may be helpful but, given its frequent information gaps, should only be used under clinician supervision.

Some studies examined prompting the chatbots for patient education. Rotem et al. prompted ChatGPT to provide clear and specific responses to 10 questions about urinary incontinence.(14) Mean ratings from blinded urogynecologists were 3.9/5 for accuracy (71% of scores ≥ 4), 4.0 for comprehensiveness (74% ≥ 4), and 4.0 for safety (74% ≥ 4). The researchers conclude that ChatGPT has potential to be helpful for incontinence, but there remains room for improvement in providing precise and accurate information.

Hershenhouse et al. evaluated responses of ChatGPT 3.5 to 9 frequently searched questions about prostate cancer.(15) They separately tested a prompt to provide a comprehensive, accurate response at or below a 6th-grade reading level. Urologists rated chatbot responses on a 5-point Likert scale. Although overall accuracy and completeness were high for both unprompted and prompted outputs, responses about diagnosis and treatment were less accurate than those about follow-up. Additionally, ChatGPT’s simplified responses had significantly better readability. This study also included lay participants, the majority of whom rated the simplified outputs highly for clarity and understandability. The authors concluded that prompting ChatGPT to produce simpler responses may increase its utility in patient education.

Halawani et al. compared materials on kidney cancer from the American Urological Association (AUA) and European Association of Urology (EAU) to ChatGPT 4.0, Gemini, and Perplexity.(16) Responses were recorded unprompted and after prompting to respond at a sixth-grade level. Urologists assessed responses from 1 (accurate/complete) to 5 (inaccurate). All chatbot outputs were accurate, with significant variability depending on the questions asked. ChatGPT 4.0 displayed the highest overall accuracy (mean 1.5), followed by Gemini (1.6) and Perplexity (2.2). Readability was poor across all chatbots, with Perplexity having the greatest complexity, while the AUA materials were the most readable (9.84). Although readability improved after prompting, the chatbots were not consistently able to respond at a sixth-grade level. The authors discuss how complexity and variability in chatbot responses limit their usefulness in patient education.

Shah et al. compared patient education about kidney, bladder and prostate cancer from Epic MyChart and the Urology Care Foundation (UCF) to ChatGPT 3.5.(17) Blinded urologic oncologists rated ChatGPT responses as highest quality and Epic as lowest (DISCERN); however, readability was worst for ChatGPT and best for Epic. Readability improved with prompting ChatGPT to “explain it to me like I am in sixth grade,” but accuracy and comprehensiveness decreased.

Other studies compared patient education between different AI chatbots. Alasker et al. asked ChatGPT 3.5, ChatGPT 4.0, and Bard to answer 52 prostate cancer-related questions sourced from educational websites.(18) Urologists found no significant difference in overall accuracy between the chatbots, but ChatGPT 3.5 produced more correct responses to general knowledge questions (88.9% correct) than ChatGPT-4 (77.8%) and Bard (22.2%). ChatGPT 4.0 had greater overall comprehensiveness (67.3%) versus ChatGPT 3.5 (40.4%) and Bard (48.1%). Bard’s responses were the most readable, with a Flesch-Kincaid Reading Level of 10.2 (versus 13.0 and 12.3 for ChatGPT 3.5 and 4.0).
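
For context, the Flesch-Kincaid Grade Level cited in these studies is a simple function of sentence length and syllable density. The sketch below computes it with a naive vowel-group syllable counter, which only approximates the counts produced by dedicated readability software.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: the number of vowel groups in the word (approximation only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

sample = "The prostate is a small gland. It sits below the bladder and helps make semen."
print(round(flesch_kincaid_grade(sample), 1))  # low single digits for simple patient-friendly text
```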

Erkan et al. asked ChatGPT, Gemini, and Copilot four common questions about prostate, bladder, kidney, and testicular cancer treatment from Google Trends.(19) Urologists rated ChatGPT and Gemini responses as average quality (DISCERN sum 41 and 42, respectively), while Copilot responses were low quality (DISCERN sum 35). Gemini had moderate actionability (60% PEMAT) and the others were low (40% PEMAT). All three had low understandability (40% PEMAT) and above-college level readability. The researchers conclude that chatbots should be used cautiously for patient information given their difficult readability and moderate quality.

Similarly, Musheyev et al. evaluated responses from ChatGPT, Perplexity, Chat Sonic, and Microsoft Bing AI to top queries from Google Trends about prostate, bladder, kidney and testicular cancer.(20) Outputs were generally good quality (median 4/5 DISCERN). However, median understandability and actionability were only 66.7% and 40% (PEMAT). Subsequently, Musheyev et al. assessed information about kidney stones from ChatGPT 3.5, Perplexity, Chat Sonic, and Bing AI.(21) Quality was high (median overall DISCERN 4/5) without misinformation; however, the median understandability and actionability were 69.6% and 40%.

Other studies have also compared information about kidney stones between chatbots. Song et al. queried Claude, Bard, ChatGPT 4, and New Bing with 21 questions about urolithiasis and 2 case scenarios.(22) Accuracy was highest for Claude (most scores 4 or 5/5) and lowest for Bard (most scores ≤ 3). Claude also scored best for comprehensiveness and “human care” (demonstrating empathy). The authors discuss how performance may improve as LLMs are updated and more specialized medical LLMs become available, but caution that chatbots should not substitute for medical consultation.

Sahin et al. similarly compared responses of ChatGPT-4, Claude-3, Mistral Large, Google PaLM 2, and Grok to 25 kidney stone questions identified by Google Trends.(23) None were sufficiently comprehensible: ChatGPT-4 produced the most complex responses with the worst readability, while Grok produced the simplest. Claude produced the highest quality responses (DISCERN) and greatest actionability (PEMAT), whereas Google PaLM had the highest understandability (PEMAT).

Connors et al. examined 32 questions about kidney stones, ureteral stents, benign prostatic hyperplasia and upper tract urothelial carcinoma from the UCF patient education materials as inputs into ChatGPT 4.0 and Bard.(24) Endourologists blindly rated responses in random order. Using a 5-point Likert scale, they rated AI chatbot responses higher than UCF. For accuracy, mean ratings were 3.79 for ChatGPT, 3.58 for Bard and 3.48 for UCF; for comprehensiveness, the mean ratings were 3.69, 3.42 and 3.21, respectively. However, AI responses were at a significantly higher reading level, presenting a barrier for lay audiences.

Other investigators compared AI chatbots for patient education about benign urological conditions. For example, Warren et al. queried ChatGPT, Bing AI, Bard, and Doximity GPT about 3 common patient questions regarding benign prostatic hyperplasia from Google Trends.(25) All chatbots were deemed accurate (5.6/6) and complete (2.8/3). The quality of information was moderate (mean overall DISCERN score 3.3) but increased after prompting with a clinical scenario (mean 4.4). However, readability of unprompted output was poor (Flesch-Kincaid reading level 12.1) and remained high (11.2) after prompting, representing a limitation to use in patient education.

Warren et al. also assessed responses of ChatGPT, Bing AI, Bard and Doximity’s DocsGPT to 9 common questions about Peyronie’s disease from the NIH website.(26) They compared unprompted results to those prompted with a clinical scenario. All bots provided moderate quality responses (mean 3.5/5), which improved to high quality when prompted (mean 4.6). All responses were deemed accurate (mean 5.5/6) and complete (mean 2.8/3.0). However, readability was poor (mean 12.9 grade level), and the bots were not able to respond at an 8th grade reading level even when prompted. Although the bots produced accurate responses, poor readability and reliance on prompting limit their application.

Malak et al. investigated outputs regarding female urinary incontinence produced by 10 chatbots prompted to produce understandable information.(27) Gemini responses had the highest overall quality as assessed by urogynecologists, while Grok had the lowest. However, Grok responses were considered the most readable and Mistral the least. The authors express caution about using chatbots as a replacement for actual medical advice, especially given their extremely poor readability.

One study by Pompili et al. compared the ability of ChatGPT-4, Bard, and Meta to create leaflets regarding circumcision, nephrectomy, overactive bladder syndrome, and transurethral prostate resection, with prompts to make them understandable to the public.(28) Leaflets produced by Bard had the highest quality (average score 3.58 out of 5), followed by Meta (3.34) and ChatGPT-4 (3.08). Of note, quality scores were higher for simpler topics (circumcision) than complicated ones (nephrectomy). However, all three models produced responses above the average adult reading level.

Gortz et al. created a new “prostate cancer communication assistant” (PROSCA) AI chatbot for patient information.(29) It was trained to answer questions about prostate disease, diagnostic procedures and treatment options. In an initial pilot study, 9 participants used the chatbot while undergoing diagnostic evaluation, with increased knowledge of prostate cancer in 89%. In a follow-up trial, 112 patients diagnosed with prostate cancer were randomized to receive information from PROSCA plus urologists versus urologists alone.(30) While the initial information need was similar between the two groups, by the end of the study patients who used PROSCA had a greater decrease in information need than the control group. Meanwhile, 34.7% of control patients had their need for information increase by one category, compared to 16.7% of PROSCA users. Overall, 71.4% of chatbot users agreed that the chatbot made them more informed about their diagnoses, and 90.7% said that they would use it again. Given the favorable findings, the authors advocate for integration of PROSCA into clinical settings.

Finally, one study examined the use of AI to help create plain language summaries of publications. Specifically, Eppler et al. studied the use of ChatGPT to create lay summaries of urology journal articles.(31) The authors first examined different prompts to translate the abstract into a layperson summary that is understandable (≤6th grade reading level, sentences <20 words, inclusion of specific key elements, and explanation of medical concepts). ChatGPT generated lay research summaries in 17.5 seconds. More than 85% were accurate, with better readability than the original abstracts and author-generated patient summaries. Accordingly, the authors recommended prompting ChatGPT to rapidly develop patient summaries for scientific journals, which should then be verified by researchers. The authors discuss potential future applications in creating simplified versions of other patient materials (e.g., discharge summaries, at-home care instructions).
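
As an illustration of how such constraints can be operationalized, the sketch below builds a lay-summary prompt reflecting the criteria reported by Eppler et al. and adds a simple automated check on sentence length. The prompt wording and the screening function are assumptions for demonstration, not the authors’ published materials, and any generated summary would still require expert verification.

```python
import re

# Illustrative prompt reflecting the constraints reported by Eppler et al. (<=6th-grade
# reading level, sentences under 20 words, key elements covered, medical terms explained).
LAY_SUMMARY_PROMPT = (
    "Rewrite the following abstract as a layperson summary at or below a 6th-grade "
    "reading level. Use sentences shorter than 20 words, explain every medical term, "
    "and cover the purpose, methods, results, and conclusions.\n\nAbstract:\n{abstract}"
)

def build_prompt(abstract: str) -> str:
    """Fill the template with a journal abstract before sending it to a chatbot."""
    return LAY_SUMMARY_PROMPT.format(abstract=abstract)

def sentences_under_limit(summary: str, limit: int = 20) -> bool:
    """Screen a draft summary: every sentence must stay under the word limit."""
    sentences = [s for s in re.split(r"[.!?]+", summary) if s.strip()]
    return all(len(s.split()) < limit for s in sentences)

draft = "Doctors tested a new bladder treatment. Most patients felt better after six weeks."
print(sentences_under_limit(draft))  # True; failing drafts would be regenerated or revised
```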

Conclusion

Numerous studies have examined AI for patient education about urological topics, including oncology, female pelvic medicine, sexual medicine, and endourology. In many studies, the quality of chatbot responses was acceptable, especially when prompted with a clinical scenario for context. Additional advantages included the speed with which summarized information can be obtained,(29) and that AI may provide a way for patients with sensitive questions to get information in a non-judgmental environment.

However, many studies showed that unprompted chatbot output exceeds the recommended reading level for lay health consumers and does not provide easily actionable steps. This raises concerns about health equity, particularly for patients with low health literacy. One possible mitigation is the use of prompts optimized for readability. However, this is unlikely to fully solve the issue, as several studies noted that chatbots were often still unable to respond at an appropriate reading level, or that simplification reduced response quality. Overall, acceptance of chatbots in the healthcare setting varies between patients and clinicians.(9)

A few studies examined AI to create patient leaflets or research summaries. These reflect another area where AI may bring added value in quickly generating content that can undergo expert review prior to dissemination.

Only one study in our review directly showed that the addition of AI to urologist advice improved patients’ perceived knowledge.(30) Further studies are warranted to examine the downstream outcomes of using AI chatbots for patient education.

Key Points.

  • AI chatbots are increasingly being used to educate patients about various urological topics.

  • Advantages of chatbots include accessibility, rapid response time, and providing a supportive environment for users.

  • Chatbot responses are generally accurate and may improve user knowledge, but their output consistently exceeds the recommended reading level, which limits their value in patient education.

  • Further research is recommended to investigate the downstream impacts of using AI chatbots for patient education.

Acknowledgement:

SL is supported by the National Cancer Institute at the National Institutes of Health (R01 CA278997). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

COI: SL reports consulting with Savor Health, Astellas and Doceree, unrelated to the current manuscript.

References

  • 1. U.S. Department of Health and Human Services. National Institutes of Health. National Cancer Institute. Making Health Communication Programs Work. https://www.cancer.gov/publications/health-communication/pink-book.pdf. Accessed January 11, 2020.
  • 2. Loeb S, Taylor J, Borin JF, Mihalcea R, Perez-Rosas V, Byrne N, Chiang AL, Langford A. Fake News: Spread of Misinformation about Urological Conditions on Social Media. European urology focus 2020;6(3):437–9. doi: 10.1016/j.euf.2019.11.011.
  • 3. Pew Research Center. Americans’ use of ChatGPT is ticking up, but few trust its election information. https://www.pewresearch.org/short-reads/2024/03/26/americans-use-of-chatgpt-is-ticking-up-but-few-trust-its-election-information/. Published March 26, 2024.
  • 4. Chen A, Jacob J, Hwang K, Kobashi K, Gonzalez RR. AUA Guideline Committee Members Determine Quality of Artificial Intelligence‒Generated Responses for Female Stress Urinary Incontinence. Urol Pract 2024;11(4):693–8. doi: 10.1097/UPJ.0000000000000577.
  • 5. Shayegh NA, Byer D, Griffiths Y, Coleman PW, Deane LA, Tonkin J. Assessing artificial intelligence responses to common patient questions regarding inflatable penile prostheses using a publicly available natural language processing tool (ChatGPT). Can J Urol 2024;31(3):11880–5.
  • 6. Gokmen O, Gurbuz T, Devranoglu B, Karaman MI. Artificial intelligence and clinical guidance in male reproductive health: ChatGPT4.0’s AUA/ASRM guideline compliance evaluation. Andrology 2024. doi: 10.1111/andr.13693. ••The authors reported that ChatGPT improved at answering difficult patient questions with repeated use. The chatbot also incorporated more guideline-based data over time.
  • 7. Gibson D, Jackson S, Shanmugasundaram R, Seth I, Siu A, Ahmadi N, Kam J, Mehan N, Thanigasalam R, Jeffery N, et al. Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment. J Med Internet Res 2024;26:e55939. doi: 10.2196/55939. ••This paper discussed chatbots’ poor readability, which is concerning in the context of the established relationship between poor health literacy and worse outcomes in patients with cancer. The authors emphasized an urgent need to address these readability issues before chatbots can be integrated into the healthcare setting.
  • 8. Holden CE, Wheelwright S, Harle A, Wagland R. The role of health literacy in cancer care: A mixed studies systematic review. PLoS One 2021;16(11):e0259815. doi: 10.1371/journal.pone.0259815.
  • 9. Mershon JP, Posid T, Salari K, Matulewicz RS, Singer EA, Dason S. Integrating artificial intelligence in renal cell carcinoma: evaluating ChatGPT’s performance in educating patients and trainees. Transl Cancer Res 2024;13(11):6246–54. doi: 10.21037/tcr-23-2234.
  • 10. Gabriel J, Shafik L, Alanbuki A, Larner T. The utility of the ChatGPT artificial intelligence tool for patient education and enquiry in robotic radical prostatectomy. Int Urol Nephrol 2023;55(11):2717–32. doi: 10.1007/s11255-023-03729-4.
  • 11. Shah YB, Ghosh A, Hochberg AR, Rapoport E, Lallas CD, Shah MS, Cohen SD. Comparison of ChatGPT and Traditional Patient Education Materials for Men’s Health. Urol Pract 2024;11(1):87–94. doi: 10.1097/upj.0000000000000490.
  • 12. Johnson CM, Bradley CS, Kenne KA, Rabice S, Takacs E, Vollstedt A, Kowalski JT. Evaluation of ChatGPT for Pelvic Floor Surgery Counseling. Urogynecology (Phila) 2024;30(3):245–50. doi: 10.1097/spv.0000000000001459.
  • 13. Ergin IE, Sanci A. Can ChatGPT help patients understand their andrological diseases? Rev Int Androl 2024;22(2):14–20. doi: 10.22514/j.androl.2024.010.
  • 14. Rotem R, Zamstein O, Rottenstreich M, O’Sullivan OE, O’Reilly BA, Weintraub AY. The future of patient education: A study on AI-driven responses to urinary incontinence inquiries. Int J Gynaecol Obstet 2024;167(3):1004–9. doi: 10.1002/ijgo.15751.
  • 15. Hershenhouse JS, Mokhtar D, Eppler MB, Rodler S, Storino Ramacciotti L, Ganjavi C, Hom B, Davis RJ, Tran J, Russo GI, et al. Accuracy, readability, and understandability of large language models for prostate cancer information to the public. Prostate Cancer Prostatic Dis 2024. doi: 10.1038/s41391-024-00826-y.
  • 16. Halawani A, Almehmadi SG, Alhubaishy BA, Alnefaie ZA, Hasan MN. Empowering patients: how accurate and readable are large language models in renal cancer education. Front Oncol 2024;14:1457516. doi: 10.3389/fonc.2024.1457516.
  • 17. Shah YB, Ghosh A, Hochberg A, Mark JR, Lallas CD, Shah MS. Artificial intelligence improves urologic oncology patient education and counseling. Can J Urol 2024;31(5):12013–8.
  • 18. Alasker A, Alsalamah S, Alshathri N, Almansour N, Alsalamah F, Alghafees M, AlKhamees M, Alsaikhan B. Performance of large language models (LLMs) in providing prostate cancer information. BMC Urol 2024;24(1):177. doi: 10.1186/s12894-024-01570-0.
  • 19. Erkan A, Koc A, Barali D, Satir A, Zengin S, Kilic M, Dundar G, Guzelsoy M. Can Patients With Urogenital Cancer Rely on Artificial Intelligence Chatbots for Treatment Decisions? Clin Genitourin Cancer 2024;22(6):102206. doi: 10.1016/j.clgc.2024.102206.
  • 20. Musheyev D, Pan A, Loeb S, Kabarriti AE. How Well Do Artificial Intelligence Chatbots Respond to the Top Search Queries About Urological Malignancies? European urology 2023. doi: 10.1016/j.eururo.2023.07.004.
  • 21. Musheyev D, Pan A, Kabarriti AE, Loeb S, Borin JF. Quality of Information About Kidney Stones from Artificial Intelligence Chatbots. Journal of endourology 2024;38(10):1056–61. doi: 10.1089/end.2023.0484.
  • 22. Song H, Xia Y, Luo Z, Liu H, Song Y, Zeng X, Li T, Zhong G, Li J, Chen M, et al. Evaluating the Performance of Different Large Language Models on Health Consultation and Patient Education in Urolithiasis. J Med Syst 2023;47(1):125. doi: 10.1007/s10916-023-02021-3.
  • 23. Sahin MF, Topkac EC, Dogan C, Seramet S, Ozcan R, Akgul M, Yazici CM. Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots’ Answers to the Most Common Questions About Kidney Stones. J Endourol 2024;38(11):1172–7. doi: 10.1089/end.2024.0474.
  • 24. Connors C, Gupta K, Khusid JA, Khargi R, Yaghoubian AJ, Levy M, Gallante B, Atallah W, Gupta M. Evaluation of the Current Status of Artificial Intelligence for Endourology Patient Education: A Blind Comparison of ChatGPT and Google Bard Against Traditional Information Resources. Journal of endourology 2024;38(8):843–51. doi: 10.1089/end.2023.0696.
  • 25. Warren CJ, Payne NG, Edmonds VS, Voleti SS, Choudry MM, Punjani N, Abdul-Muhsin HM, Humphreys MR. Quality of Chatbot Information Related to Benign Prostatic Hyperplasia. Prostate 2025;85(2):175–80. doi: 10.1002/pros.24814.
  • 26. Warren CJ, Edmonds VS, Payne NG, Voletti S, Wu SY, Colquitt J, Sadeghi-Nejad H, Punjani N. Prompt matters: evaluation of large language model chatbot responses related to Peyronie’s disease. Sex Med 2024;12(4):qfae055. doi: 10.1093/sexmed/qfae055.
  • 27. Malak A, Sahin MF. How Useful are Current Chatbots Regarding Urology Patient Information? Comparison of the Ten Most Popular Chatbots’ Responses About Female Urinary Incontinence. J Med Syst 2024;48(1):102. doi: 10.1007/s10916-024-02125-4.
  • 28. Pompili D, Richa Y, Collins P, Richards H, Hennessey DB. Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models. World J Urol 2024;42(1):455. doi: 10.1007/s00345-024-05146-3.
  • 29. Görtz M, Baumgärtner K, Schmid T, Muschko M, Woessner P, Gerlach A, Byczkowski M, Sültmann H, Duensing S, Hohenfellner M. An artificial intelligence-based chatbot for prostate cancer education: Design and patient evaluation study. Digit Health 2023;9:20552076231173304. doi: 10.1177/20552076231173304.
  • 30. Baumgartner K, Byczkowski M, Schmid T, Muschko M, Woessner P, Gerlach A, Bonekamp D, Schlemmer HP, Hohenfellner M, Gortz M. Effectiveness of the Medical Chatbot PROSCA to Inform Patients About Prostate Cancer: Results of a Randomized Controlled Trial. Eur Urol Open Sci 2024;69:80–8. doi: 10.1016/j.euros.2024.08.022. •••The authors used a randomized controlled trial to investigate the added benefit of a novel prostate cancer chatbot (PROSCA) on patient knowledge, in addition to information from urologists alone. They reported favorable findings and advocated for integration of PROSCA into clinical settings.
  • 31. Eppler MB, Ganjavi C, Knudsen JE, Davis RJ, Ayo-Ajibola O, Desai A, Storino Ramacciotti L, Chen A, De Castro Abreu A, Desai MM, et al. Bridging the Gap Between Urological Research and Patient Understanding: The Role of Large Language Models in Automated Generation of Layperson’s Summaries. Urol Pract 2023;10(5):436–43. doi: 10.1097/upj.0000000000000428. •••This study investigated the ability of chatbots to summarize journal articles. The summaries were accurate and more readable than the original abstracts and author-generated summaries. The authors recommended chatbot use for summarizing scientific abstracts and noted possible future applications for other patient materials like discharge summaries.
