Introduction
To determine the importance of symptoms, patients send messages to providers and search the internet for answers. Although professional advice is ideal, delays exist between when a patient message is received and when a response can be sent. A popular online chatbot released by OpenAI in November 2022 has garnered widespread attention in academia and on social media for its ability to simulate natural language and demonstrate apparent intelligence [1, 2]. ChatGPT is free, open access, responds in real time with human-like answers, and has even provided appropriate cardiovascular disease prevention recommendations [3]. We evaluated the ability of this artificial intelligence (AI) to respond to prompts concerning common ocular symptoms.
Methods
This study was performed in February–March 2023. We scripted ten prompts reflective of common patient messages relating to conditions including posterior vitreous detachment, retinal tear and detachment, ocular surface disease, exudative age-related macular degeneration (eAMD), and post-intravitreal injection pain and redness. Each prompt was submitted in three individual chats to ChatGPT and the responses were recorded. Each set of three responses was graded independently by five retinal specialists. Reviewers graded each set as “precise” or “imprecise” based on similarity in content across the three answers, and as “suitable” or “unsuitable” based on whether the answers reflected adequate clinical responses to real-world scenarios. If any response in a set contained inaccurate or irrelevant content, the set was deemed “unsuitable.” Each set received the majority reviewer grade for precision and suitability.
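The set-level grading described above reduces to a simple majority vote: with five reviewers and two possible grades, a strict majority always exists. A minimal sketch of this aggregation step, using hypothetical reviewer votes (the function name and the example grades are illustrative, not the study's actual data):

```python
from collections import Counter

def majority_grade(reviewer_grades):
    """Return the grade assigned by the majority of reviewers.

    With an odd number of reviewers and two possible grades,
    a strict majority always exists, so no tie-breaking is needed."""
    return Counter(reviewer_grades).most_common(1)[0][0]

# Hypothetical grades from five reviewers for one set of responses
precision_votes = ["precise", "precise", "imprecise", "precise", "precise"]
print(majority_grade(precision_votes))  # -> precise
```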
Results
Only eight out of ten sets of responses received a grade of both precise and suitable (Table 1). Two of the ten sets were deemed both imprecise and unsuitable. No set was graded precise and unsuitable, or imprecise and suitable. Answers were unsuitable for reasons such as a lack of urgency in referral for acute conversion to eAMD and a lack of the detailed inquiry necessary for appropriate triage. Fleiss’ kappa analysis found an inter-rater reliability of 0.28 (p < 0.01) for precision and 0.04 (p = 0.69) for suitability. The chatbot did not cite sources for its responses nor ask follow-up questions.
Table 1.
Appropriateness of symptom responses by ChatGPT.
| Prompt given to ChatGPT | Precision grade | Suitability grade |
|---|---|---|
| I’m having pain in my eye after an injection yesterday. | Precise | Suitable |
| My eye is red after an eye injection yesterday. | Imprecise | Unsuitable |
| I’ve been having flashes and floaters for the past day. | Precise | Suitable |
| I’ve been having flashes and floaters for the past year. | Precise | Suitable |
| I woke up with a red eye but no pain. | Precise | Suitable |
| I have a shadow in my vision. | Precise | Suitable |
| I have a pressure behind my eyes. | Precise | Suitable |
| I have macular degeneration and things look wavy today.a | Imprecise | Unsuitable |
| My eyes have an intermittent sandy feeling in them.b | Precise | Suitable |
| I am diabetic and woke up with many new floaters. | Precise | Suitable |
aExample response: “I’m sorry to hear that you’re experiencing symptoms of macular degeneration. This condition can cause vision loss and distortion, particularly in the central part of your vision, which can make things look wavy. It’s important to talk to your eye doctor if you’re experiencing any changes in your vision, as they may be able to provide you with appropriate treatment options to slow down the progression of the condition and help preserve your vision. In the meantime, you can try to protect your eyes by wearing sunglasses when you’re outside, eating a healthy diet rich in fruits and vegetables, and quitting smoking, if you smoke.”
bExample response: “The sandy feeling in your eyes could be due to a few different reasons, including dryness, allergies, or an infection. It’s best to consult an eye doctor to determine the cause and appropriate treatment. In the meantime, you can try using artificial tears to relieve any dryness or irritation”.
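The inter-rater reliability reported in the Results is Fleiss’ kappa, which compares observed agreement among a fixed number of raters against the agreement expected by chance. A minimal sketch of the computation, with hypothetical vote counts (the example ratings are illustrative, not the study’s actual reviewer data):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of per-item category counts.

    ratings[i][j] is the number of raters who assigned item i to
    category j; every item must have the same total number of raters."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])

    # Observed agreement P_i for each item, averaged into P_bar
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from the marginal category proportions
    p_cats = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_e = sum(p * p for p in p_cats)

    return (p_bar - p_e) / (1 - p_e)

# Hypothetical precision grades: counts of [precise, imprecise] votes
# from five reviewers across ten response sets.
ratings = [
    [5, 0], [4, 1], [3, 2], [5, 0], [2, 3],
    [5, 0], [4, 1], [1, 4], [5, 0], [4, 1],
]
print(round(fleiss_kappa(ratings), 2))
```

Perfect agreement across raters yields a kappa of 1, while values near 0 indicate agreement no better than chance, consistent with the low suitability kappa reported above.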
Discussion
With a projected shortage of ophthalmologists and high clinician burnout rates, healthcare systems stand to benefit from AI [4, 5]. Our findings indicate that, although currently imperfect, chatbots have the potential to be integrated into clinical practice to alleviate clinical burden and rising practice costs. Improved versions could be incorporated into electronic patient communication portals to send real-time responses. Alternatively, chatbots could auto-draft responses for clinicians to review, edit as needed, and send to patients, decreasing the burden of typing replies.
Finally, although ChatGPT notes on its site that it is not meant for medical use, patients may still turn to popular chatbots for health information [1]. Clinicians should be aware of the public health risk posed by patients using online chatbots, as patients may not be made consistently or appropriately aware of the urgency of their symptoms. This study used ChatGPT’s February 13 version; results may not be applicable to other versions or platforms. Further studies should evaluate the chatbot’s ability to answer unscripted prompts, with larger inputs and broader evaluation of its responses.
Author contributions
JCT had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. MBW was responsible for reviewing the data and statistical analysis. BJK, AMM, DS, and AJB were responsible for data and critical review of the manuscript. BVB was responsible for data, critical review of the manuscript, and statistical analysis. All authors have read and approved the final manuscript.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on request.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. ChatGPT: optimizing language models for dialogue. Accessed February 2023. https://openai.com/blog/chatgpt.
- 2. Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, et al. How does ChatGPT perform on the United States Medical Licensing Examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. doi: 10.2196/45312.
- 3. Sarraju A, Bruemmer D, Van Iterson E, Cho L, Rodriguez F, Laffin L. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA. 2023;329:842. Available from: https://jamanetwork.com/journals/jama/fullarticle/2801244.
- 4. National and Regional Projections of Supply and Demand for Surgical Specialty Practitioners: 2013–2025. U.S. Department of Health and Human Services, Health Resources and Services Administration, Bureau of Health Workforce, National Center for Health Workforce Analysis; 2016. Available from: https://bhw.hrsa.gov/sites/default/files/bureau-health-workforce/data-research/surgical-specialty-report.pdf.
- 5. Linzer M, Jin JO, Shah P, Stillman M, Brown R, Poplau S, et al. Trends in clinician burnout with associated mitigating and aggravating factors during the COVID-19 pandemic. JAMA Health Forum. 2022;3:e224163. doi: 10.1001/jamahealthforum.2022.4163.
