Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks, such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology, performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. This technology currently has limitations, including the propensity of LLMs to “hallucinate”, or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing “AI plagiarism” or the publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been developed in the past few years. We discuss recent literature evaluating the role of these language models in medicine, with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are emerging rapidly; it is therefore important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Subject terms: Medical research, Education
Introduction
The field of ophthalmology has often been at the forefront of adopting new technology, and the use of artificial intelligence (AI) has been explored in many areas of ophthalmology [1–4]. A recent advancement in the field of AI has been the introduction of large language models (LLMs), which can process natural language input and generate conversational responses. A free and accessible LLM known as ChatGPT was introduced by the San Francisco-based company OpenAI Inc. and has become increasingly popular since its release on November 30, 2022 [5]. ChatGPT can interact with users and “answer follow up questions, admit mistakes, challenge incorrect premises, and reject inappropriate requests” [5]. The original iteration, ChatGPT-3.5, was released to the public for free; an updated version, ChatGPT-4, was released on March 14, 2023, and is available to users at a cost. Several studies have explored the performance of both ChatGPT-3.5 and ChatGPT-4 on tasks such as debugging code [5], answering standardized test questions [6], and even proposing novel compounds [6].
Since its release, ChatGPT has been met with both great excitement and criticism, with many speculating on the positive and negative impacts this technology will have on healthcare [7–12]. As the use of LLMs grows in medicine, it is important for ophthalmologists to understand how these models work, how to leverage their unique capabilities, and how to identify potential sources of harm. In this work, we explore the principles behind how LLMs such as ChatGPT work and present findings of current studies regarding the application of LLMs in medicine. We focus on how LLMs could be used in ophthalmology, specifically the benefits, limitations, and future advancement of this technology with respect to patient care and research.
Overview of ChatGPT and LLMs
LLMs have demonstrated the ability to perform natural language processing (NLP) tasks zero-shot, meaning they are able to complete tasks without being trained on specific examples [13]. The most popular and accessible LLM has been the generative pretrained transformer 3.5 (GPT-3.5), which is available to use for free through a “research preview” of ChatGPT released by OpenAI. GPT-3.5 is a 175 billion-parameter model trained using unsupervised deep learning on a variety of texts available as of 2021 [14–16]. The unsupervised training consists of predicting the next token in a text sequence, with additional fine-tuning by reinforcement learning from human feedback (RLHF) [17]. This enables ChatGPT to have human-like conversations with its users and generate coherent text.
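The next-token objective described above can be illustrated with a minimal, self-contained sketch. The toy vocabulary and the hand-set scores below are invented for illustration only; a real model computes these scores with a trained transformer over a vocabulary of tens of thousands of tokens.

```python
import math

# Toy illustration of next-token prediction: the model assigns a score
# (logit) to every token in its vocabulary, converts the scores into
# probabilities with a softmax, and the most probable token is emitted.
# Vocabulary and logits here are invented, not from a trained model.
vocab = ["retina", "cornea", "lens", "macula"]
logits = [2.0, 0.5, 0.1, 1.0]  # hypothetical scores for the next token

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy choice: "retina"
```

During training, the model's parameters are adjusted so that the probability assigned to the token that actually follows in the training text is maximized; RLHF then further tunes the model toward responses that human raters prefer.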
However, despite its success and popularity, ChatGPT has many limitations. Its creators at OpenAI have warned that the chatbot can sometimes “write plausible-sounding but incorrect or nonsensical answers” [5]. These are often referred to as “hallucinations.” Additionally, as ChatGPT is only trained on data up to 2021, it does not have access to current events and the latest information [6]. Recently, however, OpenAI and its competitors have employed retrieval-augmented generation (RAG) to connect their LLMs to the internet. RAG entails using the user’s query to fetch relevant documents from the internet and adding them to the prompt to give the model additional context [18]. Additionally, OpenAI has released an application programming interface (API) that allows developers to build upon the existing GPT-3.5 and GPT-4 models for custom applications. Some developers have made chatbots that let users “talk to” a database of documents, which can reduce hallucinations and improve reliability by restricting the AI’s access to peer-reviewed papers such as those available on PubMed [19, 20]. For example, the authors of this review have used the GPT-3.5 API to develop SightBot (https://sightbot.brilliantly.ai/), a chatbot tool built using OpenAI’s and PubMed’s APIs that provides information based on peer-reviewed PubMed sources and cites the articles used [20]. It is also well documented that the prompt provided to an LLM can greatly impact the output; thus, carefully crafted prompts can be used to encourage desired behaviour and reduce the risk of low-quality output [13].
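The RAG pattern described above can be sketched in a few lines. This is a minimal illustration only: the two-document corpus is invented, and retrieval is approximated by keyword overlap, whereas a production system would query PubMed or a vector database and send the assembled prompt to an LLM API.

```python
# Minimal sketch of retrieval-augmented generation (RAG): documents
# relevant to the user's query are fetched and prepended to the prompt
# so the model answers with that added, citable context.
corpus = {
    "doc1": "Anti-VEGF injections are a treatment for wet age-related macular degeneration.",
    "doc2": "Glaucoma is managed with topical drops that lower intraocular pressure.",
}

def retrieve(query, corpus, k=1):
    # Toy retriever: rank documents by shared words with the query.
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]

def build_prompt(query, corpus):
    context = "\n".join(retrieve(query, corpus))
    return (
        "Answer using only the context below and cite it.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

prompt = build_prompt("How is glaucoma managed?", corpus)
# The prompt now carries the glaucoma document as grounding context.
```

Restricting the context to trusted sources in this way is what allows tools like the document-grounded chatbots mentioned above to cite their sources and reduce hallucination.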
More work is being done to improve the accuracy of responses generated by LLMs for specific biomedical applications. Several other groups are building language models fine-tuned on biomedical texts, theoretically optimizing their performance for medical applications. For example, BioMedLM was trained using a custom biomedical tokenizer trained on PubMed abstracts [21]. Other biomedical language models include DRAGON [22], BioLinkBERT [23], PubMedBERT [24], BioMegatron [25], and BioGPT [26]. These biomedical language models are not as easily accessible as ChatGPT but represent how LLMs can be built on domain-specific knowledge. We provide a full list of the language models mentioned in this review in Table 1, including their parameter counts, the year each model was released, and their performance on USMLE-style questions. It should be noted that only a subset of the models were tested on USMLE-style questions, and in most cases the reported results combine questions from Step 1, Step 2, and Step 3. This is not an exhaustive list of language models; many more exist and are actively being developed, though many are not publicly available. An additional list of easily accessible chatbots built on LLMs is provided in Table 2. Further work needs to be done to understand what impact these chatbots will have in the field of ophthalmology. Here, we discuss some reported benefits and limitations of LLMs, focusing on ChatGPT.
Table 1.
List of language models included in this paper.
| Model | Type | Date | Parameter Count | Data Used for Training | Performance on USMLE-style questions |
|---|---|---|---|---|---|
| GPT-3 [16] | Generative Pre-Trained Transformer (GPT) | 2020 | 175 B | Common Crawl dataset along with other curated high-quality datasets including an expanded version of the WebText dataset, two internet-based book corpora, and English-language Wikipedia. | - |
| GPT-3.5 | GPT | 2022 | 175 B | This model uses the same training data as GPT-3 but adds reinforcement learning from human feedback (RLHF) | 53.6% [88] |
| GPT-4 [6] | GPT | 2023 | - | The training data used to generate this model has not been released. | 86.7% [88] |
| BioMedLM [21] | GPT | 2022 | 2.7 B | Biomedical Abstracts and Papers from The Pile (https://pile.eleuther.ai/) | 50.3% [21] |
| DRAGON [22] | Deep Bidirectional Language-Knowledge Graph Pretraining | 2022 | 360 M | Texts (PubMed abstracts) and the Unified Medical Language System knowledge graph [89] | 47.5% [21] |
| BioLinkBERT [23] | Bidirectional Encoder Representations from Transformers (BERT) | 2022 | 340 M | Biomedical model that uses the link structure (hyperlinks) of documents during training | 45.1% [21] |
| PubMedBERT [24] | BERT | 2021 | 100 M | Uses abstracts and full texts from PubMed | 38.1% [21] |
| BioMegatron [25] | Similar to BERT | 2020 | 345 M/800 M/1.2 B | Uses abstracts and full texts from PubMed | - |
| BioGPT [26] | GPT | 2022 | 347 M | Pre-trained on 15 M PubMed abstracts | - |
| T5 [90] | Text-to-text Transfer Transformer (T5) | 2020 | 11 B | Created the Colossal Clean Crawled Corpus (C4) that is 2 orders of magnitude larger than Wikipedia | - |
| ClinicalBert [52] | BERT | 2019 | - | Uses the MIMIC-III dataset from the intensive care unit at Beth Israel | - |
| GatorTron [53] | BERT | 2022 | 8.9 B [53] | Uses de-identified clinical data from the EHR at the University of Florida Health System, PubMed, WikiText, and MIMIC-III | - |
| SciBERT [63] | BERT | 2019 | - | Random sample of 1.14 M papers from Semantic Scholar (82% from the broad biomedical domain) | - |
| PaLM [33] | Pathways Language Models (PaLM) | 2022 | 540 B | Filtered webpages, books, Wikipedia, news articles, source code, and social media conversations | - |
| MedPaLM [34] | PaLM | 2022 | 540 B | Trained on the same information as PaLM but with instruction prompt tuning using a dataset of medical questions |
Med-PaLM 1: 67% [34] Med-PaLM 2: 85% [35] |
We have listed the type of model, the number of parameters used in the language model (a measure of the size and complexity of the model), the year the model was released, the data the model was trained on, and, when available, its performance on USMLE-style questions.
Table 2.
List of publicly available chatbots built on LLMs.
| Chatbot | Link | Description |
|---|---|---|
| GPT-3.5 | https://chat.openai.com/ | Default version of ChatGPT. Currently available as a “Free Research Preview”. |
| GPT-4 | https://chat.openai.com/ | Available to users with a ChatGPT Plus account. Currently priced at $20/month. |
| Bard | https://bard.google.com | “Experiment” version available with Google account. It is powered by PaLM. |
| Claude | https://claude.ai/ | Free plan available with daily usage limits. |
| Bing Chat | www.bing.com | AI chat that is available to anyone using the Bing search engine. |
Methods
The scholarly databases PubMed and Google Scholar were utilized in this review, with searches conducted between March 2023 and October 2023. Due to the fast growth of the field relative to the peer-review process, we also referenced many papers from medRxiv and arXiv. Search terms were input into the databases independently or in conjunction. The terms used include: “AI”; “LLM”; “GPT”; “ChatGPT”; “GPT-3.5”; “GPT-4”; “NLP”; “BERT”; “Language Models”; “Ophthalmology”; “Retina”; “Cornea”; “Glaucoma”; “Medicine”; “Differential Diagnosis”; “Research”; “Telemedicine”; “Biases”; “Plagiarism”. Online sources were also used to obtain updated news articles and company reports regarding the available AI chatbots. We also referenced AI-based research search engines such as elicit.org, though these were used secondarily to PubMed and Google Scholar.
Benefits of LLMs in clinical settings
Although a range of LLMs are being designed, most current studies in medicine focus on the use of GPT-3.5, with a few focusing on the improvements seen in GPT-4. Studies have found that even without being fine-tuned for medical or scientific tasks, GPT-3.5 has performed well on several knowledge-based tests, including the United States Medical Licensing Exam (USMLE), where it was able to perform at or near the passing threshold [27]. GPT-3.5 was also tested on an ophthalmology exam, the Ophthalmic Knowledge Assessment Program (OKAP), and scored 55.8% and 42.7% on two 260-question simulated exams. It performed best in general medicine but worse in subspecialties such as neuro-ophthalmology, ophthalmic pathology, and intraocular tumours [28, 29]. This test was repeated with GPT-4, which significantly outperformed GPT-3.5, achieving a score of 81%, with performance in subspecialties such as neuro-ophthalmology increasing from 25% to 100% (though this was not statistically significant given the small sample size) [30, 31]. It has been found that GPT-4 outperforms GPT-3.5 and even models fine-tuned on medical knowledge, such as Google’s Med-PaLM [11, 32–34]. Recently, Google released Med-PaLM 2, which displays improved performance, achieving 85% correctness on the MedQA dataset of USMLE questions, compared to 67% for the original Med-PaLM [35]. Thus, more studies are required to compare these various models, especially in the field of ophthalmology.
As the use and popularity of ChatGPT grows, physicians have begun to evaluate the accuracy of these tools in the clinic. There have been several studies published that assess the use of GPT-3.5 in acting as a supportive tool for generating differential diagnoses or aiding in clinical decision making [36–38]. These preliminary studies suggest that while ChatGPT could be useful in presenting differential diagnoses, it underperforms relative to a physician and should not be used as a standalone tool [37]. One study found that ChatGPT was able to accurately provide a diagnosis in eight out of 11 glaucoma cases which was equal to or above the performance of ophthalmology residents [31].
Extensive studies of the use of ChatGPT in ophthalmology have been limited thus far. However, one recent study compared GPT-3.5 to an existing diagnostic tool, the Isabel Pro Differential Diagnosis Generator, giving both tools ten ophthalmology patient cases. GPT-3.5 identified the correct diagnosis in 9/10 cases, with its differential diagnosis list including the correct diagnosis 10/10 times, whereas the Isabel Pro Differential Diagnosis Generator identified the correct diagnosis in only 1/10 cases, with the correct diagnosis included in its differential diagnosis list only 7/10 times [39]. Additionally, ChatGPT was able to provide information about how the diagnosis could be confirmed for each of the 10 cases and how it should be treated [39].
In a letter to the editor, Potapenko et al. assessed the ability of ChatGPT to provide ophthalmology-specific information. They described a small study in which five specialists evaluated the accuracy of information ChatGPT provided about common retinal diseases (age-related macular degeneration, diabetic retinopathy, retinal vein occlusion, retinal artery occlusion, and central serous chorioretinopathy) [40]. They found that ChatGPT provided highly accurate responses regarding general information on diseases, their prevention, and prognosis. However, the responses regarding treatment options were less useful, and 12 out of 100 generated responses were found to have potentially harmful inaccuracies [40]. For example, regarding central serous chorioretinopathy, the chatbot suggested that corticosteroids could reduce inflammation and fluid accumulation, which is misleading [40]. Though the authors report these errors, they found that ChatGPT generated fairly accurate responses to most questions and asserted that LLMs could increase patient accessibility to ophthalmologic care.
This is important because access to care can be quite challenging in ophthalmology, especially in medically underserved communities. Patients may try to receive ophthalmic care at primary care offices or in emergency departments, but trained ophthalmologists may not always be available for consultation. In these settings, easy access to information regarding ocular health through AI chatbots could benefit both the primary care physician and the patient. These chatbots can take into consideration patient-specific information such as age or comorbidities, improving their ability to provide differential diagnoses and triage patients compared to a regular search engine. Furthermore, patients already lean towards reviewing information on the internet prior to presenting to physicians, and there is evidence that this ability to obtain information has a positive impact on the doctor-patient interaction [41]. Compared to traditional search engines, AI-based chatbots have the benefit of directly interacting with the user, so clarification questions can be asked. The responses provided by a chatbot are also often easier to understand than those of a search engine, where many sources are presented, sometimes with differing information or misinformation. The ability of ChatGPT to connect with the internet can also mitigate the risks of misleading or outdated information while retaining the accessibility of LLMs; whereas patients may not have the familiarity to digest online medical sources, ChatGPT excels at textual summarization and simplification [42].
A recent article written by glaucoma specialists provided examples of how ChatGPT could be used to design personalized notes and instruction sheets for patients [43]. This can be especially useful for telemedicine, which has grown in popularity since the COVID-19 pandemic [44]. Additionally, ChatGPT can be asked to respond in different languages, which can be useful in settings where there is a language barrier, as seen in Fig. 1. Disparities in communication have been studied in the clinical ophthalmic literature [45, 46], and addressing these language barriers may help to provide optimal clinical care and increase patient satisfaction. There is large potential for this technology to address many longstanding barriers. However, the process still requires refinement: as seen in Fig. 1c, ChatGPT struggles to respond in a single attempt in Hindi, requiring the user to prompt it to continue. Future developments in this area may continue to optimize this process for clinical care and help address longstanding language barriers in healthcare.
Fig. 1. ChatGPT in different languages for ophthalmic pathology.
ChatGPT provides guidelines regarding lifestyle considerations for a patient with diabetic retinopathy in (a) Spanish, (b) Chinese, and (c) Hindi.
Furthermore, it has been shown that ophthalmologists spend a significant amount of time on administrative tasks such as documentation in electronic health record (EHR) systems [47]. Applications of LLMs to these tasks could help reduce physician burden and increase time spent with patients. One group found that ChatGPT was able to correctly categorize most parameters of clinical information into the appropriate sections of a clinical note without abbreviations being specified [48]. Another group explored whether ChatGPT could be used to generate discharge summaries [49]. GPT-4 has been shown to be able to read a transcript of a physician-patient encounter and write a medical note, even identifying facts that do not appear explicitly in the transcript but would improve the summary [11]. Recently, Microsoft partnered with Epic Systems and announced that they are bringing GPT-4 into the health record system to aid in drafting message responses from health care workers to patients and to help analyse medical records [50].
Other LLMs are being trained specifically on EHR information. ClinicalBERT is an LLM with over 110 million parameters, trained using 0.5 billion words from the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset, a large, single-centre database comprising information related to critical care units at a large tertiary care hospital [51]. Its developers represented clinical notes using bidirectional transformers and evaluated the model on the task of predicting 30-day hospital readmission based on discharge summaries and notes from the first few days of admission in the intensive care unit [52]. The model even provided interpretable predictions by revealing which terms in the clinical notes were predictive of patient readmission [52]. Similarly, the University of Florida designed its own LLM, GatorTron, developed from scratch using >90 billion words of text from de-identified clinical notes, PubMed articles, and Wikipedia [53]. They included patient notes from a range of clinical departments (including 1.01% from the ophthalmology department). GatorTron was systematically evaluated on five clinical tasks: concept extraction, medical relation extraction, semantic textual similarity, natural language inference, and medical question answering. The model performed well at these tasks and better than past models [53]. These types of models could potentially improve the use of clinical notes and narratives to improve healthcare delivery and health outcomes.
Future work also includes incorporating clinical images into LLMs to improve their use in providing patient-specific clinical diagnoses. A recent paper proposed a framework that incorporates the capabilities of LLMs into existing medical-image computer-aided diagnosis models, which use deep-learning algorithms to analyse medical images. This may allow future researchers to couple the power of deep learning-based diagnostic systems with LLMs to provide users with a condensed report and interactive explanations based on a given clinical image [54]. Deep learning is already being used to interpret ophthalmic images, so adding this natural language capability could be a logical next step.
Other work is being done on multimodal large language models (MLLMs) that can process both text and image input [55]. GPT-4 is one such model, and OpenAI released this functionality to the public in October 2023 under their paid subscription [6]. OpenAI has previously showcased this capability in their GPT-4 technical report and has partnered with the organization “Be My Eyes,” which connects volunteers with blind or low vision patients [6]. They have used GPT-4 to develop the “Virtual Volunteer,” which is able to generate responses similar to those of human volunteers, enabling those with vision impairment to have a greater degree of independence in their lives [56]. For ophthalmologists, this can be a useful tool to recommend to patients with visual deficits. Such models may also allow for more accurate diagnoses with picture submissions from patients.
Drawbacks of LLMs in clinical settings
Despite these potential benefits and future enhancements, LLMs such as ChatGPT still have many limitations that may make them harmful in clinical settings. As mentioned previously, a major shortcoming of ChatGPT is its propensity to “hallucinate”. The LLM can produce misinformation since there is no “built-in filter to determine the correctness of [the] output” [57]. These hallucinations are presented with confidence and may even sound reasonable, which can be extremely misleading for those who are not experts in the field. Without citations or references, verifying the claims made by the chatbot is not trivial, and failing to do so can lead to the amplification of medical misinformation. This can be especially dangerous in the hands of a patient, who may have more trouble deciphering which parts of the chatbot’s responses are correct and which are fabricated. Because hallucinations are so dangerous in clinical settings, many groups are working to prevent them. One approach has been to incorporate a framework for human evaluation and prompt tuning when designing new LLMs, although the results have still been found to be inferior to those of human clinicians [58]. Another option is to use RAG to restrict the chatbot to only use and cite documents from trusted databases such as PubMed [20] or Semantic Scholar [19]. This challenge continues to be explored in the field. With medical information, however, there is a higher bar for accuracy and a lower tolerance for misinformation, so these factors must be addressed before physicians or patients rely on these models for medical information. Additionally, the responses of the free version of ChatGPT are limited to information from before 2021; the model is not trained on new updates to treatment guidelines or recent literature, which is an important part of providing evidence-based care.
While RAG holds promise for addressing this limitation, the default (and most commonly used) LLM interfaces from providers such as OpenAI do not employ it. Healthcare providers using such models must take care to verify outputs. The models lack clinical acumen and do not actively take into consideration patient-specific risk factors or demographics, which are important parts of making clinical decisions.
Another limitation of LLMs is that they may perpetuate biases [59]. They are trained on large datasets, and these sources may contain biased information or underrepresent some demographics. This can be a serious problem when deploying LLMs as part of a tool to provide care to diverse patient populations. Recruiting participants from underrepresented racial and ethnic groups has been shown to be crucial in ophthalmology research [60]. Studies have found that vision loss and visual impairments differ with respect to age, gender, and race [61]. All these factors must be well represented in the training data provided to an LLM in order for it to provide relevant medical information and address health disparities in eyecare. For LLMs trained on EHRs, ensuring that a diverse dataset is used is essential to providing equitable healthcare. For example, GatorTron was trained on a predominantly white population (44%), which may lead to underperformance for other ethnic groups [53]. Zhang et al. analysed the extent to which MIMIC-III, the dataset used to train ClinicalBERT, encoded marginalized populations differently and how this may lead to worsened performance on clinical tasks for those populations. They found that classifiers trained from the LLMs displayed significant bias with regard to gender, language, ethnicity, and insurance status [62]. Using SciBERT, they also saw that when prompted to generate fill-in-the-blank reports, the model output varied by race: the responses generated when the patient was specified as Black or African American recommended a worse course of action than when the patient was specified as white or Caucasian [62, 63]. This provides evidence that there may be bias amplification when using LLMs, and others are continuing to investigate this within medicine [64]. A growing community is actively working to address and remove biases using adversarial debiasing approaches during pretraining; however, these are not always effective [62, 65, 66].
Additionally, when developing a clinical LLM based on data from EHRs, such as GatorTron or ClinicalBERT, there may be concerns about patient privacy. Proper de-identification is a crucial step [67].
Further, while ChatGPT is currently freely available, there is no guarantee that future iterations of LLMs will not be paywalled. OpenAI already charges for third-party applications built on ChatGPT and has a paid tier, ChatGPT Plus, for access to GPT-4. Additionally, these kinds of tools may favour resource-rich countries with greater access to computational resources, potentially increasing global disparities in healthcare delivery. It is important that ophthalmologists and other physicians work with AI companies to ensure that this information can be accessed regardless of socioeconomic status.
Overall, chatbots should not be used as standalone tools in ophthalmology. The responses provided by chatbots must be evaluated critically before any decisions regarding clinical care are made. Further work needs to be done to improve the accuracy of output responses and to avoid the perpetuation of misinformation. There are currently very few studies assessing the use of LLMs in ophthalmology. With regard to eye care, experts must work with those building these systems to ensure that these tools are designed for diverse communities and are open access to ensure equity. As these tools continue to develop, they may reduce the time required for administrative tasks and increase the availability of ophthalmic information, enabling physicians to spend more time with their patients and thus improving patient care.
Benefits of LLMs in research
The use of ChatGPT and LLMs in research has been widely explored [68–70]. There are several exciting benefits of using LLMs in research. ChatGPT can help researchers quickly summarize large bodies of information, improve scientific writing, aid in idea generation, and advise on statistical analyses [68, 71, 72]. LLMs can be particularly useful for non-native English speakers, who can use them to better communicate research ideas and results, improving equity and diversity in research [69]. They can also help researchers with analysing large datasets or writing code [73].
Certain AI-assisted tools such as elicit.org organize papers on a given research topic and present them in a table with information such as the abstract summary, intervention used, outcomes measured, number of participants, and other specified information [19]. This is similar to existing search engines such as PubMed or Google Scholar but with the added benefit of these summary features, allowing researchers to more easily sift through a larger number of papers. We developed SightBot (https://sightbot.brilliantly.ai), a custom chatbot made with OpenAI’s and PubMed’s APIs; since SightBot is constructed using OpenAI’s API, there is a charge associated with using it. Compared to Google Scholar and PubMed, such AI-based tools can provide summaries of research topics along with relevant peer-reviewed papers.
Embedding text documents into LLMs for RAG allows one to “interact” with these documents such that extracting the pertinent information becomes easier. A recently published study also assessed the use of ChatGPT to write a Boolean query for systematic literature reviews and found promising results [74]. The use of LLMs to improve the literature review process could potentially help researchers identify the most relevant papers, especially in growing fields like ophthalmology where the number of papers published has been increasing steadily every year [75].
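A Boolean query of the kind the study above asked ChatGPT to produce can be assembled programmatically from concept groups, with synonyms OR-ed within a group and groups AND-ed together. The sketch below illustrates the format only; the search terms are invented examples, not a validated search strategy.

```python
# Sketch of assembling a PubMed-style Boolean query from concept groups,
# the task the cited study delegated to ChatGPT. Each inner list holds
# synonyms (OR-ed together); the groups themselves are AND-ed.
def build_boolean_query(concept_groups):
    clauses = []
    for synonyms in concept_groups:
        clause = " OR ".join(f'"{term}"' for term in synonyms)
        clauses.append(f"({clause})")
    return " AND ".join(clauses)

query = build_boolean_query([
    ["large language model", "LLM", "ChatGPT"],   # illustrative terms
    ["ophthalmology", "eye disease"],
])
# → ("large language model" OR "LLM" OR "ChatGPT") AND ("ophthalmology" OR "eye disease")
```

An LLM's contribution in this workflow is suggesting the synonym lists themselves; a librarian or reviewer would still need to validate the final strategy before running the systematic search.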
Additionally, LLMs have the potential to aid in clinical research projects. Currently, patient charts in EHRs can be used to perform retrospective cohort studies that answer a range of questions regarding patient care. These studies often require researchers to manually go through many patient charts in an EHR system and extract information from free text. This is time-consuming, often subjective, and may limit the number of patients that can be included in a given study. LLMs trained on EHR data, such as GatorTron [53], could potentially improve this process of manually searching through health records for key information. We show an example of the use of ChatGPT to extract relevant information from a body of free text using a de-identified, open-source annotated glaucoma dataset [67]. In Fig. 2, we demonstrate that ChatGPT is able to take the provided note, interpret the abbreviations, and, when tasked to extract only information about the medication, gender, and age, do so using just the free text. In the given example, the chatbot is also asked to format the data into a chart.
Fig. 2. ChatGPT is provided with deidentified data from a glaucoma database [67].
The chatbot can interpret the text even though it includes several medical abbreviations and, when prompted, ChatGPT was able to extract specific information about the patient and their medication.
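A minimal sketch of this kind of structured extraction is shown below. The note, prompt wording, and hard-coded model reply are all illustrative assumptions (a real pipeline would send the prompt to an LLM API); the point is that asking for JSON output makes the model’s answer machine-parsable.

```python
import json

# Hypothetical de-identified note, in the abbreviated style of real clinic notes.
NOTE = "62 yo F c/o blurry vision OU. POAG. On latanoprost 0.005% qhs OU."

PROMPT_TEMPLATE = (
    "Extract the patient's age, gender, and glaucoma medication from the note "
    "below. Reply with JSON only, using keys: age, gender, medication.\n\n{note}"
)

def parse_extraction(model_reply: str) -> dict:
    """Parse the model's JSON reply into a Python dict for tabulation."""
    return json.loads(model_reply)

# A plausible chatbot reply, hard-coded here in place of an actual API call
# on PROMPT_TEMPLATE.format(note=NOTE).
reply = '{"age": 62, "gender": "female", "medication": "latanoprost 0.005%"}'
record = parse_extraction(reply)
print(record["medication"])
```

Rows produced this way across many notes could then be assembled into the kind of chart shown in Fig. 2.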
The use of NLP is already being explored in ophthalmology. In a literature review, Chen et al. reported on nineteen published studies of NLP in ophthalmology [76]. They found that the majority of these existing publications focused on extracting specific text such as visual acuity from free-text notes, while others looked at domain embedding, predictive modelling, and topic modelling [76]. However, most of the NLP techniques in these studies were trained for specific tasks, unlike GPT, which is zero-shot and thus able to perform tasks it was not specifically trained on. These tasks may therefore benefit further from newer LLMs. Overall, LLMs offer the ability to synthesize large bodies of information and may allow researchers to spend more time focusing on experimental design and generating new research questions.
Drawbacks of LLMs in research
However, as with any advance in technology, the use of LLMs in research comes with drawbacks. In one study, researchers asked chatbots to generate 50 medical research abstracts [77]. Human reviewers and AI-output detectors spotted only 68% and 66% of the artificially generated abstracts, respectively, meaning 32% of the AI-generated abstracts were misidentified as real [77]. Further, ChatGPT was able to generate scientific abstracts that easily bypassed classic plagiarism checkers [77]. This difficulty in distinguishing between an article written by a researcher and one written by AI may lead to the publication of false information. Although tools are being developed to detect “AI plagiarism”, thus far they are not able to accurately distinguish between human writing and AI-generated text [78].
Several publications have even listed AI as a co-author [79, 80]. Since then, many journals have begun to add policies regarding the use of ChatGPT in scientific writing. For example, Elsevier released new author policies that urge users to “apply the technology with human oversight and control” and prevent researchers from listing “AI and AI-assisted technologies as an author”, since AI tools cannot take responsibility for the scientific work [81]. Similarly, Eye has stated that LLMs do not satisfy its authorship criteria and that use of LLMs should be documented [82]. However, there are still journals in ophthalmology that have not yet specified any author guidelines regarding the use of LLMs in writing. It is important for the ophthalmology research community to discuss ways that this tool can be used ethically. This is especially important in the field of ophthalmology, where research output can be used to determine selection into residency programs or aid in career advancement [83, 84].
Researchers typically probe the boundaries of human knowledge and must think critically about the information scientific papers provide, recognizing the scope of claims and evaluating flaws in study designs. They may encounter issues using LLMs for such niche topics, as these will naturally be more sparsely represented in the corpus the model is trained on. This can lead to more errant behaviour, such as overstated confidence in claims with flimsy evidence behind them or, conversely, claiming a topic is unstudied when it may simply not be present in the training corpus. More broadly, the capability of LLMs for uncertainty quantification is an open question. It is important for researchers interacting with LLMs to account for these behaviours, such as by manually verifying claims, cross-checking the results of LLM-based search against better-established search algorithms, and even asking the model itself. An example of this is shown in the paper by Lee et al. on the use of GPT-4 in medicine, where the authors ask the model “can you check this conversation between a human and an AI chatbot for error” and the response admits some mistakes made during the conversation [11].
A more pernicious risk is that even when the LLM refers to real and relevant articles, they may not contain the information they are being cited to support. There have been several reports of incorrect citations generated by ChatGPT [69, 85]. In this case, verification is more costly and involves checking the contents of the article against the claim. Additionally, it is opaque to the user which factors determine the selection of articles shown. With Google Scholar, PubMed, and similar article search engines, there is a notion of sorting by keyword relevance, citation count, or even publication date. Due to the black-box nature of LLMs, it may not be clear why particular articles are chosen. They may not represent strong, peer-reviewed sources of evidence, and may introduce or amplify biases in the types of research that are cited.
Steps can be taken to mitigate these risks. For instance, document-based semantic search involves retrieving documents that pertain to a user query and passing them to the LLM, which then answers the query from those sources. This decreases the risk of hallucinations by encouraging the model to answer from the provided sources, pre-screened for credibility. LLM-based tools such as elicit.org or AI2’s Semantic Scholar have been designed to help scientists perform better literature reviews with such approaches [19]. Here, we generated a custom chatbot, SightBot, that uses OpenAI’s API for GPT-3.5 and provides a response based on a PubMed search [20]. Unlike ChatGPT, which often acts as a black box, these methods offer users more transparency and aid in verifying that the information provided by the LLM is accurate. However, errors still occur even with these approaches. For example, SightBot will often output the PubMed IDs of articles that are less relevant to the question. Additionally, elicit.org has published some limitations of its search system, noting that “Elicit can miss the nuance of a paper or misunderstand what a number refers to” [19]. Further work needs to be done to improve AI-based search engines for researchers.
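The retrieval step of document-based semantic search can be sketched as follows. This toy example uses word-count vectors and cosine similarity purely for illustration; real systems such as SightBot or elicit.org use learned embedding models, and the documents and query here are invented.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use learned dense vectors."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical pre-screened source documents (e.g. PubMed abstracts).
docs = [
    "latanoprost lowers intraocular pressure in open angle glaucoma",
    "diabetic retinopathy screening with fundus photography",
]
query = "which drug lowers intraocular pressure in glaucoma"

# Retrieve the most relevant document, then ground the LLM's answer in it.
best = max(docs, key=lambda d: cosine(embed(query), embed(d)))
prompt = f"Answer using ONLY this source:\n{best}\n\nQuestion: {query}"
```

Because the model is instructed to answer only from the retrieved source, the user can check the cited document directly, which is the transparency benefit described above.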
Conclusion
The field of ophthalmology has always been on the cutting edge of technology [86], analysing new techniques to optimize clinical care and research. ChatGPT is the first large language model to be adopted by the public with this much enthusiasm and scepticism. Companies are working to develop more LLMs, fine-tuning them with domain-specific knowledge or building them from scratch with custom data. These may provide more accurate medical and scientific information than ChatGPT, which has a broad training base. However, more studies need to be done to evaluate how well these other LLMs function for ophthalmologists.
The field of LLMs and the associated research in medicine is rapidly growing. Since the GPT-3.5 model was released in November 2022, many research articles mentioning ChatGPT have been indexed on PubMed, with even more preliminary studies posted on medRxiv. It is evident that this type of technology will continue to impact medicine and healthcare. Many physicians and scientists are advocating for systems to be built that support the use of this technology in an ethical manner [87]. It is important for ophthalmologists to work with the companies designing these tools to implement changes that will benefit patient care and research while making sure to reduce any potential harm. Building LLMs specifically trained on ophthalmology information may help ensure they provide accurate and relevant knowledge for research and clinical practice. Understanding the principles behind this technology, including its benefits, limitations, and potential for future improvements, can be useful for ophthalmologists as the field begins to evaluate the effectiveness of LLMs in both medicine and research.
Acknowledgements
The authors would like to thank their colleagues who helped test SightBot.
Author contributions
Writing: NK, SS, JO, JC; Review and Editing: NK, SS, JO, JC; Final Approval of Manuscript: NK, SS, JO, JC.
Funding information
This research has received no external funding from any agency.
Competing interests
The custom-generated chatbot mentioned in this article, SightBot, was developed by Suvansh Sanjeev and Nikita Kedia under advisement from Jay Chhablani, and was used as a demo for Suvansh’s company, Brilliantly AI (https://brilliantly.ai). The chatbot is not revenue generating.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Jama. 2016;316:2402–10. doi: 10.1001/jama.2016.17216. [DOI] [PubMed] [Google Scholar]
- 2.Ting D, Pasquale LR, Peng L, Campbell JP, Lee AY, Raman R, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol. 2019;103:167–75. doi: 10.1136/bjophthalmol-2018-313173. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Keskinbora K, Guven F. Artificial intelligence and ophthalmology. Turk J Ophthalmol. 2020;50:37–43. doi: 10.4274/tjo.galenos.2020.78989. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Ong J, Selvam A, Chhablani J. Artificial intelligence in ophthalmology: optimization of machine learning for ophthalmic care and research. Clin Exp Ophthalmol. 2021;49:413–5. doi: 10.1111/ceo.13952. [DOI] [PubMed] [Google Scholar]
- 5.OpenAI. Introducing ChatGPT, https://openai.com/blog/chatgpt (2022).
- 6.OpenAI. GPT-4 Technical Report. ArXiv abs/2303.08774 (2023).
- 7.Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Nature. 2023;613:612, 10.1038/d41586-023-00191-1. [DOI] [PubMed]
- 8.The Lancet Digital Health. ChatGPT: friend or foe? Lancet Digit Health. 2023;5:e102. doi: 10.1016/s2589-7500(23)00023-7. [DOI] [PubMed] [Google Scholar]
- 9.Will ChatGPT transform healthcare? Nat Med. 2023;29:505–6, 10.1038/s41591-023-02289-5. [DOI] [PubMed]
- 10.Shen Y, Heacock L, Elias J, Hentel KD, Reig B, Shih G, et al. ChatGPT and other large language models are double-edged swords. Radiology. 2023;307:230163, 10.1148/radiol.230163. [DOI] [PubMed]
- 11.Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI Chatbot for medicine. N. Engl J Med. 2023;388:1233–9. doi: 10.1056/NEJMsr2214184. [DOI] [PubMed] [Google Scholar]
- 12.Ong J, Hariprasad SM, Chhablani J. ChatGPT and GPT-4 in ophthalmology: applications of large language model artificial intelligence in retina. Ophthalmic Surg Lasers Imaging Retin. 2023;54:557–62. doi: 10.3928/23258160-20230926-01. [DOI] [PubMed] [Google Scholar]
- 13.Kojima T, Gu SS, Reid M, Matsuo, Y & Iwasawa, Y. Large language models are zero-shot reasoners. ArXiv abs/2205.11916 (2022).
- 14.Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30:1–15.
- 15.OpenAI. Model index for researchers. 2023 https://platform.openai.com/docs/model-index-for-researchers.
- 16.Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Adv Neural Inf Process Syst. 2020;33:1877–901. [Google Scholar]
- 17.Ouyang L, Wu J, Jiang X, Almeida D, Wainwright CL, Mishkin P, et al. Training language models to follow instructions with human feedback. Adv Neural Inf Process Syst. 2022;35:27730–44. [Google Scholar]
- 18.Wang C, Ong J, Wang C, Ong H, Cheng R, Ong D. Potential for GPT technology to optimize future clinical decision-making using retrieval-augmented generation. Ann Biomed Eng. 2023 10.1007/s10439-023-03327-6. [DOI] [PubMed]
- 19.Elicit. 2023 https://elicit.org/.
- 20.Sanjeev S. Meet SightBot: ChatGPT-powered research insights with pubmed citations. 2023 https://www.brilliantly.ai/blog/sightbot.
- 21.Venigalla A, Frankle J, Carbin M. BioMedLM: a domain-specific large language model for biomedical text. 2022 https://www.mosaicml.com/blog/introducing-pubmed-gpt.
- 22.Yasunaga M, Bosselut A, Ren H, Zhang X, Manning CD, Liang P, et al. Deep bidirectional language-knowledge graph pretraining. Adv Neural Inf Process Syst. 2022;35:37309–23. [Google Scholar]
- 23.Yasunaga M, Leskovec J & Liang P Linkbert: Pretraining language models with document links. arXiv preprint arXiv:2203.15827 (2022), 8003–16.
- 24.Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc (HEALTH) 2021;3:1–23. [Google Scholar]
- 25.Shin HC, Zhang Y, Bakhturina E, Puri R, Patwary M, Shoeybi M, et al. BioMegatron: Larger biomedical domain language model. arXiv preprint arXiv:2010.06060 2020; 4700–6.
- 26.Luo R, Sun L, Xia Y, Qin T, Zhang S, Poon H, et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief Bioinform. 2022;23:1–12. [DOI] [PubMed]
- 27.Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198. doi: 10.1371/journal.pdig.0000198. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Antaki F, Touma S, Milad D, El-Khoury J, Duval R. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmol Sci. 2023;3:100324. doi: 10.1016/j.xops.2023.100324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Antaki, F, Touma, S, Milad, D, El-Khoury, J & Duval, R. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. medRxiv. 2023; 2023.2001.2022.23284882, 10.1101/2023.01.22.23284882. [DOI] [PMC free article] [PubMed]
- 30.Teebagy S, Colwell L, Wood E, Yaghy A, Faustina M. Improved performance of ChatGPT-4 on the OKAP examination: a comparative study with ChatGPT-3.5. J Acad Ophthalmol (2017) 2023;15:e184–e187. doi: 10.1055/s-0043-1774399. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Delsoz M, Raja H, Madadi Y, Tang AA, Wirostko BM, Kahook MY, et al. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmol Ther. 2023;12:3121–32. doi: 10.1007/s40123-023-00805-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.OpenAI. GPT-4. 2023 https://openai.com/research/gpt-4.
- 33.Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, Barham P. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 (2022).
- 34.Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, et al. Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138 (2022).
- 35.Matias Y & Corrado G. Our latest health AI research updates. 2023 https://blog.google/technology/health/ai-llm-medpalm-research-thecheckup/.
- 36.Rao A, Kim J, Kamineni M, Pang M, Lie W, Succi MD. Evaluating ChatGPT as an adjunct for radiologic decision-making. medRxiv. 2023 10.1101/2023.02.02.23285399. [DOI] [PMC free article] [PubMed]
- 37.Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic accuracy of differential-diagnosis lists generated by generative pretrained transformer 3 chatbot for clinical vignettes with common chief complaints: a pilot study. Int J Environ Res Public Health. 2023;20, 10.3390/ijerph20043378. [DOI] [PMC free article] [PubMed]
- 38.Liu S, Wright AP, Patterson BL, Wanderer JP, Turer RW, Nelson SD, et al. Assessing the value of ChatGPT for clinical decision support optimization. medRxiv. 2023 10.1101/2023.02.21.23286254.
- 39.Michael B, Edsel BI. Conversational AI Models for ophthalmic diagnosis: comparison of ChatGPT and the isabel pro differential diagnosis generator. JFO Open Ophthalmol. 2023;1:100005. doi: 10.1016/j.jfop.2023.100005. [DOI] [Google Scholar]
- 40.Potapenko I, Boberg-Ans LC, Stormly Hansen M, Klefter ON, van Dijk E, Subhi Y. Artificial intelligence-based chatbot patient information on common retinal diseases using ChatGPT. Acta Ophthalmol. 2023;101:829–31, 10.1111/aos.15661. [DOI] [PubMed]
- 41.Cocco AM, Zordan R, Taylor DM, Weiland TJ, Dilley SJ, Kant J, et al. Dr Google in the ED: searching for online health information by adult emergency department patients. Med J Aust. 2018;209:342–7. doi: 10.5694/mja17.00889. [DOI] [PubMed] [Google Scholar]
- 42.Ong H, Ong J, Cheng R, Wang C, Lin M, Ong D. GPT technology to help address longstanding barriers to care in free medical clinics. Ann Biomed Eng. 2023;51:1906–9. doi: 10.1007/s10439-023-03256-4. [DOI] [PubMed] [Google Scholar]
- 43.AlRyalat SA & Kahook MY. The use of artificial intelligence chatbots in ophthalmology. 2022 https://www.glaucomaphysician.net/issues/2022/december-2022/the-use-of-artificial-intelligence-chatbots-in-oph.
- 44.Parikh D, Armstrong G, Liou V, Husain D. Advances in telemedicine in ophthalmology. Semin Ophthalmol. 2020;35:210–5. doi: 10.1080/08820538.2020.1789675. [DOI] [PubMed] [Google Scholar]
- 45.Mudie LI, Patnaik JL, Gill Z, Wagner M, Christopher KL, Seibold LK, et al. Disparities in eye clinic patient encounters among patients requiring language interpreter services. BMC Ophthalmol. 2023;23:82. doi: 10.1186/s12886-022-02756-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Nesher R, Ever-Hadani P, Epstein E, Stern Y, Assia E. Overcoming the language barrier in visual field testing. J Glaucoma. 2001;10:203–5. doi: 10.1097/00061198-200106000-00010. [DOI] [PubMed] [Google Scholar]
- 47.Read-Brown S, Hribar MR, Reznick LG, Lombardi LH, Parikh M, Chamberlain WD, et al. Time requirements for electronic health record use in an academic ophthalmology center. JAMA Ophthalmol. 2017;135:1250–7. doi: 10.1001/jamaophthalmol.2017.4187. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst. 2023;47:33. doi: 10.1007/s10916-023-01925-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Patel SB, Lam K. ChatGPT: the future of discharge summaries? Lancet Digit Health. 2023;5:e107–e108. doi: 10.1016/S2589-7500(23)00021-3. [DOI] [PubMed] [Google Scholar]
- 50.Microsoft and Epic expand strategic collaboration with integration of Azure OpenAI Service. 2023 https://news.microsoft.com/2023/04/17/microsoft-and-epic-expand-strategic-collaboration-with-integration-of-azure-openai-service/.
- 51.Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035. doi: 10.1038/sdata.2016.35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Huang K, Altosaar J & Ranganath R. Clinicalbert: modeling clinical notes and predicting hospital readmission. arXiv preprint arXiv. 2019;1904.05342.
- 53.Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, et al. A large language model for electronic health records. NPJ Digit Med. 2022;5:194. doi: 10.1038/s41746-022-00742-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Wang S, Zhao Z, Ouyang X, Wang Q & Shen D. ChatCAD: interactive computer-aided diagnosis on medical image using large language models. ArXiv. 2023;abs/2302.07257. [DOI] [PMC free article] [PubMed]
- 55.Huang S, Dong L, Wang W, Hao Y, Singhal S, Ma S, et al. Language is not all you need: Aligning perception with language models. arXiv preprint arXiv.2302.14045 2023.
- 56.Be My Eyes. 2023 https://openai.com/customer-stories/be-my-eyes.
- 57.Azamfirei R, Kudchadkar SR, Fackler J. Large language models and the perils of their hallucinations. Crit Care. 2023;27:120. doi: 10.1186/s13054-023-04393-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, et al. Large language models encode clinical knowledge. ArXiv. 2022 abs/2212.13138
- 59.Baumgartner C. The potential impact of ChatGPT in clinical and translational medicine. Clin Transl Med. 2023;13:e1206. doi: 10.1002/ctm2.1206. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Berkowitz ST, Groth SL, Gangaputra S, Patel S. Racial/ethnic disparities in ophthalmology clinical trials resulting in us food and drug administration drug approvals from 2000 to 2020. JAMA Ophthalmol. 2021;139:629–37. doi: 10.1001/jamaophthalmol.2021.0857. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Zambelli-Weiner A, Crews JE, Friedman DS. Disparities in adult vision health in the United States. Am J Ophthalmol. 2012;154:S23–S30.e21. doi: 10.1016/j.ajo.2012.03.018. [DOI] [PubMed] [Google Scholar]
- 62.Zhang H, Lu AX, Abdalla M, McDermott M, Ghassemi M. Hurtful words: quantifying biases in clinical contextual word embeddings. In: Proceedings of the ACM Conference on Health, Inference, and Learning. 2020;110–20.
- 63.Beltagy I, Lo K & Cohan A SciBERT: a pretrained language model for scientific text. arXiv preprint arXiv:1903.10676 (2019), 3613–8.
- 64.Pal R, Garg H, Patel S & Sethi T. Bias amplification in intersectional subpopulations for clinical phenotyping by large language models. medRxiv. 2023; 2023.2003.2022.23287585, 10.1101/2023.03.22.23287585.
- 65.Edwards H & Storkey A. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897 (2015).
- 66.Elazar Y & Goldberg Y. Adversarial removal of demographic attributes from text data. arXiv preprint arXiv:1808.06640 (2018), 11–21.
- 67.Chen JS, Lin WC, Yang S, Chiang MF, Hribar MR. Development of an open-source annotated glaucoma medication dataset from clinical notes in the electronic health record. Transl Vis Sci Technol. 2022;11:20. doi: 10.1167/tvst.11.11.20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Salvagno M, Taccone FS, Gerli AG. Can artificial intelligence help for scientific writing? Crit Care. 2023;27:75. doi: 10.1186/s13054-023-04380-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare. 2023;11:887. doi: 10.3390/healthcare11060887. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Homolak J. Opportunities and risks of ChatGPT in medicine, science, and academic publishing: a modern Promethean dilemma. Croat Med J. 2023;64:1–3. doi: 10.3325/cmj.2023.64.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Hutson M. Could AI help you to write your next paper? Nature. 2022;611:192–3. doi: 10.1038/d41586-022-03479-w. [DOI] [PubMed] [Google Scholar]
- 72.Dahmen J, Kayaalp ME, Ollivier M, Pareek A, Hirschmann MT, Karlsson J, et al. Artificial intelligence bot ChatGPT in medical research: the potential game changer as a double-edged sword. Knee Surg, Sports Traumatol, Arthrosc. 2023;31:1187–9. doi: 10.1007/s00167-023-07355-6. [DOI] [PubMed] [Google Scholar]
- 73.Owens B. How nature readers are using ChatGPT. Nature. 2023;615:20. doi: 10.1038/d41586-023-00500-8. [DOI] [PubMed] [Google Scholar]
- 74.Wang S, Scells H, Koopman B & Zuccon G. Can ChatGPT write a good boolean query for systematic review literature search? ArXiv. 2023; abs/2302.03495
- 75.Yu ZL, Hu XY, Wang YN, Ma Z. Scientometric analysis of published papers in global ophthalmology in the past ten years. Int J Ophthalmol. 2017;10:1898–901. doi: 10.18240/ijo.2017.12.17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Chen JS, Baxter SL. Applications of natural language processing in ophthalmology: present and future. Front Med (Lausanne) 2022;9:906554. doi: 10.3389/fmed.2022.906554. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Else H. Abstracts written by ChatGPT fool scientists. Nature. 2023;613:423. doi: 10.1038/d41586-023-00056-7. [DOI] [PubMed] [Google Scholar]
- 78.Faisal RE, Leena NR. AI-generated research paper fabrication and plagiarism in the scientific community. Patterns. 2023;4:100706. doi: 10.1016/j.patter.2023.100706. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.King MR, chatGpt. A conversation on artificial intelligence, chatbots, and plagiarism in higher education. Cell Mol Bioeng. 2023;16:1–2. doi: 10.1007/s12195-022-00754-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Zhavoronkov A. Rapamycin in the context of Pascal’s Wager: generative pre-trained transformer perspective. Oncoscience. 2022;9:82–84. doi: 10.18632/oncoscience.571. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Elsevier. The Use of AI and AI-assisted Technologies in Scientific Writing. 2023 https://www.elsevier.com/about/policies/publishing-ethics/the-use-of-ai-and-ai-assisted-writing-technologies-in-scientific-writing.
- 82.Eye. Guide to Authors. 2023 https://www.nature.com/eye/authors-and-referees/gta.
- 83.Srinivasan N, Zhou B, Taruvai V, Nadkarni S, Song A, Khouri AS. Catching eyes: an analysis of medical student publications in the ophthalmology match. Investig Ophthalmol Vis Sci. 2021;62:2660–2660. [Google Scholar]
- 84.Protopsaltis NJ, Chen AJ, Hwang V, Gedde SJ, Chao DL. Success in attaining independent funding among national institutes of health K grant awardees in ophthalmology: an extended follow-up. JAMA Ophthalmol. 2018;136:1335–40. doi: 10.1001/jamaophthalmol.2018.3887. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023;15:e35179. doi: 10.7759/cureus.35179. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Ong J, Hariprasad SM, Chhablani J. A guide to accessible artificial intelligence and machine learning for the 21st century retina specialist. Ophthalmic Surg Lasers Imaging Retin. 2021;52:361–5. doi: 10.3928/23258160-20210628-01. [DOI] [PubMed] [Google Scholar]
- 87.Ali MJ & Djalilian A. Readership Awareness Series – Paper 4: Chatbots and ChatGPT - Ethical Considerations in Scientific Publications. Seminars in Ophthalmology. 2023;38:1–2 10.1080/08820538.2023.2193444. [DOI] [PubMed]
- 88.Nori H, King N, McKinney SM, Carignan D & Horvitz E. Capabilities of gpt-4 on medical challenge problems. arXiv preprint arXiv:2303.13375 (2023).
- 89.Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32:D267–270. doi: 10.1093/nar/gkh061. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Roberts, A Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer. 2020 https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html.


