Abstract
Large language models such as ChatGPT have gained public and scientific attention. These models may support oncologists in their work. Oncologists should be familiar with large language models to harness their potential while remaining aware of their dangers and limitations.
Large language models (LLMs) such as ChatGPT have gained public and scientific attention. While one of the main controversies in the scientific community regarding LLMs relates to their ability to generate synthetic data (Nature Editorial 2023), it is important to recognize the potential of these models for various natural language processing (NLP) applications, including text summarization, question answering, and conversational AI. Despite these valid concerns, this revolutionary technology holds the promise of significantly impacting oncology by streamlining clinical decision-making, enhancing patient education, and accelerating research (Rösler et al. 2023). Deep learning NLP is increasingly being used in medicine and oncology, allowing free-text analyses (Hirschberg and Manning 2015; Sorin et al. 2020a). The number of publications on deep learning NLP in medicine is growing (Supplemental Fig. 1).
In recent years, transformer models have emerged as the state-of-the-art method for many NLP applications. Transformers are a type of deep learning architecture that can analyze and process large amounts of text data. They are composed of multiple stacked layers that utilize a mechanism called attention, which enables the model to assign varying levels of importance to different words within a text (Supplemental Fig. 2). Unlike traditional machine learning methods, transformers focus on the context of words in a text. By considering the surrounding words and their relationships, transformers can better capture complex patterns and nuanced relationships within a text (Vaswani et al. 2017) (Supplemental Fig. 3).
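To make the attention mechanism concrete, the following minimal sketch in Python implements scaled dot-product self-attention as described by Vaswani et al. (2017); the toy token embeddings and dimensions are illustrative assumptions, not drawn from any real model.

```python
# Minimal sketch of scaled dot-product self-attention (Vaswani et al. 2017);
# the toy embeddings and dimensions are illustrative assumptions.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # one weight per input token
    return weights @ V                              # context-dependent mixture of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, embedding dimension 8
contextualized = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(contextualized.shape)  # (4, 8): one context-aware vector per token
```

Each output row is a weighted mixture of all input tokens, which is how surrounding context enters each word's representation.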
The impressive performance of transformers on a variety of NLP tasks has spurred further interest in exploring ever-larger transformer models. LLMs are currently at the forefront of research and development. These models are pre-trained on diverse and extensive text data consisting of billions or trillions of words, and contain billions of parameters. This allows them to generate human-like responses for various NLP tasks. One example is GPT (Generative Pre-trained Transformer). ChatGPT is a chatbot with a free-to-use public preview provided by OpenAI. It is capable of capturing complex relationships within text data, making it a powerful tool in a variety of applications. In healthcare, the model's ability to process and analyze large volumes of text makes it valuable for tasks such as medical literature review, patient communication, and clinical decision support. There are also LLMs specifically trained for scientific and medical purposes. For example, BioBERT was trained on PubMed publications (Lee et al. 2019). Another example is Med-PaLM, a medical question-answering chatbot developed by Google and trained on medical datasets such as HealthSearchQA, PubMedQA, and others.
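As an illustration of how such domain-specific models are accessed in practice, the sketch below loads the publicly released BioBERT weights with the Hugging Face transformers library; the checkpoint name and example sentence are assumptions for illustration, not part of this article's methods.

```python
# A minimal sketch of loading a domain-specific model with the Hugging Face
# `transformers` library; the checkpoint name refers to the publicly released
# BioBERT weights (an assumption about availability), and the sentence is invented.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-v1.1")

text = "EGFR mutations predict response to tyrosine kinase inhibitors."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# outputs.last_hidden_state holds one contextual vector per token, which
# downstream layers can use for classification or information extraction.
print(outputs.last_hidden_state.shape)
```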
In oncology, there are several potential uses for LLMs (Singhal et al. 2022; Yang et al. 2022; Kather 2023). The models can be used by clinicians during everyday clinical work, saving time and improving medical care (Supplemental Fig. 4). For example, LLMs can identify and summarize key findings in research texts or clinical notes, and can summarize the main findings in radiology or pathology reports (Elkassem and Smith 2023; Ma et al. 2023). The algorithm may even be able to suggest appropriate management (Liu et al. 2023) (Fig. 1). This information could be organized and presented in a clear and concise manner, allowing oncologists to quickly access and act upon important patient data. We used ChatGPT to demonstrate one possible use case by summarizing a synthetic radiology report (Supplemental Fig. 5).
Fig. 1.
The figure illustrates the flow of information from various input data sources (such as medical records, radiology and pathology reports, genomic data, clinical trial data, and scientific research) through large language model (LLM) processing, where tasks such as natural language understanding, pattern recognition, and data analysis are performed. Examples of LLM applications in oncology are shown as output blocks, including text summarization, diagnosis and decision support, clinical trial matching, and prognostic prediction. Potential challenges and limitations of LLMs in oncology, such as data security concerns, biases, and legal responsibility, are highlighted in black blocks.
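A minimal sketch of the report-summarization use case described above might look as follows, assuming the openai Python package and its ChatCompletion interface as available at the time of writing; the model name, prompt wording, and synthetic report are illustrative assumptions.

```python
# A hedged sketch using the `openai` Python package (ChatCompletion interface,
# as available at the time of writing); the model name, prompt, and synthetic
# report are illustrative assumptions. Assumes OPENAI_API_KEY is set.
import openai

synthetic_report = (
    "CT chest: 2.3 cm spiculated mass in the right upper lobe with enlarged "
    "mediastinal lymph nodes; no pleural effusion; liver without focal lesions."
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Summarize radiology reports for oncologists in two sentences."},
        {"role": "user", "content": synthetic_report},
    ],
)
print(response["choices"][0]["message"]["content"])
# Never send real, identifiable patient data to a public API (see the HIPAA
# discussion later in this article).
```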
LLMs can be used for question answering on different topics, including oncology. By interacting with ChatGPT, physicians can quickly access relevant information on a wide range of topics, potentially enhancing their knowledge and decision-making abilities. For example, a recent publication by Holmes et al. (2023) evaluated four different LLMs (Bard, BLOOMZ, ChatGPT-3.5, and ChatGPT-4) on multiple-choice questions in radiation oncology physics. ChatGPT-4 was the only model that outperformed medical physicists in test scores (Holmes et al. 2023).
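An evaluation like that of Holmes et al. (2023) reduces, in essence, to a scoring loop over a question bank. The sketch below is hypothetical: the sample question, answer key, and ask_model stub are placeholders for a real benchmark and API call.

```python
# Hypothetical sketch of scoring an LLM on multiple-choice questions.
# `ask_model` is a placeholder for a real API call that returns a letter choice.
questions = [
    {"stem": "Which external-beam modality exhibits a Bragg peak?",
     "options": {"A": "Photons", "B": "Protons", "C": "Electrons", "D": "Neutrons"},
     "answer": "B"},
    # ... a real benchmark would contain the full question bank
]

def ask_model(stem, options):
    # Placeholder: format the question, send it to an LLM, parse the reply.
    # Here it trivially returns "B" so the sketch runs end to end.
    return "B"

def score(question_bank):
    correct = sum(ask_model(q["stem"], q["options"]) == q["answer"]
                  for q in question_bank)
    return correct / len(question_bank)

print(f"accuracy: {score(questions):.0%}")  # 100% with the trivial stub
```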
LLMs can be effectively employed to inform and educate patients about their unique cancer diagnosis, available treatment options, and potential side effects. These models can generate easily understandable explanations of personal medical data for the general public (Jeblick et al. 2022). Through a user-friendly interface, patients can engage with the model to ask questions and receive personalized information. This can enable patients to better understand their condition and make informed decisions about their care, while also fostering improved communication between patients and healthcare providers. LLMs can also have a tremendous impact on scientific research. Different ideas for potential applications in oncology are summarized in Supplemental Table 1.
There are, however, limitations to this technology that have to be considered. The quality and diversity of the data used for training LLMs affect their performance. Data have to be representative of various cancer types, stages, and demographics to create unbiased models. However, acquiring accurate, diverse, and large datasets can be challenging: data are scattered across institutions, patient privacy is a concern, and healthcare organizations are often reluctant to share data. Furthermore, rare cancer cases or underrepresented populations may not have enough data available, leading to potential blind spots in the models. Finally, the use of unverified data sources for training LLMs introduces the risk of incorporating incorrect or outdated information into the models, which can lead to inaccurate or outdated responses that may negatively affect patient care.
Currently, LLMs sometimes generate false or fabricated information, often with a seemingly high level of confidence. In the context of clinical application, false information can have serious consequences. Overreliance on these models may lead to a reduction in human expertise, which in turn may result in an inability to identify when the models are wrong. Furthermore, algorithms can replicate biases and discrimination present in the data they were trained on (Sorin and Klang 2021). Addressing biases in LLMs is crucial to ensure their safe and effective deployment in oncology. This can be done by pre-processing training data to ensure that the data are representative of diverse patient populations and cancer types. Involving oncologists can help identify potential sources of bias and guide the selection of high-quality and reliable data sources. During training, adversarial examples (Sorin et al. 2020a, b) can be used to minimize biases in model predictions. Periodic evaluation and monitoring of model performance across different population subgroups are also imperative to ensure that disparities are not introduced.
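Such subgroup monitoring can be as simple as stratifying model accuracy by population group; in the hypothetical sketch below, the column names and synthetic evaluation results are illustrative assumptions.

```python
# Minimal sketch of subgroup performance monitoring; the column names and
# synthetic evaluation results are illustrative assumptions.
import pandas as pd

results = pd.DataFrame({
    "subgroup": ["A", "A", "A", "B", "B", "B"],
    "correct":  [1, 1, 0, 1, 0, 0],  # 1 = model output judged correct by a clinician
})

# Accuracy stratified by subgroup; a large gap flags potential bias for review
by_group = results.groupby("subgroup")["correct"].mean()
print(by_group)
print(f"accuracy gap between subgroups: {by_group.max() - by_group.min():.2f}")
```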
For LLMs to be effectively utilized in oncology or any other medical field, healthcare providers and patients must have confidence in the accuracy and reliability of the generated responses; otherwise, the integration of these AI models into medical practice becomes unfeasible. To foster trust, it is important to improve the transparency of AI algorithms, making it clearer how the models arrive at their conclusions. Improving transparency in LLMs is challenging, however, given their complexity and size. Fostering trust might therefore involve providing explanations of the models' underlying logic, or presenting confidence scores for their predictions. In addition, implementing validation mechanisms, such as cross-referencing generated content with trusted medical databases, can help reduce the risk of misinformation.
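One such validation mechanism could be sketched as follows; the toy formulary, suffix heuristic, and example answer are purely illustrative assumptions, standing in for queries against a curated medical database.

```python
# Purely illustrative sketch of cross-referencing generated text against a
# trusted source; the toy formulary and suffix heuristic stand in for queries
# to a curated medical database.
TRUSTED_FORMULARY = {"cisplatin", "carboplatin", "pembrolizumab", "osimertinib"}
DRUG_SUFFIXES = ("mab", "tinib", "platin")  # crude heuristic for drug-like tokens

def flag_unverified_drugs(generated_text):
    tokens = {word.strip(".,;()").lower() for word in generated_text.split()}
    candidates = {t for t in tokens if t.endswith(DRUG_SUFFIXES)}
    # Anything drug-like that is absent from the trusted source goes to review
    return candidates - TRUSTED_FORMULARY

answer = "Options include carboplatin, pembrolizumab, and the fictional examplomab."
print(flag_unverified_drugs(answer))  # {'examplomab'}
```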
There are ethical aspects and legal responsibility concerns if the models are integrated into medical decision-making processes. Developing clear guidelines, regulations, and ethical frameworks (UNESCO 2022) is essential to ensure responsible AI usage in medicine. The use of these models must comply with the Health Insurance Portability and Accountability Act (HIPAA) (HIPAA Act of 1996). The current versions of GPT do not adhere to HIPAA regulations and may compromise patient confidentiality. Thus, clinicians should refrain from entering protected health information until professional-grade versions with sufficient safeguards become available. Additionally, as LLMs become more prevalent in healthcare, they may become targets for adversarial cyber-attacks (Finlayson et al. 2019). Robust security measures are therefore needed, including encryption, secure access controls, and continuous monitoring for potential vulnerabilities.
There are some promising future directions for this technology in oncology. LLMs may be considered "few-shot learners" (Brown et al. 2020), meaning that, once trained, they can adapt to a new domain with a small number of examples. This is in contrast to traditional machine learning models, which typically require a large dataset to learn a new task or to be fine-tuned for a specific topic. Not only does this make LLMs flexible, but it also makes them potentially highly efficient when fine-tuned on a large amount of data for a specific narrow topic.
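In practice, few-shot adaptation often takes the form of in-context examples placed directly in the prompt. The sketch below builds such a prompt; the staging task and examples are illustrative assumptions, not a validated clinical tool.

```python
# Minimal sketch of few-shot prompting: adaptation happens through in-context
# examples in the prompt, not retraining (Brown et al. 2020). The staging task
# and examples are illustrative assumptions.
few_shot_examples = [
    ("T2N1M0 adenocarcinoma of the lung", "Stage IIB"),
    ("T1N0M0 invasive ductal carcinoma of the breast", "Stage IA"),
]

def build_prompt(new_case):
    lines = ["Map each TNM description to its overall stage group.", ""]
    for description, stage in few_shot_examples:
        lines.append(f"Case: {description}")
        lines.append(f"Stage: {stage}")
        lines.append("")
    lines.append(f"Case: {new_case}")
    lines.append("Stage:")
    return "\n".join(lines)

# The completed prompt is sent to an LLM, which infers the mapping in context.
print(build_prompt("T3N2M0 squamous cell carcinoma of the esophagus"))
```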
The technology is improving rapidly, and we can only imagine its performance in the years to come. For example, GPT-3, with 175 billion parameters, is roughly 100-fold larger than GPT-2. GPT-4 was launched on March 14, 2023, and is reported to be larger and more accurate. LLMs may ultimately lead to cost savings in oncology, both in clinical research and in patient care. The assimilation of this technology is already happening. Its risks and limitations should thus be discussed and addressed. A consensus statement by experts on the use of LLMs may be warranted to ensure the safe and responsible use of the technology in everyday clinical work.
Interdisciplinary collaboration is crucial for maximizing the potential of LLMs in oncology. A coordinated effort among AI specialists, data scientists, oncologists, and policymakers is essential for the successful development and implementation of LLMs in practice. AI experts and data scientists develop and refine the algorithms. Oncologists can contribute their clinical expertise, guiding the development process and addressing clinically relevant questions and problems. They also ensure that AI-generated content is accurate, meaningful, and actionable. Regulators and policymakers must establish and enforce transparent guidelines and ethical frameworks to govern the use of LLMs in healthcare, safeguarding patient safety and privacy. By cultivating a collaborative atmosphere, stakeholders can overcome challenges, enhance the advantages of LLMs, and ultimately improve patient care and outcomes in oncology.
To conclude, we may be on the verge of a revolution in NLP. LLMs have achieved state-of-the-art results in many human language tasks and can adapt to new tasks and domains. Some of these models are easily accessible through the internet, making them convenient to use at any time. In oncology, they may improve the efficiency and accuracy of cancer research and care. Despite their limitations and the associated concerns, the use of LLMs in oncology is likely to expand and evolve. Given these significant advancements, oncologists should be familiar with the technology and its potential benefits, costs, and limitations.
Supplementary Information
Below is the link to the electronic supplementary material.
Acknowledgements
We thank Rotem Schwartz for graphic design of the figures in this manuscript.
Author contributions
VS and EKL reviewed the literature and wrote the paper. YB and EKO critically revised the manuscript and contributed to the discussion. EKL conceived and directed the project.
Funding
No funding has been received for this work.
Declarations
Conflict of interest
We declare no conflict of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Brown T et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901
- Elkassem AA, Smith AD (2023) Potential use cases for ChatGPT in radiology reporting. Am J Roentgenol. 10.2214/AJR.23.29198
- Finlayson SG et al (2019) Adversarial attacks on medical machine learning. Science 363:1287–1289
- Health Insurance Portability and Accountability Act of 1996, Pub. L. No. 104–191 (1996)
- Hirschberg J, Manning CD (2015) Advances in natural language processing. Science 349:261–266
- Holmes J et al (2023) Evaluating large language models on a highly-specialized topic, radiation oncology physics. arXiv preprint arXiv:2304.01938
- Jeblick K et al (2022) ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports. arXiv preprint arXiv:2212.14882
- Kather JN (2023) Artificial intelligence in oncology: chances and pitfalls. J Cancer Res Clin Oncol. 10.1007/s00432-023-04666-6
- Lee J et al (2019) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 10.1093/bioinformatics/btz682
- Liu S et al (2023) Using AI-generated suggestions from ChatGPT to optimize clinical decision support. J Am Med Inform Assoc. 10.1093/jamia/ocad072
- Ma C et al (2023) ImpressionGPT: an iterative optimizing framework for radiology report summarization with ChatGPT. arXiv preprint arXiv:2304.08448
- Nature Editorial (2023) Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Nature. 10.1038/d41586-023-00191-1
- Rösler W et al (2023) An overview and a roadmap for artificial intelligence in hematology and oncology. J Cancer Res Clin Oncol. 10.1007/s00432-023-04667-5
- Singhal K et al (2022) Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138
- Sorin V, Klang E (2021) Artificial intelligence and health care disparities in radiology. Radiology 301:E443
- Sorin V, Barash Y, Konen E, Klang E (2020a) Deep-learning natural language processing for oncological applications. Lancet Oncol 21:1553–1556
- Sorin V, Barash Y, Konen E, Klang E (2020b) Creating artificial images for radiology applications using generative adversarial networks (GANs): a systematic review. Acad Radiol 27:1175–1185
- UNESCO (2022) Recommendation on the ethics of artificial intelligence
- Vaswani A et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998–6008
- Yang X et al (2022) A large language model for electronic health records. NPJ Digit Med. 10.1038/s41746-022-00742-2