Abstract
The rapid evolution of artificial intelligence (AI) in healthcare, particularly in radiology, underscores a transformative era marked by a potential for enhanced diagnostic precision, increased patient engagement, and streamlined clinical workflows. Amongst the key developments at the heart of this transformation are Large Language Models like the Generative Pre-trained Transformer 4 (GPT-4), whose integration into radiological practices could potentially herald a significant leap by assisting in the generation and summarization of radiology reports, aiding in differential diagnoses, and recommending evidence-based treatments. This review delves into the multifaceted potential applications of Large Language Models within radiology, using GPT-4 as an example, from improving diagnostic accuracy and reporting efficiency to translating complex medical findings into patient-friendly summaries. The review acknowledges the ethical, privacy, and technical challenges inherent in deploying AI technologies, emphasizing the importance of careful oversight, validation, and adherence to regulatory standards. Through a balanced discourse on the potential and pitfalls of GPT-4 in radiology, the article aims to provide a comprehensive overview of how these models have the potential to reshape the future of radiological services, fostering improvements in patient care, educational methodologies, and clinical research.
Keywords: Artificial Intelligence in Radiology, Generative Pre-trained Transformer 4 (GPT-4), Automated Radiology Reporting, Patient Communication, Radiology Education, AI Ethics and Regulation, Technology Adoption in Healthcare
Highlights
• GPT-4 in Radiology: Explores the transformative role of GPT-4 in enhancing diagnostic accuracy and streamlining workflows in radiology.
• Automated Report Generation: Demonstrates how GPT-4 can automate the creation and summarization of radiology reports.
• Patient Engagement: Highlights GPT-4's ability to convert complex radiological data into understandable language.
• Educational Impact: Discusses the potential of GPT-4 to revolutionize radiology education through personalized learning modules and real-time feedback.
• Ethical and Regulatory Considerations: Addresses the ethical, privacy, and technical challenges in deploying AI technologies like GPT-4.
1. Introduction
As the digital transformation of healthcare accelerates, radiology finds itself at the forefront of adopting artificial intelligence (AI) to enhance diagnostic accuracy, streamline clinical workflows, increase patient engagement, and improve outcomes. The emergence of Large Language Models (LLMs) like Generative Pre-trained Transformer 4 (GPT-4) (OpenAI, San Francisco, USA) represents a significant milestone in this journey, offering multiple opportunities to reshape the landscape of radiological services. This review article examines how this powerful AI model has the potential to catalyze innovations across various facets of radiology, including the way reports are structured, patients are engaged, research is conducted, and educational content is delivered.
By leveraging its advanced natural language processing capabilities, GPT-4 has shown significant potential in automating the generation and summarization of radiology reports, offering informed differential diagnoses, and providing evidence-based treatment recommendations. Such capabilities not only augment the radiologist's role but also promise to reduce diagnostic errors and improve patient care.
Beyond clinical applications, this article explores GPT-4's impact on patient engagement and education. With its ability to translate complex radiological findings into understandable language, GPT-4 can bridge the communication gap between radiologists and patients, facilitating better-informed healthcare decisions. Moreover, its application in radiology education and research can pave the way for a more interactive and personalized learning experience, fostering the next generation of radiologists and advancing radiological research.
Through a discussion on the ethical implications, data privacy concerns, and the need for human oversight, the article also underscores the importance of navigating the application of AI in radiology with caution and responsibility.
2. GPT-4: origins and development
The genesis of GPT-4 traces back to the foundational work on transformers in 2017, a novel neural network architecture that introduced self-attention mechanisms, allowing for significantly improved performance in language understanding and generation tasks [1]. Building on this, OpenAI introduced the first version of the Generative Pre-trained Transformer (GPT) in 2018, demonstrating the potential of transformers in generating coherent and contextually relevant text based on vast amounts of training data [2].
Subsequent iterations, GPT-2 and GPT-3, marked significant leaps in language model capacity and versatility, with GPT-3, released in 2020, showcasing a substantial ability to perform a wide range of language tasks from a minimal number of examples. These models were pre-trained on diverse internet text, enabling them to generate human-like text, answer questions, summarize passages, and even create content in various formats.
GPT-4 and GPT-4o, the latest iterations, represent a further leap in AI capabilities. These models refine and expand upon the capabilities of their predecessors, offering better linguistic accuracy, more nuanced understanding, and adaptability across languages and contexts. With its enhanced performance, GPT-4 has been shown to extend the capabilities of AI, particularly in fields requiring high levels of comprehension and synthesis, such as radiology.
LLMs have, as expected, become better at standard NLP tasks such as text extraction and classification as their size and training data have grown. In addition, they have unexpectedly acquired what are known as emergent abilities, that is, abilities not present in smaller models. For example, classical NLP models needed to be fine-tuned on enormous data sets of labelled instances to enable specific use cases. In contrast, LLMs exhibit strong performance upon exposure to a small number of examples, a process known as "few-shot learning." They perform well for some tasks even when given just one example (one-shot learning) or none at all (zero-shot learning) [3].
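The difference between zero-shot, one-shot (single-shot) and few-shot use comes down to how many worked examples are placed in the prompt. The following is a minimal sketch of that idea, not a description of how any cited study built its prompts. The function names `build_prompt` and `shot_regime`, the "Input/Output" layout, and the report fragments are all assumptions made up for illustration.

```python
from typing import List, Tuple


def build_prompt(instruction: str,
                 examples: List[Tuple[str, str]],
                 query: str) -> str:
    """Assemble a zero-, one-, or few-shot prompt as plain text.

    The number of (input, output) pairs in `examples` decides the regime:
    0 pairs = zero-shot, 1 pair = one-shot, 2 or more = few-shot.
    """
    parts = [instruction.strip()]
    for source_text, expected in examples:
        parts.append(f"Input: {source_text}\nOutput: {expected}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def shot_regime(examples: List[Tuple[str, str]]) -> str:
    """Name the prompting regime implied by the number of examples."""
    if not examples:
        return "zero-shot"
    return "one-shot" if len(examples) == 1 else "few-shot"


if __name__ == "__main__":
    # Hypothetical, made-up report fragments used only for illustration.
    instruction = "Classify the report impression as NORMAL or ABNORMAL."
    examples = [
        ("No acute cardiopulmonary abnormality.", "NORMAL"),
        ("Right lower lobe consolidation.", "ABNORMAL"),
    ]
    query = "Lungs are clear. No pleural effusion."
    print(shot_regime(examples))
    print(build_prompt(instruction, examples, query))
```

Passing an empty `examples` list produces a zero-shot prompt containing only the instruction and the query; no model is called in this sketch.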
3. Potential clinical applications in radiology
3.1. Assistance to referring physicians
Ensuring that patients are referred for the most appropriate imaging investigation is a critical first step in the radiology journey. This increases diagnostic accuracy and decreases the number of inappropriate and unnecessary investigations, radiation exposure and healthcare costs [4]. Adherence to appropriateness guidelines and recommendations like the American College of Radiology (ACR) Appropriateness Criteria is therefore extremely important. It has been shown that an LLM connected to the ACR Appropriateness Criteria outperformed radiologists in applying those criteria to certain clinical situations, and at a lower cost [5].
ChatGPT can be of value in providing clinical decision support and recommending the most appropriate investigation for a given clinical setting [6]. GPT models have been shown to perform better when specialist knowledge is incorporated to make them aware of the appropriateness criteria than their generic counterparts do [4].
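One common way to make a model "context aware" is to look up the relevant guideline passages first and place them in the prompt ahead of the clinical question. The sketch below illustrates that general pattern with a simple word-overlap lookup. It is an assumption for illustration only and does not reproduce the method of the cited study; the function names are hypothetical, and the passages are placeholders, not actual ACR Appropriateness Criteria text.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> set:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: Dict[str, str], top_k: int = 1) -> List[str]:
    """Return the ids of the top_k passages sharing the most words with the query."""
    q = tokenize(query)
    scored = sorted(
        passages,
        key=lambda pid: len(q & tokenize(passages[pid])),
        reverse=True,
    )
    return scored[:top_k]


def build_context_prompt(query: str, passages: Dict[str, str], top_k: int = 1) -> str:
    """Prepend the retrieved guideline passages to the clinical question."""
    ids = retrieve(query, passages, top_k)
    context = "\n".join(f"[{pid}] {passages[pid]}" for pid in ids)
    return (
        "Answer using only the guideline excerpts below.\n"
        f"{context}\n\n"
        f"Clinical scenario: {query}\n"
        "Most appropriate imaging study:"
    )


if __name__ == "__main__":
    # Placeholder passages: NOT real guideline content.
    passages = {
        "headache": "PLACEHOLDER excerpt on imaging for headache",
        "low_back_pain": "PLACEHOLDER excerpt on imaging for low back pain",
    }
    scenario = "Adult with chronic low back pain and no red flags"
    print(build_context_prompt(scenario, passages))
```

A production system would likely replace the word-overlap score with a proper search index or embedding-based retrieval; the prompt assembly step stays the same.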
3.2. Automated population of clinical history in a report
Better clinical histories on radiology referrals improve interpretation and diagnosis. However, the history provided is often incomplete or inadequate [7]. GPT-4 can collate and summarise the relevant information needed by radiologists to issue accurate reports. It can do this by analysing data from various sources such as the referral details, previous imaging reports and the electronic medical record (EMR). It can pre-populate the clinical history section of the report with this summary for the radiologist to review prior to reporting, thereby saving a significant amount of time and effort, and potentially improving the accuracy and clinical relevance of the reports [8].
3.3. Automatic determination of the radiological investigation and protocol
GPT-4 is able to use information from the radiology referral form to determine the type of radiology study, the body region to be scanned and whether contrast enhancement should be used. This task is usually undertaken by radiologists or allied health professionals and requires a high degree of knowledge and expertise [9]. Automation at this step would improve speed and efficiency and save time and resources. One study has shown agreement between GPT-4 and the reference standard (an expert decision made by a board-certified radiologist) in 84 out of 100 cases [10].
3.4. Automated structured report generation
Among the potential applications of LLMs in radiology, automated generation of structured radiology reports stands out as an important innovation. This application harnesses the advanced natural language processing (NLP) capabilities of models like GPT-4 to analyze demographic information and key imaging findings and generate reports with good readability, reasonable image findings and differential diagnoses [11].
Structured radiology reports improve understanding of the reports and enhance collaboration between healthcare professionals. They also improve data extraction for research purposes. GPT-4 has been shown to be able to effectively convert free-text reports into structured reports [12].
This application of LLMs can help alleviate the substantial workload that burdens radiologists. In the current healthcare landscape, radiologists face increasing pressure from growing imaging volumes, which can lead to burnout and potentially affect the quality of diagnostic processes and the accuracy of the reports [13]. By delegating the task of structuring reports to LLMs, radiologists can focus their expertise on complex cases, interpretive nuances, and direct patient care.
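When a model is asked to return a structured report, the output still needs to be checked against the expected template before it is used. The following is a minimal sketch of such a check, assuming the model has been asked to answer in JSON. The section names in `REQUIRED_KEYS`, the `StructuredReport` class and the sample values are hypothetical and are not taken from the cited studies or from any reporting standard.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical template sections, chosen only for illustration.
REQUIRED_KEYS = ("modality", "body_region", "findings", "impression")


@dataclass
class StructuredReport:
    modality: str
    body_region: str
    findings: List[str]
    impression: str


def parse_structured_report(raw: str) -> StructuredReport:
    """Parse model output as JSON and check it against the template.

    Raises ValueError when the output is not valid JSON, is missing a
    required section, or has an empty findings list, so that a malformed
    response is never silently accepted.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    if not isinstance(data["findings"], list) or not data["findings"]:
        raise ValueError("findings must be a non-empty list")
    return StructuredReport(
        modality=str(data["modality"]),
        body_region=str(data["body_region"]),
        findings=[str(f) for f in data["findings"]],
        impression=str(data["impression"]),
    )


if __name__ == "__main__":
    # Made-up stand-in for a model response; no model is called here.
    raw = ('{"modality": "CT", "body_region": "abdomen", '
           '"findings": ["Hypothetical finding A"], '
           '"impression": "Hypothetical impression"}')
    print(parse_structured_report(raw))
```

Rejecting malformed output in this way only checks the form of the report; the clinical content still requires review by a radiologist.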
The integration of LLMs into radiology reporting workflows can significantly expedite report turnaround times. The speed and efficiency of LLMs in processing and interpreting vast amounts of data can shorten the time from image acquisition to report completion. This acceleration not only improves patient throughput but also has the potential to enhance patient outcomes by facilitating quicker decision-making in clinical care.
3.5. Provision of differential diagnosis
Providing differential diagnoses based on imaging patterns is one of the most important aspects of a radiologist’s service. This usually requires referring to the literature and is time consuming, especially for radiologists in training [14], [15]. GPT-4 can generate relevant differential diagnoses when it is given text-based imaging patterns as input. In one study [16], 80 differential diagnoses were generated by an expert panel and by GPT-4, respectively, across multiple subspecialties. There was 68.8% (55 of 80) concordance between the differential diagnoses suggested by the expert panel and those suggested by GPT-4, and 93.8% (75 of 80) of the diagnoses proposed by GPT-4 were deemed acceptable alternatives.
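The percentages quoted above follow directly from the counts (55 of 80 and 75 of 80). The short sketch below reproduces that arithmetic and, as an illustrative addition that is not reported in the text above, attaches a Wilson score interval to show how much uncertainty a sample of 80 carries. The function names are hypothetical.

```python
import math
from typing import Tuple


def proportion(successes: int, total: int) -> float:
    """Return successes / total as a fraction between 0 and 1."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    return successes / total


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = proportion(successes, total)
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Counts taken from the paragraph above; the intervals are illustrative.
    for label, k, n in [("concordant", 55, 80), ("acceptable", 75, 80)]:
        low, high = wilson_interval(k, n)
        print(f"{label}: {100 * proportion(k, n):.2f}% "
              f"(Wilson 95% interval {100 * low:.1f}% to {100 * high:.1f}%)")
```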
Fig. 1 below is an example of an immediate output from GPT-4 for a prompt asking for differential diagnoses for a solitary liver lesion.
Fig. 1.
Screenshot of output from GPT-4 for the prompt “Please provide 5 differential diagnoses for a solitary hypervascular lesion in the right lobe of the liver in a 25 year old asymptomatic woman.”
3.6. Report summarisation
GPT-4 can be used to distil lengthy and complex radiology reports into succinct comprehensible summaries. This functionality could not only streamline the diagnostic workflow for radiologists by providing immediate access to key findings but also facilitate enhanced interdisciplinary communication, ensuring that critical diagnostic information is promptly and effectively shared among healthcare professionals. GPT-4 could also include information from previous reports or the EMR in its reports to provide better context [8], [17].
Fig. 2 below shows the GPT-4 output when unstructured free-text findings from a CT of the pancreas were provided as input. It was asked to structure the report and provide a summary.
Fig. 2.
Screenshots of the GPT-4 output containing a structured report and a summary.
4. Detection of errors in radiology reports
High radiologist workloads and unreliable speech recognition can result in errors in radiology reports, which may go undetected. GPT-4 can be used as a cost-effective solution to detect and highlight these errors. Its performance has been shown to be comparable with that of radiologists [18]. The use of assistive technologies such as this to improve radiology report accuracy can help radiologists focus more on clinical interpretation [19]. In a comparative study, GPT-4 showed the highest accuracy in detecting speech recognition errors in radiology reports, compared with the other advanced generative models tested in that study [20].
4.1. Generation of patient friendly summaries
GPT-4 can interpret radiology reports, which contain specialized terminology and complex descriptions, and convert them into simpler language. This helps patients understand their medical condition, the findings of radiology reports, and the implications for their health. Access to understandable information empowers patients to take an active role in their healthcare decisions. This empowerment is crucial for shared decision-making. One study [21] has shown that LLMs can improve the readability of radiology reports and decrease their mean reading level across all 4 major diagnostic modalities to that of the average U.S. adult.
GPT-4 can generate patient friendly summaries in 40 different languages and for patients with different educational levels [8]. This can contribute to reducing health disparities as patients from diverse linguistic and cultural backgrounds will have better access to information.
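Reading level can be estimated with standard readability formulas. The sketch below uses the Flesch-Kincaid grade level as one widely used example; whether the cited study used this particular formula is an assumption. The syllable counter is a rough vowel-group heuristic, and both sample sentences are made up for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    # Treat a trailing silent "e" (as in "scale") as non-syllabic.
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    # Made-up sentences, not taken from any real report.
    original = ("There is peripancreatic fat stranding with cholelithiasis, "
                "consistent with acute pancreatitis.")
    plain = "Your pancreas is swollen. You also have gall stones."
    print(f"original grade: {flesch_kincaid_grade(original):.1f}")
    print(f"plain grade:    {flesch_kincaid_grade(plain):.1f}")
```

Because the syllable count is only approximate, the absolute grade values should be read as indicative; the comparison between the two versions of a text is the more useful signal.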
Fig. 3 is a patient-friendly summary generated by GPT-4 for the radiology report example shown in Fig. 2.
Fig. 3.
A screenshot of a patient friendly summary generated by GPT-4 for the CT scan report shown previously that showed acute pancreatitis with gall stones.
4.2. Making Treatment Recommendations
GPT-4 is able to generate treatment recommendations from radiology reports. In a pilot study [22] GPT-4 generated largely accurate and clinically useful treatment recommendations for common orthopaedic conditions involving the shoulder and the knee.
4.3. Radiology education and research
GPT-4 can facilitate the rapid creation of personalised training modules and educational materials tailored to specific learning objectives, thereby enhancing the educational experience for radiologists in training. Through simulations, GPT-4 can provide a safe environment for learners to practice interpretative skills without the risk of patient harm. It can also provide interactive, live, mentor-like guidance.
GPT-4 offers multiple forms of assistance to radiologists in research, particularly in organizing and composing their research articles. It can provide guidance on structuring the article coherently and advise on the arrangement of sections such as the introduction, methodology, results, and discussion. In addition, it can aid in refining the article's language and presentation by recommending suitable words, grammatical constructs, and sentence formations. ChatGPT further contributes to the research article's formatting by offering advice on incorporating references, citations, and essential details [23]. However, the integration of LLMs into the research landscape has raised serious ethical considerations. Nevertheless, a prudent, cautious and transparent use of LLMs in assisting the manuscript creation and review process may have benefits [24].
5. Pitfalls, drawbacks, limitations and ethical considerations
As discussed in this article, Large Language Models like GPT-4 show considerable promise in transforming radiology for the better. However, none of these models is currently approved as a regulated medical device. Their use will need close oversight, review and validation by radiologists for every use case.
A significant technical limitation of GPT-4 is the phenomenon of "hallucinations," where it generates convincing but inaccurate or fabricated information. One of the causes of hallucinations could be intrinsic bias or deficiencies in the data used for training these models [25]. The plausible-sounding but inaccurate information generated by hallucinations may lead to inappropriate clinical decisions with serious adverse implications for patients [26].
Data privacy emerges as another paramount concern, especially given the sensitive nature of medical records; ensuring GPT-4's compliance with stringent data protection regulations like HIPAA is essential to safeguard patient confidentiality. Ethical issues also abound, particularly in the realm of patient consent and the transparency of AI-driven decisions. Patients must be adequately informed about the role of AI in their care, including the potential for errors and the measures taken to mitigate such risks.
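One practical safeguard is to strip obvious identifiers from report text before it leaves the institution. The sketch below shows the general idea with three regular expressions. It is an illustration only: the patterns and the `redact` function are hypothetical, they cover a small fraction of possible identifiers, and running them does not by itself make a workflow compliant with HIPAA or any other regulation.

```python
import re

# Illustrative patterns only; real de-identification needs a validated tool
# and covers many more identifier types than these three.
PATTERNS = [
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]


def redact(text: str) -> str:
    """Replace matches of each pattern with a placeholder tag."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text


if __name__ == "__main__":
    # Made-up text; the identifiers are fictitious.
    sample = ("MRN: 1234567, seen 03/14/2024, call 555-123-4567. "
              "CT abdomen shows gall stones.")
    print(redact(sample))
```

Names, addresses and free-text identifiers are not handled here, which is one reason human review and validated de-identification tools remain necessary.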
6. Conclusion
Large Language Models like GPT-4 and its successors hold promise of a significant positive impact on radiology services at almost every step, offering benefits such as improved accuracy, efficiency, consistency, and decision support. They could help ease the huge workload burden faced by radiologists across many parts of the world and also allow radiologists to focus more on complex interpretation tasks. However, realizing their full potential requires overcoming significant challenges related to data privacy, model bias, and clinical integration.
Funding statement
This review article did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Disclosure
During the preparation of this work the authors used Chat GPT-4 in some portions of the manuscript in order to improve readability. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Ethical statement
This review article does not involve any new studies with human or animal subjects performed by any of the authors. Therefore, no ethics committee approval was required for the preparation of this manuscript.
CRediT authorship contribution statement
Janani Baradwaj: Writing – review & editing, Validation. Sadhana Kalidindi: Methodology, Conceptualization.
Declaration of Competing Interest
I, Dr. Sadhana Kalidindi, declare that I have no financial or personal relationships with other people or organizations that could inappropriately influence my work. There are no professional or other personal interests of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Advancing Radiology with GPT-4: Innovations in Clinical Applications, Patient Engagement, Research, and Learning." This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. All data generated or analyzed during this study are included in this published article.
Contributor Information
Sadhana Kalidindi, Email: skalidindi359@gmail.com, Sk15654@bristol.ac.uk.
Janani Baradwaj, Email: dr.janani20@gmail.com.
References
- 1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., … & Polosukhin, I. (2017). Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017).
- 2. Radford A., Wu J., Child R., Luan D., Amodei D., Sutskever I. Language models are unsupervised multitask learners. OpenAI blog. 2019 Feb 24;1(8):9.
- 3. Bhayana R. Chatbots and large language models in radiology: a practical primer for clinical and research applications. Radiology. 2024;310(1) doi: 10.1148/radiol.232756.
- 4. Rau A., Rau S., Zöller D., Fink A., Tran H., Wilpert C., Nattenmüller J., Neubauer J., Bamberg F., Reisert M., Russe M.F. A context-based chatbot surpasses radiologists and generic ChatGPT in following the ACR appropriateness guidelines. Radiology. 2023;308(1) doi: 10.1148/radiol.230970.
- 5. A.C.R. Appropriateness Criteria. (n.d.). American College of Radiology. Retrieved April 2, 2024, from 〈https://www.acr.org/Clinical-Resources/ACR-Appropriateness-Criteria〉.
- 6. Shen Y., Heacock L., Elias J., Hentel K.D., Reig B., Shih G., Moy L. ChatGPT and other large language models are double-edged swords. Radiology. 2023;307(2) doi: 10.1148/radiol.230163.
- 7. Wassermann T.B., Straus C.M. A failure to communicate? Acad. Radiol. 2018;25(7):943–950.
- 8. Elkassem A.A., Smith A.D. Potential use cases for ChatGPT in radiology reporting. Am. J. Roentgenol. 2023;221(3):373–376. doi: 10.2214/ajr.23.29198.
- 9. Tadavarthi Y., Makeeva V., Wagstaff W., Zhan H., Podlasek A., Bhatia N., Heilbrun M., Krupinski E., Safdar N., Banerjee I., Gichoya J., Trivedi H. Overview of noninterpretive artificial intelligence models for safety, quality, workflow, and education applications in radiology practice. Radiol. Artif. Intell. 2022;4(2) doi: 10.1148/ryai.210114.
- 10. Gertz R.J., Bunck A.C., Lennartz S., Dratsch T., Iuga A.I., Maintz D., Kottlors J. GPT-4 for automated determination of radiologic study and protocol based on radiology request forms: a feasibility study. Radiology. 2023;307(5) doi: 10.1148/radiol.230877.
- 11. Nakaura T., Yoshida N., Kobayashi N., Shiraishi K., Nagayama Y., Uetani H., Kidoh M., Hokamura M., Funama Y., Hirai T. Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports. Jpn. J. Radiol. 2023;42(2):190–200. doi: 10.1007/s11604-023-01487-y.
- 12. Adams L.C., Truhn D., Busch F., Kader A., Niehues S.M., Makowski M.R., Bressem K.K. Leveraging GPT-4 for Post Hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study. Radiology. 2023;307(4) doi: 10.1148/radiol.230725.
- 13. Parikh J.R., Wolfman D., Bender C.E., Arleo E. Radiologist burnout according to surveyed radiology practice leaders. J. Am. Coll. Radiol. 2020;17(1):78–81. doi: 10.1016/j.jacr.2019.07.008.
- 14. Gray B.R., Mutz J.M., Gunderman R.B. Radiology as personal knowledge. Am. J. Roentgenol. 2020;214(2):237–238. doi: 10.2214/ajr.19.22073.
- 15. Medina L.S., Blackmore C.C. Evidence-based radiology: review and dissemination. Radiology. 2007;244(2):331–336. doi: 10.1148/radiol.2442051766.
- 16. Kottlors J., Bratke G., Rauen P., Kabbasch C., Persigehl T., Schlamann M., Lennartz S. Feasibility of differential diagnosis based on imaging patterns using a large language model. Radiology. 2023;308(1) doi: 10.1148/radiol.231167.
- 17. Akinci D’Antonoli T., Stanzione A., Bluethgen C., Vernuccio F., Ugga L., Klontzas M.E., Cuocolo R., Cannella R., Koçak B. Large language models in radiology: fundamentals, applications, ethical considerations, risks, and future directions. Diagn. Interv. Radiol. 2024;30(2):80–90. doi: 10.4274/dir.2023.232417.
- 18. Gertz R.J., Dratsch T., Bunck A.C., Lennartz S., Iuga A.I., Hellmich M.G., Persigehl T., Pennig L., Gietzen C.H., Fervers P., Maintz D., Hahnfeldt R., Kottlors J. Potential of GPT-4 for detecting errors in radiology reports: implications for reporting accuracy. Radiology. 2024;311(1) doi: 10.1148/radiol.232714.
- 19. Forman H.P. Large language models as an inexpensive and effective extra set of eyes in radiology reporting. Radiology. 2024;311(1) doi: 10.1148/radiol.240844.
- 20. Schmidt R.A., Seah J.C.Y., Cao K., Lim L., Lim W., Yeung J. Generative large language models for detection of speech recognition errors in radiology reports. Radiol. Artif. Intell. 2024;6(2) doi: 10.1148/ryai.230205.
- 21. Li H., Moon J.T., Iyer D., Balthazar P., Krupinski E.A., Bercu Z.L., Newsome J.M., Banerjee I., Gichoya J.W., Trivedi H.M. Decoding radiology reports: potential application of OpenAI ChatGPT to enhance patient understanding of diagnostic reports. Clin. Imaging. 2023;101:137–141. doi: 10.1016/j.clinimag.2023.06.008.
- 22. Truhn D., Weber C.D., Braun B.J., Bressem K., Kather J.N., Kuhl C., Nebelung S. A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports. Sci. Rep. 2023;13(1) doi: 10.1038/s41598-023-47500-2.
- 23. Lecler A., Duron L., Soyer P. Revolutionizing radiology with GPT-based models: current applications, future possibilities and limitations of ChatGPT. Diagn. Interv. Imaging. 2023;104(6):269–274. doi: 10.1016/j.diii.2023.02.003.
- 24. Nakaura T., Ito R., Ueda D., Nozaki T., Fushimi Y., Matsui Y., Yanagawa M., Yamada A., Tsuboyama T., Fujima N., Tatsugami F., Hirata K., Fujita S., Kamagata K., Fujioka T., Kawamura M., Naganawa S. The impact of large language models on radiology: a guide for radiologists on the latest innovations in AI. Jpn. J. Radiol. 2024 doi: 10.1007/s11604-024-01552-0.
- 25. Kim S., Lee C.K., Kim S.S. Large language models: a guide for radiologists. Korean J. Radiol. 2024;25(2):126. doi: 10.3348/kjr.2023.0997.
- 26. Sorin V., Klang E. Large language models and the emergence phenomena. Eur. J. Radiol. Open. 2023;10 doi: 10.1016/j.ejro.2023.100494.



