Abstract
Large language models (LLMs) are used in radiology to simplify reports, translate findings, and support patient-facing communication, yet their clinical value and safety remain uncertain. This narrative review was conducted in accordance with the Scale for the Assessment of Narrative Review Articles (SANRA) quality criteria and synthesises evidence from 49 studies published between 2020 and 2025, focusing on clinician-mediated use of LLMs across four domains: report simplification, multilingual translation, patient education, and patient attitudes. Across studies, LLMs consistently improved readability by 2-6 grade levels, but only one randomised trial directly assessed patient comprehension. Professional review was required in up to 80% of outputs in controlled settings, compared with <10% in observational studies. Harmful factual errors were uncommon but non-negligible (0-10% depending on task and model). Translation performance was highest for high-resource languages, while semantic drift was more frequent in low-resource languages, necessitating bilingual review. Patients generally accepted AI-assisted communication when clinician oversight was explicit. Current regulatory and professional guidance supports supervised, institution-hosted deployment. Evidence supports specific use cases (patient summaries, translation drafts, and educational materials) but does not justify autonomous deployment or direct patient self-use. Key evidence gaps remain in comprehension outcomes, workflow impact, and real-world validation.
Keywords: artificial intelligence, governance, large language models, patient communication, radiology
Introduction and background
Radiology is central to modern diagnostics, yet its outputs have traditionally remained inaccessible to patients. Reports are written in specialist terminology, and medical images are stored within picture archiving and communication systems (PACS), limiting patient understanding despite growing expectations for transparency [1-3]. Recent expansions in patient-centred care, including electronic health-record (EHR) portals and open-access initiatives, now provide patients with near-real-time access to radiology reports and, increasingly, to images [4,5]. This shift has exposed a persistent gap between information access and comprehension.
At the same time, radiologists are under increasing workload pressures. Rising imaging volumes, staffing shortages, and expanding communication expectations contribute to time constraints, cognitive load, and burnout. Report-related communication tasks such as explaining findings to clinicians, rewording reports for patients, or resolving misunderstandings are an under-recognised but significant burden [6].
Across clinical medicine, artificial intelligence has been associated with improvements in diagnostic accuracy, therapeutic support, operational efficiency, and patient safety [7,8]. Large language models (LLMs), including models such as GPT-4 and Gemini and their deployment within platforms such as ChatGPT and Microsoft Copilot, have demonstrated the capacity to process complex medical text and generate patient-friendly explanations [8]. Early studies show that LLMs can simplify reports [9,10], translate findings into multiple languages [11], and produce educational material; however, safe use consistently relies on clinician-mediated workflows in which outputs are reviewed, corrected, and contextualised before communication with patients (Figure 1) [12].
Figure 1. Clinician-mediated workflow for LLM-assisted radiology communication.
Schematic overview of a clinician-mediated large language model (LLM) workflow for radiologist–patient communication. LLM-assisted report simplifications, translations, and patient-education materials are generated from the original radiology report and undergo mandatory clinician review and verification prior to patient release, ensuring professional oversight and accountability.
Source: Author-created figure.
Within this framework, translation represents a critical and frequently encountered component of radiology communication. In multilingual healthcare systems, linguistic accessibility is a key determinant of patient understanding, particularly for individuals with limited English proficiency [13]. Translation requests are common and time-intensive for clinicians, and delays or inaccuracies may exacerbate inequities in access to imaging information. When appropriately supervised, LLM-generated translations may offer a scalable means of supporting timely, culturally sensitive communication while reducing repetitive linguistic workload [14].
Despite these potential benefits, direct patient use of LLMs remains largely unevaluated and raises concerns related to misinformation, bias, privacy, and compliance with data-protection frameworks such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) [8,15,16]. Regulatory pathways, including the FDA’s Software as a Medical Device (SaMD) framework, the UK Medicines and Healthcare products Regulatory Agency (MHRA) AI roadmap, and the EU AI Act, currently provide limited guidance for non-diagnostic, patient-facing applications [17-19].
Clinician-mediated deployment, therefore, represents a pragmatic and safe intermediate step. It may enhance health literacy, improve patient engagement [12], reduce linguistic and educational barriers to understanding [20], and alleviate components of radiologist workload related to communication tasks, while maintaining essential professional oversight. This narrative review synthesises current evidence on clinician-mediated LLM use for radiology communication. It examines: (1) report simplification, (2) multilingual translation, (3) patient education, and (4) patient attitudes toward AI-assisted communication, alongside safety, governance, and regulatory considerations. Key terms are provided in Appendix 1.
Review
Materials and methods
This article is a narrative review, conducted and reported in accordance with the principles of the Scale for the Assessment of Narrative Review Articles (SANRA) [21], which emphasises clarity of rationale, explicit aims, justified literature search, appropriate referencing, scientific reasoning, and meaningful presentation of evidence.
A structured but non-systematic search of PubMed, Scopus, and Google Scholar was performed for English-language literature published between January 2020 and November 2025, capturing the period in which modern large language models (LLMs) became clinically relevant. Search terms included combinations of Medical Subject Headings (MeSH): “radiology”, “medical imaging”, “large language model”, “LLM”, “ChatGPT”, “GPT-4”, “patient communication”, “report simplification”, “translation”, “patient education”, “health literacy”, “AI communication”, “clinician-mediated”. Reference lists of included studies were manually screened and back-searched to identify additional relevant publications.
Studies were included if they examined or discussed the use of large language models to simplify, translate, or explain radiology reports or imaging information for patients or lay audiences; addressed clinician-mediated or supervised workflows; reported empirical findings or provided relevant ethical, regulatory, or governance analysis; and were published in English. Studies were excluded if they focused solely on diagnostic performance without a communication component, evaluated direct unsupervised patient use (outside the scope of this review), or consisted of commentaries, case reports, or opinion pieces without substantive analysis.
Grey literature on regulation and governance was included for contextual governance analysis rather than performance evaluation. It was retrieved from professional bodies (RCR, ACR), regulatory agencies (MHRA, FDA), data-protection authorities (ICO, HHS), and international organisations (WHO).
Given the heterogeneity of study designs and evaluation metrics (readability indices, accuracy ratings, translation metrics, patient surveys), findings were synthesised narratively, consistent with SANRA expectations. Empirical studies provide the primary evidence base for performance and safety outcomes, while ethical analyses and regulatory guidance are incorporated to contextualise clinical implementation and governance requirements. These complementary sources serve different evidentiary roles and are not treated as equivalent in weight. Extracted information included study design and context, communication task (simplification, translation, patient education), models evaluated, imaging modalities, evaluation methods (readability, semantic fidelity, safety, patient comprehension), and governance or regulatory relevance. Themes were grouped into the key clinical domains presented in the review.
Results
A total of 49 studies examining applications of LLMs in radiology met the inclusion criteria and were synthesised narratively. Study design, patient-facing status, and primary task are systematically presented in Table 1.
Table 1. Summary of patient-facing large language model studies included in the review (n = 49).
This table summarises the characteristics of all 49 studies included in the narrative review, classified by study design, patient-facing status, and primary task [8-12,20,22-64]. The included evidence spans prospective and retrospective evaluations, randomised controlled trials, qualitative and survey-based studies, comparative model assessments, and narrative or systematic reviews. Most studies evaluated patient-facing applications of large language models in radiology, including report simplification, readability enhancement, multilingual translation, patient education, consent support, and assessment of patient attitudes toward artificial intelligence.
| Citation Number | Study | Study design | Patient-facing? | Task |
| [22] | Lyu et al. 2023 | Prospective evaluation | Yes | Translate full radiology reports into plain language |
| [23] | Amin et al. 2023 | Retrospective multicentre study | Yes | Simplify full radiology reports |
| [9] | Jeblick et al. 2024 | Exploratory case study | Yes | Simplify radiology reports |
| [24] | Doshi et al. 2024 | Quantitative analysis | Yes | Simplify “Impression” sections of reports |
| [10] | Rahsepar et al. 2024 | Prospective comparative study | Yes | Enhance report readability |
| [12] | Park et al. 2024 | Single-centre evaluation | Yes | Generate patient-friendly MRI summaries |
| [25] | Sterling et al. 2024 | Quality and safety evaluation | Yes | Produce lay summaries of radiology reports |
| [20] | Tariq et al. 2024 | Model development study (preprint) | Yes | Generate literacy-tailored summaries |
| [26] | Maroncelli et al. 2024 | Pilot study with lay readers | Yes | Simplify breast imaging reports |
| [27] | Aydin et al. 2024 | Scoping review | Yes | Review patient-facing LLM applications in imaging |
| [28] | Berigan et al. 2024 | Randomised controlled trial | Yes | Provide LLM-generated summaries versus usual care |
| [29] | Gupta M. et al. 2024 | Readability study | Yes | Simplify radiological information |
| [30] | Tepe et al. 2024 | Quantitative evaluation | Yes | Simplify reports and classify urgency |
| [31] | Butler et al. 2024a | Retrospective evaluation | Yes | Simplify foot and ankle radiology reports |
| [32] | Butler et al. 2024b | Retrospective evaluation | Yes | Simplify knee MRI reports |
| [33] | Tang et al. 2024 | Method development study | Yes | Generate colloquial radiology summaries |
| [34] | Kuckelman et al. 2024 | Prospective evaluation | Yes | Summarise musculoskeletal MRI reports |
| [35] | Cesur et al. 2024 | Quantitative evaluation | Yes | Simplify non-English MRI findings |
| [36] | Gupta et al. 2025 | Prospective clinical study | Yes | Deliver simplified reports to oncology patients |
| [37] | Hu et al. 2025 | Cross-model evaluation | Yes | Simplify radiology reports |
| [38] | Sunshine et al. 2025 | Quality and understandability study | Yes | Simplify radiology report summaries |
| [39] | van Driel et al. 2025 | Prospective evaluation | Yes | Simplify Dutch radiology reports |
| [40] | Stephan et al. 2025 | Prospective evaluation | Yes | Simplify AI-generated dental imaging reports |
| [41] | Herwald et al. 2025 | System development and evaluation | Yes | Provide personalised explanations and Q&A |
| [42] | Lee H.S. et al. 2025 | Retrospective evaluation | Yes | Examine accuracy–simplification trade-offs |
| [43] | Bozer et al. 2025 | Comparative evaluation | Yes | Generate patient-friendly imaging explanations |
| [44] | Butler et al. 2025 | Retrospective evaluation | Yes | Simplify hand and wrist radiology reports |
| [45] | Can et al. 2025 | Comparative model evaluation | Yes | Simplify interventional radiology reports |
| [46] | Sarangi et al. 2023 | Simplification and translation evaluation | Yes | Simplify and translate radiology reports into lay-friendly Hindi |
| [47] | Meddeb et al. 2024 | Multilingual translation evaluation | Yes | Translate CT and MRI free-text radiology reports across multiple languages |
| [48] | Gupta et al. 2024 | Comparative retrospective evaluation | Yes | Translate CT report impressions into simple Hindi |
| [49] | Khanna et al. 2024 | Pilot bilingual physician evaluation | Yes | Translate radiology reports into multiple non-English languages |
| [50] | Gulati et al. 2024 | Evaluation/commentary with empirical testing | Yes | Translate radiology report segments into multiple languages |
| [11] | Terzis et al. 2025 | Prospective multicentre evaluation | Yes | Near real-time translation of radiology reports into multiple languages |
| [8] | Keshavarz et al. 2024 | Systematic review | Indirectly (patient-facing outputs assessed) | Review ChatGPT performance, pitfalls, and future perspectives in radiology |
| [51] | Haver et al. 2024 | Retrospective exploratory evaluation | Yes | Simplify patient-centred information on breast cancer prevention and screening |
| [52] | Gordon et al. 2024 | Prospective evaluation | Yes | Answer common imaging-related patient questions and assess readability |
| [53] | McCarthy et al. 2023 | Comparative evaluation | Yes | Deliver interventional radiology patient education content |
| [54] | Zaki et al. 2024 | Readability intervention evaluation | Yes | Improve readability of interventional radiology procedure descriptions |
| [55] | Scheschenja et al. 2024 | Comparative model evaluation | Yes | Provide in-depth patient education prior to interventional radiology procedures |
| [56] | Hofmann & Vairavamurthy 2024 | Cross-sectional physician evaluation | Yes | Deliver interventional radiology procedural information during consent |
| [57] | Kaba et al. 2025 | Prospective expert evaluation | Yes | Explain potential complications of interventional radiology procedures |
| [58] | Baghdadi et al. 2024 | Cross-sectional survey study | Yes | Assess patient attitudes toward the use of AI as a diagnostic tool in radiology |
| [59] | Ibba et al. 2024 | Cross-sectional survey study | Yes | Evaluate patient perceptions of AI–radiologist interaction |
| [60] | Hemphill et al. 2023 | Narrative review | Yes | Synthesise patient perspectives on the implementation of AI in radiology |
| [61] | Royal College of Radiologists, 2025 | Public perception survey / report | Yes | Assess public perceptions of AI use in radiology |
| [62] | Currie et al. 2024 | Comparative evaluation | Yes | Generate and compare patient information in nuclear medicine using LLMs |
| [63] | Glenning & Gualtieri 2023 | Qualitative / survey-based study | Yes | Explore patient perspectives on AI in medical imaging |
| [64] | Fanni & Neri 2024 | Commentary with perspective synthesis | Yes | Examine patient roles and perspectives in adoption of AI in radiology |
Simplification of Radiology Reports
Twenty-eight studies explored LLMs as tools to generate patient-friendly summaries within supervised workflows (Table 2). These included randomised, prospective, and observational evaluations, with sample sizes from 3 to 1,982 radiology reports across multiple imaging modalities [9,10,12,20,22-45]. Across studies, LLM-generated summaries were associated with higher clarity ratings and patient-reported confidence, though the degree of revision required varied by clinical context.
Table 2. Studies evaluating LLM-based simplification of radiology reports for patient use.
Summary of published and preprint studies (2020–2025) investigating the use of LLMs to generate simplified or patient-readable versions of radiology reports [9,10,12,20,22-45]. Each study is summarised by design, cohort or dataset, imaging modality, model type, evaluation approach, and principal outcomes. “Patient-facing” denotes studies in which the primary aim was to improve accessibility, readability or comprehension of radiology outputs for patients or lay readers. Included studies encompass experimental, observational, methodological and review designs addressing patient-oriented applications of LLMs within radiology reporting workflows. This table provides illustrative examples identified through a non-systematic literature search and is not intended as an exhaustive list.
Abbreviations: LLM: Large language model; RCT: Randomised controlled trial; CT: Computed tomography; MRI: Magnetic resonance imaging; MSK: Musculoskeletal; US: Ultrasound; CXR: Chest radiograph; LDCT: Low-dose computed tomography; FRES: Flesch Reading Ease Score; FKGL/FKRL: Flesch–Kincaid Grade/Reading Level; PEMAT: Patient Education Materials Assessment Tool; κ: kappa statistic.
| Citation Number | Study | Study design | Patient facing? | Task | Data (n, modality) | Models | Evaluation | Key outcomes |
| [22] | Lyu et al. 2023 | Prospective evaluation | Yes | Translate full reports into plain language | 138 reports (LDCT; brain MRI) | ChatGPT-3.5; GPT-4 | Radiologist accuracy & completeness ratings | Mean 4.27/5; low omissions/misinformation; GPT-4 improved results |
| [23] | Amin et al. 2023 | Retrospective multicentre | Yes (readability-focused) | Simplify full reports | 254 reports (CT/MRI/US) | ChatGPT-3.5; Bard; Bing | Readability + fidelity review | Reduced reading level; ChatGPT-3.5 & Bing most accurate |
| [9] | Jeblick et al. 2024 | Exploratory case study | Yes | Simplify reports | 3 synthetic reports | ChatGPT | Radiologist correctness/completeness | Understandable; some omissions |
| [24] | Doshi et al. 2024 | Quantitative analysis | Yes | Simplify Impression sections | Multi-institutional set | Multiple LLMs incl. GPT-4 | Expert scoring; readability | High readability; safety considerations |
| [10] | Rahsepar et al. 2024 | Prospective comparison | Yes | Enhance readability | Varied reports | Four LLMs | Readability indices + review | Improved readability (P<.05) |
| [12] | Park et al. 2024 | Single-centre evaluation | Yes | Patient-friendly MRI summaries | 685 spine MRI reports | LLM pipeline | Quality; accuracy; consistency | Acceptable accuracy; consistency varied |
| [25] | Sterling et al. 2024 | Quality & safety evaluation | Yes | Lay summaries | 1,982 summaries (varied) | Multiple LLMs | Physician safety & quality | 80.6% Very Good; modality variation |
| [20] | Tariq et al. 2024 | Model development (preprint) | Yes | Literacy-tailored summaries | Dataset per preprint | Custom LLM | Automated + human review | Better understanding across literacy levels |
| [26] | Maroncelli et al. 2024 | Pilot with lay readers | Yes | Simplify breast imaging reports | 21 reports | ChatGPT-4o | Readability + lay comprehension | Good clarity; feasible |
| [27] | Aydin et al. 2024 | Scoping review | Yes | Review of patient-facing uses | NA | Various LLMs | Narrative synthesis | Imaging uses emerging |
| [28] | Berigan et al. 2024 | Randomised controlled trial | Yes | Provide LLM summaries vs control | Randomised patient cohort | LLM pipeline | Patient comprehension + usability | Improved comprehension vs control |
| [29] | Gupta M. et al. 2024 | Readability study | Yes | Simplify education material | 100 texts | ChatGPT-3.5; GPT-4 | Readability + expert review | Higher readability; accurate |
| [30] | Tepe et al. 2024 | Quantitative evaluation | Yes | Simplify + classify urgency | 30 reports | ChatGPT-4; Bard; Copilot | Readability; PEMAT; urgency accuracy | >70% understandability; variable urgency accuracy |
| [31] | Butler et al. 2024a | Retrospective evaluation | Yes | Simplify foot/ankle reports | 300 reports | LLM (prompted) | Readability; accuracy; hallucinations | Improved readability; accuracy ~4/5; 4–7% hallucinations |
| [32] | Butler et al. 2024b | Retrospective evaluation | Yes | Simplify knee reports | 300 reports | LLM (prompted) | Readability + hallucination rate | Improved FRES/FKGL; 2–5% hallucinations |
| [33] | Tang et al. 2024 | Method development | Yes | Colloquial summaries | 100 neuroradiology reports | General LLM + prompting | Radiologist accuracy + readability | Accuracy ↑20%; optimal 8th-grade level |
| [34] | Kuckelman et al. 2024 | Prospective evaluation | Yes | Summaries of MSK MRI | 60 MSK MRI reports | ChatGPT-4 | Accuracy/completeness; kappa | Mostly correct; some harmful; κ≈0.3 |
| [35] | Cesur et al. 2024 | Quantitative evaluation | Yes | Simplify Turkish MRI findings | 50 synthetic findings | GPT-4; Gemini; Claude; Perplexity | Ratings + readability | GPT-4/Gemini/Claude ≈4.8–4.9/5 |
| [36] | Gupta et al. 2025 | Prospective clinical study | Yes | Deliver simplified reports to oncology patients | Oncology cohort | General LLM | Patient comprehension + satisfaction | Improved understanding & confidence |
| [37] | Hu et al. 2025 | Cross-model evaluation | Yes | Simplify reports | Multimodality test set | Nine LLMs | Readability + quality checks | Improved readability; variable performance |
| [38] | Sunshine et al. 2025 | Quality/understandability | Yes | Simplify report summaries | Reported in paper | General LLM | Clinician rating; error review | More accessible; oversight needed |
| [39] | van Driel et al. 2025 | Prospective evaluation | Yes | Simplify Dutch reports | Varied reports | GPT-4 | Patient comprehension + satisfaction | Significant comprehension improvements |
| [40] | Stephan et al. 2025 | Prospective evaluation | Yes | Simplify AI-generated dental reports | Mixed modalities | ChatGPT | Patient understanding; readability | Improved communication & comprehension |
| [41] | Herwald et al. 2025 | System development + evaluation | Yes | Personalised explanations + Q&A | Varied reports | RadGPT | Quality & educational value scoring | High-quality explanations |
| [42] | Lee H.S. et al. 2025 | Ethics analysis | Yes | Accuracy vs simplification trade-off | Sample excerpts | GPT-4 | Semantic fidelity + readability | Accuracy–clarity tension; verification needed |
| [43] | Bozer et al. 2025 | Comparative evaluation | Yes | Patient-friendly explanations | 100 CT/MRI reports | ChatGPT; Gemini; Copilot | Readability; PEMAT; expert ratings | ChatGPT best readability |
| [44] | Butler et al. 2025 | Retrospective evaluation | Yes | Simplify hand/wrist reports | 300 reports | LLM (prompted) | Readability; accuracy; hallucinations | <8th grade; hallucinations 3–6% |
| [45] | Can et al. 2025 | Comparative model evaluation | Yes | Simplify IR reports | 109 IR reports | GPT-4; GPT-3.5; Claude-3; Gemini; Mistral | Readability + qualitative scoring | GPT-4/Claude-3 best; some harmful errors |
Prospective patient-facing studies, including Gupta et al. in oncology clinics and van Driel et al. in Dutch outpatients, reported higher comprehension and satisfaction, typically with only minor edits before release [36,39]. By contrast, the only randomised trial, by Berigan et al., found that although summarisation improved readability, substantial manual editing was required in 80% of cases to remove speculative or overly confident statements, contributing to median delays of 2.94-4.21 days before patients received the simplified report via the portal [28].
Across large observational and safety-focused evaluations, clinician-rated accuracy for LLM-generated summaries ranged from moderate to high (≈3.9-4.3/5) [44], although inter-rater agreement was limited (κ ≈ 0.3) [34], indicating that acceptability is highly context-dependent. Clinically relevant hallucinations or qualifier distortions were uncommon but persistent, occurring in approximately 2-7% of outputs [31,32], particularly in musculoskeletal reports. Sterling et al. demonstrated feasibility at scale in 1,982 summaries, with 80.6% rated “very good” by clinicians, though error profiles varied by modality and report complexity [25].
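For readers less familiar with the agreement statistic quoted above, the following is a minimal sketch of how Cohen's κ is computed for two raters. The function name `cohen_kappa` and the example ratings are hypothetical and are not taken from any included study; the cited study may have used a different (for example, weighted or multi-rater) variant.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    # Hypothetical ratings of ten summaries ("ok" = releasable, "edit" = needs revision).
    a = ["ok"] * 6 + ["edit"] * 4
    b = ["ok", "ok", "ok", "ok", "edit", "edit", "ok", "edit", "edit", "ok"]
    print(round(cohen_kappa(a, b), 3))
```

Because κ discounts agreement expected by chance, two reviewers can agree on most summaries and still obtain a low value; under the commonly used Landis and Koch bands, values of 0.21-0.40 are read as only fair agreement.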
Technical and cross-model studies by Lyu et al., Rahsepar et al., and Hu et al. showed uniformly improved readability across LLMs and modalities, with error rates of 0-5% but recurrent omission of nuance, model variability, and occasional over-simplification [10,22,37]. Across studies, readability improvements were quantitatively consistent: reading levels estimated from the Flesch Reading Ease Score (FRES) improved by 2-6 grade levels and Flesch-Kincaid Grade Level (FKGL) by 3-5 grade levels, with similar patterns in other validated indices [10,24,45]. However, Lee et al. [42] emphasise that outputs maintain clinical accuracy only at approximately an 11th-grade reading level, whereas the 7th-grade standard remains ethically preferable for informed consent and accessible communication.
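To make the readability indices concrete, the sketch below applies the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas to a piece of text. The syllable counter is a crude vowel-group heuristic introduced here for illustration only, and the function names and example sentences are hypothetical; the included studies used their own tools, so absolute values will differ.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    # Standard Flesch formulas: higher FRES is easier; FKGL approximates a US school grade.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl

if __name__ == "__main__":
    # Hypothetical report sentence and a plain-language rewrite.
    original = "There is a moderate degenerative spondylosis with multilevel foraminal stenosis."
    simplified = "There is some wear in the spine. Some nerve openings are narrow."
    for label, text in (("original", original), ("simplified", simplified)):
        fres, fkgl = readability(text)
        print(f"{label}: FRES={fres:.1f}, FKGL={fkgl:.1f}")
```

Both indices depend only on sentence length and syllable density, which is why shorter sentences and plainer vocabulary improve the scores without any guarantee that clinical meaning has been preserved.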
Prompting strategies have also been explored. Doshi et al. found that specifying a school grade level improved readability for ChatGPT-3.5 and GPT-4 but not for Bard or Bing, showing differences in responsiveness to literacy cues between models [24]. Conversely, Amin et al. demonstrated that a minimal prompt, such as “simplify this radiology report”, produced high readability and acceptable accuracy across models, showing that extensive prompt engineering may not be required for routine summarisation tasks [23].
Translation, Cultural Adaptation, and Accessibility
Six studies explored multilingual LLMs as tools to bridge language barriers in radiology by translating report content into non-English languages, with sample sizes ranging from 3 to 100 radiology reports (Table 3) [11,46-50]. Across studies, performance varied substantially between languages and between high- and low-resource linguistic settings. In the largest prospective multicentre evaluation, Terzis et al. reported median expert quality scores of 4.5/5 for English, French, and Spanish outputs, compared with 4.0/5 for Russian, despite similar processing times (9-24 seconds per report), indicating that reduced accuracy was attributable not to workflow latency but to linguistic complexity [11]. Meddeb et al. [47] similarly demonstrated high translation quality for high-resource languages (English, Italian, French, German, Chinese), but observed marked reductions in medical terminology fidelity for lower-resource languages, including Swedish, Turkish, Russian, Greek, and Thai, across 100 CT and MRI free-text reports.
Table 3. Studies evaluating LLM-based translation of radiology reports into non-English languages.
Summary of published and preprint studies (2020–2025) investigating the use of LLMs for translating radiology reports into languages other than English [11, 46-50]. Each study is summarised by design, language(s) evaluated, imaging modality or report type, LLM model(s) used, evaluation framework, and principal findings. “Translation” in this context refers to the direct or simplified cross-lingual rendering of radiology report text aimed at improving accessibility for patients or clinicians with limited English proficiency. This table provides illustrative examples identified through a non-systematic literature search and is not intended as an exhaustive list.
Abbreviations: CT: Computed tomography; MRI: Magnetic resonance imaging; GPT: Generative Pretrained Transformer; BLEU: Bilingual Evaluation Understudy; METEOR: Metric for Evaluation of Translation with Explicit Ordering; TER: Translation Error Rate; chrF: character-level F-score.
| Citation Number | Study | Study design | Languages | Modality / Text | Models | Evaluation | Key findings |
| [46] | Sarangi et al. 2023 | Simplification + translation evaluation | Hindi (lay-friendly Hindi) | Mixed radiology reports | ChatGPT (GPT-3.5 era) | Readability + clinician accuracy review | Generally comprehensible; some nuance lost; clinician oversight required |
| [47] | Meddeb et al. 2024 | Multilingual translation evaluation | High-resource: English, Italian, French, German, Chinese; Low-resource: Swedish, Turkish, Russian, Greek, Thai | CT & MRI free-text reports | Multiple LLMs | Accuracy & quality across languages | High accuracy in high-resource languages; reduced fidelity in low-resource languages |
| [48] | Gupta et al. 2024 | Comparative retrospective evaluation | Hindi (simple Hindi) | 100 CT report impressions | GPT-4o; GPT-4; Gemini; Claude Opus | BLEU; METEOR; TER; chrF + expert review | Usable Hindi translations; quality varied by model and prompt |
| [49] | Khanna et al. 2024 | Pilot bilingual physician evaluation | Vietnamese; Tagalog; Spanish; Mandarin; Arabic (+ Hindi pilot) | Selected radiology reports | ChatGPT-4 | Bilingual accuracy & clarity ratings | Mixed accuracy; idiom and nuance issues noted |
| [50] | Gulati et al. 2024 | Evaluation/commentary with empirical tests | Spanish; Arabic; Mandarin; Hindi; Vietnamese (examples shown) | Radiology report segments | ChatGPT (GPT-4 class) | Expert qualitative review | Generally accurate translations; safeguards recommended |
| [11] | Terzis et al. 2025 | Prospective multicentre evaluation | English; French; Spanish; Russian | Mixed-modality radiology reports | GPT-4o | Translation quality + processing time | Near real-time translation feasible; lower accuracy in Russian |
Targeted single-language evaluations showed comparable trends. In Hindi translations of 100 CT impression sections, Gupta et al. [48] demonstrated strong prompt dependence, with Bilingual Evaluation Understudy (BLEU) scores improving from 0.098 to 0.281 and Metric for Evaluation of Translation with Explicit Ordering (METEOR) from 0.297 to 0.547 after prompt optimisation, although clinician review still identified omission errors affecting diagnostic nuance. In contrast, Khanna et al. [49] reported mixed bilingual clinician agreement across five target languages and a Hindi pilot, with lower perceived accuracy in Arabic despite acceptable performance in Hindi, Spanish, Tagalog, Vietnamese, and Mandarin. Across these studies, translations in high-resource languages received higher expert ratings than those in lower-resource or morphologically complex languages.
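As context for the automated translation metrics reported above, the following is a minimal sketch of sentence-level BLEU on a 0-1 scale: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It assumes whitespace tokenisation, a single reference, and no smoothing, all of which are simplifications chosen here for illustration; the function names and example sentences are hypothetical and do not reproduce the evaluation pipeline of any included study.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU; returns 0.0 if any n-gram order has no match."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty discourages candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)

if __name__ == "__main__":
    reference = "no acute fracture is seen"
    print(bleu("no acute fracture is seen today", reference))
    print(bleu("the bone looks normal", reference))
```

Because the score rewards only surface n-gram overlap with a reference translation, a clinically faithful paraphrase can score poorly and a fluent translation that drops a qualifier can score well, which is consistent with the studies pairing automated metrics with bilingual clinician review.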
Patient-Education Material Generation
Eight studies evaluated LLMs as adjuncts for patient education in radiology, generating plain-language explanations, consent-style material, and responses to common imaging questions [8,51-57]. Across comparative studies, similar findings were reported: LLM outputs are generally accessible in tone and empathetic in style, yet readability frequently exceeds recommended health-literacy thresholds and content precision varies by context [51-54].
Several studies have benchmarked LLM-generated educational content against established radiology information resources. For example, McCarthy et al. compared ChatGPT-produced explanations with Society of Interventional Radiology materials and found that although the chatbot’s responses were conversational and approachable, they were more verbose, scored lower on patient-education suitability, and contained factual inaccuracies in approximately 10% of items reviewed [53]. Gordon et al. similarly demonstrated that prompting can improve accuracy and relevance, yet readability remained at a college level, well above the eighth-grade target for patient-facing information [52].
Evidence from interventional radiology (IR) patient-consent contexts showed similar results. Scheschenja et al. found that GPT-4 produced answers rated highly for accuracy and safety, with no harmful errors detected [55], while Hofmann and Vairavamurthy reported that practising IR clinicians judged GPT-4 explanations to be accurate and comprehensible but noted gaps in completeness and alignment with local procedural norms, particularly among senior reviewers [56]. Kaba et al. further demonstrated strong overall accuracy across 25 IR procedures, yet readability levels remained well above lay comprehension standards [57].
Overall, binary or quasi-binary measures showed non-trivial inaccuracy rates, ranging from ~11.5% incorrect answers (12/104) in a study benchmarked against professional-society materials to ~13-17% inaccurate responses to common imaging questions (by rater-defined accuracy rubrics) [52,53]. In IR procedure education tasks rated on Likert scales, “mostly incorrect” responses occurred in ~2.3-5.3% of outputs, while no potentially harmful responses were identified in that dataset [55].
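Because these proportions come from small samples, it can help to view them with an uncertainty interval. The sketch below converts a count such as 12/104 into a percentage and a Wilson score interval. The interval is an illustrative calculation and is not reported in the cited studies; the function name and the choice of z = 1.96 (approximately 95% coverage) are assumptions.

```python
import math


def wilson_interval(errors, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to roughly 95% coverage (an assumption made here,
    not a value taken from the reviewed studies).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= errors <= total:
        raise ValueError("errors must lie between 0 and total")
    p = errors / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, centre - half_width, centre + half_width


# Count taken from the text above: 12 incorrect answers out of 104.
p, low, high = wilson_interval(12, 104)
print(f"{p:.1%} (interval {low:.1%} to {high:.1%})")
```

An interval of this kind makes it easier to see how far error rates from differently sized studies actually overlap before they are compared across tasks or models.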
Patient Attitudes and Acceptability
Survey-based research across seven studies consistently indicates that patients are cautiously receptive to AI in radiology, particularly when it is framed as a supplement to, rather than a replacement for, clinician expertise [58-64]. Across studies, most patients continued to prefer confirmation or interpretation from a radiologist or referring clinician, even when AI-generated summaries were available [58-61].
Empirical evaluations of educational materials reinforce these themes. Currie et al. compared GPT-3.5 and GPT-4 for generating nuclear medicine patient information sheets across several procedure types, finding that GPT-4 produced outputs rated as more accurate, empathetic, and clinically appropriate, whereas GPT-3.5 content was more variable and occasionally outdated [62]. Broader qualitative analyses, such as those synthesised by Glenning et al., similarly report that patients respond positively to simplified, conversational explanations and perceive greater engagement, although a subset describe unease when the language feels excessively confident or anthropomorphic [63].
Across survey-based studies, higher trust ratings were reported among younger participants and those with greater digital literacy or familiarity with technology, while respondents across demographic groups consistently reported a preference for professional review and explicit disclosure of AI involvement [12,38,64].
Safety and Implementation
Despite encouraging early data, LLMs are not fail-safe. Their stochastic nature means that identical prompts can produce variable outputs, occasionally fabricating details or omitting key information. Robust safety processes are therefore essential. Practical implementation requirements for professionally supervised deployment are operationalised in Table 4 as a structured governance checklist.
Table 4. Safety checklist for clinician-mediated LLM use in radiology.
Safety checklist outlining core governance, oversight, and quality-assurance measures for the clinician-mediated use of large language models (LLMs) in radiology communication workflows. Source: Author synthesis based on [15,16].
Abbreviations: NHS DSPT: National Health Service (UK) Data Security and Protection Toolkit; GDPR: General Data Protection Regulation
| Domain | Checklist Item | Purpose |
| Governance | Deploy within a secure, institution-approved environment | Prevents data leakage; ensures compliance with NHS DSPT and GDPR |
| Input control | Use anonymised, structured report text only | Eliminates inadvertent disclosure of identifiers |
| Clinician oversight | Mandatory review and approval before patient release | Maintains diagnostic accountability |
| Audit trail | Version-controlled storage of prompts and outputs | Enables traceability and quality assurance |
| Education | Provide training in prompt design and AI limitations | Mitigates over-reliance and misuse |
| Monitoring | Periodic sampling for factual accuracy and tone | Supports continuous improvement and governance reporting |
| Boundaries | Prohibit autonomous patient-chat functionality | Prevents unverified clinical advice or misinterpretation |
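As one possible way to make the "Clinician oversight" and "Audit trail" items in Table 4 concrete, the sketch below shows a minimal record that stores the prompt template, model version, input, and output, and that blocks release until a clinician has approved it. It is a hypothetical illustration only: the class name, field names, and identifiers are assumptions and do not correspond to any standard or to a system described in the reviewed studies.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditRecord:
    """One hypothetical audit-trail entry for a clinician-reviewed LLM output."""
    model_id: str
    model_version: str
    prompt_template_id: str
    input_text: str   # anonymised, structured report text only
    output_text: str
    created_at: str = field(default_factory=_utc_now)
    reviewer_id: Optional[str] = None
    approved_at: Optional[str] = None

    def approve(self, reviewer_id: str) -> None:
        """Record clinician sign-off with a timestamp."""
        self.reviewer_id = reviewer_id
        self.approved_at = _utc_now()

    @property
    def releasable(self) -> bool:
        """Content may reach the patient only after clinician approval."""
        return self.reviewer_id is not None and self.approved_at is not None

    def fingerprint(self) -> str:
        """SHA-256 hash of the record's current state, for tamper-evidence."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Hypothetical usage with placeholder identifiers.
record = AuditRecord("demo-model", "v0", "summary-template-1",
                     "anonymised report text", "plain-language summary")
print(record.releasable)        # not yet approved
record.approve("radiologist-001")
print(record.releasable)        # approved and timestamped
```

Storing a hash alongside each stored version is one simple way to support the traceability goal, since any later change to the prompt, output, or approval fields produces a different fingerprint.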
Clinical Risks
General-purpose LLMs are not validated for diagnostic accuracy and may generate confident but incorrect statements [8,65-67]. Uncorrected inaccuracies could lead to inappropriate reassurance or unnecessary anxiety; for example, mishandling of qualifiers such as “cannot exclude malignancy” may invert a report’s intended message. Outputs may also diverge from clinical guidelines or fail to reflect uncertainty when summarising incidental findings [8,29,42,68].
Even factually correct statements can alter patient perception through changes in tone or certainty framing. Without professional mediation, such shifts risk breaching standards for communication quality and informed consent outlined in the UK GMC’s Good Medical Practice and regulatory expectations for Software as a Medical Device [17,69,70]. Clear disclaimers, transparent acknowledgement of model limitations, and escalation pathways for detected errors are therefore mandatory components of safe deployment.
Privacy and Security
Radiology data are inherently identifiable, particularly when linked to anatomy or demographics [71,72]. The use of LLMs introduces additional exposure points at data input and retention. Safe practice demands that identifiable content is never transmitted to consumer-facing platforms that store prompts for model training, an action incompatible with GDPR, HIPAA, and NHS Data Security Standards [73-75]. Risks are amplified in multimodal systems capable of reconstructing features from partially anonymised images [76].
Equity and Access
LLMs can enhance accessibility but may also widen disparities if ungoverned. Models trained predominantly on English-language, high-resource data perform less accurately and less empathetically for speakers of underrepresented languages or for patients with limited health literacy [11,49,77,78].
Digital inequities further influence who benefits: access to devices, reliable internet, and confidence with patient portals remain uneven [79-81]. In resource-limited contexts, dependence on free, unregulated models could amplify misinformation and erode trust [79,82]. Responsible implementation therefore requires multilingual validation, provision of analogue alternatives for digitally excluded patients, and ongoing audit of differential impact.
Governance and Regulation
Within supervised workflows, radiologists remain accountable for patient-facing explanations derived from LLMs. Liability rests with the clinician-endorsed output, not the underlying model. This distinction separates clinician-mediated use from unsupervised patient self-use, which falls outside the formal duty of care [83-85]. These two contexts differ materially in verification requirements, liability pathways, and documentation expectations, as summarised in Table 5.
Table 5. Consent, disclosure, and documentation requirements for patient-facing LLMs in radiology.
The table outlines recommended patient disclosures, consent mechanisms, clinical record documentation, responsible sign-off, and traceability controls to support safe, transparent, and accountable deployment. Abbreviations: AI = artificial intelligence; LLM = large language model; IR = interventional radiology; EHR = electronic health record; IRB = institutional review board; REC = research ethics committee.
Source: Author synthesis. Note: Disclosure standards align with GMC Good Medical Practice [88], FDA SaMD labelling guidance [17,89], and EU MDR Annex XIV clinical evaluation requirements [86,90]. Research or pilot use should follow ISO 14155 and ICH-GCP for documentation, consent, and audit trails [91,92].
| Clinical Setting / Use Case | Minimum Disclosures to Patient | Consent Mechanism | Documentation in Clinical Record | Responsible Sign-off | Traceability / Version Control |
| Patient portal displaying AI-simplified report summary | AI involvement; limitations; official report prevails; when to seek care | Inline tick-box or portal disclaimer | Copy of AI summary stored with model version/date | Reporting radiologist or governance lead | Model ID, version, and prompt template retained in audit log |
| Translated radiology report (LLM-assisted) | Disclosure of AI translation; confirmation of bilingual review; limits of automatic translation | Portal prompt before translation | Record of review and sign-off attached to report | Bilingual clinician / radiology department | Original and translated text linked by version ID |
| AI-generated patient-education or consent information (e.g., IR procedures) | Statement that content was AI-generated and clinician-verified; not a substitute for face-to-face consent | Tick-box acknowledgement of AI-assisted material | Copy archived in consent record or EHR | Interventional radiologist / consent supervisor | Version control via LLM audit trail; timestamped approval |
| Pilot or research deployment of LLM interface | Explicit research consent; description of data handling and withdrawal rights | Written informed consent under IRB/REC approval | Full log of patient interaction stored securely | Principal investigator | Traceability per ISO 14155 / GCP standards |
Current regulatory frameworks (FDA SaMD, MHRA AI programme, EU MDR/AI Act) focus on diagnostic AI and provide limited guidance for generative communication tools [77,86,87]. As a result, deployment relies on local governance rather than external certification. In practice, LLM-assisted communication should be governed similarly to PACS or EHR systems: institution-approved hosting, data protection compliance, documented verification, staff training, and audit.
For radiology departments, pragmatic governance requirements include designated clinical accountability, version control of prompts and outputs, explicit AI disclosure to patients, routine quality assurance sampling, and clear escalation pathways for detected errors or complaints. Until dedicated regulatory pathways emerge, institution-hosted, clinician-supervised deployment remains the only defensible model.
Discussion
This review synthesises current evidence on patient-facing applications of large language models (LLMs) in radiology and, to our knowledge, is the first to focus specifically on radiologist-patient communication. Across studies, the clearest conclusion is that LLMs can serve as effective communication adjuncts when embedded within clinician-mediated workflows. By contrast, direct patient self-use remains unvalidated and carries well-documented risks relating to misinformation, privacy, inequity, and lack of contextual interpretation [22,25,26,28,36,38-41,51].
A critical limitation pervades current evidence: the near-exclusive reliance on readability metrics (FRES, FKGL, Gunning Fog, SMOG, Dale-Chall, ARI) as proxies for comprehension. While 28 studies demonstrate 2-6 grade-level improvements in reading difficulty, only Berigan et al. directly tested whether patients understood content better or made different clinical decisions [28]. Readability indices measure linguistic surface structure (sentence length, syllable count), not whether patients correctly interpret conditional language ("possible", "cannot exclude"), radiological uncertainty, incidental findings, or follow-up recommendations. These nuances carry significant clinical consequences: misinterpretation of qualifiers risks inappropriate reassurance or unnecessary anxiety, yet no study has evaluated whether simplified reports reduce such misunderstandings. Until comprehension is directly tested through patient interviews, decision tasks, adherence outcomes, or anxiety measures, the clinical value of improved readability remains technically demonstrated but clinically unvalidated.
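The point that readability formulas respond only to surface counts can be illustrated with a short sketch. The code below applies the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas to two sentences. The syllable counter is a crude vowel-group heuristic and the example sentences are invented for illustration; neither is drawn from the reviewed studies, and validated tools should be used for real evaluation.

```python
import re


def count_syllables(word):
    """Crude heuristic (an assumption): count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    """Return (FRES, FKGL) computed with the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


# Invented example: the second sentence is "easier" by both indices,
# yet it drops the qualifier and reverses the clinical message.
original = "Malignancy cannot be excluded."
oversimplified = "There is no cancer."

for label, text in [("original", original), ("oversimplified", oversimplified)]:
    fres, fkgl = readability(text)
    print(f"{label}: FRES={fres:.1f}, FKGL={fkgl:.1f}")
```

Both indices reward the second sentence for shorter words, even though it tells the patient the opposite of what the report says. A formula score alone therefore cannot show that a simplified report preserved the intended meaning.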
Clinical Implications
Current evidence supports specific, supervised use cases in three domains:
1. Post-hoc report explanation via patient portals represents the most mature and immediately deployable application. When appended to formal reports with explicit AI disclosure and mandatory radiologist verification, LLM-generated summaries can improve patient understanding without altering diagnostic responsibility. Importantly, feasibility and turnaround time are highly sensitive to institutional verification thresholds and report complexity, underscoring that successful deployment depends more on workflow design than model selection.
2. Multilingual translation draft generation offers a pragmatic approach to addressing communication inequities for patients with limited English proficiency. Evidence supports reliable performance in high-resource languages, while translations in lower-resource or morphologically complex languages remain vulnerable to semantic drift. As a result, translation workflows must be explicitly framed as drafting tools, with bilingual clinician verification, version control, and clear escalation pathways embedded into routine practice.
3. Patient education and consent material supplementation, particularly in interventional radiology, can reduce repetitive explanatory workload by generating first-pass content. However, current outputs frequently exceed recommended health-literacy thresholds and exhibit variable factual accuracy, reinforcing that LLM-generated materials should supplement, not replace, clinician-led discussion and institutional consent processes.
Evidence Gaps and Research Priorities
Current evidence is dominated by proof-of-concept evaluations and small observational cohorts, with limited assessment of true patient comprehension or downstream behavioural impact. Key research priorities, therefore, include the development of radiology-specific comprehension metrics that extend beyond readability indices or translation scores such as BLEU; evaluation across diverse linguistic, cultural, and educational populations to avoid perpetuating inequities; and quantification of workflow, time-saving, and cost impacts, including effects on radiologist workload, burnout, and communication quality. Additional priorities include the creation of harmonised verification and transparency protocols, incorporating explicit guardrails for uncertainty, incidental findings, and follow-up recommendations, as well as the establishment of shared datasets comprising original reports, LLM-generated outputs, and clinician-verified summaries to support reproducibility and benchmarking. Multicentre collaborations will be essential to validate safe deployment at scale.
Limitations
This narrative review is not a systematic review and includes emerging and preprint literature reflective of the field's rapid development. Evidence regarding unsupervised patient use remains absent, restricting conclusions to supervised contexts. Additional limitations include potential publication bias favouring positive findings, geographic concentration of studies in high-income English-speaking countries, and heterogeneity of outcome measures precluding quantitative meta-analysis. The non-systematic search strategy, while comprehensive in scope, may have missed relevant studies in non-indexed sources or non-English publications. Grey literature on regulation and governance may not represent complete or current policy positions. As the evidence base grows, future systematic reviews and real-world evaluations will refine and extend the findings summarised here.
Conclusions
Large language models demonstrate clear technical capability as clinician-mediated communication adjuncts in radiology, with consistent evidence of improved linguistic accessibility, functional multilingual translation in high-resource languages, and utility in generating first-pass patient education materials. However, current evidence primarily demonstrates improvements in readability rather than comprehension, leaving the clinical impact on patient understanding, decision-making, and outcomes largely unvalidated. Across use cases, safety and feasibility are determined less by model choice than by workflow design, verification thresholds, and institutional governance. Accordingly, near-term implementation should be limited to supervised applications (patient portal summaries, translation drafts with bilingual review, and supplementary educational content) delivered within secure, institution-hosted environments with explicit clinician accountability. Advancing clinical utility will require a shift toward comprehension-centred outcomes, rigorous evaluation across diverse populations, and implementation science focused on workflow, equity, and professional impact. Until such evidence matures, LLMs should be integrated cautiously to augment, not replace, radiologist expertise in patient communication.
Appendices
Appendix 1
Glossary of Key Terms
Table 6. Glossary of Key Terms.
Glossary of key terms and abbreviations used in this review relating to large language models, clinician-mediated workflows, patient-facing radiology communication, readability and comprehension metrics, and governance concepts.
| Term | Definition |
| Large Language Model (LLM) | A generative artificial intelligence model trained on large text corpora, capable of producing natural-language outputs used here for report simplification, translation, and patient education. |
| Clinician-Mediated Use | A workflow in which all AI-generated patient-facing content is reviewed, corrected, and authorised by a clinician prior to release. |
| Direct Patient Self-Use | Patient interaction with an LLM outside formal clinical workflows, without clinician verification or institutional governance. |
| Patient-Facing Content | Radiology-related information made accessible to patients, including simplified reports, translations, and educational materials. |
| Post-hoc Report Explanation | Simplified or plain-language summaries appended to completed radiology reports, without altering the original diagnostic content. |
| Plain-Language Summary | A rewritten version of a radiology report intended to improve accessibility by reducing technical vocabulary and sentence complexity. |
| Multilingual Translation Draft Generation | Use of LLMs to produce preliminary translations of radiology reports into non-English languages, requiring bilingual clinician verification. |
| Semantic Fidelity | Preservation of the original clinical meaning when simplifying or translating radiology content. |
| Semantic Drift | Loss or distortion of diagnostic meaning, qualifiers, or conditional phrasing despite grammatically fluent output. |
| Clinically Relevant Error | An inaccuracy or omission that could plausibly affect patient understanding, anxiety, decision-making, or follow-up behaviour. |
| Readability | Linguistic ease of text estimated using formula-based indices that assess sentence length, word complexity, and syllable count. |
| Comprehension | A patient’s accurate understanding of findings, uncertainty, and clinical implications, beyond surface linguistic simplicity. |
| Readability Metrics | Quantitative indices used to estimate textual difficulty; they do not directly measure comprehension or clinical understanding. |
| Flesch Reading Ease Score (FRES) | A readability metric scoring text from 0–100, calculated from average sentence length and syllables per word; higher scores indicate easier reading. |
| Flesch–Kincaid Grade Level (FKGL) | A readability formula estimating the U.S. school grade level required to understand a text, based on sentence length and syllable count. |
| Gunning Fog Index | An index estimating years of formal education needed to understand text on first reading, weighted toward sentence length and proportion of complex words. |
| SMOG Index | The Simple Measure of Gobbledygook; estimates years of education required based on the number of polysyllabic words in a text sample. |
| Dale–Chall Readability Score | A readability metric using a list of familiar words to estimate text difficulty; higher scores indicate more complex text. |
| Automated Readability Index (ARI) | A readability formula estimating grade level using characters per word and words per sentence rather than syllable counts. |
| Health-Literacy Threshold | Recommended reading level for patient-facing health information, typically between 7th and 9th grade. |
| Prompt Engineering | Modification of LLM input instructions to influence readability, tone, or accuracy of outputs. |
| Workflow-Sensitive Risk | Variation in safety, accuracy, and feasibility driven primarily by verification structure, release thresholds, and clinical context rather than model choice. |
| Verification Threshold | The institutional standard determining whether AI-generated content is released, revised, or rejected. |
| Human-in-the-Loop | Deployment model requiring active human review and approval of each AI output before clinical use. |
| Audit Trail | A time-stamped record of AI inputs, outputs, edits, and approvals enabling accountability and retrospective review. |
| Electronic Health Record (EHR) Portal | A patient-accessible digital platform providing access to radiology reports and related clinical information. |
| Picture Archiving and Communication System (PACS) | Secure system used to store, retrieve, and manage radiological images and reports. |
| Institution-Hosted Deployment | Use of LLMs within secure, healthcare-approved IT environments rather than public consumer platforms. |
| SANRA (Scale for the Assessment of Narrative Review Articles) | A validated framework used to guide the conduct and reporting quality of this narrative review. |
Disclosures
Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following:
Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work.
Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work.
Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.
Author Contributions
Concept and design: Jatin Naidu, Hitesh Muthyala, Sonia S. Naidu, Sandeep Muralidharan, Vasanth K. Baskaradoss
Acquisition, analysis, or interpretation of data: Jatin Naidu, Sonia S. Naidu
Drafting of the manuscript: Jatin Naidu, Hitesh Muthyala, Vasanth K. Baskaradoss
Critical review of the manuscript for important intellectual content: Jatin Naidu, Hitesh Muthyala, Sonia S. Naidu, Sandeep Muralidharan
Supervision: Vasanth K. Baskaradoss
References
- 1. Readability of lumbar spine MRI reports: Will patients understand? Yi PH, Golden SK, Harringa JB, Kliewer MA. AJR Am J Roentgenol. 2019;212:602–606. doi: 10.2214/AJR.18.20197.
- 2. Readability of radiology reports: Implications for patient-centered care. Martin-Carreras T, Cook TS, Kahn CE Jr. https://pubmed.ncbi.nlm.nih.gov/30639521/ Clin Imaging. 2019;54:116–120. doi: 10.1016/j.clinimag.2018.12.006.
- 3. Journal club: Structured feedback from patients on actual radiology reports: A novel approach to improve reporting practices. Gunn AJ, Gilcrease-Garcia B, Mangano MD, Sahani DV, Boland GW, Choy G. https://www.ajronline.org/doi/pdf/10.2214/AJR.16.17584. AJR Am J Roentgenol. 2017;208:1262–1270. doi: 10.2214/AJR.16.17584.
- 4. Patient experience of imaging reports: A systematic literature review. Rogers C, Willis S, Gillard S, Chudleigh J. https://pmc.ncbi.nlm.nih.gov/articles/PMC10395377/ Ultrasound. 2023;31:164–175. doi: 10.1177/1742271X221140024.
- 5. Understanding patient experiences, opinions, and actions taken after viewing their own radiology images online: Web-based survey. Norris EC, Halaska C, Sachs PB, Lin CT, Sanfilippo K, Honce JM. https://pmc.ncbi.nlm.nih.gov/articles/PMC9086874/ JMIR Form Res. 2022;6:0. doi: 10.2196/29496.
- 6. Impact of turnaround time in radiology: The good, the bad, and the ugly. Ritchie B, Summerville L, Sheng M, Choi M, Tirumani S, Ramaiya N. https://www.sciencedirect.com/science/article/abs/pii/S0363018825000891. Curr Probl Diagn Radiol. 2025. doi: 10.1067/j.cpradiol.2025.04.018.
- 7. GPT-4 technical report. OpenAI. 2023. https://cdn.openai.com/papers/gpt-4.pdf Accessed Nov 2025.
- 8. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Keshavarz P, Bagherieh S, Nabipoorashrafi SA, et al. https://pubmed.ncbi.nlm.nih.gov/38679540/ Diagn Interv Imaging. 2024;105:251–265. doi: 10.1016/j.diii.2024.04.003.
- 9. ChatGPT makes medicine easy to swallow: An exploratory case study on simplified radiology reports. Jeblick K, Schachtner B, Dexl J, et al. https://pmc.ncbi.nlm.nih.gov/articles/PMC11126432/ Eur Radiol. 2023;34:2817–2825. doi: 10.1007/s00330-023-10213-1.
- 10. Large language models for enhancing radiology report impressions: Improve readability while decreasing burnout. Rahsepar AA. Radiology. 2024;310:0. doi: 10.1148/radiol.240498.
- 11. Evaluation of GPT-4o for multilingual translation of radiology reports across imaging modalities. Terzis R, Salam B, Nowak S, et al. https://doi.org/10.1016/j.ejrad.2025.112341. Eur J Radiol. 2025;191:112341. doi: 10.1016/j.ejrad.2025.112341.
- 12. Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting. Park J, Oh K, Han K, Lee YH. https://www.nature.com/articles/s41598-024-63824-z. Sci Reports. 2024;141:1–9. doi: 10.1038/s41598-024-63824-z.
- 13. Patient communication in radiology: Moving up the agenda. Rockall AG, Justich C, Helbich T, Vilgrain V. https://www.sciencedirect.com/science/article/pii/S0720048X2200314X. Eur J Radiol. 2022;1:110464. doi: 10.1016/j.ejrad.2022.110464.
- 14. Use of interpreters by physicians for hospitalized limited English proficient patients and its impact on patient outcomes. López L, Rodriguez F, Huerta D, Soukup J, Hicks L. https://pubmed.ncbi.nlm.nih.gov/25666220/ J Gen Intern Med. 2015;30:783–789. doi: 10.1007/s11606-015-3213-x.
- 15. Summary of the HIPAA privacy rule. 2025. https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html Accessed Nov 2025.
- 16. Data protection act 2018. 2018. https://www.legislation.gov.uk/ukpga/2018/12/contents Accessed Nov 2025.
- 17. Artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD) action plan. 2021. https://www.fda.gov/media/145022/download Accessed Nov 2025.
- 18. Software and AI as a medical device change programme roadmap. 2023. https://www.gov.uk/government/publications/software-and-ai-as-a-medical-device-change-programme/software-and-ai-as-a-medical-device-change-programme-roadmap Accessed Nov 2025.
- 19. Regulation - 2017/745 - EN - medical device regulation. 2023. https://eur-lex.europa.eu/eli/reg/2017/745/oj/eng Accessed Nov 2025.
- 20. Patient centric summarization of radiology findings using large language models [PREPRINT]. Tariq A, Fathizadeh S, Ramaswamy G, et al. https://www.medrxiv.org/content/10.1101/2024.02.01.24302145v1 medRxiv. 2024;5.
- 21. SANRA-a scale for the quality assessment of narrative review articles. Baethge C, Goldbeck-Wood S, Mertens S. Res Integr Peer Rev. 2019;4:5. doi: 10.1186/s41073-019-0064-8.
- 22. Translating radiology reports into plain language using ChatGPT and GPT-4 with prompt learning: Results, limitations, and potential. Lyu Q, Tan J, Zapadka ME, et al. https://vciba.springeropen.com/articles/10.1186/s42492-023-00136-5. Vis Comput Ind Biomed Art. 2023;6:9. doi: 10.1186/s42492-023-00136-5.
- 23. Accuracy of ChatGPT, Google Bard, and Microsoft Bing for simplifying radiology reports. Amin KS, Davis MA, Doshi R, Haims AH, Khosla P, Forman HP. Radiology. 2023;309:0. doi: 10.1148/radiol.232561.
- 24. Quantitative evaluation of large language models to streamline radiology report impressions: A multimodal retrospective analysis. Doshi R, Amin KS, Khosla P, Bajaj SS, Chheang S, Forman HP. Radiology. 2024;310:0. doi: 10.1148/radiol.231593.
- 25. Patient-readable radiology report summaries generated via large language model: Safety and quality. Sterling NW, Brann F, Frisch SO, Schrager JD. https://scholar.google.com/scholar_url J Patient Exp. 2024:2025–2029.
- 26. Probing clarity: AI-generated simplified breast imaging reports for enhanced patient comprehension powered by ChatGPT-4o. Maroncelli R, Rizzo V, Pasculli M, et al. https://eurradiolexp.springeropen.com/articles/10.1186/s41747-024-00526-1. Eur Radiol Exp. 2024;8:124. doi: 10.1186/s41747-024-00526-1.
- 27. Large language models in patient education: A scoping review of applications in medicine. Aydin S, Karabacak M, Vlachos V, Margetis K. https://pmc.ncbi.nlm.nih.gov/articles/PMC11554522/ Front Med (Lausanne). 2024;11:1477898. doi: 10.3389/fmed.2024.1477898.
- 28. The impact of large language model-generated radiology report summaries on patient comprehension: A randomized controlled trial. Berigan K, Short R, Reisman D, et al. J Am Coll Radiol. 2024;21:1898–1903. doi: 10.1016/j.jacr.2024.06.018.
- 29. Can generative AI improve the readability of patient education materials at a radiology practice? Gupta M, Gupta P, Ho C, Wood J, Guleria S, Virostko J. https://pubmed.ncbi.nlm.nih.gov/39266371/ Clin Radiol. 2024;79:0–71. doi: 10.1016/j.crad.2024.08.019.
- 30. Decoding medical jargon: The use of AI language models (ChatGPT-4, BARD, microsoft copilot) in radiology reports. Tepe M, Emekli E. https://pubmed.ncbi.nlm.nih.gov/38743965/ Patient Educ Couns. 2024;126:108307. doi: 10.1016/j.pec.2024.108307.
- 31.From jargon to clarity: Improving the readability of foot and ankle radiology reports with an artificial intelligence large language model. Butler JJ, Harrington MC, Tong Y, Rosenbaum AJ, Samsonov AP, Walls RJ, Kennedy JG. https://pubmed.ncbi.nlm.nih.gov/38336501/ Foot Ankle Surg. 2024;30:331–337. doi: 10.1016/j.fas.2024.01.008. [DOI] [PubMed] [Google Scholar]
- 32.From technical to understandable: Artificial intelligence large language models improve the readability of knee radiology reports. Butler JJ, Puleo J, Harrington MC, Dahmen J, Rosenbaum AJ, Kerkhoffs GM, Kennedy JG. https://pubmed.ncbi.nlm.nih.gov/38488217/ Knee Surg Sports Traumatol Arthrosc. 2024;32:1077–1086. doi: 10.1002/ksa.12133. [DOI] [PubMed] [Google Scholar]
- 33.Generating colloquial radiology reports with large language models. Tang CC, Nagesh S, Fussell DA, et al. https://pubmed.ncbi.nlm.nih.gov/39178375/ J Am Med Inform Assoc. 2024;31:2660–2667. doi: 10.1093/jamia/ocae223. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Translating musculoskeletal radiology reports into patient-friendly summaries using ChatGPT-4. Kuckelman IJ, Wetley K, Yi PH, Ross AB. https://pubmed.ncbi.nlm.nih.gov/38270616/ Skeletal Radiol. 2024;53:1621–1624. doi: 10.1007/s00256-024-04599-2. [DOI] [PubMed] [Google Scholar]
- 35.Use of large language models in radiological reports: A study on simplifying Turkish MRI findings. Cesur T, Çamur E, Güneş YC. Ann Clin Anal Med. 2024;15:8. [Google Scholar]
- 36.Provision of radiology reports simplified with large language models to patients with cancer: Impact on patient satisfaction. Gupta A, Singh S, Malhotra H, et al. https://pubmed.ncbi.nlm.nih.gov/39879570/ JCO Clin Cancer Inform. 2025;9:0. doi: 10.1200/CCI-24-00166. [DOI] [PubMed] [Google Scholar]
- 37.Large language models in summarizing radiology report impressions for lung cancer in Chinese: Evaluation study. Hu D, Zhang S, Liu Q, Zhu X, Liu B. https://www.jmir.org/2025/1/e65547. J Med Internet Res. 2025;3:65547. doi: 10.2196/65547. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Evaluating the quality and understandability of radiology report summaries generated by ChatGPT: Survey study. Sunshine A, Honce GH, Callen AL, et al. https://pmc.ncbi.nlm.nih.gov/articles/PMC12385610/ JMIR Form Res. 2025;9:0. doi: 10.2196/76097. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Leveraging GPT-4 enables patient comprehension of radiology reports. van Driel MH, Blok N, van den Brand JA, et al. Eur J Radiol. 2025;187:112111. doi: 10.1016/j.ejrad.2025.112111. [DOI] [PubMed] [Google Scholar]
- 40. Improving patient communication by simplifying AI-generated dental radiology reports with ChatGPT: Comparative study. Stephan D, Bertsch AS, Schumacher S, et al. https://www.jmir.org/2025/1/e73337. J Med Internet Res. 2025;9:73337. doi: 10.2196/73337.
- 41. RadGPT: A system based on a large language model that generates sets of patient-centered materials to explain radiology report information. Herwald SE, Shah P, Johnston A, Olsen C, Delbrouck JB, Langlotz CP. J Am Coll Radiol. 2025;22:1050–1059. doi: 10.1016/j.jacr.2025.06.013.
- 42. The ethics of simplification: Balancing patient autonomy, comprehension, and accuracy in AI-generated radiology reports. Lee HS, Song SH, Park C, et al. https://bmcmedethics.biomedcentral.com/articles/10.1186/s12910-025-01285-3. BMC Med Ethics. 2025;26:136. doi: 10.1186/s12910-025-01285-3.
- 43. Comparative evaluation of large language models in explaining radiology reports: Expert assessment of readability, understandability, and communication features. Bozer A, Pekçevik Y. https://insightsimaging.springeropen.com/articles/10.1186/s13244-025-02121-3. Insights Imaging. 2025;16:232. doi: 10.1186/s13244-025-02121-3.
- 44. Decoding radiology reports: Artificial intelligence-large language models can improve the readability of hand and wrist orthopedic radiology reports. Butler JJ, Acosta E, Kuna MC, Harrington MC, Rosenbaum AJ, Mulligan MT, Kennedy JG. https://pubmed.ncbi.nlm.nih.gov/39138809/ Hand (N Y). 2025;20:1144–1152. doi: 10.1177/15589447241267766.
- 45. Large language models for simplified interventional radiology reports: A comparative analysis. Can E, Uller W, Vogt K, et al. https://pubmed.ncbi.nlm.nih.gov/39353826/ Acad Radiol. 2025;32:888–898. doi: 10.1016/j.acra.2024.09.041.
- 46. Assessing ChatGPT’s proficiency in simplifying radiological reports for healthcare professionals and patients. Sarangi PK, Lumbani A, Swarup MS, et al. https://pubmed.ncbi.nlm.nih.gov/38249202/ Cureus. 2023;15:0. doi: 10.7759/cureus.50881.
- 47. Large language model ability to translate CT and MRI free-text radiology reports into multiple languages. Meddeb A, Lüken S, Busch F, et al. Radiology. 2024;313:0. doi: 10.1148/radiol.241736.
- 48. Comparative evaluation of large language models for translating radiology reports into Hindi. Gupta A, Rastogi A, Malhotra H, Rangarajan K. Indian J Radiol Imaging. 2025;35:88–96. doi: 10.1055/s-0044-1789618.
- 49. Artificial intelligence in multilingual interpretation and radiology assessment for clinical language evaluation (AI-MIRACLE). Khanna P, Dhillon G, Buddhavarapu V, Verma R, Kashyap R, Grewal H. https://pmc.ncbi.nlm.nih.gov/articles/PMC11433331/ J Pers Med. 2024;14:923. doi: 10.3390/jpm14090923.
- 50. Transcending language barriers: Can ChatGPT be the key to enhancing multilingual accessibility in health care? Gulati V, Roy SG, Moawad A, Garcia D, Babu A, Poot JD, Teytelboym OM. J Am Coll Radiol. 2024;21:1888–1895. doi: 10.1016/j.jacr.2024.05.009.
- 51. Evaluating the use of ChatGPT to accurately simplify patient-centered information about breast cancer prevention and screening. Haver HL, Gupta AK, Ambinder EB, Bahl M, Oluyemi ET, Jeudy J, Yi PH. Radiol Imaging Cancer. 2024;6:0. doi: 10.1148/rycan.230086.
- 52. Enhancing patient communication with Chat-GPT in radiology: Evaluating the efficacy and readability of answers to common imaging-related questions. Gordon EB, Towbin AJ, Wingrove P, et al. https://pubmed.ncbi.nlm.nih.gov/37863153/ J Am Coll Radiol. 2024;21:353–359. doi: 10.1016/j.jacr.2023.09.011.
- 53. Evaluation of an artificial intelligence chatbot for delivery of IR patient education material: A comparison with societal website content. McCarthy CJ, Berkowitz S, Ramalingam V, Ahmed M. https://pubmed.ncbi.nlm.nih.gov/37330210/ J Vasc Interv Radiol. 2023;34:1760–1768. doi: 10.1016/j.jvir.2023.05.037.
- 54. Using ChatGPT to improve readability of interventional radiology procedure descriptions. Zaki HA, Mai M, Abdel-Megid H, et al. https://pubmed.ncbi.nlm.nih.gov/38981939/ Cardiovasc Intervent Radiol. 2024;47:1134–1141. doi: 10.1007/s00270-024-03803-z.
- 55. Feasibility of GPT-3 and GPT-4 for in-depth patient education prior to interventional radiological procedures: A comparative analysis. Scheschenja M, Viniol S, Bastian MB, Wessendorf J, König AM, Mahnken AH. https://pubmed.ncbi.nlm.nih.gov/37872295/ Cardiovasc Intervent Radiol. 2024;47:245–250. doi: 10.1007/s00270-023-03563-2.
- 56. Large language model doctor: Assessing the ability of ChatGPT-4 to deliver interventional radiology procedural information to patients during the consent process. Hofmann HL, Vairavamurthy J. https://cvirendovasc.springeropen.com/articles/10.1186/s42155-024-00477-z. CVIR Endovasc. 2024;7:83. doi: 10.1186/s42155-024-00477-z.
- 57. Accuracy and readability of ChatGPT on potential complications of interventional radiology procedures: AI-powered patient interviewing. Kaba E, Beyazal M, Çeliker FB, Yel İ, Vogl TJ. Acad Radiol. 2025;32:1547–1553. doi: 10.1016/j.acra.2024.10.028.
- 58. Patients’ attitudes toward the use of artificial intelligence as a diagnostic tool in radiology in Saudi Arabia: Cross-sectional study. Baghdadi LR, Mobeirek AA, Alhudaithi DR, Albenmousa FA, Alhadlaq LS, Alaql MS, Alhamlan SA. https://pubmed.ncbi.nlm.nih.gov/39110973/ JMIR Hum Factors. 2024;11:0. doi: 10.2196/53108.
- 59. How do patients perceive the AI-radiologists interaction? Results of a survey on 2119 responders. Ibba S, Tancredi C, Fantesini A, et al. https://pubmed.ncbi.nlm.nih.gov/37327548/ Eur J Radiol. 2023;165:110917. doi: 10.1016/j.ejrad.2023.110917.
- 60. The implementation of artificial intelligence in radiology: A narrative review of patient perspectives. Hemphill S, Jackson K, Bradley S, Bhartia B. https://pmc.ncbi.nlm.nih.gov/articles/PMC10538685/ Future Healthc J. 2023;10:63–68. doi: 10.7861/fhj.2022-0097.
- 61. The future of AI in healthcare: Public perceptions of AI in radiology. [Nov 2025]. 2025. https://www.rcr.ac.uk/media/poelyzlz/rcr-reports-the-future-of-ai-in-healthcare-public-perceptions-of-ai-in-radiology.pdf
- 62. ChatGPT and patient information in nuclear medicine: GPT-3.5 versus GPT-4. Currie G, Robbie S, Tually P. https://pubmed.ncbi.nlm.nih.gov/37699647/ J Nucl Med Technol. 2023;51:307–313. doi: 10.2967/jnmt.123.266151.
- 63. Patient perspectives on artificial intelligence in medical imaging. Glenning J, Gualtieri L. J Particip Med. 2025;17:0. doi: 10.2196/67816.
- 64. Bystanders or stakeholders: Patient perspectives on the adoption of AI in radiology. Fanni SC, Neri E. https://link.springer.com/article/10.1007/s00330-024-11135-2. Eur Radiol. 2025;35:767–768. doi: 10.1007/s00330-024-11135-2.
- 65. Evaluating GPT-V4 (GPT-4 with vision) on detection of radiologic findings on chest radiographs. Zhou Y, Ong H, Kennedy P, et al. https://pubmed.ncbi.nlm.nih.gov/38713028/ Radiology. 2024;311:0. doi: 10.1148/radiol.233270.
- 66. Assessing GPT-4 multimodal performance in radiological image analysis. Brin D, Sorin V, Barash Y, Konen E, Glicksberg BS, Nadkarni GN, Klang E. https://pubmed.ncbi.nlm.nih.gov/39214893/ Eur Radiol. 2025;35:1959–1965. doi: 10.1007/s00330-024-11035-5.
- 67. Evaluating ChatGPT-4V in chest CT diagnostics: A critical image interpretation assessment. Dehdab R, Brendlin A, Werner S, et al. https://link.springer.com/article/10.1007/s11604-024-01606-3. Jpn J Radiol. 2024;42:1168–1177. doi: 10.1007/s11604-024-01606-3.
- 68. Leveraging GPT-4 for post hoc transformation of free-text radiology reports into structured reporting: A multilingual feasibility study. Adams LC, Truhn D, Busch F, Kader A, Niehues SM, Makowski MR, Bressem KK. Radiology. 2023;307:0. doi: 10.1148/radiol.230725.
- 69. Confidentiality: good practice in handling patient information. [Nov 2025]. 2025. https://www.gmc-uk.org/professional-standards/the-professional-standards/confidentiality
- 70. Interplay between the Medical Devices Regulation (MDR) & in vitro diagnostic medical devices Regulation (IVDR) and the Artificial Intelligence Act (AIA). [Nov 2025]. 2025. https://health.ec.europa.eu/document/download/b78a17d7-e3cd-4943-851d-e02a2f22bbb4_en?filename=mdcg_2025-6_en.pdf
- 71. Re-identification of anonymised MRI head images with publicly available software: Investigation of the current risk to patient privacy. Steeg K, Bohrer E, Schäfer SB, Vu VD, Scherberich J, Windfelder AG, Krombach GA. EClinicalMedicine. 2024;78:102930. doi: 10.1016/j.eclinm.2024.102930.
- 72. Face recognition from research brain PET: An unexpected PET problem. Schwarz CG, Kremers WK, Lowe VJ, et al. https://www.sciencedirect.com/science/article/pii/S1053811922004761. Neuroimage. 2022. doi: 10.1016/j.neuroimage.2022.119357.
- 73. Guidance on the use of AI-enabled ambient scribing products in health and care settings. [Nov 2025]. 2025. https://www.england.nhs.uk/long-read/guidance-on-the-use-of-ai-enabled-ambient-scribing-products-in-health-and-care-settings/
- 74. Artificial Intelligence. [Nov 2025]. 2025. https://transform.england.nhs.uk/information-governance/guidance/artificial-intelligence/
- 75. ICO consultation series on generative AI and data protection. [Nov 2025]. 2025. https://ico.org.uk/about-the-ico/ico-and-stakeholder-consultations/2024/09/ico-consultation-series-on-generative-ai-and-data-protection/
- 76. Identification of anonymous MRI research participants with face-recognition software. Schwarz CG, Kremers WK, Therneau TM, et al. https://pubmed.ncbi.nlm.nih.gov/31644852/ N Engl J Med. 2019;381:1684–1686. doi: 10.1056/NEJMc1908881.
- 77. World Health Organization. Ethics and governance of artificial intelligence for health: WHO guidance. [Nov 2025]. 2021. https://www.who.int/publications/i/item/9789240029200
- 78. WHO calls for safe and ethical AI for health. [Nov 2025]. 2025. https://www.who.int/news/item/16-05-2023-who-calls-for-safe-and-ethical-ai-for-health
- 79. Disparities in patient portal access and the role of providers in encouraging access and use. Richwine C, Johnson C, Patel V. https://pubmed.ncbi.nlm.nih.gov/36451262/ J Am Med Inform Assoc. 2023;30:308–317. doi: 10.1093/jamia/ocac227.
- 80. Use of electronic health record patient portal accounts among patients with smartphone-only internet access. Turner K, Nguyen O, Hong YR, Tabriz AA, Patel K, Jim HS. https://pmc.ncbi.nlm.nih.gov/articles/PMC8314137/ JAMA Netw Open. 2021;4:0. doi: 10.1001/jamanetworkopen.2021.18229.
- 81. The digital divide in radiology: Computer use for health care-related tasks and breast cancer screening. Abraham P, Balthazar P, Reid NJ, Flores EJ, Narayan AK. https://pubmed.ncbi.nlm.nih.gov/36040332/ Radiology. 2023;306:218–219. doi: 10.1148/radiol.220796.
- 82. Government response to the report of the equity in medical devices: independent review. [Nov 2025]. 2025. https://www.gov.uk/government/publications/government-response-to-the-report-of-the-equity-in-medical-devices-independent-review/government-response-to-the-report-of-the-equity-in-medical-devices-independent-review
- 83. Doctors warned by MDU about AI complaint responses. [Nov 2025]. 2025. https://www.themdu.com/press-centre/press-releases/doctors-warned-by-mdu-about-ai-complaint-responses
- 84. Adopting AI in healthcare. [Nov 2025]. 2025. https://mdujournal.themdu.com/issue-archive/spring-2024/adopting-ai-in-healthcare
- 85. AI safer in practice. [Nov 2025]. 2025. https://www.medicalprotection.org/ai-framework
- 86. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC. [Nov 2025]. 2017. https://eur-lex.europa.eu/eli/reg/2017/745/oj/eng
- 87. Recommendation of the Council on Artificial Intelligence. OECD. [Nov 2025]. 2022. https://www.google.com/search
- 88. Good medical practice - professional standards. [Nov 2025]. 2025. https://www.gmc-uk.org/professional-standards/the-professional-standards/good-medical-practice
- 89. Guidance on medical device patient labeling. [Nov 2025]. 2025. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-medical-device-patient-labeling
- 90. Guidance on clinical evaluation (MDR) / performance evaluation (IVDR) of medical device software. [Nov 2025]. 2020. https://health.ec.europa.eu/system/files/2020-09/md_mdcg_2020_1_guidance_clinic_eva_md_software_en_0.pdf
- 91. ISO. Clinical investigation of medical devices for human subjects - Good clinical practice. [Nov 2025]. 2020. https://www.iso.org/standard/71690.html
- 92. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use. [Nov 2025]. 2016. https://database.ich.org/sites/default/files/E6_R2_Addendum.pdf

