Abstract
Artificial intelligence (AI) systems are now prevalent in our daily lives and hold promise for transforming high-stakes fields such as healthcare. Medical AI systems are showing significant potential to support diagnostics and treatment recommendations. As these systems play an increasingly significant role in clinical decision-making, ensuring transparency in their design, operation, and outcomes is essential for building trust among key stakeholders, including patients, providers, developers, and regulators. However, many systems still function as “black boxes,” making it challenging for users, such as clinicians, patients, and other stakeholders, to interpret and verify their inner workings. Here, we examine the current state of transparency in medical AIs, identifying key challenges and risks these opaque systems pose. After motivating the need for transparency in all aspects of the machine learning pipeline, from training data to model development to model deployment, we explore a range of techniques that promote explainability throughout the pipeline while highlighting the importance of continual monitoring and system updates to ensure that AI systems remain reliable over time. Finally, we address the need to overcome barriers that inhibit the integration of transparency tools into clinical settings and review regulatory frameworks that prioritize transparency in emerging AI systems. Through this survey, we aim to increase awareness of current challenges and offer actionable insights for stakeholders, such as researchers, clinicians, and regulators, on how to build trustworthy and ethically responsible AI healthcare solutions.
Introduction
Recent rapid advances in medical AI, accelerated by breakthroughs in deep learning techniques, are transforming healthcare. Medical AI models are trained on clinical data to learn underlying patterns; once trained, they can analyze new clinical data to support diagnostic decision-making and treatment planning, helping to identify diseases earlier, personalize treatments, and improve patient outcomes (Box 1). Applications of these technologies span varied domains, including radiology for image interpretation1, pathology for cancer detection2, dermatology for lesion classification3,4, and cardiology for cardiovascular disease diagnosis5,6.
Box 1 ∣. Artificial intelligence basics.
General Artificial Intelligence Concepts
Deep learning.
A subfield of machine learning that uses artificial neural networks, consisting of many hidden layers containing nonlinear activation functions, to model complex patterns in data. Compared to traditional machine learning approaches, it automatically learns features from raw input (e.g., image or text) in an end-to-end manner, without relying on manually crafted features.
Supervised learning.
A machine learning paradigm where a model is trained using labeled datasets, with each input paired with a corresponding output (e.g., skin lesion images labeled with their diagnoses).
Self-supervised learning.
A machine learning paradigm where a model is trained on unlabeled datasets via generating supervisory signals from the data itself (i.e., leveraging the data’s structure to create pseudo-labels). For example, we can hide part of a sentence or image and train a model to predict the missing part. This approach enables training on large-scale unlabeled datasets.
Contrastive learning.
A training strategy where a model learns representations by contrasting similar and dissimilar data pairs.
Foundation models.
Large-scale neural networks that are pre-trained on vast amounts of diverse data, often using self-supervised learning. Once trained, these models can be adapted for a wide variety of downstream tasks without needing to be retrained from scratch. For example, ChatGPT, which can perform various tasks, such as text summarization, sentiment analysis, and question answering, is a foundation model.
Generative models.
Models designed to generate new data resembling the training data, such as synthetic images (e.g., GANs or diffusion models) or natural language (e.g., ChatGPT).
Fine-tuning.
A process of taking a pre-trained model and further training it on a new (typically smaller) dataset by updating model weights to adapt to the specific characteristics of the new task. For example, a general language model can be fine-tuned on a medical dataset to equip it with clinical knowledge and enable it to perform tasks such as clinical text classification or diagnosis prediction.
Model Reliability and Generalization Concepts
Fairness.
Ensuring that medical AI provides equitable performance across diverse population groups. It is quantified through criteria like demographic parity, predictive parity, and equalized odds, which assess performance differences across protected subgroups.
Robustness.
The ability of models to maintain their performance under varying conditions, such as noise, outliers, or adversarial attacks.
Distribution shifts.
Changes to the distribution of data between the training set and the test set or the deployment environment, which can degrade model performance. For example, a classification model trained primarily on images of light skin tone may show reduced accuracy when applied to images of darker skin tones due to a shift in data distribution.
Spurious correlations.
These occur when variables are associated but lack a causal relationship. In medical images, artifacts such as ruler marks or stray hairs may be correlated with the label but hold no clinical relevance to the task at hand. AI models can inadvertently exploit these associations, leading to deceptively high performance on training data without capturing meaningful clinical relationships.
Confounders.
Variables that are associated with both the input features (e.g., patient characteristics, imaging data) and the target (e.g., diagnosis). Confounders can result in the development of spurious correlations.
Overfitting.
A phenomenon that occurs when a model learns patterns specific to the training data, including noise and irrelevant details, rather than generalizable features. Overfitted models perform very well on the training dataset but poorly on new, unseen datasets.
Beyond diagnostic and treatment support, medical AI models are increasingly being applied to drug discovery and patient monitoring, offering new pathways to improve patient outcomes7,8. This growing promise is spurring active research and development, with more than 1,000 such models receiving regulatory approval to date from the U.S. Food and Drug Administration (FDA) as medical devices9-12. Notably, some models are now directly accessible to patients through smartphone applications for self-assessment13,14.
Despite these advances, integrating medical AI systems into clinical practice poses significant challenges. First, these systems, deployed in high-stakes clinical settings, remain subject to failure; although they can perform well on curated datasets in controlled environments, they often struggle to generalize to real-world scenarios due to distribution shifts, such as variations in imaging techniques, patient demographics, or rare diseases15-17. Second, they are susceptible to learning shortcuts, i.e., spurious associations between features and target labels in training data, that limit their generalizability18-21. For example, although AI models developed to detect COVID-19 from chest radiographs initially showed impressive performance in research studies, subsequent evaluations revealed that they relied heavily on spurious associations between labels and dataset-specific artifacts, such as text markers and differences in radiographic projections, rather than genuine markers of medical pathology22,23. Such failure modes lead to delays in diagnoses and treatments and undermine the vital element of trust in medical AI systems.
Further compounding these issues is the “black-box” nature of many medical AI models24-26. Frequently developed in siloed environments using private data, these systems lack the transparency required for necessary external scrutiny and validation. Furthermore, they are inherently complex, often involving neural networks with millions to billions of parameters, which complicates users’ efforts to comprehend and interpret their mechanisms27-30. Without clear insights into how these systems operate, when and why they may fail, or even why they succeed, integrating them into clinical workflows remains problematic and potentially dangerous.
Transparency has therefore emerged as a critical pillar for developing and deploying safe, reliable medical AIs31,32. By ‘transparency,’ we mean the extent to which essential information about AI systems, such as characteristics of the data used to train and test the model, their intended use, development process, underlying logic, performance, and known limitations, is clearly communicated to stakeholders. Insight into how models function improves model interpretability, helping developers debug and refine AI systems to ensure they function as intended; it thereby fosters trust among clinicians and patients33. Even when an AI system proves effective in randomized controlled trials34,35, transparency into its inner workings remains critical. Patients may be skeptical of AI-assisted diagnoses without understandable explanations36-38. Likewise, clinicians, who are accountable for medical decisions, must be able to explain them to patients and respond to their concerns. Furthermore, transparency helps meet evolving regulatory standards that increasingly prioritize explainability and accountability31.
Achieving transparency requires a holistic approach that addresses every stage of medical AI development: data collection, model development, and clinical deployment. In this review, we analyze transparency at each stage (Fig. 1). We scope this review to focus on complex deep learning models that are widely applied to different clinical data domains, such as imaging, electronic health records, and waveforms. First, we examine challenges to achieving data transparency, such as limited access to datasets used to train and test AI models, and explore potential solutions. Next, we review methods and trends in explainable AI (XAI), a subfield of AI that aims to explain the inner workings of AI models. We then explore strategies to maintain transparency and safety during clinical deployment. Finally, we identify emerging trends in medical AI systems and propose opportunities to improve the utility of transparency tools. We note that many of the principles and strategies we discuss are applicable to other biomedical AI domains, such as drug discovery and genomics, but those are beyond the scope of this review.
Fig. 1 ∣.

a, The development pipeline for medical AI systems, consisting of three key stages: data collection, model training, and deployment. For each stage, we review key issues of transparency and techniques to address them. The feedback loops highlight the iterative process, where issues identified during deployment monitoring can inform adjustments in model training and data collection procedures. b–d, Examples illustrating transparency challenges at each stage. b, In the data collection stage, identifying and documenting the properties of the training data is essential for assessing biases and generalizability. For example, training datasets for melanoma detection AI often overrepresent Fitzpatrick skin types I–II (light skin tones) and underrepresent Fitzpatrick skin types V–VI (dark skin tones). A model trained on such disproportionate data can show biased performance across populations. c, During model training, it is important to go beyond evaluating accuracy and interrogate how models make predictions. For instance, COVID-19 classifiers trained on chest X-rays that initially reported high performance were later found to rely on spurious cues such as text markers rather than medically meaningful signals. d, Transparency during deployment includes continuous monitoring of model performance. For example, an AI system triaging diabetic retinopathy via fundus photography can show a performance drop after a change in image acquisition hardware, underscoring the need for continuous monitoring and adaptive model updates. The chest X-ray illustration was adapted from Wikimedia Commons by Jmarchn under CC BY-SA 3.0 license.
Data transparency
As the lifeblood of medical AI models, data fuels their development and evaluation. Therefore, data transparency is vital to ensuring the safety and reliability of these systems (Fig. 2).
Fig. 2 ∣.

Key components of data transparency in medical AI systems. a, Data documentation involves recording key properties of the data (e.g., patient demographics, labeling procedures, collection methods) to ensure transparency and track biases. Several initiatives aim to standardize these practices, including the MI-CLAIM Checklist59, the MINIMAR Reporting Standards60, the STANDING Together initiative61, and the TRIPOD+AI statement62. b, The scarcity of diverse, publicly available datasets hinders both the development of medical AI models that are generalizable and their rigorous external validation. The Medical AI Data for All (MAIDA) initiative seeks to address this challenge by establishing a global framework for medical imaging data sharing41. MAIDA collaborates with partner institutions worldwide, providing standardized templates for institutional review boards (IRBs) and data transfer and use agreements (DTUAs). It also supports partners with the data standardization and de-identification processes by providing custom DICOM de-identification tools and data-recording worksheets. c, Privacy-enhancing technologies are also being developed to facilitate data sharing. For example, federated learning allows institutions to collaboratively train AI models without transferring raw patient data: each institution shares only its local weight updates with a central server. Data sharing facilitates collaboration across institutions to build medical AIs that generalize across varied clinical settings. d, Synthetic data generation uses generative models to create datasets that replicate the statistical properties of real data. Users can control the generation process through queries to produce images with desired properties (e.g., rare conditions or artifacts). The chest X-ray image was adapted from Wikimedia Commons, originally by Hellerhoff, under CC BY-SA 3.0 license.
Data documentation
Transparency in medical AI models starts with identifying and documenting the properties of the data used to train them. Clear insights into the characteristics of training datasets are essential to identify whether biases exist and assess if AI models are generalizable, i.e., they perform equally well across diverse clinical settings and patient populations. However, transparency into data provenance can be limited by a lack of detailed information on data acquisition and pre-processing39,40. Specifically, key properties of data used to train and test medical AIs include patient demographics, data labeling procedures, and data collection methods, which are often incomplete or entirely unknown39,41. This deficiency is particularly problematic in the medical domain, where most medical datasets are siloed due to their sensitive nature and access to raw data is not possible39,42,43.
Patient demographic information, such as patient age, sex, and race/ethnicity, is crucial for evaluating whether a model can generalize across different groups. If training data is biased towards certain demographics, models will often exhibit poor performance on underrepresented populations, perpetuating healthcare inequities16,40,44,45. Nonetheless, many medical AI models, whether developed in academia or industry, fail to reveal such demographic details39,46,47. While documenting demographic information is essential, this need for transparency must also be balanced with the recognition that such data is inherently sensitive and must be handled carefully. Demographic attributes may increase re-identification risks, especially when combined with other quasi-identifiers48,49. As such, disclosure risk must be rigorously assessed before releasing individual-level demographic data. Moreover, how demographic variables are defined and derived can influence which populations are made visible in analyses50-54. Therefore, documenting how they were recorded and derived is as important as reporting the data itself.
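As a rough illustration of why subgroup reporting matters, the sketch below computes two of the fairness criteria named in Box 1 (demographic parity and equalized odds) as between-group gaps. The function names, the choice of the largest pairwise gap as a summary, and the toy labels are assumptions made for this example and are not drawn from any cited study.

```python
import numpy as np

def subgroup_rates(y_true, y_pred, group):
    """Return {group: (positive_rate, tpr, fpr)} for binary labels and predictions."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        pos_rate = yp.mean()  # P(pred = 1 | group)
        tpr = yp[yt == 1].mean() if (yt == 1).any() else float("nan")
        fpr = yp[yt == 0].mean() if (yt == 0).any() else float("nan")
        rates[str(g)] = (float(pos_rate), float(tpr), float(fpr))
    return rates

def fairness_gaps(rates):
    """Largest between-group differences (hypothetical summary statistic)."""
    pos, tpr, fpr = (np.array(v) for v in zip(*rates.values()))
    return {
        # Demographic parity compares the rate of positive predictions.
        "demographic_parity_gap": float(pos.max() - pos.min()),
        # Equalized odds compares both true- and false-positive rates.
        "equalized_odds_gap": float(max(tpr.max() - tpr.min(),
                                        fpr.max() - fpr.min())),
    }

# Toy data, fabricated purely for illustration.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = subgroup_rates(y_true, y_pred, group)
gaps = fairness_gaps(rates)
print(rates)
print(gaps)
```

In this toy case both groups receive positive predictions at the same rate, so the demographic parity gap is zero, yet the error rates differ between groups. Such a disparity is invisible unless group membership is documented alongside the labels.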
Equally important is a thorough description of data labeling procedures, including (1) who performed the labeling (e.g., level of expertise and number), (2) how the labels were verified (e.g., biopsy-confirmed diagnoses for skin lesions), and (3) what diagnostic criteria were used. In most cases, this information is not available, or labeling is performed inconsistently, giving rise to issues of label noise and lack of trustworthiness39,44. Furthermore, diagnostic criteria often evolve over time or vary by region, leading to labeling inconsistencies that can affect model predictions when deployed across different healthcare systems17. For example, Behcet’s disease, with 17 different diagnostic criteria proposed globally, illustrates how temporal and regional variations in labeling criteria can undermine model consistency17,55,56. Models trained on older classifications may perform poorly or inconsistently under updated standards.
Finally, descriptions of data collection methods, such as (1) sources of data (e.g., hospitals or online databases), (2) the equipment used for image acquisition, and (3) any pre-processing steps applied (e.g., sample filtering or image normalization), are vital for understanding the context of the data and how it might influence the AI model. For example, differences in imaging devices across hospitals or variations in lighting conditions in dermatology images can introduce biases that affect model predictions. Additionally, (4) artifacts in datasets should be documented. For example, in dermatology images, common artifacts, such as marker ink, gel bubbles, color charts, ruler marks, and skin hair, can correlate with target labels, causing a model to exploit these spurious correlations instead of clinically meaningful signals57,58.
To address these issues, several initiatives have aimed to standardize reporting guidelines for documenting medical AI datasets. These include the MI-CLAIM Checklist59, the MINIMAR Reporting Standards60, the STANDING Together initiative61, and the TRIPOD+AI statement62, paralleling initiatives in general AI domains63-65 that require labels essential to understanding potential biases during model training. Having detailed documentation lets developers mitigate potential biases during model training using computational techniques even if the biases are discovered after data collection17,66,67.
Data sharing
Access to large-scale datasets collected from diverse sources is essential for training robust models capable of generalizing across varied clinical settings. Further, open access to data enables broader scrutiny and validation by the scientific community, fostering transparency and promoting collaborative research. However, data sharing in medicine is limited due to patient privacy concerns42,68 as well as legal and institutional barriers41,69-71. The sharing of medical data containing sensitive patient information increases the risk of data breaches, which can result in severe legal liabilities, harm to patients, and a loss of patient trust. Also, institutions, including hospitals and private companies, are often reluctant to share data, and data sharing often gives rise to lengthy data use agreement negotiations. As a result, public datasets are currently scarce and lack diversity due to limited data sources16, posing a significant obstacle to training models that generalize well and to replicating medical AI models and fairly evaluating their performance40,58,72-74.
To address this challenge, efforts are being made to facilitate responsible data sharing while maintaining patient privacy. Institutional frameworks, such as open-source data initiatives, aim to promote transparency, interoperability, and responsible data use by developing standardized procedures for data collection, de-identification, and sharing42,75-81. For example, the MAIDA initiative has established a global framework for sharing medical imaging data that involves collaborating with diverse institutions, standardizing data collection and de-identification protocols, and addressing local challenges with tailored solutions41,82. Specifically, for de-identification, MAIDA provided detailed data-recording worksheets that were designed to omit variables classified as protected health information (PHI). In addition, partners were instructed to carefully review and modify any potentially identifying data before sharing. Such initiatives aim to improve the accessibility and diversity of medical data while complying with ethical and legal requirements.
On the technical front, privacy-enhancing technologies are being developed to facilitate medical data sharing. De-identification tools systematically remove or obfuscate patient identifiers in datasets, facilitating broader data sharing while maintaining privacy standards41. Further, federated learning enables the collaborative training of medical AI models across multiple institutions without the need to centralize sensitive patient data17,78,83-88; this approach lets multiple institutions contribute to the development of a shared model without directly exposing their patient data17,71.
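The federated setup described above can be sketched with a simple weighted-averaging scheme in the style of federated averaging. This is a minimal sketch under stated assumptions: the least-squares model, the learning rate, the three simulated "hospitals", and the function names `local_update` and `federated_round` are all hypothetical choices for illustration, and real deployments add secure aggregation, communication handling, and privacy safeguards that are omitted here.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few gradient steps of least-squares regression on one site's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(w_global, sites):
    """Each site trains locally; the server averages weights by sample count.

    Only model weights cross institutional boundaries; X and y stay on site.
    """
    local_weights = [local_update(w_global, X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    return np.average(local_weights, axis=0, weights=sizes)

# Simulated data for three hypothetical hospitals of different sizes.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
sites = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    sites.append((X, y))

w = np.zeros(2)
for _ in range(50):
    w = federated_round(w, sites)
print(w)  # should approach w_true
```

Weighting by sample count lets larger sites contribute proportionally more to the shared model, which is one common design choice rather than the only option.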
Synthetic data generation
Synthetic data generation is emerging as a promising solution to address the challenge of limited access to large-scale and diverse medical datasets43,89. Synthetic data refers to artificially created data that replicates the statistical properties and patterns of real data without revealing actual patient information90. Generative models, such as generative adversarial networks (GANs) and diffusion models, are trained on real datasets to learn their underlying data distribution and generate new, artificial data points that statistically resemble the original data43,91-93.
One significant advantage of synthetic data is its ability to facilitate data sharing. Since synthetic datasets contain no identifiable patient information, they can be safely shared for research and training purposes. Additionally, synthetic data can augment existing medical training datasets90,94,95. This is particularly useful for addressing bias and fairness concerns in imbalanced training data96,97 since generative models can produce data that reflect underrepresented populations or rare conditions. By doing so, synthetic data can contribute to more representative and equitable models.
However, synthetic data generation also has limitations. First, synthetic data is not completely free of privacy issues: if the generative model overfits and memorizes training samples, it could generate synthetic data that reveals sensitive personal information, posing the risk of patient re-identification43,89. Second, evaluating the quality of synthetic data remains challenging89,90 since it must consider multiple dimensions: fidelity (realism of generated samples)22,98-100, diversity (capturing of real data variability)99, privacy or identifiability (distinguishability of synthetic from real samples)101, and utility (effectiveness of data for downstream tasks)102. The lack of a standardized framework for selecting metrics along these dimensions further complicates evaluation. Moreover, there are inherent trade-offs between these dimensions. For example, stricter privacy constraints to limit information leakage could reduce synthetic data’s utility by making it less representative of real data and omitting outliers89. Optimizing multiple quality metrics simultaneously involves prioritizing different dimensions depending on specific synthetic data use cases. Lastly, the generative models themselves pose challenges to transparency and explainability. They are complex deep-learning models, such as GANs and diffusion models, making it difficult to understand how they generate specific outputs. This is particularly concerning given that synthetic data could be used for training downstream medical AI systems. Recent studies have begun to address important explainability questions, such as (1) what concepts are encoded in the internal representations of generative models103,104, (2) which training samples significantly impact the quality of the generated data105,106, and (3) how biases in generative models can be detected and mitigated107.
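One simple way to probe the memorization risk noted above is a nearest-neighbor check: synthetic samples that lie much closer to a training record than held-out real records typically do may be near-copies. The sketch below is one heuristic under stated assumptions, not the identifiability metric used in the cited work; the function names, the 1% quantile threshold, and the simulated tabular data are all hypothetical.

```python
import numpy as np

def nearest_distance(queries, reference):
    """Euclidean distance from each query row to its nearest row in reference."""
    diffs = queries[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

def flag_possible_copies(synthetic, train, holdout, quantile=0.01):
    """Flag synthetic rows closer to the training set than almost all held-out real rows."""
    # Held-out real data calibrates what a "normal" nearest-neighbor distance looks like.
    threshold = np.quantile(nearest_distance(holdout, train), quantile)
    return nearest_distance(synthetic, train) < threshold

# Simulated data: 95 fresh samples plus 5 near-duplicates of training rows.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 5))
holdout = rng.normal(size=(200, 5))
fresh = rng.normal(size=(95, 5))
copied = train[:5] + 1e-6 * rng.normal(size=(5, 5))
synthetic = np.vstack([fresh, copied])

flags = flag_possible_copies(synthetic, train, holdout)
print(int(flags.sum()), "of", len(flags), "synthetic rows flagged")
```

A check like this addresses only exact or near-exact copies. It says nothing about subtler leakage, and for images the distance would normally be computed in a learned feature space rather than on raw pixels.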
Model transparency
We next examine model transparency in the development pipeline by describing how AI models trained on these datasets operate. Explainable AI (XAI), a subfield of AI, provides tools to investigate and improve AI model interpretability by explaining how models (1) process data, (2) make predictions, and (3) generate outcomes in clinical settings (Fig. 3a). These insights are crucial for identifying potential biases, errors, or unintended behaviors, enabling safer, more reliable integration of AI into medical workflows as well as more trustworthy models (Table 1; Box 2). Here, we provide an overview of XAI methods, grouped into key categories, and highlight representative examples of their applications in clinical contexts. This structure is intended to help readers identify techniques that are suitable for their particular needs. We also acknowledge that some methods may not be exclusive to a single category, and in such cases, we classify them into a category for which they were originally developed.
Fig. 3 ∣.

Key components of different model transparency frameworks. a, AI models are increasingly applied to diverse types of medical data for tasks such as classification, risk prediction, and segmentation. However, these models are often opaque, limiting our understanding of their underlying mechanisms. Explainable AI techniques aim to elucidate how these models work and improve their transparency. b, Feature attribution methods assign importance scores directly to the input features. c, Concept-based explanation methods aim to explain model behavior in terms of high-level, human-understandable concepts. d, Counterfactual models use generative models to produce slightly modified examples that flip the model’s prediction. By examining what changed, users can identify which concepts are important to the model’s decision. e, Inherently interpretable models are designed with interpretability built into their architecture or training process. f, Large complex models like LLMs pose new transparency challenges. One promising approach to improving their transparency is retrieval-augmented generation (RAG), which grounds model outputs in external sources. b-e, We highlight the output of each transparency framework in red. The chest X-ray image and retinal image were adapted from Wikimedia Commons, originally by Spideog and Tmhlee, respectively, under CC BY-SA 3.0 license.
Table 1 ∣.
Overview of model transparency methods. For each method, the proposed approach, applications, advantages, limitations, and a set of key references are provided.
| Method | Approach | Medical applications | Advantages | Limitations | Refs |
|---|---|---|---|---|---|
| Feature attribution methods | | | | | |
| GradCAM | Uses the spatial gradients of CNN layers | | | | [109, 236] |
| Integrated Gradients | Integrates gradients from a baseline input to the actual input and computes the cumulative contribution | | | | [111, 237] |
| LIME | Explains individual predictions of a black-box model by fitting an interpretable surrogate model (e.g., linear model or decision tree) to locally perturbed samples around the input instance | | | | [115] |
| SHAP | Averages the marginal contribution of a feature to the model prediction across all possible feature subsets | | | | [118, 129-132] |
| Concept-based explanations | | | | | |
| TCAV | Learns concept vectors by training linear classifiers to separate positive and negative samples and then computes the directional derivative of classifier predictions along that vector | | | | [139] |
| Automatic concept explanation | Utilizes large-scale pretrained models (e.g., CLIP or image-captioning models) to obtain the concept annotations automatically from images and then uses them for model explanation | | | | [143, 149] |
| Counterfactual explanations | | | | | |
| StylEx | Uses the latent space of a StyleGAN to manipulate concept-specific latent attributes and generates the corresponding counterfactuals | | | | [152] |
| EBPE | Uses a GAN-based generative model to progressively update a reference image which gradually changes the posterior probability to the negation class | | | | [153] |
| Inherently interpretable models | | | | | |
| ProtoPNet | Learns a set of prototypical image patches and compares input regions to these prototypes for decision making | | | | [238] |
| Concept bottleneck model (CBM) | First predicts the concepts from the input using a deep learning model and then uses them to predict the target using a linear model | | | | [164, 171] |
| Large language model transparency | | | | | |
| Self explanation | Uses the inherent ability of LLMs to generate explanations for their outputs | | | | [177, 178] |
| Mechanistic interpretability | Aims to reverse-engineer LLMs to understand how they operate internally, such as what features they learned and what model components, such as neurons and attention heads, are important | | | | [184-186, 241] |
| Retrieval augmented generation (RAG) | Augments LLMs with specific external information (like web documents) during inference to reduce hallucinations and improve explanations | | | | [197, 198] |
Box 2 ∣. Explainability and interpretability concepts.
Explainable AI concepts
Post hoc explanations.
Explanation methods applied to already-trained models to make their complex decision-making processes interpretable and understandable.
Feature attribution.
Methods that quantify the contribution of input features (e.g., pixels in an image or words in text) to a model’s prediction (e.g., disease probability).
Attribution map.
Visual representation of feature attributions shown as heatmaps. These maps use colors to highlight the features or regions of the input that strongly influence a model’s prediction, helping users easily localize important features.
Concept-based explanations.
Methods that explain the AI models’ behavior using human-understandable semantic concepts.
Counterfactual explanations.
Methods that explain the AI models’ behavior by showing how a model’s inputs would need to change to achieve a desired outcome, providing insights into decision boundaries.
Mechanistic interpretability.
A reverse-engineering approach to understanding how neural networks operate internally, focusing on learned features (e.g., patterns or properties recognized by the model) and circuits (functional units performing computations). This is analogous to how brain functions are studied in neuroscience.
Inherently interpretable models.
Models designed to provide explanations alongside their predictions without requiring post hoc methods.
Chain-of-thought prompting.
A prompting method that guides AI models to solve complex tasks by breaking them into intermediate reasoning steps, mimicking human problem-solving processes. This can be achieved by providing the model with several examples of step-by-step reasoning during usage or simply instructing the model with prompts like, “Let’s think step by step.”
Retrieval-augmented generation (RAG).
A technique that enables language models to access external knowledge during text generation. Given a user query, the technique retrieves relevant documents from external sources and appends them to the query before passing it to the language models. This allows the model to leverage external information beyond what is encoded in its parameters, leading to more accurate responses. Also, it enhances the explainability of LLM-generated text by providing users with references used during the generation process.
Black-box model.
A machine learning system whose internal workings are not interpretable or accessible to the user.
Model agnostic.
Methods that can be applied to any machine learning model, regardless of its architecture or training regime.
Zero-shot annotation.
Process of labeling or classifying data without seeing any examples for that specific task.
Hallucinations.
Instances where the machine learning model (typically LLMs) generates incorrect or fabricated outputs that are not grounded in factual knowledge.
Feature attribution methods
Feature attribution methods address a fundamental question in model transparency: which input features contribute to a model’s prediction? These methods quantify the importance of input features, offering insights into the model’s decision-making process (Fig. 3b). Examples of such features include specific regions for medical images like X-rays and MRI scans, segments of waveform data like ECG, features from operating room signals, or words in electronic health record (EHR) text data.
One approach for quantifying the importance of input features is gradient-based saliency maps, i.e., a visual representation that highlights influential features by leveraging the gradients (partial derivatives) of the model’s output with respect to the input features. The magnitude of these gradients indicates how sensitive the model’s prediction is to changes in each input feature. Common methods for creating these maps include (1) taking the partial derivatives of the output with respect to the input and multiplying them by the input itself108, (2) generating attribution maps based on the spatial feature gradients in convolutional layers associated with a particular class109,110, and (3) computing the cumulative contribution of input features by integrating gradients along a path from a baseline input to the actual input111.
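As a minimal sketch of approaches (1) and (3) above, the following NumPy code computes gradient-times-input and integrated-gradients attributions for a toy logistic model. The model, its weights, the all-zeros baseline, and the number of integration steps are illustrative assumptions rather than settings taken from the cited methods; a real medical model would obtain gradients by automatic differentiation instead of the analytic formula used here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy differentiable model (assumption): logistic regression output."""
    return sigmoid(np.dot(x, w) + b)

def grad_model(x, w, b):
    """Analytic gradient of the model output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def gradient_times_input(x, w, b):
    """Approach (1): partial derivatives multiplied by the input itself."""
    return grad_model(x, w, b) * x

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approach (3): midpoint Riemann-sum approximation of the gradients
    integrated along the straight path from the baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.zeros_like(x, dtype=float)
    for a in alphas:
        avg_grad += grad_model(baseline + a * (x - baseline), w, b)
    avg_grad /= steps
    return (x - baseline) * avg_grad

# Hypothetical weights and input, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])
b = -0.25
x = np.array([1.0, 0.5, -2.0])
baseline = np.zeros(3)

gxi = gradient_times_input(x, w, b)
ig = integrated_gradients(x, baseline, w, b)
```

In this sketch, the integrated-gradients attributions sum (up to numerical error) to the difference between the model output at the input and at the baseline, which is one reason the choice of baseline matters in practice.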
Another approach, removal-based attribution, assesses feature importance by explicitly withholding one or more input features from the model and observing the impact on model predictions112. For images, removing inputs corresponds to masking pixels or patches in the image by setting them to zero113 or applying blurring to them114. One such method that requires only black-box access to a classifier is Local Interpretable Model-agnostic Explanations (LIME)115, which explains the individual predictions of a black-box model by approximating it with a simple, interpretable model, such as a sparse linear model, that is locally faithful to the original model’s behavior. It does this by generating perturbed samples around an input instance, observing the model outputs, and using them to fit a surrogate model that estimates the importance of each feature. LIME is designed as a general framework that allows users to flexibly select the surrogate model class, similarity kernel, and fidelity loss function. It is model-agnostic, meaning it can be used with any classifier, and provides per-instance explanations rather than global interpretability. Another widely used removal-based method, SHapley Additive exPlanations (SHAP), uses concepts from cooperative game theory to fairly assign credit among input features116. Each feature is treated as a player in a cooperative game, and its contribution is calculated as a weighted average of its marginal contribution to the model prediction across all possible subsets of the other features. SHAP is known for its strong theoretical foundation, broad applicability, and well-documented implementations, making it a popular choice in healthcare117. Its theoretical grounding ensures useful properties like efficiency (attributions sum to the difference between the prediction and the baseline expectation) and symmetry (features that contribute identically receive equal attributions). SHAP can be seen as a special case of LIME.
If LIME uses the Shapley kernel as its kernel function and fits a linear model, it produces SHAP values112,118. SHAP retains LIME’s model-agnostic local approximation strengths while also offering theoretical guarantees from game theory.
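The Shapley computation described above can be sketched by brute-force enumeration. The toy model, the input, and the choice to "remove" a feature by replacing it with a zero baseline are assumptions made for illustration; this is not the SHAP library's implementation, which relies on approximations for realistic feature counts.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(value_fn, n_features):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over every subset of the remaining features."""
    phi = np.zeros(n_features)
    players = list(range(n_features))
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical model with one interaction term between features 0 and 1.
w = np.array([1.0, 2.0, -3.0])

def model(x):
    return float(np.dot(w, x) + 0.5 * x[0] * x[1])

x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)

def value_fn(subset):
    """Model output when only the features in `subset` keep their true value;
    withheld features are set to the baseline (an assumed removal strategy)."""
    masked = baseline.copy()
    for j in subset:
        masked[j] = x[j]
    return model(masked)

phi = shapley_values(value_fn, n_features=3)
```

Here the interaction term is split equally between the two features involved, and the attributions sum to the difference between the model output at the input and at the baseline. The nested loops visit every subset, which illustrates why exact computation scales exponentially with the number of features.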
These approaches have all been applied to explain medical AI models. One study identified that deep learning systems to detect COVID-19 from chest radiographs rely on confounding factors rather than clinical signals21. Another study visualized the specific ECG leads and portions of ECG waves that were most influential for detecting myocardial infarction119. LIME has been used for revealing key features which are important for chronic diseases in older adults like heart disease and diabetes120, for Alzheimer’s disease prediction121, and for retinoblastoma diagnosis122. Removal-based methods have been used to audit medical AI models trained for a variety of tasks, like distinguishing between mediastinal cysts and tumors123, preventing hypoxemia during surgery124, joint modeling of multiple neuropathological measures of Alzheimer’s disease125, and estimating biological age from tabular hospital features126.
However, feature attribution methods have inherent limitations. Gradient-based methods require access to the model weights, making them inapplicable to black-box models, which is the case for most deployed AI models. Gradients are also susceptible to adversarial manipulation127, and the magnitude of the gradient can be unintuitive, since it reflects model sensitivity rather than feature importance. Additionally, some gradient-based methods have been shown to fail simple sanity checks and produce outputs similar to those of edge detectors128. LIME explanations assume that the model behaves linearly in a small neighborhood around the instance, which might not hold for complex models. Exact SHAP computation requires enumerating all possible feature subsets, whose number grows exponentially with the number of features and quickly becomes prohibitively expensive for high-dimensional inputs. Mitigating this cost has prompted the development of practical approximations129,130 and extensions to specific models like decision trees131 and vision transformers132.
Beyond these technical challenges, a key conceptual limitation of feature attribution methods is their inability to explain why a feature is influential. Although these methods can identify important features, the responsibility of interpreting the highlighted regions in an input image, e.g., whether it is important due to size, color, or another characteristic, ultimately falls on the users. Since users such as clinicians must independently derive meaningful insights from the highlighted features, these methods increase subjectivity and often place a cognitive burden on users133,134. Furthermore, for some medical modalities, such as dermoscopic images of skin lesions, saliency maps often degrade to highlight the whole lesion, offering scant insight into model behavior.
Concept-based explanations
In medical practice, clinicians typically rely on high-level, semantically meaningful concepts to describe images and make predictions. For example, dermatologists describe lesions using concepts such as erythema, scale, pigmentation, and dome-shaped features135,136. Explaining AI models directly in terms of these concepts aligns more closely with the clinical reasoning process than feature attribution methods.
To bridge the gap between feature importance and domain-specific knowledge that clinicians use for decision making, a different paradigm, i.e., concept-based explanations, has gained popularity. By attributing model predictions to high-level, human-accessible concepts, these methods enable more interpretable AI systems that are easier to integrate into clinical workflows (Fig. 3c). Several techniques facilitate concept-based explanations137-139. One well-known approach is called Testing with Concept Activation Vectors (TCAV)139; this approach first obtains concept activation vectors (CAVs), which are vector representations of a high-level concept in a neural network’s intermediate layers. To obtain a CAV of a specific concept, a linear classifier is trained to distinguish the activation of an intermediate layer for positive examples (e.g., 100 images containing the concept) and negative examples (e.g., 100 images without the concept). The weights of the trained classifier represent the CAV. Next, the model’s sensitivity to that concept is quantified by computing the directional derivative of its predictions along the CAV. TCAV has been previously applied to medical AI models. For example, the original study identified concepts like microaneurysms and pan-retinal laser scars as being important predictors of diabetic retinopathy levels139. Subsequent studies used TCAV to interpret an AI model for segmenting cardiac structures140 and time-series models for EHR data141.
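The two TCAV steps described above (fitting a CAV, then measuring sensitivity along it) can be sketched with synthetic data. Everything here is an assumption made for illustration: the activations are random vectors, the linear classifier is a least-squares fit rather than whatever classifier a given TCAV implementation uses, the downstream "head" is a toy function, and the directional derivative is approximated by finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical ground-truth concept direction, used only to synthesize data.
true_direction = np.zeros(dim)
true_direction[0] = 1.0

# Synthetic intermediate-layer activations for 100 positive and 100 negative examples.
pos_acts = rng.normal(size=(100, dim)) + 3.0 * true_direction
neg_acts = rng.normal(size=(100, dim))

def fit_cav(pos, neg):
    """Fit a linear classifier (least squares on +/-1 labels) separating
    concept from non-concept activations; its unit-norm weights are the CAV."""
    X = np.vstack([pos, neg])
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = coef[:-1]
    return w / np.linalg.norm(w)

# Toy stand-in for the layers that map activations to a class score.
head_weights = np.array([1.0, 0.5, 0.0, 0.0, -0.5, 0.0, 0.0, 0.0])

def head(acts):
    return np.tanh(acts @ head_weights)

def tcav_score(head_fn, acts, cav, eps=1e-3):
    """Fraction of inputs whose class score increases when the activation
    is nudged along the CAV (finite-difference directional derivative)."""
    derivs = (head_fn(acts + eps * cav) - head_fn(acts)) / eps
    return float(np.mean(derivs > 0))

cav = fit_cav(pos_acts, neg_acts)
class_acts = rng.normal(size=(50, dim))  # activations of inputs from the class of interest
score = tcav_score(head, class_acts, cav)
```

A score near 1 would suggest that the class score tends to increase as activations move toward the concept, while a score near 0.5 would suggest no consistent relationship.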
One key limitation of TCAV is its reliance on manually selecting and annotating concepts, a time-consuming process that introduces subjectivity. To address this challenge, automatic annotation procedures have been developed. One method segments images at multiple resolutions and applies a clustering algorithm to intermediate representations of a classifier, grouping images with similar activations into concepts142; however, this approach depends on segmentation accuracy and is limited to localizable concepts. Other techniques leverage vision-language foundation models, such as Contrastive Language-Image Pretraining (CLIP) models143. These models are trained on a large set of image-caption pairs using a contrastive learning objective to learn a joint representation space; during training, the paired image and text are forced to be close in the joint representation space, whereas those from different pairs are forced to be far apart. The models have zero-shot capability, where concepts in a given image can be identified in terms of natural language without requiring additional training.
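The contrastive objective and zero-shot scoring described above can be sketched as follows. The embeddings are random stand-ins for the outputs of an image encoder and a text encoder, the concept names are reused from the dermatology example earlier in this section, and the temperature is an illustrative value; none of this reflects a specific pretrained model.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def log_softmax(logits, axis):
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: row i of image_emb is paired with row i of
    text_emb; paired embeddings are pulled together, unpaired pushed apart."""
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    idx = np.arange(len(logits))
    loss_image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_image_to_text + loss_text_to_image)

def zero_shot_scores(image_emb, concept_text_emb, temperature=0.07):
    """Softmax over cosine similarities between one image embedding and the
    text embeddings of candidate concepts."""
    sims = l2_normalize(image_emb) @ l2_normalize(concept_text_emb).T / temperature
    return np.exp(log_softmax(sims, axis=-1))

rng = np.random.default_rng(0)
concepts = ["erythema", "scale", "pigmentation", "dome-shaped"]
text_emb = rng.normal(size=(4, 16))                      # stand-in text embeddings
image_emb = text_emb + 0.1 * rng.normal(size=(4, 16))    # stand-in image embeddings

aligned = contrastive_loss(image_emb, text_emb)
shuffled = contrastive_loss(image_emb, np.roll(text_emb, 1, axis=0))
scores = zero_shot_scores(image_emb[2], text_emb)
predicted_concept = concepts[int(np.argmax(scores))]
```

In this toy setup the loss is lower when images are matched with their own captions than when the pairing is shuffled, and the highest zero-shot score falls on the concept whose text embedding lies closest to the image embedding.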
Extensions of CLIP methods to medical AI include the Medical Concept Retriever (MONET), a framework for leveraging medical CLIP models to enable large-scale concept annotation and explanations across the medical AI development pipeline144. Medical CLIP models have been trained on PubMed articles145,146, social media posts147, and YouTube videos148. These models can then be used to identify clinically relevant and spurious concepts of significance to disease classifiers; for example, MONET was used to audit a medical AI model trained to predict malignancy from dermoscopic images, finding that concepts related to lesion ‘redness’ led to the highest error rate, where malignant images were predominantly misclassified as benign. More recently, VisDiff149 was proposed to identify concept differences between image sets without requiring predefined concepts, using captioning models150 and vision LLMs like GPT-4V. While originally developed in general domains, the method’s potential for medical application will grow as vision-language models improve their medical capabilities.
Counterfactual explanations
Since concept-based explanation methods rely on a pre-defined set of concepts, they might prove less robust in clinical settings where medical concepts are difficult to curate and require domain expertise. An alternative is counterfactual explanations151, which answer the question, “What changes to the input features would result in a different output of the medical AI model?” This question is answered by generating and investigating counterfactual images, i.e., synthetic images that slightly alter attributes of a reference image in order to cross the model’s decision boundary and flip the model’s prediction (Fig. 3d). By examining the difference between the reference image and the counterfactual image, we can infer that the classifier uses the differing visual signals as part of its reasoning process. Counterfactual explanations provide specific and actionable insights into the causal interpretation of important concepts: we learn which concepts must be altered in the images to elicit a different prediction from the classifier.
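To make the counterfactual question concrete, the sketch below searches for a small input change that flips the prediction of a toy logistic classifier. This is a deliberate simplification and not the generative, image-based approach used in the studies discussed in this section: the classifier, its weights, the input, and the step size are all hypothetical, and the search simply moves the input along the weight vector until the decision boundary is crossed.

```python
import numpy as np

def predict_proba(x, w, b):
    """Toy classifier (assumption): logistic regression over tabular features."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def find_counterfactual(x, w, b, step=0.05, max_iter=1000):
    """Move x in small steps toward the decision boundary until the predicted
    label flips. For a linear model, the direction of the weight vector is
    the shortest route to the boundary."""
    original_label = predict_proba(x, w, b) >= 0.5
    direction = w / np.linalg.norm(w)
    if original_label:
        direction = -direction
    x_cf = x.astype(float).copy()
    for _ in range(max_iter):
        if (predict_proba(x_cf, w, b) >= 0.5) != original_label:
            break
        x_cf = x_cf + step * direction
    return x_cf

# Hypothetical weights and reference input.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([1.0, -0.5, 0.3])

x_cf = find_counterfactual(x, w, b)
delta = x_cf - x  # which features changed, and by how much
```

Inspecting `delta` plays the role of comparing the reference and counterfactual images: the features with the largest changes are the ones this toy classifier depends on most for this input.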
One approach to generating counterfactuals, called StylEx, identifies classifier-specific attributes in the latent space of a StyleGAN and manipulates them to analyze how the generated image affects the classifier’s output probability152. Another approach uses a GAN-based generative model to produce a progressive set of plausible variations of a reference image, each of which gradually changes the posterior probability from its original class to its negation153. Extensions of these approaches have been used to identify important visual concepts in medical AI classifiers. One study used counterfactual analysis along with insights from board-certified dermatologists to rigorously audit five melanoma classifiers. The framework revealed that the classifiers rely both on features used by clinicians, such as lesional pigmentation patterns, and on undesirable features, such as background skin texture and color balance154. Using a similar framework, a different study identified signals that allowed classifiers to predict protected attributes like sex with unexpectedly high performance, which is surprising since the task is difficult even for trained clinicians155. Another study demonstrated the broad applicability of counterfactual visualizations on eight different prediction tasks across three medical imaging modalities: retinal fundus images, eye photographs, and chest X-rays. The study revealed clinically known features (e.g., types of cataract and enlarged heart) as being significant while also identifying confounders, such as the correlation between chest X-ray underexposure and abnormality prediction and the correlation between eye makeup and the prediction of low hemoglobin levels156.
One limitation of counterfactual explanations is the potential for the Rashomon effect, i.e., given the same input image, different training runs of the generative model can produce differing counterfactuals that can be valid in terms of crossing the decision boundary to get a different model prediction151,157. This can cause different concepts to be altered across different runs since multiple concepts can lead to similar changes in model output, introducing ambiguity about which concepts are truly important for classifier predictions.
Inherently interpretable models
The explanation methods discussed so far operate in a post hoc manner, attempting to interpret black-box models after training. However, these approaches might be challenging to use effectively without deep technical expertise. Clinicians, in particular, would find it easier and more practically useful to work with models that are inherently interpretable since they are easier to analyze and understand. A classic example is the linear classifier, which makes predictions as a weighted sum of input features. The learned weights indicate whether each feature contributes positively or negatively to the model output, making the model’s internal workings transparent. Other simple models with similar benefits include decision trees158,159 and generalized additive models160,161. When such simple models achieve performance comparable to that of complex models162, they are generally preferred due to their interpretability. However, deep learning models capable of learning complex patterns from data and delivering high performance are becoming more widely available in clinical settings. To bridge the gap between performance and interpretability, recent research has explored deep learning architectures that are inherently interpretable.
One such example is the Prototypical Part Network (ProtoPNet), which is designed for image classification163. Instead of relying on dense feature representations, ProtoPNet learns a set of visual prototypes, which are small image patches that represent key parts of each class. During inference, the model compares parts of the input image to these learned prototypes and makes a prediction based on weighted similarity scores with the prototypes. This allows the predictions to be visually and semantically explainable, as the model shows the prototype that influenced its decision. However, a limitation of this approach is that the prototypes might not align with human-defined concepts, making them difficult to interpret.
Concept bottleneck models (CBMs) are another popular approach to develop inherently interpretable models164. Unlike complex black-box models, CBMs make predictions in two stages: (1) they predict concepts from input using models like Convolutional Neural Networks (CNNs) and then (2) use these predicted concepts to predict the target output via a linear model (Fig. 3e). Since each node in the bottleneck layer corresponds to a human-defined concept, the linear model’s associated weights show how each concept affects the model output. CBMs have been applied in varied domains, ranging from general vision and natural language processing tasks165-167 to medical applications135,168,169. An extension of this technique was used in a study to predict malignancy from mass lesions in mammograms and identified clinically relevant features leveraged by the model, e.g., mass margin characteristics170.
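The two-stage structure described above can be sketched as a forward pass. The concept names, weights, and input are hypothetical, and a single linear layer stands in for the CNN concept predictor purely to keep the example self-contained; no training is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical concept names for the bottleneck layer.
concept_names = ["concept_a", "concept_b", "concept_c"]

# Stage 1 parameters: map 4 input features to 3 concept scores.
# (A linear layer stands in for a CNN here; this is an assumption.)
W_c = np.array([[ 1.0, -0.5,  0.0],
                [ 0.0,  1.0,  0.5],
                [-1.0,  0.0,  1.0],
                [ 0.5,  0.5, -0.5]])
b_c = np.zeros(3)

# Stage 2 parameters: linear model from concept scores to the target logit.
w_y = np.array([2.0, -1.0, 0.5])
b_y = -0.3

def predict_concepts(x):
    """Stage 1: predict concept scores from the input."""
    return sigmoid(x @ W_c + b_c)

def predict_label(concepts):
    """Stage 2: predict the target from the concept scores alone."""
    return sigmoid(concepts @ w_y + b_y)

def concept_contributions(concepts):
    """Per-concept contribution to the target logit (weight times score)."""
    return concepts * w_y

x = np.array([0.8, -0.2, 0.1, 1.0])
concepts = predict_concepts(x)
probability = predict_label(concepts)
contributions = dict(zip(concept_names, concept_contributions(concepts)))
```

Because the final stage is linear, the target logit equals the sum of the per-concept contributions plus the bias, so each entry in `contributions` can be read directly as that concept's effect on the prediction.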
However, CBMs are limited in that they require a well-curated concept list and dense concept annotations in the training data to learn the bottleneck layer. These limitations have been addressed by prior work that automates concept annotation using CLIP-based models144,171. MONET144, for example, trained a CBM for melanoma prediction using concept scores that were automatically calculated; it found that CBM performance matches that of black-box models and that concept weights align well with prior medical knowledge in the dermatology domain (e.g., asymmetry, erosion, multiple colors).
Large language model transparency
The AI field is seeing rapid advances in the development of large language models (LLMs), like ChatGPT, Claude, Gemini, and LLaMA, that can perform a wide range of natural language tasks. Medical LLMs are developed by fine-tuning these base general-domain LLMs on medical datasets, such as MedQA (USMLE-style questions)172 and PubMed articles27,173. Such LLMs can support various stages of clinical workflows, including intake tasks (e.g., chatbots that help patients prepare for medical visits), core clinical tasks (e.g., decision-support systems or talk therapy), and discharge tasks (e.g., clinical report generation or insurance claim letter drafting)174,175.
Though the unprecedented scale and capabilities of LLMs open exciting new opportunities for medical AI, they also introduce new challenges in achieving transparency. Since they are highly complex transformer-based models with billions of parameters, applying XAI methods is inherently computationally burdensome. In addition, LLMs are generative models that produce text stochastically, meaning their outputs can vary across runs even when given the same input, which further complicates the application of traditional XAI methods. Finally, LLMs demonstrate reasoning abilities beyond the simple recognition of patterns, requiring explanation techniques that offer deeper insights into their behavior176.
Self-explanation.
A widely discussed approach to enhancing transparency in LLMs is self-explanation. This technique leverages the inherent ability of LLMs to generate explanations for their outputs by eliciting intermediate reasoning steps177. Techniques like chain-of-thought prompting can guide LLMs to break their decision-making process into discrete steps178. For example, users can simply ask the model “Why do you think so?” to obtain a self-generated explanation. However, though these explanations may seem plausible and user friendly, they are not always faithful to the model’s actual internal workings: LLMs are often trained on human-generated text and optimized to align with human preferences, which means their explanations are designed to seem reasonable rather than accurately reflect the models’ underlying mechanism179-181.
Future directions could explore incorporating faithfulness to the model’s internal reasoning, in addition to human preference, into the model training process. Doing so would help explanations align more closely with the model’s true reasoning182,183.
Mechanistic interpretability.
Another line of research focuses on mechanistic interpretability, which aims to reverse-engineer LLMs to understand how specific components (e.g., neurons, layers, attention heads) affect model behavior184-187. While some progress has been made in applying mechanistic interpretability to biomedical models188,189, current efforts focus more on advancing scientific understanding than on providing actionable explanations for real-world use, and their applicability to clinical settings has been less explored. Challenges include the computational burden imposed by the sheer number of components in large LLMs and the need for explanations to account for myriad individual components. These factors currently limit the practicality of this approach for clinical use, where clear, concise, and actionable explanations are essential, but future work may help bridge this gap by making the techniques more scalable and user-facing.
Retrieval-augmented generation.
Another promising direction for increasing LLM transparency involves grounding LLM outputs in specified external evidence, although this method does not focus on explaining the model itself190-196. Though LLMs demonstrate deeper knowledge as their size and training data increase, they continue to suffer from hallucinations, a phenomenon in which the model generates plausible-sounding but factually incorrect or fabricated information, and they lack fine-grained, domain-specific knowledge. A well-known technique to address these limitations is retrieval-augmented generation (RAG)197,198: instead of relying solely on the LLM’s internal knowledge, RAG augments LLMs with specific external information during inference (Fig. 3f). When a user task is received, the AI system first looks up relevant documents and sends the retrieved ones to the LLM, which can then generate responses that include references to a specified set of documents. In medicine, this approach is particularly valuable since it grounds model outputs in clinical guidelines and facilitates fact-checking by helping users verify the referenced material. By anchoring responses in trusted external sources, RAG enhances medical AI transparency and reliability.
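The retrieve-then-prompt flow described above can be sketched without any external services. The document store, its placeholder contents, the bag-of-words similarity, and the prompt template are all assumptions made for illustration; production systems typically use learned embeddings for retrieval, and the final call to an LLM is intentionally omitted.

```python
import math
import re
from collections import Counter

# Hypothetical document store with placeholder text (not real guidelines).
documents = {
    "doc-1": "placeholder guideline excerpt on blood pressure targets in hypertension",
    "doc-2": "placeholder guideline excerpt on insulin dosing in diabetes",
    "doc-3": "placeholder note on the follow-up schedule for asthma",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query as (id, text) pairs."""
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in docs.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, docs[doc_id]) for _, doc_id in scored[:k]]

def build_prompt(query, retrieved):
    """Prepend the retrieved sources, tagged by id, so the answer can cite them."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieved)
    return ("Answer using only the sources below and cite them by id.\n"
            f"Sources:\n{context}\n"
            f"Question: {query}")

query = "What is the blood pressure target for hypertension?"
retrieved = retrieve(query, documents)
prompt = build_prompt(query, retrieved)  # this string would be sent to the LLM
```

Tagging each source with an identifier in the prompt is what lets a generated answer point back to specific documents that a user can then check.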
Evaluation of explanation methods
Despite the growing number of explanation methods, systematically evaluating and comparing them remains a key challenge. Evaluation is essential both for researchers developing new techniques and for practitioners selecting appropriate methods for their specific applications. In this section, we outline the commonly used evaluation criteria—faithfulness and plausibility—along with representative metrics used for each.
The first evaluation criterion is faithfulness, which refers to how accurately an explanation reflects the decision-making process of models. In other words, a faithful explanation should highlight features that the model genuinely relies on for prediction. Several quantitative metrics have been developed to evaluate faithfulness. The insertion/deletion metric113 measures how the model’s output changes as important features are added or removed from the input. The sensitivity-n metric199 measures how well the sum of attribution aligns with the variation in the model output when subsets of features are withheld. Another approach is ROAR200, which retrains models on data where features identified as important are removed. A significant drop in performance after retraining suggests that the removed features were indeed relevant.
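As a rough illustration of the deletion variant of the insertion/deletion metric, the sketch below removes features in order of decreasing attribution and tracks the model output. The toy model, the zero "removed" value, and the two attribution vectors being compared are assumptions chosen for illustration.

```python
import numpy as np

def deletion_curve(model_fn, x, attributions, removed_value=0.0):
    """Model output as features are removed one at a time, most important
    (highest attribution) first."""
    order = np.argsort(-attributions)
    x_cur = x.astype(float).copy()
    outputs = [model_fn(x_cur)]
    for idx in order:
        x_cur[idx] = removed_value
        outputs.append(model_fn(x_cur))
    return np.array(outputs)

def area_under_curve(curve):
    """Trapezoidal area with the x-axis normalized to [0, 1]."""
    return float(np.sum((curve[:-1] + curve[1:]) / 2.0) / (len(curve) - 1))

# Hypothetical logistic model used only for this example.
w = np.array([3.0, 2.0, 1.0, 0.5, -1.0])
b = -2.0

def model_fn(x):
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))

x = np.ones(5)
faithful_attr = w * x        # ranks features by their true effect on the logit
reversed_attr = -faithful_attr  # deliberately wrong ranking for comparison

auc_faithful = area_under_curve(deletion_curve(model_fn, x, faithful_attr))
auc_reversed = area_under_curve(deletion_curve(model_fn, x, reversed_attr))
```

Under the deletion criterion, a lower area indicates a more faithful explanation, because removing the features it ranks highest makes the model output drop sooner.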
The second evaluation criterion is plausibility, which assesses how well the explanation aligns with human intuition or domain expertise. One evaluation technique is the “pointing game”201, in which the explanation is compared to ground-truth annotations, such as bounding boxes in images. Alternatively, plausibility can be evaluated qualitatively through user studies, such as surveys or interviews, where domain experts assess whether the explanation looks reasonable109. These evaluation methods, however, rely on a key assumption that the features a model uses are similar to those used by humans. This assumption may not always hold, which means that low plausibility scores do not necessarily indicate that an explanation is incorrect—only that it may be different from human expectations.
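A minimal sketch of the pointing game follows: an explanation scores a hit when its highest-attribution location falls inside the annotated region. The bounding-box convention and the toy attribution maps are assumptions; published variants may allow some tolerance around the maximum, which this sketch omits.

```python
import numpy as np

def pointing_game_hit(attribution_map, box):
    """True if the maximum-attribution pixel lies inside the box.
    box = (row_min, col_min, row_max, col_max), with max bounds exclusive."""
    row, col = np.unravel_index(np.argmax(attribution_map), attribution_map.shape)
    row_min, col_min, row_max, col_max = box
    return bool(row_min <= row < row_max and col_min <= col < col_max)

def pointing_game_accuracy(attribution_maps, boxes):
    """Fraction of examples whose explanation points inside the annotation."""
    hits = [pointing_game_hit(m, b) for m, b in zip(attribution_maps, boxes)]
    return sum(hits) / len(hits)

# Two toy 6x6 attribution maps with a single peak each.
map_hit = np.zeros((6, 6))
map_hit[2, 3] = 1.0    # peak inside the annotated box
map_miss = np.zeros((6, 6))
map_miss[5, 0] = 1.0   # peak outside the annotated box

annotated_box = (1, 2, 4, 5)  # hypothetical ground-truth annotation
accuracy = pointing_game_accuracy([map_hit, map_miss],
                                  [annotated_box, annotated_box])
```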
In summary, a variety of criteria and metrics are available for evaluating explanation methods. The relative importance of factors such as faithfulness, plausibility, and computational cost may depend on the specific use case, for example, model debugging versus building end-user trust. As a result, no single explanation method is universally optimal; the choice of evaluation metrics should be guided by the goals and constraints of the particular application context.
Deployment transparency
Beyond the challenges of transparency during AI development, ensuring transparency in deployment is equally critical for medical AI. Transparency in deployment extends beyond understanding how AI models operate—it requires clear and accessible reporting on their real-world performance, safety, and clinical impact. Even models that perform well in controlled environments can face unforeseen challenges once integrated into complex healthcare systems, where multiple stakeholders interact with AI in dynamic and unpredictable ways. One major challenge arises from the interaction between users and AI systems. Clinicians may under-rely on AI, dismissing valuable insights, or over-rely on its predictions without sufficient oversight. Furthermore, real-world clinical environments are not static. Shifts in patient demographics, disease prevalence, and data acquisition methods can introduce biases, degrade performance, and create new safety risks. Without ongoing transparency and monitoring, such issues can go undetected, leading to unintended clinical consequences. This section explores three critical aspects of transparent medical AI deployment. We first examine the need to rigorously evaluate AI models in real-world settings to ensure they provide meaningful improvements in clinical outcomes. Next, we discuss the importance of continuous monitoring and model updates to detect failures and maintain system reliability. Finally, we review regulatory frameworks designed to ensure transparency in deployed medical AI systems by enforcing safety and accountability standards.
Evaluating the impact of medical AI in clinical practice
Though usually developed in siloed and controlled settings, medical AI models are deployed as part of multi-faceted systems with varied stakeholders in uncontrolled environments202. Many studies show that problems with medical AI models are more often related to how the device is used or to issues with the data than to flaws in the algorithms203-205. For example, in diagnostic imaging, an AI model may flag potential abnormalities, but the final interpretation and decision-making are the responsibility of a radiologist to minimize errors such as false positives or negatives. Even with a “perfect” model, clinicians unfamiliar with its use can misdiagnose patients or cause other serious consequences. These findings highlight the need to assess how effectively the AI interacts with clinicians and integrates into clinical workflows206,207. Thus, real-world evaluation is essential to maintain transparency regarding the AI systems’ effectiveness and clinical impact.
To assess whether AI models translate into meaningful improvements in patient care, several studies have evaluated their real-world impact across different specialties. One work assessed the ability of dermatologists to detect melanoma risk from skin lesion images before and after exposure to an AI algorithm208; the authors observed that the dermatologists’ ability improved significantly after AI exposure and that such exposure had an equivalent or even higher net benefit compared to performing biopsies of all lesions. Other studies (1) conducted a population-based trial to assess the non-inferiority of cancer detection within 3 months of mammography by one radiologist plus an AI model compared with double reading by two radiologists209,210, finding that replacing one radiologist with an AI yielded a cancer detection rate that was 4% higher and non-inferior, and (2) analyzed the real-world applicability and generalizability of an AI-based triaging system for chest X-rays across a diverse demographic cohort, observing a 77% faster turnaround time compared to radiologists211. Such studies highlight the necessity of evaluating medical AI at a systemic level to gain deeper insights into how models integrate into clinical workflows and the extent to which they translate into measurable improvements in real-world clinical outcomes.
Continuous monitoring and model updating
Medical AI models deployed in clinical settings must adapt to ever-changing environments212. Medical environments are dynamic, with evolving patient cohort demographics, newer diseases, shifts in input acquisition techniques, and changes in treatment protocols that can cause a distribution shift in the data, in turn resulting in fluctuations in AI system performance. Without proactive monitoring, these changes can introduce biases, degrade performance, and compromise patient safety, ultimately leading to reduced transparency.
Continuous monitoring of input data quality and model behavior enables early detection of such distribution shifts131, facilitating timely model updates via retraining, fine-tuning, or other adaptive techniques213,214. Despite its importance, continuous monitoring remains underutilized in practice. A prior study reviewing 43 predictive medical AI tools for primary care—including 25 from regulatory databases and 18 from peer-reviewed literature—found that only 2% provided evidence of deployment monitoring. This mismatch between expectations and reality has driven discussions within the medical AI community about the urgency of incorporating continuous monitoring to maintain required levels of performance and transparency215.
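One simple way to monitor a single input feature for distribution shift is to compare recent values against a reference window with a two-sample Kolmogorov-Smirnov statistic, as sketched below. The choice of statistic, the alert threshold, and the synthetic data are illustrative assumptions and not a method prescribed by the studies cited here; deployed systems usually track many features and model outputs together.

```python
import numpy as np

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical cumulative distribution functions of the two samples."""
    reference = np.sort(reference)
    current = np.sort(current)
    grid = np.concatenate([reference, current])
    cdf_ref = np.searchsorted(reference, grid, side="right") / len(reference)
    cdf_cur = np.searchsorted(current, grid, side="right") / len(current)
    return float(np.max(np.abs(cdf_ref - cdf_cur)))

def drift_alert(reference, current, threshold=0.2):
    """Flag a shift when the statistic exceeds an (arbitrary) threshold."""
    return ks_statistic(reference, current) > threshold

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=500)    # values seen at validation time
recent_same = rng.normal(loc=0.0, scale=1.0, size=500)  # deployment data, no shift
recent_shifted = rng.normal(loc=1.0, scale=1.0, size=500)  # deployment data, mean shift

alert_same = drift_alert(reference, recent_same)
alert_shifted = drift_alert(reference, recent_shifted)
```

An alert of this kind would only signal that the inputs have changed; deciding whether to retrain or fine-tune still requires checking model performance on the shifted data.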
Beyond tracking system health, continuous monitoring serves as a cornerstone for developing robust and safe medical AI via continual model updating214,216. For example, continual learning, also known as lifelong learning, enables medical AI models to integrate new clinical data while retaining previously learned knowledge217. The insights gained from continuous monitoring can be fed back into the model training process, allowing models to dynamically adapt and improve17,66,67,155. Thus, continuous monitoring is not just a diagnostic tool but also serves as a pathway for improving long-term AI safety and clinical effectiveness.
Regulatory frameworks for medical AI systems
As AI-driven medical tools become increasingly integrated into clinical practice, ensuring transparency via regulatory frameworks is essential for maintaining trust, safety, and accountability. Similar to traditional medical devices, medical AI systems undergo rigorous regulatory assessment, with a growing emphasis on transparency31,218. In the U.S., the FDA regulates AI-based medical tools under the Software as a Medical Device (SaMD) framework, which outlines key requirements for transparency, performance evaluation, test data quality, and continuous monitoring10,219,220.
Over time, the FDA has refined its regulatory approach to medical AI by introducing several key frameworks to promote ongoing evaluation and transparency. The agency has long promoted a Total Product Lifecycle (TPLC) approach, which emphasizes continuous evaluation from development to post-market surveillance. Under TPLC, AI systems must undergo premarket evaluation, real-world performance monitoring, and adaptive updates to ensure safety and reliability over time. This framework acknowledges that AI models are dynamic and must evolve with changing clinical conditions. In 2021, the FDA collaborated with Health Canada and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) to introduce Good Machine Learning Practice (GMLP) for medical devices. GMLP emphasizes transparency by requiring that users, including healthcare providers and patients, have access to clear, essential information, such as a device’s intended use, performance metrics, data characteristics, acceptable inputs, limitations, and user interface interpretation. GMLP also highlights the role of the human-AI team, shifting the focus from isolated model performance to real-world team effectiveness in clinical workflows. Furthermore, GMLP states that deployed models must be continuously monitored in real-world use, with mechanisms to track and explain performance changes over time221. Most recently, in 2024, the FDA, Health Canada, and MHRA established guiding principles on Transparency in Machine Learning-Enabled Medical Devices (MLMDs). These principles require that critical information, such as intended use, performance, risks, and decision logic, is clearly communicated to relevant stakeholders, emphasizing explainability and proactive risk management207.
Regulatory oversight of medical AI is a global effort, with agencies worldwide introducing frameworks to enhance transparency and accountability. To provide a broader perspective on these developments, Fig. 4 shows a chronological timeline of all the major guidelines for medical AI established by governmental institutions around the world. Collaborative efforts across stakeholders and countries are essential to establishing comprehensive guidelines that ensure medical AI models remain transparent, adaptable, and aligned with patient-centered care.
Fig. 4 ∣.

Over the years, regulatory bodies across the world, including the EU, FDA, WHO, and US Congress, have established institutional guidelines for appropriate deployment of medical AI systems. These frameworks address different aspects such as data privacy, ethical principles, transparency, safety, and reliability. As medical AI systems continue to evolve rapidly, these guidelines also need to be regularly updated to reflect the emerging capabilities and challenges.
Outlook
We next explore further opportunities and challenges in advancing transparency in medical AI systems. We focus on addressing transparency issues posed by emerging model types, overcoming barriers to adopting transparency tools in clinical practice, and evaluating the real-world clinical utility of explainability methods.
Emerging model types
Democratization of large models
The immense scale of foundation models makes training them so expensive that only a few well-funded organizations can afford it. Further, deploying these models locally requires high-performance GPU clusters, resources that are unavailable in many medical institutions175,222,223. This issue is especially prominent in low-resource healthcare settings, which lack computing resources and IT professionals, and it further exacerbates existing healthcare disparities. Consequently, most medical institutions may need to rely on proprietary LLMs accessed via application programming interfaces (APIs).
The reliance on these external systems poses significant transparency and privacy issues. Many high-performing LLMs are closed-source and do not disclose details about their training methods or datasets, limiting their transparency and trustworthiness224,225. Moreover, with API access only, it is extremely difficult to ascertain how the model processes patient data on the backend. For example, entering a patient’s health information into an AI chatbot system hosted by a third party raises the risk of protected health information (PHI) breaches and potential HIPAA violations226. Additionally, closed-source models are prone to unpredictable changes. The lack of archived versions for regulatory review and the dependency on proprietary vendors make closed models difficult to rely on for long-term medical deployment.
To address these challenges, efforts are being made to democratize the use of large models through techniques such as model distillation and efficient inference. Model distillation aims to create smaller, lightweight versions of large language models that retain much of the original model’s performance while being more resource-efficient227. Efficient inference techniques, such as model quantization and pruning as well as on-device inference engines, further enhance accessibility by enabling models to run locally on less powerful hardware, reducing dependency on external APIs228,229. By enabling local deployment, these approaches allow healthcare institutions to maintain control over their data and processing workflows.
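To make the quantization idea above concrete, the following is a minimal sketch of symmetric 8-bit weight quantization in plain Python. It is an illustration under stated assumptions rather than the method of any particular inference engine: the function names `quantize_int8` and `dequantize`, the randomly generated weights, and the per-tensor scaling scheme are all hypothetical choices made for this example.

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale factor for the whole tensor (an assumption of this sketch).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


# Hypothetical stand-in for one layer of model weights.
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(1000)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage cost: 4 bytes per 32-bit float versus 1 byte per 8-bit integer.
fp32_bytes = len(weights) * 4
int8_bytes = len(q) * 1

# Worst-case rounding error introduced by quantization.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fp32 size: {fp32_bytes} bytes, int8 size: {int8_bytes} bytes")
print(f"max absolute error: {max_error:.6f} (bound: {scale / 2:.6f})")
```

The sketch shows the basic trade-off: storage shrinks by a factor of four, while each weight is perturbed by at most half of the scale factor. Whether that perturbation is acceptable for a given clinical model would need to be checked empirically on the task of interest.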
Regulating emerging model types
Regulating emerging AI models such as LLMs presents unprecedented challenges174. Unlike traditional AI models with fixed functionality, LLMs are general-purpose models that can be adapted to various tasks, making it difficult to define their intended use, a critical component of current regulatory frameworks. Their performance also varies with the prompts used, complicating standardized evaluation. Additionally, the vast, complex, and often proprietary nature of their training data makes auditing challenging. Recent proposals advocate for creating a distinct regulatory category for LLMs to ensure their transparent and accountable deployment222. Furthermore, novel regulatory strategies are being explored, including treating LLMs similarly to human clinicians by applying comparable methods, such as periodic evaluations, supervised clinical use, and public reporting of performance230.
Integrating transparency tools in clinical practice
Addressing the computational cost of XAI methods in resource-constrained environments
The computational overhead of XAI methods poses a hurdle to their integration into clinical practice. Many widely used XAI techniques, such as SHAP and LIME, require post hoc analysis that involves additional computational steps. This increases latency and resource requirements, making these methods challenging to deploy in time-sensitive clinical settings where real-time explanations are essential. In critical scenarios, such as emergency diagnostics, clinicians cannot afford to wait for the output of XAI methods. To overcome these constraints, researchers and engineers must prioritize the development of lightweight, real-time explainability techniques and optimize models for efficient deployment on resource-limited systems129,132,231-233.
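The overhead described above can be illustrated with a small sketch that counts model evaluations. It uses a simple occlusion-style attribution (replacing one feature at a time with a baseline), not SHAP or LIME themselves; those methods typically evaluate the model on many more perturbed inputs. The model `toy_model`, its weights, the `CallCounter` wrapper, and the input values are hypothetical and chosen only for illustration.

```python
def toy_model(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.8, -0.5, 0.3, 0.0]
    return sum(w * x for w, x in zip(weights, features))


class CallCounter:
    """Wraps a model and counts how many times it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.fn(x)


def occlusion_attribution(model, x, baseline=0.0):
    """Score each feature by the change in output when it is set to a baseline."""
    full = model(x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline
        attributions.append(full - model(perturbed))
    return attributions


model = CallCounter(toy_model)
x = [1.0, 2.0, 3.0, 4.0]

# A plain prediction needs a single model evaluation.
prediction = model(x)
calls_for_prediction = model.calls

# The explanation needs one extra evaluation per feature, plus a reference call.
attributions = occlusion_attribution(model, x)
calls_for_explanation = model.calls - calls_for_prediction

print(f"evaluations for prediction:  {calls_for_prediction}")
print(f"evaluations for explanation: {calls_for_explanation}")
print(f"attributions: {attributions}")
```

Even in this toy setting, the explanation costs several times as many model evaluations as the prediction, and the cost grows with the number of input features. For image models with thousands of pixels or patches, this scaling is one reason post hoc explanations can be hard to deliver in real time.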
Clinician-centered integration
Seamlessly embedding transparency tools into clinical workflows is essential for ensuring their practical utility in real-world healthcare settings. A key strategy is integrating these tools with electronic health record (EHR) systems, enabling clinicians to access explanations directly within their existing workflows. To maximize usability, the design of explainability interfaces should be intuitive and tailored to clinicians’ needs. Features like interactive visualization dashboards, voice-assisted tools, and concise textual summaries can make explanations more accessible and actionable. For example, a visualization dashboard might highlight critical regions in medical images, while a voice-assisted tool could provide real-time explanations during patient consultations. These designs should minimize clinicians’ cognitive load, ensuring they enhance rather than complicate clinical workflows.
Education and training
Educating clinicians about the strengths and limitations of explainability methods is critical to their effective use in clinical practice. Explainability methods are not perfect, and it is essential to ensure that clinicians do not become overly reliant on or overly confident in medical AI systems based solely on the outputs of these methods. For example, a saliency map might highlight a skin lesion as the most influential region for a model’s prediction while ignoring surrounding artifacts, like ruler marks or color charts. Although this may appear reassuring, it does not guarantee that the model is making decisions based on clinically relevant features in a way that aligns with clinicians’ expectations. Training programs should emphasize the importance of understanding explainability tools and recognizing their limitations. Clinicians should be equipped with the knowledge to critically evaluate AI-generated explanations and incorporate them as supplementary insights rather than definitive answers.
Evaluating the clinical utility of explainability
Transparency in medical AI is widely regarded as a means to enhance trust, reduce bias, and improve accountability. However, quantifying its real-world impact is not straightforward. While explainability is often cited as critical for trust and adoption, its direct impact on clinical utility has rarely been studied in prospective trials. For example, does a more interpretable model lead to improved decision-making by healthcare providers or a reduction in diagnostic errors? Most existing studies focus on theoretical benefits or rely on retrospective data, offering limited insights into real-world outcomes.
Key questions remain unanswered. Can clinicians effectively utilize explainability tools in time-constrained settings? Do these tools genuinely improve trust, or do they inadvertently add to the cognitive burden of already overwhelmed healthcare providers? What types of explanations are the most beneficial for specific clinical outcomes? Addressing these questions requires robust evaluation frameworks and benchmarks that directly measure the impact of explainability in clinical practice. These frameworks could incorporate metrics such as improvements in diagnostic accuracy, reductions in time-to-decision, and enhanced collaboration between clinicians and AI systems and between clinicians and their patients. Additional metrics, including clinician trust scores, patient satisfaction, regulatory adherence, and AI adoption rates, could serve as proxies for the practical benefits of transparency. Gathering real-world evidence will be critical to ensuring that explainability methods achieve their intended goals in clinical practice.
Key points.
As AI systems play an increasingly significant role in clinical decision-making, ensuring transparency in their design, operation, and outcomes is essential for their safe and effective deployment and for building trust among stakeholders.
Achieving transparency requires a holistic approach that spans the entire development pipeline: from data collection and model development to clinical deployment.
Explainable AI techniques, including feature attributions, concept-based explanations, and counterfactual explanations, elucidate how medical AI models process data and make clinical predictions.
Transparent deployment of medical AI systems demands rigorous real-world evaluation, continuous performance monitoring, and evolving regulatory frameworks to maintain safety, reliability, and clinical impact over time.
Advancing transparency further requires democratizing access to large models, integrating transparency tools into clinical workflows, and systematically evaluating their clinical utility.
Acknowledgements
C.K., S.U.G., and S.-I.L. received support from the National Institutes of Health (R01 AG061132, R01 EB035934, and RF1 AG088824).
Citation diversity statement
We acknowledge that papers authored by scholars from historically excluded groups are systematically under-cited. Here, we have made every attempt to reference relevant papers in a manner that is equitable in terms of racial, ethnic, gender and geographical representation.
Footnotes
Competing interests
The authors declare no competing interests.
References
- 1.Rajpurkar P & Lungren MP The Current and Future State of AI Interpretation of Medical Images. New England Journal of Medicine 388. Publisher: Massachusetts Medical Society, 1981–1990. issn: 0028-4793. 10.1056/NEJMra2301725 (2024) (May 25, 2023). [DOI] [Google Scholar]
- 2.Song AH et al. Artificial intelligence for digital and computational pathology. Nature Reviews Bioengineering 1, 930–949. issn: 2731-6092. 10.1038/s44222-023-00096-8 (Dec. 1, 2023). [DOI] [Google Scholar]
- 3.Jones OT et al. Artificial intelligence and machine learning algorithms for early detection of skin cancer in community and primary care settings: a systematic review. The Lancet Digital Health 4. Publisher: Elsevier, e466–e476. issn: 2589-7500. https://www.thelancet.com/journals/landig/article/PIIS2589-7500(22)00023-1/fulltext (2023) (June 1, 2022). [Google Scholar]
- 4.Esteva A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542. Number: 7639 Publisher: Nature Publishing Group, 115–118. issn: 1476-4687. https://www.nature.com/articles/nature21056 (2022) (Feb. 2017). [Google Scholar]
- 5.Ouyang D. et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 580. Publisher: Nature Publishing Group, 252–256. issn: 1476-4687. https://www.nature.com/articles/s41586-020-2145-8 (2024) (Apr. 2020). [Google Scholar]
- 6.Christensen M, Vukadinovic M, Yuan N & Ouyang D Vision–language foundation model for echocardiogram interpretation. Nature Medicine 30. Publisher: Nature Publishing Group, 1481–1488. issn: 1546-170X. https://www.nature.com/articles/s41591-024-02959-y (2024) (May 2024). [Google Scholar]
- 7.Arnold C. Inside the nascent industry of AI-designed drugs. Nature Medicine 29, 1292–1295. issn: 1546-170X. 10.1038/s41591-023-02361-0 (June 1, 2023). [DOI] [Google Scholar]
- 8.Rakers MM et al. Availability of Evidence for Predictive Machine Learning Algorithms in Primary Care: A Systematic Review. JAMA Network Open 7, e2432990–e2432990. issn: 2574-3805. 10.1001/jamanetworkopen.2024.32990 (2024) (Sept. 12, 2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Center for Devices and Radiological Health. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices FDA. Publisher: FDA. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices (2024). [Google Scholar]
- 10.Wu E. et al. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nature Medicine 27. Number: 4 Publisher: Nature Publishing Group, 582–584. issn: 1546-170X. https://www.nature.com/articles/s41591-021-01312-x (2023) (Apr. 2021). [Google Scholar]
- 11.Dorr DA, Adams L & Embí P Harnessing the Promise of Artificial Intelligence Responsibly. JAMA 329, 1347–1348. issn: 0098-7484. 10.1001/jama.2023.2771 (2024) (Apr. 25, 2023). [DOI] [Google Scholar]
- 12.Muehlematter UJ, Daniore P & Vokinger KN Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): a comparative analysis. The Lancet Digital Health 3, e195–e203. issn: 2589-7500. https://www.sciencedirect.com/science/article/pii/S2589750020302922 (2021). [DOI] [PubMed] [Google Scholar]
- 13.Smak Gregoor AM et al. An artificial intelligence based app for skin cancer detection evaluated in a population based setting. npj Digital Medicine 6, 90. issn: 2398-6352. 10.1038/s41746-023-00831-w (May 20, 2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Chen W. et al. Early detection of visual impairment in young children using a smartphone-based deep learning system. Nature Medicine 29, 493–503. issn: 1546-170X. 10.1038/s41591-022-02180-9 (Feb. 1, 2023). [DOI] [Google Scholar]
- 15.Temple SWP & Rowbottom CG Gross failure rates and failure modes for a commercial AI-based auto-segmentation algorithm in head and neck cancer patients. Journal of Applied Clinical Medical Physics 25, e14273. issn: 1526-9914. https://aapm.onlinelibrary.wiley.com/doi/10.1002/acm2.14273 (2024) (June 2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Daneshjou R. et al. Disparities in dermatology ai: Assessments using diverse clinical images. arXiv preprint arXiv:2111.08006 (2021). [Google Scholar]
- 17.Chen RJ et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nature Biomedical Engineering 7, 719–742. issn: 2157-846X. https://www.nature.com/articles/s41551-023-01056-8 (2024) (June 28, 2023). [Google Scholar]
- 18.Maleki F. et al. Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls. Radiology: Artificial Intelligence 5, e220028. issn: 2638-6100. http://pubs.rsna.org/doi/10.1148/ryai.220028 (2024) (Jan. 1, 2023). [Google Scholar]
- 19.Yang J. et al. Generalizability assessment of AI models across hospitals in a low-middle and high income country. Nature Communications 15. Publisher: Nature Publishing Group; UK London, 8270. https://www.nature.com/articles/s41467-024-52618-6 (2024) (2024). [Google Scholar]
- 20.Ong Ly C. et al. Shortcut learning in medical AI hinders generalization: method for estimating AI model generalization without external data. NPJ Digital Medicine 7. Publisher: Nature Publishing Group; UK London, 124. https://www.nature.com/articles/s41746-024-01118-4 (2024) (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.DeGrave AJ, Janizek JD & Lee S-I AI for radiographic COVID-19 detection selects shortcuts over signal. Nature Machine Intelligence 3. Number: 7 Publisher: Nature Publishing Group, 610–619. issn: 2522-5839. https://www.nature.com/articles/s42256-021-00338-7 (2023) (July 2021). [Google Scholar]
- 22.Laghi A. Cautions about radiologic diagnosis of COVID-19 infection driven by artificial intelligence. The Lancet Digital Health 2, e225. issn: 2589-7500. https://www.sciencedirect.com/science/article/pii/S2589750020300790 (May 1, 2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Wang L, Lin ZQ & Wong A COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Scientific Reports 10, 19549. issn: 2045-2322. 10.1038/s41598-020-76550-z (Nov. 11, 2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Marey A. et al. Explainability, transparency and black box challenges of AI in radiology: impact on patient care in cardiovascular radiology. Egyptian Journal of Radiology and Nuclear Medicine 55, 183. issn: 2090-4762. https://ejrnm.springeropen.com/articles/10.1186/s43055-024-01356-2 (2024) (Sept. 13, 2024). [Google Scholar]
- 25.Poon AIF & Sung JJY Opening the black box of AI-Medicine. Journal of Gastroenterology and Hepatology 36, 581–584. issn: 0815-9319, 1440-1746. https://onlinelibrary.wiley.com/doi/10.1111/jgh.15384 (2024) (Mar. 2021). [Google Scholar]
- 26.Saw SN & Ng KH Current challenges of implementing artificial intelligence in medical imaging. Physica Medica 100. Publisher: Elsevier, 12–17. https://www.sciencedirect.com/science/article/pii/S1120179722019962 (2024) (2022). [Google Scholar]
- 27.Singhal K. et al. Large language models encode clinical knowledge. Nature 620. Number: 7972 Publisher: Nature Publishing Group, 172–180. issn: 1476-4687. https://www.nature.com/articles/s41586-023-06291-2 (2023) (Aug. 2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Jiang LY et al. Health system-scale language models are all-purpose prediction engines. Nature 619. Number: 7969 Publisher: Nature Publishing Group, 357–362. issn: 1476-4687. https://www.nature.com/articles/s41586-023-06160-y (2023) (July 2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Tiu E. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nature Biomedical Engineering. Publisher: Nature Publishing Group, 1–8. issn: 2157-846X. https://www.nature.com/articles/s41551-022-00936-9 (2022) (Sept. 15, 2022). [Google Scholar]
- 30.Krishnan R, Rajpurkar P & Topol EJ Self-supervised learning in medicine and healthcare. Nature Biomedical Engineering. Publisher: Nature Publishing Group, 1–7. issn: 2157-846X. https://www.nature.com/articles/s41551-022-00914-1 (2022) (Aug. 11, 2022). [Google Scholar]
- 31.Shick AA et al. Transparency of artificial intelligence/machine learning-enabled medical devices. npj Digital Medicine 7, 21. issn: 2398-6352. 10.1038/s41746-023-00992-8 (Jan. 26, 2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Chen H, Gomez C, Huang C-M & Unberath M Explainable medical imaging AI needs human-centered design: guidelines and evidence from a systematic review. npj Digital Medicine 5, 156. issn: 2398-6352. 10.1038/s41746-022-00699-2 (Oct. 19, 2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Cadario R, Longoni C & Morewedge CK Understanding, explaining, and utilizing medical artificial intelligence. Nature Human Behaviour 5, 1636–1642. issn: 2397-3374. 10.1038/s41562-021-01146-0 (Dec. 1, 2021). [DOI] [Google Scholar]
- 34.He B. et al. Blinded, randomized trial of sonographer versus AI cardiac function assessment. Nature 616, 520–524. issn: 1476-4687. 10.1038/s41586-023-05947-3 (Apr. 1, 2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Cruz Rivera S. et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nature Medicine 26, 1351–1363. issn: 1546-170X. 10.1038/s41591-020-1037-7 (Sept. 1, 2020). [DOI] [Google Scholar]
- 36.Zhou Y, Shi Y, Lu W & Wan F Did Artificial Intelligence invade humans? The study on the mechanism of patients’ willingness to accept artificial intelligence medical care: From the perspective of Intergroup Threat Theory. Frontiers in Psychology 13, 866124 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Shevtsova D. et al. Trust in and acceptance of artificial intelligence applications in medicine: mixed methods study. JMIR human factors 11, e47031 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Dean TB, Seecheran R, Badgett RG, Zackula R & Symons J Perceptions and attitudes toward artificial intelligence among frontline physicians and physicians’ assistants in Kansas: a cross-sectional survey. JAMIA Open 7, ooae100. issn: 2574-2531. 10.1093/jamiaopen/ooae100 (2025) (Dec. 1, 2024). [DOI] [Google Scholar]
- 39.Daneshjou R, Smith MP, Sun MD, Rotemberg V & Zou J Lack of Transparency and Potential Bias in Artificial Intelligence Data Sets and Algorithms: A Scoping Review. JAMA Dermatology 157, 1362–1369. issn: 2168-6068. 10.1001/jamadermatol.2021.3129 (2022) (Nov. 1, 2021). [DOI] [Google Scholar]
- 40.Groh M. et al. Evaluating Deep Neural Networks Trained on Clinical Images in Dermatology With the Fitzpatrick 17k Dataset in IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2021, virtual, June 19-25, 2021 (Computer Vision Foundation / IEEE, 2021), 1820–1828. https://openaccess.thecvf.com/content/CVPR2021W/ISIC/html/Groh_Evaluating_Deep_Neural_Networks_Trained_on_Clinical_Images_in_Dermatology_CVPRW_2021_paper.html. [Google Scholar]
- 41.Saenz A, Chen E, Marklund H & Rajpurkar P The MAIDA initiative: establishing a framework for global medical-imaging data sharing. The Lancet Digital Health 6. Publisher: Elsevier, e6–e8. issn: 2589-7500. https://www.thelancet.com/journals/landig/article/PIIS2589-7500(23)00222-4/fulltext (2024) (Jan. 1, 2024). [DOI] [PubMed] [Google Scholar]
- 42.Bak M, Madai VI, Fritzsche M-C, Mayrhofer MT & McLennan S You Can’t Have AI Both Ways: Balancing Health Data Privacy and Access Fairly. Front. Genet 13. Publisher: Frontiers. issn: 1664-8021. https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.929453/full (2024) (June 13, 2022). [Google Scholar]
- 43.Koetzier LR et al. Generating Synthetic Data for Medical Imaging. Radiology 312, e232471. issn: 0033-8419, 1527-1315. http://pubs.rsna.org/doi/10.1148/radiol.232471 (2024) (Sept. 1, 2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Groh M, Harris C, Daneshjou R, Badri O & Koochek A Towards transparency in dermatology image datasets with skin tone annotations by experts, crowds, and an algorithm. Proceedings of the ACM on Human-Computer Interaction 6. Publisher: ACM New York, NY, USA, 1–26 (CSCW2 2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Goldberg CB et al. To do no harm — and the most good — with AI in health care. Nature Medicine 30, 623–627. issn: 1546-170X. 10.1038/s41591-024-02853-7 (Mar. 1, 2024). [DOI] [Google Scholar]
- 46.Liu X. et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nature Medicine 26, 1364–1374. issn: 1546-170X. 10.1038/s41591-020-1034-x (Sept. 1, 2020). [DOI] [Google Scholar]
- 47.Bluemke DA et al. Assessing Radiology Research on Artificial Intelligence: A Brief Guide for Authors, Reviewers, and Readers—From the Radiology Editorial Board. Radiology 294. Publisher: Radiological Society of North America, 487–489. issn: 0033-8419. 10.1148/radiol.2019192515 (2024) (Mar. 1, 2020). [DOI] [Google Scholar]
- 48.Health Insurance Portability and Accountability Act of 1996 Public Law No. 104-191, 110 Stat. 1936. Enacted by the 104th United States Congress. 1996. https://www.govinfo.gov/content/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf. [Google Scholar]
- 49.Carvalho T, Antunes L, Costa Santos C & Moniz N Empowering open data sharing for social good: a privacy-aware approach. Scientific Data 12, 248. issn: 2052-4463. 10.1038/s41597-025-04506-x (Feb. 12, 2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Andrus M & Villeneuve S Demographic-Reliant Algorithmic Fairness: Characterizing the Risks of Demographic Data Collection in the Pursuit of Fairness. FAccT ’22 1709–1721. 10.1145/3531146.3533226 (2022). [DOI] [Google Scholar]
- 51.Wang A, Ramaswamy VV & Russakovsky O Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery, Seoul, Republic of Korea, 2022), 336–349. isbn: 9781450393522. 10.1145/3531146.3533101. [DOI] [Google Scholar]
- 52.Tomasev N, McKee KR, Kay J & Mohamed S Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (Association for Computing Machinery, Virtual Event, USA, 2021), 254–265. isbn: 9781450384735. 10.1145/3461702.3462540. [DOI] [Google Scholar]
- 53.Bowker GC & Star SL Sorting things out: Classification and its consequences (MIT press, 2000). [Google Scholar]
- 54.Hanna A, Denton R, Smart A & Smith-Loud J Towards a critical race methodology in algorithmic fairness in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery, Barcelona, Spain, 2020), 501–512. isbn: 9781450369367. 10.1145/3351095.3372826. [DOI] [Google Scholar]
- 55.Muntner P. et al. Potential US Population Impact of the 2017 ACC/AHA High Blood Pressure Guideline. Circulation 137. Publisher: American Heart Association, 109–118. 10.1161/CIRCULATIONAHA.117.032582 (2024) (Jan. 9, 2018). [DOI] [Google Scholar]
- 56.Davatchi F. et al. The saga of diagnostic/classification criteria in Behcet’s disease. International Journal of Rheumatic Diseases 18. Publisher: John Wiley & Sons, Ltd, 594–605. issn: 1756-1841. 10.1111/1756-185X.12520 (2024) (July 1, 2015). [DOI] [Google Scholar]
- 57.Winkler JK et al. Association Between Surgical Skin Markings in Dermoscopic Images and Diagnostic Performance of a Deep Learning Convolutional Neural Network for Melanoma Recognition. JAMA Dermatology 155, 1135–1141. issn: 2168-6068. 10.1001/jamadermatol.2019.1735 (2023) (Oct. 1, 2019). [DOI] [Google Scholar]
- 58.Bissoto A, Fornaciali M, Valle E & Avila S (De) Constructing Bias on Skin Lesion Datasets in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (IEEE, Long Beach, CA, USA, June 2019), 2766–2774. isbn: 978-1-72812-506-0. https://ieeexplore.ieee.org/document/9025695/ (2023). [Google Scholar]
- 59.Norgeot B. et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nature Medicine 26, 1320–1324. issn: 1546-170X. 10.1038/s41591-020-1041-y (Sept. 1, 2020). [DOI] [Google Scholar]
- 60.Hernandez-Boussard T, Bozkurt S, Ioannidis JPA & Shah NH MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care. Journal of the American Medical Informatics Association 27, 2011–2015. issn: 1527-974X. 10.1093/jamia/ocaa088 (2024) (Dec. 9, 2020). [DOI] [Google Scholar]
- 61.Ganapathi S. et al. Tackling bias in AI health datasets through the STANDING Together initiative. Nature Medicine 28, 2232–2233. issn: 1546-170X. 10.1038/s41591-022-01987-w (Nov. 1, 2022). [DOI] [Google Scholar]
- 62.Collins GS et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 385. eprint: https://www.bmj.com/content/385/bmj-2023-078378.full.pdf. https://www.bmj.com/content/385/bmj-2023-078378 (2024). [Google Scholar]
- 63.Pushkarna M, Zaldivar A & Kjartansson O Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery, Seoul, Republic of Korea, 2022), 1776–1826. isbn: 9781450393522. 10.1145/3531146.3533231. [DOI] [Google Scholar]
- 64.Gebru T. et al. Datasheets for datasets. Commun. ACM 64, 86–92. issn: 0001-0782. 10.1145/3458723 (Nov. 2021). [DOI] [Google Scholar]
- 65.Mitchell M. et al. Model Cards for Model Reporting in Proceedings of the Conference on Fairness, Accountability, and Transparency event-place: Atlanta, GA, USA (Association for Computing Machinery, New York, NY, USA, 2019), 220–229. isbn: 978-1-4503-6125-5. 10.1145/3287560.3287596. [DOI] [Google Scholar]
- 66.Janizek JD, Erion GG, DeGrave AJ & Lee S-I An adversarial approach for the robust classification of pneumonia from chest radiographs in ACM CHIL ’20: ACM Conference on Health, Inference, and Learning, Toronto, Ontario, Canada, April 2-4, 2020 [delayed] (ed Ghassemi M) (ACM, 2020), 69–79. 10.1145/3368555.3384458. [DOI] [Google Scholar]
- 67.Chen F, Wang L, Hong J, Jiang J & Zhou L Unmasking bias in artificial intelligence: a systematic review of bias detection and mitigation strategies in electronic health record-based models. Journal of the American Medical Informatics Association 31, 1172–1183. issn: 1527-974X. 10.1093/jamia/ocae060 (2025) (May 1, 2024). [DOI] [Google Scholar]
- 68.Wu J. et al. Clinical Text Datasets for Medical Artificial Intelligence and Large Language Models: A Systematic Review. NEJM AI 1. Publisher: Massachusetts Medical Society, AIra2400012. 10.1056/AIra2400012 (2024) (May 23, 2024). [DOI] [Google Scholar]
- 69.Blueprint for an AI Bill of Rights — OSTP The White House. https://www.whitehouse.gov/ostp/ai-bill-of-rights/ (2024).
- 70.U.S. Government Accountability Office. Artificial Intelligence in Health Care: Benefits and Challenges of Machine Learning Technologies for Medical Diagnostics GAO-22-104629. Published and publicly released on September 29, 2022. (U.S. Government Accountability Office, Sept. 29, 2022). https://www.gao.gov/products/gao-22-104629. [Google Scholar]
- 71.Raab R. et al. Federated electronic health records for the European Health Data Space. The Lancet Digital Health 5. Publisher: Elsevier, e840–e847. issn: 2589-7500. 10.1016/S2589-7500(23)00156-5 (2024) (Nov. 1, 2023). [DOI] [Google Scholar]
- 72.Gutman D. et al. Skin Lesion Analysis toward Melanoma Detection: A Challenge at the International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration (ISIC) May 4, 2016. arXiv: 1605.01397[cs] http://arxiv.org/abs/1605.01397. [Google Scholar]
- 73.Codella NCF et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 International symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC) in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). ISSN: 1945-8452 (Apr. 2018), 168–172. [Google Scholar]
- 74.Codella N. et al. Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC) Mar. 29, 2019. arXiv: 1902.03368[cs] http://arxiv.org/abs/1902.03368. [Google Scholar]
- 75.Gonzales S, Carson MB & Holmes K Ten simple rules for maximizing the recommendations of the NIH data management and sharing plan. PLOS Computational Biology 18. Publisher: Public Library of Science, e1010397. 10.1371/journal.pcbi.1010397 (Aug. 3, 2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Bianchi DW et al. The All of Us Research Program is an opportunity to enhance the diversity of US biomedical research. Nature Medicine 30, 330–333. issn: 1546-170X. 10.1038/s41591-023-02744-3 (Feb. 1, 2024). [DOI] [Google Scholar]
- 77.Watson H et al. Delivering on NIH data sharing requirements: avoiding Open Data in Appearance Only. BMJ Health & Care Informatics 30, e100771. https://informaticssite-bmj.vercel.app/content/30/1/e100771 (June 21, 2023). [Google Scholar]
- 78.van der Haak M. et al. Data security and protection in cross-institutional electronic patient records. International Journal of Medical Informatics 70. MIE 2002 Special Issue, 117–130. issn: 1386-5056. https://www.sciencedirect.com/science/article/pii/S1386505603000339 (2003). [DOI] [PubMed] [Google Scholar]
- 79.Price WN & Cohen IG Privacy in the age of medical big data. Nature medicine 25. Publisher: Nature Publishing Group; US New York, 37–43. https://www.nature.com/articles/s41591-018-0272-7 (2024) (2019). [Google Scholar]
- 80.Johnson AE et al. MIMIC-III, a freely accessible critical care database. Scientific Data 3, 160035. issn: 2052-4463. 10.1038/sdata.2016.35 (May 24, 2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Johnson AEW et al. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data 10, 1. issn: 2052-4463. 10.1038/s41597-022-01899-x (Jan. 3, 2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Labkoff SE, Quintana Y & Rozenblit L Identifying the capabilities for creating next-generation registries: a guide for data leaders and a case for “registry science”. Journal of the American Medical Informatics Association 31, 1001–1008. issn: 1527-974X. 10.1093/jamia/ocae024 (2024) (Apr. 1, 2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Rieke N. et al. The future of digital health with federated learning. npj Digital Medicine 3, 119. issn: 2398-6352. 10.1038/s41746-020-00323-1 (Sept. 14, 2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Kushida CA et al. Strategies for De-identification and Anonymization of Electronic Health Record Data for Use in Multicenter Research Studies. Medical Care 50. issn: 0025-7079. https://journals.lww.com/lww-medicalcare/fulltext/2012/07001/strategies_for_de_identification_and_anonymization.17.aspx (2012). [Google Scholar]
- 85.Sadilek A. et al. Privacy-first health research with federated learning. npj Digital Medicine 4, 132. issn: 2398-6352. 10.1038/s41746-021-00489-2 (Sept. 7, 2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Sheller MJ et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Scientific Reports 10, 12598. issn: 2045-2322. 10.1038/s41598-020-69250-1 (July 28, 2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Sarma KV et al. Federated learning improves site performance in multicenter deep learning without data sharing. Journal of the American Medical Informatics Association 28, 1259–1264. issn: 1527-974X. 10.1093/jamia/ocaa341 (2024) (June 1, 2021). [DOI] [Google Scholar]
- 88.Lu MY et al. Federated learning for computational pathology on gigapixel whole slide images. Medical Image Analysis 76, 102298. issn: 1361-8415. https://www.sciencedirect.com/science/article/pii/S1361841521003431 (Feb. 1, 2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Van Breugel B, Liu T, Oglic D & van der Schaar M Synthetic data in biomedicine via generative artificial intelligence. Nature Reviews Bioengineering. Publisher: Nature Publishing Group, 1–14. issn: 2731-6092. https://www.nature.com/articles/s44222-024-00245-7 (2024) (Oct. 8, 2024). [Google Scholar]
- 90.D’Amico S. et al. Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology. JCO Clinical Cancer Informatics, e2300021. issn: 2473-4276. https://ascopubs.org/doi/10.1200/CCI.23.00021 (2024) (June 2023). [Google Scholar]
- 91.Dahan C, Christodoulidis S, Vakalopoulou M & Boyd J Artifact Removal in Histopathology Images Dec. 16, 2022. arXiv: 2211.16161[eess] http://arxiv.org/abs/2211.16161 (2025). [Google Scholar]
- 92.Jiménez-Sánchez A, Juodelyte D, Chamberlain B & Cheplygina V. Detecting Shortcuts in Medical Images - A Case Study in Chest X-Rays in 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI) (Apr. 18, 2023), 1–5. [Google Scholar]
- 93.Bluethgen C. et al. A vision-language foundation model for the generation of realistic chest X-ray images. Nature Biomedical Engineering. issn: 2157-846X. 10.1038/s41551-024-01246-y (Aug. 26, 2024). [DOI] [Google Scholar]
- 94.Chen RJ, Lu MY, Chen TY, Williamson DFK & Mahmood F Synthetic data in machine learning for medicine and healthcare. Nature Biomedical Engineering 5, 493–497. issn: 2157-846X. 10.1038/s41551-021-00751-8 (June 1, 2021). [DOI] [Google Scholar]
- 95.Shaban MT, Baur C, Navab N & Albarqouni S StainGAN: Stain Style Transfer for Digital Histological Images in 16th IEEE International Symposium on Biomedical Imaging, ISBI 2019, Venice, Italy, April 8-11, 2019 (2019), 953–956. 10.1109/ISBI.2019.8759152. [DOI] [Google Scholar]
- 96.Ktena I. et al. Generative models improve fairness of medical classifiers under distribution shifts. Nature Medicine 30, 1166–1173. issn: 1546-170X. 10.1038/s41591-024-02838-6 (Apr. 1, 2024). [DOI] [Google Scholar]
- 97.Sagers LW et al. Augmenting medical image classifiers with synthetic data from latent diffusion models Aug. 23, 2023. arXiv: 2308.12453[cs] http://arxiv.org/abs/2308.12453 (2025). [Google Scholar]
- 98.Salimans T. et al. Improved Techniques for Training GANs in Advances in Neural Information Processing Systems (eds Lee D, Sugiyama M, Luxburg U, Guyon I & Garnett R) 29 (Curran Associates, Inc., 2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/8a3363abe792db2d8761d6403605aeb7-Paper.pdf. [Google Scholar]
- 99.Naeem MF, Oh SJ, Uh Y, Choi Y & Yoo J Reliable Fidelity and Diversity Metrics for Generative Models in Proceedings of the 37th International Conference on Machine Learning (eds Daumé H III & Singh A) 119 (PMLR, July 13, 2020), 7176–7185. https://proceedings.mlr.press/v119/naeem20a.html. [Google Scholar]
- 100.Jordon J et al. Synthetic Data - what, why and how? CoRR abs/2205.03257. arXiv: 2205.03257 10.48550/arXiv.2205.03257 (2022). [DOI] [Google Scholar]
- 101.Lee J & Clifton C How Much Is Enough? Choosing ϵ for Differential Privacy in Information Security (eds Lai X & Li H) (Springer Berlin Heidelberg, Berlin, Heidelberg, 2011), 325–340. isbn: 978-3-642-24861-0. [Google Scholar]
- 102.Esteban C, Hyland SL & Rätsch G Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs. CoRR abs/1706.02633. arXiv: 1706.02633 http://arxiv.org/abs/1706.02633 (2017). [Google Scholar]
- 103.Chefer H. et al. The Hidden Language of Diffusion Models in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=awWpHnEJDw. [Google Scholar]
- 104.Shen Y, Gu J, Tang X & Zhou B Interpreting the latent space of GANs for semantic face editing in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2020), 9243–9252. [Google Scholar]
- 105.Lu M, Lin C, Kim C & Lee S-I An Efficient Framework for Crediting Data Contributors of Diffusion Models in The Thirteenth International Conference on Learning Representations (2025). https://openreview.net/forum?id=9EqQC2ct4H. [Google Scholar]
- 106.Zheng X, Pang T, Du C, Jiang J & Lin M Intriguing Properties of Data Attribution on Diffusion Models in The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 (OpenReview.net, 2024). https://openreview.net/forum?id=vKViCoKGcB. [Google Scholar]
- 107.Kim E, Kim S, Park M, Entezari R & Yoon S Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion 2025. arXiv: 2408.12692 [cs.AI] https://arxiv.org/abs/2408.12692. [Google Scholar]
- 108.Shrikumar A, Greenside P, Shcherbina A & Kundaje A Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713 (2016). [Google Scholar]
- 109.Selvaraju RR et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. International Journal of Computer Vision 128, 336–359. issn: 0920-5691, 1573-1405. arXiv: 1610.02391[cs]. http://arxiv.org/abs/1610.02391 (2023) (Feb. 2020). [Google Scholar]
- 110.Wang H. et al. Score-CAM: Score-weighted visual explanations for convolutional neural networks in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (2020), 24–25. [Google Scholar]
- 111.Sundararajan M, Taly A & Yan Q Axiomatic attribution for deep networks in International conference on machine learning (2017), 3319–3328. [Google Scholar]
- 112.Covert I, Lundberg SM & Lee S-I Explaining by Removing: A Unified Framework for Model Explanation. J. Mach. Learn. Res 22, 209:1–209:90. https://jmlr.org/papers/v22/20-1316.html (2021). [Google Scholar]
- 113.Petsiuk V, Das A & Saenko K RISE: Randomized Input Sampling for Explanation of Black-box Models in British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3-6, 2018 (BMVA Press, 2018), 151. http://bmvc2018.org/contents/papers/1064.pdf. [Google Scholar]
- 114.Fong RC & Vedaldi A Interpretable Explanations of Black Boxes by Meaningful Perturbation in Proceedings of the IEEE International Conference on Computer Vision (ICCV) (Oct. 2017). [Google Scholar]
- 115.Ribeiro MT, Singh S & Guestrin C “Why should I trust you?” Explaining the predictions of any classifier in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (2016), 1135–1144. [Google Scholar]
- 116.Lundberg SM & Lee S-I A Unified Approach to Interpreting Model Predictions in Advances in Neural Information Processing Systems (eds Guyon I et al.) 30 (Curran Associates, Inc., 2017). https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf. [Google Scholar]
- 117.Loh HW et al. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Computer Methods and Programs in Biomedicine 226, 107161. issn: 0169-2607. https://www.sciencedirect.com/science/article/pii/S0169260722005429 (2022). [DOI] [PubMed] [Google Scholar]
- 118.Lundberg SM & Lee S-I A Unified Approach to Interpreting Model Predictions in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA (eds Guyon I et al.) (2017), 4765–4774. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html. [Google Scholar]
- 119.Jahmunah V, Ng EY, Tan R-S, Oh SL & Acharya UR Explainable detection of myocardial infarction using deep learning models with Grad-CAM technique on ECG signals. Computers in Biology and Medicine 146, 105550 (2022). [DOI] [PubMed] [Google Scholar]
- 120.Wu Y, Zhang L, Bhatti UA & Huang M Interpretable machine learning for personalized medical recommendations: A LIME-based approach. Diagnostics 13, 2681 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121.Vimbi V, Shaffi N & Mahmud M Interpreting artificial intelligence models: a systematic review on the application of LIME and SHAP in Alzheimer’s disease detection. Brain Informatics 11, 10 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122.Aldughayfiq B, Ashfaq F, Jhanjhi N & Humayun M Explainable AI for retinoblastoma diagnosis: interpreting deep learning models with LIME and SHAP. Diagnostics 13, 1932 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123.Wang X. et al. A radiomics model combined with XGBoost may improve the accuracy of distinguishing between mediastinal cysts and tumors: a multicenter validation analysis. Annals of Translational Medicine 9 (2021). [Google Scholar]
- 124.Lundberg SM et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering 2. Publisher: Nature Publishing Group, 749–760. issn: 2157-846X. https://www.nature.com/articles/s41551-018-0304-0 (2024) (Oct. 2018). [Google Scholar]
- 125.Beebe-Wang N. et al. Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer’s disease neuropathologies. Nature Communications 12, 5369 (2021). [Google Scholar]
- 126.Qiu W, Chen H, Kaeberlein M & Lee S-I An explainable AI framework for interpretable biological age. medRxiv, 2022-10 (2022). [Google Scholar]
- 127.Dombrowski A-K et al. Explanations can be manipulated and geometry is to blame. Advances in neural information processing systems 32 (2019). [Google Scholar]
- 128.Adebayo J. et al. Sanity Checks for Saliency Maps in Advances in Neural Information Processing Systems (eds Bengio S et al.) 31 (Curran Associates, Inc., 2018). https://proceedings.neurips.cc/paper_files/paper/2018/file/294a8ed24b1ad22ec2e7efea049b8737-Paper.pdf. [Google Scholar]
- 129.Jethani N, Sudarshan M, Covert IC, Lee S-I & Ranganath R FastSHAP: Real-time Shapley value estimation in International conference on learning representations (2021). [Google Scholar]
- 130.Covert I & Lee S-I Improving KernelSHAP: Practical Shapley value estimation using linear regression in International Conference on Artificial Intelligence and Statistics (2021), 3457–3465. [Google Scholar]
- 131.Lundberg SM et al. From local explanations to global understanding with explainable AI for trees. Nature machine intelligence 2, 56–67 (2020). [Google Scholar]
- 132.Covert IC, Kim C & Lee S-I Learning to Estimate Shapley Values with Vision Transformers in The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023 (OpenReview.net, 2023). https://openreview.net/forum?id=5ktFNz_pJLK. [Google Scholar]
- 133.Lipton ZC The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue 16. Publisher: ACM New York, NY, USA, 31–57 (2018). [Google Scholar]
- 134.Ghassemi M, Oakden-Rayner L & Beam AL The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health 3. Publisher: Elsevier, e745–e750. https://www.thelancet.com/journals/landig/article/PIIS2589-7500(21)00208-9/fulltext?tpcc=nleyeonai (2024) (2021). [Google Scholar]
- 135.Daneshjou R, Yuksekgonul M, Cai ZR, Novoa RA & Zou J SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis in Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Oct. 12, 2022). https://openreview.net/forum?id=gud0qopqJc4 (2022). [Google Scholar]
- 136.Tsao H. et al. Early detection of melanoma: reviewing the ABCDEs. Journal of the American Academy of Dermatology 72, 717–723 (2015). [DOI] [PubMed] [Google Scholar]
- 137.Chen Z, Bei Y & Rudin C Concept whitening for interpretable image recognition. Nature Machine Intelligence 2, 772–782. issn: 2522-5839. 10.1038/s42256-020-00265-z (Dec. 1, 2020). [DOI] [Google Scholar]
- 138.Crabbé J & van der Schaar M Concept Activation Regions: A Generalized Framework For Concept-Based Explanations in Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022 (2022). http://papers.nips.cc/paper_files/paper/2022/hash/11a7f429d75f9f8c6e9c630aeb6524b5-Abstract-Conference.html. [Google Scholar]
- 139.Kim B. et al. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) in Proceedings of the 35th International Conference on Machine Learning. ISSN: 2640-3498 (PMLR, July 3, 2018), 2668–2677. https://proceedings.mlr.press/v80/kim18d.html (2023). [Google Scholar]
- 140.Janik A, Dodd J, Ifrim G, Sankaran K & Curran K Interpretability of a deep learning model in the application of cardiac MRI segmentation with an ACDC challenge dataset in Medical imaging 2021: image processing 11596 (2021), 861–872. [Google Scholar]
- 141.Mincu D. et al. Concept-based model explanations for electronic health records in Proceedings of the Conference on Health, Inference, and Learning (2021), 36–46. [Google Scholar]
- 142.Ghorbani A, Wexler J, Zou JY & Kim B Towards automatic concept-based explanations. Advances in neural information processing systems 32 (2019). [Google Scholar]
- 143.Radford A. et al. Learning transferable visual models from natural language supervision in International conference on machine learning (2021), 8748–8763. [Google Scholar]
- 144.Kim C. et al. Transparent medical image AI via an image–text foundation model grounded in medical literature. Nature Medicine 30. Publisher: Nature Publishing Group, 1154–1165. issn: 1546-170X. https://www.nature.com/articles/s41591-024-02887-x (2024) (Apr. 2024). [Google Scholar]
- 145.Lu MY et al. A visual-language foundation model for computational pathology. Nature Medicine 30, 863–874 (2024). [Google Scholar]
- 146.Zhang S et al. A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs. NEJM AI 2. Publisher: Massachusetts Medical Society, AIoa2400640. 10.1056/AIoa2400640 (2025) (Jan. 1, 2025). [DOI] [Google Scholar]
- 147.Huang Z, Bianchi F, Yuksekgonul M, Montine TJ & Zou J A visual-language foundation model for pathology image analysis using medical twitter. Nature medicine 29, 2307–2316 (2023). [Google Scholar]
- 148.Ikezogwo W. et al. Quilt-1m: One million image-text pairs for histopathology. Advances in neural information processing systems 36 (2024). [Google Scholar]
- 149.Dunlap L. et al. Describing differences in image sets with natural language in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024), 24199–24208. [Google Scholar]
- 150.Li J, Li D, Savarese S & Hoi S BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models in International conference on machine learning (2023), 19730–19742. [Google Scholar]
- 151.Verma S, Dickerson J & Hines K Counterfactual explanations for machine learning: A review. arXiv preprint arXiv:2010.10596 2, 1 (2020). [Google Scholar]
- 152.Lang O. et al. Explaining in style: training a GAN to explain a classifier in stylespace in Proceedings of the IEEE/CVF International Conference on Computer Vision (2021), 693–702. [Google Scholar]
- 153.Singla S, Pollack B, Chen J & Batmanghelich K Explanation by Progressive Exaggeration in International Conference on Learning Representations (2020). https://openreview.net/forum?id=H1xFWgrFPS. [Google Scholar]
- 154.DeGrave AJ, Cai ZR, Janizek JD, Daneshjou R & Lee S-I Auditing the inference processes of medical image classifiers by leveraging generative AI and the expertise of physicians. Nature Biomedical Engineering. issn: 2157-846X. 10.1038/s41551-023-01160-9 (Dec. 28, 2023). [DOI] [Google Scholar]
- 155.Gadgil SU, DeGrave AJ, Daneshjou R & Lee S-I Discovering mechanisms underlying medical AI prediction of protected attributes. medRxiv, 2024-04 (2024). [Google Scholar]
- 156.Lang O. et al. Using generative AI to investigate medical imagery models and datasets. eBioMedicine 102. Publisher: Elsevier. issn: 2352-3964. https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(24)00110-5/fulltext (2024) (Apr. 1, 2024). [Google Scholar]
- 157.Molnar C. A guide for making black box models explainable. URL: https://christophm.github.io/interpretable-ml-book 2, 10 (2018). [Google Scholar]
- 158.Quinlan JR Induction of decision trees. Machine learning 1, 81–106 (1986). [Google Scholar]
- 159.Bennett KP & Blue JA Optimal decision trees. Rensselaer Polytechnic Institute Math Report 214, 128 (1996). [Google Scholar]
- 160.Hastie TJ Generalized additive models. Statistical models in S, 249–307 (2017). [Google Scholar]
- 161.Lou Y, Caruana R, Gehrke J & Hooker G Accurate intelligible models with pairwise interactions in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (2013), 623–631. [Google Scholar]
- 162.Semenova L, Rudin C & Parr R On the existence of simpler machine learning models in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (2022), 1827–1858. [Google Scholar]
- 163.Mohammadjafari S, Cevik M, Thanabalasingam M & Basar A Using ProtoPNet for Interpretable Alzheimer’s Disease Classification in Canadian AI (2021). [Google Scholar]
- 164.Koh PW et al. Concept bottleneck models in International conference on machine learning (2020), 5338–5348. [Google Scholar]
- 165.Lanchantin J, Wang T, Ordonez V & Qi Y General multi-label image classification with transformers in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2021), 16478–16488. [Google Scholar]
- 166.Jeyakumar JV et al. Automatic concept extraction for concept bottleneck-based video classification. arXiv preprint arXiv:2206.10129 (2022). [Google Scholar]
- 167.Sun X. et al. Interpreting deep learning models in natural language processing: A review. arXiv preprint arXiv:2110.10470 (2021). [Google Scholar]
- 168.Klimiene U. et al. Multiview concept bottleneck models applied to diagnosing pediatric appendicitis in 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH) (2022). [Google Scholar]
- 169.Wu C, Parbhoo S, Havasi M & Doshi-Velez F Learning optimal summaries of clinical time-series with concept bottleneck models in Machine Learning for Healthcare Conference (2022), 648–672. [Google Scholar]
- 170.Barnett AJ et al. A case-based interpretable deep learning model for classification of mass lesions in digital mammography. Nature Machine Intelligence 3, 1061–1070 (2021). [Google Scholar]
- 171.Yuksekgonul M, Wang M & Zou J Post-hoc concept bottleneck models. arXiv preprint arXiv:2205.15480 (2022). [Google Scholar]
- 172.Jin D. et al. What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams. Applied Sciences 11. issn: 2076-3417 (2021). [Google Scholar]
- 173.Li C. et al. LLaVA-med: training a large language-and-vision assistant for biomedicine in one day in Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA (Curran Associates Inc., Red Hook, NY, USA, 2024). [Google Scholar]
- 174.Gilbert S, Harvey H, Melvin T, Vollebregt E & Wicks P Large language model AI chatbots require approval as medical devices. Nature Medicine 29, 2396–2398. issn: 1546-170X. 10.1038/s41591-023-02412-6 (Oct. 1, 2023). [DOI] [Google Scholar]
- 175.Moor M. et al. Foundation models for generalist medical artificial intelligence. Nature 616. Publisher: Nature Publishing Group UK London, 259–265. https://www.nature.com/articles/s41586-023-05881-4 (2024) (2023). [Google Scholar]
- 176.Chowdhery A. et al. PaLM: Scaling Language Modeling with Pathways. Journal of Machine Learning Research 24, 1–113. http://jmlr.org/papers/v24/22-1144.html (2023). [Google Scholar]
- 177.Huang S, Mamidanna S, Jangam S, Zhou Y & Gilpin LH Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations Oct. 17, 2023. arXiv: 2310.11207. http://arxiv.org/abs/2310.11207 (2024). [Google Scholar]
- 178.Wei J. et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models in Advances in Neural Information Processing Systems (eds Koyejo S et al.) 35 (Curran Associates, Inc., 2022), 24824–24837. https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf. [Google Scholar]
- 179.Madsen A, Chandar S & Reddy S Are self-explanations from Large Language Models faithful? in Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024 (eds Ku L-W, Martins A & Srikumar V) (Association for Computational Linguistics, 2024), 295–337. 10.18653/v1/2024.findings-acl.19. [DOI] [Google Scholar]
- 180.Turpin M, Michael J, Perez E & Bowman SR Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting in Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023 (eds Oh A et al.) (2023). http://papers.nips.cc/paper_files/paper/2023/hash/ed3fea9033a80fea1376299fa7863f4a-Abstract-Conference.html. [Google Scholar]
- 181.Agarwal C, Tanneru SH & Lakkaraju H Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models Mar. 14, 2024. arXiv: 2402.04614. http://arxiv.org/abs/2402.04614 (2024). [Google Scholar]
- 182.Madsen A, Lakkaraju H, Reddy S & Chandar S Interpretability Needs a New Paradigm eprint: 2405.05386. 2024. https://arxiv.org/abs/2405.05386. [Google Scholar]
- 183.Peng Z. et al. Kosmos-2: Grounding Multimodal Large Language Models to the World https://www.microsoft.com/en-us/research/publication/kosmos-2-grounding-multimodal-large-language-models-to-the-world/ (June 2023). [Google Scholar]
- 184.Cunningham H, Ewart A, Riggs L, Huben R & Sharkey L Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600 (2023). [Google Scholar]
- 185.Bills S. et al. Language models can explain neurons in language models. https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html (Date accessed: 14.05.2023) (2023). [Google Scholar]
- 186.Templeton A. et al. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html (2024). [Google Scholar]
- 187.Conmy A, Mavor-Parker A, Lynch A, Heimersheim S & Garriga-Alonso A Towards Automated Circuit Discovery for Mechanistic Interpretability in Advances in Neural Information Processing Systems (eds Oh A et al.) 36 (Curran Associates, Inc., 2023), 16318–16352. https://proceedings.neurips.cc/paper_files/paper/2023/file/34e1dbe95d34d7ebaf99b9bcaeb5b2be-Paper-Conference.pdf. [Google Scholar]
- 188.Simon E & Zou J InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders. bioRxiv, 2024.11.14.623630. http://biorxiv.org/content/early/2024/11/15/2024.11.14.623630.abstract (Jan. 1, 2024). [Google Scholar]
- 189.Le N. et al. Interpretability analysis on a pathology foundation model reveals biologically relevant embeddings across modalities in ICML 2024 Workshop on Mechanistic Interpretability (2024). https://openreview.net/forum?id=briEoJFKof. [Google Scholar]
- 190.Zhong H. et al. Copyright Protection and Accountability of Generative AI: Attack, Watermarking and Attribution in Companion Proceedings of the ACM Web Conference 2023 WWW ’23: The ACM Web Conference 2023 (ACM, Austin TX USA, Apr. 30, 2023), 94–98. isbn: 978-1-4503-9419-2. https://dl.acm.org/doi/10.1145/3543873.3587321 (2024). [Google Scholar]
- 191.Liu Y. et al. Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment eprint: 2308.05374. 2024. https://arxiv.org/abs/2308.05374. [Google Scholar]
- 192.Huang Y. et al. TrustLLM: Trustworthiness in Large Language Models eprint: 2401.05561. 2024. https://arxiv.org/abs/2401.05561. [Google Scholar]
- 193.Lin C, Lu M, Kim C & Lee S-I Efficient Shapley Values for Attributing Global Properties of Diffusion Models to Data Group June 9, 2024. https://arxiv.org/abs/2407.03153v1 (2024). [Google Scholar]
- 194.Li D. et al. A Survey of Large Language Models Attribution version: 2. Dec. 14, 2023. arXiv: 2311.03731 http://arxiv.org/abs/2311.03731 (2024). [Google Scholar]
- 195.Gao L. et al. RARR: Researching and Revising What Language Models Say, Using Language Models May 31, 2023. arXiv: 2210.08726 http://arxiv.org/abs/2210.08726 (2024). [Google Scholar]
- 196.Guu K. et al. Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs Mar. 14, 2023. arXiv: 2303.08114 http://arxiv.org/abs/2303.08114 (2024). [Google Scholar]
- 197.Gao Y. et al. Retrieval-Augmented Generation for Large Language Models: A Survey eprint: 2312.10997. 2024. https://arxiv.org/abs/2312.10997. [Google Scholar]
- 198.Lewis P. et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33, 9459–9474 (2020). [Google Scholar]
- 199.Ancona M, Ceolini E, Öztireli C & Gross M Towards better understanding of gradient-based attribution methods for Deep Neural Networks in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings (OpenReview.net, 2018). https://openreview.net/forum?id=Sy21R9JAW. [Google Scholar]
- 200.Hooker S, Erhan D, Kindermans P-J & Kim B A Benchmark for Interpretability Methods in Deep Neural Networks in Advances in Neural Information Processing Systems 32 (eds Wallach H et al.) 9737–9748 (Curran Associates, Inc., 2019). http://papers.nips.cc/paper/9167-a-benchmark-for-interpretability-methods-in-deep-neural-networks.pdf. [Google Scholar]
- 201.Zhang J, Lin Z, Brandt J, Shen X & Sclaroff S Top-Down Neural Attention by Excitation Backprop in Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV (eds Leibe B, Matas J, Sebe N & Welling M) 9908 (Springer, 2016), 543–559. [Google Scholar]
- 202.Jin S. et al. AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system in four weeks. MedRxiv, 2020-03 (2020). [Google Scholar]
- 203.Jotterand F & Bosco C Keeping the “human in the loop” in the age of artificial intelligence: accompanying commentary for “correcting the brain?” by Rainey and Erden. Science and Engineering Ethics 26, 2455–2460 (2020). [DOI] [PubMed] [Google Scholar]
- 204.Bakken S. AI in health: keeping the human in the loop. Journal of the American Medical Informatics Association 30, 1225–1226. issn: 1527-974X. 10.1093/jamia/ocad091 (2025) (July 1, 2023). [DOI] [Google Scholar]
- 205.Kostick-Quenet KM & Gerke S AI in the hands of imperfect users. npj Digital Medicine 5, 197. issn: 2398-6352. 10.1038/s41746-022-00737-z (Dec. 28, 2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 206.Wu JT et al. AI accelerated human-in-the-loop structuring of radiology reports in AMIA Annual Symposium Proceedings 2020 (2021), 1305. [Google Scholar]
- 207.FDA. Transparency for Machine Learning-Enabled Medical Devices: Guiding Principles — FDA; https://www.fda.gov/medical-devices/software-medical-device-samd/transparency-machine-learning-enabled-medical-devices-guiding-principles (2025). [Google Scholar]
- 208.Marchetti MA et al. Prospective validation of dermoscopy-based open-source artificial intelligence for melanoma diagnosis (PROVE-AI study). npj Digital Medicine 6, 127. issn: 2398-6352. 10.1038/s41746-023-00872-1 (July 12, 2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 209.Dembrower K, Crippa A, Colón E, Eklund M & Strand F Artificial intelligence for breast cancer detection in screening mammography in Sweden: a prospective, population-based, paired-reader, non-inferiority study. The Lancet Digital Health 5, e703–e711. issn: 2589-7500. https://linkinghub.elsevier.com/retrieve/pii/S258975002300153X (2024) (Oct. 2023). [Google Scholar]
- 210. Elhakim MT et al. AI-integrated screening to replace double reading of mammograms: a population-wide accuracy and feasibility study. Radiology: Artificial Intelligence 6, e230529 (2024).
- 211. Sridharan S et al. Real-World evaluation of an AI triaging system for chest X-rays: A prospective clinical study. European Journal of Radiology 181 (2024).
- 212. Patel MR, Balu S & Pencina MJ. Translating AI for the Clinician. JAMA 332, 1701–1702. issn: 0098-7484. 10.1001/jama.2024.21772 (Nov. 26, 2024).
- 213. Steidl M, Felderer M & Ramler R. The pipeline for the continuous development of artificial intelligence models—Current state of research and practice. Journal of Systems and Software 199, 111615 (2023).
- 214. Feng J et al. Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. npj Digital Medicine 5, 66. issn: 2398-6352. 10.1038/s41746-022-00611-y (May 31, 2022).
- 215. Shah NH, Pfeffer MA & Ghassemi M. The Need for Continuous Evaluation of Artificial Intelligence Prediction Algorithms. JAMA Network Open 7, e2433009 (2024).
- 216. Beede E et al. A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA (Association for Computing Machinery, New York, NY, USA, 2020), 1–12. isbn: 978-1-4503-6708-0. 10.1145/3313831.3376718.
- 217. Vokinger KN, Feuerriegel S & Kesselheim AS. Continual learning in medical devices: FDA’s action plan and beyond. The Lancet Digital Health 3, e337–e338 (2021).
- 218. Bouderhem R. Shaping the future of AI in healthcare through ethics and governance. Humanities and Social Sciences Communications 11, 416. issn: 2662-9992. 10.1057/s41599-024-02894-w (Mar. 15, 2024).
- 219. Joshi G et al. FDA-approved artificial intelligence and machine learning (AI/ML)-enabled medical devices: an updated landscape. Electronics 13, 498 (2024).
- 220. US Food and Drug Administration. Artificial intelligence and machine learning in software as a medical device. US Food & Drug Administration, Silver Spring, MD, USA (2021).
- 221. Center for Devices and Radiological Health. Good Machine Learning Practice for Medical Device Development: Guiding Principles. FDA. https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles.
- 222. Meskó B & Topol EJ. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. npj Digital Medicine 6, 120. issn: 2398-6352. 10.1038/s41746-023-00873-0 (July 6, 2023).
- 223. Minssen T, Vayena E & Cohen IG. The Challenges for Regulating Medical Use of ChatGPT and Other Large Language Models. JAMA 330, 315–316. issn: 0098-7484. 10.1001/jama.2023.9651 (July 25, 2023).
- 224. Groeneveld D et al. OLMo: Accelerating the Science of Language Models in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024 (eds Ku L, Martins A & Srikumar V) (Association for Computational Linguistics, 2024), 15789–15809. 10.18653/v1/2024.acl-long.841.
- 225. Riedemann L, Labonne M & Gilbert S. The path forward for large language models in medicine is open. npj Digital Medicine 7, 339. issn: 2398-6352. 10.1038/s41746-024-01344-w (Nov. 27, 2024).
- 226. Kanter GP & Packel EA. Health Care Privacy Risks of AI Chatbots. JAMA 330, 311–312. issn: 0098-7484. 10.1001/jama.2023.9618 (July 25, 2023).
- 227. Hsieh C-Y et al. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes in Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023 (2023), 8003–8017. 10.18653/v1/2023.findings-acl.507.
- 228. Kim S et al. SqueezeLLM: Dense-and-Sparse Quantization in Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024 (2024). https://openreview.net/forum?id=0jpbpFia8m.
- 229. Dettmers T, Lewis M, Belkada Y & Zettlemoyer L. GPT3.int8(): 8-bit Matrix Multiplication for Transformers at Scale in Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022 (2022). http://papers.nips.cc/paper_files/paper/2022/hash/c3ba4962c05c49636d4c6206a97e9c8a-Abstract-Conference.html.
- 230. Blumenthal D & Patel B. The Regulation of Clinical Artificial Intelligence. NEJM AI 1, AIpc2400545 (Massachusetts Medical Society). 10.1056/AIpc2400545 (July 25, 2024).
- 231. Covert IC, Kim C, Lee S-I, Zou J & Hashimoto T. Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution in The Thirty-eighth Annual Conference on Neural Information Processing Systems (2024). https://openreview.net/forum?id=ZdWTN2HOie.
- 232. Li W & Yu Y. Faster Approximation of Probabilistic and Distributional Values via Least Squares in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=lvSMIsztka.
- 233. Park SM, Georgiev K, Ilyas A, Leclerc G & Madry A. TRAK: Attributing Model Behavior at Scale in Proceedings of the 40th International Conference on Machine Learning (eds Krause A et al.) 202 (PMLR, July 2023), 27074–27113. https://proceedings.mlr.press/v202/park23c.html.
- 234. M MM, T. R M, V VK & Guluwadi S. Enhancing brain tumor detection in MRI images through explainable AI using Grad-CAM with Resnet 50. BMC Medical Imaging 24, 107. issn: 1471-2342. 10.1186/s12880-024-01292-7 (May 11, 2024).
- 235. Panwar H et al. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos, Solitons & Fractals 140, 110190 (2020).
- 236. Gildenblat J & contributors. PyTorch library for CAM methods. https://github.com/jacobgil/pytorch-grad-cam (2021).
- 237. Erion G, Janizek JD, Sturmfels P, Lundberg SM & Lee S-I. Improving performance of deep learning models with axiomatic attribution priors and expected gradients. Nature Machine Intelligence 3, 620–631 (2021).
- 238. Chen C et al. This looks like that: deep learning for interpretable image recognition. Advances in Neural Information Processing Systems 32 (2019).
- 239. Kim Y, Wu J, Abdulle Y & Wu H. MedExQA: Medical Question Answering Benchmark with Multiple Explanations (eds Demner-Fushman D, Ananiadou S, Miwa M, Roberts K & Tsujii J) (Association for Computational Linguistics, Bangkok, Thailand, Aug. 2024), 167–181. https://aclanthology.org/2024.bionlp-1.14/. 10.18653/v1/2024.bionlp-1.14.
- 240. Lindsey J et al. On the Biology of a Large Language Model. Transformer Circuits. https://transformer-circuits.pub/2025/attribution-graphs/biology.html.
- 241. Meng K, Bau D, Andonian A & Belinkov Y. Locating and Editing Factual Associations in GPT in Advances in Neural Information Processing Systems (eds Koyejo S et al.) 35 (Curran Associates, Inc., 2022), 17359–17372. https://proceedings.neurips.cc/paper_files/paper/2022/file/6f1d43d5a82a37e89b0665b33bf3a182-Paper-Conference.pdf.
- 242. Zakka C et al. Almanac—retrieval-augmented language models for clinical medicine. NEJM AI 1, AIoa2300068 (2024).
- 243. Kim J, Hur M & Min M in Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing 1293–1295 (Association for Computing Machinery, New York, NY, USA, 2025). isbn: 9798400706295. 10.1145/3672608.3707749.
