Frontiers in Public Health. 2026 Apr 2;14:1776922. doi: 10.3389/fpubh.2026.1776922

Uncloaking the black-box: the need for explainable artificial intelligence in clinical microbiology and infectious diseases applications

Sreya Pulakkat Warrier 1, Venkatesh Narasimhan 1, Eline Meijer 2, Yukino Gütlin 2, Oliver Nolte 2, Balaji Veeraraghavan 1, Adrian Egli 2,*
PMCID: PMC13082983  PMID: 42007334

Abstract

Antimicrobial resistance and emerging infectious diseases remain significant challenges for global health, driving a need for advanced technological solutions. Artificial Intelligence (AI) has expanded opportunities in clinical microbiology, infectious diseases, and public health by harnessing vast, structured datasets. Despite impressive analytical capabilities, the clinical integration of AI-based applications is hindered by their opacity. This “black-box” aspect undermines adoption into healthcare workflows. Explainable AI (XAI) methods, including intrinsically interpretable models and post-hoc interpretability tools, such as SHAP, LIME, and Grad-CAM, can address these transparency challenges. This narrative review is intended to be a primer for the interested clinician. It systematically evaluates recent advancements in XAI in the context of clinical applications for clinical microbiology, infectious diseases, and public health. We further discuss the ethical and regulatory landscape shaping AI adoption, including the critical role of open, quality-controlled data, robust performance metrics, and clear interpretability to ensure safe and effective clinical implementation. Lastly, we propose future directions, emphasizing interdisciplinary collaboration, international data-sharing initiatives, and tailored AI literacy training to facilitate trustworthy, equitable, and impactful use of AI in clinical microbiology and infectious diseases.

Keywords: antimicrobial resistance, artificial intelligence, deep learning, explainable AI/XAI, genomics, infectious diseases, LIME, machine learning

Introduction

Around the globe, antimicrobial-resistant and virulent pathogens continue to drive morbidity, mortality, and escalating healthcare costs, pose ongoing public health challenges, and impact modern medicine (1, 2). Despite transformative advances in hygiene, antibiotics, and vaccines, infectious diseases remain a major cause of childhood mortality worldwide and affect all age groups, ranking alongside cardiovascular diseases and cancer (3). Recent surveillance efforts during the COVID-19 pandemic (4) and the Mpox (5) and Corynebacterium diphtheriae (6) outbreaks highlight the crucial role of open data in diagnostics, infection control, and enhancing insights into pathogen biology (7, 8). The exponential growth of structured, open healthcare and laboratory datasets and of high-performance computing has expanded opportunities for artificial intelligence (AI)-based applications in clinical microbiology, infectious diseases, and public health (9, 10). To effectively implement these technologies, a robust, infectious disease-focused data ecosystem is necessary. Adherence to high-quality, curated, and FAIR [Findable, Accessible, Interoperable, and Reusable (7, 11)] data principles is essential to maximise their utility for domain-specific clinical and research purposes. In our case, AI models trained on large-scale microbiological diagnostic data can be adapted to a wide range of tasks, even though they were not explicitly trained for each specific application (12, 13). However, a critical shortcoming of the ongoing technological revolution in medicine is the lack of explainability of highly complex AI applications and their underlying models. This results in reluctance to implement such algorithms in diagnostic and therapeutic workflows, especially when linked to human healthcare outcomes (Figure 1).

Figure 1.

Flowchart illustration showing how patient, drug, and pathogen data are used in AI-based clinical decision support for diagnostics. Two outcomes are depicted: black box AI, which leads to clinician confusion, and explainable AI, which facilitates clinician understanding.

Integration of artificial intelligence (AI) into diagnostic and antimicrobial stewardship workflows. Clinical and laboratory data encompassing patient characteristics, drug information, and pathogen attributes serve as foundational input adhering to FAIR (Findable, Accessible, Interoperable, Reusable) data principles. The AI system operates across multiple stages of diagnostic decision-making, including pre-analytics (guiding appropriate diagnostic tests), analytics (identifying pathogens, performing antimicrobial susceptibility testing [AST], and outbreak detection), post-analytics (interpreting results and suggesting subsequent clinical actions), and decision support (enhancing antibiotic stewardship). The output of AI models can follow two distinct pathways: (i) “Black box” models, which lack transparency and may confuse clinicians (upper box), and (ii) explainable AI models, providing interpretable results that empower physicians, facilitating clear decision-making and confident clinical actions (lower box).

For this narrative review, we define AI as the capability of computer systems to perceive, reason, learn, plan, and predict in ways analogous to those of medical specialists (14, 15). AI models leverage historical data to identify patterns and can, e.g., predict antimicrobial resistance directly from MALDI-TOF mass spectra (16), help to discover new antibiotics (17), support clinical decision making in, e.g., the treatment of sepsis (18), or predict the outcome of sepsis (19, 20). Increasingly, AI not only expands medical knowledge but also significantly enhances our understanding of fundamental mechanisms in infection biology (21–23). In the following sections, we will summarize the status quo and discuss recent developments of explainable AI (XAI) with a focus on infectious diseases.

The studies were identified through a literature review covering January 2000 to July 2025 using combinations of keywords such as “microbiology,” “infectious diseases,” “pathogen,” “antimicrobial resistance,” “genomics,” “metagenomics,” “artificial intelligence,” “machine learning,” “deep learning,” “explainable AI/XAI,” “SHAP,” and “LIME” on PubMed/MEDLINE, Google Scholar and traditional Google searches, with a focus on papers published post-2021, particularly in 2024 and 2025.

Fundamentals of artificial intelligence

AI encompasses a broad field dedicated to creating systems capable of emulating intelligent human behaviour without direct human intervention (15). AI’s origins trace back to the 1950s, with Alan Turing’s foundational work on machine intelligence and the 1956 Dartmouth Conference, which established AI as a scientific field aimed at replicating and surpassing human cognition. Recently, advanced large language models (LLMs) have entered the mainstream, highlighting their potential within diagnostic laboratories (24).

Learning strategies. AI models are highly effective in rapidly categorizing data. These models generate predictions by learning from prior experiences captured in data inputs and typically rely on supervised, unsupervised, or reinforcement learning strategies (14, 25). In supervised learning, the ground truth is known, which resembles a typical teacher-student relationship. Supervised learning algorithms utilize labelled datasets to predict outcomes or classify inputs, e.g., Gram-staining categories and morphologies of bacteria in microscopy. In contrast, unsupervised learning operates without labelled data; methods such as K-means clustering and hierarchical clustering identify intrinsic data patterns. Together with anomaly detection methods and principal component analysis, these techniques are notably effective in biomarker discovery contexts, e.g., grouping of patients based on gene expression upon SARS-CoV-2 infection (26). Finally, reinforcement learning is a training strategy in which an agent learns an optimal sequence of actions by interacting with an environment and receiving scalar rewards that signal success or failure; the objective is to maximise the long-term discounted return, typically formalised as a Markov decision process (27). In the clinical domain, reinforcement learning is uniquely suited to sequential decision-making problems where actions taken now influence downstream patient outcomes, such as in sepsis management (18) or learning drug cycling policies to reduce evolution of antimicrobial resistance (28).
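The discounted return that a reinforcement learning agent maximises can be made concrete with a few lines of Python. The following is a minimal sketch, not taken from the cited studies: the function name, the discount factor of 0.9, and the example reward sequence are illustrative assumptions.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Return G = sum over t of gamma**t * r_t for one episode of scalar rewards."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Iterate backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: no reward for three steps, then +1 for a good outcome.
episode = [0.0, 0.0, 0.0, 1.0]
print(discounted_return(episode, gamma=0.9))
```

A discount factor below 1 makes a reward that arrives later count less than the same reward received immediately, which is one way to encode that earlier clinical improvement is preferred.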

Accuracy vs. interpretability. Historically, symbolic AI programs from the 1950s–1970s inherently provided transparency through clear, expert-authored “if-then” rules. However, the shift towards statistical learning methods and neural networks in the 1990s improved predictive accuracy at the expense of interpretability, increasing complexity and obscuring decision-making logic (29). AI models therefore often involve a trade-off between accuracy and interpretability (Figure 2).

Figure 2.

Graphic comparing predictive performance versus interpretability of machine learning models, illustrating that black box models like deep neural networks have high predictive performance but low interpretability, glass box models like linear regression are highly interpretable but may have lower performance, and grey box models such as random forests balance both aspects.

The accuracy and interpretability tradeoff in machine learning models. The figure illustrates the inverse relationship typically observed between model accuracy (y-axis) and interpretability (x-axis). Highly complex models such as deep neural networks (DNN), support vector machines with radial basis function kernels (SVM-RBF), and extreme gradient boosting (XGBoost) often achieve superior accuracy but at the expense of interpretability (“Black box” models, top-left). Intermediate models like random forests and fuzzy rule-based systems offer moderate interpretability and accuracy (“Grey box” models, middle). Simpler models such as decision trees, logistic regression, and linear regression provide high interpretability, making them preferable in contexts where transparent decision-making and explainability are critical, albeit often with lower accuracy (“Glass box” models, lower-right).

Traditional machine learning models. Traditional machine learning (ML) methods refer to a set of well-established algorithms and models that are used to analyse and make predictions based on structured data. These methods typically rely on statistical techniques and mathematical models to learn patterns and relationships from data, often with a focus on interpretability and simplicity. Traditional ML methods involve more straightforward algorithms like linear regression, decision trees, support vector machines, k-nearest neighbours, random forests, and gradient boosting (30).

Neural networks and deep learning. The concept of neural networks has been around for decades but only gained substantial traction with the rise of increased computational power (31). Deep learning is a branch of AI in which data is processed through multiple layers of interconnected neurons. Each neuron is a simple computational unit that combines input values and passes the result forward. Together, these layers progressively extract more complex patterns from the data. For example, in image analysis, early layers may detect simple features such as edges or shapes, while later layers combine this information into more complex patterns, such as bacterial colony morphology. The complexity of these models is usually described by the number of layers and neurons (nodes), often summarized as the number of parameters. Different types of neural networks exist, referred to as different architectures, which differ in how their neurons are organized and how they process information. One type of such architecture is Recurrent Neural Network (RNN), specifically designed to handle sequential data. Unlike traditional neural networks that process data independently, RNNs have “memory” or hidden states that allow them to remember previous inputs, making them suitable for tasks like language translation, natural language processing, sentiment analysis, and time series forecasting. RNNs were used in the discovery of synthetic antimicrobial peptides (32). Another common architecture is Convolutional Neural Network (CNN), particularly well suited for image analysis. CNNs inherently integrate feature extraction processes, e.g., used for the identification of bacterial species through colony morphology on agar plates (33) and rapid Gram-stain characterization in positive blood cultures (34).
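To illustrate how a neuron combines input values and passes the result forward, the sketch below builds a tiny two-layer network in plain Python. The weights, the sigmoid activation, and the layer sizes are hand-picked assumptions for illustration; a real network would learn its weights from data and would be far larger.

```python
import math
from typing import List


def neuron(inputs: List[float], weights: List[float], bias: float) -> float:
    """One artificial neuron: weighted sum of inputs, then a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs: List[float], weight_rows: List[List[float]],
          biases: List[float]) -> List[float]:
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical, hand-picked weights (a trained network would learn these).
x = [0.5, -1.0, 2.0]  # three input values
hidden = layer(x, [[0.1, 0.4, -0.2], [0.7, -0.3, 0.5]], [0.0, 0.1])
output = layer(hidden, [[1.2, -0.8]], [0.05])

# Parameter count = all weights plus all biases across both layers.
n_params = (3 * 2 + 2) + (2 * 1 + 1)
print(hidden, output, n_params)
```

Even this toy network has 11 parameters; the models discussed in this review have millions or more, which is one reason their decision logic is hard to inspect directly.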

Foundation models and multi-modal models. Foundation models are based on the transformer architecture, which allows them to learn from vast amounts of data through a process called self-supervised learning. In the case of LLMs, the model predicts the next word in a sentence or sequence of data, enabling it to understand context and relationships without needing explicitly labelled data. LLMs have a broad spectrum of applications in clinical microbiology and infectious diseases (24), e.g., the interpretation of disk diffusion diameters to predict the underlying molecular antimicrobial resistance mechanisms (35). Additionally, multi-modal approaches that combine different types of inputs are increasingly being used to solve complex, multi-dimensional problems (12). These approaches are opening new avenues for research and clinical applications, especially in areas that require combining multiple data types, ultimately enabling AI systems to perform tasks that were previously considered too complex for conventional models.

Accuracy vs. interpretability trade-off. A critical ongoing debate involves balancing model interpretability with predictive accuracy [Figure 2, (36, 37)]. Simple, interpretable models like decision trees typically offer clear logic but are often surpassed in predictive performance by complex ensemble models. However, recent studies show interpretable models can achieve competitive performance even in critical clinical settings. For instance, a gradient-boosting model combined with SHapley Additive exPlanations (SHAP) values effectively predicted sepsis in emergency department triage, matching deeper neural networks in predictive performance, while simultaneously clarifying the influence of critical clinical factors such as heart rate increases and hypotension (38). The choice of model complexity must carefully weigh incremental accuracy gains against ethical and practical considerations, such as biases inherent in opaque models. For example, correcting predictive biases across socioeconomic groups may negate the slight accuracy advantage of complex models, making simpler, interpretable options safer, more transparent, and ethically preferable (39, 40). For these approaches, clear methods to assess their reliability and limits are essential. Despite their high predictive accuracy, the rapid evolution and complexity of AI models often result in opaque, “black-box” decision-making processes. This opacity fosters clinician skepticism and mistrust, limiting broader clinical adoption (41).

Explainable AI

Explainability complements interpretability by clarifying the reasons behind a prediction. XAI bridges this transparency gap by offering understandable explanations that humans can interpret, trust, and troubleshoot. Enhanced explainability fosters clinician trust and facilitates safer incorporation into routine clinical practice (41).

Strategies for enhanced interpretability. Modern XAI methodologies enhance interpretability through three principal approaches: (i) intrinsic interpretability, in which models such as decision trees or rule-based classifiers inherently limit complexity to maintain clear, transparent logic; (ii) post-hoc explanations; and (iii) hybrid architectures (29).

Intrinsic interpretability. Intrinsic interpretability refers to models that are inherently interpretable due to their design. These models, such as decision trees or rule-based classifiers, intentionally limit complexity to maintain clear and transparent decision-making logic. The principle of Occam’s Razor applies here, with simpler models that provide sufficient explanatory power often being preferable when their predictive performance is comparable to more complex alternatives (42). For example, sparse decision trees offer a level of clarity that can be especially beneficial in fields like malaria risk assessment, where the decision process needs to be actionable (43).
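A sparse decision tree is interpretable because every prediction can be read off as a short chain of rules. The sketch below hand-codes a depth-2 tree that returns both a risk label and the rules that fired. The feature names, the splits, and the risk labels are hypothetical and serve only to show the mechanism; they are not derived from the cited malaria study and are not clinical guidance.

```python
from typing import Dict, List, Tuple


def glass_box_risk(patient: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Depth-2 decision tree that returns a label and the rules that fired."""
    path: List[str] = []
    if patient["fever"]:
        path.append("fever = yes")
        if patient["travel_to_endemic_area"]:
            path.append("travel_to_endemic_area = yes")
            return "high", path
        path.append("travel_to_endemic_area = no")
        return "moderate", path
    path.append("fever = no")
    return "low", path


# Hypothetical patient record with two illustrative binary features.
label, reasons = glass_box_risk({"fever": True, "travel_to_endemic_area": True})
print(label, "because", " and ".join(reasons))
```

The returned rule path is the explanation itself, so no separate post-hoc tool is needed to justify the output.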

Post-hoc explanations. For more complex models, such as neural networks or ensemble models, post-hoc explanation methods are employed to clarify how predictions are made after the model has been trained. Tools like SHAP (44) and LIME [Local Interpretable Model-agnostic Explanations, (45)] quantify the influence of individual features on model predictions and visually present these insights. These methods do not alter the model itself but provide an understandable breakdown of its decision-making process (Figure 3).
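The idea behind SHAP, attributing a prediction to features by averaging each feature's marginal contribution over all feature coalitions, can be sketched by brute force for a handful of features. The code below is an illustrative assumption-laden sketch, not the SHAP package: it enumerates every coalition (feasible only for very few features), replaces "absent" features with a single baseline value (one common simplification), and uses a hypothetical toy risk score rather than a validated model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(model: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating all coalitions of features."""
    features = list(instance)
    n = len(features)

    def value(subset: set) -> float:
        # Features in the coalition take the patient's value; others the baseline.
        x = {f: (instance[f] if f in subset else baseline[f]) for f in features}
        return model(x)

    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(x: Dict[str, float]) -> float:
    # Hypothetical score with an interaction term; "age" is deliberately unused.
    return (0.01 * x["heart_rate"] + 0.5 * x["hypotension"]
            + 0.005 * x["heart_rate"] * x["hypotension"])


patient = {"heart_rate": 120.0, "hypotension": 1.0, "age": 70.0}
reference = {"heart_rate": 80.0, "hypotension": 0.0, "age": 50.0}
print(shapley_values(toy_risk, patient, reference))
```

Two properties are worth checking on such a toy case: the attributions sum to the difference between the patient's prediction and the baseline prediction, and a feature the model never uses receives an attribution of zero.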

Figure 3.

Infographic compares SHAP and LIME methods for AI model interpretation using an example of bacterial colony prediction for Escherichia coli. SHAP visualizes feature importance through systematic addition of features and measuring prediction changes, while LIME generates and perturbs samples, calculates distances, predicts with a black box model, then fits a weighted linear model to highlight important features. Key steps, visualizations of feature contributions, and resulting importance rankings are shown for both techniques.

Application of explainability measures in clinical decision support systems. Patient data, collected from clinical records or diagnostic tests, serve as inputs for an AI predictive model. The AI model processes these inputs and generates predictive outcomes. To ensure clinical utility, explainability measures are applied, elucidating feature importance through interpretable visualizations, such as bar charts displaying importance scores of different clinical parameters (e.g., biomarkers, symptoms, demographics). Crucially, validation of the model includes both internal (local) validation to assess performance consistency within the development dataset and external validation using independent datasets or clinical centers, enhancing generalizability and reliability. The resulting interpretable predictions facilitate informed decision-making by physicians, who engage in iterative feedback loops, refining the AI model through practical insights and ongoing validation processes.
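The LIME steps summarised for Figure 3 (perturb samples, compute distances, query the black-box model, fit a weighted linear model) can be sketched with NumPy. This is a simplified illustration rather than the LIME package itself: the black-box function, the Gaussian perturbation scale, the kernel width, and the sample count are all assumptions chosen for the example.

```python
import numpy as np


def black_box(X: np.ndarray) -> np.ndarray:
    """Stand-in for an opaque model: a nonlinear function of two features."""
    return 1.0 / (1.0 + np.exp(-(X[:, 0] ** 2 - 2.0 * X[:, 1])))


def lime_like_explanation(predict, x0: np.ndarray, n_samples: int = 2000,
                          scale: float = 0.3, kernel_width: float = 0.5,
                          seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # 1. Perturb: sample points around the instance to be explained.
    X = x0 + rng.normal(0.0, scale, size=(n_samples, x0.size))
    # 2. Query the black box on the perturbed samples.
    y = predict(X)
    # 3. Weight samples by proximity to x0 (exponential kernel on distance).
    dist = np.linalg.norm(X - x0, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit a weighted linear model: scale rows by sqrt(w), then least squares.
    A = np.hstack([np.ones((n_samples, 1)), X - x0])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[1:]  # local slopes, one per feature, valid only near x0


x0 = np.array([1.0, 0.5])
print(lime_like_explanation(black_box, x0))
```

The returned slopes describe the model only in the neighbourhood of `x0`; a different patient can yield a different ranking of features, which is what makes the explanation local.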

Hybrid Architectures. Hybrid AI architectures combine symbolic AI, which uses explicit rules and logical reasoning to support interpretability, with sub-symbolic AI like deep learning, which learns patterns from data and makes accurate predictions. For example, in medical microbiology, a hybrid AI could use a deep neural network to analyse images of bacterial colonies on agar plates, while a rule-based system interprets the findings by matching visual traits (e.g., round shape, yellow colour) to known species. This approach allows models to maintain interpretability while benefiting from the high performance of deep learning (29).
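The division of labour in such a hybrid system can be sketched as a perception step followed by a rule lookup. In the sketch below, `stub_network` merely stands in for a trained deep neural network and returns fixed trait probabilities; the rule table, the species placeholders, and the confidence threshold are hypothetical and do not represent a validated identification key.

```python
from typing import Dict, List, Tuple

# Symbolic layer: hypothetical, hand-written rules (placeholders only).
RULES: Dict[str, Dict[str, str]] = {
    "species_A": {"shape": "round", "colour": "yellow"},
    "species_B": {"shape": "irregular", "colour": "white"},
}


def stub_network(image_id: str) -> Dict[str, Dict[str, float]]:
    """Stand-in for a deep neural network that scores visual colony traits."""
    return {
        "shape": {"round": 0.92, "irregular": 0.08},
        "colour": {"yellow": 0.85, "white": 0.15},
    }


def hybrid_identify(image_id: str, threshold: float = 0.7) -> Tuple[str, List[str]]:
    traits = stub_network(image_id)
    # Sub-symbolic output becomes symbolic facts only when confident enough.
    facts: Dict[str, str] = {}
    for trait, probs in traits.items():
        value, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            facts[trait] = value
    # Rule-based layer: match the facts against the explicit rules.
    for species, required in RULES.items():
        if all(facts.get(t) == v for t, v in required.items()):
            reasons = [f"{t} = {v} (p = {traits[t][v]:.2f})"
                       for t, v in required.items()]
            return species, reasons
    return "unresolved", ["no rule matched the confident traits"]


print(hybrid_identify("plate_001"))
```

The rule layer yields a human-readable justification for each call and can decline to answer when the network is not confident, while the network handles the image interpretation that rules alone cannot.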

Selection of suitable explainability methods. Selection of suitable explainability methods generally depends on three dimensions: transparency, explanatory scope, and data modality (Supplementary Table 1). Transparency distinguishes intrinsically interpretable (“glass-box”) from post-hoc methods clarifying complex models retrospectively. Intrinsically interpretable models provide immediate transparency but may slightly sacrifice accuracy (36). Post-hoc methods, however, maintain high accuracy while restoring interpretability post-training (46). Effective interpretability further depends on matching techniques with data modalities. For instance, Gradient-weighted Class Activation Mapping (Grad-CAM) excels in visualizing medical images, highlighting relevant regions in chest radiographs; SHAP efficiently attributes feature importance in structured tabular data like antibiotic resistance datasets, and LIME-text effectively interprets clinical triage text data (47, 48). Integrating these considerations ensures robust and transparent AI implementation in infectious diseases contexts.

Potential use cases for explainable AI

XAI significantly enhances transparency by grounding AI-generated recommendations in clinically relevant factors, promoting informed decision-making, and reducing errors and biases. This transparency is critical for managing complex conditions such as sepsis or acute kidney injury, where clear insights into AI logic can mitigate uncertainties inherent in complex pathophysiology, e.g., during sepsis (18). Although explainability cannot eliminate random data errors, it effectively identifies and reduces systematic biases, aligning AI outputs closely with clinical judgment and fostering clinician trust. XAI may also support identifying risk factors linked to specific types of infection (Supplementary Table 2).

Clinical decision support systems (CDSS). AI-based CDSS hold promise to enhance medical decision-making by integrating patient data, guidelines, and expert knowledge, thus providing real-time diagnostic assistance and highlighting overlooked features (23, 49). Despite their potential, regulatory requirements such as the European Union’s (EU) In Vitro Diagnostic Regulation (IVDR), the EU AI Act, and regulations from the Food and Drug Administration (FDA), as well as insufficient validation in clinical trials, currently limit routine implementation in infectious disease management. Most studies use retrospective designs with the corresponding limitations. For example, rule-based models predicting severe respiratory infections reached 95.4% accuracy (50), but this result was derived from a single-center retrospective cohort (n = 485) without external validation; explainable XGBoost algorithms that predicted influenza mortality with interpretable SHAP insights (51) used a retrospective multicenter dataset from Taiwan (n = 336) but lacked prospective validation. Prospective randomized controlled trials of AI-based CDSS tools remain rare. One example is a randomized trial of AI-driven sepsis care (52) (n = 142), which demonstrated improved outcomes across two intensive care units but was still limited to a single healthcare center.

Epidemiology and clinical risk assessment. AI approaches have substantially enhanced traditional epidemiological models by allowing the integration of complex datasets for improved forecasting, contact tracing, and outbreak management (8, 53). Examples include Naïve Bayes prediction of COVID-19 outcomes (54) (retrospective analysis of countries’ aggregated data, limited by data quality variability across regions) and ensemble models accurately forecasting measles outbreaks by transparently prioritizing risk factors (55) (using US county-level surveillance data, trained on just 2 years and tested on 1). Additionally, wastewater viral surveillance correlated with clinical Mpox infections demonstrates reliable AI-based monitoring (56) (study from a single Chinese hospital with n = 283 total wastewater samples). In clinical settings, AI rapidly processes extensive data for effective risk stratification, as demonstrated by high-accuracy emergency triage models (57) (single-center retrospective study, n = 276,164 patients) and predictive models for COVID-19 ICU admissions (58) (used the MIMIC-III database with no external validation). These examples emphasize the necessity of external validation and explainability for generalizability and trust.

Trust in AI: transparency and explainability

Various aspects contribute to mistrust in AI applications. Firstly, the data sources used to train the various models are often biased or sometimes unknown. Secondly, the rapid evolution of this technology has generated numerous predictive AI models, and versions thereof, with increasingly complex, hard-to-comprehend architectures. The growing number of models makes it very hard to follow their versioning. From a diagnostician’s point of view, and with respect to approval and quality control (QC), knowledge of the version in use and of its differences from the previous one is needed. While often effective in terms of predictive accuracy, these sophisticated models typically lack transparency, functioning as “black box” systems whose internal decision-making processes remain opaque (59). Such opacity contributes significantly to skepticism and mistrust among clinicians, hindering wider adoption into routine clinical practice (Figure 1). Transparency has emerged as a fundamental requirement for AI adoption in healthcare, particularly in life-critical clinical decision-making scenarios. Mindful and ethically responsible usage and deployment of data and AI algorithms require careful attention to interpretability and fairness by proactively addressing biases and ensuring accountability in clinical settings (60, 61).

Open, quality-controlled data. For almost all commercial LLMs, the data source used for training is not transparently disclosed. For medical applications, most publications rely primarily on retrospective datasets from single institutions, thus introducing potential biases related to geography, specific patient populations, and institutional practices. This becomes particularly evident for antimicrobial resistance (AMR) data, where prevalence data are not equally well documented (62), or during the COVID-19 pandemic, where molecular diagnostic testing and surveillance rates were significantly lower in low- and middle-income countries than in high-income countries (63). Data used for AI applications should follow minimal quality standards (64).

Performance metrics. A broad range of performance metrics can be used to compare models. The info box summarizes the key metrics commonly used (Appendix). Metrics allow us to quantify how closely predictions match the real-world, thereby providing a basis for trust in the model. For binary classification tasks, e.g., predicting whether a bacterial isolate is susceptible or resistant to an antibiotic, common metrics include sensitivity (true positive rate, or recall), specificity (true negative rate), precision (positive predictive value of a “positive” prediction), and overall accuracy. These values are derived from a contingency table (confusion matrix) of predicted vs. actual outcomes, much like the evaluation of laboratory diagnostic tests in clinical microbiology (65).
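The metrics named above follow directly from the four cells of the confusion matrix. The sketch below computes them in plain Python for a small, hypothetical set of isolates (1 = resistant, 0 = susceptible); the example labels and the choice to return NaN for an undefined ratio are assumptions made for illustration.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    # Undefined ratios (empty denominator) are reported as NaN in this sketch.
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics; 1 = resistant (positive), 0 = susceptible."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # resistant, called resistant
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # susceptible, called susceptible
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # susceptible, called resistant
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # resistant, missed
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


# Hypothetical results for ten isolates.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
calls = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(truth, calls))
```

In this example one resistant isolate is missed and one susceptible isolate is called resistant, so sensitivity and precision are both 3/4 while specificity is 5/6.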

In assessing classifier performance, a receiver operating characteristic (ROC) curve is often used to visualize the trade-off between sensitivity and 1–specificity across various decision thresholds. The area under the ROC curve (AUC) provides a single summary measure of discriminative ability. An AUC of 0.5 indicates discrimination no better than chance, whereas an AUC of 1.0 indicates a perfectly performing classifier (66). For continuous predictions (regression tasks) such as estimating an exact minimum inhibitory concentration (MIC) from genomic data, different metrics apply. Instead of true/false positives, these models are judged by how close their numeric predictions are to the true values by using error measures like mean squared error (MSE) or mean absolute error, and sometimes by reporting the fraction of predictions within an acceptable range of the actual value (67). Such metrics help determine if an AI model’s continuous output, e.g., a predicted MIC, is consistently near the ground truth.
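Both kinds of metric can be computed in a few lines. The sketch below obtains the AUC from its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative) and evaluates hypothetical MIC predictions. Working on a log2 scale and using a tolerance of one two-fold dilution as the "acceptable range" are assumptions made here for illustration, as are all example values.

```python
import math
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mic_regression_metrics(true_mic: Sequence[float],
                           pred_mic: Sequence[float]) -> Dict[str, float]:
    """Error measures on the log2 scale (assumption: MICs in two-fold dilutions)."""
    errors = [math.log2(p) - math.log2(t) for t, p in zip(true_mic, pred_mic)]
    n = len(errors)
    return {
        "mse_log2": sum(e * e for e in errors) / n,
        "mae_log2": sum(abs(e) for e in errors) / n,
        # Assumed tolerance: prediction within one two-fold dilution of the truth.
        "within_one_dilution": sum(abs(e) <= 1.0 for e in errors) / n,
    }


# Hypothetical classifier scores and MIC values (mg/L).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
print(mic_regression_metrics([1, 2, 4, 8], [1, 4, 4, 2]))
```

In the MIC example, one prediction is off by two dilution steps; it dominates the squared error while the within-tolerance fraction stays at 3 of 4, which shows why reporting several error measures side by side is informative.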

Performance metrics like sensitivity, specificity, precision, accuracy, and AUC quantify AI model reliability and indicate areas needing improvement. High sensitivity and precision in antibiotic resistance classifiers increase trust that resistant strains are rarely missed and that resistance calls are accurate. However, metrics can mislead if derived from small, biased, or unbalanced datasets, which emphasizes the necessity of external validation on diverse, representative data to ensure clinical utility and fairness (65).

Ethical and regulatory aspects of explainable AI

Integrating AI into clinical practice involves navigating ethical and regulatory challenges. Ethical considerations focus heavily on transparency, fairness, and autonomy. Regulations such as the EU’s General Data Protection Regulation (GDPR) mandate clear disclosure of automated decision-making, protecting patient autonomy and ensuring informed consent (68). However, the opacity of commercially developed AI datasets complicates transparency and limits comprehensive bias assessments. Concepts like Lipton’s “contestability” and Ploug and Holm’s advocacy for clear, actionable patient information underscore the ethical necessity of empowering patients to challenge and engage meaningfully with AI-driven clinical recommendations (69). Ensuring fairness and robustness through representative and openly accessible datasets is crucial to mitigate biases that could disproportionately impact vulnerable populations and exacerbate health disparities during infectious disease outbreaks (29).

Regulatory frameworks for AI in healthcare further emphasize the need for explicit standards regarding transparency and explainability. The FDA and the EU’s Medical Device Regulation (MDR) currently provide limited guidance on AI interpretability, which complicates the deployment of trustworthy systems. Initiatives such as the FDA’s Total Product Lifecycle framework are evolving to support ongoing validation and monitoring, yet regulatory ambiguity persists (70). Transparent AI systems are essential to enable clinicians to clearly communicate model details and safeguard informed consent and patient autonomy (71).

In infectious diseases management, explicit regulatory standards and rigorous ethical compliance are especially critical. AI-driven predictions significantly impact both individual patient management and public health responses, which is why transparency is important to prevent clinical errors and alert fatigue that may undermine trust, as demonstrated by performance issues in the Epic Sepsis Model (72). Robust, transparent, and ethically sound AI ensures effective surveillance, equitable resource allocation, and improved outcomes, especially in managing infectious diseases emergencies (29, 71).

Limitations

There are a few limitations with this narrative review. As a narrative review rather than systematic review, our study selection was guided by targeted searches rather than a comprehensive protocol. Our review primarily focused on English-language publications, which may underrepresent contributions from non-English speaking regions. We did not apply formal quality assessment tools to the reviewed studies, though we have highlighted methodological strengths and limitations where relevant. The current evidence base consists largely of retrospective, single-center studies with limited external validation, reflecting the early developmental stage of XAI in clinical microbiology and infectious diseases. Future systematic reviews with formal quality appraisal will be valuable as the field matures.

Future directions

Transparency mandated by regulatory frameworks such as GDPR (EU) and the Health Insurance Portability and Accountability Act (HIPAA, US) highlights the importance of AI explainability in healthcare data management. Balancing interpretability and predictive performance becomes increasingly complex as AI models evolve but remains pertinent for clinical acceptance and patient safety (73).

Ensuring robustness (consistency despite data variability) and fairness (equitable model performance across diverse populations) is vital for user trust and reliability. Tailored explanations for clinicians, patients, and healthcare administrators through intuitive and privacy-preserving interfaces can enhance AI usability and acceptance significantly. A strategic shift towards inherently transparent, integrated models embedded directly within clinical workflows can reduce skepticism, support informed clinical decisions, and ultimately enhance patient outcomes. Such ethically sound AI frameworks require interdisciplinary collaboration involving clinicians, data scientists, ethicists, and regulatory bodies.

Five steps are recommended to implement XAI: (i) Curate. Create structured, high-quality datasets, e.g., MALDI-TOF spectra linked to resistance phenotypes. (ii) Clarify. Use interpretable models or post-hoc explanation methods, e.g., SHAP applied to gradient-boosted trees that predict antimicrobial resistance (such as carbapenem resistance). (iii) Validate. Prospectively test models in routine clinical practice, e.g., AI-driven early sepsis diagnosis in emergency units. (iv) Train. Educate specialists through interactive XAI workshops, e.g., on interpreting SHAP visuals for MDR tuberculosis. (v) Comply. Align transparently with ethical and regulatory frameworks, e.g., IVDR and GDPR, for trustworthy clinical integration.
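To give a feel for what the SHAP values mentioned in step (ii) represent, the following minimal sketch computes exact Shapley values for a toy model in plain Python. It does not use the SHAP library or a gradient-boosted tree: the risk function `toy_risk`, its coefficients, and the three binary features are hypothetical assumptions chosen for illustration, and the all-zero baseline stands in for the background data that SHAP would normally use.

```python
from itertools import permutations

# Hypothetical binary features for one isolate/patient (illustrative only).
FEATURES = ["prior_carbapenem_use", "icu_stay", "resistance_peak_present"]


def toy_risk(x):
    """Hypothetical resistance-risk score with one interaction term."""
    score = 0.10
    score += 0.20 * x["prior_carbapenem_use"]
    score += 0.10 * x["icu_stay"]
    score += 0.30 * x["resistance_peak_present"]
    # Interaction: prior exposure and the spectral peak together add extra risk.
    score += 0.20 * x["prior_carbapenem_use"] * x["resistance_peak_present"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the baseline case
        previous = model(current)
        for name in order:
            current[name] = instance[name]   # switch one feature to its observed value
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: totals[name] / len(orderings) for name in names}


instance = {"prior_carbapenem_use": 1, "icu_stay": 0, "resistance_peak_present": 1}
baseline = {name: 0 for name in FEATURES}

phi = shapley_values(toy_risk, instance, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
print(f"sum of contributions: {sum(phi.values()):.3f}")
print(f"prediction - baseline: {toy_risk(instance) - toy_risk(baseline):.3f}")
```

The contributions add up to the difference between the prediction for this case and the baseline prediction, and the interaction term is split equally between the two features involved. Enumerating all orderings is only feasible for a handful of features; practical tools rely on approximations or model-specific algorithms.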

Open data initiatives are essential for equitable AI and can enable researchers, particularly those in low- and middle-income countries, to effectively develop and validate AI models. Encouraging inclusive international collaboration and data sharing can address global health disparities by ensuring AI solutions accurately represent diverse populations. Furthermore, structured education programs promoting AI literacy among healthcare professionals will be crucial for successful AI integration into clinical practice and public health systems.

Appendix: info box

In assessing binary classifiers, it is essential to look beyond overall accuracy to metrics that reveal true performance. This is especially the case in contexts with imbalanced classes or differing costs of error. Precision (Positive Predictive Value), Recall (Sensitivity), F1 Score, and AUC-ROC together offer a nuanced view:

  • Precision reveals how reliable positive predictions are.

  • Recall captures the ability to detect actual positives.

  • F1 Score balances these two, particularly useful when neither false positives nor false negatives are negligible.

  • AUC-ROC summarizes model discrimination across all possible decision thresholds.
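As a small worked example of why accuracy alone can mislead when classes are imbalanced, the Python sketch below uses hypothetical confusion-matrix counts (1,000 screened patients, 20 of whom truly have the condition). The counts and the helper name `precision_recall_f1` are assumptions made for illustration only.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical screening result: 20 true cases among 1,000 patients.
tp, fn = 10, 10    # half of the true cases are detected
fp, tn = 30, 950   # a few non-cases are flagged

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(f"accuracy  = {accuracy:.3f}")
print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
print(f"F1        = {f1:.3f}")
```

In this hypothetical case accuracy is 0.96, yet only one in four positive predictions is correct (precision 0.25) and half of the true cases are missed (recall 0.50), which the F1 score of about 0.33 reflects.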

Combined with sensitivity, specificity, and related rates, this suite of metrics provides a robust overview of classifier behaviour, enabling informed decisions in varied application domains.

| Metric | Also known as | Definition | Formula |
|---|---|---|---|
| Sensitivity (TPR) | Recall, true positive rate | Probability that the model correctly identifies actual positives. Useful for ruling out disease when negative. | TP/(TP + FN) |
| Specificity (TNR) | True negative rate | Probability that the model correctly identifies actual negatives. Helpful for ruling in disease when positive. | TN/(FP + TN) |
| False positive rate (FPR) | | Proportion of negatives wrongly classified as positive (1 - Specificity). | FP/(FP + TN) |
| False negative rate (FNR) | Miss rate | Proportion of positives wrongly classified as negative (1 - Sensitivity). | FN/(TP + FN) |
| Positive predictive value (PPV) | Precision | Proportion of predicted positives that are true positives. | TP/(TP + FP) |
| Negative predictive value (NPV) | | Proportion of predicted negatives that are true negatives. | TN/(TN + FN) |
| Accuracy | | Overall fraction of correct classifications. | (TP + TN)/(TP + FP + TN + FN) |
| F1 score | F-measure | Harmonic mean of Precision and Recall, balancing both; robust for imbalanced data. | 2·(Precision · Recall)/(Precision + Recall) |
| AUC-ROC | ROC AUC | Area under the ROC curve; measures the model's ability to rank positives higher than negatives across thresholds. | |
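The table gives no closed formula for AUC-ROC. One way to make its ranking interpretation concrete is the sketch below, which computes AUC as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The scores and labels are hypothetical, and the pairwise loop is chosen for clarity rather than efficiency.

```python
def auc_roc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

print(f"AUC-ROC = {auc_roc(scores, labels):.3f}")
```

Here eight of the nine positive/negative pairs are ranked correctly, giving an AUC of about 0.889; a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.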

Funding Statement

The author(s) declared that financial support was received for this work and/or its publication. AE received funding from the Swiss National Science Foundation (Grant No. 213019) and an unrestricted endowment grant from the University of Zurich, Switzerland.

Footnotes

Edited by: João Manuel R. S. Tavares, University of Porto, Portugal

Reviewed by: Marcelo Pillonetto, Pontifical Catholic University of Parana, Brazil

Adekunle Adeoye, Georgia State University, United States

Author contributions

SW: Conceptualization, Data curation, Methodology, Writing – original draft, Writing – review & editing. VN: Conceptualization, Data curation, Methodology, Writing – original draft, Writing – review & editing. EM: Methodology, Visualization, Writing – review & editing. YG: Methodology, Writing – review & editing. ON: Methodology, Writing – review & editing. BV: Conceptualization, Funding acquisition, Resources, Supervision, Writing – review & editing. AE: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing – review & editing.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was used in the creation of this manuscript. OpenAI ChatGPT was used to improve the grammar and language of the manuscript. The author(s) verify and take full responsibility for the use of generative AI in the preparation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2026.1776922/full#supplementary-material

Table_1.pdf (230.1KB, pdf)
Table_2.pdf (264.9KB, pdf)

References

  • 1. G.B.D.A.R. Collaborators. Global mortality associated with 33 bacterial pathogens in 2019: a systematic analysis for the global burden of disease study 2019. Lancet. (2022) 400:2221–48. doi: 10.1016/S0140-6736(22)02185-7
  • 2. G.B.D.A.R. Collaborators. Global burden of bacterial antimicrobial resistance 1990-2021: a systematic analysis with forecasts to 2050. Lancet. (2024) 404:1199–226. doi: 10.1016/s0140-6736(24)01867-1
  • 3. Morens DM, Folkers GK, Fauci AS. The challenge of emerging and re-emerging infectious diseases. Nature. (2004) 430:242–9. doi: 10.1038/nature02759
  • 4. Chen Z, Azman AS, Chen X, Zou J, Tian Y, Sun R, et al. Global landscape of SARS-CoV-2 genomic surveillance and data sharing. Nat Genet. (2022) 54:499–507. doi: 10.1038/s41588-022-01033-y
  • 5. Reyes Nieva H, Zucker J, Tucker E, McLean J, DeLaurentis C, Gunaratne S, et al. Development of machine learning-based mpox surveillance models in a learning health system. Sex Transm Infect. (2025) 101:456–60. doi: 10.1136/sextrans-2024-056382
  • 6. Hoefer A, Seth-Smith H, Palma F, Schindler S, Freschi L, Dangel A, et al. Corynebacterium diphtheriae outbreak in migrant populations in Europe. N Engl J Med. (2025) 392:2334–45. doi: 10.1056/nejmoa2311981
  • 7. Egli A, Schrenzel J, Greub G. Digital microbiology. Clin Microbiol Infect. (2020) 26:1324–31. doi: 10.1016/j.cmi.2020.06.023
  • 8. Kraemer MUG, Tsui JL, Chang SY, Lytras S, Khurana MP, Vanderslott S, et al. Artificial intelligence for modelling infectious disease epidemics. Nature. (2025) 638:623–35. doi: 10.1038/s41586-024-08564-w
  • 9. Liu GY, Yu D, Fan MM, Zhang X, Jin ZY, Tang C, et al. Antimicrobial resistance crisis: could artificial intelligence be the solution? Mil Med Res. (2024) 11:7. doi: 10.1186/s40779-024-00510-1
  • 10. Wong F, de la Fuente-Nunez C, Collins JJ. Leveraging artificial intelligence in the fight against infectious diseases. Science. (2023) 381:164–70. doi: 10.1126/science.adh1114
  • 11. Padoan A, Cadamuro J, Frans G, Cabitza F, Tolios A, De Bruyne S, et al. Data flow in clinical laboratories: could metadata and peridata bridge the gap to new AI-based applications? Clin Chem Lab Med. (2025) 63:684–91. doi: 10.1515/cclm-2024-0971
  • 12. Moor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ, et al. Foundation models for generalist medical artificial intelligence. Nature. (2023) 616:259–65. doi: 10.1038/s41586-023-05881-4
  • 13. Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).
  • 14. Asnicar F, Thomas AM, Passerini A, Waldron L, Segata N. Machine learning for microbiologists. Nat Rev Microbiol. (2024) 22:191–205. doi: 10.1038/s41579-023-00984-1
  • 15. Xu Y, Liu X, Cao X, Huang C, Liu E, Qian S, et al. Artificial intelligence: a powerful paradigm for scientific research. Innovation (Camb). (2021) 2:100179. doi: 10.1016/j.xinn.2021.100179
  • 16. Weis C, Cuenod A, Rieck B, Dubuis O, Graf S, Lang C, et al. Direct antimicrobial resistance prediction from clinical MALDI-TOF mass spectra using machine learning. Nat Med. (2022) 28:164–74. doi: 10.1038/s41591-021-01619-9
  • 17. Wong F, Zheng EJ, Valeri JA, Donghia NM, Anahtar MN, Omori S, et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature. (2024) 626:177–85. doi: 10.1038/s41586-023-06887-8
  • 18. Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med. (2018) 24:1716–20. doi: 10.1038/s41591-018-0213-5
  • 19. Bhargava A, López-Espina C, Schmalz L, Khan S, Watson GL, Urdiales D, et al. FDA-authorized AI/ML tool for sepsis prediction: development and validation. NEJM AI. (2024) 1:AIoa2400867. doi: 10.1056/aioa2400867
  • 20. Goh KH, Wang L, Yeow AYK, Poh H, Li K, Yeow JJL, et al. Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nat Commun. (2021) 12:711. doi: 10.1038/s41467-021-20910-4
  • 21. Agrebi S, Larbi A. "Chapter 18 - use of artificial intelligence in infectious diseases". In: Barh D, editor. Artificial Intelligence in Precision Health. Cambridge: Academic Press; (2020). p. 415–38.
  • 22. Rozera T, Pasolli E, Segata N, Ianiro G. Machine learning and artificial intelligence in the multi-omics approach to gut microbiota. Gastroenterology. (2025) 169:487–501. doi: 10.1053/j.gastro.2025.02.035
  • 23. Sarantopoulos A, Mastori Kourmpani C, Yokarasa AL, Makamanzi C, Antoniou P, Spernovasilis N, et al. Artificial intelligence in infectious disease clinical practice: an overview of gaps, opportunities, and limitations. Trop Med Infect Dis. (2024) 9. doi: 10.3390/tropicalmed9100228
  • 24. Egli A. ChatGPT, GPT-4, and other large language models: the next revolution for clinical microbiology? Clin Infect Dis. (2023) 77:1322–8. doi: 10.1093/cid/ciad407
  • 25. Sarker IH. AI-based modeling: techniques, applications and research issues towards automation, intelligent and smart systems. SN Comput Sci. (2022) 3:158. doi: 10.1007/s42979-022-01043-x
  • 26. Fujisawa K, Shimo M, Taguchi YH, Ikematsu S, Miyata R. PCA-based unsupervised feature extraction for gene expression analysis of COVID-19 patients. Sci Rep. (2021) 11:17351. doi: 10.1038/s41598-021-95698-w
  • 27. van Otterlo M, Wiering M. "Reinforcement learning and Markov decision processes". In: Wiering M, van Otterlo M, editors. Reinforcement Learning: State-of-the-Art. Berlin, Heidelberg: Springer; (2012). p. 3–42.
  • 28. Weaver DT, King ES, Maltas J, Scott JG. Reinforcement learning informs optimal treatment strategies to limit antibiotic resistance. Proc Natl Acad Sci USA. (2024) 121:e2303165121. doi: 10.1073/pnas.2303165121
  • 29. Ali S, Abuhmed T, El-Sappagh S, Muhammad K, Alonso-Moral JM, Confalonieri R, et al. Explainable artificial intelligence (XAI): what we know and what is left to attain trustworthy artificial intelligence. Inf Fusion. (2023) 99:101805. doi: 10.1016/j.inffus.2023.101805
  • 30. Ting Sim JZ, Fong QW, Huang W, Tan CH. Machine learning in medicine: what clinicians should know. Singapore Med J. (2023) 64:91–7. doi: 10.11622/smedj.2021054
  • 31. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, et al. A guide to deep learning in healthcare. Nat Med. (2019) 25:24–9. doi: 10.1038/s41591-018-0316-z
  • 32. Li C, Sutherland D, Richter A, Coombe L, Yanai A, Warren RL, et al. De novo synthetic antimicrobial peptide design with a recurrent neural network. Protein Sci. (2024) 33:e5088. doi: 10.1002/pro.5088
  • 33. Signoroni A, Ferrari A, Lombardi S, Savardi M, Fontana S, Culbreath K. Hierarchical AI enables global interpretation of culture plates in the era of digital microbiology. Nat Commun. (2023) 14:6874. doi: 10.1038/s41467-023-42563-1
  • 34. Smith KP, Kang AD, Kirby JE. Automated interpretation of blood culture gram stains by use of a deep convolutional neural network. J Clin Microbiol. (2018) 56. doi: 10.1128/JCM.01521-17
  • 35. Giske CG, Bressan M, Fiechter F, Hinic V, Mancini S, Nolte O, et al. GPT-4-based AI agents-the new expert system for detection of antimicrobial resistance mechanisms? J Clin Microbiol. (2024) 62:e0068924. doi: 10.1128/jcm.00689-24
  • 36. Hassija V, Chamola V, Mahapatra A, Singal A, Goel D, Huang K, et al. Interpreting black-box models: a review on explainable artificial intelligence. Cogn Comput. (2024) 16:45–74. doi: 10.1007/s12559-023-10179-8
  • 37. Biswas AA. A comprehensive review of explainable AI for disease diagnosis. Array. (2024) 22:100345. doi: 10.1016/j.array.2024.100345
  • 38. Liu Z, Shu W, Li T, Zhang X, Chong W. Interpretable machine learning for predicting sepsis risk in emergency triage patients. Sci Rep. (2025) 15:887. doi: 10.1038/s41598-025-85121-z
  • 39. Jo N, Aghaei S, Benson J, Gomez A, Vayanos P. "Learning optimal fair decision trees: trade-offs between interpretability, fairness, and accuracy". In: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society. Montréal, QC: Association for Computing Machinery; (2023). p. 181–92.
  • 40. Chakradeo K, Huynh I, Balaganeshan SB, Dollerup OL, Gade-Jorgensen H, Laupstad SK, et al. Navigating fairness aspects of clinical prediction models. BMC Med. (2025) 23:567. doi: 10.1186/s12916-025-04340-3
  • 41. Saeed W, Omlin C. Explainable AI (XAI): a systematic meta-survey of current challenges and future opportunities. Knowl Based Syst. (2023) 263:110273. doi: 10.1016/j.knosys.2023.110273
  • 42. Piasini E, Liu S, Chaudhari P, Balasubramanian V, Gold JI. How Occam's razor guides human decision-making. bioRxiv. (2025). doi: 10.1101/2023.01.10.523479
  • 43. Fleitas PE, Sarasola LB, Ferrer DC, Muñoz J, Petrone P. Machine learning approach to identify malaria risk in travelers using real-world evidence. Heliyon. (2024) 10:e28534. doi: 10.1016/j.heliyon.2024.e28534
  • 44. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Proces Syst. (2017) 30. doi: 10.5555/3295222.3295230
  • 45. Ribeiro MT, Singh S, Guestrin C. "“Why should I trust you?” Explaining the predictions of any classifier". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Cambridge: ACM; (2016). p. 1135–44.
  • 46. Retzlaff CO, Angerschmid A, Saranti A, Schneeberger D, Röttger R, Müller H, et al. Post-hoc vs ante-hoc explanations: xAI design guidelines for data scientists. Cogn Syst Res. (2024) 86:101243. doi: 10.1016/j.cogsys.2024.101243
  • 47. Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy (Basel). (2020) 23. doi: 10.3390/e23010018
  • 48. Vilone G, Longo L. Classification of explainable artificial intelligence methods through their output formats. Mach Learn Knowl Extract. (2021) 3:615–61. doi: 10.3390/make3030032
  • 49. Kim SY, Kim DH, Kim MJ, Ko HJ, Jeong OR. XAI-based clinical decision support systems: a systematic review. Appl Sci. (2024) 14:6638. doi: 10.3390/app14156638
  • 50. Ahmed F, Hossain MS, Islam RU, Andersson K. An evolutionary belief rule-based clinical decision support system to predict COVID-19 severity under uncertainty. Appl Sci. (2021) 11:5810. doi: 10.3390/app11135810
  • 51. Hu CA, Chen CM, Fang YC, Liang SJ, Wang HC, Fang WF, et al. Using a machine learning approach to predict mortality in critically ill influenza patients: a cross-sectional retrospective multicentre study in Taiwan. BMJ Open. (2020) 10:e033898. doi: 10.1136/bmjopen-2019-033898
  • 52. Shimabukuro DW, Barton CW, Feldman MD, Mataraso SJ, Das R. Effect of a machine learning-based severe sepsis prediction algorithm on patient survival and hospital length of stay: a randomised clinical trial. BMJ Open Respir Res. (2017) 4:e000234. doi: 10.1136/bmjresp-2017-000234
  • 53. Ye Y, Pandey A, Bawden C, Sumsuzzman DM, Rajput R, Shoukat A, et al. Integrating artificial intelligence with mechanistic epidemiological modeling: a scoping review of opportunities and challenges. Nat Commun. (2025) 16:581. doi: 10.1038/s41467-024-55461-x
  • 54. Tiwari D, Bhati BS, Al-Turjman F, Nagpal B. Pandemic coronavirus disease (Covid-19): world effects analysis and prediction using machine-learning techniques. Expert Syst. (2022) 39:e12714. doi: 10.1111/exsy.12714
  • 55. Kujawski SA, Ru B, Afanador NL, Conway JH, Baumgartner R, Pawaskar M. Prediction of measles cases in US counties: a machine learning approach. Vaccine. (2024) 42:126289. doi: 10.1016/j.vaccine.2024.126289
  • 56. Ou G, Tang Y, Liu J, Hao Y, Chen Z, Huang T, et al. Automated robot and artificial intelligence-powered wastewater surveillance for proactive mpox outbreak prediction. Biosaf Health. (2024) 6:225–34. doi: 10.1016/j.bsheal.2024.07.002
  • 57. Gao Z, Qi X, Zhang X, Gao X, He X, Guo S, et al. Developing and validating an emergency triage model using machine learning algorithms with medical big data. Risk Manag Healthc Policy. (2022) 15:1545–51. doi: 10.2147/RMHP.S355176
  • 58. Deasy J, Lio P, Ercole A. Dynamic survival prediction in intensive care units from heterogeneous time series without the need for variable selection or curation. Sci Rep. (2020) 10:22129. doi: 10.1038/s41598-020-79142-z
  • 59. Minh D, Wang HX, Li YF, Nguyen TN. Explainable artificial intelligence: a comprehensive review. Artif Intell Rev. (2022) 55:3503–68. doi: 10.1007/s10462-021-10088-y
  • 60. Marko JGO, Neagu CD, Anand PB. Examining inclusivity: the use of AI and diverse populations in health and social care: a systematic review. BMC Med Inform Decis Mak. (2025) 25:57. doi: 10.1186/s12911-025-02884-1
  • 61. Weidener L, Fischer M. Role of ethics in developing AI-based applications in medicine: insights from expert interviews and discussion of implications. JMIR AI. (2024) 3:e51204. doi: 10.2196/51204
  • 62. Kalanxhi E, Osena G, Kapoor G, Klein E. Confidence interval methods for antimicrobial resistance surveillance data. Antimicrob Resist Infect Control. (2021) 10:91. doi: 10.1186/s13756-021-00960-5
  • 63. Brito AF, Semenova E, Dudas G, Hassler GW, Kalinich CC, Kraemer MUG, et al. Global disparities in SARS-CoV-2 genomic surveillance. Nat Commun. (2022) 13:7003. doi: 10.1038/s41467-022-33713-y
  • 64. de Hond AAH, Leeuwenberg AM, Hooft L, Kant IMJ, Nijman SWJ, van Os HJA, et al. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. NPJ Digit Med. (2022) 5:2. doi: 10.1038/s41746-021-00549-7
  • 65. Rainio O, Teuho J, Klen R. Evaluation metrics and statistical tests for machine learning. Sci Rep. (2024) 14:6086. doi: 10.1038/s41598-024-56706-x
  • 66. Nahm FS. Receiver operating characteristic curve: overview and practical use for clinicians. Korean J Anesthesiol. (2022) 75:25–36. doi: 10.4097/kja.21209
  • 67. Botchkarev A. Performance metrics (error measures) in machine learning regression, forecasting and prognostics: properties and typology. arXiv preprint arXiv:1809.03006 (2018).
  • 68. Abgrall G, Holder AL, Chelly Dagdia Z, Zeitouni K, Monnet X. Should AI models be explainable to clinicians? Crit Care. (2024) 28:301. doi: 10.1186/s13054-024-05005-y
  • 69. Ploug T, Holm S. "Right to contest AI diagnostics". In: Lidströmer N, Ashrafian H, editors. Artificial Intelligence in Medicine. Cham: Springer International Publishing; (2022). p. 227–38.
  • 70. Hacker P, Krestel R, Grundmann S, Naumann F. Explainable AI under contract and tort law: legal incentives and technical challenges. Artif Intell Law. (2020) 28:415–39. doi: 10.1007/s10506-020-09260-6
  • 71. Amann J, Blasimme A, Vayena E, Frey D, Madai VI, Precise4Q Consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. (2020) 20:310. doi: 10.1186/s12911-020-01332-6
  • 72. Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med. (2021) 181:1065–70. doi: 10.1001/jamainternmed.2021.2626
  • 73. Hulsen T. Explainable artificial intelligence (XAI): concepts and challenges in healthcare. AI. (2023) 4:652–66. doi: 10.3390/ai4030034
  • 74. Neuman I, Shvartser L, Teppler S, Friedman Y, Levine JJ, Kagan I, et al. A machine-learning model for prediction of Acinetobacter baumannii hospital acquired infection. PLoS One. (2024) 19:e0311576. doi: 10.1371/journal.pone.0311576
  • 75. Madden GR, Boone RH, Lee E, Sifri CD, Petri WA. Predicting Clostridioides difficile infection outcomes with explainable machine learning. EBioMedicine. (2024) 106:105244. doi: 10.1016/j.ebiom.2024.105244
  • 76. Tang J-W, Li F, Liu X, Wang J-T, Xiong X-S, Lu X-Y, et al. Detection of Helicobacter pylori infection in human gastric fluid through surface-enhanced Raman spectroscopy coupled with machine learning algorithms. Lab Investig. (2023) 104:100310. doi: 10.1016/j.labinv.2023.100310
  • 77. Li X, Chen Y, Huang G, Sun X, Mo G, Peng X. Epidemiology and risk factors of Clonorchis sinensis infection in the mountainous areas of Longsheng County, Guangxi: insights from automated machine learning. Parasitol Res. (2025) 124:26. doi: 10.1007/s00436-025-08470-8
  • 78. Debnath JP, Hossen K, Sayed SB, Khandaker MS, Dev PC, Sarker S, et al. Identification of potential biomarkers for 2022 Mpox virus infection: a transcriptomic network analysis and machine learning approach. Sci Rep. (2025) 15:2922. doi: 10.1038/s41598-024-80519-7
  • 79. Doyle OM, Leavitt N, Rigg JA. Finding undiagnosed patients with hepatitis C infection: an application of artificial intelligence to patient claims data. Sci Rep. (2020) 10:10521. doi: 10.1038/s41598-020-67013-6
  • 80. De Rose Ghilardi F, Silva G, Vieira TM, Mota A, Bierrenbach AL, Damasceno RF, et al. Machine learning for predicting Chagas disease infection in rural areas of Brazil. PLoS Negl Trop Dis. (2024) 18:e0012026. doi: 10.1371/journal.pntd.0012026
  • 81. Robison HM, Chapman CA, Zhou H, Erskine CL, Theel E, Peikert T, et al. Risk assessment of latent tuberculosis infection through a multiplexed cytokine biosensor assay and machine learning feature selection. Sci Rep. (2021) 11:20544. doi: 10.1038/s41598-021-99754-3
  • 82. Shi M, Lin J, Wei W, Qin Y, Meng S, Chen X, et al. Machine learning-based in-hospital mortality prediction of HIV/AIDS patients with Talaromyces marneffei infection in Guangxi, China. PLoS Negl Trop Dis. (2022) 16:e0010388. doi: 10.1371/journal.pntd.0010388
  • 83. Mayer LM, Strich JR, Kadri SS, Lionakis MS, Evans NG, Prevots DR, et al. Machine learning in infectious disease for risk factor identification and hypothesis generation: proof of concept using invasive candidiasis. Open Forum Infect Dis. (2022) 9:ofac401. doi: 10.1093/ofid/ofac401
  • 84. Cui C, Mu F, Tang M, Lin R, Wang M, Zhao X, et al. A prediction and interpretation machine learning framework of mortality risk among severe infection patients with Pseudomonas aeruginosa. Front Med (Lausanne). (2022) 9:942356. doi: 10.3389/fmed.2022.942356


Articles from Frontiers in Public Health are provided here courtesy of Frontiers Media SA
