Summary
Medical AI agents represent a transformative paradigm in healthcare, distinguished from traditional AI by their autonomy, adaptability, and ability to manage complex tasks. This review introduces a conceptual framework for these agents built on four core components: planning, action, reflection, and memory. We examine the framework’s application across key clinical domains, from enhancing diagnostic accuracy and personalizing treatment to guiding robotic surgery and enabling real-time patient monitoring. The review critically analyzes implementation challenges, including technical integration, clinician adoption, regulatory adaptation, and ethical considerations like data privacy and algorithmic bias. Future directions are explored, including the shift toward proactive, multi-agent collaborative systems and the visionary AI Agent Hospital concept. While these agents hold immense potential to revolutionize healthcare delivery by improving efficiency and patient outcomes, their successful and equitable integration hinges on navigating these profound technical, ethical, and regulatory hurdles.
Keywords: artificial intelligence, medical AI agents, multimodal, real-time diagnosis, personalized treatment
Graphical abstract

Medical AI is evolving from single-task tools to autonomous “agents” capable of complex reasoning and action. Liu et al. provide a unifying framework for these agents, exploring their transformative applications in clinical workflows while critically analyzing the profound technical, ethical, and regulatory challenges to their real-world implementation.
Introduction
Artificial intelligence (AI) is transforming healthcare across numerous applications. The term “medical AI” broadly encompasses any use of AI in this domain, often referring to single-function algorithms designed for specific, isolated tasks—such as classifying medical images or predicting patient risk from a static dataset. These tools, while powerful, typically lack autonomy and contextual awareness. This review focuses on a more advanced and fundamentally different paradigm: the medical AI agent (Table 1). AI agents are intelligent systems designed to function autonomously or semi-autonomously, capable of making decisions and taking actions in dynamic, complex environments.1,2,3,4,5,6,7 In the medical context, we define medical AI agents as autonomous computational systems structured around four core components: planning (cognitive processing and decision-making), action (execution through diverse interfaces), reflection (multimodal data perception and interpretation), and memory (contextual information storage and retrieval). Unlike traditional medical AI tools that operate on isolated datasets or perform single functions, medical AI agents integrate these four components to maintain context across interactions, learn from accumulated experiences, and adapt their behavior based on changing clinical scenarios, enabling comprehensive and dynamic healthcare support. 
Their core capabilities—autonomy, adaptability, and decision-making—allow them to perform tasks typically carried out by humans.8,9,10 Autonomy enables agents to function independently, without the need for constant human supervision,11 while adaptability ensures they can adjust their actions based on changing circumstances or new data.12,13,14 Decision-making empowers agents to analyze data, evaluate various potential outcomes, and select the most effective course of action to achieve specific goals.15,16 These abilities are powered by advanced technologies such as large language models (LLMs) and vision language models (VLMs), which enable agents to process and understand human language and visual information.17,18,19,20 LLMs facilitate complex problem-solving, reasoning, and even creative thinking by interpreting and generating text, while VLMs extend these capabilities to visual data, enabling agents to make decisions based on images and videos.20,21,22,23 Together, these technologies equip AI agents to perform tasks that require human-like intelligence and adaptability, forming the foundation for a wide range of applications.
Table 1.
Comparison of human clinicians, traditional medical AI, and medical AI agents
| Feature | Human clinician | Traditional medical AI | Medical AI agent |
|---|---|---|---|
| Core nature | biological intelligence with consciousness & empathy | a computational tool or algorithm | an autonomous or semi-autonomous system |
| Primary function | holistic patient care, complex reasoning, empathy | performs a specific, pre-defined task (e.g., classify, regression) | manages complex, multi-step tasks and workflows |
| Decision-making | experience-based, intuitive, and evidence-informed | rule-based or pattern recognition on a given dataset | data-driven, goal-oriented, and adaptive reasoning |
| Autonomy | full autonomy in clinical judgment | none; acts as a passive tool for human use | high; can operate independently to achieve goals |
| Learning & adaptation | lifelong learning from experience and education | static; requires retraining on new datasets to update | continuous learning from new data and interactions |
| Context awareness | high; understands patient history and context | low to none; typically context-agnostic | high; maintains context across interactions via memory |
| Interaction | direct interaction with patients and systems | human-invoked; typically a one-way information output | interactive; can query systems and collaborate with humans |
| Role in healthcare | the central decision-maker and caregiver | a supplementary tool to augment a specific task | a collaborative partner or assistant in the care process |
The true potential of AI agents becomes evident when they operate across multiple modalities, integrating various forms of inputs such as text, images, speech, and sensory data.24,25 This multimodal capability allows AI agents to process information in a way that mirrors human cognitive abilities, enabling more nuanced and effective decision-making.26,27,28,29 For example, in robotics, AI agents can combine visual input, physical movement, and environmental feedback to execute complex tasks such as navigation, object manipulation, and autonomous control in dynamic environments.30 Similarly, in the gaming industry, AI agents can seamlessly switch between understanding and responding to visual, auditory, and textual inputs, making real-time decisions that enhance the gaming experience by adapting to player behavior and environmental changes.31 In autonomous vehicles, AI agents integrate data from cameras, LiDAR, and radar to navigate safely, avoiding obstacles and reacting to traffic conditions.32,33,34 Furthermore, AI agents in smart homes use multimodal data to optimize user comfort and efficiency by integrating inputs from voice commands, sensors, and environmental data, ensuring automated systems function in harmony with human preferences.35,36 This ability to synthesize diverse forms of data enables AI agents to tackle complex challenges across various fields, offering more robust, intelligent, and adaptable solutions.37,38,39
This review is based on a comprehensive literature search of PubMed, Google Scholar, and arXiv for articles published up to May 2025. Our search strategy utilized a combination of keywords related to the core concepts of medical AI agents, enabling technologies, key clinical applications, and implementation challenges. We included peer-reviewed articles, reviews, high-impact conference proceedings, and influential pre-prints relevant to AI agents in healthcare. The selected literature was then thematically synthesized to structure the primary sections of this review, covering the conceptual framework, clinical applications, development, evaluation, and ethical considerations of medical AI agents.
In the medical field, healthcare professionals are tasked with managing complex, time-sensitive situations that require accurate diagnoses, personalized treatment plans, and effective patient care40,41,42 (Figure 1A). These challenges are amplified by the exponential growth of medical data and the increasing complexity of modern healthcare systems. Healthcare providers must process large volumes of diverse data, such as patient histories, medical imaging, laboratory results, and genetic information43—all of which must be integrated and analyzed to ensure optimal outcomes.44,45 Furthermore, healthcare providers also need to adapt treatment plans in real time as patient conditions evolve.44 Medical AI agents are uniquely positioned to address these multifaceted challenges through their advanced capabilities in data processing, autonomous decision-making, and adaptive learning. By leveraging multimodal technologies, AI agents can simultaneously analyze various data sources—such as medical imaging, electronic health records, and genetic data—to assist in diagnosis, predict patient outcomes, and recommend personalized treatment plans.46,47,48 Their ability to autonomously make data-driven decisions and provide real-time insights enhances both the efficiency and accuracy of medical decision-making, supporting healthcare professionals in delivering timely, personalized care.49 As these technologies continue to mature, they promise to revolutionize healthcare delivery by enabling more precise, efficient, and equitable patient care across diverse clinical settings.50 This comprehensive review aims to establish a clear conceptual framework for medical AI agents, examine their current applications across healthcare domains, and identify future directions for their development and implementation in clinical practice. 
Our research question focuses on understanding how autonomous AI agents can be effectively integrated into healthcare delivery while addressing the technical, ethical, and regulatory challenges they present.
Figure 1.
AI-driven multi-agent system in clinical practice
(A) This schematic illustrates an AI-driven multi-agent system that supports clinicians in patient care. The system includes diagnostic, predictive, therapeutic, monitor, rehabilitation, and robotic agents, each specializing in tasks such as disease identification, outcome prediction, personalized treatment, real-time monitoring, rehabilitation guidance, and surgical assistance. These agents work collaboratively, integrating multimodal data to enhance clinical decision-making, efficiency, and patient outcomes while keeping the clinician at the center of care.
(B) Medical AI agents operate based on four core components: planning, action, memory, and reflection. Planning enables strategic decision-making, resource allocation, and system monitoring. Action encompasses AI-driven tools for diagnosis, prediction, treatment, surgery, and patient monitoring. Memory is divided into short-term memory, which handles real-time data updates, and long-term memory, which stores historical data and patient trends for continuous learning. Reflection allows AI agents to evaluate outcomes, analyze errors, refine models, and incorporate patient feedback for improved decision-making. Together, these components ensure that AI agents function autonomously while continuously adapting to dynamic clinical environments.
Medical AI agent framework
AI agents are fundamentally structured around four core components—planning, action, reflection, and memory (Figure 1B)—that collectively enable their autonomous functionality in complex environments.2,45,51,52,53 “Planning,” often powered by LLMs or VLMs, serves as the cognitive core, allowing agents to process inputs, perform reasoning, and generate decisions. In healthcare, this capability enables agents to analyze medical records, synthesize information, and assist in diagnostic tasks with precision. “Action” translates these decisions into tangible outputs, such as generating responses, guiding robotic systems in surgical procedures, or recommending treatment plans. “Reflection” equips the agent with the ability to perceive and interpret its environment through sensory data, such as extracting insights from medical imaging or monitoring real-time patient conditions, facilitating context-aware interactions. Meanwhile, “memory” acts as a repository for past experiences and acquired knowledge, enabling agents to adapt and improve over time. This component is particularly critical in personalized medicine, as historical patient data are leveraged to refine recommendations and enhance outcomes. Together, these four components—planning, action, reflection, and memory—form an integrated framework, empowering AI agents to intelligently engage with their environment, adapt to dynamic challenges, and continuously evolve their capabilities.
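As a concrete illustration, the four components can be sketched as a minimal agent loop: reflection interprets raw input, planning chooses a decision, action executes it, and memory records the episode. All class names, methods, and the toy heart-rate threshold below are hypothetical, not drawn from any cited system:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Stores past observations and decisions for later retrieval."""
    episodes: list = field(default_factory=list)

    def remember(self, observation, decision):
        self.episodes.append((observation, decision))

    def recall(self):
        return list(self.episodes)

class MedicalAgent:
    """Illustrative loop over the four components:
    reflection -> planning -> action -> memory."""
    def __init__(self):
        self.memory = AgentMemory()

    def reflect(self, raw_input: dict) -> dict:
        # Perception: interpret incoming data (here, normalize a vitals dict).
        return {k: float(v) for k, v in raw_input.items()}

    def plan(self, observation: dict) -> str:
        # Cognition: decide on a course of action from the observation.
        if observation.get("heart_rate", 0) > 120:
            return "alert_clinician"
        return "continue_monitoring"

    def act(self, decision: str) -> str:
        # Execution: produce an output a downstream interface would dispatch.
        return f"action executed: {decision}"

    def step(self, raw_input: dict) -> str:
        obs = self.reflect(raw_input)
        decision = self.plan(obs)
        result = self.act(decision)
        self.memory.remember(obs, decision)  # context persists across steps
        return result

agent = MedicalAgent()
print(agent.step({"heart_rate": "135"}))  # action executed: alert_clinician
print(agent.step({"heart_rate": "80"}))   # action executed: continue_monitoring
```

In a real system, `plan` would call an LLM or VLM rather than a threshold rule, and `act` would dispatch to clinical interfaces, but the control flow among the four components is the same.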
Planning
The planning component of AI agents serves as the cognitive foundation, responsible for processing complex information, performing sophisticated reasoning, and making decisions.5 At the core of this system are advanced models, particularly LLMs,54,55 which enable agents to analyze complex inputs and draw meaningful inferences from extensive training data.56,57 This cognitive capability allows agents to handle diverse tasks, ranging from natural language understanding to specialized cognitive functions like logical reasoning or planning.58 In healthcare contexts, the planning system processes patient data from electronic health records (EHRs), interprets diagnostic test results, and synthesizes information from medical literature to generate evidence-based recommendations. For example, when evaluating a patient with chest pain, the planning system can integrate symptoms, vital signs, laboratory results, and imaging findings to generate differential diagnoses and recommend appropriate diagnostic workups. The integration of external medical knowledge bases further enhances the agent’s decision-making capabilities, ensuring recommendations align with current clinical guidelines and best practices.
Action
The action system of AI agents refers to their ability to execute decisions and perform tasks based on the data they process and the decisions they make.38,39 In healthcare, this capability is exemplified through tools that allow the agent to carry out precise actions, from generating diagnostic reports to assisting in clinical procedures.59 For instance, in robotic surgery, AI agents use their action systems to control robotic arms, executing delicate and highly precise movements based on real-time sensory feedback. Similarly, in clinical decision support, the action system can generate treatment recommendations or alert healthcare providers to critical changes in patient conditions. The tool component encompasses diverse interfaces including (1) application programming interfaces (APIs) for accessing electronic health records, laboratory information systems, and medical imaging repositories; (2) hardware interfaces for controlling medical devices such as infusion pumps, ventilators, and surgical robots; (3) specialized software libraries for image processing, natural language processing, and predictive analytics; and (4) robotic actuators for physical manipulation tasks in laboratory automation and patient care. These tools may include data retrieval systems, automated drug dispensers, or even integration with hospital management systems for streamlining administrative duties. By utilizing both internal action capabilities and external tools, AI agents can perform complex, multi-step procedures in healthcare environments, contributing to greater efficiency, accuracy, and improved patient outcomes.
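The tool-dispatch pattern described above can be sketched as a registry mapping tool names to callables, which the action system invokes when executing a decision. The tool functions here (an EHR lookup and a provider alert) are hypothetical stand-ins for real APIs and device interfaces:

```python
from typing import Callable, Dict

class ToolRegistry:
    """Maps tool names to callables so the action system can dispatch decisions."""
    def __init__(self):
        self._tools: Dict[str, Callable] = {}

    def register(self, name: str, fn: Callable) -> None:
        self._tools[name] = fn

    def invoke(self, name: str, **kwargs):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)

# Hypothetical tool functions standing in for real EHR / device interfaces.
def fetch_ehr_record(patient_id: str) -> dict:
    return {"patient_id": patient_id, "allergies": ["penicillin"]}

def alert_provider(message: str) -> str:
    return f"ALERT sent: {message}"

registry = ToolRegistry()
registry.register("ehr_lookup", fetch_ehr_record)
registry.register("alert", alert_provider)

record = registry.invoke("ehr_lookup", patient_id="P-001")
print(record["allergies"])                               # ['penicillin']
print(registry.invoke("alert", message="BP dropping"))   # ALERT sent: BP dropping
```

The indirection through a registry is what lets one agent combine heterogeneous interfaces (APIs, device drivers, software libraries, actuators) behind a uniform invocation pattern.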
Reflection
The reflection ability of AI agents is crucial for processing and interpreting sensory data in healthcare environments, enabling them to interact with and understand complex medical information.15,60 This capability encompasses various modalities, including visual inputs, such as medical imaging (e.g., X-rays, magnetic resonance imaging [MRI], and computed tomography [CT] scans), and physiological data like heart rate, blood pressure, and other vital signs monitored through wearable devices. AI agents can analyze medical images to detect anomalies like tumors, fractures, or tissue damage, providing valuable insights for diagnosis and treatment planning. Additionally, the perceptual system allows the agent to process real-time data from sensors, enabling continuous monitoring of patient conditions and alerting healthcare providers to potential risks, such as deterioration in vital signs. By synthesizing diverse sensory inputs, AI agents can generate a more holistic understanding of a patient’s health status, facilitating timely and accurate clinical decisions. This perceptual ability is vital for applications like robotic surgery, where the agent must interpret visual and tactile feedback to assist with precision in real-time operations.
Memory
The memory system of AI agents is fundamental for storing, retrieving, and applying knowledge over time, enabling them to adapt and improve based on past experiences.4,8 The memory system operates through two distinct components: short-term and long-term memory. Short-term memory maintains immediate contextual information during active patient encounters, such as current vital signs (heart rate: 72 bpm, blood pressure: 120/80 mmHg), ongoing medication administration, and real-time symptom progression within a single clinical session. Long-term memory stores aggregated knowledge including patient medical histories, treatment outcomes across similar patient cohorts, learned patterns from thousands of diagnostic cases, and evidence-based treatment protocols that inform future decision-making. In healthcare, the memory system allows AI agents to retain patient data, treatment histories, and clinical outcomes, providing a valuable resource for informed decision-making. For example, an AI agent can recall a patient’s medical history, previous treatments, and responses to medications, helping clinicians to tailor more personalized treatment plans.61 Additionally, the memory system facilitates continuous learning, enabling the AI agent to refine its decision-making capabilities as new data become available. This learning is crucial for improving diagnostic accuracy, predicting patient outcomes, and identifying potential risks. By maintaining and updating a dynamic repository of knowledge, the memory system ensures that AI agents can leverage accumulated data for more precise and contextually appropriate actions. Thus, the memory system plays a pivotal role in enhancing the long-term effectiveness and reliability of AI agents in healthcare, ensuring they evolve in response to changing clinical needs and advancements in medical knowledge.
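A minimal sketch of the two-tier memory design follows, assuming short-term memory is a bounded window of recent observations within an encounter that is consolidated into a per-patient long-term store when the session ends. Capacity, data shapes, and identifiers are illustrative:

```python
from collections import deque

class TwoTierMemory:
    """Short-term: bounded window of in-session observations.
    Long-term: persistent per-patient store, filled at consolidation."""
    def __init__(self, short_term_capacity: int = 5):
        # deque(maxlen=...) silently evicts the oldest entry when full.
        self.short_term = deque(maxlen=short_term_capacity)
        self.long_term: dict = {}

    def observe(self, entry: dict) -> None:
        self.short_term.append(entry)

    def consolidate(self, patient_id: str) -> None:
        # Move the session's context into the long-term store, then clear it.
        self.long_term.setdefault(patient_id, []).extend(self.short_term)
        self.short_term.clear()

    def history(self, patient_id: str) -> list:
        return self.long_term.get(patient_id, [])

mem = TwoTierMemory(short_term_capacity=3)
for hr in (72, 75, 110, 130):        # 4 readings into a 3-slot window: 72 evicted
    mem.observe({"heart_rate": hr})
mem.consolidate("P-001")
print(len(mem.history("P-001")))     # 3
print(len(mem.short_term))           # 0
```

The eviction behavior of the short-term window mirrors how an agent keeps only immediately relevant context live, while the consolidated store is what later retrieval and continual learning draw on.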
This four-component framework provides the theoretical architecture for these agents. The following section will demonstrate how this architecture is realized in practice across various clinical applications.
Potential applications of medical AI agents in clinical practice
As illustrated in Figure 2A, these individual applications do not operate in isolation but can be integrated into a cohesive, end-to-end patient care workflow orchestrated by a multi-agent system. This conceptual workflow traces a patient’s journey from pre-admission, where an AI Early Warning Agent uses social and behavioral data to identify at-risk individuals, to hospitalization. Upon admission, a Diagnosis AI Agent synthesizes data from EHRs, lab tests, and imaging to perform differential diagnosis and risk stratification. This informs both the Treatment AI Agent, which devises personalized plans like targeted therapies, and the Surgical Robotics AI Agent, which assists in procedures with real-time navigation. Throughout the hospital stay and even after discharge, a Real-time Monitoring Agent continuously tracks patient status, ensuring timely interventions. This workflow-based model demonstrates how specialized AI agents can collaborate synergistically, transforming fragmented care into a continuous, proactive, and deeply personalized healthcare experience.
Figure 2.
From a multi-agent clinical workflow to an AI Agent Hospital Ecosystem
This figure illustrates the transformative potential of medical AI agents at two interconnected scales.
(A) The patient-level clinical workflow: this panel depicts an end-to-end patient journey orchestrated by a team of specialized AI agents. It begins with pre-admission risk assessment by an AI Early Warning Agent and proceeds through in-hospital diagnosis, treatment, and surgery managed by dedicated agents, all under the continuous watch of a real-time monitoring agent. This demonstrates how multi-agent collaboration can create a seamless, personalized care continuum for an individual patient.
(B) The system-level AI Agent Hospital Ecosystem: this panel expands the concept to a system-wide level. The Agent Hospital Healthy System acts as a central coordinator, managing resources and patient flow across different healthcare facilities. At the operational level, function-specific agents such as the ED Agent, IW Agent, and MDT Agent handle the complex logistics of a hospital. This represents the ultimate vision, in which the patient-level workflows shown in (A) are scaled up and integrated across an entire health system, creating a truly intelligent, efficient, and interconnected healthcare network.
ED, emergency department; OD, outpatient department; IW, inpatient ward; ICU, intensive care unit; ST, surgical theater; MDT, multidisciplinary team.
Diagnosis and decision support
As shown in Figure 2A, AI agents possess the capacity to process extensive patient data, including medical histories, diagnostic test results, and imaging, thereby aiding healthcare professionals in making more informed and precise diagnostic decisions.62,63,64,65,66,67,68,69 For example, Google’s Med-PaLM 2 demonstrated proficiency in medical question answering, achieving performance comparable to clinical experts on medical licensing examinations and outperforming generalist physicians on real-world medical questions.70 In sepsis detection, AI agents like those deployed at Johns Hopkins Hospital continuously monitor patient vital signs and laboratory values, alerting clinicians to early sepsis indicators with promising sensitivity rates and potential for reducing mortality rates.71 By synthesizing vast datasets drawn from medical literature, clinical guidelines, and real-time patient data, these agents significantly enhance diagnostic accuracy and efficiency.72 Their ability to discern complex patterns, often unnoticed by humans, is particularly valuable in the diagnosis of rare or multifaceted conditions. Moreover, AI agents continuously evolve through iterative learning from new datasets, enabling them to adapt to emerging medical knowledge and evolving clinical standards.73,74 This dynamic learning process ensures that AI agents remain aligned with the latest advancements in medical science, further enhancing their diagnostic capabilities.75 By providing robust, evidence-based insights, AI agents help reduce diagnostic errors, enabling clinicians to deliver more timely and precise care. Through the integration of comprehensive data sources with real-time clinical information, AI agents serve as indispensable decision-support tools, enhancing both the accuracy and efficacy of the diagnostic workflow and ultimately improving patient outcomes.
Medical imaging analysis
AI-driven medical imaging analysis has emerged as a pivotal tool in modern healthcare, significantly improving the accuracy and efficiency of diagnostic processes.66,74,76,77,78,79,80,81,82,83 By leveraging deep learning algorithms, AI systems can interpret complex medical images—such as X-rays, CT scans, MRIs, pathology images, and ultrasounds—providing automated, yet highly precise assessments.84,85 These systems are capable of identifying subtle patterns and anomalies that may elude human observation, enabling earlier detection of conditions such as tumors, fractures, and neurological disorders.86 Additionally, AI algorithms improve continuously through exposure to diverse datasets, ensuring that their diagnostic capabilities evolve in response to new clinical evidence. By automating the image interpretation process, AI not only accelerates the diagnostic workflow but also reduces the potential for human error. Furthermore, AI’s ability to integrate multimodal data, such as patient histories and genetic information, enhances the contextual understanding of imaging results, fostering more accurate and personalized treatment decisions. In this manner, AI is poised to redefine the landscape of medical imaging, enhancing both diagnostic precision and patient outcomes.87
Personalized treatment
AI agents show significant potential in the realm of personalized treatment by leveraging a variety of patient-specific data, including genetic information, medical history, lifestyle factors, and treatment responses.77,88,89,90,91,92 With the ability to analyze and integrate this complex data, AI can assist in tailoring individualized treatment plans that optimize therapeutic outcomes. For instance, in oncology, AI agents can analyze a patient’s genetic profile alongside clinical data to recommend targeted therapies that are more likely to be effective, minimizing adverse effects and improving prognosis.93 Additionally, AI can predict how patients will respond to specific drugs based on their unique genetic makeup, facilitating the choice of the most suitable medication and dosage.94 By continuously learning from patient data and treatment outcomes, AI agents help refine treatment plans in real-time, ensuring the most effective interventions are implemented throughout the course of care.95 This ability to offer personalized, data-driven recommendations is transforming healthcare delivery, making it more precise and patient-centered.
Surgical assistance and robotic surgery
AI-driven surgical assistance and robotic surgery represent a paradigm shift in the precision and safety of medical interventions.96,97,98,99,100 By integrating advanced algorithms with robotic systems, AI enhances the surgeon’s ability to perform complex procedures with unparalleled accuracy.101 These AI systems facilitate real-time image analysis, providing the surgeon with enhanced visualizations of the surgical site and critical anatomical structures, thus improving decision-making and reducing intraoperative risks.102 Moreover, robotic surgery systems guided by AI can execute highly precise movements, minimizing human error and ensuring optimal outcomes, particularly in delicate, minimally invasive procedures.103 The continuous learning capability of AI allows these systems to adapt based on historical surgical data, refining their performance over time. Furthermore, AI can predict potential complications during surgery by analyzing real-time data from patient monitoring systems, enabling proactive interventions. Through the integration of AI in surgical environments, healthcare delivery is transformed, leading to improved accuracy, reduced recovery times, and enhanced overall patient outcomes.
Real-time patient monitoring
AI-driven real-time patient monitoring systems are revolutionizing clinical care by enabling continuous, data-driven tracking of patient health.90,104,105 Through integration of wearable devices, biosensors, and remote monitoring tools, AI agents can collect and analyze a range of physiological data, including heart rate, blood pressure, and blood glucose levels.106 This real-time data analysis allows for early detection of health abnormalities, facilitating timely interventions and reducing risks.107 AI algorithms can identify subtle patterns indicating the onset of critical events, such as arrhythmias or sepsis, which may otherwise go unnoticed. Moreover, through continuous learning, these systems refine their predictive capabilities, tailoring their assessments to individual patient conditions. By enhancing the accuracy of clinical decision-making, AI systems support healthcare providers in delivering more personalized, proactive care. Consequently, real-time patient monitoring powered by AI not only improves the responsiveness of healthcare teams but also contributes to better patient outcomes by enabling timely, evidence-based interventions.
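One simple form of such monitoring is flagging a vital-sign reading that deviates sharply from a recent sliding baseline. The sketch below uses a z-score test over a rolling window; the window size and threshold are illustrative, not clinically validated:

```python
from collections import deque
from statistics import mean, stdev

class VitalSignMonitor:
    """Flags a reading deviating more than `z_threshold` standard deviations
    from a sliding baseline window. Parameters are illustrative only."""
    def __init__(self, window: int = 10, z_threshold: float = 3.0):
        self.readings = deque(maxlen=window)
        self.z_threshold = z_threshold

    def add(self, value: float) -> bool:
        """Record one reading; return True if it should trigger an alert."""
        alert = False
        if len(self.readings) >= 3:  # need a minimal baseline before testing
            mu, sigma = mean(self.readings), stdev(self.readings)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                alert = True
        self.readings.append(value)
        return alert

monitor = VitalSignMonitor(window=10, z_threshold=3.0)
for hr in (72, 74, 73, 75, 74, 73):
    assert monitor.add(hr) is False   # stable baseline, no alerts
print(monitor.add(140))               # sudden spike -> True
```

Production systems replace the z-score rule with learned, patient-specific models, but the structural idea is the same: a bounded baseline, a deviation test, and an alert pathway to the care team.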
AI Agent Hospital Healthy System
The applications discussed above—from diagnosis and treatment to surgery and monitoring—are not merely isolated advancements. This concept can be scaled to an entire healthcare system, culminating in the visionary model of the AI Hospital (Figure 2B). This idea is no longer purely theoretical; pioneering research, such as the “Agent Hospital” simulation by Li et al.50 from Tsinghua University, has already demonstrated a virtual hospital where all roles—patients, nurses, and doctors—are played by autonomous LLM-powered agents. Inspired by such research, an overarching “Agent Hospital Healthy System” coordinates patient flow across a network of facilities, while function-specific agents—such as the ED (emergency department) agent, IW (inpatient ward) agent, and MDT (multidisciplinary team) agent—orchestrate complex daily logistics. The AI Agent Hospital, therefore, represents the ultimate application of medical AI agents: a fully integrated system where the micro-level efficiencies of individual patient care are aggregated to revolutionize healthcare delivery. This forward-looking application underscores the immense transformative potential of this technology, while also highlighting the scale of the challenges that will be discussed in the subsequent sections.
Stages of autonomy in medical AI agents
The integration of AI into healthcare is marked by distinct levels of autonomy, each representing a deeper integration of AI agents into clinical workflows.108,109 These levels range from the absence of AI intervention to full autonomy in patient care management. As AI systems evolve, they progressively take on more complex roles, supporting healthcare professionals in a wide range of tasks, from data analysis to decision-making.
In the initial stage, the use of AI is nonexistent, with healthcare professionals entirely responsible for diagnostic and treatment decisions. In this phase, there is no AI agent involved in the clinical process, and all tasks, including hypothesis generation, patient assessment, and treatment planning, are performed solely by human practitioners. AI has yet to play any role in augmenting clinical decision-making or workflow efficiency.
In the intermediate stage, AI agents act as assistants to healthcare providers. These systems assist with predefined tasks, such as analyzing patient data, identifying trends, and offering diagnostic suggestions based on established protocols. While AI provides valuable support, it remains a passive tool, with human clinicians making the final decisions and overseeing patient care.
In the advanced stage, AI agents transition into active collaborators, playing a significant role in clinical decision-making. At this stage, AI systems generate hypotheses, assist in designing treatment plans, and analyze complex datasets to provide real-time, data-driven insights. Healthcare providers and AI agents work in tandem, with AI continuously learning from new data to refine its recommendations, while human professionals retain control over final decisions and care pathways.
In the final stage, AI agents reach full autonomy, independently handling complex tasks such as diagnosing conditions, formulating treatment strategies, and adapting care plans in real time. These agents are capable of processing vast amounts of patient data, applying advanced machine learning algorithms, and autonomously adjusting patient care based on continuous monitoring and new information. Healthcare providers primarily focus on oversight and ethical validation, with AI executing the majority of clinical decisions, significantly enhancing both the efficiency and accuracy of healthcare delivery.
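The four stages above can be summarized as an ordered autonomy scale. The gating function below is a hypothetical sketch of how the set of actions requiring human sign-off might shrink at each stage; the action names are invented for illustration:

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    NONE = 0          # clinicians act without AI involvement
    ASSISTANT = 1     # AI suggests; humans make every decision
    COLLABORATOR = 2  # AI co-designs plans; humans retain final sign-off
    AUTONOMOUS = 3    # AI executes decisions; humans provide oversight

def requires_human_signoff(level: AutonomyLevel, action: str) -> bool:
    """Illustrative gate: permitted autonomy widens with each stage."""
    if level <= AutonomyLevel.ASSISTANT:
        return True                                   # everything needs approval
    if level == AutonomyLevel.COLLABORATOR:
        return action in {"prescribe", "discharge"}   # only high-stakes actions
    return False                                      # fully autonomous stage

print(requires_human_signoff(AutonomyLevel.ASSISTANT, "order_lab"))     # True
print(requires_human_signoff(AutonomyLevel.COLLABORATOR, "order_lab"))  # False
print(requires_human_signoff(AutonomyLevel.COLLABORATOR, "prescribe"))  # True
print(requires_human_signoff(AutonomyLevel.AUTONOMOUS, "prescribe"))    # False
```

Encoding the stages as an ordered type makes the oversight policy explicit and auditable, which matters for the regulatory and ethical validation role that remains with clinicians even at full autonomy.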
Developing medical AI agents: A comprehensive approach
The development of medical AI agents can be approached through two primary methodologies: graphical user interface (GUI)-based frameworks and coding-based development.110,111,112 GUI-based approaches, commonly facilitated by low-code or no-code platforms, offer a streamlined pathway for constructing AI systems, allowing users to leverage intuitive drag-and-drop tools for rapid development. These platforms are particularly advantageous for prototyping and for those with limited programming expertise, facilitating quick deployment in specific medical use cases. However, this approach often sacrifices flexibility, scalability, and, importantly, data privacy, especially when integration with external services is required. In contrast, coding-based development provides a more robust and customizable solution, granting developers full control over the AI agent’s architecture, decision-making processes, and data handling capabilities. This approach is particularly valuable in healthcare settings, where the security of sensitive patient data is paramount. Given its advantages in flexibility, control, and privacy, this section focuses on the coding-based development of medical AI agents, offering a comprehensive exploration of this methodology’s capabilities and applications (Figure 3).
Figure 3.
The development of medical AI agents
The development of medical AI agents follows a structured process comprising five key stages. Defining the problem and scope involves identifying clinical needs such as early diagnosis, outcome prediction, and robot-assisted surgery. Selecting an agentic framework entails choosing AI architectures like LangChain, AutoGen, and MetaGPT for efficient task execution. Developing the AI model includes pre-training, fine-tuning for medical contexts, and reinforcement learning to enhance performance. Test and validation ensures safety and effectiveness through a structured pipeline that reflects the phased progression of clinical trials, moving from initial in silico simulations to controlled real-world clinical trials and subsequent external and longitudinal validation. Deployment and continuous learning integrates AI agents into clinical environments, allowing real-time optimization, adaptation to evolving data, and ensuring data security and privacy.
Defining the problem and scope
The initial phase in the development of a medical AI agent involves the precise identification of the specific healthcare problem that the agent is intended to address. This process extends beyond a narrow technical task; it requires contextualizing the agent’s role within the broader ecosystems of digital health and clinical decision-making. In the context of digital health, defining the problem means understanding how the agent will function as a potential data integrator rather than another data silo. For example, is the agent designed to merely process a single data stream, or will it actively synthesize information from disparate sources like EHRs, wearable devices, and genomic databases to provide a holistic patient view? This consideration determines whether the agent supports a truly interconnected health model or simply adds to the existing fragmentation. Similarly, the scope must clarify if the agent’s role is reactive (e.g., analyzing past events) or proactive (e.g., forecasting health risks), which aligns its development with the larger trend in digital health toward preventative care. Within the clinical decision-making process, defining the scope involves specifying the agent’s level of partnership. Is it intended to be a passive information provider, merely flagging abnormalities for clinicians? Or is it envisioned as a more active cognitive partner, capable of generating hypotheses, weighing evidence, and engaging in diagnostic reasoning alongside the healthcare team? This distinction is critical, as it directly impacts the design of human-agent interaction, the requirements for explainability, and the necessary evolution of clinical workflows to accommodate this new form of collaboration. Therefore, a well-defined problem statement that situates the agent within these broader contexts is crucial. 
Such a statement ensures the AI system is tailored to the complex, interconnected realities of modern healthcare, maximizing its utility, relevance, and efficacy.
Selecting the right agentic framework
After defining the problem, selecting the appropriate framework is crucial for developing a robust and effective medical AI agent. Agentic frameworks, such as LangChain and Microsoft AutoGen, provide the essential infrastructure required to construct AI systems capable of managing complex, dynamic workflows. These frameworks are designed with modular components, offering pre-built tools, memory management systems, and seamless data integration capabilities, thereby significantly reducing development time and simplifying the creation of intelligent agents. It is crucial that the chosen framework enables the AI agent to effectively interface with clinical databases, medical imaging tools, and real-time patient monitoring systems. Such integration is indispensable for generating actionable insights that can inform clinical decision-making and enhance the overall quality of patient care. The framework must not only support efficient data processing but also ensure that the AI agent can interact with diverse medical systems in a way that aligns with clinical workflows and regulatory standards.
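Although each framework exposes its own API, the core loop they all provide — plan steps, invoke tools, record observations in memory — can be sketched in a framework-agnostic way. All class, method, and tool names below are illustrative, not LangChain's or AutoGen's actual interfaces.

```python
from typing import Callable, Dict, List, Tuple

class MedicalAgent:
    """Framework-agnostic sketch of the tool-calling loop that agentic
    frameworks such as LangChain or AutoGen provide out of the box."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        self.tools = tools           # e.g., EHR lookup, imaging analysis
        self.memory: List[str] = []  # running interaction history

    def run(self, task: str, plan: List[Tuple[str, str]]) -> List[str]:
        # 'plan' lists (tool_name, query) steps; a real framework would
        # have an LLM generate and revise this plan from the task itself.
        self.memory.append(f"task: {task}")
        observations = []
        for tool_name, query in plan:
            obs = self.tools[tool_name](query)                    # action
            self.memory.append(f"{tool_name}({query}) -> {obs}")  # memory
            observations.append(obs)
        return observations
```

A hypothetical usage: `agent = MedicalAgent({"ehr_lookup": lambda q: f"record for {q}"})` followed by `agent.run("chart review", [("ehr_lookup", "patient-42")])`. The value a real framework adds over this sketch is precisely the pieces elided here: LLM-driven planning, retry logic, and connectors to clinical data sources.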
Developing the AI model
Once the framework is established, the subsequent phase involves the development of the AI model that will underpin decision-making processes. This step requires the selection or training of an appropriate model based on the available medical data, including EHRs, patient histories, and diagnostic imaging.113 The model may either be pre-trained—leveraging open-source or proprietary models—or constructed from the ground up using a dataset tailored to the specific clinical application. When using a pre-trained model, fine-tuning is often necessary to adapt the model to the particularities of the medical context and the defined task. Alternatively, training a bespoke model provides greater flexibility, enabling the AI system to learn directly from proprietary data, identify intricate patterns, and generate highly accurate predictions. Regardless of the approach, memory management is integral to the model’s functionality, as the AI must retain and integrate contextual knowledge from past interactions and patient data. This capacity for contextual continuity is essential for delivering consistent, long-term care recommendations and ensuring the system’s adaptability to the dynamic nature of patient conditions.
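The memory management described above can be as simple as a bounded buffer of recent interactions from which a prompt context is assembled. The sketch below is minimal and assumes short text notes; real systems would add long-term retrieval, for example over a vector store.

```python
from collections import deque

class ContextMemory:
    """Minimal sketch of contextual memory: retain the most recent
    interactions within a fixed budget so the model can be prompted
    with relevant history."""

    def __init__(self, max_items: int = 50):
        # deque with maxlen silently discards the oldest entries
        self.buffer = deque(maxlen=max_items)

    def remember(self, item: str) -> None:
        self.buffer.append(item)

    def context(self, last_n: int = 5) -> str:
        # Assemble the most recent notes into a prompt-ready block.
        return "\n".join(list(self.buffer)[-last_n:])
```

The fixed `maxlen` is a stand-in for the context-length budget of the underlying model; choosing what to evict (recency, as here, versus clinical relevance) is itself a design decision.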
Test and validation
Prior to the deployment of a medical AI agent in clinical practice, it is imperative that the agent undergoes comprehensive testing and validation to ensure its safety, efficacy, and alignment with medical standards. This process involves both in silico simulations and clinical trials, allowing the AI to be evaluated in real-world conditions and ensuring that its performance meets the rigorous demands of healthcare environments. To ensure clinical relevance, this validation should ideally mirror the phased progression of traditional drug or device trials, beginning with initial feasibility and safety assessments before moving to larger-scale efficacy studies. The AI system must be meticulously validated against established medical protocols to confirm that its diagnostic predictions and therapeutic recommendations are consistent with current clinical expertise. Furthermore, adherence to regulatory frameworks, including compliance with data privacy laws such as HIPAA and medical device guidelines such as those set by the Food and Drug Administration (FDA), is essential for ensuring legal and ethical integrity. Just as importantly, ethical considerations such as transparency in decision-making, accountability for outcomes, and the mitigation of biases must be thoroughly addressed during this stage. This ensures that the AI operates in a manner that is scientifically sound and equitable in its application to patient care.
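The phased pipeline described above — each stage gating the next, in the manner of clinical-trial phases — can be expressed as a simple sequential check. Stage names and the pass/fail inputs below are placeholders for the outcomes of real studies, not an actual regulatory workflow.

```python
def run_validation_pipeline(
    stage_passed,
    stages=("in_silico", "pilot_trial", "external_validation", "longitudinal"),
):
    """Sketch of phased validation: each stage must pass before the
    next begins. 'stage_passed' maps stage name -> bool, standing in
    for the result of the corresponding study."""
    completed = []
    for stage in stages:
        if not stage_passed.get(stage, False):
            return completed, stage   # halted at the first failing stage
        completed.append(stage)
    return completed, None            # all stages passed
```

Returning the failing stage alongside the completed ones mirrors how a real program would report where in the pipeline an agent was halted.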
Deployment and continuous learning
Following successful validation, the medical AI agent is ready for deployment within a clinical environment. Seamless integration with existing healthcare infrastructure, including hospital management systems and EHRs, is crucial for efficient operation and data flow. Beyond technical integration, effective human-agent interaction design, grounded in principles from human-factors engineering,114 is critical for successful clinical adoption. This requires intuitive interfaces that present AI recommendations in contextually appropriate formats with clear confidence indicators and alternative options, a design imperative focused on minimizing clinician cognitive load and enhancing situational awareness. Trust-building mechanisms must include gradual exposure protocols, performance transparency dashboards, and feedback loops where clinicians can correct AI decisions, enabling the system to learn from human expertise. Interfaces should present information hierarchically, highlight critical alerts, and integrate seamlessly into existing EHR workflows, as supported by numerous usability studies in clinical settings.115,116,117 Once deployed, continuous monitoring and regular feedback from healthcare professionals are essential to assess the AI system’s ongoing effectiveness and relevance in real-world clinical settings. It is imperative that AI agents are designed to learn iteratively from new data and evolving clinical experiences, thereby enabling them to refine their capabilities and adjust to shifts in medical knowledge, practices, and patient demographics. This capacity for dynamic learning ensures that the AI system remains responsive and adaptive, optimizing its performance and clinical outcomes over time.
Furthermore, as medical knowledge evolves and new challenges arise, the AI agent’s continuous improvement contributes to more accurate decision-making and enhanced healthcare delivery, ultimately improving both patient care and operational efficiency.
Evaluation of medical AI agents
Recommendation accuracy
The fundamental criterion for evaluating a medical AI agent is its precision in generating diagnostic and treatment recommendations.80,100,118,119,120 This can be assessed by comparing the agent’s outputs with those of experienced clinicians, particularly in the context of complex or rare conditions where human expertise is paramount. Diagnostic accuracy is typically quantified using established clinical metrics, including sensitivity, specificity, and overall accuracy. These measures provide a rigorous framework for assessing the AI system’s performance, ensuring its recommendations align with current medical standards and evidence-based practices. Such evaluations are critical in confirming the AI’s reliability as a clinical support tool, demonstrating its capability to offer valid, data-driven guidance in real-world medical settings.
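The clinical metrics named above follow directly from the confusion matrix. A minimal sketch for binary diagnoses (1 = condition present) scored against clinician reference labels:

```python
def diagnostic_metrics(predictions, labels):
    """Sensitivity, specificity, and overall accuracy of binary
    predictions against clinician reference labels."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true-negative rate
        "accuracy": (tp + tn) / len(labels),
    }
```

In practice these point estimates would be reported with confidence intervals and stratified by subpopulation, since aggregate accuracy can mask inequitable performance.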
Task execution and efficiency
The efficiency of a medical AI agent is not solely defined by the speed of task execution but also by the ability to reduce human workload, streamline medical workflows, and optimize overall operational efficiency.121 The assessment of efficiency involves evaluating key performance metrics such as task completion time, success rate, step complexity, and economic cost. These factors are particularly relevant when AI agents rely on proprietary or closed-source models, which may introduce significant computational costs and licensing fees. For example, in applications such as automated medical record entry, drug prescription recommendations, and surgical planning, efficiency can be gauged by the time required to complete tasks, the number of steps needed, and the associated resource consumption. The economic cost, especially when leveraging closed-source models, must also be considered, as this can affect the cost-effectiveness of AI implementation. An efficient AI system that handles routine tasks with minimal time and resource expenditure enables healthcare professionals to focus more on patient care, thereby improving clinical outcomes and operational productivity.
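These efficiency measures can be aggregated straightforwardly from per-task logs; the record fields and numbers below are illustrative, with `api_cost_usd` standing in for closed-source model fees.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    completed: bool      # did the agent finish the task?
    seconds: float       # wall-clock completion time
    steps: int           # number of actions/tool calls taken
    api_cost_usd: float  # e.g., closed-source model licensing fees

def efficiency_report(records):
    """Aggregate the metrics named above: success rate, mean completion
    time and step count (over successes), and total economic cost."""
    done = [r for r in records if r.completed]
    n = len(records)
    return {
        "success_rate": len(done) / n if n else 0.0,
        "mean_seconds": sum(r.seconds for r in done) / len(done) if done else 0.0,
        "mean_steps": sum(r.steps for r in done) / len(done) if done else 0.0,
        "total_cost_usd": sum(r.api_cost_usd for r in records),
    }
```

Note that cost is summed over all attempts, including failures: an agent that fails often can still be expensive, which is exactly the cost-effectiveness concern raised above.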
Long-term learning and adaptation
Medical AI agents should be designed to continuously learn from new data and clinical experiences to sustain their relevance and effectiveness. However, evaluating long-term learning capabilities presents significant methodological challenges, requiring longitudinal studies with substantial costs and potential confounding variables from evolving medical practices. The evaluation of their capacity for ongoing improvement is based on the system’s ability to incorporate model updates and integrate feedback loops.122,123 AI systems that embed continuous learning processes are better equipped to adapt to emerging disease patterns, evolving treatment strategies, and changing patient outcomes. This adaptive capacity allows the AI to remain responsive to advancements in medical knowledge, ensuring its clinical applicability and enhancing patient care. Given the rapid pace of progress in medical tools and methodologies, as well as the variability of clinical contexts, it is essential that AI models are not static but modular. Such modularity facilitates the incorporation of specialized models tailored to distinct clinical scenarios, thereby increasing the agent’s versatility and scope of application. This approach enables the AI agent to integrate new technologies and methodologies seamlessly, ensuring its sustained relevance across a wide array of healthcare applications, from diagnosis to treatment planning. To guide future research in this area, potential methodologies could include longitudinal simulation studies, which would allow for the assessment of an agent’s adaptability in controlled, evolving virtual environments, thus overcoming the cost and complexity of long-term clinical trials.
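The modularity argued for above can be sketched as a registry mapping clinical scenarios to specialist models, so that new specialists can be added or swapped without retraining the whole system. All names here are illustrative.

```python
class ModularAgent:
    """Sketch of a modular agent: specialized models are registered per
    clinical scenario and can be replaced independently as tools and
    methodologies evolve."""

    def __init__(self):
        self.models = {}  # scenario name -> callable model

    def register(self, scenario: str, model) -> None:
        self.models[scenario] = model  # add or swap in a specialist

    def predict(self, scenario: str, data):
        if scenario not in self.models:
            raise KeyError(f"no specialist registered for {scenario!r}")
        return self.models[scenario](data)
```

The design choice this illustrates is isolation: updating the "radiology" specialist cannot regress the "cardiology" one, which simplifies the longitudinal re-validation problem discussed above.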
Explainability and transparency
In the medical domain, the explainability of AI agents plays a crucial role in fostering their acceptance and ensuring their trustworthiness among healthcare professionals. AI systems are expected to provide transparent reasoning for their decisions, particularly when offering diagnostic suggestions or treatment strategies. The ability for clinicians to understand the rationale behind an AI’s conclusions is essential for informed clinical decision-making, as it cultivates confidence in the system and supports appropriate human oversight. The evaluation of explainability involves assessing the AI’s capacity to present its reasoning in a manner that is both accessible and comprehensible to medical practitioners. This may include the use of interpretable models and tools designed to visualize or explain the decision-making process, such as feature importance maps or decision trees. Explainability becomes even more critical in high-stakes environments, where AI-generated recommendations can significantly impact patient health. By ensuring that AI systems can provide clear, justifiable explanations, their clinical utility is enhanced, and the indispensable role of healthcare providers in decision-making is reinforced. Despite its importance, quantifying explainability remains inherently subjective, as different stakeholders—clinicians, patients, and regulators—may require fundamentally different types and levels of explanation. This subjectivity creates challenges in establishing standardized evaluation metrics that adequately capture the diverse needs and perspectives across healthcare settings. Future work could focus on developing stakeholder-specific explainability benchmarks, creating tailored evaluation frameworks that measure the utility and clarity of AI-generated explanations for distinct end-users, thereby moving beyond a one-size-fits-all approach to transparency.
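For a linear risk score, the feature-importance idea mentioned above reduces to ranking per-feature contributions. The toy sketch below uses invented weights and feature values; non-linear models would need attribution methods such as SHAP instead.

```python
def explain_linear_score(weights, features):
    """Toy attribution sketch: for a linear risk score, each feature's
    contribution is weight * value, which can be surfaced to clinicians
    as a ranked 'why' list."""
    contributions = {name: weights[name] * value
                     for name, value in features.items()}
    # Rank by absolute contribution so strong negative drivers also surface.
    return sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
```

With hypothetical weights `{"age": 0.03, "lactate": 0.9}` and inputs `{"age": 70, "lactate": 4.0}`, the ranking puts lactate first — the kind of ordered, human-readable rationale the paragraph above calls for.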
While these applications highlight the immense potential of medical AI agents, their translation from promising prototypes to widespread clinical reality is fraught with significant hurdles. The following sections will critically analyze these technical, ethical, and practical challenges that must be addressed.
Ethical and technical considerations
The implementation of AI agents in healthcare requires addressing various ethical and technical challenges to ensure they enhance healthcare delivery while maintaining trust, equity, and efficiency.124 Real-world implementation faces substantial hurdles, including IT infrastructure integration complexities, where legacy hospital systems often lack standardized APIs and interoperability standards. Cost-benefit analyses reveal significant upfront investments in computational infrastructure, staff training, and system maintenance, with implementation costs varying substantially depending on scope and complexity. Clinician adoption represents another critical barrier, as some healthcare providers may express concerns about AI reliability and workflow disruption, necessitating comprehensive training programs and change management strategies to ensure successful integration.
Ethics
Medical AI agents must prioritize patient privacy and data security when they process sensitive information such as medical histories and treatment plans.125,126,127 Robust encryption and strict access policies are crucial for preventing data breaches and misuse.128 Transparency in decision-making is equally vital, requiring AI systems to provide clear explanations for their recommendations to ensure accountability and foster trust among patients and clinicians.129 Another pressing concern is model bias, as AI systems may inadvertently reflect historical healthcare disparities, potentially leading to inequitable outcomes. Rigorous bias mitigation strategies are essential to ensure fairness across diverse patient populations. Additionally, the debate over closed-source versus open-source AI models presents unique ethical dilemmas. While closed-source systems may limit transparency, open-source models risk uncontrolled dissemination and misuse, necessitating a careful balance between innovation and security. Finally, concerns that AI agents will replace clinicians appear largely unfounded: these systems are best positioned to augment, not replace, human expertise, freeing clinicians to focus on the relational and complex aspects of care.
Technical challenges
The effective deployment of AI agents is hindered by several technical barriers. Efficiency remains a significant issue, as some tasks require extended processing times, impeding timely decision-making in critical medical scenarios.130 Ensuring the quality of AI outputs is another challenge, as inconsistencies or ambiguities in results can have serious implications for patient care. Failure cases in medical AI systems underscore these challenges, including instances where AI diagnostic systems have demonstrated reduced accuracy across different patient populations and cases where AI-powered prediction algorithms have generated excessive false alarms, potentially causing alert fatigue among clinicians and affecting response to genuine emergencies. Limitations in context length also pose challenges, particularly for analyzing extensive clinical records and longitudinal data. Moreover, unexpected AI behaviors and deviations from instructions can disrupt workflows, leading to errors in diagnoses or treatment recommendations. Network reliability further complicates deployment, as latency or service interruptions can compromise real-time applications in healthcare. Addressing these challenges requires advances in model design, robust validation protocols, improved context management, and optimized network infrastructure.
Regulatory framework adaptation
Current medical device regulatory frameworks require substantial adaptation to accommodate the unique characteristics of AI agents with increasing autonomy and continuous learning capabilities. Traditional FDA pathways for medical device approval assume static algorithms with predictable behaviors, whereas autonomous AI agents evolve their decision-making patterns through continuous learning, presenting novel challenges for safety and efficacy monitoring. In response, regulatory agencies are establishing dedicated initiatives, such as the FDA’s Digital Health Center of Excellence, to foster innovation while ensuring patient safety. These bodies are developing adaptive frameworks, but regional approaches differ. For instance, the FDA’s Software as a Medical Device (SaMD) framework focuses on a risk-based and iterative life cycle approach, even allowing for pre-approved modification plans for adaptive AI algorithms. In contrast, the EU’s AI Act provides a broader, cross-sectoral regulation that classifies AI systems into risk tiers, imposing stringent requirements on “high-risk” systems, which include many medical AI agents. While these frameworks establish risk-based classifications and require ongoing post-market surveillance, they must continue to evolve to fully address continuous learning algorithms that may develop unexpected behaviors, requiring dynamic validation protocols and real-time monitoring systems to ensure patient safety.
Future directions for medical AI agents
From reactive to proactive
By leveraging continuous monitoring of vital signs, environmental data, and patient histories, coupled with advanced predictive analytics, AI agents are capable of forecasting health risks and intervening before symptoms emerge. This paradigm empowers individuals to take control of their health and alleviates pressure on healthcare systems by prioritizing prevention over treatment. The transition to proactive medical AI represents a transformative step toward anticipatory, precision-focused healthcare that aligns with sustainability and patient-centric principles.
Personalized AI agents
Personalized medical agents, functioning as real-time health assistants, utilize advanced AI to monitor individual health metrics, including vital signs, activity levels, and environmental influences.131 By analyzing data from wearable devices and biosensors, these agents deliver tailored health recommendations, predict potential risks, and issue timely alerts. This integration of continuous monitoring and predictive analytics enables individuals to proactively manage their well-being while facilitating early interventions, bridging the gap between clinical care and personal health autonomy for a more preventive, patient-centered approach.
Synergistic multi-agent collaboration
Collaborative multi-agent systems employ sophisticated coordination mechanisms including hierarchical task decomposition, distributed consensus algorithms, and dynamic role allocation protocols. The communication framework utilizes standardized medical ontologies to ensure semantic interoperability between agents. Conflict resolution mechanisms implement priority-based decision trees when agents provide contradictory recommendations, with escalation protocols involving human clinicians for complex cases. Diagnostic, therapeutic, and monitoring agents work in tandem, integrating their expertise to address complex medical challenges. For example, diagnostic agents can analyze imaging data, while therapeutic agents develop tailored treatment plans, supported by real-time feedback from monitoring agents. This dynamic synergy enhances precision, efficiency, and adaptability across diverse clinical scenarios, from individualized care to large-scale public health interventions, paving the way for a more integrated and responsive healthcare ecosystem.
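The priority-based conflict resolution with human escalation described above can be sketched as follows; priorities, recommendations, and the escalation callback are all illustrative stand-ins for a real coordination protocol.

```python
def resolve(recommendations, escalate):
    """Priority-based conflict resolution sketch: each agent submits a
    (priority, recommendation) pair. If the top-priority agents disagree,
    the case is escalated to a human clinician via the 'escalate'
    callback, standing in for the human-in-the-loop."""
    ranked = sorted(recommendations, key=lambda r: -r[0])
    top_priority = ranked[0][0]
    top = {rec for pri, rec in ranked if pri == top_priority}
    if len(top) > 1:                 # contradictory top-priority advice
        return escalate(sorted(top))
    return top.pop()
```

A deliberate design choice here is that the system never silently averages contradictory clinical advice: agreement at the top priority resolves automatically, while genuine conflict always reaches a human.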
Lightweight models on edge devices
The deployment of lightweight AI models on edge devices revolutionizes medical AI by enabling real-time, localized data processing with minimal latency.132,133 Optimized through techniques like pruning and quantization, these models deliver high performance while reducing computational demands, ensuring functionality even in low-resource settings.103 By minimizing data transmission, they enhance patient privacy and mitigate connectivity limitations, making AI-powered diagnostics and personalized care accessible globally. Lightweight models on edge devices represent a transformative step toward equitable, efficient, and secure healthcare delivery.
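Quantization, one of the compression techniques cited above, can be illustrated with a hand-rolled symmetric int8 scheme; production deployments would use toolchains such as PyTorch or TensorFlow Lite rather than this sketch, and pruning would additionally zero out low-magnitude weights.

```python
def quantize_int8(weights):
    """Hand-rolled sketch of symmetric int8 post-training quantization:
    scale floats so the largest magnitude maps to 127, round to integer
    codes, then dequantize to see the approximation error."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero case
    codes = [round(w / scale) for w in weights]        # int8 integer codes
    recovered = [c * scale for c in codes]             # dequantized floats
    return codes, scale, recovered
```

Each weight now needs one byte instead of four or eight, at the cost of a per-weight error bounded by half the scale — the memory/precision trade-off that makes edge deployment feasible.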
Impact on medical education
The widespread adoption of AI agents necessitates fundamental changes to medical education curricula and training paradigms. Future clinicians must develop competencies in human-AI collaboration, including understanding AI limitations, interpreting algorithmic recommendations, and maintaining clinical reasoning skills despite AI assistance. Medical schools are beginning to integrate AI literacy courses covering machine learning principles, bias recognition, and ethical AI use. Additionally, training programs must emphasize skills that complement AI capabilities, such as empathetic patient communication, complex ethical decision-making, and adaptability to evolving technological landscapes. Simulation-based training with AI agents allows students to practice collaborative decision-making in controlled environments before clinical practice.
Conclusion
Medical AI agents are transforming healthcare by integrating advanced decision-making capabilities, autonomy, and adaptability into clinical workflows. Through multimodal technologies and a framework of planning, action, reflection, and memory, these agents address complex challenges such as personalized care, real-time monitoring, and predictive analytics. Our comprehensive review establishes a conceptual framework for medical AI agents, examines their current applications with specific case studies demonstrating promising clinical improvements, and addresses critical implementation challenges including IT integration complexities, clinician adoption barriers, and regulatory considerations. The evidence presented shows that medical AI agents can achieve significant clinical benefits in areas such as sepsis detection and emergency medicine response times, while also highlighting important limitations and failure cases that must be addressed. Future developments, including proactive systems, personalized agents, and lightweight edge models, emphasize a shift toward precision, efficiency, and equity in healthcare delivery. By fostering collaboration among diverse agents and leveraging cutting-edge technologies, AI agents hold immense potential to revolutionize patient care and improve health outcomes across a wide range of applications. However, successful implementation requires careful attention to human-agent interaction design, continuous learning evaluation, and adaptation of regulatory frameworks to accommodate the unique characteristics of autonomous AI systems in healthcare.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (32470964, 32100631, W2431057, and 32141005), the Macau Science and Technology Development Fund, Macao (0007/2020/AFJ, 0070/2020/A2, and 0003/2021/AKP), and Guangzhou National Laboratory (YW-SLJC0201) and sponsored by the Beijing Nova Program (20240484627), the Capital's Funds for Health Improvement and Research (2024-4-40215), the China Postdoctoral Science Foundation (2023T160061), the Macao Young Scholars Program (AM2023018), National High-Level Hospital Clinical Research Funding and the Beijing Hope Run Special Fund of Cancer Foundation of China (LC2022B29), and the Innovation Team and Talents Cultivation Program of the National Administration of Traditional Chinese Medicine (no.: ZYYCXTD-D-202402).
Author contributions
Y.Y. and K.Z. conceived the idea. F.L. curated and analyzed the data, wrote the manuscript, and created the figures and tables. Y.N., Q.H.Z., J.L., K.W., Z.Y.D., I.N.W., L.L.C., T.L., L.D., K.L., G.L., T.W.H., M.F., H.Y.L., X.M.C., K.Z., and Y.Y. commented on and revised the manuscript.
Declaration of interests
The authors declare no competing interests.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors used a large language model for grammar and language refinement in the writing process. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the final publication.
Contributor Information
Manson Fok, Email: manson.fok@gmail.com.
Huiyan Luo, Email: luohy@sysucc.org.cn.
Xiangmei Chen, Email: xmchen301@126.com.
Kang Zhang, Email: kang.zhang@gmail.com.
Yun Yin, Email: jennyyin629@gmail.com.
References
- 1.Alison D., Athena R., Paul W. Conversational Agents in Health Care. JAMA. 2020;324 doi: 10.1001/jama.2020.21509. [DOI] [Google Scholar]
- 2.Durante Z., Huang Q., Wake N., Gong R., Park J.S., Sarkar B., Taori R., Noda Y., Terzopoulos D., Choi Y. Agent AI: Surveying the horizons of multimodal interaction. arXiv. 2024 doi: 10.48550/arXiv.2401.03568. Preprint at. [DOI] [Google Scholar]
- 3.Kapoor S., Stroebl B., Siegel Z.S., Nadgir N., Narayanan A. AI Agents That Matter. arXiv. 2024 doi: 10.48550/arXiv.2407.01502. Preprint at. [DOI] [Google Scholar]
- 4.Kolt N. Governing AI agents. arXiv. 2025 doi: 10.48550/arXiv.2501.07913. Preprint at. [DOI] [Google Scholar]
- 5.Fan W., Chen P., Shi D., Guo X., Kou L. Multi-agent modeling and simulation in the AI age. Tsinghua Sci. Technol. 2021;26:608–624. [Google Scholar]
- 6.Ruan J., Chen Y., Zhang B., Xu Z., Bao T., Mao H., Li Z., Zeng X., Zhao R. Tptu: Task planning and tool usage of large language model-based AI agents. arXiv. 2023 doi: 10.48550/arXiv.2308.03427. Preprint at. [DOI] [Google Scholar]
- 7.Ye A., Ma Q., Chen J., Li M., Li T., Liu F., Mai S., Lu M., Bao H., You Y. SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs. arXiv. 2025 doi: 10.48550/arXiv.2501.07913. Preprint at. [DOI] [Google Scholar]
- 8.Agashe S., Han J., Gan S., Yang J., Li A., Wang X.E. Agent S: An open agentic framework that uses computers like a human. arXiv. 2024 doi: 10.48550/arXiv.2410.08164. Preprint at. [DOI] [Google Scholar]
- 9.Feriani A., Hossain E. Single and multi-agent deep reinforcement learning for AI-enabled wireless networks: A tutorial. IEEE Commun. Surv. Tutorials. 2021;23:1226–1252. [Google Scholar]
- 10.Haug C.J., Drazen J.M. Artificial Intelligence and Machine Learning in Clinical Medicine, 2023. N. Engl. J. Med. 2023;388:1201–1208. doi: 10.1056/NEJMra2302038. [DOI] [PubMed] [Google Scholar]
- 11.Hong J.-W., Williams D. Racism, responsibility and autonomy in HCI: Testing perceptions of an AI agent. Comput. Hum. Behav. 2019;100:79–84. [Google Scholar]
- 12.Putta P., Mills E., Garg N., Motwani S., Finn C., Garg D., Rafailov R. Agent Q: Advanced reasoning and learning for autonomous AI agents. arXiv. 2024 doi: 10.48550/arXiv.2408.07199. Preprint at. [DOI] [Google Scholar]
- 13.White R.W. Advancing the search frontier with AI agents. Commun. ACM. 2024;67:54–65. [Google Scholar]
- 14.Zhang J., Arawjo I. ChainBuddy: An AI Agent System for Generating LLM Pipelines. arXiv. 2024 doi: 10.1145/3706598.3714085. Preprint at. [DOI] [Google Scholar]
- 15.Ashktorab, Z., Dugan, C., Johnson, J., Pan, Q., Zhang, W., Kumaravel, S., and Campbell, M. (2021). Effects of communication directionality and AI agent differences in human-AI interaction. pp. 1-15.
- 16.Grzonka D., Jakóbik A., Kołodziej J., Pllana S. Using a multi-agent system and artificial intelligence for monitoring and improving the cloud performance and security. Future Gener. Comput. Syst. 2018;86:1106–1117. [Google Scholar]
- 17. Achiam J., Adler S., Agarwal S., Ahmad L., Akkaya I., Aleman F.L., Almeida D., Altenschmidt J., Altman S., Anadkat S. GPT-4 Technical Report. Preprint at arXiv. 2023. doi:10.48550/arXiv.2303.08774.
- 18. Mirchandani S., Xia F., Florence P., Ichter B., Driess D., Arenas M.G., Rao K., Sadigh D., Zeng A. Large language models as general pattern machines. Preprint at arXiv. 2023. doi:10.48550/arXiv.2307.04721.
- 19. Sun G., Zhan X., Such J. Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-Based Conversational Agents. 2024. pp. 1–6.
- 20. Xi Z., Chen W., Guo X., He W., Ding Y., Hong B., Zhang M., Wang J., Jin S., Zhou E. The rise and potential of large language model based agents: A survey. Preprint at arXiv. 2023. doi:10.48550/arXiv.2309.07864.
- 21. Huang X., Lian J., Lei Y., Yao J., Lian D., Xie X. Recommender AI agent: Integrating large language models for interactive recommendations. Preprint at arXiv. 2023. doi:10.48550/arXiv.2303.08774.
- 22. Lior A. AI entities as AI agents: Artificial intelligence liability and the AI respondeat superior analogy. Mitchell Hamline L. Rev. 2019;46:1043.
- 23. Davies A., Veličković P., Buesing L., Blackwell S., Zheng D., Tomašev N., Tanburn R., Battaglia P., Blundell C., Juhász A., et al. Advancing mathematics by guiding human intuition with AI. Nature. 2021;600:70–74. doi:10.1038/s41586-021-04086-x.
- 24. Meta Fundamental AI Research Diplomacy Team (FAIR), Bakhtin A., Brown N., Dinan E., Farina G., Flaherty C., Fried D., Goff A., Gray J., Hu H., et al. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science. 2022;378:1067–1074. doi:10.1126/science.ade9097.
- 25. Bubeck S., Chandrasekaran V., Eldan R., Gehrke J., Horvitz E., Kamar E., Lee P., Lee Y.T., Li Y., Lundberg S. Sparks of artificial general intelligence: Early experiments with GPT-4. Preprint at arXiv. 2023. doi:10.48550/arXiv.2303.12712.
- 26. Wang Z., Cai S., Chen G., Liu A., Ma X., Liang Y. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. Preprint at arXiv. 2023. doi:10.48550/arXiv.2302.01560.
- 27. Yao S., Zhao J., Yu D., Du N., Shafran I., Narasimhan K., Cao Y. ReAct: Synergizing reasoning and acting in language models. Preprint at arXiv. 2023. doi:10.48550/arXiv.2210.03629.
- 28. Chen Z., Sun Q., Li N., Li X., Wang Y., Chih-Lin I. Enabling mobile AI agent in 6G era: Architecture and key technologies. IEEE Network. 2024;38:66–75.
- 29. Sreedharan S., Srivastava S., Kambhampati S. Using state abstractions to compute personalized contrastive explanations for AI agent behavior. Artif. Intell. 2021;301.
- 30. Hou X., Guan Y., Han T., Wang C. Towards real-time embodied AI agent: A bionic visual encoding framework for mobile robotics. Int. J. Intell. Robot. Appl. 2024;8:1038–1056.
- 31. Wang G., Xie Y., Jiang Y., Mandlekar A., Xiao C., Zhu Y., Fan L., Anandkumar A. Voyager: An Open-Ended Embodied Agent with Large Language Models. Preprint at arXiv. 2023. https://voyager.minedojo.org/
- 32. Yu Y., Yao S., Zhou T., Fu Y., Yu J., Wang D., Wang X., Chen C., Lin Y. Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense. IEEE; 2024. pp. 521–526.
- 33. Ma Y., Wang Z., Yang H., Yang L. Artificial intelligence applications in the development of autonomous vehicles: A survey. IEEE/CAA J. Autom. Sinica. 2020;7:315–329.
- 34. Dinneweth J., Boubezoul A., Mandiau R., Espié S. Multi-agent reinforcement learning for autonomous vehicles: A survey. Auton. Intell. Syst. 2022;2:27.
- 35. Guo X., Shen Z., Zhang Y., Wu T. Review on the application of artificial intelligence in smart homes. Smart Cities. 2019;2:402–420.
- 36. Rivkin D., Hogan F., Feriani A., Konar A., Sigal A., Liu X., Dudek G. AIoT Smart Home via Autonomous LLM Agents. IEEE Internet Things J. 2024.
- 37. Bovo R., Abreu S., Ahuja K., Gonzalez E.J., Cheng L.-T., Gonzalez-Franco M. EmBARDiment: An embodied AI agent for productivity in XR. Preprint at arXiv. 2024. doi:10.48550/arXiv.2408.08158.
- 38. Cardoso R.C., Ferrando A. A review of agent-based programming for multi-agent systems. Computers. 2021;10:16.
- 39. Dieker L., Hines R., Wilkins I., Hughes C., Hawkins Scott K., Smith S., Ingraham K., Ali K., Zaugg T., Shah S. Using an Artificial Intelligence (AI) Agent to Support Teacher Instruction and Student Learning. J. Spec. Educ. Prep. 2024;4:78–88.
- 40. Li X., Wang C., Sheng Y., Zhang J., Wang W., Yin F.F., Wu Q., Wu Q.J., Ge Y. An artificial intelligence-driven agent for real-time head-and-neck IMRT plan generation using conditional generative adversarial network (cGAN). Med. Phys. 2021;48:2714–2723. doi:10.1002/mp.14770.
- 41. Lee P., Bubeck S., Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N. Engl. J. Med. 2023;388:1233–1239. doi:10.1056/NEJMsr2214184.
- 42. Gao S., Fang A., Huang Y., Giunchiglia V., Noori A., Schwarz J.R., Ektefaie Y., Kondic J., Zitnik M. Empowering biomedical discovery with AI agents. Cell. 2024;187:6125–6151. doi:10.1016/j.cell.2024.09.022.
- 43. Roohani Y., Lee A., Huang Q., Vora J., Steinhart Z., Huang K., Marson A., Liang P., Leskovec J. BioDiscoveryAgent: An AI agent for designing genetic perturbation experiments. Preprint at arXiv. 2024. doi:10.48550/arXiv.2405.17631.
- 44. Li C., Wong C., Zhang S., Usuyama N., Liu H., Yang J., Naumann T., Poon H., Gao J. LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. Adv. Neural Inf. Process. Syst. 2024;36.
- 45. Li B., Yan T., Pan Y., Luo J., Ji R., Ding J., Xu Z., Liu S., Dong H., Lin Z. MMedAgent: Learning to use medical tools with multi-modal agent. Preprint at arXiv. 2024. doi:10.48550/arXiv.2407.02483.
- 46. Kalech M., Natan A. Model-based diagnosis of multi-agent systems: A survey. 2022;36:12334–12341.
- 47. Dhatterwal J.S., Naruka M.S., Kaswan K.S. Multi-Agent System Based Medical Diagnosis Using Particle Swarm Optimization in Healthcare. IEEE; 2023. pp. 889–893.
- 48. Kumar Y., Koul A., Singla R., Ijaz M.F. Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. J. Ambient Intell. Humaniz. Comput. 2023;14:8459–8486. doi:10.1007/s12652-021-03612-z.
- 49. Calisto F.M., Santiago C., Nunes N., Nascimento J.C. BreastScreening-AI: Evaluating medical intelligent agents for human-AI interactions. Artif. Intell. Med. 2022;127. doi:10.1016/j.artmed.2022.102285.
- 50. Li J., Wang S., Zhang M., Li W., Lai Y., Kang X., Ma W., Liu Y. Agent Hospital: A simulacrum of hospital with evolvable medical agents. Preprint at arXiv. 2024. doi:10.48550/arXiv.2405.02957.
- 51. Jiang Y.-H., Li R., Zhou Y., Qi C., Hu H., Wei Y., Jiang B., Wu Y. AI Agent for Education: von Neumann Multi-Agent System Framework. Preprint at arXiv. 2024. doi:10.48550/arXiv.2501.00083.
- 52. Tran K.-T., Dao D., Nguyen M.-D., Pham Q.-V., O'Sullivan B., Nguyen H.D. Multi-Agent Collaboration Mechanisms: A Survey of LLMs. Preprint at arXiv. 2025. doi:10.48550/arXiv.2501.
- 53. Zeeshan T., Kumar A., Pirttikangas S., Tarkoma S. Large Language Model Based Multi-Agent System Augmented Complex Event Processing Pipeline for Internet of Multimedia Things. Preprint at arXiv. 2025. doi:10.48550/arXiv.2501.00906.
- 54. Thirunavukarasu A.J., Ting D.S.J., Elangovan K., Gutierrez L., Tan T.F., Ting D.S.W. Large language models in medicine. Nat. Med. 2023;29:1930–1940. doi:10.1038/s41591-023-02448-8.
- 55. Guo D., Yang D., Zhang H., Song J., Zhang R., Xu R., Zhu Q., Ma S., Wang P., Bi X. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. Preprint at arXiv. 2025. doi:10.48550/arXiv.2501.12948.
- 56. Moor M., Banerjee O., Abad Z.S.H., Krumholz H.M., Leskovec J., Topol E.J., Rajpurkar P. Foundation models for generalist medical artificial intelligence. Nature. 2023;616:259–265. doi:10.1038/s41586-023-05881-4.
- 57. Sun K., Xue S., Sun F., Sun H., Luo Y., Wang L., Wang S., Guo N., Liu L., Zhao T. Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions. Preprint at arXiv. 2024. doi:10.48550/arXiv.2412.02621.
- 58. Theodoris C.V., Xiao L., Chopra A., Chaffin M.D., Al Sayed Z.R., Hill M.C., Mantineo H., Brydon E.M., Zeng Z., Liu X.S., Ellinor P.T. Transfer learning enables predictions in network biology. Nature. 2023;618:616–624. doi:10.1038/s41586-023-06139-9.
- 59. Singhal K., Tu T., Gottweis J., Sayres R., Wulczyn E., Amin M., Hou L., Clark K., Pfohl S.R., Cole-Lewis H., et al. Toward expert-level medical question answering with large language models. Nat. Med. 2025;31:943–950. doi:10.1038/s41591-024-03423-7.
- 60. Singhal K., Azizi S., Tu T., Mahdavi S.S., Wei J., Chung H.W., Scales N., Tanwani A., Cole-Lewis H., Pfohl S. Large language models encode clinical knowledge. Preprint at arXiv. 2022. doi:10.48550/arXiv.2212.13138.
- 61. Sheng B., Pushpanathan K., Guan Z., Lim Q.H., Lim Z.W., Yew S.M.E., Goh J.H.L., Bee Y.M., Sabanayagam C., Sevdalis N., et al. Artificial intelligence for diabetes care: current and future prospects. Lancet Diabetes Endocrinol. 2024;12:569–595. doi:10.1016/S2213-8587(24)00154-2.
- 62. Peng C., Yang X., Chen A., Smith K.E., PourNejatian N., Costa A.B., Martin C., Flores M.G., Zhang Y., Magoc T., et al. A study of generative large language model for medical research and healthcare. NPJ Digit. Med. 2023;6:210. doi:10.1038/s41746-023-00958-w.
- 63. Yala A., Mikhael P.G., Lehman C., Lin G., Strand F., Wan Y.-L., Hughes K., Satuluru S., Kim T., Banerjee I., et al. Optimizing risk-based breast cancer screening policies with reinforcement learning. Nat. Med. 2022;28:136–143. doi:10.1038/s41591-021-01599-w.
- 64. Placido D., Yuan B., Hjaltelin J.X., Zheng C., Haue A.D., Chmura P.J., Yuan C., Kim J., Umeton R., Antell G., et al. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat. Med. 2023;29:1113–1122. doi:10.1038/s41591-023-02332-5.
- 65. Groh M., Badri O., Daneshjou R., Koochek A., Harris C., Soenksen L.R., Doraiswamy P.M., Picard R. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nat. Med. 2024;30:573–583. doi:10.1038/s41591-023-02728-3.
- 66. Wagner S.J., Reisenbüchler D., West N.P., Niehues J.M., Zhu J., Foersch S., Veldhuizen G.P., Quirke P., Grabsch H.I., van den Brandt P.A., et al. Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer Cell. 2023;41:1650–1661.e4. doi:10.1016/j.ccell.2023.08.002.
- 67. Lång K., Josefsson V., Larsson A.-M., Larsson S., Högberg C., Sartor H., Hofvind S., Andersson I., Rosso A. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol. 2023;24:936–944. doi:10.1016/S1470-2045(23)00298-X.
- 68. Yamashita R., Long J., Longacre T., Peng L., Berry G., Martin B., Higgins J., Rubin D.L., Shen J. Deep learning model for the prediction of microsatellite instability in colorectal cancer: a diagnostic study. Lancet Oncol. 2021;22:132–141. doi:10.1016/S1470-2045(20)30535-0.
- 69. Alivernini S., Cañete J.D., Bacardit J., Kurowska-Stolarska M. Using explainable artificial intelligence to predict and forestall flare in rheumatoid arthritis. Nat. Med. 2024;30:925–926. doi:10.1038/s41591-024-02818-w.
- 70. Singhal K., Azizi S., Tu T., Mahdavi S.S., Wei J., Chung H.W., Scales N., Tanwani A., Cole-Lewis H., Pfohl S., et al. Large language models encode clinical knowledge. Nature. 2023;620:172–180. doi:10.1038/s41586-023-06291-2.
- 71. Henry K.E., Adams R., Parent C., Soleimani H., Sridharan A., Johnson L., Hager D.N., Cosgrove S.E., Markowski A., Klein E.Y., et al. Factors driving provider adoption of the TREWS machine learning-based early warning system and its effects on sepsis treatment timing. Nat. Med. 2022;28:1447–1454. doi:10.1038/s41591-022-01895-z.
- 72. Kulkarni P.A., Singh H. Artificial intelligence in clinical diagnosis: opportunities, challenges, and hype. JAMA. 2023;330:317–318. doi:10.1001/jama.2023.11440.
- 73. Betzler B.K., Chen H., Cheng C.Y., Lee C.S., Ning G., Song S.J., Lee A.Y., Kawasaki R., van Wijngaarden P., Grzybowski A., et al. Large language models and their impact in ophthalmology. Lancet Digit. Health. 2023;5:e917–e924. doi:10.1016/S2589-7500(23)00201-7.
- 74. Wang X., Zhao J., Marostica E., Yuan W., Jin J., Zhang J., Li R., Tang H., Wang K., Li Y., et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature. 2024;634:970–978. doi:10.1038/s41586-024-07894-z.
- 75. Kanjee Z., Crowe B., Rodman A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA. 2023;330:78–80. doi:10.1001/jama.2023.8288.
- 76. Huang Z., Bianchi F., Yuksekgonul M., Montine T.J., Zou J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 2023;29:2307–2316. doi:10.1038/s41591-023-02504-3.
- 77. Hoang D.T., Dinstag G., Shulman E.D., Hermida L.C., Ben-Zvi D.S., Elis E., Caley K., Sammut S.J., Sinha S., Sinha N., et al. A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics. Nat. Cancer. 2024;5:1305–1317. doi:10.1038/s43018-024-00793-2.
- 78. Lipkova J., Chen R.J., Chen B., Lu M.Y., Barbieri M., Shao D., Vaidya A.J., Chen C., Zhuang L., Williamson D.F.K., et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell. 2022;40:1095–1110. doi:10.1016/j.ccell.2022.09.012.
- 79. Bera K., Braman N., Gupta A., Velcheti V., Madabhushi A. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat. Rev. Clin. Oncol. 2021;19:132–146. doi:10.1038/s41571-021-00560-7.
- 80. Stoel B.C., Staring M., Reijnierse M., van der Helm-van Mil A.H.M. Deep learning in rheumatological image interpretation. Nat. Rev. Rheumatol. 2024;20:182–195. doi:10.1038/s41584-023-01074-5.
- 81. van der Laak J., Litjens G., Ciompi F. Deep learning in histopathology: the path to the clinic. Nat. Med. 2021;27:775–784. doi:10.1038/s41591-021-01343-4.
- 82. Zhang P., Ma D., Cheng X., Tsai A.P., Tang Y., Gao H.C., Fang L., Bi C., Landreth G.E., Chubykin A.A., Huang F. Deep learning-driven adaptive optics for single-molecule localization microscopy. Nat. Methods. 2023;20:1748–1758. doi:10.1038/s41592-023-02029-0.
- 83. Nils E., Alexander Shakeel B., Andrew C., Michelle D., Yijie Y., Philipp S., Alicia Kun-Yang L., Thomson R., Samantha F.-M., Tyler P., et al. Neurotransmitter classification from electron microscopy images at synaptic sites in Drosophila melanogaster. Cell. 2024;187:2574–2594.e23. doi:10.1016/j.cell.2024.03.016.
- 84. Priessner M., Gaboriau D.C.A., Sheridan A., Lenn T., Garzon-Coral C., Dunn A.R., Chubb J.R., Tousley A.M., Majzner R.G., Manor U., et al. Content-aware frame interpolation (CAFI): deep learning-based temporal super-resolution for fast bioimaging. Nat. Methods. 2024;21:322–330. doi:10.1038/s41592-023-02138-w.
- 85. Ruusuvuori P., Valkonen M., Latonen L. Deep learning transforms colorectal cancer biomarker prediction from histopathology images. Cancer Cell. 2023;41:1543–1545. doi:10.1016/j.ccell.2023.08.006.
- 86. Wu Z., Trevino A.E., Wu E., Swanson K., Kim H.J., D'Angio H.B., Preska R., Charville G.W., Dalerba P.D., Egloff A.M., et al. Graph deep learning for the characterization of tumour microenvironments from spatial protein profiles in tissue specimens. Nat. Biomed. Eng. 2022;6:1435–1448. doi:10.1038/s41551-022-00951-w.
- 87. Wong I.N., Monteiro O., Baptista-Hon D.T., Wang K., Lu W., Sun Z., Nie S., Yin Y., Ni J. Leveraging foundation and large language models in medical artificial intelligence. Chin. Med. J. 2024;137:2529–2539. doi:10.1097/CM9.0000000000003302.
- 88. Gao Q., Yang L., Lu M., Jin R., Ye H., Ma T. The artificial intelligence and machine learning in lung cancer immunotherapy. J. Hematol. Oncol. 2023;16:55. doi:10.1186/s13045-023-01456-y.
- 89. Zhang C., Xu J., Tang R., Yang J., Wang W., Yu X., Shi S. Novel research and future prospects of artificial intelligence in cancer diagnosis and treatment. J. Hematol. Oncol. 2023;16:114. doi:10.1186/s13045-023-01514-5.
- 90. Chekroud A.M., Hawrilenko M., Loho H., Bondar J., Gueorguieva R., Hasan A., Kambeitz J., Corlett P.R., Koutsouleris N., Krumholz H.M., et al. Illusory generalizability of clinical prediction models. Science. 2024;383:164–167. doi:10.1126/science.adg8538.
- 91. Kuenzi B.M., Park J., Fong S.H., Sanchez K.S., Lee J., Kreisberg J.F., Ma J., Ideker T. Predicting Drug Response and Synergy Using a Deep Learning Model of Human Cancer Cells. Cancer Cell. 2020;38:672–684.e6. doi:10.1016/j.ccell.2020.09.014.
- 92. Wu W., Zhang Y., Jiang J., Lucas M.V., Fonzo G.A., Rolle C.E., Cooper C., Chin-Fatt C., Krepel N., Cornelssen C.A., et al. An electroencephalographic signature predicts antidepressant response in major depression. Nat. Biotechnol. 2020;38:439–447. doi:10.1038/s41587-019-0397-3.
- 93. Gubanov M., Pyayt A., Karolak A. CancerKG.org: A Web-Scale, Interactive, Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Care. 2024. pp. 4497–4505.
- 94. Kraljevic Z., Bean D., Shek A., Bendayan R., Hemingway H., Yeung J.A., Deng A., Baston A., Ross J., Idowu E., et al. Foresight-a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. Lancet Digit. Health. 2024;6:e281–e290. doi:10.1016/S2589-7500(24)00025-6.
- 95. Hanneman K., Playford D., Dey D., van Assen M., Mastrodicasa D., Cook T.S., Gichoya J.W., Williamson E.E., Rubin G.D.; American Heart Association Council on Cardiovascular Radiology and Intervention and Council on Lifelong Congenital Heart Disease and Heart Health in the Young. Value Creation Through Artificial Intelligence and Cardiovascular Imaging: A Scientific Statement From the American Heart Association. Circulation. 2024;149:e296–e311. doi:10.1161/CIR.0000000000001202.
- 96. Hashimoto D.A., Varas J., Schwartz T.A. Practical Guide to Machine Learning and Artificial Intelligence in Surgical Education Research. JAMA Surg. 2024;159:455–456. doi:10.1001/jamasurg.2023.6687.
- 97. Iacucci M., Santacroce G., Zammarchi I., Maeda Y., Del Amor R., Meseguer P., Kolawole B.B., Chaudhari U., Di Sabatino A., Danese S., et al. Artificial intelligence and endo-histo-omics: new dimensions of precision endoscopy and histology in inflammatory bowel disease. Lancet Gastroenterol. Hepatol. 2024;9:758–772. doi:10.1016/S2468-1253(24)00053-0.
- 98. Varghese C., Harrison E.M., O'Grady G., Topol E.J. Artificial intelligence in surgery. Nat. Med. 2024;30:1257–1268. doi:10.1038/s41591-024-02970-3.
- 99. Bhat M., Rabindranath M., Chara B.S., Simonetto D.A. Artificial intelligence, machine learning, and deep learning in liver transplantation. J. Hepatol. 2023;78:1216–1233. doi:10.1016/j.jhep.2023.01.006.
- 100. Yanik E., Schwaitzberg S., De S. Deep Learning for Video-Based Assessment in Surgery. JAMA Surg. 2024;159:957–958. doi:10.1001/jamasurg.2024.1510.
- 101. Vermeulen C., Pagès-Gallego M., Kester L., Kranendonk M.E.G., Wesseling P., Verburg N., de Witt Hamer P., et al. Ultra-fast deep-learned CNS tumour classification during surgery. Nature. 2023;622:842–849. doi:10.1038/s41586-023-06615-2.
- 102. Jiang L.Y., Liu X.C., Nejatian N.P., Nasir-Moin M., Wang D., Abidin A., Eaton K., Riina H.A., Laufer I., Punjabi P., et al. Health system-scale language models are all-purpose prediction engines. Nature. 2023;619:357–362. doi:10.1038/s41586-023-06160-y.
- 103. Yip M., Salcudean S., Goldberg K., Althoefer K., Menciassi A., Opfermann J.D., Krieger A., Swaminathan K., Walsh C.J., Huang H.H., Lee I.C. Artificial intelligence meets medical robotics. Science. 2023;381:141–146. doi:10.1126/science.adj3312.
- 104. Hong N., Whittier D.E., Glüer C.C., Leslie W.D. The potential role for artificial intelligence in fracture risk prediction. Lancet Diabetes Endocrinol. 2024;12:596–600. doi:10.1016/S2213-8587(24)00153-0.
- 105. Wang G., Liu X., Ying Z., Yang G., Chen Z., Liu Z., Zhang M., Yan H., Lu Y., Gao Y., et al. Optimized glycemic control of type 2 diabetes with reinforcement learning: a proof-of-concept trial. Nat. Med. 2023;29:2633–2642. doi:10.1038/s41591-023-02552-9.
- 106. Mayourian J., La Cava W.G., Vaid A., Nadkarni G.N., Ghelani S.J., Mannix R., Geva T., Dionne A., Alexander M.E., Duong S.Q., Triedman J.K. Pediatric ECG-Based Deep Learning to Predict Left Ventricular Dysfunction and Remodeling. Circulation. 2024;149:917–931. doi:10.1161/CIRCULATIONAHA.123.067750.
- 107. Sun J., Feng T., Wang B., Li F., Han B., Chu M., Gong F., Yi Q., Zhou X., Chen S., et al. Leveraging artificial intelligence for predicting spontaneous closure of perimembranous ventricular septal defect in children: a multicentre, retrospective study in China. Lancet Digit. Health. 2025;7:e44–e53. doi:10.1016/S2589-7500(24)00245-0.
- 108. Sahni N.R., Carrus B. Artificial intelligence in US health care delivery. N. Engl. J. Med. 2023;389:348–358. doi:10.1056/NEJMra2204673.
- 109. Shah N.H., Entwistle D., Pfeffer M.A. Creation and Adoption of Large Language Models in Medicine. JAMA. 2023;330:866–869. doi:10.1001/jama.2023.14217.
- 110. Wu Q., Bansal G., Zhang J., Wu Y., Zhang S., Zhu E., Li B., Jiang L., Zhang X., Wang C. AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework. Preprint at arXiv. 2023. doi:10.48550/arXiv.2308.08155.
- 111. Li G., Hammoud H., Itani H., Khizbullin D., Ghanem B. CAMEL: Communicative agents for "mind" exploration of large language model society. Adv. Neural Inf. Process. Syst. 2023;36:51991–52008.
- 112. Duan Z., Wang J. Exploration of LLM Multi-Agent Application Implementation Based on LangGraph + CrewAI. Preprint at arXiv. 2024. doi:10.48550/arXiv.2411.18241.
- 113. Arora A., Alderman J.E., Palmer J., Ganapathi S., Laws E., Mccradden M.D., Oakden-Rayner L., Pfohl S.R., Ghassemi M., Mckay F., et al. The value of standards for health datasets in artificial intelligence-based applications. Nat. Med. 2023;29:2929–2938. doi:10.1038/s41591-023-02608-w.
- 114. Karsh B.-T., Holden R.J., Alper S.J., Or C.K.L. A human factors engineering paradigm for patient safety: designing to support the performance of the healthcare professional. Qual. Saf. Health Care. 2006;15:i59–i65. doi:10.1136/qshc.2005.015974.
- 115. Wright M.A., Herzog F., Mas-Vinyals A., Carnicero-Carmona A., Lobo-Prat J., Hensel C., Franz S., Weidner N., Vidal J., Opisso E., Rupp R. Multicentric investigation on the safety, feasibility and usability of the ABLE lower-limb robotic exoskeleton for individuals with spinal cord injury: a framework towards the standardisation of clinical evaluations. J. NeuroEng. Rehabil. 2023;20:45. doi:10.1186/s12984-023-01165-0.
- 116. Rao A., Pang M., Kim J., Kamineni M., Lie W., Prasad A.K., Landman A., Dreyer K., Succi M.D. Assessing the utility of ChatGPT throughout the entire clinical workflow: development and usability study. J. Med. Internet Res. 2023;25. doi:10.2196/48659.
- 117. Guillén-Climent S., Garzo A., Muñoz-Alcaraz M.N., Casado-Adam P., Arcas-Ruiz-Ruano J., Mejías-Ruiz M., Mayordomo-Riera F.J. A usability study in patients with stroke using MERLIN, a robotic system based on serious games for upper limb rehabilitation in the home setting. J. NeuroEng. Rehabil. 2021;18:41. doi:10.1186/s12984-021-00837-z.
- 118. Bakas S., Vollmuth P., Galldiks N., Booth T.C., Aerts H.J.W.L., Bi W.L., Wiestler B., Tiwari P., Pati S., Baid U., et al. Artificial Intelligence for Response Assessment in Neuro Oncology (AI-RANO), part 2: recommendations for standardisation, validation, and good clinical practice. Lancet Oncol. 2024;25:e589–e601. doi:10.1016/S1470-2045(24)00315-2.
- 119. Skrede O.-J., De Raedt S., Kleppe A., Hveem T.S., Liestøl K., Maddison J., Askautrud H.A., Pradhan M., Nesheim J.A., Albregtsen F., et al. Deep learning for prediction of colorectal cancer outcome: a discovery and validation study. Lancet. 2020;395:350–360. doi:10.1016/S0140-6736(19)32998-8.
- 120. Chen R.J., Lu M.Y., Williamson D.F.K., Chen T.Y., Lipkova J., Noor Z., Shaban M., Shady M., Williams M., Joo B., Mahmood F. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell. 2022;40:865–878.e6. doi:10.1016/j.ccell.2022.07.004.
- 121. Park S.H., Han K., Jang H.Y., Park J.E., Lee J.-G., Kim D.W., Choi J. Methods for clinical evaluation of artificial intelligence algorithms for medical diagnosis. Radiology. 2023;306:20–31. doi:10.1148/radiol.220182.
- 122. Han R., Acosta J.N., Shakeri Z., Ioannidis J.P.A., Topol E.J., Rajpurkar P. Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. Lancet Digit. Health. 2024;6:e367–e373. doi:10.1016/S2589-7500(24)00047-5.
- 123. Pencina M.J., McCall J., Economou-Zavlanos N.J. A Federated Registration System for Artificial Intelligence in Health. JAMA. 2024;332:789–790. doi:10.1001/jama.2024.14026.
- 124. Li H., Moon J.T., Purkayastha S., Celi L.A., Trivedi H., Gichoya J.W. Ethics of large language models in medicine and medical research. Lancet Digit. Health. 2023;5:e333–e335. doi:10.1016/S2589-7500(23)00083-3.
- 125. Goetz L., Trengove M., Trotsyuk A., Federico C.A. Unreliable LLM bioethics assistants: Ethical and pedagogical risks. Am. J. Bioeth. 2023;23:89–91. doi:10.1080/15265161.2023.2249843.
- 126. Ning Y., Teixayavong S., Shang Y., Savulescu J., Nagaraj V., Miao D., Mertens M., Ting D.S.W., Ong J.C.L., Liu M., et al. Generative artificial intelligence and ethical considerations in health care: a scoping review and ethics checklist. Lancet Digit. Health. 2024;6:e848–e856. doi:10.1016/S2589-7500(24)00143-2.
- 127. Yu K.-H., Healey E., Leong T.-Y., Kohane I.S., Manrai A.K. Medical Artificial Intelligence and Human Values. N. Engl. J. Med. 2024;390:1895–1904. doi:10.1056/NEJMra2214183.
- 128. Liu K. Artificial Intelligence and Ethical Frameworks in Pediatrics. JAMA Pediatr. 2024;178:626–627. doi:10.1001/jamapediatrics.2024.0510.
- 129. McGreevey J.D. 3rd, Hanson C.W. 3rd, Koppel R. Clinical, Legal, and Ethical Aspects of Artificial Intelligence-Assisted Conversational Agents in Health Care. JAMA. 2020;324:552–553. doi:10.1001/jama.2020.2724.
- 130. Li R., Kumar A., Chen J.H. How chatbots and large language model artificial intelligence systems will reshape modern medicine: fountain of creativity or pandora's box? JAMA Intern. Med. 2023;183:596–597. doi:10.1001/jamainternmed.2023.1835.
- 131. Fu X., Cheng W., Wan G., Yang Z., Tee B.C.K. Toward an AI Era: Advances in Electronic Skins. Chem. Rev. 2024;124:9899–9948. doi:10.1021/acs.chemrev.4c00049.
- 132. De Freitas J., Cohen I.G. The health risks of generative AI-based wellness apps. Nat. Med. 2024;30:1269–1275. doi:10.1038/s41591-024-02943-6.
- 133. Shokr A., Pacheco L.G.C., Thirumalaraju P., Kanakasabapathy M.K., Gandhi J., Kartik D., Silva F.S.R., Erdogmus E., Kandula H., Luo S., et al. Mobile Health (mHealth) Viral Diagnostics Enabled with Adaptive Adversarial Learning. ACS Nano. 2021;15:665–673.



