The rise of large language models (LLMs) has revolutionized biomedical data science, transforming the way we access, synthesize, and reason over data and information. Early in their adoption, the art and science of prompt engineering emerged as a crucial skill. Prompt engineering is the craft of writing input instructions that coax LLMs into producing high-quality, contextually appropriate, and accurate responses. Data scientists, developers, and domain experts quickly learned that the wording, structure, and intent behind a prompt could make the difference between useful and useless answers. Prompt engineering became an indispensable interface between human creativity and machine intelligence.
However, as powerful as prompt engineering has been, it has a fundamental limitation: it relies heavily on static instructions, whether one-off or chained, issued to a single LLM. A prompt is, in essence, a frozen request or an isolated act of communication that lacks memory, initiative, or adaptability. In contrast, most real-world problems, particularly in biomedical and clinical domains, are complex, dynamic, and iterative. They require exploration, hypothesis refinement, integration of multiple data streams, and continuous learning from feedback. This has given rise to the new field of agentic AI, in which the user specifies teams of autonomous agents, each with its own expertise, to tackle the varied parts of a complex and multifaceted problem. To leverage this powerful approach, we envision a future in which AI practitioners pivot from prompt engineering to agent engineering.
Agent engineering builds on the foundation of LLMs but introduces concepts of autonomy, persistence, code execution, and multi-step reasoning. Instead of asking a model to respond to a single prompt, we design multiple agents that represent intelligent systems capable of setting goals, decomposing tasks, retrieving information, collaborating with other agents, and adapting their behavior over time. In this new framework, the “prompt” is no longer a script but a scaffold or a set of initial conditions and characteristics that define an agent’s capabilities, memory, and reasoning strategies. The focus shifts from crafting the perfect instruction to designing the architecture of cognition or problem solving itself. Therefore, we define Agent Engineering as the systematic design, implementation, and evaluation of AI agents specifically structured to address the dynamic and multidisciplinary challenges of biomedical research.
Agent Engineering has four core components: (1) agent specification (defining goals, code, tools, and reasoning style), (2) orchestration (inter-agent communication and hierarchy), (3) evaluation (assessing trust, reproducibility, and alignment), and (4) governance (embedding ethical and regulatory constraints). In this sense, designing a team of agents is very much like assembling a team of human experts to solve a problem.
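To make the four components concrete, the following is a minimal sketch in Python and not an implementation drawn from any particular agent framework. Every name in it (AgentSpec, Team, governance_violations, evaluate_reproducibility, the tool names, and the forbidden-tool policy) is a hypothetical chosen for illustration, and no LLM is called.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentSpec:
    """(1) Specification: what an agent is for and what it may use."""
    name: str
    goal: str
    tools: tuple[str, ...]
    reasoning_style: str = "step-by-step"


@dataclass
class Team:
    """(2) Orchestration: one manager and the workers that report to it."""
    manager: AgentSpec
    workers: list[AgentSpec] = field(default_factory=list)

    def reporting_lines(self) -> list[tuple[str, str]]:
        # Each pair reads (worker, supervisor).
        return [(w.name, self.manager.name) for w in self.workers]


def evaluate_reproducibility(run_a: list[str], run_b: list[str]) -> float:
    """(3) Evaluation: Jaccard overlap of the outputs of two repeated runs."""
    a, b = set(run_a), set(run_b)
    return len(a & b) / len(a | b) if a | b else 1.0


def governance_violations(team: Team, forbidden_tools: set[str]) -> list[str]:
    """(4) Governance: flag agents whose tools break a stated policy."""
    members = [team.manager, *team.workers]
    return [
        f"{agent.name} may not use {tool}"
        for agent in members
        for tool in agent.tools
        if tool in forbidden_tools
    ]


# Hypothetical team; tool names are illustrative labels, not real APIs.
manager = AgentSpec("manager", "keep workers on task", tools=())
lit = AgentSpec("literature_miner", "find new studies", tools=("pubmed_search",))
causal = AgentSpec(
    "causal_inference",
    "propose mechanistic chains",
    tools=("ehr_query", "raw_patient_export"),
)
team = Team(manager, [lit, causal])

violations = governance_violations(team, forbidden_tools={"raw_patient_export"})
score = evaluate_reproducibility(["h1", "h2", "h3"], ["h2", "h3", "h4"])
```

In this sketch the governance check is run before any agent acts, which mirrors the idea of embedding constraints in the design instead of auditing only after the fact. The overlap score is one simple proxy for reproducibility; a real evaluation would need measures suited to the task.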
Consider, for example, the task of engineering agents to generate novel biomedical hypotheses to test. The first thing you might need to create is a literature-mining agent capable of scanning PubMed and preprint servers for new studies related to your disease or biological mechanism of interest. A causal inference agent might integrate evidence across clinical trials, electronic health records, and animal model studies to propose plausible mechanistic chains that merit experimental validation. Meanwhile, a concept fusion agent could use semantic similarity and knowledge-graph reasoning to connect seemingly unrelated concepts. Other agents might specialize in cross-referencing biomedical datasets with literature-derived hypotheses to prioritize the most promising research directions. It might also be necessary to create one or more manager agents that can supervise the worker agents to make sure they are on task and producing relevant results. Together, such agents could form a dynamic ecosystem that not only digests the biomedical literature but also generates and tests new scientific hypotheses. In doing so, the AI agents act as tireless and efficient research collaborators capable of accelerating the pace of biomedical innovation.
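The manager-and-worker arrangement described above might be sketched as follows. This is an illustration under strong assumptions: the worker functions are stubs that return placeholder text where a real agent would call an LLM and query sources such as PubMed, the names Finding, literature_agent, concept_fusion_agent, and manager are hypothetical, and the keyword match is only a crude stand-in for a real relevance check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    source: str  # which agent produced the finding
    text: str    # placeholder content, not real results


# A worker takes a topic and returns findings; stubs stand in for real agents.
Worker = Callable[[str], list[Finding]]


def literature_agent(topic: str) -> list[Finding]:
    # A real agent would search the literature; this returns a placeholder.
    return [Finding("literature", f"placeholder study summary about {topic}")]


def concept_fusion_agent(topic: str) -> list[Finding]:
    # Deliberately includes one off-topic item so the manager has work to do.
    return [
        Finding("concept_fusion", f"placeholder link between {topic} and lipid metabolism"),
        Finding("concept_fusion", "placeholder note on an unrelated subject"),
    ]


def manager(topic: str, workers: list[Worker]) -> list[Finding]:
    """Run every worker, then keep only findings that mention the topic."""
    kept: list[Finding] = []
    for worker in workers:
        for finding in worker(topic):
            if topic.lower() in finding.text.lower():
                kept.append(finding)
    return kept


results = manager("T-cell exhaustion", [literature_agent, concept_fusion_agent])
for finding in results:
    print(f"[{finding.source}] {finding.text}")
```

Keeping the supervision logic in a separate manager function means the relevance rule can be changed or audited without touching the workers, which is one plausible reason to separate the two roles.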
This transformation from prompt to agent engineering marks a profound shift in how we conceptualize artificial intelligence (AI). It reframes the human-AI relationship from command-and-response to partnership and co-discovery. Agentic AI systems can be endowed with domain-specific expertise, ethical frameworks, and reasoning protocols, allowing them to navigate uncertainty, test hypotheses, and even critique their own outputs. For data scientists, this means moving beyond linguistic tuning toward systems-level thinking with ecosystems of agents that collaborate, specialize, and evolve. Importantly, these emerging ecosystems of agents must operate within established frameworks of interpretability, provenance, and ethical constraint, ensuring that autonomy does not come at the cost of accountability.
The promise of agentic AI for biomedical and clinical research is particularly compelling. Imagine autonomous research agents that continuously integrate multi-omics data, patient records, and scientific literature to generate new hypotheses about disease mechanisms. Or clinical decision agents that collaborate with physicians to design adaptive treatment plans, simulate outcomes, and learn from each case. Agentic AI offers a path toward self-improving, knowledge-driven systems capable of accelerating discovery, personalizing care, and addressing the complexity that has long limited human understanding. Just as prompt engineering unlocked the expressive potential of LLMs, agent engineering may unlock AI as a true collaborator in the pursuit of health and scientific progress. To realize this vision, the biomedical AI community must develop shared frameworks for agent design, benchmarking, and validation by integrating domain knowledge with computational rigor. The future of biomedical discovery will depend not just on smarter models, but on how we engineer the agents that wield them.
Box 1. A day in the life of an agent engineer
At 8:00 a.m., Dr. Lina Chen logs into her lab’s agent orchestration dashboard, a living ecosystem of autonomous AI agents she and her team maintain for translational oncology research. Overnight, her literature-mining agent has scanned new PubMed entries, preprints, and clinicaltrials.gov updates, summarizing 43 studies on checkpoint inhibitor resistance. It has also cross-referenced findings against TCGA and other public omics repositories, flagging metabolic stress pathways that repeatedly correlate with treatment non-response.
By 9:00 a.m., Lina reviews the morning digest with her causal inference agent, which has constructed an evidence graph linking metabolomic markers to immune exhaustion signatures derived from her institution’s EHR-linked tumor registry. The agent automatically queries de-identified patient data to test whether these associations replicate in real-world cohorts, estimating effect sizes and uncertainty intervals.
At 10:30 a.m., she initiates a short agent conversation thread among the knowledge-graph reasoning agent, the multi-omics integration agent, and the replication agent, which specializes in validating hypotheses across independent datasets. Within an hour, the agents converge on two promising mechanistic hypotheses, including one involving a lipid-regulating enzyme implicated in T-cell dysfunction. The replication agent confirms that this signal is consistent in multiple public datasets from GEO and single-cell atlases, adding confidence before any wet-lab experiments are planned.
After lunch, Lina collaborates with the experimental design agent to explore potential validation strategies. The agent suggests both in silico analyses, such as leveraging a federated network of cancer centers for EHR-based outcome modeling, and in vitro follow-ups using existing CRISPR-compatible organoid lines. It automatically checks reagent availability, prior experimental results, and ethical approvals.
Before wrapping up, Lina runs an ethics and reproducibility audit agent, a required step in her lab’s workflow. It verifies data provenance across public and private sources, ensures regulatory compliance for all EHR access, and logs every agent action for transparent review.
As Lina shuts down for the day, her agents continue to refine models, retrain on new data drops, and re-prioritize hypotheses. By the next morning, they’ll have not only generated fresh insights but validated them across real-world biomedical data, ready for human interpretation and experimental confirmation.
Author contributions
JHM and NPT wrote and edited the manuscript.
Declarations
Competing interests
Drs. Jason H. Moore and Nicholas P. Tatonetti are the Editors-in-Chief of BioData Mining.
