Exocortex Network for AI-Augmented Human-Led Scientific Expedition

Esther H R Tsai; Kevin G Yager

doi:10.1021/photonsci.5c00009

2025 Oct 22;1(2):68–76.

PMCID: PMC13022949

Abstract

AI advances in science can be viewed along two main directions with a fluid boundary: enhancing efficiency through automation and smart tools to accelerate tasks that humans can already perform; and enabling exploration into uncharted territories and potentially toward AGI. These advances manifest in the AI cognitive core through the development and explainability of foundation models; in the physical embodiment of instruments and facilities; and in the integrated agency of AI workflows exemplified by the science exocortex. To address the role of humans in this evolving landscape, in this Perspective, we suggest a third direction: the development of personalized agents that form human-centered networks, supporting both efficiency and exploration while ensuring that AI remains aligned with human vision.

Keywords: artificial intelligence, machine-learning, foundation models, human-computer interaction, exocortex

Introduction

Over recent years, automation methods, often leveraging artificial intelligence and machine learning (AI/ML), have gradually redefined the fundamental processes and methodologies of scientific research and, in some cases, accelerated discovery. Automation is not merely a convenience; many problem spaces are not amenable to manual search. For instance, materials discovery often involves exploration of high-dimensional spaces, making it infeasible to exhaustively search for optimal structures or properties. With increasing adoption of automation across different areas, including hardware robotics and analysis software, more laboratories are moving toward “closed-loop” experimental workflows. For example, samples are prepared using robotic or combinatorial methods, key measurements are taken, and the data are analyzed. The resulting data are fed into a decision-making method that selects high-value follow-up samples to synthesize or measure according to user-defined criteria. This creates a continuous cycle of measurement, analysis, and adjustment, hence the term “closed-loop”. Self-driving lab (SDL) and autonomous experimentation (AE) can refer to variations of this framework, some focusing more on automation with robotics and some on developing tailored decision-making methods. For decision-making, Gaussian processes serve as a powerful probabilistic approach within the Bayesian optimization framework, with applications in material synthesis, material processing, instrumentation configuration, and characterization via diffraction and microscopy. While automation of routine tasks can reduce human effort, the integration of advanced AI, whether autonomous methods or agentic AI, should serve to augment scientists’ capabilities and actually promote human engagement in science.
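The closed-loop cycle of measurement, analysis, and decision-making can be sketched in a few lines of code. The example below is only an illustration and is not taken from any of the systems referred to above: it assumes a one-dimensional parameter space, a hypothetical `measure` function standing in for sample preparation and measurement, a squared-exponential kernel with an arbitrary length scale, and an upper-confidence-bound rule as the decision-making method. All function names and parameter values are assumptions.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs (length scale assumed)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

def measure(x):
    """Hypothetical stand-in for preparing and measuring a sample at setting x."""
    return math.exp(-((x - 0.7) / 0.15) ** 2)

def closed_loop(n_iter=12, kappa=2.0):
    """Repeat measure -> model -> select; return the best (setting, value) found."""
    candidates = [i / 100 for i in range(101)]
    xs = [0.0, 1.0]                      # two initial samples
    ys = [measure(x) for x in xs]
    for _ in range(n_iter):
        def ucb(x):
            m, s = gp_posterior(xs, ys, x)
            return m + kappa * s         # favor high predicted value or high uncertainty
        untested = [c for c in candidates if c not in xs]
        x_next = max(untested, key=ucb)  # decision step
        xs.append(x_next)
        ys.append(measure(x_next))       # measurement step
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]
```

Calling `closed_loop()` runs twelve measure-and-decide iterations. In a real workflow the "user-defined criteria" would be encoded in the acquisition rule, and `measure` would dispatch to the instrument rather than evaluate a formula.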

With the emergence of foundation models, in particular large language models (LLMs), science is on the cusp of another major transformation. Unlike earlier AI systems, which are often task-specific or narrow in scope, these newer models bring much broader capabilities, including understanding complex natural language expression, efficient information retrieval, reasoning and hypothesis generation, and multi-agent orchestration. LLM applications have been demonstrated in materials science and chemistry, including molecular and material design, experimentation, data captioning, benchmarking, and multi-agent workflows. Foundation models and their applications open the door to AI-augmented scientific expeditions, enabling researchers to venture into uncharted intellectual territory and pursue discoveries that were previously unimaginable due to time, scale, or complexity constraints. By serving as collaborators rather than just tools, LLMs can help scientists explore questions they might not have considered, identify hidden connections across disciplines, and navigate a vast body of knowledge and ideas. Just as automation revolutionized the “doing” of science, LLMs and foundation models may revolutionize the “thinking” of science by empowering humans with enhanced capabilities and fresh perspectives.

Imagine a near future in which every scientist has an exocortex that acts as an external layer of cognition to augment the scientist’s abilities. The swarm of AI agents that constitutes the personal exocortex can provide guidance on scientific instrument operation, correlate multi-modal data to provide new insights, connect simulations with multiple experimental results for holistic studies, and search the literature to propose new theoretical or experimental studies. The exocortex is a broad and ambitious vision for the future of science, which will require advancements in AI/ML methods and tailoring of these advancements to science workflows. Nevertheless, we can see early hints of such agentic and multi-agent approaches accelerating research. As a concrete example, we have been developing a scientific companion that leverages a designed sequence of LLM calls to behave as a natural-language interface to complex instrumentation at scientific user facilities. The system is intended to alleviate the burden on instrument scientists (training, troubleshooting, etc.) and increase engagement from users (as it allows them to more rapidly become independent operators), thereby expanding the complexity of possible experiments. Using this approach, we recently demonstrated the first voice-controlled experiment at a synchrotron beamline that hosts complex experiments, contributing one component of the exocortex.

While the exocortex (exo) concept is not yet realized, it is not too soon to begin imagining further improvements to the concept. In particular, the exocortex is envisioned foremost as a means of expanding the cognition and volition of an individual scientist, allowing them to orchestrate a diverse catalogue of science resources (publications, databases, software, instruments, etc.). Yet we must acknowledge that modern science is fundamentally a community activity. The reasons for this are many: the key role of peer review as a self-correction mechanism, the improved creativity arising from group brainstorming, the growing need for interdisciplinary approaches to tackle the grandest challenges, and, perhaps most importantly, because humans are social creatures that thrive when part of a community. In this context, we propose the concept of a scientific exocortex network (ExoNet) to connect not only human-to-AI but also human-to-human, to efficiently create productive human partnerships that consistently drive human-led science. We envision a world where all our exocortices are connected, continuously facilitating collective brainstorming, ideation, and, most importantly, forming dynamic communities (sub-nets) that connect human scientists with shared passion and vision. Based on our experience of natural-language-based interactions at synchrotron beamlines, in this Perspective we continue to use synchrotron X-ray experiments as illustrative examples to discuss the envisioned impact of ExoNet, while noting that its potential influence extends broadly to other facility instruments and across the scientific community.

Exocortex Network

Humans are driven by aspiration and creativity, yet the demands of daily life often constrain us to routine, repetitive tasks, limiting the time and cognitive resources available for higher-order thinking and innovation. The fundamental question is how AI can be leveraged to alleviate these burdens and free human capacity for creative and impactful work. Recent advances in LLMs are enabling AI to manage many well-defined tasks, while human scientists may tackle complex challenges and develop long-term vision. However, a chasm between technological advances and scientific experimentation, coupled with a lack of common language and shared interests across disciplines, often results in sporadic and disorganized testing of AI/ML concepts in physical science communities. We are far from realizing the full potential of current AI methods for improving science. Looking further into the future, one can extrapolate the rapid progress in AI capabilities to predict the upcoming development of human-level artificial general intelligence (AGI). The potential of AGI engenders a host of concerns, as such powerful systems would be difficult to control and by default may well displace humans instead of empowering them. While much of the current focus in agentic AI centers on replicating human abilities and thereby replacing human workers, we argue that the true opportunity lies in leveraging frontier AI to enhance human creativity, broaden knowledge, foster efficient cross-disciplinary collaboration, and achieve greater productivity. Therefore, in this Perspective, we suggest a human-centered agentic AI framework in which a human sets the goal, and AI agents autonomously coordinate and collaborate with other custom agents, synchronously learning from human domain experts while collecting real-time human feedback to stay human-led.

Task-focused platforms, in general, prioritize building capabilities and delivering business-oriented results, whereas social platforms emphasize discoverability and constant open-ended access. Existing social and entertainment-oriented platforms, e.g., Character.ai and Meta’s AI Studio, enable users to create custom AI characters focused on personality-driven interactions. While limited AI-to-AI exchanges are possible, these platforms prioritize engagement and creative expression over the task automation and workflow execution typical of productivity-oriented AI tools. Among platforms designed to achieve specific goals, some focus on lightweight self-operating agents, e.g., AutoGPT, BabyAGI, and AgentGPT, while others focus on workflow automation and tool integration, e.g., Zapier Central and Copilot Studio. For multi-agent collaboration, MetaGPT mimics a software company by assigning AI agents specialized roles, e.g., architect and engineer, to collaboratively deliver working software. CAMEL offers a framework for exploring multi-agent systems, emphasizing autonomous reasoning and planning in research settings. CrewAI provides a general-purpose framework focused on building and deploying collaborative AI agents with tool integration and agent orchestration for real-world workflows. For publicly accessible agents, SuperAGI provides a production-ready multi-agent framework for automation in business and startups and also offers a marketplace for publishing agents. ChatGPT’s custom GPTs offer both global public access and strong productivity features, but without native peer discovery or interprocess communication. On the other hand, Google’s Agent-to-Agent (A2A) protocol is an open standard that enables AI agents across platforms to securely discover one another and share capabilities via agent cards; it is primarily used in enterprise workflows for cross-platform automation. While task-focused platforms prioritize business capabilities and outcomes, features such as discoverability and public access are generally associated with social platforms. The concept of ExoNet suggests a framework that combines task-focused and social-creative approaches to advance scientific productivity and creativity.

In this Perspective, we address the prevailing notion that AI often reduces human involvement and put forward a vision for a human-centered AI network. While reduced human effort may hold for automated routine tasks, the emergence of advanced and agentic AI offers a new paradigm: we envision the ExoNet concept to be the foundation of a global scientific network of interconnected exocortices, where human minds, personalized AI, and physical AI work together as a unified ecosystem. Figure 1 illustrates different motifs based on the level of human engagement and AI involvement in science. In the pre-AI era, humans handwrite manuscripts and draw figures, and discussions are largely constrained by time and space, as illustrated in Figure 1(a1). With machinery and automation of simple tasks, humans no longer need to be as involved in certain routine work, for example in manufacturing automation. As illustrated in Figure 1(a2), these automations do not require much human involvement once the task and goal are set. As opposed to automation in (a2), where the goal is more AI involvement and less human engagement, intelligent AI tools or agentic AI should empower humans and invite deeper human engagement and enhanced creativity in the scientific process. Figure 1(b) illustrates the current approach for leveraging AI for science, where increasingly complex tasks are handed over to increasingly capable AI. The default outcome in such an approach is less human involvement in the work. In Figure 1(b1), humans send queries to AI tools, but each interaction is independent and the answers do not consider the full picture of the scientific topic without further human analysis. In Figure 1(b2), by contrast, a large centralized AI framework is queried by humans to provide a comprehensive solution to the topic. ChatGPT and AlphaFold can be seen as lying between (b1) and (b2), depending on the inquiry type: for scientific research, AlphaFold provides fast and accurate protein structure predictions and delivers a coherent scientific response, while ChatGPT by itself offers fragmented snippets of knowledge without structured scientific outputs. There is widespread concern that increased use of agentic AI and development toward AGI may lead to AI-led science instead of human-led discovery, as illustrated in Figure 1(c1). The limiting case is, in fact, total replacement of humans in the conduct of science, Figure 1(c2). While these worries are legitimate, this outcome is not inevitable. AI advances can instead augment human capabilities when approached with the right perspectives and supported by the necessary infrastructure. The concept of the exocortex envisions humans being enhanced by AI: a personal exocortex customized to individual preferences can support, e.g., experimentation, knowledge organization, and ideation to boost efficiency and productivity. Figure 1(d) illustrates the ideal scenario where agentic AI urges human scientists to be even more engaged than in the pre-AI era. With a network of personalized exocortices, interaction can be delegated to AI, either with no human input needed or with scientists prompted only for quick feedback after AI-filtering, as shown in Figure 1(d1); alternatively, the network can connect human to human to form sub-nets of scientists who share aligned interests, as illustrated in Figure 1(d2). With the exo adapting to the level of involvement needed in each situation, the flexible interaction framework aims to enable scientists to stay focused on high-value work, delegate intelligently, and collaborate efficiently. Here we present our vision of what ExoNet could look like for the scientific community.

Figure 1. Illustration of different levels of human engagement and AI involvement in science. In the pre-AI era, scientific discovery is based on human effort (a). As AI methods advance, AI tools are able to automate more tasks (b). The straightforward deployment of more agentic AI tends to replace human engagement (c). However, alternative deployments are possible: AI should be leveraged to enhance human capabilities. With personalized AI systems (exocortices), interaction can be delegated to AI (d1), i.e., the system acts as a filter, or a network of exocortices (ExoNet) can connect human to human to achieve AI-augmented human-led science (d2).

Delegate-AI

A scientist and AI can collaborate in different interaction modes. In the delegation mode, the human is minimally disrupted as AIs interact with each other directly without involving their human managers. Each AI, as a personal exocortex, captures some of their human’s unique domain expertise, interests, and opinions. This allows the exo to search for data and connections of interest to their human and also to offer to others this person’s unique perspective. Through a chatting or recording interface, scientists can opt to share project and instrumentation updates, selected documents including publications or correspondence, personal insights, or interests in exploring unfamiliar domains. Substantial front-end and back-end engineering efforts are necessary to ensure data security, user privacy, and smooth interactions between scientists and their exos and between exos. This personal AI approach is complementary to conventional retrieval chatbots that extract relevant information from a large knowledge base; exos capture individual perspectives owing to their personalized design (selection of tools, workflows, knowledge repositories, etc.) and by amortizing their human’s opinions.

Exos can operate autonomously in the background, continuously searching for potential collaborators with aligned interests and sending invitations to other scientists’ exos to hold preliminary discussions. By scanning a network of diverse disciplines, exos can explore novel ideas, emerging trends, or unconventional connections that may not be obvious. Exos can also simply perform routine tasks such as querying other exos for quick answers or clarifications. This persistent and autonomous interaction enables a dynamic flow of knowledge and interaction far beyond the limits of human availability or coordination. Colleagues or students can also query the scientist’s exo to understand the expertise and interests of the scientist, allowing efficient knowledge transfer without requiring the scientist’s direct attention. This minimizes distractions and time spent on minor tasks, enabling scientists to maintain their focus on valuable research and development activities.

In the context of synchrotron experiments (beamtime), AI delegation means that potential users can consult an exo (AI agent) representing the beamline scientist to obtain information on beamline specifications (e.g., energy range and corresponding beam size), current beamline status (e.g., detector availability or malfunctioning instruments), and the latest capabilities along with their specifications and limitations (e.g., focusing optics or in-situ apparatus). Given the user’s scientific question, exos can also assess whether it would be helpful to run multi-modal measurements with, for example, both X-ray scattering and spectroscopy to study the degradation of perovskite thin films or solid-state metal dealloying. The user’s exo can also search relevant literature, coordinate with the beamline scientist’s exo on beamline-specific questions, and assist in scheduling meetings and instrument time.

AI-Filtering

An AI-filtering interaction mode could be designed to collect quick feedback and confirmation from the human scientist. In this mode, the exo gathers and processes information and presents a summary and suggestions to the scientist for review. The scientist can quickly confirm, refine, or reject the exo’s suggestions. By using exo to organize and filter information while keeping humans in control of critical judgments, this interaction mode ensures that the workflow remains both accurate and time-efficient. This can be conceived as a smarter and more efficient version of question-and-answer web communities (e.g., Stack Overflow): scientists can instruct their personal exo to send push notifications for quick feedback when specific types of inquiries are received. When someone has a question, their exo can search and identify which scientist might have the answer. The scientist’s exo will then generate a preliminary response for the scientist to approve or edit before sharing. Scientific discussion with experts is highly sought, straining the time of researchers across myriad discussions when only a subset truly requires their considered input. On the other hand, a vast array of important questions are never asked (and not answered) because scientists worry about wasting each other’s time. AI-driven triage and routing of communication could resolve these challenges. Managing multiple tasks within a single project or across several projects can also be challenging. Exos can alleviate these burdens by providing real-time project updates, ensuring that scientists remain continuously informed of ongoing progress while requiring minimal time and effort.
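The filtering mode described above can be sketched as a small routing-and-review step. This is a minimal illustration under stated assumptions, not an implementation from the ExoNet prototype: the `Exo` class, its `notify_topics` field, the keyword-based triage rule, and the `review` function are all hypothetical names, and a real exo would likely use an LLM rather than keyword matching to classify inquiries.

```python
import re
from dataclasses import dataclass, field
from enum import Enum

class Decision(Enum):
    CONFIRM = "confirm"
    REFINE = "refine"
    REJECT = "reject"

@dataclass
class Draft:
    question: str
    suggested_answer: str   # preliminary response generated by the exo

@dataclass
class Exo:
    owner: str
    # Topics for which the owner asked to receive push notifications (assumed field).
    notify_topics: set = field(default_factory=set)

    def triage(self, question: str) -> str:
        """Return 'ask_human' if the question touches a flagged topic, else 'auto_reply'."""
        words = set(re.findall(r"[a-z]+", question.lower()))
        return "ask_human" if words & self.notify_topics else "auto_reply"

def review(draft: Draft, decision: Decision, refined_text: str = None):
    """Apply the scientist's decision; return the text to share, or None if rejected."""
    if decision is Decision.CONFIRM:
        return draft.suggested_answer
    if decision is Decision.REFINE:
        if not refined_text:
            raise ValueError("a refined answer is required when refining")
        return refined_text
    return None  # rejected: nothing is shared on the scientist's behalf
```

For example, an exo created with `Exo("staff", {"collaboration"})` would answer a routine question about the energy range on its own, but would route a question mentioning collaboration to its human, who then confirms, refines, or rejects the drafted reply via `review`.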

At scientific facilities, users ask facility scientists diverse questions about the capabilities and operation of instruments and software. Some questions are very standard and can be answered with a simple chatbot implementation, some answers require instrument-specific experience or extended calculations, and some inquiries are more advanced and require an interactive discussion with the scientist. With AI-filtering, beamline scientists do not have to answer all beamline technical questions but only offer quick feedback on open questions, e.g., when a user with a custom material processing platform seeks to codevelop capabilities or explore science cases collaboratively, or when users seek collaboration on modeling and computational methods to improve data analysis. Instrument-specific or domain-science-specific questions can also propagate from users to staff exos and onward to more senior staff. This may also help higher management quickly collect feedback from the experimental floor.

Augment Human–Human Interaction

We envision that with ExoNet, scientists can find new ideas and new colleagues by leveraging AI as a matchmaker that identifies high-value connections and allows for easy sorting by human researchers. This can be viewed analogously to the rapid “swiping left or right” of online dating apps, although one can leverage these efficient sorting methods without also importing the undesirable aspects of such processes by focusing purely on the correlation between scientific interests. This could involve exos working together (talking to each other) to identify potentially useful collaboration opportunities, with scientists then deciding which they want to pursue. The network will not only foster collaborations but also spark new ideas through the cross-pollination of experiences and insights across disciplines. Exos can act as proactive assistants to match researchers with projects, collaborators, or reviewers, streamlining connections beyond what conventional platforms such as LinkedIn can offer. When a beamline user or staff wants to find local experts on, e.g., modeling, electrochemistry, or robotics, their exo will not only look up information and help with implementation details but also identify potential collaborators based on factors such as geographical region, type of affiliation (e.g., industry or academia), or availability. The impact of a new instrument, software, or capability often hinges on effective advertising and support. For instance, when introducing a new image segmentation tool or characterization method, it is important to identify applications or researchers that can benefit from it. With approval from interested parties, a team (sub-net) can be formed, and exos can then initiate a kickoff meeting, connect relevant individuals and information, and even engage in debates on behalf of humans.
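One simple way to picture the matchmaking step is as a similarity ranking over interest profiles. The sketch below is an assumption-laden illustration rather than the matching algorithm of ExoNet: it represents each scientist by a hypothetical dictionary of weighted interest keywords, scores pairs by cosine similarity, and returns the pairs above an arbitrary threshold for the humans to accept or decline.

```python
import math
from itertools import combinations

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse keyword-weight profiles."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def propose_matches(profiles: dict, threshold: float = 0.3) -> list:
    """Return (score, name1, name2) for pairs above the threshold, best first."""
    pairs = []
    for (n1, p1), (n2, p2) in combinations(profiles.items(), 2):
        score = cosine(p1, p2)
        if score >= threshold:
            pairs.append((score, n1, n2))
    return sorted(pairs, reverse=True)

# Illustrative profiles only; names and weights are made up.
profiles = {
    "scientist_A": {"tomography": 1.0, "segmentation": 1.0},
    "scientist_B": {"segmentation": 1.0, "machine learning": 1.0},
    "scientist_C": {"electrochemistry": 1.0},
}
suggestions = propose_matches(profiles)
```

Here only the pair sharing the "segmentation" interest is suggested. Additional factors named in the text, such as geographical region, affiliation type, or availability, could be applied as filters before or after scoring.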

One essential part of preserving human control is ensuring that valuable knowledge is effectively transferred between people. Effective knowledge transfer is essential to ensuring the reproducibility of experiments, project transitions, and the development of a continuous workforce for sustained scientific advancement. Knowledge transfer involves frequent human communication, making it a time-intensive process that is most effective through a one-on-one interaction. Key insights often lie in the finer details, which can be easily overlooked or inadequately conveyed. Inefficient communication often leads to a limited number of trained experts or the loss of valuable knowledge. Senior scientists and professors possess extensive scientific knowledge and years of precious experience; however, their time is extremely valuable and managerial responsibilities often limit their availability for scientific discussions and execution. Students and junior scientists can greatly benefit from accessing the knowledge base of these senior experts through their exos, enabling knowledge transfer with little-to-no time investment required from the senior experts themselves. The dynamic exchange between junior and senior scientists not only advances research but also fosters a forward-thinking scientific community. Scientific exocortex sub-nets can also serve as part of project succession plans by keeping detailed project documentation and a network of potential candidates to continue the work. As automation advances, preserving and transferring human expertise is vital, especially for large-scale facilities and collaborative projects. For example, the complex construction and operation of accelerators depend on the accumulated knowledge of not only staff but also termed personnel, making systematic knowledge transfer essential for continuity and sustainability.

Envisioning Collaborative AI

Beyond accelerating processes for efficiency and exploring AI to augment human capabilities, we envision a scientific human-centered network in which AI serves as a supportive partner while humans continue to provide strategic leadership and shape the overarching vision. During beamtime, especially for in-situ experiments, the exos would proactively provide real-time updates and visualization as well as surface pertinent research from the literature or previous experiments, creating a fluid partnership where humans focus on creative problem-solving while AI handles information management and contextual support. Whenever a scientist contributes new data (e.g., correlated electron microscopy and X-ray data), insights, or quick updates, it can trigger dynamic intra- or inter-exo conversations based on the latest information. Exos can continuously conduct web searches, maintain background dialogues, and perform computational tasks. Summarized reports can be delivered to users through push notifications or upon request. Staff exos can also offer the latest information on the beamline and provide guidance for fixing issues during after-hours support, which has long been an unresolved issue. The idea of a personal exo emphasizes the fact that the staff exo can share the latest information with minimal staff involvement. Currently, AI tools are often method- or instrument-focused, e.g., autonomous experimentation with X-ray or scanning probe microscopy, ML-based methods for X-ray absorption spectroscopy, a virtual assistant via the Bluesky data acquisition framework, or agentic workflows for microscopy and spectroscopy. These advances in automation, autonomy, and AI/ML are laying the groundwork for exocortices and driving the development of personal exosystems and human-centered networks. In reality, scientists may differ in their approaches and recommendations, for instance, some emphasizing stronger statistical analysis via duplicated measurements, while others prefer broader exploration by varying the parameters in the material system. The most suitable answer also depends on the user’s background and expertise; for example, an undergraduate student versus a senior researcher, or someone trained in chemistry versus computer engineering, will expect different types of explanation. In addition to publishing papers, the network itself may offer a way to gauge scientific impact by tracking the number of inquiries, follow-up discussions, or collaborations.

Here we provide a prototype to illustrate the feasibility of personal exos and sub-nets. As shown in Figure 2, compared to querying an LLM directly, enriching personalized agents with specific information can (1) provide more relevant answers, (2) anticipate user responses for forming sub-nets, and (3) support diverse collaborations or group-specific insights. The left column (red) shows results from GPT-5 via Azure OpenAI, while the center and right columns (green) display outputs and specialties of personalized agents built on the same GPT-5. The implementation is available on GitHub: https://github.com/esther279/ExoNet_v0/, with agent descriptions, background information, and interaction order specified simply in a JSON file. In the first example (Q1), the agent is given anecdotal information on the instrument for tomographic angular sampling that can affect the experiment planning, especially for dose-sensitive samples. In general, a personalized agent (personal exo) of a staff member would know about the latest updates, e.g., that a new detector or instrument is being commissioned and is expected to be available within a certain time frame. In cases where relevant information is not explicitly encoded in the prompt, as illustrated in Q1, the agent model would engage in internal reasoning to generate a predicted output. In the second example (Q2), a small dataset from a webpage and a publication was used to predict the scientist’s response. While the pursuit of a unified theory of cognition has long represented a central objective in psychology, with recent advances based on foundation models, scientific perspectives are collectively framed by domain-specific expertise, personal research experience, and underlying psychological dispositions. Incorporating specific information resembles approaches such as retrieval-augmented generation (RAG) or prompt engineering, whereas the emphasis here lies in anticipating unknown human data or behavior with predictions anchored in actual human scientific narratives. Each experiment was repeated ten times for statistical information, with 26±12 s inference time each. In contrast to GPT-5, which always produced option (A), the personalized AI chose (B) approximately 60% of the time, which reflects the true perspective of the scientist. Although the present predictions exhibit limited robustness due to model stochasticity, here we demonstrate feasibility and expect that optimizing content inclusion and prompting can markedly improve accuracy. Moreover, a disclaimer should accompany these predictions, emphasizing that they are provisional and require prompt validation from the scientist. These forecasts can simplify the search for collaborations and, when combined with human input, ground AI predictions in authentic human perspectives, thus allowing AI to evolve alongside human vision, much like a trusted personal assistant. The third example (Q3) suggests that the collaboration between two personalized agents with diverse backgrounds leads to more creative suggestions. Depending on the user request and matching algorithms, sub-nets can comprise either specialized or diverse expertise, e.g., specialization may be best for solving technical problems, while diversity may be valuable for brainstorming.
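To make the prototype workflow more concrete, the sketch below shows how agent descriptions, background information, and interaction order might be read from a JSON file, and how repeated trials could be summarized as choice fractions and inference-time statistics. The JSON layout, field names, and function names are hypothetical and are not the schema used in the linked repository; the trial values in the usage note are placeholders, not the data behind the reported results; and the LLM call itself is deliberately omitted.

```python
import json
import statistics
from collections import Counter

# Hypothetical configuration layout; the actual JSON schema in the
# repository may differ.
CONFIG_TEXT = """
{
  "agents": [
    {"name": "user_exo",
     "description": "Personal exo of a beamline user.",
     "background": ["Studies dose-sensitive samples."]},
    {"name": "staff_exo",
     "description": "Personal exo of a beamline scientist.",
     "background": ["Knows the current instrument status."]}
  ],
  "interaction_order": ["user_exo", "staff_exo"]
}
"""

def build_prompts(config_text: str) -> list:
    """Return (agent name, system prompt) pairs in the configured interaction order."""
    config = json.loads(config_text)
    by_name = {agent["name"]: agent for agent in config["agents"]}
    prompts = []
    for name in config["interaction_order"]:
        agent = by_name[name]
        lines = [agent["description"]] + ["- " + item for item in agent["background"]]
        prompts.append((name, "\n".join(lines)))
    return prompts

def summarize_trials(choices: list, seconds: list):
    """Return choice fractions plus mean and sample standard deviation of run times."""
    n = len(choices)
    fractions = {option: count / n for option, count in Counter(choices).items()}
    return fractions, statistics.mean(seconds), statistics.stdev(seconds)
```

As a usage illustration with placeholder values, `summarize_trials(list("BABBABBAAB"), [10.0, 20.0, 30.0, 20.0, 10.0, 20.0, 30.0, 20.0, 10.0, 30.0])` returns the fraction of runs that selected each option together with the mean and spread of the run times, which is the form in which the repeated-trial results above are reported.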

Figure 2. Comparison between GPT-5 without and with personalization. Personalized agents built with specific information, rather than direct LLM queries alone, can (1) deliver more relevant answers, (2) anticipate user responses to form best matches for sub-nets, and (3) facilitate diverse teams.

Outlook

The rapid evolution of AI from tool towards AGI is becoming evident in its cognitive core of foundation models, physical AI, and emerging integrated agency. Foundation models can democratize advanced technologies for faster innovation and multidisciplinary collaboration, while interpretability research advances trust and utility. Physical AI expands the impact of AI to the physical world via experimental assistants, robotics, and, further ahead, embodied instruments and facilities. The exocortex enhances instrumentation and human capabilities with timely guidance and insights, and ExoNet links human experts into human-centered networks, enabling AI-augmented but human-led scientific discovery. Together, these developments promise a broader, structured, and lasting transformation of scientific research and workflows.

As the prospect of AGI becomes increasingly plausible, it becomes even more critical for humans to remain deeply engaged in the scientific process and not be sidelined by automation or AI tools. Given the limited amount of high-quality human-generated content and the growing presence of synthetic data, it is essential for humans to stay involved in the development process to ensure that AI progress is aligned with human experience and to prevent model collapse. A network that captures human–AI and human–human interactions can continuously provide valuable data for AI development and reduce hallucination. Moreover, multidisciplinary research plays a vital role in driving scientific breakthroughs, which often arise from the convergence of different fields. However, engaging in multidisciplinary work can be challenging, as experts may struggle to find a common language or to find the time to search for and connect with suitable collaborators. ExoNet’s matchmaking concept and the formation of sub-nets of human scientists may offer a way to tackle this problem and foster low-friction cross-disciplinary collaboration. Building the necessary infrastructure will lay the foundation for a future where humans and frontier AI evolve in synergy, with humanity remaining the driving force. This requires investment in research and development for agentic AI systems that leverage foundation models for science, as well as engineering efforts to build the infrastructure needed to realize the exocortex blueprint.

Designing the infrastructure for a global networking application requires thorough planning to ensure optimal performance, low latency, and scalability, as well as data security, privacy, and customizable permissions. Key components to consider include a front-end messaging interface, inter-exo communication, matching algorithms, computing resource allocation, and a robust client-server architecture capable of supporting real-time data flow across distributed regions. As wearable AI technology evolves to rapidly handle audio and visual data, one can anticipate an even more natural form of interaction. While all developments inherently carry some risk of misuse, science and technology must still advance. This network roadmap should be designed for scientific purposes, while its use for general purposes or by the wider public is debatable due to safety and privacy concerns.
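As one way to picture inter-exo communication with customizable permissions, the sketch below models a message envelope and a per-owner permission policy. All names here (`Scope`, `ExoMessage`, `PermissionPolicy`, `deliver`, and the example exo identifiers) are hypothetical and introduced only for illustration; they do not describe an existing ExoNet implementation or protocol.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    """Hypothetical categories of information an exo may be asked about."""
    PUBLIC_PROFILE = "public_profile"
    PUBLICATIONS = "publications"
    UNPUBLISHED_NOTES = "unpublished_notes"


@dataclass(frozen=True)
class ExoMessage:
    """Envelope for a request sent from one exo to another."""
    sender: str
    recipient: str
    scope: Scope
    body: str


@dataclass
class PermissionPolicy:
    """Scopes the owning scientist has granted, keyed by requesting exo."""
    default_scopes: frozenset = frozenset({Scope.PUBLIC_PROFILE})
    grants: dict = field(default_factory=dict)

    def allows(self, requester: str, scope: Scope) -> bool:
        # Unknown requesters fall back to the owner's default scopes.
        return scope in self.grants.get(requester, self.default_scopes)


def deliver(message: ExoMessage, recipient_policy: PermissionPolicy) -> bool:
    """Return True only if the recipient's policy permits the requested scope."""
    return recipient_policy.allows(message.sender, message.scope)


# Hypothetical usage: the recipient trusts one collaborator with unpublished notes.
policy = PermissionPolicy(
    grants={"exo_alice": frozenset({Scope.PUBLIC_PROFILE, Scope.UNPUBLISHED_NOTES})}
)
trusted = ExoMessage("exo_alice", "exo_bob", Scope.UNPUBLISHED_NOTES, "Any pilot data?")
stranger = ExoMessage("exo_carol", "exo_bob", Scope.UNPUBLISHED_NOTES, "Any pilot data?")
print(deliver(trusted, policy), deliver(stranger, policy))
```

Checking permissions on the recipient's side keeps each scientist in control of what their agent discloses, in line with the human-centered intent of the network. A deployed system would additionally need authentication, encryption, and audit logging, which are omitted here.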

Science matters because it deepens our understanding of the natural world, helps address complex problems, drives innovation to enhance our quality of life, and encourages critical thinking vital to individual and societal progress. Scientists aim to explore the unknown while ensuring that scientific progress aligns with human ethical and cultural values, as well as the welfare of humans. Science is fundamentally a human enterprise, in the sense that its goal is not merely predictive models and technological outcomes but, more deeply, human-intelligible insights that provide understanding of the universe and our place in it.

We are stepping into a crucial era in which humans must determine whether to take the lead or be led. It is our responsibility to guide the use of AI to ensure that AI developments align with our shared values and that AI methods contribute meaningfully to our understanding of the natural world and scientific principles. Both academia and industry are already heavily investing in foundation models and agentic AI. As foundation models grow larger and more powerful, humans also need to connect and form a larger foundation for science and society. AI will be able to architect new solutions, but humans must be the ones who define the vision rooted in human needs and lead the mission, and who, by setting the goals and acting upon them with unwavering persistence, bring inspiration and form communities. The heartbeat of ExoNet lies not in the connected AI agents but in finding common ground and building connections between humans.

Acknowledgments

This research was supported by the Center for Functional Nanomaterials (CFN), which is a U.S. Department of Energy Office of Science User Facility, at Brookhaven National Laboratory under Contract No. DE-SC0012704. The work was also supported by a DOE Early Career Research Program.

The authors declare no competing financial interest.

References

Merchant A., Batzner S., Schoenholz S. S., Aykol M., Cheon G., Cubuk E. D.. Scaling deep learning for materials discovery. Nature. 2023;624:80–85. doi: 10.1038/s41586-023-06735-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lin Z., Akin H., Rao R., Hie B., Zhu Z., Lu W., Smetanin N., Verkuil R., Kabeli O., Shmueli Y.. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379:1123–1130. doi: 10.1126/science.ade2574. [DOI] [PubMed] [Google Scholar]
Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A.. et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen C., Nguyen D. T., Lee S. J., Baker N. A., Karakoti A. S., Lauw L., Owen C., Mueller K. T., Bilodeau B. A., Murugesan V.. et al. Accelerating computational materials discovery with machine learning and cloud high-performance computing: from large-scale screening to experimental validation. J. Am. Chem. Soc. 2024;146:20009–20018. doi: 10.1021/jacs.4c03849. [DOI] [PubMed] [Google Scholar]
Adesiji A. D., Wang J., Kuo C.-S., Brown K. A.. Benchmarking Self-Driving Labs. arXiv:2508.06642 [physics.comp-ph] 2025:na. doi: 10.48550/arXiv.2508.06642. [DOI] [Google Scholar]
Canty R. B., Bennett J. A., Brown K. A., Buonassisi T., Kalinin S. V., Kitchin J. R., Maruyama B., Moore R. G., Schrier J., Seifrid M.. et al. Science acceleration and accessibility with self-driving labs. Nat. Commun. 2025;16:3856. doi: 10.1038/s41467-025-59231-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Delgado-Licona F., Addington D., Alsaiari A., Abolhasani M.. Engineering principles for self-driving laboratories. Nature Chemical Engineering. 2025;2:277. doi: 10.1038/s44286-025-00217-7. [DOI] [Google Scholar]
Noack M. M., Zwart P. H., Ushizima D. M., Fukuto M., Yager K. G., Elbert K. C., Murray C. B., Stein A., Doerk G. S., Tsai E. H.. et al. Gaussian processes for autonomous data acquisition at large-scale synchrotron and neutron facilities. Nature Reviews Physics. 2021;3:685–697. doi: 10.1038/s42254-021-00345-y. [DOI] [Google Scholar]
Noack, M. ; Ushizima, D. . Methods and Applications of Autonomous Experimentation; CRC Press, 2024. [Google Scholar]
Yager, K. G. In Methods and Applications of Autonomous Experimentation, 1st ed.; Noack, M. , Ushizima, D. , Eds.; Chapman and Hall/CRC, 2023; Chapter 1, p 21. [Google Scholar]
Maffettone P. M., Friederich P., Baird S. G., Blaiszik B., Brown K. A., Campbell S. I., Cohen O. A., Davis R. L., Foster I. T., Haghmoradi N.. et al. What is missing in autonomous discovery: open challenges for the community. Digital Discovery. 2023;2:1644–1659. doi: 10.1039/D3DD00143A. [DOI] [Google Scholar]
Burger B., Maffettone P. M., Gusev V. V., Aitchison C. M., Bai Y., Wang X., Li X., Alston B. M., Li B., Clowes R.. et al. A mobile robotic chemist. Nature. 2020;583:237–241. doi: 10.1038/s41586-020-2442-2. [DOI] [PubMed] [Google Scholar]
Beaucage P. A., Martin T. B.. The autonomous formulation laboratory: an open liquid handling platform for formulation discovery using x-ray and neutron scattering. Chem. Mater. 2023;35:846–852. doi: 10.1021/acs.chemmater.2c03118. [DOI] [Google Scholar]
Coley C. W., Thomas D. A. III, Lummiss J. A., Jaworski J. N., Breen C. P., Schultz V., Hart T., Fishman J. S., Rogers L., Gao H.. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science. 2019;365:eaax1566. doi: 10.1126/science.aax1566. [DOI] [PubMed] [Google Scholar]
Szymanski N. J.. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature. 2023;624:86–91. doi: 10.1038/s41586-023-06734-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
Vriza A., Chan H., Xu J.. Self-driving laboratory for polymer electronics. Chem. Mater. 2023;35:3046–3056. doi: 10.1021/acs.chemmater.2c03593. [DOI] [Google Scholar]
Wang C., Kim Y.-J., Vriza A., Batra R., Baskaran A., Shan N., Li N., Darancet P., Ward L., Liu Y.. et al. Autonomous platform for solution processing of electronic polymers. Nat. Commun. 2025;16:1498. doi: 10.1038/s41467-024-55655-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Morris T. W., Rakitin M., Du Y., Fedurin M., Giles A. C., Leshchev D., Li W. H., Romasky B., Stavitski E., Walter A. L.. A general Bayesian algorithm for the autonomous alignment of beamlines. J. Synchrotron Radiation. 2024;31:1446. doi: 10.1107/S1600577524008993. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yager K. G., Majewski P. W., Noack M. M., Fukuto M.. Autonomous x-ray scattering. Nanotechnology. 2023;34:322001. doi: 10.1088/1361-6528/acd25a. [DOI] [PubMed] [Google Scholar]
Doerk G. S., Stein A., Bae S., Noack M. M., Fukuto M., Yager K. G.. Autonomous discovery of emergent morphologies in directed self-assembly of block copolymer blends. Science Advances. 2023;9:eadd3687. doi: 10.1126/sciadv.add3687. [DOI] [PMC free article] [PubMed] [Google Scholar]
McDannald A., Frontzek M., Savici A. T., Doucet M., Rodriguez E. E., Meuse K., Opsahl-Ong J., Samarov D., Takeuchi I., Ratcliff W., Kusne A. G.. On-the-fly autonomous control of neutron diffraction via physics-informed Bayesian active learning. Applied Physics Reviews. 2022;9:021408. doi: 10.1063/5.0082956. [DOI] [Google Scholar]
Kalinin S. V., Mukherjee D., Roccapriore K., Blaiszik B. J., Ghosh A., Ziatdinov M. A., Al-Najjar A., Doty C., Akers S., Rao N. S.. et al. Machine learning for automated experimentation in scanning transmission electron microscopy. npj Computational Materials. 2023;9:227. doi: 10.1038/s41524-023-01142-0. [DOI] [Google Scholar]
Pratiush U., Funakubo H., Vasudevan R., Kalinin S. V., Liu Y.. Scientific exploration with expert knowledge (SEEK) in autonomous scanning probe microscopy with active learning. Digital Discovery. 2025;4:252. doi: 10.1039/D4DD00277F. [DOI] [Google Scholar]
Bommasani R.. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs.LG] 2021:na. doi: 10.48550/arXiv.2108.07258. [DOI] [Google Scholar]
Minaee S., Mikolov T., Nikzad N., Chenaghlu M., Socher R., Amatriain X., Gao J.. Large Language Models: A Survey. arXiv:2402.06196 [cs.CL] 2024:na. doi: 10.48550/arXiv.2402.06196. [DOI] [Google Scholar]
Liu Y.. Understanding LLMs: A Comprehensive Overview from Training to Inference. arXiv:2401.02038 [cs.CL] 2024:na. doi: 10.48550/arXiv.2401.02038. [DOI] [Google Scholar]
Lu C., Lu C., Lange R. T., Foerster J., Clune J., Ha D.. The AI scientist: Towards fully automated open-ended scientific discovery. arXiv:2408.06292 [cs.AI] 2024:na. doi: 10.48550/arXiv.2408.06292. [DOI] [Google Scholar]
Yamada Y., Lange R. T., Lu C., Hu S., Lu C., Foerster J., Clune J., Ha D.. The AI scientist-v2: Workshop-level automated scientific discovery via agentic tree search. arXiv:2504.08066 [cs.AI] 2025:na. doi: 10.48550/arXiv.2504.08066. [DOI] [Google Scholar]
Luo Z., Yang Z., Xu Z., Yang W., Du X.. LLM4SR: A Survey on Large Language Models for Scientific Research. arXiv:2501.04306 [cs.CL] 2025:na. doi: 10.48550/arXiv.2501.04306. [DOI] [Google Scholar]
Gottweis J., Weng W.-H., Daryin A., Tu T., Palepu A., Sirkovic P., Myaskovsky A., Weissenberger F., Rong K., Tanno R.. Towards an AI co-scientist. arXiv:2502.18864 [cs.AI] 2025:na. doi: 10.48550/arXiv.2502.18864. [DOI] [Google Scholar]
Zheng T., Deng Z., Tsang H. T., Wang W., Bai J., Wang Z., Song Y.. From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery. arXiv:2505.13259 [cs.CL] 2025:na. doi: 10.48550/arXiv.2505.13259. [DOI] [Google Scholar]
Strachan J. W. A., Albergo D., Borghini G., Pansardi O., Scaliti E., Gupta S., Saxena K., Rufo A., Panzeri S., Manzi G., Graziano M. S. A., Becchio C.. Testing theory of mind in large language models and humans. Nature Human Behaviour. 2024;8:1285–1295. doi: 10.1038/s41562-024-01882-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Street W., Siy J. O., Keeling G., Baranes A., Barnett B., McKibben M., Kanyere T., Lentz A., y Arcas B. A., Dunbar R. I. M.. LLMs achieve adult human performance on higher-order theory of mind tasks. arXiv:2405.18870 [cs.AI] 2024:na. doi: 10.48550/arXiv.2405.18870. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gao Y., Xiong Y., Gao X., Jia K., Pan J., Bi Y., Dai Y., Sun J., Wang M., Wang H.. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997 [cs.CL] 2024:na. doi: 10.48550/arXiv.2312.10997. [DOI] [Google Scholar]
Yu H., Gan A., Zhang K., Tong S., Liu Q., Liu Z.. Evaluation of Retrieval-Augmented Generation: A Survey. arXiv:2405.07437. 2024:na. doi: 10.48550/arXiv:2405.07437. [DOI] [Google Scholar]
Narayanan S. M., Braza J. D., Griffiths R.-R., Bou A., Wellawatte G., Ramos M. C., Mitchener L., Rodriques S. G., White A. D.. Training a Scientific Reasoning Model for Chemistry. arXiv:2506.17238. 2025:na. doi: 10.48550/arXiv:2506.17238. [DOI] [Google Scholar]
Wei J., Wang X., Schuurmans D., Bosma M., Ichter B., Xia F., Chi E., Le Q., Zhou D.. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903. 2023:na. doi: 10.48550/arXiv:2201.11903. [DOI] [Google Scholar]
Xu J., Fei H., Pan L., Liu Q., Lee M.-L., Hsu W.. Faithful Logical Reasoning via Symbolic Chain-of-Thought. arXiv:2405.18357. 2024:na. doi: 10.48550/arXiv:2405.18357. [DOI] [Google Scholar]
Pfau, J. ; Merrill, W. ; Bowman, S. R. . Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models. 2024; https://arxiv.org/abs/2404.15758.
Zhou Y., Liu H., Srivastava T., Mei H., Tan C.. Hypothesis generation with large language models. arXiv:2404.04326. 2024:na. doi: 10.48550/arXiv:2404.04326. [DOI] [Google Scholar]
Ghafarollahi A., Buehler M. J.. SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning. Advanced Materials. 2025;37:2413523. doi: 10.1002/adma.202413523. [DOI] [PMC free article] [PubMed] [Google Scholar]
Guo T., Chen X., Wang Y., Chang R., Pei S., Chawla N. V., Wiest O., Zhang X.. Large Language Model based Multi-Agents: A Survey of Progress and Challenges. arXiv:2402.01680. 2024:na. doi: 10.48550/arXiv:2402.01680. [DOI] [Google Scholar]
Dong Y., Jiang X., Jin Z., Li G.. Self-collaboration Code Generation via ChatGPT. arXiv:2304.07590. 2024:na. doi: 10.48550/arXiv:2304.07590. [DOI] [Google Scholar]
Khan A., Hughes J., Valentine D., Ruis L., Sachan K., Radhakrishnan A., Grefenstette E., Bowman S. R., Rocktäschel T., Perez E.. Debating with More Persuasive LLMs Leads to More Truthful Answers. arXiv:2402.06782. 2024:na. doi: 10.48550/arXiv:2402.06782. [DOI] [Google Scholar]; We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information but are otherwise as capable. The method we evaluate is debate, where two LLM experts each argue for a different answer, and a non-expert selects the answer.
Abdelnabi S., Gomaa A., Sivaprasad S., Schönherr L., Fritz M.. LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games. arXiv:2309.17234. 2023:na. doi: 10.48550/arXiv:2309.17234. [DOI] [Google Scholar]
Parmar M., Liu X., Goyal P., Chen Y., Le L., Mishra S., Mobahi H., Gu J., Wang Z., Nakhost H.. PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving. arXiv:2502.16111. 2025:na. doi: 10.48550/arXiv:2502.16111. [DOI] [Google Scholar]
Gosmar D., Dahl D. A.. Hallucination mitigation using agentic ai natural language-based frameworks. arXiv:2501.13946. 2025:na. doi: 10.48550/arXiv:2501.13946. [DOI] [Google Scholar]
AL A., Ahn A., Becker N., Carroll S., Christie N., Cortes M., Demirci A., Du M., Li F., Luo S.. Project Sid: Many-agent simulations toward AI civilization. arXiv:2411.00114. 2024:na. doi: 10.48550/arXiv:2411.00114. [DOI] [Google Scholar]
Schmidgall S., Su Y., Wang Z., Sun X., Wu J., Yu X., Liu J., Liu Z., Barsoum E.. Agent laboratory: Using llm agents as research assistants. arXiv:2501.04227. 2025:na. doi: 10.48550/arXiv:2501.04227. [DOI] [Google Scholar]
Zhou H., Wan X., Sun R., Palangi H., Iqbal S., Vulić I., Korhonen A., Arık S. Ö.. Multi-agent design: Optimizing agents with better prompts and topologies. arXiv:2502.02533. 2025:na. doi: 10.48550/arXiv:2502.02533. [DOI] [Google Scholar]
Zimmermann Y., Bazgir A., Al-Feghali A., Ansari M., Brinson L. C., Chiang Y., Circi D., Chiu M.-H., Daelman N., Evans M. L.. 34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery. arXiv:2505.03049. 2025:na. doi: 10.48550/arXiv:2505.03049. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jiang X., Wang W., Tian S., Wang H., Lookman T., Su Y.. Applications of natural language processing and large language models in materials discovery. npj Computational Materials. 2025;11:79. doi: 10.1038/s41524-025-01554-0. [DOI] [Google Scholar]; This review explores the application of NLP tools in materials science, focusing on automatic data extraction, materials discovery, and autonomous research. We also discuss the challenges and opportunities associated with utilizing LLMs and outline the prospects and advancements that will propel the field forward
Kong L., Shoghi N., Hu G., Li P., Fung V.. MatterTune: An Integrated, User-Friendly Platform for Fine-Tuning Atomistic Foundation Models to Accelerate Materials Simulation and Discovery. arXiv:2504.10655. 2025:na. doi: 10.48550/arXiv:2504.10655. [DOI] [Google Scholar]; The MatterTune offers a flexible, generalizable framework that seamlessly integrated multiple atomistic FMs and supports tasks such as molecular dynamics simulations, materials property predictions, and materials discovery. MatterTune offers users a wide range of choices in data formats, model architectures, and training configurations. As a consequence of the modular design of MatterTune, users can freely mix and match these components according to their performance needs and specific requirements. Our experimental results to replicate the models’ original reported performance on ambient water systems and JMP’s Matbench experiments demonstrate that this unified approach to integrating atomistic FMs is both feasible and does not compromise model performance.
Choudhary K.. DiffractGPT: Atomic Structure Determination from X-ray Diffraction Patterns Using a Generative Pretrained Transformer. J. Phys. Chem. Lett. 2025;16:2110–2119. doi: 10.1021/acs.jpclett.4c03137. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zeni C., Pinsler R., Zügner D., Fowler A., Horton M., Fu X., Wang Z., Shysheya A., Crabbé J., Ueda S.. et al. A generative model for inorganic materials design. Nature. 2025;639:624–632. doi: 10.1038/s41586-025-08628-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jia S., Zhang C., Fung V.. LLMatDesign: Autonomous Materials Discovery with Large Language Models. arXiv:2406.13163. 2024:na. doi: 10.48550/arXiv:2406.13163. [DOI] [Google Scholar]
Choudhary K.. AtomGPT: Atomistic generative pretrained transformer for forward and inverse materials design. J. Phys. Chem. Lett. 2024;15:6909–6917. doi: 10.1021/acs.jpclett.4c01126. [DOI] [PubMed] [Google Scholar]
Ghafarollahi A., Buehler M. J.. Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems. arXiv:2410.13768 [cond-mat.mtrl-sci] 2024:na. doi: 10.48550/arXiv:2410.13768. [DOI] [Google Scholar]
Jablonka K. M., Schwaller P., Ortega-Guerrero A., Smit B.. Leveraging large language models for predictive chemistry. Nature Machine Intelligence. 2024;6:161–169. doi: 10.1038/s42256-023-00788-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Okabe R., West Z., Chotrattanapituk A., Cheng M., Carrizales D. C., Xie W., Cava R. J., Li M.. Large Language Model-Guided Prediction Toward Quantum Materials Synthesis. arXiv:2410.20976 [cond-mat.mtrl-sci] 2024:na. doi: 10.48550/arXiv:2410.20976. [DOI] [Google Scholar]
Reinhart W. F., Statt A.. Large language models design sequence-defined macromolecules via evolutionary optimization. npj Computational Materials. 2024;10:262. doi: 10.1038/s41524-024-01449-6. [DOI] [Google Scholar]
Kitchin J. R.. The evolving role of programming and LLMs in the development of self-driving laboratories. APL Machine Learning. 2025;3:na. doi: 10.1063/5.0266757. [DOI] [Google Scholar]
Vriza, A. ; Prince, M. ; Chan, H. ; Zhou, T. . Operating Robotic Laboratories with Large Language Models and Teachable Agents. AI for Accelerated Materials Design-ICLR, Apr 24–28, 2025, Singapore, ICLR, 2025, conference paper. [Google Scholar]
Mathur S., van der Vleuten N., Yager K. G., Tsai E.. VISION: A modular AI assistant for natural human-instrument interaction at scientific user facilities. Machine Learning: Science and Technology. 2025;6:025051. doi: 10.1088/2632-2153/add9e4. [DOI] [Google Scholar]
Yin X., Shi C., Deng J., Han Y., Jiang Y.. PEAR: A Knowledge-guided Autonomous Pipeline for Ptychography Enabled by Large Language Models. Microscopy and Microanalysis. 2024;30:ozae044.184. doi: 10.1093/mam/ozae044.184. [DOI] [Google Scholar]
Liu Y., Checa M., Vasudevan R. K.. Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design. Machine Learning: Science and Technology. 2024;5:02LT01. doi: 10.1088/2632-2153/ad52e9. [DOI] [Google Scholar]
Potemkin D., Soto C., Li R., Yager K., Tsai E.. Virtual Scientific Companion for Synchrotron Beamlines: A Prototype. arXiv:2312.17180 [cs.CL] 2023:na. doi: 10.48550/arXiv:2312.17180. [DOI] [Google Scholar]
Boiko D. A., MacKnight R., Kline B., Gomes G.. Autonomous chemical research with large language models. Nature. 2023;624:570–578. doi: 10.1038/s41586-023-06792-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Prince M. H., Chan H., Vriza A., Zhou T., Sastry V. K., Luo Y., Dearing M. T., Harder R. J., Vasudevan R. K., Cherukara M. J.. Opportunities for retrieval and tool augmented large language models in scientific facilities. npj Computational Materials. 2024;10:251. doi: 10.1038/s41524-024-01423-2. [DOI] [Google Scholar]
Choudhary K.. MicroscopyGPT: Generating Atomic-Structure Captions from Microscopy Images of 2D Materials with Vision-Language Transformers. J. Phys. Chem. Lett. 2025;16(27):7028–7035. doi: 10.1021/acs.jpclett.5c01257. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rubungo A. N., Li K., Hattrick-Simpers J., Dieng A. B.. LLM4Mat-bench: benchmarking large language models for materials property prediction. Machine Learning: Science and Technology. 2025;6:020501. doi: 10.1088/2632-2153/add3bb. [DOI] [Google Scholar]
Cheung J. J., Shen S., Zhuang Y., Li Y., Ramprasad R., Zhang C.. MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge. arXiv:2505.23982 [cs.AI] 2025:na. doi: 10.48550/arXiv:2505.23982. [DOI] [Google Scholar]
Mirza A., Alampara N., Kunchapu S., Ríos-García M., Emoekabu B., Krishnan A., Gupta T., Schilling-Wilhelmi M., Okereke M., Aneesh A.. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 2025;17:1027–1034. doi: 10.1038/s41557-025-01815-x. [DOI] [PMC free article] [PubMed] [Google Scholar]; Here we introduce ChemBench, an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. We curated more than 2700 question/answer pairs, evaluated leading open- and closed-source LLMs and found that the best models, on average, outperformed the best human chemists in our study. However, the models struggle with some basic tasks and provide overconfident predictions.
Zaki M., Krishnan N. A.. et al. MaScQA: investigating materials science knowledge of large language models. Digital Discovery. 2024;3:313–327. doi: 10.1039/D3DD00188A. [DOI] [Google Scholar]
Yao L., Samantray S., Ghosh A., Roccapriore K., Kovarik L., Allec S., Ziatdinov M.. Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop. arXiv:2508.06569 [cs.AI] 2025:na. doi: 10.48550/arXiv:2508.06569. [DOI] [Google Scholar]
Yager K. G.. Towards a Science Exocortex. Digital Discovery. 2024;3:1933–1957. doi: 10.1039/D4DD00178H. [DOI] [Google Scholar]
Yager K. G.. Domain-specific chatbots for science using embeddings. Digital Discovery. 2023;2:1850–1861. doi: 10.1039/D3DD00112A. [DOI] [Google Scholar]
Liu B.. Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems. arXiv:2504.01990 [cs.AI] 2025:na. doi: 10.48550/arXiv:2504.01990. [DOI] [Google Scholar]
Ke Z., Jiao F., Ming Y., Nguyen X.-P., Xu A., Long D. X., Li M., Qin C., Wang P., Savarese S., Xiong C., Joty S.. A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems. arXiv:2504.09037 [cs.AI] 2025:na. doi: 10.48550/arXiv:2504.09037. [DOI] [Google Scholar]
Eger S., Cao Y., D’Souza J., Geiger A., Greisinger C., Gross S., Hou Y., Krenn B., Lauscher A., Li Y., Lin C., Moosavi N. S., Zhao W., Miller T.. Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation. arXiv:2502.05151 [cs.CL] 2025:na. doi: 10.48550/arXiv:2502.05151. [DOI] [Google Scholar]
Willmott, P. An Introduction to Synchrotron Radiation: Techniques and Applications; John Wiley & Sons, 2019. [Google Scholar]
Kwa, T. et al. Measuring AI Ability to Complete Long Tasks. 2025; https://arxiv.org/abs/2503.14499.
Pilz K. F., Sanders J., Rahman R., Heim L.. Trends in AI Supercomputers. arXiv:2504.16026 [cs.CY] 2025:na. doi: 10.48550/arXiv:2504.16026. [DOI] [Google Scholar]
Morris M. R., Sohl-dickstein J., Fiedel N., Warkentin T., Dafoe A., Faust A., Farabet C., Legg S.. Levels of AGI for Operationalizing Progress on the Path to AGI. arXiv:2311.02462 [cs.AI] 2024:na. doi: 10.48550/arXiv:2311.02462. [DOI] [Google Scholar]
Xu B.. What is Meant by AGI? On the Definition of Artificial General Intelligence. arXiv:2404.10731 [cs.AI] 2024:na. doi: 10.48550/arXiv:2404.10731. [DOI] [Google Scholar]
Raman R., Kowalski R., Achuthan K., Iyer A., Nedungadi P.. Navigating artificial general intelligence development: societal, technological, ethical, and brain-inspired pathways. Sci. Rep. 2025;15:1–22. doi: 10.1038/s41598-025-92190-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shao Y., Zope H., Jiang Y., Pei J., Nguyen D., Brynjolfsson E., Yang D.. Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce. arXiv:2506.06576 [cs.CY] 2025:na. doi: 10.48550/arXiv:2506.06576. [DOI] [Google Scholar]
Character.ai. https://character.ai/, 2025. (Accessed: 2025-08-15).
Meta AI Studio: Build and interact with AI characters across our platforms. Meta Platforms, Inc., 2024; https://about.fb.com/news/2024/04/meta-ai-studio/ (Accessed: 2025-08-15).
Significant Gravitas AutoGPT: An experimental open-source attempt to make GPT-4 fully autonomous. https://github.com/Torantulino/Auto-GPT, 2023. (Accessed: 2025-08-15).
Nakajima, Y. BabyAGI: A simple task-driven autonomous agent. https://github.com/yoheinakajima/babyagi, 2023. (Accessed: 2025-08-15).
AgentGPT: Assemble, configure, and deploy autonomous AI Agents in your browser. https://github.com/reworkd/AgentGPT. GitHub repository, licensed under GPL-3.0, 2023.
Zapier Zapier Copilot: Build Zaps even faster with AI. https://zapier.com/blog/zapier-copilot-guide/, 2024. (Accessed: 2025-08-15).
Microsoft Power Platform Documentation Copilot Studio: Build AI-powered solutions with low-code. https://learn.microsoft.com/en-us/power-platform/architecture/products/copilot-studio, 2025. (Accessed: 2025-08-15).
Hong S.. MetaGPT: Meta Programming for Multi-Agent Collaborative Framework. arXiv:2308.00352 [cs.AI] 2023:na. doi: 10.48550/arXiv:2308.00352. [DOI] [Google Scholar]
Li G., Hammoud H. A. A. K., Itani H., Khizbullin D., Ghanem B.. CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. arXiv:2303.17760 [cs.AI] 2023:na. doi: 10.48550/arXiv.2303.17760. [DOI] [Google Scholar]
CrewAI Inc. CrewAI GitHub Repository, 2024; https://github.com/crewAIInc/crewAI (Accessed: 2025-09-08).
TransformerOptimus SuperAGI: A dev-first open source autonomous AI agent framework. GitHub repository, 2025; https://github.com/TransformerOptimus/SuperAGI.
Google Cloud Agent-to-Agent (A2A) Protocol: A new standard for AI agent interoperability. https://cloud.google.com/blog/products/ai-machine-learning/agent2agent-protocol-is-getting-an-upgrade, 2024. (Accessed: 2025-08-15).
Achiam, J. ; Adler, S. ; Agarwal, S. ; Ahmad, L. ; Akkaya, I. ; Aleman, F. L. ; Almeida, D. ; Altenschmidt, J. ; Altman, S. ; Anadkat, S. ; et al. Technical Report: OpenAI, 2023.
OpenAI GPT-5 System Card. 2025; https://openai.com/index/introducing-gpt-5/ (Accessed: 2025-08-20).
Carbone M. R., Topsakal M., Lu D., Yoo S.. Machine-learning X-ray absorption spectra to quantitative accuracy. Physical review letters. 2020;124:156401. doi: 10.1103/PhysRevLett.124.156401. [DOI] [PubMed] [Google Scholar]
Meng F., Maurer B., Peschel F., Selcuk S., Hybertsen M., Qu X., Vorwerk C., Draxl C., Vinson J., Lu D.. Multicode benchmark on simulated Ti K-edge x-ray absorption spectra of Ti-O compounds. Physical Review Materials. 2024;8:013801. doi: 10.1103/PhysRevMaterials.8.013801. [DOI] [Google Scholar]
Allan D., Caswell T., Campbell S., Rakitin M.. Bluesky’s Ahead: A Multi-Facility Collaboration for an a la Carte Software Project for Data Acquisition and Management. Synchrotron Radiation News. 2019;32:19–22. doi: 10.1080/08940886.2019.1608121. [DOI] [Google Scholar]
Binz M., Akata E., Bethge M., Brändle F., Callaway F., Coda-Forno J., Dayan P., Demircan C., Eckstein M. K., Éltető N.. et al. A foundation model to predict and capture human cognition. Nature. 2025;644:1002–1009. doi: 10.1038/s41586-025-09215-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Abdurahman S., Atari M., Karimi-Malekabadi F., Xue M. J., Trager J., Park P. S., Golazizian P., Omrani A., Dehghani M.. Perils and opportunities in using large language models in psychological research. PNAS Nexus. 2024;3:pgae245. doi: 10.1093/pnasnexus/pgae245. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fernando C., Marcello H., Wlodek J., Sinsheimer J., Olds D., Campbell S. I., Maffettone P. M.. Robotic integration for end-stations at scientific user facilities. Digital Discovery. 2025;4:1083–1091. doi: 10.1039/D5DD00036J. [DOI] [Google Scholar]
Bertrand Q., Bose A. J., Duplessis A., Jiralerspong M., Gidel G.. On the Stability of Iterative Retraining of Generative Models on their own Data. arXiv:2310.00429 [cs.LG] 2024:na. doi: 10.48550/arXiv.2310.00429. [DOI] [Google Scholar]
Shumailov I., Shumaylov Z., Zhao Y., Papernot N., Anderson R., Gal Y.. AI models collapse when trained on recursively generated data. Nature. 2024;631:755–759. doi: 10.1038/s41586-024-07566-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Srinidhi S., Lu E., Rowe A.. XaiR: An XR Platform that Integrates Large Language Models with the Physical World. 2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2024:759–767. [Google Scholar]
Wang Z., Rao M., Ye S., Song W., Lu F.. Towards spatial computing: recent advances in multimodal natural interaction for XR headsets. Front. Comput. Sci. 2025;19:1912708. doi: 10.1007/s11704-025-41123-8. [DOI] [Google Scholar]

[ref1] Merchant A., Batzner S., Schoenholz S. S., Aykol M., Cheon G., Cubuk E. D.. Scaling deep learning for materials discovery. Nature. 2023;624:80–85. doi: 10.1038/s41586-023-06735-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref2] Lin Z., Akin H., Rao R., Hie B., Zhu Z., Lu W., Smetanin N., Verkuil R., Kabeli O., Shmueli Y.. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379:1123–1130. doi: 10.1126/science.ade2574. [DOI] [PubMed] [Google Scholar]

[ref3] Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A.. et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] Chen C., Nguyen D. T., Lee S. J., Baker N. A., Karakoti A. S., Lauw L., Owen C., Mueller K. T., Bilodeau B. A., Murugesan V.. et al. Accelerating computational materials discovery with machine learning and cloud high-performance computing: from large-scale screening to experimental validation. J. Am. Chem. Soc. 2024;146:20009–20018. doi: 10.1021/jacs.4c03849. [DOI] [PubMed] [Google Scholar]

[ref5] Adesiji A. D., Wang J., Kuo C.-S., Brown K. A.. Benchmarking Self-Driving Labs. arXiv:2508.06642 [physics.comp-ph] 2025:na. doi: 10.48550/arXiv.2508.06642. [DOI] [Google Scholar]

[ref6] Canty R. B., Bennett J. A., Brown K. A., Buonassisi T., Kalinin S. V., Kitchin J. R., Maruyama B., Moore R. G., Schrier J., Seifrid M.. et al. Science acceleration and accessibility with self-driving labs. Nat. Commun. 2025;16:3856. doi: 10.1038/s41467-025-59231-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] Delgado-Licona F., Addington D., Alsaiari A., Abolhasani M.. Engineering principles for self-driving laboratories. Nature Chemical Engineering. 2025;2:277. doi: 10.1038/s44286-025-00217-7. [DOI] [Google Scholar]

[ref8] Noack M. M., Zwart P. H., Ushizima D. M., Fukuto M., Yager K. G., Elbert K. C., Murray C. B., Stein A., Doerk G. S., Tsai E. H.. et al. Gaussian processes for autonomous data acquisition at large-scale synchrotron and neutron facilities. Nature Reviews Physics. 2021;3:685–697. doi: 10.1038/s42254-021-00345-y. [DOI] [Google Scholar]

[ref9] Noack, M.; Ushizima, D. Methods and Applications of Autonomous Experimentation; CRC Press, 2024. [Google Scholar]

[ref10] Yager, K. G. In Methods and Applications of Autonomous Experimentation, 1st ed.; Noack, M., Ushizima, D., Eds.; Chapman and Hall/CRC, 2023; Chapter 1, p 21. [Google Scholar]

[ref11] Maffettone P. M., Friederich P., Baird S. G., Blaiszik B., Brown K. A., Campbell S. I., Cohen O. A., Davis R. L., Foster I. T., Haghmoradi N.. et al. What is missing in autonomous discovery: open challenges for the community. Digital Discovery. 2023;2:1644–1659. doi: 10.1039/D3DD00143A. [DOI] [Google Scholar]

[ref12] Burger B., Maffettone P. M., Gusev V. V., Aitchison C. M., Bai Y., Wang X., Li X., Alston B. M., Li B., Clowes R.. et al. A mobile robotic chemist. Nature. 2020;583:237–241. doi: 10.1038/s41586-020-2442-2. [DOI] [PubMed] [Google Scholar]

[ref13] Beaucage P. A., Martin T. B.. The autonomous formulation laboratory: an open liquid handling platform for formulation discovery using x-ray and neutron scattering. Chem. Mater. 2023;35:846–852. doi: 10.1021/acs.chemmater.2c03118. [DOI] [Google Scholar]

[ref14] Coley C. W., Thomas D. A. III, Lummiss J. A., Jaworski J. N., Breen C. P., Schultz V., Hart T., Fishman J. S., Rogers L., Gao H.. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science. 2019;365:eaax1566. doi: 10.1126/science.aax1566. [DOI] [PubMed] [Google Scholar]

[ref15] Szymanski N. J.. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature. 2023;624:86–91. doi: 10.1038/s41586-023-06734-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] Vriza A., Chan H., Xu J.. Self-driving laboratory for polymer electronics. Chem. Mater. 2023;35:3046–3056. doi: 10.1021/acs.chemmater.2c03593. [DOI] [Google Scholar]

[ref17] Wang C., Kim Y.-J., Vriza A., Batra R., Baskaran A., Shan N., Li N., Darancet P., Ward L., Liu Y.. et al. Autonomous platform for solution processing of electronic polymers. Nat. Commun. 2025;16:1498. doi: 10.1038/s41467-024-55655-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref18] Morris T. W., Rakitin M., Du Y., Fedurin M., Giles A. C., Leshchev D., Li W. H., Romasky B., Stavitski E., Walter A. L.. A general Bayesian algorithm for the autonomous alignment of beamlines. J. Synchrotron Radiation. 2024;31:1446. doi: 10.1107/S1600577524008993. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref19] Yager K. G., Majewski P. W., Noack M. M., Fukuto M.. Autonomous x-ray scattering. Nanotechnology. 2023;34:322001. doi: 10.1088/1361-6528/acd25a. [DOI] [PubMed] [Google Scholar]

[ref20] Doerk G. S., Stein A., Bae S., Noack M. M., Fukuto M., Yager K. G.. Autonomous discovery of emergent morphologies in directed self-assembly of block copolymer blends. Science Advances. 2023;9:eadd3687. doi: 10.1126/sciadv.add3687. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref21] McDannald A., Frontzek M., Savici A. T., Doucet M., Rodriguez E. E., Meuse K., Opsahl-Ong J., Samarov D., Takeuchi I., Ratcliff W., Kusne A. G.. On-the-fly autonomous control of neutron diffraction via physics-informed Bayesian active learning. Applied Physics Reviews. 2022;9:021408. doi: 10.1063/5.0082956. [DOI] [Google Scholar]

[ref22] Kalinin S. V., Mukherjee D., Roccapriore K., Blaiszik B. J., Ghosh A., Ziatdinov M. A., Al-Najjar A., Doty C., Akers S., Rao N. S.. et al. Machine learning for automated experimentation in scanning transmission electron microscopy. npj Computational Materials. 2023;9:227. doi: 10.1038/s41524-023-01142-0. [DOI] [Google Scholar]

[ref23] Pratiush U., Funakubo H., Vasudevan R., Kalinin S. V., Liu Y.. Scientific exploration with expert knowledge (SEEK) in autonomous scanning probe microscopy with active learning. Digital Discovery. 2025;4:252. doi: 10.1039/D4DD00277F. [DOI] [Google Scholar]

[ref24] Bommasani R.. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs.LG] 2021:na. doi: 10.48550/arXiv.2108.07258. [DOI] [Google Scholar]

[ref25] Minaee S., Mikolov T., Nikzad N., Chenaghlu M., Socher R., Amatriain X., Gao J.. Large Language Models: A Survey. arXiv:2402.06196 [cs.CL] 2024:na. doi: 10.48550/arXiv.2402.06196. [DOI] [Google Scholar]

[ref26] Liu Y.. Understanding LLMs: A Comprehensive Overview from Training to Inference. arXiv:2401.02038 [cs.CL] 2024:na. doi: 10.48550/arXiv.2401.02038. [DOI] [Google Scholar]

[ref27] Lu C., Lu C., Lange R. T., Foerster J., Clune J., Ha D.. The AI scientist: Towards fully automated open-ended scientific discovery. arXiv:2408.06292 [cs.AI] 2024:na. doi: 10.48550/arXiv.2408.06292. [DOI] [Google Scholar]

[ref28] Yamada Y., Lange R. T., Lu C., Hu S., Lu C., Foerster J., Clune J., Ha D.. The AI scientist-v2: Workshop-level automated scientific discovery via agentic tree search. arXiv:2504.08066 [cs.AI] 2025:na. doi: 10.48550/arXiv.2504.08066. [DOI] [Google Scholar]

[ref29] Luo Z., Yang Z., Xu Z., Yang W., Du X.. LLM4SR: A Survey on Large Language Models for Scientific Research. arXiv:2501.04306 [cs.CL] 2025:na. doi: 10.48550/arXiv.2501.04306. [DOI] [Google Scholar]

[ref30] Gottweis J., Weng W.-H., Daryin A., Tu T., Palepu A., Sirkovic P., Myaskovsky A., Weissenberger F., Rong K., Tanno R.. Towards an AI co-scientist. arXiv:2502.18864 [cs.AI] 2025:na. doi: 10.48550/arXiv.2502.18864. [DOI] [Google Scholar]

[ref31] Zheng T., Deng Z., Tsang H. T., Wang W., Bai J., Wang Z., Song Y.. From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery. arXiv:2505.13259 [cs.CL] 2025:na. doi: 10.48550/arXiv.2505.13259. [DOI] [Google Scholar]

[ref32] Strachan J. W. A., Albergo D., Borghini G., Pansardi O., Scaliti E., Gupta S., Saxena K., Rufo A., Panzeri S., Manzi G., Graziano M. S. A., Becchio C.. Testing theory of mind in large language models and humans. Nature Human Behaviour. 2024;8:1285–1295. doi: 10.1038/s41562-024-01882-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref33] Street W., Siy J. O., Keeling G., Baranes A., Barnett B., McKibben M., Kanyere T., Lentz A., y Arcas B. A., Dunbar R. I. M.. LLMs achieve adult human performance on higher-order theory of mind tasks. arXiv:2405.18870 [cs.AI] 2024:na. doi: 10.48550/arXiv.2405.18870. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref34] Gao Y., Xiong Y., Gao X., Jia K., Pan J., Bi Y., Dai Y., Sun J., Wang M., Wang H.. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997 [cs.CL] 2024:na. doi: 10.48550/arXiv.2312.10997. [DOI] [Google Scholar]

[ref35] Yu H., Gan A., Zhang K., Tong S., Liu Q., Liu Z.. Evaluation of Retrieval-Augmented Generation: A Survey. arXiv:2405.07437. 2024:na. doi: 10.48550/arXiv.2405.07437. [DOI] [Google Scholar]

[ref36] Narayanan S. M., Braza J. D., Griffiths R.-R., Bou A., Wellawatte G., Ramos M. C., Mitchener L., Rodriques S. G., White A. D.. Training a Scientific Reasoning Model for Chemistry. arXiv:2506.17238. 2025:na. doi: 10.48550/arXiv.2506.17238. [DOI] [Google Scholar]

[ref37] Wei J., Wang X., Schuurmans D., Bosma M., Ichter B., Xia F., Chi E., Le Q., Zhou D.. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903. 2023:na. doi: 10.48550/arXiv.2201.11903. [DOI] [Google Scholar]

[ref38] Xu J., Fei H., Pan L., Liu Q., Lee M.-L., Hsu W.. Faithful Logical Reasoning via Symbolic Chain-of-Thought. arXiv:2405.18357. 2024:na. doi: 10.48550/arXiv.2405.18357. [DOI] [Google Scholar]

[ref39] Pfau, J.; Merrill, W.; Bowman, S. R. Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models. 2024; https://arxiv.org/abs/2404.15758.

[ref40] Zhou Y., Liu H., Srivastava T., Mei H., Tan C.. Hypothesis generation with large language models. arXiv:2404.04326. 2024:na. doi: 10.48550/arXiv.2404.04326. [DOI] [Google Scholar]

[ref41] Ghafarollahi A., Buehler M. J.. SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning. Advanced Materials. 2025;37:2413523. doi: 10.1002/adma.202413523. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref42] Guo T., Chen X., Wang Y., Chang R., Pei S., Chawla N. V., Wiest O., Zhang X.. Large Language Model based Multi-Agents: A Survey of Progress and Challenges. arXiv:2402.01680. 2024:na. doi: 10.48550/arXiv.2402.01680. [DOI] [Google Scholar]

[ref43] Dong Y., Jiang X., Jin Z., Li G.. Self-collaboration Code Generation via ChatGPT. arXiv:2304.07590. 2024:na. doi: 10.48550/arXiv.2304.07590. [DOI] [Google Scholar]

[ref44] Khan A., Hughes J., Valentine D., Ruis L., Sachan K., Radhakrishnan A., Grefenstette E., Bowman S. R., Rocktäschel T., Perez E.. Debating with More Persuasive LLMs Leads to More Truthful Answers. arXiv:2402.06782. 2024:na. doi: 10.48550/arXiv.2402.06782. [DOI] [Google Scholar]; We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information but are otherwise as capable. The method we evaluate is debate, where two LLM experts each argue for a different answer, and a non-expert selects the answer.

[ref45] Abdelnabi S., Gomaa A., Sivaprasad S., Schönherr L., Fritz M.. LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games. arXiv:2309.17234. 2023:na. doi: 10.48550/arXiv.2309.17234. [DOI] [Google Scholar]

[ref46] Parmar M., Liu X., Goyal P., Chen Y., Le L., Mishra S., Mobahi H., Gu J., Wang Z., Nakhost H.. PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving. arXiv:2502.16111. 2025:na. doi: 10.48550/arXiv.2502.16111. [DOI] [Google Scholar]

[ref47] Gosmar D., Dahl D. A.. Hallucination mitigation using agentic AI natural language-based frameworks. arXiv:2501.13946. 2025:na. doi: 10.48550/arXiv.2501.13946. [DOI] [Google Scholar]

[ref48] AL A., Ahn A., Becker N., Carroll S., Christie N., Cortes M., Demirci A., Du M., Li F., Luo S.. Project Sid: Many-agent simulations toward AI civilization. arXiv:2411.00114. 2024:na. doi: 10.48550/arXiv.2411.00114. [DOI] [Google Scholar]

[ref49] Schmidgall S., Su Y., Wang Z., Sun X., Wu J., Yu X., Liu J., Liu Z., Barsoum E.. Agent laboratory: Using LLM agents as research assistants. arXiv:2501.04227. 2025:na. doi: 10.48550/arXiv.2501.04227. [DOI] [Google Scholar]

[ref50] Zhou H., Wan X., Sun R., Palangi H., Iqbal S., Vulić I., Korhonen A., Arık S. Ö.. Multi-agent design: Optimizing agents with better prompts and topologies. arXiv:2502.02533. 2025:na. doi: 10.48550/arXiv.2502.02533. [DOI] [Google Scholar]

[ref51] Zimmermann Y., Bazgir A., Al-Feghali A., Ansari M., Brinson L. C., Chiang Y., Circi D., Chiu M.-H., Daelman N., Evans M. L.. 34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery. arXiv:2505.03049. 2025:na. doi: 10.48550/arXiv.2505.03049. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref52] Jiang X., Wang W., Tian S., Wang H., Lookman T., Su Y.. Applications of natural language processing and large language models in materials discovery. npj Computational Materials. 2025;11:79. doi: 10.1038/s41524-025-01554-0. [DOI] [Google Scholar]; This review explores the application of NLP tools in materials science, focusing on automatic data extraction, materials discovery, and autonomous research. We also discuss the challenges and opportunities associated with utilizing LLMs and outline the prospects and advancements that will propel the field forward.

[ref53] Kong L., Shoghi N., Hu G., Li P., Fung V.. MatterTune: An Integrated, User-Friendly Platform for Fine-Tuning Atomistic Foundation Models to Accelerate Materials Simulation and Discovery. arXiv:2504.10655. 2025:na. doi: 10.48550/arXiv.2504.10655. [DOI] [Google Scholar]; MatterTune offers a flexible, generalizable framework that seamlessly integrates multiple atomistic FMs and supports tasks such as molecular dynamics simulations, materials property predictions, and materials discovery. MatterTune offers users a wide range of choices in data formats, model architectures, and training configurations. As a consequence of the modular design of MatterTune, users can freely mix and match these components according to their performance needs and specific requirements. Our experimental results to replicate the models’ original reported performance on ambient water systems and JMP’s Matbench experiments demonstrate that this unified approach to integrating atomistic FMs is both feasible and does not compromise model performance.

[ref54] Choudhary K.. DiffractGPT: Atomic Structure Determination from X-ray Diffraction Patterns Using a Generative Pretrained Transformer. J. Phys. Chem. Lett. 2025;16:2110–2119. doi: 10.1021/acs.jpclett.4c03137. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref55] Zeni C., Pinsler R., Zügner D., Fowler A., Horton M., Fu X., Wang Z., Shysheya A., Crabbé J., Ueda S.. et al. A generative model for inorganic materials design. Nature. 2025;639:624–632. doi: 10.1038/s41586-025-08628-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref56] Jia S., Zhang C., Fung V.. LLMatDesign: Autonomous Materials Discovery with Large Language Models. arXiv:2406.13163. 2024:na. doi: 10.48550/arXiv.2406.13163. [DOI] [Google Scholar]

[ref57] Choudhary K.. AtomGPT: Atomistic generative pretrained transformer for forward and inverse materials design. J. Phys. Chem. Lett. 2024;15:6909–6917. doi: 10.1021/acs.jpclett.4c01126. [DOI] [PubMed] [Google Scholar]

[ref58] Ghafarollahi A., Buehler M. J.. Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems. arXiv:2410.13768 [cond-mat.mtrl-sci] 2024:na. doi: 10.48550/arXiv.2410.13768. [DOI] [Google Scholar]

[ref59] Jablonka K. M., Schwaller P., Ortega-Guerrero A., Smit B.. Leveraging large language models for predictive chemistry. Nature Machine Intelligence. 2024;6:161–169. doi: 10.1038/s42256-023-00788-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref60] Okabe R., West Z., Chotrattanapituk A., Cheng M., Carrizales D. C., Xie W., Cava R. J., Li M.. Large Language Model-Guided Prediction Toward Quantum Materials Synthesis. arXiv:2410.20976 [cond-mat.mtrl-sci] 2024:na. doi: 10.48550/arXiv.2410.20976. [DOI] [Google Scholar]

[ref61] Reinhart W. F., Statt A.. Large language models design sequence-defined macromolecules via evolutionary optimization. npj Computational Materials. 2024;10:262. doi: 10.1038/s41524-024-01449-6. [DOI] [Google Scholar]

[ref62] Kitchin J. R.. The evolving role of programming and LLMs in the development of self-driving laboratories. APL Machine Learning. 2025;3:na. doi: 10.1063/5.0266757. [DOI] [Google Scholar]

[ref63] Vriza, A.; Prince, M.; Chan, H.; Zhou, T. Operating Robotic Laboratories with Large Language Models and Teachable Agents. AI for Accelerated Materials Design-ICLR, Apr 24–28, 2025, Singapore, ICLR, 2025, conference paper. [Google Scholar]

[ref64] Mathur S., van der Vleuten N., Yager K. G., Tsai E.. VISION: A modular AI assistant for natural human-instrument interaction at scientific user facilities. Machine Learning: Science and Technology. 2025;6:025051. doi: 10.1088/2632-2153/add9e4. [DOI] [Google Scholar]

[ref65] Yin X., Shi C., Deng J., Han Y., Jiang Y.. PEAR: A Knowledge-guided Autonomous Pipeline for Ptychography Enabled by Large Language Models. Microscopy and Microanalysis. 2024;30:ozae044.184. doi: 10.1093/mam/ozae044.184. [DOI] [Google Scholar]

[ref66] Liu Y., Checa M., Vasudevan R. K.. Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design. Machine Learning: Science and Technology. 2024;5:02LT01. doi: 10.1088/2632-2153/ad52e9. [DOI] [Google Scholar]

[ref67] Potemkin D., Soto C., Li R., Yager K., Tsai E.. Virtual Scientific Companion for Synchrotron Beamlines: A Prototype. arXiv:2312.17180 [cs.CL] 2023:na. doi: 10.48550/arXiv.2312.17180. [DOI] [Google Scholar]

[ref68] Boiko D. A., MacKnight R., Kline B., Gomes G.. Autonomous chemical research with large language models. Nature. 2023;624:570–578. doi: 10.1038/s41586-023-06792-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref69] Prince M. H., Chan H., Vriza A., Zhou T., Sastry V. K., Luo Y., Dearing M. T., Harder R. J., Vasudevan R. K., Cherukara M. J.. Opportunities for retrieval and tool augmented large language models in scientific facilities. npj Computational Materials. 2024;10:251. doi: 10.1038/s41524-024-01423-2. [DOI] [Google Scholar]

[ref70] Choudhary K.. MicroscopyGPT: Generating Atomic-Structure Captions from Microscopy Images of 2D Materials with Vision-Language Transformers. J. Phys. Chem. Lett. 2025;16(27):7028–7035. doi: 10.1021/acs.jpclett.5c01257. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref71] Rubungo A. N., Li K., Hattrick-Simpers J., Dieng A. B.. LLM4Mat-bench: benchmarking large language models for materials property prediction. Machine Learning: Science and Technology. 2025;6:020501. doi: 10.1088/2632-2153/add3bb. [DOI] [Google Scholar]

[ref72] Cheung J. J., Shen S., Zhuang Y., Li Y., Ramprasad R., Zhang C.. MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge. arXiv:2505.23982 [cs.AI] 2025:na. doi: 10.48550/arXiv.2505.23982. [DOI] [Google Scholar]

[ref73] Mirza A., Alampara N., Kunchapu S., Ríos-García M., Emoekabu B., Krishnan A., Gupta T., Schilling-Wilhelmi M., Okereke M., Aneesh A.. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 2025;17:1027–1034. doi: 10.1038/s41557-025-01815-x. [DOI] [PMC free article] [PubMed] [Google Scholar]; Here we introduce ChemBench, an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. We curated more than 2700 question/answer pairs, evaluated leading open- and closed-source LLMs and found that the best models, on average, outperformed the best human chemists in our study. However, the models struggle with some basic tasks and provide overconfident predictions.

[ref74] Zaki M., Krishnan N. A.. et al. MaScQA: investigating materials science knowledge of large language models. Digital Discovery. 2024;3:313–327. doi: 10.1039/D3DD00188A. [DOI] [Google Scholar]

[ref75] Yao L., Samantray S., Ghosh A., Roccapriore K., Kovarik L., Allec S., Ziatdinov M.. Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop. arXiv:2508.06569 [cs.AI] 2025:na. doi: 10.48550/arXiv.2508.06569. [DOI] [Google Scholar]

[ref76] Yager K. G.. Towards a Science Exocortex. Digital Discovery. 2024;3:1933–1957. doi: 10.1039/D4DD00178H. [DOI] [Google Scholar]

[ref77] Yager K. G.. Domain-specific chatbots for science using embeddings. Digital Discovery. 2023;2:1850–1861. doi: 10.1039/D3DD00112A. [DOI] [Google Scholar]

[ref78] Liu B.. Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems. arXiv:2504.01990 [cs.AI] 2025:na. doi: 10.48550/arXiv.2504.01990. [DOI] [Google Scholar]

[ref79] Ke Z., Jiao F., Ming Y., Nguyen X.-P., Xu A., Long D. X., Li M., Qin C., Wang P., Savarese S., Xiong C., Joty S.. A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems. arXiv:2504.09037 [cs.AI] 2025:na. doi: 10.48550/arXiv.2504.09037. [DOI] [Google Scholar]

[ref80] Eger S., Cao Y., D’Souza J., Geiger A., Greisinger C., Gross S., Hou Y., Krenn B., Lauscher A., Li Y., Lin C., Moosavi N. S., Zhao W., Miller T.. Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation. arXiv:2502.05151 [cs.CL] 2025:na. doi: 10.48550/arXiv.2502.05151. [DOI] [Google Scholar]

[ref81] Willmott, P. An Introduction to Synchrotron Radiation: Techniques and Applications; John Wiley & Sons, 2019. [Google Scholar]

[ref82] Kwa, T. et al. Measuring AI Ability to Complete Long Tasks. 2025; https://arxiv.org/abs/2503.14499.

[ref83] Pilz K. F., Sanders J., Rahman R., Heim L.. Trends in AI Supercomputers. arXiv:2504.16026 [cs.CY] 2025:na. doi: 10.48550/arXiv.2504.16026. [DOI] [Google Scholar]

[ref84] Morris M. R., Sohl-dickstein J., Fiedel N., Warkentin T., Dafoe A., Faust A., Farabet C., Legg S.. Levels of AGI for Operationalizing Progress on the Path to AGI. arXiv:2311.02462 [cs.AI] 2024:na. doi: 10.48550/arXiv.2311.02462. [DOI] [Google Scholar]

[ref85] Xu B.. What is Meant by AGI? On the Definition of Artificial General Intelligence. arXiv:2404.10731 [cs.AI] 2024:na. doi: 10.48550/arXiv.2404.10731. [DOI] [Google Scholar]

[ref86] Raman R., Kowalski R., Achuthan K., Iyer A., Nedungadi P.. Navigating artificial general intelligence development: societal, technological, ethical, and brain-inspired pathways. Sci. Rep. 2025;15:1–22. doi: 10.1038/s41598-025-92190-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref87] Shao Y., Zope H., Jiang Y., Pei J., Nguyen D., Brynjolfsson E., Yang D.. Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce. arXiv:2506.06576 [cs.CY] 2025:na. doi: 10.48550/arXiv.2506.06576. [DOI] [Google Scholar]

[ref88] Character.ai. https://character.ai/, 2025. (Accessed: 2025-08-15).

[ref89] Meta AI Studio: Build and interact with AI characters across our platforms. Meta Platforms, Inc., 2024; https://about.fb.com/news/2024/04/meta-ai-studio/ (Accessed: 2025-08-15).

[ref90] Significant Gravitas. AutoGPT: An experimental open-source attempt to make GPT-4 fully autonomous. https://github.com/Torantulino/Auto-GPT, 2023. (Accessed: 2025-08-15).

[ref91] Nakajima, Y. BabyAGI: A simple task-driven autonomous agent. https://github.com/yoheinakajima/babyagi, 2023. (Accessed: 2025-08-15).

[ref92] AgentGPT: Assemble, configure, and deploy autonomous AI Agents in your browser. https://github.com/reworkd/AgentGPT. GitHub repository, licensed under GPL-3.0, 2023.

[ref93] Zapier. Zapier Copilot: Build Zaps even faster with AI. https://zapier.com/blog/zapier-copilot-guide/, 2024. (Accessed: 2025-08-15).

[ref94] Microsoft Power Platform Documentation Copilot Studio: Build AI-powered solutions with low-code. https://learn.microsoft.com/en-us/power-platform/architecture/products/copilot-studio, 2025. (Accessed: 2025-08-15).

[ref95] Hong S.. MetaGPT: Meta Programming for Multi-Agent Collaborative Framework. arXiv:2308.00352 [cs.AI] 2023:na. doi: 10.48550/arXiv.2308.00352. [DOI] [Google Scholar]

[ref96] Li G., Hammoud H. A. A. K., Itani H., Khizbullin D., Ghanem B.. CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. arXiv:2303.17760 [cs.AI] 2023:na. doi: 10.48550/arXiv.2303.17760. [DOI] [Google Scholar]

[ref97] CrewAI Inc. CrewAI GitHub Repository, 2024; https://github.com/crewAIInc/crewAI (Accessed: 2025-09-08).

[ref98] TransformerOptimus. SuperAGI: A dev-first open source autonomous AI agent framework. GitHub repository, 2025; https://github.com/TransformerOptimus/SuperAGI.

[ref99] Google Cloud. Agent-to-Agent (A2A) Protocol: A new standard for AI agent interoperability. https://cloud.google.com/blog/products/ai-machine-learning/agent2agent-protocol-is-getting-an-upgrade, 2024. (Accessed: 2025-08-15).

[ref100] Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F. L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 Technical Report; OpenAI, 2023.

[ref101] OpenAI. GPT-5 System Card. 2025; https://openai.com/index/introducing-gpt-5/ (Accessed: 2025-08-20).

[ref102] Carbone M. R., Topsakal M., Lu D., Yoo S.. Machine-learning X-ray absorption spectra to quantitative accuracy. Physical Review Letters. 2020;124:156401. doi: 10.1103/PhysRevLett.124.156401. [DOI] [PubMed] [Google Scholar]

[ref103] Meng F., Maurer B., Peschel F., Selcuk S., Hybertsen M., Qu X., Vorwerk C., Draxl C., Vinson J., Lu D.. Multicode benchmark on simulated Ti K-edge x-ray absorption spectra of Ti-O compounds. Physical Review Materials. 2024;8:013801. doi: 10.1103/PhysRevMaterials.8.013801. [DOI] [Google Scholar]

[ref104] Allan D., Caswell T., Campbell S., Rakitin M.. Bluesky’s Ahead: A Multi-Facility Collaboration for an a la Carte Software Project for Data Acquisition and Management. Synchrotron Radiation News. 2019;32:19–22. doi: 10.1080/08940886.2019.1608121. [DOI] [Google Scholar]

[ref105] Binz M., Akata E., Bethge M., Brändle F., Callaway F., Coda-Forno J., Dayan P., Demircan C., Eckstein M. K., Éltető N.. et al. A foundation model to predict and capture human cognition. Nature. 2025;644:1002–1009. doi: 10.1038/s41586-025-09215-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref106] Abdurahman S., Atari M., Karimi-Malekabadi F., Xue M. J., Trager J., Park P. S., Golazizian P., Omrani A., Dehghani M.. Perils and opportunities in using large language models in psychological research. PNAS Nexus. 2024;3:245. doi: 10.1093/pnasnexus/pgae245. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref107] Fernando C., Marcello H., Wlodek J., Sinsheimer J., Olds D., Campbell S. I., Maffettone P. M.. Robotic integration for end-stations at scientific user facilities. Digital Discovery. 2025;4:1083–1091. doi: 10.1039/D5DD00036J. [DOI] [Google Scholar]

[ref108] Bertrand Q., Bose A. J., Duplessis A., Jiralerspong M., Gidel G.. On the Stability of Iterative Retraining of Generative Models on their own Data. arXiv:2310.00429 [cs.LG] 2024:na. doi: 10.48550/arXiv.2310.00429. [DOI] [Google Scholar]

[ref109] Shumailov I., Shumaylov Z., Zhao Y., Papernot N., Anderson R., Gal Y.. AI models collapse when trained on recursively generated data. Nature. 2024;631:755–759. doi: 10.1038/s41586-024-07566-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref110] Srinidhi S., Lu E., Rowe A.. XaiR: An XR Platform that Integrates Large Language Models with the Physical World. 2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2024:759–767. [Google Scholar]

[ref111] Wang Z., Rao M., Ye S., Song W., Lu F.. Towards spatial computing: recent advances in multimodal natural interaction for XR headsets. Front. Comput. Sci. 2025;19:1912708. doi: 10.1007/s11704-025-41123-8. [DOI] [Google Scholar]
