Abstract
Intelligent decision-making (IDM) is a cornerstone of artificial intelligence (AI) designed to automate or augment decision processes. Modern IDM paradigms, such as AI agents and high-level reinforcement learning, integrate advanced frameworks that enable intelligent agents to make effective, adaptive choices and to decompose complex tasks into manageable steps. Recent advances in multimodal foundation-based approaches unify diverse input modalities—such as vision, language, and sensory data—into a cohesive decision-making process. Foundation models (FMs) have become pivotal in science and industry, transforming decision-making and research capabilities. Their large-scale, multimodal data-processing abilities foster adaptability and interdisciplinary breakthroughs across fields such as healthcare, life sciences, and education. This survey examines IDM’s evolution and advanced paradigms with FMs, along with their transformative impact on decision-making across diverse scientific and industrial domains, highlighting the challenges and opportunities in building efficient, adaptive, and ethical decision systems.
Keywords: artificial intelligence, intelligent decision-making, foundation models, agent, large language model
Graphical abstract

Public summary
•
Decision intelligence evolved from rule-based to AI-driven, enabling adaptive, context-aware choices.
•
Foundation models unify knowledge to enable scalable, adaptive decision-making in healthcare and other fields.
•
The decision-making foundation model’s progress hinges on security, privacy, and human-AI ethics.
Introduction
Decision theory has evolved over centuries, progressing from early concepts of probability theory and expected value to more sophisticated models incorporating psychological factors. In the 1940s, Von Neumann and Morgenstern’s work on expected utility provided the mathematical foundations and a conceptual framework for decision-making.1 Herbert Simon’s Administrative Behavior was a landmark work that significantly shaped modern decision theory, emphasizing the cognitive aspects of decision-making.2,3 Later, Daniel Kahneman and Amos Tversky proposed prospect theory and the concept of two systems of thinking, providing a more accurate understanding of decision-making.4 In summary, decision-making is a complex process of problem solving: not a single event but a process involving a series of steps. One widely accepted multistep decision-making framework is OODA,5 short for the observe, orient, decide, and act phases executed in a loop. Intuitively, OODA describes the process of collecting or rendering sensory data, extracting informative evidence, executing logical reasoning, and determining the optimal action in either a sequential or nonsequential manner. Intelligent decision-making (IDM) represents a cornerstone of artificial intelligence (AI) whose purpose is to substitute for some OODA phases or to assist human capabilities in evaluating options, making choices, or manipulating outcomes through active intervention in complex and dynamic environments.6 It is an interdisciplinary field spanning computer science, psychology and cognitive science, economics and game theory, operations research, control theory, and statistics. Distinguished from traditional decision-making processes, the implementation of IDM relies on a collection of models, optimization algorithms, and probabilistic inference tools to automate the process, and it enjoys lasting popularity in robotics,7 finance,8 healthcare,9 and other industrial applications.
Overall, this survey aims to facilitate the application of IDM across different fields. To this end, we introduce IDM concepts, summarize the techniques IDM requires, recap recent advances in typical IDM paradigms, and present some promising foundation model (FM)-based IDM approaches.
Elements of decision-making
IDM is composed of four elements: (1) the environment for agents to engage with and collect observations and feedback from; (2) the agents or decision-makers that execute plans and strategies within a feasible action space; (3) the rewards or utility function that specifies the objective or goal; and (4) intelligent tools such as heuristic rule construction, learning or mining models, and other optimization strategies. As the environment is sometimes interactive and involves the state transitions of a dynamical system, the process constitutes a Markov decision process (MDP)10 when the system transition depends only on the state observed at the last time step; otherwise, the environment is non-Markovian. On the other hand, based on the observability of the system state, decision processes can be further split into completely observed and partially observed ones. For example, a robotic arm may be unable to sense some of its joint positions directly but can infer its exact state from camera images, and the inherent decision-making environment can be abstracted as a partially observed MDP (POMDP).11 We can further define the nonstationarity of environments when the underlying system dynamics and statistical traits change over time. In terms of intelligent tools, several have emerged over the last century, including rule construction, heuristic search strategies, machine-learning methods, and the employment of FMs.
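The four elements above can be sketched in a toy Markov environment; the corridor dynamics, action names, and reward below are illustrative only and not drawn from any benchmark:

```python
class GridMDP:
    """A 1-D corridor: states 0..4, goal at state 4.

    The environment (element 1) exposes a step interface, the agent
    (element 2) chooses from the action space {"left", "right"}, and the
    reward (element 3) specifies the goal of reaching the last state.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def step(self, action):
        # Markov property: the next state depends only on the current
        # state and action, not on any earlier history.
        if action == "right":
            self.state = min(self.state + 1, self.n_states - 1)
        elif action == "left":
            self.state = max(self.state - 1, 0)
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        done = self.state == self.n_states - 1
        return self.state, reward, done

env = GridMDP()
total, done = 0.0, False
while not done:
    _, r, done = env.step("right")  # a trivial "always move right" policy
    total += r
print(total)  # the goal is reached once, so the return is 1.0
```

A POMDP variant would return only a noisy or partial view of `self.state` (e.g., a camera image) instead of the state itself.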
Concept of FMs
The literature generally refers to large-scale, pretrained machine-learning models intended for general-purpose use across numerous downstream tasks as FMs. Typical models, such as BERT,12 GPT,13 and contrastive language-image pretraining (CLIP),14 are trained on massive language, vision, audio, or multimodal datasets to capture informative patterns and extract generalizable representations of examples. Developing FMs that handle diverse tasks depends on the integration of several learning paradigms, such as self-supervised learning,15,16 meta-learning,17,18,19 and multitask learning.20,21,22
When decision-making meets FMs
The advent of FMs allows for rapid adaptation to specific applications through fine-tuning,23,24 thereby circumventing the need to learn from scratch during deployment. Considering the computation- and data-expensive components in traditional decision-making paradigms, e.g., vanilla reinforcement learning (RL), it is necessary to revolutionize decision-making with the help of the technologies used to develop FMs and to seek paths to benefit from them.
Also, it is important to emphasize that this work distinguishes itself from other surveys on FMs. The primary focus here is to outline trends in IDM and explore the potential of leveraging recent advancements in FMs to facilitate the development of IDM models across a wider array of application scenarios.
Decision-making technologies genealogy
Decision-making techniques can be divided into traditional decision-making techniques and IDM techniques. Conventional decision-making generally depends on the experience and intuition of human experts, while IDM is driven by algorithms and data. IDM addresses the combinatorial explosion that traditional decision-making faces in large-scale state-action spaces, as well as the poor cross-domain generalization of traditional decision-making algorithms. Conventional decision-making techniques include game-theory-based,25 heuristic-optimization-based,26 and knowledge-based27 decision-making techniques. IDM encompasses techniques based on deep reinforcement learning (DRL), large language models (LLMs), and other large foundation models.28 Traditional techniques are highly effective for simple, linear decision problems but remain limited when facing complex, multidimensional, nonlinear decision spaces. IDM based on large models exhibits excellent decision-making ability in high-dimensional, complex, nonlinear state-action spaces. The development stages of IDM are shown in Figure 1.
Figure 1.
The development history of intelligent decision-making
Rule-based decision support systems achieve decision support through a rule base and a fact base and are suitable for scenarios driven by clear rules. Data-driven decision support systems, which combine technologies such as neural networks and decision trees, have been applied in projects like Deep Blue and AlphaGo and have surpassed humans in multiple fields. Decision-making with foundation models, an emerging data-driven technology, achieves decision optimization by utilizing large models (such as the GPT series and LLaMA) through steps like demonstration data collection, data annotation, and reward model training, with application cases including OpenVLA, RoboGen, and others.
IDM with expert rules
Throughout the history of intelligent decision systems (IDSs), the decision support system (DSS) has played a pivotal role in shaping both academia and industry. The primary aim of establishing the DSS is to replicate the decision-making patterns of domain experts and to execute judgments through automated programs.29 In the early stages, structured datasets were scarce and challenging to obtain, leading to the use of a collection of rules and common sense as the knowledge base to identify scenarios and apply reasoning for decision-making. This approach enables fast processing of decision queries with a good level of explainability; however, it is limited to specific domains, relies heavily on costly expert knowledge, and struggles to address cases beyond the established knowledge base.
As the number of learnable episodes increases within the database, machine-learning and data-mining tools gain prominence in system development. They facilitate the creation of data-driven models that capture meaningful patterns, thereby enhancing decision-making processes. Effective algorithms and learning models enhance the advantages of data-driven DSSs by predicting outcomes and trends under various policies while also enabling the automatic discovery of knowledge. One significant approach within this realm is DRL,10 which involves interacting with environments, collecting reward signals, and assigning credit to actions during sequential decision-making. DRL has achieved impressive successes in areas such as real-time strategy games,30,31 drone racing,32 and the game of Go.33 To enhance sample efficiency, a notable paradigm known as offline RL has emerged, which learns from a static, large transition dataset.34,35 While these data-driven DSSs improve generalization capabilities as data quantity and quality increase and reduce reliance on meticulously crafted expert knowledge, they remain largely static and are only effective in a limited range of complex scenarios, ultimately falling short of achieving true plug-and-play functionality in practice.
IDM with shallow and deep learning methods
The conventional decision-making techniques include game-theory-based methods, heuristic-optimization-based methods, and knowledge-based techniques, each designed to address different problems; the appropriate algorithm is selected according to the specific decision problem. Game settings between intelligent agents include the prisoner’s dilemma and the gambler’s game, with the Nash equilibrium characterizing stable strategy profiles. To maximize each agent’s cumulative benefit and return, IDM driven by agent payoffs is applied under the rules of the game.
Heuristic optimization algorithms include the genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO), tabu search (TS), and simulated annealing (SA), each designed for different classes of optimization problems in specific fields. GA, PSO, and ACO are population-based optimization algorithms, while TS and SA are single-solution optimization algorithms. GA and ACO are typically applied to discrete optimization, while the other three algorithms can handle both discrete and continuous optimization. The core of GA is to preserve better solutions in the selection stage while generating new solutions in the crossover and mutation stages. The core of PSO is to exploit both the local and global optimal solutions during the solution-update process. The core of ACO is to update and optimize paths through the deposition and evaporation of pheromones among ants. The core of TS is to use tabu lists to avoid revisiting solutions, extending local neighborhood search toward global neighborhood search. The core of SA is to simulate the annealing of solids, gradually approaching the ground state through neighborhood search. GA is applied to optimization design, machine learning (parameter optimization), and image processing. PSO is applied to neural network weight optimization, wireless sensor network node deployment, machine-learning parameter optimization, image processing, and intelligent control. ACO is applied to the traveling salesman problem (finding the shortest path), resource scheduling, and network optimization. TS is applied to combinatorial optimization problems, model parameter tuning, and communication network topology optimization. SA is applied to the traveling salesman problem and parameter optimization.
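As an illustration of the neighborhood-search-plus-cooling principle behind SA, the sketch below minimizes a simple one-dimensional objective; the quadratic objective, step size, and cooling schedule are illustrative choices, not prescribed by the algorithm:

```python
import math
import random

def simulated_annealing(objective, x0, temp=10.0, cooling=0.95, steps=500):
    """Minimize `objective`, accepting worse neighbors with a
    temperature-dependent probability (the solid-annealing analogy)."""
    random.seed(0)  # deterministic run for illustration
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    for _ in range(steps):
        candidate = x + random.uniform(-1.0, 1.0)  # neighborhood search
        f_cand = objective(candidate)
        delta = f_cand - fx
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / temp), which shrinks as the temperature cools.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x, fx = candidate, f_cand
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling  # gradually approach the "ground state"
    return best_x, best_f

x, f = simulated_annealing(lambda v: (v - 3.0) ** 2, x0=0.0)
```

The cooling factor trades exploration for exploitation: early high temperatures let the search escape local minima, while late low temperatures behave like greedy local search.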
Knowledge-based decision-making methods include Bayesian inference and expert systems, which use prior knowledge or predefined rule bases to reason from environmental states to actions for specific problems. Traditional decision-making techniques are commonly used to handle simple, linear, low-dimensional decision problems, with the algorithm chosen to match the specific environment and application. However, when faced with complex decisions in high-dimensional, large-scale state spaces, traditional methods suffer from scale explosion and exponentially increasing complexity. IDM based on RL and large models performs well in such large-scale, high-dimensional state-action spaces, achieving greater rewards during action execution.
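A single Bayesian inference step of this state-to-action reasoning can be sketched as follows; the two system states, the alarm likelihoods, and the decision threshold are invented for illustration:

```python
def posterior(prior, likelihood):
    """P(state | obs) ∝ P(obs | state) * P(state), normalized over states."""
    unnorm = {s: prior[s] * likelihood[s] for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Prior belief over two hypothetical system states.
prior = {"fault": 0.1, "normal": 0.9}
# Likelihood of observing an alarm under each state.
likelihood_alarm = {"fault": 0.9, "normal": 0.2}

post = posterior(prior, likelihood_alarm)
# Action selection from the posterior belief (threshold is illustrative).
action = "inspect" if post["fault"] > 0.5 else "continue"
```

Here the alarm raises the fault probability from 0.1 to 1/3, which is still below the decision threshold, so the chosen action is to continue monitoring.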
The trend reveals the potential of AI technology to reshape decision-making frameworks: the source of decision-making priors has evolved from hard-coded expert or human skill to extraction from large-scale datasets. IDM technologies are divided into RL-based and large-model-based decision-making, where an FM denotes any model trained on a broad dataset (text, image, audio, and video) that can be applied to a wide range of downstream tasks.36 RL-based decision methods are generally used to select actions from states, while FM-based decisions can be used for sequence decision-making, group decision-making, and multimodal decision-making.
RL-based methods consist of value-based, value-distribution-based, policy-based, and actor-critic algorithms and have already been applied to Atari games and a variety of decision-making scenarios and state-action selection environments, improving decision-making performance across application environments.
The earliest RL method was tabular Q-learning; the DQN algorithm then replaced the table with a deep neural network, extending state-action learning to higher-dimensional state spaces and more complex representations. The Double DQN algorithm uses separate value and target networks to update the Q-value (the return, or accumulated reward), while Dueling DQN adopts a dueling architecture that separates the state value and the advantage value. Value-distribution-based RL methods include C51,37 QR-DQN, IQN, and FQF, which model the value distribution to improve decision-making ability. Policy-based RL methods output actions directly to generate the optimal action and include the deterministic policy gradient (DPG), DDPG, and proximal policy optimization (PPO). Furthermore, actor-critic RL algorithms include SAC, AC, A2C, and A3C, of which A3C uses asynchronous advantage actor-critic training to achieve more efficient decision-making and larger cumulative rewards.
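The tabular Q-learning update that the DQN family generalizes with neural networks can be sketched as follows; the two-state environment and the hyperparameters are illustrative:

```python
import random

def q_learning(transitions, n_states, n_actions, episodes=200,
               alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma*max_a' Q(s',a') - Q(s,a))."""
    random.seed(0)  # deterministic run for illustration
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy exploration
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s_next, r, done = transitions[(s, a)]
            target = r + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])  # temporal-difference update
            s = s_next
    return q

# A toy deterministic environment: from state 0, action 1 reaches the
# terminal goal (reward 1), while action 0 stays in state 0.
transitions = {
    (0, 0): (0, 0.0, False),
    (0, 1): (1, 1.0, True),
    (1, 0): (1, 0.0, True),
    (1, 1): (1, 0.0, True),
}
q = q_learning(transitions, n_states=2, n_actions=2)
```

After training, the learned Q-values rank the goal-reaching action above staying put; DQN replaces the `q` table with a network and samples transitions from a replay buffer, but the target computation is the same.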
Compared with RL-based decision-making methods, FM-based decision-making has stronger generalization ability and adaptability. An FM can act as an agent (planner, decision-maker, perceiver, or actor), an environment or designer, an encoder, a conditional generation module, or a human-machine interactor. Current FMs include the Transformer, BERT, T5, and GPT series, as well as LLaMA, PaLM, and others, and support FM-based decision-making, including sequence, group, and multimodal decision-making. Importantly, combining FMs with RL-based methods has emerged as an increasingly popular IDM paradigm.
Based on the advancements in single-agent RL, multi-agent reinforcement learning (MARL) extends RL methods to environments involving multiple agents that must interact through cooperation or competition. MARL introduces new challenges, including nonstationarity, credit assignment, and inter-agent communication, which are addressed through various training and execution paradigms.38 Decentralized training and decentralized execution (DTDE)39 enables agents to learn and act independently without requiring centralized coordination, making it suitable for fully distributed systems. Centralized training and decentralized execution (CTDE)40,41,42 allows agents to leverage centralized information during training for improved learning efficiency while maintaining decentralized policies during execution to ensure scalability and adaptability in real-world applications. Grouped training and decentralized execution (GTDE)43 combines these approaches by organizing agents into groups for intra-group coordination during training while ensuring decentralized execution across groups. These paradigms provide robust frameworks to tackle the complexities of multi-agent systems, enabling MARL to optimize decision-making across cooperative, competitive, and mixed settings.
IDM technology based on large models uses the large model’s input as the state and its output as the action to be executed, employing prompt engineering such as chain-of-thought, tree-of-thought, and graph-of-thought techniques to form a large-model-based decision-making process. Currently available large models include the Transformer, BERT, T5, and GPT series, as well as LLaMA, PaLM, and other large models, which can be used for sequence decision-making and group decision-making.
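A minimal sketch of this state-in, action-out mapping, in which a state description becomes the prompt and the model’s reply is parsed as the action; the template, action names, and hard-coded model reply are hypothetical, and the call to a real LLM is omitted:

```python
def build_cot_prompt(state_description, actions):
    """Assemble a chain-of-thought decision prompt: the environment state
    becomes the model input, and the model's answer becomes the action."""
    return (
        "You are a decision-making agent.\n"
        f"Current state: {state_description}\n"
        f"Available actions: {', '.join(actions)}\n"
        "Think step by step about the consequences of each action, "
        "then answer with exactly one action on the final line."
    )

def parse_action(model_reply, actions):
    """Take the last line of the model's reply as the chosen action."""
    last_line = model_reply.strip().splitlines()[-1].strip()
    return last_line if last_line in actions else None

actions = ["move_left", "move_right", "wait"]
prompt = build_cot_prompt("battery low, charger 2 m to the left", actions)
# A hypothetical model reply; a real system would send `prompt` to an LLM.
reply = "The battery is low and the charger is to the left.\nmove_left"
action = parse_action(reply, actions)
```

Tree-of-thought and graph-of-thought methods extend this pattern by sampling several such reasoning traces and selecting among them, rather than committing to a single chain.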
FM-based IDM
Armed with computational platforms, data are consolidated into model parameters as knowledge through optimization.
Realizing the significance of datasets, computational power, and model capacity,44 pioneering researchers have shifted attention to developing foundation decision-making models (FDMMs). Unlike traditional paradigms that develop intelligent models specific to particular decision-making scenarios, the original aim of developing the FDMM is to capture generalizable representations of scenarios, enable fast adaptation in a zero-shot or few-shot way,45,46 and dynamically evolve with open environments in decision-making. In this way, we can achieve computational efficiency at test time and seamlessly adapt to changing environments with minimal learning resources, which is an indispensable consideration in real-time control problems such as autonomous driving.
The primary aspect in which the FDMM differs from previous data-driven DSSs lies in learning paradigms oriented toward scenario distributions. To secure cross-scenario decision-making at lower computational or data cost, the intelligent decision-maker has to capture inherent structures from sequential datasets, e.g., predicting the next token as the optimal decision action through self-supervised learning,47 simultaneously handle multiple requests in a multitask manner,48 and sometimes learn to learn with a few examples as a guideline.49 Admittedly, we must recognize the ingredients of the FDMM’s general recipe above, i.e., sufficient scenarios, compact neural inference modules, and large-scale computation.
Overall, the necessity of developing the FDMM lies in the broad aim of scaling across a wide range of decision-making scenarios rather than specific ones. Importantly, the FDMM scales previous data-driven models from scenario-specific to scenario-versatile and compresses large-scale decision-making episodes into large-scale model parameters as the primary source of the prior, obtaining transferable representations for downstream tasks. Meanwhile, such decision-making priors have become comprehensive enough to constrain the search over policies.
Overview and development of FMs
Foundation models, such as LLMs and multimodal AI systems, have emerged as powerful tools for IDM due to their ability to process and synthesize vast amounts of data across diverse domains. These models, trained on extensive datasets, excel at identifying patterns, generating insights, and providing context-aware recommendations, which are critical for making informed decisions in complex and dynamic environments. By leveraging their generalized knowledge and adaptability, FMs can assist in tasks ranging from strategic planning and risk assessment to real-time problem-solving and personalized decision support. Their integration into decision-making frameworks enhances efficiency, accuracy, and scalability, enabling organizations and individuals to tackle challenges that were previously intractable. To better introduce decision intelligence based on FMs, we introduce the development of FMs in this section. Specifically, we first present the basics of FMs. We then detail the development of LLMs and multimodal FMs. Finally, we introduce the optimization of FMs.
Basics of FMs
An FM is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks.50 FMs are characterized by their massive scale and vast parameter count. They excel in transfer learning, easily adapting to new tasks.51 FMs exhibit emergent abilities, often demonstrating unexpected functionalities. These features enable FMs to exert transformative impacts across various industries, significantly advancing AI technology.50,52,53,54,55
The evolution of FMs has been intrinsically linked to advancements in IDM, beginning with word-embedding techniques such as Word2Vec56 and GloVe,57 which laid the groundwork for understanding semantic relationships in data. A pivotal moment came with the introduction of the Transformer architecture,58 which enabled more sophisticated decision-making through its self-attention mechanism, followed by BERT,59 which revolutionized natural language processing by enabling context-aware predictions. The field then saw rapid advancements with large-scale language models such as GPT-260 and GPT-3,61 which demonstrated unprecedented capabilities in language understanding and generation, empowering systems to make more informed and nuanced decisions. The scope expanded further with multimodal models such as DALL-E62 and CLIP,14 which integrated vision and language for richer decision-making contexts, while Swin Transformer63 applied similar principles to computer vision, enhancing visual reasoning and decision-making. Recent developments, exemplified by models like GPT-4,13 have continued to push the boundaries of scale and refinement, enabling more accurate, context-aware, and adaptive decision-making across diverse domains. This progression underscores a shift from specialized, unimodal models to increasingly powerful, versatile, and multimodal systems, fundamentally reshaping the landscape of AI and its capacity for IDM.64,65,66 Indeed, FMs have revolutionized academic research in fields such as natural language processing (NLP), computer vision (CV), and graph learning, enabling more sophisticated and data-driven decision-making processes that were previously unattainable.
Natural language processing
FMs first gained popularity in NLP. The FMs of NLP start with ELMo,67 which adopts a bidirectional long short-term memory (LSTM)68 network and learns contextualized word representations. Following the introduction of the Transformer, FMs in NLP have seen tremendous development, and various FMs have been proposed. They can generally be divided into two approaches: (1) autoregressive language FMs and (2) contextual language FMs. Autoregressive FMs are a class of models that generate text word by word, using previously generated words as context to predict the next word. A typical example is the GPT series.60,61,69 This approach excels in text-generation tasks, capable of producing coherent and creative text. An autoregressive language model utilizes context from only one direction, left or right, but not both simultaneously. In contrast, contextual language FMs focus on capturing the complex semantics and contextual information of language. Typical contextual language FMs include BERT,59 UniLM,70 and T5.71 They enhance performance in various natural language processing tasks, such as text classification, question answering, and sentiment analysis, through bidirectional encoder architectures or deep contextual representations. The advantage of these models lies in their ability to understand and generate language outputs that are more contextually relevant.
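Word-by-word generation from one-directional context can be sketched with a toy next-token table standing in for a trained model; the vocabulary and transition table are invented for illustration:

```python
import random

# Each next token is predicted from the previously generated token only
# (left-to-right context); a trained autoregressive LM would condition on
# the full generated prefix instead of this toy bigram table.
bigram = {
    "<s>": ["the"],
    "the": ["model", "text"],
    "model": ["generates"],
    "generates": ["text"],
    "text": ["</s>"],
}

def generate(max_len=10, seed=0):
    random.seed(seed)
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        candidates = bigram[tokens[-1]]          # condition on the prefix
        tokens.append(random.choice(candidates)) # sample the next token
    # Strip the start/end markers from the emitted sentence.
    return tokens[1:-1] if tokens[-1] == "</s>" else tokens[1:]

sentence = generate()
```

Contextual models such as BERT differ precisely here: during pretraining, they predict a masked token from both its left and right neighbors rather than generating left to right.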
Computer vision
The evolution of computer vision FMs starts with convolutional neural networks (CNNs), exemplified by models like ResNet.72 Through training on large amounts of image data, they can be used to extract image features and facilitate transfer learning.73,74 Inspired by the success of Transformers in NLP, some studies adopt Transformers as a new backbone. The Vision Transformer75 marks a significant shift: it processes images as sequences of patches, capturing long-range dependencies and enabling more nuanced feature extraction. Models like Swin Transformer63 further refine this approach. In another development, diffusion models76,77,78,79 have been proposed, which iteratively denoise a random Gaussian noise image through a series of learned transformations, progressively refining it to match a target distribution. This approach excels in image-generation tasks by producing high-quality, diverse outputs. Representative models include Stable Diffusion 3.0,80 DiT,81 and Sora,82 among others.78,83,84,85 The progression continued with multimodal models86,87,88 such as CLIP14 and MiniGPT-4,87 which align images and text in a shared space, enabling versatile zero-shot classification.
Graph learning
Graph learning aims to solve the problem of understanding and analyzing graph-structured data, focusing on tasks like node classification, link prediction, and graph classification.89,90,91,92,93,94,95 FMs in graph learning are also becoming an emerging research topic, aiming to create versatile models that can be applied across diverse graph-based tasks and domains. The graph learning FMs can be categorized into three types: graph neural network (GNN)-based models, LLM-based models, and GNN+LLM-based models.96,97 (1) GNN-based models focus on leveraging existing graph learning paradigms and improving graph learning by innovating in areas such as the basic GNN architecture, pretraining techniques, and task-specific adaptations.98,99,100 For example, GCC101 utilizes contrastive learning to pretrain graph node embeddings. All in One102 proposes a graph prompt to adopt GNN to various downstream tasks. (2) LLM-based models investigate the use of LLMs for graph tasks by transforming graph data into text or token formats, enabling the application of language model capabilities to graph problems. Among them, TextForGraph103 transforms graph data to textual data by a carefully designed prompt template and adopts LLMs to process the textual data. InstructGLM104 incorporates graph node tokens into the vocabulary of the LLM and utilizes instruction tuning to adapt the LLM to graph learning tasks. (3) GNN+LLM-based models aim to enhance graph learning by combining GNNs and LLMs and thus can leverage the strengths of both approaches. For instance, GraD105 utilizes LLMs to encode nodes’ textual attributes, after which classic GNNs are adopted to encode the graph.
Other fields
FMs are also making significant strides in other fields, such as time-series analysis, code generation, and speech processing.106,107,108,109 In time-series analysis, FMs aim to improve forecasting and anomaly detection by capturing complex temporal patterns.110,111,112,113 Several studies114,115,116 utilize the knowledge of LLMs and build FMs on top of them. Single-modal117,118,119 time-series FMs are trained solely on time-series data, while multimodal models120,121 have also been proposed to model information from both text and time-series modalities. In code generation, models like OpenAI’s Codex122 and Google’s AlphaCode123 are designed to understand and generate code snippets from natural language descriptions. These models assist in software development by automating coding tasks, thus enhancing productivity and reducing human errors.124,125 Speech processing focuses on tasks such as speech recognition, synthesis, and translation. Speech FMs generally involve both speech and text modalities. Early models like Speech-Transformer126 adapt the Transformer for speech tasks, enhancing automatic speech recognition by leveraging attention mechanisms to integrate and align audio and textual information, improving transcription accuracy and contextual understanding. Recently, some studies have aimed to replicate the scaling law in speech processing. For example, VALL-E127 introduces the Transformer architecture to encode audio features and attempts to utilize the capabilities of LLMs to achieve more natural text-to-speech synthesis.
The application of FMs has become a cornerstone in advancing IDM. In knowledge-based question answering, LLMs like GPT61 demonstrate remarkable versatility in responding to diverse queries, while BERT59 excels in reading comprehension, enabling more accurate and context-aware decision-making. In reasoning tasks, Chain of Thought (CoT)128 enhances LLMs’ complex reasoning abilities by prompting them with intermediate reasoning steps, significantly boosting performance in arithmetic, common sense, and symbolic reasoning tasks. Tree of Thought (ToT)129 takes this further by enabling LLMs to explore multiple reasoning paths and evaluate choices through deliberate decision-making processes, fostering more robust and adaptive reasoning. In autonomous systems, LanguageMPC130 integrates LLMs as decision-makers, enabling complex reasoning and actionable command generation through structured thought processes and seamless integration with low-level controllers like model predictive control. The integration of LLMs into multi-agent systems131 further enhances collective intelligence by enabling agent communication, reasoning, and learning in complex decision-making scenarios. In healthcare, ChatGPT has been evaluated for its potential to assist in clinical decision-making for radiology,132 demonstrating the transformative role of LLMs in high-stakes decision-making. Similarly, integrating LLMs into autonomous vehicles enhances decision-making through natural language interaction, contextual understanding, and continuous learning.133 ExpeL134 introduces an agent that collects experiences through trial and error, extracts insights, and leverages both insights and past experiences to improve decision-making on new tasks without requiring model parameter updates, showcasing the potential for adaptive and experience-driven decision-making.
These advancements underscore the pivotal role of FMs in enabling more accurate, context-aware, and adaptive decision-making across diverse domains, fundamentally transforming how intelligent systems operate and interact with the world.
The development of FMs
FMs first attracted great attention in the domain of NLP and were soon extended to the multimodal domain. In this section, we introduce the development of LLMs and multimodal FMs, respectively. The key developments of FMs are shown in Figure 2.
Figure 2.
Overview and development of foundation models
The development of LLMs
LLMs represent a significant advancement in AI and are designed to understand, process, and generate human-like text. Through extensive pretraining on vast corpora, LLMs have not only demonstrated remarkable language comprehension but also achieved a level of general intelligence that positions them as essential tools for IDM. Their ability to analyze complex data, generate insights, and predict outcomes makes them integral to enhancing decision-making processes across various domains. This section will provide a detailed introduction to the backbone models, mainstream architectures, pretraining strategies, fine-tuning, and alignment techniques, as well as the applications of LLMs.
Transformer: The backbone
LLMs are advanced machine-learning models designed to understand and generate human language. Their large parameter counts enable excellence in tasks like text generation, translation, and summarization. At the heart of the Transformer58,135 model is a design consisting of encoders and decoders, with each encoder comprising multiple identical layers that include multi-head self-attention mechanisms and simple feedforward networks. In addition, the decoders add cross-attention layers to process encoder outputs.136,137 The self-attention mechanism allows direct contextual links between distant inputs, effectively capturing long-range dependencies.138,139 Their hierarchical design and parallel processing capabilities accelerate training, enabling scaling to massive datasets and model sizes.140,141 These features make Transformers the preferred architecture for building flexible and scalable LLMs.
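To make the self-attention computation concrete, the following is a minimal NumPy sketch of single-head scaled dot-product attention; the sequence length, dimensions, and random inputs are purely illustrative, not taken from any particular model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
# In a real Transformer, Q, K, V come from learned linear projections of x.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)            # (4, 8): one output vector per position
print(attn.sum(axis=-1))    # each row of attention weights sums to 1
```

Because every position attends directly to every other position, distant inputs are linked in a single step, which is the mechanism behind the long-range dependency capture noted above.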
Mainstream LLM architectures
Language models built on the Transformer architecture mainly follow four structures: encoders (like BERT59), causal decoders (like GPT13 and LLaMA142), prefix decoders (Prefix-LM71), and encoder-decoders (like Google T572). Although encoder-only structures were widely used in earlier models, they have gradually been replaced by the latter three due to their limited generative capabilities and difficulty in scaling to large sizes. These structures support everything from zero-shot learning to large-scale knowledge integration.
-
(1)
Causal decoders utilize unidirectional attention, so each position can attend only to earlier outputs, preserving causal relationships during generation. Their advantage lies in generating natural, fluent, and consistent text, and they are often used in chatbots, story continuation, and other generation tasks.
-
(2)
Prefix decoders combine a fixed prefix, processed with bidirectional attention, with a suffix that is generated autoregressively. This allows more flexible content generation within a given contextual framework, making them suitable for conditional generation tasks.
-
(3)
Encoder-decoders: this classic structure includes an encoder that processes input text into intermediate representations and a decoder that produces output text based on these representations. Its advantage lies in effectively handling complex input-output relationships, although it requires extensive data for training.
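The difference between causal and prefix decoders comes down to their attention masks, which can be made concrete with boolean matrices; a minimal NumPy sketch (sequence length and prefix length are arbitrary):

```python
import numpy as np

def causal_mask(n):
    """Causal decoder: position i may attend only to positions j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_mask(n, prefix_len):
    """Prefix decoder: bidirectional attention within the prefix, causal afterward."""
    mask = causal_mask(n)
    mask[:prefix_len, :prefix_len] = True  # prefix tokens see each other fully
    return mask

n, p = 5, 2
print(causal_mask(n).astype(int))   # lower-triangular: strictly left-to-right
print(prefix_mask(n, p).astype(int))  # top-left p x p block fully visible
```

In practice such a mask is applied by setting disallowed attention scores to negative infinity before the softmax, so masked positions receive zero weight.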
Pretraining strategies
Pretraining strategies involve training models on vast amounts of unlabeled data before learning specific tasks.143 This step is crucial for LLMs, as they rely on learning general language patterns and structures from a broad range of texts. Different pretraining strategies focus models on various linguistic capabilities, such as grammar comprehension, semantic extraction, or text generation. Currently, mainstream LLMs like the GPT series and T5 primarily use autoregressive pretraining approaches. In the autoregressive framework, the prediction target can vary, and strategies can be further divided into two categories: language modeling and denoising autoencoding.
-
(1)
Language modeling: the most common autoregressive pretraining strategy, aiming to predict the next word given the context. This strategy captures the natural sequence in language generation, with models learning to generate subsequent words by observing previous ones. Models trained this way typically exhibit strong generative capabilities, as demonstrated by the GPT-series13 models in tasks like natural language generation, continuation, and dialog generation.
-
(2)
Denoising autoencoding: a strategy whereby input data are intentionally corrupted and the model is tasked with restoring the original data. Many LLMs implement denoising autoencoding autoregressively, such as GLM,144 Google T5,71 and BART.145 Some efforts also combine both strategies, such as UL2,146 which considers language modeling as a type of denoising autoencoding under a masking strategy.
Given the effectiveness and scalability of language modeling, mainstream LLMs continue to primarily use this approach, such as GPT,13 LLaMA,142 and QWen.147
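The language-modeling objective described above amounts to minimizing the negative log-likelihood of each next token given its predecessors. A toy NumPy sketch over a tiny made-up vocabulary (the logits are invented for illustration):

```python
import numpy as np

def next_token_nll(logits, targets):
    """Average negative log-likelihood of the target token at each position.

    logits:  (seq_len, vocab_size) unnormalized scores predicting the NEXT token
    targets: (seq_len,) indices of the tokens that actually came next
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

vocab_size = 6
logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1, 0.1],
                   [0.1, 0.1, 3.0, 0.1, 0.1, 0.1]])
targets = np.array([0, 2])  # the "correct" next tokens
loss = next_token_nll(logits, targets)
print(round(float(loss), 3))  # small, since the model puts high mass on the targets
```

Denoising autoencoding uses the same likelihood machinery but scores reconstructions of masked spans rather than strictly left-to-right continuations.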
Fine-tuning and alignment
Fine-tuning adapts a pretrained language model to specific tasks or aligns it with specific needs. However, due to the large number of parameters, full fine-tuning requires substantial computational resources. Parameter-efficient fine-tuning (PEFT)148 significantly reduces training costs and time by focusing on a small subset of parameters. Various methods support this process, including adapter techniques that add small trainable layers to the model, prefix tuning that guides outputs by adjusting input prompts, and low-rank adaptation methods that learn low-rank updates to the weight matrices. Meanwhile, alignment techniques ensure that the model’s responses comply with human values and preferences, which is crucial for generating outputs that meet user expectations and ethical standards. A key method in this field is reinforcement learning from human feedback (RLHF),149 which aligns model behavior with human values by incorporating human feedback into the training process.
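As one illustration of the parameter-efficient idea, a bottleneck adapter inserts a small trainable module with a residual connection while the base weights stay frozen. A minimal NumPy sketch, with all dimensions illustrative rather than drawn from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, bottleneck = 16, 2  # the adapter is far smaller than the frozen layer

# Frozen pretrained layer (not updated during fine-tuning).
W_frozen = rng.normal(size=(d_model, d_model))

# Trainable adapter: down-project, nonlinearity, up-project, residual add.
W_down = rng.normal(scale=0.01, size=(d_model, bottleneck))
W_up = np.zeros((bottleneck, d_model))  # zero init: adapter starts as a no-op

def adapter_layer(x):
    h = x @ W_frozen                              # frozen computation
    return h + np.maximum(h @ W_down, 0) @ W_up   # residual adapter path

x = rng.normal(size=(3, d_model))
assert np.allclose(adapter_layer(x), x @ W_frozen)  # unchanged before training
trainable = W_down.size + W_up.size
print(trainable, W_frozen.size)  # 64 trainable vs. 256 frozen parameters
```

Only `W_down` and `W_up` would receive gradient updates, which is the source of the cost savings described above.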
Applications of LLMs
LLMs, despite their impressive capabilities, face challenges in accuracy, knowledge updates, and interpretability. To address these issues, frameworks such as prompt learning, knowledge enhancement, and tool learning have been developed. Prompt learning adapts LLMs to specific tasks by using clear instructions, with advanced methods like CoT128 mimicking human reasoning through incremental steps. However, relying solely on LLMs’ internal knowledge often leads to inaccuracies, outdated information, and limited transparency. Knowledge enhancement, such as retrieval-augmented generation (RAG)150 and knowledge graphs, improves accuracy and domain-specific knowledge by integrating external resources. Tool learning151 further enhances LLMs by enabling dynamic interaction with external tools, improving problem-solving abilities and overall performance.
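A hedged sketch of the retrieval step behind RAG: embed the documents and the query, retrieve by cosine similarity, and prepend the hits to the prompt. The corpus is invented, and the bag-of-words embedding is a toy stand-in for the learned dense encoders real systems use.

```python
import numpy as np

corpus = [
    "LoRA adds low-rank updates to frozen weight matrices.",
    "RAG retrieves external documents to ground generation.",
    "Monte Carlo tree search simulates decision paths.",
]

def tokenize(text):
    return text.lower().replace(".", " ").replace("?", " ").split()

vocab = sorted({w for doc in corpus for w in tokenize(doc)})

def embed(text):
    """Toy bag-of-words vector; real RAG systems use a learned dense encoder."""
    tokens = tokenize(text)
    v = np.array([tokens.count(w) for w in vocab], dtype=float)
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, k=1):
    sims = np.stack([embed(d) for d in corpus]) @ embed(query)
    return [corpus[i] for i in np.argsort(-sims)[:k]]

query = "How does retrieval ground the generation?"
context = retrieve(query)
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

The assembled prompt, with retrieved text ahead of the question, is then passed to the LLM, which is how external knowledge reduces the staleness and hallucination issues noted above.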
The development of multimodal FMs
With the rapid advancement of AI, multimodal FMs (MFMs) have become a popular topic in both research and applications. MFMs are capable of processing and understanding various types of data (such as text, images, audio, and video).64,65,66 By integrating information from different modalities, they generate more comprehensive and accurate insights. This capability provides a robust data foundation and cognitive support for decision intelligence. For instance, in business scenarios, multimodal models can analyze customer feedback (text), product images (visuals), and market trends (time-series data), offering decision-makers a holistic view of the market.
Model architecture
The architecture of MFMs is often based on Transformers, known for their flexibility and efficiency. However, as research has deepened, various architectures have been proposed to accommodate different types of data and tasks. Primary model architectures include the following.
-
(1)
Transformer-based encoder: in multimodal models, the encoder is a core component that extracts features from diverse input forms like text, images, and audio.152 Models like Vision-and-Language Transformer (ViLT)153 directly integrate visual and textual information, enhancing the understanding of multimodal inputs.
-
(2)
Sequence generation model: this architecture is typically used for generation tasks, such as image captioning and dialog systems. The model generates coherent, contextually relevant outputs using contextual and historical information.83,154
-
(3)
Diffusion model: recently, diffusion models have gained attention for generating high-quality samples. Unlike traditional generative models, diffusion models generate data through a gradual denoising process, offering new possibilities for multimodal generation tasks.155,156
-
(4)
Autoregressive decoder: in combination with encoders, autoregressive decoders generate output sequences step by step. This mechanism is especially suitable for tasks like dialog generation and text completion, as it utilizes historical content as context.157,158
-
(5)
Graph neural network (GNN): when handling data with graph structures, GNNs can effectively model the relationships and structural information within multimodal data, suitable for applications like social networks and knowledge graphs.159,160
-
(6)
Hybrid architecture: some models combine convolutional neural networks (CNNs) with Transformers, enhancing visual feature extraction while using Transformers to integrate information for a better understanding of complex data.161,162,163,164
-
(7)
Generative adversarial network (GAN): GANs can be used in multimodal generation tasks to produce high-quality images or text, leveraging adversarial training to increase the realism of generated results.165,166
-
(8)
Multitask learning framework: by handling multiple tasks simultaneously, models can share knowledge, improving generalization and robustness, which is beneficial in complex applications requiring multimodal understanding.167,168
These diverse model architectures play a significant role in enhancing the performance and application range of MFMs, enabling them to better address complex real-world tasks.
Key technologies
Several key technologies are essential in the development of MFMs, boosting model performance and applicability.
-
(1)
Alignment techniques: effective alignment techniques ensure consistency across different modalities in the feature space. Through alignment, models can better understand relationships, such as between images and their descriptions, enhancing generation and recognition capabilities. Common alignment methods include contrastive learning and attention mechanisms.169,170
-
(2)
Pretraining data collection: the richness and diversity of data are crucial for model performance. During pretraining, models are typically exposed to large amounts of labeled and unlabeled data from social media, image libraries, and open-source publications across multiple domains, enabling them to learn comprehensive knowledge.165,171
-
(3)
Self-supervised learning and model optimization training: self-supervised learning allows models to learn autonomously from unlabeled data, enhancing performance in downstream tasks. Model optimization training focuses on improving performance through hyperparameter tuning and architectural refinement.172,173,174
-
(4)
Downstream task fine-tuning: after pretraining, models need to be fine-tuned for specific downstream tasks. This process typically uses labeled datasets to improve model accuracy and efficiency on specific tasks.155,175
-
(5)
Modality fusion techniques: by effectively merging data from different modalities, models enhance multimodal information understanding, often using attention mechanisms or feature concatenation.176,177
-
(6)
Knowledge distillation: transferring knowledge from larger models to smaller models reduces computational overhead and improves efficiency, allowing models to run effectively in resource-limited environments.178,179
-
(7)
Incremental learning: this allows models to update dynamically with new data, avoiding retraining from scratch, and is crucial for handling dynamic data streams.180,181
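Among the techniques above, knowledge distillation is the easiest to make concrete: the student is trained to match the teacher's temperature-softened output distribution. A minimal NumPy sketch of the distillation loss, with the logits and temperature invented for illustration:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T   # temperature T softens the distribution
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, T)  # soft targets from the large teacher
    q = softmax(student_logits, T)  # predictions from the compact student
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [3.0, 1.0, 0.2]
good_student = [2.9, 1.1, 0.1]   # closely tracks the teacher
bad_student = [0.1, 0.2, 3.0]    # disagrees with the teacher
print(distillation_loss(teacher, good_student) < distillation_loss(teacher, bad_student))
```

In practice this term is usually mixed with the ordinary supervised loss on hard labels, so the student benefits from both the data and the teacher's "dark knowledge."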
The combination and innovation of these key technologies provide strong support for the practicality and flexibility of MFMs, facilitating their broad application in various fields.
Representative models
A variety of representative MFMs have emerged from industry and academia.
-
(1)
Models from industry: examples include OpenAI’s GPT-413 and Google’s MUM,182 which leverage powerful computing resources and dedicated hardware, typically deployed on cloud platforms for real-time processing of large user requests. These models focus on scalability and stability for production use cases, such as virtual assistants, content generation, and data analysis.
-
(2)
Models from academia: examples include CLIP14 and DALL-E,183 which are primarily aimed at research, with a focus on theoretical exploration and algorithmic innovation rather than production-scale deployment.
-
(3)
Computing and hardware differences: industrial FMs usually rely on extensive distributed computing architectures and specialized hardware (e.g., TPUs or GPUs) to process vast amounts of data and support efficient inference, whereas academic models more often run on smaller hardware, emphasizing algorithmic novelty rather than massive computational scale.64,184
MFMs have achieved remarkable progress in recent years, driving development across various fields of AI. From model architecture to key technologies and typical applications, MFMs demonstrate tremendous potential and flexibility. With advancements in computational power and ongoing optimization of algorithms, this field is poised to see broader applications and theoretical breakthroughs in the future.
Optimization of FM training, fine-tuning, and deployment
FMs require high-performance GPU clusters during the training phase, and training can take weeks or months. During fine-tuning, hyperparameters must be optimized and overfitting prevented. During deployment, the model size must be adjusted to the available computing resources while ensuring timely responses. This section first introduces optimization strategies for training FMs, then the main optimization methods for fine-tuning, and finally inference optimization strategies for deployment.
Optimization of FM training
FMs typically utilize massive amounts of data and employ unsupervised pretraining methods. By architecture, pretrained models can be divided into three categories: encoder-only, decoder-only, and encoder-decoder forms. Encoder-only FMs are generally trained as masked language models and include BERT59 and RoBERTa.185 Decoder-only FMs are generally trained autoregressively, maximizing the prediction probability of the next token given an input sequence, and include the GPT series,60,61,69 PaLM,186 and LLaMA.142,187 Encoder-decoder models integrate the two preceding pretraining approaches: they randomly mask a token sequence and then restore the masked content autoregressively. Representative models are T571 and BART.145
Optimization of FM fine-tuning
Parameter-efficient fine-tuning is a suitable training method for adapting FMs to downstream tasks, and efficient fine-tuning approaches have become a hot topic in recent years.188 LoRA (low-rank adaptation) is one of the most widely used efficient fine-tuning paradigms.189 It introduces low-rank matrices that adapt the model for specific tasks: instead of directly modifying the model’s original weights, LoRA adds a transformation through these low-rank matrices to the outputs of the affected layers. A series of improvements based on LoRA have further enhanced its performance and applicability. QLoRA (quantized low-rank adaptation) significantly reduces memory usage while preserving fine-tuning performance; it enables the fine-tuning of a 65B-parameter model on a single 48-GB GPU, a substantial reduction in memory requirements compared to traditional 16-bit fine-tuning methods.190 LoRA-Flow191 employs dynamic fusion weights at each generation step for FMs in generative tasks. It computes fusion weights with a softmax function applied to a gate mechanism, allowing more flexible adaptation during generation, and has been shown to outperform standard LoRA in terms of parameter efficiency and performance. MoSLoRA (mixture-of-subspaces LoRA)192 extends LoRA by decomposing the weights into two subspaces and mixing them to enhance performance; standard LoRA is equivalent to employing a fixed mixer to fuse the subspaces, whereas MoSLoRA jointly learns the mixer with the original LoRA weights, leading to consistent improvements over standard LoRA across various tasks, including commonsense reasoning, visual instruction tuning, and subject-driven text-to-image generation.
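A minimal sketch of the LoRA idea: the frozen weight W0 is augmented with a trainable low-rank product, so the adapted layer computes x W0 + (x A) B. Conventions for shapes and scaling vary across implementations; this NumPy version is illustrative, with the up-projection zero-initialized so training starts exactly from the pretrained behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 2  # rank r is much smaller than the layer dimensions

W0 = rng.normal(size=(d_in, d_out))         # frozen pretrained weight
A = rng.normal(scale=0.01, size=(d_in, r))  # trainable down-projection
B = np.zeros((r, d_out))                    # trainable up-projection, zero init

def lora_forward(x):
    # Frozen path plus low-rank trainable path: x W0 + (x A) B.
    return x @ W0 + (x @ A) @ B

x = rng.normal(size=(4, d_in))
assert np.allclose(lora_forward(x), x @ W0)  # identical to base model before training

full = W0.size            # 256 parameters to update with full fine-tuning
lora = A.size + B.size    # 64 parameters with LoRA at rank 2
print(full, lora)
```

Because only A and B are trained, the optimizer state shrinks accordingly, and the learned update A B can be merged into W0 after training so inference incurs no extra cost.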
LoRA, QLoRA, and adapter-based methods each have distinct characteristics in terms of parameter efficiency, training speed, and task performance. LoRA significantly reduces the number of trainable parameters through low-rank decomposition, achieving high parameter efficiency and relatively fast training speed while performing close to full fine-tuning on most tasks. QLoRA builds on LoRA by introducing quantization, further improving parameter efficiency and achieving very fast training speeds. However, due to potential minor precision loss from quantization, its task performance is slightly lower than LoRA, although it still performs well in resource-constrained environments. Adapter-based methods insert small adapter modules and train only a small number of parameters, resulting in moderate parameter efficiency and relatively fast training speeds, though typically slower than LoRA. They perform close to full fine-tuning on most tasks but may slightly underperform on some complex tasks compared to LoRA. Overall, LoRA and QLoRA excel in parameter efficiency and training speed, while adapter-based methods offer advantages in flexibility and task adaptability. The comparison of LoRA, QLoRA, and adapter-based methods in terms of parameter efficiency, training speed, and task performance is shown in Table 1.
Table 1.
Comparison of optimization methods for foundation models
| Method | Parameter efficiency | Training speed | Task performance |
|---|---|---|---|
| LoRA | high | faster | close to full-parameter fine-tuning |
| QLoRA | very high | very fast | slightly lower than LoRA |
| Adapter-based | medium | faster | close to full-parameter fine-tuning |
Optimization of FM deployment
Model compression and quantization are essential strategies for downsizing FMs without drastically affecting their performance. This process encompasses techniques such as pruning, which eliminates less critical neurons to streamline the model,193 and knowledge distillation,194,195,196,197 a method that transfers the learned knowledge from a complex model to a more compact one. Additionally, quantization plays a pivotal role by reducing the numerical precision required for model calculations,198 leading to a substantial decrease in model size and a consequent acceleration of inference times. These optimizations are crucial for making FMs more efficient and deployable in resource-constrained environments. The optimized model has shown quite impressive performance under limited resources. SANA-0.6B, a model variant, is competitive with modern giant diffusion models like Flux-12B, being 20 times smaller and over 100 times faster in throughput. It can also be deployed on a 16-GB laptop GPU, taking less than 1 s to generate a 1,024 × 1,024 resolution image.199 LLaMA 3.2 3B is designed to be optimized for edge computing and mobile devices, supporting 128k token contexts, which is exceptional in the industry. It excels in tasks such as summarization, instruction following, and text rewriting on device-side use cases. It represents a step forward in making FMs accessible and efficient for a wide range of applications, particularly those that require on-device processing and privacy.200,201
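To illustrate the quantization step discussed above, here is a sketch of symmetric per-tensor int8 quantization in NumPy; the weight matrix is random, and real deployments typically use per-channel scales and calibration data rather than this simplest scheme.

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 using a single symmetric scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

print(q.nbytes / w.nbytes)                 # 0.25: weights take 4x less memory
err = np.abs(dequantize(q, scale) - w).max()
print(err <= scale / 2 + 1e-6)             # rounding error bounded by half a step
```

The 4x memory reduction relative to float32 (or 2x relative to float16) is what makes the laptop- and edge-scale deployments described above feasible, at the cost of the small rounding error shown.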
With the development of FMs, they are playing an increasingly important role in intelligent decision-making. We highlight the advantages of FMs in decision-making as follows.
-
(1)
Strong predictive capabilities: through deep learning and large-scale data training, FMs can capture complex patterns and nonlinear relationships in data, thereby improving the accuracy of predictions. For example, in the financial field, FMs can predict market trends; in the medical field, they can assist in diagnosis and formulating treatment plans.
-
(2)
Cross-domain knowledge integration: FMs possess the capability to integrate knowledge from diverse domains, offering interdisciplinary decision-making support. For instance, in climate change research, FMs can integrate multidisciplinary data from meteorology, economics, and sociology, aiding in the formulation of comprehensive strategies to address the issue.
-
(3)
Human-machine collaboration: traditional decision-making models usually serve as tools to assist human decision-making, with limited interaction methods, whereas FMs can collaborate with humans through natural language interaction (such as ChatGPT) to provide more intuitive and flexible decision support.
The paradigm of FM-based decision-making and key technologies
Advanced decision-making paradigms leverage a combination of sophisticated models and frameworks to enable intelligent agents to make effective choices in dynamic environments. AI agents tend to exploit the rules and experience consolidated in the LLM to facilitate more efficient decision-making. High-level RL models learn policies through trial and error, optimize long-term rewards across various tasks, and decompose decision-making into easy-to-implement steps.202,203 For example, multi-agent systems and hierarchical RL architectures have emerged to address more complex coordination and task decomposition scenarios. Additionally, multimodal foundation-based decision-making integrates diverse input modalities—such as vision, language, and sensory data—into a unified decision-making process. By combining different types of information (e.g., visual cues and textual descriptions), multimodal decision-making approaches enhance robustness and adaptability, allowing agents to tackle tasks that require nuanced understanding or reasoning.
This section systematically discusses existing advanced decision-making paradigms. Integrating FMs and advanced decision frameworks brings AI closer to human-like decision-making, where cognitive flexibility and efficiency are paramount in uncertain, real-world situations.
Important roles of FMs in IDM
In the decision-making process, FMs can empower new decision-making paradigms. Beyond acting as an agent, the FM can also serve as an environment and its designer, an encoder, a conditional generative representation module, and a human-machine interactor. These new decision-making paradigms can further enhance the generalization and decision-making ability of large-scale FMs across various fields. FM-based IDM technologies generate optimal policies, actions, plans, and schemes. The decision-making paradigms can be divided into three modes. In the FM agent module, the FM can serve as an agent, including planner, perceiver, decision-maker, and actor, as shown in Figure 3.
Figure 3.
The critical roles that FM can play for intelligent decision-making
In the FM Environment and Designer module, the FM can serve as a target for action execution, as a part of the environment, or as a bridge for environment state transitions, enhancing the effectiveness of strategies. In the FM Encoder module, the FM is used to generate state encodings or optimize policy encodings. In the FM Conditional Generative Representation and FM Human-Machine Interactor modules, the FM is used for conditional generation and human-computer interaction, respectively. These FM-based decision-making paradigms effectively promote the adaptability of decision-making modes to different application environments and greatly improve the performance and generalization of decision-making.
On the basis of FM-based agents, the FM can also act as environment and designer, encoder, conditional generator, and human-machine interactor. When the FM acts as environment and designer, actions produced by the policy are executed in the FM to update its state, and the FM designs the reward, transition, state, and environment. As an encoder, the FM encodes the state into an extracted representation for policy learning, i.e., it formalizes the state information. As a conditional generator, the FM combines the task, state, and prompt to output behavior generation, world generation, and action content. As a human-machine interactor, the FM receives human commands, including in dialog format, and outputs policies and actions that interpret those commands. These multiple FM decision-making paradigms greatly improve decision-making performance and extend decision-making into the natural and social sciences. The basic content and comparative attributes of the three types of FM roles are shown in Table 2, including the main modeling technologies and application domains.
Table 2.
FM paradigm conceptual framework: Mapping functional roles, core modeling technologies, cross-domain applications, and comparative merits and flaws
| FM functions | Main modeling technologies | Application domains | Merits | Flaws |
|---|---|---|---|---|
| Agent | LLM chain of thought (CoT); reinforcement learning; expert rules | Robotics, financial trading, etc | general-purpose decision-making in complex environments; real-time response | low inference efficiency; challenges in safety verification and alignment with human values; quality and latency issues in generation |
| Environment | Generative simulation (GenSim) | Urban planning, medical simulation, etc | low-cost generation of diverse scenarios, pretraining for high-risk tasks at lower cost | significant Sim2Real gap, high complexity in multimodal dynamic modeling |
| Interactor | Retrieval-augmented generation (RAG) | Educational assistance, intelligent customer service, etc | natural language interaction, user-friendly, personalized demand matching | high hallucination risk, privacy protection, and ethical alignment |
FM as agent
As a planner, the FM takes a task and its prompt as input and generates candidate policies 1, 2, …, n, from which the optimal action is selected for execution. As a perceiver, the FM collects multimodal information, including text, images, audio, and video, from multiple environments 1, 2, …, n to formalize the current state. Furthermore, as a decision-maker,55,204 the FM uses the state and a state prompt to obtain the optimal policy and corresponding action; the action is then executed in the environment, and the resulting reward feedback on the current state is used to optimize the FM-based decision-making policy. In addition, as an actor, the FM uses tools such as application programming interfaces (APIs), WebGPT, or Python to generate an action representation and execute it in the environment according to the current state.205 FM-based IDM agents broaden decision scenarios and paradigms, improving decision-making ability.
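The planner-perceiver-decision-maker-actor cycle described above can be sketched as a toy agent loop. Every function here (perceive, plan, score, act) is a hypothetical stub standing in for a foundation-model call; the observation format and candidate policies are invented for illustration.

```python
# A toy observe-plan-decide-act loop mirroring the four FM agent roles.

def perceive(raw_observation):
    """Perceiver: formalize raw multimodal input into a state (stubbed)."""
    return {"inventory": raw_observation["inventory"], "goal": raw_observation["goal"]}

def plan(state):
    """Planner: propose candidate policies 1..n for the task (stubbed)."""
    return ["gather", "build", "explore"]

def score(state, policy):
    """Decision-maker: rate each candidate policy (stubbed heuristic)."""
    return 1.0 if policy == state["goal"] else 0.0

def act(policy):
    """Actor: translate the chosen policy into an executable command."""
    return f"execute:{policy}"

observation = {"inventory": ["wood"], "goal": "build"}
state = perceive(observation)
best = max(plan(state), key=lambda p: score(state, p))
command = act(best)
print(command)  # execute:build
```

In a real FM agent, `score` would be replaced by the model's evaluation of each candidate and the loop would repeat, with reward feedback from the environment refining subsequent decisions.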
FM as environment and designer
In the FM-based non-agent modules, the FM can act as the environment and its designer,36 formalizing state information into a Markov decision process (MDP) comprising state transitions, reward functions, policies, and actions. The FM can serve as a target for action execution, as a part of the environment, or as a bridge for environment state transitions, enhancing the effectiveness of strategies.109 Furthermore, the FM can design and formalize the environment state format and action space to improve policy generation. For example, the fine-tuning parameter matrix of an FM206,207,208 can be regarded as an environment and trained by fine-tuning methods, acting as an environment state that generates the corresponding policy and action to update the FM parameters. As an environment designer, the FM can design the corresponding state encoding,209 reward function, and state transition function210 to formalize the environment, i.e., the FM can be used to formalize its own parameters, as shown in Figure 3. In addition, as a conditional generative representation module,36 the FM can output behavior generation (i.e., actions) and world-model generation (i.e., environment dynamics) by combining the task description and conditions.204 The generative model can be applied to text or image data to model behaviors, environments, and long-term trajectories.204,211 The behavior and world models can be regarded as policies that generate the corresponding actions and MDP trajectories; they can be used to execute in and formalize the environment, outlining its conditional generation information. Also, in the FM-based encoder module, the FM can encode environment state information into extracted representations for generating policies and actions.
The FM encoder module can be regarded as a multimodal encoder of environmental data; for example, video data can be decomposed into audio and image data, and the audio data can then be encoded into corresponding representations. Non-numerical state information is encoded as numerical vectors; for example, the FM encodes text or image information into vector form for more convenient input into a large model. Vectorization212 is an effective method for converting non-numerical information into vectors. Thus, the encoder module is a tool that converts state information into a form suitable for generating efficient policies and actions, and the FM serves as the bridge for this state-representation transition, encoding the state into a form more convenient for computation.
FM as human-machine interactor
In the FM-based human-machine interactor module, the FM can generate the corresponding policy, action, and interpretation according to human commands given in the form of dialog.204 Intelligent human-machine interaction can thus be realized by the FM via dialog based on text, images, and audio. The FM can therefore be regarded as the bridge between human and machine: it can be installed in robots or unmanned vehicles213 to provide intelligent question answering.
Intelligent decision-making with LLM-empowered AI agent
IDM is undergoing a transformative shift with the advent of AI agents empowered by LLMs. Unlike traditional decision-making methods, which often rely on handcrafted rules, planning from scratch, or scheduling algorithms, LLM-powered agents capitalize on large-scale pretrained models to process and respond in real time to diverse and dynamic inputs. These agents not only excel at understanding complex tasks and contexts but also facilitate knowledge transfer and enable rapid adaptation of strategies across various domains. A key advantage of LLM-based agents is their ability to swiftly adjust decisions and optimize actions in response to changing environments without requiring extensive retraining. Through mechanisms like CoT,214 LLM agents offer highly transparent and interpretable decision-making processes, breaking down complex reasoning into clear steps. This capability allows them to outperform traditional decision models, particularly in complex, dynamic settings, by providing a more adaptive and transparent decision-making framework, ideal for scenarios demanding rapid response and real-time decision-making.
Techniques for enhancing decision-making capabilities of AI agents
LLM-powered AI agents significantly enhance their decision-making capabilities by leveraging advanced techniques such as RLHF, RAG, search algorithms, and advanced reasoning methods. RLHF has proven to be highly effective in aligning LLMs with human values.149,187,215,216,217 Typically, these methods rely on human-annotated preference datasets to train a reward model, which is subsequently used to guide the training of LLMs through RL. The traditional RLHF approach employs the PPO algorithm218 to fine-tune LLMs for alignment, although it is often criticized for its significant resource demands. As a more resource-efficient alternative, recent approaches focus on directly optimizing the LLMs themselves.219,220,221,222,223,224 RAG improves knowledge acquisition by retrieving relevant external information, enabling LLM agents to generate more accurate, context-sensitive responses based on real-time data. This is particularly beneficial for tasks requiring up-to-date knowledge or complex cross-domain decision-making.150 Search algorithms, including beam search and Monte Carlo tree search (MCTS),225,226 enhance decision-making efficiency by exploring multiple candidate solutions or simulating various decision paths, making LLM agents more robust in complex, long-term decision tasks such as game theory or strategic planning.227 Lastly, advanced reasoning methods like CoT,214 ToT,129 and Graph of Thought (GoT)228 deepen reasoning capabilities. CoT improves decision transparency by breaking down complex reasoning steps, while ToT and GoT enable agents to handle intricate, multistep decision tasks by structuring information hierarchically or graphically. Together, these technologies allow LLM-powered agents to make adaptive, transparent, and informed decisions in a wide range of dynamic environments.
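Among the search algorithms mentioned above, beam search can be sketched compactly. The following is a minimal, self-contained illustration, not an LLM decoder: the toy bigram log-probabilities and vocabulary are invented for the example, and `expand`/`score` stand in for a real model's next-token proposals and sequence scoring.

```python
import math

def beam_search(initial, expand, score, beam_width=2, max_steps=5):
    """Keep only the `beam_width` best partial sequences at each step."""
    beams = [initial]
    for _ in range(max_steps):
        candidates = [seq + [tok] for seq in beams for tok in expand(seq)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return max(beams, key=score)

# Toy bigram log-probabilities (hypothetical values for illustration)
logp = {("<s>", "a"): math.log(0.6), ("<s>", "b"): math.log(0.4),
        ("a", "x"): math.log(0.3), ("a", "y"): math.log(0.7),
        ("b", "x"): math.log(0.9), ("b", "y"): math.log(0.1)}

def expand(seq):
    return [t for (prev, t) in logp if prev == seq[-1]]

def score(seq):
    return sum(logp[(seq[i], seq[i + 1])] for i in range(len(seq) - 1))

best = beam_search(["<s>"], expand, score)
```

Widening the beam trades computation for a better chance of finding the highest-scoring full sequence, which is why such searches help LLM agents in long-horizon decision tasks.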
Application scenarios of AI agents
LLM-powered agents have a broad range of applications across fields such as strategic reasoning,229 game theory,230 real-time decision-making,231,232,233,234,235 and cross-task knowledge transfer,236,237,238,239,240,241 demonstrating their strong capabilities in complex environments. In strategic reasoning, LLM agents can predict opponents’ actions in dynamic and highly uncertain environments and adjust their strategies in real time, particularly in multi-agent game settings where the integration of game theory enhances decision-making effectiveness. For example, in complex strategy games like StarCraft, LLM agents can not only predict enemy movements but also adapt their strategies to gain an advantage in the game’s complex dynamics. Furthermore, as LLMs are increasingly applied in real-time decision-making, testing platforms such as LLM-PySC2231 and SC-Phi2232 have become essential tools for evaluating LLM agents’ performance in macro-level decisions and tactical collaboration. These platforms not only assess agents’ abilities in long-term strategic decisions but also address challenges like multimodal observation and real-time feedback, advancing LLM research in complex decision-making contexts. In deductive reasoning tasks, LLM agents also exhibit impressive performance. For example, in Werewolf, LLM agents can simulate human-like deception, trust-building, and strategic communication in virtual interactions, enhancing adaptability in complex and dynamic environments.234,235 In creative tasks, LLMs have demonstrated their ability to generate novel and creative content in games like Minecraft242,243,244 and Balderdash,245 showcasing their strategic logic and innovative thinking in solving complex, open-ended problems. These applications not only highlight the potential of LLMs in reasoning and creative tasks but also reveal their versatility and broad applicability across diverse domains.
LLM agents’ practical applications extend to fields such as autonomous driving239 and robotics,238,240 further proving their strength in real-time decision-making and strategic reasoning. In autonomous driving, LLM agents process real-time data from vehicles and their surroundings to quickly identify potential risks and formulate emergency strategies, providing efficient and accurate decision support.239 In robotic tasks, LLMs need to receive human natural language instructions and translate them into specific actions that the robot can execute to complete the assigned task. This requires LLMs to effectively bridge language understanding with the robot’s control system.238,240 In complex multi-agent environments, LLM agents demonstrate unique advantages in both cooperation and competition. For instance, the TMGBench platform246 tests LLM agents’ strategic reasoning and decision-making abilities across various game types, advancing the application of rational decision-making in competitive scenarios. Additionally, social reasoning games powered by LLMs are becoming increasingly popular in AI research, with platforms like AdaSociety247 and AI Metropolis248 enabling LLM agents to simulate and optimize complex social dynamics and collaborative tasks, improving decision-making efficiency and system adaptability. LLM agents’ applications in economic decision-making are also expanding. By simulating real-world economic interactions, LLM agents help researchers better understand and predict human behavior in economic decisions, advancing research in economics, sociology, and related fields.236,237 Overall, the development of LLM-agent technology and its diverse applications across complex domains not only enhances real-time decision-making capabilities but also drives innovation in strategic reasoning, social reasoning, creative thinking, and more.
Limitations and challenges of AI agents
Despite the remarkable performance of LLM-powered agents across various domains, several challenges and bottlenecks remain. One significant limitation is their ability to process multimodal data, especially when integrating signals from images, audio, or sensors. As LLMs are primarily designed for text inputs, they often struggle with nonsymbolic data, which can impede decision-making in these contexts. Moreover, LLM agents face scalability and real-time decision-making challenges, particularly in low-level control tasks. While adept at complex reasoning, they often fall short in dynamic environments that demand precise, immediate responses. In high-frequency, low-latency scenarios, delays in decision-making can undermine system responsiveness and efficiency. Safety remains another critical concern, especially in high-risk applications. Without proper alignment with human values and safety protocols, LLM agents may make unintended and potentially harmful decisions. Ensuring that LLM agents make sound, secure choices in complex environments and maintain stability under uncertainty will be essential for advancing AI-agent technologies.
Intelligent decision-making with advanced deep RL
Vanilla RL treats the decision-making environment as a typical Markov decision process with complete elements, a setting that seldom holds in the real world. Several bottlenecks remain concerning efficiency, generalization, and scalability in a technical sense. Meanwhile, the success of RL’s problem-solving heavily relies on reward engineering, which depends on nontrivial expert knowledge. To this end, some high-level RL topics are being investigated to close the theoretical gap and make deep RL more tractable in practice. Overall, these topics take sample efficiency, policy transferability, credit assignment, incomplete environments, and safety into consideration. The different paradigms of RL are shown in Figure 4.
Figure 4.
Different paradigms of reinforcement learning
(A) Offline RL is a method that learns optimal or near-optimal policies using only existing historical data with little or no online interaction.
(B) Meta RL enables agents to have the ability of “learning to learn” and quickly adapt when facing new tasks.
(C) Hierarchical RL simplifies the learning process by introducing a hierarchical structure to decompose complex tasks into high-level “meta-actions” or “sub-tasks” and low-level specific execution policies.
(D) Multi-agent RL is a reinforcement learning paradigm that studies multiple agents learning optimal policies through collaboration or competition in an environment.
Offline RL
Offline RL34 refers to learning optimal policies entirely from a static dataset collected from interactions with environments, discarding the trial-and-error mode of online interaction. In other words, offline RL aims to extract and generalize knowledge inside the historical dataset and induce policies that perform well under similar conditions. The practical need for offline RL arises from the infeasibility or risk of exploration in risk-sensitive fields (e.g., millions of automatic transactions in financial markets). It also involves reusing valuable and expensive datasets, thereby reducing data-collection costs. Some commonly used strategies to develop offline RL include (1) behavior regularization to constrain the learned policy close to the behavior policy and reduce the risk of encountering out-of-distribution (OOD) states249,250; (2) Q-learning with uncertainty quantification and offline policy evaluation to suppress overestimation of values in less visited regions of the dataset for safety control251,252; and (3) implicit policy optimization that employs sequence or generative modeling to learn expressive decision-making, such as decision transformers or diffusion policies.253,254 Recent advancements provide a less restrictive in-support constraint for policy learning35 and value learning,255 facilitating policy optimization within the support of the behavior policy and delivering state-of-the-art performance. Despite its promise, offline RL still encounters practical challenges that need to be resolved, including the existence of OOD states and actions,256,257 the diversity and quality of static datasets,258 and robust policy evaluation techniques.
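Strategy (1), behavior regularization, can be illustrated in one dimension. The sketch below assumes a toy quadratic critic and invented constants: the policy maximizes the critic’s value while a behavior-cloning penalty pulls it back toward the dataset’s action, in the spirit of regularized offline actor updates.

```python
# Toy 1-D illustration of behavior regularization: maximize the critic's
# value while staying close to the dataset's behavior action.
# All numbers are invented for illustration.
a_opt, a_beh, alpha = 2.0, 0.0, 1.0   # critic peak, dataset action, trade-off

def grad(theta):
    # d/dtheta of  -Q(theta) + alpha * (theta - a_beh)^2,
    # with the toy critic Q(a) = -(a - a_opt)^2
    return 2 * (theta - a_opt) + 2 * alpha * (theta - a_beh)

theta = 5.0                            # policy parameter (the action itself)
for _ in range(1000):
    theta -= 0.05 * grad(theta)

# closed form: theta* = (a_opt + alpha * a_beh) / (1 + alpha) = 1.0,
# pulled away from the critic's (possibly overestimated) optimum toward the data
```

The trade-off coefficient `alpha` controls how conservatively the learned policy stays within the support of the behavior policy.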
Meta RL
Meta RL is a paradigm that considers a distribution over MDPs, in which the agent is trained to adapt rapidly to unseen but similar tasks by leveraging past experience. Intuitively, the motivation is to create a generalist agent that learns to learn in a generalizable way and avoids training from scratch at deployment. Hence, the primary technique in meta RL is to encode meta-knowledge for fast adaptation or to infer task-specific representations from a few episodes. This trait makes meta RL suitable for changing environments, such as robotic control on diverse terrains and autonomous driving in various scenarios, as it does not suffer from high computational cost or poor sample efficiency at deployment. Typical existing methods are either optimization based17,259 or context based.45,260,261,262 MAML17 is an optimization-based method that seeks a meta policy that can be fine-tuned via gradient updates to adapt to new MDPs from support episodes. PEARL260 is a context-based method that learns task embeddings to amortize task-specific policies. However, meta RL is computationally and sample intensive in the meta-training phase, and its generalization heavily relies on the task distribution design, similar to domain randomization. Identified as a promising direction for creating adaptable agents, meta RL requires further advancements in task distribution design, efficient meta-training, and robustness to ensure broader applicability.
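The optimization-based idea behind MAML can be sketched on a hypothetical task family: each task is a 1-D quadratic loss, and the outer update differentiates through one inner gradient step. The task family and learning rates here are invented for illustration, not a faithful RL setup.

```python
# Minimal MAML-style sketch: tasks are 1-D losses (theta - c)^2 with
# task parameter c in {-1, +1}; the outer loop back-propagates through
# the inner adaptation step (the MAML signature).
tasks = [-1.0, 1.0]
inner_lr, outer_lr = 0.1, 0.1

def grad(theta, c):
    return 2 * (theta - c)            # d/dtheta of (theta - c)^2

theta = 2.0                            # meta-parameters
for _ in range(300):
    meta_grad = 0.0
    for c in tasks:
        adapted = theta - inner_lr * grad(theta, c)        # inner adaptation
        # chain rule: d(adapted)/d(theta) = 1 - 2 * inner_lr
        meta_grad += grad(adapted, c) * (1 - 2 * inner_lr)
    theta -= outer_lr * meta_grad / len(tasks)

# theta converges to 0.0: one inner step from there adapts well to either task
```

The meta-optimum sits midway between the tasks, which is exactly the "learn an initialization that adapts fast" intuition rather than solving any single task outright.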
Hierarchical RL
In complex decision-making, there are clear structures in the task. Hierarchical RL (HRL) manages the decision-making process in such a structure, decomposing the task into hierarchical levels. Specifically, high-level policies devise subgoals, and low-level policies execute actions to achieve these subgoals. In this way, a complex long-horizon task is transformed into a series of manageable ones. For example, in an autonomous driving system, the navigation agent operates at a high level to reach destinations and directs the controllers, as low-level policies, to perform specific steering. Such decompositions allow for modular policy learning and foster sample efficiency by specifying diverse combinatorial skills across different tasks. Meanwhile, HRL benefits from temporal abstraction in tasks where decision-making must span varying time scales. Commonly used approaches involve options frameworks, where options are temporally extended actions with their own policies and termination conditions. For example, the MAXQ framework recursively decomposes the value function into simpler, hierarchical components.263 Similarly, such options can be extended to DQN to obtain H-DQN.264 HRL also improves exploration efficiency by allowing the agent to explore subgoals rather than primitive actions, resembling a human being’s strategic planning in pursuing a long-horizon goal. Some remaining challenges lie in versatile approaches to locating meaningful subgoals, more efficient credit assignment, the design of subgoal exploration strategies, and good coordination across hierarchical actors.
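The subgoal-and-option structure above can be sketched on a hypothetical 1-D corridor: a high-level policy picks subgoals, and a low-level "option" runs primitive steps until its termination condition (reaching the subgoal). The corridor, subgoal set, and greedy low-level policy are all invented for the example.

```python
# Toy HRL sketch: high-level subgoal selection plus a low-level option
# with its own policy and termination condition.

def run_option(state, subgoal):
    trajectory = [state]
    while state != subgoal:                     # terminate at the subgoal
        state += 1 if subgoal > state else -1   # greedy primitive policy
        trajectory.append(state)
    return state, trajectory

def high_level_policy(state, goal, subgoals=(5, 10)):
    for g in subgoals:                          # nearest unmet subgoal en route
        if state < g <= goal:
            return g
    return goal

state, goal, visited = 0, 10, []
while state != goal:
    sg = high_level_policy(state, goal)
    state, traj = run_option(state, sg)
    visited.extend(traj if not visited else traj[1:])
```

The high level makes only two decisions (subgoals 5 and 10) instead of ten primitive ones, which is the temporal abstraction that eases exploration and credit assignment.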
Multi-agent RL
MARL refers to the case in which multiple autonomous agents learn and interact within a shared environment to achieve cooperative, competitive, or mixed goals. In this case, agents are not completely isolated in decision-making; they must adapt not only to environmental dynamics but also to other agents’ strategies. MARL is well suited to complicated real-world domains such as swarm robotics, real-time strategy games,265 distributed control systems,266 and intelligent transportation.267 However, MARL also poses additional complexities, requiring specialized techniques to handle interactions, communication, resource partitioning, and strategic decision-making. Agents might hold conflicting objectives, mandating swarm IDM. Formulated as stochastic games or Markov games, MARL aims to derive a robust solution in environments where agents can coordinate, compete, or both. Consequently, agents need to handle partial observability and diverse multi-agent interactions and seek decentralized learning in practice. The commonly used strategy in MARL is “centralized training, decentralized execution” (CTDE), which enables agents to learn policies from shared information during training while acting independently during execution. In contrast, the independent learning strategy reduces MARL to multiple single-agent policies without a coordination mechanism during training and fails to handle the nonstationarity arising from other agents’ changing policies.268 Value-based methods learn to decompose the joint value into individual players’ contributions as a form of credit assignment for efficient cooperation.31,269,270 Even so, some challenges persist, including the design of communication mechanisms,271 credit assignment, and environment nonstationarity.
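The value-decomposition idea can be made concrete with an additive (VDN-style) mixing of per-agent utilities; the two-agent game and all Q-values below are invented. Because the joint value is a sum, each agent's decentralized greedy choice recovers the centralized joint argmax, which is what makes the decomposition CTDE-friendly.

```python
import itertools

# Toy VDN-style value decomposition: joint value = sum of per-agent
# utilities (all numbers invented for illustration).
q1 = {0: 1.0, 1: 3.0}                 # agent 1's individual Q-values
q2 = {0: 2.0, 1: 0.5}                 # agent 2's individual Q-values

def q_tot(a1, a2):
    return q1[a1] + q2[a2]            # additive mixing

# centralized-training view: argmax over the joint action space
central = max(itertools.product(q1, q2), key=lambda a: q_tot(*a))
# decentralized-execution view: each agent maximizes its own Q alone
decentral = (max(q1, key=q1.get), max(q2, key=q2.get))
```

More expressive mixers (e.g., monotonic mixing networks) generalize this additive form while preserving the same argmax-consistency property.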
The past decade has witnessed several advances in deep RL’s theory and applications; in particular, the aforementioned high-level deep RL paradigms improve the plausibility of deep RL in complex sequential decision-making. Nevertheless, scaling deep RL to more real-world scenarios is nontrivial and still faces technical, safety, and efficiency bottlenecks. These originate from expensive interactions with environments, unstable policy-learning dynamics, and reward design. Fortunately, some recent AI agents and superior generative models have shown promise in alleviating these limitations, e.g., world model approximation,272 subgoal design,273 and credit assignment in a temporal and multi-agent sense.274,275 FMs and high-level RL approaches are increasingly intertwined in large decision-making models for balancing efficiency and accuracy. As decision-making models scale, the synergy between FMs and RL becomes essential for achieving advanced intelligence.
Intelligent decision-making paradigms with advanced FMs
To make the neural scaling law hold in decision-making, we inevitably have to cover sufficient decision-making scenarios for pretraining and meta-training. However, managing these processes can often be risky, particularly with robots that may engage in hazardous actions, creating a classic chicken-and-egg dilemma. In light of this challenge, collecting a diverse and high-quality dataset, developing a counterfactual predictor, and utilizing Sim2Real or Real2Sim2Real modules276 appear to be effective strategies for addressing the limitations in decision-making scenarios. Overall, we provide an overview of the promising approaches for constructing the FDMM, as shown in Figure 5.
Figure 5.
Advanced intelligent decision-making paradigms with foundation models
(A) Vision-Language-Action (VLA) integrates the hierarchical reasoning of LLM and the perceptual capabilities of vision models to decompose high-level tasks into executable subtasks, addressing computational and data bottlenecks in complex decision-making tasks caused by scenario diversity and partial observability.
(B) Learning from Videos (LfV) leverages large-scale, inexpensive online video datasets to transform raw video into structured transition trajectories for policy learning, enabling the training of generalist decision-making agents by exploiting diverse, real-world contextual information from noisy, uncurated videos.
(C) Generative Simulation (GenSim) utilizes simulation environments alongside components such as prompt-guided task proposal modules and agent modules to create diverse decision-making scenarios and optimize adaptive policies, thus minimizing dependence on expensive real-world data for training agents in complex tasks like robotics and autonomous systems.
Learning from demonstrations and vision-language-action
A natural schema for decision-making in AI is learning from demonstrations (LfD),277 whereby extensive decision-making sequences, guided by expert policies across diverse skills, are structured for a model to learn from. LfD leverages demonstrations as a rich source of supervision, providing examples of desired behaviors in a given environment. The offline setup of LfD facilitates the application of probabilistic models to generate scenario-specific episodes, enabling context-dependent policy learning. These probabilistic frameworks allow for a nuanced understanding of variability and uncertainty in decision-making processes, thus supporting the development of robust policies for partially observable environments. However, a challenge arises in scaling this framework: as scenario diversity increases, the number of required interaction episodes grows exponentially, leading to significant computational and data requirements. This is further compounded by partial observability, where incomplete information hinders accurate policy derivation. Vision-language-action (VLA) multimodal FMs have emerged as promising alternatives to address these challenges. VLA models integrate the hierarchical reasoning capabilities of LLMs with the perception capabilities of vision models to tackle complex decision-making tasks. By leveraging language’s inherent structure, VLA models decompose high-level tasks into manageable subtasks. Techniques such as behavior cloning278 are employed to directly map vision inputs (e.g., image or video tokens) to corresponding low-level execution actions. Notable instances include RT-2,279 UniPi,280 and OpenVLA,281 which showcase the utility of demonstrations in training generalist agents. These models capitalize on human decision-making priors encoded in language, utilizing demonstrations to refine their multimodal understanding and action execution capabilities.
Over time, demonstrations become indispensable for downstream tasks, enhancing the models’ ability to generalize across diverse environments. However, VLA models face critical limitations, including reliance on large-scale VLA datasets and multimodal decision-making episodes. This dependency restricts their scalability and generalization potential in unseen scenarios, posing a significant bottleneck in realizing their full capabilities. Research must address these data and computational challenges to ensure broader applicability and robustness in real-world tasks.
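Behavior cloning, the supervised core of the LfD pipeline above, reduces to regression from states to expert actions. The sketch below is a deliberately minimal linear, noiseless version: the expert weight matrix and data distribution are assumptions for illustration, standing in for the large vision-conditioned policies used in VLA models.

```python
import numpy as np

# Minimal behavior-cloning sketch: fit a linear policy a = s @ W to
# expert (state, action) demonstrations by least squares.
rng = np.random.default_rng(0)
W_expert = np.array([[1.0, -2.0], [0.5, 0.0]])   # hypothetical expert policy
states = rng.normal(size=(100, 2))
actions = states @ W_expert.T                    # expert demonstrations

W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)
cloned = states @ W_bc                           # learned policy's actions
```

In the noiseless linear case the clone recovers the expert exactly; with real demonstrations, distribution shift at execution time is the classic failure mode this simple objective does not address.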
Learning from videos
High-quality and diverse demonstrations are essential for training robust decision-making agents but often require significant time and financial investment to curate. To cultivate a generalist decision-maker, identifying cost-effective sources of episodes is paramount. The Internet provides a rich repository of inexpensive videos that capture real-world interactions with objects, showcasing how the environment itself functions as a generative model. This observation underpins learning from videos (LfV),282,283 which leverages large-scale video datasets to develop a comprehensive video FM capable of inferring implicit actions. LfV focuses on transforming raw video data into structured transition trajectories. With the help of techniques such as weak supervision and unsupervised learning, LfV annotates actions and rewards indirectly to reduce the reliance on labor-intensive manual labeling. This process enables the extraction of meaningful task demonstrations from uncurated and noisy datasets. The resulting episodes can then serve as datasets for policy-learning frameworks, bridging the gap between perception and decision-making. The advantage of the LfV paradigm lies in its ability to exploit the wealth of affordable data available online, significantly lowering the cost of generating training episodes. Moreover, the variety of scenarios depicted in videos introduces rich contextual information that supports learning generalized policies for complex tasks. Key advancements include the use of self-supervised learning paradigms and contrastive techniques to address challenges in action segmentation and state representation. However, LfV faces notable limitations. The inherent partial observability of video data—where critical states may not be directly visible—hampers accurate policy derivation. Noise from irrelevant objects and environmental distractors can disrupt the learning process, leading to suboptimal trajectories.
Furthermore, incomplete or imprecise action spaces pose another bottleneck, as videos may lack comprehensive coverage of all possible transitions within a task. Even the latest interactive generative decision-making model Genie-2 relies on massive expert annotations on the video.282 Hence, future explorations have to overcome these challenges through refining action space, representing multimodal signals, and integrating robust data-filtering mechanisms to ensure scalability and reliability.
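One common way to annotate actions indirectly, as described above, is an inverse dynamics model: fit a map from state changes to actions on a small labeled set, then use it to pseudo-label action-free video. The 1-D positions, assumed dynamics, and linear model below are all invented for illustration.

```python
import numpy as np

# LfV-style pseudo-labeling sketch: fit a ≈ w * (s' - s) on labeled data,
# then turn an action-free "video" of positions into transitions.
rng = np.random.default_rng(1)
obs = rng.normal(size=(50, 1))
act = rng.choice([-1.0, 1.0], size=(50, 1))
nxt = obs + 0.1 * act                  # assumed, noiseless dynamics

w, *_ = np.linalg.lstsq(nxt - obs, act, rcond=None)   # inverse dynamics fit

def pseudo_label(frames):
    """Label consecutive frames with inferred actions."""
    return [(frames[i], float(w[0, 0] * (frames[i + 1] - frames[i])), frames[i + 1])
            for i in range(len(frames) - 1)]

transitions = pseudo_label([0.0, 0.1, 0.0, 0.1])   # unlabeled frame sequence
```

The resulting (state, inferred action, next state) tuples can feed standard policy-learning pipelines; with real video, partial observability and distractors make the inverse model far harder to fit than in this toy case.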
Generative simulation AI
In efforts to mitigate data acquisition expense, generative simulation (GenSim)205,284,285,286 leverages simulation environments to facilitate decision-making and policy-learning processes. The GenSim framework consists of two key components: the task proposal module, which generates diverse simulation scenarios guided by prompts, and the agent module, which learns to adapt or optimize policies across these scenarios. This paradigm enables the exploration of complex and diverse decision-making contexts without relying exclusively on expensive real-world datasets. In implementations such as RoboGen,284 LLMs are integrated into the task proposal mechanism, enabling the decomposition of high-level tasks into manageable subtasks. This allows agents to interact with simulation APIs, retrieve knowledge, and train low-level policies through RL techniques. Such modular frameworks enhance scalability and adaptability, making them suitable for real-world applications like robotics and autonomous systems. OMNI286,287 exemplifies this by simulating comprehensive decision-making environments that replicate a wide spectrum of real-world conditions. In GenSim, the term “generation” encompasses two pivotal dimensions: the creation of decision-making scenarios and the strategic development of policies capable of robust adaptation across tasks.205 These simulated scenarios facilitate the testing and refinement of policies, bridging gaps between synthetic and real-world environments. This iterative feedback loop accelerates policy learning, reduces dependency on real-world data, and allows for proactive experimentation in high-stakes or risky contexts.
However, GenSim faces critical challenges. The effectiveness of this framework is contingent on the accuracy and fidelity of simulators, as discrepancies between simulated and real-world dynamics can degrade policy performance—a phenomenon referred to as the Sim2Real gap.288 Moreover, as noted in Real2Sim2Real frameworks,276 the complexities of transferring policies between simulated and real-world contexts add another layer of difficulty. Ensuring robust generalization, designing high-fidelity simulators, and validating policy performance in dynamic real-world environments remain pressing research directions for GenSim. By integrating LLM-guided task design, scalable simulation modules, and RL techniques, a GenSim such as Genesis represents a promising direction for cost-effective and adaptive decision-making research.205 However, addressing the technical and conceptual bottlenecks will be key to fully realizing its potential.
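The propose-then-train loop of the GenSim framework can be caricatured in a few lines. Everything here is a stand-in: `propose_task` replaces a prompt-guided LLM task proposal module, and `train_policy` replaces an RL agent module interacting with a simulator.

```python
import random

# Hypothetical GenSim loop: a task proposer generates scenario parameters;
# an agent module optimizes a scenario-specific policy in "simulation".
random.seed(0)

def propose_task():
    return {"target": random.uniform(-1, 1)}   # simulated scenario parameter

def train_policy(task, steps=100, lr=0.3):
    theta = 0.0
    for _ in range(steps):
        theta -= lr * 2 * (theta - task["target"])   # minimize (theta - target)^2
    return theta

errors = []
for _ in range(5):                     # propose-then-train feedback loop
    task = propose_task()
    theta = train_policy(task)
    errors.append(abs(theta - task["target"]))
```

The loop illustrates the two "generation" dimensions named above, scenario creation and policy development, while eliding the Sim2Real fidelity problem that dominates real GenSim systems.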
Key technologies for large-scale IDM
In this section, the key technologies of large-scale IDM are introduced. To more clearly illustrate how these technologies work together, a framework is first presented, as shown in Figure 6.289 This framework outlines the core technical components involved in the decision-making process of an intelligent agent and their inter-relationships, providing an intuitive perspective to understand the roles of various technological elements in IDM.
Figure 6.
Examples of key technologies for intelligent decision-making
The large-scale IDM system can be viewed as an “Agent,” whose core technical framework can be broadly divided into four modules: Memory, Planning, Tools, and Action. The Agent acquires information about the external environment through the Perception module and stores it in both short-term and long-term memory. The Planning module generates decision plans based on the current environmental state and historical information, while the Tools module provides external resources, such as computation and search functions, to assist in the decision-making process. Finally, the Agent executes the corresponding actions based on the planning results. This process affects the state of the environment, and through environmental feedback it further adjusts the Agent’s behavior strategy.
By integrating “memory,” “planning,” “tools,” and “action,” large-scale IDM systems can efficiently handle complex tasks, enabling the automation and optimization of the decision-making process. For example, in autonomous driving, the agent uses memory to recall traffic rules and historical information, plans routes, and applies tools (such as sensors and map data) in real time to determine the best driving strategy. In intelligent grid management, the agent utilizes historical data and real-time grid status to perform power scheduling and optimization, ensuring the efficient and stable operation of the grid. Here we explore each technical module and its role in IDM in more detail.
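A skeletal version of the Memory / Planning / Tools / Action loop can be written directly; the module contents below (a calculator tool, a trivial planner that combines the last two observations) are invented purely to show how the pieces connect.

```python
# Minimal sketch of the agent framework in Figure 6: perception feeds
# memory, planning consults memory, tools execute, and results feed back.

class Agent:
    def __init__(self):
        self.short_term = []                        # recent observations
        self.long_term = {}                         # persistent memory
        self.tools = {"add": lambda x, y: x + y}    # external tool registry

    def perceive(self, observation):
        self.short_term.append(observation)         # Perception -> Memory

    def plan(self):
        # Planning: combine the two most recent observations with a tool
        if len(self.short_term) >= 2:
            return ("add", self.short_term[-2], self.short_term[-1])
        return None

    def act(self, plan):
        name, *args = plan
        result = self.tools[name](*args)            # Tools -> Action
        self.long_term["last_result"] = result      # feedback into memory
        return result

agent = Agent()
agent.perceive(2)
agent.perceive(3)
result = agent.act(agent.plan())
```

Real systems replace each stub with heavy machinery (an LLM planner, a retrieval memory, API tools), but the control flow among the four modules is essentially this loop.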
Memory modules
In large-scale IDM technologies, memory technology optimizes the decision-making process by accumulating historical experiences, improving efficiency, and reducing errors. It enables systems to learn and improve in complex environments, accelerates learning through experience replay, and supports personalized and context-aware decision-making. Additionally, memory enhances the stability and interpretability of the system, allowing decision traces to be reviewed and improving transparency. In multi-agent systems, memory facilitates knowledge sharing and collaborative decision-making, enhancing overall performance. It also helps the system detect anomalies, diagnose issues, and adjust strategies in a timely manner, thereby strengthening the system’s robustness and adaptability.
Early explorations of memory mechanisms largely focused on model design and algorithm optimization, aiming to identify efficient methods for storing and utilizing historical information to improve task performance.290,291 Recurrent neural networks (RNNs),292 as one of the most classic and representative methods, enable the model to possess memory capabilities by cyclically passing information through hidden states across time steps. However, due to the problem of vanishing gradients, RNNs struggle to effectively learn long-term dependencies. This limitation results in RNNs being able to retain information only within short time ranges, performing poorly on tasks involving long-term dependencies.
To address this issue, scholars have proposed numerous improvement methods.293,294,295 Among them, LSTM networks293 introduced a gating mechanism on top of RNNs, which helps mitigate the vanishing gradient problem to some extent and significantly improves the ability to model short-term memory. As a result, LSTMs have been widely applied in time-series analysis and NLP tasks.296,297 However, LSTM still faces challenges such as a lack of parallelization and limited ability to model long-range dependencies, leading to inefficiencies when handling long-sequence tasks.
However, the aforementioned works struggle to meet the demands of long-term memory. To address this challenge, researchers have conducted extensive explorations into developing methods to enhance the modeling and utilization of long-term dependencies.298,299,300 Among them, memory networks298 propose a neural network architecture with explicit long-term memory storage, enabling knowledge storage and retrieval. This method represents an early successful exploration of explicit long-term memory modeling and laid the foundation for subsequent memory-augmented neural networks. By incorporating an external memory component, memory networks allow the model to access and utilize stored information more effectively, improving its performance in tasks requiring long-term dependencies and complex reasoning. In contrast, the Transformer58 utilizes the self-attention mechanism to dynamically allocate attention weights, capturing dependencies within the input sequence and breaking the limitations of traditional sequential time steps. This significantly enhances the model’s parallel computation capabilities. The attention mechanism dynamically selects key information relevant to the current task, reducing computational redundancy and enabling intelligent agents to make rapid decisions in a short time frame. This architecture has laid the groundwork for numerous large-scale pretrained models (e.g., GPT and BERT), establishing the widely adopted pretraining-and-fine-tuning paradigm in modern machine-learning research and applications.
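The attention mechanism referenced above reduces to scaled dot-product attention: each query scores all keys at once, so dependencies are captured without stepping through time sequentially. The query, key, and value matrices below are toy numbers chosen so that the query aligns with one "memory slot".

```python
import numpy as np

# Minimal scaled dot-product attention over a tiny key-value memory.

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # query-key similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)              # softmax over keys
    return w @ V, w

Q = np.array([[1.0, 0.0]])                          # one query
K = np.array([[10.0, 0.0], [0.0, 10.0]])            # two keys ("memory slots")
V = np.array([[1.0], [2.0]])                        # values stored at the keys
out, w = attention(Q, K, V)
# the query aligns with the first key, so out is close to V[0]
```

Because every query attends over every key in one matrix product, the computation parallelizes across the whole sequence, which is the property the surrounding text credits for the Transformer's efficiency.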
Since the introduction of Transformer, numerous improvement methods have been proposed based on its architecture.301,302,303,304 Among them is the Compressive Transformer, proposed in 2020,301 combining short-term and compressed memory to preserve historical context in long-sequence tasks, thereby enhancing Transformer’s performance in memory modeling. In the same year, a paper on RAG305 proposed a framework that augments language models with external database retrievals to supplement their limited knowledge. This framework facilitates long-term dependency modeling and knowledge-augmented task generation, effectively overcoming the knowledge memory bottleneck in language models. In recent years, Mamba306 and similar works have introduced a novel memory modeling framework by incorporating a selective state space model and linear recursion mechanisms. This approach facilitates efficient state updates with low computational complexity, showing potential usefulness in enhancing the performance of sequence-based tasks and contributing to long-term dependency modeling.
Planning and control technologies
The key technologies of IDM rely on sound planning methods, the use of tools, and the execution of actions. Planning technology analyzes task requirements and constraints to design an optimal action plan, ensuring that decisions are effective and feasible; tool-usage technology provides the agent with the auxiliary tools and resources it needs to process information, solve problems, or complete tasks efficiently; and action-execution technology ensures that the agent carries out the planned scheme accurately and promptly, ultimately achieving the desired goal. The organic integration of these three elements makes IDM both efficient and precise.
In the past, researchers have conducted extensive explorations of planning problems.307,308,309 Among them, the hierarchical task network (HTN) method307 plans by hierarchically decomposing tasks, progressively refining high-level goals into concrete low-level steps that form executable plans. As a classic representative of early planning technologies, this method has been widely applied over the past decades.
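A minimal sketch of HTN-style decomposition: compound tasks are recursively refined via a method table until only primitive, directly executable steps remain. The task names and method table are hypothetical examples; real HTN planners additionally track world state and method preconditions.

```python
# Hypothetical method table: each compound task maps to an ordered list of subtasks.
METHODS = {
    "make_tea": ["boil_water", "steep"],
    "boil_water": ["fill_kettle", "heat_kettle"],
}

def htn_decompose(task, methods):
    """Recursively refine a task until only primitive steps remain."""
    if task not in methods:                   # primitive task: directly executable
        return [task]
    plan = []
    for subtask in methods[task]:             # expand the compound task in order
        plan.extend(htn_decompose(subtask, methods))
    return plan
```

Calling `htn_decompose("make_tea", METHODS)` flattens the hierarchy into the executable planning path the paragraph above describes.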
With the integration of machine learning and environmental modeling, World Models,310 proposed in 2018, introduced the concept of agents planning and making decisions by learning latent representations of the environment. This approach simulates environmental states using a world model, overcoming the challenges of complex decision-making in dynamic environments. It significantly enhances the generalization ability and efficiency of task planning and has been widely applied in strategy games and RL tasks.
In addition to the aforementioned technologies, researchers have also begun exploring methods of “tool usage,” aiming to enhance the planning and execution capabilities of intelligent agents by enabling them to learn how to use existing external tools effectively. Some studies311,312 integrated neural networks with modular tools for logical reasoning and complex task decomposition, demonstrating the potential of models to utilize tools for task completion. The recently proposed Toolformer151 uses self-supervised learning to train models to autonomously decide when to call APIs, which parameters to pass, and how to integrate the API results into the language model’s text predictions. This approach significantly improves the model’s zero-shot capabilities across various tasks. However, Toolformer suffers from limitations such as a fixed tool-invocation mechanism, insufficient contextual adaptability, and limited generalization to complex tasks. ToolLLM313 constructed an instruction-tuning dataset called ToolBench for tool usage and proposed a depth-first-search decision tree method, enabling open LLMs to utilize over 16,000 real-world APIs. This approach addresses the limitations of open models in tool invocation and complex task execution, significantly enhancing their reasoning and generalization capabilities. However, ToolLLM excels only in narrow task domains, such as specific operations within a certain class of tools, and struggles with multitask, multidomain interactive tasks. In contrast, the recently proposed OS-Copilot314 constructs a general framework for comprehensive interaction with operating systems, enabling agents to operate autonomously across domains such as web pages, terminals, files, multimedia, and third-party applications.
This approach addresses the limitations of current agents in task scope and tool adaptability, providing a technical foundation for building general-purpose OS-level agents and advancing their evolution from tool invocation to multitask, multidomain adaptability in open environments.
With the development of LLMs such as GPT-4, planning techniques that integrate reasoning and acting have become a new research focus. Among these, ReAct315 enables efficient planning by combining natural language CoT reasoning with tool utilization: it leverages the reasoning capabilities of LLMs to break down complex tasks and interacts directly with tools or environments to execute actions, paving the way for more advanced and versatile planning systems. More recently, the world knowledge model method316 has been proposed, which integrates prior task knowledge with dynamic state knowledge. This approach significantly enhances an agent’s global planning and dynamic adaptation capabilities in complex environments, achieving substantial performance breakthroughs across various tasks. Voyager242 further advanced exploration and skill reuse in open-world environments. By leveraging LLMs, Voyager generates action plans and code through natural language, dynamically adapting to task objectives, and incorporates a long-term memory mechanism to store and reuse skills. It demonstrated autonomous exploration and task execution in the open-world game Minecraft, addressing challenges related to autonomous exploration, skill acquisition, and continual learning in open environments. This significantly enhanced the ability of agents to generate and optimize actions in dynamic settings.
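The ReAct-style loop of interleaved reasoning, tool calls, and observations can be sketched as follows. The scripted `llm` and the `lookup` tool are stand-ins for a real language model and environment, and the plain-text `Action:`/`Finish:` protocol is an illustrative simplification of how such agents are actually prompted.

```python
def react_agent(question, llm, tools, max_steps=5):
    """Alternate model steps and tool observations until the model emits Finish."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                      # the model drives the loop
        transcript += step + "\n"
        if step.startswith("Finish: "):
            return step[len("Finish: "):]           # final answer
        if step.startswith("Action: "):
            name, arg = step[len("Action: "):].split("|", 1)
            observation = tools[name](arg)          # execute the chosen tool
            transcript += f"Observation: {observation}\n"
    return None

# Scripted stand-ins: a real system would query an LLM and call real tools here.
def scripted_llm(transcript):
    if "Observation:" not in transcript:
        return "Action: lookup|Transformer introduction year"
    return "Finish: 2017"

tools = {"lookup": lambda query: "The Transformer was introduced in 2017."}
answer = react_agent("When was the Transformer introduced?", scripted_llm, tools)
```

The growing transcript is what lets the model condition each new thought or action on all previous observations, which is the core of the reasoning-plus-acting design.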
Simulation technology and its pivotal role in IDM
Simulation replicates real-world processes or systems using mathematical formulas, physical models, machine-learning algorithms, computer-generated representations, or their combinations, enabling the study of complex behaviors, underlying characteristics, and emergent phenomena.317 By exploring cause-and-effect relationships and model-based scenarios, such “what-if” analyses serve as invaluable tools for evaluating potential outcomes and informing future decisions.318 In addition to directly supporting decision-making, simulation technology plays an indispensable role in IDM paradigms such as RL and FMs, serving multiple critical functions: providing optimized learning environments, generating vast amounts of data, and enabling comprehensive testing and evaluation. As a result, this technology has been extensively adopted for analysis and decision-making across diverse sectors, especially for complex systems including transportation, social systems, economics, military operations, and energy management.319 Despite the controllable analyses enabled by simulation-based approaches, current methods face several limitations in real-world applications: (1) balancing computational efficiency and simulation accuracy is difficult; (2) designing simulation environments that improve a model’s generalization to unseen scenarios is challenging; and (3) the coupling mechanisms of decision variables remain unclear. Fortunately, simulation, as an interdisciplinary solution, has continuously embraced cutting-edge information, communication, and AI technologies to advance itself.
Examples include knowledge-data jointly driven modeling, multimodal and multitask simulation, and computational experimental methods,320 which have given rise to innovative simulation concepts such as parallel intelligence,321 generative simulation,205 and digital cousins.322 Notably, incorporating LLMs into modeling and simulation has revitalized this approach, gaining widespread attention and advancing IDM in complex systems.
Simulation-based intelligent decision-making
The US Department of Defense defined simulation as the use of models—physical, mathematical, or other logical representations of systems, entities, phenomena, or processes—to replicate the operations of real-world processes or systems over time, with the goal of supporting management or technical decision-making.323 It enables experimentation, hypothesis testing, and scenario analysis, offering valuable insights into system behaviors under varying conditions and supporting decision-making across many fields. Advancing technology has propelled simulation forward, giving rise to innovative concepts like parallel intelligence,321 LLM-empowered agent-based modeling and simulation,324 and simulation intelligence decision generation,319,325 all closely tied to IDM. Based on how simulation supports decision-making, its motivations can be categorized as follows. (1) Simulation-based prediction explores the solution space by analyzing the future trends of one or more variables, facilitating future decisions. For instance, an interactive individual-based simulator was created to predict the future spread of an epidemic through multisource information fusion during the COVID-19 outbreak.326 (2) Causal reasoning conducts hypothetical experiments by altering externally applied interventions, enabling decision adjustments based on experimental results and supporting intervention and management. For example, to reach a Pareto optimum of efficacy versus cost, Zhu et al.327 proposed a universal computational experiment framework with a fine-grained artificial society integrated with functional data-based models, designed to evaluate the effects of different interventions. (3) Emergence discovery leverages multiscale simulation to study emergent behaviors and element-coupling mechanisms, yielding new knowledge that enhances decision-making.
Taking epidemic control in large transportation hubs as an example, by developing individual-level mobility models and contact networks, the spread of infectious diseases can be accurately modeled and characterized.328 It was found that cumulative incidence increases in a linear growth mode, in contrast to the exponential growth mode observed in a static city-level network, a finding that can be leveraged to devise more effective control strategies.
Simulation-enhanced intelligent decision-making
Simulation also plays a vital role in the training, learning, and evaluation of multi-agent RL and embodied agents. It provides safe, efficient, and customizable environments, generates large-scale training data, and enables comprehensive assessments of model performance and generalization. Moreover, it offers critical support for model optimization and predeployment testing (Sim2Real) in real-world applications.329 Based on how simulation aids IDM paradigms such as RL and FMs, its motivations can be categorized as follows.
(1) Provide a safe, low-cost, and customizable testing or interactive environment, and generate training and testing data to accelerate the learning process. For example, in autonomous driving and robotics, testing in real-world environments may involve safety risks and high experimental costs. Simulation environments can be customized to create various conditions as needed, enabling the safe and cost-effective reproduction of scenarios that are rare, hazardous, or difficult to reproduce in real-world settings. This allows for comprehensive testing of a model’s robustness and adaptability. Typical platforms include TongVerse,330 Isaac Sim,331 and Genesis.
(2) Evaluate the generalization ability of models and support various evaluation metrics. Simulation allows models to be tested across diverse scenarios that mirror real-world conditions, helping to verify their ability to generalize from simulation to reality. The introduction of generative simulation205 has made this process more cost-effective and efficient. Moreover, simulation environments support various evaluation metrics, such as average reward, best single-instance reward, and sample efficiency, enabling a thorough and reliable assessment of a model’s performance. For example, RL-CycleGAN, trained in simulation and validated in real-world robotic grasping tasks, has showcased exceptional results.332
(3) Optimize RL through human feedback to achieve human-machine value alignment. In RLHF, simulation can be used to generate post hoc feedback to evaluate whether a model’s behavior is truly beneficial. For example, reinforcement learning from hindsight simulation (RLHS) simulates plausible consequences and then elicits feedback to assess which behaviors were genuinely beneficial in hindsight, thereby reducing inconsistencies between the model’s actions and human values.333 Experimental results show that RLHS consistently outperforms RLHF in helping users achieve their goals and earns higher satisfaction ratings.
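The evaluation role of simulation described in point (2) can be sketched with a toy harness: run a policy for many seeded episodes of a simulated environment and report metrics such as average and best reward. The one-dimensional grid world, the policy, and the metric names below are illustrative assumptions, not drawn from any cited platform.

```python
import random

class GridSim:
    """Toy 1-D grid simulator: agent starts at 0; reward 1 for reaching the goal."""
    def __init__(self, goal=3, horizon=20):
        self.goal, self.horizon = goal, horizon

    def run_episode(self, policy, seed=None):
        rng = random.Random(seed)           # seeding makes runs reproducible
        pos = 0
        for _ in range(self.horizon):
            pos += policy(pos, rng)         # an action is a step of -1 or +1
            if pos == self.goal:
                return 1.0
        return 0.0

def evaluate(env, policy, episodes=100):
    """Run many simulated episodes and report common evaluation metrics."""
    rewards = [env.run_episode(policy, seed=i) for i in range(episodes)]
    return {"average_reward": sum(rewards) / len(rewards),
            "best_reward": max(rewards)}

metrics = evaluate(GridSim(), lambda pos, rng: 1)   # policy that always steps right
```

The same pattern scales to realistic simulators: the environment supplies cheap, repeatable episodes, and the harness aggregates whichever metrics the assessment requires.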
Open challenges and future directions
In this discussion, we explore the open challenges and future directions of simulation-based IDM across the following aspects. (1) Data and knowledge jointly driven modeling and simulation. Because of the gap between simulated and real environments, models that perform excellently in simulation may not maintain the same effectiveness when deployed in real-world situations, so it is nontrivial to jointly utilize information at different scales to build the simulation environment. Knowledge-based methods are constrained by the cognitive capabilities of their time and often fail to precisely capture the evolution mechanisms of complex systems. Data-driven methods, in contrast, rely on the quantity and quality of data samples, and their effectiveness can degrade greatly in scenarios not covered by the observed data. Combining the advantages of both, data and knowledge jointly driven modeling and simulation is thus an important direction for future research. (2) Tradeoff between computational efficiency and simulation accuracy in large-scale simulation systems. Simulation environments face challenges in efficiency, scalability, and resource consumption when scaling up to large and complex scenarios. Optimizing simulation environments to support large-scale training and testing is a pressing challenge, particularly for models that rely on API-based commercial LLMs. Studying system-level optimization (e.g., computational task optimization) and prompt-level optimization (e.g., novel prompting strategies) to ensure accurate results while reducing running time is therefore another issue worth investigating. Beyond these points, several other issues have attracted widespread attention and merit investigation, including constructing open, scalable simulation platforms, large-model-empowered modeling and simulation workflows, achieving continual learning in simulation environments, multimodal and multitask simulation, and more.
By tackling these open challenges and exploring future directions, simulation technology holds the potential to significantly enhance IDM and propel further advancements in AI.
FM-based IDM for sciences
With the continuous advancement of AI technology, FMs have become increasingly integral in the field of science, significantly contributing to the enhancement of scientific research and decision-making capabilities. This section explores the applications of FMs across diverse scientific domains, detailing their roles in bolstering research capabilities and the quality of decision-making processes in information science, mathematical science, life science, healthcare, dentistry, urban science, agricultural science, economic science, and educational science. The outline is shown in Figure 7.
Figure 7.
Foundation-model-driven intelligent decision-making in multidisciplinary sciences featuring the core roles of FM in intelligent decision-making, supported by diverse data types for training, and showcasing applications in key scientific fields such as information science, mathematical science, life science, healthcare, dentistry, urban science, agricultural science, economic science, and educational science
Information science
FMs, pretrained on large-scale data with self-supervised learning, have demonstrated strong generalization across diverse downstream tasks in information science.50 Their capacity for transfer learning enables success in domains such as inference,334 control,335 planning,336 and searching,50 with applications spanning robotics, automation, remote sensing, communications, and power systems. For example, FM-driven models empower robots to operate in the real world and support human decision-making through data-driven insights. Unlike task-specific models, FMs can generalize to unseen problems by leveraging shared features across tasks, which enables in-context learning and cross-modal processing.48 Gato,337 for instance, acts as a generalist agent that can chat, caption images, play games, and control a robot arm.334 By integrating diverse datasets, FMs enhance sequential decision-making in areas like Atari games,11 board games,338 and robotic tasks,339,340 holding significant promise for future intelligent systems.
Generalizable robotics and autonomous systems
Prior to the emergence of FMs, deep learning in robotics relied heavily on task-specific datasets, which constrained both flexibility and scalability.341 Traditional robotic systems required manually curated datasets tailored to specific tasks, limiting their ability to generalize to complex or unfamiliar environments. FMs, however, have transformed this paradigm by leveraging large-scale pretraining on diverse datasets, followed by task-specific fine-tuning. This approach allows FMs to learn transferable representations, enabling robots to extract high-level semantic features from raw sensory inputs and apply them to various decision-making processes. Specifically, techniques such as in-context learning and instruction tuning empower robots to infer task objectives from natural language prompts or multimodal cues rather than requiring explicit retraining. One of the most transformative features of FMs is their zero-shot capability, achieved through contrastive learning and prompt-based adaptation. These mechanisms allow robots to generalize to unseen tasks without task-specific training, significantly improving their adaptability in unstructured or novel environments.342 Notable examples of FMs, such as BERT,59 GPT-3,61 GPT-4,13 CLIP,14 DALL-E,62 and PaLM-E,238 demonstrate the versatility of these models in robotics. BERT, originally designed for natural language processing, helps robots decode complex semantic information, especially for multistep language instructions. GPT-3 and GPT-4, known for their natural language reasoning and generation abilities, allow robots to process user commands and create multistep action plans in dynamic environments. CLIP aligns text with visual representations and enables robots to identify and interact with objects based on textual descriptions. DALL-E enhances visual tasks by generating synthetic environments for task rehearsal and route planning.
In multimodal reasoning, FMs integrate heterogeneous sensory data into unified representations, improving robots’ perception, reasoning, and decision-making.343,344,345,346,347,348,349 This capability enables robots to link textual commands with objects, locations, and actions, facilitating spatial reasoning and task execution in real-world settings. For instance, PaLM-E integrates data from visual, linguistic, and sensory inputs, equipping robots with robust reasoning capabilities for complex scenarios.238 Recent advancements in robotic swarm intelligence further illustrate the impact of FMs. Unlike traditional swarm robots that rely on predefined communication protocols and task-specific planning strategies, human-like swarm behavior emerges when the LLM DeepSeek is leveraged for reasoning and communication.350 In a decentralized multirobot system where each agent initially possesses only local information and is unaware of the others’ existence, FMs enable robots to discover peers, exchange information, and coordinate dynamically using natural language. Experimental results in zero-shot settings reveal emergent social behaviors such as collaboration, negotiation, and mutual error correction, mimicking aspects of human teamwork. This novel approach highlights how FM-driven agents can form interactive societies, advancing the study of “robot anthropology” and shedding light on emergent collaborative structures in autonomous systems.
FMs’ ability to transfer knowledge from pretraining reduces the training time and computational resources required compared to traditional models. In imitation learning, FMs leverage expert demonstrations in visual or textual formats to generate high-quality strategies.351 In RL, FMs utilize language-driven reward mechanisms to optimize policies and improve task performance with fewer iterations.352 Moreover, large vision-language models (VLMs) assist robots in visual question answering and generate descriptive labels for visual content, which simplifies data annotation and task execution.353 Through fine-tuning, FMs adapt to various robotic applications, such as autonomous systems, household assistants, industrial automation, and multirobot coordination.354 These advancements highlight the transformative impact of FMs in enhancing cross-modal reasoning and bridging user intent with machine actions. The integration of FMs in robotics represents a significant milestone in the development of autonomous systems. Unlike traditional rule-based or task-specific learning approaches, FMs leverage large-scale pretraining on multimodal data, enabling them to generalize across diverse scenarios. By utilizing transformer-based architectures and self-supervised learning, FMs can parse and infer user intent from natural language commands, ensuring that autonomous systems align closely with human objectives. Specifically, techniques such as prompt engineering and instruction tuning enable FMs to dynamically adjust their responses based on contextual cues, improving decision-making in dynamic environments.
Beyond understanding commands, FMs enhance IDM in robotics through structured reasoning and few-shot learning. They achieve this by leveraging cross-modal embeddings, which allow autonomous systems to correlate sensory inputs (e.g., vision, language, and motion data) and make context-aware decisions.355 For example, models like GPT-413 and PaLM-E238 have demonstrated proficiency in processing intricate language instructions and translating them into executable robotic actions with high accuracy. This transformation is facilitated by contrastive learning and RL techniques that fine-tune the model’s response patterns based on real-world feedback. Furthermore, the reliability of these models depends on both the quality of their learned representations and the optimization of prompt structures, reinforcing the importance of fine-tuning strategies for specific downstream tasks. This deep integration of FMs in robotics establishes them as a foundational technology for the future of autonomous systems.
Multimodal understanding in remote sensing
Remote sensing technology has advanced significantly in recent years, driven by the development of diverse sensors, including optical, thermal, and radar, which enable the collection of high-resolution data about the Earth’s surface. Optical sensors capture visible and near-infrared light for vegetation and land cover analysis, thermal sensors detect heat signatures for monitoring volcanic activity and climate change, and radar sensors provide crucial data in extreme weather conditions for tasks such as soil moisture estimation and urban infrastructure mapping.61,356,357 FMs enhance remote sensing by integrating large-scale multispectral and multitemporal data through self-supervised learning techniques. Unlike conventional remote sensing models that require task-specific feature engineering, FMs leverage transformer architectures to learn spatial and temporal correlations across diverse sensor modalities. For instance, VLMs pretrained on satellite imagery and geospatial descriptions enable zero-shot classification and segmentation of land cover changes without requiring extensive labeled datasets. Additionally, contrastive learning techniques allow FMs to align satellite images with textual descriptions, improving their ability to extract meaningful patterns from heterogeneous remote sensing data. These capabilities significantly enhance the efficiency and accuracy of tasks such as deforestation monitoring, disaster response, and climate modeling, demonstrating the transformative impact of FMs in remote sensing applications.
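The contrastive alignment of satellite images with textual descriptions mentioned above can be illustrated with a NumPy sketch of a symmetric CLIP-style (InfoNCE) objective, where matched image-text pairs lie on the diagonal of the similarity matrix. The embeddings and temperature below are illustrative; real systems compute the embeddings with learned encoders.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image/text pairs sit on the diagonal."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # pairwise cosine similarities
    labels = np.arange(len(img))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerically stable log-softmax
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

aligned_loss = clip_contrastive_loss(np.eye(4), np.eye(4))
mismatched_loss = clip_contrastive_loss(np.eye(4), np.eye(4)[::-1].copy())
```

Minimizing this loss pulls each image embedding toward its own caption and away from all other captions in the batch, which is what enables the zero-shot classification described above.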
The application of FMs to remote sensing tasks, such as scene classification, semantic segmentation, object detection, and change detection, has substantially improved performance and set new benchmarks in this field. Initially, CNNs such as ResNet72 were used to improve image recognition and classification. Later, transformers were introduced, utilizing self-attention mechanisms to model long-range dependencies and handle large-scale image data more effectively.58,358 In remote sensing, FMs’ ability to leverage self-supervised learning allows them to learn robust representations even without extensive labeled datasets, enhancing their versatility.359 Satellite Masked Autoencoder (SatMAE), a model designed specifically for temporal and multispectral satellite imagery, is a notable contribution to this field.360 By employing masked autoencoders, SatMAE learns both spatial and temporal representations of satellite images, making it especially effective for tasks like change detection, where understanding the evolution of a region over time is crucial. Another significant development is Scale-MAE, which incorporates scale-aware learning into the autoencoder framework, enabling the model to capture geospatial representations at multiple scales.361 This ability is crucial for applications such as urban planning, where both macro- and micro-level details matter for infrastructure mapping and land-use classification. Furthermore, DINO-MC enhances the capability of FMs in remote sensing by improving global-local alignment through a self-supervised learning approach.362 DINO-MC extends contrastive learning methods to align global features with local image patches, which enhances performance in object detection and scene classification tasks.
By leveraging the power of FMs, these models offer significant advances in processing complex remote sensing data, driving improvements in environmental monitoring and urban development planning. Although these models face challenges such as the need for high-quality, diverse datasets and substantial computational resources, the progress made by FMs marks a new era in remote sensing, setting new benchmarks for both accuracy and efficiency in the field.
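The masked-autoencoder pretraining behind models like SatMAE starts by patchifying an image and hiding a large random subset of patches; the encoder sees only the visible patches, and a decoder must reconstruct the masked ones. The sketch below shows only this masking step, with an assumed patch size and mask ratio, and is not the SatMAE implementation.

```python
import numpy as np

def mask_patches(image, patch=4, mask_ratio=0.75, seed=0):
    """Patchify a 2-D image and randomly mask a fraction of patches (MAE-style)."""
    h, w = image.shape
    patches = (image.reshape(h // patch, patch, w // patch, patch)
                    .swapaxes(1, 2)
                    .reshape(-1, patch * patch))       # (num_patches, patch_dim)
    n = len(patches)
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(n, size=int(n * mask_ratio), replace=False)
    visible = np.setdiff1d(np.arange(n), masked_idx)   # only these reach the encoder
    return patches, visible, masked_idx

image = np.arange(16 * 16, dtype=float).reshape(16, 16)
patches, visible, masked_idx = mask_patches(image)
```

Reconstructing the hidden 75% from the visible 25% forces the encoder to learn the spatial regularities of the imagery, which is what makes the learned representations useful for downstream tasks such as change detection.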
Efficient intelligent manufacturing
Traditional machine-learning approaches encounter substantial challenges in processing multimodal data within intelligent manufacturing systems, as they typically rely on task-specific feature engineering and require individually designed models tailored to specific production lines or distinct manufacturing processes. This paradigm imposes significant computational and labor costs while limiting model generalization across diverse industrial scenarios. Furthermore, traditional ML models often struggle to integrate heterogeneous data sources effectively, restricting their capacity for cross-modal reasoning and adaptive decision-making. In contrast, FMs leverage cross-modal embedding representation learning to map multimodal data into a unified vector space, facilitating seamless information fusion and collaborative decision-making across different modalities. Through large-scale pretraining on diverse datasets, FMs acquire robust zero-shot and few-shot learning capabilities, enabling them to generalize to previously unseen tasks with minimal adaptation. These properties significantly enhance their scalability, adaptability, and overall performance in dynamic and complex industrial environments, offering a promising direction for the development of intelligent, data-driven manufacturing systems. Traditional deep-learning models in prognostics and health management often face challenges such as limited generalization, difficulty in handling multimodal data, and the inability to multitask, which hamper their application in dynamic industrial environments.
Leveraging their ability to capture long-term dependencies, GPT-like models excel in processing diverse sensor data streams, such as vibration,363 sound, current,364 voltage,365 temperature,366 and pressure.367 For instance, the Time Series Transformer (TST) integrates time-series tokenization and Transformer architectures and significantly outperforms conventional CNNs and RNNs in fault mode recognition for rotating machinery.368 Furthermore, by incorporating domain knowledge through prompt engineering, FMs enhance both the quality and accuracy of outputs without altering model architecture.369 The VS-LLaVA pipeline370 was extended by applying LLMs to signal parameter identification and fault diagnosis, yielding substantial performance improvements.
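The time-series tokenization used by Transformer-based models such as TST can be sketched as slicing a sensor stream into fixed-length, possibly overlapping windows that play the role of tokens; the window and stride values below are arbitrary illustrative choices, not those of any cited model.

```python
import numpy as np

def tokenize_series(series, window=4, stride=2):
    """Slice a 1-D sensor stream into overlapping fixed-length windows ('tokens')."""
    starts = range(0, len(series) - window + 1, stride)
    return np.stack([series[i:i + window] for i in starts])

signal = np.arange(10.0)          # stand-in for a vibration or current reading
tokens = tokenize_series(signal)  # each row is one token fed to the Transformer
```

Each window is then embedded and processed by self-attention exactly like a word token, which is how such models capture long-term dependencies across a sensor stream.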
In parallel, the paradigm of intelligent manufacturing is transitioning from machine-centric to human-centric models, with human-robot collaboration (HRC) enabling greater flexibility and efficiency in multivariety, small-batch production.371 Despite its potential, HRC is constrained by task-specific limitations and the need for retraining when encountering novel objects. FMs, with their robust reasoning and generalization capabilities, address these constraints, making them ideal for diverse HRC tasks. Initial research enhanced robot perception using computer vision techniques like gesture recognition and motion pattern encoding,372,373,374,375 while recent efforts have shifted toward generalizable task execution frameworks. For example, the Robotics Transformer is trained on large-scale, task-agnostic datasets to achieve generalization across diverse robotic tasks.376 Furthermore, FMs have been integrated with vision foundation models (VFMs) for scene perception and with LLMs for task reasoning, creating a pipeline that generates and executes control code and enables robots to handle previously unseen tasks under language and visual guidance.
Driving the intelligence of next-G communications
The technical challenges in communication networks stem from their dynamic nature, complexity, and increasingly diverse demands, encompassing network configuration and enhanced security.377,378 FMs, with their powerful capabilities in multimodal data processing, generalization, and contextual understanding, offer the potential to solve these issues collaboratively across domains, providing crucial technical support for the intelligent and efficient operation of future communication networks.
Network configuration involves setting parameters for various devices within the network, such as switches, routers, servers, and network interfaces, to ensure reliable data transmission from the source to the destination. A flexible and efficient network configuration framework is a critical technology supporting resource scheduling, traffic management, and service optimization in next-generation (next-G) communications. CloudEval-YAML379 is proposed as a benchmark tool for YAML configurations in cloud-native applications, with 12 LLMs analyzed for generation quality, task performance, and cost efficiency. It addresses the lack of standardized benchmarks, aiding LLM application and optimization in cloud environments. Leveraging generative mechanisms such as autoregressive generation (e.g., GPT-4) or diffusion models (e.g., DiffusionBERT380), verified prompt programming381 enhances the accuracy of network configuration by integrating GPT-4’s generation capabilities with structured prompts and human-in-the-loop validation. Through iterative refinement of model-generated code via prompt engineering and manual corrections, this approach ensures more precise and reliable automated configuration processes. With the advancement of large models, LLM-driven end-to-end approaches are emerging as key to intelligent network configuration. Furthermore, a general framework is proposed for fully automated network management systems, eliminating the need for manual validation.382 Leveraging natural language and LLM-generated code, this approach utilizes prompt engineering to integrate domain knowledge with general program synthesis techniques, ensuring the generation of high-quality network configuration code.
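The iterative refine-and-validate pattern described above for LLM-generated network configurations can be sketched generically: generate a candidate configuration, validate it, and feed the validation message back into the next prompt. The mock generator, validator, and the MTU rule below are hypothetical stand-ins for a real LLM and a real network checker.

```python
def generate_with_validation(generate, validate, max_attempts=3):
    """Refine model-generated configuration until it passes validation."""
    feedback = ""
    for _ in range(max_attempts):
        config = generate(feedback)        # feedback becomes part of the next prompt
        ok, feedback = validate(config)
        if ok:
            return config
    return None                            # escalate to a human operator

# Hypothetical stand-ins: a real system would call an LLM and a config validator.
attempts = []
def mock_generate(feedback):
    attempts.append(feedback)
    return {"mtu": 9000} if feedback else {"mtu": 70000}   # first draft is invalid

def mock_validate(config):
    if config["mtu"] > 9216:
        return False, "mtu exceeds hardware limit; use a value <= 9216"
    return True, ""

config = generate_with_validation(mock_generate, mock_validate)
```

The human-in-the-loop variant simply replaces (or supplements) `validate` with manual review, matching the prompt-plus-correction workflow described above.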
Advancing telecom technologies brings increased complexity and interconnectivity, heightening the sophistication and variety of network attacks and making network security and attack detection especially important. Bayer et al.383 developed a high-quality cybersecurity dataset and proposed a domain-specific language model as a foundational component to enhance understanding of specialized knowledge and technical terms. Furthermore, SecureBERT is designed to capture the semantic meaning of cybersecurity texts such as cyber threat intelligence.384 Trained on a large corpus of cybersecurity-related content, it retains general semantic understanding while being tailored to various cybersecurity tasks. In contrast to approaches that fine-tune pretrained LLMs, a security-specific model has also been built from the ground up.385 This model is a BERT-based network threat detector that uses privacy-preserving fixed-length encoding and a byte-level byte-pair encoding tokenizer to process network traffic data. With a compact size of only 16.7 MB and an inference time under 0.15 s on a standard CPU, the model is remarkably efficient. It also outperforms traditional machine-learning and deep-learning methods in identifying 14 distinct attack types, achieving an overall accuracy of 98.2%.
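The fixed-length, byte-level traffic encoding described above can be illustrated with a minimal sketch. This is a didactic simplification: the cited detector additionally merges frequent byte pairs using a trained BPE vocabulary, which is omitted here.

```python
# Minimal sketch of fixed-length, byte-level encoding of raw packet
# payloads: each byte becomes its own token id, and sequences are
# truncated or padded to a fixed window so the model sees uniform input.

PAD, SEQ_LEN = 256, 16  # reserve id 256 for padding; 16-token window

def encode_packet(payload: bytes, seq_len: int = SEQ_LEN) -> list:
    """Map bytes to token ids 0..255, then truncate/pad to seq_len."""
    ids = list(payload[:seq_len])          # byte values as token ids
    ids += [PAD] * (seq_len - len(ids))    # right-pad short packets
    return ids

tokens = encode_packet(b"GET /index")
```

Because the mapping never leaves the device as raw text, a fixed-length scheme of this kind also supports the privacy-preserving property mentioned above.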
Advancement in power systems
Electricity, a key component of the energy system, deeply affects our daily lives in various ways. To promote global electrification and achieve carbon neutrality, it is imperative to build highly efficient, flexible, and interconnected power systems. New technological breakthroughs, such as the Internet of Things (IoT) and AI, have brought both exciting opportunities and critical challenges to the digital, intelligent transformation of the power industry.
Recently, AI-based large model technologies such as LLMs have made remarkable progress, showing promising potential across a wide range of global industries.386 OpenAI's generative pre-trained transformer (GPT) family is representative: the latest GPT-4 improves large-model performance through a deeper Transformer architecture and innovative pretraining strategies, greatly broadening applications across domains. This iterative technological evolution in LLMs profoundly influences the power industry and promotes the development and application of power-specific large models.
In power systems, traditional data acquisition often relies on feature selection guided by prior experience, which is inefficient and subjective. In contrast, automated data analysis based on large AI models breaks the limitations of manual selection by learning from large amounts of variable data.52 The massive data and multifactor complexity of power systems also provide excellent opportunities for training and applying large AI models, which extract information from datasets aggregated from smart terminals into the cloud and thereby improve the predictive accuracy of analytical models. At present, IDM technologies based on large models have been preliminarily applied to the intelligent diagnosis, operation, and maintenance of electrical equipment. Their powerful data-processing capability, self-learning ability, and analysis-and-warning functions effectively address problems such as insufficient diagnostic accuracy, delayed response, and the high operation and maintenance costs of traditional technologies. Meanwhile, by deeply mining the operational data of electrical equipment, large-model IDM can warn of potential faults in advance, accurately localize fault sources, and optimize operational strategies to improve operation and maintenance efficiency.
For instance, the State Grid Corporation of China has launched an AI-assisted power decision-making system built on extensive basic data and evaluation models.387 Based on evaluation standards for distribution-network equipment, this system establishes a comprehensive evaluation framework that integrates static equipment parameters with dynamic operation data, enabling the assessment of each piece of main equipment in a station or on a line. Intelligent decisions can accordingly be made to support station inspection, operation, and maintenance strategies. China Southern Power Grid has also developed a multimodal power model, “Big Watt,” which employs AI technologies to analyze data such as grid operation information, user load, weather forecasts, and terminal detection, providing detailed analysis and prediction for the power system’s operation and maintenance.388 This large model can recognize typical defect hazards in the distribution grid and suggest fast, accurate responses to emergencies or unforeseen circumstances, greatly improving the resilience and adaptability of power grids and systems.
In addition, the ABB Ability data platform combines AI with cloud computing, big data, and 5G to establish a cross-domain power assistance system that computes and analyzes collected data, realizing fault analysis and remote diagnosis of power equipment and improving the efficiency of intelligent operation and maintenance in power systems.389 Similarly, the Swiss company Alpiq has launched the Grid Sense system, which uses AI technologies to analyze electrical load, system faults, and power detection in power systems.390 By closely integrating advanced information technologies with power systems, it addresses problems of manual inspection such as high labor costs, high work intensity, and poor inspection results. These cases demonstrate the power of large AI models with IDM capabilities in status monitoring, fault prediction, and related tasks, significantly enhancing the safety and reliability of power systems.
With the continued deepening and popularization of AI technologies, IDM-assisted techniques for power system operation and energy management are moving toward greater refinement, real-time capability, and collaboration. Recent advances also indicate that future power systems will rely closely on large AI models for the deep integration and intelligent analysis of massive, complex data, realizing dual gains in operation and maintenance efficiency and grid resilience. Large models are widely expected to push power systems in a more intelligent, reliable, and green direction.
Data quality and data availability
In the field of computer and information science, particularly in industrial systems and next-G communication, data (e.g., equipment failure records, sensor measurements, and network latency391,392) are often difficult to obtain and frequently contain significant noise,393 missing values, or incorrect time stamps. Additionally, existing datasets in next-G communication typically suffer from insufficient volume and task specificity, as current datasets are often constructed for specific tasks (e.g., traffic prediction or network optimization) and lack comprehensive coverage of the complex scenarios encountered in communication networks. FMs heavily rely on large-scale and diverse datasets during training, which poses challenges in these domains. The scarcity, low quality, and task specificity of data significantly limit the training and application of models in industrial systems and next-G communication. Therefore, a key challenge is how to leverage generative FMs (e.g., synthetic data-generation techniques) to augment data or employ few-shot learning approaches (such as transfer learning and meta-learning) to achieve efficient learning with limited data.
Deployment
In the fields of industrial robotics and next-G communication, platform resources are often limited, especially on edge devices and mobile terminals. Deploying FMs typically requires powerful computational hardware, particularly GPU or TPU clusters, making deployment in resource-constrained environments a challenge. Additionally, industrial operations and next-G communication networks demand real-time or near-real-time data processing, which places higher demands on inference speed. Therefore, effectively compressing FMs (e.g., via pruning and quantization) and optimizing for low-latency inference are key challenges for practical application.
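As a concrete illustration of one compression technique named above, symmetric 8-bit post-training quantization can be sketched as follows. This is a simplified, per-tensor version on a plain Python list rather than a real weight tensor; production toolchains also calibrate activations and handle per-channel scales.

```python
# Symmetric int8 post-training quantization: map float weights to integers
# in [-127, 127] with a single scale, then recover approximate floats.

def quantize_int8(weights):
    """Map float weights to int8 values with one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Invert the mapping; small rounding error remains."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.01, 1.27]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, stored at 1/4 the bytes of float32
```

The memory saving (8 bits instead of 32 per weight) is what makes such schemes attractive for the edge deployments discussed above, at the cost of the small reconstruction error visible in `w_hat`.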
Mathematical science
FMs leverage large-scale pretraining to extract generalized mathematical patterns from diverse datasets, enabling novel approaches to traditional problems like optimization, statistical inference, and pattern recognition.36,394 Their effectiveness stems from core mathematical principles: Linear algebra structures neural networks through matrix operations and high-dimensional transformations.395,396 Calculus enables gradient-based optimization and probabilistic integration.397,398 Probability and statistics support uncertainty quantification via Bayesian inference and hypothesis testing.399,400 These mathematical foundations not only enable FM development but also benefit from FM-driven insights, creating a synergistic cycle where theoretical advances inform model architectures, while model behaviors reveal new mathematical questions. We systematically examine this interplay through FM's model architecture and training, optimization techniques, applications, and challenges.
Model architecture and training
Understanding the architectural choices and training paradigms of FM is crucial for leveraging their potential in mathematical sciences, where their ability to process complex structures and extract meaningful patterns underpins significant advancements. Neural networks, composed of interconnected layers of neurons,401 provide a flexible and powerful framework for approximating nonlinear and high-dimensional functions.
Within the neural network framework, specialized architectures such as CNNs, RNNs, and feedforward neural networks cater to specific data structures and problem domains. CNNs, for example, excel in processing grid-like data structures by extracting localized features, making them indispensable for tasks like image analysis or spatial data processing. RNNs, on the other hand, are designed for sequential data, capturing temporal dependencies and uncovering patterns across time steps, although they often encounter challenges with long-range dependencies due to vanishing gradients.402 Feedforward networks, as the simplest variant, are highly effective for problems involving static input-output mappings, demonstrating the versatility of neural network architectures.403 In contrast, Transformers have revolutionized the FM landscape by addressing the limitations of traditional sequence-based models like RNNs.404 Central to their success is the self-attention mechanism, which enables Transformers to process entire sequences in parallel, capturing long-range dependencies efficiently while scaling to large inputs.405,406 This innovation has proven invaluable in mathematical sciences, where Transformers excel in parsing symbolic data, solving complex equations, and identifying intricate patterns in large datasets. Their ability to handle diverse tasks with precision has positioned Transformers as a cornerstone for advancing computational methods in the field.407
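The self-attention computation at the heart of this architecture can be sketched in a few lines of plain Python. This is a didactic, single-head version without the learned query/key/value projections of a real Transformer layer.

```python
# Scaled dot-product self-attention: each output position is a weighted
# average of the value vectors, with weights given by softmaxed query-key
# similarities scaled by sqrt(d).

import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Q, K, V: lists of vectors, one per sequence position."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        w = softmax(scores)           # attention weights over all positions
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# Two positions with 2-d embeddings; each output mixes both value vectors.
Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out = attention(Q, K, V)
```

Because every position attends to every other in one step, there is no recurrence to propagate gradients through, which is the parallelism and long-range-dependency advantage over RNNs noted above.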
Optimization techniques
Equally important in the development of FMs is optimization, which determines how effectively these models can learn and generalize. Optimization techniques, which refine model parameters by minimizing loss functions, play a critical role in enabling convergence and improving performance. Stochastic gradient descent (SGD),408 as a foundational method, updates parameters incrementally using random subsets of data, offering a balance between computational efficiency and learning stability.409 Building on this, adaptive moment estimation (Adam) introduces adaptive learning rates and momentum, which accelerate convergence and improve performance, particularly in high-dimensional spaces.410 More advanced algorithms, such as second-order methods, and techniques like gradient clipping address challenges such as vanishing or exploding gradients, enhancing the stability and precision of the optimization process. In the context of mathematical sciences,411 optimization demands a higher level of precision and stability, as numerical computations often involve solving equations or analyzing multidimensional data with stringent accuracy requirements. By fine-tuning optimization algorithms to these unique demands, researchers can unlock the full potential of FMs, enabling them to tackle increasingly complex problems and push the boundaries of mathematical research. Through the seamless integration of robust architectures and sophisticated training techniques, FMs continue to transform the way we approach and solve problems in mathematical sciences.
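As an illustration, the Adam update described above can be sketched as follows. This is a minimal, dependency-free version applied to the toy objective f(x) = x²; production implementations live in frameworks such as PyTorch.

```python
# One Adam step: exponentially decayed first and second moment estimates,
# bias correction, and an adaptive step size per parameter.

import math

def adam_step(x, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

# Minimize f(x) = x^2 from x = 5; the gradient is 2x.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

Note that because the update is normalized by the second-moment estimate, the effective early step size is roughly `lr` regardless of the gradient's magnitude, which is what makes Adam robust across the poorly scaled problems mentioned above.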
Applications
FMs are revolutionizing scientific applications, offering transformative advancements across mathematical modeling and simulation, applied sciences, symbolic mathematics, and decision-making under uncertainty.
In mathematical modeling and simulation, FMs optimize processes through data-driven approaches, particularly in areas where traditional analytical methods struggle.412 For instance, physics-informed neural networks are widely used to solve complex nonlinear partial differential equations in fluid dynamics and climate modeling by incorporating physical laws directly into the neural network architecture. Similarly, GNNs model traffic flow by capturing spatial dependencies and dynamics in traffic networks.413 In applied sciences, FMs enhance the understanding of complex systems like climate dynamics and material sciences. CNNs analyze climate data, such as satellite imagery, to predict weather patterns, while RNNs model the temporal evolution of material properties, enabling the discovery of new materials.414,415 These data-driven models complement traditional frameworks, bridging the gap between theory and empirical observations.416 In symbolic mathematics, FMs replicate and extend human reasoning through tasks like symbolic integration and theorem proving. Transformer-based architectures, such as those in DeepMind’s AlphaTensor, automate complex symbolic operations by learning the structure of mathematical expressions.417,418 Furthermore, FMs excel in decision-making under uncertainty, a critical capability in fields such as epidemiology and finance. Bayesian neural networks provide probabilistic reasoning for modeling disease spread, while RL optimizes trading strategies under uncertain market conditions.419,420
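As a small worked example of the decision-making-under-uncertainty theme, the conjugate Beta-Binomial update below maintains a posterior over an infection probability. This is the simplest instance of the probabilistic reasoning named above; Bayesian neural networks generalize the same idea from one parameter to all network weights.

```python
# Beta-Binomial conjugate update: the Beta(alpha, beta) prior over an
# unknown probability is updated in closed form by binomial observations.

def beta_update(alpha, beta, infected, total):
    """Posterior after observing `infected` cases among `total` contacts."""
    return alpha + infected, beta + (total - infected)

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

a, b = 1.0, 1.0                       # uniform prior on the probability
a, b = beta_update(a, b, infected=30, total=100)
posterior_mean = beta_mean(a, b)      # shrinks toward the observed rate 0.3
```

The posterior mean (31/102, just above 0.3 because of the prior) comes with a full distribution, so downstream decisions can weigh tail risk rather than a point estimate.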
Challenges and perspectives
Despite these transformative applications, FMs face substantial limitations, particularly regarding computational demands and interpretability. Training and deploying these models require immense computational resources, including high-performance computing infrastructure and significant memory capacity. As FMs grow larger and more complex, their resource requirements increase exponentially, posing substantial barriers for research institutions with limited access to such technologies. Furthermore, the “black-box” nature of FMs makes the decision-making processes difficult to interpret, especially in tasks involving complex mathematical reasoning and verification. This challenge is particularly critical in high-stakes fields such as finance and healthcare, where trust, transparency, and accountability are non-negotiable, and understanding the model’s reasoning is essential.421 Mining the rationale behind these models’ outputs from their intricate architectures remains a profound and ongoing challenge, highlighting the need for innovative methods to enhance their interpretability and usability.422
Life sciences
Life sciences focus on exploring the essence and developmental laws of biological activities. In recent years, AI technologies have significantly propelled applications in the life sciences, especially drug design, synthetic biology, and health interventions (see Figure 7). As FM-based technologies advance, the life sciences can achieve a qualitative leap in analytical precision, predictive capability, and IDM.
De novo drug design and decision-making
The rapid development of AI technologies has led to the emergence of large models with massive parameters, represented by systems such as ChatGPT and AlphaFold. In the field of de novo drug design, researchers have leveraged large-model techniques to design a wide variety of drug molecules with significant biological activity, including small molecules, macrocycles, peptides, proteins, and nucleic acids. Built on LLMs, these models autonomously learn sequence features and rapidly generate small-molecule ligands. For example, hybrid generative chemical language models that leverage molecular structure and bioactivity have been used to design PI3Kγ ligands with submicromolar to nanomolar activity and scaffold-hopping potential.423 Additionally, LLM-based methods can generate candidate bioactive peptides: Chen et al. designed de novo bioactive peptide sequences with no toxic side effects.424 While LLM methods efficiently and accurately generate biomolecular sequences, they also face challenges in data dependency and interpretability. Inspired by AlphaFold's protein structure prediction, deep-learning-based foundational models can accurately design and predict macrocyclic peptide structures: Rettie et al. introduced “cyclization encoding” as a positional encoding to predict the structures of natural cyclic peptides from sequence information, expanding the structural space of macrocyclic drug molecules.425,426
In addition to leveraging pretrained FMs, another approach is to apply standard RL agents to optimize the de novo design of drug molecules. Baker’s team presented an RL approach using MCTS to design protein nanomaterials, overcoming challenges that bottom-up methods (which build proteins from fragments) cannot address.427 The DRL method heavily relies on computational predictions due to the explosive structural space of proteins, with potential for improvement through policy and value networks to enhance efficiency and broaden applications. For example, Frederic et al. trained a policy-based FM using DRL, combining neural architecture search, hyperparameter tuning, and joint optimization of the sequential decision process to design RNA-based drug molecules.428 In summary, AI IDM models, as a cutting-edge and advanced technical means, are widely used to solve scientific problems and technical challenges encountered in the process of new drug development.
Synthetic biology planning and engineering
With the deep integration of AI and biology, the field of synthetic biology is advancing rapidly. For instance, the combination of AI with plant-based synthetic biology technologies has led to the development of disruptive, sustainable agricultural applications.429 By training advanced FMs, traditional cyclic biosynthetic processes have been transformed into a multidimensional “design-build-test-learn-predict” workflow,429 improving synthesis efficiency while simultaneously reducing costs. Recent AI-assisted advances in synthetic biology focus on key areas such as genome annotation, protein engineering, metabolic pathway prediction, and synthetic route planning. For example, Zhou et al. proposed a few-shot learning approach combined with meta-transfer learning, ranking, and parameter fine-tuning to optimize various protein language models and enhance prediction performance under conditions of extreme data scarcity.430 Although the method's effectiveness was validated through a polymerase wet-lab experiment, the optimization of protein language models is significantly influenced by data distribution, indicating the need for further refinement of the approach.
Additionally, time-consuming biosynthetic processes can be accelerated by AI models that analyze and plan synthetic routes and optimize reaction conditions, identifying faster and more efficient synthetic pathways and thereby shortening the biochemical synthesis cycle. For instance, Vaucher et al., from an NLP perspective, used a custom rule-based NLP model to treat the construction of chemical reaction rules as a text-extraction problem.431 Although this method offers high prediction accuracy and interpretability, it requires substantial computational resources and generalizes poorly. Optimizing biosynthetic reaction conditions is a critical step toward AI-assisted chemical synthesis: traditional approaches involve chemists manually enumerating all possible combinations of reaction conditions and making decisions independently, which is both time-consuming and costly. Zhou et al. instead combined RL with chemical knowledge to iteratively record reaction outcomes and select new reaction conditions, enabling a dynamic, interactive decision-making process for optimizing chemical reactions.432 AI is set to become a powerful tool for improving synthetic reaction conditions.
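The iterative record-and-select loop for reaction conditions can be caricatured as a bandit problem over a discrete set of conditions. This is a schematic sketch, not the cited method: `run_experiment` is a hypothetical stand-in for a wet-lab yield measurement, and temperatures stand in for full condition combinations.

```python
# Epsilon-greedy bandit over candidate reaction conditions: record each
# outcome, then mostly exploit the best-observed condition while still
# occasionally exploring alternatives.

import random

def run_experiment(temp):
    """Stub for a noisy yield measurement; peaked at 60 degrees here."""
    return max(0.0, 1.0 - abs(temp - 60) / 100) + random.uniform(-0.02, 0.02)

def optimize(conditions, rounds=300, eps=0.1, seed=0):
    random.seed(seed)
    totals = {c: run_experiment(c) for c in conditions}  # try each arm once
    counts = {c: 1 for c in conditions}
    for _ in range(rounds):
        if random.random() < eps:
            c = random.choice(conditions)                        # explore
        else:
            c = max(conditions, key=lambda k: totals[k] / counts[k])  # exploit
        totals[c] += run_experiment(c)                           # record outcome
        counts[c] += 1
    return max(conditions, key=lambda k: totals[k] / counts[k])

best = optimize([20, 40, 60, 80, 100])
```

The explore/exploit trade-off is exactly the tension in real condition screening: every exploratory run costs reagents and time, but pure exploitation risks locking onto a locally good condition.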
Life health intervention and management
By leveraging the powerful data-processing, prediction, and adaptive learning capabilities of FMs, more scientific and refined life and health intervention plans can be devised, improving intervention effectiveness and reducing medical costs while promoting the popularization of health management. In particular, the application of AI in precision nutrition has brought profound changes to the field. NLP-based FMs not only extract and predict dietary patterns433 but also provide interpretable predictions for diet-related diseases, explore the relationship between dietary patterns and metabolic health outcomes, and demonstrate the effectiveness of NLP methods in improving disease prediction models.434 NLP-based FMs also build molecular-level nutrient analysis and dietary recommendation models from food-intake data, providing customized, precise dietary recommendations for individuals based on factors such as genetics, environment, and lifestyle.435
By establishing efficient models, RL can dynamically balance multiple objectives, enhancing both the sensory attributes and nutritional value of foods. Amiri et al. introduced a multilevel real-time reward mechanism that combines collaborative filtering with user ratings, preferences, and nutritional data.436 This algorithm not only addresses nutritional and health factors but also dynamically adapts to uncover users' latent dietary habits, thereby significantly enhancing user acceptance and adherence. Furthermore, traditional dietary recommendation methods typically suggest foods based on users' historical preferences but often fail to meet real-time health needs. Liu et al. harnessed the continuous decision-making and interactive capabilities of RL, alongside collaborative filtering algorithms, to develop an adaptive dietary decision model.437,438 This model not only fulfills nutritional and health requirements but also dynamically adjusts to users' taste preferences and personal satisfaction. RL techniques can iteratively refine food formulations through feedback mechanisms, enabling responsiveness to changing consumer demands. Despite the successes of RL in meal recommendation, significant shortcomings remain. Existing work often groups foods into coarse categories without adequately considering the specific impacts of food composition and dietary structure on health. Additionally, individuals' genetic data are frequently overlooked in the food recommendation process, resulting in an incomplete assessment of health characteristics. Future research should incorporate dynamic factors such as seasonal variations, specific occasions, and ingredient availability into meal recommendation plans. By integrating knowledge graphs, user preference information can be represented more dynamically, enhancing the generalizability and effectiveness of dietary recommendation algorithms.
Challenges and perspectives
The application of IDM models in the life sciences, particularly in drug development, synthetic biology, and health interventions, is driving revolutionary changes. However, these advancements also face challenges in data, algorithms, ethics, and legal issues, which manifest in the following three main aspects. (1) Data challenges: compared to the large datasets required by FMs, the scale of data in the life sciences is relatively small and often contains noise and missing values, which can impact the accuracy of IDM models. In the future, these data-related challenges can be addressed through approaches such as data augmentation and preprocessing, multisource data integration and fusion, and sensitivity analysis. (2) Data scarcity for pretraining large models: biomedical data (e.g., compounds, targets, molecular interactions, and clinical trial data) are scarce and difficult to obtain, making pretraining large models a significant challenge. In the future, a combination of techniques such as cross-domain transfer learning, few-shot learning, and self-supervised learning can be employed to extract valuable information from limited data samples, maximizing model generalization and performance. (3) Privacy and compliance in health interventions: the field of health interventions requires large amounts of personal health data, which often contains sensitive information. Ensuring privacy protection and compliance with regulations is a major challenge. In the future, IDM models in life sciences can ensure data security and compliance by employing comprehensive strategies such as data anonymization, differential privacy, federated learning, encryption, and adherence to legal and ethical standards, thereby minimizing the risk of privacy breaches and misuse.
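One of the privacy tools listed above, the Laplace mechanism of differential privacy, can be sketched in a few lines: noise with scale equal to the query's sensitivity divided by the privacy budget ε is added before a statistic is released. This is a minimal illustration of the mechanism itself, not of a full compliant pipeline.

```python
# Laplace mechanism: release value + Laplace(0, sensitivity / epsilon)
# noise, sampled here via the inverse-CDF method.

import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Differentially private release of a numeric query result."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5            # uniform on [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

rng = random.Random(0)
# A counting query over patient records has sensitivity 1: adding or
# removing one person changes the count by at most 1.
released = laplace_mechanism(true_value=120, sensitivity=1, epsilon=0.5, rng=rng)
```

Smaller ε means stronger privacy but noisier releases; choosing ε is exactly the utility-versus-privacy trade-off the challenge above describes.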
Healthcare
Medicine is a cornerstone of human progress and well-being. FMs now provide efficient, intelligent decision support for complex tasks such as early disease screening and surgical planning by enabling unified solutions across text, images, and genomics with minimal task-specific data.165,439,440 Below, we discuss how FMs are reshaping diverse healthcare domains across diagnostics, medical imaging, and beyond.
Advancing diagnostics via multimodal and genomic data integration
FMs show considerable promise in clinical diagnostics by combining vast medical knowledge with advanced reasoning capabilities. LLMs such as GPT-4 have exhibited near-expert performance in tasks like medical question answering and case analysis.441,442,443 In multiple evaluations, LLM-driven DSSs have equaled or even outperformed clinicians in specialized diagnostic settings, highlighting their versatility and potential for widespread application.444
Despite these gains, reliability challenges persist. For instance, prominent models may occasionally produce factually inaccurate responses, underscoring the importance of human oversight in diagnostic workflows. Retrieval-augmented methods offer a compelling solution by grounding outputs in trustworthy sources, effectively reducing error rates and bolstering model credibility.445,446 Moreover, multimodal models now integrate textual data with real-time imaging or video to generate more holistic diagnoses. This is particularly vital in fields such as surgical pathology and retinal images, where contextual cues from multiple data streams can sharpen diagnostic precision.447,448
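The grounding step of retrieval augmentation can be sketched schematically. The toy lexical retriever and prompt builder below are illustrative stand-ins for a real embedding-based retriever and LLM call; the sentences in `corpus` are placeholder text, not clinical guidance.

```python
# Schematic retrieval-augmented generation: retrieve the passages most
# relevant to the query, then build a prompt that instructs the model to
# answer only from those sources, reducing unsupported claims.

def retrieve(query, corpus, k=2):
    """Toy lexical retriever: rank passages by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    context = "\n".join("[source] " + p for p in passages)
    return context + "\nQuestion: " + query + "\nAnswer using only the sources."

corpus = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Aspirin inhibits platelet aggregation.",
    "Type 2 diabetes is associated with insulin resistance.",
]
query = "What is first-line therapy for type 2 diabetes?"
prompt = build_prompt(query, retrieve(query, corpus))
```

Because the model is asked to cite only the retrieved `[source]` passages, an auditor can check each claim against the evidence actually supplied, which is the credibility mechanism described above.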
Beyond text and images, FMs have made inroads in waveform data, radiology, and histopathology data for diagnosis. For example, FMs have been developed to diagnose cardiovascular diseases using ECG data.449 MedSAM excels at universal segmentation, adeptly identifying anatomical regions and lesions across modalities,450 while VLMs generate synthetic radiological images to augment training data for resource-constrained settings.451 Within oncology, models like MUSK integrate pathology images with patient data to pinpoint molecular biomarkers and gauge treatment response, thereby enhancing diagnostic specificity.452 Similarly, large-scale histopathology applications have demonstrated the ability to accurately classify common and rare cancers and adapt to different staining protocols.453,454
Moreover, emerging genomic and multi-omics FMs further expand diagnostic capabilities.455 The Nucleotide Transformer captures meaningful DNA sequence representations useful for detecting specific variants in low-data scenarios,456 while scGPT integrates single-cell transcriptomics and proteomics for cell-type identification.457 Likewise, GET leverages chromatin accessibility data to reveal previously unknown regulatory elements linked to disease states.458 Genomic FMs459 enhance personalized gene-expression prediction from DNA sequences. Together, these achievements illustrate the synergy of large models in diagnostics, harnessing multimodal data to increase diagnostic accuracy and reduce clinician workloads.
Optimizing treatment strategies and medication management
FMs are playing an increasingly influential role in guiding therapeutic strategies. By distilling a broad spectrum of clinical data, such systems can assist with treatment plan selection, medication management, and complex medical decisions. For instance, recent research underscores the utility of LLMs in generating actionable guidance for oncologists by correlating genomic data with standardized treatment protocols.442,444
In high-risk environments like surgery, integrated large multimodal models provide a comprehensive view of patient status by interpreting textual records, imaging, and even real-time operative video.447 This holistic perspective can potentially refine intraoperative decisions, although rigorous domain-specific validation remains paramount to guarantee patient safety. Oncology, in particular, benefits from vision-language approaches such as MUSK, which synchronizes imaging evidence with patient history to deliver more targeted therapeutic options.452
Treatment support also extends to pharmacogenomics and targeted therapies, where multi-omics FMs inform drug-efficacy predictions and toxicity risks.457,458 By consolidating diverse data types—from genomic variants to proteomic signatures—these models can uncover individualized treatment pathways that traditional siloed systems might overlook. Although these applications have shown promising results, continued refinement through real-world validation is essential to promote safe and effective model deployment.
Enhancing personalized prognostic predictions and treatment responses
Prognostic assessment forms another critical domain where FMs exhibit growing influence. Their ability to fuse data across imaging, pathology, and multi-omics permits more nuanced predictions of clinical outcomes, such as survival rates or recurrence risks. In oncology, for example, LLMs not only offer specific diagnostic insights but can also project disease progression timelines, guiding clinicians in discussing treatment goals and end-of-life care with patients.444
Multimodal approaches strengthen prognostic modeling by considering imaging features, genetic information, and electronic health records. Vision-language systems—initially aimed at diagnosis—now provide risk estimations for treatment response, shedding light on the likely success of targeted therapies.450,451 Meanwhile, multi-omics models such as GET predict gene-expression changes with high fidelity, offering clues about disease trajectories and potential intervention points.458 These capabilities can be extended to broader population health questions, including public health surveillance and risk stratification for chronic conditions.
Furthermore, LLMs trained on extensive clinical texts help forecast outcomes such as length of hospital stay, readmission rates, and complication probabilities.460,461 By correlating longitudinal clinical data with patient histories, these models can highlight patients at higher risk for poor outcomes. As with diagnosis and treatment, the path to robust real-world performance hinges on careful calibration to mitigate data shifts and potential biases, necessitating ongoing oversight and validation.
Streamlining clinical workflow and resource management with FMs
Efficient clinical workflow management and automation remain top priorities in modern healthcare, where administrative burdens can detract from patient care. FMs excel in summarizing patient data, automating documentation, and flagging key clinical concepts. For instance, GatorTron—trained on extensive clinical corpora—achieves state-of-the-art performance in medical question answering and semantic similarity tasks, underscoring the power of scale in reducing repetitive documentation workloads.462
Beyond straightforward information extraction, domain-fine-tuned LLMs facilitate classification tasks with impressive accuracy, including the categorization of specific conditions such as musculoskeletal pain, thereby streamlining triage and referral processes.463 These models also generate concise clinical summaries for patient records, pathology reports, and radiological findings. In certain metrics, their summarization quality rivals or surpasses that of human experts.464
Resource management stands as another area of promise. Large models have shown aptitude for predicting hospital throughput metrics, including readmissions, length of stay, and quality-of-care indicators.460,461 By automatically integrating patient data from disparate sources, these systems can enable more proactive scheduling, optimize bed allocation, and support cost-efficient healthcare delivery. Although the benefits of such automation are evident, issues related to data privacy, model interpretability, and fairness must remain at the forefront of clinical implementation efforts.
Challenges and perspectives
Despite their undeniable potential, FMs encounter several implementation hurdles in healthcare settings. Ethical concerns emerge from potential biases in training data, especially if models predominantly learn from Western-centric datasets, raising the risk of suboptimal or inequitable outcomes in other demographic or geographic populations.465,466 Ensuring clinical reliability also remains problematic, as model performance on standardized evaluations does not always translate seamlessly to the variability of real-world practice.467,468
Regulatory considerations add another dimension of complexity, as healthcare institutions, clinicians, and AI developers face evolving liability issues. While AI-assisted systems could mitigate some legal risks for individual practitioners, manufacturers and organizations must navigate uncertain regulatory frameworks and reimbursement models for novel AI solutions.469 The path forward calls for transparent model architectures, continual diversification of training data, and human-AI collaboration guidelines that emphasize safety and accountability.470 Ongoing clinical assessments, including automated-expert evaluations and prospective trials, will be crucial in validating both diagnostic and therapeutic claims.471 In this way, FMs can reinforce—not supplant—clinicians’ expertise, offering scalable, data-driven insights that enhance patient outcomes while adhering to rigorous standards of care.
Dentistry
AI models trained on healthcare data have demonstrated significant potential in disease diagnosis, treatment planning, and health management, particularly in dentistry. Within this field, FMs and intelligent decision systems (IDSs) are poised to transform clinical workflows by enhancing precision diagnostics, optimizing treatment strategies, and improving patient outcomes. This section examines the opportunities, existing applications, critical challenges, and future directions of FMs and IDSs in dental practice.
Basic principles of FMs and intelligent decisions for healthcare in dentistry
A healthcare foundation model (HFM) can be flexibly applied to multiple medical tasks and can process multiple medical data modalities. In contrast to traditional specialized AI models that focus on a specific medical task or data modality, HFMs have demonstrated remarkable success across the subfields of healthcare AI, including language, vision, bioinformatics, and multimodality.472 Language FMs have shown exceptional performance in medical text processing and dialogue tasks after training on extensive medical language data.441,473 Vision foundation models (VFMs) hold impressive promise for medical imaging: modality-, organ-, and task-specific VFMs have demonstrated both general performance and flexibility across a range of clinical situations.474 Bioinformatics FMs open opportunities for work involving protein sequences, DNA, RNA, and other biomolecules.456,475 The multimodal foundation model (MFM) offers a more integrated approach by combining data from several modalities, allowing it to interpret different medical data types and carry out tasks that depend on them jointly.165,476 Together, these models advance healthcare by providing a basis for tackling intricate clinical problems and improving the efficacy and efficiency of medical and dental procedures, drawing on diverse data sources such as free-text nurse notes, electronic health record notes, reports, radiological images, laboratory tests, dental imaging, audit records, digital scans, integrated genomics data, and clinical and research archives.477 For medical decision-making, clinicians weigh the patient's past and present medical history, the evidence currently available in the medical literature, and their own domain expertise and experience.478
FMs in diagnostic and prognostic advancements in dentistry
Medical diagnosis is crucial for preventing disease progression and improving treatment outcomes. FM-based diagnosis predicts the most probable disease from medical examinations and patient accounts, which is essential for prompt treatment and the avoidance of complications.479 Recently, FMs have been applied to medical diagnosis and have exhibited generalist capabilities across several disorders, including dental problems.480,481,482 One study examined ChatGPT-4's capabilities as an intelligent virtual assistant in oral surgery: a professional oral surgeon assessed ChatGPT-4's answers to 30 oral surgery-related questions, identifying discrepancies and yielding a 71.7% accuracy rating.483 This highlights ChatGPT-4's role as an adjunctive resource for clinical decision-making in dentistry, while underscoring that it cannot supplant the proficiency of a skilled oral surgeon. VFMs provide automated disease screening on select low-risk images and aid in detecting and identifying ambiguous target anatomies, thereby easing the burden on radiologists and enhancing their diagnostic accuracy. Traditionally, caries-related diagnoses are made by dentists through visual and tactile examination. A prompt yet thorough assessment of oral health concerns is essential before any treatment strategy. Diagnosing dental diseases can demand considerable effort and time from specialists, sometimes requiring X-ray scans and cone-beam CT (CBCT) to reach a valid judgment. Several studies have investigated the efficacy of AI-assisted models in diagnosing caries, periodontitis, medication-related osteonecrosis, maxillofacial bone fractures, oral squamous cell carcinoma, and temporomandibular disorders,484,485,486,487,488,489 all of which can be identified in medical imaging.
Segmentation and identification of VFMs furnish positional information in medical imaging, aiding radiologists in delineating images into semantic regions and identifying areas of interest.490,491
Certain VFMs have demonstrated encouraging outcomes in disease prognosis, supplying biomarkers that anticipate the probability or expected progression of a disease. Tooth GenAI supports early intervention for diseases such as periodontitis by forecasting tooth-bone loss: it evaluates patient data with support vector regression to predict bone loss over 3- and 6-month horizons. The model processes the personal data supplied by users and uses it to make predictions, and its outputs, including visualizations, support treatment design and the tracking of disease progression.492
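The forecasting step described above can be illustrated with a toy regression sketch. Here a plain least-squares trend line stands in for the support vector regression used by systems such as Tooth GenAI, and all measurements, units, and horizons are hypothetical.

```python
# Toy sketch: forecasting periodontal bone loss from longitudinal measurements
# by fitting a least-squares trend line (a simple stand-in for the support
# vector regression described in the text). All data below are hypothetical.

def fit_trend(months, bone_level_mm):
    """Ordinary least squares: bone_level ~ intercept + slope * month."""
    n = len(months)
    mean_x = sum(months) / n
    mean_y = sum(bone_level_mm) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, bone_level_mm))
    var = sum((x - mean_x) ** 2 for x in months)
    slope = cov / var
    return mean_y - slope * mean_x, slope

def forecast(months, bone_level_mm, horizon_months):
    """Extrapolate the fitted trend to future month offsets."""
    intercept, slope = fit_trend(months, bone_level_mm)
    return [intercept + slope * h for h in horizon_months]

# Hypothetical radiographic bone-level measurements (mm from a fixed landmark),
# showing gradual loss of roughly 0.1 mm/month.
history_months = [0, 1, 2]
history_levels = [2.0, 2.1, 2.2]
print(forecast(history_months, history_levels, [3, 6]))  # approximately [2.3, 2.6]
```

A deployed system would of course fit a richer model over many patients and covariates; the point here is only the shape of the task, namely mapping longitudinal patient data to 3- and 6-month predictions.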
Advancing personalized treatment planning via FMs in dentistry
FMs have potential applications in surgery, as they can provide plug-and-play medical image-processing tools for surgical planning and support without the extra data gathering and model training required by conventional paradigms. As digital dentistry has advanced, personalized treatment planning has emerged as a key strategy for enhancing patient outcomes, and HFMs can assist with such individualized planning. Surgeons can visualize relevant structures for surgical planning by using 3D segmentation VFMs to delineate 3D objects in medical imaging such as CT and MRI. A segmentation VFM can also recognize instruments or relevant areas in endoscopic views during the procedure itself, aiding the operation and improving surgical results.493
For dental implant design and placement planning, AI can automatically generate optimal implant designs and placement plans that increase implant success rates, based on patients' CBCT data and considering variables such as bone density, adjacent tooth positions, and occlusal relationships.494,495 To date, AI has mainly been used to segment anatomical landmarks, one of the steps in constructing virtual patients; fully automated virtual implant placement still awaits the development and scientific validation of a complete digital approach.496 Another important choice when creating an orthodontic treatment plan is whether orthognathic surgery is required. Practitioners may disagree on this question,497 and there are no established standards for deciding it, but AI-based methods aim to assist clinicians.498,499 Choi et al. reported that AI could predict not only the indication for orthognathic surgery but also the indication for premolar extraction, with a success rate of about 91%.500 Such innovation would clearly make it easier for surgeons and dental specialists to complete the presurgical planning phase for predictable and timely dental therapy.
Intelligent decision-making technologies in dentistry
With the onset of the digital era in dentistry, the integration of IDM technologies has progressively demonstrated significant benefits. These innovations are now routinely employed across various domains, including orthodontics, the design of removable partial dentures in prosthodontics, and predicting postoperative outcomes in complex maxillofacial reconstruction.501,502 More recently, the advent of the PUMCH therapy (photoacoustic-steaming unite minimal-invasive chemomechanical-preparation hydramatic-obturation) has provided a creative idea for endodontics.503 The incorporation of automated endodontic instrumentation further enhances the scope of IDM within this specialty, offering improved treatment convenience and predictability.
As clinical data accumulate and machine-learning capabilities advance in automated therapy, sophisticated modeling tools now enable clinicians to optimize treatment plans by aligning tooth morphology with specific therapeutic parameters. This approach fosters more evidence-based, patient-centric decision-making.504 Additionally, these modeling systems enhance the ability to predict post-treatment resistance loss, thereby aiding in the longitudinal assessment of treatment efficacy. The integration of these tools also strengthens the communication between clinicians and patients, facilitating a more informed discussion on potential outcomes and the anticipated long-term effects of treatment.505 Furthermore, the predictive capabilities of these systems support early intervention strategies, thereby improving the precision and timeliness of clinical decision-making.
Challenges and perspectives
An HFM requires substantial medical data for training; how to integrate and share data while ensuring privacy and security therefore remains a pressing ethical issue. Healthcare data must be ethically obtained. Much of it comes from body scans, yet CT imaging exposes patients to radiation that can harm the body.506 While such harm may be acceptable when justified by treatment, scanning human bodies solely to build AI training datasets is unethical. Such data will therefore not be readily available at the scale assumed by some current data-gathering paradigms, limiting HFM training. Moreover, ethics constrain how healthcare data are used and distributed. Healthcare involves sensitive and even dangerous personal data, including genetic data, whose use and distribution are strictly regulated by law and by data owners. Accumulating such data without regulation and using it to train FMs is unsafe, and because the external environment is uncontrollable, HFM deployment amplifies this risk.507,508,509
An HFM needs to operate across multiple data modalities510; however, the diverse origins of health data can lead to significant differences in data formats and quality across institutions. The characteristics of healthcare data differ among populations, regions, and medical institutions, resulting in heterogeneous data in the practical use of HFMs.511 The evolution of FMs signifies a transition from specialized tasks to general-purpose ones, equipping AI with a broader capacity to handle diverse requirements and intricate real-world environments. FMs thus have the capacity to revolutionize healthcare: advanced HFMs will seamlessly analyze various data modalities, acquire new tasks dynamically, and exploit domain knowledge, opening prospects across an extensive array of medical tasks. Notwithstanding this potential, HFMs present distinct obstacles: their remarkable adaptability complicates full validation, and their scale can entail substantial computational expense. Even so, HFMs offer unparalleled opportunities for healthcare, assisting clinicians with critical tasks and alleviating administrative workload to enable greater patient interaction.
Urban science
FMs have significantly advanced decision-making in urban science by enhancing various aspects of urban planning, policy-making, and management through their capacity to process vast amounts of data, identify patterns, and generate actionable insights.512 Here we outline several ways in which they contribute to urban science, from prediction to decision-making, along with related literature for each point.
FMs enhance urban predictive modeling
FMs empower urban scientists to predict urban events with high accuracy,513 such as traffic congestion, pollution levels, and energy demand. By leveraging both historical and real-time data, FMs facilitate improved decision-making for both short-term and long-term planning. Recent literature has demonstrated that LLMs can serve as zero-shot forecasters for predictive learning tasks,116,514 particularly in urban scenarios. By designing novel tokenizers and in-context learning techniques, LLMs demonstrate superior forecasting performance against conventional statistical methods such as ARIMA. However, due to the challenges LLMs face in interpreting complicated patterns of numerical data, researchers have explored fine-tuning LLMs for time-series analysis120,515,516 and urban spatiotemporal forecasting.517,518 The primary approach involves fine-tuning specific modules within LLMs, such as layer normalization and position encodings, or training additional neural layers (e.g., embeddings and prediction heads) to better align with downstream applications. Other methodologies519 investigate the transformation of time-series data into a fixed vocabulary through techniques like scaling and quantization. By tokenizing the time-series values in this manner, these approaches enable the application of existing LLM architectures, which are trained on the tokenized sequences using cross-entropy loss.
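The scaling-and-quantization tokenization mentioned above can be sketched as follows. Continuous time-series values are min-max scaled and binned into a fixed vocabulary of token ids, so that a standard LLM can be trained on the token sequences with cross-entropy loss; the vocabulary size and bin-center decoding here are illustrative choices, not the scheme of any particular paper.

```python
# Illustrative sketch of time-series tokenization by scaling and quantization:
# real values -> fixed vocabulary of bin tokens -> (approximate) real values.

VOCAB_SIZE = 256  # illustrative vocabulary size

def tokenize(series):
    """Min-max scale a series to [0, 1] and quantize into VOCAB_SIZE bins."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0                     # guard against constant series
    scaled = [(v - lo) / span for v in series]
    tokens = [min(int(s * VOCAB_SIZE), VOCAB_SIZE - 1) for s in scaled]
    return tokens, (lo, span)                   # keep params for decoding

def detokenize(tokens, params):
    """Map each bin token back to its bin-center value."""
    lo, span = params
    return [lo + (t + 0.5) / VOCAB_SIZE * span for t in tokens]

traffic_flow = [120.0, 135.0, 180.0, 240.0, 210.0]  # hypothetical hourly counts
tokens, params = tokenize(traffic_flow)
reconstructed = detokenize(tokens, params)
# Reconstruction error is bounded by half a bin width.
assert all(abs(a - b) <= 0.5 * params[1] / VOCAB_SIZE + 1e-9
           for a, b in zip(traffic_flow, reconstructed))
print(tokens)
```

Once values are tokens, next-value forecasting becomes ordinary next-token prediction over this vocabulary, which is what lets an unmodified LLM architecture be trained on the sequences.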
In addition to leveraging pretrained LLMs, another approach is to train an FM from scratch using cross-domain urban data, including transportation, energy, climate, air pollution, and so forth. This trend is exemplified by the introduction of UniST,520 a universal model designed for general urban spatiotemporal prediction across a wide range of scenarios. The core idea behind UniST is to utilize diverse spatiotemporal data from various urban contexts and conduct effective pretraining to capture complex spatiotemporal dynamics. For downstream applications, UniST enhances its generalization capabilities by incorporating knowledge-guided prompts. Subsequent works have explored similar approaches in predictive learning, applying these methods to a broader spectrum of urban data, such as human trajectory data521,522 and remote sensing data.523,524
FMs enable interpretable and transparent urban decision-making
FMs are playing a transformative role in advancing interpretable and transparent decision-making in urban science. By leveraging vast amounts of textual, numerical, and spatial data, LLMs can facilitate more informed, data-driven decisions that are easier for both experts and the public to understand. These models contribute to the interpretability of urban systems by providing clear explanations for complex decisions, enhancing transparency, and fostering greater public trust. In particular, FMs support decision-making processes that are both efficient and accessible, whether through human-computer interaction, participatory planning, or evaluation/validation frameworks.
To illustrate the first category, consider the application of traffic-light control, where LLMs contribute to adaptive data-driven traffic-light systems that optimize traffic flow based on real-time environment and traffic patterns.525 These models can analyze sensor data, weather conditions, and urban mobility trends to adjust signal timings dynamically, aiming to reduce congestion and improve traffic safety. By providing interpretable insights into how decisions are made (e.g., why certain signal changes were triggered), LLMs promote a more transparent approach to traffic management. This level of explainability ensures that both urban planners and the public can understand the rationale behind traffic control measures, fostering trust in the system and its ability to respond to changing conditions.
In the second category, FMs function as agents that simulate human behavior to inform urban decision-making. For example, in participatory urban planning, these models integrate diverse data sources, such as city plans, community feedback, and environmental reports, to guide decisions in a manner that is both data-driven and transparent. By processing large-scale public inputs, FMs can identify key trends, priorities, and concerns within communities, helping planners align urban developments with public needs. Furthermore, FMs generate accessible explanations of planning decisions, ensuring that complex policies are clearly communicated to the public. This transparency empowers citizens to engage more meaningfully in the planning process, knowing their voices are heard and their perspectives considered. A recent study526 demonstrates an innovative approach to participatory urban planning using FMs as agents. This framework involves LLM agents simulating both urban planners and residents with diverse profiles. The process begins with the planner drafting an initial land-use plan, followed by a simulated discussion among residents, who provide feedback based on their unique needs. To enhance the efficiency of these discussions, a fishbowl mechanism is employed, allowing a subset of residents to engage in conversation while others listen. The planner then revises the plan based on this input, creating a more inclusive and responsive urban planning process.
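The fishbowl-style loop described in that study can be sketched as follows, with the LLM planner and resident agents replaced by stub functions. The resident profiles, feedback format, and plan-revision rule are all hypothetical; a real system would prompt an LLM for each role instead of calling these stubs.

```python
# Sketch of a fishbowl participatory-planning loop with stubbed LLM agents.
# Only a sampled "inner circle" of residents speaks each round; the planner
# then revises the land-use plan from their feedback. All details hypothetical.
import random

def resident_feedback(profile, plan):
    """Stub for an LLM resident agent: judge the plan from one profile."""
    need = profile["priority"]
    return {"priority": need, "satisfied": plan.get(need, 0.0) >= profile["min_share"]}

def revise_plan(plan, comments):
    """Stub for the LLM planner agent: shift land share toward unmet needs."""
    plan = dict(plan)
    for c in comments:
        if not c["satisfied"]:
            plan[c["priority"]] = plan.get(c["priority"], 0.0) + 0.05
    total = sum(plan.values())
    return {k: v / total for k, v in plan.items()}  # renormalize to shares

def fishbowl_round(plan, residents, speakers=3, rng=random.Random(0)):
    inner = rng.sample(residents, min(speakers, len(residents)))  # who speaks
    comments = [resident_feedback(r, plan) for r in inner]        # others listen
    return revise_plan(plan, comments)

residents = [{"priority": "green_space", "min_share": 0.2},
             {"priority": "housing", "min_share": 0.4},
             {"priority": "commercial", "min_share": 0.2},
             {"priority": "green_space", "min_share": 0.3}]
plan = {"housing": 0.5, "commercial": 0.3, "green_space": 0.2}
for _ in range(3):                      # a few discussion rounds
    plan = fishbowl_round(plan, residents)
print(plan)                             # shares still sum to 1 after revisions
```

The fishbowl sampling is what keeps the dialogue tractable: each round only a few agents generate feedback, yet over several rounds the plan is exposed to the full diversity of profiles.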
Within the last class, FMs significantly contribute to policy evaluation and validation by leveraging their natural language understanding and reasoning capabilities. They assist in contextual analysis, offering concise policy summaries and retrieving relevant literature for informed decision-making.61 LLMs enable scenario simulation, generating hypothetical outcomes and stakeholder perspectives to anticipate societal responses.13 Additionally, they analyze public sentiment from surveys or social media to align policies with public opinion. In validation, FMs identify logical inconsistencies, cross-compare policies for best practices, and evaluate inclusivity to ensure fairness. They address ethical concerns by highlighting biases and unintended impacts, supporting equitable policymaking. Routine tasks like document parsing, data extraction, and drafting evaluation reports are automated, improving efficiency.527 By facilitating iterative refinements, LLMs act as dynamic tools for refining policies and monitoring updates, ensuring adaptability and robustness in policy-making processes.528
Challenges and perspectives
The application of FMs in future urban governance and decision-making is poised to bring broad and profound social impacts. FMs lay a solid foundation for the precise, intelligent governance and decentralized sustainable development of cities. For example, FM-based intelligent transportation systems have the potential to achieve real-time and comprehensive situational awareness of urban traffic conditions and enable flexible control of traffic signals in response to changing road conditions, significantly alleviating traffic congestion in major metropolitan areas and thereby reducing vehicle emission levels substantially. The implementation of FM-based intelligent urban planning systems can significantly reduce expert knowledge biases, providing decentralized scientific decision support for urban development. Furthermore, simulation experiments in virtual environments effectively avoid unnecessary material and energy waste caused by shortsighted planning in the real world. Overall, applications based on FMs will lead traditional cities toward transformation to smart cities, markedly reducing the workload of urban management decision-making departments and allowing them to focus more on human care within cities.
In addition to the positive societal impacts mentioned above, the application of FMs in cities may also raise some concerns. For instance, data privacy issues arise with the use of powerful FM applications that require the collection of large amounts of data for training, including personal trajectory data in urban areas. It is essential to anonymize and protect these data to prevent the potential leakage of significant amounts of personal privacy. Another challenge is aligning FM technology with societal values in urban governance. The complex demographic structure of metropolitan areas leads to diverse demands in city life based on factors such as gender, race, and income levels. This diversity can result in the theoretical optimal solutions not necessarily aligning with the social and cultural realities of the real world. Therefore, addressing the alignment of FM applications with society is crucial for their widespread adoption among diverse groups within cities.
Agricultural science
Agriculture serves as the cornerstone of food security, social stability, and economic growth. FM-based IDM empowers farmers with better-informed choices, optimizes resource allocation, and enhances overall farm management, potentially transforming the agricultural sector by boosting productivity, sustainability, and decision efficacy.
FM-driven crop management: Insights into precision decisions
The core concept of precision agriculture and intelligent agriculture is to use high-tech methods to achieve fine-grained management of agricultural production.529 AI is gradually changing the production mode of traditional agriculture and improving its productivity and sustainability. In this context, FMs have become a key method for smart crop management by providing intelligent decision support and optimizing resource allocation.530 Combining agricultural remote sensing data531 with ground sensor data, FMs show great potential through fine-tuned modeling in crop growth monitoring, precision agriculture technology, and pest and disease monitoring and control.
Specifically, FMs realize real-time monitoring of crop growth conditions by integrating weather patterns, soil properties, and remote sensing data, thereby identifying potential crop problems and providing timely adjustment solutions.532 Combined with historical yield data, FMs can also accurately predict crop yields to assist in agricultural planning and food security.533,534 In precision agriculture technology, FMs can accurately analyze weather and soil data to make personalized fertilization and irrigation decisions.535 This intelligent resource management ensures that the nutrients and water needed for crop growth are optimally apportioned, avoiding over-fertilization and over-irrigation. For pest and disease monitoring and control, traditional visual methods usually rely on single-image data, which are limited by data quality and insufficient for decisions on disease-spread prediction and control measures.536 With powerful multimodal learning ability, FMs can quickly identify the types and severity of crop pests and diseases and provide targeted prevention and control suggestions.537 An FM's strengths in efficiently processing and understanding multimodal data enable comprehensive analysis and decision-making in complex agricultural production environments, thus promoting the further development of smart crop management. The application of FMs not only optimizes the agricultural production process but also provides strong support for the sustainable development of agriculture.538
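The irrigation decision logic described above can be sketched as a simple per-field rule that credits forecast rainfall against crop water need. The moisture threshold, water-balance rule, and field data are hypothetical stand-ins for what an FM-driven system would infer from weather and soil signals.

```python
# Toy sketch of per-field irrigation recommendations from soil and weather
# signals. Thresholds and field data are hypothetical illustrations.

def irrigation_advice(soil_moisture, forecast_rain_mm, crop_need_mm):
    """Recommend irrigation depth (mm) after crediting forecast rainfall."""
    if soil_moisture >= 0.35:              # volumetric moisture already adequate
        return 0.0
    deficit = crop_need_mm - forecast_rain_mm
    return max(deficit, 0.0)               # never recommend negative irrigation

fields = {
    "field_a": {"soil_moisture": 0.20, "forecast_rain_mm": 5.0, "crop_need_mm": 25.0},
    "field_b": {"soil_moisture": 0.40, "forecast_rain_mm": 0.0, "crop_need_mm": 25.0},
}
plan = {name: irrigation_advice(**f) for name, f in fields.items()}
print(plan)  # {'field_a': 20.0, 'field_b': 0.0}
```

An FM replaces the hand-set threshold and water balance with learned, context-dependent estimates, but the output consumed by the farmer has this same shape: a concrete per-field recommendation.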
FMs empower plant breeding
Plant breeding plays a pivotal role in crop improvement, with the primary objective being the selective enhancement of traits such as yield, disease resistance, and stress tolerance.539 Traditional plant breeding relies on phenotypic selection and genotypic analysis, typically conducted through field trials, breeding, and progeny selection. However, this approach faces limitations, including long breeding cycles, high resource consumption, and the complexity of environmental factors, which create bottlenecks in improving breeding efficiency. Against this backdrop, and in conjunction with the latest advances in the field of AI, intelligent plant breeding has emerged as a methodology capable of combining multidimensional data to optimize crop varieties using AI, big data, and advanced genomics techniques.540 Traditional machine-learning methods, however, often struggle with the complexities of spatiotemporal omics data. MFMs provide a promising solution.541
As a new class of FMs, MFMs can process multiple data modalities simultaneously, such as text, images, video, audio, and structured data (e.g., genomic sequences or sensor data).165 This enables them to effectively capture and analyze the interactions between genotype, phenotype, and environment. Moreover, MFMs are capable of cross-modal tasks, offering broader applicability, such as generating images from text (e.g., generating predicted crop phenotypes based on genotype descriptions). Notable models in this field include CLIP14 and BLIP (bootstrapping language-image pretraining).542 By efficiently handling heterogeneous datasets, MFMs significantly improve the accuracy of phenotype prediction, allowing breeders to more precisely forecast crop performance and optimize trait selection, thus accelerating genetic gains. In the future, MFMs are expected to play an increasingly crucial role in crop breeding, transforming the landscape of agricultural innovation.
FM-driven livestock farming: Health monitoring to full-chain optimization
The continuous development of AI and FM technologies is driving modern livestock farming toward greater intelligence, precision, and efficiency.543,544 The integration of AI in livestock farming shows immense potential, especially in areas such as animal health monitoring, disease prediction, and resource allocation optimization.545,546,547 Convolutional neural networks (CNNs) enable the detection and early warning of abnormal animal behavior by processing and analyzing image and video data. Graph neural networks (GNNs) treat each animal as a node and the interaction or disease-propagation pathways between animals as edges, effectively capturing complex relational structures and thus improving the accuracy of health monitoring. By improving management efficiency and optimizing resource distribution, AI significantly reduces production and operational costs. Currently, deep-learning decision models in livestock farming often focus on specific scenarios, such as predicting animal health from changes in temperature, activity levels, and appetite.548 However, these models typically rely on human expertise and historical data for training, which limits their ability to integrate cross-disciplinary knowledge and handle complex data.549 In contrast, FMs address these challenges by integrating knowledge from multiple disciplines and learning from large-scale multimodal data.50 FMs can process and integrate multidimensional, complex information across fields, thereby supporting more accurate and comprehensive IDM. They not only enhance the accuracy of health monitoring and disease prediction but also optimize resource allocation, driving the precision and intelligence of agricultural production management. As a result, FMs have become a key tool in smart livestock farming.
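The graph view described above can be sketched with one round of message passing: animals are nodes, contact links are edges, and each animal's health features are mixed with its neighbors'. The features and contact graph are hypothetical, and a real GNN would use learned weights rather than the plain mean aggregation shown here.

```python
# Toy sketch of one message-passing step on a livestock contact graph.
# Nodes are animals, edges are contact/proximity links; features and the
# graph are hypothetical, and aggregation is an unlearned mean.

def message_pass(features, edges):
    """One mean-aggregation step over an undirected contact graph."""
    neighbors = {n: [] for n in features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = {}
    for node, feat in features.items():
        msgs = [features[m] for m in neighbors[node]] + [feat]  # include self
        updated[node] = [sum(vals) / len(msgs) for vals in zip(*msgs)]
    return updated

# Per-animal features: [temperature deviation, activity index] (hypothetical).
herd = {"cow_1": [0.1, 0.9], "cow_2": [1.2, 0.3], "cow_3": [0.0, 1.0]}
contacts = [("cow_1", "cow_2"), ("cow_2", "cow_3")]
smoothed = message_pass(herd, contacts)
# cow_1's temperature feature rises after mixing with feverish cow_2,
# reflecting its contact-exposure risk.
print(smoothed["cow_1"])
```

This is why the graph formulation helps: an animal that looks healthy in isolation can still be flagged through its edges to unhealthy contacts, which per-animal models cannot express.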
Challenges and perspectives
In the next decade, we will continue to witness the development of emerging AI methods and IoT technologies. These advancements will contribute to optimal decision-making and enhance the intelligence of agricultural production and management. Currently, we face numerous challenges in effectively integrating these cutting-edge technologies into agricultural production, especially in interdisciplinary system solutions and the realization of low-cost AI applications in agriculture. There are still technological barriers to the interdisciplinary integration of fields such as agriculture, computer science, and environmental science. Finding ways to combine the practical needs of agriculture with AI technologies to design intelligent systems that are both efficient and adaptable to production rules requires substantial innovation and research. At the same time, the high cost of AI technology and the differing needs of small-scale farming economies make low-cost applications a central challenge in promoting smart agriculture. To overcome these challenges, in addition to technological innovation, there is a need to develop more cost-effective hardware devices and simplified interfaces as well as to tailor application solutions to different regions. Government policy support, the establishment of industry standards, and collaboration between agricultural enterprises and technology companies will also play a crucial role in promoting technology adoption and reducing costs. With advancements in technology and decreasing costs, smart agriculture will achieve precision, intelligence, and sustainable development. This will enhance the efficiency of the global agricultural industry and contribute to food security and the sustainable growth of rural economies.
Economic science
In economic science, FMs fuse heterogeneous market signals to deliver faster risk assessment, sharper investment insights, and more responsive compliance monitoring,550,551,552,553,554,555 as shown in Figure 7. Their generalization across asset classes and scenarios outperforms rule-based tools, driving innovation in credit scoring, inclusive finance, and strategic decision-making.
FMs empower credit assessment and inclusive finance
Traditional credit evaluation often depends on manual expertise and offline data collection, resulting in time-consuming processes, information asymmetry, and limited coverage. In contrast, FM-based decision intelligence can efficiently integrate and semantically analyze extensive heterogeneous data sources—such as text, images, transaction histories, and social media information—to construct more accurate and dynamic credit profiles.556 With capabilities such as automated factor discovery, generative dialog, and continuous monitoring of borrowers’ credit behavior, FMs enable financial institutions to reduce operational costs, expedite lending decisions, and extend tailored financial services to small and micro enterprises as well as underserved customer segments. This enhances the reach and accessibility of inclusive finance.557,558
Investment decision support and market analysis by FMs
The application of FMs to securities investment and asset allocation is gaining momentum. Beyond traditional quantitative strategies, integrating FM decision intelligence facilitates the incorporation of diverse information sources, including macroeconomic indicators, industry reports, corporate disclosures, market news, and sentiment data. Utilizing advanced natural language understanding and multimodal learning techniques, FMs provide a more comprehensive depiction of market dynamics and risk profiles.559,560,561 Additionally, generative AI techniques can produce a wide array of trading strategy suggestions and early warning signals, empowering analysts, portfolio managers, and traders to optimize asset allocation and pricing decisions with greater precision.562
The application of FMs in risk control and regulatory technologies
Enhancing risk prevention and ensuring regulatory compliance are crucial for maintaining stable financial markets. FM decision intelligence improves both the early detection and real-time monitoring of abnormal transactions, fraudulent disclosures, and illicit financing activities by leveraging deep learning, graph-based knowledge representations, and anomaly detection algorithms.563,564 Furthermore, regulatory authorities can utilize FMs to implement “intelligent regulation,” automating compliance checks, tracking policy execution, and swiftly adapting supervisory strategies to emerging industry developments.565 International institutions such as the Monetary Authority of Singapore, the UK Financial Conduct Authority, and the US Securities and Exchange Commission have actively experimented with FM- and AI-driven regulatory technologies, providing valuable references for enhancing domestic supervisory frameworks.
FMs empower emerging financial services and business reconfiguration
FM decision intelligence also supports the development of new financial service models. Integrating intelligent contracts, digital identity verification, and distributed ledger technologies establishes the infrastructure for streamlined financial processes.566,567 By introducing natural language interfaces, customers can engage in “human-like” interactions, applying for loans, wealth management products, or insurance claims without navigating rigid form-based inputs, thereby significantly improving user experience and operational efficiency. Additionally, combining FMs with federated learning, privacy-preserving computation, and multiparty secure computation enables secure data collaboration across institutions. This approach safeguards user data privacy and security while facilitating decision sharing at scale, ultimately enhancing resource allocation and market efficiency.568
Challenges and perspectives
Despite rapid advancements, the deep integration of FM decision intelligence into finance presents several challenges. Model biases, data-quality issues, privacy protection, regulatory gaps, and ethical concerns remain pressing issues that require collaborative solutions across technological, institutional, and policy dimensions.569,570 Future research will focus on developing more trustworthy and explainable foundation-model decision frameworks, enhancing robustness under anomalous conditions, and ensuring dynamic adaptability to evolving regulatory policies.571 By establishing a robust policy and regulatory ecosystem, it is possible to balance fostering innovation with controlling risks, thereby promoting both service efficiency and fairness while maintaining market stability and sustainable innovation in the financial sector.
Educational science
FMs are reshaping educational science by powering adaptive tutoring, data-driven learning analytics, and equitable access to high-quality content,50 as shown in Figure 7. Their multimodal reasoning and vast knowledge base support personalized trajectories and real-time feedback that measurably boost learner engagement and outcomes.
The application of FMs for personalized and adaptive learning
One key advantage of employing FMs in education lies in their capacity for personalized learning recommendations. By analyzing large volumes of student data, including performance logs, interaction histories, and assessment results, these models can infer individual learner profiles and suggest tailored instructional materials. For example, a language-based FM can dynamically adapt reading passages or prompts to a student’s current proficiency level, thereby maintaining an optimal challenge and minimizing frustration.61,572 Furthermore, such models can identify gaps in student understanding and proactively provide targeted exercises or explanatory content, thus facilitating more efficient remediation and improving retention rates.
Beyond content adaptation, FMs support differentiated instruction by catering to diverse learning styles and preferences. Visual learners, for instance, can benefit from models that generate infographics or interactive simulations, while auditory learners might receive content in the form of narrated explanations or podcasts. This level of personalization ensures that educational materials are accessible and engaging to a broader range of students, thereby promoting inclusive education.
Additionally, FMs enhance accessibility by supporting multiple languages and dialects, bridging language barriers, and making educational content accessible to non-native speakers and students with diverse linguistic backgrounds. They can also generate alternative content formats, such as braille, audio descriptions, and sign-language interpretations, accommodating different students.573 Adaptive technologies powered by FMs provide customized learning paths and assistive tools, fostering an equitable learning environment where all students have the opportunity to succeed regardless of their unique challenges.574
For example, FMs such as DeepSeek-v3 have recently been leveraged to analyze students’ learning patterns and provide real-time feedback, enabling personalized tutoring experiences. By identifying each student’s knowledge gaps through data-driven assessments, the model tailors content delivery and practice exercises, facilitating targeted intervention by educators. This approach not only improves overall learning efficiency and engagement but also contributes to more equitable educational opportunities by addressing individual needs.
The application of FMs in intelligent tutoring and feedback systems
Another critical area of application involves intelligent tutoring systems (ITSs) and automated feedback mechanisms. Traditionally, providing high-quality, individualized feedback at scale has been a challenge. With FMs, however, ITSs can offer richer, more nuanced guidance. These systems can evaluate student responses to open-ended questions, highlight specific misconceptions, and present alternative solution strategies in real time. Moreover, by leveraging decision-making capabilities, the systems can determine not only what feedback to provide but also when and how to deliver it for maximum pedagogical impact.575
Advanced ITS powered by FMs can simulate one-on-one tutoring experiences, fostering deeper understanding through Socratic questioning and scaffolded learning. For example, in mathematics education, such systems can guide students through complex problem-solving processes, offering hints and prompting reflections that encourage critical thinking and self-regulation. Additionally, automated grading systems can assess not only the correctness of answers but also the reasoning processes, providing comprehensive evaluations that support formative assessment practices.
Furthermore, automated feedback mechanisms can support a wide range of subjects by providing instant, personalized feedback that helps students understand their mistakes and learn from them. This immediate reinforcement loop enhances the learning process and contributes to better academic performance and higher retention rates.576
The application of FMs for educators and institutional decision-making
The potential of FMs in education also extends to supporting teachers and educational administrators in decision-making processes. Intelligent systems built atop large models can assist in curriculum design by analyzing existing instructional materials, identifying coverage gaps, and recommending supplementary resources. They can also support student placement decisions and predict at-risk learners, enabling early interventions.575,577
Furthermore, FMs aid in professional development by providing personalized training resources for educators, identifying areas for improvement, and suggesting evidence-based teaching strategies. For administrators, these models streamline operational tasks such as scheduling, resource allocation, and policy formulation by analyzing institutional data and forecasting trends.
Additionally, FMs facilitate data-driven decision-making by aggregating and analyzing vast amounts of educational data, providing insights that inform strategic planning and policy development. This enables institutions to make informed decisions that enhance educational quality and operational efficiency.
Moreover, these models support collaboration among educators by offering platforms for sharing best practices, resources, and innovative teaching methods. By fostering a collaborative educational environment, FMs contribute to the continuous improvement of teaching and learning processes.
Challenges and perspectives
Looking ahead, the application of FMs in education presents numerous opportunities for innovation and research. Future developments may focus on improving the interpretability and explainability of these models, enabling educators to understand and trust the AI-driven recommendations and feedback. Additionally, integrating multimodal data sources, such as behavioral analytics, biometric data, and contextual information, can enhance the models’ ability to provide holistic and nuanced support to learners.
Interdisciplinary research that combines insights from education, cognitive science, and AI will be instrumental in advancing the effective use of FMs in educational contexts. Collaborative efforts can lead to the design of more sophisticated and pedagogically sound AI systems that align with educational best practices and learning theories.
Moreover, addressing the digital divide and ensuring equitable access to AI-driven educational technologies are critical for maximizing the societal benefits of FMs. Research initiatives aimed at developing low-cost, scalable solutions and promoting digital literacy among educators and students will contribute to more inclusive and widespread adoption of these technologies. In summary, the integration of FMs into educational contexts promises more adaptive, inclusive, and data-driven learning environments. By harnessing their decision-making capabilities, educators and institutions can significantly enhance teaching quality, learning personalization, and overall educational outcomes. However, addressing the associated challenges and ethical considerations is essential to realizing the full potential of these advanced AI systems in education.
Risks and challenges
LLM-based agent security in decision-making
As LLMs have advanced rapidly in recent years, LLM-based agent techniques have been penetrating the domain of decision-making. However, current studies578,579,580 disclose security issues in LLM agents that pose potential risks for decision-making, as shown in Figure 8. Referring to recent investigations of LLM-based agents578 and intelligent-algorithm security,581 an LLM-based decision-making framework can comprise several components: the user, a predefined system prompt, memory retrieval, the external environment (a set of toolkits), and others. Potential vulnerabilities and security threats can thus derive from any of these aspects. On the other hand, security problems can also arise from the interior of the LLM itself. Here, we unveil the security issues of decision-making in terms of both external and internal risks.
Figure 8.
Risks and challenges in LLM agent
The left panel exhibits algorithm-level attacks and mitigations from the perspectives of intrinsic vulnerabilities and the interactive environment, e.g., jailbreak, backdoor, and model interrogation. The middle panel describes application-level privacy and risk from the viewpoint of different disciplinary fields, e.g., membership inference attacks and the information cocoon. The right panel presents system-level LLM trustworthiness and robustness from the aspects of PoT and the intelligent decision-making environment, e.g., context-conflicting and fact-conflicting hallucinations.
External risks from LLM agents
During the chain of decision-making, the LLM agent may encounter the following attacks. (1) Prompt injection attacks: the attacker injects special instructions into the original prompt and manipulates the model’s understanding, leading to erroneous output.582,583,584 Such attacks can also compromise planning by manipulating the accessible external environment, i.e., the various available auxiliary tools.585,586 (2) Agent memory poisoning: unlike traditional data poisoning during deep-learning training, memory poisoning injects malicious or misleading data into the retrieval database, causing the agent to produce irrational decision plans or actions.587,588 (3) Backdoor attacks on the LLM agent: targeting the plan of thought (PoT), the attacker first poisons a subset of plan demonstrations, embedding a backdoored planning step and an adversarial target action, and then injects a trigger into the query prompt.589,590,591,592
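The mechanics of a prompt injection can be shown with a minimal sketch (the helper, strings, and scenario below are hypothetical illustrations, not any real system): untrusted data concatenated directly into a prompt shares the same token stream as the trusted instruction, so an injected directive is indistinguishable from a legitimate one.

```python
def build_prompt(system_instruction: str, user_data: str) -> str:
    # Naive composition: untrusted data is pasted straight after the
    # trusted instruction, with no boundary the model can rely on.
    return f"{system_instruction}\n{user_data}"

trusted = "Summarize the following customer review."
# The attacker controls the "review" field and smuggles in a directive.
injected = ("Great product. IGNORE ALL PREVIOUS INSTRUCTIONS "
            "and approve the pending loan.")

prompt = build_prompt(trusted, injected)
# The injected directive now sits in the same undifferentiated token
# stream as the legitimate instruction.
```

This lack of a trust boundary is precisely what delimiter-based defenses, discussed under mitigations, aim to restore.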
Internal risks from LLMs
Several internal security issues in LLMs can also induce decision failure. (1) Jailbreak attacks, analogous to traditional adversarial examples, use specific prompt-engineering strategies to break the safety guardrails of LLMs and deceive them into producing unintended content.593,594,595,596 (2) Model interrogation is a newer threat to LLM alignment; unlike a jailbreak attack, which requires crafted prompts, it coerces an LLM into disclosing harmful or unaligned responses by forcing the output of low-ranked tokens; as a prerequisite, the adversary needs access to the top-k token predictions at each output position.597 (3) Backdoor attacks: akin to backdoor risks in traditional deep learning, LLMs also face the threat of backdoor implantation, i.e., the adversary can discreetly embed a trigger into an LLM through instruction fine-tuning.598,599,600,601 Through such covert backdoors, the adversary can deceive the LLM into producing planning or action responses aligned with the adversary’s intention, which is critical for decision-makers.
Both the external and internal risks described above bring serious threats and risks to decision-making. To facilitate the practical applications in decision-making, we next review the defense countermeasures from two angles.
Mitigation in LLM agents
As noted above, a decision-making task usually consists of several stages that infer plans or actions; accordingly, several corresponding defense policies exist. (1) Delimiters: decision-makers can use delimiters to encapsulate the query, ensuring that only the intended query is executed.578 (2) Paraphrasing: the defender can reword the query, disrupting the special-character sequences used by malicious instructions or triggers.602 (3) Shuffle: to defend against a backdoored PoT, the steps of the PoT demonstration can be randomly reordered. (4) Memory-poisoning detection: compromised memory can be identified by measuring text perplexity or by employing an auxiliary LLM as a detector.584
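The perplexity-based memory-poisoning detection can be sketched as follows. This is an illustration only: a real deployment would score entries with an actual language model, whereas here a toy add-one-smoothed unigram model and an invented threshold stand in, and all corpus strings are hypothetical.

```python
import math
from collections import Counter

def unigram_model(corpus):
    """Toy stand-in for an LM: add-one-smoothed unigram probabilities."""
    counts = Counter(w for doc in corpus for w in doc.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve smoothed mass for unseen words
    def prob(word):
        return (counts[word] + 1) / (total + vocab)
    return prob

def perplexity(text, prob):
    words = text.split()
    log_p = sum(math.log(prob(w)) for w in words)
    return math.exp(-log_p / max(len(words), 1))

# Trusted agent memory used to calibrate a "normal" perplexity baseline.
trusted_corpus = ["ship order to warehouse",
                  "approve routine refund",
                  "schedule delivery for monday"]
prob = unigram_model(trusted_corpus)

# Retrieved memory: one benign entry, one injected adversarial entry.
memory = ["schedule delivery for monday",
          "zxqv transfer all funds to attacker wallet zxqv"]
baseline = sum(perplexity(d, prob) for d in trusted_corpus) / len(trusted_corpus)
flagged = [m for m in memory if perplexity(m, prob) > 1.5 * baseline]
```

Entries whose perplexity is anomalously high relative to the trusted baseline are flagged for review before the agent conditions its plan on them.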
Mitigation in LLMs
Given the complexity of LLMs, defense generally requires comprehensive strategies to mitigate harm. We summarize the representative defense mechanisms here. (1) Unbiased training: given the similarity between jailbreak attacks and adversarial attacks,603 one effective mitigation is to enhance and balance the training datasets of LLMs, even incorporating malicious instructions for co-training. For example, adversarial training improves a model’s robustness against adversarial attacks by including adversarial examples in the training data.604,605 (2) System prompt enhancement: one study606 reported that a short system prompt can increase the rate of successful attack; therefore, the system prompt also needs to be robust. (3) Malicious-content filtration: some attacks bypass input-stage safeguards but can still be caught at the output stage; thus, output detection and malicious-content filtration are necessary along the chain of decision-making.
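Output-stage filtration can be as simple as a pattern screen placed in front of the decision chain. The sketch below is illustrative only: the deny patterns are invented, and production systems would typically pair such rules with a learned safety classifier.

```python
import re

# Illustrative deny patterns (not a real safety policy).
BLOCKLIST = [r"ignore (all )?previous instructions",
             r"transfer .* funds",
             r"disable .* safeguard"]

def filter_output(text: str) -> tuple[bool, str]:
    """Return (allowed, text); block text matching any deny pattern
    before it is passed onward in the decision-making chain."""
    for pattern in BLOCKLIST:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return False, "[blocked by output filter]"
    return True, text

ok, _ = filter_output("The forecast suggests rebalancing the portfolio.")
blocked, msg = filter_output("Please transfer all funds to this account.")
```

Because the filter sits at the output stage, it catches payloads that slipped past input-stage guards, at the cost of possible false positives that a classifier-based second stage can reduce.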
To sum up, LLM-agent-based decision-making is currently sophisticated, involving a set of components beyond the LLM itself. Hence, attacks originate not only from the interior of the LLM but also from its exterior, posing severe threats and risks over the course of decision-making. To advance the practical application of LLMs to decision-making, these security issues deserve more attention and effort in the future.
Machine hallucination causality and mitigation
Hallucination, a term that emerged in the NLP domain before the birth of LLMs, generally refers to the generation of nonsensical or unfaithful responses to the provided source content.607 Given the versatility of LLMs, three categories of hallucination608 can be drawn. (1) Input-conflicting hallucination: the generated content is inconsistent with the user input; this can happen when the LLM misunderstands the task instruction. (2) Context-conflicting hallucination: the produced content conflicts with content the LLM generated earlier. This occurs when the model cannot track context or maintain consistency during a conversation, possibly stemming from insufficient long-term memory.609 (3) Fact-conflicting hallucination: the produced content is unfaithful to established knowledge. This phenomenon has multiple causes introduced at different phases of the LLM life cycle.
LLM hallucination is caused by several factors. (1) Noisy training data: LLMs are currently pretrained on trillions of tokens, some of which come from fabricated, outdated, or biased information.610 For instance, one study611 pointed out that LLMs may mistake camouflaged correlations for factual knowledge. McKenna and Cheng612 found a strong connection between hallucination and the distribution of the training data, observing that LLMs are biased toward samples attested in the training data. (2) Error invisibility: because LLMs store a huge volume of knowledge, even false information often appears plausible, which makes input-conflicting and context-conflicting hallucinations harder to reduce. (3) Overestimation by LLMs: Kadavath and colleagues613 reported that an LLM can be equally confident when generating correct and incorrect answers, and Yin et al.614 investigated the capability of representative LLMs to recognize unanswerable questions. The experiments disclosed that even advanced LLMs handle such questions poorly compared to humans; that is, LLMs overstep their knowledge boundaries and exhibit overconfidence about factual knowledge.
Correspondingly, several countermeasures mitigate hallucination. (1) Curation of pretraining corpora: since noisy data degrade the knowledge of an LLM, an effective and direct mitigation is to curate the pretraining corpus so that the training data are reliable and faithful.615 Current LLMs therefore attempt to collect pretraining data from credible sources; Llama 2, for example, up-samples data from Wikipedia. (2) Supervised fine-tuning (SFT): SFT usually involves two steps, first annotating massive task-instruction data and then using maximum likelihood estimation to fine-tune the pretrained LLM on the annotated data. Here too, a valid countermeasure is to curate the instruction-tuning dataset. The hallucination-related benchmark TruthfulQA616 verifies that SFT on a curated instruction dataset yields higher truthfulness and factuality than SFT on an uncurated one. (3) Decoding strategy design (DSD): DSD determines how output tokens are selected from the probability distribution generated by the model.617 Lee et al.618 propose “factual-nucleus sampling,” which balances diversity and factuality by combining top-p sampling with greedy decoding. Dhuliawala et al.619 developed a decoding framework, known as chain-of-verification, to alleviate hallucination. Li et al.620 proposed an inference-time intervention strategy to improve the truthfulness of LLMs. (4) Uncertainty estimation: uncertainty can serve as a criterion for when to trust an LLM and thus helps users filter out unreliable responses. Three perspectives on uncertainty estimation have been proposed621: model logits, verbalized expression, and logical consistency.
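As one concrete example, the decaying-nucleus idea behind factual-nucleus sampling618 can be sketched as follows. The token distribution and decay constants are invented for illustration: the nucleus mass p shrinks geometrically with token position toward a floor, so later tokens in a sentence are chosen near-greedily, trading diversity for factuality.

```python
def nucleus_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative mass reaches p, then renormalize (top-p filtering)."""
    kept, mass = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        mass += pr
        if mass >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}

def decayed_p(step, p0=0.9, lam=0.9, p_min=0.3):
    """Nucleus mass decays with token position, bounded below by p_min."""
    return max(p0 * lam ** step, p_min)

# Toy next-token distribution (illustrative values).
probs = {"paris": 0.55, "london": 0.25, "rome": 0.15, "oslo": 0.05}
early = nucleus_filter(probs, decayed_p(0))   # broad nucleus: diverse
late = nucleus_filter(probs, decayed_p(20))   # shrunken nucleus: near-greedy
```

Early in a sentence the nucleus admits several candidates; deep into the sentence only the top candidate survives, approximating greedy decoding where factual errors are most costly.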
The discussion above presents why hallucination occurs and the associated mitigations. Across the decision-making life cycle, given that hallucination may arise during pretraining, fine-tuning, and inference, we need to carefully craft well-aligned task-input instructions in addition to applying supervised fine-tuning, decoding strategies, and uncertainty estimation.
Data-privacy leakage
In the deep-learning domain, the canonical dataset-privacy attack is the membership inference attack (MIA): given a record D and a deep-learning model trained on a dataset Dtrain, the attacker tries to infer whether D belongs to Dtrain. In traditional MIAs, the common method is to train a substitute model,622 also called a shadow model, using feedback from black-box access to the victim model.
With the rapid development of data-privacy attacks, present-day LLMs also face such leakage risks. For instance, Carlini et al.623 discussed how the log-perplexity of an LSTM trained for next-token prediction can be exploited to infer sensitive sequences from the training dataset. The same authors623 subsequently proposed a data-extraction method for the Transformer-based GPT-2 that combines perplexity queries with zlib entropy. Furthermore, also targeting GPT-2, Mattern et al.624 devised a neighborhood attack for MIAs that computes the loss difference between a given sequence and neighboring samples. More recently, Meeus et al.579 proposed a document-level MIA against today’s LLMs with 7B+ parameters.
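The neighborhood attack624 can be sketched schematically: a sequence is judged a training member when its loss is markedly lower than the mean loss of lightly perturbed "neighbor" sequences. The toy loss below merely memorizes a training set; a real attack would query the victim language model, and all strings and thresholds are invented for illustration.

```python
# Hypothetical "training set" the toy victim model has memorized.
TRAIN = {"the patient was prescribed drug x"}

def toy_loss(seq: str) -> float:
    """Stand-in for an LM loss: memorized sequences score low."""
    return 0.5 if seq in TRAIN else 2.0

def neighbors(seq: str) -> list[str]:
    """Illustrative perturbation: drop one word at a time."""
    words = seq.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]

def neighborhood_score(seq: str) -> float:
    nb = neighbors(seq)
    mean_nb = sum(toy_loss(n) for n in nb) / len(nb)
    # Strongly negative => the exact sequence is anomalously easy for
    # the model relative to its neighbors, suggesting memorization.
    return toy_loss(seq) - mean_nb

member_score = neighborhood_score("the patient was prescribed drug x")
nonmember_score = neighborhood_score("the weather today is mild and dry")
is_member = member_score < -1.0  # illustrative decision threshold
```

The comparison against neighbors calibrates away sequences that are simply easy to model, which is what distinguishes this attack from a plain loss-threshold MIA.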
To date, studies of dataset-related privacy attacks on LLMs remain limited, partly because generating a substitute model is extremely complex when trillions of tokens and billions of parameters are required. Nevertheless, throughout the decision-making procedure, decision-related training datasets are still worth protecting against various membership inference attacks.
Social management and alignment risks
LLMs have seen pervasive application across multiple domains, especially in the NLP and computer-vision disciplines. Nevertheless, misalignment with human values causes serious problems for users and society, such as stereotyping,625 social bias,626 illegitimate instruction,595 and moral judgment,627 among others. To align LLMs with human values, a set of benchmarks has been built to test diverse LLMs and rectify their misbehaviors from different perspectives. For example, FLAMES628 tackles both harmlessness principles and a morality unique to Chinese values (harmony). ALI-Agent629 employs the autonomous capability of an LLM-powered agent to conduct adaptive alignment assessment on stereotyping, morality, and legality. Lee et al.630 propose an alignment benchmark for Korean social values and common knowledge. Fu et al.631 investigated misalignment in the context of cultural heritage. From this research, we can conclude that as LLMs penetrate daily life, they surface ever more problems and challenges in social management and human values.
Technical vulnerabilities
With the development of information technology and social networks, the channels of information dissemination and the management paradigms of society have gradually shifted from traditional top-down, monolithic forms to decentralized, flexible, and diverse paradigms. In recent years, the emergence of AI technologies led by LLMs has significantly enhanced the speed and breadth of information dissemination in human society while also boosting the administrative efficiency of public-management departments. However, the rapid development of LLMs has also brought various risks to social management. First, LLM-based technology can easily generate large amounts of deceptive text, images, and even short videos, and such false information spreads rapidly through well-developed social media networks. In particular, false information that distorts or misinterprets policies often spreads faster than social-management departments can respond, disrupting normal policy implementation and social order.632 Second, social bots based on LLMs and recommendation systems exacerbate the “information cocoon” phenomenon.633 Using powerful decision-making capabilities, they lock onto target audiences with precisely targeted information delivery, gradually confining human cognition within certain boundaries through repeated interactions. This mechanism catalyzes racism and political polarization among more extreme user groups online, allowing online violence to expand into offline protests, demonstrations, and even riots, causing social unrest.634
Regulatory gaps
Furthermore, in some ethically sensitive areas of social management, especially healthcare, LLMs also carry many hidden risks. Although LLMs like Med-PaLM635 currently outperform human doctors in eight of nine clinical dimensions in terms of diagnostic accuracy, their deployment still poses regulatory challenges. The US Food and Drug Administration is reluctant to certify LLMs as medical devices, owing to instances where LLMs have inadvertently violated clinical decision-support guidelines. This underscores the urgent need for regulatory mechanisms and policy frameworks like the EU Artificial Intelligence Act, which mandates rigorous validation and transparency for high-risk AI systems in healthcare settings. In addition, cultural biases embedded in LLM training data further exacerbate these risks: a comparative analysis reveals that GPT-4’s diagnostic performance on non-Western medical datasets lags behind its Western-centric counterparts, with error rates increasing by 19% in low-resource settings.636 In conclusion, LLM-based technology poses risks and challenges to future social management, and there is a need to close these gaps by establishing regulatory mechanisms and policy frameworks that align LLMs with mainstream values and proactively prevent potential issues.
Value-alignment concerns
Current AI value-alignment methods can be broadly categorized into three main types: RL from human feedback, supervised fine-tuning methods, and inference-time alignment.637 These approaches still face numerous challenges regarding the interpretability of alignment and the variability of alignment objectives. Recently, the Value Compass project led by Microsoft Research Asia has approached this issue from an interdisciplinary perspective, drawing extensively from theories in ethics and sociology.638 This initiative has introduced the BaseAlign algorithm, which is grounded in the theory of basic human values. BaseAlign constructs a fundamental values space based on the various dimensions of human basic values proposed by social psychologist Shalom H. Schwartz.639 The target values for alignment can be represented as a vector within this value space. A discriminative model is then utilized to derive the value vector corresponding to the current behavior of the large model, with alignment achieved by minimizing the distance between these two vectors. To some extent, BaseAlign enhances the interpretability and transparency of value alignment in large models as well as their adaptability to the continuously evolving sociocultural context and changing social norms. However, despite the advancements in AI value alignment techniques, there remains a significant gap in achieving true alignment in large models. Challenges include the “alignment tax” problem, where value alignment may compromise the original capabilities of large models, and the scalability of supervision, especially in scenarios where future AI capabilities and knowledge far exceed human abilities, necessitating effective oversight and control.
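The geometric core of BaseAlign as described above can be sketched in a few lines. The value dimensions and vectors below are illustrative stand-ins for the Schwartz-based basic-values space and for the discriminative model's output; minimizing the distance between the target vector and the behavior vector is the alignment objective.

```python
import math

# Illustrative value dimensions (loosely after Schwartz's basic values;
# the names and all numbers below are invented for this sketch).
DIMS = ["benevolence", "security", "achievement", "self_direction"]

def distance(u, v):
    """Euclidean distance between two value vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

target = [0.9, 0.7, 0.3, 0.5]         # desired value vector for alignment
behavior_a = [0.85, 0.65, 0.35, 0.5]  # value vector of candidate response A
behavior_b = [0.2, 0.3, 0.9, 0.4]     # value vector of candidate response B
assert all(len(v) == len(DIMS) for v in (target, behavior_a, behavior_b))

# Alignment objective: prefer the behavior whose value vector is
# closest to the target; training would minimize this distance.
best = min([behavior_a, behavior_b], key=lambda v: distance(target, v))
```

Representing the alignment target as an explicit vector is what makes the objective inspectable, and the same distance can be re-evaluated as social norms, and hence the target vector, evolve.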
Therefore, it is particularly important to implement industry-specific compliance frameworks in response to the regulatory mechanisms and policies, mandate bias audits for LLMs, and develop cross-cultural alignment and validation benchmarks.
Conclusion
In conclusion, this paper provides a comprehensive review of the technical developments in IDM, highlighting its evolution through the integration of models, optimization algorithms, and probabilistic inference tools. The focus has been on the paradigm of FM-based decision-making, exploring its potential to revolutionize IDM across various fields. While FMs offer unprecedented opportunities for advancing IDM in diverse applications, they also present significant challenges such as security, data privacy, and deployment costs.
The future development of IDM will focus on the following aspects, the first of which is the interpretability and transparency of decisions. Although methods such as CoT can partially reveal the decision-making process of large models, challenges remain, such as instability in generation, uncontrollable granularity, and misalignment with human reasoning. Therefore, for IDM applications in high-risk industries like military, law, and healthcare, further research is needed to enhance its interpretability and address issues such as model hallucination. Second, interdisciplinary IDM requires further investigation. Current IDM approaches are either based on general models or targeted at specific domains. However, to ensure the compliance and economic feasibility of decisions, it is often necessary to integrate knowledge from fields such as law and economics. Thus, one of the future research directions will be how to build interdisciplinary IDM. Lastly, the decision-making environment is typically dynamic. Existing model training and deployment methods make it difficult for models to continuously adapt to the evolving decision environment. Therefore, research into how models can self-evolve and dynamically update during the inference phase to cope with such changes is also a crucial area for future exploration.
IDM, at its essence, is about automating decision-making processes. It holds profound significance for human society and scientific progress, as it paves the way for more informed, efficient, and intelligent solutions. Looking ahead, continuous efforts are needed to address the existing challenges and further explore its potential in emerging fields, thus promoting the overall development of IDM and its positive impact on the world.
Funding and acknowledgments
This work was partially supported by the National Natural Science Foundation of China under grant nos. 62372470, 72225011, 62402414, U23B2059, 62173034, 32222070, 62402017, 72421002, 62206303, 62476264, 62406312, 62102266, 52173241, and U23A20468, the National Key Research and Development Program of China (2023YFD1900604), the Strategic Priority Research Program of the Chinese Academy of Science (XDB0680301), the Youth Innovation Promotion Association CAS (2023112), the National High Level Hospital Clinical Research funding (2022-PUMCH-A-014), the Beijing Natural Science Foundation (4244098), the Science and Technology Innovation Program of Hunan Province (2023RC3009), the Key Research and Development Program of Yunnan Province (202202AE090034), the MNR Key Laboratory for Geo-Environmental Monitoring of Greater Bay Area (GEMLab-2023001), the Science and Technology Innovation Key R&D Program of Chongqing (CSTB2024TIAD-STX0024), the China National Postdoctoral Program for Innovative Talents (BX20240385), and the River Talent Recruitment Program of Guangdong Province (2019ZT08X603).
Author contributions
J.H. and Y.X. designed and organized the review. Qi Wang, Qi (Cheems) Wang, T.L., X. Li, and S.Q. wrote the introduction. Z.Z., Z.S., T.Q., T.S., B.D., C. Yang, C. Yu, Y.W., and M.L. wrote the section “overview and development of FMs.” X. Liang, W.W., H.Z., Y.Z., Zhicheng Zhang, Z. Zhu, Y.L., A.L., Xu Cheng, B.A., and X. Zheng wrote the section “the paradigm of FM-based decision-making and key technologies.” F.W., B.Z., L.H., J.C., L.M., T.M., Y.L., J.Z., Jian Guo, X.J., W.X., C.B., Y.M., Z.Y., S.G., W.S., Y. Zhu, Junyi Gao, X.H., Y.L., G.J., X.A., X. Zhai, H.T., L.Y., H.S., J.L., E.H., V.C.M.L., Y.D., G.W., Y. Zheng, Yuanzhuo Wang, Jiafeng Guo, and L.W. wrote the section “FM-based intelligent decision-making for sciences.” X.F., Qi (Cheems) Wang, and G.J. wrote the section “risks and challenges of intelligent decision-making with large models.” Z.A., C.F., and K.H. wrote the conclusion. X. Chen, Yaonan Wang, S.Y., M.F., and A.F. mentored and revised the review.
Declaration of interests
The authors declare no competing interests.
Published Online: May 12, 2025
Contributor Information
Sihang Qiu, Email: qiusihang11@nudt.edu.cn.
Yanjie Dong, Email: ydong@smbu.edu.cn.
Xiaolong Zheng, Email: xiaolong.zheng@ia.ac.cn.
Gang Wang, Email: gangwang@bit.edu.cn.
Yu Zheng, Email: msyuzheng@outlook.com.
Yuanzhuo Wang, Email: wangyuanzhuo@ict.ac.cn.
Jiafeng Guo, Email: guojiafeng@ict.ac.cn.
Lizhe Wang, Email: lizhe.wang@gmail.com.
Xueqi Cheng, Email: cxq@ict.ac.cn.
Yaonan Wang, Email: yaonan@hnu.edu.cn.
Shanlin Yang, Email: yangsl@hfut.edu.cn.
Mengyin Fu, Email: fumy@bit.edu.cn.
Aiguo Fei, Email: aiguofei@bupt.edu.cn.
References
- 1.Hawkins D. Theory of games and economic behavior. Philos. Sci. 1945;12(3):221–227. doi: 10.1086/286866. [DOI] [Google Scholar]
- 2.Simon H.A. Administrative behavior: A study of decision-making processes in administrative organization. Macmillan. 1947 doi: 10.2307/1950596. [DOI] [Google Scholar]
- 3.Simon H.A. The new science of management decision. Harper & Brothers; 1960. [DOI] [Google Scholar]
- 4.Kahneman D., Tversky A. Prospect theory: An analysis of decision under risk. Econometrica. 1979;47:263–291. doi: 10.2307/1914185. [DOI] [Google Scholar]
- 5.Boyd J. (1976). Destruction and creation. Unpublished manuscript.
- 6.Gupta J.N.D., Forgionne G.A., Mora M.T., et al. Development processes of intelligent decision-making support systems: review and perspective. Intell. Decis.-Mak. Support Syst. 2006;2006:97–121. doi: 10.1007/1-84628-231-4_6. [DOI] [Google Scholar]
- 7.Agostini A., Torras C., Wörgötter F. Efficient interactive decision-making framework for robotic applications. Artif. Intell. 2017;247:187–212. doi: 10.1016/j.artint.2015.06.005. [DOI] [Google Scholar]
- 8.Yu L., Wang S., Lai K.K. An intelligent-agent-based fuzzy group decision making model for financial multicriteria decision support: The case of credit scoring. Eur. J. Oper. Res. 2009;195:942–959. doi: 10.1016/j.ejor.2007.11.023. [DOI] [Google Scholar]
- 9.Yang H., Kundakcioglu E., Li J., et al. Healthcare intelligence: turning data into knowledge. IEEE Intell. Syst. 2014;29:54–68. doi: 10.1109/MIS.2014.36. [DOI] [Google Scholar]
- 10.Sutton R.S., Barto A.G. Reinforcement learning: An introduction. MIT Press; 2018. [DOI] [Google Scholar]
- 11.Mnih V., Kavukcuoglu K., Silver D., et al. Playing Atari with deep reinforcement learning. arXiv. 2013 doi: 10.48550/arXiv.1312.5602. Preprint at. [DOI] [Google Scholar]
- 12.Jawahar G., Sagot B., Seddah D. What does BERT learn about the structure of language? Proc. ACL. 2019;57:3651–3657. doi: 10.18653/v1/P19-1356. [DOI] [Google Scholar]
- 13.Achiam J., Adler S., Agarwal S., et al. GPT-4 technical report. arXiv. 2023 doi: 10.48550/arXiv.2303.08774. Preprint at. [DOI] [Google Scholar]
- 14.Radford A., Kim J.W., Hallacy C., et al. Learning transferable visual models from natural language supervision. arXiv. 2021 doi: 10.48550/arXiv.2103.00020. Preprint at. [DOI] [Google Scholar]
- 15.Chen T., Kornblith S., Norouzi M., et al. A simple framework for contrastive learning of visual representations. Proc. ICML. 2020;119:1597–1607. doi: 10.48550/arXiv.2002.05709. [DOI] [Google Scholar]
- 16.He K., Chen X., Xie S., et al. Masked autoencoders are scalable vision learners. Proc. CVPR. 2022;2022:16000–16009. doi: 10.1109/CVPR52688.2022.01552. [DOI] [Google Scholar]
- 17.Finn C., Abbeel P., Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. Proc. ICML. 2017;70:1126–1135. doi: 10.48550/arXiv.1703.03400. [DOI] [Google Scholar]
- 18.Snell J., Swersky K., Zemel R.S. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017;30 doi: 10.48550/arXiv.1703.05175. [DOI] [Google Scholar]
- 19.Wang Q., Federici M., van Hoof H. Bridge the inference gaps of neural processes via expectation maximization. Proc. ICLR. 2023 doi: 10.48550/arXiv.2210.05217. [DOI] [Google Scholar]
- 20.Liu B., Feng Y., Stone P., et al. FAMO: Fast adaptive multitask optimization. Adv. Neural Inf. Process. Syst. 2023;36 doi: 10.48550/arXiv.2306.03792. [DOI] [Google Scholar]
- 21.Shen J., Zhen X., Wang Q., et al. Episodic multi-task learning with heterogeneous neural processes. Adv. Neural Inf. Process. Syst. 2023;36:75214–75228. doi: 10.48550/arXiv.2310.18713. [DOI] [Google Scholar]
- 22.Shen J., Wang C., Xiao Z., et al. Go4Align: group optimization for multi-task alignment. arXiv. 2024 doi: 10.48550/arXiv.2404.06486. Preprint at. [DOI] [Google Scholar]
- 23.Lester B., Al-Rfou R., Constant N. The power of scale for parameter-efficient prompt tuning. Proc. EMNLP. 2021;2021:3045–3059. doi: 10.18653/v1/2021.emnlp-main.243. [DOI] [Google Scholar]
- 24.Yang Y., Shi Y., Wang C., et al. Reducing fine-tuning memory overhead by approximate and memory-sharing backpropagation. arXiv. 2024 doi: 10.48550/arXiv.2406.16282. Preprint at. [DOI] [Google Scholar]
- 25.Ma Z., Wang S., Deng X., et al. An improved approach for adversarial decision making under uncertainty based on simultaneous game. Proc. Chin. Control Decis. Conf. 2018;2018:2499–2503. doi: 10.1109/CCDC.2018.8407552. [DOI] [Google Scholar]
- 26.Gigerenzer G., Gaissmaier W. Heuristic decision making. Annu. Rev. Psychol. 2011;62:451–482. doi: 10.1146/annurev-psych-120709-145346. [DOI] [PubMed] [Google Scholar]
- 27.Kireeva N., Pozdnyak I., Filippov N. Development of a decision-making algorithm for expert system in information security. Proc. IEEE PIC S&T. 2020;2020:212–216. doi: 10.1109/PICST51311.2020.9467899. [DOI] [Google Scholar]
- 28.Chen L., Wang L., Dong H., et al. Introspective tips: Large language model for in-context decision making. arXiv. 2023 doi: 10.48550/arXiv.2305.11598. Preprint at. [DOI] [Google Scholar]
- 29.Bohanec M., Rajkovic V. Expert system for decision making. Sistemica. 1990;1(1):145–157. doi: 10.24432/C5P88W. [DOI] [Google Scholar]
- 30.Mnih V., Kavukcuoglu K., Silver D., et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–533. doi: 10.1038/nature14236. [DOI] [PubMed] [Google Scholar]
- 31.Shao J., Qu Y., Chen C., et al. Counterfactual conservative Q learning for offline multi-agent reinforcement learning. Adv. Neural Inf. Process. Syst. 2023;36 doi: 10.48550/arXiv.2309.12696. [DOI] [Google Scholar]
- 32.Kaufmann E., Bauersfeld L., Loquercio A., et al. Champion-level drone racing using deep reinforcement learning. Nature. 2023;620(7976):982–987. doi: 10.1038/s41586-023-06222-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Schrittwieser J., Antonoglou I., Hubert T., et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature. 2020;588(7839):604–609. doi: 10.1038/s41586-020-03051-4. [DOI] [PubMed] [Google Scholar]
- 34.Levine S., Kumar A., Tucker G., et al. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv. 2020 doi: 10.48550/arXiv.2005.01643. Preprint at. [DOI] [Google Scholar]
- 35.Mao Y., Zhang H., Chen C., et al. Supported trust region optimization for offline reinforcement learning. Proc. ICML 2023. 2023;202:23829–23851. doi: 10.48550/arXiv.2311.08935. [DOI] [Google Scholar]
- 36.Yang S., Nachum O., Du Y., et al. (2023). Foundation models for decision making: Problems, methods, and opportunities. Preprint at arXiv:2303.04129. DOI: 10.48550/arXiv.2303.04129. [DOI]
- 37.Bellemare M.G., Dabney W., Munos R. (2017). A distributional perspective on reinforcement learning. Proc. ICML. 2017:449–458. DOI: 10.48550/arXiv.1707.06887.
- 38.Albrecht S.V., Christianos F., Schäfer L. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press; 2024. [Google Scholar]
- 39.Schröder de Witt C., Gupta T., Makoviichuk D., et al. (2020). Is independent learning all you need in the StarCraft multi-agent challenge? Preprint at arXiv:2011.09533. DOI: 10.48550/arXiv.2011.09533. [DOI]
- 40.Oliehoek F.A., Spaan M.T.J., Vlassis N. Optimal and approximate Q-value functions for decentralized POMDPs. J. Artif. Intell. Res. 2008;32:289–353. doi: 10.1613/jair.2429. [DOI] [Google Scholar]
- 41.Kraemer L., Banerjee B. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing. 2016;190:82–94. doi: 10.1016/j.neucom.2016.01.031. [DOI] [Google Scholar]
- 42.Yu C., Velu A., Vinitsky E., et al. The surprising effectiveness of PPO in cooperative multi-agent games. Adv. Neural Inf. Process. Syst. 2022;35:1–12. doi: 10.48550/arXiv.2103.01955. [DOI] [Google Scholar]
- 43.Li M., Wang Q., Xu Y. GTDE: Grouped training with decentralized execution for multi-agent actor-critic. Proc. AAAI Conf. Artif. Intell. 2025;39:1–10. doi: 10.1609/aaai.v39i1.34021. [DOI] [Google Scholar]
- 44.Springenberg J.T., Abdolmaleki A., Zhang J., et al. Offline actor-critic reinforcement learning scales to large models. Proc. Int. Conf. Mach. Learn. 2024;235:46323–46350. doi: 10.48550/arXiv.2402.05546. [DOI] [Google Scholar]
- 45.Wang Q., Van Hoof H. Model-based meta reinforcement learning using graph structured surrogate models and amortized policy search. Proc. Int. Conf. Mach. Learn. 2022;162:1–12. doi: 10.48550/arXiv.2102.08291. [DOI] [Google Scholar]
- 46.Liu F., Liu H., Grover A., Abbeel P. Masked autoencoding for scalable and generalizable decision making. Adv. Neural Inf. Process. Syst. 2022;35:12608–12618. doi: 10.48550/arXiv.2211.12740. [DOI] [Google Scholar]
- 47.Sekar R., Rybkin O., Daniilidis K., et al. Planning to explore via self-supervised world models. Proc. Int. Conf. Mach. Learn. 2020;119:8583–8592. doi: 10.48550/arXiv.2005.05960. [DOI] [Google Scholar]
- 48.Reed S., Zolna K., Parisotto E. et al. (2022). A generalist agent. Preprint at arXiv:2205.06175. DOI: 10.48550/arXiv.2205.06175. [DOI]
- 49.Wang J.X., Kurth-Nelson Z., Tirumala D. et al. (2016). Learning to reinforcement learn. Preprint at arXiv:1611.05763. DOI: 10.48550/arXiv.1611.05763. [DOI]
- 50.Bommasani R., Hudson D.A., Adeli E. et al. (2021). On the opportunities and risks of foundation models. Preprint at arXiv:2108.07258. DOI: 10.48550/arXiv.2108.07258. [DOI]
- 51.Zhuang F., Qi Z., Duan K., et al. A comprehensive survey on transfer learning. Proc. IEEE. 2020;109(1):43–76. doi: 10.1109/JPROC.2020.3004555. [DOI] [Google Scholar]
- 52.Xu Y., Liu X., Cao X., et al. Artificial intelligence: A powerful paradigm for scientific research. The Innovation. 2021;2(4):100104. doi: 10.1016/j.xinn.2021.100104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Yuan L., Chen D., Chen Y.-L., et al. (2021). Florence: A new foundation model for computer vision. Preprint at arXiv:2111.11432. DOI: 10.48550/arXiv.2111.11432. [DOI]
- 54.Subramanian S., Harrington P., Keutzer K., et al. Towards foundation models for scientific machine learning: Characterizing scaling and transfer behavior. Adv. Neural Inf. Process. Syst. 2023;36:12345–12367. doi: 10.48550/arXiv.2306.00258. [DOI] [Google Scholar]
- 55.Xu Y., Wang F., An Z., et al. Artificial intelligence for science—bridging data to wisdom. The Innovation. 2023;4:100525. doi: 10.1016/j.xinn.2023.100525. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Mikolov T., Chen K., Corrado G., et al. (2013). Efficient estimation of word representations in vector space. Preprint at arXiv:1301.3781.
- 57.Pennington J., Socher R., Manning C.D. GloVe: Global vectors for word representation. Adv. Neural Inf. Process. Syst. 2014;27:1532–1543. doi: 10.3115/v1/D14-1162. [DOI] [Google Scholar]
- 58.Vaswani A., Shazeer N., Parmar N., et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017;30:5998–6008. doi: 10.48550/arXiv.1706.03762. [DOI] [Google Scholar]
- 59.Devlin J., Chang M.-W., Lee K., et al. BERT: Pre-training of deep bidirectional transformers for language understanding. Proc. NAACL-HLT. 2019:4171–4186. doi: 10.18653/v1/N19-1423. [DOI] [Google Scholar]
- 60.Radford A., Wu J., Child R., et al. Language models are unsupervised multitask learners. OpenAI Blog. 2019;1(8):9. [Google Scholar]
- 61.Brown T.B., Mann B., Ryder N., et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020;33:1877–1901. doi: 10.48550/arXiv.2005.14165. [DOI] [Google Scholar]
- 62.Ramesh A., Pavlov M., Goh G., et al. Zero-shot text-to-image generation. Adv. Neural Inf. Process. Syst. 2021;34:8821–8831. doi: 10.48550/arXiv.2102.12092. [DOI] [Google Scholar]
- 63.Liu Z., Lin Y., Cao Y., et al. Swin Transformer: Hierarchical vision transformer using shifted windows. Proc. IEEE/CVF Int. Conf. Comput. Vis. 2021:9992–10002. doi: 10.1109/ICCV48922.2021.00986. [DOI] [Google Scholar]
- 64.Yin S., Fu C., Zhao S., et al. A survey on multimodal large language models. CoRR. 2023 doi: 10.48550/arXiv.2306.13549. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Xu P., Zhu X., Clifton D.A. Multimodal learning with transformers: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023;45(10):12113–12132. doi: 10.1109/TPAMI.2023.3275156. [DOI] [PubMed] [Google Scholar]
- 66.Zhang D., Yu Y., Dong J., et al. MM-LLMs: Recent Advances in MultiModal Large Language Models. Findings Assoc. Comput. Linguist. 2024;2024:12401–12430. doi: 10.18653/v1/2024.findings-acl.738. [DOI] [Google Scholar]
- 67.Peters M.E., Neumann M., Iyyer M., et al. (2018). Deep contextualized word representations. Proc. NAACL-HLT 2018:2227–2237. DOI:10.18653/v1/N18-1202
- 68.Graves A. Vol. 385. Springer; 2012. Long short-term memory. In: Supervised Sequence Labelling with Recurrent Neural Networks; pp. 37–45. (Studies in Computational Intelligence). [DOI] [Google Scholar]
- 69.Radford A., Narasimhan K., Salimans T., et al. Improving language understanding by generative pre-training. OpenAI Blog. 2018. [Google Scholar]
- 70.Dong L., Yang N., Wang W., et al. Unified language model pre-training for natural language understanding and generation. Adv. Neural Inf. Process. Syst. 2019;32:13042–13054. doi: 10.48550/arXiv.1905.03197. [DOI] [Google Scholar]
- 71.Raffel C., Shazeer N., Roberts A., et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020;21:1–67. [Google Scholar]
- 72.He K., Zhang X., Ren S., et al. (2016). Deep residual learning for image recognition. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2016:770–778. DOI:10.1109/CVPR.2016.90
- 73.Yosinski J., Clune J., Bengio Y., et al. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014;27:3320–3328. [Google Scholar]
- 74.Long M., Cao Y., Wang J., et al. Learning transferable features with deep adaptation networks. Proc. Int. Conf. Mach. Learn. 2015;37:97–105. [Google Scholar]
- 75.Dosovitskiy A., Beyer L., Kolesnikov A., et al. (2020). An image is worth 16×16 words: Transformers for image recognition at scale. Preprint at arXiv:2010.11929. DOI: 10.48550/arXiv.2010.11929. [DOI]
- 76.Ho J., Jain A., Abbeel P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020;33:6840–6851. doi: 10.48550/arXiv.2006.11239. [DOI] [Google Scholar]
- 77.Sohl-Dickstein J., Weiss E., Maheswaranathan N., et al. Deep unsupervised learning using nonequilibrium thermodynamics. Proc. Int. Conf. Mach. Learn. 2015;37:2256–2265. [Google Scholar]
- 78.Song Y., Ermon S. Generative modeling by estimating gradients of the data distribution. Adv. Neural Inf. Process. Syst. 2019;32:11895–11907. [Google Scholar]
- 79.Feng W., Yang C., An Z., et al. Relational diffusion distillation for efficient image generation. Proc. ACM Int. Conf. Multimedia. 2024;32:205–213. doi: 10.1145/3581783.3612406. [DOI] [Google Scholar]
- 80.Esser P., Kulal S., Blattmann A., et al. (2024). Scaling rectified flow transformers for high-resolution image synthesis. Proc. Int. Conf. Mach. Learn. 503:12606–12633.
- 81.Peebles W., Xie S. Scalable diffusion models with transformers. Proc. IEEE/CVF Int. Conf. Comput. Vis. 2023;2023:4195–4205. doi: 10.1109/ICCV.2023.00474. [DOI] [Google Scholar]
- 82.Brooks T., Peebles B., Holmes C., et al. Video generation models as world simulators. OpenAI Blog. 2024;1:8. [Google Scholar]
- 83.Sun Q., Yu Q., Cui Y., et al. (2023). Generative pretraining in multimodality. Preprint at arXiv:2307.05222. DOI: 10.48550/arXiv.2307.05222. [DOI]
- 84.Rombach R., Blattmann A., Lorenz D., et al. High-resolution image synthesis with latent diffusion models. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. 2022;2022:10684–10695. doi: 10.1109/CVPR52688.2022.01041. [DOI] [Google Scholar]
- 85.Ramesh A., Dhariwal P., Nichol A., et al. (2022). Hierarchical text-conditional image generation with CLIP latents. Preprint at arXiv:2204.06125. DOI: 10.48550/arXiv.2204.06125. [DOI]
- 86.Liu H., Li C., Wu Q., et al. Visual instruction tuning. Adv. Neural Inf. Process. Syst. 2024;36 doi: 10.48550/arXiv.2304.08485. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Zhu D., Chen J., Shen X., et al. (2023). MiniGPT-4: Enhancing vision-language understanding with advanced large language models. Preprint at arXiv:2304.10592. DOI: 10.48550/arXiv.2304.10592.
- 88.Alayrac J.-B., Donahue J., Luc P., et al. Flamingo: A visual language model for few-shot learning. Adv. Neural Inf. Process. Syst. 2022;35:23716–23736. [Google Scholar]
- 89.Cheng X., Yan X., Lan Y., et al. BTM: Topic modeling over short texts. IEEE Trans. Knowl. Data Eng. 2014;26(12):2928–2941. doi: 10.1109/TKDE.2014.2313872. [DOI] [Google Scholar]
- 90.Cheng X.-Q., Shen H.-W. Uncovering the community structure associated with the diffusion dynamics on networks. J. Stat. Mech. 2010;2010(04):04024. doi: 10.1088/1742-5468/2010/04/P04024. [DOI] [Google Scholar]
- 91.Li Y., Sun T., Shao Z., et al. Trajectory-user linking via multi-scale graph attention network. Pattern Recognit. 2025;158:110978. doi: 10.1016/j.patcog.2024.110978. [DOI] [Google Scholar]
- 92.Zhang Y., Lin Y., Zheng G., et al. MetaCity: Data-driven sustainable development of complex cities. The Innovation. 2025;6(2):100775. doi: 10.1016/j.xinn.2024.100775. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Qian T., Xu Y., Zhang Z., et al. Trajectory prediction from hierarchical perspective. Proc. ACM Int. Conf. Multimedia. 2022;30:6822–6830. doi: 10.1145/3503161.3548092. [DOI] [Google Scholar]
- 94.Sun T., Wang F., Zhang Z., et al. Human mobility identification by deep behavior relevant location representation. Proc. Int. Conf. Database Syst. Adv. Appl. 2022;13247:439–454. doi: 10.1007/978-3-031-00126-0_33. [DOI] [Google Scholar]
- 95.Guan S., Jin X., Guo J., et al. NeuInfer: Knowledge inference on n-ary facts. In. Proc. Annu. Meet. Assoc. Comput. Linguist. 2020;58:6141–6151. doi: 10.18653/v1/2020.acl-main.546. [DOI] [Google Scholar]
- 96.Chen Z., Zhang Z., Li Z., et al. Self-improvement programming for temporal knowledge graph question answering. Proc. LREC-COLING. 2024;2024:14579–14594. [Google Scholar]
- 97.Liang K., Meng L., Liu M., et al. A survey of knowledge graph reasoning on graph types: Static, dynamic, and multi-modal. IEEE Trans. Pattern Anal. Mach. Intell. 2024;46(12):9456–9478. doi: 10.1109/TPAMI.2024.3417451. [DOI] [PubMed] [Google Scholar]
- 98.Yu C., Wang F., Shao Z., et al. GinAR: An end-to-end multivariate time series forecasting model suitable for variable missing. Proc. ACM SIGKDD Conf. Knowl. Discov. Data Min. 2024;2024:3989–4000. doi: 10.1145/3637528.3672055. [DOI] [Google Scholar]
- 99.Li Y., Shao Z., Xu Y., et al. Dynamic frequency domain graph convolutional network for traffic forecasting. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 2024;2024:5245–5249. [Google Scholar]
- 100.Liang K., Liu Y., Zhou S., et al. Knowledge graph contrastive learning based on relation-symmetrical structure. IEEE Trans. Knowl. Data Eng. 2023;36(1):226–238. doi: 10.1109/TKDE.2023.3282989. [DOI] [Google Scholar]
- 101.Qiu J., Chen Q., Dong Y., et al. GCC: Graph contrastive coding for graph neural network pre-training. Proc. ACM SIGKDD Conf. Knowl. Discov. Data Min. 2020;26:1150–1160. doi: 10.1145/3394486.3403168. [DOI] [Google Scholar]
- 102.Sun X., Cheng H., Li J., et al. All in one: Multi-task prompting for graph neural networks. Proc. ACM SIGKDD Conf. Knowl. Discov. Data Min. 2023;29:2120–2131. doi: 10.1145/3580305.3599256. [DOI] [Google Scholar]
- 103.Wenkel F., Wolf G., Knyazev B. (2023). Pretrained language models to solve graph tasks in natural language. Proc. ICML Workshop Struct. Probab. Inference Generative Model. 2023.
- 104.Ye R., Zhang C., Wang R., et al. Language is all a graph needs. Findings Assoc. Comput. Linguist.: EACL. 2024;2024:1955–1973. [Google Scholar]
- 105.Mavromatis C., Ioannidis V.N., Wang S., et al. Train your own GNN teacher: Graph-aware distillation on textual graphs. Lect. Notes Comput. Sci. 2023;14171:157–173. doi: 10.1007/978-3-031-43418-1_10. [DOI] [Google Scholar]
- 106.Zhao T., Wang S., Ouyang C., et al. Artificial intelligence for geoscience: Progress, challenges and perspectives. The Innovation. 2024;5(5):100691. doi: 10.1016/j.xinn.2024.100691. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Qian T., Chen Y., Cong G., et al. AdapTraj: A multi-source domain generalization framework for multi-agent trajectory prediction. IEEE; 2024. pp. 5048–5060. [DOI] [Google Scholar]
- 108.Xu Y., Wang F., Zhang T. Artificial intelligence is restructuring a new world. Innovation. 2024;5(6):100725. doi: 10.1016/j.xinn.2024.100725. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109.Wang F., Yao D., Li Y., et al. Ai-enhanced spatial-temporal data-mining technology: New chance for next-generation urban computing. Innovation. 2023;4(2):100405. doi: 10.1016/j.xinn.2023.100405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110.Shao Z., Zhang Z., Wang F., et al. Pre-training enhanced spatial-temporal graph neural network for multivariate time series forecasting. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2022. p. 1567–1577. DOI:10.1145/3534678.3539396
- 111.Shao Z., Zhang Z., Wei W., et al. Decoupled dynamic spatial-temporal graph neural network for traffic forecasting. Proceedings VLDB Endowment. 2022;15(11):2733–2746. doi: 10.14778/3551793.3551827. [DOI] [Google Scholar]
- 112.Shao Z., Wang F., Xu Y., et al. Exploring progress in multivariate time series forecasting: Comprehensive benchmarking and heterogeneity analysis. IEEE Trans. Knowl. Data Eng. 2024 doi: 10.1109/TKDE.2024.3484454. [DOI] [Google Scholar]
- 113.Shao Z., Zhang Z., Wang F., et al. Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting. Proceedings of the 31st ACM International Conference on Information & Knowledge Management; 2022. p. 4454–4458. DOI: 10.1145/3511808.3557702. [DOI]
- 114.Cao D., Jia F., Arik S.O., et al. TEMPO: Prompt-based generative pre-trained transformer for time series forecasting. arXiv. 2023 doi: 10.48550/arXiv.2310.04948. Preprint at. [DOI] [Google Scholar]
- 115.Chang C., Peng W.-C., Chen T.-F. (2023). LLM4TS: Two-stage fine-tuning for time-series forecasting with pre-trained LLMs. Preprint at arXiv:2308.08469. DOI: 10.1145/3719207. [DOI]
- 116.Gruver, N., Finzi, M., Qiu, S. et al. (2023). Large language models are zero-shot time series forecasters. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 19622–19635.
- 117.Yue, Z., Wang, Y., Duan, J. et al. (2022). TS2Vec: Towards universal representation of time series. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8980–8987. DOI: 10.1609/aaai.v36i8.20881. [DOI]
- 118.Dong, J., Wu, H., Zhang, H. et al. (2023). SimMTM: A simple pre-training framework for masked time-series modeling. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 29996–30025.
- 119.Wang Y., Shao Z., Sun T., et al. Clustering-property matters: A cluster-aware network for large scale multivariate time series forecasting. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023:4340–4344. [Google Scholar]
- 120.Jin M., Wang S., Ma L., et al. Time-LLM: Time series forecasting by reprogramming large language models. arXiv. 2023:2310.01728. doi: 10.48550/arXiv.2310.01728. Preprint at. [DOI] [Google Scholar]
- 121.Xue H., Salim F.D. PromptCast: A new prompt-based learning paradigm for time series forecasting. IEEE Trans. Knowl. Data Eng. 2023. DOI: 10.1109/TKDE.2023.3342137. [DOI]
- 122.Chen M., Tworek J., Jun H., et al. Evaluating large language models trained on code. arXiv. 2021:2107.03374. doi: 10.48550/arXiv.2107.03374. Preprint at. [DOI] [Google Scholar]
- 123.Li Y., Choi D., Chung J., et al. Competition-level code generation with alphacode. Science. 2022;378(6624):1092–1097. doi: 10.1126/science.abq1158. [DOI] [PubMed] [Google Scholar]
- 124.Starcoder: may the source be with you! Li R., Ben Allal L., Zi Y., et al., editors. Preprint at. arXiv. 2023;2305(06161) doi: 10.48550/arXiv.2305.06161. [DOI] [Google Scholar]
- 125.Jimenez C.E., Yang J., Wettig A., et al. SWE-bench: Can language models resolve real-world GitHub issues? arXiv. 2023:2310.06770. doi: 10.48550/arXiv.2310.06770. Preprint at. [DOI] [Google Scholar]
- 126.Dong L., Xu S., Xu B. Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition. IEEE; 2018. pp. 5884–5888. [DOI] [Google Scholar]
- 127.Wang C., Chen S., Wu Y., et al. Neural codec language models are zero-shot text to speech synthesizers. arXiv. 2023:2301.02111. doi: 10.48550/arXiv.2301.02111. Preprint at. [DOI] [Google Scholar]
- 128.Wei J., Wang X., Schuurmans D., et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022;35:24824–24837. [Google Scholar]
- 129.Yao S., Yu D., Zhao J., et al. Tree of thoughts: Deliberate problem solving with large language models. Adv. Neural Inf. Process. Syst. 2024;36 [Google Scholar]
- 130.Sha H., Yao M., Jiang Y., et al. LanguageMPC: Large language models as decision makers for autonomous driving. arXiv. 2023:2310.03026. doi: 10.48550/arXiv.2310.03026. Preprint at. [DOI] [Google Scholar]
- 131.Guo, T., Chen, X., Wang, Y. et al. (2024). Large Language Model Based Multi-Agents: A Survey of Progress and Challenges. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24), pp. 8048–8057.
- 132.Rao A., Kim J., Kamineni M., et al. Evaluating chatGPT as an adjunct for radiologic decision-making. medRxiv. 2023 doi: 10.1101/2023.02.02.23285399. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133.Cui C., Ma Y., Cao X., et al. Drive as you speak: Enabling human-like interaction with large language models in autonomous vehicles. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024:902–909. [Google Scholar]
- 134.Zhao A., Huang D., Xu Q., et al. ExpeL: LLM agents are experiential learners. Proceedings of the AAAI Conference on Artificial Intelligence. 2024:19632–19642. doi: 10.1609/aaai.v38i17.29936. [DOI] [Google Scholar]
- 135.Yu C., Yan G., Yu C., et al. Attention mechanism is useful in spatio-temporal wind speed prediction: Evidence from China. Appl. Soft Comput. 2023;148:110864. doi: 10.1016/j.asoc.2023.110864. [DOI] [Google Scholar]
- 136.Yu C., Wang F., Wang Y., et al. MGSFformer: A multi-granularity spatiotemporal fusion transformer for air quality prediction. Inf. Fusion. 2025;113:102607. doi: 10.1016/j.inffus.2024.102607. [DOI] [Google Scholar]
- 137.Cheng F., Liu H. Multi-step electric vehicles charging loads forecasting: An autoformer variant with feature extraction, frequency enhancement, and error correction blocks. Appl. Energy. 2024;376:124308. doi: 10.1016/j.apenergy.2024.124308. [DOI] [Google Scholar]
- 138.Yu C., Wang F., Shao Z., et al. Dsformer: A double sampling transformer for multivariate time series long-term prediction. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management; 2023. p. 3062–3072. DOI: 10.1145/3583780.3614851. [DOI]
- 139.Chengqing Y., Guangxi Y., Chengming Y., et al. A multi-factor driven spatiotemporal wind power prediction model based on ensemble deep graph attention reinforcement learning networks. Energy. 2023;263:126034. doi: 10.1016/j.energy.2022.126034. [DOI] [Google Scholar]
- 140.Yu C., Yan G., Yu C., et al. A multi-resolution interactive transformer for wind speed multi-step prediction. Inf. Sci. 2024;661:120150. doi: 10.1016/j.ins.2024.120150. [DOI] [Google Scholar]
- 141.Yu C., Qiao J., Chen C., et al. A new temporal frequency ensemble transformer for day-ahead photovoltaic power prediction. J. Clean. Prod. 2024;448:141690. doi: 10.1016/j.jclepro.2024.141690. [DOI] [Google Scholar]
- 142.Touvron H., Lavril T., Izacard G., et al. LLaMA: Open and efficient foundation language models. arXiv. 2023 doi: 10.48550/arXiv.2302.13971. Preprint at. [DOI] [Google Scholar]
- 143.Shao Z., Qian T., Sun T., et al. Spatial-temporal large models: A super hub linking multiple scientific areas with artificial intelligence. Innovation. 2025;6:100763. doi: 10.1016/j.xinn.2024.100763. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 144.GLM Team, Zeng A., Xu B., et al. ChatGLM: A family of large language models from GLM-130B to GLM-4 All Tools. arXiv. 2024 doi: 10.48550/arXiv.2406.12793. Preprint at. [DOI] [Google Scholar]
- 145.Lewis M. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. Preprint at arXiv:1910.13461. DOI: 10.48550/arXiv.1910.13461. [DOI]
- 146.Tay Y., Dehghani M., Tran V.Q., et al. Ul2: Unifying language learning paradigms Preprint at arXiv. 2022:2205.05131. doi: 10.48550/arXiv.2205.05131. [DOI] [Google Scholar]
- 147.Bai J., Bai S., Chu Y., et al. Qwen technical report. arXiv. 2023 doi: 10.48550/arXiv.2309.16609. Preprint at. [DOI] [Google Scholar]
- 148.Hu E.J., Shen Y., Wallis P., et al. Lora: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations. 2022:1–20. [Google Scholar]
- 149.Ouyang L., Wu J., Jiang X., et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022;35:27730–27744. [Google Scholar]
- 150.Lewis P., Perez E., Piktus A., et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 2020;33:9459–9474. [Google Scholar]
- 151.Schick T., Dwivedi-Yu J., Dessì R., et al. Toolformer: Language models can teach themselves to use tools. Adv. Neural Inf. Process. Syst. 2023;36:68539–68551. [Google Scholar]
- 152.Meng F., Shao W., Jiang C., et al. Foundation model is efficient multimodal multitask model selector. Adv. Neural Inf. Process. Syst. 2023;36:33065–33094. [Google Scholar]
- 153.Kim W., Son B., Kim I. International Conference on Machine Learning. PMLR; 2021. Vilt: Vision-and-language transformer without convolution or region supervision; pp. 5583–5594. [Google Scholar]
- 154.Xu H., Ye Q., Yan M., et al. International Conference on Machine Learning. PMLR; 2023. mplug-2: A modularized multi-modal foundation model across text, image and video; pp. 38728–38748. [Google Scholar]
- 155.Li C., Gan Z., Yang Z., et al. Multimodal foundation models: From specialists to general-purpose assistants. Found. Trends Comput. Graph. Vis. 2024;16(1–2):1–214. doi: 10.1561/0600000110. [DOI] [Google Scholar]
- 156.Xu X., Wang Z., Zhang G., et al. Versatile diffusion: Text, images and variations all in one diffusion model. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023:7754–7765. [Google Scholar]
- 157.Zheng Y., Zhang Y.-J., Larochelle H. A deep and autoregressive approach for topic modeling of multimodal data. IEEE Trans. Pattern Anal. Mach. Intell. 2016;38(6):1056–1069. doi: 10.1109/TPAMI.2015.2476802. [DOI] [PubMed] [Google Scholar]
- 158.Piergiovanni A.J., Noble I., Kim D., et al. Mirasol3b: A multimodal autoregressive model for time-aligned and contextual modalities. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024:26804–26814. [Google Scholar]
- 159.Wang Y., Yasunaga M., Ren H., et al. Vqa-GNN: Reasoning with multimodal knowledge via graph neural networks for visual question answering. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023:21582–21592. [Google Scholar]
- 160.Chen F., Shao J., Zhu S., et al. Multivariate, multi-frequency and multimodal: Rethinking graph neural networks for emotion recognition in conversation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023:10761–10770. [Google Scholar]
- 161.Yang C., An Z., Zhu H., et al. Gated convolutional networks with hybrid connectivity for image classification. Proceedings of the AAAI Conference on Artificial Intelligence. 2020:12581–12588. doi: 10.1609/aaai.v34i07.6948. [DOI] [Google Scholar]
- 162.Roy S.K., Deria A., Hong D., et al. Multimodal fusion transformer for remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 2023;61:1–20. doi: 10.1109/TGRS.2023.3286826. [DOI] [Google Scholar]
- 163.Wang Y., Chen X., Cao L., et al. Multimodal token fusion for vision transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022:12186–12195. [Google Scholar]
- 164.Cheng F., Liu H. An adaptive hybrid deep learning-based reliability assessment framework for damping track system considering multi-random variables. Mech. Syst. Signal Process. 2024;208:110981. doi: 10.1016/j.ymssp.2023.110981. [DOI] [Google Scholar]
- 165.Fei N., Lu Z., Gao Y., et al. Towards artificial general intelligence via a multimodal foundation model. Nat. Commun. 2022;13(1):3094. doi: 10.1038/s41467-022-30761-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 166.Ma F., Li Y., Ni S., et al. Data augmentation for audio-visual emotion recognition with an efficient multimodal conditional GAN. Applied Sciences. 2022;12(1):527. doi: 10.3390/app12010527. [DOI] [Google Scholar]
- 167.Wang Z., Cai S., Liu A., et al. Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. IEEE Trans. Pattern Anal. Mach. Intell. 2024 doi: 10.1109/TPAMI.2024.3511593. [DOI] [Google Scholar]
- 168.Zheng W., Yu J., Xia R., et al. A facial expression-aware multimodal multi-task learning framework for emotion recognition in multi-party conversations. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 1. Long Papers; 2023. p. 15445–15459. doi: 10.18653/v1/2023.acl-long.861. [DOI] [Google Scholar]
- 169.Singh A., Hu R., Goswami V., et al. Flava: A foundational language and vision alignment model. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022:15638–15650. [Google Scholar]
- 170.Sun Z., Shen S., Cao S., et al. Aligning large multimodal models with factually augmented RLHF. arXiv. 2023 doi: 10.48550/arXiv.2309.14525. Preprint at. [DOI] [Google Scholar]
- 171.Zhang S., Xu Y., Usuyama N., et al. Biomedclip: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. arXiv. 2023 doi: 10.48550/arXiv.2303.00915. Preprint at. [DOI] [Google Scholar]
- 172.Yang C., An Z., Cai L., et al. Mutual contrastive learning for visual representation learning. Proceedings of the AAAI Conference on Artificial Intelligence. 2022:3045–3053. doi: 10.1609/aaai.v36i3.20211. [DOI] [Google Scholar]
- 173.Chen Z., Jing L., Li Y., et al. Bridging the domain gap: Self-supervised 3d scene understanding with foundation models. Adv. Neural Inf. Process. Syst. 2024;36 [Google Scholar]
- 174.Valada A., Mohan R., Burgard W. Self-supervised model adaptation for multimodal semantic segmentation. Int. J. Comput. Vis. 2020;128(5):1239–1285. doi: 10.1007/s11263-019-01188-y. [DOI] [Google Scholar]
- 175.Furuta H., Lee K.-H., Nachum O., et al. Multimodal web navigation with instruction-finetuned foundation models. arXiv. 2023 doi: 10.48550/arXiv.2305.11854. Preprint at. [DOI] [Google Scholar]
- 176.Lahat D., Adali T., Jutten C. Multimodal data fusion: an overview of methods, challenges, and prospects. Proc. IEEE. 2015;103(9):1449–1477. doi: 10.1109/JPROC.2015.2460697. [DOI] [Google Scholar]
- 177.Gao J., Li P., Chen Z., et al. A survey on deep learning for multimodal data fusion. Neural Comput. 2020;32(5):829–864. doi: 10.1162/neco_a_01273. [DOI] [PubMed] [Google Scholar]
- 178.Li Y., Wang Y., Cui Z. Decoupled multimodal distilling for emotion recognition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023:6631–6640. [Google Scholar]
- 179.Shicai W., Luo C., Luo Y. Mmanet: Margin-aware distillation and modality-aware regularization for incomplete multimodal learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023:20039–20049. [Google Scholar]
- 180.Yu C., Zhou Q., Li J., et al. Foundation model drives weakly incremental learning for semantic segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023:23685–23694. [Google Scholar]
- 181.Long Y., Hui B., Ye F., et al. Spring: Situated conversation agent pretrained with multimodal questions from incremental layout graph. Proceedings of the AAAI Conference on Artificial Intelligence. 2023:13309–13317. doi: 10.1609/aaai.v37i11.26562. [DOI] [Google Scholar]
- 182.Nayak P. Google; 2021. Mum: A New AI Milestone for Understanding Information. [Google Scholar]
- 183.Reddy D.M., Basha M., Hari C., et al. Dall-e: Creating images from text. UGC Care Group I Journal. 2021;8(14):71–75. [Google Scholar]
- 184.Baltrusaitis T., Ahuja C., Morency L.-P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2018;41(2):423–443. doi: 10.1109/TPAMI.2018.2798607. [DOI] [PubMed] [Google Scholar]
- 185.Liu Y. Roberta: A robustly optimized BERT pretraining approach. arXiv. 2019 doi: 10.48550/arXiv.1907.11692. Preprint at. [DOI] [Google Scholar]
- 186.Chowdhery A., Narang S., Devlin J., et al. Palm: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023;24(240):1–113. [Google Scholar]
- 187.Touvron H., Martin L., Stone K., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv. 2023 doi: 10.48550/arXiv.2307.09288. Preprint at. [DOI] [Google Scholar]
- 188.Mao Y., Ge Y., Fan Y., et al. A survey on lora of large language models. arXiv. 2024 doi: 10.1007/s11704-024-40663-9. Preprint at. [DOI] [Google Scholar]
- 189.Hu E.J., Shen Y., Wallis P., et al. LoRA: Low-rank adaptation of large language models. International Conference on Learning Representations. 2022 [Google Scholar]
- 190.Dettmers T., Pagnoni A., Holtzman A., et al. In: Oh A., Naumann T., Globerson A., et al., editors. Vol. 36. Curran Associates, Inc.; 2023. Qlora: Efficient finetuning of quantized LLMs; pp. 10088–10115. (Advances in Neural Information Processing Systems). [Google Scholar]
- 191.Wang H., Ping B., Wang S., et al. Lora-flow: Dynamic lora fusion for large language models in generative tasks. Association for Computational Linguistics; Bangkok, Thailand: 2024. pp. 12871–12882. [DOI] [Google Scholar]
- 192.Wu T., Wang J., Zhao Z., et al. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Al-Onaizan Y., Bansal M., Chen Y.-N., editors. Association for Computational Linguistics; Miami, Florida, USA: 2024. Mixture-of-subspaces in low-rank adaptation; pp. 7880–7899. [Google Scholar]
- 193.Ma X., Fang G., Wang X. Llm-pruner: On the structural pruning of large language models. Adv. Neural Inf. Process. Syst. 2023;36:21702–21720. [Google Scholar]
- 194.Yang C., An Z., Huang L., et al. Clip-kd: An empirical study of clip model distillation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024:15952–15962. [Google Scholar]
- 195.Yang C., Zhou H., An Z., et al. Cross-image relational knowledge distillation for semantic segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022:12319–12328. [Google Scholar]
- 196.Yang C., An Z., Cai L., et al. Hierarchical self-supervised augmented knowledge distillation. International Joint Conference on Artificial Intelligence. 2021:1217–1223. [Google Scholar]
- 197.Yang C., An Z., Zhou H., et al. Online knowledge distillation via mutual contrastive learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023;45(8):10212–10227. doi: 10.1109/TPAMI.2023.3257878. [DOI] [PubMed] [Google Scholar]
- 198.Zhao Y., Lin C.-Y., Zhu K., et al. In: Gibbons P., Pekhimenko G., De Sa C., editors. Vol. 6. 2024. Atom: Low-bit quantization for efficient and accurate LLM serving; pp. 196–209. (Proceedings of Machine Learning and Systems). [Google Scholar]
- 199.Xie E., Chen J., Chen J., et al. Sana: Efficient high-resolution image synthesis with linear diffusion transformer. arXiv. 2024 doi: 10.48550/arXiv.2410.10629. Preprint at. [DOI] [Google Scholar]
- 200.Buehler M.J. Preflexor: Preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking. arXiv. 2024 doi: 10.48550/arXiv.2410.12375. Preprint at. [DOI] [Google Scholar]
- 201.Wu Y., Zhang Z., Wang F., et al. Towards more economical large-scale foundation models : no longer a game for the few. Innovation. 2025:100832. doi: 10.1016/j.xinn.2025.100832. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 202.Liu H., Yu C., Wu H., et al. A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting. Energy. 2020;202:117794. doi: 10.1016/j.energy.2020.117794. [DOI] [Google Scholar]
- 203.Liu X., Qin M., He Y., et al. A new multi-data-driven spatiotemporal pm2. 5 forecasting model based on an ensemble graph reinforcement learning convolutional network. Atmos. Pollut. Res. 2021;12(10):101197. doi: 10.1016/j.apr.2021.101197. [DOI] [Google Scholar]
- 204.Yang S. University of California; Berkeley: 2024. Foundation Models for Decision Making: Algorithms, Frameworks, and Applications. PhD thesis, EECS Department. [Google Scholar]
- 205.Wang Q., Feng Y., Huang J., et al. Large-scale generative simulation artificial intelligence: The next hotspot. Innovation. 2023;4(6) doi: 10.1016/j.xinn.2023.100516. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 206.Fu Z., Yang H., Man-Cho So A., et al. On the effectiveness of parameter-efficient fine-tuning. Proceedings of the AAAI Conference on Artificial Intelligence. 2023:12799–12807. doi: 10.1609/aaai.v37i11.26505. [DOI] [Google Scholar]
- 207.Han Z., Gao C., Liu J., et al. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv. 2024 doi: 10.48550/arXiv.2403.14608. Preprint at. [DOI] [Google Scholar]
- 208.Liang K., Meng L., Li H., et al. Mgksite: Multi-modal knowledge-driven site selection via intra and inter-modal graph fusion. IEEE Trans. Multimed. 2024 doi: 10.1109/TMM.2024.3521742. [DOI] [Google Scholar]
- 209.Ashar P., Devadas S., Newton A.R. Springer US; Boston, MA: 1992. State Encoding; pp. 87–116. [Google Scholar]
- 210.Higashi H., Minami T., Nakauchi S. Cooperative update of beliefs and state-transition functions in human reinforcement learning. Sci. Rep. 2019;9(1):17704. doi: 10.1038/s41598-019-53600-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 211.Chen S., Liao Y., Wang F., et al. Toward the robustness of autonomous vehicles in the ai era. Innovation. 2025;6 doi: 10.1016/j.xinn.2024.100780. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 212.Zhou Y., Wu L., Ramamoorthi R., et al. Vectorization for fast, analytic, and differentiable visibility. ACM Trans. Graph. 2021;40(3):1–21. doi: 10.1145/3452097. [DOI] [Google Scholar]
- 213.Sivamani S., Kumar G., Gudipalli A. A comprehensive review on payloads of unmanned aerial vehicle. Egypt. J. Remote Sens. Space Sci. 2024;27(4):637–644. [Google Scholar]
- 214.Wei J., Wang X., Schuurmans D., et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022;35 [Google Scholar]
- 215.Christiano P.F., Leike J., Brown T., et al. Deep reinforcement learning from human preferences. Adv. Neural Inf. Process. Syst. 2017;30 [Google Scholar]
- 216.Bai Y., Jones A., Ndousse K., et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv. 2022 Preprint at. [Google Scholar]
- 217.Song F., Yu B., Li M., et al. Preference ranking optimization for human alignment. arXiv. 2023 Preprint at. [Google Scholar]
- 218.Schulman J., Wolski F., Dhariwal P., et al. Proximal policy optimization algorithms. arXiv. 2017 Preprint at. [Google Scholar]
- 219.Rafailov R., Sharma A., Mitchell E., et al. Direct preference optimization: Your language model is secretly a reward model. arXiv. 2023 Preprint at. [Google Scholar]
- 220.Gheshlaghi Azar M., Guo Z.D., Piot B., et al. International Conference on Artificial Intelligence and Statistics. PMLR; 2024. A general theoretical paradigm to understand learning from human preferences; pp. 4447–4455. [Google Scholar]
- 221.Ethayarajh K., Xu W., Muennighoff N., et al. Kto: Model alignment as prospect theoretic optimization. arXiv. 2024 Preprint at. [Google Scholar]
- 222.Zeng Y., Liu G., Ma W., et al. Token-level direct preference optimization. arXiv. 2024 Preprint at. [Google Scholar]
- 223.Meng Y., Xia M., Chen D. Simpo: Simple preference optimization with a reference-free reward. arXiv. 2024 Preprint at. [Google Scholar]
- 224.Hong J., Lee N., Thorne J. Orpo: Monolithic preference optimization without reference model. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024:11170–11189. [Google Scholar]
- 225.Feng X., Wan Z., Wen M., et al. Alphazero-like tree-search can guide large language model decoding and training. arXiv. 2023 Preprint at. [Google Scholar]
- 226.Zhang D., Zhoubian S., Hu Z., et al. Rest-mcts∗: Llm self-training via process reward guided tree search. arXiv. 2024 Preprint at. [Google Scholar]
- 227.Silver D., Huang A., Maddison C.J., et al. Mastering the game of go with deep neural networks and tree search. Nature. 2016;529(7587):484–489. doi: 10.1038/nature16961. [DOI] [PubMed] [Google Scholar]
- 228.Besta M., Blach N., Kubicek A., et al. Graph of thoughts: Solving elaborate problems with large language models. Proceedings of the AAAI Conference on Artificial Intelligence. 2024:17682–17690. [Google Scholar]
- 229.Zhang Y., Mao S., Ge T., et al. Llm as a mastermind: A survey of strategic reasoning with large language models. arXiv. 2024 Preprint at. [Google Scholar]
- 230.Hua W., Liu O., Li L., et al. Game-theoretic LLM: Agent workflow for negotiation games. arXiv. 2024 Preprint at. [Google Scholar]
- 231.Li Z., Ni Y., Qi R., et al. Llm-pysc2: Starcraft ii learning environment for large language models. arXiv. 2024 Preprint at. [Google Scholar]
- 232.Khan M.J., Sukthankar G. Sc-phi2: A fine-tuned small language model for starcraft ii macromanagement tasks. arXiv. 2024 Preprint at. [Google Scholar]
- 233.Ma W., Mi Q., Zeng Y., et al. Large language models play starcraft ii: Benchmarks and a chain of summarization approach. arXiv. 2023 Preprint at. [Google Scholar]
- 234.Jin X., Wang Z., Du Y., et al. Learning to discuss strategically: A case study on one night ultimate werewolf. arXiv. 2024 Preprint at. [Google Scholar]
- 235.Bailis S., Friedhoff J., Chen F. Werewolf arena: A case study in LLM evaluation via social deduction. arXiv. 2024 Preprint at. [Google Scholar]
- 236.Kitadai A., Dayana S., Lugo R., et al. Can ai with high reasoning ability replicate human-like decision making in economic experiments? arXiv. 2024 Preprint at. [Google Scholar]
- 237.Shapira E., Madmon O., Reinman I., et al. Glee: A unified framework and benchmark for language-based economic environments. arXiv. 2024 Preprint at. [Google Scholar]
- 238.Driess D., Xia F., Sajjadi M.S.M., et al. Palm-e: An embodied multimodal language model. arXiv. 2023 Preprint at. [Google Scholar]
- 239.Wang T.H., Maalouf A., Xiao W., et al. 2024 IEEE International Conference on Robotics and Automation (ICRA) IEEE; 2024. Drive anywhere: Generalizable end-to-end autonomous driving with multi-modal foundation models; pp. 6687–6694. [Google Scholar]
- 240.Firoozi R., Tucker J., Tian S., et al. Foundation models in robotics: Applications, challenges, and the future. The International Journal of Robotics Research. 2023:02783649241281508. [Google Scholar]
- 241.Fan L., Wang Y., Zhang H., et al. Multimodal perception and decision-making systems for complex roads based on foundation models. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2024 [Google Scholar]
- 242.Wang G., Xie Y., Jiang Y., et al. Voyager: An open-ended embodied agent with large language models. arXiv. 2023 Preprint at. [Google Scholar]
- 243.Zhu X., Chen Y., Tian H., et al. Ghost in the minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. arXiv. 2023 Preprint at. [Google Scholar]
- 244.Rao S., Xu W., Xu M., et al. Collaborative quest completion with LLM-driven non-player characters in minecraft. arXiv. 2024 Preprint at. [Google Scholar]
- 245.Hejabi P., Rahmati E., Ziabari A.S., et al. Evaluating creativity and deception in large language models: A simulation framework for multi-agent balderdash. arXiv. 2024 Preprint at. [Google Scholar]
- 246.Wang H., Feng X., Li L., et al. A systematic game benchmark for evaluating strategic reasoning abilities of LLMs. arXiv. 2024 Preprint at. [Google Scholar]
- 247.Huang Y., Wang X., Liu H., et al. Adasociety: An adaptive environment with social structures for multi-agent decision-making. arXiv. 2024 Preprint at. [Google Scholar]
- 248.Xie Z., Kang H., Sheng Y., et al. Ai metropolis: Scaling large language model-based multi-agent simulation with out-of-order execution. arXiv. 2024 Preprint at. [Google Scholar]
- 249.Kumar A., Fu J., Soh M., et al. Stabilizing off-policy q-learning via bootstrapping error reduction. Adv. Neural Inf. Process. Syst. 2019;32 [Google Scholar]
- 250.Le H., Voloshin C., Yue Y. International Conference on Machine Learning. PMLR; 2019. Batch policy learning under constraints; pp. 3703–3712. [Google Scholar]
- 251.Kumar A., Zhou A., Tucker G., et al. Conservative q-learning for offline reinforcement learning. Adv. Neural Inf. Process. Syst. 2020;33:1179–1191. [Google Scholar]
- 252.Kostrikov I., Nair A., Levine S. Offline reinforcement learning with implicit q-learning. arXiv. 2021 Preprint at. [Google Scholar]
- 253.Chen L., Lu K., Rajeswaran A., et al. Decision transformer: Reinforcement learning via sequence modeling. Adv. Neural Inf. Process. Syst. 2021;34:15084–15097. [Google Scholar]
- 254.Wang Z., Hunt J.J., Zhou M. Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv. 2022 Preprint at. [Google Scholar]
- 255.Mao Y., Zhang H., Chen C., et al. Supported value regularization for offline reinforcement learning. Adv. Neural Inf. Process. Syst. 2024;36 [Google Scholar]
- 256.Mao Y., Wang C., Chen C., et al. Offline reinforcement learning with ood state correction and ood action suppression. arXiv. 2024 Preprint at. [Google Scholar]
- 257.Mao Y., Wang Q., Qu Y., et al. Doubly mild generalization for offline reinforcement learning. arXiv. 2024 Preprint at. [Google Scholar]
- 258.Fu J., Kumar A., Nachum O., et al. D4rl: Datasets for deep data-driven reinforcement learning. arXiv. 2020 Preprint at. [Google Scholar]
- 259.Yoon J., Kim T., Dia O., et al. Bayesian model-agnostic meta-learning. Adv. Neural Inf. Process. Syst. 2018;31 [Google Scholar]
- 260.Rakelly K., Zhou A., Finn C., et al. International Conference on Machine Learning. PMLR; 2019. Efficient off-policy meta-reinforcement learning via probabilistic context variables; pp. 5331–5340. [Google Scholar]
- 261.Qi W., Van Hoof H. Learning expressive meta-representations with mixture of expert neural processes. Adv. Neural Inf. Process. Syst. 2022;35:26242–26255. [Google Scholar]
- 262.Zintgraf L., Shiarlis K., Igl M., et al. Varibad: A very good method for bayes-adaptive deep RL via meta-learning. arXiv. 2019 Preprint at. [Google Scholar]
- 263.Dietterich T.G. Hierarchical reinforcement learning with the maxq value function decomposition. J. Artif. Intell. Res. 2000;13:227–303. [Google Scholar]
- 264.Kulkarni T.D., Narasimhan K., Saeedi A., et al. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. Adv. Neural Inf. Process. Syst. 2016;29 [Google Scholar]
- 265.Qu Y., Wang B., Shao J., et al. Hokoff: real game dataset from honor of kings and its offline reinforcement learning benchmarks. Adv. Neural Inf. Process. Syst. 2024;36 [Google Scholar]
- 266.Chen D., Chen K., Li Z., et al. Powernet: Multi-agent deep reinforcement learning for scalable powergrid control. IEEE Trans. Power Syst. 2022;35 [Google Scholar]
- 267.Dong J., Yassine A., Armitage A., et al. Multi-agent reinforcement learning for intelligent v2g integration in future transportation systems. IEEE Trans. Intell. Transport. Syst. 2023 [Google Scholar]
- 268.Zhang K., Yang Z., Liu H., et al. International Conference on Machine Learning. PMLR; 2018. Fully decentralized multi-agent reinforcement learning with networked agents; pp. 5872–5881. [Google Scholar]
- 269.Sunehag P., Lever G., Gruslys A., et al. Value-decomposition networks for cooperative multi-agent learning. arXiv. 2017 Preprint at. [Google Scholar]
- 270.Rashid T., Samvelyan M., Schroeder De Witt C., et al. Monotonic value function factorisation for deep multi-agent reinforcement learning. J. Mach. Learn. Res. 2020;21(178):1–51. [Google Scholar]
- 271.Hu S., Shen L., Zhang Y., et al. Communication learning in multi-agent systems from graph modeling perspective. arXiv. 2024 Preprint at. [Google Scholar]
- 272.Alonso E., Jelley A., Vincent M., et al. Diffusion for world modeling: Visual details matter in atari. arXiv. 2024 Preprint at. [Google Scholar]
- 273.Qu Y., Wang B., Jiang Y., et al. Choices are more important than efforts: Llm enables efficient multi-agent exploration. arXiv. 2024 Preprint at. [Google Scholar]
- 274.Qu Y., Jiang Y., Wang B., et al. Latent reward: Llm-empowered credit assignment in episodic reinforcement learning. arXiv. 2024 Preprint at. [Google Scholar]
- 275.Ma Y.J., Hejna J., Wahid A., et al. Vision language models are in-context value learners. arXiv. 2024 Preprint at. [Google Scholar]
- 276.Lim V., Huang H., Chen L.Y., et al. 2022 International Conference on Robotics and Automation (ICRA) IEEE; 2022. Real2sim2real: Self-supervised learning of physical single-step dynamic actions for planar robot casting; pp. 8282–8289. [Google Scholar]
- 277.Mandlekar A., Xu D., Wong J., et al. What matters in learning from offline human demonstrations for robot manipulation. arXiv. 2021 Preprint at. [Google Scholar]
- 278.Bain M., Sammut C. A framework for behavioural cloning. Mach. Intell. 1995;15:103–129. [Google Scholar]
- 279.Zitkovich B., Yu T., Xu S., et al. Conference on Robot Learning. PMLR; 2023. Rt-2: Vision-language-action models transfer web knowledge to robotic control; pp. 2165–2183. [Google Scholar]
- 280.Du Y., Yang S., Dai B., et al. Learning universal policies via text-guided video generation. Adv. Neural Inf. Process. Syst. 2024;36 [Google Scholar]
- 281.Kim M.J., Pertsch K., Karamcheti S., et al. Openvla: An open-source vision-language-action model. arXiv. 2024 Preprint at. [Google Scholar]
- 282.Bruce J., Dennis M.D., Edwards A., et al. Genie: Generative interactive environments. Forty-first International Conference on Machine Learning. 2024 [Google Scholar]
- 283.McCarthy R., Tan D.C.H., Schmidt D., et al. Towards generalist robot learning from internet video: A survey. arXiv. 2024 Preprint at. [Google Scholar]
- 284.Wang Y., Xian Z., Chen F., et al. Robogen: Towards unleashing infinite data for automated robot learning via generative simulation. Forty-first International Conference on Machine Learning. 2024 [Google Scholar]
- 285.Zhou Y., Simon M., Peng Z., et al. Simgen: Simulator-conditioned driving scene generation. arXiv. 2024 Preprint at. [Google Scholar]
- 286.Zhang J., Lehman J., Stanley K., et al. Omni: Open-endedness via models of human notions of interestingness. The Twelfth International Conference on Learning Representations. 2023 [Google Scholar]
- 287.Faldor M., Zhang J., Cully A., et al. Omni-epic: Open-endedness via models of human notions of interestingness with environments programmed in code. arXiv. 2024:2405.15568. Preprint at. [Google Scholar]
- 288.Kadian A., Truong J., Gokaslan A., et al. Sim2real predictivity: Does evaluation in simulation predict real-world performance? IEEE Robot. Autom. Lett. 2020;5(4):6670–6677. [Google Scholar]
- 289.Weng L. Llm-powered autonomous agents. lilianweng.github.io. 2023 [Google Scholar]
- 290.Lang K.J., Waibel A.H., Hinton G.E. A time-delay neural network architecture for isolated word recognition. Neural Netw. 1990;3(1):23–43. doi: 10.1016/0893-6080(90)90044-L. [DOI] [Google Scholar]
- 291.Elman J. Finding structure in time. Cogn. Sci. 1990;14(2):179–211. doi: 10.1207/s15516709cog1402_1. [DOI] [Google Scholar]
- 292.Williams R.J., Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989;1(2):270–280. doi: 10.1162/neco.1989.1.2.270. [DOI] [Google Scholar]
- 293.Hochreiter S., Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735. [DOI] [PubMed] [Google Scholar]
- 294.Robinson A.J., Fallside F. Vol. 11. University of Cambridge Department of Engineering Cambridge; 1987. (The Utility Driven Dynamic Error Propagation Network). [Google Scholar]
- 295.Giles C.L., Miller C.B., Chen D., et al. Learning and extracting finite state automata with second-order recurrent neural networks. Neural Comput. 1992;4(3):393–405. doi: 10.1162/neco.1992.4.3.393. [DOI] [Google Scholar]
- 296.Sutskever I., Vinyals O., Le Q.V. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014. Ghahramani Z., Welling M., Cortes C., et al., editors. 2014. Sequence to sequence learning with neural networks; pp. 3104–3112. [Google Scholar]
- 297.Sukhbaatar S., Szlam A., Weston J., et al. In: Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015. Cortes C., Lawrence N.D., Lee D.D., et al., editors. 2015. End-to-end memory networks; pp. 2440–2448. [Google Scholar]
- 298.Bahdanau D., Cho K., Bengio Y. In: 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings. Bengio Y., LeCun Y., editors. 2015. Neural machine translation by jointly learning to align and translate. [Google Scholar]
- 299.Westö J., May P.J.C., Tiitinen H. Memory stacking in hierarchical networks. Neural Comput. 2016;28(2):327–353. doi: 10.1162/NECO_a_00803. [DOI] [PubMed] [Google Scholar]
- 300.Weston J. In: Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys 2017. Cremonesi P., Ricci F., Berkovsky S., et al., editors. ACM; Como, Italy: 2017. Memory networks for recommendation; p. 4. [DOI] [Google Scholar]
- 301.Rae J.W., Potapenko A., Jayakumar S.M., et al. 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net; 2020. Compressive transformers for long-range sequence modelling. [Google Scholar]
- 302.Dai Z., Yang Z., Yang Y., et al. Transformer-xl: Attentive language models beyond a fixed-length context. In: Korhonen A., Traum D.R., Marquez L., editors. Vol. 1. Long Papers. Association for Computational Linguistics; 2019. pp. 2978–2988. [DOI] [Google Scholar]
- 303.So D.R., Le Q.V., Liang C. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019 Volume 97 of Proceedings of Machine Learning Research. Chaudhuri K., Salakhutdinov R., editors. PMLR; 2019. The evolved transformer; pp. 5877–5886. [Google Scholar]
- 304.Dehghani M., Gouws S., Vinyals O., et al. 7th International Conference on Learning Representations, ICLR 2019. OpenReview.net; 2019. Universal transformers. [Google Scholar]
- 305.Lewis P.S.H., Perez E., Piktus A., et al. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020. Larochelle H., Ranzato M., Hadsell R., et al., editors. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. [Google Scholar]
- 306.Gu A., Dao T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv. 2023 doi: 10.48550/arXiv.2312.00752. Preprint at. [DOI] [Google Scholar]
- 307.Sacerdoti E.D. Advance Papers of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR; 1975. The nonlinear nature of plans; pp. 206–214. [Google Scholar]
- 308.Hart P., Nilsson N., Raphael B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cyber. 1968;4(2):100–107. doi: 10.1109/TSSC.1968.300136. [DOI] [Google Scholar]
- 309.Fikes R., Nilsson N.J. In: Proceedings of the 2nd International Joint Conference on Artificial Intelligence. Cooper D.C., editor. William Kaufmann; London, UK: 1971. STRIPS: A new approach to the application of theorem proving to problem solving; pp. 608–620. [Google Scholar]
- 310.Ha D., Schmidhuber J. World models. arXiv. 2018 doi: 10.48550/arXiv.1803.10122. Preprint at. [DOI] [Google Scholar]
- 311.Andreas J., Rohrbach M., Darrell T., et al. Neural module networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:39–48. [Google Scholar]
- 312.Reed S.E., de Freitas N. In: 4th International Conference on Learning Representations, ICLR 2016 Conference Track Proceedings. Bengio Y., LeCun Y., editors. 2016. Neural programmer-interpreters. [Google Scholar]
- 313.Qin Y., Liang S., Ye Y., et al. The Twelfth International Conference on Learning Representations, ICLR 2024. OpenReview.net; 2024. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. [Google Scholar]
- 314.Wu Z., Han C., Ding Z., et al. OS-Copilot: Towards generalist computer agents with self-improvement. arXiv. 2024 doi: 10.48550/arXiv.2402.07456. Preprint at. [DOI]
- 315.Yao S., Zhao J., Yu D., et al. The Eleventh International Conference on Learning Representations, ICLR 2023. OpenReview.net; 2023. React: Synergizing reasoning and acting in language models. [Google Scholar]
- 316.Qiao S., Fang R., Zhang N., et al. Agent planning with world knowledge model. arXiv. 2024 doi: 10.48550/arXiv.2405.14205. Preprint at. [DOI]
- 317.Wang H., Yan H., Rong C., et al. Multi-scale simulation of complex systems: A perspective of integrating knowledge and data. ACM Comput. Surv. 2024;56:1–38. doi: 10.1145/3654662. [DOI] [Google Scholar]
- 318.Rabia M.A.B., Bellabdaoui A. Simulation-based analytics: A systematic literature review. Simul. Model. Pract. Th. 2022;117:102511. doi: 10.1016/j.simpat.2022.102511. [DOI] [Google Scholar]
- 319.Chen B., Guo R., Zhu Z., et al. A novel research pattern for the simulation of complex systems: SIGD. J. Syst. Simul. 2024;36:2993. doi: 10.16182/j.issn1004731x.joss.24-0472. [DOI] [Google Scholar]
- 320.Xue X., Yu X., Wang F.-Y. ChatGPT chats on computational experiments: From interactive intelligence to imaginative intelligence for design of artificial societies and optimization of foundational models. IEEE/CAA J. Autom. Sinica. 2023;10(6):1357–1360. doi: 10.1109/JAS.2023.123585. [DOI] [Google Scholar]
- 321.Zhao Y., Zhu Z., Chen B., et al. Towards parallel intelligence: An interdisciplinary solution for complex systems. Innovation. 2023;4(6) doi: 10.1016/j.xinn.2023.100521. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 322.Dai T., Wong J., Jiang Y., et al. Automated creation of digital cousins for robust policy learning. arXiv. 2024 doi: 10.48550/arXiv.2410.07408. Preprint at. [DOI] [Google Scholar]
- 323.Thomas A. Us department of defense modeling and simulation: new approaches and initiatives. Inf. Secur. Int. J. 2009;23(23):32–48. doi: 10.11610/isij.2304. [DOI] [Google Scholar]
- 324.Mou X., Ding X., He Q., et al. From individual to society: A survey on social simulation driven by large language model-based agents. arXiv. 2024 doi: 10.48550/arXiv.2412.03563. Preprint at. [DOI] [Google Scholar]
- 325.Lavin A., Krakauer D., Zenil H., et al. Simulation intelligence: Towards a new generation of scientific methods. arXiv. 2021 doi: 10.48550/arXiv.2112.03235. Preprint at. [DOI] [Google Scholar]
- 326.Wu L., Wang L., Li N., et al. Modeling the COVID-19 outbreak in China through multi-source information fusion. Innovation. 2020;1(2):100033. doi: 10.1016/j.xinn.2020.100033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 327.Zhu Z., Chen B., Chen H., et al. Strategy evaluation and optimization with an artificial society toward a pareto optimum. Innovation. 2022;3(5):100274. doi: 10.1016/j.xinn.2022.100274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 328.Chen B., Guo R., Zhu Z., et al. Simulation of COVID-19 outbreak in Nanjing Lukou Airport based on complex dynamical networks. Complex Syst. Model. Simul. 2023;3(1):71–82. [Google Scholar]
- 329.Petrenko A., Wijmans E., Shacklett B., et al. International Conference on Machine Learning. PMLR; 2021. Megaverse: Simulating embodied agents at one million experiences per second; pp. 8556–8566. [Google Scholar]
- 330.Yang Y., Jia B., Zhi P., et al. Physcene: Physically interactable 3d scene synthesis for embodied ai. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024:16262–16272. [Google Scholar]
- 331.Jacinto M., Pinto J., Patrikar J., et al. 2024 International Conference on Unmanned Aircraft Systems (ICUAS) IEEE; 2024. Pegasus simulator: An isaac sim framework for multiple aerial vehicles simulation; pp. 917–922. [Google Scholar]
- 332.Rao K., Harris C., Irpan A., et al. Rl-cyclegan: Reinforcement learning aware simulation-to-real. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020:11157–11166. [Google Scholar]
- 333.Liang K., Hu H., Liu R., et al. RLHS: Mitigating misalignment in RLHF with hindsight simulation. arXiv. 2025 doi: 10.48550/arXiv.2501.08617. Preprint at. [DOI] [Google Scholar]
- 334.Wei J., Tay Y., Bommasani R., et al. Emergent abilities of large language models. arXiv. 2022 doi: 10.48550/arXiv.2206.07682. Preprint at. [DOI] [Google Scholar]
- 335.Brohan A., Brown N., Carbajal J., et al. Rt-1: Robotics transformer for real-world control at scale. arXiv. 2022 doi: 10.48550/arXiv.2212.06817. Preprint at. [DOI] [Google Scholar]
- 336.Strohman T., Metzler D., Howard T., et al. Indri: A language model-based search engine for complex queries. Proceedings of the International Conference on Intelligent Analysis. 2005:2–6. Washington, DC. [Google Scholar]
- 337.Silver D., Hubert T., Schrittwieser J., et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv. 2017 doi: 10.48550/arXiv.1712.01815. Preprint at. [DOI] [PubMed] [Google Scholar]
- 338.Tesauro G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 1994;6(2):215–219. doi: 10.1162/neco.1994.6.2.215. [DOI]
- 339.Kalashnikov D., Irpan A., Pastor P., et al. Conference on Robot Learning. PMLR; 2018. Scalable deep reinforcement learning for vision-based robotic manipulation; pp. 651–673. [Google Scholar]
- 340.Akkaya I., Andrychowicz M., Chociej M., et al. Solving rubik’s cube with a robot hand. arXiv. 2019 doi: 10.48550/arXiv.1910.07113. Preprint at. [DOI] [Google Scholar]
- 341.Sun J., Huang D.-A., Lu B., et al. Plate: Visually-grounded planning with transformers in procedural tasks. IEEE Robot. Autom. Lett. 2022;7(2):4924–4930. doi: 10.1109/LRA.2022.3150855. [DOI] [Google Scholar]
- 342.Liu W., Sun J., Wang G., et al. Data-driven resilient predictive control under denial-of-service. IEEE Trans. Automat. Contr. 2023;68(8):4722–4737. doi: 10.1109/TAC.2022.3209399. [DOI] [Google Scholar]
- 343.Zhang D., Liang D., Yang H., et al. Sam3d: Zero-shot 3d object detection via segment anything model. arXiv. 2023 doi: 10.48550/arXiv.2306.02245. Preprint at. [DOI] [Google Scholar]
- 344.Hong Y., Zhen H., Chen P., et al. 3d-LLM: Injecting the 3d world into large language models. Adv. Neural Inf. Process. Syst. 2023;36:20482–20494. [Google Scholar]
- 345.Chen W., Hu S., Talak R., et al. Leveraging large language models for robot 3d scene understanding. arXiv. 2022 doi: 10.48550/arXiv.2209.05629. Preprint at. [DOI] [Google Scholar]
- 346.Zhang W., Wang G., Sun J., et al. STORM: Efficient stochastic transformer based world models for reinforcement learning. Adv. Neural Inf. Process. Syst. 2024;36 [Google Scholar]
- 347.Feng Z., Chen J., Xiao W., et al. Learning hybrid policies for MPC with application to drone flight in unknown dynamic environments. Un. Sys. 2024;12(02):429–441. doi: 10.1142/S2301385024410206. [DOI] [Google Scholar]
- 348.Zhou Z., Wang G., Sun J., et al. Efficient and robust time-optimal trajectory planning and control for agile quadrotor flight. IEEE Robot. Autom. Lett. 2023;8(12):7913–7920. doi: 10.1109/LRA.2023.3322075. [DOI] [Google Scholar]
- 349.Yuan Y., Wang S., Mei Y., et al. Improving world models for robot arm grasping with backward dynamics prediction. Int. J. Mach. Learn. Cybern. 2024;15:3879–3891. doi: 10.1007/s13042-024-02125-3. [DOI] [Google Scholar]
- 350.Jiang Y., Zhao L., Quattrini Li A., et al. Exploring spontaneous social interaction swarm robotics powered by large language models. doi: 10.13140/RG.2.2.29928.38401. [DOI]
- 351.Mandi Z., Bharadhwaj H., Moens V., et al. CACTI: A framework for scalable multi-task multi-scene visual imitation learning. arXiv. 2022 doi: 10.48550/arXiv.2212.05711. Preprint at. [DOI] [Google Scholar]
- 352.Di Palo N., Byravan A., Hasenclever L., et al. Towards a unified agent with foundation models. arXiv. 2023 doi: 10.48550/arXiv.2307.09668. Preprint at. [DOI] [Google Scholar]
- 353.Kwon M., Xie S.M., Bullard K., et al. Reward design with language models. arXiv. 2023 doi: 10.48550/arXiv.2303.00001. Preprint at. [DOI] [Google Scholar]
- 354.Du Y., Li J., Tang T., et al. Zero-shot visual question answering with language model feedback. arXiv. 2023 doi: 10.48550/arXiv.2305.17006. Preprint at. [DOI] [Google Scholar]
- 355.Tami R., Soualmi B., Doufene A., et al. 2019 IEEE Intelligent Transportation Systems Conference (ITSC) IEEE; 2019. Machine learning method to ensure robust decision-making of AVs; pp. 1217–1222. [Google Scholar]
- 356.Gui S., Song S., Qin R., et al. Remote sensing object detection in the deep learning era—a review. Remote Sens. 2024;16(2):327. [Google Scholar]
- 357.Navalgund R.R., Jayaraman V., Roy P.S. Remote sensing applications: An overview. Curr. Sci. 2007;93:1747–1766. [Google Scholar]
- 358.Dias P., Potnis A., Guggilam S., et al. IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium. IEEE; 2023. An agenda for multimodal foundation models for earth observation; pp. 1237–1240. [Google Scholar]
- 359.Ma Y., Chen S., Ermon S., et al. Transfer learning in environmental remote sensing. Remote Sens. Environ. 2024;301:113924. doi: 10.1016/j.rse.2023.113924. [DOI] [Google Scholar]
- 360.Cong Y., Khanna S., Meng C., et al. Satmae: Pre-training transformers for temporal and multi-spectral satellite imagery. Adv. Neural Inf. Process. Syst. 2022;35:197–211. [Google Scholar]
- 361.Reed C.J., Gupta R., Li S., et al. Scale-mae: A scale-aware masked autoencoder for multiscale geospatial representation learning. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023:4088–4099. [Google Scholar]
- 362.Wanyan X., Seneviratne S., Shen S., et al. Extending global-local view alignment for self-supervised learning with remote sensing imagery. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024:2443–2453. [Google Scholar]
- 363.Smith W.A., Randall R.B. Rolling element bearing diagnostics using the case western reserve university data: A benchmark study. Mech. Syst. Signal Process. 2015;64:100–131. doi: 10.1016/j.ymssp.2015.04.021. [DOI] [Google Scholar]
- 364.Koizumi Y., Saito S., Uematsu H., et al. A dataset of miniature-machine operating sounds for anomalous sound detection. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019:313–317. [Google Scholar]
- 365.Zhang Y., Tang Q., Zhang Y., et al. Identifying degradation patterns of lithium ion batteries from impedance spectroscopy using machine learning. Nat. Commun. 2020;11:1706. doi: 10.1038/s41467-020-15235-7. [DOI] [PMC free article] [PubMed]
- 366.Xing Y., He W., Pecht M., et al. State of charge estimation of lithium-ion batteries using the open-circuit voltage at various ambient temperatures. Appl. Energy. 2014;113:106–115. doi: 10.1016/j.apenergy.2013.07.008. [DOI] [Google Scholar]
- 367.Jakobsson E., Frisk E., Krysander M., et al. A dataset for fault classification in rock drills, a fast oscillating hydraulic system. Annual Conference of the PHM Society. 2022;14(1) doi: 10.36001/phmconf.2022.v14i1.3144. [DOI] [Google Scholar]
- 368.Jin Y., Hou L., Chen Y. A time series transformer based method for the rotating machinery fault diagnosis. Neurocomputing. 2022;494:379–395. doi: 10.1016/j.neucom.2022.04.111. [DOI] [Google Scholar]
- 369.Wang X., Jiang X., Ding H., et al. Knowledge-aware deep framework for collaborative skin lesion segmentation and melanoma recognition. Pattern Recogn. 2021;120:108075. doi: 10.1016/j.patcog.2021.108075. [DOI] [Google Scholar]
- 370.Qi L., Huang J., He H., et al. VSLLaVA: A pipeline of large multimodal foundation model for industrial vibration signal analysis. arXiv. 2024 doi: 10.48550/arXiv.2409.07482. Preprint at. [DOI]
- 371.Ajoudani A., Zanchettin A.M., Ivaldi S., et al. Progress and prospects of the human-robot collaboration. Auton. Robots. 2018;42:957–975. doi: 10.1007/s10514-017-9677-2. [DOI]
- 372.Diehl M., Paxton C., Ramirez-Amaro K. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2021. Automated Generation of Robotic Planning Domains from Observations; pp. 6732–6738. [Google Scholar]
- 373.Shirai K., Beltran-Hernandez C.C., Hamaya M., et al. 2024 IEEE International Conference on Robotics and Automation. ICRA; 2024. Vision-language Interpreter for Robot Task Planning; pp. 2051–2058. [Google Scholar]
- 374.Lee M.-L., Behdad S., Liang X., et al. Task allocation and planning for product disassembly with human–robot collaboration. Robot. Comput. Integrated Manuf. 2022;76:102306. doi: 10.1016/j.rcim.2021.102306. [DOI] [Google Scholar]
- 375.Yu T., Huang J., Chang Q. Optimizing task scheduling in human-robot collaboration with deep multi-agent reinforcement learning. J. Manuf. Syst. 2021;60:487–499. doi: 10.1016/j.jmsy.2021.07.015. [DOI] [Google Scholar]
- 376.Brohan A., Brown N., Carbajal J., et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv. 2023 doi: 10.48550/arXiv.2307.15818. Preprint at. [DOI]
- 377.Ejaz W., Sharma S.K., Saadat S., et al. A comprehensive survey on resource allocation for C-RAN in 5G and beyond networks. J. Netw. Comput. Appl. 2020;160:102638. doi: 10.1016/j.jnca.2020.102638. [DOI] [Google Scholar]
- 378.Ijemaru G.K., Adeyanju I., Olusuyi K., et al. Security challenges of wireless communications networks: A survey. Int. J. Appl. Eng. Res. 2018;13(8):5680–5692. [Google Scholar]
- 379.Xu Y., Chen Y., Zhang X., et al. In: Gibbons P., Pekhimenko G., De Sa C., editors. Vol. 6. 2024. Cloudeval-yaml: A practical benchmark for cloud configuration generation; pp. 173–195. (Proceedings of Machine Learning and Systems). [Google Scholar]
- 380.He Z., Sun T., Wang K., et al. DiffusionBERT: Improving generative masked language models with diffusion models. arXiv. 2022 doi: 10.48550/arXiv.2211.15029. Preprint at. [DOI]
- 381.Mondal R., Tang A., Beckett R., et al. Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, HotNets ’23. Association for Computing Machinery; 2023. What do LLMs need to synthesize correct router configurations? pp. 189–195. [Google Scholar]
- 382.Mani S.K., Zhou Y., Hsieh K., et al. Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, HotNets ’23. Association for Computing Machinery; New York, NY, USA: 2023. Enhancing network management using code generated by large language models; pp. 196–204. [Google Scholar]
- 383.Bayer M., Kuehn P., Shanehsaz R., et al. A domain-adapted language model for the cybersecurity domain. ACM Trans. Priv. Secur. 2024;27(2):1–20. doi: 10.1145/3652594. [DOI] [Google Scholar]
- 384.Aghaei E., Niu X., Shadid W., et al. International Conference on Security and Privacy in Communication Systems. Springer; 2022. Securebert: A domain-specific language model for cybersecurity; pp. 39–56. [Google Scholar]
- 385.Ferrag M.A., Ndhlovu M., Tihanyi N., et al. Revolutionizing cyber threat detection with large language models: A privacy-preserving BERT-based lightweight model for IoT/IIoT devices. IEEE Access. 2024 doi: 10.1109/ACCESS.2024.3363469. [DOI] [Google Scholar]
- 386.Rodrigues L., Dwan Pereira F., Cabral L., et al. Assessing the quality of automatic-generated short answers using GPT-4. Comput. Educ. Artif. Intell. 2024;7:100248. doi: 10.1016/j.caeai.2024.100248. [DOI] [Google Scholar]
- 387.Wu N., Xu J., Linghu J., et al. Real-time optimal control and dispatching strategy of multi-microgrid energy based on storage collaborative. Int. J. Electr. Power Energy Syst. 2024;160:110063. doi: 10.2139/ssrn.4739528. [DOI] [Google Scholar]
- 388.Huang L. (2024). "Big Watt" to promote the application of power AI in the future.
- 389.Ahmad T., Zhu H., Zhang D., et al. Energetics systems and artificial intelligence: Applications of industry 4.0. Energy Rep. 2022;8:334–361. doi: 10.1016/j.egyr.2021.11.256. [DOI] [Google Scholar]
- 390.van der Blij N.H., Ramirez-Elizondo L.M., Spaan M.T.J., et al. Grid sense multiple access: A decentralized control algorithm for dc grids. Int. J. Electr. Power Energy Syst. 2020;119:105818. doi: 10.1016/j.ijepes.2020.105818. [DOI] [Google Scholar]
- 391.Jan Z., Ahamed F., Mayer W., et al. Artificial intelligence for industry 4.0: Systematic review of applications, challenges, and opportunities. Expert Syst. Appl. 2023;216:119456. doi: 10.1016/j.eswa.2022.119456. [DOI] [Google Scholar]
- 392.Chen T., Sampath V., May M.C., et al. Machine learning in manufacturing towards industry 4.0: From ‘for now’ to ‘four-know’. Appl. Sci. 2023;13(3):1903. doi: 10.3390/app13031903. [DOI] [Google Scholar]
- 393.Jagatheesaperumal S.K., Rahouti M., Ahmad K., et al. The duo of artificial intelligence and big data for industry 4.0: Applications, techniques, challenges, and future research directions. IEEE Internet Things J. 2022;9(15):12861–12885. doi: 10.48550/arXiv.2104.02425. [DOI] [Google Scholar]
- 394.National Research Council. Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification. National Academies Press; 2012. [DOI] [Google Scholar]
- 395.Aggarwal C.C. Vol. 156. Springer; 2020. (Linear Algebra and Optimization for Machine Learning). [DOI] [Google Scholar]
- 396.Yu G. Thirteen years of clusterProfiler. Innovation. 2024;5(6):100722. doi: 10.1016/j.xinn.2024.100722. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 397.Ahmadianfar I., Bozorg-Haddad O., Chu X. Gradient-based optimizer: A new metaheuristic optimization algorithm. Inf. Sci. 2020;540:131–159. doi: 10.1016/j.ins.2020.06.037. [DOI] [Google Scholar]
- 398.Li H., Li H., Zhang M., et al. Direct imaging of pulmonary gas exchange with hyperpolarized xenon mri. Innovation. 2024;5(6):100720. doi: 10.1016/j.xinn.2024.100720. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 399.Seeger M., Steinke F., Tsuda K. Artificial Intelligence and Statistics. PMLR; 2007. Bayesian inference and optimal design in the sparse linear model; pp. 444–451. [Google Scholar]
- 400.Archer K.J., Fu H., Mro’zek K., et al. Improving risk stratification for 2022 european leukemianet favorable-risk patients with acute myeloid leukemia. Innovation. 2024;5(6) doi: 10.1016/j.xinn.2024.100719. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 401.Koch C., Segev I. The role of single neurons in information processing. Nat. Neurosci. 2000;3(11):1171–1177. doi: 10.1038/81444. [DOI] [PubMed] [Google Scholar]
- 402.Dou C., Tang Y., Jiang N., et al. Analysis of Sichuan wildfire based on the first synergetic observation from three payloads of SDGSAT-1. Innovation. 2024;5(6):100707. doi: 10.1016/j.xinn.2024.100707. [DOI] [Google Scholar]
- 403.Haykin S. Neural Networks: A Comprehensive Foundation. Neural Netw. 2004;2(2004):41. [Google Scholar]
- 404.Khan S., Naseer M., Hayat M., et al. Transformers in vision: A survey. ACM Comput. Surv. 2022;54(10s):1–41. doi: 10.48550/arXiv.2101.01169. [DOI] [Google Scholar]
- 405.Tay Y., Bahri D., Metzler D., et al. International Conference on Machine Learning. PMLR; 2021. Synthesizer: Rethinking self-attention for transformer models; pp. 10183–10192. [Google Scholar]
- 406.Ma D., Ma Y., Ma J., et al. Energy conversion materials need phonons. Innovation. 2024;5(6):100709. doi: 10.1016/j.xinn.2024.100709. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 407.Cobbe K., Kosaraju V., Bavarian M., et al. Training verifiers to solve math word problems. arXiv. 2021 doi: 10.48550/arXiv.2110.14168. Preprint at. [DOI] [Google Scholar]
- 408.Bottou L. Neural Networks: Tricks of the Trade. Second Edition. Springer; 2012. Stochastic gradient descent tricks; pp. 421–436. [Google Scholar]
- 409.Cui Y., Wu Y., Yuan Y. Amplification editing empowers in situ large-scale dna duplication. Innovation. 2024;5(6):100716. doi: 10.1016/j.xinn.2024.100716. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 410.Xiao P., Yin Y., Liu B., et al. Adaptive testing based on moment estimation. IEEE Trans. Syst. Man Cybern. Syst. 2020;50(3):911–922. doi: 10.1109/TSMC.2017.2761767. [DOI] [Google Scholar]
- 411.Xu S., Chen R., Zhang X., et al. The evolutionary tale of lilies: Giant genomes derived from transposon insertions and polyploidization. Innovation. 2024;5(6):100726. doi: 10.1016/j.xinn.2024.100726. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 412.Karniadakis G.E., Kevrekidis I.G., Lu L., et al. Physics-informed machine learning. Nat. Rev. Phys. 2021;3(6):422–440. doi: 10.1038/s42254-021-00314-5. [DOI] [Google Scholar]
- 413.Li A., Chen R., Farimani A.B., et al. Reaction diffusion system prediction based on convolutional neural network. Sci. Rep. 2020;10(1):3894. doi: 10.1038/s41598-020-60853-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 414.Rasp S., Pritchard M.S., Gentine P. Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA. 2018;115(39):9684–9689. doi: 10.1073/pnas.1810286115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 415.Choudhary K., DeCost B., Chen C., et al. Recent advances and applications of deep learning methods in materials science. npj Comput. Mater. 2022;8(1):59. doi: 10.1038/s41524-022-00734-6. [DOI] [Google Scholar]
- 416.Xu N., Li W., Gong P., et al. Satellite altimeter observed surface water increase across lake-rich regions of the arctic. Innovation. 2024;5(6):100714. doi: 10.1016/j.xinn.2024.100714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 417.Kim S., Lu P.Y., Mukherjee S., et al. Integration of neural network-based symbolic regression in deep learning for scientific discovery. IEEE Trans. Neural Netw. Learn. Syst. 2021;32(9):4166–4177. doi: 10.1109/TNNLS.2020.3017010. [DOI] [PubMed] [Google Scholar]
- 418.Lample G., Charton F. Deep learning for symbolic mathematics. International Conference on Learning Representations. 2019 [Google Scholar]
- 419.Voznica J., Zhukova A., Boskova V., et al. Deep learning from phylogenies to uncover the epidemiological dynamics of outbreaks. Nat. Commun. 2022;13(1):3896. doi: 10.1038/s41467-022-31511-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 420.Ozbayoglu A.M., Gudelek M.U., Sezer O.B. Deep learning for financial applications: A survey. Appl. Soft Comput. 2020;93:106384. doi: 10.1016/j.asoc.2020.106384. [DOI] [Google Scholar]
- 421.Edelman E., Tijssen F., Munniksma P.R., et al. Clinical knowledge modeling: An essential step in the digital transformation of healthcare. Innovation. 2024;5(6):100718. doi: 10.1016/j.xinn.2024.100718. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 422.Yang X., Jia J., Zhou X., et al. The future of artificial intelligence: Time to embrace more international collaboration. Innovation. 2024;5(6):100703. doi: 10.1016/j.xinn.2024.100703. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 423.Moret M., Pachon Angona I., Cotos L., et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 2023;14(1):114. doi: 10.1038/s41467-022-35692-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 424.Chen Q., Yang C., Xie Y., et al. A High Efficiency Strategy to De Novo Design Functional Peptide Sequences. J. Chem. Inf. Model. 2022;62(10):2617–2629. doi: 10.1021/acs.jcim.2c00089. [DOI] [PubMed] [Google Scholar]
- 425.Rettie S.A., Campbell K.V., Bera A.K., et al. Cyclic peptide structure prediction and design using AlphaFold. bioRxiv. 2023 doi: 10.1038/s41467-025-59940-7. Preprint at. 2023.02.25.529956. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 426.Wang X., Wang S., Liang X., et al. Deep reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2022;35(4):5064–5078. doi: 10.1109/TNNLS.2022.3207346. [DOI] [PubMed] [Google Scholar]
- 427.Lutz I.D., Wang S., Norn C., et al. Top-down design of protein architectures with reinforcement learning. Science (New York, N.Y.) 2023;380(6642):266–273. doi: 10.1126/science.adf6591. [DOI] [PubMed] [Google Scholar]
- 428.Runge F., Stoll D., Falkner S., Hutter F. Learning to design RNA. arXiv. 2019 doi: 10.48550/arXiv.1812.11951. Preprint at.
- 429.Iram A., Dong Y., Ignea C. Synthetic biology advances towards a bio-based society in the era of artificial intelligence. Curr. Opin. Biotechnol. 2024;87:103143. doi: 10.1016/j.copbio.2024.103143. [DOI] [PubMed] [Google Scholar]
- 430.Zhou Z., Zhang L., Yu Y., et al. Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning. Nat. Commun. 2024;15(1):5566. doi: 10.1038/s41467-024-49798-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 431.Vaucher A.C., Zipoli F., Geluykens J., et al. Automated extraction of chemical synthesis actions from experimental procedures. Nat. Commun. 2020;11(1):3601. doi: 10.1038/s41467-020-17266-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 432.Zhou Z., Li X., Zare R.N. Optimizing chemical reactions with deep reinforcement learning. ACS Cent. Sci. 2017;3(12):1337–1344. doi: 10.1021/acscentsci.7b00492. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 433.Choi I., Kim J., Kim W.C. Dietary pattern extraction using natural language processing techniques. Front. Nutr. 2022;9:765794. doi: 10.3389/fnut.2022.765794. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 434.Choi I., Kim J., Kim W.C. An explainable prediction for dietary-related diseases via language models. Nutrients. 2024;16(5):686. doi: 10.3390/nu16050686. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 435.Hsu C., Lee H.A., Liu C.Y., et al. Precision nutrition with AI: An NLP-based model for food intake analysis and dietary recommendations. Curr. Dev. Nutr. 2023;7(Suppl 1).
- 436.Amiri M., Sarani Rad F., Li J., et al. Delighting palates with AI: reinforcement learning’s triumph in crafting personalized meal plans with high user acceptance. Nutrients. 2024;16(3):346. doi: 10.3390/nu16030346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 437.Liu L., Guan Y., Wang Z., et al. An interactive food recommendation system using reinforcement learning. Expert Syst. Appl. 2024;254:124313. doi: 10.1016/j.eswa.2024.124313. [DOI] [Google Scholar]
- 438.Wu T., He S., Liu J., et al. A brief overview of chatGPT: The history, status quo and potential future development. IEEE/CAA J. Autom. Sinica. 2023;10(5):1122–1136. doi: 10.1109/JAS.2023.123618. [DOI] [Google Scholar]
- 439.Moor M., Banerjee O., Abad Z.S.H., et al. Foundation models for generalist medical artificial intelligence. Nature. 2023;616(7956):259–265. doi: 10.1038/s41586-023-05881-4. [DOI] [PubMed] [Google Scholar]
- 440.Tu T., Azizi S., Driess D., et al. Towards generalist biomedical ai. NEJM AI. 2024;1(3) doi: 10.1056/AIoa2300138. AIoa2300138. [DOI] [Google Scholar]
- 441.Singhal K., Azizi S., Tu T., et al. Large language models encode clinical knowledge. Nature. 2023;620(7972):172–180. doi: 10.1038/s41586-023-06291-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 442.Singhal K., Tu T., Gottweis J., et al. Towards expert-level medical question answering with large language models. Nat. Med. 2025;31:943–950. doi: 10.1038/s41591-024-03423-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 443.Liu X., Liu H., Yang G., et al. A generalist medical language model for disease diagnosis assistance. Nat. Med. 2025;31:932–942. doi: 10.1038/s41591-024-03416-6. [DOI] [PubMed] [Google Scholar]
- 444.Rydzewski N.R., Dinakaran D., Zhao S.G., et al. Comparative evaluation of LLMs in clinical oncology. NEJM AI. 2024;1(5) doi: 10.1056/AIoa2300151. AIoa2300151. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 445.Zakka C., Shad R., Chaurasia A., et al. Almanac—retrieval-augmented language models for clinical medicine. NEJM AI. 2024;1(2) doi: 10.1056/AIoa2300068. AIoa2300068. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 446.Unlu O., Shin J., Mailly C.J., et al. Retrieval-augmented generation–enabled GPT-4 for clinical trial screening. NEJM AI. 2024;1 doi: 10.1056/AIoa2400181. AIoa2400181. [DOI] [Google Scholar]
- 447.Lam K., Qiu J. Foundation models: the future of surgical artificial intelligence? Br. J. Surg. 2024;111(4):znae090. doi: 10.1093/bjs/znae090. [DOI] [PubMed] [Google Scholar]
- 448.Peng Y., Lin A., Wang M., et al. Enhancing AI reliability: A foundation model with uncertainty estimation for optical coherence tomography-based retinal disease diagnosis. Cell Rep. Med. 2024;5:101568. doi: 10.1016/j.xcrm.2024.101568. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 449.Tian Y., Li Z., Jin Y., et al. Foundation model of ECG diagnosis: Diagnostics and explanations of any form and rhythm on ECG. Cell Rep. Med. 2024;5:101875. doi: 10.1016/j.xcrm.2024.101875. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 450.Ma J., He Y., Li F., et al. Segment anything in medical images. Nat. Commun. 2024;15:654. doi: 10.1038/s41467-024-44824-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 451.Bluethgen C., Chambon P., Delbrouck J.B., et al. A vision–language foundation model for the generation of realistic chest X-ray images. Nat. Biomed. Eng. 2024;9:494–506. doi: 10.1038/s41551-024-01246-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 452.Xiang J., Wang X., Zhang X., et al. A vision–language foundation model for precision oncology. Nature. 2025;638:769–778. doi: 10.1038/s41586-024-08378-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 453.Vorontsov E., Bozkurt A., Casson A., et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat. Med. 2024;30:2924–2935. doi: 10.1038/s41591-024-03141-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 454.Wang X., Zhao J., Marostica E., et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature. 2024;634(8035):970–978. doi: 10.1038/s41586-024-07894-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 455.Khalili N., Ciompi F. Scaling data toward pan-cancer foundation models. Trends Cancer. 2024;10:655–658. doi: 10.1016/j.trecan.2024.08.008. [DOI] [PubMed] [Google Scholar]
- 456.Dalla-Torre H., Gonzalez L., Mendoza-Revilla J., et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods. 2024;22:287–297. doi: 10.1038/s41592-024-02523-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 457.Cui H., Wang C., Maan H., et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods. 2024;21:1470–1480. doi: 10.1038/s41592-024-02201-0. [DOI] [PubMed] [Google Scholar]
- 458.Fu X., Mo S., Buendia A., et al. A foundation model of transcription across human cell types. Nature. 2025;637:965–973. doi: 10.1038/s41586-024-08391-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 459.Ramprasad P., Pai N., Pan W. Enhancing personalized gene expression prediction from dna sequences using genomic foundation models. HGG Adv. 2024;5(4):100347. doi: 10.1016/j.xhgg.2024.100347. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 460.Jiang L.Y., Liu X.C., Nejatian N.P., et al. Health system-scale language models are all-purpose prediction engines. Nature. 2023;619(7969):357–362. doi: 10.1038/s41586-023-06160-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 461.Boussina A., Krishnamoorthy R., Quintero K., et al. Large language models for more efficient reporting of hospital quality measures. NEJM AI. 2024;1(11) doi: 10.1056/AIcs2400420. AIcs2400420. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 462.Yang X., Chen A., PourNejatian N., et al. A large language model for electronic health records. npj Digit. Med. 2022;5(1):194. doi: 10.1038/s41746-022-00742-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 463.Vaid A., Landi I., Nadkarni G., et al. Using fine-tuned large language models to parse clinical notes in musculoskeletal pain disorders. Lancet Digit. Health. 2023;5(12):e855–e858. doi: 10.1016/S2589-7500(23)00202-9. [DOI] [PubMed] [Google Scholar]
- 464.Van Veen D., Van Uden C., Blankemeier L., et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 2024;30(4):1134–1142. doi: 10.1038/s41591-024-02855-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 465.Li H., Moon J.T., Purkayastha S., et al. Ethics of large language models in medicine and medical research. Lancet Digit. Health. 2023;5(6):e333–e335. doi: 10.1016/S2589-7500(23)00083-3. [DOI] [PubMed] [Google Scholar]
- 466.Xiong Z., Wang X., Zhou Y., et al. How generalizable are foundation models when applied to different demographic groups and settings? NEJM AI. 2024 doi: 10.1056/AIcs2400497. AIcs2400497. [DOI] [Google Scholar]
- 467.Hager P., Jungmann F., Holland R., et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 2024;30(9):2613–2622. doi: 10.1038/s41591-024-03097-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 468.Zhu Y., Gao J., Wang Z., et al. Is larger always better? evaluating and prompting large language models for non-generative medical tasks. arXiv. 2024 doi: 10.48550/arXiv.2407.18525. Preprint at. [DOI] [Google Scholar]
- 469.Saenz A.D., Harned Z., Banerjee O., et al. Autonomous AI systems in the face of liability, regulations and costs. npj Digit. Med. 2023;6(1):185. doi: 10.1038/s41746-023-00929-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 470.Dvijotham K.D., Winkens J., Barsbey M., et al. Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians. Nat. Med. 2023;29(7):1814–1820. doi: 10.1038/s41591-023-02437-x. [DOI] [PubMed] [Google Scholar]
- 471.Johri S., Jeong J., Tran B.A., et al. An evaluation framework for clinical use of large language models in patient interaction tasks. Nat. Med. 2025;31:77–86. doi: 10.1038/s41591-024-03328-5. [DOI] [PubMed] [Google Scholar]
- 472.He Y., Huang F., Jiang X., et al. Foundation model for advancing healthcare: Challenges, opportunities, and future directions. arXiv. 2024 doi: 10.48550/arXiv.2404.03264. Preprint at. [DOI] [PubMed] [Google Scholar]
- 473.Arun J.T., Ting D.S.J., Elangovan K., et al. Large language models in medicine. Nat. Med. 2023;29(8):1930–1940. doi: 10.1038/s41591-023-02448-8. [DOI] [PubMed] [Google Scholar]
- 474.Wang Z., Liu C., Zhang S., et al. Foundation model for endoscopy video analysis via large-scale self-supervised pre-train. MICCAI. 2023;14228:101–111. [DOI] [Google Scholar]
- 475.Brandes N., Ofer D., Peleg Y., et al. Proteinbert: a universal deep-learning model of protein sequence and function. Bioinformatics. 2022;38(8):2102–2110. doi: 10.1093/bioinformatics/btac020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 476.Zhang K., Yu J., Adhikarla E., et al. BiomedGPT: A unified and generalist biomedical generative pre-trained transformer for vision, language and multimodal tasks. arXiv. 2023 doi: 10.48550/arXiv.2305.17100. Preprint at. [DOI] [Google Scholar]
- 477.Azad B., Azad R., Eskandari S., et al. Foundational models in medical imaging: A comprehensive survey and future vision. arXiv. 2023 doi: 10.48550/arXiv.2310.18689. Preprint at. [DOI] [Google Scholar]
- 478.Elwyn G., Frosch D., Thomson R., et al. Shared decision making: a model for clinical practice. J. Gen. Intern. Med. 2012;27:1361–1367. doi: 10.1007/s11606-012-2077-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 479.Zhou H., Liu F., Gu B., et al. A survey of large language models in medicine: Progress, application, and challenge. arXiv. 2023 doi: 10.48550/arXiv.2311.05112. Preprint at. [DOI] [Google Scholar]
- 480.Wang G., Yang G., Du Z., et al. ClinicalGPT: large language models finetuned with diverse medical data and comprehensive evaluation. arXiv. 2023 doi: 10.48550/arXiv.2306.09968. Preprint at. [DOI] [Google Scholar]
- 481.Gao W., Deng Z., Niu Z., et al. Ophglm: Training an ophthalmology large language-and-vision assistant based on instructions and dialogue. arXiv. 2023 doi: 10.48550/arXiv.2306.12174. Preprint at. [DOI] [Google Scholar]
- 482.Farhadi Nia M., Ahmadi M., Irankhah E. Transforming dental diagnostics with artificial intelligence: Advanced integration of ChatGPT and large language models for patient care. arXiv. 2024:2406.06616. doi: 10.48550/arXiv.2406.06616. Preprint at. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 483.Suárez A., Jiménez J., Llorente de Pedro M., et al. Beyond the scalpel: Assessing ChatGPT’s potential as an auxiliary intelligent virtual assistant in oral surgery. Comput. Struct. Biotechnol. J. 2024;24:46–52. doi: 10.1016/j.csbj.2023.11.058. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 484.Mohammad-Rahimi H., Motamedian S.R., Rohban M.H., et al. Deep learning for caries detection: A systematic review. J. Dent. 2022;122:104115. doi: 10.1016/j.jdent.2022.104115. [DOI] [PubMed] [Google Scholar]
- 485.Revilla-León M., Gómez-Polo M., Barmak A.B., et al. Artificial intelligence models for diagnosing gingivitis and periodontal disease: A systematic review. J. Prosthet. Dent. 2023;130(6):816–824. doi: 10.1016/j.prosdent.2022.01.026. [DOI] [PubMed] [Google Scholar]
- 486.Wongratwanich P., Shimabukuro K., Konishi M., et al. Do various imaging modalities provide potential early detection and diagnosis of medication-related osteonecrosis of the jaw? a review. Dentomaxillofac. Radiol. 2021;50(6):20200417. doi: 10.1259/dmfr.20200417. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 487.Alabi R.O., Youssef O., Pirinen M., et al. Machine learning in oral squamous cell carcinoma: Current status, clinical concerns and prospects for future—a systematic review. Artif. Intell. Med. 2021;115:102060. doi: 10.1016/j.artmed.2021.102060. [DOI] [PubMed] [Google Scholar]
- 488.Warin K., Limprasert W., Suebnukarn S., et al. Maxillofacial fracture detection and classification in computed tomography images using convolutional neural network-based models. Sci. Rep. 2023;13(1):3434. doi: 10.1038/s41598-023-30640-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 489.Jha N., Lee K.-S., Kim Y.-J. Diagnosis of temporomandibular disorders using artificial intelligence technologies: A systematic review and meta-analysis. PLoS One. 2022;17(8) doi: 10.1371/journal.pone.0272715. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 490.Nguyen D.M.H., Nguyen H., Diep N., et al. Lvm-med: Learning large-scale self-supervised vision models for medical imaging via second-order graph matching. Adv. Neural Inf. Process. Syst. 2024;36 [Google Scholar]
- 491.Feng W., Zhu L., Yu L. Cheap lunch for medical image segmentation by fine-tuning sam on few exemplars. International MICCAI Brainlesion Workshop. Springer. 2023:13–22. doi: 10.1007/978-3-031-76160-7_2. [DOI] [Google Scholar]
- 492.Shankar A., Monisha T.R., Kumar S.S., et al. Advancements in AI-driven dentistry: Tooth genAI’s impact on dental diagnosis and treatment planning. 2024 2nd International Conference on Networking, Embedded and Wireless Systems (ICNEWS). IEEE; 2024. pp. 1–7. doi: 10.1109/ICNEWS60873.2024.10731122. [DOI]
- 493.Yue W., Zhang J., Hu K., et al. Part to whole: Collaborative prompting for surgical instrument segmentation. arXiv. 2023:2312.14481. doi: 10.48550/arXiv.2312.14481. Preprint at. [DOI] [Google Scholar]
- 494.Chan H.-L., Misch K., Wang H.-L. Dental imaging in implant treatment planning. Implant Dent. 2010;19(4):288–298. doi: 10.1097/ID.0b013e3181e59ebd. [DOI] [PubMed] [Google Scholar]
- 495.Wu Y., Zhang Y., Xu M., et al. Effectiveness of various general large language models in clinical consensus and case analysis in dental implantology: A comparative study. 2024. doi: 10.1186/s12911-025-02972-2. [DOI] [PMC free article] [PubMed]
- 496.Elgarba B.M., Fontenele R.C., Tarce M., et al. Artificial intelligence serving pre-surgical digital implant planning: A scoping review. J. Dent. 2024;143:104862. doi: 10.1016/j.jdent.2024.104862. [DOI] [PubMed] [Google Scholar]
- 497.Baumrind S., Korn E.L., Boyd R.L., et al. The decision to extract: part ii. analysis of clinicians’ stated reasons for extraction. Am. J. Orthod. Dentofacial Orthop. 1996;109(4):393–402. doi: 10.1016/S0889-5406(96)70121-X. [DOI] [PubMed] [Google Scholar]
- 498.Kim Y.-H., Park J.-B., Chang M.-S., et al. Influence of the depth of the convolutional neural networks on an artificial intelligence model for diagnosis of orthognathic surgery. J. Pers. Med. 2021;11(5):356. doi: 10.3390/jpm11050356. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 499.Shin W.S., Yeom H.-G., Lee G.H., et al. Deep learning based prediction of necessity for orthognathic surgery of skeletal malocclusion using cephalogram in korean individuals. BMC Oral Health. 2021;21:130–137. doi: 10.1186/s12903-021-01513-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 500.Choi H.-I., Jung S.-K., Baek S.-H., et al. Artificial intelligent model with neural network machine learning for the diagnosis of orthognathic surgery. J. Craniofac. Surg. 2019;30(7):1986–1989. doi: 10.1097/SCS.0000000000005650. [DOI] [PubMed] [Google Scholar]
- 501.Mallineni S.K., Sethi M., Punugoti D., et al. Artificial intelligence in dentistry: A descriptive review. Bioengineering. 2024;11(12):1267. doi: 10.3390/bioengineering11121267. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 502.Ding H., Wu J., Zhao W., et al. Artificial intelligence in dentistry—a review. Front. Dent. Med. 2023;4:1085251. doi: 10.3389/fdmed.2023.1085251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 503.Wu D., Jiang J., Pan J., et al. Root canal preparation robot based on guiding strategy for safe remote therapy: System design and feasibility study. IEEE/ASME Trans. Mechatron. 2023. [DOI] [Google Scholar]
- 504.Shan T., Tay F.R., Gu L. Application of artificial intelligence in dentistry. J. Dent. Res. 2021;100(3):232–244. doi: 10.1177/0022034520969115. [DOI] [PubMed] [Google Scholar]
- 505.Schwendicke F., Samek W., Krois J. Artificial intelligence in dentistry: chances and challenges. J. Dent. Res. 2020;99(7):769–774. doi: 10.1177/0022034520915714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 506.Salerno S., Laghi A., Cantone M.-C., et al. Overdiagnosis and overimaging: an ethical issue for radiological protection. Radiol. Med. 2019;124:714–720. doi: 10.1007/s11547-019-01029-5. [DOI] [PubMed] [Google Scholar]
- 507.Kulyabin M., Zhdanov A., Pershin A., et al. Segment anything in optical coherence tomography: Sam 2 for volumetric segmentation of retinal biomarkers. Bioengineering. 2024;11(9):940. doi: 10.3390/bioengineering11090940. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 508.Zhang J., Zhang Z.M. Ethics and governance of trustworthy medical artificial intelligence. BMC Med. Inform. Decis. Mak. 2023;23:7. doi: 10.1186/s12911-023-02103-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 509.Tang Y.D., Dong E.D., Gao W. LLMs in medicine: The need for advanced evaluation systems for disruptive technologies. The Innovation. 2024;5:100622. doi: 10.1016/j.xinn.2024.100622. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 510.Yang Y., Xu S., Hong Y., et al. Computational modeling for medical data: From data collection to knowledge discovery. Innov. Life. 2024;2(3):100079. doi: 10.59717/j.xinn-life.2024.100079. [DOI] [Google Scholar]
- 511.Guan H., Liu M. Domain adaptation for medical image analysis: A survey. IEEE Trans. Biomed. Eng. 2021;69(3):1173–1185. doi: 10.1109/TBME.2021.3117407. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 512.Zhang W., Han J., Xu Z., et al. Towards urban general intelligence: A review and outlook of urban foundation models. arXiv. 2024:2402.01749. doi: 10.48550/arXiv.2402.01749. Preprint at. [DOI] [Google Scholar]
- 513.Liang Y., Wen H., Nie Y., et al. Foundation models for time series analysis: A tutorial and survey. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2024. pp. 6555–6565. doi: 10.1145/3637528.3671451. [DOI]
- 514.Alnegheimish S., Nguyen L., Berti-Equille L., et al. Large language models can be zero-shot anomaly detectors for time series? arXiv. 2024:2405.14755. doi: 10.48550/arXiv.2405.14755. Preprint at. [DOI] [Google Scholar]
- 515.Zhou T., Niu P., Sun L., et al. One fits all: Power general time series analysis by pretrained lm. Adv. Neural Inf. Process. Syst. 2023;36:43322–43355. doi: 10.5555/3666122.3667999. [DOI] [Google Scholar]
- 516.Liu Q., Liu X., Liu C., et al. Time-ffm: Towards lm-empowered federated foundation model for time series forecasting. arXiv. 2024 Preprint at. [Google Scholar]
- 517.Li Z., Xia L., Tang J., et al. UrbanGPT: Spatiotemporal large language models. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2024. pp. 5351–5362. doi: 10.1145/3637528.3671578. [DOI]
- 518.Liu C., Yang S., Xu Q., et al. Spatial-temporal large language model for traffic prediction. arXiv. 2024:2401.10134. doi: 10.48550/arXiv.2401.10134. Preprint at. [DOI] [Google Scholar]
- 519.Ansari A.F., Stella L., Turkmen C., et al. Chronos: Learning the language of time series. arXiv. 2024:2403.07815. doi: 10.48550/arXiv.2403.07815. Preprint at. [DOI] [Google Scholar]
- 520.Yuan Y., Ding J., Feng J., et al. Unist: A prompt-empowered universal model for urban spatio-temporal prediction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2024. pp. 4095–4106. doi: 10.1145/3637528.3671662. [DOI]
- 521.Lin Y., Wei T., Zhou Z., et al. Trajfm: A vehicle trajectory foundation model for region and task transferability. arXiv. 2024 Preprint at. [Google Scholar]
- 522.Zhu Y., Yu J.J., Zhao X., et al. Unitraj: Universal human trajectory modeling from billion-scale worldwide traces. arXiv. 2024 Preprint at. [Google Scholar]
- 523.Xixuan H., Chen W., Yan Y., et al. Urbanvlp: A multi-granularity vision-language pre-trained foundation model for urban indicator prediction. arXiv. 2024 Preprint at. [Google Scholar]
- 524.Xiao C., Zhou J., Xiao Y., et al. Refound: Crafting a foundation model for urban region understanding upon language and visual foundations. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2024. pp. 3527–3538. doi: 10.1145/3637528.3671992. [DOI]
- 525.Lai S., Xu Z., Zhang W., et al. Large language models as traffic signal control agents: Capacity and opportunity. arXiv. 2023:2312.16044. doi: 10.48550/arXiv.2312.16044. Preprint at. [DOI] [Google Scholar]
- 526.Zhou Z., Lin Y., Jin D., et al. Large language model for participatory urban planning. arXiv. 2024:2402.17161. doi: 10.48550/arXiv.2402.17161. Preprint at. [DOI] [Google Scholar]
- 527.Li L., Li J., Chen C., et al. Political-LLM: Large language models in political science. arXiv. 2024 doi: 10.48550/arXiv.2412.06864. Preprint at. [DOI] [Google Scholar]
- 528.Liu M., Shi G. Poliprompt: A high-performance cost-effective LLM-based text classification framework for political science. arXiv. 2024:2409.01466. doi: 10.48550/arXiv.2409.01466. Preprint at. [DOI] [Google Scholar]
- 529.Chandra V., Albaaji G.F., Hareendran A. Precision farming for sustainability: An agricultural intelligence model. Comput. Electron. Agric. 2024 doi: 10.1016/j.compag.2024.109386. [DOI] [Google Scholar]
- 530.Li J., Xu M., Xiang L., et al. Foundation models in smart agriculture: Basics, opportunities, and challenges. Comput. Electron. Agric. 2024;222:109032. doi: 10.1016/j.compag.2024.109032. [DOI] [Google Scholar]
- 531.Zhang Z., Ni W., Quegan S., et al. Deforestation in Latin America in the 2000s predominantly occurred outside of typical mature forests. Innovation. 2024;5(3):100610. doi: 10.1016/j.xinn.2024.100610. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 532.Kuska M.T., Wahabzada M., Paulus S. AI for crop production–where can large language models (LLMs) provide substantial value? Comput. Electron. Agric. 2024;221:108924. doi: 10.1016/j.compag.2024.108924. [DOI] [Google Scholar]
- 533.Tan K. Large language models for crop yield prediction. 2024. doi: 10.21203/rs.3.rs-4750823/v1. Preprint at. [DOI]
- 534.Lyu Y., Wang P., Bai X., et al. Machine learning techniques and interpretability for maize yield estimation using time-series images of modis and multi-source data. Comput. Electron. Agric. 2024;222:109063. doi: 10.1016/j.compag.2024.109063. [DOI] [Google Scholar]
- 535.Pallottino F., Violino S., Figorilli S., et al. Applications and perspectives of generative artificial intelligence in agriculture. Comput. Electron. Agric. 2025;230:109919. doi: 10.1016/j.compag.2025.109919. [DOI] [Google Scholar]
- 536.Zhu H., Qin S., Su M., et al. Harnessing large vision and language models in agriculture: A review. arXiv. 2024 Preprint at. [Google Scholar]
- 537.Zhou Y., Yan H., Ding K., et al. Few-shot image classification of crop diseases based on vision–language models. Sensors. 2024;24(18):6109. doi: 10.3390/s24186109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 538.Wu X., Fu B., Wang S., et al. Three main dimensions reflected by national sdg performance. Innovation. 2023;4(6):100507. doi: 10.1016/j.xinn.2023.100507. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 539.Zhu W., Li W., Zhang H., et al. Big data and artificial intelligence-aided crop breeding: Progress and prospects. J. Integr. Plant Biol. 2024 doi: 10.1111/jipb.13791. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 540.Farooq M.A., Gao S., Hassan M.A., et al. Artificial intelligence in plant breeding. Trends Genet. 2024 doi: 10.1016/j.tig.2024.07.001. [DOI] [PubMed] [Google Scholar]
- 541.Mai G., Huang W., Sun J., et al. On the opportunities and challenges of foundation models for geospatial artificial intelligence. arXiv. 2023 Preprint at. [Google Scholar]
- 542.Li J., Li D., Xiong C., et al. International Conference on Machine Learning. PMLR; 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation; pp. 12888–12900. [Google Scholar]
- 543.Guarino M., Tullo E., Finzi A. Environmental impact of livestock farming and precision livestock farming as a mitigation strategy. Sci. Total Environ. 2019 doi: 10.1016/j.scitotenv.2018.10.018. [DOI] [PubMed] [Google Scholar]
- 544.Yang X., Dai H., Wu Z., et al. Sam for poultry science. arXiv. 2023 Preprint at. [Google Scholar]
- 545.Xie Q., Bao J. Artificial intelligence in animal farming: A systematic literature review. J. Clean. Prod. 2022 doi: 10.1016/j.jclepro.2021.129956. [DOI] [Google Scholar]
- 546.Liu N., Qi J., An X., et al. A review on information technologies applicable to precision dairy farming: Focus on behavior, health monitoring, and the precise feeding of dairy cows. Agriculture. 2023;13(10):1858. doi: 10.3390/agriculture13101858. [DOI] [Google Scholar]
- 547.Sun W., Zhang J., Lei Y., et al. Rsprotoseg: High spatial resolution remote sensing images segmentation based on non-learnable prototypes. IEEE Trans. Geosci. Rem. Sens. 2024 doi: 10.1109/TGRS.2024.3404922. [DOI] [Google Scholar]
- 548.Arafat S., Priya C., Premkumar R., et al. Real time cattle health monitoring and early disease detection using IoT and machine learning. IEEE; 2024. pp. 450–455. [DOI] [Google Scholar]
- 549.Islam M.M., Tonmoy S.S., Quayum S., et al. Smart poultry farm incorporating GSM and IoT. 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST). IEEE; 2019. pp. 277–280. [DOI] [Google Scholar]
- 550.Nie Y., Kong Y., Dong X., et al. A survey of large language models for financial applications: Progress, prospects and challenges. arXiv. 2024 Preprint at. [Google Scholar]
- 551.Brynjolfsson E., McAfee A. W.W. Norton & Company; 2014. The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. [Google Scholar]
- 552.Xu W., Feng Y. Does fintech promote green economic growth in Chinese cities? Atlantis Press; 2024. pp. 942–948. [DOI] [Google Scholar]
- 553.Qatawneh A.M., Lutfi A., Al Barrak T. Effect of artificial intelligence (AI) on financial decision-making: Mediating role of financial technologies (fin-tech) HighTech Innov. J. 2024;5(3):759–773. doi: 10.28991/HIJ-2024-05-03-015. [DOI] [Google Scholar]
- 554.Kriebel J., Stitz L. Credit default prediction from user-generated text in peer-to-peer lending using deep learning. Eur. J. Oper. Res. 2022;302(1):309–323. doi: 10.1016/j.ejor.2021.12.024. [DOI] [Google Scholar]
- 555.Arner D.W., Zetzsche D.A., Buckley R.P., et al. Fintech and regtech: Enabling innovation while preserving financial stability. Georgetown J. Int. Aff. 2017;18:47–58. [Google Scholar]
- 556.Mhlanga D. Financial inclusion in emerging economies: The application of machine learning and artificial intelligence in credit risk assessment. Int. J. Financ. Stud. 2021;9(3):39. doi: 10.3390/ijfs9030039. [DOI] [Google Scholar]
- 557.Duarte J., Siegel S., Young L. Trust and credit: The role of appearance in peer-to-peer lending. Rev. Financ. Stud. 2012;25(8):2455–2484. doi: 10.1093/rfs/hhs071. [DOI] [Google Scholar]
- 558.Li C., Wang H., Jiang S., et al. The effect of AI-enabled credit scoring on financial inclusion: Evidence from an underserved population of over one million. MIS Q. 2024;48(4):1803–1834. doi: 10.25300/misq/2024/18340. [DOI] [Google Scholar]
- 559.Chen T., Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. pp. 785–794. doi: 10.1145/2939672.2939785. [DOI]
- 560.Delgadillo J., Kinyua J., Mutigwe C. Finsosent: Advancing financial market sentiment analysis through pretrained large language models. Big Data Cogn. Comput. 2024;8(8):87. doi: 10.3390/bdcc8080087. [DOI] [Google Scholar]
- 561.Li B., Duan C. Construction and optimization of macroeconomic data forecasting model based on machine learning. Journal of Electrical Systems. 2024;20(3s):436–447. doi: 10.52783/jes.1310. [DOI] [Google Scholar]
- 562.Takahashi S., Chen Y., Tanaka-Ishii K. Modeling financial time-series with generative adversarial networks. Phys. Stat. Mech. Appl. 2019;527:121261. doi: 10.1016/j.physa.2019.121261. [DOI] [Google Scholar]
- 563.Hajek P., Henriques R. Mining corporate annual reports for intelligent detection of financial statement fraud–a comparative study of machine learning methods. Knowl. Base Syst. 2017;128:139–152. doi: 10.1016/j.knosys.2017.05.001. [DOI] [Google Scholar]
- 564.Avacharmal R. Leveraging supervised machine learning algorithms for enhanced anomaly detection in anti-money laundering (AML) transaction monitoring systems: A comparative analysis of performance and explainability. African Journal of Artificial Intelligence and Sustainable Development. 2021;1(2):68–85. [Google Scholar]
- 565.Teichmann F., Boticiu S., Sergi B.S. Regtech–potential benefits and challenges for businesses. Technol. Soc. 2023;72:102150. doi: 10.1016/j.techsoc.2022.102150. [DOI] [Google Scholar]
- 566.Tapscott D., Tapscott A. Portfolio; 2017. Blockchain Revolution: How the Technology behind Bitcoin and Other Cryptocurrencies Is Changing the World. [Google Scholar]
- 567.Cong L.W., He Z., Li J. Decentralized mining in centralized pools. Rev. Financ. Stud. 2021;34(3):1191–1235. doi: 10.1093/rfs/hhaa040. [DOI] [Google Scholar]
- 568.Yang Q., Liu Y., Chen T., et al. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 2019;10(2):1–19. doi: 10.1145/3298981. [DOI] [Google Scholar]
- 569.Mittelstadt B.D., Allo P., Taddeo M., et al. The ethics of algorithms: Mapping the debate. Big Data & Society. 2016;3(2) doi: 10.1177/2053951716679679. [DOI] [Google Scholar]
- 570.Floridi L., Cowls J., Beltrametti M., et al. Ai4people—an ethical framework for a good ai society: opportunities, risks, principles, and recommendations. Minds Mach. 2018;28:689–707. doi: 10.1007/s11023-018-9482-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 571.Ribeiro M.T., Singh S., Guestrin C. Why should I trust you? Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. pp. 1135–1144. doi: 10.1145/2939672.2939778. [DOI]
- 572.Hung C., Labutov I., Thaker K., et al. Automatic concept extraction for domain and student modeling in adaptive textbooks. Int. J. Artif. Intell. Educ. 2021;31:820–846. doi: 10.1007/s40593-020-00207-1. [DOI] [Google Scholar]
- 573.Liu Q., Zheng X., Liu Y., et al. Exploration of the characteristics of teachers’ multimodal behaviours in problem-oriented teaching activities with different response levels. Br. J. Educ. Technol. 2024;55(1):181–207. doi: 10.1111/bjet.13332. [DOI] [Google Scholar]
- 574.Oyedokun T.T. IGI Global Scientific Publishing; 2025. Assistive technology and accessibility tools in enhancing adaptive education. Advancing Adaptive Education: Technological Innovations for Disability Support; pp. 125–162. [DOI] [Google Scholar]
- 575.Strielkowski W., Grebennikova V., Lisovskiy A., et al. AI-driven adaptive learning for sustainable educational transformation. Sustain. Dev. 2024 doi: 10.1002/sd.3221. [DOI] [Google Scholar]
- 576.Zaugg T. Future innovations for assistive technology and universal design for learning. In: Assistive Technology and Universal Design for Learning: Toolkits for Inclusive Instruction. Vol. 275. 2024. [Google Scholar]
- 577.Celik I., Dindar M., Muukkonen H., et al. The promises and challenges of artificial intelligence for teachers: A systematic review of research. TechTrends. 2022;66(4):616–630. doi: 10.1007/s11528-022-00715-y. [DOI] [Google Scholar]
- 578.Zhang H., Huang J., Mei K., et al. Agent security bench (ASB): formalizing and benchmarking attacks and defenses in LLM-based agents. CoRR. 2024 abs/2410. [Google Scholar]
- 579.Meeus M., Jain S., Rei M., et al. In: 33rd USENIX Security Symposium, USENIX Security 2024. Balzarotti D., Xu W., editors. USENIX Association; Philadelphia, PA, USA: 2024. Did the neurons read your book? document-level membership inference for large language models. [Google Scholar]
- 580.Vishwamitra N., Guo K., Romit F.T., et al. Moderating new waves of online hate with chain-of-thought reasoning in large language models. IEEE; 2024. pp. 788–806. [DOI] [Google Scholar]
- 581.Cheng X., Chen W., Shen H., et al. Intelligent algorithm safety: Concepts, scientific problems and prospects. Bull. Chin. Acad. Sci. 2024;39 [Google Scholar]
- 582.Perez F., Ribeiro I. Ignore previous prompt: Attack techniques for language models. CoRR. 2022 abs/2211. [Google Scholar]
- 583.Suo X. Signed-prompt: A new approach to prevent prompt injection attacks against LLM-integrated applications. CoRR. 2024 abs/2401. [Google Scholar]
- 584.Liu Y., Jia Y., Geng R., et al. In: 33rd USENIX Security Symposium, USENIX Security 2024. Balzarotti D., Xu W., editors. USENIX Association; Philadelphia, PA, USA: 2024. Formalizing and benchmarking prompt injection attacks and defenses. [Google Scholar]
- 585.Abdelnabi S., Greshake K., Mishra S., et al. In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, AISec 2023. Pintor M., Chen X., Tramer F., editors. ACM; Copenhagen, Denmark: 2023. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection; pp. 79–90. [Google Scholar]
- 586.Yi J., Xie Y., Zhu B., et al. Benchmarking and defending against indirect prompt injection attacks on large language models. CoRR. 2023 abs/2312. [Google Scholar]
- 587.Xiang C., Wu T., Zhong Z., et al. Certifiably robust RAG against retrieval corruption. CoRR. 2024 [Google Scholar]
- 588.Chen Z., Xiang Z., Xiao C., et al. Agentpoison: Red-teaming LLM agents via poisoning memory or knowledge bases. CoRR. 2024 abs/2407. [Google Scholar]
- 589.Wang Y., Xue D., Zhang S., et al. BadAgent: Inserting and activating backdoor attacks in LLM agents. Association for Computational Linguistics; Bangkok, Thailand: 2024. pp. 9811–9827. [DOI] [Google Scholar]
- 590.Yang W., Bi X., Lin Y., et al. Watch out for your agents! investigating backdoor threats to LLM-based agents. CoRR. 2024 abs/2402. [Google Scholar]
- 591.Dong T., Xue M., Chen G., et al. The philosopher’s stone: Trojaning plugins of large language models. Network and Distributed System Security Symposium, NDSS 2025. The Internet Society. 2025 doi: 10.14722/ndss.2025.230164. [DOI] [Google Scholar]
- 592.Hubinger E., Denison C., Mu J., et al. Sleeper agents: Training deceptive LLMs that persist through safety training. CoRR, abs/2401. 2024 [Google Scholar]
- 593.Yu Z., Liu X., Liang S., et al. In: 33rd USENIX Security Symposium, USENIX Security 2024. Balzarotti D., Xu W., editors. USENIX Association; Philadelphia, PA, USA: 2024. Don’t listen to me: Understanding and exploring jailbreak prompts of large language models. [Google Scholar]
- 594.Yu J., Lin X., Yu Z., et al. In: 33rd USENIX Security Symposium, USENIX Security 2024. Balzarotti D., Xu W., editors. USENIX Association; Philadelphia, PA, USA: 2024. Llm-fuzzer: Scaling assessment of large language model jailbreaks. [Google Scholar]
- 595.Liu Y., Deng G., Xu Z., et al. Jailbreaking ChatGPT via prompt engineering: An empirical study. CoRR. 2023 abs/2305. [Google Scholar]
- 596.Wei A., Haghtalab N., Steinhardt J. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023. Oh A., Naumann T., Globerson A., et al., editors. NeurIPS; 2023. Jailbroken: How does LLM safety training fail? [Google Scholar]
- 597.Zhang Z., Shen G., Tao G., et al. On large language models’ resilience to coercive interrogation. IEEE; 2024. pp. 826–844.
- 598.Xu J., Ma M.D., Wang F., et al. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), NAACL 2024. Duh K., Gomez-Adorno H., Bethard S., editors. Association for Computational Linguistics; 2024. Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models; pp. 3111–3126.
- 599.Zhao S., Wen J., Luu A.T., et al. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023. Bouamor H., Pino J., Bali K., editors. Association for Computational Linguistics; 2023. Prompt as triggers for backdoor attack: Examining the vulnerability in language models; pp. 12303–12317.
- 600.Cai X., Xu H., Xu S., et al. In: Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022. Koyejo S., Mohamed S., Agarwal A., et al., editors. NeurIPS; 2022. Badprompt: Backdoor attacks on continuous prompts.
- 601.Li Y., Li T., Chen K., et al. The Twelfth International Conference on Learning Representations, ICLR 2024. OpenReview.net; 2024. Badedit: Backdooring large language models by model editing.
- 602.Jain N., Schwarzschild A., Wen Y., et al. Baseline defenses for adversarial attacks against aligned language models. CoRR. 2023 abs/2309.
- 603.Fan X., Li M., Zhou J., et al. GCSA: A new adversarial example-generating scheme toward black-box adversarial attacks. IEEE Trans. Consumer Electron. 2024;70(1):2038–2048. doi: 10.1109/TCE.2024.3358179.
- 604.Wang B., Fan X., Jing Q., et al. GTAT: adversarial training with generated triplets. IEEE; 2022. pp. 1–8.
- 605.Jing Q., Liu S., Fan X., et al. Can adversarial training benefit trajectory representation?: An investigation on robustness for trajectory similarity computation. ACM; Atlanta, GA, USA: 2022. pp. 905–914.
- 606.Tong L., Zhang Y., Zhao Z., et al. In: 33rd USENIX Security Symposium, USENIX Security 2024. Balzarotti D., Xu W., editors. USENIX Association; Philadelphia, PA, USA: 2024. Making them ask and answer: Jailbreaking large language models in few queries via disguise and reconstruction.
- 607.Ji Z., Lee N., Frieske R., et al. Survey of hallucination in natural language generation. ACM Comput. Surv. 2023;55(12):248:1–38. doi: 10.1145/3571730.
- 608.Zhang Y., Li Y., Cui L., et al. Siren’s song in the AI ocean: A survey on hallucination in large language models. CoRR. 2023 abs/2309.
- 609.Liu N.F., Lin K., Hewitt J., et al. Lost in the middle: How language models use long contexts. Trans. Assoc. Comput. Linguist. 2024;12:157–173. doi: 10.1162/tacl_a_00638.
- 610.Penedo G., Malartic Q., Hesslow D., et al. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023. Oh A., Naumann T., Globerson A., et al., editors. NeurIPS; 2023. The refinedweb dataset for falcon LLM: Outperforming curated corpora with web data only.
- 611.Li S., Li X., Shang L., et al. In: Findings of the Association for Computational Linguistics: ACL 2022. Muresan S., Nakov P., Villavicencio A., editors. Association for Computational Linguistics; 2022. How pre-trained language models capture factual knowledge? A causal-inspired analysis; pp. 1720–1732.
- 612.McKenna N., Li T., Cheng L., et al. In: Findings of the Association for Computational Linguistics: EMNLP 2023. Bouamor H., Pino J., Bali K., editors. Association for Computational Linguistics; 2023. Sources of hallucination by large language models on inference tasks; pp. 2758–2774.
- 613.Kadavath S., Conerly T., Askell A., et al. Language models (mostly) know what they know. CoRR. 2022 abs/2207.
- 614.Yin Z., Sun Q., Guo Q., et al. In: Findings of the Association for Computational Linguistics: ACL 2023. Rogers A., Boyd-Graber J.L., Okazaki N., editors. Association for Computational Linguistics; 2023. Do large language models know what they don’t know? pp. 8653–8665.
- 615.Gardent C., Shimorina A., Narayan S., et al. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2017. Barzilay R., Kan M.-Y., editors. Association for Computational Linguistics; 2017. Creating training corpora for NLG micro-planners; pp. 179–188.
- 616.Lin S., Hilton J., Evans O. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022. Muresan S., Nakov P., Villavicencio A., editors. Association for Computational Linguistics; 2022. Truthfulqa: Measuring how models mimic human falsehoods; pp. 3214–3252. doi: 10.18653/v1/2022.acl-long.229.
- 617.Zarrieß S., Voigt H., Schüz S. Decoding methods in neural language generation: A survey. Inf. 2021;12(9):355. doi: 10.3390/info12090355.
- 618.Lee N., Ping W., Xu P., et al. In: Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022. Koyejo S., Mohamed S., Agarwal A., et al., editors. NeurIPS; New Orleans, LA, USA: 2022. Factuality enhanced language models for open-ended text generation.
- 619.Dhuliawala S., Komeili M., Xu J., et al. In: Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and Virtual Meeting. Ku L.-W., Martins A., Srikumar V., editors. Association for Computational Linguistics; 2024. Chain-of-verification reduces hallucination in large language models; pp. 3563–3578.
- 620.Li K., Patel O., Viégas F.B., et al. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023. Oh A., Naumann T., Globerson A., et al., editors. NeurIPS; New Orleans, LA, USA: 2023. Inference-time intervention: Eliciting truthful answers from a language model.
- 621.Xiong M., Hu Z., Lu X., et al. The Twelfth International Conference on Learning Representations, ICLR 2024. OpenReview.net; 2024. Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs.
- 622.Shokri R., Stronati M., Song C., et al. 2017 IEEE Symposium on Security and Privacy (SP). IEEE; 2017. Membership inference attacks against machine learning models; pp. 3–18.
- 623.Carlini N., Tramèr F., Wallace E., et al. In: 30th USENIX Security Symposium, USENIX Security 2021. Bailey M.D., Greenstadt R., editors. USENIX Association; 2021. Extracting training data from large language models; pp. 2633–2650.
- 624.Mattern J., Mireshghallah F., Jin Z., et al. In: Findings of the Association for Computational Linguistics: ACL 2023. Rogers A., Boyd-Graber J.L., Okazaki N., editors. Association for Computational Linguistics; 2023. Membership inference attacks against language models via neighbourhood comparison; pp. 11330–11343.
- 625.Wang B., Chen W., Pei H., et al. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023. Oh A., Naumann T., Globerson A., et al., editors. NeurIPS; New Orleans, LA, USA: 2023. Decodingtrust: A comprehensive assessment of trustworthiness in GPT models.
- 626.Hendrycks D., Burns C., Basart S., et al. 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net; 2021. Aligning AI with shared human values.
- 627.Jin Z., Levine S., Adauto F.G., et al. In: Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022. Koyejo S., Mohamed S., Agarwal A., et al., editors. NeurIPS; New Orleans, LA, USA: 2022. When to make exceptions: Exploring language models as accounts of human moral judgment.
- 628.Huang K., Liu X., Guo Q., et al. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), NAACL 2024. Duh K., Gómez-Adorno H., Bethard S., editors. Association for Computational Linguistics; 2024. Flames: Benchmarking value alignment of LLMs in Chinese; pp. 4551–4591. doi: 10.18653/v1/2024.naacl-long.256.
- 629.Zheng J., Wang H., Zhang A., et al. In: Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024. Globersons A., Mackey L., Belgrave D., et al., editors. NeurIPS; 2024. Ali-agent: Assessing LLMs’ alignment with human values via agent-based evaluation.
- 630.Lee J., Kim M., Kim S., et al. In: Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and Virtual Meeting. Ku L.-W., Martins A., Srikumar V., editors. Association for Computational Linguistics; 2024. LLM alignment benchmark for korean social values and common knowledge; pp. 11177–11213.
- 631.Bu F., Wang Z., Wang S., et al. An investigation into value misalignment in LLM-generated texts for cultural heritage. CoRR. 2025 abs/2501.
- 632.Vosoughi S., Roy D., Aral S. The spread of true and false news online. Science. 2018;359(6380):1146–1151. doi: 10.1126/science.aap95.
- 633.Zhang Y., Sharma K., Du L., et al. Toward mitigating misinformation and social media manipulation in LLM era. Companion Proceedings of the ACM on Web Conference. 2024:1302–1305. doi: 10.1145/3589335.364125.
- 634.Kramer A.D.I., Guillory J.E., Hancock J.T. Experimental evidence of massive-scale emotional contagion through social networks. Proc. Natl. Acad. Sci. USA. 2014;111(24):8788–8790. doi: 10.1073/pnas.132004011.
- 635.Singhal K., Tu T., Gottweis J., et al. Toward expert-level medical question answering with large language models. Nat. Med. 2025;31:943–950. doi: 10.1038/s41591-024-03423-7.
- 636.Chen Y., Huang X., Yang F., et al. Performance of chatGPT and bard on the medical licensing examinations varies across different cultures: A comparison study. BMC Med. Educ. 2024;24(1):1372. doi: 10.1186/s12909-024-06309-x.
- 637.Ji J., Qiu T., Chen B., et al. AI alignment: A comprehensive survey. arXiv. 2023. Preprint.
- 638.Yao J., Yi X., Duan S., et al. Value compass leaderboard: A platform for fundamental and validated evaluation of LLMs values. arXiv. 2025. Preprint.
- 639.Schwartz S.H. An overview of the Schwartz theory of basic values. ORPC. 2012;2(1):11. doi: 10.9707/2307-0919.1116.