Abstract
Artificial intelligence (AI) using large language models (LLMs) such as GPTs has revolutionized various fields. Recently, LLMs have also made inroads into chemical research, even for users without coding expertise. However, applying LLMs directly may lead to “hallucinations”, where the model generates unreliable or inaccurate information, a problem further exacerbated by limited data sets and the inherent complexity of chemical reports. To counteract this, researchers have suggested prompt engineering, which can convey human ideas to LLMs clearly and unambiguously while simultaneously improving LLMs’ reasoning capability. So far, prompt engineering remains underutilized in chemistry, with many chemists barely acquainted with its principles and techniques. In this Outlook, we delve into various prompt engineering techniques and illustrate relevant examples across research areas ranging from metal–organic frameworks and fast-charging batteries to autonomous experiments. We also elucidate the current limitations of prompt engineering with LLMs, such as incomplete or biased outcomes and the constraints imposed by closed-source models. Although LLM-assisted chemical research is still in its early stages, the application of prompt engineering will significantly enhance its accuracy and reliability, thereby accelerating chemical research.
Short abstract
AI using large language models creates an unprecedented opportunity for chemical discovery, and prompt engineering promises to unleash their full potential for accelerating chemical research.
Introduction
Large Language Models (LLMs), a type of artificial intelligence (AI) designed to understand and generate human language, are trained on extensive text data sets to perform a wide range of tasks.1−4 They have emerged as transformative tools in various domains, including natural language processing,5,6 programming,7,8 biology9−11 and chemical research.12,13 With the ability to predict molecular properties,14 optimize experimental designs15−17 and analyze vast amounts of literature,18,19 LLMs hold great promise for increasing the efficiency of scientific discovery in chemistry, especially for chemists without coding expertise.20−23 Distinct from manual approaches, LLMs can efficiently handle repetitive and time-consuming tasks, such as organizing and summarizing literature, in a more cost-effective manner. Moreover, owing to their strong learning and generation capabilities, LLMs have significant potential to provide constructive scientific insights and experimental guidance, speeding up research and decision-making.24 In contrast to traditional models, which are typically task-specific, LLMs are highly flexible and offer higher performance in many cases. Their large-scale training data further enhance their ability to handle diverse tasks, and their user-friendly interfaces enable chemical researchers without computer science expertise to interact with them effortlessly.25
Recently, cutting-edge LLMs such as OpenAI-o1, Gemini 2.0 and Claude 3.5 have demonstrated significant advancements. In particular, OpenAI-o1, which has been trained using reinforcement learning and chain-of-thought reasoning, demonstrates enhanced reasoning capabilities and leading performance across multiple benchmarks.26 However, directly applying these models in chemical research still faces notable challenges. A key limitation is LLMs’ insufficient domain-specific expertise, which restricts their ability to provide reliable experimental guidance. Additionally, LLMs are prone to hallucination,27 where the model generates inaccurate or misleading information owing to its reliance on broad linguistic patterns rather than domain-specific, contextually accurate reports. This challenge is further compounded by the complexity of chemical knowledge, sparse experimental data, and unstructured inputs such as molecular formulas and structural representations, which LLMs struggle to handle accurately without specialized pretraining.
To counteract this, prompt engineering has emerged, which helps LLMs better understand users’ intentions and unlocks their full potential to turn human aspirations into reality with remarkable effectiveness.28 Prompt engineering not only guides the model to correctly fulfill users’ demands but also develops a comprehensive understanding of the underlying knowledge structures essential to the specific domain. Prompt engineering has now advanced to incorporate a variety of techniques,29 which we categorize into four types based on the interaction between the prompt and the model: simple prompt, chain, generation and integration (Figure 1). Currently, there are preliminary uses of prompt engineering in LLMs for chemical and materials research, such as text and image mining,27,30 synthesis route prediction and optimization,16,31−33 aging pattern and ionic conductivity prediction34−36 in battery research, and automated task processing in drug discovery and materials design1,37,38 (Figure 1). These studies not only encompass text processing but also integrate specific chemical experiments and data analysis, providing valuable guidance and practical convenience for chemists.
Figure 1.
A diagram illustrating prompt engineering methods of LLMs for chemical research. Example illustrations reproduced with permission from ref (1) (Copyright 2023, Nature), ref (27) (Copyright 2023, American Chemical Society), and ref (32) (Copyright 2023, Wiley-VCH).
Nevertheless, prompt engineering remains underutilized in the field of chemistry, with a considerable number of chemists possessing only a basic understanding of its principles and importance. Considering this, we summarize several prompt engineering technologies currently employed or potentially applicable in the chemical field. We highlight how these techniques can enhance LLMs’ chemical reasoning and accuracy through a detailed analysis of their principles, practical examples, and limitations. This Outlook offers both theoretical and practical insights, serving as a foundation for further optimization in prompt design and the broader integration of LLMs in chemistry.
Prompt Engineering Methods
Prompt engineering, also known as in-context prompting, involves the design and optimization of input prompts to interact with LLMs for achieving the most effective outputs.29 The evolution of prompt engineering is closely tied to advancements in LLMs. With increases in training data and model parameters, as well as the emergence of advanced training techniques, the performance of LLMs has greatly improved.39,40 Prompt engineering mitigates the high costs associated with retraining and facilitates the efficient application of large-scale models. Therefore, prompt engineering is essential for facilitating interactions between LLMs and chemical research.41
Zero-Shot and Few-Shot Prompting
Two common prompting strategies, zero-shot and few-shot prompting, serve different purposes in guiding the model’s performance. Zero-shot prompting presents a task without examples, depending on the model’s generalization abilities. Few-shot prompting, in contrast, provides a few input–output examples to illustrate the task, aiding the model in understanding task specifics.24 For complex tasks where the model may have insufficient contextual details to utilize its existing knowledge effectively, offering representative examples enables it to better understand the nuances of the task, ultimately enhancing its ability to generate accurate and relevant outputs for similar instances.
Recently, Yaghi and co-workers employed few-shot and zero-shot prompting for text mining of MOF (metal–organic framework) synthesis (Figure 2a).27 By specifying both the input text and the desired output format, they successfully converted the given experimental sections of MOF papers into tables that accurately captured all the synthesis parameters, such as compound name, metal source, organic linkers, reaction temperature and duration (Figure 2b). At the same time, they found that providing four or five short examples in a few-shot prompting strategy enables ChatGPT to identify the features of synthesis paragraphs more effectively than zero-shot prompting. Guo et al. also demonstrated that few-shot prompting can significantly enhance LLMs’ performance on eight practical chemical tasks, such as name prediction, property prediction, and reaction prediction.21 For instance, in the task of property prediction (Figure 2c), they evaluated the accuracy of zero-shot, few-shot (k = 4), and few-shot (k = 8) approaches using GPT-4, where k denotes the number of shots, across five data sets: BBBP, BACE, HIV, Tox21, and ClinTox (Figure 2d). Overall, the accuracy improves on the BBBP and ClinTox data sets as the number of shots increases from k = 0 to k = 4 and k = 8. However, on the HIV and Tox21 data sets, zero-shot prompting outperforms the few-shot approaches. These results highlight the importance of example selection: an appropriate number of high-quality examples can significantly enhance the model’s reasoning abilities, whereas poorly chosen examples may impede its inference. The effective application of few-shot prompting strategies in chemical research therefore requires careful consideration by chemists.
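To make the distinction concrete, the minimal sketch below builds a zero-shot and a few-shot prompt for extracting synthesis parameters into a table, in the spirit of the text-mining workflow above. It assumes the openai Python package and an API key are available; the gpt-4o model name, the example paragraph, and the table format are illustrative placeholders rather than the actual prompts used in ref (27).

```python
# Minimal sketch of zero-shot vs few-shot prompting for synthesis-parameter
# extraction (assumes the `openai` package and an OPENAI_API_KEY are configured).
from openai import OpenAI

client = OpenAI()

TASK = ("Extract the compound name, metal source, organic linker, solvent, "
        "temperature, and reaction time from the synthesis paragraph below. "
        "Return the result as a Markdown table.")

# One worked input/output pair used only in the few-shot variant (hypothetical).
EXAMPLE_IN = ("MOF-X was prepared by heating Zn(NO3)2·6H2O (0.30 mmol) and "
              "terephthalic acid (0.30 mmol) in 10 mL DMF at 120 °C for 24 h.")
EXAMPLE_OUT = ("| Compound | Metal source | Linker | Solvent | T | Time |\n"
               "|---|---|---|---|---|---|\n"
               "| MOF-X | Zn(NO3)2·6H2O | terephthalic acid | DMF | 120 °C | 24 h |")

def extract(paragraph: str, few_shot: bool = False) -> str:
    messages = [{"role": "system", "content": TASK}]
    if few_shot:
        # Few-shot: show the model what a correct answer looks like first.
        messages += [{"role": "user", "content": EXAMPLE_IN},
                     {"role": "assistant", "content": EXAMPLE_OUT}]
    messages.append({"role": "user", "content": paragraph})
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

# Example call (few_shot=False reproduces the zero-shot case):
# print(extract("UiO-66 was synthesized from ZrCl4 and BDC in DMF at 120 °C for 24 h.",
#               few_shot=True))
```

Toggling few_shot between True and False reproduces the two strategies; as noted above, a handful of well-chosen examples is often what tips the balance for synthesis paragraphs.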
Figure 2.
Zero-shot and few-shot prompting diagrams and testing results. (a) A simple demo of zero-shot and few-shot prompting for text mining of MOF synthesis. (b) A table generated by ChatGPT that includes all the synthesis parameters of MOFs. Reproduced with permission from ref (27). Copyright 2023, American Chemical Society. (c) Demo of penetration property prediction for molecular structures represented by SMILES strings. (d) Accuracy of GPT-4 in molecular property prediction tasks, where k is the number of examples. BBBP, BACE, HIV, Tox21, and ClinTox represent five different data sets. Data are sourced from ref (21).
Chain-of-Thought (CoT) Prompting
While LLMs showcase considerable proficiency in basic knowledge retrieval and simple literature analysis, they face substantial limitations when tasked with complex scientific reasoning. Researchers have further found that LLMs are prone to various types of failure, often not due to an absence of domain-specific knowledge but to the lack of a robust reasoning framework to guide the process.42,43 To address this challenge, Wei et al. proposed Chain-of-Thought (CoT) prompting,44 which enhances traditional few-shot prompting by embedding a structured, step-by-step reasoning process in the prompts (Figure 3a). CoT prompting guides the model through a series of logical steps as demonstrated in provided examples and encourages the model to emulate this structured reasoning, thereby producing more coherent and logically sound responses. In addition to the CoT method, Kojima et al. introduced zero-shot-CoT prompting.43 This variant utilizes a generic prompt, such as “Let’s think step by step”, to guide the model through the reasoning process without relying on specific examples. This adaptation allows models to perform reasoning tasks even when concrete examples are not available, making it a versatile tool in a broader range of contexts. So far, some advanced LLMs such as OpenAI-o1 have been trained with chain-of-thought reasoning and demonstrate strong reasoning ability, although they are not specialized logic engines. Certain complex reasoning problems, particularly those involving multistep logic or abstract thinking, may still fall outside their capabilities.
Figure 3.
Diagrams illustrating CoT and APE. (a) The schematic diagram of CoT using zero-shot or few-shot CoT prompting. (b) The illustration of solving a complex chemistry problem (calculating Kc) and the response from GPT-4o with zero-shot CoT prompting. (c) The workflow chart of APE. (d) The synthesis of MOF-521, which can be optimized by APE. Reproduced with permission from ref (32). Copyright 2023, Wiley-VCH.
The CoT prompting method has proven effective in tackling chemistry-related problems. For instance, when employing GPT-4o with zero-shot CoT prompting for equilibrium constant (Kc) calculations, the model follows a three-step process (Figure 3b). First, it formulates the Kc expression based on the given chemical equation. Subsequently, it calculates the molar concentrations of the relevant species. Finally, these concentrations are substituted into the Kc expression to obtain the answer. However, the reasoning process remains imperfect: if GPT-4o is given an unbalanced chemical equation, it can produce factual hallucinations when balancing the equation. To address this problem, Ouyang et al. introduced STRUCTCHEM, a strategy that integrates Program-of-Thought (PoT) prompting and confidence verification into the CoT framework, further enhancing the reasoning accuracy of GPT-4.45
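As a minimal illustration of zero-shot CoT in practice, the snippet below appends the generic “Let’s think step by step” trigger to an equilibrium-constant problem. It assumes the openai Python package; the problem statement and the gpt-4o model name are illustrative assumptions rather than the exact setup behind Figure 3b.

```python
# Minimal zero-shot chain-of-thought sketch for an equilibrium-constant problem
# (assumes the `openai` package; the problem text is purely illustrative).
from openai import OpenAI

client = OpenAI()

problem = (
    "2.0 mol of N2 and 6.0 mol of H2 are placed in a 2.0 L vessel. At equilibrium "
    "1.0 mol of NH3 has formed via N2 + 3H2 <-> 2NH3. Calculate Kc."
)

# Zero-shot CoT: a generic trigger phrase asks the model to expose its reasoning.
cot_prompt = problem + "\nLet's think step by step, then state the final Kc value."

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": cot_prompt}],
)
print(resp.choices[0].message.content)
```

A few-shot CoT variant would instead prepend one or two fully worked Kc problems before the query, as illustrated in Figure 3a.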
Automatic Prompt Engineer (APE)
In multitask and multimodel contexts, the manual design and optimization of prompts are not only resource-intensive but also frequently constrained by rigid cognitive frameworks, ultimately hindering the models’ overall performance. To fully harness the potential of LLMs, Zhou et al. introduced the APE framework (Figure 3c), which automates the generation and optimization of prompts to enhance model performance on specific tasks.46 This automated approach enables LLMs to efficiently explore a diverse range of prompts, thereby maximizing their applicability and effectiveness in various contexts. Additionally, the automated prompt optimization provided by APE can help reduce human errors by iteratively refining prompts to improve model accuracy.
So far, we have not found specific examples of APE applied in chemistry research, but some studies share similar ideas with APE in optimizing prompts. For instance, Yaghi and co-workers proposed the GPT-4 Reticular Chemist framework, which guides GPT-4 in completing MOF synthesis and optimization through prompt engineering (Figure 3d).32 The framework shares profound connections with APE in multiple aspects, including prompt design, iterative feedback, task decomposition, and automation goals. However, it relies on manually designed prompts and iterative feedback mechanisms, which limits its level of automation. In the future, integrating APE’s automated prompt generation and optimization techniques could further enhance the efficiency and scalability of this framework, promoting broader applications of LLMs in chemical discovery.
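For illustration, a stripped-down version of the APE idea can be sketched as follows: one model call proposes candidate instructions, each candidate is scored on a small labelled set, and the best-scoring prompt is kept. The SMILES-to-name task, the evaluation pairs, and the exact-match scoring below are hypothetical choices made only for this sketch.

```python
# Minimal APE-style loop: propose candidate prompts, score them, keep the best.
# (Assumes the `openai` package; the task and evaluation data are hypothetical.)
from openai import OpenAI

client = OpenAI()

# Tiny labelled set for scoring candidate prompts (hypothetical examples).
EVAL_SET = [
    ("CCO", "ethanol"),
    ("CC(=O)O", "acetic acid"),
]

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content.strip()

def propose_candidates(n: int = 5) -> list[str]:
    # Step 1: let the model itself draft candidate instructions.
    meta = (f"Write {n} different one-sentence instructions that tell an assistant "
            "to convert a SMILES string into its common chemical name. "
            "Return one instruction per line, with no numbering.")
    return [line.strip() for line in ask(meta).splitlines() if line.strip()]

def score(instruction: str) -> float:
    # Step 2: exact-match accuracy of the instruction on the labelled set.
    hits = sum(target.lower() in ask(f"{instruction}\nSMILES: {smiles}").lower()
               for smiles, target in EVAL_SET)
    return hits / len(EVAL_SET)

# Step 3: keep the highest-scoring candidate as the "engineered" prompt.
best = max(propose_candidates(), key=score)
print("Selected prompt:", best)
```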
Synergizing Reasoning + Acting (ReAct) Prompting
General prompt engineering techniques struggle with external interactions, while those designed for such tasks often lack strong reasoning. This is especially problematic in domains like chemistry, where models need both robust reasoning and external interactions. For instance, models might need to modify compound ratios based on experimental results or utilize search engines to acquire additional chemical information. ReAct prompting, proposed by Yao et al., enables both systematic reasoning and proactive engagement with external resources, overcoming the above limitations.47 Different from CoT prompting, ReAct examples feature not only detailed reasoning steps but also specific actions, such as “searches” or “look-ups”, that the model performs during the reasoning process. By integrating external tools, ReAct prompting helps optimize chemical research workflows and supports the automation of material design and complex chemical tasks. It demonstrates powerful capabilities in enhancing LLMs’ data generation, tool selection, and feedback loops, driving the intelligent advancement of chemical research.
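In code, the ReAct pattern reduces to a loop in which the model emits Thought/Action lines, the program executes the named action against an external tool, and the resulting Observation is appended before the next call. The sketch below assumes the openai Python package; the single lookup_density tool and its toy data are hypothetical stand-ins for real retrieval, prediction, or instrument interfaces.

```python
# Minimal ReAct-style loop: the model interleaves reasoning ("Thought") with
# tool calls ("Action"), and observations are fed back until a final answer.
# (Assumes the `openai` package; the lookup tool and its data are hypothetical.)
import re
from openai import OpenAI

client = OpenAI()

DENSITIES = {"water": "0.997 g/mL", "ethanol": "0.789 g/mL"}  # toy "database"

def lookup_density(name: str) -> str:
    return DENSITIES.get(name.lower(), "not found")

SYSTEM = ("Answer the question. Use the format:\n"
          "Thought: <reasoning>\nAction: lookup_density[<substance>]\n"
          "or, when done:\nFinal Answer: <answer>")

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": transcript}],
        )
        step = resp.choices[0].message.content
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        action = re.search(r"Action:\s*lookup_density\[(.+?)\]", step)
        if action:  # execute the requested tool and append the observation
            transcript += f"Observation: {lookup_density(action.group(1))}\n"
    return transcript

# print(react("What mass does 25.0 mL of ethanol have?"))
```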
Recently, Kang et al. reported ChatMOF, which utilizes the principles of ReAct prompting to establish a highly effective framework for MOF research.2 Upon receiving a query about MOFs, the LLM systematically plans and selects appropriate tools for data retrieval, property prediction, and MOF generation. The generated MOF is then evaluated; if the result is unsatisfactory, the evaluator provides feedback to the agent, prompting the process to repeat until the desired outcome is achieved (Figure 4a). For example, in response to the query “Can you generate structures with the largest surface area?”, the initial structures exhibit a wide distribution of surface areas, which gradually converge toward higher values through iterative optimization by ChatMOF (Figure 4b). The result is a MOF structure with a predicted surface area of 6411.28 m2/g. After geometric optimization, the calculated surface area increases to 7647.62 m2/g, ranking it third highest in the CoREMOF database. This demonstrates ChatMOF’s ability to generate and refine high-performance MOFs through systematic optimization and validation.
Figure 4.
Diagram and application of ReAct and RAG. (a) The schematic image of ChatMOF, which leverages the principles of ReAct prompting. (b) The distribution of maximum surface area for initial and generated structures, and the MOF structure with the largest surface area generated by ChatMOF with optimized data. (a), (b) reproduced with permission from ref (2). Available under a CC-BY license. Copyright 2024, Springer Nature. (c) Application of LLMs with RAG for fast-charging batteries research. Reproduced with permission from ref (13). Copyright 2024, Elsevier.
As an advanced prompt engineering strategy, ReAct prompting excels in complex tasks but has limitations, including high resource usage, reliance on external tools, and coordination challenges. Whether to use it depends on task requirements and resource constraints: for simple tasks, simpler strategies and capable base models may suffice, whereas for complex tasks ReAct’s strengths are significant.
Retrieval Augmented Generation (RAG)
To handle knowledge-intensive tasks, Lewis et al. introduced a method called RAG, which can access external knowledge and help avoid hallucinations.48 Specifically, RAG maps the user’s input and an external knowledge database into the same vector space. By using similarity-based retrieval, it identifies the most relevant entries from the database, which are then incorporated into the prompt to augment the model’s knowledge. Different from ReAct, which relies on internal reasoning, RAG ensures reliability by grounding its responses in verified external knowledge. If ReAct resembles a detective solving a mystery through logical deduction, RAG functions as a librarian finding the right book to answer the question. For example, to explain an enzyme’s catalytic mechanism, ReAct would analyze the active site, hypothesize substrate binding, and deduce transition states. In contrast, RAG would retrieve relevant research, extract key information, and synthesize it into a clear explanation. By leveraging authoritative external knowledge, RAG minimizes conjecture and inaccuracies while facilitating the swift assimilation of new information, thereby ensuring the precision and contemporaneity of the responses.
Recently, Zhao et al. applied RAG in the BatteryGPT, a system incorporating LLMs into battery research.13 The core workflow of RAG consists of two key steps: knowledge retrieval and answer generation (Figure 4c). When a user submits a query, RAG first searches battery-related literature, data, or other information from a predefined knowledge base, transforming this information into vector representations for further processing. Next, RAG combines the retrieved knowledge with the user’s query to create a comprehensive prompt, which is then fed into LLMs. By leveraging the enriched contextual information, the LLMs produce a tailored response to the user’s query.
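The two steps above can be sketched in a few lines of Python: documents and the query are embedded into the same vector space, the most similar passages are retrieved by cosine similarity, and the retrieved context is prepended to the prompt. The toy corpus, the embedding model name, and the prompt wording are assumptions for illustration; BatteryGPT’s actual pipeline is considerably more elaborate.

```python
# Minimal RAG sketch: embed a small corpus, retrieve by cosine similarity,
# and prepend the retrieved passages to the prompt. (Assumes the `openai` and
# `numpy` packages; the corpus and model names are illustrative.)
import numpy as np
from openai import OpenAI

client = OpenAI()

CORPUS = [
    "LiFePO4 cathodes offer long cycle life but moderate energy density.",
    "Niobium tungsten oxides enable fast lithium-ion transport for rapid charging.",
    "Silicon anodes expand ~300% on lithiation, causing mechanical degradation.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(CORPUS)  # knowledge-retrieval step: pre-embed the knowledge base

def answer(query: str, k: int = 2) -> str:
    q = embed([query])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = "\n".join(CORPUS[i] for i in np.argsort(sims)[::-1][:k])
    # Answer-generation step: combine retrieved knowledge with the user query.
    prompt = (f"Answer using only the context below; cite it where relevant.\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

# print(answer("Which electrode materials support fast charging?"))
```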
The RAG framework offers several advantages. It enables rapid retrieval and delivery of high-quality answers by drawing upon cutting-edge literature and knowledge bases, while also ensuring that the knowledge base remains dynamically updated. Additionally, it can generate hierarchical responses. For instance, when users pose a broad question related to battery anode technologies, the system provides multilayered answers ranging from general insights to more detailed technical aspects, including specific materials, methods, and references. These references are cited to ensure traceability and facilitate further exploration of the research findings. Moreover, the RAG system is more than a passive literature retrieval tool. Through integrating and analyzing information, the system can generate precise, contextually relevant answers in a short time frame, significantly enhancing research efficiency.
Meta-prompting
Recently, researchers at Stanford University and OpenAI introduced a technique called meta-prompting, which enhances the handling of complex tasks by coordinating multiple independent experts (such as LLMs or programmers).49 The core idea is to use a central control model to decompose a complex problem into several subtasks, assign these subtasks to specialized experts, and interact with each expert step by step to generate a more accurate final solution (Figure 5a). This approach leverages the expertise and diversity of multiple independent experts, thereby improving the model’s autonomy and flexibility when addressing complex challenges. Although the central model can integrate insights from multiple experts and share portions of the text with them, the experts cannot engage in direct communication. This design is intended to streamline the interaction process and ensure that the central control model remains the operational core.
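A stripped-down sketch of this controller pattern is shown below: a central model decomposes the task into expert/subtask pairs, each subtask is dispatched to an independent expert call with its own persona, and the controller synthesizes the returned reports. The JSON decomposition format, the personas, and the omission of error handling are simplifying assumptions for illustration.

```python
# Minimal meta-prompting sketch: a controller model decomposes the task,
# dispatches each subtask to an independent "expert" call, then synthesizes.
# (Assumes the `openai` package and that the controller returns bare JSON;
# personas and formats are illustrative, and error handling is omitted.)
import json
from openai import OpenAI

client = OpenAI()

def ask(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def solve(task: str) -> str:
    # 1. The controller decomposes the task into expert/subtask pairs.
    plan = ask('You are a planning controller. Return a JSON list of objects '
               'with keys "expert" and "subtask". Return JSON only.', task)
    subtasks = json.loads(plan)
    # 2. Each subtask goes to an independent expert call; experts never talk
    #    to each other, and only the controller sees all of their answers.
    reports = []
    for item in subtasks:
        persona = "You are a " + item["expert"] + ". Answer concisely."
        reports.append(item["expert"] + ": " + ask(persona, item["subtask"]))
    # 3. The controller synthesizes the expert reports into a final answer.
    return ask("Synthesize these expert reports into one coherent answer.",
               "Task: " + task + "\n\n" + "\n\n".join(reports))

# print(solve("Propose a safe laboratory route to 4-nitroaniline from aniline."))
```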
Figure 5.
Diagram and application of meta-prompting. (a) The schematic diagram of meta-prompting; (b) Coscientist’s capabilities in chemical synthesis planning tasks. Comparison of various LLMs on compound synthesis. (c) The workflow diagram of Coscientist; (d) Two examples of generated syntheses of nitroaniline. (b)–(d) reproduced with permission from ref (1). Available under a CC-BY license. Copyright 2023, Nature.
Gomes and co-workers demonstrated the development and capabilities of a multi-LLM intelligent agent called Coscientist, which utilizes the principles of meta-prompting to design and conduct complex scientific experiments.1 By equipping GPT-4 with web and documentation search, Coscientist provides detailed and accurate chemical syntheses of various substances (Figure 5b). For example, in the synthesis of nitroaniline, the planner model breaks down the task into multiple subtasks and consults an “advanced chemical expert” for assistance (Figure 5c). After assigning these subtasks to other experts (such as a web searcher), Coscientist demonstrated exceptional performance in avoiding the proposal of direct nitration, which is not experimentally applicable, while suggesting optimized reaction pathways (Figure 5d). Furthermore, in the synthesis of acetaminophen, Coscientist provided comprehensive guidance encompassing raw material selection, reaction condition optimization, and execution of experimental protocols. These instances underscore Coscientist’s efficiency and reliability in chemical synthesis planning, significantly enhancing both the success rate and overall efficiency of experimental endeavors.
Although the meta-prompting strategy has proven highly effective for chemical research, the constraints of a closed-domain system mean that, when handling underperforming tasks, the central model may frequently resort to apologetic language and occasionally fail to relay essential information to the experts, leading to errors. Further research and optimization are needed to determine how to apply the meta-prompting strategy effectively in the chemical field.
Summary and Outlook
In summary, prompt engineering can significantly improve the accuracy and reasoning capabilities of LLMs, thereby accelerating chemistry-related research in various fields such as MOFs, organic synthesis, batteries, and autonomous experiments. We summarize the basic prompt engineering methods along with their features and applications in Table 1. Once chemists are familiar with these, more advanced and cutting-edge approaches such as graph prompting and directional stimulus prompting can be applied to more specialized and complex chemical tasks, accelerating the pace of scientific discovery.
Table 1. Summary of Several Prompt Engineering Methods in LLMs.
Prompt Engineering | Principle | Features | Applications |
---|---|---|---|
Zero-shot | Directly provides task description without examples. | Simple to use, no additional data needed. | Simple classification, generation tasks (e.g., text mining of MOF synthesis27) |
Few-shot | Provides a few examples to guide the model. | Improves model understanding of the task. | Moderately complex tasks (e.g., property prediction through SMILES21) |
CoT | Guides the model to reason step-by-step. | Suitable for complex reasoning tasks. | Math problems, logical reasoning (e.g., calculating chemical equilibrium constants45) |
APE | Automatically generates and optimizes prompts using the model’s own capabilities. | Reduces manual effort; may produce more effective prompts than human-designed ones. | Tasks requiring efficient prompt design. |
ReAct | Solves tasks through dynamic reasoning and external actions. | Suitable for multistep reasoning and external interaction tasks; improves transparency. | Complex question answering, tasks requiring external knowledge (e.g., prediction and generation of MOFs50) |
RAG | Combines retrieval from external knowledge bases with generation to produce accurate answers. | Improves accuracy and reliability; handles tasks requiring external knowledge. | Open-domain question answering, fact-based tasks (e.g., transforming words to watts in battery research13) |
Meta-prompting | Uses a meta-prompt to guide the model in generating specific subprompts or task decompositions. | Enhances model’s ability to understand and execute complex tasks; highly flexible. | Complex task decomposition, multistep reasoning tasks (e.g., autonomous chemical research1) |
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 62276186), Beijing National Laboratory for Molecular Sciences (BNLMS202404), and the Haihe Lab of Information Technology Application Innovation (Grant 22HHXCJC00002). The authors also appreciate the funding support from the Haihe Laboratory of Sustainable Chemical Transformations and the National Industry-Education Integration Platform of Energy Storage.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscentsci.4c01935.
Transparent Peer Review report available (PDF)
Author Contributions
F.L. and J.Z. contributed equally to this work.
The authors declare no competing financial interest.
References
- Boiko D. A.; MacKnight R.; Kline B.; Gomes G. Autonomous chemical research with large language models. Nature 2023, 624 (7992), 570–578. 10.1038/s41586-023-06792-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kang Y.; Kim J. ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models. Nat. Commun. 2024, 15 (1), 4705. 10.1038/s41467-024-48998-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang H.; Fu T.; Du Y.; Gao W.; Huang K.; Liu Z.; Chandak P.; Liu S.; Van Katwyk P.; Deac A.; et al. Scientific discovery in the age of artificial intelligence. Nature 2023, 620 (7972), 47–60. 10.1038/s41586-023-06221-2. [DOI] [PubMed] [Google Scholar]
- Zheng Z.; Rampal N.; Inizan T. J.; Borgs C.; Chayes J. T.; Yaghi O. M. Large language models for reticular chemistry. Nat. Rev. Mater. 2025, 10.1038/s41578-025-00772-8. [DOI] [Google Scholar]
- Radford A.; Wu J.; Child R.; Luan D.; Amodei D.; Sutskever I. Language models are unsupervised multitask learners. OpenAI blog 2019, 1 (8), 9. [Google Scholar]
- Holtzman A.; Buys J.; Du L.; Forbes M.; Choi Y., The curious case of neural text degeneration. arXiv 2019. 10.48550/arXiv.1904.09751. [DOI]
- Chen M.; Tworek J.; Jun H.; Yuan Q.; Pinto H. P. D. O.; Kaplan J.; Edwards H.; Burda Y.; Joseph N.; Brockman G., Evaluating large language models trained on code. arXiv 2021. 10.48550/arXiv.2107.03374. [DOI]
- Austin J.; Odena A.; Nye M.; Bosma M.; Michalewski H.; Dohan D.; Jiang E.; Cai C.; Terry M.; Le Q., Program synthesis with large language models. arXiv 2021. 10.48550/arXiv.2108.07732. [DOI]
- AlZu’bi S.; Kanan T.; Almiani M. In Large Language Models for Knowledge Discovery in Healthcare, 2024 International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS), IEEE: 2024; pp 183–190. [Google Scholar]
- Haltaufderheide J.; Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ. digital medicine 2024, 7 (1), 183. 10.1038/s41746-024-01157-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bedi S.; Liu Y.; Orr-Ewing L.; Dash D.; Koyejo S.; Callahan A.; Fries J. A.; Wornow M.; Swaminathan A.; Lehmann L. S.; et al. Testing and evaluation of health care applications of large language models: a systematic review. JAMA 2025, 333, 319. 10.1001/jama.2024.21700. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu X.; Fan K.; Huang X.; Ge J.; Liu Y.; Kang H. Recent advances in artificial intelligence boosting materials design for electrochemical energy storage. Chem. Eng. J. 2024, 490, 151625 10.1016/j.cej.2024.151625. [DOI] [Google Scholar]
- Zhao S.; Chen S.; Zhou J.; Li C.; Tang T.; Harris S. J.; Liu Y.; Wan J.; Li X. Potential to transform words to watts with large language models in battery research. Cell. Rep. Phys. Sci. 2024, 5 (3), 101844 10.1016/j.xcrp.2024.101844. [DOI] [Google Scholar]
- Jablonka K. M.; Schwaller P.; Ortega-Guerrero A.; Smit B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 2024, 6 (2), 161–169. 10.1038/s42256-023-00788-1. [DOI] [Google Scholar]
- Zhang D.; Liu W.; Tan Q.; Chen J.; Yan H.; Yan Y.; Li J.; Huang W.; Yue X.; Zhou D., Chemllm: A chemical large language model. arXiv 2024. 10.48550/arXiv.2402.06852. [DOI]
- Wu Z.; Zhang O.; Wang X.; Fu L.; Zhao H.; Wang J.; Du H.; Jiang D.; Deng Y.; Cao D.; et al. Leveraging language model for advanced multiproperty molecular optimization via prompt engineering. Nat. Mach. Intell. 2024, 6, 1359–1369. 10.1038/s42256-024-00916-5. [DOI] [Google Scholar]
- Vangala S. R.; Krishnan S. R.; Bung N.; Nandagopal D.; Ramasamy G.; Kumar S.; Sankaran S.; Srinivasan R.; Roy A. Suitability of large language models for extraction of high-quality chemical reaction dataset from patent literature. J. Cheminform. 2024, 16 (1), 131. 10.1186/s13321-024-00928-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Taffa T. A.; Usbeck R. In Leveraging LLMs in Scholarly Knowledge Graph Question Answering; QALD/SemREC@ ISWC, 2023. [Google Scholar]
- Ren R.; Wang Y.; Qu Y.; Zhao W. X.; Liu J.; Tian H.; Wu H.; Wen J.-R.; Wang H., Investigating the factual knowledge boundary of large language models with retrieval augmentation. arXiv 2023. 10.48550/arXiv.2307.11019. [DOI]
- Yu S.; Ran N.; Liu J. Large-language models: The game-changers for materials science research. Artif. Intell. Chem. 2024, 2 (2), 100076 10.1016/j.aichem.2024.100076. [DOI] [Google Scholar]
- Guo T.; Nan B.; Liang Z.; Guo Z.; Chawla N.; Wiest O.; Zhang X. What can large language models do in chemistry? a comprehensive benchmark on eight tasks. Advances in Neural Information Processing Systems 2023, 36, 59662–59688. 10.5555/3666122.3668729. [DOI] [Google Scholar]
- Ramos M. C.; Collison C.; White A. D. A review of large language models and autonomous agents in chemistry. Chem. Sci. 2025, 16, 2514. 10.1039/D4SC03921A. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liao C.; Yu Y.; Mei Y.; Wei Y., From Words to Molecules: A Survey of Large Language Models in Chemistry. arXiv 2024. 10.48550/arXiv.2402.01439. [DOI]
- Brown T. B., Language models are few-shot learners. arXiv 2020. 10.48550/arXiv.2005.14165. [DOI]
- Ruder S.; Peters M. E.; Swayamdipta S.; Wolf T. In Transfer learning in natural language processing, Proceedings of the 2019 conference of the North American Chapter of the association for computational linguistics: Tutorials, 2019; pp 15–18. [Google Scholar]
- Jaech A.; Kalai A.; Lerer A.; Richardson A.; El-Kishky A.; Low A.; Helyar A.; Madry A.; Beutel A.; Carney A., OpenAI o1 System Card. arXiv 2024. 10.48550/arXiv.2412.16720. [DOI]
- Zheng Z.; Zhang O.; Borgs C.; Chayes J. T.; Yaghi O. M. ChatGPT chemistry assistant for text mining and the prediction of MOF synthesis. J. Am. Chem. Soc. 2023, 145 (32), 18048–18062. 10.1021/jacs.3c05819. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen B.; Zhang Z.; Langrené N.; Zhu S., Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review. arXiv 2023. 10.48550/arXiv.2310.14735. [DOI]
- Vatsal S.; Dubey H., A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks. arXiv 2024. 10.48550/arXiv.2407.12994. [DOI]
- Zheng Z.; He Z.; Khattab O.; Rampal N.; Zaharia M. A.; Borgs C.; Chayes J. T.; Yaghi O. M. Image and data mining in reticular chemistry powered by GPT-4V. Digit. Discovery 2024, 3 (3), 491–501. 10.1039/D3DD00239J. [DOI] [Google Scholar]
- Kim S.; Jung Y.; Schrier J. Large Language Models for Inorganic Synthesis Predictions. J. Am. Chem. Soc. 2024, 146 (29), 19654–19659. 10.1021/jacs.4c05840. [DOI] [PubMed] [Google Scholar]
- Zheng Z.; Rong Z.; Rampal N.; Borgs C.; Chayes J. T.; Yaghi O. M. A GPT-4 Reticular Chemist for Guiding MOF Discovery. Angew. Chem., Int. Ed. 2023, 62 (46), e202311983 10.1002/anie.202311983. [DOI] [PubMed] [Google Scholar]
- Zheng Z.; Zhang O.; Nguyen H. L.; Rampal N.; Alawadhi A. H.; Rong Z.; Head-Gordon T.; Borgs C.; Chayes J. T.; Yaghi O. M. Chatgpt research group for optimizing the crystallinity of MOFs and COFs. ACS Cent. Sci. 2023, 9 (11), 2161–2170. 10.1021/acscentsci.3c01087. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu H.; Ma S.; Wu J.; Wang Y.; Wang X. Recent advances in screening lithium solid-state electrolytes through machine learning. Front. Energy Res. 2021, 9, 639741 10.3389/fenrg.2021.639741. [DOI] [Google Scholar]
- Hargreaves C. J.; Gaultois M. W.; Daniels L. M.; Watts E. J.; Kurlin V. A.; Moran M.; Dang Y.; Morris R.; Morscher A.; Thompson K.; et al. A database of experimentally measured lithium solid electrolyte conductivities evaluated with machine learning. npj Comput. Mater. 2023, 9 (1), 9. 10.1038/s41524-022-00951-z. [DOI] [Google Scholar]
- Liu X.; Zou B.-B.; Wang Y.-N.; Chen X.; Huang J.-Q.; Zhang X.-Q.; Zhang Q.; Peng H.-J. Interpretable Learning of Accelerated Aging in Lithium Metal Batteries. J. Am. Chem. Soc. 2024, 146 (48), 33012–33021. 10.1021/jacs.4c09363. [DOI] [PubMed] [Google Scholar]
- Kang Y.; Kim J. ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models. Nat. Commun. 2024, 15 (1), 4705. 10.1038/s41467-024-48998-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Su Y.; Wang X.; Ye Y.; Xie Y.; Xu Y.; Jiang Y.; Wang C. Automation and machine learning augmented by large language models in a catalysis study. Chem. Sci. 2024, 15 (31), 12200–12233. 10.1039/D3SC07012C. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaplan J.; McCandlish S.; Henighan T.; Brown T. B.; Chess B.; Child R.; Gray S.; Radford A.; Wu J.; Amodei D., Scaling laws for neural language models. arXiv 2020. 10.48550/arXiv.2001.08361. [DOI]
- Zhao W. X.; Zhou K.; Li J.; Tang T.; Wang X.; Hou Y.; Min Y.; Zhang B.; Zhang J.; Dong Z., A survey of large language models. arXiv 2023. 10.48550/arXiv.2303.18223. [DOI]
- Liu H.; Yin H.; Luo Z.; Wang X. Integrating chemistry knowledge in large language models via prompt engineering. Synth. Syst. Biotechnol. 2025, 10 (1), 23–38. 10.1016/j.synbio.2024.07.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Achiam J.; Adler S.; Agarwal S.; Ahmad L.; Akkaya I.; Aleman F. L.; Almeida D.; Altenschmidt J.; Altman S.; Anadkat S., GPT-4 technical report. arXiv 2023. DOI: 10.48550/arXiv.2303.08774. [DOI] [Google Scholar]
- Kojima T.; Gu S. S.; Reid M.; Matsuo Y.; Iwasawa Y. Large language models are zero-shot reasoners. Advances in neural information processing systems 2022, 35, 22199–22213. 10.5555/3600270.3601883. [DOI] [Google Scholar]
- Wei J.; Wang X.; Schuurmans D.; Bosma M.; Xia F.; Chi E.; Le Q. V.; Zhou D. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 2022, 35, 24824–24837. 10.5555/3600270.3602070. [DOI] [Google Scholar]
- Ouyang S.; Zhang Z.; Yan B.; Liu X.; Choi Y.; Han J.; Qin L., Structured Chemistry Reasoning with Large Language Models. PMLR: Proceedings of Machine Learning Research 2024, 235, 38937–38952. [Google Scholar]
- Zhou Y.; Muresanu A. I.; Han Z.; Paster K.; Pitis S.; Chan H.; Ba J., Large language models are human-level prompt engineers. arXiv 2022. 10.48550/arXiv.2211.01910. [DOI]
- Yao S.; Zhao J.; Yu D.; Du N.; Shafran I.; Narasimhan K.; Cao Y., React: Synergizing reasoning and acting in language models. arXiv 2022. 10.48550/arXiv.2210.03629. [DOI]
- Lewis P.; Perez E.; Piktus A.; Petroni F.; Karpukhin V.; Goyal N.; Küttler H.; Lewis M.; Yih W.-T.; Rocktäschel T. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 2020, 33, 9459–9474. 10.5555/3495724.3496517. [DOI] [Google Scholar]
- Suzgun M.; Kalai A. T., Meta-prompting: Enhancing language models with task-agnostic scaffolding. arXiv 2024. 10.48550/arXiv.2401.12954. [DOI]
- M Bran A.; Cox S.; Schilter O.; Baldassari C.; White A. D.; Schwaller P. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 2024, 6, 525–535. 10.1038/s42256-024-00832-8. [DOI] [PMC free article] [PubMed] [Google Scholar]