Abstract
Electrochemical C−H oxidation reactions offer a sustainable route to functionalize hydrocarbons, yet identifying suitable substrates and optimizing synthesis conditions remain challenging. Here, we report an integrated approach combining machine learning (ML) and large language models (LLMs) to streamline the exploration of electrochemical C−H oxidation reactions. Utilizing a batch rapid screening electrochemical platform, we evaluated a wide range of reactions, initially classifying substrates by their reactivity, while LLMs text‐mined literature data to augment the training set. The resulting ML models for reactivity prediction achieved high accuracy (>90 %) and enabled virtual screening of a large set of commercially available molecules. To optimize reaction conditions for selected substrates, LLMs were prompted to generate code that iteratively improved yields. This human‐AI collaboration proved effective, efficiently identifying high‐yield conditions for 8 drug‐like substances or intermediates. Notably, we benchmarked the accuracy and reliability of 12 different LLMs, including the LLaMA series, Claude series, OpenAI o1, and GPT‐4, on ML‐related code generation and function calling from natural language prompts written by chemists, showcasing their potential to accelerate research across four diverse tasks. In addition, we collected an experimental benchmark dataset comprising 1071 reaction conditions and yields for electrochemical C−H oxidation reactions.
Keywords: chemical reactions, machine learning, large language models, hydrocarbons, oxidation
In this study, we integrate machine learning (ML) and large language models (LLMs) to accelerate the exploration of electrochemical C−H oxidation reactions. A rapid screening platform is developed for experimental screening, while LLMs assist in literature mining and generate Python code to train ML models for reactivity prediction and synthesis optimization. This human‐AI collaboration enables synthetic chemists to streamline discovery processes and optimize reaction conditions efficiently.

Introduction
Electrochemical C−H oxidations are tunable and cost‐effective transformations for streamlining the conversion of hydrocarbons to decorated oxidized molecules.[ 1 , 2 , 3 , 4 ] As synthetic chemists actively expand the scope of this field and discover new chemical reactions, the selection of reactive substrates and the subsequent optimization of synthesis parameters, while often guided by fundamental chemical principles and hypotheses, still require extensive empirical condition screening and remain resource‐consuming. Consequently, smart workflows that bypass the traditional trial‐and‐error approach are essential to meet the increasing demand from chemists to navigate reactivity space and expedite new reaction discovery. [5]
Recent advancements in machine learning (ML) have shown promising potential in reactivity prediction and optimization of organic reactions.[ 6 , 7 , 8 , 9 , 10 ] Simultaneously, large language models (LLMs) have gained attention in chemical research, helping researchers streamline and enhance their digital workflows through intuitive natural language prompts.[ 11 , 12 , 13 , 14 , 15 , 16 ] In essence, ML provides a toolbox for specific tasks in chemical research and LLMs serve as meta‐tools that enhance the accessibility of these computational tools, bridging the gap between digital proficiency and chemical expertise. Consequently, the integration between the mathematical rigor of ML and the language understanding and domain‐specific knowledge of LLMs can make data‐driven methodologies accessible to a broader community of chemists. This motivates us to explore the following underexplored questions in this evolving field: What are the capabilities of LLMs in synthetic electrochemistry? How can LLMs be integrated with ML to reliably expedite reaction discovery?
Herein, we demonstrate the synergistic potential of ML and LLMs to advance the exploration and optimization of electrochemical C−H oxidation reactions. For the ML aspect, our approach aims to address two fundamental questions: (1) Which substrates are suitable for electrochemical oxidation? and (2) What synthesis conditions give optimal results? By leveraging both literature data and rapid screening experimental results, we employed LLMs to implement code that trains models to predict reactivity and selectivity for C(sp3)−H oxygenation, enabling in silico screening of chemical entities for initial hits. Subsequently, for these selected substrates of interest, an active learning protocol designed for reaction yield optimization was applied to iteratively and rapidly navigate the search space and identify optimal synthesis conditions. Throughout this study, we also illustrate the versatility of LLM agents: to be a tool (e.g., extracting knowledge from literature), to create tools (e.g., generating custom Python code), and to use tools (e.g., employing function calling to execute ML predictions and liquid handler tasks). As such, this multifaceted utility, enhancing both workflow efficiency and intelligence, serves as a practical example of how LLMs can assist synthetic chemists and streamline chemical research processes to accelerate scientific discovery.
Results and Discussion
Rapid Screening Electrochemical Platform
At the onset of this study, the goal was to assess whether ML could guide the selection of compounds suitable for electrochemical C(sp3)−H oxidation. To accomplish this, we required a diverse training set composed of both substrates amenable to electrochemical oxidation and those that are not, since modeling requires both positive and negative examples for accurate prediction. Most reactions available in the literature predominantly include successful examples of reactive and high‐yield substrates, leaving a gap in data on unsuccessful conditions, which are equally critical for training robust predictive models.
To address this issue, inspired by previous works on electrochemical synthesis platforms,[17, 18, 19] we developed a rapid screening electrochemical platform capable of conducting multiple reactions simultaneously, thus enabling both reactivity screening and optimization of synthesis conditions. This platform features a standardized 24‐well plate electrosynthesis reactor (Figure 1A), which includes water‐jet‐cut anode and cathode connectors, an alignment plate, and a vial locator. Components for the reactor are readily available through commercial vendors and can be easily assembled at low cost in the lab (see Supporting Information Section S2 for detailed design and assembly information). Furthermore, the choice of electrodes can be adjusted to meet specific experimental needs, enhancing the flexibility of our setup. The setup uses 4 mL electrochemical reactors with two pairs of counter electrodes, totaling four electrodes (Figure 1A), which increases surface area and improves current distribution, thereby enhancing mass transport. Such improvements facilitate the diffusion of reactants to the electrode surfaces and the efficient removal of products. Additionally, the dual‐electrode setup ensures that the reaction can continue even if one pair fails or loses connection, improving the reliability of the experimental setup.
Figure 1.
(A) Design and assembly of the 24‐well electrochemical platform and the schematic overview of the electrochemical C(sp3)−H oxidation process using the electrocatalyst. (B) Semantic literature analysis for reaction data mining using a language model with natural language prompts. Performance is evaluated by comparing the ground truth with LLM‐assigned labels and examining the impact of prompt quality. (C) Overview of training data preparation and machine learning models used for predicting electrochemical C−H oxidation reactivity (Task 1) and selectivity (Task 2). Models with different architectures are evaluated for accuracy and AUC.
Initially, our platform was employed for reactivity screening to acquire labeled data points essential for training our ML models about substrate suitability for C(sp3)−H oxidation. To ensure a diverse chemical space in our training dataset, using a similar approach reported in previous literature, [5] we randomly selected 335 chemicals available in our laboratory and subjected them to predetermined electrochemical conditions[ 20 , 21 ] to enable rapid generation of data points, allowing us to classify each substrate based on its reactivity (Section S2.2 of the Supporting Information). The chemical space explored in this study incorporates diverse molecular structures (Table S2 and Figure S21), including hydrocarbons, heteroatom‐containing scaffolds, and functionalized drug‐like molecules, to provide a comprehensive foundation for reactivity modeling. In particular, the reactions targeted the transformation of substrates into ketone or alcohol products, using mediator‐catalyzed C(sp3)−H bond oxidation,[ 20 , 21 ] with the outcomes verified through NMR spectroscopy by monitoring the appearance of signature peaks indicative of these products.
As the first step toward constructing a balanced and informative dataset, we assigned negative labels to reactions where the transformation either did not occur or resulted in unknown products beyond the scope of our study. This approach allowed us to simplify the modeling challenge to a binary classification task, focusing solely on predicting substrate reactivity under a fixed set of reaction conditions. This strategy avoided the complexities associated with predicting novel products and was sufficient to enable the machine learning model to focus on learning patterns relevant to the specific type of electrochemical transformation in this study (Figure 1A). We acknowledge that while the predetermined conditions generally work for a number of known reactants within the domain of electrochemical C−H oxidation, they could yield low outcomes during screening. However, we designated yield optimization as a separate task later in this study to streamline the initial reactivity screening process and avoid the need to tune numerous parameters of electrochemical reactions during this phase. The deployment of this standardized, low‐cost rapid screening electrochemical reactor allowed the rapid acquisition of a dataset of 335 experimental electrochemical oxidation outcomes (Table S2), which includes both positive and negative labels.
Literature Data Mining
In parallel to collecting experimental data, we retrieved reaction data and outcomes from the scientific literature to augment the C−H oxidation reaction dataset with a diverse choice of candidates. Traditional approaches to querying reaction databases often struggle to capture the nuanced criteria specific to our study, such as focusing on electrochemical C−H oxidation reactions on aliphatic carbons using a mediator. For such reaction‐specific requirements, manually analyzing and curating a large corpus of papers to extract relevant examples would typically be very time‐consuming for a human.
To address this challenge, we employed semantic analysis using LLM agents guided by human instructions. This approach allows for precise extraction of relevant data by understanding and interpreting the context within scientific manuscripts. Specifically, the LLMs were tasked with identifying papers that met three critical criteria for our electrochemical oxidation dataset: (1) the paper must be an experimental study on electrochemical synthesis (i.e., not a review paper or computational study), (2) it must involve C−H bond oxidation to alcohol or ketone products (i.e., not other types such as C−C coupling), and (3) the reaction must occur on an aliphatic carbon (i.e., not a C(sp2)−H bond) (Section S3.2 of the Supporting Information). In essence, the LLMs function as a customized filter, guided by human language instructions, that automates the process of reading through each paper, understanding not only the experimental sections but also the discussions, and selecting the qualifying papers. To evaluate the performance of this zero‐shot semantic analysis, we analyzed 140 relevant papers using pre‐designed prompts (Table S4). The LLMs took approximately 15 seconds per manuscript to assign a Yes/No label, resulting in a total analysis time of about 35 minutes for all selected literature (Figure S18). Validation against the ground‐truth labels of the 140 papers revealed that the LLMs achieved 96 % accuracy, correctly identifying 21 relevant papers while missing 2 hits (Figure 1B and Section S3.3 of the Supporting Information). The DOIs of the selected papers were then queried in the Reaxys chemical database, retrieving 497 reactions for 247 substrates (Table S4). We note that our primary focus was on streamlining and automating literature selection based on binary classification and full‐text analysis, not on extracting reactions in structured formats from the papers themselves, which can also be assisted by LLMs, as demonstrated in previous studies.[14, 16, 22, 23]
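To make the triage workflow concrete, the sketch below shows a minimal zero‐shot literature filter built around the three criteria above. It is illustrative only: the actual prompts used in this work are given in Table S4, and the model name, criteria wording, and truncation limit here are assumptions for demonstration.

```python
# Minimal sketch of zero-shot literature triage with an LLM (illustrative only;
# the prompts actually used in this study are listed in Table S4).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITERIA_PROMPT = """You are screening papers for an electrochemical C-H oxidation dataset.
Answer Yes only if ALL three criteria hold, citing the supporting sentence for each:
1. The paper is an experimental electrochemical synthesis study (not a review or computation).
2. It reports C-H bond oxidation to an alcohol or ketone (not, e.g., C-C coupling).
3. The oxidized carbon is aliphatic, i.e. C(sp3)-H.
Reply with 'Yes' or 'No' on the first line, then your justification."""

def triage_paper(full_text: str, model: str = "gpt-4") -> bool:
    """Return True if the LLM judges the paper relevant under all three criteria."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic labeling
        messages=[
            {"role": "system", "content": CRITERIA_PROMPT},
            {"role": "user", "content": full_text[:100_000]},  # truncate to fit context
        ],
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

# labels = {doi: triage_paper(text) for doi, text in corpus.items()}
```

Requesting the justification alongside the Yes/No label is what enables the reasoning audit described next.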
The power of semantic analysis using LLMs is further underscored by their ability to provide reasoning for their decisions (Figures S15 and S16).[11, 24, 25, 26] For each of the three criteria, the LLM justifies its answer based on specific paragraphs or sentences in the original literature, leveraging the reasoning capabilities of large language models. To ensure minimal hallucination and sound chemical knowledge, we devised specific prompts tailored to the goal of this study and manually analyzed the reasoning statements (Section S3.3 of the Supporting Information), confirming over 90 % correctness in decision‐making (Figure 1B) and in pointing back to the relevant sections of the original manuscripts. Simultaneously, we conducted an ablation study in which shorter, less detailed prompts were used. These prompts, lacking strict instructions to reference the original literature and with less specificity on each question, introduced more ambiguity and resulted in decreased LLM performance, highlighting the importance of detailed and specific prompts for guiding LLMs, particularly in semantic analysis.
Machine Learning Models Training
Upon completion of dataset collection from experimental outcomes and literature data mining, we amalgamated the data to create a balanced dataset suitable for model training (Figure 1C). The literature data, biased towards successful substrates, complemented the failure data points generated from our screening platform. This combination resulted in a dataset comprising 582 substrates, with 271 oxidizable (46.6 %) and 311 non‐oxidizable (53.4 %) towards electrochemical C−H oxidation reactions (Figure 1C). The dataset included 7,720 carbon atoms, 431 of which were oxidized during the transformations (a complete list is available in the Supporting Information). Our objective was to develop two classes of predictive models: (i) a reactivity prediction model to classify substrates as reactive or non‐reactive, and (ii) a selectivity prediction model to classify each carbon atom within a molecule as oxidized or unchanged. The former allows for rapid screening of chemical catalogs, while the latter helps chemists identify which sites are likely to undergo oxidation. Note that SMILES strings of the substrates were converted to Morgan fingerprints for the reactivity models, except for Chemprop, where SMILES were used directly (see the sketch below). After optimizing the respective hyperparameters, each model's performance was rigorously tested using accuracy and area under the curve (AUC) metrics. All models demonstrated high performance, with accuracies above 91.7 % and AUC values above 97.2 %. Additionally, we explored the inclusion of density functional theory (DFT) descriptors in the machine learning model (Section S4.1 of the Supporting Information), which provided richer quantum mechanical information and slightly enhanced model performance (Table S10). These results align well with our previous studies using ML to predict electrochemical reaction outcomes,[5] and the trained models were thus employed for catalog screening to make rapid reactivity predictions for a large number of commercially available compounds (Figure S21).
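The fingerprint‐based reactivity pipeline can be sketched in a few lines. This is a minimal illustration, not the tuned models of this study: the file name, classifier choice, and hyperparameters are placeholders, and the Chemprop variant (which consumes SMILES directly) is omitted.

```python
# Sketch of a Morgan-fingerprint reactivity classifier (illustrative defaults;
# file name and hyperparameters are placeholders, not the tuned values).
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

def featurize(smiles: str, radius: int = 2, n_bits: int = 2048) -> np.ndarray:
    """Convert a SMILES string to a Morgan fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp)

df = pd.read_csv("ch_oxidation_substrates.csv")  # hypothetical columns: smiles, oxidizable (0/1)
X = np.stack([featurize(s) for s in df["smiles"]])
y = df["oxidizable"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print("ROC-AUC :", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```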
Following in silico screening, we focused on substrates identified as reactive and sought to develop a selectivity model to provide atom‐level insights into oxidation. Unlike traditional machine learning models, Chemprop offers interpretability features that give an intuitive view of which molecular substructures drive the predictions (Section S4.1 of the Supporting Information), which motivated us to further explore its application for selectivity. It should be noted that, in contrast to the reactivity training set, which was balanced at the molecular level, the number of oxidizable sites at the atom level was far smaller than the number of non‐oxidizable sites. The model achieved high accuracy (Figure 1C), but this was accompanied by relatively lower recall (72.0 %). In contrast, the ROC‐AUC metric, which is more appropriate for imbalanced datasets, provided a robust evaluation, with the model achieving a ROC‐AUC value of 98.1 %, indicating strong discriminative capability (Section S4.2 of the Supporting Information). We note that the primary focus of our ML model training remains on predicting reactivity for C−H oxidation reactions, with selectivity serving as additional atom‐level insight. Collectively, the consistent performance of the ML models on both tasks underscores the robustness of our integrated balanced dataset and highlights the benefit of combining wet‐lab experimentation with literature data mining.
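The distinction between accuracy, recall, and ROC‐AUC on imbalanced atom‐level labels can be made concrete with a few lines of scikit‐learn; the toy arrays below are placeholders standing in for per‐atom predictions, not data from this study.

```python
# Why ROC-AUC is reported alongside accuracy for the imbalanced atom-level task.
# Toy arrays stand in for per-atom predictions and labels (illustrative only).
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # few oxidizable sites among many carbons
y_prob = [0.1, 0.2, 0.05, 0.3, 0.1, 0.2, 0.4, 0.1, 0.9, 0.45]
y_pred = [int(p >= 0.5) for p in y_prob]  # hard labels at a 0.5 threshold

print(accuracy_score(y_true, y_pred))  # 0.9: dominated by the majority class
print(recall_score(y_true, y_pred))    # 0.5: one oxidizable site missed at this threshold
print(roc_auc_score(y_true, y_prob))   # 1.0: the ranking itself is perfect
```

A model can thus rank every oxidizable site above every non‐oxidizable one (ROC‐AUC of 1.0) while a fixed threshold still depresses recall, which is why both metrics are reported.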
Benchmarking LLMs Auto Code Generation Performance
To further streamline the integration of machine learning in electrochemical reaction exploration, we explored the use of LLMs to automatically generate code for the practical implementation of the ML models described earlier. While human expertise in both chemistry and coding is valuable, it would be beneficial to the chemistry community to have methods that not only understand the chemical context but also automate the coding processes required to analyze and process data. LLMs such as LLaMA, [27] GPT, [28] and Claude [29] models offer a promising solution by generating executable code from natural language prompts, potentially lowering the barrier to machine learning tools and enhancing chemists’ productivity.[ 11 , 12 , 15 , 30 , 31 ] This raises an important question: Can LLMs serve as reliable code assistants for chemists?
Toward this end, as a first step, we developed a “prompt‐to‐code” framework and used it to evaluate the performance of different open‐source and proprietary LLMs in tool‐making and tool‐use (Figure 2A). The core objective was to assess the reliability and accuracy of code produced by LLMs across four distinct tasks in the context of this study: (1) ML model training using a dataset on C−H oxidation, (2) development of code for tuning synthesis conditions and optimizing reaction yields, (3) interpretation of documentation and application of an existing Python package for yield optimization, and (4) direct interaction with laboratory hardware[32] to prepare solutions based on generated synthesis parameters (Figure 2B). These tasks were designed to span a range of practical applications, from data handling to physical lab automation, reflecting the diverse ways LLMs can implement code for ML to support chemical research (Figure S22). Notably, previous evaluations of LLM code‐writing performance in chemistry tasks have often relied on qualitative assessments by human reviewers and single conversations, which could introduce bias and did not account for the inherent variability and occasional inaccuracies (hallucinations) in LLM outputs. To address these issues, we developed a rigorous, quantitative benchmarking process using four Python‐based code evaluators tailored to each task, checking not only the executability of the code but also its correctness in a simulated environment (Section S5 of the Supporting Information). To this end, 12 LLMs were chosen (the full list is shown in Tables S12–S14 of the Supporting Information), and each LLM was evaluated by independently generating code for the same task 100 times, a sample size that mitigates performance variability and reflects the long‐term reliability of each model.
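A minimal sketch of the repeat‐and‐score protocol is shown below. The `generate_code` helper (one LLM call returning a script) and the task‐specific `evaluator` are hypothetical stand‐ins for the evaluators of Section S5, and the in‐process `exec` is a simplification of a properly sandboxed runner.

```python
# Sketch of the quantitative benchmarking loop: each model generates code for the
# same prompt 100 times, and a task-specific evaluator scores executability and
# correctness. `generate_code` and `evaluator` are hypothetical helpers.
import traceback

def run_generated_code(code: str) -> tuple[bool, str | None, dict]:
    """Execute LLM-generated code in a fresh namespace; return (ok, error, namespace)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # in practice this would run in a sandboxed process
        return True, None, namespace
    except Exception:
        return False, traceback.format_exc(), namespace

def benchmark(model: str, prompt: str, evaluator, n_trials: int = 100) -> float:
    """Success rate of one model on one task: code must execute AND pass the evaluator."""
    successes = 0
    for _ in range(n_trials):
        code = generate_code(model, prompt)      # hypothetical: one independent LLM call
        ok, _, namespace = run_generated_code(code)
        if ok and evaluator(namespace):          # correctness check in simulation
            successes += 1
    return successes / n_trials
```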
Figure 2.
(A) Overview of the human‐LLM interaction for designing and creating research tools to process data for chemists. (B) Prompt‐to‐code tasks guiding the LLM to develop ML programs or executable code in the context of chemical research. Examples of code generated by LLM for task 1 to 4 are shown in Section S5 of the Supporting Information. (C) Comparison of various LLMs on code‐writing tasks, featuring 7 representative models from a total of 12. Each task was run with single‐shot (grey bar) or self‐reflection (green bar) approaches. Performance was evaluated by repeating the prompt 100 times and calculating the success rate. Details can be found in Tables S12–14 of the Supporting Information. (D) LLM interprets suggested experimental conditions via ML program and converts them into physical actions for a liquid handler on a robotic platform.
The results from this benchmarking demonstrated the potential of using LLMs as code assistants to implement ML models for chemists (Figure 2C). For task 1, which involved training ML models, the LLMs demonstrated a high degree of competency (Table S9), with code generation accuracy frequently surpassing 90 %. This indicates a strong understanding of common ML frameworks and the ability to apply them correctly to chemical datasets. In task 2, the LLMs faced the more complex challenge of optimizing chemical synthesis conditions (Table S10). Here, the more advanced LLMs (e.g., o1 and gpt‐4o) showed impressive adaptability, with success rates of 85 % and 75 % of the trials, respectively, highlighting their potential to handle complex, context‐dependent coding tasks through their enhanced reasoning ability. Task 3 tested the LLMs' ability to comprehend and apply unfamiliar Python packages for yield optimization. Here, the LLMs proved to be proficient learners, quickly adapting to new documentation and examples to produce ready‐to‐use code (Table S11). Finally, in task 4, LLMs were tasked with generating executable scripts for liquid handling robots based on the suggestions made in Task 2 or Task 3. This task demonstrated the practical applicability of LLM‐generated code in automating physical processes in the lab, consistent with previous literature findings,[13, 15, 33] with successful execution reflecting the LLMs' capacity to integrate digital and physical workflows effectively (Figure 2D). Interestingly, models such as o1 and o1‐mini spent more time and tokens on reasoning before responding and writing code, and this behavior led to better performance in both code executability and correctness (Tables S12–S14). These models also showed reduced hallucination rates by tailoring solutions more effectively to the provided prompts.
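For task 4, the bridge from natural language to physical action can be expressed through function calling, as sketched below. The tool schema, argument names, and the `robot.dispense` driver are hypothetical placeholders; only the OpenAI function‐calling interface itself is taken as given.

```python
# Sketch of task 4: exposing a liquid-handler action to an LLM via function calling.
# The tool name, its arguments, and the `robot.dispense` driver are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "dispense",
        "description": "Dispense a reagent stock solution into a well of the 24-well reactor.",
        "parameters": {
            "type": "object",
            "properties": {
                "well": {"type": "string", "description": "e.g. 'A1'"},
                "reagent": {"type": "string", "description": "e.g. 'NHPI stock'"},
                "volume_ul": {"type": "number"},
            },
            "required": ["well", "reagent", "volume_ul"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Prepare well A1: 100 mM substrate with NHPI mediator in 4 mL ACN."}],
    tools=tools,
)

for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    # robot.dispense(**args)  # forward the parsed arguments to the liquid-handler driver
    print(call.function.name, args)
```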
Furthermore, we introduced a “self‐reflection” mode in the benchmarking process and evaluated the code generated with or without this mode (Figures 2A and 2C). When an LLM's generated code failed initial execution, the error message was automatically sent back to the LLM, prompting it to modify its output in real time.[34, 35, 36, 37] It was observed that for all models, regardless of size or whether they were open‐source, this iterative process significantly enhanced the quality of the generated code by reducing hallucinations (Figure 2C), suggesting that LLMs can learn from their errors and improve subsequently generated code. The benchmarking process not only allowed us to evaluate and compare the performance of a large number of existing LLMs but also created a flexible framework that can be easily adapted for testing future models.
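The self‐reflection mode amounts to a short retry loop around the single‐shot call, as sketched below; it reuses the hypothetical `generate_code` and `run_generated_code` helpers from the benchmarking sketch above, and the retry budget is an assumption.

```python
# Sketch of the self-reflection mode: failed executions are fed back to the LLM
# with the traceback so it can repair its own code. Helper names are hypothetical
# and defined in the benchmarking sketch above.
def generate_with_reflection(model: str, prompt: str, max_rounds: int = 3) -> str | None:
    """Return the first generated code that executes cleanly, or None after max_rounds."""
    code = generate_code(model, prompt)
    for _ in range(max_rounds):
        ok, error, _ = run_generated_code(code)
        if ok:
            return code
        # Send the task, the failing code, and the traceback back to the model.
        code = generate_code(
            model,
            f"{prompt}\n\nYour previous code failed with this error:\n{error}\n"
            "Please return a corrected, complete script.",
        )
    return None
```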
Electrochemical Reaction Yield Optimization with Active Learning Approach
To further our exploration into the synthesis optimization of electrochemical C−H oxidation, we developed and compared several methodologies on the same electrochemical synthesis platform described above. Our focus shifted towards active learning strategies designed to iteratively refine synthesis conditions to maximize yield while minimizing experimental iterations. We used a batch approach on the screening electrochemical reactor, with each batch comprising 3 to 5 reactions, analyzed and adjusted based on the NMR yield outcomes to guide subsequent experimental conditions (Figure 3A). We developed and examined four different strategies: (i) random sampling, representing traditional trial‐and‐error; (ii) LLM‐driven prediction,[13, 26, 38] which mimics the decision‐making process of a human chemist and leverages chemical intuition without statistical learning; (iii) ML‐based optimization,[8, 39] which applies a purely statistical approach and uses Bayesian optimization (BO) for parameter selection, with LLM‐generated code that applies expected improvement on a Gaussian process (see the sketch after this paragraph); and (iv) a hybrid LLM–ML approach, in which LLMs guide the initial parameter selection and subsequently call ML programs such as EDBO[8, 39] as helper functions to suggest the next synthesis parameters. We note that the key difference between these approaches lies in how LLMs are positioned within the workflow. Additionally, in methods (iii) and (iv), the generated code was derived from the outputs of Task 2 and Task 3 of the previous section, respectively, ensuring consistency in how the LLMs assisted the human chemist.
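A minimal sketch of strategy (iii) is given below, assuming the discrete search space of Table 1 has been one‐hot encoded and yields are NMR percentages. It uses a Gaussian‐process surrogate with an expected‐improvement acquisition, standing in for the EDBO package rather than reproducing it.

```python
# Sketch of the BO loop in strategy (iii): Gaussian-process surrogate with an
# expected-improvement (EI) acquisition over the one-hot-encoded search space.
# This stands in for EDBO rather than reproducing it.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """EI for maximization: reward predicted improvement over the best observed yield."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def suggest_batch(X_tried, y_tried, X_pool, batch_size=4):
    """Fit a GP to observed yields and return indices of the next batch to run."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_tried, y_tried)
    ei = expected_improvement(X_pool, gp, y_best=max(y_tried))
    return np.argsort(ei)[::-1][:batch_size]  # top-EI candidates

# Each iteration: run the suggested reactions, append the NMR yields to
# (X_tried, y_tried), remove them from X_pool, and call suggest_batch again.
```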
Figure 3.
(A) Illustration of different approaches in searching for optimal synthesis conditions, along with the performance of each approach on the electrochemical oxidation of α‐pinene to verbenone, with “n” indicating the number of reactions per iteration. All approaches were initialized with random sampling at iteration 0. (B) Selected substrates based on ML predictions for reactivity and selectivity, along with the resulting optimization process and yields. Substrates are grouped by interpretable substructures and reaction yields. Human‐level performance was estimated by reproducing conditions from the literature[ 20 , 21 ] or manually optimizing new reactions (Table S28). Each reaction batch size was 4, with a total of 88 reactions per substrate, including 44 from random sampling and 44 from the integrated method, over 10 iterations plus initialization.
The effectiveness of these methods was first tested on α‐pinene due to its high predicted reactivity score (0.97) and selectivity (0.80), making it an ideal candidate for methodological comparison. Over the course of 455 reactions (Figure S40), distributed across 10 iterative rounds per method with different batch sizes within the search space (Table 1), we closely monitored the improvement in yield (Tables S18, S26, and S27). Our findings indicate that the random method stagnated at low yields (around 20 %) even after extensive iteration, underscoring the inefficiency of non‐guided experimental approaches. While there is excitement and interest in using only LLMs for synthesis optimization, it is important to understand that their “suggestions” are not based on statistical learning and lack a mathematical foundation. Instead, they reasoned from observed general trends, made educated guesses using domain knowledge, and tended to change one factor at a time (Figures S53 and S54). However, this does not mean that LLMs are not valuable for synthesis optimization; on the contrary, the integrated LLM–ML approach demonstrated that optimization can start from an LLM‐informed search space, incorporating both literature‐derived insights and empirical data, and then rapidly refine the reaction conditions through ML algorithms. In this case, LLMs rely on the output from ML models to make decisions rather than making educated guesses about which conditions to try next. This synergy enabled the precise tuning of conditions to achieve yields over 60 % (Figure 3A and Section S6 of the Supporting Information).
Table 1.
Synthesis parameters and search space for the optimization of electrochemical C−H oxidation reactions.
| Synthesis Parameter[a] | Choice[b] | Number |
|---|---|---|
| Concentration (mM) | 25, 50, 75, 100, 125 | 5 |
| Electrocatalyst | NHPI, TCNHPI, Quinuclidine, DABCO, TEMPO | 5 |
| Equivalence of Electrocatalyst | 0, 0.25, 0.5, 0.75, 1 | 5 |
| Electrolyte | LiClO4, LiOTf, Bu4NClO4, Et4NBF4, Bu4NPF6 | 5 |
| Solvent | ACN, ACN/HFIP (19 : 1) | 2 |

[a] The reactions were carried out using a graphite or RVC anode and a nickel cathode, at a potential of 3.5 V, at room temperature for 12 hours. The reaction volume was 4 mL, with stirring at 600 rpm. Detailed procedures are provided in Section S6 of the Supporting Information. [b] Abbreviations: NHPI=N‐hydroxyphthalimide, TCNHPI=tetrachloro‐N‐hydroxyphthalimide, DABCO=1,4‐diazabicyclo[2.2.2]octane, TEMPO=2,2,6,6‐tetramethylpiperidin‐1‐oxyl, ACN=acetonitrile, HFIP=hexafluoroisopropanol.
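The 1,250‐condition search space quoted below follows directly from Table 1 (5 × 5 × 5 × 5 × 2) and can be enumerated in a few lines; the lists simply mirror the table rows.

```python
# Enumerate the discrete search space of Table 1: 5 x 5 x 5 x 5 x 2 = 1,250 conditions.
from itertools import product

concentrations = [25, 50, 75, 100, 125]                         # mM
mediators      = ["NHPI", "TCNHPI", "Quinuclidine", "DABCO", "TEMPO"]
equivalents    = [0, 0.25, 0.5, 0.75, 1]
electrolytes   = ["LiClO4", "LiOTf", "Bu4NClO4", "Et4NBF4", "Bu4NPF6"]
solvents       = ["ACN", "ACN/HFIP (19:1)"]

search_space = list(product(concentrations, mediators, equivalents, electrolytes, solvents))
print(len(search_space))  # 1250
```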
Building on the success with α‐pinene, we applied the LLM–ML framework to optimize the synthesis conditions for 7 additional substrates (Table S13), using a batch size of four over 10 iterations, to demonstrate its generalizability. Notably, all 8 substrates were identified using the reactivity and selectivity models (Figure 3B), which provide chemists not only with a numerical value indicating the likelihood of oxidation under electrochemical C−H oxidation conditions but also with visualizations of the substructures contributing to oxidation, making the decision‐making process more transparent. The selectivity model also indicates which sites are likely to be oxidized. It should be noted that while many potential candidates were identified based on the predictions from the reactivity and selectivity models, the 8 representative substrates in this study were filtered based on cost and molecular weight and then handpicked by considering factors such as structural diversity and potential for broad application in optimizing electrochemical reactions.
Upon applying the same active learning approach, we observed that the optimization process consistently yielded high‐performance results across all selected substrates. In total, we successfully identified the best combination of synthesis parameters (electrocatalyst and its loading, electrolyte, solvent, and concentration) for each reaction (Figure 3B) from 1,250 possible combinations within 10 iterations for each of the 8 molecules. We note that each compound was independently optimized over the same 1,250‐reaction search space (Table 1). The optimization results demonstrated that all substrates achieved yields comparable to those obtained through human‐level optimization, without initial input from a chemist. Interestingly, while some optimal conditions mirrored those reported in the literature, others revealed new, efficient combinations not driven by traditional chemical intuition (Tables S15–S28). This balance between exploitation (refining conditions in targeted regions) and exploration (testing novel combinations) underscored the robustness of the LLM–ML approach.
Additionally, we found that the optimized conditions were often specific to each substrate, as demonstrated by cross‐application of synthesis conditions (Figure 4). Each substrate's highest yield was achieved under its uniquely optimized parameters, highlighting the necessity of a tailored approach rather than a one‐size‐fits‐all methodology. This specificity is crucial, as conditions optimized for one substrate did not necessarily translate to optimal reaction yields for others (Table S14). By dynamically adapting and optimizing reaction conditions, our approach reduces the trial‐and‐error inherent in traditional methods, enhancing efficiency and productivity in chemical synthesis. Overall, the active learning framework proved effective in streamlining the optimization process, reducing the experimental burden while achieving high yields.
Figure 4.

Impact of condition‐specific optimization on yield outcomes for electrochemical C−H oxidation reactions. The heatmap illustrates the observed reaction yields (%) of eight different compounds (1 to 8) under various reaction conditions (I to VIII). The deep blue diagonal cells indicate the yields achieved using the unique optimized conditions for each specific compound, determined through the active learning approach. The off‐diagonal light blue cells show the yields when optimized conditions for one compound are applied to others. Detailed reaction conditions are available in Supporting Information Table S17, and the search space parameters are listed in Table 1.
Conclusions
We have successfully (1) developed and validated machine learning models for predicting reactivity and site selectivity in electrochemical C−H oxidation reactions, achieving high accuracy, (2) created a cost‐effective rapid screening electrochemical platform to facilitate rapid data generation and reactivity screening, (3) leveraged large language models to semantically analyze scientific literature and generate ML code, significantly lowering the barrier for chemists to utilize ML tools, and (4) employed a synergistic approach combining ML and LLMs to iteratively refine synthesis conditions, leading to high‐yield optimizations for selected substrates and a benchmark dataset of 1071 electrochemical reactions. At a fundamental level, large language models can be perceived as motivated learners that may grasp only the basics of chemistry; importantly, incorporating machine learning models and coherent human instruction significantly enhances their proficiency in a range of chemistry‐related tasks. The approach presented in this study, while specifically tailored for electrochemical C−H oxidation, offers potential for transferability to other chemical reactions. By retraining ML models on reaction‐specific datasets and adapting LLM prompts for alternative reactivity spaces, the framework can be generalized to identify reactivity and optimize conditions in a wide range of contexts.
Despite these promising results, there is still a long journey ahead. Opportunities for improvement include designing more sophisticated human‐AI interaction frameworks, better interfacing with other digital tools, and expanding the knowledge base with external sources. Additionally, challenges remain in cases where experimental data in literature is scarce or heavily biased toward successful outcomes. This integrated AI‐powered methodology not only demonstrates potential to bypass the traditional trial‐and‐error process but also offers a robust and generalizable pathway for potentially expanding to a wider range of reaction types and conditions, enabling further automation of the discovery process by coupling with commercial molecule databases. Overall, this work underscores the potential of human‐AI collaboration, combining the strengths of LLMs and ML, to advance synthetic chemistry research.
Conflict of Interests
The authors declare no conflict of interest.
Supporting information
As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.
Supporting Information
Acknowledgments
This material is based upon work supported by Pfizer. The authors extend their gratitude to the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium for their support. Z.Z. is grateful to the OpenAI Researcher Access Program for subsidized access. The authors thank Drs. Andrew Zahrt, Jakob Dahl, Seung Kyun Ha, and Mr. Leo Maeser (Jensen Research Group) for their valuable discussions. Z.Z. expresses gratitude to Drs. Brent Koscher and Matthew McDonald (Jensen Research Group) for their assistance in setting up reactions on the autonomous chemical discovery platform, and to Wenhao Gao (Coley Research Group) for helpful discussions on synthesis optimization.
Zheng Z., Florit F., Jin B., Wu H., Li S.-C., Nandiwale K. Y., Salazar C. A., Mustakis J. G., Green W. H., Jensen K. F., Angew. Chem. Int. Ed. 2025, 64, e202418074. 10.1002/anie.202418074
Data Availability Statement
The data that support the findings of this study are available in the supplementary material of this article. Additional source code, as well as datasets, can be found in the GitHub repository (https://github.com/zach‐zhiling‐zheng/EChem‐Explorations).
References
- 1. Zhu C., Ang N. W. J., Meyer T. H., Qiu Y., Ackermann L., ACS Cent. Sci. 2021, 7, 415–431.
- 2. Kingston C., Palkowitz M. D., Takahira Y., Vantourout J. C., Peters B. K., Kawamata Y., Baran P. S., Acc. Chem. Res. 2020, 53, 72–83.
- 3. Yan M., Kawamata Y., Baran P. S., Chem. Rev. 2017, 117, 13230–13319.
- 4. Novaes L. F. T., Liu J., Shen Y., Lu L., Meinhardt J. M., Lin S., Chem. Soc. Rev. 2021, 50, 7941–8002.
- 5. Zahrt A. F., Mo Y., Nandiwale K. Y., Shprints R., Heid E., Jensen K. F., J. Am. Chem. Soc. 2022, 144, 22599–22610.
- 6. Coley C. W., Barzilay R., Jaakkola T. S., Green W. H., Jensen K. F., ACS Cent. Sci. 2017, 3, 434–443.
- 7. Sandfort F., Strieth-Kalthoff F., Kühnemund M., Beecks C., Glorius F., Chem 2020, 6, 1379–1390.
- 8. Shields B. J., Stevens J., Li J., Parasram M., Damani F., Alvarado J. I. M., Janey J. M., Adams R. P., Doyle A. G., Nature 2021, 590, 89–96.
- 9. Jinich A., Sanchez-Lengeling B., Ren H., Harman R., Aspuru-Guzik A., ACS Cent. Sci. 2019, 5, 1199–1210.
- 10. Hou X., Li S., Frey J., Hong X., Ackermann L., Chem 2024, DOI 10.1016/j.chempr.2024.03.027.
- 11. Microsoft Research AI4Science, Microsoft Azure Quantum, arXiv preprint 2023, DOI 10.48550/arXiv.2311.07361.
- 12. Bran A. M., Cox S., Schilter O., Baldassari C., White A. D., Schwaller P., Nat. Mach. Intell. 2024, 6, 525–535.
- 13. Boiko D. A., MacKnight R., Kline B., Gomes G., Nature 2023, 624, 570–578.
- 14. Zheng Z., Zhang O., Borgs C., Chayes J. T., Yaghi O. M., J. Am. Chem. Soc. 2023, 145, 18048–18062.
- 15. Zheng Z., Zhang O., Nguyen H. L., Rampal N., Alawadhi A. H., Rong Z., Head-Gordon T., Borgs C., Chayes J. T., Yaghi O. M., ACS Cent. Sci. 2023, 9, 2161–2170.
- 16. Leong S. X., Pablo-García S., Zhang Z., Aspuru-Guzik A., ChemRxiv preprint 2024, DOI 10.26434/chemrxiv-2024-7fwxv.
- 17. Rein J., Annand J. R., Wismer M. K., Fu J., Siu J. C., Klapars A., Strotman N. A., Kalyani D., Lehnherr D., Lin S., ACS Cent. Sci. 2021, 7, 1347–1355.
- 18. Palkowitz M. D., Laudadio G., Kolb S., Choi J., Oderinde M. S., Ewing T. E.-H., Bolduc P. N., Chen T., Zhang H., Cheng P. T. W., Zhang B., Mandler M. D., Blasczak V. D., Richter J. M., Collins M. R., Schioldager R. L., Bravo M., Dhar T. G. M., Vokits B., Zhu Y., Echeverria P.-G., Poss M. A., Shaw S. A., Clementson S., Petersen N. N., Mykhailiuk P. K., Baran P. S., J. Am. Chem. Soc. 2022, 144, 17709–17720.
- 19. Siu T., Li W., Yudin A. K., J. Comb. Chem. 2000, 2, 545–549.
- 20. Kawamata Y., Yan M., Liu Z., Bao D.-H., Chen J., Starr J. T., Baran P. S., J. Am. Chem. Soc. 2017, 139, 7448–7451.
- 21. Horn E. J., Rosen B. R., Chen Y., Tang J., Chen K., Eastgate M. D., Baran P. S., Nature 2016, 533, 77–81.
- 22. Ai Q., Meng F., Shi J., Pelkie B., Coley C. W., ChemRxiv preprint 2024, DOI 10.26434/chemrxiv-2024-979fz.
- 23. Zhang W., Wang Q., Kong X., Xiong J., Ni S., Cao D., Niu B., Chen M., Li Y., Zhang R., Wang Y., Zhang L., Li X., Xiong Z., Shi Q., Huang Z., Fu Z., Zheng M., Chem. Sci. 2024, 15, 10600–10611.
- 24. Bubeck S., Chandrasekaran V., Eldan R., Gehrke J., Horvitz E., Kamar E., Lee P., Lee Y. T., Li Y., Lundberg S., Nori H., Palangi H., Ribeiro M. T., Zhang Y., arXiv preprint 2023, DOI 10.48550/arXiv.2303.12712.
- 25. Lu P., Mishra S., Xia T., Qiu L., Chang K.-W., Zhu S.-C., Tafjord O., Clark P., Kalyan A., Advances in Neural Information Processing Systems 2022, 35, 2507–2521.
- 26. Zheng Z., Rong Z., Rampal N., Borgs C., Chayes J. T., Yaghi O. M., Angew. Chem. Int. Ed. 2023, 62, e202311983.
- 27. Touvron H., Lavril T., Izacard G., Martinet X., Lachaux M.-A., Lacroix T., Rozière B., Goyal N., Hambro E., Azhar F., Rodriguez A., Joulin A., Grave E., Lample G., arXiv preprint 2023, DOI 10.48550/arXiv.2302.13971.
- 28. Achiam J., Adler S., Agarwal S., Ahmad L., Akkaya I., Aleman F. L., Almeida D., Altenschmidt J., Altman S., Anadkat S., arXiv preprint 2023, DOI 10.48550/arXiv.2303.08774.
- 29. Anthropic, Claude 3 Model Card 2024.
- 30. Xu F. F., Alon U., Neubig G., Hellendoorn V. J., in Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, 2022, pp. 1–10.
- 31. White A. D., Hocky G. M., Gandhi H. A., Ansari M., Cox S., Wellawatte G. P., Sasmal S., Yang Z., Liu K., Singh Y., Ccoa W. J. P., Digital Discovery 2023, 2, 368–376.
- 32. Koscher B. A., Canty R. B., McDonald M. A., Greenman K. P., McGill C. J., Bilodeau C. L., Jin W., Wu H., Vermeire F. H., Jin B., Hart T., Kulesza T., Li S.-C., Jaakkola T. S., Barzilay R., Gómez-Bombarelli R., Green W. H., Jensen K. F., Science 2023, 382, eadi1407.
- 33. Ruan Y., Lu C., Xu N., He Y., Chen Y., Zhang J., Xuan J., Pan J., Fang Q., Gao H., Shen X., Ye N., Zhang Q., Mo Y., Nat. Commun. 2024, 15, 10160.
- 34. Renze M., arXiv preprint 2024, DOI 10.48550/arXiv.2405.06682.
- 35. Ishibashi Y., Nishimura Y., arXiv preprint 2024, DOI 10.48550/arXiv.2404.02183.
- 36. Lee C., Xia C. S., Huang J., Zhu Z., Zhang L., Lyu M. R., arXiv preprint 2024, DOI 10.48550/arXiv.2404.17153.
- 37. Ouyang L., Wu J., Jiang X., Almeida D., Wainwright C. L., Mishkin P., Zhang C., Agarwal S., Slama K., Ray A., Schulman J., Hilton J., Kelton F., Miller L., Simens M., Askell A., Welinder P., Christiano P., Leike J., Lowe R., arXiv preprint 2022, DOI 10.48550/arXiv.2203.02155.
- 38. Mahjour B., Hoffstadt J., Cernak T., Org. Process Res. Dev. 2023, 27, 1510–1516.
- 39. Torres J. A. G., Lau S. H., Anchuri P., Stevens J. M., Tabora J. E., Li J., Borovika A., Adams R. P., Doyle A. G., J. Am. Chem. Soc. 2022, 144, 19999–20007.