Artificial Intelligence Methods and Models for Retro-Biosynthesis: A Scoping Review

Guillaume Gricourt; Philippe Meyer; Thomas Duigou; Jean-Loup Faulon

doi:10.1021/acssynbio.4c00091

ACS Synth Biol. 2024 Jul 24;13(8):2276–2294.

PMCID: PMC11334239 PMID: 39047143

Abstract

Retrosynthesis aims to efficiently plan the synthesis of desirable chemicals by strategically breaking down molecules into readily available building block compounds. Retrosynthesis has a long history in chemistry, and its enzymatic counterpart, retro-biosynthesis, has also been used in the fields of biocatalysis and synthetic biology. Artificial intelligence (AI) is driving us toward new frontiers in synthesis planning and the exploration of chemical spaces, arriving at an opportune moment for promoting bioproduction that better aligns with green chemistry and improves environmental practices. In this review, we summarize the recent advancements in the application of AI methods and models for retrosynthetic and retro-biosynthetic pathway design. These techniques can be based either on reaction templates or on generative models, and they require scoring functions and planning strategies to navigate through the retrosynthetic graph of possibilities. Finally, we discuss limitations and promising research directions in this field.

Keywords: retrosynthesis, retro-biosynthesis, artificial intelligence

Introduction

Retrosynthesis¹ is essential for the development of new compounds in the fields of pharmaceuticals and organic chemistry, providing chemists with the ability to access complex and novel molecules. This same approach, rebranded as retro-biosynthesis, is also used in biocatalysis, where reactions are catalyzed by enzymes. Compared with conventional chemical synthesis, enzymatic processes can catalyze chemical reactions in a specific, highly efficient manner, requiring less energy and generating minimal waste. Biochemical reactions can be carried out in vitro, as with an enzymatic cascade, or in vivo, as in synthetic biology via metabolic engineering.² Distinct challenges arise in biochemistry and synthetic biology, such as enzyme isolation and managing the trade-off between cell growth and molecular production, respectively.³

Identifying a set of building block molecules, also referred to as precursors, readily available in the commercial market or naturally existing in the environment is essential for synthesizing a desired target product through retro(-bio)synthesis. The technological disruption brought about by artificial intelligence (AI) paves the way for new possibilities across each of the key components required for this task.

Retrosynthesis proceeds by iteratively applying single-step processes, each of which consists of finding all the possible reactants of a given product. Methods proposed for single-step retrosynthesis can be grouped into three categories: template-based, template-free, and semitemplate-based. Template-based methods rely on a library of reaction templates constructed from chemical reaction data sets; the templates are matched against a target molecule, and the reactants are extracted from the selected templates. Here, AI techniques have been developed for selecting the most promising templates. In contrast, template-free approaches use AI generative models to translate products directly into candidate reactants, whereas semitemplate-based methods predict reactants by iteratively manipulating the bonds within the product.

In all the aforementioned cases, the single steps are iterated via route-planning algorithms to produce pathways of reactions and identify available precursors. Predicting multistep retrosynthesis pathways is inherently challenging because of the extensive search space for synthesis routes and the subjective nature of selecting suitable candidates, which motivates the use of AI-based combinatorial graph search methods. To guide the route planning and rank predicted solutions, AI strategies are applied to suggest the best choice according to dedicated algorithms, scoring functions, or the availability of enzymes in retro-biosynthesis.⁴

Because of the fast-paced developments in retrosynthesis for chemical synthesis planning, there is a clear need for a thorough summary of the pertinent literature. The compilation of scientific articles was inspired by the preferred reporting items for systematic reviews and meta-analyses extension for scoping reviews (PRISMA-ScR)⁵ guidelines (Note S1). A comprehensive literature search was conducted across four academic search engines, covering a broad spectrum of disciplines including biology and computer science, both pertinent to the interdisciplinary field of retro(-bio)synthesis. While this search was successful in identifying papers published in academic journals, expanding it to include conference proceedings further extended the research scope, though it may not have fully captured all pertinent research presented at academic conferences. As outlined in Figure 1, this review summarizes retrosynthesis methods that can be used in several application domains (chemistry, biocatalysis, and synthetic biology), even when they were originally developed for synthesis planning in organic chemistry. In the following sections, we explore the diversity of AI methods and models (explained in a glossary in Note S2) tailored for both single-step and multistep processes, including the types of data and predictors used and the data sets and evaluation metrics employed. We then review popular databases and data set preparation. Finally, we assess the limitations of AI methods and highlight the distinctions across application domains. Our review also aims to identify gaps in knowledge, emphasizing areas that require additional research to advance the application of AI in retro-biosynthesis.

Figure 1. Retrosynthesis principles and applications. Retrosynthesis is a computer-aided method that uses data sets and user expertise. Current algorithms for retro(-bio)synthesis are applied across several domains. In chemical synthesis, target molecules are crafted through organic chemistry reactions from commercially available building blocks. In biocatalysis, enzymes are employed to catalyze the reactions. Synthetic biology and metabolic engineering go a step further, using living cells such as bacteria, fungi, or plants to facilitate bioproduction pathways and supply the necessary building blocks. The retro(-bio)synthesis process unfolds in three key stages, leveraging molecular databases. The single-step stage consists of predicting reactants (red and gray nodes) necessary for producing a given product (yellow). The multistep stage determines possible routes linking the desirable product (yellow node) to available building blocks (red nodes) using sequences of single-step moves. Completed predictions are shown by a solid line, whereas future predictions are shown by a hatched line. Finally, route scoring helps in finding the best strategy to produce a molecule as well as ranking completed routes. AI techniques now play a critical role in each phase of the retro(-bio)synthesis process. A*, A* search; CNN, convolutional neural network; GNN, graph neural network; MCTS, Monte Carlo tree search; and RL, reinforcement learning.

Single-Step Retrosynthesis

As AI and its applications continue to advance, computational approaches have been proposed to predict the outcomes of chemical reactions. Chemical reactions involve the transformation of one set of chemical substances into another, which results in chemical changes. Two key challenges exist: predicting the products that result from given reactants (i.e., substrates) and solving the reverse problem of identifying the reactants when the products are known.⁶ One approach is the template-based method that extracts information from a reaction chemical database to generalize the application of existing reactions through reaction templates.⁷ A template corresponds to subgraph patterns that describe changes in the connectivity between a product and its reactants. A second approach is the template-free method that uses the ability of generative models to predict target molecules.⁸ Finally, semitemplate-based methods aim to reconcile the use of dedicated rules and the ability to generalize via AI.⁹ The general principle associated with single-step retrosynthesis is shown in Figure 2.

Molecular and Reaction Representations

To use molecules and reactions as inputs to AI models, a representation or vectorization is essential. A widely adopted approach is to use SMILES and SMARTS notations to encode molecules and reaction templates into strings for models predicting substrates from the product(s).^10,11 Representing molecules as character strings makes it possible to use tools from natural language processing, such as transformers and generative models that take text sequences as inputs. Molecular fingerprints form a family of representations of molecules as vectors. In particular, circular molecular fingerprints capture local features around atoms, such as topology, atom types, bond types, and connectivity patterns within a specified radius. This type of representation is mostly used to predict properties associated with reactions, such as the reaction template⁷ or molecular similarity.¹² A molecule has a natural graph structure in which atoms are nodes and bonds are edges. Consequently, molecular graphs have also been extensively used as inputs in models such as graph neural networks.^11,13 Other less common representations include SELFIES,¹⁴ molecular signatures,¹⁵ and atom environments¹⁶ that focus on local information to characterize molecules. Table 1 and Figure 3A provide an overview of the various molecular representations, highlighting the proportion of their utilization in single-step retrosynthesis. Surveys^17,18 are available for more details about the different types of molecular representations and their respective advantages and limitations.
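To make the idea of a circular fingerprint concrete, the following is a minimal sketch in plain Python. It is not the ECFP algorithm as implemented in any cheminformatics toolkit: the input format (a list of element symbols plus a list of bonds), the function name `circular_fingerprint`, the atom labels, and the use of `zlib.crc32` for hashing are all assumptions made for illustration. It only shows the principle described above, namely that each atom's label is repeatedly extended with the labels of its neighbors up to a given radius and then hashed to a bit position.

```python
import zlib

def circular_fingerprint(atoms, bonds, radius=2, n_bits=2048):
    """Toy circular fingerprint (illustrative, not a real ECFP).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples over atom indices
    Returns the set of bit positions that are switched on.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    # Radius 0: each atom is described by its element and its degree.
    labels = {i: f"{atoms[i]}|{len(neighbors[i])}" for i in neighbors}
    features = set(labels.values())

    for _ in range(radius):
        new_labels = {}
        for i in neighbors:
            # Sorting makes the label independent of the atom numbering.
            env = sorted(f"{order}:{labels[j]}" for j, order in neighbors[i])
            new_labels[i] = f"{labels[i]}({','.join(env)})"
        labels = new_labels
        features.update(labels.values())

    # zlib.crc32 is used instead of hash() so bits are stable across runs.
    return {zlib.crc32(f.encode()) % n_bits for f in features}


# Ethanol written with two different atom numberings, and dimethyl ether.
ethanol_a = circular_fingerprint(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = circular_fingerprint(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
ether = circular_fingerprint(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
```

Because neighbor labels are sorted before hashing, the two numberings of ethanol give the same bit set, whereas the isomeric ether gives a different one.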

Table 1. List of Molecular Representations Used in Single-Step Retrosynthesis.

Representation	Description	Related works
SMILES (canonical, rooted, ...)	String representation of a molecule	(8, 10, and 19−46)
Circular fingerprints (ECFP, FCFP, HSFP, ...)	Vectorization of a molecule by hashing local molecular features	(7, 11, 12, 25, 26, 29, 34, 43, and 46−56)
Molecular graph	Graph representation of a molecule where atoms are nodes and bonds are edges	(8, 9, 11, 13, 19, 24, 28, 36, 41, 42, 48, and 55−59)
Atom environment	Circular atom-centered topological neighborhood fragments of a molecule in SMARTS format	(46 and 50)
Signature descriptor	Representation of a molecule encoding all atom environments in its molecular graph up to a predefined radius in SMILES format	(60)
SELFIES	Extended SMILES string representation of a molecule	(46)

Figure 3. Analysis of the use of molecular representations and metrics by single-step algorithms. (A) Word cloud of types of molecular representations used as input in single-step AI models. (B) Proportions of different metrics used to evaluate models for single-step retro(-bio)synthesis. These metrics are further described in the single-step models evaluation section. Metrics mentioned only once in articles are not included in the analysis. Blue squares and green squares reflect the usage of metrics quantifying observational error and the diversity of reaction types, respectively. MaxFrag, maximum fragment; ROC, receiver operating characteristic.

Template-Based

Template-based single-step methods require templates that are either derived by human experts or extracted automatically from reaction databases in the form of reactants and major products. Template reactions, sometimes referred to as reaction rules or generalized reactions, are usually represented by atom-mapped SMARTS strings, which can handle stereochemistry.²¹ Examples of templates used in chemistry are found in Szymkuć et al.⁶¹ and in biocatalysis and synthetic biology in Finnigan et al.⁶² and the RetroRules database.⁶³ Template-based methods select templates that can be applied to a given product. The main challenge is then to apply the best template to the product to obtain the reactants. Neural networks (NN) are used to learn patterns from molecular functional groups or fingerprints to select templates belonging to the same type of reaction rules for the products or substrates.⁵⁴ This strategy was applied to both reaction prediction and retrosynthesis tasks using deep highway networks²⁶ and Hopfield networks.⁴⁷ The NNs are fed with the SMILES notation of the product, molecular fingerprints, or both.²⁹ Another strategy is based on graph neural networks (GNN), which use the graph structure of the molecule to select reaction templates, providing explainability through the model.^55,64 To this end, reaction templates are embedded in a conditional graphical model built using GNN⁵⁶ or partially encoded using the reaction center.⁵⁷

Template-based methods have the advantage of mimicking the bond rearrangements of existing reactions. However, these models suffer from several limitations, such as the duplication and overlap of templates describing the same chemical transformation in databases. Some studies have sought to mitigate these limitations by canonicalizing templates¹¹ or by using dedicated NN models.⁷ Among other limitations, template-based methods infer reactions derived from template databases but cannot suggest new mechanisms. To address this limitation, Yan et al.³⁶ employed a template composition strategy to create new reactions and templates.

In retro-biosynthesis, the point of view is slightly different. Molecules may be more complex than those usually considered for retrosynthesis in chemistry,²⁹ and reactions involve enzymes which are highly specific. To manage this situation, strategies such as identifying disconnections in ring systems⁵² or evaluating the similarity between products and substrates have been suggested for selecting templates in both chemistry¹² and retro-biosynthesis with the aim of retaining enzymatic activity.^40,43 To the best of our knowledge, AI models have not been coupled to template-based single-step methods in the context of synthetic biology.
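As a rough illustration of similarity-based template selection, the sketch below ranks precedent reactions by the Tanimoto similarity between the target product and the product of each precedent. The data layout (pairs of a template identifier and a set of fingerprint bits) and the function names are hypothetical; the published methods cited above also compare reactants and use their own scoring schemes, which are not reproduced here.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_precedents(target_fp, precedents):
    """Rank known reactions by the similarity of their product to the target.

    precedents: list of (template_id, product_fingerprint) pairs
                (hypothetical format chosen for this example)
    Returns template identifiers, most similar first.
    """
    scored = [(tanimoto(target_fp, fp), tid) for tid, fp in precedents]
    return [tid for score, tid in sorted(scored, key=lambda s: -s[0])]


target = {1, 2, 3, 4}
library = [("T1", {1, 2}), ("T2", {1, 2, 3, 4, 5}), ("T3", {7})]
ranking = rank_precedents(target, library)  # ["T2", "T1", "T3"]
```

Here T2 shares four of five bits with the target (0.8), T1 two of four (0.5), and T3 none, so templates extracted from T2 would be tried first.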

Template-Free

Template-free methods avoid relying on reaction templates and use generative models to directly predict reactants. To carry out this task, various models such as encoder-decoder built with long short-term memory cells^33,49 and Weisfeiler-Lehman networks were first employed.¹³ More recently, deep generative models such as transformers have extensively been used and improved upon. Transformer models are a type of NN architecture composed of an encoder and a decoder, particularly suited for natural language processing. Sequences of tokens, which represent single or multiple characters, are processed by the encoder to generate a set of hidden representations and decoded by the decoder part to generate the output sequence. This model embeds an attention mechanism to focus on the most informative parts of the sequence. Thus, the translation task from the SMILES product to the SMILES reactants has been conducted using a model built for forward prediction^31,65 and by integrating a reaction prediction model into a retrosynthesis process.⁶⁶ Instead of using SMILES representations as input, transformer models have considered other types of representations of molecules and combine them. For example, models such as Graph2SMILES,⁶⁷ BiG2S,⁶⁸ DVMP,²⁸ graph enhanced transformer,²⁴ graph truncated attention,⁴¹ and G2GT⁵⁹ combine graph representations of molecules with SMILES sequence representations. The transformer model retroformer⁸ associates local and global attention heads to identify the reactive region in molecules. Another possibility is to customize the transformer model with a tree representation of the SMILES,³⁵ use a set of predefined compounds,⁵⁸ or add information to the reaction using byproducts³⁹ or a reaction graph.⁶⁹ Transformer architectures have also been used to predict atom environments of the reactants knowing the products⁵⁰ and to translate the reactants back.⁴⁶
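Since transformer models operate on sequences of tokens rather than raw characters, a SMILES string first has to be split into chemically meaningful units. The sketch below uses a deliberately simplified regular expression (an assumption for illustration, not the tokenizer of any cited model): bracket atoms, the two-letter halogens Br and Cl, and two-digit ring-closure labels are kept as single tokens, and every other character becomes its own token.

```python
import re

# Simplified pattern: bracket atoms, two-letter halogens, ring-closure labels
# of the form %NN, then any other single character. Real tokenizers used with
# transformer models typically cover more cases.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens without losing any character."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("tokenization lost characters: " + repr(smiles))
    return tokens


acetyl_chloride = tokenize("CC(=O)Cl")
ammonium_chloride = tokenize("[NH4+].[Cl-]")
```

For example, `Cl` is kept as one token instead of being split into `C` and `l`, which would otherwise be read as a carbon atom followed by an unknown symbol.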

During the reconstruction of the representation of the molecule, transformer models are liable to add or omit characters, thereby producing grammatical errors in the SMILES notation. To address this problem, a neural-network-based syntax checker called the SCROP³⁰ framework was plugged into the transformer architecture, leading to a decrease in the number of invalid outputs. To increase the diversity of predictions, avoid invalid SMILES,²⁰ and predict the type of the reaction,⁴² a latent variable has been integrated into the model. Moreover, different strategies have been employed to increase prediction diversity. Adding reaction types to SMILES strings provides additional context to the transformer model, leading to an increase in the diversity of suggestions.⁴⁵ To achieve this objective without altering the data set, predictions were enhanced by employing a GNN¹⁹ or an energy-based model^70,71 to rerank predictions that the single-step model deemed less confident. Reranking methods harness additional chemical feature information from molecular graphs to refine the results.

Some methods have been tested to improve the learning capacities of models, such as data augmentation and transfer learning. Data augmentation enriches data sets by applying diverse transformations to the original SMILES data. These transformations can be basic, such as swapping reactants and products,³⁸ or more elaborate, such as the SMILES enumeration method, which randomly selects a starting atom to generate different SMILES for the same molecule.^27,44 Transfer learning, on the other hand, is a technique in which a pretrained model is used as a starting point for a new task to take advantage of the knowledge and features learned by the pretrained model. Pretraining on a data set of 380 K molecules and then transferring to USPTO-50K^25,37 or to a data set of around 2200 Baeyer–Villiger reactions²³ gave better performance than training on the target data set alone. In contrast, multitask transformers trained on text and molecular representations show interesting results without the need for transfer learning.⁷²
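The two augmentation strategies mentioned above can be sketched as follows. This is a toy illustration under strong assumptions: the SMILES writer only handles acyclic molecules with single bonds, given as a hypothetical list of atoms and bonds, and it enumerates every possible starting atom instead of sampling one at random. In practice, SMILES enumeration is done with a cheminformatics toolkit that handles rings, bond orders, aromaticity, and stereochemistry.

```python
def swap_sides(reaction_smiles):
    """Swap reactants and products of a 'reactants>>products' string."""
    reactants, products = reaction_smiles.split(">>")
    return f"{products}>>{reactants}"


def write_smiles(atoms, bonds, start):
    """Write a SMILES string for an acyclic, single-bonded molecule by
    depth-first traversal from the chosen start atom."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def visit(atom, parent):
        children = [n for n in neighbors[atom] if n != parent]
        text = atoms[atom]
        # All branches except the last are wrapped in parentheses.
        for child in children[:-1]:
            text += "(" + visit(child, atom) + ")"
        if children:
            text += visit(children[-1], atom)
        return text

    return visit(start, None)


def enumerate_smiles(atoms, bonds):
    """All distinct SMILES obtained by varying the starting atom."""
    return sorted({write_smiles(atoms, bonds, s) for s in range(len(atoms))})


swapped = swap_sides("CC(=O)O.OCC>>CC(=O)OCC")
ethanol_variants = enumerate_smiles(["C", "C", "O"], [(0, 1), (1, 2)])
```

For ethanol, the three starting atoms yield `CCO`, `C(C)O`, and `OCC`: three different strings for the same molecule, each of which can be used as an additional training example.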

To the best of our knowledge, template-free methods have not yet been used in the context of synthetic biology, and only a few methods are available for biocatalysis. BioNavi-NP⁷³ employed a transformer with a data set representing natural products to predict biosynthesis pathways. In the context of reaction prediction, Kreutter et al.⁷⁴ added a textual representation of enzymes to SMILES, whereas in retro-biosynthesis, Probst et al.⁷⁵ used EC numbers to enrich SMILES; these models thus predict both the molecules and, respectively, the enzymes or the EC numbers associated with the reactions.
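The idea of enriching a SMILES string with enzyme information can be sketched as simple string manipulation. The `EC ... | SMILES` layout, the function names, and the truncation to three EC levels below are hypothetical choices made for this example; they are not the input format of the cited models.

```python
def add_ec_prompt(product_smiles, ec_number, levels=3):
    """Prefix a product SMILES with the first `levels` fields of an EC number.

    The "EC x.y.z | SMILES" layout is a hypothetical format for illustration.
    """
    fields = ec_number.split(".")[:levels]
    return f"EC {'.'.join(fields)} | {product_smiles}"


def split_ec_prompt(text):
    """Inverse of add_ec_prompt: recover the EC fields and the SMILES."""
    ec_part, smiles = text.split(" | ")
    return ec_part[len("EC "):], smiles


prompt = add_ec_prompt("CC(=O)O", "1.2.1.3")
ec_fields, smiles = split_ec_prompt(prompt)
```

Truncating the EC number to its higher levels is one way to condition a model on a broad enzyme class without requiring the exact fourth-level annotation.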

Semitemplate-Based

Semitemplate-based methods aim to imitate the reasoning of chemists. First, the reaction site within the product structure is detected, followed by the disconnection of bonds to create intermediate molecules known as synthons. Substrates are then retrieved from the synthons. Unlike template-based methods, this approach does not rely on a database of predefined chemical reaction templates.

After producing synthons, reactants are predicted with a generative model such as the graph-to-graph translation models found in G2Gs,⁷⁶ a transformer model such as RetroXpert,⁷⁷ RetroPrime,³² and RetroSub,³⁴ or based on a precomputed vocabulary using GraphRetro.⁷⁸ Another possibility is to optimize the molecular input; for instance, the hot-spot fingerprint (HSFP) has been employed by Hasic et al.⁵³ to identify the reaction site and generate synthons. Additionally, RetroExplainer⁷⁹ and Graph2Edits⁹ frameworks rely on a set of actions, such as deleting bonds or attaching groups of atoms to retrieve reactants.

Interestingly, a semitemplate-based method has been developed in the context of biocatalysis, using a prompt-based paradigm that enables the inclusion of additional information, such as the EC number of the reactions, in the input SMILES.⁸⁰ To the best of our knowledge, semitemplate-based methods have not yet been used in the context of synthetic biology.

Single-Step Models Evaluation

When comparing methods, it is important to establish standards for measuring their effectiveness. Using a metric facilitates fair and consistent comparisons, helping in the identification of the strengths and weaknesses of each model. Figure 3B shows the adoption of different metrics by the community.

Many of the models utilized for single-step prediction suggest multiple candidates, where “candidates” refer to either a collection of reaction templates or some sets of molecules envisioned as credible reactants. Accuracy metrics indicate the proximity of predictions to the true values and are typically evaluated using the top-n accuracy. While the top-n metric is widely used for reaction prediction, its relevance has been questioned,²² as many molecules can be built from more than one set of reactants, i.e., there are several “true” answers for a given product. Less common metrics in this context include fractional accuracy,³⁵ balanced accuracy,^26,55 weighted precision,⁵⁴ and ROC curve.^38,43
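The top-n accuracy can be computed as in the minimal sketch below, which assumes that each model output is a ranked list of candidate strings (best first) and that exactly one reference answer is recorded per product. The function name and data layout are illustrative.

```python
def top_n_accuracy(predictions, references, n):
    """Fraction of samples whose reference appears among the first n candidates.

    predictions: list of ranked candidate lists (best candidate first)
    references:  list of ground-truth reactant strings, one per sample
    """
    hits = sum(ref in preds[:n] for preds, ref in zip(predictions, references))
    return hits / len(references)


predictions = [["A", "B", "C"], ["X", "Y"], ["Q"]]
references = ["B", "Z", "Q"]
top1 = top_n_accuracy(predictions, references, 1)  # only the third sample hits
top2 = top_n_accuracy(predictions, references, 2)  # first and third samples hit
```

The second sample illustrates the limitation raised above: if "X" were a chemically valid alternative to the recorded answer "Z", it would still be counted as a miss.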

Metrics have also been used to assess the quality of ranking. The mean reciprocal rank (MRR) averages the reciprocal of the rank of the first relevant prediction across multiple samples.⁵⁴ An alternative, called the coverage, counts the number of samples for which one or more valid predictions are made.^20,22,42,45 It is crucial to evaluate reactants belonging to multiple types of reactions to obtain a wide range of available synthetic routes. The class diversity metric measures the range of reaction types predicted by the single-step models,^22,32,45 while the Jensen–Shannon divergence quantifies the similarity between the likelihood distributions of predicted reactions across a fixed number of reaction types.²² Moreover, rather than evaluating richness, the presence of repeated predictions, which indicates a lack of variety produced by the models, has been estimated in Kim et al.²⁰ and in Yan et al.³⁹
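A minimal sketch of the two ranking metrics is given below. It assumes ranked candidate lists and, for the coverage, a user-supplied validity check `is_valid` (a hypothetical callable; in practice this could be a forward model or a chemical sanity check). Samples whose reference is absent from the candidate list contribute zero to the MRR.

```python
def mean_reciprocal_rank(predictions, references):
    """Average of 1/rank of the reference answer (0 if it is not predicted)."""
    total = 0.0
    for preds, ref in zip(predictions, references):
        if ref in preds:
            total += 1.0 / (preds.index(ref) + 1)
    return total / len(references)


def coverage(predictions, is_valid):
    """Number of samples with at least one prediction accepted by is_valid."""
    return sum(any(is_valid(p) for p in preds) for preds in predictions)


mrr = mean_reciprocal_rank([["A", "B", "C"], ["X", "Y"], ["Q"]], ["B", "Z", "Q"])
covered = coverage([["a", "bad"], ["bad"], ["b"]], is_valid=lambda s: s != "bad")
```

In this example the references are found at ranks 2, never, and 1, giving an MRR of (1/2 + 0 + 1) / 3 = 0.5, and two of the three samples have at least one valid prediction.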

Other ad-hoc metrics have been developed specifically for one-step retrosynthesis. The round-trip accuracy reflects the validity of retrosynthetic suggestions with a forward transformer that predicts the product molecule from the predicted precursor,²² but it has been shown that this transformer can produce bias.⁸¹ Due to the nature of the template-free algorithms, some predictions are grammatically invalid. More than one-third of the research papers utilizing template-free methods quantify the occurrence of invalid SMILES (Figure 3B). Finally, in an original approach specific to semitemplate-based methods, a dedicated metric was employed to measure the success of disconnection.⁸⁰
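The round-trip accuracy can be sketched as follows, with the forward reaction predictor passed in as a callable. Here `forward_model` is a stand-in (a dictionary lookup) used only to make the example runnable; in the cited work this role is played by a trained forward transformer.

```python
def round_trip_accuracy(products, predicted_reactants, forward_model):
    """Fraction of retrosynthetic suggestions that the forward model maps
    back to the original product.

    forward_model: callable taking a reactant string and returning the
                   predicted product string (hypothetical interface)
    """
    recovered = sum(
        forward_model(reactants) == product
        for product, reactants in zip(products, predicted_reactants)
    )
    return recovered / len(products)


# Stand-in forward model: a lookup table instead of a trained network.
known_outcomes = {"CC(=O)O.OCC": "CC(=O)OCC", "CC(=O)Cl.N": "CC(N)=O"}
forward = lambda reactants: known_outcomes.get(reactants)

score = round_trip_accuracy(
    products=["CC(=O)OCC", "CC(N)=O", "CCO"],
    predicted_reactants=["CC(=O)O.OCC", "CC(=O)Cl.N", "C.CO"],
    forward_model=forward,
)
```

Because the metric depends entirely on the forward model, any systematic error of that model carries over to the score, which is the bias noted above.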

In retro(-bio)synthesis, one or more reactants are involved in a reaction; the largest molecule better reflects the reaction type and is more likely to contain an important reaction site, which avoids ambiguous reactions. To that end, the maximum fragment (MaxFrag) accuracy was applied to the predictions^39,44 to consider the similarity between molecules to reflect their bioactivity.³⁵
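The MaxFrag idea can be sketched by comparing only the largest reactant of each prediction with the largest reactant of the reference. The sketch below makes two simplifying assumptions: the largest fragment is taken to be the longest dot-separated SMILES component, and all strings are assumed to be already canonicalized so that string equality reflects molecular identity.

```python
def max_fragment(smiles):
    """Longest dot-separated component, used here as a proxy for the
    largest molecule in a set of reactants."""
    return max(smiles.split("."), key=len)


def maxfrag_accuracy(predictions, references, n=1):
    """Top-n accuracy computed on the largest fragment only."""
    hits = 0
    for preds, ref in zip(predictions, references):
        target = max_fragment(ref)
        hits += any(max_fragment(p) == target for p in preds[:n])
    return hits / len(references)


# The prediction gets the main reactant right but proposes methanol (CO)
# instead of ethanol (OCC) as the small partner.
preds = [["CC(=O)Cl.CO"]]
refs = ["CC(=O)Cl.OCC"]
exact_match = preds[0][0] == refs[0]
maxfrag = maxfrag_accuracy(preds, refs)
```

The prediction fails an exact-match comparison but is counted as correct under MaxFrag, because the largest reactant is identical.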

Perspectives in Retro-Biosynthesis

Although many approaches have been developed for retrosynthesis, only a few have been specifically designed for retro-biosynthesis. These methods are summarized in Table 2. Indeed, one of the prerequisites for a reaction to be biocatalyzed is, at the very least, to determine whether the reaction could be catalyzed by an enzyme.⁸⁰ Additionally, enzymatic stability is achieved only within narrow ranges of temperature, pH, and pressure values. It is therefore essential to accurately characterize these reaction elements to ensure the feasibility of reactions.⁸² The reaction solvent also plays a crucial role in enhancing reaction yield and in making chemical reactions more sustainable. Considering all these factors could lead to the development of a dedicated score for retro-biosynthesis to compare the suggestions made by the algorithm. Before opting for a retro-biosynthesis algorithm, users must decide whether the results are meant to be highly exploratory and for theoretical consideration or whether they are intended for practical application. Indeed, if the user is open to novel reactions predicting a hypothetical substrate, the use of generative models is suitable.³¹ In addition, employing SELFIES enables the generation of reactions not present in the data set while ensuring the production of coherent molecules.⁴⁶ However, for the in vivo implementation of reactions, it is recommended to rely on known reaction mechanisms and therefore to prefer template-based methods. We believe that the numerous approaches developed in retrosynthesis can serve as a source of inspiration for retro-biosynthesis.

Table 2. Single-Step Retro-Biosynthesis Methods.

Single-step	Framework	Description	Focused on stereochemistry	Enzyme information	Data set
Template-based	EHreact⁴⁰	Templates are stored in a Hasse diagram and ranked according to their similarity	Yes	Name, EC number	Brenda, RetroRules, and Rhea
	retrosim_enz⁴³	Template ranking based on similarity and evolution scoring	Yes	Name	Rhea
	RingBreaker⁵²	Template ranking based on synthesis of ring systems	Yes	No	Reaxys and USPTO
Template-free	BioNavi-NP⁷³	Transformer trained on organic and biosynthetic reactions	Yes	No	MetaNetX and USPTO
	Kreutter et al.⁷⁴	Enrich SMILES with enzyme name using Transformer (forward prediction)	Yes	Name	Reaxys and USPTO
	Probst et al.⁷⁵	Enrich SMILES with EC number using Transformer	Yes	EC number	Brenda, MetaNetX, PathBank, and Rhea
Semitemplate-based	Thakkar et al.⁸⁰	Enrich SMILES with EC number using a prompt-based method	No	EC number	Pistachio and USPTO

Multistep Retrosynthesis

Multistep retrosynthesis, as introduced in Corey’s seminal work, is a strategic framework for synthesizing complex molecules by reversing known chemical reactions to break down a target molecule into simpler predecessors. The process involves multiple steps, varying in number based on the molecule’s complexity, available reactions, and starting materials. The ultimate aim is to develop efficient synthetic routes using these building blocks, employing available reactions for practical implementation. Originally focused on simplifying molecules, the concept has evolved to accommodate the synthesis of intermediates that may be more complex than the target, as seen in natural biosynthesis processes involving complex structures enhanced by cofactors like phosphorylated intermediates or coenzymes such as coenzyme A. The principles of multistep retrosynthesis now also apply to biocatalysis and synthetic biology, leveraging biological databases to guide reaction selection.

While algorithms suggesting viable single-step retrosynthesis pathways provide a critical foundation, forecasting multistep retrosynthesis pathways introduces substantial challenges. The complexity arises from the enormous number of potential synthesis routes and the subjective nature of determining a “good” synthesis route.

Chemists and biochemists grapple with these complexities, faced with a broad spectrum of potential intermediates and differing views on what constitutes an optimal retrosynthetic pathway. Key components of a comprehensive multistep retrosynthesis search, namely the planning algorithm, the selection of starting materials, and the single-step reaction predictor, are illustrated in Figure 4.

Figure 4. General principle of multistep retrosynthesis, represented as an AND/OR tree in which circles represent molecules (OR nodes, because multiple reactions can synthesize a given product) and squares represent reactions (AND nodes, because all reactants are necessary to produce the product). Starting from a target molecule (yellow ●), single-step retrosynthesis is used at each step of the retrosynthesis to reach building blocks (red ●). In chemistry and biocatalysis, the building blocks are commercially available molecules, while in synthetic biology, the building blocks are molecules present in living cells. Because the search space of possible synthesis routes is vast, a scoring function, as explained in the next section, and a search algorithm guide the navigation over synthetic possibilities during retrosynthetic planning. For simplicity, the planning algorithms are explained with graphs instead of AND/OR trees.

Availability of Building Blocks

The set of available building blocks, sometimes referred to as precursors or sinks, that could be used as starting material is a critical element influencing an algorithm’s capabilities to predict synthetic routes. Indeed, it is intuitive that a more diverse and extensive collection of building blocks will provide a broader “landing pad” for algorithms for ending retrosynthesis explorations.

In chemistry and biocatalysis, the building blocks often consist of commercially available chemicals, as in Probst et al.,⁷⁵ indexed from online portals like eMolecules or from chemical providers such as the Sigma-Aldrich catalog. Additional available information, such as the chemical price, can also be taken into account for guiding and ranking the retrosynthesis planning, as in Zhang et al.⁸³

The planning of biosynthetic pathways within living organisms poses a challenge in selecting exploitable building blocks because the entry of molecules into cells is highly selective. The building blocks are more specific and usually encompass sets of molecules naturally present in organisms, as in Koch et al.,⁸⁴ where the available precursors were extracted from a genome-scale metabolic model of the Escherichia coli bacterium.

Planning and Search Algorithms

Predicting potential synthesis routes relies heavily on search algorithms that navigate through various possibilities within a chemical space, as shown in Figure 4. These algorithms are crucial for effectively mapping out multistep pathways. Broadly, these search algorithms fall into two categories: uninformed searches and informed searches. Uninformed searches, such as depth-first and breadth-first searches, operate without relying on additional information to guide the exploration toward a specific area of the solution space. Conversely, informed searches incorporate heuristic functions that assess “how good” compounds are to be expanded. These heuristics may focus solely on the compounds discovered thus far, as in the beam searches mentioned in this review. Alternatively, they may also estimate the proximity to a solution, using rollout simulations as in Monte Carlo Tree Search (MCTS) or value function estimators as in A*-related algorithms. Overall, these heuristics guide the search process, ensuring a more efficient exploration. While informed search approaches do not guarantee an optimal solution, they significantly enhance the likelihood of finding a good solution within a reasonable time frame, balancing solution quality with search efficiency. In essence, these algorithms play a pivotal role in optimizing retrosynthesis planning by navigating through complex possibilities to derive viable synthetic routes. The retrosynthesis graph is generally represented as an AND/OR tree, where an OR node corresponds to a molecule, since several reactions may synthesize this product, while an AND node corresponds to a reaction, since all the reactants are required to synthesize the product. This graph can also be viewed as a hypergraph in which an edge can connect several nodes, thus representing the link between products and substrates.²² We now briefly present the main types of search algorithms recently used for multistep retro(-bio)synthesis.
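The AND/OR semantics described above can be expressed as a short recursive check. In this sketch, the reaction network is a hypothetical dictionary mapping each product to a list of reactant tuples; a depth limit is included so that cycles in the network cannot cause infinite recursion.

```python
def is_solved(molecule, reactions, building_blocks, depth=5):
    """Decide whether a molecule can be traced back to building blocks.

    OR node:  a molecule is solved if it is a building block or if at
              least one reaction producing it is solved.
    AND node: a reaction is solved only if all of its reactants are solved.

    reactions: dict mapping a product to a list of reactant tuples
               (hypothetical format chosen for this example)
    """
    if molecule in building_blocks:
        return True
    if depth == 0:
        return False
    return any(
        all(is_solved(r, reactions, building_blocks, depth - 1) for r in reactants)
        for reactants in reactions.get(molecule, [])
    )


network = {
    "T": [("A", "B"), ("C",)],  # two alternative reactions produce T
    "A": [("D",)],
    "C": [("E", "F")],
}
solved_with_bd = is_solved("T", network, building_blocks={"B", "D"})
solved_with_be = is_solved("T", network, building_blocks={"B", "E"})
```

With building blocks B and D, the route T ← (A, B), A ← D is complete. With B and E, the first reaction fails because A cannot be made, and the second fails because F is missing even though E is available, which illustrates the AND condition.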

Breadth-First Search

The breadth-first search algorithm is a textbook case of graph traversal. It explores a graph by visiting all nodes at the present depth before moving on to nodes at the next depth level. This process continues until all reachable nodes have been visited or, in a pragmatic retrosynthesis context, until a certain depth is reached. Although quite slow, this algorithm is useful for finding short routes. In the domain of retro(-bio)synthesis, a filter is classically applied between level expansions to exclude nodes that are unlikely to participate in plausible solutions, thereby limiting combinatorics. Breadth-first search has been used in chemistry to predict thermodynamically feasible pathways, as in the analytic tool RetroSynX.⁸⁵ Here, reactions that are unlikely to occur are filtered out before advancing to the next depth level. Similarly, in the field of biocatalysis, Liu et al.⁸⁶ employ thermodynamic estimations as a filtering criterion to mitigate combinatorial explosion during breadth-first searches.
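A depth-limited breadth-first expansion with a filter between levels might look as follows. This is a sketch under stated assumptions: the rule dictionary, the function name `bfs_levels`, and the `keep` predicate (standing in for a thermodynamic estimate) are hypothetical and do not reproduce RetroSynX or the method of Liu et al.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical single-step rules: product -> alternative tuples of reactants.
Rules = Dict[str, List[Tuple[str, ...]]]


def bfs_levels(
    target: str,
    rules: Rules,
    building_blocks: Set[str],
    keep: Callable[[str, Tuple[str, ...]], bool],
    max_depth: int,
) -> List[List[str]]:
    """Return the compounds first reached at each depth level, starting from the target."""
    levels = [[target]]
    seen = {target}
    frontier = [target]
    for _ in range(max_depth):
        next_frontier: List[str] = []
        for product in frontier:
            if product in building_blocks:
                continue  # available compounds are not expanded further
            for reactants in rules.get(product, []):
                if not keep(product, reactants):
                    continue  # filter applied before moving to the next level
                for reactant in reactants:
                    if reactant not in seen:
                        seen.add(reactant)
                        next_frontier.append(reactant)
        if not next_frontier:
            break
        levels.append(next_frontier)
        frontier = next_frontier
    return levels


toy_rules: Rules = {"T": [("A", "B"), ("X",)], "B": [("C",)], "X": [("Y",)]}
# Placeholder filter: pretend any reaction that requires X is infeasible.
levels = bfs_levels("T", toy_rules, {"A", "C"}, lambda p, r: "X" not in r, max_depth=3)
```

With the filter in place, the branch through X is never explored, which is the mechanism by which such filters limit combinatorics.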

Beam Search

Beam search is now widely used in generative AI methods. It constructs a collection of possible routes in a breadth-first manner but imposes a predefined limit, named the beam width, on the number of nodes expanded at each depth level, selecting the nodes to keep from heuristic evaluations. Often presented as an enhancement of the breadth-first algorithm, beam search offers improved efficiency in finding solutions, especially in large solution spaces. This algorithm has been used in chemistry, such as in the work of Schwaller et al.,²² where promising chemicals are chosen for further expansion based on a synthetic complexity score, SCScore, and generative single-step log-probabilities. In addition to these metrics, Kreutter et al.⁸⁷ incorporated a route penalty score, RPScore, in the heuristic evaluations. In retro-biosynthesis, the classification and availability of enzymes are crucial factors. For example, Probst et al.⁷⁵ rely on scores that consider EC number annotations and SCScore to select chemicals for expansion. In synthetic biology, the RetroPath2.0⁶⁰ software integrates beam search with RetroRules scores to prioritize reactants associated with high confidence in enzyme availability.
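The effect of the beam width can be shown on a toy problem. In this sketch, a search state is the set of compounds still to be made, and states are ranked by the summed complexity of their open compounds. The `complexity` values are placeholders rather than real SCScore outputs, the function `beam_search` is hypothetical, and expanding only the alphabetically first open compound of each state is a simplification chosen for determinism.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Rules = Dict[str, List[Tuple[str, ...]]]
Route = List[Tuple[str, Tuple[str, ...]]]


def beam_search(
    target: str,
    rules: Rules,
    building_blocks: Set[str],
    complexity: Dict[str, float],
    beam_width: int,
    max_depth: int,
) -> Optional[Route]:
    """Return the first route whose open compounds are all resolved, or None."""
    start = frozenset() if target in building_blocks else frozenset([target])
    beam: List[Tuple[FrozenSet[str], Route]] = [(start, [])]
    for depth in range(max_depth + 1):
        for open_set, route in beam:
            if not open_set:
                return route  # nothing left to make: route is complete
        if depth == max_depth:
            break
        candidates: List[Tuple[FrozenSet[str], Route]] = []
        for open_set, route in beam:
            product = min(open_set)  # deterministic choice of compound to expand
            for reactants in rules.get(product, []):
                still_open = {r for r in reactants if r not in building_blocks}
                new_open = (open_set - {product}) | still_open
                candidates.append((new_open, route + [(product, reactants)]))
        # Lower summed complexity is better; keep only `beam_width` states.
        candidates.sort(key=lambda c: sum(complexity.get(m, 1.0) for m in c[0]))
        beam = candidates[:beam_width]
    return None


toy_rules: Rules = {"T": [("A", "B"), ("X",)], "B": [("C",)], "X": [("Y",)]}
toy_complexity = {"B": 2.0, "X": 1.0, "Y": 5.0}
narrow = beam_search("T", toy_rules, {"A", "C"}, toy_complexity, beam_width=1, max_depth=3)
wide = beam_search("T", toy_rules, {"A", "C"}, toy_complexity, beam_width=2, max_depth=3)
```

With a beam width of 1, the search commits to the seemingly simpler intermediate X and reaches a dead end; with a width of 2, the route through B is retained and found. This illustrates the usual trade-off between beam width and the risk of discarding a viable route.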

Depth-First Proof Number Search (DFPN)

The proof number search algorithm is a tree search method primarily used in game tree solving. It evaluates game positions by assigning proof numbers (values indicating a win) and disproof numbers (values indicating a loss) to nodes in the game tree. DFPN is a variant that conducts a depth-first search while updating these numbers, aiming to prove or disprove the existence of a forced win within a specific depth limit. It prunes branches of the search tree based on these proof and disproof numbers to focus on the most promising lines of play. This search algorithm is used in chemistry in DFPN-E,⁸⁸ where the authors couple DFPN with a heuristic edge initialization method to address the imbalance between the numbers of OR versus AND moves; the method was later improved to output multiple solutions.⁸⁹ The CompRet framework⁹⁰ proposes a comprehensive tool to enumerate and rank possible routes to synthesize compounds, using metrics derived from the SCScore to recommend the most promising routes.
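The bookkeeping behind proof and disproof numbers can be sketched on a retrosynthetic AND/OR tree, using the standard proof-number conventions from game tree search: a proved leaf gets (0, ∞), a disproved leaf (∞, 0), and an unexpanded node (1, 1). This sketch only computes the numbers; it does not implement the threshold-driven depth-first control of DFPN or the edge initialization of DFPN-E. The function names and the toy rules are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Rules = Dict[str, List[Tuple[str, ...]]]
INF = math.inf


def pn_molecule(mol: str, rules: Rules, blocks: Set[str], depth: int) -> Tuple[float, float]:
    """(proof, disproof) numbers of a molecule, i.e. an OR node."""
    if mol in blocks:
        return (0, INF)  # proved: the compound is available
    options = rules.get(mol, [])
    if not options:
        return (INF, 0)  # disproved: no known way to make it
    if depth == 0:
        return (1, 1)  # unexpanded frontier node
    children = [pn_reaction(r, rules, blocks, depth) for r in options]
    # OR node: cheapest child to prove, all children needed to disprove.
    return (min(pn for pn, _ in children), sum(dn for _, dn in children))


def pn_reaction(reactants: Tuple[str, ...], rules: Rules, blocks: Set[str], depth: int) -> Tuple[float, float]:
    """(proof, disproof) numbers of a reaction, i.e. an AND node."""
    children = [pn_molecule(m, rules, blocks, depth - 1) for m in reactants]
    # AND node: all children needed to prove, cheapest child to disprove.
    return (sum(pn for pn, _ in children), min((dn for _, dn in children), default=INF))


toy_rules: Rules = {"T": [("A", "B"), ("X",)], "B": [("C", "D")], "X": [("Y",)]}
shallow = pn_molecule("T", toy_rules, {"A", "C"}, depth=1)
proved = pn_molecule("T", toy_rules, {"A", "C", "D"}, depth=2)
```

In the shallow case, one more expansion would suffice to prove the target and two to disprove it; once D is available and the tree is expanded one level further, the target is proved.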

Monte Carlo Tree Search (MCTS)

Monte Carlo tree search is an informed heuristic search algorithm often used in decision-making processes. It builds a search tree by repeatedly simulating random sequences of moves, which are single-step transformations in the context of retrosynthesis, from selected leaves (e.g., compounds) of the tree. The algorithm prioritizes exploring promising paths by balancing exploitation (focusing on the most promising nodes) against exploration (focusing on alternative potential routes) to make informed decisions. Within the past few years, this algorithm has been widely used for retrosynthetic route planning in chemistry.^66,91−97 Segler et al.⁹¹ pioneered MCTS for retrosynthesis by combining rule-based single-step transformations and three NNs to assist the exploration of the chemical space. In this study, AI is notably used to preselect the templates to apply, limiting combinatorial explosion while directing searches toward the most plausible routes. This advantageous integration of AI has been implemented in several other works combining NNs and template-based approaches in MCTS, such as ASKCOS⁹⁸ and AiZynthFinder,⁹⁶ as well as the work of Zhang et al.,⁹² where the authors propose an efficient combination of five GNNs. The data source for building templates is an important factor, as highlighted by Thakkar et al.,⁹⁴ wherein the impact of four data sets (AiZynthFinder, Pistachio, Reaxys, and USPTO) on MCTS performance is investigated. Furthermore, the configuration of the model significantly influences the success of route discovery.⁹⁹ Template reactions model a finite number of transformations, which may lack exhaustiveness. As an alternative to the template-based formalism, template-free single-step approaches have also been employed in MCTS. For instance, Lin et al. implement single-step predictions using Transformers in AutoSynRoute.⁹⁵ Studies falling within the field of biocatalysis are far fewer. However, reuses of ASKCOS have been proposed to exploit template-based transformations coming from both chemical and metabolic reaction databases.^100,101 As far as we know, RetroPath RL⁸⁴ stands as the sole implementation of the MCTS algorithm for synthetic biology, where rule-based transformations have been extracted from metabolic databases and the list of available building blocks from genome-scale metabolic models.
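The balance between exploitation and exploration in the selection step of MCTS is commonly implemented with an upper-confidence-bound rule. The sketch below uses the textbook UCB1 formula as an assumption; the reviewed tools may use other variants (for instance, policy-weighted rules), and the names `ucb1`, `select_child`, and the toy statistics are hypothetical.

```python
import math
from typing import Dict, Tuple


def ucb1(total_value: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Mean value (exploitation) plus a bonus that shrinks with visits (exploration)."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children: Dict[str, Tuple[float, int]], parent_visits: int, c: float = 1.4) -> str:
    """Pick the child (e.g., a candidate reaction) with the highest UCB1 score.

    `children` maps a child name to (total_value, visits).
    """
    return max(children, key=lambda name: ucb1(*children[name], parent_visits, c))


# Toy statistics: rxn_a has the better mean (0.80 vs 0.75) but was visited more often.
stats = {"rxn_a": (8.0, 10), "rxn_b": (1.5, 2)}
explore_choice = select_child(stats, parent_visits=12)
greedy_choice = select_child(stats, parent_visits=12, c=0.0)
```

With the exploration constant set to zero the rule is purely greedy and picks `rxn_a`; with the default constant, the rarely visited `rxn_b` is preferred.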

A* Search

A* search is an informed graph traversal algorithm used for pathfinding and optimization in graphs or search problems. It determines the priority of nodes for exploration by considering both the cost-so-far (i.e., historical cost from the starting node) and an estimated cost-to-go (i.e., future cost to reach a goal node). By using this evaluation function, occasionally designated as the value function, A* aims to find the optimal path from the start to the goal node while minimizing the total cost. Compared to MCTS, this algorithm does not have a rollout phase and therefore does not depend on randomness and is faster. This search algorithm has gained popularity in recent years and has been used in several chemical planning tools such as ASICS,¹⁰² Retro*,¹⁰³ RetroGraph,¹⁰⁴ and GNN-Retro¹⁰⁵ for chemistry applications and BioNavi-NP⁷³ for biosynthetic pathway predictions. While the A* algorithm serves as the multistep engine for guiding route discovery, it can be combined with different flavors of single-step moves, such as template-based transformations as in Retro*, where a NN selects a template to be applied depending on the product molecule, or template-free moves as in BioNavi-NP, which relies on a Transformer for predicting the reactants given the product. The A* search is one variant of best-first search algorithms. Other non-data-driven AI-based approaches have been developed, such as greedy best-first search, that prioritize exploration based solely on the cost incurred so far (historical cost) without attempting to predict the future cost to reach a goal node. One example is SynRoute, where the authors demonstrated this greedy approach to be the most effective among four planning algorithms tested.¹⁰⁶ Another prime example is Synthia¹⁰⁷ (previously known as Chematica), a well-known commercial software for synthesis planning. This expert system utilizes a comprehensive set of rules for reaction-template selection and application, alongside a dual scoring function for choosing chemicals for the next retrosynthesis iteration. While not described as such in the literature, Synthia’s exploration strategy can be classified as a best-first search.⁶¹
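The combination of cost-so-far and estimated cost-to-go can be sketched on a toy problem. In this illustration, a state is the set of compounds still to be made, each reaction carries a hypothetical cost, and the cost-to-go of a state is the sum of per-compound estimates (a stand-in for a learned value function). None of this reproduces Retro* or BioNavi-NP; the function `a_star` and all numbers are assumptions.

```python
import heapq
import math
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

CostedRules = Dict[str, List[Tuple[float, Tuple[str, ...]]]]
Route = List[Tuple[str, Tuple[str, ...]]]


def a_star(
    target: str,
    rules: CostedRules,
    building_blocks: Set[str],
    cost_to_go: Dict[str, float],
    max_expansions: int = 1000,
) -> Optional[Tuple[float, Route]]:
    """Return (total cost, route) for the cheapest route found, or None."""

    def h(open_set) -> float:
        return sum(cost_to_go.get(m, 0.0) for m in open_set)

    start = frozenset() if target in building_blocks else frozenset([target])
    tie = count()  # unique tie-breaker so the heap never compares states
    frontier = [(h(start), next(tie), 0.0, start, [])]
    best_g = {start: 0.0}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, g, open_set, route = heapq.heappop(frontier)
        if not open_set:
            return g, route  # every compound is resolved
        if g > best_g.get(open_set, math.inf):
            continue  # stale entry: a cheaper path to this state exists
        expansions += 1
        product = min(open_set)  # deterministic choice of compound to expand
        for step_cost, reactants in rules.get(product, []):
            still_open = {r for r in reactants if r not in building_blocks}
            new_open = (open_set - {product}) | still_open
            new_g = g + step_cost
            if new_g < best_g.get(new_open, math.inf):
                best_g[new_open] = new_g
                priority = new_g + h(new_open)  # cost-so-far + cost-to-go
                heapq.heappush(
                    frontier,
                    (priority, next(tie), new_g, new_open, route + [(product, reactants)]),
                )
    return None


toy_rules: CostedRules = {
    "T": [(1.0, ("A", "B")), (1.0, ("X",))],
    "B": [(1.0, ("C",))],
    "X": [(5.0, ("C",))],
}
result = a_star("T", toy_rules, {"A", "C"}, cost_to_go={"B": 1.0, "X": 1.0})
```

Because the estimates in this toy never overestimate the true remaining cost, the first completed route returned is the cheapest one (total cost 2.0 through B, rather than 6.0 through X).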

Other Reinforcement Learning Related Search

Similar to MCTS and A* search algorithms, other types of reinforcement learning methods can be found in the literature. Through iterative exploration and learning, an agent powered by an NN¹⁰⁸ refines its decision-making process, effectively navigating the graph of possible reactions to identify the most efficient pathways for retrosynthesis. This approach has been used in chemistry^109,110 and in biocatalysis.⁸³

Table 3 summarizes various search algorithms used for multistep retrosynthesis, along with their respective scope of application in chemistry, biocatalysis, and synthetic biology. Within retro-biosynthesis, it is important to acknowledge the existence of foundational methods like novoStoic,¹¹¹ XTMS,¹¹² or BNICE.ch.¹¹³ These methods offer distinct approaches to metabolic pathway design, diverging from the AI-based algorithms discussed in this review. NovoStoic utilizes a template-based reaction formalization within a stoichiometric modeling framework, coupled with Mixed Integer Linear Programming (MILP) to efficiently enumerate metabolic pathways. XTMS constructs a retrosynthetic network using template-based reactions named reaction signatures. This network then serves as a starting point to extract all possible biosynthetic pathways connecting a preset of known compounds to the E. coli chassis organism. Similar to XTMS, BNICE.ch uses template-based reactions to build a comprehensive biochemical network, referred to as ATLAS,¹¹⁴ which is then inspected to exhaustively enumerate linear routes (i.e., no branched pathways). However, in both approaches, the prebuilt networks prevent users from exploring entirely new compounds, hindering generalizability. Readers seeking more details on these techniques can find comprehensive reviews elsewhere.^2,115

Table 3. Search Algorithms Commonly Used in Multi-Step Retrosynthesis^a.

Multi-step	Scope of application	Framework	Important feature(s)/Highlights	Building block source	Single-step	Code availability
Breadth-first	Chemistry	RetroSynX⁸⁵	Intermediate chemicals filtered using thermodynamics estimation	aladdin-e.com	Template-based	No
	Biocatalysis	Liu et al.⁸⁶	Intermediate chemicals filtered using thermodynamics estimation	aladdin-e.com	Template-based	No
Beam	Chemistry	Schwaller et al.²²	Most promising chemicals are selected for further expansion based on SCScore and generative single-step log-probabilities	eMolecules	Template-free	No
		Kreutter et al.⁸⁷	Further extends beam selection from Schwaller et al.²² by using RPScore	Enamine, Molport		Yes
	Biocatalysis	Probst et al.⁷⁵	Beam selection considering enzyme classification and SCScore	eMolecules	Template-free	Yes
	Synthetic Biology	RetroPath2.0⁶⁰	Beam selection based on enzyme availability estimations	Metabolic model	Template-based	Yes
DF Proof Number	Chemistry	CompRet⁹⁰	Proof and disproof numbers according to reaching of building blocks	Enamine	Template-based	Yes
		DFPN-E⁸⁸	Use of an estimator to assess the difficulty of finding a proof + attainment of building blocks	USPTO		No
MCTS	Chemistry	Gao et al.⁹⁷	Two NNs used for template selection and reaction filtering	NA	Template-based	Yes
		Segler et al.⁹¹	3N-MCTS method: three NNs in use for template selection (expansion), feasibility, and rollouts	AlfaAesar, Acros, Reaxys, Sigma-Aldrich, ZINC		No
		AiZynthFinder⁹⁶	NN guided template selection	ZINC		Yes
		ASKCOS⁹⁸	Two NNs for template selection and reaction filtering	eMolecules, Sigma-Aldrich		Yes
		Wang et al.⁹³	Reinforcement learning network used instead of the MCTS rollout step	eMolecules, Sigma-Aldrich		No
		Zhang et al.⁹²	Five GNNs used for selecting templates, infer reaction solvent and catalyzer, filter reaction, and efficiently evaluate the rollout step	Molport		Yes
	Chemistry	AutoSynRoute⁹⁵	Rollouts guided by a heuristic based on log probabilities from Transformer output	Sigma-Aldrich, USPTO, and ZINC	Template-free	Yes
	Biocatalysis	Sankaranarayanan et al.¹⁰⁰	Two NNs for template selection and reaction filtering applied to both	eMolecules, LabNetwork, Sigma-Aldrich	Template-based	No
		Levin et al.¹⁰¹	ASKCOS reimplementation using NN for template prioritization and balancing between chemical vs enzymatic templates	eMolecules, Sigma-Aldrich		Yes
	Synthetic Biology	RetroPath RL⁸⁴	Template selection according to reaction feasibility and enzyme confidence scores	Metabolic model	Template-based	Yes
A*	Chemistry	Retro*¹⁰³	Template selection using NN, cost of current paths estimated from the cost of current reactions, cost of future paths learned from NN trained from knowledge database	eMolecules	Template-based	Yes
		Retro*+¹¹⁶	Template selection is coupled during the search with the actual already predicted reactions, using self-improving learning	eMolecules		No
		RetroGraph¹⁰⁴	Cost of future routes estimated using an offline trained GNN, multitarget search using a graph search instead of a tree	eMolecules		No
		GNN-Retro¹⁰⁵	Cost of future routes estimated using an offline trained GNN	NA		No
		ASICS¹⁰²	Combines known reactions extracted from knowledge databases with template-based prediction, SAScore is used to estimate the cost of route toward the goal	eMolecules		Yes
	Biocatalysis	BioNavi-NP⁷³	Combination of chemical and biochemical data sets with the use of transfer learning to set up the single-step transformer, cost of future estimated with a NN	Custom list, main precursors of natural products	Template-free	Yes
Best-first	Chemistry	SynRoute¹⁰⁶	NN used to evaluate a feasibility score of predicted transformations	eMolecules	Template-based	No
	Chemistry	Synthia⁶¹	Chemical and reaction function scores are combined for selecting the next graph expansion	Sigma-Aldrich	Template-based	No
	Biocatalysis	RetroBioCat⁶²	SCScore used to guide the best-first search	eMolecules, Molport, ZINC	Template-based	Yes
Reinforcement learning	Chemistry	Schreck et al.¹⁰⁹	Improvement of node selection (policy) by training a NN using simulated experience	eMolecules, Sigma-Aldrich, LabNetwork	Template-based	Yes
		GRASP¹¹⁰	GRASP follows an MCTS-like approach, where the vanilla online roll-out is replaced by the reinforcement learning agent	eMolecules	Template-free	No
	Biocatalysis	Zhang et al.⁸³	A “decision maker” NN is trained to bias exploration, with a random component that promotes exploration of dissimilar routes	ChemSpace	Known reactions	No

^a NA, not applicable.

Perspectives in Retro-Biosynthesis

Various algorithms have been developed, yet none has demonstrated clear superiority in the quality of its results. Depending on the application, it is important to pay attention to model parametrization for algorithms such as MCTS⁹⁹ or A* and to define an evaluation function that is tailored to the data.⁷³ Also, reducing the production cost of a molecule is often a goal in the search for new synthesis pathways.¹¹⁷ Consequently, the cost of building blocks and, more broadly, atom economy become significant criteria in the quest for innovative synthesis routes.⁸³ In this context, the use of enzymes that do not depend on cofactors is particularly appealing. For in vitro applications, although the addition of cofactors can regulate catalytic activity, this represents a cost factor that must be considered. To reduce this cost, cofactor recycling that utilizes photosensitization, electrochemical activation,¹¹⁸ or the creation of enzymatic cascades is a promising strategy. To the best of our knowledge, retro-biosynthesis tools do not currently support the generation of pathways recycling cofactors. For in vivo applications, although implementing molecular cascades in a cellular host allows for the use of compounds that would be unstable if isolated,¹¹⁹ the use of cofactors can lead to competition between cell growth and the production of the desired chemical species, which is not considered by retro-biosynthesis tools.

Scoring Function

Selecting the most promising reactions and pathways to synthesize the target molecule is a crucial component for guiding the retrosynthesis planning process. The scoring functions outlined in Table 4 assist in navigating through the multiple synthetic possibilities encountered during retrosynthetic planning and route enumeration. These functions rely on various criteria, including factors such as chemical cost, structural considerations, and insights from enzyme knowledge.

Table 4. List of Scores Used to Evaluate the Routes.

Classification	Purpose	Scope of application	Score
Evaluate molecule	Structure availability	Chemistry	GASA⁴⁸ and SCScore¹³⁴
	Reaction availability	Chemistry	RAscore¹²¹ and ICHO⁵¹
Reaction condition	Experiments in continuous reactor	Chemistry	InFlow¹²⁹
	Liquid–liquid extraction	Chemistry	ExtractionScore¹³²
	Reaction yield	Chemistry	Yield-Bert¹²⁸
	Enzyme availability	Biocatalysis	DeepRFC¹⁴⁵ and EHReact⁴⁰
	Compound price	Chemistry	CoPriNet¹²⁶
	Thermodynamic	Multidisciplinary	eQuilibrator,¹³⁶ dGPredictor,¹³⁹ and GC-NORM-based¹²⁷
Pathway related	Predict the number of steps	Chemistry	CMPNN¹²²
		Drug relevant application	RetroGNN¹⁵² and DFRScore¹²³
		Synthetic biology	FCNN¹²⁴

Synthetic Accessibility Scores

Synthetic accessibility scores can discriminate feasible molecules from infeasible ones and help retrosynthesis planning tools distinguish viable synthetic routes from impractical ones.¹²⁰ The graph attention-based assessment of synthetic accessibility (GASA) score evaluates the synthetic accessibility of small molecules by labeling compounds as “easy” or “hard” to synthesize.⁴⁸ Rather than relying solely on the structure of the molecule, RAScore¹²¹ evaluates the feasibility of synthesis by incorporating reaction information into the assessment, and similar strategies predict the probability of finding molecules involved in a reaction included in a database.⁵¹ Also, as a surrogate for synthetic accessibility, the number of steps required to produce a compound is estimated in chemistry,¹²² drug-relevant,¹²³ and biological¹²⁴ applications or has been integrated into a composite score.¹²⁵ Moreover, scores provide estimates for data concerning the compound, including its price¹²⁶ or its thermodynamic properties,¹²⁷ and concerning the reaction, including its yield¹²⁸ or its feasibility.^129−132

Routes Ranking

Regardless of the specific multistep algorithm employed, numerous retrosynthetic routes are typically generated, necessitating strategies to identify the most promising ones. To this end, route ranking strategies have been devised, integrating discriminative criteria related to chemicals (e.g., SAScore,¹³³ SCScore,¹³⁴ or cost of building blocks), to reactions (e.g., RAScore,¹²¹ reaction yield and thermodynamics, or enzyme availability), and overall properties of route (e.g., route diversifiability, number of reaction steps, or theoretical production flux). For instance, the SynRoute¹⁰⁶ framework evaluates routes based on the length, cost of building blocks, and reaction yield estimations to select an optimal pathway. In biocatalysis, RetroBioCat prioritizes pathways by considering the number of reaction steps, change in chemical complexity, the proportion of commercially available chemicals, and the linkage of steps to literature references. Interestingly, a diversity score is added to penalize pathways making use of reactions already ranked in top routes.⁶² Meanwhile, in synthetic biology, the Galaxy-SynBioCAD platform combines multiple criteria including enzyme availability, theoretical product flux (via Flux Balance Analysis¹³⁵), reaction thermodynamics (via eQuilibrator¹³⁶), and step count to train a classifier model for pathway scoring and ranking.¹³⁷ Additional notable efforts include a “route diversifiability” score that facilitates comparison of synthetic pathways¹³⁸ based on the potential production of analogous chemicals, and the RPScore,⁸⁷ which assesses routes by step count and molecular synthetic accessibility. More broadly, developments that predict Gibbs free energy (ΔG), such as eQuilibrator¹³⁶ and dGPredictor,¹³⁹ are valuable tools for both filtering single-step predictions and assessing the thermodynamic feasibility of overall pathways.
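A route ranking that combines several criteria can be sketched as a weighted score. The example below is only an illustration of the general idea: the criteria (step count, building block cost, overall yield) are among those mentioned above, but the `Route` class, the linear form of `route_score`, and the weights are hypothetical and do not correspond to SynRoute, RetroBioCat, or Galaxy-SynBioCAD.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    name: str
    n_steps: int
    building_block_cost: float  # arbitrary currency units
    step_yields: List[float]    # estimated fractional yield of each step


def overall_yield(route: Route) -> float:
    """Overall yield of a linear route is the product of its step yields."""
    return math.prod(route.step_yields)


def route_score(route: Route, w_steps: float = 1.0, w_cost: float = 0.1, w_yield: float = 5.0) -> float:
    """Lower is better: penalize length and cost, reward overall yield."""
    return (
        w_steps * route.n_steps
        + w_cost * route.building_block_cost
        - w_yield * overall_yield(route)
    )


def rank_routes(routes: List[Route]) -> List[Route]:
    return sorted(routes, key=route_score)


candidates = [
    Route("r1", 3, 20.0, [0.9, 0.8, 0.9]),
    Route("r2", 2, 50.0, [0.7, 0.6]),
    Route("r3", 5, 5.0, [0.9] * 5),
]
ranking = [r.name for r in rank_routes(candidates)]
```

With these weights, the shortest route (`r2`) ranks last because of its expensive building blocks and low yield, which shows how the choice of weights determines what "most promising" means in practice.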

Enzyme Search

Transitioning from synthesis planning to biocatalysis and synthetic biology implementations requires identifying catalysts for predicted reactions, a crucial step in biosynthesis where enzymes play a predominant role. While numerous methods for enzyme selection exist,¹⁴⁰ only a few specifically tackle the challenges posed by retro-biosynthesis. The primary task in this context is taking a reaction defined by the chemical structures of its reactants and products and outputting a collection of candidate amino acid sequences that could catalyze the reaction.

Enzyme searches typically fall into two main categories: (i) referencing reactions and their catalyzing enzymes in metabolic databases such as KEGG and MetaCyc, and (ii) addressing de novo reactions not found in these databases, which necessitates predictive algorithms. E-zyme,¹⁴⁰ Selenzyme,^141,142 and BridgIT¹⁴³ are three methods instrumental for retrieving enzyme sequences for de novo reactions. In brief, their strategies involve a two-step process: first, use the de novo “query” reaction to identify similar known reactions in a reference database; second, retrieve the sequences associated with the best-hit known reactions. Sequence-to-reaction associations depend on the reference database. E-zyme and BridgIT use the KEGG Ortholog and Reaction databases, whereas Selenzyme uses BIOCHEM4J, which integrates the KEGG and Rhea databases for linking sequences to reactions. Interestingly, Selenzyme extends its ranking of sequences by incorporating factors like phylogenetic distance and sequence properties such as solubility and transmembrane regions. Meanwhile, the RetroBioCat database¹⁴⁴ allows users to search for enzymes from among the RetroBioCat tool predictions. However, it does not support direct querying using specific reaction SMILES.
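The two-step retrieval strategy can be sketched as follows. This is a toy illustration and not the implementation of E-zyme, Selenzyme, or BridgIT: reactions are described by sets of placeholder feature strings instead of real reaction fingerprints, similarity is a plain Tanimoto coefficient, and the reference dictionary, reaction identifiers, and sequence identifiers are invented.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_enzymes(
    query_features: Set[str],
    reference: Dict[str, dict],
    top_n: int = 1,
) -> List[Tuple[str, str, float]]:
    """Return (sequence id, matched reaction id, similarity) for the best-hit reactions."""
    # Step 1: rank known reactions by similarity to the query reaction.
    scored = sorted(
        ((tanimoto(query_features, entry["features"]), rid) for rid, entry in reference.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Step 2: retrieve the sequences annotated to the best hits.
    hits: List[Tuple[str, str, float]] = []
    for similarity, rid in scored[:top_n]:
        for seq_id in reference[rid]["sequences"]:
            hits.append((seq_id, rid, similarity))
    return hits


# Hypothetical reference database of known reactions and their enzyme sequences.
toy_reference = {
    "R1": {"features": {"C=O", "O-H", "NADH"}, "sequences": ["seqA", "seqB"]},
    "R2": {"features": {"C-N", "N-H"}, "sequences": ["seqC"]},
}
hits = find_enzymes({"C=O", "O-H"}, toy_reference)
```

A further ranking of the retrieved sequences, for example by phylogenetic distance to the host as Selenzyme does, would be applied on top of this list.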

In addition to sequence retrieval systems, computational methods such as DeepRFC¹⁴⁵ use AI to evaluate whether a chemical reaction, given its substrate and product structures, is likely to occur. Furthermore, the EnzRank¹⁴⁶ tool aims to predict enzyme activity using as input a reactant and an enzyme sequence, effectively aiding in the selection of suitable enzymes for de novo reactions. Other methods exist that focus on different aspects of enzyme characterization, such as predicting EC numbers,^{143,147−151} further refining the selection process for appropriate enzymatic catalysts.

Perspectives in Retro-Biosynthesis

Pathway evaluation should include the thermodynamics of the reactions, the use of cofactors, the solubility of substrates, and the goals regarding the cost and sustainability of the building blocks. Estimating the viability of pathways within a cellular host ensures the reliability of predictions, but several elements must be considered, such as the choice of the host, the thermodynamics of all the reactions in the pathway, kinetic feasibility, or the presence and the accumulation of toxic compounds. It is also preferable to aim for the shortest possible pathway length and favor reactions that have already been characterized.¹⁵³ A significant number of sequential or tandem transformations in a one-pot process have been successfully carried out by combining enzyme-catalyzed reactions and metallic ions. However, it is crucial to ensure that all reactions within the pathways can share the same reaction conditions: solvent, temperature, pH, and that the catalysts can coexist with the enzymes. Indeed, some enzymes are less tolerant of the presence of metallic ions.¹⁵⁴ To enhance enzyme specificity, stability, and resistance, adaptation strategies can be pursued, and we refer the reader to these reviews for recent advances in enzyme engineering.^155,156

Data Sets

Common Databases

The use of AI for retrosynthesis relies on the quality of data and their diversity of information about molecules, reactions, pathways, and enzymes. In chemistry, the most popular benchmark data set of reactions is the US Patent and Trademark Office (USPTO) open-source database. It is composed of 3.7 million chemical reactions, and its subsets commonly used in AI models are the USPTO-50k, USPTO-full, and USPTO-MIT. Information about molecules and their properties is frequently extracted from the ChEMBL, MoleculeNet, and eMolecules data sets. Reactions have also been extracted from Reaxys and Pistachio. In biocatalysis and synthetic biology, biology databases of reactions include Rhea, RetroRules, and MetaNetX, which are open-source data sets. Databases of pathways such as the KEGG, MetaCyc, and PathBank databases are also used, along with databases of enzymes such as Brenda and UniProt. In Table 5, we summarize data sets used for retro(-bio)synthesis, their characteristics, and their scope of application in chemistry, biocatalysis, and synthetic biology.

Table 5. List of Datasets Commonly Used in AI Applied to Retro(-bio)synthesis.

Data set	Classification	Availability	Description	Scope of application
USPTO	Reactions	Public access (CC0 license)	A data set of organic reactions extracted from US Patent and Trademark Office-granted patents; it has been refined in several subsets, ranging from 50k reactions for the USPTO-50k data set to 1 M reactions for the USPTO-FULL data set	Chemistry^{7−13,19−28,30−37,39,41,42,44,45,47,49,50,52,53,55−59,66−71,76−79,87,88,94−96,102−105,116,122,132,157−161}
				Biocatalysis^29,73,80
Reaxys	Reactions	Commercial	Database that provides experimentally validated chemical data, including chemical structures, reactions, and properties	Chemistry^{23,38,52,54,59,90,91,93,94,97,109,129}
				Biocatalysis⁸³
Pistachio	Reactions	Commercial	Data set consisting of about 2,500,000 unique reactions	Chemistry^{22,45,45,94,106,110,122,162}
				Biocatalysis⁸⁰
Brenda	Enzymes, reactions	Public access (CC-BY license)	Database of enzyme catalyzed reactions, with information on alternative substrates, kinetic parameters, and protein sequences	Biocatalysis^40,75,146
				Synthetic biology¹¹¹
Rhea	Reactions	Public access (CC-BY license)	Database of biochemical reactions that uses the chemical ontology ChEBI covering enzymatic reactions and transport reactions	Biocatalysis^40,43,75
				Synthetic Biology¹¹¹
RetroRules	Reactions	Public access (CC-BY license)	Database of reaction template modeling enzyme catalyzed reactions for metabolic pathway discovery and metabolic engineering	Biocatalysis^40,145
				Synthetic biology^84,124,137
MetaNetX	Reactions	Public access (CC-BY license)	Collection of metabolites and biochemical pathways compiling more than 10 different biological databases (BiGG, ChEBI, Rhea, enviPath, HMDB, KEGG, MetaCyc, ...)	Biocatalysis^29,73,75
				Synthetic Biology^60,84,137
KEGG	Pathways	Public access with paywall for complete access	Comprehensive database integrating biological pathways, genomic, chemical, and disease information to facilitate understanding of biological systems and their functions	Biocatalysis^73,83,145
				Synthetic biology^111,113
PathBank	Pathways	Public access (open database license)	About 110,000 pathways found in 10 model organisms providing a pathway for every protein and a map for every metabolite	Biocatalysis⁷⁵
MetaCyc	Pathways	Commercial	Metabolic pathways of about 3000 pathways and 19,000 reactions and metabolites across diverse life forms, encompassing primary and secondary metabolism	Biocatalysis⁷³
				Synthetic biology¹¹¹
UniProt	Enzymes	Public access (CC-BY license)	Database of protein sequences	Chemistry¹⁶³
				Biocatalysis^43,146
				Synthetic biology^60,84,137

Data Preparation

Each algorithm dedicated to retro-biosynthesis selects reactions from databases to create reaction templates or a data set feeding AI algorithms. This selection results from either manual curation⁶² or automated extraction from one or more databases, applying specific filters to isolate relevant reactions and molecules. Some studies combine chemical and biological databases to enlarge the data set and to handle biochemical reactions.^73,83 Initially, reactions are decomposed to isolate a single product per reaction.⁷³ Then, depending on their role in these reactions, some molecules are filtered out: molecules found as products in many reactions are identified as coproducts,⁴³ while cosubstrates, essential for enzymatic catalysis, are spotted from a predefined list and are partially⁷³ or entirely excluded.⁷⁵ Because enzymes can be stereospecific, the use of stereochemistry is central in retro-biosynthesis.⁴⁰ Reactions that are not informative for this process, such as transport reactions or those without substrates, are eliminated.⁴³ Finally, the way molecules and reactions are represented is tailored to the specific needs of the algorithm.
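The preparation steps described above can be sketched as a small filtering function. This is a simplified illustration under stated assumptions: compounds are plain names rather than structures, the `COSUBSTRATES` list is a short illustrative sample, cosubstrates are removed entirely from both sides, and a reaction whose main substrates equal its main products is treated as transport-like. Frequency-based coproduct detection and stereochemistry handling are not shown.

```python
from typing import List, Tuple

# Illustrative, non-exhaustive list of cosubstrates (an assumption of this sketch).
COSUBSTRATES = {"NADH", "NAD+", "ATP", "ADP", "H2O", "H+"}

RawReaction = Tuple[List[str], List[str]]          # (substrates, products)
PreparedReaction = Tuple[Tuple[str, ...], str]     # (main substrates, single product)


def prepare(reactions: List[RawReaction]) -> List[PreparedReaction]:
    """Filter uninformative reactions and emit one entry per main product."""
    prepared: List[PreparedReaction] = []
    for substrates, products in reactions:
        main_substrates = [s for s in substrates if s not in COSUBSTRATES]
        main_products = [p for p in products if p not in COSUBSTRATES]
        if not main_substrates:
            continue  # no informative substrate
        if sorted(main_substrates) == sorted(main_products):
            continue  # transport-like: no chemical change
        for product in main_products:
            prepared.append((tuple(main_substrates), product))  # single product each
    return prepared


raw = [
    (["ethanol", "NAD+"], ["acetaldehyde", "NADH", "H+"]),
    (["glucose"], ["glucose"]),           # transport-like
    ([], ["X"]),                          # no substrate
    (["A", "ATP"], ["B", "C", "ADP"]),    # two main products
]
prepared = prepare(raw)
```

The last raw reaction yields two prepared entries, one per main product, which mirrors the decomposition into single-product reactions.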

Discussion and Outlook

While AI models in retrosynthesis and retro-biosynthesis have made significant strides, several challenges persist, requiring focused attention to enhance the models and effectively navigate the constraints inherent in their application in biocatalysis and synthetic biology. Below, we examine specific aspects, including molecular representations, model improvements, and evaluations, while also pointing out successful applications.

Single-step algorithms use multiple molecular representations or a combination thereof to gather complementary features from each representation. However, commonly used molecular representations exhibit notable limitations, such as the possibility of producing a SMILES string that does not represent a valid molecule, or fingerprints and atom environments that hinder accurate reconstruction of the molecule.¹⁸ Alternatives have been introduced to address these shortcomings, like SELFIES or molecular signatures,⁶⁰ and require further development or wider adoption. One avenue yet to be further explored is to prevent retrosynthesis exploration toward intermediate chemicals that may have undesirable properties, as in RetroPath RL, where toxic chemicals are avoided during the multistep exploration. Since properties and activities are generally well predicted using fingerprints such as ECFP,¹⁶⁴ and the same fingerprints are used in many parts of the retrosynthesis process, including single-step,⁷ multistep,¹² and scoring function,⁵¹ one could envision developing methods to perform retro(-bio)synthesis starting not from a targeted molecule but from a targeted fingerprint.

As mentioned earlier, there is a clear need for advancing template-free and semitemplate methods tailored for retro-biosynthesis, deserving focused investigation. A promising direction for future research involves the use of multimodal models that handle different kinds of data, like prompt-based methods, enhancing prediction performance by integrating additional information, such as the EC number, in addition to the molecular representation.⁸⁰ While large language models have recently showcased impressive capabilities in natural language processing and have shown promise in aggregating various types of data and leveraging automation,¹⁶⁵ their performance in retrosynthesis still remains behind that of state-of-the-art models.¹⁶⁶

The performance of AI models is significantly influenced by the availability and quality of data, while the creation of high-quality data sets is often both challenging and costly. The USPTO data set is widely used to train and evaluate single-step models. This data set, constructed from some patented syntheses not validated by experiments, suffers from unbalanced reaction classes, includes reactions with missing side products,¹⁶⁷ lacks reliable atom-mappings, and contains noisy stereochemical data.¹⁶⁸ Therefore, we believe that using data sets from well-maintained databases, although not perfect,²² is a preferable practice. Nevertheless, retrosynthetic data sets are frequently segmented into distinct subsets based on specific attributes, such as USPTO-50k or USPTO-STEREO, complicating model comparison even when assessed using the top-n metric. Accordingly, the performance of several single-step methods was aggregated by the data set.¹⁶⁹ Moreover, efforts are underway to better organize existing data, particularly through the use of AI models.¹⁷⁰ Concurrently, initiatives to improve access to biocatalytic information^144,171 have emerged. Such long-term commitments should be encouraged and promoted in the context of retro-biosynthesis to build reliable data sets.

Retro-(bio)synthesis has proven valuable across several applications. In chemistry, it has been used to conduct lead optimization of drug molecules¹⁷² and to synthesize alkaloid molecules¹⁷³ and natural products.¹⁷⁴ In the realm of synthetic biology, the Galaxy-SynBioCAD portal offers an all-in-one solution for designing metabolic pathways.¹³⁷ For instance, using RetroPath2.0, reactions were identified to produce lycopene in E. coli cells, and the pathway implementation was assessed in vivo using robotic equipment. The same platform has also been used to identify reactions crucial for producing biosensing intermediate molecules in cell-free systems.¹⁷⁵ Biocatalysis has been recognized as a method for advancing green and sustainable chemistry.³ In this regard, the potential of biofoundries to produce material monomers, assisted by human expertise and retro-biosynthesis tools, has been showcased.¹⁷⁶ Utilizing retro-biosynthetic tools, Zhang et al.¹⁷⁷ successfully produced aliphatic diamines without hazardous hydrogen cyanide, and Liu et al.¹⁷⁸ engineered cells to produce 3-phenylpropanol, circumventing petroleum-based processes. Similarly, Yiakoumetti et al.¹⁷⁹ synthesized flavonoids, avoiding extraction from plant sources, and Brito et al.¹⁸⁰ leveraged methanol as a sustainable alternative for producing 5-aminovalerate molecules. Once a pathway for the bioproduction of a molecule is established, subsequent optimization of the implemented pathway using retro-biosynthesis tools could enhance production levels. For example, the eugenol production reported by Hanko et al.¹⁸¹ nearly tripled compared with previous reports when produced in small quantities.

In conclusion, this review extensively examines the latest advancements in AI-driven methods for both retrosynthesis and retro-biosynthesis, paving the way for potential subsequent systematic reviews. The favorable outcomes observed in chemistry hold promise for the application of these methods in the field of retro-biosynthesis. As developments continue, we anticipate notable breakthroughs and increased incorporation of AI models in retro-biosynthesis, unlocking its full potential to catalyze innovation.

Glossary

Abbreviations

A*: A* search
AI: artificial intelligence
CNN: convolutional neural network
DFPN: depth-first proof number search
GNN: graph neural network
HSFP: hot-spot fingerprint
MaxFrag: maximum fragment
MCTS: Monte Carlo tree search
MILP: mixed integer linear programming
MRR: mean reciprocal rank
NN: neural network
RL: reinforcement learning
ROC: receiver operating characteristic

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acssynbio.4c00091.

(Note S1) Literature review, (Note S2) glossary of terms and definitions, (Figure S1) results of the literature search and selection process, and (Table S1) research queries utilized for selecting articles from academic search engines (PDF)

Author Contributions

G.G., P.M., T.D., and J.L.F. conceived the study. G.G. created the collection of papers. J.L.F. acquired the funding. G.G., P.M., and T.D. wrote the manuscript. All authors read and approved the final manuscript.

This work was supported by a French government grant managed by the Agence Nationale de la Recherche under the France 2030 program, reference ANR-22-PEBB-0008. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

The authors declare no competing financial interest.

References

Corey E. J. General Methods for the Construction of Complex Molecules. In The Chemistry of Natural Products; Elsevier, 1967; pp 19–37 10.1016/B978-0-08-020741-4.50004-X. [DOI] [Google Scholar]
Lin G.-M.; Warden-Rothman R.; Voigt C. A. Retrosynthetic Design of Metabolic Pathways to Chemicals Not Found in Nature. Curr. Opin. Syst. Biol. 2019, 14, 82–107. 10.1016/j.coisb.2019.04.004. [DOI] [Google Scholar]
Sheldon R. A.; Woodley J. M. Role of Biocatalysis in Sustainable Chemistry. Chem. Rev. 2018, 118 (2), 801–838. 10.1021/acs.chemrev.7b00203. [DOI] [PubMed] [Google Scholar]
Yu T.; Boob A. G.; Volk M. J.; Liu X.; Cui H.; Zhao H. Machine Learning-Enabled Retrobiosynthesis of Molecules. Nat. Catal. 2023, 6 (2), 137–151. 10.1038/s41929-022-00909-w. [DOI] [Google Scholar]
Tricco A. C.; Lillie E.; Zarin W.; O’Brien K. K.; Colquhoun H.; Levac D.; Moher D.; Peters M. D. J.; Horsley T.; Weeks L.; Hempel S.; Akl E. A.; Chang C.; McGowan J.; Stewart L.; Hartling L.; Aldcroft A.; Wilson M. G.; Garritty C.; Lewin S.; Godfrey C. M.; Macdonald M. T.; Langlois E. V.; Soares-Weiser K.; Moriarty J.; Clifford T.; Tunçalp Ö.; Straus S. E. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Int. Med. 2018, 169 (7), 467–473. 10.7326/M18-0850. [DOI] [PubMed] [Google Scholar]
Aal E Ali R. S.; Meng J.; Khan M. E. I.; Jiang X. Machine Learning Advancements in Organic Synthesis: A Focused Exploration of Artificial Intelligence Applications in Chemistry. Artif. Intell. Chem. 2024, 2, 100049. 10.1016/j.aichem.2024.100049. [DOI] [Google Scholar]
Fortunato M. E.; Coley C. W.; Barnes B. C.; Jensen K. F.. Machine Learned Prediction of Reaction Template Applicability for Data-Driven Retrosynthetic Predictions of Energetic Materials; AIP Conf. Proc.; AIP Publishing, Portland, OR, USA, 2020; Vol. 2272, p 070014. 10.1063/12.0000850. [DOI]
Wan Y.; Liao B.; Hsieh C.-Y.; Zhang S.. Retroformer: Pushing the Limits of Interpretable End-to-End Retrosynthesis Transformer. In Proceedings of the 39th International Conference on Machine Learning; Proceedings of Machine Learning Research; PMLR, 2022; Vol. 162, pp 22475–22490.
Zhong W.; Yang Z.; Chen C. Y.-C. Retrosynthesis Prediction Using an End-to-End Graph Generative Architecture for Molecular Graph Editing. Nat. Commun. 2023, 14 (1), 3009. 10.1038/s41467-023-38851-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Karpov P.; Godin G.; Tetko I. V.. A Transformer Model for Retrosynthesis. In Artificial Neural Networks and Machine Learning - ICANN 2019: Workshop and Special Sessions; Tetko I. V., Kůrková V., Karpov P., Theis F., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, 2019; Vol. 11731, pp 817–830 10.1007/978-3-030-30493-5_78. [DOI]
Heid E.; Liu J.; Aude A.; Green W. H. Influence of Template Size, Canonicalization, and Exclusivity for Retrosynthesis and Reaction Prediction Applications. J. Chem. Inf. Model. 2022, 62 (1), 16–26. 10.1021/acs.jcim.1c01192. [DOI] [PMC free article] [PubMed] [Google Scholar]
Coley C. W.; Rogers L.; Green W. H.; Jensen K. F. Computer-Assisted Retrosynthesis Based on Molecular Similarity. ACS Cent. Sci. 2017, 3 (12), 1237–1245. 10.1021/acscentsci.7b00355. [DOI] [PMC free article] [PubMed] [Google Scholar]
Coley C. W.; Jin W.; Rogers L.; Jamison T. F.; Jaakkola T. S.; Green W. H.; Barzilay R.; Jensen K. F. A Graph-Convolutional Neural Network Model for the Prediction of Chemical Reactivity. Chem. Sci. 2019, 10 (2), 370–377. 10.1039/C8SC04228D. [DOI] [PMC free article] [PubMed] [Google Scholar]
Krenn M.; Häse F.; Nigam A.; Friederich P.; Aspuru-Guzik A. Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation. Mach. Learn. Sci. Technol. 2020, 1 (4), 045024. 10.1088/2632-2153/aba947. [DOI] [Google Scholar]
Carbonell P.; Carlsson L.; Faulon J.-L. Stereo Signature Molecular Descriptor. J. Chem. Inf. Model. 2013, 53 (4), 887–897. 10.1021/ci300584r. [DOI] [PubMed] [Google Scholar]
Hähnke V. D.; Bolton E. E.; Bryant S. H. PubChem Atom Environments. J. Cheminformatics 2015, 7 (1), 41. 10.1186/s13321-015-0076-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wigh D. S.; Goodman J. M.; Lapkin A. A. A Review of Molecular Representation in the Age of Machine Learning. WIREs Comput. Mol. Sci. 2022, 12 (5), e1603. 10.1002/wcms.1603. [DOI] [Google Scholar]
David L.; Thakkar A.; Mercado R.; Engkvist O. Molecular Representations in AI-Driven Drug Discovery: A Review and Practical Guide. J. Cheminformatics 2020, 12 (1), 56. 10.1186/s13321-020-00460-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li J.; Fang L.; Lou J.-G. RetroRanker: Leveraging Reaction Changes to Improve Retrosynthesis Prediction through Re-Ranking. J. Cheminformatics 2023, 15 (1), 58. 10.1186/s13321-023-00727-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim E.; Lee D.; Kwon Y.; Park M. S.; Choi Y.-S. Valid, Plausible, and Diverse Retrosynthesis Using Tied Two-Way Transformers with Latent Variables. J. Chem. Inf. Model. 2021, 61 (1), 123–133. 10.1021/acs.jcim.0c01074. [DOI] [PubMed] [Google Scholar]
Coley C. W.; Green W. H.; Jensen K. F. RDChiral: An RDKit Wrapper for Handling Stereochemistry in Retrosynthetic Template Extraction and Application. J. Chem. Inf. Model. 2019, 59 (6), 2529–2537. 10.1021/acs.jcim.9b00286. [DOI] [PubMed] [Google Scholar]
Schwaller P.; Petraglia R.; Zullo V.; Nair V. H.; Haeuselmann R. A.; Pisoni R.; Bekas C.; Iuliano A.; Laino T. Predicting Retrosynthetic Pathways Using Transformer-Based Models and a Hyper-Graph Exploration Strategy. Chem. Sci. 2020, 11 (12), 3316–3325. 10.1039/C9SC05704H. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang Y.; Wang L.; Wang X.; Zhang C.; Ge J.; Tang J.; Su A.; Duan H. Data Augmentation and Transfer Learning Strategies for Reaction Prediction in Low Chemical Data Regimes. Org. Chem. Front. 2021, 8 (7), 1415–1423. 10.1039/D0QO01636E. [DOI] [Google Scholar]
Mao K.; Xiao X.; Xu T.; Rong Y.; Huang J.; Zhao P. Molecular Graph Enhanced Transformer for Retrosynthesis Prediction. Neurocomputing 2021, 457, 193–202. 10.1016/j.neucom.2021.06.037. [DOI] [Google Scholar]
Irwin R.; Dimitriadis S.; He J.; Bjerrum E. J. Chemformer: A Pre-Trained Transformer for Computational Chemistry. Mach. Learn. Sci. Technol. 2022, 3 (1), 015022. 10.1088/2632-2153/ac3ffb. [DOI] [Google Scholar]
Baylon J. L.; Cilfone N. A.; Gulcher J. R.; Chittenden T. W. Enhancing Retrosynthetic Reaction Prediction with Deep Learning Using Multiscale Reaction Classification. J. Chem. Inf. Model. 2019, 59 (2), 673–688. 10.1021/acs.jcim.8b00801. [DOI] [PubMed] [Google Scholar]
Zhang B.; Lin J.; Du L.; Zhang L. Harnessing Data Augmentation and Normalization Preprocessing to Improve the Performance of Chemical Reaction Predictions of Data-Driven Model. Polymers 2023, 15 (9), 2224. 10.3390/polym15092224. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhu J.; Xia Y.; Wu L.; Xie S.; Zhou W.; Qin T.; Li H.; Liu T.-Y.. Dual-View Molecular Pre-Training. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; ACM: Long Beach, CA, USA, 2023; pp 3615–3627 10.1145/3580305.3599317. [DOI]
Yang F.; Liu J.; Zhang Q.; Yang Z.; Zhang X. CNN-Based Two-Branch Multi-Scale Feature Extraction Network for Retrosynthesis Prediction. BMC Bioinformatics 2022, 23 (1), 362. 10.1186/s12859-022-04904-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zheng S.; Rao J.; Zhang Z.; Xu J.; Yang Y. Predicting Retrosynthetic Reactions Using Self-Corrected Transformer Neural Networks. J. Chem. Inf. Model. 2020, 60 (1), 47–55. 10.1021/acs.jcim.9b00949. [DOI] [PubMed] [Google Scholar]
Lee A. A.; Yang Q.; Sresht V.; Bolgar P.; Hou X.; Klug-McLeod J. L.; Butler C. R. Molecular Transformer Unifies Reaction Prediction and Retrosynthesis across Pharma Chemical Space. Chem. Commun. 2019, 55 (81), 12152–12155. 10.1039/C9CC05122H. [DOI] [PubMed] [Google Scholar]
Wang X.; Li Y.; Qiu J.; Chen G.; Liu H.; Liao B.; Hsieh C.-Y.; Yao X. RetroPrime: A Diverse, Plausible and Transformer-Based Method for Single-Step Retrosynthesis Predictions. Chem. Eng. J. 2021, 420, 129845. 10.1016/j.cej.2021.129845. [DOI] [Google Scholar]
Liu B.; Ramsundar B.; Kawthekar P.; Shi J.; Gomes J.; Luu Nguyen Q.; Ho S.; Sloane J.; Wender P.; Pande V. Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models. ACS Cent. Sci. 2017, 3 (10), 1103–1113. 10.1021/acscentsci.7b00303. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fang L.; Li J.; Zhao M.; Tan L.; Lou J.-G. Single-Step Retrosynthesis Prediction by Leveraging Commonly Preserved Substructures. Nat. Commun. 2023, 14 (1), 2446. 10.1038/s41467-023-37969-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang K.; Mann V.; Venkatasubramanian V. G-MATT: Single-Step Retrosynthesis Prediction Using Molecular Grammar Tree Transformer. AIChE J. 2024, 70, e18244. 10.1002/aic.18244. [DOI] [Google Scholar]
Yan C.; Zhao P.; Lu C.; Yu Y.; Huang J. RetroComposer: Composing Templates for Template-Based Retrosynthesis Prediction. Biomolecules 2022, 12 (9), 1325. 10.3390/biom12091325. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bai R.; Zhang C.; Wang L.; Yao C.; Ge J.; Duan H. Transfer Learning: Making Retrosynthetic Predictions Based on a Small Chemical Reaction Dataset Scale to a New Level. Molecules 2020, 25 (10), 2357. 10.3390/molecules25102357. [DOI] [PMC free article] [PubMed] [Google Scholar]
Qiao H.; Wu Y.; Zhang Y.; Zhang C.; Wu X.; Wu Z.; Zhao Q.; Wang X.; Li H.; Duan H. Transformer-Based Multitask Learning for Reaction Prediction under Low-Resource Circumstances. RSC Adv. 2022, 12 (49), 32020–32026. 10.1039/D2RA05349G. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yan Y.; Zhao Y.; Yao H.; Feng J.; Liang L.; Han W.; Xu X.; Pu C.; Zang C.; Chen L.; Li Y.; Liu H.; Lu T.; Chen Y.; Zhang Y. RPBP: Deep Retrosynthesis Reaction Prediction Based on Byproducts. J. Chem. Inf. Model. 2023, 63 (19), 5956–5970. 10.1021/acs.jcim.3c00274. [DOI] [PubMed] [Google Scholar]
Heid E.; Goldman S.; Sankaranarayanan K.; Coley C. W.; Flamm C.; Green W. H. EHreact: Extended Hasse Diagrams for the Extraction and Scoring of Enzymatic Reaction Templates. J. Chem. Inf. Model. 2021, 61 (10), 4949–4961. 10.1021/acs.jcim.1c00921. [DOI] [PMC free article] [PubMed] [Google Scholar]
Seo S.-W.; Song Y. Y.; Yang J. Y.; Bae S.; Lee H.; Shin J.; Hwang S. J.; Yang E. GTA: Graph Truncated Attention for Retrosynthesis. Proc. AAAI Conf. Artif. Intell. 2021, 35 (1), 531–539. 10.1609/aaai.v35i1.16131. [DOI] [Google Scholar]
He H.-R.; Wang J.; Liu Y.; Wu F.. Modeling Diverse Chemical Reactions for Single-Step Retrosynthesis via Discrete Latent Variables. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management; ACM: Atlanta, GA, USA, 2022; pp 717–726 10.1145/3511808.3557397. [DOI]
Sankaranarayanan K.; Heid E.; Coley C. W.; Verma D.; Green W. H.; Jensen K. F. Similarity Based Enzymatic Retrosynthesis. Chem. Sci. 2022, 13 (20), 6039–6053. 10.1039/D2SC01588A. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tetko I. V.; Karpov P.; Van Deursen R.; Godin G. State-of-the-Art Augmented NLP Transformer Models for Direct and Single-Step Retrosynthesis. Nat. Commun. 2020, 11 (1), 5575. 10.1038/s41467-020-19266-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Toniato A.; Vaucher A. C.; Schwaller P.; Laino T. Enhancing Diversity in Language Based Models for Single-Step Retrosynthesis. Digit. Discovery 2023, 2 (2), 489–501. 10.1039/D2DD00110A. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ucak U. V.; Ashyrmamatov I.; Lee J. Reconstruction of Lossless Molecular Representations from Fingerprints. J. Cheminformatics 2023, 15 (1), 26. 10.1186/s13321-023-00693-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Seidl P.; Renz P.; Dyubankova N.; Neves P.; Verhoeven J.; Wegner J. K.; Segler M.; Hochreiter S.; Klambauer G. Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks. J. Chem. Inf. Model. 2022, 62 (9), 2111–2120. 10.1021/acs.jcim.1c01065. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yu J.; Wang J.; Zhao H.; Gao J.; Kang Y.; Cao D.; Wang Z.; Hou T. Organic Compound Synthetic Accessibility Prediction Based on the Graph Attention Mechanism. J. Chem. Inf. Model. 2022, 62 (12), 2973–2986. 10.1021/acs.jcim.2c00038. [DOI] [PubMed] [Google Scholar]
Ucak U. V.; Kang T.; Ko J.; Lee J. Substructure-Based Neural Machine Translation for Retrosynthetic Prediction. J. Cheminformatics 2021, 13 (1), 4. 10.1186/s13321-020-00482-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ucak U. V.; Ashyrmamatov I.; Ko J.; Lee J. Retrosynthetic Reaction Pathway Prediction through Neural Machine Translation of Atomic Environments. Nat. Commun. 2022, 13 (1), 1186. 10.1038/s41467-022-28857-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
Badowski T.; Gajewska E. P.; Molga K.; Grzybowski B. A. Synergy Between Expert and Machine Learning Approaches Allows for Improved Retrosynthetic Planning. Angew. Chem., Int. Ed. 2020, 59 (2), 725–730. 10.1002/anie.201912083. [DOI] [PubMed] [Google Scholar]
Thakkar A.; Selmi N.; Reymond J.-L.; Engkvist O.; Bjerrum E. J. “Ring Breaker”: Neural Network Driven Synthesis Prediction of the Ring System Chemical Space. J. Med. Chem. 2020, 63 (16), 8791–8808. 10.1021/acs.jmedchem.9b01919. [DOI] [PubMed] [Google Scholar]
Hasic H.; Ishida T. Single-Step Retrosynthesis Prediction Based on the Identification of Potential Disconnection Sites Using Molecular Substructure Fingerprints. J. Chem. Inf. Model. 2021, 61 (2), 641–652. 10.1021/acs.jcim.0c01100. [DOI] [PubMed] [Google Scholar]
Segler M. H. S.; Waller M. P. Neural Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. Chem. - Eur. J. 2017, 23 (25), 5966–5971. 10.1002/chem.201605499. [DOI] [PubMed] [Google Scholar]
Ishida S.; Terayama K.; Kojima R.; Takasu K.; Okuno Y. Prediction and Interpretable Visualization of Retrosynthetic Reactions Using Graph Convolutional Networks. J. Chem. Inf. Model. 2019, 59 (12), 5026–5033. 10.1021/acs.jcim.9b00538. [DOI] [PubMed] [Google Scholar]
Dai H.; Li C.; Coley C.; Dai B.; Song L.. Retrosynthesis Prediction with Conditional Graph Logic Network. In Advances in Neural Information Processing Systems 32; NeurIPS 2019; Curran Associates, Inc., 2019; Vol. 32.
Chen S.; Jung Y. Deep Retrosynthetic Reaction Prediction Using Local Reactivity and Global Attention. JACS Au 2021, 1 (10), 1612–1620. 10.1021/jacsau.1c00246. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lee H.; Ahn S.; Seo S.-W.; Song Y. Y.; Yang E.; Hwang S.-J.; Shin J.. RetCL: A Selection-Based Approach for Retrosynthesis via Contrastive Learning. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence; IJCAI-21; International Joint Conferences on Artificial Intelligence Organization, 2021; pp 2673–2679 10.24963/ijcai.2021/368. [DOI]
Lin Z.; Yin S.; Shi L.; Zhou W.; Zhang Y. J. G2GT: Retrosynthesis Prediction with Graph-to-Graph Attention Neural Network and Self-Training. J. Chem. Inf. Model. 2023, 63 (7), 1894–1905. 10.1021/acs.jcim.2c01302. [DOI] [PubMed] [Google Scholar]
Delépine B.; Duigou T.; Carbonell P.; Faulon J.-L. RetroPath2.0: A Retrosynthesis Workflow for Metabolic Engineers. Metab. Eng. 2018, 45, 158–170. 10.1016/j.ymben.2017.12.002. [DOI] [PubMed] [Google Scholar]
Szymkuć S.; Gajewska E. P.; Klucznik T.; Molga K.; Dittwald P.; Startek M.; Bajczyk M.; Grzybowski B. A. Computer-Assisted Synthetic Planning: The End of the Beginning. Angew. Chem., Int. Ed. 2016, 55 (20), 5904–5937. 10.1002/anie.201506101. [DOI] [PubMed] [Google Scholar]
Finnigan W.; Hepworth L. J.; Flitsch S. L.; Turner N. J. RetroBioCat as a Computer-Aided Synthesis Planning Tool for Biocatalytic Reactions and Cascades. Nat. Catal. 2021, 4 (2), 98–104. 10.1038/s41929-020-00556-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Duigou T.; du Lac M.; Carbonell P.; Faulon J.-L. RetroRules: A Database of Reaction Rules for Engineering Biology. Nucleic Acids Res. 2019, 47 (D1), D1229–D1235. 10.1093/nar/gky940. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dong Z.; Chen Z.; Wang Q.. Retrosynthesis Prediction Based on Graph Relation Network. In 2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI); IEEE: Beijing, China, 2022; pp 1–5 10.1109/CISP-BMEI56279.2022.9979857. [DOI]
Schwaller P.; Laino T.; Gaudin T.; Bolgar P.; Hunter C. A.; Bekas C.; Lee A. A. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent. Sci. 2019, 5 (9), 1572–1583. 10.1021/acscentsci.9b00576. [DOI] [PMC free article] [PubMed] [Google Scholar]
Guo Z.; Wu S.; Ohno M.; Yoshida R. Bayesian Algorithm for Retrosynthesis. J. Chem. Inf. Model. 2020, 60 (10), 4474–4486. 10.1021/acs.jcim.0c00320. [DOI] [PubMed] [Google Scholar]
Tu Z.; Coley C. W. Permutation Invariant Graph-to-Sequence Model for Template-Free Retrosynthesis and Reaction Prediction. J. Chem. Inf. Model. 2022, 62 (15), 3503–3513. 10.1021/acs.jcim.2c00321. [DOI] [PubMed] [Google Scholar]
Hu H.; Jiang Y.; Yang Y.; Chen J. X. BiG2S: A Dual Task Graph-to-Sequence Model for the End-to-End Template-Free Reaction Prediction. Appl. Intell. 2023, 53, 29620. 10.1007/s10489-023-05048-8. [DOI] [Google Scholar]
Liu S.; Tu Z.; Xu M.; Zhang Z.; Lin L.; Ying R.; Tang J.; Zhao P.; Wu D.. FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning. In Proceedings of the 40th International Conference on Machine Learning; ICML’23; JMLR.org: Honolulu, Hawaii, USA, 2023 10.5555/3618408.3619322. [DOI] [Google Scholar]
Lin M. H.; Tu Z.; Coley C. W. Improving the Performance of Models for One-Step Retrosynthesis through Re-Ranking. J. Cheminformatics 2022, 14 (1), 15. 10.1186/s13321-022-00594-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sun R.; Dai H.; Li L.; Kearnes S.; Dai B.. Towards Understanding Retrosynthesis by Energy-Based Models. In Advances in Neural Information Processing Systems 34; NeurIPS 2021; Curran Associates, Inc., 2021.
Christofidellis D.; Giannone G.; Born J.; Winther O.; Laino T.; Manica M.. Unifying Molecular and Textual Representations via Multi-Task Language Modelling. In Proceedings of the 40th International Conference on Machine Learning; ICML’23; JMLR.org: Honolulu, Hawaii, USA, 2023 10.5555/3618408.3618651. [DOI]
Zheng S.; Zeng T.; Li C.; Chen B.; Coley C. W.; Yang Y.; Wu R. Deep Learning Driven Biosynthetic Pathways Navigation for Natural Products with BioNavi-NP. Nat. Commun. 2022, 13 (1), 3342. 10.1038/s41467-022-30970-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kreutter D.; Schwaller P.; Reymond J.-L. Predicting Enzymatic Reactions with a Molecular Transformer. Chem. Sci. 2021, 12 (25), 8648–8659. 10.1039/D1SC02362D. [DOI] [PMC free article] [PubMed] [Google Scholar]
Probst D.; Manica M.; Nana Teukam Y. G.; Castrogiovanni A.; Paratore F.; Laino T. Biocatalysed Synthesis Planning Using Data-Driven Learning. Nat. Commun. 2022, 13 (1), 964. 10.1038/s41467-022-28536-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shi C.; Xu M.; Guo H.; Zhang M.; Tang J.. A Graph to Graphs Framework for Retrosynthesis Prediction. In Proceedings of the 37th International Conference on Machine Learning; ICML’20; JMLR.org, 2020 10.5555/3524938.3525756. [DOI]
Yan C.; Ding Q.; Zhao P.; Zheng S.; Yang J.; Yu Y.; Huang J.. RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist. In Proceedings of the 34th International Conference on Neural Information Processing Systems; NIPS’20; Curran Associates Inc.: Red Hook, NY, USA, 2020 10.5555/3495724.3496668. [DOI]
Somnath V. R.; Bunne C.; Coley C. W.; Krause A.; Barzilay R.. Learning Graph Models for Retrosynthesis Prediction. In Advances in Neural Information Processing Systems 34; NeurIPS 2021; Curran Associates, Inc., 2021.
Wang Y.; Pang C.; Wang Y.; Jin J.; Zhang J.; Zeng X.; Su R.; Zou Q.; Wei L. Retrosynthesis Prediction with an Interpretable Deep-Learning Framework Based on Molecular Assembly Tasks. Nat. Commun. 2023, 14 (1), 6155. 10.1038/s41467-023-41698-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Thakkar A.; Vaucher A. C.; Byekwaso A.; Schwaller P.; Toniato A.; Laino T. Unbiasing Retrosynthesis Language Models with Disconnection Prompts. ACS Cent. Sci. 2023, 9 (7), 1488–1498. 10.1021/acscentsci.3c00372. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jaume-Santero F.; Bornet A.; Valery A.; Naderi N.; Vicente Alvarez D.; Proios D.; Yazdani A.; Bournez C.; Fessard T.; Teodoro D. Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios. J. Chem. Inf. Model. 2023, 63 (7), 1914–1924. 10.1021/acs.jcim.2c01407. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang X.; Hsieh C.-Y.; Yin X.; Wang J.; Li Y.; Deng Y.; Jiang D.; Wu Z.; Du H.; Chen H.; Li Y.; Liu H.; Wang Y.; Luo P.; Hou T.; Yao X. Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center. Research 2023, 6, 0231. 10.34133/research.0231. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang C.; Lapkin A. A. Reinforcement Learning Optimization of Reaction Routes on the Basis of Large, Hybrid Organic Chemistry-Synthetic Biological, Reaction Network Data. React. Chem. Eng. 2023, 8 (10), 2491–2504. 10.1039/D2RE00406B. [DOI] [Google Scholar]
Koch M.; Duigou T.; Faulon J.-L. Reinforcement Learning for Bioretrosynthesis. ACS Synth. Biol. 2020, 9 (1), 157–168. 10.1021/acssynbio.9b00447. [DOI] [PubMed] [Google Scholar]
Wang W.; Liu Q.; Zhang L.; Dong Y.; Du J. RetroSynX: A Retrosynthetic Analysis Framework Using Hybrid Reaction Templates and Group Contribution-Based Thermodynamic Models. Chem. Eng. Sci. 2022, 248, 117208. 10.1016/j.ces.2021.117208. [DOI] [Google Scholar]
Liu Q.; Tang K.; Zhang L.; Du J.; Meng Q. Computer-Assisted Synthetic Planning Considering Reaction Kinetics Based on Transition State Automated Generation Method. AIChE J. 2023, 69 (7), e18092. 10.1002/aic.18092. [DOI] [Google Scholar]
Kreutter D.; Reymond J.-L. Multistep Retrosynthesis Combining a Disconnection Aware Triple Transformer Loop with a Route Penalty Score Guided Tree Search. Chem. Sci. 2023, 14 (36), 9959–9969. 10.1039/D3SC01604H. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kishimoto A.; Buesser B.; Chen B.; Botea A.. Depth-First Proof-Number Search with Heuristic Edge Cost and Application to Chemical Synthesis Planning. In Advances in Neural Information Processing Systems 32; NeurIPS 2019; Curran Associates, Inc., 2019.
Franz C.; Mogk G.; Mrziglod T.; Schewior K.. Completeness and Diversity in Depth-First Proof-Number Search with Applications to Retrosynthesis. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence; International Joint Conferences on Artificial Intelligence Organization: Vienna, Austria, 2022; pp 4747–4753 10.24963/ijcai.2022/658. [DOI]
Shibukawa R.; Ishida S.; Yoshizoe K.; Wasa K.; Takasu K.; Okuno Y.; Terayama K.; Tsuda K. CompRet: A Comprehensive Recommendation Framework for Chemical Synthesis Planning with Algorithmic Enumeration. J. Cheminformatics 2020, 12 (1), 52. 10.1186/s13321-020-00452-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Segler M. H. S.; Preuss M.; Waller M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555 (7698), 604–610. 10.1038/nature25978. [DOI] [PubMed] [Google Scholar]
Zhang B.; Zhang X.; Du W.; Song Z.; Zhang G.; Zhang G.; Wang Y.; Chen X.; Jiang J.; Luo Y. Chemistry-Informed Molecular Graph as Reaction Descriptor for Machine-Learned Retrosynthesis Planning. Proc. Natl. Acad. Sci. U. S. A. 2022, 119 (41), e2212711119. 10.1073/pnas.2212711119. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang X.; Qian Y.; Gao H.; Coley C. W.; Mo Y.; Barzilay R.; Jensen K. F. Towards Efficient Discovery of Green Synthetic Pathways with Monte Carlo Tree Search and Reinforcement Learning. Chem. Sci. 2020, 11 (40), 10959–10972. 10.1039/D0SC04184J. [DOI] [PMC free article] [PubMed] [Google Scholar]
Thakkar A.; Kogej T.; Reymond J.-L.; Engkvist O.; Bjerrum E. J. Datasets and Their Influence on the Development of Computer Assisted Synthesis Planning Tools in the Pharmaceutical Domain. Chem. Sci. 2020, 11 (1), 154–168. 10.1039/C9SC04944D. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lin K.; Xu Y.; Pei J.; Lai L. Automatic Retrosynthetic Route Planning Using Template-Free Models. Chem. Sci. 2020, 11 (12), 3355–3364. 10.1039/C9SC03666K. [DOI] [PMC free article] [PubMed] [Google Scholar]
Genheden S.; Thakkar A.; Chadimová V.; Reymond J.-L.; Engkvist O.; Bjerrum E. AiZynthFinder: A Fast, Robust and Flexible Open-Source Software for Retrosynthetic Planning. J. Cheminformatics 2020, 12 (1), 70. 10.1186/s13321-020-00472-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gao H.; Coley C. W.; Struble T. J.; Li L.; Qian Y.; Green W. H.; Jensen K. F. Combining Retrosynthesis and Mixed-Integer Optimization for Minimizing the Chemical Inventory Needed to Realize a WHO Essential Medicines List. React. Chem. Eng. 2020, 5 (2), 367–376. 10.1039/C9RE00348G. [DOI] [Google Scholar]
Coley C. W.; Thomas D. A.; Lummiss J. A. M.; Jaworski J. N.; Breen C. P.; Schultz V.; Hart T.; Fishman J. S.; Rogers L.; Gao H.; Hicklin R. W.; Plehiers P. P.; Byington J.; Piotti J. S.; Green W. H.; Hart A. J.; Jamison T. F.; Jensen K. F. A Robotic Platform for Flow Synthesis of Organic Compounds Informed by AI Planning. Science 2019, 365 (6453), eaax1566. 10.1126/science.aax1566. [DOI] [PubMed] [Google Scholar]
Westerlund A. M.; Barge B.; Mervin L.; Genheden S. Data-Driven Approaches for Identifying Hyperparameters in Multi-Step Retrosynthesis. Mol. Inform. 2023, 42, 2300128. 10.1002/minf.202300128. [DOI] [PubMed] [Google Scholar]
Sankaranarayanan K.; Jensen K. F. Computer-Assisted Multistep Chemoenzymatic Retrosynthesis Using a Chemical Synthesis Planner. Chem. Sci. 2023, 14 (23), 6467–6475. 10.1039/D3SC01355C. [DOI] [PMC free article] [PubMed] [Google Scholar]
Levin I.; Liu M.; Voigt C. A.; Coley C. W. Merging Enzymatic and Synthetic Chemistry with Computational Synthesis Planning. Nat. Commun. 2022, 13, 7747. 10.1038/s41467-022-35422-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jeong J.; Lee N.; Shin Y.; Shin D. Intelligent Generation of Optimal Synthetic Pathways Based on Knowledge Graph Inference and Retrosynthetic Predictions Using Reaction Big Data. J. Taiwan Inst. Chem. Eng. 2022, 130, 103982. 10.1016/j.jtice.2021.07.015. [DOI] [Google Scholar]
Chen B.; Li C.; Dai H.; Song L.. Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search. In Proceedings of the 37th International Conference on Machine Learning; Proceedings of Machine Learning Research (PMLR), 2020; Vol. 119, pp 1608–1616.
Xie S.; Yan R.; Han P.; Xia Y.; Wu L.; Guo C.; Yang B.; Qin T.. RetroGraph: Retrosynthetic Planning with Graph Search. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; ACM: Washington DC, USA, 2022; pp 2120–2129 10.1145/3534678.3539446. [DOI]
Han P.; Zhao P.; Lu C.; Huang J.; Wu J.; Shang S.; Yao B.; Zhang X. GNN-Retro: Retrosynthetic Planning with Graph Neural Networks. Proc. AAAI Conf. Artif. Intell. 2022, 36 (4), 4014–4021. 10.1609/aaai.v36i4.20318. [DOI] [Google Scholar]
Latendresse M.; Malerich J. P.; Herson J.; Krummenacker M.; Szeto J.; Vu V.-A.; Collins N.; Madrid P. B. SynRoute: A Retrosynthetic Planning Software. J. Chem. Inf. Model. 2023, 63 (17), 5484–5495. 10.1021/acs.jcim.3c00491. [DOI] [PMC free article] [PubMed] [Google Scholar]
Grzybowski B. A.; Szymkuć S.; Gajewska E. P.; Molga K.; Dittwald P.; Wołos A.; Klucznik T. Chematica: A Story of Computer Code That Started to Think like a Chemist. Chem 2018, 4 (3), 390–398. 10.1016/j.chempr.2018.02.024. [DOI] [Google Scholar]
Russell S. J.; Norvig P.. Artificial Intelligence: A Modern Approach, 4th ed.; Pearson series in artificial intelligence; Pearson: Hoboken, 2021. [Google Scholar]
Schreck J. S.; Coley C. W.; Bishop K. J. M. Learning Retrosynthetic Planning through Simulated Experience. ACS Cent. Sci. 2019, 5 (6), 970–981. 10.1021/acscentsci.9b00055. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yu Y.; Wei Y.; Kuang K.; Huang Z.; Yao H.; Wu F.. GRASP: Navigating Retrosynthetic Planning with Goal-Driven Policy. In Advances in Neural Information Processing Systems 35; NeurIPS 2022; Curran Associates, Inc., 2022.
Kumar A.; Wang L.; Ng C. Y.; Maranas C. D. Pathway Design Using de Novo Steps through Uncharted Biochemical Spaces. Nat. Commun. 2018, 9 (1), 184. 10.1038/s41467-017-02362-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Carbonell P.; Parutto P.; Herisson J.; Pandit S. B.; Faulon J.-L. XTMS: Pathway Design in an eXTended Metabolic Space. Nucleic Acids Res. 2014, 42 (W1), W389–W394. 10.1093/nar/gku362. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tokic M.; Hadadi N.; Ataman M.; Neves D.; Ebert B. E.; Blank L. M.; Miskovic L.; Hatzimanikatis V. Discovery and Evaluation of Biosynthetic Pathways for the Production of Five Methyl Ethyl Ketone Precursors. ACS Synth. Biol. 2018, 7 (8), 1858–1873. 10.1021/acssynbio.8b00049. [DOI] [PubMed] [Google Scholar]
Hadadi N.; Hafner J.; Shajkofci A.; Zisaki A.; Hatzimanikatis V. ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies. ACS Synth. Biol. 2016, 5 (10), 1155–1166. 10.1021/acssynbio.6b00054. [DOI] [PubMed] [Google Scholar]
Otero-Muras I.; Carbonell P. Automated Engineering of Synthetic Metabolic Pathways for Efficient Biomanufacturing. Metab. Eng. 2021, 63, 61–80. 10.1016/j.ymben.2020.11.012. [DOI] [PubMed] [Google Scholar]
Kim J.; Ahn S.; Lee H.; Shin J. Self-Improved Retrosynthetic Planning. In Proceedings of the 38th International Conference on Machine Learning, Virtual, July 18–24, 2021; Proceedings of Machine Learning Research (PMLR), 2021.
Gao D.; Song W.; Wu J.; Guo L.; Gao C.; Liu J.; Chen X.; Liu L. Efficient Production of L-Homophenylalanine by Enzymatic-Chemical Cascade Catalysis. Angew. Chem., Int. Ed. 2022, 61 (36), e202207077. 10.1002/anie.202207077. [DOI] [PubMed] [Google Scholar]
Rudroff F.; Mihovilovic M. D.; Gröger H.; Snajdrova R.; Iding H.; Bornscheuer U. T. Opportunities and Challenges for Combining Chemo- and Biocatalysis. Nat. Catal. 2018, 1 (1), 12–22. 10.1038/s41929-017-0010-4. [DOI] [Google Scholar]
Finnigan W.; Flitsch S. L.; Hepworth L. J.; Turner N. J. Enzyme Cascade Design: Retrosynthesis Approach. In Enzyme Cascade Design and Modelling; Kara S., Rudroff F., Eds.; Springer International Publishing: Cham, 2021; pp 7–30. 10.1007/978-3-030-65718-5_2. [DOI] [Google Scholar]
Skoraczyński G.; Kitlas M.; Miasojedow B.; Gambin A. Critical Assessment of Synthetic Accessibility Scores in Computer-Assisted Synthesis Planning. J. Cheminformatics 2023, 15 (1), 6. 10.1186/s13321-023-00678-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Thakkar A.; Chadimová V.; Bjerrum E. J.; Engkvist O.; Reymond J.-L. Retrosynthetic Accessibility Score (RAscore) - Rapid Machine Learned Synthesizability Classification from AI Driven Retrosynthetic Planning. Chem. Sci. 2021, 12 (9), 3339–3349. 10.1039/D0SC05401A. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li B.; Chen H. Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph. Molecules 2022, 27 (3), 1039. 10.3390/molecules27031039. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim H.; Lee K.; Kim C.; Lim J.; Kim W. Y. DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening. J. Chem. Inf. Model. 2024, 64, 2432. 10.1021/acs.jcim.3c01134. [DOI] [PubMed] [Google Scholar]
Correia J.; Carreira R.; Pereira V.; Rocha M. Predicting the Number of Biochemical Transformations Needed to Synthesize a Compound. In 2022 International Joint Conference on Neural Networks (IJCNN); IEEE: Padua, Italy, 2022; pp 1–8. 10.1109/IJCNN55064.2022.9892124. [DOI]
Parrot M.; Tajmouati H.; Da Silva V. B. R.; Atwood B. R.; Fourcade R.; Gaston-Mathé Y.; Do Huu N.; Perron Q. Integrating Synthetic Accessibility with AI-Based Generative Drug Design. J. Cheminformatics 2023, 15 (1), 83. 10.1186/s13321-023-00742-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sanchez-Garcia R.; Havasi D.; Takács G.; Robinson M. C.; Lee A.; Von Delft F.; Deane C. M. CoPriNet: Graph Neural Networks Provide Accurate and Rapid Compound Price Prediction for Molecule Prioritisation. Digit. Discovery 2023, 2 (1), 103–111. 10.1039/D2DD00071G. [DOI] [Google Scholar]
Tang K.; Zhuang Y.; Wang W.; Liu Q.; Zhang L.; Du J.; Meng Q. GC-NORM-Based Thermodynamic Framework for Evaluations of Organic Reactions Involving Carbon Dioxide Utilization. Chem. Eng. Sci. 2023, 278, 118913. 10.1016/j.ces.2023.118913. [DOI] [Google Scholar]
Schwaller P.; Vaucher A. C.; Laino T.; Reymond J.-L. Prediction of Chemical Reaction Yields Using Deep Learning. Mach. Learn. Sci. Technol. 2021, 2 (1), 015016. 10.1088/2632-2153/abc81d. [DOI] [Google Scholar]
Plehiers P. P.; Coley C. W.; Gao H.; Vermeire F. H.; Dobbelaere M. R.; Stevens C. V.; Van Geem K. M.; Green W. H. Artificial Intelligence for Computer-Aided Synthesis In Flow: Analysis and Selection of Reaction Components. Front. Chem. Eng. 2020, 2, 5. 10.3389/fceng.2020.00005. [DOI] [Google Scholar]
Toniato A.; Unsleber J. P.; Vaucher A. C.; Weymuth T.; Probst D.; Laino T.; Reiher M. Quantum Chemical Data Generation as Fill-in for Reliability Enhancement of Machine-Learning Reaction and Retrosynthesis Planning. Digit. Discovery 2023, 2 (3), 663–673. 10.1039/D3DD00006K. [DOI] [PMC free article] [PubMed] [Google Scholar]
Genheden S.; Engkvist O.; Bjerrum E. Fast Prediction of Distances between Synthetic Routes with Deep Learning. Mach. Learn. Sci. Technol. 2022, 3 (1), 015018. 10.1088/2632-2153/ac4a91. [DOI] [Google Scholar]
Kuznetsov A.; Sahinidis N. V. ExtractionScore: A Quantitative Framework for Evaluating Synthetic Routes on Predicted Liquid-Liquid Extraction Performance. J. Chem. Inf. Model. 2021, 61 (5), 2274–2282. 10.1021/acs.jcim.0c01426. [DOI] [PubMed] [Google Scholar]
Ertl P.; Schuffenhauer A. Estimation of Synthetic Accessibility Score of Drug-like Molecules Based on Molecular Complexity and Fragment Contributions. J. Cheminformatics 2009, 1 (1), 8. 10.1186/1758-2946-1-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Coley C. W.; Rogers L.; Green W. H.; Jensen K. F. SCScore: Synthetic Complexity Learned from a Reaction Corpus. J. Chem. Inf. Model. 2018, 58 (2), 252–261. 10.1021/acs.jcim.7b00622. [DOI] [PubMed] [Google Scholar]
Ebrahim A.; Lerman J. A.; Palsson B. O.; Hyduke D. R. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst. Biol. 2013, 7 (1), 74. 10.1186/1752-0509-7-74. [DOI] [PMC free article] [PubMed] [Google Scholar]
Beber M. E.; Gollub M. G.; Mozaffari D.; Shebek K. M.; Flamholz A. I.; Milo R.; Noor E. eQuilibrator 3.0: A Database Solution for Thermodynamic Constant Estimation. Nucleic Acids Res. 2022, 50, D603–D609. 10.1093/nar/gkab1106. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hérisson J.; Duigou T.; Du Lac M.; Bazi-Kabbaj K.; Sabeti Azad M.; Buldum G.; Telle O.; El Moubayed Y.; Carbonell P.; Swainston N.; Zulkower V.; Kushwaha M.; Baldwin G. S.; Faulon J.-L. The Automated Galaxy-SynBioCAD Pipeline for Synthetic Biology Design and Engineering. Nat. Commun. 2022, 13 (1), 5082. 10.1038/s41467-022-32661-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Levin I.; Fortunato M. E.; Tan K. L.; Coley C. W. Computer-Aided Evaluation and Exploration of Chemical Spaces Constrained by Reaction Pathways. AIChE J. 2023, 69, e18234. 10.1002/aic.18234. [DOI] [Google Scholar]
Wang L.; Upadhyay V.; Maranas C. D. dGPredictor: Automated Fragmentation Method for Metabolic Reaction Free Energy Prediction and de Novo Pathway Design. PLOS Comput. Biol. 2021, 17 (9), e1009448. 10.1371/journal.pcbi.1009448. [DOI] [PMC free article] [PubMed] [Google Scholar]
Feehan R.; Montezano D.; Slusky J. S. G. Machine Learning for Enzyme Engineering, Selection and Design. Protein Eng. Des. Sel. 2021, 34, gzab019. 10.1093/protein/gzab019. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stoney R. A.; Hanko E. K. R.; Carbonell P.; Breitling R. SelenzymeRF: Updated Enzyme Suggestion Software for Unbalanced Biochemical Reactions. Comput. Struct. Biotechnol. J. 2023, 21, 5868–5876. 10.1016/j.csbj.2023.11.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
Carbonell P.; Wong J.; Swainston N.; Takano E.; Turner N. J.; Scrutton N. S.; Kell D. B.; Breitling R.; Faulon J.-L. Selenzyme: Enzyme Selection Tool for Pathway Design. Bioinformatics 2018, 34 (12), 2153–2154. 10.1093/bioinformatics/bty065. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hadadi N.; MohammadiPeyhani H.; Miskovic L.; Seijo M.; Hatzimanikatis V. Enzyme Annotation for Orphan and Novel Reactions Using Knowledge of Substrate Reactive Sites. Proc. Natl. Acad. Sci. U. S. A. 2019, 116 (15), 7298–7307. 10.1073/pnas.1818877116. [DOI] [PMC free article] [PubMed] [Google Scholar]
Finnigan W.; Lubberink M.; Hepworth L. J.; Citoler J.; Mattey A. P.; Ford G. J.; Sangster J.; Cosgrove S. C.; da Costa B. Z.; Heath R. S.; Thorpe T. W.; Yu Y.; Flitsch S. L.; Turner N. J. RetroBioCat Database: A Platform for Collaborative Curation and Automated Meta-Analysis of Biocatalysis Data. ACS Catal. 2023, 13 (17), 11771–11780. 10.1021/acscatal.3c01418. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim Y.; Ryu J. Y.; Kim H. U.; Jang W. D.; Lee S. Y. A Deep Learning Approach to Evaluate the Feasibility of Enzymatic Reactions Generated by Retrobiosynthesis. Biotechnol. J. 2021, 16 (5), 2000605. 10.1002/biot.202000605. [DOI] [PubMed] [Google Scholar]
Upadhyay V.; Boorla V. S.; Maranas C. D. Rank-Ordering of Known Enzymes as Starting Points for Re-Engineering Novel Substrate Activity Using a Convolutional Neural Network. Metab. Eng. 2023, 78, 171–182. 10.1016/j.ymben.2023.06.001. [DOI] [PubMed] [Google Scholar]
Kotera M.; Okuno Y.; Hattori M.; Goto S.; Kanehisa M. Computational Assignment of the EC Numbers for Genomic-Scale Analysis of Enzymatic Reactions. J. Am. Chem. Soc. 2004, 126 (50), 16487–16498. 10.1021/ja0466457. [DOI] [PubMed] [Google Scholar]
Rahman S. A.; Cuesta S. M.; Furnham N.; Holliday G. L.; Thornton J. M. EC-BLAST: A Tool to Automatically Search and Compare Enzyme Reactions. Nat. Methods 2014, 11 (2), 171–174. 10.1038/nmeth.2803. [DOI] [PMC free article] [PubMed] [Google Scholar]
Egelhofer V.; Schomburg I.; Schomburg D. Automatic Assignment of EC Numbers. PLoS Comput. Biol. 2010, 6 (1), e1000661. 10.1371/journal.pcbi.1000661. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hu Q.-N.; Zhu H.; Li X.; Zhang M.; Deng Z.; Yang X.; Deng Z. Assignment of EC Numbers to Enzymatic Reactions with Reaction Difference Fingerprints. PLoS One 2012, 7 (12), e52901. 10.1371/journal.pone.0052901. [DOI] [PMC free article] [PubMed] [Google Scholar]
Probst D. An Explainability Framework for Deep Learning on Chemical Reactions Exemplified by Enzyme-Catalysed Reaction Classification. J. Cheminformatics 2023, 15 (1), 113. 10.1186/s13321-023-00784-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liu C.-H.; Korablyov M.; Jastrzȩbski S.; Włodarczyk-Pruszyński P.; Bengio Y.; Segler M. RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software. J. Chem. Inf. Model. 2022, 62 (10), 2293–2300. 10.1021/acs.jcim.1c01476. [DOI] [PubMed] [Google Scholar]
Hafner J.; Mohammadi-Peyhani H.; Hatzimanikatis V. Pathway Design. In Metabolic Engineering; John Wiley & Sons, Ltd., 2021; pp 237–257. 10.1002/9783527823468.ch8. [DOI] [Google Scholar]
de Souza R. O. M. A.; Miranda L. S. M.; Bornscheuer U. T. A Retrosynthesis Approach for Biocatalysis in Organic Synthesis. Chem. - Eur. J. 2017, 23 (50), 12040–12063. 10.1002/chem.201702235. [DOI] [PubMed] [Google Scholar]
Song Z.; Zhang Q.; Wu W.; Pu Z.; Yu H. Rational Design of Enzyme Activity and Enantioselectivity. Front. Bioeng. Biotechnol. 2023, 11, 1129149. 10.3389/fbioe.2023.1129149. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ribeiro A. J. M.; Riziotis I. G.; Borkakoti N.; Thornton J. M. Enzyme Function and Evolution through the Lens of Bioinformatics. Biochem. J. 2023, 480 (22), 1845–1863. 10.1042/BCJ20220405. [DOI] [PMC free article] [PubMed] [Google Scholar]
Beaudoin C.; Kundu S.; Topaloglu R. O.; Ghosh S. Quantum Machine Learning for Material Synthesis and Hardware Security (Invited Paper). In Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design; ACM: San Diego, California, 2022; pp 1–7. 10.1145/3508352.3561115. [DOI]
Fan Y.; Xia Y.; Zhu J.; Wu L.; Xie S.; Qin T. Back Translation for Molecule Generation. Bioinformatics 2022, 38 (5), 1244–1251. 10.1093/bioinformatics/btab817. [DOI] [PubMed] [Google Scholar]
Zahoránszky-Kőhalmi G.; Lysov N.; Vorontcov I.; Wang J.; Soundararajan J.; Metaxotos D.; Mathew B.; Sarosh R.; Michael S. G.; Godfrey A. G. Algorithm for the Pruning of Synthesis Graphs. J. Chem. Inf. Model. 2022, 62 (9), 2226–2238. 10.1021/acs.jcim.1c01202. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen Z.; Ayinde O. R.; Fuchs J. R.; Sun H.; Ning X. G2Retro as a Two-Step Graph Generative Models for Retrosynthesis Prediction. Commun. Chem. 2023, 6 (1), 102. 10.1038/s42004-023-00897-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Genheden S.; Norrby P.-O.; Engkvist O. AiZynthTrain: Robust, Reproducible, and Extensible Pipelines for Training Synthesis Prediction Models. J. Chem. Inf. Model. 2023, 63 (7), 1841–1846. 10.1021/acs.jcim.2c01486. [DOI] [PubMed] [Google Scholar]
Mo Y.; Guan Y.; Verma P.; Guo J.; Fortunato M. E.; Lu Z.; Coley C. W.; Jensen K. F. Evaluating and Clustering Retrosynthesis Pathways with Learned Strategy. Chem. Sci. 2021, 12 (4), 1469–1478. 10.1039/D0SC05078D. [DOI] [PMC free article] [PubMed] [Google Scholar]
Born J.; Manica M.; Cadow J.; Markert G.; Mill N. A.; Filipavicius M.; Janakarajan N.; Cardinale A.; Laino T.; Rodríguez Martínez M. Data-Driven Molecular Design for Discovery and Synthesis of Novel Ligands: A Case Study on SARS-CoV-2. Mach. Learn. Sci. Technol. 2021, 2 (2), 025024. 10.1088/2632-2153/abe808. [DOI] [Google Scholar]
Rogers D.; Hahn M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50 (5), 742–754. 10.1021/ci100050t. [DOI] [PubMed] [Google Scholar]
Boiko D. A.; MacKnight R.; Kline B.; Gomes G. Autonomous Chemical Research with Large Language Models. Nature 2023, 624 (7992), 570–578. 10.1038/s41586-023-06792-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
Guo T.; Guo K.; Nan B.; Liang Z.; Guo Z.; Chawla N. V.; Wiest O.; Zhang X. What Can Large Language Models Do in Chemistry? A Comprehensive Benchmark on Eight Tasks. http://arxiv.org/abs/2305.18365 (accessed 2023-11-27).
Meng Z.; Zhao P.; Yu Y.; King I. A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence; International Joint Conferences on Artificial Intelligence Organization: Macau, SAR China, 2023; pp 6723–6731. 10.24963/ijcai.2023/753. [DOI]
Hu W.; Liu Y.; Chen X.; Chai W.; Chen H.; Wang H.; Wang G. Deep Learning Methods for Small Molecule Drug Discovery: A Survey. IEEE Trans. Artif. Intell. 2024, 5, 459. 10.1109/TAI.2023.3251977. [DOI] [Google Scholar]
Jiang Y.; Yu Y.; Kong M.; Mei Y.; Yuan L.; Huang Z.; Kuang K.; Wang Z.; Yao H.; Zou J.; Coley C. W.; Wei Y. Artificial Intelligence for Retrosynthesis Prediction. Engineering 2023, 25, 32–50. 10.1016/j.eng.2022.04.021. [DOI] [Google Scholar]
Kearnes S. M.; Maser M. R.; Wleklinski M.; Kast A.; Doyle A. G.; Dreher S. D.; Hawkins J. M.; Jensen K. F.; Coley C. W. The Open Reaction Database. J. Am. Chem. Soc. 2021, 143 (45), 18820–18826. 10.1021/jacs.1c09820. [DOI] [PubMed] [Google Scholar]
Heid E.; Probst D.; Green W. H.; Madsen G. K. H. EnzymeMap: Curation, Validation and Data-Driven Prediction of Enzymatic Reactions. Chem. Sci. 2023, 14 (48), 14229–14242. 10.1039/D3SC02048G. [DOI] [PMC free article] [PubMed] [Google Scholar]
Seierstad M.; Tichenor M. S.; DesJarlais R. L.; Na J.; Bacani G. M.; Chung D. M.; Mercado-Marin E. V.; Steffens H. C.; Mirzadegan T. Novel Reagent Space: Identifying Unorderable but Readily Synthesizable Building Blocks. ACS Med. Chem. Lett. 2021, 12 (11), 1853–1860. 10.1021/acsmedchemlett.1c00340. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lin Y.; Zhang R.; Wang D.; Cernak T. Computer-Aided Key Step Generation in Alkaloid Total Synthesis. Science 2023, 379 (6631), 453–457. 10.1126/science.ade8459. [DOI] [PubMed] [Google Scholar]
Hardy M. A.; Nan B.; Wiest O.; Sarpong R. Strategic Elements in Computer-Assisted Retrosynthesis: A Case Study of the Pupukeanane Natural Products. Tetrahedron 2022, 104, 132584. 10.1016/j.tet.2021.132584. [DOI] [PMC free article] [PubMed] [Google Scholar]
Soudier P.; Zúñiga A.; Duigou T.; Voyvodic P. L.; Bazi-Kabbaj K.; Kushwaha M.; Vendrell J. A.; Solassol J.; Bonnet J.; Faulon J.-L. PeroxiHUB: A Modular Cell-Free Biosensing Platform Using H₂O₂ as Signal Integrator. ACS Synth. Biol. 2022, 11 (8), 2578–2588. 10.1021/acssynbio.2c00138. [DOI] [PubMed] [Google Scholar]
Robinson C. J.; Carbonell P.; Jervis A. J.; Yan C.; Hollywood K. A.; Dunstan M. S.; Currin A.; Swainston N.; Spiess R.; Taylor S.; Mulherin P.; Parker S.; Rowe W.; Matthews N. E.; Malone K. J.; Le Feuvre R.; Shapira P.; Barran P.; Turner N. J.; Micklefield J.; Breitling R.; Takano E.; Scrutton N. S. Rapid Prototyping of Microbial Production Strains for the Biomanufacture of Potential Materials Monomers. Metab. Eng. 2020, 60, 168–182. 10.1016/j.ymben.2020.04.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang Z.; Fang L.; Wang F.; Deng Y.; Jiang Z.; Li A. Transforming Inert Cycloalkanes into α,ω-Diamines by Designed Enzymatic Cascade Catalysis. Angew. Chem., Int. Ed. 2023, 62 (16), e202215935. 10.1002/anie.202215935. [DOI] [PubMed] [Google Scholar]
Liu Z.; Zhang X.; Lei D.; Qiao B.; Zhao G.-R. Metabolic Engineering of Escherichia coli for de Novo Production of 3-Phenylpropanol via Retrobiosynthesis Approach. Microb. Cell Factories 2021, 20 (1), 121. 10.1186/s12934-021-01615-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yiakoumetti A.; Hanko E. K. R.; Zou Y.; Chua J.; Chromy J.; Stoney R. A.; Valdehuesa K. N. G.; Connolly J. A.; Yan C.; Hollywood K. A.; Takano E.; Breitling R. Expanding Flavone and Flavonol Production Capabilities in Escherichia coli. Front. Bioeng. Biotechnol. 2023, 11, 1275651. 10.3389/fbioe.2023.1275651. [DOI] [PMC free article] [PubMed] [Google Scholar]
Brito L. F.; Irla M.; Nærdal I.; Le S. B.; Delépine B.; Heux S.; Brautaset T. Evaluation of Heterologous Biosynthetic Pathways for Methanol-Based 5-Aminovalerate Production by Thermophilic Bacillus methanolicus. Front. Bioeng. Biotechnol. 2021, 9, 1. 10.3389/fbioe.2021.686319. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hanko E. K. R.; Valdehuesa K. N. G.; Verhagen K. J. A.; Chromy J.; Stoney R. A.; Chua J.; Yan C.; Roubos J. A.; Schmitz J.; Breitling R. Carboxylic Acid Reductase-Dependent Biosynthesis of Eugenol and Related Allylphenols. Microb. Cell Factories 2023, 22 (1), 238. 10.1186/s12934-023-02246-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

sb4c00091_si_001.pdf (195.8 KB, pdf)

[ref1] Corey E. J. General Methods for the Construction of Complex Molecules. In The Chemistry of Natural Products; Elsevier, 1967; pp 19–37. 10.1016/B978-0-08-020741-4.50004-X. [DOI] [Google Scholar]

[ref2] Lin G.-M.; Warden-Rothman R.; Voigt C. A. Retrosynthetic Design of Metabolic Pathways to Chemicals Not Found in Nature. Curr. Opin. Syst. Biol. 2019, 14, 82–107. 10.1016/j.coisb.2019.04.004. [DOI] [Google Scholar]

[ref3] Sheldon R. A.; Woodley J. M. Role of Biocatalysis in Sustainable Chemistry. Chem. Rev. 2018, 118 (2), 801–838. 10.1021/acs.chemrev.7b00203. [DOI] [PubMed] [Google Scholar]

[ref4] Yu T.; Boob A. G.; Volk M. J.; Liu X.; Cui H.; Zhao H. Machine Learning-Enabled Retrobiosynthesis of Molecules. Nat. Catal. 2023, 6 (2), 137–151. 10.1038/s41929-022-00909-w. [DOI] [Google Scholar]

[ref5] Tricco A. C.; Lillie E.; Zarin W.; O’Brien K. K.; Colquhoun H.; Levac D.; Moher D.; Peters M. D. J.; Horsley T.; Weeks L.; Hempel S.; Akl E. A.; Chang C.; McGowan J.; Stewart L.; Hartling L.; Aldcroft A.; Wilson M. G.; Garritty C.; Lewin S.; Godfrey C. M.; Macdonald M. T.; Langlois E. V.; Soares-Weiser K.; Moriarty J.; Clifford T.; Tunçalp Ö.; Straus S. E. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169 (7), 467–473. 10.7326/M18-0850. [DOI] [PubMed] [Google Scholar]

[ref6] Aal E Ali R. S.; Meng J.; Khan M. E. I.; Jiang X. Machine Learning Advancements in Organic Synthesis: A Focused Exploration of Artificial Intelligence Applications in Chemistry. Artif. Intell. Chem. 2024, 2, 100049. 10.1016/j.aichem.2024.100049. [DOI] [Google Scholar]

[ref7] Fortunato M. E.; Coley C. W.; Barnes B. C.; Jensen K. F. Machine Learned Prediction of Reaction Template Applicability for Data-Driven Retrosynthetic Predictions of Energetic Materials; AIP Conf. Proc.; AIP Publishing, Portland, OR, USA, 2020; Vol. 2272, p 070014. 10.1063/12.0000850. [DOI]

[ref8] Wan Y.; Liao B.; Hsieh C.-Y.; Zhang S. Retroformer: Pushing the Limits of Interpretable End-to-End Retrosynthesis Transformer. In Proceedings of the 39th International Conference on Machine Learning; Proceedings of Machine Learning Research; PMLR, 2022; Vol. 162, pp 22475–22490.

[ref9] Zhong W.; Yang Z.; Chen C. Y.-C. Retrosynthesis Prediction Using an End-to-End Graph Generative Architecture for Molecular Graph Editing. Nat. Commun. 2023, 14 (1), 3009. 10.1038/s41467-023-38851-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref10] Karpov P.; Godin G.; Tetko I. V. A Transformer Model for Retrosynthesis. In Artificial Neural Networks and Machine Learning - ICANN 2019: Workshop and Special Sessions; Tetko I. V., Kůrková V., Karpov P., Theis F., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, 2019; Vol. 11731, pp 817–830. 10.1007/978-3-030-30493-5_78. [DOI]

[ref11] Heid E.; Liu J.; Aude A.; Green W. H. Influence of Template Size, Canonicalization, and Exclusivity for Retrosynthesis and Reaction Prediction Applications. J. Chem. Inf. Model. 2022, 62 (1), 16–26. 10.1021/acs.jcim.1c01192. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref12] Coley C. W.; Rogers L.; Green W. H.; Jensen K. F. Computer-Assisted Retrosynthesis Based on Molecular Similarity. ACS Cent. Sci. 2017, 3 (12), 1237–1245. 10.1021/acscentsci.7b00355. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] Coley C. W.; Jin W.; Rogers L.; Jamison T. F.; Jaakkola T. S.; Green W. H.; Barzilay R.; Jensen K. F. A Graph-Convolutional Neural Network Model for the Prediction of Chemical Reactivity. Chem. Sci. 2019, 10 (2), 370–377. 10.1039/C8SC04228D. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref14] Krenn M.; Häse F.; Nigam A.; Friederich P.; Aspuru-Guzik A. Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation. Mach. Learn. Sci. Technol. 2020, 1 (4), 045024. 10.1088/2632-2153/aba947. [DOI] [Google Scholar]

[ref15] Carbonell P.; Carlsson L.; Faulon J.-L. Stereo Signature Molecular Descriptor. J. Chem. Inf. Model. 2013, 53 (4), 887–897. 10.1021/ci300584r. [DOI] [PubMed] [Google Scholar]

[ref16] Hähnke V. D.; Bolton E. E.; Bryant S. H. PubChem Atom Environments. J. Cheminformatics 2015, 7 (1), 41. 10.1186/s13321-015-0076-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref17] Wigh D. S.; Goodman J. M.; Lapkin A. A. A Review of Molecular Representation in the Age of Machine Learning. WIREs Comput. Mol. Sci. 2022, 12 (5), e1603. 10.1002/wcms.1603. [DOI] [Google Scholar]

[ref18] David L.; Thakkar A.; Mercado R.; Engkvist O. Molecular Representations in AI-Driven Drug Discovery: A Review and Practical Guide. J. Cheminformatics 2020, 12 (1), 56. 10.1186/s13321-020-00460-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref19] Li J.; Fang L.; Lou J.-G. RetroRanker: Leveraging Reaction Changes to Improve Retrosynthesis Prediction through Re-Ranking. J. Cheminformatics 2023, 15 (1), 58. 10.1186/s13321-023-00727-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref20] Kim E.; Lee D.; Kwon Y.; Park M. S.; Choi Y.-S. Valid, Plausible, and Diverse Retrosynthesis Using Tied Two-Way Transformers with Latent Variables. J. Chem. Inf. Model. 2021, 61 (1), 123–133. 10.1021/acs.jcim.0c01074. [DOI] [PubMed] [Google Scholar]

[ref21] Coley C. W.; Green W. H.; Jensen K. F. RDChiral: An RDKit Wrapper for Handling Stereochemistry in Retrosynthetic Template Extraction and Application. J. Chem. Inf. Model. 2019, 59 (6), 2529–2537. 10.1021/acs.jcim.9b00286. [DOI] [PubMed] [Google Scholar]

[ref22] Schwaller P.; Petraglia R.; Zullo V.; Nair V. H.; Haeuselmann R. A.; Pisoni R.; Bekas C.; Iuliano A.; Laino T. Predicting Retrosynthetic Pathways Using Transformer-Based Models and a Hyper-Graph Exploration Strategy. Chem. Sci. 2020, 11 (12), 3316–3325. 10.1039/C9SC05704H. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref23] Zhang Y.; Wang L.; Wang X.; Zhang C.; Ge J.; Tang J.; Su A.; Duan H. Data Augmentation and Transfer Learning Strategies for Reaction Prediction in Low Chemical Data Regimes. Org. Chem. Front. 2021, 8 (7), 1415–1423. 10.1039/D0QO01636E. [DOI] [Google Scholar]

[ref24] Mao K.; Xiao X.; Xu T.; Rong Y.; Huang J.; Zhao P. Molecular Graph Enhanced Transformer for Retrosynthesis Prediction. Neurocomputing 2021, 457, 193–202. 10.1016/j.neucom.2021.06.037. [DOI] [Google Scholar]

[ref25] Irwin R.; Dimitriadis S.; He J.; Bjerrum E. J. Chemformer: A Pre-Trained Transformer for Computational Chemistry. Mach. Learn. Sci. Technol. 2022, 3 (1), 015022. 10.1088/2632-2153/ac3ffb. [DOI] [Google Scholar]

[ref26] Baylon J. L.; Cilfone N. A.; Gulcher J. R.; Chittenden T. W. Enhancing Retrosynthetic Reaction Prediction with Deep Learning Using Multiscale Reaction Classification. J. Chem. Inf. Model. 2019, 59 (2), 673–688. 10.1021/acs.jcim.8b00801. [DOI] [PubMed] [Google Scholar]

[ref27] Zhang B.; Lin J.; Du L.; Zhang L. Harnessing Data Augmentation and Normalization Preprocessing to Improve the Performance of Chemical Reaction Predictions of Data-Driven Model. Polymers 2023, 15 (9), 2224. 10.3390/polym15092224. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref28] Zhu J.; Xia Y.; Wu L.; Xie S.; Zhou W.; Qin T.; Li H.; Liu T.-Y. Dual-View Molecular Pre-Training. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; ACM: Long Beach, CA, USA, 2023; pp 3615–3627. 10.1145/3580305.3599317. [DOI]

[ref29] Yang F.; Liu J.; Zhang Q.; Yang Z.; Zhang X. CNN-Based Two-Branch Multi-Scale Feature Extraction Network for Retrosynthesis Prediction. BMC Bioinformatics 2022, 23 (1), 362. 10.1186/s12859-022-04904-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref30] Zheng S.; Rao J.; Zhang Z.; Xu J.; Yang Y. Predicting Retrosynthetic Reactions Using Self-Corrected Transformer Neural Networks. J. Chem. Inf. Model. 2020, 60 (1), 47–55. 10.1021/acs.jcim.9b00949. [DOI] [PubMed] [Google Scholar]

[ref31] Lee A. A.; Yang Q.; Sresht V.; Bolgar P.; Hou X.; Klug-McLeod J. L.; Butler C. R. Molecular Transformer Unifies Reaction Prediction and Retrosynthesis across Pharma Chemical Space. Chem. Commun. 2019, 55 (81), 12152–12155. 10.1039/C9CC05122H. [DOI] [PubMed] [Google Scholar]

[ref32] Wang X.; Li Y.; Qiu J.; Chen G.; Liu H.; Liao B.; Hsieh C.-Y.; Yao X. RetroPrime: A Diverse, Plausible and Transformer-Based Method for Single-Step Retrosynthesis Predictions. Chem. Eng. J. 2021, 420, 129845. 10.1016/j.cej.2021.129845. [DOI] [Google Scholar]

[ref33] Liu B.; Ramsundar B.; Kawthekar P.; Shi J.; Gomes J.; Luu Nguyen Q.; Ho S.; Sloane J.; Wender P.; Pande V. Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models. ACS Cent. Sci. 2017, 3 (10), 1103–1113. 10.1021/acscentsci.7b00303. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref34] Fang L.; Li J.; Zhao M.; Tan L.; Lou J.-G. Single-Step Retrosynthesis Prediction by Leveraging Commonly Preserved Substructures. Nat. Commun. 2023, 14 (1), 2446. 10.1038/s41467-023-37969-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref35] Zhang K.; Mann V.; Venkatasubramanian V. G-MATT: Single-Step Retrosynthesis Prediction Using Molecular Grammar Tree Transformer. AIChE J. 2024, 70, e18244. 10.1002/aic.18244. [DOI] [Google Scholar]

[ref36] Yan C.; Zhao P.; Lu C.; Yu Y.; Huang J. RetroComposer: Composing Templates for Template-Based Retrosynthesis Prediction. Biomolecules 2022, 12 (9), 1325. 10.3390/biom12091325. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref37] Bai R.; Zhang C.; Wang L.; Yao C.; Ge J.; Duan H. Transfer Learning: Making Retrosynthetic Predictions Based on a Small Chemical Reaction Dataset Scale to a New Level. Molecules 2020, 25 (10), 2357. 10.3390/molecules25102357. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref38] Qiao H.; Wu Y.; Zhang Y.; Zhang C.; Wu X.; Wu Z.; Zhao Q.; Wang X.; Li H.; Duan H. Transformer-Based Multitask Learning for Reaction Prediction under Low-Resource Circumstances. RSC Adv. 2022, 12 (49), 32020–32026. 10.1039/D2RA05349G. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref39] Yan Y.; Zhao Y.; Yao H.; Feng J.; Liang L.; Han W.; Xu X.; Pu C.; Zang C.; Chen L.; Li Y.; Liu H.; Lu T.; Chen Y.; Zhang Y. RPBP: Deep Retrosynthesis Reaction Prediction Based on Byproducts. J. Chem. Inf. Model. 2023, 63 (19), 5956–5970. 10.1021/acs.jcim.3c00274. [DOI] [PubMed] [Google Scholar]

[ref40] Heid E.; Goldman S.; Sankaranarayanan K.; Coley C. W.; Flamm C.; Green W. H. EHreact: Extended Hasse Diagrams for the Extraction and Scoring of Enzymatic Reaction Templates. J. Chem. Inf. Model. 2021, 61 (10), 4949–4961. 10.1021/acs.jcim.1c00921. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref41] Seo S.-W.; Song Y. Y.; Yang J. Y.; Bae S.; Lee H.; Shin J.; Hwang S. J.; Yang E. GTA: Graph Truncated Attention for Retrosynthesis. Proc. AAAI Conf. Artif. Intell. 2021, 35 (1), 531–539. 10.1609/aaai.v35i1.16131. [DOI] [Google Scholar]

[ref42] He H.-R.; Wang J.; Liu Y.; Wu F. Modeling Diverse Chemical Reactions for Single-Step Retrosynthesis via Discrete Latent Variables. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management; ACM: Atlanta, GA, USA, 2022; pp 717–726. 10.1145/3511808.3557397. [DOI]

[ref43] Sankaranarayanan K.; Heid E.; Coley C. W.; Verma D.; Green W. H.; Jensen K. F. Similarity Based Enzymatic Retrosynthesis. Chem. Sci. 2022, 13 (20), 6039–6053. 10.1039/D2SC01588A. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref44] Tetko I. V.; Karpov P.; Van Deursen R.; Godin G. State-of-the-Art Augmented NLP Transformer Models for Direct and Single-Step Retrosynthesis. Nat. Commun. 2020, 11 (1), 5575. 10.1038/s41467-020-19266-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref45] Toniato A.; Vaucher A. C.; Schwaller P.; Laino T. Enhancing Diversity in Language Based Models for Single-Step Retrosynthesis. Digit. Discovery 2023, 2 (2), 489–501. 10.1039/D2DD00110A. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref46] Ucak U. V.; Ashyrmamatov I.; Lee J. Reconstruction of Lossless Molecular Representations from Fingerprints. J. Cheminformatics 2023, 15 (1), 26. 10.1186/s13321-023-00693-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref47] Seidl P.; Renz P.; Dyubankova N.; Neves P.; Verhoeven J.; Wegner J. K.; Segler M.; Hochreiter S.; Klambauer G. Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks. J. Chem. Inf. Model. 2022, 62 (9), 2111–2120. 10.1021/acs.jcim.1c01065. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref48] Yu J.; Wang J.; Zhao H.; Gao J.; Kang Y.; Cao D.; Wang Z.; Hou T. Organic Compound Synthetic Accessibility Prediction Based on the Graph Attention Mechanism. J. Chem. Inf. Model. 2022, 62 (12), 2973–2986. 10.1021/acs.jcim.2c00038. [DOI] [PubMed] [Google Scholar]

[ref49] Ucak U. V.; Kang T.; Ko J.; Lee J. Substructure-Based Neural Machine Translation for Retrosynthetic Prediction. J. Cheminformatics 2021, 13 (1), 4. 10.1186/s13321-020-00482-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref50] Ucak U. V.; Ashyrmamatov I.; Ko J.; Lee J. Retrosynthetic Reaction Pathway Prediction through Neural Machine Translation of Atomic Environments. Nat. Commun. 2022, 13 (1), 1186. 10.1038/s41467-022-28857-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref51] Badowski T.; Gajewska E. P.; Molga K.; Grzybowski B. A. Synergy Between Expert and Machine Learning Approaches Allows for Improved Retrosynthetic Planning. Angew. Chem., Int. Ed. 2020, 59 (2), 725–730. 10.1002/anie.201912083. [DOI] [PubMed] [Google Scholar]

[ref52] Thakkar A.; Selmi N.; Reymond J.-L.; Engkvist O.; Bjerrum E. J. “Ring Breaker”: Neural Network Driven Synthesis Prediction of the Ring System Chemical Space. J. Med. Chem. 2020, 63 (16), 8791–8808. 10.1021/acs.jmedchem.9b01919. [DOI] [PubMed] [Google Scholar]

[ref53] Hasic H.; Ishida T. Single-Step Retrosynthesis Prediction Based on the Identification of Potential Disconnection Sites Using Molecular Substructure Fingerprints. J. Chem. Inf. Model. 2021, 61 (2), 641–652. 10.1021/acs.jcim.0c01100. [DOI] [PubMed] [Google Scholar]

[ref54] Segler M. H. S.; Waller M. P. Neural Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. Chem. - Eur. J. 2017, 23 (25), 5966–5971. 10.1002/chem.201605499. [DOI] [PubMed] [Google Scholar]

[ref55] Ishida S.; Terayama K.; Kojima R.; Takasu K.; Okuno Y. Prediction and Interpretable Visualization of Retrosynthetic Reactions Using Graph Convolutional Networks. J. Chem. Inf. Model. 2019, 59 (12), 5026–5033. 10.1021/acs.jcim.9b00538. [DOI] [PubMed] [Google Scholar]

[ref56] Dai H.; Li C.; Coley C.; Dai B.; Song L. Retrosynthesis Prediction with Conditional Graph Logic Network. In Advances in Neural Information Processing Systems 32; NeurIPS 2019; Curran Associates, Inc., 2019; Vol. 32.

[ref57] Chen S.; Jung Y. Deep Retrosynthetic Reaction Prediction Using Local Reactivity and Global Attention. JACS Au 2021, 1 (10), 1612–1620. 10.1021/jacsau.1c00246. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref58] Lee H.; Ahn S.; Seo S.-W.; Song Y. Y.; Yang E.; Hwang S.-J.; Shin J. RetCL: A Selection-Based Approach for Retrosynthesis via Contrastive Learning. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence; IJCAI-21; International Joint Conferences on Artificial Intelligence Organization, 2021; pp 2673–2679. 10.24963/ijcai.2021/368. [DOI]

[ref59] Lin Z.; Yin S.; Shi L.; Zhou W.; Zhang Y. J. G2GT: Retrosynthesis Prediction with Graph-to-Graph Attention Neural Network and Self-Training. J. Chem. Inf. Model. 2023, 63 (7), 1894–1905. 10.1021/acs.jcim.2c01302. [DOI] [PubMed] [Google Scholar]

[ref60] Delépine B.; Duigou T.; Carbonell P.; Faulon J.-L. RetroPath2.0: A Retrosynthesis Workflow for Metabolic Engineers. Metab. Eng. 2018, 45, 158–170. 10.1016/j.ymben.2017.12.002. [DOI] [PubMed] [Google Scholar]

[ref61] Szymkuć S.; Gajewska E. P.; Klucznik T.; Molga K.; Dittwald P.; Startek M.; Bajczyk M.; Grzybowski B. A. Computer-Assisted Synthetic Planning: The End of the Beginning. Angew. Chem., Int. Ed. 2016, 55 (20), 5904–5937. 10.1002/anie.201506101. [DOI] [PubMed] [Google Scholar]

[ref62] Finnigan W.; Hepworth L. J.; Flitsch S. L.; Turner N. J. RetroBioCat as a Computer-Aided Synthesis Planning Tool for Biocatalytic Reactions and Cascades. Nat. Catal. 2021, 4 (2), 98–104. 10.1038/s41929-020-00556-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref63] Duigou T.; du Lac M.; Carbonell P.; Faulon J.-L. RetroRules: A Database of Reaction Rules for Engineering Biology. Nucleic Acids Res. 2019, 47 (D1), D1229–D1235. 10.1093/nar/gky940. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref64] Dong Z.; Chen Z.; Wang Q. Retrosynthesis Prediction Based on Graph Relation Network. In 2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI); IEEE: Beijing, China, 2022; pp 1–5. 10.1109/CISP-BMEI56279.2022.9979857. [DOI]

[ref65] Schwaller P.; Laino T.; Gaudin T.; Bolgar P.; Hunter C. A.; Bekas C.; Lee A. A. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent. Sci. 2019, 5 (9), 1572–1583. 10.1021/acscentsci.9b00576. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref66] Guo Z.; Wu S.; Ohno M.; Yoshida R. Bayesian Algorithm for Retrosynthesis. J. Chem. Inf. Model. 2020, 60 (10), 4474–4486. 10.1021/acs.jcim.0c00320. [DOI] [PubMed] [Google Scholar]

[ref67] Tu Z.; Coley C. W. Permutation Invariant Graph-to-Sequence Model for Template-Free Retrosynthesis and Reaction Prediction. J. Chem. Inf. Model. 2022, 62 (15), 3503–3513. 10.1021/acs.jcim.2c00321. [DOI] [PubMed] [Google Scholar]

[ref68] Hu H.; Jiang Y.; Yang Y.; Chen J. X. BiG2S: A Dual Task Graph-to-Sequence Model for the End-to-End Template-Free Reaction Prediction. Appl. Intell. 2023, 53, 29620. 10.1007/s10489-023-05048-8. [DOI] [Google Scholar]

[ref69] Liu S.; Tu Z.; Xu M.; Zhang Z.; Lin L.; Ying R.; Tang J.; Zhao P.; Wu D. FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning. In Proceedings of the 40th International Conference on Machine Learning; ICML’23; JMLR.org: Honolulu, Hawaii, USA, 2023. 10.5555/3618408.3619322. [DOI] [Google Scholar]

[ref70] Lin M. H.; Tu Z.; Coley C. W. Improving the Performance of Models for One-Step Retrosynthesis through Re-Ranking. J. Cheminformatics 2022, 14 (1), 15. 10.1186/s13321-022-00594-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref71] Sun R.; Dai H.; Li L.; Kearnes S.; Dai B. Towards Understanding Retrosynthesis by Energy-Based Models. In Advances in Neural Information Processing Systems 34; NeurIPS 2021; Curran Associates, Inc., 2021.

[ref72] Christofidellis D.; Giannone G.; Born J.; Winther O.; Laino T.; Manica M. Unifying Molecular and Textual Representations via Multi-Task Language Modelling. In Proceedings of the 40th International Conference on Machine Learning; ICML’23; JMLR.org: Honolulu, Hawaii, USA, 2023. 10.5555/3618408.3618651. [DOI]

[ref73] Zheng S.; Zeng T.; Li C.; Chen B.; Coley C. W.; Yang Y.; Wu R. Deep Learning Driven Biosynthetic Pathways Navigation for Natural Products with BioNavi-NP. Nat. Commun. 2022, 13 (1), 3342. 10.1038/s41467-022-30970-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref74] Kreutter D.; Schwaller P.; Reymond J.-L. Predicting Enzymatic Reactions with a Molecular Transformer. Chem. Sci. 2021, 12 (25), 8648–8659. 10.1039/D1SC02362D. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref75] Probst D.; Manica M.; Nana Teukam Y. G.; Castrogiovanni A.; Paratore F.; Laino T. Biocatalysed Synthesis Planning Using Data-Driven Learning. Nat. Commun. 2022, 13 (1), 964. 10.1038/s41467-022-28536-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref76] Shi C.; Xu M.; Guo H.; Zhang M.; Tang J. A Graph to Graphs Framework for Retrosynthesis Prediction. In Proceedings of the 37th International Conference on Machine Learning; ICML’20; JMLR.org, 2020. 10.5555/3524938.3525756. [DOI]

[ref77] Yan C.; Ding Q.; Zhao P.; Zheng S.; Yang J.; Yu Y.; Huang J. RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist. In Proceedings of the 34th International Conference on Neural Information Processing Systems; NIPS’20; Curran Associates Inc.: Red Hook, NY, USA, 2020. 10.5555/3495724.3496668. [DOI]

[ref78] Somnath V. R.; Bunne C.; Coley C. W.; Krause A.; Barzilay R. Learning Graph Models for Retrosynthesis Prediction. In Advances in Neural Information Processing Systems 34; NeurIPS 2021; Curran Associates, Inc., 2021.

[ref79] Wang Y.; Pang C.; Wang Y.; Jin J.; Zhang J.; Zeng X.; Su R.; Zou Q.; Wei L. Retrosynthesis Prediction with an Interpretable Deep-Learning Framework Based on Molecular Assembly Tasks. Nat. Commun. 2023, 14 (1), 6155. 10.1038/s41467-023-41698-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref80] Thakkar A.; Vaucher A. C.; Byekwaso A.; Schwaller P.; Toniato A.; Laino T. Unbiasing Retrosynthesis Language Models with Disconnection Prompts. ACS Cent. Sci. 2023, 9 (7), 1488–1498. 10.1021/acscentsci.3c00372. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref81] Jaume-Santero F.; Bornet A.; Valery A.; Naderi N.; Vicente Alvarez D.; Proios D.; Yazdani A.; Bournez C.; Fessard T.; Teodoro D. Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios. J. Chem. Inf. Model. 2023, 63 (7), 1914–1924. 10.1021/acs.jcim.2c01407. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref82] Wang X.; Hsieh C.-Y.; Yin X.; Wang J.; Li Y.; Deng Y.; Jiang D.; Wu Z.; Du H.; Chen H.; Li Y.; Liu H.; Wang Y.; Luo P.; Hou T.; Yao X. Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center. Research 2023, 6, 0231. 10.34133/research.0231. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref83] Zhang C.; Lapkin A. A. Reinforcement Learning Optimization of Reaction Routes on the Basis of Large, Hybrid Organic Chemistry-Synthetic Biological, Reaction Network Data. React. Chem. Eng. 2023, 8 (10), 2491–2504. 10.1039/D2RE00406B. [DOI] [Google Scholar]

[ref84] Koch M.; Duigou T.; Faulon J.-L. Reinforcement Learning for Bioretrosynthesis. ACS Synth. Biol. 2020, 9 (1), 157–168. 10.1021/acssynbio.9b00447. [DOI] [PubMed] [Google Scholar]

[ref85] Wang W.; Liu Q.; Zhang L.; Dong Y.; Du J. RetroSynX: A Retrosynthetic Analysis Framework Using Hybrid Reaction Templates and Group Contribution-Based Thermodynamic Models. Chem. Eng. Sci. 2022, 248, 117208. 10.1016/j.ces.2021.117208. [DOI] [Google Scholar]

[ref86] Liu Q.; Tang K.; Zhang L.; Du J.; Meng Q. Computer-Assisted Synthetic Planning Considering Reaction Kinetics Based on Transition State Automated Generation Method. AIChE J. 2023, 69 (7), e18092. 10.1002/aic.18092. [DOI] [Google Scholar]

[ref87] Kreutter D.; Reymond J.-L. Multistep Retrosynthesis Combining a Disconnection Aware Triple Transformer Loop with a Route Penalty Score Guided Tree Search. Chem. Sci. 2023, 14 (36), 9959–9969. 10.1039/D3SC01604H. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref88] Kishimoto A.; Buesser B.; Chen B.; Botea A. Depth-First Proof-Number Search with Heuristic Edge Cost and Application to Chemical Synthesis Planning. In Advances in Neural Information Processing Systems 32; NeurIPS 2019; Curran Associates, Inc., 2019.

[ref89] Franz C.; Mogk G.; Mrziglod T.; Schewior K. Completeness and Diversity in Depth-First Proof-Number Search with Applications to Retrosynthesis. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence; International Joint Conferences on Artificial Intelligence Organization: Vienna, Austria, 2022; pp 4747–4753. 10.24963/ijcai.2022/658. [DOI]

[ref90] Shibukawa R.; Ishida S.; Yoshizoe K.; Wasa K.; Takasu K.; Okuno Y.; Terayama K.; Tsuda K. CompRet: A Comprehensive Recommendation Framework for Chemical Synthesis Planning with Algorithmic Enumeration. J. Cheminformatics 2020, 12 (1), 52. 10.1186/s13321-020-00452-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref91] Segler M. H. S.; Preuss M.; Waller M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555 (7698), 604–610. 10.1038/nature25978. [DOI] [PubMed] [Google Scholar]

[ref92] Zhang B.; Zhang X.; Du W.; Song Z.; Zhang G.; Zhang G.; Wang Y.; Chen X.; Jiang J.; Luo Y. Chemistry-Informed Molecular Graph as Reaction Descriptor for Machine-Learned Retrosynthesis Planning. Proc. Natl. Acad. Sci. U. S. A. 2022, 119 (41), e2212711119. 10.1073/pnas.2212711119. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref93] Wang X.; Qian Y.; Gao H.; Coley C. W.; Mo Y.; Barzilay R.; Jensen K. F. Towards Efficient Discovery of Green Synthetic Pathways with Monte Carlo Tree Search and Reinforcement Learning. Chem. Sci. 2020, 11 (40), 10959–10972. 10.1039/D0SC04184J. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref94] Thakkar A.; Kogej T.; Reymond J.-L.; Engkvist O.; Bjerrum E. J. Datasets and Their Influence on the Development of Computer Assisted Synthesis Planning Tools in the Pharmaceutical Domain. Chem. Sci. 2020, 11 (1), 154–168. 10.1039/C9SC04944D. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref95] Lin K.; Xu Y.; Pei J.; Lai L. Automatic Retrosynthetic Route Planning Using Template-Free Models. Chem. Sci. 2020, 11 (12), 3355–3364. 10.1039/C9SC03666K. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref96] Genheden S.; Thakkar A.; Chadimová V.; Reymond J.-L.; Engkvist O.; Bjerrum E. AiZynthFinder: A Fast, Robust and Flexible Open-Source Software for Retrosynthetic Planning. J. Cheminformatics 2020, 12 (1), 70. 10.1186/s13321-020-00472-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref97] Gao H.; Coley C. W.; Struble T. J.; Li L.; Qian Y.; Green W. H.; Jensen K. F. Combining Retrosynthesis and Mixed-Integer Optimization for Minimizing the Chemical Inventory Needed to Realize a WHO Essential Medicines List. React. Chem. Eng. 2020, 5 (2), 367–376. 10.1039/C9RE00348G. [DOI] [Google Scholar]

[ref98] Coley C. W.; Thomas D. A.; Lummiss J. A. M.; Jaworski J. N.; Breen C. P.; Schultz V.; Hart T.; Fishman J. S.; Rogers L.; Gao H.; Hicklin R. W.; Plehiers P. P.; Byington J.; Piotti J. S.; Green W. H.; Hart A. J.; Jamison T. F.; Jensen K. F. A Robotic Platform for Flow Synthesis of Organic Compounds Informed by AI Planning. Science 2019, 365 (6453), eaax1566. 10.1126/science.aax1566. [DOI] [PubMed] [Google Scholar]

[ref99] Westerlund A. M.; Barge B.; Mervin L.; Genheden S. Data-Driven Approaches for Identifying Hyperparameters in Multi-Step Retrosynthesis. Mol. Inform. 2023, 42, 2300128. 10.1002/minf.202300128. [DOI] [PubMed] [Google Scholar]

[ref100] Sankaranarayanan K.; Jensen K. F. Computer-Assisted Multistep Chemoenzymatic Retrosynthesis Using a Chemical Synthesis Planner. Chem. Sci. 2023, 14 (23), 6467–6475. 10.1039/D3SC01355C. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref101] Levin I.; Liu M.; Voigt C. A.; Coley C. W. Merging Enzymatic and Synthetic Chemistry with Computational Synthesis Planning. Nat. Commun. 2022, 13, 7747. 10.1038/s41467-022-35422-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref102] Jeong J.; Lee N.; Shin Y.; Shin D. Intelligent Generation of Optimal Synthetic Pathways Based on Knowledge Graph Inference and Retrosynthetic Predictions Using Reaction Big Data. J. Taiwan Inst. Chem. Eng. 2022, 130, 103982. 10.1016/j.jtice.2021.07.015. [DOI] [Google Scholar]

[ref103] Chen B.; Li C.; Dai H.; Song L. Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search. In Proceedings of the 37th International Conference on Machine Learning; Proceedings of Machine Learning Research (PMLR), 2020; Vol. 119, pp 1608–1616.

[ref104] Xie S.; Yan R.; Han P.; Xia Y.; Wu L.; Guo C.; Yang B.; Qin T. RetroGraph: Retrosynthetic Planning with Graph Search. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; ACM: Washington DC, USA, 2022; pp 2120–2129. 10.1145/3534678.3539446. [DOI]

[ref105] Han P.; Zhao P.; Lu C.; Huang J.; Wu J.; Shang S.; Yao B.; Zhang X. GNN-Retro: Retrosynthetic Planning with Graph Neural Networks. Proc. AAAI Conf. Artif. Intell. 2022, 36 (4), 4014–4021. 10.1609/aaai.v36i4.20318. [DOI] [Google Scholar]

[ref106] Latendresse M.; Malerich J. P.; Herson J.; Krummenacker M.; Szeto J.; Vu V.-A.; Collins N.; Madrid P. B. SynRoute: A Retrosynthetic Planning Software. J. Chem. Inf. Model. 2023, 63 (17), 5484–5495. 10.1021/acs.jcim.3c00491. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref107] Grzybowski B. A.; Szymkuć S.; Gajewska E. P.; Molga K.; Dittwald P.; Wołos A.; Klucznik T. Chematica: A Story of Computer Code That Started to Think like a Chemist. Chem 2018, 4 (3), 390–398. 10.1016/j.chempr.2018.02.024. [DOI] [Google Scholar]

[ref108] Russell S. J.; Norvig P. Artificial Intelligence: A Modern Approach, 4th ed.; Pearson series in artificial intelligence; Pearson: Hoboken, 2021. [Google Scholar]

[ref109] Schreck J. S.; Coley C. W.; Bishop K. J. M. Learning Retrosynthetic Planning through Simulated Experience. ACS Cent. Sci. 2019, 5 (6), 970–981. 10.1021/acscentsci.9b00055. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref110] Yu Y.; Wei Y.; Kuang K.; Huang Z.; Yao H.; Wu F. GRASP: Navigating Retrosynthetic Planning with Goal-Driven Policy. In Advances in Neural Information Processing Systems 35; NeurIPS 2022; Curran Associates, Inc., 2022.

[ref111] Kumar A.; Wang L.; Ng C. Y.; Maranas C. D. Pathway Design Using de Novo Steps through Uncharted Biochemical Spaces. Nat. Commun. 2018, 9 (1), 184. 10.1038/s41467-017-02362-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref112] Carbonell P.; Parutto P.; Herisson J.; Pandit S. B.; Faulon J.-L. XTMS: Pathway Design in an eXTended Metabolic Space. Nucleic Acids Res. 2014, 42 (W1), W389–W394. 10.1093/nar/gku362. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref113] Tokic M.; Hadadi N.; Ataman M.; Neves D.; Ebert B. E.; Blank L. M.; Miskovic L.; Hatzimanikatis V. Discovery and Evaluation of Biosynthetic Pathways for the Production of Five Methyl Ethyl Ketone Precursors. ACS Synth. Biol. 2018, 7 (8), 1858–1873. 10.1021/acssynbio.8b00049. [DOI] [PubMed] [Google Scholar]

[ref114] Hadadi N.; Hafner J.; Shajkofci A.; Zisaki A.; Hatzimanikatis V. ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies. ACS Synth. Biol. 2016, 5 (10), 1155–1166. 10.1021/acssynbio.6b00054. [DOI] [PubMed] [Google Scholar]

[ref115] Otero-Muras I.; Carbonell P. Automated Engineering of Synthetic Metabolic Pathways for Efficient Biomanufacturing. Metab. Eng. 2021, 63, 61–80. 10.1016/j.ymben.2020.11.012. [DOI] [PubMed] [Google Scholar]

[ref116] Kim J.; Ahn S.; Lee H.; Shin J. Self-Improved Retrosynthetic Planning. In Proceedings of the 38th International Conference on Machine Learning, Virtual, July 18–24, 2021; Proceedings of Machine Learning Research (PMLR), 2021.

[ref117] Gao D.; Song W.; Wu J.; Guo L.; Gao C.; Liu J.; Chen X.; Liu L. Efficient Production of L-Homophenylalanine by Enzymatic-Chemical Cascade Catalysis. Angew. Chem., Int. Ed. 2022, 61 (36), e202207077. 10.1002/anie.202207077. [DOI] [PubMed] [Google Scholar]

[ref118] Rudroff F.; Mihovilovic M. D.; Gröger H.; Snajdrova R.; Iding H.; Bornscheuer U. T. Opportunities and Challenges for Combining Chemo- and Biocatalysis. Nat. Catal. 2018, 1 (1), 12–22. 10.1038/s41929-017-0010-4. [DOI] [Google Scholar]

[ref119] Finnigan W.; Flitsch S. L.; Hepworth L. J.; Turner N. J. Enzyme Cascade Design: Retrosynthesis Approach. In Enzyme Cascade Design and Modelling; Kara S., Rudroff F., Eds.; Springer International Publishing: Cham, 2021; pp 7–30. 10.1007/978-3-030-65718-5_2. [DOI] [Google Scholar]

[ref120] Skoraczyński G.; Kitlas M.; Miasojedow B.; Gambin A. Critical Assessment of Synthetic Accessibility Scores in Computer-Assisted Synthesis Planning. J. Cheminformatics 2023, 15 (1), 6. 10.1186/s13321-023-00678-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref121] Thakkar A.; Chadimová V.; Bjerrum E. J.; Engkvist O.; Reymond J.-L. Retrosynthetic Accessibility Score (RAscore) - Rapid Machine Learned Synthesizability Classification from AI Driven Retrosynthetic Planning. Chem. Sci. 2021, 12 (9), 3339–3349. 10.1039/D0SC05401A. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref122] Li B.; Chen H. Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph. Molecules 2022, 27 (3), 1039. 10.3390/molecules27031039. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref123] Kim H.; Lee K.; Kim C.; Lim J.; Kim W. Y. DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening. J. Chem. Inf. Model. 2024, 64, 2432. 10.1021/acs.jcim.3c01134. [DOI] [PubMed] [Google Scholar]

[ref124] Correia J.; Carreira R.; Pereira V.; Rocha M. Predicting the Number of Biochemical Transformations Needed to Synthesize a Compound. In 2022 International Joint Conference on Neural Networks (IJCNN); IEEE: Padua, Italy, 2022; pp 1–8. 10.1109/IJCNN55064.2022.9892124. [DOI]

[ref125] Parrot M.; Tajmouati H.; Da Silva V. B. R.; Atwood B. R.; Fourcade R.; Gaston-Mathé Y.; Do Huu N.; Perron Q. Integrating Synthetic Accessibility with AI-Based Generative Drug Design. J. Cheminformatics 2023, 15 (1), 83. 10.1186/s13321-023-00742-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref126] Sanchez-Garcia R.; Havasi D.; Takács G.; Robinson M. C.; Lee A.; Von Delft F.; Deane C. M. CoPriNet: Graph Neural Networks Provide Accurate and Rapid Compound Price Prediction for Molecule Prioritisation. Digit. Discovery 2023, 2 (1), 103–111. 10.1039/D2DD00071G. [DOI] [Google Scholar]

[ref127] Tang K.; Zhuang Y.; Wang W.; Liu Q.; Zhang L.; Du J.; Meng Q. GC-NORM-Based Thermodynamic Framework for Evaluations of Organic Reactions Involving Carbon Dioxide Utilization. Chem. Eng. Sci. 2023, 278, 118913. 10.1016/j.ces.2023.118913. [DOI] [Google Scholar]

[ref128] Schwaller P.; Vaucher A. C.; Laino T.; Reymond J.-L. Prediction of Chemical Reaction Yields Using Deep Learning. Mach. Learn. Sci. Technol. 2021, 2 (1), 015016. 10.1088/2632-2153/abc81d. [DOI] [Google Scholar]

[ref129] Plehiers P. P.; Coley C. W.; Gao H.; Vermeire F. H.; Dobbelaere M. R.; Stevens C. V.; Van Geem K. M.; Green W. H. Artificial Intelligence for Computer-Aided Synthesis In Flow: Analysis and Selection of Reaction Components. Front. Chem. Eng. 2020, 2, 5. 10.3389/fceng.2020.00005. [DOI] [Google Scholar]

[ref130] Toniato A.; Unsleber J. P.; Vaucher A. C.; Weymuth T.; Probst D.; Laino T.; Reiher M. Quantum Chemical Data Generation as Fill-in for Reliability Enhancement of Machine-Learning Reaction and Retrosynthesis Planning. Digit. Discovery 2023, 2 (3), 663–673. 10.1039/D3DD00006K. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref131] Genheden S.; Engkvist O.; Bjerrum E. Fast Prediction of Distances between Synthetic Routes with Deep Learning. Mach. Learn. Sci. Technol. 2022, 3 (1), 015018. 10.1088/2632-2153/ac4a91. [DOI] [Google Scholar]

[ref132] Kuznetsov A.; Sahinidis N. V. ExtractionScore: A Quantitative Framework for Evaluating Synthetic Routes on Predicted Liquid-Liquid Extraction Performance. J. Chem. Inf. Model. 2021, 61 (5), 2274–2282. 10.1021/acs.jcim.0c01426. [DOI] [PubMed] [Google Scholar]

[ref133] Ertl P.; Schuffenhauer A. Estimation of Synthetic Accessibility Score of Drug-like Molecules Based on Molecular Complexity and Fragment Contributions. J. Cheminformatics 2009, 1 (1), 8. 10.1186/1758-2946-1-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref134] Coley C. W.; Rogers L.; Green W. H.; Jensen K. F. SCScore: Synthetic Complexity Learned from a Reaction Corpus. J. Chem. Inf. Model. 2018, 58 (2), 252–261. 10.1021/acs.jcim.7b00622. [DOI] [PubMed] [Google Scholar]

[ref135] Ebrahim A.; Lerman J. A.; Palsson B. O.; Hyduke D. R. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst. Biol. 2013, 7 (1), 74. 10.1186/1752-0509-7-74. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref136] Beber M. E.; Gollub M. G.; Mozaffari D.; Shebek K. M.; Flamholz A. I.; Milo R.; Noor E. eQuilibrator 3.0: A Database Solution for Thermodynamic Constant Estimation. Nucleic Acids Res. 2022, 50, D603–D609. 10.1093/nar/gkab1106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref137] Hérisson J.; Duigou T.; Du Lac M.; Bazi-Kabbaj K.; Sabeti Azad M.; Buldum G.; Telle O.; El Moubayed Y.; Carbonell P.; Swainston N.; Zulkower V.; Kushwaha M.; Baldwin G. S.; Faulon J.-L. The Automated Galaxy-SynBioCAD Pipeline for Synthetic Biology Design and Engineering. Nat. Commun. 2022, 13 (1), 5082. 10.1038/s41467-022-32661-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref138] Levin I.; Fortunato M. E.; Tan K. L.; Coley C. W. Computer-Aided Evaluation and Exploration of Chemical Spaces Constrained by Reaction Pathways. AIChE J. 2023, 69, e18234. 10.1002/aic.18234. [DOI] [Google Scholar]

[ref139] Wang L.; Upadhyay V.; Maranas C. D. dGPredictor: Automated Fragmentation Method for Metabolic Reaction Free Energy Prediction and de Novo Pathway Design. PLoS Comput. Biol. 2021, 17 (9), e1009448. 10.1371/journal.pcbi.1009448. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref140] Feehan R.; Montezano D.; Slusky J. S. G. Machine Learning for Enzyme Engineering, Selection and Design. Protein Eng. Des. Sel. 2021, 34, gzab019. 10.1093/protein/gzab019. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref141] Stoney R. A.; Hanko E. K. R.; Carbonell P.; Breitling R. SelenzymeRF: Updated Enzyme Suggestion Software for Unbalanced Biochemical Reactions. Comput. Struct. Biotechnol. J. 2023, 21, 5868–5876. 10.1016/j.csbj.2023.11.039. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref142] Carbonell P.; Wong J.; Swainston N.; Takano E.; Turner N. J.; Scrutton N. S.; Kell D. B.; Breitling R.; Faulon J.-L. Selenzyme: Enzyme Selection Tool for Pathway Design. Bioinformatics 2018, 34 (12), 2153–2154. 10.1093/bioinformatics/bty065. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref143] Hadadi N.; MohammadiPeyhani H.; Miskovic L.; Seijo M.; Hatzimanikatis V. Enzyme Annotation for Orphan and Novel Reactions Using Knowledge of Substrate Reactive Sites. Proc. Natl. Acad. Sci. U. S. A. 2019, 116 (15), 7298–7307. 10.1073/pnas.1818877116. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref144] Finnigan W.; Lubberink M.; Hepworth L. J.; Citoler J.; Mattey A. P.; Ford G. J.; Sangster J.; Cosgrove S. C.; da Costa B. Z.; Heath R. S.; Thorpe T. W.; Yu Y.; Flitsch S. L.; Turner N. J. RetroBioCat Database: A Platform for Collaborative Curation and Automated Meta-Analysis of Biocatalysis Data. ACS Catal. 2023, 13 (17), 11771–11780. 10.1021/acscatal.3c01418. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref145] Kim Y.; Ryu J. Y.; Kim H. U.; Jang W. D.; Lee S. Y. A Deep Learning Approach to Evaluate the Feasibility of Enzymatic Reactions Generated by Retrobiosynthesis. Biotechnol. J. 2021, 16 (5), 2000605. 10.1002/biot.202000605. [DOI] [PubMed] [Google Scholar]

[ref146] Upadhyay V.; Boorla V. S.; Maranas C. D. Rank-Ordering of Known Enzymes as Starting Points for Re-Engineering Novel Substrate Activity Using a Convolutional Neural Network. Metab. Eng. 2023, 78, 171–182. 10.1016/j.ymben.2023.06.001. [DOI] [PubMed] [Google Scholar]

[ref147] Kotera M.; Okuno Y.; Hattori M.; Goto S.; Kanehisa M. Computational Assignment of the EC Numbers for Genomic-Scale Analysis of Enzymatic Reactions. J. Am. Chem. Soc. 2004, 126 (50), 16487–16498. 10.1021/ja0466457. [DOI] [PubMed] [Google Scholar]

[ref148] Rahman S. A.; Cuesta S. M.; Furnham N.; Holliday G. L.; Thornton J. M. EC-BLAST: A Tool to Automatically Search and Compare Enzyme Reactions. Nat. Methods 2014, 11 (2), 171–174. 10.1038/nmeth.2803. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref149] Egelhofer V.; Schomburg I.; Schomburg D. Automatic Assignment of EC Numbers. PLoS Comput. Biol. 2010, 6 (1), e1000661. 10.1371/journal.pcbi.1000661. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref150] Hu Q.-N.; Zhu H.; Li X.; Zhang M.; Deng Z.; Yang X.; Deng Z. Assignment of EC Numbers to Enzymatic Reactions with Reaction Difference Fingerprints. PLoS One 2012, 7 (12), e52901. 10.1371/journal.pone.0052901. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref151] Probst D. An Explainability Framework for Deep Learning on Chemical Reactions Exemplified by Enzyme-Catalysed Reaction Classification. J. Cheminformatics 2023, 15 (1), 113. 10.1186/s13321-023-00784-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref152] Liu C.-H.; Korablyov M.; Jastrzȩbski S.; Włodarczyk-Pruszyński P.; Bengio Y.; Segler M. RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software. J. Chem. Inf. Model. 2022, 62 (10), 2293–2300. 10.1021/acs.jcim.1c01476. [DOI] [PubMed] [Google Scholar]

[ref153] Hafner J.; Mohammadi-Peyhani H.; Hatzimanikatis V. Pathway Design. In Metabolic Engineering; John Wiley & Sons, Ltd., 2021; pp 237–257. 10.1002/9783527823468.ch8. [DOI] [Google Scholar]

[ref154] de Souza R. O. M. A.; Miranda L. S. M.; Bornscheuer U. T. A Retrosynthesis Approach for Biocatalysis in Organic Synthesis. Chem. - Eur. J. 2017, 23 (50), 12040–12063. 10.1002/chem.201702235. [DOI] [PubMed] [Google Scholar]

[ref155] Song Z.; Zhang Q.; Wu W.; Pu Z.; Yu H. Rational Design of Enzyme Activity and Enantioselectivity. Front. Bioeng. Biotechnol. 2023, 11, 1129149. 10.3389/fbioe.2023.1129149. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref156] Ribeiro A. J. M.; Riziotis I. G.; Borkakoti N.; Thornton J. M. Enzyme Function and Evolution through the Lens of Bioinformatics. Biochem. J. 2023, 480 (22), 1845–1863. 10.1042/BCJ20220405. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref157] Beaudoin C.; Kundu S.; Topaloglu R. O.; Ghosh S. Quantum Machine Learning for Material Synthesis and Hardware Security (Invited Paper). In Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design; ACM: San Diego, California, 2022; pp 1–7. 10.1145/3508352.3561115. [DOI]

[ref158] Fan Y.; Xia Y.; Zhu J.; Wu L.; Xie S.; Qin T. Back Translation for Molecule Generation. Bioinformatics 2022, 38 (5), 1244–1251. 10.1093/bioinformatics/btab817. [DOI] [PubMed] [Google Scholar]

[ref159] Zahoránszky-Kőhalmi G.; Lysov N.; Vorontcov I.; Wang J.; Soundararajan J.; Metaxotos D.; Mathew B.; Sarosh R.; Michael S. G.; Godfrey A. G. Algorithm for the Pruning of Synthesis Graphs. J. Chem. Inf. Model. 2022, 62 (9), 2226–2238. 10.1021/acs.jcim.1c01202. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref160] Chen Z.; Ayinde O. R.; Fuchs J. R.; Sun H.; Ning X. G2Retro as a Two-Step Graph Generative Models for Retrosynthesis Prediction. Commun. Chem. 2023, 6 (1), 102. 10.1038/s42004-023-00897-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref161] Genheden S.; Norrby P.-O.; Engkvist O. AiZynthTrain: Robust, Reproducible, and Extensible Pipelines for Training Synthesis Prediction Models. J. Chem. Inf. Model. 2023, 63 (7), 1841–1846. 10.1021/acs.jcim.2c01486. [DOI] [PubMed] [Google Scholar]

[ref162] Mo Y.; Guan Y.; Verma P.; Guo J.; Fortunato M. E.; Lu Z.; Coley C. W.; Jensen K. F. Evaluating and Clustering Retrosynthesis Pathways with Learned Strategy. Chem. Sci. 2021, 12 (4), 1469–1478. 10.1039/D0SC05078D. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref163] Born J.; Manica M.; Cadow J.; Markert G.; Mill N. A.; Filipavicius M.; Janakarajan N.; Cardinale A.; Laino T.; Rodríguez Martínez M. Data-Driven Molecular Design for Discovery and Synthesis of Novel Ligands: A Case Study on SARS-CoV-2. Mach. Learn. Sci. Technol. 2021, 2 (2), 025024. 10.1088/2632-2153/abe808. [DOI] [Google Scholar]

[ref164] Rogers D.; Hahn M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50 (5), 742–754. 10.1021/ci100050t. [DOI] [PubMed] [Google Scholar]

[ref165] Boiko D. A.; MacKnight R.; Kline B.; Gomes G. Autonomous Chemical Research with Large Language Models. Nature 2023, 624 (7992), 570–578. 10.1038/s41586-023-06792-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref166] Guo T.; Guo K.; Nan B.; Liang Z.; Guo Z.; Chawla N. V.; Wiest O.; Zhang X. What Can Large Language Models Do in Chemistry? A Comprehensive Benchmark on Eight Tasks. http://arxiv.org/abs/2305.18365 (accessed 2023-11-27).

[ref167] Meng Z.; Zhao P.; Yu Y.; King I. A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence; International Joint Conferences on Artificial Intelligence Organization: Macau, SAR China, 2023; pp 6723–6731. 10.24963/ijcai.2023/753. [DOI]

[ref168] Hu W.; Liu Y.; Chen X.; Chai W.; Chen H.; Wang H.; Wang G. Deep Learning Methods for Small Molecule Drug Discovery: A Survey. IEEE Trans. Artif. Intell. 2024, 5, 459. 10.1109/TAI.2023.3251977.

[ref169] Jiang Y.; Yu Y.; Kong M.; Mei Y.; Yuan L.; Huang Z.; Kuang K.; Wang Z.; Yao H.; Zou J.; Coley C. W.; Wei Y. Artificial Intelligence for Retrosynthesis Prediction. Engineering 2023, 25, 32–50. 10.1016/j.eng.2022.04.021.

[ref170] Kearnes S. M.; Maser M. R.; Wleklinski M.; Kast A.; Doyle A. G.; Dreher S. D.; Hawkins J. M.; Jensen K. F.; Coley C. W. The Open Reaction Database. J. Am. Chem. Soc. 2021, 143 (45), 18820–18826. 10.1021/jacs.1c09820.

[ref171] Heid E.; Probst D.; Green W. H.; Madsen G. K. H. EnzymeMap: Curation, Validation and Data-Driven Prediction of Enzymatic Reactions. Chem. Sci. 2023, 14 (48), 14229–14242. 10.1039/D3SC02048G.

[ref172] Seierstad M.; Tichenor M. S.; DesJarlais R. L.; Na J.; Bacani G. M.; Chung D. M.; Mercado-Marin E. V.; Steffens H. C.; Mirzadegan T. Novel Reagent Space: Identifying Unorderable but Readily Synthesizable Building Blocks. ACS Med. Chem. Lett. 2021, 12 (11), 1853–1860. 10.1021/acsmedchemlett.1c00340.

[ref173] Lin Y.; Zhang R.; Wang D.; Cernak T. Computer-Aided Key Step Generation in Alkaloid Total Synthesis. Science 2023, 379 (6631), 453–457. 10.1126/science.ade8459.

[ref174] Hardy M. A.; Nan B.; Wiest O.; Sarpong R. Strategic Elements in Computer-Assisted Retrosynthesis: A Case Study of the Pupukeanane Natural Products. Tetrahedron 2022, 104, 132584. 10.1016/j.tet.2021.132584.

[ref175] Soudier P.; Zúñiga A.; Duigou T.; Voyvodic P. L.; Bazi-Kabbaj K.; Kushwaha M.; Vendrell J. A.; Solassol J.; Bonnet J.; Faulon J.-L. PeroxiHUB: A Modular Cell-Free Biosensing Platform Using H₂O₂ as Signal Integrator. ACS Synth. Biol. 2022, 11 (8), 2578–2588. 10.1021/acssynbio.2c00138.

[ref176] Robinson C. J.; Carbonell P.; Jervis A. J.; Yan C.; Hollywood K. A.; Dunstan M. S.; Currin A.; Swainston N.; Spiess R.; Taylor S.; Mulherin P.; Parker S.; Rowe W.; Matthews N. E.; Malone K. J.; Le Feuvre R.; Shapira P.; Barran P.; Turner N. J.; Micklefield J.; Breitling R.; Takano E.; Scrutton N. S. Rapid Prototyping of Microbial Production Strains for the Biomanufacture of Potential Materials Monomers. Metab. Eng. 2020, 60, 168–182. 10.1016/j.ymben.2020.04.008.

[ref177] Zhang Z.; Fang L.; Wang F.; Deng Y.; Jiang Z.; Li A. Transforming Inert Cycloalkanes into α,ω-Diamines by Designed Enzymatic Cascade Catalysis. Angew. Chem., Int. Ed. 2023, 62 (16), e202215935. 10.1002/anie.202215935.

[ref178] Liu Z.; Zhang X.; Lei D.; Qiao B.; Zhao G.-R. Metabolic Engineering of Escherichia Coli for de Novo Production of 3-Phenylpropanol via Retrobiosynthesis Approach. Microb. Cell Factories 2021, 20 (1), 121. 10.1186/s12934-021-01615-1.

[ref179] Yiakoumetti A.; Hanko E. K. R.; Zou Y.; Chua J.; Chromy J.; Stoney R. A.; Valdehuesa K. N. G.; Connolly J. A.; Yan C.; Hollywood K. A.; Takano E.; Breitling R. Expanding Flavone and Flavonol Production Capabilities in Escherichia Coli. Front. Bioeng. Biotechnol. 2023, 11, 1275651. 10.3389/fbioe.2023.1275651.

[ref180] Brito L. F.; Irla M.; Nærdal I.; Le S. B.; Delépine B.; Heux S.; Brautaset T. Evaluation of Heterologous Biosynthetic Pathways for Methanol-Based 5-Aminovalerate Production by Thermophilic Bacillus Methanolicus. Front. Bioeng. Biotechnol. 2021, 9, 1. 10.3389/fbioe.2021.686319.

[ref181] Hanko E. K. R.; Valdehuesa K. N. G.; Verhagen K. J. A.; Chromy J.; Stoney R. A.; Chua J.; Yan C.; Roubos J. A.; Schmitz J.; Breitling R. Carboxylic Acid Reductase-Dependent Biosynthesis of Eugenol and Related Allylphenols. Microb. Cell Factories 2023, 22 (1), 238. 10.1186/s12934-023-02246-4.

