Pathway design using de novo steps through uncharted biochemical spaces

Akhil Kumar; Lin Wang; Chiam Yu Ng; Costas D Maranas

doi:10.1038/s41467-017-02362-x

. 2018 Jan 12;9:184. doi: 10.1038/s41467-017-02362-x

Pathway design using de novo steps through uncharted biochemical spaces

Akhil Kumar ^1,^#, Lin Wang ^2,^#, Chiam Yu Ng ^2,^#, Costas D Maranas ^2,^✉

PMCID: PMC5766603 PMID: 29330441

Abstract

Existing retrosynthesis tools generally traverse production routes from a source to a sink metabolite using known enzymes or de novo steps. Generally, important considerations such as blending known transformations with putative steps, complexity of pathway topology, mass conservation, cofactor balance, thermodynamic feasibility, microbial chassis selection, and cost are largely dealt with in a posteriori fashion. The computational procedure we present here designs bioconversion routes while simultaneously considering any combination of the aforementioned design criteria. First, we track and codify as rules all reaction centers using a prime factorization-based encoding technique (rePrime). Reaction rules and known biotransformations are then simultaneously used by the pathway design algorithm (novoStoic) to trace both metabolites and molecular moieties through balanced bio-conversion strategies. We demonstrate the use of novoStoic in bypassing steps in existing pathways through putative transformations, assembling complex pathways blending both known and putative steps toward pharmaceuticals, and postulating ways to biodegrade xenobiotics.

Existing pathway design tools make use of existing reactions from databases or successively apply retrosynthetic rules. novoStoic provides an integrated optimization-based framework combining known reactions with novel steps in pathway design allowing for constraints on thermodynamic feasibility, product yield, pathway length and number of novel steps.

Introduction

Advances in genetic engineering capabilities have expanded the range of chemicals synthesized by microbial platforms to non-natural synthetic molecules such as drugs and various pharmaceutical precursors. Synthesis of these non-natural molecules often relies on enzymes with broad-substrate range^{1, 2} as well as enzymes with promiscuous activity on novel substrates³. A recent survey of Escherichia coli enzymes revealed that 37% of them could act on multiple substrates⁴. Protein engineering techniques have already demonstrated the feasibility of expanding the substrate range of existing enzymes^5–8. These examples allude to the increasingly important role of in silico protein modeling tools such as IPRO⁹ and Rosetta¹⁰ to systematically change enzyme substrate specificity. In addition, recent successes in the de novo enzyme design^{11, 12} have expanded the scope of enzymatic functions that can be called upon in the construction of synthetic de novo metabolic pathways. Nevertheless, the systematic redesign of enzymes for new and novel activities remains a daunting challenge implying that novel transformations should only be sparingly used only if they confer pathway length, carbon yield, and/or redox benefits.

Computational pathway design and retrosynthetic tools can be used to guide the systematic recruitment of either native or de novo enzymatic functions to assemble pathways toward targeted chemicals. This is an area of research with significant prior work. Network-based path finding methods such as PathComp¹³, Pathway Hunter Tool¹⁴, and MetaRoute¹⁵ identify linear pathways from a single source to one target molecule. These methods rely on heuristics such as substrate-product similarity, atom transitions, or substrate-product reaction co-occurrence frequency to reduce carbon loss while designing the path from source to target. In contrast to linear pathfinding methods, network optimization-based approaches such as CFP¹⁶, k-shortest EFM¹⁷, and optStoic¹⁸ can incorporate non-injective (i.e., not necessarily a one-to-one mapping between reactants to products) stoichiometry and directly model carbon flow as well as cofactor balancing. All network-based path finding methods require literature extracted information available in biochemical databases such as MetRxn¹⁹, KEGG²⁰, BRENDA²¹, and MetaCyc²². Prediction methods, on the other hand, expand upon uncovered knowledge space by suggesting putative reactant combinations plausible under the tenets of organic chemistry. By drawing from biochemistry principles, atom connectivity changes, or molecule fingerprint changes between substrates and products have been encoded as reaction rule operators in formats such as BEM²³, RDM²⁴, and SMIRKS²⁵. Pathway prediction techniques such as BNICE²⁶, XTMS²⁷, UM-PPS²⁸, PathPred²⁹, Route Designer³⁰, and GEM-Path³¹ employ reaction rule operators for a single molecular target iteratively in a retrosynthetic fashion so as to identify a bioconversion from a single source. This traversal strategy invoked by such retrosynthesis algorithms prunes the vast combinatorial space of putative transformations by evaluating free-energy change and substrate similarity metrics at each iteration. However, after each step, the trade-off between carbon yield, energy (ATP requirements), and thermodynamic feasibility of the main carbon conversion path often remain unexplored. Instead, tools such as TMFA³² and EFM³³ are used in a posteriori steps to assess energy and carbon efficiency. Computational tools such as optStoic¹⁸ can be used to design mass and energy balanced pathways. However, they are limited to using reactions present in existing biochemical databases and metabolic models. Therefore, invoking novel molecules as intermediates through hypothetical reactions while maintaining a mass and energy balanced pathway remains elusive.

To address the problems in current computational methods, herein we present rePrime and novoStoic, an optimization-based de novo pathway design framework that seamlessly integrates existing reactions and novel reaction rules. First, we augmented the MetRxn repository¹⁹ with a new data set of elementally balanced reaction operators using the automated CLCA-³⁴ based reaction rule extraction procedure termed rePrime. For each reaction, the rePrime procedure identifies and captures as a reaction rule the molecular graph topological changes underpinning the substrate to product graph conversion. A reaction rule is a vector that captures the location of active reaction centers affected by the conversion of substrates to products. The reaction rules and known reactions are then operated upon a mixed integer linear programming (MILP) procedure, novoStoic that identifies a mass-balanced biochemical network that converts a source metabolite to a target while satisfying a multitude of constraints and optimizing the objective function. Both precursors and possibly co-substrates and co-products required to satisfy the mass balance constraints serve as optimization variables (see Fig. 1 for a schematic overview). Overall, novoStoic identifies “by design” mass-balanced, high yield, and economically favorable biotransformations with a negative overall standard Gibbs free energy change from a substrate(s) to natural and synthetic product(s) by combining both existing and novel reactions. novoStoic can also be used to elucidate strategies to biodegrade xenobiotics by targeting a broad activity toward a topologically diverse set of substrates. In the Results sections, we introduce four case studies of different complexities, i.e., an illustrative example of rePrime/novoStoic procedure, the biosynthesis of 1,4-butanediol (bdo), the synthesis of phenylephrine, and the biodegradation of benzo[a]pyrene.

Fig. 1 — Schematic overview of the rePrime/novoStoic procedure. First, the rePrime procedure is used to pre-process the MetRxn database of reactions (blue boxes) to extract a unique set of reaction rules, R^λ (red boxes) at moiety size λ. The reaction rules are derived from the molecular signature $C_{m i}^{λ}$ of each participating metabolites and are captured by $T_{m r}^{λ}$ . The novoStoic procedure is then used to identify a series of intervening reactions and reaction rules that convert source molecule(s) (green circle) into target molecule(s) (orange circle) such that the profit can be maximized or the number of reactions in the pathway can be minimized. Other criteria including the number of reaction rules and thermodynamic feasibility of the pathway can be flexibly incorporated as constraints. By controlling the number of reaction rules, two possible pathways designed by novoStoic are shown in the bottom right panel. Pathway A uses two known reactions (blue boxes) that present in the MetRxn database, whereas pathway B uses a combination of a known reaction (blue box) and a reaction rule (red box) to perform the same conversion

Results

An illustrative example for rePrime and novoStoic

We first clarify the rePrime and novoStoic procedures using only two decarboxylase reactions (i) 2-hydroxyisophthalate decarboxylase (2HIPD) and (ii) salicylate decarboxylase (SLD), and four participating metabolites. The first step (rePrime) extracts reaction rules from a database of known reactions (Fig. 1). To this end, each metabolite i within the database is first encoded using a molecular signature (see Supplementary Methods for detail procedure). For a moiety size λ, a molecular signature $(C_{m i}^{λ})$ is a vector that concatenates into a single numeric value the number of attributes for every moiety m in a metabolite i, described as a collection of prime numbers centered at node n. For λ = 1, the moiety is simply the atom at node n, whereas at λ = 2 the moiety is composed of all the atoms bonded to the atom at node n (Fig. 2a). Figure 2b, c shows the iterative rePrime procedure to generate molecular signature at moiety size λ = 1 and 2. In brief, each iteration involves the assignment of a canonical label followed by a prime number assignment to each node. The unique prime numbers are then counted and stored in the molecular signature vector. We apply the rePrime algorithm to generate the molecular signatures for metabolites 2-hydroxyisopthalate (2hipa), salicylate (sal), carbon dioxide (CO₂), and phenol (phnl) at different moiety sizes λ (see Fig. 2b, c and Supplementary Tables 1, 2, and 3). A reaction rule $(T_{m j}^{λ})$ is defined as a vector that captures the changes in all moieties m of the participating metabolites upon reaction j. It is derived based on the reaction stoichiometry and the corresponding molecular signatures of the reactants and products. A reaction rule uniquely captures the eliminated and newly formed moieties around the reaction center. In Fig. 2d, we generated the reaction rules for reactions 2HIPD and SLD for moiety size λ = 1. Upon removing repetitive rules, a unique set of reaction rules are indexed by set R^λ (i.e., $T_{m r}^{λ}$ ). Hence, $T_{m, 2 HIPD}^{1}$ and $T_{m, SLD}^{1}$ are now stored as $T_{m, 1}^{1}$ .

Fig. 2 — The rePrime procedure is demonstrated for the molecule 2-hydroxy isophthalate (2hipa). a The moiety size (λ) indicates the distance between nodes in a molecular graph. At λ = 1, the moiety at node n (the carbon atom shaded in orange) is the atom itself. The moiety centered at node n, the number of unique moieties and reaction rules extracted from MetRxn at $λ = {1, 2, 3}$ are provided on the table. b, c The rePrime procedure iteratively assigns prime number to each node at λ = 1, 2 (see Supplementary Methods for details). The molecular signature of 2hipa, $C_{m, 2hipa}^{λ}$ contains the count of unique moieties (i.e., prime number). The resulting $C_{m, 2hipa}^{λ}$ increases in specificity at larger moiety size (λ) as neighboring atoms are also involved in determining the canonical label of the node. d The derivation of reaction rule for two decarboxylase reactions (i) 2-hydroxyisophtalate decarboxylase (2HIPD) and (ii) salicylate decarboxylase (SLD) are demonstrated at λ = 1. The resulting reaction rules $T_{m, 2HIPD}^{1}$ and $T_{m, SLD}^{1}$ are identical at this moiety size, hence they are stored as $T_{m, 1}^{1}$ . The specificity of the reaction rules increases with increasing moiety size λ, therefore $T_{m, 2HIPD}^{2}$ is not equal to $T_{m, SLD}^{2}$ (Supplementary Table 3)

The second step (novoStoic) uses an MILP representation to pose the task of identifying a component and moiety balanced biochemical pathway that converts a source to a target metabolite as an optimization problem. It searches the database of both known reactions and reaction rules to complete the pathway (Fig. 3) by introducing moiety balances along with component balances as key constraints in the MILP formulation. The two balances are linked through the imbalance metabolic flux v^imb, which quantifies the surplus or deficit for any metabolites in the metabolic network that necessitates the involvement of novel reaction steps (Fig. 3a) to balance the overall conversion. The overall change for moiety m $(\sum_{i \in I_{ex}} C_{m i}^{λ} v_{i}^{EX})$ is set to be equal to the moiety changes incurred in both the novel reaction network $(\sum_{r \in R^{λ}} T_{m r}^{λ} v_{r})$ and the ones implied by the known metabolic reaction network $(\sum_{i \in I} C_{m i}^{λ} v_{i}^{imb})$ (Fig. 3b). Herein, we used novoStoic to design biosynthesis pathways from 2hipa to phnl and obtained four separate solutions. novoStoic not only identified a pathway which involves two known reactions 2HIPD and SLD but also pathways which involve a single reaction rule $(T_{m, 1}^{1})$ implying a novel conversion (see Supplementary Fig. 1 and Supplementary Fig. 2 for the details).

Fig. 3 — The novoStoic procedure. novoStoic seamlessly blend existing reactions and reaction rules while designing a pathway from source metabolite to target metabolite. a The variable $v_{i}^{imb}$ defines the connection (green arrows) between the known reactions network (black arrows) and the reaction rules network (red arrows and red dotted box). It indicates the surplus or deficit of metabolite i in the known metabolic network. The reaction rules network is invoked to complete a pathway when no known reaction can catalyze the conversion. b Illustration of the component balance and moiety balance defined in novoStoic constraints (2) and (3) (Supplementary Methods). The bottom half (gray dotted box) is the known network operating on metabolites, while the upper half (red dotted box) is the reaction rules network operating on moieties. The section of the entire system that is involved in the moiety balance constraint is color-coded: (red) moiety changes by reaction rules, (blue) overall moiety changes, and (green) moiety changes by known reactions. The pathway here converts 2hipa to phnl using a known reaction (i.e., 2hipa → sal, black arrow) and a reaction rule (i.e., sal → phnl, red arrow). Metabolite 2hipa first enters the reaction rule network using exchange reaction (blue arrow, $v_{source}^{EX}$ ) and subsequently exported to the known reaction network (green arrow, $v_{2hipa}^{imb} = - 1$ ). It is converted by a known reaction (black arrow) into sal, which is then transferred to the reaction rule network (green arrow, $v_{sal}^{imb} = 1$ ). The reaction rule $T_{m r}^{1}$ (red arrow) converts sal into phnl, which is exported out of the entire system $(v_{target}^{EX})$ . Note that for clarity purpose, the involvement of carbon dioxide in the conversion of 2hipa to sal and sal to phnl is omitted

The toy example helps explain how the two key quantities molecular signatures $(C_{m i}^{λ})$ and reaction rules $(T_{m r}^{λ})$ are used to impose constraints for mass and moiety balance. In the following case studies, we will explore much more complex conversion strategies that make use of multiple novel reactions absent from the reaction databases. Each reaction rule identified is then matched to a corresponding enzyme homolog. Supplementary Methods provide a thorough description of the algorithmic details of rePrime and novoStoic along with a pseudo-code description.

1,4-butanediol synthesis

1,4-butanediol (bdo) is a commodity chemical that is used as an industrial solvent and as a monomer in polymer synthesis. Efforts are under way aimed at replacing petroleum-derived production with more sustainable processes using bio-based production based on genetically engineered microbial strains³⁵. By leveraging computational design and scoring of a large number of bdo biosynthetic routes, Yim et al. reconstituted a pathway in E. coli with a titer of 18 g/L[35]. Herein, we demonstrate three pathways designed using KEGG database and our novoStoic algorithm, which seamlessly combines known and novel steps starting from the precursor succinyl-CoA (succoa) toward bdo. novoStoic provides possibilities for lowering the number of steps in the pathway by invoking novel steps.

Pathway 4A identified by novoStoic is a five-step pathway of bdo synthesis that recapitulates the downstream pathway of Yim et al.³⁵ (Fig. 4, steps 1–5). This pathway converts succoa into bdo with a concomitant oxidation of four NAD(P)H and production of two co-enzyme A (coa) molecules. Co-substrate acetyl-coA is converted into acetate in the third step. The overall conversion of this pathway is given by

succoa + 3 NADPH + NADH + 4 H^{+} + a c c o a \overset{- 49 k c a l ∕ m o l}{\to} b d o + a c + 3 N A D P + N A D + 2 c o a .

Pathway 4A can be further improved upon by preventing carbon flux from draining toward undesirable co-substrates/co-products (i.e., accoa and acetate) when glucose is used as feedstock thereby leading to a higher theoretical carbon yield. Furthermore, we explored if a shorter pathway is feasible upon invoking one or two novel transformations. Therefore, we imposed three additional design criteria: (1) the fewest number of reaction rules, (2) the maximal number of steps from succoa to bdo is less or equal to four, and (3) no acetyl-CoA as a co-substrate. Criterion (1) was set as the objective function and criteria (2) and (3) were defined as constraints in the novoStoic MILP formulation. As a result, novoStoic identified pathway 4B that requires only four reaction steps (Fig. 4). Pathway 4B shares three reactions with pathway 4A (steps 1, 2, and 5). In a departure from pathway 4A, intermediate 4-hydroxybutanoate generated by step 2 is directly converted into 4-hydroxybutanal, thus bypassing steps 3 and 4, using reaction rule R1 (Fig. 4; Table 1). A putative succinate-semialdehyde dehydrogenase is suggested as the enzyme homolog for this step. The overall conversion can be described as followed:

succoa + 3 NADPH + NADH + 4 H^{+} \overset{- 52 k c a l ∕ m o l}{\to} b d o + 3 N A D P + N A D + c o a + H_{2} O .

Table 1.

Reaction templates for 1,4-butandiol (bdo) synthesis

graphic file with name 41467_2017_2362_Figa_HTML.gif

Open in a new tab

The reaction-rule template column presents the reactions that correspond to the reaction rules identified by the alphanumeric scheme (i.e., R1 and R2) for 14bdo synthesis. Step id identifies the reaction that was predicted from the corresponding reaction rule

Pathway 4C was identified by novoStoic upon appending the constraint of using a single cofactor (i.e., NADPH). Reducing the number of cofactors can help focus the strain engineering efforts on only increasing NADPH availability without having to worry about the balance of the stoichiometric ratio NADH:NADPH while improving both their availabilities.

succoa + 4 NADPH + 4 H^{+} \overset{- 52 k c a l ∕ m o l}{\to} bdo + 4 NADP + c o a + H_{2} O .

This exact pathway was identified before by GEM-path³¹ and found to be exhibiting higher theoretical product yield using FBA. In contrast to pathway 4B, two NADPH-dependent novel steps are required in this pathway, in order to bypass the NADH-dependent 4-hydroxybutyrate dehydrogenase (step 2). The direct production of succinic aldehyde from succinic semialdehyde (product of step 1) was proposed by novoStoic by invoking a homolog of succinate-semialdehyde dehydrogenase (R1). The homolog of 4-hydroxybutryate dehydrogenase (R2) was next suggested for the conversion of succinic aldehyde to 4-hydroxybutanal. In all three pathways, the intermediate 4-hydroxybutanal is produced in the second to last step, which is then converted into bdo using alcohol dehydrogenase (step 5).

Despite requiring the same numbers of redox cofactors, pathways 4B and 4C designed by novoStoic bypass the involvement of acetyl-CoA and generation of the by-product acetate. We caution the reader that computational pathway design using novoStoic requires careful scrutiny of the cofactor systems as there are often inconsistencies between current databases and experimental studies. For example, reaction rule R1 was reported to utilize ATP as a cofactor³⁶, whereas the manually approved Rhea database³⁷ and KEGG²⁰ denote it as a reversible reaction without utilization of ATP. When we enforced rule R1 to use ATP (Supplementary Tables 5, 6), novoStoic designed a pathway with a different overall stoichiometry including ATP as one of the cofactors:

ATP + succoa + 4 NADPH + 4 H^{+} \overset{- 58 k c a l ∕ m o l}{\to} b d o + 4 N A D P + c o a + A D P + p i .

This small molecule focused example demonstrates the potential of novoStoic to handle multiple constraints on pathway design and bypass multiple known steps by invoking novel conversions only when needed.

Phenylephrine synthesis

The non-natural molecule phenylephrine (phep) is a member of the phenylethanolamines class. It mimics the action of stimulants such as adrenaline and dopamine. Traditionally, phenylephrine is produced through chemical synthesis involving the reduction of the aromatic substrate m-hydroxybenzaldehyde³⁸. Figure 5 illustrates three pathways each starting with a different aromatic precursor with B and C identified by novoStoic. The pathways involve (i) homovanillate (hmv) as the only substrate, (ii) co-utilization of phenylalanine (phal) and methyl salicylate (msal), and (iii) catechol along with methyl salicylate.

Pathway 5A is an example of a design with a positive standard Gibbs free energy of change (Fig. 5a). It involves three existing reactions (steps 1–3) and three reaction rules (R3, R4, and R5 in steps 4–9) to perform the overall conversion as followed:

hmv + N H_{3} \overset{+ 127 k c a l}{\to} p h e p + O_{2} .

It is a solution typically predicted by substrate similarity-based retrosynthesis tools. Such algorithms first choose a likely substrate as the target feedstock and then proceed to prune the retrosynthesis reaction network using similarity-based filters on each backward reaction step until the shortest (one substrate to one product) linear route is found. The free energy change in the direction of bioconversion toward phep is positive and thus thermodynamically infeasible.

Pathway 5B involves the conversion of co-substrates phal and msal to phep with a concomitant production of l-serine (ser) and acetate (Fig. 5b). The overall conversion for pathway 5B is:

phal + m s a l + 2 H_{2} O + 4 O_{2} + N H_{3} \overset{- 323 k c a l}{\to} s e r + p h e p + a c e t a t e + 3 C O_{2} + H_{2} O_{2} .

In the first half of the 14 reaction cascade (steps 1–9), the salicylate produced upon the demethylation of msal is converted to acetate and pyruvate via the benzoate degradation pathway. Pyruvate is then aminated by serine hydratase to produce l-serine. The degradation of msal to l-serine and acetaldehyde also produces an S-adenosyl-l-methionine (adm), which acts as a methyl donor in the second half of the network (steps 10–16). In addition, the NADH consumed in steps 2 and 10 is regenerated in steps 7 and 9, thereby maintaining the NADH/NAD⁺ cofactor balance.

In the second half of the network, the phal to phep conversion starts with the reduction of phenylalanine by a homolog of phenylalaninase (step 10). A typical phenylalaninase hydroxylates phal at the 4th carbon position of the phenyl ring to produce p-tyrosine (Table 2, R6). However, to produce m-tyrosine (mtro), a hydroxylation at the 3rd carbon position is suggested (step 10). This reaction is not present in any of the reaction databases and is predicted de novo by novoStoic. Nevertheless, pacidamycin studies focusing on bacterial phenylalaninase have indicated the presence of the phenylalaninase homolog in many Streptomyces species with regiospecific hydroxylation activity at the 3rd carbon atom of the phenyl ring³⁹. For example, the phal to m-tyrosine reaction suggested by novoStoic is predicted as a secondary activity by the phenylalanine hydroxylase homolog encoded by pacX from Streptomyces coeruleorubidus³⁹.

Table 2.

Reaction templates for phenylephrine synthesis

graphic file with name 41467_2017_2362_Figb_HTML.gif

Open in a new tab

S-adenosyl-L-methionine (adm), S-adenosyl homocysteine (adh), ascorbic acid (asc), dehydroascorbic acid (dhasc), tetrahydrobiopterin (tdh) and 4a-hydroxytetrahydrobiopterin (dhb)

Step 11 involves the decarboxylation of m-tyrosine to form m-tyramine catalyzed by m-tyrosine decarboxylase (EC 4.1.1.28) (Fig. 5b). The conversion of m-tyramine to phep (steps 12, 13, 15, and 16) in the remainder of the network is identical to the subnetwork of pathway 5A with the same N-methyltransferase (R4) and oxidoreductase (R5) reactions. With msal contributing only one carbon as a methyl group toward phep synthesis, we can, therefore, consider only the second half (steps 10–14) of the network for phep if endogenous adm is supplied.

Pathway 5C shown in green in Fig. 5c combines a number of fungal reactions from the tyrosine and phenylalanine metabolism pathways for the conversion of catechol and methyl salicylate to acetaldehyde and phenylephrine. This pathway is similar to the phytochemical synthesis pathway of pseudoephedrine, wherein the carboligation product of the benzoate derivate and pyruvate undergoes transamination and subsequently N-methylation⁴⁰. The overall conversion for pathway 5C is:

catechol + m s a l + 2 O_{2} + H_{2} O + N H_{3} \overset{- 60 k c a l}{\to} p h e p + a c e t a l d e h y d e + 2 C O_{2} .

Two possible routes with 17 reactions each were suggested (Fig. 5c). novoStoic predicted a novel reaction (R3) to convert 23hpa to m-tyramine as the next step (step 14), which requires a homolog of monoamine oxidase. The monoamine oxidase typically deaminates p-tyramine, with trace deamination activity reported for m-tyramine^{41, 42}. Similar with pathway 5A and 5B, the conversion of m-tyramine to phenylephrine in the last two steps proceeds further by the action of an N-methyltransferase (R4) and oxidoreductase (R5). The de novo steps suggested by novoStoic provide the reaction templates for the development of protein engineered dopamine hydroxylase (R5, steps 16, 18) and phenylethanolamine methyltransferase (R4, steps 15, 19) that can act upon new substrates.

The co-substrate S-adenosyl-l-methionine (adm) required by the N-methylation reaction is generated in the O-methylation reaction by the action of salicylate 1-O-methyltransferase (step 5). Therefore, msal provides the carbons in the phenolic and the methylamino moieties, whereas catechol provides the carbons in the ethyl moiety of phenylephrine through its degradation to pyruvate (Fig. 5c). Steps 1–4 can be bypassed if pyruvate is directly provided. Note that the reaction in step 13 is treated as reversible in accordance with the reaction designation in the source database MetRxn without requiring ATP. However, literature evidence³⁶ suggests that step 13 requires ATP. The updated overall stoichiometry with ATP as a cofactor for step 13 becomes:

A T P + c a t e c h o l + m s a l + 2 O_{2} + 2 H_{2} O + N H_{3} \overset{- 66 k c a l}{\to} p h e p + a c e t a l d e h y d e + 2 C O_{2} + A D P + p i .

The total number of steps of the designed pathways is generally larger than the synthetic chemistry-based approaches^43–45. However, if the methyl donor adm can be provided by the host cell (e.g., engineered Saccharomyces cerevisiae⁴⁶ and Pichia pastoris⁴⁷), then pathway 5B can be shortened into four steps thereby becoming on par with the synthetic chemistry pathways. It is noteworthy that the identified pathways use a more appealing set of substrates (e.g., catechol and methyl salicylate) with a lower cost than the substrate (i.e., m-hydroxybenzaldehyde) commonly used for chemical synthesis. In all the three pathways designed, the conversion of the intermediate m-tyramine to phenylephrine involves reaction mechanisms that can be derived from natural enzymes thus suggesting potential candidates for protein engineering (Table 2). In the next case study, we focus on identifying biodegradation pathways using reaction rules from a particular organism or species (see Supplementary Methods, novoStoic design criteria iii/iv).

Oxidative degradation of benzo[a]pyrene to catechol

Polycyclic aromatic hydrocarbons (PAHs), with mutagenic and carcinogenic properties, are unfortunately ubiquitous in the environment and have both natural and anthropogenic origins⁴⁸. Biodegradation studies around industrial effluent treatment plants and hydrocarbon drilling sites have implicated PAHs as the sole carbon and energy source for many soil dwelling organisms⁴⁹. Metabolic assays on a number of terrestrial bacterial species indicate the involvement of a common cluster of metabolic genes to convert various PAHs to metabolites of the central carbon pathways⁵⁰. Numerous studies have identified various pathway intermediates and pointed toward the use of ring-cleaving dioxygenases to degrade various PAHs into pyruvate, catechol, and phthalate⁵¹. The dioxygenases first introduce oxygen to the aromatic rings (i.e., dioxygen activation), which are subsequently hydroxylated⁵². Decyclization reactions at the ortho-positions involve intradiol dioxygenases to cleave the carbon–carbon bonds between the two hydroxyl groups (e.g., R9 in Fig. 6), while the decyclization reactions at meta-positions involve extradiol dioxygenases to cleave carbon–carbon bonds adjacent to one of the hydroxyl groups (Fig. 7). When the substrate undergoes ortho-cleavage upon dioxygen activation, dehydrogenase, dioxygenase, and decarboxylase reactions enable the release of K-region carbons (Fig. 6) as two carbon dioxide molecules. However, when the substrate undergoes meta-cleavage, a pyruvate and carbon dioxide molecule is released for each decyclization step.

Fig. 6 — Benzo[a]pyrene degradation pathway. The oxidative degradation routes suggested by novoStoic combine both existing reactions and reaction rules in a mass-balanced fashion. The degradation products pyruvate and catechol were set as targets and benzo[a]pyrene was set as the source metabolite. Known reactions are indicated by solid lines while reactions suggested using reaction rules are denoted with dashed lines. Reaction rule ids are shown in red (see Table 3 for description). Novel intermediates predicted by novoStoic are abbreviated using an alphanumeric scheme (i.e., h1, h2, h3, etc.). The predicted routes are color-coded in blue and green based on the overall conversion. The degradation initiates with the formation of a (poly)aromatic diol in a dioxygenase reaction (R11–R12), while a subsequent oxidation forms a heterocyclic chromene derivative (h3). Next, a ring opening reaction of the chromene derivative (R14) is followed by an aldolase reaction to yield pyruvate and a (poly)aromatic aldehyde (R15). The aldehyde is then oxidized yielding carbon dioxide and a (poly)aromatic diol as a substrate for the next decyclization process (R16, R17)

Fig. 7 — Intradiol and extradiol cleavage. Intradiol dioxygenases catalyze the decyclization reactions at *ortho-*positions, while extradiol dioxygenases catalyze the decyclization reactions at *meta-*positions

Using novoStoic, we considered both cleavage strategies while suggesting multiple catabolic routes for benzo[a]pyrene (bzp) to catechol. In addition, we also factored into our pathway design the findings from multiple metabolic PAHs degradation studies on various species. Metabolic studies on the Pseudomonas indicate the recruitment of nah genes and its homologs associated with naphthalene degradation in converting diverse bay-region PAHs to pyruvate through the naphthalene degradation pathway intermediates 1-naphthol-2-carboxylate and salicylaldehyde⁵³. Thus, we focused our design rules toward reaction rules related to naphthalene degradation pathways by setting $y_{nah}^{path} = 1$ , $v_{catechol}^{EX} \geq 1,$ and $v_{pyr}^{EX} \geq 1$ . In this way, we ensured that at least one molecule of catechol and pyruvate should be produced from each molecule of benzo[a]pyrene. We also imposed constraints (8) and (9) by setting $y_{P s e u d o m o n a s genus}^{org} = 1$ to study PAH biodegradation by only the organisms belonging to the Pseudomonas genus⁵³. The reactions to genus associations were downloaded from KEGG. The benzo[a]pyrene degradation network to catechol is derived by limiting the list of co-substrates/products to central carbon metabolites and small molecules such as CO₂, H₂O, and O₂.

By expanding upon the boundaries of catabolic routes to include cataloged intermediates and reactions, we identify the set of previously unknown reactions and provide a putative explanation for the complete degradation of benzo[a]pyrene to catechol. Figure 6 illustrates seven different routes from benzo[a]pyrene to catechol identified by novoStoic with the minimal number of hypothetical reaction steps (16 steps). The pathway design in blue contains one bioconversion route from benzo[a]pyrene to catechol while the grouping in green contains six different bioconversion routes. Our result recapitulates the findings of PAHs degradation through ortho-cleavage or meta-cleavage pathways and imputes metabolites into the ill-defined and incomplete benzo[a]pyrene degradation substrate annotations in current biochemistry databases.

Pathway 6A (Fig. 6, blue route) combines 21 reactions for the biodegradation of benzo[a]pyrene to catechol, pyruvate, and carbon dioxide. The overall stoichiometric conversion for Pathway 6A is:

b z p + 8 O_{2} + 3 H_{2} O \to c a t e c h o l + 3 p y r + 5 C O_{2} .

The first 16 steps in the pathway are hypothetical reactions predicted by novoStoic. Reactions in steps 1–4 have incomplete EC numbers in KEGG. They were reproduced using reaction rules extracted from the reactions naphthalene dioxygenase (Table 3, R7), naphthalene dihydrodiol dehydrogenase (R8), protocatechuate oxidoreductase (R9), and phthalate decarboxylase (R10). Steps 5 and 6 were derived using reaction rules extracted from terephthalate dioxygenase (R11) and cyclohexadiene oxidoreductase (R12) reactions. Steps 1 and 2 are the dioxygen activation steps for ortho-cleavage, while steps 5 and 6 are dioxygen activation steps for meta-cleavage. Steps 7–11, 12–16, and 17–21 depict a cyclic pathway wherein a chromene derivative is produced (steps 7, 12, and 17) followed by the ring opening reaction (steps 8, 13, and 18). Next, a pyruvate molecule and an aromatic aldehyde are produced in steps 9, 14, and 19. The aromatic carboxylate generated in steps 10, 15, and 20 are subsequently decarboxylated to produce substrates (diols) for the next iteration of biodegradation. Steps 17–21 are known reactions indexed in current biochemistry databases. Additionally, of the 22 intermediate metabolites identified in this pathway, 16 are listed in most biochemistry databases.

Table 3.

Reaction templates for benzo[a]pyrene

graphic file with name 41467_2017_2362_Figc_HTML.gif

Open in a new tab

Note that reaction rules R13 to R17 are listed as step 17 to 21 in Fig. 6

Pathway 6B spans six separate routes, and they have the same stoichiometric overall conversion:

b z p + 9 O_{2} + 3 H_{2} O \to c a t e c h o l + 2 p y r + a c l d + 6 C O_{2} .

While one of the routes in pathway 6B shares steps 1–3 with pathway 6A, all the routes share steps 17–21 with pathway 6A. Each route in pathway 6B requires the same number of 22 steps. In each route, the substrate undergoes two ortho-cleavage reactions and loses the K-region carbons as two carbon dioxide molecules, to converge at the hypothetical metabolite h9. Subsequently, h9 undergoes meta-cleavage to yield a pyruvate and an acetaldehyde molecule, as well as the naphthalene (ntl) degradation pathway intermediate, naphthalene 1,2-diol (ntdl). Ntdl is further degraded to catechol using steps 17–21.

novoStoic enables us to predict various intermediates and bioconversions and rapidly develop a hypothesis consistent with sparse information in many PAHs degradation studies and biochemical databases. The pathways predicted by novoStoic provide a complete overall stoichiometry of reactions from benzo[a]pyrene to catechol, while none of the current metabolic databases and metabolic models contain a complete degradation pathway. Based on the predicted degradation pathways we proposed here and a number of PAHs degradation studies, molecules such as chrysene and picene would not require any ortho-cleavage reactions to form naphthalene intermediates, while molecules such as pyrene and triphenylene would require at least one ortho-cleavage to form ntl intermediates.

Discussion

In this paper, we introduce two novel procedures rePrime and novoStoic for the de novo pathway design. rePrime is a reaction rule-based algorithm that encodes reaction centers as elementally balanced operators. These reaction rules capture moiety changes in the reaction centers by using the changes in the counts of prime numbers (i.e., canonical label for moieties) between substrates and products. Other than metabolites currently present in the database, our approach can be extended to novel metabolites as long as the structure can be codified as counts of moieties (i.e., molecular signature). rePrime allows for different moiety sizes. In the current implementation of rePrime, we trace moieties of up to a size of λ = 3. In principle, one could expand the size of moieties traced or customize the size of the moiety traced based on the underlying reaction chemistry. By combining metabolite balance and moiety balance constraints, novoStoic simultaneously integrates reaction rules with known reactions. It thus enables homing in first to the most desirable designs avoiding costly enumeration of alternatives that either include too many novel steps, are redox imbalanced, or fail to meet cost/yield requirements. The MILP-based computational framework allows for straightforward control of cofactor regeneration, the number of novel reactions, and the imposition of carbon yield or profit margin requirements.

novoStoic allows us to exploit enzyme plasticity by suggesting homologs to perform the hypothesized conversion when natural options are not available. In typical industrial bioprocesses, the number of novel reactions must be carefully controlled (or minimized) as each novel reaction implies an additional enzyme–substrate engineering challenge. In the event that the homolog is not promiscuous, protein engineering steps have to be recruited to enhance non-natural substrate binding (e.g., by tuning the binding pocket structure to accommodate the non-natural substrate⁵⁴) and subsequently to increase catalytic rate⁵⁵. For example, Cargill, Inc. engineered a multi-step 3-hydroxypropionic acid biosynthesis pathway, which employed a single non-natural enzyme (i.e., alanine 2,3-aminomutase), to bypass an ATP consuming step⁵⁶. The team had to engineer a homolog lysine 2,3-aminomutase to confer it the desirable activity and at the same time select a variant with the least negative effect on the host cell⁵⁷. With the capability to blend known reactions and non-natural ones, novoStoic could invoke novel steps only when necessary.

A number of chemical manufacturing processes are increasingly exploiting the chemoselectivity and catalytic rate boost potential of enzymes^{8, 58} for the synthesis of pharmaceuticals and precursors. Studies have demonstrated that multi-enzyme cascades of non-natural enzymes can be implemented in both in vivo and in vitro fashion as well as in combination⁵⁹. rePrime/novoStoic address the timely challenge of integrating recent advancements for the rapid identification of complete pathways for bio-based chemosynthesis and the elucidation of intermediates of ill-defined xenobiotic degradative pathways. The detailed degradation map can therefore assist in evaluating the toxicity and potential side effects of new drugs, and even enable the assessment of synergistic, antagonist, or toxic drug interactions.

novoStoic sometimes predicts pathways where the rules invoked to fill in intermediate steps could map to multiple possible reactions. The degree of specificity of the reaction rules can be controlled by preferentially using moieties of size 3 or 2 and only size 1 if no solutions were recovered. Note that novoStoic does not allow for mixing of moieties of different sizes during the pathway design phase. As anticipated, larger moiety sizes generally yield novel steps “closer” to a known reaction and thus more likely to involve an existing (promiscuous) enzyme with some level of this activity. However, larger moiety sizes (distance of 2 or 3) severely restrict the number of possibilities for novel steps. Generally, we start the pathway design using moieties of distance 3 and then reduce to 2 or even 1 depending on the efficacy of the search so far. In addition, the requirement of elementally balanced reaction rules with proper cofactor utilization and stereo-chemical changes necessitate a high-quality biochemical database as an input for rePrime/novoStoic. Incomplete or incorrect reaction annotation (e.g., molecular structure, stereochemistry, stoichiometry, cofactor, and reaction mechanism) could significantly affect the quality of the rules identified and the reliability of a pathway. A number of automated algorithms⁶⁰ and procedures⁶¹ have been developed to reduce annotation inconsistencies and unify discrepancies across different databases. However, expert curation is often necessary to include updated discoveries (e.g., fixing cofactor utilization of a stoichiometrically balanced reaction) as well as to evaluate and resolve contradicting information⁶². Furthermore, we generally treat all rules as reversible. Therefore, additional scrutiny may be needed to ensure that the reaction rule ultimately maps to a reaction that is thermodynamically feasible.

Moving forward, the rePrime/novoStoic framework can be augmented by ranking designed pathways based on additional criteria on enzyme performance, toxicity of intermediate metabolites coupled with genetic intervention tools to explore high-yield reaction modulations or deletions^63–65. Machine learning-based algorithms such as support vector machine and Gaussian processes have already been applied to select protein sequences based on their predicted probability of promiscuity and enzyme–substrate affinity^{59, 66}. Databases such as Tox21⁶⁷ can be used to predict the lethality of pathway intermediates and thus filter accordingly pathway designs. In addition, the pathways can be ranked based on their orthogonality score⁶⁸ when exploring genetic intervention strategies after the pathway/reaction addition in the production host are made. The addition of new pathways designed by rePrime/novoStoic or other methods will increase the probability of orthogonality⁶⁸ by expanding upon the range of alternative pathways and possibly destroy growth-coupled mode of production.

Methods

Data and parameters required by rePrime and novoStoic

Reactions and metabolites from KEGG, BRENDA, MetaCyc, Rhea, HMDB, ECMDB, ChEBI, and ChEMBL and over 112 metabolic models were aggregated and standardized to create a database with elementally balanced reactions using the MetRxn curation workflow¹⁹. Note that the application of unbalanced reactions should always be avoided as they would generate elementally unbalanced rules thus leading to incorrect predictions. rePrime/novoStoic requires as input: (i) the standardized data set, which contains 44,784 unique elementally balanced reactions and 32,478 metabolites encoded within sets J and I, respectively, (ii) molecular graph of each metabolite (include cofactors), wherein each atom was represented by a node n that is indexed uniquely in the set $N_{i}$ , (iii) formation energies of exchange metabolites that were calculated using component contribution method⁶⁹ or eQuilibrator⁷⁰ (at standard cellular conditions, pH 7.0, and ionic strength of 0.1 M) and stored as $Δ_{f} G_{i}^{' \circ}$ , and (iv) the reactions in pathway/subsystem (set $P$ ) and organism/genus/taxa (set B) annotations downloaded from KEGG and stored as sets J_P and J_B, respectively.

Developing a database of reaction rules using rePrime

The rePrime procedure uses the information encoded in the molecular graphs to generate molecular signatures and subsequently reaction rules. Principles from existing reaction rule operators BEM²³, RDM²⁴, and SMIRKS²⁵ schemes were integrated into rePrime. BEM tracks bond changes as a summation operator, whereas RDM codifies the topological changes in the neighborhood of the reaction center, as changes between chemical fingerprints. The SMIRKS protocol leverages prime factorization to codify molecular structures as canonical strings. In rePrime, all nodes in each molecular graph were first assigned canonical labels and ranked (Fig. 2). Each molecular graph was then represented by a vector called molecular signature that captures the count of unique nodes (i.e., moiety centered at node n) identified by their canonical labels. Molecular signatures provide a convenient data structure for elementary graph operations. Graph edit operations such as addition and/or subtraction to transform the molecular signatures of reactants into products capture the bond changes between reactants and products. By linking bond changes to the reactant-product molecular signature of cataloged reactions, we can identify generalized reaction rules primitives shared by multiple reactions.

The rePrime procedure to generate these reaction rules can be divided into three main steps (see Fig. 2, Supplementary Table 4, and Supplementary Methods for algorithmic details).

Step 1: Each node on a molecule is first assigned a canonical label. Note that the canonical label can be extended with stereo-descriptors as described in the CLCA³⁴ algorithm to account for stereo-chemical changes when the goal of a pathway design involves such changes. A prime number is then assigned iteratively to each node for every metabolite (including cofactors), wherein each prime number maps uniquely to a specific moiety.

Step 2: The molecular signature $C_{m i}^{λ}$ (i.e., a vector containing the number of moieties) for each metabolite (including cofactors) is extracted.

Step 3: The reaction rule termed $T_{m j}^{λ}$ for each reaction in the database is derived based on the stoichiometry of the reaction and the molecular signatures of the participating metabolites. A unique set of reaction rules $T_{m r}^{λ}$ is then retained to generate the reaction rule database (Fig. 1). Every rule is currently assumed to be reversible. However, additional constraints can be added to restrict the reversibility of reaction based on experimental evidence or Gibbs free energy of reaction (e.g., component contribution method⁶⁹).

Steps 1–3 were repeated for different moiety sizes (λ = {1, 2, 3}), which represent the distance between atoms (nodes) in a molecular graph (Fig. 2a). After analyzing 44,784 (non-transport) MetRxn reactions using rePrime, a total of 826, 1929, and 6043 unique reaction rules at moiety size (λ) of 1, 2, and 3, respectively, were extracted (see Supplementary Methods for details). The database for reaction rules was then used within the novoStoic procedure to predict the route to product molecules from reactant molecules. rePrime is consistent with the algorithmic requirements (i.e., elementally balanced reaction rules) of the novoStoic pathway prediction procedure.

Pathway design by using the novoStoic optimization framework

The novoStoic procedure implements an MILP optimization framework to identify a metabolite- and moiety-balanced biochemical pathway that converts source to target molecule(s) (Fig. 1). In particular, the novoStoic algorithm searches within a merged database of both MetRxn reactions and rePrime reaction rules. Preference is given to the use of reactions while reaction rules are minimally invoked when there is not a cataloged reaction available to complete the pathway. The objective function generally involves the maximization of the profit margin (i.e., the cost difference between substrate and product) as a way of prioritizing toward biosynthetic routes from inexpensive substrates to high-value products. The objective function can be modified to the minimization of the number of reaction rules or known reaction step(s) that form the pathway. By design, the pathway is component and moiety-balanced with an overall standard free energy of change that is negative. The combined use of reaction rules expands the reaction search space with putative enzyme-catalyzed reactions that may be realized through protein engineering for the desired substrate specificity. Therefore, novoStoic could explore de novo biosynthetic or biodegradation routes, which are yet to be catalogued in the enzyme databases. The novoStoic algorithm is versatile enough to allow for any combination of the design rules such as customization of network size, selection of host organism to minimize heterologous reactions, and enforcing the selection of reactions/rules from common categories (see Supplementary Methods for details).

Code availability

The code supporting the findings of this study is available at https://github.com/maranasgroup as Python and GAMS scripts. The Git repositories for rePrime and novoStoic will be made public post publication.

Data availability

All data generated or analyzed during this study are included in this published article and on MetRxn database (http://www.maranasgroup.com/metrxn).

Electronic supplementary material

Supplementary Information^{(631.1KB, pdf)}

Acknowledgements

We acknowledge the inputs given by Anthony Burgard, Anupam Chowdhury, and Ali Khodayari at the various stages of idea creation, refinement, and realization. We gratefully acknowledge funding from the DOE (http://www.energy.gov/) grant no. DE-SC0008091, DOE's Center for Bioenergy Innovation (CBI), and NSF (http://www.nsf.gov/) award no. EEC-0813570. The funders had no role in the study design, data collection, and analysis, decision to publish, or preparation of the manuscript.

Author contributions

A.K. and C.D.M. conceived the study. A.K., L.W., and C.Y.N. designed the algorithm and performed the simulations, data analysis, and interpretation. All authors contributed to writing the manuscript and result discussion.

Competing interests

The authors declare no competing financial interests.

Footnotes

Akhil Kumar, Lin Wang and Chiam Yu Ng contributed equally to this work.

Electronic supplementary material

Supplementary Information accompanies this paper at 10.1038/s41467-017-02362-x.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1.Rodriguez GM, Tashiro Y, Atsumi S. Expanding ester biosynthesis in Escherichia coli. Nat. Chem. Biol. 2014;10:259–265. doi: 10.1038/nchembio.1476. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Atsumi S, Hanai T, Liao JC. Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels. Nature. 2008;451:86–89. doi: 10.1038/nature06450. [DOI] [PubMed] [Google Scholar]
3.Khersonsky O, Roodveldt C, Tawfik DS. Enzyme promiscuity: evolutionary and mechanistic aspects. Curr. Opin. Chem. Biol. 2006;10:498–508. doi: 10.1016/j.cbpa.2006.08.011. [DOI] [PubMed] [Google Scholar]
4.Nam H, et al. Network context and selection in the evolution to enzyme specificity. Science. 2012;337:1101–1104. doi: 10.1126/science.1216861. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Coelho PS, Brustad EM, Kannan A, Arnold FH. Olefin cyclopropanation via carbene transfer catalyzed by engineered cytochrome P450 enzymes. Science. 2013;339:307–310. doi: 10.1126/science.1231434. [DOI] [PubMed] [Google Scholar]
6.Young EM, Tong A, Bui H, Spofford C, Alper HS. Rewiring yeast sugar transporter preference through modifying a conserved protein motif. Proc. Natl Acad. Sci. USA. 2014;111:131–136. doi: 10.1073/pnas.1311970111. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Huisman GW, Liang J, Krebber A. Practical chiral alcohol manufacture using ketoreductases. Curr. Opin. Chem. Biol. 2010;14:122–129. doi: 10.1016/j.cbpa.2009.12.003. [DOI] [PubMed] [Google Scholar]
8.Savile CK, et al. Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science. 2010;329:305–309. doi: 10.1126/science.1188934. [DOI] [PubMed] [Google Scholar]
9.Saraf MC, Moore GL, Goodey NM, Cao VY. IPRO: an iterative computational protein library redesign and optimization procedure. Biophys. J. 2006;90:4167–4180. doi: 10.1529/biophysj.105.079277. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Liu Y, Kuhlman B. RosettaDesign server for protein design. Nucleic Acids Res. 2006;34:W235–W238. doi: 10.1093/nar/gkl163. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Siegel JB, et al. Computational protein design enables a novel one-carbon assimilation pathway. Proc. Natl Acad. Sci. USA. 2015;112:3704–3709. doi: 10.1073/pnas.1500545112. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Khersonsky O, et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc. Natl Acad. Sci. USA. 2012;109:10358–10363. doi: 10.1073/pnas.1121063109. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Ogata H, et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999;27:29–34. doi: 10.1093/nar/27.1.29. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Rahman SA, Advani P, Schunk R, Schrader R, Schomburg D. Metabolic pathway analysis web service (Pathway Hunter Tool at CUBIC) Bioinformatics. 2005;21:1189–1193. doi: 10.1093/bioinformatics/bti116. [DOI] [PubMed] [Google Scholar]
15.Blum T, Kohlbacher O. MetaRoute: fast search for relevant metabolic routes for interactive network navigation and visualization. Bioinformatics. 2008;24:2108–2109. doi: 10.1093/bioinformatics/btn360. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Pey J, Prada J, Beasley JE, Planes FJ. Path finding methods accounting for stoichiometry in metabolic networks. Genome Biol. 2011;12:R49. doi: 10.1186/gb-2011-12-5-r49. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.de Figueiredo L, Podhorski A, Rubio A. Computing the shortest elementary flux modes in genome-scale metabolic networks. Bioinformatics. 2009;25:3158–3165. doi: 10.1093/bioinformatics/btp564. [DOI] [PubMed] [Google Scholar]
18.Chowdhury A, Maranas CD. Designing overall stoichiometric conversions and intervening metabolic reactions. Sci. Rep. 2015;5:16009. doi: 10.1038/srep16009. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Kumar A, Suthers PF, Maranas CD. MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases. BMC Bioinformatics. 2012;13:6. doi: 10.1186/1471-2105-13-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Kanehisa M, Goto S, Sato Y. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 2014;42:D199–D205. doi: 10.1093/nar/gkt1076. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Schomburg I, Chang A, Placzek S. BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA. Nucleic Acids Res. 2013;41:D764–D772. doi: 10.1093/nar/gks1049. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Caspi R, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2012;40:D742–D753. doi: 10.1093/nar/gkr1014. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Dugundji Ivar JU. An algebraic model of constitutional chemistry as a basis for chemical computer programs. Comput. Chem. 1973;39:19–64. doi: 10.1007/BFb0051317. [DOI] [Google Scholar]
24.Yamanishi Y, Hattori M, Kotera M, Goto S, Kanehisa M. Enzyme: predicting potential EC numbers from the chemical transformation pattern of substrate-product pairs. Bioinformatics. 2009;25:i179–i186. doi: 10.1093/bioinformatics/btp223. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Weininger D, Weininger A, Weininger JL. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Model. 1989;29:97–101. doi: 10.1021/ci00062a008. [DOI] [Google Scholar]
26.Finley SD, Broadbelt LJ, Hatzimanikatis V. Computational framework for predictive biodegradation. Biotechnol. Bioeng. 2009;104:1086–1097. doi: 10.1002/bit.22489. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Carbonell P, Parutto P, Herisson J, Pandit SB, Faulon JL. XTMS: pathway design in an eXTended metabolic space. Nucleic Acids Res. 2014;42:W389–W394. doi: 10.1093/nar/gku362. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Fenner K, Gao J, Kramer S, Ellis L, Wackett L. Data-driven extraction of relative reasoning rules to limit combinatorial explosion in biodegradation pathway prediction. Bioinformatics. 2008;24:2079–2085. doi: 10.1093/bioinformatics/btn378. [DOI] [PubMed] [Google Scholar]
29.Moriya Y, et al. PathPred: an enzyme-catalyzed metabolic pathway prediction server. Nucleic Acids Res. 2010;38:1–6. doi: 10.1093/nar/gkq318. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Law J, et al. Route Designer: a retrosynthetic analysis tool utilizing automated retrosynthetic rule generation. J. Chem. Inf. Model. 2009;49:593–602. doi: 10.1021/ci800228y. [DOI] [PubMed] [Google Scholar]
31.Campodonico MA, Andrews BA, Asenjo JA, Palsson BO, Feist AM. Generation of an atlas for commodity chemical production in Escherichia coli and a novel pathway prediction algorithm, GEM-Path. Metab. Eng. 2014;25:140–158. doi: 10.1016/j.ymben.2014.07.009. [DOI] [PubMed] [Google Scholar]
32.Henry CS, Broadbelt LJ, Hatzimanikatis V. Thermodynamics-based metabolic flux analysis. Biophys. J. 2007;92:1792–1805. doi: 10.1529/biophysj.106.093138. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Boghigian BA, Shi H, Lee K, Pfeifer BA. Utilizing elementary mode analysis, pathway thermodynamics, and a genetic algorithm for metabolic flux determination and optimal metabolic network design. BMC Syst. Biol. 2010;4:49. doi: 10.1186/1752-0509-4-49. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Kumar A, Maranas CD. CLCA: maximum common molecular substructure queries within the MetRxn database. J. Chem. Inf. Model. 2014;54:3417–3438. doi: 10.1021/ci5003922. [DOI] [PubMed] [Google Scholar]
35.Yim H, et al. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. 2011;7:445–452. doi: 10.1038/nchembio.580. [DOI] [PubMed] [Google Scholar]
36.Akhtar MK, Turner NJ, Jones PR. Carboxylic acid reductase is a versatile enzyme for the conversion of fatty acids into fuels and chemical commodities. Proc. Natl Acad. Sci. USA. 2013;110:87–92. doi: 10.1073/pnas.1216516110. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Morgat A, Lombardot T, Axelsen KB, Aimo L. Updates in rhea—an expert curated resource of biochemical reactions. Nucleic Acids Res. 2016;45:4279. doi: 10.1093/nar/gkw1299. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Baison W, Teerawutgulrag A, Puangsombat P, Rakariyatham N. An alternative synthesis of (+/−)-phenylephrine hydrochloride. Maejo Int. J. Sci. Technol. 2014;8:41–47. [Google Scholar]
39.Zhang W, Ames BD, Walsh CT. Identification of phenylalanine 3-hydroxylase for meta-tyrosine biosynthesis. Biochemistry. 2011;50:5401–5403. doi: 10.1021/bi200733c. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Hagel JM, Krizevski R, Marsolais F, Lewinsohn E, Facchini PJ. Biosynthesis of amphetamine analogs in plants. Trends Plant. Sci. 2012;17:404–412. doi: 10.1016/j.tplants.2012.03.004. [DOI] [PubMed] [Google Scholar]
41.Lenders JWM, et al. Specific genetic deficiencies of the A and B isoenzymes of monoamine oxidase are characterized by distinct neurochemical and clinical phenotypes. J. Clin. Invest. 1996;97:1010–1019. doi: 10.1172/JCI118492. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.McClymont K, Soyer OS. Metabolic tinker: an online tool for guiding the design of synthetic metabolic pathways. Nucleic Acids Res. 2013;41:e113. doi: 10.1093/nar/gkt234. [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Pandey RK, Upadhyay PK, Kumar P. Enantioselective synthesis of (R)-phenylephrine hydrochloride. Tetrahedron Lett. 2003;44:6245–6246. doi: 10.1016/S0040-4039(03)01554-5. [DOI] [Google Scholar]
44.Russell PB, Childress SJ. New route to phenylephrine. J. Pharm. Sci. 1961;50:713–771. doi: 10.1002/jps.2600500824. [DOI] [PubMed] [Google Scholar]
45.Gurjar MK, Krishna LM, Sarma BVNBS, Chorghade MS. A practical synthesis of (R)-(-)-phenylephrine hydrochloride. Org. Proc. Res. Dev. 1998;2:422–424. doi: 10.1021/op970128+. [DOI] [Google Scholar]
46.Shobayashi M, Mukai N, Iwashita K, Hiraga Y, Iefuji H. A new method for isolation of S-adenosylmethionine (SAM)-accumulating yeast. Appl. Microbiol. Biotechnol. 2006;69:704–710. doi: 10.1007/s00253-005-0009-7. [DOI] [PubMed] [Google Scholar]
47.Chen H, et al. Intracellular expression of Vitreoscilla hemoglobin improves S-adenosylmethionine production in a recombinant Pichia pastoris. Appl. Microbiol. Biotechnol. 2007;74:1205–1212. doi: 10.1007/s00253-006-0705-y. [DOI] [PubMed] [Google Scholar]
48.Yu H. Environmental carcinogenic polycyclic aromatic hydrocarbons: photochemistry and phototoxicity. J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev. 2002;20:149–183. doi: 10.1081/GNC-120016203. [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Mueller JG, et al. Phylogenetic and physiological comparisons of PAH-degrading bacteria from geographically diverse soils. Antonie Van Leeuwenhoek. 1997;71:329–343. doi: 10.1023/A:1000277008064. [DOI] [PubMed] [Google Scholar]
50.Habe H, Omori T. Genetics of polycyclic aromatic hydrocarbon metabolism in diverse aerobic bacteria. Biosci. Biotechnol. Biochem. 2003;67:225–243. doi: 10.1271/bbb.67.225. [DOI] [PubMed] [Google Scholar]
51.Gadd, G. M. Fungi in Bioremediation (Cambridge University Press, Cambridge, UK, 2001).
52.Haritash AK, Kaushik CP. Biodegradation aspects of polycyclic aromatic hydrocarbons (PAHs): a review. J. Hazard. Mater. 2009;169:1–15. doi: 10.1016/j.jhazmat.2009.03.137. [DOI] [PubMed] [Google Scholar]
53.Yang Y, Chen RF, Shiaris MP. Metabolism of naphthalene, fluorene, and phenanthrene: preliminary characterization of a cloned gene cluster from Pseudomonas putida NCIB 9816. J. Bacteriol. 1994;176:2158–2164. doi: 10.1128/jb.176.8.2158-2164.1994. [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Zhang K, Sawaya MR, Eisenberg DS, Liao JC. Expanding metabolism for biosynthesis of nonnatural alcohols. Proc. Natl Acad. Sci. USA. 2008;105:20653–20658. doi: 10.1073/pnas.0807157106. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem. Soc. Rev. 2015;44:1172–1239. doi: 10.1039/C4CS00351A. [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Jessen, H. J., Liao, H. H., Gort, S. J. & Selifonova, O. V. Beta-alanine/alpha-ketoglutarate aminotransferase for 3-hydroxypropionic acid production. US patent application US20090291480 A1 (2014).
57.Liao, H. H., Gokarn, R. R., Gort, S. J. & Jessen, H. J. Alanine 2, 3-aminomutase. US patent application US20080124785 A1 (2007).
58.Renata H, Wang ZJ, Arnold FH. Expanding the enzyme universe: accessing non‐natural reactions by mechanism‐guided directed evolution. Angew. Chem. Int. Ed. 2015;54:3351–3367. doi: 10.1002/anie.201409470. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.France SP, Hepworth LJ, Turner NJ, Flitsch SL. Constructing biocatalytic cascades: in vitro and in vivo approaches to de novo multi-enzyme pathways. ACS Catal. 2017;7:710–724. doi: 10.1021/acscatal.6b02979. [DOI] [Google Scholar]
60.Moretti S, et al. MetaNetX/MNXref–reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic Acids Res. 2016;44:D523–D526. doi: 10.1093/nar/gkv1117. [DOI] [PMC free article] [PubMed] [Google Scholar]
61.Lang M, Stelzer M, Schomburg D. BKM-react, an integrated biochemical reaction database. BMC Biochem. 2011;12:42. doi: 10.1186/1471-2091-12-42. [DOI] [PMC free article] [PubMed] [Google Scholar]
62.Poux S, et al. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics. 2017;33:3454–3460. doi: 10.1093/bioinformatics/btx439. [DOI] [PMC free article] [PubMed] [Google Scholar]
63.Burgard AP, Pharkya P, Maranas CD. OptKnock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng. 2003;84:647–657. doi: 10.1002/bit.10803. [DOI] [PubMed] [Google Scholar]
64.Ranganathan S, Suthers PF, Maranas CD. OptForce: an optimization procedure for identifying all genetic manipulations leading to targeted overproductions. PLoS Comput. Biol. 2010;6:e1000744. doi: 10.1371/journal.pcbi.1000744. [DOI] [PMC free article] [PubMed] [Google Scholar]
65.Mahadevan R, Kamp, von A, Klamt S. Genome-scale strain designs based on regulatory minimal cut sets. Bioinformatics. 2015;31:2844–2851. doi: 10.1093/bioinformatics/btv217. [DOI] [PubMed] [Google Scholar]
66.Mellor J, Grigoras I, Carbonell P, Faulon JL. Semisupervised Gaussian process for automated enzyme search. ACS Synth. Biol. 2016;5:518–528. doi: 10.1021/acssynbio.5b00294. [DOI] [PubMed] [Google Scholar]
67.Richard AM, et al. ToxCast chemical landscape: paving the road to 21st century toxicology. Chem. Res. Toxicol. 2016;29:1225–1251. doi: 10.1021/acs.chemrestox.6b00135. [DOI] [PubMed] [Google Scholar]
68.Pandit AV, Srinivasan S, Mahadevan R. Redesigning metabolism based on orthogonality principles. Nat. Commun. 2017;8:15188. doi: 10.1038/ncomms15188. [DOI] [PMC free article] [PubMed] [Google Scholar]
69.Noor E, Haraldsdóttir HS, Milo R, Fleming RMT. Consistent estimation of Gibbs energy using component contributions. PLoS Comput. Biol. 2013;9:e1003098. doi: 10.1371/journal.pcbi.1003098. [DOI] [PMC free article] [PubMed] [Google Scholar]
70.Flamholz A, Noor E, Bar-Even A, Milo R. eQuilibrator–the biochemical thermodynamics calculator. Nucleic Acids Res. 2012;40:D770–D775. doi: 10.1093/nar/gkr874. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(631.1KB, pdf)}

Data Availability Statement

All data generated or analyzed during this study are included in this published article and on MetRxn database (http://www.maranasgroup.com/metrxn).

[CR1] 1.Rodriguez GM, Tashiro Y, Atsumi S. Expanding ester biosynthesis in Escherichia coli. Nat. Chem. Biol. 2014;10:259–265. doi: 10.1038/nchembio.1476. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.Atsumi S, Hanai T, Liao JC. Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels. Nature. 2008;451:86–89. doi: 10.1038/nature06450. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Khersonsky O, Roodveldt C, Tawfik DS. Enzyme promiscuity: evolutionary and mechanistic aspects. Curr. Opin. Chem. Biol. 2006;10:498–508. doi: 10.1016/j.cbpa.2006.08.011. [DOI] [PubMed] [Google Scholar]

[CR4] 4.Nam H, et al. Network context and selection in the evolution to enzyme specificity. Science. 2012;337:1101–1104. doi: 10.1126/science.1216861. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Coelho PS, Brustad EM, Kannan A, Arnold FH. Olefin cyclopropanation via carbene transfer catalyzed by engineered cytochrome P450 enzymes. Science. 2013;339:307–310. doi: 10.1126/science.1231434. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Young EM, Tong A, Bui H, Spofford C, Alper HS. Rewiring yeast sugar transporter preference through modifying a conserved protein motif. Proc. Natl Acad. Sci. USA. 2014;111:131–136. doi: 10.1073/pnas.1311970111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Huisman GW, Liang J, Krebber A. Practical chiral alcohol manufacture using ketoreductases. Curr. Opin. Chem. Biol. 2010;14:122–129. doi: 10.1016/j.cbpa.2009.12.003. [DOI] [PubMed] [Google Scholar]

[CR8] 8.Savile CK, et al. Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science. 2010;329:305–309. doi: 10.1126/science.1188934. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Saraf MC, Moore GL, Goodey NM, Cao VY. IPRO: an iterative computational protein library redesign and optimization procedure. Biophys. J. 2006;90:4167–4180. doi: 10.1529/biophysj.105.079277. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Liu Y, Kuhlman B. RosettaDesign server for protein design. Nucleic Acids Res. 2006;34:W235–W238. doi: 10.1093/nar/gkl163. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Siegel JB, et al. Computational protein design enables a novel one-carbon assimilation pathway. Proc. Natl Acad. Sci. USA. 2015;112:3704–3709. doi: 10.1073/pnas.1500545112. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Khersonsky O, et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc. Natl Acad. Sci. USA. 2012;109:10358–10363. doi: 10.1073/pnas.1121063109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Ogata H, et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999;27:29–34. doi: 10.1093/nar/27.1.29. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Rahman SA, Advani P, Schunk R, Schrader R, Schomburg D. Metabolic pathway analysis web service (Pathway Hunter Tool at CUBIC) Bioinformatics. 2005;21:1189–1193. doi: 10.1093/bioinformatics/bti116. [DOI] [PubMed] [Google Scholar]

[CR15] 15.Blum T, Kohlbacher O. MetaRoute: fast search for relevant metabolic routes for interactive network navigation and visualization. Bioinformatics. 2008;24:2108–2109. doi: 10.1093/bioinformatics/btn360. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Pey J, Prada J, Beasley JE, Planes FJ. Path finding methods accounting for stoichiometry in metabolic networks. Genome Biol. 2011;12:R49. doi: 10.1186/gb-2011-12-5-r49. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.de Figueiredo L, Podhorski A, Rubio A. Computing the shortest elementary flux modes in genome-scale metabolic networks. Bioinformatics. 2009;25:3158–3165. doi: 10.1093/bioinformatics/btp564. [DOI] [PubMed] [Google Scholar]

[CR18] 18.Chowdhury A, Maranas CD. Designing overall stoichiometric conversions and intervening metabolic reactions. Sci. Rep. 2015;5:16009. doi: 10.1038/srep16009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Kumar A, Suthers PF, Maranas CD. MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases. BMC Bioinformatics. 2012;13:6. doi: 10.1186/1471-2105-13-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Kanehisa M, Goto S, Sato Y. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 2014;42:D199–D205. doi: 10.1093/nar/gkt1076. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Schomburg I, Chang A, Placzek S. BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA. Nucleic Acids Res. 2013;41:D764–D772. doi: 10.1093/nar/gks1049. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Caspi R, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2012;40:D742–D753. doi: 10.1093/nar/gkr1014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Dugundji Ivar JU. An algebraic model of constitutional chemistry as a basis for chemical computer programs. Comput. Chem. 1973;39:19–64. doi: 10.1007/BFb0051317. [DOI] [Google Scholar]

[CR24] 24.Yamanishi Y, Hattori M, Kotera M, Goto S, Kanehisa M. Enzyme: predicting potential EC numbers from the chemical transformation pattern of substrate-product pairs. Bioinformatics. 2009;25:i179–i186. doi: 10.1093/bioinformatics/btp223. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] 25.Weininger D, Weininger A, Weininger JL. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Model. 1989;29:97–101. doi: 10.1021/ci00062a008. [DOI] [Google Scholar]

[CR26] 26.Finley SD, Broadbelt LJ, Hatzimanikatis V. Computational framework for predictive biodegradation. Biotechnol. Bioeng. 2009;104:1086–1097. doi: 10.1002/bit.22489. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Carbonell P, Parutto P, Herisson J, Pandit SB, Faulon JL. XTMS: pathway design in an eXTended metabolic space. Nucleic Acids Res. 2014;42:W389–W394. doi: 10.1093/nar/gku362. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Fenner K, Gao J, Kramer S, Ellis L, Wackett L. Data-driven extraction of relative reasoning rules to limit combinatorial explosion in biodegradation pathway prediction. Bioinformatics. 2008;24:2079–2085. doi: 10.1093/bioinformatics/btn378. [DOI] [PubMed] [Google Scholar]

[CR29] 29.Moriya Y, et al. PathPred: an enzyme-catalyzed metabolic pathway prediction server. Nucleic Acids Res. 2010;38:1–6. doi: 10.1093/nar/gkq318. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Law J, et al. Route Designer: a retrosynthetic analysis tool utilizing automated retrosynthetic rule generation. J. Chem. Inf. Model. 2009;49:593–602. doi: 10.1021/ci800228y. [DOI] [PubMed] [Google Scholar]

[CR31] 31.Campodonico MA, Andrews BA, Asenjo JA, Palsson BO, Feist AM. Generation of an atlas for commodity chemical production in Escherichia coli and a novel pathway prediction algorithm, GEM-Path. Metab. Eng. 2014;25:140–158. doi: 10.1016/j.ymben.2014.07.009. [DOI] [PubMed] [Google Scholar]

[CR32] 32.Henry CS, Broadbelt LJ, Hatzimanikatis V. Thermodynamics-based metabolic flux analysis. Biophys. J. 2007;92:1792–1805. doi: 10.1529/biophysj.106.093138. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Boghigian BA, Shi H, Lee K, Pfeifer BA. Utilizing elementary mode analysis, pathway thermodynamics, and a genetic algorithm for metabolic flux determination and optimal metabolic network design. BMC Syst. Biol. 2010;4:49. doi: 10.1186/1752-0509-4-49. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.Kumar A, Maranas CD. CLCA: maximum common molecular substructure queries within the MetRxn database. J. Chem. Inf. Model. 2014;54:3417–3438. doi: 10.1021/ci5003922. [DOI] [PubMed] [Google Scholar]

[CR35] 35.Yim H, et al. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. 2011;7:445–452. doi: 10.1038/nchembio.580. [DOI] [PubMed] [Google Scholar]

[CR36] 36.Akhtar MK, Turner NJ, Jones PR. Carboxylic acid reductase is a versatile enzyme for the conversion of fatty acids into fuels and chemical commodities. Proc. Natl Acad. Sci. USA. 2013;110:87–92. doi: 10.1073/pnas.1216516110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Morgat A, Lombardot T, Axelsen KB, Aimo L. Updates in rhea—an expert curated resource of biochemical reactions. Nucleic Acids Res. 2016;45:4279. doi: 10.1093/nar/gkw1299. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Baison W, Teerawutgulrag A, Puangsombat P, Rakariyatham N. An alternative synthesis of (+/−)-phenylephrine hydrochloride. Maejo Int. J. Sci. Technol. 2014;8:41–47. [Google Scholar]

[CR39] 39.Zhang W, Ames BD, Walsh CT. Identification of phenylalanine 3-hydroxylase for meta-tyrosine biosynthesis. Biochemistry. 2011;50:5401–5403. doi: 10.1021/bi200733c. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Hagel JM, Krizevski R, Marsolais F, Lewinsohn E, Facchini PJ. Biosynthesis of amphetamine analogs in plants. Trends Plant. Sci. 2012;17:404–412. doi: 10.1016/j.tplants.2012.03.004. [DOI] [PubMed] [Google Scholar]

[CR41] 41.Lenders JWM, et al. Specific genetic deficiencies of the A and B isoenzymes of monoamine oxidase are characterized by distinct neurochemical and clinical phenotypes. J. Clin. Invest. 1996;97:1010–1019. doi: 10.1172/JCI118492. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] 42.McClymont K, Soyer OS. Metabolic tinker: an online tool for guiding the design of synthetic metabolic pathways. Nucleic Acids Res. 2013;41:e113. doi: 10.1093/nar/gkt234. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR43] 43.Pandey RK, Upadhyay PK, Kumar P. Enantioselective synthesis of (R)-phenylephrine hydrochloride. Tetrahedron Lett. 2003;44:6245–6246. doi: 10.1016/S0040-4039(03)01554-5. [DOI] [Google Scholar]

[CR44] 44.Russell PB, Childress SJ. New route to phenylephrine. J. Pharm. Sci. 1961;50:713–771. doi: 10.1002/jps.2600500824. [DOI] [PubMed] [Google Scholar]

[CR45] 45.Gurjar MK, Krishna LM, Sarma BVNBS, Chorghade MS. A practical synthesis of (R)-(-)-phenylephrine hydrochloride. Org. Proc. Res. Dev. 1998;2:422–424. doi: 10.1021/op970128+. [DOI] [Google Scholar]

[CR46] 46.Shobayashi M, Mukai N, Iwashita K, Hiraga Y, Iefuji H. A new method for isolation of S-adenosylmethionine (SAM)-accumulating yeast. Appl. Microbiol. Biotechnol. 2006;69:704–710. doi: 10.1007/s00253-005-0009-7. [DOI] [PubMed] [Google Scholar]

[CR47] 47.Chen H, et al. Intracellular expression of Vitreoscilla hemoglobin improves S-adenosylmethionine production in a recombinant Pichia pastoris. Appl. Microbiol. Biotechnol. 2007;74:1205–1212. doi: 10.1007/s00253-006-0705-y. [DOI] [PubMed] [Google Scholar]

[CR48] 48.Yu H. Environmental carcinogenic polycyclic aromatic hydrocarbons: photochemistry and phototoxicity. J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev. 2002;20:149–183. doi: 10.1081/GNC-120016203. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.Mueller JG, et al. Phylogenetic and physiological comparisons of PAH-degrading bacteria from geographically diverse soils. Antonie Van Leeuwenhoek. 1997;71:329–343. doi: 10.1023/A:1000277008064. [DOI] [PubMed] [Google Scholar]

[CR50] 50.Habe H, Omori T. Genetics of polycyclic aromatic hydrocarbon metabolism in diverse aerobic bacteria. Biosci. Biotechnol. Biochem. 2003;67:225–243. doi: 10.1271/bbb.67.225. [DOI] [PubMed] [Google Scholar]

[CR51] 51.Gadd, G. M. Fungi in Bioremediation (Cambridge University Press, Cambridge, UK, 2001).

[CR52] 52.Haritash AK, Kaushik CP. Biodegradation aspects of polycyclic aromatic hydrocarbons (PAHs): a review. J. Hazard. Mater. 2009;169:1–15. doi: 10.1016/j.jhazmat.2009.03.137. [DOI] [PubMed] [Google Scholar]

[CR53] 53.Yang Y, Chen RF, Shiaris MP. Metabolism of naphthalene, fluorene, and phenanthrene: preliminary characterization of a cloned gene cluster from Pseudomonas putida NCIB 9816. J. Bacteriol. 1994;176:2158–2164. doi: 10.1128/jb.176.8.2158-2164.1994. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR54] 54.Zhang K, Sawaya MR, Eisenberg DS, Liao JC. Expanding metabolism for biosynthesis of nonnatural alcohols. Proc. Natl Acad. Sci. USA. 2008;105:20653–20658. doi: 10.1073/pnas.0807157106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR55] 55.Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem. Soc. Rev. 2015;44:1172–1239. doi: 10.1039/C4CS00351A. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR56] 56.Jessen, H. J., Liao, H. H., Gort, S. J. & Selifonova, O. V. Beta-alanine/alpha-ketoglutarate aminotransferase for 3-hydroxypropionic acid production. US patent application US20090291480 A1 (2014).

[CR57] 57.Liao, H. H., Gokarn, R. R., Gort, S. J. & Jessen, H. J. Alanine 2, 3-aminomutase. US patent application US20080124785 A1 (2007).

[CR58] 58.Renata H, Wang ZJ, Arnold FH. Expanding the enzyme universe: accessing non‐natural reactions by mechanism‐guided directed evolution. Angew. Chem. Int. Ed. 2015;54:3351–3367. doi: 10.1002/anie.201409470. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR59] 59.France SP, Hepworth LJ, Turner NJ, Flitsch SL. Constructing biocatalytic cascades: in vitro and in vivo approaches to de novo multi-enzyme pathways. ACS Catal. 2017;7:710–724. doi: 10.1021/acscatal.6b02979. [DOI] [Google Scholar]

[CR60] 60.Moretti S, et al. MetaNetX/MNXref–reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic Acids Res. 2016;44:D523–D526. doi: 10.1093/nar/gkv1117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR61] 61.Lang M, Stelzer M, Schomburg D. BKM-react, an integrated biochemical reaction database. BMC Biochem. 2011;12:42. doi: 10.1186/1471-2091-12-42. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR62] 62.Poux S, et al. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics. 2017;33:3454–3460. doi: 10.1093/bioinformatics/btx439. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR63] 63.Burgard AP, Pharkya P, Maranas CD. OptKnock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng. 2003;84:647–657. doi: 10.1002/bit.10803. [DOI] [PubMed] [Google Scholar]

[CR64] 64.Ranganathan S, Suthers PF, Maranas CD. OptForce: an optimization procedure for identifying all genetic manipulations leading to targeted overproductions. PLoS Comput. Biol. 2010;6:e1000744. doi: 10.1371/journal.pcbi.1000744. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR65] 65.Mahadevan R, Kamp, von A, Klamt S. Genome-scale strain designs based on regulatory minimal cut sets. Bioinformatics. 2015;31:2844–2851. doi: 10.1093/bioinformatics/btv217. [DOI] [PubMed] [Google Scholar]

[CR66] 66.Mellor J, Grigoras I, Carbonell P, Faulon JL. Semisupervised Gaussian process for automated enzyme search. ACS Synth. Biol. 2016;5:518–528. doi: 10.1021/acssynbio.5b00294. [DOI] [PubMed] [Google Scholar]

[CR67] 67.Richard AM, et al. ToxCast chemical landscape: paving the road to 21st century toxicology. Chem. Res. Toxicol. 2016;29:1225–1251. doi: 10.1021/acs.chemrestox.6b00135. [DOI] [PubMed] [Google Scholar]

[CR68] 68.Pandit AV, Srinivasan S, Mahadevan R. Redesigning metabolism based on orthogonality principles. Nat. Commun. 2017;8:15188. doi: 10.1038/ncomms15188. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR69] 69.Noor E, Haraldsdóttir HS, Milo R, Fleming RMT. Consistent estimation of Gibbs energy using component contributions. PLoS Comput. Biol. 2013;9:e1003098. doi: 10.1371/journal.pcbi.1003098. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR70] 70.Flamholz A, Noor E, Bar-Even A, Milo R. eQuilibrator–the biochemical thermodynamics calculator. Nucleic Acids Res. 2012;40:D770–D775. doi: 10.1093/nar/gkr874. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Pathway design using de novo steps through uncharted biochemical spaces

Akhil Kumar

Lin Wang

Chiam Yu Ng

Costas D Maranas

Abstract

Introduction

Fig. 1.

Results

An illustrative example for rePrime and novoStoic

Fig. 2.

Fig. 3.

1,4-butanediol synthesis

Fig. 4.

Table 1.

Phenylephrine synthesis

Fig. 5.

Table 2.

Oxidative degradation of benzo[a]pyrene to catechol

Fig. 6.

Fig. 7.

Table 3.

Discussion

Methods

Data and parameters required by rePrime and novoStoic

Developing a database of reaction rules using rePrime

Pathway design by using the novoStoic optimization framework

Code availability

Data availability

Electronic supplementary material

Acknowledgements

Author contributions

Competing interests

Footnotes

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases