Nature Communications. 2024 Feb 21;15:1602. doi: 10.1038/s41467-024-45886-9

Iterative design of training data to control intricate enzymatic reaction networks

Bob van Sluijs 1, Tao Zhou 1, Britta Helwig 1, Mathieu G Baltussen 1, Frank H T Nelissen 1, Hans A Heus 1, Wilhelm T S Huck 1
PMCID: PMC10881569  PMID: 38383500

Abstract

Kinetic modeling of in vitro enzymatic reaction networks is vital to understand and control the complex behaviors emerging from the nonlinear interactions inside. However, modeling is severely hampered by the lack of training data. Here, we introduce a methodology that combines an active learning-like approach and flow chemistry to efficiently create optimized datasets for a highly interconnected enzymatic reaction network with multiple sub-pathways. The optimal experimental design (OED) algorithm designs a sequence of out-of-equilibrium perturbations to maximize the information about the reaction kinetics, yielding a descriptive model that allows control of the output of the network towards any cost function. We experimentally validate the model by forcing the network to produce different product ratios while maintaining a minimum level of overall conversion efficiency. Our workflow scales with the complexity of the system and enables the optimization of previously unobtainable network outputs.

Subject terms: Chemistry, Process chemistry, Computational models, Computational chemistry, Dynamic networks


Kinetic modeling of in vitro enzymatic reaction networks (ERNs) is severely hampered by the lack of training data. Here, the authors introduce a methodology that combines an active learning-like approach and flow chemistry to create optimized datasets for an intricate ERN.

Introduction

Living cells rely on enzymatic reaction networks (ERNs) to produce energy and building blocks to support cellular processes. Evolution has shaped these ERNs into interconnected sub-pathways to generate multiple outputs from multiple inputs, driving product formation across complex kinetic landscapes. Recently, significant progress has been made in reconstituting ERNs in vitro with the aim of building a cell from the bottom up1–4, or of producing value-added chemicals from sustainable substrates as an advanced biotechnology5–8. However, most of these networks do not capture one of the essential features of biological ERNs, where several interconnected sub-pathways function simultaneously to generate multiple outputs. Controlling such networks remains challenging due to the lack of sufficiently informative experimental datasets that can be utilized to train kinetic models which trace the dynamic properties of large ERNs and enable on-demand design9,10.

Typically, the optimization of ERNs towards specific outcomes, like increasing the overall efficiency, is achievable by searching a large combinatorial space of inputs and measuring the product formation of the ERN. Experimentally, this is prohibitively time-, labor-, and cost-intensive11. Recently, Pandi et al. have shown that such a screening process could be significantly improved by an AI-based active learning protocol12. Additionally, promising advances have been published recently that utilize machine learning to derive individual reaction mechanisms from large datasets9,10,13–15. Yet these black box approaches are limited in their ability to guide the design of large ERNs: they are often very adept at mapping a specific region of the input-output space, but not the entire space (the entire kinetic landscape). Kinetic models based on ordinary differential equations can track all the intermediates through time by explicitly formulated reaction rates and hence are especially powerful in guiding the optimization of complex ERNs16. In the context of larger networks, parameterizing these models is challenging. Not every interaction can be observed, which complicates the identification of individual rates. Training data often rely on steady-state batch experiments where a single combination of control inputs is tested. These experiments tend to be kinetically non-informative and are not sufficient to approximate the kinetic landscape of complex ERNs. To address this, time-course datasets which track the responses of ERNs to controlled perturbations are needed. This is demonstrated by Shen et al. and Hold et al., in batch and flow respectively, who characterized networks by adding the enzymes sequentially and measuring the change in product formation17–19. However, as the complexity and scale of an ERN increase (substrate competition, allosteric interactions, feedback loops, futile cycles, etc.), intuitively choosing a set of perturbations that yields relevant information about the kinetic landscape becomes increasingly difficult.

Here, we present a generalizable method that trains a kinetic model iteratively, by adding new and more informative experiments to a training dataset in each optimization cycle (akin to active learning). It incorporates an optimal experimental design (OED) algorithm that evolves a sequence of out-of-equilibrium perturbations to be maximally informative. We subsequently test the utility of the model by using the experimental outcomes of these perturbation experiments as test data for the previous iteration of the model. Using this approach, we demonstrate that a limited number of design iterations is enough to obtain data of sufficient quality to map the kinetic landscape of the ERN and obtain a measure of control over it as a multi-input multi-output (MIMO) system in vitro.

Results

Overview of the nucleotide salvage pathway

The in vitro ERN constructed in this work derives from the nucleotide salvage pathway (Fig. 1a), which regenerates nucleotides for cellular processes by recovering bases and nucleosides from the degradation of RNA and DNA. The network starts with phosphoribosyl pyrophosphate (PRPP), which can be converted from glucose via the pentose phosphate pathway and is coupled by the enzymes UPRT and APRT to the nucleobases uracil and adenine, respectively, to form the monophosphate nucleotides UMP and AMP. For solubility reasons we did not include guanine as a nucleobase and started from GMP. UMP, GMP and AMP are subsequently converted to their corresponding diphosphate nucleotides (NDPs) by the enzymes UMPK, GMPK and AK, respectively, using ATP as cofactor. Finally, NDPs are converted to NTPs by a single enzyme, PK. In total, this system consists of six enzymes catalyzing eight reversible reactions, where PK is shared between three substrates, with resource competition for ATP, PEP and PRPP throughout the network. Previous works demonstrated that all these enzymes could function in one pot to synthesize labeled nucleotides with an excess amount of the key compound20–22. Yet the overall performance is poor and controlling multiple state outputs remains a challenge, which requires the guidance of a kinetic model with sufficient resolution.

Fig. 1. Overview of the nucleotide salvage pathway and kinetic model parameter sensitivities to observed species.

a Reaction scheme of the in vitro reconstructed part of the nucleotide salvage pathway. The network consists of 6 enzymes and 15 substrates/products, resulting in a set of ODEs containing over 40 kinetic parameters. b Positive and negative correlations among the forward sensitivities of the kinetic parameters with respect to the measured output.

Kinetic model of the nucleotide salvage pathway

Translating the reactions of the ERN into a coarse-grained model of ordinary differential equations (ODEs) resulted in an ODE system of 15 equations with over 40 kinetic rates (for a full description of the model and coarse-graining process see Supplementary information 2). Generally, choosing the right model can be challenging (Fig. S5-11): large enzymatic reaction networks require more parameterization, which can cause the model to overfit the training data, reducing its predictive power. This parameter problem is present in all models, but with ODEs it can be viewed from the perspective of a parameter’s forward sensitivity to the observed species (Fig. 1b)23. These sensitivities map onto the contribution a parameter has to the observed rates of change over time (Supplementary information 1.6, eq. 6). When these sensitivities correlate with one another, the observations can be approximated by the model by modifying both rates simultaneously. A positive correlation between the forward sensitivities of kinetic rates implies a similar effect on the rate of change of the observed species, so the model can fit the data by increasing the value of one rate whilst decreasing the value of its partner. A negative correlation implies an opposing effect on the rate of change, so to fit the data both kinetic rates need to either increase or decrease.
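The link between correlated sensitivities and compensating parameters can be illustrated on a much smaller system than the one studied here. The following is a minimal sketch, assuming a single irreversible Michaelis-Menten reaction with hypothetical values for `vmax` and `km`; the finite-difference sensitivities and the fixed-step RK4 integrator are illustrative choices and are not the sensitivity equations used by the authors' pipeline.

```python
import numpy as np

def simulate_product(vmax, km, s0=1.0, t_end=6.0, n_steps=200):
    """Integrate dS/dt = -vmax*S/(km+S) with fixed-step RK4; return P(t) = s0 - S(t)."""
    dt = t_end / n_steps
    rate = lambda s: -vmax * s / (km + s)
    s = s0
    product = np.zeros(n_steps + 1)
    for i in range(n_steps):
        k1 = rate(s)
        k2 = rate(s + 0.5 * dt * k1)
        k3 = rate(s + 0.5 * dt * k2)
        k4 = rate(s + dt * k3)
        s += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        product[i + 1] = s0 - s
    return product

def forward_sensitivities(params, rel_step=1e-4):
    """Central finite-difference estimate of dP/dlog(theta), one column per parameter."""
    params = np.asarray(params, dtype=float)
    columns = []
    for j in range(params.size):
        up, down = params.copy(), params.copy()
        up[j] *= 1.0 + rel_step
        down[j] *= 1.0 - rel_step
        columns.append((simulate_product(*up) - simulate_product(*down)) / (2.0 * rel_step))
    return np.column_stack(columns)

# Hypothetical parameter values (vmax in mM/min, km in mM), not taken from the paper
sens = forward_sensitivities([0.5, 0.2])
corr = np.corrcoef(sens[1:].T)  # skip t = 0, where both sensitivities are zero
print("correlation between vmax and km sensitivities:", corr[0, 1])
```

In this toy case the two sensitivities are negatively correlated: raising `vmax` speeds up product formation while raising `km` slows it down, so a fit can trade one against the other by increasing or decreasing both together.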

This unidentifiability means many combinations of kinetic rates can approximate the data (not just the ‘true’ rates), which in turn leads to prediction errors as the experimental conditions change from those used to generate the initial training data23–25. Thus, experimental data can be deemed uninformative if they do not allow us to discern which reactions contribute most to the flux of a species at a specific time, which results in prediction errors as conditions change. Generally, it is easier to completely identify rates in simplified models, but their quantitative predictive power will be limited as mechanistic assumptions are readily broken (Supplementary information 2, Fig. S10). Conversely, detailed mechanistic models are more descriptive, but it is harder to identify their kinetic rates.

However, from a broadly practical perspective, precisely identifying individual rates is not needed to control the behavior of an ERN; a model just needs to approximate the kinetic landscape adequately, and the remaining uncertainty needs to be manageable. To address this efficiently, we adapted an active learning approach commonly applied in machine learning with the singular goal of controlling ERNs. We utilized optimal experimental design (OED) to design experiments that maximize information about the ERN in the data, subsequently trained a kinetic model, and tested its predictive power. This cycle was repeated until the uncertainty around the predictions was reduced and they matched the experimental outcome.
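The cycle described above can be written as a short control loop. This is only a structural sketch: the four callables (`design`, `run_experiment`, `fit`, `prediction_error`), the stopping thresholds, and the toy usage are hypothetical placeholders, not functions from the authors' software.

```python
def run_oed_cycle(design, run_experiment, fit, prediction_error,
                  model, max_iterations=3, target_error=0.1):
    """Iterate design -> experiment -> test previous model -> refit.

    All four callables are placeholders supplied by the caller.
    """
    training_data, errors = [], []
    for _ in range(max_iterations):
        inputs = design(model)                        # design informative perturbations
        data = run_experiment(inputs)                 # run them in the reactor
        errors.append(prediction_error(model, data))  # previous model predicts the new data
        training_data.append(data)                    # extend the training dataset
        model = fit(training_data)                    # retrain on all data so far
        good_enough = errors[-1] <= target_error
        stalled = len(errors) > 1 and errors[-1] >= errors[-2]
        if good_enough or stalled:
            break
    return model, errors


# Toy usage with stand-in callables: the "model" is a single number
# and every "experiment" returns the same true value.
model, errors = run_oed_cycle(
    design=lambda m: None,
    run_experiment=lambda inputs: 2.0,
    fit=lambda data: sum(data) / len(data),
    prediction_error=lambda m, d: abs(m - d),
    model=0.0,
)
print(model, errors)
```

The key design point is that each new experiment is used twice: first as test data for the model trained on earlier rounds, then as additional training data.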

OED and pulsing substrates into the flow reactor

We highlight this experimental workflow in Fig. 2a. First, all enzymes were individually immobilized on microfluidically produced hydrogel beads with a diameter of 50 μm26. The activity of each enzyme after immobilization was measured separately. Next, enzyme-loaded beads were loaded into a microfluidic continuous stirred-tank reactor (CSTR). The CSTR chamber itself has a volume of 100 μl, and the flow setup has six inlets, one for each of the input substrates uracil, GMP, adenine, ATP, PEP, and PRPP, and a single outlet (Fig. S15). Samples were collected from the outlet by a fraction collector at different intervals depending on the total flow rate and analyzed offline by ion-pair HPLC27. The analysis of the chromatographic peaks provides a compositional pattern of eight input substrates, intermediates, and products (uracil, UMP, GMP, adenine, ADP, GTP, UTP, and ATP), each changing with every input combination.
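To give a feel for how the total flow rate sets the time scale of such a reactor, the sketch below computes the residence time of an ideal, well-mixed CSTR and the outlet response of a non-reacting species to a step change at the inlet. Only the 100 μl volume comes from the text; the flow rates and the ideal-mixing assumption are illustrative.

```python
import numpy as np

REACTOR_VOLUME_UL = 100.0  # CSTR chamber volume stated in the text

def residence_time_min(total_flow_ul_per_min):
    """Mean residence time tau = V / Q of an ideal CSTR."""
    return REACTOR_VOLUME_UL / total_flow_ul_per_min

def tracer_step_response(c_in, total_flow_ul_per_min, t_min, c0=0.0):
    """Outlet concentration of a non-reacting species after a step in inlet concentration.

    Analytical solution of dc/dt = (c_in - c) / tau for a well-mixed reactor.
    """
    tau = residence_time_min(total_flow_ul_per_min)
    return c_in + (c0 - c_in) * np.exp(-np.asarray(t_min, dtype=float) / tau)

# Hypothetical total flow rates in ul/min (not taken from the paper)
for flow in (2.0, 10.0):
    print(flow, residence_time_min(flow), tracer_step_response(1.0, flow, 30.0))
```

At the lower flow rate the reactor takes much longer to follow a change at the inlet, which also gives the enzymes more time to convert substrate before it is washed out.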

Fig. 2. Overview of the experimental flow set-up and the iterative design of training data to train a kinetic model.

a Schematic of the experimental workflow. Enzymes are immobilized on gel beads and placed in a CSTR with 6 inlets containing different substrates. The output is measured offline on an ion-pair HPLC; 8 species (N = 1, indicated by the arrows, from left to right: uracil, UMP, GMP, adenine, ADP, GTP, UTP, and ATP) can be observed over time. b Computational workflow to design an information-dense dataset and train a kinetic model. In step one the OED algorithm evolves control inputs (i.e. inflow rates of the 6 inlets) to be maximally informative. In step two this data is added to a training dataset, which is subsequently used to fit a model in step three, resulting in a range of possible parameter values for each parameter (color). In step four we use the previous iteration of the model to predict the outcome of the latest experiment, utilizing this round as test data.

The optimal experimental design workflow is shown in Fig. 2b. In step one, a swarm/evolutionary algorithm evolves an input flow profile for each of the six inputs at three different flow rates28,29. This algorithm scores input patterns by maximizing the D-Fisher information criterion (Supplementary information 1.6, eq. 7)30. This criterion is obtained by computing the determinant of the Fisher information matrix, which is derived from the parameter sensitivities (Supplementary information 1.2, eq. 6). This metric maps onto the volume of the parameter space where the ODE model can fit the experimental data30,31. This means the algorithm is driven to find a combination of input sequences that breaks the correlation between parameter sensitivities (if only temporarily). The transition between different total flow rates results in different output compositions and serves as another control parameter that increases the information content about substrate conversion fluxes in the data. At high flow rates, input molecules and monophosphates are detected (as only a fraction of substrate has been converted); at low flow rates, increased NTP formation is observed (Supplementary information 2.3, Fig. S13 & S14). In step two, these data are added to a training dataset, and in step three the model is trained on this dataset. In step four, the predictive power of the model is assessed by using the previous iteration of the model to predict the current experiment (test data). If the predictive power is sufficient or no longer improves, the cycle is terminated; otherwise the cycle continues, and the latest iteration of the model and database is used to design a new experiment in step one32.
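A D-type criterion of this kind can be sketched in a few lines. The example below assumes a sensitivity matrix with one row per observation and one column per parameter, independent measurement noise with a common standard deviation `sigma`, and the log-determinant of the Fisher information matrix as the score. The two sensitivity matrices are synthetic and only illustrate why decorrelated sensitivities score higher; this is not the authors' scoring code.

```python
import numpy as np

def d_criterion(sensitivities, sigma=1.0):
    """Log-determinant of the Fisher information matrix F = S^T S / sigma^2."""
    fim = sensitivities.T @ sensitivities / sigma**2
    sign, logdet = np.linalg.slogdet(fim)
    return logdet if sign > 0 else -np.inf

t = np.linspace(0.0, 1.0, 50)

# Synthetic design A: the two parameters affect the output almost identically
correlated = np.column_stack([t, 0.9 * t + 0.01 * t**2])

# Synthetic design B: the two parameters affect the output at different times
decorrelated = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])

print("correlated design:  ", d_criterion(correlated))
print("decorrelated design:", d_criterion(decorrelated))
```

An optimizer that maximizes this score over candidate input sequences is therefore pushed towards experiments in which the parameters leave distinguishable fingerprints on the data.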

Iterative design of training data to build a kinetic model

A total of three iterations of the optimization cycle were performed (excluding a calibration), each time exchanging the microfluidic chip and altering the enzyme concentrations (Supplementary information 3.2 Fig. S17-S20). The lower and upper boundaries of the concentration ranges for the substrates were based on the enzyme activity assays and substrate solubility (Supplementary information 4.4 Fig. S24-S37). The initial experiment (not part of the cycle) was manually designed and ‘calibrates’ the model (Supplementary information 3.2 Fig. S17). This allows for the subsequent OED of an informative input sequence, since more knowledge about the system equates to better OED outcomes26. To illustrate the non-intuitive character of the evolved input sequence, we show the substrate inputs of the final experiment of the optimization cycle (Fig. 3a) and the complexity of the time-course data including model convergence (Fig. 3b).

Fig. 3. Example of optimally designed flow profile and measured output (3rd iteration).

a Input flow rates for each of the 6 input substrates evolved by the OED algorithm. b Data as measured on HPLC (black triangles) and the fit of the model to the data (solid lines). Source data are provided in Source Data Fig3.

We subsequently place these data in the context of the optimization cycle (Fig. 4). Figure 4a shows parameter distributions of the model trained in the first iteration (top) and the parameter distribution of the model trained in the third iteration (bottom). We note a significant decline in the distribution width of most kinetic parameters (Fig. S14). To demonstrate the improved predictive power of the model, Fig. 4b compares the predicted outcomes (shaded area) of the model trained after iteration one and iteration two of the OED cycle (predicting the experiment performed in the third iteration shown in Fig. 3). The second iteration of the model already shows a drastic reduction in the variance around the prediction and highlights that the model can approximate the behavior of the ERN quantitatively.

Fig. 4. Application of the iterative design of training data and its impact on identifiability and the predictive power of the model.

a Distribution of fits of the parameters including either only the first (number of datapoints N = 211) or the 3rd iteration (number of datapoints N = 166). The box shows the quartiles, where the middle box represents 50% of the values, with a line showing the median value. The whiskers show the highest and lowest values. A parameter set is included if its fit score deviates no more than 15% from the best fit. The y-axis denotes the parameter value; the catalysis rates are in mM/min, the Km values in mM. We note that after new rounds are added the widths of the parameter distributions decrease. b Prediction of the last experiment (black triangles) in the dataset using the model trained on the dataset obtained after the first (shaded blue) and second (shaded orange) iteration of the cycle. We simulated the model using the best parameter sets (N = 20); the shaded area reflects the standard deviation around the mean prediction. Source data are provided in Source Data.
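The acceptance rule for parameter sets in panel a can be expressed as a simple filter. This sketch assumes that fit scores are positive and that a lower score means a better fit; both are assumptions, as is the example score list.

```python
import numpy as np

def select_parameter_sets(scores, tolerance=0.15):
    """Indices of parameter sets whose score lies within `tolerance` of the best score.

    Assumes positive scores where lower is better.
    """
    scores = np.asarray(scores, dtype=float)
    best = scores.min()
    return np.flatnonzero(scores <= best * (1.0 + tolerance))

# Hypothetical fit scores for five parameter sets
print(select_parameter_sets([10.0, 11.0, 11.4, 12.0, 20.0]))
```

The retained sets form the ensemble whose spread is plotted as a box per parameter.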

Trained model controls nucleotide salvage pathway in flow

This presents us with new opportunities for the third iteration of the model, beyond traditional optimization schemes that often focus on maximizing the yield of a single product. Here, we demonstrate how we can use the final iteration of the model to control a MIMO system to achieve a range of more complex output states29,33. We opted to tune the ATP/GTP/UTP output ratios whilst maintaining a minimal conversion efficiency—defined as the percentage of nucleobases converted to triphosphates—of 60%.

The outcome of this sampling process is shown in Fig. 5a. We randomly generated 10^5 substrate input combinations, and each input combination was simulated twenty times using different combinations of estimated kinetic rates. Every dot represents a different condition; the color indicates the ratio between ATP/UTP/GTP. The 20 sets of estimated kinetic rates, when simulated, predict different ATP, UTP, and GTP concentrations. This is reflected by the y-axis, which shows the standard deviation around the predicted mean concentrations for these simulations. It captures the certainty of the model and the likelihood that there will be a prediction error for a given set of input conditions. The x-axis shows the conversion efficiency. We selected seven experimental conditions representing seven ATP/UTP/GTP ratios in Fig. 5a, including one repeated ratio (experiments 1 & 7) and one experiment with a lower conversion efficiency (experiment 3). This experiment serves two purposes: first, to demonstrate that the model can control a MIMO system and access a part of the output space that requires an accurate map of the kinetics and finely tuned control inputs (which is achieved by optimizing the ERN for different triphosphate blends with a high conversion efficiency). Second, to identify the operable space of the model, for which we test a range of total input substrate concentrations along with compositional blends of final products.
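The screening procedure (sample inputs, simulate each with an ensemble of parameter sets, then filter on efficiency and pick a target ratio) can be sketched as follows. The `toy_model` is a deliberately trivial stand-in for the ODE model, and the input ranges, the ensemble of conversion fractions, and the helper names are all hypothetical.

```python
import numpy as np

def toy_model(inputs, conversion):
    """Stand-in for the ODE model: triphosphate output = precursor input x conversion fraction."""
    return inputs * conversion

def screen_conditions(n_conditions=1000, n_param_sets=20, min_efficiency=0.60, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical input concentrations (mM) of the ATP, UTP and GTP precursors
    inputs = rng.uniform(0.05, 1.0, size=(n_conditions, 3))
    # Hypothetical ensemble of parameter sets, reduced here to conversion fractions
    ensemble = rng.uniform(0.4, 0.9, size=(n_param_sets, 3))
    predictions = np.stack([toy_model(inputs, p) for p in ensemble])  # (sets, conditions, 3)
    mean = predictions.mean(axis=0)
    spread = predictions.std(axis=0).mean(axis=1)       # ensemble disagreement per condition
    efficiency = mean.sum(axis=1) / inputs.sum(axis=1)  # fraction converted to triphosphates
    ratios = mean / mean.sum(axis=1, keepdims=True)     # predicted ATP/UTP/GTP blend
    keep = efficiency >= min_efficiency
    return inputs, ratios, spread, efficiency, keep

def closest_condition(ratios, keep, target_ratio):
    """Index of the retained condition whose predicted blend is nearest the target, or None."""
    candidates = np.flatnonzero(keep)
    if candidates.size == 0:
        return None
    target = np.asarray(target_ratio, dtype=float)
    target = target / target.sum()
    distance = np.linalg.norm(ratios[candidates] - target, axis=1)
    return int(candidates[np.argmin(distance)])

inputs, ratios, spread, efficiency, keep = screen_conditions(n_conditions=200)
print(closest_condition(ratios, keep, target_ratio=[2, 1, 1]))
```

With a real kinetic model in place of `toy_model`, the `spread` array corresponds to the y-axis of Fig. 5a and `efficiency` to its x-axis.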

Fig. 5. Controlling the nucleotide salvage pathway as a MIMO system and testing the model by predicting product ratios.

a The range of possible ratios given different substrate inflow rates. We opted to screen a large space of experimental conditions and select 7 ratios (number in circle) to test along a range of summed substrate input concentrations (numbered spots). Each color is a simulated ratio; the standard deviation is the simulated deviation around the predicted mean (y-axis). The conversion efficiency is the predicted fraction of nucleobases that is converted to a triphosphate. To calculate the efficiency of the adenine conversion we first subtract the ATP input concentration from the measured ATP output. b The experiments, labeled 1-7, with both the simulated concentrations including confidence interval (N = 20; the box shows the quartiles, where the middle box represents 50% of the values, with a line showing the median value, and the whiskers show the highest and lowest values) and the HPLC measurement. The ratios between ATP (blue), UTP (gray), and GTP (green) are shown on top. c The prediction error, defined as the percentage the simulated mean deviates from the HPLC data, on the y-axis (averaging the error for the three triphosphates) and the total concentration of the input substrates on the x-axis. Source data are provided in Source Data.

Figure 5b shows the predicted confidence interval of the final yield and the yield as measured on the HPLC. For experiments 1-5, uncertainties and total output concentrations vary but predictions still match. For very low input concentrations of uracil, GMP, adenine, and ATP in experiments 6 and 7, the prediction error increases even though the simulated standard deviation is low. This relation between the prediction error, quantified as the percentage the simulated mean deviates from the HPLC measurement, and the summed input concentration of the nucleobases is shown in Fig. 5c. It highlights that the model can predict exact concentrations as long as the total concentration of substrate inputs is larger than 0.3 mM. The cause can likely be attributed to a decrease in the signal-to-noise ratio of the HPLC measurement, leading to larger variations in the experimental data (see Supplementary information table S1-2). To test this, we used different models: more complex models which contained different rate laws (Fig. S5-6) as well as combinations of allosteric interactions reported in the literature (Fig. S7). However, none of these models performed better, and prediction errors increased. This suggests that these interactions do not play a significant role in this network, at least not significant enough to overcome a potential overfit of the training data. In contrast, reducing the complexity of the model increased the prediction error significantly. We were able to confirm that reactions catalyzed by the PK, UMPK, GMPK, and AK enzymes need to be reversible, whereas UPRT and APRT can be considered unidirectional (Fig. S8-11). In summary, this means that a total input concentration of 0.3 mM marks the practical boundary of the model trained on this data, knowledge we can leverage to efficiently probe conditions in the identified operable space.
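The prediction error used here can be computed as below. The sketch assumes the deviation is taken as an absolute value relative to the HPLC measurement and then averaged over the three triphosphates; the concentrations in the example are made up for illustration.

```python
import numpy as np

def prediction_error_percent(simulated_mean, measured):
    """Mean absolute deviation of the simulated mean from the measurement, in percent."""
    simulated_mean = np.asarray(simulated_mean, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(simulated_mean - measured) / measured) * 100.0)

# Hypothetical ATP, UTP, GTP concentrations in mM (not from the paper)
print(prediction_error_percent([0.11, 0.18, 0.30], [0.10, 0.20, 0.30]))
```

Because the error is relative to the measured value, a fixed amount of HPLC noise weighs more heavily at low concentrations, which is consistent with the larger errors observed below 0.3 mM total input.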

Discussion

We have presented a methodology to design informative training data and map the kinetic landscape of an ERN as efficiently as possible. By designing sufficiently complex experiments we were able to restrict the combinations of potential kinetic rates such that they map onto real product formation fluxes across a large input-output space. This space could subsequently be sampled for any cost function. To highlight this versatility, we opted to create different compositional blends of triphosphate compounds which require not one but multiple finely tuned input conditions. Finally, we identify the operable space wherein the model is useful and demonstrate that other mechanistic descriptions of the systems reduce the predictive power of the model. This underscores that the active learning aspect of the OED pipeline is able to balance the degree to which we parameterize the model, its mechanistic assumptions, and its predictive power within three iterations.

The number of OED iterations required to achieve this depends on both the complexity of the network and the quality of the experimental data. If the system is highly non-linear, more certainty about the rates will be needed, as smaller deviations from the true value will result in larger prediction errors. In contrast, very linear and orthogonal networks will likely require significantly fewer optimization cycles (and a simpler model) to enable a form of MIMO control. Overall, this means the pipeline can be utilized in different contexts as long as there is a kinetic model with control inputs (in Fig. S1 we probed the applicability of this software to larger systems, specifically by comparing the CPU time needed for the model presented here and the E. coli core metabolism model). Process optimization for organic synthesis using design of experiments in flow has been reported, mostly aiming to determine the optimal operational conditions for one reaction34–36. So far, experimental design schemes have not been applied to multiple organic reaction networks (or placed in the context of an active learning cycle). However, there is no reason why they cannot be applied to train a kinetic model which provides more understanding and a high level of control over chemical reaction networks10,13.

In future work, more complex cost functions can be defined, including the identification of key reaction mechanisms and interactions by making the coarse-graining process of the model an explicit part of the active learning process. In this instance, the algorithm, besides mapping the kinetic landscape, seeks to find input combinations which either validate or invalidate mechanistic assumptions embedded in different models. Currently, we are able to discriminate between different rate laws (broadly classifying them as descriptive or not) and the inclusion of reaction reversibility, whereas potential allosteric interactions did not seem to be present in a manner that affected predicted outcomes. However, the differences were not explicitly maximized by the algorithm, thus the observed difference in predictive power was minimal in most cases37. Nevertheless, our results are promising and are complementary to other work that has shown that black box models can identify the reaction mechanism of a single reaction from bulk data14. Such approaches have not been reported in the context of a biochemical network, nor have they been embedded in an active learning-like approach, which offers promise for the future. Overall, we believe our pipeline is beneficial to all who seek to build complex biochemical pathways with controlled inputs.

Methods

Materials

Enzymes adenylate kinase (AK) and pyruvate kinase (PK) and all chemicals were purchased from Sigma and used directly without further processing. Enzymes adenine phosphoribosyl transferase (APRT) and uracil phosphoribosyl transferase (UPRT) were expressed and purified as described by Arthur et al.38. Genes for guanosine monophosphate kinase (GMPK) and uridine monophosphate kinase (UMPK) were PCR amplified from E. coli K12 using gene-specific primers, cloned into pET15b, expressed overnight at 30 °C (GMPK) and 18 °C (UMPK) in E. coli BL21(DE3), and purified according to protocols modified from Oeschger et al.39 (GMPK) and Serina et al.40 (UMPK) to accommodate Ni2+-sepharose purification. Purified enzymes were dialyzed against 20 mM potassium phosphate buffer (pH 7.2) prior to immobilization. All the enzymes were immobilized on microfluidically produced hydrogel beads, as reported26. After immobilization, all the enzyme-beads were freeze-dried and stored at -20 °C. 1 mg of beads for each enzyme was suspended in 31 μl IVTT buffer (pH 7.3, 9 mM magnesium acetate, 5 mM potassium phosphate, 95 mM potassium glutamate, 5 mM ammonium chloride, 0.5 mM calcium chloride, 1 mM spermidine, 8 mM putrescine, 1 mM dithiothreitol, 10 mM creatine phosphate). All reactions were conducted in this so-called IVTT buffer at room temperature.

Flow experiments setup

Cetoni neMESYS syringe pumps with Hamilton syringes were used to control the inputs, and the flow profile was programmed using the Cetoni neMESYS software26,41. Before performing the designed flow profile, the whole system was equilibrated with buffer for two hours. The outflow of the CSTR was collected using a fraction collector, collecting for either 30 or 15 minutes or three droplets per fraction. The ion-pair HPLC analysis was adapted from ref. 26 and performed on a Shimadzu Nexera X3 HPLC system with an Inertsil ODS-4 column (3 μm, 150 × 4.6 mm; GL Science) and a guard column (3 μm; 10 × 4.6 mm) at 40 °C. The elution gradient was as follows: 100% buffer A (100 mM potassium phosphate buffer (pH 6.4) with 8 mM ion-pair reagent tetrabutylammonium bisulfate, filtered before use) for 13 min; 0–77% linear gradient of buffer B (70% buffer A with 30% acetonitrile) for 22 min; 77–100% buffer B for 1 min; and 100% buffer B for 14 min. The flow rate was maintained at 1 ml/min. Peaks were identified by comparison with standard samples. The concentration was obtained from the integrated peak areas with the calibration curve of each standard.

Software and modeling

An overview of the software that performs the optimizations can be found in Supplementary Information 1. A generated text-based model object25 is translated to an SBML and AMICI object modified from ref. 28 and ref. 42 (Supplementary Information Fig. S1-S4). AMICI is a package that compiles ODE models to C++ and is continuously updated43–46. Several publicly available tools integrate with AMICI45–49. This is needed for the expanding repertoire of ever larger kinetic models (most in vivo)50–54. To quantify the computational cost (and the general applicability of the pipeline to larger systems) we tested the speed of the pipeline on an in vivo E. coli core metabolism model (ref. 53) and placed it in the context of our in vitro reactor set-up (see Fig. S1). This test was run on a single core of an Intel Xeon E5-1660 v4 @ 3.2 GHz. For more information on the efficiency of AMICI itself (where the bulk of the calculations are performed), we refer the reader to refs. 43,44,55–57, or to its by now numerous applications58–64.

Statistics & reproducibility

No statistical method was used to predetermine the sample size. No data were excluded from the analyses. For the experimental data shown in Fig. 5, the experimental conditions predicting specific ratios were selected randomly after sampling 10^5 possible ratios of ATP/UTP/GTP, provided these ratios conformed to the required conversion efficiency (60%) and the chosen set of conditions differed sufficiently in the summed total inflow concentration of all substrates to cover the largest possible space and test the model.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Acknowledgements

This project is funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ERC Adv. Grant Life-Inspired, grant agreement No. 833466 and ERC PoC Grant OptiPlex, grant agreement No. 101069237). T. Z. acknowledges the Swiss National Science Foundation for financial support (P500PB_203166).

Author contributions

B.v.S., T.Z. and W.T.S.H. conceived the study. B.v.S. and T.Z. designed and performed experiments, respectively. B.v.S., T.Z. and W.T.S.H. analyzed the data and discussed the results. B.H. carried out foundational work, and B.v.S. and M.G.B. built software to auto-generate strings for kinetic models of ERNs. F.N. and H.H. purified the four commercially unavailable enzymes and provided the related plasmids. All authors discussed the results, provided comments, and revised the manuscript.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Data availability

All relevant data supporting the key findings of this study are available within the article and its Supplementary Information files. Source data are provided with this paper as a single source data file, including the time-dependent inputs, HPLC quantifications, and parameter estimates, archived at 10.5281/zenodo.10411170.

Code availability

The package is written in Python 3.8 (Python Software Foundation, Delaware, US). The code is available on the Huck group GitHub at http://github.com/huckgroup/OED and is archived on Zenodo (see ref. 65), 10.5281/zenodo.10411170 (2023). For more information, contact bob.vansluijs@gmail.com.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Tao Zhou, Email: tao.zhou@ru.nl.

Wilhelm T. S. Huck, Email: w.huck@science.ru.nl

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-024-45886-9.

References

  • 1. Berhanu S, Ueda T, Kuruma Y. Artificial photosynthetic cell producing energy for protein synthesis. Nat. Commun. 2019;10:1325. doi: 10.1038/s41467-019-09147-4.
  • 2. Bhattacharya A, Brea RJ, Niederholtmeyer H, Devaraj NK. A minimal biochemical route towards de novo formation of synthetic phospholipid membranes. Nat. Commun. 2019;10:300. doi: 10.1038/s41467-018-08174-x.
  • 3. Lee KY, et al. Photosynthetic artificial organelles sustain and control ATP-dependent reactions in a protocellular system. Nat. Biotechnol. 2018;36:530–535. doi: 10.1038/nbt.4140.
  • 4. Pols T, et al. A synthetic metabolic network for physicochemical homeostasis. Nat. Commun. 2019;10:4239. doi: 10.1038/s41467-019-12287-2.
  • 5. Burgener S, Luo S, McLean R, Miller TE, Erb TJ. A roadmap towards integrated catalytic systems of the future. Nat. Catal. 2020;3:186–192. doi: 10.1038/s41929-020-0429-x.
  • 6. Valliere MA, Korman TP, Arbing MA, Bowie JU. A bio-inspired cell-free system for cannabinoid production from inexpensive inputs. Nat. Chem. Biol. 2020;16:1427–1433. doi: 10.1038/s41589-020-0631-9.
  • 7. Rasor BJ, et al. Toward sustainable, cell-free biomanufacturing. Curr. Opin. Biotechnol. 2021;69:136–144. doi: 10.1016/j.copbio.2020.12.012.
  • 8. Miller TE, et al. Light-powered CO2 fixation in a chloroplast mimic with natural and synthetic parts. Science. 2020;368:649–654. doi: 10.1126/science.aaz6802.
  • 9. Yu T, et al. Machine learning-enabled retrobiosynthesis of molecules. Nat. Catal. 2023;6:137–151. doi: 10.1038/s41929-022-00909-w.
  • 10. Margraf JT, Jung H, Scheurer C, Reuter K. Exploring catalytic reaction networks with machine learning. Nat. Catal. 2023;6:112–121. doi: 10.1038/s41929-022-00896-y.
  • 11. Morgado G, Gerngross D, Roberts TM, Panke S. Synthetic biology for cell-free biosynthesis: fundamentals of designing novel in vitro multi-enzyme reaction networks. Adv. Biochem. Eng. Biotechnol. 2018;162:117–146. doi: 10.1007/10_2016_13.
  • 12. Pandi A, et al. A versatile active learning workflow for optimization of genetic and metabolic networks. Nat. Commun. 2022;13:3876. doi: 10.1038/s41467-022-31245-z.
  • 13. Wen M, et al. Chemical reaction networks and opportunities for machine learning. Nat. Comput. Sci. 2023;3:12–24. doi: 10.1038/s43588-022-00369-z.
  • 14. Bures J, Larrosa I. Organic reaction mechanism classification using machine learning. Nature. 2023;613:689–695. doi: 10.1038/s41586-022-05639-4.
  • 15. Faulon JL, Faure L. In silico, in vitro, and in vivo machine learning in synthetic biology and metabolic engineering. Curr. Opin. Chem. Biol. 2021;65:85–92. doi: 10.1016/j.cbpa.2021.06.002.
  • 16. Martin JP, et al. A dynamic kinetic model captures cell-free metabolism for improved butanol production. Metab. Eng. 2023;76:133–145. doi: 10.1016/j.ymben.2023.01.009.
  • 17. Shen L, et al. A combined experimental and modelling approach for the Weimberg pathway optimisation. Nat. Commun. 2020;11:1098. doi: 10.1038/s41467-020-14830-y.
  • 18. Bujara M, Schumperli M, Pellaux R, Heinemann M, Panke S. Optimization of a blueprint for in vitro glycolysis by metabolic real-time analysis. Nat. Chem. Biol. 2011;7:271–277. doi: 10.1038/nchembio.541.
  • 19. Hold C, Billerbeck S, Panke S. Forward design of a complex enzyme cascade reaction. Nat. Commun. 2016;7:12971. doi: 10.1038/ncomms12971.
  • 20. Parkin DW, Leung HB, Schramm VL. Synthesis of nucleotides with specific radiolabels in ribose. Primary 14C and secondary 3H kinetic isotope effects on acid-catalyzed glycosidic bond hydrolysis of AMP, dAMP, and inosine. J. Biol. Chem. 1984;259:9411–9417. doi: 10.1016/S0021-9258(17)42716-5.
  • 21. Tolbert TJ, Williamson JR. Preparation of specifically deuterated and 13C-labeled RNA for NMR studies using enzymatic synthesis. J. Am. Chem. Soc. 1997;119:12100–12108. doi: 10.1021/ja9725054.
  • 22. Nelissen FHT, Girard FC, Tessari M, Heus HA, Wijmenga SS. Preparation of selective and segmentally labeled single-stranded DNA for NMR by self-primed PCR and asymmetrical endonuclease double digestion. Nucleic Acids Res. 2009;37:e114. doi: 10.1093/nar/gkp540.
  • 23. Gábor A, Villaverde AF, Banga JR. Parameter identifiability analysis and visualization in large-scale kinetic models of biosystems. BMC Syst. Biol. 2017;11:54. doi: 10.1186/s12918-017-0428-y.
  • 24. Kreutz C, Raue A, Kaschek D, Timmer J. Profile likelihood in systems biology. FEBS J. 2013;280:2564–2571. doi: 10.1111/febs.12276.
  • 25. Raue A, et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009;25:1923–1929. doi: 10.1093/bioinformatics/btp358.
  • 26. Baltussen MG, van de Wiel J, Fernandez Regueiro CL, Jakstaite M, Huck WTS. A Bayesian approach to extracting kinetic information from artificial enzymatic networks. Anal. Chem. 2022;94:7311–7318. doi: 10.1021/acs.analchem.2c00659.
  • 27. Nakajima K, et al. Simultaneous determination of nucleotide sugars with ion-pair reversed-phase HPLC. Glycobiology. 2010;20:865–871. doi: 10.1093/glycob/cwq044.
  • 28. van Sluijs B, Maas RJM, van der Linden AJ, de Greef TFA, Huck WTS. A microfluidic optimal experimental design platform for forward design of cell-free genetic networks. Nat. Commun. 2022;13:3626. doi: 10.1038/s41467-022-31306-3.
  • 29. Smith RW, van Sluijs B, Fleck C. Designing synthetic networks in silico: a generalised evolutionary algorithm approach. BMC Syst. Biol. 2017;11:118. doi: 10.1186/s12918-017-0499-9.
  • 30. Sinkoe A, Hahn J. Optimal experimental design for parameter estimation of an IL-6 signaling model. Processes. 2017;5:49. doi: 10.3390/pr5030049.
  • 31. de Aguiar PF, Bourguignon B, Khots MS, Massart DL, Phan-Than-Luu R. D-optimal designs. Chemometrics Intell. Lab. Syst. 1995;30:199–210. doi: 10.1016/0169-7439(94)00076-X.
  • 32. Ruess J, Parise F, Milias-Argeitis A, Khammash M, Lygeros J. Iterative experiment design guides the characterization of a light-inducible gene expression circuit. Proc. Natl Acad. Sci. USA. 2015;112:8148–8153. doi: 10.1073/pnas.1423947112.
  • 33. Otero-Muras I, Carbonell P. Automated engineering of synthetic metabolic pathways for efficient biomanufacturing. Metab. Eng. 2021;63:61–80. doi: 10.1016/j.ymben.2020.11.012.
  • 34. Taylor CJ, et al. A brief introduction to chemical reaction optimization. Chem. Rev. 2023;123:3089–3126. doi: 10.1021/acs.chemrev.2c00798.
  • 35. Taylor CJ, et al. Flow chemistry for process optimisation using design of experiments. J. Flow. Chem. 2021;11:75–86. doi: 10.1007/s41981-020-00135-0.
  • 36. Wyvratt BM, McMullen JP, Grosser ST. Multidimensional dynamic experiments for data-rich process development of reactions in flow. React. Chem. Eng. 2019;4:1637–1645. doi: 10.1039/C9RE00078J.
  • 37. Egert J, Kreutz C. Realistic simulation of time-course measurements in systems biology. bioRxiv 2023.01.05.522854 (2023).
  • 38. Arthur PK, Alvarado LJ, Dayie TK. Expression, purification and analysis of the activity of enzymes from the pentose phosphate pathway. Protein Expr. Purif. 2011;76:229–237. doi: 10.1016/j.pep.2010.11.008.
  • 39. Oeschger MP, Bessman MJ. Purification and properties of guanylate kinase from Escherichia coli. J. Biol. Chem. 1966;241:5452–5460. doi: 10.1016/S0021-9258(18)96451-3.
  • 40. Serina L, et al. Escherichia coli UMP kinase, a member of the aspartokinase family, is a hexamer regulated by guanine nucleotides and UTP. Biochemistry. 1995;34:5066–5074. doi: 10.1021/bi00015a018.
  • 41. Helwig B, van Sluijs B, Pogodaev AA, Postma SGJ, Huck WTS. Bottom-up construction of an adaptive enzymatic reaction network. Angew. Chem. Int. Ed. Engl. 2018;57:14065–14069. doi: 10.1002/anie.201806944.
  • 42. Choi K, et al. Tellurium: An extensible python-based modeling environment for systems and synthetic biology. Biosystems. 2018;171:74–79. doi: 10.1016/j.biosystems.2018.07.006.
  • 43. Fröhlich F, et al. AMICI: high-performance sensitivity analysis for large ordinary differential equation models. Bioinformatics. 2021;37:3676–3677. doi: 10.1093/bioinformatics/btab227.
  • 44. Lakrisenko P, et al. Efficient computation of adjoint sensitivities at steady-state in ODE models of biochemical reaction networks. PLoS Comput. Biol. 2023;19:e1010783. doi: 10.1371/journal.pcbi.1010783.
  • 45. Schälte Y, et al. pyPESTO: A modular and scalable tool for parameter estimation for dynamic models. arXiv preprint arXiv:2305.01821 (2023).
  • 46. Schmiester L, Weindl D, Hasenauer J. Efficient gradient-based parameter estimation for dynamic models using qualitative data. Bioinformatics. 2021;37:4493–4500. doi: 10.1093/bioinformatics/btab512.
  • 47. Schmiester L, Weindl D, Hasenauer J. Parameterization of mechanistic models from qualitative data using an efficient optimal scaling approach. J. Math. Biol. 2020;81:603–623. doi: 10.1007/s00285-020-01522-w.
  • 48. Schmiester L, et al. PEtab—Interoperable specification of parameter estimation problems in systems biology. PLoS Comput. Biol. 2021;17:e1008646. doi: 10.1371/journal.pcbi.1008646.
  • 49. van Rosmalen RP, Smith R, Dos Santos VM, Fleck C, Suarez-Diez M. Model reduction of genome-scale metabolic models as a basis for targeted kinetic models. Metab. Eng. 2021;64:74–84. doi: 10.1016/j.ymben.2021.01.008.
  • 50. Dash S, et al. Development of a core Clostridium thermocellum kinetic metabolic model consistent with multiple genetic perturbations. Biotechnol. Biofuels. 2017;10:1–16. doi: 10.1186/s13068-017-0792-2.
  • 51. Foster CJ, Gopalakrishnan S, Antoniewicz MR, Maranas CD. From Escherichia coli mutant 13C labeling data to a core kinetic model: a kinetic model parameterization pipeline. PLoS Comput. Biol. 2019;15:e1007319. doi: 10.1371/journal.pcbi.1007319.
  • 52. Gopalakrishnan S, Dash S, Maranas C. K-FIT: An accelerated kinetic parameterization algorithm using steady-state fluxomic data. Metab. Eng. 2020;61:197–205. doi: 10.1016/j.ymben.2020.03.001.
  • 53. Khodayari A, Zomorrodi AR, Liao JC, Maranas CD. A kinetic model of Escherichia coli core metabolism satisfying multiple sets of mutant flux data. Metab. Eng. 2014;25:50–62. doi: 10.1016/j.ymben.2014.05.014.
  • 54. Foster CJ, Wang L, Dinh HV, Suthers PF, Maranas CD. Building kinetic models for metabolic engineering. Curr. Opin. Biotechnol. 2021;67:35–41. doi: 10.1016/j.copbio.2020.11.010.
  • 55. Städter P, Schälte Y, Schmiester L, Hasenauer J, Stapor PL. Benchmarking of numerical integration methods for ODE models of biological systems. Sci. Rep. 2021;11:2696. doi: 10.1038/s41598-021-82196-2.
  • 56. Shaikh B, et al. BioSimulators: a central registry of simulation engines and services for recommending specific tools. Nucleic Acids Res. 2022;50:W108–W114. doi: 10.1093/nar/gkac331.
  • 57. Fröhlich F, Theis FJ, Rädler JO, Hasenauer J. Parameter estimation for dynamical systems with discrete events and logical operations. Bioinformatics. 2017;33:1049–1056. doi: 10.1093/bioinformatics/btw764.
  • 58. Fröhlich F. In Computational Modeling of Signaling Networks 59–86 (Springer, 2022).
  • 59. Lao-Martil D, et al. Kinetic modeling of Saccharomyces cerevisiae central carbon metabolism: achievements, limitations, and opportunities. Metabolites. 2022;12:74. doi: 10.3390/metabo12010074.
  • 60. Fröhlich F, Gerosa L, Muhlich J, Sorger PK. Mechanistic model of MAPK signaling reveals how allostery and rewiring contribute to drug resistance. Mol. Syst. Biol. 2023;19:e10988. doi: 10.15252/msb.202210988.
  • 61. Smith RW, van Rosmalen RP, Martins dos Santos VA, Fleck C. DMPy: a Python package for automated mathematical model construction of large-scale metabolic systems. BMC Syst. Biol. 2018;12:1–16. doi: 10.1186/s12918-018-0584-8.
  • 62. Massonis G, Villaverde AF, Banga JR. Improving dynamic predictions with ensembles of observable models. Bioinformatics. 2023;39:btac755. doi: 10.1093/bioinformatics/btac755.
  • 63. Mishra S, Wang Z, Volk MJ, Zhao H. Design and application of a kinetic model of lipid metabolism in Saccharomyces cerevisiae. Metab. Eng. 2023;75:12–18. doi: 10.1016/j.ymben.2022.11.003.
  • 64. Contento L, Stapor P, Weindl D, Hasenauer J. In International Conference on Computational Methods in Systems Biology 36–43 (Springer, 2023).
  • 65. van Sluijs B. Iterative design of training data to control intricate enzymatic networks. Zenodo. 2023. doi: 10.5281/zenodo.10411170.


Articles from Nature Communications are provided here courtesy of Nature Publishing Group
