NPJ Systems Biology and Applications. 2024 Aug 22;10:94. doi: 10.1038/s41540-024-00412-x

Assessing structural uncertainty of biochemical regulatory networks in metabolic pathways under varying data quality

Yue Han, Mark P Styczynski
PMCID: PMC11341918  PMID: 39174554

Abstract

Ordinary differential equation (ODE) models are powerful tools for studying the dynamics of metabolic pathways. However, key challenges lie in constructing ODE models for metabolic pathways, specifically in our limited knowledge about which metabolite levels control which reaction rates. Identification of these regulatory networks is further complicated by the limited availability of relevant data. Here, we assess the conditions under which it is feasible to accurately identify regulatory networks in metabolic pathways by computationally fitting candidate network models with biochemical systems theory (BST) kinetics to data of varying quality. We use network motifs commonly found in metabolic pathways as a simplified testbed. Key features correlated with the level of difficulty in identifying the correct regulatory network were identified, highlighting the impact of sampling rate, data noise, and data incompleteness on structural uncertainty. We found that for a simple branched network motif with an equal number of metabolites and fluxes, identification of the correct regulatory network can be largely achieved and is robust to missing one of the metabolite profiles. However, with a bi-substrate bi-product reaction or more fluxes than metabolites in the network motif, the identification becomes more challenging. Stronger regulatory interactions and higher metabolite concentrations were found to be correlated with less structural uncertainty. These results could aid efforts to predict whether the true metabolic regulatory network can be computationally identified for a given stoichiometric network topology and dataset quality, thus helping to identify optimal measures to mitigate such identifiability issues in kinetic model development.

Subject terms: Computer modelling, Differential equations, Dynamical systems, Regulatory networks, Biochemical networks

Introduction

As the biomanufacturing industry has rapidly grown, there has been an increasing demand for mechanistic modeling of metabolism at the cellular scale for strain engineering1. These models empower data-driven investigation of biological systems and thus inform experimental design and accelerate the engineering cycle2. While steady-state models such as flux balance analysis have been used to predict the performance of engineered cells3, these analyses fail to capture the dynamic nature of cellular metabolism4. The dynamics of metabolic pathways are often studied using ordinary differential equation (ODE) models5 that facilitate the identification of kinetic bottleneck reactions and of ways to improve productivity6.

A challenge in constructing ODE models for metabolic pathways is uncertainty about the networks of direct (allosteric) interactions between metabolites and enzymes that often control reaction rates7. While large swaths of the chemical reaction network are highly conserved across organisms, metabolite-level regulatory interactions can vary significantly across organisms or even across experimental conditions for the same organism. As a result, despite significant advances in the identification of protein-metabolite interactions in vitro8,9 and the development of computational approaches to identify allosteric sites10,11, large-scale organism-specific and condition-specific experimental identification of these interactions remains elusive12. If these regulatory interactions are not known and thus not incorporated into models, the predictability of those models can be significantly compromised as observed flux changes cannot be explained13–16.

Computational kinetic modeling approaches have helped to address this challenge. Link et al.17 fitted models with putative allosteric regulation to experimental data and identified the key interactions that govern the switch between gluconeogenesis and glycolysis. Hackett et al. developed an approach to systematically add regulatory interactions to a model and assess whether there is an improved fit, which led to the discovery of novel regulatory interactions18. Many previously unknown metabolite-level regulatory interactions in various organisms were uncovered using related approaches19,20. However, these approaches often require a large amount of experimental data or several iterations of experimentation and modeling, which is not always feasible. Moreover, correct identification of regulatory interactions can be severely hampered by a lack of high-quality data: noisy and low-frequency metabolite measurements can allow ODE models that assume incorrect regulatory networks to fit the data as well as the model with the correct regulatory network (i.e., to overfit), leaving uncertainty as to the true structure.

Two diverse strategies are commonly used to develop computational kinetic models in the face of this challenge. In one strategy, unknown regulatory interactions are identified with model discrimination. For example, ModelMaGe21 and TopoFilter22 automatically generate and manage alternative model structures given heuristics for model reduction to make the computations tractable, while Guillen-Gosalbez et al.23 identified the model structure and performed parameter estimation simultaneously to handle the trade-off between model complexity and fitting accuracy. A second strategy assumes that identifying correct regulatory interactions is not possible and uses ensembles of models to deal with structural uncertainty. For example, ensemble models for signaling network structures have been developed using logic-based models to account for structural uncertainty24. The choice between these two strategies depends on the availability and quality of experimental data, and significant computational or experimental efforts can potentially be saved with the choice of the correct strategy upfront. Further, whether the correct strategy is chosen will affect the downstream model interpretation and utility. If it is possible to identify the correct regulatory network confidently but an ensemble modeling strategy was taken, mechanistic understanding of the biological system would be lost, and thus, the model would underperform its potential. On the other hand, if an ensemble modeling strategy should have been used when one chooses to identify a single regulatory network structure, model predictability could be compromised. It is also worth noting that the approaches described above were designed for gene regulatory networks and signaling networks, not metabolic pathways where partial network structure can be assumed to be known and the strength of regulatory interactions plays a larger role in model accuracy.

The structural uncertainty issue is under-characterized, mainly due to the nested nature of structural uncertainty and parameter identifiability25. Parameter identifiability is quite well-studied, with approaches developed to quantify and mitigate the issue26–34. For structural parameter identifiability, which is the potential for multiple sets of parameters to reproduce the same characteristic behavior independent of any data, approaches like reparameterization can mitigate the issue28,30. For practical parameter identifiability, which is where a single set of parameters cannot be distinguished from the rest due to limitations of data quality, the uncertainty of parameters can be quantified by confidence intervals or posterior probability densities33,35, and ensemble modeling approaches can be used to account for uncertainties in parameter values36–38. However, parameter identifiability analyses can only be performed given a single well-defined model structure, as minor structural changes can significantly alter dependencies39. This means that structural uncertainty both confounds and is confounded by parameter identifiability issues for regulatory networks in metabolic pathways.

This stresses the significance of elucidating whether, given limited experimental data availability and quality, it is feasible to identify the correct model structure from data. Are some regulatory interactions inherently more difficult to identify than others? If so, what are the characteristics that make them hard to identify? With regards to data, what characteristics of the data impact the difficulty of identifying the metabolite-level regulatory network? And under what combination of data quality and structure complexity can we confidently identify one correct regulatory network?

Here, we begin to explore these questions via the analysis of four common topological motifs in metabolic pathways. We assessed the difficulty of identifying the structure of metabolic regulatory networks by fitting alternative network models to data of varying qualities. A variety of factors were found to jointly impact the identification of true regulatory networks, including the noise and sampling frequency of the data, missing metabolite profiles, topological location of metabolites with missing profiles, and the strength and relative position of the underlying regulatory network topology. These insights could be used to aid future efforts to mitigate, or at least predict, limitations on regulatory structure identification in metabolic models.

Results

To assess the impact of data quality and network topologies on regulatory network structural uncertainty in metabolic pathways, we studied common stoichiometric and regulatory network motifs under different conditions of synthetic, noise-added data generated with biochemical systems theory (BST) kinetics. We used four reaction network topologies (Fig. 1) with three to six regulatory network topologies (Supplementary Fig. 1), each with 20 parameterizations that yield diverse kinetic profiles. We generated three noisy replicates of data for each topology pair’s parameterizations, with four different sampling rates and three different noise levels, as well as the removal of data for specific metabolites (reflecting analytical instrument limitations) from some datasets. Each of these datasets was then used to fit a comprehensive set of regulatory network models (subject to the constraints described in Methods). Parameter estimation was performed on the regulatory kinetic parameters only, assuming that kinetic parameters for stoichiometric interactions were known. The candidate network models were then ranked based on Bayesian Information Criterion (BIC) scores, and the rank of the true regulatory network model was recorded as a proxy for structural uncertainty. This yielded a total of 41,040 cases (with each case defined as the ranking for one noisy replicate of a given stoichiometric and regulatory topology with a given parameterization, sampling rate, noise level, and metabolite removal status), including 12,960 cases for the determined, underdetermined, and multi-substrate motifs each and 2160 cases for the cycle motif. The impact of metabolites with missing data was only examined in the more computationally tractable determined, underdetermined, and multi-substrate motifs. The distribution of true model ranks was then analyzed to assess the impact of noise levels, metabolite missingness, and underlying network topologies. 
A common enzyme kinetics rate law, Michaelis–Menten, was also used to assess structural uncertainty for a subset of cases, and the uncertainty was significantly worse (Supplementary Fig. 2).
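The case counts quoted above can be reproduced from the stated design factors. The short sketch below assumes that each of the three larger motifs contributes six two-interaction regulatory topologies and three missingness states (complete data, or one of the two withheld metabolites), and that the cycle motif contributes three regulatory topologies with complete data only; this factorization is inferred from the totals rather than stated explicitly in the text.

```python
from math import prod

# Design factors stated in the text.
PARAMETERIZATIONS = 20
REPLICATES = 3
SAMPLING_RATES = 4
NOISE_LEVELS = 3


def n_cases(regulatory_topologies: int, missingness_states: int) -> int:
    """Number of ranking cases for one stoichiometric motif."""
    return prod([regulatory_topologies, PARAMETERIZATIONS, REPLICATES,
                 SAMPLING_RATES, NOISE_LEVELS, missingness_states])


# Assumption: six regulatory topologies and three missingness states
# (complete, or one of two withheld metabolites) for the larger motifs.
per_motif = n_cases(regulatory_topologies=6, missingness_states=3)
# Assumption: three regulatory topologies, complete data only, for the cycle.
cycle = n_cases(regulatory_topologies=3, missingness_states=1)
total = 3 * per_motif + cycle
print(per_motif, cycle, total)
```

Under these assumptions the per-motif, cycle, and overall counts agree with the 12,960, 2160, and 41,040 cases reported.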

Fig. 1. Network motifs studied.

a Determined motif with stoichiometric merge; b underdetermined motif due to multiple stoichiometric branch points; c multi-substrate; d cycle. Metabolites are represented by blue circles containing metabolite labels xi in white, while reactions are represented by black arrows with adjacent vi labels in black.

True network rank is correlated with the noise and sampling rate of the data

Across four sampling rates ranging from 100 to 1000 time points per metabolite and three noise levels with a coefficient of variation (CoV) ranging from 0.05 to 0.25, we found a direct relationship between the quality of the data and the ability to identify the true regulatory network. The true network rank distribution for each of the 12 {sampling rate, noise level} pairs for the determined motif is shown in Fig. 2a. The frequency with which the top-ranked network is the true network generally decreases with increasing noise and decreasing sampling rate, which is to be expected. The rank distributions for the underdetermined, multi-substrate, and cycle motifs shown in Supplementary Fig. 3 follow the same trend. Lower sampling rates of 10 to 50 time points per metabolite were studied for the determined motif in Supplementary Figs. 4 and 5, showing the continuation of the same trends. With fewer data points and more noise in the experimental data, incorrect regulatory network models can better overfit the data, yielding fits similar to or better than that of the true regulatory network model. We used a Spearman correlation to quantitatively characterize the relationship between the rank and sampling rate or noise. For the determined motif, we found a Spearman coefficient of −0.207 between the true network rank and sampling rate, and a Spearman coefficient of 0.248 between the true network rank and CoV (Fig. 2b). It is worth noting that other factors, such as the 20 different parameterizations, contribute substantial variability and thus help explain the relatively low Spearman correlation coefficients; an example of the variability among different parameterizations is shown in Supplementary Fig. 6. While there is variability across individual parameterizations, the general trends in the aggregate average behavior are consistent across stoichiometric topologies (Fig. 2b).

Fig. 2. Impact of noise and sampling rate.

a Distribution of the rank of the true regulatory network model among all possible regulatory network models in the determined motif across all underlying true regulatory network structures, parameterizations, and replicates for 12 different noise conditions. Slices of the pie charts represent the percentage of cases where the true regulatory network is ranked in the corresponding category. The percentage of high ranks decreases as the sampling rate decreases or the noise increases. b Spearman correlation coefficients across all cases for the relationship between the true regulatory network model rank and sampling frequency or CoV in the determined, underdetermined, multi-substrate, and cycle motifs.

The impact of missing metabolite profiles varies with network topology and the position of the missing metabolite

Completely missing measurement profiles for individual metabolites are a common occurrence in metabolomics data due to challenges including metabolite annotation/identification and limits of detection40. To assess the impact of this phenomenon, two metabolites were arbitrarily selected from each of the determined, underdetermined, and multi-substrate motifs to be withheld from training datasets for all parameterizations, data noise levels, and sampling frequencies. While the unavailability of metabolite profiles for metabolites #2 and #4 in the determined motif did not change the true network rank distributions (two-sample Kolmogorov–Smirnov test, α = 0.05) and thus the structural uncertainty, removal of data for metabolites #2 and #3 in the underdetermined motif changed the rank distributions (two-sample Kolmogorov–Smirnov test, α = 0.05), indicating worsened structural uncertainty for that motif (Fig. 3). In the multi-substrate motif, removal of metabolite #2, surprisingly, reduces the structural uncertainty. This could be due to the high concentrations of metabolite #2 and the associated large signal-dependent noise. The large noise may outweigh the fitting error of the other metabolites and cause overfitting to noise. Removal of metabolite #4 did not change the true network rank distribution.
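As an illustration of the comparison described above, the following is a minimal sketch of a two-sample Kolmogorov–Smirnov test applied to two sets of true-network ranks. The rank values are hypothetical, the function names are ours, and the large-sample critical value is used as an approximation; the text does not state which implementation was used, and for heavily tied rank data this approximation tends to be conservative.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample KS statistic D = max |F_a(x) - F_b(x)| over observed values."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in set(sa) | set(sb):
        fa = bisect_right(sa, x) / len(sa)  # empirical CDF of a at x
        fb = bisect_right(sb, x) / len(sb)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value for D at significance level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


# Hypothetical ranks of the true model with complete vs. incomplete data.
ranks_complete = [1, 1, 1, 2, 1, 3, 1, 1, 2, 1]
ranks_missing = [4, 7, 2, 9, 5, 1, 12, 6, 3, 8]

d = ks_two_sample(ranks_complete, ranks_missing)
threshold = ks_critical_value(len(ranks_complete), len(ranks_missing))
print(f"D = {d:.3f}, critical value = {threshold:.3f}, differ: {d > threshold}")
```

A statistic above the critical value is read as evidence that the two rank distributions differ, which in this context would indicate that withholding the metabolite changed the structural uncertainty.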

Fig. 3. Impact of missing data.

Distribution of the rank of the true regulatory network model given different missing metabolites in the a determined motif, b underdetermined motif, and c multi-substrate motif. Slices of the pie charts represent the percentage of cases where the true regulatory network is ranked in the corresponding category across all underlying true regulatory networks, parameterizations, replicates, noise levels, and sampling frequencies. A two-sample Kolmogorov–Smirnov test was used to compare the rank distribution between datasets with and without a missing metabolite profile (α = 0.05); in the determined motif the differences were not significant, whereas in the underdetermined and multi-substrate motifs they were significant.

We assessed the impact of all other possible missing metabolite profiles at a subset of noise conditions (nT = 100 and CoV = 0.25; nT = 100 and CoV = 0.05; Supplementary Figs. 7 and 8). We used only a subset of noise conditions to make the problem more computationally tractable, because in both stoichiometric topologies tested above, any single noise condition was found to be reasonably representative of the trend observed in Fig. 3 (Supplementary Figs. 9 and 10). We found that at low noise (CoV = 0.05), structural uncertainty in the determined motif is robust to the removal of any single metabolite profile, while in the underdetermined motif, it is robust only to missing metabolites #1 and #4, one at a branch point and one in a linear pathway. On the other hand, at higher noise (CoV = 0.25), the structural uncertainty in the determined motif is robust to the removal of most metabolite profiles except for #1, whereas the removal of any metabolite profile has a detrimental impact on the structural uncertainty in the underdetermined motif (Supplementary Figs. 7 and 8). These results indicate that determined systems may be more robust to missing metabolite profiles than underdetermined systems and that the position of the missing metabolite in the underdetermined systems has varying impacts on uncertainty.

Some types of regulatory networks are easier to identify than others

To assess the extent to which structural uncertainty can be attributed to the topology and other characteristics of the true regulatory network, we analyzed the true network rank distribution across each of the six regulatory network topologies in the determined, underdetermined, and multi-substrate motifs. In the determined and underdetermined motifs, the six regulatory network topologies can be split into two categories: one with regulatory interactions spanning across branches (Crosstalk), and the other with feedback regulatory interactions only (Feedback). In the multi-substrate motif, the six regulatory network topologies can be split into one with feedback only (Feedback) and one with mixed feedback and feedforward regulatory interactions (Mixed). For the determined motif, two regulatory network topologies yielded different true network rank distributions from the rest: one with crosstalk and one with feedback only (Supplementary Fig. 11). Both involve regulatory control exerted by metabolite #5, which is a merge point for two network branches, while other regulatory network topologies do not involve metabolite #5. For the underdetermined motif, the regulatory network topologies with crosstalk were found to have different true network rank distributions than those with feedback only (two-sample Kolmogorov–Smirnov test, α = 0.05) (Fig. 4a). For the multi-substrate motif, identification of correct regulatory interactions for the regulatory network topologies involving feedforward regulatory interactions was found to be more difficult than for those involving feedback regulatory interactions only (Fig. 4b). To further explore potential factors at play in identification, regulatory networks with a single interaction were studied (Supplementary Figs. 12–14), revealing that the challenge of identifying correct regulatory interactions seems to be related to individual regulatory interactions. 
In the underdetermined motif, regulatory interactions involving a metabolite on a linear pathway are easier to identify than others, while regulatory interactions involving fluxes downstream of the second branch point are harder to identify. In the multi-substrate motif, regulatory networks involving a feedforward interaction or a multi-substrate reaction are harder to identify. Taken together, these results suggest that the topological position of the regulator metabolite and the complexity or topological proximity of the regulator to the regulated reaction can have a substantial impact on the difficulty of identifying the true regulatory network.

Fig. 4. Impact of regulatory interaction characteristics.

Distribution of the rank of the true regulatory network model for different regulatory network topologies in the a underdetermined and b multi-substrate motifs. Slices of the pie charts represent the percentage of cases where the true regulatory network is ranked in the corresponding category across all parameterizations, noise levels, sampling frequencies, and replicates. a Rank distributions for the three crosstalk regulatory network topologies are significantly different from those for the three feedback ones. b Rank distributions for the three mixed regulatory network topologies are significantly different from those for the three feedback-only ones. A two-sample Kolmogorov–Smirnov test was used to compare rank distributions (α = 0.05).

Furthermore, the strength of individual regulatory interactions seems to often play a role in structural uncertainty. All regulatory parameters whose magnitudes had a non-zero correlation (α = 0.05) with the rank of the true regulatory network model had negative correlation coefficients (Supplementary Fig. 15), suggesting that a larger regulatory kinetic parameter—which indicates a stronger regulatory interaction, whether positive or negative—is correlated with less structural uncertainty.

Characteristic features of time-course data are correlated with the rank

We next sought to identify whether there might be any characteristics of the measured time-course metabolite datasets that are correlated with structural uncertainty. We calculated 30 time-course data-derived features (Supplementary Table 1) for each metabolite profile and then found their respective correlations with the true regulatory network rank across all regulatory topologies, parameterizations, noise conditions, and sampling rates for each network motif using a Spearman correlation coefficient. Since we found many time-course-derived features to be highly correlated with the sampling rate (Supplementary Fig. 16), the correlation analysis was performed for each sampling rate separately. In total, 52, 34, 14, and 14 features were found to have significant Spearman correlation coefficients (α = 0.05) for at least one sampling rate in the determined, underdetermined, multi-substrate, and cycle motifs, respectively (Supplementary Data 1). The five features that are most significantly correlated with true regulatory network model rank for each stoichiometric network topology are shown in Fig. 5.

Fig. 5. Impact of time-course data.

Spearman correlation coefficients for the top five time-course-derived features in each stoichiometric network topology at each sampling rate. Correlation coefficients for each feature are calculated across all parameterizations, noise levels, replicates, and regulatory network structures.

A more detailed analysis of these correlated time-course-derived features revealed a few interesting trends (Fig. 5). First, in the determined motif, the top five significantly correlated time-course-derived features all have positive correlation coefficients with rank and thus are positively correlated with structural uncertainty. These time-course-derived features are mostly related to the derivatives or the slope of the dynamic profile. This indicates that larger changes in metabolite concentrations accompanied by lower sampling rates are associated with greater structural uncertainty. In the underdetermined, multi-substrate, and cycle motifs, features in Fig. 5 are mostly related to the absolute value of one particular metabolite, and larger minimum and mean concentrations are associated with better structural identification, suggesting that larger metabolite concentrations are often associated with reduced uncertainty. In addition, the correlation coefficients at higher sampling rates are generally (though not always) smaller in magnitude than those at lower sampling rates, suggesting that the change in the numerical values of these features as the sampling rate decreases plays a key role in the increased challenge of identifying the correct regulatory network topology.

Discussion

We have found that specific data quality and network topology features are linked to the challenges of identifying the correct regulatory network topology in a metabolic network from metabolite profiling data. High noise and low sampling rates in the data pose heightened obstacles in discerning the network structure, as would be expected. Additionally, the impact of metabolites with missing profiles is more detrimental to chemical reaction networks with more fluxes than metabolites but can vary based on the topological position of the metabolite with respect to both the regulatory and stoichiometric topologies. In terms of the characteristics of regulatory networks and of the time-course data, weaker regulatory interactions, lower metabolite concentrations, and faster concentration changes accompanied by lower sampling frequency were associated with increased difficulty in identifying correct regulatory network structures. In addition, the topological proximity of the regulator to the regulated reaction greatly impacts the uncertainty in the underdetermined network motif, and feedforward regulatory interactions are harder to identify than feedback interactions in the multi-substrate motif. Additional studies are warranted for multi-substrate reaction motifs that also include mass conservation (e.g., cofactor recycling), which may face different challenges in structural uncertainty. Though this study was limited to four common network motifs using BST kinetics, characteristics of the data and topology were found to collectively influence the identification of regulatory network models in metabolic systems. These observations help us understand the unexploited boundary of topology complexity and data quality at which the true regulatory network can be inferred and motivate structural uncertainty analysis on more network motifs. 
For example, identifying the correct regulatory interaction in a determined network motif is feasible in a high noise condition and when metabolite profiles are missing, while the identification in a slightly more complicated multi-substrate network motif is challenging even at a low noise condition. Taken together, the insights and trends identified in this work support the importance of broader efforts to identify the right strategy to address structural uncertainty challenges effectively, which would lead to better models of cellular metabolism to support biomanufacturing efforts.

Methods

Test models

Four common stoichiometric topological motifs present in metabolic pathways (Fig. 1) were chosen for study as an initial testbed for exploration. These motifs were selected as they can be readily identified as parts of common biochemical networks, and thus represent a kind of building block whose study would be valuable. For example, the determined motif can be found in the glycolysis and steroid hormone biosynthesis pathways, the underdetermined motif can be found in highly branched pathways such as pyruvate metabolism, and the cycle motif is inspired by pathways such as the TCA (Krebs) cycle and the urea cycle. For the determined motif, 12 different regulatory networks were used for the study: three with two regulatory interactions spanning across branches (Crosstalk), three with only two feedback regulatory interactions (Feedback), and six with a single feedback or crosstalk regulatory interaction. For the underdetermined motif, 13 regulatory networks were studied, including three two-interaction networks involving crosstalk, three two-interaction networks involving only feedback, and seven networks with a single feedback or crosstalk regulatory interaction. Seven networks with a single feedback or feedforward interaction, three two-interaction networks involving feedback only, and three two-interaction networks involving a mix of feedback and feedforward were studied for the multi-substrate motif. Three different regulatory networks were used for the cycle motif. The regulatory networks for each stoichiometric topology are included in Supplementary Fig. 1. For each combination of topology and regulatory network, 20 different parameterizations were tested.

Generating synthetic data

Synthetic data were used in this study to decouple the impacts of experimental noise from inherent structural uncertainty issues. To generate these data, ODE models assuming power-law kinetics were used. Example equations for these models are provided in Supplementary Fig. 17. Kinetic BST parameters for stoichiometric reactions were sampled from a uniform distribution [0,2], while those for regulatory interactions were sampled from [−2,2]. For regulatory networks with a single interaction, all kinetic parameters were modified from parameterizations in two-interaction regulatory networks. Time-course metabolite data were generated by solving the ODE system in MATLAB using ode15s. To model experimental measurements, the original noiseless data were modified to be at various levels of data quality. Down-sampling of the metabolite data was accomplished by evenly sampling nT time points over the time interval. Measurement noise was added by replacing concentrations with a random value drawn from the normal distribution $N_{i,k} \sim \mathcal{N}\left(y_i(t_k),\ \mathrm{CoV}\cdot y_i(t_k)\right)$, where $y_i(t_k)$ is the value of metabolite i at time point k, and CoV is the coefficient of variation. The addition of signal-independent noise does not significantly impact uncertainty (Supplementary Fig. 18). To model the issue of metabolites that cannot be measured or annotated via metabolomics approaches, the time course for some metabolites was removed from some datasets.
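The data-generation steps above can be sketched on a toy example. The code below is not the authors' MATLAB workflow: it uses a hypothetical two-metabolite linear pathway with power-law (BST) fluxes, a fixed-step Runge–Kutta (RK4) integrator in place of ode15s, and illustrative parameter values chosen within the stated sampling ranges. It also assumes that the second argument of the normal distribution is a standard deviation equal to CoV times the noiseless value.

```python
import random


def rates(x, r):
    """Power-law (BST) fluxes for a toy pathway: -> x1 -> x2 ->."""
    x1, x2 = x
    v0 = 1.0                      # constant influx (assumed)
    v1 = 1.0 * x1**0.5 * x2**r    # x2 regulates v1 with kinetic order r
    v2 = 1.0 * x2**0.5
    return v0, v1, v2


def deriv(x, r):
    v0, v1, v2 = rates(x, r)
    return [v0 - v1, v1 - v2]


def simulate(x0, r, t_end=20.0, dt=0.01):
    """Fixed-step RK4 integration; returns the state at every step."""
    n_steps = int(round(t_end / dt))
    x, traj = list(x0), [list(x0)]
    for _ in range(n_steps):
        k1 = deriv(x, r)
        k2 = deriv([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], r)
        k3 = deriv([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], r)
        k4 = deriv([xi + dt * ki for xi, ki in zip(x, k3)], r)
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        traj.append(x)
    return traj


def downsample(traj, n_t):
    """Keep n_t (>= 2) evenly spaced time points, including both endpoints."""
    last = len(traj) - 1
    return [traj[round(k * last / (n_t - 1))] for k in range(n_t)]


def add_noise(samples, cov, rng):
    """Replace each value y with a draw from Normal(mean=y, sd=cov*y)."""
    return [[rng.gauss(y, cov * y) for y in row] for row in samples]


rng = random.Random(0)  # fixed seed so one "replicate" is reproducible
clean = downsample(simulate([2.0, 0.5], r=-0.5), n_t=100)
noisy = add_noise(clean, cov=0.05, rng=rng)
```

Calling `add_noise` again with the same `clean` profile and a different seed would give another noisy replicate, and dropping one column from `noisy` would mimic a metabolite with a missing profile.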

Assessing structure uncertainty

Structural uncertainty was assessed by defining regulatory networks for a given stoichiometric topology, calculating metabolite profiles for each regulatory network, and then using the resulting simulated metabolite profiles to fit different candidate regulatory network models. Due to the combinatorial nature of the regulatory networks, to keep the computational time manageable we assumed that one metabolite cannot regulate more than one flux and one flux cannot be regulated by more than one metabolite. In all, 227 putative regulatory networks were generated for the determined and multi-substrate motif, 823 for the underdetermined motif, and 4079 for the cycle motif. Parameter estimation was then performed on the regulatory kinetic parameters only, assuming that kinetic parameters for stoichiometric interactions were known. Eight initial points were sampled from the range [−10,10] for each candidate regulatory network, and the fmincon function in MATLAB (which uses an interior-point method) was used to find parameter values locally, minimizing the sum of square residuals with respect to the generated noisy synthetic data. The mean and standard deviation of the objective value is then calculated, and if the standard error is larger than 0.1, 16 more optimization runs with initial points are run to increase the likelihood of finding a minimum closer to the global minimum. We acknowledge that global optima may not be guaranteed in these optimization runs, which is a challenge for all regulatory interaction models. The candidate regulatory network models were then ranked by their BIC scores. BIC was chosen to be the metric among the seven metrics tested as it consistently outperformed other metrics in discerning the true network structure model in initial testing (Supplementary Fig. 19). 
Since there is no established or widely used metric for the degree of uncertainty in regulatory network structures, uncertainty was assessed using the rank of the correct regulatory network as a proxy: the correct network ranking first was interpreted as minimal structural uncertainty, while numerically larger ranks (i.e., the correct network placed further down the list) indicated more structural uncertainty, since overfitted incorrect models could outperform the correct model.
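The ranking step can be illustrated with a short Python sketch. The exact BIC expression used in the study is not given here, so the common least-squares form under independent Gaussian errors, BIC = n ln(SSR/n) + k ln(n), is assumed; the candidate names and residual values below are made up for illustration.

```python
import math

def bic(ssr, n_points, n_params):
    """Least-squares BIC (assumed form): n * ln(SSR / n) + k * ln(n)."""
    return n_points * math.log(ssr / n_points) + n_params * math.log(n_points)

def rank_of_true(candidates, true_name, n_points):
    """Sort candidates by BIC (lowest first) and return the 1-based rank of the true model."""
    ordered = sorted(candidates, key=lambda c: bic(c["ssr"], n_points, c["k"]))
    names = [c["name"] for c in ordered]
    return names.index(true_name) + 1, names

# Hypothetical fitting results: best sum of squared residuals and number of
# fitted regulatory parameters for each candidate network.
candidates = [
    {"name": "wrong_target", "ssr": 3.00, "k": 2},
    {"name": "true_network", "ssr": 1.00, "k": 2},
    {"name": "extra_interaction", "ssr": 0.98, "k": 3},
]

rank, ordering = rank_of_true(candidates, "true_network", n_points=40)
```

In this example the candidate with an extra interaction fits slightly better but is penalized for its additional parameter, so the true network still ranks first; with noisier or sparser data the penalty may no longer offset the improved fit, and the rank of the true network increases.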

Feature generation

To explore whether characteristics of the data quality or of the data themselves make the correct regulatory network unidentifiable in a given case, multiple features were analyzed. Categories of features studied include noise-indicating factors such as the number of datapoints and CoV, time-course-derived features, and the strength of regulatory interactions as indicated by the magnitudes of the true regulatory kinetic parameters. Time-course-derived features were generated with TSFEL in the temporal and statistical domains (ref. 41).

Correlation between the rank of the true regulatory network and features

The Spearman correlation coefficient was used to assess the correlation between the rank of the true regulatory network and each feature. The analyses were performed on all datapoints as well as on subsets of the data to minimize the impact of confounding factors (i.e., covariation among features). For example, some time-course features were found to be highly correlated with the sampling rate, so the correlation between true regulatory network rank and the time-course features (4320 datapoints) was also computed separately for each individual sampling rate (1080 datapoints). In all cases, a Bonferroni correction was applied to adjust for multiple hypothesis testing.
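The correlation analysis can be sketched in plain Python as follows. The Spearman coefficient is computed as the Pearson correlation of average ranks. How p-values were obtained in the study is not stated here, so a permutation test is used as an assumed substitute; the feature values and function names are hypothetical.

```python
import math
import random

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for Spearman rho (assumed testing procedure)."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Hypothetical data: rank of the true network and two features per dataset.
true_rank = [1, 1, 2, 3, 5, 8]
features = {
    "CoV": [0.05, 0.05, 0.15, 0.15, 0.25, 0.25],
    "n_timepoints": [50, 15, 50, 15, 50, 15],
}
rhos = {name: spearman(vals, true_rank) for name, vals in features.items()}
raw_p = [permutation_pvalue(vals, true_rank) for vals in features.values()]
adjusted_p = dict(zip(features, bonferroni(raw_p)))
```

Restricting the inputs to datasets that share one sampling rate before calling `spearman` corresponds to the subset analysis described above.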

Supplementary information

Supplementary Data 1 (22.7KB, xlsx)

Acknowledgements

This research was supported in part through research cyberinfrastructure resources and services provided by the Partnership for an Advanced Computing Environment at the Georgia Institute of Technology, Atlanta, GA, USA. The authors thank the National Institutes of Health (R35-GM149286 and R35-GM119701) for funding support. The funder played no role in the study design, data collection, analysis, and interpretation of data, or the writing of this manuscript.

Author contributions

Y.H.: conceptualization, formal analysis, investigation, software, visualization, writing—original draft, writing—review and editing; M.P.S.: conceptualization, funding acquisition, supervision, writing—review and editing. All authors read and approved the final manuscript.

Data availability

The data generated and analyzed during the current study are available in the GitHub repository: https://github.com/gtStyLab/Uncertainty.git.

Code availability

The underlying code for this study is available in the Uncertainty GitHub repository and can be accessed via this link: https://github.com/gtStyLab/Uncertainty.git.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41540-024-00412-x.

References

  • 1. Park, S.-Y., Park, C.-H., Choi, D.-H., Hong, J. K. & Lee, D.-Y. Bioprocess digital twins of mammalian cell culture for advanced biomanufacturing. Curr. Opin. Chem. Eng. 33, 100702 (2021).
  • 2. Carbonell, P. et al. An automated design-build-test-learn pipeline for enhanced microbial production of fine chemicals. Commun. Biol. 1, 66 (2018). doi: 10.1038/s42003-018-0076-9
  • 3. Orth, J. D., Thiele, I. & Palsson, B. O. What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010). doi: 10.1038/nbt.1614
  • 4. Mahadevan, R., Edwards, J. S. & Doyle, F. J. Dynamic flux balance analysis of diauxic growth in Escherichia coli. Biophys. J. 83, 1331–1340 (2002). doi: 10.1016/S0006-3495(02)73903-9
  • 5. Srinivasan, S., Cluett, W. R. & Mahadevan, R. Constructing kinetic models of metabolism at genome-scales: a review. Biotechnol. J. 10, 1345–1359 (2015). doi: 10.1002/biot.201400522
  • 6. Costa, R. S., Hartmann, A. & Vinga, S. Kinetic modeling of cell metabolism for microbial production. J. Biotechnol. 219, 126–141 (2016). doi: 10.1016/j.jbiotec.2015.12.023
  • 7. Verkhivker, G. M., Agajanian, S., Hu, G. & Tao, P. Allosteric regulation at the crossroads of new technologies: multiscale modeling, networks, and machine learning. Front. Mol. Biosci. 7, 136 (2020). doi: 10.3389/fmolb.2020.00136
  • 8. Piazza, I. et al. A map of protein-metabolite interactions reveals principles of chemical communication. Cell 172, 358–372.e323 (2018). doi: 10.1016/j.cell.2017.12.006
  • 9. Orsak, T. et al. Revealing the allosterome: systematic identification of metabolite-protein interactions. Biochemistry 51, 225–232 (2012). doi: 10.1021/bi201313s
  • 10. Lu, S., Huang, W. & Zhang, J. Recent computational advances in the identification of allosteric sites in proteins. Drug Discov. Today 19, 1595–1600 (2014). doi: 10.1016/j.drudis.2014.07.012
  • 11. Shen, Q. et al. ASD v3.0: unraveling allosteric regulation with structural mechanisms and biological networks. Nucleic Acids Res. 44, D527–D535 (2016). doi: 10.1093/nar/gkv902
  • 12. Diether, M. & Sauer, U. Towards detecting regulatory protein-metabolite interactions. Curr. Opin. Microbiol. 39, 16–23 (2017). doi: 10.1016/j.mib.2017.07.006
  • 13. Machado, D., Herrgard, M. J. & Rocha, I. Modeling the contribution of allosteric regulation for flux control in the central carbon metabolism of E. coli. Front. Bioeng. Biotechnol. 3, 154 (2015). doi: 10.3389/fbioe.2015.00154
  • 14. Vasilakou, E. et al. Current state and challenges for dynamic metabolic modeling. Curr. Opin. Microbiol. 33, 97–104 (2016). doi: 10.1016/j.mib.2016.07.008
  • 15. Rodriguez, M., Good, T. A., Wales, M. E., Hua, J. P. & Wild, J. R. Modeling allosteric regulation of de novo pyrimidine biosynthesis in Escherichia coli. J. Theor. Biol. 234, 299–310 (2005). doi: 10.1016/j.jtbi.2004.11.023
  • 16. Chubukov, V. et al. Transcriptional regulation is insufficient to explain substrate-induced flux changes in Bacillus subtilis. Mol. Syst. Biol. 9, 709 (2013). doi: 10.1038/msb.2013.66
  • 17. Link, H., Kochanowski, K. & Sauer, U. Systematic identification of allosteric protein-metabolite interactions that control enzyme activity in vivo. Nat. Biotechnol. 31, 357–361 (2013). doi: 10.1038/nbt.2489
  • 18. Hackett, S. R. et al. Systems-level analysis of mechanisms regulating yeast metabolic flux. Science 354, aaf2786 (2016).
  • 19. Christodoulou, D. et al. Reserve flux capacity in the pentose phosphate pathway enables Escherichia coli’s rapid response to oxidative stress. Cell Syst. 6, 569–578.e567 (2018). doi: 10.1016/j.cels.2018.04.009
  • 20. Nishiguchi, H., Liao, J., Shimizu, H. & Matsuda, F. Novel allosteric inhibition of phosphoribulokinase identified by ensemble kinetic modeling of Synechocystis sp. PCC 6803 metabolism. Metab. Eng. Commun. 11, e00153 (2020). doi: 10.1016/j.mec.2020.e00153
  • 21. Flöttmann, M., Schaber, J., Hoops, S., Klipp, E. & Mendes, P. ModelMage: a tool for automatic model generation, selection and management. Genome Inform. 20, 52–63 (2008).
  • 22. Rybinski, M., Moller, S., Sunnaker, M., Lormeau, C. & Stelling, J. TopoFilter: a MATLAB package for mechanistic model identification in systems biology. BMC Bioinform. 21, 34 (2020). doi: 10.1186/s12859-020-3343-y
  • 23. Guillen-Gosalbez, G., Miro, A., Alves, R., Sorribas, A. & Jimenez, L. Identification of regulatory structure and kinetic parameters of biochemical networks via mixed-integer dynamic optimization. BMC Syst. Biol. 7, 113 (2013). doi: 10.1186/1752-0509-7-113
  • 24. Henriques, D., Villaverde, A. F., Rocha, M., Saez-Rodriguez, J. & Banga, J. R. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Comput. Biol. 13, e1005379 (2017). doi: 10.1371/journal.pcbi.1005379
  • 25. Schaber, J., Liebermeister, W. & Klipp, E. Nested uncertainties in biochemical models. IET Syst. Biol. 3, 1–9 (2009). doi: 10.1049/iet-syb:20070042
  • 26. Massonis, G., Banga, J. R. & Villaverde, A. F. AutoRepar: a method to obtain identifiable and observable reparameterizations of dynamic models with mechanistic insights. Int. J. Robust Nonlin. Control 33, 5039–5057 (2021).
  • 27. Bellu, G., Saccomani, M. P., Audoly, S. & D’Angio, L. DAISY: a new software tool to test global identifiability of biological and physiological systems. Comput. Methods Prog. Biomed. 88, 52–61 (2007). doi: 10.1016/j.cmpb.2007.07.002
  • 28. Joubert, D., Stigter, J. D. & Molenaar, J. An efficient procedure to assist in the re-parametrization of structurally unidentifiable models. Math. Biosci. 323, 108328 (2020). doi: 10.1016/j.mbs.2020.108328
  • 29. Ligon, T. S. et al. GenSSI 2.0: multi-experiment structural identifiability analysis of SBML models. Bioinformatics 34, 1421–1423 (2018). doi: 10.1093/bioinformatics/btx735
  • 30. Meshkat, N., Kuo, C. E. & DiStefano, J. 3rd. On finding and using identifiable parameter combinations in nonlinear dynamic systems biology models and COMBOS: a novel web implementation. PLoS One 9, e110261 (2014). doi: 10.1371/journal.pone.0110261
  • 31. Wieland, F.-G., Hauber, A. L., Rosenblatt, M., Tönsing, C. & Timmer, J. On structural and practical identifiability. Curr. Opin. Syst. Biol. 25, 60–69 (2021). doi: 10.1016/j.coisb.2021.03.005
  • 32. Berthoumieux, S., Brilli, M., Kahn, D., de Jong, H. & Cinquemani, E. On the identifiability of metabolic network models. J. Math. Biol. 67, 1795–1832 (2013). doi: 10.1007/s00285-012-0614-x
  • 33. Raue, A. et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics 25, 1923–1929 (2009). doi: 10.1093/bioinformatics/btp358
  • 34. Chis, O. T., Banga, J. R. & Balsa-Canto, E. Structural identifiability of systems biology models: a critical comparison of methods. PLoS One 6, e27755 (2011). doi: 10.1371/journal.pone.0027755
  • 35. Hines, K. E., Middendorf, T. R. & Aldrich, R. W. Determination of parameter identifiability in nonlinear biophysical models: a Bayesian approach. J. Gen. Physiol. 143, 401–416 (2014). doi: 10.1085/jgp.201311116
  • 36. Tran, L. M., Rizk, M. L. & Liao, J. C. Ensemble modeling of metabolic networks. Biophys. J. 95, 5606–5617 (2008). doi: 10.1529/biophysj.108.135442
  • 37. Kuepfer, L., Peter, M., Sauer, U. & Stelling, J. Ensemble modeling for analysis of cell signaling dynamics. Nat. Biotechnol. 25, 1001–1006 (2007). doi: 10.1038/nbt1330
  • 38. Schaber, J. et al. Automated ensemble modeling with modelMaGe: analyzing feedback mechanisms in the Sho1 branch of the HOG pathway. PLoS One 6, e14791 (2011). doi: 10.1371/journal.pone.0014791
  • 39. Babtie, A. C., Kirk, P. & Stumpf, M. P. Topological sensitivity analysis for systems biology. Proc. Natl. Acad. Sci. USA 111, 18507–18512 (2014). doi: 10.1073/pnas.1414026112
  • 40. Lee, J. Y. & Styczynski, M. P. NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data. Metabolomics 14, 153 (2018). doi: 10.1007/s11306-018-1451-8
  • 41. Barandas, M. et al. TSFEL: time series feature extraction library. SoftwareX 11, 100456 (2020). doi: 10.1016/j.softx.2020.100456

Articles from NPJ Systems Biology and Applications are provided here courtesy of Nature Publishing Group
