NPJ Systems Biology and Applications. 2026 Jan 26;12:29. doi: 10.1038/s41540-026-00650-1

In silico modeling of anterior foregut endoderm differentiation towards lung epithelial progenitors

Amirmahdi Mostofinejad 1, David A Romero 1, Dana Brinson 2,3, Thomas K Waddell 2,3,4, Golnaz Karoubi 1,3,5, Cristina H Amon 1,2
PMCID: PMC12920931  PMID: 41588005

Abstract

Directed differentiation of human induced pluripotent stem cells (iPSCs) into anterior foregut endoderm (AFE) and lung progenitors (LPs) has wide-ranging implications for lung developmental biology, disease modeling, and regenerative medicine. We expand on a previously developed mathematical modeling framework and apply it to the directed differentiation of AFE into LPs. A model-based approach guides experimental design, followed by a multistage model inference process: maximum likelihood estimation based on in vitro data and identifiability analyses to eliminate unidentifiable candidates, thereby guiding model selection. To the authors’ knowledge, this is the first mathematical model of the population dynamics of directed differentiation of AFE into LPs. The model suggests that the overall dynamics are primarily driven by AFE proliferation and differentiation into LPs. In silico experiments predict that daily media change nearly doubles LP yields compared to cultures without media replenishment. Moreover, the model suggests that higher split ratios on day 10 enhance yield per input cell, a measure of differentiation efficiency, by 26%. This work provides a blueprint for refining iPSC-based lung lineage differentiation protocols by combining empirical data and mathematical modeling.

Subject terms: Developmental biology, Stem cells, Systems biology, Engineering, Mathematics and computing

Introduction

The differentiation of induced pluripotent stem cells (iPSCs) into lung epithelial progenitors (LPs) is critical for developmental biology studies and regenerative medicine applications. LPs give rise to alveolar1 and airway2 epithelium, both essential for lung function3. During directed differentiation, iPSCs are exposed to small molecules that mimic the signaling pathways guiding cell fate during fetal development. Definitive endoderm is first specified, followed by anterior foregut endoderm (AFE)4. Activation of Wnt and supplementation of BMP4 and retinoic acid in AFE yields LPs, which can be further differentiated into airway or alveolar lineages5,6.

Applications of directed iPSC differentiation protocols are wide-ranging, especially in cell therapy and tissue engineering, where billions of cells are needed to attain clinically relevant grafts and treatments7. These protocols typically require the addition of a series of small molecules, growth factors, and reagents, and many take several weeks or even months to generate the desired cell types5,6. Optimizing timing, cell density, and media formulation, among other factors, would benefit most directed differentiation protocols by maximizing the yield of the desired cell types8. The design of differentiation protocols has historically been guided by developmental biology and has relied predominantly on empirical studies, which can be costly, time-consuming, and suboptimal9.

Mathematical models can formulate experimentally testable hypotheses, guide the design of experiments, and be utilized in scaled-up clinical applications10,11. These models also have the potential to complement our understanding of complex biological phenomena, such as the dynamics that lead to undesired non-lung endodermal lineages (intestine, liver, or stomach) when differentiating to distal LPs12. In silico modeling further complements in vitro experimentation by allowing researchers to test various culture conditions computationally, thereby reducing the number of experimental iterations11. Moreover, each iPSC line can respond differently to the same differentiation protocol: as reported by Jacob et al., the NKX2-1+ LP yield can range from 30% to 90%, depending on the cell line6. This necessitates precise adjustments of culture parameters and timelines, which can be facilitated by mathematical models13,14. Such tailoring is especially critical for personalized medicine, where patient-specific iPSCs often demand custom approaches to achieve optimal cell populations15. Given these advantages, designing mathematical models to predict the differentiation and growth kinetics of AFE cells into LPs is essential to improve our capacity to optimize experimental conditions for generating lung tissues15.

In this paper, a previously reported model development approach is augmented to include biochemical effects alongside multicellular populations16,17. Due to the intrinsic differences between the physical units and measurement errors associated with cell density and substrate concentrations, independent error parameters are defined18. This model serves as a tool for further understanding the directed differentiation of iPSCs to LPs and enabling optimization of its protocols.

This is the first population dynamics model of directed differentiation of AFE to LPs, to the authors’ knowledge. Model inference starts with multiple biology-informed model proposals, considering two approaches to cell populations: one incorporating only the total population and one incorporating AFE and LP populations individually. We then perform model calibration and selection using in vitro observations from designed experiments and ensure model identifiability using mathematical tests. The inferred model is then validated by calculating the goodness of fit on the hold-out dataset. The model is then used to study the effects of different day 10 split ratios and the importance of split ratios and media refreshment protocols during culture.

Results

This paper is organized around the experimental window in which AFE cells are induced toward NKX2-1+ LPs (Fig. 1A). Specifically, we focus on days 10–15 of the protocol, describe the model development steps used to quantify this process, and use the resulting model to predict the differentiation dynamics under modified culture conditions. Another paper by the authors performs a similar analysis for days 0–3 of this protocol17.

Fig. 1. Experimental protocol and the lineage models.

A Typical experimental protocol for directed iPSC differentiation to lung progenitors. mTeSR, DE Kit, FACS, DS, and SB denote mTeSR-1 iPSC maintenance medium, STEMdiff™ Definitive Endoderm Kit, fluorescence-activated cell sorting, dorsomorphin, and SB431542, respectively. We focus on model inference for days 10 to 15 of the directed differentiation process. Created in BioRender. Brinson, D. (2025)82. B Lineage models. AFE and LPs are anterior foregut endoderm and NKX2-1+ lung progenitors, respectively. Biomarker expressions are shown in red text. Green, blue, and orange lines correspond to proliferation, differentiation, and death rates, respectively. C Model development protocol. Yellow and blue boxes show the initial definitions and mathematical tests. Green boxes indicate the steps with model inference. Model-based design of experiments involves inference with synthetic data for experimental design, whereas model inference involves inference with in vitro data. This methodology is influenced by earlier work by the authors16.

Two lineage formulations are considered in the model development stage (Fig. 1B): a one-population model that captures the total live cells (M0) and a two-population model that explicitly resolves the AFE and LP populations (M1), describing the differentiation of AFE to LP. Both models treat glucose as a nutrient and lactate as a waste product, and both are evaluated under different growth and environmental effect hypotheses. Error models are also used to describe the difference between experimental observations and the mean (structural, shown in Fig. 1B) model.

Figure 1C depicts the workflow in this paper. We first define various candidate models and screen them using a structural identifiability test. The next step is model-based design, followed by running experiments to obtain the necessary measurements. The collected data is used for model calibration and selection, and to validate the selected model. Parameter uniqueness is ensured, and global sensitivity analysis is used to understand the population system’s dynamics. The in silico model is then used to predict the effects of split ratio and media refreshment on LP yield and differentiation efficiency.

Structural identifiability analysis

Structural identifiability is a property of a mathematical model indicating that, if we could measure the system perfectly (i.e., with no measurement error) and as often as needed, the model parameters would be uniquely determined19–21. Structural identifiability analysis was performed on all candidate models with the observables being the state variables in Eqs. (6) and (8). We showed that all but two of these models are globally structurally identifiable (StructuralIdentifiability.jl)22–24. The two unidentifiable structural models are M0 and M1, with exponential growth and without glucose or lactate effects. These two models are discarded, and the rest (22 remaining structural models) are used for parameter inference. Having restricted attention to structurally identifiable candidates, we then determined the measurement frequency required for reliable parameter estimation.

Model-based design of experimental protocols

At this stage, we applied model-based design of experimental protocols (MBDEP) to determine the experimental details required for model inference. Since we assumed spatial homogeneity in the cell populations in the well plate, the population dynamics system simplifies to a temporal problem. In each experimental condition, the key design parameter is the measurement frequency (sampling period)16.

Based on our experimental capabilities, the sampling period could range from 0.5 to 4 days. Note that this design assumes four experimental conditions: two different split ratios and two settings, with and without media change. Here, it is assumed that the AFE differentiation can be well described by Eq. (8) with Gompertz growth and proportional noise of 30% (bn = bc = 0.3 in Eq. (13)). Next, we assume the model parameters are known from our prior understanding of the differentiation process; the model with these parameter values yields qualitatively similar dynamics to those observed in our previous experimental data (Supplementary Table 2).

Then, we used the assumed model and its parameters to generate synthetic data, perform parameter inference, and determine which sampling period yields the smallest distance (error) between the assumed and inferred parameters. The resulting error in parameter inference for the observables is plotted as a function of sampling period in Fig. 2. For the set of model proposals considered here, measuring individual live cell populations every 24 and 48 h results in parameter errors of 44% and 61%, respectively. For our experiments, we decided to sample concentrations and individual populations at 24 and 48 h, respectively, to balance experimental cost and model inference accuracy (Supplementary Table 1).

Fig. 2. Parameter inference errors for different sampling periods.

Sampling periods below 48 h are acceptable for this experiment.

Running and postprocessing experiments

As described in the experimental setup, the cells are passaged at two ratios of 1:2 and 1:5, resulting in different initial AFE cell populations (N = 4). This experiment, with daily media changes (MCH1), yielded total individual cell populations on alternate days and daily measurements of glucose and lactate concentrations. Additionally, a parallel experiment was conducted without any media change (MCH0), and similar measurements were recorded. Note that this results in 16 data points for the populations (2 in time, 2 for the AFE and LP populations, 2 for plating ratios, and 2 for media change) and 32 data points for concentrations (4 in time, 2 for the glucose and lactate, 2 for plating ratios, and 2 for media change), resulting in 48 total data points for each replicate.
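The per-replicate data count quoted above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch; the variable names are ours and not part of the original analysis.

```python
# Measurements per replicate, using the factors listed in the text.
# populations: time points x populations (AFE, LP) x split ratios x media conditions
population_points = 2 * 2 * 2 * 2
# concentrations: time points x species (glucose, lactate) x split ratios x media conditions
concentration_points = 4 * 2 * 2 * 2
total_points = population_points + concentration_points

print(population_points, concentration_points, total_points)
```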

A two-way analysis of variance (ANOVA), performed with the Pingouin25 Python library, demonstrated statistically significant differences in terminal (day 15) concentrations between experiments with different initial populations and media change conditions. Similarly, the split ratio significantly affected some cell populations on day 15, suggesting that the experimental conditions affected the terminal (day 15) live LP cell populations. The ANOVA thus confirms that the experimental data support a connection between media refreshment and populations, corroborating growth models with environmental effects.

Model inference

We inferred the parameters for the candidate models using the calibration dataset (2:1:1 split for calibration, selection, and validation, respectively). The candidate models combine the two lineage models, M0 and M1 (which state the dynamics in terms of distinguishable cell populations; refer to the Structural models section), with 11 growth models and three error models, resulting in 66 total models. M0 and M1 have 6 and 9 parameters, respectively; the logistic and Gompertz growth models add one parameter, nmax; and each biochemical effect adds one parameter, the corresponding Kg or Kl. The models therefore have from 6 to 12 structural parameters, and the error models add 2 to 4 extra parameters.

To guarantee thorough coverage, we used a maximin Latin hypercube26 to choose 100 starting points for the optimization. This space-filling sampling scheme selects initial parameter guesses that are as evenly distributed as possible across the allowed ranges, reducing the risk of missing good solutions. This resulted in 100 × 66 = 6600 optimization runs. The hypercube used to draw the initial guesses had the bounds [10^−4, 10] for rates (in d−1), [0, 1] for the differentiation ratio (dimensionless), and [nmaxu/10, nmaxu] for nmax. The upper bound of the latter, nmaxu, is the maximum number of cells with a diameter of 15 μm that can occupy the entire area of the well plate bottom as a monolayer. Table 1 summarizes all the parameter search bounds.
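As a worked check of the nmax bounds, the sketch below assumes that each cell occupies a circular footprint 15 μm in diameter and that the footprints fill the surface with no packing loss. That packing assumption is ours, not stated in the text, but it gives a density close to the Table 1 upper bound, expressed per mm2.

```python
import math

CELL_DIAMETER_UM = 15.0  # cell diameter quoted in the text


def max_monolayer_density(diameter_um: float) -> float:
    """Cells per mm^2 if each cell covers a disc of the given diameter.

    Assumption (ours): footprints fill the area with no packing loss.
    """
    cell_area_um2 = math.pi * (diameter_um / 2.0) ** 2
    return 1.0e6 / cell_area_um2  # 1 mm^2 = 1e6 um^2


nmax_upper = max_monolayer_density(CELL_DIAMETER_UM)
nmax_lower = nmax_upper / 10.0  # lower search bound is one tenth of the upper bound

print(f"nmax search range: ({nmax_lower:.1f}, {nmax_upper:.1f}) cells per mm^2")
```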

Table 1.

Parameter search space for our models

Definition | Parameter | Bounds | Unit
Average growth rate | βq | (0.0001, 10.0) | d−1
Average death rate | δq | (0.0001, 10.0) | d−1
AFE proliferation rate | βa | (0.0001, 10.0) | d−1
AFE renewal ratio | pap | (0.0, 1.0) | dimensionless
AFE death rate | δa | (0.0001, 10.0) | d−1
LP proliferation rate | βp | (0.0001, 10.0) | d−1
LP death rate | δp | (0.0001, 10.0) | d−1
Glucose reaction constant | Vg | (0.0001, 1.0) | cell mmol L−1 mm2 d−1
Lactate reaction constant | Vl | (0.0001, 1.0) | cell mmol L−1 mm2 d−1
Glucose proliferation MMK constant | Kg | (0.01, 50.0) | mmol L−1
Lactate proliferation MMK constant | Kl | (0.5, 200.0) | mmol L−1
Glucose reaction MMK constant | c¯g | (0.0001, 50.0) | mmol L−1
Lactate reaction MMK constant | c¯l | (0.0001, 50.0) | mmol L−1
Maximum density | nmax | (565.8, 5659.0) | cells mm−2
Density additive constant | an | (0.1, 500.0) | cells mm−2
Density proportional constant | bn | (0.001, 2.0) | dimensionless
Concentration additive constant | ac | (0.1, 20.0) | mmol L−1
Concentration proportional constant | bc | (0.001, 2.0) | dimensionless
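The multi-start scheme can be illustrated with a minimal maximin Latin hypercube sketch in plain Python. It draws several random Latin hypercubes and keeps the one whose closest pair of points is farthest apart, which is one simple way to approximate a maximin design and is not necessarily the algorithm the authors used. The function names, the number of candidate designs, the linear (rather than logarithmic) scaling to the bounds, and the choice of three parameters from Table 1 are all illustrative assumptions.

```python
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """One Latin hypercube in [0, 1)^n_dims: one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points of a design."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])


def maximin_lhs(n_samples, n_dims, n_candidates=20, seed=0):
    """Keep the candidate hypercube with the largest minimum pairwise distance."""
    rng = random.Random(seed)
    candidates = [latin_hypercube(n_samples, n_dims, rng) for _ in range(n_candidates)]
    return max(candidates, key=min_pairwise_distance)


def scale(unit_points, bounds):
    """Map unit-cube points linearly onto the (low, high) parameter bounds."""
    return [[lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds)] for p in unit_points]


# Bounds for beta_a, p_ap, and delta_a, taken from Table 1.
bounds = [(0.0001, 10.0), (0.0, 1.0), (0.0001, 10.0)]
starts = scale(maximin_lhs(100, len(bounds)), bounds)
print(len(starts), starts[0])
```

Each of the 100 rows in `starts` would serve as one initial guess for an optimization run.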

The two lineage models have different loss functions, which makes their BIC (Bayesian information criterion) values incompatible. The two lineage models are therefore compared separately on the selection datasets, as shown in Figs. 3 and 4. The BIC comparison in Fig. 3 indicates that the best-performing M0 model is exponential Glu M0 with additive error. It is important to note that the large spread observed in the boxplots for the BIC values reflects the inherent complexity and multimodality of parameter estimation in systems biology27,28, rather than the lack of convergence of the optimization algorithms.

Fig. 3. M0 model comparison.

The rows show the 33 inferred models, with colors and shades representing structural and error models, respectively. Each row is derived from 100 parameter calibrations, each inferred from different initial guesses. The x-axis presents BIC values as discrete points, where a lower value indicates better model performance. These values are collectively summarized in the form of a boxplot.

Fig. 4. M1 model comparison.

Note that the BIC values here are not directly comparable with those in Fig. 3, since this model includes two cell populations.

Inferred parameter values and the corresponding confidence intervals are represented in Supplementary Table 3 for the four best-fitted inferred M1 models: Gompertz with proportional error, logistic with additive error, logistic with proportional error, and exponential Glu with additive error, in ascending order of BIC, according to Fig. 4. As can be seen from the table, two of the inferred values of nmax are unidentifiable, as observed by the upper bounds of the confidence interval not being found, indicating that it is either infinity or a very large value. The upper bound of nmax for logistic M1 with additive error is much greater than the physically defined upper bound shown in Table 1, making the parameter unidentifiable for this model. This observation shows that all models with nmax are practically unidentifiable. Population variations are minor in experiments, so the model might not see the existing effects of limitation by space. The existence of space-constrained growth could be observed with more experiments.

The likelihood profiles for the models (Fig. 5 and Supplementary Figs 3, 4, and 5) show that only the exponential Glu M1 model has concave-downward profiles, which are needed for a well-defined model in the proximity of the inferred parameters. The differentiation ratio, pap, is unidentifiable for logistic M1 with proportional error and Gompertz M1 with proportional error, and βa is unidentifiable in the logistic M1 with additive error model. This analysis yields exponential Glu M1 with additive error as the chosen M1 model, as it is the only one of the top four M1 models that is practically identifiable for all parameters.

Fig. 5. Likelihood profile for exponential M1 Glu with additive error model.

Each x-axis corresponds to a parameter in the model, and the y-axis is the log-likelihood value. The intersection of the horizontal red line with the blue curve indicates each parameter confidence interval with 95% confidence, and the vertical red line is the inferred parameter value. Panels correspond to model parameters: (a) βa, (b) pap, (c) δa, (d) βp, (e) δp, (f) Vg, (g) Kg, (h) c¯g, (i) an, (j) ac.

The BIC values in Fig. 4 can be compared with the loss values in Supplementary Fig 1. The loss values are similar to BIC values without the effect of the number of parameters; zeroing the first term in Eq. (18). This analysis shows that the negative effect of the number of parameters on the error measure, BIC, directs model selection towards model parsimony.
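The parsimony effect can be seen in a small numerical sketch. Eq. (18) is not reproduced in this section, so the code assumes the standard textbook form, BIC = k ln(n) + 2 × (negative log-likelihood), which may differ from the authors' definition in scaling. All numbers are hypothetical.

```python
import math


def bic(neg_log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard BIC: complexity penalty k*ln(n) plus twice the negative log-likelihood."""
    return n_params * math.log(n_obs) + 2.0 * neg_log_likelihood


# Hypothetical fits: the larger model fits slightly better but pays for two extra parameters.
bic_small = bic(neg_log_likelihood=120.0, n_params=10, n_obs=96)
bic_large = bic(neg_log_likelihood=119.0, n_params=12, n_obs=96)

print(f"small model BIC: {bic_small:.2f}")
print(f"large model BIC: {bic_large:.2f}")
```

With the penalty term set to zero, the larger model would rank first; with it, the smaller model is preferred.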

In silico model predictions versus the experimental observations in all experiments for the inferred M0 and M1 models are shown in Figs. 6 and 7, respectively, with different colors representing the different experimental conditions; experiments one and three correspond to MCH0 culture, and experiments two and four to MCH1 culture. Markers show the mean values of the experimental data, and the error bars indicate the standard deviations (N = 4). The curves represent the inferred model expected values, while the bands show the inferred model standard deviations. For M0, 87.5% of the mean population measurements and 96.4% of the mean concentration measurements fall inside the band predicted by the error model, displaying a good match between observations and the model predictions at most time points. A similar trend is seen with M1, with 100% and 92.8% coverage of the population and concentration measurements, respectively. The highest deviation between the model predictions and the observations appears on day 4 for the total and AFE populations, meaning that the model overestimates the effect of glucose deficiency on growth rate reduction in higher populations and underestimates it in lower populations.

Fig. 6. Experimental observations versus the model predictions for the exponential Glu M0 model with additive error.

Fig. 7. Experimental observations versus the model predictions for the exponential Glu M1 model with additive error.

Practical identifiability analysis

Practical identifiability extends structural identifiability analysis to real, limited experimental data subject to measurement error. In practical terms, it describes how sensitive the model fit is to changes in parameter values: an identifiable model has a clear “best fit,” meaning that even small parameter changes noticeably worsen the fit29. The parameter confidence intervals are calculated by confining each parameter and minimizing the loss function17. Studying the width of the confidence intervals develops insights into the quality of model inference30. We performed profile likelihood-based practical identifiability analysis using the ProfileLikelihood.jl31 package.
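The profile-likelihood idea can be illustrated on a toy problem that is unrelated to the paper's model and does not use ProfileLikelihood.jl: estimating the mean of Gaussian data while the variance is re-optimized (profiled out) for every trial value of the mean. The 95% interval is the set of means whose profile log-likelihood lies within 1.92 (half the 95% chi-square quantile with one degree of freedom) of the maximum. The data values and function names are hypothetical.

```python
import math


def profile_loglik(mu, data):
    """Gaussian log-likelihood at mean mu, with the variance set to its best value for that mu."""
    n = len(data)
    sigma2_hat = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)


def profile_ci(data, threshold=1.92, width=10.0, tol=1e-8):
    """Return (estimate, lower, upper) for the mean using a likelihood-ratio cutoff."""
    mu_hat = sum(data) / len(data)
    target = profile_loglik(mu_hat, data) - threshold

    def bisect(inside, outside):
        # 'inside' is above the cutoff and 'outside' is below it; shrink the bracket.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if profile_loglik(mid, data) >= target:
                inside = mid
            else:
                outside = mid
        return inside

    return mu_hat, bisect(mu_hat, mu_hat - width), bisect(mu_hat, mu_hat + width)


data = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0]  # hypothetical measurements
mu_hat, lower, upper = profile_ci(data)
print(f"estimate {mu_hat:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

A parameter whose profile never drops below the cutoff on one side would have an unbounded interval, which is the signature of practical unidentifiability discussed in this section.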

Table 2 shows the inferred parameters and the resulting confidence intervals for all estimated model parameters for the inferred M0 and M1 models. A few observations can be drawn from the inferred values of the error model parameters, an and ac. Both models have the same concentration state, glucose, meaning that similar values for the additive standard deviations are expected; this is corroborated by the two inferred ac values not being significantly different, as shown in Table 2. In contrast, the population states differ between the models: total population for M0, and AFE and LP populations for M1, with total population defined as the sum of the two individual populations, nq = na + np. So, if na and np are correlated with correlation coefficient ρnanp, the standard deviation of nq is given by,

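$$a_n^{\mathrm{M0}} = \sigma_{n_q} = \sqrt{\sigma_{n_q}^2} = \sqrt{\sigma_{n_a}^2 + \sigma_{n_p}^2 + 2\rho_{n_a n_p}\sigma_{n_a}\sigma_{n_p}} = \sqrt{2\left(a_n^{\mathrm{M1}}\right)^2\left(1+\rho_{n_a n_p}\right)} = \sqrt{2\left(1+\rho_{n_a n_p}\right)}\,a_n^{\mathrm{M1}} \tag{1}$$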

where σ is the standard deviation, and anM0 and anM1 are the error parameters of models M0 and M1, respectively. This means that looking at the two inferred an values from Table 2, the value for the correlation coefficient is 32%, which is comparable with the correlation coefficient calculated from raw data, 45%. These observations on the error model parameters indicate their consistency and support the correctness of the model inference process.
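Solving Eq. (1) for the correlation coefficient gives ρ = (anM0 / anM1)^2 / 2 − 1. The short check below plugs in the two inferred an values from Table 2; the variable names are ours.

```python
# Inferred additive density error parameters from Table 2 (cells per mm^2).
a_n_m0 = 145.8   # one-population model M0
a_n_m1 = 89.77   # two-population model M1

# Eq. (1) rearranged: a_n_m0 = sqrt(2 * (1 + rho)) * a_n_m1
rho = (a_n_m0 / a_n_m1) ** 2 / 2.0 - 1.0
print(f"implied correlation coefficient: {rho:.3f}")
```

The result is close to the 32% quoted in the text.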

Table 2.

Value and confidence intervals (lower bound, higher bound) for inferred models

Parameter | M1 value | M1 CI | M0 value | M0 CI | Unit
βq | - | - | 8.209 | (5.090, 27.74) | d−1
δq | - | - | 1.710 | (1.238, 4.034) | d−1
βa | 10.58 | (10.39, 10.73) | - | - | d−1
pap | 0.7975 | (0.7919, 0.8018) | - | - | dimensionless
δa | 2.002 | (1.971, 2.042) | - | - | d−1
βp | 1.178 | (-2.976, 3.581) | - | - | d−1
δp | 3.869 | (3.121, 5.350) | - | - | d−1
Vg | 0.1257 | (0.1082, 0.1404) | 0.2059 | (0.01002, 1.823) | cell mmol L−1 mm2 d−1
Kg | 29.56 | (28.95, 30.39) | 52.70 | (38.76, 488.2) | mmol L−1
c¯g | 398.5 | (355.6, 465.8) | 662.4 | (17.94, 5000) | mmol L−1
an | 89.77 | (76.70, 106.4) | 145.8 | (115.5, 198.3) | cell mm−2
ac | 0.9900 | (0.8472, 1.200) | 0.9732 | (0.8045, 1.185) | mmol L−1

Both models are Exponential Glu with additive error models.

Figure 5 shows the likelihood profiles for the inferred M1 model. Note that the red vertical and horizontal lines correspond to the inferred parameters and the 95% confidence threshold, respectively. The intersections of the curve and the horizontal line show the lower and upper bounds. The figure indicates finite, relatively narrow confidence intervals, consistent with practical identifiability of the model based on our experimental data32. On the contrary, the likelihood profiles for M0 are depicted in Supplementary Fig 2, showing a few unidentifiable parameters.

In summary, this analysis shows that the M1 model, while adding three parameters and needing more measurements provided by flow cytometry, resulted in an identifiable and more detailed model with two cell populations. The rest of the paper focuses mainly on exponential M1 Glu with additive error as the chosen model. Finally, it is noted that the confidence interval for βp in the model includes zero, meaning that the LP proliferation effect may not be statistically significant based on the experimental data and that the system’s dynamics might be entirely driven by the proliferation and differentiation of the AFE population.

Goodness of fit

The root-mean-square prediction error (RMSE) of the inferred model was calculated using the validation (hold-out) dataset. Results show that the inferred model has an RMSE of 102.3 cells mm−2 for cell densities. For context, the raw-data and inferred standard deviations are 104.46 and 89.77 cells mm−2, respectively. Given that the model RMSE is comparable to the experimental variability in the data, we consider the model sufficiently accurate for its applications in supporting AFE differentiation to LPs.

Furthermore, to estimate the predictive accuracy of the model over unobserved experiments, we performed leave-one-out cross-validation, holding out the entire time series (all time points) from all replicates of one experimental condition. The inferred model, exponential Glu M1 with additive error, is recalibrated on 12 of the time series datasets (related to 3 of the experiments), and the held-out experiment time series is used to assess the population prediction error. Normalized RMSE is calculated using Eq. (22) to be equal to 18.2 ± 8.8% (average and standard deviation), which is below the standard deviation of the experimental data (89.2%) and the 30% threshold that has been used in previous works33.

Similarly, RMSE, mean absolute error (MAE) and mean error (ME) across the folds are calculated as 87.5 ± 42.0 cells mm−2, 61.6 ± 24.4 cells mm−2, and 5.9 ± 26.3 cells mm−2, respectively. MAE provides the typical magnitude of the prediction error in the same units as the measurements, while ME quantifies directional bias; values near zero indicate no systematic over- or under-prediction. Here, the MAE indicates that predictions are typically within 62 cells mm−2 of the observations, and the near-zero average ME suggests that the model is not consistently biased in one direction across conditions. The higher RMSE relative to MAE suggests that, while typical errors are moderate, some held-out conditions or time points exhibit larger deviations34. This analysis reflects prediction accuracy for new experimental conditions in the split ratio interval [1:5, 1:2], not just interpolation between time points.
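The three error measures can be written out explicitly. The sketch below uses the usual definitions and assumes the sign convention error = prediction − observation, so a positive ME means over-prediction; the normalization in Eq. (22) is not shown in this section and is therefore left out. The density values are hypothetical.

```python
import math


def errors(predicted, observed):
    """Signed errors, prediction minus observation (sign convention is our assumption)."""
    return [p - o for p, o in zip(predicted, observed)]


def rmse(predicted, observed):
    e = errors(predicted, observed)
    return math.sqrt(sum(x * x for x in e) / len(e))


def mae(predicted, observed):
    e = errors(predicted, observed)
    return sum(abs(x) for x in e) / len(e)


def mean_error(predicted, observed):
    e = errors(predicted, observed)
    return sum(e) / len(e)


# Hypothetical densities in cells per mm^2.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

print(f"RMSE {rmse(predicted, observed):.2f}")
print(f"MAE  {mae(predicted, observed):.2f}")
print(f"ME   {mean_error(predicted, observed):.2f}")
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE, and a wide gap between the two points to a few large deviations.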

Global sensitivity analysis

We conducted global sensitivity analysis of the exponential M1 Glu model. This analysis ranks model parameters by their impact on predictions, helping to identify model dynamics. In particular, we computed the sensitivity of the AFE and LP populations, as well as glucose concentration, to the structural model parameters. First-order Sobol indices rate the significance of parameters, while total-order Sobol indices additionally consider parameter interactions.

In order to achieve this, we employ 40,000 samples from the bounds [0.909, 1.10]θ* for all the parameters with Sobol’s method35. Figure 8A shows the resulting Sobol indices for the split ratio of 1:2 with MCH0 culture at time t = 96 h, equivalent to experiment 1 in Fig. 7. The model predicts that parameter interactions are less significant than their first-order effects, as evidenced by the qualitative consistency between the first-order and total-order Sobol indices. Also, it predicts that proliferation and death rates of LPs have insignificant effects on the AFE population, as can be inferred from Eq. (8) by examining the effect of the LP population on glucose consumption. The negligible contribution of the LP proliferation rate, βp, to any of the states is consistent with confidence intervals from practical identifiability analysis. On the contrary, the model’s population growth is driven primarily by the differentiation ratio, pap, AFE proliferation, βa, and the death rate, δa. The sensitivity analysis indicates that, at this stage of the differentiation protocol, population dynamics are predominantly driven by AFE cellular processes.
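For readers unfamiliar with Sobol indices, the following is a minimal pick-freeze sketch in plain Python. It uses one common estimator for the first-order indices (Saltelli's 2010 form) and Jansen's estimator for the total-order indices, and is not necessarily the implementation used for Fig. 8. The model is a hypothetical additive toy function, y = θ1 + 2θ2, rather than the paper's differential equation model, and the nominal values θ* = (1, 1) are placeholders. Only the sampling box [0.909, 1.10]θ* follows the text.

```python
import random


def sobol_indices(model, bounds, n_samples=20000, seed=0):
    """Estimate first-order and total-order Sobol indices by pick-freeze sampling."""
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    A = [draw() for _ in range(n_samples)]
    B = [draw() for _ in range(n_samples)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]
    both = fA + fB
    mean = sum(both) / len(both)
    var = sum((y - mean) ** 2 for y in both) / (len(both) - 1)

    first, total = [], []
    for i in range(len(bounds)):
        # A with column i replaced by the matching column of B.
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Outputs are centered to reduce estimator noise; this does not change the expectation.
        v_i = sum((yb - mean) * (yab - ya) for ya, yb, yab in zip(fA, fB, fABi)) / n_samples
        vt_i = sum((ya - yab) ** 2 for ya, yab in zip(fA, fABi)) / (2 * n_samples)
        first.append(v_i / var)
        total.append(vt_i / var)
    return first, total


theta_star = [1.0, 1.0]  # hypothetical nominal parameter values
bounds = [(0.909 * t, 1.10 * t) for t in theta_star]
first, total = sobol_indices(lambda x: x[0] + 2.0 * x[1], bounds)
print([round(s, 2) for s in first], [round(s, 2) for s in total])
```

For an additive model such as this one, the first-order and total-order indices agree because there are no interactions, which is the same qualitative pattern the text reports for Fig. 8A.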

Fig. 8. Global sensitivity analysis on exponential Glu M1 model.

A GSA on day 4 values of observables. The x-axis shows the observables. B Time evolution of the total order Sobol indices of the rate of LP population to model parameters.

Figure 8B shows the time evolution of total-order Sobol sensitivity indices throughout the experiment for the LP population growth rate. The figure indicates that throughout the experiment, the LP population rate is driven by the AFE cellular process and glucose through the parameter Kg. It is worth noting that glucose metabolism parameters, Vg and c¯g, do not directly affect the population, which accounts for their lower GSA values. The model predicts that the initial glucose concentration and media replacement rate affect differentiation, which is explored further in the following section.

Applications

The primary motivation for developing mathematical models is their ability to run multiple in silico experiments quickly, enabling exploration of various protocols and cell growth conditions. These in silico experiments focus on the effect of day 10 split ratios on the growth dynamics by studying the defined response variables. Further, we study the effect of media change protocols by considering two conditions, no media change (MCH0) and daily media change (MCH1), to quantify the extent to which media change has enhanced the protocols.

As mentioned (Fig. 1A), the experimental procedure includes passaging the cells at day 10 of the protocol with a given split ratio and taking measurements of AFE and LP populations on days 11, 13, and 15. The cells need to be seeded between days 10 and 11, and as observed in our experiments, the live cell population drops significantly, showing a completely different set of dynamics between day 10 and day 11, compared with the dynamics between day 11 and day 15. The inferred model predicts the dynamics between days 11 and 15, with day 11 measurements as its initial conditions. The split ratio at day 10 can be directly controlled, whereas the day 11 populations cannot; they result from the split ratio and the growth environment.

To bridge this gap between the initial conditions of the model (not directly controllable) and the day 10 split ratios (directly controllable during plating), we introduced two linear functions to map the day 10 split ratios to day 11 populations, where SR is the split ratio (dimensionless). The inferred mappings between the split ratio, SR, and model initial conditions are defined as,

$$n_{a0} = \kappa_1\,\mathrm{SR} + \kappa_{01}, \qquad n_{p0} = \kappa_2\,\mathrm{SR} + \kappa_{02}. \tag{2}$$

Here, κ1, κ2, κ01, κ02 are estimated through robust linear regression (GLM.jl36) to be 407.845, 140.941, −13.2916, and −2.04044 cells mm−2, respectively. The coefficients in the split ratio interval [1:5, 1:2] show that on day 11 of the protocol, approximately 25% of the cell population is differentiated to LPs. The top plot in Supplementary Fig 6 depicts the initial conditions used in the in silico models, and the line shows the one-dimensional space explored to observe the effect of the split ratio. The bottom plots show the two mappings between the day 10 split ratios and each of the initial conditions.
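The mapping in Eq. (2) is easy to evaluate directly. The sketch below uses the four regression coefficients quoted above and assumes that a split ratio written as 1:2 enters the equation as SR = 0.5 and 1:5 as SR = 0.2; that numeric encoding is our reading of the text. The function name is ours.

```python
# Regression coefficients quoted in the text (cells per mm^2).
KAPPA_1, KAPPA_01 = 407.845, -13.2916   # AFE slope and intercept
KAPPA_2, KAPPA_02 = 140.941, -2.04044   # LP slope and intercept


def day11_initial_conditions(split_ratio: float):
    """Map a day 10 split ratio to day 11 AFE and LP densities via Eq. (2)."""
    n_a0 = KAPPA_1 * split_ratio + KAPPA_01
    n_p0 = KAPPA_2 * split_ratio + KAPPA_02
    return n_a0, n_p0


for label, sr in (("1:5", 1 / 5), ("1:2", 1 / 2)):
    n_a0, n_p0 = day11_initial_conditions(sr)
    lp_fraction = n_p0 / (n_a0 + n_p0)
    print(f"SR {label}: AFE {n_a0:.1f}, LP {n_p0:.1f}, LP fraction {lp_fraction:.2f}")
```

Under this encoding, the LP fraction at both ends of the interval comes out slightly above one quarter, in line with the approximate 25% stated in the text.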

The previous step generated initial conditions corresponding to target split ratios, producing the time evolution of LP densities shown in Fig. 9 for MCH0 and MCH1 cultures. The figure illustrates the model prediction that the MCH0 culture causes the LP populations to peak and then decline before four days of culture. It is also predicted that the fluctuations become more pronounced at higher AFE densities. This might be caused by the linear increase in glucose consumption with cell density, so glucose would be depleted faster at higher densities, leading to lower proliferation and differentiation rates and a swifter decline in cell populations. Production of lactate and consumption or degradation of other substrates, such as recombinant human Bone Morphogenetic Protein 4 (BMP4), retinoic acid (RA), and CHIR99021, have a similar direct relationship with respect to the cell population and might contribute to this behavior6,37,38. Supplementary Fig 7 shows the model prediction of the long-term behavior of the cell population, assuming the inferred dynamics hold. It is predicted that although exponential growth would result in no limit to the population, because the growth rate is also dependent on the glucose concentration, MCH1 culture would result in a maximum LP density of around 400 cells mm−2.

Fig. 9. Time evolution of LP density in different split ratios and media change conditions.

A No media change (MCH0). B Daily media change (MCH1).

Quantification of the effect of the experimental conditions is done by defining response variables as,

$$n_{p4} = n_p(t=4), \qquad \text{Yield per input cell} = \frac{n_p}{n_{q0}}, \qquad \text{LP ratio} = \frac{n_p}{n_q}. \tag{3}$$

Here, np4 is the LP density at day 4 of the model (equivalent to day 15 of the protocol), yield per input cell is the ratio of LP density to initial total density, and LP ratio is LP density to total density.
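The response variables of Eq. (3) can be sketched as a small helper. The function name and the numbers in the example call are hypothetical placeholders, not values from the experiments or the inferred model.

```python
def response_variables(n_a, n_p, n_q0):
    """Compute the three response variables of Eq. (3) from a simulated state.

    n_a, n_p: AFE and LP densities at the time of interest (e.g., model day 4).
    n_q0: initial total density (n_a0 + n_p0).
    """
    n_q = n_a + n_p  # total live density
    return {
        "n_p4": n_p,
        "yield_per_input_cell": n_p / n_q0,
        "lp_ratio": n_p / n_q,
    }


# Hypothetical day 4 state, for illustration only.
example = response_variables(n_a=150.0, n_p=250.0, n_q0=200.0)
print(example)
```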

Traversing the day 11 initial condition space shown by the orange line in Supplementary Fig. 6 and running simulations to calculate the response variables defined by Eq. (3) yields a broader comparison across split ratios, shown in Fig. 10. The vertical dashed lines denote split ratios of 1:2 and 1:5, the two ratios used in experiments, so predictions within this interval are interpolations. Figure 10A, B correspond to the MCH0 and MCH1 cultures, respectively, and the rows correspond to the different response variables.

Fig. 10. The effect of different day 10 split ratios.

Each row shows the effect of the split ratio on one of the response variables. A No media change (MCH0). B Daily media changes (MCH1).

Effect of media change protocol

To investigate the effect of media changes on cell population dynamics, we compared two experimental conditions, MCH0 and MCH1, as defined previously. Note that both conditions yield similar results on day 1, as the first media change is made immediately after the day 1 measurements (Fig. 10A, B). By day 4, however, as predicted by the model, MCH1 significantly enhances all three response variables, LP density, yield per input cell, and LP ratio, especially at lower split ratios where nutrient consumption is greater.

For example, in the [1:5, 1:2] split ratio interval, daily media refreshment is predicted to nearly double the day 4 LP density and yield per input cell (both increased by 94%), while moderately improving the LP ratio by 5.3%. The model suggests that more frequent nutrient replenishment supports larger overall populations without substantially altering their terminal proportion of cell types. A similar pattern emerges in the experimental data presented in Fig. 7, where MCH1 culture improves day 4 LP density and yield per input cell by 80% and increases the LP ratio by 21%. Considering the error parameter an (Table 2), these observed improvements closely align with the in silico results and are consistent with model predictions that MCH1 culture increases cell growth without altering the proportion of cell types.

Effect of split ratios

In silico, we investigated the effect of split ratios on cell population dynamics of the MCH1 culture. The uppermost panel in Fig. 10B shows the model-predicted LP densities as a function of split ratios, indicating that lower split ratios increase LP density. In contrast, all the daily density plots are concave downward over the split ratio, predicting that the yield per input cell is decreased at lower split ratios, as also depicted by the middle row in Fig. 10B. In silico simulations suggest that the system’s efficiency decreases as the split ratio decreases, and the yield per input cell drops from 0.925 to 0.733 in the [1:5, 1:2] split ratio interval. As the lower row in Fig. 10B illustrates, the change is predicted to be mainly in the total population, and the proportion of the cell types does not change significantly. In summary, the model predicts that higher split ratios yield up to 26% higher efficiency on day 15 of the differentiation protocol (0.925/0.733 ≈ 1.26). This is rather conservative compared to the experimental observations from Fig. 7, where a split ratio of 1:5 nearly doubles (99% increase) the yield per input cell compared to the split ratio of 1:2. This suggests that the model underestimates growth deceleration at higher populations, a consequence of selecting exponential growth over logistic and Gompertz growth.

Discussion

This paper presents a mathematical model for the differentiation of AFE to LPs. The candidate models were structured using two lineage models (one- and two-state models), three growth models (exponential, logistic, and Gompertz), and the presence or absence of the MMK effect of glucose and lactate on growth, resulting in 24 structural models. This was complemented with three candidate standard deviation models individually defined for the cell densities and the substrate concentrations, resulting in 72 total models. The two individual error models are defined because the measurement methods are different, and human error plays a more significant role in cell density measurements. Six models were discarded because of structural unidentifiability, and model calibration and selection were performed on the remaining models using calibration and selection datasets. The best practically identifiable model was selected by analyzing the likelihood profiles of models with the best BIC scores. The inferred model was validated with an RMSE of 102.3 cells mm⁻², compared with inferred standard deviations of 89.77 cells mm⁻², indicating a sufficiently accurate model. All these steps showed the extensibility of the previously developed framework for equation definition and model inference16,17.

The sampling period for the experimental protocol is supported by MBDEP16. The in vitro experiments are conducted with and without growth media refreshment to ensure the cells are subject to resource-deprived conditions and the potential effect of the biochemical environment can manifest. Practical identifiability analysis of the inferred models showed that the individual populations obtained from flow cytometry not only assisted with constructing a more detailed model but also helped create an identifiable model with unique parameter values. A practical identifiability analysis step could be added to MBDEP to ensure model identifiability given the perceived measurement error and the design sampling period prior to running experiments.

Mathematical tests, such as practical identifiability and global sensitivity analyses, provided better insight into the experimental protocol. Both analyses predicted that cellular differentiation is more important than other dynamics, confirming that the AFE population is the main initial population affecting the terminal LP density. The former did this by not refuting the LP proliferation rate, βp, being zero, and the latter by showing a small sensitivity index for this parameter.

Directed differentiation results are sensitive to cell density, necessitating optimization of seeding density for different cell lines39. Cell density has been shown in several differentiation systems to influence pluripotency and cell fate. This occurs via paracrine signaling, cell shape, and metabolic activity40–42. Specifically, during the derivation of AFE into LPs, it is recommended to passage the cells between days 8 and 10 to avoid over-confluence6. The AFE differentiation model is applied to conduct in silico experiments to predict the effects of split ratios. The model suggested no significant effect of split ratios on the LP ratios. This observation is corroborated by Ptasinski et al.43, who show that split ratios between 1:6 and 1:3 do not significantly affect the day 15 LP ratio.

In silico experiments further suggested that decreasing the split ratio lowers the yield per input cell, a measure of the system’s efficiency, while it increases the LP density. A similar pattern has been observed in the directed differentiation of iPSCs to cardiomyocytes44, where a higher split ratio resulted in higher yields44. The yield per input cell might be an essential variable for optimization when working with scarce cells. However, since no candidate model accounted for the Allee effect45, which stipulates a minimum cell population needed for survival, extrapolating the results to higher split ratios is less reliable, meaning the model cannot suggest split ratios above 1:5.

Another in silico experiment was performed to investigate the effect of daily media changes on growth dynamics. Daily media changes prevent nutrient depletion and metabolite buildup46. Furthermore, the stability of small molecules and growth factors used in directed differentiation protocols is a concern because BMP4 and retinoic acid have limited half-lives under normal cell culture conditions, especially in serum-free media47. Because only 8% of added all-trans retinoic acid remains after 24 h of incubation with cells48, daily media changes help replenish this critical factor. The model predicts that, on average, daily media replacement nearly doubles LP density on day 15 relative to no media change, without significantly affecting population ratios. One possible explanation is that daily media changes improve differentiation efficiency by continually replenishing the small molecules required for directed differentiation. This analysis is consistent with experimental data, illustrating how the model can be used to study the effect of the biochemical environment on culture dynamics. Using the in silico model, the media change frequency and ratio (the fraction of media changed) can be explored, optimized, and prioritized for experimental testing.

As mentioned in the Model inference section, none of the inferred models with maximum density, nmax, were identifiable. This might be because the populations were too small for this effect to appear. Future experiments could focus on split ratios below 1:2, the lower bound used in this paper. Also, the existence of a minimum population for the AFE to successfully differentiate into LPs, the Allee effect, can be explored49, needing experiments with split ratios higher than 1:5. Note that the model developed in this paper incorporated data from experiments with day 10 passaging. Future model inferences can include data from experiments without day 10 passaging to quantify the effect of passaging and further generalize the model. These would enable the inferred model to cover a broader experimental range and facilitate the search for a global optimum in split ratios.

The inferred model can be improved to become more descriptive by including a ventral anterior foregut endoderm (vAFE) transition state. Expression of NKX2-1, PAX1, and NKX2-5 can be used to indicate the population of vAFE4. Then, mathematical models similar to the M2 lineage model from the authors17 can be calibrated, and a similar model selection protocol can be utilized.

LPs are developmentally immature primordial cells, the first cells expressing biomarkers specific to the lungs1. There is a wealth of research on establishing protocols for producing iPSC-derived LPs1,5,50,51. It is shown that WNT (activated using CHIR99021), RA, and BMP4 are critical to lung specification6. Due to wide adoption by the regenerative medicine community and the utilization of CHIR, BMP4, and RA in many protocols for later-stage cells, e.g., type I and II alveolar epithelial6, airway organoids2, and purified basal cells52, this paper focused on differentiation to LPs using these small molecules. Future studies could incorporate cell signaling pathways by considering more states and observables representing biomarkers, inhibitors, catalysts, and proteins, thereby increasing the model’s explainability and its value for protocol optimization. Such model inference would require additional measurements from the newly defined observables to provide an identifiable model. This is done by conducting experiments with varying nutrient and growth factor levels and media replacement periods53,54.

In summary, this study highlights the potential utility of integrating in silico modeling to optimize AFE differentiation protocols. The model offers a refined approach for enhancing the production of LPs from AFE cells by quantifying the impact of key experimental parameters such as media refreshment and split ratios. This procedure is intended to support reproducibility and efficiency in generating clinically relevant cell populations.

Methods

Experimental setup

iPSCs were maintained as colonies on hESC-Qualified Matrigel (Corning, cat. no. 354277)-coated 6-well plates. Prior to differentiation, cells were first passaged as single cells onto Corning® Matrigel®-coated 12-well plates and cultured in the iPSC maintenance medium (mTeSR-1, StemCell Technologies, cat. no. 85850) for 24 hours with Y-27632 (10 μM, StemCell Technologies, cat. no. 72304). Then, the cells were cultured with STEMdiff™ Definitive Endoderm Kit (StemCell Technologies, cat. no. 05110) for 72 h.

On day 3, cells were dissociated as clumps using Gentle Cell Dissociation Reagent (GCDR; StemCell Technologies, cat. no. 07174) and passaged onto Corning® Matrigel®-coated 6-well plates and cultured in DS/SB (cSFDM with 2 × 10⁻⁶ mol L⁻¹ dorsomorphin (Tocris, cat. no. 3093) and 10 × 10⁻⁶ mol L⁻¹ SB431542 (Tocris, cat. no. 1614)) with Y-27632 (10 μM, StemCell Technologies, cat. no. 72304) medium for 24 h. Complete serum-free differentiation media (cSFDM) consisted of Iscove’s Modified Dulbecco’s Medium (IMDM; Gibco, cat. no. 12440053) and Ham’s F-12 (Gibco, cat. no. 11765054) supplemented with 0.5x B-27 (Invitrogen, cat. no. 17504001), 0.5x N-2 (Invitrogen, cat. no. 17502-048), 50 μg mL⁻¹ ascorbic acid (Sigma-Aldrich, cat. no. A4544), 500 μg mL⁻¹ monothioglycerol (Sigma-Aldrich, cat. no. M6145), 0.056% bovine albumin fraction V (Thermo Fisher, cat. no. 15260037), 1x Glutamax (Thermo Fisher, cat. no. 35050-061), and 50 μg mL⁻¹ Primocin (Invivogen, cat. no. ant-pm-2). Subsequently, they were incubated for another 48 h in DS/SB medium without Y-27632 at 37 °C.

On day 6, the culture medium was switched to CBRa (cSFDM with 3 × 10⁻⁶ mol L⁻¹ CHIR99021 (Tocris, cat. no. 4423), 10 ng mL⁻¹ recombinant human bone morphogenetic protein 4 (BMP4; R&D Systems, cat. no. 314-BP-050), and 50 × 10⁻⁹ mol L⁻¹ retinoic acid (RA; Sigma-Aldrich, cat. no. R2625)) medium. When the cells were confluent (typically at day 10), they were passaged at split ratios of 1:2 and 1:5 into fresh Matrigel-coated 6-well plates containing CBRa medium and were incubated until day 15. Two culture conditions, one without and one with daily media refreshment, were run to observe the effect of the biochemical environment on the induction of lung progenitors (Fig. 1A).

Each day, media samples of 160 μL were taken from all wells and the glucose and lactate concentrations were measured using RAPIDPoint 500 Blood Gas Systems (Siemens Healthcare Limited, Canada). Then, one well per condition was harvested daily to measure total live, AFE, and LP populations. The measurements for each model, along with the time points at which they were collected, are shown in Supplementary Table 1. Note that the data collection frequency is determined by the model-based design of experimental protocols.

The wells were first rinsed with PBS (−/−). Cells were then treated with 0.05% Trypsin-EDTA (Wisent, cat. no. 325-542-CL) and incubated at 37 °C for 3 min. The empty plate was washed with Dulbecco’s Modified Eagle Medium (DMEM; Wisent, cat. no. 319-005-CL) supplemented with 10% fetal bovine serum (FBS; Thermo Fisher, cat. no. 12483020) and 1% penicillin/streptomycin (Wisent, cat. no. 450-201-EL), which was then combined with the trypsinized cells. The cell mixture was centrifuged at 300 g for 5 min. The cell pellet was resuspended in DMEM with 10% FBS and 1% penicillin/streptomycin. A 20 μL aliquot of the resuspended cells was used for cell counting with a hemocytometer. Trypan blue (Gibco, cat. no. 15250061) was used to identify and count dead cells.

Model proposal

The mathematical model incorporates two main components, the structural, g, and error, ϵ, models. The two models are used to define the experimental observations, z, as,

$$z(Y,\Theta,\xi,u) = g(Y,\Theta,u) + \epsilon(Y,\Theta,\xi,u)\,\eta, \tag{4}$$

where η is the vector of normalized residuals, which are assumed to be independent random variables drawn from a Gaussian distribution with zero mean and unit standard deviation18. Also, Θ, ξ, Y, and u are vectors containing structural parameters, error parameters, state variables, and external stimuli, respectively. The structural and error models are discussed further below.

Structural models

The structural model is composed of lineage and growth models. Lineage models focus on the cellular states and processes, while the growth models focus on the growth rates affected by different stimuli. Here, lineage models are discussed, followed by growth models.

As seen in Fig. 1B, we investigate two potential dynamics for the population. The populations that are observed in the models are the AFE population denoted by na, the NKX2-1+ LP population represented by np, and the total live cells population defined as nq = na + np. The population of NKX2-1+ LPs is indicated with the expression of NKX2-1 GFP marker1. The uncertainty of the biological system characterizes this stage of the directed differentiation protocol (Fig. 1A), which is exacerbated by data sparsity due to the high cost of running experiments with daily measurements. Hence, the mathematical models cannot be high-dimensional with respect to the number of their population states.

The model also incorporates the per capita death, dj, and proliferation rates, bj, for each population, nj. It also includes the per capita differentiation rate, pjj′, between one population, nj, and another, nj′. Two cellular processes, proliferation and differentiation, are affected by biochemical concentrations, specifically glucose, cg, and lactate, cl, in addition to cell populations. The rates are defined as,

$$b_j(N(t),C(t)) = b_j(n_j,c_g,c_l), \qquad p_{jj'}(N(t),C(t)) = p_{jj'}(n_j,c_g,c_l), \qquad d_j(N(t),C(t)) = \delta_j. \tag{5}$$

Here, all the populations and the concentrations are contained in two vectors, N(t) and C(t), respectively. As seen in the equation above, we defined the per capita death rates as constant to make the proposed mathematical models structurally identifiable.

Lineage model M0 (Fig. 1B) is defined as,

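$$
\begin{aligned}
\frac{dn_q}{dt}(t) &= b_q(n_q,c_g,c_l)\,n_q(t) - \delta_q\,n_q(t), \\
\frac{dc_g}{dt}(t) &= -V_g\,n_q\,\frac{c_g}{c_g+\bar{c}_g}, \\
\frac{dc_l}{dt}(t) &= V_l\,n_q\,\frac{c_l}{c_l+\bar{c}_l}.
\end{aligned}
\tag{6}
$$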

This model has one population, the total population, and two concentrations, glucose and lactate concentrations. Michaelis-Menten kinetics (MMK) govern glucose and lactate dynamics, with Vg and c̄g (Vl and c̄l) being the limiting rates and the half-saturating constants for glucose (lactate)55. MMK models enzyme-limited systems, which is consistent with cellular metabolism stages such as glucose uptake56, hexokinase reaction57, and cytochrome-c oxidase activity58, and is widely used in cell population dynamics models59–61. The average proliferation rate is defined as,

$$b_q(n_q,c_g,c_l) = \beta_q\,f(n_q,c_g,c_l). \tag{7}$$

Since only the total count is used in this analysis, a differentiation term is not applicable here, as it would not affect the total population.

The more detailed model, lineage model M1, has two different populations, AFE and LPs. They are shown as,

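$$
\begin{aligned}
\frac{dn_a}{dt}(t) &= b_a(n_a,c_g,c_l)\,n_a(t) - \delta_a\,n_a(t) - p_{ap}(n_a,c_g,c_l)\,n_a(t), \\
\frac{dn_p}{dt}(t) &= b_p(n_p,c_g,c_l)\,n_p(t) - \delta_p\,n_p(t) + p_{ap}(n_a,c_g,c_l)\,n_a(t), \\
\frac{dc_g}{dt}(t) &= -V_g\,(n_a+n_p)\,\frac{c_g}{c_g+\bar{c}_g}, \\
\frac{dc_l}{dt}(t) &= V_l\,(n_a+n_p)\,\frac{c_l}{c_l+\bar{c}_l}.
\end{aligned}
\tag{8}
$$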

This model assumes the population of other cell types, e.g., definitive endoderm or other by-products, is negligible. Also, it assumes that LPs do not differentiate into later-stage cells; our experimental observations confirm this. Cellular processes are defined here as,

$$b_a(n_a,c_g,c_l) = \beta_a\,f(n_a,c_g,c_l), \qquad p_{ap}(n_a,c_g,c_l) = 2\,(1-p_{ap})\,b_a(n_a,c_g,c_l), \qquad b_p(n_p,c_g,c_l) = \beta_p\,f(n_p,c_g,c_l). \tag{9}$$

As seen above, βa and βp are the maximum proliferation rates for AFE and LPs. Also, AFE proliferates at the rate βaf(na, cg, cl), and each division results in two daughter stem cells with probability pap or two differentiated cells with probability (1 − pap), assuming only symmetric division62,63. On the right-hand side of Eq. (9), pap without arguments denotes this constant probability, whereas pap(na, cg, cl) on the left-hand side is the resulting differentiation rate. Note that the lineage models are designed so that their states, na, np, nq, cg, and cl, are measured in the experiments (Supplementary Table 1).
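To make the structure of lineage model M1 concrete, the sketch below integrates Eqs. (8) and (9) with a fixed-step Runge-Kutta scheme, assuming exponential growth and the combined glucose and lactate factor of Eq. (12). All parameter values, initial conditions, and function names are hypothetical placeholders chosen only so the script runs; they are not the inferred values, and the solver is not the one used by the authors.

```python
def env_effect(c_g, c_l, K_g, K_l):
    """GluLac case of Eq. (12): glucose promotes growth, lactate inhibits it."""
    return c_g / (K_g + c_g) * K_l / (K_l + c_l)


def m1_rhs(state, p):
    """Right-hand side of lineage model M1, Eq. (8), with exponential growth."""
    n_a, n_p, c_g, c_l = state
    f_env = env_effect(c_g, c_l, p["K_g"], p["K_l"])
    b_a = p["beta_a"] * f_env                  # AFE proliferation rate, Eq. (9)
    b_p = p["beta_p"] * f_env                  # LP proliferation rate, Eq. (9)
    rate_ap = 2.0 * (1.0 - p["p_ap"]) * b_a    # AFE -> LP differentiation rate, Eq. (9)
    n_q = n_a + n_p
    return (
        b_a * n_a - p["delta_a"] * n_a - rate_ap * n_a,
        b_p * n_p - p["delta_p"] * n_p + rate_ap * n_a,
        -p["V_g"] * n_q * c_g / (c_g + p["cbar_g"]),
        p["V_l"] * n_q * c_l / (c_l + p["cbar_l"]),
    )


def rk4(rhs, state, p, t_end, dt):
    """Integrate from t = 0 to t_end with the classical fourth-order Runge-Kutta scheme."""
    for _ in range(round(t_end / dt)):
        k1 = rhs(state, p)
        k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), p)
        k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), p)
        k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)), p)
        state = tuple(
            s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


# Hypothetical placeholders (rates per day, concentrations in arbitrary units).
params = {
    "beta_a": 0.8, "beta_p": 0.1, "delta_a": 0.05, "delta_p": 0.05, "p_ap": 0.6,
    "K_g": 5.0, "K_l": 20.0, "V_g": 0.01, "V_l": 0.01, "cbar_g": 1.0, "cbar_l": 1.0,
}
state0 = (190.0, 68.0, 17.5, 2.0)  # (n_a, n_p, c_g, c_l) at model day 0

n_a, n_p, c_g, c_l = rk4(m1_rhs, state0, params, t_end=4.0, dt=0.01)
print(f"day 4: n_a={n_a:.1f}, n_p={n_p:.1f}, c_g={c_g:.2f}, c_l={c_l:.2f}")
```

A daily media change could be mimicked in such a sketch by resetting the two concentrations once per simulated day, although the exact treatment used in the study is not shown here.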

The one-state formulation, M0, is most appropriate when experimental observations are limited to aggregate live cell densities and extracellular metabolites. In such contexts, the primary objective is rapid forecasting of overall biomass or nutrient demand, and the parsimony of M0 facilitates structural and practical identifiability with sparse data59,60. On the other hand, the two-state formulation, M1, resolves the AFE and NKX2-1+ LP populations, and therefore requires lineage-specific readouts using flow cytometry facilitated by immunofluorescence staining or reporter lines. When these measurements are available, M1 affords mechanistic insight into the AFE to LP transition and enables optimization of differentiation efficiency62,64.

Per capita growth models are modulated by the respective cell population and the biochemical concentrations of glucose and lactate. The total per capita growth is defined as,

$$f(n_j,c_g,c_l) = F_{\mathrm{env}}(c_g,c_l)\,f_n(n_j), \tag{10}$$

where fn(nj) is the per capita growth rate. There are multiple ways to define it,

$$
f_n(n_j) =
\begin{cases}
1 & \text{Exponential}, \\
1 - \dfrac{n_j}{n_{\max}} & \text{Logistic}, \\
\log\dfrac{n_{\max}}{n_j} & \text{Gompertz}.
\end{cases}
\tag{11}
$$

In these equations, nmax is the maximum population caused by limited space. Exponential growth means there is no space limit on the growth, and logistic growth means that space causes a linear decrease in the per capita growth rate21.
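The three candidate laws of Eq. (11) can be written as one small function. This is an illustrative sketch; the function name and the string labels are hypothetical.

```python
import math


def f_n(n_j, law, n_max=None):
    """Population-dependent growth factor f_n(n_j) from Eq. (11)."""
    if law == "exponential":
        return 1.0                      # no space limitation
    if law == "logistic":
        return 1.0 - n_j / n_max        # linear decrease with density
    if law == "gompertz":
        return math.log(n_max / n_j)    # logarithmic decrease with density
    raise ValueError(f"unknown growth law: {law}")


# Both density-limited laws vanish when the population reaches n_max.
print(f_n(400.0, "logistic", n_max=400.0), f_n(400.0, "gompertz", n_max=400.0))
```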

The effect of the biochemical environment, Fenv(cg, cl), is defined as a multiplicative effect of each chemical substrate as below61.

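$$
\begin{aligned}
F_{\mathrm{env}}(c_g,c_l) &= f_1(c_g(t))\,f_2(c_l(t)), \\
\text{GluLac:}\quad F_{\mathrm{env}} &= \frac{c_g(t)}{K_g+c_g(t)} \cdot \frac{K_l}{K_l+c_l(t)}, \\
\text{Lac:}\quad F_{\mathrm{env}} &= 1 \cdot \frac{K_l}{K_l+c_l(t)}, \\
\text{Glu:}\quad F_{\mathrm{env}} &= \frac{c_g(t)}{K_g+c_g(t)} \cdot 1, \\
\text{NoEffect:}\quad F_{\mathrm{env}} &= 1 \cdot 1.
\end{aligned}
\tag{12}
$$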

Here, two models are assumed for the glucose effect, no effect or positive MMK, and two for the lactate effect, no effect or negative MMK60,65. This results in a total of 12 growth models, formed by all combinations of three population-controlled growth models (Eq. (11)) and four environmental effect models (Eq. (12)).
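The count of 12 growth models follows from combining the three laws of Eq. (11) with the four cases of Eq. (12). A short sketch of the environmental factor and of this enumeration is given below; the function name and labels are hypothetical.

```python
from itertools import product

GROWTH_LAWS = ("Exponential", "Logistic", "Gompertz")   # Eq. (11)
ENV_EFFECTS = ("GluLac", "Lac", "Glu", "NoEffect")      # Eq. (12)


def f_env(effect, c_g, c_l, K_g, K_l):
    """Environmental factor F_env of Eq. (12) for the chosen case."""
    glucose_term = c_g / (K_g + c_g) if effect in ("GluLac", "Glu") else 1.0
    lactate_term = K_l / (K_l + c_l) if effect in ("GluLac", "Lac") else 1.0
    return glucose_term * lactate_term


growth_models = list(product(GROWTH_LAWS, ENV_EFFECTS))
print(len(growth_models))  # 12
```

Multiplying by the two lineage models gives the 24 structural candidates, and by the three error models the 72 candidates mentioned in the Discussion.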

Error models

Two error functions, one for densities, ϵnj, and one for concentrations, ϵcj, are defined because these states possess different units. Also, three error model candidates are defined as follows,

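$$
\begin{aligned}
\text{Additive:}\quad & \epsilon_{n_j} = a_n, & \epsilon_{c_j} &= a_c, \\
\text{Proportional:}\quad & \epsilon_{n_j} = b_n\,g_j(Y,\Theta,u), & \epsilon_{c_j} &= b_c\,g_j(Y,\Theta,u), \\
\text{Combined:}\quad & \epsilon_{n_j} = a_n + b_n\,g_j(Y,\Theta,u), & \epsilon_{c_j} &= a_c + b_c\,g_j(Y,\Theta,u).
\end{aligned}
\tag{13}
$$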

Here, bn and bc are unitless, while an and ac have the units of density and concentration, respectively.

Note that this study requires error models since data standard deviations are unreliable due to the small number of replicates18. As shown by Eq. (13), the error models quantify the relationship between the standard deviations, ϵ, and the expected values, g. Individual error models are also justified since the density measurements include human-related errors, while the concentration measurements are affected by instrument errors but entirely independent of human errors. Introducing error models in the model definition necessitates the definition of a general loss function based on the likelihood function to infer the structural and error parameters simultaneously.
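As an illustration of how Eqs. (4) and (13) fit together, the sketch below computes the standard deviation for each error model and draws one noisy observation around a model prediction. The function names and numerical values are hypothetical.

```python
import random


def error_sd(kind, g, a=0.0, b=0.0):
    """Standard deviation epsilon for one state under Eq. (13).

    g is the model prediction; a has the units of g, b is unitless.
    """
    if kind == "additive":
        return a
    if kind == "proportional":
        return b * g
    if kind == "combined":
        return a + b * g
    raise ValueError(f"unknown error model: {kind}")


def synthetic_observation(g, sd, rng):
    """Draw z = g + epsilon * eta with eta ~ N(0, 1), as in Eq. (4)."""
    return g + sd * rng.gauss(0.0, 1.0)


rng = random.Random(0)        # fixed seed so the sketch is repeatable
prediction = 200.0            # hypothetical predicted density
sd = error_sd("combined", prediction, a=10.0, b=0.05)
print(synthetic_observation(prediction, sd, rng))
```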

Objective function definition

Parameter estimation using experimental data is done by maximizing the likelihood function, L. Model parameters consist of structural parameters and error parameters and are mathematically defined as Ψ = [Θ, ξ]. When dealing with independent observations, zi, the likelihood function is simplified to the multiplication of probability density functions, p, as,

$$L(\Theta,\xi;z) = \prod_{i=1}^{n} p(\Theta,\xi;z_i). \tag{14}$$

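Under our assumption of independent and identically distributed normalized residuals, η ~ N(0, I), Eq. (14) further simplifies to obtaining the inferred structural and error parameters, Θ* and ξ*, by minimizing the negative log-likelihood, −ln L, as,

$$
\begin{aligned}
(\Theta^*,\xi^*) &= \arg\min_{\Theta,\xi}\; -2\ln L(\Theta,\xi;S_T) \\
&= \arg\min_{\Theta,\xi}\; \sum_{g_j\in g}\,\sum_{Z_{jk}\in S_T} \epsilon_j^{-2}(\Theta,\xi,t_k)\,\bigl(Z_{jk}-g_j(\Theta,t_k)\bigr)^2 + \sum_{g_j\in g}\,\sum_{Z_{jk}\in S_T} 2\ln\bigl(\epsilon_j(\Theta,\xi,t_k)\bigr) + \sum_{g_j\in g}\,\sum_{Z_{jk}\in S_T} \ln(2\pi).
\end{aligned}
\tag{15}
$$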

Here, gj is the model prediction, Zj is the experimental measurement, ln is natural logarithm, and ST is the training dataset10,16. In the case of M0, the total population and the two concentrations are employed in Eq. (15), while for M1, each of the two distinctive populations and the concentrations are used.
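The objective of Eq. (15) is the usual Gaussian negative log-likelihood (times two) with a state-dependent standard deviation. A minimal sketch for flat lists of measurements, predictions, and standard deviations follows; the function name is hypothetical.

```python
import math


def neg2_log_likelihood(observed, predicted, sd):
    """Sum over all data points of the three terms in Eq. (15)."""
    total = 0.0
    for z, g, eps in zip(observed, predicted, sd):
        total += ((z - g) / eps) ** 2      # weighted squared residual
        total += 2.0 * math.log(eps)       # penalty on large standard deviations
        total += math.log(2.0 * math.pi)   # constant term
    return total


# A perfect fit with unit standard deviation leaves only the constant term.
print(neg2_log_likelihood([1.0, 2.0], [1.0, 2.0], [1.0, 1.0]))
```

The second term is what allows the error parameters to be inferred together with the structural parameters: inflating ϵ reduces the first term but is penalized by the logarithm.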

Structural identifiability analysis

A mathematical model is structurally identifiable if it has a unique set of parameters that corresponds to a set of unlimited noiseless observations19–21. Mathematically, if two valid sets of structural model parameters are Θ and Φ, then,

$$g(Y,\Theta,u) = g(Y,\Phi,u) \;\Longrightarrow\; \Theta = \Phi. \tag{16}$$

Structural identifiability analysis focuses on the relationship between the model equations and the measured states and is independent of the experimental data16. The analysis entails differential algebra techniques to derive input-output equations from the model. This leads to constructing a symbolic identifiability matrix, and if the matrix is of full rank, it would indicate the structural identifiability of the model22,66. In the model development procedure, structurally unidentifiable models are discarded or modified10,67.

Model-based design of experimental protocols

We used model-based design of experimental protocols (MBDEP) to determine the sampling period needed to infer an accurate model, as indicated by an error measure that represents the relative distance between the inferred and assumed parameters. Using Eq. (4) and choosing specific structural and error models with their corresponding parameters (the assumed parameters, [Θ̂, ξ̂]), synthetic data can be generated. Solving Eq. (15) with the generated noisy synthetic data yields the inferred structural parameters, Θ*. We define e as the relative distance between the assumed parameters, Θ̂, i.e., the parameters used to produce the synthetic data, and the inferred parameters, Θ*, using the equation below,

$$e = \frac{\lVert \hat{\Theta} - \Theta^* \rVert}{\lVert \hat{\Theta} \rVert}. \tag{17}$$

The procedure was repeated to perform a grid search at various sampling intervals, with the sampling interval that minimizes the error, e, chosen for the experiments.
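The error measure of Eq. (17) can be sketched as follows. The Euclidean norm is an assumption made here for illustration, since the norm is not specified in this section, and the parameter vectors in the example are hypothetical.

```python
import math


def relative_error(assumed, inferred):
    """Relative distance e between assumed and inferred parameter vectors, Eq. (17).

    Assumption: distances are measured with the Euclidean norm.
    """
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(assumed, inferred)))
    scale = math.sqrt(sum(a ** 2 for a in assumed))
    return diff / scale


# Hypothetical assumed and inferred parameter vectors.
print(relative_error([0.8, 0.1, 0.05], [0.75, 0.12, 0.05]))
```

In a grid search over sampling intervals, this value would be computed once per candidate interval after refitting to synthetic data, and the interval with the smallest e retained.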

Model inference

A version of adaptive differential evolution (ADE), formally known as DE/rand/1/bin with radius-limited sampling68,69, was used for each optimization run (BlackBoxOptim.jl70). ADE methods have demonstrated exceptional capabilities in handling nonlinear, multimodal, and constrained global optimization problems71,72. Since ADE does not guarantee local convergence with a given error bound, the solution from ADE is taken as an initial guess for a local optimizer (Nelder-Mead73,74) until convergence to the specified tolerance, 10⁻⁶, is achieved. Note that the choice of Nelder-Mead was made after testing multiple alternatives, including gradient-based methods such as BFGS75 and Adam76, in this specific use case.

The selection dataset, SS, is used to calculate the Bayesian information criterion (BIC), defined as,

$$\mathrm{BIC} = k\ln(m_s) - 2\ln L(\Theta,\xi;S_S), \tag{18}$$

where k and ms are the number of inferred parameters and the number of observations in the selection dataset, respectively. This measure is calculated for all candidate models at their inferred parameter values, and minimizing it strikes a balance between model complexity and prediction accuracy77.
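A minimal sketch of the criterion in Eq. (18) is shown below, with the log-likelihood passed in as a number. The candidate names and values are hypothetical and serve only to show how the minimum is selected.

```python
import math


def bic(n_params, n_obs, log_likelihood):
    """Bayesian information criterion, Eq. (18): k ln(m_s) - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical candidates: (number of parameters, log-likelihood on the selection set).
candidates = {"model_A": (6, -120.0), "model_B": (9, -118.5)}
n_obs = 40
scores = {name: bic(k, n_obs, ll) for name, (k, ll) in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In this made-up case the richer model fits slightly better, but not by enough to offset the penalty for its three extra parameters.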

Practical identifiability analysis

Practical identifiability extends structural identifiability analysis by incorporating experimental data into the assessment of the uniqueness of the inferred parameters. Here, one model parameter, ψi, is fixed at a value, and the likelihood is maximized over the remaining parameters (ψj, ∀ j ≠ i) in Eq. (15), i.e.,

$$\mathrm{PL}(\psi_i) = \max_{\psi_{j\neq i}} \ln L(\Psi). \tag{19}$$

The likelihood profile is generated by repeating this for different values of the fixed parameter ψi, leading to the confidence interval for each parameter,

$$\mathrm{CI}_{\mathrm{PL}}(\psi_i) = \bigl\{\psi_i \mid \mathrm{PL}(\psi_i) \geq \ln L_{\max} - \Delta_\alpha\bigr\}, \tag{20}$$

where ln L_max and Δα are the maximum log-likelihood and the α-quantile of the χ2 distribution with one degree of freedom, respectively16. Finite confidence intervals signal practically identifiable parameters32,78.
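Once a likelihood profile has been computed on a grid of fixed parameter values, the interval of Eq. (20) can be read off as sketched below. The inner maximization of Eq. (19) is not performed here: the profile is a toy quadratic stand-in, and the function name and the threshold value are hypothetical.

```python
def profile_interval(grid, profile, delta):
    """Return (lower, upper, bounded) for the set in Eq. (20).

    grid: fixed values of psi_i; profile: PL(psi_i) at those values;
    delta: the threshold Delta_alpha. 'bounded' is False when the accepted
    set touches an edge of the grid, i.e. no finite interval is resolved.
    """
    best = max(profile)
    accepted = [psi for psi, pl in zip(grid, profile) if pl >= best - delta]
    lower, upper = min(accepted), max(accepted)
    bounded = lower > min(grid) and upper < max(grid)
    return lower, upper, bounded


grid = [0.5 * i for i in range(9)]              # 0.0, 0.5, ..., 4.0
profile = [-(psi - 2.0) ** 2 for psi in grid]   # toy profile peaked at psi = 2
print(profile_interval(grid, profile, delta=1.0))
```

A flat profile would return an interval spanning the whole grid with `bounded` set to False, which corresponds to a practically unidentifiable parameter over the explored range.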

Goodness of fit

The inferred model is validated on the validation (hold-out) dataset, SV, using the root-mean-squared error (RMSE).

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{g_j\in g}\,\sum_{Z_{jk}\in S_V}\bigl(Z_{jk}-g_j(\Theta,t_k)\bigr)^2}. \tag{21}$$

In this equation, nz, nk, and m are the number of data points in each state, the number of states, and the total number of data points, respectively79. Further, Normalized RMSE can be defined as,

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\max(Z)-\min(Z)}, \tag{22}$$

where Z denotes the experimental data vector80.
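Equations (21) and (22) can be sketched for flat lists of measurements and predictions as follows; the function names and the example values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared error over all m data points, Eq. (21)."""
    m = len(observed)
    return math.sqrt(sum((z - g) ** 2 for z, g in zip(observed, predicted)) / m)


def nrmse(observed, predicted):
    """RMSE normalized by the range of the experimental data, Eq. (22)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))


# Hypothetical measurements and predictions.
z = [100.0, 200.0, 300.0]
g = [110.0, 190.0, 300.0]
print(rmse(z, g), nrmse(z, g))
```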

Global sensitivity analysis

In this work, global sensitivity analysis (GSA) is employed to rank model parameters based on their impact on predictions. Variance-based Sobol’s method is a GSA method that decomposes nonlinear continuous functions into a set of integrals as,

$$Y_k = f_0 + \sum_{i=1}^{d} f_i(X_i) + \sum_{i<j}^{d} f_{ij}(X_i,X_j) + \dots + f_{1,2,\dots,d}(X_1,X_2,\dots,X_d). \tag{23}$$

These integrals increase in dimensionality, representing the overall mean, f0, main effects, fi (Xi), and interactions of increasing order between variables35,81. Here, fij (Xi,Xj) is the first-order interaction between Xi and Xj. Note that d is the size of X, which is a vector that includes the inputs under study, i.e., model parameters, initial conditions, or boundary conditions.

The bounds for sensitivity analysis were calculated using exp (log(θ*)±log(l))=[θ*/l,θ*l] to have θ* at the center of the log-scaled search space. Note that θ* stands for the inferred parameter values. The span for the bounds was chosen as l = 1.1. For a more in-depth look into the methodology (Fig. 1C), refer to previous works by the authors16,17.
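The bound construction described above amounts to dividing and multiplying each inferred value by the span l, which places θ* at the geometric center of the interval. A short sketch follows; the function name and the example value are hypothetical, and positive parameter values are assumed so that the logarithm is defined.

```python
import math


def gsa_bounds(theta_star, span=1.1):
    """Bounds exp(log(theta*) -/+ log(l)) = [theta*/l, theta* l] for theta* > 0."""
    lower = math.exp(math.log(theta_star) - math.log(span))
    upper = math.exp(math.log(theta_star) + math.log(span))
    return lower, upper


lo, hi = gsa_bounds(2.0)
# theta* is the geometric mean of the bounds, i.e. the center in log scale.
print(lo, hi, math.sqrt(lo * hi))
```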

Acknowledgements

This study is funded in part by a Collaborative Health Research Project (CHRP) grant provided by the Canadian Institutes of Health Research in partnership with the Natural Sciences and Engineering Research Council (158270 to C.H.A. and T.W.). The study is also supported by the New Frontiers in Research Fund Transformation stream (NFRFT-2020-00787 to T.W., C.H.A., G.K., and NFRFT-2022-00447 to C.H.A.), and the University of Toronto’s Medicine by Design initiative, which receives funding from the Canada First Research Excellence Fund (to C.A. and T.W.).

Author contributions

C.A., D.R., T.W., and G.K. designed and supervised the research. A.M. and D.B. performed research and analyzed data. A.M., D.R., and D.B. took the lead in writing the manuscript. All authors provided critical feedback and helped shape the research and the final manuscript.

Data availability

All data and code used to generate the results in this manuscript are available through https://github.com/amostof/inSilicoAFEPaper.

Code availability

All data and code used to generate the results in this manuscript are available through https://github.com/amostof/inSilicoAFEPaper.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41540-026-00650-1.

References

  • 1. Hawkins, F. et al. Prospective isolation of NKX2-1–expressing human lung progenitors derived from pluripotent stem cells. J. Clin. Investig. 127, 2277–2294 (2017).
  • 2. McCauley, K. B. et al. Efficient derivation of functional human airway epithelium from pluripotent stem cells via temporal regulation of Wnt signaling. Cell Stem Cell 20, 844–857 (2017).
  • 3. Spence, J. R. et al. Directing differentiation of pluripotent stem cells toward somatic progenitors and functional tissue units. FASEB J. 25, 3775–3785 (2011).
  • 4. Green, M. D. et al. Generation of anterior foregut endoderm from human embryonic and induced pluripotent stem cells. Nat. Biotechnol. 29, 267–272 (2011).
  • 5. Jacob, A. et al. Differentiation of human pluripotent stem cells into functional lung alveolar epithelial cells. Cell Stem Cell 21, 472–488 (2017).
  • 6. Jacob, A. et al. Derivation of self-renewing lung alveolar epithelial type II cells from human pluripotent stem cells. Nat. Protoc. 14, 3303–3332 (2019).
  • 7. Yuan, H. et al. Scalable expansion of human pluripotent stem cells under suspension culture condition with human platelet lysate supplementation. Front. Cell Dev. Biol. 11, 1280682 (2023).
  • 8. Venkatesan, M. et al. Recombinant production of growth factors for application in cell culture. iScience 25, 105054 (2022).
  • 9. Möller, J. & Pörtner, R. Digital twins for tissue culture techniques-concepts, expectations, and state of the art. Processes 9, 447 (2021).
  • 10. Villaverde, A. F., Pathirana, D., Fröhlich, F., Hasenauer, J. & Banga, J. R. A protocol for dynamic model calibration. Brief. Bioinforma. 23, bbab387 (2022).
  • 11. Geris, L., Lambrechts, T., Carlier, A. & Papantoniou, I. The future is digital: in silico tissue engineering. Curr. Opin. Biomed. Eng. 6, 92–98 (2018).
  • 12. Hurley, K. et al. Reconstructed single-cell fate trajectories define lineage plasticity windows during differentiation of human PSC-derived distal lung progenitors. Cell Stem Cell 26, 593–608 (2020).
  • 13. Engle, S. J. & Vincent, F. Small molecule screening in human induced pluripotent stem cell-derived terminal cell types. J. Biol. Chem. 289, 4562–4570 (2014).
  • 14. Bock, C. et al. Reference maps of human ES and iPS cell variation enable high-throughput characterization of pluripotent cell lines. Cell 144, 439–452 (2011).
  • 15. Varghese, B., Ling, Z. & Ren, X. Reconstructing the pulmonary niche with stem cells: a lung story. Stem Cell Res. Ther. 13, 161 (2022).
  • 16. Mostofinejad, A. et al. In silico model development and optimization of in vitro lung cell population growth. PLOS ONE 19, 1–27 (2024).
  • 17. Mostofinejad, A. et al. In silico modeling of directed differentiation of induced pluripotent stem cells to definitive endoderm. PLOS Comput. Biol. 21, 1–30 (2025).
  • 18. Lavielle, M. Mixed Effects Models for the Population Approach: Models, Tasks, Methods and Tools, 1st edn. https://doi.org/10.1201/b17203 (Chapman and Hall/CRC, 2014).
  • 19. Walter, E. Identifiability of Parametric Models (Elsevier, 2014).
  • 20. Salmaniw, Y. & Browning, A. P. Structural identifiability of linear-in-parameter parabolic PDEs through auxiliary elliptic operators. J. Math. Biol. 91, 4 (2025).
  • 21. Simpson, M. J., Browning, A. P., Warne, D. J., Maclaren, O. J. & Baker, R. E. Parameter identifiability and model selection for sigmoid population growth models. J. Theor. Biol. 535, 110998 (2022).
  • 22. Dong, R., Goodbrake, C., Harrington, H. A. & Pogudin, G. Differential elimination for dynamical models via projections with applications to structural identifiability. J. Applied Algebra Geometry 7, 194–235 (2023).
  • 23. Rackauckas, C. & Nie, Q. DifferentialEquations.jl–a performant and feature-rich ecosystem for solving differential equations in Julia. J. Open Research Software 5, 15 (2017).
  • 24. Bezanson, J., Edelman, A., Karpinski, S. & Shah, V. B. Julia: A fresh approach to numerical computing. SIAM Rev. 59, 65–98 (2017).
  • 25. Vallat, R. Pingouin: statistics in Python. J. Open Source Softw. 3, 1026 (2018).
  • 26. Stein, M. Large sample properties of simulations using Latin hypercube sampling. Technometrics 29, 143–151 (1987).
  • 27. Moles, C. G., Mendes, P. & Banga, J. R. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res. 13, 2467–2474 (2003).
  • 28. Gábor, A. & Banga, J. R. Robust and efficient parameter estimation in dynamic models of biological systems. BMC Syst. Biol. 9, 74 (2015).
  • 29. Wieland, F.-G., Hauber, A. L., Rosenblatt, M., Tönsing, C. & Timmer, J. On structural and practical identifiability. Curr. Opin. Syst. Biol. 25, 60–69 (2021).
  • 30. Raue, A., Karlsson, J., Saccomani, M. P., Jirstrand, M. & Timmer, J. Comparison of approaches for parameter identifiability analysis of biological systems. Bioinformatics 30, 1440–1448 (2014).
  • 31. VandenHeuvel, D. J. ProfileLikelihood.jl (2023).
  • 32. Simpson, M. J. & Maclaren, O. J. Profile-wise analysis: a profile likelihood-based workflow for identifiability analysis, estimation, and prediction with mechanistic mathematical models. PLoS Comput. Biol. 19, e1011515 (2023).
  • 33. Farshidfar, S. S. et al. Towards a validated musculoskeletal knee model to estimate tibiofemoral kinematics and ligament strains: comparison of different anterolateral augmentation procedures combined with isolated ACL reconstructions. Biomed. Eng. OnLine 22, 31 (2023).
  • 34. El Wajeh, M. et al. Can the Kuznetsov model replicate and predict cancer growth in humans? Bull. Math. Biol. 84, 130 (2022).
  • 35. Sobol, I. M. Sensitivity analysis for non-linear mathematical models. Math. Model. Comput. Exp. 1, 407–414 (1993).
  • 36. Bates, D. et al. JuliaStats/GLM.jl: v1.9.0. https://doi.org/10.5281/zenodo.8345558 (2023).
  • 37. Liu, Q. et al. Advances in the application of bone morphogenetic proteins and their derived peptides in bone defect repair. Compos. Part B: Eng. 262, 110805 (2023).
  • 38. Fernandes, R., Barbosa-Matos, C., Borges-Pereira, C., Carvalho, A. L. R. T. d & Costa, S. Glycogen synthase kinase-3 inhibition by CHIR99021 promotes alveolar epithelial cell proliferation and lung regeneration in the lipopolysaccharide-induced acute lung injury mouse model. Int. J. Mol. Sci. 25, 1279 (2024).
  • 39. Wilson, H. K., Canfield, S. G., Hjortness, M. K., Palecek, S. P. & Shusta, E. V. Exploring the effects of cell seeding density on the differentiation of human pluripotent stem cells to brain microvascular endothelial cells. Fluids Barriers CNS 12, 1–12 (2015).
  • 40. McBeath, R., Pirone, D. M., Nelson, C. M., Bhadriraju, K. & Chen, C. S. Cell shape, cytoskeletal tension, and RhoA regulate stem cell lineage commitment. Dev. Cell 6, 483–495 (2004).
  • 41. Peerani, R. et al. Niche-mediated control of human embryonic stem cell self-renewal and differentiation. EMBO J. 26, 4744–4755 (2007).
  • 42. Huang, H., Ye, K. & Jin, S. Cell seeding strategy influences metabolism and differentiation potency of human induced pluripotent stem cells into pancreatic progenitors. Biotechnol. J. 20, e70022 (2025).
  • 43. Ptasinski, V. et al. Modeling fibrotic alveolar transitional cells with pluripotent stem cell-derived alveolar organoids. Life Sci. Alliance 6 (2023).
  • 44. Burridge, P. W., Holmström, A. & Wu, J. C. Chemically defined culture and cardiomyocyte differentiation of human pluripotent stem cells. Curr. Protoc. Hum. Genet. 87, 21–3 (2015).
  • 45. Stephens, P. A., Sutherland, W. J. & Freckleton, R. P. What is the Allee effect? Oikos 87, 185–190 (1999).
  • 46. Masters, J. R. & Stacey, G. N. Changing medium and passaging cell lines. Nat. Protoc. 2, 2276–2284 (2007).
  • 47. Hong, P., Boyd, D., Beyea, S. D. & Bezuhly, M. Enhancement of bone consolidation in mandibular distraction osteogenesis: a contemporary review of experimental studies involving adjuvant therapies. J. Plast. Reconstruct. Aesthetic Surg. 66, 883–895 (2013).
  • 48. Sharow, K. A., Temkin, B. & Asson-Batres, M. A. Retinoic acid stability in stem cell cultures. Int. J. Dev. Biol. 56, 273–278 (2012).
  • 49. Charlebois, D. A. & Balázsi, G. Modeling cell population dynamics. In Silico Biol. 13, 21–39 (2019).
  • 50. Longmire, T. A. et al. Efficient derivation of purified lung and thyroid progenitors from embryonic stem cells. Cell Stem Cell 10, 398–411 (2012).
  • 51. Huang, S. X. et al. Efficient generation of lung and airway epithelial cells from human pluripotent stem cells. Nat. Biotechnol. 32, 84 (2014).
  • 52. Suzuki, S. et al. Differentiation of human pluripotent stem cells into functional airway basal stem cells. STAR Protoc. 2, 100683 (2021).
  • 53. Myers, P. J., Lee, S. H. & Lazzara, M. J. Mechanistic and data-driven models of cell signaling: Tools for fundamental discovery and rational design of therapy. Curr. Opin. Syst. Biol. 28, 100349 (2021).
  • 54. Pir, P. & Le Novère, N. Mathematical Models of Pluripotent Stem Cells: At the Dawn of Predictive Regenerative Medicine, 331–350 (Springer New York, 2016).
  • 55. Ingalls, B. P. Mathematical Modeling in Systems Biology: an Introduction (MIT Press, 2013).
  • 56. Nishimura, H. et al. Kinetics of GLUT1 and GLUT4 glucose transporters expressed in Xenopus oocytes. J. Biol. Chem. 268, 8514–8520 (1993).
  • 57. Fujii, S. & Beutler, E. High glucose concentrations partially release hexokinase from inhibition by glucose 6-phosphate. Proc. Natl. Acad. Sci. 82, 1552–1554 (1985).
  • 58. Zhang, B. et al. Cooperative transport mechanism of human monocarboxylate transporter 2. Nat. Commun. 11, 2429 (2020).
  • 59. Coy, R. et al. Combining in silico and in vitro models to inform cell seeding strategies in tissue engineering. J. R. Soc. Interface 17, 20190801 (2020).
  • 60. Osiecki, M. J., McElwain, S. D. & Lott, W. B. Modelling mesenchymal stromal cell growth in a packed bed bioreactor with a gas permeable wall. PLoS ONE 13, e0202079 (2018).
  • 61. Mehrian, M. et al. Maximizing neotissue growth kinetics in a perfusion bioreactor: an in silico strategy using model reduction and Bayesian optimization. Biotechnol. Bioeng. 115, 617–629 (2018).
  • 62. Marciniak-Czochra, A., Stiehl, T., Ho, A. D., Jäger, W. & Wagner, W. Modeling of asymmetric cell division in hematopoietic stem cells–regulation of self-renewal is essential for efficient repopulation. Stem Cells Dev. 18, 377–386 (2009).
  • 63. Wodarz, D. Effect of cellular de-differentiation on the dynamics and evolution of tissue and tumor cells in mathematical models with feedback regulation. J. Theor. Biol. 448, 86–93 (2018).
  • 64. Duchesne, R., Guillemin, A., Crauste, F. & Gandrillon, O. Calibration, selection and identifiability analysis of a mathematical model of the in vitro erythropoiesis in normal and perturbed contexts. In Silico Biol. 13, 55–69 (2019).
  • 65. Hossain, M. S., Bergstrom, D. & Chen, X. Modelling and simulation of the chondrocyte cell growth, glucose consumption and lactate production within a porous tissue scaffold inside a perfusion bioreactor. Biotechnol. Rep. 5, 55–62 (2015).
  • 66. Kalami Yazdi, A., Nadjafikhah, M. & Distefano III, J. Combos2: an algorithm to the input–output equations of dynamic biosystems via Gaussian elimination. J. Taibah Univ. Sci. 14, 896–907 (2020).
  • 67. Raue, A., Becker, V., Klingmüller, U. & Timmer, J. Identifiability and observability analysis for experimental design in nonlinear dynamical models. Chaos: Interdiscip. J. Nonlinear Sci. 20, 045105 (2010).
  • 68. Storn, R. & Price, K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11, 341–359 (1997).
  • 69. Price, K., Storn, R. M. & Lampinen, J. A. Differential Evolution: a Practical Approach to Global Optimization (Springer Science & Business Media, 2006).
  • 70. Feldt, R. BlackBoxOptim.jl. https://github.com/robertfeldt/BlackBoxOptim.jl (2018).
  • 71. Das, S. & Suganthan, P. N. Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evolut. Comput. 15, 4–31 (2010).
  • 72. Mashwani, W. K. Enhanced versions of differential evolution: state-of-the-art survey. Int. J. Comput. Sci. Math. 5, 107–126 (2014).
  • 73. Nelder, J. A. & Mead, R. A simplex method for function minimization. Comput. J. 7, 308–313 (1965).
  • 74. Johnson, S. G. The NLopt nonlinear-optimization package. https://github.com/stevengj/nlopt (2007).
  • 75. Wright, S., Nocedal, J. et al. Numerical optimization. Springer Sci. 35, 7 (1999).
  • 76. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR) (2015).
  • 77. Stoica, P. & Selen, Y. Model-order selection: a review of information criterion rules. IEEE Signal Process. Mag. 21, 36–47 (2004).
  • 78. Pawitan, Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood (Oxford University Press, 2001).
  • 79. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edn (Springer, 2009).
  • 80. Daume, S., Kofler, S., Kager, J., Kroll, P. & Herwig, C. Generic workflow for the setup of mechanistic process models. In Pörtner, R. (ed.) Animal Cell Biotechnology: Methods and Protocols, vol. 2095 of Methods in Molecular Biology, 189–211 (Humana, 2020).
  • 81. Zhang, X.-Y., Trame, M. N., Lesko, L. J. & Schmidt, S. Sobol sensitivity analysis: a tool to guide the development and evaluation of systems pharmacology models. CPT: Pharmacomet. Syst. Pharmacol. 4, 69–79 (2015).
  • 82. Brinson, D. Figure 1. Experimental protocol and the lineage models. https://BioRender.com/2uyml6l (2025).

Articles from NPJ Systems Biology and Applications are provided here courtesy of Nature Publishing Group
