Author manuscript; available in PMC: 2021 Dec 1.
Published in final edited form as: Curr Pathobiol Rep. 2020 Nov 6;8(4):121–131. doi: 10.1007/s40139-020-00216-8

Acceleration of PDE-Based Biological Simulation Through the Development of Neural Network Metamodels

Lukasz Burzawa 1,2, Linlin Li 1, Xu Wang 1, Adrian Buganza-Tepole 1,3, David M Umulis 1,4
PMCID: PMC8104327  NIHMSID: NIHMS1644824  PMID: 33968495

Abstract

Purpose of Review:

Partial differential equation (PDE) mathematical models of biological systems, and the simulation approaches used to solve them, are widely used to test hypotheses and infer regulatory interactions by optimizing the PDE model against observed data. In this review, we discuss the ability of powerful machine learning methods to accelerate the parametric screening of biophysically informed PDE systems.

Recent Findings:

A major barrier to broader adoption of PDE-based models is their high computational cost: solving and optimizing the models requires many simulations to traverse the very high-dimensional parameter spaces encountered during model calibration and inference. For instance, when scaling up to tens of millions of simulations for optimization and sensitivity analysis of the PDE models, compute times quickly extend from months to years for sufficient coverage. For many systems, this brute-force approach is simply not feasible. Recently, neural network metamodels have been shown to be an efficient way to accelerate PDE model calibration, and here we examine the benefits and limitations of extending these PDE acceleration methods to improve optimization and sensitivity analysis.

Summary:

We use an example simulation to quantitatively and qualitatively show how neural network metamodels can be accurate and fast and demonstrate their potential for optimization of complex spatiotemporal problems in biology. We expect these approaches will be broadly applied to speed up scientific research and discovery in biology and other systems that can be described by complex PDE systems.

Keywords: Neural Network Metamodel, AI, PDE Acceleration, Biological simulation, Zebrafish BMP signaling

Introduction

Mathematical modeling is widely used to interrogate mechanisms of signaling in biological systems. Spatiotemporal control of biological signaling is constrained by physical laws such as conservation of energy, mass and momentum. These physical laws, under a continuum assumption, are represented through partial differential equations (PDEs). Mechanism-based PDEs of biological signaling networks involve many coupled variables related through nonlinear terms, as well as many parameters. Model calibration often requires screening a massive parameter space due to the complexity of the system and the limitations of experimental evidence. During embryo development, regulation of the body plan can be described by nonlinear systems of reaction-advection-diffusion PDEs for the relevant proteins. Due to the complexity of the regulatory system and the physics involved, experiments alone are not enough to gain mechanistic understanding of pattern formation in embryos or to understand how cells pass information to each other over long distances. Rather, identifying the parameters of the PDE system that explain the observed data is essential to our understanding of the biology (Hengenius et al. 2014; Umulis and Othmer 2015). Unfortunately, solving PDE models can be a computationally intensive task. The types of nonlinear PDEs appearing in morphogenesis and pattern formation have to be solved numerically with methods such as the finite difference method or the finite element method. Because of the high dimensionality of the input parameters specifying the PDEs, parameter calibration through random search involves running millions of PDE simulations (Zinski et al. 2017). Even under the optimistic assumption that a single PDE evaluation takes on the order of seconds, the computational cost for the calibration task quickly adds up to weeks or longer.
For more detailed PDE models accounting for realistic geometries, more proteins, other physical phenomena, and geometric and constitutive nonlinearities, the brute-force approach is simply infeasible. Due to these problems, alternatives to direct PDE simulations are needed. One approach is to approximate the numerical simulation of a PDE system by another, simpler model - a metamodel.

Machine learning and data analytics have yielded transformative results across multiple scientific fields due to the explosive growth of available data and computing resources. In this review, we discuss the ability of powerful machine learning methods to accelerate the parametric screening of biophysically informed PDE systems. Training a deep learning algorithm enables us to accurately identify a nonlinear map between high-dimensional input and output data pairs that replaces the direct numerical simulation of the PDEs. We propose to use neural network (NN) proxies to build these metamodels. Figure 1 illustrates the differences between the traditional PDE model approach and the proposed NN metamodel. A NN proxy can give results that are very close to those of a PDE model while providing significant speedups for model evaluation. Here we review literature where these methods were utilized and then focus on an example system that models zebrafish embryonic patterning through an extracellular reaction-advection-diffusion system represented by PDEs for chemical components that evolve over space and time. This model is a prototypical example of a signaling network that can explain how complex patterns emerge in organisms during development. We show that the NN metamodel is capable of replacing the PDE solver over a wide and high-dimensional parameter space while requiring a much smaller computational cost. As a consequence, we are able to calibrate parameters against a set of experimental data using the NN metamodel. The example shown here is a specific application of machine learning to replace physics solvers by inexpensive surrogates, but the methodology described in this manuscript extends to other models of morphogenesis and pattern formation that can be described in terms of PDEs.
Finally, even though we focus on the reaction-advection-diffusion PDEs and on NNs, we review alternative machine learning algorithms and their application to different classes of physics solvers.

Figure 1.

PDE modeling (blue) and metamodeling (yellow)

Machine learning and deep learning

Machine learning encompasses a class of algorithms where a given task is learned through implicit pattern recognition rather than by relying on explicit instructions. Machine learning can be split into subcategories: supervised learning, unsupervised learning and reinforcement learning (Alber et al. 2019). Supervised learning involves fitting a function between inputs and outputs where the outputs have clearly defined labels that are to be predicted by a machine learning model; it can be done either in the form of classification or regression. Some popular examples of methods used in supervised learning are Support Vector Machines (SVM), Naive Bayes classifiers, Gaussian Processes and Neural Networks (Sugiyama 2015). Unsupervised learning involves finding patterns in data that lack labels; it can be done with k-means clustering, Gaussian Mixture Models (GMM) or also with Neural Networks (Hastie, Tibshirani and Friedman 2009). Reinforcement learning (RL) involves an agent exploring an environment and attempting to find the sequence of actions that leads to the highest reward, based on a reward function crafted by a human. Popular approaches to RL include policy gradient (Williams 1992) and Q-learning (Mnih et al. 2015), and they usually use Neural Networks. For other reviews of recent machine learning developments, especially in the context of biological systems modeling, we refer the reader to (Goodfellow et al. 2016; Peng et al. 2020).

In recent years a subfield of machine learning called deep learning (Leah Edelstein-Keshet 2012) has been gaining popularity due to advances in big data and massively parallel computer hardware (Krizhevsky et al. 2012). Deep learning involves training NNs with many layers. The number of layers needed in a NN model depends on the complexity of the task. Adding multiple layers allows the network to learn more detailed and more abstract relationships within the data and how the features interact with each other at a nonlinear level. The networks are trained through backpropagation (Rumelhart et al. 1986) and stochastic gradient descent (SGD) optimization. Some common SGD algorithms that implement an adaptive learning rate include Adagrad (Duchi et al. 2011), RMSprop (Hinton 2012) and Adam (Kingma and Ba 2015). The simplest NN architecture is the Multilayer Perceptron (MLP), where all nodes are fully connected. If the data involves sequences, then Recurrent Neural Networks (RNN) are usually used (Wang et al. 2020; Sengupta et al. 2020). A generic RNN diagram is shown in Figure 2. As shown in the diagram, the input at time s is combined with the hidden state h at time s − 1 to give output y at time s. Training RNNs is more complex since it involves backpropagation through time, where gradients are summed over all time steps. Standard RNNs struggle with long sequences because gradients start to vanish when backpropagated through a long graph. To deal with this problem, Long Short-Term Memory networks (LSTM) (Hochreiter and Schmidhuber 1997) were proposed, in which some gradients are allowed to pass almost undisturbed. LSTMs are much more computationally expensive to train than MLPs.

Figure 2.

MLP vs LSTM model architecture

Identifying the input and output data types is critical to choosing the type of neural network. If the data contains 1D or 2D correlations to be exploited, as in speech signals or images, then Convolutional Neural Networks (CNN) (LeCun et al. 1998) are used. Unlike in an MLP, the nodes of a CNN are sparsely connected so as to focus on spatial interdependencies.

Applications of machine learning to mathematical modeling

A major advantage of NN models is that nesting layers leads to the ability to interpolate highly nonlinear functions through a sequence of mostly linear steps. NNs are built primarily on basic linear algebra operations like matrix multiplications and convolutions, which are highly parallelizable and run on the order of milliseconds, unlike many scientific simulations. Hence, if a NN metamodel can emulate a given mathematical model, it can offer major advantages in terms of speed. Such use of NNs has been seen in materials science, chemistry, physics, robotics and recently biology (Mjolsness and DeCoste 2001). The predictions of NN metamodels include scalars, small sets of discrete variables over all time, and spatial-temporal fields.

In the simplest cases, a NN model can be used to predict a single scalar. In the context of physics models, energy quantities are a common target for machine learning. For instance, take Schrödinger's Equation (SE), a fundamental tool in quantum mechanics. A problem of interest is to predict, from particle positions, the potential energy obtained by solving the SE. Similarly, the potential energy in atomistic or molecular dynamics simulations can be calculated as a single scalar output with particle positions as input. Yet exact solutions are only possible for the smallest systems, and expensive numerical approximations otherwise have to be used. To overcome this limit, (Rupp et al. 2012) use a sum of weighted Gaussians to predict molecular energy based on the distances between molecules. The regression coefficients are found through kernel ridge regression on data generated with Density Functional Theory (DFT), which is taken as the ground truth. They circumvent the task of explicitly solving the SE by training the machine learning algorithm on a finite subset of known solutions. Since many interesting questions in physics require repeated solutions of the SE or molecular dynamics equations, the highly competitive performance of the ML approach is a boon for larger-scale exploration of molecular energies in chemical compound space (Deringer et al. 2018). The efficiency of the machine learning approach paves the way for large-scale exploration of chemical compounds and their energies (Chan et al. 2019). In (Smith et al. 2017) an MLP is used to predict the potential energy of a molecular system based on DFT calculations. Notably, the authors carefully crafted a molecular representation as input to the MLP, taking into account that the representation needs to be compact while maximizing the resolution of the local atomic environment and covering all the relevant space the molecule occupies.
They termed the resulting input the Atomic Environment Vector (AEV). That is in line with other work in deep learning showing that input representation matters. For example, (Wang et al. 2019c) show that they are able to narrow the gap in 3D object detection between stereo and LIDAR vision data by generating a pseudo-LIDAR representation from stereo data. One of the major problems in materials discovery is identifying stable compositions of chemical compounds, which is mostly done through expensive DFT calculations. In (Ye et al. 2018) the authors train an MLP to predict the formation energy of a crystal based on the Pauling electronegativity and ionic radius of its species, achieving very accurate results on garnets and perovskites. The ability of machine learning tools to predict single scalars from the state of a complex multi-dimensional input space can help accelerate the modeling of biological systems. For example, given a set of input parameters, a single scalar metric of interest might be the maximum expression of a given morphogen, or the total expression of a molecular species.
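The sum-of-weighted-Gaussians idea behind (Rupp et al. 2012) can be sketched with a few lines of kernel ridge regression. The toy target function, sample count and kernel width below are our own hypothetical stand-ins for the DFT training data and molecular descriptors, not the authors' actual setup:

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    # Pairwise Gaussian similarities between rows of X1 and rows of X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam=1e-6):
    # Solve (K + lam*I) alpha = y for the regression weights alpha.
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    # The prediction is a sum of Gaussians centered on the training points.
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy "ground truth" standing in for an expensive quantum-chemistry energy.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

alpha = krr_fit(X, y, sigma=0.5)
y_hat = krr_predict(X, alpha, X, sigma=0.5)   # near-interpolates the data
```

Once `alpha` is fitted, each new energy query costs only one kernel evaluation against the training set, which is the source of the speedup over repeatedly solving the underlying physics.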

In some studies, NN models are used to predict a small set of discrete variables rather than a scalar, and in many cases the dynamics of these quantities are needed; in other words, predictions are needed over all time (or some time frame) for the set of discrete quantities. Dynamics simulators of this type are common in robotics. However, the rigid-body dynamics models they employ can be very time consuming due to the complex nature of the physics they are trying to capture. In (Pretorius et al. 2013) the authors use experimental data to train MLPs and RNNs to predict the motion and sensory outputs of a robot based on its current positioning and kinematics. They developed various Simulator Neural Networks (SNNs) to capture different types of simulators. Depending on the specific simulator, the prediction can either be a scalar or a set of motion locations over time. Accuracy tests indicated that the NN simulators created for these robots generally trained well and could generalize to data not presented during simulator construction. This approach increases the effectiveness of building robust control systems for robots. Even when part of the model is known, deep learning approaches can be used to learn the unknown nonlinearities (Nguyen-Tuong et al. 2009). Another crucial yet very computationally costly problem in science is the three-body problem. In (Breen et al. 2019) the authors train an MLP to predict the locations of particles 1 and 2 given the time t and the initial location of particle 2 as input; the location of particle 3 follows from the symmetry of the problem. To acquire data, they used the Brutus numerical integrator, and the time saved by using a NN is on the order of 100 million-fold. Such a fast and accurate three-body solver has major implications for research into dynamical systems, especially for capturing chaotic dynamics (Wang et al. 2019a). The ability of machine learning tools to predict the dynamics of complex systems might help accelerate discovery in biological systems.
For example, instead of dealing directly with the PDE, the regulatory network can be considered as a 0-dimensional system leading to a system of ordinary differential equations (ODE). The ODE dynamics, including derivatives, can be captured through machine learning tools. This type of metamodel can either be used in the PDE solver to replace part of the computations, or it can be used by itself to at least delineate plausible parameter ranges before running the PDE solver(Hagge et al. 2017).
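The metamodel idea for dynamics can be made concrete with a deliberately small sketch: an "expensive" ODE solve (here a forward-Euler integration of a hypothetical logistic growth model) is replaced by a cheap surrogate fit to a coarse parameter sweep. In practice the surrogate would be a neural network over many parameters; a polynomial suffices for this one-parameter illustration:

```python
import numpy as np

def logistic_final(r, K=1.0, c0=0.05, T=5.0, dt=0.01):
    # Forward-Euler integration of dc/dt = r*c*(1 - c/K); returns c(T).
    c = c0
    for _ in range(int(T / dt)):
        c += dt * r * c * (1.0 - c / K)
    return c

# "Expensive" solver evaluated on a coarse grid of the growth-rate parameter.
r_train = np.linspace(0.1, 2.0, 40)
c_train = np.array([logistic_final(r) for r in r_train])

# Cheap metamodel: a least-squares polynomial replacing the ODE integration.
surrogate = np.poly1d(np.polyfit(r_train, c_train, deg=8))

# Query the surrogate at parameter values not in the training grid.
r_test = np.linspace(0.15, 1.95, 100)
err = float(np.max(np.abs(surrogate(r_test)
                          - np.array([logistic_final(r) for r in r_test]))))
```

Evaluating the polynomial is essentially free compared to the time-stepping loop, which is the same trade exploited, at far larger scale, by NN metamodels of PDE systems.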

The last type of ML application we would like to showcase is the one most related to our problem of interest: accelerating PDE simulation for biological systems. For prediction of spatial-temporal fields, Wang et al. applied neural networks to a mechanistic PDE model of pattern formation in bacteria (Wang et al. 2019b). They trained an LSTM for a PDE system that calculates cell and molecular concentrations during pattern formation in Escherichia coli. The nonlinear, complex model simulated E. coli programmed by a synthetic gene circuit; its outputs (cell growth and movement, intercellular signaling, circuit dynamics and transport) depend on PDE input parameters such as the cell growth rate, the cell motility rate and the kinetics of gene expression. Training data was generated by a PDE solver, and their neural network achieved an R2 of around 0.99 while providing a speedup of about 30,000x. That carries great potential for efficiently exploring the parameter space of the PDE model and finding spatial distributions not easily seen before. Physics-constrained machine learning approaches have also received significant attention recently in the context of fluid and solid mechanics (Raissi et al. 2019; Karumuri et al. 2020; Zhang et al. 2020).

Most of the above work uses deep learning, which scales better with increasing data sizes than more traditional machine learning approaches like SVMs and Gaussian Processes (Lee et al. 2018, 2020). For biological systems modeling, the number of chemical species that needs to be considered is large because regulatory networks are complex; in turn, a large set of parameters is needed, and the output is also high dimensional. Deep NNs use a fixed number of parameters (the weights and biases of the neurons) to represent these high-dimensional input spaces and nonlinear outputs. Deep NNs are also a good choice because the data here comes from simulation and is automatically labeled: since it does not require strenuous manual annotation like vision, speech and text data, there is no limit other than time in generating training data through solution of the PDE systems. There are also multiple options for the type of NN used for acceleration. In some cases where data is sequential, as occurs in the simulation of physical systems over time, we can use RNNs to solve the problem. The tradeoff between speed and accuracy also weighs into the selection of the NN. RNNs operate on sequences and involve multiple linear layers in a single module, which considerably slows them down compared to standard MLPs. Since inference latency is a crucial concern when emulating scientific simulations with neural networks, this disparity matters: the MLP often has lower latency but potentially lower accuracy than the RNN.

Mathematical modeling in developmental biology

One of the fundamental problems in developmental biology is how complex patterns in organisms emerge from a group of nearly identical cells. A major tool for understanding such pattern emergence is reaction-diffusion mathematical modeling, which describes how molecular concentrations change over space and time (Thompson et al. 2018). The three major components of reaction-diffusion models are molecular transport, production and clearance. The reaction-diffusion PDEs involve many parameters, for example the diffusion rate, production rate and decay rate of each protein. In order to gain insight into the dynamics and interactions of the different molecules, the parameters of the PDEs have to be identified such that the simulations match experimental data. Moreover, we frequently want to find parameters that optimize the system for multiple different species or mutations, in which case multi-objective or Pareto optimization is used (Pargett et al. 2014).

Out of the many proteins important for pattern formation in tissues, we are interested in Bone morphogenetic protein (BMP). In this review, we focus on a PDE model of the BMP signaling network, which patterns gene expression along the dorsal-ventral (DV) body axis in the early development of the zebrafish embryo (Little and Mullins 2004; Zinski et al. 2017). BMPs pattern the DV tissues of zebrafish, Xenopus, and Drosophila embryos through a gradient-based mechanism, in which different levels of BMP signaling drive differential gene expression (Little and Mullins 2006).

Previously, we developed both 1D and 3D modeling approaches to investigate the mechanisms of BMP-mediated DV patterning in blastula embryos through 5.7 hpf (hours post fertilization), before the initiation of BMP-mediated feedback (Zinski et al. 2017; Li et al. 2020). For the 1D approach, we assume the patterning region is on the margin line. For the 3D approach, the zebrafish embryo is approximated as a perfect hemisphere and the reaction-diffusion process happens on its surface; using a hemisphere allows us to discretize the model in the spherical coordinate system. We solved the coupled nonlinear partial differential equations (PDEs) for the BMP ligand, Chordin, Noggin, Sizzled (in the 1D model only) and the BMP-Chordin and BMP-Noggin complexes using the finite difference method in MATLAB (Li et al. 2020). No-flux boundary conditions are applied for all species on both the ventral and dorsal boundaries in both the 1D and 3D models.

As a general definition, the advection-diffusion-reaction PDE describing the changes in the concentration field of a chemical species C(X, t) in space X ∈ ℝⁿ and time t can be stated as

∂C/∂t = D ΔC − u·∇C + R + P

The first term on the right-hand side is the change in concentration due to diffusion. The Laplace operator acting on the concentration, ΔC, establishes that the concentration will flow in the direction of the negative gradient. Diffusion is parameterized by the constant D. The second term on the right-hand side is the advection term, involving the velocity field u(X, t) and the concentration gradient ∇C. The velocity field is assumed to be known and to satisfy the continuity equation. The last two terms on the right-hand side are functions describing reaction and production of C. These functions could depend on C itself, could also depend on position and time, or could even be functions of other fields.

For illustration, we consider a simple system of BMP regulation. Let B(X, t) denote the concentration field of BMP over some domain X ∈ Ω ⊂ ℝⁿ, and R(X, t) the concentration of a regulator protein. The field BR(X, t) is the concentration of the BMP-regulator complex. In this example we ignore the advection term, assuming that the tissue and cell populations are stationary and u(X, t) = 0. The system of equations describing local mass balance is

∂B/∂t = D_B ΔB − k_b·B·R + k_ub·BR
∂R/∂t = D_R ΔR − k_b·B·R + k_ub·BR
∂BR/∂t = D_BR ΔBR + k_b·B·R − k_ub·BR

This system is parameterized by three diffusion constants D_B, D_R, D_BR and two reaction rates k_b, k_ub. In order to actually compute the change in the concentration fields B(X, t), R(X, t) and BR(X, t) over time and space, a few more ingredients need to be defined. First of all, the domain of interest Ω ⊂ ℝⁿ has to be introduced; for example, as we will see later, we are interested in a surface that is a portion of a sphere (see Figure 3). Boundary conditions also need to be specified; no-flux conditions at the boundary of the domain are a common and reasonable assumption. Lastly, the initial conditions B(X, 0), R(X, 0) and BR(X, 0) are also needed.
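The three-species system above can be integrated on a 1D interval with an explicit finite-difference scheme and no-flux boundaries. All parameter values and initial profiles below are hypothetical; a useful sanity check is that total BMP (free plus bound) and total regulator are conserved, since binding only converts B and R into BR:

```python
import numpy as np

def laplacian_noflux(c, dx):
    # Second difference with zero-flux (reflecting) boundaries via edge padding.
    cp = np.pad(c, 1, mode="edge")
    return (cp[2:] - 2.0 * cp[1:-1] + cp[:-2]) / dx**2

def step(B, R, BR, dx, dt, DB, DR, DBR, kb, kub):
    # One explicit (forward Euler) step of the B/R/BR mass-balance system.
    binding = kb * B * R - kub * BR
    Bn  = B  + dt * (DB  * laplacian_noflux(B,  dx) - binding)
    Rn  = R  + dt * (DR  * laplacian_noflux(R,  dx) - binding)
    BRn = BR + dt * (DBR * laplacian_noflux(BR, dx) + binding)
    return Bn, Rn, BRn

# Hypothetical parameters and initial conditions on a 1D domain of 100 nodes.
n, dx, dt = 100, 0.1, 1e-3           # D*dt/dx^2 = 0.1 < 0.5: explicit-stable
x = np.arange(n) * dx
B  = np.exp(-((x - 2.0) ** 2))       # localized BMP pulse
R  = np.full(n, 0.5)                 # uniform regulator
BR = np.zeros(n)
total_B0 = (B + BR).sum()            # free + bound BMP, conserved by binding
total_R0 = (R + BR).sum()            # free + bound regulator, also conserved

for _ in range(2000):                # integrate to t = 2
    B, R, BR = step(B, R, BR, dx, dt, DB=1.0, DR=1.0, DBR=0.5, kb=2.0, kub=0.1)
```

The no-flux stencil conserves the spatial sum of each species exactly, so any drift in the conserved totals would indicate a bug in the reaction terms.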

Figure 3.

Illustration of the 1D vs 3D approach of PDE simulations on BMP concentration profile on Margin region/whole embryo region

Once the boundary value problem is fully specified, the next challenge is to solve it. Analytical solutions are not an option beyond extreme simplifications and assumptions; instead, numerical methods are used to solve the PDE. There are several alternatives, depending on the characteristics of the PDE, and we won't dive into the details of all possible approaches here. For the case of solid elastic or viscoelastic domains like the ones we are interested in, material points can be followed throughout the simulation. For these cases, structured grids or unstructured discretizations are the most common. Structured grids allow the use of finite difference schemes to discretize the derivatives in the PDE but are limited to regular domains. Among unstructured approaches, the finite element method allows functions and derivatives to be represented on arbitrary geometries but increases the computational cost. The time derivative also allows for different discretization strategies. Explicit time integration schemes bypass the solution of a linear system of equations but are only conditionally stable and may require extremely small time steps depending on the nonlinearities of the PDE. Implicit time integration results in a possibly nonlinear system that needs to be solved at every time step but has the benefit of being stable for large time steps. The system of three equations introduced above for BMP regulation involves only constant linear operators and can therefore be efficiently solved with implicit time integration schemes. More realistic models are often nonlinear.
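The explicit/implicit tradeoff can be seen in a few lines for 1D pure diffusion. The grid and time step below are hypothetical and chosen so that D·dt/dx² = 2, beyond the explicit stability limit of 1/2, so forward Euler diverges while backward Euler (one linear solve per step) stays bounded and conserves mass:

```python
import numpy as np

n, dx, D, dt = 50, 0.1, 1.0, 0.02    # D*dt/dx^2 = 2 > 1/2: explicit unstable
c0 = np.zeros(n)
c0[n // 2] = 1.0                      # unit pulse, total mass 1

# Discrete Laplacian with no-flux boundaries, assembled as a dense matrix.
L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = -1.0
L /= dx**2

# Forward Euler: c_{k+1} = (I + dt*D*L) c_k  -- diverges at this step size.
c_exp = c0.copy()
for _ in range(50):
    c_exp = c_exp + dt * D * (L @ c_exp)

# Backward Euler: (I - dt*D*L) c_{k+1} = c_k  -- one linear solve per step,
# stable at this step size and mass-conserving for this operator.
A = np.eye(n) - dt * D * L
c_imp = c0.copy()
for _ in range(50):
    c_imp = np.linalg.solve(A, c_imp)
```

For the constant-coefficient BMP system discussed above, the matrix A could be factored once and reused at every step, which is why implicit integration is efficient there.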

Specifically, we consider a PDE model of reaction-diffusion in zebrafish development represented by the equations below. Six proteins interact with each other: BMP, Chordin, Noggin, BMP-Chordin, BMP-Noggin and Sizzled.

∂B/∂t = D_B ΔB + φ_B + λ_t·BC/(1 + S/k_it + (C + BC)/k_mt) + λ_a·BC/(1 + S/k_ia + (C + BC)/k_ma) − k_onC·B·C + k_offC·BC − k_onN·B·N + k_offN·BN − dec_B·B
∂C/∂t = D_C ΔC + φ_C − λ_t·C/(1 + S/k_it + (C + BC)/k_mt) − λ_a·C/(1 + S/k_ia + (C + BC)/k_ma) − k_onC·B·C + k_offC·BC − dec_C·C
∂N/∂t = D_N ΔN + φ_N − k_onN·B·N + k_offN·BN − dec_N·N
∂BC/∂t = D_BC ΔBC − λ_t·BC/(1 + S/k_it + (C + BC)/k_mt) − λ_a·BC/(1 + S/k_ia + (C + BC)/k_ma) + k_onC·B·C − k_offC·BC − dec_BC·BC
∂BN/∂t = D_BN ΔBN + k_onN·B·N − k_offN·BN − dec_BN·BN
∂S/∂t = D_S ΔS + V_s·B^n/(k^n + B^n) − dec_S·S
η_S V_s = p·b^n/(k^n·B_0 + b^n)

The model has 23 unknown parameters, and to learn the overall system behavior, a common approach is to screen a distribution of these parameters. Furthermore, for each set of parameters there are seven mutations for which experimental data is available: wild type (WT), Chordin loss of function (CLF), Noggin loss of function (NLF), Bmp1a loss of function (ALF), Tolloid loss of function (TLF), Bmp1a and Tolloid loss of function (TALF), and Sizzled loss of function (SLF). Thus, a separate set of PDEs is needed for each mutation. A mutation simulation is based on turning specific parameters on or off; for example, in the CLF simulation the Chordin expression is set to φC = 0. Most of the variable parameters have to be randomly searched over their ranges so that the outputs of the simulation match the experimental data. The fitness of the parameters is determined by the Normalized Root Mean Square Error (NRMSE) between the final BMP distribution from the simulation and the experimental values. The values of Smax that are used to find appropriate ranges for the reaction rates kit and kia are determined based on other simulations not discussed here.
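The screening loop can be sketched as follows, with a hypothetical three-parameter closed-form "model" standing in for the 23-parameter PDE system: parameter sets are drawn log-uniformly over their ranges and ranked by NRMSE against a target profile over 36 nodes. The functional form and ranges are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_params(ranges, n):
    # Log-uniform sampling: biological rates often span orders of magnitude.
    lo = np.log10([r[0] for r in ranges])
    hi = np.log10([r[1] for r in ranges])
    return 10.0 ** rng.uniform(lo, hi, size=(n, len(ranges)))

def nrmse(sim, data):
    # Root-mean-square error normalized by the range of the data.
    return np.sqrt(np.mean((sim - data) ** 2)) / (data.max() - data.min())

# Hypothetical 3-parameter model producing a profile over 36 nodes,
# standing in for a full PDE simulation of one mutation.
x = np.linspace(0.0, 1.0, 36)
def model(p):
    D, k, amp = p
    return amp * np.exp(-k * x) + D * x * (1.0 - x)

target = model(np.array([0.3, 2.0, 1.0]))             # synthetic "experiment"
params = sample_params([(0.01, 1.0), (0.1, 10.0), (0.1, 10.0)], 5000)
scores = np.array([nrmse(model(p), target) for p in params])
best = params[scores.argmin()]                         # best-fit parameter set
```

With a real PDE in place of `model`, each of the thousands of evaluations costs seconds to minutes, which is exactly the bottleneck the NN metamodel removes.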

Application on PDE acceleration through neural networks

In this section, we discuss how we applied the neural network approach to accelerate the specific PDE system of BMP patterning in two different ways, based on the type of data input and output. This biological process happens on a developing zebrafish embryo during blastulation and gastrulation. By assuming that patterning only happens on the margin of the cell group, the problem can be treated as one-dimensional, with the unknown parameters as input and 1D concentration profiles as output. Alternatively, by considering the spatial distribution of the cells and their movement, the problem can be treated as a three-dimensional moving-domain problem, with the unknown parameter space as input and 3D concentration profiles generated at each node. For the 1D approach, we assume the patterning region is on the margin line (Figure 3). The inputs to the neural network are the PDE model parameters, generated randomly within their specified ranges. The output is the final distribution of BMP concentration on a 1D line with 36 nodes. Since the parameter values in the screen can span a very large range, both inputs and outputs are normalized by taking the base-10 logarithm of all values and then dividing the result by 10. Also, any value less than 10⁻⁸, including 0, is set to 10⁻⁸; this way we ignore concentrations too small to be significant and avoid taking the logarithm of zero. To collect data for neural network training, we run 100,000 parameter sets, each with 7 different mutations, for a total of 700,000 unique simulation data points. 90% of the data is used for training and 10% for validation. We consider two neural network architectures: MLP and LSTM. In the MLP model, the PDE parameters are passed through a sequence of linear layers, each followed by a Rectified Linear Unit (ReLU) activation function.
The output layer gives the 1D distribution of BMP concentrations on the margin in 36 nodes at once. That is in contrast to an LSTM model, shown in Figure 2, where the BMP concentrations are output in a sequence over spatial locations, one by one through the calculation domain. Here the PDE parameters are passed first through a linear layer that gives a higher dimensional parameter embedding. Then the parameter embedding is concatenated with an LSTM output at a previous step in sequence and passed to an LSTM module which outputs the BMP concentration at the current point in sequence. A BMP concentration of 0 (−0.8 after normalization) is given as a dummy input at the first step of a sequence. The sequence length of LSTM is 36 since there are 36 points in space for the PDE model.
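The normalization just described can be written down directly; the 10⁻⁸ clamp and the division by 10 follow the text, and a concentration of 0 maps to −0.8, the dummy value fed to the LSTM at the first step of its sequence:

```python
import numpy as np

EPS = 1e-8   # concentrations below this (including 0) are clamped

def normalize(v):
    # Clamp, take log10, and divide by 10, as described in the text.
    return np.log10(np.maximum(v, EPS)) / 10.0

def denormalize(v):
    # Inverse map; clamped values come back as EPS rather than 0.
    return 10.0 ** (10.0 * v)

conc = np.array([0.0, 1e-9, 1e-8, 1e-3, 1.0, 50.0])
norm = normalize(conc)        # a concentration of 0 maps to -0.8
back = denormalize(norm)
```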

To match neural network outputs with actual PDE simulation outputs, an L1 loss is calculated between the two (both L1 and L2 losses were tested; L1 performed better than L2 in our case). The loss is backpropagated through the neural network to calculate the gradient at each weight, and the weights are then optimized with the Adam algorithm using an initial learning rate of 0.001. The training is run for 100 epochs. In addition to the L1 loss, we consider another metric, the NRMSE between the NN prediction and the direct PDE simulation result, to evaluate the performance of the neural network.

Results

Tables 1 and 2 show the validation-set results of training different MLP and LSTM models for the 1D case. The MLPs are named in the format MLP-(number of layers)-(number of units). For example, MLP-3-256 has 3 layers: the first with 256 outputs, the second with 256 outputs and the third with 36 outputs; MLP-4-256 would have one more layer with 256 outputs. For LSTMs we consider modules with output sizes of 256 and 512. In addition to accuracy metrics like R2 and relative error, we also consider the number of parameters and computational cost metrics such as the number of floating-point operations (FLOPs), the latency on a standard Intel CPU and the latency on a Titan X Pascal GPU. Among the MLP models, MLP-4-1024 has the best accuracy, while among the LSTM models it is LSTM-512. We can also see that the LSTM models slightly outperform the MLP models in accuracy; for example, LSTM-512 has the same number of learnable parameters as MLP-4-1024 but a relative error lower by over 1%. That is due to the LSTM's ability to understand sequences. However, it also comes at a larger computational cost: FLOPs, CPU latency and GPU latency are all more than 10x larger for LSTM-512 than for MLP-4-1024. From here on we only consider the best MLP and LSTM models, so we refer to MLP-4-1024 as MLP and to LSTM-512 as LSTM.

Table 1.

Comparing MLP models

             MLP-3-256  MLP-3-1024  MLP-4-256  MLP-4-1024
Parameters   0.0812M    1.11M       0.147M     2.16M
FLOPs        0.162M     2.22M       0.294M     4.32M
CPU latency  0.092ms    0.120ms     0.125ms    0.319ms
GPU latency  0.187ms    0.189ms     0.237ms    0.237ms
Rel. error   12.19%     9.57%       8.34%      6.99%

Table 2.

Comparing LSTM models

             LSTM-256  LSTM-512
Parameters   0.533M    2.11M
FLOPs        37.9M     151M
CPU latency  5.96ms    12.4ms
GPU latency  5.51ms    5.67ms
R2           0.9994    0.9996
Rel. error   7.67%     5.74%
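Parameter counts and single-sample CPU latency like those reported in Tables 1 and 2 can be measured directly. In this PyTorch sketch the input size (the number of PDE parameters, assumed here to be 7) is a placeholder, so the count only approximates the 2.16M reported for MLP-4-1024, and the measured latency depends on the host machine.

```python
import time
import torch
import torch.nn as nn

def count_params(model):
    """Total number of learnable parameters."""
    return sum(p.numel() for p in model.parameters())

# 4-layer, 1024-wide MLP; the exact count depends on the input size.
mlp = nn.Sequential(
    nn.Linear(7, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 36),
)
n_params = count_params(mlp)           # ~2.1M for this assumed input size

# Wall-clock CPU latency for a single-sample forward pass.
x = torch.randn(1, 7)
with torch.no_grad():
    for _ in range(10):                # warm-up runs
        mlp(x)
    t0 = time.perf_counter()
    for _ in range(100):
        mlp(x)
    latency_ms = (time.perf_counter() - t0) / 100 * 1e3
```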

Next we investigate how the neural network model responds to the amount of training data and how its error changes as training progresses. We only consider the MLP model here, since the LSTM model is expected to show very similar trends. As expected, accuracy improves as more samples are used: 100,000 samples give satisfactory results, 10,000 samples are acceptable, and 1,000 samples are not sufficient. Such sample-size dependence is typical of deep learning applications, where roughly 10,000 training samples are usually needed for this number of inputs and this degree of nonlinearity. This comparison also shows the importance of our relative error metric: with 1,000 samples we obtain a high relative error while the R2 still appears respectable. That is because R2 is calculated on normalized log values; calculating it on the raw values would let outputs of larger magnitude dominate, which we avoid.
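The gap between R2 on log values and the relative error can be illustrated with a small NumPy sketch. The concentration values and the uniform 30% perturbation below are made up for illustration: a prediction that is 30% off everywhere still attains R2 > 0.99 on the log scale.

```python
import numpy as np

def r2_log(pred, target, eps=1e-12):
    """R^2 computed on log10 values, so that outputs of larger
    magnitude do not dominate the score."""
    lp, lt = np.log10(pred + eps), np.log10(target + eps)
    ss_res = np.sum((lt - lp) ** 2)
    ss_tot = np.sum((lt - lt.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rel_error(pred, target):
    """Mean relative error on the raw (unlogged) values."""
    return np.mean(np.abs(pred - target) / np.abs(target))

# Targets spanning four orders of magnitude; prediction 30% high.
target = np.array([1e-3, 1e-1, 1.0, 10.0])
pred = target * 1.3

r2 = r2_log(pred, target)      # > 0.99 despite the systematic error
err = rel_error(pred, target)  # 0.30
```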

Figure 4 shows how the MLP and LSTM models reproduce the PDE simulation results on seven randomly chosen samples, one for each mutation. Generally, the neural networks produce a final BMP distribution that is very similar to the one given by a direct PDE simulation. Since the LSTM generates its outputs as a sequence, one point at a time, its plots are expected to be generally smoother than those of the MLP.

Figure 4.

Comparison of protein distributions obtained by direct PDE simulation, the MLP metamodel, and the LSTM metamodel, plotted over the 1D domain parameterized by the nondimensional x coordinate, with x = 0 (0 micrometers) at the ventral end and x = 35 (700 micrometers) at the dorsal end of the margin.

Both quantitative and qualitative evaluations slightly favor the LSTM model: it gives a higher R2 and a lower relative error, produces smoother BMP distributions, and reproduces the mutation data with less variation. We calculated the RMSD between the simulation results and the metamodel results, and Figure 5 shows the histogram of the RMSD distribution for both the MLP and LSTM metamodels. The LSTM predictions offer only a slight improvement over the MLP, and that improvement comes at a significantly higher computational cost of at least 20x. Since we would like to use the neural network metamodel for rapid exploration of PDE parameter space, the MLP may often be the better choice.

We further applied the neural network approach to a more complex PDE system with 3D dynamics on the surface of a hemispherical geometry. Previous simulations of a 3D growing-domain model were used as the training and validation data for the MLP model (Li et al. 2020). To handle the larger input and output sizes when predicting results for the entire embryo (over 1,000 spatial locations in 3D, compared to 36 spatial points in 1D), the structure of the neural network had to be adjusted, resulting in a more complicated model with multiple layers and output points that increased the CPU latency and decreased the prediction accuracy relative to the 1D case. To train the neural networks on supervised learning tasks as an alternative to the PDE solver, we structured the input data as [parameters + coordinates + time] and the output data as [concentrations of the different species], mirroring the inputs and outputs of a PDE solver. To train the neural network for the 3D hemisphere, the MLP model was given the individual calculation points extracted from the grid (see Figure 5). Ten thousand WT simulation results were used: 9,000 whole-embryo simulations provided 4,051,234 points for the training set, and the remaining 1,000 simulations provided 450,000 points for the validation set. Training accuracy remained at 99% after 100 epochs. Figure 6 compares the simulation and NN-model results for the 3D growing-domain model with the MLP network.
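The [parameters + coordinates + time] → [concentrations] layout can be sketched as follows. The counts are assumptions for illustration (7 parameters, 450 points per embryo, roughly matching 9,000 × ~450 ≈ 4.05M training rows); only the overall row structure comes from the text.

```python
import numpy as np

n_params, n_points = 7, 450   # assumed sizes for illustration

def flatten_simulation(params, coords, times, conc):
    """Turn one whole-embryo simulation into per-point training rows:
    each row is [parameters + (x, y, z) + t] -> concentration."""
    p = np.tile(params, (n_points, 1))        # repeat parameters per point
    X = np.hstack([p, coords, times[:, None]])
    y = conc[:, None]
    return X, y

# Synthetic stand-in for one simulation result.
params = np.random.rand(n_params)
coords = np.random.rand(n_points, 3)          # (x, y, z) on the hemisphere
times = np.random.rand(n_points)
conc = np.random.rand(n_points)

X, y = flatten_simulation(params, coords, times, conc)
```

Stacking such rows over many simulations yields the large per-point training set used for the 3D MLP.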

Figure 5.

Comparison of the RMSD distributions across 70,000 validation cases for the MLP and LSTM metamodels.

Figure 6.

Comparison between the direct simulation and the NN-model results for the 3D growing-domain model with the MLP metamodel.

Conclusion

In this review, we summarize how neural network metamodels can accelerate PDE-based biological simulations and demonstrate this with an example. The acceleration offers speedups of about 1000x while preserving accuracy, with R2 above 0.99. We considered a specific PDE model from zebrafish development, but the neural network models discussed here can be applied to many other PDE systems. Compared to Wang et al. (2019b), we train both LSTM and MLP models and show that the MLP offers advantages in speed without sacrificing much accuracy.

The key contribution of the approach presented here is the acceleration of PDE evaluation via a NN metamodel, which enables the inverse problem of identifying the PDE parameters that best explain experimental data. With the proposed NN metamodel, we can replace the direct PDE solver and explore the entire input space thanks to the metamodel's computational efficiency. However, brute-force parameter exploration may not be the most efficient approach, even with a fast metamodel. Another advantage of NNs is that they are fully differentiable, which opens the possibility of gradient descent; this would be much more complicated with the direct PDE solver, where gradient optimization entails either costly and inaccurate numerical approximations based on additional function evaluations, or solution of the adjoint problem. Finally, NNs and other machine learning methods could also contribute to the inverse problem through reinforcement learning (RL): the RL actions could be the parameters searched, and the reward could be the inverse of the error between the PDE simulation (or NN metamodel) and experiments. Reinforcement learning has already been used in a similar fashion for Neural Architecture Search (NAS) (Zoph et al. 2018). Another application that can expand the use of NN-based acceleration of PDEs is multi-objective optimization, an area of multiple-criteria decision making concerned with problems in which more than one objective function must be optimized simultaneously. It is a useful tool for quantitative biology (Pargett et al. 2014), but it may require a large number of parameter screens across the different types of simulation. We expect that the approaches discussed here will improve the capabilities of AI-based surrogate models and accelerate scientific research and discovery in biology.
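The gradient-descent idea can be sketched in a few lines of PyTorch. Here a randomly initialized frozen network stands in for a trained metamodel, and the "observed" profile is simply the metamodel's output at hidden true parameters; in practice the observed data would come from experiments and the metamodel would be trained as described above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A frozen network standing in for a trained metamodel (random weights).
metamodel = nn.Sequential(nn.Linear(7, 256), nn.Tanh(), nn.Linear(256, 36))
for p in metamodel.parameters():
    p.requires_grad_(False)

# "Observed" profile: the metamodel evaluated at hidden true parameters.
true_theta = torch.randn(1, 7)
observed = metamodel(true_theta)

# Inverse problem: descend on the PDE parameters through the metamodel,
# using its differentiability instead of costly finite differences.
theta = torch.zeros(1, 7, requires_grad=True)
opt = torch.optim.Adam([theta], lr=0.05)
init_loss = torch.mean((metamodel(theta) - observed) ** 2).item()
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((metamodel(theta) - observed) ** 2)
    loss.backward()                    # gradients flow to the inputs
    opt.step()
final_loss = loss.item()
```

Because each forward pass is sub-millisecond, many restarts from different initial guesses are affordable, which mitigates local minima in the fitting landscape.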

Footnotes

Conflicts of Interest

No potential conflicts of interest relevant to this article were reported.

Human and Animal rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record published in the journal is kept up to date and may therefore differ from this version.

References

  • 1.Alber M, Buganza Tepole A, Cannon WR, et al. (2019) Integrating machine learning and multiscale modeling—perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. npj Digit Med 2:1–11. doi: 10.1038/s41746-019-0193-y • This article reviews the current state of the art of applications of machine learning and multiscale modeling in the biological field.
  • 2.Breen PG, Foley CN, Boekholt T, Zwart SP (2019) Newton vs the machine: solving the chaotic three-body problem using deep neural networks. MNRAS
  • 3.Chan H, Narayanan B, Cherukara MJ, et al. (2019) Machine learning classical interatomic potentials for molecular dynamics from first-principles training data. J Phys Chem C 123:6941–6957
  • 4.Deringer VL, Bernstein N, Bartók AP, et al. (2018) Realistic atomistic structure of amorphous silicon from machine-learning-driven molecular dynamics. J Phys Chem Lett 9:2879–2885. doi: 10.1021/acs.jpclett.8b00902
  • 5.Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. JMLR
  • 6.Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press
  • 7.Hagge T, Stinis P, Yeung E, Tartakovsky AM (2017) Solving differential equations with unknown constitutive relations as recurrent neural networks
  • 8.Hastie T, Tibshirani R, Friedman J (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition
  • 9.Hengenius JB, Gribskov M, Rundell AE, Umulis DM (2014) Making models match measurements: model optimization for morphogen patterning networks. Semin Cell Dev Biol 35:109–123
  • 10.Hinton G (2012) Overview of mini-batch gradient descent
  • 11.Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput
  • 12.Karumuri S, Tripathy R, Bilionis I, Panchal J (2020) Simulator-free solution of high-dimensional stochastic elliptic partial differential equations using deep neural networks. J Comput Phys 404:109120. doi: 10.1016/j.jcp.2019.109120
  • 13.Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. ICLR
  • 14.Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. NIPS
  • 15.Edelstein-Keshet L (2012) Mathematical Models in Biology
  • 16.LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE
  • 17.Lee T, Bilionis I, Tepole AB (2020) Propagation of uncertainty in the mechanical and biological response of growing tissues using multi-fidelity Gaussian process regression. Comput Methods Appl Mech Eng 359:112724. doi: 10.1016/j.cma.2019.112724
  • 18.Lee T, Turin SY, Gosain AK, et al. (2018) Propagation of material behavior uncertainty in a nonlinear finite element model of reconstructive surgery. Biomech Model Mechanobiol 17:1857–1873. doi: 10.1007/s10237-018-1061-4
  • 19.Li L, Wang X, Mullins MC, Umulis DM (2020) Evaluation of BMP-mediated patterning in a 3D mathematical model of the zebrafish blastula embryo. J Math Biol 80:505–520. doi: 10.1007/s00285-019-01449-x • This study provides a thorough description of the mathematical modeling of BMP-mediated patterning of the zebrafish embryo.
  • 20.Little SC, Mullins MC (2006) Extracellular modulation of BMP activity in patterning the dorsoventral axis. Birth Defects Res Part C Embryo Today Rev 78:224–242. doi: 10.1002/bdrc.20079
  • 21.Little SC, Mullins MC (2004) Twisted gastrulation promotes BMP signaling in zebrafish dorsal-ventral axial patterning. Development 131:5825–5835. doi: 10.1242/dev.01464
  • 22.Mjolsness E, DeCoste D (2001) Machine learning for science: state of the art and future prospects. Science 293:2051–2055
  • 23.Mnih V, Kavukcuoglu K, Silver D, et al. (2015) Human-level control through deep reinforcement learning. Nature
  • 24.Nguyen-Tuong D, Seeger M, Peters J (2009) Model learning with local Gaussian process regression. In: Advanced Robotics. Taylor & Francis Group, pp 2015–2034
  • 25.Pargett M, Rundell AE, Buzzard GT, Umulis DM (2014) Model-based analysis for qualitative data: an application in Drosophila germline stem cell regulation. PLoS Comput Biol 10. doi: 10.1371/journal.pcbi.1003498
  • 26.Peng GCY, Alber M, Buganza Tepole A, et al. (2020) Multiscale modeling meets machine learning: what can we learn? Arch Comput Methods Eng 1–21. doi: 10.1007/s11831-020-09405-5
  • 27.Pretorius CJ, du Plessis MC, Cilliers CB (2013) Simulating robots without conventional physics: a neural network approach. J Intell Robot Syst
  • 28.Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 378:686–707. doi: 10.1016/j.jcp.2018.10.045
  • 29.Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature
  • 30.Rupp M, Tkatchenko A, Muller K-R, von Lilienfeld OA (2012) Fast and accurate modeling of molecular atomization energies with machine learning. Phys Rev Lett
  • 31.Sengupta S, Basak S, Saikia P, et al. (2020) A review of deep learning with special emphasis on architectures, applications and recent trends. Knowledge-Based Syst 194:105596. doi: 10.1016/j.knosys.2020.105596
  • 32.Smith JS, Isayev O, Roitberg AE (2017) ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem Sci
  • 33.Sugiyama M (2015) Introduction to Statistical Machine Learning. Elsevier
  • 34.Thompson MJ, Othmer HG, Umulis DM (2018) A primer on reaction-diffusion models in embryonic development. In: eLS. John Wiley & Sons, Chichester, UK, pp 1–16
  • 35.Umulis DM, Othmer HG (2015) The role of mathematical models in understanding pattern formation in developmental biology. Bull Math Biol 77:817–845. doi: 10.1007/s11538-014-0019-7
  • 36.Wang R, Kalnay E, Balachandran B (2019a) Neural machine-based forecasting of chaotic dynamics. Nonlinear Dyn 98:2903–2917. doi: 10.1007/s11071-019-05127-x
  • 37.Wang S, Fan K, Luo N, et al. (2019b) Massive computational acceleration by using neural networks to emulate mechanism-based biological models. Nat Commun • This study applies neural networks to a mechanistic PDE model of pattern formation in bacteria for prediction of spatiotemporal fields; the authors trained an LSTM on a PDE system that calculates cell and molecular concentrations for pattern formation in Escherichia coli.
  • 38.Wang X, Zhao Y, Pourpanah F (2020) Recent advances in deep learning. Int J Mach Learn Cybern 11:747–750
  • 39.Wang Y, Chao W-L, Garg D, et al. (2019c) Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving. CVPR
  • 40.Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn
  • 41.Ye W, Chen C, Wang Z, et al. (2018) Deep neural networks for accurate predictions of crystal stability. Nat Commun
  • 42.Zhang D, Guo L, Karniadakis GE (2020) Learning in modal space: solving time-dependent stochastic PDEs using physics-informed neural networks. SIAM J Sci Comput 42:A639–A665. doi: 10.1137/19M1260141
  • 43.Zinski J, Bu Y, Wang X, et al. (2017) Systems biology derived source-sink mechanism of BMP gradient formation. Elife
  • 44.Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. CVPR
