Abstract
Quantitative Systems Pharmacology (QSP) models capture the physiological underpinnings driving the response to a drug and express those in a semi-mechanistic way, often involving ordinary differential equations (ODEs). The process of developing a QSP model generally starts with the definition of a set of reasonable hypotheses that would support a mechanistic interpretation of the expected response; these hypotheses are then used to form a network of interacting elements. This is a hypothesis-driven and knowledge-driven approach, relying on prior information about the structure of the network. However, with recent advances in our ability to generate large datasets rapidly, often in a hypothesis-neutral manner, the opportunity emerges to explore data-driven approaches to establish the network topologies and models in a robust, repeatable manner. In this paper, we explore the possibility of developing complex network representations of physiological responses to pharmaceuticals using a logic-based analysis of available data and then converting the logic relations to dynamic ODE-based models. We discuss an integrated pipeline for converting data to QSP models. This pipeline includes using k-means clustering to binarize continuous data, inferring likely network relationships using a Best-Fit Extension method to create a Boolean network, and finally converting the Boolean network to a continuous ODE model. We utilized an existing QSP model for the dual-affinity re-targeting antibody flotetuzumab to demonstrate the robustness of the process. Key output variables from the QSP model were used to generate a continuous data set for use in the pipeline. This dataset was used to reconstruct a possible model. This reconstruction had no false-positive relationships, and the output of each of the species was similar to that of the original QSP model. This demonstrates the ability to accurately infer relationships in a hypothesis-neutral manner without prior knowledge of a system using this pipeline.
Keywords: Boolean network, QSP, Dynamic models, Machine learning
Introduction
Biological systems form complex networks, with the interactions between components of these networks creating emergent behaviors that drive physiological functions [1, 2]. These networks may consist of genes, proteins, RNA, cells, biochemical species, or any combination of these elements. The proper functioning of these networks, as well as their disruption, can determine how a disease progresses or how an individual may respond to a therapeutic drug [3–5]. Understanding the behaviors of these networks plays a key role in understanding diseases [6, 7], identifying new drug targets [8, 9], and predicting drug-drug interactions [10]. The complexity of these networks and the nonlinear nature of the underlying processes have driven the development of complex computational and mathematical representations explaining the responses to various drugs and diseases [11]. These models combine data with the available pharmacological and physiological understanding, representing relevant biological mechanisms using quantifiable kinetic rate expressions, usually resulting in (large) systems of ordinary differential equations [12]. These comprehensive efforts have given rise to the field of Quantitative Systems Pharmacology (QSP) as a means to systematically integrate the host’s response to drug treatment [13, 14]. The promise of QSP models is beginning to materialize, and applications of QSP models now extend beyond the original intent of assessing a mechanism and are beginning to enter the regulatory arena [15]. While QSP models are very informative, quantifying them presents a substantial hurdle since a non-trivial amount of data is needed and the relations between the components of the network need to be precisely articulated. One way to overcome these hurdles is to explore qualitative representations such as Boolean models [16].
Boolean models have existed as a way to express qualitative (logic) relations between interacting elements and have been used to analyze gene networks for over half a century [17, 18]. More advanced Boolean network models have been used in dynamic biochemical pathway models [19, 20], cell signaling pathways [21], and systems pharmacology models [16, 22, 23], as well as for drug target identification [8]. These models are often targeted to simulate fairly complex, system-level behaviors such as host immune response [24] and cardiac signaling [20], and can provide valuable biomedical information about those systems. It can be argued that mechanistic models may be more accurate than logic-based models; however, mechanistic knowledge is often lacking during the early stages of research, and logic-based models can serve to initially identify possible network structures and relationships when there is limited knowledge about a system [25]. Several toolsets, including CellNOpt, have been developed to construct these networks from data, allowing for rapid implementations of existing inference algorithms [26, 27]. CellNOpt provides a platform to infer networks from large datasets and the ability to express these inferred networks using multiple logic formalisms.
Nevertheless, a qualitative model can only be used to describe putative relations between elements and cannot readily be used to make quantitative recommendations. A challenge, therefore, remains as to how to use a Boolean model as the basis for the development of a mechanistic QSP model. Recent advances, however, have demonstrated how once a Boolean network has been constructed, it can be transformed into a continuous model that can then be fit to the original, raw data set and parametrized [28, 29]. This continuous model can serve as a backbone or a starting point of a more detailed ODE (i.e., QSP) model to represent the network of interest, with relationships or parameters being updated as more data, information, or knowledge of the system is attained. Current automated algorithms generally rely on certain types of ODE structures (such as Hill-type equations) and therefore systems with different types of relationships, or more nuanced relationships that cannot be trivially represented by an ODE, may require fine-tuning of the model as more information becomes available.
There exists, therefore, a need for a process that reliably, repeatably, and systematically transforms data collected in vitro and in vivo into models to understand the relationships between components of these complex biological networks. However, constructing models of these networks is non-trivial, even if there is extensive and rich data available for the individual components within the network. Identifying the relationships, and how one species may influence, promote, or inhibit another requires analysis of many components simultaneously. Different approaches exist to develop network models from data, each with unique challenges and opportunities. Depending on the knowledge of a system, the data available, and the coarseness of the model desired, different strategies may be used independently or together to infer relationships and construct meaningful network models [30–32].
In this work, we propose a pipeline to transform biological data into a Boolean network, which can then be further transformed into a continuous ODE-based model. This pipeline is beneficial because initial modeling efforts with Boolean networks can be undertaken with limited data from pilot experiments, allowing for the results of these initial models to guide hypothesis testing and data collection efforts to develop a more full, fine-grained model. We discuss a prototype implementation of an integrated pipeline for constructing QSP models. Starting with data, a logic network is developed using Boolean logic which is subsequently converted to a dynamic (ODE) model. We validated this pipeline by reverse engineering an existing QSP model of flotetuzumab. Flotetuzumab is a bispecific CD3xCD123 DART molecule, developed for the treatment of acute myeloid leukemia. Flotetuzumab binds to CD3 on T-cells as well as a receptor on the target cells, leading to the formation of a cytolytic immunologic synapse [33, 34].
We used the output of the existing continuous model as a dataset and utilized the pipeline to construct a network based on the model output. This output includes the concentration of flotetuzumab in the central compartment (CC), the drug-CD123 complex (DTA), the drug-CD3 complex (DTC), the total number of T-cells (TC), the number of active T-cells (ATC), the trimolecular synapse (SYN), and the number of CD123 cells (CD123). This output does not capture all variables utilized in the original model, including the number of receptors and the receptor occupancy. We had no false-positive relationships identified, meaning that every relationship identified by the pipeline was in the original model. Further, the Boolean model we identified was converted to an ODE model, which mimicked the qualitative behavior of the original model for the variables included.
Process overview
Constructing Boolean networks consists of collecting and processing data into a usable binary data set and utilizing that data set to construct a Boolean network. The network can then be studied independently (for sensitivity analysis or network reduction) as well as used to directly build an ODE model. This process is summarized in Fig. 1.
Fig. 1.

Overview of the pipeline. First data is collected, processed, and binarized. The details of this step depend on the quality and type of data collected: sufficient data with low noise may use a more advanced binarization technique. Second, candidate networks are inferred and compared. Third, candidate networks can be examined for biological realism, both expert knowledge and first principles can be used to reduce the possible networks. Fourth, sensitivity analysis can be performed along with network reduction to remove unnecessary nodes. Once a final, reduced, network is selected, the final Boolean network can be converted into an ODE model
Once data is processed and the final binary data set is available, the binarized data set is used to construct a Boolean network. This can be done in several ways; within this paper, we discuss a method to infer Boolean equations known as “best-fit extension” [35]. Other methods including machine learning [36] or methods relying on mutual information [37] exist and may be appropriate depending on the size of the network and the data on hand. Some methods combine multiple learning methods to improve the computational speed of inferring particularly large networks [32]. Most of these methods can be incorporated with expert knowledge, allowing for certain relationships to be included or excluded from the model fitting process.
With a given set of binary data, there are likely multiple Boolean networks that could reproduce the same signals, and therefore it is unlikely that a single network will be identified using network inference. The set of networks that may be produced needs to be pruned to narrow the search for a good model of the system in question. Here we utilize several strategies to prune some networks from the search, including eliminating networks that use relationships that are not biologically feasible; using expert or prior knowledge may allow researchers to quickly rule out certain relationships within a network.
Once a final candidate network is selected, sensitivity analysis and network reduction can be performed to remove any remaining unnecessary components from the network. The finalized model can then be converted into an ODE or continuous model using existing techniques, or the relationships inferred may be used as a starting point for constructing a model.
Collecting and processing data
Care during data collection is necessary to properly construct a Boolean network. One can envision a Boolean network and the binary output of the network as being event-driven: each time point represents a network state, and transitions between network states are events. The amount of real time that passes between each network state may not be consistent. For example, it may take a network an hour to transition from the first to the second network state, and only a few minutes for the transition from the second to the third state. The time between data points will depend on the type of reactions being measured and the rate of reactions; however, we recommend trying to capture a minimum of 1 unique network state per species in the network.
In a binary network, there are a total of 2^n possible network states, where n is the number of variables in the system. So, for a system with 2 binary variables, there are 2^2 = 4 possible network states, meaning that only 4 data points are necessary to fully define the system. As the number of components within a system grows it becomes impractical to gather sufficient data to create a fully defined system, and even a relatively small network with 6 species in it would require a minimum of 2^6 = 64 time points to be fully defined, assuming no time points are identical to one another. This creates two challenges for data collection: it becomes impractical to collect the sheer number of data points necessary to capture every theoretical network state for large networks, and not all binary networks will naturally cycle through all theoretical states. Figure 2A shows a network consisting of two nodes. Five data points are shown for this network, and the first and final data points are identical. This means that four unique data points were captured and essentially support a fully defined Boolean network. The Boolean equations (shown in Eqs. 1 and 2) for this system can be correctly and uniquely identified using the best-fit extension paradigm discussed later:
| (1) |
| (2) |
Fig. 2.

Example networks with similar amounts of data. A Shows a 2-node network (left) with the network output (right). All theoretical network states are present in the dataset, and the network is uniquely and correctly identifiable using existing inference techniques. B Shows a 3-node network and the output of the network. In this case, not all theoretical network states are present in the dataset, but the network is still uniquely and correctly identifiable from the dataset. C Shows a 5-node network and the output of the network. This network is not uniquely identifiable from the network output. We recommend ensuring more unique data points than there are nodes to improve identifiability; however, this may not always be possible
In this work, we represent equations using the AND (&), OR (||), and NOT (~) functions. If a variable has no inputs, it will be identified by < >. Figure 2B shows a network consisting of three nodes. There are eight possible network states; however, in this case only five data points are included and only four of them are unique. The equations can still be correctly and uniquely identified for A and B (shown above) as well as for C:
| (3) |
This makes it clear that we do not need every theoretical data point to identify a unique, correct equation. In contrast, Fig. 2C shows a system of five nodes with five data points, four of which are unique. In this case, none of the nodes can be uniquely identified, and the fifth node (E) has four biologically plausible equations (Eqs. 4, 5, 6, 7). None of these equations are necessarily more or less likely to be true for E unless there is some knowledge of the network known outside of the data set:
| (4) |
| (5) |
| (6) |
| (7) |
As a rule of thumb: if there are fewer unique data points than variables then the system will be unidentifiable, as can be seen with the example in Fig. 2C. This is a general rule for model building, where it becomes possible that variables will be mistaken as being statistically significant even when they are not influential. This is known as Freedman’s paradox and is present in all types of model building. Freedman’s paradox can lead to false-positive identification of edges and model selection bias [38]. There are methods for overcoming this data challenge, including utilizing prior knowledge of the network structure or making assumptions about the sparsity or connectivity of the network [39–41].
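As an illustration of the rule of thumb above, the following minimal sketch counts the unique network states in a binarized record and compares that count with the number of species and with the 2^n theoretical states. The function name `state_coverage` and both example records are hypothetical and are not taken from Fig. 2.

```python
def state_coverage(states):
    """Summarize how well a binarized time course covers the state space.

    `states` is a list of equal-length tuples of 0/1 values, one per time point.
    """
    n_species = len(states[0])
    unique = set(states)
    return {
        "n_species": n_species,
        "possible_states": 2 ** n_species,
        "unique_states": len(unique),
        # Rule of thumb from the text: fewer unique states than species
        # suggests the system will be unidentifiable.
        "below_rule_of_thumb": len(unique) < n_species,
    }


# Hypothetical 2-node record: five time points, first and last identical.
two_node = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
summary = state_coverage(two_node)

# Hypothetical 5-node record: five time points, only four of them unique.
five_node = [
    (0, 0, 0, 0, 0),
    (1, 0, 0, 0, 0),
    (1, 1, 0, 0, 0),
    (0, 1, 1, 0, 0),
    (0, 0, 0, 0, 0),
]
five_summary = state_coverage(five_node)
```

A check of this kind is only a screening step: passing it does not guarantee identifiability, since a network may still admit several candidate functions when the observed states are uninformative.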
Binarization
Several binarization techniques exist, including iterative k-means clustering [42], BASC-A [43], and alternative clustering-based methods [44]. These methods seek to more accurately categorize individual data points as “high” or “low”, which may be beneficial for binarizing species that have oscillatory behaviors with different levels of magnitude [42]. However, attempting to distinguish smaller oscillatory behaviors from noise may result in the misclassification of time points. Depending on the size of a system, a single misclassified time point may make identification of the network difficult. In our framework, k-means clustering is used to binarize the data set. K-means clustering is a widely used method of binarization: it has been used specifically for binarizing data for inferring biological networks [45–47] and as a comparison for the development of novel methods [42, 43]. Depending on the general noise, the number of large outliers, and other artifacts, different binarization methods may be more appropriate to properly characterize each data point. We used k-means here because it is generally robust with real-world data [47]. For a given variable, the data is partitioned into two clusters using the k-means method. The data points in the cluster with the higher valued centroid are assigned a binary value of 1, and the data points in the lower valued cluster are assigned a binary value of 0. This creates a signal with periods of “on” and “off”. Figure 3 shows the data record of SYN from our flotetuzumab case study. The solid line is the output of the original model, and the black asterisks represent the binarized values for the corresponding time. Here we can see the four dosing events result in four approximately equal “active” periods, during which the synapse variable has a value of 1.
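The two-cluster partition described above can be sketched for a single species as follows. This is a plain one-dimensional k-means with k = 2, written with the standard library only; it is not the iterative k-means variant of [42], and the function name, the initialization at the minimum and maximum values, and the example data are assumptions made for illustration.

```python
def binarize_two_means(values, max_iter=100):
    """Binarize one species' time course with 1-D k-means (k = 2)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Constant signal: there is no "on" period to detect.
        return [0] * len(values)

    def assign(lo, hi):
        # A point is "high" (1) only if it is strictly nearer the high centroid.
        return [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]

    labels = assign(lo, hi)
    for _ in range(max_iter):
        low = [v for v, b in zip(values, labels) if b == 0]
        high = [v for v, b in zip(values, labels) if b == 1]
        # Move each centroid to the mean of its cluster.
        lo, hi = sum(low) / len(low), sum(high) / len(high)
        new_labels = assign(lo, hi)
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
    return labels


# Hypothetical normalized concentrations for one species.
signal = [0.0, 0.1, 0.9, 1.0, 0.05, 0.95]
binary = binarize_two_means(signal)
```

In practice a library implementation of k-means could be substituted; the point of the sketch is only that each species is binarized independently against its own two centroids.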
Fig. 3.

Example of binarization. This shows the amount of active drug/CD3/CD123 synapse which leads to the immunogenic response to cancer cells. The solid line shows the continuous value produced by the gold standard ODE model. The black stars represent the binarized values: the low values (0) are “off” while the high values (1) are “on”. Although the continuous values are different at each of the 4 dosing events, the binarized values show the same activity for each
Figure 4 shows the same three node network from Fig. 2B. This network is uniquely and correctly identifiable from the provided data points. However, if a single data point is changed from a 1 to a 0 (highlighted in red), nearly all relationships identified are incorrect. This highlights the importance of proper binarization.
Fig. 4.

The effect of a mis-binarized data point on network inference. The 3-node network from Fig. 2 is uniquely and correctly identifiable. However, if a single data point is changed from a 1 to a 0 (highlighted in red), then the network inferred from the data set becomes unrecognizable. Binarized data should be assessed qualitatively to ensure that there are few, if any, mischaracterizations. Averaging multiple runs or using multiple data sets for inference should reduce the effect of noise
Variable reduction and data simplification
After data are binarized, different species may have identical behaviors. Species with identical values present an issue of identifiability: if two species have identical values, then it is impossible to determine whether one species or the other or both drive changes throughout the network.
Figure 5 shows a sample set of binarized data, in which species A and B have identical values at every single time point. Visual inspection of the data set makes it clear that A, B, and C form an oscillatory network, although the specifics of the network are unclear: C may be regulated by both A and B, or may be regulated by one or the other, or may be autoregulated. In this case, C has 12 possible Boolean equations that are equally valid, without any meaningful ability to determine between them. In this case, A and B can be reduced to a single variable. If B was combined with A, representing a module or cluster of species rather than a single species, the number of possible equations is reduced to 4.
Fig. 5.

Top: example of species being clustered. A and B have identical time-course data after binarization, which leads to a problem of identifying a suitable model that might be present. Instead, both species can be treated as a single variable (highlighted in green). Bottom: several complex relationships may be inferred from the binarized data. After A and B are treated as a single variable, the number of potential networks is significantly reduced (not all networks are shown)
Within our dataset, the two heterodimer species, drug/CD123 (DTA) and drug/CD3 (DTC), have identical binarized datasets and were reduced into a single heterodimer variable (referred to as DT in our model).
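A minimal sketch of this reduction step is shown below: species whose binarized time courses are identical are grouped and represented by a single merged variable. The function name, the "+"-joined naming of merged variables, and the example data are hypothetical.

```python
def merge_identical_species(binary_data):
    """Group species whose binarized time courses are identical.

    `binary_data` maps species name -> list of 0/1 values.
    Returns (merged, groups): `merged` holds one series per group and
    `groups` lists the species names that were combined.
    """
    groups = {}
    for name, series in binary_data.items():
        # Identical series hash to the same key and land in the same group.
        groups.setdefault(tuple(series), []).append(name)
    merged = {"+".join(names): list(series) for series, names in groups.items()}
    return merged, list(groups.values())


# Hypothetical binarized data in which A and B are indistinguishable.
data = {
    "A": [0, 1, 1, 0, 0],
    "B": [0, 1, 1, 0, 0],
    "C": [0, 0, 1, 1, 0],
}
merged, groups = merge_identical_species(data)
```

Keeping the list of merged names makes it possible to expand the module back into its member species later, when additional data can separate them.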
Constructing Boolean networks
We utilize a paradigm for inferring Boolean networks known as a best-fit extension [32, 35]. For a given set of time-course data with n species and T time points, we define a Boolean network as a set of n Boolean functions. The Boolean function that defines the value of species Si is Bfi. The state of the network at time point t is the set of all values of all species within the network and is referred to as X(t). The value of Si at a time point t + 1 is Si(t + 1) = Bfi(X(t)).
For a given species Si, a partially defined Boolean function (pdBf) can be described as a set of states that relate X(t) to Si(t + 1). These states can be visualized as a truth table or transition table, with the input columns representing the current state of the network at time t and the output column representing the value of Si(t + 1) as seen in Fig. 6. In Fig. 6, the highlighted green rows show the example state values identified in the time-course data for A and B, along with the value of C at the following time point. Highlighted in red are theoretical system states that are not present within the time-course data. Thus, the pdBf is considered partially defined because the empirical dataset only presents a subset of all possible network states. The remaining states are undefined and create uncertainty in which Boolean functions might best explain the data.
Fig. 6.

Example of fitting a Boolean function to a partially defined Boolean function. Highlighted in green are the known relationships from the data set. Highlighted in red are theoretical states that are not defined by the data. Thus, the data provides a “partial definition” of the Boolean functions (pdBF). On the right are three proposed Boolean functions, which are extensions of the partially defined Boolean data on the left. Here each proposed function has radically different behaviors for the undefined states, while perfectly representing the known results. This will result in many possible extensions that classify the pdBF equally well
The best-fit extension paradigm seeks to find an extension of the pdBf that can most accurately model the available data. In this paradigm, an exhaustive search is performed to fit all possible Boolean functions for each species in the network. For each proposed Boolean function for species Si, an error is calculated from the exhaustive search, in which the error is determined by the number of times the proposed Boolean function misclassifies the value of Si in the data set. The Boolean functions with the lowest error are considered candidate functions, which are all equally viable for inclusion in the final model. If a dataset is sufficiently descriptive, there may be a small number of candidate functions, and likewise, if a dataset has few unique time points, there may be many candidate functions.
In the example shown in Fig. 6, multiple Boolean functions each agree with the pdBf for C, but each one has very different implications for the theoretical states which are not present within the dataset. These functions would all be considered candidate functions, and further information would be necessary to differentiate them. Selecting one candidate function for each species within the network yields a candidate network. Thus, if there are multiple candidate functions for each species within the network, there may be many candidate networks.
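The exhaustive search can be sketched as follows for a single target species. This is a simplified illustration under stated assumptions rather than the implementation used in this work: every truth table over every input subset up to a chosen indegree is scored by its number of misclassified transitions, and all tables sharing the lowest error are returned. The function name, the data layout, and the example network (a hypothetical 3-node negative feedback loop) are assumptions.

```python
from itertools import combinations, product


def best_fit_extension(states, target, max_indegree=2):
    """Exhaustively score Boolean functions for one target species.

    `states` is a list of dicts (species -> 0/1), one per time point.
    Returns (min_error, candidates); each candidate is (inputs, truth_table),
    where truth_table maps a tuple of input values to the predicted output.
    """
    species = sorted(states[0])
    # Transition pairs: network state X(t) and the target value at t + 1.
    pairs = [(states[t], states[t + 1][target]) for t in range(len(states) - 1)]
    best_error, candidates = None, []
    for k in range(max_indegree + 1):
        for inputs in combinations(species, k):
            rows = list(product((0, 1), repeat=k))
            for outputs in product((0, 1), repeat=len(rows)):
                table = dict(zip(rows, outputs))
                # Error = number of transitions the function misclassifies.
                error = sum(
                    table[tuple(x[name] for name in inputs)] != y
                    for x, y in pairs
                )
                if best_error is None or error < best_error:
                    best_error, candidates = error, [(inputs, table)]
                elif error == best_error:
                    candidates.append((inputs, table))
    return best_error, candidates


# Hypothetical time course generated by A = ~C, B = A, C = B.
columns = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)]
time_course = [dict(zip("ABC", row)) for row in columns]
error, found = best_fit_extension(time_course, "B", max_indegree=1)
```

With the indegree limited to 1, the only zero-error function for B in this example is B = A. Raising `max_indegree` also returns tables that include redundant inputs, which is one reason the candidate set usually needs the pruning discussed in the next section.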
Pruning possible network structures
The Boolean network inference process can likely result in multiple candidate networks. Depending on the prior knowledge of a system, known relationships can help identify which candidate functions are more likely. As an example: B-Raf is known to be negatively regulated by Akt [48], therefore upon reviewing candidate functions for B-Raf activity, one might select only functions that include inhibition by Akt. Alternatively, RKIP is known to not inhibit B-Raf activity [49], and therefore one might reject any candidate functions that propose this relationship.
Specific prior knowledge can help reduce the candidate functions for individual species, but there are several tools to reduce the number of candidate equations for all species in a network. One tool to limit the number of candidate networks is to restrict the indegree, or the number of inputs, a species has in its Boolean function. Previous authors have recommended using a low indegree [50], or specifically limited the number of inputs to 3 [51]. The specific limitations will likely depend on the number and nature of the species in the network, and prior knowledge about the system will help predict whether a large or small indegree for a specific species can be expected.
Another useful tool is to remove Boolean equations that lack biological realism. The exhaustive search performed when solving for a best-fit extension is generally inclusive of all possible Boolean functions; however, certain categories of Boolean functions are unlikely in biological networks. One such example is an XOR gate. In our case study with flotetuzumab, the number of total T-cells present in the system was identified to either be determined by an AND function (Eq. 8) or a parity XOR function (Eq. 9).
| (8) |
| (9) |
To our knowledge, there are no natural examples of XOR gates in genetic systems, and the theoretical possibility of any parity function evolving in a natural system has been questioned altogether [52]. Any Boolean function that appears biologically unfeasible should be discounted, and the presence of such a relationship might indicate that the dataset being used is lacking or that the true relationships present may be poorly represented by a Boolean function. For this reason, we can dismiss Eq. 9 as a candidate equation for the number of T-cells, reducing the possible relationships to just Eq. 8.
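A filter of this kind can be sketched as below. The check flags truth tables that are the XOR (or its negation, XNOR) of all of their inputs; the function name and the two example tables are hypothetical, and the inputs of Eqs. 8 and 9 are not reproduced here.

```python
from itertools import product


def is_parity_function(table, n_inputs):
    """Return True if the truth table is the XOR or XNOR of all its inputs."""
    if n_inputs < 2:
        return False  # a single-input table is a copy or a NOT, not a parity gate
    rows = list(product((0, 1), repeat=n_inputs))
    xor = all(table[row] == sum(row) % 2 for row in rows)
    xnor = all(table[row] == 1 - sum(row) % 2 for row in rows)
    return xor or xnor


# Hypothetical two-input candidates for the same species.
and_table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

candidates = {"AND": and_table, "XOR": xor_table}
plausible = [name for name, t in candidates.items() if not is_parity_function(t, 2)]
```

Other exclusion rules, such as known non-interactions from prior knowledge, could be applied in the same way as additional predicates over the candidate list.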
Network reduction and sensitivity analysis
Sensitivity analysis in Boolean networks can be viewed as disrupting the network in some way and observing the response by the model. This can be done by perturbing the value of a single species and evaluating how the system responds, or by adding or removing a relationship between two species (known as a bridging fault) [53]. The perturbation to the system can either be transient (similar to a pharmaceutical or toxin being present in the system), or it can be permanent (referred to as a ‘stuck-at-fault’ event). Both types of perturbation attacks are biologically relevant, with stuck-at-fault events being used as models of disease progression in oxidative stress [54] and cancer [55–57]. Bridging faults are less biologically feasible, whereas a stuck-at-fault event may represent the over- or under-expression of a gene or biological component representing a fundamental change in the relationships between components.
Figure 7 shows the effect of perturbations on the performance of a network. The top of Fig. 7 shows the normal functioning of a 3-node negative feedback loop. The middle of Fig. 7 shows the effect of a single, transient perturbation on the network: It induces a time-lag in the network output; however, it quickly returns to the same behavior of the original network. The bottom of Fig. 7 shows a stuck-at-fault perturbation: Node C is stuck to a value of 1, and the dynamic behavior of the network is completely stopped. These attacks can be evaluated in several ways: Qualitatively, transient attacks can be studied to determine whether or not a system returns to its normal behavior after a perturbation [58]. If the network does not return to its normal behavior, then likely the perturbed node is critical to the performance of the network. The effect of the perturbation is measured by calculating the probability of activation for each species when the network is perturbed and comparing it to the probability of activation for each species when the network is unperturbed. The ratio of these probabilities determines how influential the attacked node is. This has previously been used to identify drug targets for Lupus [59].
Fig. 7.

Examples of “attacks” on a Boolean network, which can be used to assess sensitivity. Top shows a network consisting of nested feedback loops, and the steady-state network output is on the right. This steady-state behavior is oscillatory, T6 will be the same as T1, and the network will repeat indefinitely. The middle shows a “perturbation”, in which the value of a variable is switched from 0 to 1. Here the system quickly returns to the same steady-state behavior, although phase-shifted. Bottom shows a “stuck-at-fault” attack, in which one variable is forced permanently to a value. In this case, “C” is forced to a value of “1”, the system never returns to the same behavior, and the system becomes stuck in a certain state
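The two kinds of attack can be sketched with a small synchronous simulator. The network below is a hypothetical 3-node negative feedback loop (A = ~C, B = A, C = B) chosen to resemble the example; the function names, the choice of perturbed node, and the timing of the flip are assumptions.

```python
def simulate(rules, state, steps, stuck=None, flip_at=None):
    """Synchronously update a Boolean network.

    rules:   dict species -> function(state dict) -> 0/1
    stuck:   optional dict species -> value forced at every step (stuck-at fault)
    flip_at: optional (step, species); that value is inverted once (transient)
    """
    stuck = stuck or {}
    state = {**state, **stuck}
    trajectory = [dict(state)]
    for step in range(1, steps + 1):
        state = {name: rule(state) for name, rule in rules.items()}
        if flip_at is not None and flip_at[0] == step:
            state[flip_at[1]] = 1 - state[flip_at[1]]
        state.update(stuck)  # a stuck node ignores its own rule
        trajectory.append(dict(state))
    return trajectory


def activation_frequency(trajectory, name):
    """Fraction of recorded time points at which `name` is active."""
    return sum(s[name] for s in trajectory) / len(trajectory)


rules = {
    "A": lambda s: 1 - s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}
start = {"A": 0, "B": 0, "C": 0}

normal = simulate(rules, start, 6)
transient = simulate(rules, start, 7, flip_at=(2, "B"))
stuck = simulate(rules, start, 6, stuck={"C": 1})
```

In this example the transient flip delays the oscillation by one step, after which the trajectory rejoins the unperturbed cycle, whereas forcing C to 1 freezes the network in a single state. Comparing `activation_frequency` between perturbed and unperturbed runs gives a simple measure of how influential the attacked node is.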
Whereas sensitive nodes may be valuable to identify potential drug targets, nodes that have little influence over the overall network dynamic may be dropped. This can serve to simplify the Boolean network or to make it easier to fit the network to a continuous model. Networks can consist of tens or hundreds of nodes, which can make them difficult to analyze or convert into continuous models given the large number of reactions that occur. Several methods exist for reducing Boolean networks, all of which aim to retain fundamental dynamic properties of the network [60–62]. The simplest form of network reduction is to identify all nodes that do not have a self-loop (i.e., do not have an edge from themselves to themselves). Each non-self-looping node can be removed, one at a time, and the incoming edges of the removed node are attached to the children of the removed node. Figure 8 shows an example of this approach: Node C is reduced, and the incoming edge to Node C is attached to Node A. The network is further reduced to only consist of Node A. During this process, the network remains a negative feedback loop and retains its oscillatory behaviors.
Fig. 8.

Example of network reduction. In this method, a node with a single input and output can be reduced, and the input to the reduced node is instead attached to the children of that node. In the middle, the node “C” is reduced, and the input to C (B) is inherited by the children of C. In this case, only A is a child of C, and therefore the equation for A is set to be A = B. This process is repeated and reduces B (Bottom). During this process, the network remains oscillatory and fundamentally retains the structure of a negative feedback loop. The period is different in each case, with the full network having a period of 6, and the fully reduced network at the bottom having a period of 2
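The reduction step can be sketched by substituting the rule of the removed node into the rules of its children and then comparing the period of the resulting oscillation. The sketch assumes a hypothetical 3-node negative feedback loop in which the inhibition sits on the edge into A (A = ~C, B = A, C = B); the placement of the inhibitory edge, the function names, and the rule representation are assumptions.

```python
def remove_node(rules, node):
    """Remove a non-self-looping node by substituting its rule into its children.

    `rules` maps species -> (inputs, fn), where fn takes a state dict.
    """
    node_inputs, node_fn = rules[node]
    if node in node_inputs:
        raise ValueError("nodes with a self-loop are not removed")
    reduced = {}
    for name, (inputs, fn) in rules.items():
        if name == node:
            continue
        if node in inputs:
            # The child inherits the inputs of the removed node.
            new_inputs = tuple(sorted((set(inputs) - {node}) | set(node_inputs)))
            # Default argument binds this child's own rule to the closure.
            new_fn = lambda s, fn=fn: fn({**s, node: node_fn(s)})
            reduced[name] = (new_inputs, new_fn)
        else:
            reduced[name] = (inputs, fn)
    return reduced


def attractor_period(rules, state, max_steps=1000):
    """Length of the cycle reached from `state` under synchronous updates."""
    seen = {}
    for step in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return step - seen[key]
        seen[key] = step
        state = {name: fn(state) for name, (_, fn) in rules.items()}
    return None


full = {
    "A": (("C",), lambda s: 1 - s["C"]),
    "B": (("A",), lambda s: s["A"]),
    "C": (("B",), lambda s: s["B"]),
}
without_c = remove_node(full, "C")
only_a = remove_node(without_c, "B")

periods = (
    attractor_period(full, {"A": 0, "B": 0, "C": 0}),
    attractor_period(without_c, {"A": 0, "B": 0}),
    attractor_period(only_a, {"A": 0}),
)
```

Under these assumptions the network remains oscillatory at every stage while the period shortens from 6 to 4 to 2, which is consistent with the periods of the full and fully reduced networks noted in the Fig. 8 caption.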
We contend that this network reduction should only be done after any type of perturbation/sensitivity analysis has been performed. If a node is vulnerable to a stuck-at-fault attack, then it may be critical to the progression of a disease or represent a potential future drug target. With this in mind, highly sensitive nodes should not be reduced, even if they may be reducible using an existing reduction method.
Converting Boolean networks into ODE models
Once a candidate network has been selected for further analysis, it can be converted into a more detailed model. Each Boolean function within a network can be fitted to a continuous approximation. Homologs between discrete logic functions and continuous functions have been made since the inception of Boolean networks [19]. The Boolean function for a species can be represented as an ordinary differential equation with the form:
$$\frac{dy}{dt} = \frac{1}{\tau}\left(\bar{B}_y - y\right) \tag{10}$$
with τ representing the lifespan of the species, and $\bar{B}_y$ a continuous homolog for the Boolean function of species y. In this equation, species y is synthesized at a rate set by the value of $\bar{B}_y$ and τ, and is degraded at a rate dependent on the concentration of y and the value of τ. Many biochemical reactions follow Hill-type dynamics [63], which makes Hill-type equations a suitable starting point for these homolog equations. From this, we can assume a general structure of:
$$\bar{B}_y = \frac{X^{n}}{X^{n} + k^{n}} \tag{11}$$
with X as a species that y depends on, n corresponding to the slope of the curve, and k representing a threshold of activity. This generic structure has been generalized to Boolean equations [28], allowing for a rapid and automatic transition from the Boolean space to the continuous space. There are some obvious limitations to this method. We assume that the rate of synthesis and the rate of degradation are both dependent on the same τ, rather than using different time constants. This serves to bound the value of y between 0 and 1, but removing these bounds may be useful when further developing a model. This ODE will also not apply to all species: cell types may need a proliferation constant, rather than a degradation constant. With these limitations in mind, we accept that the system of ODEs produced by this process can serve as the backbone of a final model but may need to be updated to reflect biological reality.
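As a minimal sketch of this structure, the Python snippet below integrates a single species with a constant input, assuming the form dy/dt = (H(X) − y)/τ with a Hill term H(X) = X^n/(X^n + k^n). The function names, the forward-Euler scheme, and all parameter values are illustrative assumptions and are not taken from the pipeline itself.

```python
def hill(x, n, k):
    """Hill-type activation: 0 at x = 0, 0.5 at x = k, approaching 1 for x >> k."""
    return x**n / (x**n + k**n)


def simulate_species(x_input, tau, n, k, y0=0.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dy/dt = (hill(x_input) - y) / tau."""
    y = y0
    trajectory = [y]
    for _ in range(steps):
        y += dt * (hill(x_input, n, k) - y) / tau
        trajectory.append(y)
    return trajectory


# Hypothetical values: a fully "on" input, moderate steepness, threshold at 0.5.
trajectory = simulate_species(x_input=1.0, tau=1.0, n=3, k=0.5)
print(trajectory[-1], hill(1.0, 3, 0.5))
```

Because synthesis and degradation share the same τ, y relaxes toward the value of the Hill term and cannot leave the interval between 0 and 1, which is the bounding behavior described above.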
This type of automatic fitting is useful for generating an overall structure, but refining the model will likely be necessary in many cases. If species within the network follow different types of behaviors or do not conform to the assumptions of the BooleCube/HillCube equations described above, then those equations will need to be reworked. Parameterization also plays a critical role: the same set of equations with different parameters may create different dynamics. This may mean that even if a Boolean model properly describes the binarized data, the converted system of ODEs may not appear to describe the continuous data available.
Figure 9 shows a very simple Boolean network: two nodes creating a negative feedback loop with one another. Using a synchronous updating scheme, the network will be oscillatory regardless of the starting values of either node. This network is then converted into its HillCube equivalents:
| (12) |
| (13) |
Fig. 9. Example of converting a Boolean network into a continuous system. Here the Boolean network is predefined to consist of a negative feedback loop, which creates an oscillatory output at steady state (top). The bottom represents the output of the ODE model fitted by ODEfy with 3 different parameterizations. These show non-oscillatory, damped oscillatory, and steady-state oscillatory behaviors, even though the structure of the equation is the same.
The equations bound the values of A and B between 0 and 1, relating them to the binary space. The parameters n1 and n2 reflect the steepness of the response of each variable. The parameters τA and τB represent the rate of degradation or lifetime of the species. The parameters ka and kb correspond to the value at which a variable is considered “active” or when it is half maximally active. All of this assumes a Hill-function-like relationship, but if there are linear kinetics or if there is competition between species, then other equations may need to be substituted. This pipeline was implemented in R and MATLAB. The binarization and network inference have been implemented in R, the conversion from the Boolean network to ODEs was done in MATLAB using ODEfy, and simulations and graphs were generated in MATLAB.
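The effect of parameterization on such a two-node loop can be explored with a short simulation. The Python sketch below is not the ODEfy/MATLAB implementation used here. It assumes that A activates B and B inhibits A, with the inhibition written as one minus a Hill term, and it uses a forward-Euler scheme. The direction of the loop, the pairing of n1 and n2 with the two equations, and all parameter values are assumptions made for illustration.

```python
def hill(x, n, k):
    """Hill-type activation, equal to 0.5 when x equals the threshold k."""
    return x**n / (x**n + k**n)


def simulate_feedback(tau_a, tau_b, n1, n2, ka, kb,
                      a0=1.0, b0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of an assumed two-node negative feedback loop.

    dA/dt = ((1 - hill(B, n1, kb)) - A) / tau_a   (B inhibits A)
    dB/dt = (hill(A, n2, ka) - B) / tau_b         (A activates B)
    """
    a, b = a0, b0
    trajectory = [(a, b)]
    for _ in range(steps):
        da = ((1.0 - hill(b, n1, kb)) - a) / tau_a
        db = (hill(a, n2, ka) - b) / tau_b
        a, b = a + dt * da, b + dt * db
        trajectory.append((a, b))
    return trajectory


# Three hypothetical parameterizations of the same equations.
parameter_sets = {
    "shallow": dict(tau_a=1.0, tau_b=1.0, n1=1, n2=1, ka=0.5, kb=0.5),
    "steep": dict(tau_a=1.0, tau_b=1.0, n1=8, n2=8, ka=0.5, kb=0.5),
    "mixed_timescales": dict(tau_a=0.2, tau_b=2.0, n1=8, n2=8, ka=0.5, kb=0.5),
}
results = {name: simulate_feedback(**p) for name, p in parameter_sets.items()}
```

Plotting the A and B trajectories in `results` against time is one way to compare how steepness and time constants change the response while the equation structure stays fixed.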
Case study: flotetuzumab
Our framework was applied to develop a biological network modeling the pharmacodynamic effects of flotetuzumab [34]. This bispecific antibody-based construct binds to CD3 and CD123, forming a tri-molecular complex. This complex leads to a pool of “active” T-cells, which proliferate and result in an expansion of the number of T-cells post-treatment. Mathematical models of the drug show that CD123+ cells should be depleted during treatment and gradually return post-treatment [33].
After a dose was administered, the model output was sampled every 10 min for the first hour. Following this hour, the model output was sampled every 24 h until the next dosing event. The pharmacodynamic effect of flotetuzumab (also referred to as MGD006) is relatively rapid compared to the overall timeframe of the simulated treatment. This type of sampling enabled the capture of more unique network states by having a higher sampling density when the network is changing more rapidly. This dataset was used in our pipeline to infer a Boolean network consisting of 6 nodes present in the pharmacodynamic system.
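The sampling schedule described above can be written down explicitly. The Python sketch below is a hypothetical illustration: the dosing times are not stated in this section, so weekly doses are assumed, and the 24 h samples are assumed to be counted from the time of the dose. Times are kept in integer minutes to avoid rounding issues.

```python
def sampling_times(dose_times_min, end_min,
                   dense_step=10, dense_span=60, sparse_step=24 * 60):
    """Sample times in minutes: dense after each dose, then sparse until the next dose."""
    times = []
    boundaries = list(dose_times_min) + [end_min]
    for dose, next_dose in zip(boundaries, boundaries[1:]):
        # Every 10 min during the first hour after the dose.
        times.extend(t for t in range(dose, dose + dense_span + 1, dense_step)
                     if t < next_dose)
        # Every 24 h afterwards, stopping before the next dosing event.
        times.extend(range(dose + sparse_step, next_dose, sparse_step))
    return times


WEEK_MIN = 7 * 24 * 60  # assumed dosing interval (hypothetical)
times = sampling_times(dose_times_min=[0, WEEK_MIN], end_min=2 * WEEK_MIN)
print(len(times), times[:8])
```

A schedule of this kind concentrates samples in the period right after each dose, where the binarized network state changes most often.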
This inference was done using the Best-Fit Extension algorithm, described in detail in the methods section. This process performs an exhaustive search over all possible Boolean functions that may explain the binarized dataset and, for each variable, identifies the Boolean function with the smallest number of inputs that produces the smallest error [35]. The identified Boolean functions were then converted into a system of ODEs based on the HillCube fitting described in the methodology [29].
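The core idea of this search can be sketched in a few lines of Python. This is a simplified illustration and not the R implementation used in the pipeline: for each candidate input set, the best Boolean function takes the majority next-state value for every observed input pattern, and its error is the number of transitions it cannot explain. The function name `best_fit`, the data layout, and the toy time series are assumptions.

```python
from itertools import combinations


def best_fit(series, target, candidates, max_inputs=2):
    """Return (inputs, truth_table, error) for the target variable.

    `series` is a list of dicts mapping variable names to 0/1 at consecutive
    time points. Input sets are tried from smallest to largest, so the
    smallest set reaching the minimum error is kept.
    """
    transitions = list(zip(series, series[1:]))
    best = None
    for size in range(max_inputs + 1):
        for inputs in combinations(candidates, size):
            counts = {}
            for now, nxt in transitions:
                key = tuple(now[v] for v in inputs)
                tally = counts.setdefault(key, [0, 0])
                tally[nxt[target]] += 1
            # Each input pattern contributes its minority count as error.
            error = sum(min(tally) for tally in counts.values())
            table = {key: int(t[1] > t[0]) for key, t in counts.items()}
            if best is None or error < best[2]:
                best = (inputs, table, error)
    return best


# Toy binarized series generated by A <- B and B <- NOT A.
series = [
    {"A": 0, "B": 0},
    {"A": 0, "B": 1},
    {"A": 1, "B": 1},
    {"A": 1, "B": 0},
    {"A": 0, "B": 0},
]
rule_a = best_fit(series, "A", ["A", "B"])
rule_b = best_fit(series, "B", ["A", "B"])
print(rule_a, rule_b)
```

On this toy series the search recovers a single-input rule for each variable with zero error. Real datasets may leave several functions tied for the minimum error, and this sketch simply keeps the first one it encounters.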
As mentioned earlier, the two heterodimer species DTA and DTC had the same binarized datasets, and therefore were reduced into a single variable DT. During the network inference, the drug (CC) was assumed to have no inputs from the system. The heterodimers (DT), trimolecular complex (SYN), active T-cells (ATC), T-cells (TC), and CD123+ cells (CD123) were not restricted in identifying potential relationships. This algorithm uniquely identified a single likely relationship for each species except for the T-cells. The species TC had two potential equations (Eqs. 8 and 9), although one of the equations contained an exclusive OR (XOR) function and therefore was not considered. This left a single unique set of Boolean equations (Eqs. 14 to 19), which was identified as the candidate network:
| (14) |
| (15) |
| (16) |
| (17) |
| (18) |
| (19) |
Figure 10 compares the relationships of the original QSP model, restricted to the variables considered in our dataset, with the network inferred from the binarized dataset using the Best-Fit Extension. All identified relationships are known to reflect biological relationships present within the system. However, several relationships were not identified: CD123+ cells should have a self-loop to represent the proliferation of the cells, and there should be a feedback loop between DT and SYN. We hypothesize that in the given dataset, these specific relationships do not generally drive behavior, and that a more detailed dataset may allow these relationships to be inferred. One challenge in these identifications is that the inference may only detect one direction of a bidirectional reaction: if the reversible reaction DT ↔ SYN has a much stronger forward reaction than backward reaction, only the forward reaction may be detected. Careful experimental design may allow one to collect sufficient data to infer both directions of a reaction, but it is recommended to be mindful of potential false negatives in model building.
Fig. 10. Comparison between the true DART network (left) and an inferred network (right). The true network is inferred qualitatively from the equations presented in the original paper, for the species present in the dataset. The overall structures are highly similar, with 0 misidentified relationships and 2 relationships missing. The relationships in the original model but missing in the inferred model are represented with dashed lines. These include the self-proliferation of CD123 cells and the dissociation of the trimolecular synapse into the heterodimer DTA or DTC.
These equations were then mapped into continuous ODEs, and a simulation was performed to determine if the dynamics of the final network were qualitatively similar to those in the literature [33]. Equations 20 to 25 are the ODEs produced by the pipeline:
| (20) |
| (21) |
| (22) |
| (23) |
| (24) |
| (25) |
Despite the missed relationships and the species missing from the network, the overall dynamics of the system are qualitatively correct. The output of the ODE model is presented in Fig. 11. Here the drug was set to a value of “1” initially to simulate a dose. There is a sharp rise and initially sustained presence of the Drug-CD3 complex, followed by the tri-molecular complex and the presence of active T-cells. Once these values decrease, there is a sharp increase in the presence of total T-cells and the number of CD123+ cells. Overall, the equations identify the relationships between species but do not fully capture the dynamics. For example, the differential equations for DTA, DTC, and SYN from the original model are:
| (26) |
| (27) |
| (28) |
Fig. 11. Comparison of the dynamics of the ODE model identified by our pipeline (A) with those of the original continuous model on which the dataset is based (B). Many qualitative behaviors are the same: CD123 is suppressed after the first dose and does not recover until after the final dose is finished, total T-cell counts are suppressed during the dose but recover before the next dose is taken, and activated T-cells are persistent until after the final dose is finished. Although this is not a perfect fit, this demonstrates that with only data one can recreate the relationships and qualitative dynamics of the system.
The equations for DTA and DTC are dependent on CC, the relevant unbound receptors for the dimers, and SYN. We identify the combined DT variable as being dependent on CC, which is present in both equations. Our system did not have the unbound receptors present, and therefore could not include them in the equation. Finally, we did not identify DT as being dependent on SYN as it is in the original equation. This is considered a false negative: our process failed to identify a relationship that was present within the original network. One of the limitations of this methodology is that reversible reactions (such as DT reversibly forming SYN) may only be identified in a single direction, especially if one direction of the reaction is significantly stronger. The original equation for SYN is dependent on DTA and DTC, and in our equation SYN is dependent on the combined dimer variable DT, so the inferred dependency agrees with the original model.
In addition, there are no Hill-type relationships in these differential equations, indicating that our initial assumptions about the equation structure were incorrect; using linear relationships would be more accurate to the original model. Because the differential equations produced are bounded between 0 and 1, there are no rate constants (such as the kon and koff terms), only a single value of τ, which influences both the rate of production and the rate of degradation. Although this might be suitable for a first-pass analysis, this constraint will likely need to be removed or reworked for a more refined ODE model. Within the constructed equations, the only way to reduce the rate of degradation is to change the value of τ, which will also slow the rate of production of a species. These equations also assume that species should degrade over time; however, this network utilizes 3 different cellular species (active T-cells, total T-cells, and CD123+ cells), all of which have self-proliferation not captured by the model. Whereas some simpler behaviors might be adapted to the pipeline (e.g., assuming linear instead of Hill-type relationships or using proliferation terms instead of degradation terms within equations), other behaviors might be harder to infer through this process. The equation for activated T-cells from the original paper utilized a value of ‘Time after dose’ to define an analytical equation. ‘Time’ is not defined as a variable anywhere in the pipeline, making it impossible to infer a relationship between the value of a species and time through this pipeline.
Despite these limitations, the final ODE model produced through the methods described qualitatively captures the behaviors of the original gold-standard model. Although several variables present in the original model (such as receptor occupancy) do not exist within our network, the relationships identified between the individual components are fundamentally correct.
Conclusions and future directions
We were able to capture the qualitative dynamics of the flotetuzumab pharmacodynamic system, and the relationships that were identified appear to be accurate and reflect biological reality. The relative concentrations of the cellular and biochemical species need to be further studied within the context of our final model (as all values are bounded between 0 and 1), but this demonstrates that even in systems with complex behaviors, this pipeline can robustly capture the overarching and critical behaviors and relationships. The inability to differentiate between DTA and DTC represents a loss of information when the two variables are combined, and this represents a fundamental limitation of data-driven approaches: if two variables appear to have highly similar patterns, it becomes difficult to differentiate between them without more information. Despite this limitation, many key connections within the network are accurately identified. The key biological results identified in the original paper were still present within the Boolean network inferred within this pipeline: our final model demonstrates intermittent changes in the number of T-cells, while the number of CD123+ cells remains suppressed for the duration of the simulation.
Boolean networks present an opportunity within quantitative systems pharmacology and systems biology. These networks synthesize qualitative and quantitative techniques that can utilize less data to create models early in the research of new systems. We envision these techniques being used in the initial phases of research to begin the development of models. These early models will likely be imperfect but present two opportunities: to serve as a backbone for building a more detailed model, with most of the relationships being identified in a data-driven manner when expert knowledge of a system may be lacking, or to serve to test for critical species within a system and drive hypothesis testing and data collection.
Our case study demonstrates that even when different types of species exist within a system (i.e., chemical species and cellular species), or when those species do not follow biochemical assumptions (cellular species that should proliferate, as opposed to the assumed degradation of biochemical species), the overall structure and dynamics of the network can still be accurately inferred (Figs. 10 and 11). This shows that the model building process (binarizing data, inferring Boolean networks, fitting the Boolean networks to ODEs, and simulating the system of ODEs) is robust toward many types of behaviors. There may be value in creating new methods for transforming Boolean equations to ODEs, as different types of biological components may require different types of ODE structures to accurately represent their emergent and dynamic behaviors.
Acknowledgements
IPA acknowledges support from NIH GM131800.
References
- 1. Emmert-Streib F, Dehmer M (2011) Networks for systems biology: conceptual connection of data and function. IET Syst Biol 5(3):185–207
- 2. Berger SI, Iyengar R (2009) Network analyses in systems pharmacology. Bioinformatics 25(19):2466–2472
- 3. Berger SI, Iyengar R (2011) Role of systems pharmacology in understanding drug adverse events. Wiley Interdiscip Rev Syst Biol Med 3(2):129–135
- 4. Danhof M (2016) Systems pharmacology–towards the modeling of network interactions. Eur J Pharm Sci 94:4–14
- 5. Wist AD, Berger SI, Iyengar R (2009) Systems pharmacology and genome medicine: a future perspective. Genome Med 1(1):11
- 6. Del Sol A et al. (2010) Diseases as network perturbations. Curr Opin Biotechnol 21(4):566–571
- 7. Jordan F, Nguyen TP, Liu WC (2012) Studying protein-protein interaction networks: a systems view on diseases. Brief Funct Genomics 11(6):497–504
- 8. Biane C, Delaplace F (2017) Abduction based drug target discovery using Boolean control network. International Conference on Computational Methods in Systems Biology. Springer
- 9. Haanstra JR, Bakker BM (2015) Drug target identification through systems biology. Drug Discov Today Technol 15:17–22
- 10. Huang J et al. (2013) Systematic prediction of pharmacodynamic drug-drug interactions through protein-protein-interaction network. 9(3):e1002998
- 11. Ayyar VS, Jusko W (2020) Transitioning from basic towards systems pharmacodynamic models: lessons from corticosteroids. Pharmacol Rev 72:1–25
- 12. Friedrich CM (2016) A model qualification method for mechanistic physiological QSP models to support model-informed drug development. CPT: Pharmacometr Syst Pharmaco 5(2):43–53
- 13. Androulakis IP (2016) Quantitative systems pharmacology: a framework for context. Curr Pharmacol Rep 2(3):152–160
- 14. Androulakis IP (2015) Systems engineering meets quantitative systems pharmacology: from low-level targets to engaging the host defenses. Wiley Interdisc Rev 7(3):101–112
- 15. Peterson MC, Riggs MM (2015) FDA advisory meeting clinical pharmacology review utilizes a quantitative systems pharmacology (QSP) model: a watershed moment. CPT Pharmacometrics Syst Pharmacol 4(3):e00020
- 16. Putnins M, Androulakis IP (2019) Boolean modeling in quantitative systems pharmacology: challenges and opportunities. Crit Rev Biomed Eng 47(6):473–488
- 17. Kauffman S (1969) Homeostasis and differentiation in random genetic control networks. Nature 224(5215):177–178
- 18. Thomas R (1973) Boolean formalization of genetic control circuits. J Theor Biol 42(3):563–585
- 19. Glass L, Kauffman SA (1973) The logical analysis of continuous, non-linear biochemical control networks. J Theor Biol 39(1):103–129
- 20. Kraeutler MJ, Soltis AR, Saucerman JJ (2010) Modeling cardiac β-adrenergic signaling with normalized-Hill differential equations: comparison with a biochemical model. BMC Syst Biol 4(1):1–12
- 21. Morris MK et al. (2010) Logic-based models for the analysis of cell signaling networks. Biochemistry 49(15):3216–3224
- 22. Balbas-Martinez V et al. (2018) A systems pharmacology model for inflammatory bowel disease. 13(3):e0192949
- 23. Bloomingdale P, Niu J, Mager DE (2018) Boolean network modeling in systems pharmacology. J Pharmacokinet Pharmacodyn 45(1):159–180
- 24. Thakar J et al. (2007) Modeling systems-level regulation of host immune responses. PLoS Comput Biol 3(6):e109
- 25. Birtwistle M, Mager D, Gallo J (2013) Mechanistic vs Empirical network models of drug action. CPT Pharmacometr Syst Pharmacol 2(9):1–3
- 26. Müssel C, Hopfensitz M, Kestler HA (2010) BoolNet—an R package for generation, reconstruction and analysis of Boolean networks. Bioinformatics 26(10):1378–1380
- 27. Terfve C et al. (2012) CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Syst Biol 6(1):1–14
- 28. Krumsiek J et al. (2010) Odefy-from discrete to continuous models. 11(1):1–10
- 29. Wittmann DM et al. (2009) Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling. BMC Syst Biol 3(1):98
- 30. Carter GW (2005) Inferring network interactions within a cell. Brief Bioinform 6(4):380–389
- 31. Wang RS et al. (2007) Inferring transcriptional regulatory networks from high-throughput data. Bioinformatics 23(22):3056–3064
- 32. Gao S et al. (2018) Efficient Boolean modeling of gene regulatory networks via random forest based feature selection and best-fit extension. In: 2018 IEEE 14th International Conference on Control and Automation (ICCA). IEEE
- 33. Campagne O et al. (2018) Integrated pharmacokinetic/pharmacodynamic model of a bispecific CD3xCD123 DART molecule in nonhuman primates: evaluation of activity and impact of immunogenicity. Clin Cancer Res 24(11):2631–2641
- 34. Chichili GR et al. (2015) A CD3xCD123 bispecific DART for redirecting host T cells to myelogenous leukemia: preclinical activity and safety in nonhuman primates. Sci Transl Med 7(289):289ra82
- 35. Boros E, Ibaraki T, Makino K (1998) Error-free and best-fit extensions of partially defined Boolean functions. Inf Comput 140(2):254–283
- 36. Saez-Rodriguez J et al. (2009) Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Mol Syst Biol 5(1):331
- 37. Barman S, Kwon Y-K (2017) A novel mutual information-based Boolean network inference method from time-series gene expression data. PloS One 12(2):e0171097
- 38. Lukacs PM, Burnham KP, Anderson DR (2010) Model selection bias and Freedman’s paradox. Ann Inst Stat Math 62(1):117
- 39. Nordling TE (2013) Robust inference of gene regulatory networks. PhD, KTH Royal Institute of Technology
- 40. Cheng D, Qi H, Li Z (2011) Model construction of Boolean network via observed data. IEEE Trans Neural Netw 22(4):525–536
- 41. Gonçalves J, Warnick S (2008) Necessary and sufficient conditions for dynamical structure reconstruction of LTI networks. IEEE Trans Autom Control 53(7):1670–1674
- 42. Berestovsky N, Nakhleh L (2013) An evaluation of methods for inferring Boolean networks from time-series data. PLoS One 8(6):e66031
- 43. Hopfensitz M et al. (2012) Multiscale binarization of gene expression data for reconstructing Boolean networks. IEEE/ACM Trans Comput Biol Bioinform 9(2):487–498
- 44. Zhou X, Wang X, Dougherty ER (2003) Binarization of microarray data on the basis of a mixture model. J Mol Cancer Ther 2(7):679–684
- 45. Shmulevich I, Kauffman SA, Aldana M (2005) Eukaryotic cells are dynamically ordered or critical but not chaotic. Proc Natl Acad Sci 102(38):13439–13444
- 46. Trinh H-C, Kwon Y-K (2021) A novel constrained genetic algorithm-based Boolean network inference method from steady-state gene expression data. Bioinformatics 37(Supplement_1):i383–i391
- 47. Charlebois DA et al. (2007) Effects of microarray noise on inference efficiency of a stochastic model of gene networks. WSEAS Trans Biol Biomed 4:15–21
- 48. Guan KL et al. (2000) Negative regulation of the serine/threonine kinase B-Raf by Akt. J Biol Chem 275(35):27354–27359
- 49. Trakul N et al. (2005) Raf kinase inhibitory protein regulates Raf-1 but not B-Raf kinase activation. J Biol Chem 280(26):24931–24940
- 50. Tabus I, Astola J (2001) On the use of MDL principle in gene expression prediction. EURASIP J Appl Signal Process 4:297–303
- 51. Kim H, Lee JK, Park T (2007) Boolean networks using the chi-square test for inferring large-scale gene regulatory networks. BMC Bioinformatics 8(1):1–15
- 52. Valiant LG (2009) Evolvability. J ACM 56(1):1–21
- 53. Abramovici M, Breuer MA, Friedman AD (1990) Digital systems testing and testable design. Vol. 2. Computer science press; New York
- 54. Sridharan S et al. (2012) Boolean modeling and fault diagnosis in oxidative stress response. BMC Genomics 13(6):S4
- 55. Layek R et al. (2011) Cancer therapy design based on pathway logic. Bioinformatics 27(4):548–555
- 56. Lin PC, Khatri SP (2012) Application of Max-SAT-based ATPG to optimal cancer therapy design. BMC Genomics 13(6):S5
- 57. Mohanty AK, Datta A, Venkatraj J (2012) Determining the relative prevalence of different subpopulations in heterogeneous cancer tissue. In: Proceedings 2012 IEEE International workshop on genomic signal processing and statistics (GENSIPS). IEEE
- 58. Ghanbarnejad F, Klemm K (2011) Stability of Boolean and continuous dynamics. Phys Rev Lett 107(18):188701
- 59. Ruiz-Cerdá ML et al. (2016) Towards patient stratification and treatment in the autoimmune disease lupus erythematosus using a systems pharmacology approach. Eur J Pharm Sci 94:46–58
- 60. Saadatpour A, Albert R, Reluga TC (2013) A reduction method for Boolean network models proven to conserve attractors. SIAM J Appl Dyn Syst 12(4):1997–2011
- 61. Veliz-Cuba A (2011) Reduction of Boolean network models. J Theor Biol 289:167–172
- 62. Zanudo JG, Albert R (2013) An effective network reduction approach to find the dynamical repertoire of discrete dynamic networks. Chaos 23(2):025111
- 63. Weiss JN (1997) The Hill equation revisited: uses and misuses. FASEB J 11(11):835–841
