PNAS Nexus. 2025 Jul 23;4(8):pgaf228. doi: 10.1093/pnasnexus/pgaf228

Critical assessment of the ability of Boolean threshold models to describe gene regulatory network dynamics

Claus Kadelka, Kishore Hari
Editor: Panayiotis Benos
PMCID: PMC12344490  PMID: 40809087

Abstract

The inference of gene regulatory networks (GRNs) from high-throughput data constitutes a fundamental and challenging task in systems biology. Boolean networks are a popular modeling framework to understand the dynamic nature of GRNs. In the absence of reliable methods to infer the regulatory logic of Boolean GRN models, researchers frequently assume threshold logic as a default. Using the largest repository of published expert-curated Boolean GRN models as the best available proxy of reality, we systematically compare the ability of two popular threshold formalisms, the Ising and the 01 formalism, to truthfully recover biological functions and biological system dynamics. While Ising rules match fewer biological functions exactly than 01 rules, they yield a better average agreement. In general, more complex regulatory logic proves harder to represent with either threshold formalism. Informed by these results and a meta-analysis of regulatory logic, we propose modified versions of both formalisms, which provide a better function-level and dynamic agreement with biological GRN models than the usual threshold formalisms. For small biological GRN models with low connectivity, corresponding threshold networks exhibit similar dynamics. However, they generally fail to recover the dynamics of large networks or highly connected networks. In conclusion, this study provides new insights into an important question in computational systems biology: how truthfully do Boolean threshold networks capture the dynamics of GRNs?

Keywords: Boolean networks, network inference, threshold rules, systems biology, dynamical systems


Significance Statement.

Gene regulatory networks (GRNs) describe the complex interactions between genes. GRNs control biological processes, respond to environmental changes, and contribute to diseases. An accurate understanding of the dynamics of GRNs is therefore crucial. Given insufficient data, researchers frequently employ Boolean network models with default threshold logic to study the dynamics of GRNs. We systematically assess the ability of two commonly used types of threshold networks to recover the true regulatory logic and the dynamics of published expert-curated Boolean GRN models. Inspired by biological insights, we further propose modifications to each threshold formalism that improve their match with biological networks.

Introduction

Gene regulatory network (GRN) inference, a fundamental but challenging task in computational systems biology, aims to uncover the complex interactions between genes from high-throughput expression data. Detailed knowledge of GRNs can help us understand how genes control biological processes, respond to environmental changes, or contribute to diseases. The complexity of GRN inference depends on the desired level of knowledge (Fig. 1). Earliest methods solely infer undirected coexpression networks (1, 2). Around 2010, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) challenges boosted the development of numerous GRN inference methods, as well as their comprehensive assessment using both synthetic and real bulk transcriptomic data (3–5). Many of these methods, e.g. the random forest-based GENIE3 (6), are able to infer directed networks. The advent of single-cell data led to the development of further, more specialized inference methods, comprehensively assessed in Refs. (7, 8). Some of these methods, e.g. SCODE (9) or SINCERITIES (10), tackle the even harder task of inferring signed directed networks, i.e. differentiating between positive (e.g. activation) and negative (e.g. inhibition) effects. Even a signed directed network is however static. Gene regulation, on the other hand, is a highly dynamic process.

Fig. 1.


The difficulty of GRN inference depends on the desired level of knowledge. Given high-throughput transcriptomic data, the inference of undirected coexpression networks is the simplest task. Inferring the directionality of regulation and the type of regulation (positive: green arrows, negative: red arrows) adds complexity. Inference of the underlying Boolean regulatory logic is harder still. This explains why frequently only the signed directed wiring diagram is known and default threshold assumptions are made about the regulatory logic.

Discrete dynamical systems (e.g. Boolean networks) are a popular tool to model the dynamic nature of GRNs. The static—frequently signed—directed graph, termed wiring diagram or dependency graph, describes the dependencies of the system. An additional set of update rules (e.g. Boolean functions) describes the regulatory logic governing the expression of each gene. In the simplest Boolean case, each gene is either on/active or off/inactive. Time is modeled in discrete time steps, and the expression level of a gene at a time step depends solely on the expression level of its regulators at the previous time step. Inference of the update rules represents a substantially more difficult task than inference of only the wiring diagram. Nevertheless, there exist several Boolean network inference tools (see e.g. (11, 12) and a recent review (13)). To this day however, none of these tools can accurately and consistently infer complex Boolean GRNs.

To enable analyses of the dynamics of GRNs, modelers therefore rely on assumptions about the Boolean rules. A common assumption is that all rules are so-called threshold rules, where the number of active and inactive activators and inhibitors is compared to determine the expression of a gene at the next time point. By design, this assumes that gene regulatory processes are additive (14). We know however that gene regulation often involves combinatorial interactions: A recent meta-analysis of 122 expert-curated Boolean GRNs found that the majority of gene regulatory logic is described by so-called nested-canalizing functions (NCFs) (15). For example, a gene might be activated whenever one of multiple transcription factors is present—an example of a nested-canalizing OR function. This raises multiple important questions for the dynamical analysis of Boolean GRNs:

  1. How much do the rules in expert-curated Boolean GRN models differ from threshold rules?

  2. How different are the dynamics of expert-curated GRN and corresponding threshold networks?

  3. There exist multiple threshold formalisms. Is one consistently “better” than another?

Using the repository of 122 expert-curated Boolean GRNs as the closest proxy of a “ground truth,” this manuscript attempts to answer these questions.

Methods

Boolean networks

A Boolean network $F=(f_1,\ldots,f_n)$ constitutes a popular modeling framework in systems biology. It consists of $n$ nodes (e.g. genes, proteins, etc.). Each node can only be in one of two states, denoted $\mathbb{F}_2$. Commonly, $\mathbb{F}_2=\{0,1\}$ or $\mathbb{F}_2=\{-1,1\}$. With biology in mind, we refer to the two states as absence or OFF (e.g. low protein concentration) and presence or ON (e.g. high protein concentration), respectively. Each node $x_i$ in a Boolean network possesses a Boolean update function $f_i:\mathbb{F}_2^n\to\mathbb{F}_2$, which describes the state of $x_i$ at the next time point given the current state of the system. Under a synchronous updating scheme, all nodes are updated simultaneously. In this case, $F:\mathbb{F}_2^n\to\mathbb{F}_2^n$ defines a deterministic state transition graph, also known as state space, which consists of $2^n$ states $x=(x_1,\ldots,x_n)\in\mathbb{F}_2^n$ and their deterministically defined transitions. Asynchronous updating schemes allow for nodes to be updated separately, at potentially different time scales (16). In most schemes, only a single node is updated at a time. In this case, the state space is typically stochastic because $x\in\mathbb{F}_2^n$ may transition to up to $n$ different states $y\in\mathbb{F}_2^n$, depending on which node $x_i$ is updated. In this study, we consider both a synchronous updating scheme and a general asynchronous updating scheme (each node is updated with equal probability (17)), which has been established as the most efficient and informative asynchronous updating method (18).
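The two updating schemes can be illustrated with a minimal Python sketch. The function names and the 2-node example network are illustrative assumptions, not part of any published code.

```python
from itertools import product

def sync_step(state, rules):
    """Synchronous update: every rule reads the same current state."""
    return tuple(rule(state) for rule in rules)

def state_transition_graph(rules):
    """Map each of the 2^n states to its unique synchronous successor."""
    n = len(rules)
    return {x: sync_step(x, rules) for x in product((0, 1), repeat=n)}

def async_successors(state, rules):
    """States obtained by updating exactly one node (one entry per node)."""
    return [state[:i] + (rule(state),) + state[i + 1:]
            for i, rule in enumerate(rules)]

# Hypothetical 2-node network: F(x1, x2) = (x1 OR x2, x1 AND NOT x2)
rules = [lambda x: x[0] | x[1], lambda x: x[0] & (1 - x[1])]
stg = state_transition_graph(rules)
```

Here `stg` holds the deterministic synchronous state space, while `async_successors` lists the candidate next states among which a general asynchronous scheme would choose uniformly at random.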

Boolean threshold functions

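A Boolean function is a threshold function if there exists a hyperplane that separates the points where the function is OFF and ON. More precisely, $f:\mathbb{F}_2^n\to\mathbb{F}_2$ is a threshold function if there exist weights $w_1,\ldots,w_n\in\mathbb{R}$ and a threshold $\theta\in\mathbb{R}$ such that for all $(x_1,\ldots,x_n)\in\mathbb{F}_2^n$,

$$f(x_1,\ldots,x_n)=0 \quad\text{if and only if}\quad \sum_{i=1}^{n} w_i x_i \le \theta. \tag{1}$$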

Threshold functions arise in a number of applications ranging from electrical engineering to reliability and game theory (19). Threshold functions are also frequently used in GRN inference. Here, only the type of regulation (activation vs. inhibition) is typically known from experiments. In other words, the sign of the weights $w_i$ is known but the magnitude is unknown. Thus, all standard threshold formalisms used in GRN inference assume, in the absence of further experimental evidence, $w_i\in\{-1,1\}$ and $\theta=0$. We refer to such threshold functions as basic. The weights $w_i$ can be thought of as labels of the edges of the wiring diagram of a biological Boolean network. That is, all experimental evidence is contained in the wiring diagram $W\in\{-1,0,1\}^{n\times n}$, also known as interaction matrix, where

$$W_{ij}=\begin{cases}1 & \text{if } x_j \text{ activates } x_i,\\ -1 & \text{if } x_j \text{ inhibits } x_i,\\ 0 & \text{if } x_j \text{ does not regulate } x_i.\end{cases} \tag{2}$$

In this manuscript, we assume that the wiring diagram has already been inferred and compare two threshold formalisms that are frequently employed to model the dynamics of biological Boolean networks: the Ising and the 01 formalism. The Ising formalism stems from statistical physics where it describes the zero-temperature Glauber dynamics of a disordered Ising model, commonly used to model hysteresis in ferromagnets and spin glasses (20–22). In the Ising formalism, the two binary states are $\mathbb{F}_2=\{-1,1\}$. In biological applications, $-1$ corresponds to the absence of a protein (i.e. OFF) and $1$ to the presence (i.e. ON). In the 01 formalism, the two binary states are instead $\mathbb{F}_2=\{0,1\}$, with 0 indicating the OFF state. For both threshold formalisms, the next state of node $x_i$ is derived from the current state of the network as follows:

$$x_i(t+1)=\begin{cases}\text{ON} & \text{if } \sum_j W_{ij}x_j(t)>0,\\ x_i(t) & \text{if } \sum_j W_{ij}x_j(t)=0,\\ \text{OFF} & \text{if } \sum_j W_{ij}x_j(t)<0,\end{cases} \tag{3}$$

where $W\in\{-1,0,1\}^{n\times n}$ is the underlying wiring diagram.

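For an alternative form of Eq. 3, we define $A_i,I_i\subseteq\{1,2,\ldots,n\}$ as the sets of indices of activators and inhibitors of node $x_i$. That is, $A_i=\{j:W_{ij}=1\}$ and $I_i=\{j:W_{ij}=-1\}$. With this, we have

$$x_i(t+1)=\begin{cases}\text{ON} & \text{if } \sum_{j\in A_i}x_j(t)>\sum_{j\in I_i}x_j(t),\\ x_i(t) & \text{if } \sum_{j\in A_i}x_j(t)=\sum_{j\in I_i}x_j(t),\\ \text{OFF} & \text{if } \sum_{j\in A_i}x_j(t)<\sum_{j\in I_i}x_j(t).\end{cases} \tag{4}$$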

Whether the OFF state is represented by $-1$ or $0$ has a major implication: A 01 threshold function is ON if the number of present (i.e. ON) activators is higher than the number of present inhibitors. If both numbers are equal, the state of node $x_i$ does not change. The number of absent (i.e. OFF) activators and inhibitors does not matter in the 01 formalism but it does matter in the Ising formalism. An Ising threshold function is ON if the number of present activators and absent inhibitors is higher than the number of absent activators and present inhibitors.

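For ease of comparison, we solely use $\mathbb{F}_2=\{0,1\}$ throughout the Results and Discussion. To express both 01 and Ising threshold functions as Boolean functions $f:\{0,1\}^n\to\{0,1\}$, we modify Eq. 4 slightly by replacing OFF and ON with 0 and 1. We define $X_0(t)=\{j:x_j(t)=0\}$ and $X_1(t)=\{j:x_j(t)=1\}$. With this, the 01 threshold formalism yields

$$x_i(t+1)=\begin{cases}1 & \text{if } |A_i\cap X_1(t)|>|I_i\cap X_1(t)|,\\ x_i(t) & \text{if } |A_i\cap X_1(t)|=|I_i\cap X_1(t)|,\\ 0 & \text{if } |A_i\cap X_1(t)|<|I_i\cap X_1(t)|,\end{cases} \tag{5}$$

while the Ising formalism yields

$$x_i(t+1)=\begin{cases}1 & \text{if } |A_i\cap X_1(t)|+|I_i\cap X_0(t)|>|A_i\cap X_0(t)|+|I_i\cap X_1(t)|,\\ x_i(t) & \text{if } |A_i\cap X_1(t)|+|I_i\cap X_0(t)|=|A_i\cap X_0(t)|+|I_i\cap X_1(t)|,\\ 0 & \text{if } |A_i\cap X_1(t)|+|I_i\cap X_0(t)|<|A_i\cap X_0(t)|+|I_i\cap X_1(t)|.\end{cases} \tag{6}$$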
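As a minimal sketch, the two formalisms over $\{0,1\}$ states can be written as follows. The function names and argument layout (a state tuple, the node's current value, and index lists of its activators and inhibitors) are assumptions made for illustration.

```python
def threshold_01(x, current, activators, inhibitors):
    """01 rule: compare present activators with present inhibitors."""
    on_act = sum(x[j] for j in activators)
    on_inh = sum(x[j] for j in inhibitors)
    if on_act > on_inh:
        return 1
    if on_act < on_inh:
        return 0
    return current  # tie: keep the current state

def threshold_ising(x, current, activators, inhibitors):
    """Ising rule on 0/1 states: absent regulators count as well."""
    pos = sum(x[j] for j in activators) + sum(1 - x[j] for j in inhibitors)
    neg = sum(1 - x[j] for j in activators) + sum(x[j] for j in inhibitors)
    if pos > neg:
        return 1
    if pos < neg:
        return 0
    return current  # tie: keep the current state
```

For a node with two activators that are both absent, the 01 rule sees a tie (0 vs. 0) and keeps the current state, whereas the Ising rule counts two absent activators as negative forces and switches the node off.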

Similarity between Boolean functions

To assess the similarity between two Boolean functions $f(x_1,\ldots,x_m)$ and $g(x_1,\ldots,x_n)$, where $m\le n$, we generate an extended Boolean function $\bar f(x_1,\ldots,x_m,x_{m+1},\ldots,x_n)=f(x_1,\ldots,x_m)$, which does not depend on the last $n-m$ variables. We then quantify the similarity $s(f,g)$ between $f$ and $g$ as the proportion of inputs $x=(x_1,\ldots,x_n)\in\mathbb{F}_2^n$ for which $\bar f(x)=g(x)$. That is,

$$s(f,g)=\frac{1}{2^n}\sum_{x\in\mathbb{F}_2^n}\mathbb{1}[\bar f(x)=g(x)]\in[0,1]. \tag{7}$$

For example, to compare $f(x_1)=x_1$ and $g(x_1,x_2)=x_1\vee x_2$, we generate $\bar f(x_1,x_2)=x_1$ and observe that $\bar f(x_1,x_2)=g(x_1,x_2)$ whenever $(x_1,x_2)\neq(0,1)$. Since $f$ and $g$ yield the same output for 3 out of 4 possible inputs, $s(f,g)=3/4$.
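The similarity measure can be computed by brute-force enumeration. The sketch below uses an illustrative function signature and takes $g$ to be the OR function, which reproduces the stated value of 3/4.

```python
from itertools import product

def similarity(f, m, g, n):
    """Fraction of the 2^n inputs on which f (extended to n inputs) equals g."""
    if m > n:
        raise ValueError("f must not have more inputs than g")
    agree = sum(f(*x[:m]) == g(*x) for x in product((0, 1), repeat=n))
    return agree / 2 ** n

f = lambda x1: x1            # f(x1) = x1
g = lambda x1, x2: x1 | x2   # g(x1, x2) = x1 OR x2
s = similarity(f, 1, g, 2)   # the two functions differ only at (0, 1)
```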

Similarity between Boolean network dynamics

We employ several measures to quantify the similarity between the dynamics of two Boolean networks $F,G:\mathbb{F}_2^n\to\mathbb{F}_2^n$. The state space describes the entire network dynamics. The overlap between the synchronous state spaces of networks $F,G$, denoted $D_S^{\text{sync}}(F,G)$, can be quantified by the proportion of states that update to the same state under $F$ and $G$, i.e.

$$D_S^{\text{sync}}(F,G)=\frac{1}{2^n}\sum_{x\in\mathbb{F}_2^n}\mathbb{1}[F(x)=G(x)]\in[0,1]. \tag{8}$$

Instead of an all-or-nothing indicator on the full next state, we can average the agreement between $F(x)$ and $G(x)$ over the $n$ nodes. This yields an alternative measure

$$D_S^{\text{async}}(F,G)=\frac{1}{n2^n}\sum_{x\in\mathbb{F}_2^n}\sum_{i=1}^{n}\mathbb{1}[(F(x))_i=(G(x))_i]\in[0,1], \tag{9}$$

which describes the overlap between the state spaces of the two networks under the general asynchronous updating scheme. With Eq. 7, this is simply the mean similarity of the Boolean functions, i.e.

$$D_S^{\text{async}}(F,G)=\frac{1}{n}\sum_{i=1}^{n}s(f_i,g_i). \tag{10}$$
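Both state-space overlaps can be obtained in a single pass over all states. The following is a sketch under the assumption that each network is given as a function from a state tuple to the next state tuple; the two example networks are hypothetical.

```python
from itertools import product

def state_space_overlap(F, G, n):
    """Return (synchronous overlap, asynchronous overlap) of two n-node networks."""
    states = list(product((0, 1), repeat=n))
    # Synchronous: the full next state must coincide.
    sync = sum(F(x) == G(x) for x in states) / len(states)
    # Asynchronous: average node-wise agreement of the next states.
    node_hits = sum(fx == gx for x in states for fx, gx in zip(F(x), G(x)))
    return sync, node_hits / (n * len(states))

# Hypothetical networks that differ only in the second rule.
F = lambda x: (x[0] | x[1], x[0] & (1 - x[1]))
G = lambda x: (x[0] | x[1], x[0])
d_sync, d_async = state_space_overlap(F, G, 2)
```

In this example the networks disagree only on the state 11, and only in the second node, so the synchronous overlap is 3/4 while the asynchronous overlap is 7/8.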

Due to their finite size and deterministic transitions, synchronously updated Boolean networks eventually exhibit periodic dynamics and settle into an attractor, either a steady state or a limit cycle. Asynchronously updated Boolean networks, with stochastic transitions, share the same steady states but they rarely have periodic limit cycles. Instead, they often possess so-called complex attractors, a set of states wherein the system oscillates, typically without any periodicity. In biological networks, the attractors correspond to phenotypes or cell types. To assess the similarity of the attractor spaces of Boolean networks $F,G$, we first obtain the network attractors and corresponding basin sizes. While the entire state space of small Boolean networks can be computed quickly, the exponentially increasing size of the state space makes this impossible for large networks. To avoid introducing bias, we used the same method to sample the state space of each network, irrespective of network size. For each network, we generated $N=1{,}000$ random initial states and repeatedly updated each initial state until it transitioned into a periodic orbit, associated with an attractor. The proportion of random initial states that transitioned to a specific attractor served as an estimate for the corresponding relative basin size. We note that an attractor with relative basin size $b\in(0,1]$ will be found with probability $p(b,N)=1-(1-b)^N$. For example, we will identify attractors with relative basin sizes of $b=1\%$ and $b=0.1\%$ with a probability of 99.996% and 63.23%, respectively. Thus, while our sampling approach cannot reliably identify all attractors, it will find all attractors with a non-negligible basin size (i.e. all biologically meaningful attractors) and provides unbiased estimates of the relative basin sizes.
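The sampling procedure can be sketched as follows for synchronous update. This is an illustrative implementation under simple assumptions (states as tuples, a fixed random seed, attractors stored in a canonical rotation), not the authors' code.

```python
import random

def find_attractor(x, step):
    """Iterate a synchronous update until a state repeats; return the cycle."""
    seen, path = {}, []
    while x not in seen:
        seen[x] = len(path)
        path.append(x)
        x = step(x)
    cycle = path[seen[x]:]
    k = cycle.index(min(cycle))  # rotate so equal cycles compare equal
    return tuple(cycle[k:] + cycle[:k])

def sample_basins(step, n, N=1000, seed=0):
    """Estimate relative basin sizes from N random initial states."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(N):
        x0 = tuple(rng.randint(0, 1) for _ in range(n))
        att = find_attractor(x0, step)
        counts[att] = counts.get(att, 0) + 1
    return {att: c / N for att, c in counts.items()}

def detection_probability(b, N):
    """Probability that an attractor with relative basin size b is hit at least once."""
    return 1 - (1 - b) ** N

# Hypothetical 2-node network used for illustration.
step = lambda x: (x[0] | x[1], x[0] & (1 - x[1]))
basins = sample_basins(step, 2)
```

With `N=1000`, `detection_probability(0.01, 1000)` and `detection_probability(0.001, 1000)` reproduce the two probabilities quoted in the text.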

Let the lists $A(F),A(G)$ describe the attractors of $F$ and $G$. Each element in these lists is an ordered sequence of states $(x^{(1)},\ldots,x^{(l)})$, where $l$ is the length (i.e. periodicity) of the attractor. Further, let $B(F)\in(0,1]^{|A(F)|}$ and $B(G)\in(0,1]^{|A(G)|}$ describe the relative basin sizes corresponding to the attractors. Note that $\sum_{b\in B(F)}b=\sum_{b\in B(G)}b=1$. As an example, consider the 2-node Boolean network $F(x_1,x_2)=(x_1\vee x_2,\;x_1\wedge\neg x_2)$. Under synchronous update, $F$ has two attractors, a steady state and a 2-cycle: $A(F)=[(00),(10,11)]$. The state 01 transitions to 10, which is part of the 2-cycle. Thus, $B(F)=[1/4,3/4]$. From the set of attractors and corresponding basin sizes, we can compute the long-term distribution $\mu_F$ of a Boolean network $F$ when initializing $F$ at random (i.e. when starting from any state $x\in\mathbb{F}_2^n$ with equal probability). For any $x\in\mathbb{F}_2^n$, we have

$$\mu_F(x)=\begin{cases}b(x)/l(x) & \text{if } x \text{ is part of an attractor of } F,\\ 0 & \text{otherwise},\end{cases} \tag{11}$$

where $b(x)$ is the basin size and $l(x)$ is the length of the attractor that contains $x$. By design, $\sum_x\mu_F(x)=1$. We can then compute the Jensen–Shannon distance metric (23) between $\mu_F$ and $\mu_G$ to assess the similarity of the attractor spaces of Boolean networks $F,G$,

$$D_A(F,G)=1-\sqrt{\frac{D_{KL}(\mu_F\,\|\,\mu_M)+D_{KL}(\mu_G\,\|\,\mu_M)}{2}}\in[0,1], \tag{12}$$

where $D_{KL}$ is the Kullback–Leibler divergence and $\mu_M=(\mu_F+\mu_G)/2$ is the mixture distribution (i.e. the pointwise mean) of $\mu_F$ and $\mu_G$.
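A compact sketch of the long-term distribution and the attractor-space similarity follows. It assumes base-2 logarithms, so that the Jensen–Shannon distance lies in $[0,1]$; the text does not state the base, and the function names are illustrative.

```python
from math import log2, sqrt

def long_term_distribution(attractors, basins):
    """Spread each relative basin size evenly over the states of its attractor."""
    mu = {}
    for cycle, b in zip(attractors, basins):
        for x in cycle:
            mu[x] = b / len(cycle)
    return mu

def attractor_similarity(mu_f, mu_g):
    """One minus the Jensen-Shannon distance (base-2 logs, an assumption)."""
    states = set(mu_f) | set(mu_g)
    mix = {x: (mu_f.get(x, 0) + mu_g.get(x, 0)) / 2 for x in states}

    def kl(p):
        # Kullback-Leibler divergence of p from the mixture; 0*log(0) := 0
        return sum(p[x] * log2(p[x] / mix[x]) for x in p if p[x] > 0)

    jsd = (kl(mu_f) + kl(mu_g)) / 2
    return 1 - sqrt(max(jsd, 0.0))  # guard against tiny negative round-off

# Attractors and basin sizes of the 2-node example network
mu_F = long_term_distribution([[(0, 0)], [(1, 0), (1, 1)]], [0.25, 0.75])
```

Identical distributions give a similarity of 1, while distributions with disjoint support give 0.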

Biological Boolean network models contain a surprisingly high number of so-called source nodes, which remain constant at all times and typically codify a cellular context or external inputs (15, 24). A network with $k$ source nodes has at least $2^k$ attractors. To avoid any confounding effect of the number of source nodes on the network-level similarity metrics, we computed $D_S^{\text{sync}}(F,G)$, $D_S^{\text{async}}(F,G)$, and $D_A(F,G)$ only for “fixed-source” networks (as described in detail in Ref. (24)), and reported the average across all sampled fixed-source networks. In short, for a Boolean network with $k$ source nodes, we randomly selected $\min(32,2^k)$ different source node states and replaced all instances of $n$ in the formulas by the number of nodes that are not source nodes. We note that this approach only works as long as $F$ and $G$ have the same source nodes, which is the case in our study, in which $F$ and $G$ even possess the exact same wiring diagram.

Repository of expert-curated GRN models

The largest repository of expert-curated Boolean GRN models contains 122 distinct models (15). Here, “distinct” means that the variables of any two of the 122 models overlap <90%. Threshold rules can only be generated for a GRN with defined directionality and signs (activating vs. inhibiting). Fifteen models contain regulators that have a conditional effect (i.e. both activating and inhibiting depending on the state of the other regulators). We excluded these regulatory functions (0.9% of all 5,112 regulatory logic functions) from the function-level analyses. For all network-level analyses, we analyzed 100 of the 122 distinct models. In addition to the 15 models with conditional regulators, we excluded seven models, which possess regulatory functions with 15 or more inputs (exclusion for computational reasons). For all presented analyses, we derived from each published model its signed wiring diagram W (Eq. 2), which we never modified. In other words, we assumed the signed wiring diagram associated with a published model is 100% correct.

Results

Similarity between threshold formalisms

Throughout, we compare two different threshold formalisms: the Ising formalism and the 01 formalism. We begin with a theoretical analysis and a comparison of the two types of threshold rules. Ising threshold rules are ON (OFF) if the number of present activators and absent inhibitors is higher (lower) than the number of absent activators and present inhibitors. Thus, Ising rules are, by design, always unbiased, i.e. both binary values, ON and OFF, occur with the same probability when considering the rule in truth table format. On the other hand, a 01(-threshold) rule is ON (OFF) if the number of present activators is higher (lower) than the number of present inhibitors. The number of absent regulators does not affect 01 rules. Thus, 01 rules are only unbiased when the number of activators and inhibitors is the same.

To compare the differences in threshold formalisms more comprehensively, we considered all possible Boolean threshold functions $y(t+1)=f(x_1(t),x_2(t))$ with two inputs (Fig. 2A–C) and $y(t+1)=f(x_1(t),x_2(t),x_3(t))$ with three inputs (Fig. 3A–D). Note that $y$ may regulate itself, in which case we have $y(t+1)=f(y(t),x_1(t))$ (Fig. 2D–G) and $y(t+1)=f(y(t),x_1(t),x_2(t))$ (Fig. 3E–J). Since each input in a threshold rule represents either an activator or an inhibitor, there are seven (10) possibly different basic threshold functions with two (three) inputs. If the number of activators equals the number of inhibitors, Eqs. 5 and 6 are equal. This is the only case where the Ising formalism and the 01 formalism yield the same Boolean functions (Fig. 2B, E, F). Whenever there are strictly more (fewer) activators than inhibitors, 01 rules contain more (fewer) ones than zeros and are biased (Figs. 2A, C, D, G and 3). It is therefore not very surprising that Ising rules and 01 rules differ most if all regulators are of the same type (Fig. 4).

Fig. 2.


Boolean threshold functions with two inputs. Each table contains all combinations of Boolean inputs (left columns) and outputs under a different threshold formalism given a certain combination of activating (+) and inhibiting (−) inputs, as described in the second row. In A–C), $y$ does not regulate itself, i.e. the Boolean function is $y(t+1)=f(x_1(t),x_2(t))$. In D–G), $y$ regulates itself, i.e. the Boolean function is $y(t+1)=f(y(t),x_1(t))$. Yellow cells highlight degenerated threshold functions (i.e. functions which do not depend on all inputs). Red entries indicate that the value of $y(t)$ is used to determine $y(t+1)$.

Fig. 3.


Boolean threshold functions with three inputs. Each table contains all combinations of Boolean inputs (left columns) and outputs under a different threshold formalism given a certain combination of activating (+) and inhibiting (−) inputs, as described in the second row. In A–D), $y$ does not regulate itself, i.e. the Boolean function is $y(t+1)=f(x_1(t),x_2(t),x_3(t))$. In E–J), $y$ regulates itself, i.e. the Boolean function is $y(t+1)=f(y(t),x_1(t),x_2(t))$. Yellow cells highlight degenerated threshold functions (i.e. functions which do not depend on all inputs). Red entries indicate that the value of $y(t)$ is used to determine $y(t+1)$.

Fig. 4.


Similarity between different types of threshold rules and published biological rules for degree A) 2 and B) 3. The leftmost column contains a reference to Figs. 2 and 3. The next columns describe the type of regulation of each input: positive (+) and negative (−). The expected similarity between two Boolean functions is always 50%. Higher-than-expected (lower-than-expected) values are colored in shades of blue (red).

In the case of auto-regulation, all investigated threshold formalisms can give rise to degenerated functions, which do not depend on all their inputs. For example, the auto-regulatory 2-input threshold function $y(t+1)=f(x(t),y(t))$ simplifies to $y(t)$ when $y$ is an activator and $x$ is an inhibitor (Fig. 2E), while it simplifies to $x(t)$ if $x$ is an activator and $y$ an inhibitor (Fig. 2F). This is the case for both threshold formalisms. Under the Ising formalism, $f(x(t),y(t))$ also simplifies to $y(t)$ if $x$ and $y$ are both activators, completely disregarding the regulatory effect of $x$ (Fig. 2D). The corresponding 01 rule is $f(x(t),y(t))=x(t)\vee y(t)$. If both $x$ and $y$ are inhibitors, then $f(x(t),y(t))=\neg x(t)$ under the Ising formalism and $f(x(t),y(t))=0$ under the 01 formalism (Fig. 2G). An analysis of auto-regulatory Boolean 01 threshold functions $y(t+1)=f(x_1(t),\ldots,x_{n-1}(t),y(t))$ with degree $n\ge 2$ revealed that the value of $y(t)$ does not affect $f$ at all if and only if $y$ is an inhibitor (see e.g. Fig. 3E–J). This fact can be proven mathematically by expressing 01 threshold rules as

$$x_i(t+1)=\mathbb{1}\Big[\epsilon\,x_i(t)+\sum_{j\in A_i}x_j(t)-\sum_{j\in I_i}x_j(t)>0\Big], \tag{13}$$

which is equivalent to Eq. 4 for any $\epsilon\in(0,1)$.

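In the case of inhibitory auto-regulation (i.e. $i\in I_i$),

$$x_i(t+1)=\begin{cases}\mathbb{1}\Big[\sum_{j\in A_i}x_j-\sum_{j\in I_i\setminus\{i\}}x_j>0\Big] & \text{if } x_i=0,\\ \mathbb{1}\Big[\sum_{j\in A_i}x_j-\sum_{j\in I_i\setminus\{i\}}x_j>1-\epsilon\Big] & \text{if } x_i=1.\end{cases}$$

Since $x_j\in\{0,1\}$, the difference of the two sums is an integer, so the two cases are always equal, which proves that a 01 threshold rule never depends on an inhibitory auto-regulatory input.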

In the case of activating auto-regulation (i.e. $i\in A_i$),

$$x_i(t+1)=\begin{cases}\mathbb{1}\Big[\sum_{j\in A_i\setminus\{i\}}x_j-\sum_{j\in I_i}x_j>0\Big] & \text{if } x_i=0,\\ \mathbb{1}\Big[\sum_{j\in A_i\setminus\{i\}}x_j-\sum_{j\in I_i}x_j>-(1+\epsilon)\Big] & \text{if } x_i=1.\end{cases}$$

Here, the value of the activating auto-regulatory input $x_i(t)$ matters in determining $x_i(t+1)$. The gap of $1+\epsilon$ between the two cases proves that $x_i(t)$ is even more important than the other inputs (i.e. $x_i$ has higher activity (25) and higher edge effectiveness (26)), unless all inputs are activators. This can be seen in Figs. 2E and 3F, G.
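These statements about 01 rules can be checked by exhaustive enumeration for small examples. The sketch below uses a hypothetical 3-node setting in which node 0 is the regulated node; the helper names and the choice $\epsilon=0.5$ are illustrative assumptions.

```python
from itertools import product

def rule_eq4(x, i, act, inh):
    """01 threshold rule on 0/1 states: ties keep the current value x[i]."""
    a, b = sum(x[j] for j in act), sum(x[j] for j in inh)
    return 1 if a > b else 0 if a < b else x[i]

def rule_eq13(x, i, act, inh, eps=0.5):
    """Indicator form: the small self-weight eps resolves ties via x[i]."""
    return int(eps * x[i] + sum(x[j] for j in act) - sum(x[j] for j in inh) > 0)

def depends_on(i, n, act, inh):
    """True if flipping x[i] changes the 01 rule output for some state."""
    for x in product((0, 1), repeat=n):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if rule_eq4(x, i, act, inh) != rule_eq4(flipped, i, act, inh):
            return True
    return False

states = list(product((0, 1), repeat=3))
# Node 0 inhibits itself; node 1 activates it, node 2 inhibits it.
same_inh = all(rule_eq4(x, 0, [1], [0, 2]) == rule_eq13(x, 0, [1], [0, 2]) for x in states)
# Node 0 activates itself instead.
same_act = all(rule_eq4(x, 0, [0, 1], [2]) == rule_eq13(x, 0, [0, 1], [2]) for x in states)
```

In this example, `depends_on(0, 3, [1], [0, 2])` is false (the inhibitory self-input is nonessential), while `depends_on(0, 3, [0, 1], [2])` is true.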

Using the same arguments, we can show that the same is true for auto-regulatory Ising rules, however only if their degree is even. If the degree is odd, an Ising rule is always determined by the state of its activators and inhibitors because equality in the conditions of Eq. 6 cannot occur. These observations likely explain why some expert-curated Boolean regulatory logic rules contain nonessential variables (15).

Insights from expert-curated regulatory logic functions can inform improved threshold formalisms

The largest repository of expert-curated biological Boolean network models consists of 122 distinct models with a total of 5,112 regulated nodes (15). Prior analysis of the 5,112 Boolean update rules has revealed that biological regulatory functions are more canalizing, more redundant and more biased than expected (15). Moreover, 73.9% of regulators have a positive (i.e. activating) effect on the target node, i.e. an increased expression of the regulator (i.e. a change from 0 to 1) leads to an increased expression of the target for some states of the other regulators, and possibly no change in the target for other states of the other regulators. 23.6% of regulators have a negative (i.e. inhibiting) effect, and 0.9% of regulators have a conditional effect (i.e. both activating and inhibiting depending on the state of the other regulators). Further, the higher the number of regulators, the lower the proportion of activating regulators: Excluding functions with conditional regulators, the proportion of inputs that are activators is $p_a(n)=78.6$, $71.3$, $71.0$, $65.1$, and $61.8\%$ in biological regulatory functions with degree $n=2,3,4,5,6$, respectively. If regulator types were assigned independently, the probability that an $n$-input function contains $a$ activators and $n-a$ inhibitors would therefore be $\binom{n}{a}\,p_a(n)^a\,(1-p_a(n))^{n-a}$. Interestingly, we found that biological regulatory functions tend to have regulators of the same type (Fig. 5). That is, more genes than expected are regulated only by activators, and many more genes than expected are solely inhibited. On the contrary, genes that have exactly one inhibitor proved particularly rare. These trends were consistent across all investigated degrees (i.e. all degrees with sample size $\ge 100$).
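The expected composition of regulator types under independence is a binomial distribution. As a minimal sketch, using the activator proportions quoted above (the dictionary and function names are illustrative):

```python
from math import comb

# Proportion of activating inputs by degree, as reported in the text.
P_ACT = {2: 0.786, 3: 0.713, 4: 0.710, 5: 0.651, 6: 0.618}

def expected_activator_distribution(n, p):
    """P(a activators among n inputs) for a = 0..n, assuming independent types."""
    return [comb(n, a) * p ** a * (1 - p) ** (n - a) for a in range(n + 1)]

expected_2 = expected_activator_distribution(2, P_ACT[2])
```

For degree 2, the expected share of functions with two activators is $0.786^2\approx 61.8\%$, and the expected share of solely inhibited functions is $0.214^2\approx 4.6\%$. Observed proportions can be compared against such values to obtain enrichment ratios.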

Fig. 5.


Boolean biological regulatory rules tend to have regulators of the same type. A) Stratification of all expert-curated Boolean biological regulatory rules based on the number of inputs (x-axis) and the number of negative inputs (color). Rules that contain conditional inputs are excluded. Each observed distribution (the right bars with solid borders) is compared to the expected distribution (the left bars with dashed borders), which is computed based on the proportion of positive vs. negative inputs for functions of a given degree and the combinatorial likelihood. n= observed total number of biological rules of a given degree and without any conditional input. B) Ratio of the observed vs. expected proportion of functions with a given number of total inputs (subpanels) and negative inputs (x-axis). Ratios above 1 indicate types of functions that are enriched in expert-curated Boolean biological network models. n= observed number of biological rules of a given number of total inputs and negative inputs, excluding rules with any conditional input.

Most expert-curated GRN models govern cellular processes in eukaryotes (15). The observed higher proportion of activating regulators can be attributed to the fact that eukaryotic genes are by default off and are typically only transcribed when needed (27). This also explains another observation: published regulatory functions, especially those with many inputs (i.e. higher complexity), contain more 0s than 1s (in truth table format). The mean bias across all expert-curated functions with degree $2,3,\ldots,6$ is 45.64, 43.71, 39.42, 38.51, and 35.2%, respectively.

In light of these observations, we propose a slight modification to each threshold formalism that defines the function value in some/all cases of equality among positive and negative forces in Eq. 4. In the modified Ising formalism (denoted Ising*), the Boolean function is 0 (rather than remaining at its current value) in the case of equality among positive and negative forces in Eq. 6. That is, we assume the default state of a gene to be off, the standard for eukaryotic genes. The modified Ising formalism thus takes the following form:

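$$x_i(t+1)=\begin{cases}1 & \text{if } |A_i\cap X_1(t)|+|I_i\cap X_0(t)|>|A_i\cap X_0(t)|+|I_i\cap X_1(t)|,\\ 0 & \text{if } |A_i\cap X_1(t)|+|I_i\cap X_0(t)|\le|A_i\cap X_0(t)|+|I_i\cap X_1(t)|,\end{cases} \tag{14}$$

where $A_i$, $I_i$, $X_0(t)$ and $X_1(t)$ are defined as in Eq. 6.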

If a gene is only regulated by inhibitors, the 01 formalism yields strange Boolean functions: the zero function in the presence of auto-regulation (Figs. 2G and 3J), and, in the absence of auto-regulation, a function that prevents a gene from ever becoming expressed again once it is not expressed (Figs. 2C and 3D). We therefore propose the following modification (denoted 01*): If a gene is only regulated by inhibitors and all inhibitors are absent, the 01*-function is 1 (i.e. the gene is expressed). Likewise, if a gene is only regulated by activators and all activators are absent, the 01*-function is 0 (to ensure such a gene does not remain expressed forever). The modified 01 formalism takes the following form:

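$$x_i(t+1)=\begin{cases}1 & \text{if } |A_i\cap X_1(t)|>|I_i\cap X_1(t)| \text{ or if } |A_i|=|I_i\cap X_1(t)|=0,\\ x_i(t) & \text{if } |A_i\cap X_1(t)|=|I_i\cap X_1(t)|,\\ 0 & \text{if } |A_i\cap X_1(t)|<|I_i\cap X_1(t)| \text{ or if } |I_i|=|A_i\cap X_1(t)|=0.\end{cases} \tag{15}$$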
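The two modified formalisms can be sketched as follows. The 01* function implements the verbal description given above (only inhibitors, all absent: ON; only activators, all absent: OFF), which is our reading of the special cases; the function names are illustrative and every node is assumed to have at least one regulator.

```python
def ising_star(x, activators, inhibitors):
    """Modified Ising rule: ties resolve to 0 instead of keeping the state."""
    pos = sum(x[j] for j in activators) + sum(1 - x[j] for j in inhibitors)
    neg = sum(1 - x[j] for j in activators) + sum(x[j] for j in inhibitors)
    return int(pos > neg)

def threshold_01_star(x, current, activators, inhibitors):
    """Modified 01 rule (assumes at least one regulator)."""
    on_act = sum(x[j] for j in activators)
    on_inh = sum(x[j] for j in inhibitors)
    if not activators and on_inh == 0:
        return 1  # solely inhibited gene with all inhibitors absent
    if not inhibitors and on_act == 0:
        return 0  # solely activated gene with all activators absent
    if on_act > on_inh:
        return 1
    if on_act < on_inh:
        return 0
    return current  # remaining ties keep the current state
```

Note that the modified Ising rule no longer needs the current state as an argument, since no case refers to it.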

Differences between biological regulatory logic and threshold rules

Next, we compared the similarity between threshold rules and biological rules. All 2-input biological rules $f(x_1,x_2)$ are nested canalizing (15, 28). That is, they are of the form $f=(\neg)x_1\wedge(\neg)x_2$ or $f=(\neg)x_1\vee(\neg)x_2$, where a negation appears whenever $x_i$ is an inhibitor. For any such function, the bias (i.e. the proportion of 1s in truth table format) is $1/4=25\%$ or $3/4=75\%$. This explains why 2-input Ising rules (with bias 50%) agree with biological rules for 75% of the input combinations, irrespective of the type of the two inputs (Fig. 4A). On the other hand, the average agreement between 01 threshold and biological rules depends on the type of the inputs. If $x_1$ and $x_2$ are both activators, the most frequently observed case ($n=802$), 01 rules only agree with biological rules on average for 63.81% of input combinations. If $f=x_1\vee x_2$, the agreement is high with $7/8=87.5\%$. However, if $f=x_1\wedge x_2$, the agreement is only $3/8=37.5\%$. By forcing $f(x_1=0,x_2=0)=0$ in the modified 01 rule, the agreement increases by exactly $1/8=12.5\%$ to an average of 76.31%. Overall, the modified Ising rules exhibit the highest average agreement with 2-input biological rules (79.37%), followed by modified 01 rules (76.95%), Ising rules (75%) and 01 rules (68.47%). For degree $n=3$, Ising rules (and modified Ising rules, which do not differ from Ising rules whenever the degree is odd) match biological rules better than modified 01 rules and 01 rules, with a mean agreement of 73.80, 69.22, and 66.11%, respectively (Fig. 4B). The same is true for degree $n\in\{4,5,6\}$ (Fig. 6). The higher the degree, the smaller the average agreement between threshold and biological rules, irrespective of the choice of threshold formalism. Moreover, at higher degrees, the differences between 01 and modified 01 rules become negligible (because the maximal possible difference between the two rules is $1/2^n$, which decreases as the degree $n$ increases), while differences among Ising and modified Ising rules, as well as differences in their agreement with biological rules, persist. As already described above for the case of two activators, 01 rules match some nested-canalizing biological rules almost perfectly; modified 01 rules even equal some biological rules (Table 1). However, the agreement with other biological rules is rather low.
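The agreement values for two activators can be reproduced by enumeration. In this sketch the 01 rule is treated as a function of the current state $y$ and the two inputs, and the biological OR and AND rules are extended so that they ignore $y$; all names are illustrative.

```python
from itertools import product

def rule_01(y, x1, x2, modified=False):
    """01 rule for two activators; y is the current state, used only on ties."""
    if x1 + x2 > 0:
        return 1
    return 0 if modified else y  # 01*: all activators absent forces 0

def agreement(bio, modified=False):
    """Fraction of the 8 rows (y, x1, x2) where the threshold rule equals bio."""
    rows = list(product((0, 1), repeat=3))
    hits = sum(bio(x1, x2) == rule_01(y, x1, x2, modified) for y, x1, x2 in rows)
    return hits / len(rows)

OR = lambda a, b: a | b
AND = lambda a, b: a & b
```

The enumeration gives 7/8 for OR and 3/8 for AND under the 01 rule, and 8/8 and 4/8 under the modified rule, i.e. a gain of exactly 1/8 in both cases.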

Fig. 6.


Similarity between threshold rules and published biological rules. All published biological rules of a given degree (x-axis) and without any conditional regulators are compared to different threshold rules (color). Each box extends across the interquartile range (IQR), the notch shows the median, the triangle shows the mean, whiskers extend from the box to the farthest data point lying within 1.5 IQR from the box, and circles (outside the whiskers) show outliers.

Table 1.

Bias and similarity with corresponding threshold functions for the most frequent 3-input NCFs in published GRN models.

[Table 1 is available only as an image (pgaf228il1.jpg) in the source.]

NCFs with the same layer structure (29) and the same number of activators per layer (second column) are equivalent as they have the same bias and similarity with corresponding threshold functions.

Theoretical considerations reveal how well any monotonic Boolean function f can match its corresponding threshold rule t(f): The key idea is to compare the bias of the two functions. Let $p_f, p_{t(f)} \in [0,1]$ describe the bias of f and its corresponding threshold rule, respectively. Then, any difference in bias implies that the similarity between the two functions must satisfy

$$s(f,t(f)) \le 1 - |p_f - p_{t(f)}|. \tag{16}$$

Specifically, $s(f,t(f)) = 1 - |p_f - p_{t(f)}|$ if and only if the two functions match as closely as possible, i.e. if $f=0$ whenever $t(f)=0$ and $t(f)=1$ whenever $f=1$, when assuming, without loss of generality, that $p_f \le p_{t(f)}$.
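The bound in Eq. 16 and its equality condition can be checked exhaustively on small truth tables. The sketch below is an illustration only: it compares all pairs of 2-input Boolean functions rather than biological rules with their threshold counterparts, and the helper names are hypothetical.

```python
from itertools import product

N_ROWS = 4  # truth-table length of a 2-input Boolean function

def bias(table):
    """Proportion of 1s in a truth table."""
    return sum(table) / len(table)

def similarity(f, g):
    """Proportion of truth-table rows on which f and g agree."""
    return sum(a == b for a, b in zip(f, g)) / len(f)

def dominates(f, g):
    """True if f <= g on every row or g <= f on every row."""
    return all(a <= b for a, b in zip(f, g)) or all(a >= b for a, b in zip(f, g))

tables = list(product((0, 1), repeat=N_ROWS))  # all 16 two-input functions
bound_holds = True
equality_iff_nested = True
for f, g in product(tables, repeat=2):
    s, bound = similarity(f, g), 1 - abs(bias(f) - bias(g))
    bound_holds &= s <= bound
    equality_iff_nested &= (s == bound) == dominates(f, g)
print(bound_holds, equality_iff_nested)
```

Both flags come out `True`: the number of disagreeing rows is at least the difference in the number of 1s, with equality exactly when one function is pointwise below the other.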

Ising rules are always unbiased, i.e. $p_{\text{Ising}(f)}=1/2$, irrespective of f. The bias of 01-rules depends on the number of activating vs. inhibitory inputs: As can be seen in Figs. 2A–C and 3A–D, a 01-rule with n nonauto-regulatory inputs, of which $m \in \{0,1,\ldots,n\}$ are activating, has bias

$$p_{01\text{-rule}(f)} = \frac{1}{2^{n+1}} \sum_{k=0}^{m} \binom{n+1}{k}. \tag{17}$$

Similar formulas for the case of auto-regulation and modified 01-rules can be derived. These considerations explain why an AND-function $x_1 \wedge \cdots \wedge x_n$ with low bias $1/2^n$ poorly matches its corresponding 01-threshold rule, which has bias $1-1/2^{n+1}$, or why a NOT-AND-function $\neg x_1 \vee \cdots \vee \neg x_n$ with high bias $1-1/2^n$ poorly matches its 01-rule, which has bias $1/2^{n+1}$. Moreover, the results show that 01-rules are well-suited to describe the action of multiple independent transcription factors (which is modeled by an OR rule). However, if multiple activators form a protein complex, which regulates transcription, the appropriate biological rule is an AND function, which differs substantially from the 01-rule. This also explains why the deviations in agreement with biological rules are much larger for 01 rules than Ising rules, especially at higher degree (Fig. 6, Table 1).
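Eq. 17 can be cross-checked against direct enumeration. This sketch assumes the standard 01 convention in which a tie (signed input sum of zero) leaves the node in its current state, so the bias is counted over all $2^{n+1}$ combinations of inputs and current state; the function names are hypothetical.

```python
from itertools import product
from math import comb

def bias_01_enumerated(n, m):
    """Bias of a 01 rule with m activators and n - m inhibitors (no auto-regulation),
    counted over all 2**(n+1) combinations of inputs and current state."""
    signs = [1] * m + [-1] * (n - m)
    ones = 0
    for *inputs, self_state in product((0, 1), repeat=n + 1):
        total = sum(s * x for s, x in zip(signs, inputs))
        ones += 1 if total > 0 else 0 if total < 0 else self_state
    return ones / 2 ** (n + 1)

def bias_01_formula(n, m):
    """Closed form of Eq. 17."""
    return sum(comb(n + 1, k) for k in range(m + 1)) / 2 ** (n + 1)

# The closed form agrees with enumeration for every split of up to 5 inputs.
for n in range(1, 6):
    for m in range(n + 1):
        assert bias_01_enumerated(n, m) == bias_01_formula(n, m)

print(bias_01_formula(3, 3), bias_01_formula(3, 0))
```

For n=3 the two extreme cases give 0.9375 (all activators) and 0.0625 (all inhibitors), i.e. $1-1/2^{n+1}$ and $1/2^{n+1}$.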

Differences between the dynamics of biological networks and threshold networks

Thus far, we have only compared the similarity between expert-curated regulatory logic and corresponding threshold functions. The primary purpose of Boolean network models is, however, insight into the dynamic behavior of systems. Specifically, researchers are interested in the long-term behavior of systems as they settle into attractors, which in meaningful biological networks correspond to different cell types or phenotypes. We therefore asked the question: how well do threshold models recover (i) the entire synchronous state space (i.e. the entire dynamics) of a biological GRN and (ii) the attractors and corresponding basin sizes of a biological GRN?

To assess the similarity of two synchronous state spaces, we ask: Given a random state that is updated once according to each model, how similar are the updated states on average? If similarity of two states is quantified by the normalized Hamming distance (e.g. for x=(0,0,1,1) and y=(0,0,0,0), d(x,y)=1/2), then the similarity of two synchronous state spaces is simply the average agreement of the regulatory functions, which we analyzed in Fig. 6. Using this measure of state space similarity, we found that modified 01 networks recovered the transitions in biological networks best (Fig. 7A), with an average similarity across the 100 biological networks of 78.36%, followed by modified Ising networks (75.44%), Ising networks (71.58%), and 01 networks (67.32%). Interestingly, the modified 01 formalism performed particularly well for networks with hard-to-recover state spaces (i.e. for networks with lower mean function-level agreement across the threshold formalisms). For networks that were easy to recover, the modified Ising formalism was slightly better. The similarity of two states can also be quantified by a binary indicator function (i.e. d(x,y)=1 if x=y and 0 otherwise). As expected, this more stringent definition yielded much lower state space similarities (Fig. 7B). Across the 100 biological networks, modified Ising networks exhibited the highest state space overlap (mean=12.46%), followed by modified 01 networks (8.56%), Ising networks (7.38%), and 01 networks (2.98%).
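The two state space similarity measures can be illustrated on a toy example. The sketch below is not the authors' implementation: the two 3-node update functions are made up for illustration, and the networks are small enough to enumerate all $2^n$ states.

```python
from itertools import product

def hamming_similarity(x, y):
    """1 minus the normalized Hamming distance between two states."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def state_space_similarities(update_a, update_b, n):
    """Average per-node agreement and exact-match overlap after one synchronous
    update of every one of the 2**n states under both models."""
    states = list(product((0, 1), repeat=n))
    pairs = [(update_a(x), update_b(x)) for x in states]
    mean_agreement = sum(hamming_similarity(a, b) for a, b in pairs) / len(states)
    exact_overlap = sum(a == b for a, b in pairs) / len(states)
    return mean_agreement, exact_overlap

# Hypothetical 3-node toy models (not taken from the repository):
# they differ only in the rule of the first node (AND vs. OR).
def model_a_update(x):
    return (x[1] and x[2], x[0], int(not x[0]))

def model_b_update(x):
    return (x[1] or x[2], x[0], int(not x[0]))

print(1 - hamming_similarity((0, 0, 1, 1), (0, 0, 0, 0)))  # distance from the text
print(state_space_similarities(model_a_update, model_b_update, 3))
```

The first line reproduces the distance of 1/2 from the example in the text. For the toy models, the mean agreement is 5/6 while the exact-match overlap is only 1/2, which shows how much more stringent the binary indicator is.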

Fig. 7.

Ability of threshold networks to recover the dynamics of biological networks. The empirical cumulative distribution function across the 100 biological networks is shown for four threshold formalisms (color) and four dynamic similarity measures: A) the mean agreement of the regulatory functions (i.e. the overlap of the asynchronous state space), see Eq. 10; B) the overlap of the synchronous state space, see Eq. 8; and C, D) the overlap of the C) asynchronous and D) synchronous attractor space, see Eq. 12.

While a truthful representation of the entire state space is desirable, correct long-term behavior is arguably even more important. Boolean network trajectories eventually transition towards a network attractor. In biological networks, each attractor typically corresponds to a cell type or phenotype. Across the 100 biological networks, modified Ising networks had the highest average attractor space similarity (17.68% when asynchronously updated and 13.84% when synchronously updated), followed by Ising networks (11.41% and 10.14%), modified 01 networks (10.14% and 8.91%), and 01 networks (3.79% and 2.96%; Fig. 7C, D).
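For small networks, synchronous attractors and their basin sizes can be found by following every state until it revisits a previous one. The sketch below only illustrates these two notions; it does not implement the attractor space similarity of Eq. 12, and the toggle-switch network is a hypothetical example.

```python
from itertools import product

def synchronous_attractors(update, n):
    """Map each attractor (as a frozenset of states) to its basin size by
    following every state of an n-node network until a state repeats."""
    basin_sizes = {}
    for start in product((0, 1), repeat=n):
        seen = {}   # state -> position along the trajectory
        path = []
        state = start
        while state not in seen:
            seen[state] = len(path)
            path.append(state)
            state = tuple(update(state))
        attractor = frozenset(path[seen[state]:])  # the cycle reached from start
        basin_sizes[attractor] = basin_sizes.get(attractor, 0) + 1
    return basin_sizes

# Hypothetical 2-node toggle switch: each node inhibits the other.
def toggle(x):
    return (int(not x[1]), int(not x[0]))

for attractor, size in synchronous_attractors(toggle, 2).items():
    print(sorted(attractor), size)
```

The toggle switch has two steady states, (0,1) and (1,0), each with a basin of size 1, and one 2-cycle between (0,0) and (1,1) with a basin of size 2. Exhaustive enumeration of this kind is only feasible for small n.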

Figure 7 displays an empirical cumulative distribution function across the 100 biological networks: For each threshold formalism, the networks are ranked in increasing order of dynamic similarity with corresponding threshold networks. This may imply that the dynamics of some biological networks are easier to recover than others, irrespective of the type of threshold formalism, which is not our intention. To quantify the degree to which the ranking in dynamic similarity differed among the threshold formalisms, we computed the pairwise Kendall rank correlation coefficient (Fig. 8). As expected, similar threshold formalisms (Ising and its modification, as well as 01 and its modification) mostly recover the same biological network dynamics well. On the other hand, biological networks with a higher function-level agreement with Ising and modified Ising rules did not exhibit any higher agreement with 01 rules and only slightly higher agreement with modified 01 rules (Fig. 8A). For the network-level dynamic similarity measures, all four threshold formalisms agreed to a substantial part on which biological network dynamics were easier to recover (Fig. 8B–D).
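As a reminder of what the Kendall rank correlation measures, the following sketch computes the simplest variant (tau-a) in pure Python. Which variant and tie handling the analysis code uses is not stated in this section, so this is an assumption, and the scores are made-up numbers rather than results from the study.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall rank correlation (tau-a): (concordant - discordant) pairs divided
    by the number of pairs. Ties count as neither, so tie-free data is assumed."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        direction = (a[i] - a[j]) * (b[i] - b[j])
        concordant += direction > 0
        discordant += direction < 0
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Made-up similarity scores of five networks under two formalisms.
scores_a = [0.81, 0.62, 0.70, 0.55, 0.90]
scores_b = [0.78, 0.66, 0.64, 0.50, 0.85]
print(kendall_tau(scores_a, scores_b))
```

Here the two formalisms rank nine of the ten network pairs in the same order and one pair in the opposite order, giving a coefficient of 0.8.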

Fig. 8.

Pairwise agreement among threshold formalisms in dynamic similarity rankings of biological networks. For four threshold formalisms, the dynamic similarity with 100 biological networks is computed (see Fig. 7), measured as A) the mean agreement of the regulatory functions (i.e. the overlap of the asynchronous state space), see Eq. 10; B) the overlap of the synchronous state space, see Eq. 8; and C, D) the overlap of the C) asynchronous and D) synchronous attractor space, see Eq. 12. The heatmaps display the similarity (quantified using Kendall’s τ coefficient) of the rankings of the biological networks between any two threshold formalisms.

Thus far, we have ignored obvious confounders that affect how truthfully biological networks and their dynamics can be represented by threshold networks. These include the network size and the network connectivity. A Boolean network in n nodes has $2^n$ states. The larger the network, the smaller is thus the chance that (i) a state is updated to exactly the same state and (ii) attractors and their basins coincide in both the biological and the corresponding threshold network. This explains the observed strong negative Spearman correlations between network size and the overlap in state space and attractor space (Fig. 9). This trend was consistent across threshold formalisms. As shown before and summarized in Fig. 6, the average function-level agreement among biological and corresponding threshold rules decreases as the degree increases, irrespective of the threshold formalism. In line with this, more connected biological networks exhibited a lower average mean function-level agreement with their corresponding Ising, modified Ising and modified 01 threshold networks. Surprisingly, the mean function-level agreement of 01 networks with biological networks did not depend on the average network connectivity ($\rho_{\text{Spearman}}=0.04$, $p=0.69$). For Ising, modified Ising and 01 networks, network size was also significantly correlated with the mean function-level agreement ($\rho_{\text{Spearman}}=0.59$, $0.6$, and $0.29$, respectively). This is somewhat surprising, especially because network size and average connectivity are not significantly correlated across the 100 biological networks ($\rho_{\text{Spearman}}=0.14$, $p=0.16$). Average network connectivity was weakly negatively correlated with the overlap of state space and attractor space for Ising and modified Ising networks but was not statistically significantly correlated for 01 and modified 01 networks. A weak negative correlation can be expected due to the reported decreasing agreement between biological and threshold rules as complexity increases.
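A back-of-the-envelope calculation illustrates why exact state overlap shrinks with network size. The sketch below rests on a simplifying assumption that is not made in the study, namely that nodes agree independently with the same probability; the value 0.75 is purely illustrative.

```python
def exact_match_probability(per_node_agreement, n_nodes):
    """Chance that all n_nodes update identically in both models, under the
    simplifying assumption that nodes agree independently."""
    return per_node_agreement ** n_nodes

for n_nodes in (5, 10, 20):
    # columns: network size, number of states, chance of an exact match
    print(n_nodes, 2 ** n_nodes, round(exact_match_probability(0.75, n_nodes), 4))
```

Under this assumption, a per-node agreement of 75% leaves roughly a 24% chance of an exact match for 5 nodes, about 6% for 10 nodes and well below 1% for 20 nodes.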

Fig. 9.

Correlation among dynamics similarity measures and network properties. For each threshold formalism (four rows each), the Spearman correlation of dynamics similarity measures with other dynamics similarity measures, as well as network properties, is computed across the 100 biological networks.

Discussion

Boolean network models have long served as a cornerstone for investigating the dynamics of biological networks and specifically GRNs. In the absence of sufficient data to infer the exact regulatory logic, the Ising and 01 threshold formalisms are frequently used as default Boolean update rules. In our study, we critically evaluate these default models against a comprehensive repository of expert-curated Boolean GRN models, the closest available proxy of “reality.” By comparing their performance in recovering both the biological regulatory functions and the dynamic behaviors of cellular networks, our work addresses a key challenge in systems biology: the accurate inference of GRN dynamics from high-throughput data.

We described how the two formalisms can be expressed using the same equation (Eq. 3), with the sole difference that the OFF state is represented by $-1$ in Ising rules and by 0 in 01 rules. When auto-regulation is present, both types of threshold rules can contain nonessential inputs, which means they fail to accurately represent available biological knowledge in a Boolean context. Informed by a meta-analysis of 122 expert-curated Boolean GRN models (15), we introduced a modified version of each threshold formalism, without losing the generality that is characteristic of threshold rules. The modified threshold rules consistently outperform the standard models. Notably, these improvements are reflected in a higher agreement with biological regulatory rules and a more faithful recovery of state space transitions and attractor landscapes, features that are essential for representing cellular phenotypes. Overall, the Ising formalism outperformed the 01 formalism, and modified Ising rules emerged as the best default rules for Boolean GRN models in our systematic assessment. Given the generality of the modifications, further changes to threshold rules specific to individual contexts are expected to improve the ability of threshold rules to accurately capture the underlying dynamics. These findings underscore the importance of refining Boolean threshold models to better mirror the underlying biology, thereby enhancing their utility in the computational analysis of gene regulatory networks.

This study has some limitations. First, we treat the repository of 122 expert-curated Boolean GRN models as ground truth. Unfortunately, we cannot rule out that the diverse set of researchers that designed the published models may have had some shared preconceived notions and biases about what reasonable regulatory logic should look like. However, any Boolean models whose regulatory rules were assumed to follow some default formalism (e.g. threshold rules), rather than being informed by expert knowledge and biological experiments, were excluded from the repository from the beginning, which should partially mitigate the potential impact of these biases on this study. Second, the modification of the Ising formalism was informed by the observation that most genes in expert-curated Boolean GRN models are more frequently off, the default state in eukaryotes. Using for validation the same set of rules that was used to inform this modification gives rise to a somewhat circular argument. While the improved function-level similarity of modified Ising rules is thus expected, it is reassuring to see their improved ability to recover Boolean GRN dynamics and especially their phenotypes (i.e. attractors). That being said, the low number of expert-curated prokaryotic Boolean GRN models means it is impossible to properly evaluate the suitability of the modified threshold functions as default rules for prokaryotic regulatory logic.

Lastly, we consider the problem of Boolean GRN model inference as a two-step procedure: (i) infer the signed, directed wiring diagram from high-throughput data and (ii) infer the Boolean update rules that best match the data and agree with the inferred wiring diagram. In this article, we only focus on the second step, i.e. we assume the inferred signed, directed wiring diagram represents the truth. In reality, the inferred wiring diagram itself may contain edges with higher and lower confidence or even errors. Access to information on the confidence in an edge and its sign could potentially result in more accurate default Boolean update rules. Moreover, considering Boolean GRN model inference as a one-step procedure, i.e. inferring the best Boolean network directly from high-throughput data, may harbor additional benefits, which are beyond the scope of this study.

Future work could explore additional modeling frameworks to identify robust default approaches for simulating GRNs. Extending Boolean models with multilevel logical rules—allowing for discrete states such as low, medium, and high for key regulatory nodes—could enhance the representation of graded expression patterns (30). Incorporating memory-dependent dynamics, where node states are influenced by their own past states, may better capture temporal dependencies in gene regulation (31, 32). Furthermore, investigating how structural properties of GRNs—such as degree distributions, fractal organization, or modularity—influence system dynamics could inform the selection of appropriate modeling frameworks (32). Together, these approaches would improve the capacity to simulate GRN dynamics, especially under conditions of incomplete knowledge about the underlying regulatory rules.

Contributor Information

Claus Kadelka, Department of Mathematics, Iowa State University, 411 Morrill Rd, Ames, IA 50011, USA.

Kishore Hari, Center for Theoretical Biological Physics, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA.

Funding

C.K. was partially supported by a travel grant from the Simons Foundation (grant number 712537) and a grant by the National Science Foundation (award 2424632). K.H. was partially supported by the National Science Foundation through the Center for Theoretical Biological Physics, PHY-2019745 and under award number MCB-2114191.

Author Contributions

C.K.: conceptualization, software, formal analysis, investigation, visualization, methodology, writing-original draft, writing-review and editing. K.H.: conceptualization, investigation, methodology, writing-review and editing.

Preprints

This manuscript was posted as a preprint: https://www.biorxiv.org/content/10.1101/2025.03.06.641948v2.

Data Availability

Kadelka et al. (15) contains standardized update rules of the 122 investigated published, expert-curated Boolean biological network models. All Python code underlying the analyses described in this article is available at https://zenodo.org/records/15858654.

References

  • 1. Stuart JM, Segal E, Koller D, Kim SK. 2003. A gene-coexpression network for global discovery of conserved genetic modules. Science. 302(5643):249–255.
  • 2. Zhang B, Horvath S. 2005. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 4(1):1–45.
  • 3. Prill RJ, et al. 2010. Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS One. 5(2):e9202.
  • 4. Marbach D, et al. 2010. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci U S A. 107(14):6286–6291.
  • 5. Marbach D, et al. 2012. Wisdom of crowds for robust gene network inference. Nat Methods. 9(8):796–804.
  • 6. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. 2010. Inferring regulatory networks from expression data using tree-based methods. PLoS One. 5(9):e12776.
  • 7. Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. 2020. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods. 17(2):147–154.
  • 8. Kang Y, Thieffry D, Cantini L. 2021. Evaluating the reproducibility of single-cell gene regulatory network inference algorithms. Front Genet. 12:617282.
  • 9. Matsumoto H, et al. 2017. SCODE: an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation. Bioinformatics. 33(15):2314–2321.
  • 10. Papili Gao N, Ud-Dean SMM, Gandrillon O, Gunawan R. 2018. SINCERITIES: inferring gene regulatory networks from time-stamped single cell transcriptional expression profiles. Bioinformatics. 34(2):258–266.
  • 11. Lähdesmäki H, Shmulevich I, Yli-Harja O. 2003. On learning gene regulatory networks under the Boolean network model. Mach Learn. 52(1-2):147–167.
  • 12. Barman S, Kwon Y-K. 2018. A Boolean network inference from time-series gene expression data using a genetic algorithm. Bioinformatics. 34(17):i927–i933.
  • 13. Pušnik Ž, Mraz M, Zimic N, Moškon M. 2022. Review and assessment of Boolean approaches for inference of gene regulatory networks. Heliyon. 8(8):e10222.
  • 14. Zanudo JGT, Aldana M, Martínez-Mekler G. Boolean threshold networks: virtues and limitations for biological modeling. In: Kacprzyk J, Jain LC, editors. Information processing and biological systems. 2011. p. 113–151.
  • 15. Kadelka C, et al. 2024. A meta-analysis of Boolean network models reveals design principles of gene regulatory networks. Sci Adv. 10(2):eadj0822.
  • 16. Thomas R, d’Ari R. Biological feedback. CRC Press, 1990.
  • 17. Harvey I, Bossomaier T. Time out of joint: attractors in asynchronous random Boolean networks. In: Proceedings of the fourth European conference on artificial life. Citeseer, 1997. p. 67–75.
  • 18. Saadatpour A, Albert I, Albert R. 2010. Attractor analysis of asynchronous Boolean models of signal transduction networks. J Theor Biol. 266(4):641–656.
  • 19. Crama Y, Hammer PL. Boolean functions: theory, algorithms, and applications. Cambridge University Press, 2011.
  • 20. Font-Clos F, Zapperi S, La Porta CAM. 2018. Topography of epithelial–mesenchymal plasticity. Proc Natl Acad Sci U S A. 115(23):5902–5907.
  • 21. Pázmándi F, Zaránd G, Zimányi GT. 1999. Self-organized criticality in the hysteresis of the Sherrington-Kirkpatrick model. Phys Rev Lett. 83(5):1034–1037.
  • 22. Sethna JP, et al. 1993. Hysteresis and hierarchies: dynamics of disorder-driven first-order phase transformations. Phys Rev Lett. 70(21):3347–3350.
  • 23. Lin J. 1991. Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory. 37(1):145–151.
  • 24. Park KH, Costa FX, Rocha LM, Albert R, Rozum JC. 2023. Models of cell processes are far from the edge of chaos. PRX Life. 1(2):023009.
  • 25. Shmulevich I, Kauffman SA. 2004. Activities and sensitivities in Boolean network models. Phys Rev Lett. 93(4):048701.
  • 26. Gates AJ, Brattig Correia R, Wang X, Rocha LM. 2021. The effective graph reveals redundancy, canalization, and control pathways in biochemical regulation and signaling. Proc Natl Acad Sci U S A. 118(12):e2022598118.
  • 27. Phillips T. 2008. Regulation of transcription and gene expression in eukaryotes. Nat Edu. 1(1):199.
  • 28. Dimitrova E, Stigler B, Kadelka C, Murrugarra D. 2022. Revealing the canalizing structure of Boolean functions: algorithms and applications. Automatica. 146(13):110630.
  • 29. Kadelka C, Kuipers J, Laubenbacher R. 2017. The influence of canalization on the robustness of Boolean networks. Physica D. 353–354:39–47.
  • 30. Wang R-S, Saadatpour A, Albert R. 2012. Boolean modeling in systems biology: an overview of methodology and applications. Phys Biol. 9(5):055001.
  • 31. Maheshwari P, Assmann SM, Albert R. 2020. A guard cell abscisic acid (ABA) network model that captures the stomatal resting state. Front Physiol. 11:927.
  • 32. Ghorbani M, Jonckheere EA, Bogdan P. 2018. Gene expression is not random: scaling, long-range cross-dependence, and fractal characteristics of gene regulatory networks. Front Physiol. 9:1446.


Articles from PNAS Nexus are provided here courtesy of Oxford University Press
