Author manuscript; available in PMC: 2023 Nov 15.
Published in final edited form as: J Phys Chem B. 2022 Dec 5;126(49):10374–10383. doi: 10.1021/acs.jpcb.2c05412

What makes a functional gene regulatory network? A circuit motif analysis

Lijia Huang 1,2, Benjamin Clauss 1,3, Mingyang Lu 1,2,3,4,*
PMCID: PMC9896654  NIHMSID: NIHMS1863137  PMID: 36471236

Abstract

One of the key questions in systems biology is to understand the roles of gene regulatory circuits in determining cellular states and their functions. In previous studies, some researchers have inferred large gene networks from genome-wide genomics/transcriptomics data using the top-down approach, while others have modeled core gene circuits of small sizes using the bottom-up approach. Despite many existing systems biology studies, there is still no general rule on what sizes of gene networks and what types of circuit motifs a system would need to achieve robust biological functions. Here, we adopt a gene circuit motif analysis to discover four-node circuits responsible for multiplicity (rich in dynamical behavior), flexibility (versatile to alter gene expression), or both. We identify the most reoccurring two-node circuit motifs and the co-occurring motif pairs. Furthermore, we investigate the contributing factors of multiplicity and flexibility for large gene networks of different types and sizes. We find that gene networks of intermediate sizes tend to have combined high levels of multiplicity and flexibility. Our study will contribute to a better understanding of the dynamical mechanisms of gene regulatory circuits and provide insights into rational designs of robust gene circuits in synthetic and systems biology.

Keywords: Gene regulatory circuits, motif enrichment analysis, network multiplicity, network flexibility, network construction, multistability, dynamical systems

Graphical Abstract

[Graphical abstract image: nihms-1863137-f0001.jpg]

Introduction

It has been established that many important cellular processes are controlled by complex gene regulatory networks (GRNs)1,2. The inference of the GRNs driving cellular state transitions has become one of the major challenges in systems biology3. To address this challenge, many bioinformatics methods4–12 have been developed to infer GRNs using genomics datasets, such as gene expression data. Yet, researchers have found it very difficult to generate high-quality network models13–15. In our view, the main issue is that most bioinformatics methods rely on statistical tests to determine whether one gene regulates another, but seldom evaluate whether an inferred GRN is able to operate as a functional dynamical system. This view is supported by a recent benchmark study of GRN inference methods, which found that existing methods often perform poorly in recovering ground-truth networks13.

The above-mentioned concern has led us to think about what properties of a GRN would contribute to a functional system. There are two features of a GRN worth looking into. First, a functional GRN needs to generate rich dynamical behaviors, e.g., multiple steady states (i.e., multistability) and/or oscillatory states. As shown in earlier studies, random GRNs tend to generate less interesting dynamical behaviors than biological networks16,17. On the other hand, multistability is often required for a GRN model to capture a variety of cellular states during cell differentiation16,18–21. Second, a functional GRN needs to be sufficiently flexible so that the GRN can be controlled by extrinsic cell signaling or an environmental factor22–25. It is quite common that the activation of a signaling pathway can drive the transition of cellular states26. Equally typical are gene knockdown/knockout experiments designed to understand the functions of genes based on the effects of gene perturbations27. Thus, functional GRNs need to be flexible, even in the presence of a certain level of compensation and adaptation due to network redundancy28,29. Therefore, it is reasonable to hypothesize that a functional GRN is required to produce rich dynamics and meanwhile be flexible upon perturbations.

Here, under this conceptual framework, we adopted our recently developed gene circuit motif analysis approach30 to explore non-redundant four-node gene circuits that are responsible for multiplicity (i.e., being rich in dynamical behavior), flexibility (i.e., being versatile to alter gene expression), or both. There are many previous studies on circuit motif analysis2,18,31,32; however, here we focused on the properties of multiplicity and flexibility through extensive simulations and statistical analysis. From the identified small circuits, we determined the most reoccurring two-node circuit motifs and the propensity of co-occurrence of two circuit motifs. Furthermore, using the identified circuit motifs, we generated a variety of large GRNs of different types (sequential, scale-free, and random) and different sizes, from which we investigated the contributing factors of the multiplicity and flexibility of large GRNs. We hope that the outcomes of these analyses will shed light on the improved modeling of biological GRNs.

Methods

A quantitative circuit motif analysis

We have recently developed a new approach for gene circuit motif analysis30, which allows identifying reoccurring two-node circuit motifs and patterns of motif coupling from the ranking of 60,212 non-redundant four-node gene circuits by a certain dynamical feature. In this approach, we enumerate all possible non-redundant four-node gene circuits, and, for each circuit, we generate the steady-state gene expression profiles for an ensemble of 10,000 mathematical models with the random circuit perturbation (RACIPE) method33,34 (see section “RACIPE simulations” and SI Text 1 for details). Based on a user-defined scoring function computed from the simulated gene expression data (see section “Defining network multiplicity and flexibility” for the two scores defined in this study), we can rank all the four-node circuits and identify two-node circuit motifs enriched in the top-ranking circuits. A similar enrichment analysis can also be applied to evaluate the co-occurrence of two circuit motifs. Our approach has several advantages over some existing methods. First, to ensure a robust statistical analysis, the circuit motif analysis utilizes extensive simulation data from all non-redundant four-node gene circuits. Second, the ensemble-based circuit simulations allow us to quantify a circuit's dynamical behavior without depending on a specific set of kinetic parameters. A scoring function defined in this way enables us to rank gene circuits robustly and efficiently. Third, from the analysis of all four-node gene circuits, we can evaluate the enrichment of small circuit motifs and their coupling. Note that we limit our analysis to four-node gene circuits without any signaling node, i.e., a node without another regulator, as a circuit with signaling nodes could usually be reduced to a gene circuit of a smaller size. Therefore, the circuit motifs we explore here could be complementary to, but also distinct from, the most significant circuit motifs identified in the previous studies2.
In this study, we applied this enrichment analysis to identify small circuit motifs contributing to a functional regulatory system.

RACIPE simulations

For any four-node gene regulatory circuit or large gene regulatory network (GRN), we computed its gene expression distribution by using random circuit perturbation (RACIPE)33,34 (sRACIPE 1.12). RACIPE simulates an ensemble of mathematical models for a gene circuit/network with randomly selected kinetic parameters and obtains steady-state gene expression profiles. In contrast to traditional mathematical modeling approaches, RACIPE-simulated gene expression profiles are derived from the same ODEs but with different kinetic parameters to capture extrinsic factors, such as cell-to-cell variations and different environmental conditions. We have previously shown, in a few examples of biological networks, that the gene expression profiles derived from RACIPE simulations form distinct clusters of gene expression patterns, where the clusters can be associated with experimentally observed cellular states of the systems16,33,35–37. Thus, RACIPE is a convenient and powerful method to evaluate the behavior of gene circuits/networks from simulated gene expression distributions. Here, we generated 10,000 gene expression profiles (log-transformed and standardized) for each circuit/network to compute its gene expression distribution. A summary of the RACIPE implementation, including the ODEs, the ranges of model parameters, and the choice of initial conditions, is presented in SI Text 1.

Defining network multiplicity and flexibility

For each gene circuit, we applied RACIPE to generate the steady-state gene expression profiles of 10,000 mathematical models with randomly generated kinetic parameters. As mentioned above, RACIPE-simulated gene expression profiles from a biological network usually form robust clusters of gene expression patterns. However, the gene expression profiles from random gene networks are usually less structured16,17. To reflect the ability of biological GRNs in generating distinct cellular states, we defined a scoring function H, namely multiplicity, by the negative differential entropy38 of the simulated gene expression distributions of the 10,000 models:

$$H = \langle \log p_i \rangle \sim \frac{1}{N} \sum_{i} \log \frac{k}{N V R(k)^{d}} \qquad (1),$$

where $p_i$ is the local density of model $i$, $\langle \cdot \rangle$ denotes the average, $N$ is the number of models, and the summation is over all simulated models. Here, the local density $p_i$ is computed by the k-nearest neighbors (knn) estimator39, where $R(k)$ is the Euclidean distance in gene expression space between the center model and its $k$-th nearest model, $d$ is the dimension of the gene expression space ($d = 4$ for any four-node gene circuit), and $V = \pi^{d/2} / \Gamma(d/2 + 1)$ is a constant scaling factor. The multiplicity defined in Equation (1) can be interpreted as the mean log local density. The higher the overall local densities, the higher the H values. Moreover, in the situation of high local density, more gene expression clusters can be observed. This is consistent with our previous findings that the local densities of the gene expression profiles simulated from a stem cell gene regulatory circuit are overall larger than those from a random gene circuit16.
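
As an illustration of the knn-based multiplicity score, the following minimal sketch computes the mean log local density for a set of expression profiles. It is not the authors' implementation: the function name `multiplicity`, the choice k = 5, and the toy Gaussian data are assumptions made only for this example, and the brute-force neighbor search is suitable for small ensembles only.

```python
import math
import random


def multiplicity(profiles, k=5):
    """Mean log kNN density: (1/N) * sum_i log(k / (N * V * R_i(k)^d)).

    k = 5 is an assumed value; the text does not state the k that was used.
    """
    n = len(profiles)
    d = len(profiles[0])
    # V: volume of the unit ball in d dimensions.
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    total = 0.0
    for i, center in enumerate(profiles):
        # Distances from model i to every other model, sorted ascending.
        dists = sorted(math.dist(center, other)
                       for j, other in enumerate(profiles) if j != i)
        r_k = dists[k - 1]  # distance to the k-th nearest neighbor
        total += math.log(k / (n * unit_ball * r_k ** d))
    return total / n


rng = random.Random(0)
# Toy data (hypothetical): one broad cloud versus two tight clusters, d = 4.
broad = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
clustered = [[rng.gauss(c, 0.1) for _ in range(4)]
             for c in (-1, 1) for _ in range(100)]
h_broad = multiplicity(broad)
h_clustered = multiplicity(clustered)
```

In this toy setting, the tightly clustered data have higher local densities and therefore receive the larger score, in line with the interpretation of H given above.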

We next defined the flexibility of a gene circuit, F, by the extent of changes in the gene expression distributions of 10,000 RACIPE models between the unperturbed and knockdown (KD) conditions. More specifically, the flexibility F is defined as

$$F = \sum_{j=1}^{d} \sum_{l=1}^{d} e_l \, D(p_{l,0}, p_{l,j}) \qquad (2),$$

where the summations are over all gene nodes $j$ (from 1 to the dimension $d$) and all principal components (PCs) $l$ (from 1 to $d$). Here, principal component analysis is performed on the gene expression data of models from the unperturbed condition. $e_l$ is the $l$-th eigenvalue, which we incorporated to emphasize the changes along the largest PCs. We quantified the differences in gene expression distributions by $D$, the Kolmogorov–Smirnov test statistic40 comparing the probability distributions of the data along each PC between the unperturbed condition ($p_{l,0}$ for the $l$-th PC) and the perturbed condition in which gene $j$ is knocked down ($p_{l,j}$). Here, we took the 10% of models with the lowest production rates of gene $j$ to compute the distribution for the KD condition.
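
The flexibility score can be sketched as follows, assuming NumPy is available. This is an illustrative reading of Equation (2), not the authors' code: the function names are hypothetical, the eigenvalues are taken as the raw covariance eigenvalues (the text does not say whether they are normalized), and the selection of knockdown models by production rate is left outside the sketch, so each knockdown data set is simply passed in.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def flexibility(unperturbed, knockdowns):
    """Sum over knocked-down genes j and PCs l of e_l * D(p_{l,0}, p_{l,j}).

    unperturbed: (N, d) array of expression profiles.
    knockdowns:  list of (M, d) arrays, one per knocked-down gene j.
    """
    mean = unperturbed.mean(axis=0)
    centered = unperturbed - mean
    # PCA on the unperturbed data only (assumption: raw covariance eigenvalues).
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    base_scores = centered @ eigvecs
    total = 0.0
    for kd in knockdowns:
        # Project knockdown data onto the PCs of the unperturbed condition.
        kd_scores = (kd - mean) @ eigvecs
        for l in range(eigvals.size):
            total += eigvals[l] * ks_statistic(base_scores[:, l], kd_scores[:, l])
    return float(total)
```

Because the PCs are fixed by the unperturbed data, a knockdown that leaves the distribution unchanged contributes zero, and shifts along high-variance PCs are weighted most strongly.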

In addition, we also defined another scoring function for the combined multiplicity and flexibility. Since the circuits’ multiplicity H and flexibility F have values in different ranges, we chose to rank circuits with a new score G, defined by the product of H and F,

$$G = H F \qquad (3).$$

Enrichment analysis of circuit motifs

To explore gene circuit motifs associated with circuit multiplicity or flexibility, we performed an extensive circuit motif enrichment analysis on all 60,212 non-redundant four-node gene circuits. These circuits exclude those that can be equivalently reduced to circuits of three or fewer nodes (see SI Text 2 and the previous study30 for more details). For each circuit, we performed RACIPE simulations to generate gene expression profiles and evaluated the H, F, and G scores defined by Equations (1)–(3).

For each of the above scores (denoted below as Q), we calculated the occurrence of every two-node circuit motif (listed in Fig.S1) within the four-node circuits ranked highest by that score. There are several existing dedicated methods for network motif detection41, but we chose to utilize a simple enumeration approach based on adjacency matrices, because we focused on very small circuits in this study, and our analysis can account for different edge types (i.e., activation and inhibition) and auto-regulations. We computed the enrichment score for each two-node circuit motif, defined as

$$E = \log \frac{\sum_{l} \left[ 1 - H(Q, Q_0, n) \right]}{\sum_{l} H(Q, Q_0, n)} \qquad (4),$$

where $H(x, x_0, n) = 1 / (1 + (x / x_0)^n)$ is the inhibitory Hill function. The Hill threshold $Q_0$ is set to be the Q value of the 600th ranked circuit. The Hill coefficient $n$ is set to 20, for a sharp transition near the threshold $Q_0$. Moreover, we applied the enrichment analysis to determine the enriched co-occurrence of two two-node circuit motifs.
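
The role of the Hill function as a soft threshold can be illustrated with a short sketch. It shows one possible reading of the enrichment score, under the assumption that the sums run over the scores of the four-node circuits containing a given motif; this reading, the function names, and the toy scores are assumptions of the example and are not confirmed by the text.

```python
import math


def hill_inhibitory(x, x0, n):
    """Inhibitory Hill function: 1 / (1 + (x / x0)^n)."""
    return 1.0 / (1.0 + (x / x0) ** n)


def enrichment(scores_with_motif, q0, n=20):
    """Log ratio of soft counts above versus below the threshold q0.

    scores_with_motif: scores Q (assumed positive) of the circuits that
    contain the motif (assumed meaning of the summation index).
    """
    above = sum(1.0 - hill_inhibitory(q, q0, n) for q in scores_with_motif)
    below = sum(hill_inhibitory(q, q0, n) for q in scores_with_motif)
    return math.log(above / below)


# Hypothetical scores: a motif found mostly in high-scoring circuits
# versus one found mostly in low-scoring circuits, with threshold q0 = 1.
mostly_top = enrichment([2.0, 2.0, 0.5], q0=1.0)
mostly_bottom = enrichment([2.0, 0.5, 0.5], q0=1.0)
```

With n = 20, each circuit contributes almost entirely to one of the two sums unless its score lies very close to the threshold, so the score behaves like a log ratio of counts with a smooth boundary.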

When performing the enrichment analysis, we evaluated statistical significance by a permutation test as follows. First, enrichment scores were calculated for every two-node motif as described above. Second, a null distribution was created by shuffling the ranking indices of all four-node circuits. Third, enrichment scores were calculated again but with the shuffled indices. Fourth, steps two and three were repeated 10,000 times. Fifth, p-values were computed as the fraction of cases where the enrichment score of each motif from the shuffled indices is greater than the original enrichment score with the unshuffled indices. Adjusted p-values were then computed for multiple hypothesis testing by the Benjamini–Hochberg (BH) method42. A similar strategy for statistical significance testing has been utilized in earlier studies43. Details of the enrichment analysis are also described in our previous study30.
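
The permutation procedure and the BH adjustment described above can be sketched as follows. The sketch is generic rather than a reproduction of the authors' pipeline: the function names, the default of 1,000 shuffles, and the toy statistic (mean score of circuits containing a motif) are assumptions for illustration, and shuffling the scores across circuits is used here as an equivalent of shuffling the ranking indices.

```python
import random


def permutation_pvalue(scores, has_motif, statistic, n_perm=1000, seed=0):
    """Fraction of shuffles whose statistic exceeds the unshuffled one."""
    rng = random.Random(seed)
    observed = statistic(scores, has_motif)
    shuffled = list(scores)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between score and motif content
        if statistic(shuffled, has_motif) > observed:
            exceed += 1
    return exceed / n_perm


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def mean_score_with_motif(scores, has_motif):
    """Toy statistic (hypothetical): mean score of circuits containing the motif."""
    picked = [s for s, flag in zip(scores, has_motif) if flag]
    return sum(picked) / len(picked)


# Toy example: 100 circuits, the motif occurs only in the 10 highest-scoring ones.
toy_scores = list(range(100))
toy_has_motif = [s >= 90 for s in toy_scores]
p_value = permutation_pvalue(toy_scores, toy_has_motif, mean_score_with_motif)
```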

Generating large GRNs

We programmatically generated three types of large GRNs: random, scale-free, and sequential networks (the procedure is illustrated in Fig.S2). To generate the random and scale-free networks, we first built skeleton networks using standard network generation algorithms, and then replaced each node in a skeleton network with a gene circuit motif of choice. For an edge connecting two circuit motifs, a randomly selected gene from the first circuit motif was linked to a randomly selected gene from the second circuit motif.

First, to generate the skeleton networks with Gaussian degree distribution and directed edges, we used the erdos.renyi.game function from the igraph R package44 with either the gnp or the gnm method. For the gnm version of the random networks (denoted as random ver1), the total number of edges was set to equal the total number of nodes. Therefore, these GRNs are sparsely connected. For the gnp version of the random networks (denoted as random ver2), the probability of an edge occurring between nodes was set to 30%. Therefore, these GRNs are densely connected. From a skeleton network, half of the edges were randomly selected and designated as inhibitory edges, and the other half as excitatory edges. Second, to generate the scale-free networks, we used the sample_pa function from the igraph R package to generate the skeleton networks with a power-law degree distribution and directed edges. Third, the sequential networks were constructed by connecting the desired number of two-node circuit motifs one after another. An edge (either excitatory or inhibitory) was added to connect a randomly picked gene from the first motif (source) to a randomly picked gene from the second motif (target). Afterwards, another edge (either excitatory or inhibitory) was added to connect the other gene from the second motif (source) to a randomly picked gene from the third motif (target). We continued the procedure iteratively to connect all motifs.
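
The construction of a sequential network can be sketched in a few lines. This is a simplified illustration, not the authors' R code: the function name, the edge-list representation, the equal probability of excitatory and inhibitory linking edges, and the example motif library are all assumptions of the sketch.

```python
import random


def sequential_network(n_motifs, motif_library, seed=0):
    """Chain two-node motifs; returns (source, target, sign) edges.

    Genes are numbered 0 .. 2*n_motifs - 1, with motif m owning genes 2m and 2m+1.
    motif_library: list of motifs, each a list of edges over local genes {0, 1},
    with sign +1 for excitatory and -1 for inhibitory regulation.
    """
    rng = random.Random(seed)
    edges = []
    for m in range(n_motifs):
        base = 2 * m
        motif = rng.choice(motif_library)
        edges += [(base + s, base + t, sign) for s, t, sign in motif]
    links = []
    source = rng.randrange(2)  # randomly picked gene of the first motif
    for m in range(1, n_motifs):
        target = 2 * m + rng.randrange(2)  # randomly picked gene of the next motif
        # Assumption: the linking edge is excitatory or inhibitory with equal odds.
        links.append((source, target, rng.choice((+1, -1))))
        # The next link starts from the other gene of the motif just reached.
        source = 2 * m + (1 - (target - 2 * m))
    return edges + links, links


# Hypothetical library: a toggle switch and a mutual-activation motif.
library = [
    [(0, 1, -1), (1, 0, -1)],
    [(0, 1, +1), (1, 0, +1)],
]
all_edges, linking_edges = sequential_network(5, library)
```

Five motifs yield a 10-gene network, which corresponds to the smallest network size considered here if size is counted in genes.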

Altogether, we generated networks of four types (random ver1, random ver2, scale-free, and sequential), five different network sizes (10, 20, 30, 40, and 50 nodes), and with three sets of circuit motifs as the building blocks. The first set of motifs contains the top three enriched motifs by multiplicity; the second set contains the top three enriched motifs by flexibility; and the third set contains all six of these motifs (see Results for details of the motif enrichment analysis). Each kind of network was generated randomly ten times; thus, we analyzed a total of 600 large GRNs (4 × 5 × 3 × 10).

Results

Characterizing circuit multiplicity

We applied the multiplicity scoring function to all 60,212 non-redundant four-node gene circuits. As shown in Fig.1A, the multiplicity score H ranges from about 2.5 to 4.8, and the distribution of multiplicity is a negatively skewed unimodal distribution, with slightly more circuits of high H values. We found that the scoring function H is indeed effective in capturing circuit multiplicity (Fig.S3). We observed that the topmost circuits ranked by H tend to contain regulatory links of mutual activations, mutual inhibitions, and self-activations. The tight regulatory connectivity in those circuits allows them to have a higher number of gene expression clusters, as shown in the density map of the simulated gene expression profiles projected onto the first two principal components (PCs) (Fig.1A, right panels). Instead, the bottommost circuits ranked by H tend to have only one gene expression cluster (Fig.S4A).

Figure 1. Multiplicity and flexibility of gene regulatory circuits.

(A) The leftmost plot shows the histogram of multiplicity for all nonredundant four-node circuits. The right panels show, for the top five circuits ranked by multiplicity, the circuit diagrams (top row) and the density maps of RACIPE-simulated gene expression projected onto the first two PCs (bottom row). In the circuit diagrams, the nodes represent genes, labeled as A, B, C, and D. The blue lines and arrows represent excitatory regulations, and the red lines and dots represent inhibitory regulations. Panel (B) shows the outcomes for the flexibility score.

Next, we applied the circuit motif enrichment analysis by comparing the occurrence of any two-node circuit motif within the topmost four-node circuits ranked by multiplicity with its occurrence within the remaining circuits. As shown in Fig.2AB (significance test in Fig.S6A), the top five enriched two-node circuit motifs all contain self-activations, suggesting the dominant role of self-activation in determining high multiplicity. The top three motifs (#30, 21, 39, see Fig.S1 for the circuit diagrams and indices of all two-node motifs), which have similarly high enrichment scores, all contain mutual regulatory links. In contrast, the bottom three motifs (#7, 34, 16) all contain self-inhibitions. These findings are all consistent with what we observed in the top-ranked four-node circuits in Fig.1A and with previous studies showing that self-activation generates multistability and self-inhibition stabilizes gene expression states2,30,45. Furthermore, we evaluated the enrichment of the co-occurrence of two circuit motifs within the top-ranked four-node circuits by multiplicity, as shown in Fig.2CDE. We observed again that two motifs with self-activation tend to be more enriched, while two motifs with self-inhibition tend to be less enriched.

Figure 2. Circuit motif enrichment analysis with respect to circuit multiplicity.

(A) The enrichment scores for all two-node circuit motifs were computed from all non-redundant four-node gene circuits ranked by multiplicity. The enrichment is significant for most motifs (adjusted p-values < 0.01, as shown in Fig.S6A), except for motifs #17 and 36 (circuit diagrams and indices in Fig.S1). (B) The diagrams of the top five enriched circuit motifs. Panels (C – E) show the heatmaps of the enrichment scores for the co-occurrence of all pairs of two-node circuit motifs. The three heatmaps correspond to the overall co-occurrence (C), the co-occurrence of two motifs with a shared node (overlapping, D), and the co-occurrence of two motifs without a shared node (non-overlapping, E). Hierarchical clustering analysis was applied to each case with Euclidean distance and complete linkage. Grey cells in panel D mark motif combinations that do not exist.

Characterizing circuit flexibility

We applied the flexibility scoring function to rank all 60,212 non-redundant four-node gene circuits. As shown in Fig.1B, the flexibility score F ranges from about 0.6 to 1.4, and the distribution of flexibility is close to a symmetric unimodal distribution. We tested the flexibility score F on a few four-node circuits (Fig.S5) and found that, for circuits with larger F, gene expression distributions have noticeably larger changes upon gene KD perturbations. We observed that the topmost circuits ranked by F tend to be sparsely connected. Compared to the topmost circuits ranked by H, the topmost circuits by F usually have a mono-directional interaction between two nodes (either the 1st node regulating the 2nd, or the 2nd node regulating the 1st), and there are fewer auto-regulations. Interestingly, the circuits with high flexibility usually have gene expression profiles of two clusters (Fig.1B, right panels), while the circuits with low flexibility tend to have gene expression profiles of multiple clusters (Fig.S4B). This observation can be understood as follows. For circuits allowing only one gene expression cluster, the possible gene expression distribution is limited by the cluster. For circuits with a higher number of gene expression clusters, it is hard to transition through multiple states by perturbation. Thus, circuits with two gene expression clusters are the most likely to achieve substantial changes in gene expression distributions upon perturbations.

Next, we applied the circuit motif enrichment analysis to circuits ranked by flexibility. As shown in Fig.3AB (significance test in Fig.S6B), the first and second most enriched circuit motifs (#1 and 10) both contain a single regulatory link from one gene to the other. The third and fourth most enriched circuit motifs (#19 and 37) are circuits with mutual activation and mutual inhibition (toggle switch), respectively. We noticed slight differences in the enrichment scores between circuits with excitatory regulations and those with inhibitory regulations, presumably because of sampling deviations and the measurement of flexibility by knockdown perturbations. These most enriched circuit motifs usually lack autoregulation. Interestingly, the toggle-switch-like circuit motifs (#19 and 37) are frequently observed in the topmost flexible circuits, as they are known to generate bistability46,47. These motifs keep circuits sparsely connected and bistable, thus allowing the whole circuit to be flexible as well. Furthermore, we evaluated the enrichment of the co-occurrence of two circuit motifs within the top-ranked four-node circuits by flexibility, as shown in Fig.3CDE. Interestingly, we observed frequent co-occurrence of motif #1 with three motifs, #5, #6, and #1 itself. These three circuit motifs all share the same excitatory regulatory link from one gene to the other but differ by just a self-inhibitory link. The co-occurrence of these motifs further demonstrates that the sparseness of regulatory interactions is one of the determining factors of flexible circuits.

Figure 3. Circuit motif enrichment analysis with respect to circuit flexibility.

Contents similar to those in Fig.2. Here, all non-redundant four-node gene circuits are scored and ranked by flexibility. The enrichment of a single motif is significant for all motifs (adjusted p-values <0.01, as shown in Fig.S6B).

Distinct circuits with features of combined multiplicity and flexibility

In the previous two sections, we evaluated the multiplicity and flexibility of all non-redundant four-node gene circuits and applied motif enrichment analysis to identify two-node circuit motifs associated with either high multiplicity or high flexibility. Furthermore, we evaluated the relationship between the multiplicity and flexibility of a circuit. From the density and scatter plot in Fig.4A, we observed a weak correlation (Pearson correlation coefficient of 0.17275) between these two scores. Interestingly, we found that circuits rarely have high multiplicity and high flexibility simultaneously (only 0.048% of circuits have both H and F higher than 1.5 standard deviations above the mean value). However, many more circuits were found to have high multiplicity and low flexibility (0.144% of circuits with H higher than 1.5 standard deviations above the mean value and F lower than 1.5 standard deviations below the mean value). More circuits were also found to have low multiplicity and high flexibility (0.332% of circuits with H lower than 1.5 standard deviations below the mean value and F higher than 1.5 standard deviations above the mean value). It is reasonable that, despite the weak overall correlation between multiplicity and flexibility, circuits with the highest multiplicity are less likely to be flexible, while circuits with the highest flexibility are required to have fewer gene expression clusters, thus being low in multiplicity.

Figure 4. Gene regulatory circuits ranked by combined multiplicity and flexibility.

(A) The density map of multiplicity (x-axis) and flexibility (y-axis) for all non-redundant four-node gene circuits. (B) The panels show, for the top five circuits ranked by the product of multiplicity and flexibility, the circuit diagrams (top row) and the density maps of RACIPE-simulated gene expression projected onto the first two PCs (bottom row). Panel (C) shows the outcomes for the bottom five circuits.

As we discussed earlier, we are interested in functional GRNs with both high multiplicity and flexibility, despite their low occurrence in four-node circuits. We applied the circuit motif analysis with the combined score G defined in Equation (3). The range of G is from about 1.5 to 6.72. The top five ranked circuits, as shown in Fig.4B, have the following features. First, these circuits all have relatively simple circuit topologies compared to the topmost circuits ranked by multiplicity. Second, these circuits all contain multiple self-activations, thus generating a high number of gene expression clusters. Third, the gene expression distributions resulting from these circuits seem to be less structured, compared to those from the circuits with the highest multiplicity. All these features contribute to being high in both multiplicity and flexibility. In contrast, the bottom five ranked circuits, as shown in Fig.4C, have more self-inhibitions, allow for gene expression distributions of a single cluster, and have highly connected circuit topologies. These properties are exactly opposite to those of top-ranked circuits, explaining why the bottom-ranked circuits have low multiplicity and flexibility. These observations are also consistent with the outcomes of the circuit motif enrichment analysis, as shown in Fig.5 (significance test in Fig.S6C). Note that we identified circuit motifs #19 and #39 again as enriched motifs with high multiplicity and flexibility. These toggle-switch-like motifs were observed, presumably because they can generate bistability, thus having more potential to generate more states when coupled with similar motifs, while also allowing flexible switches among states21,48.

Figure 5. Circuit motif enrichment analysis by both multiplicity and flexibility.

Contents similar to those in Fig.2. Here, all non-redundant four-node gene circuits are scored and ranked by the product of multiplicity and flexibility. The enrichment of a single motif is significant for most motifs, except for motif #31 (adjusted p-values < 0.01, as shown in Fig.S6C).

Multiplicity and flexibility in large random GRNs

Lastly, we explored the properties of multiplicity and flexibility in large random GRNs. To generate GRNs of an extended range of multiplicity and flexibility, we selected a list of two-node circuit motifs as the building blocks and assembled them into large GRNs of different sizes, with either sequential, scale-free, or random topological structure (see Methods for the detailed implementation). The selected motifs are either (1) the top 3 two-node circuit motifs ranked by multiplicity (i.e., motifs #30, 21, 39), (2) the top 3 two-node circuit motifs ranked by flexibility (i.e., motifs #1, 10, 19), or (3) the motifs from both (1) and (2). For each motif type, GRN topology type, and GRN size, we randomly generated the topology of ten networks (see the companion GitHub repository49 for all GRN topologies), followed by RACIPE simulations to generate 10,000 gene expression profiles for each GRN. To calculate the multiplicity and flexibility for a large GRN, we performed principal component analysis on the standardized, log-transformed gene expression data, and applied Equations (1) and (2) using the data projected onto the first four PCs. Note that in Equation (2), the summation over gene perturbations is still applied to all genes in the GRN. We chose to first project data onto the first four PCs, as the data with reduced dimensions usually capture gene expression states well. Moreover, dimensionality reduction has been widely used in the analysis of high-dimensional gene expression data.
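
The projection step for large GRNs can be sketched as follows, assuming NumPy. The function name, the base-2 logarithm, and the toy log-normal data are assumptions of this example; the text specifies only that the data are log-transformed, standardized, and projected onto the first four PCs.

```python
import numpy as np


def project_onto_top_pcs(expression, n_pcs=4):
    """Log-transform, standardize each gene, and project onto the leading PCs.

    expression: (N models, n genes) array of positive expression levels.
    """
    logged = np.log2(expression)  # assumption: base-2 logarithm
    standardized = (logged - logged.mean(axis=0)) / logged.std(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(standardized, full_matrices=False)
    return standardized @ vt[:n_pcs].T


rng = np.random.default_rng(0)
toy_expression = rng.lognormal(size=(200, 10))  # hypothetical 10-gene network
reduced = project_onto_top_pcs(toy_expression)
```

The reduced data have a fixed dimension of four regardless of network size, so scores computed from them remain comparable across GRNs of different sizes.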

The multiplicity and flexibility of these random GRNs are summarized in Fig.6, where we identified the following interesting findings. First, the overall trends of multiplicity and flexibility for different GRN sizes and types are very similar for different choices of circuit motifs. The multiplicity scores are usually at high levels when using the motifs with the highest multiplicity and at low levels when using the motifs with the highest flexibility. Similarly, the flexibility scores are usually at high levels when using the motifs with the highest flexibility and at low levels when using the motifs with the highest multiplicity. Thus, among GRNs of different sizes, multiplicity and flexibility are anticorrelated. Our finding also suggests that the multiplicity/flexibility properties of a large GRN are largely determined by the properties of the circuit motifs within the GRN.

Figure 6. Multiplicity and flexibility of large gene regulatory networks.

The plots show the multiplicity (blue points) and flexibility (red points) for GRNs of different sizes (number of genes on x-axis) and different network types (each panel). Different columns correspond to sequential networks (1st column), scale-free networks (2nd column), random networks with sparse connectivity (ver 1, 3rd column), and random networks with dense connectivity (ver 2, 4th column). Different rows correspond to networks synthesized with top 3 multiplicity two-node motifs (1st row), top 3 flexibility two-node motifs (2nd row), and all these 6 motifs (3rd row).

Second, regardless of the type of GRNs, multiplicity was found to be linearly correlated with the number of genes but to saturate for large numbers of genes (blue points in Fig.6). For each category of GRNs (i.e., different sizes and motif types), the variations of multiplicity among ten random networks are mostly small, but slightly larger for the GRNs with mixed motifs. We also computed the multiplicity when the local density was estimated with the gene expression profiles of all dimensions (Fig.S7), and, in this case, multiplicity is always linearly correlated with the number of network genes. The dependence of multiplicity on GRN sizes can be understood as follows. When the GRNs are very small, the number of distinct states allowed by the GRNs is also limited. When the GRNs become larger, much richer network behaviors can be observed, and therefore multiplicity is larger. However, when the GRNs get extremely large, although the variations of gene expression still increase (multiplicity for data with full dimensions), the number of distinct gene expression states becomes saturated (multiplicity for data with reduced dimensions).

Third, flexibility was found to be linearly anti-correlated with the number of genes for sequential networks, scale-free networks, and random networks where motifs are sparsely connected with a fixed number of interactions per motif (denoted as random ver1; see Methods for details) (red points, 1st to 3rd columns in Fig. 6), despite much larger variations in flexibility among ten networks of the same category. In those situations, we also observed a saturation of flexibility for small GRNs. Because of the high variations and saturation of flexibility, we also observed a few small GRNs with low flexibility. Interestingly, for networks where motifs are densely connected with a fixed ratio of interactions per motif (denoted as random ver2), we observed a bell-shaped dependence of flexibility on the number of genes, i.e., the highest flexibility may occur in GRNs of intermediate sizes.

Taken together, when the network size increases, multiplicity increases while flexibility decreases. Multiplicity and flexibility tend to saturate for large and small GRNs, respectively. Based on these findings, we infer that the GRNs with both high multiplicity and high flexibility are likely of intermediate sizes.
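The reasoning in this paragraph can be illustrated with a small sketch. It is not the analysis used in this study: the function names, the min-of-rescaled-scores criterion, and the toy curves (in arbitrary size units) are all assumptions chosen only to show why a rising, saturating score combined with a falling one tends to be jointly highest at an interior size.

```python
def rescale(values):
    """Map a list of scores linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("scores must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


def balanced_size(sizes, multiplicity, flexibility):
    """Return the size whose weaker rescaled score is largest.

    Hypothetical criterion (an assumption, not from the paper): a network
    counts as 'balanced' when neither rescaled score is low, so each size
    is ranked by min(multiplicity, flexibility) after rescaling.
    """
    m = rescale(multiplicity)
    f = rescale(flexibility)
    combined = [min(a, b) for a, b in zip(m, f)]
    best = max(range(len(sizes)), key=lambda i: combined[i])
    return sizes[best], combined[best]


# Toy curves in arbitrary units; NOT data from this study.
toy_sizes = [1, 2, 3, 4, 5, 6]
toy_multiplicity = [1.0, 2.0, 3.0, 3.6, 3.9, 4.0]  # rises, then saturates
toy_flexibility = [4.0, 3.9, 3.2, 2.4, 1.6, 1.0]   # flat when small, then falls

best_size, best_score = balanced_size(toy_sizes, toy_multiplicity, toy_flexibility)
print(best_size, round(best_score, 3))
```

With these toy curves the selected size is neither the smallest nor the largest, mirroring the qualitative argument; the actual location of the optimum depends on the measured scores in Fig. 6.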

Discussion

In this study, we explored the types of gene circuit motifs that contribute to functional gene regulatory networks (GRNs). We first defined two scoring functions to quantify the multiplicity and flexibility of a gene regulatory circuit based on the circuit’s gene expression distribution. We then systematically applied the scores to rank all non-redundant four-node gene circuits. By applying gene circuit motif analysis, we identified reoccurring two-node circuit motifs and co-occurring motif pairs that are enriched in the circuits ranked at the top by multiplicity, flexibility, or a combination of both. Furthermore, using the enriched motifs as the building blocks, we generated many GRNs of different types and sizes and investigated the GRN properties that contribute to high levels of multiplicity and flexibility. We hope this study will improve our understanding of the design of biological GRNs.

The core approach utilized in this study is the circuit motif enrichment analysis that we recently introduced30. We have demonstrated the effectiveness of this approach in identifying not only circuit motifs associated with a particular dynamical behavior but also the coupling of two circuit motifs. Here, we focused on multiplicity, the ability of a GRN to generate a high number of states, and flexibility, the ability of a GRN to alter gene expression upon perturbations. In our view, multiplicity and flexibility are among the most important features of a functional GRN. From the enrichment analysis, circuit motifs with mutual regulations and self-activation tend to have high multiplicity, while circuit motifs with a single mono-directional regulation and without autoregulation tend to have high flexibility. Remarkably, two types of circuit motifs allow both high multiplicity and high flexibility: either motifs with sparse connectivity and self-activation or toggle-switch-like motifs.

While it is important to elucidate the types of circuit motifs having high multiplicity and/or flexibility, we also asked how these circuit motifs contribute to the multiplicity and flexibility of larger GRNs. To address this question, we generated GRNs of different sizes and types using the enriched circuit motifs as the building blocks. From an extensive network analysis, we found that network multiplicity and flexibility are indeed largely determined by the types of circuit motifs within the GRNs. Overall, GRNs of intermediate sizes (around 30 genes; see also Fig. 6) tend to have combined high levels of multiplicity and flexibility. Thus, we hypothesize that a biological GRN, when considered as a functional dynamical system, should be of intermediate size. This can be understood as follows: when a GRN is too small, it is not complex enough to robustly generate the desired functionality; when a GRN is too large, it could be too rigid to allow sufficient control by external signals or environmental factors50,51. Thus, GRNs of intermediate sizes can alleviate the issues of both smaller and larger GRNs. In our view, this criterion of network size would help elucidate the design principles of biological GRNs and improve the effectiveness of GRN inference.

There are a few related topics that are worth further investigation. First, when simulating circuit dynamics, we assumed AND logic when multiple genes regulate a target gene. It would be interesting to evaluate how other types of logical rules52 affect GRN multiplicity and flexibility. Second, the current approach focuses on characterizing gene expression distributions, but many functional GRNs may act as oscillators53–55. One potential future direction is to evaluate oscillatory dynamics56 in the circuit motif analysis. Third, we have observed that multiplicity becomes saturated for large networks. Indeed, biological networks usually exhibit a limited number of cellular states, thus limiting the level of multiplicity. Further studies are needed to elucidate the saturation of cellular states in biological networks17.
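As an illustration of the first point, the sketch below contrasts AND-type regulation with one alternative rule. It assumes the RACIPE-style convention (33) in which each regulator contributes a shifted Hill function and the contributions are multiplied, with activating terms divided by their fold change so that every term lies in (0, 1]. The function names, parameter values, and the OR-like rule are hypothetical and are not taken from this study.

```python
from math import prod


def shifted_hill(x, threshold, n, fold_change):
    """Shifted Hill term, normalized to (0, 1].

    fold_change > 1 models activation, fold_change < 1 inhibition.
    Dividing activating terms by fold_change is an assumed convention.
    """
    h_minus = 1.0 / (1.0 + (x / threshold) ** n)
    value = fold_change + (1.0 - fold_change) * h_minus
    return value / fold_change if fold_change > 1.0 else value


def regulation_terms(levels, regulators):
    """One normalized term per (name, threshold, n, fold_change) regulator."""
    return [shifted_hill(levels[name], thr, n, fc)
            for name, thr, n, fc in regulators]


def production_and(g_max, levels, regulators):
    """AND-type logic: multiply the terms, so every regulator must be favorable."""
    return g_max * prod(regulation_terms(levels, regulators))


def production_or(g_max, levels, regulators):
    """Hypothetical OR-like rule: a single favorable regulator suffices."""
    terms = regulation_terms(levels, regulators)
    return g_max * (1.0 - prod(1.0 - t for t in terms))


# Two activators of one target; only A is present (illustrative values).
regulators = [("A", 1.0, 4, 10.0), ("B", 1.0, 4, 10.0)]
levels = {"A": 100.0, "B": 0.0}
rate_and = production_and(1.0, levels, regulators)
rate_or = production_or(1.0, levels, regulators)
print(rate_and, rate_or)
```

With only one activator present, the multiplicative rule keeps production near the basal level set by the absent activator, whereas the OR-like rule gives near-maximal production. Swapping the rule changes which input combinations switch a gene on, which is why the choice of logic could affect both scores.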

Supplementary Material

SI

SI text 1. Details of modeling gene circuits using RACIPE

SI text 2. Generation of all non-redundant four-node gene circuits

Figure S1. The circuit diagrams and indices of all two-node motifs

Figure S2. Schematic of random network generation.

Figure S3. Five four-node gene circuits of different multiplicity scores.

Figure S4. Nonredundant four-node circuits with the least multiplicity and flexibility ranks.

Figure S5. Four four-node gene circuits of different flexibility scores.

Figure S6. Adjusted p-values for the enrichment of all two-node circuit motifs.

Figure S7. Multiplicity and flexibility of large gene regulatory networks (where multiplicity scores were computed using all dimensions).

Acknowledgments

This work is supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM128717, and by startup funds from Northeastern University.

Footnotes

Data Requirements

Relevant R code from this study is available at https://github.com/huanglijiaU201614513/circuitanalysis. It includes the code for RACIPE simulations, state distribution scoring, construction of all four-node circuit motifs, enrichment analysis of non-redundant four-node circuits, and generation of random gene networks. The topologies of all large gene networks are also provided.

References

  • (1) Chuang H-Y; Hofree M; Ideker T A Decade of Systems Biology. Annu Rev Cell Dev Biol 2010, 26, 721–744. 10.1146/annurev-cellbio-100109-104122.
  • (2) Shen-Orr SS; Milo R; Mangan S; Alon U Network Motifs in the Transcriptional Regulation Network of Escherichia Coli. Nat Genet 2002, 31 (1), 64–68. 10.1038/ng881.
  • (3) Karlebach G; Shamir R Modelling and Analysis of Gene Regulatory Networks. Nat Rev Mol Cell Biol 2008, 9 (10), 770–780. 10.1038/nrm2503.
  • (4) Aibar S; González-Blas CB; Moerman T; Huynh-Thu VA; Imrichova H; Hulselmans G; Rambow F; Marine J-C; Geurts P; Aerts J et al. SCENIC: Single-Cell Regulatory Network Inference and Clustering. Nat Methods 2017, 14 (11), 1083–1086. 10.1038/nmeth.4463.
  • (5) Ding H; Douglass EF; Sonabend AM; Mela A; Bose S; Gonzalez C; Canoll PD; Sims PA; Alvarez MJ; Califano A Quantitative Assessment of Protein Activity in Orphan Tissues and Single Cells Using the MetaVIPER Algorithm. Nat Commun 2018, 9 (1), 1471. 10.1038/s41467-018-03843-3.
  • (6) Ding J; Aronow BJ; Kaminski N; Kitzmiller J; Whitsett JA; Bar-Joseph Z Reconstructing Differentiation Networks and Their Regulation from Time Series Single-Cell Expression Data. Genome Res 2018, 28 (3), 383–395. 10.1101/gr.225979.117.
  • (7) Chan TE; Stumpf MPH; Babtie AC Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures. Cell Syst 2017, 5 (3), 251–267.e3. 10.1016/j.cels.2017.08.014.
  • (8) Huynh-Thu VA; Irrthum A; Wehenkel L; Geurts P Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLoS One 2010, 5 (9), e12776. 10.1371/journal.pone.0012776.
  • (9) Kim S Ppcor: An R Package for a Fast Calculation to Semi-Partial Correlation Coefficients. Commun Stat Appl Methods 2015, 22 (6), 665–674. 10.5351/CSAM.2015.22.6.665.
  • (10) Matsumoto H; Kiryu H; Furusawa C; Ko MSH; Ko SBH; Gouda N; Hayashi T; Nikaido I SCODE: An Efficient Regulatory Network Inference Algorithm from Single-Cell RNA-Seq during Differentiation. Bioinformatics 2017, 33 (15), 2314–2321. 10.1093/bioinformatics/btx194.
  • (11) Specht AT; Li J LEAP: Constructing Gene Co-Expression Networks for Single-Cell RNA-Sequencing Data Using Pseudotime Ordering. Bioinformatics 2017, 33 (5), 764–766. 10.1093/bioinformatics/btw729.
  • (12) Qiu X; Rahimzamani A; Wang L; Ren B; Mao Q; Durham T; McFaline-Figueroa JL; Saunders L; Trapnell C; Kannan S Inferring Causal Gene Regulatory Networks from Coupled Single-Cell Expression Dynamics Using Scribe. Cell Syst 2020, 10 (3), 265–274.e11. 10.1016/j.cels.2020.02.003.
  • (13) Pratapa A; Jalihal AP; Law JN; Bharadwaj A; Murali TM Benchmarking Algorithms for Gene Regulatory Network Inference from Single-Cell Transcriptomic Data. Nat Methods 2020, 17 (2), 147–154. 10.1038/s41592-019-0690-6.
  • (14) Kang Y; Thieffry D; Cantini L Evaluating the Reproducibility of Single-Cell Gene Regulatory Network Inference Algorithms. Frontiers in Genetics 2021, 12.
  • (15) Seçilmiş D; Hillerton T; Sonnhammer ELL GRNbenchmark - a Web Server for Benchmarking Directed Gene Regulatory Network Inference Methods. Nucleic Acids Research 2022, 50 (W1), W398–W404. 10.1093/nar/gkac377.
  • (16) Huang B; Lu M; Galbraith M; Levine H; Onuchic JN; Jia D Decoding the Mechanisms Underlying Cell-Fate Decision-Making during Stem Cell Differentiation by Random Circuit Perturbation. Journal of The Royal Society Interface 2020, 17 (169), 20200500. 10.1098/rsif.2020.0500.
  • (17) Tripathi S; Kessler DA; Levine H Biological Networks Regulating Cell Fate Choice Are Minimally Frustrated. Phys. Rev. Lett. 2020, 125 (8), 088101. 10.1103/PhysRevLett.125.088101.
  • (18) Ye Y; Kang X; Bailey J; Li C; Hong T An Enriched Network Motif Family Regulates Multistep Cell Fate Transitions with Restricted Reversibility. PLoS Comput Biol 2019, 15 (3), e1006855. 10.1371/journal.pcbi.1006855.
  • (19) Duddu AS; Sahoo S; Hati S; Jhunjhunwala S; Jolly MK Multi-Stability in Cellular Differentiation Enabled by a Network of Three Mutually Repressing Master Regulators. J R Soc Interface 2020, 17 (170), 20200631. 10.1098/rsif.2020.0631.
  • (20) Laurent M; Kellershohn N Multistability: A Major Means of Differentiation and Evolution in Biological Systems. Trends Biochem Sci 1999, 24 (11), 418–422. 10.1016/s0968-0004(99)01473-5.
  • (21) Guantes R; Poyatos JF Multistable Decision Switches for Flexible Control of Epigenetic Differentiation. PLOS Computational Biology 2008, 4 (11), e1000235. 10.1371/journal.pcbi.1000235.
  • (22) Bhalla US; Iyengar R Emergent Properties of Networks of Biological Signaling Pathways. Science 1999, 283 (5400), 381–387. 10.1126/science.283.5400.381.
  • (23) Li M; Gao H; Wang J; Wu F-X Control Principles for Complex Biological Networks. Brief Bioinform 2019, 20 (6), 2253–2266. 10.1093/bib/bby088.
  • (24) Zañudo JGT; Yang G; Albert R Structure-Based Control of Complex Networks with Nonlinear Dynamics. Proceedings of the National Academy of Sciences 2017, 114 (28), 7234–7239. 10.1073/pnas.1617387114.
  • (25) Liu Y-Y; Barabási A-L Control Principles of Complex Systems. Rev. Mod. Phys. 2016, 88 (3), 035006. 10.1103/RevModPhys.88.035006.
  • (26) Zhang W; Liu HT MAPK Signal Pathways in the Regulation of Cell Proliferation in Mammalian Cells. Cell Res 2002, 12 (1), 9–18. 10.1038/sj.cr.7290105.
  • (27) Adli M The CRISPR Tool Kit for Genome Editing and Beyond. Nat Commun 2018, 9 (1), 1911. 10.1038/s41467-018-04252-2.
  • (28) Bao Y; Hayashida M; Liu P; Ishitsuka M; Nacher JC; Akutsu T Analysis of Critical and Redundant Vertices in Controlling Directed Complex Networks Using Feedback Vertex Sets. J Comput Biol 2018, 25 (10), 1071–1090. 10.1089/cmb.2018.0019.
  • (29) Bhattacharya P; Raman K; Tangirala AK Discovering Adaptation-Capable Biological Network Structures Using Control-Theoretic Approaches. PLOS Computational Biology 2022, 18 (1), e1009769. 10.1371/journal.pcbi.1009769.
  • (30) Clauss B; Lu M A Quantitative Evaluation of Topological Motifs and Their Coupling in Gene Circuit State Distributions. bioRxiv July 20, 2022, p 2022.07.19.500691. 10.1101/2022.07.19.500691.
  • (31) Schaerli Y; Munteanu A; Gili M; Cotterell J; Sharpe J; Isalan M A Unified Design Space of Synthetic Stripe-Forming Networks. Nat Commun 2014, 5 (1), 4905. 10.1038/ncomms5905.
  • (32) Jiménez A; Cotterell J; Munteanu A; Sharpe J A Spectrum of Modularity in Multi-Functional Gene Circuits. Molecular Systems Biology 2017, 13 (4), 925. 10.15252/msb.20167347.
  • (33) Huang B; Jia D; Feng J; Levine H; Onuchic JN; Lu M RACIPE: A Computational Tool for Modeling Gene Regulatory Circuits Using Randomization. BMC Syst Biol 2018, 12 (1), 74. 10.1186/s12918-018-0594-6.
  • (34) Kohar V; Lu M Role of Noise and Parametric Variation in the Dynamics of Gene Regulatory Circuits. NPJ Syst Biol Appl 2018, 4, 40. 10.1038/s41540-018-0076-x.
  • (35) Katebi A; Kohar V; Lu M Random Parametric Perturbations of Gene Regulatory Circuit Uncover State Transitions in Cell Cycle. iScience 2020, 23 (6), 101150. 10.1016/j.isci.2020.101150.
  • (36) Ramirez D; Kohar V; Lu M Toward Modeling Context-Specific EMT Regulatory Networks Using Temporal Single Cell RNA-Seq Data. Front Mol Biosci 2020, 7, 54. 10.3389/fmolb.2020.00054.
  • (37) Su K; Katebi A; Kohar V; Clauss B; Gordin D; Qin ZS; Karuturi RKM; Li S; Lu M NetAct: A Computational Platform to Construct Core Transcription Factor Regulatory Networks Using Gene Activity. bioRxiv May 9, 2022, p 2022.05.06.487898. 10.1101/2022.05.06.487898.
  • (38) Lord WM; Sun J; Bollt EM Geometric K-Nearest Neighbor Estimation of Entropy and Mutual Information. Chaos 2018, 28 (3), 033114. 10.1063/1.5011683.
  • (39) Altman NS An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. The American Statistician 1992, 46 (3), 175–185. 10.2307/2685209.
  • (40) Dodge Y The Concise Encyclopedia of Statistics; Springer Science & Business Media, 2008.
  • (41) Yu S; Feng Y; Zhang D; Bedru HD; Xu B; Xia F Motif Discovery in Networks: A Survey. Computer Science Review 2020, 37, 100267. 10.1016/j.cosrev.2020.100267.
  • (42) Benjamini Y; Hochberg Y Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B 1995, 57 (1), 289–300.
  • (43) Subramanian A; Tamayo P; Mootha VK; Mukherjee S; Ebert BL; Gillette MA; Paulovich A; Pomeroy SL; Golub TR; Lander ES et al. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles. Proceedings of the National Academy of Sciences 2005, 102 (43), 15545–15550. 10.1073/pnas.0506580102.
  • (44) Csardi G; Nepusz T The Igraph Software Package for Complex Network Research. InterJournal 2006, 1695.
  • (45) Alon U Network Motifs: Theory and Experimental Approaches. Nat Rev Genet 2007, 8 (6), 450–461. 10.1038/nrg2102.
  • (46) Huang B; Lu M; Jia D; Ben-Jacob E; Levine H; Onuchic JN Interrogating the Topological Robustness of Gene Regulatory Circuits by Randomization. PLoS Comput Biol 2017, 13 (3), e1005456. 10.1371/journal.pcbi.1005456.
  • (47) Gardner TS; Cantor CR; Collins JJ Construction of a Genetic Toggle Switch in Escherichia Coli. Nature 2000, 403 (6767), 339–342. 10.1038/35002131.
  • (48) Tian X-J; Zhang X-P; Liu F; Wang W Interlinking Positive and Negative Feedback Loops Creates a Tunable Motif in Gene Regulatory Networks. Phys. Rev. E 2009, 80 (1), 011926. 10.1103/PhysRevE.80.011926.
  • (49) Github repository of this study. https://github.com/huanglijiaU201614513/circuitanalysis.
  • (50) Cooke J; Nowak MA; Boerlijst M; Maynard-Smith J Evolutionary Origins and Maintenance of Redundant Gene Expression during Metazoan Development. Trends Genet 1997, 13 (9), 360–364. 10.1016/s0168-9525(97)01233-x.
  • (51) Kafri R; Levy M; Pilpel Y The Regulatory Utilization of Genetic Redundancy through Responsive Backup Circuits. Proceedings of the National Academy of Sciences 2006, 103 (31), 11653–11658. 10.1073/pnas.0604883103.
  • (52) Wang N; Lefaudeux D; Mazumder A; Li JJ; Hoffmann A Identifying the Combinatorial Control of Signal-Dependent Transcription Factors. PLOS Computational Biology 2021, 17 (6), e1009095. 10.1371/journal.pcbi.1009095.
  • (53) Li Z; Yang Q Systems and Synthetic Biology Approaches in Understanding Biological Oscillators. Quant Biol 2018, 6 (1), 1–14. 10.1007/s40484-017-0120-7.
  • (54) Ferrell JE; Tsai TY-C; Yang Q Modeling the Cell Cycle: Why Do Certain Circuits Oscillate? Cell 2011, 144 (6), 874–885. 10.1016/j.cell.2011.03.006.
  • (55) Bell-Pedersen D; Cassone VM; Earnest DJ; Golden SS; Hardin PE; Thomas TL; Zoran MJ Circadian Rhythms from Multiple Oscillators: Lessons from Diverse Organisms. Nat Rev Genet 2005, 6 (7), 544–556. 10.1038/nrg1633.
  • (56) Panovska-Griffiths J; Page KM; Briscoe J A Gene Regulatory Motif That Generates Oscillatory or Multiway Switch Outputs. J R Soc Interface 2013, 10 (79), 20120826. 10.1098/rsif.2012.0826.
