PLOS One. 2012 Sep 21;7(9):e43819. doi: 10.1371/journal.pone.0043819

Gene Regulatory Network Inference from Multifactorial Perturbation Data Using both Regression and Correlation Analyses

Jie Xiong 1,*, Tong Zhou 2
Editor: Alberto de la Fuente 3
PMCID: PMC3448649  PMID: 23028471

Abstract

An important problem in systems biology is to reconstruct gene regulatory networks (GRNs) from experimental data and other a priori information. The DREAM project offers several types of experimental data, such as knockout data, knockdown data and time series data. Among them, multifactorial perturbation data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice. In this article, a new algorithm is presented for the inference of GRNs using the DREAM4 multifactorial perturbation data. The GRN inference problem among Inline graphic genes is decomposed into Inline graphic different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed by using the sum of squared residuals and the Pearson correlation coefficient. These weights are then normalized to reflect differences in the effort required to regulate distinct genes. By appropriately choosing the parameters of the power law, we construct a 0–1 integer programming problem. By solving this problem, the direct regulation genes of an arbitrary gene can be estimated. The normalized weight of a gene is then modified on the basis of the estimation results about the existence of direct regulations to it. These normalized and modified weights are used to rank the possibility of the existence of a corresponding direct regulation. Computation results with the DREAM4 In Silico Size 100 Multifactorial subchallenge show that the estimation performance of the suggested algorithm can even exceed that of the best team. Using the real data provided by the DREAM5 Network Inference Challenge, its estimation performance would rank third. Furthermore, the high precision of the most reliable predictions obtained shows that the suggested algorithm may be helpful in guiding biological experiment designs.

Introduction

Reconstructing the structure of a gene regulatory network (GRN) from experimental data and other a priori information is very helpful in understanding the development, pathology and functioning of all biological organisms. Recently, with the development of high-throughput technologies such as DNA microarrays and mass spectroscopy, it has become possible to reconstruct GRNs from some types of experimental data. In practice, the common data types include knockout data, knockdown data and time series data. Various models and methods have been suggested to attack this problem based on these types of experimental data, such as Boolean networks [1], Bayesian networks [2], information theory based algorithms [3] and ordinary differential equation (ODE) based methods [4].

Recently, the Dialogue for Reverse Engineering Assessments and Methods (DREAM) has been providing not only a set of benchmark networks extracted from actual biological networks of some of the most important and typical biological modules, such as the Escherichia coli transcriptional regulatory network and the Saccharomyces cerevisiae (yeast) transcriptional regulatory network [5], but also several types of In Silico gene expression data sets generated by the GeneNetWeaver tool version 2.0 [6], to motivate the systems biology community to investigate and develop structure identification methods for GRNs. In particular, the DREAM project offers an alternative type of steady-state data, i.e., multifactorial perturbation data, which are obtained by slightly perturbing all genes simultaneously so that the basal activation of all genes of the network is slightly increased or decreased simultaneously by different random amounts [5]. Multifactorial perturbation data might be regarded as expression profiles obtained from different patients [5]. Therefore, such data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice [7]. On the other hand, such data provide less information about GRNs than other types of data, which makes the GRN identification problem more formidable [7].

Several methods have been shown to be effective in inferring the structure of a GRN through participating in the DREAM4 In Silico Size 100 Multifactorial subchallenge. For example, the best performer developed the GENIE3 algorithm for the inference of GRNs, which decomposes the prediction of a regulatory network among Inline graphic genes into Inline graphic different regression problems. In each of the regression problems, the expression pattern of a target gene is predicted from the expression patterns of all the other genes, using tree-based methods [7]. The second place team tackled the problem via a sparse Gaussian Markov Random Field, which relates the network topology to the inverse covariance matrix generated by the gene measurements. The Graphical Lasso algorithm is used to compute the inverse covariance, and the optimal network is then selected by different model selection criteria [8]. On the other hand, a GRN can be modeled as a correlation network [9], which is obtained by computing the correlation coefficient between any two genes. Surprisingly but also interestingly, this simple method tied for second place in the DREAM4 In Silico Size 100 Multifactorial subchallenge. However, due to the symmetry of the correlation coefficient, the estimated correlation network topology is undirected.

Motivated by the GENIE3 algorithm, an identification algorithm is developed in this paper for GRN topology inference, based on regression analysis and correlation analysis. Specifically, the GRN inference problem among Inline graphic genes is decomposed into Inline graphic different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed. The larger the sum of squared residuals is, the weaker the direct regulatory interaction will be. The higher the Pearson correlation coefficient is, the stronger the rationality is for the application of the linear regression. To take both into consideration, the weight corresponding to a possible direct regulation is selected as their product. These weights are then normalized to reflect differences in the effort required to regulate distinct genes.

It has been observed that most large scale gene regulatory networks are sparse. Mathematically, the sparsity of a GRN may be characterized by the power law [4], and the in-degree distribution of a GRN can be obtained by means of the power law. In this paper, the so-called in-degree distribution means the number of genes with in-degree equal to Inline graphic. By appropriately choosing the in-degree distribution of a GRN, this paper suggests a method to utilize the sparsity quantitatively. Through constructing loss functions, incorporating the power law, and solving a 0–1 integer programming problem, the direct regulation genes for an arbitrary gene can be estimated. Then, the above normalized weights can be further adjusted based on these estimated direct regulatory relationships.

In general, these weights are used to queue the possibility of the direct causal regulation. The larger the adjusted weight is, the higher the confidence is for the existence of the direct causal regulation. When a threshold is provided, this queue can lead to an estimate about the structure of a GRN. Computation results with the DREAM4 In Silico Size 100 Multifactorial subchallenge show that estimation performances of the suggested algorithm can even outperform the best team. Using the real data provided by the DREAM5 Network Inference Challenge, estimation performances by the proposed method can be ranked third. Furthermore, the high precision of the obtained most reliable predictions implies that the suggested algorithm may be helpful in guiding biological validation experiment designs.

The outline of this paper is as follows. First, the structure estimation algorithm is described. Afterwards, the proposed estimation method is assessed using the data sets of the DREAM4 In Silico Size 100 Multifactorial subchallenge and the DREAM5 Network Inference Challenge. Variations of the estimation performance with respect to the parameters of the suggested method are also reported. Finally, some concluding remarks are given about the characteristics of the suggested method, as well as about future work that merits further effort.

Materials and Methods

Problem Statement

Considering a GRN with Inline graphic genes, it is assumed that the targeted network is a directed graph, in which each node represents a gene, and an edge directed from one gene Inline graphic to another gene Inline graphic indicates that gene Inline graphic regulates the expression of gene Inline graphic directly. The goal of gene regulatory network inference in this paper is to recover the network solely from multifactorial perturbation data. A set of multifactorial perturbation data can be obtained by first perturbing all genes simultaneously, and then measuring steady-state levels of all genes. Different data sets can be obtained by implementing different perturbations to the network [5]. At the same time, such data do not give information about the regulatory network dynamics, but about the system equilibrium once it has recovered after the perturbation [8].

Denote Inline graphic by Inline graphic sets of multifactorial perturbation data:

graphic file with name pone.0043819.e015.jpg

where, Inline graphic represents the steady-state levels of gene Inline graphic in the Inline graphic-th experiment. Specifically, the problem of recovering regulatory networks is addressed as follows:

Utilizing data set Inline graphic, design a GRN inference algorithm and assign weights Inline graphic. The larger the weight Inline graphic is, the higher the confidence is for the existence of the direct causal regulation from gene Inline graphic to gene Inline graphic.

For most large scale networks, it has been observed that the distribution of the number of chemical elements that have direct regulatory effects on a randomly chosen chemical element approximately obeys a power law [4], [10]–[13]. More specifically, let Inline graphic denote the probability that the number of direct regulations on a randomly chosen chemical element equals Inline graphic, then there exist a positive number Inline graphic and a positive integer Inline graphic, such that [4]

graphic file with name pone.0043819.e028.jpg

in which Inline graphic. This important prior structural information is also incorporated into our estimation procedures.
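To make the role of this prior concrete, the following minimal sketch converts a power law into a number of genes per in-degree. It is only an illustration under stated assumptions: the truncated form p(k) proportional to k^(-gamma) for k = 1, ..., k_max, the names `n_genes`, `gamma` and `k_max`, and the largest-remainder rounding are all choices made here and are not taken from the paper's equation.

```python
def indegree_counts(n_genes, gamma, k_max):
    """Number of genes per in-degree k = 1..k_max, assuming p(k) ~ k**(-gamma).

    Assumption: the power law is truncated at k_max and renormalized.
    """
    raw = [k ** (-gamma) for k in range(1, k_max + 1)]
    total = sum(raw)
    exact = [n_genes * r / total for r in raw]
    counts = [int(e) for e in exact]  # round down first
    # Largest-remainder rounding so that the counts sum to n_genes.
    leftover = n_genes - sum(counts)
    order = sorted(range(k_max), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


counts = indegree_counts(n_genes=100, gamma=2.5, k_max=5)
print(counts)
```

A steeper exponent concentrates more genes at in-degree one, which is one way to express a sparser network.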

Regression Analysis

It is well known that the relevance between any two genes can be represented by the Pearson correlation coefficient [9]. But this method is non-causal. On the other hand, the GENIE3 algorithm decomposes the prediction of a regulatory network among Inline graphic genes into Inline graphic different regression problems. In each of the regression problems, the expression pattern of a target gene is predicted from the expression patterns of all the other genes, using tree-based methods [7]. Motivated by this idea, we decompose the GRN inference problem among Inline graphic genes into Inline graphic different regression problems. The novelty is as follows. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed. That is, for a particular gene Inline graphic and its potential regulation gene Inline graphic, the aim of the regression analysis is to establish a function, i.e., Inline graphic. Obviously, this function reveals the causal relationship between them. Here, Inline graphic and Inline graphic represent the steady state expression concentrations of the genes Inline graphic and Inline graphic, respectively. In practice, Inline graphic is not completely determined by Inline graphic, because there are many factors which may affect Inline graphic. Therefore, Inline graphic is used to represent the unknown secondary factors and/or the random errors, all of which may affect Inline graphic. An important parameter in the regression model is the variance of Inline graphic, i.e., Inline graphic. In essence, Inline graphic is the mean squared error when Inline graphic is approximated by a suitable function Inline graphic [14]. Generally, when Inline graphic is reasonably selected as the most important factor, then the value of Inline graphic will be relatively smaller; otherwise, the value of Inline graphic will be relatively larger. In other words, for the particular gene Inline graphic, the smaller the magnitude of Inline graphic is, the larger the probability is for the existence of the direct causal regulation from gene Inline graphic to gene Inline graphic.

In practice, although Inline graphic is unavailable, it can be estimated from the sum of squared residuals by using linear regression (least-squares estimation). Therefore, we can construct the weight Inline graphic based on the above discussion. A practical network prediction is obtained by setting a threshold on the ranking of weights from the most to the least significant. In this paper, we focus on the task of constructing weights, while the question of the choice of an optimal confidence threshold, although important, will be left open.

Weight Construction

Denote by Inline graphic the steady-state level of gene Inline graphic. The steady-state level of gene Inline graphic may be directly affected by the expression levels of all other genes. Therefore we have the following expression:

graphic file with name pone.0043819.e063.jpg (1)

The function Inline graphic in Equation (1) not only contains lots of arguments, but also may be a non-linear function. Thus, it is hard to directly estimate the function Inline graphic. On the other hand, from the definition of the weight Inline graphic, we know that Inline graphic represents the probability of the direct causal regulation only from gene Inline graphic to gene Inline graphic. That is, when the weight Inline graphic is computed, the function Inline graphic in Equation (1) can be approximated by the following expression:

graphic file with name pone.0043819.e072.jpg (2)

The form of the function Inline graphic, however, is not clear and might be non-linear. Hence, the linear regression technique is used to analyze the direct causal regulation from gene Inline graphic to gene Inline graphic. And, the function Inline graphic is approximated by its first order Taylor expansion, i.e.,

graphic file with name pone.0043819.e077.jpg (3)

where, Inline graphic represents the approximation error or/and the measurement error.

Consequently, from Equation (2) and Equation (3), we have

graphic file with name pone.0043819.e079.jpg (4)

The regression coefficients Inline graphic and Inline graphic can be estimated by the least squares estimation. Let Inline graphic, then Inline graphic, Here, Inline graphic Moreover, the sum of squared residuals Inline graphic is also obtained in this process, i.e.,

graphic file with name pone.0043819.e086.jpg (5)

where, Inline graphic. The value of Inline graphic might be regarded as the capability of the direct regulatory interaction from gene Inline graphic to gene Inline graphic. In other words, we assume that the larger the sum of squared residuals Inline graphic is, the weaker the direct regulatory interaction from gene Inline graphic to gene Inline graphic will be. For this reason, the constructed weight should utilize this characteristic provided by Inline graphic.
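The pairwise regression step can be sketched as follows. This is a minimal, dependency-free illustration of ordinary least squares for a single predictor; the function name `fit_line` and the toy data are hypothetical, with `x` standing for the steady-state levels of a candidate regulator and `y` for those of the target gene across experiments.

```python
def fit_line(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b, ssr).

    ssr is the sum of squared residuals of the fitted line.
    """
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, ssr


# Toy data: the second series follows the first one closely.
regulator = [0.10, 0.25, 0.40, 0.55, 0.70]
target = [0.21, 0.52, 0.79, 1.12, 1.38]
print(fit_line(regulator, target))
```

A small residual sum for a candidate regulator is what the text interprets as evidence for a stronger direct interaction.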

On the other hand, for any two data sets Inline graphic and Inline graphic, no matter whether a linear correlation exists between them, the sum of squared residuals Inline graphic can always be obtained by Equation (5). However, if no linear correlation exists between them, the application of the linear regression is meaningless. To test whether the data sets Inline graphic and Inline graphic are linearly correlated, the correlation coefficient is the most frequently used test statistic. The expression for the correlation coefficient Inline graphic is as follows:

graphic file with name pone.0043819.e101.jpg (6)

According to the discussion above, it is clear that the larger the sum of squared residuals Inline graphic is, the weaker the direct regulatory interaction from gene Inline graphic to gene Inline graphic will be. And, the larger the Pearson correlation coefficient Inline graphic is, the stronger the rationality is for the application of the linear regression on data sets Inline graphic and Inline graphic. To take both of them into consideration, a weight Inline graphic corresponding to a possible direct regulation from gene Inline graphic to gene Inline graphic is constructed as follows:

graphic file with name pone.0043819.e111.jpg (7)

For the particular gene Inline graphic, the larger the magnitude of Inline graphic is, the larger the confidence is that gene Inline graphic is directly regulated by gene Inline graphic.
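The construction above can be sketched end to end for a small data matrix. The exact form of Equation (7) is not reproduced in this text, so the combination used here, the absolute Pearson correlation divided by the sum of squared residuals, is an assumption chosen only because it grows with the correlation and shrinks with the residuals. The names `weight_matrix` and `eps` and the layout `data[l][g]` (experiment `l`, gene `g`) are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ssr_simple(x, y):
    """Sum of squared residuals of the least-squares line y ~ a + b*x."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def weight_matrix(data, eps=1e-12):
    """w[i][j]: assumed weight for a regulation from gene i to gene j.

    Assumption: w = |r| / SSR; eps avoids division by zero for perfect fits.
    """
    n = len(data[0])
    cols = [[row[g] for row in data] for g in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays zero (no self-regulation)
                r = pearson(cols[i], cols[j])
                w[i][j] = abs(r) / (ssr_simple(cols[i], cols[j]) + eps)
    return w
```

Note that although the correlation coefficient is symmetric, the residual sum depends on which gene is the target, so the resulting matrix is in general not symmetric.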

Weight Normalization

It is important to note that in GRN topology inference the larger the value of Inline graphic is, the larger the probability is for the existence of a direct regulation from gene Inline graphic to gene Inline graphic. Define a Inline graphic dimensional matrix Inline graphic with its Inline graphic-th row Inline graphic-th column element being the estimate of Inline graphic when Inline graphic and its diagonal element being zero, and denote its Inline graphic-th column vector by Inline graphic. It is then clear that this matrix contains information about the probability of the existence of a direct regulation between any two different genes in a GRN. However, to infer the structure of a GRN from this matrix, an important fact must be taken into account. That is, in a GRN, some genes may be easily regulated by other genes, while regulating some other genes may require more effort [15]–[17]. This implies that direct regulations to different genes may lead to weights of different orders of magnitude. Therefore, in order to obtain a good estimate from the matrix Inline graphic about the topology of a GRN, an appropriate normalization is still required for the estimated Inline graphics among different genes.

In [17], it is suggested to use the Inline graphic-norm of the vector Inline graphic and the geometric average of its non-zero elements to achieve the normalization. More specifically, when Inline graphic is adopted as 3.5, the structure inference performance is improved the most. Therefore, in this paper, it is suggested to also use the Inline graphic-norm of the vector Inline graphic to achieve this normalization, that is, Inline graphic is replaced by

graphic file with name pone.0043819.e135.jpg (8)

It is worthwhile to note that this normalization does not change the diagonal elements. For presentation conciseness, the normalized matrix Inline graphic using the vector Inline graphic-norm is denoted by Inline graphic in the rest of this paper.

The normalization was first proposed in [17], in which the weight is represented by the RELV (relative expression level variation). The goal of the normalization is to guarantee that the weights for different genes have the same order of magnitude. For a GRN, in the final ranking list of Inline graphic, the larger the magnitude is, the larger the probability is that the corresponding transcription regulation exists.
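A minimal sketch of such a normalization is given below. It assumes that each column of the weight matrix is divided by its own vector p-norm, with p = 3.5 as the default mentioned in the text; whether Equation (8) has exactly this form cannot be confirmed from the text here, and the function name `normalize_columns` is hypothetical.

```python
def normalize_columns(w, p=3.5):
    """Divide every column of the square matrix w by its vector p-norm.

    Assumption: column j collects the weights of all candidate regulators
    of gene j. Zero columns and the zero diagonal are left unchanged.
    """
    n = len(w)
    out = [row[:] for row in w]
    for j in range(n):
        norm = sum(abs(w[i][j]) ** p for i in range(n)) ** (1.0 / p)
        if norm > 0:
            for i in range(n):
                out[i][j] = w[i][j] / norm
    return out


w = [[0.0, 10.0, 0.1],
     [2.0, 0.0, 0.3],
     [4.0, 20.0, 0.0]]
for row in normalize_columns(w):
    print(row)
```

After this step the columns have comparable scale, so a gene whose raw weights are uniformly small is not automatically pushed to the bottom of the global ranking.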

In-degree Estimation and Weight Magnitude Modification

To compute the weight Inline graphic, the multivariate function Inline graphic is approximated by a univariate function Inline graphic, which implies that the in-degree for an arbitrary gene Inline graphic is assumed to be one. Thus, the constructed weights do not employ the information about the combinatorial regulation to a gene. In this subsection, we try to estimate the in-degrees of genes in a GRN to utilize the information about the combinatorial regulation.

It is clear that the value of Inline graphic represents the capability of the direct regulatory interaction from gene Inline graphic to gene Inline graphic, that is, the smaller the sum of squared residuals Inline graphic is, the stronger the direct regulatory interaction from gene Inline graphic to gene Inline graphic will be. Sort the sum of squared residuals of gene Inline graphic in a non-decreasing order, and denote the sorted results as follows:

graphic file with name pone.0043819.e151.jpg

In this ranking Inline graphic, so it is assumed that the top Inline graphic genes from gene Inline graphic to gene Inline graphic have great chance to combinatorially regulate gene Inline graphic. Therefore, the multivariate function Inline graphic can be approximated by a Inline graphic-variable function in such case, i.e.:

graphic file with name pone.0043819.e159.jpg (9)

The form of the function Inline graphic, however, is also not clear and might be non-linear. Hence, the linear regression technique is used again. Applying the first order multiple Taylor expansion to the function Inline graphic, we have

graphic file with name pone.0043819.e162.jpg

where, Inline graphic represents the approximation error or/and the measurement error.

Using the least squares again, not only the regression coefficients Inline graphic, but also the sum of squared residuals Inline graphic and the sum of deviation squares Inline graphic can be estimated. Let

graphic file with name pone.0043819.e167.jpg

then,

graphic file with name pone.0043819.e168.jpg

and,

graphic file with name pone.0043819.e169.jpg (10)

Define a loss function Inline graphic as follows:

graphic file with name pone.0043819.e171.jpg (11)

Here, the value of the sum of squared residuals Inline graphic represents the capability of a direct combinatorial regulation from genes numbered Inline graphic to gene Inline graphic. Obviously, it can be thought that the smaller the sum of squared residuals Inline graphic is, the stronger the direct combinatorial regulation interaction will be. And, Inline graphic is the test statistic. The larger the test statistic is, the stronger the rationality is for the application of the multiple linear regression analysis. Therefore, to take both into consideration, the loss function Inline graphic is defined as Equation (11). And, it can be presumed that the smaller the value of Inline graphic is, the higher the probability is for the establishment of a direct combinatorial regulation from genes numbered Inline graphic to gene Inline graphic.
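The search over candidate in-degrees for a single gene can be sketched as below. Several points are assumptions made for illustration: the test statistic is taken to be the usual F statistic of a multiple linear regression, the loss is taken to be the residual sum divided by that statistic (Equation (11) itself is not reproduced in this text), and the helper names `solve`, `multi_fit` and `indegree_losses` are hypothetical. The candidate regulators are assumed to be already sorted by increasing pairwise sum of squared residuals.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def multi_fit(xs, y):
    """Regress y on the columns in xs plus an intercept (normal equations).

    Returns (ssr, reg): residual and regression sums of squares.
    """
    m = len(y)
    design = [[1.0] + [col[l] for col in xs] for l in range(m)]
    p = len(design[0])
    xtx = [[sum(d[r] * d[c] for d in design) for c in range(p)] for r in range(p)]
    xty = [sum(design[l][r] * y[l] for l in range(m)) for r in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bk * dk for bk, dk in zip(beta, row)) for row in design]
    mean_y = sum(y) / m
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    reg = sum((fi - mean_y) ** 2 for fi in fitted)
    return ssr, reg


def indegree_losses(candidates, y, k_max, eps=1e-12):
    """Assumed loss SSR / F for in-degrees k = 1..k_max (top-k candidates)."""
    m = len(y)
    losses = []
    for k in range(1, k_max + 1):
        ssr, reg = multi_fit(candidates[:k], y)
        f_stat = (reg / k) / max(ssr / (m - k - 1), eps)
        losses.append(ssr / max(f_stat, eps))
    return losses
```

The index of the smallest entry in the returned list, plus one, would then be the unconstrained in-degree estimate for that gene.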

To estimate the in-degree for a specific gene Inline graphic optimally, one can search Inline graphic from 1 to Inline graphic to find the minimum of the loss function Inline graphic at Inline graphic. In such case, the optimal in-degree for the specific gene Inline graphic is Inline graphic and genes numbered Inline graphic are most likely to have a direct regulation effect on gene Inline graphic. However, to estimate the in-degree for every gene in a GRN optimally, the structural characteristics of GRNs should be taken into consideration, that is, the power law could be taken into consideration. Let Inline graphic(Inline graphic) and Inline graphic denote respectively the maximum in-degree of a GRN and the number of genes with in-degree equal to Inline graphic. Then, from the power law, it is clear that Inline graphic. Since each gene has a unique in-degree, we can utilize the following 0–1 integer optimization to estimate the in-degree for every gene optimally.

graphic file with name pone.0043819.e195.jpg (12)

Problem (12) can be solved by using a linear programming-based branch-and-bound algorithm [18], [19], and its optimal estimates can be denoted by Inline graphic. For gene Inline graphic, if Inline graphic, with Inline graphic, then, from the above problem description, it is clear that the optimal estimate for the in-degree of this gene is Inline graphic, and genes numbered Inline graphic are most likely to have a direct regulation effect on this gene.
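For intuition, a toy version of this assignment problem can be solved with a small depth-first branch-and-bound search. This is not the linear programming-based solver cited above; it is a simplified stand-in that is only practical for small instances. It assumes the constraints are that every gene receives exactly one in-degree and that exactly `counts[k-1]` genes receive in-degree `k`; the names `assign_indegrees`, `loss` and `counts` are hypothetical.

```python
def assign_indegrees(loss, counts):
    """Minimize total loss when assigning one in-degree to each gene.

    loss[j][k-1]: loss of giving gene j the in-degree k.
    counts[k-1]:  number of genes that must receive in-degree k.
    Returns (best_total, degrees) with degrees[j] in 1..len(counts).
    """
    n = len(loss)
    assert sum(counts) == n, "counts must add up to the number of genes"
    best = [float("inf"), None]
    remaining = list(counts)
    chosen = [0] * n
    # Optimistic bound: cheapest loss of each remaining gene, ignoring counts.
    tail_min = [0.0] * (n + 1)
    for j in range(n - 1, -1, -1):
        tail_min[j] = tail_min[j + 1] + min(loss[j])

    def dfs(j, total):
        if total + tail_min[j] >= best[0]:
            return  # prune: this branch cannot beat the incumbent
        if j == n:
            best[0], best[1] = total, chosen[:]
            return
        # Try the cheapest in-degrees first to find good incumbents early.
        for k in sorted(range(len(counts)), key=lambda k: loss[j][k]):
            if remaining[k] > 0:
                remaining[k] -= 1
                chosen[j] = k + 1
                dfs(j + 1, total + loss[j][k])
                remaining[k] += 1

    dfs(0, 0.0)
    return best[0], best[1]


# Two genes, one slot per in-degree: the counts force a compromise.
print(assign_indegrees([[1.0, 2.0], [1.0, 3.0]], [1, 1]))
```

In the second example both genes would prefer in-degree one, but only one slot is available, so the gene that loses least by moving to in-degree two is the one that moves.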

In GRN topology estimation, another important thing worthy of considering is that genes estimated to have a direct regulation should correspond to a weight with a magnitude greater than those estimated not to have a direct regulation [20], [21]. To achieve this purpose, the following adjustment is suggested in this paper. Define Inline graphic as

graphic file with name pone.0043819.e203.jpg (13)

With this value, the normalized weights for an arbitrary gene Inline graphic are modified as follows,

graphic file with name pone.0043819.e205.jpg (14)

Here, for each gene Inline graphic, Inline graphic is determined by the solution of Problem (12).

Denote by Inline graphic the Inline graphic dimensional matrix with its Inline graphic-th row Inline graphic-th column element being Inline graphic. Elements of this matrix are directly used to infer the structure of a GRN. The bigger the Inline graphic-th row Inline graphic-th element is, the higher the probability is that gene Inline graphic is directly regulated by gene Inline graphic.

It should be stressed here that the effectiveness of the in-degree estimation depends on the veracity of the prior structural information. In this paper, the sparsity of a GRN is characterized by the power law. Therefore, the number of genes whose in-degrees are equal to Inline graphic can be represented as Inline graphic. Here, Inline graphic is the so-called power law. That is, the solution of Problem (12) depends on the parameters of the power law. If the in-degree distribution of a GRN is pertinent and appropriate, the effect of this step may be positive. Otherwise the performance may deteriorate. The results in Tables 1, 2 and 3 in the following section may support this argument.

Table 1. Performances Only for Weight Normalization.

Net1 Net2 Net3 Net4 Net5 Inline graphic
Best Team AUROC 0.745 0.733 0.775 0.791 0.798 37.428
Inline graphic (3.334e-18) (1.076e-28) (9.705e-34) (6.736e-33) (1.912e-34)
AUPR 0.154 0.155 0.231 0.208 0.197
Inline graphic (3.309e-34) (7.897e-54) (1.791e-54) (5.489e-47) (4.563e-44)
Inline graphic AUROC 0.6899 0.6485 0.7081 0.6998 0.6655 16.4038
Inline graphic (1.4643e-12) (1.6105e-13) (1.2922e-20) (5.6590e-18) (6.6749e-13)
AUPR 0.0711 0.0893 0.1230 0.0938 0.0532
Inline graphic (5.3481e-14) (1.9060e-24) (3.2720e-27) (6.6316e-19) (3.5962e-09)
Inline graphic AUROC 0.7524 0.7097 0.7694 0.7590 0.7651 35.7303
Inline graphic (5.3124e-19) (5.3557e-24) (1.4810e-32) (2.9514e-27) (3.7157e-28)
AUPR 0.1464 0.1673 0.2212 0.2046 0.1944
Inline graphic (2.2398e-32) (1.7894e-59) (4.0881e-52) (3.0186e-46) (2.1768e-43)
Inline graphic AUROC 0.7614 0.7149 0.7690 0.7697 0.7691 38.1179
Inline graphic (5.1100e-20) (5.0426e-25) (1.8088e-32) (4.0920e-29) (6.5978e-29)
AUPR 0.1626 0.1697 0.2283 0.2229 0.2271
Inline graphic (2.4910e-36) (1.4524e-60) (6.1040e-54) (9.2038e-51) (2.5905e-51)
Inline graphic AUROC 0.7641 0.7172 0.7660 0.7762 0.7693 38.4670
Inline graphic (2.5068e-20) (1.7462e-25) (8.0330e-32) (2.7481e-30) (6.3238e-29)
AUPR 0.1673 0.1612 0.2273 0.2271 0.2448
Inline graphic (1.7753e-37) (1.5096e-56) (1.1603e-53) (8.4634e-52) (1.3305e-55)

Table 2. Performances with the optimal Inline graphic and Inline graphic.

Net1 Net2 Net3 Net4 Net5 Inline graphic
Best Team AUROC 0.745 0.733 0.775 0.791 0.798 37.428
Inline graphic (3.334e-18) (1.076e-28) (9.705e-34) (6.736e-33) (1.912e-34)
AUPR 0.154 0.155 0.231 0.208 0.197
Inline graphic (3.309e-34) (7.897e-54) (1.791e-54) (5.489e-47) (4.563e-44)
Inline graphic AUROC 0.7642 0.7173 0.7865 0.7764 0.7693 39.9465
Inline graphic (2.4413e-20) (1.6671e-25) (2.2385e-36) (2.5303e-30) (6.0610e-29)
AUPR 0.1799 0.1648 0.2341 0.2326 0.2540
Inline graphic (1.4115e-40) (2.7231e-58) (2.1883e-55) (3.7182e-53) (7.8435e-58)
Inline graphic 1 1 3 2 1
Inline graphic 3.3 3.3 1 3.7 5.0

Table 3. Performances with typical Inline graphic and Inline graphic.

Net1 Net2 Net3 Net4 Net5 Inline graphic
Best Team AUROC 0.745 0.733 0.775 0.791 0.798 37.428
Inline graphic (3.334e-18) (1.076e-28) (9.705e-34) (6.736e-33) (1.912e-34)
AUPR 0.154 0.155 0.231 0.208 0.197
Inline graphic (3.309e-34) (7.897e-54) (1.791e-54) (5.489e-47) (4.563e-44)
Inline graphic, Inline graphic AUROC 0.7634 0.7165 0.7691 0.7752 0.7687 37.4489
Inline graphic (3.0972e-20) (2.4137e-25) (1.8088e-32) (4.1501e-30) (7.8172e-29)
AUPR 0.1710 0.1564 0.2290 0.2047 0.2287
Inline graphic (2.2195e-38) (2.4170e-54) (4.2999e-54) (3.0186e-46) (1.0608e-51)
Inline graphic,Inline graphic AUROC 0.7641 0.7171 0.7674 0.7759 0.7694 38.1914
Inline graphic (2.5068e-20) (1.9156e-25) (4.0136e-32) (3.1102e-30) (5.8091e-29)
AUPR 0.1720 0.1598 0.2263 0.2189 0.2394
Inline graphic (1.2653e-38) (6.1630e-56) (1.9625e-53) (8.9338e-50) (2.5610e-54)
Inline graphic,Inline graphic AUROC 0.7641 0.7173 0.7670 0.7762 0.7692 38.8123
Inline graphic (2.5068e-20) (1.6671e-25) (4.8953e-32) (2.8639e-30) (6.3238e-29)
AUPR 0.1722 0.1646 0.2378 0.2173 0.2455
Inline graphic (1.1308e-38) (3.7733e-58) (2.5223e-56) (2.2175e-49) (8.5142e-56)
Inline graphic, Inline graphic AUROC 0.7642 0.7173 0.7660 0.7765 0.7693 39.0655
Inline graphic (2.5068e-20) (1.7462e-25) (8.0330e-32) (2.5303e-30) (6.0610e-29)
AUPR 0.1716 0.1632 0.2396 0.2212 0.2540
Inline graphic (1.5842e-38) (1.5489e-57) (8.8169e-57) (2.4181e-50) (7.8435e-58)
Inline graphic,Inline graphic AUROC 0.7609 0.7152 0.7714 0.7756 0.7685 37.9334
Inline graphic (5.8275e-20) (4.3935e-25) (5.7087e-33) (3.6677e-30) (8.8762e-29)
AUPR 0.1639 0.1536 0.2349 0.2216 0.2380
Inline graphic (1.1998e-36) (4.9060e-53) (1.3716e-55) (2.0391e-50) (5.9144e-54)
Inline graphic,Inline graphic AUROC 0.7630 0.7166 0.7688 0.7758 0.7687 38.2446
Inline graphic (3.3525e-20) (2.3048e-25) (1.9987e-32) (3.3776e-30) (7.8172e-29)
AUPR 0.1634 0.1574 0.2334 0.2271 0.2397
Inline graphic (1.5022e-36) (9.1641e-55) (3.2932e-55) (8.4634e-52) (2.2905e-54)
Inline graphic,Inline graphic AUROC 0.7636 0.7170 0.7687 0.7762 0.7690 38.4010
Inline graphic (2.9378e-20) (2.0063e-25) (2.2085e-32) (2.7481e-30) (6.8837e-29)
AUPR 0.1652 0.1599 0.2312 0.2270 0.2410
Inline graphic (5.7785e-37) (6.1630e-56) (1.1225e-54) (8.9583e-52) (1.1089e-54)
Inline graphic, Inline graphic AUROC 0.7637 0.7171 0.7687 0.7762 0.7692 38.3533
Inline graphic (2.7865e-20) (1.8289e-25) (2.2085e-32) (2.7481e-30) (6.5978e-29)
AUPR 0.1668 0.1612 0.2284 0.2216 0.2428
Inline graphic (2.2228e-37) (1.3546e-56) (5.7578e-54) (2.0391e-50) (4.0615e-55)
Inline graphic, Inline graphic AUROC 0.7586 0.7167 0.7730 0.7745 0.7685 38.0248
Inline graphic (1.0644e-19) (2.2007e-25) (2.5460e-33) (5.5346e-30) (8.8762e-29)
AUPR 0.1626 0.1567 0.2327 0.2231 0.2370
Inline graphic (2.4910e-36) (1.9486e-54) (4.6750e-55) (8.2152e-51) (1.0333e-53)
Inline graphic, Inline graphic AUROC 0.7625 0.7170 0.7710 0.7750 0.7690 38.3306
Inline graphic (3.8251e-20) (1.9156e-25) (6.6388e-33) (4.5061e-30) (6.8837e-29)
AUPR 0.1682 0.1577 0.2340 0.2231 0.2396
Inline graphic (1.0706e-37) (6.6312e-55) (2.3199e-55) (8.2152e-51) (2.4220e-54)
Inline graphic, Inline graphic AUROC 0.7631 0.7175 0.7702 0.7759 0.7689 38.7893
Inline graphic (3.2651e-20) (1.5916e-25) (9.9210e-33) (3.2412e-30) (7.1819e-29)
AUPR 0.1682 0.1624 0.2369 0.2274 0.2414
Inline graphic (1.0706e-37) (3.6898e-57) (4.2661e-56) (7.1370e-52) (8.8708e-55)
Inline graphic, Inline graphic AUROC 0.7634 0.7165 0.7693 0.7760 0.7686 38.4687
Inline graphic (3.0972e-20) (2.4137e-25) (1.6368e-32) (3.1102e-30) (8.5083e-29)
AUPR 0.1696 0.1586 0.2351 0.2270 0.2381
Inline graphic (4.8746e-38) (2.2538e-55) (1.2204e-55) (8.9583e-52) (5.2898e-54)

Estimation Algorithm

In summary, on the basis of the regression analysis and the correlation analysis, the algorithm suggested in this paper for identifying direct regulations of a GRN consists of the following steps.

  1. Compute the weight matrix Inline graphic according to Equations (5), (6) and (7).

  2. Normalize the weight matrix Inline graphic according to Equation (8).

  3. Choose appropriate values for Inline graphic, Inline graphic and Inline graphic, solve Problem (12), and modify the matrices Inline graphic according to Equations (13) and (14). (This step is optional.)

Using the elements of the matrices Inline graphic (or Inline graphic), rank the possible direct regulations from the gene indexed by the row to the gene indexed by the column. The larger the element is, the higher the confidence is in the existence of the corresponding direct causal regulation.
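The ranking step described above can be sketched as follows. This is a minimal illustration rather than the authors' Matlab implementation: the function name `rank_edges`, the default gene labels and the toy weight matrix are hypothetical, and skipping the diagonal is an assumption based on the absence of self-interactions in the benchmark networks.

```python
def rank_edges(weights, gene_names=None):
    """Return (regulator, target, weight) triples sorted by decreasing weight.

    weights[i][j] is read as the confidence that gene i directly regulates
    gene j (row = regulator, column = target). Diagonal entries are skipped
    because self-interactions are assumed to be excluded.
    """
    n = len(weights)
    if gene_names is None:
        # Hypothetical default labels G1, G2, ...
        gene_names = [f"G{k + 1}" for k in range(n)]
    edges = [
        (gene_names[i], gene_names[j], weights[i][j])
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    # Larger weight means higher confidence, so sort in descending order.
    edges.sort(key=lambda edge: edge[2], reverse=True)
    return edges


# Toy 3-gene weight matrix (illustrative values only).
toy_weights = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.7],
    [0.4, 0.3, 0.0],
]
ranked = rank_edges(toy_weights)
for regulator, target, weight in ranked:
    print(f"{regulator} -> {target}\t{weight:.2f}")
```

The resulting list has the same shape as a ranked edge-list submission: one candidate regulation per line, most confident first.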

Results and Discussion

Data Sets and Assessment Metrics

To illustrate the effectiveness of the developed inference algorithm, tests are first performed on the DREAM4 In Silico Size100 Multifactorial subchallenge, which is designed to assess the performance of methods for identifying the structure of a large-scale GRN [22]. The subchallenge contains five benchmark networks of 100 genes each, obtained by extracting important and typical modules from actual biological networks of E. coli and S. cerevisiae. Auto-regulatory interactions are removed; that is, there are no self-interactions in the in silico networks. For each network, 100 sets of multifactorial perturbation data are supplied.

The DREAM project organizers compare predictions with the actual network structures using the following two metrics of topology prediction accuracy.

  • AUPR: The area under the precision-recall curve;

  • AUROC: The area under the receiver operating characteristic curve.
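As an illustration of the two metrics, the sketch below computes both areas from a ranked list of predictions and a gold-standard labelling. It assumes distinct confidence scores, uses the trapezoidal rule for the ROC curve and a step-wise sum for the PR curve, and is not the organizers' evaluation script, which may interpolate the curves differently. The function name `roc_pr_areas` and the toy data are hypothetical.

```python
def roc_pr_areas(scores, labels):
    """Return (AUROC, AUPR) for predictions ranked by decreasing score.

    scores: confidence of each candidate edge (assumed distinct).
    labels: 1 if the edge is in the gold standard, 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative edge")

    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    tp = fp = 0
    auroc = aupr = 0.0
    prev_tpr = prev_fpr = 0.0
    for k in order:
        if labels[k]:
            tp += 1
        else:
            fp += 1
        tpr = tp / n_pos          # recall, i.e. true positive rate
        fpr = fp / n_neg          # false positive rate
        precision = tp / (tp + fp)
        # Trapezoid under the ROC curve between consecutive points.
        auroc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        # Step-wise area under the PR curve (recall increment times precision).
        aupr += (tpr - prev_tpr) * precision
        prev_tpr, prev_fpr = tpr, fpr
    return auroc, aupr


# Toy example: four candidate edges, two of which are true.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [0, 1, 1, 0]
auroc, aupr = roc_pr_areas(scores, labels)
print(auroc, aupr)
```

A PR curve that starts from the upper left corner corresponds to the top-ranked predictions being true edges, which is why the AUPR is sensitive to the quality of the most confident predictions.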

Moreover, for every network, the Inline graphic-values of the AUPR and AUROC measures are computed; they indicate the probability that random predictions would perform equally well or better, and are denoted by Inline graphic and Inline graphic, Inline graphic, respectively. Based on these Inline graphic-values, a final score is calculated as Inline graphic. A larger score indicates a better performance of the adopted inference algorithm. Here, Inline graphic and Inline graphic are defined as follows.

graphic file with name pone.0043819.e314.jpg

Similarly, a score can be defined for each network as Inline graphic, Inline graphic. From the above definitions, it follows that

graphic file with name pone.0043819.e317.jpg

More detailed explanations can be found in [22] or on the web site of the DREAM project at http://wiki.c2b2.columbia.edu/dream/. Moreover, to evaluate performance on real data, tests are also performed on the DREAM5 Network Inference Challenge. Finally, the computation time needed by the suggested method is discussed.
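The scoring rule can be illustrated with a short calculation. The defining equations are not legible in this copy of the text, so the sketch assumes the usual DREAM4 convention: the overall p-value of each metric is the geometric mean of the per-network p-values, and the score is -0.5 times the base-10 logarithm of the product of the two overall p-values. The function names are hypothetical; the p-values are those listed in Table 5.

```python
import math


def dream_score(p_auroc, p_aupr):
    """Overall score from per-network p-values (assumed DREAM4 convention).

    The geometric mean of the p-values is taken in log space, which avoids
    numerical underflow when very small p-values are multiplied.
    """
    n = len(p_auroc)
    log_p_auroc = sum(math.log10(p) for p in p_auroc) / n
    log_p_aupr = sum(math.log10(p) for p in p_aupr) / n
    return -0.5 * (log_p_auroc + log_p_aupr)


def network_scores(p_auroc, p_aupr):
    """Per-network scores; their arithmetic mean equals the overall score."""
    return [-0.5 * (math.log10(a) + math.log10(b))
            for a, b in zip(p_auroc, p_aupr)]


# p-values of the five DREAM4 networks as listed in Table 5.
table5_p_auroc = [7.6122e-19, 1.2736e-30, 2.1254e-39, 3.6995e-32, 2.7275e-36]
table5_p_aupr = [4.1120e-39, 3.7733e-58, 4.7196e-60, 9.2795e-57, 9.7252e-65]

score = dream_score(table5_p_auroc, table5_p_aupr)
per_network = network_scores(table5_p_auroc, table5_p_aupr)
print(round(score, 4))
print([round(s, 4) for s in per_network])
```

Under this assumption the overall score computed from the Table 5 p-values agrees with the value of 42.8862 reported there to within rounding, and the mean of the per-network scores equals the overall score.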

Prediction Performances of Inline graphic

To evaluate the prediction accuracy of Inline graphic, Inline graphic is normalized by using some typical vector norms, such as the 1-norm and the Euclidean norm. Moreover, it is reported that when Inline graphic is set to 3.5, the structure inference performance is improved significantly [17]. Thus, each column of Inline graphic is also normalized by using the 3.5-norm. The corresponding results are given in Table 1. The performance of Inline graphic is also included in Table 1.

To compare prediction performances with the best team, the corresponding specifications are also included in Table 1, obtained directly from the web site of the DREAM project. Their numbers of digits differ from those of the other results, which are obtained through actual computations. In addition, the corresponding Inline graphic-value for each specification is given in parentheses. In the last column of Table 1, the obtained scores are also given for each method. From Table 1, it is clear that the normalization step improves the structure inference performance remarkably. Specifically, when Inline graphic is chosen as 2 and 3.5, the final scores even exceed the best team's final score.

The final score is an important specification in inferring the structure of GRNs, but it does not reveal the precision of the most confident predictions. In topology estimation, highly confident predictions can provide good guidance for the design of biological experiments [22]. However, these predictions are helpful only if their precision is sufficiently high. A desirable estimation algorithm should therefore have a PR (precision-recall) curve that starts from the upper left corner and decreases monotonically and slowly as the recall rate increases. The ROC and PR curves of each network for Inline graphic, Inline graphic, Inline graphic and Inline graphic are shown in Figure 1.
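The column-wise normalization can be sketched as below. Equation (8) is not reproduced in this section, so the sketch assumes the plain reading of the text: every column of the weight matrix is divided by its q-norm, with q = 1, 2 or 3.5. The function name `normalize_columns` and the toy matrix are hypothetical.

```python
def normalize_columns(weights, q):
    """Divide every column of a weight matrix by its q-norm.

    weights[i][j] is the weight of regulator i for target j, so a column
    collects all candidate regulators of one target gene. Columns whose
    norm is zero are left as zeros.
    """
    n_rows = len(weights)
    n_cols = len(weights[0])
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        norm = sum(abs(weights[i][j]) ** q for i in range(n_rows)) ** (1.0 / q)
        if norm == 0.0:
            continue
        for i in range(n_rows):
            normalized[i][j] = weights[i][j] / norm
    return normalized


# Toy weights: the second target has uniformly larger raw weights.
toy = [
    [0.0, 8.0, 0.0],
    [1.0, 0.0, 0.0],
    [3.0, 6.0, 0.0],
]
for q in (1.0, 2.0, 3.5):
    print(q, normalize_columns(toy, q))
```

After normalization, weights belonging to different target genes are on a comparable scale, so a target with uniformly large raw weights no longer dominates the global ranking.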

Figure 1. ROC and PR curves of .

Inline graphic , Inline graphic , Inline graphic and Inline graphic .

Several conclusions can be drawn from Figure 1. The AUPR and AUROC measures of Inline graphic are improved considerably by the normalization step relative to those of Inline graphic. Moreover, when the weight matrix is adopted as Inline graphic, most of the PR curves start from the upper left corner. Specifically, when Inline graphic is chosen as 1 and 3.5, the precision is good for all five networks. When the weight matrix is adopted as Inline graphic, the PR curves start from the upper left corner for all networks except network 4. This high precision implies that the suggested algorithm may be helpful in guiding the design of biological validation experiments.

To investigate how the AUPR and AUROC measures and the final score of Inline graphic are influenced by Inline graphic, Inline graphic is sampled over the interval Inline graphic at 90 equally spaced points. The corresponding results are given in Figure 2.

Figure 2. Prediction results of Inline graphic.

Left: Variations of the AUPR and AUROC measures with q; Right: Variations of the score with q.

The results in Figure 2(a) suggest that when Inline graphic, the AUROC measure for each network keeps increasing as Inline graphic increases, and that when Inline graphic, it remains nearly unchanged. On the other hand, for networks 2, 3 and 4, the AUPR measure keeps increasing when Inline graphic and slowly falls when Inline graphic. The situation for network 1 is similar, except that the turning point is at about Inline graphic. For network 5, the AUPR measure keeps increasing when Inline graphic and then remains nearly unchanged. As for the final score, it exceeds 38 when Inline graphic. The results in Figure 2(b) confirm again that the structure inference performance is improved the most when Inline graphic is set to 3.5.

Prediction Performances of Inline graphic

The previous subsection shows that prediction performances are improved by the normalization step, especially when the weight matrix is adopted as Inline graphic. In this subsection, the prediction performances of Inline graphic are investigated. For convenience, Inline graphic is set to 3.5 in this subsection.

To investigate the influence of the power law parameters on the prediction accuracy of the estimation algorithm, optimal values are searched for both Inline graphic and Inline graphic. In particular, for every network, the optimal Inline graphic is searched over the set Inline graphic, and the optimal Inline graphic is searched over the interval Inline graphic at 100 equally spaced points. In this optimization, the desirable Inline graphic and Inline graphic are selected to be the sample that maximizes the Inline graphic specification, Inline graphic. The corresponding results are given in Table 2.

Taking the exponential decay of the power law into account, Inline graphic is utilized in these estimations. To compare prediction performances with the best team, the corresponding specifications are also included in Table 2, obtained directly from the web site of the DREAM project. The best values of the AUROC and the AUPR specifications for each network are written in boldface. In addition, the corresponding Inline graphic-value for each specification is given in parentheses. In the last column of Table 2, the obtained scores are also given for each method. Furthermore, the optimal Inline graphic and Inline graphic for each network are given in the last two lines. The results in Table 2 show that, although the AUROC specification of the suggested method is slightly worse than that of the best team for some networks, its AUPR specification is much better than that of the best team for every network. Therefore, the final score of the suggested method is greater than that of the best team.

It is worthwhile to note that in actual applications, the optimal Inline graphic and Inline graphic are usually not available. On the other hand, it is currently well known that for most biological systems, the parameter Inline graphic belongs to the interval Inline graphic [23]. To test the practical effectiveness of the suggested method, its estimation performances have been studied with the power law parameters taking some typical values, i.e., Inline graphic and Inline graphic. The corresponding results are given in Table 3.

For each case, the AUROC and the AUPR specifications are presented with the corresponding Inline graphic-value written in parentheses. In the last column of Table 3, the obtained scores are given for each case. In addition, as in Table 2, the prediction specifications of the best team are also included in Table 3. It is obvious that the performance of this step is affected by the parameters of the power law. Although the estimation performance deteriorates slightly when Inline graphic and Inline graphic deviate from their optimal values, it is still better than that of the available methods.

The ROC and PR curves of each network with empirical and optimal power law parameters are presented in Figure 3. Here, the empirical power law parameters are Inline graphic and Inline graphic for every network.

Figure 3. ROC and PR curves of Inline graphic.

Figure 3 shows that the precision is also very good when the weight matrix is adopted as Inline graphic. More importantly, the third step of the proposed method may guarantee that the PR curve starts from the upper left corner. This is verified by Figure 4, which contains two PR curves. One is for Net4 without the third step, and the other is for Net4 when its weight matrix is adopted as Inline graphic. It is clear that the PR curve of Inline graphic starts from the upper left corner. This feature provides good guidance for the design of biological experiments.

Figure 4. Effect of the third step.

Most large-scale networks may be sparse, and this sparsity may be approximated by the power law. The developed algorithm employs this property quantitatively by constructing a 0–1 integer programming problem. Consequently, direct regulation genes for an arbitrary gene can be (sub)optimally estimated. Furthermore, this information is incorporated into the developed algorithm through Equations (13) and (14). That is the reason why the proposed method yields highly confident predictions. On the other hand, there are some potential risks when the third step is used. Specifically, when the in-degree distribution is not accurate, the prediction accuracy of Inline graphic may deteriorate with respect to Inline graphic. For example, when Inline graphic and Inline graphic, the final score of Inline graphic is less than that of Inline graphic. Therefore, it is suggested that when the in-degree distribution is unreliable or unavailable in practice, the third step should be used with caution.

Performances on the DREAM5 Network Inference Challenge

To evaluate the performance on real data, tests are performed on the DREAM5 Network Inference Challenge. Here, all the gene expression data offered by the DREAM5 organizers are regarded as multifactorial perturbation data. To better reconstruct the real GRNs, two improvements are introduced. First, the networks in the DREAM5 Network Inference Challenge are much more complicated than those in DREAM4, and the function Inline graphic may not be properly approximated by its first order Taylor expansion. In general, if the order of the Taylor series is high enough, Inline graphic will be obtained precisely. However, a high order may have adverse effects. In particular, when Inline graphic is approximated by its fourth order Taylor expansion, the matrix inversion operation becomes infeasible when the least squares estimation is used. Therefore, we use the third order Taylor expansion to approximate it, i.e.,

graphic file with name pone.0043819.e395.jpg

Using least squares, the coefficients in the above equation and the sum of squared residuals Inline graphic can be obtained. Second, consider two genes Inline graphic and Inline graphic. Assume that gene Inline graphic regulates gene Inline graphic and that gene Inline graphic has no direct effect on gene Inline graphic. Suppose further that Inline graphic and that Inline graphic is slightly smaller than Inline graphic. In this case, Inline graphic may be very close to Inline graphic in the weight ranking list. To overcome this drawback, the factor Inline graphic in Equation (7) is replaced by Inline graphic. Similarly, the factor Inline graphic in Equation (7) is replaced by Inline graphic. For example, suppose Inline graphic and Inline graphic; then the gap between Inline graphic and Inline graphic is 0.0101, whereas the gap between Inline graphic and Inline graphic is 0.0160 and the gap between Inline graphic and Inline graphic is 0.0190. In general, the value of Inline graphic should be larger than 1, but it cannot be too large, to avoid Inline graphic tending to 0. Based on our computational experience, the performance is improved significantly when Inline graphic is set to 4. Therefore, Equation (7) is replaced by the following expression:

graphic file with name pone.0043819.e423.jpg
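The third order least squares fit mentioned above can be sketched as follows. The exact polynomial used in the paper is not legible in this copy, so the sketch assumes that the expression level of the target gene is modelled as a cubic polynomial, including a constant term, of the expression level of one candidate regulator. The function names and the synthetic data are hypothetical, and the normal equations are solved by plain Gaussian elimination rather than by any particular Matlab routine.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def polynomial_fit_ssr(x, y, order=3):
    """Fit y ~ a0 + a1*x + ... + a_order*x**order by least squares.

    Returns the coefficients and the sum of squared residuals (SSR).
    """
    design = [[xi ** k for k in range(order + 1)] for xi in x]
    m = order + 1
    # Normal equations: (X^T X) a = X^T y.
    gram = [[sum(row[p] * row[q] for row in design) for q in range(m)]
            for p in range(m)]
    rhs = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(m)]
    coeffs = solve_linear_system(gram, rhs)
    residuals = [yi - sum(c * v for c, v in zip(coeffs, row))
                 for row, yi in zip(design, y)]
    return coeffs, sum(r * r for r in residuals)


# Synthetic regulator/target pair following an exact cubic relation.
regulator = [0.1 * k for k in range(10)]
target = [1.0 + 2.0 * v - v ** 3 for v in regulator]
coeffs, ssr = polynomial_fit_ssr(regulator, target, order=3)
print([round(c, 6) for c in coeffs], ssr)
```

Raising `order` makes the matrix of the normal equations increasingly ill-conditioned, which is consistent with the remark above that the matrix inversion becomes infeasible for the fourth order expansion.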

Because the in-degree distribution is unreliable or unavailable, the third step is omitted. The prediction performances of Inline graphic are shown in Table 4.

Table 4. Performances on the DREAM5 Network Inference Challenge.

Net1 Net3 Net4 Inline graphic
Inline graphic AUROC 0.7231 0.5469 0.5049 32.9093
Inline graphic (2.2891e-10) (0.9996) (0.9998)
AUPR 0.3438 0.0595 0.0189
Inline graphic (2.2209e-185) (7.0052e-4) (0.9840)

The final score in Table 4 is better than that of the third-place team. Furthermore, the improved method is also tested on the DREAM4 In Silico Size100 Multifactorial subchallenge. The final performances are presented in Table 5, and the estimation performances of the improved algorithm significantly outperform those of the best team. These results show that our improved method may be well suited to inferring gene regulatory networks.

Table 5. Performances on the DREAM4 Multifactorial subchallenge using improved method.

Net1 Net2 Net3 Net4 Net5 Inline graphic
Inline graphic AUROC 0.7510 0.7416 0.7995 0.7865 0.8071 42.8862
Inline graphic (7.6122e-19) (1.2736e-30) (2.1254e-39) (3.6995e-32) (2.7275e-36)
AUPR 0.1740 0.1646 0.2524 0.2472 0.2825
Inline graphic (4.1120e-39) (3.7733e-58) (4.7196e-60) (9.2795e-57) (9.7252e-65)

Computation Time

In this section, the computational complexity of the proposed method is discussed. It is well known that integer programming is an NP-complete problem and that there is no known polynomial-time algorithm to solve it [18], [19]. Therefore, we only discuss the computational complexity of the first two steps. The main computational module is the least squares estimator. More precisely, this estimator involves a large matrix multiplication, for instance Inline graphic. Here,

graphic file with name pone.0043819.e434.jpg

where Inline graphic denotes the number of experiments and Inline graphic denotes the order of the Taylor series. Therefore, for a particular network of Inline graphic genes, in which the number of transcription factors is Inline graphic, the computational complexity of the proposed method is Inline graphic. In general, Inline graphic, that is, the computational complexity is Inline graphic.

Using the first order Taylor expansion, the computation times for the five networks in the DREAM4 In Silico Size100 Multifactorial subchallenge are 0.1047 s, 0.1054 s, 0.1042 s, 0.1052 s, and 0.1046 s, respectively. Using the third order Taylor expansion, the computation times are 0.7725 s, 0.7285 s, 0.7338 s, 0.7281 s, and 0.7272 s, respectively. For the DREAM5 Network Inference Challenge, the computation times are 65.5030 s, 378.9817 s, and 295.2905 s, respectively.

The computation is performed on a PC with an Intel(R) Core(TM) i5-2400 CPU, 4 GB RAM, and Matlab 2008a.

Concluding Remarks

In this paper, an algorithm is developed for GRN topology inference from steady state multifactorial perturbation data. The GRN inference problem among Inline graphic genes is decomposed into Inline graphic different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed by using the sum of squared residuals and the Pearson correlation coefficient. The larger the sum of squared residuals is, the weaker the direct regulatory interaction will be. The higher the Pearson correlation coefficient is, the better justified the application of the regression analysis is. Then, the constructed weight of a gene is normalized. To employ the network sparsity quantitatively, a 0–1 integer programming problem is constructed. By solving this problem, direct regulation genes for an arbitrary gene can be estimated. Lastly, the normalized weight of a gene is modified on the basis of the estimation results about the existence of direct regulations to it. These normalized and modified weights are used to rank the possibility of the existence of a corresponding direct regulation.

Computational results with the DREAM4 In Silico Size100 Multifactorial subchallenge show that this method can outperform the available methods, particularly in improving the AUPR specifications. Using the real data provided by the DREAM5 Network Inference Challenge, estimation performances can be ranked third. In addition, if the prior structural information is reliable, the third step of this method not only improves the final score but may also ensure that the PR curve starts from the upper left corner, which may be helpful in guiding the design of biological validation experiments.

Although the computational results are promising, many important issues still need further effort. Among them, how to utilize the experimental data to obtain the in-degree distribution of a GRN is currently under investigation.

Supplementary Information

The Matlab files for this method will be offered upon request. Please contact the following email address: xiongj08@mails.tsinghua.edu.cn.

Funding Statement

The reported work was financially supported in part by the 973 Program under Grant 2012CB316504 and 2009CB320602 and by the National Natural Science Foundation of China under Grants 61174122, 61021063, 60721003, and 60625305. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Martin S, Zhang Z, Martino A, Faulon J (2007) Boolean dynamics of genetic regulatory networks inferred from microarray time series data. Bioinformatics 23: 866–874.
  • 2. Ferrazzi F, Sebastiani P, Ramoni M, Bellazzi R (2007) Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear gaussian networks. BMC Bioinformatics 8: S2.
  • 3. Opgen-Rhein R, Strimmer K (2007) From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Systems Biology 1: 37.
  • 4. Zhou T, Wang Y (2010) Causal relationship inference for a large-scale cellular network. Bioinformatics 26: 2020–2028.
  • 5. The dream4 In Silico network challenge. Available: http://wiki.c2b2.columbia.edu/dream/index.php/D4c2. Accessed 2012 Aug 30.
  • 6. Genenetweaver tool version 2.0. Available: http://gnw.sourceforge.net. Accessed 2012 Aug 30.
  • 7. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring regulatory networks from expression data using tree-based methods. PLoS One 5: e12776.
  • 8. Menéndez P, Kourmpetis Y, Ter Braak C, van Eeuwijk F (2010) Gene regulatory networks from multifactorial perturbations using graphical lasso: application to the dream4 challenge. PLoS One 5: e14147.
  • 9. Eisen M, Spellman P, Brown P, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences 95: 14863.
  • 10. Barabási A, Oltvai Z (2004) Network biology: understanding the cell's functional organization. Nature Reviews Genetics 5: 101–113.
  • 11. MacLean D, Elina N, Havecker E, Heimstaedt S, Studholme D, et al. (2010) Evidence for large complex networks of plant short silencing rnas. PLoS One 5: e9901.
  • 12. Albert R (2005) Scale-free networks in cell biology. Journal of Cell Science 118: 4947–4957.
  • 13. Hempel S, Koseska A, Nikoloski Z, Kurths J (2011) Unraveling gene regulatory networks from time-resolved gene expression data–a measures comparison study. BMC Bioinformatics 12: 292.
  • 14. Weisberg S (1981) Applied linear regression. New York: Wiley.
  • 15. De Jong H (2002) Modeling and simulation of genetic regulatory systems: a literature review. Journal of Computational Biology 9: 67–103.
  • 16. Cantone I, Marucci L, Iorio F, Ricci M, Belcastro V, et al. (2009) A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137: 172–181.
  • 17. Wang Y, Zhou T (2012) A relative variation-based method to unraveling gene regulatory networks. PLoS One 7: e31194.
  • 18. Nemhauser G, Wolsey L (1988) Integer and combinatorial optimization. John Wiley & Sons.
  • 19. Wolsey L (1988) Integer Programming. Hoboken, NJ: John Wiley & Sons.
  • 20. Pinna A, Soranzo N, de la Fuente A (2010) From knockouts to networks: establishing direct cause-effect relationships through graph analysis. PLoS One 5: e12912.
  • 21. Klamt S, Flassig R, Sundmacher K (2010) Transwesd: inferring cellular networks with transitive reduction. Bioinformatics 26: 2160–2168.
  • 22. Prill R, Marbach D, Saez-Rodriguez J, Sorger P, Alexopoulos L, et al. (2010) Towards a rigorous assessment of systems biology models: the dream3 challenges. PLoS One 5: e9202.
  • 23. Andrecut M, Kauffman S, Madni A (2008) Evidence of scale-free topology in gene regulatory network of human tissues. International Journal of Modern Physics C 19: 283–290.

Articles from PLoS ONE are provided here courtesy of PLOS
