Gene Regulatory Network Inference from Multifactorial Perturbation Data Using both Regression and Correlation Analyses

Jie Xiong; Tong Zhou

doi:10.1371/journal.pone.0043819

PLoS One. 2012 Sep 21;7(9):e43819. doi: 10.1371/journal.pone.0043819

Editor: Alberto de la Fuente

PMCID: PMC3448649 PMID: 23028471

Abstract

An important problem in systems biology is to reconstruct gene regulatory networks (GRNs) from experimental data and other a priori information. The DREAM project offers several types of experimental data, such as knockout data, knockdown data and time series data. Among them, multifactorial perturbation data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice. In this article, a new algorithm is presented for the inference of GRNs using the DREAM4 multifactorial perturbation data. The GRN inference problem among $n$ genes is decomposed into $n(n-1)$ different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed by using the sum of squared residuals and the Pearson correlation coefficient. These weights are then normalized to reflect differences in the effort needed to regulate distinct genes. By appropriately choosing the parameters of the power law, we construct a 0–1 integer programming problem. By solving this problem, the direct regulation genes of an arbitrary gene can be estimated, and the normalized weight of a gene is modified on the basis of the estimated existence of direct regulations to it. These normalized and modified weights are used to rank the likelihood that a corresponding direct regulation exists. Computation results with the DREAM4 In Silico Size 100 Multifactorial subchallenge show that the estimation performance of the suggested algorithm can even exceed that of the best team. Using the real data provided by the DREAM5 Network Inference Challenge, the estimation performance would rank third. Furthermore, the high precision of the most reliable predictions shows that the suggested algorithm may be helpful in guiding biological experiment designs.

Introduction

Reconstructing the structure of a gene regulatory network (GRN) from experimental data and other a priori information is very helpful in understanding the development, pathology and functioning of biological organisms. Recently, with the development of high-throughput technologies, such as DNA microarrays and mass spectrometry, it has become possible to reconstruct GRNs from several types of experimental data. In practice, the common data types include knockout data, knockdown data and time series data. Various models and methods have been suggested to attack this problem based on these types of experimental data, such as Boolean networks [1], Bayesian networks [2], information theory based algorithms [3] and ordinary differential equation (ODE) based methods [4].

Recently, the Dialogue for Reverse Engineering Assessments and Methods (DREAM) has been providing two kinds of resources to motivate the systems biology community to investigate and develop structure identification methods for GRNs. The first is a set of benchmark networks extracted from actual biological networks of some of the most important and typical biological modules, such as the Escherichia coli and Saccharomyces cerevisiae (yeast) transcriptional regulatory networks [5]. The second is several types of in silico gene expression data sets generated by the GeneNetWeaver tool version 2.0 [6]. In particular, the DREAM project offers an alternative type of steady-state data, i.e., multifactorial perturbation data, which are obtained by slightly perturbing all genes simultaneously so that the basal activation of all genes of the network is slightly increased or decreased by different random amounts [5]. Multifactorial perturbation data might be regarded as expression profiles obtained from different patients [5]. Therefore, such data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice [7]. On the other hand, such data provide less information about GRNs than other types of data, which makes the GRN identification problem more formidable [7].

Several methods have been shown to be effective in inferring the structure of a GRN through participation in the DREAM4 In Silico Size 100 Multifactorial subchallenge. For example, the best performer developed the GENIE3 algorithm, which decomposes the prediction of a regulatory network among $n$ genes into $n$ different regression problems. In each of the regression problems, the expression pattern of a target gene is predicted from the expression patterns of all the other genes, using tree-based methods [7]. The second place team tackled the problem via a sparse Gaussian Markov Random Field, which relates the network topology to the inverse covariance of the gene measurements. The Graphical Lasso algorithm is used to compute the inverse covariance, and the optimal network is then selected by different model selection criteria [8]. On the other hand, a GRN can be modeled as a correlation network [9], which is obtained by computing the correlation coefficient between any two genes. Interestingly, this simple method tied for second place in the DREAM4 In Silico Size 100 Multifactorial subchallenge. However, due to the symmetry of the correlation coefficient, the estimated correlation network topology is undirected.

Motivated by the GENIE3 algorithm, an identification algorithm is developed in this paper for GRN topology inference, based on regression analysis and correlation analysis. Specifically, the GRN inference problem among $n$ genes is decomposed into $n(n-1)$ different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed. The larger the sum of squared residuals is, the weaker the direct regulatory interaction will be. The higher the Pearson correlation coefficient is, the stronger the justification is for applying linear regression. To take both into consideration, the weight corresponding to a possible direct regulation combines the two quantities as a product. These weights are then normalized to reflect differences in the effort needed to regulate distinct genes.

It has been observed that most large-scale gene regulatory networks are sparse. Mathematically, the sparsity of a GRN may be characterized by the power law [4], from which the in-degree distribution of a GRN can be obtained. In this paper, the so-called in-degree distribution means the number of genes with in-degree equal to $k$. By appropriately choosing the in-degree distribution of a GRN, this paper suggests a method to utilize the sparsity quantitatively. By constructing loss functions, incorporating the power law, and solving a 0–1 integer programming problem, the direct regulation genes of an arbitrary gene can be estimated. The above normalized weights can then be further adjusted based on these estimated direct regulatory relationships.

In general, these weights are used to rank the possible direct causal regulations. The larger the adjusted weight is, the higher the confidence is in the existence of the direct causal regulation. When a threshold is provided, this ranking leads to an estimate of the structure of a GRN. Computation results with the DREAM4 In Silico Size 100 Multifactorial subchallenge show that the estimation performance of the suggested algorithm can even exceed that of the best team. Using the real data provided by the DREAM5 Network Inference Challenge, the estimation performance of the proposed method would rank third. Furthermore, the high precision of the most reliable predictions implies that the suggested algorithm may be helpful in guiding the design of biological validation experiments.

The outline of this paper is as follows. At first, the structure estimation algorithm is illustrated. Afterwards, the proposed estimation method is assessed using the data sets of the DREAM4 In Silico Size 100 Multifactorial subchallenge and the DREAM5 Network Inference Challenge. Variations of estimation performances with respect to parameters of the suggested method will also be reported. Finally, some concluding remarks are given about the characteristics of the suggested method, as well as some future works worthy of further efforts.

Materials and Methods

Problem Statement

Consider a GRN with $n$ genes. It is assumed that the targeted network is a directed graph, in which each node represents a gene, and an edge directed from gene $i$ to gene $j$ indicates that gene $i$ directly regulates the expression of gene $j$. The goal of gene regulatory network inference in this paper is to recover the network solely from multifactorial perturbation data. A set of multifactorial perturbation data can be obtained by first perturbing all genes simultaneously, and then measuring the steady-state levels of all genes. Different data sets can be obtained by applying different perturbations to the network [5]. Such data do not give information about the regulatory network dynamics, but about the system equilibrium once it has recovered after the perturbation [8].

Denote by $D$ the $m$ sets of multifactorial perturbation data:

$$D = \{\, x_i^k : i = 1, \dots, n;\ k = 1, \dots, m \,\}$$

where $x_i^k$ represents the steady-state level of gene $i$ in the $k$-th experiment. Specifically, the problem of recovering regulatory networks is addressed as follows:

Utilizing the data set $D$, design a GRN inference algorithm that assigns weights $w_{ij}$ ($i, j = 1, \dots, n$; $i \neq j$). The larger the weight $w_{ij}$ is, the higher the confidence is in the existence of a direct causal regulation from gene $i$ to gene $j$.

For most large-scale networks, it has been observed that the number of chemical elements that have direct regulatory effects on a randomly chosen chemical element approximately obeys a power law [4], [10]–[13]. More specifically, let $P(k)$ denote the probability that the number of direct regulations on a randomly chosen chemical element equals $k$; then $P(k)$ decays approximately as a power of $k$, with a positive exponent $\gamma$ [4].

This important prior structural information is also incorporated into our estimation procedures.
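As a rough illustration of how a power law can be turned into an in-degree distribution (the number of genes with each in-degree), the sketch below assumes the truncated form $P(k) \propto k^{-\gamma}$ for $k = 1, \dots, k_{\max}$. The function name `power_law_counts`, the truncation at `k_max`, and the largest-remainder rounding are illustrative assumptions, not details taken from the paper.

```python
def power_law_counts(n_genes, gamma, k_max):
    """Number of genes assigned to each in-degree k = 1..k_max,
    assuming P(k) proportional to k**(-gamma) (hypothetical truncated form)."""
    weights = [k ** (-gamma) for k in range(1, k_max + 1)]
    total = sum(weights)
    raw = [n_genes * w / total for w in weights]
    counts = [int(r) for r in raw]
    # Largest-remainder rounding so the counts sum exactly to n_genes.
    leftover = n_genes - sum(counts)
    by_remainder = sorted(range(k_max), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Example: 100 genes, exponent 2.5, in-degrees 1..5 (all values are illustrative).
counts = power_law_counts(100, 2.5, 5)
print(counts)
```

Most genes end up with in-degree one, which is the quantitative expression of sparsity that the estimation procedure relies on.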

Regression Analysis

It is well known that the relevance between any two genes can be represented by the Pearson correlation coefficient [9], but this measure is non-causal. On the other hand, the GENIE3 algorithm decomposes the prediction of a regulatory network among $n$ genes into $n$ different regression problems. In each of the regression problems, the expression pattern of a target gene is predicted from the expression patterns of all the other genes, using tree-based methods [7]. Motivated by this idea, we decompose the GRN inference problem among $n$ genes into $n(n-1)$ different regression problems. The novelty is as follows. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed. That is, for a particular gene $j$ and its potential regulation gene $i$, the aim of the regression analysis is to establish a function, i.e., $x_j = f(x_i)$, which is taken to reveal the causal relationship between them. Here, $x_i$ and $x_j$ represent the steady-state expression concentrations of genes $i$ and $j$, respectively. In practice, $x_j$ is not completely determined by $x_i$, because many other factors may affect $x_j$. Therefore, $\varepsilon$ is used to represent the unknown secondary factors or/and the random errors, all of which may affect $x_j$. An important parameter in the regression model is the variance of $\varepsilon$, i.e., $\sigma^2$. In essence, $\sigma^2$ is the mean squared error when $x_j$ is approximated by a suitable function of $x_i$ [14]. Generally, when $x_i$ is reasonably selected as the most important factor, the value of $\sigma^2$ will be relatively small; otherwise, it will be relatively large. In other words, for the particular gene $j$, the smaller $\sigma^2$ is, the larger the probability is of a direct causal regulation from gene $i$ to gene $j$.

In practice, although $\sigma^2$ is unavailable, it can be estimated from the sum of squared residuals obtained by linear regression (least-squares estimation). Therefore, we can construct the weight $w_{ij}$ based on the above discussion. A practical network prediction is obtained by setting a threshold on the ranking of weights from the most to the least significant. In this paper, we focus on the task of constructing weights, while the choice of an optimal confidence threshold, although important, is left open.

Weight Construction

Denote by $x_j$ the steady-state level of gene $j$. The steady-state level of gene $j$ may be directly affected by the expression levels of all other genes. Therefore we have the following expression:

$$x_j = f_j(x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_n) \tag{1}$$

The function $f_j$ in Equation (1) not only has many arguments, but may also be non-linear. Thus, it is hard to estimate $f_j$ directly. On the other hand, from the definition of the weight $w_{ij}$, we know that $w_{ij}$ represents the probability of a direct causal regulation only from gene $i$ to gene $j$. That is, when the weight $w_{ij}$ is computed, the function $f_j$ in Equation (1) can be approximated by the following expression:

$$x_j \approx g_{ij}(x_i) \tag{2}$$

The form of the function $g_{ij}$, however, is not clear and might be non-linear. Hence, the linear regression technique is used to analyze the direct causal regulation from gene $i$ to gene $j$, and the function $g_{ij}$ is approximated by its first-order Taylor expansion, i.e.,

$$g_{ij}(x_i) = a_{ij} + b_{ij} x_i + \varepsilon_{ij} \tag{3}$$

where $\varepsilon_{ij}$ represents the approximation error or/and the measurement error.

Consequently, from Equation (2) and Equation (3), we have

$$x_j = a_{ij} + b_{ij} x_i + \varepsilon_{ij} \tag{4}$$

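The regression coefficients $a_{ij}$ and $b_{ij}$ can be estimated by least squares. Let $\bar{x}_i = \frac{1}{m}\sum_{k=1}^{m} x_i^k$; then $\hat{b}_{ij} = \sum_{k=1}^{m}(x_i^k - \bar{x}_i)(x_j^k - \bar{x}_j) \big/ \sum_{k=1}^{m}(x_i^k - \bar{x}_i)^2$ and $\hat{a}_{ij} = \bar{x}_j - \hat{b}_{ij}\bar{x}_i$. Moreover, the sum of squared residuals is also obtained in this process, i.e.,

$$S_{ij} = \sum_{k=1}^{m} \left( x_j^k - \hat{x}_j^k \right)^2 \tag{5}$$

where $\hat{x}_j^k = \hat{a}_{ij} + \hat{b}_{ij} x_i^k$. The value of $S_{ij}$ might be regarded as the capability of the direct regulatory interaction from gene $i$ to gene $j$. In other words, we assume that the larger the sum of squared residuals $S_{ij}$ is, the weaker the direct regulatory interaction from gene $i$ to gene $j$ will be. For this reason, the constructed weight $w_{ij}$ should utilize this characteristic provided by $S_{ij}$.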

On the other hand, for any two data sets $\{x_i^k\}$ and $\{x_j^k\}$, no matter whether a linear correlation exists between them, the sum of squared residuals can always be obtained by Equation (5). However, if no linear correlation exists between them, applying linear regression is meaningless. To test whether the data sets $\{x_i^k\}$ and $\{x_j^k\}$ are linearly correlated, the correlation coefficient is the most frequently used test statistic. The expression for the correlation coefficient is as follows:

$$r_{ij} = \frac{\sum_{k=1}^{m} (x_i^k - \bar{x}_i)(x_j^k - \bar{x}_j)}{\sqrt{\sum_{k=1}^{m} (x_i^k - \bar{x}_i)^2}\ \sqrt{\sum_{k=1}^{m} (x_j^k - \bar{x}_j)^2}} \tag{6}$$

where $\bar{x}_i$ denotes the mean of $x_i^k$ over the $m$ experiments.

According to the discussion above, the larger the sum of squared residuals $S_{ij}$ is, the weaker the direct regulatory interaction from gene $i$ to gene $j$ will be. And the larger the Pearson correlation coefficient $r_{ij}$ is, the stronger the justification is for applying linear regression to the data sets $\{x_i^k\}$ and $\{x_j^k\}$. To take both of them into consideration, a weight $w_{ij}$ corresponding to a possible direct regulation from gene $i$ to gene $j$ is constructed as follows:

(7)

For the particular gene $j$, the larger the magnitude of $w_{ij}$ is, the larger the confidence is that gene $j$ is directly regulated by gene $i$.
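The weight construction described in this subsection can be sketched as follows. The least-squares fit, the sum of squared residuals and the Pearson correlation follow the standard definitions. The final combination `abs(r) / ssr` is only an assumed way of rewarding a high correlation and a small residual sum at the same time, since the exact form of Equation (7) is not reproduced in this text. The names `pairwise_stats` and `weight_matrix` are hypothetical.

```python
import math


def pairwise_stats(x, y):
    """Least-squares fit y = a + b*x; return (sum of squared residuals, Pearson r)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((v - a - b * u) ** 2 for u, v in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return ssr, r


def weight_matrix(data):
    """data[k][i] is the level of gene i in experiment k.
    Returns W with W[i][j] scoring the regulation i -> j; the diagonal stays zero."""
    n = len(data[0])
    cols = [[row[i] for row in data] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                ssr, r = pairwise_stats(cols[i], cols[j])
                # Assumed combination: large |r| and small SSR give a large weight.
                W[i][j] = abs(r) / max(ssr, 1e-12)
    return W


# Toy data: gene 1 is roughly 2 * gene 0, gene 2 is unrelated.
data = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.7],
    [4.0, 7.8, 0.3],
]
W = weight_matrix(data)
```

Although the correlation coefficient is symmetric, the residual sum of squares scales with the variance of the target gene, so `W[i][j]` and `W[j][i]` generally differ and the resulting ranking is directed.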

Weight Normalization

It is important to note that in GRN topology inference, the larger the value of $w_{ij}$ is, the larger the probability is of a direct regulation from gene $i$ to gene $j$. Define an $n \times n$ matrix $W$ whose $i$-th row, $j$-th column element is the estimate of $w_{ij}$ when $i \neq j$ and whose diagonal elements are zero, and denote its $j$-th column vector by $\mathbf{w}_j$. This matrix clearly contains information about the probability of a direct regulation between any two different genes in a GRN. However, to infer the structure of a GRN from this matrix, an important fact must be taken into account: in a GRN, some genes may be easily regulated by other genes, while regulating some other genes may need more effort [15]–[17]. This implies that direct regulations of different genes may lead to weights of different orders of magnitude. Therefore, in order to obtain a good estimate of the topology of a GRN from the matrix $W$, an appropriate normalization is still required for the estimated $w_{ij}$s among different genes.

In [17], it is suggested to use the $q$-norm of the vector $\mathbf{w}_j$ and the geometric average of its non-zero elements to achieve the normalization. More specifically, when $q$ is set to 3.5, the structure inference performance is improved the most. Therefore, in this paper, it is suggested to also use the $q$-norm of the vector $\mathbf{w}_j$ to achieve this normalization, that is, $w_{ij}$ is replaced by

$$\frac{w_{ij}}{\|\mathbf{w}_j\|_q}, \qquad \|\mathbf{w}_j\|_q = \Big( \sum_{i=1}^{n} |w_{ij}|^q \Big)^{1/q} \tag{8}$$

It is worthwhile to note that this normalization does not change the diagonal elements. For conciseness, the matrix $W$ normalized using the vector $q$-norm is denoted by $W_q$ in the rest of this paper.

This normalization was first proposed in [17], in which the weight is represented by the RELV (relative expression level variation). The goal of the normalization is to guarantee that the weights for different genes have the same order of magnitude. For a GRN, in the final ranking list of the elements of $W_q$, the larger the magnitude of an element is, the more probable the corresponding transcriptional regulation is.
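A minimal sketch of the column-wise $q$-norm normalization described above, assuming the weight matrix is stored as a list of rows with rows indexing regulators and columns indexing target genes. The function name `normalize_columns` is illustrative.

```python
def normalize_columns(W, q=3.5):
    """Divide every column of W by its vector q-norm.
    Zero diagonal entries stay zero; an all-zero column is left unchanged."""
    n = len(W)
    out = [row[:] for row in W]
    for j in range(n):
        norm = sum(abs(W[i][j]) ** q for i in range(n)) ** (1.0 / q)
        if norm > 0:
            for i in range(n):
                out[i][j] = W[i][j] / norm
    return out


# Column 1 holds much larger raw weights than column 2.
W = [
    [0.0, 10.0, 0.2],
    [5.0, 0.0, 0.1],
    [1.0, 2.0, 0.0],
]
Wq = normalize_columns(W, q=3.5)
```

After this step every target gene's incoming weights have unit $q$-norm, so a gene that is intrinsically hard to regulate no longer sinks to the bottom of the global ranking merely because all of its raw weights are small.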

In-degree Estimation and Weight Magnitude Modification

To compute the weight $w_{ij}$, the multivariate function $f_j$ is approximated by a univariate function $g_{ij}$, which implies that the in-degree of an arbitrary gene is assumed to be one. Thus, the constructed weights do not employ information about combinatorial regulation of a gene. In this subsection, we try to estimate the in-degrees of the genes in a GRN in order to utilize the information about combinatorial regulation.

It is clear that the value of $S_{ij}$ represents the capability of the direct regulatory interaction from gene $i$ to gene $j$, that is, the smaller the sum of squared residuals $S_{ij}$ is, the stronger the direct regulatory interaction from gene $i$ to gene $j$ will be. Sort the sums of squared residuals of gene $j$ in non-decreasing order, and denote the sorted results as follows:

$$S_{i_1 j} \le S_{i_2 j} \le \dots \le S_{i_{n-1} j}$$

In this ranking, $i_1, \dots, i_{n-1}$ index the genes other than gene $j$, so it is assumed that the top $k$ genes, from gene $i_1$ to gene $i_k$, have a great chance to combinatorially regulate gene $j$. Therefore, the multivariate function $f_j$ can be approximated by a $k$-variable function in such a case, i.e.:

$$x_j \approx h_j(x_{i_1}, \dots, x_{i_k}) \tag{9}$$

The form of the function $h_j$, however, is also not clear and might be non-linear. Hence, the linear regression technique is used again. Applying the first-order multivariate Taylor expansion to the function $h_j$, we have

$$x_j = \beta_0 + \sum_{l=1}^{k} \beta_l x_{i_l} + \varepsilon_j$$

where $\varepsilon_j$ represents the approximation error or/and the measurement error.

Using least squares again, not only the regression coefficients $\beta_0, \beta_1, \dots, \beta_k$, but also the sum of squared residuals and the regression sum of squares (sum of deviation squares) can be estimated. Let

graphic file with name pone.0043819.e167.jpg

then,

and,

(10)

Define a loss function $L_j(k)$ as follows:

(11)

Here, the value of the sum of squared residuals $S_j(k)$ represents the capability of a direct combinatorial regulation from genes numbered $i_1, \dots, i_k$ to gene $j$. It can be thought that the smaller the sum of squared residuals is, the stronger the direct combinatorial regulatory interaction will be. And $F_j(k)$ is the test statistic. The larger the test statistic is, the stronger the justification is for applying multiple linear regression analysis. Therefore, to take both into consideration, the loss function $L_j(k)$ is defined as in Equation (11). It can be presumed that the smaller the value of $L_j(k)$ is, the higher the probability is of a direct combinatorial regulation from genes numbered $i_1, \dots, i_k$ to gene $j$.
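The in-degree loss for a single target gene can be sketched as below. Candidate regulators are ranked by their pairwise residual sum of squares, and for each $k$ the top $k$ candidates enter a multiple linear regression. The loss used here, the residual sum of squares divided by the overall regression $F$ statistic, is an assumption chosen to match the qualitative description (small residuals and a large test statistic give a small loss); the exact form of Equation (11) is not reproduced in this text. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for t in range(c, n + 1):
                M[r][t] -= f * M[c][t]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][t] * x[t] for t in range(r + 1, n))) / M[r][r]
    return x


def fit(x_cols, y):
    """OLS of y on an intercept plus x_cols; return (residual SS, regression SS)."""
    m = len(y)
    cols = [[1.0] * m] + x_cols
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(A, b)
    fitted = [sum(beta[l] * cols[l][t] for l in range(len(cols))) for t in range(m)]
    ybar = sum(y) / m
    ssr = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ssreg = sum((ft - ybar) ** 2 for ft in fitted)
    return ssr, ssreg


def indegree_losses(cols, j, k_max):
    """Rank candidate regulators of gene j by pairwise SSR, then score k = 1..k_max."""
    m = len(cols[j])
    others = [i for i in range(len(cols)) if i != j]
    ranked = sorted(others, key=lambda i: fit([cols[i]], cols[j])[0])
    losses = []
    for k in range(1, k_max + 1):
        ssr, ssreg = fit([cols[i] for i in ranked[:k]], cols[j])
        f_stat = (ssreg / k) / (ssr / (m - k - 1))
        losses.append(ssr / f_stat)  # assumed loss: small SSR, large F -> small loss
    return ranked, losses


# Toy data: gene 3 = gene 0 + 2 * gene 1 + small noise; gene 2 is unrelated.
g0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
g1 = [3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0]
g2 = [0.5, 0.1, 0.9, 0.3, 0.2, 0.8, 0.4, 0.6]
g3 = [7.1, 4.05, 8.9, 5.95, 10.9, 8.1, 13.05, 9.95]
ranked, losses = indegree_losses([g0, g1, g2, g3], j=3, k_max=3)
best_k = losses.index(min(losses)) + 1
```

On this toy data the loss is smallest at the in-degree that matches how the target gene was generated, and adding the unrelated gene as a third regressor does not lower it further.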

To estimate the in-degree of a specific gene $j$ optimally, one can search over the candidate in-degrees $k$ to find the value $k^*$ at which the loss function $L_j(k)$ attains its minimum. In such a case, the optimal in-degree of gene $j$ is $k^*$, and genes numbered $i_1, \dots, i_{k^*}$ are most likely to have a direct regulation effect on gene $j$. However, to estimate the in-degree of every gene in a GRN optimally, the structural characteristics of GRNs should be taken into consideration, that is, the power law should be taken into consideration. Let $k_{\max}$ and $n_k$ denote respectively the maximum in-degree of a GRN and the number of genes with in-degree equal to $k$. Then $n_k$ follows from the power law. Since each gene has a unique in-degree, we can utilize the following 0–1 integer optimization to estimate the in-degree of every gene optimally.

graphic file with name pone.0043819.e195.jpg

(12)

Problem (12) can be solved by using a linear programming-based branch-and-bound algorithm [18], [19]. For gene $j$, if the optimal solution selects in-degree $k$, then, from the above problem description, the optimal estimate of the in-degree of this gene is $k$, and genes numbered $i_1, \dots, i_k$ are most likely to have a direct regulation effect on this gene.
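To make the structure of this 0–1 program concrete, the sketch below assumes the following formulation: each gene receives exactly one in-degree, exactly `counts[k]` genes receive in-degree `k + 1`, and the total loss is minimized. This formulation is an assumption based on the surrounding description, because Problem (12) itself is not reproduced in this text. For brevity the toy instance is solved by exhaustive enumeration, not by the branch-and-bound method cited above, so it is only practical for a handful of genes.

```python
from itertools import product


def assign_indegrees(loss, counts):
    """loss[j][k] is the loss of giving gene j in-degree k + 1.
    counts[k] is the required number of genes with in-degree k + 1.
    Returns (list of in-degrees per gene, minimal total loss)."""
    n_genes, n_deg = len(loss), len(counts)
    best_cost, best_choice = float("inf"), None
    for choice in product(range(n_deg), repeat=n_genes):
        # Enforce the in-degree distribution constraint.
        if any(choice.count(k) != counts[k] for k in range(n_deg)):
            continue
        cost = sum(loss[j][choice[j]] for j in range(n_genes))
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return [k + 1 for k in best_choice], best_cost


# Four genes, in-degrees 1 or 2; the assumed distribution allows only one gene with in-degree 2.
loss = [
    [0.1, 0.5],
    [0.2, 0.05],
    [0.3, 0.4],
    [0.6, 0.1],
]
degrees, cost = assign_indegrees(loss, counts=[3, 1])
print(degrees, cost)
```

Without the distribution constraint, genes 1 and 3 would both pick in-degree 2; with it, only the gene that gains the most is allowed to, which is how the prior on sparsity enters the estimate.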

In GRN topology estimation, another important thing worthy of considering is that genes estimated to have a direct regulation should correspond to a weight with a magnitude greater than those estimated not to have a direct regulation [20], [21]. To achieve this purpose, the following adjustment is suggested in this paper. Define Inline graphic as

(13)

With this value, the normalized weights for an arbitrary gene Inline graphic are modified as follows,

(14)

Here, for each gene Inline graphic , is determined by the solution of Problem (12).

Collect the modified weights of Equation (14) in an $n \times n$ matrix. Elements of this matrix are directly used to infer the structure of a GRN. The bigger the $i$-th row, $j$-th column element is, the higher the probability is that gene $j$ is directly regulated by gene $i$.

It should be stressed here that the effectiveness of the in-degree estimation depends on the accuracy of the prior structural information. In this paper, the sparsity of a GRN is characterized by the power law, which determines the number of genes whose in-degree equals $k$. That is, the solution of Problem (12) depends on the parameters of the power law. If the assumed in-degree distribution of a GRN is appropriate, this step may improve the estimate; otherwise the performance may deteriorate. The results in Tables 1, 2 and 3 in the following section may support this argument.

Table 1. Performances Only for Weight Normalization.

Method	Metric	Net1	Net2	Net3	Net4	Net5	Score
Best Team	AUROC	0.745	0.733	0.775	0.791	0.798	37.428
		(3.334e-18)	(1.076e-28)	(9.705e-34)	(6.736e-33)	(1.912e-34)
	AUPR	0.154	0.155	0.231	0.208	0.197
		(3.309e-34)	(7.897e-54)	(1.791e-54)	(5.489e-47)	(4.563e-44)
Unnormalized	AUROC	0.6899	0.6485	0.7081	0.6998	0.6655	16.4038
		(1.4643e-12)	(1.6105e-13)	(1.2922e-20)	(5.6590e-18)	(6.6749e-13)
	AUPR	0.0711	0.0893	0.1230	0.0938	0.0532
		(5.3481e-14)	(1.9060e-24)	(3.2720e-27)	(6.6316e-19)	(3.5962e-09)
q = 1	AUROC	0.7524	0.7097	0.7694	0.7590	0.7651	35.7303
		(5.3124e-19)	(5.3557e-24)	(1.4810e-32)	(2.9514e-27)	(3.7157e-28)
	AUPR	0.1464	0.1673	0.2212	0.2046	0.1944
		(2.2398e-32)	(1.7894e-59)	(4.0881e-52)	(3.0186e-46)	(2.1768e-43)
q = 2	AUROC	0.7614	0.7149	0.7690	0.7697	0.7691	38.1179
		(5.1100e-20)	(5.0426e-25)	(1.8088e-32)	(4.0920e-29)	(6.5978e-29)
	AUPR	0.1626	0.1697	0.2283	0.2229	0.2271
		(2.4910e-36)	(1.4524e-60)	(6.1040e-54)	(9.2038e-51)	(2.5905e-51)
q = 3.5	AUROC	0.7641	0.7172	0.7660	0.7762	0.7693	38.4670
		(2.5068e-20)	(1.7462e-25)	(8.0330e-32)	(2.7481e-30)	(6.3238e-29)
	AUPR	0.1673	0.1612	0.2273	0.2271	0.2448
		(1.7753e-37)	(1.5096e-56)	(1.1603e-53)	(8.4634e-52)	(1.3305e-55)


Table 2. Performances with the optimal parameter values.

		Net1	Net2	Net3	Net4	Net5	Score
Best Team	AUROC	0.745	0.733	0.775	0.791	0.798	37.428
		(3.334e-18)	(1.076e-28)	(9.705e-34)	(6.736e-33)	(1.912e-34)
	AUPR	0.154	0.155	0.231	0.208	0.197
		(3.309e-34)	(7.897e-54)	(1.791e-54)	(5.489e-47)	(4.563e-44)
	AUROC	0.7642	0.7173	0.7865	0.7764	0.7693	39.9465
		(2.4413e-20)	(1.6671e-25)	(2.2385e-36)	(2.5303e-30)	(6.0610e-29)
	AUPR	0.1799	0.1648	0.2341	0.2326	0.2540
		(1.4115e-40)	(2.7231e-58)	(2.1883e-55)	(3.7182e-53)	(7.8435e-58)
		1	1	3	2	1
		3.3	3.3	1	3.7	5.0


Table 3. Performances with typical parameter values.

		Net1	Net2	Net3	Net4	Net5	Score
Best Team	AUROC	0.745	0.733	0.775	0.791	0.798	37.428
		(3.334e-18)	(1.076e-28)	(9.705e-34)	(6.736e-33)	(1.912e-34)
	AUPR	0.154	0.155	0.231	0.208	0.197
		(3.309e-34)	(7.897e-54)	(1.791e-54)	(5.489e-47)	(4.563e-44)
,	AUROC	0.7634	0.7165	0.7691	0.7752	0.7687	37.4489
		(3.0972e-20)	(2.4137e-25)	(1.8088e-32)	(4.1501e-30)	(7.8172e-29)
	AUPR	0.1710	0.1564	0.2290	0.2047	0.2287
		(2.2195e-38)	(2.4170e-54)	(4.2999e-54)	(3.0186e-46)	(1.0608e-51)
,	AUROC	0.7641	0.7171	0.7674	0.7759	0.7694	38.1914
		(2.5068e-20)	(1.9156e-25)	(4.0136e-32)	(3.1102e-30)	(5.8091e-29)
	AUPR	0.1720	0.1598	0.2263	0.2189	0.2394
		(1.2653e-38)	(6.1630e-56)	(1.9625e-53)	(8.9338e-50)	(2.5610e-54)
,	AUROC	0.7641	0.7173	0.7670	0.7762	0.7692	38.8123
		(2.5068e-20)	(1.6671e-25)	(4.8953e-32)	(2.8639e-30)	(6.3238e-29)
	AUPR	0.1722	0.1646	0.2378	0.2173	0.2455
		(1.1308e-38)	(3.7733e-58)	(2.5223e-56)	(2.2175e-49)	(8.5142e-56)
,	AUROC	0.7642	0.7173	0.7660	0.7765	0.7693	39.0655
		(2.5068e-20)	(1.7462e-25)	(8.0330e-32)	(2.5303e-30)	(6.0610e-29)
	AUPR	0.1716	0.1632	0.2396	0.2212	0.2540
		(1.5842e-38)	(1.5489e-57)	(8.8169e-57)	(2.4181e-50)	(7.8435e-58)
,	AUROC	0.7609	0.7152	0.7714	0.7756	0.7685	37.9334
		(5.8275e-20)	(4.3935e-25)	(5.7087e-33)	(3.6677e-30)	(8.8762e-29)
	AUPR	0.1639	0.1536	0.2349	0.2216	0.2380
		(1.1998e-36)	(4.9060e-53)	(1.3716e-55)	(2.0391e-50)	(5.9144e-54)
,	AUROC	0.7630	0.7166	0.7688	0.7758	0.7687	38.2446
		(3.3525e-20)	(2.3048e-25)	(1.9987e-32)	(3.3776e-30)	(7.8172e-29)
	AUPR	0.1634	0.1574	0.2334	0.2271	0.2397
		(1.5022e-36)	(9.1641e-55)	(3.2932e-55)	(8.4634e-52)	(2.2905e-54)
,	AUROC	0.7636	0.7170	0.7687	0.7762	0.7690	38.4010
		(2.9378e-20)	(2.0063e-25)	(2.2085e-32)	(2.7481e-30)	(6.8837e-29)
	AUPR	0.1652	0.1599	0.2312	0.2270	0.2410
		(5.7785e-37)	(6.1630e-56)	(1.1225e-54)	(8.9583e-52)	(1.1089e-54)
,	AUROC	0.7637	0.7171	0.7687	0.7762	0.7692	38.3533
		(2.7865e-20)	(1.8289e-25)	(2.2085e-32)	(2.7481e-30)	(6.5978e-29)
	AUPR	0.1668	0.1612	0.2284	0.2216	0.2428
		(2.2228e-37)	(1.3546e-56)	(5.7578e-54)	(2.0391e-50)	(4.0615e-55)
,	AUROC	0.7586	0.7167	0.7730	0.7745	0.7685	38.0248
		(1.0644e-19)	(2.2007e-25)	(2.5460e-33)	(5.5346e-30)	(8.8762e-29)
	AUPR	0.1626	0.1567	0.2327	0.2231	0.2370
		(2.4910e-36)	(1.9486e-54)	(4.6750e-55)	(8.2152e-51)	(1.0333e-53)
,	AUROC	0.7625	0.7170	0.7710	0.7750	0.7690	38.3306
		(3.8251e-20)	(1.9156e-25)	(6.6388e-33)	(4.5061e-30)	(6.8837e-29)
	AUPR	0.1682	0.1577	0.2340	0.2231	0.2396
		(1.0706e-37)	(6.6312e-55)	(2.3199e-55)	(8.2152e-51)	(2.4220e-54)
,	AUROC	0.7631	0.7175	0.7702	0.7759	0.7689	38.7893
		(3.2651e-20)	(1.5916e-25)	(9.9210e-33)	(3.2412e-30)	(7.1819e-29)
	AUPR	0.1682	0.1624	0.2369	0.2274	0.2414
		(1.0706e-37)	(3.6898e-57)	(4.2661e-56)	(7.1370e-52)	(8.8708e-55)
,	AUROC	0.7634	0.7165	0.7693	0.7760	0.7686	38.4687
		(3.0972e-20)	(2.4137e-25)	(1.6368e-32)	(3.1102e-30)	(8.5083e-29)
	AUPR	0.1696	0.1586	0.2351	0.2270	0.2381
		(4.8746e-38)	(2.2538e-55)	(1.2204e-55)	(8.9583e-52)	(5.2898e-54)


Estimation Algorithm

In summary, on the basis of the regression analysis and the correlation analysis, the algorithm suggested in this paper for identifying direct regulations of a GRN consists of the following steps.

Step 1. Compute the weight matrix according to Equations (5), (6) and (7).
Step 2. Normalize the weight matrix according to Equation (8).
Step 3. Choose appropriate values for the parameters of Problem (12), solve Problem (12), and modify the normalized weight matrix according to Equations (13) and (14). (This step is optional.)

Using the elements of the normalized matrix (or of the modified matrix, if Step 3 is applied), rank the possible direct regulations from the gene indexed by the row to the gene indexed by the column. The bigger the element is, the higher the confidence is in the existence of the direct causal regulation.
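The final ranking step can be sketched as follows, assuming the (normalized or modified) weights are held in a square list of rows in which row `i` is the regulator and column `j` the target. The function name `rank_edges` and the optional `top` cut-off are illustrative.

```python
def rank_edges(W, top=None):
    """Return directed edges (regulator, target, weight), most confident first.
    Self-regulations on the diagonal are skipped."""
    n = len(W)
    edges = [(W[i][j], i, j) for i in range(n) for j in range(n) if i != j]
    edges.sort(key=lambda e: e[0], reverse=True)
    return [(i, j, w) for w, i, j in edges[:top]]


W = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.7],
    [0.05, 0.3, 0.0],
]
ranking = rank_edges(W)
top_two = rank_edges(W, top=2)
```

Applying a threshold to this list, for instance keeping only the first few entries, yields one concrete network estimate; the choice of threshold itself is left open in the paper.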

Results and Discussion

Data Sets and Assessment Metrics

To illustrate the effectiveness of the developed inference algorithm, tests are first performed on the DREAM4 In Silico Size 100 Multifactorial subchallenge, which is designed to assess the performance of an identification method for the structure of a large-scale GRN [22]. The subchallenge contains 5 different benchmark networks with 100 genes each, which are obtained by extracting some important and typical modules from actual biological networks of E. coli and S. cerevisiae. Auto-regulatory interactions are removed, that is, there are no self-interactions in the in silico networks. For each network, 100 sets of multifactorial perturbation data are supplied.

Predictions are compared with the actual structure of the networks by the DREAM project organizers using the following two different metrics in topology prediction accuracy evaluations.

AUPR: The area under the precision-recall curve;
AUROC: The area under the receiver operating characteristic curve.
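As a simplified illustration of these two metrics, the sketch below computes them from a list of edge scores and a list of 0/1 labels marking the true edges. It assumes no tied scores, computes AUROC as the fraction of (true edge, non-edge) pairs ranked in the right order, and approximates AUPR by the average precision at each recovered true edge. The official DREAM evaluation scripts may interpolate the precision-recall curve differently, so the values need not match them exactly.

```python
def auroc_aupr(scores, labels):
    """scores: confidence per candidate edge; labels: 1 for a true edge, else 0."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    correctly_ordered_pairs = 0
    precision_sum = 0.0
    for t in order:
        if labels[t]:
            tp += 1
            precision_sum += tp / (tp + fp)  # precision when this true edge is recovered
        else:
            fp += 1
            correctly_ordered_pairs += tp  # true edges ranked above this non-edge
    return correctly_ordered_pairs / (n_pos * n_neg), precision_sum / n_pos


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
auroc, aupr = auroc_aupr(scores, labels)
```

Because true edges are rare in a sparse network, AUPR is far more sensitive than AUROC to mistakes at the top of the ranking.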

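Moreover, for every network, the $p$-values of the AUPR and AUROC measures, which indicate the probability that random predictions would have the same or better performance, are computed, and are respectively denoted by $p_{\mathrm{AUPR},i}$ and $p_{\mathrm{AUROC},i}$, $i = 1, \dots, 5$. Based on these $p$-values, a final score is calculated as $\mathrm{score} = -\frac{1}{2}\log_{10}\left(p_{\mathrm{AUPR}}\, p_{\mathrm{AUROC}}\right)$. A larger score indicates a better performance of the adopted inference algorithm. Here, $p_{\mathrm{AUPR}}$ and $p_{\mathrm{AUROC}}$ are defined as follows.

$$p_{\mathrm{AUPR}} = \Big( \prod_{i=1}^{5} p_{\mathrm{AUPR},i} \Big)^{1/5}, \qquad p_{\mathrm{AUROC}} = \Big( \prod_{i=1}^{5} p_{\mathrm{AUROC},i} \Big)^{1/5}$$

Similarly, we can define a specification for each network as $s_i = -\frac{1}{2}\log_{10}\left(p_{\mathrm{AUPR},i}\, p_{\mathrm{AUROC},i}\right)$, $i = 1, \dots, 5$. Based on the above discussion, we know

$$\mathrm{score} = \frac{1}{5} \sum_{i=1}^{5} s_i$$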

More detailed explanations can be found in [22] or on the web site of the DREAM project at http://wiki.c2b2.columbia.edu/dream/. Moreover, to evaluate performance on real data, tests are also performed on the DREAM5 Network Inference Challenge. Finally, the computation time needed by the suggested method is discussed.
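The final score described above can be checked numerically. The sketch below assumes the score is minus one half of the base-10 logarithm of the product of the two geometric-mean $p$-values, and applies it to the best team's $p$-values as listed in Table 1. The function name `dream_score` is illustrative.

```python
import math


def dream_score(p_aupr, p_auroc):
    """Final score from per-network p-values, using geometric means over networks."""
    n = len(p_aupr)
    mean_log_aupr = sum(math.log10(p) for p in p_aupr) / n
    mean_log_auroc = sum(math.log10(p) for p in p_auroc) / n
    return -0.5 * (mean_log_aupr + mean_log_auroc)


# p-values of the best team for Net1..Net5, copied from Table 1.
best_team_auroc_p = [3.334e-18, 1.076e-28, 9.705e-34, 6.736e-33, 1.912e-34]
best_team_aupr_p = [3.309e-34, 7.897e-54, 1.791e-54, 5.489e-47, 4.563e-44]
score = dream_score(best_team_aupr_p, best_team_auroc_p)
print(round(score, 3))
```

The result agrees with the best team's score of 37.428 in the last column of Table 1 up to rounding, and the same computation on the other rows reproduces their scores as well.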

Prediction Performances of the Weight Matrix

To evaluate the prediction accuracy of $W_q$, $W$ is normalized by using some typical vector norms, such as the 1-norm and the Euclidean norm. Moreover, it is reported that when $q$ is set to 3.5, the structure inference performance is improved significantly [17]. Thus, each column of $W$ is also normalized by using the 3.5-norm. The corresponding results are given in Table 1. The performance of the unnormalized $W$ is also included in Table 1.

To compare prediction performances with the best team, the corresponding specifications are also included in Table 1, obtained directly from the web site of the DREAM project. They are reported with fewer digits than the other results, which are obtained through actual computations. In addition, the corresponding $p$-value of each specification is given in parentheses. In the last column of Table 1, the obtained score is also given for each method. From Table 1, it is clear that the normalization step improves the structure inference performance remarkably. Specifically, when $q$ is chosen as 2 or 3.5, the final score even exceeds the best team's final score.

The final score is an important specification in inferring the structure of GRNs, but it does not reveal the precision of the predictions. In topology estimation, highly confident predictions can provide good guidance for biological experiment designs [22]. However, these predictions are helpful only if their precision is sufficiently high. This requires that a desirable estimation algorithm have a PR (precision-recall) curve that starts from the upper left corner and decreases monotonically and slowly as the recall increases. The ROC and PR curves of each network for the unnormalized and the normalized weight matrices are presented in Figure 1.
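To make the requirement on the PR curve concrete, the sketch below computes precision and recall along a ranked list of predicted edges. It is only an illustration: the function name `precision_recall`, the toy scores and the toy labels are hypothetical, and tied scores and the exact interpolation used by the DREAM evaluation scripts are not handled.

```python
import numpy as np


def precision_recall(scores, labels):
    """Precision and recall after each prediction, going down the ranked list.

    scores: confidence assigned to each candidate edge (higher = more confident).
    labels: 1 if the edge exists in the reference network, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")   # most confident edge first
    hits = np.cumsum(labels[order])              # true edges found so far
    ranks = np.arange(1, len(labels) + 1)
    precision = hits / ranks
    recall = hits / labels.sum()
    return precision, recall


# Toy example: six candidate edges, three of which are true.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(scores, labels)
print(precision)
print(recall)
```

A PR curve that starts from the upper left corner corresponds to precision values equal to 1 at the top of the ranked list, that is, the most confident predictions are all true edges.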

From Figure 1, the following conclusions can be drawn. The normalization step substantially improves the AUPR and AUROC measures compared with those of the unnormalized weight matrix. Moreover, when a normalized weight matrix is used, most of the PR curves start from the upper left corner. Specifically, when q is chosen as 1 or 3.5, the precision is very good for all five networks. For another of the weight matrices considered, the PR curves start from the upper left corner for all networks except network 4. This high precision implies that the suggested algorithm may be helpful in guiding the design of biological validation experiments.

To investigate how the AUPR and AUROC measures and the final score of the normalized weight matrix are influenced by q, q is swept over an interval using 90 equally spaced samples. The corresponding results are given in Figure 2.

Figure 2. Left: variations of the AUPR and AUROC measures with q; right: variations of the final score with q.

The results in Figure 2(a) suggest that, for each network, the AUROC measure grows with q over the lower part of the sampled range and then remains nearly unchanged. For networks 2, 3 and 4, the AUPR measure first grows with q and then slowly falls. The situation for network 1 is similar, although its turning point lies at a different value of q. For network 5, the AUPR measure first grows and then remains nearly unchanged. As for the final score, it exceeds 38 over part of the sampled range. The results in Figure 2(b) confirm again that the structure inference performance is improved the most when q is set to 3.5.

Prediction Performances of the Modified Weight Matrix

The previous subsection shows that the normalization step improves the prediction performance, especially when the 3.5-norm is used. This subsection investigates the prediction performance of the modified weight matrix produced by the third step. For convenience, q is set to 3.5 throughout this subsection.

To investigate the influence of the power law parameters on the prediction accuracy of the estimation algorithm, optimal values are searched for both parameters. In particular, for every network, the optimal value of the first parameter is searched over a finite set, and the optimal value of the second parameter is searched over an interval using 100 equally spaced samples. The pair of values that maximizes the per-network specification is selected. The corresponding results are given in Table 2.
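The search described above is an exhaustive grid search, which can be sketched as follows. Everything specific in this sketch is hypothetical: the candidate set, the interval bounds and the stand-in objective `toy_score` are placeholders, since the actual values and the actual per-network specification are not reproduced here.

```python
import itertools

import numpy as np


def grid_search(score_fn, first_values, second_values):
    """Return (best_score, best_first, best_second) over the Cartesian grid."""
    best = None
    for first, second in itertools.product(first_values, second_values):
        value = score_fn(first, second)
        if best is None or value > best[0]:
            best = (value, first, second)
    return best


# Hypothetical candidates: a small finite set and 100 equally spaced samples.
first_values = [1, 2, 3, 4, 5]
second_values = np.linspace(0.0, 1.0, 100)


def toy_score(first, second):
    # Stand-in objective with a known maximum at first = 3, second = 1.0.
    # In practice this would be the per-network specification.
    return -((first - 3) ** 2) - (second - 1.0) ** 2


best_score, best_first, best_second = grid_search(
    toy_score, first_values, second_values)
print(best_first, best_second)
```

Because the grid is evaluated once per network, the cost of the search grows with the product of the sizes of the two candidate sets.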

Taking the exponential decay of power law into account, Inline graphic is utilized in these estimations. For comparison, the specifications of the best team, obtained directly from the web site of the DREAM project, are also included in Table 2. The best values of the AUROC and AUPR specifications for each network are written in boldface. In addition, the corresponding p-value for each specification is given in parentheses. The last column of Table 2 gives the score obtained by each method, and the last two rows give the optimal power law parameters for each network. Table 2 shows that, although the AUROC specification of the suggested method is slightly worse than that of the best team for some networks, its AUPR specification is much better for every network. Therefore, the final score of the suggested method is higher than that of the best team.

It is worth noting that in actual applications, the optimal power law parameters are usually not available. On the other hand, it is well known that for most biological systems, one of these parameters lies within a known interval [23]. To test the practical effectiveness of the suggested method, its estimation performance is studied with the power law parameters set to some typical values. The corresponding results are given in Table 3.

For each case, the AUROC and AUPR specifications are presented with the corresponding p-values in parentheses, and the last column of Table 3 gives the obtained score. As in Table 2, the specifications of the best team are also included in Table 3. The performance of this step is clearly affected by the parameters of the power law. Although the estimation performance deteriorates slightly when these parameters deviate from their optimal values, it remains better than that of the available methods.

The ROC and PR curves of each network with empirical and optimal power law parameters are presented in Figure 3. Here, the empirical power law parameters are fixed typical values that are used for every network.

Figure 3 shows that the precision is also very good when the modified weight matrix is used. More importantly, the third step of the proposed method may guarantee that the PR curve starts from the upper left corner. This is verified by Figure 4, which contains two PR curves for Net4: one obtained without the third step and the other obtained with the modified weight matrix. The PR curve of the latter clearly starts from the upper left corner. This feature provides good guidance for biological experiment designs.

Most large-scale networks are sparse, and this sparsity may be approximated by a power law. The developed algorithm employs this property quantitatively by constructing a 0–1 integer programming problem, so that the direct regulation genes of an arbitrary gene can be (sub)optimally estimated. This information is then incorporated into the algorithm through Equations (13) and (14), which is why the proposed method produces highly confident predictions. On the other hand, the third step carries some risks. Specifically, when the assumed in-degree distribution is inaccurate, the prediction accuracy of the modified weight matrix may fall below that of the normalized weight matrix; for some values of the power law parameters, its final score is indeed lower. Therefore, when the in-degree distribution is unreliable or unavailable in practice, the third step should be used with caution.

Performances on the DREAM5 Network Inference Challenge

To evaluate the performance on real data, tests are performed on the DREAM5 Network Inference Challenge. Here, all of the gene expression data offered by the DREAM5 organizers are regarded as multifactorial perturbation data. To better reconstruct the real GRNs, two improvements are introduced. First, the networks in the DREAM5 Network Inference Challenge are much more complicated than those in DREAM4, so the regression function may not be properly approximated by its first order Taylor expansion. In general, if the order of the Taylor series is high enough, the function can be approximated precisely. However, a high order may have adverse effects. In particular, when the fourth order Taylor expansion is used, the matrix inversion required by the least squares estimation becomes infeasible. Therefore, the third order Taylor expansion is used, that is, the expression level of the target gene is approximated by a cubic polynomial in the expression level of the potential regulation gene.
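A third order approximation of this kind can be fitted by ordinary least squares, as in the following sketch. It assumes a polynomial model in the expression level of a single potential regulation gene. The function name `polynomial_ssr` and the synthetic data are hypothetical, and the solver used here is NumPy's `lstsq` rather than the authors' Matlab implementation.

```python
import numpy as np


def polynomial_ssr(x_regulator, x_target, order=3):
    """Least-squares polynomial fit of a target gene on one regulator gene.

    Returns the coefficients (constant term first) and the sum of
    squared residuals of the fit.
    """
    x = np.asarray(x_regulator, dtype=float)
    y = np.asarray(x_target, dtype=float)
    # Design matrix with columns 1, x, x^2, ..., x^order.
    design = np.vander(x, N=order + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    return coeffs, float(residuals @ residuals)


# Synthetic, noise-free example with a genuinely cubic dependence.
x = np.linspace(0.0, 1.0, 50)
y = 0.2 + 0.5 * x - 0.8 * x**3

_, ssr_linear = polynomial_ssr(x, y, order=1)
coeffs, ssr_cubic = polynomial_ssr(x, y, order=3)
print(ssr_linear, ssr_cubic)
```

On data with a nonlinear dependence, the first order fit leaves a visibly larger sum of squared residuals than the third order fit, which is the motivation for raising the order on the more complicated networks.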

Using least squares, the coefficients of the third order approximation and the corresponding sum of squared residuals can be obtained. Second, consider a target gene and two other genes, one of which regulates the target gene while the other has no direct effect on it. Suppose that the quantities entering Equation (7) for these two genes differ only slightly. In this case, their weights may be very close in the weight ranking list. To overcome this drawback, two factors in Equation (7) are replaced by modified factors that depend on an additional parameter and that widen such gaps. In the numerical example considered, a gap of 0.0101 grows to 0.0160 and 0.0190. In general, the value of this parameter is larger than 1, but it cannot be too large, because the modified factor would then tend to 0. Based on our computational experience, setting this parameter to 4 improves the performance significantly. Therefore, Equation (7) is replaced by the following expression:

Because the in-degree distribution is unreliable or unavailable for these networks, the operations of the third step are omitted. The resulting prediction performances are shown in Table 4.

Table 4. Performances on the DREAM5 Network Inference Challenge.

	Net1	Net3	Net4	Score
AUROC	0.7231	0.5469	0.5049	32.9093
p-value	(2.2891e-10)	(0.9996)	(0.9998)
AUPR	0.3438	0.0595	0.0189
p-value	(2.2209e-185)	(7.0052e-4)	(0.9840)


The final score in Table 4 is better than that of the third-ranked team. Furthermore, the improved method is also tested on the DREAM4 In Silico Size100 Multifactorial subchallenge. The results are given in Table 5; the estimation performance of the improved algorithm significantly exceeds that of the best team. These results show that the improved method may be competent to infer gene regulatory networks.

Table 5. Performances on the DREAM4 Multifactorial subchallenge using improved method.

	Net1	Net2	Net3	Net4	Net5	Score
AUROC	0.7510	0.7416	0.7995	0.7865	0.8071	42.8862
p-value	(7.6122e-19)	(1.2736e-30)	(2.1254e-39)	(3.6995e-32)	(2.7275e-36)
AUPR	0.1740	0.1646	0.2524	0.2472	0.2825
p-value	(4.1120e-39)	(3.7733e-58)	(4.7196e-60)	(9.2795e-57)	(9.7252e-65)


Computation Time

In this section, the computational complexity of the proposed method is discussed. It is well known that integer programming is NP-complete, and no polynomial-time algorithm is known for it [18], [19]. Therefore, only the computational complexity of the first two steps is discussed. The main computational module is the least squares estimator, which involves a large matrix multiplication. The size of the matrix involved is determined by the number of experiments and by the order of the Taylor series. Therefore, for a particular network, the computational complexity of the first two steps grows with the number of genes, the number of transcription factors, the number of experiments and the order of the Taylor series.
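The loop structure behind this cost can be sketched as follows: one least-squares problem is solved per transcription factor, with all genes handled together as right-hand sides. This is an illustrative sketch only. The function name `ssr_matrix`, the layout of the expression array (experiments in rows, genes in columns) and the random toy data are assumptions, and self-regressions are not excluded here.

```python
import numpy as np


def ssr_matrix(expression, tf_indices, order=1):
    """Sum of squared residuals for every (transcription factor, gene) pair.

    expression: array of shape (experiments, genes).
    tf_indices: column indices of the candidate transcription factors.
    Returns an array of shape (len(tf_indices), genes).
    """
    expression = np.asarray(expression, dtype=float)
    n_genes = expression.shape[1]
    ssr = np.empty((len(tf_indices), n_genes))
    for row, tf in enumerate(tf_indices):
        # Design matrix: experiments x (order + 1), powers of the TF's expression.
        design = np.vander(expression[:, tf], N=order + 1, increasing=True)
        # Fit all target genes at once (one column of coefficients per gene).
        coeffs, *_ = np.linalg.lstsq(design, expression, rcond=None)
        residuals = expression - design @ coeffs
        ssr[row] = np.sum(residuals**2, axis=0)
    return ssr


# Toy data: 20 experiments, 6 genes, 2 of which are treated as TFs.
rng = np.random.default_rng(0)
expression = rng.random((20, 6))
ssr = ssr_matrix(expression, tf_indices=[0, 2], order=1)
print(ssr.shape)
```

The work per transcription factor scales with the number of experiments, the number of genes and the number of polynomial terms, so raising the order of the Taylor series or the number of genes increases the running time accordingly.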

Using the first order Taylor expansion, the computation times for the five networks in the DREAM4 In Silico Size100 Multifactorial subchallenge are 0.1047 s, 0.1054 s, 0.1042 s, 0.1052 s, and 0.1046 s, respectively. Using the third order Taylor expansion, the computation times are 0.7725 s, 0.7285 s, 0.7338 s, 0.7281 s, and 0.7272 s, respectively. For the three networks of the DREAM5 Network Inference Challenge, the computation times are 65.5030 s, 378.9817 s, and 295.2905 s, respectively.

The computations are performed on a PC with an Intel(R) Core(TM) i5-2400 CPU, 4 GB of RAM, and Matlab 2008a.

Concluding Remarks

In this paper, an algorithm is developed for GRN topology inference from steady state multifactorial perturbation data. The GRN inference problem is decomposed into different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed by using the sum of squared residuals and the Pearson correlation coefficient. The larger the sum of squared residuals is, the weaker the direct regulatory interaction will be. The higher the Pearson correlation coefficient is, the stronger the justification is for applying the regression analysis. Then, the constructed weight of a gene is normalized. To employ the network sparsity quantitatively, a 0–1 integer programming problem is constructed. By solving this problem, the direct regulation genes of an arbitrary gene can be estimated. Lastly, the normalized weight of a gene is modified on the basis of the estimated direct regulations to it. These normalized and modified weights are used to rank the likelihood that the corresponding direct regulations exist.

Computational results with the DREAM4 In Silico Size100 Multifactorial subchallenge show that this method can outperform the available methods, particularly in improving the AUPR specifications. Using the real data provided by the DREAM5 Network Inference Challenge, its estimation performance can be ranked third. In addition, if the prior structural information is reliable, the third step of this method not only improves the final score but may also guarantee that the PR curve starts from the upper left corner, which may be helpful in guiding the design of biological validation experiments.

Although the computational results are promising, many important issues still need further effort. Among them, how to utilize the experimental data to obtain the in-degree distribution of a GRN is currently under investigation.

Supplementary Information

The Matlab files for this method will be offered upon request. Please contact the following email address: xiongj08@mails.tsinghua.edu.cn.

Funding Statement

The reported work was financially supported in part by the 973 Program under Grants 2012CB316504 and 2009CB320602 and by the National Natural Science Foundation of China under Grants 61174122, 61021063, 60721003, and 60625305. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Martin S, Zhang Z, Martino A, Faulon J (2007) Boolean dynamics of genetic regulatory networks inferred from microarray time series data. Bioinformatics 23: 866–874.
2. Ferrazzi F, Sebastiani P, Ramoni M, Bellazzi R (2007) Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks. BMC Bioinformatics 8: S2.
3. Opgen-Rhein R, Strimmer K (2007) From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Systems Biology 1: 37.
4. Zhou T, Wang Y (2010) Causal relationship inference for a large-scale cellular network. Bioinformatics 26: 2020–2028.
5. The DREAM4 In Silico network challenge. Available: http://wiki.c2b2.columbia.edu/dream/index.php/D4c2. Accessed 2012 Aug 30.
6. GeneNetWeaver tool version 2.0. Available: http://gnw.sourceforge.net. Accessed 2012 Aug 30.
7. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring regulatory networks from expression data using tree-based methods. PLoS One 5: e12776.
8. Menéndez P, Kourmpetis Y, Ter Braak C, van Eeuwijk F (2010) Gene regulatory networks from multifactorial perturbations using graphical lasso: application to the DREAM4 challenge. PLoS One 5: e14147.
9. Eisen M, Spellman P, Brown P, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences 95: 14863.
10. Barabási A, Oltvai Z (2004) Network biology: understanding the cell's functional organization. Nature Reviews Genetics 5: 101–113.
11. MacLean D, Elina N, Havecker E, Heimstaedt S, Studholme D, et al. (2010) Evidence for large complex networks of plant short silencing RNAs. PLoS One 5: e9901.
12. Albert R (2005) Scale-free networks in cell biology. Journal of Cell Science 118: 4947–4957.
13. Hempel S, Koseska A, Nikoloski Z, Kurths J (2011) Unraveling gene regulatory networks from time-resolved gene expression data: a measures comparison study. BMC Bioinformatics 12: 292.
14. Weisberg S (1981) Applied linear regression. New York: Wiley.
15. De Jong H (2002) Modeling and simulation of genetic regulatory systems: a literature review. Journal of Computational Biology 9: 67–103.
16. Cantone I, Marucci L, Iorio F, Ricci M, Belcastro V, et al. (2009) A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137: 172–181.
17. Wang Y, Zhou T (2012) A relative variation-based method to unraveling gene regulatory networks. PLoS One 7: e31194.
18. Nemhauser G, Wolsey L (1988) Integer and combinatorial optimization. John Wiley & Sons.
19. Wolsey L (1988) Integer Programming. Hoboken, NJ: John Wiley & Sons.
20. Pinna A, Soranzo N, de la Fuente A (2010) From knockouts to networks: establishing direct cause-effect relationships through graph analysis. PLoS One 5: e12912.
21. Klamt S, Flassig R, Sundmacher K (2010) TRANSWESD: inferring cellular networks with transitive reduction. Bioinformatics 26: 2160–2168.
22. Prill R, Marbach D, Saez-Rodriguez J, Sorger P, Alexopoulos L, et al. (2010) Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS One 5: e9202.
23. Andrecut M, Kauffman S, Madni A (2008) Evidence of scale-free topology in gene regulatory network of human tissues. International Journal of Modern Physics C 19: 283–290.

