Interface Focus. 2019 Dec 13; 10(1): 20190049. doi: 10.1098/rsfs.2019.0049

Reverse engineering gene networks using global–local shrinkage rules

Viral Panchal, Daniel F. Linder
PMCID: PMC6936010  PMID: 31897291

Abstract

Inferring gene regulatory networks from high-throughput ‘omics’ data has proven to be a computationally demanding task of critical importance. Frequently, the classical methods break down owing to the curse of dimensionality, and popular strategies to overcome this are typically based on regularized versions of the classical methods. However, these approaches rely on loss functions that may not be robust and usually do not allow for the incorporation of prior information in a straightforward way. Fully Bayesian methods are equipped to handle both of these shortcomings quite naturally, and they offer the potential for improvements in network structure learning. We propose a Bayesian hierarchical model to reconstruct gene regulatory networks from time-series gene expression data, such as those common in perturbation experiments of biological systems. The proposed methodology uses global–local shrinkage priors for posterior selection of regulatory edges and relaxes the common normal likelihood assumption in order to allow for heavy-tailed data, which were shown in several of the cited references to severely impact network inference. We provide a sufficient condition for posterior propriety and derive an efficient Markov chain Monte Carlo via Gibbs sampling in the electronic supplementary material. We describe a novel way to detect multiple scales based on the corresponding posterior quantities. Finally, we demonstrate the performance of our approach in a simulation study and compare it with existing methods on real data from a T-cell activation study.

Keywords: Bayesian shrinkage, horseshoe prior, gene networks, reverse engineering

1. Introduction

Methods for inferring gene networks attempt to probe the relationships among various cellular constituents such as proteins, metabolites and gene products. These networks and pathways allow intracellular species to interact with each other, so that the resulting coordinated molecular activity can support a wide range of cellular processes such as immune response [1], cell cycle and death. This underlying regulatory control structure is known to have implications in experimental and clinical biology.

The advances in high-throughput technologies, such as the microarray and more recently next-generation sequencing, have given rise to massive amounts of genomic data that can be used to learn these underlying regulatory networks [2,3]. However, network inference has remained a significant challenge because of statistical and computational issues. These computational challenges often arise when detailed modelling assumptions are made, which in turn lead to complicated likelihoods and high dimensionality [4].

A great deal of effort has been directed towards developing methods for learning reaction network systems. There are three broad classes of gene regulatory network inference methods that focus on modelling and constructing underlying gene regulatory networks [5,6]. One of the broad classes of methods is based on mutual information (MI); MI-based methods for estimating network topology were first developed by Butte & Kohane [9], and the algorithm for the reconstruction of accurate cellular networks (ARACNE) [7,8] is a widely used example. MI-based methods often perform well in the study of gene interactions where there are insufficient data or where the gene interactions are complex [10]. However, these methods do not provide information about the dynamic nature of gene regulation, and therefore are not as useful for predicting new observations. A second major class of gene regulatory methods is Bayesian networks [11]. Although Bayesian networks provide probabilistic relationships among genes by constructing directed acyclic graphs, one cannot infer causality among genes. Moreover, Bayesian networks cannot produce feedback loops or cyclic graphs, which is a major limitation. However, this limitation can be overcome by dynamic Bayesian networks, which handle time-series data. The popular gene network inference algorithm known as ‘Banjo’ can be used to construct Bayesian and dynamic Bayesian networks [12] for time-series expression data. Dynamic Bayesian networks are Bayesian networks that can ‘unfold’ over time, so that the network structure is allowed to change at certain changepoints, for example to account for various types of developmental progression [13,14].

A third group of gene network inference methods is based on dynamical systems, usually ordinary differential equations (ODEs) or stochastic differential equations (SDEs). ODEs not only provide deterministic relationships among genes but also represent their causal interactions, which is a major advantage over the other two classes of gene network inference methods. They also yield directed cyclic graphs and can be used for steady-state and time-series gene expression profiles. Several ODE-based algorithms have been proposed in recent years, including network identification by multiple regression (NIR), microarray network identification (MNI) and time-series network identification (TSNI) [15–17]. Other methods are focused on either structural inference or kinetic parameter estimation, with key modelling elements borrowed from a wide range of mathematical tools such as algebraic statistical models [18,19] and regularized objective functions [20]. From a statistical perspective, it is ideal to base inference on the likelihood function, since the likelihood principle has strong axiomatic foundations and the estimates based on likelihoods exhibit optimal properties. The exact data likelihoods for stochastic reaction networks are usually intractable, since each reaction event is seldom recorded. While some work has been done using exact likelihoods [21–24], the methods based on the exact data likelihoods are generally reserved for small systems of not more than a few molecular species types. Approximations to the exact processes such as the diffusion approximation [25–27] and the linear noise approximation (LNA) [28–30] trade some accuracy for computational speed. Both of these approximations become exact under limiting arguments [27], and in the case of the LNA a tractable Gaussian likelihood describing the process transition density can be obtained.

The approximating ODEs and SDEs to the exact systems provide a natural approach to gene network reconstruction through formulation of the inference problem via a regression-type analysis. Morrissey et al. [31] developed a Bayesian reverse engineering strategy based on treating the data as an auto-regressive process, with point mass priors on the adjacency matrix indicating regulatory relationships. Although their data modelling assumption is frequently used for network inference, regressing total mRNA expression on time-lagged expression values is problematic since transcription factors directly affect the rate of change of target mRNA, and not the accumulated amount of target mRNA [10]. Recently, Leday et al. [32] cast the network reconstruction problem into a Bayesian regression framework as well. Unfortunately, their model regresses mRNA expression levels onto all other expression values at the same time point, making causal conclusions less straightforward. In addition, by treating the data as independent, statistical dependencies in transcriptional activity not due to the regulatory effects of measured quantities are not accounted for in their methodology. The method that we propose addresses both of these issues.

2. Methods

Our approach to inferring gene regulatory networks is based on a data model that uses an Euler–Maruyama discretization [33] to approximate the differentials of ODEs/SDEs. This data modelling strategy allows network inference based on SDEs and ODEs to proceed similarly to a standard regression analysis, even though the inferred model is dynamical. Such modelling is expected to be accurate in general for ODE systems contaminated with measurement error, and also for SDE (diffusion) approximations of the exact density-dependent Markov jump processes in a large volume setting. In combination with appropriate priors of a Bayesian model, the reformulation as a linear (statistical) model was shown empirically to produce good network estimates in [4].

2.1. Expression data

Consider gene expression data, $X_i(t_j) \in \mathbb{R}^d$, denoting the expression values of $d$ molecular species at time point $t_j$ for replicate $i$, where $i = 1, \ldots, r$ indexes independent experimental replications of the process measured at time points $t_j$, $j = 1, \ldots, m$, which are not necessarily equidistant. To form the response and design matrix, finite differences between time points are computed by

$$Y_{ij} := \frac{X_i(t_j) - X_i(t_{j-1})}{t_j - t_{j-1}}, \qquad (2.1)$$

and then the response, or approximate differential $Y_{ij}$, can be regressed onto the lagged expression values $X_i(t_{j-1})$. A natural way to perform network inference is to consider estimating the regulatory matrix of the following statistical model:

$$Y_{ij} = \beta X_i(t_{j-1}) + \epsilon_{ij} \quad \text{and} \quad \epsilon_{ij} \sim F_{ij}, \qquad (2.2)$$

where $\beta \in \mathbb{R}^{d \times d}$ is the parameter of interest and $F_{ij}$ is some $d$-dimensional probability measure. For large systems, that is, when $d$ is more than a few tens, some or many of the entries of $\beta$ may be zero; this is often referred to as sparsity. Our Bayesian hierarchical model accounts for this by attempting to shrink the irrelevant entries in $\beta$ to zero using priors we have specialized to the multivariate setting.
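The construction implied by equations (2.1) and (2.2) amounts to stacking finite differences and lagged expression values across replicates. The R sketch below illustrates this step under the assumption that the data are stored as a list of r matrices, one per replicate, each with m rows (time points) and d columns (species); the function name and data layout are ours, not the authors'.

## Build the stacked response Y and design matrix X of equations (2.1)-(2.2).
## expr_list: list of r matrices, each m x d; times: vector of the m time points.
make_regression_data <- function(expr_list, times) {
  m  <- length(times)
  dt <- diff(times)                                    # t_j - t_{j-1}
  Y <- NULL; X <- NULL
  for (Xi in expr_list) {
    Yi <- sweep(Xi[-1, , drop = FALSE] - Xi[-m, , drop = FALSE], 1, dt, "/")  # eq. (2.1)
    Y  <- rbind(Y, Yi)                                 # approximate differentials
    X  <- rbind(X, Xi[-m, , drop = FALSE])             # lagged values X_i(t_{j-1})
  }
  list(Y = Y, X = X)                                   # both have r(m - 1) rows
}

A call such as make_regression_data(expr_list, times) then supplies the inputs for the regression-type analysis in equation (2.2).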

One pertinent observation about equation (2.2) is that it approximates a linear SDE/ODE. The proposed methodology easily extends to and is well defined in higher order systems; for instance, those obtained by adding higher order terms such as $\beta_2 X_{ij}^2$ to the right-hand side of equation (2.2). This follows from a result we provide in the electronic supplementary material about a more general model, which will guarantee posterior propriety and an efficient Gibbs strategy, regardless of the order of the system. However, the nonlinear forms derived from single-cell biochemistry are not appropriate for data aggregated over cellular populations, such as microarray and RNA-seq. This is because errors are incurred when commuting drift and expectation for nonlinear systems [4], so that averaging over populations will maintain consistency only in linear systems. Furthermore, Oates & Mukherjee [4] report no observed improvements in network inference by including such extra basis functions over the linear model, although our approach easily allows for such an extension. They also show that this linear model includes a number of existing approaches as special cases. They do note however that a crucial challenge is how to adequately account for heterogeneity due to uneven sampling intervals, which they show can lead to problems with inference. The commonly adopted strategy is to model these differences in variation by assuming that $\mathrm{Var}(Y_{ij}) = h(\Delta_j) D(\sigma_1^2, \ldots, \sigma_d^2)$, where $h : \mathbb{R}^+ \to \mathbb{R}^+$ is a variance function and $D$ is a diagonal matrix. However, the correct choice for such a variance function is impossible without knowledge of the unknown process. Our strategy attempts to learn this from the data.

2.2. Hierarchical model

To address these challenges, we propose a fully Bayesian procedure. Our proposed model takes the following form:

$$\begin{aligned}
Y_{ij} \mid q_{ij}, \beta, \Sigma &\sim N\!\left(\beta X_i(t_{j-1}), \frac{\Sigma}{q_{ij}}\right), \qquad q_{ij} \sim \mathcal{G}\!\left(\frac{v}{2}, \frac{v}{2}\right),\\
\pi_1(v) &\propto \left(\frac{v}{v+3}\right)^{1/2} \left\{ g'\!\left(\frac{v}{2}\right) - g'\!\left(\frac{v+1}{2}\right) - \frac{2(v+3)}{v(v+1)^2} \right\}^{1/2}, \qquad v > 0,\\
\beta \mid \Sigma, \tau^2, \lambda^2 &\sim \prod_{k=1}^{d} N(0, \lambda^2 \tau_k^2 \Sigma), \qquad \pi_2(\Sigma) \propto |\Sigma|^{-(d+1)/2},\\
\tau_k &\sim C^+(0,1), \quad k = 1, \ldots, p, \qquad \lambda \sim C^+(0,1).
\end{aligned} \qquad (2.3)$$

The motivation for choosing the terms in the above hierarchical model deserves some attention. We model the error terms, $\epsilon_{ij}$, as coming from the class of scale mixtures of multivariate normal distributions. We posit that this likelihood specification will be accurate in large volume, and will also guard against potential heavy-tailed measurement error, which has been observed in various omics technologies; we also show empirical evidence that our T-cell data exhibit this property. Independence and normality, conditional on the lagged expression and variance, follow from the Markov property and the approximating Gaussian transition density. The data likelihood model above will be a good approximation when the trajectories arise as observations from a stochastic reaction network in large volume, or from a deterministic system, contaminated by (possibly correlated) measurement error. A key aspect of the formulation in equation (2.3) is the variance mixture component, $q_{ij}$, which can automatically adjust for uneven sampling intervals by mixing over scales of the normal distribution. Our choice of the gamma prior on $q_{ij}$ leads to a multivariate Student's t likelihood for fixed degrees of freedom $v$ [34]. We add another layer of hierarchy by assigning the Jeffreys independence prior on $v$ [35,36] to learn the tail heaviness from the data, instead of assigning it a fixed value a priori. In the above prior on the degrees of freedom $v$, $g$ and $g'$ are the digamma and trigamma functions, respectively. We also assign a Jeffreys prior on the covariance matrix $\Sigma$. Our reliance on non-informative priors (NIPs) for these variance components of the likelihood is deliberate, since previous work has shown that inference is sensitive to these choices, and we prefer to learn them primarily from the data. Since the joint prior is then improper, checking for posterior propriety becomes imperative, and we give a sufficient condition in the electronic supplementary material for posterior propriety in a more general setting.
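The scale-mixture representation in equation (2.3) can be checked directly by simulation: mixing a multivariate normal over a Gamma(v/2, v/2) scale yields multivariate Student's t errors with v degrees of freedom. The short R illustration below is our own (Sigma, v and the sample size are arbitrary choices), not the authors' code.

## Scale mixture of normals: eps | q ~ N(0, Sigma / q), q ~ Gamma(v/2, v/2)
## implies each marginal of eps is Student's t with v degrees of freedom.
library(MASS)                                   # for mvrnorm()
set.seed(1)
d <- 3; v <- 4; Sigma <- diag(d); n <- 1e5
q   <- rgamma(n, shape = v / 2, rate = v / 2)   # mixing variables q_ij
eps <- mvrnorm(n, mu = rep(0, d), Sigma = Sigma) / sqrt(q)
## compare an upper tail quantile of the first coordinate with the t_v quantile
c(empirical = quantile(eps[, 1], 0.99), t_quantile = qt(0.99, df = v))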

We assign an independent scale mixture of multivariate normals to the columns of $\beta$. This is a multivariate extension of the class of global–local shrinkage priors, where $\tau_k^2$ controls local shrinkage of the kth column vector of $\beta$ and $\lambda^2$ controls global shrinkage across all parameters. Global–local shrinkage priors shrink small signals strongly towards the prior mean while leaving large signals unshrunk. The half-Cauchy distribution, $C^+(0,1)$, on the roots of the variance components produces marginal densities termed the ‘horseshoe prior’ on $\beta$, with an infinite spike at the origin (figure 1) and heavy tails (figure 2) that together are able to separate signal from noise [37]. Importantly, although this prior is continuous, it has been shown to behave strikingly like the two-group model, or point mass mixture prior.

Figure 1. A comparison of the horseshoe prior density versus Cauchy and Laplace densities. The solid blue line indicates that the horseshoe prior approaches ∞ near 0. (Online version in colour.)

Figure 2. Tails of horseshoe, Cauchy and Laplace densities. This figure shows that the tail of the horseshoe prior is heavier than that of the Laplace prior. (Online version in colour.)
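Because the horseshoe marginal has no closed form, the behaviour summarized in figures 1 and 2 is easiest to see by Monte Carlo. The sketch below is our own illustration (with the global scale fixed at 1): it draws from the horseshoe by first drawing a half-Cauchy local scale and then a normal, and compares mass near the origin and in the tails with the Cauchy and Laplace densities.

## Horseshoe draws: beta | lambda ~ N(0, lambda^2), lambda ~ C+(0, 1)
set.seed(2)
n       <- 1e6
lambda  <- abs(rcauchy(n))                  # half-Cauchy local scales
beta_hs <- rnorm(n, mean = 0, sd = lambda)
## average density over (-0.01, 0.01): the horseshoe spike is unbounded at 0
c(horseshoe = mean(abs(beta_hs) < 0.01) / 0.02,
  cauchy    = dcauchy(0),
  laplace   = 0.5 * exp(0))
## tail mass beyond 10: the horseshoe tail is heavier than the Laplace tail
c(horseshoe = mean(abs(beta_hs) > 10),
  cauchy    = 2 * (1 - pcauchy(10)),
  laplace   = exp(-10))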

2.3. Posterior computation

An extremely useful and practical aspect of the hierarchical model in (2.3) is that it leads to an efficient computational strategy via data augmentation [6]. Here, we provide the Gibbs algorithm for posterior computation, and have made the corresponding R code implementation available online at https://github.com/v-panchal/GLP-GRN.

[Display of the Gibbs sampling scheme: the full conditional distributions for β, Σ, the q_ij, v, the τ_k and λ, derived in the electronic supplementary material and implemented in the linked R code.]

In our notation above, $Q$ is the $r(m-1) \times r(m-1)$ diagonal matrix with elements $q_{ij}$, $T$ is the $d \times d$ diagonal matrix whose kth element is $(\lambda^2 \tau_k^2)^{-1}$, $\Omega = (X^{\top} Q^{-1} X + T^{-1})^{-1}$, and $\mu = \Omega X^{\top} Q^{-1} Y$. The quantity $X$ is the $r(m-1) \times p$ design matrix with (ij)th row $X_i(t_{j-1})$ and $Y$ is the $r(m-1) \times d$ matrix whose (ij)th row is $Y_{ij}$, for $i = 1, \ldots, r$ and $j = 2, \ldots, m$. The term $\mathrm{MN}_{d,d}(\mu, \Sigma, \Omega)$ is the matrix normal distribution with location $\mu$ and scale matrices $\Sigma$ and $\Omega$. $\mathcal{G}(\cdot)$ is a gamma distribution and $\mathcal{IG}(\cdot)$ denotes the inverse gamma distribution with respective parameters. $\mathcal{IW}_d(r(m-1), \Phi^{-1})$ is the inverse-Wishart distribution with degrees of freedom $r(m-1)$ and scale matrix $\Phi^{-1}$.
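To make the notation concrete, the R sketch below draws the coefficient matrix from the matrix normal full conditional implied by Ω and μ above. It is our own illustrative implementation, not the authors' released sampler (available at the GitHub link); in particular, the orientation convention (Y = XB + E, with B the transpose of β) and the assignment of Ω and Σ to row and column scales are our assumptions.

## One Gibbs update of the coefficient matrix given q, Sigma, tau^2 and lambda^2.
## Y: r(m-1) x d response, X: r(m-1) x d design, q: vector of the q_ij values.
draw_beta <- function(Y, X, q, Sigma, tau2, lambda2) {
  d     <- ncol(X)
  Qinv  <- diag(1 / q)                          # Q^{-1}, diagonal
  Tinv  <- diag(1 / (lambda2 * tau2))           # T^{-1}, diagonal with (lambda^2 tau_k^2)^{-1}
  Omega <- chol2inv(chol(crossprod(X, Qinv %*% X) + Tinv))   # (X'Q^{-1}X + T^{-1})^{-1}
  Mu    <- Omega %*% crossprod(X, Qinv %*% Y)                # Omega X'Q^{-1}Y
  Z     <- matrix(rnorm(d * d), d, d)
  Mu + t(chol(Omega)) %*% Z %*% chol(Sigma)     # matrix normal draw: row scale Omega, column scale Sigma
}

The remaining updates (Σ from an inverse-Wishart, the q_ij from a gamma, and τ_k, λ from their half-Cauchy hierarchies, e.g. via the inverse-gamma representation of [43]) follow the same pattern and are given in the electronic supplementary material.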

2.4. Multiscaling

We outline here a novel method for using posterior quantities both to separate signal from noise and to detect different scales in the signals. The idea behind multiscale clustering is to categorize parameters into clusters and then use these clusters to perform model selection. First, we calculate the mean and standard deviation of each coefficient from the posterior sample. Using the absolute value of the posterior mean of a coefficient, |μβ|, together with |μβ|/σβ, we form the two-dimensional vector (|μβ|, |μβ|/σβ) to determine the different groups of coefficients. As an example, consider a reaction system with fast, slow and superfluous regulatory interactions. These would correspond to large, small and zero reaction rates/coefficients, giving three distinct clusters. Large |μβ| and large |μβ|/σβ represent fast reactions that are likely to be non-zero, while small |μβ| and large |μβ|/σβ indicate slow non-zero ones, with the rest being zero. If a researcher suspects more than two scales, additional clusters can be used for further scale detection. For instance, consider the absolute posterior mean values of a hypothetical coefficient vector, β = (1.2, 0.9, 1.5, 0.02, 0.012, 0.05, 0.01, 0.003, 4.5, 5.15, 4.2, 0.012, 0.033, 0.025). In this case, there are three obvious clusters: cluster 1 contains the values close to zero, cluster 2 contains (1.2, 0.9, 1.5) and cluster 3 contains (4.5, 5.15, 4.2). We use K-means clustering to identify these different scales in subsequent analyses, as sketched below.
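The following R sketch implements this selection step under assumptions of our own: the posterior sample is stored as a draws-by-coefficients matrix, three clusters are requested, and the cluster whose centre has the smallest mean |μβ| is labelled noise.

## Multiscale edge selection from a posterior sample of the vectorized coefficients.
## beta_draws: (number of MCMC draws) x (number of coefficients) matrix.
multiscale_select <- function(beta_draws, k = 3) {
  mu    <- abs(colMeans(beta_draws))                     # |mu_beta|
  sig   <- apply(beta_draws, 2, sd)                      # sigma_beta
  feat  <- cbind(mu = mu, ratio = mu / sig)              # (|mu_beta|, |mu_beta|/sigma_beta)
  cl    <- kmeans(feat, centers = k, nstart = 25)
  noise <- which.min(cl$centers[, 1])                    # cluster with smallest mean |mu_beta|
  which(cl$cluster != noise)                             # indices retained as edges
}

For the reaction-system example above, the fast, slow and zero coefficients would fall into the three clusters, and only the first two groups would be retained.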

3. Simulation study

In this section, we design a simulation study to investigate how the proposed hierarchical model performs when the truth is known. We compare it with some standard statistical approaches across different metrics.

3.1. Parameter settings

We simulate time-series gene expression profiles under the assumption of the following model:

$$\frac{dZ_i(t)}{dt} = \beta Z_i(t),$$

where $\beta$ is the $d \times d$ regulatory matrix of the underlying true network. We set $d = 25$, so that $\beta$ contains 25 × 25 = 625 unknown parameters of interest. For the reported results, we assume 10 true non-zero coefficients with values 0.91, 5.23, −1.04, 4.86, 4.5, 1.19, −0.81, −4.25, −4.23, 5.53 and set the rest to zero. This corresponds to a fast scale of coefficients (5.23, 4.86, 4.5, −4.25, −4.23, 5.53), a slower scale of coefficients (0.91, −1.04, 1.19, −0.81) and the zero ones. The true regulatory network structure is shown in figure 3. Thus, the sparsity level here is 10/625 = 0.016. Each replicate has initial conditions randomly sampled from the $d = 25$ dimensional truncated normal distribution with mean 11 and variance 36, truncated below at 5 and above at 15. This simulates heterogeneity in the initial conditions between replications. We collect the ODE data at $m = 6$ randomly sampled time points between 0 and 5 and then contaminate them with noise, i.e. $X_i(t_j) = Z_i(t_j) + \tilde{\epsilon}_{ij}$, to arrive at the final data. We consider $\tilde{\epsilon}_{ij}$ to be either normally distributed with variance $\sigma^2$ or Student's t distributed with a given number of degrees of freedom (d.f.). The response, $Y$, is computed as discussed earlier by

$$Y_i(t_j) = \frac{X_i(t_j) - X_i(t_{j-1})}{t_j - t_{j-1}},$$

and the network inference is then based on estimating the coefficient/regulatory matrix of the multivariate regression model

$$Y_i(t_j) = \beta X_i(t_{j-1}) + \epsilon_{ij}, \qquad (3.1)$$

using the ordinary least-squares approach (OLS), the Jeffreys NIP, the least absolute shrinkage and selection operator (LASSO) approach (LS), and our Bayesian hierarchical model using the global–local prior (GLP). The OLS approach is perhaps the most frequently used classical method for estimation, but is known to have serious problems in sparse or data-poor situations. For the LS method, the penalty parameter is chosen via 10-fold cross-validation. The Jeffreys prior $\pi(\beta, \Sigma) \propto |\Sigma|^{-(d+1)/2}$ is an NIP measure in our current setting, but it unfortunately leads to an improper posterior distribution under small replications. In the general multivariate regression setting, this happens when $n < p + d$, so that, as the number of predictors $p$ grows, the Jeffreys prior is no longer a viable option. We prove in the electronic supplementary material that, for our partially informative prior on $\beta$ and $\Sigma$, a sufficient condition for propriety is $n \geq d$ in both the general setting and our specialized setting.
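For concreteness, the R sketch below generates one such synthetic dataset. The placement of the non-zero coefficients, the exact time grid and the error settings are illustrative choices of ours (figure 3 fixes the true edge positions used in the paper); the matrix exponential from the expm package is used to solve the linear ODE.

## Simulate r replicates of the d-dimensional linear ODE dZ/dt = beta Z,
## observed with noise at m irregular time points, as in section 3.1.
library(expm)                                   # for expm(), the matrix exponential
set.seed(3)
d <- 25; r <- 8; m <- 6
beta <- matrix(0, d, d)
beta[sample(d * d, 10)] <- c(0.91, 5.23, -1.04, 4.86, 4.5,
                             1.19, -0.81, -4.25, -4.23, 5.53)   # 10 true edges
times <- sort(runif(m, 0, 5))
rtnorm <- function(n, mean, sd, lo, hi) {       # truncated normal via rejection
  x <- rnorm(n, mean, sd)
  while (any(bad <- x < lo | x > hi)) x[bad] <- rnorm(sum(bad), mean, sd)
  x
}
sim_replicate <- function() {
  z0 <- rtnorm(d, mean = 11, sd = 6, lo = 5, hi = 15)           # truncated N(11, 36)
  Z  <- t(sapply(times, function(t) as.numeric(expm(beta * t) %*% z0)))
  Z + matrix(rt(m * d, df = 3), m, d)                           # Student's t noise
}
expr_list <- replicate(r, sim_replicate(), simplify = FALSE)
dat <- make_regression_data(expr_list, times)   # Y and X from the earlier sketch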

Figure 3. True network for the simulation study with 25 nodes and 10 edges. It is a directed graph with blue lines for positive regulation, red lines for negative regulation and edge thickness proportional to coefficient magnitude. (Online version in colour.)

We note that the measurement time points are not equally spaced. The above error term, $\epsilon_{ij}$, is intended to capture the measurement errors $\tilde{\epsilon}_{ij}$ from the Euler data-processing step, together with stochastic fluctuations that are heterogeneous across time, by accounting for this through the mixing of the data likelihood over the error variance in the hierarchical model. Moreover, different combinations of $\sigma^2$ and d.f. are considered, as well as differing numbers of replicates $r$, to study low- and high-dimensional situations. The following combinations of $r$, $\sigma^2$ and d.f. are used in our study:

(i) r ∈ {8, 12, 16}, corresponding to n = r × (m − 1) ∈ {40, 60, 80};
(ii) σ² ∈ {0.01, 0.1, 1}, for normal errors;
(iii) d.f. ∈ {3, 5, 10}, for Student's t errors.

3.2. Simulation results

We compare the four approaches previously mentioned in terms of prediction accuracy, estimation and variable selection. For prediction accuracy, we calculate the predicted median absolute deviation (PMAD) between true and predicted responses. For parameter estimation, we compute the Frobenius norm (FN) between the true coefficient matrix and the estimated one. Table 1 summarizes both the estimation and prediction performance of all the methods under Student's t error. In all examples, the global–local shrinkage prior significantly outperforms its competitors. We can see that, as the sample size decreases, FN and PMAD increase for all methods; however, the GLP is less affected by drastic changes in the sample size. It is also noted that, with increasing d.f., FN and PMAD decrease for all methods, as expected. We reason that the improved performance is due to the proposed model's ability to handle both sparsity and heavier tailed, heterogeneous error. We report similar findings for normal errors (table 2) with different measurement error variances σ² = (0.01, 0.1, 1). The fully NIP cannot be applied in the smallest sample size setting, because the posterior distribution is improper when n < p + d; note that in our specialized setting, since p = d = 25, this implies impropriety when n < 50. We indicate this as NA in the subsequent tables. Additionally, we note that OLS exhibits erratic behaviour because of identifiability issues (table 1).

Table 1.

Simulation results for Student’s t error. The numbers are the averages of the Frobenius norm (FN) and predicted median absolute deviation (PMAD) from 100 simulation replications for different scenarios of sample sizes (n) and degrees of freedom (d.f.).

                 FN                               PMAD
 n   d.f.   GLP     OLS     NIP     LS       GLP    OLS    NIP    LS
 80   3     5.37   13.88   13.19   7.33      1.85   3.06   2.90   1.99
 80   5     4.43   12.74   12.45   6.36      1.62   2.69   2.58   1.73
 80  10     4.08   12.45   12.52   5.48      1.50   2.53   2.44   1.57
 60   3     5.79   17.35   17.26   8.24      1.89   3.74   3.73   2.10
 60   5     4.75   16.12   16.48   7.32      1.67   3.42   3.40   1.81
 60  10     4.44   16.10   16.51   6.42      1.54   3.22   3.21   1.63
 40   3     7.56   24.83     NA    9.28      2.01   5.37    NA    2.18
 40   5     6.71   25.79     NA    8.94      1.79   5.10    NA    1.99
 40  10     6.24   24.59     NA    8.32      1.63   4.76    NA    1.82

Table 2.

Simulation results for normal error. The numbers are the averages of the Frobenius norm (FN) and predicted median absolute deviation (PMAD) from 100 simulation replications for different scenarios of sample sizes (n) and variances (σ2).

                 FN                                PMAD
 n    σ²     GLP     OLS      NIP     LS       GLP    OLS    NIP    LS
 80  0.01    3.55   25.89    28.09   3.34      0.15   1.29   1.32   0.17
 80  0.10    3.57   13.56    13.65   3.46      0.43   1.48   1.45   0.46
 80  1.00    4.04   12.03    12.24   5.18      1.37   2.38   2.30   1.44
 60  0.01    3.73   58.22    63.70   3.46      0.16   2.85   2.89   0.18
 60  0.10    3.77   22.73    23.65   3.77      0.45   2.77   2.76   0.47
 60  1.00    4.24   16.04    16.26   6.24      1.42   3.13   3.12   1.52
 40  0.01    4.28  119.28      NA    4.13      0.17   4.68    NA    0.21
 40  0.10    4.29   45.08      NA    4.80      0.49   4.53    NA    0.55
 40  1.00    5.35   24.52      NA    8.19      1.51   4.73    NA    1.67

To illustrate the effect of shrinking coefficients using the global–local shrinkage prior in the variable selection procedure proposed in §2.4, we plot the estimated stable edges over 100 iterations in figure 4. These are the edges appearing in at least half of the iterations. We also select the stable edges in the real data analysis in this way, that is, as those edges from the full data analysis that were also present in at least half of the inferred networks from the bootstrap resamples; see below. From this figure, we can see that, as the sample size increases, our method tends to identify a higher number of true edges than the OLS and NIP methods. Moreover, the numbers of true edges look similar for the proposed method and the LS method at the larger sample sizes, but the proposed method performs better for smaller sample sizes. Although we see some falsely identified edges, there are comparatively fewer than for the other methods. The method based on the NIP performs worse than the proposed method, but better than OLS, in terms of network identification. We quantify the network inference capabilities of the different methods by reporting the average true positive rate (TPR), the proportion of true edges correctly identified by a method; the false discovery rate (FDR), the ratio of falsely identified edges to the total number of selected edges; and the false positive rate (FPR), the proportion of non-edges incorrectly selected, as illustrated in tables 3 and 4. The formulae for each of these evaluation criteria are given as

$$\text{FN} = \left( \sum_{i=1}^{p} \sum_{j=1}^{p} |\beta_{ij} - \hat{\beta}_{ij}|^2 \right)^{1/2}, \quad \text{PMAD} = \operatorname{median}(|Y - \hat{Y}|), \quad \text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \quad \text{FDR} = \frac{\text{FP}}{\text{FP} + \text{TP}} \quad \text{and} \quad \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}},$$
where TP, FP, TN and FN in the last three expressions denote the counts of true positive, false positive, true negative and false negative edges, respectively.
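These criteria are straightforward to compute once an estimated coefficient matrix is thresholded to an edge set; the helper below is our own, with the false negative count renamed to avoid clashing with the Frobenius norm abbreviation.

## Score an estimated network against the truth using the criteria defined above.
## beta_hat, beta_true: d x d matrices; Y, Y_hat: observed and fitted responses.
score_network <- function(beta_hat, beta_true, Y, Y_hat) {
  sel   <- beta_hat  != 0                       # selected edges
  truth <- beta_true != 0                       # true edges
  TP <- sum(sel & truth);   FP   <- sum(sel & !truth)
  TN <- sum(!sel & !truth); FNeg <- sum(!sel & truth)
  c(FN   = sqrt(sum((beta_true - beta_hat)^2)), # Frobenius norm
    PMAD = median(abs(Y - Y_hat)),
    TPR  = TP / (TP + FNeg),
    FDR  = FP / (FP + TP),
    FPR  = FP / (FP + TN))
}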

Figure 4. Inferred networks for the simulation example with the GLP (green), OLS method (pink), NIP (blue) and LS method (orange) for different sample sizes (n = 80, 60 and 40). These are directed graphs with the blue lines indicating positive regulation, red lines indicating negative regulation and edge thickness proportional to the coefficient magnitude. Plots were generated using the R CRAN package igraph. (Online version in colour.)

Table 3.

Simulation results for Student’s t error. The numbers are the averages of the true positive rate (TPR), the false discovery rate (FDR) and the false positive rate (FPR) from 100 simulation replications for different scenarios of sample sizes (n) and degrees of freedom (d.f.).

              TPR                           FDR                           FPR
 n   d.f.  GLP    OLS    NIP    LS       GLP    OLS    NIP    LS       GLP    OLS    NIP    LS
 80   3    0.950  0.924  0.940  0.937    0.554  0.894  0.881  0.706    0.021  0.131  0.117  0.043
 80   5    0.959  0.954  0.961  0.961    0.441  0.875  0.866  0.605    0.013  0.114  0.106  0.028
 80  10    0.978  0.947  0.954  0.960    0.405  0.868  0.864  0.501    0.012  0.106  0.102  0.018
 60   3    0.915  0.870  0.882  0.912    0.587  0.919  0.914  0.736    0.024  0.168  0.161  0.047
 60   5    0.929  0.871  0.883  0.928    0.516  0.911  0.910  0.669    0.017  0.153  0.153  0.038
 60  10    0.948  0.891  0.888  0.945    0.486  0.902  0.905  0.609    0.016  0.143  0.147  0.028
 40   3    0.818  0.785   NA    0.850    0.622  0.949   NA    0.768    0.025  0.247   NA    0.055
 40   5    0.856  0.786   NA    0.853    0.610  0.947   NA    0.738    0.024  0.241   NA    0.047
 40  10    0.853  0.811   NA    0.880    0.566  0.947   NA    0.710    0.020  0.248   NA    0.042

Table 4.

Simulation results for normal error. The numbers are the averages of the true positive rate (TPR), the false discovery rate (FDR) and the false positive rate (FPR) from 100 simulation replications for different scenarios of sample sizes (n) and variances (σ2).

              TPR                           FDR                           FPR
 n    σ²   GLP    OLS    NIP    LS       GLP    OLS    NIP    LS       GLP    OLS    NIP    LS
 80  0.01  1.000  0.953  0.930  1.000    0.282  0.804  0.801  0.287    0.006  0.066  0.064  0.007
 80  0.10  1.000  0.981  0.978  1.000    0.284  0.795  0.790  0.288    0.006  0.068  0.065  0.007
 80  1.00  0.983  0.968  0.963  0.962    0.370  0.854  0.846  0.436    0.010  0.097  0.091  0.013
 60  0.01  1.000  0.760  0.739  0.996    0.278  0.889  0.884  0.291    0.006  0.104  0.098  0.007
 60  0.10  0.999  0.884  0.883  0.993    0.286  0.887  0.885  0.304    0.007  0.119  0.117  0.008
 60  1.00  0.948  0.872  0.869  0.947    0.449  0.894  0.891  0.561    0.014  0.125  0.119  0.022
 40  0.01  0.998  0.459   NA    0.963    0.290  0.921   NA    0.416    0.007  0.088   NA    0.012
 40  0.10  0.980  0.784   NA    0.966    0.327  0.935   NA    0.464    0.008  0.189   NA    0.015
 40  1.00  0.875  0.807   NA    0.872    0.564  0.946   NA    0.701    0.020  0.240   NA    0.038

Inspection of tables 3 and 4 reveals that the GLP improves edge selection, with the highest TPR and the lowest FPR and FDR among the methods in all situations considered. We note that the GLP performs slightly better than LS in the case of normal error, as expected. However, with Student's t error, the performance of our method is significantly better than that of LS. These findings illustrate that not only is inference under the GLP hierarchy valid, but that it is robust across a range of sample sizes. However, when n approaches d, the posterior propriety threshold, GLP-based inference worsens across the range of metrics, as expected. We again remark that inference via the GLP is valid as long as n ≥ d. In summary, these simulation results show that the proposed method performs consistently better than the other methods, and is robust to sample size and measurement error.

4. T-cell activation data: reverse engineering gene regulatory network

In this section, we use the proposed methodology to infer a gene regulatory network from the time-series microarray data for T-cell activation presented in [38]. The data originate from an experiment performed to identify the response of a human T-cell line to phorbol 12-myristate 13-acetate (PMA) and ionomycin treatment. The authors investigated the expression of 88 genes using complementary DNA (cDNA) across 10 time points. Human T-cell line cells were treated with PMA and ionomycin, and cells were collected at the following time points after treatment: 0, 2, 4, 6, 8, 18, 24, 32, 48 and 72 h. Then, fluorescence-activated cell scanning (FACS) analysis was performed to measure T-cell expression and activation markers.

Two identical experiments were conducted on two sets of microarrays. The first experiment consisted of microarrays representing 34 replications, with the second experiment containing 10 replications. The authors pre-selected 58 genes out of 88 genes after filtering out gene expression values based on a pre-defined threshold value. In addition, normalization and log-transformation were performed to minimize systematic variation due to experimental situations and to normalize the data.

We consider these pre-processed data on all 44 replications and 58 genes for the final analysis. The design matrix, X, and the generated response matrix, Y, are computed according to equation (2.1). We perform preliminary diagnostics for the error terms by calculating residuals from a least-squares fit to the data, that is, from regressing the generated response onto the design matrix. We provide some of the corresponding Q–Q plots in figure 5 for a subset of four different expression types. The figure indicates that the marginal error distributions are nearly symmetric but exhibit very heavy tails, and, indeed, formal statistical tests of multivariate normality all reject normality (p < 0.0001).
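The diagnostics just described can be reproduced with a few lines of R. The sketch below assumes Y and X are the stacked matrices from equation (2.1); the choice of the MVN package for the formal test is our own and is not stated in the paper.

## Residual diagnostics: multivariate least-squares fit, marginal Q-Q plots
## for four genes, and (optionally) a formal test of multivariate normality.
fit <- lm(Y ~ X - 1)                      # multivariate regression, no intercept
R   <- residuals(fit)                     # r(m - 1) x d matrix of residuals
op  <- par(mfrow = c(2, 2))
for (k in sample(ncol(R), 4)) { qqnorm(R[, k]); qqline(R[, k]) }
par(op)
# MVN::mvn(R, mvnTest = "mardia")         # e.g. Mardia's test (package choice is ours)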

Figure 5. Q–Q plots.

The final gene regulatory network that we report is computed by first fitting our Bayesian hierarchical model to the full data and then selecting, from the inferred edges, the stable edges, that is, those edges that showed up in at least 50% of 100 bootstrap resamples (figure 6). The multiscaling strategy based on the posterior was used in each of the 100 resamples to compute the network for that sample. Furthermore, we quantify uncertainty in a frequentist sense by setting the edge thickness to be the proportion of the 100 resamples in which a particular edge was selected. The R package igraph was used to plot the inferred gene regulatory network in figure 6; a sketch of the resampling scheme is given below. Next, we discuss some of the biological validity of these findings.
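The stable-edge rule can be summarized as follows; fit_glp() is a placeholder for one full run of the sampler plus the multiscale selection step (it is not defined here), and make_regression_data() is the helper from §2.1.

## Bootstrap the replicates, refit, and keep edges present in at least a
## fraction `thresh` of the B resampled fits; edge_prob gives the plotted thickness.
stable_edges <- function(expr_list, times, fit_glp, B = 100, thresh = 0.5) {
  d      <- ncol(expr_list[[1]])
  counts <- matrix(0, d, d)
  for (b in seq_len(B)) {
    resample <- expr_list[sample(length(expr_list), replace = TRUE)]
    dat      <- make_regression_data(resample, times)
    counts   <- counts + fit_glp(dat$Y, dat$X)      # 0/1 adjacency from one fit
  }
  edge_prob <- counts / B
  list(adjacency = (edge_prob >= thresh) * 1, edge_prob = edge_prob)
}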

Figure 6. Directed gene regulatory network representing the elements of the regulatory matrix learned from the T-cell data. The blue lines indicate positive regulation, red lines indicate negative regulation and edge thickness is proportional to the probability that an edge appears in the 100 resamples. (Online version in colour.)

The process of T-cell activation and response comprises four main steps: adhesion, inflammation, differentiation and apoptosis. The network in figure 6 consists of genes involved in all four of these steps. In the inferred network, ZNFN1A1 (gene 8), EGR1 (gene 28), MCL1 (gene 30), FYB (gene 45), IL2RG (gene 46) and IL3RA (gene 55) have the highest number of connections. ZNFN1A1 encodes a zinc finger protein called Ikaros, which is essential for T-cell proliferation and differentiation [39]. ZNFN1A1 inhibits cell death by negatively influencing the expression of CASP4 and CASP8. Also, JUND affects T-cell proliferation by upregulating ZNFN1A1. We identified three FYB-regulated genes that were also found by Rangel et al. [38]; these genes are involved in activation (GATA3), proliferation (API2) and inflammation (IL2RG). The FYB gene plays a vital role in T-cell proliferation by activating the API2 gene, which functions in the cell by inhibiting caspases in the induction of cell death. EGR1 is an important gene in the regulation of cell growth, differentiation and apoptosis. Our model demonstrates an interesting association between EGR1 and the API1 gene. EGR1 has previously been identified as an early response gene induced by mitogenic activation [38,40]. According to our model, EGR1 negatively regulates API2, a gene responsible for inhibiting cell apoptosis. Furthermore, EGR1 inhibits cell growth and differentiation by suppressing the genes CD69, JUND and JUNB. Several studies have reported the functional role of the EGR1/JUN complex in cell apoptosis [41]. Other examples of genes that appear linked in the inferred network include IL2RG, IL3RA, CD69, TRAF5 and IL16, which are involved in the production of cytokines and regulate the inflammatory response [42].

5. Concluding remarks

In this work, we have proposed a Bayesian hierarchical model using multivariate global–local shrinkage priors to infer gene networks. The proposed method handles heavy-tailed data by assuming a multivariate heavy-tailed data likelihood that mixes over Gaussian variance components. Our simulation study demonstrates superior performance of the proposed method in terms of edge selection, coefficient estimation and outcome prediction when compared with other approaches and in a variety of different situations. Our real data analysis reveals edges that have been confirmed experimentally and suggests interesting novel relationships for T-cell activation.

We have shown that the proposed method performs well in the high-dimensional situation and justify its use in this case by providing sufficient conditions for posterior propriety. This is essential to check when the joint prior distribution is not proper, since propriety of the posterior is required for valid Bayesian inference. When n ≥ d, the proposed hierarchical model has a proper posterior, and importantly this condition makes it straightforward to extend our model to arbitrary system types through simple basis expansion. By contrast, the fully NIP yields an improper posterior distribution in high dimensions. Another contribution of our work is the multivariate extension of the global–local shrinkage prior. We are not aware of any previous work that integrates global–local shrinkage rules into the multivariate setting to reverse engineer gene networks. In the biological context, it is often assumed that many biological networks, such as the one under consideration, are indeed sparse. Thus, in the case of true underlying sparsity, the additional information encoded by the global–local prior significantly improves inference. In this paper, we have demonstrated in the simulations and real data application that the use of the global–local shrinkage prior indeed produces such sparsity a posteriori.

Bayesian approaches often have computational challenges due to the required Markov chain Monte Carlo (MCMC) sampling in complicated or high-dimensional models, but a new interpretation of the horseshoe prior [43] has made our posterior sampling procedure much more straightforward and computationally efficient. Although we alleviate a good deal of the MCMC tuning issues by deriving conjugate full conditional distributions for all parameters, naturally the sampling can become slow with increasing model size. The computational time for our method ranged from 15.63 to 21.96 s and for the NIP method ranged from 18.69 to 23.49 s on a 3.06 GHz 6-Core Intel Xeon CPU. We are currently exploring strategies to gain even more computational efficiency by considering alternative representations of certain full conditionals in the sampler.

Supplementary Material

Reverse Engineering Gene Networks using Global-local Shrinkage Rules Supplementary Material
rsfs20190049supp1.pdf (198.7KB, pdf)

Acknowledgements

The authors thank Bani K. Mallick for helpful discussions about the multiscale clustering strategy.

Data accessibility

The data used in this study are publicly available in the open source software R.

Authors' contributions

Both authors contributed equally to developing the manuscript.

Competing interests

We declare we have no competing interests.

Funding

D.F.L. thanks the Mathematical Biosciences Institute (MBI) at The Ohio State University for partially supporting this research through an early career award. MBI receives its funding through the National Science Foundation, grant no. DMS 1440386.

References

1. Cebula A, et al. 2013. Thymus-derived regulatory T cells contribute to tolerance to commensal microbiota. Nature 497, 258–262. (doi:10.1038/nature12079)
2. Perez OD, Krutzik PO, Nolan GP. 2004. Flow cytometric analysis of kinase signaling cascades. Methods Mol. Biol. 263, 67–94.
3. Wheeler DA, et al. 2008. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876. (doi:10.1038/nature06884)
4. Oates CJ, Mukherjee S. 2012. Network inference and biological dynamics. Ann. Appl. Stat. 6, 1209. (doi:10.1214/11-AOAS532)
5. Bansal M, Belcastro V, Ambesi-Impiombato A, Di Bernardo D. 2007. How to infer gene networks from expression profiles. Mol. Syst. Biol. 3, 122. (doi:10.1038/msb4100158)
6. Liu C. 1996. Bayesian robust multivariate linear regression with incomplete data. J. Am. Stat. Assoc. 91, 1219–1227. (doi:10.1080/01621459.1996.10476991)
7. Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A. 2005. Reverse engineering of regulatory networks in human B cells. Nat. Genet. 37, 382. (doi:10.1038/ng1532)
8. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A. 2006. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7(Suppl. 1), S7.
9. Butte AJ, Kohane IS. 1999. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput. 2000, 418–429.
10. Madar A, Greenfield A, Vanden-Eijnden E, Bonneau R. 2010. DREAM3: network inference using dynamic context likelihood of relatedness and the inferelator. PLoS ONE 5, e9803. (doi:10.1371/journal.pone.0009803)
11. Pearl J. 1985. Bayesian networks: a model of self-activated memory for evidential reasoning. Los Angeles, CA: Computer Science Department, University of California, Los Angeles.
12. Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED. 2004. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 20, 3594–3603. (doi:10.1093/bioinformatics/bth448)
13. Grzegorczyk M, Husmeier D. 2013. Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models. Mach. Learn. 91, 105–154. (doi:10.1007/s10994-012-5326-3)
14. Aderhold A, Husmeier D, Grzegorczyk M. 2014. Statistical inference of regulatory networks for circadian regulation. Stat. Appl. Genet. Mol. Biol. 13, 227–273. (doi:10.1515/sagmb-2013-0051)
15. di Bernardo D, Thompson MJ, Gardner TS, Chobot SE, Eastwood EL, Wojtovich AP, Elliott SJ, Schaus SE, Collins JJ. 2005. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat. Biotechnol. 23, 377. (doi:10.1038/nbt1075)
16. Gardner TS, Di Bernardo D, Lorenz D, Collins JJ. 2003. Inferring genetic networks and identifying compound mode of action via expression profiling. Science 301, 102–105. (doi:10.1126/science.1081900)
17. Girolami M. 2008. Bayesian inference for differential equations. Theor. Comput. Sci. 408, 4–16. (doi:10.1016/j.tcs.2008.07.005)
18. Craciun G, Pantea C, Rempala GA. 2009. Algebraic methods for inferring biochemical networks: a maximum likelihood approach. Comput. Biol. Chem. 33, 361–367. (doi:10.1016/j.compbiolchem.2009.07.014)
19. Linder DF, Rempala GA. 2013. Algebraic statistical model for biochemical network dynamics inference. J. Coupled Syst. Multiscale Dyn. 1, 468–475. (doi:10.1166/jcsmd.2013.1032)
20. Linder DF, Rempała GA. 2015. Bootstrapping least-squares estimates in biochemical reaction networks. J. Biol. Dyn. 9, 125–146. (doi:10.1080/17513758.2015.1033022)
21. Boys RJ, Wilkinson DJ, Kirkwood TB. 2008. Bayesian inference for a discretely observed stochastic kinetic model. Stat. Comput. 18, 125–135. (doi:10.1007/s11222-007-9043-x)
22. Golightly A, Wilkinson DJ. 2011. Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo. Interface Focus 1, 807–820. (doi:10.1098/rsfs.2011.0047)
23. Choi B, Rempala GA. 2012. Inference for discretely observed stochastic kinetic networks with applications to epidemic modeling. Biostatistics 13, 153–165. (doi:10.1093/biostatistics/kxr019)
24. Golightly A, Henderson DA, Sherlock C. 2012. Efficient particle MCMC for exact inference in stochastic biochemical network models through approximation of expensive likelihoods. Stat. Comput. 25, 1039–1055. (doi:10.1007/s11222-014-9469-x)
25. Golightly A, Wilkinson DJ. 2005. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics 61, 781–788. (doi:10.1111/j.1541-0420.2005.00345.x)
26. Finkenstädt B, et al. 2008. Reconstruction of transcriptional dynamics from gene reporter data using differential equations. Bioinformatics 24, 2901–2907. (doi:10.1093/bioinformatics/btn562)
27. Ethier SN, Kurtz TG. 2009. Markov processes: characterization and convergence, vol. 282. Hoboken, NJ: Wiley.
28. Komorowski M, Finkenstädt B, Harper C, Rand D. 2009. Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinformatics 10, 343. (doi:10.1186/1471-2105-10-343)
29. Komorowski M, Costa MJ, Rand DA, Stumpf MPH. 2011. Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proc. Natl Acad. Sci. USA 108, 8645–8650. (doi:10.1073/pnas.1015814108)
30. Fearnhead P, Giagos V, Sherlock C. 2014. Inference for reaction networks using the linear noise approximation. Biometrics 70, 457–466. (doi:10.1111/biom.12152)
31. Morrissey ER, Juárez MA, Denby KJ, Burroughs NJ. 2010. On reverse engineering of gene interaction networks using time course data with repeated measurements. Bioinformatics 26, 2305–2312. (doi:10.1093/bioinformatics/btq421)
32. Leday GGR, de Gunst MCM, Kpogbezan GB, van der Vaart AW, van Wieringen WN, van de Wiel MA. 2017. Gene network reconstruction using global-local shrinkage priors. Ann. Appl. Stat. 11, 41–68. (doi:10.1214/16-AOAS990)
33. Kloeden PE, Platen E. 1992. Numerical solution of stochastic differential equations, vol. 23. Berlin, Germany: Springer.
34. Roy V, Hobert JP. 2010. On Monte Carlo methods for Bayesian multivariate regression models with heavy-tailed errors. J. Multivar. Anal. 101, 1190–1202. (doi:10.1016/j.jmva.2009.12.015)
35. Fonseca TC, Ferreira MA, Migon HS. 2008. Objective Bayesian analysis for the Student-t regression model. Biometrika 95, 325–333. (doi:10.1093/biomet/asn001)
36. Villa C, et al. 2014. Objective prior for the number of degrees of freedom of a t distribution. Bayesian Anal. 9, 197–220. (doi:10.1214/13-BA854)
37. Carvalho CM, Polson NG, Scott JG. 2010. The horseshoe estimator for sparse signals. Biometrika 97, 465–480. (doi:10.1093/biomet/asq017)
38. Rangel C, Angus J, Ghahramani Z, Lioumi M, Sotheran E, Gaiba A, Wild DL, Falciani F. 2004. Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics 20, 1361–1372. (doi:10.1093/bioinformatics/bth093)
39. Karlsson A, Söderkvist P, Zhuang SM. 2002. Point mutations and deletions in the Znfn1a1/Ikaros gene in chemically induced murine lymphomas. Cancer Res. 62, 2650–2653.
40. Pise-Masison CA, Radonovich M, Mahieux R, Chatterjee P, Whiteford C, Duvall J, Guillerm C, Gessain A, Brady JN. 2002. Transcription profile of cells infected with human T-cell leukemia virus type I compared with activated lymphocytes. Cancer Res. 62, 3562–3571.
41. Chen L, et al. 2010. Identification of early growth response protein 1 (EGR-1) as a novel target for JUN-induced apoptosis in multiple myeloma. Blood 115, 61–70. (doi:10.1182/blood-2009-03-210526)
42. Rothe M, Pan MG, Henzel WJ, Ayres TM, Goeddel DV. 1995. The TNFR2-TRAF signaling complex contains two novel proteins related to baculoviral inhibitor of apoptosis proteins. Cell 83, 1243–1252. (doi:10.1016/0092-8674(95)90149-3)
43. Makalic E, Schmidt DF. 2016. A simple sampler for the horseshoe estimator. IEEE Signal Process. Lett. 23, 179–182. (doi:10.1109/LSP.2015.2503725)
