Journal of Computational Biology. 2019 Oct 7;26(10):1113–1129. doi: 10.1089/cmb.2019.0036

Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data

Xiao Liang, William Chad Young, Ling-Hong Hung, Adrian E Raftery, Ka Yee Yeung
PMCID: PMC6786343  PMID: 31009236

Abstract

The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks as well as extends some previous Bayesian frameworks both in theory and applications. We apply our method to two different human cell lines, namely skin melanoma cell line A375 and lung cancer cell line A549, to illustrate the capabilities of our method. Our results show that the improvement in performance could vary from cell line to cell line and that we might need to choose different external data sources serving as prior knowledge if we hope to obtain better accuracy for different cell lines.

Keywords: data integration, gene regulation, machine learning, systems biology

1. Introduction

Gene regulatory networks play an important role in understanding the interactions between genes and have many applications. Also, advances in technology have led to the generation of high-throughput biological data, which can be leveraged in gene network inference. However, inferring gene networks from high-dimensional genomic data can be challenging.

We define a gene regulatory network as a directed graph that represents the regulatory relationships between genes, in which each node represents a gene and each directed edge represents the regulatory relationship between a regulator and a target gene. Furthermore, these regulatory relationships or edges from regulators to target genes can be calibrated by probabilities representing the likelihood of such edges, especially in Bayesian approaches.

1.1. Related work

There is an extensive literature on methods for the inference of human gene regulatory networks. For example, there are studies conducted on inferring gene networks to uncover causal relationships between gene expression and disease, which could facilitate drug discovery and development (Schadt et al., 2005a; Chen et al., 2008; Emilsson et al., 2008) as well as disease biomarkers (Schadt et al., 2005b). As another example, Woo et al. (2015) proposed a method to predict changes in gene expression level after drug perturbation, which offers insight into target prioritization of novel compounds. Also, gene network inference could advance the understanding of the mechanisms underlying various biological processes and identify genes that play important roles in biological activities.

1.1.1. Time series gene expression data

Although time series gene expression data may provide useful information from which gene regulatory relationships can be derived, they can also introduce noise and variations that can subsequently reduce accuracy. Another limitation is that it is difficult to infer causality using time series gene expression data alone without additional data sources, while causality in gene regulatory networks is of great biological interest. In the context of gene networks, an inferred directed edge representing causality in the form of (A → B) means the following: gene A is a regulator of gene B, and if the expression level of gene A changes, then we can expect the expression level of gene B to change as well. However, time series data are only able to provide information on statistical causality but not biological causality.

1.1.2. Perturbation gene expression data

As static expression data without any time points, perturbation data do not reflect any dynamic biological behavior over time. Nevertheless, the experimental design in data generation can be used to derive a causal relationship. Specifically, after gene A is perturbed (i.e., by either knockdown or overexpression), the expression level of gene B is observed to change. Since the causal event (perturbation) is included in the experimental design, we can infer a directed edge (A → B). Knockdown data have been widely used in the literature in gene network inference. For example, Pinna et al. (2010) showed the effectiveness of inferring gene networks using genetic perturbation expression data followed by graph analyses when applied to synthetic data from the DREAM4 in silico network challenge.

1.1.3. Bayesian networks

A Bayesian network is a directed acyclic graph (DAG) that represents a joint probability distribution through the conditional independence relationships between its nodes. Bayesian networks are often used in gene network construction. For example, Friedman et al. (2000) built a framework on Bayesian networks to infer interactions between genes based on multiple expression measurements. Bayesian network methods have been applied to yeast gene expression data and further extended using probabilistic graphical models (Friedman, 2004). Bayesian networks also allow the incorporation of prior knowledge (Werhli et al., 2007). Although only statistical causality can be inferred by a Bayesian network, biological causality can be implied with constraints of prior knowledge or expression levels (Schadt, 2009; Zhu et al., 2010). On the other hand, Bayesian network inference can be time-consuming given a large set of variables. Moreover, because a Bayesian network is a DAG, it cannot contain any feedback loops, which are ubiquitous in real-life biological systems. Using time series gene expression data, dynamic Bayesian networks considering gene expression levels from different time points allow self-loops (Murphy et al., 1999; Kim et al., 2003; Yu et al., 2004; Zhu et al., 2010).

1.1.4. Ordinary differential equations

By representing the model as linear or nonlinear differential equations, the interactions between the expression levels of different genes can be described by variables in the equations. Linear differential equations describe gene interactions at a higher level of abstraction and can be handled by existing linear algebra methods. Nonlinear differential equations are able to describe more complex behaviors, at the price of higher computational cost and stricter constraints (Voit, 2000; De Jong, 2002). Differential equations can be used to model different types of data, such as static gene expression data (Gardner et al., 2003; di Bernardo et al., 2005) and time series gene expression data (Bansal et al., 2006).

Ordinary differential equations (ODEs) could suffer from the curse of dimensionality given a large set of candidate genes such as a human gene set. Dimension reduction techniques, such as forward feature selection (Huang et al., 2010), singular value decompositions (Zhang et al., 2010), and principal component analysis (Bansal et al., 2006), have been used in ODE approaches to reduce the dimensionality of genomic data.

1.1.5. Regression-based methods

By formulating network inference as a variable selection problem, regression-based methods could be solved using existing techniques but suffer from the curse of dimensionality. Commonly used regression-based methods include regularization methods and Bayesian model averaging (BMA). Regularization methods, such as least absolute shrinkage and selection operator (Tibshirani, 1996), least angle regression (Efron et al., 2004) and elastic net (Zou and Hastie, 2005), have been used to model different types of data in gene network inference (van Someren et al., 2006; Charbonnier et al., 2010; Gustafsson and Hörnquist, 2010; Peng et al., 2010). Variations of BMA methods have been proposed to facilitate gene network inference. Examples include iBMA (Yeung et al., 2005), ScanBMA (Young et al., 2014), and fastBMA (Hung et al., 2017) for analyzing high-dimensional gene expression data.

1.1.6. Data integration

Instead of using a single data source, many proposed methods incorporate external knowledge in the construction of gene networks. For example, Le et al. (2004) and Geier et al. (2007) applied Bayesian networks to synthetic data with prior knowledge. Imoto et al. (2003) applied Bayesian networks to gene expression data with known regulatory interactions in yeast. Bonneau et al. (2006) proposed combining time series and knockdown gene expression data, developing a regression model coupled with biclustering algorithms and applying it to infer gene networks using data from the archaeon Halobacterium. Yeung et al. (2011), Lo et al. (2012), Young et al. (2014), and Hung et al. (2017) developed Bayesian regression-based network inference methods that integrate external data with yeast time series gene expression data.

1.2. Our contributions

Here, we present an integrated approach for inferring gene regulatory networks by proposing a Bayesian statistical framework to infer gene regulatory relationships in a directed manner, which is based on integrating external data sources with knockdown data from human cell lines.

This article builds on our previous work (Young et al., 2016, 2017), in which a Bayesian regression framework was developed to infer gene networks from knockdown expression data. Our key contribution is the integration of this Bayesian regression framework with a supervised learning framework that leverages knockdown experiments as its primary data source as well as prior knowledge in the form of gene expression data, genome-wide binding data, gene ontologies, pathway data, and ChIP-Seq data. We show that the accuracy of inferred gene networks increases when we apply our integrated approach to two different human cell lines, skin melanoma cell line A375 and lung cancer cell line A549, the latter of which was not investigated in the aforementioned previous work. Our integrated approach and results not only improve and extend previous Bayesian frameworks and results (cf. Young et al., 2016, 2017) in theory and applications, but also provide a general Bayesian integration framework for systems biology. Moreover, our results show that the improvement in performance could vary from cell line to cell line and that we might need to choose different external data sources serving as prior knowledge if we hope to obtain better accuracy for different cell lines. Selected results were summarized in a two-page extended abstract (Liang et al., 2018).

2. LINCS L1000 Gene Expression Data

The Library of Integrated Cellular Signatures (LINCS) (http://lincsproject.org) is a National Institutes of Health (NIH)-funded program that aims to develop comprehensive signatures of cellular states and related tools (Keenan et al., 2018). Many types of large-scale data were generated to profile changes induced by genetic and drug perturbations across human cell lines. In particular, the LINCS L1000 data generated by the Broad Institute measure the gene expression levels across ∼1000 landmark genes. These landmark genes were chosen to capture ∼80% of the information for 20,000 genes in the human genome. The LINCS L1000 gene expression data are publicly available from the Gene Expression Omnibus (GEO) database with accession number GSE70138 (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70138).

In this article, we use the knockdown L1000 gene expression data. There are ∼4500 knockdown experiments in the L1000 data set. Most of these data were generated using eight human cell lines: A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, and PC3. Data from these knockdown experiments are typically collected 96 hours after the perturbation.

2.1. Luminex bead technology

The L1000 experiments were performed using the Luminex bead technology (Dunbar, 2006), generating high-throughput gene expression assays using 384-well plates. To measure the expression level of specific genes, color-coded microspheres bind fluorophore and the corresponding RNA sequence. Therefore, the expression level of each gene could be represented by the intensity of the fluorescence. For a pair of genes, two types of beads sharing one bead color were designed to measure the expression of the two different genes. In each perturbation experiment, about 35,000 to 50,000 beads across 500 bead colors were added to each well to measure the expression levels of ∼1000 landmark genes.

The beads for each pair of genes were mixed in an ∼2:1 ratio. Therefore, two peaks are expected in a histogram of fluorescence levels, and these observed peaks were deconvoluted to assign expression values to the appropriate pair of genes. To reduce the noise from experimental conditions, there are several wells used for control on each plate. In addition, technical replicate data were generated, in which the same perturbations were performed in the same wells across multiple plates.

2.2. Data processing

The L1000 gene expression data were generated and processed by the Broad Institute as an extension of the Connectivity Map project (Lamb et al., 2006; Subramanian et al., 2017). L1000 gene expression data are publicly available in different formats varying from levels 1 to 5 (Subramanian et al., 2017). Level 1 represents the raw unprocessed data from the Luminex Bead technology. In level 2, the gene expression values of the landmark genes were deconvoluted from the observed fluorescence levels and normalized to a set of internal standards. In level 3, quantile normalization was performed on these landmark genes and interpolated to all 20,000 human genes. The level 4 and level 5 data consist of the gene signatures comparing the perturbed experiments to the unperturbed experiments.

Young et al. (2017) observed that the deconvolution step introduces artifacts in the data. There are three types of artifacts due to the deconvolution step assigning gene expression values to the appropriate genes that share the same bead color. First, two genes can be assigned the same expression value if their expression levels are not different enough to be distinguished. Second, together with the quantile normalization step, the deconvolution step can generate incorrect additional clusters. Finally, sometimes the two genes sharing the same bead color can be assigned flipped expression values, which means gene A is assigned the expression value of gene B and vice versa. A clustering algorithm developed to correct for the last two artifacts is discussed in Section 3.4.

3. Methods

We integrate external data sources in gene network inference using human genetic perturbation data from the LINCS L1000 project. Our inferred gene networks consist of directed edges representing causal relationships deduced from the gene knockdown experimental design and a supervised learning framework integrating multiple data sources. We show that the accuracy of our inferred networks is improved by external data integration and correction for data artifacts. Figure 1 shows an overview of our approach.

FIG. 1.


An overview of the approach. We first build a supervised framework for a selected set of target–gene regulatory pairs using external knowledge derived from the literature and existing data sets. Then, we apply machine learning methods to predict the regulatory relationships across all target–gene regulatory pairs for the landmark genes in the LINCS L1000 project. The predicted regulatory relationships are used as the prior probabilities in our Bayesian approach to predict the posterior probabilities.

3.1. BayesKnockdown

We use the BayesKnockdown Bioconductor package (BayesKnockdown package, 2016) to calculate posterior probabilities of regulatory relationships. This BayesKnockdown package was applied to the L1000 gene expression data from a single cell line (A375) (Young et al., 2016). Here, we extend Young et al. (2016) by integrating additional data sources and by applying the package to an additional cell line (A549).

To prepare the input data for the BayesKnockdown package, the LINCS L1000 knockdown experiments are first transformed by calculating z-scores to account for bias and noise among replicates:

$$z_{hip} = \frac{x_{hip} - \mu_{hp}}{\sigma_{hp}}$$

where $x_{hip}$ represents the gene expression level of gene h in experiment (well) i on plate p, and $\mu_{hp}$ and $\sigma_{hp}$ are, respectively, the mean and standard deviation for gene h across all control experiments on plate p. A linear regression model is then applied to model the change in a target gene t as dependent on the change in the knockdown gene h, with $\epsilon$ as the error term:

$$z_{t} = \beta_{0} + \beta_{ht}\, z_{h} + \epsilon$$
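The z-score transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BayesKnockdown implementation: the function name `plate_z_scores` and the toy values are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption.

```python
from statistics import mean, stdev

def plate_z_scores(knockdown_values, control_values):
    """Standardize one gene's expression on one plate against that plate's controls."""
    mu = mean(control_values)      # mean of the gene across control wells on the plate
    sigma = stdev(control_values)  # sample standard deviation (assumed) across the same wells
    return [(x - mu) / sigma for x in knockdown_values]

# Toy example: three control wells and two knockdown wells on one plate.
controls = [8.0, 10.0, 12.0]
z = plate_z_scores([14.0, 10.0], controls)
```

Standardizing against the controls of the same plate is what lets wells from different plates be compared on a common scale.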

In the BayesKnockdown package (Young et al., 2016), the linear regression model is estimated with a Bayesian approach using Zellner's g-prior (Zellner, 1986) for the model parameters. The parameter g specifies the expected size of the regression coefficient $\beta_{ht}$. The value of g can be estimated using an expectation–maximization (EM) algorithm (Dempster et al., 1977; Young et al., 2014). Then the regression model with g-prior is used to calculate the probability $P(h \to t \mid x)$ that gene h regulates gene t given the data x, versus the probability $P(h \nrightarrow t \mid x)$ that there is no regulatory relationship:

$$\frac{P(h \to t \mid x)}{P(h \nrightarrow t \mid x)} = \frac{\pi_{ht}}{1-\pi_{ht}} \cdot \frac{(1+g)^{(n-2)/2}}{\left[1+g\,(1-R^{2})\right]^{(n-1)/2}}$$

where $R^{2}$ is the coefficient of determination for the aforementioned linear regression model, n is the number of observations, and $\pi_{ht}$ is the prior probability of a regulatory relationship.

In the absence of external data sources, the prior probability of regulatory relationship $\pi_{ht}$ is set to a single constant for all the gene pairs in Young et al. (2016). This value is derived from prior knowledge for yeast data, reflecting the expected number of regulators per gene (Guelzim et al., 2002). However, in this work, we calculate these prior probabilities of regulatory relationships $\pi_{ht}$ using the supervised learning framework integrating multiple data sources described in Section 3.2.

The coefficient of determination $R^{2}$ for the simple linear regression is calculated from the correlation of the expression data of gene h and gene t. Then we have

$$P(h \to t \mid x) = \frac{O_{ht}}{1+O_{ht}}, \qquad O_{ht} = \frac{P(h \to t \mid x)}{P(h \nrightarrow t \mid x)},$$

the posterior probability of a regulatory relationship between a given gene pair.
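To make the calculation concrete, the following Python sketch turns a correlation-based R2 and a prior probability into a posterior edge probability. It assumes the standard closed-form Bayes factor for Zellner's g-prior in a regression with a single predictor, (1 + g)^((n − 2)/2) / (1 + g(1 − R2))^((n − 1)/2); whether BayesKnockdown uses exactly this expression is an assumption here. The function names are hypothetical, and g is passed in by the caller rather than estimated by EM.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def posterior_edge_probability(r2, n, g, prior):
    """Posterior probability of an edge: prior odds times the (assumed) g-prior Bayes factor."""
    # Work on the log scale to avoid overflow for large n.
    log_bf = 0.5 * (n - 2) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))
    log_odds = math.log(prior) - math.log1p(-prior) + log_bf
    # Convert posterior odds O to a probability O / (1 + O) in a numerically stable way.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

With R2 = 0 the data do not favor an edge and the Bayes factor reduces to (1 + g)^(−1/2), so the posterior falls below the prior; as R2 approaches 1 the posterior approaches 1 for moderate n.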

3.2. Data integration using supervised machine learning

We download transcription factor and target gene pairs (TF-G pairs) from the PAZAR database, a public resource for transcription factor (TF) and regulatory sequence annotation (Portales-Casamar et al., 2007, 2009; PAZAR, public database of TFs and regulatory sequence annotation, 2017). Subsequently, we map the target genes from Ensembl IDs to Entrez IDs using BioMart (Smedley et al., 2015). After the data processing, we keep the TF-gene pairs for which both the TFs and target genes are in the L1000 landmark genes. This results in a total of 232 TF-gene pairs that we label as positive training samples (Y = 1) in our supervised framework. Due to a lack of documentation on nonregulatory TF-gene pairs, we randomly generate 240 negative training samples of TF-gene pairs (Y = 0) that are not documented in PAZAR.
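The random generation of negative samples can be sketched as follows. This is a hypothetical illustration: the function name, the exclusion of self-pairs, and the choice to draw TFs and genes from caller-supplied lists are assumptions, since the exact sampling scheme is not specified here.

```python
import random

def sample_negative_pairs(tfs, genes, known_pairs, n_negative, seed=0):
    """Draw distinct (TF, gene) pairs that are absent from the documented positive set."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    candidates = [(tf, g) for tf in tfs for g in genes
                  if tf != g and (tf, g) not in known_pairs]
    # random.Random.sample draws without replacement; it raises ValueError
    # if fewer than n_negative candidate pairs are available.
    return rng.sample(candidates, n_negative)

# Toy example with made-up identifiers.
known = {("TF1", "G1")}
negatives = sample_negative_pairs(["TF1", "TF2"], ["G1", "G2", "G3"], known, 3)
```

Because undocumented pairs are not guaranteed to be nonregulatory, negatives drawn this way are best regarded as noisy labels.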

After collecting the positive and negative training samples of TF-gene pairs, we derive the training data using external data sources to generate attributes in the supervised framework, as described below.

  • Gene expression data across human cell lines. For each TF-gene pair, we compute the Pearson's correlation between TF and gene across 917 human cell lines from the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al., 2012). The CCLE data are publicly available from https://portals.broadinstitute.org/ccle/home and from GEO with accession number GSE36133. As another attribute (or variable) in the training data, we also compute the Pearson's correlation between TF and gene across 675 commonly used human cell lines in the RNA-seq data generated by Klijn et al. (2015). The data used in Klijn et al. are publicly available from ArrayExpress with accession number E-MTAB-2706 (www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2706).

  • Gene ontology. Gene ontology (GO) defines a controlled vocabulary and descriptions of gene products across biological systems (Ashburner et al., 2000; Gene Ontology Consortium, 2015). Genes assigned to the same ontology terms are expected to share common functionalities. Intuitively, we expect regulatory TF-gene pairs to share common GO terms. Since GO terms are hierarchical in nature, we filter out large and hence less informative GO terms by setting an upper bound of 100 on term size. We define a binary attribute in our supervised framework: if a given TF-gene pair is assigned to the same GO term, we define the binary variable to be 1, otherwise 0.

  • Genome-wide binding data. We also use genome-wide binding data (ChIP-chip, ChIP-seq) from ENCODE (ENCODE Project Consortium, 2012). The chromatin immunoprecipitation (ChIP) technology could be used to detect binding between proteins and DNA in vivo. Since transcriptional regulation is typically preceded by binding, we define a binary attribute for a (TF and gene) pair to be 1 if TF binds gene, and 0 otherwise. We derive these binary variables by parsing the processed ChIP data from the ENRICHR website (Chen et al., 2013).

  • Pathways data. We hypothesize that a regulatory (TF and gene) pair is more likely to be assigned to the same biochemical pathways. Therefore, we define a binary variable for each of WikiPathways (Kelder et al., 2012), KEGG (Kanehisa and Goto, 2000), BioCarta (Nishimura, 2001), and Reactome (Croft et al., 2014; Fabregat et al., 2016). If TF and gene appear in the same pathway, we define the binary variable to be 1, otherwise 0. We derive these binary variables by parsing the processed library data from the ENRICHR website (Chen et al., 2013) (http://amp.pharm.mssm.edu/Enrichr).
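The attributes listed above can be assembled per TF-gene pair along the following lines. This is a minimal sketch with hypothetical names and data layouts (dictionaries mapping gene symbols to expression vectors and term names to gene sets); interpreting the GO size limit as a maximum number of annotated genes is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shares_set(a, b, gene_sets):
    """1 if genes a and b appear together in at least one gene set, else 0."""
    return int(any(a in members and b in members for members in gene_sets.values()))

def build_features(tf, gene, expr, go_terms, pathways, max_term_size=100):
    """Attribute vector for one TF-gene pair (one correlation, two binary attributes)."""
    # Drop large GO terms; the size is assumed to be the number of annotated genes.
    small_terms = {t: s for t, s in go_terms.items() if len(s) <= max_term_size}
    return {
        "expr_corr": pearson(expr[tf], expr[gene]),
        "same_go": shares_set(tf, gene, small_terms),
        "same_pathway": shares_set(tf, gene, pathways),
    }

# Toy example with made-up identifiers.
expr = {"TF1": [1.0, 2.0, 3.0, 4.0], "G1": [2.0, 4.0, 6.0, 8.0]}
go = {"GO:small": {"TF1", "G1"}}
paths = {"pathway_a": {"TF1", "OTHER"}}
features = build_features("TF1", "G1", expr, go, paths)
```

A binding attribute and one pathway attribute per database would be added in the same way, each as its own 0/1 entry.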

After finalizing the training data used in the supervised learning framework, we perform 10-fold cross validation using different machine learning methods on our supervised framework. Table 1 summarizes the results. Logistic regression yields the highest area under the receiver operating characteristic curve (AUROC) (0.76) in our cross-validation studies and is used to compute prior probabilities of regulatory relationships in subsequent steps.

Table 1.

The Average AUROC of Different Machine Learning Methods in 10 Rounds of 10-Fold Cross Validation

| Machine learning method | Average AUROC | Assessment value |
|---|---|---|
| Logistic regression | 0.762 | Probability |
| SVM | 0.728 | Probability |
| 5-Nearest neighbor | 0.709 | Probability |
| Ada Boost | 0.669 | Binary |
| RandomForest | 0.501 | Binary |

Methods include logistic regression, SVM (e1071 package, 2017), 5-nearest neighbor (class package, 2015), Ada boost (ada package, 2016), and randomForest (randomForest package, 2015). Logistic regression produced the highest AUROC (area under the receiver operating characteristic curve).
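For readers who want to reproduce this kind of comparison, AUROC can be computed directly from held-out labels and classifier scores. The sketch below uses the rank-based (Mann-Whitney) definition in plain Python; the function name is hypothetical, and it is not the evaluation code used to produce Table 1.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example: one of the four positive/negative comparisons is lost.
example = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Under this definition a method that outputs only binary labels produces many tied scores, which may partly explain why the methods assessed on binary output in Table 1 show lower AUROC values.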

3.3. Sampling bias correction

Our supervised training data consist of approximately the same proportion of positive and negative training samples (TF-G pairs). Specifically, there are 232 positive TF-G pairs and 240 negative TF-G pairs. However, the positive cases are expected to be rare in regulatory relationships. Therefore, we perform a sampling bias correction to better match the biological relationships between genes in practice.

We add an offset of $-\log(\tau_1/\tau_0)$ to the log odds in our logistic regression model, where $\tau_1$ and $\tau_0$ are the sampling rates for positive and negative cases, respectively, in the training data. We use the prior knowledge from Lo et al. (2012) that the average number of regulators for each gene is about 2.76. Since there are ∼20,000 human genes, we then estimate $\tau_1$, $\tau_0$, and the resulting offset.

Note that this sampling bias correction is performed only in the supervised learning step. These corrected prior probabilities of regulatory relationships are then used as the prior probabilities in the BayesKnockdown framework, combined with the L1000 gene expression data as described in Section 3.1. Figure 2 shows the histograms before and after the correction in cell line A375. Figure 3 shows the histograms in cell line A549. Posterior probability values are thresholded at 0.5. We observe that the expected number of regulators per target gene is much closer to what we expect in biological networks after this sampling bias correction step.
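The effect of the offset can be illustrated with a short Python sketch. It assumes the standard prior correction for case-control sampling, in which the log of the ratio of the positive to the negative sampling rate is subtracted from the fitted log odds. The function name and the example rates are hypothetical and are not the values estimated in this work.

```python
import math

def correct_prior(p_sampled, rate_pos, rate_neg):
    """Adjust a probability fitted on balanced training data for unequal sampling rates.

    p_sampled must lie strictly between 0 and 1.
    """
    logit = math.log(p_sampled) - math.log1p(-p_sampled)
    # Offset of -log(rate_pos / rate_neg) added to the log odds (assumed form).
    corrected = logit - math.log(rate_pos / rate_neg)
    return 1.0 / (1.0 + math.exp(-corrected))

# Illustrative rates only: positives sampled 100 times more densely than negatives.
p = correct_prior(0.5, rate_pos=1.0, rate_neg=0.01)
```

When positives are heavily oversampled relative to negatives, the correction shrinks every predicted probability, which is consistent with regulatory relationships being rare.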

FIG. 2.


Histograms of the expected number of regulators per target gene predicted using knockdown data in cell line A375. (A) Shows the histogram of the expected number of regulators per target gene without the sampling bias correction. (B) Shows the histogram of the expected number of regulators per target gene after applying the sampling bias correction to the prior.

FIG. 3.


Histograms of the expected number of regulators per target gene predicted using knockdown data in cell line A549. (A) Shows the histogram of the expected number of regulators per target gene without the sampling bias correction. (B) Shows the histogram of the expected number of regulators per target gene after applying the sampling bias correction to the prior.

3.4. Model-based clustering with data correction

Model-based clustering with data correction (MCDC) is a method developed to remove artifacts in the L1000 gene expression data (Young et al., 2017). Recall that additional noisy clusters may be generated and the expression values of the paired genes can be reversed in the deconvolution step of the L1000 gene expression data. We apply MCDC to the untreated data in cell lines A375 and A549.

Model-based clustering (Banfield and Raftery, 1993; Fraley and Raftery, 2002; McLachlan and Peel, 2005) assumes that the data come from a distribution consisting of a mixture of multiple components. Each of these components can be modeled by a Gaussian distribution with parameters that can be estimated using the EM algorithm. MCDC extends model-based clustering to detect flipped data points in the L1000 gene expression data. In particular, MCDC uses a transformation matrix to determine whether each data point should be corrected. Clustering is then done with both the original and transformed data, resulting in a probability of transformation for each data point. This probability can be used to identify flipped data points. Furthermore, to eliminate the effect of noisy clusters generated by the expression-level estimation process, the expression levels of the paired genes are estimated as the mean of the largest cluster after selecting the best model in MCDC. MCDC was applied with the number of clusters ranging from 1 to 9. The best number of clusters and the best model are then selected using the Bayesian information criterion (Fraley and Raftery, 2002).
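The idea of a probability of transformation can be illustrated with a deliberately simplified Python sketch. It is not the MCDC algorithm: it assumes a single Gaussian cluster with known mean and diagonal covariance (MCDC estimates a mixture by EM), a fixed prior flip probability, and a transformation that simply swaps the two coordinates of a gene pair. All names are hypothetical.

```python
import math

def gauss2_logpdf(point, mean, var):
    """Log density of a 2D Gaussian with diagonal covariance."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(point, mean, var))

def flip_probability(point, mean, var, prior_flip=0.5):
    """Posterior probability that the two coordinates of a point were swapped."""
    swapped = (point[1], point[0])
    a = math.log(1 - prior_flip) + gauss2_logpdf(point, mean, var)   # as observed
    b = math.log(prior_flip) + gauss2_logpdf(swapped, mean, var)     # after swapping back
    m = max(a, b)  # subtract the maximum for numerical stability
    return math.exp(b - m) / (math.exp(a - m) + math.exp(b - m))

# A cluster centered at (2, 8): the point (8, 2) looks flipped, (2, 8) does not.
p_flipped = flip_probability((8.0, 2.0), (2.0, 8.0), (1.0, 1.0))
p_normal = flip_probability((2.0, 8.0), (2.0, 8.0), (1.0, 1.0))
```

Points whose swapped version fits the cluster far better than the observed version receive a flip probability near 1 and are candidates for correction.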

3.5. Assessment

To assess our resulting networks, we use the TRANSFAC and JASPAR (Mathelier et al., 2013) databases that provide TF DNA-binding preferences. The TRANSFAC and JASPAR (T&J) databases consist of ∼4200 TF-gene relationships across 37 TFs that overlap with the 1000 LINCS landmark genes. This is the same gold standard that was used in Young et al. (2016, 2017). Although the T&J assessment criteria are limited to well-studied TFs, it is difficult to find a comprehensive gold standard for gene network assessment in mammalian systems.

We compute a contingency table consisting of the number of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Table 2 shows a contingency table with the definitions of TP, FP, TN, and FN. In particular, TP is the number of TF-gene relationships that are in our inferred network and in the T&J assessment criteria. FP is the number of TF-gene relationships that are in our inferred network but not in the T&J assessment criteria. Fisher's exact test is applied to the 2 × 2 contingency table to assess the consistency of our constructed network with the known regulatory relationships. Precision is defined as TP/(TP + FP).

Table 2.

A Contingency Table Showing the Definitions of True Positives, False Positives, False Negatives, and True Negatives

| | Edge in our inferred network: Yes | Edge in our inferred network: No |
|---|---|---|
| Edge in T&J: Yes | TP | FN |
| Edge in T&J: No | FP | TN |

T&J, TRANSFAC and JASPAR.
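The assessment quantities can be computed with a few lines of Python. This is a sketch under stated assumptions: edges are represented as (TF, gene) tuples, the universe of candidate edges contains both the inferred and the gold-standard edges, and the Fisher's exact test shown is the one-sided (enrichment) version, which may differ from the alternative used for the p-values reported in this article. The function names are hypothetical.

```python
from math import comb

def contingency(inferred, gold, universe):
    """Counts (TP, FP, FN, TN) for sets of edges drawn from a common universe."""
    tp = len(inferred & gold)
    fp = len(inferred - gold)
    fn = len(gold - inferred)
    tn = len(universe) - tp - fp - fn
    return tp, fp, fn, tn

def precision(tp, fp):
    """Fraction of inferred edges that are in the gold standard."""
    return tp / (tp + fp)

def fisher_one_sided(tp, fp, fn, tn):
    """P(overlap >= tp) under the hypergeometric null with fixed margins."""
    gold_total = tp + fn
    non_gold = fp + tn
    inferred_total = tp + fp
    upper = min(gold_total, inferred_total)
    numer = sum(comb(gold_total, k) * comb(non_gold, inferred_total - k)
                for k in range(tp, upper + 1))
    # Exact integer arithmetic; a single division at the end.
    return numer / comb(gold_total + non_gold, inferred_total)
```

For instance, `precision(38, 225)` evaluates to about 0.144, which matches the precision reported for the knockdown-only network at the 0.5 cutoff in Table 3.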

4. Results and Discussion

4.1. Results: Skin melanoma cell line A375

Table 3 shows the contingency tables that compare our inferred networks from the A375 cell line to the T&J data set. We use two cutoff posterior probability values, 0.5 and 0.95, as thresholds for positive edges. The two tables in the first row correspond to the network computed using the L1000 knockdown gene expression data only. The two tables in the second row correspond to the network inferred using knockdown data and our external data integration. The two tables in the last row show the assessment results of the network inferred from knockdown data with MCDC and external data integration.

Table 3.

Assessment Results Comparing Our Inferred Networks Using Cell Line A375 to TRANSFAC and JASPAR

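1. Knockdown data

| | Cutoff 0.5: inferred Yes | Cutoff 0.5: inferred No | Cutoff 0.95: inferred Yes | Cutoff 0.95: inferred No |
|---|---|---|---|---|
| T&J Yes | 38 | 3100 | 13 | 3125 |
| T&J No | 225 | 27,488 | 55 | 27,658 |
| p-Value | 0.01717 | | 0.01842 | |
| Precision | 0.14449 | | 0.19118 | |

2. Knockdown data + supervised network

| | Cutoff 0.5: inferred Yes | Cutoff 0.5: inferred No | Cutoff 0.95: inferred Yes | Cutoff 0.95: inferred No |
|---|---|---|---|---|
| T&J Yes | 27 | 3111 | 8 | 3130 |
| T&J No | 142 | 27,571 | 34 | 27,679 |
| p-Value | 0.01414 | | 0.05842 | |
| Precision | 0.15976 | | 0.19048 | |

3. Knockdown data + MCDC Untrt + supervised network

| | Cutoff 0.5: inferred Yes | Cutoff 0.5: inferred No | Cutoff 0.95: inferred Yes | Cutoff 0.95: inferred No |
|---|---|---|---|---|
| T&J Yes | 25 | 3113 | 11 | 3127 |
| T&J No | 122 | 27,591 | 25 | 27,688 |
| p-Value | 0.00716 | | 0.0006371 | |
| Precision | 0.17007 | | 0.30556 | |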

The two tables in the first row correspond to the network computed using the L1000 knockdown gene expression data only. The two tables in the second row correspond to the network inferred using knockdown data and our external data integration. The two tables in the last row show the assessment results of the network inferred from knockdown data with MCDC and external data integration.

MCDC, model-based clustering with data correction.

We observe that both MCDC and integrating external data improve both the p-value from the Fisher's exact test and the precision. By applying both MCDC and external data integration, the p values were improved from 0.017 to 0.007 at the 0.5 posterior probability cutoff and from 0.018 to 0.0006 at the 0.95 posterior probability cutoff. Also, the precision increased from 0.144 to 0.170 at the 0.5 posterior probability cutoff and from 0.191 to 0.306 at the 0.95 posterior probability cutoff. Note that adding the external data integration step to MCDC yields more significant p values than the MCDC step alone (Young et al., 2017).

Next, we compare our inferred edges (TF-gene relationships) to the T&J data set using ranked lists. The assessment results are shown in Table 4. We first identify all the edges in the intersection of our predicted edges and the T&J edges. Then we rank these found edges by their posterior probabilities from our prediction in descending order. Finally, we rank all our predicted edges by posterior probabilities in descending order. For each edge that is also in the T&J data set, we record its rank in our full edge list. With these edge ranks, we assess our inferred networks not only by the edges found but also by their posterior probabilities.

Table 4.

Comparison of the Ranks of the First 25 Inferred Edges That Match the TRANSFAC and JASPAR Edge List in Cell Line A375

Found edge | Knockdown data | Supervised network | KD + supervised network | KD + MCDC Untrt | KD + MCDC Untrt + supervised network
1 | 6 | 2 | 7 | 2 | 2
2 | 9 | 3 | 10 | 6 | 6
3 | 11 | 7 | 11 | 8 | 8
4 | 15 | 12 | 15 | 9 | 9
5 | 22 | 26 | 33 | 12 | 13
6 | 34 | 40 | 34 | 20 | 16
7 | 39 | 54 | 37 | 22 | 23
8 | 40 | 63 | 42 | 29 | 27
9 | 41 | 64 | 43 | 42 | 28
10 | 48 | 66 | 45 | 45 | 30
11 | 56 | 88 | 65 | 49 | 35
12 | 58 | 91 | 67 | 53 | 41
13 | 65 | 106 | 73 | 56 | 56
14 | 77 | 109 | 74 | 57 | 57
15 | 78 | 110 | 80 | 66 | 58
16 | 86 | 112 | 82 | 68 | 63
17 | 98 | 114 | 83 | 71 | 80
18 | 100 | 117 | 95 | 73 | 87
19 | 102 | 118 | 100 | 83 | 90
20 | 111 | 132 | 109 | 92 | 96
21 | 119 | 136 | 113 | 93 | 97
22 | 122 | 139 | 119 | 95 | 99
23 | 130 | 151 | 122 | 102 | 102
24 | 131 | 154 | 124 | 108 | 116
25 | 135 | 156 | 127 | 112 | 123

Edges are ranked by posterior probability. The numbers represent the rankings of true positive edges (i.e., edges found both in our gene network and T&J edge list) among positive edges (i.e., edges found in our network). The table shows that external knowledge integration helps improve the ranks of middle-ranked edges, which makes the results more stable.

As an example, in the first column of Table 4, labeled "Knockdown data," the first edge found in T&J is the sixth edge in our edge list. This means that the top five edges in our edge list are not found in T&J. These ranked lists help us determine the differences between T&J and our own edge list. We can see that external knowledge integration improves the results for middle-ranked edges. As another example, in the third column, which corresponds to the network inferred using both the L1000 knockdown data and external data integration, the 25th found edge is ranked 127th in our edge list. In the first column, from knockdown data only, the 25th found edge is ranked 135th in our edge list. The larger difference in rankings indicates a larger difference between our prediction and the T&J data set.

Figure 4 shows the precision–recall curves under different combinations of data and prior. The area under the curve improves with external data integration and improves further when MCDC is also applied.
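To make the notion of a precision–recall curve over a ranked edge list concrete, the sketch below walks down the list and records recall and precision after each prediction, then summarizes the curve by its step-wise area (average precision). This is only an illustration: the text does not state how the area under the curve was computed, so average precision is an assumed choice, the function names are hypothetical, and recall is taken relative to the reference edges present in the ranked list.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(labels_ranked: Sequence[bool]) -> List[Tuple[float, float]]:
    """(recall, precision) after each prediction, walking down the ranked list."""
    total_pos = sum(labels_ranked)
    if total_pos == 0:
        raise ValueError("need at least one reference edge in the ranked list")
    points = []
    tp = 0
    for k, is_true in enumerate(labels_ranked, start=1):
        tp += is_true
        points.append((tp / total_pos, tp / k))
    return points


def average_precision(labels_ranked: Sequence[bool]) -> float:
    """Step-wise area under the precision-recall curve."""
    total_pos = sum(labels_ranked)
    if total_pos == 0:
        raise ValueError("need at least one reference edge in the ranked list")
    tp = 0
    area = 0.0
    for k, is_true in enumerate(labels_ranked, start=1):
        if is_true:
            tp += 1
            # Precision at this hit, weighted by the recall step 1 / total_pos.
            area += (tp / k) / total_pos
    return area


# True marks a predicted edge that is also in the reference list,
# listed in descending order of posterior probability (toy data).
labels = [True, False, True, False]
print(precision_recall_points(labels))
print(round(average_precision(labels), 4))  # 0.8333
```

A ranking that places reference edges nearer the top yields a larger area, which is the sense in which a better prior can raise the curve.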

FIG. 4.

Precision–recall curves for cell line A375 using different data assessed with TRANSFAC and JASPAR. The results are improved by external data integration with or without MCDC. MCDC, model-based clustering with data correction.

Figures 5 and 6 show our gene network inferred from the knockdown data of the A375 cell line with MCDC and external data integration. Figure 5 shows all the inferred edges at a posterior probability cutoff of 0.5. Figure 6 shows all the true positive (TP) edges found in the TRANSFAC and JASPAR databases at a posterior probability cutoff of 0.5. We observe that some of our inferred edges are also reported in the literature, such as Inline graphic (Benbrook and Jones, 1990; Liu et al., 1999; Spring and Krebs, 1999).

FIG. 5.

Inferred directed edges at a posterior probability cutoff of 0.5 from the gene network generated by integrating external data with the knockdown data and MCDC-corrected untreated data. Each node represents a gene and each edge represents a regulatory interaction between the two genes. The width of each edge is in proportion to the inferred posterior probability that the regulatory relationship exists for the corresponding gene pair.

FIG. 6.

True positive edges at a posterior probability cutoff of 0.5 from the gene network generated by integrating external data with the knockdown data and MCDC-corrected untreated data. These true positive edges are the edges from Figure 5 that are also found in the TRANSFAC and JASPAR edge list used for assessment. Each node represents a gene and each edge represents a regulatory interaction between the two genes. The width of each edge is in proportion to the inferred posterior probability that the regulatory relationship exists for the corresponding gene pair.
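The construction behind Figures 5 and 6 (threshold the edges at a posterior probability cutoff, keep the subset also present in the reference list, and draw each edge with a width proportional to its posterior probability) can be sketched as below. This is an illustrative sketch only: the helper names, toy gene names, and the `max_width` scaling are hypothetical, treating the cutoff as inclusive is an assumption, and Graphviz DOT is used merely as one convenient text format for directed graphs, not necessarily the tool used for the figures.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def select_edges(posterior: Dict[Edge, float], cutoff: float = 0.5) -> Dict[Edge, float]:
    """Keep edges whose posterior probability reaches the cutoff (>= is an assumption)."""
    return {edge: p for edge, p in posterior.items() if p >= cutoff}


def to_dot(edges: Dict[Edge, float], name: str, max_width: float = 4.0) -> str:
    """Render directed edges as Graphviz DOT text, pen width proportional to posterior."""
    lines = [f"digraph {name} {{"]
    for (src, dst), p in sorted(edges.items()):
        lines.append(f'  "{src}" -> "{dst}" [penwidth={max_width * p:.2f}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical toy input; gene names and probabilities are made up.
posterior = {("TF1", "G1"): 0.99, ("TF2", "G3"): 0.75, ("TF3", "G1"): 0.40}
reference = {("TF2", "G3")}  # stand-in for the TRANSFAC and JASPAR edge list

inferred = select_edges(posterior)                                # analogue of Figure 5
true_pos = {e: p for e, p in inferred.items() if e in reference}  # analogue of Figure 6
print(to_dot(inferred, "inferred"))
print(to_dot(true_pos, "true_positive"))
```

Because the edges are directed from regulator to target, the same gene pair can appear in both orientations with different posterior probabilities.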

4.2. Results: Lung cancer cell line A549

We apply our proposed method and assessment criteria to another cell line, A549. Table 5 shows the contingency tables comparing T&J to our results. The posterior probability thresholds for positive edges are again set to 0.5 and 0.95. Similar to the results on cell line A375, we observe that MCDC and external data integration improve the p-value and precision in the A549 cell line, although the effect of applying MCDC is not as pronounced as in A375. With both MCDC and external data integration applied, the p-values improve from the 0.001 level to the 0.0001 level. The precision also increases from 0.133 to 0.149 at the 0.5 cutoff and from 0.140 to 0.156 at the 0.95 cutoff.

Table 5.

Assessment Results Comparing Our Inferred Networks on the A549 Cell Line to TRANSFAC and JASPAR

1. Knockdown data
    Cutoff 0.5     Cutoff 0.95
    Yes No     Yes No
T&J Yes 75 2853 T&J Yes 62 2866
No 487 26,472 No 380 26,579
    p-Value: 0.00371     p-Value: 0.002561
    Precision: 0.13345     Precision: 0.14027
2. Knockdown data + supervised network
    Cutoff 0.5     Cutoff 0.95
    Yes No     Yes No
T&J Yes 77 2851 T&J Yes 64 2864
No 437 26,522 No 363 26,596
    p-Value: 0.0001138     p-Value: 0.0004039
    Precision: 0.149805     Precision: 0.149883
3. Knockdown data + MCDC Untrt + supervised network
    Cutoff 0.5     Cutoff 0.95
    Yes No     Yes No
T&J Yes 79 2849 T&J Yes 64 2864
No 450 26,509 No 347 26,612
    p-Value: 0.0001037     p-Value: 0.0001383
    Precision: 0.149338     Precision: 0.155718

The assessment results of edge ranks are shown in Table 6. As in the A375 cell line, external data integration and MCDC improve the results of middle-ranked edges. Figure 7 shows the precision–recall curves for the A549 cell line. After external data integration, the area under the curve is improved. However, unlike in A375, applying MCDC to the untreated data does not yield a major improvement.

Table 6.

Comparison of the Ranks of the First 25 Found Edges That Match the TRANSFAC and JASPAR Edge List in Cell Line A549

Found edge | Knockdown data | Supervised network | KD + supervised network | KD + MCDC Untrt | KD + MCDC Untrt + supervised network
1 | 8 | 9 | 2 | 5 | 6
2 | 11 | 14 | 25 | 8 | 9
3 | 16 | 21 | 30 | 12 | 14
4 | 35 | 31 | 36 | 26 | 28
5 | 40 | 32 | 43 | 27 | 29
6 | 41 | 40 | 46 | 34 | 36
7 | 43 | 45 | 47 | 52 | 56
8 | 55 | 57 | 62 | 53 | 57
9 | 81 | 70 | 70 | 64 | 70
10 | 82 | 84 | 71 | 75 | 73
11 | 107 | 87 | 83 | 85 | 75
12 | 111 | 97 | 94 | 87 | 79
13 | 131 | 101 | 100 | 89 | 88
14 | 136 | 110 | 107 | 98 | 104
15 | 139 | 115 | 115 | 100 | 111
16 | 164 | 117 | 123 | 103 | 118
17 | 175 | 119 | 124 | 125 | 127
18 | 177 | 121 | 128 | 137 | 139
19 | 178 | 124 | 129 | 149 | 141
20 | 180 | 143 | 130 | 157 | 161
21 | 183 | 147 | 142 | 170 | 163
22 | 190 | 149 | 156 | 171 | 175
23 | 196 | 162 | 163 | 180 | 176
24 | 219 | 165 | 170 | 184 | 177
25 | 220 | 183 | 178 | 194 | 180

Edges are ranked by posterior probability. The numbers represent the rankings of true positive edges (i.e., edges found both in our gene network and T&J edge list) among positive edges (i.e., edges found in our network). The table shows that external data integration improves the results of middle-ranked edges.

FIG. 7.

Precision–recall curves for cell line A549 using different data assessed with TRANSFAC and JASPAR. The results are improved by external data integration with or without MCDC.

5. Conclusions

In this article, we present a systematic approach that integrates external data sources with knockdown data from human cell lines for gene network inference. The approach includes a supervised learning framework that integrates multiple human data sources, including gene expression data across human cell lines, GO, genome-wide binding data (ChIP-chip, ChIP-seq), and pathways. The framework improves and generalizes our previous work (Young et al., 2016) by computing prior probabilities from external data sources instead of using a constant value. We demonstrate that our Bayesian regression framework is flexible enough to integrate multiple data sources while retaining high computational efficiency. We infer directed edges by leveraging the experimental design of the knockdown data.

We apply our integrated approach to two human cell lines, skin melanoma cell line A375 and lung cancer cell line A549, as examples of its application; A549 was not investigated in our previous work (Young et al., 2016, 2017). Our results show that our method is effective for both cell lines. In particular, the precision and p-values of the inferred gene networks are improved by adding the external data integration step to MCDC. Moreover, our method of interrogating the regulatory landscape of gene–gene interaction networks can be applied generally to human gene expression data. Thus, we not only improve and extend the results obtained by Young et al. (2016, 2017), but also develop a flexible supervised learning framework using external data integration to infer human gene networks with improved accuracy while retaining efficiency. In addition, we observe that the improvement in performance varies from cell line to cell line. When using the integrated approach to infer gene regulatory networks on different cell lines, the external data sources used as prior knowledge need to be chosen carefully to reach better accuracy. Applying this integrated algorithm to other cell lines remains a meaningful problem for future research.

Acknowledgments

The authors thank Ms. Bo Ding for her contributions to the data processing and the supervised learning step. They also thank Prof. Lenwood S. Heath for his valuable and helpful suggestions and comments on the writing of the article. This work is supported by NIH grants U54 HL127624, R01 GM126019, R01 HD054511, and R01 HD070936.

Author Disclosure Statement

The authors declare there are no competing financial interests.

References

  1. ada package. 2016. Available at: cran.r-project.org/package=ada Accessed February 28, 2017
  2. Ashburner M., Ball C.A., Blake J.A., et al. 2000. Gene ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29
  3. Banfield J.D., and Raftery A.E. 1993. Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821
  4. Bansal M., Della Gatta G., and Di Bernardo D. 2006. Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics 22, 815–822
  5. Barretina J., Caponigro G., Stransky N., et al. 2012. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607
  6. BayesKnockdown package. 2016. Available at: bioconductor.org/packages/release/bioc/html/BayesKnockdown.html Accessed February 28, 2017
  7. Benbrook D.M., and Jones N.C. 1990. Heterodimer formation between CREB and Jun proteins. Oncogene 5, 295–302
  8. Bonneau R., Reiss D.J., Shannon P., et al. 2006. The Inferelator: An algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 7, R36.
  9. Charbonnier C., Chiquet J., and Ambroise C. 2010. Weighted-lasso for structured network inference from time course data. Stat. Appl. Genet. Mol. Biol. 9, 15.
  10. Chen E.Y., Tan C.M., Kou Y., et al. 2013. Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128.
  11. Chen Y., Zhu J., Lum P.Y., et al. 2008. Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435
  12. class package. 2015. Available at: cran.r-project.org/package=class Accessed February 28, 2017
  13. Croft D., Mundo A.F., Haw R., et al. 2014. The Reactome pathway knowledgebase. Nucl. Acids Res. 42, D472–D477
  14. De Jong H. 2002. Modeling and simulation of genetic regulatory systems: A literature review. J. Comput. Biol. 9, 67–103
  15. Dempster A.P., Laird N.M., and Rubin D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38
  16. di Bernardo D., Thompson M.J., Gardner T.S., et al. 2005. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat. Biotechnol. 23, 377–383
  17. Dunbar S.A. 2006. Applications of Luminex® xMAP™ technology for rapid, high-throughput multiplexed nucleic acid detection. Clin. Chim. Acta 363, 71–82
  18. e1071 package. 2017. Available at: cran.r-project.org/package=e1071 Accessed February 28, 2017
  19. Efron B., Hastie T., Johnstone I., et al. 2004. Least angle regression. Ann. Stat. 32, 407–499
  20. Emilsson V., Thorleifsson G., Zhang B., et al. 2008. Genetics of gene expression and its effect on disease. Nature 452, 423–428
  21. ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74
  22. Fabregat A., Sidiropoulos K., Garapati P., et al. 2016. The Reactome pathway knowledgebase. Nucl. Acids Res. 44, D481–D487
  23. Fraley C., and Raftery A.E. 2002. Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97, 611–631
  24. Friedman N. 2004. Inferring cellular networks using probabilistic graphical models. Science 303, 799–805
  25. Friedman N., Linial M., Nachman I., et al. 2000. Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620
  26. Gardner T.S., Di Bernardo D., Lorenz D., et al. 2003. Inferring genetic networks and identifying compound mode of action via expression profiling. Science 301, 102–105
  27. Geier F., Timmer J., and Fleck C. 2007. Reconstructing gene-regulatory networks from time series, knock-out data, and prior knowledge. BMC Syst. Biol. 1, 11.
  28. Gene Ontology Consortium. 2015. Gene ontology consortium: Going forward. Nucl. Acids Res. 43, D1049–D1056
  29. Guelzim N., Bottani S., Bourgine P., et al. 2002. Topological and causal structure of the yeast transcriptional regulatory network. Nat. Genet. 31, 60–63
  30. Gustafsson M., and Hörnquist M. 2010. Gene expression prediction by soft integration and the elastic net-best performance of the DREAM3 gene expression challenge. PLoS One 5, e9134.
  31. Huang T., Liu L., Qian Z., et al. 2010. Using GeneReg to construct time delay gene regulatory networks. BMC Res. Notes 3, 142.
  32. Hung L.-H., Shi K., Wu M., et al. 2017. fastBMA: Scalable network inference and transitive reduction. Gigascience 16:1–10
  33. Imoto S., Kim S., Goto T., et al. 2003. Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network. J. Bioinform. Comput. Biol. 1, 231–252
  34. Kanehisa M., and Goto S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucl. Acids Res. 28, 27–30
  35. Keenan A.B., Jenkins S.L., Jagodnik K.M., et al. 2018. The library of integrated network-based cellular signatures NIH program: System-level cataloging of human cells response to perturbations. Cell Syst. 6, 13–24
  36. Kelder T., van Iersel M.P., Hanspers K., et al. 2012. WikiPathways: Building research communities on biological pathways. Nucl. Acids Res. 40, D1301–D1307
  37. Kim S.Y., Imoto S., and Miyano S. 2003. Inferring gene networks from time series microarray data using dynamic Bayesian networks. Brief Bioinform. 4, 228–235
  38. Klijn C., Durinck S., Stawiski E.W., et al. 2015. A comprehensive transcriptional portrait of human cancer cell lines. Nat. Biotechnol. 33, 306–312
  39. Lamb J., Crawford E.D., Peck D., et al. 2006. The connectivity map: Using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935
  40. Le P.P., Bahl A., and Ungar L.H. 2004. Using prior knowledge to improve genetic network reconstruction from microarray data. In Silico Biol. 4, 335–353
  41. Liang X., Young W.C., Hung L.-H., et al. 2018. Integration of multiple data sources for gene network inference using genetic perturbation data, 601–602. Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM, Washington, DC
  42. Liu N., Cigola E., Tinti C., et al. 1999. Unique regulation of immediate early gene and tyrosine hydroxylase expression in the odor-deprived mouse olfactory bulb. J. Biol. Chem. 274, 3042–3047
  43. Lo K., Raftery A.E., Dombek K.M., et al. 2012. Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Syst. Biol. 6, 1.
  44. Mathelier A., Zhao X., Zhang A.W., et al. 2013. JASPAR 2014: An extensively expanded and updated open-access database of transcription factor binding profiles. Nucl. Acids Res. 42, D142–D147
  45. McLachlan G., and Peel D. 2005. Finite Mixture Models. John Wiley & Sons, New York, NY
  46. Murphy K., and Mian S. 1999. Modelling gene expression data using dynamic Bayesian networks. Technical Report, Computer Science Division, University of California, Berkeley, CA
  47. Nishimura D. 2001. BioCarta. Biotech Software & Internet Report. Comput. Softw. J. Sci. 2, 117–120
  48. PAZAR, public database of transcription factors and regulatory sequence annotation. 2017. Available at: www.pazar.info Accessed February 28, 2017
  49. Peng J., Zhu J., Bergamaschi A., et al. 2010. Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Ann. Appl. Stat. 4, 53.
  50. Pinna A., Soranzo N., and De La Fuente A. 2010. From knockouts to networks: Establishing direct cause-effect relationships through graph analysis. PLoS One 5, e12912.
  51. Portales-Casamar E., Arenillas D., Lim J., et al. 2009. The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences. Nucl. Acids Res. 37, D54–D60
  52. Portales-Casamar E., Kirov S., Lim J., et al. 2007. PAZAR: A framework for collection and dissemination of cis-regulatory sequence annotation. Genome Biol. 8, R207.
  53. randomForest package. 2015. Available at: cran.r-project.org/package=randomForest Accessed February 28, 2017
  54. Schadt E.E. 2009. Molecular networks as sensors and drivers of common human diseases. Nature 461, 218–223
  55. Schadt E.E., Lamb J., Yang X., et al. 2005a. An integrative genomics approach to infer causal associations between gene expression and disease. Nat. Genet. 37, 710–717
  56. Schadt E.E., Sachs A., and Friend S. 2005b. Embracing complexity, inching closer to reality. Sci. STKE 295, 40.
  57. Smedley D., Haider S., Durinck S., et al. 2015. The BioMart community portal: An innovative alternative to large, centralized data repositories. Nucl. Acids Res. 43, W589–W598
  58. Spring D.J., and Krebs E.G. 1999. Deletion of 11 amino acids in p90 rsk-mo-1 abolishes kinase activity. Mol. Cell. Biol. 19, 317–320
  59. Subramanian A., Narayan R., Corsello S.M., et al. 2017. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452.e17
  60. Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288
  61. van Someren E.P., Vaes B.L., Steegenga W.T., et al. 2006. Least absolute regression network analysis of the murine osteoblast differentiation network. Bioinformatics 22, 477–484
  62. Voit E.O. 2000. Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists. Cambridge University Press, New York, NY
  63. Werhli A.V., and Husmeier D. 2007. Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Stat. Appl. Genet. Mol. Biol. 6, 15.
  64. Woo J.H., Shimoni Y., Yang W.S., et al. 2015. Elucidating compound mechanism of action by network perturbation analysis. Cell 162, 441–451
  65. Yeung K.Y., Bumgarner R.E., and Raftery A.E. 2005. Bayesian model averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21, 2394–2402
  66. Yeung K.Y., Dombek K.M., Lo K., et al. 2011. Construction of regulatory networks using expression time-series data of a genotyped population. Proc. Natl. Acad. Sci. U.S.A. 108, 19436–19441
  67. Young W.C., Raftery A.E., and Yeung K.Y. 2014. Fast Bayesian inference for gene regulatory networks using ScanBMA. BMC Syst. Biol. 8, 47.
  68. Young W.C., Raftery A.E., and Yeung K.Y. 2016. A posterior probability approach for gene regulatory network inference in genetic perturbation data. Math. Biosci. Eng. 13, 1241–1251
  69. Young W.C., Yeung K.Y., and Raftery A.E. 2017. Model-based clustering with data correction for removing artifacts in gene expression data. Ann. Appl. Stat. 11, 1998–2026
  70. Yu J., Smith V.A., Wang P.P., et al. 2004. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 20, 3594–3603
  71. Zellner A. 1986. On assessing prior distributions and Bayesian regression analysis with g-prior distributions, 233–243. In: Goel P., and Zellner A., eds. Bayesian Inference and Decision Techniques: Essays in Honor of Bruno De Finetti. Elsevier Science Publishers, Inc., New York, NY
  72. Zhang S.-Q., Ching W.-K., Tsing N.-K., et al. 2010. A new multiple regression approach for the construction of genetic regulatory networks. Artif. Intell. Med. 48, 153–160
  73. Zhu J., Chen Y., Leonardson A.S., et al. 2010. Characterizing dynamic changes in the human blood transcriptional network. PLoS Comput. Biol. 6, e1000671.
  74. Zou H., and Hastie T. 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320

Articles from Journal of Computational Biology are provided here courtesy of Mary Ann Liebert, Inc.
