SUMMARY
Motivated by the study of the molecular mechanism underlying type 1 diabetes with gene expression data collected from both patients and healthy controls at multiple time points, we propose a hybrid Bayesian method for jointly estimating multiple dependent Gaussian graphical models with data observed under distinct conditions, which avoids inversion of high-dimensional covariance matrices and thus can be executed very fast. We prove the consistency of the proposed method under mild conditions. The numerical results indicate the superiority of the proposed method over existing ones in both estimation accuracy and computational efficiency. Extension of the proposed method to joint estimation of multiple mixed graphical models is straightforward.
Keywords: Data integration, Meta-analysis, Multiple Gaussian graphical models, ψ-Learning
1. Introduction
Type 1 diabetes (T1D) is one of the most common autoimmune diseases. The Environmental Determinants of Diabetes in the Young (TEDDY) study is designed to identify environmental exposures triggering islet autoimmunity and T1D in genetically high-risk children. A large dataset has been collected through the study, including clinical, genetic, and demographic data. While great efforts have been made to identify the genetic and environmental factors that contribute to the etiology of the disease, the molecular mechanism underlying the disease is still far from being understood. To enhance our understanding of the molecular mechanism, this work aims to learn gene regulatory networks (GRNs) by integrating the gene expression data measured from both the patients and healthy controls at multiple time points. Figure 1 shows the structure of the data, where the gene expression was measured for each of the case and control children at nine time points within 4 years of age. How to integrate the data collected under the 18 distinct conditions has posed a great challenge to current statistical methods.
Fig. 1.
Structure of the T1D data considered in the article, where the numbers represent nine time points at which gene expression data were collected and the arrows represent joint estimation of Gaussian graphical models by integrating the data across different time points and case–control groups.
During the past decade, a variety of approaches have been proposed for estimating multiple GRNs with data collected under multiple distinct conditions. These approaches can be roughly grouped into two categories, namely, regularization and Bayesian.
The regularization approaches work with specific penalty functions that enhance the shared structure of the graphical models. For example, Guo and others (2011) employed a hierarchical penalty that targets the removal of common zeros in the precision matrices across conditions. Danaher and others (2014) employed penalized fused lasso or group lasso penalties that encourage shared elements of the precision matrices. Chun and others (2015) employed a class of nonconvex penalty functions that regularize the common and condition-specific structures hierarchically. A shortcoming of these approaches is that they assume the observations under different conditions are independent. This assumption is hard to satisfy for temporal data, where the observations are taken from the same cohort at multiple time points. To address this issue, Zhou and others (2010) and Qiu and others (2016) proposed to model the temporal data as a high-dimensional time series and then estimate the time-varying graphical structure using a nonparametric method, by assuming that the covariance changes smoothly over time. These approaches usually require the time series to be fairly long, say, 50 or longer.
As an analog to the regularization approaches, Bayesian approaches enhance the shared structure of multiple graphical models by employing specific priors. For example, Peterson and others (2015) linked the estimation of graph structures via a Markov random field (MRF) prior which encourages common edges. However, since this method involves repeated calculations of concentration matrices (i.e., inverses of covariance matrices), it is only applicable when the graph is not very large. To accelerate computation, Lin and others (2017) proposed a Bayesian analog of the neighborhood selection method (Meinshausen and Bühlmann, 2006) to learn the structure of multiple graphical models with the MRF prior.
In this article, we propose a fast hybrid Bayesian integrative analysis (FHBIA) method for jointly estimating multiple Gaussian graphical models. The proposed method consists of both frequentist and Bayesian components. First, it applies the ψ-learning method, which is a frequentist method, to transform the original data to edge-wise ψ-scores. The ψ-score, which forms an equivalent measure of the partial correlation coefficient, provides a good summary of the graph structure information contained in the data under each condition. Then, it applies a Bayesian method to model the ψ-scores for edge clustering and applies a meta-analysis method for integrating data information across distinct conditions. Finally, it applies a multiple hypothesis test for edge determination. Due to the use of the ψ-score transformation, FHBIA avoids inversion of high-dimensional covariance matrices and thus can be executed very fast. The multiple hypothesis test produces a $q$-value (Storey, 2002), which can be viewed as an uncertainty measure, for each potential edge of the multiple Gaussian graphs. We prove consistency of the proposed method under mild conditions and illustrate its performance using simulated and real data examples. The numerical results indicate the superiority of the proposed method over the existing ones in both estimation accuracy and computational efficiency.
2. Fast hybrid Bayesian integrative analysis
The FHBIA method consists of a few steps, including ψ-score calculation, Bayesian clustering and meta-analysis, and joint edge detection (JED), with the diagram shown in Figure 2.
Fig. 2.
Diagram of the FHBIA method: (i) the datasets $X^{(k)}$'s, $k = 1, \ldots, M$, are first transformed to edgewise ψ-scores $z^{(k)}$'s through the step of ψ-score transformation; (ii) the ψ-scores are processed through the step of Bayesian clustering and meta-analysis to get the Bayesian integrated ψ-scores $\bar{z}$'s; (iii) the Bayesian integrated ψ-scores are further processed through the step of JED to get the graph estimates $\hat{G}^{(k)}$'s.
2.1. ψ-Score transformation

Suppose that we have a dataset of $p$ variables observed under $M$ distinct conditions. Let $X^{(k)} = \{x_1^{(k)}, \ldots, x_{n_k}^{(k)}\}$ denote the dataset observed under condition $k$, where $n_k$ denotes the sample size under condition $k$; and $x_i^{(k)}$ is a $p$-dimensional random vector distributed according to the multivariate normal distribution $N_p(\mu^{(k)}, \Sigma^{(k)})$, where $\mu^{(k)}$ and $\Sigma^{(k)}$ are the mean and covariance matrix of the distribution, respectively. The sample size $n_k$ is not necessarily the same for all conditions. Without loss of generality, we assume that $\mu^{(k)}$ is a zero vector for all $k$. With slight abuse of notation, we let $X_1, \ldots, X_p$ denote the $p$ variables that are common for all $M$ datasets. Let $V = \{1, 2, \ldots, p\}$ denote the index set of the variables, where each variable is called a node in the terminology of graphs.
We adopt the ψ-learning algorithm to transform each dataset $X^{(k)}$ to edge-wise scores independently. The ψ-learning algorithm first produces a ψ-partial correlation coefficient for each pair of nodes. Denote the ψ-partial correlation coefficients by $\psi_{ij}^{(k)}$ for all pairs $(i,j)$ with $1 \le i < j \le p$, which are equivalent to the true partial correlation coefficients $\rho_{ij}^{(k)}$ for determining the structure of the Gaussian graphical model (GGM) in the sense that $\psi_{ij}^{(k)} = 0$ if and only if $\rho_{ij}^{(k)} = 0$. Further, the ψ-learning algorithm converts the ψ-partial correlation coefficients to ψ-scores, denoted by $z_{ij}^{(k)}$, via Fisher's transformation and the probit transformation such that $z_{ij}^{(k)} \sim N(0,1)$ approximately holds under the null hypothesis $\rho_{ij}^{(k)} = 0$ for any pair $(i,j)$. Therefore, the ψ-score can be used as a test statistic for identifying nonzero partial correlation coefficients and thus the structure of the GGM. The use of ψ-scores enables the proposed method to avoid inversion of high-dimensional covariance matrices that the existing Bayesian methods often need to deal with, and hence the proposed method can be executed very fast. Refer to the supplementary material available at Biostatistics online for the details of the ψ-learning algorithm.
Since the GGM is undirected, we have a total of $p(p-1)/2$ ψ-scores to calculate for each dataset $X^{(k)}$. For convenience, we re-arrange the ψ-scores for each dataset $X^{(k)}$ into a $p(p-1)/2$-vector $z^{(k)}$ with elements $z_{ij}^{(k)}$, $1 \le i < j \le p$, and re-arrange the ψ-scores for all $M$ datasets into a $[p(p-1)/2] \times M$ matrix $Z$ with columns $z^{(1)}, \ldots, z^{(M)}$.
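To make the score transformation concrete, the sketch below (in Python; the function name is ours, and it uses a plain estimated partial correlation in place of the ψ-partial correlation as a simplification) shows the Fisher transformation that maps a correlation estimated given a conditioning set of size $|S|$ from $n$ samples to an approximately standard Gaussian score under the null:

```python
import numpy as np

def fisher_z(r, n, sep_size):
    """Transform an estimated (partial) correlation r, computed from n
    samples given a conditioning set of size sep_size, into a score that
    is approximately N(0, 1) under the null hypothesis r = 0."""
    return np.sqrt(n - sep_size - 3.0) * np.arctanh(r)

# Example: a weak partial correlation from 100 samples, 2 conditioning nodes
score = fisher_z(0.1, n=100, sep_size=2)
```

Under the null, such scores are directly comparable across edges and across conditions, which is what the clustering and meta-analysis steps below rely on.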
2.2. Bayesian clustering and meta-analysis

Consider the ψ-scores $z_{ij} = (z_{ij}^{(1)}, \ldots, z_{ij}^{(M)})$, where each pair $(i,j)$ corresponds to one candidate edge in the graphs. Let $e_{ij}^{(k)}$ be the indicator for the status of edge $(i,j)$ in the underlying graph $G^{(k)}$: $e_{ij}^{(k)} = 1$ if the edge exists and 0 otherwise. The $e_{ij}^{(k)}$'s work as latent variables in FHBIA. Conditioned on $e_{ij}^{(k)}$, we assume that the $z_{ij}^{(k)}$'s are mutually independent and follow a two-component mixture Gaussian distribution given by

\[ z_{ij}^{(k)} \mid e_{ij}^{(k)} \sim (1 - e_{ij}^{(k)})\, N(\mu_0, \sigma_0^2) + e_{ij}^{(k)}\, N(\mu_1, \sigma_1^2), \]  (2.1)

for $k = 1, \ldots, M$ and $1 \le i < j \le p$. When $e_{ij}^{(k)} = 0$, $z_{ij}^{(k)}$ has a value close to 0; otherwise, $z_{ij}^{(k)}$ might have a large negative or positive value depending on the sign of the partial correlation coefficient. Under the assumption that the structure of the GGM changes only slightly under adjacent conditions, it is reasonable to assume that for each pair $(i,j)$, the sign of the $z_{ij}^{(k)}$'s does not change when the edge exists; therefore, the $z_{ij}^{(k)}$'s can be modeled by a two-component mixture Gaussian distribution. In some cases, e.g., when $M$ grows, a three-component mixture Gaussian distribution might be needed, which allows us to handle the scenario that an edge is included in multiple graphs but its partial correlation coefficients have different signs in different graphs. The derivation under this scenario, which is a simple extension of the derivation presented below, is given in the supplementary material available at Biostatistics online. Regarding the two-component mixture distribution (2.1), we further note that $\mu_0$ can simply be set to 0, considering the physical meaning of the ψ-scores. However, as shown below, the general setup does not cause any computational difficulty.
Essentially, we have formulated the problem of inference of the $e_{ij}^{(k)}$'s as a clustering problem, grouping $z_{ij}^{(1)}, \ldots, z_{ij}^{(M)}$ into up to two different clusters; the case of the three-component mixture distribution is similar. Let $e_{ij} = (e_{ij}^{(1)}, \ldots, e_{ij}^{(M)})$. Conditioned on $e_{ij}$, the joint likelihood function of $z_{ij}$ is given by

\[ \pi(z_{ij} \mid e_{ij}, \mu_0, \mu_1, \sigma_0^2, \sigma_1^2) = \prod_{k=1}^{M} \phi(z_{ij}^{(k)}; \mu_{e_{ij}^{(k)}}, \sigma_{e_{ij}^{(k)}}^2), \]  (2.2)

where $\phi(\cdot; \mu, \sigma^2)$ is the density function of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Taking a product of (2.2) over all pairs $(i,j)$, we have the joint distribution of all ψ-scores conditioned on the $e_{ij}$'s and other parameters. Then, using the Bayes theorem, the $e_{ij}$'s can be inferred with appropriate priors for the $e_{ij}$'s and the other parameters. For example, the MRF prior used in Peterson and others (2015) and Lin and others (2017) can again be used here as the prior of the $e_{ij}$'s. In that case, the posterior distribution can be sampled using a Markov chain Monte Carlo (MCMC) algorithm.
Instead of specifying a joint prior distribution for all the $e_{ij}$'s, we assume in this article that the $e_{ij}$'s are a priori independent for different pairs $(i,j)$, as we believe that the neighboring dependence of the Gaussian graphical network has been accounted for in the calculation of the ψ-scores. To enhance shared edges among distinct conditions, we consider two types of priors for the $e_{ij}$'s, namely, the temporal prior and the spatial prior, with the terms borrowed from geostatistics. Figure 3 illustrates the application scenarios of the two types of priors. The temporal prior can be used in the scenario that the networks $G^{(k)}$'s evolve sequentially along with the index $k$; in this scenario, it is quite common to consider the index $k$ as the time of experiments. The spatial prior can be used in the scenario that the networks or precision matrices evolve independently from a common structure. For example, the temporal prior can be applied when we construct genetic networks using a set of gene expression data measured for the same tissue at multiple time points, and the spatial prior can be applied if the gene expression data are measured for different tissues at the same time point.
Fig. 3.
Illustration of application scenarios of the temporal and spatial priors: (a) networks evolve along with time (temporal prior); (b) networks evolve independently from a common structure (spatial prior).
2.2.1. Temporal prior

To enhance the similarity of the networks between adjacent conditions, we let $e_{ij}$ be subject to the following prior distribution

\[ \pi(e_{ij} \mid q_{ij}) \propto \prod_{k=2}^{M} q_{ij}^{d_{ij}^{(k)}} (1 - q_{ij})^{1 - d_{ij}^{(k)}}, \]  (2.3)

where $d_{ij}^{(k)} = |e_{ij}^{(k)} - e_{ij}^{(k-1)}|$ indicates the change of the status of the edge $(i,j)$ from condition $k-1$ to condition $k$, and $q_{ij}$ is a prior hyperparameter representing the prior probability of edge status changes. In this article, we assume that $q_{ij}$ follows a beta distribution $\mathrm{Beta}(a, b)$, where $a$ and $b$ are pre-specified parameters. Further, we let $\mu_0$ and $\mu_1$ be subject to an improper uniform prior distribution, i.e., $\pi(\mu_0) \propto 1$ and $\pi(\mu_1) \propto 1$, and let $\sigma_0^2$ and $\sigma_1^2$ be subject to an inverted-gamma prior distribution, i.e., $\sigma_0^2, \sigma_1^2 \sim \mathrm{IG}(\alpha, \beta)$, where $\alpha$ and $\beta$ are pre-specified constants. Then the joint posterior distribution of $(e_{ij}, q_{ij}, \mu_0, \mu_1, \sigma_0^2, \sigma_1^2)$ is given by
\[ \pi(e_{ij}, q_{ij}, \mu_0, \mu_1, \sigma_0^2, \sigma_1^2 \mid z_{ij}) \propto \pi(z_{ij} \mid e_{ij}, \mu_0, \mu_1, \sigma_0^2, \sigma_1^2)\, \pi(e_{ij} \mid q_{ij})\, \pi(q_{ij})\, \pi(\mu_0)\, \pi(\mu_1)\, \pi(\sigma_0^2)\, \pi(\sigma_1^2), \]

where the $\pi(\cdot)$'s denote the respective prior distributions. After integrating out the parameters $q_{ij}$, $\mu_0$, $\mu_1$, $\sigma_0^2$, and $\sigma_1^2$, we have the marginal posterior distribution of $e_{ij}$ given by

\[ \pi(e_{ij} \mid z_{ij}) \propto B(a + d_{ij},\, b + M - 1 - d_{ij}) \prod_{c \in \{0,1\}} \frac{(2\pi)^{-(n_c - 1)/2}}{\sqrt{n_c}} \cdot \frac{\beta^{\alpha}\, \Gamma(\alpha + (n_c - 1)/2)}{\Gamma(\alpha)\, (\beta + S_c/2)^{\alpha + (n_c - 1)/2}} \]  (2.4)

when $n_0 \geq 1$ and $n_1 \geq 1$ hold, where $B(\cdot, \cdot)$ denotes the beta function, $d_{ij} = \sum_{k=2}^{M} |e_{ij}^{(k)} - e_{ij}^{(k-1)}|$ denotes the total number of edge status changes, $n_c = \#\{k : e_{ij}^{(k)} = c\}$ denotes the size of cluster $c$, and $S_c = \sum_{k: e_{ij}^{(k)} = c} (z_{ij}^{(k)} - \bar{z}_c)^2$ with $\bar{z}_c$ being the sample mean of the ψ-scores in cluster $c$. When $n_0 = M$ and $n_1 = 0$, or $n_0 = 0$ and $n_1 = M$, all the ψ-scores fall into a single cluster, and the product in (2.4) reduces to the single factor corresponding to the nonempty cluster.
Given $M$ distinct conditions, the total number of possible configurations of $e_{ij}$ is $2^M$. When $M$ is small, we can provide an exhaustive evaluation of the $2^M$ configurations. That is, for each possible configuration of $e_{ij}$, we can calculate its posterior probability and integrated ψ-score exactly. For each possible configuration $e_{ij,t}$, $t = 1, \ldots, 2^M$, we denote the posterior probability by $\pi(e_{ij,t} \mid z_{ij})$ and the integrated ψ-score by $\tilde{z}_{ij}(e_{ij,t})$. According to Stouffer's meta-analysis method (Stouffer and others, 1949), which is also known as the inverse normal method for combining $p$-values (Zaykin, 2011), we define the integrated ψ-score as

\[ \tilde{z}_{ij}(e_{ij}) = \frac{\sum_{k=1}^{M} w_k e_{ij}^{(k)} z_{ij}^{(k)}}{\sqrt{\sum_{k=1}^{M} w_k^2 e_{ij}^{(k)}}} \]  (2.5)

for any nonzero configuration $e_{ij}$, with $\tilde{z}_{ij}(e_{ij}) = 0$ for the zero configuration, where the weight $w_k$ might account for the size or quality of the samples collected under each condition. In this article, we set $w_k = 1$ for all $k$. Such a weighted average score integrates the data information on the edge across all conditions. Then the Bayesian Stouffer integrated ψ-score, or Bayesian integrated ψ-score in short, is given by

\[ \bar{z}_{ij} = \sum_{t=1}^{2^M} \pi(e_{ij,t} \mid z_{ij})\, \tilde{z}_{ij}(e_{ij,t}). \]  (2.6)
When $M$ is large, the $\pi(e_{ij,t} \mid z_{ij})$'s can be estimated with a short MCMC run, say, by running the Gibbs sampler (Geman and Geman, 1984) for a few hundred iterations. Since the MCMC can be run in parallel for different pairs $(i,j)$, the computation is not a big burden in this case.
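The exhaustive evaluation for one candidate edge can be sketched as follows. This is a simplified illustration of our own rather than the authors' implementation: the cluster marginal uses a flat prior on both cluster means with an IG(alpha, beta) prior on the variances, the temporal Beta(a, b) prior is placed on the change probability, and all hyperparameter values shown are illustrative defaults.

```python
import itertools

import numpy as np
from scipy.special import betaln, gammaln

def log_marginal_cluster(z, alpha=2.0, beta=1.0):
    """Log marginal likelihood of the scores z in one cluster, obtained by
    integrating out the cluster mean (flat prior) and variance
    (IG(alpha, beta) prior); returns 0 for an empty cluster so that the
    corresponding factor drops out."""
    m = len(z)
    if m == 0:
        return 0.0
    s = 0.5 * np.sum((z - np.mean(z)) ** 2)
    return (-0.5 * (m - 1) * np.log(2.0 * np.pi) - 0.5 * np.log(m)
            + alpha * np.log(beta) - gammaln(alpha)
            + gammaln(alpha + 0.5 * (m - 1))
            - (alpha + 0.5 * (m - 1)) * np.log(beta + s))

def bayesian_stouffer(z, a=1.0, b=9.0):
    """Enumerate all 2^M edge configurations for one candidate edge under a
    temporal Beta(a, b) prior on the change probability, and return the
    posterior-weighted Stouffer score (unit weights w_k = 1)."""
    z = np.asarray(z, dtype=float)
    M = len(z)
    log_post, scores = [], []
    for conf in itertools.product([0, 1], repeat=M):
        e = np.array(conf)
        d = np.sum(np.abs(np.diff(e)))  # number of edge status changes
        lp = (betaln(a + d, b + M - 1 - d)
              + log_marginal_cluster(z[e == 0])
              + log_marginal_cluster(z[e == 1]))
        n1 = e.sum()
        scores.append(z[e == 1].sum() / np.sqrt(n1) if n1 > 0 else 0.0)
        log_post.append(lp)
    log_post = np.array(log_post)
    w = np.exp(log_post - log_post.max())  # normalize in log space
    w /= w.sum()
    return float(np.dot(w, scores))
```

Since the computation is independent across the $p(p-1)/2$ candidate edges, the loop over edges parallelizes trivially, which is the same observation made above for the MCMC case.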
It is interesting to point out that the Bayesian Stouffer integrated ψ-score $\bar{z}_{ij}$ is different from the conventional Bayesian estimator of $z_{ij}$. The latter is given by

\[ \hat{z}_{ij}(e_{ij}) = \frac{\sum_{k=1}^{M} w_k e_{ij}^{(k)} z_{ij}^{(k)}}{\sum_{k=1}^{M} w_k e_{ij}^{(k)}} \]  (2.7)

and

\[ \hat{z}_{ij} = \sum_{t=1}^{2^M} \pi(e_{ij,t} \mid z_{ij})\, \hat{z}_{ij}(e_{ij,t}). \]  (2.8)

It is easy to see that the Bayesian Stouffer integrated ψ-score amplifies the Bayesian averaged ψ-score (2.7) by a factor between 1 and $\sqrt{M}$. Such amplification makes the two clusters of edges more separable in the scores and, as pointed out in the Proof of Lemma 3.4 (in the supplementary material available at Biostatistics online), helps to improve the power of the proposed method by reducing the false negative error. Also, we would point out that for each configuration, if the edge clustering pattern $e_{ij}$ is correct and $w_k = 1$ for all $k$, then the Stouffer integrated ψ-score $\tilde{z}_{ij}(e_{ij})$ has a constant variance of 1, while the simply averaged ψ-score $\hat{z}_{ij}(e_{ij})$ has a varied variance depending on the number of conditions under which the edge is present. Therefore, the Bayesian Stouffer integrated ψ-scores are more comparable than the Bayesian averaged ψ-scores in edge determination.
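The variance claim can be checked numerically. The short simulation below (our own construction, not from the article) draws unit-variance scores for an edge that is present under all $M$ conditions and compares the empirical variances of the Stouffer-combined and simply averaged scores:

```python
import numpy as np

rng = np.random.default_rng(0)
M, reps = 4, 200_000
# ψ-scores for an edge present under all M conditions: N(mu1, 1) each
z = rng.normal(loc=3.0, scale=1.0, size=(reps, M))

stouffer = z.sum(axis=1) / np.sqrt(M)  # variance stays 1 for any M
average = z.mean(axis=1)               # variance shrinks to 1/M
```

With $M = 4$ the empirical variances come out close to 1 and 1/4, respectively, illustrating why a common threshold can be applied to the Stouffer-combined scores across configurations.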
2.2.2. Spatial prior

To encode our prior knowledge that there exists a common structure from which all the networks evolve independently, we let the $e_{ij}$'s be subject to the prior distribution

\[ \pi(e_{ij} \mid q_{ij}) \propto \prod_{k=1}^{M} q_{ij}^{|e_{ij}^{(k)} - \breve{e}_{ij}|} (1 - q_{ij})^{1 - |e_{ij}^{(k)} - \breve{e}_{ij}|}, \]  (2.9)

where $|e_{ij}^{(k)} - \breve{e}_{ij}|$ indicates the status change of the edge $(i,j)$ at condition $k$ from the common structure, and $\breve{e}_{ij}$ is the mode of $(e_{ij}^{(1)}, \ldots, e_{ij}^{(M)})$ and represents the common status of the edge $(i,j)$ across all networks. With this prior distribution, the posterior distribution $\pi(e_{ij} \mid z_{ij})$ can also be expressed in the form of (2.4), but with $d_{ij} = \sum_{k=1}^{M} |e_{ij}^{(k)} - \breve{e}_{ij}|$ and the beta function term replaced by $B(a + d_{ij}, b + M - d_{ij})$.
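Under the spatial prior, the quantity playing the role of the number of temporal status changes is the number of deviations from the common (modal) status. A minimal helper (our own; the tie-breaking rule toward edge presence is an arbitrary choice for illustration):

```python
import numpy as np

def spatial_changes(e):
    """Return (number of conditions deviating from the majority status,
    majority status) for a configuration e of edge indicators; ties in
    the majority vote are broken toward edge presence (an arbitrary
    illustrative convention)."""
    e = np.asarray(e)
    mode = int(e.sum() * 2 >= len(e))  # majority vote over conditions
    return int(np.sum(e != mode)), mode
```

For example, a configuration in which a single condition deviates from an otherwise shared edge receives a distance of 1 regardless of which condition deviates, in contrast to the temporal prior, where the position of the change matters.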
2.3. Joint edge detection
To jointly estimate the structure of multiple GGMs based on the Bayesian Stouffer integrated ψ-scores (2.6), a multiple hypothesis test can be applied. The multiple hypothesis test classifies the integrated ψ-scores into two classes, presence of edges and absence of edges. In this article, we adopt the empirical Bayesian method developed by Liang and Zhang (2008) for the multiple hypothesis test, which models the integrated ψ-scores by a two-component mixture distribution $f(\bar{z}) = (1 - \lambda) f_0(\bar{z}) + \lambda f_1(\bar{z})$, where $\bar{z}$ denotes an integrated ψ-score; $1 - \lambda$ and $\lambda$ denote the probabilities of edge absence and edge presence, respectively; and $f_0$ and $f_1$ denote the probability density functions of the integrated ψ-scores with edge absence and edge presence, respectively. As in Liang and Zhang (2008), we parameterize $f_0$ by an exponential power distribution, and parameterize $f_1$ by a mixture of exponential power distributions. The parameter $\lambda$ and those contained in $f_0$ and $f_1$ are estimated using the stochastic approximation method by minimizing the Kullback–Leibler divergence between the fitted mixture density and the empirical one. The threshold values for grouping the integrated ψ-scores into the two classes are determined according to the value of $\alpha_2$, a pre-specified false discovery rate level. How to specify the value of $\alpha_2$ will be discussed in Section 2.4. Note that this multiple hypothesis test method allows for dependence between the test statistics, i.e., the integrated ψ-scores for this problem. Other methods that account for the dependence between test statistics, e.g., the two-stage method by Benjamini and Yekutieli (2001), can also be applied here.

Finally, we would like to point out that the empirical Bayesian method used above produces a $q$-value (Storey, 2002) for each potential edge of the multiple graphs. The $q$-value, like the $p$-value for the single hypothesis test, provides an uncertainty measure for each potential edge.
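The empirical Bayes test of Liang and Zhang (2008) fits exponential power mixtures; as a simplified stand-in, the sketch below uses a two-component Gaussian mixture with parameters assumed already fitted, and calls edges in order of increasing posterior null probability (local false discovery rate) until the estimated FDR of the called set would exceed the target level:

```python
import numpy as np
from scipy.stats import norm

def jed_threshold(scores, lam, mu1, sigma1, fdr=0.05):
    """Classify integrated scores into edge absence/presence under a
    fitted mixture (1 - lam) * N(0, 1) + lam * N(mu1, sigma1^2), using a
    local-fdr rule: call edges with the smallest posterior null
    probabilities while their running average stays below `fdr`."""
    scores = np.asarray(scores, dtype=float)
    f0 = norm.pdf(scores, 0.0, 1.0)
    f1 = norm.pdf(scores, mu1, sigma1)
    lfdr = (1.0 - lam) * f0 / ((1.0 - lam) * f0 + lam * f1)
    order = np.argsort(lfdr)
    cum_fdr = np.cumsum(lfdr[order]) / np.arange(1, scores.size + 1)
    n_call = int(np.sum(cum_fdr <= fdr))
    called = np.zeros(scores.size, dtype=bool)
    called[order[:n_call]] = True
    return called
```

This is only a Gaussian stand-in for the exponential power mixtures used in the article, but it illustrates how a single FDR level $\alpha_2$ translates into a threshold on the integrated scores.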
2.4. Parameter setting
FHBIA contains two free parameters, $\alpha_1$ and $\alpha_2$, which refer to the significance levels of the multiple hypothesis tests conducted in correlation screening and JED, respectively. Following the suggestion of Liang and others (2015), we use the default values recommended there; otherwise, the values used will be stated in the context. In general, a high significance level of correlation screening will lead to slightly larger conditioning sets in the calculation of the ψ-partial correlation coefficients, which reduces the risk of missing important variables in the conditioning sets. Including a few false variables in the conditioning sets will not hurt the accuracy of the ψ-partial correlation coefficients much. As shown in Xu and others (2019), the performance of the ψ-learning algorithm can be quite robust to the choice of $\alpha_1$. The setting of $\alpha_2$, in contrast, is quite free; it determines the sparsity of the resulting graphs, and a smaller value of $\alpha_2$ might be used if sparser graphs are preferred.

In addition to the two free parameters, FHBIA contains four prior hyperparameters, $a$, $b$, $\alpha$, and $\beta$. Since the probability $q_{ij}$ usually takes a small value, we choose $a$ and $b$ accordingly for its prior distribution Beta($a$, $b$). Since the variance of the ψ-scores is approximately equal to 1 under the null hypothesis that the true partial correlation coefficient is equal to 0, we choose $\alpha$ and $\beta$ accordingly for the prior distribution IG($\alpha$, $\beta$). The same prior hyperparameter settings have been used in all examples of this article.
2.5. Consistency
Under the faithfulness assumption, the sparsity assumption, and other regularity conditions for the joint Gaussian distribution, e.g., the dimension $p$ is allowed to grow exponentially with the sample size $n$ at a rate controlled by some constant, and the largest eigenvalue of the covariance matrix can grow with $p$ at a restricted rate, Liang and others (2015) showed that the multiple hypothesis test based on the ψ-scores produces a consistent estimate of the GGM with data observed under a single condition. Essentially, Liang and others (2015) showed that the ψ-scores are separable in probability for the pairs of nodes with edge absence and edge presence.

To accommodate the change from a single condition to multiple conditions, we modified the assumptions of Liang and others (2015) and added an assumption about the number of conditions $M$. Under the new set of assumptions, we proved that the FHBIA method is consistent.
Theorem 2.1
Assume that the regularity conditions given in the supplementary material available at Biostatistics online hold. Then there exists a threshold value $\zeta$ such that

\[ P\left( \hat{G}_{\zeta}^{(k)} = G^{(k)},\ k = 1, \ldots, M \right) \to 1 \quad \text{as } n \to \infty, \]

where $G^{(k)}$ denotes the true network under condition $k$, $\hat{G}_{\zeta}^{(k)}$ denotes the FHBIA estimator of $G^{(k)}$, and $\zeta$ denotes a threshold value of the Bayesian integrated ψ-scores based on which the edges are determined for all $M$ graphs.
The proof of the theorem is given in the supplementary material available at Biostatistics online. Theorem 2.1 implies that for all graphs there exists a common threshold with respect to which the Bayesian integrated ψ-scores are separable in probability for the pairs of nodes with edge presence and edge absence. Here, we would like to highlight three points. First, as indicated by our proof [see inequality (S29) in the Proof of Lemma 3.5 in the supplementary material available at Biostatistics online], the data integration step can indeed improve the power of the proposed method. Second, following from the inequalities (S29) and (S30) and the condition given in the supplementary material available at Biostatistics online, we can conclude the sign consistency of the estimator; i.e., for any edge of the graph, the sign of the Bayesian integrated ψ-score is the same as that of the true partial correlation coefficient when the sample size becomes large. Third, the assumption imposed on $M$ is rather weak; it only requires $M$ to satisfy a mild condition involving some positive constants defined in the other assumptions (see the supplementary material available at Biostatistics online). This is consistent with our numerical results; the method can perform very well even with a small value of $M$.
3. Simulation studies
3.1. Scenario with temporal priors
To illustrate the performance of the proposed FHBIA method under the scenario with temporal priors, we consider three types of network structures, namely, autoregressive (AR), scale-free, and hub, which are all allowed to change slightly with the evolvement of conditions. For all types of structures, we fixed the dimension $p$ and the number of conditions $M = 4$, and varied the sample size $n$ over a few values up to 500. We let $C^{(k)}$ denote the precision matrix at condition $k$ for $k = 1, \ldots, M$. At each condition $k$, we generated 10 independent datasets of size $n$ by drawing from the multivariate Gaussian distribution $N_p(0, (C^{(k)})^{-1})$.
For the AR network structure, the precision matrix at condition 1 is the banded matrix

\[ C^{(1)} = \begin{pmatrix} 1 & 0.5 & 0.25 & & & \\ 0.5 & 1 & 0.5 & 0.25 & & \\ 0.25 & 0.5 & 1 & 0.5 & \ddots & \\ & 0.25 & 0.5 & \ddots & \ddots & 0.25 \\ & & \ddots & \ddots & 1 & 0.5 \\ & & & 0.25 & 0.5 & 1 \end{pmatrix}, \]  (3.10)

which represents an AR(2) graphical model. To construct $C^{(2)}$, we employed the following random edge deleting–adding procedure: we first randomly removed 5% of the edges in $C^{(1)}$ by setting the corresponding nonzero elements to 0, and then added the same number of edges at random by replacing zeros in $C^{(1)}$ with values drawn from a uniform distribution defined on a range bounded away from zero; to ensure $C^{(2)}$ to be positive definite, we set the diagonal elements of $C^{(2)}$ to be the absolute value of the smallest eigenvalue of $\tilde{C}^{(2)}$ plus a small positive number, where $\tilde{C}^{(2)}$ is obtained from $C^{(2)}$ by setting the diagonal elements to zero. In the same procedure, we generated $C^{(3)}$ conditioned on $C^{(2)}$ and then generated $C^{(4)}$ conditioned on $C^{(3)}$. We note that similar procedures have been used in Peterson and others (2015) and Lin and others (2017) to generate multiple precision matrices. For the scale-free and hub structures, we first generated the precision matrix $C^{(1)}$ using the R package "huge," then applied the random edge deleting–adding procedure to generate the $C^{(k)}$'s for $k = 2, 3, 4$ in a sequential manner.
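The random edge deleting–adding procedure can be sketched as follows (in Python; the magnitude range for new edges and the diagonal offset are illustrative placeholders, not the exact values used in the simulations):

```python
import numpy as np

def perturb_precision(C, frac=0.05, low=0.2, high=0.4, eps=0.1, seed=0):
    """Remove a fraction of the existing off-diagonal edges of a precision
    matrix C, add the same number of new edges with magnitudes drawn
    uniformly from [low, high] with random sign (illustrative range), and
    restore positive definiteness by resetting the diagonal."""
    rng = np.random.default_rng(seed)
    p = C.shape[0]
    A = C.copy()
    iu = np.triu_indices(p, k=1)
    present = np.flatnonzero(A[iu] != 0)
    absent = np.flatnonzero(A[iu] == 0)
    n_move = max(1, int(frac * len(present)))
    drop = rng.choice(present, n_move, replace=False)
    add = rng.choice(absent, n_move, replace=False)
    vals = rng.uniform(low, high, n_move) * rng.choice([-1, 1], n_move)
    for k in drop:
        i, j = iu[0][k], iu[1][k]
        A[i, j] = A[j, i] = 0.0
    for k, v in zip(add, vals):
        i, j = iu[0][k], iu[1][k]
        A[i, j] = A[j, i] = v
    # diagonal = |smallest eigenvalue of the off-diagonal part| + eps,
    # which shifts all eigenvalues above eps > 0
    np.fill_diagonal(A, 0.0)
    lam_min = np.linalg.eigvalsh(A).min()
    np.fill_diagonal(A, abs(lam_min) + eps)
    return A
```

Since the diagonal shift adds $|\lambda_{\min}| + \varepsilon$ to every eigenvalue of the off-diagonal part, the resulting matrix is guaranteed positive definite while the edge pattern (and hence the graph) is exactly the perturbed one.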
The FHBIA method was first applied to this example. To assess the performance of the method, we plot the precision–recall curves in Figure S1 of the supplementary material available at Biostatistics online. The same rule applies to other tables and figures included in the supplementary material available at Biostatistics online. The precision and recall are defined by $\mathrm{precision} = \mathrm{TP}/(\mathrm{TP} + \mathrm{FP})$ and $\mathrm{recall} = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$, where TP, FP, and FN denote true positives, false positives, and false negatives, respectively, as defined in Table S1 of the supplementary material available at Biostatistics online. To draw the precision–recall curves shown in Figure S1 of the supplementary material available at Biostatistics online, we fixed the significance level of correlation screening at its default value and varied the value of $\alpha_2$, the significance level of JED. The precision and recall values were calculated by accumulating the TP, FP, FN, and TN values across all $M$ conditions. In this article, we employ the precision–recall curve instead of the receiver operating characteristic (ROC) curve because the classification problem involved in recovering the network structure is severely imbalanced, containing a large number of negative cases due to the network sparsity. As pointed out by Saito and Rehmsmeier (2015) and Davis and Goadrich (2006), the precision–recall curve can be more informative than the ROC curve in the imbalanced classification scenario.
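The accumulation of counts across conditions can be written compactly; a minimal helper of our own, operating on 0/1 adjacency matrices:

```python
import numpy as np

def precision_recall(est_graphs, true_graphs):
    """Precision and recall accumulated over all conditions, with TP, FP,
    and FN counted on the upper-triangular adjacency entries (the graphs
    are undirected)."""
    tp = fp = fn = 0
    for E, G in zip(est_graphs, true_graphs):
        iu = np.triu_indices(G.shape[0], k=1)
        e, g = E[iu].astype(bool), G[iu].astype(bool)
        tp += np.sum(e & g)
        fp += np.sum(e & ~g)
        fn += np.sum(~e & g)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Sweeping the JED significance level and recording one (recall, precision) pair per setting traces out the curves reported in the supplementary figures.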
For comparison, the MRF method (Lin and others, 2017), fused graphical Lasso (FGL), and group graphical Lasso (GGL) (Danaher and others, 2014) were applied to this example. The Matlab code of MRF is available at https://github.com/linzx06/Spatial-and-Temporal-GGM, and both FGL and GGL are implemented in the R package JGL. For a thorough comparison, we also applied the original ψ-learning algorithm to this example, for which the models under each condition were estimated separately. The results are summarized in Figure S1 and Table S2 of the supplementary material available at Biostatistics online. The comparison indicates that FHBIA significantly outperforms the existing methods, especially when the sample size is small. When the sample size is large, FHBIA, MRF, FGL, and GGL tend to perform similarly for the scale-free and hub networks. It is not surprising that FHBIA always outperforms the separated ψ-learning algorithm, which implies the importance of data integration for such high-dimensional problems.
Table S3 of the supplementary material available at Biostatistics online reports the CPU times cost by FGL, GGL, MRF, separated ψ-learning, and FHBIA for one dataset of AR(2) structure, where the CPU time was measured on a Linux desktop with an Intel Core i7-4790 CPU @3.60GHz. All computations reported in this article were done on the same computer. The CPU times of these methods for the other two graph structures are about the same. FGL is extremely slow for this example, as it needs to search over a grid of possible values of the regularization parameters $(\lambda_1, \lambda_2)$ for an optimal setting. The grid we used consists of 100 different pairs of $(\lambda_1, \lambda_2)$. Moreover, for each pair of $(\lambda_1, \lambda_2)$, it needs to solve a generalized fused Lasso problem, for which a closed-form solution does not exist when $M$ is greater than 2. Solving the generalized fused Lasso problem is time consuming, with a computational complexity that grows rapidly with the dimension $p$. GGL is better, as there exists a closed-form solution to the regularized parameter optimization problem under each setting of $(\lambda_1, \lambda_2)$, although the optimal setting also needs to be searched over a grid of 100 points. The computational complexity of MRF is given in Lin and others (2017); the cost of FHBIA is dominated by the exhaustive evaluation of the $2^M$ edge configurations for each of the $p(p-1)/2$ candidate edges, and thus FHBIA can be pretty fast for a small value of $M$. The separated ψ-learning is a little more time consuming than FHBIA because it needs to conduct multiple hypothesis tests under each condition.
3.2. Scenario with spatial priors
As in the scenario with temporal priors, we considered three types of network structures: AR(2), scale-free, and hub. For each type of structure, we fixed the dimension $p$ and set the number of conditions to $M = 5$, and tried two sample sizes. For AR(2), we first generated a common precision matrix $C^{(0)}$ according to (3.10). Conditioned on $C^{(0)}$, we generated the precision matrices $C^{(k)}$, $k = 1, \ldots, M$, independently using the random edge deleting–adding procedure described in the scenario of temporal priors. For the other two types of structures, we generated the common precision matrix $C^{(0)}$ using the R package huge, and then generated $C^{(k)}$, $k = 1, \ldots, M$, independently using the random edge deleting–adding procedure. Given the precision matrices, we then generated 10 independent datasets of size $n$ by drawing from the multivariate Gaussian distribution $N_p(0, (C^{(k)})^{-1})$ for each condition $k$.
The FHBIA, MRF, FGL, GGL, separated ψ-learning, and graphical EM (Xie and others, 2016) methods were applied to this example. The graphical EM algorithm was specially designed for jointly estimating multiple dependent Gaussian graphical models under this scenario.

Figure S2 of the supplementary material available at Biostatistics online shows the precision–recall curves produced for two datasets by FHBIA, MRF, FGL, GGL, separated ψ-learning, and graphical EM. Table S4 of the supplementary material available at Biostatistics online summarizes the performance of these methods for all simulated datasets of this example. The comparison indicates that FHBIA significantly outperforms all the other methods, especially when the sample size is small.

Table S5 of the supplementary material available at Biostatistics online reports the CPU times cost by MRF, FGL, GGL, separated ψ-learning, graphical EM, and FHBIA for one dataset of AR(2) structure. The CPU times for the other two graph structures are about the same. For FGL, this example is even more time consuming than the previous one, although it was run under exactly the same settings for the two examples. One reason is that $M$ has increased from 4 to 5. For FHBIA, the CPU time is not much increased compared with the previous example.
4. TEDDY data analysis
This section applies the FHBIA method to the mRNA gene expression data collected in the TEDDY study. In the study, to reduce potential bias and retain study power while reducing the costs by limiting the number of samples requiring laboratory analyses, the gene expression data were collected from a nested matched case–control cohort. A subject who developed either of two primary outcomes, persistent confirmed islet autoimmunity (i.e., the presence of one confirmed autoantibody, GADA65A, IA-2A, or IAA, on two or more consecutive samples) and/or T1D, was defined as a case. The controls were randomly selected among cohort members who had not yet developed the disease at the time a case was diagnosed. For each subject, the gene expression data were collected at multiple time points within 4 years of age. Refer to Lee and others (2014) for a detailed description of the study. Our goal is to integrate all the data to construct one gene network under each distinct condition.

The dataset consists of 21 285 genes and 742 samples collected at multiple time points from a total of 313 subjects. Among the 742 samples, half are for the cases and half are for the controls. The dataset also contains some external variables for each patient, which include age (the time of data collection), gender, race, race ethnicity, season of birth, number of older siblings, and country. To simplify the analysis, we first filtered out the non-differentially expressed genes across the case and control conditions. This was done by conducting a paired $t$-test for each gene at each time point and then applying the multiple hypothesis test method by Liang and Zhang (2008) to identify the set of genes that are significantly differentially expressed under the two conditions at least at one time point. With this filtering process, 572 genes were selected for further study. Figure S3 of the supplementary material available at Biostatistics online shows the histogram of the ages of the samples. Based on this histogram, we selected only the samples falling into the first nine groups for further analysis, where each mode of the histogram is treated as a group. The respective group sizes are 29, 40, 49, 43, 32, 27, 27, 23, and 21, which are the same for both the case and the control. Since the samples were grouped by age, the group index can be understood as the time of experiments. In grouping the samples, we have ensured that within each group each sample corresponds to a different patient, and thus the samples within the same group can be treated as mutually independent. Since the sample size of each group is small, we set $\alpha_1$ and $\alpha_2$ to values smaller than the defaults.
To adjust for the effects of external variables, we adopted the method proposed by Liang and others (2015). Let $z^{(k)}$ denote the vector of external variables observed under condition $k$. To adjust for their effects, we can replace the empirical correlation coefficient used in the correlation screening step by the p-value obtained in testing the hypotheses $H_0: \beta = 0$ versus $H_1: \beta \neq 0$ for the regression

$$X_i^{(k)} = \alpha + X_j^{(k)} \beta + z^{(k)} \gamma + \epsilon^{(k)}, \tag{4.11}$$

where $X_i^{(k)}$ denotes the expression value of gene $i$ measured under condition $k$, and $\epsilon^{(k)}$ denotes a vector of Gaussian random errors. Similarly, we can replace the ψ-partial correlation coefficient calculated in the ψ-score calculation step by the p-value obtained in testing the hypotheses $H_0: \beta = 0$ versus $H_1: \beta \neq 0$ for the regression

$$X_i^{(k)} = \alpha + X_j^{(k)} \beta + X_{S_{ij}}^{(k)} \eta + z^{(k)} \gamma + \epsilon^{(k)}, \tag{4.12}$$

where $S_{ij}$ is the separator of genes $i$ and $j$ under condition $k$. With the p-values, we can define the adjusted ψ-score as $\psi_{ij}^{(k)} = \Phi^{-1}(1 - q_{ij}^{(k)})$, where $q_{ij}^{(k)}$ is the p-value obtained from equation (4.12) for edge $(i,j)$ under condition $k$.
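A minimal sketch of this covariate adjustment, assuming synthetic data: the p-value for the coefficient of gene j in an ordinary least-squares regression with external covariates is computed and mapped to a probit-scale score. The function name and setup are illustrative, not from the equSA package.

```python
# Sketch of covariate adjustment: test H0: beta = 0 in the regression of
# gene i on gene j plus external covariates z, then map the p-value to a
# probit-scale score. Names and data are illustrative.
import numpy as np
from scipy import stats

def adjusted_score(x_i, x_j, z):
    """p-value for the coefficient of x_j, and Phi^{-1}(1 - p) score."""
    n = len(x_i)
    X = np.column_stack([np.ones(n), x_j, z])  # intercept, gene j, covariates
    beta, *_ = np.linalg.lstsq(X, x_i, rcond=None)
    resid = x_i - X @ beta
    df = n - X.shape[1]
    sigma2 = resid @ resid / df                # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the OLS estimates
    t = beta[1] / np.sqrt(cov[1, 1])           # t-statistic for beta
    p = 2 * stats.t.sf(abs(t), df)             # two-sided p-value
    return p, stats.norm.ppf(1 - p)            # probit-transformed score

rng = np.random.default_rng(1)
n = 100
z = rng.normal(size=(n, 2))                    # two external covariates
x_j = rng.normal(size=n)
x_i = 0.8 * x_j + z @ np.array([0.5, -0.3]) + rng.normal(size=n)
p, score = adjusted_score(x_i, x_j, z)
```

A small p-value (large score) indicates a strong association between genes i and j after the covariate effects are accounted for.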
For this dataset, the effects of all available demographic variables, including age (the time of data collection), gender, race, race ethnicity, season of birth, number of older siblings, and country, have been adjusted for. With the adjusted ψ-scores, the FHBIA method is ready to be applied to construct the gene networks. Given the complexity of the dataset, which contains case and control groups and multiple time points for each group, we calculated the integrated ψ-scores in two steps. First, we integrated the ψ-scores across the nine time points under the case and control conditions separately. Then, for each time point, we integrated the ψ-scores across the case and control conditions. In this way, all information from the data collected under the 18 conditions was integrated. Figure 1 shows a schematic diagram of this two-step procedure. Finally, we applied the multiple hypothesis test to the Bayesian integrated ψ-scores to determine the structure of the gene networks under the 18 conditions. The total CPU time cost by FHBIA was 19.2 h, which is rather long because the number of conditions is large. For an even larger number of conditions, we might resort to MCMC for estimating the posterior probabilities.
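The flavor of combining edge-wise evidence across conditions can be conveyed with a simple Stouffer-type combination of z-scores (Stouffer and others, 1949; Zaykin, 2011). The actual Bayesian integration in FHBIA is more elaborate, and the values below are made up for illustration.

```python
# Stouffer-type combination of edge-wise z-scores across K conditions,
# a simplified stand-in for the Bayesian integration step of FHBIA.
import numpy as np

def stouffer(z, w=None):
    """Combine z-scores with optional weights (Zaykin, 2011)."""
    z = np.asarray(z, dtype=float)
    w = np.ones_like(z) if w is None else np.asarray(w, dtype=float)
    return (w * z).sum() / np.sqrt((w ** 2).sum())

# One edge observed under K = 18 conditions (9 time points x case/control);
# identical scores chosen purely for illustration.
z_scores = np.full(18, 1.0)
combined = stouffer(z_scores)              # 18 / sqrt(18) = sqrt(18)
```

Consistent weak evidence across many conditions thus accumulates into strong combined evidence, which is the intuition behind integrating the 18 conditions jointly.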
Figure 4 shows the networks constructed by FHBIA for the case samples at the nine time points. The networks identify quite a few hub genes, that is, genes with high connectivity. Table 1 shows the top 5 hub genes identified at each time point for the case samples. The lists of hub genes are quite stable. For example, RPS26P11 and RPS26 consistently appear as the top 2 genes at all time points, the gene ADAM10 appears at five out of nine time points, and quite a few genes appear two or more times, such as PRF1, POGZ, BCL11B, GGNBP2, and TMEM159. Note that RPS26P11 is a pseudogene representing a segment of the gene RPS26.
Fig. 4.
Gene networks produced by FHBIA for the case TEDDY samples at nine time points. Red edges denote connections newly appearing in the current network compared with the network at the previous time point; blue edges denote connections that disappear in the network at the next time point; green edges denote connections that are both newly appearing and disappearing; gray edges denote connections unchanged between the current network and the network at the previous time point. (a) Time 1, (b) time 2, (c) time 3, (d) time 4, (e) time 5, (f) time 6, (g) time 7, (h) time 8, and (i) time 9.
Table 1.
Top 5 hub genes identified by FHBIA for the case TEDDY samples at nine time points: "Links" denotes the number of links of the gene to other genes, t is the index of the time points, * indicates that other genes have the same number of links as this gene, and † indicates that the gene has been verified as a T1D-related gene in the literature
Case group:

| *t* | Gene | Links | *t* | Gene | Links | *t* | Gene | Links |
|---|---|---|---|---|---|---|---|---|
| 1 | RPS26 | 104 | 2 | RPS26 | 68 | 3 | RPS26 | 64 |
| | RPS26P11 | 40 | | RPS26P11 | 15 | | RPS26P11 | 12 |
| | ADAM10 | 4 | | ADAM10 | 5 | | ADAM10 | 5 |
| | POGZ | 3 | | PRF1 | 4 | | U2SURP | 4 |
| | TMEM159* | 3 | | POGZ | 3 | | BCL11B* | 3 |
| 4 | RPS26 | 99 | 5 | RPS26 | 91 | 6 | RPS26 | 86 |
| | RPS26P11 | 14 | | RPS26P11 | 18 | | RPS26P11 | 42 |
| | ADAM10 | 6 | | ADAM10 | 4 | | BCL11B | 3 |
| | BCL11B | 3 | | BCL11B | 3 | | GNPTG | 3 |
| | POGZ* | 3 | | POGZ* | 3 | | GGNBP2 | 3 |
| 7 | RPS26 | 78 | 8 | RPS26 | 70 | 9 | RPS26 | 61 |
| | RPS26P11 | 46 | | RPS26P11 | 39 | | RPS26P11 | 30 |
| | BCL11B | 3 | | PRF1 | 4 | | TMEM159 | 3 |
| | TMEM159 | 3 | | BCL11B | 3 | | GGNBP2 | 3 |
| | GGNBP2 | 3 | | GGNBP2 | 3 | | OGT* | 2 |
Table 1 includes 11 different genes in total. Among them, 9 genes have been verified in the literature to be T1D-associated. For example, Schadt and others (2008) reported that RPS26 is a T1D causal gene, and Ma and Hart (2013) reported that the gene O-GlcNAc transferase (OGT) is directly linked to many metabolic diseases, including diabetes. Other than recovering verified T1D-associated genes, we also have some new findings, such as the gene PRF1. Orilieri and others (2008) claimed that PRF1 variations are susceptibility factors for T1D development. In Table 1, PRF1 appears as a hub gene twice, which suggests that the connection between PRF1 and T1D might be worth further exploration. Moreover, we also identified some connection changes in the networks. As shown in Figure 4, the newly appearing and disappearing connections are marked in different colors at each time point, which reveals some evolution patterns of the network.
For comparison, the GGL method was also applied to this example, with the regularization parameters chosen according to the minimum AIC criterion. The total CPU time cost by the method was 20.2 h. FGL was not applied to this example, as it would have taken an extremely long CPU time. Figure S4 available at Biostatistics online shows the networks constructed by GGL for the case samples at all nine time points, and Table S6 available at Biostatistics online shows the top 5 hub genes identified by GGL at each time point for the case samples. The lists of hub genes are quite stable, consisting of only seven different genes. Among the seven, only three genes, RPS26, OGT, and JMJD1C, have been verified in the literature as T1D-associated. Moreover, as shown in Figure S4 available at Biostatistics online, the hub genes in the networks are almost identical at each time point. In summary, FHBIA tends to outperform GGL on this real data example, as it identifies more hub genes that are associated with T1D.
From the perspective of data analysis, one might also be interested in estimating the gene networks constructed from the controls, as well as the differences between the networks from the cases and controls. For comparing the networks from the cases and controls, we can adopt the method described in Section 6 of Liang and others (2015). However, since that method requires the two networks under comparison to be independent, the sample information from the cases and controls should not be integrated in this case. We leave this work for the future.
5. Discussion
We have proposed the FHBIA method for jointly estimating multiple GGMs under distinct conditions and applied it to the TEDDY data. The FHBIA method consists of a few important steps: first summarize the graph structure information contained in the data using the ψ-learning algorithm, then integrate the information via a meta-analysis procedure under the Bayesian framework, and finally determine the structures of the multiple graphs via a multiple hypothesis test. Compared to the existing methods, FHBIA has a few significant advantages. First, FHBIA includes a meta-analysis procedure to explicitly integrate information across distinct conditions. In contrast, the existing methods integrate information through prior distributions or penalty functions, which is often less efficient. Second, FHBIA can be run very fast, especially when the number of conditions K is small. The overall computational complexity of FHBIA includes a factor of 2^K, the total number of possible configurations of an edge across all K conditions. When K is large, we need to resort to MCMC for an efficient estimation of the posterior probabilities; since the posterior probabilities can be estimated for each edge independently, this step can be done in parallel. In addition, we note that the correlation coefficients and ψ-scores can also be calculated in parallel. Hence, the whole method can be executed very fast on a parallel architecture. Moreover, instead of working on the original data, the Bayesian integration step works on the edge-wise ψ-scores, which avoids inverting high-dimensional covariance matrices and thus can be very fast. Note that, in calculating the ψ-scores, the ψ-learning algorithm also successfully avoids inverting high-dimensional covariance matrices through correlation screening. Third, the empirical Bayesian method that FHBIA employs for multiple hypothesis tests produces a q-value (Storey, 2002) for each potential edge of the multiple graphs. The q-value provides an uncertainty measure for each potential edge, which is beyond the ability of many of the existing methods, especially when the dimension is large.
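As a toy illustration of the 2^K factor mentioned above, the following snippet (not part of equSA) enumerates the presence/absence configurations of a single edge across K conditions; the count doubles with each added condition, which is why MCMC becomes attractive for large K.

```python
# Toy enumeration of the 2^K presence/absence configurations of one edge
# across K conditions; this is the factor appearing in the complexity of
# the exact Bayesian integration step.
from itertools import product

def edge_configurations(K):
    """Return all 0/1 presence patterns of a single edge across K conditions."""
    return list(product([0, 1], repeat=K))

configs = edge_configurations(4)           # 2^4 = 16 patterns
```

For the TEDDY application with K = 18, exact enumeration over 2^18 = 262,144 patterns per edge is still feasible, but the cost grows exponentially beyond that.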
The FHBIA method has a very flexible framework, which can be easily extended to the joint estimation of multiple mixed graphical models. For example, consider the scenario where the data consist of only Gaussian and multinomial random variables, for which the joint distribution is well defined (Lee and Hastie, 2015). For such mixed data, the ψ-learning algorithm can be performed under the framework of generalized linear models; that is, we can replace the correlation coefficients and ψ-partial correlation coefficients used in the algorithm by the corresponding p-values obtained in the marginal variable screening tests (Fan and Song, 2010) and conditional independence tests. Then, we can replace the ψ-scores by the z-scores corresponding to the p-values of the conditional independence tests. For other types of continuous random variables, we can apply the nonparanormal transformation (Liu and others, 2009) to Gaussianize them prior to the application of the FHBIA method. For certain types of discrete data, e.g., next-generation sequencing data, we can apply the transformations developed in Jia and others (2017) to continuize and Gaussianize them prior to the application of the FHBIA method.
Finally, we would like to mention that the sparsity assumption imposed on the networks does not limit the applications of the FHBIA method. Sparsity is just a device adopted for making statistical inference when there is not a sufficient amount of data available, e.g., in dealing with small-n-large-p problems. In this article, the sparsity assumption A4 (in the supplementary material available at Biostatistics online) is given in terms of the sample size n, which leads to a neighborhood size of O(n/log n) for each node; see Lemma 3.2 and the ψ-learning algorithm presented in the supplementary material available at Biostatistics online. Therefore, when the sample size n is large enough, e.g., when using the UK Biobank data, which consist of over 500K samples, to construct GRNs, FHBIA can be applied to learn dense genetic networks (Boyle and others, 2017). For the UK Biobank data, the sample size is much larger than the number of genes (about 20K) we usually consider, and thus the neighborhood truncation in the correlation screening step of ψ-learning will not be triggered. When the data size is not large enough, FHBIA will identify only the strongest connections in terms of partial correlation coefficients. This property is inherited from the ψ-learning algorithm.
Acknowledgments
The authors thank the editor, associate editor, two referees, and Dr. George Tseng for their constructive comments which have led to significant improvement of this article. Members of the TEDDY Study Group are listed in the Supplementary File. Conflict of Interest: None declared.
6. Software
The software accompanied with this article is available as a module called JGGM in the R package equSA at https://cran.r-project.org/web/packages/equSA/index.html.
Funding
Leona M. and Harry B. Helmsley Charitable Trust (2015PG-T1D050); F.L.'s research was supported in part by the grants USF-ITN-15-11-MH, DMS-1612924, DMS/NIH R01-GM117597, and NIH R01-GM126089. The TEDDY Study is funded by U01 DK63829, U01 DK63861, U01 DK63821, U01 DK63865, U01 DK63863, U01 DK63836, U01 DK63790, UC4 DK63829, UC4 DK63861, UC4 DK63821, UC4 DK63865, UC4 DK63863, UC4 DK63836, UC4 DK95300, UC4 DK100238, UC4 DK106955, UC4 DK112243, UC4 DK117483, and Contract No. HHSN267200700014C from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Allergy and Infectious Diseases (NIAID), National Institute of Child Health and Human Development (NICHD), National Institute of Environmental Health Sciences (NIEHS), Centers for Disease Control and Prevention (CDC), and JDRF; and, in part, by NIH/NCATS Clinical and Translational Science Awards to the University of Florida (UL1 TR000064) and the University of Colorado (UL1 TR001082).
References
- Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165–1188.
- Boyle, E. A., Li, Y. I. and Pritchard, J. K. (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186.
- Chun, H., Zhang, X. and Zhao, H. (2015). Gene regulation network inference with joint sparse Gaussian graphical models. Journal of Computational and Graphical Statistics 24, 954–974.
- Danaher, P., Wang, P. and Witten, D. M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76, 373–397.
- Davis, J. and Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, Pennsylvania, USA. New York, NY: ACM, pp. 233–240.
- Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. Annals of Statistics 38, 3567–3604.
- Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.
- Guo, J., Levina, E., Michailidis, G. and Zhu, J. (2011). Joint estimation of multiple graphical models. Biometrika 98, 1–15.
- Jia, B., Xu, S., Xiao, G., Lamba, V. and Liang, F. (2017). Learning gene regulatory networks from next generation sequencing data. Biometrics 73, 1221–1230.
- Lee, H. S., Burkhardt, B. R., McLeod, W., Smith, S., Eberhard, C., Lynch, K., Hadley, D., Rewers, M., Simell, O., She, J. X. and others (2014). Biomarker discovery study design for type 1 diabetes in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Diabetes/Metabolism Research and Reviews 30, 424–434.
- Lee, J. and Hastie, T. J. (2015). Learning the structure of mixed graphical models. Journal of Computational and Graphical Statistics 24, 230–253.
- Liang, F., Song, Q. and Qiu, P. (2015). An equivalent measure of partial correlation coefficients for high dimensional Gaussian graphical models. Journal of the American Statistical Association 110, 1248–1265.
- Liang, F. and Zhang, J. (2008). Estimating the false discovery rate using the stochastic approximation algorithm. Biometrika 95, 961–977.
- Lin, Z., Wang, T., Yang, C. and Zhao, H. (2017). On joint estimation of Gaussian graphical models for spatial and temporal data. Biometrics 73, 769–779.
- Liu, H., Lafferty, J. and Wasserman, L. (2009). The nonparanormal: semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research 10, 2295–2328.
- Ma, J. and Hart, G. W. (2013). Protein O-GlcNAcylation in diabetes and diabetic complications. Expert Review of Proteomics 10, 365–380.
- Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Annals of Statistics 34, 1436–1462.
- Orilieri, E., Cappellano, G., Clementi, R., Cometa, A., Ferretti, M., Cerutti, E., Cadario, F., Martinetti, M., Larizza, D., Calcaterra, V. and others (2008). Variations of the perforin gene in patients with type 1 diabetes. Diabetes 57, 1078–1083.
- Peterson, C., Stingo, F. C. and Vannucci, M. (2015). Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association 110, 159–174.
- Qiu, H., Han, F., Liu, H. and Caffo, B. (2016). Joint estimation of multiple graphical models from high dimensional time series. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78, 487–504.
- Saito, T. and Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 10, e0118432.
- Schadt, E. E., Molony, C., Chudin, E., Hao, K., Yang, X., Lum, P. Y., Kasarskis, A., Zhang, B., Wang, S., Suver, C. and others (2008). Mapping the genetic architecture of gene expression in human liver. PLoS Biology 6, e107.
- Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64, 479–498.
- Stouffer, S. A., Suchman, E. A., Devinney, L. C., Star, S. A. and Williams, R. M., Jr. and others (1949). The American Soldier: Adjustment During Army Life (Studies in Social Psychology in World War II), Volume 1. Oxford, England: Princeton University Press.
- Xie, Y., Liu, Y. and Valdar, W. (2016). Joint estimation of multiple dependent Gaussian graphical models with applications to mouse genomics. Biometrika 103, 493–511.
- Xu, S., Jia, B. and Liang, F. (2019). Learning moral graphs in construction of high-dimensional Bayesian networks for mixed data. Neural Computation 31, 1183–1214.
- Zaykin, D. V. (2011). Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. Journal of Evolutionary Biology 24, 1836–1841.
- Zhou, S., Lafferty, J. and Wasserman, L. (2010). Time varying undirected graphs. Machine Learning 80, 295–319.