A Bayesian Graphical Model for Integrative Analysis of TCGA Data

Yanxun Xu; Jie Zhang; Yuan Yuan; Riten Mitra; Peter Müller; Yuan Ji

doi:10.1109/GENSIPS.2012.6507747

Author manuscript; available in PMC: 2015 Apr 7.

Published in final edited form as: IEEE Int Workshop Genomic Signal Process Stat. 2012 Dec;2012:135–138. doi: 10.1109/GENSIPS.2012.6507747

PMCID: PMC4387199 NIHMSID: NIHMS673684 PMID: 25859418

Abstract

We integrate three TCGA data sets comprising matched measurements of DNA copy number (C), DNA methylation (M), and mRNA expression (E) for more than 500 ovarian cancer samples. The integrative analysis is based on a Bayesian graphical model that treats the three types of measurements as three vertices in a network. The graph is used as a convenient way to parameterize and display the dependence structure. Edges connecting vertices indicate specific types of regulatory relationships. For example, an edge between M and E together with the absence of an edge between C and E implies methylation-controlled transcription that is robust to copy number changes. In other words, mRNA expression is sensitive to variation in methylation but not to copy number variation. We apply the graphical model to each gene in the TCGA data independently and provide a comprehensive list of inferred profiles. Examples based on simulated data are provided as well.

I. Introduction

Gene expression is a critical genetic process in which DNA is transcribed to RNA. Perturbation of transcription directly affects mRNA expression and hence the subsequent protein production, leading to pathological states. Variations such as copy-number variations (CNVs) and aberrant DNA methylation frequently contribute to disrupted gene expression. CNVs result in an abnormal number of copies of DNA and thus change the gene expression level and associated phenotypes. For example, a higher copy number of CCL3L1 has been associated with lower susceptibility to HIV infection [1], and a low copy number of FCGR3B can increase susceptibility to systemic lupus erythematosus and similar inflammatory autoimmune disorders [2]. DNA methylation is a biochemical modification that adds a methyl group to the 5 position of the cytosine pyrimidine ring or the number 6 nitrogen of the adenine purine ring. There is strong evidence that abnormal hypermethylation at the gene promoter region results in transcriptional silencing of tumor suppressor genes. Also, aberrant DNA methylation patterns have been associated with a large number of human diseases such as cancer, lupus, and a range of birth defects [3]. Therefore, elucidating tumor-specific methylation changes will shed light on potential clinical applications in cancer diagnosis, prognosis and therapeutics [4].

Current literature mainly focuses on pairwise integration, either between CNVs and mRNA or between methylation and mRNA. Bussey et al. [5] computed Pearson’s correlation coefficients and tested the significance of correlations using false discovery rate (FDR) control. Waaijenborg et al. [6] proposed a penalized canonical correlation analysis to study genome-wide association between DNA copy number and mRNA expression. Menezes et al. [7] modeled the relationship of DNA copy number and mRNA expression by a linear model based on a modified correlation coefficient and an explorative Wilcoxon test. Choi et al. [8] described a Bayesian double-layered mixture model which directly modeled the stochastic nature of CNVs and identified abnormally expressed genes due to aberrant copy number. Etcheverry et al. [9] investigated the effect of methylation on mRNA expression in glioblastoma, and identified 13 genes that display an inverse correlation between methylation and mRNA expression using Pearson’s correlation coefficient.

Since both CNVs and DNA methylation play important roles in mRNA expression, an integrated analysis that models all three platforms together is most appropriate. Denoting by C, M, and E the three platforms used to measure CNVs, methylation, and mRNA expression, we integrate data from all three platforms and present inference results as graphs that include C, M, and E as three vertices. In particular, we propose a Bayesian graphical model which imposes a probability distribution on the unknown networks and applies an autologistic prior to learn the dependence structure of the three platforms through a graph. The vertices of the graph represent the platforms, and the presence or absence of edges indicates the presence or absence of conditional dependence between the platforms. For example, an edge between M and E together with the absence of an edge between C and E implies methylation-controlled transcription that is robust to copy number changes. In other words, mRNA expression is sensitive to variation in methylation but not to copy number variation. In this application, the 3-node graphical model is chosen to represent the dependence structure of C, M and E mainly for convenience and for ease of display.

In the next Section, we give a brief overview of the ovarian cancer data to which we apply our integration analysis. In Section III, we introduce the proposed Bayesian graphical models along with MCMC simulation details. Section IV presents several simulation studies to evaluate the performance of the proposed model. In Section V, we report results based on the analysis of ovarian cancer data. We conclude with a discussion in Section VI.

II. TCGA Ovarian Cancer Data

Ovarian cancer is ranked as the fifth leading cause of death related to reproductive cancer in women. The Cancer Genome Atlas (TCGA) Research Network (http://cancergenome.nih.gov/) has examined more than 500 tumor samples and thousands of genes. The data are publicly available online [10]. Special effort has been directed to produce matched measurements on DNA copy number (C), DNA methylation (M), and mRNA expression (E) for all the genes across the tumor samples. Taking advantage of this effort, we use the level 3 data of measurements on (C, M, E) for each gene with matched tumor samples. Specifically, let y_{itg} denote the measurement for gene g, on sample t, with platform i. Here i = 1, 2, 3 represents C, M, and E respectively, t indexes the T = 534 tumor samples, and g indexes the N = 9283 genes.

III. Probability Model

A. Sampling Model

We apply the proposed model for individual genes separately and thus drop the index g in subsequent discussion. For a single gene, the data is arranged in a 3 × T matrix Y = [y_it], i = 1, 2, 3 and t = 1, 2, …, T. We assume independence of measurements y_it across samples. The proposed model introduces latent trinary indicators e_it ∈ {−1, 0, 1}. The indicators have an interpretation as under-, regular and over-expression of the corresponding measurement. Using e_it we apply the mixture model proposed by Parmigiani et al. (2002) [11] for y_it. In words, we assume a mixture model with uniform, normal and uniform components corresponding to under-, regular and over-expression. The model is

$$(y_{it} - \alpha_t - \mu_i) \mid e_{it}, \theta_{it} \sim I[e_{it} = -1]\, U(-k_{i-}, 0) + I[e_{it} = 0]\, N(0, \sigma_i^2) + I[e_{it} = 1]\, U(0, k_{i+}), \tag{1}$$

where I[·] is the indicator function, U(A) denotes a uniform distribution over the set A, and N(·, ·) denotes the normal distribution. The vector $θ_{it} = (α_{t}, μ_{i}, σ_{i}^{2}, k_{i -}, k_{i +})$ collects all the other parameters. For example, α_t and μ_i are the random effects of sample t and platform i. We subsequently convert the trinary variable e_it to a binary variable z_it with p(e_it|z_it = 0) = δ₋₁(e_it), and

$$p(e_{it} = 0 \mid \pi_i, z_{it} = 1) = \pi_i, \qquad p(e_{it} = 1 \mid \pi_i, z_{it} = 1) = 1 - \pi_i.$$

This conversion is devised to set up the following graphical model.
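Before turning to the graph, the sampling model (1) and the conversion from z_it to e_it can be illustrated with a minimal Python sketch. The function names and argument order are illustrative assumptions, not part of the original work; the sketch only evaluates the three component densities for the centered residual y_it − α_t − μ_i and draws e_it given z_it.

```python
import math
import random


def component_density(r, e, sigma, k_minus, k_plus):
    """Density of the centered residual r = y - alpha_t - mu_i given e in {-1, 0, 1}."""
    if e == -1:   # under-expression: uniform on [-k_minus, 0]
        return 1.0 / k_minus if -k_minus <= r <= 0.0 else 0.0
    if e == 0:    # regular expression: normal with mean 0 and sd sigma
        return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    if e == 1:    # over-expression: uniform on [0, k_plus]
        return 1.0 / k_plus if 0.0 <= r <= k_plus else 0.0
    raise ValueError("e must be -1, 0 or 1")


def draw_e_given_z(z, pi_i, rng):
    """z = 0 forces e = -1; z = 1 gives e = 0 with probability pi_i, otherwise e = 1."""
    if z == 0:
        return -1
    return 0 if rng.random() < pi_i else 1


def draw_y(e, alpha_t, mu_i, sigma, k_minus, k_plus, rng):
    """Draw y from the mixture component selected by e, then add back alpha_t + mu_i."""
    if e == -1:
        r = rng.uniform(-k_minus, 0.0)
    elif e == 0:
        r = rng.gauss(0.0, sigma)
    else:
        r = rng.uniform(0.0, k_plus)
    return alpha_t + mu_i + r
```

With this coding, z_it = 0 corresponds to under-expression and z_it = 1 pools the regular and over-expressed states, which is what makes z_it a binary variable suitable for the graphical model below.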

Denote by V = {1, 2, 3} the set of three vertices representing C, M, and E. We use a graph on these three nodes to characterize the dependence structure across the three platforms. A graph is a pair G = {V, S} where S is a set of undirected edges {i, j}, i, j ∈ V. A graph G can be used to describe the conditional independence structure of a set of variables indexed by V, for example the binary indicators {z_it, i ∈ V} in our application. The absence of an edge {i, j} indicates conditional independence of z_it and z_jt given the remaining variables z_kt, k ≠ i, k ≠ j. With three platforms, the set of remaining variables reduces to just the third platform. Any joint probability model p(z_1t, z_2t, z_3t) that respects the dependence structure G can be written as (Besag, 1974 [12]):

$$p(z_t \mid \beta, G) = p(0 \mid \beta, G) \times \exp\Big\{ \sum_{i=1}^{3} \beta_i z_{it} + \sum_{i, j \in V;\, i < j} \beta_{ij} z_{it} z_{jt} \Big\}, \tag{2}$$

where z_t = (z_1t, z_2t, z_3t) and β = (β₁, β₂, β₃, β₁₂, β₂₃, β₁₃). Coefficients β_ij are non-zero only when the corresponding edge is included in the graph. Model (2) is known as the autologistic model.
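Because z_t takes only 2³ = 8 values, model (2) can be evaluated exactly by enumeration. The following Python sketch does this under assumed conventions that are not part of the original text: vertices are indexed 0, 1, 2 for C, M, E, and the dictionary `beta_pair` holds a coefficient only for edges present in G.

```python
import itertools
import math


def autologistic_probs(beta_main, beta_pair):
    """Return {z: p(z | beta, G)} for model (2), with z ranging over {0, 1}^3.

    beta_main: three main-effect coefficients beta_i.
    beta_pair: dict {(i, j): beta_ij} with i < j, one entry per edge in G;
               absent edges simply have no entry (beta_ij = 0).
    """
    weights = {}
    for z in itertools.product((0, 1), repeat=3):
        s = sum(b * zi for b, zi in zip(beta_main, z))
        s += sum(bij * z[i] * z[j] for (i, j), bij in beta_pair.items())
        weights[z] = math.exp(s)
    total = sum(weights.values())  # equals 1 / p(0 | beta, G) in (2)
    return {z: w / total for z, w in weights.items()}
```

For instance, `autologistic_probs([0.3, -0.2, 0.1], {(1, 2): 1.5})` describes a graph with a single M–E edge; vertex C is then independent of the other two, and its marginal probability of z = 1 is exp(0.3)/{1 + exp(0.3)}.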

Caragea and Kaiser [13] and Hughes et al. [14] proposed a centered parametrization of the autologistic model and argued that the centered version improves mixing of the Markov chain Monte Carlo (MCMC) posterior simulation and simplifies prior specification. The centered version is used in the form of

$$p(z_t \mid \beta, G) = p(0 \mid \beta, G) \exp\Big\{ \sum_{i=1}^{3} \beta_i z_{it} + \sum_{i, j \in V;\, i < j} \beta_{ij} (z_{it} - \nu_i)(z_{jt} - \nu_j) \Big\}, \tag{3}$$

where ν_i = exp(β_i)/{1 + exp(β_i)}.
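As a worked expansion (added here for illustration, not taken from the original text), multiplying out one centered interaction term shows how (3) relates to (2):

$$\beta_{ij}(z_{it} - \nu_i)(z_{jt} - \nu_j) = \beta_{ij} z_{it} z_{jt} - \beta_{ij}\nu_j z_{it} - \beta_{ij}\nu_i z_{jt} + \beta_{ij}\nu_i\nu_j .$$

The last term does not depend on z_t and is absorbed by the normalizing factor. The centered model therefore has the same pairwise coefficients β_ij as (2), while the effective main effect of vertex i becomes β_i − ∑_{j ≠ i} β_ij ν_j. When vertex i has no edges, ν_i is exactly the marginal probability p(z_it = 1), which is what makes β_i easier to interpret in the centered form.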

The joint model factors as

$$p(Y, e, z, \pi, \theta, \beta, G) = p(Y \mid e, \theta)\, p(e \mid z, \pi)\, p(z \mid \beta, G)\, p(\theta)\, p(\beta \mid G)\, p(G). \tag{4}$$

We introduce the priors p(θ)p(β | G)p(G) next. Let Ga(a, b) denote a gamma distribution with mean a/b. We assume conditionally conjugate priors

$$\mu_i \sim N(0, \tau_\mu), \qquad \frac{1}{\sigma_i^2} \sim Ga(\gamma_\sigma, \lambda_\sigma),$$

$$\frac{1}{k_{i-}} \sim Ga(\gamma_{k_{i-}}, \lambda_{k_{i-}}), \qquad \frac{1}{k_{i+}} \sim Ga(\gamma_{k_{i+}}, \lambda_{k_{i+}}),$$

$$\beta_{\star} \sim N(0, \sigma_\beta^2), \qquad \pi_i \sim U(0, 1),$$

where β_⋆ stands for the coefficients β_i, β_ij in (3). For the sample random effects α_t, we assume α_t ~ N(0, τ_α) subject to the identifiability constraint ∑_t α_t = 0. Lastly, we define the prior p(G) as a uniform distribution over all possible graphs. With 3 vertices there are 3 possible edges and hence only 2³ = 8 possible graphs, each of which is given a prior probability of 1/8.
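The support of p(G) is small enough to list explicitly. As a minimal sketch (the names `PLATFORMS`, `all_graphs`, `uniform_graph_prior`, and `describe` are illustrative, and representing a graph as a frozenset of index pairs is an assumption), the eight graphs and their uniform prior can be enumerated as follows:

```python
import itertools

PLATFORMS = ("C", "M", "E")
PAIRS = list(itertools.combinations(range(3), 2))  # (0, 1), (0, 2), (1, 2)


def all_graphs():
    """Every subset of the three possible undirected edges, i.e. 2**3 = 8 graphs."""
    return [frozenset(subset)
            for r in range(len(PAIRS) + 1)
            for subset in itertools.combinations(PAIRS, r)]


def uniform_graph_prior():
    """Assign prior probability 1/8 to each graph."""
    graphs = all_graphs()
    return {g: 1.0 / len(graphs) for g in graphs}


def describe(graph):
    """Human-readable edge list such as 'C-E, M-E'."""
    if not graph:
        return "no edges"
    return ", ".join(f"{PLATFORMS[i]}-{PLATFORMS[j]}" for i, j in sorted(graph))
```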

B. Markov Chain Monte Carlo (MCMC) Simulations

We carry out posterior inference for model (4) using MCMC simulations. Each iteration of the MCMC scheme includes the following transition probabilities. We start by generating z_it from its complete conditional posterior. Following the update of z, we generate values for e from the complete conditional posterior p(e | Y, α, z). If z_it = 0, the update is deterministic, e_it = −1. If z_it = 1, the update requires a Bernoulli draw for e_it = 0 versus e_it = 1. The update of the parameters θ is straightforward. Resampling G and the regression coefficients β could be challenging in larger graphs, essentially because of the difficult evaluation of the normalization constant p(0 | β, G) in (3) (see, e.g., [15]). Here, however, p(G) is supported on only 8 possible graphs, which makes the evaluation of the normalization constant straightforward. Thus, resampling G and β reduces to standard trans-dimensional MCMC as in [16].
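The z and e updates can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: it assumes that e_it is integrated out when z_it is updated (the text does not say so explicitly, but conditioning on e_it would fix z_it), it uses the uncentered form (2) for brevity, and all function names are hypothetical. The normalizing constant cancels in the ratio, so only unnormalized weights are needed.

```python
import math


def normal_pdf(r, sigma):
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def uniform_pdf(r, lo, hi):
    return 1.0 / (hi - lo) if lo <= r <= hi else 0.0


def log_weight(z, beta_main, beta_pair):
    """Exponent of model (2); beta_pair holds entries only for edges in G."""
    s = sum(b * zi for b, zi in zip(beta_main, z))
    return s + sum(bij * z[i] * z[j] for (i, j), bij in beta_pair.items())


def prob_z_is_one(i, z_t, r, beta_main, beta_pair, pi_i, sigma, k_minus, k_plus):
    """P(z_it = 1 | rest) with e_it marginalized; r = y_it - alpha_t - mu_i."""
    z0 = tuple(0 if a == i else v for a, v in enumerate(z_t))
    z1 = tuple(1 if a == i else v for a, v in enumerate(z_t))
    lik0 = uniform_pdf(r, -k_minus, 0.0)                    # z = 0 implies e = -1
    lik1 = (pi_i * normal_pdf(r, sigma)                     # z = 1, e = 0
            + (1.0 - pi_i) * uniform_pdf(r, 0.0, k_plus))   # z = 1, e = 1
    w0 = lik0 * math.exp(log_weight(z0, beta_main, beta_pair))
    w1 = lik1 * math.exp(log_weight(z1, beta_main, beta_pair))
    return w1 / (w0 + w1)


def prob_e_is_zero(r, pi_i, sigma, k_plus):
    """P(e_it = 0 | z_it = 1, rest): the Bernoulli probability for the e update."""
    a = pi_i * normal_pdf(r, sigma)
    b = (1.0 - pi_i) * uniform_pdf(r, 0.0, k_plus)
    return a / (a + b)
```

A Gibbs step would draw z_it = 1 with probability `prob_z_is_one(...)`, then set e_it = −1 if z_it = 0, or draw e_it = 0 with probability `prob_e_is_zero(...)` if z_it = 1. The sketch assumes 0 < π_i < 1, which holds almost surely under the U(0, 1) prior.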

IV. Simulation Study

To evaluate the proposed model, we examine its performance on 3 simulated data sets, each with T = 300 samples, one true graph and a single gene. For each simulation, a true graph G is first generated as follows. For each pair of vertices {i, j}, we include the edge with probability 0.5. For each included edge {i, j}, we generate values of β_ij ~ N(μ₁, 0.5²) with μ₁ ~ U(−3, 3). We generate β_i ~ N(μ₂, 0.5²) with μ₂ ~ U(−0.5, 0.5). Then, we generate z for T = 300 samples. Since p(e_it | z_it = 0) = δ₋₁(e_it), p(e_it = 0 | π_i, z_it = 1) = π_i, and p(e_it = 1 | π_i, z_it = 1) = 1 − π_i, we first generate π_i ~ U(0.25, 0.75) and then generate e. Furthermore, we let μ_i = 0, σ_i = 0.316, k_i− = 5.556, k_i+ = 5.556 for each node, and generate α_t ~ N(0, 0.1²) subject to the identifiability constraint ∑_t α_t = 0. Lastly, the hyper-parameters are τ_α = 1, τ_μ = 1, γ_σ = 2, λ_σ = 0.1, γ_k₊ = 10, λ_k₊ = 50, γ_k₋ = 10, λ_k₋ = 50, and σ_β² = 10.
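The generation of a simulation truth can be sketched in Python as follows. Several choices here are assumptions that the text leaves open: μ₁ and μ₂ are each drawn once per data set, z is sampled from the centered model (3) by exact enumeration, and the constraint ∑_t α_t = 0 is imposed by subtracting the sample mean. Function names are illustrative.

```python
import itertools
import math
import random

PAIRS = [(0, 1), (0, 2), (1, 2)]  # C-M, C-E, M-E


def simulate_truth(rng):
    """Draw a true graph and its autologistic coefficients."""
    edges = [p for p in PAIRS if rng.random() < 0.5]     # each edge with prob. 0.5
    mu1 = rng.uniform(-3.0, 3.0)
    mu2 = rng.uniform(-0.5, 0.5)
    beta_pair = {p: rng.gauss(mu1, 0.5) for p in edges}  # beta_ij ~ N(mu1, 0.5^2)
    beta_main = [rng.gauss(mu2, 0.5) for _ in range(3)]  # beta_i ~ N(mu2, 0.5^2)
    return edges, beta_main, beta_pair


def centered_probs(beta_main, beta_pair):
    """Exact probabilities of the 8 states under the centered model (3)."""
    nu = [1.0 / (1.0 + math.exp(-b)) for b in beta_main]
    states = list(itertools.product((0, 1), repeat=3))
    weights = []
    for z in states:
        s = sum(b * zi for b, zi in zip(beta_main, z))
        s += sum(bij * (z[i] - nu[i]) * (z[j] - nu[j])
                 for (i, j), bij in beta_pair.items())
        weights.append(math.exp(s))
    total = sum(weights)
    return states, [w / total for w in weights]


def simulate_z(T, beta_main, beta_pair, rng):
    """Draw T independent vectors z_t = (z_1t, z_2t, z_3t)."""
    states, probs = centered_probs(beta_main, beta_pair)
    return rng.choices(states, weights=probs, k=T)


def simulate_alpha(T, sd, rng):
    """Sample effects alpha_t ~ N(0, sd^2), centered so that they sum to zero."""
    alpha = [rng.gauss(0.0, sd) for _ in range(T)]
    mean = sum(alpha) / T
    return [a - mean for a in alpha]
```

Given z, the indicators e and the measurements y would then be drawn from the conversion rule and the mixture (1) with π_i ~ U(0.25, 0.75) and the stated values of μ_i, σ_i, k_i− and k_i+.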

We implement our model to compute the posterior summaries for each simulated data set. The posterior estimates are obtained by MCMC posterior simulation with 5,000 iterations, of which 2,000 are burn-in. Since graph G is modeled as a random variable, we report the inference ξ = P(G = G₀ | data), where G₀ is the true graph in the simulation. For the three data sets ξ = 0.82, 0.86, and 1, respectively. We also report parameter estimates β̄ = E(β | Y) denoting the posterior mean for the autologistic coefficients.
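The summaries ξ and β̄ are Monte Carlo averages over the retained MCMC draws. A minimal Python sketch follows, assuming that the sampler stores one graph and one coefficient value per iteration and that β_ij is recorded as 0 in iterations where the edge is absent; both are assumptions about bookkeeping, and the function names are illustrative.

```python
from collections import Counter


def graph_posterior(sampled_graphs, burn_in):
    """Estimate P(G = g | data) by the relative frequency of g after burn-in."""
    kept = sampled_graphs[burn_in:]
    counts = Counter(kept)
    return {g: c / len(kept) for g, c in counts.items()}


def map_graph(posterior):
    """Graph with the highest estimated posterior probability."""
    return max(posterior, key=posterior.get)


def posterior_mean(draws, burn_in):
    """Posterior mean of a scalar parameter from its post-burn-in draws."""
    kept = draws[burn_in:]
    return sum(kept) / len(kept)
```

With 5,000 iterations and `burn_in=2000`, ξ would be obtained as `graph_posterior(sampled_graphs, 2000).get(G0, 0.0)`, where `G0` is the true graph in whatever hashable representation the sampler uses.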

From Figure 1, we can see that the estimated graphs match the simulation truth for all three data sets. Here the estimated graph is the graph with the highest posterior probability. We denote positive and negative edges by black lines and red lines, respectively. The sign of β_ij has an intuitively appealing interpretation in terms of the effect of the j-th platform on the probability that z_i = 1 for the i-th platform, keeping the other platform fixed. Let z_−ij = z\{z_i, z_j}. Simple algebra shows that β_ij is the conditional log odds ratio of z_i and z_j, so that β_ij > 0 implies p(z_i = 1 | z_j = 1, z_−ij) > p(z_i = 1 | z_j = 0, z_−ij). See Figure 1 for the values of the β’s.
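The algebra behind this interpretation can be written out as follows (a worked step added for illustration, with the sample index t suppressed and β_ik = 0 for absent edges). Taking the ratio of (3) at z_i = 1 and z_i = 0 with the other indicators held fixed, the normalizing factor and all terms not involving z_i cancel, giving

$$\log \frac{p(z_i = 1 \mid z_j, z_{-ij})}{p(z_i = 0 \mid z_j, z_{-ij})} = \beta_i + \sum_{k \neq i} \beta_{ik} (z_k - \nu_k).$$

Evaluating this at z_j = 1 and at z_j = 0 and subtracting leaves

$$\log \frac{p(z_i = 1 \mid z_j = 1, z_{-ij})}{p(z_i = 0 \mid z_j = 1, z_{-ij})} - \log \frac{p(z_i = 1 \mid z_j = 0, z_{-ij})}{p(z_i = 0 \mid z_j = 0, z_{-ij})} = \beta_{ij}.$$

Since the log odds are increasing in the probability, β_ij > 0 is equivalent to p(z_i = 1 | z_j = 1, z_−ij) > p(z_i = 1 | z_j = 0, z_−ij). The same difference is obtained from the uncentered form (2), because the centering constants ν_k cancel.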

Fig. 1. The simulation truth versus the estimated graph for three simulated data sets. Black and red edges represent positive and negative relationships, respectively. A solid line indicates that the edge exists. Red dotted lines indicate that the corresponding edges do not exist. The number next to each edge is either the true value or the posterior mean of the autologistic coefficient β, with 0 for edges that do not exist. The estimated graph based on posterior inference is identical to the simulation truth.

V. Ovarian Cancer Data Analysis

We apply our model and inference method to one gene at a time using the ovarian cancer data described in Section II, aiming to recover the unknown dependence structure among the three platforms for each gene and to display it as a three-vertex graph. We carry out inference using the described MCMC posterior simulation, running 5,000 iterations of which 2,000 are burn-in. As the posterior estimate Ĝ of the unknown graph, we take the graph with the largest posterior probability.

An Excel table provided as supplementary material presents the posterior probability of each graph for each gene (https://sites.google.com/site/yanxunresearch). Genes are listed in descending order of Pr(G = Ĝ | data). There are 142 genes with Pr(G = Ĝ | data) > 0.4. When the cutoff is set to 0.6, there are 61 genes. For a cutoff of 0.8, there are only 13 genes. From these 13 genes, we randomly select two, “ERLIN2” and “PIR”, to demonstrate the results.

Figure 2 shows smooth scatter plots of the data for the two selected genes, and Figure 3 displays their estimated graphs. The trends exhibited in the scatter plots are consistent with our model estimates. For example, Figure 2 shows an obvious positive correlation between mRNA expression and CNVs for ERLIN2, and the posterior mean for the corresponding edge in Figure 3 is 7.30, indicating a strong positive association between the two platforms. The same agreement is observed in the other cases. Overall, our model estimates correspond well with the associations observed among the platforms.

Fig. 2. Smooth scatter plots of the pairwise relationships among platforms C, M and E. The upper panel is for gene “ERLIN2” and the lower panel is for gene “PIR”. The red line in each smooth scatter plot is the lowess smoother. Dots correspond to the raw measurements from the level 3 TCGA data.

Fig. 3. Posterior estimated graphs for genes “ERLIN2” and “PIR”. Black edges represent positive relationships and red edges represent negative relationships. The number next to each edge is the posterior mean of β_ij.

VI. Discussion

We propose a Bayesian graphical model to describe the dependence structure of three genetic phenomena, CNVs, DNA methylation, and mRNA expression. The inferred graph gives a clear representation of the regulatory relationships involving the three genetic features. For example, the mRNA expression of gene ERLIN2 is sensitive to copy number changes but robust to DNA methylation, while the mRNA expression of gene PIR is sensitive to both copy number changes and DNA methylation. We are in the process of making a comprehensive list of these relationships using the entire TCGA data, expanding the effort to include more cancer types and more features such as microRNA and protein expression.

Acknowledgment

Peter Müller and Yuan Ji’s research is supported in part by NIH R01 CA132897.

Contributor Information

Yanxun Xu, Department of Statistics, Rice University, Houston, TX, yanxun.xu@rice.edu.

Jie Zhang, Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin.

Yuan Yuan, Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX.

Riten Mitra, ICES, The University of Texas at Austin, Austin, TX.

Peter Müller, Department of Mathematics, The University of Texas at Austin, Austin, TX.

Yuan Ji, CCRI, NorthShore University HealthSystem, Chicago, IL, yji@northshore.org.

References

1. Gonzalez E, Kulkarni H, Bolivar H, Mangano A, Sanchez R, Catano G, Nibbs R, Freedman B, Quinones M, Bamshad M, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science. 2005;307(5714):1434. doi: 10.1126/science.1101160.
2. Aitman T, Dong R, Vyse T, Norsworthy P, Johnson M, Smith J, Mangion J, Roberton-Lowe C, Marshall A, Petretto E, et al. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature. 2006;439(7078):851–855. doi: 10.1038/nature04489.
3. Robertson K. DNA methylation and human disease. Nature Reviews Genetics. 2005;6(8):597–610. doi: 10.1038/nrg1655.
4. Das P, Singal R. DNA methylation and cancer. Journal of Clinical Oncology. 2004;22(22):4632–4642. doi: 10.1200/JCO.2004.07.151.
5. Bussey K, Chin K, Lababidi S, Reimers M, Reinhold W, Kuo W, Gwadry F, Kouros-Mehr H, Fridlyand J, Jain A, et al. Integrating data on DNA copy number with gene expression levels and drug sensitivities in the NCI-60 cell line panel. Molecular Cancer Therapeutics. 2006;5(4):853. doi: 10.1158/1535-7163.MCT-05-0155.
6. Waaijenborg S, de Witt Hamer V, Philip C, Zwinderman A. Quantifying the association between gene expressions and DNA-markers by penalized canonical correlation analysis. Statistical Applications in Genetics and Molecular Biology. 2008;7(3). doi: 10.2202/1544-6115.1329.
7. Menezes R, Boetzer M, Sieswerda M, Van Ommen G, Boer J. Integrated analysis of DNA copy number and gene expression microarray data using gene sets. BMC Bioinformatics. 2009;10(1):203. doi: 10.1186/1471-2105-10-203.
8. Choi H, Qin Z, Ghosh D. A double-layered mixture model for the joint analysis of DNA copy number and gene expression data. Journal of Computational Biology. 2010. doi: 10.1089/cmb.2009.0019.
9. Amandine E, Marc A, de Tayrac Marie V, Frederique G, Stephan S, Abderrahmane H, Laurent R, Philippe M, Veronique Q, Jean M. DNA methylation in glioblastoma: impact on gene expression and clinical outcome. BMC Genomics. 2010;11. doi: 10.1186/1471-2164-11-701.
10. Bell D, Berchuck A, Birrer M, Chien J, Cramer D, Dao F, Dhir R, Disaia P, Gabra H, Glenn P, et al. Integrated genomic analyses of ovarian carcinoma. Nature. 2011. doi: 10.1038/nature10166.
11. Parmigiani G, Garrett E, Anbazhagan R, Gabrielson E. A statistical framework for expression-based molecular classification in cancer. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64(4):717–736.
12. Besag J. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological). 1974:192–236.
13. Caragea P, Kaiser M. Autologistic models with interpretable parameters. Journal of Agricultural, Biological, and Environmental Statistics. 2009;14(3):281–300.
14. Hughes J, Haran M, Caragea P. Autologistic models for binary data on a lattice. Environmetrics. 2011.
15. Mitra R, Mueller P, Liang S, Yue L, Ji Y. A Bayesian graphical model for ChIP-seq data on histone modifications. Journal of the American Statistical Association. In Press.
16. Green P. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82(4):711–732.

