BAYESIAN HIERARCHICAL MODELING FOR SIGNALING PATHWAY INFERENCE FROM SINGLE CELL INTERVENTIONAL DATA

Ruiyan Luo; Hongyu Zhao

doi:10.1214/10-AOAS425

Author manuscript; available in PMC: 2011 Dec 7.

Published in final edited form as: Ann Appl Stat. 2011;5(2A):725–745. doi: 10.1214/10-AOAS425

PMCID: PMC3233205 NIHMSID: NIHMS323230 PMID: 22162986

Abstract

Recent technological advances have made it possible to simultaneously measure multiple protein activities at the single cell level. With such data collected under different stimulatory or inhibitory conditions, it is possible to infer the causal relationships among proteins from single cell interventional data. In this article we propose a Bayesian hierarchical modeling framework to infer the signaling pathway based on the posterior distributions of parameters in the model. Under this framework, we consider network sparsity and model the existence of an association between two proteins both at the overall level across all experiments and at each individual experimental level. This allows us to infer the pairs of proteins that are associated with each other and their causal relationships. We also explicitly consider both intrinsic noise and measurement error. Markov chain Monte Carlo is implemented for statistical inference. We demonstrate that this hierarchical modeling can effectively pool information from different interventional experiments through simulation studies and real data analysis.

Key words and phrases: Bayesian network, dependency network, Gaussian graphical model, hierarchical model, interventional data, Markov chain Monte Carlo, mixture distribution, single cell measurements, signaling pathway

1. Introduction

Cells respond to internal and external changes through signaling networks. One major research area in biology is to identify signaling proteins and understand how they coordinate to function properly. With recent technological advances in genomics and proteomics, researchers now can monitor and quantify molecular activities at the genome level, making it possible to reconstruct signaling pathways from these high-throughput data. Although efforts have been made to use microarray gene expression data and sequence data to reveal signaling pathways [e.g., Liu and Ringnér (2007)], these data are limited in two important aspects. First, signaling pathways function at the protein level, so gene expression levels measured by microarrays can at most provide a proxy for protein activity levels. Second, individual cells may behave differently from one another, some substantially, because of complex interactions among many proteins. Therefore, population level data collected by microarrays can mask individual cell differences, making it difficult to infer underlying pathways. In contrast, single cell level protein activity data offer much richer information for pathway inference.

Flow cytometry [Herzenberg et al. (2002); Perez and Nolan (2002)] is a powerful fluorescence-based technology that can make rapid, sensitive, and quantitative measurements of multiple proteins for thousands of individual cells. It can measure both a specific protein’s expression level and protein modification states such as phosphorylation. Therefore, phospho-protein responses to environmental stimulations can be monitored at the single cell level for thousands of cells very efficiently, and this technology has been employed to infer signaling pathways through gathering activity levels of multiple proteins under different stimulatory or inhibitory conditions [Sachs et al. (2005)]. We focus on the analysis of single cell flow cytometry data in this article.

Several methods have been applied for network inference based on genomics data, including Bayesian Networks (BNs) [Pe’er et al. (2001); Pe’er (2005)], Markov Networks (MNs, also called Markov random fields) [Wei and Li (2007, 2008)], and Dependency Networks (DNs) [Heckerman et al. (2000)]. Common to all these methods, each protein (or gene) is represented by a node and a dependency between two proteins is represented by an edge in the network. More formally, we define a graph $\mathcal{G} = (V, E)$ with its nodes V = {1, …, P} and an edge set E. We use X_i to refer to the value of the ith node, that is, the expression level of the ith protein. The methods differ in how the edges are inferred from the observed data. In a BN, the network is a directed acyclic graph where the state of each node depends only on its immediate ancestors. This structure imposes Markovian dependency among all the nodes, stating that each variable is conditionally independent of its nondescendants given its parent variables. So the joint likelihood for all the nodes, that is, proteins, can be factored into a product of conditional probabilities. BNs pose significant computational challenges in learning the network structure because the model space to be explored is super-exponential in the number of genes to be studied. More recently, Ellis and Wong (2008) proposed a method to reduce the bias in the fast mixing algorithm proposed by Friedman and Koller (2003) to sample the BN structures from the posterior distribution. MNs are undirected graphical models and are similar to BNs in their representation of dependencies: each random variable is conditionally independent of all other variables given its neighbors. Gaussian Graphical Models (GGMs) [Lauritzen (1996); Schäfer and Strimmer (2005); Dobra et al. (2004)], a subclass of MNs, assume a multivariate normal distribution as the joint distribution of random variables. The existence of an undirected edge in a GGM is implied by a nonzero partial correlation coefficient derived from the precision matrix. Some studies have found that BNs outperform GGMs in inferring networks based on interventional data, where the biological system is perturbed through designed experiments, but GGMs may perform better for observational data, for example, Werhli, Grzegorczyk and Husmeier (2006). DNs aim to reduce the computational burden when a large number of genes are modeled, by building a collection of conditional distributions separately. DNs define the conditional distributions $\{p(X_i \mid X_{-i})\}$ separately for each X_i. When we focus on sparse normal models, DNs define a set of P separate conditional linear regression models in which X_i is regressed on a small selected subset of predictor variables, which are determined separately.

Because a statistical association between two variables does not by itself imply causation, standard DN and GGM approaches cannot be used to infer causal networks, which is a goal of signaling pathway analysis. In this article we develop a Bayesian hierarchical modeling approach based on DNs to address this limitation. This is achieved through appropriate intervention experiments to dissect directional influences. To accommodate varying relationships among proteins under different experimental conditions, we allow a different set of regression models for each condition. At the same time, the hierarchical framework imposes similar functional forms across conditions to borrow information from different experiments. As for causal inference, the basic idea is that for any protein i, its regulators exert similar effects across the conditions in which i is not intervened upon, and have no effect when i is controlled. In contrast to standard regression models where the predictors are assumed to be error-free, our model allows measurement errors in predictor variables.

A large part of the difficulty in the standard BN computation is due to the requirement that the network be acyclic. Our approach is not guaranteed to give acyclic networks. However, in terms of sensitivity and specificity to detect true edges, our method is competitive with the best methods that impose the acyclic graph assumption. This is illustrated by our results in the example of Sachs et al. (2005).

The paper is organized as follows. In Section 2 we describe two hierarchical models (a general hierarchical model where no constraints are imposed and a restricted hierarchical model where a symmetry constraint is imposed) and the methods for statistical inference of the network. For comparison, we also describe a nonhierarchical model where all the experiments are pooled together for analysis. Then we investigate the performance of these methods on simulated data in Section 3. In Section 4 we apply these methods to data from a study of the signaling networks of human primary naive CD4⁺ cells [Sachs et al. (2005)]. We finish the paper with discussions in Section 5.

2. Methods

The primary goal of our statistical model is to infer causal influences among proteins from interventional data. In this section we describe three models: a hierarchical model (HM), a restricted hierarchical model (RHM), and a nonhierarchical model (NHM), that can be used to infer the relationships among proteins. We also discuss statistical methods to infer causal networks in this section.

2.1. Hierarchical model (HM)

First we discuss a Bayesian hierarchical model to infer the relationships among proteins both at the overall level across all experiments and under individual experimental conditions. Our model incorporates both measurement errors and the intrinsic noises due to the biological process and unmodeled biological variations.

Let P denote the number of proteins, K denote the number of experimental conditions, and N_k denote the number of samples (individual cells) under the kth condition, k = 1, 2, …, K. We further let x̃_ink denote the true activity level of the ith protein in the nth cell under the kth experimental condition, and x_ink the measured value of its activity, where $x_{ink} = {\tilde{x}}_{ink} + ε_{ink}^{M}$ , and the measurement error $ε_{ink}^{M}$ is a normal random variable with mean 0 and standard deviation σ^M, that is, $ε_{ink}^{M} \sim N (0, {(σ^{M})}^{2})$ . Our model assumes that there exists a linear relationship among the activity levels of proteins. That is, for each protein i = 1, 2, …, P and for each condition k = 1, 2, …, K,

{\tilde{x}}_{ink} = α_{i 0}^{(k)} + \sum_{j \neq i} α_{i j}^{(k)} {\tilde{x}}_{jnk} + ε_{ink}^{I},

(1)

where $ε_{ink}^{I}$ is the intrinsic noise and has a normal distribution $N (0, {(σ_{i}^{I})}^{2})$ .² We assume that the error terms { $ε_{ink}^{I}$ } are independent, and are independent of the measurement errors { $ε_{ink}^{M}$ }. In equation (1), $α_{i j}^{(k)} = 0$ if there is no linear relationship between the activity levels of proteins i and j under the kth experimental condition. A nonzero value of $α_{i j}^{(k)}$ implies the existence of a linear relationship (but not necessarily a causal effect). To correctly infer the network among these proteins, we need to first find, for each protein, the subset of proteins that are linearly associated with its expression level, which is implied by the set of nonzero coefficients in (1). The linear relationship among the true expression values x̃_ink implies that the observed values are also linearly related:

\begin{array}{l} x_{ink} = α_{i 0}^{(k)} + \sum_{j \neq i} α_{i j}^{(k)} (x_{jnk} - ε_{jnk}^{M}) + ε_{ink}^{I} + ε_{ink}^{M} \\ = α_{i 0}^{(k)} + \sum_{j \neq i} α_{i j}^{(k)} x_{jnk} + ε_{ink}^{I} + ε_{ink}^{M} - \sum_{j \neq i} α_{i j}^{(k)} ε_{jnk}^{M} . \end{array}

(2)

Comparing (1) and (2), we can see that correctly inferring the relationship in the network depends on the correct inference of the set of nonzero coefficients in (2).
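As a short worked step, the composite error in (2) can be collected into a single term, written here as $e_{ink}$ (a symbol introduced only for this illustration). Assuming, in addition to the independence stated above, that the measurement errors of different proteins in the same cell are mutually independent (an assumption not spelled out in the text), its variance given the coefficients would be:

```latex
\begin{aligned}
e_{ink} &= \varepsilon_{ink}^{I} + \varepsilon_{ink}^{M}
          - \sum_{j \neq i} \alpha_{ij}^{(k)} \varepsilon_{jnk}^{M}, \\
\operatorname{Var}(e_{ink}) &= (\sigma_{i}^{I})^{2}
          + (\sigma^{M})^{2} \Bigl( 1 + \sum_{j \neq i} \bigl(\alpha_{ij}^{(k)}\bigr)^{2} \Bigr).
\end{aligned}
```

Under this reading, the effective noise in the regression on observed values grows with the squared coefficients, which is one reason to model the measurement error explicitly rather than absorb it into the intrinsic noise.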

For each protein, we utilize indicator variables z_ij = 0/1 to denote the relationship between proteins i and j such that z_ij = 1 if and only if the coefficient of the jth protein in the regression model for the ith protein is nonzero. The values of z_ij may differ under different experimental conditions. For example, if protein j regulates protein i, z_ij = 1 when i is not controlled and the association strength between the two proteins should be similar under such conditions. However, z_ij = 0 when i is controlled because the relation between X_i and X_j is destroyed. Therefore, it is natural to use a hierarchical structure to formalize this thinking. We use $w_{i j}^{(k)}$ to denote the probability that z_ij = 1, that is, protein j is related to protein i under the kth experimental condition. The prior we take for the regression coefficient $α_{i j}^{(k)}$ is a mixture of two distributions. One is a point mass at zero, indicating that the jth protein is not linearly related to the ith protein under the kth condition. The other is a normal distribution for nonzero effects, with weight $w_{i j}^{(k)}$ . Specifically, the prior for the slope coefficient $α_{i j}^{(k)} (j \neq i)$ is

α_{i j}^{(k)} ∣ w_{i j}^{(k)}, α_{i j}, σ_{i j}^{α} \sim (1 - w_{i j}^{(k)}) δ_{0} (α_{i j}^{(k)}) + w_{i j}^{(k)} N (α_{i j}^{(k)} ∣ α_{i j}, {(σ_{i j}^{α})}^{2}),

(3)

where δ₀(·) indicates a point-mass at zero, and $w_{i j}^{(k)}$ is the probability that $α_{i j}^{(k)} \neq 0$ . When $α_{i j}^{(k)} \neq 0$ , the prior for $α_{i j}^{(k)}$ is $N (α_{i j}, {(σ_{i j}^{α})}^{2})$ with a common mean and a common standard deviation across different experimental conditions. Under this setup, information is shared for coefficients $α_{i j}^{(k)}$ across different conditions. Similarly, we borrow information for $w_{i j}^{(k)}$ across different experimental conditions by applying a beta distribution as a prior for $w_{i j}^{(k)}$ with a common mean w_ij and a common variance w_ij(1 − w_ij)/(v_ij + 1):

w_{i j}^{(k)} ∣ w_{i j}, v_{i j} \sim Beta (w_{i j} v_{i j}, (1 - w_{i j}) v_{i j}) .

(4)

So $w_{i j}^{(k)}$ measures the probability that there is an association between proteins i and j under the kth experimental condition, and w_ij measures the overall-level probability that the two proteins are associated.
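The two-level prior in (3) and (4) can be illustrated with a small sampling sketch. This is only an illustration under stated assumptions: the function name `sample_alpha_prior` and its arguments are hypothetical, the overall-level quantities are treated as fixed inputs, and only the Python standard library is used.

```python
import random

def sample_alpha_prior(w_ij, v_ij, alpha_ij, sigma_alpha, n_conditions, rng):
    """Draw (w_ij^(k), alpha_ij^(k)) for each condition k from priors (4) and (3).

    w_ij, v_ij, alpha_ij, sigma_alpha are treated as known here (an assumption
    made for illustration; in the model they have their own priors).
    """
    draws = []
    for _ in range(n_conditions):
        # Equation (4): experiment-level probability of a nonzero coefficient.
        w_k = rng.betavariate(w_ij * v_ij, (1.0 - w_ij) * v_ij)
        # Equation (3): point mass at zero with probability 1 - w_k,
        # otherwise a normal draw centered at the shared mean alpha_ij.
        if rng.random() < w_k:
            a_k = rng.gauss(alpha_ij, sigma_alpha)
        else:
            a_k = 0.0
        draws.append((w_k, a_k))
    return draws

rng = random.Random(1)
draws = sample_alpha_prior(w_ij=0.3, v_ij=2.0, alpha_ij=1.0,
                           sigma_alpha=0.5, n_conditions=20000, rng=rng)
frac_nonzero = sum(1 for _, a in draws if a != 0.0) / len(draws)
```

With many draws, the fraction of nonzero coefficients should be close to w_ij, since each $w_{i j}^{(k)}$ has prior mean w_ij.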

To complete the model, we specify a beta distribution Beta(β₁, β₂) for w_ij, a normal distribution N(a⁽ⁱ⁾, τ⁽ⁱ⁾) for α_ij and gamma distributions: G(γ₁, γ₂), G(γ₃, γ₄) and G(γ₅, γ₆) for ${(σ_{i}^{I})}^{- 2}, {(σ_{i j}^{α})}^{- 2}$ and (σ^M)⁻², as their respective prior distributions. In the simulation studies, we take β_i = γ_j = 1 for i = 1, 2 and j = 1, 2, …, 6, a⁽ⁱ⁾ = 0 and τ⁽ⁱ⁾ = 1000 for i = 1, …, P. In real data analysis, we vary the hyperparameter values to study the sensitivity of the inference results to these values. We note that the posterior distribution is proper since we take proper priors for all the parameters.

One attractive feature of this model is that when $w_{i j}^{(k)}$ is integrated out, the marginal distribution of $α_{i j}^{(k)}$ is independent of v_ij:

α_{i j}^{(k)} ∣ w_{i j}, α_{i j}, σ_{i j}^{α} \sim (1 - w_{i j}) δ_{0} (α_{i j}^{(k)}) + w_{i j} N (α_{i j}^{(k)} ∣ α_{i j}, {(σ_{i j}^{α})}^{2}) .

(5)
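The step behind (5) can be written out as follows. The mixture in (3) is linear in $w_{i j}^{(k)}$, so integrating against the beta prior (4) only requires the prior mean of $w_{i j}^{(k)}$; the integration variable w below is introduced for this illustration.

```latex
\begin{aligned}
p\bigl(\alpha_{ij}^{(k)} \mid w_{ij}, \alpha_{ij}, \sigma_{ij}^{\alpha}\bigr)
 &= \int_{0}^{1} \Bigl[ (1 - w)\, \delta_{0}\bigl(\alpha_{ij}^{(k)}\bigr)
      + w\, N\bigl(\alpha_{ij}^{(k)} \mid \alpha_{ij}, (\sigma_{ij}^{\alpha})^{2}\bigr) \Bigr]
      \mathrm{Beta}\bigl(w \mid w_{ij} v_{ij}, (1 - w_{ij}) v_{ij}\bigr)\, dw \\
 &= \bigl(1 - E[w_{ij}^{(k)}]\bigr)\, \delta_{0}\bigl(\alpha_{ij}^{(k)}\bigr)
      + E[w_{ij}^{(k)}]\, N\bigl(\alpha_{ij}^{(k)} \mid \alpha_{ij}, (\sigma_{ij}^{\alpha})^{2}\bigr), \\
E\bigl[w_{ij}^{(k)}\bigr]
 &= \frac{w_{ij} v_{ij}}{w_{ij} v_{ij} + (1 - w_{ij}) v_{ij}} = w_{ij}.
\end{aligned}
```

Because v_ij cancels in the mean of the beta distribution, it does not appear in the marginal distribution.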

Given $α_{i j}^{(k)}$ and w_ij, when v_ij is specified, the posterior distribution of $w_{i j}^{(k)}$ is

w_{i j}^{(k)} ∣ w_{i j}, v_{i j}, α_{i j}^{(k)} \sim Beta (w_{i j} v_{i j} + I (α_{i j}^{(k)} \neq 0), (1 - w_{i j}) v_{i j} + I (α_{i j}^{(k)} = 0)) .

(6)

Hence, we can first sample the posterior distributions of $α_{i j}^{(k)}$ and w_ij, and then sample $w_{i j}^{(k)}$ according to equation (6).

Under this model, the inference of the causal network consists of two steps. First, based on the posterior mean $\hat{w}_{(i,j)}$ of the averaged overall-level probability 0.5 × (w_ij + w_ji), we infer whether there is an association between proteins i and j by comparing it with a threshold u₁. Second, for each pair of proteins (i, j) that are inferred to be associated, we determine their regulatory direction based on the experiment-level probabilities $w_{i j}^{(k)}$ to infer the causal network. The underlying assumption of our inference is that for a pair of proteins (i and j) that has a regulatory relation, say, i regulates j (i → j), controlling (inhibiting or activating) protein j affects the activity of j but not i, resulting in a much reduced or absent association between i and j, whereas controlling protein i affects the activity of i and hence j, preserving the association between them. The posterior distribution of $w_{i j}^{(k)}$ given $α_{i j}^{(k)}$ and w_ij, with v_ij prespecified, is given in equation (6). To better reflect the changes of $w_{i j}^{(k)}$ for different k, we use v_ij ≡ 0.1 in our analysis, because larger values of v_ij (e.g., 10) are not able to reveal the changes in $w_{i j}^{(k)}$ : the parameters in (6) are dominated by v_ij when v_ij is large, and $I (α_{i j}^{(k)} \neq 0)$ plays a smaller role in (6). To put this into more concrete terms, we consider w_ij = 0.9 as an example, which gives strong support for the association between proteins i and j. The difference between the distributions Beta(0.9 × 10 + 1, 0.1 × 10) and Beta(0.9 × 10, 0.1 × 10 + 1) when v_ij = 10 is much less than that between Beta(0.9 × 0.1 + 1, 0.1 × 0.1) and Beta(0.9 × 0.1, 0.1 × 0.1 + 1) when v_ij = 0.1.
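The effect of v_ij in this example can be checked numerically from the means of the beta distributions in (6). The sketch below is a minimal illustration; the function names are hypothetical, and it computes conditional means given a single state of $α_{i j}^{(k)}$, not posterior means from an MCMC run.

```python
def conditional_mean_w_k(w_ij, v_ij, alpha_nonzero):
    """Mean of the beta distribution in equation (6) for a given indicator value."""
    a = w_ij * v_ij + (1.0 if alpha_nonzero else 0.0)
    b = (1.0 - w_ij) * v_ij + (0.0 if alpha_nonzero else 1.0)
    return a / (a + b)

def mean_gap(w_ij, v_ij):
    """Difference in the mean of w_ij^(k) between nonzero and zero alpha_ij^(k)."""
    return (conditional_mean_w_k(w_ij, v_ij, True)
            - conditional_mean_w_k(w_ij, v_ij, False))

gap_large_v = mean_gap(0.9, 10.0)   # equals 1 / 11, about 0.09
gap_small_v = mean_gap(0.9, 0.1)    # equals 1 / 1.1, about 0.91
```

The gap simplifies to 1/(v_ij + 1) regardless of w_ij, so a small v_ij lets the indicator dominate while a large v_ij keeps $w_{i j}^{(k)}$ close to w_ij.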

To determine the directions of edges, we calculate the posterior means $\hat{w}_{ij}^{(k)}$ of $w_{i j}^{(k)}$ for all k and (i, j) pairs. For each pair (i, j), if all the values in a stream (e.g., $\{\hat{w}_{ji}^{(k)}\}_{k}$ ) are small (less than a threshold u₂ > 0, so the signal in this stream is weak compared to noise), then we ignore this stream and infer the causal relation based only on the other one ( $\{\hat{w}_{ij}^{(k)}\}_{k}$ ). The inference is based on checking whether $\hat{w}_{ij}^{(k)}$ under specific conditions decreases greatly compared to the highest value. Let S_i,j denote the set of conditions k under which protein i or j is perturbed, and |S_i,j| be its cardinality. We distinguish the following four cases to determine the causal relationship between an associated protein pair (i, j):

Case 1: |S_i,j| = 1, that is, i or j is only perturbed in one condition. Without loss of generality, we suppose that protein i is controlled under condition k′ (S_i,j = {k′}). If $\max_{k} \{\hat{w}_{ij}^{(k)}\} - \hat{w}_{ij}^{(k')} > u_{3}$ for a threshold u₃ > 0, then from stream $\{\hat{w}_{ij}^{(k)}\}_{k}$ we infer j → i. Otherwise, we infer i → j. Similarly, we make an inference from the stream $\{\hat{w}_{ji}^{(k)}\}_{k}$ . If the directions inferred from both streams are the same, say, i → j, we infer that direction as the direction of the edge between i and j: i → j. If the directions from both streams are different, we say that the direction of the edge is undetermined. Taking the conditions in Table 1 for the network in Figure 1 as an example, pairs (1, 2), (1, 8), (2, 6), (3, 4), (4, 5), (6, 8), (8, 10), and (8, 11) belong to Case 1.
Case 2: |S_i,j| > 1 and for all k ∈ S_i,j, the same protein, say, i, is controlled. For each stream, for example, $\{\hat{w}_{ij}^{(k)}\}_{k}$ , if $\max_{k} \{\hat{w}_{ij}^{(k)}\} - \hat{w}_{ij}^{(k')} > u_{3}$ for all k′ ∈ S_i,j, then we infer j → i; if $\max_{k} \{\hat{w}_{ij}^{(k)}\} - \hat{w}_{ij}^{(k')} \leq u_{3}$ for all k′ ∈ S_i,j, then we infer i → j; otherwise, we do not infer a direction from this stream. If both streams lead to a directional inference and the directions are the same (Figure 4, top panel), or if only one stream provides a directional inference, then we infer the direction of the edge. Otherwise, the direction is undetermined. For the conditions in Table 1, pairs (1, 9), (3, 9), (5, 7), (6, 7), (9, 10), and (9, 11) belong to Case 2.
Case 3: |S_i,j| > 1 and both proteins are controlled in the experiments. Let $S_{i,j}^{(i)}$ denote the set of conditions under which protein i is controlled, and $S_{i,j}^{(j)}$ the set of conditions under which protein j is controlled. For each stream, for example, $\{\hat{w}_{ij}^{(k)}\}_{k}$ , we calculate the differences of $\hat{w}_{ij}^{(k)}$ when i or j is controlled: $d_{ij}^{(k_{1} k_{2})} = \hat{w}_{ij}^{(k_{1})} - \hat{w}_{ij}^{(k_{2})}$ for each $k_{1} \in S_{i,j}^{(i)}$ and $k_{2} \in S_{i,j}^{(j)}$ . If $d_{ij}^{(k_{1} k_{2})} > u_{3}$ for all $k_{1} \in S_{i,j}^{(i)}$ and $k_{2} \in S_{i,j}^{(j)}$ , we infer that i → j; if $d_{ij}^{(k_{1} k_{2})} \leq - u_{3}$ for all $k_{1} \in S_{i,j}^{(i)}$ and $k_{2} \in S_{i,j}^{(j)}$ , we infer that j → i; otherwise, the direction is undetermined from this stream. If both streams lead to a directional inference and the directions are the same, or if only one stream provides a directional inference, then we infer the direction of the edge. Otherwise, the direction is undetermined. For the conditions in Table 1, pairs (2, 9), (4, 9), (7, 8), and (8, 9) belong to Case 3.
Case 4: |S_i,j| = 0, that is, no perturbation is conducted on either protein. In this case, we cannot infer the causal relation.
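The four cases above can be sketched as a small decision routine. This is one possible reading of the criteria, not the authors' implementation: the function names, the dictionary inputs (condition index mapped to posterior mean), and the choice to apply the same rule to both streams keyed on which protein is controlled are assumptions made for illustration.

```python
def stream_direction(w_hat, ctrl_i, ctrl_j, i, j, u3):
    """Direction suggested by one stream of posterior means.

    w_hat: dict mapping condition k to the posterior mean for this stream.
    ctrl_i, ctrl_j: conditions under which protein i or protein j is controlled.
    Returns (parent, child), or None when no direction is inferred.
    """
    if not ctrl_i and not ctrl_j:                      # Case 4: no perturbation
        return None
    if ctrl_i and ctrl_j:                              # Case 3: both controlled
        diffs = [w_hat[k1] - w_hat[k2] for k1 in ctrl_i for k2 in ctrl_j]
        if all(d > u3 for d in diffs):
            return (i, j)
        if all(d <= -u3 for d in diffs):
            return (j, i)
        return None
    # Cases 1 and 2: only one of the two proteins is ever controlled.
    ctrl, free, conds = (i, j, ctrl_i) if ctrl_i else (j, i, ctrl_j)
    top = max(w_hat.values())
    drops = [top - w_hat[k] for k in conds]
    if all(d > u3 for d in drops):
        return (free, ctrl)   # association lost when ctrl is controlled
    if all(d <= u3 for d in drops):
        return (ctrl, free)   # association kept when ctrl is controlled
    return None

def edge_direction(w_ij, w_ji, ctrl_i, ctrl_j, i, j, u2, u3):
    """Combine both streams; None means the direction is undetermined."""
    found = set()
    for stream in (w_ij, w_ji):
        if all(v < u2 for v in stream.values()):
            continue          # weak stream, ignored
        direction = stream_direction(stream, ctrl_i, ctrl_j, i, j, u3)
        if direction is not None:
            found.add(direction)
    return found.pop() if len(found) == 1 else None

# Hypothetical posterior means over nine conditions; protein 7 is
# controlled under conditions 3 and 7, as in Table 1.
drop = {k: (0.05 if k in (3, 7) else 0.95) for k in range(1, 10)}
flat = {k: 0.95 for k in range(1, 10)}
agree = edge_direction(drop, drop, [], [3, 7], 5, 7, u2=0.1, u3=0.4)
conflict = edge_direction(flat, drop, [], [3, 7], 6, 7, u2=0.1, u3=0.4)
```

In this toy input, both streams for the pair (5, 7) drop when protein 7 is controlled, so the sketch returns (5, 7); for the pair (6, 7) one stream drops and the other stays flat, so the two streams disagree and the result is None.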

Table 1.

A summary of the nine experimental conditions for the data in Sachs et al. (2005)

Condition	Stimulus	Effect
1	CD3, CD28	general perturbation
2	ICAM2	general perturbation
3	Akt-inhibitor	Inhibits Akt
4	G0076	Inhibits Pkc
5	Psi	Inhibits Pip2
6	U0126	Inhibits Mek
7	Ly	Inhibits Akt
8	PMA	Activates Pkc
9	β₂cAMP	Activates Pka

Fig. 1 — Pathway adapted from Sachs et al. (2005) by including three missed edges and correcting one reversed edge. Nodes represent proteins, and directed edges represent signal transduction.

Fig. 4 — Boxplots of the samples from the posterior distributions of $w_{57}^{(k)}$ and $w_{75}^{(k)}$ (top panel), $w_{67}^{(k)}$ and $w_{76}^{(k)}$ (bottom panel) when v_ij ≡ 0.1 for all i and j. This is from one MCMC run of the HM on the simulated data with constant $σ_{i}^{I}$ .

The choices of the thresholds u₁, u₂, and u₃ will be discussed in simulation studies. Generally speaking, the two steps involved in causal network inference are based on the posterior distributions of w_ij and $w_{i j}^{(k)}$ . The overall-level probability w_ij measures the strength of the linear relationship between two proteins across all the conditions. Based on w_ij, we infer the set of proteins that are related from which we determine an undirected graph. The changes in the experiment-level probabilities $w_{i j}^{(k)}$ offer insights on the directions of causal regulations.

2.2. Restricted hierarchical model (RHM)

The w_ij and $w_{i j}^{(k)}$ in (3), (4), and (6) denote the probability that protein j is included in the linear model to predict the activity level of protein i across all the experiments and under the kth specific condition, respectively. In this framework, we may impose the constraint that w_ji = w_ij and $w_{j i}^{(k)} = w_{i j}^{(k)}$ for each k, that is, the existence of a linear relationship between proteins i and j is independent of which variable is the predictor and which is the response variable. We can infer the posterior distributions of w_ij and $w_{i j}^{(k)}$ under this constraint, and call this model a restricted hierarchical model (RHM). Based on the posterior means of w_ij, we can infer whether proteins i and j are associated with each other by setting up an appropriate threshold $u_{1}^{'}$ . For each associated pair, we can infer the causal relationship according to the changes in $w_{i j}^{(k)}$ . The choice of the threshold will be illustrated in Section 3.1 and Supplementary Material S3 [Luo and Zhao (2010)] details the criteria in determining the causal relations for the associated pairs of proteins. Different from HM, we must prespecify v_ij in RHM to sample from the posterior distributions of w_ij and $w_{i j}^{(k)}$ . We will show how different values of v_ij affect the network inference in the following discussion.

2.3. Nonhierarchical model (NHM)

To demonstrate the usefulness of the hierarchical model approach, we also consider a nonhierarchical model (NHM) as a reference model for comparisons. The NHM assumes a linear model among the activity levels of proteins and incorporates both measurement errors and intrinsic noises as in equation (2). The main difference is that this NHM assumes identical regression coefficients across different experimental conditions:

x_{ink} = α_{i 0} + \sum_{j \neq i} α_{i j} x_{jnk} + ε_{ink}^{I} + ε_{ink}^{M} - \sum_{j \neq i} α_{i j} ε_{jnk}^{M},

(7)

where the intrinsic noise $ε_{ink}^{I}$ follows the normal distribution $N (0, {(σ_{i}^{I})}^{2})$ , the measurement error $ε_{ink}^{M}$ follows the normal distribution N(0, (σ^M)²), and they are assumed to be independent. As in HM, we also apply mixture distributions as priors for the coefficients α_ij:

α_{i j} \sim (1 - w_{i j}) δ_{0} (α_{i j}) + w_{i j} N (α_{i j} ∣ a, τ^{2}) .

(8)

The posterior distributions of w_ij provide information about whether proteins i and j are associated. However, it is impossible to make causal inference from this model.

For all three models, we use MCMC methods to sample the posterior distributions. Supplementary Material S2 [Luo and Zhao (2010)] provides details of the MCMC updates for HM. The MCMC updates for RHM and NHM are similar and not shown in this paper.

3. Simulation study

We first apply our methods to simulated data to illustrate how to infer the causal network from the posterior distributions of the overall-level probabilities w_ij and the experiment-level probabilities $w_{i j}^{(k)}$ , for both HM and RHM. We also study how the inference differs between these two methods and for different choices of v_ij. We then study the performance of our methods on simulated data with heavy tail distributed intrinsic noises.

We simulate data based on the network shown in Figure 1, which is adapted from Sachs et al. (2005) by correcting one reversed edge and including three missed edges. From Figure 1, we can derive the parent set for each node (protein). For any protein i, we first generate the association strength α_ij from the uniform distribution over the interval [0.5, 2], and randomly assign the sign of α_ij. Given the activities of its parents, we simulate the activity x̃_i of protein i from the normal distribution: ${\tilde{x}}_{i} \sim N (α_{i 0} + \sum_{j} α_{i j} {\tilde{x}}_{j}, {(σ_{i}^{I})}^{2})$ , where the sum extends over all parents of protein i. Thus, we get the empirical distribution of x̃_i when protein i is not intervened upon. Let x_i denote the observed expression level of protein i; then x_i is simulated from N(x̃_i, (σ^M)²). We simulate the interventional data as follows. For an intervention experiment, if the ith protein is inhibited, we sample x̃_i from the left tail of its empirical distribution obtained when protein i is not perturbed, that is, below the 5th percentile. If the ith node is stimulated, we sample x̃_i from the right tail of the empirical distribution, that is, above the 95th percentile. We simulate a total of nine stimulatory or inhibitory interventional conditions, as summarized in Table 1. Under each perturbation condition, we simulate expression levels for each of the 11 proteins for 600 individual cells. We consider two cases: (1) constant intrinsic variances ${(σ_{i}^{I})}^{2} \equiv 1$ and (2) variable intrinsic variances with $σ_{i}^{I} = 0.1 \times \sqrt{IG (2, 1)}$ , where IG(2, 1) represents the inverse gamma distribution with mean 1 and infinite variance. Finally, we simulate data where the intrinsic noises are sampled from a heavy-tailed distribution, t(1), the central t distribution with one degree of freedom.
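The simulation scheme described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the code used for the paper: it uses a hypothetical three-protein chain rather than the network in Figure 1, sets all intercepts to zero, and picks σ^M = 0.1 arbitrarily because the text does not state its value. All function names are hypothetical.

```python
import random

def draw_coefficients(parents, rng):
    """alpha_ij with magnitude uniform on [0.5, 2] and a random sign."""
    return {i: {j: rng.uniform(0.5, 2.0) * rng.choice((-1.0, 1.0)) for j in ps}
            for i, ps in parents.items()}

def simulate_true(parents, order, coef, n_cells, sigma_I, rng,
                  target=None, pool=None):
    """Simulate true activities; `order` lists parents before children.

    If `target` is given, its activity is drawn from `pool`, a tail of its
    unperturbed empirical distribution, instead of from its parents.
    """
    cells = []
    for _ in range(n_cells):
        x = {}
        for i in order:
            if i == target:
                x[i] = rng.choice(pool)
            else:
                mean = sum(coef[i][j] * x[j] for j in parents[i])  # intercept 0
                x[i] = rng.gauss(mean, sigma_I)
        cells.append(x)
    return cells

def tail(values, inhibit):
    """Lowest 5% of values for inhibition, highest 5% for activation."""
    s = sorted(values)
    cut = max(1, len(s) // 20)
    return s[:cut] if inhibit else s[-cut:]

def add_measurement_error(cells, sigma_M, rng):
    """Observed value = true value + N(0, sigma_M^2) measurement error."""
    return [{i: v + rng.gauss(0.0, sigma_M) for i, v in x.items()} for x in cells]

rng = random.Random(0)
parents = {1: [], 2: [1], 3: [2]}          # hypothetical chain 1 -> 2 -> 3
order = [1, 2, 3]
coef = draw_coefficients(parents, rng)
baseline = simulate_true(parents, order, coef, 600, 1.0, rng)
pool = tail([x[2] for x in baseline], inhibit=True)
inhibited = simulate_true(parents, order, coef, 600, 1.0, rng, target=2, pool=pool)
observed = add_measurement_error(inhibited, 0.1, rng)
```

In the inhibited data, protein 2 no longer depends on protein 1, while protein 3 still responds to protein 2, which is the asymmetry the direction criteria rely on.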

3.1. Constant intrinsic variance ${(σ_{i}^{I})}^{2} \equiv 1$

3.1.1. Inference from HM

Based on the simulated data, we obtain samples for both w_ij and w_ji from their posterior distributions under HM. To infer whether an association exists between proteins i and j, we obtain the posterior means $\hat{w}_{(i,j)}$ of the average of the probability that each is included in the regression model of the other: (w_ij + w_ji)/2. Higher values of $\hat{w}_{(i,j)}$ imply stronger evidence of association between the two proteins. Figure 2 shows the posterior means $\hat{w}_{(i,j)}$ , from one MCMC run, for each pair (i, j) (i < j), in ascending order of $\hat{w}_{(i,j)}$ . Large solid circles represent true associations, and small empty ones represent false ones. We see that the true associations dominate the higher values of $\hat{w}_{(i,j)}$ .

Fig. 2 — Posterior means $\hat{w}_{(i,j)}$ of (w_ij + w_ji)/2, sorted in increasing order, from one MCMC run of the HM on the simulated data with constant intrinsic variances. Large solid and small empty circles represent true and false associations, respectively.

To infer the pairs of proteins that are associated, we need to set a threshold u₁ on the posterior means $\hat{w}_{(i,j)}$ so that those above the threshold are inferred to be associated. The permutation study³ offers an over-liberal threshold (<0.1), based on which we get over 40 associations with false positive rate ≥0.5. Noting the jumps in the plot of $\hat{w}_{(i,j)}$ , we propose to choose the threshold where a big jump occurs. Setting the threshold u₁ as any value between 0.2 and 0.4 and choosing the pairs with $\hat{w}_{(i,j)} > u_{1}$ , we get 22 associations with 2 false positives. When we have multiple MCMC runs, which lead to multiple plots of $\hat{w}_{(i,j)}$ , we can combine the inferences from them. Figure S1 in the Supplementary Material [Luo and Zhao (2010)] draws the plots of $\hat{w}_{(i,j)}$ from four additional MCMC runs. They show the same features as seen in Figure 2: true associations tend to have high $\hat{w}_{(i,j)}$ values, and jumps exist in these plots. These five MCMC runs lead to quite similar results: from four of them we get 22 associations with 2 false positives, and from the fifth run we get 21 associations with 2 false positives and 1 missing association, when we choose u₁ between 0.3 and 0.4. Let u_f be the relative frequency that each association is selected. When u₁ ∈ (0.3, 0.4) and u_f ≥ 4/5, we get 22 associations with 2 false positives (Figure 3).
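The two selection steps in this paragraph, placing u₁ at a large jump and keeping associations selected in a sufficient fraction of runs, can be sketched as below. The function names and the choice of the midpoint of the widest gap as the threshold are assumptions for illustration; the text only says to choose the threshold where a big jump occurs.

```python
def largest_jump_threshold(w_hat):
    """Midpoint of the widest gap between consecutive sorted posterior means.

    w_hat: dict mapping a pair (i, j) to its posterior mean.
    """
    s = sorted(w_hat.values())
    gaps = [(s[t + 1] - s[t], t) for t in range(len(s) - 1)]
    _, t = max(gaps)
    return 0.5 * (s[t] + s[t + 1])

def select_associations(runs, u1, u_f):
    """Keep pairs whose posterior mean exceeds u1 in at least a fraction u_f of runs.

    runs: list of dicts, one per MCMC run, each mapping a pair to its posterior mean.
    """
    counts = {}
    for w_hat in runs:
        for pair, value in w_hat.items():
            if value > u1:
                counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, c in counts.items() if c / len(runs) >= u_f}

# Hypothetical posterior means for four pairs in one run.
one_run = {(1, 2): 0.95, (1, 3): 0.90, (2, 3): 0.15, (1, 4): 0.05}
u1 = largest_jump_threshold(one_run)
```

Here the widest gap lies between 0.15 and 0.90, so the sketch places the threshold midway between them.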

Fig. 3 — Networks inferred by choosing associations with u₁ ∈ (0.3, 0.4), u₂ = 0.1, u₃ ∈ (0.3, 0.5), and u_f ≥ 0.8 in five MCMC runs of the HM on the simulated data with constant $σ_{i}^{I}$ . Solid arrowed lines represent correctly inferred true edges, dashed thick lines with labels “u” represent edges whose directions cannot be determined from the simulations, dashed arrowed thin lines with labels “r” represent reversed edges, and dotted lines with labels “+” represent false positive edges.

For the pairs of proteins that are inferred to be associated, we then infer their causal directions based on the criteria listed in Section 2.1. To better illustrate the criteria, we give two examples in Figure 4. The top panel draws the boxplots of the samples from the posterior distributions of $w_{57}^{(k)}$ and $w_{75}^{(k)}$ . Both show that the experiment-level probabilities greatly decreased under conditions 3 and 7 where protein 7 (Akt) is inhibited (here $\max_{k} \{\hat{w}_{57}^{(k)}\} - \hat{w}_{57}^{(k')} \geq 0.8$ and $\max_{k} \{\hat{w}_{75}^{(k)}\} - \hat{w}_{75}^{(k')} = 0.9$ for k′ = 3, 7). So we infer the direction 5 → 7 (i.e., PIP3 → Akt). The bottom panel tells a different story. The posterior means of $w_{76}^{(k)}$ when k = 3 or 7 are much smaller than others ( $\max_{k} \{\hat{w}_{76}^{(k)}\} - \hat{w}_{76}^{(k')} = 0.9$ ), indicating the causal relation 6 → 7, but $w_{67}^{(k)}$ keeps the same level under all conditions ( $\max_{k} \{\hat{w}_{67}^{(k)}\} - \hat{w}_{67}^{(k')} = 0$ ), indicating the causal relation 7 → 6. The contradictory results from $w_{67}^{(k)}$ and $w_{76}^{(k)}$ lead to the failure in determining the causal relationship between proteins 6 (Erk) and 7. Taking u₂ = 0.1 and u₃ ∈ (0.3, 0.5), we infer a causal network as shown in Figure 3, which contains 14 true directed edges, 5 edges whose directions are undetermined, 1 reversed edge, and 2 false edges.

When we have multiple MCMC runs, we infer the causal relation of each edge based on the majority vote of the directions inferred from each MCMC run. In fact, these five runs lead to almost identical causal inference for the common associations [based on u₁ ∈ (0.3, 0.4)] when we take u₂ = 0.1 and u₃ ∈ (0.3, 0.5). The choices of u₂ and u₃ are affected by the value of v_ij. Choosing v_ij ≡ 0.1 ensures that most $\hat{w}_{ij}^{(k)}$ are either above 0.9 or below 0.1. The streams with $\hat{w}_{ij}^{(k)} \leq 0.1$ for all k contain too weak a signal to provide sufficient information for causal inference. So we take u₂ = 0.1. The small value of v_ij also leads to a great difference in $\hat{w}_{ij}^{(k)}$ for different experiments when protein i or j is intervened upon. In this simulation study, intervention on the child node of an edge leads to a decrease of at least 0.5 in $\hat{w}_{ij}^{(k)}$ . Any value of u₃ in (0.3, 0.5) leads to the same directional inference for the inferred associations.
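The majority vote across runs can be sketched as follows. The function name is hypothetical, and requiring a strict majority (more than half of the runs) is an assumption, since the text does not specify how ties or undetermined runs are handled.

```python
from collections import Counter

def majority_direction(votes):
    """Combine per-run directions for one edge.

    votes: list with one entry per MCMC run, each a (parent, child) tuple,
    or None when that run left the direction undetermined.
    Returns the direction chosen by a strict majority of runs, else None.
    """
    if not votes:
        return None
    top, n_top = Counter(votes).most_common(1)[0]
    if 2 * n_top > len(votes):
        return top            # may itself be None if most runs were undetermined
    return None

clear = majority_direction([(5, 7), (5, 7), (5, 7), (5, 7), None])
split = majority_direction([(6, 7), (7, 6), None, None, (6, 7)])
```

In the first call four of five runs agree, so the direction (5, 7) is returned; in the second no direction reaches a strict majority, so the edge stays undetermined.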

3.1.2. Inference from RHM

RHM requires that w_ij = w_ji and $w_{i j}^{(k)} = w_{j i}^{(k)}$ for all i, j, and k. This restriction aims to avoid the inconsistent directional inferences that can arise in HM, where $w_{j i}^{(k)}$ and $w_{i j}^{(k)}$ are examined separately. We plot the posterior means ${\hat{w}}_{i j}$ of w_ij in Figure 5, where we take v_ij = 0.1. Similar to Figures 2 and S1, true associations tend to have higher values of ${\hat{w}}_{i j}$. Setting $u_{1}^{'} = 0.2$, we infer 22 associations with 2 false positives. Applying the criteria listed in Supplementary Material S3 [Luo and Zhao (2010)], we infer the causal network as shown in Figure 5. Compared to the network in Figure 3, RHM leads to a network with 16 true directed edges, 3 edges whose directions are undetermined, 1 reversed edge, and 2 false edges when we take $u_{1}^{'} = 0.2$ and u₃ ∈ (0.3, 0.5). If we increase the threshold $u_{1}^{'}$ to a value where there is a big jump, for example, 0.3, we will miss 1 true directed edge.

Fig. 5. Inference from RHM with v_ij = 0.1 on the simulated data with constant $σ_{i}^{I}$. Left: posterior means ${\hat{w}}_{i j}$ of w_ij, sorted in increasing order. Right: inferred networks with $u_{1}^{'} = 0.2$, u₃ ∈ (0.3, 0.5). Solid arrowed lines represent correctly inferred true edges, dashed thick lines with labels “u” represent edges whose directions cannot be determined from the simulations, dashed arrowed thin lines with labels “r” represent reversed edges, and dotted lines with labels “+” represent false positive edges.

With a larger value v_ij = 10, the differences in the posterior means ${\hat{w}}_{i j}$ of w_ij between the true and false associations become much smaller (Figure S2 in the Supplementary Material [Luo and Zhao (2010)]). Together with the fact that larger values of v_ij lead to smaller changes in the experimental-level probabilities, this leads us to conclude that a small v_ij is preferred for causal network inference.

3.1.3. Inference from NHM

Ignoring the effect of perturbations on the signaling pathway, NHM assumes a common coefficient α_ij in the linear regression models across all experimental conditions. From this model, we can only infer whether there is an association between two proteins. Similar to the inferred posterior means from HM, ${\hat{w}}_{i j}$ from NHM also tend to take higher values for true associations (Figure 6), but with two differences. First, the range of ${\hat{w}}_{i j}$ from NHM is smaller. In other words, compared to HM, NHM leads to smaller values of the biggest ${\hat{w}}_{i j}$ and larger values of the smallest ${\hat{w}}_{i j}$. So the support for true associations and the evidence against false associations are weaker. Second, the dominance of high values of true associations is not as strong as that from HM. More false associations take high values of ${\hat{w}}_{i j}$ than under the hierarchical inference. If we take 0.6 as a threshold, we infer 23 associations with 15 true and 8 false. Taking 0.45 as the threshold, we recover all the true associations, but 27 false ones are also inferred. More importantly, we cannot determine the directions of associations from NHM because perturbation information is not utilized in this model.
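The trade-off between the two thresholds can be illustrated with a small sketch that counts true and false associations above a cutoff. The function name `count_associations` and the six posterior means below are hypothetical and are not the values behind Figure 6; only the thresholds 0.6 and 0.45 are taken from the text.

```python
import numpy as np

def count_associations(w_hat, is_true, threshold):
    """Count true and false associations among pairs whose posterior mean exceeds threshold."""
    w_hat = np.asarray(w_hat, dtype=float)
    is_true = np.asarray(is_true, dtype=bool)
    selected = w_hat > threshold
    n_true = int(np.sum(selected & is_true))
    n_false = int(np.sum(selected & ~is_true))
    return n_true, n_false

# Hypothetical posterior means for six protein pairs; the first three are true associations.
w_hat = [0.72, 0.65, 0.50, 0.62, 0.47, 0.30]
is_true = [True, True, True, False, False, False]

strict = count_associations(w_hat, is_true, 0.6)    # higher threshold misses a true pair
loose = count_associations(w_hat, is_true, 0.45)    # lower threshold admits more false pairs
print(strict, loose)
```

Lowering the threshold recovers the remaining true pair in this toy example, but only at the cost of an additional false one.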

Fig. 6. Posterior means ${\hat{w}}_{i j}$ of (w_ij + w_ji)/2, sorted in increasing order, from one MCMC run of NHM. Small empty and large solid circles represent the false and true associations, respectively.

3.2. Variable intrinsic variances ${(σ_{i}^{I})}^{2}$

We then consider the case where variances of intrinsic noises vary for different proteins. In this case, both HM and RHM with v_ij = 0.1 clearly separate the true associations from the false ones in the plot of the posterior means ${\hat{w}}_{i j}$ (Figure 7). The causal networks inferred from both models are the same, with 18 correctly inferred true edges, 1 reversed edge, and 1 edge whose direction is undetermined [u₁ ∈ (0.2, 0.7), u₂ = 0.1, and u₃ = 0.3]. As in Section 3.1.2, RHM with a larger value v_ij = 10 leads to association inference with a higher false positive rate and smaller changes in experimental-level probabilities when a child node is perturbed (Figure S2 in the Supplementary Material [Luo and Zhao (2010)]). NHM is not applied here and thereafter since it does not provide causal relations.

Fig. 7. Inference results for the simulated data with variable intrinsic variances. Upper panel: posterior means ${\hat{w}}_{i j}$ of w_ij, sorted in increasing order from HM (left) and RHM with v_ij = 0.1 (right). Lower panel: inferred networks from both models with u₁ ∈ (0.2, 0.7) and u₃ = 0.3. Solid arrowed lines represent correctly inferred true edges, dashed thick lines with labels “u” represent edges whose directions cannot be determined from the simulations, and dashed arrowed thin lines with labels “r” represent reversed edges.

3.3. Heavy tail distribution for intrinsic noise

Considering the possibility of nonnormality for real biological processes, we simulate data where the expression levels of proteins have a heavy-tailed distribution. This is realized by simulating $ε_{ink}^{I} \sim t(1)$ for each protein i under each experimental condition k. We reuse the parameter settings in Sections 3.1 and 3.2 so that the performance of our methods on the normal and nonnormal cases can be easily compared. We summarize the network inference results in Table 2. Because the model is misspecified when HM is applied to these heavy-tailed data, we infer networks with more false positive and false negative edges. Therefore, our current model needs to be extended to analyze heavy-tailed data.
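To give a sense of how different t(1) noise is from normal noise, the following sketch draws both and compares the fraction of extreme values. It is not the simulation code used in this study: the seed, the number of cells `n_cells`, and the cutoff of 5 are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)    # seed chosen arbitrarily
n_cells = 10_000                   # hypothetical number of cells

eps_normal = rng.standard_normal(n_cells)        # normal intrinsic noise
eps_heavy = rng.standard_t(df=1, size=n_cells)   # t(1) (Cauchy) intrinsic noise

# Fraction of draws more than 5 units from zero: essentially none under the
# normal distribution, roughly one in eight under t(1).
frac_normal = np.mean(np.abs(eps_normal) > 5)
frac_heavy = np.mean(np.abs(eps_heavy) > 5)
print(frac_normal, frac_heavy)
```

A model that assumes normal intrinsic noise treats such extreme values as signal, which is one plausible reason for the extra false positive and false negative edges.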

Table 2.

Summary of pathway inference in simulation study

Data	Methods	True	Undetermined	Reversed	Missing	False	Hamming distance
Data-1	HM	14	5	1	0	2	8
	RHM	16	3	0	1	2	6
Data-2	HM	18	1	1	0	0	2
	RHM	18	1	1	0	0	2
Data-1^t	HM	9	4	1	6	4	15
Data-2^t	HM	14	1	0	5	6	12

All are based on u₁ = 0.4 and u₃ = 0.3. The Hamming distance is the minimum number of simple operations needed to go from the inferred graph to the true graph. Here simple operations include adding or removing an edge, and adding, removing, or changing the direction of an edge. Data-1: simulated data in Section 3.1 with constant intrinsic variances. Data-2: simulated data in Section 3.2 with varying intrinsic variances. Data-1^t: simulated data with parameter settings in Data-1 and intrinsic noises sampled from t(1). Data-2^t: simulated data with parameter settings in Data-2 and intrinsic noises sampled from t(1).
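In every row of Table 2, the reported Hamming distance equals the sum of the undetermined, reversed, missing, and false counts, that is, one simple operation for each edge that is not a correctly directed true edge. This reading is inferred from the table rather than stated in the definition, and the function name `hamming_distance` is ours. The short check below confirms it row by row.

```python
def hamming_distance(undetermined, reversed_, missing, false):
    """One simple operation per edge that is not a correctly directed true edge."""
    return undetermined + reversed_ + missing + false

# Rows of Table 2: (undetermined, reversed, missing, false, reported distance)
table2 = {
    ("Data-1", "HM"): (5, 1, 0, 2, 8),
    ("Data-1", "RHM"): (3, 0, 1, 2, 6),
    ("Data-2", "HM"): (1, 1, 0, 0, 2),
    ("Data-2", "RHM"): (1, 1, 0, 0, 2),
    ("Data-1^t", "HM"): (4, 1, 6, 4, 15),
    ("Data-2^t", "HM"): (1, 0, 5, 6, 12),
}

# Compare the computed distance with the reported one for each row.
checks = {key: hamming_distance(*row[:4]) == row[4] for key, row in table2.items()}
print(all(checks.values()))
```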

4. Case study

The Mitogen-Activated Protein Kinase (MAPK) pathways transduce a large variety of external signals, leading to a wide range of cellular responses such as growth, differentiation, inflammation, and apoptosis. External stimuli are sensed by cell surface markers, then travel through a cascade of protein modifications of signaling proteins, and eventually lead to changes in nuclear transcription. Single cell interventional data of 11 well-studied proteins from the MAPK pathways were originally generated by Sachs et al. (2005) using the intracellular multicolor flow cytometry technique. This pathway was perturbed by 9 different stimuli, each targeting a different protein in the selected pathway (Figure 1 and Table 1). Sachs et al. (2005) applied Bayesian network analysis to infer the causal protein-signaling network. Correcting the bias in the commonly used algorithm proposed by Friedman and Koller (2003), Ellis and Wong (2008) reanalyzed this data set through sampling BN structures from the correct posterior distribution. Both studies used the discretized data where the protein expression levels were grouped into three levels: “low,” “middle,” and “high.” The inhibited molecules were set to the level “low,” and activated molecules were set to the level “high.” We apply our method to these data and compare the results with those from Ellis and Wong (2008).

We infer the networks using HM and RHM with v_ij = 0.1 and v_ij = 10. Each analysis has five MCMC runs. Figure 8 shows the inferred posterior means ${\hat{w}}_{i j}$ in one MCMC run (more can be found in Supplementary Figures S4 to S6 [Luo and Zhao (2010)]), and the inferred networks from five MCMC runs, from each method. We use the same symbols as in simulation studies to indicate true or false inferences, where the “true” network is taken to be the network in Figure 3 of Sachs et al. (2005), which is the current understanding of this pathway.

Fig. 8. Inference results for the real data. From top to bottom: HM, RHM with v_ij = 0.1, and RHM with v_ij = 10. In networks, solid arrowed lines represent correctly inferred true edges, dashed thick lines with labels “u” represent edges whose directions cannot be determined from the simulations, dashed arrowed thin lines with labels “r” represent reversed edges, dotted arrowed thick lines represent missing edges, and dotted thin lines with labels “+” represent false positive edges.

Compared to HM, RHMs lead to fewer true associations with high values of ${\hat{w}}_{i j}$ (v_ij = 10) or smaller gaps of ${\hat{w}}_{i j}$ between most true and false associations (v_ij = 0.1). Taking the threshold u₁ = 0.2 and requiring u_f ≥ 0.6 in five runs of HM, we get 21 associations, with 5 missing edges and 6 false positives. Requiring $u_{1}^{'} = 0.11$ and u_f ≥ 0.6 in RHM with v_ij = 0.1, we get 19 associations, with 5 missing edges and 4 false ones. The threshold 0.11 exceeds the value (0.1) from the permutation study by only a small amount, implying that RHM offers weaker support to true associations than HM. Setting $u_{1}^{'} = 0.994$ and u_f ≥ 0.6 in RHM with v_ij = 10, we only get 14 associations, with 8 missing and 2 false associations.

From RHM, we can only correctly infer the directions of five or six edges. The causal relations for most inferred associations cannot be determined. But HM leads to a better result: 9 true directed edges, 1 direction-undetermined, 4 reversed, 6 missed, and 4 false edges, under the thresholds u₁ = 0.2, u₂ = 0.1, u₃ = 0.3, and u_f ≥ 0.8 (Figure 8 and Table 3). This inferred network is comparable with that from Ellis and Wong (2008), which contains 9 true directed edges, 3 reversed, 8 missed, and 6 false edges. The Hamming distances of these two networks to Figure 1 are 15 and 17, respectively. These results are summarized in Table 3.

Table 3.

Summary of the inferred networks applying different methods to the real data

	True	Undetermined	Reversed	Missing	False	Hamming distance
HM	9	1	4	6	4	15
RHM v_ij = 0.1	6	8	1	5	4	18
RHM v_ij = 10	5	5	2	8	2	17
mHM	6	5	1	8	4	18
BN	9	0	3	8	6	17

Here mHM denotes the modified model described in Supplementary Material S1 [Luo and Zhao (2010)] which models the varying variances of intrinsic noises.

In MCMC analysis, we take β_i = γ_j = 1 for i = 1, 2 and j = 1, …, 6. To check the sensitivity of HM, we also consider other values: γ_i = 0.1 or 100, and β_j = 0.1 or 0.0001. Taking β_j = 0.1, we get a network (not shown) with 9 true directed edges, 1 direction-undetermined, 4 reversed, 6 missed, and 6 false associations. Other values of the hyperparameters result in 1 to 3 fewer true associations, and at least 3 fewer true directed edges. All these results are based on 5,500,000 iterations of MCMC updates in each run, which take about 20 hours on a node with an Intel(R) Xeon(R) 3 GHz CPU and 16 GB of memory.

5. Discussion

We have proposed hierarchical statistical methods to infer a signaling pathway from single cell data collected from a set of perturbation experiments. The advantage of this method is that it provides a more explicit framework to relate the activity levels of different proteins. In our models, we assume that the activity level of each protein is linearly associated with a small subset of other proteins under each condition. Using a Bayesian hierarchical structure, we model the existence of an association between two proteins both at the overall level and at the experimental level. The overall-level probabilities measure the strength of associations between any two proteins across all experiments. The experimental-level probabilities reflect the changes of associations between proteins under different conditions. Our inferential procedure consists of two steps. First we infer the existence of an association between any pair of proteins based on the overall-level probabilities. Then for those pairs of proteins inferred to be associated, we infer the directions of the causal relations based on the changes in the experimental-level probabilities. The basic rationale in our causal inference is that for two associated proteins, controlling the target molecule destroys the association, while perturbing the regulatory molecule does not.

We consider hierarchical models with (RHM) and without (HM) the restriction that w_ij = w_ji and $w_{i j}^{(k)} = w_{j i}^{(k)}$ for each k. For RHM, we have to specify the hyperparameter v_ij prior to MCMC analysis. We have considered the inference results when the value of v_ij is set at 0.1 and 10. Higher values of v_ij lead to higher ranges of the inferred overall-level and experimental-level probabilities, and smaller changes in experimental-level probabilities. In HM, the experimental-level probabilities can be integrated out, so the posterior inference of other parameters is independent of v_ij. Hence, the choice of associations, which is based on the overall-level probabilities, is independent of v_ij. We only need to specify v_ij in the causal inference. To better reflect the changes of the experimental-level probabilities, we suggest smaller values for v_ij, for example, v_ij = 0.1. Both HM and RHM perform well in simulation studies.

We need to choose thresholds to infer the causal network: u₁ for association inference and u₂ and u₃ for causal directional inference. Noting the jumps in the plots of ${\hat{w}}_{i j}$, we propose to choose the threshold u₁ where there are great differences in the sorted ${\hat{w}}_{i j}$. This is easily determined when variations in the data are well captured by the proposed hierarchical models (e.g., Figures 2 and 7). If there are no great differences in the sorted overall-level probabilities (e.g., Figure 8), one may decide the number of edges to be included and then choose the top ones. Threshold u₂, which is taken as 0.1 in our study, can be chosen based on the experimental-level probabilities of those unassociated pairs of proteins. Threshold u₃ is closely related to v_ij, which measures the variability in ${\hat{w}}_{i j}^{(k)}$ in that a smaller v_ij leads to greater variability in ${\hat{w}}_{i j}^{(k)}$ between the experiments when the target protein is and is not intervened. When v_ij = 0.1, a difference of 0.3 in ${\hat{w}}_{i j}^{(k)}$ is enough to show the effect of intervening the target protein.
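One simple way to locate such a jump is sketched below. It is a heuristic of our own for illustration, not a procedure prescribed by the model: the function name `threshold_at_largest_jump`, the choice of the midpoint of the largest gap, and the toy posterior means are all assumptions.

```python
import numpy as np

def threshold_at_largest_jump(w_hat):
    """Return the midpoint of the largest gap between consecutive sorted posterior means."""
    w_sorted = np.sort(np.asarray(w_hat, dtype=float))
    gaps = np.diff(w_sorted)           # differences between neighbours
    i = int(np.argmax(gaps))           # position of the largest jump
    return (w_sorted[i] + w_sorted[i + 1]) / 2

# Hypothetical overall-level posterior means for seven protein pairs.
w_hat = [0.02, 0.05, 0.08, 0.11, 0.74, 0.90, 0.97]
u1 = threshold_at_largest_jump(w_hat)
selected = [w for w in w_hat if w > u1]
print(u1, len(selected))
```

When the sorted values show no clear jump, as in the real data, this heuristic is not reliable, and fixing the number of edges in advance may be the more sensible choice.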

Compared to the nonhierarchical model, hierarchical models have at least two advantages. First, the hierarchical structure allows information borrowing across different experiments while allowing for differences among experiments, leading to a more clear-cut inference on whether two proteins are related. Second, this modeling framework allows us to infer causal relationships between proteins from the presence and absence of the association across different perturbation conditions. Overall, our proposed hierarchical modeling provides a general framework for inferring networks from high-throughput data.

There are several possible ways of extending this model. In Supplementary Material S1 [Luo and Zhao (2010)] we modify HM by incorporating varying variances of intrinsic noises under different experimental conditions. The modified model does not outperform HM in our simulation study. It is interesting to investigate when the varying variances of intrinsic noises are not ignorable and incorporating them improves the network inference. We also find in our simulation study that applying our methods to data where intrinsic noises are sampled from heavy tail distributions results in power loss in pathway inference. Therefore, there is a need to extend this hierarchical structure to model nonnormal data.

Supplementary Material

Supplementary Data

NIHMS323230-supplement-Supplementary_Data.pdf (146 KB, pdf)

Footnotes

Supported in part by National Science Foundation Grant DMS-07-14817, National Heart, Lung, and Blood Institute Contract N01 HV28186, National Institute of Drug Abuse Grant P30 DA018343, National Institute of General Medical Sciences Grant R01 GM59507, Yale University Biomedical High Performance Computing Center and NIH Grant RR19895, which funded the instrumentation.

Here a constant variance ${(σ_{i}^{I})}^{2}$ is assumed for the intrinsic noises of a particular protein. We can relax this assumption and allow varying variances ${(σ_{i k}^{I})}^{2}$ for intrinsic noises under different experimental conditions. This extended model and simulation results are described in Supplementary Material S1 [Luo and Zhao (2010)].

We permute the observations for each protein and then analyze the permuted data with HM. The obtained posterior means ${\hat{w}}_{i j}$ are less than 0.1 for all (i, j) pairs.

Contributor Information

Ruiyan Luo, Email: ruiyan.luo@yale.edu.

Hongyu Zhao, Email: hongyu.zhao@yale.edu.

References

Dobra A, Hans C, Jones B, Nevins J, Yao G, West M. Sparse graphical models for exploring gene expression data. J Multivariate Anal. 2004;90:196–212.
Ellis B, Wong WH. Learning causal Bayesian network structures from experimental data. J Amer Statist Assoc. 2008;103:778–789.
Friedman N, Koller D. Being Bayesian about network structure. Machine Learning. 2003;50:95–126.
Heckerman D, Chickering DM, Meek C, Rounthwaite R, Kadie C. Dependency networks for inference, collaborative filtering, and data visualization. J Mach Learn Res. 2000;1:49–75.
Herzenberg LA, Parks D, Sahaf B, Perez O, Roederer M, Herzenberg LA. The history and future of the fluorescence activated cell sorter and flow cytometry: A view from Stanford. Clinical Chemistry. 2002;48:1819–1827.
Lauritzen SL. Graphical Models. Clarendon Press; Oxford: 1996.
Liu Y, Ringnér M. Revealing signaling pathway deregulation by using gene expression signatures and regulatory motif analysis. Genome Biology. 2007;8:R77.1–R77.10. doi: 10.1186/gb-2007-8-5-r77.
Luo R, Zhao H. Supplementary material for “Bayesian hierarchical modeling for signaling pathway inference from single cell interventional data.” 2010. doi: 10.1214/10-AOAS425SUPP.
Pe’er D. Bayesian network analysis of signaling networks: A primer. Science’s STKE. 2005;281:1–12. doi: 10.1126/stke.2812005pl4.
Pe’er D, Regev A, Elidan G, Friedman N. Inferring subnetworks from perturbed expression profiles. Bioinformatics. 2001;17(Suppl):S215–S224. doi: 10.1093/bioinformatics/17.suppl_1.s215.
Perez OD, Nolan G. Simultaneous measurement of multiple active kinase states using polychromatic flow cytometry. Nature Biotechnology. 2002;20:155–162. doi: 10.1038/nbt0202-155.
Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP. Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005;308:523–529. doi: 10.1126/science.1105809.
Schäfer J, Strimmer K. An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics. 2005;21:754–764. doi: 10.1093/bioinformatics/bti062.
Wei W, Li H. A Markov random field model for network-based analysis of genomic data. Bioinformatics. 2007;23:1537–1544. doi: 10.1093/bioinformatics/btm129.
Wei W, Li H. A hidden spatial–temporal Markov random field model for network-based analysis of time course gene expression data. Ann Appl Statist. 2008;2:408–429.
Werhli AV, Grzegorczyk M, Husmeier D. Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics. 2006;22:2523–2531. doi: 10.1093/bioinformatics/btl391.
