Abstract
Networks effectively capture interactions among components of complex systems, and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks undergo changes over time, and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of complex diseases. They have also been used to gain insight into mechanisms of disease initiation and progression. Primarily motivated by biological applications, this article provides a review of recent statistical machine learning methods for inferring networks and identifying changes in their structures.
INTRODUCTION
Networks are ubiquitous in many scientific disciplines. They are widely used to capture interactions among components of complex systems, and to glean insight into how these interactions shape the system’s behavior. The latter is often achieved by comparing networks over time and/or in different states, a task referred to as differential network analysis (Ideker and Krogan, 2012).
Differential network analysis has become particularly popular in biological studies, where growing evidence suggests that interactions among components of biological systems can vastly change over (evolutionary) time (Borneman et al., 2007; Schmidt et al., 2010), when the system responds to external stimuli (Bar-Yam and Epstein, 2004; Luscombe et al., 2004), or in disease conditions (Hussain and Harris, 2006; Goh et al., 2007). For instance, changes in gene, protein and metabolite networks have been found to be associated with the onset and progression of various diseases (Zhong et al., 2009; Zhang et al., 2016; West et al., 2012; Ma et al., 2019a). Similarly, changes in brain connectivity networks have been successfully used as predictive biomarkers for neurodegenerative diseases (Chuang et al., 2007; Taylor et al., 2009).
Let G = (V,E) be a network with nodes V = {1, 2,…,m} and edge set E ⊆ V × V. Changes in G can be due to changes in its nodes, V, its edges, E, or both. Changes in the node set are common in social and communication networks, where both V and E can change as the network grows over time. In these settings, network edges—e.g., social interactions or internet connections—are directly observed and the primary goal is to understand the mechanisms of network growth (Durrett, 2007). In contrast, in this paper we focus on the setting where the node set V is fixed and the goal is to identify changes in network edges, E. Identifying such changes is of primary interest in the study of biological systems, where network nodes—e.g., genes or brain regions—can be measured, but network edges are often not directly observed. In fact, despite recent progress in developing assays for identifying interactions among genes and proteins (Stelzl et al., 2005; Krogan et al., 2006; Tarassov et al., 2008), and changes in interactions in different biological conditions (Barrios-Rodiles et al., 2005), interactions in biological systems and changes in those interactions are commonly inferred from measurements on the nodes. Primarily motivated by the challenges in biological applications, this paper reviews statistical methods for identifying changes in the edge set, E, inferred from n observations on each node j ∈ V. To this end, we first briefly review probabilistic graphical models (Lauritzen, 1996), which are the primary building blocks for inferring network edges. We then review statistical methods for differential network analysis.
Throughout the paper, random variables are denoted by capital letters (e.g., X and Xj), scalar parameters are denoted by lowercase Greek letters (e.g., θ) and parameter vectors/matrices are denoted by uppercase Greek letters (e.g., Θ). Matrices of observations are denoted by calligraphic letters (e.g., 𝒳) and single observations are denoted by the corresponding lowercase letters (e.g., xij).
Background: Learning Network Structures
Probabilistic graphical models are widely used to summarize dependency relationships among random variables (Lauritzen, 1996), and to learn such dependencies from observations on the variables (Drton and Maathuis, 2017). For a graph G = (V,E), the set of nodes V = {1,…,m} is associated with random variables X1,…,Xm, and the edge set E captures dependency relationships among the variables. The edges in E can be directed or undirected.
Directed graphical models are often used to capture causal relationships among random variables, with a directed edge j → k representing a direct causal effect of Xj on Xk. The special case of directed acyclic graphs (DAGs)—where there are no directed cycles in G—corresponds to well-known Bayesian networks (Pearl, 2009), which have found many applications in biological (Markowetz and Spang, 2007) and social (Babin and Svensson, 2012) sciences, as well as machine learning (Koller and Friedman, 2009).
As expected, learning directed causal graphs from observational data is challenging and often impossible, or only possible under (uncheckable) identifiability assumptions (Peters and Bühlmann, 2013). This is because multiple DAGs may have the same likelihood and may thus be indistinguishable from data. Instead, the completed partially directed acyclic graph (CPDAG) representing the class of Markov equivalent DAGs is often estimated from observational data. Despite recent progress (Shojaie and Michailidis, 2010; Wang et al., 2018; Ghoshal and Honorio, 2019; Manzour et al., 2019), existing methods for differential analysis of directed networks are in their infancy. As such, this review primarily focuses on differential analysis of undirected networks; references to recent work on differential analysis of directed networks are given in the Further Readings section.
Methods for learning the structure of undirected networks can be broadly categorized into methods based on (i) marginal and (ii) conditional associations among variables, X1,…,Xm. These two classes of methods are reviewed in the remainder of this section.
Learning Networks from Marginal Associations
Marginal inference procedures declare an (undirected) edge between two variables Xj and Xk if and only if they are dependent on each other. In practice, the dependence is often characterized by a marginal association measure, ρ(Xj,Xk). In that case, two nodes j and k are connected in G, i.e., j − k ∈ E, if and only if ρ(Xj,Xk) ≠ 0.
In the simplest case, the marginal association network is defined based on the (Pearson) correlation between Xj and Xk. In practice, this simple approach, which is widely used in biological settings (Junker and Schreiber, 2008), amounts to calculating the sample correlation coefficient between each pair of variables, Xj and Xk, or, equivalently, the (j, k) entry of the empirical correlation matrix of X1,…,Xm, denoted S. Learning the network structure then corresponds to selecting a subset of off-diagonal entries of S. This can be achieved by testing the hypothesis of no correlation, H0 : ρ(Xj,Xk) = 0, for each pair of variables using, e.g., Fisher’s transformation of sample correlations (Fisher, 1921). As an alternative, the network structure can be learned by identifying the set of correlations that are larger in magnitude than a pre-specified threshold κ. The threshold κ plays the role of a tuning parameter, and can be selected to achieve a certain level of sparsity in the network (Wang et al., 2006), or to obtain a network that satisfies a certain degree distribution (Langfelder and Horvath, 2008).
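The two selection rules described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `fisher_z_pvalue` and `correlation_network`, the toy correlation matrix, and the sample size n = 50 are assumptions made for the example. The test relies on the usual normal approximation for Fisher's z and applies no multiple-testing adjustment.

```python
import math
from statistics import NormalDist


def fisher_z_pvalue(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 based on Fisher's z-transformation.

    Under H0 (and approximate bivariate normality), sqrt(n - 3) * atanh(r)
    is approximately standard normal.
    """
    z = math.sqrt(n - 3) * math.atanh(r)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def correlation_network(S, n, alpha=None, kappa=None):
    """Select edges (j, k), j < k, from a sample correlation matrix S.

    Exactly one of `alpha` (level of each pairwise test) or `kappa`
    (threshold on the magnitude of the correlation) must be given.
    """
    if (alpha is None) == (kappa is None):
        raise ValueError("specify exactly one of alpha or kappa")
    m = len(S)
    edges = set()
    for j in range(m):
        for k in range(j + 1, m):
            if kappa is not None:
                keep = abs(S[j][k]) > kappa
            else:
                keep = fisher_z_pvalue(S[j][k], n) < alpha
            if keep:
                edges.add((j, k))
    return edges


# Hypothetical sample correlation matrix for m = 3 variables.
S = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.05],
     [0.1, 0.05, 1.0]]
print(correlation_network(S, n=50, kappa=0.5))
print(correlation_network(S, n=50, alpha=0.05))
```

In this toy example both rules retain only the edge between the first two variables; in general the two rules need not agree, since the test-based rule depends on n while the threshold-based rule does not.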
While simple, the Pearson correlation in the above procedure only captures linear dependencies. This is appropriate if X1,…,Xm are jointly normally distributed. However, multivariate normality (or presence of linear dependencies (Khatri and Rao, 1976)) is a stringent assumption that may not hold in practice. As an alternative, rank-based correlation measures, such as Spearman correlation or Kendall’s τ, or nonparametric measures of marginal association, such as mutual information (Margolin et al., 2006) or kernel-based measures of dependence (Yamanishi et al., 2004), can be used to test whether each pair of variables, Xj and Xk, are independent. The network structure can then be learned from the p-values for testing independence among variables obtained from each of these approaches, or by applying a pre-specified threshold.
Regardless of the choice of association measure, the above network learning procedures have another limitation: marginal measures of associations cannot distinguish between direct and indirect relationships. As a simple example, consider three normally distributed variables X1,X2 and X3. Suppose the true network G consists of two edges, 1 − 2 and 1 − 3. Further, suppose the true correlation between X1 and both X2 and X3 is 0.8. In other words, ρ(X1,X2) = ρ(X1,X3) = 0.8. But this implies that ρ(X2,X3) = 0.64! Thus, with enough observations, the network learned from the (correctly specified) marginal association measure would incorrectly include the edge 2 − 3.
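The value 0.64 follows from a short calculation. Since the edge 2 − 3 is absent, X2 and X3 are conditionally independent given X1, which for Gaussian variables is equivalent to a zero partial correlation. Using the standard formula for the partial correlation of two variables given a third (stated here as a reminder, not taken from the text above):

```latex
\rho(X_2, X_3 \mid X_1)
  = \frac{\rho(X_2,X_3) - \rho(X_1,X_2)\,\rho(X_1,X_3)}
         {\sqrt{\bigl(1-\rho(X_1,X_2)^2\bigr)\bigl(1-\rho(X_1,X_3)^2\bigr)}} = 0
\quad\Longrightarrow\quad
\rho(X_2,X_3) = \rho(X_1,X_2)\,\rho(X_1,X_3) = 0.8 \times 0.8 = 0.64 .
```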
Despite its simplicity, the above example illustrates a major limitation of network inference based on marginal associations. Unfortunately, the same issue also arises with other distributions and other measures of associations. Network learning procedures based on conditional measures of associations, discussed next, try to address this limitation.
Learning Networks from Conditional Associations
Undirected graphical models, also known as Markov random fields (MRF), represent conditional dependence relationships between a set of random variables. For random variables {X1,X2,…,Xm}, an MRF is associated with an undirected graph G = (V,E) with vertex set V = {1,2,…,m} and undirected edges E ⊆ V × V, such that the absence of an edge between nodes j and k indicates that Xj and Xk are conditionally independent given all other variables, i.e., X\{j,k} (Lauritzen, 1996). In the smallest such graph G, known as the conditional independence graph, there is an edge between j and k, i.e. j − k ∈ E, if and only if Xj and Xk are conditionally dependent given all other variables (Lauritzen, 1996).
Given n observations from each random variable Xj; j ∈ V, learning the conditional independence graph (CIG) corresponds to identifying pairs of random variables that are independent given all other variables. While learning networks from conditional associations, and in particular the CIG, is more challenging than learning based on marginal associations, edges in a CIG capture unconfounded associations among variables and may thus be more scientifically meaningful. For instance, in the simple example of the previous section, the partial correlation between X2 and X3 after adjusting for X1 is indeed zero. Thus, the CIG correctly captures the association among variables.
When m = |V| is small compared to n, the CIG can be learned nonparametrically, using, e.g., nonparametric procedures for testing conditional independences, such as conditional mutual information (Margolin et al., 2006), or kernel-based procedures (Yamanishi et al., 2004). However, nonparametric procedures become computationally challenging, if not prohibitive, when m is large. Moreover, it is not straightforward to extend such nonparametric procedures to high-dimensional settings, i.e., when m > n. In contrast, characterizing conditional independence is often easier if the family of probability distributions corresponding to G is represented by finite-dimensional parameters. Such parametric models can also be more easily extended to high-dimensional settings. Finally, existing procedures for differential network analysis mainly consider parametric graphical models. Therefore, the rest of this section is primarily focused on parametric models, and we only provide a brief review of semi- and non-parametric graphical modeling approaches.
Gaussian Graphical Models
The best-known and most widely studied example of probabilistic graphical models is the class of Gaussian graphical models (GGM), wherein {X1,X2,…,Xm} are jointly Gaussian. Formally, in a GGM, (X1,X2,…,Xm) ~ N(μ, Σ), where μ ∈ ℝ^m, Σ ∈ 𝕊^m_{++}, and 𝕊^m_{++} denotes the set of m × m symmetric positive definite matrices. In this case, any two variables Xj and Xk are conditionally independent, given all other variables X\{j,k}, if and only if the (j, k) entry of the inverse covariance, or precision, matrix, Ω = Σ⁻¹, is zero (Lauritzen, 1996). Formally,
$$X_j \perp\!\!\!\perp X_k \mid X_{\setminus\{j,k\}} \iff \Omega_{jk} = 0. \tag{1}$$
Equation (1) implies that in the Gaussian case, the CIG is fully characterized by the precision matrix, Ω. This characterization suggests the following simple estimation strategy: Let 𝒳 be the n × m data matrix corresponding to n i.i.d. observations for centered variables {X1,X2,…,Xm} (so that each variable has mean zero). Then, calculate the empirical covariance matrix, S = (n − 1)⁻¹𝒳⊤𝒳, and estimate the CIG based on nonzero entries of S⁻¹, by applying a threshold (similar to κ discussed earlier for marginal association networks) or using an inference procedure (Drton and Perlman, 2004).
While the above strategy is straightforward, the inverse of the empirical covariance matrix, S⁻¹, may not be well-conditioned even when n > m (Dempster, 1972). Moreover, the inverse does not even exist in the high-dimensional setting, where m > n. An alternative strategy is to directly calculate the partial correlations among pairs of variables, which are well-known measures of conditional independence for Gaussian random variables. The partial correlation between Xj and Xk can be computed by first regressing each of them on the other variables, X\{j,k}, and then calculating the correlation between the residuals from these two regressions (Hair et al., 1998). Partial correlations between Xj and other variables can also be more directly obtained by regressing Xj on all other variables. More specifically, suppose the variables are centered and scaled, and consider m linear regressions
$$X_j = \sum_{k \neq j} \beta_{jk} X_k + \varepsilon_j, \qquad j = 1, \ldots, m. \tag{2}$$
Then, βjk is proportional to the partial correlation between Xj and Xk given X\{j,k}. Specifically, βjk = −Ωjk/Ωjj (Meinshausen and Bühlmann, 2006), whereas the partial correlation equals −Ωjk/√(ΩjjΩkk). Thus, βjk is zero if and only if Ωjk is zero, so conditional independence relationships inferred from Ωjk and βjk coincide, leading to (asymptotically) equivalent estimates of the CIG.
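The relation between regression coefficients and the precision matrix can be checked numerically at the population level. The sketch below reuses the three-variable example from the previous section (edges 1 − 2 and 1 − 3, correlations 0.8 and hence ρ(X2,X3) = 0.64); the helper names `inverse` and `regression_coefficients` are hypothetical, and the small Gauss-Jordan routine is included only to keep the example free of external dependencies.

```python
def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def regression_coefficients(Sigma, j):
    """Population coefficients from regressing X_j on all other variables."""
    others = [k for k in range(len(Sigma)) if k != j]
    A = [[Sigma[a][b] for b in others] for a in others]  # covariance of predictors
    c = [Sigma[a][j] for a in others]                    # covariance with X_j
    A_inv = inverse(A)
    beta = [sum(A_inv[r][s] * c[s] for s in range(len(c))) for r in range(len(c))]
    return dict(zip(others, beta))


# Correlation matrix of the three-variable example (0-based indices).
Sigma = [[1.0, 0.8, 0.8],
         [0.8, 1.0, 0.64],
         [0.8, 0.64, 1.0]]
Omega = inverse(Sigma)

for j in range(3):
    for k, b in regression_coefficients(Sigma, j).items():
        from_precision = -Omega[j][k] / Omega[j][j]
        partial_corr = -Omega[j][k] / (Omega[j][j] * Omega[k][k]) ** 0.5
        print(f"j={j + 1}, k={k + 1}: beta={b:.4f}, "
              f"-Omega_jk/Omega_jj={from_precision:.4f}, partial corr={partial_corr:.4f}")
```

The printed coefficients match −Ωjk/Ωjj for every pair, and both the coefficient and the precision entry for the pair (2, 3) vanish up to rounding error, in line with the absent edge 2 − 3.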
A potential drawback of regression-based estimation of the CIG in (2) is that, given a fixed sample size n, estimated conditional independences between Xj and Xk based on βjk and βkj may not coincide. Nonetheless, this regression-based strategy can be easily generalized to high-dimensional settings, by, e.g., utilizing a sparsity-inducing penalty such as the lasso (Tibshirani, 1996). This approach, known as neighborhood selection, was first considered in the seminal work of Meinshausen and Bühlmann (2006), who also established the consistency of the estimated CIG in high-dimensional sparse settings. In this approach, the ‘neighborhood’ of each node j ∈ V is defined as the set of variables with nonzero coefficients in m penalized regressions of the form
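$$\min_{\beta_{jk},\, k \neq j} \; \frac{1}{2n} \Bigl\| \mathcal{X}_j - \sum_{k \neq j} \beta_{jk} \mathcal{X}_k \Bigr\|_2^2 + \lambda \sum_{k \neq j} |\beta_{jk}|, \qquad j = 1, \ldots, m. \tag{3}$$
Here, 𝒳j denotes the jth column of 𝒳, and the tuning parameter λ controls the sparsity of the estimated neighborhoods, defined for each node j as the set {k ≠ j : β̂jk ≠ 0}. To mitigate the potential discrepancy between the neighborhoods (e.g., those estimated based on β̂jk and β̂kj), the authors then propose constructing the CIG based on either the intersection or the union of the estimated neighborhoods.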
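Neighborhood selection can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes standardized variables and the loss scaling (2n)⁻¹‖·‖² + λ‖β‖1, in which case each lasso problem depends on the data only through the correlation matrix S and can be solved by plain coordinate descent. The function names, the value λ = 0.05, and the use of the population correlation matrix of the earlier three-variable example in place of an empirical one are all choices made for the example.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_neighborhood(S, j, lam, n_iter=200, tol=1e-10):
    """Estimated neighborhood of node j from a lasso regression of X_j on the rest.

    Runs coordinate descent on 0.5 * b' S_oo b - b' S_oj + lam * ||b||_1,
    where `o` indexes all variables other than j.
    """
    others = [k for k in range(len(S)) if k != j]
    beta = {k: 0.0 for k in others}
    for _ in range(n_iter):
        max_change = 0.0
        for k in others:
            # Covariance of X_k with the partial residual that excludes X_k.
            r = S[k][j] - sum(S[k][l] * beta[l] for l in others if l != k)
            new = soft_threshold(r, lam) / S[k][k]
            max_change = max(max_change, abs(new - beta[k]))
            beta[k] = new
        if max_change < tol:
            break
    return {k for k in others if beta[k] != 0.0}


def neighborhood_selection(S, lam, rule="and"):
    """Combine node-wise neighborhoods by intersection ("and") or union ("or")."""
    m = len(S)
    ne = [lasso_neighborhood(S, j, lam) for j in range(m)]
    edges = set()
    for j in range(m):
        for k in range(j + 1, m):
            in_j, in_k = k in ne[j], j in ne[k]
            if (in_j and in_k) if rule == "and" else (in_j or in_k):
                edges.add((j, k))
    return edges


# Three-variable example: true edges 1-2 and 1-3 (0-based: (0, 1) and (0, 2)).
S = [[1.0, 0.8, 0.8],
     [0.8, 1.0, 0.64],
     [0.8, 0.64, 1.0]]
print(neighborhood_selection(S, lam=0.05, rule="and"))
print(neighborhood_selection(S, lam=0.05, rule="or"))
```

Unlike thresholding the marginal correlations, which would retain the spurious edge 2 − 3 (correlation 0.64), both combination rules recover only the two true edges here.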
Sparsity-inducing penalties can also be used to directly estimate the precision matrix, Ω. In this approach, first considered by Yuan and Lin (2007) and Banerjee et al. (2008) and popularized by the efficient graphical lasso algorithm (Friedman et al., 2008), Ω is estimated by minimizing the ℓ1-penalized negative log-likelihood
$$\operatorname{trace}(S\Omega) - \operatorname{logdet}(\Omega) + \lambda \|\Omega\|_1, \tag{4}$$
where, as before, S is the empirical covariance matrix, and for a square matrix M, trace(M) and logdet(M) denote the sum of its diagonal entries and the logarithm of its determinant, respectively. In graphical modeling applications, the ℓ1 penalty ‖Ω‖1 = ∑j,k |Ωjk| is often replaced by the sum of absolute values of the off-diagonal entries of Ω, ‖Ω‖1,off = ∑j≠k |Ωjk|.
Since their introduction, various authors have considered other penalties for both neighborhood selection and penalized likelihood estimation approaches, and have also investigated asymptotic properties of these estimators (Rothman et al., 2008). A number of other approaches have also been proposed, including symmetric estimation of partial correlations (Peng et al., 2009; Khare et al., 2015) as well as Bayesian estimation strategies (Wang et al., 2012). More comprehensive reviews of the relevant papers can be found in the recent book on estimation of covariance matrices (Pourahmadi, 2013) and the review paper on structure learning in graphical models (Drton and Maathuis, 2017).
Graphical Models for Other Probability Distributions
A key reason for the popularity of GGMs and the extensive recent work in this area is the convenient characterization of conditional independence relations for Gaussian random variables by the inverse covariance, or precision, matrix. However, joint normality is a stringent assumption that may not be satisfied in many real data applications (Voorman et al., 2013). In particular, GGMs are not appropriate when the observations are discrete (e.g., binary or Poisson), have heavy-tail distributions (e.g., exponential), or their support is a subset of the real line (e.g., non-negative).
The main challenge in estimating CIGs for other distributions is that unlike in the Gaussian case, conditional independence relations between pairs of variables are not necessarily characterized by a single parameter. Instead, conditional independence relations are more generally characterized by the Hammersley-Clifford Theorem (Besag, 1975), which states that a probability distribution P with a strictly positive density defines a Markov random field (MRF) over a graph G if and only if its density, f, can be factorized over complete subgraphs, or cliques, of G. While elegant and general, this characterization does not necessarily lead to tractable algorithms for estimating CIGs given observations from {X1,…,Xm}. That is because one would need to search over all possible subsets of the variables to find the cliques that define the MRF.
In the special case of GGMs, the Hammersley-Clifford Theorem is considerably simplified: In this case, it suffices to only consider pairwise interactions among variables, which are efficiently learned from the precision matrix of {X1,…,Xm}. Motivated by this property, graphical models for other distributions have also been defined based on pairwise interactions among variables. Denoting by fj(Xj) and fjk(Xj,Xk) the node and edge potentials, respectively, the density f(x) for such a pairwise MRF is proportional to
$$\exp\Bigl\{ \sum_{j \in V} f_j(x_j) + \sum_{(j,k) \in E} f_{jk}(x_j, x_k) \Bigr\}. \tag{5}$$
Importantly, (5) implies that fjk = 0 for j − k ∉ E. Thus, the CIG can be estimated by identifying nonzero edge potentials. This characterization can be further simplified by parametrizing the edge potentials by, e.g., assuming
$$f_{jk}(x_j, x_k) = \theta_{jk}\, x_j x_k \tag{6}$$
for parameters θjk ∈ ℝ. Let Θ be the m × m matrix with zero diagonal entries and off-diagonal entries equal to θjk. Then, similar to GGMs, conditional independence relations for this family can be simply learned from the entries of Θ: j − k ∉ E if and only if θjk = θkj = 0 (Wainwright et al., 2008).
With the parametrization in (6), a key remaining challenge in estimating CIGs for exponential families is computing the normalizing constant to ensure that the distribution specified in (5) is well defined. To overcome this challenge, Yang et al. (2012) consider the case where conditional distributions for each node, given all other nodes, are generalized linear models (GLMs). More specifically, setting fj(Xj) = θjXj, they consider conditionally-specified graphical models, where node-conditional distributions are GLMs proportional to
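$$\exp\Bigl\{ g(X_j) + \theta_j X_j + \sum_{k \in \mathrm{ne}(j)} \theta_{jk}\, X_j X_k \Bigr\}, \tag{7}$$
where ne(j) = {k : k − j ∈ E} is the neighborhood of j in G, and g(·) is a function that specifies different GLM distributions.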
Yang et al. (2012) show that the conditionally-specified model (7) leads to a unique joint probability distribution of the form
$$f(x) = \exp\Bigl\{ \sum_{j \in V} \bigl[ \theta_j x_j + g(x_j) \bigr] + \sum_{(j,k) \in E} \theta_{jk}\, x_j x_k - h(\Theta) \Bigr\},$$
where h(Θ) is the (log-)normalizing constant. Various GLM distributions are then obtained by considering different functions g(·). For instance, g(x) = −x²/2 corresponds to the Gaussian distribution, while g(x) = 0 corresponds to the Bernoulli distribution (Ravikumar et al., 2010). Chen et al. (2014), Yang et al. (2014), and Cheng et al. (2017) further extend this approach to estimate CIGs from mixed data, where node-conditional distributions are specified by multiple GLM distributions, for instance, binary, Poisson and Gaussian.
A key advantage of the conditionally-specified model (7) is that it allows bypassing the computation of the normalizing constant, and facilitates computationally efficient estimation of CIGs for a broad class of distributions. In fact, for GLMs, estimating the pairwise MRF amounts to solving m GLM regressions: m logistic regressions for binary data (similar to Ravikumar et al. (2010)), and m Poisson regressions for Poisson variables (similar to Yang et al. (2013) and Allen and Liu (2013)). High-dimensional pairwise MRFs for these and other distributions can then be estimated by augmenting the conditional negative log-likelihoods corresponding to (7) with a sparsity-inducing penalty on Θ, such as the lasso. This approach is thus a natural extension of the neighborhood selection estimator of Meinshausen and Bühlmann (2006) to other distributions in the exponential family.
While computationally convenient, conditionally-specified models are not guaranteed to result in a symmetric network estimate (as discussed in the case of GGMs). To circumvent this shortcoming, a few authors have proposed estimation strategies similar to conditionally-specified models that result in symmetric network estimates (see, e.g., Drton and Maathuis, 2017). An alternative strategy for bypassing the computation of the normalizing constant, which can be used to directly obtain symmetric network estimates, is the score matching approach of Lin et al. (2016). In this approach, the loss function is defined as the Fisher information distance between the gradients, with respect to observations x, of true and candidate log densities. Using integration by parts, Hyvärinen (2005) showed that under mild conditions, the empirical loss for a candidate density f can be written as the average, over n observations, of
$$\Delta_x \log f(x) + \tfrac{1}{2}\, \bigl\| \nabla_x \log f(x) \bigr\|_2^2,$$
where ∇x and Δx denote the gradient and Laplace operators with respect to x.
Lin et al. (2016) equipped the score matching loss with an ℓ1 penalty to obtain estimates of high-dimensional graphical models for distributions in the exponential family with absolutely continuous densities. Using the generalized score matching loss of Hyvärinen (2007), they also extended this approach to distributions with densities supported over a subset of ℝ^m. See Yu et al. (2018) for further generalizations of this approach.
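As a worked example of why score matching avoids the normalizing constant, consider a mean-zero Gaussian candidate density with precision matrix Ω. The derivation below is a sketch under that assumption; the notation S̃ = n⁻¹𝒳⊤𝒳 for the uncentered second-moment matrix is introduced here for convenience.

```latex
\begin{aligned}
\log f(x) &= -\tfrac{1}{2}\, x^\top \Omega x + \text{const}
  \;\;\Longrightarrow\;\;
  \nabla_x \log f(x) = -\Omega x, \qquad
  \Delta_x \log f(x) = -\operatorname{trace}(\Omega), \\[4pt]
\frac{1}{n} \sum_{i=1}^{n} \Bigl\{ \Delta_x \log f(x_i)
  + \tfrac{1}{2}\, \bigl\| \nabla_x \log f(x_i) \bigr\|_2^2 \Bigr\}
  &= \tfrac{1}{2} \operatorname{trace}\bigl( \Omega \tilde{S}\, \Omega \bigr)
   - \operatorname{trace}(\Omega),
  \qquad \tilde{S} = n^{-1} \mathcal{X}^\top \mathcal{X}.
\end{aligned}
```

The constant term, which contains the normalizing constant, disappears after differentiation, and the resulting loss is quadratic in Ω. When S̃ is invertible, setting the gradient with respect to Ω to zero gives Ω = S̃⁻¹; adding an ℓ1 penalty on Ω yields a sparse estimate that is symmetric by construction.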
Semi-parametric and Nonparametric Graphical Models
While computationally attractive and statistically efficient, parametric graphical models can lead to biased and incorrect CIG estimates if their underlying model does not hold. As an alternative to parametric models, a few authors have recently considered semi- and non-parametric estimation of graphical models. Early work in this area considered the Gaussian copula or nonparanormal distribution (Liu et al., 2009; Dobra et al., 2011); instead of assuming multivariate normality, the nonparanormal model posits that for some (unknown) monotone functions h1,…,hm the transformed variables h1(x1),…,hm(xm) have a multivariate normal distribution with mean zero and precision matrix Ω. While estimating the unknown functions hj, j = 1,…,m seems difficult at first glance, Liu et al. (2012) and Xue et al. (2012) show that this approach is equivalent to estimating the CIG by plugging a rank-based correlation matrix, such as Spearman correlation or Kendall’s τ, into the graphical lasso optimization problem (4).
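The rank-based plug-in matrix can be sketched as follows. The sine transformation used here, sin(π τ / 2), is the standard identity linking Kendall's τ to the Pearson correlation under a Gaussian copula; it is an assumption of this sketch in the sense that the text above does not spell out the transformation. The function names and toy data are hypothetical, ties are not handled, and the resulting matrix is not guaranteed to be positive semidefinite, so a projection step may be needed before using it in (4).

```python
import math


def kendall_tau(x, y):
    """Kendall's tau for two equal-length sequences without ties."""
    n = len(x)
    concordant_minus_discordant = 0
    for i in range(n):
        for k in range(i + 1, n):
            s = (x[i] - x[k]) * (y[i] - y[k])
            concordant_minus_discordant += (s > 0) - (s < 0)
    return concordant_minus_discordant / (n * (n - 1) / 2)


def rank_based_correlation(columns):
    """Correlation matrix estimate built from pairwise Kendall's tau.

    `columns` is a list of m sequences, one per variable.
    """
    m = len(columns)
    R = [[1.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(j + 1, m):
            tau = kendall_tau(columns[j], columns[k])
            R[j][k] = R[k][j] = math.sin(math.pi / 2 * tau)
    return R


# Toy data: the estimate is unchanged by monotone transformations of each variable.
x = [0.1, 0.5, 1.2, 2.0, 3.1]
z = [2.0, 1.0, 4.0, 3.0, 5.0]
R_raw = rank_based_correlation([x, z])
R_transformed = rank_based_correlation([[math.exp(v) for v in x], [v ** 3 for v in z]])
print(R_raw)
print(R_raw == R_transformed)
```

Because the estimate depends on the data only through ranks, the unknown monotone functions hj never need to be estimated.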
The nonparanormal graphical model can be efficiently estimated and provides a natural generalization of the graphical lasso estimator. However, Voorman et al. (2013) show that the nonparanormal model can be restrictive, and propose, as an alternative, conditionally-specified additive graphical models, by assuming
$$X_j = \sum_{k \neq j} f_{jk}(X_k) + \varepsilon_j,$$
where εj is a mean-zero noise variable. In this model, Xj ⫫ Xk given other variables if and only if fjk = fkj = 0. Thus, in high dimensions, the CIG can be estimated by fitting m penalized nonparametric regressions. Voorman et al. (2013) consider a basis expansion approach and use a joint standardized group lasso penalty (Simon and Tibshirani, 2012) to shrink both fjk and fkj to zero simultaneously in order to estimate the neighborhood of each node in G. Other related ideas include the graphical random forest estimator of Fellinghauer et al. (2013), the kernel-based estimator of Lee et al. (2016), as well as nonparametric approaches for exponential densities in Sun et al. (2015) and Suggala et al. (2017).
Statistical Methods for Differential Network Analysis
Before reviewing recent developments in statistical methods for differential network analysis, we discuss relevant hypotheses and measures of difference between networks. For simplicity, we restrict the discussion to comparing two networks, G1 and G2, with the same node set V and edge sets E1 and E2, or, equivalently, adjacency matrices A1 and A2. In general, E1 and E2 may have been directly observed, obtained from experiments, or learned from observations on the nodes via graphical modeling approaches. However, as mentioned in the Introduction, we focus primarily on networks inferred using graphical modeling methods. For instance, in the case of GGMs, As, s ∈ {1, 2}, may correspond to estimated partial correlation matrices obtained from the estimated precision matrices Ω̂s, s ∈ {1, 2}.
Various notions of difference between A1 and A2 can be considered. For instance, we may be interested in identifying global differences between A1 and A2, i.e., whether A1 = A2. However, similar to testing for equality of vectors of parameters, different norms or distance measures can be used to assess whether A1 and A2 are the same. For instance, one can examine the difference between weighted adjacency matrices, by examining the value of ‖A1 − A2‖ for some matrix norm. In the case of GGMs, this can be achieved by examining ‖Ω̂1 − Ω̂2‖. Alternatively, one can consider the structural Hamming distance (Diestel, 2012) between A1 and A2, which counts the total number of edge differences between the two networks. Compared to the norm-based approach, which takes the quantitative values of estimated parameters into account, this approach assesses qualitative differences between the two networks. Finally, the topology of the space of networks offers additional measures of differences between A1 and A2, including (potentially vector-valued) summary measures of the two networks, such as the size and/or number of clusters, the average connectivity, or the degree distribution; see Shojaie and Sedaghat (2017) for examples of such measures.
In many applications, local differences between the two networks, including differences in individual edges, neighborhoods or subnetworks, can also be of interest. This is especially the case in biological applications, where network-based biomarkers can be used to interrogate mechanisms of disease initiation and progression (Erler and Linding, 2010; Gomez-Ramirez and Wu, 2014; Liu, 2016). Identifying local differences between networks can also be of interest following an affirmative global test of difference between the two networks. As in the case of global differences, local differences between two networks can be assessed qualitatively or quantitatively. For instance, in the case of GGMs, one may be interested in identifying node pairs (j, k) such that Ω1,jk ≠ Ω2,jk, where Ωs,jk denotes the (j, k) entry of Ωs. Alternatively, instead of looking at quantitative differences between parameters, we may want to identify node pairs (j, k) such that j − k ∈ E1 but j − k ∉ E2. In the Gaussian case, such qualitative differences can be identified by comparing the zero/nonzero patterns of Ω1 and Ω2; for instance, by identifying node pairs (j, k) such that supp(Ω1,jk) ≠ supp(Ω2,jk), where supp(ω) = 1 if ω ≠ 0 and 0 otherwise.
Examples of quantitative and qualitative differences in networks are depicted in Figure 1. This simple example highlights different insights and conclusions based on different notions of network difference: the differential network based on values of partial correlations (A2 − A1, bottom-left) captures differences in signs and magnitudes of model parameters; the differential network based on supports of A2 and A1 (bottom-center) captures differences in edge structures; and the differential network based on differences in signs (bottom-right) captures both support and sign differences between the two networks. The choice of the appropriate notion of difference depends on the application. In particular, as discussed in the remainder of this section, qualitative methods/tests may better capture differences in the structures of underlying networks, while quantitative methods could offer higher power for identifying differences in parameters of graphical models used to learn the networks.
Figure 1:
Illustration of different notions of difference in networks. Top: Hypothetical networks for two populations; here, networks correspond to two GGMs and adjacency matrices A1 and A2 correspond to (true) partial correlations among nodes. Bottom: Differential networks based on differences in values of adjacency matrices (left); differences in supports of the adjacency matrices (center); and differences in signs of adjacency matrices (right).
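The notions of difference discussed above can be made concrete with a small sketch. The two weighted adjacency matrices below are hypothetical (they are not the networks of Figure 1), the function names are illustrative, and the comparisons are exact comparisons of known parameter values; with estimated networks, each comparison would instead require a formal inference procedure.

```python
import math


def symmetric_matrix(m, weights):
    """Build an m x m symmetric weighted adjacency matrix from {(j, k): weight}."""
    A = [[0.0] * m for _ in range(m)]
    for (j, k), w in weights.items():
        A[j][k] = A[k][j] = w
    return A


def sign(v):
    return (v > 0) - (v < 0)


def frobenius_distance(A1, A2):
    """Global quantitative difference: Frobenius norm of A1 - A2."""
    return math.sqrt(sum((a - b) ** 2
                         for r1, r2 in zip(A1, A2) for a, b in zip(r1, r2)))


def differential_edges(A1, A2, kind):
    """Local differences: node pairs whose value, support, or sign differs."""
    compare = {
        "value": lambda a, b: a != b,
        "support": lambda a, b: (a != 0) != (b != 0),
        "sign": lambda a, b: sign(a) != sign(b),
    }[kind]
    m = len(A1)
    return {(j, k) for j in range(m) for k in range(j + 1, m)
            if compare(A1[j][k], A2[j][k])}


def structural_hamming_distance(A1, A2):
    """Global qualitative difference: number of edges present in only one network."""
    return len(differential_edges(A1, A2, "support"))


# Hypothetical partial correlations for two populations on four nodes.
A1 = symmetric_matrix(4, {(0, 1): 0.5, (1, 2): 0.3, (2, 3): -0.4})
A2 = symmetric_matrix(4, {(0, 1): 0.2, (1, 2): -0.3, (0, 3): 0.2})

print(frobenius_distance(A1, A2))
print(structural_hamming_distance(A1, A2))
for kind in ("value", "support", "sign"):
    print(kind, sorted(differential_edges(A1, A2, kind)))
```

In this example the edge between the first two nodes changes only in magnitude, so it appears in the value-based differential network but not in the support- or sign-based ones, whereas the edge whose sign flips is missed by the support-based comparison.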
In the following, we discuss existing statistical approaches that examine various notions of difference between two networks (global vs. local and qualitative vs. quantitative). Given the current state of the literature, we focus primarily on methods for Gaussian observations, and briefly review methods for other graphical models at the end.
Global Tests of Network Differences
Naturally, the global null hypothesis of no difference between two GGMs, i.e., H0 : E1 = E2, can be tested by examining whether correlation, or partial correlation, matrices in the two populations are different. Formally, two GGMs are the same if H0 : Σ1 = Σ2, or, equivalently, H0 : Ω1 = Ω2, holds. However, as mentioned earlier, these matrix-based hypotheses can be tested using different matrix norms and summaries. Regardless of the choice of norm/summary, a key challenge arises from high-dimensionality: When m ≫ n, classical estimates of Σs, s ∈ {1, 2}, may be too noisy for unbiased tests, and estimating Ωs, s ∈ {1, 2}, requires regularization methods that rely on sparsity.
Motivated by classical multivariate methods, early tests of difference between high-dimensional correlation matrices (Schott, 2007; Li and Chen, 2012) were based on the Frobenius norm of the difference, ‖Σ1 − Σ2‖F. These tests are sensitive to orchestrated weak changes in entries of the correlation matrices, but may have low power if only a few correlations are significantly different while the majority are similar. In contrast, methods based on maximum entries of matrices (Cai and Zhang, 2016; Chang et al., 2017) are sensitive to large differences between individual correlations, i.e., sparse but large differences. Other approaches have utilized eigen-structures (Srivastava and Yanagihara, 2010) and random matrix projections (Wu and Li, 2015). In a recent work, Zhu et al. (2017) proposed a test based on sparse leading eigenvectors that can detect both sparse and weak differences.
A potential advantage of the above methods for testing differences in covariance matrices is that they can also be applied to pre-specified subsets of nodes. More specifically, for a subset U ⊆ V of nodes, the above methods can test H0 : Σ1,U = Σ2,U, where Σs,U denotes the submatrix of Σs with rows and columns indexed by U. Such tests are particularly relevant in pathway enrichment analysis (Khatri et al., 2012), where U is the set of nodes corresponding to a biological pathway, and the goal is to determine whether the distributions of random variables Xj for j ∈ U are the same across two populations. Similar problems also arise in other applications, for instance, when interrogating composite brain regions (Tryputsen et al., 2015). Both nonparametric methods, such as the energy statistic (Székely and Rizzo, 2013), and permutation-based approaches (Subramanian et al., 2005; Tian et al., 2005) have been used to test for differences in distributions. However, more recent approaches have focused on accounting for the topology of the underlying networks (Khatri et al., 2012) by utilizing the full power of graphical models. For instance, assuming normality, the topologyGSA method (Massa et al., 2010) first tests for equality of covariance matrices, Σ1 = Σ2. Depending on the outcome of this test, pathway enrichment is determined by testing for differences in means, i.e., H0 : μ1 = μ2: if equality of covariances is not rejected, a multivariate analysis of variance (MANOVA) (Smith et al., 1962) is used, whereas the Behrens-Fisher method (Anderson, 2003) is used if covariances are found to be different. Similarly, DEGraph (Jacob et al., 2012) also starts with testing Σ1 = Σ2. If this hypothesis is rejected, then the pathway is declared to be enriched. If not, differences in means are tested using Hotelling’s T² statistic (Hotelling, 1931) with the pooled estimate of the covariance matrix.
The NetGSA framework (Shojaie and Michailidis, 2009, 2010; Ma et al., 2016) is also related, but takes a different perspective; it combines differences in mean and covariance matrices between the two populations by considering a latent variable model, and defines a contrast vector based on covariances. Aside from details of testing procedures, another key difference between NetGSA and other methods is that it uses the observations in each population to learn/update the estimated network in each condition, and thus accounts for differential connectivity in the two networks. See Ma et al. (2019b) for more discussions and a recent review of topology-based pathway enrichment methods.
Estimating Multiple GGMs and Their Differences
Biological systems are inherently robust (Kitano, 2004). Therefore, despite potential differences, networks in similar conditions or populations are expected to share many common edges. For instance, gene regulatory networks in different cancer subtypes, such as the ER+ and ER− subtypes of breast cancer in Figure 2, are expected to share many edges. It therefore makes sense to account for these common edges. This is particularly the case when estimating high-dimensional graphical models, where the small sample size, compared to the number of variables/features, is a key challenge. Recent graphical modeling approaches that try to account for common edges in networks in order to better delineate their differences can be broadly categorized into two classes: joint estimation of multiple graphical models and direct estimation of differences between graphical models.
Figure 2:
Differential network analysis in subtypes of breast cancer. The two networks show edges identified as significant in only one breast cancer subtype (Left: ER+; Right: ER−). They correspond to interactions among a subset of m = 358 cancer-related genes, and are inferred using gene expression measurements from the Cancer Genome Atlas (TCGA).
In joint estimation of multiple graphical models, the goal is to borrow information across populations/conditions in order to better estimate the networks in each condition. For instance, when estimating two GGMs, this can be achieved by encouraging the entries of the precision matrices to be similar to each other. More specifically, let $\omega^{(1)}_{jk}$ and $\omega^{(2)}_{jk}$ be the (j, k) entries of the precision matrices in the two populations. Then, joint estimation strategies encourage the estimates of $\omega^{(1)}_{jk}$ and $\omega^{(2)}_{jk}$ to be similar to each other. To achieve this goal, Guo et al. (2011) proposed to re-parametrize the entries of the precision matrices as the product of a common parameter (for both populations) and a population-specific parameter. Formally, for s ∈ {1, 2}, they let $\omega^{(s)}_{jk} = \Theta_{jk}\,\gamma^{(s)}_{jk}$, where to avoid sign ambiguity, Θjk is restricted to be nonnegative. The graphical models are then jointly estimated by replacing the ℓ1 penalty in the graphical lasso problem (4) with two penalties on Θjk and $\gamma^{(s)}_{jk}$:
$$\lambda_1 \sum_{j \neq k} \Theta_{jk} + \lambda_2 \sum_{j \neq k} \sum_{s=1}^{2} \left|\gamma^{(s)}_{jk}\right|.$$
The first penalty encourages sparsity in both precision matrix entries, $\omega^{(1)}_{jk}$ and $\omega^{(2)}_{jk}$, and hence improves the selection of common zero coefficients in the precision matrices. If Θjk ≠ 0, then the second penalty induces condition-specific sparsity in each of the precision matrices.
The proposal of Guo et al. (2011) leads to a non-convex optimization problem, which poses potential challenges in large-scale networks. As an alternative, Danaher et al. (2014) proposed to directly augment the graphical lasso problem (4) with a second penalty that encourages similarity among the precision matrix entries $\omega^{(s)}_{jk}$, s ∈ {1, 2}. In particular, they proposed two penalties: a group lasso penalty (Yuan and Lin, 2006), $\lambda_2 \sum_{j \neq k} \sqrt{(\omega^{(1)}_{jk})^2 + (\omega^{(2)}_{jk})^2}$, and a fused lasso penalty (Tibshirani et al., 2005), $\lambda_2 \sum_{j,k} |\omega^{(1)}_{jk} - \omega^{(2)}_{jk}|$. The group lasso penalty encourages similar sparsity patterns across the two populations, whereas the fused lasso penalty encourages the coefficients across the two populations to be equal to each other.
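To make the difference between the two penalties concrete, the following sketch evaluates unweighted versions of the group and fused lasso penalties (tuning parameters omitted) for a pair of toy precision matrices. The matrices and function names are hypothetical, and the code only evaluates the penalties; it does not solve the penalized estimation problem.

```python
import numpy as np

def offdiag(m):
    """Off-diagonal entries of a square matrix, as a flat array."""
    return m[~np.eye(m.shape[0], dtype=bool)]

def group_lasso_penalty(omega1, omega2):
    """Sum over off-diagonal (j, k) of the Euclidean norm of the entry pair."""
    return float(np.sum(np.sqrt(offdiag(omega1) ** 2 + offdiag(omega2) ** 2)))

def fused_lasso_penalty(omega1, omega2):
    """Sum over all (j, k) of the absolute difference between the entries."""
    return float(np.sum(np.abs(omega1 - omega2)))

# Two toy precision matrices that differ only in the (0, 2) edge.
omega_a = np.array([[2.0, 0.5, 0.0],
                    [0.5, 2.0, 0.0],
                    [0.0, 0.0, 2.0]])
omega_b = omega_a.copy()
omega_b[0, 2] = omega_b[2, 0] = 0.3

group_value = group_lasso_penalty(omega_a, omega_b)
fused_value = fused_lasso_penalty(omega_a, omega_b)
```

In this toy case the fused penalty is driven only by the single edge that differs, while the group penalty also charges for the shared (0, 1) edge, reflecting that it targets a common sparsity pattern rather than equal values.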
While effective for jointly learning two networks, the strategies described above may not work well for learning multiple (more than two) GGMs. This is because they inherently assume that the networks in multiple (sub)populations are equally similar to each other. Addressing this shortcoming is the primary focus of a number of recent papers, including Zhu et al. (2014); Peterson et al. (2015); Ma and Michailidis (2016); Saegusa and Shojaie (2016). To achieve this goal, Zhu et al. (2014) and Ma and Michailidis (2016) generalize the fused and group lasso penalties, respectively, to account for the known similarity structure among multiple networks. The methods by Peterson et al. (2015) and Saegusa and Shojaie (2016) focus instead on the setting where the similarity structure is unknown. In particular, Peterson et al. (2015) propose a Bayesian approach by using a Markov random field (MRF) prior to learn the precision matrices in a mixture of Gaussian distributions. To overcome the computational challenges of Bayesian estimation of GGMs, this approach assumes that network edges are formed independently. Saegusa and Shojaie (2016) instead propose to use a Laplacian shrinkage penalty (Huang et al., 2011) based on a similarity structure learned from data. More specifically, instead of a fused or group lasso penalty, the authors propose to penalize weighted differences between the precision matrix entries of pairs of (sub)populations, where the data-driven weights πs,s′ capture the similarity among (sub)populations s and s′. To justify this data-driven penalty, the authors establish the consistency of hierarchical clustering in high-dimensional settings and use the resulting clustering to define the similarity structure among (sub)populations. The idea of combining clustering and estimation of multiple graphical models was also considered by Hao et al. (2017), wherein clustering and graphical model estimation are combined into a single problem, which is solved using an Expectation Conditional Maximization (ECM) algorithm.
Methods for joint estimation of multiple graphical models provide valuable insight into commonalities and differences between networks in different populations. However, when the primary scientific focus is on differences between networks, learning their common structures may be unnecessary and inefficient. As an alternative, Zhao et al. (2014) proposed to directly estimate the difference of two GGMs. More specifically, the authors utilize the CLIME estimation framework (Cai et al., 2011) to estimate the sparse difference of two precision matrices, Δ = Ω2 − Ω1, subject to a constraint motivated by the observation that the true covariance and precision matrices must satisfy
$$\Sigma_1 \Delta \Sigma_2 - (\Sigma_1 - \Sigma_2) = 0,$$
where Σ1 and Σ2 denote the covariance matrices of the two populations.
The key advantage of this approach is that it only assumes that the difference of the precision matrices, Δ, is sparse, and not each of the precision matrices. However, solving the optimization problem for direct estimation of differences introduces additional challenges. To overcome these, the authors also propose an alternative formulation based on neighborhood selection. Yuan et al. (2017) have recently proposed a more computationally appealing alternative based on the D-trace loss (Zhang and Zou, 2014), which is a special case of the score matching loss (Lin et al., 2016) discussed earlier; see also Na et al. (2019) for a related approach to learn differences in networks with latent (hidden) nodes.
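The identity underlying direct estimation, namely that Δ = Ω2 − Ω1 implies Σ1 Δ Σ2 = Σ1 − Σ2, can be checked numerically. The sketch below uses hypothetical population-level matrices and a hypothetical helper name; in practice the covariance matrices would be replaced by sample estimates and the constraint relaxed to hold only approximately.

```python
import numpy as np

def difference_constraint_residual(sigma1, sigma2, delta):
    """Residual of the identity Sigma1 @ Delta @ Sigma2 = Sigma1 - Sigma2."""
    return sigma1 @ delta @ sigma2 - (sigma1 - sigma2)

# Two toy precision matrices that differ only in the (0, 2) edge.
omega1 = np.array([[2.0, 0.5, 0.0],
                   [0.5, 2.0, 0.4],
                   [0.0, 0.4, 2.0]])
omega2 = omega1.copy()
omega2[0, 2] = omega2[2, 0] = 0.3

sigma1 = np.linalg.inv(omega1)
sigma2 = np.linalg.inv(omega2)
delta = omega2 - omega1  # sparse: only two nonzero entries

residual = difference_constraint_residual(sigma1, sigma2, delta)
max_abs_residual = float(np.max(np.abs(residual)))  # zero up to rounding error
```

Here Δ has only two nonzero entries even though neither precision matrix is especially sparse, which is the situation in which estimating the difference directly is most attractive.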
Testing for Differences in Network Edges
Unlike global tests of network differences, methods for joint estimation of multiple graphical models and their differences do not provide measures of uncertainty, such as confidence intervals and p-values. Thus, although they provide powerful tools for exploratory analysis and hypothesis generation, the methods discussed in the previous section have limited utility in scientific applications. In contrast, recent hypothesis testing procedures for single precision matrices (Ren et al., 2015; Janková and van de Geer, 2015, 2017; Xia and Li, 2017) offer confidence intervals for entries of each precision matrix, Ωs, and/or p-values for the null hypothesis $H_0: \omega^{(s)}_{jk} = 0$ for j ≠ k, where $\omega^{(s)}_{jk}$ denotes the (j, k) entry of Ωs. Yu et al. (2019a) have further generalized this idea for inference in non-Gaussian graphical models using the framework of generalized score matching (Yu et al., 2019b).
Equipped with a multiple comparison adjustment procedure (e.g., Benjamini and Hochberg, 1995), the above inference methods can be used to (asymptotically) control the probability of falsely detecting nonexistent network edges in each (sub)population. However, these inference procedures are not guaranteed to control the probability of false positives when testing differences between networks. To see this, consider testing the difference between the (j, k) entries of two precision matrices, i.e., $\omega^{(1)}_{jk}$ and $\omega^{(2)}_{jk}$. Suppose we obtain confidence intervals for these parameters, using, e.g., the method of Janková and van de Geer (2015). These confidence intervals can be used to test the difference in support of the two networks with respect to the j − k edge, as illustrated in Figure 1. (The confidence intervals can also be used to test for differences in signs and values of the precision matrices, but, for simplicity, here we focus only on the support.) If both confidence intervals cover zero, or if neither covers zero, then we conclude, with high confidence, that the two networks are not differentially connected at this edge. However, things become complicated if one confidence interval covers zero and the other does not. In this case, the optimistic conclusion is that the difference in coverage of confidence intervals points to differential connectivity between the two networks. However, that is not necessarily the case! The fact that one of the confidence intervals covers zero may simply be due to the low power of the inference procedure, especially if the true parameter or the sample size is small. This simple example highlights the primary limitation of single network inference for inferring differential connectivity between networks.
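The pitfall can be made concrete with a small numerical sketch. The estimates and standard errors below are invented for illustration, and the helper functions are hypothetical; the sketch assumes independent, approximately normal estimators, so that Wald-type intervals and a two-sample z-test apply.

```python
from statistics import NormalDist

def wald_interval(estimate, se, level=0.95):
    """Normal-approximation confidence interval for a single parameter."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se

def covers_zero(interval):
    lower, upper = interval
    return lower <= 0.0 <= upper

def difference_pvalue(est1, se1, est2, se2):
    """Two-sided p-value for equality of two independent normal estimators."""
    z = (est1 - est2) / (se1 ** 2 + se2 ** 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same estimated edge weight in both populations, but the second population
# has a smaller sample size and hence a larger standard error.
ci_pop1 = wald_interval(0.30, 0.10)
ci_pop2 = wald_interval(0.30, 0.20)

edge_in_pop1 = not covers_zero(ci_pop1)  # interval excludes zero
edge_in_pop2 = not covers_zero(ci_pop2)  # interval covers zero
p_difference = difference_pvalue(0.30, 0.10, 0.30, 0.20)
```

Comparing the two intervals would suggest an edge present in the first population and absent in the second, yet the direct test of the difference provides no evidence that the two parameters differ.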
Inference procedures for detecting differences in two GGMs directly examine whether the entries in the two precision matrices are equal. For instance, Xia et al. (2015) test whether the (j, k) entries of the two precision matrices are equal, $\omega^{(1)}_{jk} = \omega^{(2)}_{jk}$, using the connection between the entries of the precision matrix and the regression coefficients obtained from neighborhood selection (Meinshausen and Bühlmann, 2006). Alternatively, He et al. (2019) test the same hypothesis directly based on estimates of precision matrices using graphical lasso (Friedman et al., 2008). As yet another alternative, Belilovsky et al. (2016) propose a test by directly estimating the difference between two vectors of regression coefficients using a multi-task fused lasso penalty. This approach offers an efficient framework for testing the difference between partial correlations, which may be of interest in some applications.
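The connection between precision matrix entries and neighborhood regression coefficients referred to above is the standard Gaussian identity that the coefficient of node k in the regression of node j on all other nodes equals minus the (j, k) precision entry divided by the (j, j) entry. A minimal population-level check, with a hypothetical precision matrix and helper name, is sketched below; it is not an implementation of any of the cited tests.

```python
import numpy as np

def neighborhood_coefficients(sigma, j):
    """Population regression coefficients of node j on all other nodes."""
    others = [k for k in range(sigma.shape[0]) if k != j]
    # Solve Sigma_{others, others} beta = Sigma_{others, j}.
    beta = np.linalg.solve(sigma[np.ix_(others, others)], sigma[others, j])
    return others, beta

# Toy precision matrix and the implied covariance matrix.
omega = np.array([[2.0, 0.5, 0.0],
                  [0.5, 2.0, 0.4],
                  [0.0, 0.4, 2.0]])
sigma = np.linalg.inv(omega)

node = 1
others, beta = neighborhood_coefficients(sigma, node)
# Coefficients implied by the precision matrix: -omega_jk / omega_jj.
beta_from_precision = -omega[node, others] / omega[node, node]
```

Because a zero regression coefficient corresponds to a zero precision entry, hypotheses about precision matrix entries can be recast as hypotheses about regression coefficients.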
As an alternative perspective to the above procedures, Zhao et al. (2019) have recently argued that quantitative tests for differential analysis of undirected networks, e.g., tests based on differences between entries of precision matrices (or partial correlations), may not be desirable. In making this argument, they first point out that while GGMs are used for network inference, differences in parameter values (e.g., differences in partial correlations) may not be scientifically meaningful. Rather, scientists are often interested in whether connectivity patterns are different. They also point out that because of their complex dependence patterns, GGM parameters corresponding to other edges may change if a few edges in the network are rewired. As a result, tests based on quantitative differences between GGM parameters could result in uncontrolled false positives if the goal is to identify differences in network structures. To circumvent these issues, Zhao et al. (2019) propose a new framework, termed differential connectivity analysis (DCA), for testing qualitative differences in patterns of connectivity between two GGMs. However, testing qualitative hypotheses is more challenging and DCA requires additional assumptions.
CONCLUSIONS
Differential network analysis is a promising new field with diverse biological applications (Sas et al., 2018; Gambardella et al., 2013; Ma et al., 2014; Cabusora et al., 2005; Troy et al., 2016). Given that networks are often not directly observed in biological settings, statistical methods for identifying differences between networks will continue to be essential tools in this area. With few exceptions (see Further Readings), existing statistical and computational approaches have thus far primarily focused on undirected Gaussian graphical models (GGMs). Differential network analysis for non-Gaussian data and directed networks offers fruitful opportunities for future research. Addressing the limitations of quantitative tests of differences between networks, discussed in Zhao et al. (2019) and briefly reviewed in the previous section, would also be an important direction for future research.
In addition to inferring networks based on activities of components of biological systems, a number of experimental platforms, such as ChIP-Seq and ChIP-chip assays (Landt et al., 2012), have also been developed to interrogate the interactions among these components as well as changes in these interactions (Barrios-Rodiles et al., 2005). These emerging assays offer the opportunity to more directly observe network edges or changes in the network structures. They may also be able to validate the findings from statistical/computational approaches, which is currently a key challenge. Designing efficient experiments based on these new assays (Kerr and Churchill, 2001) and accounting and adjusting for batch effects (Leek et al., 2010) are challenging but impactful areas of future research.
ACKNOWLEDGEMENTS
I would like to thank three anonymous reviewers for their constructive feedback. I also thank Dr. Jing Ma for helpful input on an earlier version of the manuscript.
FUNDING INFORMATION
This work was made possible by the NSF grant DMS-1561814. Additional support from NIH grant R01GM114029 is also gratefully acknowledged.
Footnotes
To appear as an Advanced Review in WIREs Computational Statistics.
References
- Allen GI and Liu Z. A local poisson graphical model for inferring networks from sequencing data. IEEE transactions on nanobioscience, 12(3):189–198, 2013. [DOI] [PubMed] [Google Scholar]
- Anderson TW. An introduction to multivariate statistical analysis (3rd edition), 2003. [Google Scholar]
- Babin BJ and Svensson G. Structural equation modeling in social science research: Issues of validity and reliability in the research process. European Business Review, 24(4):320–330, 2012. [Google Scholar]
- Banerjee O, Ghaoui LE, and dAspremont A. Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. Journal of Machine learning research, 9 (Mar):485–516, 2008. [Google Scholar]
- Bar-Yam Y and Epstein IR. Response of complex networks to stimuli. Proceedings of the National Academy of Sciences, 101(13):4341–4345, 2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barrios-Rodiles M, Brown KR, Ozdamar B, Bose R, Liu Z, Donovan RS, Shinjo F, Liu Y, Dembowy J, Taylor IW, et al. High-throughput mapping of a dynamic signaling network in mammalian cells. Science, 307(5715):1621–1625, 2005. [DOI] [PubMed] [Google Scholar]
- Belilovsky E, Varoquaux G, and Blaschko MB. Testing for differences in gaussian graphical models: Applications to brain connectivity. In Lee DD, Sugiyama M, Luxberg UV, Guyon I, and Garnett R, editors, Advances in Neural Information Processing Systems, volume 29, pages 595–603. Curran Associates, Inc., Red Hook, NY, 2016. [Google Scholar]
- Benjamini Y and Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological), 57(1):289–300, 1995. [Google Scholar]
- Besag J. Statistical analysis of non-lattice data. Journal of the Royal Statistical Society: Series D (The Statistician), 24(3):179–195, 1975. [Google Scholar]
- Borneman AR, Gianoulis TA, Zhang ZD, Yu H, Rozowsky J, Seringhaus MR, Wang LY, Gerstein M, and Snyder M. Divergence of transcription factor binding sites across related yeast species. Science, 317(5839):815–819, 2007. [DOI] [PubMed] [Google Scholar]
- Cabusora L, Sutton E, Fulmer A, and Forst CV. Differential network expression during drug and stress response. Bioinformatics, 21(12):2898–2905, 2005. [DOI] [PubMed] [Google Scholar]
- Cai T, Liu W, and Luo X. A constrained 1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594–607, 2011. [Google Scholar]
- Cai T, Li H, Ma J, and Xia Y. Differential markov random field analysis with an application to detecting differential microbial community networks. Biometrika, 103(1):1–16, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cai TT and Zhang A. Inference for high-dimensional differential correlation matrices. Journal of multivariate analysis, 143:107–126, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chang J, Zhou W, Zhou W-X, and Wang L. Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering. Biometrics, 73(1): 31–41, 2017. [DOI] [PubMed] [Google Scholar]
- Chen S, Witten DM, and Shojaie A. Selection and estimation for mixed graphical models. Biometrika, 102(1):47–64, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng J, Li T, Levina E, and Zhu J. High-dimensional mixed graphical models. Journal of Computational and Graphical Statistics, 26(2):367–378, 2017. [Google Scholar]
- Chuang H-Y, Lee E, Liu Y-T, Lee D, and Ideker T. Network-based classification of breast cancer metastasis. Molecular systems biology, 3(1), 2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Danaher P, Wang P, and Witten DM. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2):373–397, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dempster AP. Covariance selection. Biometrics, pages 157–175, 1972. [Google Scholar]
- Diestel R. Graph theory: Springer graduate text gtm 173, volume 173. Diestel Reinhard, 2012. [Google Scholar]
- Dobra A, Lenkoski A, et al. Copula gaussian graphical models and their application to modeling functional disability data. The Annals of Applied Statistics, 5(2A):969–993, 2011. [Google Scholar]
- Drton M and Maathuis MH. Structure learning in graphical modeling. Annual Review of Statistics and Its Application, 4:365–393, 2017. [Google Scholar]
- Drton M and Perlman MD. Model selection for gaussian concentration graphs. Biometrika, 91 (3):591–602, 2004. [Google Scholar]
- Durrett R. Random graph dynamics, volume 200. Cambridge university press; Cambridge, 2007. [Google Scholar]
- Erler JT and Linding R. Network-based drugs and biomarkers. The Journal of Pathology: A Journal of the Pathological Society of Great Britain and Ireland, 220(2):290–296, 2010. [DOI] [PubMed] [Google Scholar]
- Fellinghauer B, Bühlmann P, Ryffel M, Von Rhein M, and Reinhardt JD. Stable graphical model estimation with random forests for discrete, continuous, and mixed variables. Computational Statistics & Data Analysis, 64:132–152, 2013. [Google Scholar]
- Fisher RA. On the ‘probable error’ of a coefficient of correlation deduced from a small sample. Metron, 1:1–32, 1921. [Google Scholar]
- Friedman J, Hastie T, and Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fukushima A. Diffcorr: an r package to analyze and visualize differential correlations in biological networks. Gene, 518(1):209–214, 2013. [DOI] [PubMed] [Google Scholar]
- Gambardella G, Moretti MN, De Cegli R, Cardone L, Peron A, and Di Bernardo D. Differential network analysis for the identification of condition-specific pathway activity and regulation. Bioinformatics, 29(14):1776–1785, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ghoshal A and Honorio J. Direct estimation of difference between structural equation models in high dimensions. arXiv preprint arXiv:1906.12024, 2019. [Google Scholar]
- Gill R, Datta S, and Datta S. dna: An R package for differential network analysis. Bioinformation, 10(4):233–234, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, and Barabási A-L. The human disease network. Proceedings of the National Academy of Sciences, 104(21):8685–8690, 2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gomez-Ramirez J and Wu J. Network-based biomarkers in alzheimers disease: review and future directions. Frontiers in aging neuroscience, 6:12, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guo J, Levina E, Michailidis G, and Zhu J. Joint estimation of multiple graphical models. Biometrika, 98(1):1–15, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hair JF, Black WC, Babin BJ, Anderson RE, and Tatham RL. Multivariate data analysis, volume 5. Prentice hall; Upper Saddle River, NJ, 1998. [Google Scholar]
- Hao B, Sun WW, Liu Y, and Cheng G. Simultaneous clustering and estimation of heterogeneous graphical models. The Journal of Machine Learning Research, 18(1):7981–8038, 2017. [PMC free article] [PubMed] [Google Scholar]
- He H, Cao S, Zhang J.-g., Shen H, Wang Y-P, and Deng H.-w.. A statistical test for differential network analysis based on inference of gaussian graphical model. Scientific reports, 9(1):1–8, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hotelling H. The generalization of student’s ratio. The Annals of Mathematical Statistics, 2(3): 360–378, 1931. [Google Scholar]
- Huang J, Ma S, Li H, and Zhang C-H. The sparse laplacian shrinkage estimator for high-dimensional regression. Annals of statistics, 39(4):2021, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hussain SP and Harris CC. p53 biological network: at the crossroads of the cellular-stress response pathway and molecular carcinogenesis. Journal of Nippon Medical School, 73(2):54–64, 2006. [DOI] [PubMed] [Google Scholar]
- Hyvärinen A. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(Apr):695–709, 2005. [Google Scholar]
- Hyvärinen A. Some extensions of score matching. Computational statistics & data analysis, 51(5): 2499–2512, 2007. [Google Scholar]
- Ideker T and Krogan NJ. Differential network biology. Molecular systems biology, 8(1), 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jacob L, Neuvial P, and Dudoit S. More power via graph-structured tests for differential expression of gene networks. The Annals of Applied Statistics, 6(2):561–600, 2012. [Google Scholar]
- Janková J and van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics, 9(1):1205–1229, 2015. [Google Scholar]
- Janková J and van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation. TEST, 26(1):143–162, 2017. [Google Scholar]
- Junker BH and Schreiber F. Analysis of biological networks, volume 2. Wiley Online Library, 2008. [Google Scholar]
- Kerr MK and Churchill GA. Statistical design and the analysis of gene expression microarray data. Genetics Research, 77(2):123–128, 2001. [DOI] [PubMed] [Google Scholar]
- Khare K, Oh S-Y, and Rajaratnam B. A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(4):803–825, 2015. [Google Scholar]
- Khatri C and Rao CR. Characterizations of multivariate normality. i. through independence of some statistics. Journal of Multivariate Analysis, 6(1):81–94, 1976. [Google Scholar]
- Khatri P, Sirota M, and Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS computational biology, 8(2):e1002375, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim B, Liu S, and Kolar M. Two-sample inference for high-dimensional markov networks. arXiv preprint arXiv:1905.00466, 2019. [Google Scholar]
- Kitano H. Biological robustness. Nature Reviews Genetics, 5:826–837, 2004. [DOI] [PubMed] [Google Scholar]
- Koller D and Friedman N. Probabilistic graphical models: principles and techniques. MIT press, 2009. [Google Scholar]
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, and Tikuisis AP. Global landscape of protein complexes in the yeast saccharomyces cerevisiae. Nature, 440(7084):637, 2006. [DOI] [PubMed] [Google Scholar]
- Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, and Cayting P. Chip-seq guidelines and practices of the encode and modencode consortia. Genome research, 22(9):1813–1831, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langfelder P and Horvath S. Wgcna: an r package for weighted correlation network analysis. BMC bioinformatics, 9(1):559, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lauritzen SL. Graphical models, volume 17. Clarendon Press, 1996. [Google Scholar]
- Lee K-Y, Li B, and Zhao H. On an additive partial correlation operator and nonparametric estimation of graphical models. Biometrika, 103(3):513–530, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, and Irizarry RA. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11(10):733, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li J and Chen SX. Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940, 2012. [Google Scholar]
- Lin L, Drton M, and Shojaie A. Estimation of high-dimensional graphical models using regularized score matching. Electronic journal of statistics, 10(1):806, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu H, Lafferty J, and Wasserman L. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10(Oct):2295–2328, 2009. [PMC free article] [PubMed] [Google Scholar]
- Liu H, Han F, Yuan M, Lafferty J, and Wasserman L. High-dimensional semiparametric gaussian copula graphical models. The Annals of Statistics, 40(4):2293–2326, 2012. [Google Scholar]
- Liu Z-P. Identifying network-based biomarkers of complex diseases from high-throughput data. Biomarkers in Medicine, 10(6):633–650, 2016. [DOI] [PubMed] [Google Scholar]
- Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, and Gerstein M. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature, 431(7006): 308, 2004. [DOI] [PubMed] [Google Scholar]
- Ma C, Xin M, Feldmann KA, and Wang X. Machine learning–based differential network analysis: A study of stress-responsive transcriptomes in arabidopsis. The Plant Cell, 26(2): 520–537, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ma J and Michailidis G. Joint structural estimation of multiple graphical models. The Journal of Machine Learning Research, 17(1):5777–5824, 2016. [Google Scholar]
- Ma J, Shojaie A, and Michailidis G. Network-based pathway enrichment analysis with incomplete network information. Bioinformatics, 32(20):3165–3174, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ma J, Karnovsky A, Afshinnia F, Wigginton J, Rader DJ, Natarajan L, Sharma K, Porter AC, Rahman M, He J, Hamm L, Shafi T, Gipson D, Gadegbeku C, Feldman H, Michailidis G, Pennathur t. C., Subramaniam, and the CPROBE study investigators. Differential network enrichment analysis reveals novel lipid pathways in chronic kidney disease. Bioinformatics, 35 (18):3441–3452, 2019a. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ma J, Shojaie A, and Michailidis G. A comparative study of topology-based pathway enrichment analysis methods. BMC bioinformatics, 20(1):546, 2019b. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Manzour H, Küçükyavuz S, and Shojaie A. Integer programming for learning directed acyclic graphs from continuous data. arXiv preprint arXiv:1904.10574, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, and Califano A. Aracne: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. In BMC bioinformatics, volume 7, page S7. BioMed Central, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Markowetz F and Spang R. Inferring cellular networks–a review. BMC bioinformatics, 8(6):S5, 2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Massa MS, Chiogna M, and Romualdi C. Gene set analysis exploiting the topology of a pathway. BMC Systems Biology, 4(1):121, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McKenzie AT, Katsyv I, Song W-M, Wang M, and Zhang B. Dgca: a comprehensive r package for differential gene correlation analysis. BMC systems biology, 10(1):106, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meinshausen N and Bühlmann P. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462, 2006. [Google Scholar]
- Na S, Kolar M, and Koyejo O. Estimating differential latent variable graphical models with applications to brain connectivity. arXiv preprint arXiv:1909.05892, 2019. [Google Scholar]
- Pearl J. Causality. Cambridge university press, 2009. [Google Scholar]
- Peng J, Wang P, Zhou N, and Zhu J. Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104(486):735–746, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peters J and Bühlmann P. Identifiability of gaussian structural equation models with equal error variances. Biometrika, 101(1):219–228, 2013. [Google Scholar]
- Peterson C, Stingo FC, and Vannucci M. Bayesian inference of multiple gaussian graphical models. Journal of the American Statistical Association, 110(509):159–174, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pourahmadi M. High-dimensional covariance estimation: with high-dimensional data, volume 882. John Wiley & Sons, 2013. [Google Scholar]
- Ravikumar P, Wainwright MJ, Lafferty JD, et al. High-dimensional ising model selection using 1-regularized logistic regression. The Annals of Statistics, 38(3):1287–1319, 2010. [Google Scholar]
- Ren Z, Sun T, Zhang C-H, and Zhou HH. Asymptotic normality and optimalities in estimation of large Gaussian graphical models. The Annals of Statistics, 43(3):991–1026, 2015. [Google Scholar]
- Rothman AJ, Bickel PJ, Levina E, and Zhu J. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515, 2008. [Google Scholar]
- Saegusa T and Shojaie A. Joint estimation of precision matrices in heterogeneous populations. Electronic journal of statistics, 10(1):1341, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sas KM, Lin J, Rajendiran TM, Soni T, Nair V, Hinder LM, Jagadish HV, Gardner TW, Abcouwer SF, Brosius FC, et al. Shared and distinct lipid-lipid interactions in plasma and affected tissues in a diabetic mouse model. Journal of lipid research, 59(2):173–183, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schmidt D, Wilson MD, Ballester B, Schwalie PC, Brown GD, Marshall A, Kutter C, Watt S, Martinez-Jimenez CP, and Mackay S. Five-vertebrate chip-seq reveals the evolutionary dynamics of transcription factor binding. Science, 328(5981):1036–1040, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schott JR. A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics & Data Analysis, 51(12):6535–6542, 2007. [Google Scholar]
- Shojaie A and Michailidis G. Analysis of gene sets based on the underlying regulatory network. Journal of Computational Biology, 16(3):407–426, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shojaie A and Michailidis G. Network enrichment analysis in complex experiments. Statistical Applications in Genetics and Molecular Biology, 9(1), 2010.
- Shojaie A and Sedaghat N. How different are estimated genetic networks of cancer subtypes? In Big and Complex Data Analysis, pages 159–192. Springer, 2017.
- Simon N and Tibshirani R. Standardization and the group lasso penalty. Statistica Sinica, 22:983–1001, 2012.
- Smith H, Gnanadesikan R, and Hughes J. Multivariate analysis of variance (MANOVA). Biometrics, 18(1):22–41, 1962.
- Srivastava MS and Yanagihara H. Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101(6):1319–1329, 2010.
- Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell, 122(6):957–968, 2005.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, and Lander ES. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43):15545–15550, 2005.
- Suggala A, Kolar M, and Ravikumar PK. The expxorcist: nonparametric graphical models via conditional exponential densities. In Advances in Neural Information Processing Systems, pages 4446–4456, 2017.
- Sun S, Kolar M, and Xu J. Learning structured densities via infinite dimensional exponential families. In Advances in Neural Information Processing Systems, pages 2287–2295, 2015.
- Székely GJ and Rizzo ML. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8):1249–1272, 2013.
- Tarassov K, Messier V, Landry CR, Radinovic S, Molina MMS, Shames I, Malitskaya Y, Vogel J, Bussey H, and Michnick SW. An in vivo map of the yeast protein interactome. Science, 320(5882):1465–1470, 2008.
- Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, Bull S, Pawson T, Morris Q, and Wrana JL. Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nature Biotechnology, 27(2):199, 2009.
- Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, and Park PJ. Discovering statistically significant pathways in expression profiling studies. Proceedings of the National Academy of Sciences, 102(38):13544–13549, 2005.
- Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
- Tibshirani R, Saunders M, Rosset S, Zhu J, and Knight K. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1):91–108, 2005.
- Troy NM, Hollams EM, Holt PG, and Bosco A. Differential gene network analysis for the identification of asthma-associated therapeutic targets in allergen-specific T-helper memory responses. BMC Medical Genomics, 9(1):9, 2016.
- Tryputsen V, DiBernardo A, Samtani M, Novak GP, Narayan VA, Raghavan N, and Alzheimer’s Disease Neuroimaging Initiative. Optimizing regions-of-interest composites for capturing treatment effects on brain amyloid in clinical trials. Journal of Alzheimer’s Disease, 43(3):809–821, 2015.
- Voorman A, Shojaie A, and Witten D. Graph estimation with joint additive models. Biometrika, 101(1):85–101, 2013.
- Wainwright MJ and Jordan MI. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2):1–305, 2008.
- Wang H. Bayesian graphical lasso models and efficient posterior computation. Bayesian Analysis, 7(4):867–886, 2012.
- Wang Y, Joshi T, Zhang X-S, Xu D, and Chen L. Inferring gene regulatory networks from multiple microarray datasets. Bioinformatics, 22(19):2413–2420, 2006.
- Wang Y, Squires C, Belyaeva A, and Uhler C. Direct estimation of differences in causal graphs. In Advances in Neural Information Processing Systems, pages 3770–3781, 2018.
- West J, Bianconi G, Severini S, and Teschendorff AE. Differential network entropy reveals cancer system hallmarks. Scientific Reports, 2:802, 2012.
- Wu T-L and Li P. Tests for high-dimensional covariance matrices using random matrix projection. arXiv preprint arXiv:1511.01611, 2015.
- Xia Y and Li L. Hypothesis testing of matrix graph model with application to brain connectivity analysis. Biometrics, 73(3):780–791, 2017.
- Xia Y, Cai T, and Cai TT. Testing differential networks with applications to detecting gene-by-gene interactions. Biometrika, 102(2):247–266, 2015.
- Xue L and Zou H. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. The Annals of Statistics, 40(5):2541–2571, 2012.
- Yamanishi Y, Vert J-P, and Kanehisa M. Protein network inference from multiple genomic data: a supervised approach. Bioinformatics, 20(suppl_1):i363–i370, 2004.
- Yang E, Allen G, Liu Z, and Ravikumar PK. Graphical models via generalized linear models. In Advances in Neural Information Processing Systems, pages 1358–1366, 2012.
- Yang E, Ravikumar PK, Allen GI, and Liu Z. On Poisson graphical models. In Advances in Neural Information Processing Systems, pages 1718–1726, 2013.
- Yang E, Baker Y, Ravikumar P, Allen G, and Liu Z. Mixed graphical models via exponential families. In Artificial Intelligence and Statistics, pages 1042–1050, 2014.
- Yu M, Gupta V, and Kolar M. Simultaneous inference for pairwise graphical models with generalized score matching. arXiv preprint arXiv:1905.06261, 2019a.
- Yu S, Drton M, and Shojaie A. Graphical models for non-negative data using generalized score matching. In International Conference on Artificial Intelligence and Statistics, pages 1781–1790, 2018.
- Yu S, Drton M, and Shojaie A. Generalized score matching for non-negative data. Journal of Machine Learning Research, 20(76):1–70, 2019b.
- Yuan H, Xi R, Chen C, and Deng M. Differential network analysis via lasso penalized D-trace loss. Biometrika, 104(4):755–770, 2017.
- Yuan M and Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, 2006.
- Yuan M and Lin Y. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, 2007.
- Zhang T and Zou H. Sparse precision matrix estimation via lasso penalized D-trace loss. Biometrika, 101(1):103–120, 2014.
- Zhang X-F, Ou-Yang L, Zhao X-M, and Yan H. Differential network analysis from cross-platform gene expression data. Scientific Reports, 6:34112, 2016.
- Zhao S, Ottinger S, Peck S, Mac Donald C, and Shojaie A. Network differential connectivity analysis. arXiv preprint arXiv:1909.13464, 2019.
- Zhao SD, Cai TT, and Li H. Direct estimation of differential networks. Biometrika, 101(2):253–268, 2014.
- Zhong Q, Simonis N, Li Q-R, Charloteaux B, Heuze F, Klitgord N, Tam S, Yu H, Venkatesan K, Mou D, et al. Edgetic perturbation models of human inherited disorders. Molecular Systems Biology, 5(1), 2009.
- Zhu L, Lei J, Devlin B, and Roeder K. Testing high-dimensional covariance matrices, with application to detecting schizophrenia risk genes. Annals of Applied Statistics, 11(3):1810, 2017.
- Zhu Y, Shen X, and Pan W. Structural pursuit over multiple undirected graphs. Journal of the American Statistical Association, 109(508):1683–1696, 2014.
FURTHER READING
- Recent developments in statistical approaches for differential network analysis have begun to address directed networks, in particular directed acyclic graphs (DAGs) (Wang et al., 2018; Ghoshal and Honorio, 2019), as well as graphical models for other data types (Cai et al., 2018; Kim et al., 2019; Zhao et al., 2019; He et al., 2019; Yu et al., 2019a). A number of software tools also provide tests of differential connectivity based on permutation approaches (Gill et al., 2014), or on differences in marginal associations measured by correlations rather than conditional dependencies (Fukushima, 2013; McKenzie et al., 2016). While these tools may lack strong theoretical support, or may test different hypotheses, they offer more convenient user interfaces and may be computationally more tractable for the analysis of large networks.
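As a rough illustration of the permutation idea behind such tools, the Python sketch below tests whether marginal (correlation-based) associations differ between two conditions. It is not the procedure implemented by any of the cited software. The function names (`pearson`, `max_corr_diff`, `permutation_pvalue`, `simulate`), the choice of the largest absolute correlation difference as the test statistic, and the simulated three-node data are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_corr_diff(group_a, group_b):
    """Largest absolute difference in pairwise correlations between groups.

    Each group is a list of samples; each sample holds one value per node.
    """
    m = len(group_a[0])
    cols_a = list(zip(*group_a))  # one tuple of observations per node
    cols_b = list(zip(*group_b))
    return max(
        abs(pearson(cols_a[j], cols_a[k]) - pearson(cols_b[j], cols_b[k]))
        for j in range(m)
        for k in range(j + 1, m)
    )


def permutation_pvalue(group_a, group_b, n_perm=500, seed=0):
    """Permutation p-value obtained by shuffling condition labels."""
    rng = random.Random(seed)
    observed = max_corr_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if max_corr_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def simulate(n, rho, rng):
    """Hypothetical data: nodes 0 and 1 have correlation rho, node 2 is independent."""
    samples = []
    for _ in range(n):
        z0, z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        samples.append([z0, rho * z0 + math.sqrt(1 - rho ** 2) * z1, z2])
    return samples


rng = random.Random(1)
group_a = simulate(60, 0.9, rng)  # condition with a strong 0-1 association
group_b = simulate(60, 0.0, rng)  # condition with no association
stat, pval = permutation_pvalue(group_a, group_b)
print(f"max |r_A - r_B| = {stat:.3f}, permutation p-value = {pval:.4f}")
```

Because shuffling labels tests equality of the two joint distributions rather than of the correlations alone, one would likely center and scale each condition before applying a test of this kind. The sketch also targets marginal associations only. It says nothing about conditional dependencies, which is the distinction drawn in the note above.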