On an additive partial correlation operator and nonparametric estimation of graphical models

Kuang-Yao Lee; Bing Li; Hongyu Zhao

doi:10.1093/biomet/asw028

. 2016 Aug 24;103(3):513–530. doi: 10.1093/biomet/asw028

On an additive partial correlation operator and nonparametric estimation of graphical models

Kuang-Yao Lee ^1,^✉, Bing Li ², Hongyu Zhao ³

PMCID: PMC5793672 PMID: 29422689

Abstract

We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance.

Keywords: Additive conditional covariance operator, Additive conditional independence, Copula, Gaussian graphical model, Partial correlation, Reproducing kernel

1. Introduction

We propose a new statistical object, the additive partial correlation operator, for estimating nonparametric graphical models. This operator is an extension of the partial correlation coefficient (Muirhead, 2005) to the nonlinear setting. It is akin to the additive conditional covariance operator of Li et al. (2014) but achieves better scaling, leading to enhanced estimation accuracy, when characterizing conditional independence in graphical models.

Let Inline graphic be a -dimensional random vector. Let be an undirected graph, where represents the set of vertices corresponding to random variables and represents the set of undirected edges. For convenience we assume that . A common approach to modelling an undirected graph is to associate separation with conditional independence; that is, node Inline graphic and node are separated if and only if and are independent given the rest of . In symbols,

(1)

where Inline graphic represents with its th and th components removed. Intuitively, this means that nodes and are connected if and only if, after removing the effects of all the other nodes, and still depend on each other. In other words, nodes and are connected in the graph if and only if and have a direct relation. The statistical problem is to estimate Inline graphic based on a sample of .

One of the most commonly used statistical models for (1) is the Gaussian graphical model, which assumes that Inline graphic satisfies (1) and is distributed as for a nonsingular covariance matrix . An appealing property of the multivariate Gaussian distribution is that conditional independence is completely characterized by the zero entries of the precision matrix. Specifically, let be the precision matrix and Inline graphic its th element. Then

(2)

Thus, under Gaussianity, estimation of Inline graphic amounts to identifying the zero entries or, equivalently, the sparsity pattern of the precision matrix . Many procedures have been developed to estimate the Gaussian graphical model. For example, Yuan & Lin (2007), Banerjee et al. (2008) and Friedman et al. (2008) considered penalized maximum likelihood estimation with Inline graphic penalties on . Based on a relation between partial correlations and regression coefficients, Meinshausen & Bühlmann (2006) and Peng et al. (2009) proposed to select the neighbours of each node by solving multiple lasso problems (Tibshirani, 1996). Other recent advances include the work of Bickel & Levina (2008a, b), who used hard thresholding to determine the sparsity pattern, Lam & Fan (2009), who used the smoothly clipped absolute deviation penalty (Fan & Li, 2001), and Yuan (2010) and Cai et al. (2011), who used the Danzig selector (Candès & Tao, 2007).

Since Gaussianity could be restrictive in applications, many recent papers have considered extensions. The challenge is not only to relax Gaussianity but also to preserve the simplicity of the conditional independence structure imparted by the Gaussian distribution. One elegant solution is to assume a copula Gaussian model, under which the data can be transformed marginally to multivariate Gaussianity; see Liu et al. (2009, 2012), Xue & Zou (2012) and Harris & Drton (2013). The copula Gaussian model preserves the equivalence (2) for the transformed Inline graphic , without requiring the to be marginally Gaussian. Other work on non-Gaussian graphical models includes Fellinghauer et al. (2013) and Voorman et al. (2014). In their settings, a given node is associated with its neighbours via either a semiparametric or a nonparametric model.

Another extension is the additive semigraphoid model of Li et al. (2014), which is based on a new statistical relation called additive conditional independence. By generalizing the precision matrix to the additive precision operator and replacing the conditional independence in (2) by additive conditional independence, Li et al. (2014) showed that the equivalence (2) emerges at the linear operator level, at which no distributional assumption is needed.

The primary motivation for introducing additive conditional independence is to maintain nonparametric flexibility without employing high-dimensional kernels. The distribution of points in a Euclidean space becomes increasingly sparse as the dimension of the space increases. For a kernel estimator in such spaces to be effective, we need to increase the bandwidth; otherwise we may have very few observations within a local ball of radius equal to the bandwidth. Increasing bandwidth, however, also increases bias. Therefore we face the dilemma of either increased bias or lack of data in each local region, a phenomenon known as the curse of dimensionality (Bellman, 1957). To avoid this problem while extracting useful information from high-dimensional data, one must impose some kind of additional structure, such as parametric models, sparsity or linear indices. The structure imposed by additive conditional independence is additivity, which allows us to employ only one-dimensional kernels, thus avoiding high dimensionality. The cost is that the graphical model is no longer characterized by conditional independence. Nonetheless, Li et al. (2014) have shown that additive conditional independence satisfies the semigraphoid axioms (Pearl & Verma, 1987; Pearl et al., 1989), a set of four fundamental properties of conditional independence.

To estimate the additive semigraphoid model, Li et al. (2014) proposed the additive conditional covariance and additive precision operators, which extend the conditional covariance and precision matrices and characterize additive conditional independence without distributional assumptions. In the classical setting, the conditional covariance Inline graphic between two random variables and given a third random variable describes the strength of dependence between and after removing the effect of . However, it is confounded by statistical variations in and , which have nothing to do with the conditional dependence. Partial correlation is designed to remove these effects, so that conditional dependence is retained. The additive partial correlation operator that we propose serves the same purpose in the nonlinear setting. We will also propose an estimator of the new operator, and establish its consistency along with that of the estimator of the additive conditional covariance operator, which was not proved in Li et al. (2014). Based on the additive partial correlation operator, we develop an estimator for the additive semigraphoid model and establish the consistency of this procedure.

All the proofs, as well as some additional propositions and numerical results, are presented in the Supplementary Material.

2. Additive conditional independence and graphical models

2.1. Additive conditional independence

Let Inline graphic be a probability space, a subset of , and a random vector. Let be the distribution of . Let be the th component of , and let be the support of . For a subvector of , let be the distribution of and the centred class

We assume that all functions in Inline graphic have mean zero, because constants have no bearing on our construction. Additive conditional independence (Li et al., 2014) was introduced in terms of the geometry. Suppose that , and are subvectors of . For a subvector such as , let be the additive family formed by functions in each Inline graphic ; that is,

Note that Inline graphic and are different: the former consists of additive functions, while the latter has no such restriction. If and are subspaces of , we write if for all and , where denotes the inner product in . If , we denote by the set of functions such that . Also, let denote the subspace .

Definition 1 —

We say thatandare additively independent conditional onif and only if

(3)

We denote this relation by.

Li et al. (2014) showed that the three-way relation Inline graphic satisfies the four semigraphoid axioms (Pearl & Verma, 1987; Pearl et al., 1989), which are features abstracted from probabilistic conditional independence suitable for describing a graph.

Based on (3), Li et al. (2014) proposed the following graphical model. Let Inline graphic be as defined in §1.

Definition 2 —

We say thatfollows an additive semigraphoid model with respect toif

Li et al. (2014) developed theoretical results and estimation methods for the additive semigraphoid model using the additive Inline graphic space .

2.2. Additive reproducing kernel Hilbert spaces

Rather than use Inline graphic geometry, here we use reproducing kernel Hilbert space geometry to derive our new operator and related methods. This is mainly because many asymptotic tools for linear operators have recently been developed in the reproducing kernel Hilbert space setting (Fukumizu et al., 2007; Bach, 2008). The advantage of this alternative formulation will become clear in §4. Let Inline graphic be a positive-definite kernel. For convenience, we assume and to be the same for all and write the common kernel function as . Let be the reproducing kernel Hilbert space of functions of based on the kernel ; that is, is the space spanned by , with its inner product given by . In our theoretical developments, we require that all the functions in Inline graphic be square-integrable, which is guaranteed by the following assumption:

Assumption 1 —

for

This condition is satisfied by most of the commonly used kernels, including the radial basis function.

Let Inline graphic be a subvector of . The additive reproducing kernel Hilbert space of functions of is defined as follows.

Definition 3 —

The spaceis the direct sumin the sense that

with inner product

Equivalently, Inline graphic can be viewed as the reproducing kernel Hilbert space generated by the additive kernel , , where and .

2.3. Other notation

For two Hilbert spaces Inline graphic and , we let denote the class of all bounded operators from to and the class of all Hilbert–Schmidt operators. When , we denote these classes simply by and . The symbols and stand for the operator and Hilbert–Schmidt norms. For and , the tensor product is the mapping , . For two matrices or Euclidean vectors Inline graphic and , denotes their Kronecker product. The symbol stands for the identity mapping in a functional space, whereas means the identity matrix. The symbol stands for the vector of length whose entries are all ones. For an operator , represents the null space of and the range of ; that is,

Also, Inline graphic stands for the closure of .

3. Additive partial covariance operator

3.1. The additive conditional covariance operator

The additive conditional covariance operator was proposed by Li et al. (2014) in terms of the Inline graphic geometry; here we redefine it in the reproducing kernel Hilbert space geometry.

For subvectors Inline graphic and of , by the Riesz representation theorem there exists a unique operator such that (Conway, 1994, p. 31)

We define Inline graphic and similarly. The nonadditive versions of these operators were introduced by Baker (1973) and Fukumizu et al. (2004, 2009). Moreover, by Baker (1973), for any there exists a unique operator

such that Inline graphic The operator is the correlation operator between and . Let denote the diagonal matrix of operators whose diagonal entries are the operators , and let be the matrix of operators whose th element is . Then it is obvious that Notice that is the identity operator. Define operators such as Inline graphic , and in a similar way. We make the following assumption about the entries of .

Assumption 2 —

For,is a compact operator.

In the Supplementary Material, we show that Inline graphic is invertible and its inverse is bounded. We are now ready to define the additive conditional covariance operator.

Definition 4 —

Suppose that Assumptions 1 and 2 hold. Then the operator

is called the additive conditional covariance operator ofgiven.

Again, this definition also accommodates operators such as Inline graphic and .

3.2. The additive partial correlation operator

We now introduce the additive partial correlation operator and establish its population-level properties. A straightforward way to define the additive partial correlation operator might be as

(4)

but caution is needed here because Inline graphic and are Hilbert–Schmidt operators and their eigenvalues tend to zero, so that there is no guarantee that (4) will be well-defined. The following theorem, which echoes Theorem 1 of Baker (1973), shows that (4) is well-defined under minimal conditions.

Theorem 1 —

Suppose that Assumptions 1 and 2 hold. Then there exists a unique operatorsuch that:

, where,, anddenotes the projection onto a subspacein;

;

.

Theorem 1 justifies the following definition.

Definition 5 —

Under Assumptions 1 and 2, the operatorin Theorem 1 is called the additive partial correlation operator.

The additive partial correlation operator is defined via a reproducing kernel Hilbert space, whereas additive conditional independence is characterized via Inline graphic . In the Supplementary Material we show that when the kernel function is sufficiently rich that it is a characteristic kernel (Fukumizu et al., 2008, 2009), projections onto can be well approximated by elements in reproducing kernel Hilbert spaces. Specifically, this requires the following assumption.

Assumption 3 —

Eachis a dense subset ofup to a constant; that is, for eachthere is a sequenceinsuch thatas.

We are now ready to state the first main result: one can use the additive conditional covariance or additive partial correlation operator to characterize additive conditional independence.

Theorem 2 —

If Assumptions 1–3 hold, then

3.3. Estimators

Here we define sample estimators of Inline graphic and . Let be independent copies of . Let represent the sample average: . We define the estimate of by replacing the expectation with the sample average ; that is,

Let Inline graphic be the matrix of operators whose th entry is , and let , and so forth be the submatrices corresponding to subvectors and . Let be a sequence of positive constants converging to zero. We define the estimator of as

(5)

Let Inline graphic be another sequence of positive constants converging to zero. We define the estimator of as

(6)

The tuning parameters Inline graphic and in (5) and (6) play roles similar to that of the penalty in ridge regression (Hoerl & Kennard, 1970). Technically, they ensure the invertibility of the relevant linear operators and the consistency of the estimators. In practice, they often bring efficiency gains in high dimensions due to their shrinkage effects. Interestingly, as we will see in the next section, Inline graphic needs to converge to zero more slowly than in order for to be consistent.

4. Consistency and convergence rate

We first establish consistency of Inline graphic . Besides serving as an intermediate step for proving the consistency of , the consistency of is of interest in its own right, because it was not proved in Li et al. (2014), where it was originally proposed under geometry. To derive the convergence rate, we need an additional assumption.

Assumption 4 —

There is an operatorsuch that

The operator Inline graphic also appeared in Lee et al. (2016), where it was called the regression operator because it can be written in the form , resembling the coefficient vector in linear regression. Assumption 4 is essentially a smoothness condition: it requires that the main components in the relation between Inline graphic and be sufficiently concentrated on the low-frequency components of the covariance operator , in the following sense. If is invertible, then Assumption 4 requires to be a compact operator. Since, under mild conditions, is a Hilbert–Schmidt operator (Fukumizu et al., 2007), is an unbounded operator. Intuitively, in order for Inline graphic to be compact, the range space of should be sufficiently concentrated on the eigenspaces of corresponding to its large eigenvalues, or the low-frequency components. As a simple special case of this scenario, in Lee et al. (2016, Proposition 1) it was shown that Assumption 4 is satisfied if the range of Inline graphic is a finite-dimensional reducing subspace of . This is true, for example, when the polynomial kernel of finite order is used. For kernels inducing infinite-dimensional spaces, Assumption 4 holds if there exist only finitely many eigenfunctions of that carry nontrivial correlations with any function in Inline graphic . Of course, these sufficient conditions can be relaxed with careful examination.

We state the consistency of the additive conditional covariance and additive partial correlation operators in the following two theorems, which require different rates for the ridge parameters. For two positive sequences Inline graphic and , we write if and only if , and we write if and only if .

Theorem 3 —

If Assumptions 1, 2 and 4 are satisfied and, then

Theorem 4 —

If Assumptions 1, 2 and 4 are satisfied and

(7)

thenin probability as.

We return now to the estimation of the additive semigraphoid graphical model in Definition 2. The estimators of the additive conditional covariance operator and additive partial correlation operator lead to the following thresholding methods for estimating the additive semigraphoid model:

where Inline graphic and are thresholding constants for the additive conditional covariance operator and additive partial correlation operator, respectively. By combining Theorems 2, 3 and 4, it is easy to show that and are consistent estimators of the true edge set , in the following sense.

Theorem 5 —

Suppose that Assumptions 1–4 hold andsatisfies the additive semigraphoid model in Definition 2 with respect to the graph. Suppose further thatandare positive sequences satisfying (7). Then, for sufficiently smalland, as,

The foregoing asymptotic development is under the assumption that Inline graphic is fixed as . We believe it should be possible to prove the consistency in the setting where as , perhaps along the lines of Bickel & Levina (2008a, b). We leave this to future research.

5. Implementation of estimation of graphical models

5.1. Coordinate representation

The estimators in (5) and (6) are defined in operator form. To compute them, we need to represent the operators as matrices. In the subsequent development we describe this process in the context of estimating the graphical models. We adopt the system of notation for coordinate representation from Horn & Johnson (1985); see also Li et al. (2012). Let Inline graphic be a generic -dimensional Hilbert space with spanning system . For any , there is a vector such that . The vector is called the coordinate of with respect to . Suppose that is another Hilbert space, spanned by , and is a linear operator from to . Then the coordinate of relative to Inline graphic and is the matrix , denoted by . If is a third finite-dimensional Hilbert space, with spanning system , and is a linear operator, then When there is no ambiguity regarding the spanning system used, we abbreviate to , to , and so on. One can also show that for any . In the rest of this section, square brackets Inline graphic will be reserved exclusively for the coordinate notation.

5.2. Norms of the estimated additive partial correlation operator

For each Inline graphic , let and be the th components of the vectors and , respectively. Consider the reproducing kernel Hilbert space

Let Inline graphic be the Riesz representation of the linear functional , , and let . For our purposes, it suffices to consider the subspace of , because it is the range of operators such as and . For this reason, we define this subspace to be .

Let Inline graphic be the Gram kernel matrix. Let , which is the projection onto the orthogonal complement of in . Let , and let be the matrix obtained by removing the th and th blocks from the matrix . Let be the -dimensional block-diagonal matrix whose diagonal blocks are the blocks of , each of dimension Inline graphic . To avoid complicated notation, throughout this subsection we write the estimated operators and as simply and .

By straightforward calculations, details of which are given in the Supplementary Material, we have the following coordinate representations:

(8)

Let Inline graphic , where indicates the Moore–Penrose inverse. Therefore, is equal to and is denoted by . Similarly, . Then we can compute via

(9)

In the Supplementary Material, we also derive an explicit formula for calculating Inline graphic . Let and . Then we have

(10)

The following result links the additive partial correlation operator with the partial correlation when a linear kernel is considered.

Corollary 1 —

Let. Then, as,converges in probability to the absolute value of the partial correlation betweenandgiven.

5.3. Reduced kernel and generalized crossvalidation

To make our method readily applicable to relatively large networks with thousands of nodes, we now propose, as alternatives to (9) and (10), simplified algorithms for estimating Inline graphic and . Lower-frequency eigenfunctions of kernels often play dominant roles, and the numbers of statistically significant eigenvalues of kernel matrices are often much smaller than ; see, for example, Lee & Huang (2007) and Chen et al. (2010). By employing only the dominant low-frequency eigenfunctions, we can greatly reduce the amount of computation without incurring much loss of accuracy. Let the eigendecomposition of the kernel matrix Inline graphic be written as

(11)

where Inline graphic corresponds to the first eigenvalues of and corresponds to the last eigenvalues. Instead of the original bases , we now work with , where and will be written simply as .

Let Inline graphic , let , let be the matrix obtained by removing and from , and let , where . Using derivations similar to (8) and (10), we find the coordinate representation of the additive conditional covariance operator with respect to the new basis as

(12)

which is denoted by Inline graphic . Correspondingly, the Hilbert–Schmidt norms of the additive conditional covariance operator and the additive partial correlation operator can be computed via

(13)

where Inline graphic is the Frobenius matrix norm. In (12) we need to invert an matrix which could be large if is large. However, as shown in Proposition 4 of Li et al. (2014), calculation of this matrix can be reduced to the eigendecomposition of an matrix.

For the choice of Inline graphic , we follow Fan et al. (2011) and determine it adaptively according to the sample size . Specifically, we take

(14)

We use the reduced kernel bases consistently for all the simulations and the real-data analysis in §6. Based on our experience, using the reduced bases not only cuts the computation time substantially but also gives very high accuracy compared with using the full bases.

Next, we introduce a generalized crossvalidation procedure to choose the thresholds Inline graphic and . Our process roughly follows Li et al. (2014). Given , let be the estimated graph by either criterion in (13), and define the neighbours of node as Our strategy is to regress each node on its neighbours and obtain the residuals; the generalized crossvalidation criterion is then used to minimize the total prediction error. Specifically, Inline graphic is determined by minimizing

(15)

where the Inline graphic are chosen differently for each node, as shown in the next subsection.

5.4. Algorithm

The following algorithm summarizes the estimating procedure for the additive semigraphoid model based on the estimated additive partial correlation operator and the estimated additive conditional covariance operator.

Step 1 —

For each , standardize such that and .

Step 2 —

Select the kernel , for example as the radial basis function where is the bandwidth parameter. As in Lee et al. (2013), we recommend choosing via

Step 3 —

Use the selected and to compute the kernel matrix , its centred counterpart , and the eigendecomposition (11) for . Choose according to (14).

Step 4 —

Determine the tuning parameters , and to be the fractions of the largest singular values of relevant matrices to be penalized. That is, let

where denotes the largest singular value of a matrix. The constants , and control the smoothing effects; we fix and let and be based on a criterion similar to that used in Step 3. Finally, to further simplify the computation, we can approximate by .

Step 5 —

For each , calculate or using (9) and (10) or their fast versions given in (13).

Step 6 —

Compute the thresholds that minimize (15), and determine the graph using either of the two criteria. For example, if is the best threshold, then remove from the edge set if .

6. Numerical study

6.1. Additive and high-dimensional settings

By means of simulated examples, we compare the additive partial correlation operator with the additive conditional covariance operator of Li et al. (2014) and the methods of Yuan & Lin (2007), Liu et al. (2009), Fellinghauer et al. (2013) and Voorman et al. (2014). The additive partial correlation operator is able to identify the graph whose underlying distribution does not satisfy the Gaussian or copula Gaussian assumption. To demonstrate this feature, we generate dependent random variables that do not have Gaussian or copula Gaussian distributions using the structural equation models of Pearl (2009). Specifically, given an edge set Inline graphic , we generate sequentially via

where Inline graphic is the link function and are independent and identically distributed standard Gaussian variables. If is linear, the joint distribution is Gaussian; otherwise, the joint distribution may be neither Gaussian nor copula Gaussian.

We consider the following graphical models based on three choices of Inline graphic .

Model I: .
Model II: .
Model III:

The sample sizes are taken to be Inline graphic and 100, and the number of nodes is .

We use the hub structure to generate the underlying graphs and the corresponding edge sets Inline graphic . Hubs are commonly observed in networks such as gene regulatory networks and citation networks; see Newman (2003). Specifically, given a graph of size , ten independent hubs are generated so that each module is of degree . For each of the combinations, we generate 100 samples and produce the averaged receiver operating characteristic curves and the areas under these curves. To draw the curves, we need to compute the false positive and true positive rates. Suppose Inline graphic is an estimate of ; then the formal definitions of these two measures are

(16)

The receiver operating characteristic curves are plotted in Fig. 1.

Inline graphic — Receiver operating characteristic curves for different estimators: the additive partial correlation operator (——); the additive conditional covariance operator (black —–); the method of Yuan & Lin (2007) (black - - - -); the method of Liu et al. (2009) (); the method of Fellinghauer et al. (2013) (grey ——); and the method of Voorman et al. (2014) with default basis (grey - - - -). The two middle panels also display the method of Voorman et al. (2014) with the correct basis for Model II (grey - -- -). In each panel the horizontal axis shows the false positive rate and the vertical axis the true positive rate.

For all the comparisons in §§6.1 and 6.2, we use the radial basis function for both the additive conditional covariance and the additive partial correlation operators. For Model I, we see that the methods of Yuan & Lin (2007) and Liu et al. (2009) perform better than the nonparametric methods. This is not surprising as Gaussianity holds under Model I, and because both methods use the Inline graphic penalty, which is more efficient than thresholding. Nevertheless, the performance of the additive partial correlation operator is not far behind. For example, the areas under the receiver operating characteristic curves for the additive partial correlation operator have an average of 0 Inline graphic 98 for the two curves in Model I, only slightly smaller than the average of the areas under the curves for the methods of Yuan & Lin (2007) and Liu et al. (2009), which is 100.

For Models II and III, under which neither Gaussianity nor copula Gaussianity is satisfied, the methods of Yuan & Lin (2007) and Liu et al. (2009) do not perform well. In contrast, both the additive conditional covariance and the additive partial correlation operators still perform remarkably well. Moreover, the receiver operating characteristic curves of the additive partial correlation operator are consistently better than those of the additive conditional covariance operator for Models I and II and for sample sizes 50 and 100, indicating the benefit of a better scaling by the additive partial correlation operator. We also observe that the performance of the method of Fellinghauer et al. (2013) is not very stable. Since their method is based on random forests, it may be affected by the curse of dimensionality that a fully nonparametric approach tends to suffer from. The method of Voorman et al. (2014) is implemented using the R package (R Development Core Team, 2016) spacejam, whose default basis is the cubic polynomial. It shows improvements over the methods of Yuan & Lin (2007) and Liu et al. (2009), but does not perform as well as the additive partial correlation operator. To investigate the effect of the choice of basis on the method of Voorman et al. (2014), we compute its receiver operating characteristic curve for Model II using the correct basis Inline graphic . Notably, this method with the correct basis performs the best under Model II among all the competing methods. Results for smaller graphs are presented in the Supplementary Material.

6.2. Nonadditive and low-dimensional settings

We also investigate a setting where the relationships between nodes are nonadditive and the dimension of the graph is relatively low, which favours a fully nonparametric method such as the method of Fellinghauer et al. (2013). Specifically, we consider

Model IV: , , , , , ,

where Inline graphic are independent and identically distributed standard Gaussian variables.

Our goal is to recover the graph determined by the set of conditional independence relations Inline graphic whenever . Under Model IV, the edge set is The graphical model based on pairwise conditional independence cannot fully describe the interdependence in , because it cannot capture three-way or multi-way conditional dependence. A fully descriptive approach in such situations would be to use a hypergraph (Lauritzen, 1996, p. 21). Nevertheless, the pairwise conditional independence graphical model is well-defined and helps to illustrate the difference between an additive and a fully nonparametric model.

Taking Inline graphic , we compute the receiver operating characteristic curves for 100 replicates, which are presented in the Supplementary Material. Since the model is nonlinear, we only compare the additive partial correlation operator with the additive conditional covariance operator and the methods of Voorman et al. (2014) and Fellinghauer et al. (2013). The method of Fellinghauer et al. (2013) performs the best, because it allows the conditional expectation of each node to be a nonadditive function of its neighbouring nodes. On the other hand, the additive partial correlation operator still performs reasonably well. This indicates that, in spite of its additive formulation, the additive partial correlation operator is capable of identifying conditional independence even in nonadditive models.

6.3. Effects of the choices of kernels, ridge parameters and number of eigenfunctions

In this subsection we study the performance of the additive partial correlation operator with different choices of kernel. We investigate six types of kernel: the radial basis function, the rational quadratic kernel with parameters 200 and 400, the linear kernel, the quadratic kernel, and the Laplacian kernel. The choice of parameters for the rational quadratic kernel follows Li et al. (2014). For each model, ten replicates are generated using Inline graphic and . The averaged receiver operating characteristic curves for the six kernels are presented in the Supplementary Material. The results suggest that all the nonlinear kernels give comparable performance across Models I, II and III. As expected, the linear kernel fails for Models II and III.

Next, we investigate the sensitivity of the proposed estimator to the tuning parameters Inline graphic and . We take 20 equally spaced grid points in each of the ranges and , with and computed via the empirical formulas in §5.4. Then, for each of the combinations, a receiver operating characteristic curve is produced and its area under the curve is computed. The means of the areas for Models I, II and III are 0 Inline graphic 995, 0956 and 0971, respectively, with standard deviations of 0004, 0004 and 0011. These values indicate that the performance of the proposed estimator is reasonably robust with respect to the choice of tuning parameters. The actual receiver operating characteristic curves for different combinations of tuning parameters are plotted in the Supplementary Material.

We also investigated the effect of using different numbers of eigenfunctions. For each of Models I–III and with Inline graphic , we increase from 1 to and, for each fixed , produce a receiver operating characteristic curve and compute the area under the curve. The areas under the curves are reported in the Supplementary Material. The results show that the effect of using a different number of eigenfunctions varies across the three models, which is to be expected as they have different complexities. Specifically, a single eigenfunction achieves the largest area under the curve for the linear model, but for the nonlinear models the optimal areas are achieved when more eigenfunctions are used. We also see that our choices of Inline graphic are not far from the best choice for all three models.

6.4. Exploring the generalized crossvalidation procedure

In this subsection we investigate the performance and computational cost of the generalized crossvalidation procedure introduced in §5.3, and compare it with the method of Voorman et al. (2014) using two different selection criteria: the Akaike information criterion and the Bayesian information criterion. Three measures are used to evaluate the comparisons: the true positive and false positive rates in (16), and a synthetic score defined as

(17)

Table 1 shows the averages of these criteria over 100 replicates using Model III with Inline graphic and . We omit the result obtained from the method of Voorman et al. (2014) using the Bayesian information criterion, because the Akaike information criterion for the same method always performs better in this setting. Our procedure consistently picks up the thresholds located around the best scenario. In comparison, the method of Voorman et al. (2014) with the Akaike information criterion does not perform as well as our estimator.

Table 1.

Comparison of the tuning procedures for (a) the additive partial correlation operator with generalized crossvalidation and (b) the method of Voorman et al. 2014 with the Akaike information criterion; Inline graphic and are defined in (16), and is defined in (17); larger , smaller and lower indicate better performance



(a)	971	109	011	981	178	018	987	221	022
(b)	588	247	061	900	716	072	786	515	056

Open in a new tab

We also compare the computational costs of the two methods in estimating larger networks with Inline graphic or 5000. For the tuning parameters, 40 grid points are used for both the additive partial correlation operator and the method of Voorman et al. (2014). The results are reported in Table 2. With , our algorithm takes only minutes to complete, and for it is still reasonably efficient. In terms of estimation accuracy, our method has smaller Inline graphic than the method of Voorman et al. (2014). The complexity of the additive partial correlation operator grows as . However, for handling graphs with thousands of nodes, our method is faster than the regression-based approaches.

Table 2.

Comparison of computing times for (a) the additive partial correlation operator with generalized crossvalidation and (b) the method of Voorman et al.2014 with the Akaike information criterion. All experiments were conducted on an Intel Xeon E5520 with Inline graphic GHz CPU


	Minutes		Minutes		Minutes		Minutes
(a)	23	035	86	023	584	059	2133	054
(b)	477	087	1228	068	5534	097	15728	096

Open in a new tab

6.5. Application to the DREAM4 Challenges data

We apply the six methods to a dataset from the DREAM4 Challenges project (Marbach et al., 2010). The goal of this study is to infer network structure from gene expression data. The topologies of the graphs are obtained by extracting subgraphs from real biological networks. The gene expression levels are generated based on a system of ordinary differential equations governing the dynamics of the biological interactions between the genes. There are five networks of size 100 to be estimated in this dataset. For each network, we stack up observations from three different experimental conditions, wild-type, knockdown and knockout, so that the overall sample size Inline graphic is 201. Then, the estimated graphs are produced using the additive partial correlation operator, the additive conditional covariance operator of Li et al. (2014), and the methods of Voorman et al. (2014), Fellinghauer et al. (2013), Yuan & Lin (2007) and Liu et al. (2009). The areas under the receiver operating characteristic curves are reported in Table 3, and the actual receiver operating characteristic curves are displayed in the Supplementary Material. We see that the additive partial correlation operator consistently performs best among the six estimators.

Table 3.

Areas under the receiver operating characteristic curves for the DREAM4 Challenges dataset, obtained from (a) the additive partial correlation operator, (b) the additive conditional covariance operator, (c) the method of Voorman et al. (2014), (d) the method of Fellinghauer et al. 2013, (e) the method of Liu et al. 2009, (f) the method of Yuan & Lin 2007, and (g) the championship method

	(a)	(b)	(c)	(d)	(e)	(f)	(g)
Network 1	086	067	079	073	061	074	091
Network 2	081	062	070	064	057	070	081
Network 3	083	070	077	068	064	073	083
Network 4	083	071	076	071	061	072	083
Network 5	077	066	070	073	061	070	075

Open in a new tab

The original DREAM4 project was open to public challenges, so it is reasonable to compare our results with those submitted by the participating teams. In column (g) of Table 3 we show the areas under the receiver operating characteristic curves obtained from the method of the championship team. The additive partial correlation operator yields the best areas under the curves for four of the five networks; in particular, it performs better than the method of the championship team for Network 5. As mentioned in Marbach et al. (2010), the best-performing approach used a combination of multiple models, including ordinary differential equations. Our operator replicates the most competitive results without employing any prior information on the model setting, which demonstrates the benefit of relaxing the distributional assumption; moreover, its additive structure does not seem to hamper its accuracy in this application.

7. Concluding remarks

In establishing the consistency of the additive conditional covariance operator and the additive partial correlation operator, we have developed a theoretical framework that is not limited to the current setting; it can be applied to other problems where additive conditional independence and linear operators are involved. Moreover, the idea of characterizing conditional independence by small values of the additive partial correlation operator has ramifications beyond those explored in this paper. For instance, the penalty in the proposed additive partial correlation operator is based on hard thresholding, but other penalties, such as the lasso-type penalties, may be more efficient in dealing with sparsity in the estimation of operators. We leave these extensions and refinements to future research.

Supplementary material

Supplementary material available at Biometrika online includes the proofs of the theoretical results and additional plots for the numerical studies.

Supplementary Material

Supplementary Data

Click here for additional data file.^{(464.4KB, zip)}

Acknowledgments

We are grateful to three referees for their constructive comments and helpful suggestions. Bing Li's research was supported in part by the U.S. National Science Foundation; Hongyu Zhao's research was supported in part by both the National Science Foundation and the National Institutes of Health.

References

Bach F. R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9, 1179–225. [Google Scholar]
Baker C. R. (1973). Joint measures and cross-covariance operators. Trans. Am. Math. Soc. 186, 273–89. [Google Scholar]
Banerjee O., Ghaoui, El L. & d'Aspremont A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9, 485–516. [Google Scholar]
Bellman R. E. (1957). Dynamic Programming. Princeton: Princeton University Press. [Google Scholar]
Bickel P. J. & Levina E. (2008a). Covariance regularization by thresholding. Ann. Statist. 36, 2577–604. [Google Scholar]
Bickel P. J. & Levina E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist. 36, 199–227. [Google Scholar]
Cai T., Liu W. & Luo X. (2011). A constrained minimization approach to sparse precision matrix estimation. J. Am. Statist. Assoc. 106, 594–607. [Google Scholar]
Candès E. & Tao T. (2007). The Dantzig selector: Statistical estimation when is much larger than . Ann. Statist. 35, 2313–51. [Google Scholar]
Chen P.-C., Lee K.-Y., Lee T.-J., Lee Y.-J. & Huang S.-Y. (2010). Multiclass support vector classification via coding and regression. Neurocomputing 73, 1501–12. [Google Scholar]
Conway J. B. (1994). A Course in Functional Analysis. New York: Springer, 2nd ed. [Google Scholar]
Fan J., Feng Y. & Song R. (2011). Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Statist. Assoc. 106, 544–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fan J. & Li R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Assoc. 96, 1348–60. [Google Scholar]
Fellinghauer B., Bühlmann P., Ryffelb M., Rheinc M. & Reinhardta J. D. (2013). Stable graphical model estimation with random forests for discrete, continuous, and mixed variables. Comp. Statist. Data Anal. 64, 132–52. [Google Scholar]
Friedman J. H., Hastie T. J. & Tibshirani R. J. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fukumizu K., Bach F. R. & Gretton A. (2007). Statistical consistency of kernel canonical correlation analysis. J. Mach. Learn. Res. 8, 361–83. [Google Scholar]
Fukumizu K., Bach F. R. & Jordan M. I. (2004). Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res. 5, 73–99. [Google Scholar]
Fukumizu K., Bach F. R. & Jordan M. I. (2009). Kernel dimension reduction in regression. Ann. Statist. 37, 1871–905. [Google Scholar]
Fukumizu K., Gretton A., Sun X. & Schölkopf B. (2008). Kernel measures of conditional dependence. Adv. Neural Info. Proces. Syst. 20, 489–96. [Google Scholar]
Harris N. & Drton M. (2013). PC algorithm for nonparanormal graphical models. J. Mach. Learn. Res. 14, 3365–83. [Google Scholar]
Hoerl A. E. & Kennard R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55–67. [Google Scholar]
Horn R. A. & Johnson C. R. (1985). Matrix Analysis. Cambridge: Cambridge University Press. [Google Scholar]
Lam C. & Fan J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37, 4254–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lauritzen S. L. (1996). Graphical Models. Oxford: Oxford University Press. [Google Scholar]
Lee K.-Y., Li B. & Chiaromonte F. (2013). A general theory for nonlinear sufficient dimension reduction: Formulation and estimation. Ann. Statist. 41, 221–49. [Google Scholar]
Lee K.-Y., Li B. & Zhao H. (2016). Variable selection via additive conditional independence. J. R. Statist. Soc. B to appear, doi:10.1111/rssb.12150.
Lee Y.-J. & Huang S.-Y. (2007). Reduced support vector machines: A statistical theory. IEEE Trans. Neural Networks 18, 1–13. [DOI] [PubMed] [Google Scholar]
Li B., Chun H. & Zhao H. (2012). Sparse estimation of conditional graphical models with application to gene networks. J. Am. Statist. Assoc. 107, 152–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li B., Chun H. & Zhao H. (2014). On an additive semi-graphoid model for statistical networks with application to pathway analysis. J. Am. Statist. Assoc. 109, 1188–204. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liu H., Han F., Yuan M., Lafferty J. & Wasserman L. (2012). High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist. 40, 2293–326. [Google Scholar]
Liu H., Lafferty J. & Wasserman L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10, 2295–328. [PMC free article] [PubMed] [Google Scholar]
Marbach D., Prill R. J., Schaffter T., Mattiussi C., Floreano D. & Stolovitzky G. (2010). Revealing strengths and weaknesses of methods for gene network inference. Proc. Nat. Acad. Sci. 107, 6286–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
Meinshausen N. & Bühlmann P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34, 1436–62. [Google Scholar]
Muirhead R. J. (2005). Aspects of Multivariate Statistical Theory. New York: Wiley, 2nd ed. [Google Scholar]
Newman M. (2003). The structure and function of complex networks. SIAM Rev. 45, 167–256. [Google Scholar]
Pearl J. (2009). Causality: Models, Reasoning and Inference. Cambridge: Cambridge University Press, 2nd ed. [Google Scholar]
Pearl J., Geiger D. & Verma T. (1989). Conditional independence and its representations. Kybernetika 25, 33–44. [Google Scholar]
Pearl J. & Verma T. (1987). The logic of representing dependencies by directed graphs. In Proceedings of the Sixth National Conference on Artificial Intelligence, vol. 1. AAAI Press, pp. 374–9.
Peng J., Wang P., Zhou N. & Zhu J. (2009). Partial correlation estimation by joint sparse regression models. J. Am. Statist. Assoc. 104, 735–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
R Development Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.
Tibshirani R. J. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–88. [Google Scholar]
Voorman A., Shojaie A. & Witten D. (2014). Graph estimation with joint additive models. Biometrika 101, 85–101. [DOI] [PMC free article] [PubMed] [Google Scholar]
Xue L. & Zou H. (2012). Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann. Statist. 40, 2541–71. [Google Scholar]
Yuan M. (2010). High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 11, 2261–86. [Google Scholar]
Yuan M. & Lin Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94, 19–35. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Data

Click here for additional data file.^{(464.4KB, zip)}

[ASW028C1] Bach F. R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9, 1179–225. [Google Scholar]

[ASW028C2] Baker C. R. (1973). Joint measures and cross-covariance operators. Trans. Am. Math. Soc. 186, 273–89. [Google Scholar]

[ASW028C3] Banerjee O., Ghaoui, El L. & d'Aspremont A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9, 485–516. [Google Scholar]

[ASW028C4] Bellman R. E. (1957). Dynamic Programming. Princeton: Princeton University Press. [Google Scholar]

[ASW028C5] Bickel P. J. & Levina E. (2008a). Covariance regularization by thresholding. Ann. Statist. 36, 2577–604. [Google Scholar]

[ASW028C6] Bickel P. J. & Levina E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist. 36, 199–227. [Google Scholar]

[ASW028C7] Cai T., Liu W. & Luo X. (2011). A constrained minimization approach to sparse precision matrix estimation. J. Am. Statist. Assoc. 106, 594–607. [Google Scholar]

[ASW028C8] Candès E. & Tao T. (2007). The Dantzig selector: Statistical estimation when is much larger than . Ann. Statist. 35, 2313–51. [Google Scholar]

[ASW028C9] Chen P.-C., Lee K.-Y., Lee T.-J., Lee Y.-J. & Huang S.-Y. (2010). Multiclass support vector classification via coding and regression. Neurocomputing 73, 1501–12. [Google Scholar]

[ASW028C10] Conway J. B. (1994). A Course in Functional Analysis. New York: Springer, 2nd ed. [Google Scholar]

[ASW028C11] Fan J., Feng Y. & Song R. (2011). Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Statist. Assoc. 106, 544–57. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ASW028C12] Fan J. & Li R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Assoc. 96, 1348–60. [Google Scholar]

[ASW028C13] Fellinghauer B., Bühlmann P., Ryffelb M., Rheinc M. & Reinhardta J. D. (2013). Stable graphical model estimation with random forests for discrete, continuous, and mixed variables. Comp. Statist. Data Anal. 64, 132–52. [Google Scholar]

[ASW028C14] Friedman J. H., Hastie T. J. & Tibshirani R. J. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–41. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ASW028C15] Fukumizu K., Bach F. R. & Gretton A. (2007). Statistical consistency of kernel canonical correlation analysis. J. Mach. Learn. Res. 8, 361–83. [Google Scholar]

[ASW028C16] Fukumizu K., Bach F. R. & Jordan M. I. (2004). Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res. 5, 73–99. [Google Scholar]

[ASW028C17] Fukumizu K., Bach F. R. & Jordan M. I. (2009). Kernel dimension reduction in regression. Ann. Statist. 37, 1871–905. [Google Scholar]

[ASW028C18] Fukumizu K., Gretton A., Sun X. & Schölkopf B. (2008). Kernel measures of conditional dependence. Adv. Neural Info. Proces. Syst. 20, 489–96. [Google Scholar]

[ASW028C19] Harris N. & Drton M. (2013). PC algorithm for nonparanormal graphical models. J. Mach. Learn. Res. 14, 3365–83. [Google Scholar]

[ASW028C20] Hoerl A. E. & Kennard R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55–67. [Google Scholar]

[ASW028C21] Horn R. A. & Johnson C. R. (1985). Matrix Analysis. Cambridge: Cambridge University Press. [Google Scholar]

[ASW028C22] Lam C. & Fan J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37, 4254–78. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ASW028C23] Lauritzen S. L. (1996). Graphical Models. Oxford: Oxford University Press. [Google Scholar]

[ASW028C24] Lee K.-Y., Li B. & Chiaromonte F. (2013). A general theory for nonlinear sufficient dimension reduction: Formulation and estimation. Ann. Statist. 41, 221–49. [Google Scholar]

[ASW028C25] Lee K.-Y., Li B. & Zhao H. (2016). Variable selection via additive conditional independence. J. R. Statist. Soc. B to appear, doi:10.1111/rssb.12150.

[ASW028C26] Lee Y.-J. & Huang S.-Y. (2007). Reduced support vector machines: A statistical theory. IEEE Trans. Neural Networks 18, 1–13. [DOI] [PubMed] [Google Scholar]

[ASW028C27] Li B., Chun H. & Zhao H. (2012). Sparse estimation of conditional graphical models with application to gene networks. J. Am. Statist. Assoc. 107, 152–67. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ASW028C28] Li B., Chun H. & Zhao H. (2014). On an additive semi-graphoid model for statistical networks with application to pathway analysis. J. Am. Statist. Assoc. 109, 1188–204. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ASW028C29] Liu H., Han F., Yuan M., Lafferty J. & Wasserman L. (2012). High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist. 40, 2293–326. [Google Scholar]

[ASW028C30] Liu H., Lafferty J. & Wasserman L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10, 2295–328. [PMC free article] [PubMed] [Google Scholar]

[ASW028C31] Marbach D., Prill R. J., Schaffter T., Mattiussi C., Floreano D. & Stolovitzky G. (2010). Revealing strengths and weaknesses of methods for gene network inference. Proc. Nat. Acad. Sci. 107, 6286–91. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ASW028C32] Meinshausen N. & Bühlmann P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34, 1436–62. [Google Scholar]

[ASW028C33] Muirhead R. J. (2005). Aspects of Multivariate Statistical Theory. New York: Wiley, 2nd ed. [Google Scholar]

[ASW028C34] Newman M. (2003). The structure and function of complex networks. SIAM Rev. 45, 167–256. [Google Scholar]

[ASW028C35] Pearl J. (2009). Causality: Models, Reasoning and Inference. Cambridge: Cambridge University Press, 2nd ed. [Google Scholar]

[ASW028C36] Pearl J., Geiger D. & Verma T. (1989). Conditional independence and its representations. Kybernetika 25, 33–44. [Google Scholar]

[ASW028C37] Pearl J. & Verma T. (1987). The logic of representing dependencies by directed graphs. In Proceedings of the Sixth National Conference on Artificial Intelligence, vol. 1. AAAI Press, pp. 374–9.

[ASW028C38] Peng J., Wang P., Zhou N. & Zhu J. (2009). Partial correlation estimation by joint sparse regression models. J. Am. Statist. Assoc. 104, 735–46. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ASW028C39] R Development Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.

[ASW028C40] Tibshirani R. J. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–88. [Google Scholar]

[ASW028C41] Voorman A., Shojaie A. & Witten D. (2014). Graph estimation with joint additive models. Biometrika 101, 85–101. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ASW028C42] Xue L. & Zou H. (2012). Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann. Statist. 40, 2541–71. [Google Scholar]

[ASW028C43] Yuan M. (2010). High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 11, 2261–86. [Google Scholar]

[ASW028C44] Yuan M. & Lin Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94, 19–35. [Google Scholar]



(a)	971	109	011	981	178	018	987	221	022
(b)	588	247	061	900	716	072	786	515	056



(a)	971	109	011	981	178	018	987	221	022
(b)	588	247	061	900	716	072	786	515	056

PERMALINK

On an additive partial correlation operator and nonparametric estimation of graphical models

Kuang-Yao Lee

Bing Li

Hongyu Zhao

Abstract

1. Introduction

2. Additive conditional independence and graphical models

2.1. Additive conditional independence

Definition 1 —

Definition 2 —

2.2. Additive reproducing kernel Hilbert spaces

Assumption 1 —

Definition 3 —

2.3. Other notation

3. Additive partial covariance operator

3.1. The additive conditional covariance operator

Assumption 2 —

Definition 4 —

3.2. The additive partial correlation operator

Theorem 1 —

Definition 5 —

Assumption 3 —

Theorem 2 —

3.3. Estimators

4. Consistency and convergence rate

Assumption 4 —

Theorem 3 —

Theorem 4 —

Theorem 5 —

5. Implementation of estimation of graphical models

5.1. Coordinate representation

5.2. Norms of the estimated additive partial correlation operator

Corollary 1 —

5.3. Reduced kernel and generalized crossvalidation

5.4. Algorithm

Step 1 —

Step 2 —

Step 3 —

Step 4 —

Step 5 —

Step 6 —

6. Numerical study

6.1. Additive and high-dimensional settings

Fig. 1.

6.2. Nonadditive and low-dimensional settings

6.3. Effects of the choices of kernels, ridge parameters and number of eigenfunctions

6.4. Exploring the generalized crossvalidation procedure

Table 1.

Table 2.

6.5. Application to the DREAM4 Challenges data

Table 3.

7. Concluding remarks

Supplementary material

Supplementary Material

Acknowledgments

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases



(a)	971	109	011	981	178	018	987	221	022
(b)	588	247	061	900	716	072	786	515	056