Author manuscript; available in PMC: 2024 Jan 1.
Published in final edited form as: J Am Stat Assoc. 2021 Jun 22;118(541):257–271. doi: 10.1080/01621459.2021.1924178

Conditional Functional Graphical Models

Kuang-Yao Lee a, Dingjue Ji b, Lexin Li c, Todd Constable b,d, Hongyu Zhao b
PMCID: PMC10181795  NIHMSID: NIHMS1746368  PMID: 37193511

Abstract

Graphical modeling of multivariate functional data is becoming increasingly important in a wide variety of applications. Changes in the graph structure can often be attributed to external variables, such as the diagnosis status or time, the latter of which gives rise to the problem of dynamic graphical modeling. Most existing methods focus on estimating the graph by aggregating samples, but largely ignore the subject-level heterogeneity due to the external variables. In this article, we introduce a conditional graphical model for multivariate random functions, where we treat the external variables as the conditioning set, and allow the graph structure to vary with the external variables. Our method is built on two new linear operators, the conditional precision operator and the conditional partial correlation operator, which extend the precision matrix and the partial correlation matrix to both the conditional and functional settings. We show that their nonzero elements can be used to characterize the conditional graphs, and develop the corresponding estimators. We establish the uniform convergence of the proposed estimators and the consistency of the estimated graph, while allowing the graph size to grow with the sample size, and accommodating both completely and partially observed data. We demonstrate the efficacy of the method through both simulations and a study of brain functional connectivity networks.

Keywords: Brain connectivity analysis, Functional magnetic resonance imaging, Graphical model, Karhunen–Loève expansion, Linear operator, Reproducing kernel Hilbert space

1. Introduction

Functional graphical modeling has been gaining increasing attention in recent years, where the central goal is to investigate the interdependence among multivariate random functions. Applications include time-course gene expression data in genomics (Wei and Li 2008), multivariate time series data in finance (Tsay and Pourahmadi 2017), and electrocorticography and functional magnetic resonance imaging data in neuroimaging (Zhang et al. 2015), among many others.

Our motivation is brain functional connectivity analysis based on functional magnetic resonance imaging (fMRI). Functional MRI measures brain neural activities via blood oxygen-level-dependent signals. It depicts the brain functional connectivity network, which has been shown to alter under different disorders and during different brain developmental stages. Such alterations contain crucial insights into both disorder pathology and brain development (Fox and Greicius 2010). The fMRI data are often summarized in the form of a location-by-time matrix for each individual subject. The rows correspond to a set of brain regions, and the columns correspond to time points that are usually 500 msec to 2 sec apart and span a few minutes in total. From the fMRI scans, a graph is constructed, where nodes represent brain regions, and links represent interactions and dependencies among the regions (Fornito, Zalesky, and Breakspear 2013). Numerous statistical methods have been developed to estimate the functional connectivity network. Most of these methods treat the fMRI data as multivariate random variables with repeated observations, where each region is represented by a random variable and the time-course data are taken as repeated measures of that variable (e.g., Bullmore and Sporns 2009; Wang et al. 2016b). There are recent proposals to model the fMRI data as multivariate functions, where the time-course data of each region are taken as a function (e.g., Li and Solea 2018). Given the continuous nature and the short time interval between adjacent sampling points of fMRI, we treat the data as multivariate functions, and formulate the connectivity network estimation as a functional graphical modeling problem in this article.

Nearly all existing graph estimation methods tackle the problem by aggregating samples, sometimes according to the diagnostic groups. However, there is considerable subject-level heterogeneity, which may contain crucial information for our understanding of the network, but has been largely untapped or ignored by existing methods (Fornito, Zalesky, and Breakspear 2013). Heterogeneity could arise from a subject's phenotype profile; for example, in our study in Section 7, the brain connectivity network may vary with an individual's intelligence score. It could also arise from a time variable, for example, the subject's age, in which case the connectivity network may vary with age, leading to dynamic graphical modeling. In this article, we introduce a conditional functional graphical model for a set of random functions, by modeling the external variables such as the phenotype or age as the conditioning set. Our proposal thus extends two lines of existing and relevant research: from conditional and dynamic graphical modeling of random variables to that of random functions, and from unconditional functional graphical modeling to conditional functional graphical modeling.

The first line of relevant research is the class of graphical models of random variables. Within this class, there is a rich literature on unconditional graphical models (Yuan and Lin 2007; Friedman et al. 2008; Peng et al. 2009, among others). There are extensions to joint estimation of multiple graphs, which arise from a small number of groups, typically two or three, according to an external variable such as the diagnostic status (Danaher, Wang, and Witten 2011; Chun, Zhang, and Zhao 2015; Lee and Liu 2015; Zhu and Li 2018). There have also been some recent proposals of dynamic graphical models (Kolar et al. 2010; Xu and Hero 2014; Zhang and Cao 2017). However, they considered only a discrete-time setting, in which the network is estimated at a small number of discrete time points. The most relevant work to ours is Qiu et al. (2016), who also targeted estimation of the individual graph according to an external variable, for example, age. Although both Qiu et al. (2016) and our method are designed for graph estimation at the individual level, the two solutions differ in many ways. In particular, Qiu et al. (2016) assumed that the repeated observations of each individual come from the same Gaussian distribution, whose dependency was required to follow a certain stationary time series structure. By contrast, we treat the repeated measurements as realizations from a random function, and do not impose any structural or relational assumption on the entire function. More importantly, compared to the random variable setting in Qiu et al. (2016), our functional setting involves an entirely different and new set of modeling techniques and theoretical tools.

The second line of relevant research is the class of unconditional graphical models of random functions, which appeared only recently. Qiao, Guo, and James (2019) introduced a functional Gaussian graphical model, by assuming that the random functions follow a multivariate Gaussian distribution. Li and Solea (2018) relaxed the Gaussian assumption, developed a precision operator that generalizes the concept of precision matrix to the functional setting, and used it to estimate the unconditional functional graph. However, their precision operator is nonrandom, and their graph dimension is fixed. By contrast, we introduce a conditional precision operator (CPO), which is a function of the conditioning variable and is thus random, and we allow the graph dimension to diverge with the sample size. These differences bring extra challenges in analyzing the operator-based statistics. Moreover, because the relation between the CPO and the conditioning variable can be nonlinear, its estimation requires the construction of a reproducing kernel Hilbert space (RKHS) of the conditioning variable, which leads to a more complex asymptotic analysis than that of Li and Solea (2018). We also derive a number of concentration bounds and uniform convergence for our proposed estimators, while such results are not available in Li and Solea (2018).

To address the problem of conditional graphical modeling of random functions, we introduce two new linear operators: the CPO, and the conditional partial correlation operator (CPCO), which extend the precision matrix and the partial correlation matrix from the random variable setting to both the conditional and functional settings. We show that, when the conditional distribution is Gaussian, the conditional graph can be fully captured by the nonzero elements of the CPO or the CPCO. This property echoes the classic result that a static graph can be inferred from the precision matrix or the partial correlation matrix under the Gaussian assumption. We note that some early works, such as Lee, Li, and Zhao (2016a,b), also estimated the parameters of interest through linear operators. However, those works studied utterly different problems: Lee, Li, and Zhao (2016b) targeted variable selection in classical regressions, and Lee, Li, and Zhao (2016a) targeted unconditional graph estimation for random variables, while we target conditional graph estimation for random functions. Both the methodology and theory involved are thus substantially different.

Our proposal makes useful contributions on multiple fronts. On the method side, it offers a new class of statistical models to study conditional graph estimation for multivariate functional data, a problem that remains largely unaddressed. We investigate the parallels between the random variable-based and random function-based graphs, and between the unconditional and conditional graphs. On the theory side, our work develops new tools for operator-based functional data analysis. We establish the conditional graph estimation consistency, along with a set of concentration inequalities and error bounds, for our proposed method. To our knowledge, very little work has investigated function-on-function dependency at such a level of complexity that involves estimating the linear operators under a conditional framework. The tools we develop are general, and can be applied to other settings in high-dimensional functional data analysis. On the computation side, under a properly defined coordinate system, the proposed operators are functions of sample operators whose coordinate representations are of dimension n × n, with n being the sample size. They are relatively easy to calculate, and the accompanying estimation algorithm can be scaled up to large graphs.

The rest of the article is organized as follows. We begin with a formal definition of conditional functional graphical model in Section 2. We introduce a series of linear operators in Section 3, develop their estimators in Section 4, and study their asymptotic properties in Section 5. We report the simulations in Section 6, and an analysis of an fMRI dataset in Section 7. We relegate all proofs and some additional results to the supplementary appendix.

2. Model

In this section, we formally define the conditional functional graphical model. Let $(\Omega_X, \mathcal{F}_X)$ be a measurable space. Suppose $\Omega_{X_i}$ is a Hilbert space of $\mathbb{R}$-valued functions on an interval $\mathcal{T}$, for $i = 1, \ldots, p$, and $\Omega_X$ is the Cartesian product $\Omega_{X_1} \times \cdots \times \Omega_{X_p}$. Suppose $X = (X_1, \ldots, X_p)$ is a $p$-dimensional random element on $\Omega_X$. Let $G = (V, E)$ be an undirected graph, where $V = \{1, \ldots, p\}$ represents the set of vertices corresponding to the $p$ random functions, and $E \subseteq \{(i, j) \in V \times V: i \neq j\}$ represents the set of undirected edges. A common approach to modeling an undirected graph is to associate separation on the graph with conditional independence; in other words, nodes $i$ and $j$ are separated in $G$ if and only if $X_i$ and $X_j$ are independent given the rest of $X$, that is,

$$(i,j) \notin E \;\Longleftrightarrow\; X_i \perp\!\!\!\perp X_j \mid X_{-(i,j)}, \qquad (1)$$

where X−(i,j) represents X with its ith and jth components removed, and ⫫ represents statistical independence. Based on Equation (1), Qiao, Guo, and James (2019) proposed a functional graphical model, which assumed that X follows a multivariate Gaussian distribution.

Next, we introduce a conditional functional graphical model that allows the graph links to vary with the external variables. We focus on the case of a univariate external variable, but our method can be generalized to multivariate external variables, or external functions. Let $Y$ be a random element defined on $\Omega_Y$, and $\mathcal{F}_Y$ be the Borel $\sigma$-field generated by the open sets in $\Omega_Y$. Let $P_Y$ and $P_{X \mid Y}(\cdot \mid \cdot)$, defined on $\mathcal{F}_X \times \Omega_Y$, denote the distribution of $Y$ and the conditional distribution of $X$ given $Y$, respectively. We next give our formal definition.

Definition 1.

Suppose a random graph $E^y$, for each $y \in \Omega_Y$, is defined via the mapping $\Omega_Y \to 2^{V \times V}$, $y \mapsto E^y$, where $2^{V \times V}$ is the power set of $V \times V$. We say $X$ follows a conditional functional graphical model with respect to $E^y$ if and only if, for $y \in \Omega_Y$,

$$(i,j) \notin E^y \;\Longleftrightarrow\; X_i \perp\!\!\!\perp X_j \mid [X_{-(i,j)}, Y = y]. \qquad (2)$$

We note that Li, Chun, and Zhao (2012) introduced the notion of conditional graphical model. However, our notion of conditional functional graphical model is considerably different, in that our model extends theirs not only from the setting of random variables to random functions, but also from the setting of static graphs to random graphs. Specifically, letting X = (X1, …, Xp) and Y = (Y1, …, Yq) denote two random vectors, Li, Chun, and Zhao (2012) considered the model,

$$(i,j) \notin E_0 \;\Longleftrightarrow\; X_i \perp\!\!\!\perp X_j \mid [X_{-(i,j)}, Y = y]$$

for all $y \in \mathbb{R}^q$. In this model, $E_0 \subseteq 2^{V \times V}$ is a fixed graph, and does not change with the value of $Y$. In comparison, our model in Equation (2) allows $X$ to be a $p$-variate random function, and the graph $E^y$ to vary with $Y$.

3. Linear Operators

In this section, we first introduce a series of linear operators, based on which we then formally define the CPO and the CPCO. Finally, we study the relation between these two operators and the conditional functional graph.

We adopt the following notation throughout this article. For two generic Hilbert spaces $\Omega$ and $\Omega'$, let $\mathcal{B}(\Omega, \Omega')$ and $\mathcal{B}_2(\Omega, \Omega')$ denote the classes of all bounded and all Hilbert-Schmidt operators from $\Omega$ to $\Omega'$, respectively. We abbreviate $\mathcal{B}(\Omega, \Omega)$ and $\mathcal{B}_2(\Omega, \Omega)$ as $\mathcal{B}(\Omega)$ and $\mathcal{B}_2(\Omega)$ whenever appropriate. Let $\|\cdot\|$ and $\|\cdot\|_{\mathrm{HS}}$ denote the operator norm in $\mathcal{B}(\Omega, \Omega')$ and the Hilbert-Schmidt norm in $\mathcal{B}_2(\Omega, \Omega')$. Moreover, let $\ker(A)$, $\mathrm{range}(A)$, and $\overline{\mathrm{range}}(A)$ denote the null space, the range, and the closure of the range of an operator $A$, respectively.

3.1. Conditional Covariance and Correlation Operators

We first define three covariance operators, $V_{YY}$, $V_{X_iX_j}$, and $V_{YX_{ij}}$. We then define the conditional covariance operator $V_{X_iX_j}^{y}$, and the conditional correlation operator $\mathcal{R}_{X_iX_j}^{y}$, which is the building block of the CPO and the CPCO.

Let $\kappa_Y: \Omega_Y \times \Omega_Y \to \mathbb{R}$ be a positive-definite kernel, $\mathcal{H}_Y$ be its corresponding RKHS, and $L_2(P_Y)$ be the collection of all square-integrable functions of $Y$ under $P_Y$. The next assumption ensures the square-integrability of $X_i$ under $P_{X \mid Y}$, and that $\mathcal{H}_Y$ is a subset of $L_2(P_Y)$.

Assumption 1.

There exist $M_0 > 0$ and $M_Y > 0$ such that $\sup\{E(\|X_i\|_{\Omega_{X_i}}^2 \mid y): y \in \Omega_Y\} \le M_0$, for $i = 1, \ldots, p$, and $\kappa_Y(Y, Y) \le M_Y$.

We comment that we choose the RKHS as the modeling space, so that the relation of $X$ on $Y$ can be very flexible, and the kernel matrix of $Y$ is of dimension $n \times n$, an attractive feature when the dimension of $Y$ is large compared with $n$. In addition, a good number of asymptotic tools for RKHS operators have been developed (see, e.g., Bach 2009; Lee, Li, and Zhao 2016a). That being said, our theoretical development can also be easily extended to spaces beyond the RKHS. In fact, our population development only requires that $\mathcal{H}_Y$ is a proper subset of $L_2(P_Y)$, which can be ensured by the square-integrability condition that $\mathrm{var}[h(Y)] \le M \|h\|_{\mathcal{H}_Y}^2$ for an $M > 0$ and every $h \in \mathcal{H}_Y$.

Let $\otimes$ denote the tensor product; then $\kappa_Y(\cdot, Y) \otimes \kappa_Y(\cdot, Y)$, $X_i \otimes X_j$, and $\kappa_Y(\cdot, Y) \otimes (X_i \otimes X_j)$ are random elements in $\mathcal{B}_2(\mathcal{H}_Y)$, $\mathcal{B}_2(\Omega_{X_j}, \Omega_{X_i})$, and $\mathcal{B}_2[\mathcal{B}_2(\Omega_{X_j}, \Omega_{X_i}), \mathcal{H}_Y]$, respectively. Their expectations uniquely define the covariance operators,

$$\begin{aligned} V_{YY} &= E[\kappa_Y(\cdot, Y) \otimes \kappa_Y(\cdot, Y)], &&\text{via } \langle h_1, V_{YY} h_2 \rangle = E[h_1(Y) h_2(Y)],\\ V_{X_iX_j} &= E(X_i \otimes X_j), &&\text{via } \langle f, V_{X_iX_j} g \rangle = E(\langle f, X_i \rangle \langle g, X_j \rangle),\\ V_{YX_{ij}} &= E[\kappa_Y(\cdot, Y) \otimes (X_i \otimes X_j)], &&\text{via } \langle h, V_{YX_{ij}}(f \otimes g) \rangle = E[\langle X_i, f \rangle \langle X_j, g \rangle h(Y)], \end{aligned} \qquad (3)$$

for all $f \in \Omega_{X_i}$, $g \in \Omega_{X_j}$, and $h, h_1, h_2 \in \mathcal{H}_Y$. The next proposition justifies the existence of $V_{YY}$, $V_{X_iX_j}$, and $V_{YX_{ij}}$.

Proposition 1.

If Assumption 1 holds, then there exist linear operators VYY, VXiXj, and VYXij satisfying the relations in Equation (3).

We next introduce a regression operator, $M_{X_{ij} \mid Y}$, through the relation $M_{X_{ij} \mid Y} = V_{YY}^{\dagger} V_{YX_{ij}}$, where $\dagger$ denotes the Moore-Penrose inverse. We first need an assumption ensuring that the ranges of $V_{YY}$ and $V_{YX_{ij}}$ are compatible, and that $\mathcal{H}_Y$ is sufficiently rich in $L_2(P_Y)$.

Assumption 2.

For every $(i, j) \in V \times V$, $\mathrm{range}(V_{YX_{ij}}) \subseteq \mathrm{range}(V_{YY})$. Moreover, $\mathcal{H}_Y$ is dense in $L_2(P_Y)$.

By Assumption 2, for any $h \in \mathrm{range}(V_{YY})$, there exists a unique $h' \in \overline{\mathrm{range}}(V_{YY})$ such that $h = V_{YY} h'$. Therefore, the inverse $V_{YY}^{\dagger}$ is defined as $V_{YY}^{\dagger}: \mathrm{range}(V_{YY}) \to \overline{\mathrm{range}}(V_{YY})$, $h \mapsto h'$, which implies that $M_{X_{ij} \mid Y}$ is well defined. The range condition that $\mathrm{range}(V_{YX_{ij}}) \subseteq \mathrm{range}(V_{YY})$ is generally satisfied. For instance, it holds when the rank of $\mathcal{B}_2(\Omega_{X_j}, \Omega_{X_i})$ is finite, which is reasonable, because in practice $\Omega_{X_i}$ can often be approximated by the span of a few leading eigenfunctions of $V_{X_iX_i}$.

The next proposition shows that $M_{X_{ij} \mid Y}$ maps every $f \otimes g$, with $(f, g) \in \Omega_{X_i} \times \Omega_{X_j}$, to the conditional expectation $E(\langle f, X_i \rangle \langle g, X_j \rangle \mid y)$, and thus this operator can be viewed as a regression operator.

Proposition 2.

If Assumptions 1 and 2 hold, then for all $y \in \Omega_Y$ and $(f, g) \in \Omega_{X_i} \times \Omega_{X_j}$, we have $[M_{X_{ij} \mid Y}(f \otimes g)](y) = E(\langle f, X_i \rangle \langle g, X_j \rangle \mid y)$.

Now we are ready to define the conditional covariance operator, whose existence is justified by Proposition 2 and the Riesz representation theorem.

Definition 2.

For each $y \in \Omega_Y$, the bilinear form on $\Omega_{X_i} \times \Omega_{X_j}$, $(f, g) \mapsto M_{X_{ij} \mid Y}(f \otimes g)(y)$, uniquely defines an operator $V_{X_iX_j}^{y} \in \mathcal{B}(\Omega_{X_j}, \Omega_{X_i})$, via $\langle f, V_{X_iX_j}^{y} g \rangle_{\Omega_{X_i}} = E(\langle f, X_i \rangle \langle g, X_j \rangle \mid y)$ for all $(f, g) \in \Omega_{X_i} \times \Omega_{X_j}$. We call $V_{X_iX_j}^{y}$ the conditional covariance operator.

Note that the mapping $\Omega_Y \to \mathcal{B}(\Omega_{X_j}, \Omega_{X_i})$, $y \mapsto V_{X_iX_j}^{y}$, defines a random operator. If the conditional expectations $E(\langle f, X_i \rangle \mid y) = 0$, then $V_{X_iX_j}^{y}$ induces the conditional covariance $\mathrm{cov}(\langle f, X_i \rangle, \langle g, X_j \rangle \mid y)$. When $X_i$ and $X_j$ are random vectors, Fukumizu, Bach, and Jordan (2009) introduced the homoscedastic conditional covariance operator, which induces $E[\mathrm{cov}(\langle f, X_i \rangle, \langle g, X_j \rangle \mid Y)]$. Our conditional covariance operator is different from that of Fukumizu, Bach, and Jordan (2009), as it extends the classical conditional covariance to the functional setting, and it deals directly with $\mathrm{cov}(\langle f, X_i \rangle, \langle g, X_j \rangle \mid y)$, instead of its expectation. We write the joint operator $V_{XX}^{y}: \Omega_X \to \Omega_X$ as the block matrix of operators whose $(i, j)$th element is $V_{X_iX_j}^{y}$, $1 \le i, j \le p$, or more explicitly, $V_{XX}^{y} f = (\sum_{j=1}^{p} V_{X_1X_j}^{y} f_j, \ldots, \sum_{j=1}^{p} V_{X_pX_j}^{y} f_j)$, for any $f = (f_1, \ldots, f_p) \in \Omega_X$.

Given the conditional covariance operator $V_{X_iX_j}^{y}$, we next define the conditional correlation operator, $\mathcal{R}_{X_iX_j}^{y}: \Omega_{X_j} \to \Omega_{X_i}$, for each $y \in \Omega_Y$, via

$$V_{X_iX_j}^{y} = (V_{X_iX_i}^{y})^{1/2} \, \mathcal{R}_{X_iX_j}^{y} \, (V_{X_jX_j}^{y})^{1/2}, \quad \text{with } \|\mathcal{R}_{X_iX_j}^{y}\| \le 1. \qquad (4)$$

Its existence and uniqueness are ensured by Baker (1973). Similar to the construction of $V_{XX}^{y}$, we write the joint operator $\mathcal{R}_{XX}^{y}: \Omega_X \to \Omega_X$ as the block matrix of operators whose $(i, j)$th element is $\mathcal{R}_{X_iX_j}^{y}$, $1 \le i, j \le p$. Let $D_X^{y}$ denote the block diagonal matrix of operators with $[D_X^{y}]_{i,i} = V_{X_iX_i}^{y}$, for $i \in V$. We then have $V_{XX}^{y} = [D_X^{y}]^{1/2} \mathcal{R}_{XX}^{y} [D_X^{y}]^{1/2}$.

Next, we impose the distributional assumption on X | y.

Assumption 3.

Suppose X | y follows the conditional functional graphical model as defined via (2), and the conditional distribution of X = (X1, …, Xp) given Y = y follows a centered Gaussian distribution.

The zero-mean condition is imposed to simplify both methodological and theoretical development, and can be relaxed with some modifications. Moreover, we require the conditional distribution X | y to follow a Gaussian distribution. It is possible to relax this Gaussian assumption, by extending the notion of functional additive conditional independence (Li and Solea 2018), or the copula graphical model (Liu et al. 2012). However, we feel that the Gaussian case itself is worthy of a full investigation, and we leave the non-Gaussian extension as future research.

Assumption 3, together with Definition 2, implies that, for $f = (f_1, \ldots, f_p) \in \Omega_X$, $E[\exp(\iota \sum_{i=1}^{p} \langle f_i, X_i \rangle_{\Omega_{X_i}}) \mid y] = \exp(-\tfrac{1}{2} \sum_{i,j=1}^{p} \langle f_i, V_{X_iX_j}^{y} f_j \rangle_{\Omega_{X_i}})$, where $\iota = \sqrt{-1}$.

3.2. Conditional Precision and Partial Correlation Operators

We next formally define the CPO and the CPCO, and then establish their relations with the conditional functional graph. We introduce two assumptions, which ensure that $\mathcal{R}_{X_iX_j}^{y}$ is Hilbert-Schmidt and $\mathcal{R}_{XX}^{y}$ is invertible. We present some intuition here, but relegate the detailed technical discussion to Section S.3 in the Appendix.

Assumption 4.

For each $y \in \Omega_Y$ and $i \in V$, let $\{(\lambda_i^{y,a}, \eta_i^{y,a})\}_a$ denote the collection of eigenvalue and eigenfunction pairs of $V_{X_iX_i}^{y}$. Let $\mathcal{I}_i^{y} = \{a: \lambda_i^{y,a} > 0\}$. There exists $c_1 > 0$ such that

$$\max_{i,j \in V, i \neq j} \; \sum_{a \in \mathcal{I}_i^{y}, \, b \in \mathcal{I}_j^{y}} \frac{\mathrm{cov}^2(\langle \eta_i^{y,a}, X_i \rangle, \langle \eta_j^{y,b}, X_j \rangle \mid y)}{\mathrm{var}(\langle \eta_i^{y,a}, X_i \rangle \mid y)\,\mathrm{var}(\langle \eta_j^{y,b}, X_j \rangle \mid y)} \;=\; \max_{i,j \in V, i \neq j} \; \sum_{a \in \mathcal{I}_i^{y}, \, b \in \mathcal{I}_j^{y}} (\rho_{i,j}^{y,a,b})^2 \;\le\; c_1. \qquad (5)$$

Assumption 5.

For each $y \in \Omega_Y$, $\ker(V_{XX}^{y}) = \{0\}$.

Assumption 4 characterizes the level of smoothness of the underlying distributions of the random functions. Assumption 5 prevents the existence of a constant function that is a linear combination of nonconstant functions. It can be viewed as a generalization of the absence of collinearity in linear models, or of the empty concurvity space in generalized additive models (Hastie and Tibshirani 1990). Assumption 4 ensures that $\mathcal{R}_{X_iX_j}^{y}$ is Hilbert-Schmidt, and thus compact. Meanwhile, Assumptions 4 and 5 together ensure that $\mathcal{R}_{XX}^{y}$ is lower-bounded by a strictly positive constant. This implies that $\mathcal{R}_{XX}^{y}$ is invertible, and that $P^{y} = [\mathcal{R}_{XX}^{y}]^{-1}$ is bounded, which justifies the following definition.

Definition 3.

Define the CPO as the inverse of the joint conditional correlation operator, $P^{y} = [\mathcal{R}_{XX}^{y}]^{-1} \in \mathcal{B}(\Omega_X)$, for any $y \in \Omega_Y$.

The operator $P^{y}$ generalizes the precision matrix to the functional and conditional settings. We should clarify that, unlike the standard definition where the precision matrix is the inverse of the covariance matrix (Cai, Liu, and Luo 2011), our CPO is defined as the inverse of the conditional correlation operator. This is to avoid inverting the covariance operator, which is usually not invertible because of its compactness. Next, we develop an operator that generalizes the partial correlation matrix to the functional and conditional settings. Similar to the definition of $V_{XX}^{y}$, we define $V_{X_AX_B}^{y}$ for any subsets $A, B \subseteq V$. For any subset $A \subseteq V \setminus \{i, j\}$, we define an intermediate operator, $V_{X_iX_j \mid X_A}^{y}: \Omega_{X_j} \to \Omega_{X_i}$, through the relation $V_{X_iX_j \mid X_A}^{y} = V_{X_iX_j}^{y} - V_{X_iX_A}^{y} [V_{X_AX_A}^{y}]^{\dagger} V_{X_AX_j}^{y}$, for any $(i, j) \in V \times V$. We then have the following result when $A = -(i, j)$.

Proposition 3.

There uniquely exists $\mathcal{R}_{X_iX_j \mid X_{-(i,j)}}^{y} \in \mathcal{B}(\Omega_{X_j}, \Omega_{X_i})$ which satisfies $V_{X_iX_j \mid X_{-(i,j)}}^{y} = [V_{X_iX_i \mid X_{-(i,j)}}^{y}]^{1/2} \, \mathcal{R}_{X_iX_j \mid X_{-(i,j)}}^{y} \, [V_{X_jX_j \mid X_{-(i,j)}}^{y}]^{1/2}$, and $\|\mathcal{R}_{X_iX_j \mid X_{-(i,j)}}^{y}\| \le 1$.

Its proof is similar to that of (Lee, Li, and Zhao 2016a, theo. 1) and is thus omitted. It justifies the definition of the following operator.

Definition 4.

We call the operator $\mathcal{R}_{X_iX_j \mid X_{-(i,j)}}^{y}$ in Proposition 3 the CPCO between $X_i$ and $X_j$ given $X_{-(i,j)}$ and $Y$.

3.3. Relation With Conditional Functional Graph

We first show that the conditional covariance operator can be constructed from the conditional covariances between the Karhunen-Loève coefficients together with the associated eigenfunctions. This simple form provides a convenient way to estimate the conditional covariance operator later. Let $\{(\lambda_i^{a}, \eta_i^{a})\}_a$ denote the collection of eigenvalue and eigenfunction pairs of $V_{X_iX_i}$, with $\lambda_i^{1} \ge \lambda_i^{2} \ge \cdots \ge 0$. Then $X_i$ can almost surely be represented as $X_i = \sum_a \alpha_i^{a} \eta_i^{a}$, where $\alpha_i^{a} = \langle X_i, \eta_i^{a} \rangle$, for all $a$. This expression is known as the Karhunen-Loève (K-L) expansion (Bosq 2000).

Proposition 4.

Suppose the same conditions in Proposition 2 hold. Then we have

$$V_{X_iX_j}^{y} = \sum_{a,b} E(\alpha_i^{a} \alpha_j^{b} \mid y) \, (\eta_i^{a} \otimes \eta_j^{b}), \quad \text{for each } y \in \Omega_Y,$$

where $(\alpha_i^{a}, \eta_i^{a})$ and $(\alpha_j^{b}, \eta_j^{b})$ are from the Karhunen-Loève expansion.

We next show that CPO and CPCO fully characterize the conditional functional independence, and are thus crucial for our estimation of conditional functional graph.

Theorem 1.

If Assumptions 1-5 hold, then we have, for any $y \in \Omega_Y$,

$$X_i \perp\!\!\!\perp X_j \mid [X_{-(i,j)}, Y = y] \;\Longleftrightarrow\; [P^{y}]_{i,j} = 0,$$

where $[P^{y}]_{i,j}$ denotes the $(i, j)$th element of $P^{y}$.

Under the Gaussian distribution, the equivalence between conditional independence and a zero element of the nonrandom precision matrix is well known in the classical random variable setting. Theorem 1 extends this equivalence to the setting of random functions, and also allows the precision operator to vary with $Y$.

Theorem 2.

If Assumptions 1-3 hold, then we have, for any $y \in \Omega_Y$,

$$X_i \perp\!\!\!\perp X_j \mid [X_{-(i,j)}, Y = y] \;\Longleftrightarrow\; \mathcal{R}_{X_iX_j \mid X_{-(i,j)}}^{y} = 0.$$

Theorems 1 and 2 suggest that one can estimate the conditional functional graph Ey in Equation (2) through the proposed operators, CPO or CPCO. In the following, we primarily focus on the graph estimation based on CPO, and investigate the corresponding asymptotics. The results based on CPCO can be derived in a parallel fashion, which we only briefly discuss in Section S.4 in the Appendix.

4. Estimation

In this section, we first derive the sample estimate of CPO and the conditional graph at the operator level. We then construct empirical bases and develop coordinate representations for the functions observed at a finite set of time points. Using these coordinate representations, we are able to compute our estimated linear operators. Last, we provide a step-by-step summary of our proposed estimation procedure.

4.1. Operator-Level Estimation

We first derive the sample Karhunen-Loève expansion, and then sequentially develop the estimators of $V_{XX}^{y}$, $\mathcal{R}_{XX}^{y}$, $P^{y}$, and finally $E^{y}$.

Let $(Y^1, \ldots, Y^n)$ denote iid samples of $Y$, and $(X^1, \ldots, X^n)$ denote iid samples of $X$, with $X^k = (X_1^k, \ldots, X_p^k)$, for $k = 1, \ldots, n$. Let $E_n$ denote the sample mean operator; that is, for a sample $(\omega^1, \ldots, \omega^n)$ from $\Omega$, $E_n(\omega) = \sum_{k=1}^{n} \omega^k / n$. We estimate the covariance operators, $V_{X_iX_i}$, $V_{YX_{ij}}$, and $V_{YY}$, by

$$\hat{V}_{X_iX_i} = E_n(X_i \otimes X_i), \qquad \hat{V}_{YX_{ij}} = E_n[\kappa_Y(\cdot, Y) \otimes (X_i \otimes X_j)], \qquad \hat{V}_{YY} = E_n[\kappa_Y(\cdot, Y) \otimes \kappa_Y(\cdot, Y)],$$

for any $(i, j) \in V \times V$. For $i \in V$, let $\{(\hat{\lambda}_i^{a}, \hat{\eta}_i^{a})\}_a$ denote the collection of eigenvalue and eigenfunction pairs of $\hat{V}_{X_iX_i}$. Then, we have $X_i^k = \sum_a \hat{\alpha}_i^{k,a} \hat{\eta}_i^{a}$, where $\hat{\alpha}_i^{k,a} = \langle X_i^k, \hat{\eta}_i^{a} \rangle$ is the $a$th coefficient from the K-L expansion of $X_i^k$ for the $k$th subject. Furthermore, we use the leading $d$ terms to approximate $X_i^k$; in other words, $X_i^k \approx \sum_{a=1}^{d} \hat{\alpha}_i^{k,a} \hat{\eta}_i^{a}$, for all $k = 1, \ldots, n$, and $i \in V$.
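To make the sample Karhunen-Loève step concrete, the following is a minimal sketch that computes the leading K-L scores for one node from curves evaluated on a common grid. The grid-based inner product is an assumption used only for illustration; the paper itself works with the RKHS coordinates developed in Section 4.2, and the function name and toy data below are hypothetical.

```python
import numpy as np

def sample_kl_scores(curves, d):
    """Sample K-L expansion for one node.

    curves : (n, T) array of the n observed curves X_i^1, ..., X_i^n on a
             common grid of T time points (a stand-in for the coordinates
             of Section 4.2).
    d      : number of leading K-L components to keep.

    Returns the (n, d) score matrix alpha_hat[k, a] = <X_i^k, eta_hat_i^a>
    and the (T, d) matrix of eigenfunctions evaluated on the grid.
    """
    n, T = curves.shape
    dt = 1.0 / T                              # grid spacing on [0, 1]
    cov = curves.T @ curves / n               # sample covariance E_n(X_i x X_i)
    evals, evecs = np.linalg.eigh(cov * dt)   # discretized covariance operator
    order = np.argsort(evals)[::-1][:d]       # d leading eigenpairs
    eigfuns = evecs[:, order] / np.sqrt(dt)   # L2-normalized eigenfunctions
    scores = curves @ eigfuns * dt            # Riemann-sum inner products
    return scores, eigfuns

# Toy usage: 100 rough Brownian-like curves on 50 grid points.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50)).cumsum(axis=1) / np.sqrt(50)
scores, eigfuns = sample_kl_scores(X, d=3)
print(scores.shape, eigfuns.shape)            # (100, 3) (50, 3)
```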

By its definition, we estimate the regression operator MXijY by

$$\hat{M}_{X_{ij} \mid Y}(\epsilon_Y) = (\hat{V}_{YY} + \epsilon_Y I)^{-1} \hat{V}_{YX_{ij}}, \qquad (6)$$

where ϵY > 0 is a prespecified ridge parameter, and it imposes a level of smoothness on the regression structure. Next, by Propositions 2 and 4, given y, d, i, j, we estimate the conditional covariance operator VXiXjy by

$$\hat{V}_{X_iX_j}^{y}(d, \epsilon_Y) = \sum_{a,b=1}^{d} \big\{ [\hat{M}_{X_{ij} \mid Y}(\epsilon_Y)(\hat{\eta}_i^{a} \otimes \hat{\eta}_j^{b})](y) \, (\hat{\eta}_i^{a} \otimes \hat{\eta}_j^{b}) \big\}, \qquad (7)$$

where $\hat{\eta}_i^{a}$ and $\hat{\eta}_j^{b}$ are from the sample Karhunen-Loève expansion. Let $\hat{V}_{XX}^{y}(d, \epsilon_Y)$ denote the block matrix of operators $\hat{V}_{XX}^{y}(d, \epsilon_Y) = \{\hat{V}_{X_iX_j}^{y}(d, \epsilon_Y)\}_{i,j \in V}$. Similarly, we can define $\hat{V}_{X_AX_B}^{y}(d, \epsilon_Y)$, for any $A, B \subseteq V$.

Therefore, by Equation (4), we estimate the conditional correlation operator $\mathcal{R}_{X_iX_j}^{y}$ by

$$\hat{\mathcal{R}}_{X_iX_j}^{y}(d, \epsilon_Y, \epsilon_1) = [\hat{V}_{X_iX_i}^{y}(d, \epsilon_Y) + \epsilon_1 I]^{-1/2} \, \hat{V}_{X_iX_j}^{y}(d, \epsilon_Y) \, [\hat{V}_{X_jX_j}^{y}(d, \epsilon_Y) + \epsilon_1 I]^{-1/2}, \qquad (8)$$

for $i \neq j$, $i, j \in V$, and $\hat{\mathcal{R}}_{X_iX_i}^{y} = I$, for $i \in V$, which is the identity mapping from $\Omega_{X_i}$ to $\Omega_{X_i}$; here $\epsilon_1$ is a ridge regularization parameter imposed on the inverses of $\hat{V}_{X_iX_i}^{y}$ and $\hat{V}_{X_jX_j}^{y}$. Let $\hat{\mathcal{R}}_{XX}^{y}(d, \epsilon_Y, \epsilon_1)$ denote the block matrix of operators $\hat{\mathcal{R}}_{XX}^{y}(d, \epsilon_Y, \epsilon_1) = [\hat{\mathcal{R}}_{X_iX_j}^{y}(d, \epsilon_Y, \epsilon_1)]_{i,j \in V}$. The next result shows that the norm of $\hat{\mathcal{R}}_{X_iX_j}^{y}(d, \epsilon_Y, \epsilon_1)$ is bounded by one, which resembles the property of the correlation in the classical setting.

Proposition 5.

If $\kappa_Y(y_1, y_2) \ge 0$ for any $y_1, y_2 \in \Omega_Y$, then $\|[\hat{\mathcal{R}}_{XX}^{y}(d, \epsilon_Y, \epsilon_1)]_{i,j}\| \le 1$, for any $(i, j) \in V \times V$.

Finally, we estimate the conditional precision operator $P^{y}$ by

$$\hat{P}^{y}(d, \epsilon_Y, \epsilon_1, \epsilon_2) = [\hat{\mathcal{R}}_{XX}^{y}(d, \epsilon_Y, \epsilon_1) + \epsilon_2 I]^{-1}, \qquad (9)$$

where ϵ2 > 0 is another ridge parameter. Using P^y(d,ϵY,ϵ1,ϵ2), for each y ∈ ΩY, we estimate the graph Ey by

$$\hat{E}_{\mathrm{CPO}}^{y}(d, \epsilon_Y, \epsilon_1, \epsilon_2, \rho_{\mathrm{CPO}}) = \big\{ (i,j) \in V \times V: \|[\hat{P}^{y}(d, \epsilon_Y, \epsilon_1, \epsilon_2)]_{i,j}\|_{\mathrm{HS}} > \rho_{\mathrm{CPO}}, \; i \neq j \big\}, \qquad (10)$$

where ρCPO > 0 is a thresholding parameter. That is, we can obtain an estimator of the conditional graph by hard thresholding. For notational simplicity, hereafter we abbreviate M^XijY(ϵY), V^XiXjy(d,ϵY), V^XXy(d,ϵY), ^XiXjy(d,ϵY,ϵ1), ^XXy(d,ϵY,ϵ1), P^y(d,ϵY,ϵ1,ϵ2), and E^CPOy(d,ϵY,ϵ1,ϵ2,ρCPO), by M^XijY, V^XiXjy, V^XXy, ^XiXjy, ^XXy, P^y, and E^CPOy, respectively.

4.2. Empirical Bases and Coordinate Representation

We next introduce a set of empirical bases and the corresponding coordinate representations. We adopt the following notation. Let $\Omega$ be a Hilbert space of functions on $\mathcal{T}$, spanned by a set of basis functions $\mathcal{B} = \{b_1, \ldots, b_m\}$. For any $\omega \in \Omega$, let $\lfloor \omega \rfloor_{\mathcal{B}} = (\lfloor \omega \rfloor_{\mathcal{B},1}, \ldots, \lfloor \omega \rfloor_{\mathcal{B},m})^{\top}$ denote its coordinate with respect to $\mathcal{B}$. Then $\omega$ can be represented as $\sum_{i=1}^{m} \lfloor \omega \rfloor_{\mathcal{B},i} b_i = \mathcal{B} \lfloor \omega \rfloor_{\mathcal{B}}$. Let $K_{\mathcal{B}} = [\langle b_s, b_t \rangle_{\Omega}]_{s,t=1}^{m}$ denote the Gram matrix, so that, for any $(\omega_1, \omega_2) \in \Omega$, the inner product $\langle \omega_1, \omega_2 \rangle_{\Omega} = \lfloor \omega_1 \rfloor_{\mathcal{B}}^{\top} K_{\mathcal{B}} \lfloor \omega_2 \rfloor_{\mathcal{B}}$. Throughout the article, we use the symbol $\lfloor \cdot \rfloor$ exclusively for a chosen coordinate system. Let $\Omega$ and $\Omega'$ be two Hilbert spaces spanned by $\mathcal{B} = \{b_1, \ldots, b_m\}$ and $\mathcal{B}' = \{b_1', \ldots, b_{m'}'\}$, respectively, and let $A: \Omega \to \Omega'$ be a linear operator. Then the coordinate representation of $A$ with respect to $\mathcal{B}$ and $\mathcal{B}'$ is $\{\lfloor A b_1 \rfloor_{\mathcal{B}'}, \ldots, \lfloor A b_m \rfloor_{\mathcal{B}'}\} \equiv {}_{\mathcal{B}'}\lfloor A \rfloor_{\mathcal{B}}$. For a third Hilbert space $\Omega''$ with basis $\mathcal{B}''$, and another linear operator $A': \Omega' \to \Omega''$, it is easy to see that ${}_{\mathcal{B}''}\lfloor A'A \rfloor_{\mathcal{B}} = ({}_{\mathcal{B}''}\lfloor A' \rfloor_{\mathcal{B}'})({}_{\mathcal{B}'}\lfloor A \rfloor_{\mathcal{B}})$. For simplicity, we write $\lfloor A \rfloor$ instead of ${}_{\mathcal{B}'}\lfloor A \rfloor_{\mathcal{B}}$ when there is no confusion.

Note that the random function $X_i^k$ can only be observed at a finite set of points. To enable computation, we need to approximate the random functions using the partially observed data. We adopt the construction used in Li and Solea (2018), which assumes that the sample path of $X^k$ lies in an RKHS of the time variable $T$ with a finite basis. Specifically, suppose $X_i^k$ is observed on a finite set of time points $\mathcal{T}^k = \{T^{k1}, \ldots, T^{kn_k}\}$, $k = 1, \ldots, n$. Let $\mathcal{T} = (T^{11}, \ldots, T^{1n_1}, \ldots, T^{n1}, \ldots, T^{nn_n}) = (T_1, \ldots, T_N)$ pool together all the unique time points across all subjects, where $N$ is the length of $\mathcal{T}$. Letting $\kappa_T: \mathcal{T} \times \mathcal{T} \to \mathbb{R}$ be a positive-definite kernel, $\Omega_{X_i}$ can be constructed through $\Omega_N = \mathrm{span}\{\kappa_T(\cdot, T_1), \ldots, \kappa_T(\cdot, T_N)\} = \mathrm{span}\{B_t^0(\cdot): t = 1, \ldots, N\}$, for $i \in V$. Let $K_T = [\kappa_T(T_s, T_t)]_{s,t=1}^{N}$ be the $N \times N$ Gram matrix of $\kappa_T$, with the eigen-decomposition $K_T = U_T^1 D_T^1 (U_T^1)^{\top} + U_T^0 D_T^0 (U_T^0)^{\top}$, where $U_T^1 D_T^1 (U_T^1)^{\top}$ is associated with the $m$ leading eigenvalues, and $U_T^0 D_T^0 (U_T^0)^{\top}$ with the remaining $N - m$ eigenvalues. Here we require $m \le N$. Therefore, we can construct an orthonormal basis of $\Omega_N$ via

$$B(\cdot) = [B_1(\cdot), \ldots, B_m(\cdot)]^{\top} = (D_T^1)^{-1/2} (U_T^1)^{\top} B^0(\cdot), \qquad (11)$$

where $B^0(\cdot) = [B_1^0(\cdot), \ldots, B_N^0(\cdot)]^{\top}$. Then $X_i^k$ can be represented as $X_i^k(\cdot) = \sum_{t=1}^{m} \lfloor X_i^k \rfloor_{\mathcal{B},t} B_t(\cdot) = B(\cdot)^{\top} \lfloor X_i^k \rfloor_{\mathcal{B}}$. Note that for the $k$th individual, the function is observed at $n_k$ time points, which implies that $X_i^k(\mathcal{T}^k) \equiv [X_i^k(T^{k1}), \ldots, X_i^k(T^{kn_k})]^{\top} = (\mathbf{B}^k)^{\top} \lfloor X_i^k \rfloor_{\mathcal{B}}$, where $\mathbf{B}^k$ is the $m \times n_k$ matrix $[B(T^{k1}), \ldots, B(T^{kn_k})]$. Therefore, for a given ridge parameter $\epsilon_T$, the coordinate $\lfloor X_i^k \rfloor_{\mathcal{B}}$ can be estimated by

$$\lfloor X_i^k \rfloor = [\mathbf{B}^k (\mathbf{B}^k)^{\top} + \epsilon_T I_m]^{-1} \mathbf{B}^k [X_i^k(\mathcal{T}^k)]. \qquad (12)$$
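The following is a minimal sketch of Equations (11) and (12): it builds the reduced orthonormal basis from the Gram matrix of $\kappa_T$ and recovers the coordinate of one partially observed curve by ridge regression. The RBF kernel, the specific bandwidth, and the toy curve are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(s, t, gamma):
    return np.exp(-gamma * (s[:, None] - t[None, :]) ** 2)

def reduced_basis(T_all, m, gamma):
    """Orthonormal basis B(.) of Equation (11) built from the pooled time points."""
    KT = rbf_kernel(T_all, T_all, gamma)           # N x N Gram matrix of kappa_T
    evals, evecs = np.linalg.eigh(KT)
    idx = np.argsort(evals)[::-1][:m]              # m leading eigenpairs
    D1, U1 = evals[idx], evecs[:, idx]
    def B(t):
        # B(t) = D1^{-1/2} U1' B0(t), with B0(t) = [kappa_T(t, T_1), ..., kappa_T(t, T_N)]'.
        B0 = rbf_kernel(np.atleast_1d(t), T_all, gamma)      # (#t, N)
        return (B0 @ U1) / np.sqrt(D1)                       # (#t, m)
    return B

def curve_coordinate(B, t_obs, x_obs, eps_T):
    """Ridge estimate of the coordinate of one curve, Equation (12)."""
    Bk = B(t_obs).T                                # m x n_k matrix [B(T^{k1}), ..., B(T^{k n_k})]
    m = Bk.shape[0]
    return np.linalg.solve(Bk @ Bk.T + eps_T * np.eye(m), Bk @ x_obs)

# Toy usage: one curve observed at 40 of 200 pooled time points.
rng = np.random.default_rng(1)
T_all = np.sort(rng.uniform(0, 1, 200))
B = reduced_basis(T_all, m=15, gamma=10.0)
t_obs = np.sort(rng.choice(T_all, 40, replace=False))
x_obs = np.sin(2 * np.pi * t_obs) + 0.1 * rng.standard_normal(40)
coord = curve_coordinate(B, t_obs, x_obs, eps_T=0.1)         # coordinate of X_i^k
print(coord.shape)                                           # (15,)
```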

We next derive the coordinate representations of $\hat{V}_{X_iX_i}$, $\hat{V}_{YX_{ij}}$, and $\hat{V}_{YY}$, which then lead to the coordinates of $\hat{V}_{XX}^{y}$, $\hat{\mathcal{R}}_{XX}^{y}$, and $\hat{P}^{y}$. Recall that $\kappa_Y: \Omega_Y \times \Omega_Y \to \mathbb{R}$ is the kernel used to build the RKHS of $Y$. Let $K_Y = [\kappa_Y(Y^s, Y^t)]_{s,t=1}^{n}$ be the corresponding $n \times n$ Gram matrix of $\kappa_Y$, and $\mathcal{H}_Y = \mathrm{span}\{\kappa_Y(\cdot, Y^1), \ldots, \kappa_Y(\cdot, Y^n)\} \equiv \mathrm{span}\{\mathcal{B}_Y(\cdot)\}$.

Proposition 6.

For $(f, g) \in \Omega_N \times \Omega_N$, $\lfloor \hat{V}_{X_iX_i} \rfloor = E_n(\lfloor X_i \rfloor \lfloor X_i \rfloor^{\top})$, $\lfloor \hat{V}_{YY} \rfloor = n^{-1} K_Y$, and $\lfloor \hat{V}_{YX_{ij}}(f \otimes g) \rfloor = n^{-1} [\lfloor f \rfloor^{\top} \lfloor X_i^1 \rfloor \lfloor g \rfloor^{\top} \lfloor X_j^1 \rfloor, \ldots, \lfloor f \rfloor^{\top} \lfloor X_i^n \rfloor \lfloor g \rfloor^{\top} \lfloor X_j^n \rfloor]^{\top}$.

Moreover, for each $i \in V$, let $\{(\hat{\lambda}_i^{a}, \hat{\eta}_i^{a})\}_{a=1}^{m}$ denote the collection of eigenvalue and eigenfunction pairs of $\hat{V}_{X_iX_i}$; that is, $\hat{V}_{X_iX_i} \hat{\eta}_i^{a} = \hat{\lambda}_i^{a} \hat{\eta}_i^{a}$. Then for each $i \in V$, $k = 1, \ldots, n$, the sample Karhunen-Loève expansion of $X_i^k$ is of the form

$$X_i^k = \sum_{a=1}^{d} \hat{\alpha}_i^{k,a} \hat{\eta}_i^{a} = \sum_{a=1}^{d} \langle X_i^k, \hat{\eta}_i^{a} \rangle \, \hat{\eta}_i^{a} = \sum_{a=1}^{d} \big( \lfloor X_i^k \rfloor^{\top} \lfloor \hat{\eta}_i^{a} \rfloor \big) \, \hat{\eta}_i^{a}. \qquad (13)$$

Proposition 7.

Let $\mathcal{B}_i^* = \{\hat{\eta}_i^{1}, \ldots, \hat{\eta}_i^{d}\}$, for each $i \in V$. For $y \in \Omega_Y$, the coordinate representation of $\hat{V}_{X_iX_j}^{y}$ with respect to $\mathcal{B}_i^*$ and $\mathcal{B}_j^*$ is

$${}_{\mathcal{B}_i^*}\lfloor \hat{V}_{X_iX_j}^{y} \rfloor_{\mathcal{B}_j^*} = \big[ (\mathbf{a}_i^{a})^{\top} c(y) \, \mathbf{a}_j^{b} \big]_{a,b=1}^{d}, \qquad (14)$$

where $c(y) = \mathrm{diag}[\mathbf{c}(y)]$ with $\mathbf{c}(y) = (K_Y + \epsilon_Y I_n)^{-1} \mathcal{B}_Y(y)$, and $\mathbf{a}_i^{a} = (\hat{\alpha}_i^{1,a}, \ldots, \hat{\alpha}_i^{n,a})^{\top}$. Moreover, if $\mathcal{B}^*$ is the collection $\{\mathcal{B}_i^*: i \in V\}$, then

$$\begin{aligned} {}_{\mathcal{B}^*}\lfloor \hat{V}_{XX}^{y} \rfloor_{\mathcal{B}^*} &= \big[ {}_{\mathcal{B}_i^*}\lfloor \hat{V}_{X_iX_j}^{y} \rfloor_{\mathcal{B}_j^*} \big]_{i,j=1}^{p},\\ \big[ {}_{\mathcal{B}^*}\lfloor \hat{\mathcal{R}}_{XX}^{y} \rfloor_{\mathcal{B}^*} \big]_{i,j} &= \begin{cases} A_i^{-1/2} \, {}_{\mathcal{B}_i^*}\lfloor \hat{V}_{X_iX_j}^{y} \rfloor_{\mathcal{B}_j^*} \, A_j^{-1/2}, & \text{when } i \neq j,\\ I_d, & \text{when } i = j, \end{cases}\\ {}_{\mathcal{B}^*}\lfloor \hat{P}^{y} \rfloor_{\mathcal{B}^*} &= \big[ {}_{\mathcal{B}^*}\lfloor \hat{\mathcal{R}}_{XX}^{y} \rfloor_{\mathcal{B}^*} + \epsilon_2 I_{pd} \big]^{-1}, \end{aligned} \qquad (15)$$

where $A_i = {}_{\mathcal{B}_i^*}\lfloor \hat{V}_{X_iX_i}^{y} \rfloor_{\mathcal{B}_i^*} + \epsilon_1 I_d$, and $I_d$ is the $d \times d$ identity matrix.

Finally, we compute the squared Hilbert-Schmidt norm of $[\hat{P}^{y}]_{i,j}$ as

$$\big\|[\hat{P}^{y}]_{i,j}\big\|_{\mathrm{HS}}^2 = \sum_{a=1}^{d} \big\langle [\hat{P}^{y}]_{i,j} \hat{\eta}_j^{a}, [\hat{P}^{y}]_{i,j} \hat{\eta}_j^{a} \big\rangle = \sum_{a=1}^{d} \lfloor \hat{\eta}_j^{a} \rfloor_{\mathcal{B}_j^*}^{\top} \, {}_{\mathcal{B}_i^*}\lfloor [\hat{P}^{y}]_{i,j} \rfloor_{\mathcal{B}_j^*}^{\top} \, {}_{\mathcal{B}_i^*}\lfloor [\hat{P}^{y}]_{i,j} \rfloor_{\mathcal{B}_j^*} \, \lfloor \hat{\eta}_j^{a} \rfloor_{\mathcal{B}_j^*} = \big\| {}_{\mathcal{B}_i^*}\lfloor [\hat{P}^{y}]_{i,j} \rfloor_{\mathcal{B}_j^*} \big\|_F^2 = \big\| [\,{}_{\mathcal{B}^*}\lfloor \hat{P}^{y} \rfloor_{\mathcal{B}^*}]_{i,j} \big\|_F^2,$$

for $(i, j) \in V \times V$ with $i \neq j$, where $\| \cdot \|_F$ is the Frobenius norm.

4.3. Algorithm

We now summarize our conditional functional graph estimation algorithm based on the CPO. The algorithm based on the CPCO is similar and is thus omitted. Let $\lambda_i(A)$ denote the $i$th largest eigenvalue of an $a \times a$ matrix $A$, for $i = 1, \ldots, a$.

  1. Choose the kernel function $\kappa_T$. Commonly used kernels include the Brownian motion kernel $\kappa_T(s, t) = \min(s, t)$, and the radial basis function (RBF) kernel $\kappa_T(s, t) = \exp[-\gamma_T (s - t)^2]$, $(s, t) \in \mathbb{R}^2$. For the RBF kernel, the bandwidth $\gamma_T$ is determined by $\big(\sum_{s<t} |T_s - T_t|\big)^2 \gamma_T = N^2(N-1)^2/4$.

  2. Compute the $N \times N$ Gram matrix $K_T$ of $\kappa_T$, and let $U_T^1 D_T^1 (U_T^1)^{\top}$ be the part of its eigen-decomposition associated with the $m$ leading eigenvalues. Then use (11) to construct the reduced basis $B(\cdot)$ and the matrices $\mathbf{B}^k$. We suggest choosing $m$ by $\min\{m: \sum_{i=1}^{m} \lambda_i(K_T) / \sum_{i=1}^{N} \lambda_i(K_T) \ge 0.99\}$, that is, the smallest $m$ such that the cumulative percentage of the total variation of $K_T$ explained exceeds 99%.

  3. Calculate the coordinates $\lfloor X_i^k \rfloor$, $i = 1, \ldots, p$, $k = 1, \ldots, n$, using (12) with a given $\epsilon_T$. We choose $\epsilon_T = N^{-1/5} \lambda_1(K_T)$ (see, for example, Lee, Li, and Zhao 2016a).

  4. Perform the eigen-decomposition of $\hat{V}_{X_iX_i}$, for $i \in V$, and obtain the $a$th eigenfunction $\hat{\eta}_i^{a}$ and the $a$th K-L coefficients $\hat{\alpha}_i^{k,a}$ using (13), $a = 1, \ldots, d$. Stack $\hat{\alpha}_i^{1,a}, \ldots, \hat{\alpha}_i^{n,a}$ to form $\mathbf{a}_i^{a}$, $a = 1, \ldots, d$, $i = 1, \ldots, p$. We choose $d$ according to $\min\{d: \sum_{j=1}^{d} \lambda_j(\hat{V}_{X_iX_i}) / \sum_{j=1}^{N} \lambda_j(\hat{V}_{X_iX_i}) \ge 0.99\}$.

  5. Choose the kernel function $\kappa_Y$, and compute the corresponding $n \times n$ Gram matrix $K_Y$. Compute the coordinates ${}_{\mathcal{B}_i^*}\lfloor \hat{V}_{X_iX_j}^{y} \rfloor_{\mathcal{B}_j^*}$ using (14), with the ridge parameter $\epsilon_Y = n^{-1/5} \lambda_1(K_Y)$.

  6. Compute the coordinate representations in (15), for a given $y \in \Omega_Y$, with the ridge parameters $\epsilon_1 = n^{-1/5} \max\{\lambda_1(\hat{V}_{X_iX_i}^{y}): i \in V\}$ and $\epsilon_2 = n^{-1/5} \lambda_1(\hat{P}^{y})$.

  7. Estimate the conditional functional graph, for a given $y \in \Omega_Y$, by $\hat{E}^{y}(\rho_{\mathrm{CPO}}) = \{(i,j) \in V \times V: \|[\,{}_{\mathcal{B}^*}\lfloor \hat{P}^{y} \rfloor_{\mathcal{B}^*}]_{i,j}\|_F > \rho_{\mathrm{CPO}}, \; i \neq j\}$, with a given threshold $\rho_{\mathrm{CPO}}$. We determine $\rho_{\mathrm{CPO}}$ by minimizing the following generalized cross-validation (GCV) criterion over a set of grid points,
    $$\mathrm{GCV}^{y}(\rho) = \sum_{i=1}^{p} \mathrm{GCV}_i^{y}(\rho), \quad \text{with } \mathrm{GCV}_i^{y}(\rho) = \frac{\big\| {}_{\mathcal{B}_i^*}\lfloor \hat{V}_{X_iX_i \mid X_{N_i^{y}(\rho)}}^{y} \rfloor_{\mathcal{B}_i^*} \big\|_F^2 / n}{[1 - \mathrm{DF}_i(\rho)/n]^2},$$
    where $\hat{V}_{X_iX_i \mid X_{N_i^{y}(\rho)}}^{y}$ is the sample estimate of $V_{X_iX_i \mid X_{N_i^{y}(\rho)}}^{y}$, whose coordinate representation is derived in Section S.4 of the Appendix, and $\mathrm{DF}_i(\rho) = d^2 \, \mathrm{card}[N_i^{y}(\rho)]$, with $\mathrm{card}(\cdot)$ being the cardinality and $N_i^{y}(\rho) = \{j \in V: (i,j) \in \hat{E}^{y}(\rho)\}$ the neighborhood of the $i$th node in $\hat{E}^{y}(\rho)$.

Our graph estimation algorithm involves multiple tuning parameters, and their choices are given in the above algorithm. We further study their effects in Section S.6 of the appendix. In general, we have found that the estimated graph is not overly sensitive to the tuning parameters as long as they are within a reasonable range.
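To illustrate Steps 5-7, the sketch below assembles the coordinate formulas (14)-(15) and the thresholding rule from the K-L score vectors $\mathbf{a}_i^{a}$ (for example, stacked from the K-L sketch of Section 4.1). The RBF kernel for $\kappa_Y$, the eigenvalue flooring for numerical stability, and the simple stand-in choice of $\epsilon_2$ are assumptions for illustration, not part of the method.

```python
import numpy as np

def conditional_graph_cpo(scores, Y, y, gamma_Y=None, thresh=None):
    """Steps 5-7 of the CPO algorithm, in coordinates (Equations (14)-(15)).

    scores : (p, n, d) array of K-L scores, scores[i, k, a] = alpha_hat_i^{k,a}.
    Y      : (n,) conditioning variable; y : value at which the graph is estimated.
    Returns the p x p matrix of Frobenius norms of the blocks of the estimated CPO,
    or the thresholded adjacency matrix when `thresh` is given.
    """
    p, n, d = scores.shape
    if gamma_Y is None:                                # illustrative RBF bandwidth default
        gamma_Y = 1.0 / np.var(Y)
    KY = np.exp(-gamma_Y * (Y[:, None] - Y[None, :]) ** 2)
    eps_Y = n ** (-0.2) * np.linalg.eigvalsh(KY)[-1]   # epsilon_Y = n^{-1/5} lambda_1(K_Y)
    kYy = np.exp(-gamma_Y * (Y - y) ** 2)
    c = np.linalg.solve(KY + eps_Y * np.eye(n), kYy)   # c(y) = (K_Y + eps_Y I)^{-1} B_Y(y)
    Cy = np.diag(c)

    # Coordinates of V_hat_{X_i X_j}^y, Equation (14): [ (a_i^a)' c(y) a_j^b ]_{a,b}.
    V = np.empty((p, p, d, d))
    for i in range(p):
        for j in range(p):
            V[i, j] = scores[i].T @ Cy @ scores[j]

    eps_1 = n ** (-0.2) * max(np.linalg.eigvalsh(V[i, i])[-1] for i in range(p))
    inv_sqrt = []
    for i in range(p):
        w, U = np.linalg.eigh(V[i, i] + eps_1 * np.eye(d))
        w = np.clip(w, eps_1, None)                    # floor eigenvalues for stability
        inv_sqrt.append(U @ np.diag(w ** -0.5) @ U.T)

    # Correlation blocks and CPO, Equation (15).
    R = np.zeros((p * d, p * d))
    for i in range(p):
        for j in range(p):
            blk = np.eye(d) if i == j else inv_sqrt[i] @ V[i, j] @ inv_sqrt[j]
            R[i*d:(i+1)*d, j*d:(j+1)*d] = blk
    eps_2 = n ** (-0.2)                                # simple stand-in ridge for the CPO
    P = np.linalg.inv(R + eps_2 * np.eye(p * d))

    norms = np.array([[np.linalg.norm(P[i*d:(i+1)*d, j*d:(j+1)*d], 'fro')
                       for j in range(p)] for i in range(p)])
    return norms > thresh if thresh is not None else norms

# Example (hypothetical): norms = conditional_graph_cpo(scores, Y, y=0.5)
```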

We also remark that the above algorithm assumes the zero-mean condition. In practice, if this does not hold, we can easily modify the algorithm. Specifically, from the coordinates of the estimated conditional covariance operator, we can estimate $E(\alpha_i^{a} \mid y)$ by $(\mathbf{a}_i^{a})^{\top} c(y) \mathbf{1}_n$, where $\mathbf{1}_n$ is the $n$-dimensional vector of all ones. By redefining the conditional covariance operator as $V_{X_iX_j}^{y} = \sum_{a,b} [E(\alpha_i^{a} \alpha_j^{b} \mid y) - E(\alpha_i^{a} \mid y) E(\alpha_j^{b} \mid y)] (\eta_i^{a} \otimes \eta_j^{b})$, we can estimate its coordinates by ${}_{\mathcal{B}_i^*}\lfloor \hat{V}_{X_iX_j}^{y} \rfloor_{\mathcal{B}_j^*} = [(\mathbf{a}_i^{a})^{\top} c'(y) \mathbf{a}_j^{b}]_{a,b=1}^{d}$, where $c'(y) = \mathrm{diag}[\mathbf{c}(y)] - \mathbf{c}(y) \mathbf{c}(y)^{\top}$. We then replace ${}_{\mathcal{B}_i^*}\lfloor \hat{V}_{X_iX_j}^{y} \rfloor_{\mathcal{B}_j^*}$ in Step 5 of the algorithm, and in all subsequent steps, by this modified estimate.

5. Asymptotic Theory

We begin with some useful concentration inequalities and uniform convergence results for several relevant operators. We then establish the uniform convergence of the CPO, and the consistency of the estimated graph. For simplicity, we first assume that the trajectory of the random functions $X(t) = [X_1(t), \ldots, X_p(t)]$ is fully observed for all $t \in \mathcal{T}$. We then discuss the setting where $X_i$ is only partially observed in Section 5.3. We also remark that all our theoretical results allow the dimension of the graph to diverge with the sample size.

5.1. Concentration Inequalities and Uniform Convergence

We first derive the concentration bound and uniform convergence rate for the sample estimators $\hat{V}_{X_iX_j}$, $\hat{V}_{YX_{ij}}$, and $\hat{V}_{YY}$. For two positive sequences $\{a_n\}$ and $\{b_n\}$, write $a_n \prec b_n$ if $a_n = o(b_n)$, $a_n \preceq b_n$ if $a_n = O(b_n)$, and $a_n \asymp b_n$ if $a_n \preceq b_n$ and $b_n \preceq a_n$; moreover, let $a_n \vee b_n = b_n$ if $a_n \preceq b_n$. Similarly, if $c_n$ is a third positive sequence, then we let $a_n \vee b_n \vee c_n = (a_n \vee b_n) \vee c_n$.

Theorem 3.

If Assumptions 1 and 3 hold, then there exist positive constants $C_1$ to $C_6$ such that, (i) $P(\|\hat{V}_{X_iX_j} - V_{X_iX_j}\|_{\mathrm{HS}} > t) \le C_1 \exp[-C_2 n (t \wedge t^2)]$; (ii) $P(\|\hat{V}_{YX_{ij}} - V_{YX_{ij}}\|_{\mathrm{HS}} > t) \le C_3 \exp[-C_4 n (t \wedge t^2)]$; (iii) $P(\|\hat{V}_{YY} - V_{YY}\|_{\mathrm{HS}} > t) \le C_5 \exp(-C_6 n t^2)$, for any $t \ge 0$ and any $(i, j) \in V \times V$. Moreover, if $\log p / n \to 0$, then (iv) $\max_{i,j \in V} \|\hat{V}_{X_iX_j} - V_{X_iX_j}\|_{\mathrm{HS}} = O_P[(\log p / n)^{1/2}]$; (v) $\max_{i,j \in V} \|\hat{V}_{YX_{ij}} - V_{YX_{ij}}\|_{\mathrm{HS}} = O_P[(\log p / n)^{1/2}]$; (vi) $\|\hat{V}_{YY} - V_{YY}\|_{\mathrm{HS}} = O_P(n^{-1/2})$, as $n \to \infty$.

For any $(i, j) \in V \times V$ and $y \in \Omega_Y$, let $V_{X_iX_j}^{y}(d)$ be the truncated version of $V_{X_iX_j}^{y}$: $V_{X_iX_j}^{y}(d) = \sum_{a,b=1}^{d} E(\alpha_i^{a} \alpha_j^{b} \mid y)(\eta_i^{a} \otimes \eta_j^{b})$. Next, we establish the uniform convergence rate for $\|\hat{V}_{X_iX_j}^{y} - V_{X_iX_j}^{y}(d)\|_{\mathrm{HS}}$. We define the exponent of a compact and self-adjoint operator as $A^{\beta} = \sum_a \lambda_a^{\beta} (\eta_a \otimes \eta_a)$, for any $\beta > 0$, where $\{(\lambda_a, \eta_a)\}_a$ is the collection of eigenvalue and eigenfunction pairs of $A$. We need another assumption.

Assumption 6.

For any $(i, j) \in V \times V$, there exist $\beta \in (0, 1)$, $c_2 > 0$, and $M_{ij}^{0} \in \mathcal{B}[\mathcal{B}_2(\Omega_{X_j}, \Omega_{X_i}), \mathcal{H}_Y]$ such that $M_{X_{ij} \mid Y} = V_{YY}^{\beta} M_{ij}^{0}$ with $\|M_{ij}^{0}\| \le c_2$.

This assumption regulates the complexity of $(X_i, X_j)$ given $Y$. To see this, note that, as $\beta$ increases, the regression relation of $(X_i, X_j)$ on $Y$ is more concentrated on the eigenfunctions corresponding to the larger eigenvalues of $V_{YY}$. Also, we define $\kappa_d \equiv \min\{|\lambda_i^{a} - \lambda_i^{b}|: 1 \le a < b \le d+1, i \in V\}$ as the minimal isolation distance among the $d + 1$ leading eigenvalues of $V_{X_iX_i}$, over all $i \in V$. Note that we allow $d$ to grow with $n$.

Theorem 4.

If Assumptions 1-3 and 6 hold, $\epsilon_Y \prec 1$, and $(\log p / n) \prec \kappa_d^2$, then $\max_{i,j \in V} \|\hat{V}_{X_iX_j}^{y} - V_{X_iX_j}^{y}(d)\|_{\mathrm{HS}} = O_P[d^2 \epsilon_Y^{-1} \kappa_d^{-1} (\log p / n)^{1/2} + d^2 \epsilon_Y^{-1} n^{-1/2} + d^2 \epsilon_Y^{\beta}]$.

Next, we establish the uniform convergence of the estimated conditional correlation operator $\hat{\mathcal{R}}_{XX}^{y}$. We need an assumption on the tail behavior of the random functions.

Assumption 7.

There exists $\gamma_y > 0$ such that $\max_{i \in V} \{\sum_{a=d+1}^{\infty} E[(\alpha_i^{a})^2 \mid y]\} \preceq d^{-\gamma_y}$, for any $y \in \Omega_Y$.

Assumption 7 is on the decaying rate of the tail eigenvalues of VXiXiy, which, in a sense, characterizes the smoothness of the distribution of Xi given Y.

Theorem 5.

If Assumptions 1-4, 6, and 7 hold, $\epsilon_Y, \epsilon_1 \prec 1$, and $d^2 \epsilon_Y^{-1} \kappa_d^{-1} (\log p / n)^{1/2} + d^2 \epsilon_Y^{-1} n^{-1/2} + d^2 \epsilon_Y^{\beta} \prec 1$, then, for any $y \in \Omega_Y$,

$$\max_{i,j \in V} \big\| [\hat{\mathcal{R}}_{XX}^{y}]_{i,j} - [\mathcal{R}_{XX}^{y}]_{i,j} \big\|_{\mathrm{HS}} = O_p(\delta_y),$$

where $\delta_y = \epsilon_1^{-3/2} [d^2 \epsilon_Y^{-1} \kappa_d^{-1} (\log p / n)^{1/2} + d^2 \epsilon_Y^{-1} n^{-1/2} + d^2 \epsilon_Y^{\beta} + d^{-\gamma_y}] + \epsilon_1^{1/2}$.

To better understand Theorem 5, suppose $d = O(n^{a_1/2})$, $\epsilon_Y = O(n^{-a_2})$, $\epsilon_1 = O(n^{-2a_3})$, and $\kappa_d = O(n^{-a_4})$, for $a_1, \ldots, a_4 > 0$. Then we have

$$\max_{i,j \in V} \big\| [\hat{\mathcal{R}}_{XX}^{y}]_{i,j} - [\mathcal{R}_{XX}^{y}]_{i,j} \big\|_{\mathrm{HS}} = O_p\big[ (\log p / n^{1-\pi})^{1/2} + n^{-\frac{1}{2} a_1 \gamma_y + 3a_3} + n^{-a_2 \beta + a_1 + 3a_3} + n^{-\frac{1}{2} + a_1 + a_2 + 3a_3} + n^{-a_3} \big],$$

where $\pi = 2(a_1 + a_2 + a_4) + 6a_3 < 1$. This implies that the graph dimension $p$ can diverge with the sample size $n$ at an exponential rate. In comparison, in the classical random variable setting, the uniform convergence rate of the sample covariance is $(\log p / n)^{1/2}$ (Bickel and Levina 2008). Theorem 5 thus can be viewed as an extension to both the functional and conditional settings, where the parameter of interest $\mathcal{R}_{XX}^{y}$ is a random RKHS operator.

5.2. Uniform Convergence of CPO and Graph Consistency

We next derive the convergence of the estimated CPO, $\hat{P}^{y}$. We need an assumption to regulate the relation between $X_i$ and $X_j$ when conditioning on $X_{-(i,j)}$ and $Y$.

Assumption 8.

There exists $c_3 > 0$ such that $\max_{i,j \in V, i \neq j} \|\mathcal{R}_{X_iX_j \mid X_{-(i,j)}}^{y}\|_{\mathrm{HS}} \le c_3$, where $\mathcal{R}_{X_iX_j \mid X_{-(i,j)}}^{y} = \mathcal{R}_{X_iX_j}^{y} - \mathcal{R}_{X_iX_{-(i,j)}}^{y} [\mathcal{R}_{X_{-(i,j)}X_{-(i,j)}}^{y}]^{-1} \mathcal{R}_{X_{-(i,j)}X_j}^{y}$, for any $y \in \Omega_Y$.

The following proposition provides a condition under which Assumption 8 is satisfied. Its proof is similar to Proposition S2 and is omitted.

Proposition 8.

Suppose Assumptions 1 and 3-5 hold. Then, for any $y \in \Omega_Y$, there exists $c_3 > 0$ such that $\max_{i,j \in V, i \neq j} \|\mathcal{R}_{X_iX_j \mid X_{-(i,j)}}^{y}\|_{\mathrm{HS}} \le c_3$, if there exists $c_4 > 0$ satisfying

$$\max_{i,j \in V, i \neq j} \; \sum_{a \in \tilde{\mathcal{I}}_i^{y}, \, b \in \tilde{\mathcal{I}}_j^{y}} \mathrm{cor}^2\big(\langle \tilde{\psi}_i^{y,a}, X_i \rangle, \langle \tilde{\psi}_j^{y,b}, X_j \rangle \mid X_{-(i,j)}, y\big) \le c_4, \qquad (16)$$

where $\tilde{\psi}_i^{y,a} = \psi_i^{y,a} / (\mu_i^{y,a})^{1/2}$, $\{(\mu_i^{y,a}, \psi_i^{y,a})\}_a$ are the eigenvalue and eigenfunction pairs of $V_{X_iX_i \mid X_{-(i,j)}}^{y}$, and $\tilde{\mathcal{I}}_i^{y} = \{a: \mu_i^{y,a} > 0\}$, for any $i \in V$ and $y \in \Omega_Y$.

Both conditions (5) and (16) impose certain levels of smoothness on the conditional distribution of $X$ given $Y$. Nevertheless, they target different subjects: Equation (5) concerns the relation between $X_i$ and $X_j$ given $Y$, which is required for the consistency of $\hat{\mathcal{R}}_{X_iX_j}^{y}$, whereas Equation (16) concerns the relation between $X_i$ and $X_j$ given $X_{-(i,j)}$ and $Y$, which is required for the consistency of $[\hat{P}^{y}]_{i,j}$.

Theorem 6.

If Assumptions 1-8 hold, $\epsilon_Y, \epsilon_1 \prec 1$, and $d^2 \epsilon_Y^{-1} \kappa_d^{-1} (\log p / n)^{1/2} + d^2 \epsilon_Y^{-1} n^{-1/2} + d^2 \epsilon_Y^{\beta} \prec 1$, then for any $y \in \Omega_Y$,

$$\max_{i,j \in V, i \neq j} \big\| [\hat{P}^{y}]_{i,j} - [P^{y}]_{i,j} \big\|_{\mathrm{HS}} = O_p(\epsilon_2^{-1} p \delta_y + \epsilon_2).$$

Moreover, if $\rho_{\mathrm{CPO}} = \min\{\|[P^{y}]_{i,j}\|_{\mathrm{HS}}: (i,j) \in V \times V, i \neq j, [P^{y}]_{i,j} \neq 0\}/2$, and $\rho_{\mathrm{CPO}} \succ (\epsilon_2^{-1} p \delta_y + \epsilon_2)$, then

$$P[\hat{E}_{\mathrm{CPO}}^{y} = E^{y}] \to 1, \quad \text{as } n \to \infty.$$

We make a few remarks. First, Li and Solea (2018) established the consistency of their precision operator, as well as the consistency of their unconditional graph estimator. Note that their operator is nonrandom, and their results were derived with the graph size $p$ fixed. By contrast, Theorem 6 establishes the uniform convergence of the operator $[\hat{P}^{y}]_{i,j}$, which is random, and the graph estimation consistency is obtained with a diverging $p$. For instance, if $\epsilon_2 \asymp n^{-\pi'/2}$ with $\pi' > 0$, then Theorem 6 says that the uniform convergence rate of the estimated CPO depends on $p(\log p / n^{1-\pi-\pi'})^{1/2}$. This implies that we allow $p$ to grow at a polynomial rate of $n$.

Second, a careful inspection of our proof reveals that the convergence rate of the empirical CPO depends on the difference $\|\hat{\mathcal{R}}_{XX}^{y} - \mathcal{R}_{XX}^{y}\|_{\mathrm{HS}}$, whose order, by Theorem 5, can be no faster than $p(\log p / n^{1-\pi})^{1/2}$. We note that, under the classical random variable setting, the convergence rate of the hard-thresholded sample covariance matrix in terms of the Frobenius norm is $[p c_0(p) \log p / n]^{1/2}$, where the term $c_0(p)$ imposes a sparsity structure on the covariance matrix and satisfies $\sum_{j=1}^{p} 1[\mathrm{cov}(X_i, X_j) \neq 0] \le c_0(p)$, for $i = 1, \ldots, p$, with $1(\cdot)$ being the indicator function (Bickel and Levina 2008, theo. 2). We feel our rate of $p(\log p / n^{1-\pi})^{1/2}$ is reasonable and is comparable to the classical result. We also note that there is a difference between $p$ and $p^{1/2}$ in our rate and the rate of Bickel and Levina (2008, theo. 2). This difference is mainly due to the different sparsity settings imposed by our method and by Bickel and Levina (2008). Note that we do not impose any sparsity structure on the conditional correlation operator $\mathcal{R}_{XX}^{y}$. This means we need to estimate all the off-diagonal elements of $\mathcal{R}_{XX}^{y}$, whose cardinality grows in the order of $p^2$. In comparison, Bickel and Levina (2008) imposed a sparsity structure on the covariance matrix, so that the cardinality of the nonzero covariances grows only in the order of $p c_0(p)$. Moreover, we are dealing with a more complicated setting of random functions and random operators. On the other hand, we show in Section S.5 in the appendix that, if we introduce additional sparsity and regularization, we can further improve the rates in Lemma S8 in the appendix and in Theorem 6, so that $p$ can grow at an exponential rate of $n$. In fact, we have developed a theoretical platform that is not limited to the present setting. The concept of using a vanishing CPO to identify the conditional independence between random functions in a conditional graph can have implications beyond the scenarios studied in this article, for example, when there are additional sparsity or bandable structures.

Finally, in our theory development, we have imposed a series of technical conditions, which are commonly imposed in the literature, and are usually easy to satisfy.

5.3. Consistency for Partially Observed Random Functions

We next derive the consistency under the scenario where the random functions are only partially observed. Partially observed functions are collected via a dense or a non-dense measurement schedule; see Wang, Chiou, and Muller (2016a) for more discussion on measurement schedules. To avoid digressing from the main context, in this article, we do not pursue any specific measurement schedule or regularity setting for the partially observed random functions. For partially observed functions $X(t)$, suppose $\tilde{X}(t) = [\tilde{X}_1(t), \ldots, \tilde{X}_p(t)]$ is the estimate of $X(t)$ using the empirical bases developed in Section 4.2. We then estimate the series of operators and the graph by

$$\begin{aligned} \tilde{V}_{X_iX_i} &= E_n(\tilde{X}_i \otimes \tilde{X}_i), \qquad \tilde{V}_{YX_{ij}} = E_n[\kappa_Y(\cdot, Y) \otimes (\tilde{X}_i \otimes \tilde{X}_j)],\\ \tilde{V}_{X_iX_j}^{y} &= \sum_{a,b=1}^{d} \big\{ [\tilde{M}_{X_{ij} \mid Y}(\tilde{\eta}_i^{a} \otimes \tilde{\eta}_j^{b})](y) \big\} (\tilde{\eta}_i^{a} \otimes \tilde{\eta}_j^{b}),\\ \tilde{\mathcal{R}}_{X_iX_j}^{y} &= (\tilde{V}_{X_iX_i}^{y} + \epsilon_1 I)^{-1/2} \, \tilde{V}_{X_iX_j}^{y} \, (\tilde{V}_{X_jX_j}^{y} + \epsilon_1 I)^{-1/2} \;\; \text{for } i \neq j, \text{ and } I \text{ for } i = j,\\ \tilde{P}^{y} &= (\tilde{\mathcal{R}}_{XX}^{y} + \epsilon_2 I)^{-1},\\ \tilde{E}^{y}(\rho) &= \{(i,j) \in V \times V: \|[\tilde{P}^{y}]_{i,j}\|_{\mathrm{HS}} > \rho, \; i \neq j\}, \end{aligned}$$

where $\tilde{M}_{X_{ij} \mid Y} = (\hat{V}_{YY} + \epsilon_Y I)^{-1} \tilde{V}_{YX_{ij}}$, $\tilde{\eta}_i^{a}$ is the $a$th eigenfunction of $\tilde{V}_{X_iX_i}$, and $\tilde{\mathcal{R}}_{XX}^{y}$ is the block matrix of operators with $(i, j)$th element $\tilde{\mathcal{R}}_{X_iX_j}^{y}$.

Theorems 5 and 6 show that the convergence of the estimated conditional correlation operator and the CPO depends on the uniform convergence of the sample covariance operators in Theorem 3. In particular, it relies on the convergence rates of $\max_{i,j \in V} \|\hat{V}_{X_iX_j} - V_{X_iX_j}\|_{\mathrm{HS}}$ and $\max_{i,j \in V} \|\hat{V}_{YX_{ij}} - V_{YX_{ij}}\|_{\mathrm{HS}}$. When the random functions are completely observed, both rates, by Theorem 3, are equal to $(\log p / n)^{1/2}$. When the random functions are only partially observed, we allow the convergence rates of $\max_{i,j \in V} \|\tilde{V}_{X_iX_j} - V_{X_iX_j}\|_{\mathrm{HS}}$ and $\max_{i,j \in V} \|\tilde{V}_{YX_{ij}} - V_{YX_{ij}}\|_{\mathrm{HS}}$ to be slower than $(\log p / n)^{1/2}$. More specifically, as specified in Equation (17), we introduce a quantity $a$ that reflects how densely the time points are observed in the random functions. The denser the time points are, the closer $a$ is to zero. Correspondingly, the next theorem extends Theorem 6 to partially observed functions. Its proof is similar to that of Theorem 6, and is thus omitted.

Theorem 7.

If Assumptions 1-8 hold, $\epsilon_Y, \epsilon_1 \prec 1$, and there exists $a \in [0, 1)$ such that $d^2 \epsilon_Y^{-1} \kappa_d^{-1} (\log p / n^{1-a})^{1/2} + d^2 \epsilon_Y^{-1} n^{-1/2} + d^2 \epsilon_Y^{\beta} \prec 1$, $(\epsilon_2^{-1} p \delta_y^{a} + \epsilon_2) \prec \rho_{\mathrm{CPO}}$ with $\delta_y^{a} = \epsilon_1^{-3/2} \{d^2 \epsilon_Y^{-1} \kappa_d^{-1} (\log p / n^{1-a})^{1/2} + d^2 \epsilon_Y^{-1} n^{-1/2} + d^2 \epsilon_Y^{\beta} + d^{-\gamma_y}\} + \epsilon_1^{1/2}$, and

$$\max_{i,j \in V} \|\tilde{V}_{X_iX_j} - V_{X_iX_j}\|_{\mathrm{HS}} = O_P[(\log p / n^{1-a})^{1/2}], \qquad \max_{i,j \in V} \|\tilde{V}_{YX_{ij}} - V_{YX_{ij}}\|_{\mathrm{HS}} = O_P[(\log p / n^{1-a})^{1/2}]. \qquad (17)$$

Then, $\max\{\|[\tilde{P}^{y}]_{i,j} - [P^{y}]_{i,j}\|_{\mathrm{HS}}: (i,j) \in V \times V, i \neq j\} = O_p[\epsilon_2^{-1} p \delta_y^{a} + \epsilon_2]$, and $P[\tilde{E}^{y}(\rho_{\mathrm{CPO}}) = E^{y}] \to 1$, for any $y \in \Omega_Y$, as $n \to \infty$.

The first condition of Equation (17) is satisfied if the tail of $\|\tilde{V}_{X_iX_j} - V_{X_iX_j}\|_{\mathrm{HS}}$ behaves as a sub-exponential distribution; that is, when there exist $c', c'' > 0$ and $a \in [0, 1)$, such that $P(\|\tilde{V}_{X_iX_j} - V_{X_iX_j}\|_{\mathrm{HS}} > t) \le c' \exp(-c'' n^{(1-a)/2} t)$, for any $t \ge 0$. A similar statement holds for the second condition of Equation (17). Recall that in Theorem 3, we have shown that, for completely observed functions, $\|\hat{V}_{X_iX_j} - V_{X_iX_j}\|_{\mathrm{HS}}$ and $\|\hat{V}_{YX_{ij}} - V_{YX_{ij}}\|_{\mathrm{HS}}$ have sub-Gaussian tails for small $t$. As such, Equation (17) is more appropriate for partially observed functions.

6. Simulations

Next, we study the finite-sample performance of our method through simulations. Specifically, we consider three graph structures for $E$: a hub, a tree, and a random graph, as shown in Table 1. We first consider $p = 10$ nodes with sample size $n = 100$, and consider larger graphs later. Given the graph structure $E$ and the conditioning variable $y$, we generate $X_j(t)$ given its parent nodes based on the following model:

$$X_j(t) \mid y = \sum_{i \in \mathcal{P}_j} \big[ y^{v_{ij}} (1 - y)^{1 - v_{ij}} \big] \times X_i(t) + c_j \times \varepsilon_j(t),$$

where $X(t) \mid y = [X_1(t) \mid y, \ldots, X_p(t) \mid y]$ is constructed sequentially via the given graph, $\mathcal{P}_j$ is the set of parent nodes of $j$, and $c_j$ is the scale parameter specified in Table 1. We generate the error function $\varepsilon_j(t)$ as $\sum_{u=1}^{J} \xi_u \kappa_T(t, t_u)$, where $\xi_1, \ldots, \xi_J$ are iid standard normal variables, $t_1, \ldots, t_J$ are equally spaced points in $[0, 1]$ with $J = 50$, and $\kappa_T$ is an RBF or a Brownian motion covariance function. We then generate the conditioning variable, $Y^1, \ldots, Y^n$, as iid Uniform(0, 1). We generate each $X_i^k$, $k = 1, \ldots, n$, $i = 1, \ldots, p$, at $n_k = 50$ time points. In this model, there are two types of edge patterns: when $v_{ij} = 1$, the strength of the edge grows with $y$, and when $v_{ij} = 0$, the strength of the edge decays with $y$. The tuning parameters are determined by the rules suggested in Section 4.3. In Section S.6 of the appendix, we discuss in detail the effect of the tuning parameters, and show that our method is relatively robust to a range of tuning parameters.

Table 1.

Simulation setup: the edges under three graph structures.

[Table 1 appears as an image in the original manuscript; it lists the edges and scale parameters under the three graph structures.]
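The following is a minimal sketch of the data-generating mechanism described above. The specific hub-graph parents, edge exponents $v_{ij}$, and scale parameters $c_j$ below are hypothetical and do not reproduce the exact configurations of Table 1.

```python
import numpy as np

def simulate_conditional_fgm(parents, v, c, n=100, n_t=50, J=50, gamma_T=10.0, seed=0):
    """Generate (X, Y) from X_j(t)|y = sum_{i in P_j} y^{v_ij}(1-y)^{1-v_ij} X_i(t) + c_j eps_j(t).

    parents : dict {j: list of parent nodes i}; nodes ordered so that parents come first.
    v, c    : dicts of edge exponents v_ij (0 or 1) and node scale parameters c_j.
    Returns X of shape (n, p, n_t) and Y of shape (n,).
    """
    rng = np.random.default_rng(seed)
    p = len(parents)
    t_grid = np.linspace(0, 1, n_t)                    # observation points
    t_u = np.linspace(0, 1, J)                         # knots of the error function
    KT = np.exp(-gamma_T * (t_grid[:, None] - t_u[None, :]) ** 2)   # RBF kappa_T(t, t_u)

    Y = rng.uniform(0, 1, n)
    X = np.zeros((n, p, n_t))
    for k in range(n):
        y = Y[k]
        for j in range(p):                             # sequential construction along the graph
            eps_j = KT @ rng.standard_normal(J)        # eps_j(t) = sum_u xi_u kappa_T(t, t_u)
            X[k, j] = c[j] * eps_j
            for i in parents[j]:
                w = y ** v[(i, j)] * (1 - y) ** (1 - v[(i, j)])
                X[k, j] += w * X[k, i]
    return X, Y

# Toy hub graph on p = 5 nodes, node 0 being the hub (hypothetical configuration).
parents = {0: [], 1: [0], 2: [0], 3: [0], 4: [0]}
v = {(0, 1): 1, (0, 2): 1, (0, 3): 0, (0, 4): 0}       # two growing and two decaying edges
c = {j: 1.0 for j in range(5)}
X, Y = simulate_conditional_fgm(parents, v, c)
print(X.shape, Y.shape)                                # (100, 5, 50) (100,)
```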

We compare our method with three alternative solutions, all of which are variants of the graphical lasso (Friedman et al. 2008, gLASSO). The first solution, which we refer to as "Average," is similar to Kolar et al. (2010). It first estimates the conditional covariance matrix by $\hat{\Sigma}_{XX}^{y} = \sum_{k=1}^{n} \kappa_Y(y, Y^k) \{\sum_{j=1}^{50} X^k(T_j) [X^k(T_j)]^{\top} / 50\} / \sum_{k=1}^{n} \kappa_Y(y, Y^k)$, where $X^k(T_j) = [X_1^k(T_j), \ldots, X_p^k(T_j)]^{\top}$, $j = 1, \ldots, 50$, $k = 1, \ldots, n$. It then estimates the conditional precision matrix $\hat{\Theta}^{y}$ by applying gLASSO to $\hat{\Sigma}_{XX}^{y}$. The second solution, which we refer to as "Majority," is similar to a procedure in Qiao, Guo, and James (2019). It first estimates the covariance matrix at each time point by $\hat{\Sigma}_{XX}^{y}(T_j) = \sum_{k=1}^{n} \kappa_Y(y, Y^k) \{X^k(T_j) [X^k(T_j)]^{\top}\} / \sum_{k=1}^{n} \kappa_Y(y, Y^k)$. It then estimates the conditional precision matrix at each time point by applying gLASSO to $\hat{\Sigma}_{XX}^{y}(T_j)$, and selects those edges that are detected by the majority of the estimated graphs. The third solution, referred to as "Unconditional," is similar to the naive procedure in Qiu et al. (2016), which, without using the information of $Y$, estimates the covariance matrix by $\hat{\Sigma}_{XX}^{k} = \sum_{j=1}^{50} X^k(T_j) [X^k(T_j)]^{\top} / 50$. It then estimates the precision matrix by applying gLASSO to $\hat{\Sigma}_{XX}^{k}$. For the penalty parameter in gLASSO, we adopt the empirical rule suggested by Rothman et al. (2008), and set it to $(\log p / n)^{1/2}$. We use the RBF kernel for both $\kappa_T$ and $\kappa_Y$ in all simulations.
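A minimal sketch of the "Average" baseline, assuming the curves are stored on a common grid as in the simulation sketch above; scikit-learn's graphical_lasso is used as the gLASSO solver, and the RBF bandwidth default is an assumption for illustration.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def average_baseline(X, Y, y, gamma_Y=1.0, alpha=None):
    """Kernel-weighted conditional covariance followed by the graphical lasso.

    X : (n, p, n_t) array of curves on a common grid; Y : (n,) conditioning variable.
    Returns the estimated conditional precision matrix Theta_hat^y.
    """
    n, p, n_t = X.shape
    w = np.exp(-gamma_Y * (Y - y) ** 2)                # kappa_Y(y, Y^k)
    w = w / w.sum()
    # Sigma_hat_XX^y = sum_k w_k { sum_j X^k(T_j) X^k(T_j)' / n_t }
    Sigma = np.zeros((p, p))
    for k in range(n):
        Sigma += w[k] * (X[k] @ X[k].T) / n_t
    if alpha is None:
        alpha = np.sqrt(np.log(p) / n)                 # penalty (log p / n)^{1/2}, as in the text
    _, Theta = graphical_lasso(Sigma, alpha=alpha)
    return Theta

# Example (hypothetical), reusing X, Y from the simulation sketch:
# Theta_y = average_baseline(X, Y, y=0.5)
```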

We evaluate the accuracy of the estimated graph using the area under ROC curve (AUC). We first compute the false positive (FP) and true positive (TP) as,

$$\mathrm{TP}^{y,\rho} = \frac{\sum_{1 \le i < j \le p} 1[(i,j) \in E_0, \, (i,j) \in \hat{E}^{y}(\rho)]}{\sum_{1 \le i < j \le p} 1[(i,j) \in E_0]}, \qquad \mathrm{FP}^{y,\rho} = \frac{\sum_{1 \le i < j \le p} 1[(i,j) \notin E_0, \, (i,j) \in \hat{E}^{y}(\rho)]}{\sum_{1 \le i < j \le p} 1[(i,j) \notin E_0]},$$

for a given $\rho$ and $y \in \Omega_Y$, where $E_0$ is the true graph. We then compute the pairs of $(\mathrm{TP}^{y,\rho}, \mathrm{FP}^{y,\rho})$ over a set of values of $\rho$, which we set to be the sorted norms of the empirical CPO.
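A minimal sketch of this ROC/AUC computation from the edge-wise CPO norms; the trapezoidal integration and the toy norms are illustrative choices, with the thresholds set to the sorted norms as described above.

```python
def roc_auc(edge_norms, true_edges, p):
    """(FP, TP) pairs over thresholds equal to the sorted CPO norms, and the resulting AUC.

    edge_norms : dict {(i, j): Hilbert-Schmidt norm of [P_hat^y]_{i,j}}, i < j.
    true_edges : set of true edges (i, j), i < j, of E_0.
    """
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    n_true = sum((i, j) in true_edges for (i, j) in pairs)
    n_false = len(pairs) - n_true
    fps, tps = [1.0], [1.0]                            # threshold below all norms: everything selected
    for rho in sorted(edge_norms[e] for e in pairs):
        selected = [e for e in pairs if edge_norms[e] > rho]
        tps.append(sum(e in true_edges for e in selected) / n_true)
        fps.append(sum(e not in true_edges for e in selected) / n_false)
    fps.append(0.0)
    tps.append(0.0)
    pts = sorted(zip(fps, tps))                        # integrate TP over FP (trapezoidal rule)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts[:-1], pts[1:]))

# Toy usage with p = 4 nodes and hypothetical norms.
norms = {(0, 1): 0.9, (0, 2): 0.2, (0, 3): 0.1, (1, 2): 0.7, (1, 3): 0.05, (2, 3): 0.3}
print(roc_auc(norms, true_edges={(0, 1), (1, 2)}, p=4))   # 1.0, a perfect ranking
```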

Figure 1 reports the AUC for the estimated graph with respect to the external variable Y, under three graph structures, and by the four methods. It is seen that the methods “Majority” and “Unconditional” perform consistently the worst, while “Average” and our proposed CPO methods perform similarly in this example.

Figure 1.


Area under the ROC curve for the estimated graph with respect to the external variable Y, under three graph structures, and by the four methods. From left to right: CPO, Average, Unconditional, Majority. The graph size p = 10.

We next extend our simulations to larger graphs, where we set the graph size p = {30, 50, 100}, with the sample size n = 30 and nk = 30 time points, k = 1, …, n. We then generate the graph in a similar way as before. Specifically, for the hub structure, we generate p/5 independent hubs, and within each hub, two edges have their strength growing with y, and two edges decaying with y. For the tree structure, we expand the tree until the graph reaches the designated size, and in each subsequent layer of the tree, one edge has growing strength and the other has decaying strength with y. For the random structure, we select the edges randomly following a Bernoulli distribution with probability 1/(p−1), and also randomly select half of the selected edges to have growing strength and the other half to have decaying strength. Due to the poor performance of “Majority” and “Unconditional” in the previous simulation setting, we only compare our CPO method with “Average” in the large-graph setting. Figure 2 reports the AUC. It is seen that our CPO method clearly outperforms the “Average” method in some settings, and is comparable in other settings. In particular, for a large graph with p = 100, the improvements of CPO over “Average” in both the hub and random structures are substantial.

Figure 2.


Area under the ROC curve for the estimated graph with respect to the external variable Y, under three graph structures, and by the two methods. From left to right: CPO, Average. The graph size p = {30, 50, 100}.

7. Application

In this section, we illustrate our conditional functional graph estimation method with a brain functional connectivity analysis example. We analyze a dataset from the Human Connectome Project (HCP), which consists of resting-state fMRI scans of 549 individuals. Each fMRI scan has been processed and summarized as a spatial-temporal matrix, with rows corresponding to 268 brain regions-of-interest, and columns corresponding to 1200 time points (Greene et al. 2018). Additionally, each subject has a score on the Penn Progressive Matrices, which is a nonverbal group test typically used in educational settings and is generally taken as a surrogate of fluid intelligence. It is of scientific interest to unravel the connectivity patterns of the brain regions conditional on the intelligence measure.

We apply our conditional functional graphical modeling approach to the whole-brain 268 × 268 connectivity network. Applying the hard thresholding approach would yield a sparse estimate of the conditional graph at each value of Y = y. To identify the edges that vary along with the conditioning variable Y, we further employ a permutation test approach. Specifically, we permute the observations of Y five hundred times. For each permuted sample, we compute the Hilbert-Schmidt (H-S) norm of the sample CPO for each edge. We then compute, for each edge, the variance of the H-S norms based on the 500 permutations. We treat those edges whose corresponding variances are above the 95% percentile as significant, since in this context a nonzero variance implies that the CPO varies with the conditioning variable. In addition, the 268 brain regions have been partitioned into eight functional modules: medial frontal, frontoparietal, default mode, motor, visual, limbic, basal ganglia, and cerebellum (Finn et al. 2015). We count the number of significant edges from the permutation test within each functional module, and then use Fisher's exact test to determine whether a module is significantly enriched.
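A minimal sketch of the module-enrichment step, taking the edge-level significance indicators (for example, obtained from the permutation-variance screening described above) as given; scipy's fisher_exact supplies the test, and the convention that an edge belongs to a module when both endpoints do, as well as the toy labels, are assumptions for illustration.

```python
from scipy.stats import fisher_exact

def module_enrichment(significant, module_of, module):
    """Fisher's exact test for enrichment of significant edges within one functional module.

    significant : dict {(i, j): bool}, permutation-test significance of each edge.
    module_of   : dict {node: module label}; module : the label being tested.
    An edge is counted as within the module if both endpoints belong to it (an assumption).
    """
    in_sig = in_not = out_sig = out_not = 0
    for (i, j), sig in significant.items():
        inside = module_of[i] == module and module_of[j] == module
        if inside and sig:
            in_sig += 1
        elif inside:
            in_not += 1
        elif sig:
            out_sig += 1
        else:
            out_not += 1
    table = [[in_sig, in_not], [out_sig, out_not]]     # 2 x 2 contingency table
    _, pval = fisher_exact(table, alternative='greater')
    return pval

# Hypothetical usage with 6 nodes in 2 modules.
module_of = {0: 'medial frontal', 1: 'medial frontal', 2: 'medial frontal',
             3: 'motor', 4: 'motor', 5: 'motor'}
significant = {(i, j): (i < 3 and j < 3) for i in range(6) for j in range(i + 1, 6)}
print(module_enrichment(significant, module_of, 'medial frontal'))
```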

We have found that the medial frontal module is significantly enriched with significant edges that vary along with the intelligence score. Figure 3 shows the identified significant edges. This finding agrees with the literature, as this module has been reported to contribute to high-level cognitive functions such as emotional responses (Smith et al. 2018) and learning (Zanto and Gazzaley 2013). Moreover, this module was found to have more impact on fluid intelligence compared to other modules (Finn et al. 2015). We have also observed more cross-lobe and inter-hemisphere interactions, which suggests the importance of these edges in cognitive functions. This again agrees with the existing neuroscience literature, in which altered interhemispheric interactions of the prefrontal lobe are closely related to higher-level cognitive traits such as Internet gaming disorder (Wang et al. 2015). Figure 4 reports the changes of the identified significant edges for the medial frontal module at five different values of the intelligence score. We see from the plot that both increasing and decreasing patterns exist for the strength of the significantly varying edges.

Figure 3.


Medial frontal network, with significant edges. Different colors of nodes indicate the ROIs in different lobes: prefrontal, motor, parietal, temporal, limbic, and cerebellum. Red lines indicate inter-lobe edges, and cyan lines indicate intra-lobe edges.

Figure 4.


Medial frontal network changes, with respect to the intelligence score at 7, 11, 15, 19, 23. Blue color represents the small H-S norm value of CPO, green the medium norm value, and red the high H-S norm value.

As an independent validation, we replicate the analysis using another resting-state fMRI dataset of 828 individuals from the HCP. We report the changes of the identified significant edges for the medial frontal module from the new dataset in Figure S6 of the appendix. We again find that the medial frontal module is enriched. Hearne, Mattingley, and Cocchi (2016) identified positive correlations between functional connectivity and intelligence in general. Song et al. (2008) reported that most of the brain regions whose connectivity patterns are negatively correlated with intelligence are around the medial frontal gyrus, or are part of the motor regions, which belong to the medial frontal module. Combined with our findings, this suggests that the medial frontal module may play a unique role in intelligence compared to other brain modules.

Supplementary Material

Supplement

Acknowledgments

We are grateful to two referees and an associate editor for their constructive comments and helpful suggestions.

Funding

Lexin Li’s research was partially supported by NIH (grants nos. R01AG061303, R01AG062542, and R01AG034570). Hongyu Zhao’s research was partially supported by NSF (grant nos. DMS1713120 and DMS1902903), and NIH (grant no. R01 GM134005).

Footnotes

Supplementary Materials

The Supplementary Appendix contains discussions of some assumptions, all the proofs of the theoretical results, and some results from additional asymptotic and numerical analyses.

References

  1. Bach F (2009), “High-Dimensional Non-Linear Variable Selection Through Hierarchical Kernel Learning,” arXiv no. 0909.0844.
  2. Baker CR (1973), “Joint Measures and Cross-Covariance Operators,” Transactions of the American Mathematical Society, 186, 273–289.
  3. Bickel PJ, and Levina E (2008), “Covariance Regularization by Thresholding,” The Annals of Statistics, 36, 2577–2604.
  4. Bosq D (2000), Linear Processes in Function Spaces, New York: Springer.
  5. Bullmore E, and Sporns O (2009), “Complex Brain Networks: Graph Theoretical Analysis of Structural and Functional Systems,” Nature Reviews Neuroscience, 10, 186–198.
  6. Cai TT, Liu W, and Luo X (2011), “A Constrained ℓ1 Minimization Approach to Sparse Precision Matrix Estimation,” Journal of the American Statistical Association, 106, 594–607.
  7. Chun H, Zhang X, and Zhao H (2015), “Gene Regulation Network Inference With Joint Sparse Gaussian Graphical Models,” Journal of Computational and Graphical Statistics, 24, 954–974.
  8. Danaher P, Wang P, and Witten DM (2011), “The Joint Graphical Lasso for Inverse Covariance Estimation Across Multiple Classes,” arXiv no. 1111.0324.
  9. Finn ES, Shen X, Scheinost D, Rosenberg MD, Huang J, Chun MM, Papademetris X, and Constable RT (2015), “Functional Connectome Fingerprinting: Identifying Individuals Using Patterns of Brain Connectivity,” Nature Neuroscience, 18, 1664–1671.
  10. Fornito A, Zalesky A, and Breakspear M (2013), “Graph Analysis of the Human Connectome: Promise, Progress, and Pitfalls,” NeuroImage, 80, 426–444.
  11. Fox MD, and Greicius M (2010), “Clinical Applications of Resting State Functional Connectivity,” Frontiers in Systems Neuroscience, 4.
  12. Friedman JH, Hastie TJ, and Tibshirani RJ (2008), “Sparse Inverse Covariance Estimation With the Graphical Lasso,” Biostatistics, 9, 432–441.
  13. Fukumizu K, Bach FR, and Jordan MI (2009), “Kernel Dimension Reduction in Regression,” The Annals of Statistics, 37, 1871–1905.
  14. Greene AS, Gao S, Scheinost D, and Constable RT (2018), “Task-Induced Brain State Manipulation Improves Prediction of Individual Traits,” Nature Communications, 9, 2807.
  15. Hastie TJ, and Tibshirani RJ (1990), Generalized Additive Models, New York: Chapman & Hall.
  16. Hearne LJ, Mattingley JB, and Cocchi L (2016), “Functional Brain Networks Related to Individual Differences in Human Intelligence at Rest,” Scientific Reports, 6, 32328.
  17. Kolar M, Song L, Ahmed A, and Xing EP (2010), “Estimating Time-Varying Networks,” The Annals of Applied Statistics, 4, 94–123.
  18. Lee K-Y, Li B, and Zhao H (2016a), “On an Additive Partial Correlation Operator and Nonparametric Estimation of Graphical Models,” Biometrika, 103, 513–530.
  19. ______ (2016b), “Variable Selection Via Additive Conditional Independence,” Journal of the Royal Statistical Society, Series B, 78, 1037–1055.
  20. Lee W, and Liu Y (2015), “Joint Estimation of Multiple Precision Matrices With Common Structures,” The Journal of Machine Learning Research, 16, 1035–1062.
  21. Li B, Chun H, and Zhao H (2012), “Sparse Estimation of Conditional Graphical Models With Application to Gene Networks,” Journal of the American Statistical Association, 107, 152–167.
  22. Li B, and Solea E (2018), “A Nonparametric Graphical Model for Functional Data With Application to Brain Networks Based on fMRI,” Journal of the American Statistical Association, 113, 1637–1655.
  23. Liu H, Han F, Yuan M, Lafferty J, and Wasserman L (2012), “High-Dimensional Semiparametric Gaussian Copula Graphical Models,” The Annals of Statistics, 40, 2293–2326.
  24. Peng J, Wang P, Zhou N, and Zhu J (2009), “Partial Correlation Estimation by Joint Sparse Regression Models,” Journal of the American Statistical Association, 104, 735–746.
  25. Qiao X, Guo S, and James GM (2019), “Functional Graphical Models,” Journal of the American Statistical Association, 114, 211–222.
  26. Qiu H, Han F, Liu H, and Caffo B (2016), “Joint Estimation of Multiple Graphical Models From High Dimensional Time Series,” Journal of the Royal Statistical Society, Series B, 78, 487–504.
  27. Rothman AJ, Bickel PJ, Levina E, and Zhu J (2008), “Sparse Permutation Invariant Covariance Estimation,” Electronic Journal of Statistics, 2, 494–515.
  28. Smith R, Lane RD, Alkozei A, Bao J, Smith C, Sanova A, Nettles M, and Killgore WDS (2018), “The Role of Medial Prefrontal Cortex in the Working Memory Maintenance of One’s Own Emotional Responses,” Scientific Reports, 8, 3460.
  29. Song M, Zhou Y, Li J, Liu Y, Tian L, Yu C, and Jiang T (2008), “Brain Spontaneous Functional Connectivity and Intelligence,” NeuroImage, 41, 1168–1176.
  30. Tsay RS, and Pourahmadi M (2017), “Modelling Structured Correlation Matrices,” Biometrika, 104, 237–242.
  31. Wang J-L, Chiou J-M, and Muller H-G (2016a), “Functional Data Analysis,” Annual Review of Statistics and Its Application, 3, 257–295.
  32. Wang Y, Kang J, Kemmer PB, and Guo Y (2016b), “An Efficient and Reliable Statistical Method for Estimating Functional Connectivity in Large Scale Brain Networks Using Partial Correlation,” Frontiers in Neuroscience, 10, 1–17.
  33. Wang Y, Yin Y, Sun YW, Zhou Y, Chen X, Ding WN, Wang W, Li W, Xu JR, and Du YS (2015), “Decreased Prefrontal Lobe Interhemispheric Functional Connectivity in Adolescents With Internet Gaming Disorder: A Primary Study Using Resting-State fMRI,” PLoS ONE, 10.
  34. Wei Z, and Li H (2008), “A Hidden Spatial-Temporal Markov Random Field Model for Network-Based Analysis of Time Course Gene Expression Data,” The Annals of Applied Statistics, 2, 408–429.
  35. Xu KS, and Hero AO (2014), “Dynamic Stochastic Blockmodels for Time-Evolving Social Networks,” IEEE Journal of Selected Topics in Signal Processing, 8, 552–562.
  36. Yuan M, and Lin Y (2007), “Model Selection and Estimation in the Gaussian Graphical Model,” Biometrika, 94, 19–35.
  37. Zanto TP, and Gazzaley A (2013), “Fronto-Parietal Network: Flexible Hub of Cognitive Control,” Trends in Cognitive Sciences, 17, 602–623.
  38. Zhang J, and Cao J (2017), “Finding Common Modules in a Time-Varying Network With Application to the Drosophila Melanogaster Gene Regulation Network,” Journal of the American Statistical Association, 112, 994–1008.
  39. Zhang T, Wu J, Li F, Caffo B, and Boatman-Reich D (2015), “A Dynamic Directional Model for Effective Brain Connectivity Using Electrocorticographic (ECoG) Time Series,” Journal of the American Statistical Association, 110, 93–106.
  40. Zhu Y, and Li L (2018), “Multiple Matrix Gaussian Graphs Estimation,” Journal of the Royal Statistical Society, Series B, 80, 927–950.
