Estimating Time-Varying Graphical Models

Jilei Yang; Jie Peng

doi:10.1080/10618600.2019.1647848

. Author manuscript; available in PMC: 2021 Apr 6.

Published in final edited form as: J Comput Graph Stat. 2019 Sep 3;29(1):191–202. doi: 10.1080/10618600.2019.1647848

Estimating Time-Varying Graphical Models

Jilei Yang ¹, Jie Peng ^2,^†

PMCID: PMC8023339 NIHMSID: NIHMS1053832 PMID: 33828398

Abstract

In this paper, we study time-varying graphical models based on data measured over a temporal grid. Such models are motivated by the needs to describe and understand evolving interacting relationships among a set of random variables in many real applications, for instance the study of how stock prices interact with each other and how such interactions change over time.

We propose a new model, LOcal Group Graphical Lasso Estimation (loggle), under the assumption that the graph topology changes gradually over time. Specifically, loggle uses a novel local group-lasso type penalty to efficiently incorporate information from neighboring time points and to impose structural smoothness of the graphs. We implement an ADMM based algorithm to fit the loggle model. This algorithm utilizes blockwise fast computation and pseudo-likelihood approximation to improve computational efficiency. An R package loggle has also been developed and is available on https://cran.r-project.org/.

We evaluate the performance of loggle by simulation experiments. We also apply loggle to S&P 500 stock price data and demonstrate that loggle is able to reveal the interacting relationships among stock prices and among industrial sectors in a time period that covers the recent global financial crisis.

The supplemental materials for this paper are also available online.

Keywords: ADMM algorithm, Gaussian graphical model, group-lasso, pseudo-likelihood approximation, S&P 500

1. Introduction

In recent years, there are many problems where the study of the interacting relationships among a large number of variables is of interest. One popular approach is to characterize interactions as conditional dependencies: Two variables are interacting with each other if they are conditionally dependent given the rest of the variables. An advantage of using conditional dependency instead of marginal dependency (e.g., through correlation) is that we are aiming for more direct interactions after taking out the effects of the rest of the variables. Moreover, if the random variables follow a multivariate normal distribution, then the elements of the inverse covariance matrix Σ⁻¹ (a.k.a. precision matrix) would indicate the presence/absence of such interactions. This is because under normality, two variables are conditionally dependent given the rest of the variables if and only if the corresponding element of the precision matrix is nonzero. Furthermore, we can represent such interactions by a graph $G = (V, E)$ , where the node set $V$ represents the set of random variables and the edge set $E$ consists of pairs {i, j} where the (i, j)th element of Σ⁻¹ is nonzero. Such models are referred to as Gaussian graphical models (GGM).

Many methods have been proposed to learn GGMs when the number of variables is large (relative to the sample size), including Meinshausen & Bühlmann (2006), Yuan & Lin (2007), Friedman et al. (2008), Banerjee et al. (2008), Rothman et al. (2008), Peng et al. (2009), Lam & Fan (2009), Ravikumar et al. (2011), Cai et al. (2011). These methods rely on the sparsity assumption, i.e., only a small subset of elements in the precision matrix is nonzero, to deal with challenges posed by high-dimension-low-sample-size.

The aforementioned methods would learn a single graph. However, when data are observed over a temporal or spatial grid, the underlying graph could change over time/space. For example, the relationships among stock prices clearly evolve over time as illustrated by Figure 2. If we had described them by a single graph, the results would be misleading. This necessitates the study of time-varying graphical models. When the graph/covariance matrix changes over time, the observations are not identically distributed anymore. To deal with this challenge, one approach is to assume the covariance matrix changes smoothly over time. In Zhou et al. (2010), Song et al. (2009), Kolar et al. (2010), Kolar & Xing (2012), Wang & Kolar (2014), Monti et al. (2014), Gibberd & Nelson (2014), Gibberd & Nelson (2017), this is achieved by replacing the sample covariance matrix by kernel estimates of the covariance matrices in the objective function.

Figure 2: — Panels(a)-(e): Fitted graphs at 5 time points. Panel(f): The sector-wise percentage of presence of within-sector edges (y-axis, colored curves) and the percentage of presence of cross-sector edges (y-axis, black curve) vs. time (x-axis). These plots show clear evolving interacting patterns among stock prices.

In practice, understanding how graph topology evolves over time is often of more interests than estimating the covariance matrices. Moreover, imposing certain structural assumptions on the graph topology could greatly facilitate interpretation and consequently provide insights about the interacting relationships and how they change over time. In Zhou et al. (2010) and Song et al. (2009), sparsity of graph topology is imposed via the lasso penalty (Tibshirani 1996). In Wang & Kolar (2014), sparsity as well as constancy of graph topology are imposed via a group-lasso type penalty (Yuan & Lin 2006). In contrast, another group of time-varying graphical models imposes separate penalties on the sparsity of the graph and the smoothness of graph topology across time. Most of these methods utilize fused-lasso type penalties (Tibshirani et al. 2005) for smoothness, including Ahmed & Xing (2009), Kolar et al. (2010), Kolar & Xing (2012), Monti et al. (2014), Gibberd & Nelson (2014, 2015, 2017), Wit & Abbruzzo (2015), Hallac et al. (2017), so that the entries of the precision matrix are piecewise constant over time. This is particularly convenient when we are primarily interested in detecting jump points and abrupt changes. Moreover, Yang et al. (2015) considered a model that uses fused-lasso penalty to learn multiple graphs corresponding to ordered categories.

In this paper, we assume that the graph topology is gradually changing over time, although neither the entry of the precision matrix nor the graph topology is assumed to be piecewise constant. We refer to this as the structural smoothness assumption. We propose LOcal Group Graphical Lasso Estimation (loggle), a time-varying graphical model that imposes both sparsity and structural smoothness through a novel local group-lasso type penalty. The main innovation of the loggle method is as follows. First, the proposed local group-lasso penalty not only efficiently combines neighborhood information and ensures structural smoothness, it also allows loggle to adapt to the local degree of smoothness in a data driven fashion. Consequently, the loggle method is very flexible and is effective for a wide range of scenarios including both time-varying and time-invariant graphical models. Secondly, we implement a computationally efficient ADMM algorithm (Boyd et al. 2011) that utilizes both blockwise fast computation (Witten et al. 2011) and pseudo-likelihood approximation (Peng et al. 2009). The pseudo-likelihood approximation not only greatly improves computational efficiency, but also leads to better graph recovery results. Thirdly, for tuning parameters selection, we propose an efficient cross-validation based procedure for temporally indexed observations. We demonstrate the competitive performance of loggle through simulation studies. Finally, we apply loggle to the S&P 500 stock price data to reveal how interactions among stock prices evolve during the recent global financial crisis. An R package loggle has also been developed.

The rest of the paper is organized as follows. In Section 2, we introduce the loggle model, model fitting algorithms and strategies for model tuning. In Section 3, we present simulation results to demonstrate the performance of loggle and compare it with existing methods. We report the application on S&P 500 stock price data in Section 4, followed by conclusions in Section 5. Technical details are in an Appendix. Additional details are deferred to a Supplementary Material.

2. Methods

2.1. Local Group Graphical Lasso Estimation

In this section, we introduce loggle (LOcal Group Graphical Lasso Estimation) for time-varying graphical models.

Let $X (t) = {(X^{1} (t), \dots, X^{p} (t))}^{T} \sim N_{p} (μ (t), Σ (t))$ be a p-dimensional Gaussian random vector indexed by t ∈ [0, 1]. We assume X(t)’s are independent across t. We also assume the mean function μ(t) and the covariance function Σ(t) are smooth in t. We denote the observations by ${x_{k}}_{k \in I}$ where $I = {1, \dots, N}$ , x_k is a realization of $X (t_{k}) (k \in I)$ and 0 ≤ t₁ ≤ … ≤ t_N ≤ 1. For simplicity, we assume that the observations are centered so that x_k is drawn from $N_{p} (0, Σ (t_{k}))$ . In practice, we can achieve this by subtracting the estimated mean $\hat{μ} (t_{k})$ from x_k. See Section S.1.3 of the Supplementary Material for details.

Our goal is to estimate the precision matrix Ω(t) := Σ⁻¹(t) based on the observed data ${x_{k}}_{k \in I}$ and then construct the edge set (equiv. the graph topology) $E (t)$ based on the sparsity pattern of the estimated precision matrix $\hat{Ω} (t)$ . We further assume that the edge set (equiv. the graph topology) changes gradually over time.

To estimate the precision matrix Ω(t_k) at the kth observed time point, we propose to minimize a locally weighted negative log-likelihood function with a local group-lasso penalty (referred to as the loggle penalty) :

L (Ω_{k}) ≔ \frac{1}{\sqrt{| N_{k, d} |}} \sum_{i \in N_{k, d}} [tr (Ω (t_{i}) \hat{Σ} (t_{i})) - \log | Ω (t_{i}) |] + λ \sum_{u \neq v} \sqrt{\sum_{i \in N_{k, d}} Ω_{u v} {(t_{i})}^{2}},

(1)

where $N_{k, d} = {i \in I : | t_{i} - t_{k} | \leq d}$ denotes the indices of the time points centered around t_k with neighborhood width d and $| N_{k, d} |$ denotes the number of elements in $N_{k, d}; Ω_{k} = {Ω (t_{i})}_{i \in N_{k, d}}$ denotes the set of precision matrices within this neighborhood and Ω_uv(t_i) denotes the (u, v)-th element of $Ω (t_{i}); \hat{Σ} (t) = \sum_{j = 1}^{N} ω_{h}^{t_{j}} (t) x_{j} x_{j}^{T}$ is the kernel estimate of the covariance matrix at time t, where the weights $ω_{h}^{t_{j}} (t) = \frac{K_{h} (t_{j} - t)}{\sum_{j = 1}^{N} K_{h} (t_{j} - t)}, K_{h} (\cdot) = K (\cdot / h)$ is a symmetric nonnegative kernel function and h(> 0) is the bandwidth.

The use of the kernel estimate $\hat{Σ} (t)$ is justified by the assumption that the covariance matrix Σ(t) is smooth in t. This allows us to borrow information from neighboring time points. In practice, we often replace the kernel smoothed covariance matrices by kernel smoothed correlation matrices which amounts to data standardization.

The loggle penalty $λ \sum_{u \neq v} \sqrt{\sum_{i \in N_{k, d}} Ω_{u v} {(t_{i})}^{2}}$ is a group-lasso type sparse regularizer (Yuan & Lin 2006, Danaher et al. 2014) that makes the graph topology both sparse and change smoothly over time. The degree of smoothness is controlled by the tuning parameter d(> 0), the larger the neighborhood width d, the more gradually the graph topology would change. The overall sparsity of the graphs is controlled by the tuning parameter λ(> 0), the larger the λ, the sparser the graphs tend to be. The factor $\frac{1}{\sqrt{| N_{k, d} |}}$ in equation (1) is to make λ comparable for different d.

After obtaining

{\hat{Ω}}_{k} = {\hat{Ω} (t_{i})}_{i \in N_{k, d}} = \arg \min_{Ω (t_{i}) ≻ 0, i \in N_{k, d}} L (Ω_{k}),

we set $\hat{Ω} (t_{k})$ as the estimated precision matrix at time t_k. The estimated graph $\hat{G} (t_{k})$ is subsequently determined by the sparsity pattern of $\hat{Ω} (t_{k})$ . Note that, although here we obtain an estimate for every time point in the time-window ${t_{i} : i \in N_{k, d}}$ , we only retain the estimate at the center, i.e., at t_k. We would repeat the above estimation procedure for each time point t_k. For two nearby time points t_k, $t_{k^{'}}$ , as their respective neighborhoods $N_{k, d}$ and $N_{k^{'}, d}$ in the loggle penalty are largely overlapping with each other, we would expect the estimates of precision matrices $\hat{Ω} (t_{k})$ and $\hat{Ω} (t_{k^{'}})$ and those of the graphs $\hat{G} (t_{k})$ and $\hat{G} (t_{k^{'}})$ to be similar (though not necessarily the same). As such, the estimates $\hat{Ω} (t_{k})$ and $\hat{G} (t)$ would change gradually with time.

Since the loggle penalty is likely to over-shrink the elements in the precision matrix, we further perform model refitting by maximizing the weighted log-likelihood function under the constraint of the estimated edge set (equiv. sparsity pattern). We denote the refitted estimate by ${\hat{Ω}}^{rf} (t_{k})$ . Note that, the precision matrix may be estimated at any time point t ∈ [0,1]: If $t \notin {t_{k} : k \in I}$ , then choose an integer $\tilde{k} \notin I$ and define $t_{\tilde{k}} = t, \tilde{I} = I \cup {\tilde{k}}$ and $N_{\tilde{k}, d} = {i \in \tilde{I} : | t_{i} - t_{\tilde{k}} | \leq d}$ . For simplicity of exposition, throughout we describe the loggle fits at observed time points.

The loggle model includes two existing time-varying graphical models as special cases. Specifically, in Zhou et al. (2010), Ω(t_k) is estimated by minimizing a weighted negative log-likelihood function with the lasso penalty (Tibshirani 1996):

\min_{Ω (t_{k}) ≻ 0} tr (Ω (t_{k}) \hat{Σ} (t_{k})) - \log | Ω (t_{k}) | + λ \sum_{u \neq v} | Ω_{u v} (t_{k}) |,

which is a special case of loggle by setting d = 0. This method utilizes the smoothness of the covariance matrix by introducing the kernel estimate $\hat{Σ} (t)$ in the likelihood function. However, it ignores potential structural smoothness of the graphs and thus might not utilize the data most efficiently. Hereafter, we refer to this method as kernel.

On the other hand, Wang & Kolar (2014) propose to use a (global) group-lasso penalty to estimate Ω(t_k)’s simultaneously:

\min_{{Ω (t_{k}) ≻ 0}_{k = 1, .., N}} \sum_{k = 1}^{N} [tr (Ω (t_{k}) \hat{Σ} (t_{k})) - \log | Ω (t_{k}) |] + λ \sum_{u \neq v} \sqrt{\sum_{k = 1}^{N} Ω_{u v} {(t_{k})}^{2}} .

This is another special case of loggle by setting d large enough to cover the entire time interval [0, 1] (e.g., d = 1). The (global) group-lasso penalty makes the estimated precision matrices have the same sparsity pattern (equiv. same graph topology) across the entire time domain. This could be too restrictive for many applications where the graph topology is expected to change over time. Hereafter, we refer to this method as invar.

Moreover, as discussed in Section 1, there is a group of methods for time-varying graphical models, where separate penalties on sparsity of the graphs and the smoothness of graph topology across time are imposed. One such method is single – Smooth Incremental Graphical Lasso Estimation (Monti et al. 2014), where a lasso penalty and a fused-lasso type penalty are used for sparsity and smoothness, respectively. Consequently, the entries of the estimated precision matrices are piecewise constant. For a more detailed description of the single method, see Section S.4 of the Supplementary Material. In Section 3, we compare loggle with kernel, invar and single through simulation experiments.

2.2. Model Fitting

Minimizing the objective function in equation (1) with respect to $Ω_{k}$ is a convex optimization problem. This can be solved by an ADMM (alternating directions method of multipliers) algorithm (Boyd et al. 2011), which converges to global optimum under very mild conditions. However, this ADMM algorithm involves $| N_{k, d} |$ eigen-decompositions of p × p matrices (each corresponding to a time point in the neighborhood) in every iteration, which is computationally very expensive when p is large. The details of this ADMM algorithm can be found in Section S.1.1 of the Supplementary Material. In the following, we propose a fast blockwise algorithm (Witten et al. 2011) and a pseudo-likelihood approximation (Peng et al. 2009) of the objective function to speed up the computation.

Fast blockwise algorithm

If the solution is block diagonal (after suitable permutation of the variables), then we can apply the ADMM algorithm to each block separately, and consequently reduce the computational complexity from O(p³) to $\sum_{l = 1}^{L} O (p_{l}^{3})$ , where p_l’s are the block sizes and $\sum_{l = 1}^{L} p_{l} = p$ .

We establish the following theorems when there are two blocks. These results are parallel to results in Witten et al. (2011) and Danaher et al. (2014) and can be easily extended to an arbitrary number of blocks.

Theorem 1

Suppose the solution of minimization of (1) with respect to $Ω_{k}$ has the following form (after appropriate variable permutation):

\hat{Ω} (t_{i}) = (\begin{matrix} {\hat{Ω}}_{1} (t_{i}) & 0 \\ 0 & {\hat{Ω}}_{2} (t_{i}) \end{matrix}), i \in N_{k, d},

where all ${\hat{Ω}}_{1} (t_{i})$ ’s have the same dimension. Then ${{\hat{Ω}}_{1} (t_{i})}_{i \in N_{k, d}}$ and ${{\hat{Ω}}_{2} (t_{i})}_{i \in N_{k, d}}$ can be obtained by minimizing (1) on the respective sets of variables separately.

Theorem 2

Let {G₁, G₂} be a non-overlapping partition of the p variables. A necessary and sufficient condition for the variables in G₁ to be completely disconnected from those in G₂ in all estimated precision matrices ${\hat{Ω} (t_{i})}_{i \in N_{k, d}}$ through minimizing (1) is:

\frac{1}{| N_{k, d} |} \sum_{i \in N_{k, d}} {\hat{Σ}}_{u v} {(t_{i})}^{2} \leq λ^{2}, f o r a l l u \in G_{1}, v \in G_{2} .

The proof of Theorem 1 is straightforward through inspecting the Karush-Kuhn-Tucker (KKT) condition of the optimization problem of minimizing (1). The proof of Theorem 2 is given in Appendix A.1.

Based on Theorem 2, we propose the following fast blockwise ADMM algorithm:

Create a p × p adjacency matrix A. For 1 ≤ u ≠ v ≤ p, set the off-diagonal elements A_uv = 0 if $\frac{1}{| N_{k, d} |} \sum_{i \in N_{k, d}} {\hat{Σ}}_{u v} {(t_{i})}^{2} \leq λ^{2}$ ; and A_uv = 1, if otherwise.
Identify the connected components, G₁, ⋯, G_L, given the adjancency matrix A. Denote their sizes by $p_{1}, \dots, p_{L} (\sum_{l = 1}^{L} p_{l} = p)$ .
For l = 1, ⋯, L, if p_l = 1, i.e., G_l contains only one variable, say the uth variable, then set ${\hat{Ω}}_{u u} (t_{i}) = {({\hat{Σ}}_{u u} (t_{i}))}^{- 1}$ for $i \in N_{k, d}$ ; If p_l > 1, then apply the ADMM algorithm to the p_l variables in G_l to obtain the corresponding ${{\hat{Ω}}_{l} (t_{i})}_{i \in N_{k, d}}$ .

Pseudo-likelihood approximation

Even with the fast blockwise algorithm, the computational cost could still be high due to the eigen-decompositions. In the following, we propose a pseudo-likelihood approximation to speed up step (iii) of the algorithm (for simplicity of exposition, the description below is based on the entire set of the variables). In practice, the pseudo-likelihood approximation has been able to reduce computational cost by as much as 90%. Moreover, the pseudo-likelihood approximation often improves the graph recovery results especially for graphs with hubs as noted in Peng et al. (2009) and shown in Section 3.

The proposed approximation is based on the following well known fact that relates the elements of the precision matrix to the coefficients of regressing one variable to the rest of the variables (Meinshausen & Bühlmann 2006, Peng et al. 2009). Suppose a random vector (X¹,…,X^p)^T has mean zero and covariance matrix Σ. Denote the precision matrix by Ω = ((Ω_uv)) := Σ⁻¹. If we write $X^{u} = \sum_{v \neq u} β_{u v} X^{v} + ϵ_{u}$ , where the residual ϵ_u is uncorrelated with {X^v : v ≠ u}, then $β_{u v} = - \frac{Ω_{u v}}{Ω_{u u}}$ . Note that, β_uv = 0 if and only if Ω_uv = 0. Therefore identifying the sparsity pattern of the precision matrix is equivalent to identifying sparsity pattern of the regression coefficients.

We consider minimizing the following local group-lasso penalized weighted L₂ loss function for estimating $β (t_{k}) = {(β_{u v} (t_{k}))}_{u \neq v}$ :

L_{P L} (B_{k}) ≔ \frac{1}{\sqrt{| N_{k, d} |}} \sum_{i \in N_{k, d}} [\frac{1}{2} \sum_{u = 1}^{p} ‖ X_{u} - \sum_{v \neq u} β_{u v} (t_{i}) X_{v} ‖_{W_{h} (t_{i})}^{2}] + λ \sum_{u \neq v} \sqrt{\sum_{i \in N_{k, d}} β_{u v} {(t_{i})}^{2}},

(2)

where $B_{k} = {β (t_{i})}_{i \in N_{k, d}}$ is the set of β(t_i)’s within the neighborhood centered around t_k with neighborhood width d; $X_{u} = {(x_{1}^{u}, \dots, x_{N}^{u})}^{T}$ is the sequence of the uth variable in observations {x_j}_1≤j≤N and $W_{h} (t_{i}) = diag {ω_{h}^{t_{i}} (t_{j})}_{1 \leq j \leq N}$ is a weight matrix. The W-norm of a vector z is defined as $‖ z ‖_{W} = \sqrt{z^{T} Wz}$ . Once $\hat{β} (t_{k})$ is obtained through minimizing (2) with respect to $B_{k}$ , we can derive the estimated edge set at $t_{k} : \hat{E} (t_{k}) = {{u, v} : {\hat{β}}_{u v} (t_{k}) \neq 0, u < v}$ .

The objective function (2) may be viewed as an approximation of the likelihood based objective function (1) through the aforementioned regression connection by ignoring the correlation among the residuals ϵ_u’s. We refer to this approximation as the pseudo-likelihood approximation. However, minimizing (2) cannot guarantee symmetry of edge selection, i.e., ${\hat{β}}_{u v} (t)$ and ${\hat{β}}_{v u} (t)$ being simultaneously zero or nonzero. To achieve this, we modify (2) by using a paired group-lasso penalty (Friedman et al. 2010):

{\tilde{L}}_{P L} (B_{k}) = \frac{1}{\sqrt{| N_{k, d} |}} \sum_{i \in N_{k, d}} [\frac{1}{2} \sum_{u = 1}^{p} ‖ X_{u} - \sum_{v \neq u} β_{u v} (t_{i}) X_{v} ‖_{W_{h} (t_{i})}^{2}] + λ \sum_{u < v} \sqrt{\sum_{i \in N_{k, d}} [β_{u v} {(t_{i})}^{2} + β_{v u} {(t_{i})}^{2}]} .

(3)

The paired group-lasso penalty guarantees simultaneous selection of β_uv(t) and β_vu(t).

The objective function (3) can be rewritten as:

{\tilde{L}}_{P L} (B_{k}) = \frac{1}{\sqrt{| N_{k, d} |}} \sum_{i \in N_{k, d}} \frac{1}{2} {‖ Y (t_{i}) - X (t_{i}) β (t_{i}) ‖}_{2}^{2} + λ \sum_{u < v} \sqrt{\sum_{i \in N_{k, d}} [β_{u v} {(t_{i})}^{2} + β_{v u} {(t_{i})}^{2}]},

(4)

where $Y (t_{i}) = {({\tilde{X}}_{1} {(t_{i})}^{T}, \dots, {\tilde{X}}_{p} {(t_{i})}^{T})}^{T}$ is an Np × 1 vector with ${\tilde{X}}_{u} (t_{i}) = \sqrt{W_{h} (t_{i})} X_{u}$ being an N×1 vector (u = 1,⋯,p); $X (t_{i}) = ({\tilde{X}}_{(1, 2)} (t_{i}), \dots, {\tilde{X}}_{(p, p - 1)} (t_{i}))$ is an Np×p(p−1) matrix, with ${\tilde{X}}_{(u, v)} (t_{i}) = {(0_{N}^{T}, \dots, 0_{N}^{T}, {\tilde{X}}_{v} {(t_{i})}^{T}, 0_{N}^{T}, \dots, 0_{N}^{T})}^{T}$ being an Np×1 vector, where ${\tilde{X}}_{v} (t_{i})$ is in the uth block (1 ≤ u ≠ v ≤ p); and $β (t_{i}) = {(β_{12} (t_{i}), \dots, β_{p, p - 1} (t_{i}))}^{T}$ is a p(p − 1) × 1 vector.

We implement an ADMM algorithm to minimize (4), which does not involve eigen-decomposition and thus is much faster than the ADMM algorithm for minimizing the original likelihood based objective function (1). This is because, the L₂ loss used in (3) and (4) is quadratic in the parameters $B_{k}$ as opposed to the negative log-likelihood loss used in (1) which has a log-determinant term. Moreover, X(t_i) is actually a block diagonal matrix: $X (t_{i}) = diag {{\tilde{X}}_{(- u)} (t_{i})}_{1 \leq u \leq p}$ , where ${\tilde{X}}_{(- u)} (t_{i}) = ({\tilde{X}}_{1} (t_{i}), \dots, {\tilde{X}}_{u - 1} (t_{i}), {\tilde{X}}_{u + 1} (t_{i}), \dots, {\tilde{X}}_{p} (t_{i}))$ is an N ×(p−1) matrix. Therefore, computations can be done in a blockwise fashion and potentially can be parallelized. The detailed algorithm is given in Appendix A.2.

2.3. Model Tuning

In the loggle model, there are three tuning parameters, namely, h – the kernel bandwidth (for $\hat{Σ} (t)$ ’s), d – the neighborhood width (for $N_{k, d}$ ’s) and λ – the sparsity parameter. In the following, we describe V -fold cross-validation (CV) to choose these parameters.

Recall that observations are made on a temporal grid. So we create the validation sets by including every V th data point and the corresponding training set would be the rest of the data points. E.g., for V = 5, the 1st validation set would include observations at t₁, t₆, t₁₁, ⋯, the 2nd validation set would include those at t₂, t₇, t₁₂, ⋯, etc. In the following, let $I_{(v)}$ denote the indices of the time points in the vth validation set and $I_{- (v)}$ denote those in the vth training set (v = 1, ⋯, V).

Let h_grid, d_grid, λ_grid denote the tuning grids from which h, d and λ, respectively, are chosen. See Section 3 for an example of the tuning grids. We recommend to choose d and λ separately for each t_k as the degrees of sparsity and smoothness of the graph topology may vary over time. On the other hand, we recommend to choose a common h for all time points.

Given time t_k and h, for (d_k, λ_k), we obtain the refitted estimate ${\hat{Ω}}_{- (v)}^{rf} (t_{k}; d_{k}, λ_{k}, h)$ by applying loggle to the vth training set ${x_{i}}_{i \in I_{- (v)}}$ (v = 1, ⋯, V). As mentioned in Section 2.1, this can be done even if $t_{k} \notin {t_{i} : i \in I_{- (v)}}$ . We then derive the validation score on the vth validation set:

{CV}_{v} (t_{k}; λ_{k}, d_{k}, h) = tr ({\hat{Ω}}_{- (v)}^{rf} (t_{k}; d_{k}, λ_{k}, h) {\hat{Σ}}_{(v)} (t_{k})) - \log | {\hat{Ω}}_{- (v)}^{rf} (t_{k}; d_{k}, λ_{k}, h) |,

where ${\hat{Σ}}_{(v)} (t_{k}) ≔ \sum_{i \in I_{(v)}} ω_{h_{V}}^{t_{i}} (t_{k}) x_{i} x_{i}^{T}$ is the kernel estimate of the covariance matrix Σ(t_k) based on the vth validation set ${x_{i}}_{i \in I_{(v)}}$ . Here, the bandwidth h_V is set to be $h \cdot {(\frac{1}{V - 1})}^{- 1 / 5}$ to reflect the difference in sample sizes between the validation and training sets. Finally, the V -fold cross-validation score at time t_k is defined as: $CV (t_{k}; λ_{k}, d_{k}, h) = \sum_{v = 1}^{V} {CV}_{v} (t_{k}; λ_{k}, d_{k}, h)$ . The “optimal” tuning parameters at t_k given h, $({\hat{d}}_{k} (h), {\hat{λ}}_{k} (h))$ , is the pair that minimizes the CV score. Finally, the “optimal” h is chosen by minimizing the sum of $CV (t_{k}; {\hat{λ}}_{k} (h), {\hat{d}}_{k} (h), h)$ over those time points where a loggle model is fitted.

We also adopt the cv.vote procedure proposed in Peng et al. (2010) which has been shown to be able to significantly reduce the false discovery rate while sacrifice only modestly in power. Specifically, given the CV selected tuning parameters, we examine the fitted model on each training set and only retain those edges that appear in at least T% of these models. In practice, we recommend 80% as the cut off value for edge retention.

Moreover, we implement efficient grid search strategies including early stopping and coarse search followed by refined search to further speed up the computation. Details can be found in Section S.2 of the Supplementary Material.

3. Simulation

In this section, we evaluate the performance of loggle and compare it with kernel, invar and single by simulation experiments.

3.1. Setting

We consider models with both time-varying graph and time-invariant graph:

Time-varying graph: (i) Generate four lower triangular matrices $B_{1}, B_{2}, B_{3}, B_{4} \in ℝ^{p \times p}$ with elements independently drawn from $N (0, 1 / 2)$ . (ii) Let ϕ₁(t) = sin(πt/2), ϕ₂(t) = cos(πt/2), ϕ₃(t) = sin(πt/4) and ϕ₄(t) = cos(πt/4), t ∈ [0,1], and set G(t) = (B₁ϕ₁(t) + B₂ϕ₂(t) + B₃ϕ₃(t) + B₄ϕ₄(t))/2. (iii) Define Ω^o(t) = G(t)G^T (t) and “soft threshold” its off-diagonal elements to obtain Ω(t): $Ω_{u v} (t) = sign (1 - \frac{0.28}{| Ω_{u v}^{o} (t) |}) \cdot (1 - \frac{0.14}{| Ω_{u v}^{o} (t) |})_{+} Ω_{u v}^{o} (t)$ , where (x)₊ = max{x, 0}. (iv) Add log₁₀(p)/4 to the diagonal elements of Ω(t) to ensure positive definiteness.
Time-invariant graph: (i) Generate an Erdos-Renyi graph (Erdös & Rényi 1959) where each pair of nodes is connected independently with probability 2/p (so the total number of edges is around p). Denote the edge set of this graph by $E$ . (ii) For off-diagonal elements (1 ≤ u ≠ v ≤ p), if ${u, v} \notin E$ , set Ω_uv(t) ≡ 0 for t ∈ [0, 1]; If ${u, v} \in E$ , set Ω_uv(t) = sin(2πt − c_uv), where c_uv ∼ uniform(0,1) is a random offset. (iii) For diagonal elements (1 ≤ u ≤ p), set Ω_uu(t) = |sin(2πt−c_uu)|+log₁₀(p), where c_uu ∼ uniform(0,1) is a random offset.

We construct three models following the above descriptions. Specifically, two models have time-varying graph with p = 100 and p = 500 nodes, respectively. In these two models, the graph changes smoothly over time with the average number of edges being 51.6 (standard deviation 6.0) for p = 100 model and 203.0 (standard deviation 66.8) for p = 500 model. The plots depicting the number of edges vs. time are given in Figure S.1 of the Supplementary Material. In the third model, the graph is time-invariant (even though the precision matrix changes over time) with p = 100 nodes and 93 fixed edges.

For each model, we generate $x_{k} \sim N_{p} (0, Ω^{- 1} (t_{k}))$ , with $t_{k} = \frac{k - 1}{N} (k = 1, \dots, N + 1)$ . We use the Epanechnikov kernel $K_{h} (x) = \frac{3}{4} (1 - {(x / h)}^{2}) I_{{| x | \leq h}}$ to obtain smoothed estimates of the correlation matrices. In the following, we consider N = 1000,500,250 observations and conduct model fitting on K = 49 time points at ${\tilde{t}}_{k} \in {0.02, 0.04, \dots, 0.96, 0.98}$ .

For loggle, kernel and invar, we use 5-fold cross-validation (CV) for tuning parameters selection with the search grids h_grid = {0.1, 0.15, …, 0.3}, λ_grid = {0.15, 0.17, …, 0.35} and d_grid = {0, 0.001, 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 1}. (Recall kernel has d fixed at 0 and invar has d fixed at 1). The corresponding results are reported in Table 1 under loggle-CV, kernel-CV and invar-CV, respectively.

Table 1: Simulation Results: Model Selection by 5-fold CV.

N = 1000 observations.

p = 100 time-varying graph model
Method	FDR	power	F₁	δ_KL

loggle-CV	0.196	0.702	0.747	2.284
kernel-CV	0.063	0.571	0.703	2.690
invar-CV	0.583	0.678	0.514	2.565

p = 500 time-varying graph model
Method	FDR	power	F₁	δ_KL

loggle-CV	0.215	0.613	0.678	9.564
kernel-CV	0.035	0.399	0.561	11.818
invar-CV	0.590	0.597	0.478	10.608

p =100 time-invariant graph model
Method	FDR	power	F₁	δ_KL

loggle-CV	0.000	0.978	0.988	1.559
kernel-CV	0.042	0.509	0.598	3.168
invar-CV	0.000	1.000	1.000	1.531

Open in a new tab

For single, the AIC implemented in the R package single results in poor estimates with very high false discovery rate. On the other hand, implementation of the CV procedure proposed in this paper would require modification of the single method as well as the code. Therefore, in Table 2, we report the best graph recovery results according to the F₁ score (defined in the next paragraph) on a search grid, referred to as single–optimal. Note that in practice, results based on data-driven tuning procedures (such as CV) are expected to be worse than single-optimal. More details such as search grids used for the single method can be found in Section S.4 in the Supplementary Material. For the purpose of comparison, in Table 2, we also report loggle, kernel and invar results corresponding to the best F₁ score, referred to as loggle-optimal, kernel-optimal and invar-optimal, respectively.

Table 2: Simulation Results: Model Selection by Optimal F₁ Score.

N = 1000 observations.

p = 100 time-varying graph model
Method	FDR	power	F₁	δ_KL

loggle-optimal	0.141	0.778	0.812	1.955
kernel-optimal	0.204	0.756	0.767	2.138
invar-optimal	0.575	0.673	0.519	2.587
single-optimal	0.221	0.735	0.750	2.115

p = 500 time-varying graph model
Method	FDR	power	F₁	δ_KL

loggle-optimal	0.153	0.690	0.758	8.361
kernel-optimal	0.139	0.607	0.703	9.903
invar-optimal	0.596	0.644	0.488	10.158
single-optimal	0.392	0.685	0.639	9.731

p =100 time-invariant graph model
Method	FDR	power	F₁	δ_KL

loggle-optimal	0.009	1.000	0.995	1.539
kernel-optimal	0.219	0.729	0.738	2.281
invar-optimal	0.011	1.000	0.995	1.538
single-optimal	0.409	0.668	0.603	2.677

Open in a new tab

The metrics used for performance evaluation include false discovery rate: $FDR ≔ 1 - \frac{1}{K} \sum_{k = 1}^{K} | {\hat{S}}_{k} \cap S_{k} | / | {\hat{S}}_{k} |$ and power := $\frac{1}{K} \sum_{k = 1}^{K} | {\hat{S}}_{k} \cap S_{k} | / | S_{k} |$ for edge detection (averaged over the K time points where graphs are estimated), where S_k and ${\hat{S}}_{k}$ are the true edge set and the estimated edge set at time point ${\tilde{t}}_{k}$ , respectively. We also consider $F_{1} := 2 \cdot \frac{(1 - FDR) \cdot power}{(1 - FDR) + power}$ as an overall metric for model selection performance which strikes a balance between FDR and power: The larger F₁ is, the better a method performs in terms of edge selection. In addition, we calculate the Kullback-Leibler (K-L) divergence (relative entropy) between the true models and the estimated models: $δ_{K L} ≔ \frac{1}{K} \sum_{k = 1}^{K} [tr (\hat{Ω} ({\tilde{t}}_{k}) Ω^{- 1} ({\tilde{t}}_{k})) - \log | \hat{Ω} ({\tilde{t}}_{k}) Ω^{- 1} ({\tilde{t}}_{k}) | - p]$ .

3.2. Results

Table 1 shows that under the time-varying graph models, loggle outperforms kernel according to F₁ score and K-L divergence. Not surprisingly, invar performs very poorly for time-varying graph models. On the other hand, under the time-invariant graph model, loggle performs similarly as invar, whereas kernel performs very poorly.

In Table S.2 of the Supplementary Material, we also report results under N = 500, 250 observations for the p = 100 time-varying and p = 100 time-invariant models, respectively. As can be seen from Tables 1 and S.2, for all three methods, power of edge detection and K-L divergence measure δ_KL deteriorate with decreasing sample size. On the other hand, the relative performances of the three methods remain the same across different sample sizes.

As for the comparison with the single method, the single-optimal result (Table 2) is slightly better than that of loggle-CV (Table 1) under the p = 100 time-varying graph model, whereas under the p = 500 time-varying graph model and p = 100 time-invariant graph model, the single-optimal results are worse than those of loggle-CV. Recall that the single-optimal results reported in Table 2 are usually not achievable in practice as F₁ scores are unknown in practice. A fairer comparison between loggle-optimal and single-optimal shows that the latter is considerably worse in all three cases (Table 2).

As for comparison between CV results and “optimal” results (Table 1 vs. Table 2), for loggle, the “optimal” results generally have smaller FDR and higher power, reflecting a better choice of tuning parameters by the “optimal” results; for kernel, the “optimal” results have both higher FDR and higher power; for invar, the “CV” results and the “optimal” results have similar FDR and similar power. It can be seen that model tuning is harder for loggle than for kernel and invar due to its increased complexity. Despite this, the results of loggle-CV are still generally better than kernel-CV and invar-CV as discussed earlier.

The simulation results demonstrate that loggle can adapt to different degrees of smoothness of the graph topology in a data driven fashion and has generally good performance across a wide range of scenarios including both time-varying and time-invariant graphical models. Moreover, loggle demonstrates competitive performance compared to the single method that utilizes separate penalties for sparsity and smoothness.

The loggle procedure is more computationally intensive than kernel and invar as it fits many more models. For the p = 100 time-varying graph model, loggle took 3750 seconds using 25 cores on a linux server with 72 cores, 256GB RAM and two Intel Xeon E5–2699 v3 @ 2.30GHz processors. At the same time, kernel took 226 seconds and invar took 777 seconds. On average, at each grid point, per loggle model fit took 23.2 milliseconds (ms), per kernel model fit took 16.8ms. Moreover, invar model fit took 2825.5 ms and single model fit took about 10 minutes (Note, for invar and single, models at all time points are fitted simultaneously). The additional computational cost of loggle is justified by its superior performance and should become less of a burden with fast growth of computational power.

As for comparison with the exact likelihood implementation, by comparing Table 1 and Table S.1 of the Supplementary Material, using pseudo-likelihood approximation improves the graph recovery results for all three methods. For example, the F₁ score of loggle increases from 0.714 to 0.747 under p = 100 time-varying graph model, from 0.647 to 0.678 under p = 500 time-varying graph model and from 0.866 to 0.988 under p = 100 time-invariant graph model. Moreover, the computational time is greatly reduced by the pseudo-likelihood approximation. For example, for p = 100 time-varying graph model, on average, per loggle model fit with exact-likelihood took 115.8 ms, and per loggle model fit with pseudo-likelihood approximation only took 23.2 ms.

4. S&P 500 Stock Price

In this section, we apply loggle to the S&P 500 stock price dataset obtained via R package quantmod from www.yahoo.com. We focus on 283 stocks from 5 Global Industry Classification Standard (GICS) sectors: 58 stocks from Information Technology, 72 stocks from Consumer Discretionary, 32 stocks from Consumer Staples, 59 stocks from Financials, and 62 stocks from Industrials. We are interested in elucidating how direct interactions (characterized by conditional dependencies) among stock prices are evolving over time and particularly how such interactions are affected by the recent global financial crisis.

For this purpose, we consider a 4-year time period from January 1st, 2007 to January 1st, 2011, which covers the recent global financial crisis: “According to the U.S. National Bureau of Economic Research, the recession, as experienced in that country, began in December 2007 and ended in June 2009, thus extending over 19 months. The Great Recession was related to the financial crisis of 2007–2008 and U.S. subprime mortgage crisis of 2007–2009 (Source: wikipedia)”. Each stock has 1008 closing prices during this period, denoted by ${y_{k}}_{k = 1}^{1008}$ . We use the logarithm of the ratio between two adjacent prices, i.e., $\log \frac{y_{k + 1}}{y_{k}} (k = 1, \dots, 1007)$ for the subsequent analysis. We also convert the time points onto [0, 1] by $t_{k} = \frac{k - 1}{1006}$ for k = 1, ⋯, 1007. By examining the autocorrelation (Figure S.2 of the Supplementary Material), the independence assumption appears to hold reasonably well.

We use the Epanechnikov kernel to obtain the kernel estimates of the correlation matrices. We then fit three models, namely, loggle, kernel and invar, at K = 201 time points {0.005, 0.010,…, 0.995} using 5-fold cross-validation for model tuning. We use the tuning grids h_grid = {0.1, 0.15}, λ_grid = {10⁻², 10^−1.9, …, 10^−0.1, 1} and d_grid = {0, 0.001, 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 1}, where h_grid is pre-selected by using coarse search described in Section S.2 of the Supplementary Material. Table 3 reports the average number of edges across the fitted graphs (and standard deviations in parenthesis) as well as the CV scores. We can see that loggle has a significantly smaller CV score than those of kernel and invar. Moreover, on average, loggle and invar models have similar number of edges, whereas kernel models have more edges.

Table 3: Stock price:

Number of edges and CV score

Method	Average edge # (s.d.)	CV score
loggle	819.4 (331.0)	123.06
kernel	1103.5 (487.1)	160.14
invar	811.0 (0.0)	130.68

Open in a new tab

Figure 1(a) shows the number of edges in the fitted graph over time. The invar fitted graphs have an identical topology, which is unable to reflect the evolving relationships among these stocks. On the other hand, both loggle and kernel are able to capture the changing relationships by fitting graphs with time-varying topologies. More specifically, both methods detect an increased amount of interaction (characterized by larger number of edges) during the financial crisis. The amount of interaction peaked around the end of 2008 and then went down to a level still higher than that of the pre-crisis period. As can be seen from the figure, the kernel graphs show rather drastic changes, whereas the loggle graphs change more gradually. The loggle method in addition detects a period with increased interaction in the early stage of the financial crisis, indicated by the smaller peak around October 2007 in Figure 1(a). This is likely due to the subprime mortgage crisis which acted as a precursor of the financial crisis (Amadeo 2017). In the period after the financial crisis, the loggle fits are similar to those of invar with a nearly constant graph topology after March 2010, indicating that the relationships among the stocks had stabilized. In contrast, kernel fits show a small bump in edge number around the middle of 2010 and decreasing amount of interaction afterwards.

Figure 1(b) displays the proportion of within-sector edges among the total number of detected edges. During the entire time period, loggle fitted graphs consistently have higher proportion of within-sector edges than that of the kernel fitted graph. For both methods, this proportion decreased during the financial crisis due to increased amount of cross-sector interaction. For loggle, the within-sector edge proportion eventually increased and stabilized after March 2010, although at a level lower than that of the pre-crisis period. In contrast, for kernel, the within-sector proportion took a downturn again after October 2009. In summary, the loggle fitted graphs are easier to interpret in terms of describing the evolving interacting relationships among the stock prices and identifying the underlying sector structure of the stocks. Hereafter, we focus our discussion on loggle fitted graphs.

Figure 2(a)–(e) show the loggle fitted graphs at 5 different time points, namely, before, at the early stage, around the peak, towards the end and after the financial crisis. These graphs show clear evolving interacting patterns among stock prices. More specifically, the amount/degree of interaction among the stock prices increased with the deepened crisis. This is reflected by denser graphs and less isolated nodes when comparing Figure 2(c) to Figure 2(a,b). In these graphs, an isolated node may be interpreted as the corresponding stock price is not influenced by or only weakly interacting with the prices of other stocks. Moreover, with the passing of the crisis, the amount/degree of interaction decreased and eventually stabilized. This is reflected by relatively sparser graphs and more isolated nodes when comparing Figure 2(d,e) to Figure 2(c). In addition, by comparing Figure 2(e) to Figure 2(a), it can be seen that the amount/degree of interaction is higher after the crisis compared to the pre-crisis era, indicating fundamental change of the financial landscape. These graphs also show clear sector-wise clusters (nodes with the same color corresponding to stocks from the same sector).

Figure 2(f) shows the sector-wise percentage of presence of within-sector edges, defined as the ratio between the number of detected within-sector edges and the total number of possible within-sector edges for a given sector; and the percentage of presence of cross-sector edges, defined as the ratio between the number of detected cross-sector edges and the total number of possible cross-sector edges. As can be seen from this figure, the within-sector percentages are much higher than the cross-sector percentage, reaffirming the observation that loggle is able to identify the underlying sector structure. Moreover, the within-Financials sector percentage is among the highest across the entire time period, indicating that the stocks in this sector have been consistently highly interacting with each other. Finally, all percentages increased after the financial crisis began and leveled off afterwards, reflecting the increased amount of interaction during the financial crisis.

In Figure 3, the graphs describe cross-sector interactions among the 5 GICS sectors at five different time points (before, at early stage, around the peak, at late stage and after the financial crisis). In these graphs, each node represents a sector and edge width is proportional to the respective percentage of presence of cross-sector edges (defined as the detected number of edges between two sectors divided by the total number of possible edges between these two sectors). Moreover, edges with cross-sector percentage less than 0.2% are not displayed. We can see that there are more cross-sector interactions during the financial crisis, indicating higher degree of dependency among different sectors in that period. There are also some interesting observations with regard to how these sectors interact with one another and how such interactions change over time. For example, strong cross-sector interactions between the Financials sector and the Consumer Staples sector arose during the financial crisis despite of their weak relationship before and after the crisis. This is probably due to strong influence of the financial industry on the entire economy during financial crisis. Take the Consumer Discretionary sector and the Industrials sector as another example. These two sectors maintained a persistent relationship throughout the four years, indicating intrinsic connections between them irrespective of the financial landscape.

5. Conclusion

In this paper, we propose LOcal Group Graphical Lasso Estimation – loggle, a novel model for estimating a sequence of time-varying graphs based on temporal observations. By using a local group-lasso type penalty, loggle imposes structural smoothness on the estimated graphs and consequently leads to more efficient use of the data as well as more interpretable graphs. Moreover, loggle can adapt to the local degrees of smoothness and sparsity of the underlying graphs in a data driven fashion and thus is effective under a wide range of scenarios. We also develop a computationally efficient algorithm for loggle that utilizes the block-diagonal structure and pseudo-likelihood approximation and a customized cross-validation based procedure for tuning parameters selection. The effectiveness of loggle is demonstrated through simulation experiments. Moreover, by applying loggle to the S&P 500 stock price data, we obtain interpretable and insightful graphs about the dynamic interacting relationships among the prices of these stocks, particularly on how such relationships change in response to the recent global financial crisis.

An R package loggle is available on http://cran.r-project.org/. We have used R (R Core Team 2016) and the following R packages, Matrix (Bates & Maechler 2017), glasso (Friedman et al. 2014), doParallel (Corporation & Weston 2018), foreach (Microsoft & Weston 2017), sm (Bowman & Azzalini 2014) and igraph (Csardi & Nepusz 2006), in the implementation of the loggle package and when conducting simulation and real data application. The stock price data is publicly available at www.yahoo.com and R package quantmod (Ryan & Ulrich 2019) is used to extract the stock information used in this paper.

Supplementary Material

NIHMS1053832-supplement-1.pdf^{(210.6KB, pdf)}

Acknowledgments

The authors gratefully acknowledge the following support: UCD Dissertation Year Fellowship (JLY), NIH 1R01EB021707 (JLY and JP) and NSF-DMS-1148643 (JP).

Appendix

A.1. Proof of Theorem 2

By the KKT conditions, a necessary and sufficient set of conditions for ${\hat{Ω}}_{k} = {\hat{Ω} (t_{i})}_{i \in N_{k, d}}$ being the minimizer of $L (Ω_{k})$ in (1) is:

\frac{1}{\sqrt{| N_{k, d} |}} (\hat{Σ} (t_{i}) - \hat{Ω} {(t_{i})}^{- 1}) + λ Γ (t_{i}) = 0, \forall i \in N_{k, d},

(A.1)

where Γ(t_i) = (Γ_uv(t_i))_p×p, and ${(Γ_{u v} (t_{i}))}_{i \in N_{k, d}}$ is a subgradient of $\sqrt{\sum_{i \in N_{k, d}} Ω_{u v} {(t_{i})}^{2}}$ :

{(Γ_{u v} (t_{i}))}_{i \in N_{k, d}} {\begin{array}{l} = {(\frac{Ω_{u v} (t_{i})}{\sqrt{\sum_{i \in N_{k, d}} Ω_{u v} {(t_{i})}^{2}}})}_{i \in N_{k, d}} & if \sum_{i \in N_{k, d}} Ω_{u v} {(t_{i})}^{2} > 0 \\ such that \sum_{i \in N_{k, d}} Γ_{u v} {(t_{i})}^{2} \leq 1 & if \sum_{i \in N_{k, d}} Ω_{u v} {(t_{i})}^{2} = 0. \end{array}

If for $\forall i \in N_{k, d}$ , $\hat{Ω} (t_{i}) = (\begin{matrix} {\hat{Ω}}_{1} (t_{i}) & 0 \\ 0 & {\hat{Ω}}_{2} (t_{i}) \end{matrix})$ , where ${\hat{Ω}}_{1} (t_{i})$ and ${\hat{Ω}}_{2} (t_{i})$ consist of the variables in G₁ and G₂ respectively, then ${\hat{Ω}}_{k} = {\hat{Ω} (t_{i})}_{i \in N_{k, d}}$ satisfies (A.1) iff ∀u ∈ G₁, ∀v ∈ G₂, $\exists {(Γ_{u v} (t_{i}))}_{i \in N_{k, d}}$ satisfying $\sum_{i \in N_{k, d}} Γ_{u v} {(t_{i})}^{2} \leq 1$ such that

\frac{1}{\sqrt{| N_{k, d} |}} {\hat{Σ}}_{u v} (t_{i}) + λ Γ_{u v} (t_{i}) = 0, \forall i \in N_{k, d} .

This is equivalent to

\forall u \in G_{1}, \forall v \in G_{2}, \frac{1}{| N_{k, d} |} \sum_{i \in N_{k, d}} {\hat{Σ}}_{u v} {(t_{i})}^{2} \leq λ^{2} .

A.2. ADMM algorithm under pseudo-likelihood approximation

To solve the optimization problem in (4) using ADMM algorithm, we notice that the problem can be written as

minimize B_{k}, ℤ_{k} \sum_{i \in N_{k, d}} \frac{1}{2} {‖ Y (t_{i}) - X (t_{i}) β (t_{i}) ‖}_{2}^{2} + λ \sum_{u < v} \sqrt{\sum_{i \in N_{k, d}} [Z_{u v} {(t_{i})}^{2} + Z_{v u} {(t_{i})}^{2}]}, subject to β (t_{i}) - Z (t_{i}) = 0, i \in N_{k, d},

where $N_{k, d} = {i : | t_{i} - t_{k} | \leq d}$ , $B_{k} = {β (t_{i})}_{i \in N_{k, d}}$ and $ℤ_{k} = {Z (t_{i})}_{i \in N_{k, d}}$ . Note β(t_i) = (βuv(ti))_u≠v and Z(ti) = (Zuv(ti))_u≠v are $ℝ^{p (p - 1)}$ vectors.

The scaled augmented Lagrangian is

L_{ρ} (B_{k}, ℤ_{k}, U_{k}) = \sum_{i \in N_{k, d}} \frac{1}{2} {‖ Y (t_{i}) - X (t_{i}) β (t_{i}) ‖}_{2}^{2} + λ \sum_{u < v} \sqrt{\sum_{i \in N_{k, d}} [Z_{u v} {(t_{i})}^{2} + Z_{v u} {(t_{i})}^{2}]} + \frac{ρ}{2} \sum_{i \in N_{k, d}} {‖ β (t_{i}) - Z (t_{i}) + U (t_{i}) ‖}_{2}^{2},

where $U_{k} = {U (t_{i})}_{i \in N_{k, d}}$ are dual variables $(U (t_{i}) = {(U_{u v} (t_{i}))}_{u \neq v} \in ℝ^{p (p - 1)})$ .

The ADMM algorithm is as follows. We first initialize Z⁽⁰⁾(t_i) = 0, U⁽⁰⁾(t_i) = 0, $i \in N_{k, d}$ . We also need to specify ρ(> 0), which in practice is recommended to be ≈ λ (Wahlberg et al. 2012). For step s = 1, 2, … until convergence:

For $i \in N_{k, d},$
$β^{s} (t_{i}) = \arg \min_{β (t_{i})} \frac{1}{2} {‖ Y (t_{i}) - X (t_{i}) β (t_{i}) ‖}_{2}^{2} + \frac{ρ}{2} {‖ β (t_{i}) - Z^{s - 1} (t_{i}) + U^{s - 1} (t_{i}) ‖}_{2}^{2} .$

The solution β^s(t_i) sets the derivative of the objective function to 0:
$(X {(t_{i})}^{T} X (t_{i}) + ρ I) β^{s} (t_{i}) = X {(t_{i})}^{T} Y (t_{i}) + ρ (Z^{s - 1} (t_{i}) - U^{s - 1} (t_{i})) .$

It is easy to see that
$X {(t_{i})}^{T} X (t_{i}) + ρ I = diag {{\tilde{X}}_{(- u)} {(t_{i})}^{T} {\tilde{X}}_{(- u)} (t_{i}) + ρ I}_{1 \leq u \leq p} = diag {{(\hat{Σ} (t_{i}) + ρ I)}_{(- u, - u)}}_{1 \leq u \leq p},$
where $\hat{Σ} (t_{i})$ is the kernel estimate of the covariance matrix as in Section 2.1. That is, X(t_i)^T X(t_i) + ρI is a block diagonal matrix with p blocks, where the uth block, the (p−1)×(p−1) matrix ${(\hat{Σ} (t_{i}) + ρ I)}_{(- u, - u)}$ , is the matrix $\hat{Σ} (t_{i}) + ρ I$ with the uth row and the uth column deleted.

Moreover,
$X {(t_{i})}^{T} Y (t_{i}) = {({({\tilde{X}}_{(- 1)} {(t_{i})}^{T} {\tilde{X}}_{1} (t_{i}))}^{T}, \dots, {({\tilde{X}}_{(- p)} {(t_{i})}^{T} {\tilde{X}}_{p} (t_{i}))}^{T})}^{T} = {({(\hat{Σ} {(t_{i})}_{(- 1, 1)})}^{T}, \dots, {(\hat{Σ} {(t_{i})}_{(- p, p)})}^{T})}^{T} .$

That is, X(t_i)^T Y(t_i) is a p(p−1)×1 column vector consisting of p sub-vectors, where the uth sub-vector is the uth column of $\hat{Σ} (t_{i})$ with the uth element (i.e., the diagonal element) deleted.

Since X(t_i)^T X(t_i) + ρI and X(t_i)^T Y(t_i) can be decomposed into blocks, β^s(t_i) can be solved block-wisely:
$β_{u}^{s} (t_{i}) = {({(\hat{Σ} (t_{i}) + ρ I)}_{(- u, - u)})}^{- 1} (\hat{Σ} {(t_{i})}_{(- u, u)} + ρ (Z_{u}^{s - 1} (t_{i}) - U_{u}^{s - 1} (t_{i}))), u = 1, \dots, p,$
where $β_{u}^{s} (t_{i}) = (β_{u 1}^{s} (t_{i}), \dots, β_{u, u - 1}^{s} (t_{i}), \dots, β_{u, u + 1}^{s} (t_{i}), \dots, β_{u p}^{s} (t_{i}))$ is a (p−1)×1 column vector, $β^{s} (t_{i}) = {(β_{1}^{s} {(t_{i})}^{T}, \dots, β_{p}^{s} {(t_{i})}^{T})}^{T}$ , and $Z_{u}^{s - 1} (t_{i}) = (Z_{u 1}^{s - 1} (t_{i}), \dots, Z_{u p}^{s - 1} (t_{i}))$ and $U_{u}^{s - 1} (t_{i}) = (U_{u 1}^{s - 1} (t_{i}), \dots, U_{u p}^{s - 1} (t_{i}))$ contain the corresponding elements in Z^s−1(t_i) and U^s−1(t_i), respectively.

Here, we need to solve p linear systems, each with p equations. One way is to conduct Cholesky decompositions of the matrices ${(\hat{Σ} (t_{i}) + ρ I)}_{(- u, - u)}, u = 1, \dots, p$ in advance and use Gaussian elimination to solve the corresponding triangular linear systems. To do this, we apply Cholesky decomposition to $\hat{Σ} (t_{i}) + ρ I$ followed by p Givens rotations. This has overall time complexity O(p³), the same as the time complexity of the subsequent p applications of Gaussian eliminations. Note that, if we had performed Cholesky decomposition on each of the (p−1)×(p−1) matrix directly, the total time complexity would have been O(p⁴). The details of conducting Cholesky decompositions of the matrices ${(\hat{Σ} (t_{i}) + ρ I)}_{(- u, - u)}, (u = 1, \dots, p)$ through Givens rotations are given in S.1.2 of the Supplementary Material.
$ℤ_{k}^{s} = \arg \min_{ℤ_{k}} [\frac{ρ}{2} \sum_{i \in N_{k, d}} {‖ Z (t_{i}) - β^{s} (t_{i}) - U^{s - 1} (t_{i}) ‖}_{2}^{2} + λ \sum_{u < v} \sqrt{\sum_{i \in N_{k, d}} [Z_{u v} {(t_{i})}^{2} + Z_{v u} {(t_{i})}^{2}]}] .$

For $i \in N_{k, d}$ , 1 ≤ u ≠ v ≤ p, it is easy to see that
$Z_{u v}^{s} (t_{i}) = {(1 - \frac{λ}{ρ \sqrt{\sum_{j \in N_{k, d}} [{(β_{u v}^{s} (t_{j}) + U_{u v}^{s - 1} (t_{j}))}^{2} + {(β_{v u}^{s} (t_{j}) + U_{v u}^{s - 1} (t_{j}))}^{2}]}})}_{+} \cdot (β_{u v}^{s} (t_{i}) + U_{u v}^{s - 1} (t_{i})) .$
For $i \in N_{k, d}$ ,
$U^{s} (t_{i}) = U^{s - 1} (t_{i}) + β^{s} (t_{i}) - Z^{s} (t_{i}) .$

Over-relaxation

In steps (ii) and (iii), we replace β^s(t_i) by αβ^s(t_i) + (1 − α)Z^s−1(t_i), where the relaxation parameter α is set to be 1.5. It is suggested in Boyd et al. (2011) that over-relaxation with α ∈ [1.5, 1.8] can improve convergence.

Stopping criterion

The norm of the primal residual at step s is ${‖ r^{s} ‖}_{2} = \sqrt{\sum_{i \in N_{k, d}} {‖ β^{s} (t_{i}) - Z^{s} (t_{i}) ‖}_{2}^{2}}$ , and the norm of the dual residual at step s is ${‖ d^{s} ‖}_{2} = \sqrt{\sum_{i \in N_{k, d}} {‖ Z^{s} (t_{i}) - Z^{s - 1} (t_{i}) ‖}_{2}^{2}}$ . Define the feasibility tolerance for the primal as $ϵ^{p r i} = ϵ^{a b s} \sqrt{p (p - 1) | N_{k, d} |} + ϵ^{r e l} \max {\sqrt{\sum_{i \in N_{k, d}} {‖ β^{s} (t_{i}) ‖}_{2}^{2}}, \sqrt{\sum_{i \in N_{k, d}} {‖ Z^{s} (t_{i}) ‖}_{2}^{2}}}$ , and the feasibility tolerance for the dual as $ϵ^{d u a l} = ϵ^{a b s} \sqrt{p (p - 1) | N_{k, d} |} + ϵ^{r e l} \sqrt{\sum_{i \in N_{k, d}} {‖ U^{s} (t_{i}) ‖}_{2}^{2}}$ . Here ϵ^abs is the absolute tolerance and in practice is often set as 10⁻⁵ or 10⁻⁴, and ϵ^rel is the relative tolerance and in practice is often set as 10⁻³ or 10⁻². The stopping criterion is that the algorithm stops if and only if ${‖ r^{s} ‖}_{2} \leq ϵ^{p r i}$ and ${‖ d^{s} ‖}_{2} \leq ϵ^{d u a l}$ .

Footnotes

SUPPLEMENTARY MATERIAL

loggle_supplementary_text.pdf: Additional details of this paper, including additional details in algorithm, model tuning, simulation and real data application, as well as detailed description of the single method. (.pdf file)

loggle_test Folder: Folder containing the R package loggle, R scripts for simulation and real data application, and data used in simulation and real data application. (folder)

README.txt in loggle_test Folder: Detailed description of the files in loggle_test Folder. (.txt file)

Contributor Information

Jilei Yang, Department of Statistics, University of California, Davis.

Jie Peng, Department of Statistics, University of California, Davis.

References

Ahmed A & Xing EP (2009), ‘Recovering time-varying networks of dependencies in social and biological studies’, Proceedings of the National Academy of Sciences 106(29), 11878–11883. [DOI] [PMC free article] [PubMed] [Google Scholar]
Amadeo K (2017), ‘Here’s how they missed the early clues of the financial crisis’. URL: https://www.thebalance.com/2007-financial-crisis-overview-3306138
Banerjee O, Ghaoui LE & dAspremont A (2008), ‘Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data’, Journal of Machine learning research 9(March), 485–516. [Google Scholar]
Bates D & Maechler M (2017), Matrix: Sparse and Dense Matrix Classes and Methods. R package version 1.2–10. URL: https://CRAN.R-project.org/package=Matrix
Bowman AW & Azzalini A (2014), R package sm: nonparametric smoothing methods (version 2.2–5.4), University of Glasgow, UK and Università di Padova, Italia. URL: http://www.stats.gla.ac.uk/adrian/sm [Google Scholar]
Boyd S, Parikh N, Chu E, Peleato B & Eckstein J (2011), ‘Distributed optimization and statistical learning via the alternating direction method of multipliers’, Foundations and Trends® in Machine Learning 3(1), 1–122. [Google Scholar]
Cai T, Liu W & Luo X (2011), ‘A constrained 1 minimization approach to sparse precision matrix estimation’, Journal of the American Statistical Association 106(494), 594–607. [Google Scholar]
Corporation M & Weston S (2018), doParallel: Foreach Parallel Adaptor for the ‘parallel’ Package. R package version 1.0.14. URL: https://CRAN.R-project.org/package=doParallel
Csardi G & Nepusz T (2006), ‘The igraph software package for complex network research’, InterJournal Complex Systems, 1695. URL: http://igraph.org [Google Scholar]
Danaher P, Wang P & Witten DM (2014), ‘The joint graphical lasso for inverse covariance estimation across multiple classes’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(2), 373–397. [DOI] [PMC free article] [PubMed] [Google Scholar]
Erdös P & Rényi A (1959), ‘On random graphs, i’, Publicationes Mathematicae (Debrecen) 6, 290–297. [Google Scholar]
Friedman J, Hastie T & Tibshirani R (2008), ‘Sparse inverse covariance estimation with the graphical lasso’, Biostatistics 9(3), 432–441. [DOI] [PMC free article] [PubMed] [Google Scholar]
Friedman J, Hastie T & Tibshirani R (2010), Applications of the lasso and grouped lasso to the estimation of sparse graphical models, Technical report, Technical report, Stanford University. [Google Scholar]
Friedman J, Hastie T & Tibshirani R (2014), glasso: Graphical lasso- estimation of Gaussian graphical models. R package version 1.8. URL: https://CRAN.R-project.org/package=glasso [DOI] [PMC free article] [PubMed]
Gibberd AJ & Nelson JD (2014), High dimensional changepoint detection with a dynamic graphical lasso, in ‘Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on’, IEEE, pp. 2684–2688. [Google Scholar]
Gibberd AJ & Nelson JD (2015), ‘Estimating dynamic graphical models from multivariate time-series data’, Proceedings of AALTD 2015 p. 63. [Google Scholar]
Gibberd AJ & Nelson JD (2017), ‘Regularized estimation of piecewise constant gaussian graphical models: The group-fused graphical lasso’, Journal of Computational and Graphical Statistics (just-accepted). [Google Scholar]
Hallac D, Park Y, Boyd S & Leskovec J (2017), ‘Network inference via the time-varying graphical lasso’, arXiv preprint arXiv:1703.01958. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kolar M, Song L, Ahmed A & Xing EP (2010), ‘Estimating time-varying networks’, The Annals of Applied Statistics pp. 94–123. [Google Scholar]
Kolar M & Xing EP (2012), ‘Estimating networks with jumps’, Electronic journal of statistics 6, 2069. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lam C & Fan J (2009), ‘Sparsistency and rates of convergence in large covariance matrix estimation’, Annals of statistics 37(6B), 4254. [DOI] [PMC free article] [PubMed] [Google Scholar]
Meinshausen N & Bühlmann P (2006), ‘High-dimensional graphs and variable selection with the lasso’, The annals of statistics pp. 1436–1462. [Google Scholar]
Microsoft & Weston S (2017), foreach: Provides Foreach Looping Construct for R. R package version 1.4.4. URL: https://CRAN.R-project.org/package=foreach
Monti RP, Hellyer P, Sharp D, Leech R, Anagnostopoulos C & Montana G (2014), ‘Estimating time-varying brain connectivity networks from functional mri time series’, NeuroImage 103, 427–443. [DOI] [PubMed] [Google Scholar]
Peng J, Wang P, Zhou N & Zhu J (2009), ‘Partial correlation estimation by joint sparse regression models’, Journal of the American Statistical Association 104(486), 735–746. [DOI] [PMC free article] [PubMed] [Google Scholar]
Peng J, Zhu J, Bergamaschi A, Han W, Noh D-Y, Pollack JR & Wang P (2010), ‘Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer’, The annals of applied statistics 4(1), 53. [DOI] [PMC free article] [PubMed] [Google Scholar]
R Core Team (2016), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/ [Google Scholar]
Ravikumar P, Wainwright MJ, Raskutti G, Yu B et al. (2011), ‘High-dimensional covariance estimation by minimizing 1-penalized log-determinant divergence’, Electronic Journal of Statistics 5, 935–980. [Google Scholar]
Rothman AJ, Bickel PJ, Levina E, Zhu J et al. (2008), ‘Sparse permutation invariant covariance estimation’, Electronic Journal of Statistics 2, 494–515. [Google Scholar]
Ryan JA & Ulrich JM (2019), quantmod: Quantitative Financial Modelling Framework. R package version 0.4–14. URL: https://CRAN.R-project.org/package=quantmod
Song L, Kolar M & Xing EP (2009), ‘Keller: estimating time-varying interactions between genes’, Bioinformatics 25(12), i128–i136. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tibshirani R (1996), ‘Regression shrinkage and selection via the lasso’, Journal of the Royal Statistical Society. Series B (Methodological) pp. 267–288. [Google Scholar]
Tibshirani R, Saunders M, Rosset S, Zhu J & Knight K (2005), ‘Sparsity and smoothness via the fused lasso’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(1), 91–108. [Google Scholar]
Wahlberg B, Boyd S, Annergren M & Wang Y (2012), ‘An admm algorithm for a class of total variation regularized estimation problems’, IFAC Proceedings Volumes 45(16), 83–88. [Google Scholar]
Wang J & Kolar M (2014), ‘Inference for sparse conditional precision matrices’, arXiv preprint arXiv:1412.7638. [Google Scholar]
Wit EC & Abbruzzo A (2015), ‘Inferring slowly-changing dynamic gene-regulatory networks’, BMC bioinformatics 16(6), S5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Witten DM, Friedman JH & Simon N (2011), ‘New insights and faster computations for the graphical lasso’, Journal of Computational and Graphical Statistics 20(4), 892–900. [Google Scholar]
Yang S, Lu Z, Shen X, Wonka P & Ye J (2015), ‘Fused multiple graphical lasso’, SIAM Journal on Optimization 25(2), 916–943. [Google Scholar]
Yuan M & Lin Y (2006), ‘Model selection and estimation in regression with grouped variables’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(1), 49–67. [Google Scholar]
Yuan M & Lin Y (2007), ‘Model selection and estimation in the gaussian graphical model’, Biometrika 94(1), 19–35. [Google Scholar]
Zhou S, Lafferty J & Wasserman L (2010), ‘Time varying undirected graphs’, Machine Learning 80(2–3), 295–319. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

NIHMS1053832-supplement-1.pdf^{(210.6KB, pdf)}

[R1] Ahmed A & Xing EP (2009), ‘Recovering time-varying networks of dependencies in social and biological studies’, Proceedings of the National Academy of Sciences 106(29), 11878–11883. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] Amadeo K (2017), ‘Here’s how they missed the early clues of the financial crisis’. URL: https://www.thebalance.com/2007-financial-crisis-overview-3306138

[R3] Banerjee O, Ghaoui LE & dAspremont A (2008), ‘Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data’, Journal of Machine learning research 9(March), 485–516. [Google Scholar]

[R4] Bates D & Maechler M (2017), Matrix: Sparse and Dense Matrix Classes and Methods. R package version 1.2–10. URL: https://CRAN.R-project.org/package=Matrix

[R5] Bowman AW & Azzalini A (2014), R package sm: nonparametric smoothing methods (version 2.2–5.4), University of Glasgow, UK and Università di Padova, Italia. URL: http://www.stats.gla.ac.uk/adrian/sm [Google Scholar]

[R6] Boyd S, Parikh N, Chu E, Peleato B & Eckstein J (2011), ‘Distributed optimization and statistical learning via the alternating direction method of multipliers’, Foundations and Trends® in Machine Learning 3(1), 1–122. [Google Scholar]

[R7] Cai T, Liu W & Luo X (2011), ‘A constrained 1 minimization approach to sparse precision matrix estimation’, Journal of the American Statistical Association 106(494), 594–607. [Google Scholar]

[R8] Corporation M & Weston S (2018), doParallel: Foreach Parallel Adaptor for the ‘parallel’ Package. R package version 1.0.14. URL: https://CRAN.R-project.org/package=doParallel

[R9] Csardi G & Nepusz T (2006), ‘The igraph software package for complex network research’, InterJournal Complex Systems, 1695. URL: http://igraph.org [Google Scholar]

[R10] Danaher P, Wang P & Witten DM (2014), ‘The joint graphical lasso for inverse covariance estimation across multiple classes’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(2), 373–397. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] Erdös P & Rényi A (1959), ‘On random graphs, i’, Publicationes Mathematicae (Debrecen) 6, 290–297. [Google Scholar]

[R12] Friedman J, Hastie T & Tibshirani R (2008), ‘Sparse inverse covariance estimation with the graphical lasso’, Biostatistics 9(3), 432–441. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] Friedman J, Hastie T & Tibshirani R (2010), Applications of the lasso and grouped lasso to the estimation of sparse graphical models, Technical report, Technical report, Stanford University. [Google Scholar]

[R14] Friedman J, Hastie T & Tibshirani R (2014), glasso: Graphical lasso- estimation of Gaussian graphical models. R package version 1.8. URL: https://CRAN.R-project.org/package=glasso [DOI] [PMC free article] [PubMed]

[R15] Gibberd AJ & Nelson JD (2014), High dimensional changepoint detection with a dynamic graphical lasso, in ‘Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on’, IEEE, pp. 2684–2688. [Google Scholar]

[R16] Gibberd AJ & Nelson JD (2015), ‘Estimating dynamic graphical models from multivariate time-series data’, Proceedings of AALTD 2015 p. 63. [Google Scholar]

[R17] Gibberd AJ & Nelson JD (2017), ‘Regularized estimation of piecewise constant gaussian graphical models: The group-fused graphical lasso’, Journal of Computational and Graphical Statistics (just-accepted). [Google Scholar]

[R18] Hallac D, Park Y, Boyd S & Leskovec J (2017), ‘Network inference via the time-varying graphical lasso’, arXiv preprint arXiv:1703.01958. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] Kolar M, Song L, Ahmed A & Xing EP (2010), ‘Estimating time-varying networks’, The Annals of Applied Statistics pp. 94–123. [Google Scholar]

[R20] Kolar M & Xing EP (2012), ‘Estimating networks with jumps’, Electronic journal of statistics 6, 2069. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] Lam C & Fan J (2009), ‘Sparsistency and rates of convergence in large covariance matrix estimation’, Annals of statistics 37(6B), 4254. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] Meinshausen N & Bühlmann P (2006), ‘High-dimensional graphs and variable selection with the lasso’, The annals of statistics pp. 1436–1462. [Google Scholar]

[R23] Microsoft & Weston S (2017), foreach: Provides Foreach Looping Construct for R. R package version 1.4.4. URL: https://CRAN.R-project.org/package=foreach

[R24] Monti RP, Hellyer P, Sharp D, Leech R, Anagnostopoulos C & Montana G (2014), ‘Estimating time-varying brain connectivity networks from functional mri time series’, NeuroImage 103, 427–443. [DOI] [PubMed] [Google Scholar]

[R25] Peng J, Wang P, Zhou N & Zhu J (2009), ‘Partial correlation estimation by joint sparse regression models’, Journal of the American Statistical Association 104(486), 735–746. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] Peng J, Zhu J, Bergamaschi A, Han W, Noh D-Y, Pollack JR & Wang P (2010), ‘Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer’, The annals of applied statistics 4(1), 53. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] R Core Team (2016), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/ [Google Scholar]

[R28] Ravikumar P, Wainwright MJ, Raskutti G, Yu B et al. (2011), ‘High-dimensional covariance estimation by minimizing 1-penalized log-determinant divergence’, Electronic Journal of Statistics 5, 935–980. [Google Scholar]

[R29] Rothman AJ, Bickel PJ, Levina E, Zhu J et al. (2008), ‘Sparse permutation invariant covariance estimation’, Electronic Journal of Statistics 2, 494–515. [Google Scholar]

[R30] Ryan JA & Ulrich JM (2019), quantmod: Quantitative Financial Modelling Framework. R package version 0.4–14. URL: https://CRAN.R-project.org/package=quantmod

[R31] Song L, Kolar M & Xing EP (2009), ‘Keller: estimating time-varying interactions between genes’, Bioinformatics 25(12), i128–i136. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] Tibshirani R (1996), ‘Regression shrinkage and selection via the lasso’, Journal of the Royal Statistical Society. Series B (Methodological) pp. 267–288. [Google Scholar]

[R33] Tibshirani R, Saunders M, Rosset S, Zhu J & Knight K (2005), ‘Sparsity and smoothness via the fused lasso’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(1), 91–108. [Google Scholar]

[R34] Wahlberg B, Boyd S, Annergren M & Wang Y (2012), ‘An admm algorithm for a class of total variation regularized estimation problems’, IFAC Proceedings Volumes 45(16), 83–88. [Google Scholar]

[R35] Wang J & Kolar M (2014), ‘Inference for sparse conditional precision matrices’, arXiv preprint arXiv:1412.7638. [Google Scholar]

[R36] Wit EC & Abbruzzo A (2015), ‘Inferring slowly-changing dynamic gene-regulatory networks’, BMC bioinformatics 16(6), S5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] Witten DM, Friedman JH & Simon N (2011), ‘New insights and faster computations for the graphical lasso’, Journal of Computational and Graphical Statistics 20(4), 892–900. [Google Scholar]

[R38] Yang S, Lu Z, Shen X, Wonka P & Ye J (2015), ‘Fused multiple graphical lasso’, SIAM Journal on Optimization 25(2), 916–943. [Google Scholar]

[R39] Yuan M & Lin Y (2006), ‘Model selection and estimation in regression with grouped variables’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(1), 49–67. [Google Scholar]

[R40] Yuan M & Lin Y (2007), ‘Model selection and estimation in the gaussian graphical model’, Biometrika 94(1), 19–35. [Google Scholar]

[R41] Zhou S, Lafferty J & Wasserman L (2010), ‘Time varying undirected graphs’, Machine Learning 80(2–3), 295–319. [Google Scholar]

PERMALINK

Estimating Time-Varying Graphical Models

Jilei Yang

Jie Peng

Abstract

1. Introduction

Figure 2: Stock Price: loggle fitted time-varying graphs.

2. Methods

2.1. Local Group Graphical Lasso Estimation

2.2. Model Fitting

Fast blockwise algorithm

Theorem 1

Theorem 2

Pseudo-likelihood approximation

2.3. Model Tuning

3. Simulation

3.1. Setting

Table 1: Simulation Results: Model Selection by 5-fold CV.

Table 2: Simulation Results: Model Selection by Optimal F1 Score.

3.2. Results

4. S&P 500 Stock Price

Table 3: Stock price:

Figure 1: Stock Price: Summary statistics of the estimated graphs by loggle, kernel and invar.

Figure 3: Stock Price: Cross-sector interaction plots at 5 different time points based on loggle fitted graphs.

5. Conclusion

Supplementary Material

Acknowledgments

Appendix

A.1. Proof of Theorem 2

A.2. ADMM algorithm under pseudo-likelihood approximation

Over-relaxation

Stopping criterion

Footnotes

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Table 2: Simulation Results: Model Selection by Optimal F₁ Score.