Biometrika. 2009 Oct 9;96(4):821–834. doi: 10.1093/biomet/asp049

Bayesian analysis of matrix normal graphical models

Hao Wang, Mike West
PMCID: PMC3376753  PMID: 22822246

Abstract

We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling.

Some key words: Gaussian graphical model, Graphical model search, Hyper-inverse Wishart distribution, Marginal likelihood, Matrix normal model, Matrix-variate dynamic graphical model, Parameter expansion

1. Introduction

We introduce and analyse matrix normal graphical models; that is, matrix normal distributions (Dawid, 1981; Gupta & Nagar, 2000) in which each of the two characterizing covariance matrices reflects conditional independencies consistent with an underlying graphical model (Whittaker, 1990; Lauritzen, 1996). We present fully Bayesian analysis of the matrix normal model as the special case of full graphs, and develop computational methods for marginal likelihood computation on a specified graphical model. This enables graphical model search and comparison for posterior inferences about conditional independence structures. The random sampling framework is extended to matrix-variate time series models that inherit the graphical model structure, representing conditional independencies in matrix time series. We focus on decomposable graphs, although the general approach also applies to nondecomposable models.

Matrix-variate normal distributions have been studied in analysis of two-factor linear models for cross-classified multivariate data (Finn, 1974; Galecki, 1994; Naik & Rao, 2001), in spatio-temporal models (Mardia & Goodall, 1993; Huizenga et al., 2002) and other areas. Some computational and inferential developments, including iterative calculation of maximum likelihood estimates (Dutilleul, 1999; Mitchell et al., 2006) and empirical Bayesian methods for Procrustes analysis with matrix models (Theobald & Wuttke, 2006) have been published. Our work appears to be the first to develop fully Bayesian analysis of the basic matrix normal model alone, though that is only a necessary first step to the broader framework of matrix graphical models.

In time series, graphical modelling of the covariance matrix of multivariate data appears in Carvalho & West (2007a, 2007b). Here we generalize that earlier work to time series of matrix data, providing fully Bayesian inference and graphical model search related to both row and column intra-dependencies in the cross-sectional structure of a matrix-valued time series.

2. Matrix variate normals, graphs and notation

The q × p random matrix Y is matrix normal, Y ∼ N(M, U, V), with mean M (q × p), column and row covariance matrices U = (uij) (q × q) and V = (υij) (p × p), respectively, when

$$p(Y) \equiv p(Y \mid U, V) = k(U, V)\,\exp\!\big[-\mathrm{tr}\{(Y - M)' U^{-1} (Y - M) V^{-1}\}/2\big], \qquad (1)$$

where k(U, V) = (2π)−qp/2|U|−p/2|V|−q/2. The rows yi (i = 1, . . ., q) and columns yj (j = 1, . . ., p) have margins yi ∼ N(mi, uiiV) and yj ∼ N(mj, υjjU), with precision matrices Λ = V−1 = (λij) and Ω = U−1 = (ωij), respectively. The normal conditional distributions have mean vectors and covariance matrices

$$\begin{aligned}
E(y_i \mid y_{-i}) &= m_i - \omega_{ii}^{-1} \sum_{s \in \{1, \ldots, q\} \setminus i} \omega_{is}(y_s - m_s), \qquad \mathrm{cov}(y_i \mid y_{-i}) = \omega_{ii}^{-1} V,\\
E(y_j \mid y_{-j}) &= m_j - \lambda_{jj}^{-1} \sum_{t \in \{1, \ldots, p\} \setminus j} \lambda_{tj}(y_t - m_t), \qquad \mathrm{cov}(y_j \mid y_{-j}) = \lambda_{jj}^{-1} U,
\end{aligned}$$

for rows i = 1, . . ., q and columns j = 1, . . ., p. Zeros in Λ and Ω define conditional independencies. If (i, j) ≠ (s, t) then the elements yij and yst may, conditional upon y−(ij,st), be dependent through either rows or columns; their conditional independence is equivalent to: at least one zero among λtj and ωis when s ≠ i, j ≠ t; ωis = 0 when s ≠ i, j = t; and λjt = 0 when s = i, j ≠ t. With no loss of generality, in this section we set M = 0.
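As a concrete check on this covariance structure, the following minimal Python sketch (ours, purely illustrative) simulates Y ∼ N(0, U, V) through Cholesky factors of U and V and verifies empirically that cov{vec(Y)} = V ⊗ U, the separable structure underlying equation (1); in particular, the (j, j) diagonal block of V ⊗ U is υjjU, the column margin stated above.

```python
import numpy as np

# Simulate Y = L_U Z L_V' with Z standard normal, where U = L_U L_U' and
# V = L_V L_V'; then vec(Y), stacking columns, has covariance V kron U.
rng = np.random.default_rng(0)
q, p = 3, 2
U = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
V = np.array([[1.0, -0.4],
              [-0.4, 2.0]])
LU, LV = np.linalg.cholesky(U), np.linalg.cholesky(V)
draws = np.stack([LU @ rng.standard_normal((q, p)) @ LV.T
                  for _ in range(50000)])
vecs = draws.transpose(0, 2, 1).reshape(len(draws), -1)  # vec stacks columns
print(np.abs(np.cov(vecs.T) - np.kron(V, U)).max())      # small Monte Carlo error
```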

Undirected graphical models can be applied to each of Λ and Ω to represent strict conditional independencies. A graph GV on nodes {1, . . ., p} has edges between pairs of column indices (j, t) for which λjt ≠ 0; Λ has off-diagonal zeros corresponding to within-row conditional independencies. Similarly, a graph GU on nodes {1, . . ., q} lacks edges between row indices (i, s) for which ωis = 0. We focus here on decomposable graphs GU and GV. The theory of graphical models can now be overlaid to define factorizations of the matrix normal density over graphs. Over GV, for example, we have

$$p(Y \mid U, V, G_U, G_V) = \prod_{P_V \in \mathcal{P}_V} p(Y_{P_V} \mid U, V_{P_V}) \Big/ \prod_{S_V \in \mathcal{S}_V} p(Y_{S_V} \mid U, V_{S_V}), \qquad (2)$$

where 𝒫V is the set of complete prime components, or cliques, of GV and 𝒮V is the set of separators. For each subgraph g ∈ 𝒫V ∪ 𝒮V, Yg is the q × |g| matrix with variables from the |g| columns of Y defined by the subgraph, and Vg is the corresponding submatrix of V. Each term in equation (2) is matrix normal, Yg ∼ N(0, U, Vg), with Λg = Vg−1 having no off-diagonal zeros. We can similarly factorize the joint density over GU.

Now, U and V are not uniquely identified since, for any c > 0, p(Y | U, V) = p(Y | cU, V/c). There are a number of approaches to imposing identification constraints, such as tr(V) = p (Theobald & Wuttke, 2006), as well as strategies that use unconstrained parameters; we discuss the latter in § 8. Our use of hyper-Markov priors over each of U and V with underlying graphical models, discussed below, makes it desirable to adopt an explicit constraint, and we enforce υ11 = 1 from here on.

3. Matrix graphical modelling

Hyper-inverse Wishart priors are conjugate for covariance matrices in multivariate normal graphical models (Dawid & Lauritzen, 1993). Hyper-inverse Wishart distributions are compatible and consistent across graphs, which is critical when admitting uncertainty about graph structures (Giudici & Green, 1999; Jones et al., 2005). On decomposable graphs, the implied priors on sub-covariance matrices on all components and separators are inverse Wishart. Use of independent hyper-inverse Wishart priors for U, V in the current context is a natural choice, and maintains compatibility and consistency across graphs GU, GV. To incorporate the identification constraint υ11 = 1, we use a parameter expansion approach. Parameter expansion involves expanding the parameter space by adding new nuisance parameters, and has been used purely algorithmically to accelerate Markov chain Monte Carlo samplers (Liu et al., 1998; Liu & Wu, 1999), but can also be used to induce new priors (Gelman, 2004, 2006) as is germane here.

We assume the prior p(U, V) = p(U)p(V) where, using the hyper-inverse Wishart notation of Giudici & Green (1999) and Jones et al. (2005), the margins are defined by

$$U \sim \mathrm{HIW}_{G_U}(b, B), \qquad V = V^*/\upsilon^*_{11}, \qquad V^* \sim \mathrm{HIW}_{G_V}(d, D). \qquad (3)$$

The density function for U is, following Dawid & Lauritzen (1993),

$$p(U) = \prod_{P_U \in \mathcal{P}_U} p(U_{P_U} \mid b, B_{P_U}) \Big/ \prod_{S_U \in \mathcal{S}_U} p(U_{S_U} \mid b, B_{S_U}),$$

where each component is an inverse Wishart density; p(V*) has a similar form.

The parameter expansion concept enters through υ11*, an added parameter that converts the column scales in V* to scales relative to that of the first column. As we move across graphs GV, the priors p(V | GV) induce the same priors over subgraph correlation structures but are no longer in complete agreement for V = V*/υ11*, due to the different parameterizations and interpretations. This is natural and appropriate. Suppose GV and GV′ are two graphs with a common clique C. Each element of diag(VC) represents the scale of variance of that column relative to the variance of the first column, so that, if GV and GV′ imply different conditional dependencies between the first column and the columns linked to C, then the induced priors over VC should indeed be different.

The prior p(V) is obtained by transformation from V*. On any graph GV, V is determined only by the free elements appearing in the submatrices corresponding to the cliques of the graph, and the nonfree elements of V are deterministic functions of the free elements (Carvalho et al., 2007). Let ν be the number of free elements; then the transformation from V* to (V, υ11*) has Jacobian (υ11*)^{ν−1}, leading to

$$p(V, \upsilon^*_{11}) = \mathrm{HIW}_{G_V}(\upsilon^*_{11} V \mid d, D)\,(\upsilon^*_{11})^{\nu - 1}.$$

Coupled with the prior p(U) on GU, this defines a class of conditionally conjugate priors in the expanded parameter space.
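On a full graph the hyper-inverse Wishart reduces to an ordinary inverse Wishart, and the expanded-parameter construction of equation (3) is then easy to exhibit directly. The sketch below is ours; it assumes the Dawid-convention degrees of freedom d map to scipy's invwishart df as d + p − 1.

```python
import numpy as np
from scipy.stats import invwishart

# Draw V* ~ IW(d, D) on a full graph and rescale by its first diagonal
# element: V = V*/v*_11 satisfies the identification constraint v_11 = 1
# by construction, while v*_11 carries the expanded scale parameter.
rng = np.random.default_rng(1)
p, d = 4, 3
D = 5.0 * np.eye(p)
Vstar = invwishart.rvs(df=d + p - 1, scale=D, random_state=rng)
v11_star = Vstar[0, 0]
V = Vstar / v11_star
print(V[0, 0])  # exactly 1.0
```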

4. Posterior and marginal likelihood computation

4.1. Gibbs sampling on given graphs

Assume an initial random sampling context with q × p data matrices Yi (i = 1, . . ., n), drawn independently from equation (1), and write Y for the full set of data. It is easy to see that, on specified graphs (GU, GV), the posterior p(U, V, υ11* |Y) has conditional distributions:

$$\begin{aligned}
(U \mid V, \upsilon^*_{11}, Y) &\sim \mathrm{HIW}_{G_U}\Big(b + np,\; B + \sum_{i=1}^n Y_i V^{-1} Y_i'\Big),\\
(V \mid U, \upsilon^*_{11}, Y) &\sim \mathrm{HIW}_{G_V}\Big(d + nq,\; D/\upsilon^*_{11} + \sum_{i=1}^n Y_i' U^{-1} Y_i\Big)\, I(\upsilon_{11} = 1),\\
(\upsilon^*_{11} \mid U, V, Y) &\sim \mathrm{IG}\{a/2 - \nu,\; \mathrm{tr}(D V^{-1})/2\},
\end{aligned}$$

where a = Σ_{PV ∈ 𝒫V} |PV|(2|PV| + d) − Σ_{SV ∈ 𝒮V} |SV|(2|SV| + d). These distributions form the basis of Gibbs sampling for the target posterior p(U, V, υ11* | Y). This involves iterative resampling from the hyper-inverse Wishart, inverse gamma and new conditional hyper-inverse Wishart distributions. Simulation of the former is based on Carvalho et al. (2007), while sampling the latter can be done as follows. From Lemma 2.18 of Lauritzen (1996), we can always find a perfect ordering of the nodes in GV so that node 1 is in the first clique, say C, and then initialize the hyper-inverse Wishart sampler of Carvalho et al. (2007) to begin with a simulation of the implied conditional inverse Wishart distribution for the covariance matrix on that first clique. Sampling VC from an inverse Wishart distribution conditional on the first diagonal element set to unity is straightforward.
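The constrained inverse Wishart draw is simple to implement. The sketch below (ours) treats the full-graph case, where the clique C is {1, . . ., p}; it uses the standard block decomposition of the inverse Wishart (Dawid, 1981), under which the first diagonal element is independent of the regression row and residual covariance of the remaining variables, so conditioning on υ11 = 1 leaves those two draws unchanged. The Dawid-to-scipy degrees-of-freedom mapping (δ in dimension k corresponds to df = δ + k − 1) is our assumption.

```python
import numpy as np
from scipy.stats import invwishart

def sample_iw_first_diag_one(delta, S, rng):
    """Draw V ~ IW(delta, S) conditional on V[0, 0] = 1 (full-graph case).

    Block decomposition: with B = V[0, 1:] / V[0, 0] and
    V22.1 = V[1:, 1:] - outer(V[1:, 0], V[0, 1:]) / V[0, 0], the pair
    (B, V22.1) is independent of V[0, 0], so we draw them as usual and
    reassemble V with its first diagonal element fixed at 1.
    """
    p = S.shape[0]
    S11, S12 = S[0, 0], S[0, 1:]
    S22_1 = S[1:, 1:] - np.outer(S[1:, 0], S12) / S11   # Schur complement
    # V22.1 ~ IW_{p-1}(delta + 1, S22.1); scipy df = (delta + 1) + (p - 1) - 1
    V22_1 = np.atleast_2d(invwishart.rvs(df=delta + p - 1, scale=S22_1,
                                         random_state=rng))
    B = rng.multivariate_normal(S12 / S11, V22_1 / S11)  # B | V22.1
    V = np.empty((p, p))
    V[0, 0] = 1.0
    V[0, 1:] = V[1:, 0] = B            # since v11 = 1, V[0, 1:] = v11 * B = B
    V[1:, 1:] = V22_1 + np.outer(B, B)
    return V

rng = np.random.default_rng(0)
V = sample_iw_first_diag_one(delta=5.0, S=np.eye(4), rng=rng)
print(V[0, 0])  # exactly 1 by construction
```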

4.2. Marginal likelihood

Exploration of uncertainty about graphical model structures involves consideration of the marginal likelihood function over graphs. For any pair (GU, GV), this is

$$p(Y) \equiv p(Y \mid G_U, G_V) = \int\!\!\int p(Y \mid U, V)\, p(U)\, p(V)\, dU\, dV.$$

The priors in the integrand depend on the graphs, although for clarity we drop that from the notation. In multivariate models, marginal likelihoods can be evaluated in closed form on decomposable graphs (Giudici, 1996; Giudici & Green, 1999; Jones et al., 2005; Carvalho & West, 2007a, 2007b). In our matrix models the integral cannot be evaluated, but we can generate useful approximations via the candidate's formula (Besag, 1989; Chib, 1995). Write Θ = {U, V, υ11*} for all parameters, and suppose that we can evaluate p(θ | Y) for some subset of parameters θ ⊂ Θ; the candidate's formula gives the marginal likelihood via the identity p(Y) = p(Y | θ)p(θ)/p(θ | Y). Applying this requires that we estimate components of the numerator or denominator. Choosing θ to maximally exploit analytic integration is key, and different choices that integrate over different subsets of parameters lead to different, parallel approximations of p(Y) that can be compared. We use two approximations based on marginalization over disjoint parameter subsets, namely, (A): p(Y) = p(Y, υ11*, U)/p(υ11*, U | Y) at any chosen value of θ = {υ11*, U}, and (B): p(Y) = p(Y, V)/p(V | Y) at any value of θ = V. We estimate the components of these identities that have no closed form, then insert chosen values of U, V, υ11*, such as approximate posterior means, to provide two estimates of p(Y).
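The identity itself is easy to verify numerically. The toy sketch below (ours; a one-parameter conjugate normal model, unrelated to the matrix model) evaluates p(Y) = p(Y | θ)p(θ)/p(θ | Y) at several values of θ and confirms that the answer does not depend on the evaluation point, which is what licenses inserting any convenient value such as a posterior mean.

```python
import numpy as np
from scipy.stats import norm

# Toy conjugate model: y_i ~ N(theta, 1), theta ~ N(0, 1), so the prior,
# likelihood and posterior are all available and the candidate's formula
# p(y) = p(y | theta) p(theta) / p(theta | y) can be checked exactly.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=10)
n = len(y)
post_var = 1.0 / (n + 1.0)
post_mean = post_var * y.sum()
for theta in (0.0, 0.5, 2.0):
    log_py = (norm.logpdf(y, theta, 1.0).sum()
              + norm.logpdf(theta, 0.0, 1.0)
              - norm.logpdf(theta, post_mean, np.sqrt(post_var)))
    print(theta, log_py)  # identical log p(y) at every theta
```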

For (A), first rewrite as

$$p(Y) = \frac{p(Y, \upsilon^*_{11}, U)\, p(V \mid \upsilon^*_{11}, U, Y)}{p(\upsilon^*_{11}, U \mid Y)\, p(V \mid \upsilon^*_{11}, U, Y)} = \frac{p(Y \mid V, \upsilon^*_{11}, U)\, p(U)\, p(V \mid \upsilon^*_{11})\, p(\upsilon^*_{11})}{p(\upsilon^*_{11}, U \mid Y)\, p(V \mid \upsilon^*_{11}, U, Y)}.$$

The numerator terms are each easily computed at any {V, υ11*, U}. The second denominator term p(V | υ11*, U, Y) has an easily evaluated closed form, as in the Gibbs sampling step. The first denominator term may be approximated by

$$p(\upsilon^*_{11}, U \mid Y) = \int p(\upsilon^*_{11} \mid Y, V)\, p(U \mid Y, V, \upsilon^*_{11})\, p(V \mid Y)\, dV \approx \frac{1}{M}\sum_{j=1}^M p(\upsilon^*_{11} \mid Y, V_j)\, p(U \mid Y, V_j, \upsilon^*_{11}),$$

where the sum is over posterior draws Vj; this is easy to compute as it is a sum of the product of inverse gamma and hyper-inverse Wishart densities.

For (B), the numerator can be analytically evaluated as

$$p(V, Y) = \int\!\!\int p(Y, U, V, \upsilon^*_{11})\, dU\, d\upsilon^*_{11} = q_V\,(2\pi)^{-nqp/2}\, \frac{H(b, B, G_U)\, H(d, D, G_V)}{H\big(b + np,\; B + \sum_{i=1}^n Y_i V^{-1} Y_i',\; G_U\big)\, H\{c,\; \mathrm{tr}(D V^{-1}),\; 1\}},$$

where

$$q_V = \prod_{P_V \in \mathcal{P}_V} |V_{P_V}|^{-(nq + d + 2|P_V|)/2} \Big/ \prod_{S_V \in \mathcal{S}_V} |V_{S_V}|^{-(nq + d + 2|S_V|)/2},$$

the H(·, ·, G·) terms are normalizing constants of the corresponding hyper-inverse Wishart distributions (Giudici & Green, 1999; Jones et al., 2005) and

$$c = \sum_{P_V \in \mathcal{P}_V} |P_V|(2|P_V| + d) - \sum_{S_V \in \mathcal{S}_V} |S_V|(2|S_V| + d) - 2\nu.$$

The density function in the denominator is approximated as

$$p(V \mid Y) = \int\!\!\int p(V \mid \upsilon^*_{11}, U, Y)\, p(\upsilon^*_{11}, U \mid Y)\, d\upsilon^*_{11}\, dU \approx \frac{1}{M}\sum_{j=1}^M p(V \mid Y, U_j, \upsilon^*_{11,j}),$$

where the sum over posterior draws (Uj, υ11,j*) can be easily performed, with terms given by conditional hyper-inverse Wishart density evaluations.
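Both denominator approximations are Rao-Blackwellized density estimates: a conditional density with a closed form is averaged over posterior draws of the remaining parameters. The toy sketch below (ours; a two-block normal–inverse-gamma Gibbs sampler, not the matrix model) shows the device in its simplest form.

```python
import numpy as np
from scipy.stats import invgamma, norm

# Toy two-block model: y_i ~ N(mu, s2), mu ~ N(0, 1), s2 ~ IG(2, 2).
rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=30)
n, a0, b0 = len(y), 2.0, 2.0

mu, s2, s2_draws = 0.0, 1.0, []
for it in range(6000):
    v = s2 / (n + s2)                              # var of (mu | s2, y)
    mu = rng.normal(v * y.sum() / s2, np.sqrt(v))  # draw mu | s2, y
    s2 = invgamma.rvs(a0 + n / 2, scale=b0 + ((y - mu) ** 2).sum() / 2,
                      random_state=rng)            # draw s2 | mu, y
    if it >= 1000:
        s2_draws.append(s2)

# Rao-Blackwellized estimate of p(mu* | y): average the closed-form
# conditional density p(mu* | y, s2_j) over the posterior draws s2_j.
mu_star = y.mean()
dens = np.mean([norm.pdf(mu_star, y.sum() / (n + s), np.sqrt(s / (n + s)))
                for s in s2_draws])
print(dens)
```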

4.3. Graphical model uncertainty and search

Now admit uncertainty about the graphs (GU, GV) using sparsity-encouraging priors in which edge inclusion indicators are independent Bernoulli variates (Dobra et al., 2004; Jones et al., 2005). We extend Markov chain Monte Carlo simulation for multivariate graphical models (Giudici & Green, 1999; Jones et al., 2005) to learning on (GU, GV) in the above matrix model analysis. Our analysis generates multiple graphs with values of approximate posterior probabilities, using Markov chain simulation for model search. This relies on computation of the unnormalized posterior over graphs, p(GU, GV | Y) ∝ p(Y | GU, GV)p(GU, GV), involving the marginal likelihood value for any specified model (GU, GV) at each search step. For the latter, we average the approximate marginal likelihood values from methods (A) and (B). The work of Jones et al. (2005) includes evaluation of the performance of various stochastic search methods in single multivariate graphical models; for modest dimensions, they recommend simple local-move Metropolis–Hastings steps. Here, given a current pair (GU, GV), we apply local moves in GU space based on the conditional posterior p(GU | Y, GV), and vice versa. A candidate GU′ is sampled from a proposal distribution q(GU′; GU) and accepted with probability

$$\alpha = \min\Big\{1,\; \frac{p(G_U' \mid Y, G_V)\, q(G_U; G_U')}{p(G_U \mid Y, G_V)\, q(G_U'; G_U)}\Big\};$$

our examples use the simple random add/delete edge move proposal of Jones et al. (2005). We then couple this with a similar step using p(GV | Y, GU) at each iteration. This requires a Markov chain analysis on each graph pair visited in order to evaluate the marginal likelihood, implying a substantial computational burden.
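For illustration, a stripped-down version of the add/delete Metropolis step on one graph follows (ours). The stand-in log posterior is just the Bernoulli sparsity prior; in the actual algorithm it would be the candidate's-formula estimate of log p(Y | GU, GV) plus the log prior, and proposals would be restricted to decomposable graphs, both of which we omit here.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_flip(adj, rng):
    """Toggle one uniformly chosen edge (the add/delete move)."""
    i, j = rng.choice(adj.shape[0], size=2, replace=False)
    cand = adj.copy()
    cand[i, j] = cand[j, i] = 1 - cand[i, j]
    return cand

def mh_step(adj, log_post, rng):
    """One Metropolis step; the flip proposal is symmetric, so q cancels."""
    cand = propose_flip(adj, rng)
    if np.log(rng.uniform()) < log_post(cand) - log_post(adj):
        return cand
    return adj

def log_post(adj):
    """Stand-in target: independent Bernoulli(0.25) edge inclusion prior."""
    m = adj.shape[0] * (adj.shape[0] - 1) / 2
    k = adj[np.triu_indices_from(adj, 1)].sum()
    return k * np.log(0.25) + (m - k) * np.log(0.75)

adj = np.zeros((7, 7), dtype=int)
for _ in range(1000):
    adj = mh_step(adj, log_post, rng)
print(adj[np.triu_indices_from(adj, 1)].mean())  # near the prior inclusion rate
```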

5. Example: a simulated random sample

A sample of size n = 48 was drawn from the (q = 8) × (p = 7) dimensional N (0, U, V) distribution, where, using · to denote zeros to highlight structure, the precision matrices are

$$\Lambda = \begin{pmatrix}
1.85 & 0.09 & 0.65 & \cdot & 0.24 & 0.45 & \cdot \\
0.09 & 0.21 & 0.08 & \cdot & \cdot & 0.14 & 0.13 \\
0.65 & 0.08 & 0.58 & 0.10 & \cdot & 0.30 & \cdot \\
\cdot & \cdot & 0.10 & 0.48 & \cdot & 0.10 & \cdot \\
0.24 & \cdot & \cdot & \cdot & 0.70 & 0.17 & \cdot \\
0.45 & 0.14 & 0.30 & 0.10 & 0.17 & 0.61 & 0.36 \\
\cdot & 0.13 & \cdot & \cdot & \cdot & 0.36 & 3.72
\end{pmatrix}, \qquad
\Omega = \begin{pmatrix}
0.99 & \cdot & \cdot & 0.33 & \cdot & 0.05 & \cdot & \cdot \\
\cdot & 3.65 & 0.33 & \cdot & 0.39 & 0.41 & \cdot & 0.03 \\
\cdot & 0.33 & 2.23 & \cdot & \cdot & 0.38 & \cdot & \cdot \\
0.33 & \cdot & \cdot & 1.65 & \cdot & \cdot & \cdot & \cdot \\
\cdot & 0.39 & \cdot & \cdot & 2.91 & 0.30 & \cdot & \cdot \\
0.05 & 0.41 & 0.38 & \cdot & 0.30 & 4.71 & 0.13 & 0.40 \\
\cdot & \cdot & \cdot & \cdot & \cdot & 0.13 & 1.07 & 0.26 \\
\cdot & 0.03 & \cdot & \cdot & \cdot & 0.40 & 0.26 & 1.45
\end{pmatrix}.$$

First consider analysis on the true graphs under priors with b = d = 3, B = 5I8 and D = 5I7, using a simulation sample size of 8000 after an initial, discarded burn-in of 2000 iterations. Convergence is rapid and mixing apparently fast in this as in other simulated examples. The corresponding posterior means of the precision matrices are

$$\hat{\Lambda} = \begin{pmatrix}
1.86 & 0.11 & 0.68 & \cdot & 0.28 & 0.44 & \cdot \\
0.11 & 0.28 & 0.14 & \cdot & \cdot & 0.16 & 0.21 \\
0.68 & 0.14 & 0.68 & 0.16 & \cdot & 0.33 & \cdot \\
\cdot & \cdot & 0.16 & 0.59 & \cdot & 0.15 & \cdot \\
0.28 & \cdot & \cdot & \cdot & 0.75 & 0.14 & \cdot \\
0.44 & 0.16 & 0.33 & 0.15 & 0.14 & 0.71 & 0.45 \\
\cdot & 0.21 & \cdot & \cdot & \cdot & 0.45 & 4.14
\end{pmatrix}, \qquad
\hat{\Omega} = \begin{pmatrix}
0.90 & \cdot & \cdot & 0.27 & \cdot & 0.02 & \cdot & \cdot \\
\cdot & 3.23 & 0.50 & \cdot & 0.35 & 0.22 & \cdot & 0.12 \\
\cdot & 0.50 & 2.14 & \cdot & \cdot & 0.37 & \cdot & \cdot \\
0.27 & \cdot & \cdot & 1.46 & \cdot & \cdot & \cdot & \cdot \\
\cdot & 0.35 & \cdot & \cdot & 2.88 & 0.41 & \cdot & \cdot \\
0.02 & 0.22 & 0.37 & \cdot & 0.41 & 4.20 & 0.29 & 0.08 \\
\cdot & \cdot & \cdot & \cdot & \cdot & 0.29 & 0.91 & 0.26 \\
\cdot & 0.12 & \cdot & \cdot & \cdot & 0.08 & 0.26 & 1.58
\end{pmatrix}.$$

Figure 1 gives an implementation check on the concordance of the two marginal likelihood estimates. These are very close and differ negligibly on the log probability scale even at small Monte Carlo sample sizes.

Fig. 1. Log marginal likelihood values in the example of § 5. The two estimates A (full line) and B (dashed line) of § 4.2 were successively re-evaluated and plotted here at differing simulation sample sizes. The vertical scale has been adjusted by addition of 3583 for clarity.

Now consider graphical model uncertainty, with prior edge inclusion probabilities 2/(q − 1) for GU and 2/(p − 1) for GV. Repeat explorations suggest stability of the marginal likelihood estimation at smaller Monte Carlo sample sizes, and we use 2000 draws within each step of the model search. The add/delete Metropolis-within-Gibbs sampler was run for 20 000 iterates starting from empty graphs; results are essentially replicated starting at the full graphs. The most probable graphs visited, (ĜU, ĜV), are shown in Fig. 2; these are local modes, have greater posterior probability than the true graphs also displayed, and were first visited after 2614 Markov chain steps. The edges in (ĜU, ĜV) generally have higher posterior inclusion probability than those not included; the lowest-probability included edge has probability 0.52, while the highest-probability excluded edge has probability 0.59. Thus, graphs discovered by highest posterior probability and by aggregating high-probability edges are not dramatically different. The modal ĜU is sparser than the true GU, reflecting the difficulty of identifying very weak signals; for example, the modal graph lacks an edge corresponding to the true Ω1,6 = 0.05, and the posterior probability of that edge is naturally low. One measure of inferred sparsity is the posterior mean of the proportion of edges in each graph; these are about 28% and 59.6% for GU and GV, respectively. Additional posterior summaries and exploration of the posterior samples suggest clean convergence of the simulation analysis, and the Metropolis–Hastings steps over graphs had good empirical acceptance rates of about 26% and 9% for GU and GV, respectively.

Fig. 2. True graphs in the simulated data example, together with the graphs of highest posterior probability identified by the analysis.

6. Dynamic matrix-variate graphical models for time series

Using the theory and methods for matrix normal models developed above, we are now able to extend our ideas to matrix time series involving two covariance matrices and associated graphical models. In the notation below, the work of Carvalho & West (2007a, 2007b) is the special case of vector data with q = 1, U fixed, and inference on (V, GV) only.

A q × p matrix-variate time series Yt follows the dynamic linear model

$$Y_t = (I_q \otimes F_t')\,\Theta_t + \nu_t, \quad \nu_t \sim N(0, U, V); \qquad \Theta_t = (I_q \otimes G_t)\,\Theta_{t-1} + \Upsilon_t, \quad \Upsilon_t \sim N(0, U \otimes W_t, V)$$

for t = 1, 2, . . ., where (a) Yt = (Yt,ij) is the q × p matrix observation at time t; (b) Θt = (Θt,ij) is the qs × p state matrix comprised of q × p state vectors Θt,ij, each of dimension s × 1; (c) ϒt = (ωt,ij) is the qs × p matrix of state evolution innovations comprised of q × p innovation vectors ωt,ij, each of dimension s × 1; (d) νt = (νt,ij) is the q × p matrix of observational errors; (e) Wt is the s × s innovation covariance matrix at time t; and (f) for all t, the s-vector Ft and s × s state evolution matrix Gt are known. Also, ϒt follows a matrix-variate normal distribution with mean 0, left covariance matrix U ⊗ Wt and right covariance matrix V. In terms of scalar elements, we have q × p univariate models with individual s-vector state parameters, namely

$$Y_{t,ij} = F_t'\,\Theta_{t,ij} + \nu_{t,ij}, \quad \nu_{t,ij} \sim N(0, u_{ii}\upsilon_{jj}); \qquad \Theta_{t,ij} = G_t\,\Theta_{t-1,ij} + \omega_{t,ij}, \quad \omega_{t,ij} \sim N(0, u_{ii}\upsilon_{jj} W_t), \qquad (4)$$

for each i, j and t. Each of the scalar series shares the same Ft and Gt elements, and reference to the model as one of exchangeable time series reflects these symmetries. In the example below, Ft = F and Gt = G, as in many practical models, but the model class includes dynamic regressions when Ft involves predictor variables. This form of model is a standard specification (Quintana & West, 1987; West & Harrison, 1997) in which the correlation structures induced by U and V affect both the observation and evolution errors; for example, if uij is large and positive, the vector series Yt,i and Yt,j will show concordant behaviour in movement of their state vectors and in observational variation about their levels. Specification of the entire sequence of Wt in terms of discount factors (West & Harrison, 1997) is also standard practice, typically using multiple discount factors related to components of the state vector and their expected degrees of random change in time, as illustrated in the example below. The innovations here concern graphical modelling and inference on (U, V). The key theory, conditional on U and V, concerns conjugate sequential learning and forecasting as data are processed, as follows.

Theorem 1. Define Dt = {Dt−1, Yt} for t = 1, 2, . . ., with D0 representing prior information. With initial prior (Θ0 | U, V, D0) ∼ N(m0, U ⊗ C0, V) we have, for all t:

  1. posterior at t − 1: (Θt−1 | Dt−1, U, V) ∼ N(mt−1, U ⊗ Ct−1, V);

  2. prior at t: (Θt | Dt−1, U, V) ∼ N(at, U ⊗ Rt, V), where at = (Iq ⊗ Gt)mt−1 and Rt = GtCt−1Gt′ + Wt;

  3. one-step forecast at t − 1: (Yt | Dt−1, U, V) ∼ N(ft, qtU, V), with forecast mean matrix ft = (Iq ⊗ Ft′Gt)mt−1 and scalar qt = Ft′RtFt + 1; and

  4. posterior at t: (Θt | Dt, U, V) ∼ N(mt, U ⊗ Ct, V), with mt = at + (Iq ⊗ At)et and Ct = Rt − AtAt′qt, where At = RtFt/qt and et = Yt − ft.

Proof. This stems from the theory of multivariate models applied to vec(Yt) (West & Harrison, 1997). The main novelty here concerns the separability of covariance structures. That is, for all t, the distributions for state matrices have separable covariance structures; for example, (Θt | Dt, U, V) is such that cov{vec(Θt) | Dt, U, V} = V ⊗ U ⊗ Ct. The sequential updating equations for the qs × p state matrices are implemented in parallel based on computations for the univariate component models, each of them involving the same scalar qt, s-vector At and s × s matrices Rt, Ct at time t.
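Separability makes these recursions cheap: a single set of Rt, qt, At, Ct is propagated whatever the matrix dimensions. The sketch below (ours; array names and layout are our own choices) computes the common scalars qt and the one-step forecast error matrices et used in the likelihood below.

```python
import numpy as np

def matrix_dlm_filter(Y, F, G, W, m0, C0):
    """Forward filter for the matrix DLM of Theorem 1 (a sketch).

    Y: (n, q, p) observations; F: (s,); G, W: (s, s); m0: (s, q, p);
    C0: (s, s). By separability, all q*p univariate series share the
    same R_t, q_t, A_t, C_t; only the means m_t and errors e_t differ.
    Returns the one-step error matrices e_t and common scalars q_t.
    """
    n, q, p = Y.shape
    m, C = m0.copy(), C0.copy()
    E, qs = np.empty((n, q, p)), np.empty(n)
    for t in range(n):
        a = np.einsum('ij,jqp->iqp', G, m)     # prior mean a_t = G m_{t-1}
        R = G @ C @ G.T + W                    # prior covariance factor R_t
        f = np.einsum('i,iqp->qp', F, a)       # forecast means f_t = F' a_t
        qt = F @ R @ F + 1.0                   # common forecast scale q_t
        A = R @ F / qt                         # common adaptive vector A_t
        e = Y[t] - f                           # forecast errors e_t (q x p)
        m = a + np.einsum('i,qp->iqp', A, e)   # posterior means m_t
        C = R - np.outer(A, A) * qt            # common posterior factor C_t
        E[t], qs[t] = e, qt
    return E, qs
```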

Suppose now that U and V are constrained by graphs GU and GV, with priors as in equation (3) and sparsity priors over the graphs. Given data over t = 1, . . ., n, the sequential updating analysis on (GU, GV) leads to the full joint density

$$p(Y_1, \ldots, Y_n \mid U, V) = \prod_{t=1}^n p(Y_t \mid U, V, D_{t-1}) = \prod_{t=1}^n N(e_t \mid 0, q_t U, V),$$

marginalized with respect to all state vectors. The one-step forecast error matrices et are conditionally independent matrix normal variates. Apart from the scalars qt, this is essentially the framework of § 2. Thus, with a small change to insert the qt, we are able to directly fit and explore dynamic graphical models using the analysis for random samples with embedded sequential updating computations.
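Given the filter output, the marginalized likelihood above is cheap to evaluate for any candidate (U, V); a sketch (ours, matching the array conventions of the filter code above) follows.

```python
import numpy as np

def matnorm_loglik(E, U, V, qs):
    """log prod_t N(e_t | 0, q_t U, V), from filter output (a sketch)."""
    n, q, p = E.shape
    Uin, Vin = np.linalg.inv(U), np.linalg.inv(V)
    ldU = np.linalg.slogdet(U)[1]
    ldV = np.linalg.slogdet(V)[1]
    out = 0.0
    for e, qt in zip(E, qs):
        out += (-0.5 * q * p * np.log(2 * np.pi * qt)  # includes the q_t scale
                - 0.5 * p * ldU - 0.5 * q * ldV
                - 0.5 * np.trace(e.T @ Uin @ e @ Vin) / qt)
    return out

# Example wiring with the filter sketch above:
#   E, qs = matrix_dlm_filter(Y, F, G, W, m0, C0)
#   loglik = matnorm_loglik(E, U, V, qs)
```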

7. A macro-economic example

An example concerns exploration of conditional dependence structures in macroeconomic time series related to US labour market employment. The data are Current Employment Statistics for the eight US states, New Jersey, New York, Massachusetts, Georgia, North Carolina, Virginia, Illinois and Ohio. We explore these data across nine industrial sectors: construction; manufacturing; trade, transportation and utilities; information; financial activities; professional and business services; education and health services; leisure and hospitality; and government. In our model framework, we have q = 8, p = 9 and monthly data over several years. Then U characterizes the residual conditional dependencies among states while V does the same for industrial sectors, in the context of an overall model that incorporates time-varying state parameters for underlying trend and annual seasonal structure in the series. Trend and seasonal elements are represented in standard form, the former as random walks and the latter as randomly varying seasonal effects. Specifically, in month t, the monthly employment change in state i and sector j is Yt,ij, modelled as a first-order polynomial/seasonal effect model (West & Harrison, 1997) with the state vector comprising a local-level parameter and 12 seasonal factors, so that the state dimension is s = 13.

The univariate models of equation (4) have state vectors Θt,ij = (μt,ij, ϕt,ij′)′, where μt,ij is the local level and ϕt,ij = (ϕt,ij,k, ϕt,ij,k+1, . . ., ϕt,ij,11, ϕt,ij,0, . . ., ϕt,ij,k−1)′ contains the current monthly seasonal factors, subject to 1′ϕt,ij = 0 for all i, j and t. Further, Ft = F (13 × 1) and Gt = G (13 × 13) for all t, where F′ = (1, 1, 0, . . ., 0). The state evolution matrix G and the sequence of state evolution covariance matrices Wt (13 × 13) are

$$G = \begin{pmatrix} 1 & 0' \\ 0 & P \end{pmatrix}, \qquad P = \begin{pmatrix} 0 & I_{11} \\ 1 & 0' \end{pmatrix}, \qquad W_t = \begin{pmatrix} W_{t,\mu} & 0 \\ 0 & W_{t,\phi} \end{pmatrix},$$

with the latter having entries as follows. The scalar Wt,μ and 12 × 12 matrix Wt,ϕ are defined via discount factors δl and δs and the corresponding block components of Ct−1 as Wt,μ = Ct−1,μ(1 − δl)/δl and Wt,ϕ = PCt−1,ϕP′(1 − δs)/δs for each t. The discount factor δl reflects the rate at which the levels μt,ij are expected to vary between months, with 100(1/δl − 1)% of the information on these parameters decaying each month. The factor δs plays the same role for the seasonal parameters. We use δl = 0.9 and δs = 0.95 to allow more adaptation to level changes than to seasonal factors (West & Harrison, 1997); results, in terms of graphical model search and structure, are substantially similar using other values in appropriate ranges. In application, we could estimate discount factors and also extend the model to allow changes in discount factors to represent change-points and other events impacting the series, based on monitoring and intervention methods (Pole et al., 1994; West & Harrison, 1997). Such considerations are secondary to our purposes in using this model to illustrate computational model search for (U, V, GU, GV), but practically very germane. Model completion uses initial, vague priors with m0 = 0, the 104 × 9 zero matrix, and C0 = 100I13. The constraint 1′ϕt,ij = 0 is imposed by transforming m0 and C0 as discussed in West & Harrison (1997).
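To fix ideas, here is a short sketch (ours) of this F, G and discounted Wt construction; the component ordering (local level first, then the 12 rotating seasonal factors) follows the text, and the variable names are our own.

```python
import numpy as np

s = 13
F = np.zeros(s)
F[0] = F[1] = 1.0                    # F' = (1, 1, 0, ..., 0)
P = np.zeros((12, 12))
P[:11, 1:] = np.eye(11)              # shift the seasonal factors forward...
P[11, 0] = 1.0                       # ...and cycle the current one to the end
G = np.zeros((s, s))
G[0, 0] = 1.0                        # local level: random walk
G[1:, 1:] = P                        # seasonal block: cyclic rotation

def discount_W(C, delta_l=0.9, delta_s=0.95):
    """Evolution covariance W_t from the current posterior factor C_{t-1}."""
    W = np.zeros((s, s))
    W[0, 0] = C[0, 0] * (1 - delta_l) / delta_l
    W[1:, 1:] = P @ C[1:, 1:] @ P.T * (1 - delta_s) / delta_s
    return W
```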

Model fitting estimates the movements in trend and seasonality, sequentially generating the matrix error series et whose row and column covariance patterns relate to (U, V). The North Carolina financial activities data and some aspects of the sequential model fit are graphed in Fig. 3. Priors for (U, V) use B = 5I8, D = 5I9 and b = d = 3, reflecting the range of residual variation, together with sparsity-encouraging priors having edge inclusion probabilities 2/(q − 1) for GU and 2/(p − 1) for GV. The add/delete Metropolis-within-Gibbs sampler was run for 20 000 steps, in two chains: one starting at empty graphs and one at full graphs. The most probable model identified, (ĜU, ĜV), is shown in Fig. 4. This, and the acceptance rates over graphs, were insensitive to the starting points. Beginning with empty graphs, the most probable model visited was found after 401 steps; its log posterior probability is −27 695.40, the sparsity of (ĜU, ĜV) in terms of percentage of edges included is (72.4%, 42.1%) and the acceptance rates are (7.3%, 11.9%). Beginning with full graphs led to a most probable model with log posterior probability −27 695.43 after 2194 steps, sparsity (73.7%, 41.9%) and acceptance rates (7.7%, 12.2%). Posterior edge inclusion probabilities are also consistent between the two runs; see Table 1. Further, the most probable graphs sit in a region of graphs of similar sparsity and posterior probability, and the posterior is dense around this mode; see Fig. 5.

Fig. 3. One of the 72 time series in the econometric example, plotted over 1990–2007. (a) Dots are monthly changes in employment in North Carolina financial activities, and the line joins the corresponding one-step-ahead forecasts over time. (b) Corresponding standardized one-step-ahead forecast errors et/√qt. (c) Corresponding on-line estimated seasonal pattern, with 95% point-wise credible intervals indicated by vertical bars at each month.

Fig. 4. Highest posterior probability graphs, illustrating aspects of inferred conditional dependencies among industrial sectors and among states in the analysis of the econometric time series data.

Table 1.

Posterior edge inclusion probabilities in graphical model analysis of the matrix econometric time series data (upper triangles shown): US states above, industrial sectors below

NJ NY MA GA NC VA IL OH
NJ 1 0.05 1.00 0.55 0.01 1.00 1.00 1.00
NY 1 1.00 0.19 0.00 1.00 0.00 0.59
MA 1 0.98 0.96 1.00 1.00 1.00
GA 1 0.93 0.89 0.75 1.00
NC 1 1.00 0.06 1.00
VA 1 0.31 1.00
IL 1 1.00
OH 1
C M T&U I FA P&BS E&H L&H G
C 1 0.02 1.00 0.16 0.75 1.00 0.99 1.00 0.06
M 1 1.00 0.28 0.02 0.98 0.01 0.03 0.01
T&U 1 0.02 1.00 1.00 0.93 1.00 0.02
I 1 0.06 0.34 0.02 0.01 0.55
FA 1 1.00 0.00 0.04 0.02
P&BS 1 0.02 1.00 0.00
E&H 1 0.75 0.02
L&H 1 0.01
G 1

US states: NJ, New Jersey; NY, New York; MA, Massachusetts; GA, Georgia; NC, North Carolina; VA, Virginia; IL, Illinois; OH, Ohio. Industrial sectors: C, construction; M, manufacturing; T&U, trade, transportation & utilities; I, information; FA, financial activities; P&BS, professional & business services; E&H, education & health services; L&H, leisure & hospitality; G, government.

Fig. 5. Summary of the posterior on sparsity of GV in the econometric example. Circled areas are proportional to the fraction of posterior sampled graphs at several levels of posterior probability, plotted against levels of sparsity of GV measured as the proportion of edges included.

Graphs with high probability in the region of the mode seem to reflect relevant dependencies in the econometric context. There are strongly evident conditional independencies particularly among subsets of the industrial sectors; see Table 1. Further, the posterior indicates overall sparsity levels through posterior means of about 73% for the proportion of edges included in GU and about 42% in GV. Figure 5 further illustrates aspects of the posterior over sparsity for GV.

8. Further comments

We have introduced Bayesian analysis of matrix-variate graphical models in random sampling and time series contexts. The main innovations include new priors for matrix normal graphical models, use of the parameter expansion approach, inference via Markov chain Monte Carlo for a specific graphical model, evaluation of marginal likelihoods over graphs using coupled candidate’s formula approximations, and the extension of graphical modelling to matrix time series analysis.

On the use of parameter expansion, Roy & Hobert (2007) and Hobert & Marchev (2008) provide theoretical support for the method in Gibbs samplers; in our models, this approach induces tractable and computationally accessible posteriors, leads to good mixing of Markov chain simulations, and is theoretically fundamental to the new model/prior framework in addressing identification issues directly and naturally.

On model identification, an alternative approach might use unconstrained hyper-inverse Wishart priors for each of (U, V) and run the Markov chain Monte Carlo simulation on the unconstrained parameters, similar to a strategy sometimes used in multinomial probit models (McCulloch et al., 2000). It can be argued that this is computationally less demanding than using our explicitly constrained prior, and that inferences can be constructed from the simulation output by transforming to constraint-compatible parameters (Uυ11, V/υ11). We considered this; posterior simulation is indeed marginally faster than under the explicitly identified model but, in empirical studies, we find the computational benefit to be of negligible practical significance. Importantly, this approach relies on a proper prior for the effectively free, unidentified parameter υ11, and is sensitive to that choice. More importantly, the implied prior on (Uυ11, V/υ11) is nonstandard and difficult to interpret, and raises questions in prior elicitation and specification; for example, the implied margins for variances are those of ratios of inverse gamma variates, difficult to assess compared with the traditional inverse gamma, and there are now dependencies between the priors on the left and right covariance matrices. Perhaps most important are the resulting effects on approximate marginal likelihoods; in examples we have studied, the approach yields very different marginal likelihoods, and the marginal prior on the unidentified υ11 plays a key role in that discrepancy. In contrast, and though very slightly more computationally demanding, the direct and explicitly constrained hyper-inverse Wishart prior is easy to interpret, specify and, with the results of Carvalho et al. (2007), implement; synthetic examples have verified the resulting efficacy of the simulation and model search computations.

Our use of the candidate's formula to provide different approximations to marginal likelihoods over graphs can be extended to multiple such approximations. We have explored other constructions, and found no obvious practical differences in the resulting estimates in simulated examples. This is an area open for theoretical investigation, here and in other model contexts. It also offers a route to extending the present analysis to nondecomposable graphical models.

Our examples are modest dimensional problems where local-move Metropolis–Hastings methods for the graphical model components of the analysis can be expected to be effective, building on experience in multivariate models (Jones et al., 2005). To scale to higher dimensions, alternative computational strategies such as shotgun stochastic search over graphs (Dobra et al., 2004; Jones et al., 2005; Hans et al., 2007) become relevant. A critical perspective is to define analyses that rapidly find regions of graphical model space supported by the data. It is far better to work with a small selection of high-probability models than with a grossly incorrect model on full graphs, and as dimensions scale the latter quickly becomes infeasible. Shotgun stochastic search and related methods reflect this and offer a path towards faster, parallelizable model search. There is also potential for computationally faster approximations using expectation–maximization-style and variational methods (Jordan et al., 1999).

An interesting class of matrix graphical structures arises under autoregressive correlation specifications for the two covariance matrices. This generates a class of Markov random field models that is of potential interest in applications such as texture image modelling. With the matrix data representing a spatial process on a rectangular grid, taking covariance matrices U and V as those of two stationary autoregressive processes provides flexibility in modelling patterns separately in horizontal and vertical directions. We have experimented with examples that suggest potential for this direction in applying the new theory and methods we have presented.

Acknowledgments

The authors are grateful for the constructive comments of the editor and two referees on the original version of this paper. Research reported here was partially supported under grants from the U.S. National Science Foundation and the National Institutes of Health. Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF or NIH. Matlab code implementing the analysis and the Graph-Explore visualization software are freely available at www.stat.duke.edu/research/software/west.

References

  1. Besag J. A candidate's formula: a curious result in Bayesian prediction. Biometrika. 1989;76:183.
  2. Carvalho CM, Massam H, West M. Simulation of hyper-inverse Wishart distributions in graphical models. Biometrika. 2007;94:647–59.
  3. Carvalho CM, West M. Dynamic matrix-variate graphical models. Bayesian Anal. 2007a;2:69–98.
  4. Carvalho CM, West M. Dynamic matrix-variate graphical models—a synopsis. In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M, editors. Bayesian Statistics 8. Oxford: Oxford University Press; 2007b. pp. 585–90.
  5. Chib S. Marginal likelihood from the Gibbs output. J Am Statist Assoc. 1995;90:1313–21.
  6. Dawid AP. Some matrix-variate distribution theory: notational considerations and a Bayesian application. Biometrika. 1981;68:265–74.
  7. Dawid AP, Lauritzen SL. Hyper-Markov laws in the statistical analysis of decomposable graphical models. Ann Statist. 1993;21:1272–317.
  8. Dobra A, Jones B, Hans C, Nevins J, West M. Sparse graphical models for exploring gene expression data. J Mult Anal. 2004;90:196–212.
  9. Dutilleul P. The MLE algorithm for the matrix normal distribution. J Statist Comp Simul. 1999;64:105–23.
  10. Finn JD. A General Model for Multivariate Analysis. New York: Holt, Rinehart and Winston; 1974.
  11. Galecki A. General class of covariance structures for two or more repeated factors in longitudinal data analysis. Commun Statist A. 1994;23:3105–19.
  12. Gelman A. Parameterization and Bayesian modeling. J Am Statist Assoc. 2004;99:537–45.
  13. Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 2006;1:515–34.
  14. Giudici P. Learning in graphical Gaussian models. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics 5. Oxford: Oxford University Press; 1996. pp. 621–28.
  15. Giudici P, Green PJ. Decomposable graphical Gaussian model determination. Biometrika. 1999;86:785–801.
  16. Gupta AK, Nagar DK. Matrix Variate Distributions. Monographs and Surveys in Pure & Applied Mathematics 104. London: Chapman & Hall; 2000.
  17. Hans C, Dobra A, West M. Shotgun stochastic search in regression with many predictors. J Am Statist Assoc. 2007;102:507–16.
  18. Hobert JP, Marchev D. A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. Ann Statist. 2008;36:532–54.
  19. Huizenga HM, de Munck JC, Waldorp LJ, Grasman R. Spatiotemporal EEG/MEG source analysis based on a parametric noise covariance model. IEEE Trans Biomed Eng. 2002;49:533–9. doi: 10.1109/TBME.2002.1001967.
  20. Jones B, Carvalho CM, Dobra A, Hans C, Carter C, West M. Experiments in stochastic computation for high-dimensional graphical models. Statist Sci. 2005;20:388–400.
  21. Jordan M, Ghahramani Z, Jaakkola T, Saul L. An introduction to variational methods for graphical models. Mach Learn. 1999;37:183–233.
  22. Lauritzen SL. Graphical Models. Oxford: Clarendon Press; 1996.
  23. Liu C, Rubin DB, Wu YN. Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika. 1998;85:755–70.
  24. Liu JS, Wu YN. Parameter expansion for data augmentation. J Am Statist Assoc. 1999;94:1264–74.
  25. Mardia KV, Goodall CR. Spatial-temporal analysis of multivariate environmental monitoring data. In: Patil GP, Rao CR, editors. Multivariate Environmental Statistics. Amsterdam: Elsevier; 1993. pp. 347–85.
  26. McCulloch RE, Polson NG, Rossi PE. Bayesian analysis of the multinomial probit model with fully identified parameters. J Economet. 2000;99:173–93.
  27. Mitchell MW, Genton MG, Gumpertz ML. A likelihood ratio test for separability of covariances. J Mult Anal. 2006;97:1025–43.
  28. Naik DN, Rao SS. Analysis of multivariate repeated measures data with a Kronecker product structured covariance matrix. J Appl Statist. 2001;28:91–105.
  29. Pole A, West M, Harrison PJ. Applied Bayesian Forecasting and Time Series Analysis. New York: Chapman & Hall; 1994.
  30. Quintana JM, West M. Multivariate time series analysis: new techniques applied to international exchange rate data. Statistician. 1987;36:275–81.
  31. Roy V, Hobert JP. Convergence rates and asymptotic standard errors for MCMC algorithms for Bayesian probit regression. J R Statist Soc B. 2007;69:607–23.
  32. Theobald DL, Wuttke DS. Empirical Bayes hierarchical models for regularizing maximum likelihood estimation in the matrix Gaussian Procrustes problem. Proc Nat Acad Sci. 2006;103:18521–7. doi: 10.1073/pnas.0508445103.
  33. West M, Harrison PJ. Bayesian Forecasting and Dynamic Models. 2nd ed. New York: Springer; 1997.
  34. Whittaker J. Graphical Models in Applied Multivariate Statistics. Chichester: John Wiley and Sons; 1990.
