Generalized Dynamic Factor Models for Mixed-Measurement Time Series

Kai Cui; David B Dunson

doi:10.1080/10618600.2012.729986

. Author manuscript; available in PMC: 2014 Apr 29.

Published in final edited form as: J Comput Graph Stat. 2012 Sep 19;23(1):169–191. doi: 10.1080/10618600.2012.729986

Generalized Dynamic Factor Models for Mixed-Measurement Time Series

Kai Cui ¹, David B Dunson ²

PMCID: PMC4004613 NIHMSID: NIHMS449705 PMID: 24791133

Abstract

In this article, we propose generalized Bayesian dynamic factor models for jointly modeling mixed-measurement time series. The framework allows mixed-scale measurements associated with each time series, with different measurements having different distributions in the exponential family conditionally on time-varying latent factor(s). Efficient Bayesian computational algorithms are developed for posterior inference on both the latent factors and model parameters, based on a Metropolis Hastings algorithm with adaptive proposals. The algorithm relies on a Greedy Density Kernel Approximation (GDKA) and parameter expansion with latent factor normalization. We tested the framework and algorithms in simulated studies and applied them to the analysis of intertwined credit and recovery risk for Moody’s rated firms from 1982–2008, illustrating the importance of jointly modeling mixed-measurement time series. The article has supplemental materials available online.

Keywords: Adaptive Metropolis Hastings, Bayesian, Dynamic Factor Model, Exponential Family, Mixed-Measurement Time Series

1 Introduction

Multiple co-evolving time series or data streams consisting of either discrete (binary, categorical and counts) or continuous outcomes are ubiquitous in many applications areas. An effective approach of jointly modeling mixed-measurement time series and thus sharing information is essential to characterizing the dependence among the variables and can improve prediction and interpolation across missing data regions. In this paper, we propose generalized Bayesian dynamic factor models for jointly modeling mixed-measurement time series. Efficient computational algorithms are also developed for Bayesian inference and prediction.

There has been a rich history of jointly modeling mixed outcomes involving binary, categorical and continuous variables, using either underlying Gaussian models (Muthen (1984)) or generalized latent trait models (GLTM) that specify GLMs for each response with shared latent factors (Moustaki and Knott (2000) and Dunson (2000)). Underlying Gaussian models assume categorical manifest variables arise by thresholding normal underlying variables. This leads to substantial computational simplifications, as algorithms for Gaussian latent factor models can be easily adapted. However, the price to be paid for computational efficiency is lack of modeling flexibility in that categorical variables will be assigned probit models and count variables cannot be naturally accommodated. Although substantial flexibility can be gained through using mixtures of underlying Gaussian models (Yang and Dunson (2010)), there is an associated price in terms of computational and modeling complexity.

The GLTM framework represents a compromise in that it is much more flexible than the underlying Gaussian framework, while maintaining a simple parametric form. For example, counts can be characterized via Poisson log-linear models, and categorical variables via logistic regressions. Dunson (2003) applied GLTMs to dynamic modeling of mixed scale multivariate longitudinal data, specifying GLMs for each variable. Time-varying Gaussian latent factors were included, assuming a discrete Markov model. However, substantial computational challenges result outside of the underlying Gaussian framework. In particular in Bayesian analysis, one can no longer specify conditionally-conjugate priors and implement simple Gibbs sampling algorithms. Although adaptive rejection sampling can be used within Gibbs samplers, the sampler chains are commonly poorly behaved due to high posterior dependence in the model parameters leading to slow mixing (Ghosh and Dunson (2009)). Such problems are particularly prevalent in dynamic factor models. Therefore, latent factor models with distributions in the exponential family are much more challenging computationally than the underlying normal model class.

When it comes to multivariate time series analysis, the literature has focused primarily on joint models for multiple time series with normal (or conditionally normal) observations. Latent factor models, among other multiple time series analysis approaches, are useful tools for both dimension reduction and characterizing dependency between multiple time series (Anderson (1963) and Hamann et al. (2005)). In the absence of mixed-measurement data, the latent time-varying factors can be analyzed using either principal component analysis in an approximated dynamic factor framework (see e.g. Stock and Watson (2002), Bai (2003), Bai and Ng (2002), Zhang et al. (2012) and Mohamed et al. (2008)), frequency domain methods (Boashash (2006)), or filtering and smoothing techniques as in the state-space framework (Aguilar and West (2000) and Jungbacker and Koopman (2008)). Alternatively, Zhang and Nesselroade (2007) used an underlying Gaussian variable approach to jointly model multiple categorical time series. However, little effort has been made to jointly model mixed-measurement time series.

Motivated by providing a flexible and computationally efficient framework for mixed-measurement time series analysis, we propose the use of generalized Bayesian dynamic factor model. The framework allows mixed scale measurements in different time series, with the different measurements having distributions in the exponential family conditional on time-specific dynamic latent factors. Efficient computational algorithms for Bayesian inference are also developed, which, as shown later, can be easily extended to mixed-measurement time series with missing values and/or mixed-frequency. Briefly, a Metropolis Hastings algorithm with adaptive proposals is developed to efficiently obtain posterior samples of time-varying latent factors. The algorithm is based on a Greedy Density Kernel Approximation (GDKA) via a mixture of normal distributions and further combined with the forward filtering backward sampling algorithm. In addition, parameter expansion with latent factor normalization is proposed to improve the efficiency of sampling model parameters. These steps are further embedded in a higher-level MCMC sampler, which gives efficient updating, improved inference, and prediction.

We introduce the general framework of Bayesian dynamic factor modeling for mixed-measurement time series in Section 2. Section 3 describes computational algorithms developed for Bayesian inference on both latent factors and model parameters. Section 4 applies the model and algorithms to simulated data sets. In section 5 the model is applied to jointly modeling corporate default risk, bond recovery rates and business cycles. Section 6 contains a discussion of the model and results.

2 Proposed Modeling Framework

Consider p mixed-measurement time series y_t = {y_it} (i = 1, …, p, and t = 1, …, T), with y_it denoting the value of the i^th time series at time point t, and the points assumed to be equally-spaced. Each time series can have a different measurement scale, so, for example, y₁_t may be binary, y₂_t a count and y₃_t continuous. We follow the setting of Dunson (2003) and jointly model the mixed-measurement time series y_t via m common latent dynamic factors η_t = (η_t₁, …, η_tm)′ (t = 1, …, T) and time series-specific observed covariates z_it.

Conditional on a factor path (η₁, …, η_T), the observation y_it is modeled as coming from a distribution in the exponential family with canonical parameter θ_it:

y_{i t} ~ f_{i} (θ_{i t}, φ_{i}) .

(2.1)

The canonical parameters are defined by a generalized linear model linked to the latent factors and observed covariates:

h_{i} (θ_{i t}) = α_{i} + z_{i t}^{'} β_{i} + Λ_{i} η_{t},

(2.2)

where h_i(·) is the link function for the i^th time series, α_i is the intercept term, z_it is a b × 1 vector of observed covariates at time t, β_i is a b × 1 parameter vector, η_t is m × 1 latent factor vector at time t, and Λ_i is 1 × m factor loading vector. Let $Λ = {[Λ_{1}^{'}, \dots, Λ_{m}^{'}]}^{'}$ denote the factor loading matrix. Considering identifiability of factors, Λ is lower-triangular with positive elements on the diagonal.

For simplicity in exposition, a stationary VAR(1) vector autoregressive process is assumed for the equally spaced m-dimensional factor η_t (2.3). However, the methods are easily modified to place arbitrary Gaussian process priors on continuous time factors to accommodate unequally-spaced and mixed-frequency time series. For example, a simple yet flexible choice corresponds to the Ornstein-Uhlenbeck (O-U) process prior (Barndorff-Nielsen and Shephard (2001)).

Considering the identifiability of latent factors, the covariance matrix of η₁ is set to I_m, and mean is set to 0 (in this paper, N(mean, precision) is used in this paper to represent Normal distributions), and we let

\begin{array}{l} η_{1} ~ N (0, I_{m}) \\ η_{t} = A_{η} η_{t - 1} + ε_{t}, ε_{t} ~ N (0, \sum_{η}^{- 1}), t = 2, \dots, T \\ where \sum_{η} = Φ_{η}^{- 1} = I_{m} - A_{η} A_{η}^{'} \end{array}

(2.3)

Conditional on the factor path (η₁, …, η_T), the covariates z_1:_T and all other model parameters Θ, the observations y_1:_T are independent of each other, which implies that the likelihood is of the form

f (y_{1 : T} ∣ η_{1 : T}, z_{1 : T}, Θ) = \prod_{t = 1}^{T} \prod_{i = 1}^{m} f (y_{i t} ∣ η_{1 : T}, z_{1 : T}, Θ)

(2.4)

Further combined with the Bayesian computational algorithms explored in the next section, the model allows missing data on a subset of the time series and/or mixed-frequency.

3 Computational Algorithms for Bayesian Inference

For Bayesian computation in dynamic generalized latent trait models, Dunson (2003) proposed to use one-at-a-time Gibbs updates with adaptive rejection sampling (ARS) to sample from the full conditional posterior distributions of the model parameters and latent factors. There are several problems that arise with this approach. First, ARS can be slow as many sampling and updating steps are needed. Second, due to high posterior dependence in the unknowns, one-at-a-time Gibbs samplers tend to be very slow to converge and mix. This is particularly the case as the number of time points increases, with computation becoming completely infeasible for moderate to large T. To address these problems, we propose a Metropolis Hastings algorithm with adaptive proposals, which relies on a Greedy Density Kernel Approximation (GDKA) via mixture of normal distributions, to efficiently draw samples of latent factors in a block. For efficient updating of model parameters, parameter expansion techniques are proposed and combined with latent factor normalization to improve mixing.

3.1 Sampling Latent Factors – Background

Posterior computation in latent factor time series models shares many characteristics with similar issues in state-space models, where sequential filtering and backward sampling (FFBS) using Monte Carlo is pivotal. In the absence of mixed-measurement data, when smoothing distributions of the states do not have closed forms, computational techniques based on sequential Monte Carlo (Liu and West (2001)), particle filtering and using approximated distributions have been proposed for non-linear and non-Gaussian state-space models (see e.g. Harrison and Stevens (1971), Sorenson and Alspach (1971), Alspach (1974) and more recently Ravines and Schmidt (2007)). Stroud et al. (2003) further proposed to jointly update all unobserved states by using a Metropolis Hastings proposal that approximates the exact conditional posterior. Closer to our perspective is the recent work of Niemi and West (2010), who proposed adaptive mixture approximations by matching moments for the smoothing distributions in non-linear state-space models. However, most of the these algorithms are not designed for cases with multidimensional mixed-measurement response data, in which presence of multiple exponential family distributions of data makes sequential approximation of smoothing distributions more computationally challenging, especially when missing data exist.

In this paper, we proposed a fast and efficient Greedy Density Kernel Approximation (GDKA) to the smoothing distributions via mixtures of Gaussian kernels. The GDKA needs to be performed for each filtering step and chooses mixture Gaussian backward sampling distributions. Assuming the approximations are reasonably accurate, Metropolis Hastings steps based on the approximated mixture of normal distributions as proposals (the actual proposal distributions are based on the mixtures of normals approximation, modified to have heavier tails) jointly update latent factors efficiently.

3.2 Sampling Latent Factors – Approximating Filtering Distributions

Let α_t(η_t) denote the forward filtering function at time t

α_{t} (η_{t}) = f (η_{t} ∣ y_{1 : t})

We have the recursive relationship between time t and t+1

α_{t + 1} (η_{t + 1}) \propto [\int α_{t} (η_{t}) f (η_{t + 1} ∣ η_{t}) d η_{t}] \cdot f (y_{i, (t + 1)}, \forall i ∣ η_{t + 1})

(3.5)

For simplicity of exposition, we focus on the case η_t is one-dimensional. Clearly, the computational bottleneck is that α_t(η_t) does not have an analytic form with mixed-measurement time series, making it infeasible to perform filtering and smoothing. To tackle this, we propose to use a Greedy Density Kernel Approximation (GDKA) to approximate the forward filtering functions α_t(η_t) over a compact domain Ω via mixtures of K Gaussian kernels:

α_{t} (η_{t}) \approx \sum_{k = 1}^{K} w_{t k} N (η_{t}; μ_{t k}, φ_{t}), w_{t k} \geq 0.

(3.6)

Suppose there are N_t pairs of ${α_{t} (η_{t}^{(i)}), η_{t}^{(i)}} (i = 1, \dots, N_{t})$ over domain Ω available compromising a training set, and let $Γ_{t} = {(Γ_{t}^{(1)}, \dots, Γ_{t}^{(N_{t})})}^{T}$ be a N_t × 1 vector with $Γ_{t}^{(i)} \propto α_{t} (η_{t}^{(i)})$ . Then GDKA via a mixture of K Gaussian kernels has the form:

{\hat{Γ}}_{t}^{(i)} = \sum_{k = 1}^{K} w_{t k} exp (- \frac{φ_{t}}{2} {| | η_{t}^{(i)} - C_{t k} | |}^{2}), where w_{j} \geq 0

where C_tk (k = 1, …, K) are centers (means) of Gaussian kernels, and φ_t is the Gaussian precision, which are set to be the same for all K kernels. Letting H_t denote the distance matrix with $H_{t, i, k} = exp (- \frac{φ_{t}}{2} {| | η_{t}^{(i)} - C_{t k} | |}^{2})$ , (i = 1, …, N_t, k = 1, …, K), and w_t = (w_t₁, …, w_tK)^T be a K ×1 vector of the weights, we can represent the fitting/approximation problem in the regression form:

Γ_{t} = H_{t} w_{t} + ε_{t}

(3.7)

Given the means and variances of the Gaussian kernels, the best approximation can be achieved by minimizing the residual errors. The optimal weights are:

{\hat{w}}_{t} = \arg min_{w_{t}} {| | ε_{t} | |}^{2}; subject to : w_{t} \geq 0;

(3.8)

If we cannot be sure that H_t is well conditioned or even a full ranked matrix, a regularized least square solution is:

{\hat{w}}_{t} = \arg min_{w_{t}} {| | ε_{t} | |}^{2} + α {| | w_{t} | |}^{2}; 0 < α ≪ 1; subject to : w_{t} \geq 0;

(3.9)

It is trivial to prove that with the non-negative constraints, the quadratic optimization problem always has a unique solution.

Clearly, specification of number of kernels used (K), choice of training pairs, Gaussian centers C_tk and Gaussian precision φ_t is critical for the approximation. In details, to form the training pairs, a grid of N_t equally-spaced α_t are chosen within a compact domain on which the density α_t(η_t) is above certain threshold. In our studies, we chose the threshold of α_t(η_t) to be > 10⁻³ relative to the mode of the filtering distribution (Note that the filtering distributions have to be uni-modal if the observations are assumed to be coming from distributions in the exponential family). Gaussian centers are chosen from the N_t training points α_t. Given equation (3.7), choosing a greedy set of K Gaussian centers from N_t values of α_t can be achieved using the forward-selection algorithm as in the variable selection settings in linear regression problems. For the value of K, in our preliminary simulation studies, we found that K = 20 provides reasonably good approximations to a variety of distributions based on our algorithm, so in all our studies, we chose K to be 20. A greedy search step is also included within GDKA to find a reasonable value of φ_t. A detailed step-by-step procedure for GDKA is shown in Section A (in the appendix), and an example is shown in Figure S1 to show the fine approximation.

With the approximated α_t(η_t), forward filtering can be conducted sequentially. Let ψ_t(η_t₊₁) = ∫ α_t(η_t)f(η_t₊₁|η_t)dη_t ∝ f(η_t₊₁|y_i₁, …, y_i,t, ∀i) denote the one-step prediction function (the integral part in the filtering function). Then we have:

\begin{array}{l} ψ_{t} (η_{t + 1}) \propto \sum_{k} w_{k}^{*} N (η_{t + 1}; μ_{k}^{*}, φ_{k}^{*}), \\ where w_{k}^{*} = w_{t k}, μ_{k}^{*} = A_{η} μ_{t k}, φ_{k}^{*} = \frac{φ_{t k} Φ_{η}}{φ_{t k} + A_{η}^{2} Φ_{η}}, \end{array}

(3.10)

with A_η, μ_η and Φ_η defined in (2.3), and w_tk, μ_tk, φ_tk from (3.6).

With the approximated closed-form ψ_t(η_t₊₁), we can evaluate α_t₊₁(η_t₊₁), generate N_t₊₁ training pairs ${η_{t + 1}^{(i)}, α_{t + 1} (η_{t + 1}^{(i)})} (i = 1, \dots, N_{t + 1})$ and approximate α_t₊₁(η_t₊₁) using the same GDKA algorithm we used to approximate α_t(η_t), and obtain the updated filtering function α_t₊₁(η_t₊₁):

α_{t + 1} (η_{t + 1}) \propto \sum_{k} w_{t + 1, k} N (η_{t}; μ_{t + 1, k}, φ_{t + 1, k})

(3.11)

The normalizing constant is not involved in the computation, and the nature of the recursion relationship implies that we can start with approximating α₁(η_t) ∝ π₀(η₁)L(η₁|y_i₁, ∀i), and then proceed with sequentially approximating each α_t(η_t).

3.3 Sampling Latent Factors – Backward Sampling Distributions

Given the approximated filtering functions, backward sampling distributions π_t(·) (t = 1, …, T) can be approximated straightforwardly via π̂_t(·):

\begin{array}{l} {\hat{π}}_{T} (η_{T} ∣ y_{1 : T}) = α_{T} (η_{T}) \propto \sum_{k} w_{T k} N (η_{T}; μ_{T k}, φ_{T k}), \\ {\hat{π}}_{t - 1} (η_{t - 1} ∣ η_{t}, y_{1 : T}) \propto \sum_{k} {\tilde{w}}_{t - 1, k} N (η_{t - 1}; {\tilde{μ}}_{t - 1, k}, {\tilde{φ}}_{t - 1, k}), \\ where {\tilde{w}}_{t - 1, k} = w_{t - 1, k} exp (\frac{A_{η} Φ_{η} μ_{t - 1, k} φ_{t - 1, k} [2 (η_{t} - μ_{η}) - A_{η} μ_{k}^{(t - 1)}]}{2 (φ_{k}^{(t - 1)} + A_{η}^{2} Φ_{η})}), \\ {\tilde{μ}}_{t - 1, k} = \frac{μ_{t - 1, k} Φ_{η} + A_{η} Φ_{η} (η_{t} - μ_{η})}{φ_{t - 1, k} + A_{η}^{2} Φ_{η}}, \\ φ_{t - 1, k} = φ_{t - 1, k} + A_{η}^{2} Φ_{η}, \end{array}

(3.12)

with A_η, Φ_η defined in (2.3), and w_t_−1,_k, μ_t_−1,_k, φ_t_−1,_k from (3.6).

Given (3.12) the joint posterior distribution of all latent factors can be approximated by a sequential product of mixture of normals:

\hat{π} (η_{1 : T} ∣ y_{1 : T}) \propto {\hat{π}}_{T} (η_{T} ∣ y_{1 : T}) \prod_{t = 2}^{T} {\hat{π}}_{t} (η_{t - 1} ∣ η_{t}, y_{1 : T})

(3.13)

3.4 Sampling Latent Factors – Metropolis Hastings Updating

Using the approximated joint posterior distribution π̂(η_1:_T |y_1:_T ) of the latent factors as proposal distributions, sample paths of latent factors can be drawn from a Metropolis Hastings step. For practical reasons, the approximated joint posterior distribution π̂(η_1:_T |y_1:_T ) is further combined with a mixture of corresponding heavier-tailed Cauchy distributions π̃(η_1:_T |y_1:_T ), forming the proposal distribution g(η_1:_T |y_1:_T ) (as shown in (3.14)), to avoid irregular behaviors of the Markov chain in the tail region:

g (η_{1 : T} ∣ y) = p_{η} \hat{π} (η_{1 : T} ∣ y_{1 : T}) + (1 - p_{η}) \tilde{π} (η_{1 : T} ∣ y_{1 : T})

(3.14)

where 0 ≪ p_η < 1 is the weight of the approximated joint distribution. p_η = 0.95 is used in all the following computations.

The Metropolis Hastings step in the (m + 1)^th iteration of the MCMC then proceeds by:

Generate the entire block of candidate values γ^m⁺¹ for η = η_1:_T from the proposal distribution ~ g(·), with g(·) following (3.14).
Accept the candidate and let η ⁽^m⁺¹⁾ = γ ⁽^m⁺¹⁾ with probability $min (1, \frac{π (γ^{(m + 1)}) g (η^{(m)})}{π (η^{(m)}) g (γ^{(m + 1)})})$ where π(·) is the unnormalized posterior full conditional joint distribution of latent factors. η_1:_T. Otherwise, let η ⁽^m⁺¹⁾ = η ⁽^m⁾.

3.5 Sampling Model Parameters

For models with other parameters Θ in addition to the latent factors, the above latent factor sampler is applied at each stage of an MCMC algorithm that also includes steps for updating Θ. To reduce slow-mixing which is common in MCMC algorithms for latent factor models, we combine parameter expansion with latent factor normalization to induce default priors and improve computational efficiency for model parameters.

Parameter Expansion (PX) has been introduced as a useful approach for both introducing new families of prior distributions and improving computational efficiency (Liu and Wu (1999), Gelman (2004) and Kinney and Dunson (2007)). Ghosh and Dunson (2009) used parameter expansion in Bayesian Gaussian factor analysis to induce heavy-tailed priors for factor loadings and improve mixing of samplers simultaneously. In this paper, we extend the parameter expansion technique to the Bayesian dynamic factor model setting with mixed-measurement data. Basically, parameter expansion introduces a working model that is over-parameterized, and parameters in the working model are then transformed back to parameters in the inferential model.

Considering model (2.1)–(2.3) as our inferential model, we define the following working PX-factor model:

\begin{array}{l} h_{i} (θ_{i t}) = α_{i}^{*} + z_{i t}^{'} β_{i} + Λ_{i}^{*} η_{t}^{*}, \\ η_{1}^{*} ~ N (μ_{1}^{*}, {(Ψ_{1}^{*})}^{- 1}), \\ η_{t}^{*} ~ N (μ_{η}^{*} + A_{η}^{*} η_{t - 1}^{*}, {(ψ_{η}^{*})}^{- 1}), \\ where μ_{η}^{*} = (I - A_{η}^{*}) μ_{1}^{*}, Ψ_{η}^{*} = Ψ_{1}^{*} - A_{η}^{*} Ψ_{1}^{*} A_{η}^{*'}, \end{array}

(3.15)

where $Λ_{i}^{*}$ is an unconstrained working factor loading matrix having a lower triangular structure, $η_{t}^{*}$ is an m×1 working latent factor vector, Ψ₁ = diag(ψ₁, …, ψ_m) is a diagonal covariance matrix, and $α_{i}^{*}$ is the intercept term, with $α_{k}^{*} = 0$ , for k ≤ m. $A_{η}^{*} = diag (A_{1}^{*}, \dots, A_{m}^{*})$ is a diagonal matrix of AR(1) coefficients, with $∣ A_{1}^{*} ∣ < 1$ to guarantee stationarity. Note that the PX-factor model is clearly over-parameterized, having redundant parameters in both the covariance matrix Ψ₁ and mean $μ_{1}^{*}$ .

In order to transform the working model parameters back to inferential model parameters, we let:

\begin{array}{l} λ_{j l} = S (λ_{l l}^{*}) λ_{j l}^{*} ψ_{l}^{- 1 / 2}, η_{t} = S_{d} [Ψ_{1}^{1 / 2} (η_{t}^{*} - μ_{1}^{*})], α_{i} = α_{i}^{*} + Λ_{i}^{*} μ_{1}^{*}, \\ A_{η} = A_{η}^{*}, μ_{η} = S_{d} [Ψ_{1}^{1 / 2} {μ_{η}^{*} + (A_{η}^{*} = I) μ_{1}^{*}}], \end{array}

(3.16)

where Inline graphic (x) = −1 for x < 0, (x) = 1 for x ≥ 0, and $S_{d} = diag {S (λ_{l l}^{*})}$ . Then, instead of placing priors directly on α_i and Λ_i, with η_t~ N(0, I), we specify priors on α^*, $Λ_{1}^{*}, μ_{1}^{*}$ and $Ψ_{1}^{*}$ . In particular, we let

\begin{array}{l} λ_{j l}^{*} ~ N (0, 1) for all lower triangular elements, \\ ψ_{l} ~ inv - Γ (a_{l}, b_{l}), \\ α_{i}^{*} ~ N (0, φ_{α}), A_{i}^{*} ~ Unif (- 1, 1), μ_{1}^{*} ~ N (0, Φ_{B}) . \end{array}

(3.17)

Hence, in the one-factor case, we obtain

\begin{array}{l} Λ_{i} = S (Λ_{i}^{*}) Ψ_{1}^{- 1 / 2} Λ_{i}^{*}, η_{t} = S (Λ_{i}^{*}) Ψ_{1}^{1 / 2} (η_{t}^{*} - μ_{1}^{*}), α_{i} = α_{i}^{*} + Λ_{i}^{*} μ_{1}^{*}, \\ A_{η} = A_{η}^{*}, μ_{η} = S (Λ_{i}^{*}) Ψ_{1}^{1 / 2} {μ_{n}^{*} + (A_{η}^{*} - 1) μ_{1}^{*}} \end{array}

(3.18)

with priors

\begin{array}{l} Λ_{i}^{*} ~ N (0, 1) for all lower triangular elements, \\ Ψ_{1} ~ inv - Γ (a, b), \\ α_{i}^{*} ~ N (0, φ_{α}), A_{i}^{*} ~ Unif (- 1, 1), μ_{1}^{*} ~ N (0, Φ_{B}) \end{array}

(3.19)

To efficiently draw samples of parameters from the inferential model, we draw samples from the working model, followed by a post-processing step to transform working-model draws back to the parameters in the inferential model.

3.6 Sampling Model Parameters – Latent Factor Normalization

Parameter expansion can greatly reduce the autocorrelation of factor loadings, however, it has limited effect on improving the mixing of other parameters (e.g. the intercept term, see Figure (S12)). To further improve computational efficiency, we combine latent factor normalization with parameter expansion in sampling the model parameters from the full conditional distributions. In detail, suppose $η_{1 : T}^{*}$ is a realization of latent factors in model (3.15) with $η_{t}^{*} = (η_{1, t}^{*}, \dots, η_{m, t}^{*})$ , let η̃_t = (η̃₁_,t, …, η̃_m,t)′ denote the normalized latent factors, with

{\tilde{η}}_{i, t} = \frac{η_{i, t}^{*} - mean (η_{i, 1 : T}^{*})}{std (η_{i, 1 : T}^{*})}, t = 1, \dots, T; i = 1, \dots, m .

Then in matrix form we have:

{\tilde{η}}_{t} = ν + Δ η_{t}^{*}

(3.20)

where ν is a m × 1 vector with $ν_{i} = - \frac{mean (η_{i, 1 : T}^{*})}{std (η_{i, 1 : T}^{*})}$ , and Δ is a m × m diagonal matrix with $Δ_{i, i} = \frac{1}{std (η_{i, 1 : T}^{*})}$ .

PX with Latent Factor Normalization Algorithm

Considering the working PX-factor model (3.15) and priors (3.17), given a realization of latent factors $η_{1 : T}^{*}$ , in order to obtain a sample of model parameter Θ = (α_i, Λ_i) (i = 1, …, p) from the full conditional distribution f(Θ|η_1:_T ), we can first sample Θ̃ from the full conditional distribution using the normalized PX model (3.21) via normalized latent factors η̃_1:_T (i = 1, …, m):

h_{i} (θ_{i t}) = {\tilde{α}}_{i} + z_{i t}^{'} β_{i} + {\tilde{Λ}}_{i} {\tilde{η}}_{t},

(3.21)

with the induced priors (3.22) for the parameters of normalized PX-factor model derived from (3.17):

\begin{array}{l} {\tilde{λ}}_{j l} ~ N (0, Δ_{l l}^{2}) for all lower triangular elements, \\ {\tilde{α}}_{i} ~ N (0, {(1 / φ_{α_{i}} + \sum_{l = 1}^{min (i, m)} v_{l}^{2} / Δ_{l l}^{2})}^{- 1}) . \end{array}

(3.22)

and transform Θ̃ to obtain a sample of Θ via the following transformations via (3.23).

\begin{array}{l} Λ_{i}^{*} = {\tilde{Λ}}_{i} Δ, \\ α_{i}^{*} = {\tilde{α}}_{i} + {\tilde{Δ}}_{i} ν . \end{array}

(3.23)

4 Simulation Examples

4.1 Mixed-Measurement Time Series

To access the efficiency of the algorithms, we first consider the simulation example jointly modeling a 3-dimensional mixed-measurement time series (D_t) consisting of Poisson counts (Y_t), continuous (Z_t) and binary responses (W_t), with T=50, 100, 150. 200 and 500 respectively (as shown in (4.24)). In each case, 10% of the data in each time series are randomly missing, allowing us to evaluate the model in dealing with missing values. The time-varying latent factors are modeled to follow an O-U process.

\begin{array}{l} D_{t} = [Y_{t}, Z_{t}, W_{t}], \\ Y_{t} ~ Poiss (λ_{t}), Z_{t} ~ N (μ_{t}, 1 / σ_{z}^{2}), W_{t} ~ Bernoulli (p_{t}), \\ where log (λ_{t}) = a + b η_{t}, μ_{t} = c + d η_{t}, logit (p_{t}) = e + f η_{t}, \\ and d η_{t} = - β (η_{t} - α) d t + σ {d W}_{t} \end{array}

(4.24)

We run the sampler for 10,000 iterations, discarding the first 2,000 iterations as burn-in. The results are later compared with those obtained by simply using WinBUGS or extended Kalman filters. Taking T=200 case as an example, we set the values of a–f to be 2, 2, 3, 2, 0.5 and 1 respectively, $σ_{z}^{2} = 1$ and for the time-varying latent factors α = 0, β = 1 and σ = 1. The simulated time series are shown in Figure 1.

Illustration of simulated dataset and interval validation to show the model fitting. Solid lines (and the histogram for the first figure): simulated data. A 3-dimensional mixed-measurement time series *D_t* = [*Y_t, Z_t, W_t*] up to T=200 is simulated, where *Y_t* is count time series, *Z_t* is continuous and *W_t* is binary. 10% data in each time series are randomly missing.; dotted lines: posterior mean; dashed lines: 95% posterior C.I.

We introduce priors on the parameters in the PX working model as shown in (3.17). Specifically, half-Cauchy priors are placed on the variance components of latent factor autoregressions as recommened by Gelman (2006), while Inv-Gamma(0.01,0.01) priors are given to all other variances. To visualize the mixing the MCMC chain, trace-plots of the posterior samples of model parameters (intercepts and factor loadings shown in Figure 2) and latent factors (Figure 3) are shown.

Traceplot of Posterior Samples of Model Parameters: Intercepts (a,c,e) and factor loadings (b,d,f) shown. The parameters are defined in (4.24).

Traceplot of Posterior Samples of Latent Factors: Six time points shown. The latent factors are defined in (4.24).

A summary of effective sample size and sample autocorrelation of posterior samples are further listed in Table 1. A summary of effective sample size and sample autocorrelation of posterior samples are further listed in Table 1.We compare the mixing of our MCMC sampler with that of WinBUGS (Traceplots shown in Figure 4 and S2, and the summary of effective sample size and autocorrelation shown in Table S1), which is based on a Gibbs sampler and use of adaptive-rejection sampler or random-walk MCMC for full conditional distributions without closed forms. Our sampler relying on GDKA, parameter expansion and latent factor normalization leads to tremendous gains in the mixing of MCMC chains for both model parameters and latent factors. Furthermore, it is important to note that all three techniques contribute to improving the efficiency of the MCMC sampler, as not implementing one of them tends to give poor mixing of the chains for at least some model parameters (see e.g. Figure 6 and S12). Especially, we find that while Metropolis Hastings via GDKA is critical for latent factor filtering and sampling, parameter expansion substantially improves computational performance for factor loadings, and latent factor normalization helps the mixing of chains for other model parameters (e.g. intercepts).

Table 1.

MCMC Summary for T=100 and T=150 respectively shows improved mixing, compared to Table S1, which is obtained using WinBUGS. Using T=100 and T=150 as examples, in each case, 10000 posterior samples are drawn for each unknown with the first 2000 as burn-in. Effective sample size (ESS) and sample autocorrelation with lag 10 and 30 of the 8000 posterior samples are shown for selected parameters as representatives (c, d, η₁ and η₅₀ are defined as in (4.24)).

# of Time Points	T=100				T=150

Parameters	c	d	η₁	η₅₀	c	d	η₁	η₅₀
ESS	1815	6500	1875	1740	1905	6025	1982	1887
Autocorr-Lag 10	0.045	0.008	0.043	0.039	0.019	0.025	0.019	0.018
Autocorr-Lag 30	0.007	−0.005	0.012	0.009	−0.009	−0.005	−0.011	−0.013

Open in a new tab

Traceplot Produced by WinBugs of Posterior Samples of Latent Factors. The traceplots are compared with those in Figure 3, also showing that our algorithm improve the mixing of the Markov Chain.

Traceplots and Autocorrelation Plots of Posterior Samples. Note that our modification can effectively reduce both the autocorrelation of factor loading matrix (Λ₁ as a representative here) and the intercept (α₁ as a representative) very well, as compared to those in Figure S12. α₁ and Λ₁ are defined in (5.27).

On the other hand, since extended Kalman filters have been used to approximate non-Gaussian filtering and sampling distributions using Laplace normal approximations, especially when the length of time series is not long, we compared the results of our method with those of Kalman filters with all simulated datasets. A comparison of acceptance ratio clearly shows that our sampler gave much better acceptance ratio, especially with increased length of the time series. Low acceptance rates of extended Kalman filter inevitably lead to MCMC inefficiency, illustrated by the increased autocorrelation and decreased effective sample size (a comparison between Table 1 and Table S1).

To show the model fitting, internal validations are performed, with posterior means and 95% confidence intervals (C.I.s) of the simulated time series shown in Figure 1.

The use of our MCMC sampler seems to provide a reasonable inference to the unknown quantities. Furthermore, the framework and computational algorithms allows missing values on a subsect of the time series. Figure S3 shows that the model gives good interpolations of missing data based on the posterior marginal distributions.

Efficient adaptive Metropolis Hastings via GDKA to sample time-varying latent factors is one of the key steps in our sampling scheme, and it is well known that the acceptance rates of Metropolis Hastings steps tend to decrease rapidly as the number of time points increases (Niemi and West (2010)). To evaluate this, we generated 100 simulated datasets for each T and look at the overall performance. Table 2 lists the acceptance rates of the Metropolis Hastings steps to sample latent factors using our sampling scheme, showing that the algorithm is efficient even for big time series with T=500, which has much higher acceptance ratio than extend Kalman filters. The accuracy of inference to unknown quantities is also further confirmed based on the 100 simulated datasets by checking whether the posterior estimates would recover the true values of model parameters under such a complex model, with Table S3 showing that in almost all cases the 95% posterior C.I.s cover the true values with all 100 simulated datasets.

Table 2.

Acceptance Ratio for the Metropolis-Hastings update with GDKA for latent factors compared with extended Kalman filters (E-KF). Mean, minimum and maximum acceptance rates based on 100 simulated datasets are shown. Note that our algorithm both has higher acceptance ratio and more stable performance with mixed-measurement observations with possible missing values.

T (# of Time Points)	50	100	150	200	500

GDKA (mean)	73.1%	59.4%	43.3%	30.4%	12.4%
(min)	70.3%	57.3%	40.6%	27.4%	8.1%
(max)	76.1%	61.3%	46.8%	33.1%	13.3%

E-KF(mean)	41.1%	25.3%	1.7%	~0	0
(min)	30.1%	14.1%	0%	0	0
(max)	43.3%	27.1%	7.7%	0.3%	0

Open in a new tab

Furthermore, it is worth mentioning that although we show that our algorithm updates latent factors in a single block and infers model parameters efficiently with relatively big time series (T=500), the performance of the proposed algorithm is expected to decrease with bigger time series. In those cases, a reasonable strategy could be combining our algorithm with the idea of block sampling scheme, where the big time series can be divided to a few blocks with T smaller than 500 and updated sequentially in each block, which will definitely help in problems with extremely big time series.

4.2 Mixed Discrete Time Series

Time series with discrete outcomes having a variety of measurement scales (count, binary, categorical) are also commonly collected in various fields of study. Discrete variables contain less information than continuous variables (Chin and Lee (2008)), so it is of great help to share information between time series with mixed discrete outcomes whenever it is possible.

In this simulation study, 4-dimensional time series D_t consisting of a mixture of binomial counts (Y_t and Z_t) and binary responses (W_t and V_t) up to time T are jointly modeled, as shown in (4.25). Similarly, 10% of the simulated data in each time series are randomly missing to evaluate the model in dealing with missing data.

\begin{array}{l} D_{t} = [Y_{t}, Z_{t}, W_{t}, V_{t}], \\ Y_{t} ~ Bino (N_{y}, p_{t}^{(y)}), Z_{t} ~ Bino (N_{z}, p_{t}^{(z)}), \\ W_{t} ~ Bernoulli (p_{t}^{(w)}), V_{t} ~ Bernoulli (p_{t}^{(v)}), \\ where logit (p_{t}^{(y)}) = a + b η_{t}, logit (p_{t}^{(z)}) = c + d η_{t}, \\ logit (p_{t}^{(w)}) = e + f η_{t}, logit (p_{t}^{(v)}) = g + h η_{t}, \\ and η_{t} = γ η t_{- 1} + ε_{t}, ε_{t} ~ N (0, φ_{ε}) . \end{array}

(4.25)

Taking T=100 case as an example, we set the values of a–h to be 0.5, 1, 1, −1.5, 1, −2, 0.5 and −3 respectively. An AR(1) autoregressive structure is applied to the time-varying latent factors, with γ = 0.368 and σ_ε = 0.658. The simulated time series are shown in Figure S4. Similarly, to obtain posterior samples, we run our MCMC sampler for 10,000 iterations, discarding the first 2,000 samples as burn-in. Again, the effective sample size and autocorrelation shown in Table 3 show good mixing of the Markov Chain, which is further illustrated by the trace-plots of intercepts, factor loadings (Figure S5) and latent factors (Figure S6). In addition, with T=100, the Metropolis Hastings step for latent factors update gives a good acceptance ratio of 56.1%.

Table 3.

MCMC Summary for T=100 (selected parameters shown) shows good mixing of the Markov Chain. 10000 posterior samples are drawn for each unknown with the first 2000 as burn-in. Effective sample size (ESS) and sample autocorrelation with lag 10 and 30 of the 8000 posterior samples are shown for selected parameters as representatives (a – h, η₁ and η₅₀ are defined as in (4.25)).

Parameters	a	b	c	d	e	f	g	h	η₁	η₅₀
ESS	5726	6449	5742	6488	5030	2659	4057	2043	5998	5710
Autocorr-Lag 10	0.007	0.008	0.009	0.011	0.012	0.029	0.017	0.029	−0.004	−0.005
Autocorr-Lag 30	0.001	−0.006	0.003	−0.002	0.007	0.013	0.011	0.018	0.013	0.014

Open in a new tab

Internal validations with posterior mean and 95% C.I. of the simulated time series are shown in Figure S7. Again, the framework provides good fit to the simulated data, which is reinforced by the missing data interpolation results shown in Figure S8.

4.3 Mixed-Measurement-Mixed-Frequency Time Series

Another challenge with mixed-measurement time series is that time series are often sampled at different frequencies. For example, some time series are observed every month or quarter while others are recorded every year. In empirical financial analysis, daily and intra-daily time series are also often readily available for analysis. To model the time series simultaneously, the literature and methodology have been focused on either aggregating the higher-frequency data to the lower frequency, or interpolating the lower-frequency data to the higher-frequency (Ghysels et al. (2005), Andreou et al. (2010) and Ghysels et al. (2007)). Neither of these is generally satisfactory. First of all, temporal aggregation loses information and may lead to poor prediction. Secondly, the approaches via data interpolation transform the mixed-frequency problem to a missing data problem, assuming that the model operates at the highest frequency of the time series, while data in the lower-frequency time series are periodically missing. However, commonly used interpolation methods generally do not efficiently and fully exploit the sample information, especially when massive missing data exist, which is very likely to occur when performing data interpolation in mixed-measurement-mixed-frequency time series analysis.

We apply our generalized dynamic factor model and computational algorithms to mixed-measurement-mixed-frequency time series analysis, in which the observed time series are simultaneously modeled via dynamic latent factors at specifically defined frequencies discussed later. Under this scenario, neither data aggregation nor interpolation is required in the inference or prediction, which improves the computational efficiency with mixed-frequency observations.

In detail, we treat the low-frequency time series as high-frequency time series with “missing data” and low “observed” frequency. For the p-dimensional mixed-measurement-mixed-frequency time series, let t be the time index of the highest frequency time series, and y_t = {y_it} (i = 1, …, p and t = 1, …, T), with observed frequencies ω = (ω₁ …, ω_p)′. Let $y_{t}^{*} = {(y_{1 t}^{*}, \dots, y_{p t}^{*})}^{'}$ be the corresponding ordered p-dimensional time series by descending frequency. Let $ω^{*} = {(ω_{1}^{*}, \dots, ω_{p}^{*})}^{'}$ , with $ω_{i}^{*} (i = 1, \dots, p)$ denoting the highest common frequency from $y_{i t}^{*}$ to $y_{p t}^{*}$ . Then $y_{t}^{*}$ can be modeled following (2.1), (2.2) and (2.3) via m common factors η_t = (η₁_t, …, η_mt)′, with each latent factor η_it (i = 1, …, m) constructed to follow an AR(1) process with frequency $ω_{i}^{*} (i = 1, \dots, m \leq p)$ . In particular, in the one-factor model case, $ω_{1}^{*}$ is the highest common frequency of all observed time series.

In the following simulation study, we considered jointly modeling a 3-dimensional mixed-measurement-mixed-frequency time series y_t with observed frequency omega; = (3omega;₀, omega;₀, omega;₀)′ respectively (shown in (4.26)). The simulated data are generated through two common latent factors (η_t = (η₁_t, η₂_t)′) at the frequency of 3ω₀, and chosen in accordance with empirical applications to form multivariate mixed-measurement-mixed-frequency time series, where the first time series (Y₁_t) can be considered as monthly recorded data, while the other two as quarterly recorded data. Specifically,

\begin{array}{l} Y_{t} = {[Y_{1 t}^{'}, Y_{2 t}^{'}, Y_{3 t}^{'}]}^{'}, \\ Y_{1 t} ~ Bernoulli (p_{1 t}), Y_{2 t} ~ N (μ_{2 t}, 1 / σ^{2}), Y_{3 t} ~ Bino (N_{3}, p_{3 t}), \\ where logit (p_{1 t}) = a_{1} + b_{1} η_{1, t} + c_{1} η_{2, t} \\ μ_{2 t} = a_{2} + b_{2} η_{1, t} + c_{2} η_{2, t} \\ logit (p_{3 t}) = a_{3} + b_{3} η_{1, t} + c_{3} η_{2, t} \\ and η_{i, t} = γ_{i} η_{i, t - 1} + ε_{i t}; ε_{i t} ~ N (0, φ_{η_{i}}) \end{array}

(4.26)

The simulated data and latent factors up to T=90 are shown in Figure S9 (90 time points for Y₁_t, and 30 time points observed for Y₂_t and Y₃_t). Again, analysis of posterior samples indicates good mixing the the Markov Chain (Table S2), and internal validation shows good fitting of the model to the simulated mixed-measurement-mixed-frequency data (Figure S10).

5 Intertwined Corporate Default, Recovery Rate and Business Cycle

It has been noted that corporate default rates and bond recovery rates on defaults are negatively correlated (Altman et al. (2005)). Since recovery rates play a critical role in pricing and risk models, treating recovery rates as either constant or a stochastic variable independent of default rates while neglecting inverse relationship leads to inaccurate estimation of the loss function and suboptimal capital allocation. Furthermore, according to the current Basel proposal, banks can opt to provide their own recovery rate forecasts for the calculation of regulatory capital (Creal et al. (2011)). Thus there is an immediate need for statistical models explaining the relationship between the corporate default risk and bond recovery rates (and probably some other credit risk indicators), which can be used in default prediction and credit risk modeling. Macroeconomic conditions, on the other hand, may affect both default and recovery rates (see e.g. Nickell et al. (2000)), where the number of defaults tends to increase with lower recovery rates in economic recessions (Altman et al. (2005)).

In this study, we apply the generalized Bayesian dynamic factor models to jointly modeling corporate default counts, bond recovery rates and business cycle indicators over time. Default counts are model by binomial distribution with the default probability modeled via logistic link function. Recovery rates, which are between 0 and 1, are modeled via logistic-Normal model. Unemployment rate is used as an indicator for business cycle. The relationship between them are modeled via shared common unobserved systematic risk factor(s).

5.1 Data

The 5 time series (shown in Figure 5) we have are annual recovery rates, corporate default counts and unemployment rate data from 1982 to 2008 (Latest public data obtained from Moody’s Default and Recovery Database), denoted as:

y_{t} = {y_{i t}}; (i = 1, \dots, 5 and t = 1, \dots, 27)

where y₁_t: annual default counts within investment-grade cooperates. y₂_t: annual default counts within speculative-grade cooperates. y₃_t: all-bonds annual recovery rates. y₄_t: senior-secured bonds annual recovery rates. y₅_t: annual unemployment rate of the United States.

Data consisting of 5 Mixed-Measurement Time Series Shown. (A): Annual Default Counts: Speculative-Grade Cooperates. (B): Annual Default Counts: Investment-Grade Cooperates. (C): Annual Recovery Rates: All Bonds. (D): Annual Recovery Rates: Senior-Secured Bonds. (E): Annual Unemployment Rate.

5.2 Model Specification

Default counts are modeled as coming from binomial distributions. Recovery rates and unemployment rates are modeled as coming from logistic-Normal distributions. The inferential model is shown in (5.27). Default priors are placed on the corresponding parameter expanded working model.

\begin{array}{l} y_{1 t} ~ Bino (N_{1 t}, p_{1 t}), y_{2 t} ~ Bino (N_{2 t}, p_{2 t}), \\ logit (y_{3 t}) ~ N (μ_{3 t}, φ_{3}), logit (y_{4 t}) ~ N (μ_{4 t}, φ_{4}), logit (y_{5 t}) ~ N (μ_{5 t}, φ_{5}), \\ where logit (p_{1 t}) = α_{1} + Λ_{1} η_{t}, logit (p_{2 t}) = α_{2} + Λ_{2} η_{t}, \\ μ_{3 t} = α_{3} + Λ_{3} η_{t}, μ_{4 t} = α_{4} + Λ_{4} η_{t}, μ_{5 t} = α_{5} + Λ_{5} η_{t}, \\ and η_{1} ~ N (0, I_{m}), η_{t} ~ N (a_{η} η_{t} + b_{η}, I_{m}), ∣ a_{η} ∣ < 1 \end{array}

(5.27)

The 5 time series are jointly modeled via one or two factor models, based on the previous studies showing that default counts and recovery rates are negatively correlated, and they may be further associated with but do not match business cycle. Inference of number of factors is based on comparing fit for the one and two factor models.

5.3 Results

We are interested in the inference of both parameters and latent factors, as well as the relationship between corporate default counts, bond recovery rates and business cycle.

In the one-factor model, the latent factor can be interpreted as credit risk within the period. We explore the posterior distribution of the unknowns using the algorithms discussed earlier. Two results show the efficiency of the MCMC sampler: Firstly, the acceptance ratio of latent factors in the Metropolis Hastings update is 81%. Secondly, parameter expansion and latent factor normalization promote the mixing of the Markov chain, and greatly improve the sampling efficiency (as shown in Figure 6), compared to those obtained via only parameter expansion (Figure S12) or WinBUGS (results not shown).

We constructed point and interval estimates of the credit risk factor within the period. This is shown in Figure 7. Clearly, the figure shows that credit pressure peaks around 1990, 2001 and 2008, which are all well-known financial crises.

1982–2008 Credit Pressure Reconstruction. Dotted line: posterior mean of the credit risk factor, and dashed lines: 95% posterior C.I.

Furthermore, Figure 8 shows the fitting of the model to the data, with posterior retrospective mean and 95% intervals shown with dotted lines and dashed lines respectively. Note that the fitting of annual recovery rates and default counts are good under the one-factor setting (The fitting of recovery rate for senior-secured bonds seems not as good as others, indicating the possibility of existence of additional factors. Two-factor model gives a better fit, as shown later), indicating that a common credit risk factor may be used to model the inverse relationship between them. Fitting of unemployment rate data, however, is not as good, indicating the idea that business cycle, though may be related to credit cycle, have effects from additional latent factors.

Model Fitting using One-Factor Generalized Dynamic Factor Model. The solid lines (and the histograms for the first two panels) show real historical data, the dotted lines indicate posterior means and the dashed lines show posterior 95% C.I. (A): Annual Default Counts: Speculative-Grade Cooperates. (B): Annual Default Counts: Investment-Grade Cooperates. (C): Annual Recovery Rates: All Bonds. (D): Annual Recovery Rates: Senior-Secured Bonds. (E): Annual Unemployment Rate.

In comparison, we also fit the data with a two-factor model, with the two factors chosen to be corresponding to the speculative-grade-cooperate default counts and unemployment rates time series, which can be considered as representing the credit risk factor and the business cycle factor. Again, we have high acceptance rates for the two latent factors (83% and 78% respectively) and the model fitting is shown in Figure 9. Clearly, the two-factor model gives improved fitting for unemployment data and senior-secured bonds recovery rate data, but has trivial improvement on others. This indicates that default and recovery risk, though related to, is not solely determined by the business cycle.

Model Fitting using Two-Factor Generalized Dynamic Factor Model. The solid lines (and the histograms for the first two panels) show real historical data, the dotted lines indicate posterior means and the dashed lines show posterior 95% C.I. (A): Annual Default Counts: Speculative-Grade Cooperates. (B): Annual Default Counts: Investment-Grade Cooperates. (C): Annual Recovery Rates: All Bonds. (D): Annual Recovery Rates: Senior-Secured Bonds. (E): Annual Unemployment Rate.

Furthermore, we specifically looked at the posterior distributions of factor loadings to obtain a better understanding of the relationship between the time series (Histograms of posterior samples of factor loadings shown in Figure S11 and the posterior summaries are shown in Table 4). Table 4 clearly shows that default counts (the first and second times series) and bond recovery rates (the third and fourth time series) have strong negative correlation in terms of the credit risk factor (factor one), with the factor loadings of the first factor being strictly positive for default counts and strictly negative for bond recovery rates, at the 95% confidence level. In addition, we are not sure if unemployment rate is strongly correlated with default counts or bond recovery rates, although there is some weak evidence showing that default counts may be negatively associated and bond recovery rates may be positively associated with unemployment rates. This further demonstrates that systematic default and recovery risk does not coincide with business cycle risk, which is consistent with the related findings in Das et al. (2007), Bruche and Gonzlez-Aguado (2010) and Creal et al. (2011).

Table 4.

Posterior Summary of Factor Loadings Λ_ij (i = 1, …, 5 and j = 1, 2) in the Two-Factor Model, where Λ_ij is the factor loading of the jth factor for the ith time series (Λ₁₂ = 0).

Factor Loadings	Mean	Medium	SD	95% HPD C.I.

Λ₁₁	1.312	0.927	1.957	[0.478,3.127]
Λ₂₁	1.322	0.936	1.986	[0.503,3.195]
Λ₂₂	−0.0828	−0.0717	0.0808	[−0.225,0.035]
Λ₃₁	−0.592	−0.422	0.842	[−1.424,−0.172]
Λ₃₂	0.0642	0.0547	0.0735	[−0.038,0.196]
Λ₄₁	−0.683	−0.497	1.076	[−1.706,−0.128]
Λ₄₂	0.119	0.104	0.131	[−0.033,0.383]
Λ₅₁	0.0586	0.0440	0.115	[−0.077,0.213]
Λ₅₂	0.219	0.186	0.159	[0.099,0.454]

Open in a new tab

6 Discussion

This paper proposes the use of Bayesian generalized dynamic factor models for jointly modeling mixed-measurement time series, and efficient computational algorithms are developed. The proposed framework allows mixed-scale measurements associated with each time series, with different measurements having different distributions in the exponential family, and thus provides a more flexible framework for multivariate time series analysis. Although our illustration examples are based on specific k-factor models, the broader question of inference on the number of factors and model selection in (generalized) factor analysis, especially in the mixed-measurement observation framework, is interesting and challenging. Reversible jump MCMC algorithms can be used to allow for model uncertainly (Lopes and West (2004) and Lopes et al. (2011)). Alternatively, sparse spike-and-slab priors were proposed to induce sparsity in the factor loading matrix (see e.g. West (2003) and Carvalho et al. (2008)). More recently, non-parametric infinite factor models were introduced using either a multiplicative gamma process shrinkage prior on factor loadings (Bhattacharya and Dunson (2011)) or Indian-Buffet Process (IBP) prior on the factor loading matrix (Knowles and Ghahramani (2010)). However, translating these non-parametric approaches to the generalized dynamic factor model framework with mixed measurement is challenging and as yet unexplored.

Supplementary Material

supplement

NIHMS449705-supplement-supplement.pdf^{(722.3KB, pdf)}

Acknowledgments

This research was partially supported by grant R01ES17240 from the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (NIH).

Footnotes

SUPPLEMENTAL MATERIALS

Appendix: the Detailed Greedy Density Kernel Approximation (GDKA) procedure, and supplemental tables and figures (Table S1–S3 and Figure S1–S12) as supporting results.

Matlab Codes: Matlab codes for modeling mixed-measurement time series using the generalized dynamic factor models. (codes.zip)

Contributor Information

Kai Cui, Department of Statistical Science, Duke University.

David B. Dunson, Department of Statistical Science, Duke University

References

Aguilar O, West M. Bayesian dynamic factor models and portfolio allocation. Journal of Business and Economic Statistics. 2000;18:338–357. [Google Scholar]
Alspach DL. Gaussian sum approximations in nonlinear filtering and control. Information Sciences. 1974;7:271– 290. [Google Scholar]
Altman EI, Brady B, Resti A, Sironi A. The link between default and recovery rates: Theory, empirical evidence, and implications. Journal of Business. 2005;78(6):2203–2228. [Google Scholar]
Anderson T. The use of factor analysis in the statistical analysis of multiple time series. Psychometrika. 1963;28(1):1–25. [Google Scholar]
Andreou E, Ghysels E, Kourtellos A. Regression models with mixed sampling frequencies. Journal of Econometrics. 2010;158(2):246–261. [Google Scholar]
Bai J. Inferential theory for factor models of large dimensions. Econometrica. 2003;71(1):135–171. [Google Scholar]
Bai J, Ng S. Determining the number of factors in approximate factor models. Econometrica. 2002;70(1):191–221. [Google Scholar]
Barndorff-Nielsen OE, Shephard N. Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. Journal Of The Royal Statistical Society Series B. 2001;63(2):167–241. [Google Scholar]
Bhattacharya A, Dunson DB. Sparse Bayesian infinite factor models. Biometrika. 2011;98(2):291–306. doi: 10.1093/biomet/asr013. [DOI] [PMC free article] [PubMed] [Google Scholar]
Boashash B. Time Frequency Signal Analysis and Processing: A Comprehensive Reference. Oxford University Press; 2006. [Google Scholar]
Bruche M, Gonzlez-Aguado C. Recovery rates, default probabilities, and the credit cycle. Journal of Banking & Finance. 2010;34(4):754–764. [Google Scholar]
Carvalho CM, Chang J, Lucas JE, Nevins JR, Wang Q, West M. High-Dimensional Sparse Factor Modelling: Applications in Gene Expression Genomics. Journal of the American Statistical Association. 2008;103(484):1438–1456. doi: 10.1198/016214508000000869. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chin R, Lee BY. Principles and Practice of Clinical Trial Medicine. 1 Academic Press; 2008. [Google Scholar]
Creal DD, Schwaab B, Koopman SJ, Lucas A. Tinbergen Institute Discussion Papers 11-042/2/DSF16. Tinbergen Institute; 2011. Observation driven mixed-measurement dynamic factor models with an application to credit risk. [Google Scholar]
Das SR, Duffie D, Kapadia N, Saita L. Common failings: How corporate defaults are correlated. Journal of Finance. 2007;62(1):93–117. [Google Scholar]
Dunson DB. Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 2000;62(2):355–366. [Google Scholar]
Dunson DB. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association. 2003;98:555–563. [Google Scholar]
Gelman A. Parameterization and Bayesian modeling. Journal of the American Statistical Association. 2004;99:537–545. [Google Scholar]
Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;3:514–534. [Google Scholar]
Ghosh J, Dunson DB. Default prior distributions and efficient posterior computation in Bayesian factor analysis. Journal of Computational and Graphical Statistics. 2009;18(2):306–320. doi: 10.1198/jcgs.2009.07145. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ghysels E, Santa-Clara P, Valkanov R. There is a risk-return trade-off after all. Journal of Financial Economics. 2005;76(3):509–548. [Google Scholar]
Ghysels E, Sinko A, Valkanov R. Midas regressions: Further results and new directions. Econometric Reviews. 2007;26(1):53–90. [Google Scholar]
Hamann E, Deistler M, Scherrer W. Factor models for multivariate time series. In: Taudes A, editor. Adaptive Information Systems and Modelling in Economics and Management Science, volume 5 of Interdisciplinary Studies in Economics and Management. Springer; Vienna: 2005. pp. 243–251. [Google Scholar]
Harrison PJ, Stevens C. A Bayesian approach to short-term forecasting. Operational Research Quarterly (1970–1977) 1971;22(4):341–362. [Google Scholar]
Jungbacker B, Koopman SJ. Tinbergen Institute Discussion Papers 08-007/4. Tinbergen Institute; 2008. Likelihood-based analysis for dynamic factor models. [Google Scholar]
Kinney SK, Dunson DB. Fixed and random effects selection in linear and logistic models. Biometrics. 2007;63(3):690–698. doi: 10.1111/j.1541-0420.2007.00771.x. [DOI] [PubMed] [Google Scholar]
Knowles D, Ghahramani Z. Nonparametric Bayesian sparse factor models with application to gene expression modelling. Annals of Applied Statistics 2010 [Google Scholar]
Liu J, West M. Combined parameter and state estimation in simulation-based filtering. In: Doucet A, JFGDF, Gordon NJ, editors. Sequential Monte Carlo Methods in Practice. Springer-Verlag; New York: 2001. [Google Scholar]
Liu JS, Wu YN. Parameter expansion for data augmentation. Journal of the American Statistical Association. 1999;94:1264–1274. [Google Scholar]
Lopes HF, Gamerman D, Salazar E. Generalized spatial dynamic factor models. Computational Statistics & Data Analysis. 2011;55(3):1319–1330. [Google Scholar]
Lopes HF, West M. Bayesian model assessment in factor analysis. Statistica Sinica. 2004;14:41–67. [Google Scholar]
Mohamed S, Heller K, Ghahramani Z. Bayesian Exponential Family PCA. Proceedings of the 21st Conference on Advances in Neural Information Processing Systems (NIPS21).2008. [Google Scholar]
Moustaki I, Knott M. Generalized latent trait models. Psychometrika. 2000;65(3):391–411. [Google Scholar]
Muthen B. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika. 1984;49(1):115–132. [Google Scholar]
Nickell P, Perraudin W, Varotto S. Stability of rating transitions. Journal of Banking & Finance. 2000;24(1–2):203–227. [Google Scholar]
Niemi J, West M. Adaptive mixture modelling metropolis methods for Bayesian analysis of non-linear state-space models. Journal of Computational and Graphical Statistics. 2010;19:260–280. doi: 10.1198/jcgs.2010.08117. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ravines HSM, Schmidt A. Working papers. Universidade Federal do Rio de Janeiro; 2007. An efficient sampling scheme for generalized dynamic models. [Google Scholar]
Sorenson H, Alspach D. Recursive Bayesian estimation using Gaussian sums. Automatica. 1971;7(4):465–479. [Google Scholar]
Stock JH, Watson MW. Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics. 2002;20(2):147–62. [Google Scholar]
Stroud JR, Mller P, Polson NG. Nonlinear state-space models with state-dependent variances. Journal of the American Statistical Association. 2003;98(462):377–386. [Google Scholar]
West M. Bayesian Statistics. Oxford University Press; 2003. Bayesian factor regression models in the “large p, small n” paradigm; pp. 723–732. [Google Scholar]
Yang M, Dunson D. Bayesian semiparametric structural equation models with latent variables. Psychometrika. 2010;75(4):675–693. [Google Scholar]
Zhang G, Ferrari S, Cai C. A comparison of information functions and search strategies for sensor planning in target classification. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 2012;42(1):2–16. doi: 10.1109/TSMCB.2011.2165336. [DOI] [PubMed] [Google Scholar]
Zhang Z, Nesselroade JR. Bayesian estimation of categorical dynamic factor models. Multivariate Behavioral Research. 2007;42(4):729. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supplement

NIHMS449705-supplement-supplement.pdf^{(722.3KB, pdf)}

[R1] Aguilar O, West M. Bayesian dynamic factor models and portfolio allocation. Journal of Business and Economic Statistics. 2000;18:338–357. [Google Scholar]

[R2] Alspach DL. Gaussian sum approximations in nonlinear filtering and control. Information Sciences. 1974;7:271– 290. [Google Scholar]

[R3] Altman EI, Brady B, Resti A, Sironi A. The link between default and recovery rates: Theory, empirical evidence, and implications. Journal of Business. 2005;78(6):2203–2228. [Google Scholar]

[R4] Anderson T. The use of factor analysis in the statistical analysis of multiple time series. Psychometrika. 1963;28(1):1–25. [Google Scholar]

[R5] Andreou E, Ghysels E, Kourtellos A. Regression models with mixed sampling frequencies. Journal of Econometrics. 2010;158(2):246–261. [Google Scholar]

[R6] Bai J. Inferential theory for factor models of large dimensions. Econometrica. 2003;71(1):135–171. [Google Scholar]

[R7] Bai J, Ng S. Determining the number of factors in approximate factor models. Econometrica. 2002;70(1):191–221. [Google Scholar]

[R8] Barndorff-Nielsen OE, Shephard N. Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. Journal Of The Royal Statistical Society Series B. 2001;63(2):167–241. [Google Scholar]

[R9] Bhattacharya A, Dunson DB. Sparse Bayesian infinite factor models. Biometrika. 2011;98(2):291–306. doi: 10.1093/biomet/asr013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] Boashash B. Time Frequency Signal Analysis and Processing: A Comprehensive Reference. Oxford University Press; 2006. [Google Scholar]

[R11] Bruche M, Gonzlez-Aguado C. Recovery rates, default probabilities, and the credit cycle. Journal of Banking & Finance. 2010;34(4):754–764. [Google Scholar]

[R12] Carvalho CM, Chang J, Lucas JE, Nevins JR, Wang Q, West M. High-Dimensional Sparse Factor Modelling: Applications in Gene Expression Genomics. Journal of the American Statistical Association. 2008;103(484):1438–1456. doi: 10.1198/016214508000000869. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] Chin R, Lee BY. Principles and Practice of Clinical Trial Medicine. 1 Academic Press; 2008. [Google Scholar]

[R14] Creal DD, Schwaab B, Koopman SJ, Lucas A. Tinbergen Institute Discussion Papers 11-042/2/DSF16. Tinbergen Institute; 2011. Observation driven mixed-measurement dynamic factor models with an application to credit risk. [Google Scholar]

[R15] Das SR, Duffie D, Kapadia N, Saita L. Common failings: How corporate defaults are correlated. Journal of Finance. 2007;62(1):93–117. [Google Scholar]

[R16] Dunson DB. Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 2000;62(2):355–366. [Google Scholar]

[R17] Dunson DB. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association. 2003;98:555–563. [Google Scholar]

[R18] Gelman A. Parameterization and Bayesian modeling. Journal of the American Statistical Association. 2004;99:537–545. [Google Scholar]

[R19] Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;3:514–534. [Google Scholar]

[R20] Ghosh J, Dunson DB. Default prior distributions and efficient posterior computation in Bayesian factor analysis. Journal of Computational and Graphical Statistics. 2009;18(2):306–320. doi: 10.1198/jcgs.2009.07145. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] Ghysels E, Santa-Clara P, Valkanov R. There is a risk-return trade-off after all. Journal of Financial Economics. 2005;76(3):509–548. [Google Scholar]

[R22] Ghysels E, Sinko A, Valkanov R. Midas regressions: Further results and new directions. Econometric Reviews. 2007;26(1):53–90. [Google Scholar]

[R23] Hamann E, Deistler M, Scherrer W. Factor models for multivariate time series. In: Taudes A, editor. Adaptive Information Systems and Modelling in Economics and Management Science, volume 5 of Interdisciplinary Studies in Economics and Management. Springer; Vienna: 2005. pp. 243–251. [Google Scholar]

[R24] Harrison PJ, Stevens C. A Bayesian approach to short-term forecasting. Operational Research Quarterly (1970–1977) 1971;22(4):341–362. [Google Scholar]

[R25] Jungbacker B, Koopman SJ. Tinbergen Institute Discussion Papers 08-007/4. Tinbergen Institute; 2008. Likelihood-based analysis for dynamic factor models. [Google Scholar]

[R26] Kinney SK, Dunson DB. Fixed and random effects selection in linear and logistic models. Biometrics. 2007;63(3):690–698. doi: 10.1111/j.1541-0420.2007.00771.x. [DOI] [PubMed] [Google Scholar]

[R27] Knowles D, Ghahramani Z. Nonparametric Bayesian sparse factor models with application to gene expression modelling. Annals of Applied Statistics 2010 [Google Scholar]

[R28] Liu J, West M. Combined parameter and state estimation in simulation-based filtering. In: Doucet A, JFGDF, Gordon NJ, editors. Sequential Monte Carlo Methods in Practice. Springer-Verlag; New York: 2001. [Google Scholar]

[R29] Liu JS, Wu YN. Parameter expansion for data augmentation. Journal of the American Statistical Association. 1999;94:1264–1274. [Google Scholar]

[R30] Lopes HF, Gamerman D, Salazar E. Generalized spatial dynamic factor models. Computational Statistics & Data Analysis. 2011;55(3):1319–1330. [Google Scholar]

[R31] Lopes HF, West M. Bayesian model assessment in factor analysis. Statistica Sinica. 2004;14:41–67. [Google Scholar]

[R32] Mohamed S, Heller K, Ghahramani Z. Bayesian Exponential Family PCA. Proceedings of the 21st Conference on Advances in Neural Information Processing Systems (NIPS21).2008. [Google Scholar]

[R33] Moustaki I, Knott M. Generalized latent trait models. Psychometrika. 2000;65(3):391–411. [Google Scholar]

[R34] Muthen B. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika. 1984;49(1):115–132. [Google Scholar]

[R35] Nickell P, Perraudin W, Varotto S. Stability of rating transitions. Journal of Banking & Finance. 2000;24(1–2):203–227. [Google Scholar]

[R36] Niemi J, West M. Adaptive mixture modelling metropolis methods for Bayesian analysis of non-linear state-space models. Journal of Computational and Graphical Statistics. 2010;19:260–280. doi: 10.1198/jcgs.2010.08117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] Ravines HSM, Schmidt A. Working papers. Universidade Federal do Rio de Janeiro; 2007. An efficient sampling scheme for generalized dynamic models. [Google Scholar]

[R38] Sorenson H, Alspach D. Recursive Bayesian estimation using Gaussian sums. Automatica. 1971;7(4):465–479. [Google Scholar]

[R39] Stock JH, Watson MW. Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics. 2002;20(2):147–62. [Google Scholar]

[R40] Stroud JR, Mller P, Polson NG. Nonlinear state-space models with state-dependent variances. Journal of the American Statistical Association. 2003;98(462):377–386. [Google Scholar]

[R41] West M. Bayesian Statistics. Oxford University Press; 2003. Bayesian factor regression models in the “large p, small n” paradigm; pp. 723–732. [Google Scholar]

[R42] Yang M, Dunson D. Bayesian semiparametric structural equation models with latent variables. Psychometrika. 2010;75(4):675–693. [Google Scholar]

[R43] Zhang G, Ferrari S, Cai C. A comparison of information functions and search strategies for sensor planning in target classification. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 2012;42(1):2–16. doi: 10.1109/TSMCB.2011.2165336. [DOI] [PubMed] [Google Scholar]

[R44] Zhang Z, Nesselroade JR. Bayesian estimation of categorical dynamic factor models. Multivariate Behavioral Research. 2007;42(4):729. [Google Scholar]

PERMALINK

Generalized Dynamic Factor Models for Mixed-Measurement Time Series

Kai Cui

David B Dunson

Abstract

1 Introduction

2 Proposed Modeling Framework

3 Computational Algorithms for Bayesian Inference

3.1 Sampling Latent Factors – Background

3.2 Sampling Latent Factors – Approximating Filtering Distributions

3.3 Sampling Latent Factors – Backward Sampling Distributions

3.4 Sampling Latent Factors – Metropolis Hastings Updating

3.5 Sampling Model Parameters

3.6 Sampling Model Parameters – Latent Factor Normalization

PX with Latent Factor Normalization Algorithm

4 Simulation Examples

4.1 Mixed-Measurement Time Series

Figure 1.

Figure 2.

Figure 3.

Table 1.

Figure 4.

Figure 6.

Table 2.

4.2 Mixed Discrete Time Series

Table 3.

4.3 Mixed-Measurement-Mixed-Frequency Time Series

5 Intertwined Corporate Default, Recovery Rate and Business Cycle

5.1 Data

Figure 5.

5.2 Model Specification

5.3 Results

Figure 7.

Figure 8.

Figure 9.

Table 4.

6 Discussion

Supplementary Material

Acknowledgments

Footnotes

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases