Biometrika. 2019 Apr 8;106(2):433–452. doi: 10.1093/biomet/asz007

Identifiability and estimation of structural vector autoregressive models for subsampled and mixed-frequency time series

A Tank 1, E B Fox 1, A Shojaie 2
PMCID: PMC6508036  PMID: 31097836

Summary

Causal inference in multivariate time series is challenging because the sampling rate may not be as fast as the time scale of the causal interactions, so the observed series is a subsampled version of the desired series. Furthermore, series may be observed at different sampling rates, yielding mixed-frequency series. To determine instantaneous and lagged effects between series at the causal scale, we take a model-based approach that relies on structural vector autoregressive models. We present a unifying framework for parameter identifiability and estimation under subsampling and mixed frequencies when the noise, or shocks, is non-Gaussian. By studying the structural case, we develop identifiability and estimation methods for the causal structure of lagged and instantaneous effects at the desired time scale. We further derive an exact expectation-maximization algorithm for inference in both subsampled and mixed-frequency settings. We validate our approach in simulated scenarios and on a climate and an econometric dataset.

Keywords: Mixed frequency, Non-Gaussian error, Structural vector autoregressive model, Subsampling, Time series

1. Introduction

Classical approaches to multivariate time series and Granger causality assume that all time series are sampled at the same rate. However, due to data integration across heterogeneous sources, many datasets in econometrics, health care, environment monitoring, and neuroscience comprise multiple series sampled at different rates, referred to as mixed-frequency time series. Furthermore, due to the cost or technological challenge of data collection, many series may be sampled at a rate lower than the true causal scale of the underlying physical process. For example, many econometric indicators, such as gross domestic product, GDP, or housing price data, are recorded at quarterly and monthly scales (Moauro & Savio, 2005), though there may be important interactions between these indicators at the weekly or biweekly scale (Boot et al., 1967; Stram & Wei, 1986; Moauro & Savio, 2005). In neuroscience, imaging technologies with high spatial resolution, such as functional magnetic resonance imaging or fluorescent calcium imaging, have relatively low temporal resolutions, but many important neuronal processes and interactions happen at finer time scales (Zhou et al., 2014). A causal analysis rooted at a slower time scale than the true causal time scale may miss true interactions and add spurious ones (Boot et al., 1967; Breitung & Swanson, 2002; Silvestrini & Veredas, 2008; Zhou et al., 2014). A comprehensive approach to Granger causality in multivariate time series should be able to simultaneously accommodate both mixed-frequency and subsampled data.

Recently, causal discovery in subsampled time series has been studied with methods in causal structure learning using graphical models (Danks & Plis, 2013; Plis et al., 2015; Hyttinen et al., 2016). These methods are model-free and automatically infer a sampling rate for causal relations most consistent with the data. We maintain a similar goal, but take a model-based approach and examine the identifiability of structural vector autoregressive models under both subsampling and mixed-frequency settings. Structural models are an important tool in time series analysis (Harvey, 1990; Lütkepohl, 2005) and are a mainstay in econometrics and macro-economic policy analysis. These models combine classical linear autoregressive models with structural equation modelling (Bowen & Guo, 2011) to allow analysis of both instantaneous and lagged causal effects between time series. However, structural models are commonly applied to regularly sampled data, where each series is observed at the same regular intervals; moreover, the time scale of such an analysis is typically restricted to this shared sampling scale.

Gong et al. (2015) recently explored identifiability and estimation of vector autoregressive models under subsampling with independent innovations, i.e., no instantaneous causal effects or error correlations. They showed that with non-Gaussian errors, the transition matrix is identifiable under subsampling, implying that Granger causality estimation is possible. Unfortunately, their results do not cover correlated errors, a common and important aspect of many real-world time series (Lütkepohl, 2005). Interestingly, non-Gaussian errors have also been shown to aid model identifiability in structural autoregressive models with standard sampling assumptions (Hyvärinen et al., 2008; Zhang & Hyvärinen, 2009; Hyvärinen et al., 2010; Peters et al., 2013; Lanne et al., 2017). This line of work applies techniques developed for structural equation modelling with non-Gaussian errors and independent component analysis (Hyvärinen et al., 2004) to the structural time series context. Non-Gaussian errors allow identification of the structural model without other identifying restrictions (Lanne et al., 2017) and also allow identification of the causal ordering of the instantaneous effects if these are known to follow a directed acyclic graph (Hyvärinen et al., 2010). These models have been successfully applied to many non-Gaussian time series in econometrics (Lanne & Lütkepohl, 2010; Lanne et al., 2010; Lanne et al., 2017; Herwartz & Plödt, 2016).

Our approach to subsampling unifies existing approaches to identifiability along two complementary directions. First, our work connects the subsampled autoregressive model with independent non-Gaussian innovations (Gong et al., 2015) to the non-Gaussian structural autoregressive framework (Hyvärinen et al., 2008, 2010; Zhang & Hyvärinen, 2009; Peters et al., 2013; Lanne et al., 2017) by proving identifiability of a structural vector autoregressive model of order one under arbitrary subsampling. As a result, we find that one can identify not only the causal structure of lagged effects from subsampled data with correlated errors, but also the directed acyclic graph of the instantaneous effects, without prior knowledge of the causal ordering.

Second, we generalize our results to the mixed-frequency setting with arbitrary subsampling, where the subsampling level may be different for each time series. In doing so, we provide a unified theoretical approach and estimation method for subsampled and mixed-frequency cases. Deriving identifiability conditions on the model parameters in the mixed-frequency case is difficult (Anderson et al., 2016) and has only been studied based on the first two moments of the mixed-frequency process. Our work follows a complementary direction by leveraging higher-order moments and provides the first set of specific model conditions for mixed-frequency structural models needed for identifiability. Furthermore, previous mixed-frequency approaches have assumed a causal ordering, while our results indicate that this may be estimated by leveraging non-Gaussianity. Finally, our approach to identifiability allows us to move beyond the classical mixed-frequency setting, where the time scale is fixed at the most finely sampled series (Anderson et al., 2016); we instead consider identifiability and estimation in more general mixed-frequency cases. The four sampling types covered by our approach are shown in Fig. 1. To simplify the presentation, we first introduce our theoretical results for subsampled series of case (a) in § 3. We then generalize the results to the mixed-frequency cases (b), (c) and (d) in § 4.

Fig. 1.

Four types of structured sampling, where black lines indicate observed data and dotted lines indicate missing data: (a) both series are subsampled; (b) the standard mixed-frequency case, where only the second series is subsampled; (c) a subsampled version of case (b) where each series is subsampled at a different rate; (d) a subsampled mixed-frequency series that has no common factor across sampling rates and thus is not a subsampled version of case (b).

We introduce an exact expectation-maximization algorithm for inference in both subsampled and mixed-frequency cases. Gong et al. (2015) also use such an algorithm, but because they formulate inference directly on the subsampled process by marginalizing the missing data, their approach requires an extra approximation. Our approach instead casts inference as a missing-data problem and uses a Kalman filter to exactly compute the E-step for both subsampled and mixed-frequency cases. We validate our estimation and identifiability results via extensive simulations and apply our method to evaluate causal relations in a subsampled climate dataset and a mixed-frequency econometric dataset. Taken together, we present a unified theoretical analysis and estimation methodology for subsampled and mixed-frequency cases, which have previously been studied separately. A summary of our contributions is presented in Table 1.

Table 1.

Summary of contributions of our work to identifiability and estimation in mixed-frequency sampling for structural autoregressive models: the subsampling types are as in Fig. 1; citations refer to previous work and check marks indicate our contributions

Sampling type   None A B   D
Inline graphic Identifiability Lut05 Gong15
  Estimation Lut05 Gong15 (approx.) ✓ ✓ (ce) ✓ (ce)
Inline graphic free Identifiability Hyv08
  Estimation Hyv08 ✓ (ce) ✓ (ce)

(ce), computationally expensive; Hyv08, Hyvärinen et al. (2008); Gong15, Gong et al. (2015); Lut05, Lütkepohl (2005).

2. Background

Let Inline graphicInline graphic be a Inline graphic-dimensional multivariate time series generated at a fixed sampling rate. We collect all the Inline graphic into the matrix Inline graphic. We assume that the dynamics of Inline graphic follows a combination of instantaneous effects, autoregressive effects and independent noise:

graphic file with name M9.gif (1)

where Inline graphic is the structural matrix that determines the instantaneous-time linear effects, Inline graphic is an autoregressive matrix that specifies the lag-one effects conditional on the instantaneous effects, and Inline graphic is a white noise process such that Inline graphic for all Inline graphic and Inline graphic is independent of Inline graphic for all Inline graphic such that Inline graphic. We assume that Inline graphic is distributed as Inline graphic. Solving (1) in terms of Inline graphic gives the following lag-one structural vector autoregressive process for the evolution of Inline graphic:

graphic file with name M23.gif (2)

Under the representation in (2), each element Inline graphic denotes the lag-one linear effect of series Inline graphic on series Inline graphic and Inline graphic is the structural matrix. The error Inline graphic is referred to as the shock to series Inline graphic at time Inline graphic, and the element Inline graphic is the linear instantaneous effect of Inline graphic on Inline graphic.
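
As a sketch of the model just described, under assumed symbol names (x_t for the series, B for the structural matrix, D for the autoregressive matrix, e_t for the shocks, and A and C for the lag-one and structural matrices of the solved form; these names are assumptions made here for illustration), (1) and (2) can be written as follows, provided that I - B is invertible:

```latex
\begin{align}
x_t &= B x_t + D x_{t-1} + e_t, \tag{1} \\
x_t &= (I - B)^{-1} D \, x_{t-1} + (I - B)^{-1} e_t
     = A x_{t-1} + C e_t, \tag{2}
\end{align}
% with A = (I - B)^{-1} D and C = (I - B)^{-1}.
```

In this form the shocks enter each series only through C, which is why conditions on the structural matrix in (1) can equivalently be stated as conditions on C.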

Conditions on Inline graphic, or equivalently Inline graphic, for model identifiability and estimation have been explored (Harvey, 1990; Kilian & Lütkepohl, 2016). The most typical condition is that Inline graphic is a lower triangular matrix with ones on the diagonal, implying a known causal ordering of the instantaneous effects. In this case, one may interpret the instantaneous effects as a directed acyclic graph (Lauritzen, 1996), i.e., a graph Inline graphic with vertices Inline graphic and directed edge set Inline graphic that has no directed cycles. A causal ordering is an ordering of the vertices into a sequence, Inline graphic, such that if Inline graphic comes before Inline graphic in Inline graphic then Inline graphic does not contain a path of edges from Inline graphic to Inline graphic; see, e.g., Shojaie & Michailidis (2010) for details. In the structural context, for Inline graphic there exists a directed edge Inline graphic from Inline graphic to Inline graphic in Inline graphic if and only if Inline graphic is nonzero. Classical estimation for structural models with known causal ordering typically proceeds by simultaneously fitting Inline graphic and Inline graphic with the identifiability constraint that Inline graphic be lower triangular. When there are no unobserved confounders, as we assume throughout this paper, we may refer to the entries in Inline graphic as instantaneous causal effects.

A recent line of work (Zhang & Hyvärinen, 2009; Hyvärinen et al., 2010; Lanne et al., 2017) focuses on estimating Inline graphic and Inline graphic when Inline graphic is unknown. When the errors Inline graphic are non-Gaussian, both the causal ordering and the instantaneous effects Inline graphic may be inferred directly from the data using techniques from independent component analysis (Hyvärinen et al., 2010). Alternatively, one may dispense with orderings and lower triangular restrictions and directly estimate Inline graphic under non-Gaussian errors (Lanne et al., 2017). Our analysis continues along these directions, leveraging non-Gaussianity of the structural model with subsampling or mixed frequencies.

3. Subsampled structural vector autoregressive models

3.1. The subsampled process

Subsampling occurs when, due to low temporal resolution, we only observe Inline graphic every Inline graphic time-steps, as displayed graphically in case (a) of Fig. 1. In this situation, we only have access to observations Inline graphic, where Inline graphic is the number of subsampled observations. By marginalizing out the unobserved Inline graphic, we obtain the evolution equations

graphic file with name M68.gif (3)
graphic file with name M69.gif (4)

where Inline graphic is the stacked vector of errors for time Inline graphic and the unobserved time-points between times Inline graphic and Inline graphic, and Inline graphic. Equation (3) states that the subsampled process is a linear transformation of the past subsampled observations with transition matrix Inline graphic and a weighted sum of the shocks across all unobserved time-points. Each shock is weighted by Inline graphic raised to the power of the time lag. We provide an example of (3) in the Supplementary Material.
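
The marginalization can be sketched by substituting the lag-one recursion into itself k times. The notation below is assumed for illustration: x_t = A x_{t-1} + C e_t is the process at the causal scale, and \tilde{x}_t = x_{tk} is the subsampled series.

```latex
\begin{align*}
\tilde{x}_t = x_{tk}
  &= A x_{tk-1} + C e_{tk} \\
  &= A^{2} x_{tk-2} + A C e_{tk-1} + C e_{tk} \\
  &= \cdots \\
  &= A^{k} \tilde{x}_{t-1} + \sum_{l=0}^{k-1} A^{l} C \, e_{tk-l} \\
  &= A^{k} \tilde{x}_{t-1} + L \tilde{e}_t ,
\end{align*}
where
\[
L = \bigl( C, \; AC, \; \ldots, \; A^{k-1} C \bigr), \qquad
\tilde{e}_t = \bigl( e_{tk}^{\mathrm{T}}, \, e_{tk-1}^{\mathrm{T}}, \, \ldots, \, e_{tk-k+1}^{\mathrm{T}} \bigr)^{\mathrm{T}} .
\]
```

The shock at lag l is weighted by A^l C, so L has p rows and kp columns.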

Equation (4) appears to take a similar form to the structural process in (1), but now the vector of shocks, Inline graphic, is of dimension Inline graphic, with special structure on both the structural matrix Inline graphic and the distributions of the elements in Inline graphic. Unfortunately, this representation does not have the interpretation of instantaneous causal effects described in § 2, as there are now multiple shocks per individual time series. We will refer to the full parameterization of the subsampled structural model in (4) as Inline graphic. Identifiability of this model means that there is a unique pair of matrices Inline graphic and Inline graphic consistent with the joint distribution of Inline graphic at subsampling rate Inline graphic.
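
A minimal numerical check of this representation, assuming the stacked matrix has the block form L = (C, AC, ..., A^{k-1}C) and using hypothetical parameter values: iterating the lag-one recursion k times should give the same vector as the one-step subsampled form.

```python
import numpy as np

def stacked_matrix(A, C, k):
    """Return L = (C, AC, ..., A^{k-1} C), a p x (k p) matrix (assumed block form)."""
    blocks, power = [], np.eye(A.shape[0])
    for _ in range(k):
        blocks.append(power @ C)
        power = A @ power
    return np.hstack(blocks)

def check_subsampled_form(A, C, k, seed=0):
    """Compare k lag-one steps with the single subsampled step A^k x + L e_tilde."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    x_prev = rng.standard_normal(p)
    shocks = rng.exponential(size=(k, p)) - 1.0  # skewed, mean-zero shocks
    x = x_prev.copy()
    for e in shocks:  # apply the shocks in chronological order
        x = A @ x + C @ e
    e_tilde = shocks[::-1].reshape(-1)  # stack the most recent shock first
    x_direct = np.linalg.matrix_power(A, k) @ x_prev + stacked_matrix(A, C, k) @ e_tilde
    return x, x_direct

# Hypothetical parameters, chosen only for illustration
A = np.array([[0.8, 0.5], [0.0, -0.8]])
C = np.array([[1.0, 0.0], [0.5, 1.0]])
x_iterated, x_direct = check_subsampled_form(A, C, k=3)
print(np.allclose(x_iterated, x_direct))
```

Because L has kp columns but only p rows, the subsampled errors cannot be inverted back to individual shocks, which is why the interpretation of § 2 is lost.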

3.2. Lagged and instantaneous causality confounds of subsampling

A classical analysis based on Inline graphic that does not account for subsampling would incorrectly estimate lagged Granger causal effects in Inline graphic, because Inline graphic does not imply Inline graphic, and vice versa (Gong et al., 2015). Similarly, estimation of structural interactions may also be biased if subsampling is ignored. Classical structural estimation methods that assume a known causal ordering of the instantaneous shocks simply estimate the covariance of the error process, Inline graphic, and let the estimated structural matrix be the Cholesky decomposition of Inline graphic. Under subsampling, the covariance of the error process is

graphic file with name M92.gif (5)

where Inline graphic is the Kronecker product and Inline graphic is the identity matrix of size Inline graphic. The causal structure given by zeros in the Cholesky decomposition of (5) need not be the same as that implied by Inline graphic.

Example 1.

Consider the process (Gong et al., 2015)

Example 1.

so that Inline graphic. Then, for a subsampling of Inline graphic,

Example 1.

This implies no lagged causal effect between Inline graphic and Inline graphic but a relatively large instantaneous interaction, contrary to the true data-generating model; see Fig. 2.

Fig. 2.

Graphical depiction of how subsampling confounds causal analysis of both lagged and instantaneous effects: (a) the true causal diagram for the regularly sampled data; (b) the estimated causal structure of the subsampled process when the effects of subsampling are ignored.
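
The phenomenon in Example 1 can be illustrated numerically. The sketch below uses hypothetical matrices A and C, assumed here for illustration and chosen so that A squared is diagonal; with unit-variance independent shocks, the covariance of the subsampled error reduces to L L^T with L = (C, AC).

```python
import numpy as np

# Hypothetical parameters, assumed for illustration only
A = np.array([[0.8, 0.5],
              [0.0, -0.8]])  # series 2 drives series 1 at lag one
C = np.eye(2)                # no instantaneous effects at the causal scale
k = 2                        # observe every second time-step

A_k = np.linalg.matrix_power(A, k)  # transition matrix of the subsampled process
L = np.hstack([C, A @ C])           # assumed block form L = (C, AC)
Sigma = L @ L.T                     # error covariance under unit-variance shocks
chol = np.linalg.cholesky(Sigma)    # what a Cholesky-based analysis would report

print(A_k)    # off-diagonal entries vanish: no apparent lagged effect
print(Sigma)  # off-diagonal entries are nonzero: apparent instantaneous effect
print(chol)
```

An analysis that ignores subsampling would therefore read the nonzero off-diagonal entry of the Cholesky factor as an instantaneous interaction, although the generating C is the identity.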

3.3. Identifiability of L under subsampling

While both lagged Granger causality and instantaneous structural interactions are confounded by subsampling, we show here that by accounting for subsampling we may, under some conditions, still estimate the Inline graphic and Inline graphic matrices of the underlying process directly from the subsampled data. As a first step towards proving the identifiability of Inline graphic and Inline graphic, we show that the matrix Inline graphic in (4) is identifiable up to permutation and scaling of columns when the Inline graphic, the distributions of the Inline graphic, are all non-Gaussian.

Proposition 1.

Suppose that all the Inline graphic are non-Gaussian. Given a known subsampling factor Inline graphic and subsampled data Inline graphic generated according to (4), Inline graphic may be determined up to permutation and scaling of columns.

The proof closely follows that of Proposition 1 in Gong et al. (2015) and depends on the following fundamental result in independent component analysis (Eriksson & Koivunen, 2004).

Lemma 1.

Let Inline graphic and Inline graphic be two representations of the Inline graphic-dimensional random vector Inline graphic, where Inline graphic and Inline graphic are constant matrices of orders Inline graphic and Inline graphic, respectively, and Inline graphic and Inline graphic are random vectors with independent components.

If the Inline graphicth column of Inline graphic is not proportional to any column of Inline graphic, then Inline graphic is Gaussian. Moreover, if the Inline graphicth column of Inline graphic is proportional to the Inline graphicth column of Inline graphic, then the logarithms of the characteristic functions of Inline graphic and Inline graphic differ by a polynomial in a neighbourhood of the origin.

This result states that if Inline graphic is non-Gaussian with independent elements and if Inline graphic, then Inline graphic and Inline graphic must be equal up to permutation and scaling of columns. This implies that one may estimate Inline graphic from only observations of Inline graphic and that the estimate of Inline graphic should be equal, up to permutations and scalings, to the true Inline graphic.

To prove Proposition 1 using Lemma 1, note that Inline graphic is identifiable by linear regression. Hence, the error component Inline graphic satisfies the conditions of Lemma 1 and Inline graphic is identifiable up to permutations and scalings since the Inline graphic are non-Gaussian.
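
The first step of this argument, identifying the subsampled transition matrix by linear regression, can be sketched as follows. The simulation settings and parameter values are hypothetical, and the shocks are centred exponential draws, one arbitrary non-Gaussian choice.

```python
import numpy as np

def simulate_svar(A, C, T, rng):
    """Simulate x_t = A x_{t-1} + C e_t with skewed, unit-variance shocks."""
    p = A.shape[0]
    shocks = rng.exponential(size=(T, p)) - 1.0  # centred exponential: non-Gaussian
    x = np.zeros((T, p))
    for t in range(1, T):
        x[t] = A @ x[t - 1] + C @ shocks[t]
    return x

def estimate_transition(x, k):
    """Least-squares estimate of A^k using only every k-th observation."""
    xs = x[::k]
    past, present = xs[:-1], xs[1:]
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)  # present ≈ past @ coef
    return coef.T

# Hypothetical parameters, chosen only for illustration
rng = np.random.default_rng(0)
A = np.array([[0.8, 0.5], [0.0, -0.8]])
C = np.array([[1.0, 0.0], [0.5, 1.0]])
x = simulate_svar(A, C, T=50_000, rng=rng)
A_k_hat = estimate_transition(x, k=2)
print(np.round(A_k_hat, 2))
print(np.linalg.matrix_power(A, 2))
```

The regression residuals then play the role of the mixed error component to which Lemma 1 is applied; recovering the mixing matrix itself requires an overcomplete independent component analysis step that is not shown here.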

3.4. Complete identifiability of the structural autoregressive model when Inline graphic

Using the identifiability result for Inline graphic in Proposition 1, we can derive identifiability statements and conditions for Inline graphic and Inline graphic in the subsampled case. We require a few mild assumptions.

Assumption 1.

Let Inline graphic be stationary so that all singular values of Inline graphic have modulus less than Inline graphic.

Assumption 2.

The distributions Inline graphic are distinct for each Inline graphic after rescaling Inline graphic by any nonzero scale factor; their characteristic functions are all analytic, or are all nonvanishing, and none of them has an exponent factor with polynomial of degree at least two.

Assumption 3.

All the Inline graphic are asymmetric.

Assumption 1 is standard in time series modelling (Lütkepohl, 2005), and Assumption 2 is common in independent component analysis. While many of our identifiability results for Inline graphic only require that the Inline graphic distributions be non-Gaussian, our identifiability results for Inline graphic in part (ii) of Theorems 1 and 2 and part (iii) of Theorems 3 and 4 further require Assumption 3, namely that the Inline graphic be asymmetric. In practice, assuming fully Gaussian errors may be unrealistic, as aspects of non-Gaussianity, such as asymmetry (Harvey & Siddique, 2000; Walls, 2005; Lanne & Pentti, 2007), heavy tails (Cont, 2001; Rachev, 2003) or stochastic volatility (Justiniano & Primiceri, 2008), are often observed. Not only are non-Gaussian errors empirically appealing but, furthermore, theoretical and modelling approaches that harness the higher-order moments of non-Gaussian distributions aid in identifying model parameters that are unidentifiable from the first two moments alone.
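
A small sketch of why higher-order moments help, under assumed notation: for independent unit-variance shocks mixed by a matrix C, the covariance C C^T is unchanged when C is multiplied by any rotation, whereas the third-order cumulants change when the shocks are skewed. The matrices and skewness values below are hypothetical.

```python
import numpy as np

def covariance(C):
    """Covariance of C e for independent, unit-variance shocks e."""
    return C @ C.T

def third_cumulant(C, skew):
    """Third-order cumulant tensor of C e: sum_j skew_j C_aj C_bj C_cj."""
    return np.einsum("j,aj,bj,cj->abc", skew, C, C, C)

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # an arbitrary rotation
C = np.array([[1.0, 0.0], [0.5, 1.0]])           # hypothetical structural matrix
C_rot = C @ R                                    # competing candidate
skew = np.array([2.0, 2.0])                      # centred exponential shocks have skewness 2

same_second_moments = np.allclose(covariance(C), covariance(C_rot))
same_third_cumulants = np.allclose(third_cumulant(C, skew), third_cumulant(C_rot, skew))
print(same_second_moments, same_third_cumulants)
```

With zero skewness the third-order cumulants vanish for both candidates, so symmetric shocks would have to be separated through even-order cumulants instead.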

Gong et al. (2015) give identifiability results under Assumptions 1 and 2 for the subsampled autoregression with no error correlations, Inline graphic. We restate their result in our framework.

Theorem 1

(Gong et al., 2015) Suppose that Inline graphic is non-Gaussian for all Inline graphic and Inline graphic, and that the data Inline graphic are generated by (2) with Inline graphic. Assume that the process admits another representation Inline graphic. If Assumptions 1 and 2 hold, then we have the following:

  • (i) Inline graphic can be represented as Inline graphic, where Inline graphic is a diagonal matrix with Inline graphic or Inline graphic on the diagonal; if we constrain the self influences, represented by the diagonal entries, to be positive, then Inline graphic.

  • (ii) If Assumption 3 also holds, then Inline graphic.

3.5. Complete identifiability of general structural autoregressive models

For identifiability of the full model under subsampling, we require two more assumptions.

Assumption 4.

The variance of each Inline graphic is equal to Inline graphic, i.e., Inline graphic.

Assumption 5.

The matrix Inline graphic is of full rank.

Assumption 4 is common in structural modelling and removes the nonidentifiability between scaling the Inline graphic and scaling the columns of Inline graphic. Assumption 5 is mild, and covers the more restrictive assumption in non-Gaussian structural models that Inline graphic may be row- and column-permuted to a lower triangular matrix (Shimizu et al., 2006). Under these assumptions, we have the following identifiability result for general subsampled structural models.

Theorem 2.

Suppose that the Inline graphic are all non-Gaussian and independent, and that the data Inline graphic are generated by (2) with representation Inline graphic. Assume that the process also admits another subsampling representation Inline graphic. If Assumptions 1, 2 and 4 hold, then we have the following:

  • (i) Inline graphic is equal to Inline graphic up to permutation of columns and scaling of columns by Inline graphic or Inline graphic; that is, Inline graphic where Inline graphic is a scaled permutation matrix with elements being Inline graphic or Inline graphic; this implies that Inline graphic.

  • (ii) If Assumptions 3 and 5 also hold, then Inline graphic.

The requirement that Inline graphic be of full rank stems from the structure of Inline graphic. Since one may identify Inline graphic as the first Inline graphic columns of Inline graphic, to obtain Inline graphic we must postmultiply the second set of Inline graphic columns of Inline graphic by Inline graphic. The asymmetry assumption is needed since the scaling of the columns of Inline graphic and Inline graphic by factors of Inline graphic or Inline graphic is ambiguous if the distributions are symmetric; the asymmetry assumption ensures that the unit scalings are identifiable; see the Supplementary Material.
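
The role of the full-rank and asymmetry assumptions can be sketched numerically. The block form L = (C, AC, ...) and the parameter values below are assumptions made for illustration, and the columns of L are taken to be already sorted into blocks.

```python
import numpy as np

def recover_A_and_C(L, p):
    """Read C off the first p columns of L and A off the second block.

    Assumes L = (C, AC, ...) with k >= 2 and C of full rank.
    """
    C = L[:, :p]
    AC = L[:, p:2 * p]
    A = AC @ np.linalg.inv(C)  # (AC) C^{-1} = A, which needs C to be invertible
    return A, C

# Hypothetical parameters, chosen only for illustration
A = np.array([[0.8, 0.5], [0.0, -0.8]])
C = np.array([[1.0, 0.0], [0.5, 1.0]])
L = np.hstack([C, A @ C])
A_rec, C_rec = recover_A_and_C(L, p=2)

# If column signs are known only up to +1 or -1 and differ between blocks,
# the recovered lag matrix is wrong.
S = np.diag([1.0, -1.0])
L_ambiguous = np.hstack([C @ S, A @ C])
A_wrong, _ = recover_A_and_C(L_ambiguous, p=2)

print(np.allclose(A_rec, A), np.allclose(A_wrong, A))
```

Skewed shock distributions fix the sign of each column, since flipping the sign of a shock flips the sign of its skewness, so the two blocks can be aligned consistently.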

If the instantaneous causal effects follow a directed acyclic graph, we may identify the structure without any prior information about causal ordering of the variables.

Corollary 1.

Suppose that Assumptions 1, 2 and 4 hold. If the true structural process corresponds to a directed acyclic graph Inline graphic, i.e., it has a lower triangular structural matrix Inline graphic with positive diagonals, and if it admits another representation with structural matrix Inline graphic, then Inline graphic. Hence the structure of Inline graphic is identifiable without prior specification of the causal ordering of Inline graphic.

This result follows because Inline graphic may be identified up to a column permutation. Based on the identifiability results of Shimizu et al. (2006), if Inline graphic corresponds to an acyclic graph, it may be row- and column-permuted to a unique lower triangular matrix. The row permutations identify the causal ordering, and the nonzero elements below the diagonal identify the edges in Inline graphic. See Shimizu et al. (2006) for more details on identifiability and estimation of the graph from Inline graphic.
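
For small dimensions, the column ambiguity can be resolved by brute force. The sketch below is a hypothetical illustration, not the estimation procedure of Shimizu et al. (2006): it assumes that the variables are already listed in a causal order, so that only the column permutation and column signs need to be found.

```python
import numpy as np
from itertools import permutations

def to_lower_triangular(C_obs, tol=1e-10):
    """Find column orderings that make C_obs lower triangular with nonzero
    diagonal, then flip column signs so that the diagonal is positive.

    Brute force over all p! orderings, so suitable for small p only.
    """
    p = C_obs.shape[0]
    solutions = []
    for perm in permutations(range(p)):
        M = C_obs[:, list(perm)]
        upper_is_zero = np.all(np.abs(np.triu(M, 1)) < tol)
        diagonal_nonzero = np.all(np.abs(np.diag(M)) > tol)
        if upper_is_zero and diagonal_nonzero:
            signs = np.sign(np.diag(M))
            solutions.append(M * signs)  # multiplies column j by signs[j]
    return solutions

# Hypothetical structural matrix of a directed acyclic graph
C = np.array([[1.0, 0.0, 0.0],
              [0.5, 2.0, 0.0],
              [0.0, -0.3, 1.5]])
C_obs = C[:, [2, 0, 1]] * np.array([1.0, -1.0, 1.0])  # shuffled and sign-flipped columns
candidates = to_lower_triangular(C_obs)
print(len(candidates))
print(candidates[0])
```

Only one ordering can succeed: a column whose first nonzero entry sits in row j cannot be placed to the right of position j, which forces each column back to its original position.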

Taken together, the results of Theorem 2 and Corollary 1 imply that when the shocks Inline graphic are independent and asymmetric, a complete causal diagram of the lagged effects and the instantaneous effects is fully identifiable from the subsampled time series, Inline graphic.

4. Mixed-frequency structural autoregressive models

4.1. Background and motivation

Estimation and forecasting of mixed-frequency time series are commonly approached using autoregressive models (Schorfheide & Song, 2015). Typically, the model is fitted at the same scale as the fastest sampled time series, which is depicted in Fig. 1(b). The primary motivating example of Fig. 1(b) in the literature is GDP (Anderson et al., 2016). The subsampled structure of Fig. 1(b) simplifies the temporal aggregation of GDP and is used to simplify analysis. Due to costly data collection, especially for macro-economic indicators such as GDP, the scale of the fastest sampled series is generally arbitrary and may not reflect the true causal dynamics, leading to confounded Granger and instantaneous causality judgements (Breitung & Swanson, 2002; Zhou et al., 2014). If the true causal scale, or one of interest to an analyst, is finer than that of the fastest sampled series, as in case (d) of Fig. 1, then an analysis at the observed rate will run into the same problems as those for the single-frequency subsampling case discussed in § 3.2. We provide an example at the end of § 4.2.

Finding identifiability conditions for mixed-frequency autoregressive models with no subsampling at the fastest scale, Fig. 1(b), was an open problem for many years (Chen & Zadrozny, 1998). Recently Anderson et al. (2016) showed that the mixed-frequency nonstructural autoregressive model of Fig. 1(b) is generically identifiable from the first two observed moments, so unidentifiable models make up a set of measure zero in the parameter space. Explicit identifiability conditions for the lag-one, bivariate case from the first two moments have also been established (Anderson et al., 2012). However, no explicit identifiability conditions for structural models or models in higher dimensions have been explored.

In this section, we generalize our identifiability results from § 3 to the mixed-frequency case with arbitrary levels of subsampling for each time series. Our analysis indicates that Granger and instantaneous causal effects can be accurately estimated from mixed-frequency time series. Specifically, we use the results from § 3 to provide explicit identifiability conditions for mixed-frequency structural models under arbitrary subsampling, namely cases (b), (c) and (d) in Fig. 1, with non-Gaussian error assumptions. Altogether, our framework provides a unified way of deriving explicit identifiability conditions for both subsampling and mixed-frequency cases. While case (c) in Fig. 1 is a subsampled version of the standard mixed-frequency case, our results also cover mixed-frequency subsampling as in case (d). To the best of our knowledge, these results are the first identifiability results for subsampled mixed-frequency cases like (c) and (d).

4.2. Mixed-frequency structural autoregressive models

We assume that each time series in Inline graphic is sampled at one of two sampling rates, a slow subsampling rate Inline graphic and a fast subsampling rate Inline graphic. We write Inline graphic, where Inline graphic are those series subsampled at Inline graphic and Inline graphic are those subsampled at Inline graphic. Let Inline graphic be the list of subsampling rates for each time series. In Fig. 1(b), Inline graphic and Inline graphic, whereas in Fig. 1(c), Inline graphic and Inline graphic. Analogously to the subsampled case, we refer to a parameterization of a mixed-frequency structural model as Inline graphic, where Inline graphic is now a Inline graphic-vector. Let Inline graphic be the smallest multiple of both Inline graphic and Inline graphic; for example, in Fig. 1(c) we have Inline graphic.

We may derive a similar representation to (4) for mixed-frequency series. Fix a time-point Inline graphic such that all series are observed. Let Inline graphic be a modified Inline graphic identity matrix where all rows Inline graphic such that Inline graphic is not observed at time Inline graphic are set to zero. Further, let Inline graphic, Inline graphic and Inline graphic. Then

graphic file with name M250.gif (6)

where

graphic file with name M251.gif

Equation (6) has the same form as (4), suggesting that similar identifiability results will hold. We provide an example of (6) for a mixed-frequency series in the Supplementary Material.
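
The bookkeeping behind this representation can be sketched in a few lines. The sketch assumes that a series with subsampling rate r is observed at times r, 2r, 3r and so on, and uses hypothetical rates; the function names are illustrative.

```python
import numpy as np
from functools import reduce
from math import gcd

def common_period(rates):
    """Smallest common multiple of all subsampling rates."""
    return reduce(lambda a, b: a * b // gcd(a, b), rates)

def observed_mask(rates, horizon):
    """mask[t - 1, j] is True when series j is observed at time t = 1, ..., horizon."""
    times = np.arange(1, horizon + 1)[:, None]
    return times % np.asarray(rates)[None, :] == 0

def selection_matrix(observed_row):
    """Identity matrix with the rows of unobserved series set to zero."""
    return np.diag(observed_row.astype(float))

rates = (2, 3)                   # hypothetical rates for a bivariate series
k = common_period(rates)         # all series are observed together every k steps
mask = observed_mask(rates, k)
print(k)
print(mask.astype(int))
print(selection_matrix(mask[2])) # time t = 3: only the second series is observed
```

All series are observed jointly at multiples of the common period, which is the sense in which the time-point in (6) is fixed so that every series is observed.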

In a subsampled mixed-frequency setting where the fastest rate is greater than unity, Fig. 1(c), not accounting for subsampling leads not only to the kind of mistaken inferences discussed in § 3.2 but also to further mistakes unique to the mixed-frequency case.

Example 2.

Consider a subsampled mixed-frequency structural process generated by (6) with the Inline graphic parameters given by Example 1. Suppose subsampling is not taken into account and that Inline graphic is analysed instead as a classical mixed-frequency series, case (b), based on the first two moments (Anderson et al., 2016). We consider two cases.

Case 1: the sampling rate is Inline graphic. In this case, if Inline graphic is analysed at the rate Inline graphic using the first two moments, then Inline graphic and Inline graphic are not identifiable at this rate since both off-diagonal elements of Inline graphic are zero (Anderson et al., 2016). Thus, inference of neither the instantaneous correlations nor the lagged effects is possible.

Case 2: the sampling rate is Inline graphic. In this case, if Inline graphic is analysed at the rate Inline graphic using the first two moments, the estimated Inline graphic and covariance Inline graphic will be the same as in Example 1 (Anderson et al., 2016), leading to an incorrect inference that there is an instantaneous effect but no directed lagged effect.

4.3. Identifiability of mixed-frequency structural autoregressive models

We provide generalizations of Theorems 1 and 2 to the mixed-frequency case.

Theorem 3.

Suppose the Inline graphic are non-Gaussian and independent for all Inline graphic and Inline graphic, and that the data Inline graphic are generated by (2) with Inline graphic. Assume that the process also admits another mixed-frequency representation Inline graphic. If Assumptions 1 and 2 hold, then we have the following:

  • (i) Inline graphic can be represented as Inline graphic, where Inline graphic is a diagonal matrix with Inline graphic or Inline graphic on the diagonal.

  • (ii) If some multiple of Inline graphic is Inline graphic smaller than some multiple of Inline graphic, then Inline graphic. If Inline graphic, this implies Inline graphic, i.e., the Inline graphicth columns of Inline graphic and Inline graphic are equal, Inline graphic.

  • (iii) If Assumption 3 also holds, then Inline graphic.

Proof.

Statements (i) and (iii) follow since we may further subsample all series in Inline graphic to a subsampling rate of Inline graphic. This gives a subsampled Inline graphic with representation Inline graphic. Applying Theorem 1 gives the result. Furthermore, if some multiple of Inline graphic is equal to some multiple of Inline graphic minus 1, then there exists a set of times Inline graphic for (6) such that series Inline graphic is observed at time Inline graphic and series Inline graphic is observed at time Inline graphic. By identifiability of linear regression, Inline graphic. This resolves the sign ambiguity of the columns in (iii) so that Inline graphic. □

Theorem 4.

Suppose the Inline graphic are non-Gaussian and independent for all Inline graphic and Inline graphic, and that the data Inline graphic are generated by (2) with representation Inline graphic. Assume that the process also admits another mixed-frequency subsampling representation Inline graphic. If Assumptions 1, 2 and 4 hold, then we have the following:

  • (i) Inline graphic is equal to Inline graphic up to permutation of columns and scaling of columns by Inline graphic or Inline graphic, i.e., Inline graphic where Inline graphic is a scaled permutation matrix with elements being Inline graphic or Inline graphic; this implies that Inline graphic.

  • (ii) If Inline graphic is lower triangular with positive diagonals, i.e., the instantaneous interactions follow a directed acyclic graph, and if for all Inline graphic there exists a Inline graphic such that some multiple of Inline graphic is Inline graphic smaller than some multiple of Inline graphic with Inline graphic, then Inline graphic.

  • (iii) If Assumptions 3 and 5 also hold, then Inline graphic.

The proofs of statements (i) and (iii) follow the same subsampling argument as in the proof of Theorem 3. The proof of (ii) is given in the Supplementary Material.

Theorems 3 and 4 demonstrate that identifiability of structural models still holds for mixed-frequency series with subsampling under non-Gaussian errors. Statements (i) and (iii) of Theorems 3 and 4 are the same as their subsampled counterparts; statement (ii) in both theorems shows how the mixed-frequency setting provides additional information for resolving parameter ambiguities in the non-Gaussian setting. Specifically, when there is a one-time-step difference between when series Inline graphic and Inline graphic are sampled, then Inline graphic is identifiable. We can then use this information to resolve sign ambiguities in columns of Inline graphic, which leads to statement (ii) in both of Theorems 3 and 4. This result applies directly to the standard mixed-frequency setting (Schorfheide & Song, 2015; Anderson et al., 2016), where one series is observed at every time-step in Fig. 1(b). It also applies to case (d), since there exist certain time-steps where one series is observed one time-step before another series.
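The one-time-step condition in statement (ii) can be checked mechanically. The following is a minimal sketch, assuming the condition is stated in terms of positive integer sampling factors, written here with the hypothetical names `k_i` and `k_j`: it searches for a multiple of `k_i` that is exactly 1 smaller than a multiple of `k_j`. By Bézout's identity this holds precisely when the two factors are coprime, which the usage lines confirm on a small grid.

```python
from math import gcd

def one_step_offset_exists(k_i, k_j):
    """Return True if a*k_i + 1 == b*k_j for some positive integers a, b.

    k_i and k_j are hypothetical names for the sampling factors of two series.
    Only a = 1, ..., k_j needs checking, because a*k_i mod k_j is periodic in a.
    """
    return any((a * k_i + 1) % k_j == 0 for a in range(1, k_j + 1))

# Standard mixed-frequency setting: one series seen at every step, one every third step.
print(one_step_offset_exists(1, 3))   # True
# Factors 2 and 4 share a common divisor, so no one-step offset ever occurs.
print(one_step_offset_exists(2, 4))   # False

# The brute-force search agrees with the coprimality test on a small grid.
agrees = all(
    one_step_offset_exists(i, j) == (gcd(i, j) == 1)
    for i in range(1, 13)
    for j in range(1, 13)
)
print(agrees)                         # True
```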

5. Estimation

5.1. Modelling non-Gaussian errors

We model the non-Gaussian errors as a mixture of Gaussian distributions with Inline graphic components. This approach has been adopted widely in econometrics and other fields as a flexible and tractable way of modelling non-Gaussian innovations (Gong et al., 2015; Lanne et al., 2017). Formally, we assume that Inline graphic is drawn from the mixture distribution

graphic file with name M330.gif

where Inline graphic and Inline graphic are Inline graphic-vectors specifying the mean, variance and mixing weight of each mixture component. The Inline graphic component indicators are auxiliary variables introduced to facilitate inference. The mixture model for the errors implies that conditional on the assignment indicators Inline graphic, the mean and variance of the error distribution for each series Inline graphic are time-dependent. This mixture model can capture the types of non-Gaussianity required for identifiability and also those observed in real-world time series. Asymmetric errors may be formed when the mixture centres are nonzero and the variances or mixture weights are different. A non-Gaussian symmetric distribution with kurtosis greater than 1 may be formed by setting the mixture centres to zero but allowing the mixture variances to have different values. The full set of parameters for the structural model is Inline graphic where Inline graphic and Inline graphic concatenate the mixture parameters of the errors across series. For example, Inline graphic is the mean of the Inline graphicth mixture component for the Inline graphicth error distribution, and likewise for Inline graphic and Inline graphic.
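To make the two kinds of non-Gaussianity concrete, the following minimal sketch draws errors from a univariate mixture of Gaussians by first sampling the component indicator and then a Gaussian given that indicator. The function names and all parameter values are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def sample_mixture_errors(n, means, variances, weights, rng):
    """Draw n errors from a univariate mixture of Gaussians.

    Returns the errors and the component indicators (the auxiliary variables).
    """
    means = np.asarray(means, dtype=float)
    sds = np.sqrt(np.asarray(variances, dtype=float))
    z = rng.choice(len(means), size=n, p=weights)   # component indicator per draw
    e = rng.normal(loc=means[z], scale=sds[z])      # Gaussian given the indicator
    return e, z

def sample_skewness(x):
    c = x - x.mean()
    return np.mean(c**3) / np.mean(c**2) ** 1.5

def sample_kurtosis(x):
    c = x - x.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2

rng = np.random.default_rng(0)
# Hypothetical asymmetric case: nonzero centres, unequal variances and weights.
e_asym, _ = sample_mixture_errors(200_000, [-0.5, 2.0], [0.25, 1.0], [0.8, 0.2], rng)
# Hypothetical symmetric case: zero centres, unequal variances.
e_sym, _ = sample_mixture_errors(200_000, [0.0, 0.0], [0.1, 4.0], [0.7, 0.3], rng)

skew_asym = sample_skewness(e_asym)   # clearly positive
skew_sym = sample_skewness(e_sym)     # close to zero
kurt_sym = sample_kurtosis(e_sym)     # above the Gaussian value of 3
print(skew_asym, skew_sym, kurt_sym)
```

With zero centres the mixture is symmetric, and unequal variances push its kurtosis above the Gaussian value of 3, so the distribution remains non-Gaussian even though its skewness vanishes.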

5.2. Expectation-maximization algorithm

We develop an expectation-maximization algorithm for joint maximum likelihood estimation of the full set of parameters Inline graphic based only on the observed subsampled and mixed-frequency data Inline graphic. Unlike the method of Gong et al. (2015), which is tailored to the subsampled case, our method is the same for both types of data. Furthermore, the non-structural-specific, i.e., Inline graphic, algorithm of Gong et al. (2015) introduces auxiliary noise terms to facilitate inference, rendering the resulting algorithm inexact, whereas our algorithm introduces no such approximations. Since the loglikelihood is nonconvex, we employ multiple random restarts to avoid poor local optima. For the subsampled case, the local optimum problem is particularly severe due to the nonidentifiability under the first two moments; many values of Inline graphic can give a good fit to the data. The basic algorithm also suffers from slow convergence due to the large amount of missing data. To speed up the algorithm, we deploy the adaptive over-relaxed method of Salakhutdinov & Roweis (2003).

Let Inline graphic, and let Inline graphic if error Inline graphic was generated by mixture component Inline graphic and Inline graphic otherwise. The complete loglikelihood, Inline graphic, of our structural model is

graphic file with name M355.gif

where Inline graphic is the Inline graphicth row vector of Inline graphic. The algorithm alternates between the E-step, where we compute the conditional expectation Inline graphic, and the M-step, where that expectation is maximized with respect to the parameters Inline graphic. We first describe the M-step updates, and then explain how the conditional expectations are computed using a Kalman filter.

5.3. The M-step

In the M-step, we maximize the expected complete loglikelihood conditional on the observed data, Inline graphic, with respect to Inline graphic via coordinate ascent, cycling through Inline graphic, Inline graphic and Inline graphic until convergence. The specific updates are as follows.

Updating Inline graphic: each row of Inline graphic, Inline graphic, may be updated independently according to

graphic file with name M369.gif

Updating Inline graphic, Inline graphic and Inline graphic: these may be optimized jointly in one step using

graphic file with name M373.gif

Updating Inline graphic: since the maximization is not given in closed form, we use the Newton-Raphson method. Let Inline graphic be the Inline graphic vectorization. At each step, the next Inline graphic iterate is

graphic file with name M378.gif

where Inline graphic and Inline graphic is the Hessian of Inline graphic with respect to Inline graphic. Expressions for the gradient and Hessian are given in the Supplementary Material.
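As a generic illustration of the Newton-Raphson iteration, the sketch below maximizes a toy concave quadratic rather than the expected complete loglikelihood, whose gradient and Hessian are given in the Supplementary Material. The function name `newton_maximize`, the toy objective and the stopping rule are all assumptions made for illustration.

```python
import numpy as np

def newton_maximize(grad, hess, c0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration c <- c - H(c)^{-1} g(c) on a vectorized parameter."""
    c = np.asarray(c0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(c), grad(c))   # solves H step = g without forming H^{-1}
        c = c - step
        if np.linalg.norm(step) < tol:
            break
    return c

# Toy concave objective f(c) = -0.5 c'Qc + b'c, maximized where Q c = b.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
c_hat = newton_maximize(lambda c: b - Q @ c, lambda c: -Q, np.zeros(2))
print(c_hat)
```

For a quadratic objective the first step lands on the maximizer; for the actual M-step objective several iterations would generally be needed.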

5.4. The E-step

All conditional expectations in the M-step above are computed using the Kalman filtering-smoothing algorithm. For simplicity, consider one block of data, so that Inline graphic, where Inline graphic and Inline graphic are fully observed but the Inline graphic for Inline graphic have some missing data and hence are not included in Inline graphic. Any subsampled or mixed-frequency series can be broken into blocks of this type. The conditional expectation Inline graphic can be computed by noticing that

graphic file with name M390.gif

For a fixed Inline graphic, Inline graphic is computed using the Kalman filtering-smoothing algorithm, since for fixed Inline graphic, Inline graphic is a linear Gaussian state-space model with latent observations Inline graphic. We compute Inline graphic for each Inline graphic combination and then add these together weighted by Inline graphic. The probability Inline graphic may be computed as

graphic file with name M400.gif

where Inline graphic is the set of prior mixture component weights, Inline graphic, and Inline graphic is the likelihood of the observed data, which may also be computed by one Kalman pass. This process is repeated for all expectations in the E-step. The computational complexity of this exact algorithm scales as Inline graphic, since the Kalman filter must be run for all combinations of Inline graphic for each block. The approximate algorithm of Gong et al. (2015) has the same complexity. Like Gong et al. (2015), we have explored approximate inference methods based on variational and Markov chain Monte Carlo methods but found their performance to be poor; see § 8.
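The combination step of the E-step can be sketched as follows, assuming a log-likelihood of the observed block is available for every joint assignment of mixture components. In this sketch the log-likelihood values are placeholders standing in for the output of one Kalman pass each, the same mixing weights are used for every error for simplicity, and the block size is hypothetical.

```python
import itertools
import numpy as np

def enumerate_assignments(n_components, n_errors):
    """All joint assignments of n_errors latent errors to mixture components."""
    return list(itertools.product(range(n_components), repeat=n_errors))

def assignment_posterior(log_lik, log_prior):
    """Posterior over joint assignments, proportional to likelihood times prior.

    log_lik[i]   : log-likelihood of the observed block given assignment i.
    log_prior[i] : log prior probability of assignment i.
    """
    log_joint = np.asarray(log_lik, dtype=float) + np.asarray(log_prior, dtype=float)
    log_joint = log_joint - log_joint.max()   # stabilise before exponentiating
    w = np.exp(log_joint)
    return w / w.sum()

# Hypothetical block with 4 latent errors and 2 mixture components each.
combos = enumerate_assignments(n_components=2, n_errors=4)
weights = np.array([0.3, 0.7])                       # illustrative mixing weights
log_prior = np.array([np.log(weights[list(c)]).sum() for c in combos])
log_lik = -0.5 * np.arange(len(combos))              # placeholders, not real Kalman output
post = assignment_posterior(log_lik, log_prior)
print(len(combos), post.sum())
```

The number of assignments grows exponentially with the number of latent errors in a block, which is the source of the computational cost of the exact algorithm.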

6. Simulations

6.1. Estimation dependence on the subsampling factor and number of observations

We first investigate the performance of the expectation-maximization algorithm under subsampling. We simulate data with Inline graphic time series and Inline graphic mixture components. The asymmetric error distributions are given by Inline graphic, Inline graphic and Inline graphic for Inline graphic, and Inline graphic, Inline graphic and Inline graphic for Inline graphic. We consider two cases for Inline graphic and Inline graphic:

graphic file with name M418.gif

Simulations are performed for two subsampling factors, Inline graphic, and three sample sizes, Inline graphic. Due to subsampling, the actual sample sizes are reduced. Data from each parameter configuration are generated 10 times, and the estimation algorithm is run on each realization using 1000 random restarts. Boxplots of the error estimates for two of the scenarios are shown in Figs. 3 and 4; see the Supplementary Material for plots in the other two settings.
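A minimal sketch of this kind of data generation is given below, assuming the structural process has the first-order form x_t = A x_{t-1} + C e_t with independent mixture-of-Gaussians errors. The matrices, mixture parameters, burn-in length and function names are hypothetical choices for illustration and are not the configurations used in the paper.

```python
import numpy as np

def draw_errors(rng, p):
    """One error vector with independent two-component mixture entries (illustrative)."""
    means = np.array([-0.5, 2.0])
    sds = np.array([0.5, 1.0])
    z = rng.choice(2, size=p, p=[0.8, 0.2])   # component indicator per series
    return rng.normal(means[z], sds[z])

def simulate_svar(A, C, n_steps, rng, burn_in=200):
    """Simulate x_t = A x_{t-1} + C e_t and discard an initial burn-in."""
    p = A.shape[0]
    x = np.zeros(p)
    out = np.empty((n_steps, p))
    for t in range(burn_in + n_steps):
        x = A @ x + C @ draw_errors(rng, p)
        if t >= burn_in:
            out[t - burn_in] = x
    return out

def subsample(x, k):
    """Keep every k-th observation, starting from the first."""
    return x[::k]

rng = np.random.default_rng(1)
A = np.array([[0.8, 0.3], [0.0, 0.7]])   # hypothetical lagged effects
C = np.eye(2)                            # hypothetical instantaneous effects
x_full = simulate_svar(A, C, n_steps=805, rng=rng)
x_sub = subsample(x_full, k=2)
print(x_full.shape, x_sub.shape)         # (805, 2) (403, 2)
```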

Fig. 3.

Boxplots of errors in Inline graphic and Inline graphic parameter estimates over 10 random data samplings. The original series is of length 203 (top), 403 (middle) or 805 (bottom) and is then subsampled at Inline graphic (left) and Inline graphic (right).

Fig. 4.

As Fig. 3 but for Inline graphic and Inline graphic.

We perform a similar experiment for Inline graphic. We simulate data with parameters

graphic file with name M428.gif

The mixture of normal error distributions for Inline graphic and Inline graphic is the same as that for the Inline graphic case. The parameters for Inline graphic are Inline graphic, Inline graphic and Inline graphic. The average error rates are displayed in Table 2 and indicate increasingly accurate estimation in trivariate structural systems as the sample size increases.

Table 2.

Average log mean squared error of Inline graphic and Inline graphic in a Inline graphic structural system over ten random samples for Inline graphic and three sample sizes

  Inline graphic Inline graphic
Inline graphic 203 403 805 203 403 805
Inline graphic −2.4 −7.0 −7.5 −0.9 −1.6 −6.8
Inline graphic −3.6 −4.8 −5.8 −1.8 −1.8 −3.9

6.2. Estimation dependence on the asymmetry of errors

We analyse estimation performance as a function of the skewness of the error distribution, Inline graphic, which is a measure of asymmetry. We simulate data from the same Inline graphic parameter configurations as in § 6.1 for Inline graphic and Inline graphic. While keeping the variance fixed, we vary the error distributions across a range of Inline graphic, Inline graphic, so that Inline graphic and Inline graphic have the same magnitude of skewness but opposite signs. The skewness values are obtained by gradually modifying the Inline graphic, Inline graphic and Inline graphic values in a bivariate mixture of normals. See the Supplementary Material for the exact parameter values and plots of the simulated error distributions.
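The skewness of a mixture of normals has a closed form, which the following sketch evaluates from the component means, variances and weights. The parameter values are illustrative assumptions; negating the means mirrors the distribution and therefore flips the sign of the skewness while leaving the variance unchanged.

```python
import numpy as np

def mixture_moments(means, variances, weights):
    """Mean, variance and skewness of a univariate mixture of Gaussians."""
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.sum(w * m)
    d = m - mean                                  # component centres relative to the mean
    var = np.sum(w * (v + d**2))
    third = np.sum(w * (d**3 + 3.0 * d * v))      # third central moment
    return mean, var, third / var**1.5

# Hypothetical parameters for one error and its mirror image.
mean1, var1, skew1 = mixture_moments([-0.5, 2.0], [0.25, 1.0], [0.8, 0.2])
mean2, var2, skew2 = mixture_moments([0.5, -2.0], [0.25, 1.0], [0.8, 0.2])
# Zero centres give a symmetric distribution with zero skewness.
_, _, skew0 = mixture_moments([0.0, 0.0], [0.1, 4.0], [0.7, 0.3])
print(skew1, skew2, skew0)
```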

The results for estimation of Inline graphic are shown in Table 3. First, for Inline graphic estimation remains accurate across all skewness settings for Inline graphic, while for Inline graphic the error stays low for Inline graphic but spikes for Inline graphic. For Inline graphic, estimation is stable until Inline graphic for Inline graphic, but for Inline graphic estimation is only stable at Inline graphic. Taken together, these results suggest that the strength of identifiability depends on a combination of factors, Inline graphic, Inline graphic and Inline graphic, and the level of asymmetry of the error distributions. Similar results for Inline graphic are reported in the Supplementary Material.

Table 3.

Average log mean squared error of Inline graphic over ten random samplings for both Inline graphic and Inline graphic estimates across multiple settings of the parameters, number of observations, and subsampling factors

  Inline graphic Inline graphic
Inline graphic 1.8 1.2 0.8 0.4 0 1.8 1.2 0.8 0.4 0
Inline graphic Inline graphic −9.0 −7.7 −7.3 −7.0 −0.018 −8.1 −7.0 −7.1 −7.0 −7.4
Inline graphic Inline graphic −9.0 −7.9 −7.7 −7.4 0.16 −7.9 −7.2 −7.4 −7.2 −7.5
Inline graphic Inline graphic −9.1 −7.9 −0.94 −0.26 1.2 −8.0 −0.33 0.71 1.6 1.6
Inline graphic Inline graphic −9.1 −8.0 −0.94 0.15 1.3 −8.0 −0.32 1.0 1.4 1.2

6.3. Estimation dependence on the signal-to-noise ratio

We next investigate estimation performance under subsampling and mixed-frequency sampling as a function of the signal-to-noise ratio. In these experiments we use Inline graphic and Inline graphic. We scale Inline graphic by a factor to set its maximum eigenvalue to the desired level. We perform these experiments both for full subsampling of Inline graphic and Inline graphic and for mixed-frequency subsampling where one series is observed at every time-point and the other is subsampled. Data from each parameter configuration are generated 40 times. In Fig. 5 we plot the average absolute error for estimating the Inline graphic and Inline graphic matrices as a function of the maximum eigenvalue of Inline graphic. Estimation under subsampling is stable until the maximum eigenvalue falls to about Inline graphic; below this level the error grows sharply, indicating unstable estimation in this regime. The error in the estimation of Inline graphic also grows as the signal-to-noise ratio falls in the mixed-frequency case, but less dramatically than in the subsampled case, partly due to the presence of fewer local optima in the mixed-frequency case. In the mixed-frequency case, the error in Inline graphic estimation appears to be constant across the maximum eigenvalue range we considered.
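The rescaling of the transition matrix can be sketched as follows, assuming that the maximum eigenvalue is measured by the largest eigenvalue modulus. The function name and the example matrix are hypothetical.

```python
import numpy as np

def scale_to_max_eigenvalue(A, target):
    """Rescale A so that its largest eigenvalue modulus equals target."""
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if rho == 0.0:
        raise ValueError("A has no nonzero eigenvalue, so it cannot be rescaled")
    return A * (target / rho)

A = np.array([[0.8, 0.3], [0.0, 0.7]])   # hypothetical transition matrix
A_scaled = scale_to_max_eigenvalue(A, target=0.5)
rho_scaled = np.max(np.abs(np.linalg.eigvals(A_scaled)))
print(rho_scaled)
```

Multiplying a matrix by a positive scalar multiplies every eigenvalue by that scalar, so the pattern of zero and nonzero entries in the matrix is preserved while the signal strength changes.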

Fig. 5.

Average mean squared error (MSE) in estimation of (a) Inline graphic and (b) Inline graphic as a function of the maximum eigenvalue of Inline graphic. Results are shown for subsamplings of Inline graphic (red solid), Inline graphic (red dashed), Inline graphic (blue solid), and Inline graphic (blue dashed). Error bars indicate one standard error from 40 simulation runs.

Unstable estimation arises from a combination of two factors. First, under subsampling, the transition matrix of the subsampled process is Inline graphic, indicating that the signal strength between observations scales exponentially as a function of Inline graphic. Second, the likelihood surface is multimodal, such that multiple high-probability modes all have approximately the same Inline graphic value. As the signal-to-noise ratio falls, Inline graphic estimation becomes more difficult due to subsampling, so the multimodality becomes more severe and modes far from the true Inline graphic occasionally have higher likelihood. Overall, these simulations indicate that in the subsampling case, there appears to be a threshold on the maximum eigenvalue, below which inference becomes unstable.
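Both factors can be illustrated numerically. The sketch below assumes that the transition matrix of a process subsampled by a factor k is the k-th matrix power of a one-step transition matrix, written here with the hypothetical name `A`; the example matrix is an arbitrary choice.

```python
import numpy as np

def max_abs_eigenvalue(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

A = np.array([[0.5, 0.2], [0.0, 0.4]])   # hypothetical one-step transition matrix

# Factor 1: the largest eigenvalue modulus of A^k is that of A raised to the power k,
# so the lagged signal between consecutive observations decays geometrically in k.
for k in (1, 2, 3):
    A_k = np.linalg.matrix_power(A, k)
    print(k, max_abs_eigenvalue(A_k), max_abs_eigenvalue(A) ** k)

# Factor 2: distinct transition matrices can share the same subsampled transition matrix.
# For k = 2, A and -A give identical matrix powers, one source of distant competing modes.
same_square = np.allclose(np.linalg.matrix_power(A, 2), np.linalg.matrix_power(-A, 2))
print(same_square)                       # True
```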

The simulations cover cases (a) and (b) in Fig. 1. Unfortunately, the computational complexity of the E-step makes simulations for cases (c) and (d) infeasible in a reasonable time. Future work will explore tractable inference in these cases; see the discussion at the end of § 7.

7. Real data

7.1. Subsampled ozone data

We use the subsampled structural model to analyse the causal scale and pathways in an ozone and temperature dataset. The temperature-ozone data are the 50th causal effect pair from the website https://webdav.tuebingen.mpg.de/cause-effect/, and were also considered by Gong et al. (2015). The dataset consists of temperature and ozone concentration values, sampled daily. First we standardize each time series to zero mean and unit variance. We fit the subsampled structural model to the pre-processed series for Inline graphic subsampling regimes under both independent errors, Inline graphic, and structural covariance in the instantaneous errors, Inline graphic free. To ensure that good optima are found, we perform 30 000 restarts and run the adaptive over-relaxed algorithm until the relative change in loglikelihood is less than Inline graphic.

The estimated Inline graphic for Inline graphic is

graphic file with name M534.gif

with a maximum eigenvalue of Inline graphic, suggesting that accurate estimation of subsampled parameters is possible. The Bayesian information criterion scores for all models are displayed in Table 4. Across all subsampling rates, the structural model, Inline graphic free, has a lower score, indicating that the two extra parameters of the structural model, the off-diagonal elements of Inline graphic, provide necessary flexibility. Furthermore, the best-performing model is the structural model with subsampling rate Inline graphic. The estimated transition matrix for Inline graphic is

graphic file with name M540.gif

similar to that given by Gong et al. (2015) for Inline graphic. After normalizing the columns, we obtain

graphic file with name M542.gif

Table 4.

Bayesian information criterion scores of subsampling and covariance types on the atmospheric dataset; an asterisk indicates the lowest value

Model / Inline graphic 1 2 3 4
Inline graphic 901.96 791.02 839.56 797.00
Inline graphic free 784.53 777.78* 790.46 791.23

These results indicate weak lagged effects at the subsampled scale, but stronger instantaneous effects between temperature and ozone. Furthermore, the temperature series derives most of its power from a strong error variance, while the ozone series is driven more by the autoregressive component. See the Supplementary Material for quantile-quantile plots of the inferred mixture of error distributions.
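The model comparison in Table 4 amounts to picking the smallest score. The sketch below uses the standard definition of the Bayesian information criterion, in which lower is better, and then selects the minimum over the scores copied from Table 4. The labels "independent" and "free" are shorthand introduced here for the two rows of the table, and the loglikelihoods and parameter counts behind the published scores are not reproduced.

```python
import numpy as np

def bic(log_lik, n_params, n_obs):
    """Standard Bayesian information criterion: lower values are preferred."""
    return -2.0 * log_lik + n_params * np.log(n_obs)

# Scores from Table 4, keyed by (error structure, subsampling factor).
table4 = {
    ("independent", 1): 901.96, ("independent", 2): 791.02,
    ("independent", 3): 839.56, ("independent", 4): 797.00,
    ("free", 1): 784.53, ("free", 2): 777.78,
    ("free", 3): 790.46, ("free", 4): 791.23,
}

best_model = min(table4, key=table4.get)
print(best_model, table4[best_model])    # ('free', 2) 777.78

# At every subsampling factor the structural ("free") model has the lower score.
free_always_lower = all(
    table4[("free", k)] < table4[("independent", k)] for k in (1, 2, 3, 4)
)
print(free_always_lower)                 # True
```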

7.2. Mixed-frequency data: GDP and treasury bonds

We perform a structural autoregressive analysis on the mixed-frequency dataset of quarterly GDP and monthly price of treasury bonds. The dataset has previously been compiled and analysed in the mixed-frequency setting by Schorfheide & Song (2015) and is available in the Supplementary Material. We follow Schorfheide & Song (2015) and compute the logarithm of both series. Furthermore, as is common in mixed-frequency analysis (Chen & Zadrozny, 1998), we compute first differences to remove first-order nonstationarities.

There are multiple approaches to modelling mixed-frequency GDP data in the literature. Recently, many authors have treated GDP as a flow variable and used state-space models to directly model the aggregation over months in a quarter (Schorfheide & Song, 2015; Ghysels, 2016). Others ignore the generative subsampling structure and instead jointly model the high- and low-frequency variables in a quarter using mixed data sampling methods (Ghysels, 2016). We follow another line of work that simplifies the analysis by ignoring aggregation (Chen & Zadrozny, 1998; Seong, 2012; Eraker et al., 2014; Anderson et al., 2016; Zadrozny, 2016), thus treating GDP as a purely subsampled series, and apply our mixed-frequency structural autoregressive model at the monthly rate. Indeed, recent theoretical work on mixed-frequency autoregressive models for GDP also focuses on the purely subsampled, rather than aggregated, case (Anderson et al., 2016). Since our subsampled approach to modelling GDP is a simplifying assumption that ignores aggregation, extending our framework to handle aggregated variables is an important direction of future research.

In the traditional approaches to mixed-frequency analysis, Inline graphic and the instantaneous covariance Inline graphic are generically identifiable from the first two moments (Anderson et al., 2016). What sets our non-Gaussian approach apart in this mixed-frequency domain is its ability to uniquely identify the ordering of the instantaneous causal effects in the structural matrix Inline graphic. To highlight this ability, we perform model selection on the zero entries in Inline graphic to determine the causal ordering of the instantaneous effects. Specifically, we calculate the Bayesian information criterion for the nested models Inline graphic, Inline graphic, Inline graphic, and Inline graphic. Models Inline graphic, Inline graphic and Inline graphic represent acyclic structures on the instantaneous effects, while the unrestricted model Inline graphic does not. The scores for all models shown in Table 5 indicate that Inline graphic performs best. The estimated matrices of

graphic file with name M560.gif

suggest a slight negative lagged interaction from GDP to treasury bonds and an instantaneous interaction at the monthly scale from treasury bonds to GDP. See the Supplementary Material for quantile-quantile plots of the inferred mixture of error distributions.

Table 5.

Bayesian information criterion scores of different instantaneous causality structures on the GDP dataset; an asterisk indicates the lowest value

Model Inline graphic Inline graphic Inline graphic Inline graphic
  1984.00 1983.41 1981.08* 1987.55

The above analysis fits a structural model at the time scale of months, the same sampling rate as the treasury bond time series. The results from § 4 indicate that we could uniquely identify models at bimonthly, or even more granular, time scales. However, even at the bimonthly rate, the computational complexity of the E-step of our algorithm becomes prohibitive due to the large number of combinations of error mixture components in a data block, as discussed in § 5.4. Since the E-step requires running the forward-backward algorithm many times, a considerable computational speed-up could be achieved from a parallel implementation.

8. Discussion

Our results provide sufficient conditions for identifiability of structural autoregressive models for both subsampled and mixed-frequency series. The causal diagram of both lagged and instantaneous effects is identifiable under arbitrary subsampling and non-Gaussian errors.

We have developed an exact expectation-maximization algorithm for estimation and analysed its performance via simulations. Our algorithm has two drawbacks: high complexity due to a Kalman filter evaluation for all mixture error assignments within a time block; and many local optima due to weak identifiability. Our simulations show that the latter problem is more severe under even subsampling factors and low signal-to-noise regimes.

An ongoing line of work is to develop approximate inference for these models using Markov chain Monte Carlo or variational methods. Unfortunately, we have found that the local optima make sampling difficult. A Gibbs sampler we have explored gets stuck in one local mode and requires the same number of random restarts as our algorithm to find a good solution. Perhaps incorporating recent advances in sampling (Ma et al., 2016) may prove beneficial. We have also found the performance of a variational algorithm to be poor. Similarly, Gong et al. (2015) reported significantly worse results for a variational approach than for their approximate expectation-maximization algorithm. By breaking the dependence between the unobserved, subsampled Inline graphic and the auxiliary Inline graphic, the variational approach avoids the combinatorial evaluation of a Kalman filter; however, this dependence is critical for correctly evaluating the probable trajectories of the latent Inline graphic, without which inference of Inline graphic suffers.

While our work has focused on point estimation, future research aims to adapt the time series bootstrap to the mixed-frequency and subsampled settings for constructing confidence intervals. It would be interesting to explore method-of-moments estimation for this problem, which may side-step the local optima difficulty and the combinatorial complexity of our algorithm.

Acknowledgement

This research was partially funded by the U.S. National Science Foundation, National Institutes of Health, Air Force Office of Scientific Research, and Office of Naval Research. We thank the referees for helpful comments and suggestions.

Supplementary material

Supplementary material available at Biometrika online includes an example of the subsampled and mixed-frequency structural processes, detailed proofs of Theorems 2 and 4, details on the Inline graphic update in the expectation-maximization algorithm, additional simulation results, both of the real datasets that we analysed, and the code for the expectation-maximization algorithm.

References

  1. Anderson, B. D., Deistler, M., Felsenstein, E., Funovits, B., Koelbl, L. & Zamani, M. (2016). Multivariate AR systems and mixed-frequency data: G-identifiability and estimation. Economet. Theory 32, 793–826.
  2. Anderson, B. D., Deistler, M., Felsenstein, E., Funovits, B., Zadrozny, P., Eichler, M., Chen, W. & Zamani, M. (2012). Identifiability of regular and singular multivariate autoregressive models from mixed-frequency data. In 51st IEEE Conference on Decision and Control (CDC 2012). Piscataway, New Jersey: IEEE, pp. 184–9.
  3. Boot, J. C., Feibes, W. & Lisman, J. H. C. (1967). Further methods of derivation of quarterly figures from annual data. Appl. Statist. 16, 65–75.
  4. Bowen, N. K. & Guo, S. (2011). Structural Equation Modeling. Oxford: Oxford University Press.
  5. Breitung, J. & Swanson, N. R. (2002). Temporal aggregation and spurious instantaneous causality in multiple time series models. J. Time Ser. Anal. 23, 651–65.
  6. Chen, B. & Zadrozny, P. A. (1998). An extended Yule-Walker method for estimating a vector autoregressive model with mixed-frequency data. Adv. Economet. 13, 47–74.
  7. Cont, R. (2001). Empirical properties of asset returns: Stylized facts and statistical issues. Quant. Finance 1, 223–36.
  8. Danks, D. & Plis, S. (2013). Learning causal structure from undersampled time series. In NIPS 2013 Workshop on Causality (Lake Tahoe, Nevada, 9 December 2013).
  9. Eraker, B., Chiu, C. W., Foerster, A. T., Kim, T. B. & Seoane, H. D. (2014). Bayesian mixed-frequency VARs. J. Finan. Economet. 13, 698–721.
  10. Eriksson, J. & Koivunen, V. (2004). Identifiability, separability, and uniqueness of linear ICA models. Sig. Proces. Lett. 11, 601–4.
  11. Ghysels, E. (2016). Macroeconomics and the reality of mixed-frequency data. J. Economet. 193, 294–314.
  12. Gong, M., Zhang, K., Schölkopf, B., Tao, D. & Geiger, P. (2015). Discovering temporal causal relations from subsampled data. In Proceedings of the 32nd International Conference on Machine Learning (Lille, France). New York: Association for Computing Machinery, pp. 1898–906.
  13. Harvey, A. C. (1990). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
  14. Harvey, C. R. & Siddique, A. (2000). Conditional skewness in asset pricing tests. J. Finance 55, 1263–95.
  15. Herwartz, H. & Plödt, M. (2016). The macroeconomic effects of oil price shocks: Evidence from a statistical identification approach. J. Int. Money Finance 61, 30–44.
  16. Hyttinen, A., Plis, S., Järvisalo, M., Eberhardt, F. & Danks, D. (2016). Causal discovery from subsampled time series data by constraint optimization. arXiv: 1602.07970.
  17. Hyvärinen, A., Karhunen, J. & Oja, E. (2004). Independent Component Analysis. New York: John Wiley & Sons.
  18. Hyvärinen, A., Shimizu, S. & Hoyer, P. O. (2008). Causal modelling combining instantaneous and lagged effects: An identifiable model based on non-Gaussianity. In Proceedings of the 25th International Conference on Machine Learning (Helsinki, Finland). New York: Association for Computing Machinery, pp. 424–31.
  19. Hyvärinen, A., Zhang, K., Shimizu, S. & Hoyer, P. O. (2010). Estimation of a structural vector autoregression model using non-Gaussianity. J. Mach. Learn. Res. 11, 1709–31.
  20. Justiniano, A. & Primiceri, G. E. (2008). The time-varying volatility of macroeconomic fluctuations. Am. Econ. Rev. 98, 604–41.
  21. Kilian, L. & Lütkepohl, H. (2016). Structural Vector Autoregressive Analysis. Cambridge: Cambridge University Press.
  22. Lanne, M. & Lütkepohl, H. (2010). Structural vector autoregressions with non-normal residuals. J. Bus. Econ. Statist. 28, 159–68.
  23. Lanne, M., Lütkepohl, H. & Maciejowska, K. (2010). Structural vector autoregressions with Markov switching. J. Econ. Dynam. Contr. 34, 121–31.
  24. Lanne, M., Meitz, M. & Saikkonen, P. (2017). Identification and estimation of non-Gaussian structural vector autoregressions. J. Economet. 196, 288–304.
  25. Lanne, M. & Pentti, S. (2007). Modeling conditional skewness in stock returns. Eur. J. Finance 13, 691–704.
  26. Lauritzen, S. L. (1996). Graphical Models. Oxford: Oxford University Press.
  27. Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Berlin: Springer.
  28. Ma, Y.-A., Chen, T., Wu, L. & Fox, E. B. (2016). A unifying framework for devising efficient and irreversible MCMC samplers. arXiv: 1608.05973.
  29. Moauro, F. & Savio, G. (2005). Temporal disaggregation using multivariate structural time series models. Economet. J. 8, 214–34.
  30. Peters, J., Janzing, D. & Schölkopf, B. (2013). Causal inference on time series using restricted structural equation models. In Proceedings of the 26th International Conference on Neural Information Processing Systems (Lake Tahoe, Nevada). New York: Association for Computing Machinery, pp. 154–62.
  31. Plis, S., Danks, D., Freeman, C. & Calhoun, V. (2015). Rate-agnostic (causal) structure learning. In Proceedings of the 28th International Conference on Neural Information Processing Systems (Montreal, Canada). New York: Association for Computing Machinery, pp. 3303–11.
  32. Rachev, S. T. (2003). Handbook of Heavy Tailed Distributions in Finance, vol. 1 of Handbooks in Finance. Amsterdam: Elsevier.
  33. Salakhutdinov, R. & Roweis, S. T. (2003). Adaptive overrelaxed bound optimization methods. In Proceedings of the 20th International Conference on Machine Learning (Washington, DC). New York: Association for Computing Machinery, pp. 664–71.
  34. Schorfheide, F. & Song, D. (2015). Real-time forecasting with a mixed-frequency VAR. J. Bus. Econ. Statist. 33, 366–80.
  35. Seong, B. (2012). Cointegration analysis with mixed-frequency data of quarterly GDP and monthly coincident indicators. Korean J. Appl. Statist. 25, 925–32.
  36. Shimizu, S., Hoyer, P. O., Hyvärinen, A. & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7, 2003–30.
  37. Shojaie, A. & Michailidis, G. (2010). Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. Biometrika 97, 519–38.
  38. Silvestrini, A. & Veredas, D. (2008). Temporal aggregation of univariate and multivariate time series models: A survey. J. Econ. Surv. 22, 458–97.
  39. Stram, D. O. & Wei, W. W. (1986). A methodological note on the disaggregation of time series totals. J. Time Ser. Anal. 7, 293–302.
  40. Walls, W. D. (2005). Modelling heavy tails and skewness in film returns. Appl. Finan. Econ. 15, 1181–8.
  41. Zadrozny, P. A. (2016). Extended Yule–Walker identification of VARMA models with single or mixed-frequency data. J. Economet. 193, 438–46.
  42. Zhang, K. & Hyvärinen, A. (2009). Causality discovery with additive disturbances: An information-theoretical perspective. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Berlin, Germany). Berlin: Springer, pp. 570–85.
  43. Zhou, D., Zhang, Y., Xiao, Y. & Cai, D. (2014). Analysis of sampling artifacts on the Granger causality analysis for topology extraction of neuronal dynamics. Front. Comp. Neurosci. 8, 10.3389/fncom.2014.00075.

Associated Data

Supplementary Materials

asz007_Supplementary_Material

Articles from Biometrika are provided here courtesy of Oxford University Press