Abstract
Discovering the causal structure of a dynamical system from observed time series is a traditional and important problem. In many practical applications, observed data are obtained by applying subsampling or temporal aggregation to the original causal processes, making it difficult to discover the underlying causal relations. Subsampling refers to the procedure in which, of every k consecutive observations, one is kept and the rest are skipped; some advances have recently been made in causal discovery from such data. With temporal aggregation, the local averages or sums of k consecutive, non-overlapping observations in the causal process are computed as new observations, and causal discovery from such data is even harder. In this paper, we investigate how to recover causal relations at the original causal frequency from temporally aggregated data when k is known. Assuming the time series at the causal frequency follows a vector autoregressive (VAR) model, we show that the causal structure at the causal frequency is identifiable from aggregated time series if the noise terms are independent and non-Gaussian and some other technical conditions hold. We then present an estimation method based on non-Gaussian state-space modeling and evaluate its performance on both synthetic and real data.
1 INTRODUCTION
Causal modeling (Spirtes et al., 2001; Pearl, 2000) of time series data has been widely applied in many fields such as econometrics (Ghysels et al., 2016), neuroscience (Zhou et al., 2014), and climate science (Van Nes et al., 2015). Classical causal discovery approaches, e.g., the Granger causality test (Granger, 1969), usually assume that the data measurement frequency matches the true causal frequency of the underlying physical process. However, since the true causal frequency is usually unknown, time series data are often measured at a frequency lower than the causal frequency. For example, econometric indicators such as GDP and non-farm payrolls are usually recorded at quarterly and monthly scales, while the causal interactions between the processes may take place at weekly or fortnightly scales (Ghysels et al., 2016). In neuroscience, imaging technologies have relatively low temporal resolution, while many high-frequency neuronal interactions are important for understanding neuronal dynamics (Zhou et al., 2014). In these situations, the available observations have a lower resolution than the underlying causal process.
There are two typical schemes for generating low-resolution or low-frequency data from high-frequency ones (Silvestrini & Veredas, 2008; Marcellino, 1999). One is subsampling: of every k consecutive observations, one is kept and the rest are skipped. The other is temporal aggregation, i.e., taking the local averages or sums of k consecutive, non-overlapping observations from the underlying causal process as new observations. For instance, time series of interest rates, money supply, and temperature are usually obtained by subsampling; in contrast, the U.S. nominal GDP is obtained by aggregation: it measures the total number of dollars spent over a time period.
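To make the two schemes concrete, the following minimal NumPy sketch (our own illustration; only x and k follow the paper's notation) produces both kinds of low-resolution data from a toy high-frequency series:

```python
# A minimal sketch (ours) contrasting subsampling and temporal aggregation.
import numpy as np

rng = np.random.default_rng(0)
k = 3
x = rng.standard_normal(12)                  # high-frequency observations

subsampled = x[k - 1::k]                     # keep one of every k points
aggregated = x.reshape(-1, k).mean(axis=1)   # average k non-overlapping points

print(subsampled.shape, aggregated.shape)    # both have length 12 / k = 4
```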
Numerous contributions have analyzed how these two schemes for generating low-resolution data affect properties of the time series, such as estimated causal relations and exogeneity (Tiao, 1972; Weiss, 1984; Granger, 1987; Marcellino, 1999; Rajaguru & Abeysinghe, 2008). These studies found that temporal aggregation can lead to errors in the estimated causal relations if not properly addressed. For example, Breitung & Swanson (2002) examined the impact of temporal aggregation on Granger causality in vector autoregressive (VAR) models and found that the results of Granger causal analysis depend heavily on the degree of temporal aggregation.
Recovering the high frequency causal relations from temporally aggregated data is a very hard problem due to information loss in the aggregation process. A classical way to discover high frequency causal relations from temporally aggregated data is to first disaggregate the low frequency time series to high frequency ones and then apply standard causal discovery methods on the disaggregated data. Temporal disaggregation of low resolution time series has been extensively studied in the econometric and statistical literature (Boot et al., 1967; Stram & Wei, 1986; Harvey & Chung, 2000; Moauro & Savio, 2005; Proietti, 2006), which is clearly an even harder problem than discovering causal relations.
Recently, a set of methods have been proposed to estimate the causal structure at the causal frequency from subsampled data without resorting to disaggregation techniques (Hyttinen et al., 2016; Gong et al., 2015; Plis et al., 2015a; Danks & Plis, 2013). Plis et al. (2015a,b) first inferred the causal structure from the subsampled data and then searched for the causal structure at the causal frequency consistent with the structure inferred in the first step. Based on this framework, Hyttinen et al. (2016) proposed a much faster inference method using a general-purpose Boolean constraint solver. Gong et al. (2015) proposed a model-based approach and examined the identifiability of the underlying vector autoregressive (VAR) model at the causal frequency from subsampled time series; they showed that the causal transition matrix is identifiable if the noise process is non-Gaussian. This work was recently extended to mixed-frequency data via structural VAR modeling (Tank et al., 2017). However, how to estimate causal relations from aggregated data remains an open problem.
Compared to subsampling, temporal aggregation is perhaps more widely used to produce low-resolution time series, especially in economics and finance. However, the effect of temporal aggregation is more complex, and accordingly it is technically more difficult to recover the underlying causal relations from such data. Specifically, because the noise term of the aggregated process mixes a larger number of independent components and the mixing matrix thus has a more complicated structure, the estimation is both statistically and computationally harder.
The objective of this paper is to seek a possible solution to this problem, by studying the theoretical identifiability of the underlying causal relations and developing a practical causal discovery algorithm. Following (Gong et al., 2015), we assume that the high-frequency data follow a VAR model, the error terms are non-Gaussian, and there are no confounders (Geiger et al., 2015). We show that the original causal relation can be estimated from the aggregated data with known k, under a set of technical conditions.
Moreover, we propose an estimation method based on non-Gaussian state-space modeling of the aggregated data. Since the exact inference in the non-Gaussian state-space model is intractable, we estimate the model parameters using the particle stochastic approximation EM (PSAEM) algorithm (Lindsten, 2013; Svensson et al., 2014), which combines the efficient conditional particle filter with ancestor sampling (CPF-AS) (Lindsten et al., 2014) with the stochastic approximation EM (SAEM) algorithm (Delyon et al., 1999). Interestingly, in the extreme case where the aggregation factor k becomes larger and larger, we show that the observed time series will become independent and identically distributed (i.i.d.), and we study to what extent the underlying time-delayed causal relations can be recovered from the instantaneous dependence in the observed data.
2 EFFECT OF TEMPORAL AGGREGATION
In the linear case, Granger causal analysis (Granger, 1969) can be performed by fitting the following first-order VAR model (Sims, 1980):

x_t = A x_{t−1} + e_t,    (1)

where x_t = (x_{t,1}, x_{t,2}, …, x_{t,n})^𝖳 is the vector of the observed data, e_t = (e_{t,1}, …, e_{t,n})^𝖳 is the temporally and contemporaneously independent noise process, and A is the causal transition matrix containing the temporal causal relations.
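For concreteness, the following sketch simulates the VAR(1) process in (1) with independent Laplace (non-Gaussian) noise; rescaling A to spectral radius 0.9 is our illustrative way of enforcing stability (anticipating assumption A2 below):

```python
# A simulation sketch (ours) of the VAR(1) model (1) with non-Gaussian noise.
import numpy as np

rng = np.random.default_rng(1)
n, T = 2, 10000
A = rng.uniform(-0.5, 0.5, size=(n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # enforce stability

x = np.zeros((T, n))
for t in range(1, T):
    e_t = rng.laplace(size=n)   # temporally and contemporaneously independent
    x[t] = A @ x[t - 1] + e_t
```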
2.1 WITH A FINITE k
Gong et al. (2015) studied causal discovery from subsampled data. With subsampling, the observations follow

x̃_t = A^k x̃_{t−1} + Σ_{l=0}^{k−1} A^l e_{tk−l},    (2)

which turns out to be a VAR model with a temporally independent but contemporaneously dependent noise process. They demonstrated that it is possible to identify the high-resolution causal relation A from the low-resolution observations if the noise terms are non-Gaussian.
In this paper, we are concerned with the temporally aggregated data x̃_{1:T} ≜ (x̃_1, x̃_2, …, x̃_T), which are obtained by taking the average (or sum) of every k non-overlapping points, i.e., x̃_t = (1/k) Σ_{i=1}^{k} x_{(t−1)k+i}, where

x_{(t−1)k+i} = A^k x_{(t−2)k+i} + Σ_{l=0}^{k−1} A^l e_{(t−1)k+i−l}.

Taking the average of the above equation over i = 1, 2, …, k, we have

x̃_t = A^k x̃_{t−1} + (1/k) Σ_{i=1}^{k} Σ_{l=0}^{k−1} A^l e_{(t−1)k+i−l},    (3)

which is a vector autoregressive moving-average (VARMA) model with one autoregressive term and two moving-average terms:

x̃_t = A^k x̃_{t−1} + Θ_0 ε_t + Θ_1 ε_{t−1},    (4)

where ε_t ≜ (e_{tk}^𝖳, e_{tk−1}^𝖳, …, e_{(t−1)k+1}^𝖳)^𝖳, Θ_0 = (1/k)[I, I + A, …, Σ_{j=0}^{k−1} A^j], and Θ_1 = (1/k)[Σ_{j=1}^{k−1} A^j, Σ_{j=2}^{k−1} A^j, …, A^{k−1}, 0]. Here, I represents the n × n identity matrix, and 0 represents the n × n zero matrix. We call (A, e, k) the representation of the k-th order aggregated time series x̃. Clearly, A cannot be recovered by simply fitting a VAR model on x̃_t, as done in Granger causal analysis. Even with VARMA modeling, we are only guaranteed to identify A^k rather than the original A. In Section 3, we show under what conditions the causal relation A at the causal frequency can be identified from the aggregated time series x̃_{1:T}.
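The following sketch (ours, with an illustrative stable A) shows numerically that a naive VAR(1) fit on the aggregated series recovers neither A nor A^k exactly: the moving-average part of (4) is correlated with x̃_{t−1} and biases least squares.

```python
# A sketch (ours): naive VAR fitting on aggregated data is biased.
import numpy as np

rng = np.random.default_rng(2)
n, k, T = 2, 3, 60000
A = np.array([[0.4, 0.3], [-0.2, 0.5]])           # illustrative stable A
x = np.zeros((T, n))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.laplace(size=n)

x_agg = x.reshape(-1, k, n).mean(axis=1)          # k-th order aggregation

X0, X1 = x_agg[:-1], x_agg[1:]
A_naive = np.linalg.lstsq(X0, X1, rcond=None)[0].T  # naive VAR(1) fit

print(np.round(A_naive, 3))                       # biased estimate
print(np.round(np.linalg.matrix_power(A, k), 3))  # the true A^k
```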
2.2 WHEN k → ∞
Interestingly, causal discovery from aggregated data with a large aggregation factor k has a wide range of applications. For instance, in the stock market, the causal influences between stocks take place very quickly (as suggested by the efficient market hypothesis), but we usually work with low-frequency data such as daily returns. The daily return is the sum of the high-frequency returns within the same day, so discovering the causal interactions between stocks from their daily returns becomes a problem of causal discovery from aggregated data with a large k.
When the aggregation factor k is very large, e⃗_t becomes a mixture of numerous independent components. Fortunately, we can use a simple model to approximate the generating process of the aggregated data. From (1), we have

Σ_{i=1}^{k} x_{(t−1)k+i} = A Σ_{i=1}^{k} x_{(t−1)k+i−1} + Σ_{i=1}^{k} e_{(t−1)k+i},

that is,

x̃_t = A x̃_t + (1/k) A (x_{(t−1)k} − x_{tk}) + (1/k) Σ_{i=1}^{k} e_{(t−1)k+i}.

Denote by ē_t the last error term above, i.e., ē_t = (1/k) Σ_{i=1}^{k} e_{(t−1)k+i}. Note that ē_t has contemporaneously independent components. Since the boundary term (1/k) A (x_{(t−1)k} − x_{tk}) becomes negligible for a stable process, we have

x̃_t = A x̃_t + ē_t    (5)

as k → ∞. This is a linear instantaneous causal model for the components of x̃_t because the components of the total error term, ē_t, are still contemporaneously independent. When the error terms are non-Gaussian, it has the same form as the Linear, Non-Gaussian Model (LiNG) (Lacerda et al., 2008); when the causal relations are further assumed to be acyclic, it follows the form of the Linear, Non-Gaussian Acyclic Model (LiNGAM) (Shimizu et al., 2006). The difference is that in LiNG or LiNGAM, the self-loop influences, A_ii, are assumed to be zero. We will also investigate the identifiability of A in this case in Section 3.
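A quick numerical check of this approximation (our own, with an illustrative stable A): the gap between (I − A)x̃_t and ē_t shrinks relative to ē_t as k grows.

```python
# A numerical check (ours) of the large-k approximation (5).
import numpy as np

rng = np.random.default_rng(3)
n, T = 2, 200000
A = np.array([[0.4, 0.3], [-0.2, 0.5]])           # illustrative stable A
e = rng.laplace(size=(T, n))
x = np.zeros((T, n))
for t in range(1, T):
    x[t] = A @ x[t - 1] + e[t]

for k in (2, 10, 50):
    m = T // k
    x_agg = x[:m * k].reshape(m, k, n).mean(axis=1)[1:]   # drop first block
    e_bar = e[:m * k].reshape(m, k, n).mean(axis=1)[1:]
    resid = x_agg - x_agg @ A.T - e_bar                   # (I - A) x̃_t - ē_t
    print(k, np.std(resid) / np.std(e_bar))               # decreases with k
```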
3 IDENTIFIABILITY OF CAUSAL RELATIONS IN A
We investigate the identifiability of the high-resolution causal transition matrix A from the aggregated time series x̃_{1:T}. In other words, supposing x̃ also admits another representation (A′, e′, k), we aim to see whether it is always the case that A = A′ as the sample size T → ∞. If the noise terms follow the Gaussian distribution, A is usually not identifiable (Palm & Nijman, 1984). Recently, it has been shown that A is identifiable from subsampled time series if the noise terms are non-Gaussian (Gong et al., 2015). However, this does not imply the identifiability of A from aggregated time series: the latter is much more difficult to establish, as the aggregated model (3) has a more complicated structure. Here, we show that, in the exact model (3), A is identifiable from the aggregated data under appropriate conditions; furthermore, as k → ∞, the approximate model (5) holds, and A is partially identifiable from the aggregated data, with an identification procedure that is computationally much more efficient.
First, we will show that A^k can be identified by fitting the VARMA model (4). We make the following assumption.

A1. At least one of the τ-step (τ ≥ 2) delayed cross-covariance matrices of x̃_t, Σ_τ ≜ E[x̃_t x̃_{t−τ}^𝖳], is invertible.
Since ε_t is both temporally and contemporaneously independent, ε_t and ε_{t−1} are independent of x̃_{t−τ} for τ ≥ 2, which implies that E[ε_t x̃_{t−τ}^𝖳] = 0 and E[ε_{t−1} x̃_{t−τ}^𝖳] = 0. Multiplying both sides of (4) from the right by x̃_{t−τ}^𝖳 and taking the expectation, we have

Σ_τ = A^k Σ_{τ−1},  τ ≥ 2.    (6)

Under assumption A1, we can first see that A^k is identifiable: choosing τ ≥ 2 such that Σ_τ is invertible,

A^k = Σ_{τ+1} Σ_τ^{−1}.    (7)
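A moment-based sketch of (6)–(7) (our own implementation, with an illustrative A, k, and Laplace noise, assuming A1 holds for τ = 2):

```python
# A sketch (ours): recover A^k from lagged cross-covariances of x̃_t.
import numpy as np

rng = np.random.default_rng(4)
n, k, T = 2, 3, 300000
A = np.array([[0.4, 0.3], [-0.2, 0.5]])
x = np.zeros((T, n))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.laplace(size=n)
x_agg = x.reshape(-1, k, n).mean(axis=1)

def cross_cov(xs, tau):
    # Sample estimate of Sigma_tau = E[x̃_t x̃_{t-tau}^T] (zero-mean series).
    return (xs[tau:].T @ xs[:-tau]) / (len(xs) - tau)

tau = 2                                           # tau >= 2, as in A1
Ak_hat = cross_cov(x_agg, tau + 1) @ np.linalg.inv(cross_cov(x_agg, tau))
print(np.round(Ak_hat, 3))
print(np.round(np.linalg.matrix_power(A, k), 3))  # close to Ak_hat
```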
3.1 IDENTIFIABILITY WITH FINITE k
Substituting the estimated A^k into (3), one can then obtain e⃗_t = x̃_t − A^k x̃_{t−1}, which is a linear mixture of the (2k − 1) noise terms e_{tk}, e_{tk−1}, …, e_{(t−2)k+2}. In the following, we concentrate on the identifiability of A from e⃗.
Let

H ≜ (1/k) [I, I + A, …, Σ_{j=0}^{k−1} A^j, Σ_{j=1}^{k−1} A^j, …, A^{k−1}].    (8)

The error terms in (3) correspond to the following mixing procedure of random vectors:

e⃗ = He,  with e ≜ (e^{(0)𝖳}, e^{(1)𝖳}, …, e^{(2k−2)𝖳})^𝖳.    (9)

Here, e^{(l)} together with the time index t represents e_{tk−l}. The components of e are independent, and for each i, the e_i^{(l)}, l = 0, …, 2k − 2, have the same distribution p_{e_i}. Under the condition that p_{e_i} is non-Gaussian for each i, H can be estimated up to the permutation and scaling indeterminacies (including the sign indeterminacy) of its columns, as given in the following proposition.
Proposition 1
Suppose that all p_{e_i} are non-Gaussian. Given k and x̃_{1:T} generated according to (3), H can be determined up to permutation and scaling of its columns.
For the proof of Proposition 1, please refer to (Gong et al., 2015).
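The block structure of H can be made explicit in code. The sketch below (ours; the C_m blocks are our notation for the coefficient of e_{tk−m}, obtained by expanding the aggregated recursion) builds H from A and k and numerically verifies that x̃_t − A^k x̃_{t−1} = He on a simulated path:

```python
# A sketch (ours) of the mixing matrix H in (8)-(9), with a numerical check.
import numpy as np

rng = np.random.default_rng(5)
n, k = 2, 3
A = np.array([[0.4, 0.3], [-0.2, 0.5]])

pows = [np.linalg.matrix_power(A, j) for j in range(k)]
C = [sum(pows[:m + 1]) for m in range(k)] \
    + [sum(pows[m - k + 1:]) for m in range(k, 2 * k - 1)]
H = np.hstack(C) / k                          # n x n(2k-1) mixing matrix

T = 4 * k
e = rng.laplace(size=(T, n))
x = np.zeros((T, n))
for t in range(1, T):
    x[t] = A @ x[t - 1] + e[t]

t = 3                                         # a block index t (1-based)
x_t  = x[(t - 1) * k + 1: t * k + 1].mean(axis=0)         # x̃_t
x_tm = x[(t - 2) * k + 1: (t - 1) * k + 1].mean(axis=0)   # x̃_{t-1}
e_stack = np.concatenate([e[t * k - m] for m in range(2 * k - 1)])
print(np.allclose(x_t - np.linalg.matrix_power(A, k) @ x_tm, H @ e_stack))
```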
We make the following assumptions on the underlying dynamic process (1) and the distributions p_{e_i}, and then we have the identifiability result for the causal transition matrix A.

A2. The system is stable, in that all eigenvalues of A have modulus smaller than one.

A3. The distributions p_{e_i} are different for different i after re-scaling by any non-zero factor, their characteristic functions are all analytic (or they are all non-vanishing), and none of the characteristic functions has an exponential factor with a polynomial of degree at least 2.
The following identifiability result states that, in various situations, the matrix A of the original high-frequency data is fully identifiable.
Theorem 1
Suppose that all e_{it} are non-Gaussian, that the data x̃_t are generated by (3), and that x̃ also admits another k-th order aggregation representation (A′, e′, k). Let assumptions A1–A3 hold. When the number of observed data points T → ∞, the following statements are true.

(i) A′ can be represented as A′ − I = (A − I)D, where D is a diagonal matrix with 1 or −1 on its diagonal. If we constrain all the self influences, represented by the diagonal entries of A and A′, to be no greater than 1, then A′ = A.

(ii) If each p_{e_i} is asymmetric, we have A′ = A.
A complete proof of Theorem 1 can be found in the Appendix.
3.2 IDENTIFIABILITY AS k → ∞
We have shown that A is identifiable from the aggregated data (3) when k is finite. However, as k becomes larger, estimating A encounters more difficulty, because more independent components are involved in (9). When k = ∞, Proposition 1, and hence Theorem 1, need not hold, because e⃗ in (9) becomes a mixture of an infinite number of independent components.

Interestingly, as k → ∞, the observations x̃_t become i.i.d. and follow the instantaneous causal model (5). We will then answer the following two questions. In this case, can we still estimate A from aggregated data? If we can, is there an efficient procedure to do so?
Equation (5) implies (I − A)x̃_t = ē_t. That is, applying the linear transformation (I − A) to x̃_t produces independent components, namely the components of ē_t. This can be achieved by the independent component analysis (ICA) procedure (Hyvärinen et al., 2001), and (I − A) can be estimated up to row-scaling and permutation indeterminacies. We then have the following observations.

First, the diagonal entries of A, A_ii, which represent the self influences or "self-loops" of the time-delayed causal relations, cannot be determined (Lacerda et al., 2008). (Here we have assumed A_ii ≠ 1.) This is because the scale of each row of (I − A) is unknown, and so is (1 − A_ii).
Let D_A be the diagonal matrix with A_11, A_22, …, A_nn on its diagonal. Equation (5) is equivalent to

x̃_t = (I − D_A)^{−1}(A − D_A) x̃_t + (I − D_A)^{−1} ē_t,    (10)

in which the coefficient matrix (I − D_A)^{−1}(A − D_A) is free of self-loops; we denote it by A_NoSelfLoop.
Secondly, suppose there is no feedback loop between the processes after removing the self-loops, meaning that (A − D_A) can be permuted to a strictly lower-triangular matrix by equal row and column permutations. According to the LiNGAM model, which assumes there are no self-loops, (I − D_A)^{−1}(A − D_A) in (10) can be uniquely estimated (Shimizu et al., 2006). In other words, if one applies LiNGAM analysis to x̃_t, the estimated causal coefficient from the ith variable to the jth variable is actually (1 − A_jj)^{−1} A_ji. From this we can see whether A_ji is zero or not; furthermore, if the self-loops A_jj are given by prior knowledge, then A is fully identifiable, as illustrated in the sketch below.
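A small sketch (ours, with an illustrative acyclic A) of this relation: the LiNGAM-style coefficients equal (1 − A_jj)^{−1} A_ji, and A is recovered exactly once the self-loops are supplied.

```python
# A sketch (ours) of the self-loop rescaling in (10).
import numpy as np

A = np.array([[0.5, 0.0],
              [0.3, 0.4]])                    # illustrative: x1 -> x2
D_A = np.diag(np.diag(A))
A_noself = np.linalg.inv(np.eye(2) - D_A) @ (A - D_A)
print(A_noself)                  # entry (2,1) is 0.3 / (1 - 0.4) = 0.5

A_rec = D_A + (np.eye(2) - D_A) @ A_noself    # undo the row rescaling
print(np.allclose(A_rec, A))                  # True
```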
Thirdly, suppose there exist feedback loops between the processes after removing the self-loops. In this case, (A − D_A) cannot be permuted to a strictly lower-triangular matrix by equal row and column permutations. The identifiability of A in (5) has been studied by Lacerda et al. (2008): if the feedback loops are disjoint, there are in theory multiple solutions for A_NoSelfLoop, but the most stable solution (the one in which the product of the coefficients along each loop is minimized) is unique.
4 ESTIMATING THE CAUSAL RELATIONS FROM AGGREGATED DATA
In this section, we present the algorithm to estimate A from aggregated data with finite k. Clearly, the larger k is, the more difficult it is to estimate A from aggregated data. Therefore, when k is relatively large (say, larger than 6), we advocate the methods given in Section 3.2 to partially estimate A.
Since the identifiability of A from aggregated data relies on the non-Gaussianity of the error terms, we use Gaussian mixtures to represent their distributions. It is natural to estimate the parameters with the expectation-maximization (EM) algorithm, which, unfortunately, involves an intractably large number of Gaussian components in the posterior. To avoid this issue, we propose to use the stochastic approximation EM (SAEM) algorithm, a variant of EM, and further resort to the conditional particle filter with ancestor sampling (CPF-AS) to achieve computational efficiency.
4.1 STATE-SPACE MODELING
We can consider (3) as a special state-space model:

x̃_t = A^k x̃_{t−1} + H ẽ_t,    (11)

where ẽ_t ≜ (e_{tk}^𝖳, e_{tk−1}^𝖳, …, e_{(t−2)k+2}^𝖳)^𝖳 is the latent state stacking the 2k − 1 noise vectors entering (3); its first nk entries are the noise terms generated at block t, while the remaining n(k − 1) entries are copied over from ẽ_{t−1}. The noise terms e_{tk}, e_{tk−1}, …, e_{(t−1)k+1} share the same distribution for the same channel and are mutually independent. Since non-Gaussianity is essential to the identifiability of A, we use a Gaussian mixture model to represent each channel of the noise term e, i.e., p_{e_i}(e_i) = Σ_{c=1}^{m} w_{i,c} 𝒩(e_i | μ_{i,c}, σ²_{i,c}), where w_{i,c} ≥ 0 and Σ_{c=1}^{m} w_{i,c} = 1, for i = 1, …, n. Correspondingly, each channel of ẽ is also represented by a Gaussian mixture model.
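As a sketch of this per-channel noise model (the two-component parameters below are illustrative placeholders, not the paper's values), equal means with unequal variances give a symmetric but non-Gaussian density:

```python
# A sketch (ours) of sampling one noise channel from a Gaussian mixture.
import numpy as np

def sample_gm(w, mu, sigma, size, rng):
    """Draw from the 1-D Gaussian mixture sum_c w_c N(mu_c, sigma_c^2)."""
    c = rng.choice(len(w), size=size, p=w)
    return rng.normal(np.asarray(mu)[c], np.asarray(sigma)[c])

rng = np.random.default_rng(6)
e_i = sample_gm(w=[0.2, 0.8], mu=[0.0, 0.0], sigma=[1.0, 0.3],
                size=100000, rng=rng)
excess_kurtosis = ((e_i - e_i.mean())**4).mean() / e_i.var()**2 - 3.0
print(excess_kurtosis)            # clearly nonzero, hence non-Gaussian
```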
We aim to estimate the parameters A and the noise terms (if necessary) in the above state-space model. We introduce the additional latent variable z̃t = (z̃t,1, …, z̃t,nk)𝖳, in which z̃t,j ∈ {1, …, m}, to model the distribution of noise terms ẽt by Gaussian mixture models. The joint distribution of the state-space model (11) over both observed and unobserved variables is given by
pθ(x̃_{1:T}, ẽ_{1:T}, z̃_{1:T}) = Π_{t=1}^{T} pθ(x̃_t | x̃_{t−1}, ẽ_t) pθ(ẽ_t | ẽ_{t−1}, z̃_t) pθ(z̃_t).    (12)

The distributions in (12) are specified as follows:

pθ(z̃_t) = Π_{j=1}^{nk} π̃_{j, z̃_{t,j}},    (13a)

pθ(ẽ_t | ẽ_{t−1}, z̃_t) = 𝒩(ẽ_t | F ẽ_{t−1} + μ̃_t, Σ̃_t),    (13b)

pθ(x̃_t | x̃_{t−1}, ẽ_t) = 𝒩(x̃_t | A^k x̃_{t−1} + H ẽ_t, Λ),    (13c)

where F is the shift matrix that copies the noise terms of block t − 1 into the last n(k − 1) entries of ẽ_t. Since there are no additional additive noise terms in the model, we fix Λ to a small value in our estimation algorithm for regularization. μ̃_t is the conditional mean of the innovation of ẽ_t, i.e., μ̃_t = [μ̃_{1,z̃_{t,1}}, …, μ̃_{nk,z̃_{t,nk}}, 0_{1×n(k−1)}]^𝖳. Σ̃_t is a diagonal matrix containing the conditional variance parameters of ẽ_t, i.e., Σ̃_t = diag(σ̃²_{1,z̃_{t,1}}, …, σ̃²_{nk,z̃_{t,nk}}, 0, …, 0). According to the structure of ẽ, the parameters π̃_{j,z̃_{t,j}}, μ̃_{j,z̃_{t,j}}, and σ̃_{j,z̃_{t,j}} are tied to the parameters of e, i.e., π̃_{i+nl,c} = w_{i,c}, μ̃_{i+nl,c} = μ_{i,c}, and σ̃_{i+nl,c} = σ_{i,c}, for i = 1, …, n, l = 0, …, k − 1, and c = 1, …, m.
4.2 STOCHASTIC APPROXIMATION EM
The expectation-maximization (EM) algorithm is usually adopted to find the maximum likelihood estimate of the parameters in a probabilistic model with unobserved variables. We can estimate the parameters θ = (A, w_{i,c}, μ_{i,c}, σ_{i,c}) in (12) using the EM algorithm, which iteratively maximizes a lower bound of the marginal log-likelihood log pθ(x̃_{1:T}) = log Σ_{z̃_{1:T}} ∫ pθ(x̃_{1:T}, ẽ_{1:T}, z̃_{1:T}) dẽ_{1:T}. In the E-step of the k-th iteration, given the parameters θ_{k−1} estimated in the (k − 1)-th iteration, the EM algorithm first computes the posterior distribution pθ_{k−1}(z̃_{1:T}, ẽ_{1:T} | x̃_{1:T}) and then the lower bound 𝒬(θ, θ_{k−1}) = Σ_{z̃_{1:T}} ∫ pθ_{k−1}(z̃_{1:T}, ẽ_{1:T} | x̃_{1:T}) log pθ(x̃_{1:T}, ẽ_{1:T}, z̃_{1:T}) dẽ_{1:T}. In the M-step, the parameters are updated as θ_k = arg max_θ 𝒬(θ, θ_{k−1}).
However, we note that the number of Gaussian mixture components in the posterior distribution grows exponentially with the dimension of the time series n, the aggregation factor k, and the duration of the time series T. Therefore, computing the exact posterior pθ_{k−1}(z̃_{1:T}, ẽ_{1:T} | x̃_{1:T}) and 𝒬(θ, θ_{k−1}) is intractable in this situation. A possible solution is to adopt the Monte Carlo EM (MCEM) algorithm (Wei & Tanner, 1990), which approximately calculates 𝒬(θ, θ_{k−1}) using samples drawn from the posterior distribution pθ_{k−1}(z̃_{1:T}, ẽ_{1:T} | x̃_{1:T}). However, MCEM makes inefficient use of the generated samples, as it discards the samples generated in previous EM iterations. Therefore, a large number of sample points are required for each iteration, which is computationally expensive when the sampling method is complex.
To reduce the number of simulated sample points, we propose to use the stochastic approximation EM (SAEM) algorithm (Delyon et al., 1999), which requires only a single realization of the unobserved variables at each iteration. At the k-th iteration, the E-step and M-step are replaced by the following:

- E-step: Generate a single sample point z̃_{1:T}[k] from the posterior pθ_{k−1}(z̃_{1:T} | x̃_{1:T}), and compute

𝒬̂_k(θ) = (1 − γ_k) 𝒬̂_{k−1}(θ) + γ_k E[log pθ(x̃_{1:T}, ẽ_{1:T}, z̃_{1:T}[k]) | z̃_{1:T}[k], x̃_{1:T}].    (14)

- M-step: Update the parameters by θ_k = arg max_θ 𝒬̂_k(θ).

In (14), {γ_k} is a sequence of decreasing step sizes satisfying Σ_k γ_k = ∞ and Σ_k γ_k² < ∞. Here we use Rao-Blackwellization (Svensson et al., 2014) to avoid sampling ẽ_{1:T}, because it can be integrated out analytically. It has been shown in (Delyon et al., 1999) that the resulting sequence {θ_k}_{k≥1} converges to a stationary point of pθ(x̃_{1:T}) under weak assumptions.
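The SAEM recursion is easiest to see on a toy model with a closed-form MLE (our example, not the model above): latent z_i ~ 𝒩(μ, 1) and observation x_i = z_i + ε_i with ε_i ~ 𝒩(0, 1), so the exact MLE is μ̂ = mean(x). One posterior draw per iteration suffices:

```python
# A toy SAEM loop (ours) with a valid step-size schedule.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(1.5, np.sqrt(2.0), size=2000)

mu, S = 0.0, 0.0
for it in range(1, 201):
    gamma = 1.0 if it <= 50 else 1.0 / (it - 50)  # sum = inf, sum of squares < inf
    z = rng.normal((x + mu) / 2.0, np.sqrt(0.5))  # one draw from p(z | x, mu)
    S = (1 - gamma) * S + gamma * z.mean()        # stochastic approximation
    mu = S                                        # M-step in closed form
print(mu, x.mean())                               # SAEM estimate vs. exact MLE
```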
4.3 CONDITIONAL PARTICLE FILTER WITH ANCESTOR SAMPLING
In our model, sampling from the posterior pθ_{k−1}(z̃_{1:T} | x̃_{1:T}) is usually performed with a forward-filter/backward-simulator particle smoother, which typically requires a large number of particles to generate a smooth backward trajectory z̃_{1:T}[k]. To reduce the number of required particles, we use the Markovian version of SAEM (Kuhn & Lavielle, 2004), which samples from a Markov kernel ℳ_{θ_{k−1}} that leaves the posterior distribution invariant. Specifically, letting z̃_{1:T}[k − 1] be the previous draw from the Markov kernel, the current state is sampled as z̃_{1:T}[k] ~ ℳ_{θ_{k−1}}(· | z̃_{1:T}[k − 1]). Following (Lindsten, 2013; Svensson et al., 2014), we construct the Markov kernel using the Rao-Blackwellized conditional particle filter with ancestor sampling (RB-CPF-AS) (Lindsten et al., 2014), which was originally proposed for Gibbs sampling.

The machinery inside RB-CPF-AS resembles a standard particle filter, with two main differences: one particle trajectory is deterministically set to a reference trajectory z̃_{1:T}[k − 1], and the ancestors of the reference trajectory are randomly chosen and stored during the algorithm's execution. Algorithm 1 gives a brief description of the RB-CPF-AS algorithm. Let {z̃_{1:t−1}^i, w_{t−1}^i}_{i=1}^{N} be the weighted particle system approximating pθ(z̃_{1:t−1} | x̃_{1:t−1}); RB-CPF-AS propagates this sample to time t by introducing the auxiliary variables a_t^i, referred to as ancestor indices. To generate z̃_t^i for the first N − 1 particle trajectories, we first sample the ancestor index a_t^i with P(a_t^i = j) ∝ w_{t−1}^j, and then sample z̃_t^i from the proposal distribution. The first N − 1 trajectories are then augmented as z̃_{1:t}^i = (z̃_{1:t−1}^{a_t^i}, z̃_t^i). The N-th particle is set to the reference particle, z̃_t^N = z̃_t[k − 1], and its ancestor index is sampled according to
(15)

where

(16)
Conditioned on z̃_{1:T}, the conditional distributions of ẽ_t can be computed with the Kalman filter and the Rauch-Tung-Striebel (RTS) smoother. The filtering, prediction, and smoothing PDFs are

(17a)

(17b)

(17c)

respectively. In (16),

(18a)

(18b)

where the required quantities are given by (19a)–(19g). With the terminal condition Ω_T = 0, they can be computed recursively backward in time using (19a)–(19g). Once all the ancestors have been sampled, we can calculate the new particle weights as follows:

(20)
After all the particle trajectories have been generated, we obtain z̃1:T[k] by sampling from these trajectories according to the weights at time T.
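To convey the mechanics of CPF-AS, here is a self-contained sketch (ours) for a one-dimensional linear-Gaussian model x_t = a x_{t−1} + v_t, y_t = x_t + w_t; the paper's RB-CPF-AS additionally Rao-Blackwellizes the continuous state with per-particle Kalman filters.

```python
# A minimal CPF-AS sketch (ours) on a toy linear-Gaussian state-space model.
import numpy as np

def cpf_as(y, x_ref, N, a, q, r, rng):
    T = len(y)
    X = np.zeros((T, N))                          # particle trajectories
    X[0, :N - 1] = rng.normal(0.0, np.sqrt(q), N - 1)
    X[0, N - 1] = x_ref[0]                        # reference particle
    logw = -(y[0] - X[0])**2 / (2 * r)
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        anc = rng.choice(N, size=N - 1, p=w)      # resample ancestors
        X[t, :N - 1] = a * X[t - 1, anc] + rng.normal(0.0, np.sqrt(q), N - 1)
        X[t, N - 1] = x_ref[t]                    # keep the reference path
        # Ancestor sampling: reconnect the reference to a compatible history.
        logw_as = np.log(w + 1e-300) - (x_ref[t] - a * X[t - 1])**2 / (2 * q)
        w_as = np.exp(logw_as - logw_as.max()); w_as /= w_as.sum()
        anc_all = np.concatenate([anc, [rng.choice(N, p=w_as)]])
        X[:t] = X[:t, anc_all]                    # update the genealogies
        logw = -(y[t] - X[t])**2 / (2 * r)
    w = np.exp(logw - logw.max()); w /= w.sum()
    return X[:, rng.choice(N, p=w)].copy()        # draw one trajectory

rng = np.random.default_rng(8)
a, q, r, T = 0.9, 0.5, 0.5, 100
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(0.0, np.sqrt(q))
y = x + rng.normal(0.0, np.sqrt(r), T)

traj = np.zeros(T)                   # arbitrary initial reference trajectory
for sweep in range(20):              # iterating the kernel leaves the
    traj = cpf_as(y, traj, N=20, a=a, q=q, r=r, rng=rng)  # posterior invariant
```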
4.4 PARAMETER UPDATE
At the k-th M-step, given the sample z̃_{1:T}[k] drawn by RB-CPF-AS, we can obtain pθ_{k−1}(ẽ_t | z̃_{1:T}[k], x̃_{1:T}) = 𝒩(ẽ_t | μ̂_{s,t}, Σ̂_{s,t}) using the RTS smoother. Then we have

(21)

where z̃_t = z̃_t[k], ỹ_t = x̃_t − A^k x̃_{t−1}, and q(ẽ_t) = pθ_{k−1}(ẽ_t | z̃_{1:T}, x̃_{1:T}).

It can be seen that we only need the sufficient statistics ∫ ẽ_t q(ẽ_t) dẽ_t and ∫ ẽ_t ẽ_t^𝖳 q(ẽ_t) dẽ_t to maximize (21). Denoting the sufficient statistics at the k-th iteration by S_k, we use the stochastic approximation 𝕊_k = (1 − γ_k) 𝕊_{k−1} + γ_k S_k in maximizing 𝒬̂_k(θ). To maximize 𝒬̂_k(θ) with respect to A, we compute the gradient with respect to A of the terms involving A^k and H and apply a conjugate gradient method, as done in Gong et al. (2015); a toy version of such a gradient computation is sketched below.
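As an illustration of such a gradient (our example: the simpler scalar loss f(A) = ‖A^k − M‖_F² in place of 𝒬̂_k), the chain rule through A^k gives ∇_A f = 2 Σ_{j=0}^{k−1} (A^𝖳)^j (A^k − M)(A^𝖳)^{k−1−j}, which the sketch below checks against finite differences:

```python
# A gradient-through-A^k sketch (ours), verified by finite differences.
import numpy as np

def grad_f(A, M, k):
    G = np.linalg.matrix_power(A, k) - M
    return 2 * sum(np.linalg.matrix_power(A.T, j) @ G
                   @ np.linalg.matrix_power(A.T, k - 1 - j) for j in range(k))

rng = np.random.default_rng(9)
A = rng.uniform(-0.5, 0.5, (2, 2))
M = rng.uniform(-0.5, 0.5, (2, 2))
k, eps = 3, 1e-6

f = lambda B: np.sum((np.linalg.matrix_power(B, k) - M)**2)
Ap = A.copy(); Ap[0, 1] += eps
print(grad_f(A, M, k)[0, 1], (f(Ap) - f(A)) / eps)   # the two values agree
```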
Algorithm 1. RB-CPF-AS

Input: reference trajectory z̃_{1:T}[k − 1], θ = θ_{k−1}
Output: z̃_{1:T}[k] ~ ℳ_{θ_{k−1}}(· | z̃_{1:T}[k − 1])

Compute the backward statistics according to (19a)–(19g)
Draw the initial particles z̃_1^i for i = 1, …, N − 1, and set z̃_1^N = z̃_1[k − 1]
Compute the initial Kalman statistics and weights w_1^i for i = 1, …, N
for t = 2 to T do
  // Resampling and ancestor sampling
  Draw the ancestor indices a_t^i with P(a_t^i = j) ∝ w_{t−1}^j for i = 1, …, N − 1
  Compute the quantities in (18a) and (18b)
  Draw the ancestor index a_t^N of the reference particle according to (15)
  // Particle propagation
  Draw z̃_t^i and set z̃_{1:t}^i = (z̃_{1:t−1}^{a_t^i}, z̃_t^i) for i = 1, …, N − 1
  Set z̃_t^N = z̃_t[k − 1] and z̃_{1:t}^N = (z̃_{1:t−1}^{a_t^N}, z̃_t^N)
  // Weighting
  Update the Kalman filter statistics and compute the weights w_t^i according to (20)
end for
Draw J with P(J = i) ∝ w_T^i and set z̃_{1:T}[k] = z̃_{1:T}^J.
5 EXPERIMENTS
In this section, we conduct empirical studies of the two estimation methods presented in Section 3.2 and Section 4 on both synthetic and real data to show their effectiveness.
5.1 SIMULATED DATA
We conduct a series of simulations to investigate the effectiveness of the proposed estimation methods. Following (Gong et al., 2015), we generated the data at the causal frequency using the VAR model (1) with a randomly generated matrix A and independent Gaussian mixture noises e_t. The elements of A were drawn from a uniform distribution 𝒰(−0.5, 0.5). The Gaussian mixture model contains two components for each channel. The mixing weights were w_{1,1} = 0.2, w_{1,2} = 0.8, w_{2,1} = 0.3, w_{2,2} = 0.7, the means were μ_{i,1} = μ_{i,2} = 0, and the two components had different variances σ²_{i,1} ≠ σ²_{i,2}. Low-resolution observations were obtained by aggregating the high-resolution data with aggregation factor k. Similarly, we also generated data with Gaussian noise (by setting σ²_{i,1} = σ²_{i,2}) for comparison of the different methods. We tested data with dimension n = 2, aggregation factor k = 2 and 3, and sample size T = 150 and 300, respectively. For comparison, we replaced the Gaussian mixture models in our method with Gaussian noise models, yielding a Gaussian baseline. We denote the method proposed in Section 4, which performs causal discovery from temporally aggregated data, by CDTAfinite, and its Gaussian counterpart by CDTAGauss. We also compare with the NG-EM method (Gong et al., 2015) on the aggregated data with non-Gaussian noises. Each experiment was repeated 10 times; a sketch of this data-generating process is given below.
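```python
# A sketch of this data-generating process; the mixture standard deviations
# are our placeholders, since their exact values are not recoverable here.
import numpy as np

rng = np.random.default_rng(10)
n, k, T_low = 2, 2, 300
w = np.array([[0.2, 0.8], [0.3, 0.7]])        # mixing weights per channel
sigma = np.array([[1.0, 0.2], [1.0, 0.2]])    # placeholder std. deviations
A = rng.uniform(-0.5, 0.5, size=(n, n))       # entries from U(-0.5, 0.5)

T = k * T_low
x = np.zeros((T, n))
for t in range(1, T):
    c = np.array([rng.choice(2, p=w[i]) for i in range(n)])
    e_t = rng.normal(0.0, sigma[np.arange(n), c])   # zero-mean GM noise
    x[t] = A @ x[t - 1] + e_t

x_agg = x.reshape(T_low, k, n).mean(axis=1)   # low-resolution observations
```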
Table 1 shows the mean squared error (MSE) of the estimated causal transition matrix A. As the sample size T increases, both the proposed CDTAfinite and the baseline CDTAGauss obtain smaller estimation errors. On the non-Gaussian data, CDTAGauss produces much higher errors than CDTAfinite. On the Gaussian data, neither CDTAfinite nor CDTAGauss obtains accurate estimates, because the estimation algorithms can converge to many solutions with the same marginal likelihood when the noise is Gaussian or a Gaussian noise model is assumed. The results are consistent with the theory: the causal relations are in general not uniquely determined under Gaussian noise models. NG-EM fails on the aggregated data because it was designed for subsampled rather than aggregated data.
Table 1. Mean squared error (MSE) of the estimated causal transition matrix A ("NG" denotes non-Gaussian noise, "G" Gaussian noise).

| Methods | NG, k=2, T=150 | NG, k=2, T=300 | NG, k=3, T=150 | NG, k=3, T=300 | G, k=2, T=150 | G, k=2, T=300 | G, k=3, T=150 | G, k=3, T=300 |
|---|---|---|---|---|---|---|---|---|
| CDTAfinite | 2.10e-4 | 1.19e-4 | 8.17e-4 | 7.36e-4 | 1.42e-2 | 3.67e-3 | 7.63e-3 | 9.69e-3 |
| CDTAGauss | 1.28e-2 | 4.49e-3 | 1.20e-2 | 7.22e-3 | 1.13e-2 | 3.08e-3 | 6.26e-2 | 9.07e-3 |
| NG-EM | 8.75e-2 | 8.51e-2 | 5.27e-1 | 1.88e-1 | - | - | - | - |
Further, we examined the performance of the method described in Section 3.2, denoted CDTAinfty, with finite k values. To do so, we generated aggregated data with a fixed transition matrix A, aggregation factors k = 2, 3, 4, 10, and the same Gaussian mixture noise parameters described above; the true A_NoSelfLoop can then be computed from A. Using the linear instantaneous non-Gaussian model, we obtained estimates of A_NoSelfLoop from the aggregated data. The results for k = 2, 3, 4, 10 are given as follows:

(22)

It appears that when k ≥ 4, the linear instantaneous non-Gaussian causal model (10), which assumes that there is no self-loop, can estimate the corresponding A_NoSelfLoop accurately and very efficiently, at the cost of losing the self-loops in the original process. The self-loops, in contrast, can be estimated with CDTAfinite when k is reasonably small. As a cautionary note, the parameters produced by linear instantaneous causal models, which assume no self-loops, should be interpreted carefully: such models estimate (1 − A_jj)^{−1} A_ji, whose magnitude can be very different from that of the true causal parameter A_ji.
5.2 REAL DATA
We conducted experiments on the Temperature Ozone data (Mooij et al., 2016) and the macroeconomic data used in (Moneta, 2008). Both time series are collected by averaging records over specified time intervals. For example, the Temperature Ozone data contain daily mean values of ozone and temperature for the year 2009 in Chaumont, Switzerland. The macroeconomic data contain quarterly U.S. macro variables for the period 1947:2 to 1994:1.
Temperature Ozone
The Temperature Ozone data is the 50th cause-effect pair from the repository at https://webdav.tuebingen.mpg.de/cause-effect/. The data contain records of ozone density X and daily mean temperature Y, and the ground-truth causal relation is Y → X. We first applied CDTAinfty to the data; the resulting estimate of A_NoSelfLoop indicates that instantaneous effects exist in both directions. This could be caused by aggregation with a small k, for which the estimated A_NoSelfLoop is likely to be inaccurate. We then estimated the causal transition matrix A with CDTAfinite for k = 1, 2, 3; the estimated matrices sensibly captured the self-influences and cross-influences between the ozone and temperature processes.
Macroeconomic Data
The data are quarterly U.S. observations of aggregated real macroeconomic variables. Here we consider the causal relations between two variables: real balances X and price inflation Y. X denotes the logarithm of per capita M2 minus the logarithm of the implicit price deflator; Y is the log of the implicit price deflator at time t minus the log of the implicit price deflator at time t − 1. Again, we first applied CDTAinfty to find a rough estimate of the causal relations excluding self-loops. The estimated A_NoSelfLoop indicates that no influence from effect to cause can be detected in the instantaneous dependencies, which is consistent with the ground truth. We also employed CDTAfinite to estimate the complete causal transition matrix A for k = 1, 2, 3, 4. The estimated A gives weaker responses from effect Y to cause X as k increases. If we take k = 4 as the aggregation factor, the A_NoSelfLoop computed from the estimated A is close to the estimate obtained by CDTAinfty.
6 CONCLUSION
In this paper, we have investigated the problem of discovering high-frequency causal relations from temporally aggregated time series. When the aggregation factor is finite, we proved that the causal relations are fully identifiable if the underlying causal relations are linear, the noise process is non-Gaussian, and some technical conditions hold. We also showed that the causal matrix with self-loops removed is identifiable from the instantaneous dependencies as the aggregation factor goes to infinity. Based on these results, we proposed an algorithm to recover the complete causal matrix when the aggregation factor is relatively small, and a very efficient algorithm to partially recover the matrix when the aggregation factor is relatively large. Future work will focus on automatically estimating the aggregation factor k from data.
Acknowledgments
The authors would like to thank Dr. Tongliang Liu for helpful discussions. DT and MG would like to acknowledge the support from DP-140102164, FT-130101457, and LP-150100671. CG and KZ would like to acknowledge the support from NIH-1R01EB022858-01 FAIN-R01EB022858, NIH-1R01LM012087, and NIH-5U54HG008540-02 FAINU54HG008540.
APPENDIX
PROOF OF THEOREM 1
Proof
Here we consider the limit when T → ∞. According to the identifiability result (7) for A^k, we have

A^k = A′^k.    (23)
We then consider the remaining error term, e⃗_t. The corresponding random vector e⃗ follows both the representation (9) and

e⃗ = H′e′,    (24)

with H′ defined as in (8) but with A replaced by A′, and

e′ ≜ (e′^{(0)𝖳}, e′^{(1)𝖳}, …, e′^{(2k−2)𝖳})^𝖳,    (25)

where for each i, the e′^{(l)}_i, l = 0, …, 2k − 2, have the same distribution p_{e′_i}.
According to Proposition 1, each column of H′ is a scaled version of a column of H. Denote by H_{ln+i}, l = 0, …, 2k − 2; i = 1, …, n, the (ln + i)-th column of H, and similarly for H′. According to the Uniqueness Theorem in Eriksson & Koivunen (2004), we know that under condition A3, for each i there exists one and only one j such that the distribution of the e_i^{(l)}, l = 0, …, 2k − 2 (which have the same distribution), is the same as the distribution of the e′^{(l)}_j, l = 0, …, 2k − 2, up to changes of location and scale. As a consequence, the columns {H′_{ln+j} | l = 0, …, 2k − 2} correspond to {H_{ln+i} | l = 0, …, 2k − 2} up to the permutation and scaling arbitrariness.
According to the structure of H, for all m ≤ k − 2 we have H_{(k−1)n+i} = H_{mn+i} + H_{(m+k)n+i}, and similarly for H′. Hence H_{(k−1)n+i} is proportional to H′_{(k−1)n+j}; write H_{ln+i} = λ_{li} H′_{ln+j} for the corresponding scale factors. Assuming that A^{k−1} and A′^{k−1} are non-diagonal matrices, and since H_i and H′_j must be proportional to columns of I, as implied by the structure of H and H′, we can see that λ_{0i} = 1 and that i = j. Consequently, λ_{(k−1)i} must be 1 or −1. Let B = I + A + ⋯ + A^{k−1} and B′ = I + A′ + ⋯ + A′^{k−1}; we thus have B = B′D, where D is a diagonal matrix with 1 or −1 as its diagonal entries. Moreover, because AB − B = A^k − I, A′B′ − B′ = A′^k − I, and A^k = A′^k, we have

A′ − I = (A − I)D.    (26)
If both A′ and A have diagonal entries which are smaller than 1, D must be the identity matrix, i.e., A′ = A. Therefore statement (i) is true.
If each p_{e_i} is asymmetric, e_i and −e_i have different distributions. Consequently, the representation (24) no longer holds if one changes the signs of a subset, but not all, of the non-zero columns H′_{ln+i}. This implies that for non-zero H_{ln+i} the scale factors λ_{li}, including λ_{0i}, have the same sign, and they therefore all equal 1 since λ_{0i} = 1. In particular λ_{(k−1)i} = 1, which leads to D = I and thus gives A′ = A. That is, statement (ii) is true.
References
- Boot JCG, Feibes W, Lisman JHC. Further methods of derivation of quarterly figures from annual data. Applied Statistics. 1967:65–75.
- Breitung J, Swanson NR. Temporal aggregation and spurious instantaneous causality in multiple time series models. Journal of Time Series Analysis. 2002;23:651–665.
- Danks D, Plis S. Learning causal structure from undersampled time series. In: JMLR: Workshop and Conference Proceedings; 2013.
- Delyon B, Lavielle M, Moulines E. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics. 1999:94–128.
- Eriksson J, Koivunen V. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters. 2004;11(7):601–604.
- Geiger P, Zhang K, Gong M, Schölkopf B, Janzing D. Causal inference by identification of vector autoregressive processes with hidden components. In: 32nd International Conference on Machine Learning; 2015. pp. 1917–1925.
- Ghysels E, Hill JB, Motegi K. Testing for Granger causality with mixed frequency data. Journal of Econometrics. 2016;192(1):207–230.
- Gong M, Zhang K, Schölkopf B, Tao D, Geiger P. Discovering temporal causal relations from subsampled data. In: ICML; 2015. pp. 1898–1906.
- Granger CWJ. Investigating causal relations by econometric models and cross-spectral methods. Econometrica. 1969:424–438.
- Granger CWJ. Implications of aggregation with common factors. Econometric Theory. 1987;3(02):208–222.
- Harvey AC, Chung CH. Estimating the underlying change in unemployment in the UK. Journal of the Royal Statistical Society, Series A. 2000;163:303–309.
- Hyttinen A, Plis S, Järvisalo M, Eberhardt F, Danks D. Causal discovery from subsampled time series data by constraint optimization. In: International Conference on Probabilistic Graphical Models; 2016. pp. 216–227.
- Hyvärinen A, Karhunen J, Oja E. Independent Component Analysis. John Wiley & Sons; 2001.
- Kuhn E, Lavielle M. Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM: Probability and Statistics. 2004;8:115–131.
- Lacerda G, Spirtes P, Ramsey J, Hoyer PO. Discovering cyclic causal models by independent components analysis. In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI 2008); Helsinki, Finland; 2008.
- Lindsten F. An efficient stochastic approximation EM algorithm using conditional particle filters. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2013. pp. 6274–6278.
- Lindsten F, Jordan MI, Schön TB. Particle Gibbs with ancestor sampling. Journal of Machine Learning Research. 2014;15(1):2145–2184.
- Marcellino M. Some consequences of temporal aggregation in empirical analysis. Journal of Business and Economic Statistics. 1999;17:129–136.
- Moauro F, Savio G. Temporal disaggregation using multivariate structural time series models. The Econometrics Journal. 2005;8:210–234.
- Moneta A. Graphical causal models and VARs: an empirical assessment of the real business cycles hypothesis. Empirical Economics. 2008;35(2):275–300.
- Mooij JM, Peters J, Janzing D, Zscheischler J, Schölkopf B. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research. 2016;17(32):1–102.
- Palm FC, Nijman TE. Missing observations in the dynamic regression model. Econometrica. 1984;52:1415–1435.
- Pearl J. Causality: Models, Reasoning, and Inference. Cambridge University Press; 2000.
- Plis S, Danks D, Freeman C, Calhoun V. Rate-agnostic (causal) structure learning. In: Advances in Neural Information Processing Systems; 2015a. pp. 3303–3311.
- Plis S, Danks D, Yang J. Mesochronal structure learning. In: Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence; 2015b.
- Proietti T. Temporal disaggregation by state space methods: dynamic regression methods revisited. The Econometrics Journal. 2006;9:357–372.
- Rajaguru G, Abeysinghe T. Temporal aggregation, cointegration and causality inference. Economics Letters. 2008;101:223–226.
- Shimizu S, Hoyer PO, Hyvärinen A, Kerminen AJ. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research. 2006;7:2003–2030.
- Silvestrini A, Veredas D. Temporal aggregation of univariate and multivariate time series models: a survey. Journal of Economic Surveys. 2008;22:458–497.
- Sims CA. Macroeconomics and reality. Econometrica. 1980;48:1–48.
- Spirtes P, Glymour C, Scheines R. Causation, Prediction, and Search. 2nd ed. MIT Press; 2001.
- Stram DO, Wei WWS. A methodological note on the disaggregation of time series totals. Journal of Time Series Analysis. 1986;7(4):293–302.
- Svensson A, Schön TB, Lindsten F. Identification of jump Markov linear models using particle filters. In: 53rd IEEE Conference on Decision and Control (CDC); 2014. pp. 6504–6509.
- Tank A, Fox EB, Shojaie A. Identifiability and estimation of structural vector autoregressive models for subsampled and mixed frequency time series. arXiv preprint arXiv:1704.02519; 2017.
- Tiao GC. Asymptotic behaviour of temporal aggregates of time series. Biometrika. 1972:525–531.
- Van Nes EH, Scheffer M, Brovkin V, Lenton TM, Ye H, Deyle E, Sugihara G. Causal feedbacks in climate change. Nature Climate Change. 2015;5(5):445–448.
- Wei GCG, Tanner MA. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association. 1990;85(411):699–704.
- Weiss A. Systematic sampling and temporal aggregation in time series models. Journal of Econometrics. 1984;26:271–281.
- Zhou D, Zhang Y, Xiao Y, Cai D. Analysis of sampling artifacts on the Granger causality analysis for topology extraction of neuronal dynamics. Frontiers in Computational Neuroscience. 2014;8. doi: 10.3389/fncom.2014.00075.