Published in final edited form as: Proc IEEE Int Conf Data Min. 2017 Dec 18;2017:913–918. doi: 10.1109/ICDM.2017.114

Behind Distribution Shift: Mining Driving Forces of Changes and Causal Arrows

Biwei Huang, Kun Zhang, Jiji Zhang, Ruben Sanchez-Romero, Clark Glymour, Bernhard Schölkopf

Abstract

We address two important issues in causal discovery from nonstationary or heterogeneous data, where parameters associated with a causal structure may change over time or across data sets. First, we investigate how to efficiently estimate the “driving force” of the nonstationarity of a causal mechanism. That is, given a causal mechanism that varies over time or across data sets and whose qualitative structure is known, we aim to extract from data a low-dimensional and interpretable representation of the main components of the changes. For this purpose we develop a novel kernel embedding of nonstationary conditional distributions that does not rely on sliding windows. Second, the embedding also leads to a measure of dependence between the changes of causal modules that can be used to determine the directions of many causal arrows. We demonstrate the power of our methods with experiments on both synthetic and real data.

I. INTRODUCTION

A fundamental problem in science and engineering is to discover and make use of causal relations among variables of interest. The standard way to establish causal relations relies on interventions or randomized experiments, which, however, are often difficult or even impossible to conduct. Consequently, inferring causal relations from observational data, or from combinations of observational and experimental data, known as causal discovery [9], [6], has drawn much attention in several disciplines over the past three decades.

Most causal discovery methods assume that there is a fixed causal model underlying the observed data and aim to estimate it from the data. In this setting, constraint-based causal discovery methods [9], [6] make use of conditional independence relationships among variables to infer the equivalence class of the underlying causal structure. With the rapid accumulation of huge volumes of data of various types, collected data often exhibit distribution shift, which can occur across data sets or over time. From a causal standpoint, the shift in the joint distribution of the data may result from changes in just a few local causal mechanisms or modules because of varied background variables or experimental conditions, while a large portion of the data-generating process remains the same.

As illustrated in [11], applying causal discovery methods that assume a fixed causal model to data with distribution shift may lead to spurious extra causal edges; accordingly, it is desirable to develop causal analysis methods specifically for such data. A procedure was proposed in [11] that asymptotically correctly recovers the skeleton of the causal structure over the observed variables and locates the changing causal modules. In this paper we build on that work and aim to address two further problems that arise after the skeleton of the causal structure is learned.

  1. How to efficiently estimate the nonstationary “driving force” of a causal mechanism that changes over time or across data sets? An interpretable representation of the main components of the nonstationarity will greatly enhance understanding of the data generating process.

  2. How to make use of distribution shifts to determine causal directions in a system with an arbitrary number of variables? Such a method will supplement the classic Meek orientation rules [5] to derive more causal information from nonstationary/heterogeneous data.

Both problems are essential components of our causal analysis framework for nonstationary/heterogeneous data. Regarding problem 1, traditionally one may use Bayesian change point detection to detect change points in observed time series [1], or one may use sliding-window-based methods. However, Bayesian change point detection only detects changes in marginal distributions, whereas causal mechanisms are represented by conditional distributions; moreover, neither approach is appropriate when the causal mechanisms change continuously over time. The method proposed in [4] can automatically learn how the causal model changes over time, but it requires the assumption of linearity and fails to handle cases where the nonstationarity results from changes in the influence of the noise. Problem 2, as a sub-problem of causal discovery, exploits a generalized notion of the invariance property [10] or the exogeneity property [2], [13] of causal systems: if there is no nonstationary confounder for Vi and Vj, then the causal mechanisms, represented by the conditional distributions P(Vi | PAi) and P(Vj | PAj), change independently over time or across data sets.

The paper is organized as follows. After reviewing the procedure of causal skeleton discovery in the case of distribution shift in Section II, we present our solution to problem 1 in Section III, in which we assume that the direct causes of the considered variable are known. Then we address problem 2 in Section IV. We discuss problem 1 before problem 2, since our method for dealing with problem 2 takes advantage of the technical results derived for solving problem 1. Section V gives the experimental results on both synthetic and real data.

II. CAUSAL SKELETON DISCOVERY FROM NONSTATIONARY/HETEROGENEOUS DATA

Suppose that the underlying causal structure over variables $\mathbf{V} = \{V_i\}_{i=1}^{m}$ is represented by a DAG G. For each Vi, let PAi denote the set of parents of Vi in G. Suppose that at each time point or in each domain, the joint probability distribution of $\mathbf{V}$ factorizes according to G: $P(\mathbf{V}) = \prod_{i=1}^{m} P(V_i \mid PA^i)$. We call each P(Vi | PAi) a causal module. Changes in the modules may be due to changes of causal strengths, of influences from the noise, etc. We assume that those quantities that change over time or across domains can be written as functions of a time or domain index, which we denote by C. If the changes in some modules are related, one can treat the situation as if there exists some unobserved quantity that influences the changes of those modules simultaneously. We call such quantities nonstationary confounders.

We assume that the local causal process for each Vi can be represented by the following structural equation model (SEM):

$$V_i = f_i\big(PA^i, g^i(C), \theta_i(C), \epsilon_i\big), \qquad (1)$$

where $g^i(C) \subseteq \{g_l(C)\}_{l=1}^{L}$ denotes the set of nonstationary confounders that influence Vi (it is an empty set if there is no confounder behind Vi and any other variable), $\theta_i(C)$ denotes the effective parameters of the model, which are also assumed to be functions of C, and $\epsilon_i$ is a disturbance term that is independent of C and has non-zero variance (i.e., the model is not deterministic). The noise terms $\epsilon_i$ are assumed to be independent and identically distributed.
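To make the model class concrete, the following minimal simulation (our illustration, not taken from the paper) instantiates model (1) for a single effect whose causal strength varies smoothly with the time index C; the specific functional forms and constants are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 600
c = np.linspace(0.0, 1.0, N)            # time/domain index C

# Smoothly varying effective parameter theta(C); illustrative choice.
theta = 1.0 + np.sin(2 * np.pi * c)

x = rng.normal(size=N)                  # root cause: P(X) is stationary here
eps = 0.3 * rng.normal(size=N)          # i.i.d. disturbance, independent of C
y = theta * x + eps                     # V_i = f_i(PA^i, theta(C), eps_i)

# The causal module P(Y | X, C) changes with C through theta(C),
# while P(X) does not; lambda(C) should track theta(C) up to scale.
```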

The procedure for causal discovery from nonstationary data proposed in [11] is briefly described in Algorithm 1. Step 2 identifies nonstationary causal modules; Step 3 discovers the skeleton of the causal structure over $\mathbf{V}$, i.e., an undirected graph representing which variables are adjacent in the underlying causal structure. The (asymptotic) correctness of the procedure is justified by the following theorem, proved in [11].

Theorem 1. Given the above assumptions, for every $V_i, V_j \in \mathbf{V}$, Vi and Vj are not adjacent in G if and only if they are independent conditional on some subset of {Vk | k ≠ i, k ≠ j} ∪ {C}.

III. NONSTATIONARY DRIVING FORCE ESTIMATION

In this section, we focus on discovering how a causal module P(Vi | PAi) changes, i.e., where the changes occur, how fast they occur, and how to visualize them. We assume that we already know the causal structure and which causal modules are nonstationary (see Algorithm 1).

Algorithm 1.

Detection of Changing Modules and Recovery of Causal Skeleton

1) Build a complete undirected graph UC on the variable set $\mathbf{V}$ ∪ {C}.
2) (Detection of changing modules) For each i, test for the marginal and conditional independence between Vi and C. If they are independent given a subset of {Vk | k ≠ i}, remove the edge between Vi and C in UC.
3) (Recovery of causal skeleton) For every i ≠ j, test for the marginal and conditional independence between Vi and Vj. If they are independent given a subset of {Vk | k ≠ i, k ≠ j} ∪ {C}, remove the edge between Vi and Vj in UC.
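A schematic rendering of Algorithm 1, assuming a user-supplied conditional independence test `ci_test(i, j, S)` (for instance, the KCI test of [12]); this sketch shows the control flow only and searches subsets exhaustively, unlike an efficient PC-style implementation.

```python
from itertools import combinations

def skeleton_with_c(n_vars, ci_test, max_cond=2):
    """Algorithm 1 (schematic): detect changing modules, recover skeleton.

    Variables are 0..n_vars-1; index n_vars plays the role of C.
    ci_test(i, j, S) should return True if V_i _||_ V_j given set S.
    """
    c = n_vars
    nodes = list(range(n_vars + 1))
    edges = {frozenset(e) for e in combinations(nodes, 2)}  # complete graph

    def try_remove(i, j, candidates):
        for size in range(max_cond + 1):
            for S in combinations(candidates, size):
                if ci_test(i, j, set(S)):
                    edges.discard(frozenset((i, j)))
                    return

    # Step 2: test each V_i against C given subsets of the other variables.
    for i in range(n_vars):
        try_remove(i, c, [k for k in range(n_vars) if k != i])
    # Step 3: test V_i against V_j given subsets of {V_k} U {C}.
    for i, j in combinations(range(n_vars), 2):
        try_remove(i, j, [k for k in range(n_vars) if k not in (i, j)] + [c])
    return edges
```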

In the parametric case, if we know which parameters of the causal model PAi → Vi are changing (e.g., the mean of a root cause or the coefficients in a linear SEM), then we can estimate those parameters for different values of C and see how they change. However, such knowledge is usually not available, and for the sake of flexibility it is better to model the causal processes nonparametrically. We therefore develop a general nonparametric procedure for capturing the nonstationarity of changing causal modules.

We aim to find a low-dimensional mapping of P(Vi | PAi) which captures its nonstationarity in a nonparametric way:

$$\lambda_i(C) = h_i\big(P(V_i \mid PA^i, C)\big). \qquad (2)$$

Note that changes in P(Vi | PAi) are irrelevant to changes in P(PAi), and accordingly, they are not necessarily the same as changes in the joint distribution P(Vi, PAi). If Vi is a root cause, then PAi is an empty set, and P(Vi | PAi) reduces to the marginal distribution P(Vi).

We call λi(C) the nonstationary driving force of P(Vi | PAi, C). If P(Vi | PAi, C) does not change along with C, then λi(C) remains constant. Otherwise, λi(C) is intended to capture the variability of P(Vi | PAi, C) across different values of C.

Two problems remain to be solved. The first is how to conveniently represent the conditional distributions given only observed data. The second is how to make λi(C) capture the variability of the conditional distribution along C. We tackle both problems using kernels [7] and accordingly propose a method called Nonstationary Driving Force Estimation (NoDFEs) of causal modules.

A. Kernel Embedding of Constructed Joint Distributions

Notation:

Throughout the paper, we use the following notation. Let X be a random variable on domain $\mathcal{X}$, and let $(\mathcal{H}, k)$ be a reproducing kernel Hilbert space (RKHS) with a measurable kernel on $\mathcal{X}$. Let $\phi(x) \in \mathcal{H}$ denote the feature map for each $x \in \mathcal{X}$, with $\phi: \mathcal{X} \to \mathcal{H}$. We assume integrability: $\mathbb{E}_X[k(X, X)] < \infty$. Similar notation applies to the variables Y and C. The cross-covariance operator $\mathcal{C}_{YX}: \mathcal{H} \to \mathcal{G}$ is defined as $\mathcal{C}_{YX} := \mathbb{E}_{YX}[\phi(X) \otimes \psi(Y)]$, where $\mathcal{G}$ is the RKHS associated with Y.

Intuitively, to represent the kernel embedding of nonstationary causal modules, we need to consider P(Vi | PAi) for each value of C separately. If C is a domain index, for each value of C we have a dataset of (Vi, PAi). If C is a time index, one may use a sliding window and take the data of (Vi, PAi) within a window of length L centered at C = c. However, it may be hard to find an appropriate window length L, especially when the causal module changes fast. In the following, we propose a way to estimate the kernel embedding of nonstationary causal modules on the whole dataset, avoiding window segmentation. For conciseness, below we use Y and X to denote Vi and PAi, respectively.

Instead of working with P(Y | X, C = cn) (n = 1, …, N) directly, we "virtually" construct a particular distribution $\tilde{P}(\underline{Y}, X \mid C = c_n)$ as follows:¹

$$\tilde{P}(\underline{Y}, X \mid C = c_n) = P(Y \mid X, C = c_n)\, P(X).$$

The constructed distribution $\tilde{P}(\underline{Y}, X \mid C = c_n)$ captures changes in P(Y | X, C = cn) across different cn.

Proposition 1 shows that the kernel embedding of the distribution $\tilde{P}(\underline{Y}, X \mid C = c_n)$ can be estimated on the whole dataset, without window segmentation.

Proposition 1. Let X represent the direct causes of Y, and suppose that we have N observations of them. The kernel embedding of the distribution $\tilde{P}(\underline{Y}, X \mid C = c_n)$ can be represented as

$$\hat{\tilde{\mu}}_{\underline{Y}, X \mid C = c_n} = \frac{1}{N}\, \Phi_y \big( K_x \odot K_c + \lambda I \big)^{-1} \mathrm{diag}(k_{c, c_n})\, K_x \Phi_x^{\top},$$

where $\Phi_y \triangleq [\phi(y_1), \ldots, \phi(y_N)]$, $\Phi_x \triangleq [\phi(x_1), \ldots, \phi(x_N)]$, $k_{c, c_n} := [k(c_1, c_n), \ldots, k(c_N, c_n)]^{\top}$, and $\odot$ denotes the pointwise product.
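In finite samples, the embedding is thus represented by an N × N coefficient matrix sandwiched between the (implicit) feature matrices $\Phi_y$ and $\Phi_x$. A minimal numpy sketch of that coefficient matrix, assuming precomputed kernel matrices `Kx` and `Kc` and the column vector `kc_n` $= k_{c, c_n}$:

```python
import numpy as np

def embedding_weights(Kx, Kc, kc_n, lam=1e-3):
    """Coefficient matrix W_n of Proposition 1, so that the embedding
    equals (1/N) * Phi_y @ W_n @ Phi_x.T with feature maps left implicit."""
    N = Kx.shape[0]
    A = Kx * Kc + lam * np.eye(N)            # K_x (.) K_c + lambda*I
    return np.linalg.solve(A, np.diag(kc_n) @ Kx) / N
```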

B. Nonstationary Driving Force Estimation As an Eigenvalue Decomposition Problem

Next, we use the estimated kernel embeddings of the distributions, $\hat{\tilde{\mu}}_{\underline{Y}, X \mid C = c_n}$ (n = 1, …, N), as input, and aim to find $\hat{\lambda}(C)$ as a low-dimensional representation of $\tilde{\mu}_{\underline{Y}, X \mid C = c_n}$ that captures its variability across different values of C. This can be readily achieved with kernel principal component analysis (KPCA) [8], which computes principal components in a kernel space of the input.

To perform KPCA, we first need the N × N Gram matrix of $\hat{\tilde{\mu}}_{\underline{Y}, X \mid C = c}$. If we use a linear kernel, the (c, c′)th entry of the Gram matrix $M^{l}_{\underline{Y}X}$ is the inner product between $\hat{\tilde{\mu}}_{\underline{Y}, X \mid C = c}$ and $\hat{\tilde{\mu}}_{\underline{Y}, X \mid C = c'}$:

$$M^{l}_{\underline{Y}X}(c, c') \triangleq \mathrm{tr}\big( \hat{\tilde{\mu}}_{\underline{Y}, X \mid C = c}^{\top}\, \hat{\tilde{\mu}}_{\underline{Y}, X \mid C = c'} \big) = \frac{1}{N^2}\, k_{c, c}^{\top} \Big[ K_x^{3} \odot \big( (K_x \odot K_c + \lambda I)^{-1} K_y (K_x \odot K_c + \lambda I)^{-1} \big) \Big] k_{c, c'},$$

which is the (c, c′)th entry of the matrix

$$M^{l}_{\underline{Y}X} = \frac{1}{N^2}\, K_c \Big[ K_x^{3} \odot \big( (K_x \odot K_c + \lambda I)^{-1} K_y (K_x \odot K_c + \lambda I)^{-1} \big) \Big] K_c. \qquad (3)$$

If we use a Gaussian kernel with kernel width $\sigma_2$, the Gram matrix is given by

$$M^{g}_{\underline{Y}X}(c, c') = \exp\!\Big( -\frac{ \|\tilde{\mu}_{\underline{Y}, X \mid C = c} - \tilde{\mu}_{\underline{Y}, X \mid C = c'}\|_F^2 }{ 2\sigma_2^2 } \Big) = \exp\!\Big( -\frac{ M^{l}_{\underline{Y}X}(c, c) + M^{l}_{\underline{Y}X}(c', c') - 2 M^{l}_{\underline{Y}X}(c, c') }{ 2\sigma_2^2 } \Big), \qquad (4)$$

where $\|\cdot\|_F$ denotes the Frobenius norm.
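Both Gram matrices reduce to a handful of matrix operations on the kernel matrices $K_x$, $K_y$, and $K_c$. A sketch under the assumption that λ and $\sigma_2$ are set by hand (the paper learns or sets them as described below):

```python
import numpy as np

def gram_linear(Kx, Ky, Kc, lam=1e-3):
    """Eq. (3): linear-kernel Gram matrix of the embeddings."""
    N = Kx.shape[0]
    A_inv = np.linalg.inv(Kx * Kc + lam * np.eye(N))
    inner = np.linalg.matrix_power(Kx, 3) * (A_inv @ Ky @ A_inv)
    return Kc @ inner @ Kc / N**2

def gram_gaussian(Ml, sigma2=1.0):
    """Eq. (4): Gaussian-kernel Gram matrix built from the linear one."""
    d = np.diag(Ml)
    sq_dist = d[:, None] + d[None, :] - 2 * Ml   # ||mu_c - mu_c'||_F^2
    return np.exp(-sq_dist / (2 * sigma2**2))
```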

Finally, $\hat{\lambda}_i(C)$ can be found by performing eigenvalue decomposition on the above Gram matrix, $M^{l}_{\underline{Y}X}$ or $M^{g}_{\underline{Y}X}$; for details see [8]. In practice, one may take the first few eigenvectors, which capture most of the variance.

Note that with our method we never need to explicitly learn the high-dimensional kernel embedding $\tilde{\mu}_{\underline{Y}, X \mid C = c}$ for each c. With the kernel trick, the final Gram matrix is expressed directly in terms of N × N kernel matrices, and the nonstationary driving force $\hat{\lambda}_i(C)$ is then estimated by eigenvalue decomposition of the Gram matrix.

Algorithm 2 summarizes the proposed NoDFEs method. Several hyperparameters must be set. The hyperparameters associated with $K_x$ and $K_c$ and the regularization parameter λ in equation (3) are learned within a Gaussian process regression framework, by maximizing the marginal likelihood. The hyperparameters associated with $K_y$ and the kernel width $\sigma_2$ in equation (4) are set to empirical values. See [12] for details.

Change in marginal distributions.

As a special case, when we are concerned with how the marginal distribution of Y changes with C, i.e., when X = ∅, we have $\mu_{Y \mid C = c_n} = \mathcal{C}_{YC}\, \mathcal{C}_{CC}^{-1} \phi(c_n)$; this can also be obtained by constraining X in $\tilde{\mu}_{\underline{Y}, X \mid C = c_n}$ to a fixed value. The empirical estimate is

$$\hat{\mu}_{Y \mid C = c_n} = \Phi_y (K_c + \lambda I)^{-1} k_{c, c_n}.$$

Then the Gram matrix with a linear kernel is

$$M^{l}_{Y} = K_c (K_c + \lambda I)^{-1} K_y (K_c + \lambda I)^{-1} K_c.$$
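Under the same assumptions as the sketch after Eq. (4), the marginal case is a one-function special case with $K_x$ dropped:

```python
import numpy as np

def gram_linear_marginal(Ky, Kc, lam=1e-3):
    """Linear-kernel Gram matrix when X is empty, i.e., for P(Y | C)."""
    N = Ky.shape[0]
    B = np.linalg.inv(Kc + lam * np.eye(N))
    return Kc @ B @ Ky @ B @ Kc
```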
Algorithm 2.

NoDFEs of Causal Modules P(Y | X)

1) Input: N observations of X and Y.
2) Calculate the Gram matrix $M_{\underline{Y}X}$ (see Eq. (3) for linear kernels and Eq. (4) for Gaussian kernels).
3) Find $\hat{\lambda}_i(C)$ by feeding the Gram matrix $M_{\underline{Y}X}$ directly to KPCA; that is, perform eigenvalue decomposition on $M_{\underline{Y}X}$ to find the nonlinear principal components $\hat{\lambda}_i(C)$ [8].
4) Output: the estimated nonstationary driving force $\hat{\lambda}_i(C)$.
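Putting the pieces together, a sketch of Algorithm 2 that reuses `gram_linear` and `gram_gaussian` from the sketch after Eq. (4); the centering and scaling of the components follow standard KPCA [8]:

```python
import numpy as np

def nodfes(Kx, Ky, Kc, lam=1e-3, sigma2=1.0, n_components=1):
    """Algorithm 2 (sketch): estimate the driving force lambda(C)."""
    M = gram_gaussian(gram_linear(Kx, Ky, Kc, lam), sigma2)
    N = M.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    eigvals, eigvecs = np.linalg.eigh(H @ M @ H)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Column j holds the j-th nonlinear principal component at c_1..c_N.
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```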

IV. CAUSAL DIRECTION ESTIMATION BY DEPENDENCE MINIMIZATION

In this section, we propose a nonparametric method to determine causal directions by exploiting the independence property between causal modules. Suppose that X → Y; if only one of the distributions P(X) and P(Y|X) changes, the independent-change property still holds, because a constant is independent of any variable. Therefore, below we do not separately study the case where only one of the two considered variables is adjacent to C, but treat it as a special case.

We also note that to accelerate causal direction determination, one may first apply Meek's orientation rules [5] to derive the equivalence class and then use the procedure proposed below to find some of the orientations left undetermined by the equivalence class.

A. Two-Variable Case

For simplicity, let us start with the two-variable case: suppose that X and Y are adjacent and at least one of them is adjacent to C, and there are no confounders behind them. We aim to identify the causal direction between them, which, without loss of generality, we assume to be XY. The guiding idea is that distribution shift may carry information that confirms “independence” of causal modules, which, in the simple case we are considering, is the “independence” between P(X) and P(Y|X). If P(X) and P(Y |X) are “independent” but P(Y) and P(X|Y ) are not, then the causal direction is inferred to be from X to Y.

The dependence between P(X) and P(Y|X) can be estimated by extending the Hilbert-Schmidt Independence Criterion (HSIC) [3].

a). HSIC:

Given a set of observations {(u1, v1), (u2, v2), …, (uN, vN)} for variables U and V, HSIC provides a statistic for testing their statistical independence as well as a measure of their dependence. Let MU and MV be the Gram matrices of U and V calculated on the sample. An estimator of HSIC is given by [3]

$$\mathrm{HSIC}_{UV} = \frac{1}{(N-1)^2}\, \mathrm{tr}(M_U H M_V H), \qquad (5)$$

where H is used to center the features, with entries $H_{ij} \triangleq \delta_{ij} - N^{-1}$.

We will use a normalized version of the estimated HSIC, which is invariant to the scales of $M_U$ and $M_V$:

$$\mathrm{HSIC}^{N}_{UV} = \frac{\mathrm{HSIC}_{UV}}{\frac{1}{N-1}\,\mathrm{tr}(M_U H) \cdot \frac{1}{N-1}\,\mathrm{tr}(M_V H)} = \frac{\mathrm{tr}(M_U H M_V H)}{\mathrm{tr}(M_U H)\, \mathrm{tr}(M_V H)}. \qquad (6)$$
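Equation (6) is immediate to compute from two Gram matrices; a minimal sketch:

```python
import numpy as np

def normalized_hsic(MU, MV):
    """Eq. (6): scale-invariant HSIC estimate from Gram matrices MU, MV."""
    N = MU.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    return np.trace(MU @ H @ MV @ H) / (
        np.trace(MU @ H) * np.trace(MV @ H))
```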

b). Dependence between Nonstationary Modules and Causal Direction Estimation:

In our case, we aim to check whether P(Y|X,C) and P(X|C) change independently as C changes. We work with the estimates of their embeddings: we treat $\{(\hat{\mu}_{X \mid C = c},\, \hat{\tilde{\mu}}_{\underline{Y}, X \mid C = c})\}_{c = c_1}^{c_N}$ as observed data pairs and measure the dependence between the two components from these pairs.

This can be done by applying the normalized HSIC estimate of equation (6) to the above data pairs. The expression then involves $M_X$, the Gram matrix of $\hat{\mu}_{X \mid C}$ at C = c1, c2, …, cN, and $M_{\underline{Y}X}$, the Gram matrix of $\hat{\tilde{\mu}}_{\underline{Y}, X \mid C}$ at C = c1, c2, …, cN. In particular, the dependence between P(Y|X,C) and P(X|C) on the given data can be estimated by

$$\hat{\Delta}_{X \to Y} = \frac{\mathrm{tr}(M_X H M_{\underline{Y}X} H)}{\mathrm{tr}(M_X H)\, \mathrm{tr}(M_{\underline{Y}X} H)}. \qquad (7)$$

Similarly, for the hypothetical direction Y → X, the dependence between P(X|Y,C) and P(Y|C) on the data is estimated by

$$\hat{\Delta}_{Y \to X} = \frac{\mathrm{tr}(M_Y H M_{\underline{X}Y} H)}{\mathrm{tr}(M_Y H)\, \mathrm{tr}(M_{\underline{X}Y} H)}. \qquad (8)$$

We have the following rule to infer the causal direction between X and Y.

Causal Direction Inference Rule:

Suppose that X and Y are two random variables with N observations. We assume that X and Y are adjacent, that at least one of them is adjacent to C, and that there are no confounders behind them. If $\hat{\Delta}_{X \to Y} < \hat{\Delta}_{Y \to X}$, where the two quantities are given by equations (7) and (8), respectively, then X is the cause of Y. Otherwise we conclude that Y is a cause of X.
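Assuming the four Gram matrices have been computed as in Section III, the rule itself is a single comparison; `normalized_hsic` refers to the sketch after Eq. (6):

```python
def infer_direction(MX, MYX, MY, MXY):
    """Two-variable rule: 'X->Y' iff Delta_{X->Y} < Delta_{Y->X}.

    MX, MY: Gram matrices of the embeddings of P(X|C) and P(Y|C);
    MYX, MXY: Gram matrices for P(Y|X,C) and P(X|Y,C) (Eqs. 7-8).
    """
    d_xy = normalized_hsic(MX, MYX)   # Eq. (7)
    d_yx = normalized_hsic(MY, MXY)   # Eq. (8)
    return 'X->Y' if d_xy < d_yx else 'Y->X'
```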

B. With More Than Two Variables

Our rule for the two-variable case can be extended to a heuristic method for inferring causal directions among multiple variables. Suppose that we have m observed random variables $\{V_i\}_{i=1}^{m}$ and that the causal skeleton UG over them has been recovered by Algorithm 1. Let VS be the subset of $\{V_i\}_{i=1}^{m}$ such that Vi ∈ VS iff Vi's causal module is nonstationary or some Vj adjacent to Vi has a nonstationary causal module. Assume that there are no nonstationary confounders behind VS. We propose the following heuristic to estimate the causal directions between variables in VS (Algorithm 3; a schematic sketch follows the algorithm).

Algorithm 3.

Causal Direction Determination

1) Input: observations of $\{V_i\}_{i=1}^{m}$, subset VS, causal skeleton UG.
2) Let R = VS.
3) For each variable Vi in R, let Adi be the set of variables adjacent to Vi in UG. Estimate the dependence between P(Adi) and P(Vi | Adi) using equation (7), and denote the estimate by $\hat{\Delta}(i)$. Find the variable Vl in R with the minimum $\hat{\Delta}$.
4) Orient all edges incident to Vl in UG into Vl (in other words, make Vl a leaf).
5) Remove Vl from R.
6) Repeat steps 3, 4, and 5 until only one variable is left in R.
7) Output: graph UG (with edges between variables in VS oriented).
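A schematic sketch of Algorithm 3, assuming a hypothetical scoring function `delta(i, adj)` that evaluates equation (7) for Vi against its adjacency set; the guard preserves orientations fixed in earlier rounds:

```python
def orient_by_dependence(VS, adjacency, delta):
    """Algorithm 3 (sketch): greedy orientation over the subset VS.

    adjacency: dict mapping each variable to its neighbors in U_G.
    delta(i, adj): estimated dependence between P(Ad_i) and P(V_i | Ad_i).
    Returns the set of oriented edges (parent, child).
    """
    R = set(VS)
    oriented = set()
    while len(R) > 1:
        # The variable whose module depends least on its neighbors
        # is treated as the most plausible sink.
        leaf = min(R, key=lambda i: delta(i, adjacency[i]))
        for j in adjacency[leaf]:
            if (leaf, j) not in oriented:        # keep earlier orientations
                oriented.add((j, leaf))          # orient j -> leaf
        R.discard(leaf)
    return oriented
```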

For variables outside VS (i.e., variables whose modules are stationary and which are adjacent only to variables with stationary modules), the causal direction between them cannot be determined by Algorithm 3. In such a case, one may further infer some causal directions by making use of Meek orientation rules [5].

V. EXPERIMENTAL RESULTS

A. Simulations

We generated synthetic data according to the SEMs specified in Fig. 1. We considered two sources of nonstationarity: (1) nonstationarity due to changes of causal coefficients; (2) nonstationarity due to changes of influences from the noise. More specifically, the modules for V2, V3, and V5 are nonstationary in sense (1), and those for V2, V6, and V7 are nonstationary in sense (2). The nonstationarity is governed by functions ai(t) (i = 2, 3, 5, 6, 7). We considered both smooth and sudden changes of ai. Smooth change: we generated ai by sampling from a Gaussian process (GP) prior with a squared exponential kernel. Sudden change: we generated sudden changes of ai with a block signal. In both cases, the ai are sampled independently to ensure that causal modules change independently (that is, there is no nonstationary confounding). The functions $\{f_i\}_{i=2}^{8}$ are randomly chosen from linear, sinusoidal, and polynomial functions. The noise terms Ei (i = 1, …, 8) are randomly drawn from Gaussian and uniform distributions. We also considered two sample sizes (N = 600, 1200). There are hence four settings in total: (1) N = 600, smooth change; (2) N = 600, sudden change; (3) N = 1200, smooth change; (4) N = 1200, sudden change. For each setting, we ran 50 trials.

Fig. 1: The SEMs according to which we generated the synthetic data.

We first learned causal skeletons by the procedure in Algorithm 1, with PC search [9] and the kernel-based conditional independence (KCI) test [12]. We included the time information T as C in the causal system to capture nonstationarity, which allows us to recover the causal skeleton and detect changing causal modules. Next, we inferred causal directions by exploiting the independence between causal modules, with the procedure proposed in Algorithm 3. We compared it with the method proposed in [11], which uses a window-based approach to infer causal directions. For pairs of adjacent variables without nonstationary causal modules, we inferred causal directions, where possible, by Meek's orientation rules [5]. Then, based on the recovered causal graph, we extracted the nonstationary driving forces of the changing causal modules with the NoDFEs procedure in Algorithm 2, using Gaussian kernels both in the kernel embedding of the constructed joint distributions and in kernel PCA. We compared our approach with the linear time-dependent functional causal model [4], which puts a GP prior on time-varying coefficients and uses the posterior mean to represent the nonstationary driving force. In addition, we compared our methods with Bayesian change point detection [1], which is widely used on nonstationary data to detect change points; we applied it to V2, V3, V5, V6, and V7, whose causal modules change over time.

We counted a causal connection between two variables as genuine if it was found in more than 85% of trials. Algorithm 1 identified the causal skeleton and the nonstationary causal modules correctly in all four settings. Table I shows the accuracy of the inferred causal directions in the different settings, compared with the window-based method proposed in [11]. Our method significantly outperforms the window-based method, especially for smooth changes. To our knowledge, there are no other comparable methods for inferring causal directions in the nonstationary case.

TABLE I: Accuracy of inferred causal directions

                        Our method    Window-based
  N = 600, smooth       85.4%         56.4%
  N = 600, sudden       85.0%         70.2%
  N = 1200, smooth      87.9%         59.6%
  N = 1200, sudden      88.5%         74.2%

Figure 2 visualizes the estimated nonstationary driving forces of the changing causal modules for smooth changes with N = 600. Left panel: blue lines are the nonstationary driving forces estimated by NoDFEs; red lines are the ground truth; vertical black dashed lines indicate change points detected by Bayesian change point detection. Middle panel: the largest ten eigenvalues of the Gram matrix $M^g$. Right panel: blue lines are the nonstationary components recovered by the linear time-dependent GP; red lines are the ground truth. The scales of the recovered curves have been adapted. We show only the first principal component in the left panel, since the first eigenvector captures most of the variance (middle panel). NoDFEs gives the best recovery in all cases. Bayesian change point detection fails in the case of smooth changes, although it works for sudden changes. The linear time-dependent GP does not work well when the influences from the noise change (2&5 → 6, 3 → 7).

Fig. 2: Visualization of the estimated nonstationary driving forces of changing causal modules. See main text for details.

B. Real-World Datasets

US Stock Market: We applied our methods to daily returns of stocks traded on the New York Stock Exchange, downloaded from Yahoo Finance. The data contain 80 major stocks from 07/05/2006 to 12/16/2009, grouped into 10 sectors: energy, public utilities, capital goods, health care, consumer services, finance, transportation, consumer non-durable goods, basic industries, and technology.

Figure 3 shows the causal connections between stock returns, with each color representing one sector. We found that intra-sector connections are denser than inter-sector connections. Stocks in energy, finance, public utilities, and basic industries are more likely to be causes of stocks in other sectors; among these four sectors, stocks in energy and finance cause stocks in public utilities and basic industries. 37 of the 80 causal modules are nonstationary; most of them are in finance (7 out of 9) and consumer services (5 out of 7).

Fig. 3: Recovered causal graph over 80 NYSE stocks. Each node color represents one sector.

Figure 4 visualizes the estimated nonstationary driving forces of the stocks USB, JCP, GE, PBR, SAN, and CHK, recovered by NoDFEs. Among these six stocks, USB, JCP, GE, and PBR have change points around 07/16/2007 (T1) and 05/05/2008 (T2), while SAN and CHK have change points only around 05/05/2008 (T2). Most stocks with change points only at T2 have more direct causes. These findings match critical time points of the financial crisis around 2008.

Fig. 4: The estimated nonstationary driving forces of six stock returns from 07/05/2006 to 12/16/2009. The change points match critical times of the financial crisis.

VI. CONCLUSION

In this paper we proposed nonparametric methods for estimating the underlying driving force of changes in local causal mechanisms and for determining causal directions by leveraging distribution shift. The discovered causal directions help construct correct causal models; moreover, the estimated nonstationary driving force of the changes in the causal mechanisms facilitates understanding of why and how the generating process changes, and suggests which variables to further incorporate into the system to make it causally sufficient. We note that causal modeling and distribution shift are heavily coupled, and that distribution shift in fact contains useful information for causal direction determination. A line of our future research is to exploit this connection to improve online prediction in nonstationary environments.

ACKNOWLEDGEMENTS

This project was supported by the National Institutes of Health (NIH) under Award Numbers NIH-1R01EB022858-01 FAIN-R01EB022858, NIH-1R01LM012087, and NIH-5U54HG008540-02 FAIN-U54HG008540.

Footnotes

¹ Here we use $\underline{Y}$ instead of Y to emphasize that in this constructed distribution Y and X are not symmetric; this asymmetry will be used in Section IV.

REFERENCES

  • [1] Adams RP and MacKay DJC. Bayesian online changepoint detection. Technical report, University of Cambridge, Cambridge, UK, 2007. Preprint at http://arxiv.org/abs/0710.3742v1.
  • [2] Engle RF, Hendry DF, and Richard JF. Exogeneity. Econometrica, 51:277–304, 1983.
  • [3] Gretton A, Fukumizu K, Teo CH, Song L, Schölkopf B, and Smola AJ. A kernel statistical test of independence. In NIPS 20, pages 585–592, Cambridge, MA, 2008. MIT Press.
  • [4] Huang B, Zhang K, and Schölkopf B. Identification of time-dependent causal model: A Gaussian process treatment. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Machine Learning Track, pages 3561–3568, Buenos Aires, Argentina, 2015.
  • [5] Meek C. Strong completeness and faithfulness in Bayesian networks. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-95), pages 411–419, 1995.
  • [6] Pearl J. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, 2000.
  • [7] Schölkopf B and Smola A. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
  • [8] Schölkopf B, Smola A, and Müller K. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
  • [9] Spirtes P, Glymour C, and Scheines R. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2001.
  • [10] Woodward J. Making Things Happen: A Theory of Causal Explanation. Oxford University Press, New York, 2003.
  • [11] Zhang K, Huang B, Zhang J, Glymour C, and Schölkopf B. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination. In IJCAI, 2017.
  • [12] Zhang K, Peters J, Janzing D, and Schölkopf B. Kernel-based conditional independence test and application in causal discovery. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), Barcelona, Spain, 2011.
  • [13] Zhang K, Zhang J, and Schölkopf B. Distinguishing cause from effect based on exogeneity. In Proceedings of the 15th Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2015), 2015.
