Abstract
Data-based detection and quantification of causation in complex, nonlinear dynamical systems is of paramount importance to science, engineering, and beyond. Inspired by the cross-map-based techniques that have been widely used in recent years, we develop a general framework that advances towards a comprehensive understanding of dynamical causal mechanisms and is consistent with the natural interpretation of causality. In particular, instead of measuring the smoothness of the cross-map as conventionally implemented, we define causation by directly measuring the scaling law for the continuity of the investigated dynamical system. The uncovered scaling law enables accurate, reliable, and efficient detection of causation and assessment of its strength in general complex dynamical systems, outperforming existing representative methods. The continuity scaling-based framework is rigorously established and demonstrated using datasets from model complex systems and the real world.
1. Introduction
Identifying and ascertaining causal relations is a problem of paramount importance to science and engineering, with broad applications [1–3]. For example, accurate detection of causation is the key to identifying the origin of diseases in precision medicine [4] and is important to fields such as psychiatry [5]. Traditionally, associational concepts are often misinterpreted as causation [6, 7], whereas causal analysis goes one step beyond association in the sense that, instead of using static conditions, causation is induced under changing conditions [8]. The principle of Granger causality formalizes a paradigmatic framework [9–11], quantifying causality in terms of prediction improvements, but, because of its linear, multivariate, and statistical regression nature, the various derived methods require extensive data [12]. Entropy-based methods [13–20] face a similar difficulty. Another issue with Granger causality is its fundamental requirement of separability of the underlying dynamical variables, which usually cannot be met in real-world systems. To overcome these difficulties, cross-map-based techniques, paradigms tailored to dynamical systems, have been developed and have gained widespread attention in the past decade [21–36].
The cross-map originates from nonlinear time series analysis [37–42]. It can be understood briefly as follows. Consider two subsystems, X and Y. In the reconstructed phase space of X, if a set of neighboring vectors can be found for any state vector at a given time, then the set of cross-mapped vectors, namely the equal-time partners of those neighbors, is available in Y. The cross-map underlying the reconstructed spaces can be written as Yt = Φ(Xt) (where Xt and Yt are delay coordinates with sufficiently large dimensions) for the case of Y unidirectionally causing X while, mathematically, its inverse map does not exist [34]. In practice, using prior knowledge of the true causality in toy models and/or an assumption on the expanding property of Φ (represented by its Jacobian's singular value being larger than one in the topological causality framework [24]), scientists have developed many practically useful cross-map-based techniques for causality detection. For instance, the “activity” method, originally designed to measure the continuity of the inverse of the cross-map, compares the divergence of the cross-mapped vectors from the state vector in X with the divergence of independently selected neighboring vectors from the same state vector [22, 23]. Topological causality measures the divergence rate of the cross-mapped vectors from the state vectors in Y [24], and convergent cross-mapping (CCM), as the length of the time series increases, compares the true state vector Y with the average of the cross-mapped vectors, which serves as the estimate of Y [21, 25–36]. The change in the divergence or in the accuracy of the estimate is then statistically evaluated to determine the causation from Y to X. Conversely, the causation from X to Y can be evaluated in an analogous manner. The above evaluations [21, 24, 26–36] can be understood at a conceptual and qualitative level and perform well in many demonstrations.
In this work, striving for a comprehensive understanding of causal mechanisms and inspired by the cross-map-based techniques, we develop a mathematically rigorous framework for detecting causality in nonlinear dynamical systems. We turn attention from the cross-maps to the original systems themselves, which is also logically consistent with the natural interpretation of causality as functional dependence [2, 8]. The techniques used in cross-map-based methods are assimilated into our framework, but we study the original dynamical systems or the reconstructed systems directly instead of the cross-maps. The foundation of our framework is the scaling law for the relation between ε and δ arising from the continuity of the investigated system, hence the term “continuity scaling”. In addition to providing a theory, we demonstrate, using synthetic and real-world data, that our continuity scaling framework is accurate, computationally efficient, and widely applicable, showing advantages over existing methods.
2. Continuity Scaling Framework
To explain the mathematical idea behind the development of our framework, we use the following class of discrete-time dynamical systems: xt+1 = f(xt, yt) and yt+1 = g(xt, yt) for t ∈ ℕ, where the state variables {xt}t∈ℕ and {yt}t∈ℕ evolve in the compact manifolds ℳ and 𝒩 of dimensions Dℳ and D𝒩 under the sufficiently smooth maps f and g, respectively. We adopt the commonly accepted notion of causality in dynamical systems.
Definition 1 . —
If the dependence of f(x, y) on y is nontrivial (i.e., a directional coupling exists), a variation in y results in a change in the value of f(x, y) for any given x, which, according to the natural interpretation of causality [2, 43], means that y : {yt}t∈ℕ has a direct causal effect on x : {xt}t∈ℕ, denoted by y↪x, as shown in the upper panel of Figure 1(a).
Figure 1.
Illustration of causal relation between two sets of dynamical variables. (a) Existence of causation from y in 𝒩 to x in ℳ, where each correspondence from xt+1 to yt is one-to-one, represented by the line or the arrow, respectively, in the upper and the middle panels. In this case, a change in lnεx results in a direct change in δy (the lower panel) with εx and δy denoting the neighborhood size of the resulting variable x and of the causal variable y, respectively. (b) Absence of causation from y to x, where every point on each trajectory, {yt}, in 𝒩 could be the correspondent point from xt+1 in ℳ (the upper panel) and thus every point in 𝒩 belongs to the largest δ-neighborhood of yt (the middle panel). In this case, δy does not depend on εx (the lower panel). Also refer to the supplemental animation for illustration.
We now interpret the causal relationship in terms of the continuity of a function. Let fxg(·)≜f(xg, ·) for a given point xg ∈ ℳ. For any yP ∈ 𝒩, we denote its image under the given function by xI≜fxg(yP). Applying the ε-δ definition of continuity to fxg(·), we have that, for any neighborhood 𝒪(xI, ε) centered at xI and of radius ε > 0, there exists a neighborhood 𝒪(yP, δ) centered at yP of radius δ > 0, such that fxg(𝒪(yP, δ)) ⊂ 𝒪(xI, ε). The neighborhood and its radius are defined by
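\[ \mathcal{O}(p, \alpha) = \{ s \in \mathcal{S} \mid \mathrm{dist}_{\mathcal{S}}(s, p) < \alpha \}, \qquad p \in \mathcal{S},\ \alpha > 0, \tag{1} \]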
where dist𝒮(·, ·) represents an appropriate metric describing the distance between two given points in a specified manifold 𝒮 with 𝒮 = ℳ or 𝒩. The meaning of this mathematical statement is that, if we are first given a neighborhood of the resulting variable xI, we can then find a neighborhood of the causal variable yP that satisfies the above mapping and inclusion relation. This “first-ε-then-δ” operation provides a rigorous basis for the principle that information about the resulting variable can be used to estimate information about the causal variable and therefore to ascertain causation, as indicated by the long arrow in the middle panels of Figure 1(a). Note that the existence of a neighborhood with δ > 0 is always guaranteed for a continuous map fxg. In fact, due to the compactness of the manifold 𝒩, a largest value of δ exists. However, if yP does not have an explicit causal effect on the variable xI, i.e., fxg is independent of yP, the existence of δ is still assured but δ is independent of the value of ε, as shown in the upper panel of Figure 1(b). This means that merely determining the existence of a δ-neighborhood is not enough for inferring causation; it is necessary to vary ε systematically and to examine the scaling relation between δ and ε. In the following, we discuss a number of scenarios.
Case I. —
Dynamical variables {(xt, yt)}t∈ℕ are fully measurable. For any given constant εx > 0, the set {xτ ∈ ℳ|τ ∈ Ixt(εx)} can be used to approximate the neighborhood 𝒪(xt+1, εx), where the time index set is
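\[ I_x^t(\varepsilon_x) = \{ \tau \in \mathbb{N} \mid \mathrm{dist}_{\mathcal{M}}(x_{t+1}, x_\tau) < \varepsilon_x \}. \tag{2} \]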
The radius δyt = δyt(εx) of the neighborhood 𝒪(yt, δyt) satisfying fxg=xt(𝒪(yt, δyt)) ⊂ 𝒪(xt+1, εx) can be estimated as
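\[ \delta_y^t(\varepsilon_x) = \#[\hat{I}_x^t(\varepsilon_x)]^{-1} \sum_{\tau \in \hat{I}_x^t(\varepsilon_x)} \mathrm{dist}_{\mathcal{N}}(y_t, y_\tau), \tag{3} \]
where #[·] is the cardinality of the given set and the index set is \(\hat{I}_x^t(\varepsilon_x) = \{ \tau \in \mathbb{N} \mid \tau + 1 \in I_x^t(\varepsilon_x) \}\).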
The rigorous mathematical steps for estimating δyt are given in Section II of Supplementary Information (SI). We emphasize that the correspondence investigated here is between xt+1 and yt, which differs from the cross-map-based methods, with a one-step time difference arising naturally. This consideration yields a key condition [DD], which is needed only when considering the original iteration/flow and whose detailed description and universality are demonstrated in SI. We reveal a linear scaling law between 〈δyt〉t∈ℕ and lnεx, as shown in the lower panels of Figure 1, whose slope sy↪x is an indicator of the correspondence between ε and δ and hence of the causal relation y↪x. Here, 〈·〉t∈ℕ denotes the average over time. In particular, a larger slope value implies a stronger causation in the direction from y to x as represented by the map function f(xt, yt) (Figure 1(a)), while a near-zero slope indicates null causation in this direction (Figure 1(b)). Likewise, possible causation in the reverse direction, x↪y, as represented by the function g(xt, yt), can be assessed analogously. The unidirectional case in which f(x, y) = f0(x) is independent of y is treated in Case II. We summarize these considerations below; an argument for the generic existence of the scaling law is provided in Section II of SI.
Theorem 1 . —
For dynamical variables {(xt, yt)}t∈ℕ measured directly from the dynamical systems, if the slope sy↪x defined above is zero, no causation exists from y to x. Otherwise, a directional coupling can be confirmed from y to x and the slope sy↪x increases monotonically with the coupling strength.
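The Case I estimate can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes scalar states, the absolute difference as the metric, and an average over the partners of the neighbors; the function name `delta_estimate` and the exclusion of the trivial partner τ = t are assumptions made here for illustration.

```python
import numpy as np

def delta_estimate(x, y, t, eps):
    """Average distance from y[t] to the one-step-earlier partners of the
    eps-neighbours of x[t + 1] (scalar states; an illustrative assumption)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    tau = np.arange(len(x) - 1)                  # candidate times tau, so that tau + 1 is valid
    near = np.abs(x[tau + 1] - x[t + 1]) < eps   # x[tau + 1] lies within eps of x[t + 1]
    near[t] = False                              # assumption: drop the trivial partner tau = t
    if not near.any():
        return float("nan")                      # no neighbour found for this eps
    return float(np.mean(np.abs(y[tau[near]] - y[t])))
```

When x is driven by y, the returned value shrinks with `eps`; when x is independent of y, it stays close to the typical distance between two points of the y series regardless of `eps`.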
Case II. —
The dynamical variables {(xt, yt)}t∈ℕ are not directly accessible but measurable time series {ut}t∈ℕ and {vt}t∈ℕ are available, where ut = u(xt) and vt = v(yt) with u: ℳ⟶ℝru and v: 𝒩⟶ℝrv being smooth observational functions. To assess causation from y to x, we assume one-dimensional observational time series (for simplicity): ru = rv = 1, and use the classical delay-coordinate embedding method [37–42, 44] to reconstruct the phase space: ut = (ut, ut+τu,⋯,ut+(du − 1)τu)T and vt = (vt, vt+τv,⋯,vt+(dv − 1)τv)T, where τu,v is the delay time and du,v > 2(Dℳ + D𝒩) is the embedding dimension that can be determined using some standard criteria [45]. As illustrated in Figure 2, the dynamical evolution of the reconstructed states {(ut, vt)}t∈ℕ is governed by
(4)
Figure 2.
Illustration of system dynamics before and after embedding for Case II. In the left panel, the arrows describe how the original system (f, g) is equivalent to the system after embedding. In the right panel, causation between the internal variables x and y can be ascertained by detecting the causation between the variables u and v reconstructed from measured time series.
The map functions can be calculated as , , where the embedding (diffeomorphism) Es: with , s = u or v, is given by
(5) |
with the inverse function Es−1 defined on , [f, g]k representing the kth iteration of the map and the projection mappings as Π1(x, y) = x and Π2(x, y) = y. Case II has now been reduced to Case I, and our continuity scaling framework can be used to ascertain the causation from v to u based on the measured time series with the indices Iut(εu), δvt(εu) and sv↪u (equations (2) and (3)).
Does the causation from v to u imply causation from y to x? The answer is affirmative, which can be argued, as follows. If the original map function f is independent of y: f(x, y) = f0(x), there is no causation from y to x. In this case, the embedding Eu(x, y) becomes independent of y, degenerating into the form of Eu(x, y) = Eu0(x), a diffeomorphism from ℳ to only. As a result, equation (4) becomes and , where and the resulting mapping is independent of v. The independence can be validated by computing the slope sv↪u associated with the scaling relation between 〈δvt〉t∈ℕ and lnεu, where a zero slope indicates null causation from v to u and hence null causation from y to x. Conversely, a finite slope signifies causation between the variables. Thus, any type of causal relation (unidirectional or bi-directional) detected between the reconstructed state variables {(ut, vt)}t∈ℕ implies the same type of causal relation between the internal but inaccessible variables x and y of the original system.
Case III. —
The structure of the internal variables is completely unknown. Given the observational functions : ℳ × 𝒩⟶ℝ with and , we first reconstruct the state space: and . To detect and quantify causation from to (or vice versa), we carry out a continuity scaling analysis with the modified indices , and . Differing from Case II, here, due to the lack of knowledge about the correspondence structure between the internal and observational variables, a causal relation for the latter does not definitely imply the same for the former.
Case IV. —
Continuous-time dynamical systems possessing a sufficiently smooth flow {St; t ∈ ℝ} on a compact manifold ℋ: dSt(u0)/dt = χ(St(u0)), where χ is the vector field. Let and be two respective time series from the smooth observational functions : ℋ⟶ℝ with and , where 1/ω is the sampling rate and ν is the time shift. Defining Ξ≜Sω: ℋ⟶ℋ and , we obtain a discrete-time system as with the observational functions as and , reducing the case to Case III and rendering applicable our continuity scaling analysis to unveil and quantify the causal relation between and . If the domains of and have their own restrictions on some particular subspaces, e.g., : ℋu⟶ℝ and : ℋv⟶ℝ with ℋ = ℋu ⊕ ℋv, the case is further reduced to Case II, so the detected causal relation between the observational variables imply causation between the internal variables belonging to their respective subspaces.
3. Demonstrations: From Complex Dynamical Models to Real-World Networks
To demonstrate the efficacy of our continuity scaling framework and its superior performance, we have carried out extensive numerical tests with a large number of synthetic and empirical datasets, including those from gene regulatory networks as well as those of air pollution and hospital admission. The practical steps of the continuity scaling framework together with the significance test procedures are described in Methods. We present three representative examples here, while leaving others of significance to SI.
The first example is an ecological model of two unidirectionally interacting species: x1,t+1 = x1,t(3.8 − 3.8x1,t − μ12x2,t) and x2,t+1 = x2,t(3.7 − 3.7x2,t − μ21x1,t). With time series {(x1,t, x2,t)}t∈ℕ obtained for different values of the coupling parameters, our continuity scaling framework correctly identifies the different degrees of unidirectional causation, as shown in Figures 3(a) and 3(b). In all cases, there exists a reasonable range of lnεx2 (neither too small nor too large) from which the slope sx1↪x2 of the linear scaling can be extracted. The statistical significance of the estimated slope values and consequently the strength of causation can be assessed with the standard p-value test [46] (Methods and SI). An ecological model with bidirectional coupling has also been tested (see Section III of SI). Figures 3(c) and 3(d) show the results from ecological networks of five mutually interacting species on a ring and on a tree structure, respectively, where the color-coded slope values accurately reflect the interaction patterns in both cases.
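The two-species map above is simple enough to simulate directly. The sketch below generates such time series in Python; the function name `simulate_two_species`, the initial condition, and the burn-in length are illustrative assumptions that are not specified in the text.

```python
import numpy as np

def simulate_two_species(mu12, mu21, T=5000, x0=(0.4, 0.2), burn_in=1000):
    """Iterate the coupled logistic-type map and return an array of shape (T, 2).

    x0 and burn_in are assumptions of this sketch, not values from the paper.
    """
    x = np.empty((T + burn_in, 2))
    x[0] = x0
    for t in range(T + burn_in - 1):
        x1, x2 = x[t]
        x[t + 1, 0] = x1 * (3.8 - 3.8 * x1 - mu12 * x2)
        x[t + 1, 1] = x2 * (3.7 - 3.7 * x2 - mu21 * x1)
    return x[burn_in:]                    # discard the transient

# Unidirectional coupling x1 -> x2: mu12 = 0 and mu21 > 0.
series = simulate_two_species(mu12=0.0, mu21=0.1)
```

With μ12 = 0, the first column evolves as an autonomous logistic map, so changing μ21 alters only the second column.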
Figure 3.
Ascertaining and characterizing causation in various ecological systems of interacting species. (a, b) Unidirectional causation of two coupled species. In (a), the values of the slope sx1↪x2 associated with the causal relation x1↪x2 are approximately 0.0004, 0.1167, 0.1203, and 0.1238 for four different values of the coupling parameter μ21. (b) Near zero slope values sx2↪x1 for x2↪x1, indicating its nonexistence. (c, d) Inferred causal network of five species whose interacting structure is, respectively, that of a ring: xi↪xi+1(mod5) (i = 1, ⋯, 5) and of a tree: xj↪xj+1,j+3 (j = 1, 2), where the estimated slope values are color-coded. Results of a statistical analysis of the accuracy and reliability of the determined causal interactions are presented in SI Section III. Time series of length 5000 are used in all these simulations. The embedding parameters are τs = 1 and ds = 3 with s = x1, ⋯, x5.
The second example is the coupled Lorenz system: , , with i, j = 1, 2 and . We use time series {y1,t, y2,t}t=nω for detecting different configurations of causation (see Section III of SI). Figure 4 presents the overall result, where the color-coded estimated values of the slope from the continuity scaling are shown for different combinations of the sampling rate 1/ω and the coupling strength. Even with a relatively low sampling rate, our continuity scaling framework can successfully detect causation and quantify its strength. Note that the accuracy does not vary monotonically with the sampling rate, indicating the potential of our framework to ascertain and quantify causation even with sparsely sampled data. Moreover, the proposed index can accurately reflect the true causal strength (denoted by the coupling parameter), which is also evidenced by numerical tests in Sections III and IV of SI. Robustness tests against different noise perturbations are provided in Section III of SI, demonstrating the practical usefulness of our framework. Additionally, analogous to the first example, we present in SI several examples of causation detection in the coupled Lorenz system with nonlinear couplings, the Rössler-Lorenz system, etc., which further demonstrate the generic efficacy of our framework.
Figure 4.
Detecting causation in the unidirectionally coupled Lorenz system. The results are for different values of μ21 (μ12 = 0) and sampling rate 1/ω. (a, b) Color-coded values of the slopes sy1↪y2 and sy2↪y1, respectively. The integration time step is 10−3 and the embedding parameters are ds = 7, τs ≈ 0.05 with ω|τs (s = y1 or y2). See Section III and Table S9 of SI for all the other parameters including the time series lengths used in the simulations.
In addition, we present studies on several real-world datasets, which bring new insights into the evolutionary mechanisms of the underlying systems. We study gene expression data from the DREAM4 in silico Network Challenge [47, 48], whose intrinsic gene regulatory networks (GRNs) are known for verification (Figure 5(a) and Figure S17 of SI). Applying the continuity scaling framework to these data, we ascertain the causation between each pair of genes. The corresponding ROC curves for five different networks as well as their AUROC values are shown in Figure 5(b), which indicates a high detection accuracy in dealing with real-world data.
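For readers unfamiliar with the evaluation, the AUROC of a set of pairwise causal scores against a known network can be computed as below. This is a generic rank-based sketch, not the authors' evaluation code; the function name `auroc` and the use of the estimated slope of each ordered gene pair as the score are assumptions.

```python
import numpy as np

def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: one value per ordered pair (assumed here: the estimated slope).
    labels: 1/True where the known network contains that directed edge.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    # Fraction of (edge, non-edge) pairs ranked correctly; ties count one half.
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))
```

A value of 1 means every true edge receives a higher score than every non-edge, while 0.5 corresponds to random ranking.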
Figure 5.
Detecting causal interactions in five GRNs. (a) One representative GRN containing 20 randomly selected genes. The other four structures can be found in Figure S17 of SI. (b) The ROC curves as well as their AUROC values demonstrate the efficacy of our framework.
We then test the causal relationships in a marine ecosystem consisting of Pacific sardine landings, northern anchovy landings, and sea surface temperature (SST). We reveal new findings that support the competing relationship hypothesis stated in [49], which cannot be detected by CCM [25]. As shown in Figure 6, while the common influence of SST on both species is verified with both methods, our continuity scaling additionally reveals a notable influence from anchovy to sardine, with the reverse direction being less significant. Given that competing relationships play an important role in ecosystems, continuity scaling can thus reveal more essential interaction mechanisms. See Section III.E of SI for more details.
Figure 6.
The comparison of causal network structure detected by continuity scaling and CCM among sea surface temperature, sardine, and anchovy.
Moreover, we study the transmission mechanism of the recent COVID-19 pandemic. In particular, we analyze the daily new cases of COVID-19 in representative countries for two stages: day 1 (January 22nd, 2020) to day 100 (April 30th, 2020) and day 101 (May 1st, 2020) to day 391 (February 15th, 2021). Our continuity scaling is applied pairwise to reconstruct the causal network of transmission. As shown in Figure 7, China shows a significant effect on a few countries at the first stage, and this effect disappears at the second stage. In contrast, the situation differs for other countries, whose external effects persist, as shown in Section III.E and Figure S18 of SI. Our results are consistent with China's stringent epidemic control strategies and sporadic domestic infections, as evidenced by official daily briefings, demonstrating the potential of continuity scaling for detecting causal networks in ongoing complex systems. Additionally, we emphasize that day 100 is a suitable critical day for distinguishing the early severe stage from the late well-under-control stage of the pandemic (see Figure S18(a) of SI), and a slight change of the critical day does not nullify our result. As shown in Figure S18(b) of SI, when the critical day varies from day 94 to day 106, no significant change (less than 5%) in the detected causal links occurs at either stage, and the number of countries under the influence of China at Stage 2 remains zero. See more details in Section III.E of SI.
Figure 7.
The causal effect from China to other countries of the COVID-19 pandemic detected by continuity scaling at stages 1 and 2. Here, stage 1 is from January 22nd, 2020 to April 30th, 2020, and stage 2 is from May 1st, 2020 to February 15th, 2021. For the detected causal links between all countries, refer to Section III.E and Figure S18 of SI. These maps are for illustration only.
Additional real-world examples, including air pollutants and hospital admission records from Hong Kong, are also shown in Section III of SI.
4. Discussion
To summarize, we have developed a novel framework for data-based detection and quantification of causation in complex dynamical systems. Building on the widely used cross-map-based techniques, our framework enjoys a rigorous foundation, focusing directly on the continuity scaling law of the system concerned instead of only investigating the continuity of its cross-map. Therefore, our framework is consistent with the standard interpretation of causality, and it works even in typical cases where several existing methods do not perform well or even fail (see the comparison results in Section IV of SI). In addition, the mathematical reasoning leading to the core of our framework, the continuity scaling, helps resolve the long-standing issue associated with techniques that use the cross-map directly, namely that information about the resulting variables is required to project the dynamical behavior of the causal variables; several works in the literature [50], which directly studied the continuity or the smoothness of the cross-map, likely yielded confusing results on causal directions.
Computational complexity. The computational complexity of the algorithm is O(T²Nε), which is relatively smaller than that of the CCM method, O(T² log T).
Limitations and future works. There is still room for improving the proposed framework. First, only a bivariate detection algorithm is currently designed, so generalization to multivariate network inference requires further consideration, analogous to the works presented in Refs. [51–53]. Second, the causal time delay has not been taken into account in the current framework, so it could also be investigated further, similar to the work reported in Ref. [33]. Also, more advanced algorithms, such as the one developed in Ref. [54], could be integrated into this framework for detecting temporal causal structures. We will address these questions in our future work.
Detecting causality in complex dynamical systems has broad applications not only in science and engineering but also in many aspects of modern society, demanding accurate, efficient, and rigorously justified, hence trustworthy, methodologies. Our present work provides a vehicle toward this goal and indeed resolves the puzzles arising in the use of those influential methods.
5. Methods
Continuity scaling framework: a detailed description of algorithms. Let {ut}t=1,2,⋯,T and {vt}t=1,2,⋯,T be two experimentally measured time series of the internal variables {(xt, yt)}t∈ℕ. Typically, if the dynamical variables {(xt, yt)}t∈ℕ are accessible, {(ut, vt)} reduces to one-dimensional coordinates of the internal system. The key computational steps of our continuity scaling framework are described as follows.
We reconstruct the phase space using the classical method of delay coordinate embedding [37] with the optimal embedding dimension dz and time lag τz determined by the methods in Refs. [55, 56] (i.e., the false nearest neighbors and the delayed mutual information, respectively):
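\[ \mathcal{L}_z = \{ z(t) = (z_t, z_{t+\tau_z}, \cdots, z_{t+(d_z-1)\tau_z})^{\top} \mid t = 1, 2, \cdots, T_0 \}, \tag{6} \]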
where z = u, v, T0 = min{T − (dz − 1)τz|z = u, v}, and Euclidean distance is used for both ℒu,v.
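The delay-coordinate reconstruction described above can be sketched as follows. The function name `delay_embed` is an assumption, and the embedding dimension and delay are taken as given rather than selected by the false-nearest-neighbors and mutual-information criteria mentioned in the text.

```python
import numpy as np

def delay_embed(z, dim, tau):
    """Return the delay vectors (z_t, z_{t+tau}, ..., z_{t+(dim-1)tau}) as rows."""
    z = np.asarray(z, dtype=float)
    n = len(z) - (dim - 1) * tau          # number of complete delay vectors
    if n <= 0:
        raise ValueError("time series too short for this dim and tau")
    return np.column_stack([z[k * tau : k * tau + n] for k in range(dim)])
```

When the two series are embedded with different parameters, both arrays can be truncated to the common length T0 = min{T − (dz − 1)τz} so that rows with the same index refer to the same time.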
We present the steps for causation detection using the case of v↪u as an example.
We calculate the respective diameters for ℒu,v as
\[ D_z = \max \{ \mathrm{dist}(z(t), z(\tau)) \mid 1 \le t, \tau \le T_0 \}, \tag{7} \]
where z = u, v. We set up a group of numbers, {εu,j}j=1,⋯,Nε, as εu,1 = e · Du, εu,Nε = Du, with the other elements satisfying
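\[ \frac{\ln \varepsilon_{u,j} - \ln \varepsilon_{u,1}}{j - 1} = \frac{\ln \varepsilon_{u,N_\varepsilon} - \ln \varepsilon_{u,1}}{N_\varepsilon - 1} \tag{8} \]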
for j = 2, ⋯, Nε − 1. Then, in light of (2) with (3), we have
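\[ \delta_v^t(\varepsilon_u) = \#[I_u^t(\varepsilon_u)]^{-1} \sum_{\tau \in I_u^t(\varepsilon_u)} \mathrm{dist}(v(t), v(\tau)), \tag{9} \]
with
\[ I_u^t(\varepsilon_u) = \{ \tau \in \mathbb{N}_{T_0} \mid \mathrm{dist}(u(t+1), u(\tau+1)) < \varepsilon_u,\ |t - \tau| > E \}, \tag{10} \]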
where numerically, εu alters its value successively from the set {εu,j}j=1,⋯,Nε, and the threshold E is a positive number chosen to avoid the situation where the nearest neighboring points are induced by the consecutive time order only.
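The construction of the ε values and the time-averaged δ can be sketched in Python as below. This is a hedged illustration rather than the authors' code: logarithmically equal spacing of the ε values is assumed, the Euclidean distance is used as stated in the text, and times without any admissible neighbor are simply skipped, which is a simplification. The names `epsilon_grid` and `mean_delta` are hypothetical.

```python
import numpy as np

def epsilon_grid(diameter, e=0.001, n_eps=33):
    """Radii from e * diameter to diameter, equally spaced in ln(eps) (assumed)."""
    return np.exp(np.linspace(np.log(e * diameter), np.log(diameter), n_eps))

def mean_delta(U, V, eps, E=1):
    """Time average of delta_v^t(eps) for testing causation from v to u.

    U, V: 2-D arrays of embedded states, one row per time step.
    E: threshold excluding neighbours that are close only in time.
    """
    T0 = min(len(U), len(V))
    times = np.arange(T0 - 1)
    deltas = []
    for t in times:
        d_u = np.linalg.norm(U[1:T0] - U[t + 1], axis=1)   # dist(u(t+1), u(tau+1))
        near = (d_u < eps) & (np.abs(times - t) > E)
        if near.any():
            d_v = np.linalg.norm(V[:T0 - 1][near] - V[t], axis=1)  # dist(v(t), v(tau))
            deltas.append(d_v.mean())
    # Simplification of this sketch: times with no neighbour are ignored.
    return float(np.mean(deltas)) if deltas else float("nan")
```

Evaluating `mean_delta` once per radius costs on the order of T² distance computations, so sweeping the whole grid is consistent with the O(T²Nε) complexity stated in the Discussion.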
As defined, 〈δvt(εu)〉t∈ℕ is the average of δvt(εu) over all possible times t. We use a finite number of pairs {(〈δvt(εu,j)〉t∈ℕT0, lnεu,j)}j=1,⋯,Nε to approximate the scaling relation between 〈δvt(εu)〉t∈ℕ and lnεu, where ℕT0 = {1, 2, ⋯, T0}. Theoretically, a larger value of Nε and a smaller value of e result in a more accurate approximation of the scaling relation. In practice, the accuracy is determined by the length of the observational time series, the sampling duration, and different types of noise perturbations. In numerical simulations, we set e = 0.001 and Nε = 33. In addition, a value of εu that is too large or too small can leave insufficient data to restore the neighborhood and/or the entire manifold. Because the number of points in a small neighborhood is limited in practice, we set δvt(εu,j) = δvt(εu,j+1) as a practical technique. As a result, near-zero slope values appear on both sides of the scaling curve of 〈δvt(εu)〉t∈ℕ versus lnεu, as demonstrated in Figure 3 and in SI. In such a case, to estimate the slope of the scaling relation, we take the following approach.
Define a group of numbers by
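\[ S_j = \frac{\langle \delta_v^t(\varepsilon_{u,j+1}) \rangle_{t \in \mathbb{N}_{T_0}} - \langle \delta_v^t(\varepsilon_{u,j}) \rangle_{t \in \mathbb{N}_{T_0}}}{\ln \varepsilon_{u,j+1} - \ln \varepsilon_{u,j}}, \tag{11} \]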
where j = 1, ⋯, Nε − 1, sort them in a descending order, from which we determine the [Nε + 1/2] largest numbers, collect their subscripts - j's together as an index set , and set . Applying the least squares method to the linear regression model:
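\[ \langle \delta_v^t(\varepsilon_{u,j}) \rangle_{t \in \mathbb{N}_{T_0}} = S \cdot \ln \varepsilon_{u,j} + b, \tag{12} \]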
with the dataset {(〈δvt(εu,j)〉t∈ℕT0, lnεu,j)}j∈H, we get the optimal values for the parameters (S, b) in (12) and finally obtain the slope of the scaling relation as sv↪u = S, the optimal value of the slope parameter.
For the other causal direction from u to v, these steps are equally applicable to estimating the slope su↪v.
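The slope extraction can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: the sorted numbers are taken to be the local slopes between consecutive points of the scaling curve, the number of retained values is taken to be the integer part of (Nε + 1)/2, and the regression set is taken to consist of both endpoints of each retained interval. The function name `scaling_slope` is hypothetical.

```python
import numpy as np

def scaling_slope(log_eps, mean_deltas):
    """Least-squares slope over the steepest part of the scaling curve."""
    log_eps = np.asarray(log_eps, dtype=float)
    d = np.asarray(mean_deltas, dtype=float)
    local = np.diff(d) / np.diff(log_eps)      # local slopes between neighbouring radii
    k = (len(log_eps) + 1) // 2                # assumed reading of [(N_eps + 1)/2]
    top = np.argsort(local)[::-1][:k]          # indices of the k largest local slopes
    H = np.union1d(top, top + 1)               # assumption: keep both interval endpoints
    slope, _intercept = np.polyfit(log_eps[H], d[H], 1)
    return float(slope)
```

Restricting the fit to the steepest intervals keeps the flat plateaus at very small and very large radii from biasing the estimated slope towards zero.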
To assess the statistical significance of the numerically determined causation, we devise the following surrogate test using the case of v causing u as an illustrative example.
Divide the time series {u(t)}t∈ℕT0 into NG consecutive segments of equal length (except for the last segment - the shortest segment). Randomly shuffle these segments and then regroup them into a surrogate sequence . Applying such a random permutation method to {v(t)}t∈ℕT0 generates another surrogate sequence . Carrying out the slope computation yields . The procedure can be repeated for a sufficient number of times, say Q, which consequently yields a group of estimated slopes, denoted as , where is set as sv↪u obtained from the original time series. For all the estimated slopes, we calculate their mean and the standard deviation . The p-value for sv↪u is calculated as
(13) |
where normcdf[·] is the cumulative Gaussian distribution function. Following the principle of statistical hypothesis testing, causation from v to u is regarded as statistically significant if psv↪u < 0.05.
In simulations, we set the number of segments to be NG = 25 and the number of times for random permutations to be Q ≥ 20.
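A sketch of the surrogate test is given below. It is illustrative only: the one-sided form p = 1 − normcdf[(s − mean)/std], the use of the sample standard deviation, and the helper names `segment_shuffle` and `surrogate_p_value` are assumptions; the slope of each surrogate pair is assumed to be computed separately with the same slope procedure as for the original series.

```python
import math
import numpy as np

def segment_shuffle(z, n_segments, rng):
    """Cut z into consecutive segments of equal length (the last may be shorter)
    and concatenate them in random order. When the length is not divisible,
    the number of segments can differ slightly from n_segments."""
    z = np.asarray(z)
    seg_len = math.ceil(len(z) / n_segments)
    segments = [z[i:i + seg_len] for i in range(0, len(z), seg_len)]
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order])

def surrogate_p_value(s_original, s_surrogates):
    """One-sided p-value of the original slope against the surrogate slopes.

    The original slope is included in the pool used for the mean and
    standard deviation, following the description in the text.
    """
    s_all = np.append(np.asarray(s_surrogates, dtype=float), s_original)
    mu, sigma = s_all.mean(), s_all.std(ddof=1)
    if sigma == 0.0:
        return float("nan")                      # all slopes identical
    z = (s_original - mu) / sigma
    normcdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - normcdf
```

Shuffling whole segments rather than single points preserves the short-term dynamics within each series while destroying the temporal alignment between the two series.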
Acknowledgments
W.L. is supported by the National Key R&D Program of China (Grant No. 2018YFC0116600), by the National Natural Science Foundation of China (Grant Nos. 11925103 and 61773125), by the STCSM (Grant No. 18DZ1201000), and by the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103). Y.-C.L. is supported by AFOSR (Grant No. FA9550-21-1-0438). S.-Y.L. is supported by the National Natural Science Foundation of China (No. 12101133) and “Chenguang Program” supported by Shanghai Education Development Foundation and Shanghai Municipal Education Commission (No. 20CG01). Q.N. is partially supported by NSF (Grant No. DMS1763272) and the Simons Foundation (Grant No. 594598). H.-F.M. is supported by the National Natural Science Foundation of China (Grant No. 12171350) and by the National Key R&D Program of China (Grant No. 2018YFA0801100).
Additional Points
Code Availability. The source codes for our CS framework are available at https://github.com/bianzhiyu/ContinuityScaling.
Conflicts of Interest
The authors declare no competing interests.
Authors' Contributions
W.L. conceived the idea. X.Y., S.-Y.L., and W.L. designed and performed the research. X.Y., S.-Y.L., H.-F.M., and W.L. analyzed the data. H.-F.M., Y.-C.L., and Q.N. contributed data and analysis tools, and all the authors wrote the paper. X.Y. and S.-Y.L. contributed equally to this work.
Supplementary Materials
Supplementary materials: SI.pdf (where we include analytic and computational details of the results in the main text. This SI is helpful but not essential for understanding the main results of the paper.)
References
- 1. Bunge M. Causality and Modern Science. Routledge; 2017.
- 2. Pearl J. Causality. Cambridge University Press; 2013.
- 3. Runge J., Bathiany S., Bollt E., et al. Inferring causation from time series in earth system sciences. Nature Communications. 2019;10(1):p. 2553. doi: 10.1038/s41467-019-10105-3.
- 4. Collins F. S., Varmus H. A new initiative on precision medicine. New England Journal of Medicine. 2015;372(9):793–795. doi: 10.1056/NEJMp1500523.
- 5. Saxe G. N., Statnikov A., Fenyo D., et al. A complex systems approach to causal discovery in psychiatry. PloS One. 2016;11(3, article e0151174). doi: 10.1371/journal.pone.0151174.
- 6. Cox D. R., Hinkley D. V. Theoretical Statistics. CRC Press; 1979.
- 7. Cover T. M. Elements of Information Theory. John Wiley & Sons; 1999.
- 8. Pearl J. Causal inference in statistics: an overview. Statistics Surveys. 2009;3:96–146. doi: 10.1214/09-SS057.
- 9. Wiener N. The Theory of Prediction. Modern mathematics for engineers; 1956.
- 10. Granger C. W. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society. 1969;37(3):424–438. doi: 10.2307/1912791.
- 11. Haufe S., Nikulin V. V., Müller K. R., Nolte G. A critical assessment of connectivity measures for EEG data: a simulation study. NeuroImage. 2013;64:120–133. doi: 10.1016/j.neuroimage.2012.09.036.
- 12. Ding M., Chen Y., Bressler S. L. Granger causality: basic theory and application to neuroscience. Handbook of Time Series Analysis: recent theoretical developments and applications. 2006;437. doi: 10.1002/9783527609970.ch17.
- 13. Schreiber T. Measuring information transfer. Physical Review Letters. 2000;85(2):461–464. doi: 10.1103/PhysRevLett.85.461.
- 14. Frenzel S., Pompe B. Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters. 2007;99(20, article 204101). doi: 10.1103/PhysRevLett.99.204101.
- 15. Vicente R., Wibral M., Lindner M., Pipa G. Transfer entropy–a model-free measure of effective connectivity for the neurosciences. Journal of Computational Neuroscience. 2011;30(1):45–67. doi: 10.1007/s10827-010-0262-3.
- 16. Runge J., Heitzig J., Petoukhov V., Kurths J. Escaping the curse of dimensionality in estimating multivariate transfer entropy. Physical Review Letters. 2012;108(25, article 258701). doi: 10.1103/PhysRevLett.108.258701.
- 17. Sun J., Cafaro C., Bollt E. M. Identifying the coupling structure in complex systems through the optimal causation entropy principle. Entropy. 2014;16(6):3416–3433. doi: 10.3390/e16063416.
- 18. Cafaro C., Lord W. M., Sun J., Bollt E. M. Causation entropy from symbolic representations of dynamical systems. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2015;25, article 043106. doi: 10.1063/1.4916902.
- 19. Sun J., Taylor D., Bollt E. M. Causal network inference by optimal causation entropy. SIAM Journal on Applied Dynamical Systems. 2015;14(1):73–106. doi: 10.1137/140956166.
- 20. Solyanik-Gorgone M., Ye J., Miscuglio M., Afanasev A., Willner A. E., Sorger V. J. Quantifying information via Shannon entropy in spatially structured optical beams. Research. 2021;2021, article 9780760. doi: 10.34133/2021/9780760.
- 21. Hirata Y., Aihara K. Identifying hidden common causes from bivariate time series: a method using recurrence plots. Physical Review E. 2010;81(1, article 016203). doi: 10.1103/PhysRevE.81.016203.
- 22. Quiroga R. Q., Arnhold J., Grassberger P. Learning driver-response relationships from synchronization patterns. Physical Review E. 2000;61(5):5142–5148. doi: 10.1103/PhysRevE.61.5142.
- 23. Arnhold J., Grassberger P., Lehnertz K., Elger C. E. A robust method for detecting interdependences: application to intracranially recorded EEG. Physica D: Nonlinear Phenomena. 1999;134(4):419–430. doi: 10.1016/S0167-2789(99)00140-2.
- 24. Harnack D., Laminski E., Schünemann M., Pawelzik K. R. Topological causality in dynamical systems. Physical Review Letters. 2017;119(9, article 098301). doi: 10.1103/PhysRevLett.119.098301.
- 25. Sugihara G., May R., Ye H., et al. Detecting causality in complex ecosystems. Science. 2012;338(6106):496–500. doi: 10.1126/science.1227079.
- 26. Deyle E. R., Fogarty M., Hsieh C. H., et al. Predicting climate effects on Pacific sardine. Proceedings of the National Academy of Sciences. 2013;110(16):6430–6435. doi: 10.1073/pnas.1215506110.
- 27. Wang X., Piao S., Ciais P., et al. A two-fold increase of carbon cycle sensitivity to tropical temperature variations. Nature. 2014;506(7487):212–215. doi: 10.1038/nature12915.
- 28. Ma H., Aihara K., Chen L. Detecting causality from nonlinear dynamics with short-term time series. Scientific Reports. 2014;4:1–10. doi: 10.1038/srep07464.
- 29. McCracken J. M., Weigel R. S. Convergent cross-mapping and pairwise asymmetric inference. Physical Review E. 2014;90(6, article 062903). doi: 10.1103/PhysRevE.90.062903.
- 30. Ye H., Deyle E. R., Gilarranz L. J., Sugihara G. Distinguishing time-delayed causal interactions using convergent cross mapping. Scientific Reports. 2015;5(1, article 14750). doi: 10.1038/srep14750.
- 31. Clark A. T., Ye H., Isbell F., et al. Spatial convergent cross mapping to detect causal relationships from short time series. Ecology. 2015;96(5):1174–1181. doi: 10.1890/14-1479.1.
- 32. Jiang J.-J., Huang Z.-G., Huang L., Liu H., Lai Y.-C. Directed dynamical influence is more detectable with noise. Scientific Reports. 2016;6(1, article 24088). doi: 10.1038/srep24088.
- 33. Ma H., Leng S., Tao C., et al. Detection of time delays and directional interactions based on time series from complex dynamical systems. Physical Review E. 2017;96(1, article 012221). doi: 10.1103/PhysRevE.96.012221.
- 34. Amigó J. M., Hirata Y. Detecting directional couplings from multivariate flows by the joint distance distribution. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2018;28, article 075302. doi: 10.1063/1.5010779.
- 35. Wang Y., Yang J., Chen Y., De Maeyer P., Li Z., Duan W. Detecting the causal effect of soil moisture on precipitation using convergent cross mapping. Scientific Reports. 2018;8(1):1–8. doi: 10.1038/s41598-018-30669-2.
- 36. Leng S., Ma H., Kurths J., et al. Partial cross mapping eliminates indirect causal influences. Nature Communications. 2020;11(1):1–9. doi: 10.1038/s41467-020-16238-0.
- 37. Takens F. Dynamical Systems and Turbulence, Warwick 1980. Springer; 1981. Detecting strange attractors in turbulence.
- 38. Packard N. H., Crutchfield J. P., Farmer J. D., Shaw R. S. Geometry from a time series. Physical Review Letters. 1980;45(9):712–716. doi: 10.1103/PhysRevLett.45.712.
- 39. Sauer T., Yorke J. A., Casdagli M. Embedology. Journal of Statistical Physics. 1991;65(3-4):579–616. doi: 10.1007/BF01053745.
- 40. Stark J. Delay embeddings for forced systems. I. Deterministic forcing. Journal of Nonlinear Science. 1999;9(3):255–332. doi: 10.1007/s003329900072.
- 41. Stark J., Broomhead D. S., Davies M. E., Huke J. Delay embeddings for forced systems. II. Stochastic forcing. Journal of Nonlinear Science. 2003;13(6):519–577. doi: 10.1007/s00332-003-0534-4.
- 42. Muldoon M. R., Broomhead D. S., Huke J. P., Hegger R. Delay embedding in the presence of dynamical noise. Dynamics and Stability of Systems. 1998;13(2):175–186. doi: 10.1080/02681119808806259.
- 43. Spirtes P., Glymour C. N., Scheines R., Heckerman D. Causation, Prediction, and Search. MIT Press; 2001.
- 44. Cummins B., Gedeon T., Spendlove K. On the efficacy of state space reconstruction methods in determining causality. SIAM Journal on Applied Dynamical Systems. 2015;14(1):335–381. doi: 10.1137/130946344.
- 45. Kantz H., Schreiber T. Nonlinear Time Series Analysis. Vol. 7. Cambridge University Press; 2004.
- 46. Lancaster G., Iatsenko D., Pidde A., Ticcinelli V., Stefanovska A. Surrogate data for hypothesis testing of physical systems. Physics Reports. 2018;748:1–60. doi: 10.1016/j.physrep.2018.06.001.
- 47. Marbach D., Schaffter T., Mattiussi C., Floreano D. Generating realistic in silico gene networks for performance assessment of reverse engineering methods. Journal of Computational Biology. 2009;16(2):229–239. doi: 10.1089/cmb.2008.09TT.
- 48. Marbach D., Prill R. J., Schaffter T., Mattiussi C., Floreano D., Stolovitzky G. Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences. 2010;107(14):6286–6291. doi: 10.1073/pnas.0913357107.
- 49. Lasker R., Mac Call A. Proceedings of the Joint Oceanographic Assembly, Halifax, August 1982: General Symposia. Ontario: Department of Fisheries and Oceans; 1983. New ideas on the fluctuations of the clupeoid stocks off California.
- 50. Quyen M. L. V., Martinerie J., Adam C., Varela F. J. Nonlinear analyses of interictal EEG map the brain interdependences in human focal epilepsy. Physica D: Nonlinear Phenomena. 1999;127(3-4):250–266. doi: 10.1016/S0167-2789(98)00258-9.
- 51. Peters J., Janzing D., Schölkopf B. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press; 2017.
- 52. Runge J. Causal network reconstruction from time series: from theoretical assumptions to practical estimation. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2018;28, article 075310. doi: 10.1063/1.5025050.
- 53. Lou Y., Wang L., Chen G. Enhancing controllability robustness of q-snapback networks through redirecting edges. Research. 2019;2019, article 7857534:23. doi: 10.34133/2019/7857534.
- 54. Hou J.-W., Ma H.-F., He D., Sun J., Nie Q., Lin W. Harvesting random embedding for high-frequency change-point detection in temporal complex. National Science Review. 2022;9, article nwab228. doi: 10.1093/nsr/nwab228.
- 55. Fraser A. M., Swinney H. L. Independent coordinates for strange attractors from mutual information. Physical Review A. 1986;33(2):1134–1140. doi: 10.1103/PhysRevA.33.1134.
- 56. Kennel M. B., Brown R., Abarbanel H. D. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical Review A. 1992;45(6):3403–3411. doi: 10.1103/PhysRevA.45.3403.