Abstract
Based on nonlinear system theory, we introduce previously undescribed dependence measures for stationary causal processes. Our physical and predictive dependence measures quantify the degree of dependence of outputs on inputs in physical systems. The proposed dependence measures provide a natural framework for a limit theory for stationary processes. In particular, under conditions of quite simple form, we present limit theorems for partial sums, empirical processes, and kernel density estimates. The conditions are mild and easily verifiable because they are directly related to the data-generating mechanisms.
Keywords: nonlinear time series, limit theory, kernel estimation, weak convergence
Let εi, i ∈ ℤ, be independent and identically distributed (iid) random variables and let g be a measurable function such that
Xi = g(..., εi−1, εi)  [1]
is a properly defined random variable. Then (Xi) is a stationary process, and it is causal or nonanticipative in the sense that Xi does not depend on the future innovations εj, j > i. The causality assumption is quite reasonable in the study of time series. Wiener (1) considered the fundamental coding and decoding problem of representing stationary and ergodic processes in the form of Eq. 1. In particular, Wiener studied the construction of εi based on Xk, k ≤ i. The class of processes that Eq. 1 represents is huge, and it includes linear processes, Volterra processes, and many time series models. In certain situations, Eq. 1 is also called the nonlinear Wold representation. See refs. 2-4 for other deep contributions to the representation of stationary and ergodic processes by Eq. 1. To conduct statistical inference of such processes, it is necessary to consider the asymptotic properties of the partial sum Sn = X1 + ⋯ + Xn and the empirical distribution function Fn(x) = n^{−1} Σ_{i=1}^n 1_{Xi ≤ x}.
In probability theory, many limit theorems have been established for independent random variables. Those limit theorems play an important role in the related statistical inference. In the study of stochastic processes, however, independence usually does not hold, and dependence is an intrinsic feature. In an influential paper, Rosenblatt (5) introduced the strong mixing condition. For a stationary process (Xi), let the sigma algebra ℱ_m^n = σ(Xi, m ≤ i ≤ n), m ≤ n, and define the strong mixing coefficients
αn = sup{|P(A ∩ B) − P(A)P(B)|: A ∈ ℱ_{−∞}^0, B ∈ ℱ_n^∞}.  [2]
If αn → 0, then we say that (Xi) is strong mixing. Variants of the strong mixing condition include the ρ-, ψ-, and β-mixing conditions, among others (6). A central limit theorem (CLT) based on the strong mixing condition is proved in ref. 5. Since then, as basic assumptions on dependence structures, the strong mixing condition and its variants have been widely used and various limit theorems have been obtained; see the extensive treatment in ref. 6.
Since the quantity in Eq. 2 measures the dependence between the events A and B, and it is zero if A and B are independent, it is sensible to call αn and its variants "probabilistic dependence measures." For stationary causal processes, the calculation of probabilistic dependence measures is generally not easy because it involves the complicated manipulation of taking the supremum over two sigma algebras (7-9). Additionally, many well-known processes are not strong mixing. A prominent example is the Bernoulli shift process. Consider the simple AR(1) process Xn = (Xn−1 + εn)/2, where the εi are iid Bernoulli random variables with success probability 1/2 (see refs. 10 and 11). Then Xn is a causal process with the representation Xn = Σ_{i=0}^∞ 2^{−i−1} εn−i, and the innovations εn, εn−1, ... correspond to the dyadic expansion of Xn. The process Xn is not strong mixing since αn ≡ 1/4 for all n (12). Alternative ways have been proposed to overcome the disadvantages of strong mixing conditions (8, 9).
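The failure of strong mixing here can be seen directly. The following minimal simulation (a sketch assuming nothing beyond the recursion above) recovers each εn exactly from the first binary digit of Xn, so arbitrarily remote observations retain full information about individual innovations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate X_n = (X_{n-1} + eps_n)/2 with Bernoulli(1/2) innovations;
# then X_n = sum_{i>=0} 2^{-i-1} eps_{n-i}, the dyadic expansion
# 0.eps_n eps_{n-1} ... of a Uniform(0,1) random variable.
n = 1000
eps = rng.integers(0, 2, size=n)
x = np.empty(n)
x[0] = rng.uniform()          # arbitrary start; its effect decays like 2^{-n}
for t in range(1, n):
    x[t] = (x[t - 1] + eps[t]) / 2.0

# Decoding: the first binary digit of X_n is exactly eps_n, so the process
# never "forgets" its innovations and alpha_n cannot tend to zero.
recovered = (x[1:] >= 0.5).astype(int)
print(np.mean(recovered == eps[1:]))   # prints 1.0
```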
Dependence Measures
In this work, we shall provide another look at the fundamental issue of dependence. Our primary goal is to introduce "physical or functional" and "predictive" dependence measures, a previously undescribed type of dependence measure that is quite different from strong mixing conditions. In particular, following refs. 1 and 13, we shall interpret Eq. 1 as an input/output system and then introduce dependence coefficients by measuring the degree of dependence of outputs on inputs. Specifically, we view Eq. 1 as a physical system
xi = g(..., ei−1, ei),  [3]
where ei, ei−1, ... are inputs, g is a filter or a transform, and xi is the output. Then the process Xi is the output of the physical system 3 with random inputs. Clearly it is not a good idea to assess the dependence simply by taking the partial derivatives ∂g/∂ej, which may not exist if g is not well behaved. Nonetheless, because the inputs are random and iid, the dependence of the output on the inputs can be measured by applying the idea of coupling. Let (εi′) be an iid copy of (εi); let the shift process ξi = (..., εi−1, εi), i ∈ ℤ. For a set I ⊆ ℤ, let εj,I = εj′ if j ∈ I and εj,I = εj if j ∉ I; let ξi,I = (..., εi−1,I, εi,I). Then ξi,I is a coupled version of ξi with εj replaced by εj′ if j ∈ I. For p > 0 write X ∈ 𝓛p if ∥X∥p := [E(|X|^p)]^{1/p} < ∞ and ∥X∥ = ∥X∥2.
Definition 1 (Functional or physical dependence measure): For p > 0 and n ≥ 0, let δp(I, n) = ∥g(ξn) − g(ξn,I)∥p and δp(n) = δp({0}, n). Write δ(n) = δ2(n).
Definition 2 (Predictive dependence measure): Let p ≥ 1 and let gn be a Borel function such that gn(ξ0) = E(Xn | ξ0), n ≥ 0. Let ωp(I, n) = ∥gn(ξ0) − gn(ξ0,I)∥p and ωp(n) = ωp({0}, n). Write ω(n) = ω2(n).
Definition 3 (p-stability): Let p ≥ 1. The process (Xn) is said to be p-stable if Ωp := Σ_{n=0}^∞ ωp(n) < ∞, and p-strong stable if Σ_{n=0}^∞ δp(n) < ∞. If Ω = Ω2 < ∞, we say that (Xn) is stable.
By the causal representation in Eq. 1, if min{i: i ∈ I} > n, then δp(I, n) = 0. Apparently, δp(I, n) quantifies the dependence of Xn = g(ξn) on {εi, i ∈ I} by measuring the distance between g(ξn) and its coupled version g(ξn,I). In Definition 2, gn(ξ0) = E(Xn | ξ0) is the n-step-ahead predicted mean, and ωp(n) measures the contribution of ε0 in predicting future expected values. In the classical prediction theory (14), conditional expectations of the form E(Xn | Xi, i ≤ 0) are studied. The one used in Definition 2 has a different form. It turns out that, in studying asymptotic properties and moment inequalities of Sn, it is convenient to use E(Xn | ξ0) and the predictive dependence measure (cf. Theorems 2 and 3), whereas the other version is generally difficult to work with. In the special case in which the Xn are martingale differences with respect to the filtration σ(ξn), gn = 0 almost surely for n ≥ 1, and consequently ω(n) = 0, n ≥ 1.
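Because the coupled process g(ξn,I) is explicit, δp(n) can be estimated by plain Monte Carlo. The sketch below uses a hypothetical threshold AR(1) map purely for illustration, truncates the infinite past by a burn-in, and couples the two input sequences at a single coordinate, as in Definition 1; by Theorem 1(i) below, the resulting estimate also upper-bounds ωp(n):

```python
import numpy as np

rng = np.random.default_rng(1)

def delta_p_hat(a=0.7, n=10, p=2, burn=200, reps=20000):
    """Monte Carlo estimate of delta_p(n) = ||g(xi_n) - g(xi_{n,{0}})||_p
    for the (hypothetical) recursion X_t = a*max(X_{t-1}, 0) + eps_t."""
    eps = rng.standard_normal((reps, burn + n + 1))
    eps_prime = rng.standard_normal(reps)      # iid copy of eps_0
    x = np.zeros(reps)
    x_cpl = np.zeros(reps)
    for t in range(burn + n + 1):
        e = eps[:, t]
        e_cpl = eps_prime if t == burn else e  # couple at "time 0" only
        x = a * np.maximum(x, 0.0) + e
        x_cpl = a * np.maximum(x_cpl, 0.0) + e_cpl
    return np.mean(np.abs(x - x_cpl) ** p) ** (1.0 / p)

for n in (0, 5, 10):
    print(n, delta_p_hat(n=n))   # decays roughly geometrically, like a**n
```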
Roughly speaking, since Ωp = Σ_{n=0}^∞ ωp(n), the p-stability in Definition 3 indicates that the cumulative contribution of ε0 in predicting future expected values is finite. Interestingly, the stability condition Ω2 < ∞ implies invariance principles with √n-norming in a natural way (Theorem 3). By (i) of Theorem 1, p-strong stability implies p-stability since δp(n) ≥ ωp(n).
Our dependence measures provide a convenient and simple framework for a large-sample theory for stationary causal processes (see Theorems 2-5 below). In many applications, functional and predictive dependence measures are easy to use because they are directly related to the data-generating mechanisms and because the construction of the coupled process g(ξn,I) is simple and explicit. Additionally, limit theorems based on those dependence measures have easily verifiable conditions and are often optimal or nearly optimal. On the other hand, our dependence measures rely on the representation 1, whereas strong mixing coefficients can be defined in more general situations (6).
Theorem 1. (i) Let p ≥ 1 and n ≥ 0. Then δp(n) ≥ ωp(n). (ii) Let p ≥ 1 and define the projection operator Pk Z = E(Z | ξk) − E(Z | ξk−1), Z ∈ 𝓛1. Then for n ≥ 0,
∥P0 Xn∥p ≤ ωp(n) ≤ 2∥P0 Xn∥p.  [4]
(iii) Let p > 1. Let Cp = 18p^{3/2}(p − 1)^{−1/2} if 1 < p < 2 and, for p ≥ 2, let Cp be the constant in the Burkholder-type moment inequality of proposition 4 in ref. 16; let p′ = min(2, p). Then
∥g(ξn) − E[g(ξn) | εi, i ∉ I]∥p^{p′} ≤ Cp^{p′} Σ_{i ∈ I, i ≤ n} δp^{p′}(n − i).  [5]
Proof: (i) Since gn(ξ0) − gn(ξ0,{0}) = E[g(ξn) − g(ξn,{0}) | ξ0, ε0′], Jensen's inequality implies δp(n) ≥ ωp(n). (ii) Since E(Xn | ξ0) = gn(ξ0) and, by the independence of the εi and εi′, E(Xn | ξ−1) = E[gn(ξ0,{0}) | ξ0], we have P0 Xn = E[gn(ξ0) − gn(ξ0,{0}) | ξ0]. The first half of inequality 4 then follows from Jensen's inequality, and the second half from
ωp(n) ≤ ∥gn(ξ0) − E(Xn | ξ−1)∥p + ∥gn(ξ0,{0}) − E(Xn | ξ−1)∥p = 2∥P0 Xn∥p.
(iii) For presentational clarity, let I = {..., −1, 0}. For i ≤ 0 let
Di = E(Xn | εi, εi+1, ..., εn) − E(Xn | εi+1, ..., εn).
Then D0, D−1, ... are martingale differences with respect to the sigma algebras σ(εi, ..., εn), i = 0, −1, .... By Jensen's inequality, ∥Di∥p ≤ δp(n − i). Let T = Σ_{i ≤ 0} Di. Then T = Xn − E(Xn | εj, j ≥ 1) = g(ξn) − E[g(ξn) | εi, i ∉ I]. To show Eq. 5, we shall deal with the two cases 1 < p < 2 and p ≥ 2 separately. If 1 < p < 2, then by Burkholder's inequality (15),
∥T∥p^p ≤ Cp^p Σ_{i ≤ 0} ∥Di∥p^p ≤ Cp^p Σ_{i ≤ 0} δp^p(n − i).
If p ≥ 2, by proposition 4 in ref. 16, ∥T∥p^2 ≤ Cp^2 Σ_{i ≤ 0} ∥Di∥p^2. So Eq. 5 follows.
Inequality 5 suggests an interesting reduction property: the degree of dependence of Xn on {εi, i ∈ I} can be bounded in an element-wise manner, and it suffices to consider the dependence of Xn on the individual εi. Indeed, our limit theorems and moment inequalities in Theorems 2-5 involve conditions only on δp(n) and ωp(n).
Linear Processes. Let εi be iid random variables with εi ∈ 𝓛p, p ≥ 1; let (ai) be real coefficients such that
Xt = Σ_{i=0}^∞ ai εt−i  [6]
is a proper random variable. The existence of Xt can be checked by Kolmogorov's three-series theorem. The linear process (Xt) can be viewed as the output from a linear filter, and the input (..., εt−1, εt) is a series of shocks that drive the system (ref. 17, pp. 8-9). Clearly, δp(n) = ωp(n) = |an| ∥ε0 − ε0′∥p. Let p = 2. If
Σ_{i=0}^∞ |ai| < ∞,  [7]
then the filter is said to be stable (17), and Eq. 7 implies short-range dependence since the covariances of (Xt) are absolutely summable. Definition 3 extends the notion of stability to nonlinear processes.
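In the linear case the dependence measures are thus available in closed form, which makes Eq. 7 easy to check numerically. A small sketch with assumed (hypothetical) coefficients ai = 0.8^i:

```python
import numpy as np

# For the linear process in Eq. 6, replacing eps_0 by an iid copy changes X_n
# by a_n(eps_0 - eps_0'), so delta_p(n) = omega_p(n) = |a_n|*||eps_0 - eps_0'||_p.
rng = np.random.default_rng(2)
a = 0.8 ** np.arange(50)                  # assumed coefficients a_i = 0.8^i
d = rng.standard_normal(10**6) - rng.standard_normal(10**6)
c2 = np.sqrt(np.mean(d ** 2))             # ||eps_0 - eps_0'||_2, = sqrt(2) for N(0,1)
delta = np.abs(a) * c2                    # closed-form delta(n), n = 0, ..., 49
print(delta[:3], delta.sum())             # finite sum: Eq. 7 holds, filter stable
```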
Volterra Series. Analysis of nonlinear systems is a notoriously difficult problem, and the available tools are very limited (18). Oftentimes it would be unsatisfactory to linearize nonlinear systems or to approximate them by linear ones. The Volterra representation provides a reasonably simple and general alternative. The idea is to represent Eq. 3 as a power series of the inputs. In particular, suppose that g in Eq. 3 is sufficiently well behaved so that it has the stationary and causal representation
Xn = Σ_{k=1}^∞ Σ_{u1, ..., uk = 0}^∞ gk(u1, ..., uk) εn−u1 ⋯ εn−uk,  [8]
where the functions gk are called the Volterra kernels. The right-hand side of Eq. 8 is generically called the Volterra expansion, and it plays an important role in nonlinear system theory (13, 18-22). There is a continuous-time version of Eq. 8 with summations replaced by integrals. Because the series involved has infinitely many terms, to guarantee the meaningfulness of the representation there is a convergence issue that is often difficult to deal with, and the imposed conditions can be quite restrictive (18). Fortunately, in our setting the difficulty can be circumvented because we are dealing with iid random inputs. Indeed, assume that the εt are iid with mean 0 and variance 1, that gk(u1, ..., uk) is symmetric in u1, ..., uk and equals zero if ui = uj for some 1 ≤ i < j ≤ k, and that
Σ_{k=1}^∞ k! Σ_{u1, ..., uk = 0}^∞ gk^2(u1, ..., uk) < ∞.
Then Xn exists and is in 𝓛2. Simple calculations show that
δ(n)^2 = 2 Σ_{k=1}^∞ k ⋅ k! Σ_{u2, ..., uk = 0}^∞ gk^2(n, u2, ..., uk)
and
ω(n)^2 = 2 Σ_{k=1}^∞ k ⋅ k! Σ_{u2, ..., uk ≥ n} gk^2(n, u2, ..., uk).
The Volterra process is stable if Σ_{n=0}^∞ ω(n) < ∞.
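To make Eq. 8 concrete, here is a minimal simulation of a second-order Volterra process with truncated, hypothetical kernels (the truncation lag M and the kernel choices are ours, picked only so that the series converges quickly):

```python
import numpy as np

rng = np.random.default_rng(3)

# Truncated second-order Volterra process with hypothetical kernels
# g1(u) = 0.5**u and g2(u1, u2) = 0.3**(u1 + u2), symmetric and set to
# zero on the diagonal, as required above.
M, n = 20, 5000
g1 = 0.5 ** np.arange(M)
u = np.arange(M)
g2 = 0.3 ** (u[:, None] + u[None, :])
np.fill_diagonal(g2, 0.0)

eps = rng.standard_normal(n + M)
X = np.empty(n)
for t in range(M - 1, n + M - 1):
    w = eps[t - M + 1:t + 1][::-1]       # w[u] = eps_{t-u}, u = 0, ..., M-1
    X[t - M + 1] = g1 @ w + w @ g2 @ w   # linear term + full double-sum quadratic term
print(X.mean(), X.var())
```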
Nonlinear Transforms of Linear Processes. Let (Xt) be the linear process defined in Eq. 6 and consider the transformed process Yt = K(Xt), where K is a possibly nonlinear filter. Let ω(n, Y) be the predictive dependence measure of (Yt). Assume that the εi have mean 0 and finite variance. Under mild conditions on K, we have ∥P0 Yn∥ = O(|an|) (cf. theorem 2 in ref. 23). By Theorem 1, ω(n, Y) ≤ 2∥P0 Yn∥ = O(|an|). In this case, if (Xt) is stable, namely Eq. 7 holds, then (Yt) is also stable.
Quite interesting phenomena happen if (Xn) is unstable. Under appropriate conditions on K, (Yn) can nonetheless be stable: with a nonlinear transform, the dependence structure of (Yt) can be quite different from that of (Xn) (24-27). The asymptotic problem of the sums Sn(K) = Σ_{t=1}^n [K(Xt) − EK(Xt)] has a long history (see refs. 23 and 27 and references therein). Let K∞(w) = E[K(w + X0)] and κr = K∞^{(r)}(0), and assume that the derivatives κr exist for r = 0, ..., τ + 1 for some integer τ ≥ 0. Consider the remainder of the τ-th order Volterra expansion of Yn
L^{(τ)}(ξn) = K(Xn) − Σ_{r=0}^τ κr Un,r,  [9]
where Un,0 = 1 and, for r = 1, ..., τ,
Un,r = Σ_{0 ≤ j1 < ⋯ < jr} aj1 ⋯ ajr εn−j1 ⋯ εn−jr.
Under mild regularity conditions on K and εn, theorem 5 in ref. 23 bounds ∥P0 L^{(τ)}(ξn)∥. By Theorem 1, the predictive dependence measure ω^{(τ)}(n) of the remainder L^{(τ)}(ξn) then satisfies
ω^{(τ)}(n) ≤ 2∥P0 L^{(τ)}(ξn)∥.  [10]
It is possible that Σ_n ω^{(τ)}(n) < ∞ while Σ_n |an| = ∞. Consider the special case an = n^{−β} ℓ(n), where 1/2 < β < 1 and ℓ is a slowly varying function, namely, for any c > 0, ℓ(cn)/ℓ(n) → 1 as n → ∞. By Karamata's theorem (28), for j ≥ 2, Σ_{i=n}^∞ |ai|^j ∼ n^{1−jβ} ℓ^j(n)/(jβ − 1). If τ > (2β − 1)^{−1} − 1, then ω^{(τ)}(n) is summable. Therefore, if the function K satisfies κr = 0 for r = 0, ..., τ and (τ + 1)(2β − 1) > 1, then Yt = K(Xt) is stable even though Xt is not. Appell polynomials (29) satisfy such conditions. For example, let K(w) = w² − E(X0²); then K∞(w) = w² and κ1 = 0, κ2 = 2. If β ∈ (3/4, 1), then the process (Yt) is stable. If 1/2 < β < 3/4, then Sn(K)/∥Sn(K)∥ converges to the Rosenblatt distribution.
Uniform Volterra expansions for Fn(x) over x ∈ ℝ are established in refs. 30 and 31. Wu (32) considered nonlinear transforms of linear processes with infinite-variance innovations.
Nonlinear Time Series. Let εt be iid random variables and consider the recursion
Xt = R(Xt−1, εt),  [11]
where R is a measurable function. The framework 11 is quite general, and it includes many popular nonlinear time series models, such as threshold autoregressive models (33), exponential autoregressive models (34), bilinear autoregressive models, and autoregressive models with conditional heteroscedasticity (35), among others. If there exist α > 0 and x0 such that
E[log Lε0] < 0 and E(Lε0^α) + E(|x0 − R(x0, ε0)|^α) < ∞,  [12]
where
Lε = sup_{x ≠ x′} |R(x, ε) − R(x′, ε)| / |x − x′|,
then Eq. 11 admits a unique stationary distribution (36), and iterations of Eq. 11 give rise to Eq. 1. By theorem 2 in ref. 37, Eq. 12 implies that there exist p > 0, C < ∞, and r ∈ (0, 1) such that
δp(I, n) ≤ Cr^n,  [13]
where I = {..., −1, 0}. Recall that δp(I, n) = ∥g(ξn) − g(ξn,I)∥p. By stationarity, ∥g(ξn,{0}) − g(ξn,I)∥p = δp(I, n + 1). So Eq. 13 implies δp(n) ≤ δp(I, n) + δp(I, n + 1) = O(r^n). On the other hand, by Theorem 1(iii), if δp(n) = O(r^n) holds for some p > 1 and some r ∈ (0, 1), then Eq. 13 also holds. So they are equivalent if p > 1. In refs. 37 and 38, the property 13 is called geometric-moment contraction, and it is very useful in studying asymptotic properties of nonlinear time series.
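Geometric-moment contraction is easy to visualize: two copies of the recursion driven by the same innovations but started from different states merge geometrically fast. A sketch for a hypothetical bilinear instance of Eq. 11:

```python
import numpy as np

rng = np.random.default_rng(4)

# Bilinear recursion X_t = (a + b*eps_t) X_{t-1} + eps_t (hypothetical example).
# Here L_eps = |a + b*eps|, and E(L_eps) ≈ 0.51 < 1 for the parameters below,
# so Eq. 12 holds with alpha = 1. Coupling the whole past, as in Eq. 13,
# amounts to running two chains from different initial states.
a, b, n, reps = 0.5, 0.3, 40, 100_000
eps = rng.standard_normal((reps, n))
x = np.full(reps, 5.0)
y = np.full(reps, -5.0)
gap = []
for t in range(n):
    m = a + b * eps[:, t]
    x = m * x + eps[:, t]
    y = m * y + eps[:, t]
    gap.append(np.abs(x - y).mean())
print(gap[::10])   # decays roughly like 10 * 0.51**t, i.e., geometrically
```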
Inequalities and Limit Theorems
For (Xi) defined in Eq. 1, let Su = Sn + (u − n)Xn+1, n ≤ u ≤ n + 1, n = 0, 1, ..., be the partial sum process. Let Rn(s) = √n[Fn(s) − F(s)], where F(s) = P(X0 ≤ s) is the distribution function of X0. Primary goals in the limit theory of stationary processes include obtaining asymptotic properties of {Su, 0 ≤ u ≤ n} and {Rn(s), s ∈ ℝ}. Such results are needed in the related statistical inference. The physical and predictive dependence measures provide a natural vehicle for an asymptotic theory for Sn and Rn.
Partial Sums. Assume E(Xi) = 0 and Xi ∈ 𝓛p. Let Zn = max_{1 ≤ i ≤ n} |Si| / √n and Bp = p(p − 1)^{−1/2}, p > 1. Recall Ωp = Σ_{n=0}^∞ ωp(n) and let
Θp = Σ_{n=0}^∞ ∥P0 Xn∥p.
By Theorem 1, Θp ≤ Ωp ≤ 2Θp. Moment inequalities and limit theorems for Sn are given in Theorems 2 and 3, respectively. Denote by IB the standard Brownian motion. An interesting feature of the large deviation result in Theorem 2(ii) is that Ωp and the Xk do not need to be bounded.
Theorem 2. Let p ≥ 2. (i) We have ∥Zn∥p ≤ BpΘp ≤ BpΩp. (ii) Let 0 < α ≤ 2 and assume
γα := sup_{p ≥ 2} p^{α/2 − 1} Ωp^α < ∞.  [14]
Then supn E[exp(t Zn^α)] < ∞ for 0 ≤ t < t0, where t0 = (eαγα)^{−1} 2^{−α/2}. Consequently, for u > 0, supn P(Zn ≥ u) ≤ Ct exp(−t u^α), where Ct = supn E[exp(t Zn^α)] < ∞.
Proof: (i) It follows from W.B.W. (unpublished results) and theorem 2.5 in ref. 39. For completeness we present the proof here. Let Mn,k = Σ_{i=1}^n P_{i−k} Xi, k ≥ 0. Then Sn = Σ_{k=0}^∞ Mn,k, and for fixed k the summands P_{i−k} Xi, 1 ≤ i ≤ n, are martingale differences. By Doob's maximal inequality and theorem 2.5 in ref. 39 (or proposition 4 in ref. 16),
∥max_{i ≤ n} |Mi,k|∥p ≤ p(p − 1)^{−1/2} [Σ_{i=1}^n ∥P_{i−k} Xi∥p^2]^{1/2} = Bp √n ∥P0 Xk∥p.
Since Zn ≤ n^{−1/2} Σ_{k=0}^∞ max_{i ≤ n} |Mi,k|, (i) follows. (ii) Let Z = Zn and p0 = [2/α] + 1. By Stirling's formula and Eq. 14, there is a constant c, not depending on n, such that for m ≥ p0,
t^m E(Z^{αm})/m! ≤ t^m (Bαm Ωαm)^{αm}/m! ≤ c (t/t0)^m.
By (i), since ∥Z∥αm ≤ Bαm Ωαm for αm ≥ 2, (ii) follows from
E[exp(tZ^α)] = Σ_{m=0}^∞ t^m E(Z^{αm})/m! ≤ Σ_{m < p0} t^m [1 + E(Z²)]/m! + c Σ_{m ≥ p0} (t/t0)^m < ∞,
where E(Z²) ≤ (B2 Ω2)² is bounded uniformly in n.
Example 1: For the linear process 6, assume that
lim sup_{n→∞} n^{−2} log |an| < 0  [15]
and A := E|ε0| < ∞. We now apply (ii) of Theorem 2 to the sum Σ_{i=1}^n g̃(ξi), where g̃(ξi) = 1_{Xi ≤ u} − F(u). To this end, we need to calculate the predictive dependence measure ωp(n, g̃) (say) of the process g̃(ξn). Without loss of generality let a0 = 1. Let Fε and fε be the distribution and density functions of ε0 and assume c := supu fε(u) < ∞. Then Eq. 14 holds with α = 1. To see this, let Yn−1 = Xn − εn, Zn−1 = Yn−1 − anε0 and Qn = min(1, c|an| |ε0 − ε0′|). Let n ≥ 1. Then E(1_{Xn ≤ u} | ξ0) = E[Fε(u − Zn−1 − anε0) | ξ0] and E(1_{Xn ≤ u} | ξ0,{0}) = E[Fε(u − Zn−1 − anε0′) | ξ0, ε0′]. By the triangle inequality,
|E(1_{Xn ≤ u} | ξ0) − E(1_{Xn ≤ u} | ξ0,{0})| ≤ E(Qn | ε0, ε0′) = Qn.
Hence, ωp(n, g̃) ≤ ∥Qn∥p. Since E|ε0 − ε0′| ≤ 2A, we have E(Qn) ≤ min(1, C|an|). Clearly, 0 ≤ Qn ≤ 1. So ωp^p(n, g̃) ≤ E(Qn^p) ≤ E(Qn) ≤ min(1, C|an|), where C = 2cA. For η > 0 let the set Nη = {n ≥ 1: C|an| ≥ e^{−ηn²}}. By Eq. 15, Nη is finite for all sufficiently small η > 0, so that
Σ_{n=0}^∞ ωp(n, g̃) ≤ 1 + #Nη + Σ_{n=1}^∞ e^{−ηn²/p} = O(√p),
and Eq. 14 holds with α = 1. Condition 15 holds if, for example, an = O(ρ^{n²}) for some ρ ∈ (0, 1).
Theorem 3. (i) Assume that Ω2 < ∞. Then
{Snu/√n, 0 ≤ u ≤ 1} ⇒ {σ IB(u), 0 ≤ u ≤ 1},  [16]
where σ = ∥Σ_{k=0}^∞ P0 Xk∥. (ii) Let 2 < p ≤ 4 and assume that Xi ∈ 𝓛p and Ωp < ∞. Then on a possibly richer probability space, there exists a Brownian motion IB such that
Sn − σ IB(n) = o_{a.s.}[n^{1/p} l(n)],  [17]
where l(n) = (log n)1/2+1/p(log log n)2/p.
The proof of the strong invariance principle (ii) is given by W.B.W. (unpublished results). Theorem 3(i) follows from corollary 3 in ref. 40, and the expression for σ is a consequence of the martingale approximation: let Dk = Σ_{i=k}^∞ Pk Xi and Mn = D1 + ⋯ + Dn; then ∥Sn − Mn∥ = o(√n) and ∥Sn∥/√n = σ + o(1) (see theorem 6 in ref. 41). Theorem 3(i) can also be proved by using the argument in ref. 42. The invariance principle in the latter paper has a slightly different form. We omit the details. See refs. 43 and 44 for some related works.
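The √n-norming and the long-run variance σ² are easy to check numerically. For the standard AR(1) example Xt = 0.6Xt−1 + εt (our choice of parameters), σ = Σi ai ∥ε0∥ = 1/(1 − 0.6), so var(Sn/√n) should approach 6.25:

```python
import numpy as np

rng = np.random.default_rng(6)

# Numerical check of Theorem 3's sqrt(n)-norming for X_t = 0.6 X_{t-1} + eps_t:
# sigma = ||sum_k P_0 X_k|| = (sum_i a_i)*||eps_0|| = 1/(1 - 0.6) = 2.5.
n, reps = 2000, 4000
X = np.zeros(reps)
S = np.zeros(reps)
for t in range(n):
    X = 0.6 * X + rng.standard_normal(reps)
    S += X
print(np.var(S / np.sqrt(n)))   # close to sigma^2 = 6.25
```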
Empirical Distribution Functions. Let Fi(u | ξ0) = P(Xi ≤ u | ξ0), i ≥ 1, be the conditional distribution function of Xi given ξ0. By Definition 2, the predictive dependence measure for g̃(ξi) = 1_{Xi ≤ u} − F(u), at a fixed u, is ωp(i, g̃) = ∥Fi(u | ξ0) − Fi(u | ξ0,{0})∥p. To study the asymptotic properties of Rn, it is certainly necessary to consider the whole range u ∈ (−∞, ∞). To this end, we introduce the integrated predictive dependence measure
ϑ(i) = [∫_{−∞}^{∞} ∥Fi(u | ξ0) − Fi(u | ξ0,{0})∥² du]^{1/2}  [18]
and the uniform predictive dependence measure
ω̃(i) = sup_{u ∈ ℝ} ∥Fi(u | ξ0) − Fi(u | ξ0,{0})∥,  [19]
where Fi(u | ξ0,{0}), i ≥ 1, is the coupled version of Fi(u | ξ0). Theorem 4 below concerns the weak convergence of Rn based on these measures. It follows from corollary 1 by W.B.W. (unpublished results).
Theorem 4. Assume that E(|X0|^τ) < ∞ and supu f(u) ≤ c0 for some positive constants τ and c0 < ∞. Further assume that
Σ_{i=1}^∞ [ϑ(i) + ω̃(i)] < ∞.  [20]
Then {Rn(s), s ∈ ℝ} ⇒ {W(s), s ∈ ℝ}, where W is a centered Gaussian process.
Kernel Density Estimation. An important problem in the nonparametric inference of stochastic processes is to estimate the marginal density function f (say) given the data X1, ..., Xn. A popular method is kernel density estimation (45, 46). Let K be a bounded kernel function of bounded variation with ∫_{ℝ} K(u) du = 1, and let bn > 0 be a sequence of bandwidths satisfying
bn → 0 and nbn → ∞.  [21]
Let Kb(x) = K(x/b). Then f can be estimated by
fn(x) = (nbn)^{−1} Σ_{i=1}^n Kbn(x − Xi).  [22]
If the Xi are iid, Parzen (46) proved a central limit theorem for fn(x) under the natural condition 21. There has been a substantial literature on generalizing Parzen's result to time series (47, 48). Wu and Mielniczuk (49) solved the open problem that, for short-range dependent linear processes, Parzen's central limit theorem holds under Eq. 21. See references therein for historical developments. Here, we shall generalize the result in ref. 49 to nonlinear processes. To this end, we shall adopt the uniform predictive dependence measure 19. The asymptotic normality of fn requires a summability condition on ω̃(n).
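A minimal implementation of the estimator in Eq. 22, with the Gaussian kernel as one concrete choice of bounded K and a stationary AR(1) as an assumed data-generating process:

```python
import numpy as np

def kernel_density(data, grid, b):
    """f_n(x) = (n*b)^{-1} sum_i K((x - X_i)/b), Eq. 22, Gaussian kernel."""
    u = (grid[:, None] - data[None, :]) / b
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.sum(axis=1) / (len(data) * b)

rng = np.random.default_rng(5)
n = 5000
X = np.empty(n)
X[0] = rng.standard_normal() / np.sqrt(1 - 0.6**2)   # start at stationarity
for t in range(1, n):
    X[t] = 0.6 * X[t - 1] + rng.standard_normal()    # marginal is N(0, 1/(1-0.36))

grid = np.linspace(-3, 3, 7)
bn = n ** (-1 / 5)               # a standard choice with bn -> 0 and n*bn -> inf
print(kernel_density(X, grid, bn))
```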
Theorem 5. Assume that the conditional density h1(u) = ∂F1(u | ξ0)/∂u satisfies supu h1(u) ≤ c0 almost surely for some constant c0 < ∞ and that f = F′ is continuous. Let κ = ∫_{ℝ} K²(u) du. Then under Eq. 21 and
Σ_{n=1}^∞ ω̃^{1/2}(n) < ∞,  [23]
we have √(nbn) [fn(x) − E fn(x)] ⇒ N[0, f(x)κ] for every x ∈ ℝ.
Proof: Let m be a nonnegative integer and let hm+1(u) = ∂Fm+1(u | ξ0)/∂u. By the identity Fm+1(u | ξ0) = E[Fm+1(u | ξ1) | ξ0] and the Lebesgue dominated convergence theorem, we have hm+1(u) = E[∂Fm+1(u | ξ1)/∂u | ξ0], and hm+1 is also bounded by c0. By Theorem 1(ii) and the bounded variation of K, ∥P0 Kbn(x − Xi)∥ = O[ω̃(i)], uniformly over the bandwidths. Let Nn = Σ_{i=1}^n {E[Kbn(x − Xi) | ξi−1] − E[Kbn(x − Xi)]}. Since each summand of Nn is O(bn) almost surely, Theorem 2(i) and Eq. 23 yield ∥Nn∥ = o[√(nbn)]. Let Mn = Σ_{i=1}^n {Kbn(x − Xi) − E[Kbn(x − Xi) | ξi−1]} and observe that
nbn [fn(x) − E fn(x)] = Mn + Nn.
Then Mn is a martingale. Following the argument of lemma 2 in ref. 49, Mn/√(nbn) ⇒ N[0, f(x)κ], which finishes the proof since ∥Nn∥/√(nbn) → 0 and bn → 0.
Acknowledgments
I thank J. Mielniczuk, M. Pourahmadi, and X. Shao for useful comments. I am very grateful for the extremely helpful suggestions of two reviewers. This work was supported by National Science Foundation Grant DMS-0448704.
Author contributions: W. B. W. designed research, performed research, and wrote the paper.
Abbreviation: iid, independent and identically distributed.
References
- 1.Wiener, N. (1958) Nonlinear Problems in Random Theory (MIT Press, Cambridge, MA).
- 2.Rosenblatt, M. (1959) J. Math. Mech. 8, 665-681.
- 3.Rosenblatt, M. (1971) Markov Processes. Structure and Asymptotic Behavior (Springer, New York).
- 4.Kallianpur, G. (1981) in Norbert Wiener, Collected Works with Commentaries eds. Wiener, N. & Masani, P. (MIT Press, Cambridge, MA) pp. 402-424.
- 5.Rosenblatt, M. (1956) Proc. Natl. Acad. Sci. USA 42, 43-47.
- 6.Bradley, R. C. (2005) Introduction to Strong Mixing Conditions (Indiana Univ. Press, Bloomington, IN).
- 7.Blum, J. R. & Rosenblatt, M. (1956) Proc. Natl. Acad. Sci. USA 42, 412-413.
- 8.Doukhan, P. & Louhichi, S. (1999) Stochastic Process. Appl. 84, 313-342.
- 9.Dedecker, J. & Prieur, C. (2005) Probab. Theor. Rel. Fields 132, 203-236.
- 10.Rosenblatt, M. (1964) J. Res. Natl. Bureau Standards Sect. D 68D, 933-936.
- 11.Rosenblatt, M. (1980) J. Appl. Prob. 17, 265-270.
- 12.Andrews, D. W. K. (1984) J. Appl. Prob. 21, 930-934.
- 13.Priestley, M. B. (1988) Nonlinear and Nonstationary Time Series Analysis (Academic, London).
- 14.Pourahmadi, M. (2001) Foundations of Time Series Analysis and Prediction Theory (Wiley, New York).
- 15.Chow, Y. S. & Teicher, H. (1988) Probability Theory (Springer, New York).
- 16.Dedecker, J. & Doukhan, P. (2003) Stochastic Process. Appl. 106, 63-80.
- 17.Box, G. E. P., Jenkins, G. M. & Reinsel, G. C. (1994) Time Series Analysis: Forecasting and Control (Prentice-Hall, Englewood Cliffs, NJ).
- 18.Rugh, W. J. (1981) Nonlinear System Theory: The Volterra/Wiener Approach (Johns Hopkins Univ. Press, Baltimore).
- 19.Schetzen, M. (1980) The Volterra and Wiener Theories of Nonlinear Systems (Wiley, New York).
- 20.Casti, J. L. (1985) Nonlinear System Theory (Academic, Orlando, FL).
- 21.Bendat, J. S. (1990) Nonlinear System Analysis and Identification from Random Data (Wiley, New York).
- 22.Mathews, V. J. & Sicuranza, G. L. (2000) Polynomial Signal Processing (Wiley, New York).
- 23.Wu, W. B. (2006) Econometric Theory 22, in press.
- 24.Dittmann, I. & Granger, C. W. J. (2002) J. Econometrics 110, 113-133.
- 25.Sun, T. C. (1963) J. Math. Mech. 12, 945-978.
- 26.Taqqu, M. S. (1975) Z. Wahrscheinlichkeitstheorie Verw. Geb. 31, 287-302.
- 27.Ho, H. C. & Hsing, T. (1997) Ann. Prob. 25, 1636-1669.
- 28.Feller, W. (1971) An Introduction to Probability Theory and its Applications (Wiley, New York) Vol. II.
- 29.Avram, F. & Taqqu, M. (1987) Ann. Prob. 15, 767-775.
- 30.Ho, H. C. & Hsing, T. (1996) Ann. Stat. 24, 992-1024.
- 31.Wu, W. B. (2003) Bernoulli 9, 809-831.
- 32.Wu, W. B. (2003) Statistica Sinica 13, 1259-1267.
- 33.Tong, H. (1990) Non-linear Time Series: A Dynamical System Approach (Oxford Univ. Press, Oxford).
- 34.Haggan, V. & Ozaki, T. (1981) Biometrika 68, 189-196.
- 35.Engle, R. F. (1982) Econometrica 50, 987-1007.
- 36.Diaconis, P. & Freedman, D. (1999) SIAM Rev. 41, 41-76.
- 37.Wu, W. B. & Shao, X. (2004) J. Appl. Prob. 41, 425-436.
- 38.Hsing, T. & Wu, W. B. (2004) Ann. Prob. 32, 1600-1631.
- 39.Rio, E. (2000) Theorie Asymptotique des Processus Aleatoires Faiblement Dependants (Springer, Berlin).
- 40.Dedecker, J. & Merlevède, P. (2003) Stochastic Process. Appl. 108, 229-262.
- 41.Volný, D. (1993) Stochastic Process. Appl. 44, 41-74.
- 42.Hannan, E. J. (1979) Stochastic Process. Appl. 9, 281-289.
- 43.Hannan, E. J. (1973) Z. Wahrscheinlichkeitstheorie Verw. Geb. 26, 157-170.
- 44.Heyde, C. C. (1974) Z. Wahrscheinlichkeitstheorie Verw. Geb. 30, 315-320.
- 45.Rosenblatt, M. (1956) Ann. Math. Stat. 27, 832-837.
- 46.Parzen, E. (1962) Ann. Math. Stat. 33, 1065-1076.
- 47.Robinson, P. M. (1983) J. Time Ser. Anal. 4, 185-207.
- 48.Bosq, D. (1996) Nonparametric Statistics for Stochastic Processes. Estimation and Prediction (Springer, New York).
- 49.Wu, W. B. & Mielniczuk, J. (2002) Ann. Stat. 30, 1441-1459.