Author manuscript; available in PMC: 2014 Feb 1.
Published in final edited form as: Bernoulli (Andover). 2013 Feb 1;19(1):205–227. doi: 10.3150/11-BEJ399

Inference for modulated stationary processes

Zhibiao Zhao 1,*, Xiaoye Li 1,**
PMCID: PMC3607552  NIHMSID: NIHMS383095  PMID: 23539557

Abstract

We study statistical inference for a class of modulated stationary processes with time-dependent variances. Due to non-stationarity and the large number of unknown parameters, existing methods for stationary or locally stationary time series are not applicable. Based on a self-normalization technique, we address several inference problems, including a self-normalized central limit theorem, a self-normalized cumulative sum test for the change-point problem, long-run variance estimation through blockwise self-normalization, and a self-normalization-based wild bootstrap. Monte Carlo simulation studies show that the proposed self-normalization-based methods outperform stationarity-based alternatives. We demonstrate the proposed methodology using two real data sets: annual mean precipitation rates in Seoul during 1771–2000 and quarterly U.S. Gross National Product growth rates during 1947–2002.

Keywords: Change-point analysis, Confidence interval, Long-run variance, Modulated stationary process, Self-normalization, Strong invariance principle, Wild bootstrap

1. Introduction

In time series analysis, stationarity requires that the dependence structure be sustained over time, so that we can borrow information from one time period to study the model dynamics over another period; see Fan and Yao (2003) for nonparametric treatments and Lahiri (2003) for various resampling and block bootstrap methods. In practice, however, many climatic, economic, and financial time series are non-stationary and therefore challenging to analyze. First, since the dependence structure varies over time, information is more localized. Second, non-stationary processes often require extra parameters to account for the time-varying structure. One way to overcome these issues is to impose a certain local stationarity; see, for example, Dahlhaus (1997) and Adak (1998) for spectral representation frameworks and Dahlhaus and Polonik (2009) for a time domain approach.

In this article we study a class of modulated stationary processes [see Adak (1998)]:

X_i = \mu + \sigma_i e_i, \quad i = 1, \ldots, n, (1.1)

where {e_i} is a stationary time series with zero mean, and σ_i > 0 are unknown constants adjusting for time-dependent variances. Then X_i oscillates around the constant mean μ, whereas its variance changes over time in an unknown manner. In the special case σ_i ≡ 1, (1.1) reduces to the stationary case. If σ_i = s(i/n) for a Lipschitz continuous function s(t) on [0, 1], then (1.1) is locally stationary. For the general non-stationary case (1.1), the number of unknown parameters exceeds the number of observations, and it is infeasible to estimate the σ_i. Due to the non-stationarity and the large number of unknown parameters, existing methods developed for (locally) stationary processes are not applicable, and our main purpose is to develop new statistical inference techniques.

First, we establish a uniform strong approximation result that can be used to derive a self-normalized central limit theorem (CLT) for the sample mean \bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i of (1.1). For the stationary case σ_i ≡ 1, by Fan and Yao (2003), under mild mixing conditions,

\sqrt{n}(\bar{X}_n - \mu) \Rightarrow N(0, \tau^2), \quad \text{where} \quad \tau^2 = \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k \quad \text{and} \quad \gamma_k = \mathrm{Cov}(e_i, e_{i+k}). (1.2)

For the modulated stationary case (1.1), it is non-trivial whether \sqrt{n}(\bar{X}_n - \mu) has a CLT without imposing further assumptions on σ_i and the dependence structure of e_i. Moreover, even when the latter CLT exists, it is difficult to estimate the limiting variance due to the large number of unknown parameters; see De Jong and Davidson (2000) for related work assuming a near-epoch dependent mixing framework. Zhao (2011) studied confidence interval construction for μ in (1.1) under a blockwise asymptotically-equal-cumulative-variance assumption. The latter assumption is rather restrictive and essentially requires that block averages be asymptotically independent and identically distributed (IID). In this article, we deal with the more general setting (1.1). Under a strong invariance principle assumption, we establish a self-normalized CLT with the self-normalizing constant adjusting for the time-dependent non-stationarity. The obtained CLT extends the classical CLT for IID data or stationary time series to modulated stationary processes. Furthermore, we extend the idea to linear combinations of means over different time periods, which allows us to address inference regarding mean levels over multiple time periods.

Second, we study wild bootstrap for modulated stationary processes. Since the seminal work of Efron (1979), a great deal of research has been done on bootstrap under various settings, ranging from bootstrap for IID data in Efron (1979), wild bootstrap for independent observations with possibly non-constant variances in Wu (1986) and Liu (1988), to various block bootstrap and resampling methods for stationary time series in Künsch (1989), Politis and Romano (1994), Bühlmann (2002), and the monograph Lahiri (2003). With the established self-normalized CLT, we propose a wild bootstrap procedure that is tailored to deal with modulated stationary processes: the dependence is removed through a scaling factor, and the non-constant variance structure of the original data is preserved in the wild bootstrap data-generating mechanism. Our simulation study shows that the wild bootstrap method outperforms the widely used stationarity-based block bootstrap.

Third, we address change-point analysis. The change-point problem has been an active area of research; see Pettitt (1980) for proportion changes in binary data, Horváth (1993) for mean and variance changes in Gaussian observations, Bai and Perron (1998) for coefficient changes in linear models, Aue et al. (2008a) for coefficient changes in polynomial regression with uncorrelated errors, Aue et al. (2008b) for mean change in time series with stationary errors, Shao and Zhang (2010) for change-points for stationary time series, and the monograph by Csörgő and Horváth (1997) for more discussion. Most of these works deal with stationary and/or independent data. Hansen (2000) studied tests for constancy of parameters in linear regression models with non-stationary regressors and conditionally homoscedastic martingale difference errors. Here we consider

H_0: X_i = \mu_i + \sigma_i e_i, \quad \mu_1 = \cdots = \mu_n, \qquad H_a: \mu_1 = \cdots = \mu_J \ne \mu_{J+1} = \cdots = \mu_n, (1.3)

where J is an unknown change-point. The aforementioned works mainly focused on detecting changes in the mean while the error variance remains constant. On the other hand, researchers have also realized the importance of the variance/covariance structure in change-point analysis. For example, Inclán and Tiao (1994) studied changes in variance for independent data, and Aue et al. (2009) and Berkes, Gombay and Horváth (2009) considered changes in covariance for time series data. To our knowledge, there has been almost no attempt to advance change-point analysis under the non-constant variances framework in (1.3). Andrews (1993) studied the change-point problem under a near-epoch dependence structure that allows for non-stationary processes, but his Assumption 1(c) on page 830 therein essentially implies that the process has constant variance. The popular cumulative sum (CUSUM) test is developed for stationary time series and does not take into account the time-dependent variances. Using the self-normalization idea, we propose a self-normalized CUSUM test and a wild bootstrap method to obtain its critical value. Our empirical studies show that the usual CUSUM tests tend to over-reject the null hypothesis in the presence of non-constant variances. By contrast, the self-normalized CUSUM test yields size close to the nominal level.

Fourth, we estimate the long-run variance τ² in (1.2). The long-run variance plays an essential role in statistical inference involving time series. Most works in the literature deal with stationary processes through various block bootstrap and subsampling approaches; see Carlstein (1986), Künsch (1989), Politis and Romano (1994), Götze and Künsch (1996), and the monograph Lahiri (2003). De Jong and Davidson (2000) established the consistency of kernel estimators of covariance matrices under a near-epoch dependent mixing condition. Recently, Müller (2007) studied robust long-run variance estimation for locally stationary processes. For model (1.1), the error process {e_i} is contaminated with the unknown standard deviations {σ_i}, and we apply blockwise self-normalization to remove the non-stationarity, resulting in asymptotically stationary blocks.

Fifth, the proposed methods can be extended to deal with the linear regression model

X_i = U_i\beta + \sigma_i e_i, (1.4)

where U_i = (u_{i,1}, …, u_{i,p}) are deterministic covariates and β = (β_1, …, β_p)′ is the unknown column vector of parameters. For p = 2, Hansen (1995) established the asymptotic normality of the least-squares estimate of the slope parameter under a fairly general framework of non-stationary errors. While Hansen (1995) assumed that the errors form a martingale difference array so that they are uncorrelated, the framework in (1.4) is more general in that it allows for correlations. On the other hand, Hansen (1995) allowed the conditional volatilities to follow an autoregressive model, hence introducing stochastic volatilities. Phillips, Sun and Jin (2007) considered (1.4) for stationary errors, and their approach is not applicable here due to the unknown non-constant variances σ_i². In Section 2.6 we consider a self-normalized CLT for the least-squares estimator of β in (1.4). In the polynomial regression case u_{i,r} = (i/n)^{r−1}, Aue et al. (2008a) studied a likelihood-based test for the constancy of β in (1.4) for uncorrelated errors with constant variance. Due to the presence of correlation and time-varying variances, it is more challenging to study the change-point problem for (1.4), and this is beyond the scope of this article.

The rest of this article is organized as follows. We present theoretical results in Section 2. Sections 3–4 contain Monte Carlo studies and applications to two real data sets.

2. Main results

For sequences {a_n} and {b_n}, write a_n = O(b_n), a_n = o(b_n), and a_n ≍ b_n, respectively, if |a_n/b_n| < c_1, a_n/b_n → 0, and c_2 < |a_n/b_n| < c_3, for some constants 0 < c_1, c_2, c_3 < ∞. For q > 0 and a random variable e, write e ∈ ℒ^q if ‖e‖_q := {E(|e|^q)}^{1/q} < ∞.

2.1. Uniform approximations for modulated stationary processes

In (1.1), assume without loss of generality that E(e_i) = 0 and E(e_i²) = 1 so that {e_i} and {e_i² − 1} are centered stationary processes. With the convention S_0 = S_0^* = 0, define

S_i = \sum_{j=1}^{i} e_j \quad \text{and} \quad S_i^* = \sum_{j=1}^{i}(e_j^2 - 1), \quad i = 1, 2, \ldots. (2.1)

Assumption 2.1. There exist standard Brownian motions {B_t} and {B_t^*} such that

\max_{1 \le i \le n}|S_i - \tau B_i| = o_{a.s.}(\Delta_n) \quad \text{and} \quad \max_{1 \le i \le n}|S_i^* - \tau^* B_i^*| = o_{a.s.}(\Delta_n), (2.2)

where Δ_n is the approximation error, and τ² and τ^{*2} are the long-run variances of {e_i} and {e_i² − 1}, respectively. Further assume τ² > 0 to avoid the degenerate case τ² = 0.

The uniform approximations in (2.2) are generally called strong invariance principles. The two Brownian motions {B_t} and {B_t^*} may be defined on different probability spaces and hence are not jointly distributed, but this is not an issue because our argument does not depend on their joint distribution. To see how to use (2.2), under H_0 in (1.3), consider

F_j = j(\bar{X}_j - \mu) \quad \text{and} \quad \bar{V}_j^2 = \sum_{i=1}^{j}(X_i - \bar{X}_j)^2, \quad \text{where} \quad \bar{X}_j = j^{-1}\sum_{i=1}^{j}X_i. (2.3)

Theorem 2.1 below presents uniform approximations for F_j and \bar{V}_j². Define

r_n = |\sigma_n| + \sum_{i=2}^{n}|\sigma_i - \sigma_{i-1}| \quad \text{and} \quad r_n^* = |\sigma_n^2| + \sum_{i=2}^{n}|\sigma_i^2 - \sigma_{i-1}^2|, (2.4)
\Sigma_j^2 = \sum_{i=1}^{j}\sigma_i^2 \quad \text{and} \quad \Sigma_j^{*2} = \Big(\sum_{i=1}^{j}\sigma_i^4\Big)^{1/2}. (2.5)

Theorem 2.1. Let (2.2) hold. For any c ∈ (0, 1], the following uniform approximations hold:

\max_{cn \le j \le n}\Big|F_j - \tau\sum_{i=1}^{j}\sigma_i(B_i - B_{i-1})\Big| = O_{a.s.}(r_n\Delta_n), (2.6)
\max_{cn \le j \le n}\big|\bar{V}_j^2 - \Sigma_j^2\big| = O_p\{(r_n^2\Delta_n^2 + \Sigma_n^2)/n + \Sigma_n^{*2} + r_n^*\Delta_n\}. (2.7)

Theorem 2.1 provides quite general results under (2.2). We now discuss sufficient conditions for (2.2). Shao (1993) obtained sufficient mixing conditions for (2.2). In this article, we briefly introduce the framework in Wu (2007). Assume that e_i has the causal representation e_i = G(…, ε_{i−1}, ε_i), where ε_i are IID innovations and G is a measurable function such that e_i is well-defined. Let {ε_i′}_{i∈ℤ} be an independent copy of {ε_i}_{i∈ℤ}. Assume

\sum_{i=1}^{\infty} i\,\|e_i - e_i'\|_8 < \infty, \quad \text{where} \quad e_i' = G(\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_{i-1}, \varepsilon_i). (2.8)

Proposition 2.1 below follows from Corollary 4 in Wu (2007).

Proposition 2.1. Assume that (2.8) holds. Then (2.2) holds with Δ_n = n^{1/4} log(n), the optimal rate up to a logarithmic factor.

For the linear process e_i = \sum_{j=0}^{\infty} a_j\varepsilon_{i-j} with ε_i ∈ ℒ⁸ and E(ε_i) = 0, ‖e_i − e_i′‖_8 = ‖ε_0 − ε_0′‖_8 |a_i|. If \sum_{i=1}^{\infty} i|a_i| < ∞, then (2.2) holds with Δ_n = n^{1/4} log(n). For many nonlinear time series, ‖e_i − e_i′‖_8 decays exponentially fast and hence (2.8) holds; see Section 3.1 of Wu (2007). From now on we assume that (2.2) holds with Δ_n = n^{1/4} log(n).

Remark 2.1. If e_i are IID with E(e_i) = 0 and e_i ∈ ℒ^q for some 2 < q ≤ 4, the celebrated “Hungarian embedding” asserts that \sum_{j=1}^{i} e_j satisfies a strong invariance principle with the optimal rate o_{a.s.}(n^{1/q}). Thus, the moment assumption e_i ∈ ℒ⁸ in Proposition 2.1 is necessary in order to ensure strong invariance principles for both S_i and S_i^* in (2.1) with the approximation rate n^{1/4} log(n). On the other hand, one can relax the moment assumption by loosening the approximation rate. For example, by Corollary 4 in Wu (2007), if e_i ∈ ℒ^{2q} for some q > 2 and \sum_{i=1}^{\infty} i‖e_i − e_i′‖_{2q} < ∞, then (2.2) holds with Δ_n = n^{1/\min(q,4)} log(n).

As shown in Examples 2.1–2.3 below, r_n and r_n^* in (2.4) often have tractable bounds.

Example 2.1. If σ_i is non-decreasing in i, then σ_n ≤ r_n ≤ 2σ_n and σ_n² ≤ r_n^* ≤ 2σ_n². If σ_i is non-increasing in i, then r_n = σ_1 and r_n^* = σ_1². If the σ_i are piecewise constant with finitely many pieces, then r_n, r_n^* = O(1).

Example 2.2. Let σ_i = s(i/n^γ) for γ ∈ [0, 1] and a Lipschitz continuous function s(t), t ∈ [0, ∞), with \sup_{t∈[0,∞)} s(t) < ∞. Then r_n, r_n^* = O(n^{1−γ}). If γ = 1, we obtain the locally stationary case with time window i/n ∈ [0, 1]; if γ ∈ [0, 1), we have the infinite time window [0, ∞) since n/n^γ → ∞, which may be more reasonable for data with a long time horizon.

Example 2.3. Let σ_i = i^β L(i) for a slowly varying function L(·) such that L(cx)/L(x) → 1 as x → ∞ for all c > 0. Then we can show r_n = O{n^β L(n)} or O(1), and r_n^* = O{n^{2β} L²(n)} or O(1), depending on whether β > 0 or β < 0. For the boundary case β = 0, assume L(i+1)/L(i) = 1 + O(1/i) uniformly; then r_n = L(n) + O(1)\sum_{i=2}^{n} L(i)/i = O{log(n) \max_{1≤i≤n} L(i)}. Similarly, r_n^* = O{log(n) \max_{1≤i≤n} L²(i)}.

2.2. Self-normalized central limit theorem

In this section we establish a self-normalized CLT for the sample average X̄_n. To understand how non-stationarity makes this problem difficult, an elementary calculation shows

\mathrm{Var}\{\sqrt{n}(\bar{X}_n - \mu)\} = \frac{\gamma_0}{n}\sum_{i=1}^{n}\sigma_i^2 + \frac{2}{n}\sum_{1\le i<j\le n}\sigma_i\sigma_j\gamma_{j-i} =: \tau_n^2, (2.9)

where γ_k = Cov(e_0, e_k). In the stationary case σ_i ≡ 1, under the condition \sum_{k=0}^{\infty}|γ_k| < ∞, τ_n² → τ², the long-run variance in (1.2). For non-constant variances, it is difficult to deal with τ_n² directly, due to the large number of unknown parameters and the complicated structure. See De Jong and Davidson (2000) for a kernel estimator of τ_n² under a near-epoch dependent mixing framework.

To attenuate the aforementioned issue, we apply the uniform approximations in Theorem 2.1. Assume that (2.10) below holds. Note that the increments B_i − B_{i−1} of a standard Brownian motion are IID standard normal random variables. By (2.6), n(X̄_n − μ) is equivalent in distribution to N(0, τ²Σ_n²). By (2.7), V̄_n/Σ_n → 1 in probability. By Slutsky’s theorem, we have the following Proposition 2.2.

Proposition 2.2. Let (2.2) hold with Δ_n = n^{1/4} log(n). For r_n, r_n^*, Σ_n², Σ_n^{*2} in (2.4)–(2.5), assume

\delta_n = r_n\Delta_n/\Sigma_n + (r_n^*\Delta_n + \Sigma_n^{*2})/\Sigma_n^2 \to 0. (2.10)

Recall V̄_n² in (2.3). Then as n → ∞, n(X̄_n − μ)/V̄_n ⇒ N(0, τ²). Consequently, a (1 − α) asymptotic confidence interval for μ is X̄_n ± z_{α/2} τ̂ V̄_n/n, where τ̂ is a consistent estimate of τ (Section 2.5 below), and z_{α/2} is the (1 − α/2) standard normal quantile.

Proposition 2.2 is an extension of the classical CLT for IID data or stationary processes to modulated stationary processes. If the X_i are IID, then n(X̄_n − μ)/V̄_n ⇒ N(0, 1). In Proposition 2.2, τ² can be viewed as the variance inflation factor due to the dependence of {e_i}. For stationary data, the sample variance V̄_n²/n is a consistent estimate of the population variance. Here, for the non-constant variances case (1.1), by (2.7) in Theorem 2.1, V̄_n²/n can be viewed as an estimate of the time-averaged “population variance” Σ_n²/n. So, we can interpret the CLT in Proposition 2.2 as a self-normalized CLT for modulated stationary processes, with the self-normalizing term V̄_n adjusting for the non-stationarity due to σ_1, …, σ_n and τ² accounting for the dependence of {e_i}. Clearly, the parameters σ_1, …, σ_n are canceled out through self-normalization. Finally, condition (2.10) is satisfied for Example 2.2 with γ > 3/4 and Example 2.3 with β > −1/4.

In classical statistics, the width of confidence intervals usually shrinks as the sample size increases. By Proposition 2.2 and Theorem 2.1, the width of the constructed confidence interval for μ is proportional to V̄_n/n or, equivalently, Σ_n/n. Thus, a necessary and sufficient condition for a shrinking confidence interval is \sum_{i=1}^{n}σ_i²/n² → 0, which is satisfied if σ_i = o(√i). An intuitive explanation is as follows. For IID data, the sample mean converges at the rate O(n^{−1/2}). In (1.1), if σ_i grows faster than √i, the contribution of a new observation is negligible relative to its noise level.

Example 2.4. If σ_i ≍ i^β with β ∈ [0, 1/2), the length of the confidence interval is proportional to Σ_n/n ≍ n^{β−1/2}. In particular, if c_1 < σ_i < c_2 for some positive constants c_1 and c_2, then Σ_n/n achieves the optimal rate O(n^{−1/2}). If σ_i ≍ log(i), then Σ_n/n ≍ log(n)/√n.
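The confidence interval in Proposition 2.2 is simple to compute. Below is a minimal Python sketch (our illustration, not code from the paper): it assumes a consistent long-run variance estimate tau_hat, for instance from the blockwise self-normalization method of Section 2.5, and returns X̄_n ± z_{α/2} τ̂ V̄_n/n.

```python
import numpy as np
from statistics import NormalDist

def self_normalized_ci(x, tau_hat, alpha=0.05):
    """(1 - alpha) CI for mu via Proposition 2.2:
    n*(xbar - mu)/Vbar_n => N(0, tau^2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    vbar = np.sqrt(np.sum((x - xbar) ** 2))     # self-normalizer Vbar_n
    z = NormalDist().inv_cdf(1 - alpha / 2)     # (1 - alpha/2) normal quantile
    half = z * tau_hat * vbar / n
    return xbar - half, xbar + half
```

Note that the time-dependent variances σ_1, …, σ_n never enter the computation; they are absorbed by the self-normalizer V̄_n.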

The same idea can be extended to linear combinations of means over multiple time periods. Suppose we have observations from k consecutive time periods 𝒯1, …, 𝒯k, each of the form (1.1) with different means, denoted by μ1, …, μk, and each having time-dependent variances. Let ν = β1μ1 + ⋯ + βkμk for given coefficients β1, …, βk. For example, if we are interested in mean change from 𝒯1 to 𝒯2, we can take ν = μ2 − μ1; if we are interested in whether the increase from 𝒯3 to 𝒯4 is larger than that from 𝒯1 to 𝒯2, we can let ν = (μ4 − μ3) − (μ2 − μ1). Proposition 2.3 below extends Proposition 2.2 to multiple means.

Proposition 2.3. Let ν = β_1μ_1 + ⋯ + β_kμ_k. For 𝒯_j, denote by n_j its sample size and by X̄^{(j)} its sample average. Assume that (2.10) holds for each individual time period 𝒯_j and, for simplicity, that n_1, …, n_k are of the same order. Then

\frac{\sum_{j=1}^{k}\beta_j\bar{X}^{(j)} - \nu}{\Lambda_n} \Rightarrow N(0, \tau^2), \quad \text{where} \quad \Lambda_n^2 = \sum_{j=1}^{k}\Big\{\frac{\beta_j^2}{n_j^2}\sum_{i\in\mathcal{T}_j}\big[X_i - \bar{X}^{(j)}\big]^2\Big\}.

2.3. Wild bootstrap for self-normalized statistic

Recall σiei in (1.1). Suppose we are interested in the self-normalized statistic

H_n = \frac{\sum_{i=1}^{n}\sigma_i e_i}{\sqrt{\sum_{i=1}^{n}\sigma_i^2 e_i^2}}.

For problems with small sample sizes, it is natural to use the bootstrap distribution instead of the convergence H_n ⇒ N(0, τ²) in Proposition 2.2. Wu (1986) and Liu (1988) pioneered the work on the wild bootstrap for independent data with non-identical distributions. We shall extend their wild bootstrap procedure to the modulated stationary process (1.1).

Let {α_i} be IID random variables, independent of {e_i}, satisfying α_i ∈ ℒ³, E(α_i) = 0, and E(α_i²) = 1. Define the self-normalized statistic based on the new data:

H_n^* = \frac{\sum_{i=1}^{n}\xi_i}{\sqrt{\sum_{i=1}^{n}(\xi_i - \bar{\xi})^2}}, \quad \text{where} \quad \xi_i = \sigma_i e_i \alpha_i \quad \text{and} \quad \bar{\xi} = \frac{\xi_1 + \cdots + \xi_n}{n}.

Clearly, ξ_i inherits the non-stationarity structure of σ_i e_i by writing ξ_i = σ_i e_i^* with e_i^* = e_i α_i. On the other hand, for the new error process {e_i^*}, E(e_i^{*2}) = E(e_i²) = 1 and Cov(e_i^*, e_j^*) = 0 for i ≠ j. Thus, {e_i^*} is a white noise sequence with long-run variance one. By Proposition 2.2, the scaled version H_n/τ ⇒ N(0, 1) is robust against the dependence structure of {e_i}, so we expect that H_n^* should be close to H_n/τ in distribution.

Theorem 2.2. Let the conditions in Proposition 2.2 hold. Further assume

\Big(\sum_{i=1}^{n}\sigma_i^3\Big)^2 \Big/ \Big(\sum_{i=1}^{n}\sigma_i^2\Big)^3 \to 0. (2.11)

Let τ̂ be a consistent estimate of τ. Denote by ℙ* the conditional law given {e_i}. Then

\sup_x \big|\mathbb{P}^*(H_n^* \le x) - \mathbb{P}(H_n/\hat{\tau} \le x)\big| \to 0, \quad \text{in probability.} (2.12)

Theorem 2.2 asserts that H_n^* behaves like the scaled version H_n/τ̂, with the scaling factor τ̂ coming from the dependence of {e_i}. Here we use the sample mean in (1.1) to illustrate a wild bootstrap procedure to obtain the distribution of n(X̄_n − μ)/(τ V̄_n) in Proposition 2.2.

  • (i) Apply the method in Section 2.5 to X_1, …, X_n to obtain a consistent estimate τ̂ of τ.

  • (ii) Subtract the sample mean from the data to obtain ε_i = X_i − X̄_n, i = 1, …, n.

  • (iii) Generate IID random variables α_1, …, α_n satisfying E(α_i) = 0 and E(α_i²) = 1.

  • (iv) Based on ε_i in (ii) and α_i in (iii), generate bootstrap data ξ_i^b = ε_i α_i, and compute
    H_n^b = \frac{\sum_{i=1}^{n} \xi_i^b}{\hat{\tau}^b \sqrt{\sum_{i=1}^{n} (\xi_i^b - \bar{\xi}^b)^2}},
    where τ̂^b is a long-run variance estimate (see Section 2.5) for the bootstrap data ξ_i^b.

  • (v) Repeat (ii)–(iv) many times and use the empirical distribution of those realizations of H_n^b as the distribution of n(X̄_n − μ)/(τ V̄_n).

The proposed wild bootstrap is an extension of that in Liu (1988) for independent data to the modulated stationary case, and it has two appealing features. First, the scaling factor τ̂ makes the statistic independent of the dependence structure. Second, the bootstrap data-generating mechanism is adaptive to the unknown time-dependent variances {σ_i²}. For the distribution of α_i in step (iii), we use ℙ(α_i = −1) = ℙ(α_i = 1) = 1/2, which has some desirable properties. For example, it preserves the magnitude and range of the data. As shown by Davidson and Flachaire (2008), for certain hypothesis testing problems in linear regression models with symmetrically distributed errors, the resulting bootstrap distribution is exactly equal to that of the test statistic; see Theorem 1 therein.
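The steps above translate into only a few lines of code. The following is a hedged Python sketch (ours; wild_bootstrap_dist and tau_hat_fn are hypothetical names, with tau_hat_fn standing for any consistent long-run variance estimator such as the blockwise one in Section 2.5), using the Rademacher weights ℙ(α_i = ±1) = 1/2 discussed above.

```python
import numpy as np

def wild_bootstrap_dist(x, tau_hat_fn, n_boot=1000, seed=None):
    """Wild bootstrap for the self-normalized mean statistic, steps (i)-(v):
    center the data, multiply by IID Rademacher weights, self-normalize."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    eps = x - x.mean()                              # step (ii): centered data
    out = np.empty(n_boot)
    for b in range(n_boot):
        alpha = rng.choice([-1.0, 1.0], size=n)     # step (iii): Rademacher weights
        xi = eps * alpha                            # step (iv): bootstrap data
        tau_b = tau_hat_fn(xi)                      # long-run variance of replicate
        out[b] = xi.sum() / (tau_b * np.sqrt(np.sum((xi - xi.mean()) ** 2)))
    return out   # empirical distribution of n*(xbar - mu)/(tau * Vbar_n)
```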

For the purpose of comparison, we briefly introduce the widely used block bootstrap for a stationary time series {X_i} with mean μ. By (1.2), √n(X̄_n − μ) ⇒ N(0, τ²). Suppose that we want to bootstrap the distribution of √n(X̄_n − μ). Let k_n, ℓ_n, ℐ_1, …, ℐ_{ℓ_n} be defined as in Section 2.5 below. The non-overlapping block bootstrap works as follows.

  1. Take a simple random sample of size ℓ_n with replacement from the blocks ℐ_1, …, ℐ_{ℓ_n}, and form the bootstrap data X_1^b, …, X_{n′}^b, n′ = k_n ℓ_n, by pooling together the X_i’s for which the index i is within those selected blocks.

  2. Let X̄^b be the sample average of X_1^b, …, X_{n′}^b. Compute Ξ_n = √n′ {X̄^b − E*(X̄^b)}, where E*(X̄^b) = \sum_{i=1}^{n} X_i/n is the conditional expectation of X̄^b given {X_i}.

  3. Repeat steps 1–2 many times and use the empirical distribution of the Ξ_n’s as the distribution of √n(X̄_n − μ).

In step 2, another choice is the studentized version Ξ̃_n = √n′ {X̄^b − E*(X̄^b)}/τ̂^b, where τ̂^b is a consistent estimate of τ based on the bootstrap data. Assuming stationarity and k_n → ∞, the blocks are asymptotically independent and share the same model dynamics as the whole data, which validates the above block bootstrap. We refer the reader to Lahiri (2003) for detailed discussions. For a non-stationary process, the block bootstrap is no longer valid, because individual blocks are not representative of the whole data. By contrast, the proposed wild bootstrap is adaptive to the unknown dependence and non-constant variance structure.
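For comparison, here is an analogous sketch of the non-overlapping block bootstrap in steps 1–3 (again our illustration; boundary observations beyond k_nℓ_n are dropped, as in the text).

```python
import numpy as np

def block_bootstrap_dist(x, kn, n_boot=1000, seed=None):
    """Non-overlapping block bootstrap for sqrt(n)(xbar - mu):
    resample whole blocks with replacement, recenter at E*(xbar_b)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    ln = len(x) // kn                       # number of complete blocks
    blocks = x[: ln * kn].reshape(ln, kn)
    cond_mean = blocks.mean()               # conditional expectation E*(xbar_b)
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, ln, size=ln)  # step 1: sample ln blocks
        xb = blocks[idx].ravel()
        out[b] = np.sqrt(xb.size) * (xb.mean() - cond_mean)  # step 2
    return out
```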

2.4. Change-point analysis: self-normalized CUSUM test

To test for a change-point in the mean of a process {X_i}, two popular CUSUM-type tests [see Section 3 of Robbins et al. (2011) for a review and related references] are

T_{n1} = \max_{cn \le j \le (1-c)n} \hat{\tau}^{-1}\frac{|S_X(j)|}{\sqrt{j(1 - j/n)}} \quad \text{and} \quad T_{n2} = \max_{cn \le j \le (1-c)n} \hat{\tau}^{-1}|S_X(j)|, (2.13)

where τ̂² is a consistent estimate of the long-run variance τ² of {X_i}, and

S_X(j) = \Big(1 - \frac{j}{n}\Big)\sum_{i=1}^{j}X_i - \frac{j}{n}\sum_{i=j+1}^{n}X_i. (2.14)

Here c > 0 (c = 0.1 in our simulation studies) is a small number to avoid the boundary issue. For IID data, j(1 − j/n) is proportional to the variance of S_X(j), so T_{n1} is a studentized version of T_{n2}. For IID Gaussian data, T_{n1} is equivalent to the likelihood ratio test; see Csörgő and Horváth (1997). Assume that, under the null hypothesis,

\Big\{n^{-1/2}\sum_{i=1}^{\lfloor nt\rfloor}[X_i - E(X_i)]\Big\}_{0\le t\le 1} \Rightarrow \tau\{B_t\}_{0\le t\le 1}, \quad \text{in the Skorohod space,} (2.15)

for a standard Brownian motion {B_t}_{t≥0}. The above convergence requires finite-dimensional convergence and tightness; see Billingsley (1968). By the continuous mapping theorem, T_{n1} ⇒ \max_{c\le t\le 1-c}|B_t − tB_1|/\sqrt{t(1-t)} and T_{n2}/\sqrt{n} ⇒ \max_{c\le t\le 1-c}|B_t − tB_1|.

For the modulated stationary case (1.3), (2.15) is no longer valid. Moreover, since T_{n1} and T_{n2} do not take into account the time-dependent variances σ_i², an abrupt change in the variances may lead to a false rejection of H_0 even when the mean remains constant. For example, our simulation study in Section 3.3 shows that the empirical false rejection probability for T_{n1} and T_{n2} is about 10% at the nominal level 5%. To alleviate the issue of non-constant variances, we adopt the self-normalization approach as in previous sections. Recall F_j and V̄_j in (2.3). For each fixed cn ≤ j ≤ (1 − c)n, by Theorem 2.1 and Slutsky’s theorem, F_j/V̄_j ⇒ N(0, τ²) in distribution, assuming the negligibility of the approximation errors. Therefore, the self-normalization term V̄_j can remove the time-dependent variances. In light of this, we can simultaneously self-normalize the two terms \sum_{i=1}^{j}X_i and \sum_{i=j+1}^{n}X_i in (2.14) and propose the self-normalized test statistic

T_n^{SN} = \max_{cn \le j \le (1-c)n} \hat{\tau}^{-1}|T_n(j)|, \quad \text{where} \quad T_n(j) = \frac{S_X(j)}{\sqrt{(1-j/n)^2\bar{V}_j^2 + (j/n)^2\tilde{V}_j^2}}. (2.16)

Here, V̄_j² is defined as in (2.3), and \tilde{V}_j^2 = \sum_{i=j+1}^{n}(X_i - \tilde{X}_j)^2 with \tilde{X}_j = (n-j)^{-1}\sum_{i=j+1}^{n}X_i.

Theorem 2.3. Assume (2.2) holds. Let δ_n → 0 be as in (2.10). Under H_0, we have \max_{cn\le j\le(1-c)n}|T_n(j) − τ\tilde{T}_n(j)| = O_p(δ_n), where
\tilde{T}_n(j) = \frac{(1-j/n)\sum_{i=1}^{j}\sigma_i(B_i - B_{i-1}) - (j/n)\sum_{i=j+1}^{n}\sigma_i(B_i - B_{i-1})}{\sqrt{(1-j/n)^2\sum_{i=1}^{j}\sigma_i^2 + (j/n)^2\sum_{i=j+1}^{n}\sigma_i^2}}.

By Theorem 2.3, under H_0, T_n^{SN} is asymptotically equivalent to \max_{cn\le j\le(1-c)n}|\tilde{T}_n(j)|. Due to the self-normalization, for each j, the time-dependent variances are removed and \tilde{T}_n(j) ~ N(0, 1) has a standard normal distribution. However, \tilde{T}_n(j) and \tilde{T}_n(j′) are correlated for j ≠ j′. Therefore, {\tilde{T}_n(j)} is a non-stationary Gaussian process with standard normal marginal distributions. Due to the large number of unknown parameters σ_i, it is infeasible to obtain the null distribution directly. On the other hand, Theorem 2.3 establishes the fact that, asymptotically, the distribution of T_n^{SN} in (2.16) depends only on σ_1, …, σ_n and is robust against the dependence structure of {e_i}, which motivates us to use the wild bootstrap method in Section 2.3 to find the critical value of T_n^{SN}.

  1. Compute T_n(j) for cn ≤ j ≤ (1 − c)n and find Ĵ = argmax_{cn≤j≤(1−c)n} |T_n(j)|.

  2. Divide the data into two blocks X_1, …, X_Ĵ and X_{Ĵ+1}, …, X_n. Within each block, subtract the sample mean from the observations therein to obtain centered data. Pool all centered data together and denote them by ε_1, …, ε_n.

  3. Based on ε_1, …, ε_n, obtain an estimate τ̂ of τ; see Section 2.5 below.

  4. Compute the test statistic T_n^{SN} in (2.16).

  5. Based on ε_i in step 2, use the wild bootstrap method in Section 2.3 to generate synthetic data ξ_1, …, ξ_n, and apply steps 1–4 to the bootstrap data ξ_1, …, ξ_n to compute the bootstrap test statistic T_n^b.

  6. Repeat step 5 many times and use the (1 − α) quantile of those T_n^b’s as the critical value.

As argued in Section 2.3, the synthetic data-generating scheme in step 5 inherits the time-varying non-stationarity structure of the original data. Also, the statistic T_n^{SN} is robust against the dependence structure, which justifies the proposed bootstrap method. If H_0 is rejected, the change-point is then estimated by Ĵ = argmax_{cn≤j≤(1−c)n} |T_n(j)| from step 1.
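A minimal sketch of T_n(j) in (2.16) and the change-point estimate Ĵ from step 1 (our code, assuming a consistent estimate tau_hat from Section 2.5; the critical value would come from repeating this on wild bootstrap replicates as in steps 5–6):

```python
import numpy as np

def sn_cusum(x, tau_hat, c=0.1):
    """Self-normalized CUSUM statistic (2.16): each of the two partial sums
    in S_X(j) is normalized by its own within-segment sum of squares."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lo, hi = int(np.ceil(c * n)), int((1 - c) * n)
    best, j_hat = -np.inf, None
    for j in range(lo, hi + 1):
        left, right = x[:j], x[j:]
        sx = (1 - j / n) * left.sum() - (j / n) * right.sum()   # S_X(j), (2.14)
        v1 = np.sum((left - left.mean()) ** 2)                  # Vbar_j^2
        v2 = np.sum((right - right.mean()) ** 2)                # Vtilde_j^2
        t = abs(sx) / np.sqrt((1 - j / n) ** 2 * v1 + (j / n) ** 2 * v2)
        if t > best:
            best, j_hat = t, j
    return best / tau_hat, j_hat   # (T_n^SN, change-point estimate J-hat)
```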

If there is no evidence to reject H_0, we briefly discuss how to apply the same methodology to test whether there is a change-point in the variances σ_i², that is, the alternative σ_1 = ⋯ = σ_J ≠ σ_{J+1} = ⋯ = σ_n for some unknown change-point J. By (1.1), we have (X_i − μ)² = σ_i² + σ_i²ζ_i, where ζ_i = e_i² − 1 has mean zero. Therefore, testing for a change-point in the variances σ_i² of X_i is equivalent to testing for a change-point in the mean of the new data X̃_i = (X_i − X̄_n)².

2.5. Long-run variance estimation

To apply the results in Sections 2.2–2.4, we need a consistent estimate of the long-run variance τ². Most existing works deal with stationary time series through various block bootstrap and subsampling approaches; see Lahiri (2003) and references therein. Assuming a near-epoch dependent mixing condition, De Jong and Davidson (2000) established the consistency of a kernel estimator of Var(\sum_{i=1}^{n}X_i), and their result can be used to estimate τ_n² in (2.9) for the CLT of √n(X̄_n − μ). However, for the change-point problem in Section 2.4, we need an estimate of the long-run variance τ² of the unobservable process {e_i}, so the method in De Jong and Davidson (2000) is not directly applicable.

To attenuate the non-stationarity issue, we extend the idea in Section 2.2 to blockwise self-normalization. Let kn be the block length. Denote by ℓn = ⌊n/kn⌋ the largest integer not exceeding n/kn. Ignore the boundary and divide 1, …, n into ℓn blocks

\mathcal{I}_j = \{(j-1)k_n + 1, \ldots, jk_n\}, \quad j = 1, \ldots, \ell_n. (2.17)

Recall the overall sample mean X̄_n. For each block ℐ_j, define the self-normalized statistic

D_j = \frac{k_n[\bar{X}(j) - \bar{X}_n]}{V(j)}, \quad \text{where} \quad \bar{X}(j) = \frac{1}{k_n}\sum_{i\in\mathcal{I}_j}X_i, \quad V^2(j) = \sum_{i\in\mathcal{I}_j}[X_i - \bar{X}(j)]^2. (2.18)

By Proposition 2.2, the self-normalized statistics D_1, …, D_{ℓ_n} ~ N(0, τ²) are asymptotically IID. Thus, we propose estimating τ² by

\hat{\tau}^2 = \frac{1}{\ell_n}\sum_{j=1}^{\ell_n}D_j^2. (2.19)

As in (2.4)–(2.5), we define the corresponding quantities on block ℐ_j:

r(j) = |\sigma_{jk_n}| + \sum_{i\in\mathcal{I}_j}|\sigma_i - \sigma_{i-1}| \quad \text{and} \quad r^*(j) = |\sigma_{jk_n}^2| + \sum_{i\in\mathcal{I}_j}|\sigma_i^2 - \sigma_{i-1}^2|, (2.20)
\Sigma^2(j) = \sum_{i\in\mathcal{I}_j}\sigma_i^2 \quad \text{and} \quad \Sigma^{*2}(j) = \Big(\sum_{i\in\mathcal{I}_j}\sigma_i^4\Big)^{1/2}. (2.21)

Theorem 2.4. Let (2.2) hold with Δ_n = n^{1/4} log(n). Recall r_n, Σ_n in (2.4)–(2.5). Define

M_n = \frac{1}{k_n} + \max_{1\le j\le \ell_n}\frac{\Sigma^{*2}(j) + r^*(j)\Delta_n}{\Sigma^2(j)} + \max_{1\le j\le \ell_n}\frac{r(j)\Delta_n}{\Sigma(j)}. (2.22)

Assume that r_nΔ_n/Σ_n → 0 and

\chi_n = n^{-1/2} + \log(n)M_n + \frac{\log(n)\Sigma_n}{n^2}\sum_{j=1}^{\ell_n}\frac{1}{\Sigma(j)} + \frac{\Sigma_n^2}{n^3}\sum_{j=1}^{\ell_n}\frac{1}{\Sigma^2(j)} \to 0. (2.23)

Then τ̂² − τ² = O_p(χ_n). Consequently, τ̂ is a consistent estimate of τ.

Consider Example 2.2 with γ ∈ [0, 1). Then χ_n ≍ log(n)/√n + log²(n)(n^{1/4}/√k_n + n^{5/4−γ}/k_n + √k_n n^{1/4−γ}). For γ ∈ (3/4, 1), it can be shown that the optimal rate is χ_n ≍ n^{−1/8} log^{5/4}(n) when k_n ≍ n^{3/4} log^{3/2}(n). In Example 2.3 with σ_i = i^β for some β ∈ [0, 1), elementary but tedious calculations show that the optimal rate is

\chi_n \asymp \begin{cases} n^{-1/8}\log^{5/4}(n), & k_n \asymp n^{3/4}\log^{3/2}(n), & \beta\in[0,3/4],\\ n^{\frac{\beta-1}{5-4\beta}}\{\log(n)\}^{\frac{8(1-\beta)}{5-4\beta}}, & k_n \asymp n^{\frac{4.5-4\beta}{5-4\beta}}\{\log(n)\}^{\frac{4}{5-4\beta}}, & \beta\in(3/4,1). \end{cases}
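The estimator (2.17)–(2.19) is straightforward to implement. A minimal sketch (ours; boundary observations beyond ℓ_n k_n are ignored, as in the text):

```python
import numpy as np

def tau_hat_blockwise(x, kn):
    """Blockwise self-normalized long-run variance estimate (2.17)-(2.19):
    each block's centered mean is studentized by its own within-block
    sum of squares, and tau^2 is estimated by the average of the D_j^2."""
    x = np.asarray(x, dtype=float)
    ln = len(x) // kn
    xbar = x.mean()                          # overall sample mean
    d = np.empty(ln)
    for j in range(ln):
        block = x[j * kn : (j + 1) * kn]     # block I_j in (2.17)
        v = np.sqrt(np.sum((block - block.mean()) ** 2))   # V(j) in (2.18)
        d[j] = kn * (block.mean() - xbar) / v              # D_j in (2.18)
    return np.sqrt(np.mean(d ** 2))          # tau-hat from (2.19)
```

This function can also serve as the tau_hat_fn argument in the bootstrap sketches of Section 2.3, e.g. by fixing the block length via lambda z: tau_hat_blockwise(z, 10).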

2.6. Some possible extensions

The self-normalization approaches in Sections 2.2–2.5 can be extended to the linear regression model (1.4) with modulated stationary time series errors. The approach in Phillips, Sun and Jin (2007) is not applicable here due to the non-stationarity. For simplicity, we consider the simple case p = 2, U_i = (1, i/n), and β = (β_0, β_1)′. Hansen (1995) studied a similar setting for martingale difference errors. Denote by β̂_0 and β̂_1 the simple linear regression estimates of β_0 and β_1 given by

\hat{\beta}_1 = \frac{n\sum_{i=1}^{n}iX_i - \sum_{i=1}^{n}i\sum_{i=1}^{n}X_i}{\sum_{i=1}^{n}i^2 - \big(\sum_{i=1}^{n}i\big)^2/n} \quad \text{and} \quad \hat{\beta}_0 = \bar{X}_n - \hat{\beta}_1(n+1)/(2n). (2.24)

Then simple algebra shows that

\hat{\beta}_0 - \beta_0 = \frac{2}{n^2 - n}\sum_{i=1}^{n}(2n - 3i + 1)\sigma_i e_i, \qquad \hat{\beta}_1 - \beta_1 = \frac{6}{n^2 - 1}\sum_{i=1}^{n}(2i - n - 1)\sigma_i e_i.

The latter expressions are linear combinations of {ei}. Thus, by the same argument in Proposition 2.2 and Theorem 2.1, we have self-normalized CLTs for β̂0 and β̂1.

Theorem 2.5. Let s_{i,0} = (2n − 3i + 1)σ_i and s_{i,1} = (2i − n − 1)σ_i. Assume that {s_{i,0}}_{1≤i≤n} and {s_{i,1}}_{1≤i≤n} satisfy condition (2.10). Then as n → ∞,

\frac{n^2(\hat{\beta}_0 - \beta_0)}{2V_{n,0}} \Rightarrow N(0, \tau^2), \quad \text{where} \quad V_{n,0}^2 = \sum_{i=1}^{n}(2n - 3i + 1)^2(X_i - \hat{\beta}_0 - \hat{\beta}_1 i/n)^2,
\frac{n^2(\hat{\beta}_1 - \beta_1)}{6V_{n,1}} \Rightarrow N(0, \tau^2), \quad \text{where} \quad V_{n,1}^2 = \sum_{i=1}^{n}(2i - n - 1)^2(X_i - \hat{\beta}_0 - \hat{\beta}_1 i/n)^2.

The long-run variance τ2 can be estimated using the idea of blockwise self-normalization in Section 2.5. Let kn, ℓn and ℐj be defined as in Section 2.5. Then we propose

\hat{\tau}^2 = \frac{1}{\ell_n}\sum_{j=1}^{\ell_n}D_j^2, \quad \text{where} \quad D_j = \frac{\sum_{i\in\mathcal{I}_j}(X_i - \hat{\beta}_0 - \hat{\beta}_1 i/n)}{\sqrt{\sum_{i\in\mathcal{I}_j}(X_i - \hat{\beta}_0 - \hat{\beta}_1 i/n)^2}}. (2.25)

Here, D_1, …, D_{ℓ_n} are asymptotically IID normal random variables with mean zero and variance τ². Consistency can be established under conditions similar to those in Theorem 2.4.
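A sketch of (2.25) for the simple case p = 2 (our illustration; numpy's polyfit performs the least-squares fit, which agrees with (2.24)):

```python
import numpy as np

def tau_hat_regression(x, kn):
    """Blockwise self-normalization (2.25): fit the trend beta0 + beta1*(i/n)
    by least squares, then self-normalize the residuals block by block."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(1, n + 1) / n
    b1, b0 = np.polyfit(t, x, 1)             # slope beta1-hat, intercept beta0-hat
    resid = x - b0 - b1 * t                  # X_i - beta0-hat - beta1-hat * i/n
    ln = n // kn
    d = np.empty(ln)
    for j in range(ln):
        r = resid[j * kn : (j + 1) * kn]
        d[j] = r.sum() / np.sqrt(np.sum(r ** 2))   # D_j in (2.25)
    return np.sqrt(np.mean(d ** 2))
```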

For the general linear regression model (1.4), the linearly weighted average structure of linear regression estimates allows us to obtain self-normalized CLTs as in Theorem 2.5 under more complicated conditions. Also, it is possible to extend the proposed method to the nonparametric regression model with time-varying variances

Xi=f(i/n)+σiei, (2.26)

where f(·) is a nonparametric time trend of interest. Nonparametric estimates, for example the Nadaraya-Watson estimate, are usually based on locally weighted observations. The latter feature allows us to derive a similar self-normalized CLT. However, the change-point problem for (1.4) and (2.26) will be more challenging; Aue et al. (2008a) studied (1.4) for uncorrelated errors with constant variance. Also, it is more difficult to address the bandwidth selection issue; see Altman (1990) for a related contribution when σ_i ≡ 1. It remains a direction of future research to investigate (1.4) and (2.26).

3. Simulation study

3.1. Selection of block length kn for τ̂

Recall that D_1, …, D_{ℓ_n} in (2.25) are asymptotically IID normal random variables. To get a sensible choice of the block length parameter k_n, we propose a simulation-based method that minimizes the empirical mean squared error (MSE).

  1. Simulate n IID standard normal random variables Z1, …, Zn.

  2. Based on Z1, …, Zn, obtain τ̂ with block length k.

  3. Repeat steps 1–2 many times, compute the empirical MSE(k) as the average of the realizations of (τ̂ − 1)², and find the optimal k by minimizing MSE(k).

We find that the optimal block length k is about 12 for n = 120, about 15 for n = 240, about 20 for n = 360, 600, and about 25 for n = 1200.
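A sketch of this selection rule (ours; tau_hat_fn(data, k) stands for the blockwise estimator, e.g. the tau_hat_blockwise sketch in Section 2.5, and the grid and replication count are illustrative):

```python
import numpy as np

def select_block_length(n, k_grid, tau_hat_fn, n_rep=500, seed=None):
    """Simulation-based choice of k_n: for IID N(0,1) data the true long-run
    variance is 1, so pick the k minimizing the empirical MSE of tau-hat."""
    rng = np.random.default_rng(seed)
    mse = []
    for k in k_grid:
        errs = [(tau_hat_fn(rng.standard_normal(n), k) - 1.0) ** 2
                for _ in range(n_rep)]       # steps 1-2, repeated
        mse.append(np.mean(errs))            # empirical MSE(k)
    return k_grid[int(np.argmin(mse))]       # step 3: the minimizer
```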

3.2. Empirical coverage probabilities

Let the sample size be n = 120. Recall e_i and σ_i in (1.1). For σ_i, consider four choices:

  • A1 : σ_i = 0.2·1_{i≤n/2} + 0.6·1_{i>n/2},

  • A2 : σ_i = 0.2{1 + cos²(i/n^{4/5})},

  • A3 : σ_i = 0.2 + 0.1 log(1 + |i − n/2|),

  • A4 : σ_i = 0.3 + ϕ(i/60),

where ϕ is the standard normal density and 1 is the indicator function. The sequences A1–A4 exhibit different patterns: piecewise constant for A1, a cosine shape for A2, a sharp change around time n/2 for A3, and a gradual downtrend for A4. Let ε_i be IID N(0,1). For e_i, we consider both linear and nonlinear processes:

  • B1 : e_i = {η_i − E(η_i)}/√Var(η_i), where η_i = θ|η_{i−1}| + √(1 − θ²) ε_i, |θ| < 1.

  • B2 : e_i = \sum_{j=0}^{∞} a_j ε_{i−j}, where a_j = (j+1)^{−β}/\{\sum_{j=0}^{∞}(j+1)^{−2β}\}^{1/2}, β > 1/2.

For B1, by Wu (2007), (2.8) holds. By Andel, Netuka and Svara (1984), E(η_i) = θ√(2/π) and Var(η_i) = 1 − 2θ²/π. To examine how the strength of dependence affects the performance, we consider θ = 0, 0.4, 0.8, representing independence, intermediate dependence, and strong dependence, respectively. For B2 with β > 2, (2.2) holds with Δ_n = n^{1/4} log(n), and we consider the three cases β = 2.1, 3, 4. To assess the effect of the block length k_n, three choices k_n = 8, 10, 12 are used. Thus, we consider all 72 combinations of {A1, A2, A3, A4} × {B1, θ = 0, 0.4, 0.8; B2, β = 2.1, 3, 4} × {k_n = 8, 10, 12}.
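For concreteness, the following hedged sketch generates one series under the design A1 × B1 (our code; the burn-in length is an arbitrary device to approximate stationarity of the B1 recursion):

```python
import numpy as np

def simulate_a1_b1(n=120, theta=0.4, seed=None):
    """One path of X_i = sigma_i * e_i with A1 variances and B1 errors."""
    rng = np.random.default_rng(seed)
    burn = 200                                   # burn-in (our choice)
    eps = rng.standard_normal(n + burn)
    eta = np.zeros(n + burn)
    for i in range(1, n + burn):                 # B1: eta_i = theta|eta_{i-1}| + sqrt(1-theta^2) eps_i
        eta[i] = theta * abs(eta[i - 1]) + np.sqrt(1 - theta ** 2) * eps[i]
    mean_eta = theta * np.sqrt(2 / np.pi)        # E(eta_i), Andel et al. (1984)
    var_eta = 1 - 2 * theta ** 2 / np.pi         # Var(eta_i)
    e = (eta[burn:] - mean_eta) / np.sqrt(var_eta)
    sigma = np.where(np.arange(1, n + 1) <= n / 2, 0.2, 0.6)   # A1 variances
    return sigma * e                             # X_i with mu = 0
```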

Without loss of generality, we examine coverage probabilities based on 10³ realized confidence intervals for μ = 0 in (1.1). We compare our self-normalization-based confidence intervals to some stationarity-based methods. For (1.1), if we pretend that the error process {ẽ_i = σ_i e_i} is stationary, then we can use (1.2) to construct an asymptotic confidence interval for μ. Under stationarity, the long-run variance τ² of {ẽ_i} can be similarly estimated through the block method in Section 2.5 by using the non-normalized version D_j = √k_n [X̄(j) − X̄_n] in (2.25); see Lahiri (2003). Thus, we compare two self-normalization-based methods and three stationarity-based alternatives: self-normalization-based confidence intervals through the asymptotic theory in Proposition 2.2 (SN) and the wild bootstrap (WB) in Section 2.3; and stationarity-based confidence intervals through the asymptotic theory (1.2) (ST), the non-overlapping block bootstrap (BB), and the studentized non-overlapping block bootstrap (SBB) in Section 2.3. From the results in Table 1, we see that the coverage probabilities of the proposed self-normalization-based methods (columns SN and WB) are close to the nominal level 95% for almost all cases considered. By contrast, the stationarity-based methods (columns ST, BB and SBB) suffer from substantial undercoverage, especially when the dependence is strong (θ = 0.8 in Table 1 (a) and β = 2.1 in Table 1 (b)). Of the two self-normalization-based methods, WB slightly outperforms SN.

Table 1.

Coverage probabilities (in percentage) for μ in (1.1) with ei from B1 [Table 1 (a)] and B2 [Table 1 (b)]. Nominal level is 95%. SN and WB denote self-normalization-based confidence intervals using asymptotic theory in Proposition 2.2 and the wild bootstrap procedure, respectively; ST, BB, SBB denote stationarity-based confidence intervals using asymptotic theory in (1.2), non-overlapping block bootstrap, and studentized non-overlapping block bootstrap, respectively.

(a): Model B1

θ kn σi SN WB ST BB SBB σi SN WB ST BB SBB
0.0 8 98.0 94.7 93.1 92.2 92.8 96.6 95.2 92.3 92.5 92.5
10 A1 98.2 95.0 92.6 92.4 92.2 A2 94.6 94.6 90.0 89.5 89.4
12 98.1 95.6 91.7 91.4 91.1 92.1 93.7 89.7 89.5 89.6

8 96.4 95.0 92.5 92.3 92.0 96.6 95.6 93.1 92.6 93.0
10 A3 94.7 94.7 90.8 90.6 90.6 A4 95.1 95.1 91.4 91.3 91.3
12 93.7 94.8 90.8 90.4 90.5 92.9 93.7 89.8 89.7 89.5

0.4 8 98.7 95.9 92.7 92.6 92.9 96.6 95.3 92.5 92.4 92.0
10 A1 98.5 95.7 92.8 92.7 92.3 A2 95.4 95.4 91.6 91.1 91.6
12 98.0 95.0 90.8 90.8 90.2 92.5 94.0 89.4 89.1 89.4

8 96.6 95.2 91.7 91.7 91.6 95.4 94.1 90.8 90.9 90.6
10 A3 95.3 95.5 91.5 91.3 91.5 A4 95.0 94.8 91.2 90.7 90.8
12 93.1 94.6 90.2 89.9 89.9 94.1 95.1 90.3 89.8 90.1

0.8 8 97.9 94.6 87.8 86.8 87.3 96.1 94.7 87.2 87.3 87.0
10 A1 97.6 95.5 87.3 87.0 86.7 A2 93.3 92.9 86.4 86.8 86.1
12 97.3 94.0 85.8 85.5 85.1 92.6 93.4 86.5 86.4 86.4

8 94.8 93.5 85.7 85.7 86.0 95.5 94.7 86.3 86.1 86.1
10 A3 93.5 93.8 85.7 85.5 85.2 A4 95.3 95.1 88.5 88.3 88.5
12 92.4 93.3 87.2 86.7 86.9 92.6 94.2 86.3 85.8 85.7
(b): Model B2

β kn σi SN WB ST BB SBB σi SN WB ST BB SBB
4.0 8 97.6 94.9 91.8 91.4 91.9 95.9 94.2 91.9 92.0 91.1
10 A1 97.7 93.2 88.9 88.1 88.3 A2 95.7 95.7 92.1 91.8 92.1
12 97.9 95.5 90.7 90.2 90.0 93.3 94.6 90.0 89.9 89.7

8 94.6 93.3 89.8 89.5 89.5 95.6 94.7 91.3 91.7 91.0
10 A3 95.1 95.2 91.6 91.4 91.5 A4 95.4 95.9 92.8 92.2 93.0
12 93.8 95.4 90.8 90.6 90.2 93.9 94.9 88.9 88.5 88.6

3.0 8 99.1 95.7 91.1 91.0 91.2 95.8 94.6 90.4 89.8 90.1
10 A1 98.5 96.4 91.6 90.9 91.1 A2 95.6 95.2 92.1 91.9 91.5
12 97.9 94.6 89.6 89.3 89.0 94.1 95.0 90.5 90.2 90.4

8 95.9 94.6 92.0 91.9 91.7 96.0 94.5 90.6 90.4 90.3
10 A3 94.3 94.4 90.0 89.9 89.8 A4 94.3 94.4 89.2 89.3 88.9
12 93.2 94.5 88.9 88.6 88.7 93.1 94.1 89.6 88.9 88.8

2.1 8 97.1 92.5 86.2 86.2 85.5 95.7 93.8 88.9 89.0 88.7
10 A1 97.6 94.7 89.2 88.9 88.6 A2 93.5 93.6 88.8 88.8 88.4
12 97.2 95.1 87.9 87.5 87.7 92.6 93.9 88.0 87.6 87.7

8 94.0 93.7 88.5 88.4 88.3 95.0 93.1 88.8 88.7 88.6
10 A3 93.3 93.8 88.1 87.9 87.8 A4 94.1 94.2 89.1 88.8 89.1
12 92.9 94.7 89.1 88.4 88.4 91.5 92.6 87.7 87.5 87.5

3.3. Size and power study

In (1.3), we use the same settings for σ_i and e_i as in Section 3.2. For the mean μ_i, we consider μ_i = λ·1_{i>40}, λ ≥ 0, and compare the test statistics T_{n1}, T_{n2} in (2.13) and T_n^{SN} in (2.16). First, we compare their sizes under the null hypothesis λ = 0. The critical value of T_n^{SN} is obtained using the wild bootstrap in Section 2.4; for T_{n1} and T_{n2}, the critical values are based on the block bootstrap in Section 2.3. In each case, we use 10³ bootstrap samples, nominal level 5%, and block length k_n = 10, and summarize the empirical sizes (under the null λ = 0) in Table 2 based on 10³ realizations. While T_n^{SN} has size close to 5%, T_{n1} and T_{n2} tend to over-reject the null, and their false rejection probabilities can be three times the nominal level of 5%. Next, we compare the size-adjusted power. Instead of using the bootstrap methods to obtain critical values, we use the 95% quantiles of 10⁴ realizations of the test statistics when data are simulated directly from the null model, so that the empirical size is exactly 5%. Figure 1 presents the power curves for the combinations {A1–A4} × {B1 with θ = 0.4; B2 with β = 3.0}, with 10³ realizations each. For A1, T_n^{SN} outperforms T_{n1} and T_{n2}; for A2–A4, there is a moderate loss of power for T_n^{SN}. Overall, T_n^{SN} has power comparable to the other two tests. In practice, however, the null model is unknown, and when one turns to the bootstrap method to obtain the critical values, the usual CUSUM tests T_{n1} and T_{n2} will likely over-reject the null, as shown in Table 2. In summary, with such a small sample size and a complicated time-varying variance structure, T_n^{SN} along with the wild bootstrap method delivers reasonably good power with size close to the nominal level.

Table 2.

Size (in percentage) comparison of Tn1 and Tn2 in (2.13) and TnSN in (2.16), with sample size n = 120, nominal level 5%, and block length kn = 10.

          Model B1                      Model B2
σi    θ     TnSN    Tn1    Tn2      β     TnSN    Tn1    Tn2
0.0 4.9 9.1 8.4 2.1 7.3 12.2 13.4
A1 0.4 4.7 9.4 9.6 3.0 4.7 8.6 9.2
0.8 6.0 15.1 14.7 4.0 5.6 9.9 7.7

0.0 5.7 8.2 6.1 2.1 5.8 9.5 8.6
A2 0.4 6.1 8.9 6.8 3.0 5.3 9.6 6.8
0.8 7.3 12.6 9.3 4.0 4.2 7.5 4.2

0.0 5.0 5.7 4.8 2.1 5.5 7.7 6.7
A3 0.4 5.3 6.9 5.4 3.0 5.8 6.1 4.9
0.8 7.0 9.8 10.0 4.0 5.0 6.5 4.2

0.0 5.4 8.4 6.0 2.1 6.9 8.8 7.1
A4 0.4 5.7 7.9 5.2 3.0 4.8 6.6 6.3
0.8 7.2 11.1 9.2 4.0 5.3 6.2 5.8

Figure 1.


Size-adjusted power curves for Tn1 (dashed curve) and Tn2 (dotdash curve) in (2.13) and TnSN (solid curve) in (2.16) as functions of change size λ (horizontal axis) with sample size n = 120 and block length kn = 10. For (A1,B1)–(A4,B1), the error process {ei} is from B1 with θ = 0.4; for (A1,B2)–(A4,B2), the error process {ei} is from B2 with β = 3.0.

Finally, we point out that the proposed self-normalization-based methods are not robust to models with time-varying correlation structures. For example, consider the model e_i = 0.3e_{i−1} + ε_i for 1 ≤ i ≤ 60 and e_i = 0.8e_{i−1} + ε_i for 61 ≤ i ≤ n, where ε_i are IID N(0,1). With k_n = 10, the sizes (nominal level 5%) of the three tests T_n^{SN}, T_{n1}, T_{n2} are 0.154, 0.196, 0.223 for A1. Future research directions include (i) developing tests for changes in the variance or covariance structure for (1.1) [see Inclán and Tiao (1994), Aue et al. (2009) and Berkes, Gombay and Horváth (2009) for related contributions]; and (ii) developing methods that are robust to changes in correlations.

4. Applications to two real data sets

4.1. Annual mean precipitation in Seoul during 1771–2000

The data set consists of annual mean precipitation rates in Seoul during 1771–2000; see Figure 2 for a plot. The mean levels seem to be different for the two time periods 1771–1880 and 1881–2000. Ha and Ha (2006) assumed that the observations are IID under the null hypothesis. As shown in Figure 2, the variations change over time. Also, the autocorrelation function plot (not reported here) indicates strong dependence up to lag 18. Therefore, it is more reasonable to apply our self-normalization-based test, which is tailored to deal with modulated stationary processes. With sample size n = 230, by the method in Section 3.1, the optimal block length is about 15. Based on 10⁵ bootstrap samples as described in Section 2.4, we obtain the corresponding p-values 0.016, 0.005, 0.045, 0.007, with block length k_n = 12, 14, 16, 18, respectively. For all choices of k_n, there is compelling evidence that a change-point occurred at year 1880. While our result is consistent with that of Ha and Ha (2006), our modulated stationary time series framework seems to be more reasonable. Denote by μ_1 and μ_2 the mean levels over the pre-change and post-change time periods 1771–1880 and 1881–2000. For the two sub-periods with sample sizes 110 and 120, the optimal block length is about 12. With k_n = 12, applying the wild bootstrap in Section 2.3 with 10⁵ bootstrap samples, we obtain the 95% confidence intervals [121.7, 161.3] for μ_1 and [100.9, 114.3] for μ_2. For the difference μ_1 − μ_2, with optimal block length k_n = 15, the 95% wild bootstrap confidence interval is [19.6, 48.2]. Note that the latter confidence interval for μ_1 − μ_2 does not cover zero, which provides further evidence for μ_1 ≠ μ_2 and the existence of a change-point at year 1880.

Figure 2.


Annual mean precipitation in Seoul during 1771–2000.

4.2. Quarterly U.S. GNP growth rates during 1947–2002

The data set consists of quarterly U.S. Gross National Product (GNP) growth rates from the first quarter of 1947 to the third quarter of 2002; see Section 3.8 in Shumway and Stoffer (2006) for a stationary autoregressive model approach. However, the plot in Figure 3 suggests a non-stationary pattern: the variation becomes smaller after year 1985 whereas the mean level remains constant. Moreover, the stationarity test in Kwiatkowski et al. (1992) provides fairly strong evidence for non-stationarity with a p-value of 0.088. With block length k_n = 12, 14, 16, 18, we obtain the corresponding p-values 0.853, 0.922, 0.903, 0.782, and hence there is no evidence to reject the null hypothesis of a constant mean μ. Based on k_n = 15, the 95% wild bootstrap confidence interval for μ is [0.66%, 1.00%]. To test whether there is a change-point in the variance, by the discussion in the last paragraph of Section 2.4, we consider X̃_i = (X_i − X̄_n)². With k_n = 12, 14, 16, 18, the corresponding p-values are 0.001, 0.006, 0.001, 0.010, indicating strong evidence for a change-point in the variance at year 1984. In summary, we conclude that there is no change-point in the mean level, but there is a change-point in the variance at year 1984.

Figure 3.


Quarterly U.S. GNP growth rates during 1947–2002.

Acknowledgements

We are grateful to the associate editor and three anonymous referees for their insightful comments that have significantly improved this paper. We also thank Amanda Applegate for help on improving the presentation and Kyung-Ja Ha for providing us the Seoul precipitation data. Zhao’s research was partially supported by NIDA grant P50-DA10075. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

Appendix: Proofs

Proof of Theorem 2.1. Let r_j = |σ_j| + \sum_{i=2}^{j}|σ_i − σ_{i−1}|. By the triangle inequality, we have r_j ≤ r_n. Recall S_i in (2.1). By the summation by parts formula, (2.6) follows via

F_j = \sum_{i=1}^{j}\sigma_i(S_i - S_{i-1}) = \sigma_j S_j + \sum_{i=1}^{j-1}(\sigma_i - \sigma_{i+1})S_i = \sigma_j\tau B_j + \sum_{i=1}^{j-1}(\sigma_i - \sigma_{i+1})\tau B_i + O_{a.s.}(r_n\Delta_n) = \tau\sum_{i=1}^{j}\sigma_i(B_i - B_{i-1}) + O_{a.s.}(r_n\Delta_n). (5.1)

By Kolmogorov’s maximal inequality for independent random variables, for δ > 0,

\mathbb{P}\Big\{\max_{1\le j\le n}\Big|\sum_{i=1}^{j}\sigma_i(B_i - B_{i-1})\Big| \ge \delta\Sigma_n\Big\} \le (\delta\Sigma_n)^{-2}E\Big[\Big\{\sum_{i=1}^{n}\sigma_i(B_i - B_{i-1})\Big\}^2\Big] = \delta^{-2}. (5.2)

Thus, by (5.1), \max_{1\le j\le n}|F_j| = O_p(\Sigma_n + r_n\Delta_n). Observe that

\bar{V}_j^2 - \Sigma_j^2 = W_j - F_j^2/j, \quad \text{where} \quad W_j = \sum_{i=1}^{j}\sigma_i^2(e_i^2 - 1). (5.3)

By (2.2), the same argument as in (5.1) and (5.2) shows W_j = O_p(Σ_n^{*2} + r_n^*Δ_n), uniformly. The desired result then follows via (5.3).

Proof of Theorem 2.2. Denote by Φ(x) the standard normal distribution function. By Proposition 2.2 and Slutsky’s theorem, ℙ(H_n/τ̂ ≤ x) → Φ(x) for each fixed x ∈ ℝ. Since Φ(x) is a continuous distribution function, \sup_{x∈ℝ} |ℙ(H_n/τ̂ ≤ x) − Φ(x)| → 0. It remains to prove \sup_x |ℙ*(H_n^* ≤ x) − Φ(x)| → 0, in probability. Notice that, conditional on {e_i}, the {ξ_i} are independent random variables with zero mean. By the Berry-Esséen bound in Bentkus, Bloznelis and Götze (1996), there exists a finite constant c such that

\sup_x\big|\mathbb{P}^*(H_n^* \le x) - \Phi(x)\big| \le c\,\frac{\sum_{i=1}^{n}E^*(|\xi_i|^3)}{\{\sum_{i=1}^{n}E^*(|\xi_i|^2)\}^{3/2}}, (5.4)

where E^* denotes the conditional expectation given {e_i}. Clearly, E^*(|ξ_i|²) = σ_i²e_i²E(α_1²) and E^*(|ξ_i|³) = σ_i³|e_i|³E(|α_1|³). Thus, under the assumption e_i ∈ ℒ³, we have \sum_{i=1}^{n}E^*(|ξ_i|³) = O_p(\sum_{i=1}^{n}σ_i³). Meanwhile, by the proof of Theorem 2.1, \sum_{i=1}^{n}E^*(|ξ_i|²) = \sum_{i=1}^{n}σ_i²e_i² = {1 + o_p(1)}\sum_{i=1}^{n}σ_i². Therefore, the desired result follows from (5.4) in view of (2.11).

Proof of Theorem 2.3. For cn ≤ j ≤ (1 − c)n, c ≤ 1 − j/n and j/n ≤ 1 − c. For S_X(j) in (2.14), by (2.6), we have \max_{cn≤j≤(1−c)n} |S_X(j) − τ\tilde{S}_X(j)| = O_{a.s.}(r_nΔ_n), where

\tilde{S}_X(j) = \Big(1 - \frac{j}{n}\Big)\sum_{i=1}^{j}\sigma_i(B_i - B_{i-1}) - \frac{j}{n}\sum_{i=j+1}^{n}\sigma_i(B_i - B_{i-1}).

By (2.7), \max_{cn\le j\le(1-c)n}\big|(1-j/n)^2\bar{V}_j^2 + (j/n)^2\tilde{V}_j^2 - V_j^2\big| = O_p(\varpi_n), where

V_j^2 = (1-j/n)^2\sum_{i=1}^{j}\sigma_i^2 + (j/n)^2\sum_{i=j+1}^{n}\sigma_i^2 \quad \text{and} \quad \varpi_n = (r_n^2\Delta_n^2 + \Sigma_n^2)/n + \Sigma_n^{*2} + r_n^*\Delta_n.

For cn ≤ j ≤ (1 − c)n, V_j² ≥ c²Σ_n². Thus, condition (2.10) implies ϖ_n = o(V_j²) and {V_j² + O_p(ϖ_n)}^{1/2} = V_j + O_p(ϖ_n/V_j). Therefore, uniformly over cn ≤ j ≤ (1 − c)n,

T_n(j) - \tau\tilde{T}_n(j) = \frac{\tau\tilde{S}_X(j) + O_{a.s.}(r_n\Delta_n)}{V_j + O_p(\varpi_n/V_j)} - \frac{\tau\tilde{S}_X(j)}{V_j} = O_p\Big\{\frac{r_n\Delta_n}{V_j} + \frac{\varpi_n\tilde{S}_X(j)}{V_j^3}\Big\}.

By (5.2), \max_j |\tilde{S}_X(j)| = O_p(Σ_n). Thus, the result follows in view of V_j ≥ cΣ_n.

Proof of Theorem 2.4. The condition M_n → 0 implies \max_{1\le j\le \ell_n} r(j)Δ_n/Σ(j) → 0. By (2.7),

\omega_j := \frac{V^2(j)}{\Sigma^2(j)} - 1 = O_p\Big\{\frac{\Sigma^{*2}(j) + r^*(j)\Delta_n}{\Sigma^2(j)} + \frac{1}{k_n}\Big\} = O_p(M_n) \to 0. (5.5)

Define U_j = Σ^{−1}(j)\sum_{i∈ℐ_j} σ_i(B_i − B_{i−1}). Clearly, U_1, …, U_{ℓ_n} are independent standard normal random variables. Thus, \max_{1\le j\le \ell_n}|U_j| = O_p\{\sqrt{\log(\ell_n)}\} = O_p\{\sqrt{\log(n)}\}. By (2.6), X̄_n − μ = O_p{(Σ_n + r_nΔ_n)/n} = O_p(Σ_n/n). Recall the definition of D_j in (2.18). By the same argument as in (2.6), using \sqrt{1+x} = 1 + O(x) as x → 0, we have

D_j = \frac{k_n\{\bar{X}(j) - \mu\}}{\Sigma(j)\sqrt{1+\omega_j}} + \frac{k_n(\mu - \bar{X}_n)}{\Sigma(j)\sqrt{1+\omega_j}} = \Big[\tau U_j + O_{a.s.}\Big\{\frac{r(j)\Delta_n}{\Sigma(j)}\Big\}\Big]\{1 + O(\omega_j)\} + O_p\Big\{\frac{k_n\Sigma_n}{n\Sigma(j)}\Big\} = \tau U_j + O_p\Big\{\sqrt{\log(n)}\,M_n + \frac{k_n\Sigma_n}{n\Sigma(j)}\Big\}.

By the latter expression and log(n)M_n → 0, we can easily verify τ̂² − τ² = O_p(χ_n).

References

  1. Adak S. Time-dependent spectral analysis of nonstationary time series. J. Amer. Statist. Assoc. 1998;93:1488–1501.
  2. Altman NS. Kernel smoothing of data with correlated errors. J. Amer. Statist. Assoc. 1990;85:749–759.
  3. Andel J, Netuka I, Svara K. On threshold autoregressive processes. Kybernetika. 1984;20:89–106.
  4. Andrews DWK. Tests for parameter instability and structural change with unknown change point. Econometrica. 1993;61:821–856.
  5. Aue A, Hörmann S, Horváth L, Reimherr M. Break detection in the covariance structure of multivariate time series models. Ann. Statist. 2009;37:4046–4087.
  6. Aue A, Horváth L, Hušková M, Kokoszka P. Testing for changes in polynomial regression. Bernoulli. 2008a;14:637–660.
  7. Aue A, Horváth L, Kokoszka P, Steinebach J. Monitoring shifts in mean: asymptotic normality of stopping times. Test. 2008b;17:515–530.
  8. Bai J, Perron P. Estimating and testing linear models with multiple structural changes. Econometrica. 1998;66:47–78.
  9. Bentkus V, Bloznelis M, Götze F. A Berry-Esséen bound for Student’s statistic in the non i.i.d. case. J. Theoret. Probab. 1996;9:765–796.
  10. Berkes I, Gombay E, Horváth L. Testing for changes in the covariance structure of linear processes. J. Statist. Plann. Inference. 2009;139:2044–2063.
  11. Billingsley P. Convergence of Probability Measures. New York: Wiley; 1968.
  12. Bühlmann P. Bootstraps for time series. Stat. Sci. 2002;17:52–72.
  13. Carlstein E. The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Ann. Statist. 1986;14:1171–1179.
  14. Csörgő M, Horváth L. Limit Theorems in Change-point Analysis. New York: Wiley; 1997.
  15. Dahlhaus R. Fitting time series models to nonstationary processes. Ann. Statist. 1997;25:1–37.
  16. Dahlhaus R, Polonik W. Empirical spectral processes for locally stationary time series. Bernoulli. 2009;15:1–39.
  17. Davidson R, Flachaire E. The wild bootstrap, tamed at last. J. Econometrics. 2008;146:162–169.
  18. De Jong RM, Davidson J. Consistency of kernel estimators of heteroscedastic and autocorrelated covariance matrices. Econometrica. 2000;68:407–423.
  19. Efron B. Bootstrap methods: Another look at the jackknife. Ann. Statist. 1979;7:1–26.
  20. Fan J, Yao Q. Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer-Verlag; 2003.
  21. Götze F, Künsch HR. Second order correctness of the blockwise bootstrap for stationary observations. Ann. Statist. 1996;24:1914–1933.
  22. Ha K-J, Ha E. Climatic change and interannual fluctuations in the long-term record of monthly precipitation for Seoul. Int. J. Climatol. 2006;26:607–618.
  23. Hansen B. Regression with non-stationary volatility. Econometrica. 1995;63:1113–1132.
  24. Hansen B. Testing for structural change in conditional models. J. Econometrics. 2000;97:93–115.
  25. Horváth L. The maximum likelihood method for testing changes in the parameters of normal observations. Ann. Statist. 1993;21:671–680.
  26. Inclán C, Tiao GC. Use of cumulative sums of squares for retrospective detection of changes of variance. J. Amer. Statist. Assoc. 1994;89:913–923.
  27. Künsch HR. The jackknife and the bootstrap for general stationary observations. Ann. Statist. 1989;17:1217–1241.
  28. Kwiatkowski D, Phillips PCB, Schmidt P, Shin Y. Testing the null hypothesis of stationarity against the alternative of a unit root. J. Econometrics. 1992;54:159–178.
  29. Lahiri SN. Resampling Methods for Dependent Data. New York: Springer-Verlag; 2003.
  30. Liu RY. Bootstrap procedures under some non-I.I.D. models. Ann. Statist. 1988;16:1696–1708.
  31. Müller UK. A theory of robust long-run variance estimation. J. Econometrics. 2007;141:1331–1352.
  32. Pettitt A. A simple cumulative sum type statistic for the change-point problem with zero-one observations. Biometrika. 1980;67:79–84.
  33. Phillips PCB, Sun YX, Jin SN. Long run variance estimation and robust regression testing using sharp origin kernels with no truncation. J. Statist. Plann. Inference. 2007;137:985–1023.
  34. Politis D, Romano J. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 1994;22:2031–2050.
  35. Robbins MW, Lund RB, Gallagher CM, Lu Q. Changepoints in the North Atlantic tropical cyclone record. J. Amer. Statist. Assoc. 2011;106:89–99.
  36. Shao QM. Almost sure invariance principles for mixing sequences of random variables. Stoch. Proc. Appl. 1993;48:319–334.
  37. Shao X, Zhang X. Testing for change points in time series. J. Amer. Statist. Assoc. 2010;105:1228–1240.
  38. Shumway RH, Stoffer DS. Time Series Analysis and its Applications with R Examples. 2nd edn. New York: Springer; 2006.
  39. Wu CFJ. Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Statist. 1986;14:1261–1295.
  40. Wu WB. Strong invariance principles for dependent random variables. Ann. Probab. 2007;35:2294–2320.
  41. Zhao Z. A self-normalized confidence interval for the mean of a class of non-stationary processes. Biometrika. 2011;98:81–90. doi: 10.1093/biomet/asq076.
