Journal of Applied Statistics. 2023 Mar 13;50(14):2862–2888. doi: 10.1080/02664763.2023.2174257

Detection and estimation of multiple transient changes

Michael Baron, Sergey V. Malov
PMCID: PMC10557625  PMID: 37808619

Abstract

Change-point detection methods are proposed for the case of temporary failures, or transient changes, when an unexpected disorder is ultimately followed by a re-adjustment and return to the initial state. A base distribution of the ‘in-control’ state changes to an ‘out-of-control’ distribution for unknown periods of time. Likelihood based sequential and retrospective tools are proposed for the detection and estimation of each pair of change-points. The accuracy of the obtained change-point estimates is assessed. Proposed methods offer simultaneous control of the familywise false alarm and false re-adjustment rates at the pre-chosen levels.

Keywords: Change-point problem, CUSUM process, false alarm, maximum likelihood estimate, transient changes

1. Introduction to transient changes

Transient changes, or temporary disorders, refer to the situations when an initial distribution of observed data changes to a different one and eventually returns to the original state. The moments of change are usually unexpected and a priori unknown, the underlying distributions may be known or unknown, but the ultimate return to the initial distribution is assumed to be inevitable. In general, a data sequence may experience one or more transient changes, which can be changes in the mean value, variance, or other characteristics of the observed process. This article focuses on the detection of such changes and estimation of change-points.

There is a wide range of practical situations that are subject to transient changes. Applications in signal and image processing for the detection of finite signals are mentioned in [41], with the detection of space objects detailed in [41, Section 6]. Detection of transient changes appears useful in medical diagnostics based on the heart rate variability [7]. Similar models, termed ‘the pulse form’ or ‘the epidemic alternative’, were introduced in [17,23,46] for epidemiologic monitoring and malformation surveillance. Application to the monitoring of chemical concentrations in drinking water is detailed in [16], Section 6. Analysis of transient changes is important in industrial process control and power systems, for the identification of in-control and out-of-control periods; specific applications are described in [1,48]. Another application studied in [1] deals with the exploration of vertical ocean shears.

Similar situations also occur in financial data from deregulated energy markets. During periods of high demand, extreme weather conditions, or maintenance or closure of a power plant, the instantaneous price of electricity may experience a spike lasting from several hours to several days, as shown in Figure 1. After each spike, the distribution of prices returns to the initial state [5,6,39,47]. Accurate detection of spikes and estimation of their parameters is needed for financial modeling and prediction that is critical for proper valuation of energy options and contracts [29,30].

Figure 1. Spikes in instantaneous electricity prices during two years in the PJM (Pennsylvania–New Jersey–Maryland) energy market.

The field of change-point analysis has been well studied over the last 70 years or so [33]. Thorough surveys of proposed methods can be found in [8,27,42]. Many of the proposed tools can be applied to the analysis of transient changes; for example, [18] evaluates the performance of Page's CUSUM procedure [25,26] for transient changes of a known duration. As we show, transient change-point analysis is a different problem; in particular, likelihood-ratio testing for a transient change is related to the maximum value of a CUSUM process.

We start with the retrospective (off-line, non-sequential) analysis of one transient change. Our goals focus on testing occurrence of a change and estimating its starting and ending moments. The level α likelihood-ratio test is derived, which leads to the maximum likelihood estimation of the interval of change. Then, we derive the asymptotic distribution of the maximum likelihood estimator.

In the case of Normal distributions, our estimator of the end of a transient segment coincides with the test statistic of [46, Section 2.2] and [35, Section 3.6]. A number of other tests are presented in [46] for a transient change in a sequence of Normal random variables. A short survey of other tests is given in [1, Section 1].

We then generalize our method to the situation of multiple transient changes. A number of algorithms have been proposed for multiple change-points that are not transient. One can use an overall maximum likelihood estimation procedure [14], although without a restriction it tends to produce false alarms. To limit false alarms, one can restrict the number of change-points or the distance between them, as in [22]. Alternatively, one can utilize binary segmentation [43,44] and wild binary segmentation [12,13], the isolate-detect scheme [2], Bayesian recursion [11] and its regression version [31], scan statistics [10], and other methods.

Building upon our analysis of one transient change, we utilize a fully sequential scheme of [3] to detect and estimate multiple transient changes. In other words, a retrospective problem that arises from a sample of a fixed size n is being solved sequentially, where change-points are detected sequentially, one at a time.

Several sequential statistical methods have been proposed for the analysis of transient changes. Under the assumption of a known duration of the post-change period, the standard CUSUM algorithm for change-point detection is modified and optimized in [15,16,41]. The optimality is understood as the lowest probability of missing a transient change [15,16] or the highest probability of detection [41], subject to the given probability of a false alarm within the given time. The optimized detection rule is the window-limited CUSUM, or WL-CUSUM. The special case of changes in the mean is considered in [24], where an approximate expression for the average run length to false alarm is given for the moving-average sum (MOSUM) algorithm.

When the assumption of a completely known duration of the period of change is not realistic, one may consider it random, put a prior distribution on each change-point, and consider the resulting Bayesian problem, as in [6,28,40].

For a more detailed overview of literature on Bayesian and non-Bayesian transient change-point detection methods, see [9,16,46].

In this work, we focus on the detection of transient changes and estimation of change-points when the change intervals are completely unknown. The considered scenarios are retrospective, in which a fixed-length data sequence is already collected, and it may contain one or more transient changes that we aim to detect without an option of requesting more data.

Existence of two alternating distributions, regarded as ‘in-control’ and ‘out-of-control’ states, leads us to a new problem of a simultaneous control of the false alarm and the false re-adjustment rates.

The new results include simultaneous control of these rates in the case of one transient change, their familywise control in the case of multiple transient changes, and the asymptotic distribution of the maximum likelihood estimator of the interval of change. Notably, the threshold that we introduce for α-level testing of one transient change extends to the case of multiple transient changes, providing α-control of both familywise error rates without a Bonferroni-type correction or another adjustment for multiple comparisons. Doob's maximal inequality, used in the derivation of this threshold, appears sufficient to guarantee familywise control regardless of the number of transient changes.

Algorithms for the detection, estimation, and testing of one transient change are derived in Section 2, and for multiple transient changes in Section 4, where the number of changes may be known or unknown. The proposed detection method, a self-correcting CUSUM procedure, is shown to detect transient changes while controlling the familywise false alarm rate and the familywise false re-adjustment rate simultaneously at the pre-determined levels. Asymptotic results are derived in Section 3. Section 5 contains illustrations and simulation results.

2. Estimation and testing of one transient change interval

In this section, we assume at most one interval of change and derive the transient change detection scheme that controls the probability of a false alarm at a pre-chosen level α. Two scenarios are possible: either all the data follow the base distribution,

$$H_0:\ X_1,\dots,X_n\sim F,$$

or there is one region of change $[a,b]$, so that

$$H_1:\ \begin{cases}X_1,\dots,X_a\sim F\\ X_{a+1},\dots,X_b\sim G\\ X_{b+1},\dots,X_n\sim F\end{cases}$$

where $a$ and $b$ are unknown change-points while the distributions $F$ and $G$ are known. The former case can be viewed as the no-change null hypothesis $H_0$, and the latter as the transient-change alternative $H_1$.

The goals are (1) to distinguish between $H_0$ and $H_1$ with a given level of significance, and (2) to estimate change-point parameters $a$ and $b$ in the case of $H_1$.

2.1. Maximum likelihood estimation

From now on, we assume independent observations $X_1,\dots,X_n$. (The case of dependent data is usually solved by representing the joint likelihood function as a product of conditional densities, given past observations. Probability results will then require additional assumptions.) For the parameter $(a,b)$, the log-likelihood function is written as

$$L(X;a,b)=\sum_{i=1}^{a}\log f(X_i)+\sum_{i=a+1}^{b}\log g(X_i)+\sum_{i=b+1}^{n}\log f(X_i)=\sum_{i=a+1}^{b}\log\frac{g(X_i)}{f(X_i)}+\sum_{i=1}^{n}\log f(X_i)=S_b-S_a+\text{const}, \tag{1}$$

where $f$ and $g$ are probability densities of distributions $F$ and $G$ with respect to a reference measure $\mu$;

$$S_t=\sum_{i=1}^{t}\log\frac{g(X_i)}{f(X_i)}=\sum_{i=1}^{t}z_i$$

is a random walk built on marginal log-likelihood ratios $z_i$ as its increments; and $\sum_{i=1}^{n}\log f(X_i)$ is a constant term as it does not depend on the unknown parameters $a$ and $b$. Measures $F$ and $G$ are not required to be mutually absolutely continuous, so that the log-likelihood ratio $\log(g/f)$ assumes values in $\bar{\mathbb R}=[-\infty,\infty]$.

Maximizing (1), we immediately obtain the maximum likelihood estimator (MLE)

$$(\hat a,\hat b)=(\hat a_n,\hat b_n)=\arg\max_{a\le b}(S_b-S_a). \tag{2}$$

Naturally, the log-likelihood (1) and the maximizer (2) are functions of the sample size $n$. To simplify notations, $n$ will often be omitted, except for the asymptotic study in Section 3.4, exploring the distribution of (2) as $n\to\infty$.

According to (2), the MLE returns the interval of the largest growth of random walk $S_t$. A direct method of calculating $\hat a$ and $\hat b$ can be proposed in terms of the associated cumulative-sum (CUSUM) process

$$W_t=S_t-\min_{i\le t}S_i, \tag{3}$$

which vanishes at every successive point of minimum of $S_t$. Given $\hat b$, one finds $\hat a$ by minimizing $S_t$ for $t\le\hat b$, that is, finding the most recent zero of the CUSUM $W_t$. Then, the CUSUM does not return to zero between $\hat a$ and $\hat b$, and therefore,

$$S_{\hat b}-S_{\hat a}=W_{\hat b}-W_{\hat a}=W_{\hat b}. \tag{4}$$

Maximizing (4), we obtain a computational formula for the MLE,

$$\hat b=\hat b_n=\arg\max_{0<t\le n}W_t,\qquad \hat a=\hat a_n=\max\{\mathrm{Ker}(W)\cap[0,\hat b)\}, \tag{5}$$

where $\mathrm{Ker}(W)=\{t: W_t=0\}$ denotes the CUSUM's ‘kernel’, or the set of its zeros.

Our estimator $\hat b$ matches the Levin-Kline statistic (3) of [23], which is applied to the case of Normal distributions in Section 3.6 of [35] and Section 2.2 of [46].

As an illustration, an example of a log-likelihood ratio based random walk $S_t$, the associated CUSUM process $W_t$, and the resulting transient change-point estimator $(\hat a,\hat b)$ is shown in Figure 2.
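The recursion behind (3)–(5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `normal_llr` and `transient_mle` are hypothetical, the Normal mean-shift pair is only one example choice of $F$ and $G$, and ties in the maximum are broken by taking the earliest $t$ (an assumption that the text does not specify).

```python
def normal_llr(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """Log-likelihood ratio log g(x)/f(x) for the example pair
    F = N(mu0, sigma^2), G = N(mu1, sigma^2) (an illustrative choice)."""
    return (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2


def transient_mle(z):
    """Return (a_hat, b_hat, Lambda) from increments z, where z[t-1] is z_t.

    W_t follows the CUSUM recursion W_t = max(0, W_{t-1} + z_t);
    b_hat is the first time at which W_t attains its maximum and
    a_hat is the last zero of W_t before b_hat, as in (5).
    """
    w = 0.0          # CUSUM value, W_0 = 0
    last_zero = 0    # most recent time with W_t = 0
    best = (0, 0, 0.0)
    for t, zt in enumerate(z, start=1):
        w = max(0.0, w + zt)
        if w == 0.0:
            last_zero = t
        if w > best[2]:          # strict: keep the earliest maximizer
            best = (last_zero, t, w)
    return best


z = [-1, -1, 2, 1, -0.5, 1, -3, 1]
print(transient_mle(z))  # (2, 6, 3.5), i.e. S_6 - S_2 = 3.5
```

If every increment is negative, the sketch returns `(0, 0, 0.0)`, which corresponds to no interval of growth being found.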

Figure 2. Maximum likelihood estimation of a single transient change interval. The likelihood-ratio test statistic $\Lambda$ is the largest increment of both processes $S_t$ and $W_t$.

2.2. Testing appearance of a transient change

The largest increment (4) of the random walk $S_t$ also serves as the log-likelihood ratio test statistic

$$\Lambda=\Lambda_n=\log\frac{\max_{a<b}\{f(X_{0:a})\,g(X_{a:b})\,f(X_{b:n})\}}{f(X_{0:n})}=\max_{a<b}(S_b-S_a)=W_{\hat b}$$

for testing the no-change null hypothesis against an alternative hypothesis that a transient change occurred in our data,

$$H_0:\ X_{0:n}\sim F\quad\text{vs.}\quad H_1:\ \begin{cases}X_{0:a}\sim F\\ X_{a:b}\sim G\\ X_{b:n}\sim F\end{cases}\ \text{for some } a<b,$$

where $X_{k:m}=(X_{k+1},\dots,X_m)$ for any $k<m$, and $F(X_{k:m})$ and $G(X_{k:m})$ denote joint distributions.

The likelihood-ratio test (LRT) rejects $H_0$ in favor of $H_1$ if $\Lambda\ge h$ for some threshold $h$. The choice of $h$ controls the balance between probabilities of Type I and Type II errors, or in other words, between the detection sensitivity and the rate of false alarms.

In order to control the probability of a false alarm at the given level α, we take advantage of Doob's maximal inequality (for example, see [32], Section VII-3; [38], Section 7.1.1). It states that for a submartingale $\{Y_t\}$ and any constant $c>0$,

$$P\Big\{\sup_{0\le t\le n}Y_t\ge c\Big\}\le\frac{E(Y_n^+)}{c},$$

where $x^+=\max\{x,0\}$.

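Doob's inequality can be applied directly to the LRT statistic

$$\Lambda=W_{\hat b}=\max_{0\le t\le n}W_t$$

in the following way. The CUSUM process (3) admits a recursive representation

$$W_0=0,\quad W_{t+1}=\max\{0,\,W_t+z_{t+1}\}$$

with $z_i=\log\frac{g(X_i)}{f(X_i)}$ ([25], Section 2.2). Similarly, $U_t=\exp\{W_t\}$ can be expressed recursively as

$$U_0=1,\quad U_{t+1}=\max\{1,\,U_te^{z_{t+1}}\}. \tag{6}$$

It follows that $E_F|U_t|<\infty$ for every $t$, because

$$1\le E_F(e^{z_1}\vee1)=E_F\{e^{z_1};\,z_1\ge0\}+P_F\{z_1<0\}\le E_Fe^{z_1}+1=\int\frac{g}{f}\,f\,d\mu+1=1+1=2,$$

and therefore,

$$1\le E_F(U_t)\le\{E_F(e^{z_1}\vee1)\}^t\le2^t<\infty.$$

Also, from (6),

$$E_F\{U_{t+1}\mid U_1,\dots,U_t\}\ge U_t\,E_Fe^{z_{t+1}}=U_t,$$

showing that $U_t$ is a submartingale. Applying Doob's maximal inequality to the process $\{U_t\}$, we have

$$P\{\text{Type I error}\}=P_F\{\Lambda\ge h\}=P_F\Big\{\max_{0\le t\le n}W_t\ge h\Big\}=P_F\Big\{\max_{0\le t\le n}U_t\ge e^h\Big\} \tag{7}$$

$$\le e^{-h}E_F(U_n)=e^{-h}E_F(e^{W_n}). \tag{8}$$

Thus, setting the threshold at

$$h_\alpha=-\log\frac{\alpha}{E_F(e^{W_n})} \tag{9}$$

guarantees the probability of a false alarm no higher than α.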

A large-sample, large-threshold asymptotic expression for the Type I error probability (7) is derived in [36, Theorem 2]. By comparison, (8) is an inequality, which is valid for any finite n and h.
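The threshold (9) requires $E_F(e^{W_n})$, which the text does not compute in closed form. One way to approximate it is plain Monte Carlo under $F$; the sketch below does that. It is an illustration under stated assumptions, not a method taken from the paper: `estimate_threshold` and `normal_llr_under_f` are hypothetical names, the Normal pair $F=N(0,1)$, $G=N(1,1)$ is an example, and because $e^{W_n}$ can be heavy-tailed the estimate may need many replications to stabilize.

```python
import math
import random


def estimate_threshold(sample_llr, n, alpha, reps=2000, seed=0):
    """Monte Carlo approximation of h_alpha = log(E_F[exp(W_n)] / alpha).

    sample_llr(rng) must return one increment z = log g(X)/f(X)
    with X drawn from the base distribution F.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        w = 0.0
        for _ in range(n):
            w = max(0.0, w + sample_llr(rng))   # CUSUM recursion
        total += math.exp(w)                    # one draw of U_n = exp(W_n)
    return math.log(total / reps / alpha)


def normal_llr_under_f(rng):
    """Example only: F = N(0,1), G = N(1,1), so z = x - 1/2 with x ~ F."""
    return rng.gauss(0.0, 1.0) - 0.5


h = estimate_threshold(normal_llr_under_f, n=50, alpha=0.05, reps=500)
```

Since $U_n\ge1$, the resulting threshold is never below $\log(1/\alpha)$, which gives a quick sanity check on any estimate.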

We conclude this section summarizing the obtained results.

Proposition 2.1 (Detection of a single transient segment)

For the case of at most one transient change in the interval $[0,n]$,

  • (1) The maximum likelihood estimator of the interval of change $[a,b]$ is given by (2) in terms of the random walk $S_t$ and by (5) in terms of the CUSUM process $W_t$.

  • (2) The likelihood-ratio test (LRT) rejects the no-change hypothesis if $\Lambda=\max_{0\le t\le n}W_t\ge h$ for some threshold $h$.

  • (3) Threshold $h_\alpha$ determined by (9) yields a level α LRT.

  • (4) The change-point detection algorithm that reports a transient change at the stopping time $T_\alpha=\min\{t: W_t\ge h_\alpha\}$ produces a false alarm with probability $P\{\text{false alarm}\}\le\alpha$.

3. Precision and limiting distribution of the MLE

In this section, we assume existence of a transient interval $(a,b]$. In other words, we assume the alternative hypothesis $H_1$ and study the distribution of the maximum likelihood estimators $(\hat a,\hat b)$ under $H_1$. Under this condition, we study precision of the maximum likelihood estimator for the interval of change and derive its large-sample limiting distribution.

Several concepts will be introduced as building blocks in this derivation:

  • pre-likelihood estimators (PLE), defined by condition (10), weaker than the MLE, and therefore, including all MLE;

  • local likelihood estimators (LLE), defined constructively for any local point γ;

  • detection point, which belongs to the interval of change with a given probability.

3.1. Pre-Likelihood estimators

We start by introducing so-called pre-likelihood estimators (PLE). PLE satisfy the necessary (but not sufficient) conditions (10) for a pair of points to be the MLE.

To this end, introduce the direct and inverse shifted random walk processes {Sτ,i+}i=1nτ: Sτ,i+=Sτ+iSτ and {Sτ,i}i=1τ1: Sτ,i=SτiSτ respectively; S0+=S0=0, Si=Si+=S1,i= and Si=Sn,i, i=1,,n. The MLE (aˆ,bˆ) must satisfy inequalities

$$\min_{i}S^-_{\hat a,i}>0,\quad \min_{i\le\hat b-\hat a}S^+_{\hat a,i}>0,\quad \max_{i\le\hat b-\hat a}S^-_{\hat b,i}<0,\quad \max_{i}S^+_{\hat b,i}<0. \tag{10}$$

In the sequel, any pair $(\tilde a,\tilde b)$ that satisfies (10) will be called a pre-likelihood estimator (PLE). The PLE conditions (10) ensure that the left-side estimate $\tilde a$ cannot be improved for the given right-end $\tilde b$ by shifting it to increase $S_{\tilde b}-S_{\tilde a}$, and similarly, the right-side estimate $\tilde b$ cannot be improved for the given left-end $\tilde a$.

PLE are directly related to MLE, because if there exists a unique PLE, it is equal to the MLE. In the next section, we find conditions for the PLE to be unique, for a sufficiently large n.

PLE can be constructed by a combination of the direct and reverse CUSUM processes, Wt and Wt, where the reverse CUSUM process is defined as

W0=0, Wnk=max(0,S1,,Sk)Sk,k=1,,n.

The direct CUSUM Wt is based upon the random walk St=z1++zt, whereas the reverse CUSUM is built upon the random walk that starts at time n and proceeds back in time, adding increments (zi). Any PLE can be obtained from the kernels of these CUSUM processes, defined as K={t:Wt=0}, K={t:Wt=0}, and K~=KK. A pair (a~,b~) is a PLE if and only if a~<b~, a~K, b~K, and {iK~:a~<i<b~}=.

3.2. Local likelihood estimators

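In general, there may be multiple PLEs, and their number is random. To study their distribution, we define a class of local estimators $(\tilde a_\gamma,\tilde b_\gamma)$ that are constructed around a fixed point $\gamma$,

$$\tilde a_\gamma=\gamma-\arg\min_{i\le\gamma}S^-_{\gamma,i}\quad\text{and}\quad\tilde b_\gamma=\gamma+\arg\max_{i\le n-\gamma}S^+_{\gamma,i}.$$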

We call them local likelihood estimators (LLE). Every PLE coincides with an LLE with respect to any point γ inside the interval defined by this PLE.
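A local likelihood estimator is simple to compute from the partial sums: look left of $\gamma$ for the minimum of $S_t$ and right of $\gamma$ for its maximum. The sketch below assumes this reading of the definition; the name `local_likelihood_estimator` is hypothetical, the search ranges include $\gamma$ itself, and ties are resolved by Python's `min`/`max` (first index found), which is an arbitrary choice not fixed by the text.

```python
from itertools import accumulate


def local_likelihood_estimator(z, gamma):
    """Return (a_tilde, b_tilde) around a fixed point gamma, 0 <= gamma <= n.

    s[t] holds the random walk S_t built from increments z[0..t-1];
    a_tilde minimizes S_t over t <= gamma and
    b_tilde maximizes S_t over t >= gamma.
    """
    s = [0.0] + list(accumulate(z))      # s[0] = S_0 = 0
    n = len(z)
    a_tilde = min(range(0, gamma + 1), key=lambda t: s[t])
    b_tilde = max(range(gamma, n + 1), key=lambda t: s[t])
    return a_tilde, b_tilde


z = [-1, -1, 2, 1, -0.5, 1, -3, 1]
print(local_likelihood_estimator(z, 4))  # (2, 6)
```

For this toy sequence every $\gamma$ from 3 to 6 returns the same pair `(2, 6)`, which illustrates the statement that a PLE coincides with the LLE for any point inside its interval.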

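Below, we derive the distribution of an LLE with respect to a fixed point $\gamma$. Constructed this way, LLE $\tilde a_\gamma$ and $\tilde b_\gamma$ are independent for any fixed $\gamma$. We consider the interesting case of $a<\gamma\le b$. The distribution of $\tilde b_\gamma$ under $b-\gamma=k\ge0$ is the same as the distribution of MLE in the change point problem. Then

$$P(\tilde b_\gamma=b)=R^-_{G,k}(0)\,R^+_{F,n-b}(0)\ \ge\ \exp\Big(-\sum_{m=1}^{\infty}m^{-1}\Big(P_F\Big(\sum_{j=1}^{m}Y_j\ge0\Big)+P_G\Big(\sum_{j=1}^{m}Y_j\le0\Big)\Big)\Big),$$

where

$$R^+_{H,k}(x)=P_H\big(\max(0,S^+_1,\dots,S^+_k)<x\big),\qquad R^-_{H,k}(x)=P_H\big(\max(0,S^-_1,\dots,S^-_k)<x\big), \tag{11}$$

and the random walk process $S_k$ is based on a sample from a distribution $H$. The last inequality follows from Spitzer's formula (see [45]) as $k,(n-b)\to\infty$ [21]. Moreover [19], under $r>0$,

$$P(\tilde b_\gamma=b+r)=\int_0^\infty R^-_{G,b-\gamma}(x)\,B^+_{F,r,n-b-r}(x)\,dx,$$

and under $r<0$,

$$P(\tilde b_\gamma=b+r)=\int_0^\infty R^+_{F,n-b}(x)\,B^-_{G,-r,b-\gamma+r}(x)\,dx,$$

where

$$B^+_{H,k,s}(y)\,dy=P_H\Big(\arg\max_{0\le i\le k+s}S_i=k,\ S_k\in[y,y+dy)\Big),\qquad B^-_{H,k,s}(y)\,dy=P_H\Big(\arg\max_{0\le i\le k+s}(-S_i)=k,\ -S_k\in[y,y+dy)\Big). \tag{12}$$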

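The distribution of $\tilde a_\gamma$ can be obtained in a similar manner:

$$P(\tilde a_\gamma=a)=R^+_{F,a}(0)\,R^-_{G,\gamma-a+1}(0)\ \ge\ \exp\Big(-\sum_{m=1}^{\infty}m^{-1}\Big(P_G\Big(\sum_{j=1}^{m}Y_j\le0\Big)+P_F\Big(\sum_{j=1}^{m}Y_j\ge0\Big)\Big)\Big);$$

$$P(\tilde a_\gamma=a+l)=\int_0^\infty R^-_{G,\gamma-a+1}(x)\,B^+_{F,-l,a+l}(x)\,dx$$

for $l<0$; and

$$P(\tilde a_\gamma=a+l)=\int_0^\infty R^+_{F,a}(x)\,B^-_{G,l,\gamma-a-l+1}(x)\,dx$$

for $l>0$.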

Proposition 3.1

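Let $a$, $b$ be fixed. Then for each $r\in J=\{-b+1,\dots,n-b\}$,

$$\inf_{\gamma\in(a,b]}P(\tilde b_\gamma=b+r)\ \ge\ p_r=\begin{cases}\exp\Big(-\sum_{m=1}^{\infty}\frac1m\Big(P_F\Big(\sum_{j=1}^{m}Y_j\ge0\Big)+P_G\Big(\sum_{j=1}^{m}Y_j\le0\Big)\Big)\Big)&\text{for } r=0,\\[2pt] \int_0^\infty R^+_{F,\infty}(x)\,B^-_{G,-r,\infty}(x)\,dx&\text{for } r<0,\\[2pt] \int_0^\infty R^-_{G,\infty}(x)\,B^+_{F,r,\infty}(x)\,dx&\text{for } r>0,\end{cases}$$

and for each $l\in J=\{-a+1,\dots,n-a\}$,

$$\inf_{\gamma\in(a,b]}P(\tilde a_\gamma=a+l)\ \ge\ q_l=\begin{cases}\exp\Big(-\sum_{m=1}^{\infty}\frac1m\Big(P_G\Big(\sum_{j=1}^{m}Y_j\le0\Big)+P_F\Big(\sum_{j=1}^{m}Y_j\ge0\Big)\Big)\Big)&\text{for } l=0,\\[2pt] \int_0^\infty R^-_{G,\infty}(x)\,B^+_{F,-l,\infty}(x)\,dx&\text{for } l<0,\\[2pt] \int_0^\infty R^+_{F,\infty}(x)\,B^-_{G,l,\infty}(x)\,dx&\text{for } l>0.\end{cases}$$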

Proof.

Continuing the trajectories of the random walks that pass through a neighborhood of $y$ at time $k$, we conclude that the probability of reaching the maximum at time $k$ cannot increase. Hence, $B^+_{H,k,s}(x)$ and $B^-_{H,k,s}(x)$ are non-increasing in $s$ for $s>0$. In a similar manner we obtain that $R^+_{H,s}(x)$ and $R^-_{H,s}(x)$ are non-increasing in $s$ for $s>0$. The proposition follows immediately.

The right-hand sides of the inequalities in the last proposition for $r\ne0$ or $l\ne0$ are quite complicated for practical use. Approximations suitable for computation are obtained in [19].

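The inequalities in Proposition 3.1 can be used immediately to get the lower bound for the cumulative probabilities,

$$P(s\le\tilde a_\gamma-a\le t)\ge\sum_{l=s}^{t}q_l\quad\text{and}\quad P(s\le\tilde b_\gamma-b\le t)\ge\sum_{r=s}^{t}p_r \tag{13}$$

for $s\le t$ and any $\gamma\in(a,b]$, where $p_r$ and $q_l$ are given in Proposition 3.1.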

3.3. Local estimation around a detection point

The next result extends (13) from fixed $\gamma$ to a random point $\hat\gamma$, possibly dependent on data, that has a probability of falling into $(a,b]$ bounded from below. If $P(\hat\gamma\in(a,b])\ge1-\alpha$, the random point $\hat\gamma$ will be called a detection point of level α.

An example of a level α detection point can be constructed as follows. Let $\alpha=\alpha_1+\alpha_2$, and consider the stopping time $T_{\alpha_1}$, defined by part 4 of Proposition 2.1. Similarly, consider the stopping time $T_{\alpha_2}$ that is based on the reverse CUSUM process of Section 3.1. If $T_{\alpha_1}\le T_{\alpha_2}$, then any point in the interval $[T_{\alpha_1},T_{\alpha_2}]$ is a detection point of level α.

Cumulative and tail probabilities for LLE with respect to such a point γˆ are bounded from below in the following proposition.

Proposition 3.2

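Let $\hat\gamma$ be a detection point of level α, as defined above. Then

$$P(s\le\tilde a_{\hat\gamma}-a\le t)\ge\sum_{l=s}^{t}q_l-\alpha;\qquad P(s\le\tilde b_{\hat\gamma}-b\le t)\ge\sum_{r=s}^{t}p_r-\alpha,$$

where probabilities $p_r$ and $q_l$ are given in Proposition 3.1.

Proof.

The proof is based on the Boole inequality

$$1-P(A\cap B)=P(\bar A\cup\bar B)\le P(\bar A)+P(\bar B)$$

that implies $P(A\cap B)\ge P(A)-P(\bar B)$. Then for any fixed $a<b$,

$$P(s\le\tilde a_{\hat\gamma}-a\le t)\ \ge\ P(s\le\tilde a_{\hat\gamma}-a\le t,\ \hat\gamma\in(a,b])\ \ge\ \inf_{\gamma\in(a,b]}P(s\le\tilde a_\gamma-a\le t,\ \hat\gamma\in(a,b])\ \ge\ \inf_{\gamma\in(a,b]}P(s\le\tilde a_\gamma-a\le t)-\alpha\ \ge\ \sum_{l=s}^{t}q_l-\alpha,$$

from (13). The second inequality is obtained analogously.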

3.4. Asymptotic distribution of the MLE

In this section, we study the large-sample asymptotic behavior of MLE $(\hat a,\hat b)$ as the sample size and all homogeneous segments tend to infinity. We assume that the parameters $a=a(n)$ and $b=b(n)$ and the interval of change $D=[a(n),b(n)]$ are dependent on $n$, and $\Delta=\min\{a(n),\,b(n)-a(n),\,n-b(n)\}\to\infty$ as $n\to\infty$. We use the notation $P=P_{(D,n)}$ for the distribution with a transient change and $P_H$ for the case of i.i.d. random variables $X_1,\dots,X_n$ with the common distribution function $H$. In particular, $P_F=P_{(\emptyset,n)}$ and $P_G=P_{(N,n)}$, where $N=\{1,\dots,n\}$.

Next, we define random walks $S_k=\sum_{i=1}^{k}Y_i$ and $\tilde S_k=-\sum_{i=1}^{k}Y_i$. For example, for the transient change-point detection problem, with log-likelihood ratios $Y_i=\log\frac{g}{f}(X_i)$, the random walk $S_k$ is used to detect a change from $F$ to $G$ whereas $\tilde S_k$ is used to detect a change from $G$ to $F$.

Let $W_k$, $\tilde W_k$ be the corresponding CUSUM processes, where $W_0=\tilde W_0=0$, $W_k=S_k-\min_{i\le k}S_i=(W_{k-1}+Y_k)\vee0$, and $\tilde W_k=\max_{i\le k}S_i-S_k=(\tilde W_{k-1}-Y_k)\vee0$.

We start with the following auxiliary results.

Lemma 3.3

Let $E_FY=c_1<0$. Then for any $\epsilon>0$

$$P_F\Big(\sup_{n\ge m}\max_{j\le n}W_j/n>\epsilon\Big)\to0\quad\text{as } m\to\infty.$$

Proof.

Let F={Fk}kN be the natural filtration associated with the process Y1,Y2,; and τ1,τ2, be the successive zeroes of the CUSUM process {Wk}kN.

Introduce Yk=Yk1I{Yk0} and Sk=j=1kYj, kN. Note that EFYk=c2<. By the Markov property of the random process {Sk}kN: Sk=jkYj with respect to the filtration F using Wald's identity, we obtain that EFSτk=EFSτ1=c2EFτ1 for all k>1.

Let Z1,Z2, be independent copies of the random variable Sτ1. Then

PF(maxjnWj/n>ϵ)PF(maxjnZj/n>ϵ). (14)

Denote, H(u)=PF(Z1u) is the distribution function of Z's. Note that the events {maxjnZj/n>ϵ}, nN, occur infinitely many times iff the events {Zk>kϵ}, kN, occur infinitely many times; PF(Zk>kϵ)=1H(kϵ). By the Borel–Cantelli lemma and Maclaurin–Cauchy test, PF almost sure Zkkϵ under a sufficiently large k if

0(1H(xϵ))dx=EZ1<.

The lemma is proved.

The next lemma follows immediately from the strong law of large numbers (SLLN) and Lemma 3.3.

Lemma 3.4

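Let $E_FY=c_1<0$, $E_GY=c_2>0$ and $\liminf_n\Delta_n/n\ge\epsilon$ for some $\epsilon>0$. Then

$$\lim_{r\to\infty}\lim_{n\to\infty}P(|\hat a_n-a|\ge r)=0;\qquad\lim_{r\to\infty}\lim_{n\to\infty}P(|\hat b_n-b|\ge r)=0.$$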

Proof.

For the most distant from a version of the PLE a~n we can write that

P(aaˆnr)P(aa~nr)=P(supkarWk=supknWk)PF(infrkaS~k0)+P(supkaWkSa,ba)

where

PF(infrkaS~k0)PF(infkrS~k0)=PF(supkrSk0)=PF(supkr(Sk/kc1)c1)0 (15)

as r. Moreover, for any δ>0,

P(supkaWkSa,ba)PF(supkaWk>aδ)+PG(Sbaaδ).

The first term in the right-hand side of the last inequality tends to 0 as n by Lemma 3.3, and the second term tends to 0 as n by the law of large numbers since limsupna/(ba)ϵ1 and c2>0.

Analogously, we obtain that P(bˆnbr)0 as r.

Let ϵ>0; rϵ and nϵ are such that

P(bˆnb>rϵ)ϵ/2

for all nnϵ. Since aˆnbˆn, on the event Aϵ={bˆnbrϵ},

P(aˆnar)PG(suprkbaSk0)+P(Sa,baminkrϵSba,k0). (16)

The first term in the right hand side of the last inequality is tended to 0 as r uniformly in n1 as in (15). Then there exists an r0ϵ, such that PG(supr0ϵkbaSk0)ϵ/4. Finally, minkrϵSba,k=OP(1), and, therefore,

P(Sa,baminkrϵSba,k0)=P(Sa,ba/(ba)c2c2+minkrϵSba,k/(ba))0

as ba. Hence, the second term in (16) is not exceed ϵ/4 under the sufficiently large Δ. We obtained that P(aˆnarϵr0ϵ)ϵ, under the sufficiently large Δ, and, therefore,

limrlimΔP(aˆnar)=0.

Convergence limrlimΔP(bˆbnr)=0 can be obtained in the similar manner. Therefore, the lemma is proved.

Lemma 3.4 yields the following proposition.

Proposition 3.5

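Let $\gamma=\lambda a+(1-\lambda)b$, $\hat\gamma=\lambda\hat a+(1-\lambda)\hat b$ for some $\lambda\in(0,1)$; $E_FY<0$ and $E_GY>0$. Then

$$P_\theta(\hat a<\gamma<\hat b)\to1\quad\text{and}\quad P_\theta(a<\hat\gamma<b)\to1,$$

as $\liminf_n\Delta_n/n\ge\epsilon$ for some $\epsilon>0$. Moreover,

$$P_\theta\big(\max(\hat b-\gamma,\gamma-\hat a)>M\big)\to1\quad\text{and}\quad P_\theta\big(\max(b-\hat\gamma,\hat\gamma-a)>M\big)\to1$$

for any fixed $M>0$ for all $\theta=(a,b)$ as $\liminf_n\Delta_n/n\ge\epsilon$ for some $\epsilon>0$.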

Remark 3.1

  1. Lemma 3.4 actually proves that the MLE $(\hat a,\hat b)$ is unique with probability tending to 1 as $n\to\infty$, and if $\gamma\in(a,b)$ and $(\tilde a_\gamma,\tilde b_\gamma)$ is the LLE with respect to some point $\gamma$, then
    $$P_\theta(\hat a=\tilde a_\gamma,\ \hat b=\tilde b_\gamma)\to1$$
    as $n\to\infty$ uniformly for all $a(n),b(n)$: $\liminf_n\Delta_n/n\ge\epsilon$ for some $\epsilon>0$.
  2. Under some known point $\gamma$ between $a$ and $b$, the estimation problem reduces to two separate change-point estimation problems, on the direct $i=1,\dots,\gamma$ and the inverse $i=n,n-1,\dots,\gamma$ sets of indices.

  3. The main results can be easily extended to the case of multiple transient changes $D_n=\bigcup_{j=1}^{J_n}(a_j,b_j]$ as $\Delta_n=\min_{j=0,\dots,J_n+1}(b_j-a_j)\to\infty$ as $n\to\infty$, where $a_0=0$ and $a_{J_n+1}=n$.

Remark 3.1(i), together with Proposition 3.1, yields the following result, which establishes the asymptotic distribution of the MLE $(\hat a,\hat b)$.

Proposition 3.6

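Under the conditions of Lemma 3.4, for any fixed $r$ and $l$,

$$\lim_{n\to\infty}P(\hat b=b+r)=p_r=\begin{cases}\exp\Big(-\sum_{m=1}^{\infty}\frac1m\Big(P_F\Big(\sum_{j=1}^{m}Y_j\ge0\Big)+P_G\Big(\sum_{j=1}^{m}Y_j\le0\Big)\Big)\Big)&\text{for } r=0,\\[2pt] \int_0^\infty R^+_{F,\infty}(x)\,B^-_{G,-r,\infty}(x)\,dx&\text{for } r<0,\\[2pt] \int_0^\infty R^-_{G,\infty}(x)\,B^+_{F,r,\infty}(x)\,dx&\text{for } r>0;\end{cases}$$

$$\lim_{n\to\infty}P(\hat a=a+l)=q_l=\begin{cases}\exp\Big(-\sum_{m=1}^{\infty}\frac1m\Big(P_G\Big(\sum_{j=1}^{m}Y_j\le0\Big)+P_F\Big(\sum_{j=1}^{m}Y_j\ge0\Big)\Big)\Big)&\text{for } l=0,\\[2pt] \int_0^\infty R^-_{G,\infty}(x)\,B^+_{F,-l,\infty}(x)\,dx&\text{for } l<0,\\[2pt] \int_0^\infty R^+_{F,\infty}(x)\,B^-_{G,l,\infty}(x)\,dx&\text{for } l>0;\end{cases}$$

where $\sum_{r\in\mathbb Z}p_r=\sum_{l\in\mathbb Z}q_l=1$, and $R^+$, $R^-$, $B^+$, and $B^-$ are defined by (11) and (12) in the previous section. Moreover,

$$\lim_{n\to\infty}\sup_{I\subseteq\mathbb Z}\Big|\sum_{r\in I}P(\hat b=b+r)-\sum_{r\in I}p_r\Big|=0 \tag{17}$$

and

$$\lim_{n\to\infty}\sup_{I\subseteq\mathbb Z}\Big|\sum_{l\in I}P(\hat a=a+l)-\sum_{l\in I}q_l\Big|=0. \tag{18}$$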

Proof.

Let prn=infγ(a,b)P(bˆ=b+r) for some γ(a,b) and J={b+1,,nb}. Then Proposition 3.1 implies that rJprnrJpr. Moreover, prn=0 for rJ and rJpr1 as n. Hence, supIZinfγ(a,b)|rIprnrIpr|0 as n. On the other hand, let λ=(a+b)/2 and Aλ={bˆ=b~λ}. Then |P(bˆI)P(b~I)|=|P(bˆI,A¯λ)P(b~γI,A¯λ)|P(Aλ)0 as n uniformly on IZ. The convergence in (17) holds. The convergence in (18) can be obtained in a similar manner. The proposition is proved.

The quantities of the type

$$\exp\Big(-\sum_{m=1}^{\infty}\frac1m\Big(P_F\Big(\sum_{j=1}^{m}Y_j\ge0\Big)+P_G\Big(\sum_{j=1}^{m}Y_j\le0\Big)\Big)\Big)$$

that appear in the limiting distribution of MLE $\hat a$ and $\hat b$ represent probabilities for a random walk with a negative drift to stay in the negative half-plane, in the case of $\hat b$, and for a random walk with a positive drift to stay in the positive half-plane, in the case of $\hat a$. These probabilities refer back to [37], later cited by many authors including [19,34]. Corollary 8.44 of [34] specifies these precise probabilities, rephrasing the fact of staying within a negative half-plane as ∞ being the first moment of becoming positive. Our probabilities in Proposition 3.6 are for two-sided random walks, making $\hat a$ a minimum and $\hat b$ a maximum, when $r=0$ and $l=0$.

4. Multiple transient changes and the familywise false alarm rate

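Next, we consider a possibility of multiple transient changes $[a_k,b_k]$, $k=1,\dots,K$, where $K$ is the number of transient change intervals. The distribution of observed data oscillates between distributions $F$ and $G$, switching at unknown times, so that

$$\begin{aligned}X_{0:a_1}&=(X_1,\dots,X_{a_1})\sim F\\ X_{a_1:b_1}&=(X_{a_1+1},\dots,X_{b_1})\sim G\\ X_{b_1:a_2}&=(X_{b_1+1},\dots,X_{a_2})\sim F\\ X_{a_2:b_2}&=(X_{a_2+1},\dots,X_{b_2})\sim G\\ &\ \ \vdots\\ X_{b_K:n}&=(X_{b_K+1},\dots,X_n)\sim F\end{aligned}$$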

One interpretation of this setting is a base distribution F, when the observed process is ‘in control’, that is subject to sudden disorder times ak, when it goes ‘out of control’ to a disturbed distribution G. Each disorder will eventually be followed by a ‘re-adjustment’ to the base distribution, which takes place at time bk.

The goal is to detect all the changes and estimate all (2K) change-points ak and bk. Facing a possibility of multiple changes, we aim to control a familywise false alarm rate and a familywise false re-adjustment rate that are understood as the probability of at least one erroneously detected change-point.

In the class of multiple change-point problems, existence of two alternating distributions leads to two special forms of familywise detection errors that we aim to control.

That is, a $(2K)$-dimensional change-point parameter

$$\{a_k,b_k\}_{k=1}^{K}=\{a_1,b_1;\dots;a_K,b_K\}$$

is estimated by a $2\hat K$-dimensional estimator

$$\{\hat a_k,\hat b_k\}_{k=1}^{\hat K}=\{\hat a_1,\hat b_1;\dots;\hat a_{\hat K},\hat b_{\hat K}\}.$$

A false alarm is understood as an estimated segment $[\hat a_k,\hat b_k]$ that does not intersect with any disorder region $[a_m,b_m]$. The familywise false alarm rate will be defined as the familywise error rate in the sense of [20], the probability of at least one false alarm,

$$\mathrm{FAR}=P\Big\{\exists k:\ [\hat a_k,\hat b_k]\cap\Big(\bigcup_j[a_j,b_j]\Big)=\emptyset\Big\}. \tag{19}$$

Similarly, we call it a false re-adjustment when the estimated ‘in control’ interval $[\hat b_k,\hat a_{k+1}]$ does not contain any in-control observations, that is,

$$\mathrm{FRR}=P\Big\{\exists k:\ [\hat b_k,\hat a_{k+1}]\cap\Big(\bigcup_j[b_j,a_{j+1}]\Big)=\emptyset\Big\}. \tag{20}$$

We aim at controlling the familywise rates of false alarms and false re-adjustments at pre-chosen levels α and β, respectively,

$$\mathrm{FAR}\le\alpha\quad\text{and}\quad\mathrm{FRR}\le\beta.$$
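In a simulation study, the events inside (19) and (20) can be checked directly for each simulated data set, and the two rates estimated as the proportion of runs in which the event occurs. The sketch below is one possible reading of these definitions; the names `intersects` and `familywise_errors` are hypothetical, intervals are treated as closed, the true in-control periods are completed with the boundary conventions $b_0=0$ and $a_{K+1}=n$ (an assumption), and only the gaps between consecutive estimated segments are tested for false re-adjustment.

```python
def intersects(u, v):
    """True if closed intervals u = (u0, u1) and v = (v0, v1) overlap."""
    return u[0] <= v[1] and v[0] <= u[1]


def familywise_errors(true_intervals, est_intervals, n):
    """Return (false_alarm, false_readjustment) indicators for one data set.

    true_intervals, est_intervals: lists of (a, b) pairs in increasing order.
    """
    # True in-control periods [b_j, a_{j+1}], with b_0 = 0 and a_{K+1} = n.
    true_in_control = []
    left = 0
    for a, b in true_intervals:
        true_in_control.append((left, a))
        left = b
    true_in_control.append((left, n))

    # Event in (19): some estimated segment misses every disorder region.
    false_alarm = any(
        not any(intersects(e, d) for d in true_intervals)
        for e in est_intervals
    )
    # Event in (20): some estimated in-control gap misses every true one.
    est_gaps = [
        (est_intervals[k][1], est_intervals[k + 1][0])
        for k in range(len(est_intervals) - 1)
    ]
    false_readjustment = any(
        not any(intersects(g, c) for c in true_in_control)
        for g in est_gaps
    )
    return false_alarm, false_readjustment


print(familywise_errors([(10, 20)], [(12, 18), (30, 35)], n=50))  # (True, False)
print(familywise_errors([(10, 40)], [(12, 18), (25, 35)], n=50))  # (False, True)
```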

We consider two situations, when the number of transient changes K is known or unknown.

4.1. Known number of transient changes and MLE

The log-likelihood function of $(a_k,b_k)$, $k=1,\dots,K$ is written as

$$L(X;\{(a_k,b_k)\})=\sum_{k=1}^{K}\sum_{i=a_k+1}^{b_k}\log\frac{g(X_i)}{f(X_i)}$$

Maximizing it, we obtain the maximum likelihood estimator

$$\{(\hat a_k,\hat b_k)\}_{k=1}^{K}=\arg\max_{a_1<b_1<\dots<a_K<b_K}\sum_{k=1}^{K}(S_{b_k}-S_{a_k}),$$

which are $K$ mutually disjoint intervals of the biggest growth of $S_t$ (Figure 3).

Figure 3. Estimation of multiple change-points.

A computational algorithm for $\{(\hat a_k,\hat b_k)\}$ can be obtained as an iteration of steps outlined in Section 2.1 for the single-interval case, with a few modifications.

Step 1. Apply (5) to obtain the first MLE interval that corresponds to the interval of the biggest growth of the random walk $S_t^{(1)}=S_t$ and the associated CUSUM process $W_t$,

$$\hat b_1=\arg\max W_t,\qquad\hat a_1=\max\{\mathrm{Ker}(W)\cap[0,\hat b_1)\}.$$

Step 2. Apply Step 1 to the processes

$$S_t^{(2,1)}=S_t\ \text{ for } 1\le t\le\hat a_1,\qquad S_t^{(2,2)}=-(S_t-S_{\hat a_1})\ \text{ for }\hat a_1\le t\le\hat b_1,\qquad S_t^{(2,3)}=S_t-S_{\hat b_1}\ \text{ for }\hat b_1\le t\le n.$$

This results in three new intervals, $[c_1,d_1]$, $[c_2,d_2]$, and $[c_3,d_3]$. Compare $D_1=S_{d_1}-S_{c_1}$, $D_2=-(S_{d_2}-S_{c_2})$, and $D_3=S_{d_3}-S_{c_3}$, and let $D_j=\max\{D_1,D_2,D_3\}$.

If $j=1$ or $j=3$, add the corresponding interval to the MLE, i.e. let

$$\hat a_2=c_j\ \text{ and }\ \hat b_2=d_j.$$

If $j=2$, then let

$$\hat b_1=c_2\ \text{ and }\ \hat a_2=d_2,$$

replacing the previously found interval $[\hat a_1,\hat b_1]$ with two intervals, $[\hat a_1,c_2]$ and $[d_2,\hat b_1]$.

Based on the reversed log-likelihood ratios $\log(f/g)$, the process $(-S_t)$ is actually the random walk that can be used to detect a change from $G$ to $F$. Thus, the found interval $[c_2,d_2]$ is a candidate for a re-adjustment period, a change back to the base distribution. When the original walk $S_t$ drops more on $[c_2,d_2]$ than it grows on $[c_1,d_1]$ or $[c_3,d_3]$, the sum of increments along the obtained intervals $[\hat a_1,\hat b_1]$ and $[\hat a_2,\hat b_2]$ is higher than on any other two intervals, and thus, they will form the MLE for $K=2$.

Step k. For $k=2,\dots,K\le N$ (where $N$ is the number of intervals where the random walk $S_t$ increases), we repeat the same operations as in Step 2. In every detected interval of change, $[\hat a_j,\hat b_j]$, we find an interval of the largest drop of $S_t$. In every interval between them, including $[1,\hat a_1]$ and $[\hat b_{k-1},n]$, we find an interval of the largest growth of $S_t$. Then we find the interval of the largest change among them. If it is an interval of growth between $\hat b_j$ and $\hat a_{j+1}$, we simply add it to the list of intervals of change. If it is an interval $[c,d]$ of decrease between $\hat a_j$ and $\hat b_j$, we replace the previously found $[\hat a_j,\hat b_j]$ with two intervals, $[\hat a_j,c]$ and $[d,\hat b_j]$.

An example is shown in Figure 3. At step 1, the interval of the largest growth is determined as $[\hat a_1=98,\ \hat b_1=263]$. At step 2, the second largest growth interval is determined as $[\hat a_2=400,\ \hat b_2=504]$. At step 3, the interval of the largest drop of $S_t$, with ends at $c=157$ and $d=190$, is found inside $[\hat a_1,\hat b_1]$. Therefore, we conclude that a re-adjustment occurred between $c$ and $d$, and $[\hat a_1,\hat b_1]=[98,263]$ is now replaced with two intervals, $[98,157]$ and $[190,263]$.
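As a cross-check on the value of the objective $\sum_k(S_{b_k}-S_{a_k})$, one can use a standard dynamic program for the maximum total sum of $K$ disjoint runs of increments. This is a different technique from the iterative splitting steps described above and is not taken from the paper; the function name `max_growth_k_intervals` is hypothetical. The sketch enforces $a_1<b_1<\dots<a_K<b_K$, so consecutive intervals are separated by at least one observation, and it returns only the optimal value, not the intervals.

```python
def max_growth_k_intervals(z, K):
    """Maximum of sum_k (S_{b_k} - S_{a_k}) over a_1 < b_1 < ... < a_K < b_K.

    z[t-1] is the increment z_t of the random walk S_t.
    Returns float('-inf') if K such intervals do not fit into the data.
    """
    neg_inf = float("-inf")
    # out[k]: best total with k intervals finished, current observation outside
    # ins[k]: best total with the k-th interval open and containing the
    #         current observation
    out = [0.0] + [neg_inf] * K
    ins = [neg_inf] * (K + 1)
    for zt in z:
        new_out = [max(out[k], ins[k]) for k in range(K + 1)]
        new_ins = [neg_inf] + [
            zt + max(ins[k], out[k - 1])   # extend, or open after a gap
            for k in range(1, K + 1)
        ]
        out, ins = new_out, new_ins
    return max(out[K], ins[K])


print(max_growth_k_intervals([1, -1, 1], 1))  # 1.0
print(max_growth_k_intervals([1, -1, 1], 2))  # 2.0
```

The run time is proportional to $nK$, so the value can be compared against the total growth of the intervals returned by any other procedure for the same $K$.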

4.2. Unknown number of transient changes and familywise error rates

Since the number of changes K is usually unknown, the algorithm in Section 4.1 may either miss changes or produce false alarms. As noted before, intervals of the biggest growth of the random walk St signal transient changes. Therefore, those intervals where the increment in St exceeds a certain threshold will serve as estimated transient change intervals.

This threshold controls the rate of false alarms. As we show below, no Bonferroni or Holm type correction is needed to control the familywise error rates. Instead, both the familywise rate of false alarms (19) and the familywise rate of false re-adjustments (20) can be controlled by thresholds that are independent of the true number of change-points, which can remain unknown.

The algorithm can be described as follows.

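  1. Introduce two CUSUM processes, renewed at a random time $T\ge 0$,
    \[ W_{T,t} = S_{T+t} - \min_{0\le i\le t} S_{T+i} = \text{CUSUM based on } (S_{T+t}-S_T), \text{ renewed at } T, \]
    \[ \widetilde W_{T,t} = \max_{0\le i\le t} S_{T+i} - S_{T+t} = \text{CUSUM based on } -(S_{T+t}-S_T), \text{ renewed at } T. \]
    The CUSUM $W_{T,t}$ is set to detect the next disorder time, whereas $\widetilde W_{T,t}$ is tuned to determine the next re-adjustment time. A special case of T = 0 results in the initial CUSUM processes $W_t$ and $\widetilde W_t$ without any resetting.
  2. To control the familywise false alarm and false re-adjustment rates at the desired levels α and β, respectively, define thresholds as
    \[ h_\alpha = -\log\left(\alpha\, E_F^{-1}(e^{W_n})\right) \quad\text{and}\quad \tilde h_\beta = -\log\left(\beta\, E_G^{-1}(e^{\widetilde W_n})\right). \tag{21} \]
  3. The algorithm proceeds through the data series, detecting disorders and re-adjustments at stopping times $\tau_k$ and $\tilde\tau_k$, and post-estimating change-points $a_k$ and $b_k$ sequentially for $k=1,2,\ldots,K$ as follows,
    \[ \tau_1 = \inf\{t : 0<t\le n,\ W_t\ge h_\alpha\}, \qquad \hat a_1 = \max\{\operatorname{Ker} W_t \cap [0,\tau_1)\}; \]
    \[ \tilde\tau_k = \tau_k + \inf\{t : 0<t\le n-\tau_k,\ \widetilde W_{\tau_k,t}\ge \tilde h_\beta\}, \qquad \hat b_k = \tau_k + \max\{\operatorname{Ker} \widetilde W_{\tau_k,t} \cap [0,\tilde\tau_k-\tau_k)\}; \]
    \[ \tau_k = \tilde\tau_{k-1} + \inf\{t : 0<t\le n-\tilde\tau_{k-1},\ W_{\tilde\tau_{k-1},t}\ge h_\alpha\}, \qquad \hat a_k = \tilde\tau_{k-1} + \max\{\operatorname{Ker} W_{\tilde\tau_{k-1},t} \cap [0,\tau_k-\tilde\tau_{k-1})\}, \]
    until $\tau_k=\infty$ or $\tilde\tau_k=\infty$.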

By this definition of stopping times $\tau_k$, $\tilde\tau_k$ and change-point estimates $\hat a_k$, $\hat b_k$, each stopping time belongs to the corresponding interval that it is designed to detect, $\hat a_k<\tau_k\le\hat b_k$ and $\hat b_{k-1}<\tilde\tau_{k-1}\le\hat a_k$. CUSUM processes $W_t$ and $\widetilde W_t$ are restarted and grounded at these times. As in the previous sections, change-points $a_k$ and $b_k$ are then estimated by the last zero points of restarted CUSUM processes $W_{\tilde\tau_{k-1},t}$ and $\widetilde W_{\tau_k,t}$, respectively.
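The alternating scheme can be sketched as a single pass over the log-likelihood ratio increments. This is a minimal sketch, not the authors' code: the function name `detect_transient_changes` is hypothetical, the thresholds are assumed to be positive and supplied by the user, and returning `None` for an end point whose re-adjustment is never detected is an assumption made here for illustration. It relies on the standard recursion W = max(W + z, 0) for a CUSUM started at zero.

```python
from typing import List, Optional, Tuple


def detect_transient_changes(
    z: List[float], h_alpha: float, h_beta: float
) -> List[Tuple[int, Optional[int]]]:
    """Alternate two restarted CUSUMs over increments z_t = log(g(X_t)/f(X_t)).

    Returns estimated intervals (a_hat, b_hat); b_hat is None when the data
    end before a re-adjustment is detected (an assumption of this sketch).
    """
    intervals: List[Tuple[int, Optional[int]]] = []
    seeking_disorder = True  # True: run W; False: run W-tilde
    w = 0.0                  # current value of the active CUSUM
    last_zero = 0            # last time the active CUSUM was at zero
    a_hat = 0
    for t, zt in enumerate(z, start=1):
        inc = zt if seeking_disorder else -zt
        w = max(w + inc, 0.0)
        if w == 0.0:
            last_zero = t
            continue
        threshold = h_alpha if seeking_disorder else h_beta
        if w >= threshold:
            if seeking_disorder:
                a_hat = last_zero                     # last zero of W
            else:
                intervals.append((a_hat, last_zero))  # last zero of W-tilde
            # Restart and ground the other CUSUM at the stopping time t.
            seeking_disorder = not seeking_disorder
            w, last_zero = 0.0, t
    if not seeking_disorder:
        intervals.append((a_hat, None))  # disorder found, no re-adjustment yet
    return intervals
```

With increments that drift down, up, down, up, and down again, the sketch recovers the first interval completely and reports the second one as still open, because the final downward stretch is too short to cross the re-adjustment threshold.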

Proposition 4.1

The transient change-point detection and estimation scheme in steps 1-3, resulting in the estimator $\{\hat a_k,\hat b_k\}_{k=1}^{K}$, controls familywise rates of false alarms and false re-adjustments at levels

\[ \mathrm{FAR}\le\alpha \quad\text{and}\quad \mathrm{FRR}\le\beta, \]

for any unknown number of transient changes K.

Proof.

According to steps 1-3 of the algorithm, a false alarm occurs in the interval $[\hat a_k,\hat b_k)$ if all the data in this interval follow the distribution F, including the segment $X_{\hat a_k:\tau_k}$ that triggered the false detection at time $\tau_k$.

Also note that each renewed CUSUM process $W_{\tau_k,t}=S_{\tau_k+t}-\min_{0\le i\le t}S_{\tau_k+i}$ is dominated by the original CUSUM process $W_t$ on the corresponding segment,

\[ W_{\tau_k,t}\le W_{\tau_k+t}. \]

This is because the subtracted term $\min_{0\le i\le\tau_k+t}S_i$ in the original CUSUM process cannot exceed the corresponding minimum $\min_{0\le i\le t}S_{\tau_k+i}$ of the renewed CUSUM.

Therefore, at least one false alarm can possibly occur only if the original CUSUM process $W_t$ exceeds the threshold $h_\alpha$ at least once in the interval (0,n] under the distribution F. The probability of the latter event is bounded by Doob's inequality. Similarly to (8), we obtain

\[ \mathrm{FAR}\le P_F\Big\{\bigcup_k\Big(\max_{0<t\le(b_k-\tilde\tau_{k-1})^+}W_{\tau_k,t}\ge h_\alpha\Big)\Big\}\le P_F\Big\{\max_{0<t\le n}W_t\ge h_\alpha\Big\}\le e^{-h_\alpha}E_F(e^{W_n})=\alpha, \]

after substituting the first expression in (21) for $h_\alpha$.

The inequality $\mathrm{FRR}\le\beta$ is proven along the same lines, replacing the CUSUM process $W_t$ with $\widetilde W_t$, and accordingly, the stopping times $\tau_k$ with $\tilde\tau_k$ and vice versa.
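The step that invokes Doob's inequality can be spelled out as follows. This is a sketch of one standard argument, offered under the added assumptions that the increments $z_t=\log(g(X_t)/f(X_t))$ are i.i.d. under F and that f and g share the same support, so that $E_F(e^{z_t})=1$; it is not necessarily the derivation behind (8).

```latex
% CUSUM recursion and its exponential transform
W_t = \max(W_{t-1} + z_t,\, 0)
\quad\Longrightarrow\quad
e^{W_t} = \max\bigl(e^{W_{t-1}} e^{z_t},\, 1\bigr).

% Submartingale property under F, using E_F(e^{z_t}) = \int (g/f)\, f = 1
% and E\max(A,1) \ge \max(E A, 1):
E_F\bigl(e^{W_t} \mid X_1,\ldots,X_{t-1}\bigr)
  \;\ge\; \max\bigl(e^{W_{t-1}} E_F(e^{z_t}),\, 1\bigr)
  \;=\; \max\bigl(e^{W_{t-1}},\, 1\bigr)
  \;=\; e^{W_{t-1}} .

% Doob's maximal inequality for the nonnegative submartingale e^{W_t}:
P_F\Bigl\{\max_{0<t\le n} W_t \ge h\Bigr\}
  = P_F\Bigl\{\max_{0<t\le n} e^{W_t} \ge e^{h}\Bigr\}
  \;\le\; e^{-h}\, E_F\bigl(e^{W_n}\bigr).
```

Setting the right-hand side equal to α and solving for h gives $h=\log\bigl(E_F(e^{W_n})/\alpha\bigr)$, which is the first threshold in (21).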

5. Experimental study

In this section, we illustrate the proposed methods and explore their detection and estimation power by a simulation study. The scenarios considered are

  1. Transient changes in the mean of a Normal distribution;

  2. Transient changes in the variance of a Normal distribution;

  3. Transient changes between the Normal and Laplace distributions.

In this study, we estimate the detection thresholds that yield the preset rate of false alarms α=0.05, rate of false re-adjustments β=0.05, evaluate the detection power of the proposed methods, and assess the accuracy of change-point estimators. Familywise rates are controlled at levels α=β=0.05 in the case of multiple transient changes.

The symmetric case of changes in the mean appears quite different from the asymmetric situation of changes in the variance, where it appears more difficult to detect a variance reduction than a variance increase. Scenario 3 is interesting from a practical point of view. The Standard Normal distribution and the Laplace (Double Exponential) distribution with the location parameter μ = 0 and the scale parameter b = 1/√2 are both symmetric, with the same zero means and the same unit variances. However, the Laplace distribution has heavier tails, resulting in higher probabilities of large deviations. In industrial manufacturing, for example, large deviations may imply overheating, overcooling, or a lack or an excess of a chemical ingredient. Timely detection of such changes and accurate estimation of their locations are critical parts of quality control, because the items produced during the transient change interval are likely to be non-conforming. The use of likelihood ratios (instead of Shewhart charts, for example) makes it possible to detect such changes.
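To make Scenario 3 concrete, the log-likelihood ratio increment for a change from the Standard Normal to a unit-variance Laplace distribution can be written out directly. The sketch below is illustrative only; the function name `llr_normal_to_laplace` is hypothetical, and the scale is set from the requirement that the Laplace variance 2b² equal 1.

```python
import math

# Laplace scale for unit variance: Var = 2 * b**2 = 1, so b = 1 / sqrt(2).
B = 1.0 / math.sqrt(2.0)


def llr_normal_to_laplace(x: float) -> float:
    """Return z = log(g(x) / f(x)) for f = N(0, 1) and g = Laplace(0, B)."""
    log_g = -math.log(2.0 * B) - abs(x) / B                   # Laplace log-density
    log_f = -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x      # Normal log-density
    return log_g - log_f
```

The increment is positive near zero and in the far tails, and negative for moderate values around one standard deviation, which is how the CUSUM accumulates evidence of heavier tails even though the mean and variance are unchanged.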

5.1. Detection and estimation of change points

Table 1 contains the probability of detection, as well as means and standard deviations of change-point estimates aˆ and bˆ. An observed sample of size n = 1000 is assumed, with a transient change between a = 500 and b = 700. The considered scenarios include a change from the Standard Normal base distribution to the disturbed distribution:

  1. To the Normal distribution with mean μ and unit variance (change in the mean);

  2. To the Normal distribution with mean 0 and variance σ2 (change in the variance);

  3. To the Laplace distribution with mean 0 and variance 1 (change neither in the mean nor in the variance).

Table 1.

Detection thresholds, detection probabilities, and properties of change-point estimates for transient changes in the means and in the variances of Normal distributions and from the Normal to Laplace distributions.

| Disturbed distribution: μ | σ | Threshold h | Detection probability $P_{a,b}(\Lambda\ge h)$ | $E(\hat a)$ | $\mathrm{Std}(\hat a)$ | $E(\hat b)$ | $\mathrm{Std}(\hat b)$ |
|---|---|---|---|---|---|---|---|
| 0.05 | 1 | 2.65 | 0.109 | 351.2 | 238.0 | 694.5 | 238.5 |
| 0.10 | 1 | 4.16 | 0.212 | 404.0 | 208.9 | 690.9 | 209.9 |
| 0.15 | 1 | 5.02 | 0.394 | 444.5 | 171.2 | 691.0 | 174.8 |
| 0.20 | 1 | 5.60 | 0.618 | 472.1 | 127.8 | 695.3 | 133.8 |
| 0.25 | 1 | 6.03 | 0.804 | 487.5 | 92.0 | 697.7 | 97.6 |
| 0.30 | 1 | 6.35 | 0.915 | 495.6 | 64.6 | 699.6 | 68.0 |
| 0.35 | 1 | 6.62 | 0.969 | 498.2 | 45.8 | 700.4 | 46.9 |
| 0.40 | 1 | 6.84 | 0.991 | 499.8 | 32.7 | 700.4 | 32.9 |
| 0.60 | 1 | 7.45 | 1 | 500.0 | 14.1 | 700.1 | 14.1 |
| 0.80 | 1 | 7.80 | 1 | 499.9 | 7.9 | 700.0 | 7.9 |
| 1.00 | 1 | 8.00 | 1 | 500.0 | 5.1 | 700.0 | 5.0 |
| 0 | 0.50 | 8.20 | 1 | 498.4 | 5.4 | 701.6 | 5.5 |
| 0 | 0.75 | 6.92 | 0.989 | 494.7 | 32.6 | 704.8 | 34.1 |
| 0 | 0.90 | 5.01 | 0.355 | 430.2 | 171.8 | 701.2 | 175.4 |
| 0 | 0.95 | 3.41 | 0.137 | 361.2 | 222.6 | 703.6 | 223.9 |
| 0 | 1.05 | 3.33 | 0.146 | 385.7 | 230.7 | 678.7 | 232.4 |
| 0 | 1.10 | 4.73 | 0.362 | 447.8 | 181.3 | 681.2 | 185.5 |
| 0 | 1.25 | 6.22 | 0.950 | 502.9 | 55.1 | 693.9 | 58.2 |
| 0 | 1.50 | 6.95 | 1 | 503.4 | 15.7 | 696.9 | 15.8 |
| 0 | 2.00 | 7.25 | 1 | 501.6 | 5.6 | 698.4 | 5.6 |
| Laplace(0, 1/√2) | | 6.4 | 0.975 | 499.9 | 45.7 | 698.1 | 47.4 |

Threshold $h_\alpha$ is calculated as the 95th empirical percentile of the distribution of $\Lambda=\max_{0\le t\le n}W_t$, which yields the rate of false alarms FAR = α = 0.05. Results are based on N = 50,000 Monte Carlo runs, and the threshold is estimated from $N_h$ = 200,000 Monte Carlo runs. Experimentally, we observed that estimation of the CUSUM's exponential moment $E_F(e^{W_n})$ for threshold 9 is less reliable due to a very high variance of $e^{W_n}$.
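The empirical-percentile calibration can be illustrated for the mean-shift scenario. The sketch below is a simplified stand-in for the study's simulation, with hypothetical function names `max_cusum` and `empirical_threshold`; it assumes increments $z_i=\mu X_i-\mu^2/2$ for a change from N(0,1) to N(μ,1) and takes the order statistic at rank ⌈(1−α)·runs⌉ as the empirical percentile.

```python
import math
import random
from typing import List


def max_cusum(z: List[float]) -> float:
    """Return Lambda = max over t of the CUSUM W_t built from increments z."""
    w = lam = 0.0
    for zi in z:
        w = max(w + zi, 0.0)
        lam = max(lam, w)
    return lam


def empirical_threshold(mu: float, n: int, alpha: float,
                        runs: int, seed: int = 0) -> float:
    """Estimate h as the (1 - alpha) empirical quantile of Lambda under F."""
    rng = random.Random(seed)
    lams = []
    for _ in range(runs):
        # Under the base N(0,1), log(g/f) for a shift to N(mu,1) is mu*x - mu^2/2.
        z = [mu * rng.gauss(0.0, 1.0) - mu * mu / 2.0 for _ in range(n)]
        lams.append(max_cusum(z))
    lams.sort()
    rank = min(runs, math.ceil((1.0 - alpha) * runs))
    return lams[rank - 1]
```

A call such as `empirical_threshold(0.5, 1000, 0.05, 2000)` gives a rough estimate; the number of runs used in the study is far larger, and a small run count makes the estimated percentile noisy.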

As one would anticipate, the detection power, expressed as the probability of detection, monotonically increases with the magnitude of change. When the transient change lasts for b − a = 200 observations, it is detected with the probability of 0.95 or higher when the mean drifts by 0.35 or more standard deviations or when the standard deviation changes by 25%, in one or the other direction. Accordingly, the accuracy of estimators $\hat a$ and $\hat b$ improves with the magnitude of change, resulting in lower standard errors. Results imply that the change-point estimators are nearly unbiased and distribution-consistent, converging in probability to the corresponding parameters as the change grows in magnitude (unlike the standard notion of consistency related to large samples, see [4]).

5.2. Power analysis and the choice of a threshold

Table 2 shows power analysis for certain types of changes. The power, represented by the detection probability in change-point analysis, is estimated as a function of the magnitude and duration of transient change. Naturally, the difficulty in detecting small changes can be compensated by a sufficiently long interval where the data follow the new distribution. Even a 0.2σ change in the mean or a 10% shift in the standard deviation is quite likely to be detected when the change is sustained for a block of, say, b − a = 400 observations. A change from the Normal to the Laplace distribution with the same mean and the same variance has a 99% chance of being detected if the region of change lasts for about 250 observations.
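A detection probability of this kind can be estimated by simulating streams with one transient mean shift and recording how often the CUSUM crosses the threshold. The following is a small-scale sketch, not the code behind Table 2; the function names `detects` and `detection_probability` are hypothetical, and placing the shifted observations at times a < t ≤ b is an assumption about the indexing convention.

```python
import random
from typing import List


def detects(x: List[float], mu: float, h: float) -> bool:
    """Return True if the CUSUM for a shift N(0,1) -> N(mu,1) ever reaches h."""
    w = 0.0
    for xi in x:
        w = max(w + mu * xi - mu * mu / 2.0, 0.0)
        if w >= h:
            return True
    return False


def detection_probability(mu: float, n: int, a: int, b: int, h: float,
                          runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Lambda >= h) with a transient shift on (a, b]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Mean mu on the transient segment, mean 0 elsewhere, unit variance.
        x = [rng.gauss(mu if a < t <= b else 0.0, 1.0) for t in range(1, n + 1)]
        hits += detects(x, mu, h)
    return hits / runs
```

Varying `mu` and the segment length `b - a` while keeping `h` fixed at the calibrated threshold reproduces the qualitative pattern of the table: longer or larger changes are detected more often.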

Table 2.

Power analysis. Detection probabilities as functions of magnitude and duration of a transient change.

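Columns 50 to 500 give the duration Δ of the transient period.

| Change | Magnitude | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 | 450 | 500 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| From N(0,1) to N(μ,1) | μ = 0.1 | 0.068 | 0.102 | 0.152 | 0.213 | 0.281 | 0.346 | 0.394 | 0.465 | 0.532 | 0.571 |
| | μ = 0.2 | 0.113 | 0.259 | 0.457 | 0.611 | 0.740 | 0.828 | 0.889 | 0.924 | 0.949 | 0.968 |
| | μ = 0.3 | 0.216 | 0.567 | 0.808 | 0.916 | 0.968 | 0.983 | 0.992 | 0.997 | 0.999 | 1 |
| | μ = 0.4 | 0.421 | 0.839 | 0.964 | 0.992 | 0.998 | 0.999 | 1 | 1 | 1 | 1 |
| | μ = 0.5 | 0.659 | 0.957 | 0.995 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | μ = 0.6 | 0.836 | 0.993 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | μ = 0.7 | 0.937 | 0.999 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | μ = 0.8 | 0.978 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | μ = 0.9 | 0.994 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | μ = 1.0 | 0.998 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| From N(0,1) to N(0,σ) | σ = 0.50 | 0.993 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | σ = 0.75 | 0.305 | 0.797 | 0.952 | 0.989 | 0.997 | 0.999 | 1 | 1 | 1 | 1 |
| | σ = 0.90 | 0.082 | 0.144 | 0.241 | 0.361 | 0.465 | 0.576 | 0.664 | 0.738 | 0.796 | 0.837 |
| | σ = 0.95 | 0.062 | 0.084 | 0.108 | 0.142 | 0.171 | 0.211 | 0.254 | 0.290 | 0.330 | 0.357 |
| | σ = 1.05 | 0.064 | 0.086 | 0.115 | 0.143 | 0.181 | 0.212 | 0.256 | 0.289 | 0.325 | 0.366 |
| | σ = 1.10 | 0.086 | 0.162 | 0.256 | 0.365 | 0.462 | 0.559 | 0.637 | 0.703 | 0.756 | 0.801 |
| | σ = 1.25 | 0.325 | 0.693 | 0.871 | 0.950 | 0.979 | 0.991 | 0.997 | 0.999 | 1 | 1 |
| | σ = 1.50 | 0.865 | 0.992 | 0.999 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | σ = 2.00 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Normal to Laplace | | 0.323 | 0.731 | 0.905 | 0.973 | 0.991 | 0.997 | 0.999 | 1 | 1 | 1 |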

A similar pattern is seen in the operating characteristic curves in Figure 4. The trade-off between sensitivity and specificity is represented by the probability of detection and the rate of false alarms. Easily detectable changes are represented by steeper ROC curves, which correspond to larger magnitudes of change μ and longer periods Δ of transient change. In this figure, Δ = 10 means that only 10 data points are observed from the changed distribution.

Figure 4.

Operating characteristics. ROC curves for change detection in the mean with various thresholds.

ROC curves can also be used for determining detection thresholds h that achieve the desired balance between the detection power and the rate of false alarms. A simple argument results in a lower bound for the needed threshold, which appears to be a good approximation of h for larger changes. Indeed, under the base distribution, one large increment $z_i$ is sufficient for exceeding the threshold and triggering a false alarm.

Let $F_z$ be the cumulative distribution function of the individual log-likelihood ratios $z_i=\log(g(X_i)/f(X_i))$ under the base distribution F. The false alarm rate must be bounded from below by the probability of having at least one increment $z_i$, $i=1,\ldots,n$, that alone exceeds the threshold and, consequently, drives the whole CUSUM process over h. That is,

\[ \mathrm{FAR}\ge P\Big\{\bigcup_{i=1}^{n}\{z_i\ge h\}\Big\}=1-F_z^n(h), \]

and this lower bound exceeds α if and only if $h<F_z^{-1}\big((1-\alpha)^{1/n}\big)$. Hence, we obtain the lower bound for the required threshold,

\[ h\ge F_z^{-1}\big((1-\alpha)^{1/n}\big). \]

For example, in case of a change in the mean of a Normal distribution, the base distribution of log-likelihood ratios $z_i$ is Normal with mean $(-\mu^2/2)$ and variance $\mu^2$. Hence,

\[ \mathrm{FAR}\ge 1-\Phi^n\Big(\frac{h+\mu^2/2}{\mu}\Big), \]

and we obtain that any threshold that controls the false alarm rate at a level not exceeding α must satisfy

\[ h\ge\mu\,\Phi^{-1}\big((1-\alpha)^{1/n}\big)-\frac{\mu^2}{2}, \tag{22} \]

where Φ denotes the Standard Normal c.d.f.
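The lower bound for the Normal mean-shift case is easy to evaluate numerically. The sketch below uses the Python standard library's `statistics.NormalDist`; the function name `threshold_lower_bound` is hypothetical, and a positive shift μ is assumed.

```python
from statistics import NormalDist


def threshold_lower_bound(mu: float, alpha: float, n: int) -> float:
    """Lower bound mu * Phi^{-1}((1 - alpha)^(1/n)) - mu^2 / 2 for the threshold h."""
    q = (1.0 - alpha) ** (1.0 / n)   # per-increment non-exceedance level
    return mu * NormalDist().inv_cdf(q) - mu * mu / 2.0
```

For instance, `threshold_lower_bound(1.0, 0.05, 1000)` gives the bound for a one-standard-deviation shift in a stream of 1000 observations, which can be compared with the simulated threshold for the same setting in Table 1.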

As seen in Figure 5, (22) is a fairly accurate approximation of the required threshold for mean changes that are larger than 4 standard deviations. This means that, when detecting a change between substantially different distributions, a false alarm is likely to be caused by one extreme observation.

Figure 5.

Lower bound estimation of required thresholds.

5.3. Detection of multiple transient changes

The next experiment focuses on multiple changes whose number is unknown. In the data stream of length n = 1000, three transient change intervals are generated, each lasting for $b_k-a_k=100$ observations, k = 1, 2, 3. Each such segment is marked with a mean shifted by μ standard deviations. The algorithm described in Section 4.2 is then used to detect and estimate the start and end points of all intervals of change, with thresholds determined from the empirical null distribution of the test statistic. Since the actual number of change-points is treated as unknown, the algorithm may detect either fewer or more than K = 3 intervals of change.

In this study, we estimate the familywise false alarm rate FAR and the familywise false re-adjustment rate FRR, explore the frequency of detecting the correct and the incorrect number of changes, and evaluate the accuracy of all change-point estimators (aˆk,bˆk).

Results in Table 3 show rather low familywise false alarm and false re-adjustment rates for all shifts μ; they are controlled by properly selected detection thresholds.

Table 3.

Analysis of multiple transient changes. Familywise false alarm and false re-adjustment rates and the distribution of detected intervals of change.

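Columns k = 0 to k ≥ 4 give the probability of detecting k intervals.

| Shift μ | Threshold h | FAR | FRR | k = 0 | k = 1 | k = 2 | k = 3 | k ≥ 4 |
|---|---|---|---|---|---|---|---|---|
| 0.1 | 4.14 | 0 | 0 | 0.77 | 0.23 | 0 | 0 | 0 |
| 0.2 | 5.60 | 0.001 | 0 | 0.44 | 0.49 | 0.07 | 0 | 0 |
| 0.3 | 6.36 | 0.002 | 0 | 0.09 | 0.37 | 0.41 | 0.13 | 0 |
| 0.4 | 6.84 | 0.007 | 0 | 0 | 0.07 | 0.36 | 0.56 | 0 |
| 0.5 | 7.18 | 0.013 | 0 | 0 | 0 | 0.11 | 0.87 | 0.01 |
| 0.6 | 7.44 | 0.020 | 0.002 | 0 | 0 | 0.02 | 0.96 | 0.02 |
| 0.7 | 7.64 | 0.024 | 0.003 | 0 | 0 | 0 | 0.97 | 0.03 |
| 0.8 | 7.79 | 0.028 | 0.006 | 0 | 0 | 0 | 0.97 | 0.03 |
| 0.9 | 7.94 | 0.027 | 0.008 | 0 | 0 | 0 | 0.97 | 0.03 |
| 1.0 | 8.01 | 0.030 | 0.010 | 0 | 0 | 0 | 0.96 | 0.04 |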

When the sequence is observed with no change-points, the threshold explained in Section 5.1 guarantees exactly the desired rate of false alarms, subject to the Monte Carlo estimation error only. Based on the test statistic, the maximum CUSUM value, the threshold h = h(n) depends on the sample size n, and it is an increasing function of n. In the presence of change-points, false alarms, if any, occur within shorter intervals between transient change segments. Shorter intervals could have been served by lower thresholds if their lengths were known. However, the duration and the mere presence of transient changes are unknown, and we guarantee the desired FAR conservatively by selecting a threshold h(n) which yields FAR = α in the absence of change-points and FAR < α in the presence of change-points. In the latter case, the real FAR depends on the number and mutual location of transient changes, and more precisely, on the duration of each segment. The actual familywise error rates are reflected in columns 3-4 of Table 3.

Table 3 shows that for changes in the magnitude of 0.5 standard deviations or more, the multiple transient change detection algorithm is quite likely to detect precisely the correct number, K = 3 intervals of change. Small shifts are more difficult to detect. When the mean drifts away by 0.2 standard deviations or less, the procedure will almost certainly detect fewer than K transient changes. Of course, detection is more likely when a change lasts for longer than 100 observations, which can be achieved, for example, by more frequent measurements.

With the thresholds selected to control FAR and FRR, detecting more than K intervals of change is unlikely. For K = 3, the probability of detecting more than 3 intervals does not exceed 0.04 in any of the considered scenarios (last column of Table 3).

Accuracy of change-point estimators is evaluated in Table 4. Here, the means and standard deviations of all estimators (aˆk,bˆk) are calculated over those data streams that resulted in the correct number of detected intervals. Estimated means are to be compared with the actual intervals of change,

[a1,b1]=[150,250],  [a2,b2]=[450,550],  and  [a3,b3]=[750,850].

Results suggest that change-point estimators are nearly unbiased for shifts of magnitude from about 0.4 standard deviations, with their precision visibly improving for larger shifts.

Table 4.

Analysis of multiple transient changes. Accuracy of change-point estimation.

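Means and standard deviations of change-point estimators.

| Shift μ | $E(\hat a_1)$ | $E(\hat b_1)$ | $E(\hat a_2)$ | $E(\hat b_2)$ | $E(\hat a_3)$ | $E(\hat b_3)$ | $\sigma(\hat a_1)$ | $\sigma(\hat b_1)$ | $\sigma(\hat a_2)$ | $\sigma(\hat b_2)$ | $\sigma(\hat a_3)$ | $\sigma(\hat b_3)$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 114.0 | 272.2 | 426.4 | 551.0 | 734.4 | 870.8 | 43.9 | 33.5 | 35.7 | 20.0 | 39.9 | 21.1 |
| 0.3 | 135.3 | 260.0 | 441.4 | 560.0 | 739.9 | 853.9 | 36.1 | 30.5 | 30.9 | 30.4 | 30.9 | 25.6 |
| 0.4 | 145.4 | 253.9 | 446.1 | 553.2 | 745.7 | 852.7 | 26.0 | 24.6 | 26.9 | 25.2 | 25.4 | 22.5 |
| 0.5 | 148.8 | 251.2 | 448.8 | 551.2 | 748.6 | 850.8 | 18.8 | 18.6 | 20.0 | 19.2 | 18.9 | 18.3 |
| 0.6 | 149.7 | 250.4 | 449.8 | 550.3 | 749.8 | 850.3 | 13.7 | 13.7 | 13.9 | 13.7 | 13.4 | 13.3 |
| 0.7 | 149.9 | 250.1 | 449.9 | 550.1 | 749.9 | 850.0 | 10.4 | 10.2 | 10.1 | 10.4 | 10.1 | 10.3 |
| 0.8 | 150.0 | 250.1 | 450.1 | 550.1 | 749.9 | 850.0 | 8.2 | 7.6 | 7.9 | 8.0 | 7.9 | 7.8 |
| 0.9 | 150.0 | 250.0 | 450.1 | 550.0 | 750.1 | 850.1 | 6.2 | 6.2 | 6.4 | 6.3 | 6.3 | 6.3 |
| 1.0 | 149.9 | 250.0 | 450.0 | 550.0 | 750.0 | 850.0 | 5.0 | 5.0 | 5.2 | 4.9 | 5.1 | 5.0 |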

6. Summary and conclusions

The transient change-point analysis refers to temporary changes in the distribution of data. Here, we studied detection and maximum likelihood estimation of transient changes, including the cases of a single change and multiple changes, where their number may be known or unknown, and studied precision of the obtained estimators.

Even small transient changes can be detected, if they sustain for a sufficiently long period of time. The power of detection naturally reduces with smaller magnitudes or shorter durations of a change. Detection sensitivity depends on the selected threshold, which can be chosen to satisfy a preset rate of false alarms.

The next step is extension of these methods to the case of distributions with unknown (nuisance) parameters. Generalized likelihood ratios and Bayesian methods have been proposed to handle nuisance parameters in the situations of a single change. Application of similar techniques to the case of multiple transient changes will allow detecting changes from the base distribution to different disturbed distributions in each transient change interval.

Funding Statement

Research of M. Baron is supported by the NSF [grant number 1737960] and the Defense Advanced Research Projects Agency (DARPA) [grant number HR0011-18-C-0051]. Research of S. Malov is supported by the Russian Science Foundation (RSF) [grant number 20-14-00072].

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  1. Abd-Elnaser S., Rabou A.S., and Gad A.M., Change-point rank tests with epidemic alternatives, Egypt. Stat. J. 50 (2006), pp. 114–135.
  2. Anastasiou A. and Fryzlewicz P., Detecting multiple generalized change-points by isolating single ones, Metrika 85 (2022), pp. 141–174.
  3. Baron M., Sequential methods for multistate processes, in Applied Sequential Methodologies, N. Mukhopadhyay, S. Datta, and S. Chattopadhyay, eds., Dekker, New York, 2004, pp. 55–73.
  4. Baron M. and Granott N., Consistent estimation of early and frequent change points, in Foundations of Statistical Inference, J. Haitovsky, H. R. Lerche, and Y. Ritov, eds., Physica-Verlag, Heidelberg, New York, 2003, pp. 181–194.
  5. Baron M., Rosenberg M., and Sidorenko N., Electricity pricing: Modeling and prediction with automatic spike detection, Energy, Power Risk Management 2001 (2001), pp. 36–39.
  6. Baron M., Rosenberg M., and Sidorenko N., Divide and conquer: Forecasting power via automatic price regime separation, Energy, Power Risk Management 2002 (2002), pp. 70–73.
  7. Bianchi A.M., Mainardi L., Petrucci E., Signorini M.G., Mainardi M., and Cerutti S., Time-variant power spectrum analysis for the detection of transient episodes in HRV signal, IEEE Trans. Biomed. Eng. 40 (1993), pp. 136–144.
  8. Chen J. and Gupta A.K., Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance, Birkhäuser, Boston, MA, 2012.
  9. Egea-Roca D., López-Salcedo J.A., Seco-Granados G., and Poor H.V., Performance bounds for finite moving average tests in transient change detection, IEEE. Trans. Signal. Process. 66 (2018), pp. 1594–1606.
  10. Eichinger B. and Kirch C., A MOSUM procedure for the estimation of multiple random change points, Bernoulli 24 (2018), pp. 526–564.
  11. Fearnhead P., Exact and efficient Bayesian inference for multiple changepoint problems, Stat. Comput. 16 (2006), pp. 203–213.
  12. Fryzlewicz P., Wild binary segmentation for multiple change-point detection, Ann. Stat. 42 (2014), pp. 2243–2281.
  13. Fryzlewicz P., Detecting possibly frequent change-points: Wild binary segmentation and steepest-drop model selection, J. Korean. Stat. Soc. 49 (2020), pp. 1027–1070.
  14. Fu Y.-X and Curnow R.N., Maximum likelihood estimation of multiple change points, Biometrika 77 (1990), pp. 563–573.
  15. Guépié B.K., Fillatre L., and Nikiforov I., Detecting a suddenly arriving dynamic profile of finite duration, IEEE Trans. Inform. Theory 63 (2017), pp. 3039–3052.
  16. Guépié B.K., Fillatre L., and Nikiforov I.V., Sequential detection of transient changes, Seq. Anal. 31 (2012), pp. 528–547.
  17. Gut A. and Steinebach J., A two-step sequential procedure for detecting an epidemic change, Extremes 8 (2005), pp. 311–326.
  18. Han C., Willett P.K., and Abraham D.A., Some methods to evaluate the performance of Page's test as used to detect transient signals, IEEE. Trans. Signal. Process. 47 (1999), pp. 2112–2127.
  19. Hinkley D.V., Inference about the change-point in a sequence of random variables, Biometrika 57 (1970), pp. 1–17.
  20. Hochberg Y. and Tamhane A.C., Multiple Comparison Procedures, Wiley, New York, 1987.
  21. Hu I. and Rukhin A.L., A lower bound for error probability in change-point estimation, Stat. Sin. 5 (1995), pp. 319–331.
  22. Lee C.-B., Nonparametric multiple change-point estimators, Statist. Probab. Lett. 27 (1996), pp. 295–304.
  23. Levin B. and Kline J., The cusum test of homogeneity with an application in spontaneous abortion epidemiology, Stat. Med. 4 (1985), pp. 469–488.
  24. Noonan J. and Zhigljavsky A., Power of the MOSUM test for online detection of a transient change in mean, Seq. Anal. 39 (2020), pp. 269–293.
  25. Page E.S., Continuous inspection schemes, Biometrika 41 (1954), pp. 100–115.
  26. Page E.S., On problems in which a change in a parameter occurs at an unknown point, Biometrika 44 (1957), pp. 248–252.
  27. Poor H.V. and Hadjiliadis O., Quickest Detection, Cambridge University Press, Cambridge (UK), 2009.
  28. Repin V.G., Detection of a signal with unknown moments of appearance and disappearance, Problemy Peredachi Informatsii 27 (1991), pp. 61–72.
  29. Rosenberg M., Bryngelson J.D., Baron M., and Papalexopoulos A.D., Transmission valuation analysis based on real options with price spikes, in Handbook of Power Systems II; Energy Systems Part I, S. Rebennack, P.M. Pardalos, M.V.F. Pereira and N. Iliadis, eds., Springer, Berlin-Heidelberg, 2010, pp. 101–125.
  30. Rosenberg M., Bryngelson J.D., Sidorenko N., and Baron M., Price spikes and real options: transmission valuation, in Real Options and Energy Management, E.I. Ronn, ed., Risk Books, London, 2002, pp. 323–370.
  31. Seidou O. and Ouarda T., Recursion-based multiple changepoint detection in multiple linear regression and application to river streamflows, Water. Resour. Res. 43(7) (2007). DOI: 10.1029/2006WR005021.
  32. Shiryaev A.N., Probability, 2nd ed. Springer-Verlag, New York, 1995.
  33. Shiryaev A.N., Quickest detection problems: Fifty years later, Seq. Anal. 29 (2010), pp. 345–385.
  34. Siegmund D., Sequential Analysis: Tests and Confidence Intervals, Springer-Verlag, New York, 1985.
  35. Siegmund D., Boundary crossing probabilities and statistical applications, Ann. Stat. 14 (1986), pp. 361–404.
  36. Siegmund D., Approximate tail probabilities for the maxima of some random fields, Ann. Probab. 16 (1988), pp. 487–501.
  37. Spitzer F., Principles of Random Walk, Van Nostrand, New York, 1966.
  38. Stroock D.W., Mathematics of probability, Vol. 149. American Mathematical Soc., 2013.
  39. Tafakori L., Pourkhanali A., and Fard F.A., Forecasting spikes in electricity return innovations, Energy 150 (2018), pp. 508–526.
  40. Tartakovskii A.G., Detection of signals with random moments of appearance and disappearance, Probl. Peredachi Inf. 24 (1988), pp. 39–50.
  41. Tartakovsky A.G., Berenkov N.R., Kolessa A.E., and Nikiforov I.V., Optimal sequential detection of signals with unknown appearance and disappearance points in time, IEEE. Trans. Signal. Process. 69 (2021), pp. 2653–2662.
  42. Tartakovsky A. G., Nikiforov I. V., and Basseville M., Sequential Analysis Hypothesis Testing and Change-Point Detection, Chapman & Hall/CRC, 2014.
  43. Vostrikova L. Ju., Detecting ‘disorder’ in multidimensional random processes, Sov. Math. Dokl. 24 (1981), pp. 55–59.
  44. Wang X., Liu B., Zhang X., and Liu Y., Efficient multiple change point detection for high-dimensional generalized linear models, Can. J. Stat. (2022). DOI: 10.1002/cjs.11721.
  45. Woodroofe M., Nonlinear Renewal Theory in Sequential Analysis, SIAM, 1982.
  46. Yao Q., Tests for change-points with epidemic alternatives, Biometrika 80 (1993), pp. 179–191.
  47. Zhang L. and Li Y., Regime-switching based vehicle-to-building operation against electricity price spikes, Energy Econ. 66 (2017), pp. 1–8.
  48. Zhou B., Chioua M., Bauer M., Schlake J.C., and Thornhill N.F., Improving root cause analysis by detecting and removing transient changes in oscillatory time series with application to a 1, 3-butadiene process, Ind. Eng. Chem. Res. 58 (2019), pp. 11234–11250.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
