Detection and estimation of multiple transient changes

Michael Baron; Sergey V Malov

doi:10.1080/02664763.2023.2174257

2023 Mar 13;50(14):2862–2888.



PMCID: PMC10557625 PMID: 37808619

Abstract

Change-point detection methods are proposed for the case of temporary failures, or transient changes, when an unexpected disorder is ultimately followed by a re-adjustment and return to the initial state. A base distribution of the ‘in-control’ state changes to an ‘out-of-control’ distribution for unknown periods of time. Likelihood-based sequential and retrospective tools are proposed for the detection and estimation of each pair of change-points. The accuracy of the obtained change-point estimates is assessed. The proposed methods offer simultaneous control of the familywise false alarm and false re-adjustment rates at pre-chosen levels.

Keywords: Change-point problem, CUSUM process, false alarm, maximum likelihood estimate, transient changes

1. Introduction to transient changes

Transient changes, or temporary disorders, refer to situations in which an initial distribution of observed data changes to a different one and eventually returns to the original state. The moments of change are usually unexpected and a priori unknown; the underlying distributions may be known or unknown, but the ultimate return to the initial distribution is assumed to be inevitable. In general, a data sequence may experience one or more transient changes, which can be changes in the mean, variance, or other characteristics of the observed process. This article focuses on the detection of such changes and the estimation of change-points.

There is a wide range of practical situations that are subject to transient changes. Applications in signal and image processing for the detection of finite signals are mentioned in [41], with the detection of space objects detailed in [41, Section 6]. Detection of transient changes appears to be useful in medical diagnostics based on heart rate variability [7]. Similar models, termed ‘the pulse form’ or ‘the epidemic alternative’, were introduced in [17,23,46] for epidemiologic monitoring and malformation surveillance. An application to the monitoring of chemical concentrations in drinking water is detailed in [16, Section 6]. Analysis of transient changes is also important in industrial process control and power systems for identifying in-control and out-of-control periods; specific applications are described in [1,48]. Another application studied in [1] deals with the exploration of vertical ocean shears.

Similar situations also occur in financial data from deregulated energy markets. During periods of high demand, extreme weather, or maintenance or closure of a power plant, the instantaneous price of electricity may spike for several hours to several days, as shown in Figure 1. After each spike, the distribution of prices returns to its initial state [5,6,39,47]. Accurate detection of spikes and estimation of their parameters are needed for financial modeling and prediction, which are critical for the proper valuation of energy options and contracts [29,30].

Figure 1. — Spikes in instantaneous electricity prices during two years in the PJM (Pennsylvania–New Jersey–Maryland) energy market.

The field of change-point analysis has been studied extensively over the last 70 years or so [33]. Thorough surveys of proposed methods can be found in [8,27,42]. Many of these tools can be applied to the analysis of transient changes; for example, [18] evaluates the performance of Page's CUSUM procedure [25,26] for transient changes of a known duration. As we show, however, transient change-point analysis is a different problem; in particular, the likelihood-ratio test for a transient change is related to the maximum value of a CUSUM process.

We start with the retrospective (off-line, non-sequential) analysis of one transient change. Our goals are to test for the occurrence of a change and to estimate its starting and ending moments. We derive the level α likelihood-ratio test, which leads to the maximum likelihood estimator of the interval of change, and then derive the asymptotic distribution of this estimator.

In the case of Normal distributions, our estimator of the end of a transient segment coincides with the test statistic of [46, Section 2.2] and [35, Section 3.6]. A number of other tests are presented in [46] for a transient change in a sequence of Normal random variables. A short survey of other tests is given in [1, Section 1].

We then generalize our method to the situation of multiple transient changes. A number of algorithms have been proposed for multiple change-points that are not transient. One can use an overall maximum likelihood estimation procedure [14], although without a restriction it tends to produce false alarms. To limit false alarms, one can restrict the number of change-points or the distance between them, as in [22]. Alternatively, one can utilize binary segmentation [43,44] and wild binary segmentation [12,13], the isolate-detect scheme [2], Bayesian recursion [11] and its regression version [31], scan statistics [10], and other methods.

Building upon our analysis of one transient change, we utilize a fully sequential scheme of [3] to detect and estimate multiple transient changes. In other words, a retrospective problem based on a sample of fixed size n is solved sequentially, with change-points detected one at a time.

Several sequential statistical methods have been proposed for the analysis of transient changes. Under the assumption of a known duration of the post-change period, the standard CUSUM algorithm for change-point detection is modified and optimized in [15,16,41]. The optimality is understood as the lowest probability of missing a transient change [15,16] or the highest probability of detection [41], subject to the given probability of a false alarm within the given time. The optimized detection rule is the window-limited CUSUM, or WL-CUSUM. The special case of changes in the mean is considered in [24], where an approximate expression for the average run length to false alarm is given for the moving-average sum (MOSUM) algorithm.

When the assumption of a completely known duration of the period of change is not realistic, one may consider it random, put a prior distribution on each change-point, and consider the resulting Bayesian problem, as in [6,28,40].

For a more detailed overview of literature on Bayesian and non-Bayesian transient change-point detection methods, see [9,16,46].

In this work, we focus on the detection of transient changes and estimation of change-points when the change intervals are completely unknown. The considered scenarios are retrospective, in which a fixed-length data sequence is already collected, and it may contain one or more transient changes that we aim to detect without an option of requesting more data.

Existence of two alternating distributions, regarded as ‘in-control’ and ‘out-of-control’ states, leads us to a new problem of a simultaneous control of the false alarm and the false re-adjustment rates.

The new results include simultaneous control of these rates in the case of one transient change, their familywise control in the case of multiple transient changes, and the asymptotic distribution of the maximum likelihood estimator of the interval of change. Notably, our threshold for α-testing of one transient change extends to the case of multiple transient changes, providing α-control of both familywise error rates without a Bonferroni-type correction or any other adjustment for multiple comparisons. Doob's maximal inequality, used in the derivation of this threshold, turns out to be sufficient to guarantee familywise control regardless of the number of transient changes.

Algorithms for the detection, estimation, and testing of one transient change are derived in Section 2, and for multiple transient changes in Section 4, where the number of changes may be known or unknown. The proposed detection method, a self-correcting CUSUM procedure, is shown to detect transient changes while controlling the familywise false alarm rate and the familywise false re-adjustment rate simultaneously at the pre-determined levels. Asymptotic results are derived in Section 3. Section 5 contains illustrations and simulation results.

2. Estimation and testing of one transient change interval

In this section, we assume at most one interval of change and derive a transient change detection scheme that controls the probability of a false alarm at a pre-chosen level α. Two scenarios are possible: either all the data follow the base distribution,

\begin{aligned} H_{0} : X_{1}, \dots, X_{n} \sim F, \end{aligned}

or there is one region of change $[a, b]$ , so that

\begin{aligned} H_{1} : \begin{cases} X_{1}, \dots, X_{a} \sim F, \\ X_{a + 1}, \dots, X_{b} \sim G, \\ X_{b + 1}, \dots, X_{n} \sim F, \end{cases} \end{aligned}

where a and b are unknown change-points while the distributions F and G are known. The former case can be viewed as the no-change null hypothesis $H_{0}$ , and the latter as the transient-change alternative $H_{1}$ .

The goals are (1) to distinguish between $H_{0}$ and $H_{1}$ at a given level of significance, and (2) to estimate the change-point parameters a and b in the case of $H_{1}$ .

2.1. Maximum likelihood estimation

From now on, we assume independent observations $X_{1}, \dots, X_{n}$ . (The case of dependent data is usually solved by representing the joint likelihood function as a product of conditional densities, given past observations. Probability results will then require additional assumptions.) For the parameter $(a, b)$ , the log-likelihood function is written as

\begin{aligned} L (X; a, b) & = \sum_{i = 1}^{a} \log f (X_{i}) + \sum_{i = a + 1}^{b} \log g (X_{i}) + \sum_{i = b + 1}^{n} \log f (X_{i}) \\ & = \sum_{i = a + 1}^{b} \log \frac{g (X_{i})}{f (X_{i})} + \sum_{i = 1}^{n} \log f (X_{i}) \\ & = S_{b} - S_{a} + \text{const}, \end{aligned}

(1)

where f and g are probability densities of distributions F and G with respect to a reference measure μ;

\begin{aligned} S_{t} = \sum_{i = 1}^{t} \log \frac{g (X_{i})}{f (X_{i})} = \sum_{i = 1}^{t} z_{i} \end{aligned}

is a random walk with the marginal log-likelihood ratios $z_{i}$ as its increments, and $\sum_{i = 1}^{n} \log f (X_{i})$ is a constant term, as it does not depend on the unknown parameters a and b. Measures F and G are not required to be mutually absolutely continuous, so the log-likelihood ratio $\log (g / f)$ takes values in $\bar{\mathbb{R}} = [- \infty, \infty]$ .
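As a quick numerical check of the decomposition (1), the sketch below assumes, purely for illustration, F = N(0, 1) and G = N(μ, 1), so that $z_{i} = μ X_{i} - μ^{2} / 2$. It compares the directly summed log-likelihood with $S_{b} - S_{a}$ plus the constant term for every pair a ≤ b. The helper names `log_phi` and `loglik` are hypothetical.

```python
import math
import random

# Assumption for illustration only: F = N(0, 1) and G = N(mu, 1).
def log_phi(x, mean):
    """Log-density of N(mean, 1) at x."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - mean) ** 2

def loglik(x, a, b, mu):
    """L(X; a, b) summed term by term: X_i ~ G for a < i <= b, X_i ~ F otherwise (1-based i)."""
    return sum(log_phi(xi, mu if a < i <= b else 0.0) for i, xi in enumerate(x, start=1))

random.seed(1)
mu, n = 1.0, 40
x = [random.gauss(0.0, 1.0) for _ in range(n)]

# Random walk S_0 = 0, S_t = z_1 + ... + z_t with z_i = log g(X_i) - log f(X_i)
S = [0.0]
for xi in x:
    S.append(S[-1] + log_phi(xi, mu) - log_phi(xi, 0.0))
const = sum(log_phi(xi, 0.0) for xi in x)  # sum of log f(X_i); does not depend on (a, b)

# Largest discrepancy between L(X; a, b) and S_b - S_a + const over all a <= b
max_gap = max(abs(loglik(x, a, b, mu) - (S[b] - S[a] + const))
              for a in range(n + 1) for b in range(a, n + 1))
print(f"max |L - (S_b - S_a + const)| = {max_gap:.2e}")
```

The discrepancy is at the level of floating-point rounding, which is what (1) predicts.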

Maximizing (1), we immediately obtain the maximum likelihood estimator (MLE)

\begin{aligned} (\hat{a}, \hat{b}) = ({\hat{a}}_{n}, {\hat{b}}_{n}) = \operatorname*{arg\,max}_{a \leq b} (S_{b} - S_{a}) . \end{aligned}

(2)

Naturally, the log-likelihood (1) and the maximizer (2) are functions of the sample size n. To simplify notation, n will often be omitted, except in the asymptotic study of Section 3.4, which explores the distribution of (2) as $n \to \infty$ .
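Formula (2) can be evaluated by exhaustive search in O(n²) operations. The sketch below is a minimal illustration, not code from the paper: it assumes the same Normal pair F = N(0, 1), G = N(μ, 1), simulates a change on (40, 70], and maximizes $S_{b} - S_{a}$ over all a ≤ b. The helper `mle_bruteforce` is hypothetical.

```python
import random

def mle_bruteforce(S):
    """(a_hat, b_hat) = argmax over a <= b of S_b - S_a, by exhaustive O(n^2) search.
    S = [S_0, S_1, ..., S_n] with S_0 = 0; returns ((a_hat, b_hat), Lambda)."""
    n = len(S) - 1
    best, arg = float("-inf"), (0, 0)
    for a in range(n + 1):
        for b in range(a, n + 1):
            if S[b] - S[a] > best:
                best, arg = S[b] - S[a], (a, b)
    return arg, best

# Assumed example: F = N(0,1), G = N(mu,1), so z_i = log (g/f)(X_i) = mu*X_i - mu^2/2
random.seed(7)
n, a, b, mu = 100, 40, 70, 3.0
S = [0.0]
for i in range(1, n + 1):
    x = random.gauss(mu if a < i <= b else 0.0, 1.0)
    S.append(S[-1] + mu * x - mu * mu / 2)

(a_hat, b_hat), lam = mle_bruteforce(S)
print(a_hat, b_hat, round(lam, 2))
```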

According to (2), the MLE returns the interval of the largest growth of random walk $S_{t}$ . A direct method of calculating $\hat{a}$ and $\hat{b}$ can be proposed in terms of the associated cumulative-sum (CUSUM) process

\begin{aligned} W_{t} = S_{t} - \min_{i \leq t} S_{i}, \end{aligned}

(3)

which vanishes at every successive point of minimum of $S_{t}$ . Given $\hat{b}$ , one finds $\hat{a}$ by minimizing $S_{t}$ for $t \leq \hat{b}$ , that is, finding the most recent zero of the CUSUM $W_{t}$ . Then, the CUSUM does not return to zero between $\hat{a}$ and $\hat{b}$ , and therefore,

\begin{aligned} S_{\hat{b}} - S_{\hat{a}} = W_{\hat{b}} - W_{\hat{a}} = W_{\hat{b}} . \end{aligned}

(4)

Maximizing (4), we obtain a computational formula for the MLE,

\begin{aligned} \hat{b} = {\hat{b}}_{n} = \operatorname*{arg\,max}_{0 < t \leq n} W_{t}, \quad \hat{a} = {\hat{a}}_{n} = \max \{\operatorname{Ker} (W) \cap [0, \hat{b})\}, \end{aligned}

(5)

where $\operatorname{Ker} (W) = \{t : W_{t} = 0\}$ denotes the CUSUM's ‘kernel’, that is, the set of its zeros.
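Formula (5) turns the O(n²) search into a single O(n) pass. A minimal sketch, again assuming Normal F and G only to generate demo data: the CUSUM is computed by the recursion $W_{t} = \max \{0, W_{t - 1} + z_{t}\}$ given in Section 2.2, $\hat{b}$ is the first maximizer of $W_{t}$, and $\hat{a}$ is the last zero of $W$ before $\hat{b}$. The function names are illustrative.

```python
import random

def cusum(S):
    """W_t = S_t - min_{i<=t} S_i, via the recursion W_0 = 0, W_t = max(0, W_{t-1} + z_t)."""
    W = [0.0]
    for t in range(1, len(S)):
        W.append(max(0.0, W[-1] + (S[t] - S[t - 1])))
    return W

def mle_cusum(S):
    """Formula (5): b_hat = first maximizer of W_t over 0 < t <= n,
    a_hat = largest zero of W in [0, b_hat). Returns (a_hat, b_hat, Lambda)."""
    W = cusum(S)
    n = len(S) - 1
    b_hat = max(range(1, n + 1), key=lambda t: W[t])  # max() keeps the first maximizer
    a_hat = max(t for t in range(b_hat) if W[t] == 0.0)
    return a_hat, b_hat, W[b_hat]

# Illustrative data: F = N(0,1), G = N(mu,1), z_i = mu*X_i - mu^2/2, change on (50, 80]
random.seed(3)
n, a, b, mu = 120, 50, 80, 1.5
S = [0.0]
for i in range(1, n + 1):
    x = random.gauss(mu if a < i <= b else 0.0, 1.0)
    S.append(S[-1] + mu * x - mu * mu / 2)

a_hat, b_hat, lam = mle_cusum(S)
print(a_hat, b_hat, round(lam, 2))
```

Because $W_{0} = 0$, the set of zeros in $[0, \hat{b})$ is never empty, so $\hat{a}$ is always defined.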

Our estimator $\hat{b}$ matches the Levin-Kline statistic [3] of [23], which is applied to the case of Normal distributions in Section 3.6 of [35] and Section 2.2 of [46].

As an illustration, an example of a log-likelihood ratio based random walk $S_{t}$ , the associated CUSUM process $W_{t}$ , and the resulting transient change-point estimator $(\hat{a}, \hat{b})$ is shown in Figure 2.

Figure 2. — Maximum likelihood estimation of a single transient change interval. The likelihood-ratio test statistic Λ is the largest increment of both processes $S_{t}$ and $W_{t}$ .

2.2. Testing appearance of a transient change

The largest increment (4) of the random walk $S_{t}$ also serves as the log-likelihood ratio test statistic

\begin{aligned} Λ = Λ_{n} = \log \frac{\max_{a < b} \{f (X_{0 : a}) g (X_{a : b}) f (X_{b : n})\}}{f (X_{0 : n})} = \max_{a < b} (S_{b} - S_{a}) = W_{\hat{b}} \end{aligned}

for testing the no-change null hypothesis against an alternative hypothesis that a transient change occurred in our data,

\begin{aligned} H_{0} : X_{0 : n} \sim F \quad \text{vs.} \quad H_{1} : \begin{cases} X_{0 : a} \sim F, \\ X_{a : b} \sim G, \\ X_{b : n} \sim F, \end{cases} \quad \text{for some } a < b, \end{aligned}

where $X_{k : m} = (X_{k + 1}, \dots, X_{m})$ for any k<m, and $F (X_{k : m})$ and $G (X_{k : m})$ denote joint distributions.

The likelihood-ratio test (LRT) rejects $H_{0}$ in favor of $H_{1}$ if $Λ \geq h$ for some threshold h. The choice of h controls the balance between the probabilities of Type I and Type II errors, in other words, between the detection sensitivity and the rate of false alarms.

In order to control the probability of a false alarm at the given level α, we take advantage of Doob's maximal inequality (see, for example, [32], Section VII-3; [38], Section 7.1.1). It states that for a submartingale $\{Y_{t}\}$ and any constant $c > 0$ ,

\begin{aligned} P \{\sup_{0 \leq t \leq n} Y_{t} \geq c\} \leq \frac{E (Y_{n}^{+})}{c}, \end{aligned}

where $x^{+} = \max \{x, 0\}$ .

Doob's inequality can be applied directly to the LRT statistic

\begin{aligned} Λ = W_{\hat{b}} = \max_{0 \leq t \leq n} W_{t} \end{aligned}

in the following way. The CUSUM process (3) admits a recursive representation

\begin{aligned} W_{0} = 0, \quad W_{t + 1} = \max \{0, W_{t} + z_{t + 1}\} \end{aligned}

with $z_{i} = \log \frac{g (X_{i})}{f (X_{i})}$ ([25], Section 2.2). Similarly, $U_{t} = \exp \{W_{t}\}$ can be expressed recursively as

\begin{aligned} U_{0} = 1, \quad U_{t + 1} = \max \{1, U_{t} e^{z_{t + 1}}\} . \end{aligned}

(6)

It follows that $E_{F} | U_{t} | < \infty$ for every t, because

\begin{aligned} 1 & \leq E_{F} (e^{z_{1}} \lor 1) = E_{F} \{e^{z_{1}}; z_{1} \geq 0\} + P_{F} \{z_{1} < 0\} \leq E_{F} e^{z_{1}} + 1 \\ & = \int \frac{g}{f} f \, d μ + 1 = 1 + 1 = 2, \end{aligned}

and therefore, since $U_{t} \leq \prod_{i = 1}^{t} (e^{z_{i}} \lor 1)$ and the $z_{i}$ are i.i.d. under F,

\begin{aligned} 1 \leq E_{F} (U_{t}) \leq \left(E_{F} (e^{z_{1}} \lor 1)\right)^{t} \leq 2^{t} < \infty . \end{aligned}

Also, from (6),

\begin{aligned} E_{F} \{U_{t + 1} \mid U_{1}, \dots, U_{t}\} \geq U_{t} E_{F} e^{z_{t + 1}} = U_{t}, \end{aligned}

showing that $U_{t}$ is a submartingale. Applying Doob's maximal inequality to the process $\{U_{t}\}$ , we have

\begin{aligned} P \{\text{Type I error}\} & = P_{F} \{Λ \geq h\} = P_{F} \{\max_{0 \leq t \leq n} W_{t} \geq h\} = P_{F} \{\max_{0 \leq t \leq n} U_{t} \geq e^{h}\} \end{aligned}

(7)

\begin{aligned} \leq e^{- h} E_{F} (U_{n}) = e^{- h} E_{F} (e^{W_{n}}) . \end{aligned}

(8)

Thus, setting the threshold at

\begin{aligned} h_{α} = - \log \frac{α}{E_{F} (e^{W_{n}})} \end{aligned}

(9)

guarantees the probability of a false alarm no higher than α.
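Formula (9) requires $E_{F} (e^{W_{n}})$, which generally has no closed form. One possible approach, sketched below under the assumption F = N(0, 1), G = N(μ, 1), is a Monte Carlo estimate of this expectation; the Monte Carlo step, the helper names, and the sample sizes are our assumptions, not part of the method. Since $e^{W_{n}}$ is heavy-tailed under F, the estimate can be noisy, and an underestimate translates into a smaller, less conservative h. The bound $E_{F} (e^{W_{n}}) \leq 2^{n}$ derived above gives a valid but far more conservative threshold.

```python
import math
import random

def cusum_stats(z):
    """Return (max_t W_t, W_n) for W_0 = 0, W_t = max(0, W_{t-1} + z_t)."""
    w = w_max = 0.0
    for zi in z:
        w = max(0.0, w + zi)
        w_max = max(w_max, w)
    return w_max, w

def null_increments(n, mu, rng):
    """z_1..z_n under H0, assuming F = N(0,1), G = N(mu,1): z = mu*X - mu^2/2 with X ~ F."""
    return [mu * rng.gauss(0.0, 1.0) - mu * mu / 2 for _ in range(n)]

def estimate_e_exp_wn(n, mu, reps=4000, seed=0):
    """Monte Carlo estimate of E_F exp(W_n)."""
    rng = random.Random(seed)
    return sum(math.exp(cusum_stats(null_increments(n, mu, rng))[1]) for _ in range(reps)) / reps

def threshold(alpha, e_exp_wn):
    """Formula (9): h_alpha = -log(alpha / E_F exp(W_n))."""
    return -math.log(alpha / e_exp_wn)

n, mu, alpha = 50, 1.0, 0.05
e_hat = estimate_e_exp_wn(n, mu)
h = threshold(alpha, e_hat)

# Empirical false-alarm rate of the LRT {max_t W_t >= h} on fresh null samples
rng = random.Random(1)
reps = 4000
alarms = sum(cusum_stats(null_increments(n, mu, rng))[0] >= h for _ in range(reps))
print(f"E_F exp(W_n) ~ {e_hat:.2f}, h = {h:.3f}, false-alarm rate ~ {alarms / reps:.4f}")
print(f"crude bound 2^n gives h = {threshold(alpha, 2.0 ** n):.1f}")
```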

A large-sample, large-threshold asymptotic expression for the Type I error probability (7) is derived in [36, Theorem 2]. By comparison, (8) is an inequality, which is valid for any finite n and h.

We conclude this section by summarizing the obtained results.

Proposition 2.1 (Detection of a single transient segment).

For the case of at most one transient change in the interval $[0, n]$ ,

(1) The maximum likelihood estimator of the interval of change $[a, b]$ is given by (2) in terms of the random walk $S_{t}$ and by (5) in terms of the CUSUM process $W_{t}$ .

(2) The likelihood-ratio test (LRT) rejects the no-change hypothesis if $Λ = \max_{0 \leq t \leq n} W_{t} \geq h$ for some threshold h.

(3) The threshold $h_{α}$ determined by (9) yields a level α LRT.

(4) The change-point detection algorithm that reports a transient change at the stopping time $T_{α} = \min \{t : W_{t} \geq h_{α}\}$ produces a false alarm with probability $P \{\text{false alarm}\} \leq α$ .
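Part 4 amounts to running the CUSUM recursion and stopping at the first crossing of the threshold. A minimal sketch, with a hypothetical function name and toy increments chosen so that the CUSUM path is easy to follow by hand:

```python
def first_crossing(z, h):
    """T = min{t : W_t >= h} for the CUSUM W_0 = 0, W_t = max(0, W_{t-1} + z_t).
    z holds z_1, ..., z_n; returns None if W never reaches h (no alarm)."""
    w = 0.0
    for t, zi in enumerate(z, start=1):
        w = max(0.0, w + zi)
        if w >= h:
            return t
    return None

# Toy increments: W = 0, 0, 2, 4, 6, 5 at t = 1..6
z = [-1.0, -1.0, 2.0, 2.0, 2.0, -1.0]
print(first_crossing(z, 5.0))   # alarm at t = 5
print(first_crossing(z, 10.0))  # no alarm: max W = 6 < 10
```

Since $Λ = \max_{t} W_{t}$, the function returns a time t ≤ n exactly when the LRT of part 2 rejects $H_{0}$ with the same threshold.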

3. Precision and limiting distribution of the MLE

In this section, we assume that a transient interval $(a, b]$ exists, that is, the alternative hypothesis $H_{1}$ holds. Under $H_{1}$ , we study the precision of the maximum likelihood estimator $(\hat{a}, \hat{b})$ of the interval of change and derive its large-sample limiting distribution.

Several concepts will be introduced as building blocks in this derivation:

– pre-likelihood estimators (PLE), defined by conditions (10), which are weaker than the definition of the MLE, so that every MLE is a PLE;
– local likelihood estimators (LLE), defined constructively with respect to any local point γ;
– a detection point, which belongs to the interval of change with a given probability.

3.1. Pre-Likelihood estimators

We start by introducing so-called pre-likelihood estimators (PLE). PLE satisfy the necessary (but not sufficient) conditions (10) for a pair of points to be the MLE.

To this end, introduce the direct and inverse shifted random walk processes $\{S_{τ, i}^{+}\}_{i = 1}^{n - τ}$ and $\{S_{τ, i}^{-}\}_{i = 1}^{τ - 1}$ , defined by $S_{τ, i}^{+} = S_{τ + i} - S_{τ}$ and $S_{τ, i}^{-} = S_{τ - i} - S_{τ}$ , respectively; also let $S_{0}^{+} = S_{0}^{-} = 0$ , $S_{i} = S_{i}^{+} = S_{0, i}^{+}$ , and $S_{i}^{-} = S_{n, i}^{-}$ , $i = 1, \dots, n$ . The MLE $(\hat{a}, \hat{b})$ must satisfy the inequalities

\begin{aligned} & \min_{i} S_{\hat{a}, i}^{-} > 0, \quad \min_{i \leq \hat{b} - \hat{a}} S_{\hat{a}, i}^{+} > 0, \\ & \max_{i \leq \hat{b} - \hat{a}} S_{\hat{b}, i}^{-} < 0, \quad \max_{i} S_{\hat{b}, i}^{+} < 0. \end{aligned}

(10)

In the sequel, any pair $(\tilde{a}, \tilde{b})$ that satisfies (10) will be called a pre-likelihood estimator (PLE). The PLE conditions (10) ensure that the left-side estimate $\tilde{a}$ cannot be improved for the given right-end $\tilde{b}$ by shifting it to increase $S_{\tilde{b}} - S_{\tilde{a}}$ , and similarly, the right-side estimate $\tilde{b}$ cannot be improved for the given left-end $\tilde{a}$ .

PLE are directly related to MLE, because if there exists a unique PLE, it is equal to the MLE. In the next section, we find conditions for the PLE to be unique, for a sufficiently large n.

PLE can be constructed by a combination of the direct and reverse CUSUM processes, $W_{t}$ and $W_{t}^{-}$ , where the reverse CUSUM process is defined as

\begin{aligned} W_{n}^{-} = 0, \quad W_{n - k}^{-} = \max (0, S_{1}^{-}, \dots, S_{k}^{-}) - S_{k}^{-}, \quad k = 1, \dots, n . \end{aligned}

The direct CUSUM $W_{t}$ is based upon the random walk $S_{t} = z_{1} + \dots + z_{t}$ , whereas the reverse CUSUM is built upon the random walk that starts at time n and proceeds back in time, adding increments $(- z_{i})$ . Any PLE can be obtained from the kernels of these CUSUM processes, defined as $K = \{t : W_{t} = 0\}$ , $K^{-} = \{t : W_{t}^{-} = 0\}$ , and $\tilde{K} = K \cup K^{-}$ . A pair $(\tilde{a}, \tilde{b})$ is a PLE if and only if $\tilde{a} < \tilde{b}$ , $\tilde{a} \in K$ , $\tilde{b} \in K^{-}$ , and $\{i \in \tilde{K} : \tilde{a} < i < \tilde{b}\} = \emptyset$ .
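The kernel characterization lends itself to a direct implementation. The sketch below (hypothetical helper names, illustrative Normal data) uses the identities $W_{t} = 0 \Leftrightarrow S_{t} \leq \min_{i < t} S_{i}$ and $W_{t}^{-} = 0 \Leftrightarrow S_{t} \geq \max_{s > t} S_{s}$, which follow from (3) and the definition of $W^{-}$. It lists the pairs produced by the kernel rule and separately checks conditions (10) for a given pair. Including the index 0 (with $S_{0} = 0$) in the ranges of (10) is our assumption.

```python
import random

def kernels(S):
    """Kernels of the direct and reverse CUSUMs for S = [S_0, ..., S_n], S_0 = 0.
    K     = {t : W_t = 0}   <=> S_t <= S_i for all i < t  (running minimum);
    K_rev = {t : W^-_t = 0} <=> S_t >= S_s for all s > t  (maximum of the future)."""
    n = len(S) - 1
    K, run_min = set(), float("inf")
    for t in range(n + 1):
        if S[t] <= run_min:
            K.add(t)
            run_min = S[t]
    K_rev, run_max = set(), float("-inf")
    for t in range(n, -1, -1):
        if S[t] >= run_max:
            K_rev.add(t)
            run_max = S[t]
    return K, K_rev

def ple_candidates(S):
    """Pairs (u, v) consecutive in sorted(K | K_rev) with u in K and v in K_rev."""
    K, K_rev = kernels(S)
    pts = sorted(K | K_rev)
    return [(u, v) for u, v in zip(pts, pts[1:]) if u in K and v in K_rev]

def satisfies_10(S, a, b):
    """Direct check of inequalities (10) for the pair (a, b)."""
    n = len(S) - 1
    left = all(S[s] > S[a] for s in range(0, b + 1) if s != a)
    right = all(S[s] < S[b] for s in range(a, n + 1) if s != b)
    return a < b and left and right

random.seed(11)
n, a, b, mu = 100, 40, 70, 1.5
S = [0.0]
for i in range(1, n + 1):
    x = random.gauss(mu if a < i <= b else 0.0, 1.0)  # assumed Normal data
    S.append(S[-1] + mu * x - mu * mu / 2)

# MLE (2) by exhaustive search over pairs i < j
mle = max(((i, j) for i in range(n + 1) for j in range(i + 1, n + 1)),
          key=lambda p: S[p[1]] - S[p[0]])
cands = ple_candidates(S)
print("MLE:", mle, "PLE candidates:", cands)
```

For the simulated data, the MLE appears among the kernel-based pairs and satisfies (10), in line with every MLE being a PLE.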

3.2. Local likelihood estimators

In general, there may be multiple PLEs, and their number is random. To study their distribution, we define a class of local estimators $({\tilde{a}}_{γ}, {\tilde{b}}_{γ})$ that are constructed around a fixed point γ,

\begin{aligned} {\tilde{a}}_{γ} = γ - \operatorname*{arg\,min}_{i \leq γ} S_{γ, i}^{-} \quad \text{and} \quad {\tilde{b}}_{γ} = γ + \operatorname*{arg\,max}_{i \leq n - γ} S_{γ, i}^{+} . \end{aligned}

We call them local likelihood estimators (LLE). Every PLE coincides with an LLE with respect to any point γ inside the interval defined by this PLE.
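In terms of the original indices, ${\tilde{a}}_{γ}$ is the minimizer of $S_{s}$ over s ≤ γ and ${\tilde{b}}_{γ}$ is the maximizer of $S_{s}$ over s ≥ γ. The sketch below (hypothetical helper `lle`, illustrative Normal data, index ranges including s = 0 and s = γ by assumption) checks that, for every γ inside the MLE interval $(\hat{a}, \hat{b}]$, the LLE coincides with the MLE.

```python
import random

def lle(S, gamma):
    """Local likelihood estimators around a fixed gamma in (0, n]:
    a_tilde = argmin of S_s over 0 <= s <= gamma, b_tilde = argmax of S_s over gamma <= s <= n."""
    n = len(S) - 1
    a_tilde = min(range(0, gamma + 1), key=lambda s: S[s])
    b_tilde = max(range(gamma, n + 1), key=lambda s: S[s])
    return a_tilde, b_tilde

# Illustrative data: F = N(0,1), G = N(mu,1), z_i = mu*X_i - mu^2/2, change on (40, 90]
random.seed(5)
n, a, b, mu = 120, 40, 90, 2.0
S = [0.0]
for i in range(1, n + 1):
    x = random.gauss(mu if a < i <= b else 0.0, 1.0)
    S.append(S[-1] + mu * x - mu * mu / 2)

# MLE (2) by exhaustive search over pairs i < j, for comparison
a_hat, b_hat = max(((i, j) for i in range(n + 1) for j in range(i + 1, n + 1)),
                   key=lambda p: S[p[1]] - S[p[0]])
print("MLE:", (a_hat, b_hat), "LLE at gamma = 65:", lle(S, 65))
```

The two searches, to the left and to the right of γ, use disjoint parts of the sample, which is why ${\tilde{a}}_{γ}$ and ${\tilde{b}}_{γ}$ are independent for fixed γ.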

Below, we derive the distribution of an LLE with respect to a fixed point γ. Constructed this way, the LLE ${\tilde{a}}_{γ}$ and ${\tilde{b}}_{γ}$ are independent for any fixed γ. We consider the case of interest, $a < γ \leq b$ . Under $b - γ = k \geq 0$ , the distribution of ${\tilde{b}}_{γ}$ is the same as the distribution of the MLE in the single change-point problem. Then

\begin{aligned} P ({\tilde{b}}_{γ} = b) & = R_{G, k}^{-} (0) R_{F, n - b}^{+} (0) \\ & \geq \exp \Big(- \sum_{m = 1}^{\infty} m^{- 1} \big(P_{F} (\sum_{j = 1}^{m} Y_{j} \geq 0) + P_{G} (\sum_{j = 1}^{m} Y_{j} \leq 0)\big)\Big), \end{aligned}

where

\begin{aligned} R_{H, k}^{+} (x) & = P_{H} (\max (0, S_{1}^{+}, \dots, S_{k}^{+}) < x), \\ R_{H, k}^{-} (x) & = P_{H} (\max (0, S_{1}^{-}, \dots, S_{k}^{-}) < x), \end{aligned}

(11)

and the random walks are based on a sample from the distribution H. The last inequality follows from Spitzer's formula (see [45]) as $k, n - b \to \infty$ [21]. Moreover [19], for r>0,

\begin{aligned} P ({\tilde{b}}_{γ} = b + r) = \int_{0}^{\infty} R_{G, b - γ}^{-} (x) B_{F, r, n - b - r}^{+} (x) d x, \end{aligned}

and for r<0,

\begin{aligned} P ({\tilde{b}}_{γ} = b + r) = \int_{0}^{\infty} R_{F, n - b}^{+} (x) B_{G, - r, b - γ + r}^{-} (x) d x, \end{aligned}

where

\begin{aligned} B_{H, k, s}^{+} (y) \, d y & = P_{H} (\operatorname*{arg\,max}_{0 \leq i \leq k + s} S_{i} = k, \ S_{k} \in [y, y + d y)), \\ B_{H, k, s}^{-} (y) \, d y & = P_{H} (\operatorname*{arg\,max}_{0 \leq i \leq k + s} (- S_{i}) = k, \ - S_{k} \in [y, y + d y)) . \end{aligned}

(12)

The distribution of ${\tilde{a}}_{γ}$ can be obtained in a similar manner:

\begin{aligned} P ({\tilde{a}}_{γ} = a) & = R_{F, a}^{+} (0) R_{G, γ - a + 1}^{-} (0) \\ & \geq \exp \Big(- \sum_{m = 1}^{\infty} m^{- 1} \big(P_{G} (\sum_{j = 1}^{m} Y_{j} \geq 0) + P_{F} (\sum_{j = 1}^{m} Y_{j} \leq 0)\big)\Big); \end{aligned}

\begin{aligned} P ({\tilde{a}}_{γ} = a + l) = \int_{0}^{\infty} R_{G, γ - a + 1}^{-} (x) B_{F, - l, a + l}^{+} (x) d x \end{aligned}

for l<0; and

\begin{aligned} P ({\tilde{a}}_{γ} = a + l) = \int_{0}^{\infty} R_{F, a}^{+} (x) B_{G, l, γ - a - l + 1}^{-} (x) d x \end{aligned}

for l>0.

Proposition 3.1

Let a and b be fixed. Then for each $r \in J = \{- b + 1, \dots, n - b\}$ ,

$\begin{aligned} & \inf_{γ \in (a, b]} P ({\tilde{b}}_{γ} = b + r) \\ & \quad \geq p_{r} = \begin{cases} \exp \Big(- \sum_{m = 1}^{\infty} \frac{1}{m} \big(P_{F} (\sum_{j = 1}^{m} Y_{j} \geq 0) + P_{G} (\sum_{j = 1}^{m} Y_{j} \leq 0)\big)\Big) & \text{for } r = 0, \\ \int_{0}^{\infty} R_{F, \infty}^{+} (x) B_{G, - r, \infty}^{-} (x) \, d x & \text{for } r < 0, \\ \int_{0}^{\infty} R_{G, \infty}^{-} (x) B_{F, r, \infty}^{+} (x) \, d x & \text{for } r > 0, \end{cases} \end{aligned}$

and for each $l \in J^{*} = \{- a + 1, \dots, n - a\}$ ,

$\begin{aligned} & \inf_{γ \in (a, b]} P ({\tilde{a}}_{γ} = a + l) \\ & \quad \geq q_{l} = \begin{cases} \exp \Big(- \sum_{m = 1}^{\infty} \frac{1}{m} \big(P_{G} (\sum_{j = 1}^{m} Y_{j} \geq 0) + P_{F} (\sum_{j = 1}^{m} Y_{j} \leq 0)\big)\Big) & \text{for } l = 0, \\ \int_{0}^{\infty} R_{G, \infty}^{-} (x) B_{F, - l, \infty}^{+} (x) \, d x & \text{for } l < 0, \\ \int_{0}^{\infty} R_{F, \infty}^{+} (x) B_{G, l, \infty}^{-} (x) \, d x & \text{for } l > 0. \end{cases} \end{aligned}$

Proof.

Extending the trajectories of the random walks that are in a neighborhood of y at time k, we see that the probability of attaining the maximum at time k cannot increase. Hence, $B_{H, k, s}^{+} (x)$ and $B_{H, k, s}^{-} (x)$ are non-increasing in s for s>0. Similarly, $R_{H, s}^{+} (x)$ and $R_{H, s}^{-} (x)$ are non-increasing in s for s>0. The proposition follows immediately.

The right-hand sides of the inequalities in Proposition 3.1 for $r \neq 0$ or $l \neq 0$ are too complicated for direct practical use. Approximations suitable for computation are obtained in [19].
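Unlike the entries with r ≠ 0, the bound $p_{0}$ of Proposition 3.1 reduces to a single series that is easy to evaluate. As an illustration only, assume F = N(0, 1) and G = N(μ, 1), so that $Y_{j} = μ X_{j} - μ^{2} / 2$ as in Section 3.4. Then both probabilities in the series equal $Φ(- \sqrt{m} |μ| / 2)$. The sketch truncates the series after a hypothetical number of `terms`, which slightly overstates the bound.

```python
import math

def std_normal_cdf(x):
    """Standard Normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def p0_lower_bound(mu, terms=20000):
    """p_0 from Proposition 3.1, assuming F = N(0,1) and G = N(mu,1).
    With Y_j = mu*X_j - mu^2/2, the sum of m terms is N(-m mu^2/2, m mu^2) under F
    and N(m mu^2/2, m mu^2) under G, so both probabilities equal Phi(-sqrt(m)|mu|/2).
    The infinite series is truncated after `terms` terms."""
    total = 0.0
    for m in range(1, terms + 1):
        total += 2.0 * std_normal_cdf(-math.sqrt(m) * abs(mu) / 2.0) / m
    return math.exp(-total)

for mu in (0.5, 1.0, 2.0, 3.0):
    print(mu, round(p0_lower_bound(mu), 4))
```

The bound increases with |μ|, reflecting that larger changes are easier to localize exactly.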

The inequalities in Proposition 3.1 can be used immediately to get the lower bound for the cumulative probabilities,

\begin{aligned} P (s \leq {\tilde{a}}_{γ} - a \leq t) \geq \sum_{l = s}^{t} q_{l} \quad \text{and} \quad P (s \leq {\tilde{b}}_{γ} - b \leq t) \geq \sum_{r = s}^{t} p_{r} \end{aligned}

(13)

for $s \leq t$ and any $γ \in (a, b]$ , where $p_{r}$ and $q_{l}$ are given in Proposition 3.1.

3.3. Local estimation around a detection point

The next result extends (13) from fixed γ to a random point $\hat{γ}$ , possibly dependent on data, that has a probability of falling into $(a, b]$ bounded from below. If $P (\hat{γ} \in (a, b]) \geq 1 - α$ , the random point $\hat{γ}$ will be called a detection point of level α.

An example of a level α detection point can be constructed as follows. Let $α = α_{1} + α_{2}$ , and consider the stopping time $T_{α_{1}}$ , defined by part 4 of Proposition 2.1. Similarly, consider the stopping time $T_{α_{2}}^{-}$ that is based on the reverse CUSUM process $W_{t}^{-}$ . If $T_{α_{1}} \leq T_{α_{2}}^{-}$ , then any point in the interval $[T_{α_{1}}, T_{α_{2}}^{-}]$ is a detection point of level $α$ .
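The construction of a detection point can be sketched as follows, assuming thresholds $h_{1}, h_{2} > 0$ have already been chosen (for instance, as $h_{α_{1}}$ and $h_{α_{2}}$ from (9)). Our assumptions here are that the reverse stopping time is the first crossing of $W_{t}^{-}$ when time runs backward from n (that is, the largest t with $W_{t}^{-} \geq h_{2}$), that $W^{-}$ is computed by the recursion $W_{t}^{-} = \max \{0, W_{t + 1}^{-} + z_{t + 1}\}$ (which follows from its definition), and that the midpoint is reported. All function names are hypothetical.

```python
def forward_alarm(z, h):
    """T = min{t : W_t >= h}, with W_0 = 0 and W_t = max(0, W_{t-1} + z_t); None if no crossing."""
    w = 0.0
    for t, zi in enumerate(z, start=1):
        w = max(0.0, w + zi)
        if w >= h:
            return t
    return None

def reverse_alarm(z, h):
    """Largest t with W^-_t = max_{s >= t} S_s - S_t >= h, scanning t = n-1, ..., 0.
    Uses W^-_n = 0 and W^-_t = max(0, W^-_{t+1} + z_{t+1}); z[t] stores z_{t+1}."""
    w = 0.0
    for t in range(len(z) - 1, -1, -1):
        w = max(0.0, w + z[t])
        if w >= h:
            return t
    return None

def detection_point(z, h1, h2):
    """Midpoint of [T, T_rev] when T <= T_rev (any point of that interval would do); else None."""
    T, T_rev = forward_alarm(z, h1), reverse_alarm(z, h2)
    if T is None or T_rev is None or T > T_rev:
        return None
    return (T + T_rev) // 2

# Toy increments with a transient rise on (10, 20]
z = [-1.0] * 10 + [1.0] * 10 + [-1.0] * 10
print(forward_alarm(z, 5.0), reverse_alarm(z, 5.0), detection_point(z, 5.0, 5.0))  # 15 15 15
```

With lower thresholds, for example $h_{1} = h_{2} = 3$, the forward alarm moves earlier (t = 13), the reverse alarm moves later (t = 17), and the whole interval [13, 17] lies inside the change (10, 20].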

Cumulative and tail probabilities for LLE with respect to such a point $\hat{γ}$ are bounded from below in the following proposition.

Proposition 3.2

Let $\hat{γ}$ be a detection point of level α, as defined above. Then

$\begin{aligned} P (s \leq {\tilde{a}}_{\hat{γ}} - a \leq t) & \geq \sum_{l = s}^{t} q_{l} - α; \\ P (s \leq {\tilde{b}}_{\hat{γ}} - b \leq t) & \geq \sum_{r = s}^{t} p_{r} - α, \end{aligned}$

where probabilities $p_{r}$ and $q_{l}$ are given in Proposition 3.1.

Proof.

The proof is based on Boole's inequality

$\begin{aligned} 1 - P (A B) = P (\bar{A} \cup \bar{B}) \leq P (\bar{A}) + P (\bar{B}) \end{aligned}$

which implies $P (A B) \geq P (A) - P (\bar{B})$ . Then, for any fixed a<b,

$\begin{aligned} P (s \leq {\tilde{a}}_{\hat{γ}} - a \leq t) & \geq P (s \leq {\tilde{a}}_{\hat{γ}} - a \leq t, \hat{γ} \in (a, b]) \\ & \geq \inf_{γ \in (a, b]} P (s \leq {\tilde{a}}_{γ} - a \leq t, γ \in (a, b]) \\ & \geq \inf_{γ \in (a, b]} P (s \leq {\tilde{a}}_{γ} - a \leq t) - α \geq \sum_{l = s}^{t} q_{l} - α, \end{aligned}$

by (13). The second inequality is obtained analogously.

3.4. Asymptotic distribution of the MLE

In this section, we study the large-sample asymptotic behavior of the MLE $(\hat{a}, \hat{b})$ as the sample size and the lengths of all homogeneous segments tend to infinity. We assume that the parameters $a = a (n)$ and $b = b (n)$ and the interval of change $D = [a (n), b (n)]$ depend on n, and that $Δ = \min \{a (n), b (n) - a (n), n - b (n)\} \to \infty$ as $n \to \infty$ . We use the notation $P \equiv P_{(D, n)}$ for the distribution with a transient change and $P_{H}$ for the case of i.i.d. random variables $X_{1}, \dots, X_{n}$ with common distribution function H. In particular, $P_{F} \equiv P_{(\emptyset, n)}$ and $P_{G} \equiv P_{(N, n)}$ , where $N = \{1, \dots, n\}$ .

Next, we define random walks $S_{k} = \sum_{i = 1}^{k} Y_{i}$ and ${\tilde{S}}_{k} = - \sum_{i = 1}^{k} Y_{i}$ . For example, for the transient change-point detection problem, with log-likelihood ratios $Y_{i} = \log \frac{g}{f} (X_{i})$ , the random walk $S_{k}$ is used to detect a change from F to G whereas ${\tilde{S}}_{k}$ is used to detect a change from G to F.

Let $W_{k}$ , ${\tilde{W}}_{k}$ be the corresponding CUSUM processes, where $W_{0} = {\tilde{W}}_{0} = 0$ , $W_{k} = S_{k} - \min_{i \leq k} S_{i} = (W_{k - 1} + Y_{k}) \lor 0$ , and ${\tilde{W}}_{k} = \max_{i \leq k} S_{i} - S_{k} = ({\tilde{W}}_{k - 1} - Y_{k}) \lor 0$ .

We start with the following auxiliary results.

Lemma 3.3

Let $E_{F} Y = c_{1} < 0$ . Then for any $ϵ > 0$

$\begin{aligned} P_{F} (\sup_{n \geq m} \max_{j \leq n} W_{j} / n > ϵ) \to 0 \quad \text{as } m \to \infty . \end{aligned}$

Proof.

Let $\mathcal{F} = \{\mathcal{F}_{k}\}_{k \in \mathbb{N}}$ be the natural filtration associated with the process $Y_{1}, Y_{2}, \dots$ , and let $τ_{1}, τ_{2}, \dots$ be the successive zeros of the CUSUM process $\{W_{k}\}_{k \in \mathbb{N}}$ .

Introduce $Y_{k}^{*} = Y_{k} \mathbb{1}\{Y_{k} \geq 0\}$ and $S_{k}^{*} = \sum_{j = 1}^{k} Y_{j}^{*}$ , $k \in \mathbb{N}$ . Note that $E_{F} Y_{k}^{*} = c_{2} < \infty$ . Using the Markov property of the process $\{S_{k}^{*}\}_{k \in \mathbb{N}}$ with respect to the filtration $\mathcal{F}$ and Wald's identity, we obtain that $E_{F} S_{τ_{k}}^{*} = E_{F} S_{τ_{1}}^{*} = c_{2} E_{F} τ_{1}$ for all k>1.

Let $Z_{1}, Z_{2}, \dots$ be independent copies of the random variable $S_{τ_{1}}^{*}$ . Then

$\begin{aligned} P_{F} (\max_{j \leq n} W_{j} / n > ϵ) \leq P_{F} (\max_{j \leq n} Z_{j} / n > ϵ) . \end{aligned}$ (14)

Denote by $H (u) = P_{F} (Z_{1} \leq u)$ the distribution function of the Z's. Note that the events $\{\max_{j \leq n} Z_{j} / n > ϵ\}$ , $n \in \mathbb{N}$ , occur infinitely often if and only if the events $\{Z_{k} > k ϵ\}$ , $k \in \mathbb{N}$ , occur infinitely often, and $P_{F} (Z_{k} > k ϵ) = 1 - H (k ϵ)$ . By the Borel–Cantelli lemma and the Maclaurin–Cauchy integral test, $P_{F}$ -almost surely $Z_{k} \leq k ϵ$ for all sufficiently large k if

$\begin{aligned} \int_{0}^{\infty} (1 - H (x ϵ)) \, d x = ϵ^{- 1} E Z_{1} < \infty . \end{aligned}$

The lemma is proved.

The next lemma follows immediately from the strong law of large numbers (SLLN) and Lemma 3.3.

Lemma 3.4

Let $E_{F} Y = c_{1} < 0$ , $E_{G} Y = c_{2} > 0$ , and $\liminf_{n \to \infty} \frac{Δ_{n}}{n} \geq ϵ$ for some $ϵ > 0$ . Then

$\begin{aligned} \lim_{r \to \infty} \lim_{n \to \infty} P (| {\hat{a}}_{n} - a | \geq r) = 0, \quad \lim_{r \to \infty} \lim_{n \to \infty} P (| {\hat{b}}_{n} - b | \geq r) = 0. \end{aligned}$

Proof.

For the version of the PLE ${\tilde{a}}_{n}$ most distant from a, we can write

$\begin{aligned} P (a - {\hat{a}}_{n} \geq r) & \leq P (a - {\tilde{a}}_{n} \geq r) = P (\sup_{k \leq a - r} W_{k} = \sup_{k \leq n} W_{k}) \\ & \leq P_{F} (\inf_{r \leq k \leq a} {\tilde{S}}_{k} \leq 0) + P (\sup_{k \leq a} W_{k} \geq S_{a, b - a}) \end{aligned}$

where

$\begin{aligned} P_{F} (\inf_{r \leq k \leq a} {\tilde{S}}_{k} \leq 0) & \leq P_{F} (\inf_{k \geq r} {\tilde{S}}_{k} \leq 0) \\ & = P_{F} (\sup_{k \geq r} S_{k} \geq 0) = P_{F} (\sup_{k \geq r} (S_{k} / k - c_{1}) \geq - c_{1}) \to 0 \end{aligned}$ (15)

as $r \to \infty$ . Moreover, for any $δ > 0$ ,

$\begin{aligned} P (\sup_{k \leq a} W_{k} \geq S_{a, b - a}) \leq P_{F} (\sup_{k \leq a} W_{k} > a δ) + P_{G} (S_{b - a} \leq a δ) . \end{aligned}$

The first term on the right-hand side of the last inequality tends to 0 as $n \to \infty$ by Lemma 3.3, and, for $δ < c_{2} ϵ$ , the second term tends to 0 as $n \to \infty$ by the law of large numbers, since $\limsup_{n \to \infty} a / (b - a) \leq ϵ^{- 1}$ and $c_{2} > 0$ .

Analogously, we obtain that $P ({\hat{b}}_{n} - b \geq r) \to 0$ as $r \to \infty$ .

Let $ϵ > 0$ ; $r_{ϵ}$ and $n_{ϵ}$ are such that

$\begin{aligned} P ({\hat{b}}_{n} - b > r_{ϵ}) \leq ϵ / 2 \end{aligned}$

for all $n \geq n_{ϵ}$ . Since ${\hat{a}}_{n} \leq {\hat{b}}_{n}$ , on the event $A_{ϵ} = {{\hat{b}}_{n} - b \leq r_{ϵ}}$ ,

$\begin{aligned} P ({\hat{a}}_{n} - a \geq r) \leq P_{G} (sup_{r \leq k \leq b - a} S_{k} \leq 0) + P (S_{a, b - a} - min_{k \leq r_{ϵ}} S_{b - a, k} \geq 0) . \end{aligned}$ (16)

The first term in the right hand side of the last inequality is tended to 0 as $r \to \infty$ uniformly in $n \geq 1$ as in (15). Then there exists an $r_{0 ϵ}$ , such that $P_{G} (sup_{r_{0 ϵ} \leq k \leq b - a} S_{k} \leq 0) \leq ϵ / 4$ . Finally, $min_{k \leq r_{ϵ}} S_{b - a, k} = O_{P} (1)$ , and, therefore,

$\begin{aligned} P (S_{a, b - a} - min_{k \leq r_{ϵ}} S_{b - a, k} \geq 0) = P (S_{a, b - a} / (b - a) - c_{2} \geq - c_{2} + min_{k \leq r_{ϵ}} S_{b - a, k} / (b - a)) \to 0 \end{aligned}$

as $b - a \to \infty$. Hence, the second term in (16) does not exceed $ϵ / 4$ for sufficiently large Δ. We have thus obtained that $P (\hat{a}_{n} - a \geq r_{ϵ} \lor r_{0 ϵ}) \leq ϵ$ for sufficiently large Δ, and, therefore,

$\begin{aligned} \lim_{r \to \infty} \lim_{Δ \to \infty} P (\hat{a}_{n} - a \geq r) = 0. \end{aligned}$

The convergence $\lim_{r \to \infty} \lim_{Δ \to \infty} P (b - \hat{b}_{n} \geq r) = 0$ can be obtained in a similar manner. Therefore, the lemma is proved.

Lemma 3.4 yields the following proposition.

Proposition 3.5

Let $γ = λ a + (1 - λ) b$ , $\hat{γ} = λ \hat{a} + (1 - λ) \hat{b}$ for some $λ \in (0, 1)$ ; $E_{F} Y < 0$ and $E_{G} Y > 0$ . Then

$\begin{aligned} P_{θ} (\hat{a} < γ < \hat{b}) \to 1 \quad \text{and} \quad P_{θ} (a < \hat{γ} < b) \to 1, \end{aligned}$

as $n \to \infty$, provided that $\liminf_{n \to \infty} Δ_{n} / n \geq ϵ$ for some $ϵ > 0$. Moreover,

$\begin{aligned} P_{θ} (\max (\hat{b} - γ, γ - \hat{a}) > M) \to 1 \quad \text{and} \quad P_{θ} (\max (b - \hat{γ}, \hat{γ} - a) > M) \to 1 \end{aligned}$

for any fixed $M > 0$ and all $θ = (a, b)$, as $n \to \infty$, provided that $\liminf_{n \to \infty} Δ_{n} / n \geq ϵ$ for some $ϵ > 0$.

Remark 3.1

Lemma 3.4 actually proves that the MLE $(\hat{a}, \hat{b})$ is unique with probability tending to 1 as $n \to \infty$ , and if $γ \in (a, b)$ and $({\tilde{a}}_{γ}, {\tilde{b}}_{γ})$ is the LLE with respect to some point γ, then
$\begin{aligned} P_{θ} (\hat{a} = {\tilde{a}}_{γ}, \hat{b} = {\tilde{b}}_{γ}) \to 1 \end{aligned}$
as $n \to \infty$, uniformly over all $a (n), b (n)$ such that $\liminf_{n \to \infty} Δ_{n} / n \geq ϵ$ for some $ϵ > 0$.

Given a known point γ between a and b, the estimation problem reduces to two separate change-point estimation problems, on the direct set of indices $i = 1, \dots, γ$ and on the inverse set $i = n, n - 1, \dots, γ$.

The main results can be easily extended to the case of multiple transient changes $D_{n} = \bigcup_{j = 1}^{J_{n}} (a_{j}, b_{j}]$, provided that $Δ_{n} = \min_{j = 0, \dots, J_{n} + 1} (b_{j} - a_{j}) \to \infty$ as $n \to \infty$, where $a_{0} = 0$ and $a_{J_{n} + 1} = n$.

Remark 3.1(i), together with Proposition 3.1, yields the following result, which establishes the asymptotic distribution of the MLE $(\hat{a}, \hat{b})$.

Proposition 3.6

Under the conditions of Lemma 3.4, for any fixed r and l,

$\begin{aligned} \lim_{n \to \infty} P (\hat{b} = b + r) = p_{r} & = \begin{cases} \exp \Big(- \sum_{m = 1}^{\infty} \frac{1}{m} \Big(P_{F} \Big(\sum_{j = 1}^{m} Y_{j} \geq 0\Big) + P_{G} \Big(\sum_{j = 1}^{m} Y_{j} \leq 0\Big)\Big)\Big) & \text{for } r = 0, \\ \int_{0}^{\infty} R_{F, \infty}^{+} (x)\, B_{G, - r, \infty}^{-} (x)\, d x & \text{for } r < 0, \\ \int_{0}^{\infty} R_{G, \infty}^{-} (x)\, B_{F, r, \infty}^{+} (x)\, d x & \text{for } r > 0; \end{cases} \\ \lim_{n \to \infty} P (\hat{a} = a + l) = q_{l} & = \begin{cases} \exp \Big(- \sum_{m = 1}^{\infty} \frac{1}{m} \Big(P_{G} \Big(\sum_{j = 1}^{m} Y_{j} \geq 0\Big) + P_{F} \Big(\sum_{j = 1}^{m} Y_{j} \leq 0\Big)\Big)\Big) & \text{for } l = 0, \\ \int_{0}^{\infty} R_{G, \infty}^{-} (x)\, B_{F, - l, \infty}^{+} (x)\, d x & \text{for } l < 0, \\ \int_{0}^{\infty} R_{F, \infty}^{+} (x)\, B_{G, l, \infty}^{-} (x)\, d x & \text{for } l > 0, \end{cases} \end{aligned}$

where $\sum_{r \in \mathbb{Z}} p_{r} = \sum_{l \in \mathbb{Z}} q_{l} = 1$, and $R^{+}$, $R^{-}$, $B^{+}$, and $B^{-}$ are defined by (11) and (12) in the previous section. Moreover,

$\begin{aligned} \lim_{n \to \infty} \sup_{I \subseteq \mathbb{Z}} \Big| \sum_{r \in I} P (\hat{b} = b + r) - \sum_{r \in I} p_{r} \Big| = 0 \end{aligned}$ (17)

and

$\begin{aligned} \lim_{n \to \infty} \sup_{I \subseteq \mathbb{Z}} \Big| \sum_{l \in I} P (\hat{a} = a + l) - \sum_{l \in I} q_{l} \Big| = 0. \end{aligned}$ (18)

Proof.

Let $p_{r n} = \inf_{γ \in (a, b)} P (\hat{b} = b + r)$ for some $γ \in (a, b)$ and $J = \{- b + 1, \dots, n - b\}$. Then Proposition 3.1 implies that $\sum_{r \in J} p_{r n} \geq \sum_{r \in J} p_{r}$. Moreover, $p_{r n} = 0$ for $r \notin J$ and $\sum_{r \in J} p_{r} \to 1$ as $n \to \infty$. Hence, $\sup_{I \subseteq \mathbb{Z}} \inf_{γ \in (a, b)} | \sum_{r \in I} p_{r n} - \sum_{r \in I} p_{r} | \to 0$ as $n \to \infty$. On the other hand, let $λ = (a + b) / 2$ and $A_{λ} = \{\hat{b} = \tilde{b}_{λ}\}$. Since $\hat{b}$ and $\tilde{b}_{λ}$ coincide on $A_{λ}$, we have $| P (\hat{b} \in I) - P (\tilde{b}_{λ} \in I) | = | P (\hat{b} \in I, \bar{A}_{λ}) - P (\tilde{b}_{λ} \in I, \bar{A}_{λ}) | \leq P (\bar{A}_{λ}) \to 0$ as $n \to \infty$, uniformly in $I \subseteq \mathbb{Z}$, by Remark 3.1. Hence, the convergence in (17) holds. The convergence in (18) can be obtained in a similar manner. The proposition is proved.

The quantities of the type

$\begin{aligned} \exp \Big(- \sum_{m = 1}^{\infty} \frac{1}{m} \Big(P_{F} \Big(\sum_{j = 1}^{m} Y_{j} \geq 0\Big) + P_{G} \Big(\sum_{j = 1}^{m} Y_{j} \leq 0\Big)\Big)\Big) \end{aligned}$

that appear in the limiting distribution of the MLEs $\hat{a}$ and $\hat{b}$ represent the probability that a random walk with a negative drift stays in the negative half-plane, in the case of $\hat{b}$, and that a random walk with a positive drift stays in the positive half-plane, in the case of $\hat{a}$. These probabilities go back to [37] and were later cited by many authors, including [19,34]. Corollary 8.44 of [34] gives these probabilities explicitly, expressing the event of staying in the negative half-plane as the event that the first time the walk becomes positive is infinite. Our probabilities in Proposition 3.6 are for two-sided random walks, which make $\hat{a}$ a minimum and $\hat{b}$ a maximum when r = 0 and l = 0.
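To make the $r = 0$ term of Proposition 3.6 concrete, it can be evaluated numerically in a simple case. The sketch below assumes, as the conditions $E_{F} Y < 0 < E_{G} Y$ suggest, that $Y = \log (g (X) / f (X))$, and takes a shift in the mean from $F = N(0,1)$ to $G = N(μ,1)$. Then $Y = μ X - μ^{2}/2$ is $N(-μ^{2}/2, μ^{2})$ under F and $N(μ^{2}/2, μ^{2})$ under G, so both probabilities inside the series equal $Φ(-μ \sqrt{m} / 2)$. The function names and the truncation rule are illustrative choices, not part of the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard Normal c.d.f. Phi(x), written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def p0_normal_mean(mu: float, m_max: int = 100_000) -> float:
    """Limit of P(b_hat = b) (the r = 0 case) for F = N(0,1), G = N(mu,1).

    Here P_F(sum_{j<=m} Y_j >= 0) = P_G(sum_{j<=m} Y_j <= 0) = Phi(-mu*sqrt(m)/2),
    so the exponent is -sum_m (2/m) * Phi(-mu*sqrt(m)/2), truncated at m_max terms.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    series = 0.0
    for m in range(1, m_max + 1):
        term = 2.0 * std_normal_cdf(-mu * math.sqrt(m) / 2.0) / m
        series += term
        # terms decrease in m and eventually decay geometrically, so stop once negligible
        if term < 1e-16:
            break
    return math.exp(-series)
```

For instance, `p0_normal_mean(1.0)` approximates the limiting probability that $\hat{b}$ hits $b$ exactly when the mean shifts by one standard deviation; larger shifts give values closer to 1.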

4. Multiple transient changes and the familywise false alarm rate

Next, we consider the possibility of multiple transient changes $[a_{k}, b_{k}]$, $k = 1, \dots, K$, where K is the number of transient change intervals. The distribution of the observed data alternates between F and G, switching at unknown times, so that

$\begin{aligned} \begin{array}{lcccc} X_{0 : a_{1}} & = & (X_{1}, \dots, X_{a_{1}}) & \sim & F \\ X_{a_{1} : b_{1}} & = & (X_{a_{1} + 1}, \dots, X_{b_{1}}) & \sim & G \\ X_{b_{1} : a_{2}} & = & (X_{b_{1} + 1}, \dots, X_{a_{2}}) & \sim & F \\ X_{a_{2} : b_{2}} & = & (X_{a_{2} + 1}, \dots, X_{b_{2}}) & \sim & G \\ \vdots & & \vdots & & \vdots \\ X_{b_{K} : n} & = & (X_{b_{K} + 1}, \dots, X_{n}) & \sim & F \end{array} \end{aligned}$

One interpretation of this setting is a base distribution F, under which the observed process is ‘in control’, interrupted at sudden disorder times $a_{k}$, when the process goes ‘out of control’ and switches to a disturbed distribution G. Each disorder is eventually followed by a ‘re-adjustment’ to the base distribution, which takes place at time $b_{k}$.

The goal is to detect all the changes and estimate all $2K$ change-points $a_{k}$ and $b_{k}$. Facing the possibility of multiple changes, we aim to control a familywise false alarm rate and a familywise false re-adjustment rate, each understood as the probability of at least one erroneously detected change-point of the corresponding type.

In the class of multiple change-point problems, the existence of two alternating distributions leads to two special forms of familywise detection errors that we aim to control.

A $2K$-dimensional change-point parameter

$\begin{aligned} \{a_{k}, b_{k}\}_{k = 1}^{K} = \{a_{1}, b_{1}; \dots; a_{K}, b_{K}\} \end{aligned}$

is estimated by a $2 \hat{K}$ -dimensional estimator

$\begin{aligned} \{\hat{a}_{k}, \hat{b}_{k}\}_{k = 1}^{\hat{K}} = \{\hat{a}_{1}, \hat{b}_{1}; \dots; \hat{a}_{\hat{K}}, \hat{b}_{\hat{K}}\} . \end{aligned}$

A false alarm is understood as an estimated segment $[\hat{a}_{k}, \hat{b}_{k}]$ that does not intersect any disorder region $[a_{m}, b_{m}]$. The familywise false alarm rate is defined as the familywise error rate in the sense of [20], that is, the probability of at least one false alarm,

$\begin{aligned} FAR = P \Big\{ \bigcup_{k} \Big( [\hat{a}_{k}, \hat{b}_{k}] \cap \big(\bigcup_{j} [a_{j}, b_{j}]\big) = \emptyset \Big) \Big\} . \end{aligned}$ (19)

Similarly, we call it a false re-adjustment when the estimated ‘in control’ interval $[{\hat{b}}_{k}, {\hat{a}}_{k + 1}]$ does not contain any in-control observations, that is,

$\begin{aligned} FRR = P \Big\{ \bigcup_{k} \Big( [\hat{b}_{k}, \hat{a}_{k + 1}] \cap \big(\bigcup_{j} [b_{j}, a_{j + 1}]\big) = \emptyset \Big) \Big\} . \end{aligned}$ (20)

We aim at controlling the familywise rates of false alarms and false re-adjustments at pre-chosen levels α and β, respectively,

$\begin{aligned} FAR \leq α \quad \text{and} \quad FRR \leq β . \end{aligned}$

We consider two situations: the number of transient changes K is either known or unknown.

4.1. Known number of transient changes and MLE

The log-likelihood function of $(a_{k}, b_{k})$, $k = 1, \dots, K$, is written, up to an additive term that does not depend on the change-points, as

$\begin{aligned} L (X; \{(a_{k}, b_{k})\}) = \sum_{k = 1}^{K} \sum_{i = a_{k} + 1}^{b_{k}} \log \frac{g (X_{i})}{f (X_{i})} . \end{aligned}$

Maximizing it, we obtain the maximum likelihood estimator

$\begin{aligned} \{(\hat{a}_{k}, \hat{b}_{k})\}_{k = 1}^{K} = \underset{a_{1} < b_{1} < \dots < a_{K} < b_{K}}{\arg\max} \sum_{k = 1}^{K} (S_{b_{k}} - S_{a_{k}}), \end{aligned}$

that is, the K mutually disjoint intervals over which the total growth of $S_{t}$ is the largest (Figure 3).
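The argmax above is a problem of finding K disjoint maximum-sum blocks of the increments $z_{i} = \log (g (X_{i}) / f (X_{i}))$, so for moderate n it can also be solved exactly by dynamic programming in $O(nK)$ time. The following is a minimal sketch of such a swapped-in exact solver, not the authors' iterative algorithm described in the following paragraphs. It enforces the strict ordering $0 \leq a_{1} < b_{1} < \dots < a_{K} < b_{K} \leq n$ literally; the function name `k_interval_mle` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def k_interval_mle(z: Sequence[float], K: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Maximize sum_k (S_{b_k} - S_{a_k}) over 0 <= a_1 < b_1 < ... < a_K < b_K <= n,
    where S_t = z_1 + ... + z_t. Returns (maximum, [(a_1, b_1), ..., (a_K, b_K)])."""
    if K < 1:
        raise ValueError("K must be at least 1")
    n, NEG = len(z), -math.inf
    # out_[k][i]: best value on observations 1..i using k intervals, observation i outside
    #             all intervals (i = 0 is the empty prefix, feasible only for k = 0)
    # in_[k][i]:  best value on observations 1..i with interval k open and containing i
    out_ = [[0.0] * (n + 1)] + [[NEG] * (n + 1) for _ in range(K)]
    in_ = [[NEG] * (n + 1) for _ in range(K + 1)]
    opened = [[False] * (n + 1) for _ in range(K + 1)]  # in_[k][i] opened interval k at i
    closed = [[False] * (n + 1) for _ in range(K + 1)]  # out_[k][i] closed interval k at i-1
    for i in range(1, n + 1):
        for k in range(1, K + 1):
            opened[k][i] = out_[k - 1][i - 1] > in_[k][i - 1]
            in_[k][i] = z[i - 1] + max(in_[k][i - 1], out_[k - 1][i - 1])
            closed[k][i] = in_[k][i - 1] > out_[k][i - 1]
            out_[k][i] = max(out_[k][i - 1], in_[k][i - 1])
    best = max(in_[K][n], out_[K][n])
    if best == NEG:
        raise ValueError("need n >= 2K - 1 observations")
    # Backtrack from the end of the series to recover the intervals (a_k, b_k].
    intervals: List[Tuple[int, int]] = []
    k, i = K, n
    inside = in_[K][n] >= out_[K][n]
    b = n
    while k > 0:
        if inside:
            if opened[k][i]:
                intervals.append((i - 1, b))  # interval k covers observations i..b
                k, i, inside = k - 1, i - 1, False
            else:
                i -= 1
        else:
            if closed[k][i]:
                b, inside = i - 1, True
            i -= 1
    intervals.reverse()
    return best, intervals
```

Because each entry is a maximum over a small number of predecessors, the table also shows how the optimal K-interval value changes with K, which can help when comparing the fit for neighbouring values of K.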

A computational algorithm for ${({\hat{a}}_{k}, {\hat{b}}_{k})}$ can be obtained as an iteration of steps outlined in Section 1 for the single-interval case, with a few modifications.

Step 1. Apply (5) to obtain the first MLE interval, which corresponds to the interval of the largest growth of the random walk $S_{t}^{(1)} = S_{t}$ and the associated CUSUM process $W_{t}$,

$\begin{aligned} \hat{b}_{1} = \arg\max_{t} W_{t}, \quad \hat{a}_{1} = \max \{\operatorname{Ker} (W) \cap [0, \hat{b}_{1})\} . \end{aligned}$

Step 2. Apply Step 1 to the processes

$\begin{aligned} S_{t}^{(2, 1)} & = S_{t} && \text{for } 1 \leq t \leq \hat{a}_{1}, \\ S_{t}^{(2, 2)} & = - (S_{t} - S_{\hat{a}_{1}}) && \text{for } \hat{a}_{1} \leq t \leq \hat{b}_{1}, \\ S_{t}^{(2, 3)} & = S_{t} - S_{\hat{b}_{1}} && \text{for } \hat{b}_{1} \leq t \leq n . \end{aligned}$

This results in three new intervals, $[c_{1}, d_{1}]$, $[c_{2}, d_{2}]$, and $[c_{3}, d_{3}]$. Compare $D_{1} = S_{d_{1}} - S_{c_{1}}$, $D_{2} = - (S_{d_{2}} - S_{c_{2}})$, and $D_{3} = S_{d_{3}} - S_{c_{3}}$, and let j be the index of the largest of them, $D_{j} = \max \{D_{1}, D_{2}, D_{3}\}$.

If j = 1 or j = 3, add the corresponding interval to the MLE, i.e. let

$\begin{aligned} \hat{a}_{2} = c_{j} \quad \text{and} \quad \hat{b}_{2} = d_{j} . \end{aligned}$

If j = 2, then let

$\begin{aligned} \hat{b}_{2} = \hat{b}_{1}, \quad \hat{b}_{1} = c_{2}, \quad \text{and} \quad \hat{a}_{2} = d_{2}, \end{aligned}$

with the assignments made in this order, thus replacing the previously found interval $[\hat{a}_{1}, \hat{b}_{1}]$ with two intervals, $[\hat{a}_{1}, c_{2}]$ and $[d_{2}, \hat{b}_{2}]$.

Based on the reversed log-likelihood ratios $\log (f / g)$, the process $(- S_{t})$ is the random walk that can be used to detect a change from G to F. Thus, the found interval $[c_{2}, d_{2}]$ is a candidate for a re-adjustment period, a change back to the base distribution. When the original walk $S_{t}$ drops more on $[c_{2}, d_{2}]$ than it grows on $[c_{1}, d_{1}]$ or $[c_{3}, d_{3}]$, the sum of increments along the obtained intervals $[\hat{a}_{1}, \hat{b}_{1}]$ and $[\hat{a}_{2}, \hat{b}_{2}]$ is higher than on any other two intervals, and thus they form the MLE for K = 2.

Step k. For $k = 2, \dots, K \land N$ (where N is the number of intervals where the random walk $S_{t}$ increases), we repeat the same operations as in Step 2. In every detected interval of change, $[{\hat{a}}_{j}, {\hat{b}}_{j}]$ , we find an interval of the largest drop of $S_{t}$ . In every interval between them including $[1, {\hat{a}}_{1}]$ and $[{\hat{b}}_{k - 1}, n]$ , we find an interval of the largest growth of $S_{t}$ . Then we find the interval of the largest change among them. If it is an interval of growth between ${\hat{b}}_{j}$ and ${\hat{a}}_{j + 1}$ , we simply add it to the list of intervals of change. If it is an interval $[c, d]$ of decrease between ${\hat{a}}_{j}$ and ${\hat{b}}_{j}$ , we replace the previously found $[{\hat{a}}_{j}, {\hat{b}}_{j}]$ with two intervals, $[{\hat{a}}_{j}, c]$ and $[d, {\hat{b}}_{j}]$ .

An example is shown in Figure 3. At step 1, the interval of the largest growth is determined as $[\hat{a}_{1} = 98, \hat{b}_{1} = 263]$. At step 2, the second largest growth interval is determined as $[\hat{a}_{2} = 400, \hat{b}_{2} = 504]$. At step 3, the largest change is a drop of $S_{t}$ on the interval with ends c = 157 and d = 190, found inside $[\hat{a}_{1}, \hat{b}_{1}]$. Therefore, we conclude that a re-adjustment occurred between c and d, and $[\hat{a}_{1}, \hat{b}_{1}] = [98, 263]$ is now replaced with two intervals, $[98, 157]$ and $[190, 263]$.
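Step 1 of the algorithm is the single-interval computation: the right end is the argmax of the CUSUM $W_{t} = S_{t} - \min_{0 \leq i \leq t} S_{i}$, and the left end is the last zero of $W_{t}$ before it, so that $S_{\hat{b}_{1}} - S_{\hat{a}_{1}} = W_{\hat{b}_{1}}$ is the largest growth of $S_{t}$. A minimal sketch, assuming the increments $z_{i}$ are already computed; the name `single_interval_mle` is hypothetical, and returning `None` when $W_{t} \equiv 0$ is our own convention.

```python
import math
from typing import Optional, Sequence, Tuple


def single_interval_mle(z: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Return (a_hat, b_hat): b_hat = argmax_t W_t and a_hat = max{t < b_hat : W_t = 0}."""
    S = [0.0]
    for v in z:
        S.append(S[-1] + v)                 # S_t = z_1 + ... + z_t, with S_0 = 0
    W, running_min = [], math.inf
    for s in S:
        running_min = min(running_min, s)
        W.append(s - running_min)           # CUSUM W_t; W_0 = 0
    b_hat = max(range(len(W)), key=W.__getitem__)   # first maximizer of W_t
    if W[b_hat] == 0:
        return None                         # S_t never rises above its running minimum
    a_hat = max(t for t in range(b_hat) if W[t] == 0)  # last zero of W before b_hat
    return a_hat, b_hat
```

For example, `single_interval_mle([-1, 2, 2, -3, 1])` returns `(1, 3)`: the walk $S = (0, -1, 1, 3, 0, 1)$ grows most, by 4, from $t = 1$ to $t = 3$.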

4.2. Unknown number of transient changes and familywise error rates

Since the number of changes K is usually unknown, the algorithm in Section 4.1 may either miss changes or produce false alarms. As noted before, intervals of the biggest growth of the random walk $S_{t}$ signal transient changes. Therefore, those intervals where the increment in $S_{t}$ exceeds a certain threshold will serve as estimated transient change intervals.

This threshold controls the rate of false alarms. As we show below, no Bonferroni or Holm type correction is needed to control the familywise error rates. Instead, both the familywise rate of false alarms (19) and the familywise rate of false re-adjustments (20) can be controlled by thresholds that are independent of the true number of change-points, which can remain unknown.

The algorithm can be described as follows.

Introduce two CUSUM processes, renewed at a random time $T \geq 0$ ,
$\begin{aligned} W_{T, t} & = S_{T + t} - \min_{0 \leq i \leq t} S_{T + i} && \text{(CUSUM based on } S_{T + t} - S_{T} \text{, renewed at } T), \\ \tilde{W}_{T, t} & = \max_{0 \leq i \leq t} S_{T + i} - S_{T + t} && \text{(CUSUM based on } - (S_{T + t} - S_{T}) \text{, renewed at } T). \end{aligned}$
The CUSUM $W_{T, t}$ is set to detect the next disorder time, whereas ${\tilde{W}}_{T, t}$ is tuned to determine the next re-adjustment time. A special case of T = 0 results in the initial CUSUM processes $W_{t}$ and ${\tilde{W}}_{t}$ without any resetting.
To control the familywise false alarm and false re-adjustment rates at the desired levels α and β, respectively, define thresholds as
$\begin{aligned} h_{α} = - \log \big(α\, E_{F}^{- 1} (e^{W_{n}})\big) \quad \text{and} \quad \tilde{h}_{β} = - \log \big(β\, E_{G}^{- 1} (e^{\tilde{W}_{n}})\big) \end{aligned}$ (21)
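The exponential moments in (21) are generally not available in closed form, but they can be replaced by Monte Carlo averages. The sketch below assumes $F = N(0,1)$ and $G = N(μ,1)$, so that $z_{i} = μ X_{i} - μ^{2}/2$; it simulates $W_{n}$ under F and $\tilde{W}_{n}$ under G and plugs the sample means into (21). The function names and defaults are hypothetical, and, as observed in Section 5.1, averages of $e^{W_{n}}$ can be unreliable because of their very high variance.

```python
import numpy as np


def cusum_up(z):
    """W_t = S_t - min_{0<=i<=t} S_i for t = 0, ..., n, with S_0 = 0 (detects F -> G)."""
    S = np.concatenate(([0.0], np.cumsum(z)))
    return S - np.minimum.accumulate(S)


def cusum_down(z):
    """W~_t = max_{0<=i<=t} S_i - S_t for t = 0, ..., n, with S_0 = 0 (detects G -> F)."""
    S = np.concatenate(([0.0], np.cumsum(z)))
    return np.maximum.accumulate(S) - S


def doob_thresholds(mu, n, alpha=0.05, beta=0.05, n_runs=2000, seed=0):
    """Monte Carlo plug-in for (21), assuming F = N(0,1) and G = N(mu,1)."""
    rng = np.random.default_rng(seed)
    w_n = np.empty(n_runs)          # W_n simulated under F
    w_tilde_n = np.empty(n_runs)    # W~_n simulated under G
    for r in range(n_runs):
        x_f = rng.standard_normal(n)
        x_g = mu + rng.standard_normal(n)
        w_n[r] = cusum_up(mu * x_f - mu**2 / 2)[-1]          # z = log g(x)/f(x)
        w_tilde_n[r] = cusum_down(mu * x_g - mu**2 / 2)[-1]
    h_alpha = -np.log(alpha / np.mean(np.exp(w_n)))
    h_tilde_beta = -np.log(beta / np.mean(np.exp(w_tilde_n)))
    return float(h_alpha), float(h_tilde_beta)
```

Since $W_{n} \geq 0$, both plug-in thresholds are at least $-\log α$ and $-\log β$, respectively.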

The algorithm proceeds through the data series, detecting disorders and re-adjustments at stopping times $τ_{k}$ and $\tilde{τ}_{k}$ and post-estimating change-points $a_{k}$ and $b_{k}$ sequentially for $k = 1, 2, \dots$ as follows,

$\begin{aligned} τ_{1} & = \inf \{t : 0 < t \leq n,\ W_{t} \geq h_{α}\}, \\ \hat{a}_{1} & = \max \{\operatorname{Ker} W \cap [0, τ_{1})\}; \\ \tilde{τ}_{k} & = τ_{k} + \inf \{t : 0 < t \leq n - τ_{k},\ \tilde{W}_{τ_{k}, t} \geq \tilde{h}_{β}\}, \\ \hat{b}_{k} & = τ_{k} + \max \{\operatorname{Ker} \tilde{W}_{τ_{k}, \cdot} \cap [0, \tilde{τ}_{k} - τ_{k})\}; \\ τ_{k} & = \tilde{τ}_{k - 1} + \inf \{t : 0 < t \leq n - \tilde{τ}_{k - 1},\ W_{\tilde{τ}_{k - 1}, t} \geq h_{α}\}, \\ \hat{a}_{k} & = \tilde{τ}_{k - 1} + \max \{\operatorname{Ker} W_{\tilde{τ}_{k - 1}, \cdot} \cap [0, τ_{k} - \tilde{τ}_{k - 1})\}, \end{aligned}$

until $τ_{k} = \infty$ or $\tilde{τ}_{k} = \infty$ (with the convention $\inf \emptyset = \infty$).

By this definition of the stopping times $τ_{k}$, $\tilde{τ}_{k}$ and the change-point estimates $\hat{a}_{k}$, $\hat{b}_{k}$, each stopping time lies in the estimated interval that it is designed to detect: $\hat{a}_{k} < τ_{k} \leq \hat{b}_{k}$ and $\hat{b}_{k} < \tilde{τ}_{k} \leq \hat{a}_{k + 1}$. The CUSUM processes $W_{t}$ and $\tilde{W}_{t}$ are restarted and grounded at these times. As in the previous sections, the change-points $a_{k}$ and $b_{k}$ are then estimated by the last zeros of the restarted CUSUM processes $W_{\tilde{τ}_{k - 1}, t}$ and $\tilde{W}_{τ_{k}, t}$, respectively.
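The recursion above can be sketched as a single pass over the data. The sketch assumes the log-likelihood ratios $z_{i}$ and the thresholds $h_{α}$, $\tilde{h}_{β}$ are given; it renews $W$ at each $\tilde{τ}_{k-1}$ and $\tilde{W}$ at each $τ_{k}$, and records the last zeros as $\hat{a}_{k}$ and $\hat{b}_{k}$. The name `detect_transient_changes` is hypothetical. When a disorder is detected but no re-adjustment follows, the sketch reads the formula for $\hat{b}_{k}$ literally with $\tilde{τ}_{k} = \infty$, that is, it takes the last zero of $\tilde{W}$ up to n; this is our interpretation, not a rule stated above.

```python
from typing import List, Sequence, Tuple


def detect_transient_changes(z: Sequence[float], h: float,
                             h_tilde: float) -> List[Tuple[int, int]]:
    """Sequential detection with CUSUMs renewed at each stopping time.

    z[i-1] is the log-likelihood ratio of observation i; returns [(a_hat_k, b_hat_k), ...].
    """
    n = len(z)
    S = [0.0]
    for v in z:
        S.append(S[-1] + v)                 # S_t, t = 0, ..., n
    intervals: List[Tuple[int, int]] = []
    T = 0                                   # renewal time: tau~_{k-1}, or 0 for k = 1
    while True:
        # Disorder search: W_{T,t} = S_{T+t} - min_{0<=i<=t} S_{T+i}; stop when >= h.
        run_min, last_zero, tau = S[T], T, None
        for t in range(T + 1, n + 1):
            run_min = min(run_min, S[t])
            w = S[t] - run_min
            if w >= h:
                tau = t
                break
            if w == 0:
                last_zero = t               # candidate for a_hat_k
        if tau is None:                     # tau_k = infinity: stop
            break
        a_hat = last_zero
        # Re-adjustment search: W~_{tau,t} = max_{0<=i<=t} S_{tau+i} - S_{tau+t}.
        run_max, last_zero, tau_tilde = S[tau], tau, None
        for t in range(tau + 1, n + 1):
            run_max = max(run_max, S[t])
            w = run_max - S[t]
            if w >= h_tilde:
                tau_tilde = t
                break
            if w == 0:
                last_zero = t               # candidate for b_hat_k
        intervals.append((a_hat, last_zero))
        if tau_tilde is None:               # tau~_k = infinity: stop
            break
        T = tau_tilde
    return intervals
```

For a noiseless walk that falls by 1 for 10 steps, rises by 2 for 10 steps, falls by 2 for 10, rises by 2 for 10 and falls by 1 for 10, with both thresholds equal to 5, the sketch returns `[(10, 20), (30, 40)]`.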

Proposition 4.1

The transient change-point detection and estimation scheme (i)-(iii), resulting in the estimator $\{\hat{a}_{k}, \hat{b}_{k}\}_{k = 1}^{\hat{K}}$, controls the familywise rates of false alarms and false re-adjustments at levels

$\begin{aligned} FAR \leq α \quad \text{and} \quad FRR \leq β, \end{aligned}$

for any unknown number of transient changes K.

Proof.

According to the algorithm (i)-(iii), a false alarm occurs in the interval $[\hat{a}_{k}, \hat{b}_{k})$ if all the data in this interval follow the distribution F, including the segment $X_{\hat{a}_{k} : τ_{k}}$ that triggered the false detection at time $τ_{k}$.

Also note that each renewed CUSUM process $W_{τ_{k}, t} = S_{τ_{k} + t} - \min_{0 \leq i \leq t} S_{τ_{k} + i}$ is dominated by the original CUSUM process $W_{t}$ on the corresponding segment,

$\begin{aligned} W_{τ_{k}, t} \leq W_{τ_{k} + t} . \end{aligned}$

This is because the subtracted term $\min_{0 \leq i \leq τ_{k} + t} S_{i}$ in the original CUSUM process cannot exceed the corresponding minimum $\min_{0 \leq i \leq t} S_{τ_{k} + i}$ of the renewed CUSUM, which is taken over a subset of the same values.

Therefore, at least one false alarm can occur only if the original CUSUM process $W_{t}$ exceeds the threshold $h_{α}$ at least once in the interval $(0, n]$ under the distribution F. The probability of the latter event is bounded by Doob's inequality. Similarly to (8), we obtain

$\begin{aligned} FAR & \leq P_{F} \Big\{ \bigcup_{k} \Big( \max_{0 < t \leq (b_{k} - \tilde{τ}_{k - 1})^{+}} W_{τ_{k}, t} \geq h_{α} \Big) \Big\} \leq P_{F} \Big\{ \max_{0 < t \leq n} W_{t} \geq h_{α} \Big\} \\ & \leq e^{- h_{α}} E_{F} (e^{W_{n}}) = α, \end{aligned}$

after substituting the first expression in (21) for $h_{α}$ .

The inequality $FRR \leq β$ is proven along the same lines, replacing the CUSUM process $W_{t}$ with ${\tilde{W}}_{t}$ , and accordingly, the stopping times $τ_{k}$ with ${\tilde{τ}}_{k}$ and vice versa.
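The domination step in the proof can be checked numerically on any path: restarting the CUSUM at time T can only raise the subtracted minimum, because the renewed minimum is taken over a subset of the same partial sums. A small sketch with hypothetical helper names:

```python
import numpy as np


def cusum(S):
    """W_t = S_t - min_{0<=i<=t} S_i along a path S_0, S_1, ..."""
    return S - np.minimum.accumulate(S)


def renewed_is_dominated(z, T):
    """Check W_{T,t} <= W_{T+t} for all t, where W_{T,t} restarts the CUSUM at time T."""
    S = np.concatenate(([0.0], np.cumsum(z)))
    original = cusum(S)          # W_0, ..., W_n
    renewed = cusum(S[T:])       # W_{T,0}, ..., W_{T,n-T}
    return bool(np.all(renewed <= original[T:]))
```

Applied to any simulated sequence of increments and every renewal time T, the check returns `True`, in line with the inequality $W_{τ_{k}, t} \leq W_{τ_{k} + t}$.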

5. Experimental study

In this section, we illustrate the proposed methods and explore their detection and estimation power by a simulation study. The scenarios we consider are

Transient changes in the mean of a Normal distribution;
Transient changes in the variance of a Normal distribution;
Transient changes between the Normal and Laplace distributions.

In this study, we estimate the detection thresholds that yield the preset rate of false alarms $α = 0.05$ , rate of false re-adjustments $β = 0.05$ , evaluate the detection power of the proposed methods, and assess the accuracy of change-point estimators. Familywise rates are controlled at levels $α = β = 0.05$ in the case of multiple transient changes.

The symmetric case of changes in the mean appears quite different from the asymmetric situation of changes in the variance, where a variance reduction appears more difficult to detect than a variance increase. The third scenario is interesting from a practical point of view. The Standard Normal distribution and the Laplace (Double Exponential) distribution with the location parameter $μ = 0$ and the scale parameter $b = 1 / \sqrt{2}$ are both symmetric, with the same zero means and the same unit variances. However, the Laplace distribution has heavier tails, resulting in higher probabilities of large deviations. In industrial manufacturing, for example, large deviations may imply overheating, overcooling, or a lack or an excess of a chemical ingredient. Timely detection of such changes and accurate estimation of their locations are critical parts of quality control, because the items produced during the transient change interval are likely to be non-conforming. The use of likelihood ratios (for example, instead of Shewhart charts) makes it possible to detect such changes.

5.1. Detection and estimation of change points

Table 1 contains the probability of detection, as well as means and standard deviations of change-point estimates $\hat{a}$ and $\hat{b}$ . An observed sample of size n = 1000 is assumed, with a transient change between a = 500 and b = 700. The considered scenarios include a change from the Standard Normal base distribution to the disturbed distribution:

To the Normal distribution with mean μ and unit variance (change in the mean);
To the Normal distribution with mean 0 and variance $σ^{2}$ (change in the variance);
To the Laplace distribution with mean 0 and variance 1 (change neither in the mean nor in the variance).
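All three scenarios enter the procedures only through the log-likelihood ratios $z_{i} = \log (g (X_{i}) / f (X_{i}))$, with f the Standard Normal density. A minimal sketch of these ratios, with hypothetical function names; the Laplace case uses location 0 and scale $1 / \sqrt{2}$, as above.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_normal_pdf(x, mean=0.0, sd=1.0):
    """Log-density of N(mean, sd^2)."""
    return -LOG_SQRT_2PI - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_laplace_pdf(x, loc=0.0, scale=1.0 / math.sqrt(2.0)):
    """Log-density of the Laplace distribution; variance is 2 * scale^2."""
    return -math.log(2.0 * scale) - abs(x - loc) / scale


def llr_mean(x, mu):
    """log g/f for N(0,1) -> N(mu,1); equals mu*x - mu^2/2."""
    return log_normal_pdf(x, mean=mu) - log_normal_pdf(x)


def llr_variance(x, sigma):
    """log g/f for N(0,1) -> N(0, sigma^2)."""
    return log_normal_pdf(x, sd=sigma) - log_normal_pdf(x)


def llr_laplace(x):
    """log g/f for N(0,1) -> Laplace(0, 1/sqrt(2)), same mean and variance."""
    return log_laplace_pdf(x) - log_normal_pdf(x)
```

In the Laplace case the ratio is positive near 0 (for example, `llr_laplace(0.0)` equals $\tfrac{1}{2} \log π$), negative at moderate values such as $x = 1$, and positive again in the tails such as $x = 4$, reflecting the sharper peak and heavier tails of the Laplace density.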

Table 1.

Detection thresholds, detection probabilities, and properties of change-point estimates for transient changes in the means and in the variances of Normal distributions and from the Normal to Laplace distributions.

Disturbed distribution		Threshold	Detection probability	Accuracy of Estimation
μ	σ	h	$P_{a, b} (Λ \geq h)$	$E (\hat{a})$	$S t d (\hat{a})$	$E (\hat{b})$	$S t d (\hat{b})$
0.05	1	2.65	0.109	351.2	238.0	694.5	238.5
0.10	1	4.16	0.212	404.0	208.9	690.9	209.9
0.15	1	5.02	0.394	444.5	171.2	691.0	174.8
0.20	1	5.60	0.618	472.1	127.8	695.3	133.8
0.25	1	6.03	0.804	487.5	92.0	697.7	97.6
0.30	1	6.35	0.915	495.6	64.6	699.6	68.0
0.35	1	6.62	0.969	498.2	45.8	700.4	46.9
0.40	1	6.84	0.991	499.8	32.7	700.4	32.9
0.60	1	7.45	1	500.0	14.1	700.1	14.1
0.80	1	7.80	1	499.9	7.9	700.0	7.9
1.00	1	8.00	1	500.0	5.1	700.0	5.0
0	0.50	8.20	1	498.4	5.4	701.6	5.5
0	0.75	6.92	0.989	494.7	32.6	704.8	34.1
0	0.90	5.01	0.355	430.2	171.8	701.2	175.4
0	0.95	3.41	0.137	361.2	222.6	703.6	223.9
0	1.05	3.33	0.146	385.7	230.7	678.7	232.4
0	1.10	4.73	0.362	447.8	181.3	681.2	185.5
0	1.25	6.22	0.950	502.9	55.1	693.9	58.2
0	1.50	6.95	1	503.4	15.7	696.9	15.8
0	2.00	7.25	1	501.6	5.6	698.4	5.6
Laplace(0, $1 / \sqrt{2}$ )		6.4	0.975	499.9	45.7	698.1	47.4


Threshold $h_{α}$ is calculated as the 95th empirical percentile of the distribution of $Λ = \max_{0 \leq t \leq n} W_{t}$ under the base distribution, which yields the rate of false alarms $FAR = α = 0.05$. Results are based on $N =$ 50,000 Monte Carlo runs, and the threshold is estimated from $N_{h} =$ 200,000 Monte Carlo runs. Experimentally, we observed that estimating the CUSUM's exponential moment $E_{F} (e^{W_{n}})$ for the threshold (9) is less reliable because of the very high variance of $e^{W_{n}}$.
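The percentile rule can be sketched as follows for the mean-shift scenario. It assumes $F = N(0,1)$ and $G = N(μ,1)$ and uses far fewer Monte Carlo runs than the $N_{h} =$ 200,000 behind Table 1; the function names and defaults are illustrative.

```python
import numpy as np


def max_cusum(z):
    """Lambda = max_{0<=t<=n} W_t for one sequence of log-likelihood ratios z."""
    S = np.concatenate(([0.0], np.cumsum(z)))
    return float(np.max(S - np.minimum.accumulate(S)))


def percentile_threshold(mu, n, alpha=0.05, n_runs=5000, seed=0):
    """(1 - alpha) empirical quantile of Lambda under F = N(0,1), for G = N(mu,1)."""
    rng = np.random.default_rng(seed)
    lam = np.empty(n_runs)
    for r in range(n_runs):
        x = rng.standard_normal(n)               # in-control data, no change-points
        lam[r] = max_cusum(mu * x - mu**2 / 2)   # z_i = log g(x_i)/f(x_i)
    return float(np.quantile(lam, 1 - alpha))
```

Unlike the plug-in for the exponential moment, this rule only needs order statistics of Λ, which is one reason it is less sensitive to the heavy right tail of $e^{W_{n}}$.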

As one would anticipate, the detection power, expressed as the probability of detection, increases monotonically with the magnitude of change. When the transient change lasts for b−a = 200 observations, it is detected with probability 0.95 or higher when the mean drifts by 0.35 standard deviations or more, or when the standard deviation changes by 25% in either direction. Accordingly, the accuracy of the estimators $\hat{a}$ and $\hat{b}$ improves with the magnitude of change, resulting in lower standard errors. The results suggest that the change-point estimators are nearly unbiased and distribution-consistent, converging in probability to the corresponding parameters as the change grows in magnitude (unlike the standard notion of consistency related to large samples, see [4]).

5.2. Power analysis and the choice of a threshold

Table 2 shows a power analysis for certain types of changes. The power, represented by the detection probability in change-point analysis, is estimated as a function of the magnitude and duration of a transient change. Naturally, the difficulty in detecting small changes can be compensated by a sufficiently long interval where the data follow the new distribution. Even a 0.2σ change in the mean or a 10% shift in the standard deviation is quite likely to be detected when the change persists for a block of, say, b−a = 400 observations. A change from the Normal to the Laplace distribution with the same mean and the same variance has a 99% chance of being detected if the region of change lasts for about 250 observations.
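Detection probabilities of this kind can be estimated by simulation. A minimal sketch under stated assumptions: a shift in the mean from $N(0,1)$ to $N(μ,1)$ on the block of observations $a + 1, \dots, a + Δ$, detection declared when $Λ = \max_{t} W_{t} \geq h$, and a threshold h supplied by the user (for instance, from the percentile rule of Section 5.1). The function name and default values are hypothetical.

```python
import numpy as np


def detection_probability(mu, delta, h, n=1000, a=500, n_runs=2000, seed=0):
    """Monte Carlo estimate of P(max_t W_t >= h) with N(mu,1) data on (a, a + delta]."""
    if a + delta > n:
        raise ValueError("the transient change must fit inside the series")
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_runs):
        x = rng.standard_normal(n)
        x[a:a + delta] += mu                     # observations a+1, ..., a+delta follow G
        S = np.concatenate(([0.0], np.cumsum(mu * x - mu**2 / 2)))
        hits += int(np.max(S - np.minimum.accumulate(S)) >= h)
    return hits / n_runs
```

Looping this over a grid of μ and Δ produces a table of the same shape as Table 2; the exact values depend on n, the position of the change, and the threshold used.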

Table 2.

Power analysis. Detection probabilities as functions of magnitude and duration of a transient change.

Change		Duration of the transient period Δ
From N(0,1) to N(μ,1)	μ	50	100	150	200	250	300	350	400	450	500
	0.1	0.068	0.102	0.152	0.213	0.281	0.346	0.394	0.465	0.532	0.571
	0.2	0.113	0.259	0.457	0.611	0.740	0.828	0.889	0.924	0.949	0.968
	0.3	0.216	0.567	0.808	0.916	0.968	0.983	0.992	0.997	0.999	1
	0.4	0.421	0.839	0.964	0.992	0.998	0.999	1	1	1	1
	0.5	0.659	0.957	0.995	1	1	1	1	1	1	1
	0.6	0.836	0.993	1	1	1	1	1	1	1	1
	0.7	0.937	0.999	1	1	1	1	1	1	1	1
	0.8	0.978	1	1	1	1	1	1	1	1	1
	0.9	0.994	1	1	1	1	1	1	1	1	1
	1.0	0.998	1	1	1	1	1	1	1	1	1
From N(0,1) to N(0,σ)	σ
	0.50	0.993	1	1	1	1	1	1	1	1	1
	0.75	0.305	0.797	0.952	0.989	0.997	0.999	1	1	1	1
	0.90	0.082	0.144	0.241	0.361	0.465	0.576	0.664	0.738	0.796	0.837
	0.95	0.062	0.084	0.108	0.142	0.171	0.211	0.254	0.290	0.330	0.357
	1.05	0.064	0.086	0.115	0.143	0.181	0.212	0.256	0.289	0.325	0.366
	1.10	0.086	0.162	0.256	0.365	0.462	0.559	0.637	0.703	0.756	0.801
	1.25	0.325	0.693	0.871	0.950	0.979	0.991	0.997	0.999	1	1
	1.50	0.865	0.992	0.999	1	1	1	1	1	1	1
	2.00	1	1	1	1	1	1	1	1	1	1
Normal to Laplace		0.323	0.731	0.905	0.973	0.991	0.997	0.999	1	1	1


A similar pattern is seen in the receiver operating characteristic (ROC) curves in Figure 4, which display the probability of detection against the rate of false alarms. Easily detectable changes are represented by steeper ROC curves, which correspond to larger magnitudes of change μ and longer periods Δ of transient change. In this figure, $Δ = 10$ means that only 10 data points are observed from the changed distribution.

ROC curves can also be used for determining detection thresholds h that achieve the desired balance between the detection power and the rate of false alarms. A simple argument yields a lower bound for the needed threshold, which turns out to be a good approximation of h for larger changes. Indeed, since $W_{i} \geq S_{i} - S_{i - 1} = z_{i}$, one large increment $z_{i}$ under the base distribution is sufficient for exceeding the threshold and triggering a false alarm.

Let $F_{z}$ be the cumulative distribution function of the individual log-likelihood ratios $z_{i} = \log (g (X_{i}) / f (X_{i}))$ under the base distribution F. The false alarm rate is bounded from below by the probability that at least one increment $z_{i}$, $i = 1, \dots, n$, exceeds the threshold on its own and thereby drives the whole CUSUM process over h. That is,

$\begin{aligned} FAR \geq P \Big\{ \bigcup_{i = 1}^{n} \{z_{i} \geq h\} \Big\} = 1 - F_{z}^{n} (h), \end{aligned}$

and the right-hand side exceeds α if and only if $h < F_{z}^{- 1} ((1 - α)^{1 / n})$. Hence, we obtain the lower bound for the required threshold,

$\begin{aligned} h \geq F_{z}^{- 1} ((1 - α)^{1 / n}) . \end{aligned}$

For example, in the case of a change in the mean of a Normal distribution, the base distribution of the log-likelihood ratios $z_{i}$ is Normal with mean $(- μ^{2} / 2)$ and variance $μ^{2}$. Hence,

$\begin{aligned} FAR \geq 1 - Φ^{n} \Big(\frac{h + μ^{2} / 2}{μ}\Big), \end{aligned}$

and therefore any threshold that controls the false alarm rate at a level not exceeding α must satisfy

$\begin{aligned} h \geq μ Φ^{- 1} ((1 - α)^{1 / n}) - \frac{μ^{2}}{2}, \end{aligned}$ (22)

where Φ denotes the Standard Normal c.d.f.
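The bound (22) is straightforward to evaluate with the Python standard library. The sketch below computes its right-hand side together with the single-jump probability $1 - Φ^{n}((h + μ^{2}/2)/μ)$, so that the two can be checked against each other; the function names are hypothetical.

```python
from statistics import NormalDist


def threshold_lower_bound(mu, n, alpha=0.05):
    """Right-hand side of (22): mu * Phi^{-1}((1 - alpha)^(1/n)) - mu^2 / 2."""
    q = (1.0 - alpha) ** (1.0 / n)
    return mu * NormalDist().inv_cdf(q) - mu**2 / 2


def single_jump_far(h, mu, n):
    """P(max_i z_i >= h) = 1 - Phi^n((h + mu^2/2) / mu) for i.i.d. z_i ~ N(-mu^2/2, mu^2)."""
    return 1.0 - NormalDist().cdf((h + mu**2 / 2) / mu) ** n
```

With μ = 1 and n = 1000, the bound is roughly 3.4, well below the empirical threshold 8.00 in Table 1; at that value of h, the single-jump probability equals α, and any smaller h makes it exceed α.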

As seen in Figure 5, the bound (22) is a fairly accurate approximation of the required threshold for mean changes larger than 4 standard deviations. This means that, when detecting a change between substantially different distributions, a false alarm is likely to be caused by a single extreme observation.

5.3. Detection of multiple transient changes

The next experiment focuses on multiple changes whose number is unknown. In the data stream of length n = 1000, three transient change intervals are generated, each lasting for $b_{k} - a_{k} = 100$ observations, k = 1, 2, 3. Each such segment is marked with a mean shifted by μ standard deviations. The algorithm described in Section 4.2 is then used to detect and estimate the start and end points of all intervals of change, with thresholds determined from the empirical null distribution of the test statistic. Since the actual number of change-points is treated as unknown, the algorithm may detect either fewer or more than K = 3 intervals of change.

In this study, we estimate the familywise false alarm rate FAR and the familywise false re-adjustment rate FRR, explore the frequency of detecting the correct and the incorrect number of changes, and evaluate the accuracy of all change-point estimators $({\hat{a}}_{k}, {\hat{b}}_{k})$ .

Results in Table 3 show rather low familywise false alarm and false re-adjustment rates for all shifts μ; these rates are controlled by properly selected detection thresholds.

Table 3.

Analysis of multiple transient changes. Familywise false alarm and false re-adjustment rates and the distribution of detected intervals of change.

Shift	Threshold			Probability of detecting k intervals
μ	h	FAR	FRR	k = 0	k = 1	k = 2	k = 3	$k \geq 4$
0.1	4.14	0	0	0.77	0.23	0	0	0
0.2	5.60	0.001	0	0.44	0.49	0.07	0	0
0.3	6.36	0.002	0	0.09	0.37	0.41	0.13	0
0.4	6.84	0.007	0	0	0.07	0.36	0.56	0
0.5	7.18	0.013	0	0	0	0.11	0.87	0.01
0.6	7.44	0.020	0.002	0	0	0.02	0.96	0.02
0.7	7.64	0.024	0.003	0	0	0	0.97	0.03
0.8	7.79	0.028	0.006	0	0	0	0.97	0.03
0.9	7.94	0.027	0.008	0	0	0	0.97	0.03
1.0	8.01	0.030	0.010	0	0	0	0.96	0.04


When the sequence is observed with no change-points, the threshold explained in Section 5.1 guarantees exactly the desired rate of false alarms, subject only to Monte Carlo estimation error. Since the test statistic is the maximum CUSUM value, the threshold $h = h (n)$ depends on the sample size n and is an increasing function of n. In the presence of change-points, false alarms, if any, occur within the shorter intervals between transient change segments. Shorter intervals could have been served by lower thresholds if their lengths were known. However, the duration and even the presence of transient changes are unknown, so we guarantee the desired FAR conservatively by selecting a threshold $h (n)$ that yields $FAR = α$ in the absence of change-points and $FAR < α$ in their presence; the actual FAR depends on the number and mutual location of transient changes and, more precisely, on the duration of each segment. The actual familywise error rates are reported in columns 3-4 of Table 3.

Table 3 shows that for changes of magnitude 0.5 standard deviations or more, the multiple transient change detection algorithm is quite likely to detect precisely the correct number, K = 3, of intervals of change. Small shifts are more difficult to detect: when the mean drifts away by 0.2 standard deviations or less, the procedure will almost certainly detect fewer than K transient changes. Of course, detection is more likely when a change lasts longer than 100 observations, which can be achieved, for example, by more frequent measurements.

With the thresholds selected to control FAR and FRR, detecting more than K intervals of change is unlikely. For K = 3, the probability of detecting more than 3 intervals does not exceed 0.04 in any of the considered scenarios (Table 3).

Accuracy of change-point estimators is evaluated in Table 4. Here, the means and standard deviations of all estimators $({\hat{a}}_{k}, {\hat{b}}_{k})$ are calculated over those data streams that resulted in the correct number of detected intervals. Estimated means are to be compared with the actual intervals of change,

$\begin{aligned} [a_{1}, b_{1}] = [150, 250], \quad [a_{2}, b_{2}] = [450, 550], \quad \text{and} \quad [a_{3}, b_{3}] = [750, 850] . \end{aligned}$

Results suggest that the change-point estimators are nearly unbiased for shifts of about 0.4 standard deviations or more, with their precision visibly improving for larger shifts.

Table 4.

Analysis of multiple transient changes. Accuracy of change-point estimation.

	Means and standard deviations of change-point estimators
Shift μ	$E ({\hat{a}}_{1})$	$E ({\hat{b}}_{1})$	$E ({\hat{a}}_{2})$	$E ({\hat{b}}_{2})$	$E ({\hat{a}}_{3})$	$E ({\hat{b}}_{3})$	$σ ({\hat{a}}_{1})$	$σ ({\hat{b}}_{1})$	$σ ({\hat{a}}_{2})$	$σ ({\hat{b}}_{2})$	$σ ({\hat{a}}_{3})$	$σ ({\hat{b}}_{3})$
0.2	114.0	272.2	426.4	551.0	734.4	870.8	43.9	33.5	35.7	20.0	39.9	21.1
0.3	135.3	260.0	441.4	560.0	739.9	853.9	36.1	30.5	30.9	30.4	30.9	25.6
0.4	145.4	253.9	446.1	553.2	745.7	852.7	26.0	24.6	26.9	25.2	25.4	22.5
0.5	148.8	251.2	448.8	551.2	748.6	850.8	18.8	18.6	20.0	19.2	18.9	18.3
0.6	149.7	250.4	449.8	550.3	749.8	850.3	13.7	13.7	13.9	13.7	13.4	13.3
0.7	149.9	250.1	449.9	550.1	749.9	850.0	10.4	10.2	10.1	10.4	10.1	10.3
0.8	150.0	250.1	450.1	550.1	749.9	850.0	8.2	7.6	7.9	8.0	7.9	7.8
0.9	150.0	250.0	450.1	550.0	750.1	850.1	6.2	6.2	6.4	6.3	6.3	6.3
1.0	149.9	250.0	450.0	550.0	750.0	850.0	5.0	5.0	5.2	4.9	5.1	5.0


6. Summary and conclusions

Transient change-point analysis concerns temporary changes in the distribution of data. Here, we studied detection and maximum likelihood estimation of transient changes, covering both a single change and multiple changes whose number may be known or unknown, and we assessed the precision of the resulting estimators.
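For a single transient change with known distributions, the MLE of (a, b) maximizes the sum of per-observation log-likelihood ratios over the interval, which is a maximum-sum segment problem solvable in one pass. The sketch below assumes N(0,1) in control and N(μ,1) during the change with known μ, and uses Kadane's linear-time scan; the name `mle_single_interval` is hypothetical.

```python
import math


def mle_single_interval(x, mu):
    """MLE (a, b), 1-based, of one transient N(0,1) -> N(mu,1) -> N(0,1) change, known mu.

    Returns (a, b, max_llr), where max_llr is the largest summed log LR over any interval.
    """
    if not x:
        raise ValueError("empty data")
    best, best_a, best_b = -math.inf, 0, 0
    run, run_a = 0.0, 0      # best summed log LR of an interval ending at the current t
    for t, xt in enumerate(x):
        z = mu * (xt - mu / 2.0)     # per-observation log LR
        if run <= 0.0:
            run, run_a = z, t        # start a new candidate interval at t
        else:
            run += z                 # extend the current candidate interval
        if run > best:
            best, best_a, best_b = run, run_a, t
    return best_a + 1, best_b + 1, best
```

The running sum in this scan equals $W_{t-1} + z_t$, where $W_t = \max(0, W_{t-1} + z_t)$ is Page's CUSUM statistic built from the same log-likelihood ratios, which shows how closely CUSUM-type detection and maximum likelihood estimation of a transient interval are related.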

Even small transient changes can be detected if they persist for a sufficiently long period of time. The power of detection naturally decreases for smaller magnitudes or shorter durations of a change. Detection sensitivity depends on the selected threshold, which can be chosen to satisfy a preset rate of false alarms.

The next step is to extend these methods to distributions with unknown (nuisance) parameters. Generalized likelihood ratios and Bayesian methods have been proposed to handle nuisance parameters in the setting of a single change. Applying similar techniques to multiple transient changes would allow the detection of changes from the base distribution to a different disturbed distribution in each interval of transient change.
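As an illustration of the generalized likelihood ratio idea for a single transient change, the sketch below treats the shift magnitude as unknown. It assumes a N(0,1) base distribution and a N(δ,1) disturbed distribution with unknown δ of either sign. For a candidate interval [a, b] with $S_{a,b} = \sum_{t=a}^{b} X_t$, the log-likelihood ratio $\delta S_{a,b} - \delta^2 (b-a+1)/2$ is maximized at $\hat{\delta} = S_{a,b}/(b-a+1)$, giving $S_{a,b}^2 / (2(b-a+1))$, and the statistic is the maximum of this over all intervals. This is an assumption-laden example, not the extension the authors propose; the name `glr_epidemic_scan` and the `min_len` parameter are hypothetical.

```python
def glr_epidemic_scan(x, min_len=1):
    """GLR scan for one interval whose N(delta, 1) mean departs from N(0, 1), delta unknown.

    Returns 1-based (a, b), the estimated shift delta_hat, and the GLR statistic.
    """
    n = len(x)
    if min_len < 1 or min_len > n:
        raise ValueError("need 1 <= min_len <= len(x)")
    prefix = [0.0]                   # prefix[t] = x_1 + ... + x_t
    for xt in x:
        prefix.append(prefix[-1] + xt)
    best_stat, best_a, best_b = -1.0, 0, 0
    for a in range(1, n + 1):
        for b in range(a + min_len - 1, n + 1):
            s = prefix[b] - prefix[a - 1]
            stat = s * s / (2.0 * (b - a + 1))   # log LR maximized over delta
            if stat > best_stat:
                best_stat, best_a, best_b = stat, a, b
    delta_hat = (prefix[best_b] - prefix[best_a - 1]) / (best_b - best_a + 1)
    return best_a, best_b, delta_hat, best_stat
```

Because the statistic is maximized over all intervals, its distribution under no change depends on n and `min_len`, so a detection threshold would again have to be calibrated, for example by simulation.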

Funding Statement

Research of M. Baron is supported by the NSF [grant number 1737960] and the Defense Advanced Research Projects Agency (DARPA) [grant number HR0011-18-C-0051]. Research of S. Malov is supported by the Russian Science Foundation (RSF) [grant number 20-14-00072].

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Abd-Elnaser S., Rabou A.S., and Gad A.M., Change-point rank tests with epidemic alternatives, Egypt. Stat. J. 50 (2006), pp. 114–135.
2. Anastasiou A. and Fryzlewicz P., Detecting multiple generalized change-points by isolating single ones, Metrika 85 (2022), pp. 141–174.
3. Baron M., Sequential methods for multistate processes, in Applied Sequential Methodologies, N. Mukhopadhyay, S. Datta, and S. Chattopadhyay, eds., Dekker, New York, 2004, pp. 55–73.
4. Baron M. and Granott N., Consistent estimation of early and frequent change points, in Foundations of Statistical Inference, J. Haitovsky, H.R. Lerche, and Y. Ritov, eds., Physica-Verlag, Heidelberg, New York, 2003, pp. 181–194.
5. Baron M., Rosenberg M., and Sidorenko N., Electricity pricing: Modeling and prediction with automatic spike detection, Energy, Power Risk Management 2001 (2001), pp. 36–39.
6. Baron M., Rosenberg M., and Sidorenko N., Divide and conquer: Forecasting power via automatic price regime separation, Energy, Power Risk Management 2002 (2002), pp. 70–73.
7. Bianchi A.M., Mainardi L., Petrucci E., Signorini M.G., Mainardi M., and Cerutti S., Time-variant power spectrum analysis for the detection of transient episodes in HRV signal, IEEE Trans. Biomed. Eng. 40 (1993), pp. 136–144.
8. Chen J. and Gupta A.K., Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance, Birkhäuser, Boston, MA, 2012.
9. Egea-Roca D., López-Salcedo J.A., Seco-Granados G., and Poor H.V., Performance bounds for finite moving average tests in transient change detection, IEEE Trans. Signal Process. 66 (2018), pp. 1594–1606.
10. Eichinger B. and Kirch C., A MOSUM procedure for the estimation of multiple random change points, Bernoulli 24 (2018), pp. 526–564.
11. Fearnhead P., Exact and efficient Bayesian inference for multiple changepoint problems, Stat. Comput. 16 (2006), pp. 203–213.
12. Fryzlewicz P., Wild binary segmentation for multiple change-point detection, Ann. Stat. 42 (2014), pp. 2243–2281.
13. Fryzlewicz P., Detecting possibly frequent change-points: Wild binary segmentation and steepest-drop model selection, J. Korean Stat. Soc. 49 (2020), pp. 1027–1070.
14. Fu Y.-X. and Curnow R.N., Maximum likelihood estimation of multiple change points, Biometrika 77 (1990), pp. 563–573.
15. Guépié B.K., Fillatre L., and Nikiforov I., Detecting a suddenly arriving dynamic profile of finite duration, IEEE Trans. Inform. Theory 63 (2017), pp. 3039–3052.
16. Guépié B.K., Fillatre L., and Nikiforov I.V., Sequential detection of transient changes, Seq. Anal. 31 (2012), pp. 528–547.
17. Gut A. and Steinebach J., A two-step sequential procedure for detecting an epidemic change, Extremes 8 (2005), pp. 311–326.
18. Han C., Willett P.K., and Abraham D.A., Some methods to evaluate the performance of Page's test as used to detect transient signals, IEEE Trans. Signal Process. 47 (1999), pp. 2112–2127.
19. Hinkley D.V., Inference about the change-point in a sequence of random variables, Biometrika 57 (1970), pp. 1–17.
20. Hochberg Y. and Tamhane A.C., Multiple Comparison Procedures, Wiley, New York, 1987.
21. Hu I. and Rukhin A.L., A lower bound for error probability in change-point estimation, Stat. Sin. 5 (1995), pp. 319–331.
22. Lee C.-B., Nonparametric multiple change-point estimators, Statist. Probab. Lett. 27 (1996), pp. 295–304.
23. Levin B. and Kline J., The cusum test of homogeneity with an application in spontaneous abortion epidemiology, Stat. Med. 4 (1985), pp. 469–488.
24. Noonan J. and Zhigljavsky A., Power of the MOSUM test for online detection of a transient change in mean, Seq. Anal. 39 (2020), pp. 269–293.
25. Page E.S., Continuous inspection schemes, Biometrika 41 (1954), pp. 100–115.
26. Page E.S., On problems in which a change in a parameter occurs at an unknown point, Biometrika 44 (1957), pp. 248–252.
27. Poor H.V. and Hadjiliadis O., Quickest Detection, Cambridge University Press, Cambridge (UK), 2009.
28. Repin V.G., Detection of a signal with unknown moments of appearance and disappearance, Probl. Peredachi Inf. 27 (1991), pp. 61–72.
29. Rosenberg M., Bryngelson J.D., Baron M., and Papalexopoulos A.D., Transmission valuation analysis based on real options with price spikes, in Handbook of Power Systems II; Energy Systems Part I, S. Rebennack, P.M. Pardalos, M.V.F. Pereira, and N. Iliadis, eds., Springer, Berlin-Heidelberg, 2010, pp. 101–125.
30. Rosenberg M., Bryngelson J.D., Sidorenko N., and Baron M., Price spikes and real options: Transmission valuation, in Real Options and Energy Management, E.I. Ronn, ed., Risk Books, London, 2002, pp. 323–370.
31. Seidou O. and Ouarda T., Recursion-based multiple changepoint detection in multiple linear regression and application to river streamflows, Water Resour. Res. 43(7) (2007). DOI: 10.1029/2006WR005021.
32. Shiryaev A.N., Probability, 2nd ed., Springer-Verlag, New York, 1995.
33. Shiryaev A.N., Quickest detection problems: Fifty years later, Seq. Anal. 29 (2010), pp. 345–385.
34. Siegmund D., Sequential Analysis: Tests and Confidence Intervals, Springer-Verlag, New York, 1985.
35. Siegmund D., Boundary crossing probabilities and statistical applications, Ann. Stat. 14 (1986), pp. 361–404.
36. Siegmund D., Approximate tail probabilities for the maxima of some random fields, Ann. Probab. 16 (1988), pp. 487–501.
37. Spitzer F., Principles of Random Walk, Van Nostrand, New York, 1966.
38. Stroock D.W., Mathematics of Probability, Vol. 149, American Mathematical Society, 2013.
39. Tafakori L., Pourkhanali A., and Fard F.A., Forecasting spikes in electricity return innovations, Energy 150 (2018), pp. 508–526.
40. Tartakovskii A.G., Detection of signals with random moments of appearance and disappearance, Probl. Peredachi Inf. 24 (1988), pp. 39–50.
41. Tartakovsky A.G., Berenkov N.R., Kolessa A.E., and Nikiforov I.V., Optimal sequential detection of signals with unknown appearance and disappearance points in time, IEEE Trans. Signal Process. 69 (2021), pp. 2653–2662.
42. Tartakovsky A.G., Nikiforov I.V., and Basseville M., Sequential Analysis: Hypothesis Testing and Change-Point Detection, Chapman & Hall/CRC, 2014.
43. Vostrikova L.Ju., Detecting ‘disorder’ in multidimensional random processes, Sov. Math. Dokl. 24 (1981), pp. 55–59.
44. Wang X., Liu B., Zhang X., and Liu Y., Efficient multiple change point detection for high-dimensional generalized linear models, Can. J. Stat. (2022). DOI: 10.1002/cjs.11721.
45. Woodroofe M., Nonlinear Renewal Theory in Sequential Analysis, SIAM, 1982.
46. Yao Q., Tests for change-points with epidemic alternatives, Biometrika 80 (1993), pp. 179–191.
47. Zhang L. and Li Y., Regime-switching based vehicle-to-building operation against electricity price spikes, Energy Econ. 66 (2017), pp. 1–8.
48. Zhou B., Chioua M., Bauer M., Schlake J.C., and Thornhill N.F., Improving root cause analysis by detecting and removing transient changes in oscillatory time series with application to a 1,3-butadiene process, Ind. Eng. Chem. Res. 58 (2019), pp. 11234–11250.

