Bootstrap Confidence Intervals for Multiple Change Points Based on Two-Stage Procedures

Li Hou; Baisuo Jin; Yuehua Wu; Fangwei Wang

doi:10.3390/e27050537

2025 May 17;27(5):537. doi: 10.3390/e27050537


Editor: Nikolay K Vitanov

PMCID: PMC12110430 PMID: 40422491

Abstract

This paper investigates the construction of confidence intervals for multiple change points in linear regression models. First, we detect multiple change points by performing variable selection on blocks of the input sequence; second, we re-estimate their exact locations in a refinement step. Specifically, we exploit an orthogonal greedy algorithm to recover the number of change points consistently in the cutting stage, and employ the sup-Wald-type test statistic to determine the locations of multiple change points in the refinement stage. Based on a two-stage procedure, we propose bootstrapping the estimated centered error sequence, which can accommodate unknown magnitudes of changes and ensure the asymptotic validity of the proposed bootstrapping method. This enables us to construct confidence intervals using the empirical distribution of the resampled data. The proposed method is illustrated with simulations and real data examples.

Keywords: change point, bootstrap, confidence interval, OGA, sup-Wald-type test

1. Introduction

Multiple change point detection is common in many applications, such as signal processing [1], medical diagnosis [2], industrial control [3], and oceanography [4]. For example, when monitoring changes in a patient's heart rate, accurately identifying the change points decomposes the data into stationary segments, allowing clinicians to characterize the behavior of the heart and diagnose diseases. These practical implications have prompted extensive research on multiple change point detection in the statistical community.

Hypothesis testing is an important tool for detecting multiple change points. Ref. [5] proposed sup-Wald-type tests of the null hypothesis of no change against alternative hypotheses containing an arbitrary number of changes to identify multiple structural changes in linear models. In the same framework, the null hypothesis of s changes is tested against $s + 1$ changes to determine the number of breakpoints. Later, ref. [6] developed dynamic programming principles to estimate multiple change points in linear regression. Ref. [7] proposed a genetic algorithm for detecting multiple breakpoints. However, these methods are computationally expensive. Another line of research on multiple change point detection transforms the problem into a variable selection problem. For example, ref. [8] used LASSO [9] to estimate the locations of multiple change points in a one-dimensional piecewise constant signal observed in white noise. Ref. [10] introduced a two-stage procedure based on adaptive LASSO [11], SCAD [12], and MCP [13] regularization methods that can simultaneously detect multiple change points in linear models. In addition, the two-stage procedure has been extended to accelerated failure time (AFT) models [14].

Despite the aforementioned developments, research on the inference for multiple change points remains limited. Under the null hypothesis, asymptotic [15] or approximate [16] distributions of test statistics have been derived, which enables quantifying uncertainty in the number of change points. For many testing procedures in change point analysis, the computation of critical values often relies on the asymptotic behavior of the test statistic under the null hypothesis. However, the convergence of the test statistic to its limiting distribution is often slow, and in some cases, the exact form of this distribution remains unknown. It has been noted that the bootstrap method is a computer-based statistical inference technique that can provide answers to a variety of statistical questions without relying on formulas. For example, ref. [17] proposed a bootstrap method in which the estimated error sequence is resampled with replacement to obtain confidence intervals for change points. Ref. [18] introduced an asymptotically valid confidence region for a single change point through the inversion of bootstrap tests. Ref. [19] studied the application of a circular overlapping block bootstrap method in the context of an at-most-one-change time series model with an abrupt change in the mean and dependent errors. In the field of a single change point estimation, the bootstrap technique has emerged as a valuable tool for approximating unknown probability distributions and the characteristics of change point estimators. These methods make it possible for us to provide confidence intervals for multiple change points. Ref. [20] addressed this problem in mean shift models by proposing the bootstrap construction of pointwise and uniform confidence intervals for multiple change points based on a moving sum procedure.

This paper aims to construct confidence intervals for multiple change points. By [10], the multiple change point detection problem can be formulated as a model selection problem. Before bootstrapping, it is important to ensure that the estimates of the number of change points and of the within-segment parameters are consistent. However, regularization methods such as the LASSO and SCAD may suffer from the bias inherited from the penalty function [12,21]. Specifically, LASSO may select irrelevant variables, leading to an overestimation of the number of change points. To alleviate this problem, we adopt the $L_{0}$-regularization method to obtain consistent estimates of both the number of change points and their locations. Although subset selection is unbiased, it is often computationally expensive, and several optimization strategies and algorithms have been proposed to overcome the computational difficulties. For example, ref. [22] introduced an iterative algorithm called the orthogonal greedy algorithm (OGA) for high-dimensional regression models, which sequentially selects input variables to be included in the linear regression model, and proposed a procedure named OGA + HDIC + Trim. OGA is a fast stepwise forward regression method that starts with a null model and adds predictors via component-wise linear least squares estimation. HDIC is a high-dimensional information criterion used to select the number of predictors along the OGA path. The Trim method then excludes any remaining irrelevant variables from the model selected by OGA + HDIC. To construct confidence intervals for multiple change points, we proceed as follows. We first cut the data sequence into segments. Because of its model selection consistency and convergence properties, we apply OGA + HDIC + Trim to find the data segments containing change points. We then utilize sup-Wald-type test statistics to locate them. Finally, we construct confidence intervals for multiple change points by bootstrapping the estimated centered error sequence.

The main contributions of this paper are as follows. First, we propose a two-stage procedure to detect multiple change points using the OGA + HDIC + Trim procedure and sup-Wald-type test statistics. In the first stage, we cut the data sequence into segments and select those containing change points; in the second stage, we locate the change points within the selected segments. In addition, we give the asymptotic distributions of the change point estimators under certain conditions. Based on this framework, we further explore the application of bootstrapping techniques in constructing confidence intervals for the change points and demonstrate the validity of the bootstrap method, i.e., the proposed bootstrap $100 (1 - α) \%$ confidence intervals asymptotically attain the coverage probability $1 - α$ for a given $α \in (0, 1)$ [20]. Finally, we illustrate the effectiveness and applicability of the proposed method through extensive simulation studies and a real data example.

The rest of this paper is organized as follows. Section 2 details the detection of multiple change points using the OGA + HDIC + Trim procedure and sup-Wald-type test statistics. Section 3 introduces the resampling bootstrap method for the change point estimators based on a two-stage procedure, and Section 4 gives the theoretical properties of the proposed bootstrap method. Section 5 presents extensive simulation studies. Section 6 applies the proposed method to the seismograms of the 1982 Urakawa–Oki earthquake. Technical proofs of the main results are relegated to Appendix A. In this paper, vectors and matrices are denoted in bold type.

2. Multiple Change Point Detection Based on Two-Stage Procedures

Assume that $(x_{i}, y_{i}), i = 1, \dots, n,$ satisfy the following linear regression model with s change points located at $a_{1} < \dots < a_{s}$ as

\begin{aligned} y_{i} & = x_{i}^{⊤} \Big[β_{1} + \sum_{l = 1}^{s} δ_{l} I (a_{l} < i \leq n)\Big] + ε_{i} \\ & = \begin{cases} x_{i}^{⊤} β_{1} + ε_{i}, & \text{if } 1 \leq i \leq a_{1}, \\ x_{i}^{⊤} (β_{1} + δ_{1}) + ε_{i}, & \text{if } a_{1} < i \leq a_{2}, \\ \quad \vdots & \\ x_{i}^{⊤} (β_{1} + \sum_{l = 1}^{s} δ_{l}) + ε_{i}, & \text{if } a_{s} < i \leq n, \end{cases} \end{aligned}

(1)

where n is the sample size; $x_{i} = {(x_{i, 1}, \dots, x_{i, q})}^{⊤}$ is a q-dimensional vector of predictors; $β_{1} = {(β_{1, 1}, \dots, β_{q, 1})}^{⊤} \neq 0$ is an unknown q-dimensional vector of regression coefficients; s is the unknown number of change points; $1 < a_{1} < \dots < a_{s} < n$ are the unknown change points; $δ_{l} = {(δ_{1, l}, \dots, δ_{q, l})}^{⊤}, l = 1, \dots, s$, are the unknown changes in the regression coefficient vector at the change points; and the $ε_{i}$'s are unobservable random errors.

The estimation of the regression coefficients $β_{1}$ and $(β_{1} + \sum_{l = 1}^{j} δ_{l}), j = 1, \dots, s$, is hampered by the unknown parameters s and $a_{1}, \dots, a_{s}$. In light of [23], we propose to replace s in (1) with a predetermined number of segments, denoted by $p_{n}$, to facilitate the estimation of the regression coefficients. This replacement allows us to identify the total number of change points. Specifically, we partition the data sequence into $p_{n}$ segments, where $p_{n} \to \infty$ as $n \to \infty$, such that all segments except the first have length m, while the first segment has length $n - (p_{n} - 1) m$. Let $Q_{1} = \{1, 2, \dots, n - (p_{n} - 1) m\}$ and $Q_{l} = \{n - (p_{n} - l + 1) m + 1, \dots, n - (p_{n} - l) m\}, l = 2, \dots, p_{n}$. See (3) for more details on segmentation. We assume that $m < \min_{j} (a_{j + 1} - a_{j}) / 2$ for $j = 1, \dots, s - 1$, so that each segment $Q_{l}$, for $l = 1, \dots, p_{n}$, contains at most one change point.
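The index sets $Q_{1}, \dots, Q_{p_{n}}$ can be generated mechanically from n, $p_{n}$, and m. The following is a minimal sketch, assuming 1-based observation indices as in the text; the function name `make_segments` is hypothetical and introduced only for illustration.

```python
def make_segments(n, p_n, m):
    """Return Q_1, ..., Q_{p_n} as lists of 1-based observation indices.

    Q_1 has length n - (p_n - 1) * m; every later segment has length m.
    """
    first_len = n - (p_n - 1) * m
    if first_len < 1:
        raise ValueError("need n > (p_n - 1) * m so that Q_1 is non-empty")
    segments = [list(range(1, first_len + 1))]
    for l in range(2, p_n + 1):
        start = n - (p_n - l + 1) * m + 1
        stop = n - (p_n - l) * m
        segments.append(list(range(start, stop + 1)))
    return segments
```

For instance, with n = 10, $p_{n}$ = 3, and m = 3 the first segment absorbs the remainder and has length 4, while the other two have length 3.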

We write $k_{j}$ for the index of the segment containing $a_{j}$, i.e., $a_{j} \in Q_{k_{j}}$. This partitioning results in the following model:

\begin{matrix} y_{i} & = x_{i}^{⊤} [β_{1} + \sum_{l = 2}^{p_{n}} d_{l} I (n - (p_{n} - l + 1) m < i \leq n) - ω_{i}] + ε_{i}, \end{matrix}

(2)

where

\begin{cases} d_{k_{j}} = δ_{j} \neq 0, & \text{for } j = 1, \dots, s, \\ d_{l} = 0, & \text{for any } l \notin \{k_{1}, \dots, k_{s}\}, \end{cases}

and $ω_{i} = d_{k_{j}} I (i \in T_{j})$ with $T_{j} = \{n - (p_{n} - k_{j} + 1) m + 1, \dots, a_{j}\}$, so that $ω_{i} = 0$ for all $i \notin \cup_{j = 1}^{s} Q_{k_{j}}$.

Remark 1.

By the definition of m, any two adjacent segments together also contain at most one change point. From (2), it follows that

$y_{i} = \begin{cases} x_{i}^{⊤} [β_{1} + \sum_{l = 1}^{j - 1} δ_{l}] + ε_{i}, & \text{if } i \in Q_{k_{j} - 1}, \\ x_{i}^{⊤} [β_{1} + \sum_{l = 1}^{j} δ_{l} - ω_{i}] + ε_{i}, & \text{if } i \in Q_{k_{j}}, \\ x_{i}^{⊤} [β_{1} + \sum_{l = 1}^{j} δ_{l}] + ε_{i}, & \text{if } i \in Q_{k_{j} + 1} . \end{cases}$

Based on the partition, we can obtain least-squares estimates ${\hat{γ}}_{l}$ for $l = k_{j} - 1, k_{j}, k_{j} + 1$ within each segment.

1. When $a_{j} = n - (p_{n} - k_{j}) m$, the change point coincides with the pre-specified cut-point and $ω_{i} = δ_{j}$ for all $i \in Q_{k_{j}}$. The regression coefficients corresponding to the three segments are denoted by $γ_{l}, l = k_{j} - 1, k_{j}, k_{j} + 1$, and are equal to $β_{1} + \sum_{l = 1}^{j - 1} δ_{l}$, $β_{1} + \sum_{l = 1}^{j - 1} δ_{l}$, and $β_{1} + \sum_{l = 1}^{j} δ_{l}$, respectively. The segment $Q_{k_{j}}$ that contains the change point $a_{j}$ can be identified by ${\hat{γ}}_{k_{j} + 1} - {\hat{γ}}_{k_{j}} \neq 0$.
2. When $a_{j} < n - (p_{n} - k_{j}) m$, due to the existence of $ω_{i}$, the linear regression model is misspecified on $Q_{k_{j}}$, causing ${\hat{γ}}_{k_{j}}$ to converge to the pseudo-true value $γ_{k_{j}}$ under model misspecification [24,25]. Since $γ_{k_{j}}$ is different from both $β_{1} + \sum_{l = 1}^{j - 1} δ_{l}$ and $β_{1} + \sum_{l = 1}^{j} δ_{l}$, $Q_{k_{j}}$ can be determined by the following: ${\hat{γ}}_{k_{j}} - {\hat{γ}}_{k_{j} - 1} \neq 0$ and ${\hat{γ}}_{k_{j} + 1} - {\hat{γ}}_{k_{j}} \neq 0$.

According to Remark 1, we reformulate the change-point detection problem as a high-dimensional variable selection task by constructing differences between coefficients. Therefore, (2) can be rewritten in matrix form as follows:

y_{n} = X_{n} θ + e_{n} + ε_{n},

(3)

where $y_{n} = {(y_{1}, y_{2}, \dots, y_{n})}^{⊤}$ ,

X_{n} = (X^{(1)}, \dots, X^{(p_{n})}) = \begin{pmatrix} X_{(1)} & 0 & \dots & 0 \\ X_{(2)} & X_{(2)} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ X_{(p_{n})} & X_{(p_{n})} & \dots & X_{(p_{n})} \end{pmatrix},

$X_{(1)} = {(x_{1}, \dots, x_{n - (p_{n} - 1) m})}^{⊤}$ and $X_{(j)} = {(x_{n - (p_{n} - j + 1) m + 1}, \dots, x_{n - (p_{n} - j) m})}^{⊤}$ for $j = 2, \dots, p_{n}$. In addition, $θ = {(θ_{1}^{⊤}, \dots, θ_{p_{n}}^{⊤})}^{⊤} = {(β_{1}^{⊤}, {(γ_{2} - β_{1})}^{⊤}, \dots, {(γ_{p_{n}} - γ_{p_{n} - 1})}^{⊤})}^{⊤}$, $ε_{n} = {(ε_{1}, \dots, ε_{n})}^{⊤}$, and $e_{n}$ is the artificial n-dimensional error due to model misspecification, whose elements are equal to zero for all $i \notin \cup_{j = 1}^{s} Q_{k_{j}}$.

It can be seen that $θ_{l} = 0_{q \times 1}$ unless $l = 1$, $l = k_{j}$, or $l = k_{j} + 1$ for some $j = 1, \dots, s$. Let $A_{n} = \{1, k_{1}, k_{1} + 1, k_{2}, k_{2} + 1, \dots, k_{s}, k_{s} + 1\}$, so that $θ_{l} = 0_{q \times 1}$ for all $l \notin A_{n}$. Therefore, the estimation of $A_{n}$ comes down to identifying the non-zero blocks of $θ$, which is the focus of the subsequent subsections.
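To make the block structure of $X_{n}$ in (3) concrete, the sketch below assembles it from the $n \times q$ predictor matrix. It is only an illustration under the segmentation described above; the function name `build_design` is hypothetical, and rows are 0-based as usual in NumPy.

```python
import numpy as np

def build_design(X, p_n, m):
    """Assemble the n x (p_n * q) block design matrix X_n of model (3).

    Block column l equals X on every row belonging to segments l, ..., p_n
    and is zero on the rows of earlier segments.
    """
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    first_len = n - (p_n - 1) * m
    # 0-based row at which each segment starts
    starts = [0] + [first_len + k * m for k in range(p_n - 1)]
    X_n = np.zeros((n, p_n * q))
    for l, start in enumerate(starts):
        X_n[start:, l * q:(l + 1) * q] = X[start:]
    return X_n
```

With this layout, a coefficient vector whose l-th block is the jump between consecutive segment coefficients reproduces the piecewise regression, which is why a zero block means "no change at this cut-point".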

2.1. Segment Selection

By the definition of m, each segment $Q_{k_{j}}$ contains a change point. In this subsection, our goal is to select segments $Q_{k_{j}}$ for $j = 1, \dots, s$ . From the above statement, we can see that selecting all non-zero elements of $θ$ can yield the estimation of $A_{n}$ effectively. Therefore, after the cutting stage, the problem of detecting multiple change points is reduced to a variable selection problem in high-dimensional scenarios. We use the OGA + HDIC + Trim method for the segment selection.

We rewrite the model (3) as follows:

y_{n} = \sum_{j = 1}^{r_{n}} z_{j} θ_{j} + {\tilde{ε}}_{n},

(4)

where $r_{n} = p_{n} \cdot q$, $\{z_{1}, \dots, z_{r_{n}}\}$ are the column vectors of $X_{n}$, ${\tilde{ε}}_{n} = e_{n} + ε_{n}$, and $θ = {(θ_{1}, \dots, θ_{r_{n}})}^{⊤}$ now indexes the scalar components of the coefficient vector in (3). Without loss of generality, we center the data, replacing $y_{n}$ by $y_{n} - n^{- 1} (1_{n}^{⊤} y_{n}) 1_{n}$ and $z_{j}$ by $z_{j} - n^{- 1} (1_{n}^{⊤} z_{j}) 1_{n}$. Define ${\hat{σ}}_{J_{d}}^{2} = n^{- 1} y_{n}^{⊤} (I_{n} - H_{J_{d}}) y_{n}$, where $H_{J_{d}}$ is the orthogonal projection matrix onto the linear span of $\{z_{j}, j \in J_{d}\}$. For convenience, denote $H_{\emptyset} = 0$. The proposed OGA + HDIC + Trim algorithm is outlined in Algorithm 1.

Algorithm 1 OGA + HDIC + Trim

Require: response vector $y_{n} \in R^{n}$, regressor matrix $X_{n} \in R^{n \times r_{n}}$.
Initialization: set $d = 0$, $r^{(0)} = y_{n}$, and ${\hat{J}}_{0} = \emptyset$.
For $d = 0, \dots, D_{n} - 1$ do
Compute ${\hat{j}}_{d + 1} = \arg \max_{1 \leq j \leq r_{n}, j \notin {\hat{J}}_{d}} | z_{j}^{⊤} r^{(d)} | / (n^{1 / 2} ∥ z_{j} ∥)$ and update ${\hat{J}}_{d + 1} = {\hat{J}}_{d} \cup \{{\hat{j}}_{d + 1}\}$;
Compute $z_{{\hat{j}}_{d + 1}}^{⊥} = (I_{n} - \sum_{ℓ = 1}^{d} H_{{\hat{j}}_{ℓ}}^{⊥}) z_{{\hat{j}}_{d + 1}}$, where $H_{{\hat{j}}_{ℓ}}^{⊥} = z_{{\hat{j}}_{ℓ}}^{⊥} z_{{\hat{j}}_{ℓ}}^{⊥ ⊤} / {∥ z_{{\hat{j}}_{ℓ}}^{⊥} ∥}^{2}$;
Compute $r^{(d + 1)} = (I_{n} - H_{{\hat{j}}_{d + 1}}^{⊥}) r^{(d)}$.
end
Compute the minimum of HDIC via

\begin{aligned} HDIC ({\hat{J}}_{d}) & = \log {\hat{σ}}_{{\hat{J}}_{d}}^{2} + ♯ ({\hat{J}}_{d}) c_{n} \log (r_{n}) / n, \\ {\hat{d}}_{n} & = \arg \min_{1 \leq d \leq D_{n}} HDIC ({\hat{J}}_{d}) . \end{aligned}

(5)

Return ${\hat{J}}_{n}$, where

{\hat{J}}_{n} = \begin{cases} \{{\hat{j}}_{ℓ} : HDIC ({\hat{J}}_{{\hat{d}}_{n}} - \{{\hat{j}}_{ℓ}\}) > HDIC ({\hat{J}}_{{\hat{d}}_{n}}), 1 \leq ℓ \leq {\hat{d}}_{n}\}, & \text{if } {\hat{d}}_{n} > 1; \\ \{{\hat{j}}_{1}\}, & \text{if } {\hat{d}}_{n} = 1 . \end{cases}
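The three ingredients of Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative reading of the algorithm rather than the authors' implementation: the function names, the numerical tolerances, and the use of Gram-Schmidt orthogonalization to update the residual are assumptions made for the sketch, and `c_n` is left as a user-supplied constant.

```python
import numpy as np

def _hdic(y, Z, cols, penalty):
    """HDIC of the least-squares fit of y on the columns `cols` of Z."""
    resid = y
    if cols:
        coef, *_ = np.linalg.lstsq(Z[:, cols], y, rcond=None)
        resid = y - Z[:, cols] @ coef
    return np.log(resid @ resid / len(y)) + len(cols) * penalty

def oga_hdic_trim(y, Z, D_n, c_n=2.0):
    """Return the sorted column indices kept by OGA + HDIC + Trim."""
    n, r_n = Z.shape
    y = y - y.mean()                      # centering, as in the text
    Z = Z - Z.mean(axis=0)
    norms = np.linalg.norm(Z, axis=0)
    penalty = c_n * np.log(r_n) / n
    available = norms > 1e-12             # constant columns are never selected
    resid, basis, path, hdic = y.copy(), [], [], []
    for _ in range(min(D_n, r_n)):
        if not available.any():
            break
        score = np.full(r_n, -np.inf)
        score[available] = np.abs(Z[:, available].T @ resid) / norms[available]
        j = int(np.argmax(score))
        available[j] = False
        z_perp = Z[:, j].copy()
        for u in basis:                   # orthogonalize against earlier picks
            z_perp -= (u @ z_perp) * u
        length = np.linalg.norm(z_perp)
        if length < 1e-10 * norms[j]:
            continue                      # numerically collinear, skip it
        u = z_perp / length
        basis.append(u)
        path.append(j)
        resid = resid - (u @ resid) * u   # residual after projecting on the path
        hdic.append(np.log(resid @ resid / n) + len(path) * penalty)
    if not path:
        return []
    d_hat = int(np.argmin(hdic)) + 1      # model size minimizing HDIC
    chosen = path[:d_hat]
    if d_hat == 1:
        return chosen
    full = hdic[d_hat - 1]
    # Trim: keep a variable only if dropping it increases HDIC
    return sorted(j for j in chosen
                  if _hdic(y, Z, [k for k in chosen if k != j], penalty) > full)
```

In the change point setting, `Z` would be the block design matrix $X_{n}$ and the selected column indices would then be mapped back to segment indices by integer division by q.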


Here, $D_{n}$ denotes the maximum number of iterations, and the convergence rate theory of OGA in [22] shows that $D_{n} = O (n^{1 / 2} {(\log r_{n})}^{- 1 / 2})$. The parameter $c_{n}$ satisfies the conditions $c_{n} \to \infty$ and $c_{n} \log r_{n} = o (n^{1 - 2 γ})$, where $γ \in [0, 1)$. Let $\hat{θ} = {({\hat{θ}}_{1}^{⊤}, \dots, {\hat{θ}}_{p_{n}}^{⊤})}^{⊤}$ be the estimate obtained by applying the OGA + HDIC + Trim procedure. We then derive an estimate of $A_{n}$ from ${\hat{J}}_{n}$ as ${\hat{A}}_{n} = \{l : {\hat{θ}}_{l} \neq 0, l = 1, \dots, p_{n}\} .$ Denote

{\hat{C}}_{n} = \{l : l \in {\hat{A}}_{n}, l + 1 \notin {\hat{A}}_{n}, l = 2, \dots, p_{n} + 1\} = \{{\hat{k}}_{1}, \dots, {\hat{k}}_{\hat{s}}\},

(6)

where ${\hat{k}}_{1} < \dots < {\hat{k}}_{\hat{s}}$ . It is clear that if $l + 1 \notin {\hat{A}}_{n}, l \in {\hat{A}}_{n}$ and $l - 1 \in {\hat{A}}_{n}$ , then $l \in {\hat{C}}_{n}$ and $l - 1 \notin {\hat{C}}_{n}$ . Let $Q_{(j)} = Q_{k_{j}} \cup Q_{k_{j} + 1}$ . It only includes the change point $a_{j}$ since $m < {min}_{j} (a_{j + 1} - a_{j}) / 2$ , and it ensures that no change point overlaps with the cut-point of $Q_{(j)}$ . By following the steps outlined above and considering that the Wald test cannot detect the location of the change point at the partition boundary, we derive the following selected segments:

{\hat{Q}}_{(j)} = {\hat{Q}}_{{\hat{k}}_{j}} \cup {\hat{Q}}_{{\hat{k}}_{j} + 1} = \{n - (p_{n} - {\hat{k}}_{j} + 1) m + 1, \dots, n - (p_{n} - {\hat{k}}_{j} - 1) m\},

(7)

where ${\hat{Q}}_{{\hat{k}}_{j}} = \{n - (p_{n} - {\hat{k}}_{j} + 1) m + 1, \dots, n - (p_{n} - {\hat{k}}_{j}) m\}$.
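The endpoints of a selected window in (7) follow directly from n, $p_{n}$, m, and the selected segment index. A small sketch, with the hypothetical helper name `merged_window`; clipping the right endpoint at n when the last segment is selected is an assumption of this sketch, not something stated in the text.

```python
def merged_window(n, p_n, m, k_hat):
    """Return the 1-based endpoints of the window made of segments k_hat and k_hat + 1."""
    if not 2 <= k_hat <= p_n:
        raise ValueError("k_hat must lie in 2, ..., p_n")
    left = n - (p_n - k_hat + 1) * m + 1
    right = n - (p_n - k_hat - 1) * m
    # Assumption: when k_hat = p_n there is no segment k_hat + 1, so stop at n.
    return left, min(right, n)
```

When the window is not clipped, it contains exactly 2m observations, which is the sample on which the refinement stage operates.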

2.2. Refining

By Theorem 1 in Section 4, $\hat{s}$ converges to s as $n \to \infty$. Hence, we assume that for a large n, there exists ${\hat{Q}}_{(j)}$ such that $a_{j} \in {\hat{Q}}_{(j)}$. We now show how to estimate this change point. Note that for $i \in {\hat{Q}}_{(j)} = \{n - (p_{n} - {\hat{k}}_{j} + 1) m + 1, \dots, n - (p_{n} - {\hat{k}}_{j} - 1) m\}$, we have

y_{i} = x_{i}^{⊤} [β_{j} + δ_{j} I (a_{j} < i \leq n_{j}^{(r)})] + ε_{i}, n_{j}^{(l)} \leq i \leq n_{j}^{(r)},

(8)

where $n_{j}^{(l)} = n - (p_{n} - {\hat{k}}_{j} + 1) m + 1$ and $n_{j}^{(r)} = n - (p_{n} - {\hat{k}}_{j} - 1) m$. Here, $δ_{j} = β_{j + 1} - β_{j}$, and $β_{j}$ and $β_{j + 1}$ are unknown q-dimensional vectors of regression coefficients on the index sets $\{n_{j}^{(l)}, \dots, a_{j}\}$ and $\{a_{j} + 1, \dots, n_{j}^{(r)}\}$, respectively. Note that $n_{j}^{(r)} - n_{j}^{(l)} + 1 = 2 m$.

We compute the sup-Wald test statistics [26] and estimate $a_{j}$ by

{\hat{a}}_{j, h} = arg max_{h} {\hat{δ}}_{j; h}^{⊤} (Z_{j, h}^{⊤} M_{j} Z_{j, h}) {\hat{δ}}_{j; h}, n_{j}^{(l)} + q < h < n_{j}^{(r)} - q,

(9)

where $Z_{j, h} = {(0, \dots, 0, x_{h + 1}, x_{h + 2}, \dots, x_{n_{j}^{(r)}})}^{⊤}$, $X_{j} = {(x_{n_{j}^{(l)}}, x_{n_{j}^{(l)} + 1}, \dots, x_{n_{j}^{(r)}})}^{⊤}$, and $M_{j} = I_{n} - X_{j} {(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤}$. We also obtain the estimates $({\hat{β}}_{j; h}, {\hat{δ}}_{j; h})$ of $(β_{j}, δ_{j})$ by regressing $y_{(j)} = {(y_{n_{j}^{(l)}}, \dots, y_{n_{j}^{(r)}})}^{⊤}$ on $X_{j}$ and $Z_{j, h}$ jointly. The limiting behavior of ${\hat{a}}_{j, h}$ is given in Section 4.
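The maximization in (9) can be carried out by a direct scan over the candidate break dates within a window. The sketch below is one straightforward way to do it, assuming the window data are passed as a $2m \times q$ matrix and a response vector; the function name `sup_wald_change_point` and the argument `n_left` (the absolute index $n_{j}^{(l)}$ of the first row) are hypothetical.

```python
import numpy as np

def sup_wald_change_point(X, y, n_left=1):
    """Scan the window and return (a_hat, sup-Wald statistic).

    a_hat is the absolute index of the last observation before the change.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, q = X.shape
    M = np.eye(N) - X @ np.linalg.pinv(X)      # annihilator of X_j
    My = M @ y
    best_h, best_stat = None, -np.inf
    # h_rel = number of window rows at or before the candidate break;
    # the range mirrors n_l + q < h < n_r - q in absolute indices.
    for h_rel in range(q + 2, N - q):
        Z = X.copy()
        Z[:h_rel] = 0.0                        # Z_{j,h}: zero up to h, x_i afterwards
        MZ = M @ Z
        delta, *_ = np.linalg.lstsq(MZ, My, rcond=None)   # delta_hat by partialling out X_j
        stat = float(delta @ (Z.T @ MZ) @ delta)
        if stat > best_stat:
            best_h, best_stat = h_rel, stat
    if best_h is None:
        raise ValueError("window too short for the given q")
    return n_left + best_h - 1, best_stat
```

Because $M_{j}$ is symmetric and idempotent, the statistic equals the squared norm of $M_{j} Z_{j, h} {\hat{δ}}_{j; h}$, i.e., the reduction in the residual sum of squares obtained by allowing a break at h.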

Since the multiple change points are not dense in the data sequence, without loss of generality, we assume that $m < \min_{j} (a_{j + 1} - a_{j}) / 2,$ $j = 1, \dots, s - 1$, so that each segment $Q_{(j)}$ contains at most one change point. Note that if m is too small, it leads to inconsistency in estimating the regression parameters and increases the computational time. Therefore, we need to avoid choosing too small a value for m. To address this issue, we define $m = ⌈ c_{0} \sqrt{n} ⌉$ according to Theorem 2 in Section 4, where $c_{0}$ serves as a tuning parameter, and $⌈ \cdot ⌉$ is the ceiling function. We study the range of values of $c_{0}$ on the interval $[0.1, 1.5]$. The final value of m is determined using the Bayesian Information Criterion (BIC) as follows:

\hat{m} = arg min_{m} log \{\sum_{i = 1}^{n} {(y_{i} - x_{i}^{⊤} [{\hat{β}}_{1, h} + \sum_{j = 1}^{\hat{s}} {\hat{δ}}_{j, h} I ({\hat{a}}_{j, h} < i \leq n)])}^{2}\} + \hat{s} \cdot q log n .
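The selection of m can be organized as a small grid search. The sketch below assumes a grid step of 0.1 for $c_{0}$ on $[0.1, 1.5]$ (the step size is not given in the text) and assumes that, for each candidate m, the two-stage procedure has already been run and has returned the residual sum of squares and $\hat{s}$; the function names are hypothetical.

```python
import math

def candidate_block_lengths(n, c0_grid=None):
    """Candidate m = ceil(c0 * sqrt(n)) for c0 on a grid (default 0.1, 0.2, ..., 1.5)."""
    if c0_grid is None:
        c0_grid = [k / 10 for k in range(1, 16)]   # assumed grid step of 0.1
    return sorted({math.ceil(c0 * math.sqrt(n)) for c0 in c0_grid})

def bic(rss, s_hat, q, n):
    """Criterion from the text: log(RSS) + s_hat * q * log(n)."""
    return math.log(rss) + s_hat * q * math.log(n)

def select_m(fits, q, n):
    """Pick the m with the smallest criterion; `fits` maps m -> (rss, s_hat)."""
    return min(fits, key=lambda m: bic(fits[m][0], fits[m][1], q, n))
```

For n = 600, the candidate block lengths range from 3 to 37 under this grid.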

3. Bootstrap Confidence Intervals for Multiple Change Points

In this section, we construct bootstrap confidence intervals for multiple change points, which help us study the behavior of the change points in a linear regression model. The two-stage procedure inherently involves uncertainty in both the number of change points and their respective locations, and the bootstrap is used to quantify it.

We define the estimated residuals ${\hat{ε}}_{i}$ and the centered residuals ${\tilde{ε}}_{i}$ as follows:

\begin{matrix} {\hat{ε}}_{i} & = y_{i} - x_{i}^{⊤} [{\hat{β}}_{1, h} + \sum_{l = 1}^{\hat{s}} {\hat{δ}}_{l, h} I ({\hat{a}}_{l, h} < i \leq n)], \\ {\tilde{ε}}_{i} & = {\hat{ε}}_{i} - \frac{1}{{\hat{a}}_{j, h} - {\hat{a}}_{j - 1, h}} \sum_{l = {\hat{a}}_{j - 1, h} + 1}^{{\hat{a}}_{j, h}} {\hat{ε}}_{l}, {\hat{a}}_{j - 1, h} < i \leq {\hat{a}}_{j, h}, \end{matrix}

(10)

where ${\hat{a}}_{0, h} = 1$ and ${\hat{a}}_{\hat{s} + 1, h} = n$. Let $ε_{{\hat{a}}_{j - 1, h} + 1}^{*}, \dots, ε_{{\hat{a}}_{j, h}}^{*}$ be independent and identically distributed (i.i.d.) random variables sampled from the empirical distribution function of $\{{\tilde{ε}}_{{\hat{a}}_{j - 1, h} + 1}, \dots, {\tilde{ε}}_{{\hat{a}}_{j, h}}\}$. We then consider the bootstrap observations defined as follows:

\begin{matrix} y_{i}^{*} & = x_{i}^{⊤} [{\hat{β}}_{1, h} + \sum_{l = 1}^{\hat{s}} {\hat{δ}}_{l, h} I ({\hat{a}}_{l, h} < i \leq n)] + ε_{i}^{*}, i = 1, \dots, n, \end{matrix}

(11)

where $ε_{i}^{*}$ represents the bootstrapped version of the residuals. To obtain an approximation to the distribution of ${\hat{a}}_{j; h}$ in (9), a bootstrap statistic of the estimate is defined as

{\hat{a}}_{j^{*}; h_{b}}^{*} = \arg \max_{h_{b}} {\hat{δ}}_{j^{*}; h_{b}}^{* ⊤} (Z_{j^{*}, h_{b}}^{⊤} M_{j} Z_{j^{*}, h_{b}}) {\hat{δ}}_{j^{*}; h_{b}}^{*}, j^{*} = 1, \dots, {\hat{s}}^{*},

(12)

where the number of change points ${\hat{s}}^{*}$ and the ${\hat{C}}_{n}^{*}$ are obtained by performing OGA on $y_{n}^{*} = {(y_{1}^{*}, \dots, y_{n}^{*})}^{⊤}$ as in (6). The data sequence segmentation remains unchanged. In this case, $Z_{j^{*}, h_{b}} = {(0, \dots, 0, x_{h_{b} + 1}, \dots, x_{n_{j^{*}}^{(r)}})}^{⊤}$ , and $({\hat{β}}_{j^{*}; h_{b}}^{*}, {\hat{δ}}_{j^{*}; h_{b}}^{*})$ represents the least-squares estimates obtained by regressing $y_{(j^{*})}^{*} = {(y_{n_{j^{*}}^{(l)}}^{*}, \dots, y_{n_{j^{*}}^{(r)}}^{*})}^{⊤}$ on $X_{j^{*}}$ and $Z_{j^{*}, h_{b}}$ . Next, we describe the construction of bootstrap CIs for multiple change points.

1. We generate a bootstrap sample ${(y_{1}^{*}, \dots, y_{n}^{*})}^{⊤}$ as in (11) by randomly sampling residuals ${(ε_{1}^{*}, \dots, ε_{n}^{*})}^{⊤}$ with replacement from the set $\{{\tilde{ε}}_{1}, \dots, {\tilde{ε}}_{n}\}$;
2. We apply the two-stage procedure and compute the local maximizer obtained as in (12) for each estimated segment;
3. For a given bootstrap sample size B, we repeat Steps 1-2 B times and record ${\hat{a}}_{j^{*}; h_{b}}^{* (b)}$, $j^{*} = 1, \dots, {\hat{s}}^{*}$, where $b = 1, \dots, B$.
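Step 1 can be sketched as follows. The code centers the residuals within each estimated regime as in (10) and resamples them within the same regime, following the description of the i.i.d. draws given before (11). Two choices here are assumptions of the sketch: the first regime is taken to start at the first observation so that every residual belongs to a regime, and the function names are hypothetical.

```python
import numpy as np

def center_within_regimes(resid, change_points):
    """Center residuals within each estimated regime.

    `change_points` are the sorted estimates (1-based); regime j covers the
    observations i with a_{j-1} < i <= a_j, i.e. the 0-based slice [a_{j-1}, a_j).
    """
    resid = np.asarray(resid, dtype=float)
    bounds = [0, *change_points, len(resid)]
    centered = np.empty_like(resid)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        centered[lo:hi] = resid[lo:hi] - resid[lo:hi].mean()
    return centered, bounds

def resample_within_regimes(centered, bounds, rng):
    """Draw bootstrap residuals with replacement, regime by regime."""
    boot = np.empty_like(centered)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        boot[lo:hi] = rng.choice(centered[lo:hi], size=hi - lo, replace=True)
    return boot

def bootstrap_response(fitted, centered, bounds, rng):
    """Bootstrap observations as in (11): fitted values plus resampled residuals."""
    return np.asarray(fitted, dtype=float) + resample_within_regimes(centered, bounds, rng)
```

Steps 2 and 3 then amount to re-running the two-stage procedure on each bootstrap response and storing the re-estimated change points.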

Therefore, the bootstrap-based approximation for the change point $a_{j}$ can be constructed. Generally, for any $α \in (0, 1)$, the bootstrap $100 (1 - α) \%$ confidence interval for the change point $a_{j}$ is given by the following:

{CIs}_{j} (α) = [{\hat{a}}_{j, h} + q_{U} (α / 2), {\hat{a}}_{j, h} + q_{L} (α / 2)],

(13)

where $q_{U} (α / 2) = sup \{x; \frac{1}{B} \sum_{b = 1}^{B} I ({\hat{a}}_{j^{*}, h_{b}}^{* (b)} - {\hat{a}}_{j; h} \leq x) \leq α / 2\}$ , and $q_{L} (α / 2) = inf \{x; \frac{1}{B} \sum_{b = 1}^{B} I ({\hat{a}}_{j^{*}, h_{b}}^{* (b)} - {\hat{a}}_{j; h} \geq x) \geq 1 - α / 2\} .$

If ${\hat{C}}_{n} \subseteq {\hat{C}}_{n}^{*}$, then ${\hat{a}}_{j; h_{b}}^{*}$ is a bootstrap estimate of ${\hat{a}}_{j; h}$ for every $j = 1, \dots, \hat{s}$. If some elements of ${\hat{C}}_{n}$ are not in the set ${\hat{C}}_{n}^{*}$, then ${\hat{a}}_{j; h}$ has no corresponding bootstrap estimate for some j in that replicate. Hence, the bootstrap CI of ${\hat{a}}_{j; h}$ is constructed using only the $B^{*} \leq B$ replicates in which a corresponding bootstrap estimate ${\hat{a}}_{j; h_{b}}^{* (b)}$ exists, instead of all B replicates $\{{\hat{a}}_{j^{*}; h_{b}}^{* (1)}, \dots, {\hat{a}}_{j^{*}; h_{b}}^{* (B)}\}$ in (13), which yields

{CIs}_{j}^{*} (α) = [{\hat{a}}_{j, h} + q_{U}^{*} (α / 2), {\hat{a}}_{j, h} + q_{L}^{*} (α / 2)],

(14)

where

q_{U}^{*} (α / 2) = sup \{x : \frac{1}{B^{*}} \sum_{b = 1}^{B^{*}} I ({\hat{a}}_{j, h_{b}}^{* (b)} - {\hat{a}}_{j; h} \leq x) \leq α / 2\},

and

q_{L}^{*} (α / 2) = inf \{x : \frac{1}{B^{*}} \sum_{b = 1}^{B^{*}} I ({\hat{a}}_{j, h_{b}}^{* (b)} - {\hat{a}}_{j; h} \geq x) \geq 1 - α / 2\} .
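A percentile-type interval of this form can be computed from the stored bootstrap replicates as sketched below. The sketch reads the two endpoints as the empirical $α / 2$ and $1 - α / 2$ quantiles of the bootstrap differences, drops replicates without a matching estimate (coded as NaN, so that the remaining count plays the role of $B^{*}$), and rounds outward to integer indices. These conventions and the function name `bootstrap_ci` are assumptions made for illustration.

```python
import numpy as np

def bootstrap_ci(a_hat, replicates, alpha=0.1):
    """Return (lower, upper, B_star) for one change point.

    `replicates` holds the bootstrap estimates matched to a_hat, with NaN
    marking bootstrap samples in which no matching estimate was found.
    """
    reps = np.asarray(replicates, dtype=float)
    reps = reps[~np.isnan(reps)]              # keep the B* usable replicates
    if reps.size == 0:
        raise ValueError("no bootstrap replicate matched this change point")
    diffs = reps - a_hat
    lower = np.floor(np.quantile(diffs, alpha / 2))      # round outward
    upper = np.ceil(np.quantile(diffs, 1 - alpha / 2))
    return int(a_hat + lower), int(a_hat + upper), int(reps.size)
```

For joint statements about all s change points, the same function would be called with `alpha / s` in place of `alpha`.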

4. Theoretical Validity of the Bootstrap Confidence Intervals

To investigate the performance of the bootstrap CIs for multiple change points, we make the following assumptions.

Assumption 1.

If $s ⩾ 1$ , then $a_{j} / n \to τ_{j} > 0$ for $1 \leq j \leq s .$ Furthermore, if $s ⩾ 2$ , then ${min}_{1 \leq j \leq s - 1} (τ_{j + 1} - τ_{j}) > 0$ .

Assumption 2.

$\{ε_{i}, i = 1, 2, \dots, n\}$ is a sequence of independently and identically distributed random variables with $E (ε_{i}) = 0$ and $E (ε_{i}^{2}) = σ^{2}$ .

Assumption 3.

$a_{j} - n_{j}^{(l)} = ⌈ 2 τ_{j} m ⌉$, where $τ_{j} \in (0, 1)$ and $⌈ \cdot ⌉$ is the ceiling function.

Assumption 4.

${sup}_{(t_{2} - t_{1}) \geq 1} ∥\sum_{i = t_{1}}^{t_{2}} x_{i} x_{i}^{⊤} / (t_{2} - t_{1})∥$ is stochastically bounded. $ε_{i}$ is independent of the regressor $x_{j}$ for all i and j.

Assumption 5.

$δ_{j} \to 0$ , and $δ_{j}^{- 1} {(2 m)}^{- 1 / 2 + α} = o (1)$ for some $α \in (0, 1 / 2)$ as $n \to \infty$ .

By Assumption 1, it follows that $m / (a_{j + 1} - a_{j}) \to 0$, i.e., there is at most one change point in each segment for a large n. Assumption 2 states that the errors are independent and identically distributed, which justifies resampling the centered residuals to approximate the sampling distribution of the change point estimators. Assumption 3 requires the change point to be bounded away from the endpoints of the selected segment for asymptotic purposes. Assumption 4 requires that there is enough data around the change point and at the beginning and end of the sample so that the change point can be identified. The asymptotic distribution of ${\hat{a}}_{j, h}$ depends on various unknown quantities, with the magnitude of the change $δ_{j}$ being the most significant. Assumption 5 specifies the minimum signal amplitude of the change in the regression coefficients in the high-dimensional setting: the shift amplitude cannot be too small; otherwise, the change point will not be identified. Next, we establish the consistency of the number of change points $\hat{s}$ and the change point estimator ${\hat{a}}_{j, h}$. The following theorem provides the consistency of the estimated number of change points.

Theorem 1.

Suppose that $m \to \infty$ , $p_{n} \to \infty$ , $D_{n} = O ({(n / log (r_{n}))}^{1 / 2})$ , and $log (r_{n}) / n \to 0$ as $n \to \infty$ . Under Assumptions 1–5, we have

$\begin{matrix} lim_{n \to \infty} P (\hat{s} = s) = 1; \\ lim_{n \to \infty} P (a_{j} \in {\hat{Q}}_{(j)}, j = 1, \dots, s ∣ \hat{s} = s) = 1, \end{matrix}$ (15)

where $\hat{s}$ and ${\hat{k}}_{j}, j = 1, \dots, \hat{s}$ are given in (6).

Theorem 1 extends Theorem 4 in [22] to the multiple change point detection case. The consistency and asymptotic distribution of the change point estimators are given below.

Theorem 2.

If $\sum_{i = t_{1}}^{t_{2}} x_{i} x_{i}^{⊤} / (t_{2} - t_{1}) \to_{p} V$ as $t_{2} - t_{1} \to \infty$, then under Assumptions 1–4, as $n \to \infty$, ${\hat{a}}_{j, h} - a_{j} = O_{p} (∥ δ_{j} ∥^{- 2})$ and

$\frac{δ_{j}^{⊤} V δ_{j}}{σ^{2}} ({\hat{a}}_{j, h} - a_{j}) \to_{d} \arg \max_{c \in R} \{W (c) - | c | / 2\},$

where V is a strictly positive definite matrix, $\{W (c) : c \in R\}$ is a two-sided Wiener process, and $δ_{j}$ is a fixed value or satisfies $δ_{j} \to 0$, as specified in Assumption 5.
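The limiting variable in Theorem 2 has no simple closed-form sampler, but its shape can be explored by simulation. The sketch below approximates the two-sided Wiener process with drift on a finite grid; the truncation half-width, the grid step, and the function name `simulate_argmax` are assumptions of the sketch, and the resulting quantiles are only as accurate as that discretization.

```python
import numpy as np

def simulate_argmax(n_draws, half_width=40.0, dt=0.1, seed=0):
    """Approximate draws of argmax_c { W(c) - |c| / 2 } on [-half_width, half_width]."""
    rng = np.random.default_rng(seed)
    k = int(round(half_width / dt))
    grid = dt * np.arange(1, k + 1)                     # positive grid points
    c = np.concatenate((-grid[::-1], [0.0], grid))      # full grid, increasing
    out = np.empty(n_draws)
    for b in range(n_draws):
        # Two independent Brownian paths, one for c > 0 and one for c < 0,
        # each with drift -|c| / 2.
        right = np.cumsum(rng.normal(scale=np.sqrt(dt), size=k)) - grid / 2
        left = np.cumsum(rng.normal(scale=np.sqrt(dt), size=k)) - grid / 2
        path = np.concatenate((left[::-1], [0.0], right))
        out[b] = c[np.argmax(path)]
    return out
```

Empirical quantiles of such draws, rescaled by $σ^{2} / (δ_{j}^{⊤} V δ_{j})$, would give an asymptotic alternative to the bootstrap interval if the unknown quantities were available, which is precisely the difficulty the bootstrap is meant to avoid.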

Subsequently, we establish the validity of the bootstrap CIs in (14). For future reference, we define some notations here. Any symbol with a superscript ∗ denotes an object under the bootstrap probability measure, rather than the original measure used in some of the other sections. For example, $E^{*} (\cdot)$ denotes the conditional expectation with respect to the bootstrap probability measure conditional on the original data. Similarly, $P^{*} (\cdot)$ denotes the conditional probability under the bootstrap measure.

Theorem 3.

Under the assumptions of Theorem 2, we have

$sup_{x \in R} | P^{*} ({\hat{a}}_{j, h_{b}}^{*} - {\hat{a}}_{j, h} \leq x) - P ({\hat{a}}_{j, h} - a_{j} \leq x) | \to_{p} 0 .$

The proofs of Theorems 2 and 3 are given in Appendix A.

Combining Theorem 3 with Theorems 1 and 2 establishes the validity of the bootstrap method for multiple change points.

Corollary 1.

Under Assumptions 1–5, we have that, as $n \to \infty$ ,

$sup_{x \in R} |P^{*} (\cap_{j = 1}^{s} \{{\hat{a}}_{j, h_{b}}^{*} - {\hat{a}}_{j, h} \leq x\}) - P (\cap_{j = 1}^{s} \{{\hat{a}}_{j, h} - a_{j} \leq x\})| \to_{p} 0 .$

Since $P (a_{j} \in {CIs}_{j}^{*} (α)) \to 1 - α$ for each $j = 1, \dots, s$, the Bonferroni correction gives $\liminf_{n \to \infty} P (\cap_{j = 1}^{s} \{a_{j} \in {CIs}_{j}^{*} (α / s)\}) \geq 1 - α$. The asymptotic validity of the proposed bootstrap CIs in (14) follows.
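For completeness, the union-bound step behind the Bonferroni argument can be written out as follows, where the shorthand $A_{j} = \{a_{j} \in {CIs}_{j}^{*} (α / s)\}$ is introduced here only for this display:

$P (\cap_{j = 1}^{s} A_{j}) = 1 - P (\cup_{j = 1}^{s} A_{j}^{c}) \geq 1 - \sum_{j = 1}^{s} P (A_{j}^{c}) \to 1 - s \cdot (α / s) = 1 - α .$

The inequality is the reason simultaneous intervals built this way tend to be conservative rather than exact.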

5. Simulation

In this section, we first present the simulation results for change point detection given in Section 2 and compare them with the two-stage multiple change point detection procedure involving LASSO (TSMCD_lasso) by [10]. We also construct confidence intervals using the bootstrap method proposed in Section 3. We denote the CIs in (14) by bootstrap_oga,wald.

5.1. Detection of Multiple Change Points

We consider the simulation setting where $n = 600$ and the change points $a_{j}$, $j = 1, 2, 3$, are 150, 300, and 450, respectively, and generate data from the model

\begin{aligned} y_{t} = {} & 2 \cos (t π / 30) + 2 \sin (t π / 30) + 0.1 y_{t - 1} \\ & + (- 3 \cos (t π / 30) + \sin (t π / 30) + 0.2 y_{t - 1}) I_{(150, 600]} (t) \\ & + (2 \cos (t π / 30) - 0.3 y_{t - 1}) I_{(300, 600]} (t) \\ & + (2 \cos (t π / 30) + 2 \sin (t π / 30)) I_{(450, 600]} (t) + ε_{t}, \end{aligned}

(16)

where $ε_{1}, \dots, ε_{n}$ are independent and follow the standard normal distribution. The simulated data are shown in Figure 1. In this model, $\{y_{t}, t = 1, \dots, 600\}$ represents a periodic autocorrelation sequence with a period of 30 and an autocorrelation order of 1.
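Data from model (16) can be generated recursively. The following sketch assumes the initial value $y_{0} = 0$, which is not specified in the text, and uses hypothetical function names; the regressors at time t are $\cos (t π / 30)$, $\sin (t π / 30)$, and $y_{t - 1}$.

```python
import numpy as np

def regime_coefficients(t):
    """Coefficients on (cos, sin, y_{t-1}) at time t, accumulated as in (16)."""
    b = np.array([2.0, 2.0, 0.1])
    if t > 150:
        b = b + np.array([-3.0, 1.0, 0.2])
    if t > 300:
        b = b + np.array([2.0, 0.0, -0.3])
    if t > 450:
        b = b + np.array([2.0, 2.0, 0.0])
    return b

def simulate_model_16(n=600, seed=0, y0=0.0):
    """Simulate y_1, ..., y_n from (16) with standard normal errors."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    y = np.empty(n)
    prev = y0                                   # assumed starting value
    for t in range(1, n + 1):
        x = np.array([np.cos(t * np.pi / 30), np.sin(t * np.pi / 30), prev])
        y[t - 1] = regime_coefficients(t) @ x + eps[t - 1]
        prev = y[t - 1]
    return y
```

The autoregressive coefficient is 0.1 before the first change point, 0.3 between the first and second, and 0 afterwards, so each regime is stable.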

Figure 1. Simulated regression data with change points indicated by dotted lines.

We perform 1000 Monte Carlo simulations for multiple change point estimation and report the results in Table 1. Following [22], we take $c_{n} = 2$ in (5), similar to AIC. We focus on counting the number of events for which $\{| {\hat{a}}_{j} - a_{j} | \leq 5\}$. The percentage of correct identifications of all change points, denoted by $c_{all}$ (%), is the proportion of replicates for which $| {\hat{a}}_{j} - a_{j} | \leq 5$ for all j. In addition, $c_{j}$ (%) represents the proportion of replicates for which $| {\hat{a}}_{j} - a_{j} | \leq 5$ for a given j. The mean and standard error of the estimated change points are calculated over the replicates for which the difference between the estimated change point and the true value is less than or equal to 50 (i.e., $| {\hat{a}}_{j} - a_{j} | \leq 50$).
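The identification rates defined above can be computed from the Monte Carlo output as sketched below. The sketch assumes that the estimates of each replicate have already been matched to the true change points and stored in an array with one row per replicate, with NaN marking a change point that was not detected; the function name is hypothetical.

```python
import numpy as np

def identification_rates(estimates, truth, tol=5):
    """Return (c_j in %, c_all in %) for matched change point estimates.

    `estimates` has shape (replicates, s); NaN entries count as misses.
    """
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    hit = np.abs(est - truth) <= tol          # comparisons with NaN are False
    c_j = 100.0 * hit.mean(axis=0)
    c_all = 100.0 * hit.all(axis=1).mean()
    return c_j, c_all
```

Note that $c_{all}$ can be well below every individual $c_{j}$, since a replicate counts only if all change points are recovered at once.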

Table 1.

Performance of different methods for multiple change point detection.

Method	$c_{all}$		$c_{1}$	$c_{2}$	$c_{3}$
TSP_oga,wald	90.70		98.50	96.80	97.80
		Mean	150.34	300.41	449.77
		SE	1.66	2.31	2.22
TSMCD_lasso	72.60		95.20	95.80	96.40
		Mean	150.61	300.43	450.16
		SE	2.42	2.31	2.19


From Table 1, we can see that TSP_oga,wald outperforms TSMCD_lasso in terms of the correct identification rate. The gap is largest for the identification of all change points simultaneously ( $c_{all}$ of 90.70% versus 72.60%); among the individual change points, it is largest at $a_{1}$ ( $c_{1}$ of 98.50% versus 95.20%). TSP_oga,wald and TSMCD_lasso show comparable estimation accuracy in terms of the mean and standard error (SE) of the estimated change points, although TSP_oga,wald has a smaller SE at $a_{1}$ (1.66 versus 2.42).

5.2. Bootstrap CIs

All results are based on 500 realizations in the simulation setting, and we use $B = 500$ to give the corresponding bootstrap CIs. We consider the confidence levels $(1 - α) \in \{0.9, 0.95\}$ . For each j, the coverage of the bootstrap CI is calculated as the proportion of simulated realizations in which ${CIs}_{j}^{*} (α)$ contains $a_{j}$ . For all j, the coverage of the bootstrap CIs is calculated as the proportion of simulated realizations in which $\cup_{j = 1}^{s} {CIs}_{j}^{*} (α / s)$ contains all $a_{j}$ .
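The two coverage measures can be sketched as below. This is a minimal illustration that assumes the interval end points of every realization have been stored in arrays of shape (realizations, s); the function name `coverage` and the toy numbers are hypothetical. For the per-change-point coverage the intervals would be built at level $α$, and for the joint coverage at level $α / s$, so the function would be called once for each set of intervals.

```python
import numpy as np

def coverage(lower, upper, truth):
    """Return (per-change-point coverage, joint coverage) as proportions.

    lower, upper: arrays of shape (realizations, s) with CI end points.
    truth: array of shape (s,) with the true change points a_j.
    """
    lower, upper, truth = (np.asarray(v, dtype=float) for v in (lower, upper, truth))
    inside = (lower <= truth) & (truth <= upper)
    per_j = inside.mean(axis=0)           # proportion of CIs containing a_j
    joint = inside.all(axis=1).mean()     # proportion containing every a_j
    return per_j, joint

# Two toy realizations (illustrative numbers only).
lower = [[148, 297, 447], [151, 298, 449]]
upper = [[153, 303, 452], [155, 302, 451]]
per_j, joint = coverage(lower, upper, [150, 300, 450])
print(per_j, joint)
```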

Table 2 confirms the effectiveness of our bootstrap method, as the empirical coverage probability for each $a_{j}$ is close to the nominal level. The overall coverage of the bootstrap confidence intervals for all j is slightly higher than the nominal level, which may be due to the complexity of multiple change point detection. The average computational time for the bootstrap_oga,wald procedure is 1.44 min per Monte Carlo replication, as measured on an Intel(R) Core(TM) i9-14900K processor (3.20 GHz) with 64 GB of RAM.

Table 2.

Performance of bootstrap_oga,wald.

$(1 - α)$ %	$a_{all}$	$a_{1}$	$a_{2}$	$a_{3}$
90	93.80	91.80	93.80	91.00
95	96.80	95.80	95.80	95.60


6. Empirical Application

In this section, we illustrate the proposed method with an application to the east–west component of seismograms recorded at Iwanai station during the first foreshock of the Urakawa–Oki earthquake in 1982. This dataset has been previously studied by [27]. The time series is analyzed using an autoregressive (AR) model of order 5. The estimated change points and their bootstrap CIs are presented in Table 3, and the 95% bootstrap CIs are visualized in Figure 2.
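To place an AR(5) model in the linear regression framework of the paper, the lagged values serve as regressors. The sketch below builds such a design matrix; whether the authors include an intercept is not stated, so the `intercept` flag and the function name `ar_design` are assumptions made for illustration.

```python
import numpy as np

def ar_design(y, order=5, intercept=True):
    """Return (response, design) for an AR(order) regression.

    Row t of the design holds (1, y[t-1], ..., y[t-order]) and the response
    is y[t], for t = order, ..., len(y) - 1 (0-based indexing).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    lags = [y[order - k : n - k] for k in range(1, order + 1)]
    X = np.column_stack(lags)
    if intercept:
        X = np.column_stack([np.ones(n - order), X])
    return y[order:], X

resp, X = ar_design(np.arange(10.0))
print(X[0])   # first row: intercept followed by lags 1 to 5
```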

Table 3.

Bootstrap CIs for the change points.

Change Point	95% Bootstrap CIs	90% Bootstrap CIs
3074	[3072, 3084]	[3073, 3080]
3914	[3877, 3952]	[3882, 3948]


Figure 2. Change points estimated by our method (vertical lines), with shaded areas representing the 95% confidence intervals around the change points. The results for the two change points are shown in red and blue, respectively.

It can be seen from Table 3 that change points are detected at 3074 and 3914. The change points estimated in [27] are 3079 and 3929, which are close to ours. In geology, these two change points represent the arrival times of P-waves and S-waves, which are two types of seismic waves. From Figure 2, it is clear that the confidence interval (CI) of the first change point is narrower than that of the second change point.
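The comparison can be checked directly from the numbers in Table 3. The short sketch below computes the interval widths and tests whether the estimates reported in [27] fall inside our intervals; the dictionary layout and variable names are illustrative choices.

```python
# Values copied from Table 3 and from the estimates of [27] quoted in the text.
table3 = {
    3074: {"95%": (3072, 3084), "90%": (3073, 3080)},
    3914: {"95%": (3877, 3952), "90%": (3882, 3948)},
}
reference = {3074: 3079, 3914: 3929}   # estimates reported in [27]

def width(ci):
    """Length of a closed interval [lo, hi]."""
    return ci[1] - ci[0]

def contains(ci, value):
    """True when lo <= value <= hi."""
    return ci[0] <= value <= ci[1]

for cp, cis in table3.items():
    for level, ci in cis.items():
        print(cp, level, "width:", width(ci),
              "contains estimate of [27]:", contains(ci, reference[cp]))
```

The 95% intervals have widths 12 and 75, and both estimates from [27] lie inside the corresponding 95% and 90% intervals.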

This example demonstrates the applicability and effectiveness of our method in detecting change points in seismic data. By accurately identifying the locations of structural faults, we can gain insights into the underlying geological processes and improve our understanding of seismic events.

7. Conclusions

This paper effectively addresses the bias issues often encountered when using OGA for model selection and the estimation of segments containing change points in the cutting stage. The accuracy of multiple change point estimation is improved by applying sup-Wald-type test statistics in the refining stage. The proposed method successfully constructs confidence intervals for multiple change points by using a bootstrapping technique with a two-stage procedure to quantify the uncertainty of multiple change points. The reliable construction of confidence intervals makes this method a valuable addition to the field of change point analysis and regression modeling. The bootstrapping technique also guarantees asymptotic validity. Numerical studies demonstrate the statistical accuracy of the proposed method. Our method can also be applied together with block bootstrapping to the parameter changes of linear regression models with dependent errors.

Appendix A

Proof of Theorem 2.

We only sketch this proof because it is similar to the proofs of Propositions 1 and 2 in [26]. To prove Theorem 2, we define $V (h) = {\hat{δ}}_{j; h}^{⊤} (Z_{j, h}^{⊤} M_{j} Z_{j, h}) {\hat{δ}}_{j; h}$ . For simplicity, we denote $h_{0} = a_{j}$ , which means that $h_{0}$ is the change point in $I_{k_{j}}$ . If $h = h_{0}$ , then $Z_{j, h} = Z_{j, h_{0}}$ , where $Z_{j, h_{0}} = {(0, \dots, 0, x_{h_{0} + 1}, \dots, x_{n_{j}^{(r)}})}^{⊤}$ . By Equation (9), ${\hat{a}}_{j, h} = \arg \max_{h} V (h)$ . Note that

$\begin{matrix} {\hat{δ}}_{j; h} = {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} (Z_{j, h}^{⊤} M_{j} Z_{j, h_{0}} δ_{j} + Z_{j, h}^{⊤} M_{j} ε), \\ {\hat{δ}}_{j, h_{0}} = δ_{j} + {(Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}})}^{- 1} Z_{j, h_{0}}^{⊤} M_{j} ε . \end{matrix}$

It follows that

$\begin{matrix} V (h) - V (h_{0}) & = δ_{j}^{⊤} \{(Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h}) {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} (Z_{j, h}^{⊤} M_{j} Z_{j, h_{0}}) - (Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}})\} δ_{j} \\ + v (h), \end{matrix}$

where

$\begin{matrix} v (h) & = 2 δ_{j}^{⊤} (Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h}) {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε - 2 δ_{j}^{⊤} Z_{j, h_{0}}^{⊤} M_{j} ε \\ & + ε^{⊤} M_{j} Z_{j, h} {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε - ε^{⊤} M_{j} Z_{j, h_{0}} {(Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}})}^{- 1} Z_{j, h_{0}}^{⊤} M_{j} ε . \end{matrix}$

Define for $h \neq h_{0}$ ,

$\begin{matrix} g (h) & = δ_{j}^{⊤} \{(Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}}) \\ & - (Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h}) {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} (Z_{j, h}^{⊤} M_{j} Z_{j, h_{0}})\} δ_{j} / |h_{0} - h| . \end{matrix}$

When $h = h_{0}$ , define $g (h) = δ_{j}^{⊤} δ_{j}$ . Then, we have

$\begin{matrix} V (h) - V (h_{0}) & = - | h_{0} - h | g (h) + v (h) . \end{matrix}$ (A1)

Let

$Z_{Δ} = \{\begin{matrix} Z_{j, h} - Z_{j, h_{0}} = {(0, \dots, 0, x_{h + 1}, \dots, x_{h_{0}}, 0, \dots, 0)}^{⊤}, & h < h_{0}, \\ Z_{j, h_{0}} - Z_{j, h} = {(0, \dots, 0, x_{h_{0} + 1}, \dots, x_{h}, 0, \dots, 0)}^{⊤}, & h > h_{0}, \\ 0, & h = h_{0} . \end{matrix}$

We have $Z_{j, h_{0}} = Z_{j, h} - Z_{Δ} sgn (h_{0} - h)$ . It follows that

$\begin{matrix} v (h) & = v_{1} (h) + v_{2} (h) + v_{3} (h) + v_{4} (h) + v_{5} (h) \\ = [2 δ_{j}^{⊤} Z_{Δ}^{⊤} ε] sgn (h_{0} - h) \\ - [2 δ_{j}^{⊤} Z_{Δ}^{⊤} X_{j} {(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤} ε] sgn (h_{0} - h) \\ - [2 δ_{j}^{⊤} (Z_{Δ}^{⊤} M_{j} Z_{j, h}) {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε] sgn (h_{0} - h) \\ + ε^{⊤} M_{j} Z_{j, h} {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε \\ - ε^{⊤} M_{j} Z_{j, h_{0}} {(Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}})}^{- 1} Z_{j, h_{0}}^{⊤} M_{j} ε . \end{matrix}$ (A2)
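The decomposition in (A1) and (A2) can be verified numerically. The sketch below uses a toy regression with one change point (all sizes, coefficients and the seed are illustrative assumptions) and compares $V (h) - V (h_{0})$ computed from the data with $- | h_{0} - h | g (h) + v_{1} (h) + \dots + v_{5} (h)$, where $| h_{0} - h | g (h)$ is taken as $δ_{j}^{⊤} \{Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}} - (Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h}) {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} (Z_{j, h}^{⊤} M_{j} Z_{j, h_{0}})\} δ_{j}$. The helper names are hypothetical.

```python
import numpy as np

def shifted_design(X, h):
    """Z_h: rows 1..h set to zero, rows h+1..n equal to x_i (1-based)."""
    Z = X.copy()
    Z[:h] = 0.0
    return Z

def decomposition_gap(n=60, h0=30, h=24, seed=1):
    """Difference between V(h) - V(h0) and -|h0 - h| g(h) + v_1 + ... + v_5."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    beta, delta = np.array([1.0, -0.5]), np.array([2.0, 1.0])
    eps = rng.standard_normal(n)
    Z0, Zh = shifted_design(X, h0), shifted_design(X, h)
    y = X @ beta + Z0 @ delta + eps
    P = X @ np.linalg.solve(X.T @ X, X.T)      # projection onto columns of X
    M = np.eye(n) - P

    def V(Z):
        A = Z.T @ M @ Z
        d = np.linalg.solve(A, Z.T @ M @ y)    # delta_hat for this split
        return d @ A @ d

    s = np.sign(h0 - h)
    Zd = s * (Zh - Z0)                         # Z_Delta as defined in the text
    Ah, A0 = Zh.T @ M @ Zh, Z0.T @ M @ Z0
    proj_h = np.linalg.solve(Ah, Zh.T @ M @ eps)
    signal = delta @ (A0 - (Z0.T @ M @ Zh) @ np.linalg.solve(Ah, Zh.T @ M @ Z0)) @ delta
    v1 = 2 * s * (delta @ Zd.T @ eps)
    v2 = -2 * s * (delta @ Zd.T @ P @ eps)
    v3 = -2 * s * (delta @ (Zd.T @ M @ Zh) @ proj_h)
    v4 = eps @ M @ Zh @ proj_h
    v5 = -(eps @ M @ Z0 @ np.linalg.solve(A0, Z0.T @ M @ eps))
    return (V(Zh) - V(Z0)) - (-signal + v1 + v2 + v3 + v4 + v5)

print(decomposition_gap(h=24), decomposition_gap(h=36))
```

Both gaps should be zero up to floating-point error, for a candidate $h$ on either side of $h_{0}$.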

We now establish the convergence rate of the change point estimator ${\hat{a}}_{j, h}$ in Theorem 2. By Lemma A.2 in [26], there exists $λ > 0$ such that for every $ϵ > 0$ , ${inf}_{| h - h_{0} | > C ∥ δ_{j} ∥^{- 2}} g (h) \geq λ {∥ δ_{j} ∥}^{2}$ has a probability of at least $1 - ϵ$ . Therefore, we only need to prove that

$\begin{matrix} P (| {\hat{a}}_{j, h} - a_{j} | > C ∥ δ_{j} ∥^{- 2}) \\ \leq & P (sup_{|h - h_{0}| > C {∥ δ_{j} ∥}^{- 2}} V (h) \geq V (h_{0})) \\ \leq & P (sup_{h \in K (C)} |\frac{v (h)}{(h_{0} - h) {∥ δ_{j} ∥}^{2}}| > λ) \leq ϵ, \end{matrix}$ (A3)

where $K (C) = \{h : |h - h_{0}| > C {∥ δ_{j} ∥}^{- 2} \text{ and } n_{j}^{(l)} + η N_{j} \leq h \leq n_{j}^{(r)} - η N_{j}\}$ for a small number $η > 0$ . It is easy to show that

$\begin{matrix} sup_{h \in K (C)} \frac{| v_{2} (h) |}{| h_{0} - h | ∥ δ_{j} ∥^{2}} & \leq 2 ∥ δ_{j} ∥^{- 1} ∥ Z_{Δ}^{⊤} X_{j} / (h_{0} - h) ∥ ∥ {(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤} ε ∥ \\ & = O_{p} (∥ δ_{j} ∥^{- 1} N_{j}^{- 1 / 2}) = o_{p} (1), \\ sup_{h \in K (C)} \frac{| v_{3} (h) |}{| h_{0} - h | ∥ δ_{j} ∥^{2}} & \leq 2 ∥ δ_{j} ∥^{- 1} ∥ (Z_{Δ}^{⊤} M_{j} Z_{j, h}) / (h_{0} - h) ∥ ∥ {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε ∥ \\ & = O_{p} (∥ δ_{j} ∥^{- 1} N_{j}^{- 1 / 2}) = o_{p} (1), \\ sup_{h \in K (C)} \frac{| v_{4} (h) |}{| h_{0} - h | ∥ δ_{j} ∥^{2}} & \leq ∥ δ_{j} ∥^{- 2} ∥ {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1 / 2} Z_{j, h}^{⊤} M_{j} ε ∥^{2} / | h_{0} - h | \\ & = O_{p} (∥ δ_{j} ∥^{- 2} | h_{0} - h |^{- 1}) = O_{p} (1), \\ sup_{h \in K (C)} \frac{| v_{5} (h) |}{| h_{0} - h | ∥ δ_{j} ∥^{2}} & \leq ∥ δ_{j} ∥^{- 2} ∥ {(Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}})}^{- 1 / 2} Z_{j, h_{0}}^{⊤} M_{j} ε ∥^{2} / | h_{0} - h | \\ & = O_{p} (∥ δ_{j} ∥^{- 2} | h_{0} - h |^{- 1}) = O_{p} (1) . \end{matrix}$

By Lemma A.3 in [26], there exists a large C such that

$\begin{matrix} P (sup_{h \in K (C)} \frac{| v_{1} (h) |}{| h_{0} - h | ∥ δ_{j} ∥^{2}} > \frac{λ}{5}) \\ \leq & P (sup_{h \in K (C)} ∥ δ_{j} ∥^{- 1} ∥ Z_{Δ}^{⊤} ε / (h_{0} - h) ∥ > \frac{λ}{10}) \leq \frac{100 B}{λ^{2} C} \leq \frac{ϵ}{5} . \end{matrix}$

This completes the proof of (A3).

To study the limiting distribution of ${\hat{a}}_{j, h}$ , we need to investigate the behavior of $V (h)$ on $D (C) = \{h : | h - h_{0} | \leq C ∥ δ_{j} ∥^{- 2}\}$ . Since $∥ X_{j}^{⊤} Z_{Δ} ∥ = O_{p} (∥ δ_{j} ∥^{- 2})$ and $∥ Z_{j, h}^{⊤} M_{j} Z_{Δ} ∥ = O_{p} (∥ δ_{j} ∥^{- 2})$ , we have

$\begin{matrix} | h_{0} - h | g (h) & = δ_{j}^{⊤} \{(Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}}) - (Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h}) {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} (Z_{j, h}^{⊤} M_{j} Z_{j, h_{0}})\} δ_{j} \\ & = δ_{j}^{⊤} \{Z_{Δ}^{⊤} Z_{Δ} - Z_{Δ}^{⊤} X_{j} {(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤} Z_{Δ}\} δ_{j} \\ & - δ_{j}^{⊤} \{(Z_{Δ}^{⊤} M_{j} Z_{j, h}) {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} (Z_{j, h}^{⊤} M_{j} Z_{Δ})\} δ_{j} \\ & = δ_{j}^{⊤} Z_{Δ}^{⊤} Z_{Δ} δ_{j} + o_{p} (1) . \end{matrix}$

Consider $v (h)$ in (A2). It is straightforward to prove that if $| h - h_{0} | \leq C ∥ δ_{j} ∥^{- 2}$ , then $v_{2} (h) = O_{p} (∥ δ_{j} ∥^{- 1} N_{j}^{- 1 / 2}) = o_{p} (1)$ , and $v_{3} (h) = O_{p} (∥ δ_{j} ∥^{- 1} N_{j}^{- 1 / 2}) = o_{p} (1)$ . It can also be shown that

$\begin{matrix} v_{4} (h) + v_{5} (h) & = ε^{⊤} M_{j} Z_{j, h_{0}} [{(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} - {(Z_{j, h_{0}}^{⊤} M_{j} Z_{j, h_{0}})}^{- 1}] Z_{j, h_{0}}^{⊤} M_{j} ε \\ + ε^{⊤} M_{j} Z_{Δ} {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε \\ + ε^{⊤} M_{j} Z_{j, h_{0}} {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{Δ}^{⊤} M_{j} ε = o_{p} (1) \end{matrix}$

on $D (C)$ . Therefore, we obtain that

$V (h) - V (h_{0}) = - δ_{j}^{⊤} Z_{Δ}^{⊤} Z_{Δ} δ_{j} + [2 δ_{j}^{⊤} Z_{Δ}^{⊤} ε] sgn (h_{0} - h) + o_{p} (1) .$ (A4)

In light of the proof of Proposition 2 in [26], we obtain the limiting distribution in Theorem 2. □
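The statistic $V (h)$ used throughout this proof can be computed directly from its definition. The following sketch evaluates $V (h)$ on a grid of candidate split points within one segment and returns the maximizer; the trimming of the grid, the toy data and the function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def wald_profile(y, X, h_grid):
    """V(h) = delta_hat' (Z_h' M Z_h) delta_hat for each candidate h in h_grid."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # annihilator of X
    V = np.empty(len(h_grid))
    for k, h in enumerate(h_grid):
        Z = X.copy()
        Z[:h] = 0.0                          # regressors active after time h
        A = Z.T @ M @ Z
        d = np.linalg.solve(A, Z.T @ M @ y)  # delta_hat for this split
        V[k] = d @ A @ d
    return V

def estimate_change_point(y, X, h_grid):
    """Return the grid point that maximizes V(h)."""
    h_grid = np.asarray(h_grid)
    return int(h_grid[np.argmax(wald_profile(y, X, h_grid))])

# Toy segment with one change at h0 = 100 (illustrative values).
rng = np.random.default_rng(0)
n, h0 = 200, 100
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z0 = X.copy()
Z0[:h0] = 0.0
y = X @ np.array([1.0, 0.5]) + Z0 @ np.array([4.0, 0.5]) + 0.3 * rng.standard_normal(n)
print(estimate_change_point(y, X, np.arange(20, 181)))
```

The grid is trimmed away from both ends so that $Z_{j, h}^{⊤} M_{j} Z_{j, h}$ stays invertible, in the spirit of the restriction $n_{j}^{(l)} + η N_{j} \leq h \leq n_{j}^{(r)} - η N_{j}$ used in the proof.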

In the following, we introduce lemmas that are instrumental in proving Theorem 3.

Lemma A1.

If $δ_{j}$ is fixed or $δ_{j} \to 0$ but satisfies Assumption 5, then as $N_{j} \to \infty$ ,

${var}^{*} (ε_{n_{j}^{(l)}}^{*}) = N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {(ε_{i} - {\bar{ε}}_{N_{j}})}^{2} + O_{p} (N_{j}^{- 1}),$

where $ε_{n_{j}^{(l)}}^{*}$ is defined in (11).

Proof of Lemma A1.

In view of (1) and (10), we have

$\begin{matrix} {\tilde{ε}}_{i} & = {[x_{i} I (a_{j} < i \leq n_{j}^{(r)}) - \frac{1}{n_{j}^{(r)} - a_{j}} \sum_{l = a_{j} + 1}^{n_{j}^{(r)}} x_{l}]}^{⊤} δ_{j} + {(x_{i} - {\bar{x}}_{j})}^{⊤} (β_{j} - {\hat{β}}_{j, h}) \\ - {[x_{i} I ({\hat{a}}_{j, h} < i \leq n_{j}^{(r)}) - \frac{1}{N_{j}} \sum_{l = {\hat{a}}_{j, h} + 1}^{n_{j}^{(r)}} x_{l}]}^{⊤} {\hat{δ}}_{j, h} + ε_{i} - {\bar{ε}}_{N_{j}}, \end{matrix}$ (A5)

where ${\bar{x}}_{j} = \frac{1}{N_{j}} \sum_{l = n_{j}^{(l)}}^{n_{j}^{(r)}} x_{l}$ and ${\bar{ε}}_{N_{j}} = \frac{1}{N_{j}} \sum_{l = n_{j}^{(l)}}^{n_{j}^{(r)}} ε_{l}$ .

Assume $a_{j} < {\hat{a}}_{j, h}$ (the other case can be handled in a similar way). Since $E^{*} (ε_{n_{j}^{(l)}}^{*}) = N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i} = 0$ , we have ${var}^{*} (ε_{n_{j}^{(l)}}^{*}) = N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{2}$ . It follows from (A5) that

$\begin{matrix} {var}^{*} (ε_{n_{j}^{(l)}}^{*}) & = \frac{1}{N_{j}} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {(ε_{i} - {\bar{ε}}_{N_{j}})}^{2} + \frac{2}{N_{j}} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {(x_{i} - {\bar{x}}_{j})}^{⊤} (β_{j} - {\hat{β}}_{j, h}) (ε_{i} - {\bar{ε}}_{N_{j}}) \\ + \frac{1}{N_{j}} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {[{(x_{i} - {\bar{x}}_{j})}^{⊤} (β_{j} - {\hat{β}}_{j, h})]}^{2} + \frac{1}{N_{j}^{2}} {(\sum_{i = {\hat{a}}_{j, h} + 1}^{n_{j}^{(r)}} x_{i}^{⊤} (δ_{j} - {\hat{δ}}_{j, h}))}^{2} \\ + \frac{1}{N_{j}^{2}} {(\sum_{i = a_{j} + 1}^{{\hat{a}}_{j, h}} x_{i}^{⊤} δ_{j})}^{2} + \frac{2}{N_{j}^{2}} \sum_{i = a_{j} + 1}^{{\hat{a}}_{j, h}} x_{i}^{⊤} δ_{j} \sum_{l = {\hat{a}}_{j, h} + 1}^{n_{j}^{(r)}} x_{l}^{⊤} (δ_{j} - {\hat{δ}}_{j, h}) . \end{matrix}$ (A6)

According to Corollary 1 in [26], we have $β_{j} - {\hat{β}}_{j, h} = O_{p} (N_{j}^{- 1 / 2})$ and $δ_{j} - {\hat{δ}}_{j, h} = O_{p} (N_{j}^{- 1 / 2})$ , which, combined with ${\hat{a}}_{j, h} = a_{j} + O_{p} (∥ δ_{j} ∥^{- 2})$ , yields that $N_{j}^{- 1} \sum_{i = a_{j} + 1}^{{\hat{a}}_{j, h}} x_{i}^{⊤} δ_{j} = O_{p} (N_{j}^{- 1} ∥ δ_{j} ∥^{- 1})$ and $N_{j}^{- 1} \sum_{l = {\hat{a}}_{j, h} + 1}^{n_{j}^{(r)}} x_{l}^{⊤} (δ_{j} - {\hat{δ}}_{j, h}) = O_{p} (N_{j}^{- 1 / 2})$ . This completes the proof.

□
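The bootstrap moments used in this proof, $E^{*} (ε_{n_{j}^{(l)}}^{*}) = 0$ and ${var}^{*} (ε_{n_{j}^{(l)}}^{*}) = N_{j}^{- 1} \sum {\tilde{ε}}_{i}^{2}$, follow from resampling the centered residuals uniformly. A minimal sketch is given below; it assumes the residuals come from a least-squares fit with a break at a given $h$ and are then centered, and the function names are hypothetical.

```python
import numpy as np

def centered_residuals(y, X, h):
    """Least-squares residuals of y on (X, Z_h), centered to have mean zero."""
    Z = X.copy()
    Z[:h] = 0.0
    W = np.hstack([X, Z])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    return resid - resid.mean()

def bootstrap_moments(resid_centered):
    """Mean and variance of one draw from the empirical law of the residuals."""
    mean_star = resid_centered.mean()             # E*(eps*), zero by centering
    var_star = np.mean(resid_centered ** 2)       # var*(eps*) = N^{-1} sum eps_tilde^2
    return mean_star, var_star

# Toy segment with a shift after observation 40 (illustrative values).
rng = np.random.default_rng(3)
n = 80
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)
y[40:] += 3.0
print(bootstrap_moments(centered_residuals(y, X, 40)))
```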

Lemma A2.

For every $ϵ > 0$ ,

$\underset{N_{j} \to \infty}{lim sup} P^{*} (|N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {(ε_{i}^{*})}^{2} - {var}^{*} (ε_{n_{j}^{(l)}}^{*})| \geq ϵ) \leq ϵ σ^{2} .$

Proof of Lemma A2.

We have for every $ϵ > 0$ and every $η > 0$ ,

$\begin{matrix} P^{*} (|N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} ε_{i}^{* 2} - {var}^{*} (ε_{n_{j}^{(l)}}^{*})| \geq ϵ) \\ = & P^{*} (|N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} ε_{i}^{* 2} - N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{2}| \geq ϵ) \\ \leq & P^{*} (|N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} ε_{i}^{* 2} I (| ε_{i}^{*} | \geq η \sqrt{N_{j}}) - N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{2} I (| {\tilde{ε}}_{i} | \geq η \sqrt{N_{j}})| \geq ϵ / 2) \\ + & P^{*} (|N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} ε_{i}^{* 2} I (| ε_{i}^{*} | < η \sqrt{N_{j}}) - N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{2} I (| {\tilde{ε}}_{i} | < η \sqrt{N_{j}})| \geq ϵ / 2) . \end{matrix}$ (A7)

Clearly, for every $ϵ > 0$ and every $η > 0$ , as $N_{j} \to \infty$ , we have

$N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{2} I (| {\tilde{ε}}_{i} | \geq η \sqrt{N_{j}}) \to 0 \quad \text{a.s.}$

It follows that

$P^{*} (N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} ε_{i}^{* 2} I (|ε_{i}^{*}| \geq η \sqrt{N_{j}}) \geq ϵ / 2) \leq \frac{2}{ϵ} N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{2} I (|{\tilde{ε}}_{i}| \geq η \sqrt{N_{j}}) \to 0 \quad \text{a.s.}$

Since

$\begin{matrix} \underset{N_{j} \to \infty}{lim sup} P^{*} (|N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} ε_{i}^{* 2} I (|ε_{i}^{*}| < η \sqrt{N_{j}}) - N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{2} I (|{\tilde{ε}}_{i}| < η \sqrt{N_{j}})| \geq ϵ / 2) \\ \leq & \frac{4}{ϵ^{2}} \underset{N_{j} \to \infty}{lim sup} N_{j}^{- 2} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{4} I (|{\tilde{ε}}_{i}| < η \sqrt{N_{j}}) \\ \leq & \frac{4}{ϵ^{2}} η^{2} \underset{N_{j} \to \infty}{lim sup} N_{j}^{- 1} \sum_{i = n_{j}^{(l)}}^{n_{j}^{(r)}} {\tilde{ε}}_{i}^{2} = \frac{4}{ϵ^{2}} η^{2} σ^{2} \quad \text{a.s.}, \end{matrix}$

choosing $η^{2} = ϵ^{3} / 8$ gives the assertion in Lemma A2. □
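For completeness, the arithmetic behind this choice of $η$ can be written out; this is only a worked substitution into the last bound of the preceding display.

```latex
\frac{4}{\epsilon^{2}} \eta^{2} \sigma^{2}
  = \frac{4}{\epsilon^{2}} \cdot \frac{\epsilon^{3}}{8} \cdot \sigma^{2}
  = \frac{\epsilon \sigma^{2}}{2}
  \leq \epsilon \sigma^{2} .
```

Since the term involving $| ε_{i}^{*} | \geq η \sqrt{N_{j}}$ vanishes almost surely, the bound $ϵ σ^{2}$ stated in Lemma A2 follows.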

Lemma A3.

Under the assumption that $\sum_{i = t_{1}}^{t_{2}} x_{i} x_{i}^{⊤} / (t_{2} - t_{1}) \to_{p} V$ as $t_{2} - t_{1} \to \infty$ , for every $ϵ > 0$ , there exists $λ > 0$ such that $g^{*} \geq λ {∥ {\hat{δ}}_{j, h} ∥}^{2}$ with a probability of at least $1 - ϵ$ , where $g^{*} = {inf}_{|h_{b} - h| > N_{j} η} g^{*} (h_{b})$ .

Lemma A4.

Under the assumption that $\sum_{i = t_{1}}^{t_{2}} x_{i} x_{i}^{⊤} / (t_{2} - t_{1}) \to_{p} V$ as $t_{2} - t_{1} \to \infty$ , for every $ϵ > 0$ and $η > 0$ , there exists $N_{0} > 0$ such that, for $N_{j} > N_{0}$ , $P^{*} (| h_{b} - h | > N_{j} η) < ϵ$ .

The proofs of Lemmas A3 and A4 follow from [26] (see Lemmas A.2 and A.4). We next establish the consistency of ${\hat{a}}_{j, h_{b}}^{*}$ in Lemma A5, which serves as a key step in proving Theorem 3.

Lemma A5.

If $δ_{j}$ is fixed or $δ_{j} \to 0$ but satisfies Assumption 5, under Assumptions 1–4, we have

${\hat{a}}_{j, h_{b}}^{*} - {\hat{a}}_{j, h} = O_{p} (∥ {\hat{δ}}_{j, h} ∥^{- 2}) .$

Proof of Lemma A5.

We define $V^{*} (h_{b}) = {({\hat{δ}}_{j; h_{b}}^{*})}^{⊤} (Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}}) {\hat{δ}}_{j; h_{b}}^{*}$ . By Equation (12), ${\hat{a}}_{j, h_{b}}^{*} = arg {max}_{h_{b}} V^{*} (h_{b})$ . Denote $h = {\hat{a}}_{j, h}$ . If $h_{b} = h$ , then $Z_{j, h_{b}} = Z_{j, h}$ . It can be seen from (11) that

$\begin{matrix} {\hat{δ}}_{j; h_{b}}^{*} = {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} (Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h} {\hat{δ}}_{j, h} + Z_{j, h_{b}}^{⊤} M_{j} ε^{*}), \\ {\hat{δ}}_{j; h}^{*} = {\hat{δ}}_{j, h} + {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε^{*} . \end{matrix}$

It follows that

$\begin{matrix} V^{*} (h_{b}) - V^{*} (h) & = {\hat{δ}}_{j, h}^{⊤} \{(Z_{j, h}^{⊤} M_{j} Z_{j, h_{b}}) {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} (Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h}) - (Z_{j, h}^{⊤} M_{j} Z_{j, h})\} {\hat{δ}}_{j, h} \\ + v^{*} (h_{b}), \end{matrix}$

where

$\begin{matrix} v^{*} (h_{b}) & = 2 {\hat{δ}}_{j, h}^{⊤} (Z_{j, h}^{⊤} M_{j} Z_{j, h_{b}}) {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} Z_{j, h_{b}}^{⊤} M_{j} ε^{*} - 2 {\hat{δ}}_{j, h}^{⊤} Z_{j, h}^{⊤} M_{j} ε^{*} \\ + ε^{* ⊤} M_{j} Z_{j, h_{b}} {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} Z_{j, h_{b}}^{⊤} M_{j} ε^{*} - ε^{* ⊤} M_{j} Z_{j, h} {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε^{*} . \end{matrix}$

Define

$g^{*} (h_{b}) = \left\{ \begin{matrix} {\hat{δ}}_{j, h}^{⊤} \{(Z_{j, h}^{⊤} M_{j} Z_{j, h}) - (Z_{j, h}^{⊤} M_{j} Z_{j, h_{b}}) {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} (Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h})\} {\hat{δ}}_{j, h} / |h - h_{b}|, & \text{if } h_{b} \neq h, \\ {\hat{δ}}_{j, h}^{⊤} {\hat{δ}}_{j, h}, & \text{if } h_{b} = h . \end{matrix} \right.$

We have

$V^{*} (h_{b}) - V^{*} (h) = - | h - h_{b} | g^{*} (h_{b}) + v^{*} (h_{b}) .$ (A8)

Again, since $V^{*} (h_{b}) \geq V^{*} (h)$ by definition, it suffices to prove, by Lemma A4, that

$P^{*} (sup_{|h_{b} - h| > C {∥ {\hat{δ}}_{j, h} ∥}^{- 2}} V^{*} (h_{b}) \geq V^{*} (h)) < ϵ .$

By Lemmas A1 and A2, for any $ϵ > 0$ and $η > 0$ , we have $P^{*} (|h_{b} - h| > N_{j} η) < ϵ$ for a large $N_{j}$ . Therefore, to prove the above equation, we only need to show that

$P^{*} (sup_{h_{b} \in K^{*} (C)} V^{*} (h_{b}) \geq V^{*} (h)) < ϵ,$

where $K^{*} (C) = \{h_{b} : |h_{b} - h| > C {∥ {\hat{δ}}_{j, h} ∥}^{- 2} \text{ and } n_{j}^{(l)} + η N_{j} \leq h_{b} \leq n_{j}^{(r)} - η N_{j}\}$ for a small number $η > 0$ . Finding ${sup}_{h_{b} \in K^{*} (C)} V^{*} (h_{b})$ amounts to a restricted search, which is legitimate only after consistency has been established. Thus, by Lemma A3, it suffices to show that

$P^{*} (sup_{h_{b} \in K^{*} (C)} \frac{| v^{*} (h_{b}) |}{| h - h_{b} | ∥ {\hat{δ}}_{j, h} ∥^{2}} > λ) < ϵ .$ (A9)

Next, consider $v^{*} (h_{b})$ . Denote

$Z_{Δ_{b}} = \{\begin{matrix} Z_{j, h_{b}} - Z_{j, h} = {(0, \dots, 0, x_{h_{b} + 1}, \dots, x_{h}, 0, \dots, 0)}^{⊤}, & h_{b} < h, \\ Z_{j, h} - Z_{j, h_{b}} = {(0, \dots, 0, x_{h + 1}, \dots, x_{h_{b}}, 0, \dots, 0)}^{⊤}, & h_{b} > h, \\ 0, & h_{b} = h . \end{matrix}$

It follows that $Z_{j, h_{b}} = Z_{j, h} + Z_{Δ_{b}} sgn (h - h_{b})$ . Thus, we have

$\begin{matrix} v^{*} (h_{b}) & = v_{1}^{*} (h_{b}) + v_{2}^{*} (h_{b}) + v_{3}^{*} (h_{b}) + v_{4}^{*} (h_{b}) + v_{5}^{*} (h_{b}) \\ & = [2 {\hat{δ}}_{j, h}^{⊤} Z_{Δ_{b}}^{⊤} ε^{*}] sgn (h - h_{b}) \\ & - [2 {\hat{δ}}_{j, h}^{⊤} Z_{Δ_{b}}^{⊤} X_{j} {(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤} ε^{*}] sgn (h - h_{b}) \\ & - [2 {\hat{δ}}_{j, h}^{⊤} (Z_{Δ_{b}}^{⊤} M_{j} Z_{j, h_{b}}) {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} Z_{j, h_{b}}^{⊤} M_{j} ε^{*}] sgn (h - h_{b}) \\ & + ε^{* ⊤} M_{j} Z_{j, h_{b}} {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} Z_{j, h_{b}}^{⊤} M_{j} ε^{*} \\ & - ε^{* ⊤} M_{j} Z_{j, h} {(Z_{j, h}^{⊤} M_{j} Z_{j, h})}^{- 1} Z_{j, h}^{⊤} M_{j} ε^{*} . \end{matrix}$ (A10)

By Lemmas A1 and A2, we deduce that ${(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤} ε^{*} = N_{j}^{- 1 / 2} O_{p^{*}} (1) .$ Since $N_{j}^{- 1 / 2 + α} δ_{j}^{- 1} \to 0$ and

$N_{j}^{- 1 / 2 + α} δ_{j}^{- 1} - N_{j}^{- 1 / 2 + α} {\hat{δ}}_{j, h}^{- 1} = N_{j}^{- 1 / 2 + α} δ_{j}^{- 1} ({\hat{δ}}_{j, h} - δ_{j}) {\hat{δ}}_{j, h}^{- 1} = O_{p^{*}} (N_{j}^{- 1 + α} δ_{j}^{- 2}),$

by Corollary 1 in [26], we have $N_{j}^{- 1 / 2 + α} {\hat{δ}}_{j, h}^{- 1}$ tends to zero. It follows from $Z_{Δ_{b}}^{⊤} X_{j} = | h - h_{b} | O_{p} (1)$ that

$\begin{matrix} sup_{h_{b} \in K^{*} (C)} \frac{| v_{2}^{*} (h_{b}) |}{| h - h_{b} | ∥ {\hat{δ}}_{j, h} ∥^{2}} & \leq 2 ∥ {\hat{δ}}_{j, h} ∥^{- 1} ∥ Z_{Δ_{b}}^{⊤} X_{j} / (h - h_{b}) ∥ ∥ {(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤} ε^{*} ∥ \\ = O_{p^{*}} (∥ {\hat{δ}}_{j, h} ∥^{- 1} N_{j}^{- 1 / 2}) = o_{p^{*}} (1) . \end{matrix}$

Similarly, we have $∥ (Z_{Δ_{b}}^{⊤} M_{j} Z_{j, h_{b}}) ∥ = | h - h_{b} | O_{p^{*}} (1)$ , and ${(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} Z_{j, h_{b}}^{⊤} M_{j} ε^{*} = N_{j}^{- 1 / 2} O_{p^{*}} (1)$ uniformly on $K^{*} (C)$ , which implies

$\begin{matrix} sup_{h_{b} \in K^{*} (C)} \frac{| v_{3}^{*} (h_{b}) |}{| h - h_{b} | ∥ {\hat{δ}}_{j, h} ∥^{2}} & \leq 2 ∥ {\hat{δ}}_{j, h} ∥^{- 1} ∥ (Z_{Δ_{b}}^{⊤} M_{j} Z_{j, h_{b}}) / (h - h_{b}) ∥ ∥ {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} Z_{j, h_{b}}^{⊤} M_{j} ε^{*} ∥ \\ & = O_{p^{*}} (∥ {\hat{δ}}_{j, h} ∥^{- 1} N_{j}^{- 1 / 2}) = o_{p^{*}} (1) . \end{matrix}$

Furthermore, it is easy to show that ${sup}_{h_{b} \in K^{*} (C)} \frac{| v_{4}^{*} (h_{b}) |}{| h - h_{b} | ∥ {\hat{δ}}_{j, h} ∥^{2}}$ and ${sup}_{h_{b} \in K^{*} (C)} \frac{| v_{5}^{*} (h_{b}) |}{| h - h_{b} | ∥ {\hat{δ}}_{j, h} ∥^{2}}$ are both $O_{p^{*}} (1)$ . By Lemma A.3 in [26],

$\begin{matrix} P^{*} (sup_{h_{b} \in K^{*} (C)} \frac{| v_{1}^{*} (h_{b}) |}{| h - h_{b} | ∥ {\hat{δ}}_{j, h} ∥^{2}} > \frac{λ}{5}) \\ \leq & P^{*} (sup_{h_{b} \in K^{*} (C)} ∥ {\hat{δ}}_{j, h} ∥^{- 1} ∥ Z_{Δ_{b}}^{⊤} ε^{*} / (h - h_{b}) ∥ > \frac{λ}{10}) \leq \frac{100 B}{λ^{2} C} \leq \frac{ϵ}{5} \end{matrix}$

for large C. In conclusion, the consistency of the bootstrap estimator has been established. □

Proof of Theorem 3.

Lemma A5 implies that when C is large, ${\hat{a}}_{j, h_{b}}^{*}$ has a high probability of being outside the set $K^{*} (C)$ . Let $D^{*} (C)$ denote the complement set of $K^{*} (C)$ such that $D^{*} (C) = \{h_{b} : | h_{b} - h | \leq C ∥ {\hat{δ}}_{j, h} ∥^{- 2}\}$ . To study the limiting distribution, we consider the expression of $V^{*} (h_{b}) - V^{*} (h)$ on $D^{*} (C)$ .

We use $Z_{j, h} = Z_{j, h_{b}} - Z_{Δ_{b}} sgn (h - h_{b})$ to obtain that

$\begin{matrix} |h - h_{b}| g^{*} (h_{b}) & = {\hat{δ}}_{j, h}^{⊤} \{Z_{j, h}^{⊤} M_{j} Z_{j, h} - Z_{j, h}^{⊤} M_{j} Z_{j, h_{b}} {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h}\} {\hat{δ}}_{j, h} \\ & = {\hat{δ}}_{j, h}^{⊤} \{Z_{Δ_{b}}^{⊤} Z_{Δ_{b}} - Z_{Δ_{b}}^{⊤} X_{j} {(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤} Z_{Δ_{b}}\} {\hat{δ}}_{j, h} \\ & - {\hat{δ}}_{j, h}^{⊤} Z_{Δ_{b}}^{⊤} M_{j} Z_{j, h_{b}} {(Z_{j, h_{b}}^{⊤} M_{j} Z_{j, h_{b}})}^{- 1} Z_{j, h_{b}}^{⊤} M_{j} Z_{Δ_{b}} {\hat{δ}}_{j, h} . \end{matrix}$ (A11)

Since $∥ X_{j}^{⊤} Z_{Δ_{b}} ∥ = O_{p^{*}} (∥ {\hat{δ}}_{j, h} ∥^{- 2})$ and $∥ Z_{j, h_{b}}^{⊤} M_{j} Z_{Δ_{b}} ∥ = O_{p^{*}} (∥ {\hat{δ}}_{j, h} ∥^{- 2})$ , we have

$| h - h_{b} | g^{*} (h_{b}) = {\hat{δ}}_{j, h}^{⊤} Z_{Δ_{b}}^{⊤} Z_{Δ_{b}} {\hat{δ}}_{j, h} + o_{p^{*}} (1) .$

Since $|h - h_{b}| \leq C {∥ {\hat{δ}}_{j, h} ∥}^{- 2}$ , we deduce that $v_{2}^{*} (h_{b})$ and $v_{3}^{*} (h_{b})$ are both bounded by $O_{p^{*}} (N_{j}^{- 1 / 2} ∥ {\hat{δ}}_{j, h} ∥^{- 1}) = o_{p^{*}} (1)$ , and $v_{4}^{*} (h_{b})$ and $v_{5}^{*} (h_{b})$ are both bounded by $o_{p^{*}} (1)$ uniformly on $D^{*} (C)$ .

By the above results, we obtain that

$\begin{matrix} V^{*} (h_{b}) - V^{*} (h) & = - {\hat{δ}}_{j, h}^{⊤} Z_{Δ_{b}}^{⊤} Z_{Δ_{b}} {\hat{δ}}_{j, h} + 2 {\hat{δ}}_{j, h}^{⊤} Z_{Δ_{b}}^{⊤} ε^{*} sgn (h - h_{b}) + o_{p^{*}} (1), \end{matrix}$

which combined with (A4) completes the proof. □
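The bootstrap scheme analyzed in Lemma A5 and Theorem 3 can be illustrated for a single segment with one change point. The sketch below regenerates $y^{*}$ from the fitted model plus resampled centered residuals and re-estimates the change point by maximizing $V^{*} (h_{b})$. It is not the authors' implementation: the percentile rule used to form the interval stands in for the CI in (14), whose exact form is not reproduced here, and the function names, grid and toy data are assumptions.

```python
import numpy as np

def fit_change_point(y, X, h_grid):
    """Maximize V(h) = delta_hat' (Z_h' M Z_h) delta_hat over h_grid."""
    n = len(y)
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    best_h, best_V = None, -np.inf
    for h in h_grid:
        Z = X.copy()
        Z[:h] = 0.0
        A = Z.T @ M @ Z
        d = np.linalg.solve(A, Z.T @ M @ y)
        V = d @ A @ d
        if V > best_V:
            best_h, best_V = int(h), V
    return best_h

def bootstrap_change_point_ci(y, X, h_grid, B=100, alpha=0.05, seed=0):
    """Residual bootstrap for one change point; percentile CI (assumption)."""
    rng = np.random.default_rng(seed)
    h_hat = fit_change_point(y, X, h_grid)
    Z = X.copy()
    Z[:h_hat] = 0.0
    W = np.hstack([X, Z])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    fitted = W @ coef
    resid = y - fitted
    resid = resid - resid.mean()                  # centered residuals
    h_star = np.empty(B, dtype=int)
    for b in range(B):
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        h_star[b] = fit_change_point(y_star, X, h_grid)
    lo, hi = np.quantile(h_star, [alpha / 2, 1 - alpha / 2])
    return h_hat, int(np.floor(lo)), int(np.ceil(hi))

# Toy segment with one change at h0 = 60 (illustrative values).
rng = np.random.default_rng(1)
n, h0 = 120, 60
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z0 = X.copy()
Z0[:h0] = 0.0
y = X @ np.array([1.0, 0.5]) + Z0 @ np.array([4.0, 0.5]) + 0.3 * rng.standard_normal(n)
print(bootstrap_change_point_ci(y, X, np.arange(15, 106)))
```

With a large change relative to the noise, as in this toy example, the bootstrap estimates concentrate near the original estimate, which mirrors the $O_{p} (∥ {\hat{δ}}_{j, h} ∥^{- 2})$ rate in Lemma A5.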

Author Contributions

Conceptualization, L.H., B.J. and Y.W.; methodology, L.H. and B.J.; data analysis, L.H. and F.W.; writing, L.H., Y.W. and F.W. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding Statement

This work is supported by the National Natural Science Foundation of China (12231017, 72293573) and the Natural Sciences and Engineering Research Council of Canada (RGPIN-2023-05655).


References

1. Haynes K., Eckley I.A., Fearnhead P. Computationally efficient changepoint detection for a range of penalties. J. Comput. Graph. Stat. 2017;26:134–143. doi: 10.1080/10618600.2015.1116445.
2. Picard F., Robin S., Lavielle M., Vaisse C., Daudin J.J. A statistical approach for array CGH data analysis. BMC Bioinform. 2005;6:27. doi: 10.1186/1471-2105-6-27.
3. Li J., Fearnhead P., Fryzlewicz P., Wang T. Automatic change-point detection in time series via deep learning. J. R. Stat. Soc. Ser. B Stat. Methodol. 2024;86:273–285. doi: 10.1093/jrsssb/qkae004.
4. Killick R., Fearnhead P., Eckley I.A. Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 2012;107:1590–1598. doi: 10.1080/01621459.2012.737745.
5. Bai J., Perron P. Estimating and testing linear models with multiple structural changes. Econometrica. 1998;66:47–78. doi: 10.2307/2998540.
6. Bai J., Perron P. Computation and analysis of multiple structural change models. J. Appl. Econom. 2003;18:1–22. doi: 10.1002/jae.659.
7. Davis R.A., Lee T.C.M., Rodriguez-Yam G.A. Structural break estimation for nonstationary time series models. J. Am. Stat. Assoc. 2006;101:223–239. doi: 10.1198/016214505000000745.
8. Harchaoui Z., Lévy-Leduc C. Multiple change-point estimation with a total variation penalty. J. Am. Stat. Assoc. 2010;105:1480–1493. doi: 10.1198/jasa.2010.tm09181.
9. Tibshirani R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996;58:267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x.
10. Jin B., Wu Y., Shi X. Consistent two-stage multiple change-point detection in linear models. Can. J. Stat. 2016;44:161–179. doi: 10.1002/cjs.11282.
11. Zou H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006;101:1418–1429. doi: 10.1198/016214506000000735.
12. Fan J., Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001;96:1348–1360. doi: 10.1198/016214501753382273.
13. Zhang C. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010;38:894–942. doi: 10.1214/09-AOS729.
14. Li J., Jin B. Multi-threshold accelerated failure time model. Ann. Stat. 2018;46:2657–2682. doi: 10.1214/17-AOS1632.
15. Eichinger B., Kirch C. A MOSUM procedure for the estimation of multiple random change points. Bernoulli. 2018;24:526–564. doi: 10.3150/16-BEJ887.
16. Fang X., Li J., Siegmund D. Segmentation and estimation of change-point models: False positive control and confidence regions. Ann. Stat. 2020;48:1615–1647. doi: 10.1214/19-AOS1861.
17. Antoch J., Hušková M., Veraverbeke N. Change-point problem and bootstrap. J. Nonparametr. Stat. 1995;5:123–144. doi: 10.1080/10485259508832639.
18. Dumbgen L. The asymptotic behavior of some nonparametric change-point estimators. Ann. Stat. 1991;19:1471–1495. doi: 10.1214/aos/1176348257.
19. Hušková M., Kirch C. Bootstrapping confidence intervals for the change-point of time series. J. Time Ser. Anal. 2008;29:947–972. doi: 10.1111/j.1467-9892.2008.00589.x.
20. Cho H., Kirch C. Bootstrap confidence intervals for multiple change points based on moving sum procedures. Comput. Stat. Data Anal. 2022;175:107552. doi: 10.1016/j.csda.2022.107552.
21. Lv J., Fan Y. A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat. 2009;37:3498–3528. doi: 10.1214/09-AOS683.
22. Ing C.K., Lai T.L. A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Stat. Sin. 2011;21:1473–1513. doi: 10.5705/ss.2010.081.
23. Jin B., Shi X., Wu Y. A novel and fast methodology for simultaneous multiple structural break estimation and variable selection for nonstationary time series models. Stat. Comput. 2013;23:221–231. doi: 10.1007/s11222-011-9304-6.
24. White H. Maximum likelihood estimation of misspecified models. Econom. J. Econom. Soc. 1982;50:1–25. doi: 10.2307/1912526.
25. Flynn C.J., Hurvich C.M., Simonoff J.S. Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. J. Am. Stat. Assoc. 2013;108:1031–1043. doi: 10.1080/01621459.2013.801775.
26. Bai J. Estimation of a change point in multiple regression models. Rev. Econ. Stat. 1997;79:551–563. doi: 10.1162/003465397557132.
27. Takanami T., Kitagawa G. Estimation of the arrival times of seismic waves by multivariate time series model. Ann. Inst. Stat. Math. 1991;43:407–433. doi: 10.1007/BF00053364.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.
