Journal of the Royal Statistical Society, Series B, 2015, 78(1), 193–210. doi: 10.1111/rssb.12108

The lasso for high dimensional regression with a possible change point

Sokbae Lee, Myung Hwan Seo and Youngki Shin

Summary

We consider a high dimensional regression model with a possible change point due to a covariate threshold and develop the lasso estimator of the regression coefficients as well as the threshold parameter. Our lasso estimator not only selects covariates but also selects a model between linear and threshold regression models. Under a sparsity assumption, we derive non‐asymptotic oracle inequalities for both the prediction risk and the l1‐estimation loss for the regression coefficients. Since the lasso estimator selects variables simultaneously, we show that oracle inequalities can be established without pretesting the existence of the threshold effect. Furthermore, we establish conditions under which the estimation error of the unknown threshold parameter can be bounded by a factor that is nearly $n^{-1}$ even when the number of regressors can be much larger than the sample size $n$. We illustrate the usefulness of our proposed estimation method via Monte Carlo simulations and an application to real data.

Keywords: Lasso, Oracle inequalities, Sample splitting, Sparsity, Threshold models

1. Introduction

The lasso and related methods have received rapidly increasing attention in statistics since the seminal work of Tibshirani (1996). For example, see Bühlmann and van de Geer (2011) as well as Fan and Lv (2010) and Tibshirani (2011) for a general overview and recent developments.

In this paper, we develop a method for estimating a high dimensional regression model with a possible change point due to a covariate threshold, while selecting relevant regressors from a set of many potential covariates. In particular, we propose the l1 penalized least squares (lasso) estimator of parameters, including the unknown threshold parameter, and analyse its properties under a sparsity assumption when the number of possible covariates can be much larger than the sample size.

To be specific, let $\{(Y_i, X_i, Q_i): i=1,\dots,n\}$ be a sample of independent observations such that

$$Y_i = X_i'\beta_0 + X_i'\delta_0\, 1\{Q_i < \tau_0\} + U_i, \qquad i=1,\dots,n, \tag{1.1}$$

where, for each $i$, $X_i$ is an $M\times 1$ deterministic vector, $Q_i$ is a deterministic scalar, $U_i$ follows $N(0,\sigma^2)$ and $1\{\cdot\}$ denotes the indicator function. The scalar variable $Q_i$ is the threshold variable and $\tau_0$ is the unknown threshold parameter. Since $Q_i$ is a fixed variable in our set‐up, expression (1.1) includes a regression model with a change point at unknown time (e.g. $Q_i = i/n$). In this paper, we focus on the fixed design for $\{(X_i, Q_i): i=1,\dots,n\}$ and independent normal errors $\{U_i: i=1,\dots,n\}$. This set‐up has been extensively used in the literature (e.g. Bickel et al. (2009)).

A regression model such as model (1.1) offers applied researchers a simple yet useful framework to model non‐linear relationships by splitting the data into subsamples. Empirical examples include cross‐country growth models with multiple equilibria (Durlauf and Johnson, 1995), racial segregation (Card et al., 2008) and financial contagion (Pesaran and Pick, 2007), among many others. Typically, the choice of the threshold variable is well motivated in applied work (e.g. initial per capita output in Durlauf and Johnson (1995), and the minority share in a neighbourhood in Card et al. (2008)), but selection of other covariates is subject to applied researchers' discretion.

However, covariate selection is important in identifying threshold effects (i.e. non‐zero δ0) since a statistical model favouring threshold effects with a particular set of covariates could be overturned by a linear model with a broader set of regressors. Therefore, it seems natural to consider the lasso as a tool to estimate model (1.1).

The statistical problem that we consider is to estimate the unknown parameters $(\beta_0', \delta_0', \tau_0) \in \mathbb{R}^{2M+1}$ when $M$ is much larger than $n$. For the classical set‐up (estimation of parameters without covariate selection when $M$ is smaller than $n$), estimation of model (1.1) has been well studied (e.g. Tong (1990), Chan (1993) and Hansen (2000)). Also, a general method for testing threshold effects in regression (i.e. testing $H_0: \delta_0 = 0$ in model (1.1)) is available for the classical set‐up (e.g. Lee et al. (2011)).

Although there are many references on lasso‐type methods and equally many on change points, sample splitting and threshold models, only a handful of references intersect both topics. Wu (2008) proposed an information‐based criterion for carrying out change point analysis and variable selection simultaneously in linear models with a possible change point; however, that method would be infeasible in a sparse high dimensional model. In change point models without covariates, Harchaoui and Lévy‐Leduc (2008, 2010) proposed a method for estimating the location of change points in one‐dimensional piecewise constant signals observed in white noise, using a penalized least squares criterion with an l1‐type penalty. Zhang and Siegmund (2012) developed criteria in the spirit of the Bayes information criterion for determining the number of changes in the mean of multiple sequences of independent normal observations when the number of change points can increase with the sample size. Ciuperca (2014) considered an estimation problem similar to ours, but the corresponding analysis was restricted to the case where the number of potential covariates is small.

In this paper, we consider the lasso estimator of regression coefficients as well as the threshold parameter. Since the change point parameter τ0 does not enter additively in model (1.1), the resulting optimization problem in the lasso estimation is non‐convex. We overcome this problem by comparing the values of standard lasso objective functions on a grid over the range of possible values of τ0.

Theoretical properties of the lasso and related methods for high dimensional data have been examined by Fan and Peng (2004), Bunea et al. (2007), Candès and Tao (2007), Huang et al. (2008a,b), Kim et al. (2008), Bickel et al. (2009) and Meinshausen and Yu (2009), among many others. Most of the references consider quadratic objective functions and linear or non‐parametric models with an additive mean 0 error. There has been recent interest in extending this framework to generalized linear models (e.g. van de Geer (2008) and Fan and Lv (2011)), to quantile regression models (e.g. Belloni and Chernozhukov (2011a), Bradic et al. (2011) and Wang et al. (2012)), and to hazards models (e.g. Bradic et al. (2012) and Lin and Lv (2013)). We contribute to this literature by considering a regression model with a possible change point and then deriving non‐asymptotic oracle inequalities for both the prediction risk and the l1‐estimation loss for regression coefficients under a sparsity scenario.

Our theoretical results build on Bickel et al. (2009). Since the lasso estimator selects variables simultaneously, we show that oracle inequalities similar to those obtained in Bickel et al. (2009) can be established without pretesting the existence of the threshold effect. In particular, when there is no threshold effect ($\delta_0 = 0$), we prove oracle inequalities that are basically equivalent to those in Bickel et al. (2009). Furthermore, when $\delta_0 \neq 0$, we establish conditions under which the estimation error of the unknown threshold parameter can be bounded by a factor of nearly $n^{-1}$ when the number of regressors can be much larger than the sample size. To achieve this, we develop some sophisticated chaining arguments and provide sufficient regularity conditions under which we prove oracle inequalities. The superconsistency of $\hat\tau$ is well known when the number of covariates is small (see, for example, Chan (1993) and Seijo and Sen (2011a, b)). To the best of our knowledge, our paper is the first to demonstrate the possibility of a nearly $n^{-1}$ bound in the context of sparse high dimensional regression models with a change point.

The remainder of this paper is organized as follows. In Section 2 we propose the lasso estimator, and in Section 3 we give a brief illustration of our proposed estimation method by using a real data example from economics. In Section 4 we establish the prediction consistency of our lasso estimator. In Section 5 we establish sparsity oracle inequalities in terms of both the prediction loss and the l1‐estimation loss for $(\alpha_0, \tau_0)$, while providing low‐level sufficient conditions for two possible cases of threshold effects. In Section 6 we present results of some simulation studies, and Section 7 concludes. The on‐line appendices consist of six sections: appendix A provides sufficient conditions for one of our main assumptions, appendix B gives some additional discussion of identifiability of $\tau_0$, appendices C, D and E contain all the proofs, and appendix F provides additional numerical results.

1.1. Notation

We collect here the notation used throughout the paper. For $\{(Y_i, X_i, Q_i): i=1,\dots,n\}$ following model (1.1), let $X_i(\tau)$ denote the $2M\times 1$ vector such that $X_i(\tau) = (X_i', X_i'1\{Q_i<\tau\})'$ and let $X(\tau)$ denote the $n\times 2M$ matrix whose $i$th row is $X_i(\tau)'$. For an $L$‐dimensional vector $a$, let $|a|_p$ denote the $l_p$‐norm of $a$ and $|J(a)|$ the cardinality of $J(a)$, where $J(a) = \{j\in\{1,\dots,L\}: a_j\neq 0\}$. In addition, let $M(a)$ denote the number of non‐zero elements of $a$, i.e. $M(a) = \sum_{j=1}^L 1\{a_j\neq 0\} = |J(a)|$. Let $a_J$ denote the vector in $\mathbb{R}^L$ that has the same co‐ordinates as $a$ on $J$ and zero co‐ordinates on the complement $J^c$ of $J$. For any $n$‐dimensional vector $W=(W_1,\dots,W_n)'$, define the empirical norm as $\|W\|_n := (n^{-1}\sum_{i=1}^n W_i^2)^{1/2}$. Let the superscript '(j)' denote the $j$th element of a vector or the $j$th column of a matrix, depending on the context. Finally, define $f_{(\alpha,\tau)}(x,q) := x'\beta + x'\delta 1\{q<\tau\}$, $f_0(x,q) := x'\beta_0 + x'\delta_0 1\{q<\tau_0\}$ and $\hat f(x,q) := x'\hat\beta + x'\hat\delta 1\{q<\hat\tau\}$. Then we define the prediction risk as

$$\|\hat f - f_0\|_n := \left[\frac{1}{n}\sum_{i=1}^n \{\hat f(X_i,Q_i) - f_0(X_i,Q_i)\}^2\right]^{1/2}.$$

2. Lasso estimation

Let $\alpha_0 = (\beta_0', \delta_0')'$. Then, using the notation defined above, we can rewrite model (1.1) as

$$Y_i = X_i(\tau_0)'\alpha_0 + U_i, \qquad i=1,\dots,n. \tag{2.1}$$

Let $y \equiv (Y_1,\dots,Y_n)'$. For any fixed $\tau\in\mathbb{T}$, where $\mathbb{T}\equiv[t_0,t_1]$ is a parameter space for $\tau_0$, consider the residual sum of squares

$$S_n(\alpha,\tau) = n^{-1}\sum_{i=1}^n (Y_i - X_i'\beta - X_i'\delta\, 1\{Q_i<\tau\})^2 = \|y - X(\tau)\alpha\|_n^2,$$

where $\alpha = (\beta', \delta')'$.

We define the following $2M\times 2M$ diagonal matrix:

$$D(\tau) := \mathrm{diag}\{\|X^{(j)}(\tau)\|_n,\; j=1,\dots,2M\}.$$

For each fixed $\tau\in\mathbb{T}$, define the lasso solution $\hat\alpha(\tau)$ by

$$\hat\alpha(\tau) := \operatorname*{arg\,min}_{\alpha\in\mathcal{A}\subset\mathbb{R}^{2M}} \{S_n(\alpha,\tau) + \lambda|D(\tau)\alpha|_1\}, \tag{2.2}$$

where $\lambda$ is a tuning parameter that depends on $n$ and $\mathcal{A}$ is a parameter space for $\alpha_0$.

It is important to note that the scale‐normalizing factor $D(\tau)$ depends on $\tau$, since different values of $\tau$ generate different dictionaries $X(\tau)$. To see this more clearly, define

$$X^{(j)} \equiv (X_1^{(j)},\dots,X_n^{(j)})', \qquad X^{(j)}(\tau) \equiv (X_1^{(j)}1\{Q_1<\tau\},\dots,X_n^{(j)}1\{Q_n<\tau\})'. \tag{2.3}$$

Then, for each $\tau\in\mathbb{T}$ and for each $j=1,\dots,M$, we have $\|X^{(j)}(\tau)\|_n = \|X^{(j)}\|_n$ and $\|X^{(M+j)}(\tau)\|_n = \|X^{(j)}(\tau)\|_n$. Using this notation, we rewrite the $l_1$‐penalty as

$$\lambda|D(\tau)\alpha|_1 = \lambda\sum_{j=1}^{2M}\|X^{(j)}(\tau)\|_n |\alpha^{(j)}| = \lambda\sum_{j=1}^{M}\{\|X^{(j)}\|_n|\alpha^{(j)}| + \|X^{(j)}(\tau)\|_n|\alpha^{(M+j)}|\}.$$

Therefore, for each fixed $\tau\in\mathbb{T}$, $\hat\alpha(\tau)$ is a weighted lasso that uses a data‐dependent $l_1$‐penalty to balance covariates adequately.

We now estimate $\tau_0$ by

$$\hat\tau := \operatorname*{arg\,min}_{\tau\in\mathbb{T}\subset\mathbb{R}} [S_n\{\hat\alpha(\tau),\tau\} + \lambda|D(\tau)\hat\alpha(\tau)|_1]. \tag{2.4}$$

In fact, for any finite $n$, $\hat\tau$ is given by an interval, and we simply define the maximum of the interval as our estimator. (If we wrote the model by using $1\{Q_i > \tau\}$, the convention would be to take the minimum of the interval instead.) The estimator of $\alpha_0$ is then defined as $\hat\alpha := \hat\alpha(\hat\tau)$. Indeed, our proposed estimator of $(\alpha,\tau)$ can be viewed as the one‐step minimizer

$$(\hat\alpha,\hat\tau) := \operatorname*{arg\,min}_{\alpha\in\mathcal{A}\subset\mathbb{R}^{2M},\;\tau\in\mathbb{T}\subset\mathbb{R}} \{S_n(\alpha,\tau) + \lambda|D(\tau)\alpha|_1\}. \tag{2.5}$$

It is worth noting that we penalize both $\beta_0$ and $\delta_0$ in expression (2.5), where $\delta_0$ is the change in regression coefficients between the two regimes. Model (1.1) can be written as

$$Y_i = \begin{cases} X_i'\beta_0 + U_i & \text{if } Q_i \geq \tau_0,\\ X_i'\beta_1 + U_i & \text{if } Q_i < \tau_0, \end{cases} \tag{2.6}$$

where $\beta_1 \equiv \beta_0 + \delta_0$. In view of model (2.6), one might alternatively penalize $\beta_0$ and $\beta_1$ instead of $\beta_0$ and $\delta_0$. We have opted to penalize $\delta_0$ in this paper since the case $\delta_0 = 0$ corresponds to the linear model: if $\hat\delta = 0$, the procedure amounts to selecting the linear model.
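To make the procedure concrete, the following is a minimal sketch (ours, not the authors' code) of the grid‐search estimator (2.2)–(2.5) in Python, using scikit‐learn's `Lasso` for the inner convex problem. It assumes that the grid of candidate thresholds is sorted in increasing order (so that ties are broken towards the largest τ, matching the convention above) and that every column of $X(\tau)$ has a positive empirical norm on the grid.

```python
import numpy as np
from sklearn.linear_model import Lasso

def threshold_lasso(y, X, Q, tau_grid, lam):
    """Grid-search lasso of Section 2: a minimal sketch, not the authors' code.

    Assumes tau_grid is sorted in increasing order and every column of
    X(tau) has a positive empirical norm on the grid.
    """
    n, M = X.shape
    best_obj, best_alpha, best_tau = np.inf, None, None
    for tau in tau_grid:
        Xtau = np.hstack([X, X * (Q < tau)[:, None]])   # dictionary X(tau), n x 2M
        d = np.sqrt((Xtau ** 2).mean(axis=0))           # diagonal of D(tau)
        # sklearn's Lasso minimises (1/(2n))|y - Zw|_2^2 + a|w|_1; with
        # Z = X(tau) D(tau)^{-1}, w = D(tau) alpha and a = lam/2, the
        # minimiser solves the weighted problem (2.2) up to a factor of 2.
        fit = Lasso(alpha=lam / 2, fit_intercept=False).fit(Xtau / d, y)
        alpha_hat = fit.coef_ / d                       # undo the column rescaling
        obj = ((y - Xtau @ alpha_hat) ** 2).mean() + lam * np.abs(fit.coef_).sum()
        if obj <= best_obj:  # '<=' keeps the largest minimising tau, as in the text
            best_obj, best_alpha, best_tau = obj, alpha_hat, tau
    return best_alpha, best_tau
```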

3. Empirical illustration

In this section, we apply the proposed lasso method to growth regression models in economics. The neoclassical growth model predicts that economic growth rates converge in the long run. This theory has been tested empirically by looking at the negative relationship between long‐run growth rate and initial gross domestic product (GDP) given other covariates (see Barro and Sala‐i‐Martin (1995) and Durlauf et al. (2005) for literature reviews). Although empirical results confirmed the negative relationship between growth rate and initial GDP, there has been some criticism that the results depend heavily on the selection of covariates. Recently, Belloni and Chernozhukov (2011b) showed that lasso estimation can help to select the covariates in the linear growth regression model and that the lasso estimation results reconfirm the negative relationship between long‐run growth rate and initial GDP.

We consider the growth regression model with a possible threshold. Durlauf and Johnson (1995) provided the theoretical background for the existence of multiple steady states and estimated the model with two possible threshold variables. They checked robustness by adding other available covariates to the model, but this still does not free the analysis from the criticism of ad hoc variable selection. Our proposed lasso method might be a good alternative in this situation. Furthermore, as we show later, our method works well even if there is no threshold effect in the model. Therefore, one might expect more robust results from our approach.

The regression model that we consider has the form

$$gr_i = \beta_0 + \beta_1\, lgdp60_i + X_i'\beta_2 + 1\{Q_i < \tau\}(\delta_0 + \delta_1\, lgdp60_i + X_i'\delta_2) + \varepsilon_i, \tag{3.1}$$

where $gr_i$ is the annualized GDP growth rate of country $i$ from 1960 to 1985, $lgdp60_i$ is the log‐GDP in 1960 and $Q_i$ is a possible threshold variable, for which we use the initial GDP or the adult literacy rate in 1960 following Durlauf and Johnson (1995). Finally, $X_i$ is a vector of additional covariates related to education, market efficiency, political stability, market openness and demographic characteristics. In addition, $X_i$ contains cross‐product terms between $lgdp60_i$ and education variables. Table 1 lists all the covariates used and gives a description of each variable. We include as many covariates as possible, which might mitigate potential omitted‐variable bias. The data set comes mostly from Barro and Lee (1994), and the additional adult literacy rate is from Durlauf and Johnson (1995). Because of missing observations, we have 80 observations with 46 covariates (including a constant term) when $Q_i$ is the initial GDP ($n=80$ and $M=46$), and 70 observations with 47 covariates when $Q_i$ is the literacy rate ($n=70$ and $M=47$). It is worth noting that the number of covariates in the threshold model is larger than the number of observations ($2M > n$ in our notation); thus we cannot apply the standard least squares method to estimate the threshold regression model.

Table 1.

List of variables

Variable name Description
Dependent variable
gr Annualized GDP growth rate in the period 1960–1985
Threshold variables
gdp60 Real GDP per capita in 1960 (1985 price)
lr Adult literacy rate in 1960
Covariates
lgdp60 Log‐GDP per capita in 1960 (1985 price)
lr Adult literacy rate in 1960 (only included when Q=lr)
lsk log(investment/output) annualized over 1960–1985; a proxy for log(physical savings rate)
lgrpop log(population growth rate) annualized over 1960–1985
pyrm60 log(average years of primary schooling) in the male population in 1960
pyrf60 log(average years of primary schooling) in the female population in 1960
syrm60 log(average years of secondary schooling) in the male population in 1960
syrf60 log(average years of secondary schooling) in the female population in 1960
hyrm60 log(average years of higher schooling) in the male population in 1960
hyrf60 log(average years of higher schooling) in the female population in 1960
nom60 Percentage of no schooling in the male population in 1960
nof60 Percentage of no schooling in the female population in 1960
prim60 Percentage of primary schooling attained in the male population in 1960
prif60 Percentage of primary schooling attained in the female population in 1960
pricm60 Percentage of primary schooling complete in the male population in 1960
pricf60 Percentage of primary schooling complete in the female population in 1960
secm60 Percentage of secondary schooling attained in the male population in 1960
secf60 Percentage of secondary schooling attained in the female population in 1960
seccm60 Percentage of secondary schooling complete in the male population in 1960
seccf60 Percentage of secondary schooling complete in the female population in 1960
llife log(life expectancy at age 0) averaged over 1960–1985
lfert log(fertility rate) averaged over 1960–1985
edu/gdp Government expenditure on education per GDP averaged over 1960–1985
gcon/gdp Government consumption expenditure net of defence and education per GDP averaged over 1960–1985
revol Number of revolutions per year over 1960–1984
revcoup Number of revolutions and coups per year over 1960–1984
wardum Dummy for countries that participated in at least one external war over 1960–1984
wartime Fraction of time over 1960–1985 involved in external war
lbmp log(1 + black market premium averaged over 1960–1985)
tot Terms‐of‐trade shock
lgdp60 × ‘educ’ Product of two covariates (interaction of lgdp60 and education variables from pyrm60 to seccf60); total 16 variables

Table 2 summarizes the model selection and estimation results when Qi is the initial GDP. In the on‐line appendix F (see Table 4), we report additional empirical results with Qi being the literacy rate. To compare different model specifications, we also estimate a linear model, i.e. all δs are 0s in model (3.1), by standard lasso estimation. In each case, the regularization parameter λ is chosen by the ‘leave‐one‐out’ cross‐validation method. For the range T of the threshold parameter, we consider an interval between the 10% and 90% sample quantiles for each threshold variable.
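A sketch of the 'leave‐one‐out' choice of λ follows, reusing the hypothetical `threshold_lasso` function from the sketch in Section 2; this is our own illustration of the procedure described in the text, and the candidate grid `lam_grid` is our assumption, as the paper does not report the grid used.

```python
import numpy as np

def loo_cv_lambda(y, X, Q, tau_grid, lam_grid):
    """Leave-one-out cross-validation for the regularization parameter:
    a hypothetical sketch of the procedure described in the text."""
    n = len(y)
    cv_err = []
    for lam in lam_grid:
        err = 0.0
        for i in range(n):
            keep = np.arange(n) != i                 # drop observation i
            alpha_hat, tau_hat = threshold_lasso(y[keep], X[keep], Q[keep],
                                                 tau_grid, lam)
            xi = np.concatenate([X[i], X[i] * (Q[i] < tau_hat)])  # X_i(tau_hat)
            err += (y[i] - xi @ alpha_hat) ** 2
        cv_err.append(err / n)
    return lam_grid[int(np.argmin(cv_err))]
```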

Table 2.

Model selection and estimation results with Q = gdp60 (see footnote a)

Variable | Linear model | Threshold model, τ̂ = 2898: β̂ | δ̂
Constant −0.0923 −0.0811
lgdp60 −0.0153 −0.0120
lsk 0.0033 0.0038
lgrpop 0.0018
pyrf60 0.0027
syrm60 0.0157
hyrm60 0.0122 0.0130
hyrf60 −0.0389 −0.0807
nom60 2.64 × 10^{-5}
prim60 −0.0004 −0.0001
pricm60 0.0006 −1.73 × 10^{-4} 0.35 × 10^{-4}
pricf60 −0.0006
secf60 0.0005
seccm60 0.0010 0.0014
llife 0.0697 0.0523
lfert −0.0136 −0.0047
edu/gdp −0.0189
gcon/gdp −0.0671 −0.0542
revol −0.0588
revcoup 0.0433
wardum −0.0043 −0.0022
wartime −0.0019 −0.0143 −0.0023
lbmp −0.0185 −0.0174 −0.0015
tot 0.0971 0.0974
lgdp60 × pyrf60 3.81 × 10^{-6}
lgdp60 × syrm60 0.0002
lgdp60 × hyrm60 0.0050
lgdp60 × hyrf60 −0.0003
lgdp60 × nom60 8.26 × 10^{-6}
lgdp60 × prim60 6.02 × 10^{-7}
lgdp60 × prif60 3.47 × 10^{-6} 8.11 × 10^{-6}
lgdp60 × pricf60 8.46 × 10^{-6}
lgdp60 × secm60 −0.0001
lgdp60 × seccf60 −0.0002 2.87 × 10^{-6}
λ 0.0004 0.0034
M(α̂) 28 26
Number of covariates 46 92
Number of observations 80 80
a

The regularization parameter λ is chosen by the 'leave‐one‐out' cross‐validation method. M(α̂) denotes the number of covariates selected by the lasso estimator, and a dash indicates that the regressor is not selected. Recall that β̂ is the coefficient when Q ⩾ τ̂ and that δ̂ is the change in the coefficient value when Q < τ̂.

Main empirical findings are as follows. First, the marginal effect of lgdp60i, which is given by

$$\frac{\partial\, gr_i}{\partial\, lgdp60_i} = \beta_1 + educ_i'\tilde\beta_2 + 1\{Q_i < \hat\tau\}(\delta_1 + educ_i'\tilde\delta_2),$$

where $educ_i$ is a vector of education variables and $\tilde\beta_2$ and $\tilde\delta_2$ are the subvectors of $\beta_2$ and $\delta_2$ corresponding to $educ_i$, is estimated to be negative for all observed values of $educ_i$. This confirms the prediction of the neoclassical growth model. Second, some non‐zero coefficients of interaction terms between lgdp60 and various education variables show the existence of threshold effects in both threshold model specifications. This result implies that growth convergence rates can vary with the level of the initial GDP or the adult literacy rate in 1960. Specifically, in both threshold models, we have $\hat\delta_1 = 0$, but some elements of $\hat\delta_2$ are not 0. Thus, conditionally on other covariates, there are different technological diffusion effects on either side of the threshold point. For example, a developing country (lower $Q$) with a higher education level will converge faster, perhaps by absorbing advanced technology more easily and more quickly. Finally, the lasso with the threshold model specification selects a more parsimonious model than that with the linear specification, even though the former doubles the number of potential covariates.

4. Prediction consistency of lasso estimator

In this section, we consider the prediction consistency of the lasso estimator. We make the following assumptions.

Assumption 1

  (a) For the parameter space $\mathcal{A}$ for $\alpha_0$, any $\alpha \equiv (\alpha_1,\dots,\alpha_{2M})' \in \mathcal{A} \subset \mathbb{R}^{2M}$, including $\alpha_0$, satisfies $\max_{j=1,\dots,2M}|\alpha_j| \leq C_1$ for some constant $C_1 > 0$. In addition, $\tau_0 \in \mathbb{T} \equiv [t_0, t_1]$, which satisfies $\min_{i=1,\dots,n} Q_i < t_0 < t_1 < \max_{i=1,\dots,n} Q_i$.

  (b) There are universal constants $C_2 > 0$ and $C_3 > 0$ such that $\|X^{(j)}(\tau)\|_n^2 \leq C_2$ uniformly in $j$ and $\tau \in \mathbb{T}$, and $\|X^{(j)}(t_0)\|_n^2 \geq C_3$ uniformly in $j$, where $j = 1,\dots,2M$.

  (c) There is no $i \neq j$ such that $Q_i = Q_j$.

Assumption 1(a) imposes boundedness of each component of the parameter vector. The first part of assumption 1(a), which implies that $|\alpha|_1 \leq 2C_1 M$ for any $\alpha\in\mathcal{A}$, seems weak, since the sparsity assumption implies that $|\alpha_0|_1$ is much smaller than $C_1 M$. Furthermore, in the literature on change‐point and threshold models, it is common to assume that the parameter space is compact; see, for example, Seijo and Sen (2011a, b).

The lasso estimator in expression (2.5) can be computed without knowing the value of $C_1$, but $\mathbb{T}\equiv[t_0,t_1]$ must be specified. In practice, researchers tend to choose a strict subset of the range of observed values of the threshold variable. Assumption 1(b) imposes that each covariate is of the same magnitude uniformly over $\tau$. In view of the assumption that $\min_{i=1,\dots,n} Q_i < t_0$, it is not stringent to assume that $\|X^{(j)}(t_0)\|_n$ is bounded away from zero.

Assumption 1(c) imposes that there are no ties among the $Q_i$s. This is a convenient assumption, under which we can always transform general $Q_i$ to $Q_i = i/n$ without loss of generality. It holds with probability 1 in the random‐design case if $Q_i$ is continuously distributed.

Define

$$r_n := \min_{1\leq j\leq M} \frac{\|X^{(j)}(t_0)\|_n^2}{\|X^{(j)}\|_n^2},$$

where $X^{(j)}$ and $X^{(j)}(\tau)$ are defined in expression (2.3). Assumption 1(b) implies that $r_n$ is bounded away from zero; in particular, we have that $1 \geq r_n \geq C_3/C_2 > 0$.

Recall that

$$\|\hat f - f_0\|_n := \left[\frac{1}{n}\sum_{i=1}^n \{\hat f(X_i,Q_i) - f_0(X_i,Q_i)\}^2\right]^{1/2}, \tag{4.1}$$

where $\hat f(x,q) := x'\hat\beta + x'\hat\delta 1\{q<\hat\tau\}$ and $f_0(x,q) := x'\beta_0 + x'\delta_0 1\{q<\tau_0\}$. To establish the theoretical results in this paper (in particular, the oracle inequalities in Section 5), let $(\hat\alpha,\hat\tau)$ be the lasso estimator defined by expression (2.5) with

$$\lambda = A\sigma\left\{\frac{\log(3M)}{n r_n}\right\}^{1/2} \tag{4.2}$$

for a constant $A > 2\sqrt{2}/\mu$, where $\mu \in (0,1)$ is a fixed constant. We now present the first theoretical result of this paper.

Theorem 1

(consistency of the lasso). Let assumption 1 hold. Let $\mu$ be a constant such that $0<\mu<1$, and let $(\hat\alpha,\hat\tau)$ be the lasso estimator defined by expression (2.5) with $\lambda$ given by equation (4.2). Then, with probability at least $1-(3M)^{1-A^2\mu^2/8}$, we have

$$\|\hat f - f_0\|_n \leq K_1\{\lambda M(\alpha_0)\}^{1/2},$$

where $K_1 \equiv 2C_1\{C_2(3+\mu)\}^{1/2} > 0$.

The non‐asymptotic upper bound on the prediction risk in theorem 1 can be translated easily into asymptotic convergence. Theorem 1 implies the consistency of the lasso, provided that $n\to\infty$, $M\to\infty$ and $\lambda M(\alpha_0)\to 0$. Recall that $M(\alpha_0)$ represents the sparsity of model (2.1). In view of equation (4.2), the condition $\lambda M(\alpha_0)\to 0$ requires that $M(\alpha_0) = o[\{n r_n/\log(3M)\}^{1/2}]$. This implies that $M(\alpha_0)$ can increase with $n$.
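To make the rate explicit, substitute expression (4.2) into the bound of theorem 1:

$$\|\hat f - f_0\|_n \;\leq\; K_1\{\lambda M(\alpha_0)\}^{1/2} \;=\; K_1 (A\sigma)^{1/2}\, M(\alpha_0)^{1/2}\left\{\frac{\log(3M)}{n r_n}\right\}^{1/4},$$

which tends to 0 exactly when $M(\alpha_0) = o[\{n r_n/\log(3M)\}^{1/2}]$. In code, $r_n$ and the tuning parameter (4.2) are simple to compute; the sketch below is our own illustration (the function name `rn_and_lambda` is ours) and assumes, as in the simulations of Section 6, that $\sigma$ is known.

```python
import numpy as np

def rn_and_lambda(X, Q, t0, A, sigma):
    """r_n of Section 4 and the tuning parameter lambda of (4.2);
    a minimal sketch assuming the noise level sigma is known."""
    n, M = X.shape
    num = ((X * (Q < t0)[:, None]) ** 2).mean(axis=0)  # ||X^{(j)}(t_0)||_n^2
    den = (X ** 2).mean(axis=0)                        # ||X^{(j)}||_n^2
    r_n = (num / den).min()                            # min over j = 1, ..., M
    lam = A * sigma * np.sqrt(np.log(3 * M) / (n * r_n))
    return r_n, lam
```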

Remark 1

Note that the prediction error increases as A or μ increases; however, the probability of correct recovery increases if A or μ increases. Therefore, there is a trade‐off between the prediction error and the probability of correct recovery.

5. Oracle inequalities

In this section, we establish finite sample sparsity oracle inequalities in terms of both the prediction loss and the l1‐estimation loss for unknown parameters. First of all, we make the following assumption.

Assumption 2

(uniform restricted eigenvalue (URE) $(s, c_0, S)$). For some integer $s$ such that $1\leq s\leq 2M$, a positive number $c_0$ and some set $S\subset\mathbb{R}$, the following condition holds:

$$\kappa(s, c_0, S) := \min_{\tau\in S}\;\min_{\substack{J_0\subset\{1,\dots,2M\},\\ |J_0|\leq s}}\;\min_{\substack{\gamma\neq 0,\\ |\gamma_{J_0^c}|_1 \leq c_0|\gamma_{J_0}|_1}} \frac{|X(\tau)\gamma|_2}{\sqrt{n}\,|\gamma_{J_0}|_2} > 0.$$

If $\tau_0$ were known, then assumption 2 would be just a restatement of the restricted eigenvalue assumption of Bickel et al. (2009) with $S = \{\tau_0\}$. Bickel et al. (2009) provided sufficient conditions for the restricted eigenvalue condition. In addition, van de Geer and Bühlmann (2009) showed the relationships between the restricted eigenvalue condition and other conditions on the design matrix, and Raskutti et al. (2010) proved that restricted eigenvalue conditions hold with high probability for a large class of correlated Gaussian design matrices.

If $\tau_0$ is unknown, as in our set‐up, it seems necessary to assume that the restricted eigenvalue condition holds uniformly over $\tau$. We consider separately two cases, depending on whether $\delta_0 = 0$ or not. On the one hand, if $\delta_0 = 0$, so that $\tau_0$ is not identifiable, then we need to assume that the URE condition holds uniformly on the whole parameter space $\mathbb{T}$. On the other hand, if $\delta_0 \neq 0$, so that $\tau_0$ is identifiable, then it suffices to impose that the URE condition holds uniformly on a neighbourhood of $\tau_0$. In the on‐line appendix A, we provide two types of sufficient conditions for assumption 2: one based on modifications of assumption 2 of Bickel et al. (2009) and the other in the same spirit as section 10.1 of van de Geer and Bühlmann (2009). Using the second type of result, we verify primitive sufficient conditions for the URE condition in the context of our simulation designs; see the on‐line appendix A for details.
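Verifying assumption 2 exactly is computationally infeasible in general, since $\kappa$ minimizes over all index sets $J_0$ of size at most $s$. A crude numerical sanity check is possible by sampling random directions in the cone $\{|\gamma_{J_0^c}|_1 \leq c_0|\gamma_{J_0}|_1\}$; the sketch below is our own heuristic, not the paper's verification strategy, and the sampled minimum is only an upper bound on the true $\kappa$ (a value near zero casts doubt on the condition, whereas a large value is not a proof that it holds).

```python
import numpy as np

def ure_sanity_check(X, Q, taus, s, c0, n_draws=2000, seed=0):
    """Monte Carlo upper bound on kappa(s, c0, S) of assumption 2.
    A heuristic sketch only: it samples random cone directions and
    reports the smallest ratio found."""
    rng = np.random.default_rng(seed)
    n, M = X.shape
    kappa_ub = np.inf
    for tau in taus:
        Xtau = np.hstack([X, X * (Q < tau)[:, None]])      # X(tau), n x 2M
        for _ in range(n_draws):
            J0 = rng.choice(2 * M, size=s, replace=False)  # candidate support
            gamma = np.zeros(2 * M)
            gamma[J0] = rng.standard_normal(s)
            Jc = np.setdiff1d(np.arange(2 * M), J0)
            u = rng.standard_normal(Jc.size)
            # scale u so that |gamma_{J0^c}|_1 <= c0 * |gamma_{J0}|_1
            u *= rng.uniform() * c0 * np.abs(gamma[J0]).sum() / np.abs(u).sum()
            gamma[Jc] = u
            ratio = (np.linalg.norm(Xtau @ gamma)
                     / (np.sqrt(n) * np.linalg.norm(gamma[J0])))
            kappa_ub = min(kappa_ub, ratio)
    return kappa_ub
```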

The URE condition enables us to improve the result in theorem 1. Recall that, in theorem 1, the prediction risk is bounded by a factor of $\{\lambda M(\alpha_0)\}^{1/2}$. This bound is too large to yield an oracle inequality. We show below that, thanks to the URE condition, we can establish non‐asymptotic oracle inequalities for the prediction risk as well as the l1‐estimation loss.

The strength of the proposed lasso method is that one need not know, or pretest, whether $\delta_0 = 0$. Although we divide our theoretical results into two cases below, establishing oracle inequalities for the prediction risk and the l1‐estimation loss for $\alpha_0$ does not require knowing whether there is a threshold in the model. This implies that we can predict and estimate $\alpha_0$ precisely without knowing the presence of a threshold effect, and without pretesting for it.

5.1. Case I: no threshold

We first consider the case where $\delta_0 = 0$. In other words, we estimate a threshold model via the lasso method, but the true model is simply the linear model $Y_i = X_i'\beta_0 + U_i$. This is an important case to consider in applications, because one may be unsure not only about covariate selection but also about the existence of a threshold in the model.

Let $\phi_{\max}$ denote the supremum (over $\tau\in\mathbb{T}$) of the largest eigenvalue of $X(\tau)'X(\tau)/n$; by definition, the largest eigenvalue of $X(\tau)'X(\tau)/n$ is then bounded by $\phi_{\max}$ uniformly in $\tau\in\mathbb{T}$. The following theorem gives oracle inequalities for this first case.

Theorem 2

Suppose that $\delta_0 = 0$. Let assumptions 1 and 2 hold with $\kappa = \kappa\{s, (1+\mu)/(1-\mu), \mathbb{T}\}$ for $0<\mu<1$ and $M(\alpha_0)\leq s\leq M$. Let $(\hat\alpha,\hat\tau)$ be the lasso estimator defined by expression (2.5) with $\lambda$ given by expression (4.2). Then, with probability at least $1-(3M)^{1-A^2\mu^2/8}$, we have

$$\|\hat f - f_0\|_n \leq \frac{K_2\sigma}{\kappa}\left\{\frac{\log(3M)}{n r_n}\,s\right\}^{1/2}, \qquad |\hat\alpha-\alpha_0|_1 \leq \frac{K_2\sigma}{\kappa^2}\left\{\frac{\log(3M)}{n r_n}\right\}^{1/2} s, \qquad M(\hat\alpha) \leq \frac{K_2\phi_{\max}}{\kappa^2}\,s$$

for some universal constant $K_2 > 0$.

To appreciate the usefulness of the inequalities derived above, it is worth comparing the inequalities in theorem 2 with those in theorem 7.2 of Bickel et al. (2009). The latter corresponds to the case where $\delta_0 = 0$ is known a priori and $\lambda = 2A\sigma\{\log(M)/n\}^{1/2}$ in our notation. Comparing theorem 2 with theorem 7.2 of Bickel et al. (2009), we see that the lasso estimator in expression (2.5) yields qualitatively the same oracle inequalities as the lasso estimator in the linear model, even though our model is much more overparameterized, in that $\delta$ and $\tau$ are added to $\beta$ as parameters to estimate.

Also, as in Bickel et al. (2009), there is no requirement that the minimum absolute value of the non‐zero components of $\alpha_0$ be bounded away from zero; in other words, there is no need to assume a minimum strength of the signals. Furthermore, $\alpha_0$ is well estimated here even if $\tau_0$ is not identifiable at all. Finally, note that the value of the constant $K_2$ is given in the proof of theorem 2 and that theorem 2 can be translated easily into asymptotic oracle results as well, since both $\kappa$ and $r_n$ are bounded away from zero by the URE condition and assumption 1 respectively.

5.2. Case II: fixed threshold

This subsection explores the case where the threshold effect is well identified and discontinuous. We begin with the following additional assumptions to reflect this.

Assumption 3

(identifiability under sparsity and discontinuity of regression). For a given $s\geq M(\alpha_0)$, for any $\eta$ and $\tau$ such that $|\tau-\tau_0| > \eta \geq \min_i|Q_i-\tau_0|$, and for any $\alpha\in\{\alpha: M(\alpha)\leq s\}$, there is a constant $c>0$ such that

$$\|f_{(\alpha,\tau)} - f_0\|_n^2 > c\eta.$$

Assumption 3 implies, among other things, that, for some $s\geq M(\alpha_0)$ and for any $\alpha\in\{\alpha: M(\alpha)\leq s\}$ and $\tau$ such that $(\alpha,\tau)\neq(\alpha_0,\tau_0)$,

$$\|f_{(\alpha,\tau)} - f_0\|_n \neq 0. \tag{5.1}$$

This condition can be regarded as identifiability of $\tau_0$. If $\tau_0$ were known, a sufficient condition for identifiability under sparsity would be that URE$(s, c_0, \{\tau_0\})$ holds for some $c_0\geq 1$. Thus, the main point of result (5.1) is that there is no sparse representation that is equivalent to $f_0$ when the sample is split at $\tau\neq\tau_0$. In fact, assumption 3 is stronger than mere identifiability of $\tau_0$, as it specifies the rate at which $f$ deviates from $f_0$ as $\tau$ moves away from $\tau_0$, which in turn dictates the bound on the estimation error of $\hat\tau$. We provide further discussion of assumption 3 in the on‐line appendix B.

Remark 2

The restriction $\eta \geq \min_i|Q_i-\tau_0|$ in assumption 3 is necessary since we consider a fixed design for both $X_i$ and $Q_i$. Throughout this section, we implicitly assume that the sample size $n$ is sufficiently large that $\min_{i\neq j}|Q_i-Q_j|$ is very small, implying that the restriction $\eta\geq\min_{i\neq j}|Q_i-Q_j|$ never binds in any of the inequalities below. This is typically true in the random‐design case if $Q_i$ is continuously distributed.

Assumption 4

(smoothness of design). For any $\eta>0$, there is a constant $C<\infty$ such that

$$\sup_j\;\sup_{|\tau-\tau_0|<\eta}\;\frac{1}{n}\sum_{i=1}^n |X_i^{(j)}|^2\, |1\{Q_i<\tau_0\} - 1\{Q_i<\tau\}| \leq C\eta.$$

Assumption 4 has been used in the classical set‐up with a fixed number of stochastic regressors to exclude cases in which $Q_i$ has a point mass at $\tau_0$ or $E(X_i|Q_i=\tau_0)$ is unbounded. In our set‐up, assumption 4 amounts to a deterministic version of a smoothness assumption on the distribution of the threshold variable $Q_i$. When $(X_i,Q_i)$ is a random vector, it is satisfied under the standard assumption that $Q_i$ is continuously distributed and that $E\{|X_i^{(j)}|^2 \mid Q_i=\tau\}$ is continuous and bounded in a neighbourhood of $\tau_0$ for each $j$.

To simplify the notation, in the following theorem we assume without loss of generality that $Q_i = i/n$; then $\mathbb{T} = [t_0,t_1]\subset(0,1)$. In addition, let $\eta_0 = \max[n^{-1}, K_1\{\lambda M(\alpha_0)\}]$, where $K_1$ is the constant in theorem 1.

Assumption 5

(well‐defined second moments). For any $\eta$ such that $1/n \leq \eta \leq \eta_0$, $h_n^2(\eta)$ is bounded, where

$$h_n^2(\eta) := \frac{1}{2n\eta}\sum_{i=\max\{1,\,[n(\tau_0-\eta)]\}}^{\min\{[n(\tau_0+\eta)],\,n\}} (X_i'\delta_0)^2$$

and $[\cdot]$ denotes the integer part of a real number.

Assumption 5 requires $h_n^2(\eta)$ to be well defined and bounded for any $\eta$ such that $1/n\leq\eta\leq\eta_0$; it amounts to a weak regularity condition on the second moments of the fixed design. Assumption 3 implies that $\delta_0\neq 0$ and that $h_n^2(\eta)$ is bounded away from zero. Hence, assumptions 3 and 5 together imply that $h_n^2(\eta)$ is bounded and bounded away from zero.

To present the theorem below, it is necessary to make one additional technical assumption (see assumption 6 in the on‐line appendix E). We have opted not to state assumption 6 here, since we believe that it is just a sufficient condition that does not add much to the understanding of the main result. However, we point out that assumption 6 holds for all sufficiently large $n$, provided that $s\lambda/|\delta_0|_1 \to 0$ as $n\to\infty$. See remark 4 in the on‐line appendix E for details.

We now give the main result of this section.

Theorem 3

Suppose that assumptions 1 and 2 hold with $S = \{\tau: |\tau-\tau_0|\leq\eta_0\}$, $\kappa = \kappa\{s, (2+\mu)/(1-\mu), S\}$ for $0<\mu<1$ and $M(\alpha_0)\leq s\leq M$. Suppose further that assumptions 3, 4 and 5 hold, and let $n$ be sufficiently large that assumption 6 in the on‐line appendix E holds. Let $(\hat\alpha,\hat\tau)$ be the lasso estimator defined by expression (2.5) with $\lambda$ given by expression (4.2). Then, with probability at least $1-(3M)^{1-A^2\mu^2/8} - C_4(3M)^{-C_5/r_n}$ for some positive constants $C_4$ and $C_5$, we have

$$\|\hat f - f_0\|_n \leq \frac{K_3\sigma}{\kappa}\left\{\frac{\log(3M)}{n r_n}\,s\right\}^{1/2}, \qquad |\hat\alpha-\alpha_0|_1 \leq \frac{K_3\sigma}{\kappa^2}\left\{\frac{\log(3M)}{n r_n}\right\}^{1/2} s,$$
$$|\hat\tau-\tau_0| \leq \frac{K_3\sigma^2}{\kappa^2}\,\frac{\log(3M)}{n r_n}\,s, \qquad M(\hat\alpha) \leq \frac{K_3\phi_{\max}}{\kappa^2}\,s$$

for some universal constant $K_3>0$.

Theorem 3 gives the same inequalities (up to constants) as theorem 2 for the prediction risk and the l1‐estimation loss for $\alpha_0$. It is important to note that $|\hat\tau-\tau_0|$ is bounded by a constant times $s\log(3M)/(n r_n)$, whereas $|\hat\alpha-\alpha_0|_1$ is bounded by a constant times $s\{\log(3M)/(n r_n)\}^{1/2}$. This can be viewed as a non‐asymptotic version of the superconsistency of $\hat\tau$ for $\tau_0$. As noted at the end of Section 5.1, since both $\kappa$ and $r_n$ are bounded away from zero (by the URE condition and assumption 1 respectively), theorem 3 immediately implies asymptotic rate results. The values of the constants $C_4$, $C_5$ and $K_3$ are given in the proof of theorem 3.

The main contribution of this section is to extend the well‐known superconsistency result for $\hat\tau$ when $M<n$ (see, for example, Chan (1993) and Seijo and Sen (2011a, b)) to the high dimensional set‐up ($M\gg n$). In both cases, the main reason that superconsistency is achievable for the threshold parameter is that the least squares objective function behaves locally linearly around the true threshold value, rather than locally quadratically as in regular estimation problems. An interesting remaining question is whether the superconsistency of $\hat\tau$ can be obtained under weaker conditions, perhaps without a restricted eigenvalue condition.

6. Monte Carlo experiments

In this section we conduct simulation studies to examine the properties of the proposed lasso estimator. The baseline model is model (1.1), where $X_i$ is an $M$‐dimensional vector generated from $N(0, I)$, $Q_i$ is a scalar generated from the uniform distribution on $(0,1)$ and the error term $U_i$ is generated from $N(0, 0.5^2)$. The threshold parameter is set to $\tau_0 = 0.3$, $0.4$ or $0.5$, depending on the design, and the coefficients are set to $\beta_0 = (1,0,1,0,\dots,0)'$ and $\delta_0 = c\,(0,1,1,0,\dots,0)'$, where $c=0$ or $c=1$. Note that there is no threshold effect when $c=0$. The number of observations is $n=200$. Finally, the dimension of $X_i$ is set to $M = 50$, $100$, $200$ or $400$, so that the total number of regressors is 100, 200, 400 or 800 respectively. The range of $\tau$ is $\mathbb{T} = [0.15, 0.85]$.

We can estimate the parameters by the standard lasso–least angle regression algorithm of Efron et al. (2004) without much modification. Given a regularization parameter $\lambda$, we estimate the model at each point of a grid of 71 equispaced values of $\tau$ spanning $\mathbb{T}$; this step can be carried out with the standard linear lasso. Next, we plug the estimated parameters $\hat\alpha(\tau) := (\hat\beta(\tau)', \hat\delta(\tau)')'$ for each $\tau$ into the objective function and choose $\hat\tau$ by expression (2.4). Finally, $\hat\alpha$ is given by $\hat\alpha(\hat\tau)$. The regularization parameter $\lambda$ is chosen by expression (4.2), where $\sigma=0.5$ is assumed to be known. For the constant $A$, we use four values: $A = 2.8, 3.2, 3.6, 4.0$. A sketch of this procedure appears below.
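For concreteness, here is a sketch of the baseline design and of one estimation run, reusing the hypothetical `rn_and_lambda` and `threshold_lasso` functions sketched earlier; the placement of the non‐zero coefficients follows the text as printed, and everything else (variable names, seed) is our own.

```python
import numpy as np

rng = np.random.default_rng(0)

# baseline design of Section 6 (a sketch of the data-generating process)
n, M, c, tau0, sigma = 200, 50, 1.0, 0.5, 0.5
beta0 = np.zeros(M); beta0[[0, 2]] = 1.0   # beta_0 = (1, 0, 1, 0, ..., 0)'
delta0 = np.zeros(M); delta0[[1, 2]] = c   # delta_0 = c (0, 1, 1, 0, ..., 0)'

X = rng.standard_normal((n, M))            # X_i ~ N(0, I)
Q = rng.uniform(0.0, 1.0, n)               # Q_i ~ U(0, 1)
y = X @ beta0 + (X @ delta0) * (Q < tau0) + rng.normal(0.0, sigma, n)

tau_grid = np.linspace(0.15, 0.85, 71)     # 71 equispaced points on T
_, lam = rn_and_lambda(X, Q, t0=0.15, A=2.8, sigma=sigma)  # lambda from (4.2)
alpha_hat, tau_hat = threshold_lasso(y, X, Q, tau_grid, lam)
```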

Table 3 and Figs 1 and 2 summarize the simulation results. To provide benchmarks for the lasso estimator, we also report results for least squares estimation ('least squares'), which is available only when $M=50$, and for two oracle models (oracle 1 and oracle 2). Oracle 1 assumes that the regressors with non‐zero coefficients are known; oracle 2 assumes, in addition, that the true threshold parameter $\tau_0$ is known. Thus, when $c\neq 0$, oracle 1 estimates $(\beta^{(1)}, \beta^{(3)}, \delta^{(2)}, \delta^{(3)})$ and $\tau$ by least squares, whereas oracle 2 estimates only $(\beta^{(1)}, \beta^{(3)}, \delta^{(2)}, \delta^{(3)})$. When $c=0$, both oracles estimate only $(\beta^{(1)}, \beta^{(3)})$. All results are based on 400 replications of each sample.

Table 3.

Simulation results with M=50a

Threshold parameter | Estimation method | Constant for λ | Prediction error (mean; median; standard deviation) | E[M(α̂)] | E|α̂ − α0|1 | E|τ̂ − τ0|
Jump scale: c = 1
τ0=0.5
Least squares None 0.285 0.276 0.074 100.00 7.066 0.008
Lasso A=2.8 0.041 0.030 0.035 12.94 0.466 0.010
A=3.2 0.048 0.033 0.049 10.14 0.438 0.013
A=3.6 0.067 0.037 0.086 8.44 0.457 0.024
A=4.0 0.095 0.050 0.120 7.34 0.508 0.040
Oracle 1 None 0.013 0.006 0.019 4.00 0.164 0.004
Oracle 2 None 0.005 0.004 0.004 4.00 0.163 0.000
τ0=0.4
Least squares None 0.317 0.304 0.095 100.00 7.011 0.008
Lasso A=2.8 0.052 0.034 0.063 13.15 0.509 0.016
A=3.2 0.063 0.037 0.083 10.42 0.489 0.023
A=3.6 0.090 0.045 0.121 8.70 0.535 0.042
A=4.0 0.133 0.061 0.162 7.68 0.634 0.078
Oracle 1 None 0.014 0.006 0.022 4.00 0.163 0.004
Oracle 2 None 0.005 0.004 0.004 4.00 0.163 0.000
τ0=0.3
Least squares None 2.559 0.511 16.292 100.00 12.172 0.012
Lasso A=2.8 0.062 0.035 0.091 13.45 0.602 0.030
A=3.2 0.089 0.041 0.125 10.85 0.633 0.056
A=3.6 0.127 0.054 0.159 9.33 0.743 0.099
A=4.0 0.185 0.082 0.185 8.43 0.919 0.168
Oracle 1 None 0.012 0.006 0.017 4.00 0.177 0.004
Oracle 2 None 0.005 0.004 0.004 4.00 0.176 0.000
Jump scale: c = 0
Least squares None 6.332 0.460 41.301 100.00 20.936 (b)
Lasso A=2.8 0.013 0.011 0.007 9.30 0.266
A=3.2 0.014 0.012 0.008 6.71 0.227
A=3.6 0.015 0.014 0.009 4.95 0.211
A=4.0 0.017 0.016 0.010 3.76 0.204
Oracle 1 and oracle 2 None 0.002 0.002 0.003 2.00 0.054
a

M denotes the column size of Xi and τ denotes the threshold parameter. Oracle 1 and oracle 2 are estimated by least squares when sparsity is known and when sparsity and τ0 are known respectively. All simulations are based on 400 replications of a sample with 200 observations.

b

Not applicable.

Figure 1. Mean prediction errors and mean M(α̂) (♦, τ0=0.3; □, τ0=0.4; ◯, τ0=0.5; △, c=0): (a) M=100; (b) M=200; (c) M=400

Figure 2. Mean l1‐errors for α and τ (♦, τ0=0.3; □, τ0=0.4; ◯, τ0=0.5; △, c=0): (a) M=100; (b) M=200; (c) M=400

The reported mean‐squared prediction error PE for each sample is calculated numerically as follows. For each sample $s$, we have the estimates $\hat\beta_s$, $\hat\delta_s$ and $\hat\tau_s$. Given these estimates, we generate new data $\{(Y_j, X_j, Q_j)\}$ of 400 observations and calculate the prediction error as

$$\widehat{\mathrm{PE}}_s = \frac{1}{400}\sum_{j=1}^{400}\{f_0(x_j, q_j) - \hat f(x_j, q_j)\}^2. \tag{6.1}$$

The mean, median and standard deviation of the prediction error are calculated from the 400 replications $\{\widehat{\mathrm{PE}}_s\}_{s=1}^{400}$. We also report the mean of $M(\hat\alpha)$ and the l1‐errors for $\alpha$ and $\tau$. Table 3 reports the simulation results for $M=50$. For designs with $M>50$, the least squares estimator is not available, and we summarize the same statistics for the lasso estimator only in Figs 1 and 2. A sketch of the prediction error calculation for a single replication follows.
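The sketch below is our own illustration of equation (6.1), with `alpha_hat` and `tau_hat` taken from an estimation run such as the one sketched earlier in this section.

```python
import numpy as np

def prediction_error(alpha_hat, tau_hat, beta0, delta0, tau0, rng, m=400):
    """Numerical prediction error (6.1) on a fresh sample; a sketch."""
    M = len(beta0)
    Xn = rng.standard_normal((m, M))           # new X ~ N(0, I)
    Qn = rng.uniform(0.0, 1.0, m)              # new Q ~ U(0, 1)
    f0 = Xn @ beta0 + (Xn @ delta0) * (Qn < tau0)
    fhat = Xn @ alpha_hat[:M] + (Xn @ alpha_hat[M:]) * (Qn < tau_hat)
    return ((f0 - fhat) ** 2).mean()

# e.g. pe = prediction_error(alpha_hat, tau_hat, beta0, delta0, tau0, rng)
```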

When $M=50$, across all designs, the proposed lasso estimator performs better than the least squares estimator in terms of mean and median prediction errors, the mean of $M(\hat\alpha)$ and the l1‐error for $\alpha$. The lasso's performance improves markedly when there is no threshold effect ($c=0$). This result confirms the robustness of the lasso estimator to whether or not a threshold effect is present. However, the least squares estimator performs better than the lasso in terms of estimating $\tau_0$ when $c=1$, although the difference is much smaller than the differences in prediction errors and in the l1‐error for $\alpha$.

From Figs 1 and 2, we can reconfirm the robustness of the lasso estimator when M=100,200,400. As predicted by the theory that was developed in previous sections, the prediction error and l1‐errors for α and τ increase slowly as M increases. The graphs also show that the results are quite uniform across different regularization parameter values except A=4.0.

In the on‐line appendix F, we report additional simulation results that allow correlation between covariates. Specifically, the $M$‐dimensional vector $X_i$ is generated from a multivariate normal $N(0,\Sigma)$ distribution with $(\Sigma)_{i,j} = \rho^{|i-j|}$, where $(\Sigma)_{i,j}$ denotes the $(i,j)$ element of the $M\times M$ covariance matrix $\Sigma$ and $\rho = 0.3$. All other random variables are the same as above. We obtained results very similar to those for the previous cases: the lasso outperforms the least squares estimator, and the prediction error, the mean of $M(\hat\alpha)$ and the l1‐errors increase very slowly as $M$ increases. See the on‐line appendix F for further details; it also reports satisfactory simulation results on the frequencies of selecting the true parameters for both $\rho=0$ and $\rho=0.3$.

In sum, the simulation results confirm the theoretical results that were developed earlier and show that the lasso estimator proposed will be useful for the high dimensional threshold regression model.

7. Conclusions

We have considered a high dimensional regression model with a possible change point due to a covariate threshold and have developed the lasso method. We have derived non‐asymptotic oracle inequalities and have illustrated the usefulness of our proposed estimation method via simulations and a real data application.

We conclude this paper by suggesting some areas of future research. First, it would be interesting to extend other penalized estimators (e.g. the adaptive lasso of Zou (2006) and the smoothly clipped absolute deviation penalty of Fan and Li (2001)) to our set‐up and to see whether they would improve the performance of our estimation method. Second, an extension to multiple change points is also an important research topic. There have been some advances in this direction, especially regarding key issues like computational cost and the determination of the number of change points (see, for example, Harchaoui and Lévy‐Leduc (2010) and Frick et al. (2014)); however, these are confined to the single‐regressor case, and the extension to a large number of regressors would be highly interesting. Finally, it would also be interesting to investigate minimax lower bounds for the proposed estimator and its prediction risk, as Raskutti et al. (2011, 2012) did in high dimensional linear regression set‐ups.

Supporting information

‘Online appendices’.

Acknowledgements

We thank Marine Carrasco, Yuan Liao, Ya'acov Ritov, two referees and seminar participants at various places for their helpful comments. This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (NRF‐2012S1A5A8023573), the Institute of Economic Research of Seoul National University, by the European Research Council (ERC‐2009‐StG‐240910‐ROMETA) and by the Social Sciences and Humanities Research Council of Canada. This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (www.sharcnet.ca) and Compute/Calcul Canada.

References

  1. Barro, R. and Lee, J. (1994) Data set for a panel of 139 countries. Report, National Bureau of Economic Research, Cambridge. (Available from http://admin.nber.org/pub/barro.lee/.)
  2. Barro, R. and Sala‐i‐Martin, X. (1995) Economic Growth. New York: McGraw‐Hill.
  3. Belloni, A. and Chernozhukov, V. (2011a) l1‐penalized quantile regression in high‐dimensional sparse models. Ann. Statist., 39, 82–130.
  4. Belloni, A. and Chernozhukov, V. (2011b) High dimensional sparse econometric models: an introduction. In Inverse Problems and High‐dimensional Estimation (eds P. Alquier, E. Gautier and G. Stoltz), pp. 121–156. Berlin: Springer.
  5. Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009) Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37, 1705–1732.
  6. Bradic, J., Fan, J. and Jiang, J. (2012) Regularization for Cox's proportional hazards model with NP‐dimensionality. Ann. Statist., 39, 3092–3120.
  7. Bradic, J., Fan, J. and Wang, W. (2011) Penalized composite quasi‐likelihood for ultrahigh dimensional variable selection. J. R. Statist. Soc. B, 73, 325–349.
  8. Bühlmann, P. and van de Geer, S. (2011) Statistics for High‐dimensional Data: Methods, Theory and Applications. New York: Springer.
  9. Bunea, F., Tsybakov, A. and Wegkamp, M. (2007) Sparsity oracle inequalities for the Lasso. Electron. J. Statist., 1, 169–194.
  10. Candès, E. and Tao, T. (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist., 35, 2313–2351.
  11. Card, D., Mas, A. and Rothstein, J. (2008) Tipping and the dynamics of segregation. Q. J. Econ., 123, 177–218.
  12. Chan, K. S. (1993) Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model. Ann. Statist., 21, 520–533.
  13. Ciuperca, G. (2014) Model selection by lasso methods in a change‐point model. Statist. Pap., 55, 349–374.
  14. Durlauf, S. N. and Johnson, P. A. (1995) Multiple regimes and cross‐country growth behavior. J. Appl. Econometr., 10, 365–384.
  15. Durlauf, S., Johnson, P. and Temple, J. (2005) Growth econometrics. In Handbook of Economic Growth (eds P. Aghion and S. N. Durlauf), pp. 555–677. Amsterdam: Elsevier.
  16. Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004) Least angle regression. Ann. Statist., 32, 407–499.
  17. Fan, J. and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Ass., 96, 1348–1360.
  18. Fan, J. and Lv, J. (2010) A selective overview of variable selection in high dimensional feature space. Statist. Sin., 20, 101–148.
  19. Fan, J. and Lv, J. (2011) Nonconcave penalized likelihood with NP‐dimensionality. IEEE Trans. Inform. Theor., 57, 5467–5484.
  20. Fan, J. and Peng, H. (2004) Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist., 32, 928–961.
  21. Frick, K., Munk, A. and Sieling, H. (2014) Multiscale change point inference (with discussion). J. R. Statist. Soc. B, 76, 495–580.
  22. van de Geer, S. A. (2008) High‐dimensional generalized linear models and the lasso. Ann. Statist., 36, 614–645.
  23. van de Geer, S. A. and Bühlmann, P. (2009) On the conditions used to prove oracle results for the Lasso. Electron. J. Statist., 3, 1360–1392.
  24. Hansen, B. E. (2000) Sample splitting and threshold estimation. Econometrica, 68, 575–603.
  25. Harchaoui, Z. and Lévy‐Leduc, C. (2008) Catching change‐points with Lasso. In Advances in Neural Information Processing Systems. Cambridge: MIT Press.
  26. Harchaoui, Z. and Lévy‐Leduc, C. (2010) Multiple change‐point estimation with a total variation penalty. J. Am. Statist. Ass., 105, 1480–1493.
  27. Huang, J., Horowitz, J. L. and Ma, S. (2008a) Asymptotic properties of bridge estimators in sparse high‐dimensional regression models. Ann. Statist., 36, 587–613.
  28. Huang, J., Ma, S. and Zhang, C.‐H. (2008b) Adaptive lasso for sparse high‐dimensional regression models. Statist. Sin., 18, 1603–1618.
  29. Kim, Y., Choi, H. and Oh, H.‐S. (2008) Smoothly clipped absolute deviation on high dimensions. J. Am. Statist. Ass., 103, 1665–1673.
  30. Lee, S., Seo, M. and Shin, Y. (2011) Testing for threshold effects in regression models. J. Am. Statist. Ass., 106, 220–231.
  31. Lin, W. and Lv, J. (2013) High‐dimensional sparse additive hazards regression. J. Am. Statist. Ass., 108, 247–264.
  32. Meinshausen, N. and Yu, B. (2009) Lasso‐type recovery of sparse representations for high‐dimensional data. Ann. Statist., 37, 246–270.
  33. Pesaran, M. H. and Pick, A. (2007) Econometric issues in the analysis of contagion. J. Econ. Dynam. Control, 31, 1245–1277.
  34. Raskutti, G., Wainwright, M. J. and Yu, B. (2010) Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res., 11, 2241–2259.
  35. Raskutti, G., Wainwright, M. J. and Yu, B. (2011) Minimax rates of estimation for high‐dimensional linear regression over lq‐balls. IEEE Trans. Inform. Theor., 57, 6976–6994.
  36. Raskutti, G., Wainwright, M. J. and Yu, B. (2012) Minimax‐optimal rates for sparse additive models over kernel classes via convex programming. J. Mach. Learn. Res., 13, 389–427.
  37. Seijo, E. and Sen, B. (2011a) Change‐point in stochastic design regression and the bootstrap. Ann. Statist., 39, 1580–1607.
  38. Seijo, E. and Sen, B. (2011b) A continuous mapping theorem for the smallest argmax functional. Electron. J. Statist., 5, 421–439.
  39. Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 58, 267–288.
  40. Tibshirani, R. (2011) Regression shrinkage and selection via the lasso: a retrospective (with comments). J. R. Statist. Soc. B, 73, 273–282.
  41. Tong, H. (1990) Non‐linear Time Series: a Dynamical System Approach. New York: Oxford University Press.
  42. Wang, L., Wu, Y. and Li, R. (2012) Quantile regression for analyzing heterogeneity in ultra‐high dimension. J. Am. Statist. Ass., 107, 214–222.
  43. Wu, Y. (2008) Simultaneous change point analysis and variable selection in a regression problem. J. Multiv. Anal., 99, 2154–2171.
  44. Zhang, N. R. and Siegmund, D. O. (2012) Model selection for high dimensional multi‐sequence change‐point problems. Statist. Sin., 22, 1507–1538.
  45. Zou, H. (2006) The adaptive lasso and its oracle properties. J. Am. Statist. Ass., 101, 1418–1429.
