ABSTRACT
In this paper, we consider structural change in a class of discrete-valued time series, where the true conditional distribution of the observations is assumed to be unknown. The conditional mean of the process depends on a parameter which may change over time. We provide sufficient conditions for the consistency and the asymptotic normality of the Poisson quasi-maximum likelihood estimator (QMLE) of the model. We consider epidemic change-point detection and propose a test statistic based on the QMLE of the parameter. Under the null hypothesis of a constant parameter (no change), the test statistic converges to a distribution obtained from increments of a Brownian bridge. The test statistic diverges to infinity under the epidemic alternative, which establishes that the proposed procedure is consistent in power. The effectiveness of the proposed procedure is illustrated by simulated and real data examples.
KEYWORDS: Discrete valued time series, change-point detection, epidemic alternative, semi-parametric statistic, Poisson QMLE
MATHEMATICS SUBJECT CLASSIFICATIONS: 62M10, 62F03, 62F12
1. Introduction
Change-point detection is a vast and active field of research, with applications in many areas such as epidemiology, finance, ecology and biology. This paper focuses on the epidemic change-point problem (see, for instance, [18,26]) in a large class of integer-valued time series. The epidemic change-point problem consists of testing the null hypothesis of no change against the alternative that two changes occur in the data-generating process, where the first and third segments share the same structure, which differs from that of the second segment.
Assume that is a time series of counts and denote by the σ-field generated by the whole past at time t−1. Let Θ be a fixed compact subset of ( ) and . For any , define the class of integer-valued time series given by
Class : The process belongs to if it satisfies:
(1) |
where is a measurable non-negative function, assumed to be known up to the parameter θ. Ahmad and Francq [1] studied inference in the semiparametric setting in the class , whereas [8,9] focused on model selection and multiple change-point problems in this class. Numerous classical integer-valued time series models belong to the class : for instance, the Poisson INGARCH models (see, for instance, [12]), the negative binomial INGARCH models (proposed by [27]), the binomial INGARCH models (see [25]), the Poisson exponential autoregressive models (see [13]) and the INAR models (see [23,24]).
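To make the class concrete, the following minimal sketch simulates one of its members, a Poisson INGARCH(1,1) process whose conditional mean follows the recursion λ_t = ω + αY_{t−1} + βλ_{t−1}. The function name, the parameter names (omega, alpha, beta), the burn-in length and the use of NumPy are illustrative assumptions, not part of the paper.

```python
import numpy as np


def simulate_poisson_ingarch(n, omega, alpha, beta, burn_in=500, seed=None):
    """Simulate a Poisson INGARCH(1,1) path (illustrative member of the class).

    Y_t | past ~ Poisson(lambda_t),  lambda_t = omega + alpha * Y_{t-1} + beta * lambda_{t-1}.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    rng = np.random.default_rng(seed)
    total = n + burn_in
    y = np.zeros(total, dtype=np.int64)
    lam = np.empty(total)
    lam[0] = omega / (1.0 - alpha - beta)  # start from the stationary mean
    y[0] = rng.poisson(lam[0])
    for t in range(1, total):
        # conditional mean: a function of the past observations only
        lam[t] = omega + alpha * y[t - 1] + beta * lam[t - 1]
        y[t] = rng.poisson(lam[t])
    # drop the burn-in so that the arbitrary start value is forgotten
    return y[burn_in:], lam[burn_in:]
```

For example, `simulate_poisson_ingarch(20000, 1.0, 0.3, 0.4, seed=1)` should give a sample mean close to the stationary mean ω/(1 − α − β) ≈ 3.33.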
First, we consider the class for some and carry out inference on the parameter . We address the consistency and the asymptotic normality of the Poisson quasi-maximum likelihood estimator (QMLE). These results coincide with those obtained by [1], but the conditions imposed here appear to be more straightforward than those required by these authors (see Section 2 below).
Second, we focus on testing against an epidemic alternative, in order to detect changes in the parameter of the class . The idea is that the parameter changes at time , and then returns to its original value after time ; that is, an ‘epidemic’ occurs between and . This question has been addressed in several works; see, among others, [2,3,5,6,14–16,18–21,26]. Most of these procedures were developed for epidemic change-point detection in the mean of random variables. Moreover, the case of count time series has received little attention in the literature, although these models are very useful in many fields (see Section 5 for an application to the number of hospital admissions). For the general class , we propose a test procedure based on the Poisson QMLE to detect an epidemic change in the parameter . Under the null hypothesis (no change), the test statistic converges to a distribution obtained from increments of a Brownian bridge; under the epidemic alternative, it diverges to infinity.
The paper is organized as follows. In Section 2, we set some assumptions, define the Poisson QMLE and establish its asymptotic properties. Section 3 is devoted to the construction of the test statistic and the asymptotic studies under the null and the epidemic alternative. Some simulation results are displayed in Section 4. Section 5 focuses on a real data example and Section 6 provides the proofs of the main results.
2. Assumptions and Poisson QMLE
Throughout the sequel, the following notations will be used:
, for any ;
, for any matrix ; where denotes the set of matrices of dimension with coefficients in ;
for any function ;
, where Y is a random vector with finite order moments;
for any such as ;
.
In the sequel, we will denote by 0 the null vector of any vector space. Consider the following classical contraction condition on the function .
Assumption 1
(i = 0, 1, 2): For any , the function is i times continuously differentiable on Θ with ; and there exists a sequence of non-negative real numbers satisfying (or for i = 1, 2) such that for any ,
Throughout the paper, it is assumed that any belonging to is a stationary and ergodic process satisfying:
(2) |
Let be a trajectory with and . Then, for any subset , the conditional Poisson (quasi)log-likelihood computed on is given (up to a constant) by
where . This conditional (quasi)log-likelihood is approximated (see also [1]) by
(3) |
where . According to (3), the Poisson QMLE of computed on is defined by
(4) |
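As an illustration of (3) and (4), the following sketch computes the Poisson QMLE in the simplest case of an INARCH(1) representation, λ_t = ω + αY_{t−1} (the representation used in Section 5), by maximizing Σ_{t=2}^{n} (Y_t log λ_t − λ_t) with Newton–Raphson and step halving. Because λ_t depends only on Y_{t−1}, no initial value is needed here and the approximated quasi-log-likelihood coincides with the exact one on t = 2, …, n. The function name, the starting point and the stopping rule are our own assumptions; this is not the authors' R implementation.

```python
import numpy as np


def poisson_qmle_inarch1(y, max_iter=200, tol=1e-8):
    """Poisson QMLE of theta = (omega, alpha) in lambda_t = omega + alpha * Y_{t-1}."""
    y = np.asarray(y, dtype=float)
    x = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # d lambda_t / d theta = (1, Y_{t-1})
    z = y[1:]

    def qll(theta):
        # Poisson quasi-log-likelihood, up to a constant
        lam = x @ theta
        if np.any(lam <= 0):
            return -np.inf  # outside the admissible region
        return float(np.sum(z * np.log(lam) - lam))

    theta = np.array([z.mean(), 0.0])  # start from an i.i.d. Poisson fit
    current = qll(theta)
    for _ in range(max_iter):
        lam = x @ theta
        grad = x.T @ (z / lam - 1.0)  # score of the quasi-log-likelihood
        if np.max(np.abs(grad)) < tol * len(z):
            break  # score is numerically zero
        hess = -(x * (z / lam**2)[:, None]).T @ x  # Hessian (negative definite)
        step = np.linalg.solve(hess, -grad)  # Newton direction
        slack = 1e-12 * (1.0 + abs(current))  # tolerate round-off in the comparison
        t = 1.0
        while t > 1e-10:
            candidate = theta + t * step
            value = qll(candidate)
            if value >= current - slack:
                break
            t /= 2.0  # halve the step until the criterion does not decrease
        else:
            break  # no ascent step found: keep the current theta
        theta, current = candidate, value
    return theta
```

The step-halving guard keeps every iterate inside the region where all λ_t are positive, which plays the role of the compact parameter set Θ in this toy setting.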
When is a trajectory of a process belonging to , we impose the following assumptions to study the asymptotic behavior of the Poisson QMLE.
- (A0): for all , ; moreover, such that , for all ;
- (A1): is an interior point of ;
- (A2): for all , .
The above assumptions allow us to establish the consistency and the asymptotic normality of the QMLE. Such results have already been obtained by [1]; however, the conditions needed in the following proposition appear to be more straightforward. See also Remark 2 in [10], which points out that many of the assumptions in [1] (for instance, (4), (11), (12), (14), (15)) follow from the Lipschitz-type conditions above.
Proposition 2.1
Assume that is a trajectory of a process belonging to . Let and be two integer-valued sequences such that , and as .
- (i)
- (ii)
The proof of this proposition relies on results established in [11] without the assumption of a ‘conditional Poisson distribution’.
For any with , define the following matrices:
According to (A2), one can easily show that the matrices I and J are symmetric and positive definite. Further, part (i) of Proposition 2.1 implies the almost sure convergence of and to and , respectively. Therefore, is a consistent estimator of the covariance matrix Σ.
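A minimal sketch of these plug-in estimators for the INARCH(1) case λ_t = ω + αY_{t−1}, assuming the standard sandwich form of the Poisson QMLE (cf. [1]): J = E[λ_t^{−1} (∂λ_t/∂θ)(∂λ_t/∂θ)′], I = E[(Y_t − λ_t)² λ_t^{−2} (∂λ_t/∂θ)(∂λ_t/∂θ)′] and Σ = J^{−1} I J^{−1}. These formulas and the function name are our assumptions for illustration.

```python
import numpy as np


def sandwich_covariance_inarch1(y, theta):
    """Empirical Sigma = J^{-1} I J^{-1} for lambda_t = omega + alpha * Y_{t-1} (assumed form)."""
    y = np.asarray(y, dtype=float)
    g = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # rows: d lambda_t / d theta
    z = y[1:]
    lam = g @ np.asarray(theta, dtype=float)
    n = len(z)
    j_hat = (g / lam[:, None]).T @ g / n  # average of g g' / lambda_t
    i_hat = (g * ((z - lam) ** 2 / lam**2)[:, None]).T @ g / n  # average of (Y_t - lambda_t)^2 g g' / lambda_t^2
    j_inv = np.linalg.inv(j_hat)
    return j_inv @ i_hat @ j_inv
```

Evaluated at the QMLE, `np.sqrt(np.diag(sigma) / n)` gives approximate standard errors. When the conditional distribution really is Poisson, I and J coincide and Σ reduces to J^{−1}.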
3. Change-point test and asymptotic results
Assume that is an observed trajectory of the process ; we wish to test the null hypothesis of a constant parameter
- H0: is a trajectory of the process stationary with ,
against the epidemic alternative
- H1: there exists (with and ) such that and are trajectories of a process , and is a trajectory of a process .
We derive a retrospective test procedure in a semi-parametric setting, with a statistic based on the Poisson QMLE. Suppose that and are two integer-valued sequences such that: , and . For all , define the matrix
(7) |
and the subset
(8) |
For all , we introduce
(9) |
Consider thus the test statistic given by
(10) |
Note that the sequences and play an important role in the construction of the proposed test statistic . In the theoretical study, they make it possible to apply the convergence results of Proposition 2.1 when establishing the consistency of the proposed procedure. For example, ensures that the lengths of , and are sufficiently large (see (8)), whereas ensures that the length of (see (7)) is large enough as . In practice, they ensure that the lengths of , , and are not too small, so that the numerical algorithm used to compute the estimators on these segments converges. See also [11,17] for further comments on such sequences. The matrix is also essential for proving the asymptotic properties of , because: (i) under , each of the three matrices in the formula of converges almost surely to the covariance matrix Σ; and (ii) under the epidemic alternative, the first and third matrices converge to the covariance matrix of the stationary model of the first (equivalently, the third) regime, which is positive definite. The consistency of the second matrix is not ensured under the alternative, but this matrix is positive semi-definite.
Note that a weight function can be used to increase the power of the test procedure based on the statistic . See, for instance, [7,10,11] for some examples. The statistic can be seen as an extension, to any parameter, of the test statistics proposed by [19] (statistic ), [16] (statistic ), [5] (statistic ) or [2] (statistic ) in the context of mean change analysis. Indeed, in the particular case of change-point detection in the mean, with the empirical mean computed on the segment , the statistic is equivalent to those proposed by these authors.
The test statistic evaluates the distance between and , for all . These distances remain small in the absence of a change-point (i.e. under ). Thus, the procedure rejects the null hypothesis if there exist two instants and such that the corresponding distance exceeds a suitably chosen constant.
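In the mean-change special case mentioned above, the statistic is equivalent to a CUSUM-type quantity built from the partial sums S_k = X_1 + … + X_k. The sketch below evaluates such a quantity by brute force over all admissible pairs k < k′, where the segment (k, k′] contains observations k+1, …, k′, a known variance σ² plays the role of the matrix normalization, and a trimming integer v stands in for the sequences discussed above. This normalization and the names `epidemic_cusum_bruteforce`, `sigma2` and `v` are our assumptions, not necessarily the exact definitions (9) and (10).

```python
import numpy as np


def epidemic_cusum_bruteforce(x, sigma2, v=1):
    """max over v <= k < k' <= n - v of (S_{k'} - S_k - (k' - k) S_n / n)^2 / (n * sigma2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate([[0.0], np.cumsum(x)])  # s[k] = S_k = x_1 + ... + x_k, s[0] = 0
    best, best_pair = -np.inf, None
    for k in range(v, n - v):
        for kp in range(k + 1, n - v + 1):
            # deviation of the segment sum over (k, kp] from its share of the total sum
            dev = s[kp] - s[k] - (kp - k) * s[n] / n
            value = dev**2 / (n * sigma2)
            if value > best:
                best, best_pair = value, (k, kp)
    return float(best), best_pair
```

With d = 1, the returned value would be compared with a critical value from the first column of Table 1; the double loop costs O(n²) evaluations, which is acceptable for a few hundred observations.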
The following theorem establishes the asymptotic behavior of the test statistic under the null hypothesis.
Theorem 3.1
Under with , assume that (A0)–(A2), (i = 0, 1, 2) and (2) (with ) and (6) hold. Then,
(11)
where is a d-dimensional Brownian bridge.
For a significance level , the critical region of the test is then , where is the -quantile of the distribution of . This assures that the test procedure has correct size asymptotically. Table 1 below shows the values of for and , which are obtained by computing the empirical quantiles through Monte-Carlo simulations based on 5000 replications. The distribution was evaluated on a grid of size 1000.
Table 1.
Some empirical -quantiles of the distribution of .
| α \ d | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 0.01 | 3.907 | 7.320 | 12.384 | 16.004 | 19.039 |
| 0.05 | 2.973 | 5.690 | 8.948 | 11.708 | 14.471 |
| 0.10 | 2.503 | 4.988 | 7.650 | 9.954 | 12.410 |
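The Monte Carlo procedure behind Table 1 can be sketched as follows, under our reading of the limit in (11) as sup_{0≤u<v≤1} ‖W_d(v) − W_d(u)‖², approximated on the grid {0, 1/m, …, 1}. The function name, the default arguments and the seed handling are assumptions. For d = 1 the supremum is the squared range of a Brownian bridge, i.e. the square of the limit of Kuiper's statistic, whose asymptotic 5% critical value 1.747 gives about 3.05 after squaring; the grid-based value 2.973 in Table 1 is slightly smaller, as expected from a discretized supremum.

```python
import numpy as np


def simulate_limit_quantiles(d=1, alphas=(0.01, 0.05, 0.10), reps=5000, grid=1000, seed=None):
    """Monte Carlo (1 - alpha)-quantiles of sup_{0<=u<v<=1} ||W(v) - W(u)||^2,
    W a d-dimensional Brownian bridge, with u, v restricted to {0, 1/grid, ..., 1}."""
    rng = np.random.default_rng(seed)
    t = np.arange(grid + 1) / grid
    draws = np.empty(reps)
    for r in range(reps):
        steps = rng.normal(scale=np.sqrt(1.0 / grid), size=(grid, d))
        b = np.vstack([np.zeros((1, d)), np.cumsum(steps, axis=0)])  # Brownian motion B(t)
        w = b - t[:, None] * b[-1]  # Brownian bridge W(t) = B(t) - t B(1)
        if d == 1:
            draws[r] = (w.max() - w.min()) ** 2  # largest squared increment = squared range
        else:
            sq = np.sum(w**2, axis=1)
            dist2 = sq[:, None] + sq[None, :] - 2.0 * (w @ w.T)  # ||W(v) - W(u)||^2, all pairs
            draws[r] = dist2.max()
    return {a: float(np.quantile(draws, 1.0 - a)) for a in alphas}
```

With the defaults (5000 replications, grid of size 1000), the output should be close to the corresponding column of Table 1, up to Monte Carlo error that depends on the seed.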
Under the epidemic alternative, we set the following additional condition.
Assumption 2
There exists such that (where denotes the integer part).
Combining all the regularity assumptions given above, we obtain the following result.
Theorem 3.2
Under with and belonging to , assume that Assumption 2, (A0)–(A2), (i = 0, 1, 2), (2) (with ) and (6) hold. Then,
(12)
This theorem establishes the consistency in power of the proposed procedure. Under , an estimator of the vector of breakpoints is given by
The properties of this estimator, for example the asymptotic behavior of , are left for future research.
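In the mean-change special case, the maximization over pairs need not be done by brute force. Writing Z_k = (S_k − kS_n/n)/√(nσ²), the quantity maximized is (Z_{k′} − Z_k)², so its maximum is (max_k Z_k − min_k Z_k)² and the estimated breakpoints are the argmin and argmax of Z, sorted in time. The sketch below assumes a known variance σ² and a trimming integer v; the function name is hypothetical.

```python
import numpy as np


def epidemic_breakpoints_mean(x, sigma2, v=1):
    """Return the mean-case epidemic statistic and the estimated breakpoints (k1, k2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate([[0.0], np.cumsum(x)])
    k = np.arange(n + 1)
    z = (s - k * s[n] / n) / np.sqrt(n * sigma2)  # Z_k: normalized CUSUM bridge
    idx = np.arange(v, n - v + 1)  # admissible indices v <= k <= n - v
    i_max = int(idx[np.argmax(z[idx])])
    i_min = int(idx[np.argmin(z[idx])])
    stat = float((z[i_max] - z[i_min]) ** 2)  # max over pairs of (Z_k' - Z_k)^2
    k1, k2 = sorted((i_max, i_min))  # breakpoints in time order
    return stat, (k1, k2)
```

This reduces the cost from O(n²) to O(n). For a vector parameter (d > 1) no such shortcut is available in general, and the maximum over pairs has to be searched as in (10).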
4. Simulation study
We present some simulation results in order to assess the empirical size and power of the proposed test procedure. To do so, we consider the following processes:
- Poisson-INGARCH processes:
(13) - NB-INGARCH processes:
where denotes the negative binomial distribution with parameters r (assumed to be known) and p, and the parameter vector associated with the models is denoted by , which becomes when (i.e. for an INARCH representation). The NB-INGARCH processes are generated with r = 1 and r = 5. Here, we use the probability mass function of given by
(14)
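Assuming the common parameterization P(Y = k) = C(k + r − 1, k) p^r (1 − p)^k, k = 0, 1, …, whose mean is r(1 − p)/p (this is also the parameterization of NumPy's `negative_binomial`), an NB-INGARCH(1,1) path with conditional mean λ_t can be simulated by taking p_t = r/(r + λ_t). The function and parameter names below are hypothetical.

```python
import numpy as np


def simulate_nb_ingarch(n, omega, alpha, beta, r, burn_in=500, seed=None):
    """Simulate Y_t | past ~ NB(r, p_t) with mean lambda_t = omega + alpha*Y_{t-1} + beta*lambda_{t-1}."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1 or r <= 0:
        raise ValueError("need omega > 0, alpha, beta >= 0, alpha + beta < 1 and r > 0")
    rng = np.random.default_rng(seed)
    total = n + burn_in
    y = np.zeros(total, dtype=np.int64)
    lam = np.empty(total)
    for t in range(total):
        if t == 0:
            lam[t] = omega / (1.0 - alpha - beta)  # stationary mean as a start value
        else:
            lam[t] = omega + alpha * y[t - 1] + beta * lam[t - 1]
        p = r / (r + lam[t])  # then r * (1 - p) / p = lambda_t, the conditional mean
        y[t] = rng.negative_binomial(r, p)  # number of failures before the r-th success
    return y[burn_in:], lam[burn_in:]
```

Under this parameterization the conditional variance is λ_t + λ_t²/r, so r = 1 produces much more overdispersion than r = 5 for the same conditional mean.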
First, we generate two trajectories from (14): a trajectory under with and a trajectory under with breaks at , when changes to , and at , when reverts to . We implemented the procedure in the R software (available from CRAN). Figure 1 shows the realizations of the statistic computed with . As can be seen from this figure, in the scenario without change, the statistic stays below the limit of the critical region, represented by the horizontal triangle (see Figure 1(c)). Under the epidemic alternative, exceeds the critical value of the test and reaches its maximum around the points where the changes occur (see the dotted lines in Figure 1(d)).
Figure 1.
Typical realization of 500 observations of two NB-INGARCH(1,1) processes with r = 5 and the corresponding statistics for the epidemic change-point detection. (a) is a trajectory without change, where the true parameter is constant. (b) is a trajectory generated under the epidemic alternative, where the parameter changes to at and reverts back to at . The horizontal triangles in (c) and (d) represent the limit of the critical region of the test, whereas the dotted lines show the point where the maximum of is reached.
Now, for each of the two models (13) and (14), we generate independent replications with sample size in the following situations: a scenario where the parameter is constant (no change) and a scenario where the parameter changes from to at time and reverts to at time . Table 2 contains the empirical sizes and powers, computed (under and , respectively) as the proportion of rejections of the null hypothesis over 500 replications. These results are obtained with a significance level . The scenario ‘ ’ considered here is close to the representation fitted to the real data example (see Section 5). As expected, the performance is better for the Poisson-INGARCH processes than for the NB-INGARCH processes, but the test procedure works well in both cases (see Table 2). It produces reasonable empirical levels, which are close to the nominal level when n = 1000. Also, the empirical powers increase with the sample size and are close to 1 when n = 1000, even in situations where the difference between and is relatively small (see, for example, the last scenario of the Poisson-INGARCH in Table 2). This is consistent with Theorem 3.2.
Table 2.
Empirical sizes and powers at the nominal level 0.05 for the epidemic change-point detection in the Poisson-INGARCH (13) and NB-INGARCH (14) processes.
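| Model | Quantity | Scenario | n = 500 | n = 1000 |
|---|---|---|---|---|
| Poisson-INGARCH | Empirical level | | 0.036 | 0.044 |
| Poisson-INGARCH | Empirical level | | 0.064 | 0.058 |
| Poisson-INGARCH | Empirical level | | 0.040 | 0.048 |
| Poisson-INGARCH | Empirical power | | 0.996 | 1.000 |
| Poisson-INGARCH | Empirical power | | 0.990 | 0.998 |
| Poisson-INGARCH | Empirical power | | 0.826 | 0.964 |
| NB-INGARCH, r = 1 | Empirical level | | 0.032 | 0.042 |
| NB-INGARCH, r = 1 | Empirical level | | 0.076 | 0.060 |
| NB-INGARCH, r = 1 | Empirical power | | 0.880 | 0.992 |
| NB-INGARCH, r = 1 | Empirical power | | 0.896 | 0.948 |
| NB-INGARCH, r = 5 | Empirical level | | 0.030 | 0.046 |
| NB-INGARCH, r = 5 | Empirical level | | 0.074 | 0.056 |
| NB-INGARCH, r = 5 | Empirical power | | 0.984 | 1.000 |
| NB-INGARCH, r = 5 | Empirical power | | 0.966 | 0.992 |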
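The empirical sizes and powers in Table 2 are rejection proportions over independent replications. The sketch below shows the mechanics in a deliberately simplified setting that is not the paper's experiment: i.i.d. Poisson observations, the mean-change version of the statistic with the variance estimated by the sample variance, an epidemic window fixed at (n/3, 2n/3], and the critical value 2.973 taken from Table 1 (d = 1, α = 0.05). All names and settings are illustrative assumptions.

```python
import numpy as np

CRITICAL_VALUE = 2.973  # Table 1, d = 1, alpha = 0.05


def mean_epidemic_stat(x):
    """Mean-change version: squared range of the CUSUM bridge, variance estimated by x.var()."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate([[0.0], np.cumsum(x)])
    z = (s - np.arange(n + 1) * s[n] / n) / np.sqrt(n * x.var())
    return float((z.max() - z.min()) ** 2)


def rejection_rate(n, reps, shift=0.0, base_mean=3.0, seed=None):
    """Proportion of replications in which the statistic exceeds the critical value."""
    rng = np.random.default_rng(seed)
    means = np.full(n, base_mean)
    means[n // 3 : 2 * n // 3] += shift  # hypothetical epidemic window; shift = 0 gives H0
    rejections = 0
    for _ in range(reps):
        x = rng.poisson(means)  # i.i.d. Poisson counts, not an INGARCH path
        rejections += int(mean_epidemic_stat(x) > CRITICAL_VALUE)
    return rejections / reps
```

Calling `rejection_rate(500, 500)` estimates the size, and a positive `shift` estimates the power against that particular epidemic alternative.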
5. Real data example
We investigate the number of daily hospital admissions for respiratory diseases in children under 6 years old in the Vitória metropolitan area, Brazil. The data were obtained from the Hospital Infantil Nossa Senhora da Gloria. The time series is plotted in Figure 2(a). There are 413 available observations, representing admissions from 13 June 2008 through 30 July 2009. This time series is part of a larger dataset (available at https://rss.onlinelibrary.wiley.com/pb-assets/hub-assets/rss/Datasets/RSSC%2067.2/C1239deSouza-1531120585220.zip) which was studied by [22]. In that work, the authors used a hybrid generalized additive model with a Poisson marginal distribution to analyze the effects of some atmospheric pollutants on the number of hospital admissions due to cause-specific respiratory diseases.
Figure 2.
Plot of for the epidemic change-point detection applied to the number of treatments for respiratory diseases in the Vitória metropolitan area, Brazil, between 13 June 2008 and 30 July 2009 with an INARCH(1) representation. The vertical lines in (a) are the estimated breakpoints. The horizontal triangle in (b) represents the limit of the critical region of the test, whereas the dotted lines show the point where the maximum of is reached.
The time series plot appears to show an epidemic change. To test this, we apply our detection procedure with an INARCH representation given by . In each segment, to compute the QMLE, the initial values and are set to the empirical mean of the data and the null vector, respectively. For and , Figure 2(b) shows the values of the statistic for all possible combinations . The critical value at the nominal level is and the resulting test statistic is , which implies the rejection of the null hypothesis (i.e. change-points are detected). The peak in the graph is reached at the point , which gives the estimated locations of the break-points. In the simulation with the scenario close to this real data example, we found that the estimator of the change-points is on average very close to the true change-points . More precisely, among the 500 trajectories simulated for n = 500, we considered those for which a change-point was detected (498 for the Poisson INGARCH and 492 for the NB-INGARCH process), and we found that, on average, in both cases. The locations of the detected changes correspond to the dates 27 December 2008 and 24 March 2009. The second detected regime covers a large part of the austral summer, which runs from December to March; this partly explains the slight decrease in the number of hospital admissions observed in this period. The estimated model in each regime is:
(15) |
where the standard errors of the estimators are given in parentheses. In (15), one can see that the parameter of the first regime is very close to that of the third regime. This is in accordance with the epidemic alternative and lends substantial support to the existence of an epidemic change in this series.
6. Proofs of the main results
Let and be sequences of random variables or vectors. Throughout this section, we use the notation to mean: for all . Write to mean: for all , there exists C>0 such that for n large enough. In the sequel, C denotes a positive constant whose value may differ from one inequality to another.
6.1. Proof of Proposition 2.1
Without loss of generality, we show the results for and (i.e. the consistency and the asymptotic normality of ).
To simplify the expressions in this paragraph, we set: and for any .
To prove the first part of the proposition (i.e. the consistency), it suffices to show that the condition (4) of [1] is satisfied. This condition is established by [10] in their Remark 2.1 by using and (5).
- Applying the mean value theorem to the function for all , there exists between and such that
which is equivalent to
(16)
with
(17)
Moreover, by proceeding as in Lemma 7.1 of [11], we can show that
In addition, for n large enough since is a local maximum of the function (from the assumption (A1) and the consistency of ). Thus, (16) gives
(18) |
The following lemma will be useful in the sequel.
Lemma 6.1
Assume that all the assumptions of Proposition 2.1 hold. Then,
- (a) .
- (b) is a stationary ergodic, square integrable martingale difference sequence with covariance matrix ;
- (c) and that the matrix is invertible.
Proof.
Let us use Lemma 6.1 to complete the proof of part (ii) of Proposition 2.1. From Lemma 6.1(c), for n large enough such that (defined in (17)) is an invertible matrix. Then, the relation (18) is equivalent to
Furthermore, applying the central limit theorem to the stationary ergodic martingale difference sequence (see Lemma 6.1(b)), we have
Therefore, for n large enough, it holds that
6.2. Proof of Theorem 3.1
The following lemma follows from Lemmas A.1 and A.4 of [9]; its proof is therefore omitted.
Lemma 6.2
Assume that the assumptions of Theorem 3.1 hold. Then,
Define the statistic
where Σ is defined in Proposition 2.1 and computed at , under . Consider the following lemma.
Lemma 6.3
Assume that the assumptions of Theorem 3.1 hold. Then,
Proof.
Let . According to the asymptotic normality of the QMLE from Proposition 2.1 and the consistency of , when , we have
(21)
Then, in addition to the consistency of the QMLE from Proposition 2.1, we obtain
This concludes the proof of the lemma.
Let , and . The mean value theorem to the function to implies that there exists between and such that
This is equivalent to
(22) |
with
We first use Lemma 6.1 to show that
(23) |
Remark that
and
Let . Applying (22) with and , we have
(24) |
With and , (22) gives
(25) |
Moreover, as , Proposition 2.1 and Lemma 6.1(c) (applied to ) imply
Then, according to (21), for n large enough, it holds from (24) that
where the last equality is obtained from Lemma 6.2 (ii). It is equivalent to
(26) |
For n large enough, is an interior point of Θ and we have . Thus, from (26), we obtain
(27) |
Similarly, using (25), we also obtain
(28) |
Taking the difference of (27) and (28) gives
i.e.
(29) |
By going along similar lines, we have
(30) |
Combining (29) and (30), we get
i.e.
(31) |
Recall that, for any ,
From Lemma 6.1(b) (applied to ), applying the functional central limit theorem for martingale difference sequences (see [4]), we have
where is a Gaussian process with covariance matrix . Hence,
in , where is a d-dimensional standard Brownian motion, and is a d-dimensional Brownian bridge.
Similarly, we get
Thus, as , it comes from (31) that
Hence, for n large enough, we have
in D([0,1]). We conclude the proof of the theorem from Lemma 6.3. □
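The limit above is expressed through increments of the Brownian bridge W_d(t) = B_d(t) − tB_d(1). The following standard computation, included for the reader's convenience, gives the covariance structure of these increments (I_d denotes the d × d identity matrix):

```latex
\operatorname{Cov}\big(W_d(s), W_d(t)\big) = \big(\min(s,t) - st\big) I_d, \qquad 0 \le s, t \le 1,
\\
\operatorname{Var}\big(W_d(v) - W_d(u)\big)
  = \big[v(1-v) + u(1-u) - 2u(1-v)\big] I_d
  = (v-u)\big(1-(v-u)\big) I_d, \qquad 0 \le u < v \le 1.
```

In particular, the increments are degenerate when v − u = 0 or v − u = 1, and their variance is largest when v − u = 1/2.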
6.3. Proof of Theorem 3.2
Recall that, under the alternative, is a trajectory of a process satisfying
(32) |
where (with ) and (j = 1, 2) is a stationary and ergodic solution of the model (1) depending on with .
We have . Then, it suffices to show that to establish the theorem. For any ,
with
and
Moreover, for n large enough, (from the consistency of the Poisson QMLE). Consequently, becomes
(33) |
Furthermore, by definition, the three matrices in the formula of are positive semi-definite, and the first and the last ones converge a.s. to the same positive definite matrix.
Then, for n large enough, we can write
with
From the asymptotic properties of the Poisson QMLE, we have
;
,
where
Therefore, since is positive definite and , we deduce that . This completes the proof of the theorem. □
Funding Statement
The first author was supported by the CY Advanced Studies (CY Cergy Paris Université, France) and the MME-DII Center of Excellence [grant number ANR-11-LABEX-0023-01]. The second author's work was carried out within [grant number ANR BREAKRISK: ANR-17-CE26-0001-01] and the CY Initiative of Excellence [grant 'Investissements d'Avenir' ANR-16-IDEX-0008, Project 'EcoDep' PSI-AAP2020-0000000013].
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
1. Ahmad A. and Francq C., Poisson QMLE of count time series models, J. Time Ser. Anal. 37 (2016), pp. 291–314.
2. Aston J.A. and Kirch C., Detecting and estimating changes in dependent functional data, J. Multivar. Anal. 109 (2012), pp. 204–220.
3. Aston J.A. and Kirch C., Evaluating stationarity via change-point alternatives with applications to fMRI data, Ann. Appl. Stat. 6 (2012), pp. 1906–1948.
4. Billingsley P., Convergence of Probability Measures, Wiley, London, 1968.
5. Bucchia B., Testing for epidemic changes in the mean of a multiparameter stochastic process, J. Stat. Plan. Inference 150 (2014), pp. 124–141.
6. Csörgő M. and Horváth L., Limit Theorems in Change-Point Analysis, Wiley, New York, 1997.
7. Diop M.L. and Kengne W., Testing parameter change in general integer-valued time series, J. Time Ser. Anal. 38 (2017), pp. 880–894.
8. Diop M.L. and Kengne W., Consistent model selection procedure for general integer-valued time series, Statistics 55 (2021), pp. 1207–1230.
9. Diop M.L. and Kengne W., Piecewise autoregression for general integer-valued time series, J. Stat. Plan. Inference 211 (2021), pp. 271–286.
10. Diop M.L. and Kengne W., Poisson QMLE for change-point detection in general integer-valued time series models, Metrika 85 (2022), pp. 373–403.
11. Doukhan P. and Kengne W., Inference and testing for structural change in general Poisson autoregressive models, Electron. J. Stat. 9 (2015), pp. 1267–1314.
12. Ferland R., Latour A., and Oraichi D., Integer-valued GARCH process, J. Time Ser. Anal. 27 (2006), pp. 923–942.
13. Fokianos K., Rahbek A., and Tjøstheim D., Poisson autoregression, J. Am. Stat. Assoc. 104 (2009), pp. 1430–1439.
14. Graiche F., Merabet D., and Hamadouche D., Testing change in the variance with epidemic alternatives, Commun. Stat. Theory Methods 45 (2016), pp. 3822–3837.
15. Guan Z., Semiparametric tests for change-points with epidemic alternatives, J. Stat. Plan. Inference 137 (2007), pp. 1748–1764.
16. Jarušková D. and Piterbarg V.I., Log-likelihood ratio test for detecting transient change, Stat. Probab. Lett. 81 (2011), pp. 552–559.
17. Kengne W.C., Testing for parameter constancy in general causal time-series models, J. Time Ser. Anal. 33 (2012), pp. 503–518.
18. Levin B. and Kline J., The CUSUM test of homogeneity with an application in spontaneous abortion epidemiology, Stat. Med. 4 (1985), pp. 469–488.
19. Račkauskas A. and Suquet C., Hölder norm test statistics for epidemic change, J. Stat. Plan. Inference 126 (2004), pp. 495–520.
20. Račkauskas A. and Suquet C., Testing epidemic changes of infinite dimensional parameters, Stat. Inference Stoch. Process. 9 (2006), pp. 111–134.
21. Ramanayake A. and Gupta A.K., Tests for an epidemic change in a sequence of exponentially distributed random variables, Biom. J. 45 (2003), pp. 946–958.
22. Souza J.B., Reisen V.A., Franco G.C., Ispány M., Bondon P., and Santos J.M., Generalized additive models with principal component analysis: An application to time series of respiratory disease and air pollution data, J. R. Stat. Soc. Ser. C Appl. Stat. 67 (2018), pp. 453–480.
23. Weiß C.H., Thinning operations for modeling time series of counts: a survey, AStA Adv. Stat. Anal. 92 (2008), pp. 319–341.
24. Weiß C.H., Feld M.H.-J., Mamode Khan N., and Sunecher Y., INARMA modeling of count time series, Stats 2 (2019), pp. 284–320.
25. Weiß C.H. and Pollett P.K., Binomial autoregressive processes with density-dependent thinning, J. Time Ser. Anal. 35 (2014), pp. 115–132.
26. Yao Q., Tests for change-points with epidemic alternatives, Biometrika 80 (1993), pp. 179–191.
27. Zhu F., A negative binomial integer-valued GARCH model, J. Time Ser. Anal. 32 (2011), pp. 54–67.