Journal of Applied Statistics. 2021 Nov 1; 50(2): 352–369. doi: 10.1080/02664763.2021.1993798

A first-order binomial-mixed Poisson integer-valued autoregressive model with serially dependent innovations

Zezhun Chen, Angelos Dassios, George Tzougas

Abstract

Motivated by the extended Poisson INAR(1), which allows innovations to be serially dependent, we develop a new family of binomial-mixed Poisson INAR(1) (BMP INAR(1)) processes by adding a mixed Poisson component to the innovations of the classical Poisson INAR(1) process. Due to the flexibility of the mixed Poisson component, the model includes a large class of INAR(1) processes with different transition probabilities. Moreover, it can capture some overdispersion features coming from the data while keeping the innovations serially dependent. We discuss its statistical properties, stationarity conditions and transition probabilities for different mixing densities (Exponential, Lindley). Then, we derive the maximum likelihood estimation method and its asymptotic properties for this model. Finally, we demonstrate our approach using a real data example of iceberg count data from a financial system.

Keywords: Count data time series, binomial-mixed Poisson INAR(1) models, mixed Poisson distribution, overdispersion, maximum likelihood estimation

1. Introduction

Modelling integer-valued count time series has attracted a lot of attention over the last few years in a plethora of scientific fields such as the social sciences, healthcare, insurance, economics and the financial industry. The standard ARMA model inevitably produces real-valued outputs and so is not appropriate for modelling this type of data. As a result, many alternative classes of integer-valued time series models have been introduced and explored in the applied statistical literature. The integer-valued autoregressive process of order one, abbreviated as INAR(1), was proposed by McKenzie [8] and Al-Osh and Alzaid [1] as a counterpart to the Gaussian AR(1) model for Poisson counts. This model was derived by manipulating the operation between coefficients and variables, as well as the innovation term, in such a way that the values are always integers. The operation between the coefficient and the variable is defined as $\alpha\circ X_t=\sum_{i=1}^{X_t}V_i$, where the $V_i$ are i.i.d. Bernoulli random variables with parameter $\alpha$ and $\circ$ denotes the binomial thinning operator. Binomial thinning is easy to interpret, and the binomial INAR(1) has the same autocorrelation structure as the standard AR(1) model, so it can be applied to fit count data. For a general review, see [11,12].
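For illustration, binomial thinning is straightforward to simulate; the following is a minimal R sketch (the helper name `rthin` is ours, not from the papers cited).

```r
# Binomial thinning: alpha o X is a sum of X i.i.d. Bernoulli(alpha) variables,
# i.e. a single Binomial(X, alpha) draw.
rthin <- function(x, alpha) rbinom(1, size = x, prob = alpha)

# One step of a classical Poisson INAR(1): X_t = alpha o X_{t-1} + eps_t
set.seed(1)
x_prev <- 7
x_next <- rthin(x_prev, alpha = 0.5) + rpois(1, lambda = 2)
x_next  # always a non-negative integer
```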

Later on, in order to accommodate different features exhibited by count data, for example under-dispersion, overdispersion, the probability of observing zero and different dependence structures, many studies introduced alternative thinning operators or varied the distribution of the $V_i$. The case where the $V_i$ are i.i.d. geometric random variables is analysed by Ristić et al. [10] and called NGINAR(1). Kirchner [7] introduced reproduction operators in which the $V_i$ are i.i.d. Poisson random variables, in order to explore the relationship between the Hawkes process and integer-valued time series. As a further variation, random-coefficient thinning takes the $V_i$ to be i.i.d. Bernoulli with the parameter $\alpha$ itself a random variable; this type of thinning operator was proposed by McKenzie [8,9], and Zheng et al. [14] applied it to a generalized INAR(1) model. In particular, to accommodate overdispersion, one option is to change the thinning operator from binomial to another type, as discussed above. Another is to replace the innovation distribution with some other overdispersed distribution; see, for example, [2]. A third approach is to keep the structure of the binomial INAR(1) but allow the innovation terms to be serially dependent; see [13].

In this study, motivated by Weiß [13], we develop a new family of binomial-mixed Poisson INAR(1) (BMP INAR(1)) processes by adding a mixed Poisson component to the innovation term of the classical Poisson INAR(1) process. The proposed class of BMP INAR(1) processes is ideally suited for modelling heterogeneity in count time series data: due to the mixed Poisson component introduced herein, it includes many members with different transition probabilities that can adequately capture different levels of overdispersion in the data while keeping the innovations serially dependent.

The rest of the paper is organized as follows. Section 2 defines the binomial-mixed Poisson INAR(1) model by adding a mixed Poisson component to the Poisson INAR(1) model. Statistical properties and the stationarity condition are derived in Section 3. Section 4 derives the distribution of the mixed Poisson component for two different mixing densities from the exponential family, namely the Exponential and Lindley distributions. In Section 5, maximum likelihood estimation is discussed, together with the asymptotic properties of the estimators. In Section 6, the model is fitted to financial data (iceberg counts) and the numerical results are discussed. Finally, concluding remarks are provided in Section 7.

2. Construction of binomial mixed Poisson INAR(1)

In [13], the classical Poisson INAR(1) was extended by allowing the innovations $\varepsilon_t$ to depend on the previous state of the model, so that $\varepsilon_t\sim\mathrm{Po}(aX_{t-1}+b)$, where $a$ and $b$ are some positive constants. The innovation under this definition is separable in the sense that $\varepsilon_t=a\circ X_{t-1}+\epsilon_t$, where $a\circ X_{t-1}=\sum_{i=1}^{X_{t-1}}U_i$ with $U_i\overset{\text{i.i.d.}}{\sim}\mathrm{Po}(a)$ and $\epsilon_t\sim\mathrm{Po}(b)$. To introduce further heterogeneity while maintaining the serially dependent innovation structure of this model, we extend it by allowing $U_i$ to be a mixed Poisson random variable.

Starting from a Poisson random variable $U$ with parameter $\theta$, we may obtain a large class of random variables by allowing $\theta$ to be itself a random variable following some density function $g(\theta\,|\,\varphi)$, where $\varphi$ can be a scalar or a vector; see Karlis [6]. The random variable $U$ then follows a mixed Poisson distribution with mixing density $g$. The probability mass function of $U$ is given by

$P(U=u)=\int_0^{\infty}e^{-\theta}\frac{\theta^u}{u!}\,g(\theta\,|\,\varphi)\,d\theta.$ (1)

We now construct our model.

Definition 2.1

The Binomial-Mixed Poisson integer-valued Autoregressive model (BMP INAR(1)) is defined by the following equations:

$X_{t+1}=p_1\circ X_t+\varepsilon_{t+1}=p_1\circ X_t+\varphi\circ_g X_t+Z_{t+1},$

$p_1\circ X_t=\sum_{k=1}^{X_t}V_k,\qquad \varphi\circ_g X_t=\sum_{i=1}^{X_t}U_i,\qquad P(U_i=x)=\int_0^{\infty}e^{-\theta_i}\frac{\theta_i^x}{x!}\,g(\theta_i\,|\,\varphi)\,d\theta_i,$ (2)

where

  • $\circ$ is the binomial thinning operator, so that the $V_k$ are i.i.d. Bernoulli random variables with parameter $p_1\in[0,1]$;

  • $\{Z_t\}_{t=1,2,\ldots}$ are i.i.d. Poisson random variables with rate $\lambda_1>0$;

  • $\circ_g$ is a reproduction operator, so that the $U_i$ are independent mixed Poisson random variables with mixing density function $g(\theta_i\,|\,\varphi)$;

  • $\circ_g$ and $\circ$ are independent of each other, so that the $U_i$ and $V_k$ are independent of each other.
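To make Definition 2.1 concrete, the following is a minimal R simulation sketch. It assumes, purely for illustration, an Exponential mixing density with mean $\varphi$ (so that $\mu_g=\varphi$, the case treated in Section 4.1); the function name is ours.

```r
# Simulate a BMP INAR(1) path as in Definition 2.1 (a sketch, not the
# authors' code). Mixing density: Exponential with mean phi, so mu_g = phi.
simulate_bmp_inar1 <- function(n, p1, phi, lambda1) {
  x <- numeric(n)
  x[1] <- rpois(1, lambda1 / (1 - p1 - phi))  # start near the stationary mean
  for (t in 2:n) {
    surv   <- rbinom(1, size = x[t - 1], prob = p1)  # p1 o X_{t-1}: survivors
    theta  <- rexp(x[t - 1], rate = 1 / phi)         # one theta_i per individual
    offspr <- sum(rpois(x[t - 1], theta))            # phi o_g X_{t-1}: offspring
    x[t]   <- surv + offspr + rpois(1, lambda1)      # Z_t: immigrants
  }
  x
}

set.seed(123)
x <- simulate_bmp_inar1(n = 500, p1 = 0.3, phi = 0.3, lambda1 = 2)
c(mean = mean(x), variance = var(x))  # variance should exceed the mean
```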

As we will see shortly, the stationarity condition for this model is simply $p_1+\mu_g<1$, where $\mu_g$ is the first moment of $U_i$. In terms of interpretation, the model can be seen as the evolution of a population in which the binomial part counts the survivors from the previous period, the mixed Poisson part is the total offspring, and the innovation part counts immigrants. Clearly, this model is a Markov chain, and its transition probabilities can be found easily once the mixing density $g(\theta\,|\,\varphi)$ is known. The probability mass function of $Y_{t+1}=\varphi\circ_g X_t$ is given by

$P(Y_{t+1}=y\,|\,X_t=n)=E\left[e^{-\sum_{i=1}^n\theta_i}\,\frac{\left(\sum_{i=1}^n\theta_i\right)^y}{y!}\right],$ (3)

where the expectation is taken over $\theta_1,\theta_2,\ldots,\theta_n$. In order to evaluate the expectation explicitly, it is desirable that the random variables $\theta_i$ have an 'additivity' property, in the sense that the density (or probability mass function) of the sum $\sum_{i=1}^n\theta_i$ is either of the same family with different parameters or can be written in closed form. Many members of the exponential family have this property. In general, we let $g(x\,|\,\varphi)$ be of exponential family form, such that

$g(x\,|\,\varphi)=h(x)\exp\{\eta(\varphi)T(x)+\xi(\varphi)\}.$ (4)

Denote the density of the sum $S_n=\sum_{i=1}^n\theta_i$ by $g_n(s\,|\,\varphi)$, where the $\theta_i$ are i.i.d. random variables with density $g(\theta\,|\,\varphi)$. The expectation above can then be expressed as

$P(Y_{t+1}=y\,|\,X_t=n)=\int_{\mathbb{R}^+}e^{-s}\frac{s^y}{y!}\,g_n(s\,|\,\varphi)\,ds.$ (5)

The density $g_n(s\,|\,\varphi)$ is explicitly known in many cases, for example when the mixing distribution is Inverse Gaussian, Exponential, Gamma, Geometric, Bernoulli or Lindley. For the sake of parsimony, we use distributions with a single parameter; in other words, we assume that $\varphi$ is a scalar. Note that if we let $g(\theta\,|\,\varphi)=\delta_\varphi(\theta)$, a Dirac delta function concentrated at $\varphi$, the model reduces to the extended Poisson INAR(1) of [13].

3. Statistical properties of BMP INAR(1)

3.1. Moments and correlation structure

We first need to derive the moments of Ui.

Lemma 3.1

The first moment and second central moment of $U_i$ with mixing density $g(x\,|\,\varphi)$ are given by

$E[U_i]=\mu_g,\qquad \mathrm{Var}(U_i)=\mu_g+\sigma_g^2,$ (6)

where $\mu_g=E_g[\theta_i]=\int_{\mathbb{R}}x\,g(x\,|\,\varphi)\,dx$ and $\sigma_g^2=\mathrm{Var}_g(\theta_i)$.

Proof.

By the conditional expectation argument

$E[U_i]=E_g[E[U_i\,|\,\theta_i]]=E_g[\theta_i]=\mu_g,\qquad E[U_i^2]=E_g[E[U_i^2\,|\,\theta_i]]=E_g[\theta_i^2+\theta_i],\qquad \mathrm{Var}(U_i)=E[U_i^2]-(E[U_i])^2=\sigma_g^2+\mu_g.$

Proposition 3.1

Assume $p_1+\mu_g<1$. The stationary moments of $X_t$ are given by

$E[X_t]=\mu_x=\frac{\lambda_1}{1-p_1-\mu_g},\qquad \mathrm{Var}(X_t)=\sigma_x^2=\mu_x\,\frac{1-p_1^2+\sigma_g^2}{1-(p_1+\mu_g)^2},\qquad \mathrm{Cov}(X_t,X_{t-k})=\gamma(k)=(p_1+\mu_g)^k\sigma_x^2.$ (7)

Proof.

For the first moment, we have

$E[X_t]=E[p_1\circ X_{t-1}]+E[\varphi\circ_g X_{t-1}]+E[Z_t],\qquad \mu_x=p_1\mu_x+\mu_g\mu_x+\lambda_1,\qquad \mu_x=\frac{\lambda_1}{1-p_1-\mu_g}.$

Since the operators $\circ$ and $\circ_g$ are independent of each other, for the second central moment we have

$\mathrm{Var}(X_t)=\mathrm{Var}(p_1\circ X_{t-1}+\varphi\circ_g X_{t-1})+\mathrm{Var}(Z_t)=\mathrm{Var}\left(E\left[\sum_{i=1}^{X_{t-1}}(V_i+U_i)\,\Big|\,X_{t-1}\right]\right)+E\left[\mathrm{Var}\left(\sum_{i=1}^{X_{t-1}}(V_i+U_i)\,\Big|\,X_{t-1}\right)\right]+\lambda_1=(p_1+\mu_g)^2\sigma_x^2+\left(p_1(1-p_1)+\sigma_g^2+\mu_g\right)\mu_x+\lambda_1,$

$\sigma_x^2=\mu_x\,\frac{1-p_1^2+\sigma_g^2}{1-(p_1+\mu_g)^2}.$

Let $\mathcal{F}_t=\sigma(X_t,X_{t-1},\ldots)$ be the $\sigma$-algebra generated by the process up to time $t$. The covariance of the model is given by

$\mathrm{Cov}(X_t,X_{t-k})=\mathrm{Cov}(p_1\circ X_{t-1},X_{t-k})+\mathrm{Cov}(\varphi\circ_g X_{t-1},X_{t-k})+\mathrm{Cov}(Z_t,X_{t-k}).$

Again by using conditional expectations, we have

$\mathrm{Cov}(p_1\circ X_{t-1},X_{t-k})=\mathrm{Cov}\left(E[p_1\circ X_{t-1}\,|\,\mathcal{F}_{t-1}],\,E[X_{t-k}\,|\,\mathcal{F}_{t-1}]\right)+E\left[\mathrm{Cov}(p_1\circ X_{t-1},X_{t-k}\,|\,\mathcal{F}_{t-1})\right]=\mathrm{Cov}(p_1X_{t-1},X_{t-k})+E\left[\mathrm{Cov}\left(\sum_{i=1}^{X_{t-1}}V_i,\,X_{t-k}\,\Big|\,\mathcal{F}_{t-1}\right)\right]=p_1\gamma(k-1)+0.$

The same argument applied to the term $\mathrm{Cov}(\varphi\circ_g X_{t-1},X_{t-k})$ gives $\mu_g\gamma(k-1)$, and $\mathrm{Cov}(Z_t,X_{t-k})=0$. Hence $\mathrm{Cov}(X_t,X_{t-k})=\gamma(k)=(p_1+\mu_g)\gamma(k-1)=(p_1+\mu_g)^k\gamma(0)$.

From the results above, it is clear that this model has the same correlation structure as the standard AR(1) model. Furthermore, unlike the equidispersed Poisson INAR(1), the BMP INAR(1) is in general an overdispersed model, with Fisher index of dispersion

$FI_x=\frac{\sigma_x^2}{\mu_x}=1+\frac{\mu_g^2+2p_1\mu_g+\sigma_g^2}{1-(p_1+\mu_g)^2}.$ (8)
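As a sanity check on Proposition 3.1 and formula (8), the theoretical moments can be compared with simulated ones. A sketch, reusing the simulator from Section 2 (Exponential mixing, for which $\mu_g=\varphi$ and $\sigma_g^2=\varphi^2$):

```r
# Theoretical stationary moments (Proposition 3.1) vs. simulation,
# for the Exponential mixing case: mu_g = phi, sigma_g^2 = phi^2.
p1 <- 0.3; phi <- 0.3; lambda1 <- 2
mu_x   <- lambda1 / (1 - p1 - phi)
sig2_x <- mu_x * (1 - p1^2 + phi^2) / (1 - (p1 + phi)^2)

set.seed(42)
x <- simulate_bmp_inar1(n = 2e5, p1 = p1, phi = phi, lambda1 = lambda1)
rbind(theory = c(mean = mu_x, var = sig2_x, FI = sig2_x / mu_x),
      sample = c(mean(x), var(x), var(x) / mean(x)))
```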

3.2. Existence of stationary solution

Proposition 3.2

Given that $P(U_i=0)>0$ and $p_1+\mu_g<1$, the infinite sequence

$f_i(\theta)=(1-p_1+p_1f_{i-1}(\theta))\,\Phi_u(f_{i-1}(\theta)),\quad i\geq 1,\qquad f_0(\theta)=\theta,\ \ \theta\in[0,1],$ (9)

where $\Phi_u(\theta)$ is the probability generating function (p.g.f.) of $U_i$, has the limit $\lim_{i\to\infty}f_i(\theta)=1$.

Proof.

Define the increment of the sequence

$f_i(\theta)-f_{i-1}(\theta)=(1-p_1+p_1f_{i-1}(\theta))\,\Phi_u(f_{i-1}(\theta))-f_{i-1}(\theta)=Q(f_{i-1}(\theta)),\qquad\text{where } Q(x):=(1-p_1+p_1x)\Phi_u(x)-x.$

Since a p.g.f. is defined for $x\in[0,1]$, the monotonicity of $Q$ can be read off from its first and second derivatives:

$Q'(x)=p_1\Phi_u(x)+(1-p_1+p_1x)\Phi_u'(x)-1,\qquad Q''(x)=2p_1\Phi_u'(x)+(1-p_1+p_1x)\Phi_u''(x).$

By the definition of the p.g.f., $\Phi_u'(x)\geq 0$ and $\Phi_u''(x)\geq 0$. So $Q''(x)\geq 0$, which implies that $Q'(x)$ is a non-decreasing function. Then we have

$Q'(x)\leq Q'(1)=p_1+\mu_g-1<0.$

Notice that $Q(0)=(1-p_1)P(U_i=0)>0$ and $Q(1)=0$. Hence we conclude that $Q$ is monotonically decreasing on $[0,1]$, ranging from $Q(0)$ down to $0$; in particular, $Q(x)\geq 0$ for $x\in[0,1]$. In other words, for any $i\geq 1$ and $\theta\in[0,1]$, the sequence $f_i(\theta)=f_{i-1}(\theta)+Q(f_{i-1}(\theta))$ is increasing in $i$. Since it is also bounded above by 1, it converges, and its limit must be a root of $Q$; hence $\lim_{i\to\infty}f_i(\theta)=1$.
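The convergence in Proposition 3.2 is easy to observe numerically. A sketch for the Exponential mixing case of Section 4.1, where $\Phi_u(x)=1/(1+\varphi-\varphi x)$:

```r
# Iterate f_i(theta) = (1 - p1 + p1 * f_{i-1}) * Phi_u(f_{i-1}) from f_0 = theta.
p1 <- 0.3; phi <- 0.3            # p1 + mu_g = 0.6 < 1
f <- 0.2                         # f_0(theta) with theta = 0.2
for (i in 1:200) f <- (1 - p1 + p1 * f) / (1 + phi - phi * f)
f                                # approaches 1, as the proposition asserts
```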

Proposition 3.3

Let $X_t$ be the BMP INAR(1) model defined in Definition 2.1. If the conditions $P(U_i=0)>0$ and $p_1+\mu_g<1$ hold, then the process $X_t$ has a proper stationary distribution and $X_t$ is an ergodic Markov chain. The stationary distribution is characterized by the p.g.f. $\Phi_x(\theta)=\prod_{i=0}^{\infty}\Phi_z(f_i(\theta))$.

Proof.

Denote the p.g.f.s of $X_n$ and of the innovation $Z_n$ by $\Phi_{X_n}(\theta)$ and $\Phi_z(\theta)$, respectively. Then $\Phi_{X_n}(\theta)$ can be expressed as the following product:

$\Phi_{X_n}(\theta)=E\big[E[\theta^{X_n}\,|\,X_{n-1}]\,\big|\,X_0\big]=E\big[E[\theta^{\,p_1\circ X_{n-1}+\varphi\circ_g X_{n-1}+Z_n}\,|\,X_{n-1}]\,\big|\,X_0\big]=E\big[f_1(\theta)^{X_{n-1}}\,\big|\,X_0\big]\,\Phi_z(f_0(\theta))=\cdots=E\big[f_n(\theta)^{X_0}\big]\prod_{i=0}^{n-1}\Phi_z(f_i(\theta)).$

Showing the existence of the limiting distribution is equivalent to showing that the limit of this product as $n$ goes to infinity is non-zero, which means that we have to show that the series

$LP_n=\log\Phi_{X_n}(\theta)=\log E\big[f_n(\theta)^{X_0}\big]+\sum_{i=0}^{n-1}\log\Phi_z(f_i(\theta))$

converges as $n\to\infty$. The convergence of the infinite series $\sum_{i=0}^{\infty}\log\Phi_z(f_i(\theta))$ can be shown by the ratio test:

$\lim_{i\to\infty}\left|\frac{\log\Phi_z(f_i(\theta))}{\log\Phi_z(f_{i-1}(\theta))}\right|=\lim_{x\to 1}\frac{\log\Phi_z((1-p_1+p_1x)\Phi_u(x))}{\log\Phi_z(x)}=\lim_{x\to 1}\frac{\Phi_z(x)}{\Phi_z'(x)}\cdot\frac{\Phi_z'((1-p_1+p_1x)\Phi_u(x))}{\Phi_z((1-p_1+p_1x)\Phi_u(x))}\cdot\left(p_1\Phi_u(x)+(1-p_1+p_1x)\Phi_u'(x)\right)=p_1+\mu_g<1.$ (10)

Hence $\lim_{n\to\infty}LP_n>-\infty$, from which we can infer that $\lim_{n\to\infty}\Phi_{X_n}(\theta)>0$ exists, so the limiting distribution of $X_n$ exists. Furthermore, by construction, the chain $X_n$ is defined on the countable state space $S=\{0,1,2,\ldots\}$. The positivity of the transition probabilities, $P(X_n=j\,|\,X_{n-1}=i)>0$ for all $i,j\in S$, implies that $X_n$ is irreducible and aperiodic. Hence the limiting distribution with p.g.f. $\Phi_x(\theta)=\lim_{n\to\infty}\Phi_{X_n}(\theta)$ is the unique stationary distribution of $X_n$.

In general, $P(U_i=0)=\int_{\mathbb{R}^+}e^{-\theta}g(\theta\,|\,\varphi)\,d\theta>0$ whenever $g(\theta\,|\,\varphi)>0$, so we just need to ensure the existence of the first moment to achieve stationarity of $X_n$. The infinite product $\Phi_x(\theta)=\prod_{i=0}^{\infty}\Phi_z(f_i(\theta))$ is the p.g.f. of the stationary distribution, which also satisfies

$\Phi_x(\theta)=\Phi_x\big((1-p_1+p_1\theta)\Phi_u(\theta)\big)\,\Phi_z(\theta).$ (11)

4. Distribution function of the mixed Poisson component

In order to apply maximum likelihood estimation for the statistical inference of this model, we need to derive the distribution of $Y_{t+1}=\varphi\circ_g X_t$ for different mixing densities $g$. As mentioned before, we focus on mixing densities from the exponential family. For expository purposes, we derive the distribution of $Y_{t+1}$ for the Exponential and Lindley densities.

4.1. Mixed by exponential density

If $g(\theta\,|\,\varphi)=\frac{1}{\varphi}e^{-\theta/\varphi}$, then the distribution of $U_i$ is given by

$P(U_i=x)=\int_0^{\infty}e^{-\theta_i}\frac{\theta_i^x}{x!}\cdot\frac{1}{\varphi}e^{-\theta_i/\varphi}\,d\theta_i=\frac{1}{\varphi\,x!}\int_0^{\infty}e^{-\left(1+\frac{1}{\varphi}\right)\theta_i}\theta_i^x\,d\theta_i=\left(\frac{1}{1+\varphi}\right)\left(\frac{\varphi}{1+\varphi}\right)^x,\quad x=0,1,\ldots,$ (12)

which is a geometric distribution with parameter $\frac{\varphi}{1+\varphi}$. The probability mass function $f_\varphi(m,X_t)$ of $\varphi\circ_g X_t$, together with its first and second derivatives, is then given by

$f_\varphi(m,X_t)=\binom{m+X_t-1}{m}\left(\frac{1}{1+\varphi}\right)^{X_t}\left(\frac{\varphi}{1+\varphi}\right)^{m},$

$\frac{\partial f_\varphi(m,X_t)}{\partial\varphi}=\left(\frac{m}{\varphi(1+\varphi)}-\frac{X_t}{1+\varphi}\right)f_\varphi(m,X_t),$

$\frac{\partial^2 f_\varphi(m,X_t)}{\partial\varphi^2}=\left(\left(\frac{m}{\varphi(1+\varphi)}-\frac{X_t}{1+\varphi}\right)^2+\frac{X_t}{(1+\varphi)^2}-\frac{m(1+2\varphi)}{\varphi^2(1+\varphi)^2}\right)f_\varphi(m,X_t).$ (13)
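Since the sum of $X_t$ i.i.d. geometric variables is negative binomial, $f_\varphi(m,X_t)$ in (13) can be evaluated directly with R's `dnbinom`. A sketch, including a finite-difference check of the first derivative:

```r
# f_phi(m, X_t) for the Exponential mixing case: negative binomial with
# size = X_t and success probability 1 / (1 + phi).
f_exp <- function(m, xt, phi) dnbinom(m, size = xt, prob = 1 / (1 + phi))

# Check the first derivative in (13) against a central finite difference.
phi <- 0.4; m <- 3; xt <- 5
analytic  <- (m / (phi * (1 + phi)) - xt / (1 + phi)) * f_exp(m, xt, phi)
numerical <- (f_exp(m, xt, phi + 1e-6) - f_exp(m, xt, phi - 1e-6)) / 2e-6
c(analytic = analytic, numerical = numerical)  # agree to ~6 digits
```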

Note that $X_t$ reduces to the NGINAR(1) of [10] if we further let $p_1=0$. In general, the stationarity condition becomes $p_1+\varphi<1$, and the probability generating function of $X_t$ satisfies the equation

$\Phi_x(\theta)=\Phi_x\left(\frac{1-p_1+p_1\theta}{1+\varphi-\varphi\theta}\right)\Phi_z(\theta).$ (14)

We will now relax the assumption that the innovation term is Poisson, and let the marginal distribution of $X$ be geometric with parameter $\frac{\alpha}{1+\alpha}$, $\alpha>0$. Using the relationship between the p.g.f.s, we can infer the required distribution of $Z$.

Proposition 4.1

If $(p_1+\varphi)\alpha\leq\alpha-\varphi$ (which guarantees that the mixing weight below lies in $[0,1]$) and the distribution of $\{Z_t\}_{t=1,2,\ldots}$ follows the mixed geometric distribution

$Z_t=\begin{cases}\mathrm{Geom}\left(\dfrac{\varphi}{1+\varphi}\right)&\text{w.p. }\dfrac{(p_1+\varphi)\alpha}{\alpha-\varphi},\\[6pt]\mathrm{Geom}\left(\dfrac{\alpha}{1+\alpha}\right)&\text{w.p. }1-\dfrac{(p_1+\varphi)\alpha}{\alpha-\varphi},\end{cases}$ (15)

then the marginal distribution of $X$ follows a $\mathrm{Geom}\left(\frac{\alpha}{1+\alpha}\right)$ distribution.

Proof.

By utilizing equation (14), we assume that $X$ has a geometric distribution, so that $\Phi_x(\theta)=\frac{1}{1+\alpha-\alpha\theta}$. Then the probability generating function of $Z$ has the following form:

$\Phi_z(\theta)=\frac{\Phi_x(\theta)}{\Phi_x\left(\frac{1-p_1+p_1\theta}{1+\varphi-\varphi\theta}\right)}=\frac{(1+\varphi-\varphi\theta)(1+\alpha)-\alpha(1-p_1+p_1\theta)}{(1+\alpha-\alpha\theta)(1+\varphi-\varphi\theta)}=\frac{(p_1+\varphi)\alpha}{\alpha-\varphi}\cdot\frac{1}{1+\varphi-\varphi\theta}+\left(1-\frac{(p_1+\varphi)\alpha}{\alpha-\varphi}\right)\frac{1}{1+\alpha-\alpha\theta}.$ (16)

Note that for $p_1=0$ the mixing weight reduces to $\frac{\alpha\varphi}{\alpha-\varphi}$, in agreement with the innovation structure of the NGINAR(1) model of [10].

4.2. Mixed by Lindley density

Suppose now that the mixing density $g(\theta\,|\,\varphi)=\frac{\varphi^2}{1+\varphi}(\theta+1)e^{-\varphi\theta}$ is a Lindley density function. The distribution of $U_i$ is then the so-called Poisson–Lindley distribution (see [6]), which has the following probability mass function:

$P(U_i=x)=\int_0^{\infty}e^{-\theta_i}\frac{\theta_i^x}{x!}\cdot\frac{\varphi^2}{1+\varphi}(\theta_i+1)e^{-\varphi\theta_i}\,d\theta_i=\frac{\varphi^2}{(1+\varphi)\,x!}\left(\int_0^{\infty}\theta_i^{x+1}e^{-(\varphi+1)\theta_i}\,d\theta_i+\int_0^{\infty}\theta_i^{x}e^{-(\varphi+1)\theta_i}\,d\theta_i\right)=\frac{\varphi^2}{(1+\varphi)\,x!}\left(\frac{\Gamma(x+2)}{(1+\varphi)^{x+2}}+\frac{\Gamma(x+1)}{(1+\varphi)^{x+1}}\right)=\frac{\varphi^2(\varphi+2+x)}{(1+\varphi)^{x+3}},\quad x=0,1,\ldots.$ (17)

Under this parameter setting, $E[U_i]=\mu_g=\frac{\varphi+2}{\varphi(\varphi+1)}$, which makes the parameter $\varphi$ less interpretable. We therefore adopt the following parametrization of the mixing density $g(\theta\,|\,\varphi)$:

$g(\theta\,|\,\varphi)=\frac{\tilde\varphi^2}{1+\tilde\varphi}(\theta+1)e^{-\tilde\varphi\theta},\qquad \tilde\varphi=\frac{1-\varphi+\Delta}{2\varphi},\qquad \Delta=\sqrt{(\varphi-1)^2+8\varphi}.$ (18)

Then $\mu_g=\varphi$ and $\sigma_g^2=\varphi^2-\frac{2}{(\tilde\varphi(1+\tilde\varphi))^2}$. On the other hand, the additivity of the $U_i$ is not obvious: in order to evaluate the expectation (3), we need to find the distribution of $S_n=\sum_{i=1}^n\theta_i$.
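Before turning to $S_n$, the reparametrization (18) is easy to verify numerically; a small sketch (the helper name `phi_tilde` is ours):

```r
# phi_tilde solves (x + 2) / (x * (x + 1)) = phi, so mu_g = phi by construction.
phi_tilde <- function(phi) {
  Delta <- sqrt((phi - 1)^2 + 8 * phi)
  (1 - phi + Delta) / (2 * phi)
}
pt <- phi_tilde(0.3)
(pt + 2) / (pt * (pt + 1))  # returns 0.3, i.e. the Poisson-Lindley mean is phi
```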

Proposition 4.2

Suppose the $\theta_i$ are i.i.d. Lindley distributed. The density of the sum $S_n=\sum_{i=1}^n\theta_i$ is given by

$g_n(s\,|\,\varphi)=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}e^{-\tilde\varphi s}\sum_{k=0}^{n}\binom{n}{k}\frac{s^{n+k-1}}{\Gamma(n+k)}.$ (19)

Proof.

We can prove this by inverting the Laplace transform. The Laplace transform of θi is

$E[e^{-\nu\theta_i}]=\int_0^{\infty}\frac{\tilde\varphi^2}{1+\tilde\varphi}(\theta_i+1)e^{-(\nu+\tilde\varphi)\theta_i}\,d\theta_i=\frac{\tilde\varphi^2}{1+\tilde\varphi}\cdot\frac{\tilde\varphi+\nu+1}{(\tilde\varphi+\nu)^2}.$

Then the Laplace transform of $S_n$ is simply the product of the $E[e^{-\nu\theta_i}]$, which is

$E[e^{-\nu S_n}]=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}\frac{(\tilde\varphi+\nu+1)^{n}}{(\tilde\varphi+\nu)^{2n}}.$

Using a binomial expansion, we have

$E[e^{-\nu S_n}]=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}\frac{1}{(\tilde\varphi+\nu)^{2n}}\sum_{k=0}^{n}\binom{n}{k}(\tilde\varphi+\nu)^{k}=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}\frac{1}{(\tilde\varphi+\nu)^{n}}\sum_{k=0}^{n}\binom{n}{n-k}(\tilde\varphi+\nu)^{-(n-k)}=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}\sum_{k=0}^{n}\binom{n}{k}(\tilde\varphi+\nu)^{-(n+k)}=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}\sum_{k=0}^{n}\binom{n}{k}\int_0^{\infty}\frac{s^{n+k-1}}{\Gamma(n+k)}e^{-\tilde\varphi s}e^{-\nu s}\,ds=\int_0^{\infty}e^{-\nu s}\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}e^{-\tilde\varphi s}\sum_{k=0}^{n}\binom{n}{k}\frac{s^{n+k-1}}{\Gamma(n+k)}\,ds.$

Clearly, the density function of $S_n$ is the integrand without the factor $e^{-\nu s}$.

Then, the distribution of $Y_{t+1}=\varphi\circ_g X_t$ is given by the following proposition.

Proposition 4.3

The probability mass function of $Y_{t+1}=\varphi\circ_g X_t$, as well as its derivatives, is given by

$f_\varphi(y,n)=P(Y_{t+1}=y\,|\,X_t=n)=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}\sum_{k=0}^{n}\binom{n}{k}\binom{n+k+y-1}{y}(1+\tilde\varphi)^{-(n+k+y)},$

$\frac{\partial f_\varphi(y,n)}{\partial\tilde\varphi}=n\left(\frac{2}{\tilde\varphi}-\frac{1}{1+\tilde\varphi}\right)f_\varphi(y,n)-(y+1)f_\varphi(y+1,n),$

$\frac{\partial^2 f_\varphi(y,n)}{\partial\tilde\varphi^2}=\left(n^2\left(\frac{2}{\tilde\varphi}-\frac{1}{1+\tilde\varphi}\right)^2-n\left(\frac{2}{\tilde\varphi^2}-\frac{1}{(1+\tilde\varphi)^2}\right)\right)f_\varphi(y,n)-2n(y+1)\left(\frac{2}{\tilde\varphi}-\frac{1}{1+\tilde\varphi}\right)f_\varphi(y+1,n)+(y+1)(y+2)f_\varphi(y+2,n),$

$\frac{\partial f_\varphi(y,n)}{\partial\varphi}=\frac{\partial f_\varphi(y,n)}{\partial\tilde\varphi}\frac{\partial\tilde\varphi}{\partial\varphi},\qquad \frac{\partial^2 f_\varphi(y,n)}{\partial\varphi^2}=\frac{\partial^2 f_\varphi(y,n)}{\partial\tilde\varphi^2}\left(\frac{\partial\tilde\varphi}{\partial\varphi}\right)^2+\frac{\partial f_\varphi(y,n)}{\partial\tilde\varphi}\frac{\partial^2\tilde\varphi}{\partial\varphi^2},$ (20)

where

$\frac{\partial\tilde\varphi}{\partial\varphi}=-\frac{1}{2\varphi}+\frac{\varphi+3}{2\varphi\Delta}-\frac{1-\varphi+\Delta}{2\varphi^2},\qquad \frac{\partial^2\tilde\varphi}{\partial\varphi^2}=\frac{1}{\varphi^2}+\frac{1}{2\varphi\Delta}+\frac{1-\varphi+\Delta}{\varphi^3}-\frac{(\varphi+3)^2}{2\varphi\Delta^3}-\frac{\varphi+3}{\varphi^2\Delta}.$

Proof.

$P(Y_{t+1}=y\,|\,X_t=n)=E\left[e^{-\sum_{i=1}^n\theta_i}\frac{\left(\sum_{i=1}^n\theta_i\right)^y}{y!}\right]=\int_0^{\infty}e^{-s}\frac{s^y}{y!}\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}e^{-\tilde\varphi s}\sum_{k=0}^{n}\binom{n}{k}\frac{s^{n+k-1}}{\Gamma(n+k)}\,ds=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}\sum_{k=0}^{n}\binom{n}{k}\frac{\Gamma(n+k+y)}{\Gamma(n+k)\Gamma(y+1)}(1+\tilde\varphi)^{-(n+k+y)}=\left(\frac{\tilde\varphi^2}{1+\tilde\varphi}\right)^{n}\sum_{k=0}^{n}\binom{n}{k}\binom{n+k+y-1}{y}(1+\tilde\varphi)^{-(n+k+y)}.$
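For numerical work it is safer to evaluate the closed form in Proposition 4.3 on the log scale. A sketch (our own implementation of the first line of (20), reusing `phi_tilde` from above; written for $n\geq 1$, since $Y_{t+1}=0$ with probability one when $X_t=0$):

```r
# f_phi(y, n) = (pt^2/(1+pt))^n * sum_k C(n,k) C(n+k+y-1, y) (1+pt)^-(n+k+y),
# computed with a log-sum-exp over k for stability (pt = phi_tilde).
f_lindley <- function(y, n, pt) {
  k <- 0:n
  lt <- lchoose(n, k) + lchoose(n + k + y - 1, y) - (n + k + y) * log(1 + pt)
  m <- max(lt)
  exp(n * (2 * log(pt) - log(1 + pt)) + m + log(sum(exp(lt - m))))
}

# Sanity check: the probabilities over y should sum to one.
pt <- phi_tilde(0.3)
sum(sapply(0:200, f_lindley, n = 4, pt = pt))  # ~ 1
```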

5. Maximum likelihood estimation and its asymptotic property

In general, the transition probability can be written down explicitly as

$P(X_{t+1}=i\,|\,X_t=j)=\sum_{m=0}^{\min(i,j)}\binom{j}{m}p_1^m(1-p_1)^{j-m}\,P(Y_{t+1}+Z_{t+1}=i-m)=\sum_{m=0}^{\min(i,j)}\sum_{x=0}^{i-m}F_{p_1}(m,j)\,f_\varphi(x,j)\,F_{\lambda_1}(i-m-x),$

where $F_{p_1}(m,j)=\binom{j}{m}p_1^m(1-p_1)^{j-m}$, $f_\varphi(x,j)=\int_{\mathbb{R}^+}e^{-s}\frac{s^x}{x!}\,g_j(s\,|\,\varphi)\,ds$ and $F_{\lambda_1}(i-m-x)=e^{-\lambda_1}\frac{\lambda_1^{i-m-x}}{(i-m-x)!}$. (21)

The log-likelihood function is then simply $\ell(p_1,\varphi,\lambda_1)=\sum_{t=0}^{n-1}\log P(X_{t+1}\,|\,X_t)$.
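As an illustration, here is a minimal R sketch of the transition probability (21) and the log-likelihood for the Exponential mixing case (the Lindley case would swap in the Poisson-Lindley pmf of Proposition 4.3); the function names are ours, and in practice one would transform the parameters to respect their ranges.

```r
# Transition probability (21), Exponential mixing: f_phi via dnbinom.
trans_prob <- function(i, j, p1, phi, lambda1) {
  if (j == 0) return(dpois(i, lambda1))  # no survivors or offspring if X_t = 0
  total <- 0
  for (m in 0:min(i, j)) {
    xx <- 0:(i - m)
    total <- total + dbinom(m, j, p1) *
      sum(dnbinom(xx, size = j, prob = 1 / (1 + phi)) * dpois(i - m - xx, lambda1))
  }
  total
}

loglik <- function(par, x) {
  sum(log(mapply(trans_prob, i = x[-1], j = x[-length(x)],
                 MoreArgs = list(p1 = par[1], phi = par[2], lambda1 = par[3]))))
}

# Fit by quasi-Newton maximization, as done in Section 6.
fit <- optim(c(0.3, 0.3, 2), function(p) -loglik(p, x),
             method = "BFGS", hessian = TRUE)
fit$par
```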

Proposition 5.1

Suppose we have a random sample $\{X_1,X_2,\ldots,X_n\}$. Let $\mathbf{p}=(p_1,\varphi,\lambda_1)$ denote the parameter vector of the stationary BMP INAR(1) model. The maximum likelihood estimator $\hat{\mathbf{p}}$ has the following asymptotic distribution:

$\sqrt{n}\,(\hat{\mathbf{p}}-\mathbf{p})\rightarrow N(0,I^{-1}),$ (22)

where

$H=\begin{pmatrix}\ell_{p_1p_1}&\ell_{p_1\varphi}&\ell_{p_1\lambda_1}\\ \ell_{\varphi p_1}&\ell_{\varphi\varphi}&\ell_{\varphi\lambda_1}\\ \ell_{\lambda_1 p_1}&\ell_{\lambda_1\varphi}&\ell_{\lambda_1\lambda_1}\end{pmatrix},\qquad I=-E[H],$ (23)
$\frac{\partial P(X_{t+1}|X_t)}{\partial p_1}=\sum_{m=0}^{\min(X_{t+1},X_t)}\sum_{x=0}^{X_{t+1}-m}\frac{\partial F_{p_1}(m,X_t)}{\partial p_1}\,f_\varphi(x,X_t)\,F_{\lambda_1}(X_{t+1}-m-x),$

$\frac{\partial^2 P(X_{t+1}|X_t)}{\partial p_1^2}=\sum_{m=0}^{\min(X_{t+1},X_t)}\sum_{x=0}^{X_{t+1}-m}\frac{\partial^2 F_{p_1}(m,X_t)}{\partial p_1^2}\,f_\varphi(x,X_t)\,F_{\lambda_1}(X_{t+1}-m-x),$

$\frac{\partial^2 P(X_{t+1}|X_t)}{\partial p_1\,\partial\varphi}=\sum_{m=0}^{\min(X_{t+1},X_t)}\sum_{x=0}^{X_{t+1}-m}\frac{\partial F_{p_1}(m,X_t)}{\partial p_1}\,\frac{\partial f_\varphi(x,X_t)}{\partial\varphi}\,F_{\lambda_1}(X_{t+1}-m-x),$

$\ell_{xy}=\sum_{t=0}^{n-1}\left[\frac{\partial^2 P(X_{t+1}|X_t)}{\partial x\,\partial y}\frac{1}{P(X_{t+1}|X_t)}-\frac{\partial P(X_{t+1}|X_t)}{\partial x}\frac{\partial P(X_{t+1}|X_t)}{\partial y}\frac{1}{P(X_{t+1}|X_t)^2}\right],$ (24)

where $x,y\in\{p_1,\varphi,\lambda_1\}$. The first and second derivatives of each component are given by

$\frac{\partial F_{p_1}(m,X_t)}{\partial p_1}=\frac{m-p_1X_t}{p_1(1-p_1)}\,F_{p_1}(m,X_t),\qquad \frac{\partial f_\varphi(m,X_t)}{\partial\varphi}=\frac{\partial}{\partial\varphi}\int_{\mathbb{R}^+}e^{-s}\frac{s^m}{m!}\,g_{X_t}(s\,|\,\varphi)\,ds,\qquad \frac{\partial F_{\lambda_1}(m)}{\partial\lambda_1}=\left(\frac{m}{\lambda_1}-1\right)F_{\lambda_1}(m),$

$\frac{\partial^2 F_{p_1}(m,X_t)}{\partial p_1^2}=\left(\frac{m(m-1-(X_t-1)p_1)}{p_1^2(1-p_1)}-\frac{(X_t-m)(m-(X_t-1)p_1)}{p_1(1-p_1)^2}\right)F_{p_1}(m,X_t),$

$\frac{\partial^2 f_\varphi(m,X_t)}{\partial\varphi^2}=\frac{\partial^2}{\partial\varphi^2}\int_{\mathbb{R}^+}e^{-s}\frac{s^m}{m!}\,g_{X_t}(s\,|\,\varphi)\,ds,\qquad \frac{\partial^2 F_{\lambda_1}(x)}{\partial\lambda_1^2}=\left(1-\frac{2x}{\lambda_1}+\frac{x(x-1)}{\lambda_1^2}\right)F_{\lambda_1}(x).$

Proof.

From Proposition 3.3, we know that $X_n$ is stationary and ergodic, with stationary distribution characterized by the p.g.f. $\Phi_x(\theta)=\prod_{i=0}^{\infty}\Phi_z(f_i(\theta))$. The score functions and information matrix $I$ are then also stationary and ergodic, and the proof of asymptotic normality is similar to that of Theorem 4 in Appendix A of [3].

The expectation of the information matrix $I$ can be calculated numerically by finding the unconditional distribution $P(X_t)$ and the joint distribution $P(X_{t-1},X_t)$. However, this would be computationally intensive when the sample size $n$ is large. In practice, since the process $X_t$ is stationary and ergodic, $I\approx -H$ when $n$ is large.
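In code, this approximation is what makes standard errors cheap. A sketch continuing the example above: `optim` was asked for the numerical Hessian of the negative log-likelihood, which estimates $-H$.

```r
# fit$hessian approximates -H (optim minimized -loglik), so inverting it
# gives the approximate covariance matrix of the MLEs.
se <- sqrt(diag(solve(fit$hessian)))
round(se, 4)
```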

To verify the asymptotic normality of the maximum likelihood estimators, we conduct a Monte Carlo experiment based on 2000 replications. For each replication, a BMP INAR(1) time series with a chosen mixing density, either Exponential or Lindley, of size $n=100,200,\ldots,500$ is generated. The parameters are set to $p_1=\varphi=0.3$, $\lambda_1=2$ for both mixing densities, and they are estimated via the maximum likelihood method. The biases and standard errors of the estimated parameters are shown in Tables 1 and 2. We observe that the biases of the estimators are either reasonably small or decrease with the sample size $n$, and the standard errors also clearly decrease with $n$. Finally, in order to inspect the distribution of the estimators graphically, normal quantile-quantile plots are provided in Figure 1.

Table 1.

The bias of the maximum likelihood estimators of the BMP INAR(1) model for different sample sizes n.

  Bias(p̂) n = 100 n = 200 n = 300 n = 400 n = 500
Exponential p1 0.0022 −0.0021 0.0019 −0.0003 −0.0003
  φ −0.0284 −0.0104 −0.0110 −0.0072 −0.0059
  λ1 0.1089 0.0526 0.0384 0.0366 0.0279
Lindley p1 −0.0008 0.0004 −0.0015 −0.0020 −0.0011
  φ −0.0209 −0.0143 −0.0085 −0.0050 −0.0039
  λ1 0.0387 0.0227 0.0141 0.0144 0.0101

Table 2.

The standard error of the maximum likelihood estimators of the BMP INAR(1) model for different sample sizes n.

  S.E.(p̂) n = 100 n = 200 n = 300 n = 400 n = 500
Exponential p1 0.1303 0.0965 0.0752 0.0663 0.0576
  φ 0.1384 0.0970 0.0783 0.0670 0.0581
  λ1 0.3982 0.2858 0.2276 0.2012 0.1764
Lindley p1 0.1319 0.0991 0.0854 0.0711 0.0630
  φ 0.1432 0.1054 0.0880 0.0729 0.0661
  λ1 0.2050 0.1515 0.1166 0.0999 0.0911

Figure 1. Quantile-quantile plots for the maximum likelihood estimators of the BMP INAR(1) model. The left panel shows the plots for the Exponential mixing density, while the right panel shows the plots for the Lindley mixing density.

6. Real data example: iceberg order data

The iceberg order counts concern Deutsche Telekom shares traded in the XETRA system of Deutsche Börse; the concrete time series gives the number of iceberg orders (on the ask side) per 20 minutes over 32 consecutive trading days in the first quarter of 2004. The special feature of iceberg orders is that only a small part of the order (the tip of the iceberg) is visible in the order book, while the main part of the order is hidden. For a detailed description, see [4,5]. This dataset is also analysed in [13], where the extended Poisson INAR(1) is applied to fit the data.

Descriptive statistics are given in Table 3, and the time series plot together with the ACF and PACF plots is shown in Figure 2. The variance of the iceberg counts is higher than their mean, which indicates that the data are overdispersed; the level of dispersion is measured by the Fisher index of dispersion, here $FI=1.552>1$. The applicability of a first-order autoregressive model is supported by the empirical ACF and PACF: the ACF shows a clear decay, and the PACF cuts off at lag 1.

Table 3.

Descriptive statistics of iceberg count.

Minimum Maximum Median Mean Variance FI
0 9 1 1.407 2.184 1.552

Figure 2. Time series plot of the iceberg data and its empirical ACF and PACF plots. The dashed lines are the 95% confidence bands obtained by assuming the series is a white noise process.

The likelihood function is constructed as in (21) with the appropriate $f_\varphi(x,j)$ (mixed by Exponential or Lindley). It is then maximized using 'optim' in R with 'method = BFGS' (a quasi-Newton method), while the standard deviations of the MLEs are calculated by inverting the negative observed information matrix of Proposition 5.1 evaluated at the MLEs. To assess the goodness of fit, we adopt the AIC and BIC information criteria as well as the (standardized) Pearson residuals. If the model is correctly specified, the Pearson residuals of the BMP INAR(1) are expected to have mean and variance close to 0 and 1, respectively, with no significant autocorrelation. The Pearson residuals are calculated by the following formula:

$e_t=\frac{x_t-E[X_t\,|\,x_{t-1}]}{\sqrt{\mathrm{Var}(X_t\,|\,x_{t-1})}},$ (25)

where $x_t$ denotes the observed value at time $t$.
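Given the conditional moments $E[X_t\,|\,x_{t-1}]=(p_1+\mu_g)x_{t-1}+\lambda_1$ and $\mathrm{Var}(X_t\,|\,x_{t-1})=(p_1(1-p_1)+\mu_g+\sigma_g^2)x_{t-1}+\lambda_1$, which follow from the proof of Proposition 3.1, the residuals are one line of R. A sketch for the Exponential mixing case ($\mu_g=\varphi$, $\sigma_g^2=\varphi^2$), continuing the fitted example above:

```r
# Standardized Pearson residuals (25) for the Exponential mixing case.
pearson_resid <- function(x, p1, phi, lambda1) {
  xp <- x[-length(x)]                                  # x_{t-1}
  cm <- (p1 + phi) * xp + lambda1                      # E[X_t | x_{t-1}]
  cv <- (p1 * (1 - p1) + phi + phi^2) * xp + lambda1   # Var(X_t | x_{t-1})
  (x[-1] - cm) / sqrt(cv)
}

e <- pearson_resid(x, fit$par[1], fit$par[2], fit$par[3])
c(mean(e), var(e))  # should be near 0 and 1 if the model fits
acf(e)              # and show no significant autocorrelation
```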

The ACF plots of the Pearson residuals in Figure 3 indicate that the BMP INAR(1) models are appropriate for the iceberg data. The estimated parameters shown in Table 4 are significantly different from 0, as indicated by their estimated standard deviations. Compared to the Dirac delta case, which is exactly the extended Poisson INAR(1) of [13], the other two cases do show some improvement, with smaller AIC and BIC values and a larger fitted Fisher index of dispersion $\widehat{FI}_x$, which, however, remains slightly smaller than the empirical $FI$. On the other hand, there is little difference between the Exponential and Lindley cases, as they have very similar AIC and BIC values; this is because the fitted value of $\hat\varphi$ is identical for both densities. Finally, it should be noted that the variance of the Pearson residuals is visibly larger than 1. As mentioned previously, the Exponential and Lindley mixing densities were considered for expository purposes; since the proposed family of BMP INAR(1) models is quite general, another mixing distribution could potentially capture the observed dispersion structure of these data more efficiently.

Figure 3. Autocorrelation of the standardized Pearson residuals for the three different mixing densities.

Table 4.

The results for the BMP INAR(1) model mixed by different density functions.

Mixing density   p̂1 (s.d.)   φ̂ (s.d.)   λ̂1 (s.d.)   AIC   BIC   Pearson resid. mean   Pearson resid. variance   F̂Ix
Dirac delta   0.410 (0.058)   0.188 (0.059)   0.567 (0.040)   2212   2226   −0.001   1.159   1.295
Exponential   0.434 (0.044)   0.167 (0.044)   0.563 (0.040)   2208   2222   −0.002   1.154   1.315
Lindley   0.434 (0.043)   0.167 (0.043)   0.563 (0.040)   2208   2222   −0.002   1.154   1.314

Note: The results of Dirac delta case are from Table 2 of [13]. The estimated standard deviations for all models are in brackets.

Overall, the mixed Poisson component in the BMP INAR(1) model efficiently captures the overdispersion in this type of financial data.

7. Concluding remarks

The BMP INAR(1) is an extension of the classical Poisson INAR(1) model obtained by adding a mixed Poisson component, and hence it can capture the level of overdispersion in the data. The exponential family is a natural choice for the mixing density due to its 'additivity' property. The choice of the mixing density can control the dispersion level to some extent, although the BMP INAR(1) process $X_t$ is always overdispersed. Furthermore, owing to its simple structure, $X_t$ is a Markov chain, and maximum likelihood estimation can be applied easily. The real data analysis shows that the BMP INAR(1) can be a strong candidate for modelling financial count data that exhibit a standard AR(1) autocorrelation structure and overdispersion.

Acknowledgments

The authors would like to thank the anonymous referee for their very helpful comments and suggestions, which have significantly improved this article, and would like to thank Prof. Christian Weiß for kindly sharing the financial count data.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Al-Osh M. and Alzaid A.A., First-order integer-valued autoregressive (INAR(1)) process, J. Time Ser. Anal. 8 (1987), pp. 261–275.
  • 2. Bourguignon M., Rodrigues J., and Santos-Neto M., Extended Poisson INAR(1) processes with equidispersion, underdispersion and overdispersion, J. Appl. Stat. 46 (2019), pp. 101–118.
  • 3. Bu R., McCabe B., and Hadri K., Maximum likelihood estimation of higher-order integer-valued autoregressive processes, J. Time Ser. Anal. 29 (2008), pp. 973–994.
  • 4. Frey S. and Sandås P., The impact of iceberg orders in limit order books, AFA 2009 San Francisco Meetings Paper, 2009.
  • 5. Jung R.C. and Tremayne A., Useful models for time series of counts or simply wrong ones?, Adv. Stat. Anal. 95 (2011), pp. 59–91.
  • 6. Karlis D., EM algorithm for mixed Poisson and other discrete distributions, ASTIN Bull. J. IAA 35 (2005), pp. 3–24.
  • 7. Kirchner M., Hawkes and INAR(∞) processes, Stoch. Process. Appl. 126 (2016), pp. 2494–2525.
  • 8. McKenzie E., Some simple models for discrete variate time series, J. Am. Water Resour. Assoc. 21 (1985), pp. 645–650.
  • 9. McKenzie E., Autoregressive moving-average processes with negative-binomial and geometric marginal distributions, Adv. Appl. Probab. 18 (1986), pp. 679–705.
  • 10. Ristić M.M., Bakouch H.S., and Nastić A.S., A new geometric first-order integer-valued autoregressive (NGINAR(1)) process, J. Stat. Plan. Inference 139 (2009), pp. 2218–2226.
  • 11. Scotto M.G., Weiß C.H., and Gouveia S., Thinning-based models in the analysis of integer-valued time series: A review, Stat. Model. 15 (2015), pp. 590–618.
  • 12. Weiß C.H., Thinning operations for modeling time series of counts – a survey, Adv. Stat. Anal. 92 (2008), pp. 319–341.
  • 13. Weiß C.H., A Poisson INAR(1) model with serially dependent innovations, Metrika 78 (2015), pp. 829–851.
  • 14. Zheng H., Basawa I.V., and Datta S., First-order random coefficient integer-valued autoregressive processes, J. Stat. Plan. Inference 137 (2007), pp. 212–229.
