Abstract
This study explores zero-inflated count time series models used to analyze data sets with characteristics such as overdispersion, excess zeros, and autocorrelation. Specifically, we investigate the ZIGINARRC(1) process, a first-order stationary integer-valued autoregressive model with random coefficients and a zero-inflated geometric marginal distribution. Our focus is on examining various estimation and prediction techniques for this model. We employ the Whittle, taper spectral Whittle, maximum empirical likelihood, and Sieve Bootstrap estimators for parameter estimation. Additionally, we propose forecasting approaches, namely median, Bayesian, and Sieve Bootstrap methods, to predict future values of the series. We assess the performance of these methods through simulation studies and real-world data analysis, finding that all methods perform well, providing 95% highest predicted probability intervals that encompass the observed data. Although the Bayesian and Bootstrap methods require more execution time, their superior predictive accuracy justifies their use in forecasting.
Keywords: Count time series, prediction, Whittle estimation, Bayesian, Sieve bootstrap
1. Introduction
Time series analysis of counts, or integer-valued time series, as a distinguished statistical technique, has been widely used in many fields, including the number of daily transactions in stock markets [7], the weekly number of patients in a hospital infected by influenza [8], and the annual severe hurricane counts in the North Atlantic [18]. A distinctive feature of this data type is its discrete integer-valued nature, which challenges the suitability of conventional autoregressive models designed for continuous variables and hinders their effective adaptation to this sort of data.
Since the late 1970s, various models have been introduced for modeling count time series, often relying on predefined marginal distributions. Notably, the binomial thinning operator-based method, as explored by [36], addresses this issue. The binomial thinning operator '$\circ$' is defined as $\alpha \circ X = \sum_{l=1}^{X} B_l$, where $\{B_l\}$ is a sequence of i.i.d. Bernoulli random variables with $P(B_l = 1) = \alpha = 1 - P(B_l = 0)$. The authors of [1] introduced the concept of the first-order integer-valued autoregressive (INAR(1)) process, pioneering the development of thinning-based INAR models. Their work laid the foundation for subsequent research in this field. Let $\{X_t\}$ be a scalar time series satisfying the INAR(1) equation
$X_t = \alpha \circ X_{t-1} + \epsilon_t, \qquad t \ge 1, \qquad (1)$
where $X_t$ can follow a certain marginal distribution and the innovation process $\{\epsilon_t\}$ is a sequence of i.i.d. random variables with some discrete distribution. Also, $\epsilon_t$ is independent of the Bernoulli counting series $\{B_l\}$ and of $X_{t-l}$, for $l \ge 1$. The coefficient α may be interpreted as the proportion of observations counted at time t−1 that still remain at time t.
Jin-Guan and Yuan [15] presented a pth-order INAR model, and [43] proposed a first-order random coefficient INAR process. Furthermore, [31] defined a new INAR(1) process with geometric distribution based on the negative binomial thinning operator. In addition, [5] presented an INAR(1) process with a zero-truncated Poisson marginal distribution. In 2016, [24] introduced a random environment into the INAR process, a significant breakthrough in INAR modeling. Pavlopoulos and Karlis [28] proposed an overdispersed INAR(1) model with innovations following a finite mixture of Poisson distributions of order k. Moreover, some authors, such as [13,40,43], considered random coefficient INAR(1) processes, in which the autoregressive parameter is a random variable. Along the same lines, INAR(1) models with zero-inflated Poisson and zero-inflated generalized power series innovations were introduced by [14] and [17], respectively.
Estimation and forecasting play a pivotal role in time series analysis. Researchers in this field have proposed different estimation methods for INAR time series, such as Yule-Walker and maximum likelihood estimation. For instance, [42] used maximum likelihood, conditional least squares, modified quasi-likelihood, and the generalized method of moments to estimate the parameters of interest in pth-order random coefficient INAR processes. Bu et al. [8] created a generic framework for estimating higher-order integer-valued autoregressive processes using maximum likelihood estimation. Likewise, [41] studied empirical likelihood based inference for random coefficient integer-valued autoregressive processes.
On the other hand, when it comes to integer-valued time series, prediction algorithms have not been extensively investigated. For example, [20] used Box-Jenkins' AR(p) model to study forecasting for count time series. Wang and Zhang [38] proposed quasi-likelihood estimators and two methods for coherent point prediction. Also, [23] suggested Whittle, maximum empirical likelihood, and Sieve Bootstrap estimation methods, along with Bayesian and Sieve Bootstrap prediction methods, for the Poisson-Lindley INAR(1) process, denoted by PLINAR(1).
Among these instances, only [4] suggested the maximum likelihood estimator for the ZIGINARRC(1) process, while other authors focused on the estimation and forecasting of INAR models with non-random coefficients. However, the practicality and effectiveness of INAR models with non-random coefficients are limited in various real-world applications, and inaccuracies in modeling often stem from the failure to account for the impact of random coefficients. As an example, in modeling the number of patients in inpatient wards or the number of submissions to animal health laboratories, the survival rate α may be affected by several environmental factors and hence could vary randomly over time. So, it is logical to substitute α with the random coefficient $V_t\alpha$ introduced below. Moreover, the study of INAR models holds significant importance especially in scenarios involving a high proportion of zero values, as ignoring this inflation can lead to two critical issues: (1) overdispersion due to the prevalence of zeros and (2) biased estimation of parameters and standard errors [4]. Except for the contribution of [4], previous studies did not address both random coefficients and zero inflation simultaneously in an INAR model. To cover this caveat of previous studies, in this paper we are interested in estimating and forecasting the zero-inflated geometric process with random coefficient.
In the following section, some basic definitions and preliminaries are mentioned. Then in Section 3, the estimators of the model parameters are obtained using Whittle, taper spectral Whittle, maximum empirical likelihood and Sieve Bootstrap estimation methods. In Section 4, Median, Bayesian and Sieve Bootstrap prediction methods for the current process are presented. The performance evaluation of the estimators and prediction methods is conducted through Monte Carlo simulations in Section 5, while practical data is utilized for predictive purposes in Section 6. The concluding remarks are presented in Section 7.
2. Preliminaries
Bakouch et al. [4] proposed an INAR(1) process with a random coefficient and a zero-inflated geometric marginal distribution, named the ZIGINARRC(1) process, which admits non-negative integer values with excess zeros and is based on a randomized binomial thinning operator.
Definition 2.1
A random variable X is said to follow a zero-inflated geometric (ZIG) distribution with parameters µ and p, abbreviated as ZIG(µ, p), if its probability mass function (pmf) is
$P(X = x) = p\,\mathbb{1}(x = 0) + (1-p)\,\frac{\mu^{x}}{(1+\mu)^{x+1}}, \qquad x = 0, 1, 2, \ldots,$
where $\mu > 0$ and $0 \le p < 1$.
Definition 2.2
Given a non-negative integer-valued random variable X and a binary random variable V which is independent of X, with $P(V = 0) = \beta = 1 - P(V = 1)$ for given real numbers $\alpha \in (0,1)$ and $\beta \in [0,1)$, the (standard) binomial thinning operation is hereby extended to a randomized binomial thinning operation, defined and denoted by the random variable
$(V\alpha) \circ X = V\,(\alpha \circ X).$
Based on this definition, '$(V\alpha)\circ$' is either equal to the 'standard' binomial thinning operator '$\alpha\circ$' (w.p. $1-\beta$) or equal to 0 (w.p. β). Following [4], the ZIGINARRC(1) process has the following expression
$X_t = (V_t\alpha) \circ X_{t-1} + \epsilon_t, \qquad t \ge 1, \qquad (2)$
where each $\epsilon_t$ is independent of the past values of the process and of the corresponding thinning pair at time t. Explicit expressions for the mean and variance of the innovation process $\{\epsilon_t\}$ are given in [4].
Based on the properties of the randomized binomial thinning operator and iterating (2), [4] showed that
$X_t = \epsilon_t + \sum_{i=1}^{\infty} (V_t\alpha)\circ(V_{t-1}\alpha)\circ\cdots\circ(V_{t-i+1}\alpha)\circ \epsilon_{t-i}. \qquad (3)$
Consequently, the ZIGINARRC(1) process (2) has a unique, strictly stationary solution given by (3). Also, the ZIGINARRC(1) process is a stationary discrete-time Markov chain with the following one-step transition probabilities from state i to state j:
$P(X_t = j \mid X_{t-1} = i) = \beta f_{\epsilon}(j) + (1-\beta)\sum_{k=0}^{\min(i,j)} \binom{i}{k}\alpha^{k}(1-\alpha)^{i-k} f_{\epsilon}(j-k),$
where $f_{\epsilon}(\cdot)$ is the pmf of the innovation $\epsilon_t$.
For the ZIGINARRC(1) process, the k-step-ahead conditional expectation is
$E(X_{t+k} \mid X_t) = (\alpha(1-\beta))^{k} X_t + (1-p)\mu\left(1 - (\alpha(1-\beta))^{k}\right),$
and the corresponding k-step-ahead conditional variance is obtained analogously in [4].
The autocovariance function of the process is obtained as
$\gamma_X(k) = (\alpha(1-\beta))^{|k|}\,\sigma_X^2,$
where $\sigma_X^2 = \mathrm{Var}(X_t)$, and the autocorrelation function is $\rho_X(k) = (\alpha(1-\beta))^{|k|}$. Consequently, the spectral density function of the process is
$f_X(\omega) = \frac{\sigma_X^2}{2\pi}\,\frac{1 - (\alpha(1-\beta))^2}{1 + (\alpha(1-\beta))^2 - 2\,\alpha(1-\beta)\cos\omega}, \qquad \omega \in [-\pi,\pi]. \qquad (4)$
3. Estimation methods
Model parameter estimation is critical to properly specify the model and predict future values of the series. Bakouch et al. [4] described and compared two methods, the conditional least squares (CLS) and maximum likelihood estimators (MLE), to estimate the unknown parameters of the ZIGINARRC(1) process. In this section, we describe some alternative methods to estimate the parameter vector $\eta = (p, \mu, \alpha, \beta)$. The assessment of their performance is carried out in Section 5.
3.1. Maximum empirical likelihood estimation
Owen [25,26] presented the empirical likelihood (EL) technique, a method for applying likelihood-based inference to non-parametric situations. For a sequence of observations $x_1, \ldots, x_n$ of a discrete-valued random variable X with unknown distribution function F, the EL function is defined as $L(F) = \prod_{i=1}^{n} p_i$, where $p_i = P(X = x_i)$. Let the empirical distribution function be $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le x)$, where I is the indicator function. Then, the empirical likelihood ratio (ELR) is expressed as $R(F) = L(F)/L(F_n) = \prod_{i=1}^{n} n p_i$.
Assume $g(x;\theta)$ is the estimating function that defines the parameter θ through $E[g(X;\theta)] = 0$. Therefore, the profile ELR function is described as
$R(\theta) = \max\left\{\prod_{i=1}^{n} n p_i \,:\, \sum_{i=1}^{n} p_i\, g(x_i;\theta) = 0,\ p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1\right\}.$
The Lagrange multiplier approach can be used to find the maximizer of $R(\theta)$. Consider the Lagrange multipliers λ and γ such that
$G = \sum_{i=1}^{n} \log(n p_i) - n\lambda^{\top}\sum_{i=1}^{n} p_i\, g(x_i;\theta) + \gamma\left(\sum_{i=1}^{n} p_i - 1\right).$
It can be justified that the maximum is reached at $p_i = \{n(1 + \lambda^{\top} g(x_i;\theta))\}^{-1}$ and $\gamma = -n$. Furthermore, $\lambda = \lambda(\theta)$, which is a function of the unknown parameter θ, is a solution to
$\sum_{i=1}^{n} \frac{g(x_i;\theta)}{1 + \lambda^{\top} g(x_i;\theta)} = 0.$
As a result, the negative log profile ELR function may be expressed as
$\ell(\theta) = -\log R(\theta) = \sum_{i=1}^{n} \log\left(1 + \lambda^{\top} g(x_i;\theta)\right);$
the maximum empirical likelihood estimator (MELE) of θ is the minimizer of $\ell(\theta)$.
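As a concrete illustration of this machinery, the following R sketch profiles the ELR for a scalar mean, using the estimating function $g(x;\theta) = x - \theta$. It is a minimal i.i.d. example of our own, with illustrative function names, and not the model-specific version developed below.

```r
# Minimal i.i.d. EL sketch: profile ELR for a scalar mean, g(x; theta) = x - theta.
el_neglog_ratio <- function(x, theta) {
  g <- x - theta
  # inner problem: lambda maximizes sum(log(1 + lambda * g)), which is concave
  obj <- function(lambda) -sum(log(pmax(1 + lambda * g, 1e-10)))
  lam <- optimize(obj, interval = c(-5, 5))$minimum
  sum(log(pmax(1 + lam * g, 1e-10)))   # negative log profile ELR at theta
}

set.seed(1)
x <- rpois(100, 2)
# the MELE of the mean minimizes the negative log profile ELR (close to mean(x))
optimize(function(th) el_neglog_ratio(x, th), interval = range(x))$minimum
```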
Mykland [22] extended this strategy to statistical models with a martingale structure, establishing that, under the true parameter, the derivative of the objective function with respect to the unknown parameters is a martingale. Therefore, the score function is used to construct the ELR statistic. This approach is used to obtain the profile ELR function in our setting.
The residual sum of squares for the ZIGINARRC(1) process is defined as
$S(\eta) = \sum_{t=2}^{n}\left(X_t - E(X_t \mid X_{t-1})\right)^2 = \sum_{t=2}^{n}\left(X_t - \alpha(1-\beta)X_{t-1} - \mu_{\epsilon}\right)^2,$
where $\mu_{\epsilon} = E(\epsilon_t)$. Let $\eta = (p, \mu, \alpha, \beta)^{\top}$. The CLS estimate of η is obtained by solving the equation $\partial S(\eta)/\partial \eta = 0$. It is convenient to justify that $\partial S(\eta)/\partial \eta = -2\sum_{t=2}^{n} m_t(\eta)$, with
$m_t(\eta) = \left(X_t - E(X_t \mid X_{t-1})\right)\frac{\partial E(X_t \mid X_{t-1})}{\partial \eta}. \qquad (5)$
Let $\mathcal{F}_t = \sigma(X_1, \ldots, X_t)$. The sequence $\{m_t(\eta)\}$ can be proved to be a martingale difference sequence with respect to $\{\mathcal{F}_t\}$; see [41]. Based on the score function and following [10,22], the profile ELR function can be built as follows:
$R(\eta) = \max\left\{\prod_{t=2}^{n} n p_t \,:\, \sum_{t=2}^{n} p_t\, m_t(\eta) = 0,\ p_t \ge 0,\ \sum_{t=2}^{n} p_t = 1\right\}; \qquad (6)$
the method of Lagrange multipliers can be used to find the maximizer of (6). Using the same arguments as in the i.i.d. scenario, it can be confirmed that the optimal weights are $p_t = \{n(1 + \lambda^{\top} m_t(\eta))\}^{-1}$, with λ satisfying
$\sum_{t=2}^{n} \frac{m_t(\eta)}{1 + \lambda^{\top} m_t(\eta)} = 0. \qquad (7)$
Numerical approaches can be used to find the solution to Equation (7). As a result, the negative log profile ELR statistic has the following form:
$\ell(\eta) = \sum_{t=2}^{n} \log\left(1 + \lambda^{\top} m_t(\eta)\right). \qquad (8)$
The minimizer of (8) is the MELE of η, which can be obtained by a nested optimization: an inner maximization over λ at fixed η, followed by an outer minimization over η, as sketched below.
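The following R sketch records this nested optimization, reduced for clarity to the two conditional-mean parameters $\phi = (\theta, \mu_{\epsilon})$ with $\theta = \alpha(1-\beta)$; the full four-parameter score follows (5) and [41], and all function names here are illustrative.

```r
# Sketch of the nested MELE optimization in the reduced parameterization
# phi = (theta, mu_eps), theta = alpha*(1 - beta).
mele_profile <- function(x, phi) {
  n <- length(x)
  r <- x[-1] - phi[1] * x[-n] - phi[2]       # CLS residuals X_t - E(X_t | X_{t-1})
  m <- cbind(r * x[-n], r)                   # m_t: score of the CLS criterion
  obj <- function(lam) -sum(log(pmax(1 + m %*% lam, 1e-10)))
  lam <- optim(c(0, 0), obj)$par             # inner maximization over lambda
  sum(log(pmax(1 + m %*% lam, 1e-10)))       # negative log profile ELR, as in (8)
}

mele <- function(x) {
  nlminb(c(0.3, mean(x)), function(phi) mele_profile(x, phi),
         lower = c(0.01, 0.01), upper = c(0.99, Inf))$par  # outer minimization
}
```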
3.2. Sieve bootstrap parameter estimation
The Sieve Bootstrap (SB) technique, originally employed by [2] to analyze time series data, was expanded by [9] and [16]. In the following, our proposed methodology to obtain the parameters of the ZIGINARRC(1) process using the SB method is presented in outline.
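Since the algorithmic details are given only in outline, the following R sketch records one generic reading of the SB estimator: resample whole series from a fitted model and average the re-estimates. The helpers `estimate_cls` (a model estimator) and `rziginar` (a model simulator, cf. the sketch in Section 5) are illustrative placeholders, not functions from the paper.

```r
# Generic SB estimator sketch: estimate once, simulate B bootstrap series from
# the fitted model, re-estimate on each, and average.
sieve_bootstrap_estimate <- function(x, B = 200) {
  eta_hat <- estimate_cls(x)                    # pilot estimate from the data
  boot <- replicate(B, estimate_cls(rziginar(length(x), eta_hat)))
  rowMeans(boot)                                # SBE: average bootstrap estimate
}
```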
3.3. Whittle estimation
In this section, a frequency-domain estimation method proposed by [39] is considered for estimating the process parameters. The Whittle likelihood is an approximation of the likelihood function of a stationary Gaussian time series. The merit of Whittle's estimator lies in the convenience of working with a model's spectral density function instead of its exact likelihood.
Let $\eta = (p, \mu, \alpha, \beta)$ be the vector of the model parameters and $f(\omega;\eta)$ denote the non-normalized spectral density function of the process. The periodogram of the process of interest is $I_n(\omega_j) = \frac{1}{2\pi n}\left|\sum_{t=1}^{n} X_t e^{-it\omega_j}\right|^2$, where $\omega_j = 2\pi j/n$. For Gaussian time series, [39] proved that maximizing the sample's log-likelihood function is asymptotically identical to minimizing Whittle's criterion, given by
$W(\eta) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\left(\log f(\omega;\eta) + \frac{I_n(\omega)}{f(\omega;\eta)}\right)d\omega. \qquad (9)$
Let $\hat{\eta}_W$ represent the estimator obtained by minimizing (9). Under the Gaussianity assumption, this estimator is weakly consistent and has an asymptotic normal distribution; see [37,39]. Whittle's criterion can also be applied to non-Gaussian time series, and the resulting estimator is still consistent and asymptotically normal, according to [29]. However, since it is difficult to express the asymptotic variance of $\hat{\eta}_W$ in closed form, inference based on this asymptotic distribution is not feasible here.
To estimate the parameters by minimizing the Whittle criterion in practice, (9) is substituted by its discretized version
$W_n(\eta) = \frac{1}{n}\sum_{j=1}^{\lfloor n/2 \rfloor}\left(\log f(\omega_j;\eta) + \frac{I_n(\omega_j)}{f(\omega_j;\eta)}\right), \qquad (10)$
where $f(\omega_j;\eta)$ is the spectral density function at the frequency point $\omega_j = 2\pi j/n$. For the ZIGINARRC(1) process, taking into account the spectral density function shown in (4), we obtain
$W_n(\eta) = \frac{1}{n}\sum_{j=1}^{\lfloor n/2 \rfloor}\left(\log f_X(\omega_j;\eta) + \frac{I_n(\omega_j)}{f_X(\omega_j;\eta)}\right). \qquad (11)$
Numerical techniques can be used to minimize Equation (11); here, we used the nlminb function in R. The spectral density function contains important information about the properties of a random process. We also recall the tapering method, which smooths the periodogram by a taper function and hence yields an estimator with smaller variance; for more details see [3].
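The following R sketch puts (11) and nlminb together, assuming the reconstructed spectral density (4) with $\theta = \alpha(1-\beta)$; the variance expression `sigma2` in terms of (p, µ) is our own assumption for the ZIG marginal and should be checked against [4].

```r
# Whittle estimation sketch for eta = (p, mu, alpha, beta) via nlminb.
whittle_fit <- function(x) {
  n  <- length(x)
  j  <- 1:floor((n - 1) / 2)
  wj <- 2 * pi * j / n                                   # Fourier frequencies
  In <- (Mod(fft(x - mean(x)))^2 / (2 * pi * n))[j + 1]  # periodogram I_n(w_j)
  fX <- function(eta, w) {                               # spectral density (4)
    theta  <- eta[3] * (1 - eta[4])
    sigma2 <- (1 - eta[1]) * eta[2] * (1 + eta[2] * (1 + eta[1]))  # assumed Var(X)
    sigma2 / (2 * pi) * (1 - theta^2) / (1 + theta^2 - 2 * theta * cos(w))
  }
  W <- function(eta) mean(log(fX(eta, wj)) + In / fX(eta, wj))     # criterion (11)
  nlminb(c(0.3, 1, 0.5, 0.2), W,
         lower = rep(0.01, 4), upper = c(0.99, Inf, 0.99, 0.99))$par
}
```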
Based on a realization $X_1, \ldots, X_n$ from a given process, a nonparametric spectral density estimator of that process is
$\hat{f}^{T}(\omega) = \frac{\left|\sum_{t=1}^{n} h_t\,(X_t - \bar{X})\,e^{-it\omega}\right|^2}{2\pi \sum_{t=1}^{n} h_t^2}, \qquad (12)$
where $\bar{X}$ is the sample mean of the process, $\{h_t\}$ is a data taper, and $\hat{f}^{T}$ is known as a taper spectral density estimator [3]. Consequently, in (10), we can use $\hat{f}^{T}(\omega_j)$ instead of $I_n(\omega_j)$.
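A short R sketch of the tapered periodogram (12) follows; the sine taper used here is only an illustrative stand-in for the minimum-bias tapers of [30].

```r
# Tapered periodogram as in (12), with an illustrative sine taper.
taper_periodogram <- function(x) {
  n <- length(x)
  h <- sin(pi * (1:n) / (n + 1))                 # data taper h_t
  j <- 1:floor((n - 1) / 2)
  (Mod(fft(h * (x - mean(x))))^2 / (2 * pi * sum(h^2)))[j + 1]
}
```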
4. Forecasting
Point and interval predictions of future values are crucial in practice, and several methods for forecasting future values of integer-valued time series have been developed. In classical methodology, the h-step-ahead predictor of the ZIGINARRC(1) process is given by the conditional mean (CM), investigated by [7,11], as
$\hat{X}_{n+h} = E(X_{n+h} \mid X_n) = (\alpha(1-\beta))^{h} X_n + (1-p)\mu\left(1 - (\alpha(1-\beta))^{h}\right). \qquad (13)$
The CM-based prediction is not a coherent forecasting method, as it is generally not integer-valued. Freeland and McCabe [11] established the notion of coherent forecasting with respect to integer-valued time series data. Moreover, probabilistic forecasts, based on estimating the forecasting distribution, are researched in [11,21]. In line with the objectives of the present study, some predictor methods are suggested in the following.
4.1. Median of transition probability-based predictor
Freeland and McCabe [11] suggested a median of transition probability-based predictor, obtained by minimizing the expected absolute error given the sample $\mathbf{X} = (X_1, \ldots, X_n)$, as a coherent prediction of $X_{n+h}$. Let $m_h$ be the median of the h-step-ahead conditional distribution $p_h(x \mid X_n) = P(X_{n+h} = x \mid X_n)$, the conditional pmf of $X_{n+h}$ given $X_n$. In this situation, $m_h$ is the predicted value of $X_{n+h}$, that is, $\hat{X}_{n+h} = m_h$, and, for the ZIGINARRC(1) process, it is obtained as the smallest non-negative integer m satisfying
$\sum_{x=0}^{m} p_h(x \mid X_n) \ge \frac{1}{2}.$
Clearly, the median method is a coherent forecasting method.
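The following R sketch implements the median predictor under the reconstructed one-step transition probabilities of Section 2; the innovation pmf `f_eps` (derived in [4]) must be supplied by the user as a vectorized function, and the truncation level M is a numerical convenience that must comfortably exceed the observed range.

```r
# Median predictor: build the one-step transition matrix, raise it to the h-th
# power, and take the conditional median given X_n = x_n.
median_predict <- function(x_n, h, alpha, beta, f_eps, M = 50) {
  states <- 0:M
  P <- outer(states, states, Vectorize(function(i, j) {
    k <- 0:min(i, j)                             # survivors of the thinning
    beta * f_eps(j) +
      (1 - beta) * sum(dbinom(k, i, alpha) * f_eps(j - k))
  }))
  Ph <- diag(M + 1)
  for (s in seq_len(h)) Ph <- Ph %*% P           # h-step transition matrix
  ph <- Ph[x_n + 1, ]                            # h-step pmf given X_n = x_n
  min(states[cumsum(ph) >= 0.5])                 # smallest m with CDF >= 1/2
}
```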
4.2. Sieve bootstrap forecast
To preserve the integer-valued nature of the data, another method can be proposed, namely the Bootstrap approach, which does not require any specific distributional assumption. Pascual et al. [27] implemented a Bootstrap approach that, after minor adjustments to the ZIGINARRC(1) model, proceeds along the lines sketched below.
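A compact R sketch of this resampling scheme follows; `estimate_cls`, `rziginar`, and `simulate_forward` are illustrative placeholders for a model estimator, a model simulator, and an h-step path simulator from the last observation, respectively.

```r
# Bootstrap forecasting sketch in the spirit of [27].
bootstrap_forecast <- function(x, h, B = 500) {
  eta_hat <- estimate_cls(x)
  preds <- replicate(B, {
    eta_b <- estimate_cls(rziginar(length(x), eta_hat))   # bootstrap estimate
    simulate_forward(x[length(x)], h, eta_b)              # draw of X*_{n+h}
  })
  c(point = median(preds),                       # coherent point forecast
    quantile(preds, c(0.025, 0.975)))            # bootstrap prediction interval
}
```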
4.3. Method of Bayesian forecasting
Consider the forthcoming observation $X_{n+h}$ and the random vector of unknown parameters $\eta$. The knowledge about $\eta$ comes from the observed sample $\mathbf{X} = (X_1, \ldots, X_n)$ and is quantified in the posterior predictive model $P(X_{n+h} \mid \mathbf{X})$. The Bayesian predictive probability function is a weighted average across the parameter space Θ, assigning to each potential parameter setting its posterior weight. For the Poisson INAR(1) process, [34,35] proposed a Bayesian technique, and [23] used this method for the Poisson-Lindley INAR(1) process. Here, we extend this method to the ZIGINARRC(1) process.
Definition 4.1
Let $\eta$ be the unknown parameter vector. The h-step-ahead Bayesian posterior predictive distribution is given by
$P(X_{n+h} = x \mid \mathbf{X}) = \int_{\Theta} P(X_{n+h} = x \mid \eta, \mathbf{X})\, \pi(\eta \mid \mathbf{X})\, d\eta, \qquad (14)$
where $\pi(\eta \mid \mathbf{X})$ is the posterior probability function of $\eta$.
In the other methods, the h-step-ahead transition probability is denoted by $p_h(x \mid X_n)$; to emphasize the randomness of the parameters, it is here written as $P(X_{n+h} = x \mid \eta, \mathbf{X})$. Once the Bayesian posterior predictive distribution (14) is obtained, the Bayesian h-step-ahead predictor can be defined as the expectation, median, or mode of this distribution.
In the ZIGINARRC(1) model, beta and gamma distributions can be used as prior distributions of the parameters. Specifically, assume that $\alpha \sim \mathrm{Beta}(a_1, b_1)$, $\beta \sim \mathrm{Beta}(a_2, b_2)$, $p \sim \mathrm{Beta}(a_3, b_3)$, and $\mu \sim \mathrm{Gamma}(c, d)$. If these parameters are considered to be independent random variables, then the prior distribution of $\eta$ is
$\pi(\eta) = \pi(\alpha)\,\pi(\beta)\,\pi(p)\,\pi(\mu), \qquad (15)$
where $a_i$, $b_i$ ($i = 1, 2, 3$), c and d are known hyperparameters. Hence, the posterior distribution of $\eta$ is
$\pi(\eta \mid \mathbf{X}) = \frac{L(\mathbf{X} \mid \eta)\,\pi(\eta)}{\int_{\Theta} L(\mathbf{X} \mid \eta)\,\pi(\eta)\,d\eta}, \qquad (16)$
where $L(\mathbf{X} \mid \eta) = \prod_{t=2}^{n} P(X_t \mid X_{t-1}; \eta)$ is the conditional likelihood function. Due to the intricate structure of the posterior distribution of $\eta$, it is difficult to compute the marginal distribution and the posterior mean of each of the unknown parameters. As a result, the Gibbs sampling algorithm of [12] is used; for the simulation experiments, we need the full conditional posterior distribution of each parameter. Using Equation (16), each parameter's full conditional posterior distribution is proportional to the product of the conditional likelihood and the corresponding prior.
It is worth noting that when a gamma prior is applied to a parameter, the full conditional posterior density function is a linear combination of gamma densities, while when a beta prior is employed, the full conditional posterior density function is a linear combination of beta densities. Thus, for the ZIGINARRC(1) model, the Bayesian predictive function of $X_{n+h}$ given $\mathbf{X}$ is
$P(X_{n+h} = x \mid \mathbf{X}) = \int_{\Theta} p_h(x \mid X_n; \eta)\,\pi(\eta \mid \mathbf{X})\,d\eta. \qquad (17)$
The complexity of (17) precludes us from utilizing the standard Bayesian procedure to find a closed-form solution, so a forecasting algorithm is used to determine the forecast value by drawing an h-step-ahead sample from Equation (17), as sketched below.
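The paper's algorithm is a Gibbs sampler built on the full conditionals above; as a compact stand-in, the following R sketch uses a random-walk Metropolis update with the priors of (15) (flat hyperparameters chosen purely for illustration) and draws $X_{n+h}$ from the h-step transition at each kept iteration. The helper `trans_pmf(x, x_n, h, eta)` is a placeholder for the h-step transition pmf (cf. the median predictor above).

```r
# Compact MCMC stand-in for the Gibbs sampler, approximating (17).
bayes_forecast <- function(x, h, iters = 5000, burn = 1000) {
  logpost <- function(eta) {                         # eta = (p, mu, alpha, beta)
    if (any(eta[c(1, 3, 4)] <= 0) || any(eta[c(1, 3, 4)] >= 1) || eta[2] <= 0)
      return(-Inf)
    ll <- sum(log(sapply(2:length(x),
                         function(t) trans_pmf(x[t], x[t - 1], 1, eta))))
    ll + dbeta(eta[1], 1, 1, log = TRUE) + dgamma(eta[2], 1, 1, log = TRUE) +
      dbeta(eta[3], 1, 1, log = TRUE) + dbeta(eta[4], 1, 1, log = TRUE)
  }
  eta <- c(0.3, 1, 0.5, 0.2)
  draws <- integer(0)
  for (i in 1:iters) {
    prop <- eta + rnorm(4, 0, 0.05)                  # random-walk proposal
    if (log(runif(1)) < logpost(prop) - logpost(eta)) eta <- prop
    if (i > burn) {                                  # draw from predictive (17)
      pmf <- sapply(0:50, function(v) trans_pmf(v, x[length(x)], h, eta))
      draws <- c(draws, sample(0:50, 1, prob = pmf))
    }
  }
  draws                                              # sample from P(X_{n+h} | X)
}
```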
5. Numerical simulations
Using the R software, we conducted various simulation experiments to evaluate the performance of the suggested estimation and prediction methods. We generated observations from the ZIGINARRC(1) model with three different sets of parameters, with p = 0.2, 0.5, and 0.4, respectively. In each scenario, we ran simulations with five different sample sizes, N = 50, 100, 200, 500, and 1000, and all of the experiments were repeated a large number of times.
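For reproducibility, a sketch of a ZIGINARRC(1) simulator is given below. The randomized thinning and recursion (2) follow the text, while the exact innovation distribution is derived in [4]; here the innovation sampler `rinnov` is an input, and the ZIG sampler shown is only an illustrative stand-in for it.

```r
# ZIG(mu, p) draws; rgeom(prob = 1/(1+mu)) has pmf mu^x / (1+mu)^(x+1)
rzig <- function(n, mu, p)
  ifelse(runif(n) < p, 0L, rgeom(n, prob = 1 / (1 + mu)))

rziginar_sim <- function(n, alpha, beta, rinnov, burn = 100) {
  x <- numeric(n + burn)
  x[1] <- rinnov(1)
  for (t in 2:(n + burn)) {
    v    <- rbinom(1, 1, 1 - beta)               # V_t = 1 w.p. 1 - beta
    x[t] <- v * rbinom(1, x[t - 1], alpha) +     # (V_t alpha) o X_{t-1}
            rinnov(1)                            # innovation eps_t
  }
  x[-(1:burn)]
}

set.seed(1)
x <- rziginar_sim(200, alpha = 0.5, beta = 0.2,
                  rinnov = function(n) rzig(n, mu = 1, p = 0.2))
```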
The proposed estimation methods were compared with the MLE approach. Table 1 reports the estimated bias (BIAS) and sample standard error (SSE) of the estimates for the MLE, Whittle estimation (WE), taper spectral Whittle estimation (TSWE), maximum empirical likelihood estimation (MELE), and Sieve Bootstrap estimation (SBE) methods. With respect to the spectral measures in Section 3.3, the minimum-bias tapers proposed by [30] were employed.
For each case, the first and second lines show the BIAS and SSE of the estimates, respectively, where the SSE is in bold-face type and brackets. Based on this table, we observe that as the sample size n increases, the BIAS and SSE of all estimators decrease, which indicates the acceptable performance of the mentioned techniques. Moreover, all of the proposed estimation techniques perform as well as or better than the MLE technique.
Table 1.
Simulation results for the BIAS and SSE (SSE in bold-face type) of the parameter estimates for the ZIGINARRC(1) process under the three parameter settings.
n | MLE | WE | TSWE | MELE | SBE |
---|---|---|---|---|---|
(Setting 1: p = 0.2) | | | | |
50 | (−0.026,0.006, −0.010,−0.010) | (0.036,0.011, 0.008, −0.014) | (0.030,0.001, 0.015,−0.016) | (−0.002,0.018, 0.008,0.001) | (−0.010,0.002, 0.003,0.004) |
(0.006,0.007,0.009,0.007) | (0.005, 0.007,0.007,0.006) | (0.006,0.008,0.008,0.006) | (0.009,0.008,0.009,0.009) | (0.003,0.000,0.000,0.003) | |
100 | (−0.017,0.007,−0.008,−0.008) | (0.032,0.010,0.008,−0.013) | (0.029,0.001,0.013,−0.017) | (−0.003,0.012,0.009,0.003) | (0.008,0.002,0.004,0.005) |
(0.006,0.007,0.008,0.006) | (0.004,0.007,0.006,0.005) | (0.005,0.007,0.008,0.005) | (0.009,0.008,0.009,0.008) | (0.003,0.001,0.000,0.003) | |
200 | (0.012,0.005,−0.007,0.006) | (0.027,0.008,0.009,−0.011) | (0.020,0.000,0.010,−0.013) | (0.002,0.008,0.008,−0.002) | (0.007,−0.001,0.002,0.003) |
(0.005,0.006,0.008,0.004) | (0.002,0.005,0.005,0.003) | (0.003,0.005,0.006,0.003) | (0.008,0.007,0.007,0.006) | (0.002,−0.001,0.002,0.002) | |
500 | (−0.009,0.004,0.003,0.004) | (0.018,0.007,0.007,0.010) | (0.016,0.001,0.008,0.011) | (−0.001,0.004,0.007,0.002) | (0.004,0.001,0.001,0.004) |
(0.004,0.004,0.007,0.003) | (0.001,0.003,0.003,0.003) | (0.002,0.002,0.003,0.002) | (0.007,0.006,0.005,0.003) | (0.001,0.000,0.001,0.001) | |
1000 | (−0.006,−0.004, −0.001,0.001) | (0.015, 0.007, 0.009,0.01) | (0.014,0.000, 0.008,0.009) | (−0.001,0.001, 0.007,0.007) | (−0.005,−0.000, 0.001,0.005) |
(0.004,0.003,0.007,0.002) | (0.000,0.003,0.001,0.001) | (0.002,0.004,0.002,0.001) | (0.007,0.006,0.004, 0.005) | (0.001,0.001,0.001,0.001) | |
(Setting 2: p = 0.5) | | | | |
50 | (0.064,−0.083, 0.079, 0.014) | (−0.020,−0.049, 0.061,0.032) | (−0.018,−0.052, 0.062,0.026) | (−0.008,−0.044, 0.056,0.011) | (0.022,−0.027, 0.019, 0.011) |
(0.004,0.002,0.002,0.007) | (0.007,0.006,0.005,0.006) | (0.007,0.006,0.005,0.007) | (0.009,0.007,0.006,0.008) | (0.003, 0.000,0.000,0.004) | |
100 | (0.058,−0.063,0.069,0.011) | (−0.017,−0.035,0.058,0.025) | (−0.016,−0.049,0.057,0.023) | (−0.007,−0.037,0.035,0.014) | (0.016,−0.023,0.012,0.009) |
(0.003,0.002,0.002,0.007) | (0.007,0.005,0.003,0.006) | (0.006,0.007,0.005,0.001) | (0.01,0.009,0.008,0.010) | (0.003,0.002,0.001,0.004) | |
200 | (0.042,0.047,0.039,0.009) | (−0.009,0.021,0.042,0.013) | (0.014,−0.025,0.043,0.021) | (0.006,−0.021,0.018,0.01) | (0.009,0.014,0.008,0.005) |
(0.003,0.001,0.001,0.006) | (0.006,0.003,0.002,0.004) | (0.006,0.006,0.003,0.004) | (0.009,0.008,0.007,0.009) | (0.002,0.001,0.002,0.003) | |
500 | (0.015,−0.02,0.01,0.007) | (0.008,0.019,0.030,0.009) | (0.012,0.012,0.026,0.017) | (0.006,0.010,0.008,0.009) | (−0.006,0.007,0.004,0.003) |
(0.002,0.001,0.001,0.004) | (0.004,0.003,0.001,0.03) | (0.005,0.004,0.002,0.003) | (0.008,0.008,0.006,0.008) | (0.001,0.001,0.001,0.003) | |
1000 | (−0.011, −0.009, 0.000,−0.005) | (0.006,0.014, 0.020,0.006) | (0.011, 0.009, 0.001,0.012) | (0.000,0.006, 0.005, 0.008) | (−0.003, 0.002, 0.001,0.002) |
(0.001,0.002,0.001,0.005) | (0.003,0.002,0.001,0.002) | (0.004,0.003,0.002,0.002) | (0.008,0.007,0.006,0.007) | (0.000,0.001, 0.001,0.002) | |
(Setting 3: p = 0.4) | | | | |
50 | (−0.042,−0.026, 0.009,−0.032) | (0.036,−0.003, 0.012,−0.028) | (0.027,−0.013, 0.022,−0.011) | (−0.002,0.010, −0.015,0.004) | (−0.004,0.009, 0.008,0.003)
(0.009,0.004,0.005,0.008) | (0.008,0.005,0.005,0.008) | (0.008,0.005,0.006,0.008) | (0.006,0.008, 0.007,0.006) | (0.003,0.002,0.002,0.004) | |
100 | (−0.037,−0.022,0.010,−0.037) | (0.032,−0.004,0.008,−0.023) | (0.022,−0.009,0.009,−0.008) | (−0.003,0.010,−0.014,0.004) | (0.004,0.007,0.006,0.003)
(0.008,0.005,0.004,0.006) | (0.006,0.004,0.004,0.005) | (0.008,0.004,0.005,0.009) | (0.005,0.006,0.005,0.006) | (0.003,0.002,0.002,0.003) | |
200 | (0.030,−0.019,0.009,−0.032) | (0.027,0.003,0.004,0.020) | (0.020,−0.007,0.005,0.003) | (0.002,0.004,0.012,0.003) | (0.003,0.005,0.005,0.002) |
(0.004,0.003,0.003,0.002) | (0.003,0.003,0.002,0.003) | (0.007,0.002,0.003,0.006) | (0.003,0.002,0.003,0.005) | (0.002,0.001,0.001,0.002) | |
500 | (−0.021,0.017,0.008,−0.020) | (0.015,0.002,0.003,0.017) | (0.017,0.003,0.002,0.002) | (0.001,0.008,0.011,0.002) | (0.003,0.004,0.004,0.001) |
(0.003,0.002,0.002,0.001) | (0.002,0.001,0.001,0.002) | (0.005,0.001,0.001,0.005) | (0.002,0.001,0.001,0.004) | (0.001,0.001,0.001,0.001) | |
1000 | (−0.025,−0.017, 0.008,−0.011) | (0.010,−0.002, 0.001, −0.014) | (0.017,−0.001, 0.001,−0.002) | (0.001,−0.008, 0.010,−0.0002) | (−0.002,0.003, 0.004,−0.001) |
(0.002,0.001,0.002,0.000) | (0.000,0.000,0.000,0.001) | (0.005,0.000,0.000,0.004) | (0.001,0.000,0.000,0.004) | (0.000,0.000,0.000,0.000) |
Regarding the second simulation objective, we aimed to compare, for h-step prediction, the expectation-based (CM) predictor with MLE estimates proposed by [4] against the Bayesian, Median, and Bootstrap predictors, as well as the CM predictor implemented with our proposed parameter estimation methods. Thus, we generated N + 4 observations from the ZIGINARRC(1) process, where the first N observations were employed to estimate the parameters and the remaining observations were used to compute the prediction mean absolute error (PMAE),
$\mathrm{PMAE}(h) = \frac{1}{k}\sum_{i=1}^{k}\left|X_{N+h}^{(i)} - \hat{X}_{N+h}^{(i)}\right|, \qquad h = 1, \ldots, 4,$
where $X_{N+h}^{(i)}$ is the (N+h)th observed value in the ith replication, $\hat{X}_{N+h}^{(i)}$ is the corresponding h-step prediction, and k is the number of repetitions.
The results are shown in Tables 2–4. The PMAE of all suggested prediction methods decreases as the sample size N increases. The Bayesian method performs best, followed by the Median and Bootstrap methods, making them the better choices among the prediction approaches.
Table 3.
PMAE for the simulated ZIGINARRC(1) process with p = 0.5 (second parameter setting).
n | h | CM (MLE) | CM (WE) | CM (TSWE) | CM (MELE) | CM (Bootstrap) | Median | Bootstrap | Bayesian (Mean) | Bayesian (MCMC)
---|---|---|---|---|---|---|---|---|---|---
50 | 1 | 0.541 | 0.529 | 0.528 | 0.538 | 0.569 | 0.390 | 0.539 | 0.409 | 0.427 |
2 | 0.570 | 0.549 | 0.547 | 0.564 | 0.568 | 0.400 | 0.532 | 0.409 | 0.429 | |
3 | 0.578 | 0.564 | 0.556 | 0.564 | 0.573 | 0.400 | 0.501 | 0.420 | 0.428 |
4 | 0.572 | 0.578 | 0.580 | 0.593 | 0.573 | 0.410 | 0.522 | 0.421 | 0.439 | |
100 | 1 | 0.504 | 0.498 | 0.440 | 0.485 | 0.432 | 0.350 | 0.401 | 0.350 | 0.312 |
2 | 0.516 | 0.509 | 0.466 | 0.485 | 0.477 | 0.360 | 0.441 | 0.366 | 0.315 | |
3 | 0.527 | 0.539 | 0.483 | 0.492 | 0.479 | 0.410 | 0.446 | 0.353 | 0.315 | |
4 | 0.552 | 0.572 | 0.494 | 0.511 | 0.505 | 0.410 | 0.458 | 0.395 | 0.345 | |
200 | 1 | 0.419 | 0.403 | 0.410 | 0.410 | 0.413 | 0.320 | 0.317 | 0.124 | 0.183 |
2 | 0.427 | 0.421 | 0.445 | 0.448 | 0.416 | 0.350 | 0.359 | 0.139 | 0.184 | |
3 | 0.441 | 0.469 | 0.461 | 0.451 | 0.442 | 0.380 | 0.371 | 0.150 | 0.184 | |
4 | 0.461 | 0.475 | 0.483 | 0.455 | 0.471 | 0.390 | 0.391 | 0.154 | 0.184 | |
500 | 1 | 0.302 | 0.311 | 0.309 | 0.325 | 0.320 | 0.290 | 0.225 | 0.138 | 0.143 |
2 | 0.335 | 0.329 | 0.322 | 0.333 | 0.327 | 0.320 | 0.268 | 0.154 | 0.148 | |
3 | 0.352 | 0.353 | 0.342 | 0.352 | 0.347 | 0.320 | 0.280 | 0.175 | 0.169 | |
4 | 0.358 | 0.397 | 0.359 | 0.377 | 0.372 | 0.340 | 0.288 | 0.193 | 0.185 | |
1000 | 1 | 0.173 | 0.186 | 0.171 | 0.177 | 0.103 | 0.120 | 0.118 | 0.073 | 0.062 |
2 | 0.206 | 0.195 | 0.172 | 0.182 | 0.112 | 0.130 | 0.156 | 0.093 | 0.086 | |
3 | 0.217 | 0.213 | 0.190 | 0.186 | 0.194 | 0.190 | 0.167 | 0.102 | 0.099 | |
4 | 0.235 | 0.223 | 0.203 | 0.230 | 0.205 | 0.190 | 0.184 | 0.113 | 0.103 |
Table 2.
PMAE for the simulated ZIGINARRC(1) process with p = 0.2 (first parameter setting).
n | h | CM (MLE) | CM (WE) | CM (TSWE) | CM (MELE) | CM (Bootstrap) | Median | Bootstrap | Bayesian (Mean) | Bayesian (MCMC)
---|---|---|---|---|---|---|---|---|---|---
50 | 1 | 0.603 | 0.573 | 0.567 | 0.581 | 0.602 | 0.408 | 0.691 | 0.790 | 0.750 |
2 | 0.605 | 0.574 | 0.572 | 0.584 | 0.602 | 0.412 | 0.658 | 0.791 | 0.753 | |
3 | 0.582 | 0.568 | 0.591 | 0.594 | 0.581 | 0.419 | 0.673 | 0.790 | 0.752 | |
4 | 0.587 | 0.571 | 0.594 | 0.601 | 0.584 | 0.421 | 0.636 | 0.797 | 0.752 | |
100 | 1 | 0.573 | 0.562 | 0.571 | 0.569 | 0.572 | 0.402 | 0.618 | 0.653 | 0.645 |
2 | 0.577 | 0.562 | 0.571 | 0.573 | 0.573 | 0.406 | 0.626 | 0.654 | 0.647 | |
3 | 0.578 | 0.566 | 0.564 | 0.580 | 0.581 | 0.409 | 0.637 | 0.659 | 0.648 | |
4 | 0.582 | 0.568 | 0.569 | 0.579 | 0.586 | 0.412 | 0.640 | 0.674 | 0.675 | |
200 | 1 | 0.574 | 0.554 | 0.560 | 0.543 | 0.554 | 0.391 | 0.565 | 0.584 | 0.555 |
2 | 0.576 | 0.561 | 0.561 | 0.553 | 0.562 | 0.393 | 0.572 | 0.593 | 0.565 | |
3 | 0.578 | 0.564 | 0.563 | 0.555 | 0.564 | 0.398 | 0.576 | 0.593 | 0.565 | |
4 | 0.581 | 0.569 | 0.564 | 0.561 | 0.568 | 0.402 | 0.593 | 0.602 | 0.566 | |
500 | 1 | 0.565 | 0.555 | 0.556 | 0.498 | 0.544 | 0.385 | 0.482 | 0.487 | 0.468 |
2 | 0.573 | 0.558 | 0.558 | 0.502 | 0.547 | 0.392 | 0.494 | 0.489 | 0.474 | |
3 | 0.572 | 0.565 | 0.559 | 0.524 | 0.547 | 0.393 | 0.507 | 0.492 | 0.492 | |
4 | 0.573 | 0.568 | 0.561 | 0.549 | 0.559 | 0.395 | 0.526 | 0.506 | 0.495 | |
1000 | 1 | 0.558 | 0.549 | 0.530 | 0.475 | 0.508 | 0.362 | 0.452 | 0.341 | 0.359 |
2 | 0.562 | 0.553 | 0.532 | 0.515 | 0.533 | 0.378 | 0.485 | 0.356 | 0.383 | |
3 | 0.567 | 0.555 | 0.541 | 0.517 | 0.540 | 0.385 | 0.492 | 0.360 | 0.409 | |
4 | 0.568 | 0.560 | 0.547 | 0.518 | 0.542 | 0.392 | 0.491 | 0.401 | 0.418 |
Table 4.
PMAE for the simulated ZIGINARRC(1) process with p = 0.4 (third parameter setting).
n | h | CM (MLE) | CM (WE) | CM (TSWE) | CM (MELE) | CM (Bootstrap) | Median | Bootstrap | Bayesian (Mean) | Bayesian (MCMC)
---|---|---|---|---|---|---|---|---|---|---
50 | 1 | 1.029 | 1.052 | 1.033 | 1.097 | 1.045 | 0.661 | 1.108 | 0.904 | 0.748 |
2 | 1.098 | 1.069 | 1.090 | 1.130 | 1.102 | 0.792 | 1.353 | 0.858 | 0.760 | |
3 | 1.122 | 1.095 | 1.110 | 1.078 | 1.143 | 0.832 | 1.173 | 0.904 | 0.760 | |
4 | 1.141 | 1.119 | 1.138 | 1.024 | 1.128 | 0.850 | 1.205 | 0.954 | 0.760 | |
100 | 1 | 0.907 | 0.925 | 0.932 | 0.912 | 0.921 | 0.746 | 0.981 | 0.809 | 0.747 |
2 | 1.033 | 0.999 | 1.006 | 1.007 | 1.020 | 0.822 | 0.989 | 0.815 | 0.751 | |
3 | 1.125 | 1.036 | 1.121 | 1.125 | 1.065 | 0.856 | 0.994 | 0.839 | 0.751 | |
4 | 1.154 | 1.121 | 1.146 | 1.127 | 1.118 | 0.856 | 1.110 | 0.854 | 0.751 | |
200 | 1 | 0.890 | 0.906 | 0.932 | 0.907 | 0.901 | 0.700 | 0.845 | 0.682 | 0.621 |
2 | 0.975 | 0.965 | 0.961 | 0.956 | 0.905 | 0.757 | 0.865 | 0.677 | 0.644 | |
3 | 1.031 | 0.965 | 0.980 | 0.969 | 0.911 | 0.807 | 0.874 | 0.683 | 0.652 | |
4 | 1.091 | 1.008 | 0.986 | 1.072 | 0.918 | 0.813 | 0.888 | 0.682 | 0.655 | |
500 | 1 | 0.908 | 0.891 | 0.919 | 0.896 | 0.841 | 0.709 | 0.725 | 0.427 | 0.398 |
2 | 0.909 | 0.952 | 0.938 | 0.929 | 0.843 | 0.713 | 0.727 | 0.445 | 0.405 | |
3 | 0.961 | 0.956 | 0.941 | 0.948 | 0.852 | 0.713 | 0.757 | 0.462 | 0.423 | |
4 | 1.064 | 0.961 | 0.947 | 0.972 | 0.863 | 0.805 | 0.772 | 0.459 | 0.429 | |
1000 | 1 | 0.897 | 0.812 | 0.809 | 0.829 | 0.713 | 0.630 | 0.627 | 0.245 | 0.256 |
2 | 0.913 | 0.843 | 0.844 | 0.829 | 0.714 | 0.690 | 0.652 | 0.259 | 0.256 | |
3 | 0.932 | 0.857 | 0.852 | 0.839 | 0.740 | 0.704 | 0.673 | 0.267 | 0.258 | |
4 | 0.956 | 0.858 | 0.869 | 0.884 | 0.740 | 0.714 | 0.680 | 0.314 | 0.267 |
A standard interval for h-step interval forecasting in autoregressive models is based on the asymptotic normality of the predictor; see [6] for more details. Maiti and Biswas [19] suggested instead using the highest predicted probability (HPP) interval, which collects the most probable future values. Based on the HPP interval specification and the unimodality of the forecasting distribution, [38] suggested an algorithm for the HPP interval for the PLINAR(1) process. Specifically, the $100(1-\gamma)\%$ HPP interval of $X_{n+h}$ given $X_n$ is the shortest set of consecutive non-negative integers $[L_h, U_h]$ such that
$\sum_{x=L_h}^{U_h} p_h(x \mid X_n) \ge 1 - \gamma,$
where $p_h(\cdot \mid X_n)$ is the h-step-ahead conditional probability function. Based on the above information, we propose the following algorithm to obtain the HPP interval for the ZIGINARRC(1) process.
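A minimal R sketch of this interval construction is given below; `ph` denotes the h-step predictive pmf on the truncated support 0:M (for instance, a row of the h-step transition matrix used for the median predictor above).

```r
# HPP interval: collect support points in decreasing order of predictive
# probability until the mass reaches 1 - gamma; under unimodality the
# collected points form an interval.
hpp_interval <- function(ph, gamma = 0.05) {
  ord  <- order(ph, decreasing = TRUE)
  keep <- ord[seq_len(which(cumsum(ph[ord]) >= 1 - gamma)[1])]
  c(lower = min(keep) - 1, upper = max(keep) - 1)  # back to 0-based support
}
```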
Based on the results in Tables 5–7, we can conclude that in all situations the coverage probability (CP) is close to the nominal level of 0.95 and the length of the prediction interval decreases as n increases, which indicates that the HPP method can produce a reliable prediction interval for the ZIGINARRC(1) process.
Table 5.
HPP intervals for the prediction of simulated data with p = 0.2 (first parameter setting).
n | h | | MLE | WE | TSWE | MELE | Bootstrap
---|---|---|---|---|---|---|---
n = 50 | 1 | HPPI | (0, 3.15) | (0, 3.05) | (0, 3.05) | (0, 3.05) | (0, 3.10)
CP | 0.97 | 0.97 | 0.95 | 0.97 | 0.96
2 | HPPI | (0, 3.25) | (0, 3.15) | (0, 3.15) | (0, 3.45) | (0, 3.05)
CP | 0.97 | 0.97 | 0.97 | 0.95 | 0.97
3 | HPPI | (0, 3.30) | (0, 3.25) | (0, 3.25) | (0, 3.55) | (0, 3.05)
CP | 0.97 | 0.96 | 0.98 | 0.98 | 0.97
4 | HPPI | (0, 3.30) | (0, 3.25) | (0, 3.25) | (0, 3.55) | (0, 3.05)
CP | 0.97 | 0.96 | 0.95 | 0.96 | 0.95
n = 100 | 1 | HPPI | (0, 3.15) | (0, 3.11) | (0, 3.10) | (0, 3.03) | (0, 3.11)
CP | 0.97 | 0.97 | 0.97 | 0.97 | 0.96
2 | HPPI | (0, 3.18) | (0, 3.12) | (0, 3.11) | (0, 3.42) | (0, 3.06)
CP | 0.97 | 0.96 | 0.97 | 0.98 | 0.96
3 | HPPI | (0, 3.09) | (0, 3.24) | (0, 3.25) | (0, 3.37) | (0, 3.04)
CP | 0.97 | 0.97 | 0.97 | 0.96 | 0.96
4 | HPPI | (0, 3.09) | (0, 3.24) | (0, 3.24) | (0, 3.21) | (0, 3.04)
CP | 0.96 | 0.97 | 0.97 | 0.96 | 0.96
n = 200 | 1 | HPPI | (0, 3.03) | (0, 3.01) | (0, 3.08) | (0, 3.05) | (0, 3.06)
CP | 0.97 | 0.97 | 0.95 | 0.97 | 0.95
2 | HPPI | (0, 3.09) | (0, 2.98) | (0, 3.07) | (0, 3.01) | (0, 3.02)
CP | 0.97 | 0.96 | 0.97 | 0.97 | 0.96
3 | HPPI | (0, 3.10) | (0, 3.05) | (0, 3.10) | (0, 3.04) | (0, 3.15)
CP | 0.95 | 0.97 | 0.96 | 0.97 | 0.95
4 | HPPI | (0, 3.11) | (0, 3.12) | (0, 3.17) | (0, 3.14) | (0, 3.21)
CP | 0.96 | 0.95 | 0.97 | 0.97 | 0.97
n = 500 | 1 | HPPI | (0, 2.87) | (0, 2.91) | (0, 2.97) | (0, 3.03) | (0, 2.95)
CP | 0.97 | 0.96 | 0.97 | 0.95 | 0.96
2 | HPPI | (0, 2.92) | (0, 2.98) | (0, 2.99) | (0, 3.04) | (0, 2.98)
CP | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
3 | HPPI | (0, 2.97) | (0, 3.01) | (0, 2.91) | (0, 3.07) | (0, 2.97)
CP | 0.97 | 0.97 | 0.95 | 0.96 | 0.97
4 | HPPI | (0, 3.03) | (0, 3.05) | (0, 3.07) | (0, 3.11) | (0, 3.05)
CP | 0.96 | 0.97 | 0.97 | 0.97 | 0.97
n = 1000 | 1 | HPPI | (0, 2.72) | (0, 2.86) | (0, 2.74) | (0, 2.81) | (0, 2.83)
CP | 0.97 | 0.96 | 0.97 | 0.96 | 0.97
2 | HPPI | (0, 2.83) | (0, 2.81) | (0, 2.84) | (0, 2.98) | (0, 2.86)
CP | 0.97 | 0.97 | 0.97 | 0.97 | 0.96
3 | HPPI | (0, 2.87) | (0, 2.75) | (0, 2.85) | (0, 2.97) | (0, 2.91)
CP | 0.97 | 0.96 | 0.97 | 0.95 | 0.97
4 | HPPI | (0, 2.93) | (0, 2.82) | (0, 2.97) | (0, 3.03) | (0, 2.95)
CP | 0.97 | 0.97 | 0.97 | 0.97 | 0.95
Table 6.
HPP intervals for the prediction of simulated data with p = 0.5 (second parameter setting).
n | h | | MLE | WE | TSWE | MELE | Bootstrap
---|---|---|---|---|---|---|---
n = 50 | 1 | HPPI | (0, 3.15) | (0, 3.25) | (0, 3.50) | (0, 3.15) | (0, 3.50)
CP | 0.97 | 0.97 | 0.95 | 0.97 | 0.96
2 | HPPI | (0, 3.24) | (0, 3.47) | (0, 3.59) | (0, 3.45) | (0, 3.63)
CP | 0.97 | 0.97 | 0.96 | 0.97 | 0.97
3 | HPPI | (0, 3.32) | (0, 3.58) | (0, 3.63) | (0, 3.43) | (0, 3.35)
CP | 0.97 | 0.96 | 0.97 | 0.95 | 0.97
4 | HPPI | (0, 3.10) | (0, 3.65) | (0, 3.55) | (0, 3.59) | (0, 3.49)
CP | 0.97 | 0.95 | 0.97 | 0.97 | 0.96
n = 100 | 1 | HPPI | (0, 3.09) | (0, 3.10) | (0, 3.27) | (0, 3.05) | (0, 3.28)
CP | 0.97 | 0.96 | 0.97 | 0.97 | 0.96
2 | HPPI | (0, 3.16) | (0, 3.19) | (0, 3.09) | (0, 3.09) | (0, 3.37)
CP | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
3 | HPPI | (0, 3.18) | (0, 3.26) | (0, 3.17) | (0, 3.19) | (0, 3.31)
CP | 0.97 | 0.97 | 0.97 | 0.96 | 0.97
4 | HPPI | (0, 3.14) | (0, 3.32) | (0, 3.28) | (0, 3.31) | (0, 3.37)
CP | 0.97 | 0.96 | 0.97 | 0.97 | 0.95
n = 200 | 1 | HPPI | (0, 2.05) | (0, 2.09) | (0, 2.10) | (0, 2.01) | (0, 2.03)
CP | 0.97 | 0.97 | 0.96 | 0.97 | 0.95
2 | HPPI | (0, 2.04) | (0, 2.12) | (0, 2.16) | (0, 2.24) | (0, 2.04)
CP | 0.97 | 0.96 | 0.97 | 0.96 | 0.96
3 | HPPI | (0, 2.03) | (0, 2.15) | (0, 2.22) | (0, 2.26) | (0, 2.04)
CP | 0.97 | 0.97 | 0.97 | 0.96 | 0.96
4 | HPPI | (0, 2.03) | (0, 2.15) | (0, 2.22) | (0, 2.21) | (0, 2.06)
CP | 0.97 | 0.96 | 0.95 | 0.96 | 0.96
n = 500 | 1 | HPPI | (0, 2.05) | (0, 1.91) | (0, 2.05) | (0, 1.97) | (0, 2.05)
CP | 0.97 | 0.96 | 0.95 | 0.97 | 0.97
2 | HPPI | (0, 2.03) | (0, 2.04) | (0, 2.09) | (0, 2.03) | (0, 2.05)
CP | 0.97 | 0.96 | 0.95 | 0.95 | 0.96
3 | HPPI | (0, 2.00) | (0, 2.03) | (0, 2.12) | (0, 2.01) | (0, 2.03)
CP | 0.97 | 0.97 | 0.97 | 0.96 | 0.96
4 | HPPI | (0, 2.00) | (0, 2.03) | (0, 2.12) | (0, 2.13) | (0, 2.01)
CP | 0.97 | 0.96 | 0.97 | 0.96 | 0.96
n = 1000 | 1 | HPPI | (0, 2.01) | (0, 1.86) | (0, 2.10) | (0, 1.94) | (0, 2.02)
CP | 0.96 | 0.97 | 0.97 | 0.96 | 0.97
2 | HPPI | (0, 2.01) | (0, 2.01) | (0, 2.04) | (0, 2.06) | (0, 2.02)
CP | 0.97 | 0.95 | 0.97 | 0.97 | 0.95
3 | HPPI | (0, 2.00) | (0, 2.01) | (0, 2.05) | (0, 1.98) | (0, 2.01)
CP | 0.97 | 0.96 | 0.96 | 0.97 | 0.96
4 | HPPI | (0, 2.03) | (0, 2.01) | (0, 2.07) | (0, 2.08) | (0, 2.01)
CP | 0.96 | 0.97 | 0.97 | 0.98 | 0.96
Table 7.
HPP intervals for the prediction of simulated data with p = 0.4 (third parameter setting).
n | h | | MLE | WE | TSWE | MELE | Bootstrap
---|---|---|---|---|---|---|---
n = 50 | 1 | HPPI | (0, 3.40) | (0.01, 3.20) | (0, 3.10) | (0, 3.50) | (0, 3.40)
CP | 0.96 | 0.96 | 0.95 | 0.96 | 0.96
2 | HPPI | (0.01, 3.52) | (0, 3.25) | (0, 3.61) | (0, 3.55) | (0.02, 3.42)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.95
3 | HPPI | (0, 3.48) | (0, 3.27) | (0, 3.72) | (0, 3.60) | (0, 3.45)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.95
4 | HPPI | (0.02, 3.47) | (0, 3.25) | (0.01, 3.65) | (0, 3.62) | (0, 3.51)
CP | 0.96 | 0.95 | 0.96 | 0.95 | 0.96
n = 100 | 1 | HPPI | (0, 3.14) | (0, 3.20) | (0, 3.13) | (0, 3.52) | (0, 3.18)
CP | 0.95 | 0.95 | 0.96 | 0.96 | 0.96
2 | HPPI | (0, 3.28) | (0, 3.22) | (0, 3.24) | (0, 3.53) | (0, 3.22)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
3 | HPPI | (0, 3.31) | (0, 3.25) | (0, 3.22) | (0, 3.57) | (0, 3.25)
CP | 0.96 | 0.96 | 0.95 | 0.96 | 0.95
4 | HPPI | (0, 3.36) | (0, 3.27) | (0, 3.20) | (0, 3.51) | (0, 3.28)
CP | 0.96 | 0.96 | 0.96 | 0.95 | 0.96
n = 200 | 1 | HPPI | (0, 3.09) | (0, 3.06) | (0, 2.91) | (0, 3.12) | (0, 3.02)
CP | 0.95 | 0.96 | 0.96 | 0.96 | 0.96
2 | HPPI | (0, 3.17) | (0, 3.07) | (0, 3.04) | (0, 3.09) | (0, 3.05)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
3 | HPPI | (0, 3.18) | (0, 3.11) | (0, 3.08) | (0, 3.17) | (0, 3.03)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
4 | HPPI | (0, 3.14) | (0, 3.12) | (0, 3.12) | (0, 3.15) | (0, 3.08)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
n = 500 | 1 | HPPI | (0, 3.03) | (0, 2.96) | (0, 2.85) | (0, 2.82) | (0, 2.84)
CP | 0.97 | 0.96 | 0.96 | 0.96 | 0.97
2 | HPPI | (0, 3.07) | (0, 2.97) | (0, 2.89) | (0, 2.83) | (0, 2.85)
CP | 0.97 | 0.96 | 0.96 | 0.96 | 0.96
3 | HPPI | (0, 3.08) | (0, 3.02) | (0, 2.99) | (0, 2.81) | (0, 2.87)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.95
4 | HPPI | (0, 3.06) | (0, 3.06) | (0, 3.08) | (0, 2.85) | (0, 2.88)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
n = 1000 | 1 | HPPI | (0, 2.86) | (0, 2.78) | (0, 2.64) | (0, 2.52) | (0, 2.41)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.97
2 | HPPI | (0, 2.98) | (0, 2.86) | (0, 2.65) | (0, 2.59) | (0, 2.55)
CP | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
3 | HPPI | (0, 3.01) | (0, 2.94) | (0, 2.83) | (0, 2.62) | (0, 2.52)
CP | 0.96 | 0.96 | 0.96 | 0.95 | 0.97
4 | HPPI | (0, 2.98) | (0, 2.94) | (0, 2.93) | (0, 2.78) | (0, 2.51)
CP | 0.96 | 0.96 | 0.95 | 0.95 | 0.95
6. Practical data analysis
The practical application of the ZIGINARRC(1) model to an anorexia count time series is illustrated in this section. The data set comprises monthly numbers of anorexia submissions to animal health laboratories from January 2003 to December 2009 in a region of New Zealand [14]; see Table 8. We illustrate how this count time series can be modeled by an INAR model with a random coefficient.
Table 8.
Anorexia data set.
Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2003 | 0 | 1 | 3 | 1 | 4 | 1 | 1 | 4 | 11 | 2 | 1 | 1 |
2004 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2006 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 5 | 6 | 3 | 2 | 1 |
2007 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 |
2008 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 0 | 1 | 0 |
2009 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
To this end, let $X_t$ be the number of anorexia cases in the tth month; hence $\{X_t\}$ acts as an INAR process in which $X_t$ is the sum of the cases surviving from the previous month, $(V_t\alpha)\circ X_{t-1}$, and the newly arising cases $\epsilon_t$. The survival rate α might be influenced by a variety of external circumstances; it changes at random over time, which is captured by the random coefficient $V_t\alpha$.
The sample path, autocorrelation function, partial autocorrelation function and Pareto chart of the series are displayed in Figures 1 and 2. Figure 1 implies that a first-order autoregressive model is appropriate for the data series; however, the Pareto chart reveals that zeros have the highest frequency among all values of the series, so a zero-inflated INAR model appears more appropriate. The Phillips-Perron test rejects a unit root, supporting stationarity of the series. To identify significant empirical overdispersion in the data, we test the null hypothesis $H_0$ that the process is equidispersed against the alternative $H_1$ that it is overdispersed. We reject $H_0$ at significance level γ if the observed dispersion index, $\hat{I}_d = S^2/\bar{X}$, exceeds the critical value obtained from the $(1-\gamma)$-quantile of the corresponding chi-square distribution; alternatively, we can check whether the p-value falls below γ, noting that in the INAR(1) version of the test the unknown autoregressive coefficient is replaced by its estimate, see [32]. The mean and variance of the anorexia data are 0.8214286 and 2.895439, respectively; hence $\hat{I}_d \approx 3.52$, the critical value is 1.324259, and the p-value is approximately 0. The observed index exceeds the critical value, so the data series is not the result of an equidispersed INAR(1) process. The overdispersion of the data is reinforced by the p-value falling below the considered significance level γ.
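The dispersion-index computation can be reproduced with a few lines of R; the chi-square quantile used below is the naive equidispersion benchmark, whereas the INAR(1)-adjusted critical value of [32] (1.324259 above) additionally involves the estimated autoregressive coefficient. The data vector name is a placeholder for the 84 monthly counts of Table 8.

```r
# Dispersion-index test sketch for the anorexia counts.
x    <- anorexia_counts                          # placeholder for Table 8 values
n    <- length(x)
Id   <- var(x) / mean(x)                         # 2.895439 / 0.8214286 ~ 3.52
crit <- qchisq(0.95, n - 1) / (n - 1)            # naive equidispersion benchmark
pval <- pchisq((n - 1) * Id, n - 1, lower.tail = FALSE)
c(index = Id, critical = crit, p.value = pval)
```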
Figure 1.
Sample path, autocorrelation function and partial autocorrelation function of the Anorexia data set.
Figure 2.
The Pareto chart of the Anorexia data set.
Also, it is of interest to test the absence of zero inflation against its presence, and for this purpose we use the likelihood ratio test (LRT) statistic $\Lambda = -2(\log L_0 - \log L_1)$, where $L_0$ and $L_1$ are the maximized likelihood values of the model without zero inflation and of the zero-inflated model, respectively. The LRT statistic asymptotically follows the chi-square distribution with 1 degree of freedom, which equals the number of additional parameters in the larger model. The LRT statistic and its corresponding p-value for the data set are 5.41858 and 0.019923, respectively. The LRT statistic exceeds the test's critical value, indicating zero inflation in the data; therefore, the zero-inflated model fits the data better.
Finally, the hypothesis of a constant coefficient, which is equivalent to $\beta = 0$, is tested against the random-coefficient alternative; the p-value falls below the significance level, so the hypothesis of a constant coefficient for the model is rejected, see [4].
The performance of the prediction methods is checked by using the first 80 observations to estimate the parameters and predicting the last 4 observations. In Table 9, the point predictions for the last 4 observations are reported for the CM prediction method with parameters estimated by the MLE, MELE, WE, and TSWE methods; for the Bootstrap prediction method with parameters estimated by the Bootstrap; and for the Bayesian prediction methods, i.e. the Mean, Median, Mode, and MCMC methods. The HPP intervals are calculated for all prediction methods. As can be seen, all of the intervals cover the observed data. The predictions based on the Bayesian, MELE, and Bootstrap methods are nearest to the actual observations, and the HPP intervals indicate that the Bayesian and Bootstrap methods have shorter prediction intervals. So, for data prediction, we recommend utilizing the Bayesian and Bootstrap predictors for the ZIGINARRC(1) process.
Table 9.
Prediction analysis of anorexia count time series data.
Observed value | 0 | 0 | 0 | 0 | |
---|---|---|---|---|---|
CM | prediction | 0.5873 | 0.7662 | 0.8207 | 0.8373 |
(MLE) | lower limit | 0 | 0 | 0 | 0 |
upper limit | 3 | 4 | 4 | 4 | |
CM | prediction | 0.0332 | 0.0648 | 0.09476 | 0.1232 |
(MELE estimation) | lower limit | 0 | 0 | 0 | 0 |
upper limit | 3 | 4 | 2 | 4 | |
CM | prediction | 0.6437 | 0.9556 | 1.1066 | 1.1798 |
(Whittle estimation) | lower limit | 0 | 0 | 0 | 0 |
upper limit | 3 | 4 | 4 | 5 | |
CM | prediction | 0.3302 | 0.5822 | 0.7745 | 0.9212 |
(TSWE estimation) | lower limit | 0 | 0 | 0 | 0
upper limit | 2 | 3 | 3 | 4 | |
Bayesian | Mean Method | 0 | 0 | 0 | 0 |
Median Method | 0 | 0 | 0 | 0 | |
Mode Method | 0 | 0 | 0 | 0 | |
MCMC | 0.0286 | 0.0297 | 0.0298 | 0.0298 | |
lower limit | 0 | 0 | 0 | 0 | |
upper limit | 2.2 | 1.4 | 2.3 | 1.9 | |
Bootstrap | prediction | 0 | 0 | 0 | 0 |
(Bootstrap estimation) | lower limit | 0 | 0 | 0 | 0 |
upper limit | 2 | 2 | 2 | 2 |
7. Conclusion
In certain count time series datasets, a significant proportion of observations are zeros. To address this issue, [4] introduced the ZIGINARRC(1) process, a first-order INAR process with random coefficients and a zero-inflated geometric marginal distribution. This paper investigates various estimation methods, including Whittle estimation, taper spectral Whittle estimation, maximum empirical likelihood, and sieve bootstrap estimation, in the context of the ZIGINARRC(1) process. These methods were compared to the standard maximum likelihood estimation through a simulation study, as summarized in Table 1. The results indicate that our proposed estimation methods perform comparably well to the maximum likelihood estimation method.
Additionally, we explore prediction techniques for the ZIGINARRC(1) process, including Bayesian, Median, and Sieve Bootstrap prediction methods, and conduct simulation studies to assess their performance. The results suggest that the Bayesian method is the most accurate, followed by the Median and Bootstrap methods. However, it should be noted that the Bayesian method is computationally intensive and time-consuming. In contrast, the Bootstrap method, while also time-consuming due to algorithmic complexity, offers more efficient execution times compared to the Bayesian approach.
Finally, we present a practical data study to demonstrate the practical applicability of the aforementioned prediction methods.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1.Al-Osh M.A. and Alzaid A.A., First-order integer-valued autoregressive (INAR (1)) process, J. Time Ser. Anal. 8 (1987), pp. 261–275. [Google Scholar]
- 2.Bühlmann P., Sieve bootstrap for time series, Bernoulli 3 (1997), pp. 123–148. [Google Scholar]
- 3.Bakouch H.S., Higher-order moments, cumulants and spectral densities of the NGINAR(1) process, Stat. Methodol. 7 (2010), pp. 1–21. [Google Scholar]
- 4.Bakouch H.S., Mohammadpour M., and Shirozhan M., A zero-inflated geometric INAR(1) process with random coefficient, Appl. Math. 63 (2018), pp. 79–105. [Google Scholar]
- 5.Bakouch H.S. and Ristić M.M., Zero truncated Poisson integer-valued AR(1) model, Metrika 72 (2010), pp. 265–280. [Google Scholar]
- 6.Bhansali R., Asymptotic mean-square error of predicting more than one-step ahead using the regression method, J. R. Stat. Soc. Ser. C Appl. Stat. 23 (1974), pp. 35–42. [Google Scholar]
- 7.Brännäs K. and Shahiduzzaman Quoreshi A.M.M., Integer-valued moving average modelling of the number of transactions in stocks, Appl. Financ. Econ. 20 (2010), pp. 1429–1440. [Google Scholar]
- 8.Bu R., McCabe B., and Hadri K., Maximum likelihood estimation of higher-order integer-valued autoregressive processes, J. Time Ser. Anal. 29 (2008), pp. 973–994. [Google Scholar]
- 9.Cardinal M., Roy R., and Lambert J., On the application of integer-valued time series models for the analysis of disease incidence, Stat. Med. 18 (1999), pp. 2025–2039. [DOI] [PubMed] [Google Scholar]
- 10.Chuang C.S. and Chan N.H., Empirical likelihood for autoregressive models, with applications to unstable time series, Stat. Sin. 12 (2002), pp. 387–407. [Google Scholar]
- 11.Freeland R.K. and McCabe B.P., Forecasting discrete valued low count time series, Int. J. Forecast. 20 (2004), pp. 427–434. [Google Scholar]
- 12.Gelfand A.E. and Smith A.F., Sampling-based approaches to calculating marginal densities, J. Am. Stat. Assoc. 85 (1990), pp. 398–409. [Google Scholar]
- 13.Gomes D. and e Castro L.C., Generalized integer-valued random coefficient for a first order structure autoregressive (RCINAR) process, J. Stat. Plan. Inference 139 (2009), pp. 4088–4097. [Google Scholar]
- 14.Jazi M.A., Jones G., and Lai C.D., First-order integer valued AR processes with zero inflated Poisson innovations, J. Time Ser. Anal. 33 (2012), pp. 954–963. [Google Scholar]
- 15.Jin-Guan D. and Yuan L., The integer-valued autoregressive (INAR(p)) model, J. Time Ser. Anal. 12 (1991), pp. 129–142. [Google Scholar]
- 16.Kim H.Y. and Park Y., A non-stationary integer-valued autoregressive model, Stat. Pap. 49 (2008), pp. 485–502. [Google Scholar]
- 17.Li C., Wang D., and Zhang H., First-order mixed integer-valued autoregressive processes with zero-inflated generalized power series innovations, J. Korean Stat. Soc. 44 (2015), pp. 232–246. [Google Scholar]
- 18.Livsey J., Lund R., Kechagias S., and Pipiras V., Multivariate integer-valued time series with flexible autocovariances and their application to major hurricane counts, Ann. Appl. Stat. 12 (2018), pp. 408–431. [Google Scholar]
- 19.Maiti R. and Biswas A., Coherent forecasting for overdispersed time series of count data, Braz. J. Probab. Stat. 29 (2015), pp. 747–766. [Google Scholar]
- 20.Maiti R., Biswas A., and Das S., Coherent forecasting for count time series using Box-Jenkins's AR(p) model, Stat. Neerl. 70 (2016), pp. 123–145. [Google Scholar]
- 21.McCabe B., Martin G., and Harris D., Efficient probabilistic forecasts for counts, J. R. Stat. Soc. Series B Stat. Methodol. 73 (2011), pp. 253–272. [Google Scholar]
- 22.Mykland P.A., Dual likelihood, Ann. Stat. 23 (1995), pp. 396–421. [Google Scholar]
- 23.Nasirzadeh R. and Zamani A., Poisson-Lindley INAR(1) processes: Some estimation and forecasting methods, J. Iran. Stat. Soc. 19 (2020), pp. 145–173. [Google Scholar]
- 24.Nastić A.S., Laketa P.N., and Ristić M.M., Random environment integer-valued autoregressive process, J. Time Ser. Anal. 37 (2016), pp. 267–287. [Google Scholar]
- 25.Owen A.B., Empirical likelihood ratio confidence intervals for a single functional, Biometrika 75 (1988), pp. 237–249. [Google Scholar]
- 26.Owen A.B., Empirical likelihood ratio confidence regions, Ann. Stat. 18 (1990), pp. 90–120. [Google Scholar]
- 27.Pascual L., Romo J., and Ruiz E., Bootstrap predictive inference for ARIMA processes, J. Time Ser. Anal. 25 (2004), pp. 449–465. [Google Scholar]
- 28.Pavlopoulos H. and Karlis D., INAR(1) modeling of overdispersed count series with an environmental application, Environmetrics 19 (2008), pp. 369–393. [Google Scholar]
- 29.Rice J., On the estimation of the parameters of a power spectrum, J. Multivar. Anal. 9 (1979), pp. 378–392. [Google Scholar]
- 30.Riedel K.S. and Sidorenko A., Minimum bias multiple taper spectral estimation, IEEE. Trans. Signal Process. 43 (1995), pp. 188–195. [Google Scholar]
- 31.Ristić M.M., Bakouch H.S., and Nastić A.S., A new geometric first-order integer-valued autoregressive (NGINAR(1)) process, J. Stat. Plan. Inference 139 (2009), pp. 2218–2226. [Google Scholar]
- 32.Schweer S. and Weiß C.H., Compound poisson INAR(1) processes: Stochastic properties and testing for overdispersion, J. Comput. Stat. Data Anal. 77 (2014), pp. 267–284. [Google Scholar]
- 33.Silva I., Silva M.E., Pereira I., and Silva N., Replicated INAR(1) process, Methodol. Comput. Appl. Probab. 7 (2005), pp. 517–542. [Google Scholar]
- 34.Silva N., Pereira I., and Silva M.E., Forecasting in INAR(1) model, Stat. J. 7 (2009), pp. 119–134. [Google Scholar]
- 35.Simarmata D.M., Novkaniza F., and Widyaningsih Y., A time series model: First-order integer-valued autoregressive (INAR(1)), AIP. Conf. Proc. 1862 (2017), p. 030157. 10.1063/1.4991261. [DOI] [Google Scholar]
- 36.Steutel F.W. and Van Harn K., Discrete analogues of self-decomposability and stability, Ann. Probab. 7 (1979), pp. 893–899. [Google Scholar]
- 37.Walker A.M., Asymptotic properties of least squares estimates of parameters of the spectrum of a stationary non-deterministic time series, J. Aust. Math. Soc. 4 (1964), pp. 363–384. [Google Scholar]
- 38.Wang Y. and Zhang H., Some estimation and forecasting procedures in Possion-Lindley INAR (1) process, Commun. Stat. Simul. Comput. 50 (2021), pp. 49–62. [Google Scholar]
- 39.Whittle P., Hypothesis Testing in Times Series Analysis, Vol. 4, Almqvist and Wiksells Boktryckeri AB, Uppsala, 1951. [Google Scholar]
- 40.Zhang H. and Wang D., Inference for random coefficient INAR(1) process based on frequency domain analysis, Commun. Stat. Simul. Comput. 44 (2015), pp. 1078–1100. [Google Scholar]
- 41.Zhang H., Wang D., and Zhu F., Empirical likelihood inference for random coefficient INAR(p) process, J. Time Ser. Anal. 32 (2011), pp. 195–203. [Google Scholar]
- 42.Zheng H., Basawa I.V., and Datta S., Inference for pth-order random coefficient integer-valued autoregressive processes, J. Time Ser. Anal. 27 (2006), pp. 411–440. [Google Scholar]
- 43.Zheng H., Basawa I.V., and Datta S., First-order random coefficient integer-valued autoregressive processes, J. Stat. Plan. Inference 137 (2007), pp. 212–229. [Google Scholar]