Abstract
This paper develops a quantile hidden semi-Markov regression to jointly estimate multiple quantiles for the analysis of multivariate time series. The approach is based on the Multivariate Asymmetric Laplace (MAL) distribution, which makes it possible to model the quantiles of all univariate conditional distributions of a multivariate response simultaneously, incorporating the correlation structure among the outcomes. Unobserved serial heterogeneity across observations is modeled by introducing regime-dependent parameters that evolve according to a latent finite-state semi-Markov chain. Exploiting the hierarchical representation of the MAL, inference is carried out using an efficient Expectation-Maximization algorithm based on closed form updates for all model parameters, without parametric assumptions about the states’ sojourn distributions. The validity of the proposed methodology is assessed through both a simulation study and an empirical analysis of air pollutant concentrations in a small Italian city.
Supplementary Information
The online version contains supplementary material available at 10.1007/s11222-022-10130-1.
Keywords: EM algorithm, Latent process, Maximum likelihood, Multivariate asymmetric Laplace distribution, Quantile regression, Sojourn distribution
Introduction
Since their introduction in the 1960s, Hidden Markov Models (HMMs, see MacDonald and Zucchini 1997; Cappé et al. 2006; Zucchini et al. 2016) have been successfully implemented in a wide range of applications for the analysis of time series data. This class of models is described by an observable stochastic process whose dynamic is governed, completely or partially, by a latent unobservable Markov chain. Owing to their mathematical tractability and the availability of efficient computational procedures, the use of HMMs is well justified when the researcher is interested in inference and/or predictions about the latent process based on the observed one. For a detailed survey of the literature and fields of application, please see MacDonald and Zucchini 1997; Ephraim and Merhav 2002; Cappé et al. 2006; Maruotti 2011; Bartolucci et al. 2012; Zucchini et al. 2016 and Maruotti and Punzo (2021).
One immediate consequence of the Markov property is that in any HMM, the sojourn time (also called state duration or dwell-time), that is, the number of consecutive time points that the Markov chain spends in a given state, is implicitly geometrically distributed (Langrock and Zucchini 2011; Zucchini et al. 2016). Despite the popularity of HMMs, this assumption may be unrealistic in many applications, and the resulting misspecification of the dynamics of the hidden process can bias parameter estimates and degrade state classification performance. Bulla and Bulla (2006) and Maruotti et al. (2019), for instance, show the inability of HMMs to model temporal dependence and reproduce empirical characteristics in real-world data, especially when the probability mass function of sojourn times is far from being geometric.
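To make the implicit geometric assumption concrete, the following minimal sketch computes the sojourn-time probabilities implied by a fixed self-transition probability. The function name `geometric_sojourn_pmf` and the value 0.9 are hypothetical choices for illustration only.

```python
def geometric_sojourn_pmf(p_stay, max_u):
    """Return P(sojourn = u) for u = 1..max_u in a Markov chain state.

    The chain stays in the state with probability p_stay at each step, so a
    sojourn of length u requires u - 1 stays followed by one exit.
    """
    return [(p_stay ** (u - 1)) * (1.0 - p_stay) for u in range(1, max_u + 1)]


pmf = geometric_sojourn_pmf(p_stay=0.9, max_u=200)
# The most likely duration is always 1, whatever the value of p_stay.
mode = 1 + pmf.index(max(pmf))
# The mean duration (truncated at max_u) is close to 1 / (1 - p_stay).
mean = sum(u * p for u, p in enumerate(pmf, start=1))
```

Because the probabilities decrease monotonically in the duration, a geometric sojourn can never place its mode away from 1, which is the restriction that HSMMs are designed to relax.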
Motivated by these considerations, Hidden Semi-Markov Models (HSMMs, see Yu 2015) are designed to relax this condition by allowing the Sojourn Distributions (SDs) to be modeled directly by the researcher using more flexible parametric or nonparametric distributions.
Typical choices include families of discrete (semi-)parametric distributions or, alternatively, one can avoid distributional assumptions by estimating the SD probability mass functions based on the observations (Sansom and Thomson 2001; Guédon 2003; Pohle et al. 2022). Combining the increased flexibility to capture a wide range of distributional shapes of the SDs with the well-known advantages of HMMs, HSMMs constitute a versatile framework in several spheres of application (see Guédon 2003; Barbu and Limnios 2009; Bulla et al. 2010; O’Connell and Højsgaard 2011; Yu 2015; Maruotti and Punzo 2021 and the references therein).
In the regression context where covariates are available, both HMMs and HSMMs have also been extended to include a set of predictors by introducing state-dependent regression parameters that evolve over time according to the unobserved process (Hamilton 1989; Yu 2015; Zucchini et al. 2016).
This model specification makes it possible to investigate the dynamics of the hidden state sequence and, at the same time, to examine state-specific covariate effects in the observable process, providing a useful modeling framework to capture unobserved time-dependent heterogeneity. Typically, a linear model targeting the conditional mean of the dependent variable given covariates is specified. The assumptions underlying traditional linear regression models, however, are seldom satisfied in real data, which often exhibit skewness, heavy tails and outliers. Moreover, the effect of the covariates can differ greatly between different parts of the response distribution. Therefore, when the research interest lies not only in the center of the response distribution but also, and especially, in its tails, quantile regression (Koenker and Bassett 1978) represents an interesting alternative to standard mean regression. This method models the conditional quantiles of a response variable as functions of the covariates, giving a more complete picture of the entire conditional distribution than ordinary least squares. In the univariate quantile regression framework, both classical and Bayesian inferential approaches have been proposed in the literature to estimate the model parameters. In the frequentist setting, inference relies on the minimization of the asymmetric loss function (see Koenker and Bassett 1978) while, in the Bayesian and likelihood-based settings, the Asymmetric Laplace (AL) distribution has been introduced as a working likelihood. The two approaches are well justified by the relationship between the quantile loss function and the AL density. Indeed, Yu and Moyeed (2001) showed that the minimization of the quantile loss function is equivalent, in terms of parameter estimates, to the maximization of the likelihood associated with the AL density.
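The equivalence between the quantile loss and the AL likelihood can be illustrated with a small sketch for an intercept-only model. The data values, the unit scale, and the function names below are hypothetical; the AL log-likelihood is written in its standard form with scale fixed at one.

```python
import math


def check_loss(u, tau):
    """Asymmetric (check) loss of Koenker and Bassett for a residual u."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(q, ys, tau):
    """Sum of check losses when q is used as the tau-th quantile."""
    return sum(check_loss(y - q, tau) for y in ys)


def al_loglik(q, ys, tau):
    """AL log-likelihood with location q and unit scale (assumed form)."""
    return len(ys) * math.log(tau * (1.0 - tau)) - total_loss(q, ys, tau)


ys = [2.0, 7.0, 1.0, 9.0, 4.0, 6.0, 3.0, 8.0, 5.0, 10.0]
tau = 0.25
# Search over the observed values, where a minimizer of the loss always lies.
best_by_loss = min(ys, key=lambda q: total_loss(q, ys, tau))
best_by_loglik = max(ys, key=lambda q: al_loglik(q, ys, tau))
```

Since the log-likelihood differs from the negative loss only by a constant that does not involve the location, both criteria select the same value.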
For a detailed review and list of references, Koenker (2005); Luo et al. (2012); Bernardi et al. (2015) and Koenker et al. (2017) provide an overview of the most used quantile regression techniques in both the classical and Bayesian settings. In addition, quantile regression methods have also been generalized to account for serial heterogeneity. In the analysis of longitudinal data, Farcomeni (2012) and Marino et al. (2018) consider univariate linear quantile models where unobserved sources of time-varying heterogeneity are captured by means of state-dependent coefficients evolving according to a finite-state homogeneous hidden Markov chain. Further, Ye et al. (2016) and Maruotti et al. (2021) propose a (semi-)Markov quantile regression to model the regime-switching effect of the regression coefficients in financial and environmental time series.
When multivariate response variables are concerned, the existing literature on quantile regression is less extensive since there is no “natural” ordering in a p-dimensional space, for $p > 1$. As a consequence, the univariate quantile regression method does not straightforwardly extend to higher dimensions. Nevertheless, in most situations of practical interest, the goal of the analysis is to describe the distribution of a multivariate response variable.
For this reason, the search for a satisfactory notion of multivariate quantile has led to a flourishing literature on this topic, although its definition is still debated (see Serfling 2002; Kong and Mizera 2012; Koenker et al. 2017; Stolfi et al. 2018; Chavas 2018; Charlier et al. 2020; Merlo et al. 2021, 2022 and the references therein for relevant studies).
Recently, Petrella and Raponi (2019) generalized the AL distribution inferential approach of the univariate quantile regression to a multivariate framework by using the Multivariate Asymmetric Laplace (MAL) distribution defined in Kotz et al. (2012). Employing the MAL distribution as a likelihood based inferential tool, the authors sidestep the problem of defining the quantiles of a multivariate distribution, and instead implement joint estimation for the univariate quantiles of the conditional distribution of a multivariate response variable given covariates, accounting for possible correlation among the responses.
The purpose of this article is to extend the work of Petrella and Raponi (2019) by introducing a HSMM for the analysis of multivariate time series. More formally, we develop a Quantile Hidden Semi-Markov Model (QHSMM) to jointly estimate the quantiles of the univariate conditional distributions of a multivariate response, accounting for the dependence structure between the outcomes. In particular, to capture the temporal evolution of unobserved heterogeneity, we introduce state-dependent coefficients in the regression model that evolve over time according to a latent semi-Markov process. In order to prevent inconsistent parameter estimates due to misspecification of the SDs, we adopt the nonparametric approach of Guédon (2003) where they are left unspecified and approximated by discrete distributions concentrated on a finite set of time points estimated from the data. Within this scheme, our modeling framework can be thought of as a model-based clustering approach for data showing time-varying heterogeneity, where the interest lies in the effect of cluster-specific covariates on various quantile levels.
Throughout the paper we propose to estimate the model parameters with a Maximum Likelihood (ML) approach by using the MAL distribution as working likelihood in a regression framework. Specifically, as in Petrella and Raponi (2019) and Merlo et al. (2022), we consider the mixture representation of the MAL distribution which allows us to build an efficient Expectation-Maximization (EM) algorithm with the E- and M-step updates in closed form for all model parameters.
Using simulation experiments, we illustrate the validity of our approach under different data generating processes and evaluate its ability in recovering the true values of the regression coefficients, the true classification and number of latent states.
In the empirical analysis, we apply the proposed methodology to investigate the effect of a collection of atmospheric variables on the daily concentrations of three major pollutants, i.e., particulate matter, ozone and nitrogen dioxide measured in Rieti (Italy) from 2019 to 2021. Our method allows us to: (i) assess how the effects of atmospheric variables can vary across different (more extreme) quantiles of the conditional distribution of air pollutants, accounting for their dependence structure; (ii) summarize the data by means of a reduced number of latent regimes associated with different concentration levels of chemicals.
The paper is organized as follows. In Sect. 2, we introduce the proposed model. Sect. 3 illustrates the EM-based ML approach to estimating the model parameters and the computational details of the algorithm. In Sect. 4 we present the simulation results, while Sect. 5 discusses the empirical application. Finally, Sect. 6 concludes.
Methodology
Let $\{S_t\}_{t=1}^{T}$ denote a finite-state hidden semi-Markov chain defined over a discrete state space $\mathcal{S} = \{1, \dots, K\}$. The latent process is constructed as follows. A homogeneous hidden Markov chain with K states models the transitions between different states, with initial probabilities, $\pi_k = \Pr(S_1 = k)$, and transition probabilities
$$\pi_{k|j} = \Pr(S_{t+1} = k \mid S_{t+1} \neq j, S_t = j), \tag{1}$$
with $\sum_{k=1}^{K} \pi_{k|j} = 1$, $\pi_{j|j} = 0$, for every $j = 1, \dots, K$ and $k = 1, \dots, K$, i.e., the diagonal elements of the transition probability matrix are zeros. More concisely, we collect the initial and transition probabilities in the K-dimensional vector $\boldsymbol{\pi}$ and in the $K \times K$ matrix $\boldsymbol{\Pi}$, respectively. Because the unobserved process is semi-Markovian, only transitions from one state to another are governed by the transition probabilities in (1), but the duration of a stay in a state is modeled by a separate SD. Specifically, let us denote by $d_k(u)$ the SD, i.e., the probability that the hidden process spends u consecutive time steps in the k-th state, as follows:
$$d_k(u) = \Pr(S_{t+u+1} \neq k, S_{t+u-v} = k, v = 0, \dots, u-2 \mid S_{t+1} = k, S_t \neq k), \quad u = 1, \dots, U_k, \tag{2}$$
where $U_k$ corresponds to the maximum sojourn time of the hidden chain in state k. Let us also denote by $\mathbf{U} = (U_1, \dots, U_K)$ the K-dimensional vector collecting all state-specific maximum sojourn times.
HSMMs allow for great flexibility as the SD in (2) is directly specified by the researcher and estimated from the observed data. The SD can be chosen from a large variety of parametric distributions, such as the shifted-Poisson or the shifted-negative binomial distribution. In the particular case where the SD is assumed to be geometric, a HSMM reduces to a HMM, with the most likely sojourn time for every state being 1 (Zucchini et al. 2016). Parametric distributions, however, might lack the flexibility to capture key features of empirical SDs in the data, increasing state misclassification rates and inducing substantial bias. Alternatively, semi- and nonparametric data-driven approaches can be adopted (see Sansom and Thomson 2001; Guédon 2003; Langrock and Zucchini 2011; Maruotti et al. 2021) to provide sufficient additional flexibility in comparison to HMMs and accommodate complex distributional shapes.
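The construction of the latent process described above can be sketched as a simple simulator: a state is drawn, a duration is drawn from that state's SD, and the next state is drawn from a transition matrix with zero diagonal. The function name `simulate_hsmm_states` and all numerical values are hypothetical and serve only as an illustration.

```python
import random


def simulate_hsmm_states(T, init, trans, sojourn, rng):
    """Simulate T steps of a semi-Markov state sequence.

    init[k]        : initial probability of state k
    trans[j][k]    : probability of moving from j to k (zero on the diagonal)
    sojourn[k][u-1]: probability that a visit to state k lasts u steps
    """
    n_states = len(init)
    states = []
    k = rng.choices(range(n_states), weights=init)[0]
    while len(states) < T:
        # Draw the length of the current visit from the state's SD.
        u = rng.choices(range(1, len(sojourn[k]) + 1), weights=sojourn[k])[0]
        states.extend([k] * u)
        # Leave the state: self-transitions have zero probability.
        k = rng.choices(range(n_states), weights=trans[k])[0]
    return states[:T]  # the last visit may be cut off at time T


path = simulate_hsmm_states(
    T=20,
    init=[0.5, 0.5],
    trans=[[0.0, 1.0], [1.0, 0.0]],
    sojourn=[[0.1, 0.2, 0.7], [0.6, 0.4]],
    rng=random.Random(42),
)
```

Truncating the final visit at T mirrors the fact that the last sojourn in an observed series is typically right-censored.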
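To build the proposed model, let $\mathbf{Y}_t = (Y_t^{(1)}, \dots, Y_t^{(p)})'$ be a continuous observable p-variate response variable and $\mathbf{X}_t = (1, X_t^{(2)}, \dots, X_t^{(m)})'$ be an m-dimensional vector of covariates, with the first element being the intercept, at time $t = 1, \dots, T$. The process $\{\mathbf{Y}_t\}_{t=1}^{T}$ represents the state-dependent process of the HSMM and, conditional on the hidden states, fulfills the independence property:
$$f(\mathbf{Y}_t = \mathbf{y}_t \mid \mathbf{Y}_1 = \mathbf{y}_1, \dots, \mathbf{Y}_{t-1} = \mathbf{y}_{t-1}, \mathbf{X}_t = \mathbf{x}_t, S_1 = s_1, \dots, S_t = s_t) = f_{\mathbf{Y}}(\mathbf{y}_t \mid \mathbf{X}_t = \mathbf{x}_t, S_t = s_t), \tag{3}$$
where $f_{\mathbf{Y}}(\mathbf{y}_t \mid \mathbf{X}_t = \mathbf{x}_t, S_t = s_t)$ is the conditional distribution of $\mathbf{Y}_t$ given the covariates and the hidden state occupied at time t.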
As mentioned in Section 1, our objective is to provide joint estimation of the p quantiles of the univariate conditional distributions of , taking into account time-dependent heterogeneity and potential correlation among the components of . Given p quantile indexes , with , , the Quantile Hidden Semi-Markov Model (QHSMM) is defined as follows:
| 4 |
with being a state-specific matrix of unknown regression coefficients that evolves over time according to the hidden process and takes one of the values in the set , and where denotes a p-dimensional vector of error terms with univariate component-wise quantiles (at fixed levels , respectively) equal to zero.
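Generalizing the approach of Petrella and Raponi (2019), as conditional distribution of $\mathbf{Y}_t$ we consider a Multivariate Asymmetric Laplace (MAL) distribution (see Kotz et al. 2012). In detail, based on (4) we assume $\mathbf{Y}_t \mid S_t = k \sim \mathrm{MAL}_p(\boldsymbol{\mu}_{tk}, \mathbf{D}_k\tilde{\boldsymbol{\xi}}, \mathbf{D}_k\boldsymbol{\Sigma}_k\mathbf{D}_k)$, whose probability density function is given by:
$$f_{\mathbf{Y}}(\mathbf{y}_t \mid \boldsymbol{\mu}_{tk}, \mathbf{D}_k\tilde{\boldsymbol{\xi}}, \mathbf{D}_k\boldsymbol{\Sigma}_k\mathbf{D}_k) = \frac{2\exp\{(\mathbf{y}_t - \boldsymbol{\mu}_{tk})'\mathbf{D}_k^{-1}\boldsymbol{\Sigma}_k^{-1}\tilde{\boldsymbol{\xi}}\}}{(2\pi)^{p/2}\,|\mathbf{D}_k\boldsymbol{\Sigma}_k\mathbf{D}_k|^{1/2}} \left(\frac{\tilde{m}_{tk}}{2 + \tilde{d}_k}\right)^{\nu/2} K_{\nu}\!\left(\sqrt{(2 + \tilde{d}_k)\,\tilde{m}_{tk}}\right), \tag{5}$$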
where, for each time occasion and state , the location parameter is defined by the linear model:
| 6 |
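with $\mathbf{D}_k\tilde{\boldsymbol{\xi}}$ being the skewness parameter, $\mathbf{D}_k = \mathrm{diag}[\delta_{k1}, \dots, \delta_{kp}]$, $\delta_{kj} > 0$, and $\tilde{\boldsymbol{\xi}} = (\tilde{\xi}_1, \dots, \tilde{\xi}_p)'$ having generic element $\tilde{\xi}_j = \frac{1 - 2\tau_j}{\tau_j(1 - \tau_j)}$, $j = 1, \dots, p$. $\boldsymbol{\Sigma}_k$ is a positive definite matrix such that $\boldsymbol{\Sigma}_k = \boldsymbol{\Lambda}\boldsymbol{\Psi}_k\boldsymbol{\Lambda}$, with $\boldsymbol{\Psi}_k$ being a state-specific correlation matrix and $\boldsymbol{\Lambda} = \mathrm{diag}[\sigma_1, \dots, \sigma_p]$, with $\sigma_j^2 = \frac{2}{\tau_j(1 - \tau_j)}$, $j = 1, \dots, p$. Moreover, $\tilde{m}_{tk} = (\mathbf{y}_t - \boldsymbol{\mu}_{tk})'(\mathbf{D}_k\boldsymbol{\Sigma}_k\mathbf{D}_k)^{-1}(\mathbf{y}_t - \boldsymbol{\mu}_{tk})$, $\tilde{d}_k = \tilde{\boldsymbol{\xi}}'\boldsymbol{\Sigma}_k^{-1}\tilde{\boldsymbol{\xi}}$, and $K_{\nu}(\cdot)$ denotes the modified Bessel function of the third kind with index parameter $\nu = (2 - p)/2$.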
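One of the key benefits of the MAL distribution is that, using (4) and (5), and following Kotz et al. (2012), it can be written as a location-scale mixture with the following representation:
$$\mathbf{Y}_t = \boldsymbol{\mu}_{tk} + \mathbf{D}_k\tilde{\boldsymbol{\xi}}\,C_t + \sqrt{C_t}\,\mathbf{D}_k\boldsymbol{\Sigma}_k^{1/2}\mathbf{Z}_t, \tag{7}$$
where $\mathbf{Z}_t$ denotes a p-variate standard Normal random vector and $C_t$ has a standard exponential distribution, with $\mathbf{Z}_t$ being independent of $C_t$. In particular, the constraints imposed on $\tilde{\boldsymbol{\xi}}$ and $\boldsymbol{\Lambda}$ guarantee that the j-th element of $\boldsymbol{\mu}_{tk}$, $\mu_{tk}^{(j)}$, is the $\tau_j$-th conditional quantile function of $Y_t^{(j)}$ given $S_t = k$, for $j = 1, \dots, p$ and $k = 1, \dots, K$, and represent necessary conditions for model identifiability for any fixed quantile level $\boldsymbol{\tau}$, as stated in the next proposition.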
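The quantile property of the mixture representation can be checked numerically. The sketch below is restricted to the bivariate case and assumes the standard constraints used in this literature, namely skewness terms $(1 - 2\tau_j)/(\tau_j(1 - \tau_j))$ and squared scale terms $2/(\tau_j(1 - \tau_j))$; the function names and parameter values are hypothetical.

```python
import math
import random


def sample_mal_2d(mu, delta, tau, rho, rng):
    """Draw one bivariate observation from the location-scale mixture form."""
    xi = [(1.0 - 2.0 * t) / (t * (1.0 - t)) for t in tau]    # skewness terms
    sigma = [math.sqrt(2.0 / (t * (1.0 - t))) for t in tau]  # scale terms
    c = rng.expovariate(1.0)                                 # C ~ Exp(1)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Correlated standard normals via the Cholesky factor of [[1, rho], [rho, 1]].
    e = [z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2]
    return [
        mu[j] + delta[j] * xi[j] * c + math.sqrt(c) * delta[j] * sigma[j] * e[j]
        for j in range(2)
    ]


def share_below_location(n, mu, delta, tau, rho, seed=0):
    """Monte Carlo estimate of P(Y_j <= mu_j) for each component j."""
    rng = random.Random(seed)
    below = [0, 0]
    for _ in range(n):
        y = sample_mal_2d(mu, delta, tau, rho, rng)
        for j in range(2):
            below[j] += y[j] <= mu[j]
    return [b / n for b in below]


shares = share_below_location(
    n=50000, mu=[1.0, -2.0], delta=[0.5, 2.0], tau=[0.1, 0.9], rho=0.6
)
```

Under these assumptions each share should be close to the corresponding quantile level, regardless of the scale and correlation values, which is what makes the location parameter interpretable as a component-wise conditional quantile.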
Proposition 1
Let , for , where K is a positive integer, with being known for any fixed value of . Furthermore, is a positive definite matrix with being an unknown correlation matrix and , with fixed element , . Then, the model in (2)-(6) is identified.
Proof
See Proof of Proposition 1 in Appendix.
In comparison with other methods in the literature, the proposed modeling framework includes the homogeneous joint quantile regression approach of Petrella and Raponi (2019) when $K = 1$ and it reduces to the univariate hidden semi-Markov-switching quantile regression of Maruotti et al. (2021) when $p = 1$. Naturally, when a geometric SD is assumed for all latent states, we call our methodology the Quantile Hidden Markov Model (QHMM).
Maximum likelihood estimation and inference
In this section we introduce a ML approach to inference on the model parameters. As is usually done in the literature in the presence of latent variables, we propose a suitable likelihood-based EM algorithm (Dempster et al. 1977). To hedge against possibly biased inference from incorrect parametric assumptions on the SDs, we estimate the sojourn probabilities nonparametrically following Guédon (2003). In addition, we show that both the E- and M-step updates of the algorithm can be obtained in closed form by exploiting the hierarchical representation of the MAL distribution in (7) under the constraints on its parameters, hence reducing the computational burden compared to direct maximization of the likelihood. We illustrate the EM algorithm to fit the more general QHSMM, but it can also be employed for the QHMM by assuming a geometric SD. To ease the notation, unless specified otherwise, we omit the quantile levels vector $\boldsymbol{\tau}$, yet all model parameters are allowed to depend on it. All the proofs are collected in the Appendix.
Let us denote by the set of model parameters. For any fixed , number of hidden states K and maximum sojourn times , we use the MAL representation in (7) to express the complete-data likelihood as follows:
| 8 |
where is a latent variable that follows an exponential distribution with parameter 1, is the r-th visited state, is the time spent in that state (i.e., the duration of the r-th visit) and is the number of state changes up to time T. Following Guédon (2003), the survivor function for the sojourn time in state k is defined as:
$$D_k(u) = \sum_{v \geq u} d_k(v), \tag{9}$$
where $d_k(v)$ is the SD of state k in (2). The survivor function sums up the individual probability masses of all possible sojourns of length $v \geq u$ and it has several advantages. Firstly, we do not have to assume that the process is leaving a state immediately after the upper endpoint T. Secondly, it provides a more accurate prediction of the last state visited, which is important when the analysis aims to estimate the most recently visited state, and improves parameter estimation (O’Connell and Højsgaard 2011).
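The survivor function is simply a reverse cumulative sum of the sojourn probability masses. A minimal sketch, with a hypothetical helper name and a made-up sojourn distribution:

```python
def survivor(pmf):
    """Return [D(1), ..., D(U)] where D(u) is the sum of pmf[v-1] over v >= u.

    pmf[u-1] is the probability of a sojourn of exactly u time steps.
    """
    out, running = [], 0.0
    for p in reversed(pmf):
        running += p          # accumulate mass from the longest duration down
        out.append(running)
    return out[::-1]


# Example: a state whose visits last 1, 2 or 3 steps.
d = [0.5, 0.3, 0.2]
D = survivor(d)
```

For the last, possibly unfinished visit in a series, the contribution to the likelihood uses `D[u - 1]` in place of `d[u - 1]`, since only a lower bound on the duration has been observed.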
The EM algorithm
The EM algorithm alternates between performing an expectation (E) step, which defines the expectation of the complete log-likelihood function evaluated using the current estimates of the parameters, and a maximization (M) step, which computes parameter estimates by maximizing the expected complete log-likelihood obtained in the E-step. The expected complete log-likelihood function and the optimal parameter updates are given in the following propositions.
Given the representation in (8), for the implementation of the algorithm we introduce the following quantities. We define the probability of being in state k at time t given the observed sequence as:
| 10 |
The probability the process left state j at time and entered state k at time t given the observed sequence is:
| 11 |
Finally, let us denote by the expected number of times the process spends u consecutive time steps in state k as:
| 12 |
Then, the expected log-likelihood for the complete data is presented in the following proposition.
Proposition 2
For any fixed , number of hidden states K and maximum sojourn times , the expected complete log-likelihood function (up to additive constants) is:
| 13 |
where
| 14 |
with
| 15 |
Therefore, the EM algorithm can be implemented as follows:
E-step: At the generic r-th iteration of the algorithm, let denote the current parameter estimates. Then, conditionally on the observed data and , the quantities, in (10) and in (11), can be calculated via a dynamic programming method known as the forward-backward algorithm (see, e.g., Levinson et al. 1983), while in (12) can be computed using the efficient adaptation of the forward-backward algorithm provided by Guédon (2003). Similarly, the conditional expectations and in (14) are considered; see the Appendix. We denote such quantities as and .
M-step: Substitute them in (13) to maximize with respect to , and obtain the updated parameter estimates. Because the expected complete-data log-likelihood in (13) decomposes into orthogonal subproblems, the maximization with respect to the regression coefficients, the parameters of the MAL distribution and the hidden process, can be performed separately. Thus, the initial probabilities and transition probabilities are estimated by:
| 16 |
To update the state-specific SD, we follow the nonparametric approach of Guédon (2003). In particular, we set the latent state duration densities to be discrete nonparametric distributions with arbitrary point mass assigned to the feasible duration values, that is, the SD is estimated as follows:
| 17 |
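The hidden-process updates can be sketched as normalized expected counts. This is an assumed generic form in the spirit of Guédon (2003), not a transcription of (16) and (17); the function and argument names are hypothetical, and the inputs are taken to be the E-step quantities already computed by the forward-backward recursions.

```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def m_step_hidden(gamma_first, xi_sum, eta):
    """Update initial, transition and sojourn probabilities.

    gamma_first[k]: posterior probability of state k at the first time point
    xi_sum[j][k]  : expected number of j -> k transitions, summed over time
    eta[k][u-1]   : expected number of visits to state k lasting u steps
    """
    init = normalize(gamma_first)
    # Self-transitions are excluded, so the diagonal is forced to zero.
    trans = [
        normalize([0.0 if j == k else x for k, x in enumerate(row)])
        for j, row in enumerate(xi_sum)
    ]
    # Nonparametric SD: one free probability mass per feasible duration.
    sojourn = [normalize(row) for row in eta]
    return init, trans, sojourn


init, trans, sojourn = m_step_hidden(
    gamma_first=[0.2, 0.3, 0.5],
    xi_sum=[[0.0, 3.0, 1.0], [2.0, 0.0, 2.0], [5.0, 5.0, 0.0]],
    eta=[[2.0, 6.0, 2.0], [1.0, 3.0], [4.0]],
)
```

Allowing each state its own number of duration values reflects the state-specific maximum sojourn times.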
Finally, the M-step updates of the parameters in the regression equation , are given in the following proposition.
Proposition 3
At the generic r-th iteration, the values of and maximizing (13) are:
| 18 |
| 19 |
where .
For the j-th state, the elements , of the diagonal scale matrix are estimated by:
| 20 |
where $\rho_{\tau}(\cdot)$ is the quantile check function of Koenker and Bassett (1978):
$$\rho_{\tau}(u) = u\,(\tau - \mathbb{1}_{\{u < 0\}}), \tag{21}$$
with being the indicator function and being the k-th element of the vector .
The E- and M-steps are alternated until convergence, that is, until the change in the observed likelihood between two consecutive iterations is smaller than a predetermined threshold. In this paper, we set this threshold criterion equal to .
Following Maruotti et al. (2021), for fixed , K and , we initialize the EM algorithm by providing the initial states partition, , according to a Multinomial distribution with probabilities 1/K. From the generated partition, the off-diagonal elements of are computed as proportions of transition. We obtain and by fitting univariate quantile regressions on observations within state k, while is set equal to the empirical correlation computed on observations in the k-th state. The initial SDs are estimated from assuming a geometric distribution as in HMMs. To avoid convergence to local maxima and better explore the parameter space, we fit the proposed QHSMM using a multiple random starts strategy with different starting partitions and retain the solution corresponding to the maximum likelihood value.
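The first part of the initialization, a random partition followed by off-diagonal transition proportions, can be sketched as follows. The helper names are hypothetical, and the sketch leaves out the quantile regression fits and the initial SDs.

```python
import random


def random_partition(T, K, rng):
    """Assign each time point to one of K states with equal probability."""
    return [rng.randrange(K) for _ in range(T)]


def transition_proportions(states, K):
    """Row-normalized counts of moves between different states.

    Consecutive time points in the same state are ignored, so the diagonal
    stays at zero, as required for a semi-Markov transition matrix.
    """
    counts = [[0] * K for _ in range(K)]
    for a, b in zip(states, states[1:]):
        if a != b:
            counts[a][b] += 1
    rows = []
    for j in range(K):
        total = sum(counts[j])
        # A state that is never left gets a row of zeros in this sketch.
        rows.append([c / total if total else 0.0 for c in counts[j]])
    return rows


partition = random_partition(T=50, K=3, rng=random.Random(1))
start_trans = transition_proportions(partition, K=3)
```

Repeating this with different seeds gives the multiple random starts mentioned above; the run with the largest likelihood would then be retained.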
Once the ML estimates of the model parameters have been computed, we calculate standard errors using a parametric bootstrap approach (Visser et al. 2000). That is, we refit the model to H bootstrap samples and approximate the standard error of each model parameter by the standard deviation of its estimates across the bootstrap samples. Hence, standard error estimates are obtained from the diagonal elements of:
| 22 |
where is the set of parameter estimates for the h-th bootstrap sample and denote the sample mean of all .
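A minimal sketch of the bootstrap standard error computation follows. It assumes that each refit returns the parameter estimates as a flat list and that the divisor is H rather than H - 1; both are assumptions made for illustration, as is the function name.

```python
import math


def bootstrap_standard_errors(estimates):
    """Standard deviation of each parameter across bootstrap refits.

    estimates[h][i] is the estimate of parameter i from the h-th bootstrap
    sample. The divisor H is an assumption; H - 1 is an equally common choice.
    """
    H = len(estimates)
    n_par = len(estimates[0])
    means = [sum(e[i] for e in estimates) / H for i in range(n_par)]
    variances = [
        sum((e[i] - means[i]) ** 2 for e in estimates) / H for i in range(n_par)
    ]
    # Standard errors are the square roots of the diagonal variances.
    return [math.sqrt(v) for v in variances]


# Three hypothetical refits of a two-parameter model.
se = bootstrap_standard_errors([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
```

In an HSMM the state labels can switch between refits, so in practice the bootstrap estimates usually need to be relabeled consistently before they are compared.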
Model selection
In the EM algorithm discussed above, the number of hidden states K and the maximum lengths of state durations, $\mathbf{U} = (U_1, \dots, U_K)$, are unknown. From an applied perspective, choosing an adequate number of states is a crucial aspect of the data analysis, which should take into account the data structure and the research question at hand. In particular, K is typically selected using penalized-likelihood criteria or cross-validation methods, which can become computationally demanding (Pohle et al. 2017). In HSMMs, not only the number of hidden states but also the maximum length of state durations has to be selected. In practice, this often amounts to fixing $U_k = U$, $k = 1, \dots, K$, with U being large enough to capture the main support of the SD in each state (see Maruotti et al. 2021). The major disadvantages of this approach are the large number of parameters to be estimated and the fact that different states may require substantially different maximum sojourn times. For these reasons, herein we simultaneously select the optimal values of K and of the vector $\mathbf{U}$ using penalized likelihood criteria, such as AIC (Akaike 1998), BIC (Schwarz 1978) and ICL (Biernacki et al. 2000), which penalizes the BIC for the estimated mean entropy and is given by:
| 23 |
where in (23) is defined as , with being the observed data log-likelihood in correspondence of the ML estimate of , T corresponds to the number of observations and denotes the number of free model parameters in . Computing may prove to be difficult to evaluate directly because it involves the sum on every possible state sequence of length T, , and the sum on every supplementary duration from time spent in the state occupied at time T, . Therefore, to compute we use the variables in (10)-(12) required for the EM algorithm (please see Guédon 2003).
All criteria involve penalization terms depending on the number of parameters , which is given by the sum of:
the number of regression parameters in : ,
the number of scale parameters in : ,
the number of correlation parameters in : ,
the number of independent transition probabilities in : ,
the unconstrained sojourn distribution probabilities:
To select the order of the hidden process, we first define a sequence of values of K and construct a K-dimensional grid of maximum sojourn distributions, , and then fit the model using the EM algorithm described above for fixed , K and a vector in . Because a full search over might be computationally infeasible, we employ the greedy search algorithm considered in Langrock et al. (2015) and Adam et al. (2019) and select the best combination of corresponding to the lowest value of the penalized likelihood criteria.
Simulation study
We conduct a simulation study to evaluate the finite sample properties of the proposed QHSMM. This simulation exercise addresses the following issues: (i) study the performance of the model under different distributional choices for the error term and SDs, when either a linear or nonlinear quantile regression function of given is considered; (ii) assess the classification performance of the proposed model; (iii) evaluate the performance of penalized likelihood criteria in selecting the optimal number of hidden states K and maximum sojourn times . Additional simulation studies are illustrated in the Supplementary Materials.
We consider , a continuous response variable of dimension and one explanatory variable . The observations are generated from a two state HSMM, i.e., , using the following data generating process:
| 24 |
where and the true values of the state-dependent parameters, and , are given by:
| 25 |
We consider the following two distributions for the error terms in (24):
- Normal errors: the error terms are generated from a bivariate Normal distribution with zero mean vector and state-specific variance-covariance matrix;
- Student t errors: the error terms are generated from a bivariate Student t distribution with 5 degrees of freedom, zero mean and state-specific scale matrix.
The state-specific covariance matrices are set to induce either low or high correlation between the responses.
Similarly to Maruotti et al. (2021), for each scenario we further consider three SDs:
- (SPO): a shifted-Poisson distribution (26);
- (SNB): a shifted-negative binomial distribution (27);
- (GEO): a geometric distribution (28).
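The three sojourn families can be written down directly on the support 1, 2, and so on. The sketch below uses standard textbook parameterizations and hypothetical parameter values; in particular, the shifted-negative binomial is written with an integer size parameter, which is an assumption made for simplicity.

```python
import math


def shifted_poisson(u, lam):
    """P(U = u) for u >= 1, where U - 1 is Poisson with mean lam."""
    return math.exp(-lam) * lam ** (u - 1) / math.factorial(u - 1)


def shifted_negative_binomial(u, r, p):
    """P(U = u) for u >= 1, where U - 1 counts failures before the r-th success."""
    return math.comb(u + r - 2, u - 1) * (p ** r) * ((1.0 - p) ** (u - 1))


def geometric(u, p):
    """P(U = u) for u >= 1, with success probability p at each step."""
    return p * (1.0 - p) ** (u - 1)


def mean_duration(pmf, max_u, **params):
    """Mean sojourn time, truncated at max_u."""
    return sum(u * pmf(u, **params) for u in range(1, max_u + 1))


spo_mean = mean_duration(shifted_poisson, 100, lam=5.0)
snb_mean = mean_duration(shifted_negative_binomial, 300, r=3, p=0.4)
geo_mean = mean_duration(geometric, 300, p=0.2)
```

Unlike the geometric case, the shifted-Poisson and shifted-negative binomial families can place their mode away from a duration of 1, which is the kind of departure a QHMM cannot represent.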
We fit the proposed QHSMM at five quantile levels, i.e., $\boldsymbol{\tau} = (0.10, 0.10)$, $(0.25, 0.25)$, $(0.50, 0.50)$, $(0.75, 0.75)$ and $(0.90, 0.90)$. For each model, we carry out $B$ Monte Carlo replications and report the following indicators. The Average Relative Bias (ARB), expressed as a percentage:
| 29 |
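where $\hat{\theta}_{\tau}^{(b)}$ is the estimated parameter at level $\tau$ for the b-th replication and $\theta_{\tau}$ is the corresponding “true” value. Secondly, the Root Mean Square Error (RMSE) of model parameters averaged across the B simulations:
$$\mathrm{RMSE}(\hat{\theta}_{\tau}) = \sqrt{\frac{1}{B}\sum_{b=1}^{B}\left(\hat{\theta}_{\tau}^{(b)} - \theta_{\tau}\right)^{2}}. \tag{30}$$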
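Both indicators are straightforward to compute from the replicated estimates. The sketch below assumes a signed relative bias averaged over replications and multiplied by 100; a definition based on absolute deviations is also common, so the ARB form here should be read as an assumption. Function names and numbers are hypothetical.

```python
import math


def average_relative_bias(estimates, true_value):
    """Signed relative bias in percent, averaged over replications (assumed form)."""
    B = len(estimates)
    return 100.0 * sum((est - true_value) / true_value for est in estimates) / B


def root_mean_square_error(estimates, true_value):
    """Square root of the mean squared deviation from the true value."""
    B = len(estimates)
    return math.sqrt(sum((est - true_value) ** 2 for est in estimates) / B)


# Three hypothetical replications of an estimate whose true value is 2.
reps = [1.9, 2.1, 2.3]
arb = average_relative_bias(reps, 2.0)
rmse = root_mean_square_error(reps, 2.0)
```

The relative bias is undefined when the true value is zero, so it is only meaningful for nonzero coefficients.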
To address the first two questions of this simulation exercise, Tables 1 and 2 report the ARB and RMSE for the state-specific regression coefficients. As can be noted, the proposed model under the Normal and Student t error distributions is able to recover the true state-dependent parameters for both low and high degrees of dependence and for all three SDs considered. Not surprisingly, the bias is quite small at the median level (see column 3). As the quantile levels become more extreme (see columns 1, 2, 4 and 5), the ARB slightly increases but still remains reasonably small. Also, under the Student t scenario, the heavier tails of the error distribution contribute to higher ARB and RMSE, especially at the 10-th and 90-th percentiles.
Table 1.
ARB (in percentage) and RMSE (in brackets) for state-parameter estimates of and with normal errors for the QHSMM
| True Coef. | (0.10, 0.10) | (0.25, 0.25) | (0.50, 0.50) | (0.75, 0.75) | (0.90, 0.90) |
|---|---|---|---|---|---|
| Panel A: SPO | |||||
| − 4 | |||||
| − 3 | |||||
| − 2 | |||||
| − 1 | |||||
| 1 | |||||
| 2 | |||||
| 4 | |||||
| 5 | |||||
| Panel B: SNB | |||||
| − 4 | |||||
| − 3 | |||||
| − 2 | |||||
| − 1 | |||||
| 1 | |||||
| 2 | |||||
| 4 | |||||
| 5 | |||||
| Panel C: GEO | |||||
| − 4 | |||||
| − 3 | |||||
| − 2 | |||||
| − 1 | |||||
| 1 | |||||
| 2 | |||||
| 4 | |||||
| 5 | |||||
Table 2.
ARB (in percentage) and RMSE (in brackets) for state-parameter estimates of and with Student t errors for the QHSMM
| True Coef. | (0.10, 0.10) | (0.25, 0.25) | (0.50, 0.50) | (0.75, 0.75) | (0.90, 0.90) |
|---|---|---|---|---|---|
| Panel A: SPO | |||||
| − 4 | |||||
| − 3 | |||||
| − 2 | |||||
| − 1 | |||||
| 1 | |||||
| 2 | |||||
| 4 | |||||
| 5 | |||||
| Panel B: SNB | |||||
| − 4 | |||||
| − 3 | |||||
| − 2 | |||||
| − 1 | |||||
| 1 | |||||
| 2 | |||||
| 4 | |||||
| 5 | |||||
| Panel C: GEO | |||||
| − 4 | |||||
| − 3 | |||||
| − 2 | |||||
| − 1 | |||||
| 1 | |||||
| 2 | |||||
| 4 | |||||
| 5 | |||||
To evaluate the classification performance of the proposed model, we report the average Adjusted Rand Index (ARI) of Hubert and Arabie (1985) and the misclassification rate (MCR). Specifically, we compare the classification obtained by the QHSMM with the one obtained from a QHMM under the assumption of a geometric SD. The state partition provided by the fitted models is obtained by assigning each time point to the state with the maximum a posteriori probability. The results in Table 3 show that when the true SD of the data generating process is geometric, the QHMM provides a slightly better classification both in terms of ARI and MCR (compare the GEO row of Panel A with that of Panel C and the GEO row of Panel B with that of Panel D). This is not surprising as the QHMM implicitly assumes geometrically distributed sojourn times. The QHSMM, on the contrary, outperforms the QHMM in all other cases, as it can approximate any SD arbitrarily well and does not rely on a distributional assumption for the sojourn times (compare the SPO and SNB rows of Panel A with those of Panel C, and the SPO and SNB rows of Panel B with those of Panel D), with very few exceptions at quantile levels (0.50, 0.50) and (0.75, 0.75), where the two models give comparable results.
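Both classification measures can be computed from the true and estimated state sequences. The following is a minimal sketch of the pair-counting form of the ARI and of an MCR that is minimized over relabelings of the estimated states; the function names are hypothetical and the relabeling step is an assumption about how label switching is handled.

```python
from collections import Counter
from itertools import permutations
from math import comb


def adjusted_rand_index(a, b):
    """ARI between two partitions given as equal-length label sequences.

    Assumes the partitions are not both trivial, so the denominator is nonzero.
    """
    n = len(a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_cells - expected) / (max_index - expected)


def misclassification_rate(true, pred, K):
    """Smallest share of mismatches over all relabelings of the K states."""
    best = len(true)
    for perm in permutations(range(K)):
        errors = sum(t != perm[p] for t, p in zip(true, pred))
        best = min(best, errors)
    return best / len(true)


truth = [0, 0, 0, 1, 1, 1]
guess = [1, 1, 0, 0, 0, 0]
ari = adjusted_rand_index(truth, guess)
mcr = misclassification_rate(truth, guess, K=2)
```

The ARI is invariant to relabeling by construction, whereas the MCR needs the explicit search over permutations, which is only practical for a small number of states.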
Table 3.
Average and standard deviation (in brackets) values of the ARI and MCR for the QHSMM and QHMM under the three considered SDs and two distributions for the error term
| SD | ARI (0.10, 0.10) | MCR (0.10, 0.10) | ARI (0.25, 0.25) | MCR (0.25, 0.25) | ARI (0.50, 0.50) | MCR (0.50, 0.50) | ARI (0.75, 0.75) | MCR (0.75, 0.75) | ARI (0.90, 0.90) | MCR (0.90, 0.90) |
|---|---|---|---|---|---|---|---|---|---|---|
| QHSMM | | | | | | | | | | |
| Panel A: | | | | | | | | | | |
| SPO | 0.905 | 0.024 | 0.908 | 0.023 | 0.908 | 0.023 | 0.909 | 0.023 | 0.907 | 0.024 |
| | (0.022) | (0.006) | (0.021) | (0.006) | (0.021) | (0.006) | (0.021) | (0.005) | (0.022) | (0.006) |
| SNB | 0.852 | 0.037 | 0.857 | 0.036 | 0.855 | 0.036 | 0.856 | 0.036 | 0.853 | 0.037 |
| | (0.026) | (0.007) | (0.026) | (0.007) | (0.026) | (0.007) | (0.026) | (0.007) | (0.026) | (0.007) |
| GEO | 0.783 | 0.057 | 0.785 | 0.057 | 0.783 | 0.057 | 0.783 | 0.057 | 0.782 | 0.058 |
| | (0.028) | (0.008) | (0.029) | (0.008) | (0.028) | (0.008) | (0.029) | (0.008) | (0.028) | (0.008) |
| Panel B: | | | | | | | | | | |
| SPO | 0.874 | 0.032 | 0.885 | 0.030 | 0.885 | 0.029 | 0.885 | 0.029 | 0.876 | 0.032 |
| | (0.029) | (0.008) | (0.023) | (0.006) | (0.023) | (0.006) | (0.023) | (0.006) | (0.025) | (0.007) |
| SNB | 0.810 | 0.048 | 0.824 | 0.045 | 0.824 | 0.045 | 0.824 | 0.045 | 0.811 | 0.048 |
| | (0.030) | (0.008) | (0.028) | (0.008) | (0.028) | (0.008) | (0.029) | (0.008) | (0.031) | (0.008) |
| GEO | 0.737 | 0.071 | 0.745 | 0.068 | 0.744 | 0.069 | 0.743 | 0.069 | 0.735 | 0.071 |
| | (0.031) | (0.009) | (0.031) | (0.009) | (0.031) | (0.009) | (0.031) | (0.009) | (0.031) | (0.009) |
| QHMM | | | | | | | | | | |
| Panel C: | | | | | | | | | | |
| SPO | 0.895 | 0.027 | 0.904 | 0.024 | 0.909 | 0.023 | 0.907 | 0.024 | 0.900 | 0.025 |
| | (0.022) | (0.006) | (0.021) | (0.005) | (0.020) | (0.005) | (0.020) | (0.005) | (0.021) | (0.005) |
| SNB | 0.850 | 0.038 | 0.856 | 0.036 | 0.861 | 0.035 | 0.857 | 0.036 | 0.851 | 0.037 |
| | (0.026) | (0.007) | (0.025) | (0.007) | (0.025) | (0.006) | (0.025) | (0.007) | (0.026) | (0.007) |
| GEO | 0.799 | 0.053 | 0.802 | 0.052 | 0.804 | 0.052 | 0.802 | 0.052 | 0.798 | 0.053 |
| | (0.026) | (0.007) | (0.026) | (0.007) | (0.026) | (0.007) | (0.027) | (0.007) | (0.027) | (0.008) |
| Panel D: | | | | | | | | | | |
| SPO | 0.856 | 0.037 | 0.874 | 0.032 | 0.883 | 0.030 | 0.876 | 0.032 | 0.858 | 0.037 |
| | (0.026) | (0.007) | (0.024) | (0.006) | (0.023) | (0.006) | (0.023) | (0.006) | (0.026) | (0.007) |
| SNB | 0.803 | 0.050 | 0.820 | 0.046 | 0.830 | 0.043 | 0.822 | 0.045 | 0.806 | 0.049 |
| | (0.029) | (0.008) | (0.027) | (0.007) | (0.026) | (0.007) | (0.027) | (0.007) | (0.029) | (0.008) |
| GEO | 0.752 | 0.066 | 0.762 | 0.063 | 0.768 | 0.062 | 0.763 | 0.063 | 0.752 | 0.066 |
| | (0.030) | (0.009) | (0.030) | (0.008) | (0.029) | (0.008) | (0.030) | (0.008) | (0.029) | (0.008) |
We further evaluate the proposed QHSMM when the conditional quantile function of the response given the covariates is nonlinear. Similarly to Geraci (2019), the observations are generated from a two-state HSMM using the following nonlinear quantile regression models. In the first scenario, we simulate the data from the following logistic model:
| 31 |
where the explanatory variable is drawn from a continuous uniform distribution, , and where and denote the j-th component of and , respectively. For each component j and hidden state k, the true values of the parameters, , are given by , , and .
In the second scenario, following El Ghouch and Genton (2009), the data are generated according to the equation:
| 32 |
where and where the parameter in (32) can be seen as a misspecification parameter that controls the deviation from the polynomial function. As increases, the data structure becomes more complicated, and approximating the true curve by a polynomial becomes increasingly difficult. In this study we chose and . The true values of the parameters are given by , , and .
For the error terms in (31) and (32) and for the SDs, we consider the same distributions adopted in the linear case. Examples of data simulated under the two scenarios are shown in Figures 1 and 2, respectively.
Fig. 1.
Examples of data generated from the first scenario. Scatterplots of the first (first column) and second (second column) response components under normal (first row) and Student t (second row) errors as a function of the included covariate, when using shifted-Poisson SDs. Red and blue data points distinguish the two latent states
Fig. 2.
Examples of data generated from the second scenario. Scatterplots of the first (first column) and second (second column) response components under normal (first row) and Student t (second row) errors as a function of the included covariate, when using shifted-Poisson SDs. Red and blue data points distinguish the two latent states
We fit the proposed QHSMM at five quantile levels, i.e., (0.10, 0.10), (0.25, 0.25), (0.50, 0.50), (0.75, 0.75) and (0.90, 0.90). For each model, we report the Proportion of Negative Residuals (PNR):
| 33 |
where each residual is the difference between an observed response and its fitted conditional quantile at the level under consideration. If the model is correctly specified, the PNR should be approximately equal to the nominal quantile level for each outcome. The results reported in Tables 4 and 5 are averaged over 1000 Monte Carlo replications.
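As a minimal sketch of this diagnostic, the PNR of a single response component can be computed as the share of observations lying below their fitted conditional quantile. The function name is hypothetical, and the use of a strict inequality is an assumption, since the exact form of the indicator is not shown here.

```python
def proportion_negative_residuals(observed, fitted_quantiles):
    """Share of observations lying strictly below their fitted conditional quantile."""
    if len(observed) != len(fitted_quantiles):
        raise ValueError("observed and fitted_quantiles must have the same length")
    # A residual is negative when the observation falls below the fitted quantile.
    negatives = sum(y < q for y, q in zip(observed, fitted_quantiles))
    return negatives / len(observed)


# Toy check: two of four observations fall below a constant fitted quantile of 2.5.
pnr = proportion_negative_residuals([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```

For a well-calibrated fit at level 0.50, a value close to 0.50 would be expected, as in the toy check above.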
Table 4.
Average and standard deviation (in brackets) values of the PNR for the two response components under the considered SDs and error term distributions for the first scenario over 1000 samples. Each quantile level spans two columns, one per response component
| SD | (0.10, 0.10) | | (0.25, 0.25) | | (0.50, 0.50) | | (0.75, 0.75) | | (0.90, 0.90) | |
|---|---|---|---|---|---|---|---|---|---|---|
| Panel A: | | | | | | | | | | |
| SPO | 0.096 | 0.083 | 0.265 | 0.233 | 0.497 | 0.504 | 0.751 | 0.769 | 0.920 | 0.912 |
| | (0.005) | (0.004) | (0.008) | (0.007) | (0.007) | (0.008) | (0.006) | (0.006) | (0.005) | (0.004) |
| SNB | 0.091 | 0.079 | 0.251 | 0.224 | 0.492 | 0.501 | 0.750 | 0.770 | 0.918 | 0.911 |
| | (0.007) | (0.006) | (0.010) | (0.009) | (0.012) | (0.014) | (0.007) | (0.006) | (0.005) | (0.004) |
| GEO | 0.088 | 0.076 | 0.246 | 0.231 | 0.499 | 0.502 | 0.753 | 0.769 | 0.920 | 0.913 |
| | (0.008) | (0.006) | (0.012) | (0.017) | (0.007) | (0.008) | (0.007) | (0.006) | (0.004) | (0.005) |
| Panel B: | | | | | | | | | | |
| SPO | 0.097 | 0.080 | 0.266 | 0.232 | 0.498 | 0.502 | 0.750 | 0.768 | 0.918 | 0.914 |
| | (0.006) | (0.004) | (0.007) | (0.007) | (0.007) | (0.008) | (0.007) | (0.006) | (0.005) | (0.004) |
| SNB | 0.089 | 0.080 | 0.256 | 0.224 | 0.495 | 0.502 | 0.750 | 0.768 | 0.917 | 0.912 |
| | (0.006) | (0.005) | (0.011) | (0.010) | (0.010) | (0.012) | (0.007) | (0.006) | (0.005) | (0.004) |
| GEO | 0.095 | 0.080 | 0.249 | 0.231 | 0.500 | 0.500 | 0.752 | 0.769 | 0.918 | 0.914 |
| | (0.008) | (0.005) | (0.014) | (0.017) | (0.007) | (0.009) | (0.007) | (0.006) | (0.004) | (0.005) |
Table 5.
Average and standard deviation (in brackets) values of the PNR for the two response components under the considered SDs and error term distributions for the second scenario over 1000 samples. Each quantile level spans two columns, one per response component
| SD | (0.10, 0.10) | | (0.25, 0.25) | | (0.50, 0.50) | | (0.75, 0.75) | | (0.90, 0.90) | |
|---|---|---|---|---|---|---|---|---|---|---|
| Panel A: | | | | | | | | | | |
| SPO | 0.095 | 0.090 | 0.249 | 0.239 | 0.479 | 0.482 | 0.743 | 0.760 | 0.907 | 0.923 |
| | (0.010) | (0.017) | (0.007) | (0.007) | (0.008) | (0.008) | (0.008) | (0.008) | (0.033) | (0.011) |
| SNB | 0.090 | 0.083 | 0.250 | 0.239 | 0.479 | 0.481 | 0.742 | 0.758 | 0.922 | 0.926 |
| | (0.008) | (0.013) | (0.006) | (0.006) | (0.008) | (0.008) | (0.008) | (0.008) | (0.024) | (0.009) |
| GEO | 0.092 | 0.084 | 0.251 | 0.238 | 0.486 | 0.483 | 0.742 | 0.758 | 0.909 | 0.926 |
| | (0.008) | (0.014) | (0.006) | (0.006) | (0.008) | (0.009) | (0.008) | (0.008) | (0.042) | (0.011) |
| Panel B: | | | | | | | | | | |
| SPO | 0.090 | 0.082 | 0.254 | 0.245 | 0.483 | 0.485 | 0.746 | 0.756 | 0.933 | 0.926 |
| | (0.009) | (0.015) | (0.006) | (0.007) | (0.008) | (0.008) | (0.008) | (0.008) | (0.012) | (0.009) |
| SNB | 0.088 | 0.079 | 0.254 | 0.244 | 0.482 | 0.484 | 0.745 | 0.754 | 0.935 | 0.925 |
| | (0.008) | (0.012) | (0.006) | (0.007) | (0.008) | (0.008) | (0.008) | (0.007) | (0.011) | (0.009) |
| GEO | 0.090 | 0.080 | 0.254 | 0.243 | 0.487 | 0.485 | 0.745 | 0.755 | 0.923 | 0.924 |
| | (0.008) | (0.012) | (0.006) | (0.007) | (0.008) | (0.008) | (0.008) | (0.007) | (0.031) | (0.011) |
Overall, the PNRs are coherent with the selected quantile levels. They are typically closer to the nominal values in the first scenario than in the second. This may be explained by the fact that the fitted model provides a better linear approximation of the true logistic quantile regression function than of the more complex nonlinear model in the second scenario. Moreover, in the latter case, the PNRs obtained for the second component are less accurate than the corresponding ones for the first component, which is mainly due to the relatively larger variance associated with the misspecification parameter in (32). In both scenarios, the PNRs deviate slightly from the expected proportions, especially in the tails, where they fall below (at 0.10) or above (at 0.90) the nominal level.
Finally, to assess the performance of penalized likelihood criteria (AIC, BIC and ICL) for selecting the number of hidden states and maximum sojourn times, we considered the same simulation experiment where the sojourns are generated from a beta-binomial distribution:
| 34 |
where is the beta function, , , and . For each of the simulated datasets, we fit the QHSMM with over the grid of state-dependent maximum sojourn times , and select the best combination of associated to the lowest penalized likelihood criteria. Table 6 reports the number of times each criterion correctly identifies the number of latent states and maximum support points of the SDs (Panel A) and the absolute frequency distributions of the selected K for each of the three criteria (Panel B).
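The selection step can be illustrated with a short sketch. The function names and the candidate labels are hypothetical, and the ICL is written here in one common entropy-penalized form (BIC plus twice the entropy of the posterior state probabilities), which is an assumption and may differ in detail from the definition adopted in the paper.

```python
from math import log


def information_criteria(loglik, n_params, n_obs, posteriors):
    """Return (AIC, BIC, ICL) on the lower-is-better scale."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * log(n_obs)
    # Entropy of the posterior state probabilities; zero for a crisp classification.
    entropy = -sum(p * log(p) for row in posteriors for p in row if p > 0.0)
    icl = bic + 2.0 * entropy  # assumed entropy-penalized form of the ICL
    return aic, bic, icl


def select_model(candidates, criterion=2):
    """Return the label of the candidate with the smallest criterion value.

    candidates maps a label, e.g. a (K, maximum sojourn times) pair, to its
    (AIC, BIC, ICL) tuple; criterion is 0 for AIC, 1 for BIC and 2 for ICL.
    """
    return min(candidates, key=lambda label: candidates[label][criterion])
```

Under this form, the ICL coincides with the BIC whenever every observation is assigned to a state with probability one, and it penalizes fitted models whose states overlap.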
Table 6.
Number of correctly identified hidden states K and maximum sojourn times (Panel A) and absolute frequency distribution of the selected number of states (Panel B) under Normal and Student t errors over 100 replications
| (0.10, 0.10) | (0.25, 0.25) | (0.50, 0.50) | (0.75, 0.75) | (0.90, 0.90) | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIC | BIC | ICL | AIC | BIC | ICL | AIC | BIC | ICL | AIC | BIC | ICL | AIC | BIC | ICL | |
| Panel A | |||||||||||||||
| 0 | 1 | 100 | 0 | 54 | 99 | 30 | 98 | 99 | 0 | 61 | 100 | 0 | 2 | 99 | |
| 0 | 0 | 82 | 0 | 27 | 97 | 72 | 86 | 97 | 0 | 22 | 97 | 0 | 0 | 92 | |
| Panel B | |||||||||||||||
| Panel B.1: | |||||||||||||||
| 2 | 0 | 1 | 100 | 0 | 56 | 100 | 30 | 99 | 99 | 0 | 62 | 100 | 0 | 2 | 100 |
| 3 | 0 | 70 | 0 | 0 | 44 | 0 | 4 | 1 | 1 | 0 | 38 | 0 | 0 | 80 | 0 |
| 4 | 100 | 29 | 0 | 100 | 0 | 0 | 66 | 0 | 0 | 100 | 0 | 0 | 100 | 18 | 0 |
| Panel B.2: | |||||||||||||||
| 2 | 0 | 0 | 88 | 0 | 32 | 100 | 72 | 99 | 100 | 0 | 28 | 100 | 0 | 0 | 98 |
| 3 | 0 | 15 | 11 | 1 | 68 | 0 | 13 | 1 | 0 | 0 | 72 | 0 | 0 | 56 | 2 |
| 4 | 100 | 85 | 1 | 99 | 0 | 0 | 15 | 0 | 0 | 100 | 0 | 0 | 100 | 44 | 0 |
As one can see in Panel A, all three criteria work relatively well at the median level (0.50, 0.50). By contrast, as we move towards the tails of the distribution of the responses, the ICL outperforms both the AIC and the BIC and correctly identifies the pair that was used to generate the data more than 96% of the time across all simulation scenarios. Moving on to Panel B, the AIC consistently performs worse than the BIC, but both mostly overestimate the true number of states. These results suggest that, regardless of the distribution of the error terms and of the SDs, the ICL yields superior performance and captures serial heterogeneity in the data more parsimoniously than the other criteria, easing the interpretation of the latent states.
Application
In this section we apply the proposed methodology to air pollution data collected by the Lazio Regional Agency for Environmental Prevention and Protection (ARPA Lazio, https://www.arpalazio.it) in Italy. The ARPA Lazio provides information regarding the regional state of the environment and environmental trends, performing scientific, technical and research functions as well as assessment, monitoring, control and supporting local and health authorities. The time series used in this research are freely available from the ARPA Lazio website (https://www.arpalazio.net/main/aria/sci/basedati/chimici/chimici.php).
Data description
The data considered originate from a regional monitoring network system developed by the Lazio Region and have already been discussed in Maruotti et al. (2017). This system was set up to respond to the increasing demand for environmental information and to provide reliable data suitable for policy purposes. The network recorded concentrations of nine air pollutants on an hourly basis at monitoring stations in the central area of the city of Rieti, Italy. The Rieti site was chosen because it is classified as a traffic location, although it lies a short distance from green areas and forests that facilitate the movement of air masses and the removal of pollutants. In this work we consider three major air pollutants, i.e., Particulate Matter with aerodynamic diameter less than 2.5 μm (PM2.5), Ozone (O3) and Nitrogen Dioxide (NO2), from January 01, 2019 to June 14, 2021. We averaged the pollution data to daily frequency, and all concentrations are expressed in μg/m³.
Atmospheric variables also play a major role in determining the level of exposure to particular pollutants and, as proxies for seasonal variations, help to capture time dependence. Since pollution episodes are triggered by specific atmospheric factors, we include the daily average wind speed, temperature, pressure and humidity as covariates. Table 7 presents the main descriptive statistics for the response variables and the included predictors. The asymmetry in the distributions of the pollutants is apparent from the mean-median relationship, and the summary statistics indicate severe departures from the Gaussian assumption, with high kurtosis and outlying values. In the same table we also report the empirical correlation coefficients between the response variables, which clearly highlight a positive correlation between PM2.5 and NO2, and a negative association with O3. Therefore, the dependence structure among different pollutants, which cannot be detected by univariate methods, constitutes a crucial aspect of the analysis and should not be neglected.
Table 7.
Summary statistics of the sample data
| Variable | Minimum | 1st quartile | Median | Mean | 3rd quartile | Maximum |
|---|---|---|---|---|---|---|
| PM2.5 | 0 | 6 | 9 | 11.846 | 15 | 63 |
| O3 | 7 | 31 | 49 | 46.340 | 61 | 94 |
| NO2 | 0 | 8 | 12 | 14.139 | 20 | 44 |
| Wind Speed | 0 | 6 | 8 | 8.839 | 11 | 29 |
| Temperature | 0 | 11 | 16 | 16.659 | 22 | 34 |
| Pressure | 989 | 1011 | 1015 | 1015.127 | 1020 | 1036 |
| Humidity | 17 | 42 | 56 | 58.415 | 74 | 100 |
| Correlation matrix | ||||||
| | PM2.5 | O3 | NO2 | | | |
| PM2.5 | 1 | | | | | |
| O3 | | 1 | | | | |
| NO2 | 0.743 | | 1 | | | |
From a graphical standpoint, Fig. 3 shows the normal QQ plots for the PM2.5, O3 and NO2 time series. These reveal the presence of potentially influential observations, heavy tails and skewness for all three outcomes. This exploratory analysis motivates us to consider a joint quantile regression approach as an investigative tool.
Fig. 3.
From left to right, univariate normal QQ plots for PM2.5, O3 and NO2
Results
We jointly model the concentrations of PM2.5, O3 and NO2 as a function of Wind Speed, Temperature, Pressure and Humidity at quantile levels (0.50, 0.50, 0.50), (0.75, 0.75, 0.75) and (0.90, 0.90, 0.90). Considering the 75-th and 90-th percentiles puts emphasis on alert thresholds for ambient air pollution associated with high concentrations of chemicals. As a first step of the analysis, we fit the proposed QHSMM for a number of states K from 1 to 6 and a K-dimensional grid of maximum sojourn times. To select the optimal combination of K and the maximum sojourn times while avoiding the computational cost of a full grid search, we employ the greedy search algorithm described in Sect. 3.2 with 200 starting points. Table 8 reports the log-likelihood, AIC, BIC and ICL values for the fitted models at the investigated quantile levels. The AIC selects five states at all quantile levels, whereas the BIC and ICL are more aligned in the choice of K, as they identify 3, 5 and 4 states at the three quantile levels, respectively. This is not surprising since the AIC tends to overestimate the number of hidden states and, for this reason, we will not consider it hereafter. We can also see that the ICL values of the three- and five-state models are extremely similar at (0.75, 0.75, 0.75). Following these considerations and looking at Table 8, we select the best fitted models with K equal to 3, 3 and 4 at (0.50, 0.50, 0.50), (0.75, 0.75, 0.75) and (0.90, 0.90, 0.90), respectively.
Table 8.
Log-likelihood, AIC, BIC and ICL values for a varying number of hidden states. Bold font highlights the best values for the considered criteria (lower-is-better)
| K | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (0.50, 0.50, 0.50) | (0.75, 0.75, 0.75) | (0.90, 0.90, 0.90) | ||||||||||
| Loglik | AIC | BIC | ICL | Loglik | AIC | BIC | ICL | Loglik | AIC | BIC | ICL | |
| 1 | 16611.98 | 16712.73 | 16712.73 | 17149.85 | 17250.61 | 17250.61 | 18046.10 | 18146.86 | 18146.86 | |||
| 2 | 15986.87 | 16466.66 | 16521.09 | 16387.25 | 16867.04 | 16937.16 | 17040.86 | 17544.65 | 17624.07 | |||
| 3 | 15549.64 | 15952.36 | 16758.42 | 16801.01 | 16635.73 | 17441.79 | 17490.93 | |||||
| 4 | 15315.28 | 16385.22 | 16424.51 | 15756.27 | 16826.21 | 16908.00 | 16258.32 | |||||
| 5 | 16187.32 | 16240.48 | 17477.18 | 17638.80 | ||||||||
| 6 | 15099.02 | 16797.49 | 16854.63 | 15509.90 | 17208.37 | 17284.46 | 16191.63 | 17794.14 | 17916.22 | |||
Figures 4, 5 and 6 report the classification results according to the selected models at the investigated quantile levels. Each plot shows the data points colored according to the estimated posterior probability of class membership, with vertical lines separating blocks of four months. In our study, the latent components can be associated with specific exposure regimes characterized by seasonal weather conditions. Specifically, blue points (state 2) tend to cluster days in late autumn, winter and early spring, while green ones (state 3) are generally inferred during late spring and summer. At (0.90, 0.90, 0.90) (see Figure 6), the four-state QHSMM identifies a similar classification pattern, with violet points (state 4) occurring mainly from spring until the end of summer, but also sporadically during the rest of the year. The overall results reflect the seasonal variation in atmospheric pollutants, with high concentrations of PM2.5 and NO2 in winter and high O3 concentrations in summer and warm months. It is also worth noting that the minimum values of particulate matter and nitrogen dioxide and the maximum level of ozone were reached in state 1 (orange dots) during the period March-June 2020. These sudden changes may be due to the implementation of lockdown measures to contain the COVID-19 outbreak in Italy (Bassani et al. 2021; Putaud et al. 2021). Indeed, in the first weeks of March, significant PM2.5 and NO2 declines were observed, which led to O3 peaks resulting from reduced titration with nitrogen oxides. As restrictions were lifted in May-June, chemical concentrations settled again around the 2019 and early 2021 levels.
Fig. 4.

Time series classified according to their estimated posterior probability of class membership at (0.50, 0.50, 0.50). Vertical lines separate blocks of four months
Fig. 5.

Time series classified according to their estimated posterior probability of class membership at (0.75, 0.75, 0.75). Vertical lines separate blocks of four months
Fig. 6.

Time series classified according to their estimated posterior probability of class membership at (0.90, 0.90, 0.90). Vertical lines separate blocks of four months
The estimated transition probability matrices of the latent semi-Markov chain (see Table 9) confirm that pollutant concentrations alternate between good and poor air conditions. The off-diagonal elements show that the hidden process stays in state 3 (low levels of particulate matter and nitrogen dioxide, and moderate levels of ozone), and in state 4 in the case of (0.90, 0.90, 0.90), with temporary changes towards more severe PM2.5 and NO2 concentrations (state 2) or higher ozone episodes (state 1). Meanwhile, direct transitions between states 1 and 2 are very unlikely at all three quantile levels.
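To make the role of the zero diagonal concrete, the following sketch simulates a state sequence from a semi-Markov chain: the time spent in a state is drawn from a state-specific sojourn distribution on a finite support, and the transition matrix only governs which different state is visited next. The function name and the toy inputs are hypothetical and are not estimates from the application.

```python
import random


def simulate_semi_markov(n_obs, init_probs, trans_probs, sojourn_pmfs, seed=0):
    """Simulate a state sequence of length n_obs from a semi-Markov chain.

    trans_probs[j][k] is the probability of moving from state j to state k and
    has a zero diagonal; sojourn_pmfs[j][d - 1] is P(sojourn = d) in state j.
    """
    rng = random.Random(seed)
    n_states = len(init_probs)
    states = []
    state = rng.choices(range(n_states), weights=init_probs)[0]
    while len(states) < n_obs:
        pmf = sojourn_pmfs[state]
        duration = rng.choices(range(1, len(pmf) + 1), weights=pmf)[0]
        states.extend([state] * duration)
        # Zero diagonal: the next state always differs from the current one.
        state = rng.choices(range(n_states), weights=trans_probs[state])[0]
    return states[:n_obs]  # the last sojourn is truncated at n_obs


# Toy example: state 0 always lasts three periods and state 1 two periods.
path = simulate_semi_markov(10, [1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]],
                            [[0.0, 0.0, 1.0], [0.0, 1.0]])
```

A hidden Markov model is recovered as the special case in which every sojourn distribution is geometric.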
Table 9.
Estimated transition probabilities for different quantile levels
| states | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Panel A: (0.50, 0.50, 0.50) | ||||
| 1 | 0 | |||
| 2 | 0 | |||
| 3 | 0 | |||
| Panel B: (0.75, 0.75, 0.75) | ||||
| 1 | 0 | |||
| 2 | 0 | |||
| 3 | 0 | |||
| Panel C: (0.90, 0.90, 0.90) | ||||
| 1 | 0 | |||
| 2 | 0 | |||
| 3 | 0 | |||
| 4 | 0 | |||
Taking the effect of covariates into account, Table 10 shows the estimated state-specific regression parameters at each quantile level. Standard errors are computed via parametric bootstrap, as illustrated in Section 3, and point estimates are displayed in boldface when significant at the standard 5% level.
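The general logic of parametric bootstrap standard errors can be sketched as follows: data are repeatedly simulated from the fitted model, the model is refitted to each simulated sample, and the standard deviation of the refitted estimates is reported. The sketch below is generic and uses a toy Gaussian model as a stand-in for the QHSMM; all function names and the numbers of observations and resamples are illustrative assumptions.

```python
import random
from statistics import mean, pstdev, stdev


def parametric_bootstrap_se(theta_hat, simulate, fit, n_obs, n_boot, seed=0):
    """Standard errors as the spread of estimates refitted on simulated samples."""
    rng = random.Random(seed)
    replicates = [fit(simulate(theta_hat, n_obs, rng)) for _ in range(n_boot)]
    return [stdev([rep[j] for rep in replicates]) for j in range(len(theta_hat))]


# Toy stand-in for the fitted model: i.i.d. Gaussian data with theta = (mean, sd).
def simulate_gaussian(theta, n_obs, rng):
    return [rng.gauss(theta[0], theta[1]) for _ in range(n_obs)]


def fit_gaussian(sample):
    return (mean(sample), pstdev(sample))


se = parametric_bootstrap_se((0.0, 1.0), simulate_gaussian, fit_gaussian,
                             n_obs=400, n_boot=300)
```

In the toy example the bootstrap standard error of the mean should be close to the theoretical value of 1/sqrt(400) = 0.05; for the QHSMM, the simulate and fit steps would be replaced by simulation from the fitted model and a rerun of the EM algorithm.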
Table 10.
Regression parameters estimates. Point estimates are displayed in boldface when significant at the standard 5% level
| States | 1 | | | 2 | | | 3 | | | 4 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | PM2.5 | O3 | NO2 | PM2.5 | O3 | NO2 | PM2.5 | O3 | NO2 | PM2.5 | O3 | NO2 |
| Panel A: (0.50, 0.50, 0.50) | ||||||||||||
| Intercept | 149.664 | |||||||||||
| (28.810) | (58.965) | (20.840) | (130.720) | (157.563) | (101.024) | (20.776) | (55.920) | (17.086) | ||||
| Wind Speed | ||||||||||||
| (0.045) | (0.093) | (0.034) | (0.209) | (0.256) | (0.160) | (0.035) | (0.086) | (0.028) | ||||
| Temperature | 0.237 | |||||||||||
| (0.025) | (0.050) | (0.019) | (0.117) | (0.145) | (0.089) | (0.020) | (0.048) | (0.015) | ||||
| Pressure | 0.156 | |||||||||||
| (0.028) | (0.057) | (0.020) | (0.126) | (0.152) | (0.097) | (0.020) | (0.054) | (0.016) | ||||
| Humidity | ||||||||||||
| (0.012) | (0.023) | (0.008) | (0.053) | (0.066) | (0.041) | (0.009) | (0.022) | (0.007) | ||||
| Panel B: (0.75, 0.75, 0.75) | ||||||||||||
| Intercept | ||||||||||||
| (37.049) | (75.844) | (17.254) | (75.045) | (80.288) | (53.198) | (21.221) | (50.143) | (15.392) | ||||
| Wind Speed | ||||||||||||
| (0.059) | (0.115) | (0.027) | (0.124) | (0.126) | (0.085) | (0.033) | (0.079) | (0.024) | ||||
| Temperature | ||||||||||||
| (0.032) | (0.062) | (0.015) | (0.067) | (0.071) | (0.046) | (0.018) | (0.045) | (0.014) | ||||
| Pressure | ||||||||||||
| (0.036) | (0.073) | (0.017) | (0.072) | (0.077) | (0.051) | (0.020) | (0.048) | (0.015) | ||||
| Humidity | ||||||||||||
| (0.014) | (0.029) | (0.007) | (0.029) | (0.032) | (0.021) | (0.009) | (0.021) | (0.007) | ||||
| Panel C: (0.90, 0.90, 0.90) | ||||||||||||
| Intercept | 161.731 | 34.514 | ||||||||||
| (157.253) | (257.786) | (59.375) | (76.346) | (47.414) | (32.034) | (24.663) | (43.603) | (22.358) | (22.284) | (59.029) | (19.866) | |
| Wind Speed | ||||||||||||
| (0.260) | (0.415) | (0.151) | (0.121) | (0.074) | (0.053) | (0.039) | (0.066) | (0.035) | (0.034) | (0.091) | (0.031) | |
| Temperature | 0.178 | |||||||||||
| (0.137) | (0.235) | (0.053) | (0.069) | (0.041) | (0.029) | (0.022) | (0.039) | (0.019) | (0.019) | (0.049) | (0.017) | |
| Pressure | 0.170 | 0.084 | 0.009 | |||||||||
| (0.152) | (0.247) | (0.057) | (0.074) | (0.046) | (0.031) | (0.024) | (0.042) | (0.021) | (0.021) | (0.057) | (0.019) | |
| Humidity | ||||||||||||
| (0.064) | (0.107) | (0.025) | (0.033) | (0.019) | (0.013) | (0.010) | (0.018) | (0.009) | (0.009) | (0.024) | (0.008) | |
The estimated effects of the included covariates tend to be nonlinear, state-specific and generally more pronounced in the upper end of the distribution of the responses. States 1 and 3 yield similar estimates, as they are both associated with the best air conditions, especially when looking at the effects of pressure and humidity. This is in sharp contrast with the point estimates in state 2, which is characterized by hazardous air quality. Among the four factors, wind speed and temperature exert the strongest influence on the considered pollutants in all seasons. In particular, PM2.5 and NO2 are negatively associated with both variables, and this association strengthens during the coldest months of the year. Wind intensity, therefore, contributes considerably to the reduction of pollution, in particular in at-risk situations as identified by state 2. On the other hand, O3 concentration is positively associated with wind speed, whereas the effect of temperature is positive in late spring and hot weather but negative throughout the rest of the year. Humidity can also help to decrease ozone pollution, because the moisture in the air could enhance the condensation of water and slow down ozone production, and it also reduces PM2.5 and NO2. Further, atmospheric pressure is positively associated with the concentrations of PM2.5 and NO2 and negatively associated with that of O3.
We conclude the analysis by reporting the estimated state-dependent correlation matrices (see Table 11) in order to provide an indirect measure of tail dependence between the outcomes. Firstly, regardless of the state, the correlation coefficients between the air pollutants are generally significant, indicating that fitting univariate quantile regressions separately would be inappropriate in this case. Secondly, the correlation coefficients increase somewhat with the quantile level. Finally, the estimates depict a correlation structure that varies among the latent groups, as the correlation between PM2.5, O3 and NO2 in the wintertime is substantially higher than that in spring and summer.
Table 11.
Estimated state-dependent correlation matrices for different quantile levels. Point estimates are displayed in boldface when significant at the standard 5% level
| States | 1 | | | 2 | | | 3 | | | 4 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | PM2.5 | O3 | NO2 | PM2.5 | O3 | NO2 | PM2.5 | O3 | NO2 | PM2.5 | O3 | NO2 |
| Panel A: (0.50, 0.50, 0.50) | ||||||||||||
| PM | 1 | 1 | 1 | |||||||||
| O | 0.098 | 1 | 1 | 0.031 | 1 | |||||||
| (0.062) | (0.154) | (0.057) | ||||||||||
| NO | 1 | 1 | 1 | |||||||||
| (0.063) | (0.061) | (0.116) | (0.137) | (0.047) | (0.056) | |||||||
| Panel B: (0.75, 0.75, 0.75) | ||||||||||||
| PM | 1 | 1 | 1 | |||||||||
| O | 1 | 1 | 1 | |||||||||
| (0.077) | (0.095) | (0.056) | ||||||||||
| NO | 1 | 1 | 1 | |||||||||
| (0.084) | (0.082) | (0.061) | (0.097) | (0.050) | (0.061) | |||||||
| Panel C: (0.90, 0.90, 0.90) | ||||||||||||
| PM | 1 | 1 | 1 | 1 | ||||||||
| O | 0.151 | 1 | 1 | 0.133 | 1 | 1 | ||||||
| (0.193) | (0.121) | (0.076) | (0.094) | |||||||||
| NO | 1 | 1 | 0.078 | 1 | 1 | |||||||
| (0.182) | (0.203) | (0.077) | (0.190) | (0.077) | (0.095) | (0.089) | (0.114) | |||||
Conclusion
The study of pollution exposure is at the heart of policy attention for health and economic welfare analysis. Motivated by the need to develop sound policies for controlling emissions of air contaminants, this paper extends the joint quantile regression of Petrella and Raponi (2019) by introducing a hidden semi-Markov quantile regression for the analysis of multiple pollutant time series. The proposed model makes it possible to capture quantile-specific effects across the entire distribution of several outcomes in one step and to infer cluster-specific covariate effects at various quantile levels of interest. To avoid biased inference due to incorrect distributional assumptions about the SDs, we adopt the approach of Guédon (2003), where the latent sojourn densities are approximated by nonparametric discrete distributions and estimated directly from the data. In simulation exercises, the proposed approach shows promising results in reducing state misclassification rates with respect to HMMs and in identifying the correct number of hidden states. In the empirical application, we employ our methodology to jointly model daily PM2.5, O3 and NO2 concentrations recorded in Rieti (Italy) as a function of wind speed, temperature, pressure and humidity. We find that seasonal changes in air pollutant concentrations are greatly affected by meteorological conditions, whose effects are generally amplified at the top end of the distribution of the responses. Moreover, the latent regimes capture seasonal variations in air pollution that characterize low and hazardous contamination levels.
This work could be extended in several directions. In particular, the proposed method could be fruitfully applied to financial time series modeling. Indeed, financial returns often exhibit empirical characteristics such as skewness, leptokurtosis, heteroscedasticity and clustering behavior over time, which are heavily influenced by hidden variables (e.g., the state of the market) during tranquil and crisis periods (Maruotti et al. 2021). In this context, time dependence can be further included in the regression model, allowing the quantile to vary over time according to an autoregressive process (Engle and Manganelli 2004).
The approach introduced might also be extended to other settings and data structures. Firstly, our QHSMM can be generalized to multivariate longitudinal data, extending the recent quantile mixed HMM of Merlo et al. (2022) by using more flexible sojourn distributions and, possibly, allowing the semi-Markov chain parameters to depend on observable covariates. Secondly, spatial heterogeneity could be included in the modeling approach to account for spatio-temporal variations of air pollutants across different monitoring sites. Lastly, while in the application we focused on daily concentrations of three air pollutants, in high-dimensional settings with a larger number of response variables and/or hidden states, the introduced QHSMM can easily become over-parameterized. This often occurs because of the large number of unique parameters in the covariance matrices to be estimated, implying a loss of interpretability as well as numerically ill-conditioned estimators. In these cases, following Maruotti et al. (2017), we may consider a class of parsimonious HSMMs by imposing a factor decomposition on the state-specific covariance matrices. This modeling strategy would provide not only information about the dependence between pollutants, but also a clear interpretation of the latent association structure among them.
Supplementary Materials
The Supplementary Materials include additional simulations that are used to support the results in the manuscript when the number of analyzed response variables p increases.
Acknowledgements
We would like to warmly thank the Associate Editor and two anonymous reviewers for their thoughtful comments and efforts towards improving our manuscript. This work was partially supported by the Finance Market Fund, Norway, project number 309218; “Statistical modelling and inference for (high-dimensional) financial data”.
Appendix
Proof of Proposition 1
Firstly, we note that to ensure identifiability of the MAL density in (5) it suffices to apply Proposition 1 of Petrella and Raponi (2019). Secondly, for a general HSMM, identifiability has been proven up to label switching (Leroux 1992). Thus, to ensure identifiability, all one needs to prove is the identifiability of the marginal mixtures (Dannemann et al. 2014), which in our case are represented by the finite mixtures of MAL distributions. Based on the work of Holzmann et al. (2006), Browne and McNicholas (2015) prove identifiability of finite mixtures of multivariate generalized hyperbolic distributions. Since the MAL in (5) is a limiting case of the multivariate generalized hyperbolic distribution (see (3) and (4) in Browne and McNicholas (2015) with and ), model identifiability follows by applying Corollary 2 of Browne and McNicholas (2015).
Proof of Proposition 2
The E-step of the EM algorithm considers the conditional expectation of the complete log-likelihood function in (8) given the observed data and the current parameter estimates . At first, we recall that under the constraints imposed on and , the representation in (7) implies that:
| 35 |
This means that the joint density function of and is:
| 36 |
By substituting (36) in (8) and taking the conditional expectation of the logarithm of (8), we obtain the expected complete log-likelihood function in (13).
To compute the conditional expectation of and in (13), is treated as an additional latent variable. Using the joint distribution of and derived in (36) and the MAL density of given in (5), we have that:
| 37 |
which corresponds to a Generalized Inverse Gaussian (GIG) distribution (see the footnote), i.e.
| 38 |
Then, it follows that
| 39 |
and
| 40 |
Denoting the two conditional expectations in (39) and (40) by and respectively, concludes the proof.
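As a numerical illustration of this step, the moments of a GIG(p, a, b) random variable can be evaluated through ratios of modified Bessel functions of the second kind, using the standard identity E[X^r] = (b/a)^(r/2) K_{p+r}(sqrt(ab)) / K_p(sqrt(ab)); taking r = 1 and r = -1 gives the mean and the mean of the reciprocal. The sketch below assumes the GIG(p, a, b) parameterization of the footnote, with density proportional to x^(p-1) exp(-(ax + b/x)/2), and approximates the Bessel function by simple quadrature; the function names are hypothetical, and a dedicated numerical library would normally be preferred.

```python
from math import cosh, exp, sqrt


def bessel_k(order, z, steps=4000, upper=12.0):
    """Modified Bessel function of the second kind, K_order(z), for moderate z > 0.

    Uses K_p(z) = integral over t >= 0 of exp(-z * cosh(t)) * cosh(p * t),
    approximated with the trapezoid rule on [0, upper].
    """
    h = upper / steps
    total = 0.5 * exp(-z)  # integrand at t = 0; the value at t = upper is negligible
    for i in range(1, steps + 1):
        t = i * h
        total += exp(-z * cosh(t)) * cosh(order * t)
    return total * h


def gig_moment(p, a, b, power):
    """E[X**power] for X ~ GIG(p, a, b) under the assumed parameterization."""
    omega = sqrt(a * b)
    return (b / a) ** (power / 2.0) * bessel_k(p + power, omega) / bessel_k(p, omega)


# Mean and mean of the reciprocal for illustrative parameter values.
mean_x = gig_moment(0.5, 2.0, 3.0, 1)
mean_inv_x = gig_moment(0.5, 2.0, 3.0, -1)
```

For p = 1/2 the Bessel functions have closed forms, which gives a convenient check of the quadrature: the two values above should equal sqrt(3/2) (1 + 1/sqrt(6)) and sqrt(2/3), respectively.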
Proof of Proposition 3
Imposing the first order conditions on (13) with respect to each component of the parameter set gives the update estimates in (16), (17), (18) and (19). However, there is no closed-form solution to update the elements of the scale matrix; hence, the M-step update would require numerical optimization techniques to maximize (13). A considerable disadvantage of this procedure is its high computational cost. For this reason, we use a simpler estimator for the scale parameters, which follows directly from the fact that all marginals of the MAL distribution are univariate AL distributions (see Yu and Zhang 2005):
| 41 |
where is the k-th element of the vector .
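To illustrate the idea behind a marginal scale update of this kind, recall that the maximum likelihood estimator of the scale of a univariate AL distribution with known location is the average check loss of the residuals. The sketch below assumes a weighted version of that estimator, in which the weights stand in for the posterior state membership probabilities produced by the E-step; the function names and the weighting scheme are illustrative assumptions, not a transcription of (41).

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def al_scale_update(residuals, weights, tau):
    """Weighted average of check losses: a sketch of a closed-form scale
    update for one marginal in one state (weights are hypothetical)."""
    numerator = sum(w * check_loss(r, tau) for r, w in zip(residuals, weights))
    denominator = sum(weights)
    return numerator / denominator


if __name__ == "__main__":
    # Two residuals with equal weights at quantile level 0.25.
    print(al_scale_update([1.0, -1.0], [1.0, 1.0], tau=0.25))
```

Because the update is a simple weighted average, it avoids the numerical optimization that a direct maximization over the scale matrix would require.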
Footnotes
The pdf of a GIG(p, a, b) distribution is defined as , with , and .
References
- Adam T, Langrock R, Weiß CH. Penalized estimation of flexible hidden Markov models for time series of counts. Metron. 2019;77(2):87–104. doi: 10.1007/s40300-019-00153-6.
- Akaike, H.: Information theory and an extension of the maximum likelihood principle, in ‘Selected papers of Hirotugu Akaike’, Springer, pp. 199–213 (1998)
- Barbu, V. S. and Limnios, N.: Semi-Markov chains and hidden semi-Markov models toward applications: their use in reliability and DNA analysis, Vol. 191, Springer Science & Business Media (2009)
- Bartolucci, F., Farcomeni, A. and Pennoni, F.: Latent Markov models for longitudinal data, CRC Press (2012)
- Bassani C, Vichi F, Esposito G, Montagnoli M, Giusto M, Ianniello A. Nitrogen dioxide reductions from satellite and surface observations during COVID-19 mitigation in Rome (Italy). Environmental Science and Pollution Research. 2021;28(18):22981–23004. doi: 10.1007/s11356-020-12141-9.
- Bernardi M, Gayraud G, Petrella L, et al. Bayesian tail risk interdependence using quantile regression. Bayesian Analysis. 2015;10(3):553–603. doi: 10.1214/14-BA911.
- Biernacki C, Celeux G, Govaert G. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2000;22(7):719–725. doi: 10.1109/34.865189.
- Browne RP, McNicholas PD. A mixture of generalized hyperbolic distributions. Canadian Journal of Statistics. 2015;43(2):176–198. doi: 10.1002/cjs.11246.
- Bulla J, Bulla I. Stylized facts of financial time series and hidden semi-Markov models. Computational Statistics & Data Analysis. 2006;51(4):2192–2209. doi: 10.1016/j.csda.2006.07.021.
- Bulla J, Bulla I, Nenadić O. hsmm-an R package for analyzing hidden semi-Markov models. Computational Statistics & Data Analysis. 2010;54(3):611–619. doi: 10.1016/j.csda.2008.08.025.
- Cappé, O., Moulines, E. and Rydén, T.: Inference in hidden Markov models, Springer Science & Business Media (2006)
- Charlier I, Paindaveine D, Saracco J. Multiple-output quantile regression through optimal quantization. Scandinavian Journal of Statistics. 2020;47(1):250–278. doi: 10.1111/sjos.12426.
- Chavas J-P. On multivariate quantile regression analysis. Statistical Methods & Applications. 2018;27(3):365–384. doi: 10.1007/s10260-017-0407-x.
- Dannemann J, Holzmann H, Leister A. Semiparametric hidden Markov models: identifiability and estimation. Wiley Interdisciplinary Reviews: Computational Statistics. 2014;6(6):418–425. doi: 10.1002/wics.1326.
- Dempster, A. P., Laird, N. M. and Rubin, D. B.: ‘Maximum likelihood from incomplete data via the EM algorithm’, Journal of the Royal Statistical Society Series B (Methodological) pp. 1–38 (1977)
- El Ghouch A, Genton MG. Local polynomial quantile regression with parametric features. J. Am. Stat. Assoc. 2009;104(488):1416–1429. doi: 10.1198/jasa.2009.tm08400.
- Engle RF, Manganelli S. CAViaR: conditional autoregressive value at risk by regression quantiles. J. Bus. Econ. Stat. 2004;22(4):367–381. doi: 10.1198/073500104000000370.
- Ephraim Y, Merhav N. Hidden Markov processes. IEEE Trans. Inf. Theor. 2002;48(6):1518–1569. doi: 10.1109/TIT.2002.1003838.
- Farcomeni A. Quantile regression for longitudinal data based on latent Markov subject-specific parameters. Stat. Comput. 2012;22(1):141–152. doi: 10.1007/s11222-010-9213-0.
- Geraci M. Modelling and estimation of nonlinear quantile regression with clustered data. Comput. Stat. Data Anal. 2019;136:30–46. doi: 10.1016/j.csda.2018.12.005.
- Guédon Y. Estimating hidden semi-Markov chains from discrete sequences. J. Comput. Graph. Stat. 2003;12(3):604–639. doi: 10.1198/1061860032030.
- Hamilton, J. D.: ‘A new approach to the economic analysis of nonstationary time series and the business cycle’, Econometrica: Journal of the Econometric Society pp. 357–384 (1989)
- Holzmann H, Munk A, Gneiting T. Identifiability of finite mixtures of elliptical distributions. Scandinav. J. Stat. 2006;33(4):753–763. doi: 10.1111/j.1467-9469.2006.00505.x.
- Hubert L, Arabie P. Comparing partitions. J. Classif. 1985;2(1):193–218. doi: 10.1007/BF01908075.
- Koenker, R.: Quantile regression. Cambridge University Press (2005)
- Koenker, R. and Bassett, G.: ‘Regression Quantiles’, Econometrica: Journal of the Econometric Society 46(1), 33–50 (1978)
- Koenker, R., Chernozhukov, V., He, X. and Peng, L.: Handbook of quantile regression, CRC Press (2017)
- Kong, L. and Mizera, I.: ‘Quantile tomography: using quantiles with multivariate data’, Statistica Sinica pp. 1589–1610 (2012)
- Kotz, S., Kozubowski, T. and Podgorski, K.: The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance, Springer Science & Business Media (2012)
- Langrock R, Kneib T, Sohn A, DeRuiter SL. Nonparametric inference in hidden Markov models using P-splines. Biometrics. 2015;71(2):520–528. doi: 10.1111/biom.12282.
- Langrock R, Zucchini W. Hidden Markov models with arbitrary state dwell-time distributions. Comput. Stat. Data Anal. 2011;55(1):715–724. doi: 10.1016/j.csda.2010.06.015.
- Leroux BG. Maximum-likelihood estimation for hidden Markov models. Stochastic Process. Appl. 1992;40(1):127–143. doi: 10.1016/0304-4149(92)90141-C.
- Levinson SE, Rabiner LR, Sondhi MM. An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell Syst. Tech. J. 1983;62(4):1035–1074. doi: 10.1002/j.1538-7305.1983.tb03114.x.
- Luo Y, Lian H, Tian M. Bayesian quantile regression for longitudinal data models. J. Stat. Comput. Simul. 2012;82(11):1635–1649. doi: 10.1080/00949655.2011.590488.
- MacDonald, I. L. and Zucchini, W.: Hidden Markov and other models for discrete-valued time series, Vol. 110, CRC Press (1997)
- Marino MF, Tzavidis N, Alfò M. Mixed hidden Markov quantile regression models for longitudinal data with possibly incomplete sequences. Stat. Methods Med. Res. 2018;27(7):2231–2246. doi: 10.1177/0962280216678433.
- Maruotti A. Mixed hidden Markov models for longitudinal data: an overview. Int. Stat. Rev. 2011;79(3):427–454. doi: 10.1111/j.1751-5823.2011.00160.x.
- Maruotti, A., Bulla, J., Lagona, F., Picone, M. and Martella, F.: ‘Dynamic mixtures of factor analyzers to characterize multivariate air pollutant exposures’, The Annals of Applied Statistics pp. 1617–1648 (2017)
- Maruotti A, Petrella L, Sposito L. Hidden semi-Markov-switching quantile regression for time series. Comput. Stat. Data Anal. 2021;159:107208. doi: 10.1016/j.csda.2021.107208.
- Maruotti A, Punzo A. Initialization of hidden Markov and semi-Markov models: a critical evaluation of several strategies. Int. Stat. Rev. 2021;89(3):447–480. doi: 10.1111/insr.12436.
- Maruotti A, Punzo A, Bagnato L. Hidden Markov and semi-Markov models with multivariate leptokurtic-normal components for robust modeling of daily returns series. J. Financ. Econom. 2019;17(1):91–117.
- Merlo L, Petrella L, Raponi V. Forecasting VaR and ES using a joint quantile regression and its implications in portfolio allocation. J. Bank. Finan. 2021;133:106248. doi: 10.1016/j.jbankfin.2021.106248.
- Merlo L, Petrella L, Salvati N, Tzavidis N. Marginal M-quantile regression for multivariate dependent data. Comput. Stat. Data Anal. 2022;173:107500. doi: 10.1016/j.csda.2022.107500.
- Merlo L, Petrella L, Tzavidis N. Quantile mixed hidden Markov models for multivariate longitudinal data: an application to children’s Strengths and Difficulties Questionnaire scores. J. R. Stat. Soc. Ser. C Appl. Stat. 2022;71(2):417–448. doi: 10.1111/rssc.12539.
- O’Connell J, Højsgaard S, et al. Hidden semi Markov models for multiple observation sequences: the mhsmm package for R. J. Stat. Softw. 2011;39(4):1–22.
- Petrella L, Raponi V. Joint estimation of conditional quantiles in multivariate linear regression models with an application to financial distress. J. Multivar. Anal. 2019;173:70–84. doi: 10.1016/j.jmva.2019.02.008.
- Pohle J, Adam T, Beumer LT. Flexible estimation of the state dwell-time distribution in hidden semi-Markov models. Comput. Stat. Data Anal. 2022;172:107479. doi: 10.1016/j.csda.2022.107479.
- Pohle J, Langrock R, van Beest FM, Schmidt NM. Selecting the number of states in hidden Markov models: pragmatic solutions illustrated using animal movement. J. Agric. Biol. Environ. Stat. 2017;22(3):270–293. doi: 10.1007/s13253-017-0283-8.
- Putaud J-P, Pozzoli L, Pisoni E, Martins Dos Santos S, Lagler F, Lanzani G, Dal Santo U, Colette A. Impacts of the COVID-19 lockdown on air pollution at regional and urban background sites in northern Italy. Atmos. Chem. Phys. 2021;21(10):7597–7609. doi: 10.5194/acp-21-7597-2021.
- Sansom, J. and Thomson, P.: ‘Fitting hidden semi-Markov models to breakpoint rainfall data’, Journal of Applied Probability 38(A), 142–157 (2001)
- Schwarz, G., et al.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
- Serfling R. Quantile functions for multivariate analysis: approaches and applications. Statist. Neerlandica. 2002;56(2):214–232. doi: 10.1111/1467-9574.00195.
- Stolfi, P., Bernardi, M. and Petrella, L.: ‘The sparse method of simulated quantiles: An application to portfolio optimization’, Statistica Neerlandica (2018)
- Visser I, Raijmakers ME, Molenaar PC. Confidence intervals for hidden Markov model parameters. Br. J. Math. Statist. Psychol. 2000;53(2):317–327. doi: 10.1348/000711000159240.
- Ye, W., Zhu, Y., Wu, Y. and Miao, B.: ‘Markov regime-switching quantile regression models and financial contagion detection’, Insurance: Mathematics and Economics 67, 21–26 (2016)
- Yu K, Moyeed RA. Bayesian quantile regression. Stat. Probab. Lett. 2001;54(4):437–447. doi: 10.1016/S0167-7152(01)00124-9.
- Yu K, Zhang J. A three-parameter asymmetric Laplace distribution and its extension. Commun. Statist. Theory Methods. 2005;34(9–10):1867–1879. doi: 10.1080/03610920500199018.
- Yu, S.-Z.: Hidden Semi-Markov models: theory, algorithms and applications. Morgan Kaufmann (2015)
- Zucchini, W., MacDonald, I. L. and Langrock, R.: Hidden Markov models for time series: an introduction using R, Chapman and Hall/CRC (2016)