Abstract
In this paper we focus on partially linear regression models with long memory errors, and propose a wavelet-based Bayesian procedure that allows the simultaneous estimation of the model parameters and the nonparametric part of the model. Employing discrete wavelet transforms is crucial in order to simplify the dense variance-covariance matrix of the long memory error. We achieve a fully Bayesian inference by adopting a Metropolis algorithm within a Gibbs sampler. We evaluate the performances of the proposed method on simulated data. In addition, we present an application to Northern hemisphere temperature data, a benchmark in the long memory literature.
Keywords: Bayesian inference, long memory, MCMC, partially linear regression model, wavelet transforms
1. Introduction
Partially linear regression (PLR) models are semiparametric models, since they contain both a parametric linear trend and a nonparametric component. These models are useful in situations where the response variable is linearly related to some of the covariates and, at the same time, depends on other covariates in a nonlinear way. PLR models are also quite flexible, since they include as special cases both the linear regression model (without the nonparametric component) and the usual nonparametric regression model (without the trend parameters). They have been widely adopted in the literature, especially in economics, finance, and biology. Engle, Granger, Rice and Weiss (1986) first analyzed the relationship between temperature and electricity sales using these models. Lenk (1999) analyzed traffic accident data by representing the nonparametric component of the model via a Fourier series and adopting a hierarchical prior on the Fourier coefficients. Koop and Poirier (2004) assumed a normal prior on the nonparametric components and standard noninformative priors on the trend parameter and the error variance. They also extended their methods to partially linear probit models. Most of the existing contributions on PLR models deal with independent and identically distributed (i.i.d.) errors, while very few address correlated errors, and long memory errors in particular; see, for example, Germán, Wenceslao and Philippe (2004) and Beran and Ghosh (1998).
Some contributions exist in wavelet-based methods for nonparametric estimation of PLR models. Qu (2003) and Chang and Qu (2004) exploited the ability of wavelets to adapt to the unknown smoothness of a function by applying wavelet transforms to the data. The authors used an l1-penalized least square criterion for model estimation. Fadili and Bullmore (2005) studied cases where the nonparametric components can be parsimoniously estimated by choosing an appropriate penalty function. Qu (2006) proposed a partially Bayesian estimation procedure in the wavelet domain. All these contributions are restricted to the case of PLR models with i.i.d. normal errors.
In this paper we propose a wavelet-based Bayesian estimation procedure of the model parameters and the nonparametric function of a PLR model with long memory errors. Wavelets have a strong connection to long memory processes and have proven to be a powerful tool for the analysis and synthesis of data from such processes. The ability of wavelets to localize a process simultaneously in the time and scale domains results in representing many dense matrices in a sparse form. When transforming measurements from a long memory process, wavelet coefficients are approximately uncorrelated, in contrast with the dense long memory covariance structure of the data, see Tewfik and Kim (1992), Craigmile and Percival (2005), and Ko and Vannucci (2006), among others. Here we take advantage of this whitening property and use discrete wavelet transforms in order to simplify the variance-covariance structure of the response variable by writing the likelihood function with a diagonalized variance-covariance matrix. This in turn leads to a minimal computational burden in the estimation of the model parameters. We perform posterior estimation via Markov chain Monte Carlo (MCMC) methods and assess performances on simulated data and on the benchmark Northern hemisphere temperature data set.
The remainder of this paper is organized as follows. In Section 2 we introduce the model and the necessary basic concepts on long memory processes and on discrete wavelet transforms. We focus in particular on autoregressive fractionally integrated moving average (ARFIMA) errors. In Section 3 we describe the transformed model in the wavelet domain, and illustrate prior and posterior models and the MCMC procedure for the estimation of the parameters and the unknown nonparametric function. In Section 4 we report results from simulations and from the application to the Northern hemisphere temperature data. Some concluding remarks are given in Section 5.
2. The Model
Consider the partially linear regression model
$$y = X\beta + f(t) + \varepsilon, \tag{2.1}$$
where y is the (N × 1) vector of response data, X = [x1, …, xl] is the (N × l) design matrix consisting of (N × 1) covariate vectors xi, i = 1, …, l, β is the (l × 1) regression coefficient vector, and t = (t1, …, tN)ᵀ is the (N × 1) vector of equally spaced sample points. We assume ε to be an (N × 1) zero-mean Gaussian autoregressive fractionally integrated moving average (ARFIMA) error with long memory parameter d ∈ (0, 0.5) and innovation variance σ². Our aim is to estimate the model parameters (β, ϕ1, …, ϕp, d, θ1, …, θq, σ²), where the ϕ's and θ's are the autoregressive and moving average (ARMA) parameters, and the unknown function f(t) in (2.1).
2.1. Long memory errors
A long memory process is characterized by a slow decay in its autocovariance, that is, $\gamma(h) \sim C h^{-\alpha}$ for large h, where C is a positive constant depending on the process and 0 < α < 1. ARFIMA(p, d, q) processes $\{X_t\}$, first introduced by Granger and Joyeux (1980) and Hosking (1981), are defined as the stationary solution of the equation
$$\Phi(B)(1 - B)^d X_t = \Theta(B)\,\varepsilon_t,$$
with B the backshift operator, $B X_t = X_{t-1}$, $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$, $\Theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$, and $\varepsilon_t$ a Gaussian white noise with zero mean and innovation variance $\sigma^2$. Applying the fractional d-differencing operator $(1 - B)^d$ to $X_t$ results in an ARMA(p, q) model.
ARFIMA(p, d, q) processes are stationary and invertible for −0.5 < d < 0.5, provided all roots of the polynomials Φ(·) and Θ(·) lie outside the unit circle. The case 0 < d < 0.5 is characterized by long-range dependence between distant observations, and the autocorrelations decay hyperbolically to zero as the lag increases. For d = 0 the process reduces to a Box-Jenkins ARMA(p, q) model. For −0.5 < d < 0 the process is said to have intermediate memory and has a summable autocorrelation function. A simple but important class of ARFIMA(p, d, q) processes is the fractionally integrated noise, or ARFIMA(0, d, 0), process, $(1 - B)^d X_t = \varepsilon_t$.
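To make the fractional differencing operator concrete, the following minimal sketch expands (1 − B)^d as a power series in B and applies the truncated filter to a finite series. The function names `frac_diff_weights` and `frac_diff` are illustrative choices, not part of the paper, and truncating the filter at the start of the sample is an assumption of this sketch.

```python
def frac_diff_weights(d, n_terms):
    """Coefficients pi_k in (1 - B)^d = sum_k pi_k B^k, for k = 0..n_terms-1."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Binomial-series recursion: pi_k = pi_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(x, d):
    """Apply the filter (1 - B)^d to x, truncated at the start of the sample."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]


if __name__ == "__main__":
    # For d = 1 the filter reduces to ordinary first differencing.
    print(frac_diff_weights(1.0, 4))
    # For 0 < d < 0.5 the weights decay slowly, which reflects long memory.
    print(frac_diff_weights(0.4, 6))
```

For d = 1 the weights collapse to those of the ordinary first difference, while for fractional d they decay hyperbolically, so every past observation keeps a small but non-negligible weight.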
Sowell (1992) explicitly derives the autocovariance function γ(h) of ARFIMA processes, and Doornik and Ooms (2003) express it in the numerically stable form
for h = 1, …, N − 1, where , ρ1, …, ρp are the p roots of the AR polynomial Φ,
(a)i is Pochhammer's symbol defined as (a)i = Γ(a + i)/Γ (a), and
with . The form of the autocovariance function for specific processes can be derived from the general formulation. For example if is an ARFIMA(0, d, q) series, the autocovariance function reduces to
and in the special case q = 1 we have
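$$\gamma(h) = \sigma^2\,\frac{\Gamma(1 - 2d)\,\Gamma(h + d)}{\Gamma(d)\,\Gamma(1 - d)\,\Gamma(h + 1 - d)} \left[ 1 + \theta_1^2 + \theta_1 \left( \frac{h - d}{h - 1 + d} + \frac{h + d}{h + 1 - d} \right) \right]. \tag{2.2}$$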
Also, for ARFIMA(0, d, 0), the autocovariance function is
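$$\gamma(h) = \sigma^2\,\frac{\Gamma(1 - 2d)\,\Gamma(h + d)}{\Gamma(d)\,\Gamma(1 - d)\,\Gamma(h + 1 - d)}. \tag{2.3}$$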
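The ARFIMA(0, d, 0) and ARFIMA(0, d, 1) autocovariances can be evaluated numerically as in the sketch below. It assumes the standard closed form γ(0) = σ²Γ(1 − 2d)/Γ(1 − d)² for fractionally integrated noise, uses the ratio γ(h)/γ(h − 1) = (h − 1 + d)/(h − d) to avoid large gamma-function arguments, and obtains the q = 1 case by passing the fractional noise through the filter 1 + θB. The function names are hypothetical.

```python
import math


def acvf_fi(d, sigma2, max_lag):
    """Autocovariances gamma(0), ..., gamma(max_lag) of ARFIMA(0, d, 0)."""
    # gamma(0) = sigma2 * Gamma(1 - 2d) / Gamma(1 - d)^2, computed on the log scale
    gamma0 = sigma2 * math.exp(math.lgamma(1 - 2 * d) - 2 * math.lgamma(1 - d))
    acvf = [gamma0]
    for h in range(1, max_lag + 1):
        # gamma(h) / gamma(h - 1) = (h - 1 + d) / (h - d)
        acvf.append(acvf[-1] * (h - 1 + d) / (h - d))
    return acvf


def acvf_fi_ma1(d, theta, sigma2, max_lag):
    """Autocovariances of ARFIMA(0, d, 1) with MA polynomial 1 + theta * B."""
    g = acvf_fi(d, sigma2, max_lag + 1)
    out = []
    for h in range(max_lag + 1):
        g_prev = g[abs(h - 1)]  # gamma(-1) = gamma(1) by symmetry
        out.append((1 + theta ** 2) * g[h] + theta * (g_prev + g[h + 1]))
    return out


if __name__ == "__main__":
    g = acvf_fi(0.2, 1.0, 5)
    print([round(v / g[0], 4) for v in g])        # autocorrelations rho(0..5)
    print(acvf_fi_ma1(0.2, 0.3, 1.0, 3))
```

The lag-one autocorrelation of the fractional noise equals d/(1 − d), which gives a quick sanity check on the recursion.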
2.2. Discrete wavelet transforms
Suppose we observe a time series, Y = (y1, …, yN), as a realization of a random process. A discrete wavelet transform (DWT), see Mallat (1989), can be used to transform the data Y into a set of wavelet coefficients. Although it operates via recursive applications of filters, for practical purposes a DWT of order g is often represented in matrix form as ω = WY, with W an N × N orthogonal matrix of the form $W = [W_1^T, W_2^T, \ldots, W_g^T, V_g^T]^T$ that decomposes the data into sets of coefficients $\omega = [\omega_1^T, \ldots, \omega_g^T, y_g^T]^T$, with $\omega_j = W_j Y$ of dimension $N(j) = N/2^j$, j = 1, …, g, and $y_g = V_g Y$ of dimension $N/2^g$, such that $N = N' + N/2^g$ where $N' = \sum_{j=1}^{g} N(j)$. Coefficients $y_g$ are scaling coefficients representing a coarser approximation of the data, while coefficients $\omega_1, \ldots, \omega_g$ are wavelet coefficients representing local features of the data at different scales (or resolution levels). An inverse transformation exists to reconstruct the data from its wavelet decomposition.
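As an illustration of the recursive filtering and of the orthogonality of the transform, here is a minimal sketch of a DWT of order g using the Haar filter, the simplest case. The Haar choice and the function names are assumptions made for brevity; this is not the Daubechies filter bank used later in the paper.

```python
import math


def haar_dwt(y, levels):
    """Return [w_1, ..., w_g, v_g]: wavelet coefficients per level (finest
    first), followed by the scaling coefficients at the coarsest level."""
    if len(y) % (2 ** levels) != 0:
        raise ValueError("len(y) must be divisible by 2**levels")
    approx, out, s = list(y), [], math.sqrt(2.0)
    for _ in range(levels):
        half = len(approx) // 2
        detail = [(approx[2 * k + 1] - approx[2 * k]) / s for k in range(half)]
        approx = [(approx[2 * k + 1] + approx[2 * k]) / s for k in range(half)]
        out.append(detail)
    out.append(approx)
    return out


def haar_idwt(coeffs):
    """Invert haar_dwt by undoing one level at a time, coarsest first."""
    approx, s = list(coeffs[-1]), math.sqrt(2.0)
    for detail in reversed(coeffs[:-1]):
        rebuilt = []
        for a, w in zip(approx, detail):
            rebuilt.append((a - w) / s)
            rebuilt.append((a + w) / s)
        approx = rebuilt
    return approx


if __name__ == "__main__":
    y = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    coeffs = haar_dwt(y, 3)
    print([len(c) for c in coeffs])   # N/2, N/4, N/8 wavelet and N/8 scaling
    print(haar_idwt(coeffs))
```

Because the transform is orthogonal, the sum of squared coefficients equals the sum of squared data values, and the inverse transform recovers the series up to rounding error.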
Nonparametric wavelet estimators have now been extensively used in the statistical literature. In regression models, the majority of the contributions in the literature have focused on the case of equally spaced data, following the seminal work of Donoho and Johnstone (1994, 1995). Several papers have been published since then, on modelling issues and extensions, using both classical and Bayesian methods. Rather than give a partial list of references, we refer readers to the paper of Antoniadis, Bigot and Sapatinas (2001) that presents an exhaustive review.
3. Bayesian Modelling in the Wavelet Domain
Our aim is to estimate the model parameters (β, Ψ, σ²), where Ψ = (ϕ, d, θ), ϕ = (ϕ1, …, ϕp), and θ = (θ1, …, θq), and the unknown function f(t) in model (2.1). For simplicity let us assume that $N = 2^J$. This is not a real restriction: methods exist that allow wavelet transforms to be applied to data of any length (Taswell and McGill (1994)).
After applying a column-wise discrete wavelet transform W on both sides of the model, this can be expressed in the wavelet domain as
$$\omega = U\beta + \vartheta + W\varepsilon, \tag{3.1}$$
where $\omega = Wy = [\omega_{jk}]_{N \times 1}$, $U = W[x_1, \ldots, x_l] = [u_1, \ldots, u_l]$ with $u_i = [u_{ijk}]_{N \times 1}$, $\vartheta = W f(t) = [\vartheta_{jk}]_{N \times 1}$, and $W\varepsilon$ is the transformed error, for i = 1, …, l, j = 1, …, J − 1, k = 1, …, $N/2^j$. As for the indexed terms, $\omega_{jk}$, $u_{ijk}$, $\vartheta_{jk}$ and the corresponding element of $W\varepsilon$ denote the kth wavelet coefficient at the jth scale (or resolution level) of the DWT of the response data y, the covariate $x_i$, the nonparametric component f(t) and ε, respectively. Here $W\varepsilon$ is treated as zero-mean Gaussian with an (N × N) diagonal variance-covariance matrix whose elements are the variances of the kth wavelet coefficient at the jth scale. Exact variances of wavelet and scaling coefficients can be computed as in Ko and Vannucci (2006) by writing $\Sigma_\varepsilon(i, j) = [\gamma(|i - j|)]$, with γ(h) the autocovariance function of an ARFIMA process, and then computing the variance-covariance matrix of the transformed error as $W \Sigma_\varepsilon W^T$. Vannucci and Corradi (1999) have proposed a recursive way of computing variances and covariances of wavelet coefficients by using the recursive filters of the DWT. Their algorithm has an interesting link to the two-dimensional discrete wavelet transform (DWT2) that makes computations simple. In the context of this paper, the variance-covariance matrix of the wavelet coefficients can be computed by first applying the DWT2 to the matrix $\Sigma_\varepsilon$. The diagonal blocks of the resulting matrix will provide the within-scale variances and covariances at the different levels. One can then apply the one-dimensional DWT to the rows of the off-diagonal blocks to obtain the across-scale variances and covariances.
Many authors have shown how wavelet transforms, being band-pass filters, balance the divergence of the spectrum of long memory data at frequencies close to zero, and therefore “whiten” the data, i.e., the wavelet coefficients tend to be less correlated than the original data, see Tewfik and Kim (1992), Craigmile and Percival (2005), and Ko and Vannucci (2006), among others.
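The whitening effect can be checked numerically on a small example. The sketch below builds the covariance matrix of ARFIMA(0, d, 0) noise, computes the covariance of its wavelet coefficients by transforming every column and then every row, and compares the average absolute off-diagonal correlation before and after the transform. It uses a full Haar decomposition for simplicity, which is an assumption of this sketch rather than the filter or the recursive algorithm used in the paper, and all function names are hypothetical.

```python
import math


def haar_vec(y):
    """Full Haar DWT of a list of length 2**J: [w_1 (finest), ..., w_J, v_J]."""
    approx, out, s = list(y), [], math.sqrt(2.0)
    while len(approx) > 1:
        half = len(approx) // 2
        out.extend([(approx[2 * k + 1] - approx[2 * k]) / s for k in range(half)])
        approx = [(approx[2 * k + 1] + approx[2 * k]) / s for k in range(half)]
    return out + approx


def fi_autocov(d, n, sigma2=1.0):
    """Autocovariances gamma(0..n-1) of ARFIMA(0, d, 0) noise."""
    g = [sigma2 * math.exp(math.lgamma(1 - 2 * d) - 2 * math.lgamma(1 - d))]
    for h in range(1, n):
        g.append(g[-1] * (h - 1 + d) / (h - d))
    return g


def wavelet_cov(sigma):
    """W Sigma W' obtained by transforming each column, then each row."""
    n = len(sigma)
    cols = [haar_vec([sigma[i][j] for i in range(n)]) for j in range(n)]
    return [haar_vec([cols[j][i] for j in range(n)]) for i in range(n)]


def mean_abs_offdiag_corr(c):
    """Average absolute correlation over all off-diagonal entries."""
    n = len(c)
    total = sum(abs(c[i][j]) / math.sqrt(c[i][i] * c[j][j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


if __name__ == "__main__":
    n, d = 32, 0.4
    g = fi_autocov(d, n)
    sigma_eps = [[g[abs(i - j)] for j in range(n)] for i in range(n)]
    sigma_wav = wavelet_cov(sigma_eps)
    print("time domain   :", mean_abs_offdiag_corr(sigma_eps))
    print("wavelet domain:", mean_abs_offdiag_corr(sigma_wav))
```

The diagonal of the transformed matrix gives the exact coefficient variances that a diagonal approximation to the likelihood would retain, while the remaining off-diagonal entries indicate how much correlation that approximation ignores.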
3.1. Prior model
For Bayesian inference we need to specify a prior distribution for each unknown model parameter. We use noninformative priors on β and σ², i.e.,
$$\pi(\beta) \propto 1, \qquad \pi(\sigma^2) \propto \frac{1}{\sigma^2}.$$
For the prior distribution of d, which dictates the long range dependent behavior of the model, we use a beta distribution of the type
As for the priors of the ϕ's and θ's, we use uniform distributions in (−1, 1) to satisfy the causality and invertibility of the ARMA processes.
In the literature on Bayesian methods for wavelet-based nonparametric regression models, a commonly adopted prior distribution for the wavelet coefficients ϑjk of the nonparametric function is a mixture of two distributions. We follow Clyde, Parmigiani and Vidakovic (1998) and Abramovich, Sapatinas and Silverman (1998) and use mixture distributions of a zero-mean normal and a degenerate distribution at 0 of the type
$$\vartheta_{jk} \mid \gamma_{jk}, \tau_j^2 \sim \gamma_{jk}\, N(0, \tau_j^2) + (1 - \gamma_{jk})\, \delta(0),$$
where γjk ~ Bernoulli(pj), 0 ≤ pj ≤ 1 and δ(0) is a point mass at 0. The $N(0, \tau_j^2)$ component corresponds to 'non-negligible' wavelet coefficients and the δ(0) component to 'negligible' coefficients. The hyperparameter pj represents the proportion of the 'non-negligible' wavelet coefficients at scale j, and τj is a measure of the spread of their magnitudes. Here pj and $\tau_j^2$ are assumed to be constant for a given resolution level j. These hyperparameters play a very important role in the estimation of the nonparametric function f(t) and should be chosen appropriately. Following Abramovich, Sapatinas and Silverman (1998), we use
| (3.2) |
The estimation of Cp and Cτ will be discussed in Section 3.3.
Assuming independence among β, σ², Ψ, ϑ, and γ, the joint prior distribution can be written as
where Σγτ is the (N × N) diagonal matrix such that the kth element in the jth scale is $\gamma_{jk}\tau_j^2$.
3.2. Posterior inference
The posterior distribution of Θ = (β, σ², Ψ, ϑ, γ) given (ω, U) is
$$\pi(\Theta \mid \omega, U) \propto L(\Theta \mid \omega, U)\, \pi(\Theta),$$
where L(Θ|ω, U) is the likelihood function of ω and π(Θ) is the joint prior distribution. Here we use an MCMC method to generate samples from this posterior distribution. The details of the full conditionals are given in the Appendix. Clyde, Parmigiani and Vidakovic (1998) consider three posterior inferential methods (two analytic approximation methods and an importance sampling method) together with an MCMC method, and show via simulations that the MCMC-based posterior approach performs well.
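A Metropolis update embedded in a Gibbs sampler can be sketched generically as below. This is a plain random-walk Metropolis step for a scalar parameter restricted to an interval, such as d in (0, 0.5). The log target used in the demonstration is a hypothetical placeholder (a rescaled Beta(3, 3) density), not the paper's full conditional, and the function name is an illustrative choice.

```python
import math
import random


def metropolis_step(current, log_target, proposal_sd, lower, upper, rng):
    """One random-walk Metropolis update of a scalar with support (lower, upper).

    Returns the new state and a flag telling whether the proposal was accepted.
    """
    proposal = rng.gauss(current, proposal_sd)
    if not (lower < proposal < upper):
        return current, False          # the target has zero density out there
    log_ratio = log_target(proposal) - log_target(current)
    # The Gaussian proposal is symmetric, so the ratio involves the target only.
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal, True
    return current, False


def toy_log_target(d):
    """Hypothetical placeholder: 2d ~ Beta(3, 3), up to an additive constant."""
    return 2.0 * math.log(2.0 * d) + 2.0 * math.log(1.0 - 2.0 * d)


if __name__ == "__main__":
    rng = random.Random(1)
    d, accepted, draws = 0.3, 0, []
    for _ in range(2000):
        d, ok = metropolis_step(d, toy_log_target, 0.05, 0.0, 0.5, rng)
        accepted += ok
        draws.append(d)
    print("acceptance rate:", accepted / len(draws))
    print("mean after burn-in:", sum(draws[500:]) / len(draws[500:]))
```

In a Metropolis-within-Gibbs scheme a step of this kind would replace the direct draw for any parameter whose full conditional has no closed form, with `log_target` evaluated at the current values of all other parameters.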
3.3. Estimation of the hyperparameters
In applications the hyperparameters pj and τj need to be appropriately chosen. Because of the specification (3.2), this problem reduces to the estimation of the constants Cp and Cτ. Here we adopt a slight modification of the estimation procedure of Abramovich, Sapatinas and Silverman (1998), who suggested maximizing the likelihood function of the wavelet coefficients that pass the VisuShrink threshold $\sigma\sqrt{2 \log N}$, where σ is the median absolute deviation (MAD) of the finest wavelet coefficients divided by 0.6745 (Donoho and Johnstone (1994)). We therefore calculate the residuals of model (3.1), $r = \omega - U\hat{\beta}$, where $\hat{\beta}$ is the ordinary least squares estimate of β. Treating r as a wavelet estimate of the sum of the unknown function f(t) and the long memory noise ε, we apply hard thresholding to the residuals using the threshold $\hat{\sigma}_r\sqrt{2 \log N}$, where $\hat{\sigma}_r$ is the sample standard deviation of the wavelet coefficients at the finest resolution level of the wavelet decomposition of the residuals r. Then we maximize
where Φ denotes the standard normal cumulative distribution function, Mj denotes the number of the wavelet coefficients that pass the hard threshold on the resolution level j, and xjm, m = 1, …, Mj is the coefficient that passes the threshold on the scale j. A method-of-moment estimate of Cp given Cτ is
4. Applications
4.1. Simulation study
For the simulated data we used
where the error ε is assumed to follow an ARFIMA(0, d, 0) or an ARFIMA(0, d, 1) process. The famous “Blocks”, “Bumps”, “Doppler” and “HeavySine” functions, adopted by Donoho and Johnstone (1994), were used for the nonparametric functions f(t). In order to generate the long memory errors, we used a computationally simple method proposed by McLeod and Hipel (1978) that involves the Cholesky decomposition of the correlation matrix Rε(i, j) = [ρ(|i − j|)] = [ρ(h)], with h = |i − j| = 1, …, N − 1. The covariance functions (2.2) and (2.3) were used for ARFIMA(0, d, 1) and ARFIMA(0, d, 0) errors, respectively.
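The Cholesky-based generation of long memory errors can be sketched as follows for the ARFIMA(0, d, 0) case. The code builds the correlation matrix from the autocorrelations ρ(h), factors it as R = LLᵀ, and multiplies L by a vector of independent standard normal draws. It is a minimal pure-Python illustration under the assumption that the series is rescaled by the square root of γ(0) = σ²Γ(1 − 2d)/Γ(1 − d)²; the function names are hypothetical, and the cubic cost of the factorization makes it suitable only for moderate N.

```python
import math
import random


def fi_autocorr(d, n):
    """Autocorrelations rho(0..n-1) of ARFIMA(0, d, 0) noise."""
    rho = [1.0]
    for h in range(1, n):
        rho.append(rho[-1] * (h - 1 + d) / (h - d))
    return rho


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_fi_noise(d, n, sigma2=1.0, rng=None):
    """Draw one Gaussian ARFIMA(0, d, 0) series of length n."""
    rng = rng or random.Random()
    rho = fi_autocorr(d, n)
    corr = [[rho[abs(i - j)] for j in range(n)] for i in range(n)]
    low = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Marginal standard deviation implied by the innovation variance sigma2
    scale = math.sqrt(sigma2 * math.exp(math.lgamma(1 - 2 * d)
                                        - 2 * math.lgamma(1 - d)))
    return [scale * sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


if __name__ == "__main__":
    eps = simulate_fi_noise(0.2, 128, rng=random.Random(42))
    print(len(eps), eps[:3])
```

The same routine would apply to ARFIMA(0, d, 1) errors once the autocorrelations are replaced by those of the corresponding process.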
For simulations of errors from ARFIMA(0, d, 0) processes, different values of the long memory parameter, d = 0.05, 0.2, 0.4, were used. A unit innovation variance was chosen, i.e., σ² = 1. A covariate x was generated from a N(0, 1) distribution and the trend parameter β was set to 1. When applying discrete wavelet transforms, we used Daubechies minimum phase wavelets with four vanishing moments for the "Bumps", "Doppler" and "HeavySine" functions, and with one vanishing moment for the "Blocks" function. Different sample sizes were considered, specifically N = 128, 256, and 512. For a given N, we simulated 50 datasets and computed biases and mean squared errors of the estimates of β, d, σ², and f. For the Metropolis move of d, we used a normal proposal distribution with standard deviation 0.05. We used the simple least squares estimate as an initial value of β. For the initial values of d and σ², we used 0.3 and 5, respectively, and then perturbed these initial values to obtain over-dispersed values in order to initialize three MCMC chains. All chains ran for 600 iterations with a burn-in period of 300. All chains mixed fast and well, and acceptance probabilities for the Metropolis steps were around 50%. Goodness-of-fit of the nonparametric estimators was assessed by calculating the mean squared error of $\hat{f}$ as $\frac{1}{N}\sum_{i=1}^{N} (\hat{f}(t_i) - f(t_i))^2$ for each replicate, and then averaging over the 50 replicates. This measure is indicated in the tables as AMSE. Standard errors are also reported.
Table 1.1 shows the results. For all values of d, the mean squared errors (MSE) and the biases of $\hat{\beta}$, $\hat{d}$ and $\hat{\sigma}^2$ decreased in almost all cases as the sample size increased. In the estimation of the nonparametric component, AMSEs and their standard errors (STDER) decreased as d approached 0 (i.e., almost uncorrelated errors). The MCMC chains mixed well and converged to the true values of the model parameters. Figure 1.1 shows the four ideal nonparametric functions in the first column, the corresponding contaminated series with a trend (β = 1) and long memory error (d = 0.2, σ² = 1) in the second column, and the nonparametric function estimates via the proposed method in the third column.
Table 1.1.
Biases, MSEs and AMSEs of the estimated model parameters from the wavelet-based Bayesian estimation procedure when the error is simulated from an ARFIMA(0, d, 0). Both β and σ² are set to 1.

| d | N | f(t) | $\hat{\beta}$ BIAS | $\hat{\beta}$ MSE | $\hat{d}$ BIAS | $\hat{d}$ MSE | $\hat{\sigma}^2$ BIAS | $\hat{\sigma}^2$ MSE | $\hat{f}$ AMSE | $\hat{f}$ STDER |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 2⁷ | Blocks | 0.003 | 0.012 | 0.062 | 0.006 | 0.125 | 0.091 | 0.639 | 0.139 |
| | | Bumps | 0.012 | 0.013 | 0.069 | 0.008 | 0.196 | 0.073 | 0.428 | 0.077 |
| | | Doppler | −0.013 | 0.009 | 0.035 | 0.006 | −0.045 | 0.019 | 0.109 | 0.040 |
| | | HeavySine | 0.008 | 0.009 | 0.092 | 0.011 | 0.098 | 0.030 | 0.258 | 0.045 |
| | 2⁸ | Blocks | −0.004 | 0.008 | 0.055 | 0.005 | 0.130 | 0.055 | 0.390 | 0.069 |
| | | Bumps | −0.011 | 0.004 | 0.019 | 0.007 | 0.149 | 0.051 | 0.304 | 0.035 |
| | | Doppler | 0.007 | 0.003 | 0.025 | 0.002 | −0.014 | 0.007 | 0.079 | 0.027 |
| | | HeavySine | −0.007 | 0.006 | 0.011 | 0.006 | 0.079 | 0.022 | 0.206 | 0.039 |
| | 2⁹ | Blocks | 0.002 | 0.004 | 0.033 | 0.002 | 0.076 | 0.016 | 0.244 | 0.043 |
| | | Bumps | 0.011 | 0.001 | −0.018 | 0.009 | 0.116 | 0.029 | 0.251 | 0.042 |
| | | Doppler | 0.004 | 0.003 | 0.023 | 0.002 | 0.009 | 0.004 | 0.055 | 0.012 |
| | | HeavySine | −0.003 | 0.002 | 0.079 | 0.008 | 0.045 | 0.006 | 0.135 | 0.034 |
| 0.2 | 2⁷ | Blocks | 0.007 | 0.012 | −0.083 | 0.010 | 0.161 | 0.098 | 0.665 | 0.149 |
| | | Bumps | −0.029 | 0.010 | −0.026 | 0.005 | 0.180 | 0.073 | 0.525 | 0.109 |
| | | Doppler | −0.010 | 0.009 | −0.065 | 0.008 | −0.054 | 0.023 | 0.257 | 0.139 |
| | | HeavySine | 0.008 | 0.008 | −0.043 | 0.007 | −0.046 | 0.017 | 0.394 | 0.112 |
| | 2⁸ | Blocks | −0.008 | 0.007 | −0.062 | 0.008 | 0.057 | 0.026 | 0.481 | 0.115 |
| | | Bumps | 0.032 | 0.008 | 0.019 | 0.004 | 0.138 | 0.029 | 0.398 | 0.080 |
| | | Doppler | −0.006 | 0.005 | −0.049 | 0.006 | −0.077 | 0.010 | 0.165 | 0.070 |
| | | HeavySine | 0.000 | 0.003 | 0.034 | 0.004 | 0.018 | 0.010 | 0.316 | 0.079 |
| | 2⁹ | Blocks | 0.007 | 0.002 | −0.057 | 0.006 | 0.003 | 0.007 | 0.326 | 0.069 |
| | | Bumps | −0.012 | 0.001 | −0.021 | 0.003 | 0.120 | 0.023 | 0.331 | 0.057 |
| | | Doppler | 0.016 | 0.002 | −0.035 | 0.003 | −0.024 | 0.005 | 0.120 | 0.050 |
| | | HeavySine | 0.004 | 0.002 | 0.027 | 0.003 | −0.005 | 0.005 | 0.230 | 0.048 |
| 0.4 | 2⁷ | Blocks | 0.010 | 0.013 | −0.206 | 0.048 | 0.346 | 0.218 | 1.721 | 0.817 |
| | | Bumps | −0.007 | 0.011 | −0.176 | 0.036 | 0.054 | 0.033 | 1.412 | 1.524 |
| | | Doppler | −0.013 | 0.006 | −0.189 | 0.040 | −0.173 | 0.041 | 1.274 | 1.419 |
| | | HeavySine | −0.006 | 0.007 | −0.156 | 0.032 | −0.141 | 0.041 | 1.462 | 1.078 |
| | 2⁸ | Blocks | −0.014 | 0.004 | −0.148 | 0.030 | 0.076 | 0.026 | 1.382 | 0.666 |
| | | Bumps | 0.003 | 0.005 | −0.110 | 0.016 | 0.038 | 0.011 | 1.223 | 0.977 |
| | | Doppler | 0.005 | 0.004 | −0.117 | 0.018 | −0.112 | 0.019 | 1.267 | 0.982 |
| | | HeavySine | −0.006 | 0.005 | −0.066 | 0.009 | −0.071 | 0.015 | 1.276 | 1.231 |
| | 2⁹ | Blocks | −0.000 | 0.003 | −0.047 | 0.006 | 0.011 | 0.007 | 1.127 | 0.579 |
| | | Bumps | −0.002 | 0.003 | −0.051 | 0.004 | 0.031 | 0.006 | 1.215 | 0.851 |
| | | Doppler | 0.007 | 0.002 | −0.073 | 0.008 | −0.052 | 0.008 | 1.162 | 0.082 |
| | | HeavySine | 0.006 | 0.001 | −0.051 | 0.004 | −0.035 | 0.007 | 0.998 | 0.598 |
Figure 1.1.

The four nonparametric functions: (a) Blocks, (d) Bumps, (g) Doppler, and (j) HeavySine. Plots in the second column show noisy data with Gaussian long memory errors with d = 0.2 and σ² = 1. Here β = 1. Plots in the third column show the recovered functions using the proposed wavelet-based Bayesian method.
Finally, we report simulation results for ARFIMA(0, d, 1) errors in Table 1.2. In the simulation, the long memory parameter d and the moving average parameter θ were set to 0.2 and 0.3, respectively. For the Metropolis move of θ, we used a normal proposal distribution with standard deviation 0.05. The other parameters remained the same as in the simulation with ARFIMA(0, d, 0) errors. The biases and mean squared errors of $\hat{\beta}$, $\hat{d}$, $\hat{\theta}$ and $\hat{\sigma}^2$, and the AMSEs and standard errors of $\hat{f}$, were relatively large compared to those of the models without the moving average parameter, although the method still performed well.
Table 1.2.
Biases, MSEs and AMSEs of the estimated model parameters from the wavelet-based Bayesian estimation procedure when the error is simulated from an ARFIMA(0, d, 1). The moving average parameter θ is set to 0.3, and both β and σ² are set to 1.

| N | f(t) | $\hat{\beta}$ BIAS | $\hat{\beta}$ MSE | $\hat{d}$ BIAS | $\hat{d}$ MSE | $\hat{\theta}$ BIAS | $\hat{\theta}$ MSE | $\hat{\sigma}^2$ BIAS | $\hat{\sigma}^2$ MSE | $\hat{f}$ AMSE | $\hat{f}$ STDER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2⁷ | Blocks | −0.004 | 0.016 | −0.091 | 0.021 | −0.209 | 0.258 | 0.180 | 0.148 | 1.164 | 0.333 |
| | Bumps | −0.013 | 0.018 | −0.092 | 0.031 | 0.311 | 0.120 | 0.825 | 0.776 | 1.921 | 0.316 |
| | Doppler | 0.010 | 0.006 | −0.074 | 0.008 | 0.123 | 0.061 | −0.153 | 0.049 | 0.558 | 0.164 |
| | HeavySine | −0.022 | 0.009 | −0.025 | 0.008 | 0.213 | 0.079 | −0.046 | 0.048 | 0.960 | 0.297 |
| 2⁸ | Blocks | −0.006 | 0.004 | −0.018 | 0.010 | 0.179 | 0.056 | −0.094 | 0.033 | 0.903 | 0.287 |
| | Bumps | 0.002 | 0.006 | 0.105 | 0.010 | 0.259 | 0.110 | 0.316 | 0.137 | 1.291 | 0.274 |
| | Doppler | −0.014 | 0.003 | −0.086 | 0.012 | 0.097 | 0.041 | −0.121 | 0.028 | 0.462 | 0.153 |
| | HeavySine | −0.013 | 0.004 | −0.016 | 0.004 | 0.178 | 0.064 | −0.022 | 0.022 | 0.682 | 0.181 |
| 2⁹ | Blocks | 0.003 | 0.003 | 0.016 | 0.002 | 0.092 | 0.051 | −0.076 | 0.021 | 0.867 | 0.229 |
| | Bumps | −0.003 | 0.002 | −0.044 | 0.004 | 0.128 | 0.051 | 0.281 | 0.109 | 1.122 | 0.151 |
| | Doppler | 0.006 | 0.001 | −0.091 | 0.011 | 0.097 | 0.019 | −0.088 | 0.012 | 0.343 | 0.108 |
| | HeavySine | 0.007 | 0.002 | −0.019 | 0.008 | 0.078 | 0.032 | 0.007 | 0.010 | 0.436 | 0.108 |
4.2. An application to northern hemisphere temperature data
For an application we considered the Northern hemisphere temperature data, measured monthly during the years 1854–1989 and gathered by the Climatic Research Unit of the University of East Anglia, England. This dataset is a benchmark in the long memory literature and has been used widely for the study of global warming. Beran (1994) fitted a linear trend model yt = β0 + β1t + εt to the data and applied the ARFIMA(0, d, 0) model to the residuals that resulted from detrending the data with the ordinary least squares (OLS) estimate. The OLS estimate of β1 is 0.00032, and $\hat{d}$ and $\hat{\sigma}^2$ were 0.37 and 0.0089, respectively. Beran and Feng (2002) obtained an estimate of d with a 95% confidence interval (CI) of (0.19, 0.46) using a SEMIFAR model. On the other hand, the variability of the series is larger at the beginning than for the rest of the observations. Craigmile, Guttorp and Percival (2005) obtained an estimate of d with a 95% CI of (0.317, 0.408), along with an estimate of σ², using an estimation method that ignored the non-constant variance of the data, and an estimate of d with a 95% CI of (0.323, 0.415), along with an estimate of σ², when taking into account the non-constant variability.
We applied our wavelet-based MCMC method for PLR models to the Northern hemisphere data. We chose an ARFIMA(0, d, 0) process for the error term. We discarded the first 608 temperatures, obtaining N = 1,024 measurements. This refinement of the data was needed to meet the stationarity assumption of the long memory error in our model. Figure 1.2 shows the data versus the estimated trend line (left) and the data versus the estimated nonparametric function after detrending them with the estimated trend (right). The estimates of β, d, and σ² were 0.0006, 0.3660, and 0.0278, respectively. Our estimate of d is close to those found by Beran (1994) and Craigmile, Guttorp and Percival (2005). Our estimate of σ² is closer to the one obtained by Craigmile, Guttorp and Percival (2005) when the non-constant variability is taken into account. Overall, the temperature in the Northern hemisphere seems to increase by approximately 0.72 degrees Celsius per century (0.0006 per month × 1,200 months). Figure 1.3 shows the MCMC traces and the density plots of the estimated parameters.
Figure 1.2.

Left: Northern Hemisphere temperature data with N = 1,024 (dashed line) and fitted trend (solid line). Right: Northern Hemisphere temperature data after detrending by the estimated trend (dashed line) and estimated nonparametric function (solid line).
Figure 1.3.

Northern Hemisphere temperature: MCMC traces of β, d and σ², and corresponding density plots, after a burn-in period of 300 iterations.
5. Concluding Remarks
We have proposed a wavelet-based Bayesian method for the estimation of the model parameters and the nonparametric function in PLR models with long memory errors. We have taken advantage of the sparsity property of discrete wavelet transforms, which reduces the strongly correlated response variable of the model to a nearly uncorrelated one. We have designed a Markov chain Monte Carlo method to obtain the posterior distributions of the model parameters and the nonparametric function. We have shown via simulation studies that the proposed method is promising and have demonstrated how it can be applied by using the benchmark Northern hemisphere temperature data. The contribution of our work, with respect to the existing literature, lies in incorporating strongly correlated long memory errors into PLR models, and in exploiting the whitening properties of discrete wavelet transforms to design a computationally inexpensive inferential procedure.
Although we have chosen ARFIMA processes for the long memory error of the model, the proposed procedure can be easily applied to other long memory processes, such as fractional Brownian motion (fBm) or fractional Gaussian noise (fGn). Extensions to non-equally spaced designs for the nonparametric predictor function can also be considered. In this setting, unlike in the case of equispaced data, inference cannot rely on models that imply the a posteriori independence of the coefficients. Mixture prior models can still be applied to the coefficients of the wavelet expansion, but appropriate inferential procedures need to be developed, perhaps along the lines of what was done by Park, Vannucci and Hart (2005).
Acknowledgement
The authors thank an associate editor and two referees for their valuable comments on an earlier version of the paper. Vannucci is supported by NSF, DMS-0605001, and by NIH, R01 HG003319-01.
Appendix. MCMC on Full Conditional Distributions
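Let ω*, U* and ϑ* denote ω, U and ϑ premultiplied by the inverse square root of the diagonal variance-covariance matrix of the wavelet-domain error, with the innovation variance σ² factored out. We sample the parameters by iterating among the following steps: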
- (1) sample β from β | Ψ, σ², ϑ, γ, ω, U ~ N((U*′U*)⁻¹U*′(ω* − ϑ*), σ²(U*′U*)⁻¹);
- (2) sample Ψ from
- (3) sample σ² from σ² | β, Ψ, ϑ, γ, ω, U ~ IG(N/2, [(ω* − U*β − ϑ*)′(ω* − U*β − ϑ*)]/2), where IG(a, b) denotes the inverse gamma distribution with parameters a and b and pdf $p(x \mid a, b) = (b^a/\Gamma(a))\, x^{-(a+1)} e^{-b/x}$;
- (4) sample γjk from P(γjk = 1 | β, Ψ, σ², ωjk, uijk) = Ojk/(Ojk + 1) where
- (5) sample ϑ from
Note that, like the prior model, the full conditional distribution of ϑjk is a mixture of a normal distribution and a point mass at zero. Since the full conditional distribution of Ψ does not have a known closed form, we use a Metropolis sampler with independent Gaussian proposal distributions.
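The mixture structure of steps (4) and (5) can be illustrated with a generic normal spike-and-slab update for a single coefficient. The sketch assumes a scalar observation z = ϑ + e with e ~ N(0, noise_var) and the prior ϑ ~ p N(0, τ²) + (1 − p) δ(0), with 0 < p < 1. Here z stands for a wavelet coefficient of the response with the linear trend removed. The expressions are the textbook conjugate results under these assumptions and are not a transcription of the paper's Ojk, and all names are hypothetical.

```python
import math
import random


def inclusion_probability(z, noise_var, tau2, p):
    """P(gamma = 1 | z) = O / (O + 1), with O the posterior odds of the slab."""
    log_odds = (math.log(p) - math.log(1.0 - p)
                + 0.5 * math.log(noise_var / (noise_var + tau2))
                + 0.5 * z * z * (1.0 / noise_var - 1.0 / (noise_var + tau2)))
    # Evaluate the logistic function without overflowing exp()
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


def draw_coefficient(z, noise_var, tau2, p, rng):
    """Draw (gamma, theta) from the mixture full conditional given z."""
    gamma = 1 if rng.random() < inclusion_probability(z, noise_var, tau2, p) else 0
    if gamma == 0:
        return 0, 0.0                      # point mass at zero
    shrink = tau2 / (tau2 + noise_var)     # posterior mean is shrink * z
    theta = rng.gauss(shrink * z, math.sqrt(shrink * noise_var))
    return 1, theta


if __name__ == "__main__":
    rng = random.Random(7)
    for z in (0.1, 1.0, 5.0):
        prob = inclusion_probability(z, 1.0, 4.0, 0.5)
        print(z, round(prob, 4), draw_coefficient(z, 1.0, 4.0, 0.5, rng))
```

Small coefficients are set to zero with high probability, while large ones are kept and shrunk toward zero by the factor τ²/(τ² + noise_var), which is how the mixture prior produces a thresholding-like estimate of the nonparametric function.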
References
- Abramovich F, Sapatinas T, Silverman BW. Wavelet thresholding via a Bayesian approach. J. Roy. Statist. Soc. Ser. B. 1998;60:725–49.
- Antoniadis A, Bigot J, Sapatinas T. Wavelet estimators in nonparametric regression: A comparative simulation study. J. Statist. Soft. 2001;6:1–83.
- Beran J. Statistics for Long-Memory Processes. Chapman and Hall; New York: 1994.
- Beran J, Feng Y. SEMIFAR models-A semiparametric approach to modelling trends, long-range dependence. Comput. Statist. Data Anal. 2002;40:393–419.
- Beran J, Ghosh S. Root-n-consistent estimation in partial linear models with long-memory errors. Scand. J. Statist. 1998;25:345–357.
- Chang XW, Qu L. Wavelet estimation of partially linear models. Comput. Statist. Data Anal. 2004;47:31–48.
- Clyde M, Parmigiani G, Vidakovic B. Multiple shrinkage and subset selection in wavelets. Biometrika. 1998;85:391–402.
- Craigmile PF, Percival DB. Asymptotic decorrelation of between-scale wavelet coefficients. IEEE Trans. Inform. Theory. 2005;51:1039–1048.
- Craigmile PF, Guttorp P, Percival DB. Wavelet-based parameter estimation for polynomial contaminated fractionally differenced processes. IEEE Trans. Signal Process. 2005;53:3151–3161.
- Donoho DL, Johnstone IM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455.
- Donoho DL, Johnstone IM. Adapting to unknown smoothness via wavelet shrinkage. J. Roy. Statist. Soc. Ser. B. 1995;57:301–369.
- Doornik JA, Ooms M. Computational aspects of maximum likelihood estimation of autoregressive fractionally integrated moving average models. Comput. Statist. Data Anal. 2003;42:333–348.
- Engle RF, Granger CWJ, Rice J, Weiss A. Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 1986;81:310–320.
- Fadili MJ, Bullmore ET. Penalized partially linear models using sparse representations with an application to fMRI time series. IEEE Trans. Signal Process. 2005;53:3436–3448.
- Germán AP, Wenceslao GM, Philippe V. Estimation and testing in a partial linear regression model under long-memory dependence. Bernoulli. 2004;10:49–78.
- Granger CW, Joyeux R. An introduction to long memory time series models and fractional differencing. J. Time Ser. Anal. 1980;1:15–29.
- Hosking JRM. Fractional differencing. Biometrika. 1981;68:165–176.
- Ko K, Vannucci M. Bayesian wavelet analysis of autoregressive fractionally integrated moving-average processes. J. Statist. Plann. Inference. 2006;136:3415–3434.
- Koop G, Poirier D. Bayesian variants of some classical semiparametric regression techniques. J. Econometrics. 2004;123:259–282.
- Lenk PJ. Bayesian inference for semiparametric regression using a Fourier representation. J. Roy. Statist. Soc. Ser. B. 1999;61:863–879.
- Mallat SG. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989;11:674–693.
- McLeod AI, Hipel KW. Preservation of the rescaled adjusted range, parts 1, 2 and 3. Water Resources Research. 1978;14:491–512.
- Park CG, Vannucci M, Hart JD. Bayesian methods for wavelet series in single-index models. J. Comp. Graph. Statist. 2005;14:770–794.
- Qu L. Wavelet thresholding in partially linear models: a computation and simulation. Appl. Stoc. Models Bus. Ind. 2003;19:221–230.
- Qu L. Bayesian wavelet estimation of partially linear models. J. Statist. Comput. Simulation. 2006;76:605–617.
- Sowell F. Maximum likelihood estimation of stationary univariate fractionally integrated time series models. J. Econometrics. 1992;53:165–188.
- Taswell C, McGill KC. Wavelet transform algorithms for finite-duration discrete-time signals. ACM Trans. Math. Soft. 1994;20:398–412.
- Tewfik AH, Kim M. Correlation structure of the discrete wavelet coefficients of fractional Brownian motion. IEEE Trans. Inform. Theory. 1992;38:904–909.
- Vannucci M, Corradi F. Covariance structure of wavelet coefficients: Theory and models in a Bayesian perspective. J. Roy. Statist. Soc. Ser. B. 1999;61:971–986.
