Summary
We introduce methods for estimating the spectral density of a random field on a $d$-dimensional lattice from incomplete gridded data. Data are iteratively imputed onto an expanded lattice according to a model with a periodic covariance function. The imputations are convenient computationally, in that circulant embedding and preconditioned conjugate gradient methods can produce imputations in $O(n \log n)$ time and $O(n)$ memory. However, these so-called periodic imputations are motivated mainly by their ability to produce accurate spectral density estimates. In addition, we introduce a parametric filtering method that is designed to reduce periodogram smoothing bias. The paper contains theoretical results on properties of the imputed-data periodogram and numerical and simulation studies comparing the performance of the proposed methods to existing approaches in a number of scenarios. We present an application to a gridded satellite surface temperature dataset with missing values.
Keywords: Circulant embedding, Conjugate gradient, Covariance function, Gaussian process, Nonparametric estimation, Semiparametric estimation, Spatial statistics
1. Introduction
Random fields defined on the integer lattice have wide applications in modelling gridded spatial and spatial-temporal datasets. They also form the basis for some models for non-gridded data (Nychka et al., 2015). The large sizes of modern spatial and spatial-temporal datasets entail an enormous computational burden when using traditional methods for estimating random field models. Modelling data on a grid provides a potential solution to the computational issue, since there exist some methods based on the discrete Fourier transform which can be computed efficiently with fast Fourier transform algorithms. However, there are some pitfalls associated with discrete Fourier transform-based methods related to edge effects and the handling of missing data. This paper provides an accurate and computationally efficient estimation framework for addressing those issues.
Let $Y(x) \in \mathbb{R}$, $x \in \mathbb{Z}^d$, be a zero-mean stationary process on the $d$-dimensional integer lattice, that is, $E\{Y(x)\} = 0$ and $\mathrm{cov}\{Y(x), Y(x+h)\} = K(h)$ for every $x$ and $h$ in $\mathbb{Z}^d$. Herglotz’s theorem states that the covariance function has a Fourier transform representation,

$$K(h) = \int_{[0,1]^d} \exp(2\pi i\,\omega \cdot h)\,\mathrm{d}F(\omega), \tag{1}$$

where $i = \sqrt{-1}$ and $\cdot$ is the dot product. The function $F$ is a spectral measure, and we assume throughout that it has a continuous derivative $f$, called a spectral density. We focus on estimation of $f$, which encodes the covariance function and thus is crucial for prediction of missing values and for regressions when $Y$ is used as a model for residuals. We restrict our attention to stationary models and note that stationary models often form the basis for more flexible nonstationary models that are needed to accurately model many physical processes (Fuentes, 2002).
Suppose that we observe vector
at a distinct set of
locations
. If the covariance function $K$ or the spectral density $f$ has a known parametric form and we assume that $Y$ is a Gaussian process, then we can use likelihood-based methods for estimating the parameters, which for $n$ observations generally requires $O(n^2)$ memory and $O(n^3)$ floating point operations. If the locations form a complete rectangular subset of the integer lattice, we can use Whittle’s likelihood approximation (Whittle, 1954), which leverages fast Fourier transform algorithms in order to approximate the likelihood in $O(n \log n)$ FLOPs and $O(n)$ memory. Guyon (1982) showed that, due to edge effects, the Whittle likelihood parameter estimates are not root-$n$ consistent when the dimension $d$ of the field is greater than 1. Dahlhaus & Künsch (1987) suggested the use of data tapers to reduce edge effects and proved that the tapered version of the likelihood approximation is asymptotically efficient when $d \le 3$. Stroud et al. (2017) and Guinness & Fuentes (2017) suggested the use of periodic embeddings and demonstrated their accuracy in numerical studies. Sykulski et al. (2019) introduced a debiased Whittle likelihood.
If one is not willing to assume that $K$ or $f$ has a known parametric form, and if the data are observed on a complete rectangular grid, nonparametric methods can be used to estimate $f$. The standard approach uses the discrete Fourier transform,

$$J(\omega) = n^{-1/2} \sum_{x} Y(x) \exp(-2\pi i\,\omega \cdot x),$$

where the sum runs over the $n$ grid locations, and estimates the spectrum with a smoothed version of the periodogram $|J(\omega)|^2$,

$$\hat f(\omega) = \sum_{\nu} |J(\nu)|^2\,\alpha(\omega - \nu),$$

where the sum runs over the Fourier frequencies $\nu$ and $\alpha$ is a smoothing kernel. Selection of the kernel bandwidth has been studied by Lee (1997), Ombao et al. (2001) and Lee (2001). Alternatively, one can smooth using penalized likelihoods (Wahba, 1980; Chow & Grenander, 1985; Pawitan & O’Sullivan, 1994) or smooth priors in a Bayesian setting (Zheng et al., 2009). Politis & Romano (1995) provided a method for reducing bias in the smoothed periodogram. Heyde & Gay (1993) studied asymptotic properties of the periodogram in an increasing domain setting, while Stein (1995) studied them in an increasing resolution setting, noting the importance of data filtering. Lim & Stein (2008) considered the multivariate case.
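As an illustration of the smoothed periodogram described above, the following is a minimal sketch for a complete rectangular grid. The Gaussian kernel, the `bandwidth` parameter and the function names are assumptions made for this example, not part of the paper; the smoothing is a circular convolution over the Fourier frequencies, computed with fast Fourier transforms.

```python
import numpy as np

def periodogram(y):
    """Periodogram |J(omega)|^2 of a complete grid, with the DFT scaled by n^(-1/2)."""
    y = np.asarray(y, dtype=float)
    return np.abs(np.fft.fftn(y)) ** 2 / y.size

def gaussian_kernel(shape, bandwidth):
    """Gaussian weights on the Fourier-frequency grid (periodic distance), summing to one."""
    axes = [np.minimum(np.arange(m), m - np.arange(m)) / m for m in shape]
    grids = np.meshgrid(*axes, indexing="ij")
    dist2 = sum(g ** 2 for g in grids)
    w = np.exp(-dist2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def smooth_periodic(values, kernel):
    """Circular convolution of `values` with `kernel` via FFTs."""
    return np.real(np.fft.ifftn(np.fft.fftn(values) * np.fft.fftn(kernel)))

rng = np.random.default_rng(0)
y = rng.standard_normal((32, 32))   # white noise, so the true spectrum is flat
pgram = periodogram(y)
fhat = smooth_periodic(pgram, gaussian_kernel(y.shape, bandwidth=0.05))  # hypothetical bandwidth
print(pgram.std(), fhat.std())      # smoothing reduces the variability across frequencies
```

Because the kernel weights sum to one, the smoothed estimate keeps the average power of the raw periodogram while reducing its variability.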
The nonparametric methods discussed above apply when a complete dataset is available on a rectangular grid. However, even when available on a grid, spatial datasets often have many missing values; for example, it is common to encounter gridded satellite datasets with some values obscured by clouds. Missing values complicate two aspects of periodogram-based estimators. The first is that a surrogate for the missing values must be substituted. Fuentes (2007, § 3) suggested replacing missing values with zeros and scaling the periodogram by the number of observed grid cells. Also of relevance is the extensive theoretical literature on spectral domain analysis for irregularly sampled spatial data (Matsuda & Yajima, 2009; Bandyopadhyay & Lahiri, 2009; Bandyopadhyay et al., 2015; Deb et al., 2017; Subba Rao, 2018), which can be applied to incomplete gridded data as well. All of these approaches use a discrete Fourier transform of the sampled data, which for gridded datasets is equivalent to the zero-infill approach in Fuentes (2007, § 3). Numerical comparisons between a zero-infill approach and our new approach are given in § 4. A second problem for spatial data is that scattered missing values seriously disrupt the use of differencing filters. For example, two-dimensional differencing at an observed location
$(x_1, x_2)$ can be applied only if the values at $(x_1+1, x_2)$, $(x_1, x_2+1)$, and $(x_1+1, x_2+1)$ are observed as well.
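The zero-infill approach mentioned above can be sketched in a few lines. This is an illustrative reading of the description (missing values replaced by zeros, periodogram scaled by the number of observed grid cells); the function name and the boolean `observed` mask are assumptions of this example.

```python
import numpy as np

def zero_infill_periodogram(y, observed):
    """Periodogram of gridded data with missing cells set to zero,
    scaled by the number of observed cells rather than the grid size."""
    filled = np.where(observed, y, 0.0)          # NaNs at missing cells are replaced
    return np.abs(np.fft.fftn(filled)) ** 2 / observed.sum()

rng = np.random.default_rng(0)
y = rng.standard_normal((16, 16))
observed = rng.random((16, 16)) > 0.3            # roughly 30% scattered missing values
y[~observed] = np.nan
pg = zero_infill_periodogram(y, observed)
print(pg.mean())                                  # equals the mean square of the observed values
```

The zeros act as artificial rough features in the field, which is one way to see why this estimator tends to overstate power at high frequencies when many values are missing.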
To address these issues, this paper introduces computationally efficient methodology for estimating the spectrum based on imputing missing values with conditional simulations and iteratively updating the spectrum estimate, in a similar vein to the method proposed by Lee & Zhu (2009) for time series data. The novelty of our approach is that the missing values are imputed onto an expanded lattice under a covariance function that is periodic on the expanded lattice. These periodic imputations or periodic conditional simulations are convenient computationally, since circulant embedding and preconditioned conjugate gradient methods can be employed for efficient imputations, but their main appeal is their ability to produce accurate estimates via the amelioration of edge effects. We provide thorough numerical studies and theoretical results describing when the imputed-data spectrum is expected to give an estimate with a smaller bias than the spectrum used for imputation, which suggests that existing spectral density estimates can be improved through periodic embedding.
The theoretical results provide a sound basis for the nonparametric estimation methods and give some insight into why the parametric methods in Guinness & Fuentes (2017) perform so well in simulations. Additionally, this paper introduces a parametric filtering method based on fitting simple parametric models within the iterative method. The fitted parametric models can be used to filter the data, which is effective for reducing bias due to periodogram smoothing. Taken together, this work develops accurate and computationally efficient methods for estimating spectral densities when the gridded data have arbitrary missingness patterns. We present thorough numerical and simulation studies for the methods and demonstrate that even a small amount of lattice expansion provides substantial bias and correlation reduction. We apply the methods to a gridded but incomplete land surface temperature dataset.
2. Methodology
2.1. Notation and background
Let
with
, and define the hyper-rectangle
, where
![]() |
If
, this is simply a rectangular lattice of size
. We assume that the observation locations
form a subset of
, and so we call
the observation lattice. Define
to be the vector containing the process at the remaining locations
. Throughout, we assume that
is missing at random, meaning that the missingness is potentially related to
but not related to the value of
(Little & Rubin, 2014). This section describes several existing and new iterative methods for estimating a spectral density
. All of these methods proceed by updating the spectrum estimate at the
th iteration,
, to the next estimate,
. Although the specific updating formulas vary, we use the notation
for all of them to keep the number of symbols manageable.
For time series data, Lee & Zhu (2009) proposed an iterative method for obtaining nonparametric estimates of the spectrum. Let
denote expectation in the zero-mean multivariate normal distribution for
under
with covariance given by (1). Their method can be extended from one dimension to general dimensions with the updating formula
![]() |
(2) |
where
is the set of Fourier frequencies associated with a grid of size
. The procedure is then iterated over
until convergence. Here, we use a smoothing kernel, but Lee & Zhu (2009) noted that any smoothing method can be applied. The conditional expectation of the periodogram under
is computationally expensive, so Lee & Zhu (2009) proposed replacing the expected value with an average over
independent realizations of
given
, as in
![]() |
(3) |
where
is the discrete Fourier transform derived from
, with
being independent Gaussian conditional simulations of
given
under
. Replacing the conditional expectation with a sample average is analogous to the approach taken in the iterative method in Tanner & Wong (1987) for Bayesian estimation of parametric statistical models. In this case, using a sample average creates a convergence issue, in that the Monte Carlo error causes the spectra in (3) to fluctuate indefinitely. In § 2.2, we propose an alternative averaging scheme, as well as imputation under a periodic model.
2.2. Periodic imputation
When $d > 1$
, edge effects become a prominent issue (Guyon, 1982); in particular, the Whittle likelihood can be interpreted as the exact likelihood for a model in which the field is periodic on the observation lattice (Guinness & Fuentes, 2017). Data tapers have been proposed to alleviate the issue, but tapering can lead to loss of information from data near the boundaries or near missing values. In this paper, we propose extending the hyper-rectangle in each dimension and performing the imputations under a periodic approximation to the covariance function. Surprisingly, using the periodic approximation to the covariance function for the imputations, rather than the true covariance function, leads to improved spectral density estimates. This is demonstrated numerically in § 4. Periodic models also facilitate straightforward implementation of circulant embedding techniques to simulate from the conditional distributions efficiently.
Let
, and define
so that
for
. Define
to be the total number of locations in
, which we refer to as the embedding lattice. Let
denote the vector of missing values on
and
denote expectation in the zero-mean multivariate normal distribution for
with covariance function
, defined as
![]() |
(4) |
where
are the Fourier frequencies associated with
. For every
, the function
is periodic in
with period
. This ensures that
is periodic on
in each dimension and is not the integral Fourier transform of
that appears in (1). We refer to a draw of
under
as a periodic conditional simulation or a periodic imputation. Figure 1 contains an example with
.
Fig. 1.
Data on the observation lattice
, data on the embedding lattice
, and a periodic conditional simulation.
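The periodic covariance in (4) is an inverse discrete Fourier transform of the spectrum evaluated at the Fourier frequencies of the embedding lattice, so it can be computed with a single FFT. The sketch below assumes a two-dimensional embedding lattice of size 36 by 36 and a hypothetical smooth spectrum chosen only for illustration.

```python
import numpy as np

def periodic_covariance(spec):
    """Covariance at every lag of the embedding lattice:
    (1/n) * sum over Fourier frequencies of f(omega) * exp(2*pi*i*omega.h)."""
    return np.real(np.fft.ifftn(spec))

m = (36, 36)                                      # assumed embedding lattice size
w1, w2 = np.meshgrid(np.arange(m[0]) / m[0], np.arange(m[1]) / m[1], indexing="ij")
# Hypothetical spectrum, symmetric about the origin so the covariance is real.
spec = 1.0 / (0.05 + np.sin(np.pi * w1) ** 2 + np.sin(np.pi * w2) ** 2)
cov = periodic_covariance(spec)

print(cov[0, 0], spec.mean())                     # variance equals the average of the spectrum
print(cov[1, 0], cov[m[0] - 1, 0])                # lags +1 and -1 coincide on the periodic lattice
```

Since the array index wraps around the lattice, periodicity in each dimension holds by construction, and applying the forward FFT to `cov` recovers `spec`.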
Using conditional expectations, the update in the periodic model is
![]() |
(5) |
The conditional expectation in the Lee & Zhu (2009) estimator in (2) is calculated on the observation lattice and using the correct model, whereas in (5) we use the conditional expectation under a model that is periodic on the embedding lattice. As before, the conditional expectation can be replaced by the average over one or several conditional simulations. To address the convergence issue mentioned in § 2.1, we propose an alternative updating formula consisting of a burn-in period of
iterations and convergence monitoring based on the asymptotic standard deviation of the complete-data smoothed periodogram,
![]() |
Our full proposed estimation algorithm is as follows. Initialize
as a constant flat spectrum, and given spectrum
, update as follows.
Step 1. For
, conditionally simulate
given
under
.
Step 2. For
, compute
from
.
Step 3. Update spectrum as

The algorithm is stopped when
![]() |
To summarize, during the
burn-in iterations, we use the sample average version of (5). After burn-in, the updating formula uses a weighted average of the previous spectrum and the current smoothed periodogram. Using a burn-in period avoids averaging over spectra from the first few iterations. Convergence is relative to the asymptotic standard deviation of the complete-data smoothed spectrum and a tolerance criterion
, which we take to be
or
in practice. We typically take
in practice. The Appendix contains details on how circulant embedding and preconditioned conjugate gradient methods can be employed to efficiently compute the periodic conditional simulations.
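To make the three steps concrete, here is a small one-dimensional sketch of the iteration. It is not the implementation used in the paper: conditional simulations are drawn with dense linear algebra instead of circulant embedding and conjugate gradients, the post-burn-in weights form a simple running average, a fixed number of iterations replaces the stopping rule, and all function names, the kernel and the parameter values are assumptions of this example.

```python
import numpy as np

def gaussian_kernel(m, bandwidth):
    j = np.arange(m)
    w = np.exp(-(np.minimum(j, m - j) / m) ** 2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def smooth(values, kernel):
    return np.real(np.fft.ifft(np.fft.fft(values) * np.fft.fft(kernel)))

def periodic_cov_matrix(spec):
    """Circulant covariance matrix implied by a spectrum on the embedding lattice."""
    r = np.real(np.fft.ifft(spec))
    idx = np.arange(spec.size)
    return r[(idx[:, None] - idx[None, :]) % spec.size]

def periodic_imputation(u, obs, spec, rng):
    """Step 1: simulate the unobserved lattice values given u under the periodic model."""
    cov, mis = periodic_cov_matrix(spec), ~obs
    k_oo, k_mo = cov[np.ix_(obs, obs)], cov[np.ix_(mis, obs)]
    weights = np.linalg.solve(k_oo, k_mo.T)
    cond_cov = cov[np.ix_(mis, mis)] - k_mo @ weights
    cond_cov = 0.5 * (cond_cov + cond_cov.T) + 1e-10 * np.eye(mis.sum())
    z = np.empty(spec.size)
    z[obs] = u
    z[mis] = weights.T @ u + np.linalg.cholesky(cond_cov) @ rng.standard_normal(mis.sum())
    return z

def estimate_spectrum(u, obs, kernel, n_sims=1, burn_in=20, n_iter=60, seed=0):
    rng = np.random.default_rng(seed)
    m = obs.size
    spec = np.full(m, np.mean(u ** 2))            # flat initial spectrum
    for k in range(1, n_iter + 1):
        pgram = np.zeros(m)
        for _ in range(n_sims):                   # Steps 1 and 2
            z = periodic_imputation(u, obs, spec, rng)
            pgram += np.abs(np.fft.fft(z)) ** 2 / (m * n_sims)
        new = np.maximum(smooth(pgram, kernel), 1e-8)
        if k <= burn_in:                          # Step 3 during burn-in
            spec = new
        else:                                     # Step 3 after burn-in: running average (assumed weights)
            weight = 1.0 / (k - burn_in + 1)
            spec = (1.0 - weight) * spec + weight * new
    return spec

rng = np.random.default_rng(1)
n_grid, m = 32, 40                                # observation lattice 32, embedding lattice 40
lags = np.abs(np.arange(n_grid)[:, None] - np.arange(n_grid)[None, :])
y = np.linalg.cholesky(0.8 ** lags) @ rng.standard_normal(n_grid)
obs = np.zeros(m, dtype=bool)
obs[:n_grid] = rng.random(n_grid) > 0.3           # scattered missing values; the extra 8 cells are never observed
spec = estimate_spectrum(y[obs[:n_grid]], obs, gaussian_kernel(m, bandwidth=0.05))
```

The dense solves cost cubic time in the number of observations, so this form is only suitable for small examples; the circulant embedding and preconditioned conjugate gradient approach described in the Appendix is what makes the method scale.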
2.3. Variant with parametric filter
Even if
is unbiased for
, the smoothing step can introduce some bias in the spectral density estimate. For spectral densities with large dynamic range, data filters have been proposed to pre-whiten the data prior to smoothing (Stein, 1995). Missing data pose a challenge for data filters, but filters can easily be applied to the imputed data at each iteration. In this subsection, we propose a parametric filtering method that we show in simulations is successful in reducing smoothing bias.
Let
be a parametric spectral density. The imputed-data Whittle likelihood approximation is
![]() |
Let
be the maximizer of
. Then update as
![]() |
As before, in practice we replace
with a sample average that can be computed efficiently. The completely nonparametric variant is a special case with
constant. Using the parametric step in the smoothing serves to flatten the periodogram, which we show in simulation studies is helpful for reducing smoothing bias. This allows for the use of wider smoothing kernels, which reduces variance as well.
The parametric Matérn covariance is a popular choice for modelling spatial data, and so we recommend using some form of the Matérn covariance for the parametric model. Guinness & Fuentes (2017) described a quasi-Matérn covariance, whose spectral density can be evaluated quickly without aliasing calculations. Based on their results, we recommend using the quasi-Matérn covariance in practice. A special case of it is explored in § 4.
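The effect of the parametric filter can be illustrated in one dimension. The sketch below assumes the update has the form "fitted parametric spectrum times the smoothed ratio of periodogram to fitted spectrum", uses a hypothetical one-parameter family with a profiled variance, and fits it by a grid search over the Whittle likelihood; none of these specific choices are taken from the paper.

```python
import numpy as np

def gaussian_kernel(m, bandwidth):
    j = np.arange(m)
    w = np.exp(-(np.minimum(j, m - j) / m) ** 2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def smooth(values, kernel):
    return np.real(np.fft.ifft(np.fft.fft(values) * np.fft.fft(kernel)))

def shape(omega, rho):
    """Hypothetical parametric family (first-order autoregressive shape)."""
    return 1.0 / (1.0 - 2.0 * rho * np.cos(2.0 * np.pi * omega) + rho ** 2)

def whittle_loglik(pgram, spec):
    return -np.sum(np.log(spec) + pgram / spec)

def fit_whittle(pgram, omega, rhos):
    """Grid search over rho, with the variance profiled out in closed form."""
    fits = []
    for rho in rhos:
        g = shape(omega, rho)
        fits.append(np.mean(pgram / g) * g)
    return max(fits, key=lambda s: whittle_loglik(pgram, s))

def filtered_update(pgram, f_theta, kernel):
    """Smooth the flattened periodogram, then restore the parametric shape."""
    return f_theta * smooth(pgram / f_theta, kernel)

m = 64
omega = np.arange(m) / m
rhos = np.linspace(0.0, 0.9, 10)
kernel = gaussian_kernel(m, bandwidth=0.05)
pgram = 2.0 * shape(omega, rhos[6])               # idealized, noise-free "periodogram"

f_theta = fit_whittle(pgram, omega, rhos)
plain = smooth(pgram, kernel)                     # nonparametric update
filtered = filtered_update(pgram, f_theta, kernel)
print(np.max(np.abs(plain - pgram)), np.max(np.abs(filtered - pgram)))
```

In this idealized case the ratio of periodogram to fitted spectrum is constant, so smoothing leaves it unchanged and the filtered update has no smoothing bias, whereas the unfiltered update flattens the peak at the origin.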
3. Theory
This section studies bias in the imputed-data periodogram and correlation in the imputed-data discrete Fourier transform vector. We use the notation that
is the true spectrum and
is a spectrum to be used for imputation. The theorem should be interpreted as a statement about how the discrete Fourier transform vector behaves given a particular imputation spectrum, not about the iterative procedure itself. Section 4 contains a numerical exploration of the iterative procedure, and § 6 discusses issues related to the theoretical study of the iterative procedure.
Let
, without parentheses, be the covariance matrix for
under the periodic covariance function
in (4) with spectrum
. Partition
as
, so that
and
are the covariance matrices for
and
, respectively. Let
denote the covariance matrix for
under the true nonperiodic covariance function
in (1). Note that
is
, while
is
. Define
to be the covariance matrix for
under periodic covariance function
with spectrum
, and define
,
, and
accordingly. Throughout, we assume that both
and
are bounded above and below by positive constants. If
is a periodic conditional simulation given observations
under
, then the true covariance matrix for
is
![]() |
(6) |
The matrix
is a key object of study, and it is of interest to understand its Fourier spectrum. To this end, define the
vector
to have entries
, where
is a Fourier frequency and
, with the entries of
ordered as they are in
. Define
![]() |
where
is complex conjugate and
is conjugate transpose, so that
is the Fourier spectrum of
from which we construct our estimates of the spectrum. Likewise, we define
if
and
if
. This notation is useful for succinct theorem statements and reflects the fact that the true bispectrum is zero off the diagonal for stationary models. It is of interest to study
, which for
corresponds to the bias of the periodogram and for
measures dependence in the periodogram, both of which should ideally be near zero.
The difference
will be exactly zero for every
and
if and only if
due to the uniqueness of the Fourier transform. Inspection of (6) suggests that
approaches
if both
and
approach
. The entries of
come from the true covariance function
, and the entries of
come from the periodic covariance function
. To see when
approaches
, consider the multi-dimensional Poisson summation formula,
![]() |
where
is the elementwise product
. This says that
, and thus
, approaches zero whenever
decays quickly enough, which can be ensured by placing smoothness conditions on the spectrum. We now state the main result.
Theorem 1.
Let
have
continuous partial derivatives,
, and
for
. Define
. Then for every
,
meaning that the difference contains two terms with the respective rates.
The first term in the rate derives from the decay of the covariances
. This term decays quickly with
when the spectrum is smooth and the dimension of the domain is small. The second term concerns the proportion of missing values relative to the number of observed values, which, when small, overwhelms the fact that
. The assumptions about how the observation grid grows with
are standard assumptions that ensure that each dimension grows at the same rate with
. When the spectrum is smooth enough, the first decay rate is better than the usual
(Guyon, 1982) or even
rate for the bias of the non-imputed periodograms. The proof is given in the Appendix along with intermediate results that assume a correct imputation spectrum.
The implication of the theorem is that when
is large enough and
is small enough, we can initialize the iterative algorithm with any estimate of the spectrum (e.g., Fuentes, 2007; Matsuda & Yajima, 2009), and one step in the iterative algorithm will decrease the bias relative to the initialized estimate. The theorem does not make any claims about convergence of the iterative algorithm; these issues are explored numerically in § 4.
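The Poisson summation relation used in this section can be checked numerically in one dimension. The sketch assumes a hypothetical covariance $K(h) = \rho^{|h|}$ with $\rho = 0.8$, whose spectral density is available in closed form, and compares the periodic covariance on lattices of increasing size with the wrapped sum of the true covariance.

```python
import numpy as np

rho = 0.8                                         # assumed correlation parameter

def true_cov(h):
    return rho ** np.abs(h)

def true_spec(omega):
    return (1 - rho ** 2) / (1 - 2 * rho * np.cos(2 * np.pi * omega) + rho ** 2)

def periodic_cov(m):
    """Periodic covariance on a lattice of size m, from the spectrum at the Fourier frequencies."""
    return np.real(np.fft.ifft(true_spec(np.arange(m) / m)))

def wrapped_cov(m, n_wraps=200):
    """Truncated version of the sum over k of K(h + k*m)."""
    h = np.arange(m)
    k = np.arange(-n_wraps, n_wraps + 1)
    return true_cov(h[:, None] + k[None, :] * m).sum(axis=1)

gaps = {}
for m in (16, 32, 64):
    r = periodic_cov(m)
    matches = np.allclose(r, wrapped_cov(m))      # Poisson summation identity
    gaps[m] = np.max(np.abs(r[: m // 2] - true_cov(np.arange(m // 2))))
    print(m, matches, gaps[m])
```

The gap between the periodic and the true covariance shrinks as the lattice grows, reflecting the decay of the covariance function at large lags.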
4. Numerical studies and simulations
To provide more insight into the behaviour of the proposed estimation methods, we present a numerical study analysing the bispectrum of the imputed data and simulation results comparing the proposed estimators to other spectral density estimators. The numerical study involves calculations of the bispectrum from covariance matrices and thus involves no simulated data. In the simulation study, we estimate the spectral densities on simulated datasets, which allows us to study sampling variability and the effect of smoothing on the estimated spectral densities. In both the numerical study and the simulation study, we consider data on square grids under three missingness settings as shown in Fig. 2. The first setting has 30% scattered missing values. The second setting has a missing block in the centre of the grid, with roughly 30% of the total missing. The third setting has no missing values.
Fig. 2.
Example realizations from the three missingness settings, with missing values in white.
In the numerical study, we assume that the true covariance function is
, with data on a
grid. Let
be the bispectrum of
, that is,
![]() |
Then for
, let
, where
![]() |
This numerical study mirrors a setting where we initialize the iterative procedure with the periodogram of the non-imputed data. This is repeated for four values of expansion factor
. We quantify the error in the bispectrum with an integrated normalized squared bias
![]() |
The results for the integrated normalized squared bias are shown in Table 1. The column for iteration 0 corresponds to bias in the non-imputed-data periodogram and has values that are quite large compared to the imputed-data periodograms, especially in Setting 1. Rows 1 and 5 correspond to imputation of missing values on the original data domain; row 9 has no missing values. We see that imputing missing values on the original domain offers some improvement. However, imputing on an expanded domain gives biases that are orders of magnitude smaller in many cases, and the biases decrease substantially in just a few iterations. It is also apparent that even a small amount of expansion lowers the bias; for example, expanding the domain by four pixels
gives biases near zero even though the spatial range parameter is twice as large as the domain expansion.
Table 1.
Integrated normalized squared bias under exponential covariance model, for three missingness settings, four expansion factors, including no expansion
, and zero to six iterations
| Setting | Expansion | Iteration 0 | Iteration 1 | Iteration 2 | Iteration 3 | Iteration 4 | Iteration 5 | Iteration 6 |
|---|---|---|---|---|---|---|---|---|
| 1 | 32/32 | 757.6 | 9.584 | 5.946 | 5.457 | 5.457 | 5.516 | 5.562 |
| 1 | 34/32 | 866.4 | 5.077 | 1.181 | 0.406 | 0.230 | 0.185 | 0.173 |
| 1 | 36/32 | 971.6 | 5.663 | 1.466 | 0.432 | 0.152 | 0.069 | 0.043 |
| 1 | 38/32 | 1083.3 | 6.332 | 1.933 | 0.638 | 0.228 | 0.090 | 0.040 |
| 2 | 32/32 | 27.20 | 8.622 | 8.305 | 8.201 | 8.161 | 8.144 | 8.136 |
| 2 | 34/32 | 24.87 | 0.613 | 0.279 | 0.222 | 0.210 | 0.206 | 0.206 |
| 2 | 36/32 | 27.99 | 0.494 | 0.133 | 0.059 | 0.040 | 0.035 | 0.033 |
| 2 | 38/32 | 31.40 | 0.531 | 0.146 | 0.052 | 0.024 | 0.015 | 0.011 |
| 3 | 32/32 | 7.990 | 7.990 | 7.990 | 7.990 | 7.990 | 7.990 | 7.990 |
| 3 | 34/32 | 5.489 | 0.231 | 0.201 | 0.200 | 0.200 | 0.200 | 0.200 |
| 3 | 36/32 | 6.297 | 0.083 | 0.035 | 0.031 | 0.031 | 0.031 | 0.031 |
| 3 | 38/32 | 7.180 | 0.079 | 0.016 | 0.010 | 0.009 | 0.009 | 0.009 |
In the simulation study, we use an
grid in Settings 1 and 2, and a
grid in Setting 3. Data are generated from a zero-mean Gaussian process model with Matérn covariance function
![]() |
with three different choices of smoothness parameter
, range parameter 8, and variance parameter 2.
We consider several methods for estimating the spectral densities. The first method uses a smoothed periodogram computed from the discrete Fourier transform of the sampled data, scaled by the number of observations
. This method is described in § 1 and is the approach suggested by Fuentes (2007), Matsuda & Yajima (2009), Bandyopadhyay & Lahiri (2009), Bandyopadhyay et al. (2015), Deb et al. (2017), and Subba Rao (2018). The second method uses a periodogram computed from tapered data. We define one-dimensional cosine tapers
and
applied to 5% of the observations on each of the two edges, and the taper function is the outer product
. In Setting 1, the taper function is set to zero whenever there is a missing value. In Setting 2, which includes a square of missing values in the centre, we also taper the interior observations. The periodograms of tapered data are normalized by the sum of the squares of the taper function. Additionally, we consider the Lee & Zhu (2009) estimator described in § 2, i.e., nonperiodic imputation, and variants of their method that use lattice expansion and/or parametric filters. Using a nonperiodic embedding method allows us to separate the effect of using a larger lattice from the effect of imputing periodically.
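A tapered periodogram of the kind described above can be sketched as follows. The exact taper formula is not recoverable from the text here, so the raised-cosine ramp, the `frac` parameter and the function names are assumptions; the sketch covers edge tapering, zeroing at missing values and normalization by the sum of squared taper weights, but not the additional interior tapering used in Setting 2.

```python
import numpy as np

def cosine_taper(n, frac=0.05):
    """One-dimensional taper: raised-cosine ramps over a fraction `frac` of each edge."""
    w = np.ones(n)
    k = int(np.floor(frac * n))
    if k > 0:
        ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(k) + 0.5) / k))
        w[:k] = ramp
        w[n - k:] = ramp[::-1]
    return w

def tapered_periodogram(y, observed, frac=0.05):
    """Periodogram of tapered two-dimensional data, normalized by the sum of squared taper weights."""
    taper = np.outer(cosine_taper(y.shape[0], frac), cosine_taper(y.shape[1], frac))
    taper = np.where(observed, taper, 0.0)        # taper is zero wherever a value is missing
    z = np.where(observed, y, 0.0) * taper
    return np.abs(np.fft.fft2(z)) ** 2 / np.sum(taper ** 2)

rng = np.random.default_rng(0)
y = rng.standard_normal((40, 40))
observed = rng.random((40, 40)) > 0.3
pg = tapered_periodogram(y, observed)
```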
For the imputation-based methods proposed in this paper, we consider lattice expansion factors
. We also consider two settings for the use of a parametric filter, the first being no filter, and the second with a filter of the form
![]() |
where
. This choice for the parametric model is a member of the quasi-Matérn family (Guinness & Fuentes, 2017) and is deliberately misspecified for the two cases
and
. Lindgren et al. (2011) showed that this model can approximate the Matérn covariance with smoothness parameter equal to 1.
All of the imputation-based estimation methods use
conditional simulations,
burn-in iterations, and convergence criterion
. The estimate from the
th dataset is denoted by
. All methods use a Gaussian smoothing kernel proportional to
, where the distance
is defined periodically on the domain
. We consider two metrics for evaluating the estimation methods. The first is a relative bias
![]() |
where
is the total number of simulated datasets and
is the true spectrum. The second metric is a mean relative squared error
![]() |
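The two metrics can be computed directly from an array of estimates. The sketch below assumes that the relative bias is the average over datasets of the relative error at each frequency, and that the mean relative squared error is the average of its square; the root of the frequency average is included as one possible single-number summary. Function names and the toy numbers are illustrative only.

```python
import numpy as np

def relative_bias(estimates, truth):
    """Average over datasets of (estimate - truth) / truth, per frequency."""
    return np.mean((estimates - truth) / truth, axis=0)

def mean_relative_squared_error(estimates, truth):
    """Average over datasets of the squared relative error, per frequency."""
    return np.mean(((estimates - truth) / truth) ** 2, axis=0)

def root_integrated_mrse(estimates, truth):
    """Root of the frequency average of the mean relative squared error."""
    return np.sqrt(np.mean(mean_relative_squared_error(estimates, truth)))

truth = np.array([1.0, 2.0, 4.0])                 # toy "true spectrum" at three frequencies
estimates = np.array([[1.5, 2.0, 4.0],            # estimate from dataset 1
                      [0.5, 2.0, 8.0]])           # estimate from dataset 2
print(relative_bias(estimates, truth))            # [0.  0.  0.5]
print(mean_relative_squared_error(estimates, truth))  # [0.25 0.   0.5 ]
print(root_integrated_mrse(estimates, truth))     # 0.5
```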
To evaluate relative bias on an equal footing, we compare all methods using a small value of
. Figure 3 contains plots of the relative bias for the nontapered and tapered methods, and for the nonfiltered and filtered periodic and nonperiodic embedding methods with
. Results for
are shown and results for larger values of
are similar. In Setting 1, the nontapered and tapered methods have a very large relative bias at almost every frequency. They estimate far too much power at higher frequencies, due to the fact that imputing with zeros produces fields that are rougher than the underlying process. In contrast, the periodic embedding methods have small bias. In Setting 2, the nontapered and tapered biases improve, but are still larger than the periodic embedding biases, especially for low frequencies. The relative biases for nontapered and tapered methods are similar in Setting 3 and are still larger than the periodic embedding relative biases. Though not shown here, the biases for
and
are similar. The parametric filters serve to reduce the bias compared to not filtering. The periodic embedding methods have a small bias near
; based on the accuracies shown in the numerical studies, this bias is likely due to smoothing bias because of the sharply peaked spectra near the origin. Imputing nonperiodically does not substantially improve the bias in Settings 2 and 3. It does improve bias in Setting 1, but it is not as effective as periodic embedding.
Fig. 3.
Relative bias as a function of frequency for the three missingness settings under
and six estimation settings: (a) not tapered; (b) tapered; (c)
, no filter, not periodic; (d)
, parametric filter, not periodic; (e)
, no filter, periodic; (f)
, parametric filter, periodic.
To evaluate mean relative squared error on an equal footing, all methods were computed with a range of choices for
; the reported results are for the value of
that minimized
![]() |
the root integrated mean relative squared error over all Fourier frequencies. Table 2 contains root integrated mean relative squared error results for the various methods. The periodic embedding methods with
are more accurate than both the nontapered and the tapered periodogram estimates in every case. In Setting 1, the nontapered and tapered estimates are quite poor, likely due to the large biases seen in Fig. 3. For periodic embedding, we see that the values improve when
but do not improve beyond
. This is consistent with the numerical studies that showed a small amount of periodic embedding was sufficient. Filtering provides a further improvement, reducing the values by 30–40%. In Setting 2, the nontapered and tapered estimates improve substantially, and the periodic embedding methods offer further improvement. Imputing missing values is an improvement, but imputing periodically always gives better results than imputing nonperiodically. This can be seen by comparing the
results to the
results and by comparing the periodic to the nonperiodic imputation results. In Setting 3, the parametric filter performs similarly to tapering, but periodic embedding with parametric filtering is by far the most accurate method when
.
Table 2.
Root integrated mean relative squared error results
| impute - filter - periodic | Setting 1 | Setting 2 | Setting 3 | Setting 1 | Setting 2 | Setting 3 | Setting 1 | Setting 2 | Setting 3 |
|---|---|---|---|---|---|---|---|---|---|
| no - no - no | 3.495 | 0.560 | 0.478 | 32.11 | 3.145 | 2.299 | 257.3 | 17.03 | 15.36 |
| no - taper - no | 3.498 | 0.291 | 0.342 | 31.83 | 0.472 | 0.900 | 255.8 | 0.920 | 3.735 |
| - no - no | 0.389 | 0.423 | 0.462 | 1.913 | 2.093 | 2.294 | 9.107 | 12.49 | 15.36 |
| - no - no | 0.362 | 0.379 | 0.412 | 1.734 | 1.930 | 2.097 | 8.691 | 11.05 | 12.28 |
| - no - no | 0.397 | 0.402 | 0.439 | 1.858 | 2.098 | 2.313 | 9.669 | 11.88 | 13.35 |
| - yes - no | 0.313 | 0.320 | 0.323 | 1.168 | 1.197 | 1.559 | 6.138 | 8.280 | 8.803 |
| - yes - no | 0.284 | 0.296 | 0.296 | 1.001 | 1.041 | 1.260 | 5.775 | 6.469 | 7.511 |
| - yes - no | 0.296 | 0.312 | 0.309 | 1.022 | 1.062 | 1.266 | 5.689 | 7.443 | 7.587 |
| - no - yes | 0.367 | 0.423 | 0.462 | 1.684 | 2.092 | 2.294 | 8.418 | 12.48 | 15.36 |
| - no - yes | 0.208 | 0.219 | 0.259 | 0.238 | 0.268 | 0.326 | 0.288 | 0.297 | 0.382 |
| - no - yes | 0.205 | 0.223 | 0.262 | 0.247 | 0.256 | 0.309 | 0.281 | 0.280 | 0.353 |
| - yes - yes | 0.253 | 0.319 | 0.323 | 0.908 | 1.195 | 1.559 | 5.348 | 8.266 | 8.803 |
| - yes - yes | 0.136 | 0.145 | 0.153 | 0.096 | 0.088 | 0.108 | 0.141 | 0.141 | 0.166 |
| - yes - yes | 0.133 | 0.143 | 0.153 | 0.097 | 0.091 | 0.109 | 0.142 | 0.137 | 0.156 |
5. Application to satellite data
To illustrate the practical usefulness of the proposed methods, we analyse a gridded land surface temperature dataset. These data were used recently in Heaton et al. (2018), a study comparing various Gaussian process approximations. The data were originally collected by the Moderate Resolution Imaging Spectrometer on board the NASA Terra Satellite. The region is a grid of 500 by 300 locations in the latitudinal range of
to
and longitudinal range of
to
, roughly 450 km by 300 km with grid spacing 1100 m in the north/south direction and 900 m in the east/west direction. The values in the dataset represent land surface temperature in degrees Celsius. The dataset has
nonmissing values, which are plotted in the top left panel of Fig. 4. We can see that there is a distinct trend from the southeast to the northwest corner, so we include a linear trend in the mean function, estimated by generalized least squares.
Fig. 4.
Original data, predictions, standard deviations, and three conditional simulations of the missing values.
We have found that
is a reasonable convergence tolerance criterion, and we choose
burn-in iterations. We use a crossvalidation procedure to choose the smoothing parameter. A random subset of 30% of the data is held out; the iterative methods are run with a range of smoothing parameters, and the parameter that minimizes the sum of squared prediction errors is chosen.
In Fig. 4, we plot the original data, the conditional expectation, an estimate of the conditional standard deviations, and three conditional simulation plots. The conditional standard deviations are estimated by computing 30 conditional simulations and finding the root mean squared difference between the conditional expectation and each of the conditional simulations at each pixel. On average, each conditional simulation took just 2.76 seconds and converged in 25 iterations with the Vecchia preconditioner, and took 15.48 seconds and converged in 159 iterations with the inverse spectrum preconditioner. The iterative spectrum estimation method took 4.86 minutes to converge. While these timings indicate that the analysis is feasible on a large dataset, a zero-infill method is much faster, taking just 0.06 seconds. All timings were carried out on a 2016 MacBook Pro with a 3.3 GHz Intel Core i7 dual-core processor and 16 GB memory, running R 3.4.2 linked to Apple’s Accelerate BLAS libraries.
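The standard deviation estimate described above reduces to a pixelwise root mean square. A minimal sketch, with a hypothetical function name and toy inputs in place of the 30 conditional simulations:

```python
import numpy as np

def conditional_sd(cond_mean, sims):
    """Pixelwise root mean squared difference between the conditional
    expectation and each conditional simulation."""
    sims = np.asarray(sims, dtype=float)
    return np.sqrt(np.mean((sims - cond_mean) ** 2, axis=0))

cond_mean = np.zeros((2, 2))                      # toy conditional expectation
sims = [np.full((2, 2), 3.0), np.full((2, 2), -4.0)]  # two toy conditional simulations
sd = conditional_sd(cond_mean, sims)
print(sd[0, 0])                                   # sqrt((9 + 16) / 2)
```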
Visually, the data appear to have a longer correlation length scale in the northeast-southwest direction than in the southeast-northwest direction. The estimate of the spectrum returned by the iterative method confirms our visual suspicions, as can be seen in Fig. 5 where the logarithm of the estimated spectrum is plotted. The estimated spectrum shows clear signs of anisotropy in that the spectrum has contours that are not circular. Maximum likelihood estimation of anisotropic models is generally difficult due to optimization over additional parameters. In contrast, the nonparametric spectral density estimation methods automatically estimated the anisotropies with no extra computational effort.
Fig. 5.

Log base 10 of the spectral density estimate.
The spectral methods described in this paper were included in the Heaton et al. (2018) comparison project, where they compared favourably to all of the other methods on all of the prediction and timing metrics and were the best-performing methods for the interval score metric (Gneiting & Raftery, 2007), which rewards forecasts that come with narrow prediction intervals that often contain the predictand. To gain some intuition for this result, we report some results for
prediction intervals based on a Gaussian assumption. In particular, we sort the predictions
to be increasing in the prediction standard deviation, and then report average prediction standard deviations for
for various ranges of the indices
and
. The results from the periodic spectral methods are compared in Table 3 to predictions that use an isotropic Matérn covariance model, with parameters estimated via Vecchia’s approximation (Vecchia, 1988), as implemented in the
package
(Guinness, 2018; Guinness & Katzfuss, 2018; R Development Core Team, 2019). Vecchia’s approximation applies to parametric models and to both gridded and nongridded data. We can see that while the two methods do not differ substantially for predictions that the model expects to be uncertain, the periodic spectral methods produce smaller prediction intervals and smaller root mean squared prediction errors when the model expects small prediction errors. This is achieved with coverage rates that are larger than those produced by Vecchia’s approximation with an isotropic model.
Table 3.
Average prediction standard deviation and coverages for the specified range of predicted values, with the predicted values sorted according to the fitted models’ prediction standard deviations. In other words, the first column corresponds to prediction results for the 500 predictions that the model expects to be most certain, and the last column corresponds to the predictions expected to be most uncertain
| | Index range | 1–500 | 501–1000 | 1001–2000 | 2001–10000 | 10001–20000 | 20000–44431 |
|---|---|---|---|---|---|---|---|
| Periodic Spectral | Avg Pred SD | 0.365 | 0.427 | 0.482 | 0.694 | 1.164 | 1.88 |
| | Std. Dev. | 0.414 | 0.477 | 0.554 | 0.686 | 1.078 | 2.209 |
| | 80% Coverage | 81.14 | 82.06 | 83.15 | 84.96 | 85.34 | 73.53 |
| | 90% Coverage | 86.56 | 89.23 | 89.29 | 91.35 | 92.61 | 84.59 |
| | 95% Coverage | 91.47 | 92.82 | 93.08 | 94.72 | 95.99 | 91.19 |
| Vecchia | Avg Pred SD | 0.501 | 0.548 | 0.585 | 0.749 | 1.198 | 1.876 |
| | Std. Dev. | 0.503 | 0.58 | 0.538 | 0.718 | 1.094 | 2.201 |
| | 80% Coverage | 74.88 | 78.55 | 77.23 | 80.71 | 82.05 | 61.12 |
| | 90% Coverage | 84.88 | 87.65 | 87.15 | 88.58 | 90.66 | 74.85 |
| | 95% Coverage | 89.02 | 91.84 | 90.87 | 92.39 | 94.47 | 83.87 |
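A summary of the kind reported in Table 3 could be computed along the following lines. This is a sketch under assumptions: the function and key names are hypothetical, the `rmse` entry is our reading of the root mean squared prediction error mentioned in the text, and the four-point toy data are invented. The `interval_score` function follows the definition in Gneiting & Raftery (2007) for a central prediction interval.

```python
import math
from statistics import NormalDist


def interval_score(lower, upper, truth, alpha):
    """Interval score for a central (1 - alpha) prediction interval:
    width plus a penalty of 2/alpha times the distance by which it misses."""
    score = upper - lower
    if truth < lower:
        score += (2.0 / alpha) * (lower - truth)
    elif truth > upper:
        score += (2.0 / alpha) * (truth - upper)
    return score


def summarize_by_sd_rank(preds, sds, truths, index_ranges,
                         levels=(0.80, 0.90, 0.95)):
    """Sort predictions by increasing prediction SD, then summarize each
    1-based inclusive (first, last) range of ranks."""
    order = sorted(range(len(preds)), key=lambda i: sds[i])
    rows = []
    for first, last in index_ranges:
        idx = order[first - 1:last]
        errors = [truths[i] - preds[i] for i in idx]
        row = {
            "range": (first, last),
            "avg_pred_sd": sum(sds[i] for i in idx) / len(idx),
            "rmse": math.sqrt(sum(e * e for e in errors) / len(idx)),
        }
        for level in levels:
            # Half-width multiplier of a Gaussian interval at this level.
            z = NormalDist().inv_cdf(0.5 + level / 2.0)
            hits = sum(abs(e) <= z * sds[i] for e, i in zip(errors, idx))
            row[f"coverage_{int(round(level * 100))}"] = 100.0 * hits / len(idx)
        rows.append(row)
    return rows


rows = summarize_by_sd_rank(
    preds=[0.0, 0.0, 0.0, 0.0],
    sds=[1.0, 2.0, 0.5, 4.0],
    truths=[0.5, 5.0, 0.1, -1.0],
    index_ranges=[(1, 2), (3, 4)],
)
```

Sorting by the model's own prediction standard deviation separates the predictions it regards as easy from those it regards as hard, which is what allows the two methods to be compared range by range.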
6. Discussion
The methods involve choosing the factor by which the lattice should be expanded. We have found that even very small factors that expand the lattice by a few pixels are effective at improving the spectral density estimates. We recommend expanding each dimension by an amount roughly equal to the correlation range in the data. The fact that we expand the lattice in the positive direction, rather than in the negative direction or both positive and negative directions, is not important since we assume a periodic model on the expanded lattice. As with most nonparametric spectral density estimates, the methods involve the choice of a smoothing parameter. We have not attempted to provide any new methods for selecting smoothing parameters, as this issue has been well-studied in the literature. However, the parametric filtering methods serve to flatten the periodogram, which makes the estimates less sensitive to the choice of smoothing parameter. In our application to land surface temperature data, we used a crossvalidation procedure to select the smoothing parameter. Though we have chosen
imputation per iteration in every example, the methods allow for
. We suspect that choosing
would drive the iterative methods to converge in fewer iterations but incur a higher computational cost per iteration. Examining the details of this trade-off would be an interesting study. It may be advantageous to use
if the conditional simulations can be computed in parallel.
While many large datasets involve spatially gridded observations, we acknowledge that there is also a need for methods for analysing nongridded data. The nonparametric methods described in this paper may prove useful for analysing nongridded data as well; in fact, Nychka et al. (2015) have a framework for analysing nongridded data that includes a lattice process as a model component. Here we have considered stationary models, which can also be used as components in nonstationary models (Fuentes, 2002), and so the methods developed here could potentially be extended to local nonparametric estimation of nonstationary models. The paper contains some theoretical results about the iterative procedure, but proving that the iterative algorithm converges remains elusive, partly due to pathological cases in the observed vector
. Establishing convergence is an important area of future work.
Acknowledgement
This research was supported by the U.S. National Science Foundation Division of Mathematical Sciences and the National Institute of Environmental Health Sciences.
Appendix
Circulant embedding and inverse spectrum preconditioner
To see how the conditional simulations of
given
can be computed efficiently, define
to be the covariance matrix for
under covariance function
, and partition
as
![]() |
where
and
are the covariance matrices for the observations
and missing values
, respectively. The conditional expectation for
given
is
. The most demanding computational step for obtaining
is solving the linear system
. Preconditioned conjugate gradient methods for solving linear systems (Greenbaum, 1997) are efficient when the forward multiplication
can be computed efficiently and when we can find a matrix
, called the preconditioner, for which
and for which
can be computed efficiently. Below, we describe how circulant embedding can be used to compute the forward multiplication
efficiently. In practice, we have found that a preconditioner based on Vecchia’s Gaussian process approximation (Vecchia, 1988) is effective and fast for the problems we have studied. This preconditioner was proposed in Stroud et al. (2017). At the end of this section, we give details about another preconditioner based on a submatrix of the inverse of
.
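For readers unfamiliar with the iteration, a minimal sketch of preconditioned conjugate gradient for a symmetric positive definite system is given here. It is not the implementation used in the paper: the function names are hypothetical, the matrix is a small dense toy example, and the diagonal (Jacobi) preconditioner is a stand-in for the Vecchia and inverse spectrum preconditioners discussed in the text. The solver needs only two callables, one for the forward multiplication and one for applying the preconditioner.

```python
def pcg(matvec, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite system.

    matvec(x) returns A x; precond(r) returns an approximation to A^{-1} r.
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    b_norm = sum(bi * bi for bi in b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                       # residual b - A x at the start x = 0
    z = precond(r)
    p = list(z)                       # first search direction
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for iteration in range(1, max_iter + 1):
        Ap = matvec(p)
        step = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # toy SPD matrix
b = [1.0, 2.0, 3.0]


def dense_matvec(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def jacobi(r):
    # Stand-in preconditioner: divide by the diagonal of A.
    return [ri / A[i][i] for i, ri in enumerate(r)]


solution, n_iter = pcg(dense_matvec, b, jacobi)
```

Because the matrix enters only through `matvec`, a fast structured multiplication can be substituted without changing the solver.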
Suppose that
is an
nested block circulant matrix. Nested block circulant includes the special cases of circulant, arising from a periodic and stationary covariance in one dimension, and block circulant with circulant blocks, arising from a periodic and stationary covariance in two dimensions. The matrix
can be written as
, where
is the discrete Fourier transform matrix and
is a diagonal matrix with the eigenvalues on the diagonal. Because of the discrete Fourier matrix representation, one can multiply
in
time and
memory by taking the discrete Fourier transform of
, i.e.,
in
, then multiplying the entries of the resultant vector pointwise by the eigenvalues in
, and then taking an inverse discrete Fourier transform of the result.
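The three steps above (transform, pointwise multiplication by the eigenvalues, inverse transform) can be sketched in one dimension as follows. The example rests on the standard fact that the eigenvalues of a circulant matrix are the discrete Fourier transform of its first column. The function names are hypothetical, and the recursive radix-2 transform is included only to keep the example self-contained; it assumes a power-of-two length, and in practice a library fast Fourier transform would be used, with a multi-dimensional transform for the nested block circulant case.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; the length must be a power of two.
    The inverse transform is left unscaled (the caller divides by n)."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_matvec(first_column, x):
    """Multiply the circulant matrix with the given first column by x."""
    n = len(x)
    eigenvalues = fft(first_column)            # spectrum of the circulant matrix
    x_hat = fft(x)                             # discrete Fourier transform of x
    product_hat = [lam * xh for lam, xh in zip(eigenvalues, x_hat)]
    return [v.real / n for v in fft(product_hat, inverse=True)]


first_column = [2.0, 0.5, 0.1, 0.5]   # toy periodic covariance on 4 points
x = [1.0, 2.0, 3.0, 4.0]
result = circulant_matvec(first_column, x)
```

The matrix itself is never formed: only its first column is stored, which is what keeps the memory requirement linear in the number of lattice points.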
The multiplication
can be computed efficiently by embedding the multiplication inside of
![]() |
Then the appropriate entries
can be extracted, and the unnecessary entries
can be discarded. Note that
is not nested block circulant, but there exists a row-column permutation of
that is nested block circulant. Let
denote the permutation matrix such that
is nested block circulant. Then the multiplication can be performed as
![]() |
Thus, the multiplication can be carried out by an appropriate reordering of
in
time, then an
-time multiplication by nested block circulant
, and then an
-time reordering of the result.
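The embedding and reordering just described can be illustrated in one dimension. The sketch pads the observed vector with zeros at the missing locations, applies a circulant product on the full periodic lattice, and keeps only the entries at the observed locations. All names are hypothetical, and `circulant_product_direct` is a deliberately slow quadratic-time stand-in used only so that the example is self-contained; a fast Fourier transform-based product would replace it in practice.

```python
def circulant_product_direct(first_column, x):
    """Quadratic-time stand-in for a fast circulant product."""
    n = len(x)
    return [sum(first_column[(j - k) % n] * x[k] for k in range(n))
            for j in range(n)]


def embedded_matvec(first_column, observed_idx, u,
                    circ_mult=circulant_product_direct):
    """Multiply u by the submatrix of the circulant covariance matrix
    whose rows and columns correspond to the observed lattice locations."""
    n = len(first_column)
    padded = [0.0] * n
    # Reordering step: place u at the observed sites, zeros at missing sites.
    for position, value in zip(observed_idx, u):
        padded[position] = value
    full = circ_mult(first_column, padded)
    # Keep the entries at observed sites and discard the rest.
    return [full[position] for position in observed_idx]


first_column = [2.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.1, 0.5]   # periodic on 8 points
observed_idx = [0, 1, 3, 4]                               # observed locations
u = [1.0, -1.0, 2.0, 0.5]
Au = embedded_matvec(first_column, observed_idx, u)
```

When the data are stored in lattice order, the permutation reduces to the scatter and gather operations shown, each of which touches every entry once.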
The preconditioner
is a submatrix of
. Here, we describe how the multiplication
can be performed efficiently without computing the entries of
. The inverse of
is a permutation of a nested block circulant matrix and can be written as
![]() |
This means that the multiplication
can be embedded in the larger multiplication
![]() |
and the multiplication can be carried out in
time and
memory by a sequence of reorderings, discrete Fourier transforms, and pointwise multiplications.
Proofs of theoretical results
Lemma A1.
If
,
for all
, and
for all
, then for all
,
![]()
Proof.
We have
, and so
. The matrix
can be written as
It suffices to show that
in order to establish the result. According to the multi-dimensional Poisson summation formula, we can relate
and
by (Guinness & Fuentes, 2017, Lemma 1)
(A1) where
. For any
, the observation lattice, we have
for every
. Thus if
,
. Thus at least one element of
has absolute value greater than
when
, and so
for all
, implying that all terms in the sum in (A1) must be zero. This gives us
for any
, and so
. □
Lemma A2.
If
has
continuous partial derivatives,
,
, and
for
, then for all
,
Proof.
As in the proof of Lemma A1,
. Partitioning the vector
as
according to the same partition as
, we have
and so the difference can be bounded as
We will consider each term in turn. Let
denote the spectral radius of symmetric matrix
. Then
where in the second to last inequality, we used
and
is positive definite, so the largest eigenvalue of
is smaller than the largest eigenvalue of
, which is smaller than the largest eigenvalue of
,
.
The previous inequality did not depend on
, so it holds for
as well. To bound
, we use the fact that for symmetric matrices
, where
The third equality uses the multi-dimensional Poisson summation formula referenced in (A1). By assumption, for
and any
,
This is because
for at least one
. Define
, which is the embedding distance in the dimension with the smallest amount of embedding. By assumption, we have
This means that the sum does not contain any terms
for which
. Define the set
, which is a hollowed-out cube on
and has size
. Using this notation, the sum can be bounded as
Lemma 9.5 in Körner (1989) states that if
has
continuous partial derivatives on
, with maximum
th partial derivative
, then
for every
, and so we can use the bound
. This gives us an explicit bound
where
is a polynomial of degree
in
. Then we have
since the largest exponent in
is
. Combining this with
gives the desired result. □
Theorem A1.
Let
have
continuous partial derivatives, and assume the same conditions on the observation and embedding lattice as in Lemma A2. Define
. Then for all
,
meaning that the difference contains two terms with the respective rates.
Proof.
Define the matrix
as
which differs from
in that
in
is replaced by
in
. The difference
can be written as
(A2) The first term in (A2) is
This expression has a similar form to that which appears in the proof of Lemma A2. As before, we need bounds for
and
in order to bound
. The proof for the bound on
is identical to that in Lemma A2, and the proof for the bound on
is similar, although
is replaced by
, which does not change the overall result that the first term in (A2) is
.
To shorten the equations to follow, write
. The second term in (A2) is
Define the discrete Fourier transform matrix
to have
entry
, where
is a Fourier frequency in
and
is a location in
. Partition the discrete Fourier transform matrix
into rows for the observations and missing values as
. We have
, where
is diagonal with entries
. This gives
, and likewise
, where
is diagonal with entries
. Then
can be written as
Note that
. Since
is a row of
and
is the same row of
, we have
where
for the entry corresponding to
and
otherwise. This gives
We can see now that since
, there is a cancellation, giving
This cancellation is the key step. Using matrix norm inequalities, we have
Since
is of length
and has entries
,
. Clearly,
because of its definition, and
because both
and
are diagonal with diagonal entries holding
and
, respectively. This leaves
because
,
,
, and
. Thus, the squared 2-norm is 1 plus the largest eigenvalue of
, which is
with the last inequality following from the proof of Lemma A2. Bringing this all together gives
establishing the second term of the theorem. □
Supplementary material
The methods are implemented in an
package titled
available at
.
References
- Bandyopadhyay S. & Lahiri S. (2009). Asymptotic properties of discrete Fourier transforms for spatial data. Sankhyā 71, 221–59.
- Bandyopadhyay S., Lahiri S. N. & Nordman D. J. (2015). A frequency domain empirical likelihood method for irregularly spaced spatial data. Ann. Statist. 43, 519–45.
- Chow Y.-S. & Grenander U. (1985). A sieve method for the spectral density. Ann. Statist. 13, 998–1010.
- Dahlhaus R. & Künsch H. (1987). Edge effects and efficient parameter estimation for stationary random fields. Biometrika 74, 877–82.
- Deb S., Pourahmadi M. & Wu W. B. (2017). An asymptotic theory for spectral analysis of random fields. Electron. J. Statist. 11, 4297–322.
- Fuentes M. (2002). Spectral methods for nonstationary spatial processes. Biometrika 89, 197–210.
- Fuentes M. (2007). Approximate likelihood for large irregularly spaced spatial data. J. Am. Statist. Assoc. 102, 321–31.
- Gneiting T. & Raftery A. E. (2007). Strictly proper scoring rules, prediction, and estimation. J. Am. Statist. Assoc. 102, 359–78.
- Greenbaum A. (1997). Iterative Methods for Solving Linear Systems. Philadelphia: SIAM.
- Guinness J. (2018). Permutation and grouping methods for sharpening Gaussian process approximations. Technometrics 60, 415–29.
- Guinness J. & Fuentes M. (2017). Circulant embedding of approximate covariances for inference from Gaussian data on large lattices. J. Comp. Graph. Statist. 26, 88–97.
- Guinness J. & Katzfuss M. (2018). GpGp: Fast Gaussian Process Computation Using Vecchia’s Approximation. R package version 0.1.0, available at https://CRAN.R-project.org/package=GpGp.
- Guyon X. (1982). Parameter estimation for a stationary process on a d-dimensional lattice. Biometrika 69, 95–105.
- Heaton M. J., Datta A., Finley A., Furrer R., Guhaniyogi R., Gerber F., Gramacy R. B., Hammerling D., Katzfuss M., Lindgren F. et al. (2018). A case study competition among methods for analyzing large spatial data. arXiv:1710.05013v2.
- Heyde C. & Gay R. (1993). Smoothed periodogram asymptotics and estimation for processes and fields with possible long-range dependence. Stoch. Proces. Appl. 45, 169–82.
- Körner T. W. (1989). Fourier Analysis. Cambridge: Cambridge University Press.
- Lee T. C. (1997). A simple span selector for periodogram smoothing. Biometrika 84, 965–9.
- Lee T. C. (2001). A stabilized bandwidth selection method for kernel smoothing of the periodogram. Sig. Proces. 81, 419–30.
- Lee T. C. & Zhu Z. (2009). Nonparametric spectral density estimation with missing observations. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009 19–24 April, Taipei, Taiwan). Piscataway, New Jersey: IEEE, DOI: 10.1109/ICASSP.2009.4960265.
- Lim C. Y. & Stein M. (2008). Properties of spatial cross-periodograms using fixed-domain asymptotics. J. Mult. Anal. 99, 1962–84.
- Lindgren F., Rue H. & Lindström J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. J. R. Statist. Soc. B 73, 423–98.
- Little R. J. & Rubin D. B. (2014). Statistical Analysis with Missing Data. Hoboken, New Jersey: John Wiley & Sons.
- Matsuda Y. & Yajima Y. (2009). Fourier analysis of irregularly spaced data on R^d. J. R. Statist. Soc. B 71, 191–217.
- Nychka D., Bandyopadhyay S., Hammerling D., Lindgren F. & Sain S. (2015). A multiresolution Gaussian process model for the analysis of large spatial datasets. J. Comp. Graph. Statist. 24, 579–99.
- Ombao H. C., Raz J. A., Strawderman R. L. & Von Sachs R. (2001). A simple generalised crossvalidation method of span selection for periodogram smoothing. Biometrika 88, 1186–92.
- Pawitan Y. & O’Sullivan F. (1994). Nonparametric spectral density estimation using penalized Whittle likelihood. J. Am. Statist. Assoc. 89, 600–10.
- Politis D. N. & Romano J. P. (1995). Bias-corrected nonparametric spectral estimation. J. Time Ser. Anal. 16, 67–103.
- R Development Core Team (2019). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; ISBN 3-900051-07-0, http://www.R-project.org.
- Stein M. L. (1995). Fixed-domain asymptotics for spatial periodograms. J. Am. Statist. Assoc. 90, 1277–88.
- Stroud J. R., Stein M. L. & Lysen S. (2017). Bayesian and maximum likelihood estimation for Gaussian processes on an incomplete lattice. J. Comp. Graph. Statist. 26, 108–20.
- Subba Rao S. (2018). Statistical inference for spatial statistics defined in the Fourier domain. Ann. Statist. 46, 469–99.
- Sykulski A. M., Olhede S. C., Guillaumin A. P., Lilly J. M. & Early J. J. (2019). The debiased Whittle likelihood. Biometrika 106, 251–66.
- Tanner M. A. & Wong W. H. (1987). The calculation of posterior distributions by data augmentation. J. Am. Statist. Assoc. 82, 528–40.
- Vecchia A. V. (1988). Estimation and model identification for continuous spatial processes. J. R. Statist. Soc. B 50, 297–312.
- Wahba G. (1980). Automatic smoothing of the log periodogram. J. Am. Statist. Assoc. 75, 122–32.
- Whittle P. (1954). On stationary processes in the plane. Biometrika 41, 434–49.
- Zheng Y., Zhu J. & Roy A. (2009). Nonparametric Bayesian inference for the spectral density function of a random field. Biometrika 97, 238–45.