Summary
We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modelling the outcomes conditionally on the locations as Gaussian with a Gaussian process spatial random effect and adjustment for the location intensity process. We prove posterior propriety under an improper prior on the parameter controlling the degree of informative sampling, demonstrating that the data are informative. In addition, we show that the density of the locations and mean function of the outcome process can be estimated consistently under mild assumptions. The methods show significant evidence of informative sampling when applied to ozone data over Eastern U.S.A.
Keywords: Cox process, Gaussian process, Joint model, Point pattern, Posterior consistency, Preferential sampling
1. Introduction
Geostatistical models focus on inferring a continuous spatial process based on data observed at finitely many locations, with the locations typically assumed to be noninformative. As noted by Diggle et al. (2010), this assumption is commonly violated in point-referenced spatial data, as it is not unusual to collect data at locations thought to have a large or small value for the outcome. For example, in monitoring of air pollution, one may place more monitors at locations believed to have a high value of ozone or another pollutant, while in studying the distribution of animal species one may systematically look in locations thought to commonly contain the species of interest. Diggle et al. (2010) proposed a shared latent process model to adjust for bias due to informative sampling locations. Their analysis was implemented using a Monte Carlo approach for maximum likelihood estimation.
We follow a Bayesian approach using a model related to those described by R. Menezes in an unpublished 2005 Ph.D thesis from Universidad de Santiago de Compostela, Ho & Stoyan (2008) and Diggle et al. (2010). The locations are modelled using a log Gaussian Cox process (Møller et al., 2001), with the intensity function included as a spatially varying predictor in the outcome model, which also includes spatial random effects drawn from a Gaussian process. A parameter a controls the degree of informative sampling, and the sampling locations are ignorable in the special case in which a = 0, while a > 0 implies a tendency to take more observations at spatial locations having relatively high outcome values. This model modifies shared random effects models for joint modelling of longitudinal and event time data (Radcliffe et al., 2004) and for accommodating informative missingness (Wu & Follmann, 1999).
To our knowledge, we are the first to develop a Bayesian approach to the informative locations problem in geostatistical modelling. However, adapting recently proposed models to the Bayesian paradigm is relatively straightforward, and our primary contribution is studying the theoretical properties of the model. In particular, it is not obvious that the data contain information about the informativeness of the sampling locations, and one may wonder to what extent the prior is driving the results even in large samples. We address this concern by proving that the posterior is proper under a noninformative prior on a. In addition, one can consistently estimate a, the density of the sampling locations and the mean function of the outcome process. This result extends recent work showing posterior consistency in Gaussian process regression models (Choi & Schervish, 2007; Choi, 2007). Proofs are provided in the Appendix.
2. Model for spatial data with informative sampling
Our objective is to estimate the spatial surface μ(s) ∈ ℝ, for all s ∈ 𝒟 ⊂ ℝ², based on observations y1, . . . , yn at locations s1, . . . , sn ∈ 𝒟. We propose the joint model
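$$ s_i \sim p(s) = \frac{\exp\{\xi(s)\}}{\int_{\mathcal{D}} \exp\{\xi(t)\}\,dt}, \qquad y_i \mid s_i, \xi, \eta \sim N\{\eta(s_i) + a\,\xi(s_i),\ \sigma^2\}, \qquad i = 1, \ldots, n, \tag{1} $$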
where the observations are independent across locations si given ξ(s) and η(s), and p(s) is the location density. Assuming the locations are a realization of an inhomogeneous Poisson process with log intensity ξ(s), the mean surface is characterized as μ(s) = η(s) + aξ(s), where η(s) is a baseline surface and aξ(s) is an adjustment due to informative sampling. Letting x(s) denote a vector of spatial covariates, ξ(s) = x(s)T βξ + ξr (s) and η(s) = x(s)T βη + ηr (s), where βξ and βη are regression coefficients and ξr (s) and ηr (s) are zero-mean residual processes.
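A minimal sketch of how model (1) produces preferentially sampled data, assuming simple deterministic stand-ins for the surfaces: locations are drawn with probability proportional to exp{ξ(s)} over grid cells, and outcomes are drawn around η(s) + aξ(s). The functions `xi`, `eta` and `simulate`, the grid size and the parameter values are illustrative assumptions; the model itself uses Gaussian process surfaces.

```python
import math
import random

def xi(s):
    # Hypothetical log intensity: sampling favours a large first coordinate
    return 2.0 * s[0]

def eta(s):
    # Hypothetical baseline surface
    return math.sin(2.0 * math.pi * s[1])

def simulate(n, a, sigma, grid_size=40, seed=0):
    rng = random.Random(seed)
    # Cell centres of a regular grid on the unit square
    cells = [((i + 0.5) / grid_size, (j + 0.5) / grid_size)
             for i in range(grid_size) for j in range(grid_size)]
    weights = [math.exp(xi(s)) for s in cells]  # proportional to p(s)
    locations = rng.choices(cells, weights=weights, k=n)
    # Outcomes are Gaussian around mu(s) = eta(s) + a * xi(s)
    outcomes = [rng.gauss(eta(s) + a * xi(s), sigma) for s in locations]
    return cells, locations, outcomes

cells, locations, outcomes = simulate(n=2000, a=1.0, sigma=1.0)
grid_mean_xi = sum(xi(s) for s in cells) / len(cells)
sample_mean_xi = sum(xi(s) for s in locations) / len(locations)
```

With a > 0 the sampled locations over-represent cells where ξ(s), and hence μ(s), is large, so a naive average of the outcomes would overstate the spatial average of μ(s).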
The log sampling density is treated as a latent covariate to adjust for informative sampling, with a > 0 implying that samples are more likely to be taken in areas with a large response. Setting the coefficient in βξ corresponding to the intercept to zero for identifiability,
$$ \mu(s) = x(s)^{T}\beta^{*} + a\,\xi_r(s) + \eta_r(s), \tag{2} $$
where β* = aβξ + βη. Therefore, accounting for informative sampling is only necessary when there is an association between the spatial surface of interest and the sampling density that cannot be explained by the shared spatial covariates x(s).
The residuals ξr (s) ∼ Πξr and ηr(s) ∼ Πηr are assigned independent zero-mean Gaussian process priors with Matérn covariance functions (Stein, 1999),
(3) |
where ψ = (τ², ρ, ν) and 𝒦 denotes the modified Bessel function of the second kind. The Matérn covariance has three parameters: τ² > 0 controls the variance, ρ > 0 controls the spatial range of the correlation and ν > 0 controls the smoothness of the process. Special cases include the exponential c(h | ψ) = τ² exp(−2^{1/2}h/ρ) with ν = 1/2, and the squared exponential c(h | ψ) = τ² exp(−2h²/ρ²) with ν = ∞.
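The two special cases quoted in the text can be evaluated directly. The sketch below implements only those closed forms, exactly as stated, and builds a small covariance matrix; the general Matérn form needs a modified Bessel function and is not attempted here. The function names and the example points are assumptions made for illustration.

```python
import math

def exponential_cov(h, tau2, rho):
    # Special case nu = 1/2, as stated in the text
    return tau2 * math.exp(-math.sqrt(2.0) * h / rho)

def squared_exponential_cov(h, tau2, rho):
    # Limiting case nu = infinity, as stated in the text
    return tau2 * math.exp(-2.0 * h * h / (rho * rho))

def cov_matrix(points, cov, tau2, rho):
    # Covariance matrix of the process at a list of 2-D points
    return [[cov(math.dist(p, q), tau2, rho) for q in points] for p in points]

points = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.5)]
C = cov_matrix(points, exponential_cov, tau2=1.0, rho=0.2)
```

The diagonal of `C` equals τ², and the off-diagonal entries decay with distance at a rate set by ρ.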
3. Theoretical properties
3.1. Weak posterior consistency
In this section, we obtain posterior consistency of the parameters of our model under fixed-domain asymptotics. Consider the joint model defined in §2, with 𝒟 = [0, 1]² without loss of generality and Πξr, Πηr Gaussian processes on 𝒞(𝒟), the space of continuous functions on 𝒟. Letting c(h | ψξ) and c(h | ψη) denote the covariance functions for ξr and ηr, respectively, we choose independent bounded hyperpriors for τξ, τη, νξ and νη while letting ρξ ∼ πξ and ρη ∼ πη, where the supports of both πη and πξ are ℝ⁺. We choose a proper prior on ℝ for a, βξ ∼ N(β0ξ, Σ0ξ), βη ∼ N(β0η, Σ0η) and σ² ∼ Inv-Ga (ασ, βσ).
Assumption 1. The prior ζ ∼ Π satisfies the prior positivity condition Π(ζ : ‖ζ − ζ0‖∞ < ∊) > 0 for all ∊ > 0 and for any ζ0 ∈ 𝒞(𝒟).
van der Vaart & van Zanten (2009) showed that Assumption 1 holds for Gaussian process priors with squared exponential covariance under mild conditions and, in an unpublished 2005 Ph.D thesis from Carnegie Mellon University, T. Choi provided a set of sufficient conditions on the Matérn covariance kernel for the same setting.
Assumption 2. The covariates are uniformly bounded, so there exists an M > 0 such that ‖x(s)‖ ⩽ M for all s ∈ 𝒟.
Theorem 1. Under models (1)–(2) with priors chosen as described in §3 and Assumptions 1–2, the posterior distribution Π[ξr, ηr, a, βξ, βη, σ | {(yi, si), i = 1, . . . , n}] is weakly consistent.
Theorem 1 does not imply that the hyperparameters in the covariance kernel are consistently estimated, though we do take into account uncertainty in these parameters and do not assume that the priors are well specified. It is typically not possible to consistently estimate all the parameters in the Matérn covariance (Zhang, 2004).
3.2. Posterior propriety of a
Under models (1)–(2), the parameter a controls the degree of informative sampling. The uniform improper prior, πa(a) ∝ 1, provides a noninformative choice. Theorem 2 shows that this prior leads to a proper posterior, implying that the data are informative about a.
Letting s = (s1, . . . , sn), y = (y1, . . . , yn)T, = {ξr (s1), . . . , ξr (sn)}T, and = {ηr(s1), . . . , ηr (sn)}T, we have ∼ N(0, ) and ∼ N(0, ), where (s, s′) = c(‖s − s′‖|ψξ) and (s, s′) = c(‖s − s′‖ |ψη) for s, s′ ∈ 𝒟. Let c(h |ψξ) = exp(−2^{1/2}h^p/ρξ) and c(h |ψη) = exp(−2^{1/2}h^p/ρη) for 0 < p ⩽ 2. We assume independent bounded priors on τξ and τη and independent discrete uniform priors on ρξ and ρη. Let βξ ∼ N(β0ξ, Σ0ξ), βη ∼ N(β0η, Σ0η) and σ² ∼ π(σ²). Here we focus on powered exponential covariance functions rather than Matérn to simplify calculations. A similar result should hold for Matérn covariance functions if the priors on the hyperparameters have bounded support.
Theorem 2. With the above prior specifications, the marginal posterior distribution of a, p(a | y, s) is proper, provided n ⩾ 2 and Eπ (σ) < ∞.
When the conditions of Theorem 2 are satisfied, the joint posterior is also proper.
4. Computational details
The exact density for the sample locations in (1) is not available analytically, so an approximation is required. In point process modelling, the integral is often approximated as the sum over a fine grid. Letting t1, . . . , tM ∈ 𝒟 be a rectangular grid covering 𝒟 with cell area Δ, we have
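$$ p(s) = \frac{\exp\{\xi(s)\}}{\int_{\mathcal{D}} \exp\{\xi(t)\}\,dt} \approx \frac{\exp\{\xi(s)\}}{\Delta \sum_{j=1}^{M} \exp\{\xi(t_j)\}}. \tag{4} $$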
This approximation yields a tractable posterior, but requires computationally expensive matrix inversions, which we limit using a kernel convolution approximation to the process.
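As a sketch of the grid approximation, the log location density can be evaluated as ξ(s) minus the log of Δ times the sum of exp{ξ(tj)} over the grid, which is our reading of the approximation described above. The unit-square domain, the midpoint grid and the function name `log_location_density` are assumptions for illustration; the sum is computed with the log-sum-exp device for numerical stability.

```python
import math

def log_location_density(xi, s, grid_size=30):
    # Midpoint grid t_1, ..., t_M on the unit square with cell area delta
    delta = 1.0 / (grid_size * grid_size)
    grid = [((i + 0.5) / grid_size, (j + 0.5) / grid_size)
            for i in range(grid_size) for j in range(grid_size)]
    values = [xi(t) for t in grid]
    # log(delta * sum_j exp(xi(t_j))) computed with the log-sum-exp device
    top = max(values)
    log_integral = math.log(delta) + top + math.log(
        sum(math.exp(v - top) for v in values))
    return xi(s) - log_integral

# Constant log intensity: the location density is uniform on the unit square
flat = log_location_density(lambda t: 0.0, (0.3, 0.7))
# Log intensity increasing in the first coordinate
tilted = log_location_density(lambda t: 2.0 * t[0], (1.0, 0.5))
```

For a constant log intensity the approximate log density is zero everywhere, as expected for a uniform density on a domain of unit area.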
Let δ(s) be a zero-mean Gaussian process with covariance c(h | ψ). A process convolution (Higdon, 2002) lets
(5) |
where W is the Brownian motion and Kψ is a kernel with parameters ψ. The kernel corresponding to the Matérn covariance is
The kernel convolution representation of the Gaussian process in (5) is often used to motivate dimension reduction for the spatial process. Let ϕ1, . . . , ϕN be a grid of spatial knots. Then, for large N,
(6) |
where wj ∼ N(0, 1). Applying kernel convolution to ξ(s) and η(s) yields
(7) |
where uj, υj ∼ N(0, 1). Selecting the number of grid points M and knots N is discussed in §5 and §6.
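The discrete kernel convolution can be sketched as a weighted sum of kernels centred at the knots, with independent standard normal weights. Because the Matérn kernel formula is not reproduced here, the sketch swaps in a Gaussian kernel as a clearly hypothetical stand-in; the function names, the bandwidth and the absence of any scaling constant are likewise assumptions. The knot grid of 15 × 15 points on [−0.2, 1.2]² matches the 225 knots used in §5.

```python
import math
import random

def gaussian_kernel(d, bandwidth):
    # Hypothetical stand-in kernel; the paper uses a Matern kernel
    return math.exp(-0.5 * (d / bandwidth) ** 2)

def make_knots(per_side=15, lower=-0.2, upper=1.2):
    # Equally spaced knots phi_1, ..., phi_N on [lower, upper]^2
    step = (upper - lower) / (per_side - 1)
    return [(lower + i * step, lower + j * step)
            for i in range(per_side) for j in range(per_side)]

def convolve(s, knots, weights, bandwidth):
    # delta(s) approximated by sum_j K(||s - phi_j||) w_j
    return sum(gaussian_kernel(math.dist(s, k), bandwidth) * w
               for k, w in zip(knots, weights))

def implied_cov(s, t, knots, bandwidth):
    # Covariance of the approximation when the w_j are independent N(0, 1)
    return sum(gaussian_kernel(math.dist(s, k), bandwidth)
               * gaussian_kernel(math.dist(t, k), bandwidth) for k in knots)

rng = random.Random(0)
knots = make_knots()
weights = [rng.gauss(0.0, 1.0) for _ in knots]
value = convolve((0.5, 0.5), knots, weights, bandwidth=0.1)
```

Evaluating the surface at a new location costs a sum over N knots rather than the inversion of an n × n matrix, which is the computational motivation for the approximation.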
We use a combination of Gibbs and Metropolis sampling for posterior computation. Assuming conjugate normal and inverse gamma priors, and reparameterization so that uj ∼ N(0, ) and υj ∼ N(0, ), the full conditionals for β*, a, , and the vector (u1, . . . , uN)T are conjugate and we use Gibbs sampling. The correlation parameters ρη and ρξ and the smoothness parameters νη and νξ are updated with Metropolis sampling, tuned to have an acceptance rate near 0.4. The sampling density parameters υj are updated using blocked Metropolis sampling to account for posterior correlation between coefficients for nearby knots. We used 10 blocks, with knots allocated to blocks using k-means clustering implemented by the kmeans function in R. For the simulation study in §5, we generated 5000 samples and discarded the first 1000 as burn-in. For the analysis of the ozone data in §6, we generated 20 000 samples and discarded the first 5000. Convergence was monitored using trace plots of the deviance as well as several parameters.
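One common way to tune a Metropolis step towards a target acceptance rate is to adapt the proposal scale during burn-in and then freeze it. The sketch below does this for a one-dimensional random-walk sampler with a standard normal stand-in target; the adaptation rule, the target density and all names are assumptions for illustration and are not taken from the paper's implementation.

```python
import math
import random

def log_target(x):
    # Stand-in log density (standard normal, up to a constant)
    return -0.5 * x * x

def tuned_metropolis(log_density, x0, n_burn, n_keep, target=0.4, seed=1):
    rng = random.Random(seed)
    x, log_step = x0, 0.0
    draws, accepted = [], 0
    for t in range(1, n_burn + n_keep + 1):
        # Random-walk proposal with standard deviation exp(log_step)
        prop = x + math.exp(log_step) * rng.gauss(0.0, 1.0)
        log_ratio = log_density(prop) - log_density(x)
        acc_prob = math.exp(min(0.0, log_ratio))
        accept = rng.random() < acc_prob
        if accept:
            x = prop
        if t <= n_burn:
            # Diminishing adaptation: widen the step when accepting too often
            log_step += (acc_prob - target) / t ** 0.6
        else:
            draws.append(x)
            accepted += accept
    return draws, accepted / n_keep, math.exp(log_step)

draws, rate, step = tuned_metropolis(log_target, 0.0, n_burn=2000, n_keep=5000)
```

Adaptation stops after burn-in, so the retained draws come from a fixed-kernel Metropolis sampler and `rate` estimates its acceptance rate.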
5. Simulation study
We conduct a simulation study to illustrate the effect of failing to account for informative sampling on spatial interpolation, and determine the amount of data needed to reliably identify informative sampling. We assume 𝒟 = [0, 1]2 and no spatial covariates, x(s) = 1 for all s. We generate data using model (7) with an equally spaced grid of N = 225 knots on [−0.2, 1.2]2 and a Matérn kernel. We generate S = 50 datasets from each of four simulation scenarios: (i) n = 250, a = 0, ρ = 0.2; (ii) n = 250, a = 1, ρ = 0.2; (iii) n = 250, a = 1, ρ = 0.5 and (iv) n = 500, a = 1, ρ = 0.2, with σ = 1, E{μ(s)} = 0, ν = 2.0 and τ = 0.1 under all scenarios. For each simulated dataset, we fit three models. The noninformative sampling model sets a = 0, the plug-in model sets ξ(s) = ξ̂(s) to account for informative locations and the full model implements the approach of §4. In the plug-in analysis, the location density is estimated using kernel density estimation in R’s KernSur function in the GenKern package with default settings. GenKern gives a bivariate kernel density estimate that uses Gaussian kernels with bandwidth chosen using a direct plug-in approach to approximate the asymptotically optimal bandwidth.
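The plug-in covariate can be sketched as a kernel estimate of the log sampling density evaluated at the observed locations and then standardized. The sketch below uses a bivariate Gaussian kernel with a fixed, user-supplied bandwidth, which is an assumption; it does not reproduce the bandwidth selection used by GenKern. The function names and the toy locations are also hypothetical.

```python
import math

def log_kde(s, locations, bandwidth):
    # Bivariate Gaussian kernel density estimate, returned on the log scale
    norm = 2.0 * math.pi * bandwidth ** 2 * len(locations)
    total = sum(math.exp(-0.5 * math.dist(s, p) ** 2 / bandwidth ** 2)
                for p in locations)
    return math.log(total / norm)

def standardize(values):
    # Centre and scale to mean zero and variance one
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var) for v in values]

# Four clustered locations and one isolated location (toy data)
locations = [(0.2, 0.2), (0.25, 0.2), (0.2, 0.3), (0.22, 0.25), (0.8, 0.8)]
xi_hat = [log_kde(p, locations, bandwidth=0.1) for p in locations]
xi_std = standardize(xi_hat)
```

The clustered locations receive a larger estimated log density than the isolated one, and `xi_std` is the covariate that would enter the outcome model with coefficient a.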
To fit the kernel convolution model, we use the same grid of N = 225 knots that was used to generate the data, and approximate the integral using a square grid of M = 900 points t1, . . . , tM covering [0, 1]². Motivated by Rodrigues & Diggle (2010), we used an equally spaced grid of 225 knots on [−0.2, 1.2]². The simulation study results show that irrespective of the number and position of the sampling locations, the Gaussian process can be well approximated with 225 knots. Following Lee et al. (2005), the grid spacings are chosen to be no larger than the standard deviation of the kernel in the convolution representation. We use diffuse normal priors for β* and a, and the covariance parameters have priors σ², τ²ξ, τ²η ∼ Inv-Ga(0.01, 0.01), ρξ, ρη ∼ U(0, 2), and νξ, νη ∼ U(0, 30).
Table 1 reports bias, mean-squared error, mean absolute deviation, and coverage probability, each averaged over the grid of M spatial locations t1, . . . , tM. The coverage probability is the proportion of the M grid locations for which the posterior 95% interval for μ(tj) covers the true value. For the plug-in model and the full model, we also report the power for a in Table 1 which is defined to be the proportion of datasets for which the posterior 95% credible interval for a excludes zero.
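The summary measures can be computed from the posterior means and 95% intervals at the grid points. In this sketch the sign convention for bias (estimate minus truth), the reading of mean absolute deviation as the mean absolute prediction error, and the function names are assumptions; the toy numbers are made up solely to exercise the code.

```python
def summarize_fit(truth, post_mean, lower, upper):
    # truth, post_mean, lower, upper: values at the M grid points t_1, ..., t_M
    m = len(truth)
    errors = [est - tru for est, tru in zip(post_mean, truth)]
    return {
        "bias": sum(errors) / m,
        "mse": sum(e * e for e in errors) / m,
        "mad": sum(abs(e) for e in errors) / m,
        # Share of grid points whose 95% interval covers the true value
        "coverage": sum(lo <= tru <= hi
                        for lo, tru, hi in zip(lower, truth, upper)) / m,
    }

def power_for_a(intervals):
    # Share of datasets whose 95% interval for a excludes zero
    return sum(not (lo <= 0.0 <= hi) for lo, hi in intervals) / len(intervals)

fit = summarize_fit(truth=[0.0, 1.0, 2.0, 3.0],
                    post_mean=[0.5, 1.0, 1.5, 3.0],
                    lower=[-0.5, 0.0, 0.0, 2.0],
                    upper=[1.5, 2.0, 1.9, 4.0])
power = power_for_a([(0.2, 1.8), (-0.3, 1.1), (0.5, 2.0), (0.1, 0.9)])
```

In a simulation study these summaries would be computed per dataset and then averaged over the S replicates.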
Table 1.
Simulation study results
Design | Model | mse (×10²) | mad (×10²) | Bias (×10²) | cp (×10²) | Power for a (×10²) |
---|---|---|---|---|---|---|
(i) | nis | 33.1 (2.8) | 41.3 (0.6) | 2.0 (1.3) | 93.0 (1.0) | – |
(i) | Plug-in | 32.2 (1.7) | 41.3 (0) | 2.5 (1.3) | 93.0 (1.0) | 10.0 |
(i) | Full | 31.9 (1.2) | 41.5 (0.7) | 2.5 (1.3) | 93.0 (1.0) | 10.0 |
(ii) | nis | 49.4 (5.0) | 50.0 (1.1) | −25.8 (1.3) | 90.0 (1.0) | – |
(ii) | Plug-in | 39.2 (5.5) | 44.8 (0.9) | −13.9 (1.3) | 91.0 (1.0) | 74.0 |
(ii) | Full | 32.9 (2.8) | 43.2 (0.8) | −7.5 (1.6) | 93.0 (1.0) | 80.0 |
(iii) | nis | 13.2 (1.1) | 28.1 (1.8) | −8.3 (1.4) | 94.0 (1.0) | – |
(iii) | Plug-in | 12.1 (0.8) | 27.1 (1.8) | −3.1 (1.4) | 94.0 (1.0) | 40.0 |
(iii) | Full | 10.8 (0.7) | 25.3 (1.4) | −2.0 (1.3) | 95.0 (1.0) | 50.0 |
(iv) | nis | 25.6 (1.1) | 36.9 (0.7) | −15.3 (1.2) | 92.0 (1.0) | – |
(iv) | Plug-in | 20.9 (0.8) | 33.9 (0.5) | −7.2 (1.1) | 92.0 (1.0) | 88.0 |
(iv) | Full | 19.1 (0.6) | 32.6 (0.4) | −0.8 (1.0) | 94.0 (1.0) | 98.0 |
nis, noninformative sampling; mse, mean squared error; mad, mean absolute deviation; cp, coverage probability.
All three methods perform similarly when sampling is not informative. In this case, the informative sampling methods rarely identify a as significant and reduce to the usual geostatistical model. The noninformative sampling model has high mean squared error and negative bias in the remaining designs with informative sampling. The two methods that allow for informative sampling reduce mean squared error compared with the noninformative sampling model. The informative sampling models also reduce bias, although some bias remains, especially for design (ii). In all designs with informative sampling, the full model improves on the plug-in approach. The mean squared error of the noninformative sampling model relative to the full model is smaller for design (iii), with its larger spatial range (0.132/0.108 = 1.222), and for design (iv), with its larger sample size (0.256/0.191 = 1.340), than for design (ii) (0.494/0.329 = 1.502), so it seems that accounting for informative sampling is most important for small datasets with considerable spatial variation.
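The relative mean squared errors can be recomputed directly from the mse column of Table 1. The short sketch below does so for the three informative designs; the dictionary layout and variable names are assumptions.

```python
# mse (x 10^2) from Table 1: (noninformative sampling model, full model)
mse = {"(ii)": (49.4, 32.9), "(iii)": (13.2, 10.8), "(iv)": (25.6, 19.1)}

# Ratio of the noninformative sampling mse to the full-model mse
relative_mse = {design: nis / full for design, (nis, full) in mse.items()}

# Designs ordered from smallest to largest gain from the full model
ordered = sorted(relative_mse, key=relative_mse.get)
```

The ordering places design (ii), with the smaller sample and shorter spatial range, as the case where the full model gives the largest relative gain.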
To analyse sensitivity to the prior for a, we redid simulation design (ii) with a = 1 and ρ = 0.2 and used four different priors for a: N(1, 1), N(0, 1), N(0, 10²) and an improper prior. In summary, the mean-squared prediction error and predictive coverage are insensitive to the hyperparameters of the prior on a for n = 150 and n = 200. Even for a sample size as small as n = 50, differences between priors are small. However, the N(0, 10²) prior and the informative prior N(1, 1) lead to better power for a than the others when n = 50 and 100. The minimum sample size needed to swamp the prior for a is approximately 150 in this example.
6. Analysis of Eastern United States ozone data
With the increasing concern about air pollution and climate change, building predictive models for ozone is an important area. It is often the case that the monitoring locations are informative about the ozone surface and hence it is important to account for informative sampling. We analyse the median daily ozone for June–August 2007 for n = 631 observations in Eastern U.S.A. The data are plotted in Fig. 1(a). There is a clear association between the sampling density and the response, as there are more monitors placed in areas with high ozone, such as Atlanta and New England, than in areas with low ozone, such as Mississippi and West Virginia. We fit a generalized additive model to the median ozone values and the kernel density estimate of the log sampling density using locally weighted scatterplot smoothing as shown in Fig. 1(b). The linear fit is entirely contained within the generalized additive model 95% confidence intervals for all values of the log sampling density estimate, supporting the log-linear model in (1).
Fig. 1.
Plots of the ozone data. (a) The ozone data in parts per billion and the monitor locations (points). (b) The estimated log sampling density against the response. Log sampling density versus median ozone (circles), gamfit with 95% intervals (dashed line), linear fit (solid line).
To apply a stationary spatial model, we first project the spatial locations to a two-dimensional surface using the Mercator projection, and then scale them to the unit square coordinate-wise by subtracting the minimum and dividing by the range of the observation locations. We fit the informative sampling model with a 30 × 30 grid of knots on [−0.2, 1.2]² in the kernel convolution approximation in (6) and a 50 × 50 grid of points on [0, 1]² in the integral approximation to the sampling density in (4). Points outside the convex hull of the observation locations or outside the continental United States were discarded from the integral approximation to the sampling density, leaving M = 1077. Kernel convolution knots not within 0.1 of an integral approximation point were discarded, leaving N = 490.
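The preprocessing steps can be sketched as follows: project longitude and latitude with a spherical Mercator map, rescale each coordinate to [0, 1], and keep only the knots that lie within 0.1 of some integration point. The monitor coordinates, the candidate knots and the function names are hypothetical, and the convex hull and coastline checks described in the text are not implemented here.

```python
import math

def mercator(lon_deg, lat_deg):
    # Spherical Mercator projection on the unit sphere
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return lon, math.log(math.tan(math.pi / 4.0 + lat / 2.0))

def rescale(points):
    # Map each coordinate to [0, 1] by subtracting the minimum and
    # dividing by the range of the observation locations
    out = []
    for axis in range(2):
        vals = [p[axis] for p in points]
        low, span = min(vals), max(vals) - min(vals)
        out.append([(v - low) / span for v in vals])
    return list(zip(out[0], out[1]))

def keep_knots(knots, grid_points, radius=0.1):
    # Keep knots within `radius` of at least one integration grid point
    return [k for k in knots
            if any(math.dist(k, g) <= radius for g in grid_points)]

# Hypothetical monitor coordinates (longitude, latitude) in degrees
monitors = [(-84.4, 33.7), (-71.1, 42.4), (-90.2, 32.3), (-81.6, 38.3)]
unit = rescale([mercator(lon, lat) for lon, lat in monitors])
# For brevity the rescaled monitors stand in for the integration grid
kept = keep_knots([(0.0, 0.05), (1.2, 1.2), (-0.2, 0.0)], unit)
```

Discarding knots far from the integration points removes basis functions that contribute almost nothing inside the region of interest.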
We include a second-order spatial trend as predictors in x(s), that is, linear and quadratic terms for rescaled latitude and longitude and their interaction. We compare the noninformative sampling, plug-in and full models described in §5. The posteriors for several parameters are summarized in Table 2. The spatial processes for both the mean and the sampling density are fairly smooth. The posterior 95% intervals for νξ and νη exclude the exponential covariance (ν = 0.5) for all three models.
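A second-order trend corresponds to a six-element covariate vector at each location. The sketch below builds x(s) and evaluates x(s)ᵀβ; the ordering of the terms, the function names and the coefficient values are assumptions for illustration.

```python
def trend_covariates(lon, lat):
    # Intercept, linear terms, quadratic terms and the interaction
    return [1.0, lon, lat, lon * lon, lat * lat, lon * lat]

def trend_mean(lon, lat, beta):
    # x(s)^T beta for a coefficient vector beta of length six
    return sum(x * b for x, b in zip(trend_covariates(lon, lat), beta))

x = trend_covariates(0.5, 0.2)
mu = trend_mean(0.5, 0.2, [1.0, 2.0, 0.0, 0.0, 0.0, -1.0])
```

The same covariate vector enters both the log intensity ξ(s) and the baseline surface η(s), so only the residual association between them is attributed to informative sampling.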
Table 2.
Mean and 95% intervals for the ozone data
Parameters | nis | Plug-in | Full |
---|---|---|---|
a | – | 4.43 (2.16, 6.46) | 3.21 (2.12, 4.25) |
σ | 4.68 (4.37, 5.03) | 4.70 (4.38, 5.04) | 4.78 (4.47, 5.12) |
τg | 0.17 (0.14, 0.27) | 0.15 (0.13, 0.19) | 0.17 (0.13, 0.21) |
ρg | 0.06 (0.05, 0.16) | 0.06 (0.04, 0.10) | 0.06 (0.05, 0.10) |
νg | 3.95 (0.92, 6.42) | 3.46 (1.53, 5.52) | 12.6 (0.74, 28.8) |
τf | – | – | 0.05 (0.04, 0.06) |
ρf | – | – | 0.07 (0.04, 0.13) |
νf | – | – | 10.7 (0.74, 28.77) |
nis, noninformative sampling.
The 95% intervals of a for both the plug-in model (2.16, 6.46) and the fully Bayesian model (2.12, 4.25) exclude zero, indicating an informative sampling scheme. The scale of a’s posterior is not comparable between the two models, because the plug-in density estimate has been standardized to have zero mean and unit variance. The effect of accounting for informative sampling is illustrated in Fig. 2. The difference in predicted values between the noninformative sampling and full model in Fig. 2(c) is the largest in Northern Pennsylvania and West Virginia. These areas have relatively few monitors and are near areas with high ozone. The differences between the noninformative sampling and plug-in predictions in Fig. 2(d) are also positive in these areas, though they are not nearly as large in the plug-in analysis. This may be because the plug-in estimates do not appropriately account for uncertainty in estimation, and hence may lead to some attenuation of the estimated surface.
Fig. 2.
The effect of accounting for informative sampling. (a) Posterior mean predicted values; (b) log sampling density from the full model; (c) the difference in posterior mean predicted values of the noninformative sampling model and full model and (d) the plug-in model.
Finally, we refit the model with different priors and different knot locations to test for sensitivity to these assumptions. We fit the model with 20 × 20 and 40 × 40 initial grids of knots in the kernel convolution approximation. After removing knots outside the domain of interest, we obtain N = 206 and N = 876 knots, respectively. The results were fairly similar to the original 30 × 30 grid. In all cases the posterior of a was separated from zero, the posterior median being 3.31 and 2.85 for N = 206 and N = 876 knots, respectively, and the largest difference between the noninformative sampling and full models was in Northern Pennsylvania and West Virginia.
7. Discussion
We have focused on a simple model for informative locations, which assumes that the outcomes are conditionally independent of the locations, given the mean process μ(s) and the spatial location density p(s). In addition, we include a single parameter a controlling the informativeness of the sampling process. These simplifying assumptions certainly make the theory and computation more tractable. However, to characterize the data from a broader variety of applications more realistically, it may be necessary to generalize the models. There are several interesting directions in this regard. First, it is straightforward conceptually to replace the constant a with a spatially varying coefficient a(s), which is assigned a Gaussian process prior. This generalization allows the informativeness of the sampling locations to vary spatially; for example, in certain regions, e.g., near cities, monitors may be placed without regard to the outcome, while in the rural areas, monitors may be placed at sites likely to have high values of ozone. It is an open question whether one can consistently estimate a(s) in this extended model without very restrictive assumptions. However, a simple adjustment for informative sampling may be preferable to more complicated models that require rich datasets for reliable estimation.
Acknowledgments
This research was partially supported by the National Institute of Environmental Health Sciences of the National Institutes of Health. The authors would like to thank Mr. Avishek Chakraborty and Mr. Anirban Bhattacharya for their helpful comments.
Appendix.
Proof of Theorem 1. Let ϕ = (ξr, ηr, βξ, βη, a, σ) and ϕ0 = (ξ0r, η0r, βξ0, βη0, a0, σ0) be a fixed set of parameters in 𝒞(𝒟) × 𝒞(𝒟) × × +. Clearly (yi, si) ∼ f (y, s | ϕ), where
Here μ(s) = x(s)T(aβξ + βη) + aξr(s) + ηr(s). Let μ0(s) = x(s)T(a0βξ0 + βη0) + a0ξ0r (s) + η0r(s). Define Λ(ϕ0, ϕ) = log {f (y, s | ϕ0)/f (y, s | ϕ)} and K (ϕ0, ϕ) = Eϕ0{Λ(ϕ0, ϕ)}. Then, following Schwartz (1965), it is enough to show that for all ∊ > 0,
We calculate K (ϕ0, ϕ) using the following equation:
For each δ > 0, define
Take b1 = ‖μ0 − μ‖∞ and b2 = σ/σ0. Let g1(b1, b2) = log b2 − ( − 1)/(2 ) + /(2 ). Clearly g1(b1, b2) is continuous at b1 = 0 and b2 = 1 and g1(0, 1) = 0. We have
and
For ∊ > 0, there exists a δ1 > 0 such that for all ϕ ∈ Bδ1,
There also exists δ2 > 0 such that for all ϕ ∈ Bδ2, {x(s)T(βξ − βξ0) + ξr (s) − ξ0r(s)} < ∊/3 uniformly for all s ∈ 𝒟. If we define hϕ(s) = exp{x(s)T βξ + ξr(s)}, then ϕ ↦ ∫𝒟hϕ(s) ds is a continuous function and hence ϕ ↦ log{∫𝒟 hϕ (s) ds} is also a continuous function. So, there exists a δ3 > 0 such that
Choosing δ = min{δ1, δ2, δ3}, ϕ ∈ Bδ implies K (ϕ0, ϕ) < ∊. From T. Choi’s unpublished 2005 Ph.D thesis, it follows that with the priors specified in §3.1
Hence,
Proof of Theorem 2. The prior specifications on ρξ, ρη, τξ and τη enable one to bound any quadratic forms and determinants involving and by fixed quantities. Hence, in showing that the posterior p(a |y, s) is proper, it is enough to treat ρξ, ρη, τξ and τη as constants. Without loss of generality, we can work with 𝒟 = [0, 1]² by the projection argument described in §6. Following Benes et al. (2003), we consider the grid approximation of the infinite-dimensional Gaussian process {ξr (s) : s ∈ 𝒟}, denoted by ξr. Let , with {Ij} denoting a segmentation of 𝒟 into contiguous regions of equal area Δ = J⁻¹ ∫𝒟ds. Choose J sufficiently large such that at most one si lies within any Ij. The infinite-dimensional Gaussian process, ξr, can be approximated by a finite-dimensional vector , corresponding to the choice of arbitrary points within I1, . . . , IJ, respectively, such that ξr (si) = if si ∈ Ij. Thus ∼ N(0, ), where . Define the true posterior ptrue( |s) and the approximated posterior pJ ( |s) as follows:
and
Marginalizing out , we have y|s, ξr, a, σ2, βη, βξ ∼ N(Xβ* + a , σ2In + ), where XT = {x(s1) ⋯ x(sn)}. The true posterior of ( , a, σ2, βξ, βη) is
Benes et al. (2003) showed that, under these assumptions, for a fixed s ∈ 𝒟n, the expectation of any bounded function with respect to pJ( |s) converges to the corresponding expectation with respect to ptrue( |s) as J tends to infinity. Hence, there exists a J such that the expectation of the bounded function with respect to pJ ( |s) is greater than the corresponding expectation with respect to (1/2) ptrue( |s). Thus, in order to show propriety of the true posterior of ( , a, σ2, βξ, βη), which involves ptrue( |s), it is enough to show the propriety of the approximated posterior pJ ( , a, βξ, βη, σ2| y, s). The approximated posterior of ( , a, σ2, βξ, βη) is
where C is a constant. As exp{x(si)Tβξ + ξr (si)} < for all i = 1, . . . , n,
After integrating out , excluding , we are left with
where C1 > 0 is a constant and is the variance–covariance matrix of constructed out of . Setting Z = (y − Xβ*)/a, Σ = ( + σ2In)/a2 and Ωη = {( )−1 + Σ−1}−1 and completing quadratic forms yield
where C2 > 0 is another constant. Next we state a useful lemma from matrix algebra.
Lemma 1. If A and B are positive definite square matrices, so is $A - A(A + B)^{-1}A$.
Proof. We have $A - A(A + B)^{-1}A = A(A + B)^{-1}B = (A^{-1} + B^{-1})^{-1}$.
The conclusion follows from the fact that the sum and inverses of positive definite matrices of the same dimension are also positive definite.
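Lemma 1 can be checked numerically through the identity A − A(A + B)⁻¹A = (A⁻¹ + B⁻¹)⁻¹, which is our reading of the argument in the proof. The sketch below verifies it for one pair of 2 × 2 positive definite matrices; the matrices and helper names are hypothetical, and a single numerical check is an illustration rather than a proof.

```python
def inv2(m):
    # Inverse of a 2 x 2 matrix
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(m, n):
    # Product of two 2 x 2 matrices
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add2(m, n, sign=1.0):
    # m + sign * n for 2 x 2 matrices
    return [[m[i][j] + sign * n[i][j] for j in range(2)] for i in range(2)]

A = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical positive definite matrices
B = [[1.0, -0.3], [-0.3, 3.0]]

# Left side: A - A (A + B)^{-1} A; right side: (A^{-1} + B^{-1})^{-1}
lhs = add2(A, mul2(mul2(A, inv2(add2(A, B))), A), sign=-1.0)
rhs = inv2(add2(inv2(A), inv2(B)))
```

Both sides agree up to rounding, and the result has a positive leading entry and a positive determinant, as positive definiteness requires for a symmetric 2 × 2 matrix.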
From Lemma 1, we have (ZTΣ−1Z − ZTΣ−1ΩηΣ−1Z) ⩾ 0, so that
Integrating out first and then βξ and βη,
Call = A and = B. Hence
Now we state a useful result from matrix algebra.
Proposition 1. If A and B are nonnegative definite matrices, then | A + B| ⩾ | A| + |B|, with strict inequality holding in the case of positive definite matrices.
Using Proposition 1, we get
where 0 < b1 ⩽ b2 ⩽ ⋯ ⩽ bn are the eigenvalues of B. By Minkowski’s inequality we get
Set |A|^{1/n} = k1 and bn = k2. We assume n ⩾ 2. Then, ignoring constants,
Now, since Eπ (σ) < ∞,
By Fubini’s theorem, p(a | y, s) is integrable.
References
- Benes V, Bodlák K, Møller J, Waagepetersen RP. Bayesian analysis of log Gaussian Cox process models for disease mapping. The ISI Int Conf Environ Statist Health. Univ Santiago de Compostela; 2003. pp. 95–105.
- Choi T. Alternative posterior consistency results in nonparametric binary regression using Gaussian process priors. J Statist Plan Infer. 2007;137:2975–83.
- Choi T, Schervish M. On posterior consistency in nonparametric regression problems. J Mult Anal. 2007;98:1969–87.
- Diggle P, Menezes R, Su T. Geostatistical inference under preferential sampling (with discussion). Appl Statist. 2010;59:191–232.
- Higdon D. Space and space-time modelling using process convolutions. Quant Methods Curr Environ Issues. 2002:37–56.
- Ho L, Stoyan D. Modelling marked point patterns by intensity-marked Cox processes. Statist Prob Lett. 2008;78:1194–9.
- Lee H, Higdon D, Calder C, Holloman C. Efficient models for correlated data via convolutions of intrinsic processes. Statist Mod. 2005;5:53–74.
- Møller J, Syversveen A, Waagepetersen R. Log Gaussian Cox processes. Scand J Statist. 2001;25:451–82.
- Radcliffe SJ, Guo W, Ten Have T. Joint modelling of longitudinal and survival data via a common frailty. Biometrics. 2004;60:892–9. doi: 10.1111/j.0006-341X.2004.00244.x.
- Rodrigues A, Diggle P. A class of convolution-based models for spatio-temporal processes with non-separable covariance structure. Scand J Statist. 2010;37:553–67.
- Schwartz L. On Bayes procedures. Z. Wahrsch. Verw. Gebiete. 1965;4:10–26.
- Stein ML. Interpolation of Spatial Data: Some Theory for Kriging. New York: Springer Series in Statistics; 1999.
- van der Vaart A, van Zanten J. Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. Ann Statist. 2009;37:2655–75.
- Wu M, Follmann D. Use of summary measures to adjust for informative missingness in repeated measures data with random effects. Biometrics. 1999;55:75–84. doi: 10.1111/j.0006-341x.1999.00075.x.
- Zhang H. Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J Am Statist Assoc. 2004;99:250–61.