Bayesian semiparametric model with spatially-temporally varying coefficients selection

Bo Cai; Andrew B Lawson; Md Monir Hossain; Jungsoon Choi; Russell S Kirby; Jihong Liu

doi:10.1002/sim.5789

Author manuscript; available in PMC: 2014 Sep 20.

Published in final edited form as: Stat Med. 2013 Mar 25;32(21):3670–3685. doi: 10.1002/sim.5789

PMCID: PMC3744634 NIHMSID: NIHMS461190 PMID: 23526312

Abstract

In spatio-temporal analysis, the effect of a covariate on the outcome usually varies across areas and time. The spatial configuration of the areas may depend not only on the structured random intercept but also on spatially varying coefficients of covariates. In addition, assuming normality for the distribution of spatially varying coefficients could bias estimation. In this article, we propose a Bayesian semiparametric space-time model in which the spatially-temporally varying coefficient is decomposed into fixed, spatially varying and temporally varying coefficients. The spatially varying coefficients of space-time covariates are modeled nonparametrically using an area-specific Dirichlet process prior with weights transformed via a generalized transformation. Temporally varying coefficients of covariates are modeled through a dynamic model. Uncertainty about the inclusion of the spatially-temporally varying coefficients is taken into account by a variable selection procedure that determines the probabilities of different effects for each covariate. The proposed semiparametric approach improves on Bayesian spatial-temporal models with a normality assumption on spatial random effects and on the Bayesian model with a Dirichlet process prior on the random intercept. A simulation example is presented to evaluate the performance of the proposed approach against the competing models. An application to low birth weight data in South Carolina is used for illustration.

Keywords: Area-specific Dirichlet process, Bayesian space-time models, Spatially-temporally varying coefficients, Variable selection

1 Introduction

Spatial-temporal data are often encountered in various disciplines such as epidemiology, ecology, political sciences, and economics. For example, the average of household income varies across different areas and time. In many applications, spatial-temporal regression models are used to explain the response variable observed over areas and time.

Suppose that the dependent variable y_it is observed in the ith spatial unit and the tth time point, for i = 1, … ,n and t = 1, …, T. A general space-time model can be expressed as

y_{i t} ~ f (y_{i t} | \cdot),

(1)

where f(y_it|·) denotes the conditional distribution of y_it given observed covariates, latent variables, and measurement errors, with mean μ_it = E(y_it). The mean is typically related to a linear predictor η_it through a suitable link function g(·), with η_it = g(μ_it). The response variable could be a continuous (e.g. disease rate), categorical (e.g. indicators of disease or health status) or count (e.g. number of disease cases or deaths) outcome. When the response is an area-referenced count,

y_{i t} ~ Poisson (E_{i t} exp (η_{i t})),

(2)

where E_it is an expected number of events which is thought of as fixed and sometimes obtained by applying a standard table of sex- and age group-specific rates to the population count in district i at time t, n_it, subdivided by age and sex [1]. In our case, we set E_it = Rn_it with R = Σ_it y_it/Σ_it n_it. The standardization here is often referred to as internal standardization because we have used the same data to compute reference rates R. The logarithm of the relative risk, η_it, can usually be expressed as

η_{i t} = x_{i t}^{'} β + u_{i} + v_{i} + γ_{t},

(3)

where x_it = (1, x_it2, …, x_itp)′ denotes a p × 1 vector of covariates associated with unit i and time t, β = (β₁, …, β_p)′ denotes a p × 1 vector of population parameters, u_i and v_i denote random effects measuring spatial similarity and excess heterogeneity, respectively, and γ_t denotes a structured temporal random component. Conventionally, the fixed effects β can be modeled by a multivariate normal prior. The parameters u_i and v_i are assumed to be independent. The parameter v_i captures the heterogeneity among the units which is chosen to follow an exchangeable normally distributed prior, while u_i captures the spatial heterogeneity of data which is assumed to follow an intrinsic conditional autoregressive (CAR) distribution (a special case of the general class of Markov random field) [2], u_i|u_−i ~ CAR(τ), i.e. u_i|u_−i ~ N(ū_i, (τm_i)⁻¹), where u_−i = (u₁, …, u_i−1, u_i+1, …, u_n)′, $ū_{i} = m_{i}^{- 1} \sum_{j \in \partial_{i}} u_{j}$ with ∂_i denoting the neighbor set of unit i, m_i denotes the number of neighbors of unit i, and τ denotes the precision parameter. The constraint $\sum_{i = 1}^{n} u_{i} = 0$ is defined for the purpose of identifiability of the overall intercept. The temporal parameter γ_t is assumed to follow an autoregressive (AR) prior.
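As a minimal sketch of two ingredients described above, the following Python code computes internally standardized expected counts E_it = R n_it and the mean and variance of the intrinsic CAR full conditional u_i | u_−i ~ N(ū_i, (τ m_i)⁻¹). The function names, the toy counts and the three-area neighbor structure are hypothetical and are not taken from the paper.

```python
def expected_counts(y, n):
    """Internal standardization: E_it = R * n_it with R = sum(y) / sum(n)."""
    total_y = sum(sum(row) for row in y)
    total_n = sum(sum(row) for row in n)
    rate = total_y / total_n  # overall reference rate R
    return [[rate * n_it for n_it in row] for row in n]


def car_conditional(u, neighbors, i, tau):
    """Mean and variance of u_i | u_{-i} under the intrinsic CAR prior."""
    m_i = len(neighbors[i])  # number of neighbors of area i
    mean = sum(u[j] for j in neighbors[i]) / m_i
    variance = 1.0 / (tau * m_i)
    return mean, variance


# Hypothetical toy data: 3 areas on a line (0 - 1 - 2), 2 time points.
y = [[2, 3], [5, 4], [1, 1]]                # observed counts y_it
n = [[100, 100], [200, 200], [50, 50]]      # population counts n_it
E = expected_counts(y, n)

neighbors = {0: [1], 1: [0, 2], 2: [1]}
u = [1.0, 2.0, 3.0]
mean_1, var_1 = car_conditional(u, neighbors, i=1, tau=2.0)
```

By construction the expected counts sum to the total observed count, which is what makes the standardization internal.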

Model (2) is a typical spatio-temporal model for areal data based on which some hierarchical structures are developed [3–5]. More complex issues occur when the space-time interaction effect (e.g. w_it) is included in the predictor (3) [6–8]. Ugarte et al. [9] presented the evaluation of the performance of various simple spatio-temporal Bayesian models. Much work, however, assumed that effects of covariates on the response were constant across areas and time. In some applications, this assumption would be inappropriate. For example, the effect of the poverty rate on the low birth weight may vary across different regions and time points. To allow coefficients to vary spatially, among others, Assunção [10] and Gamerman et al. [11] respectively proposed spatially varying coefficients models for small area data. Dreassi et al. [12] developed a space model with time dependent covariates for small area data. Cai et al. [13] proposed a Bayesian regression model with multivariate linear splines for the analysis of space-time data. For point-referenced data, some approaches have been developed. Gelfand et al. [14] proposed a spatial process modelling for univariate and multivariate dynamic spatial data. Paez et al. [15] developed spatially varying dynamic coefficient models.

In the aforementioned spatial-temporal models, the spatially varying coefficients are often assumed to follow Gaussian distributions. In practice, the normality assumption is difficult to verify empirically and may be overly restrictive as spatially-varying coefficients may follow other distributions and may have clustering issues. Recently some approaches have been developed to relax the normality assumption for modeling point-referenced data. Gelfand et al. [16] proposed a Bayesian nonparametric spatial modeling with spatial Dirichlet process (SDP) mixture models. Duan et al. [17] developed a generalized spatial Dirichlet process. Reich and Fuentes [18] described a multivariate semiparametric Bayesian spatial model for spatial data. In contrast, for areal data, the semiparametric model with spatially-temporally varying coefficients of covariates has a lack of development. Li et al. [19] proposed nonparametric hierarchical models for areal data. They modeled the spatial random intercept by using areal-referenced spatial stick-breaking prior with the logit link between the weight and the random variate from the CAR.

In this paper, we focus on developing a Bayesian semiparametric space-time model with spatial-temporally varying coefficients of covariates. We model the spatially varying coefficients by using the area-specific stick-breaking representation for the Dirichlet process prior with the generalized transformation between the weight in the stick-breaking prior and the spatially-specific random variate from the CAR. The generalized transformation includes the linear link, logit link and probit link, providing more realistic weights associated with the spatial information in the areal data. Temporally varying coefficients are modeled by using a dynamic model. Each covariate could have different effects on the response variable, including no effect, the overall-only effect, the spatial-only effect, the temporal-only effect and the spatial-temporal effect. The proposed model allows for the uncertainty of inclusion of different effects for each covariate. We use the variable selection procedure through determining the probabilities of different effects for each covariate.

The remainder of the article proceeds as follows. Section 2 describes the semiparametric model with spatial-temporally varying coefficients while allowing for uncertainty of inclusion of the coefficients. Prior specification and posterior implementation are described. Section 3 discusses the model evaluation and comparison. Section 4 evaluates the performance of the approach based on a simulated example. Section 5 illustrates the approach with real spatial-temporal data. Finally, Section 6 summarizes and discusses the results.

2 Semiparametric Model with Uncertainty of Spatial-Temporally Varying Coefficients

2.1 The model with selection of spatial-temporally varying coefficients

We model the logarithm of the relative risk as

η_{i t} = x_{i t}^{'} θ_{i t},

(4)

where θ_it = (θ_it1, …, θ_itp)′ denotes a p × 1 vector of spatial-temporally varying coefficients of covariates. One can decompose each element of θ_it as θ_itk = α_k + β_ik + γ_tk, for k = 1, …, p, where α_k denotes the global effect of the kth covariate, β_ik denotes the spatially structured random effect of the kth covariate and γ_tk denotes the temporal-specific effect of the kth covariate. The regression coefficients are taken to be independent across covariates. However, this decomposition assumes that each covariate simultaneously has an overall effect, a spatially varying effect and a temporally varying effect on the response. This assumption is generally too restrictive and may over-parameterize the fitted model, because a covariate may have 1) no effect; 2) only a fixed effect; 3) only a spatial-specific effect given the fixed effect; 4) only a temporal-specific effect given the fixed effect; or 5) all three effects on the response. To account for this uncertainty, we define θ_itk as

θ_{i t k} = δ_{1 k} α_{k} + δ_{1 k} (δ_{2 k} β_{i k} + δ_{3 k} γ_{t k}),

(5)

where δ_1k, δ_2k and δ_3k denote the indicator variables for α_k, β_ik and γ_tk, respectively. We fix δ₁₁, δ₂₁ and δ₃₁ to be one for all i and t to reflect some overall spatial-temporal effect. For k ≥ 2, under this construction, a covariate has

no effect (i.e. θ_itk = 0) for all i and t if δ_1k = 0,
only fixed effect (i.e. θ_itk = α_k) for all i and t if δ_1k = 1 and δ_2k = δ_3k = 0,
only spatial-specific effect given the fixed effect (i.e. θ_itk = α_k + β_ik) for all t if δ_1k = δ_2k = 1 and δ_3k = 0,
only temporal-specific effect given the fixed effect (i.e. θ_itk = α_k + γ_tk) for all i if δ_1k = δ_3k = 1 and δ_2k = 0,
spatial-temporally varying effects (i.e. θ_itk = α_k + β_ik + γ_tk) across areas and time if δ_1k = δ_2k = δ_3k = 1.
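The five scenarios listed above follow directly from equation (5). A minimal Python sketch, with hypothetical coefficient values chosen only for illustration, evaluates θ_itk for each admissible indicator configuration:

```python
def theta(delta, alpha_k, beta_ik, gamma_tk):
    """Equation (5): theta_itk = d1*alpha_k + d1*(d2*beta_ik + d3*gamma_tk)."""
    d1, d2, d3 = delta
    return d1 * alpha_k + d1 * (d2 * beta_ik + d3 * gamma_tk)


# Hypothetical values for one covariate k, one area i and one time t.
alpha_k, beta_ik, gamma_tk = 1.0, 0.3, -0.2

scenarios = {
    "no effect": (0, 0, 0),
    "fixed only": (1, 0, 0),
    "fixed + spatial": (1, 1, 0),
    "fixed + temporal": (1, 0, 1),
    "fixed + spatial + temporal": (1, 1, 1),
}
effects = {name: theta(d, alpha_k, beta_ik, gamma_tk)
           for name, d in scenarios.items()}
```

Because δ_1k multiplies every term, setting δ_1k = 0 removes the covariate regardless of δ_2k and δ_3k, for example `theta((0, 1, 1), ...)` is zero.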

It is assumed that a covariate has no spatial- and temporal-specific effects if it does not have a global effect. In addition, given a global effect, a covariate having the spatial-specific effect is assumed to be independent of having the temporal-specific effect. Thus, for the priors of the indicators, we have π(δ_1k, δ_2k, δ_3k) = π(δ_2k|δ_1k)π(δ_3k|δ_1k)π(δ_1k), where π(δ_1k = 1) = p_1k, π(δ_2k = 0|δ_1k = 0) = π(δ_3k = 0|δ_1k = 0) = 1, π(δ_2k = 1|δ_1k = 1) = p_2k and π(δ_3k = 1|δ_1k = 1) = p_3k. Then the prior for δ_1k, δ_2k and δ_3k is expressed as

π(δ_{1k}, δ_{2k}, δ_{3k}) = \begin{cases} 1 - p_{1k} & \text{if } δ_{1k} = δ_{2k} = δ_{3k} = 0 \\ p_{1k} (1 - p_{2k}) (1 - p_{3k}) & \text{if } δ_{1k} = 1 \text{ and } δ_{2k} = δ_{3k} = 0 \\ p_{1k} p_{2k} (1 - p_{3k}) & \text{if } δ_{1k} = δ_{2k} = 1 \text{ and } δ_{3k} = 0 \\ p_{1k} p_{3k} (1 - p_{2k}) & \text{if } δ_{1k} = δ_{3k} = 1 \text{ and } δ_{2k} = 0 \\ p_{1k} p_{2k} p_{3k} & \text{if } δ_{1k} = δ_{2k} = δ_{3k} = 1 \end{cases}

(6)

The probabilities in (6) sum to one, since 1 − p_1k + p_1k[(1 − p_2k) + p_2k][(1 − p_3k) + p_3k] = 1. These priors provide prior probabilities of the five different scenarios. The indicator δ_1k allows the kth covariate to be included in or excluded from the model, while δ_2k and δ_3k indicate whether the kth covariate has spatially and temporally varying effects given that it is included in the model. The proposed variable selection structure (5) can be thought of as a general case of the variable selection method for spatial-temporally varying effects of covariates. When there are only global effects for the covariates, the proposed model reduces to the model in (3). When the observations are only spatially dependent, the proposed structure reduces to the one by Reich et al. [20], who focused on variable selection in the parametric model with spatially-varying coefficients. If all the indicators equal one (i.e. there is no variable selection), the model becomes a spatial-temporally varying coefficient model. In particular, when the observations are only spatially dependent, it reduces to the space-varying regression model [10, 11], i.e. $η_{i} = x_{i}^{'} θ_{i}$ with θ_i = α + β_i. One might be concerned with the identifiability of the indicators and the coefficients in (5). This concern is relieved because we are actually interested in δ_1kα_k, δ_1kδ_2kβ_ik and δ_1kδ_3kγ_tk, which are identifiable. On the other hand, Bayesian identifiability concerns whether the data and prior provide information for updating the indicators and the coefficients [21]. For example, when the indicator δ_1k = 0, all the coefficients (i.e. α_k, β_ik and γ_tk) rely only on the priors. The data are involved in updating the coefficient(s) when δ_1k = 1.
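The prior in (6) can be checked numerically. The sketch below, using hypothetical values of p_1k, p_2k and p_3k, evaluates the prior over all eight binary configurations; the three configurations with δ_1k = 0 and a nonzero δ_2k or δ_3k receive probability zero, and the remaining five sum to one.

```python
from itertools import product


def indicator_prior(d1, d2, d3, p1, p2, p3):
    """Prior (6): pi(d1, d2, d3) = pi(d2 | d1) * pi(d3 | d1) * pi(d1)."""
    if d1 == 0:
        # Without a global effect, spatial and temporal effects are excluded.
        return (1 - p1) if (d2 == 0 and d3 == 0) else 0.0
    term2 = p2 if d2 == 1 else 1 - p2
    term3 = p3 if d3 == 1 else 1 - p3
    return p1 * term2 * term3


# Hypothetical prior inclusion probabilities for one covariate.
p1, p2, p3 = 0.5, 0.3, 0.8
prior = {d: indicator_prior(*d, p1, p2, p3)
         for d in product((0, 1), repeat=3)}
total = sum(prior.values())
```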

To allow for flexibility in the prior probability p_lk, for l = 1, 2, 3, we choose a Beta hyper-prior for the prior inclusion probability, p_lk ~ Beta(c_l, d_l). Given these prior probabilities, the full conditional probabilities for the different scenarios shown in (6) can be easily calculated through the categorical distribution (see details in Appendix). For the choice of c_l and d_l (l = 1, 2, 3), following the suggestion by Geisser [22], we choose c_l = d_l = 1, which yields the uniform hyper-prior. Scott and Berger [23] discuss the choice of priors for the prior probability. They conclude that the objective prior (i.e. the uniform prior) for the prior probability can easily be implemented computationally, while incorporating subjective prior information can be beneficial when it is available. In our case, we have no subjective information about the prior probability of inclusion of the covariates, so we choose a uniform prior. For more details, please refer to Geisser [22], Scott and Berger [23], and Cui and George [24], among others.
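The Appendix with the exact full conditionals is not reproduced in this section, so the following is only a generic sketch of a categorical update over the five scenarios: each scenario's probability is taken proportional to its prior probability times the likelihood evaluated under that scenario. The function name and the log-likelihood values are hypothetical placeholders.

```python
import math


def scenario_posterior(prior, loglik):
    """Normalize prior(scenario) * exp(loglik(scenario)) over the scenarios."""
    support = [k for k in prior if prior[k] > 0]
    log_w = {k: math.log(prior[k]) + loglik[k] for k in support}
    shift = max(log_w.values())  # subtract the maximum for numerical stability
    w = {k: math.exp(v - shift) for k, v in log_w.items()}
    total = sum(w.values())
    return {k: w[k] / total for k in support}


# Prior (6) with p_1k = p_2k = p_3k = 0.5.
prior = {(0, 0, 0): 0.5, (1, 0, 0): 0.125, (1, 1, 0): 0.125,
         (1, 0, 1): 0.125, (1, 1, 1): 0.125}
# Hypothetical log-likelihoods of the data under each scenario.
loglik = {(0, 0, 0): -120.0, (1, 0, 0): -110.0, (1, 1, 0): -104.0,
          (1, 0, 1): -109.0, (1, 1, 1): -103.5}
posterior = scenario_posterior(prior, loglik)
```

A draw of (δ_1k, δ_2k, δ_3k) from the resulting probabilities could then be made with, for example, `random.choices(list(posterior), weights=list(posterior.values()))`.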

2.2 Nonparametric modeling for spatially varying coefficients

Typically, the prior of the global effect of the kth covariate, α_k, is taken to be $N (0, τ_{α, k}^{- 1})$ where τ_α,k is the precision following a gamma prior Gamma(a_α,k, b_α,k) with mean a_α,k/b_α,k and variance $a_{α, k} / b_{α, k}^{2}$ . The conditional specification of the prior for the temporally varying effect of the kth covariate, γ_tk, can be taken as $N (γ_{t - 1, k}, τ_{γ, k}^{- 1})$ , for t = 1, …, T, with τ_γ,k being the precision following Gamma(a_γ,k, b_γ,k). The initial element γ_0k is chosen to be zero. For the spatially varying effect of the kth covariate, a conventional choice is the conditional distribution, β_ik|β_−ik ~ N(β̄_ik, (m_iτ_β,k)⁻¹), where β_−ik = (β_1k, …, β_i−1,k, β_i+1,k, …, β_nk)′, ${β̅}_{i k} = m_{i}^{- 1} \sum_{j \in \partial_{i}} β_{j k}$ , m_i denotes the number of neighbors of area i, and τ_β,k denotes the precision following a gamma prior Gamma(a_β,k, b_β,k). For identifiability, the constraint for the β_ik’s is Σ_i β_ik = 0 for k = 1, …, p. However, the normal prior assumption constrains the distributions that the spatially varying random effects may follow. In contrast, a nonparametric prior over distributions provides wider support, typically the space of all distributions (i.e. an infinite dimensional space). As a result, a nonparametric assumption allows for various shapes of the distribution, which may more accurately reflect our prior belief about the true distribution of spatially varying random effects. To allow for uncertainty about the distribution that β_ik may follow, we consider β_ik ~ G_ik, where G_ik is an unknown random distribution varying across different areas. We can then choose a prior distribution for G_ik with support on the space of all probability measures.
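The dynamic prior on the temporally varying effect is a first-order random walk started at zero. As a hedged illustration, the sketch below draws one path γ_1k, …, γ_Tk from that prior; the function name, the seed and the precision value are assumptions made for the example.

```python
import random


def simulate_random_walk(T, tau, rng):
    """Draw gamma_t ~ N(gamma_{t-1}, 1/tau) for t = 1..T with gamma_0 = 0."""
    sd = tau ** -0.5          # precision tau corresponds to variance 1/tau
    gamma = [0.0]             # initial element gamma_0k fixed at zero
    for _ in range(T):
        gamma.append(rng.gauss(gamma[-1], sd))
    return gamma[1:]          # return gamma_1, ..., gamma_T


rng = random.Random(1)        # hypothetical seed, for reproducibility only
path = simulate_random_walk(T=10, tau=4.0, rng=rng)
```

A large precision τ_γ,k keeps the path close to zero, so the temporal effect is nearly constant; a small precision lets it drift more freely over time.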

Among the nonparametric processes (e.g. Gaussian process, Pólya tree process, etc.), the Dirichlet process (DP) is one of the most prominent random probability measures due to its richness, computational ease, and interpretability. We consider using the DP in our approach for several reasons. First, any distribution over its space can be approximated arbitrarily and accurately in the weak limit by a sequence of draws from the DP [25]. Second, since the distributions drawn from the DP are discrete, the DP has the clustering property which allows for the repeated values, implying that multiple β_ik’s can take on the same value simultaneously. This feature of the DP is desirable and reflects the attribute of the spatially varying coefficients which are typically clustered. Third, the stick-breaking representation [26] (which will be described later on) of the DP provides a convenient way of incorporating the area-specific information into the random distribution of β_ik’s. Finally, with the nice representation such as stick-breaking, the DP can be efficiently implemented. For more details on nonparametric Bayesian processes, one may refer to Ghosal and van der Vaart [25].

The Dirichlet process (DP) prior can be specified as DP(MG₀), where M is a concentration parameter and G₀ is the base measure of the Dirichlet process. Under this specification, for any partition B = (B₁,…, B_q)′ of ℛ, we have

{G (B_{1}), \dots, G (B_{q})} ~ D (M G_{0} (B_{1}), \dots, M G_{0} (B_{q})),

where D(·) denotes the Dirichlet distribution on the simplex of R^q. This structure centers the distribution at the parametric base distribution, G₀, while allowing the true distribution to deviate from the parametric form. The amount of uncertainty in the parametric assumption is controlled by M. As M tends to zero, most of the samples share the same value sampled from the base measure G₀, whereas when M tends to infinity, the samples are almost i.i.d. samples from G₀. One of the popular representations of the DP prior is the Pólya urn representation [27, 28]. Briefly, a Pólya urn prior of β_i can be expressed as ${(M + n - 1)}^{- 1} M G_{0} + {(M + n - 1)}^{- 1} \sum_{s = 1}^{k^{(i)}} r_{s}^{(i)} δ_{β_{s}^{* (i)}} (\cdot)$ , where k⁽ⁱ⁾ denotes the number of distinct values across all β_j's excluding β_i, $r_{s}^{(i)}$ denotes the number of β_j's (excluding β_i) equal to the unique value $β_{s}^{* (i)}$ , and δ_β*(·) denotes the degenerate distribution at β*. Although Pólya urn Gibbs sampling can be implemented straightforwardly, some limitations remain. When model (1) is not a normal distribution, it is problematic to calculate the probability of generating new samples from the posterior based on the prior and the likelihood, because of nonconjugacy. In addition, the parameters are updated one at a time from the posterior distribution by Gibbs sampling, which can lead to slow mixing. Although an accelerated step [28] can improve the mixing behavior, slow mixing may still occur because of the inherent one-at-a-time updates.
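The Pólya urn prior above assigns mass M/(M + n − 1) to a fresh draw from G₀ and mass r_s/(M + n − 1) to each existing distinct value. A small Python sketch, with hypothetical values for the other coefficients, makes the role of M concrete:

```python
from collections import Counter


def polya_urn_weights(beta_others, M):
    """Prior weights for beta_i given the other n - 1 values.

    Returns the probability of a new draw from G0 and a dict mapping each
    distinct existing value to the probability of reusing it.
    """
    denom = M + len(beta_others)          # M + n - 1
    p_new = M / denom
    p_existing = {value: count / denom
                  for value, count in Counter(beta_others).items()}
    return p_new, p_existing


# Hypothetical current values of the other three coefficients.
beta_others = [1.0, 1.0, 2.0]
p_new, p_existing = polya_urn_weights(beta_others, M=1.0)

# Small M favors ties with existing values; large M favors new draws.
p_new_small, _ = polya_urn_weights(beta_others, M=1e-6)
p_new_large, _ = polya_urn_weights(beta_others, M=1e6)
```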

To avoid the limitations of Pólya urn Gibbs sampling, we consider blocked Gibbs sampling based on finite dimensional Dirichlet priors [29]. With the stick-breaking representation [26], the finite dimensional prior G can be expressed as $G \overset{d}{=} \sum_{s = 1}^{r} ω_{s} δ_{θ_{s}} (\cdot)$ , where r denotes the number of mixture components, ω_s denotes the weight and δ(·) denotes a discrete measure concentrated at θ_s, which is randomly generated from the base measure G₀. For the choice of the truncation of the mixture, Ishwaran and Zarepour [30] suggested using a reasonably large value such as 50 or the sample size.
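A minimal sketch of truncated stick-breaking weights follows. It assumes the common convention of setting the last stick fraction to one so that the r weights sum exactly to one; that convention and the function names are assumptions here, since this section does not spell out the truncation details.

```python
import random


def stick_breaking_weights(V):
    """omega_s = V_s * prod_{l < s} (1 - V_l)."""
    weights, remaining = [], 1.0
    for v in V:
        weights.append(v * remaining)   # take a fraction v of what is left
        remaining *= 1.0 - v            # shrink the remaining stick
    return weights


def sample_truncated_dp_weights(r, M, rng):
    """Draw V_s ~ Beta(1, M) for s < r and set V_r = 1 (assumed convention)."""
    V = [rng.betavariate(1.0, M) for _ in range(r - 1)] + [1.0]
    return stick_breaking_weights(V)


rng = random.Random(0)                  # hypothetical seed
omega = sample_truncated_dp_weights(r=46, M=1.0, rng=rng)
```

With a small M the first few weights absorb most of the mass, which corresponds to few clusters; a larger M spreads the mass over more components.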

To allow the unknown distribution of β_ik to vary across different areas, we propose to model the spatially varying coefficients by using the area-specific stick-breaking prior. Let S_k = (S_1k, …, S_nk)′ be a configuration, determining a classification of β_k = (β_1k, …, β_nk)′ into r_k distinct values $β_{k}^{*} = (β_{1 k}^{*}, \dots, β_{r_{k} k}^{*})'$ , with S_ik = s if β_ik in area i belongs to group s for covariate k in terms of the spatially varying effect, i.e. $β_{i k} = β_{s k}^{*}$ , for s = 1, …, r_k. Then we can model β_ik as follows

S_{i k} ~ \sum_{s = 1}^{r_{k}} ω_{i s k} δ_{s} (\cdot), ω_{i s k} = V_{i s k}^{*} \prod_{l = 1}^{s - 1} (1 - V_{i l k}^{*}), β_{s k}^{*} | \cdot ~ N (0, τ_{β, k}^{- 1}), for s = 1, \dots, r_{k},

where $V_{i s k}^{*} = u_{i s k} V_{s k}, V_{s k} \overset{i i d}{~} Beta (1, M_{k})$ and $\prod_{l < s} (1 - V_{i l k}^{*}) = 1$ for s = 1. The parameter u_isk is a covariate-specific spatial weight that depends on the location-associated random variate. Since u_isk ∈ (0, 1), following Ishwaran and James [29], it can be shown that $\sum_{s = 1}^{r_{k}} ω_{i s k} = 1$ almost surely for the aforementioned area-specific stick-breaking prior. We use a transformation, g(u_isk) = ϕ_isk, where ϕ_isk is assumed to follow a CAR(τ_k) prior. The transformation links the spatial weight to the CAR-distributed variate. Unlike the logit transformation used by Li et al. [19], we consider a more general transformation family introduced by Aranda-Ordaz [31],

g (u) = \frac{2}{λ} \frac{u^{λ} - {(1 - u)}^{λ}}{u^{λ} + {(1 - u)}^{λ}},

(7)

where λ denotes the transformation parameter. Different values of λ yield different link functions: λ → 0 gives the logit transformation in the limit, λ = 0.4 approximates the probit link, and λ = 1 gives the linear transformation. The inverse transformation function is then defined as

u = h (ϕ) = \frac{{(1 + ϕ λ / 2)}^{1 / λ}}{{(1 + ϕ λ / 2)}^{1 / λ} + {(1 - ϕ λ / 2)}^{1 / λ}}, for | ϕ λ | < 2

(8)

with h(ϕ) = 0 for ϕλ ≤ −2 and h(ϕ) = 1 for ϕλ ≥ 2. Since the transformation is symmetric, we can focus on λ ≥ 0. We choose a uniform prior for λ on (0, 0.5), a range that covers the logit and probit links. This setting also allows ϕ to vary in a reasonable range. The prior of the concentration parameter M_k is chosen to be Uniform(0, 10) [32].
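The transformation (7), its inverse (8) and the resulting area-specific weights can be sketched in a few lines of Python. The function names are hypothetical, λ is assumed strictly positive, and the leftover stick mass is returned explicitly rather than assigned to the last component, because the handling of the final component is not stated in this section.

```python
import math


def g(u, lam):
    """Transformation (7), for 0 < u < 1 and lam > 0."""
    a, b = u ** lam, (1.0 - u) ** lam
    return (2.0 / lam) * (a - b) / (a + b)


def h(phi, lam):
    """Inverse transformation (8), clipped to 0 or 1 when |phi * lam| >= 2."""
    z = phi * lam / 2.0
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    a = (1.0 + z) ** (1.0 / lam)
    b = (1.0 - z) ** (1.0 / lam)
    return a / (a + b)


def area_weights(phi_i, V, lam):
    """omega_is = V*_is * prod_{l < s} (1 - V*_il), with V*_is = h(phi_is) V_s.

    Returns the weights and the stick mass left after the last component.
    """
    weights, remaining = [], 1.0
    for phi, v in zip(phi_i, V):
        v_star = h(phi, lam) * v
        weights.append(v_star * remaining)
        remaining *= 1.0 - v_star
    return weights, remaining


# Hypothetical inputs for one area: two components, lam = 0.4.
weights, leftover = area_weights([0.0, 0.0], [0.5, 0.5], lam=0.4)
```

For λ = 1 the transformation reduces to the linear map g(u) = 4u − 2, and for λ close to zero it is numerically indistinguishable from the logit, which matches the special cases named in the text.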

2.3 Posterior computation

We choose priors for the parameters as described in Section 2.1. The posterior computation relies on a blocked Gibbs sampling algorithm in which we iteratively sample from the full conditional distributions of a block of the parameters. For update of a single parameter from the non-conjugate distribution, we use adaptive rejection Metropolis sampling [33]. For a block of parameters, the posterior computation relies on the Gibbs sampler and Metropolis-Hastings algorithms. After initializing values for the parameters, the proposed MCMC algorithm proceeds in a series of steps outlined in the Appendix. Samples from the joint posterior distribution of the parameters are generated by repeating those steps for a large number of iterations after apparent convergence.

3 Model Comparison

The deviance information criterion (DIC) [34] is widely used as a model comparison tool. DIC has been shown to be an approximation to a penalized loss function based on the deviance, with a penalty derived from a cross-validation argument. However, the implicit approximation is valid only when the effective number of parameters is much smaller than the number of independent observations [35]. Plummer [35] pointed out that in disease mapping this assumption does not hold, so that DIC under-penalizes complex models. Plummer [35] proposed penalized loss functions instead of p_D, the effective number of parameters, to assess model adequacy. However, as Plummer [35] noted, this method requires MCMC runs with each observation left out in turn. Such a calculation is not feasible in general, especially for large data sets. In this article, we consider a comparison method based on the conditional predictive ordinate (CPO) [36–39]. The CPO for the ith observation at time t is defined as the cross-validated marginal posterior predictive density

\mathrm{CPO}_{it} = f(y_{it} | y_{(it)}) = \int f(y_{it} | θ) f(θ | y_{(it)}, x_{(it)}) dθ = \left( \int \frac{1}{f(y_{it} | θ, x_{i})} f(θ | y, x) dθ \right)^{-1},

where y_(it) denotes the vector of observations with the ith observation at time t deleted and θ is the vector of model parameters. The cross-validation likelihood can be estimated by

L_{CV} = \prod_{i = 1}^{n} \prod_{t = 1}^{T} \mathrm{CPO}_{it} .

Since the quantity of the cross-validation likelihood is typically close to zero, the negative cross-validatory predictive log-likelihood [40] can be used,

\mathrm{NLLK}_{CV} = - \sum_{i = 1}^{n} \sum_{t = 1}^{T} \log \mathrm{CPO}_{it} .

Since a closed form of CPO_it is usually unavailable, a Monte Carlo estimate of CPO_it can be obtained straightforwardly through MCMC samples ${θ^{(s)}}_{s = 1}^{N}$ from the posterior distribution f(θ|y, x),

\widehat{\mathrm{CPO}}_{it} = \left( \frac{1}{N} \sum_{s = 1}^{N} \frac{1}{f(y_{it} | θ^{(s)}, x_{i})} \right)^{-1},

where N is the number of iterations after a burn-in period. The estimate of the negative cross-validatory predictive log-likelihood can be calculated accordingly. Since a large CPO indicates agreement between the observation and the model, a model with a smaller NLLK_CV for all observations implies a better fit.
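The Monte Carlo estimate of CPO_it is a harmonic mean of the likelihood values over posterior draws, and NLLK_CV is minus the sum of the log estimates. The sketch below works on the log scale for numerical stability; the function names and the toy inputs are hypothetical.

```python
import math


def log_cpo(loglik_draws):
    """log CPO_it from log f(y_it | theta^(s)) over N posterior draws."""
    neg = [-v for v in loglik_draws]
    shift = max(neg)  # log-sum-exp shift to avoid overflow
    log_mean_inverse = shift + math.log(
        sum(math.exp(v - shift) for v in neg) / len(neg))
    return -log_mean_inverse


def nllk_cv(loglik_by_obs):
    """Negative cross-validatory predictive log-likelihood over observations."""
    return -sum(log_cpo(draws) for draws in loglik_by_obs)


def poisson_logpmf(y, mean):
    """log Poisson(y | mean), e.g. with mean = E_it * exp(eta_it)."""
    return y * math.log(mean) - mean - math.lgamma(y + 1)


# Hypothetical likelihood values for two observations.
draws = [[math.log(0.5), math.log(0.25)], [math.log(0.2)] * 3]
value = nllk_cv(draws)
```

In practice each inner list would hold `poisson_logpmf(y_it, E_it * exp(eta_it))` evaluated at every retained MCMC draw.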

4 A Simulation Study

We evaluated the performance of the proposed approach, including the accuracy of the estimates, the sensitivity to different choices of hyperparameters, and comparison of the proposed model with other space-time models. Without loss of generality and for illustration purpose, we created the spatial data using South Carolina geographical structure containing 46 counties. The data were generated for n = 46 counties over T = 10 time points based on the model y_it ~ Poisson(E_it exp(η_it)), where the log-relative risk $η_{i t} = x_{i t}^{'} θ_{i t}$ with x_it = (1, x_it2, x_it3, x_it4, x_it5)′ and θ_it = α + β_i + γ_t. We chose α = (1, 1, 1, 1, 0)′, β_i = (0, β_i2, β_i3, 0, 0)′ with β_i2 being clustered to follow four different distributions and β_i3 to follow five different distributions (Figure 1), and γ_t = (0, γ_t2, 0, γ_t4, 0)′ with γ_t2 ~ N(γ_t−1,2, 0.5) and γ_t4 ~ N(γ_t−1,4, 1), for t = 1,…, T. This setting implies that the first covariate (i.e. the intercept) only has an overall effect, the second covariate has the fixed and spatial-temporal effects, the third covariate has the fixed and spatial effects, the fourth covariate has the fixed and temporal effects and the fifth covariate has no effect. We generated x_itl ~ Uniform(0, 1) for l = 2,…, 5.

Figure 1. The design of two spatial random effects, β_i2 and β_i3, in the simulation study, where the clusters with different colors in the map show different distributions.

We specified the priors for the parameters of the proposed model as follows. We used Gamma(0.05, 0.05) as the prior for τ_α,k and τ_γ,k. Following Ishwaran and James [29], we chose Gamma(2,2) as the prior for M_k to encourage both small and large values of M_k. Following Ishwaran and Zarepour [30], we chose r_k = n = 46. The prior for the spatially structured random effects β_ik was chosen as the nonparametric prior described in Section 2.2. We chose prior probabilities in equation (6) to be 0.5 to express an equal chance for inclusion and exclusion.

We implemented the analysis using the Gibbs sampler described in Section 2.3. We generated 50,000 iterations after a burn-in of 10,000 iterations. Convergence was assessed by using a variety of diagnostics described by Cowles and Carlin [41] and implemented using CODA [42] in R [43]. The diagnostic tests showed rapid convergence and efficient mixing. The parameters were estimated after thinning the chain by a factor of 5 to obtain a sample of size 10,000.

We compared the proposed model (Model 5) with the four competing spatio-temporal models. The log-relative risks of these models are listed as follows:

Model 1: $η_{i t} = x_{i t}^{'} α + u_{i} + v_{i} + γ_{t}$ ,
Model 2: $η_{i t} = x_{i t}^{'} α + x_{i t}^{'} u_{i} + v_{i} + γ_{t}$ ,
Model 3: $η_{i t} = x_{i t}^{'} γ_{t} + u_{i t} + v_{i t}$ , and
Model 4: $η_{i t} = x_{i t}^{'} α + u_{i} + γ_{t}$ .

In the four models, we followed conventional settings by specifying the prior of α as N_p(0, Σ_α) with $Σ_{α} ~ IWishart (p, Σ_{0}^{- 1})$ , where $Σ_{0}^{- 1}$ is a 5 × 5 precision matrix with diagonal elements equal to 0.1 and off-diagonal elements equal to 0.05. The prior of u_i in Model 1 was chosen as $CAR (τ_{1}^{*})$ with $τ_{1}^{*} ~ Gamma (0.005, 0.005)$ . The prior of v_i was taken as $N (0, τ_{2}^{* - 1})$ with $τ_{2}^{*} ~ Gamma (0.005, 0.005)$ . The conditional specification of the prior of γ_t was chosen as $N (γ_{t - 1}, τ_{3}^{* - 1})$ with $γ_{0} ~ N (0, τ_{3}^{* - 1})$ and $τ_{3}^{*} ~ Gamma (0.005, 0.005)$ . In Model 2, the prior of u_i was chosen as a multivariate CAR model, $MCAR (Σ_{u}^{- 1})$ . In Model 3, the prior of γ_t was chosen as $N (γ_{t - 1}, τ_{4}^{* - 1})$ . The priors of u_t and v_t in Model 3 were taken as $CAR (τ_{5}^{*})$ and $N (0, τ_{6}^{* - 1} I)$ , respectively, for t = 1, …, T. In Model 4, the prior of u_i was assumed to be a typical DP prior. Relying on the BlackBox component builder, WinBUGS [44] allows one to carry out relatively simple Bayesian statistical modeling by simply specifying a model and the priors for its parameters. For this reason, we implemented Models 1–4 using WinBUGS. Although it could conceptually be implemented in WinBUGS, the proposed model was written in R because of the slowness and lack of flexibility of WinBUGS. In our experience, the WinBUGS and R programs provide very similar results.

The second column in Table 1 presents the comparison of the estimated negative cross-validatory predictive log-likelihoods for the five models. Model 5, with the smallest value of NLLK_CV,sim, outperforms the other four models. Table 2 shows the posterior probabilities of inclusion of covariates in the five different cases listed in (6). For each covariate, the case with the highest posterior probability matches the covariate structure used to generate the data.
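The selection result can be verified directly from the values reported in Table 2. The short Python check below copies those posterior probabilities, picks the most probable case for each covariate, and compares it with the structure used to generate the data (x₂ with all three effects, x₃ fixed and spatial, x₄ fixed and temporal, x₅ no effect). The variable names are hypothetical.

```python
cases = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]

# Posterior probabilities as reported in Table 2.
table2 = {
    "x2": [0.03, 0.06, 0.14, 0.15, 0.62],
    "x3": [0.02, 0.04, 0.65, 0.13, 0.16],
    "x4": [0.03, 0.09, 0.08, 0.66, 0.14],
    "x5": [0.70, 0.16, 0.05, 0.07, 0.02],
}

# Covariate structure used to simulate the data.
designed = {"x2": (1, 1, 1), "x3": (1, 1, 0),
            "x4": (1, 0, 1), "x5": (0, 0, 0)}

# Most probable case (delta_1, delta_2, delta_3) for each covariate.
selected = {
    name: cases[max(range(len(cases)), key=lambda j: probs[j])]
    for name, probs in table2.items()
}
matches = {name: selected[name] == designed[name] for name in table2}
```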

Table 1.

Model comparison based on the negative cross-validatory log-likelihood for the simulated example and the application to the low birth weight data in South Carolina.

Model	NLLK_CV,sim	NLLK_CV,app
Model 1	1189.78	1897.18
Model 2	1166.20	1854.43
Model 3	1175.92	1869.29
Model 4	1191.93	1810.67
Model 5	879.51	1689.83

Table 2.

Posterior probabilities of inclusion of the four covariates in the simulation example. Case (δ₁,δ₂,δ₃) indicates the five scenarios described in (6).

Predictor	Case (0,0,0)	Case (1,0,0)	Case (1,1,0)	Case (1,0,1)	Case (1,1,1)
x₂	0.03	0.06	0.14	0.15	0.62
x₃	0.02	0.04	0.65	0.13	0.16
x₄	0.03	0.09	0.08	0.66	0.14
x₅	0.70	0.16	0.05	0.07	0.02

In the simulation study (and the real data example), we checked the sensitivity of the results to the prior specification by repeating the analyses with different hyperparameters. In particular, we applied the Gamma prior, Gamma(0.01, 0.005), for the precision, and the uniform prior, Uniform(0, 50), for the standard deviation. Although we do not show details, there was essentially no difference in parameter estimates, inferences or model ranking across prior specifications. One may also choose other priors such as a half-Cauchy prior. According to Gelman [45], the choice of noninformative prior for the scale parameter of a set of parameters sharing a common distribution may have a big impact on inferences, especially when the number of clusters is small (say, below 5) or the cluster-level variance is close to zero. However, using the traditional Gamma prior does not appear to materially affect the inference in our cases. Two reasons are possible: first, the number of subjects (i.e. clusters) is relatively large (n = 46); second, since the random effects in our hierarchical model follow a random distribution rather than a common parametric distribution, it is not clear which prior for the variance of the random effects would be more appropriate. In addition, our sensitivity analysis supports the appropriateness of the prior specification in the proposed model.

5 Application to Low Birth Weight Data in South Carolina

As an illustration, we applied the approach to county-specific low birth weight data (i.e. birth weight less than 2500 grams) for the 46 counties in the state of South Carolina during the period 1997–2006. As the observations were made yearly, a total of 460 observations were included in the data. The county-level numbers of low birth weights were obtained from the South Carolina Department of Health and Environmental Control (DHEC). We considered the county-level population density (defined as population divided by the total land area in square miles), the proportion of African Americans, median household income and unemployment rate as socio-economic predictors of low birth weights. The population density, the proportion of African American population and the household income were acquired from the U.S. census. The unemployment rates were obtained from the U.S. Bureau of Labor Statistics. In addition, we considered aggregate data based on birth certificates for other known socio-demographic and behavioral risk factors for low birth weights, including the proportion of mothers with less than 12th grade education (i.e. high school), the proportion of mothers smoking during pregnancy and the proportion of mothers receiving inadequate prenatal care based on the Kotelchuck Index (IKI value). See Kirby et al. [47] for details of the choice of the covariates. We calculated the correlation for each pair of covariates. The correlations range between 0.01 and 0.46, indicating that the covariates have reasonably low correlation. Multicollinearity was diagnosed by calculating the variance inflation factor (VIF) for each covariate. The VIFs range between 1.15 and 3.68, implying low multicollinearity.
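As a minimal sketch of the multicollinearity check described above, the VIF of each covariate can be read off the diagonal of the inverse of the covariate correlation matrix. The function names and the toy correlation matrix below are illustrative assumptions and are not the study data.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlation(corr):
    """Return the variance inflation factors: the diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(corr))]


# Hypothetical 3-covariate correlation matrix (not taken from the paper).
toy_corr = [
    [1.00, 0.46, 0.10],
    [0.46, 1.00, 0.20],
    [0.10, 0.20, 1.00],
]
vifs = vif_from_correlation(toy_corr)
```

A VIF of 1 corresponds to a covariate that is uncorrelated with all the others; larger values indicate stronger linear dependence on the remaining covariates.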

In the data, y_it denotes the number of low birth weights in county i during year t, and x_it = (1, x_it2, x_it3, x_it4, x_it5, x_it6, x_it7, x_it8)′, with x_it2 indicating the county-level population density, x_it3 the proportion of black people, x_it4 the median household income, x_it5 the unemployment rate, x_it6 the proportion of mothers with less than 12th grade education, x_it7 the proportion of mothers smoking during pregnancy and x_it8 the proportion of mothers receiving inadequate prenatal care (IKI) in county i for year t, for i = 1, …, 46 and t = 1, …, 10.

We completed the specification of the proposed model by choosing the prior Gamma(0.005, 0.005) for τ_α,k, τ_β,k, τ_γ,k and τ_k. The prior probability for selection of regression coefficients is chosen to follow Beta(1, 1). Since the number of regions is 46, we set the truncation level of the stick-breaking representation to 15 [29]. Larger truncation levels gave similar results. We collected 10,000 samples by thinning 50,000 samples by a factor of 5 after a burn-in of 10,000 iterations.

The third column in Table 1 shows the estimated negative cross-validatory predictive log-likelihoods for the proposed model along with the four competing models (as outlined in Section 4). The estimated NLLK_CV,app values for Models 1–4 are much higher than that for Model 5 (1689.83), indicating that Model 5 is the best among all the models. The priors of the parameters and the settings for the hyperparameters were similar to those used in the models of the simulated example.
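The definition of NLLK_CV is given in an earlier section that is not reproduced here. As a hedged sketch only, one common way to estimate a cross-validatory predictive log-likelihood from MCMC output is via conditional predictive ordinates (CPO), where each CPO is the harmonic mean of the likelihood of one observation over the posterior draws. The code below assumes a Poisson likelihood and that posterior draws of each observation's mean are available; the function names are hypothetical and this is not necessarily the estimator used in the paper.

```python
import math


def poisson_logpmf(y, mu):
    """Log probability of count y under a Poisson distribution with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def log_cpo(y, mu_draws):
    """Log CPO for one observation: -log(mean over draws of 1 / p(y | mu_g)).

    Computed with the log-sum-exp trick for numerical stability.
    """
    neg_loglik = [-poisson_logpmf(y, mu) for mu in mu_draws]
    largest = max(neg_loglik)
    mean_inverse = sum(math.exp(v - largest) for v in neg_loglik) / len(neg_loglik)
    return -(largest + math.log(mean_inverse))


def negative_cv_loglik(ys, mu_draws_per_obs):
    """Negative sum of log CPO values; smaller values indicate better predictive fit."""
    return -sum(log_cpo(y, draws) for y, draws in zip(ys, mu_draws_per_obs))


# Toy usage with made-up counts and posterior draws of their Poisson means.
toy_counts = [2, 5]
toy_draws = [[1.8, 2.1, 2.4], [4.5, 5.2, 5.0]]
toy_nllk = negative_cv_loglik(toy_counts, toy_draws)
```

Under this convention a lower value corresponds to better out-of-sample prediction, which matches the way Table 1 is read in the text.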

Table 3 reports the marginal posterior probabilities of inclusion of the seven predictors in terms of fixed effects, spatial random effects and temporal effects. To assess whether the fixed effects and the space/time variations are significant, we calculated the Bayes factor based on the marginal posterior probabilities of the indicators (i.e. δ_1k, δ_2k and δ_3k, k = 2, …, 8). More precisely, the Bayes factor can be calculated as

BF = \frac{\Pr(δ_{jk} = 1 | x, y) / \Pr(δ_{jk} = 1)}{\Pr(δ_{jk} = 0 | x, y) / \Pr(δ_{jk} = 0)}, \quad j = 1, 2, 3,

where Pr(δ_jk = 1) denotes the prior probability of inclusion and Pr(δ_jk = 1|x, y) denotes the marginal posterior probability of inclusion. Since we assume the prior probabilities of inclusion and exclusion to be equal (i.e. 0.5), the Bayes factor reduces to Pr(δ_jk = 1|x, y)/Pr(δ_jk = 0|x, y). When Pr(δ_jk = 1|x, y) ≥ 0.94, the Bayes factor exceeds 15 (0.94/0.06 ≈ 15.7). Based on Jeffreys' Bayes factor criteria ([46], p. 432), there is very strong evidence of an effect when the posterior probability of inclusion is over 0.94. For the fixed effects, the proportion of black people and the median household income are significantly included in the model, with posterior probabilities of inclusion over 94%. For the spatial random effects, the evidence for spatial variation of the effect of the proportion of black people is marginally strong (93%), indicating that the effect of this covariate on low birth weights varies across the counties. On the other hand, there is no strong temporal variation of the effect for any covariate, implying little variation of the covariate effects over time.
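The Bayes factor above is simply the ratio of posterior odds to prior odds of inclusion. A minimal sketch follows; the function name is illustrative, and the default prior probability of 0.5 follows the equal prior odds stated in the text.

```python
def bayes_factor(posterior_inclusion, prior_inclusion=0.5):
    """Bayes factor for inclusion: posterior odds divided by prior odds."""
    posterior_odds = posterior_inclusion / (1.0 - posterior_inclusion)
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / prior_odds


# Fixed-effect inclusion probabilities taken from Table 3.
fixed_effect_probs = {
    "Proportion of Black": 0.97,
    "Median Household Income": 0.96,
    "Proportion of Smoking": 0.92,
}
fixed_effect_bf = {name: bayes_factor(p) for name, p in fixed_effect_probs.items()}
```

With equal prior odds, a posterior inclusion probability of 0.94 gives a Bayes factor of roughly 15.7, whereas 0.93 gives roughly 13.3, which is why 0.94 acts as the cut-off in the discussion.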

Table 3.

Marginal posterior probabilities of inclusion of the 7 predictors in the application.

Predictor	Fixed Effect	Spatial Effect	Temporal Effect
Population Density	0.91	0.82	0.63
Proportion of Black	0.97	0.93	0.89
Median Household Income	0.96	0.91	0.90
Unemployment Rate	0.88	0.82	0.66
Proportion of Less Education	0.85	0.74	0.70
Proportion of Smoking	0.92	0.90	0.78
Proportion of IKI	0.89	0.84	0.69


Table 4 provides the estimates and 95% credible intervals of the fixed effects for the seven predictors based on the posterior samples. A higher proportion of African American population is significantly associated with more low birth weights (the 95% CI, (0.30, 3.06), excludes zero). Besides the proportion of black people, the median household income is marginally negatively associated with the probability of low birth weight (the 95% CI is (−1.92, 0.02)). This is broadly consistent with its posterior probability of inclusion of 0.96. Because the posterior probabilities of inclusion for the population density and the proportion of smoking are 0.91 and 0.92, respectively, it is to be expected that their 95% CIs cover zero, implying that these two covariates contribute less to explaining the low birth weights. The remaining predictors are not significant in predicting low birth weight. These results are consistent with previous studies [47, 48].

Table 4.

Estimates and 95% credible intervals of fixed effects from the posterior samples.

Predictor	Fixed Effect	95% CI
Population Density	−0.82	(−1.76, 0.16)
Proportion of Black	1.71	(0.30, 3.06)
Median Household Income	−1.00	(−1.92, 0.02)
Unemployment Rate	0.86	(−0.43, 2.23)
Proportion of Less Education	0.86	(−0.47, 2.11)
Proportion of Smoking	1.13	(−0.21, 2.56)
Proportion of IKI	0.38	(−0.87, 1.54)


Figure 2 depicts the posterior densities for the spatial-specific effects of the predictors. It shows that the spatial random effects of different predictors follow different distributions. Figure 3 shows boxplots of the posterior samples of the spatially-varying coefficient of the proportion of black people by county, where the points at zero arise from variable selection. Figure 4 presents boxplots of the posterior samples of the temporally-varying coefficients of all covariates over time. Although there are some trend variations, the temporally-varying coefficients vary on a small scale and almost all of the ranges cover zero, implying an insignificant temporal effect for the covariates. Interestingly, even in cases where the model coefficients do not present significant variation in time and space, the proposed approach still provides a better fit to the data in this application based on the NLLK_CV,app in Table 1. This advantage stems from the area-specific nonparametric distribution assumption for the spatially-varying coefficients, along with variable selection, which allows for positive probabilities of excluding coefficients (i.e. zero values).

Figure 2. Posterior densities for the spatially varying coefficients (β_i) across areas in the application.

Figure 3. Boxplots of the posterior samples of the spatially-varying coefficient of the proportion of black people over county.

Figure 4. Boxplots of the posterior samples of the temporally-varying coefficients of the 7 covariates over time.

Figure 5 displays choropleth maps comparing the raw standardized mortality ratio (SMR) of low birth weight with the SMR estimated from the proposed model in the years 1997, 2002 and 2006. The estimated relative risks of low birth weight based on the proposed model capture the main geographical pattern and the temporal trend. In general, the relative risk of low birth weight increased in many counties of South Carolina during 1997–2006. More precisely, high relative risks were initially concentrated in the center and the East of the state. Relative risks then rose further in the East, remained in the same interval in the center, and increased markedly in the North and the Southwest.

Figure 5. Choropleth maps comparing the raw SMR of low birth weight in SC and the estimated SMR based on the proposed method in years 1997, 2002 and 2006.
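As a small sketch of the two quantities being mapped, the raw SMR for county i and year t is the observed count divided by the expected count E_it, while the model-based relative risk is the exponential of the linear predictor, matching the Poisson mean E_it exp(Σ_k x_itk θ_itk) that appears in the Appendix. The function names and the numbers used below are illustrative assumptions.

```python
import math


def raw_smr(observed, expected):
    """Raw standardized ratio: observed count divided by expected count."""
    return observed / expected


def model_relative_risk(x, theta):
    """Model-based relative risk exp(sum_k x_k * theta_k) for one county-year."""
    return math.exp(sum(xk * tk for xk, tk in zip(x, theta)))


# Hypothetical county-year: 12 observed low birth weights against 10 expected.
toy_raw = raw_smr(12, 10.0)
# Hypothetical covariate vector (with intercept) and coefficient values.
toy_fitted = model_relative_risk([1.0, 0.3, -0.2], [0.1, 0.5, 0.4])
```

Values above 1 indicate more low birth weights than expected. The model-based estimate is typically smoother than the raw ratio because it borrows strength across counties and years.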

6 Discussion

We proposed a Bayesian semiparametric model with variable selection for the analysis of space-time data. The proposed approach relaxes the normality assumption for the spatial random effects of covariates while allowing for uncertainty of the inclusion of the fixed effects, spatial random effects and temporal effects. The spatial information is incorporated into the nonparametric distributions for the spatial random effects via the generalized transformation which includes various popular links.

One of the major advantages of the proposed semiparametric model is the ability to flexibly model variation within localised areas of a study region. In the proposed model we allow a geographically-localised definition of the dependence on covariates and provide a flexible method of incorporating covariates with pre-defined inclusion probabilities. Although in the examples the covariate profiles show some impact on the overall county rates, the estimated negative cross-validatory predictive log-likelihoods clearly support the proposed model over conventional space-time random effect models. This suggests that even with this degree of parameterization, there is an overall benefit in the use of such semiparametric models, especially when covariates are to be flexibly accommodated. The proposed approach is computationally intensive, although it is reasonably efficient when coded in R. It took about 84 hours to run 50,000 iterations for the real data example on a Linux server with a Xeon(R) CPU X5355 at 2.66GHz. Future work will focus on developing more efficient semiparametric space-time models for areal data.

Acknowledgements

The authors would also like to thank the editor, the associate editor, and the three referees for valuable comments which greatly improved the presentation of the article. This work was supported by NIH/NHLBI 5R21HL088654-02.

Appendix

Full conditional distributions in Section 2.3

Step 1: Update (δ_1k, δ_2k, δ_3k), for k = 1, …, p, from its full conditional posterior distribution,

exp [\sum_{i = 1}^{n} \sum_{t = 1}^{T} (x_{i t k} y_{i t} θ_{i t k} - E_{i t} exp (\sum_{k = 1}^{p} x_{i t k} θ_{i t k}))] π (δ_{1 k}, δ_{2 k}, δ_{3 k}) .

With this posterior distribution, we can calculate the posterior probability for each scenario in (6). After standardization, we can generate a sample of (δ_1k, δ_2k, δ_3k) from a discrete probability measure as

(δ_{1 k}, δ_{2 k}, δ_{3 k}) ~ \sum_{l = 1}^{5} p_{l k}^{*} δ_{κ_{l k}} (\cdot),

where $p_{l k}^{*}$ denotes the standardized probability and κ_lk denotes different scenarios for (δ_1k, δ_2k, δ_3k) shown in (6).
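Step 1 can be sketched as follows: evaluate the unnormalised log posterior of each of the five scenarios, standardise the values to probabilities, and draw one scenario from the resulting discrete distribution. The scenario list follows the cases shown in Table 2. The function names are hypothetical, and the log-posterior values are assumed to be supplied by the caller.

```python
import math
import random

# The five admissible (delta_1k, delta_2k, delta_3k) scenarios, as listed in Table 2.
SCENARIOS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]


def normalize_log_weights(log_weights):
    """Turn unnormalised log weights into probabilities (log-sum-exp for stability)."""
    largest = max(log_weights)
    weights = [math.exp(v - largest) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def sample_scenario(log_posteriors, rng):
    """Draw one scenario with probability proportional to exp(log posterior)."""
    probs = normalize_log_weights(log_posteriors)
    return rng.choices(SCENARIOS, weights=probs, k=1)[0]


# Toy usage with made-up log-posterior values for the five scenarios.
rng = random.Random(2013)
toy_log_post = [-310.2, -305.7, -301.4, -304.9, -302.0]
toy_probs = normalize_log_weights(toy_log_post)
toy_draw = sample_scenario(toy_log_post, rng)
```

Working on the log scale matters here because the likelihood term is a sum over all areas and time points, so the raw exponentials would typically underflow.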

Step 2: Update α_k, for k = 1, …, p, from its full conditional posterior distribution,

exp [\sum_{i = 1}^{n} \sum_{t = 1}^{T} (x_{i t k} y_{i t} δ_{1 k} α_{k} - E_{i t} exp (\sum_{k = 1}^{p} x_{i t k} θ_{i t k})) - \frac{τ_{α, k}}{2} α_{k}^{2}],
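The full conditional of α_k in Step 2 is not a standard distribution, so it has to be sampled with a generic method. The sketch below evaluates the unnormalised log density and updates α_k with a plain random-walk Metropolis step. This sampler is a substitution chosen for brevity and is not necessarily the one used in the paper. The function names, the `offset` argument (the part of the linear predictor not involving α_k) and the toy data are all assumptions.

```python
import math
import random


def log_cond_alpha(alpha, x, y, expected, offset, tau):
    """Unnormalised log full conditional of alpha_k, assuming delta_1k = 1.

    All sequences are flattened over (i, t); offset holds the remaining
    terms of the linear predictor for each observation.
    """
    total = -0.5 * tau * alpha ** 2
    for xi, yi, ei, oi in zip(x, y, expected, offset):
        total += xi * yi * alpha - ei * math.exp(oi + xi * alpha)
    return total


def metropolis_step(current, log_density, step_sd, rng):
    """One random-walk Metropolis update with a Gaussian proposal."""
    proposal = current + rng.gauss(0.0, step_sd)
    log_ratio = log_density(proposal) - log_density(current)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current


# Toy data (hypothetical): four county-year observations.
x = [1.0, 0.5, -0.3, 0.8]
y = [3, 1, 0, 2]
expected = [2.0, 1.5, 1.0, 1.8]
offset = [0.0, 0.0, 0.0, 0.0]


def target(alpha):
    return log_cond_alpha(alpha, x, y, expected, offset, tau=1.0)


rng = random.Random(0)
alpha = 0.0
chain = []
for _ in range(2000):
    alpha = metropolis_step(alpha, target, 0.5, rng)
    chain.append(alpha)
```

The same pattern would apply to the other non-conjugate updates (for example γ_tk in Step 4) by swapping in the corresponding log density.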

Step 3: Update τ_α,k, for k = 1, …, p, from its full conditional posterior distribution,

Gamma (a_{α, k} + \frac{1}{2}, b_{α, k} + \frac{α_{k}^{2}}{2})
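The precision updates are conjugate Gamma draws with a shape/rate parameterisation. A minimal sketch of Step 3 follows, written generically so that it also covers any update of the form Gamma(a + m/2, b + ½ Σ v²). The function name is hypothetical. Note that Python's `random.gammavariate` takes a scale parameter, so the rate is inverted.

```python
import random


def draw_precision(a, b, values, rng):
    """Draw a precision from Gamma(a + m/2, rate = b + 0.5 * sum(v^2)), m = len(values)."""
    shape = a + len(values) / 2.0
    rate = b + 0.5 * sum(v * v for v in values)
    # gammavariate(alpha, beta) uses beta as a scale, i.e. 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)


# Step 3 for a single coefficient alpha_k, with Gamma(0.005, 0.005) hyperparameters.
rng = random.Random(42)
alpha_k = 1.0
tau_draws = [draw_precision(0.005, 0.005, [alpha_k], rng) for _ in range(20000)]
tau_mean = sum(tau_draws) / len(tau_draws)
```

In this toy setting the shape and rate are both 0.505, so the draws should average close to 1.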

Step 4: Update γ_tk, for t = 1, …, T and k = 1, …, p, from its full conditional posterior distribution,

exp [\sum_{i = 1}^{n} \sum_{t = 1}^{T} (x_{i t k} y_{i t} δ_{3 k} γ_{t k} - E_{i t} exp (\sum_{k = 1}^{p} x_{i t k} θ_{i t k})) - \frac{τ_{γ, k}}{2} \sum_{t = 1}^{T} {(γ_{t k} - γ_{t - 1, k})}^{2}],

Step 5: Update τ_γ,k, for k = 1, …, p, from its full conditional posterior distribution,

Gamma (a_{γ, k} + \frac{T}{2}, b_{γ, k} + \frac{1}{2} \sum_{t = 1}^{T} {(γ_{t k} - γ_{t - 1, k})}^{2})

Step 6: Update β_ik, for i = 1, …, n and k = 1, …, p. Let $S_{k}^{*} = (S_{1 k}^{*}, \dots, S_{r_{k} k}^{*})'$ be the configuration of the current distinct values of S_k. Then we sample $β_{S_{s k}^{*}}^{*}$ , for s = 1, …, r_k, from its full conditional posterior distribution,

exp [\sum_{i : S_{i k} = S_{s k}^{*}} \sum_{t = 1}^{T} (x_{i t k} y_{i t} δ_{2 k} β_{S_{s k}^{*}}^{*} - E_{i t} exp (\sum_{k = 1}^{p} x_{i t k} θ_{i t k})) - \frac{τ_{β, k} β_{S_{s k}^{*}}^{* 2}}{2}] .

Step 7: Update τ_β,k, for k = 1, …, p, from its full conditional posterior distribution,

Gamma (a_{β, k} + \frac{n}{2}, b_{β, k} + \frac{1}{2} \sum_{i = 1}^{n} β_{i k}^{2})

Step 8: Update S_ik, for i = 1, …, n and k = 1, …, p from its full conditional posterior distribution,

\sum_{s = 1}^{r_{k}} {ω̂}_{i s k} δ_{s} (\cdot),

where ${ω̂}_{i s k} \propto ω_{i s k} exp [\sum_{t = 1}^{T} (x_{i t k} y_{i t} δ_{2 k} β_{S_{s k}^{*}}^{*} - E_{i t} exp (\sum_{k = 1}^{p} x_{i t k} θ_{i t k}))]$ for s = 1, …, r_k.

Step 9: Update ω_isk, for s = 1, …, r_k and k = 1, …, p, as follows,

ω_{i s k} = {V̂}_{i s k}^{*} \prod_{l < s} (1 - {V̂}_{i l k}^{*}),

where ${V̂}_{i s k}^{*} = u_{i s k} {V̂}_{s k}, u_{i s k} = h (ϕ_{i s k}), {V̂}_{s k} ~ Beta (1 + L_{s k}, M_{k} + \sum_{l = s + 1}^{r_{k}} L_{l k})$ , with L_sk = #{S_ik = s, for i = 1, …, n}, denoting the number of S_ik values that equal s.
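The stick-breaking construction in Step 9 can be sketched as follows: each area-specific break V*_isk is the product of the area-specific factor u_isk and the shared break V̂_sk, and each weight is the current break times the stick length left over from earlier breaks. The function names and the numeric inputs are illustrative assumptions.

```python
def stick_breaking_weights(breaks):
    """Compute w_s = V_s * prod_{l < s} (1 - V_l) for a sequence of breaks V_s."""
    weights = []
    remaining = 1.0  # length of stick not yet allocated
    for v in breaks:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def area_weights(u_i, v_hat):
    """Area-specific weights using breaks V*_is = u_is * V_hat_s."""
    breaks = [u * v for u, v in zip(u_i, v_hat)]
    return stick_breaking_weights(breaks)


# Hypothetical values for one area with three components.
toy_u = [0.9, 0.6, 1.0]
toy_v_hat = [0.5, 0.5, 1.0]
toy_weights = area_weights(toy_u, toy_v_hat)
```

The weights sum to one only when the final break equals one, which is how a truncated stick-breaking representation is usually closed. Otherwise the leftover stick length is unallocated mass.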

Step 10: Update M_k, for k = 1, …, p, from its full conditional posterior distribution,

Gamma (r_{k}, - \sum_{l = 1}^{r_{k} - 1} log (1 - {V̂}_{l k})) .

Step 11: Update ϕ_isk, for k = 1, …, p, from its full conditional posterior distribution,

\sum_{s = 1}^{r_{k}} {V̂}_{i s k}^{*} \prod_{l < s} (1 - {V̂}_{i l k}^{*}) δ_{β_{i s}^{*}} exp (- \frac{m_{i} τ_{k}}{2} {(ϕ_{i s k} - {ϕ̅}_{i s k})}^{2}),

where ${V̂}_{i s k}^{*} = u_{i s k} {V̂}_{s k}$ with $u_{i s k} = \frac{{(1 + ϕ_{i s k} λ_{k} / 2)}^{1 / λ_{k}}}{{(1 + ϕ_{i s k} λ_{k} / 2)}^{1 / λ_{k}} + {(1 - ϕ_{i s k} λ_{k} / 2)}^{1 / λ_{k}}}$ .
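As a hedged sketch of the generalized transformation u = h(ϕ; λ), the code below assumes the symmetric Aranda-Ordaz family [31], in which the second term of the denominator carries a minus sign, (1 − ϕλ/2)^{1/λ}. Under this assumption λ → 0 recovers the logistic link and λ = 1 gives a linear link. The function names, the restriction to λ ≥ 0 and the handling of values outside the admissible range are choices made for this illustration.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def generalized_inverse_link(phi, lam):
    """Map phi to (0, 1) via the assumed symmetric Aranda-Ordaz family, lam >= 0."""
    if lam < 0.0:
        raise ValueError("this sketch only handles lam >= 0")
    if lam < 1e-8:
        # Limiting case lam -> 0: the logistic link.
        return _sigmoid(phi)
    plus = 1.0 + phi * lam / 2.0
    minus = 1.0 - phi * lam / 2.0
    # Outside |phi * lam / 2| < 1 the transform saturates at 0 or 1.
    if plus <= 0.0:
        return 0.0
    if minus <= 0.0:
        return 1.0
    # plus^(1/lam) / (plus^(1/lam) + minus^(1/lam)), computed on the log scale.
    return _sigmoid((math.log(plus) - math.log(minus)) / lam)


toy_linear = generalized_inverse_link(1.0, 1.0)
toy_logit = generalized_inverse_link(1.0, 0.0)
```

With the denominator written as two identical terms, the ratio would equal 1/2 for every ϕ. The minus sign is what makes u depend on the spatial effect ϕ_isk.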

Step 12: Update τ_k, for k = 1, …, p, from its full conditional posterior distribution,

Gamma (a_{k} + \frac{n r_{k}}{2}, b_{k} + \frac{1}{2} \sum_{s = 1}^{r_{k}} \sum_{i ~ j} {(ϕ_{i s k} - ϕ_{j s k})}^{2})

Step 13: Update λ_k, for k = 1, …, p, from its full conditional posterior distribution,

\prod_{i = 1}^{n} (\sum_{s = 1}^{r_{k}} {V̂}_{i s k}^{*} \prod_{l < s} (1 - {V̂}_{i l k}^{*}) δ_{β_{i s}^{*}}) .

REFERENCES

1. Bernardinelli L, Montomoli C. Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. Statistics in Medicine. 1992;11:983–1007. doi: 10.1002/sim.4780110802.
2. Besag J. Spatial interaction and statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B. 1974;36:192–236.
3. Waller LA, Carlin BP, Xia H, Gelfand AE. Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association. 1997;92:607–617.
4. Knorr-Held L, Besag J. Modelling risk from a disease in time and space. Statistics in Medicine. 1998;17:2045–2060. doi: 10.1002/(sici)1097-0258(19980930)17:18<2045::aid-sim943>3.0.co;2-p.
5. Lagazio C, Biggeri A, Dreassi E. Age-period-cohort models and disease mapping. Environmetrics. 2003;14:475–490.
6. Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Statistics in Medicine. 2000;19:2555–2567. doi: 10.1002/1097-0258(20000915/30)19:17/18<2555::aid-sim587>3.0.co;2-#.
7. Lagazio C, Dreassi E, Biggeri A. A hierarchical Bayesian model for spacetime variation of disease risk. Statistical Modelling. 2001;1:17–29.
8. Richardson S, Abellán JJ, Best N. Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer risks in Yorkshire. Statistical Methods in Medical Research. 2006;15:385–407. doi: 10.1191/0962280206sm458oa.
9. Ugarte MD, Goicoa T, Ibáñez B, Militino AF. Evaluating the performance of spatiotemporal Bayesian models in disease mapping. Environmetrics. 2009;20:647–665.
10. Assunção RM. Space varying coefficient models for small area data. Environmetrics. 2003;14:453–473.
11. Gamerman D, Moreira ARB, Rue H. Space-varying regression models: specifications and simulation. Computational Statistics and Data Analysis. 2003;42:513–533.
12. Dreassi E, Biggeri A, Catelan D. Space-time models with time dependent covariates for the analysis of the temporal lag between socio-economic factors and mortality. Statistics in Medicine. 2005;24:1919–1932. doi: 10.1002/sim.2063.
13. Cai B, Lawson AB, Hossain Md, Choi J. Bayesian Latent Structure Models with Space-Time Dependent Covariates. Statistical Modelling. 2012;12:145–164. doi: 10.1177/1471082X1001200202.
14. Gelfand A, Banerjee S, Gamerman D. Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics. 2005;16:465–479.
15. Paez MS, Gamerman D, Landim FMP, Salazar E. Spatially varying dynamic coefficient models. Journal of Statistical Planning and Inference. 2008;138(4):1038–1058.
16. Gelfand AE, Kottas A, MacEachern SN. Bayesian nonparametric spatial modeling with Dirichlet process mixing. Journal of the American Statistical Association. 2005;100:1021–1035.
17. Duan J, Guindani M, Gelfand A. Generalized spatial Dirichlet process models. Biometrika. 2007;94:809–825.
18. Reich BJ, Fuentes M. A multivariate semiparametric Bayesian spatial modeling framework for hurricane surface wind fields. Annals of Applied Statistics. 2007;1:249–264.
19. Li P, Banerjee S, Hanson TA, McBean AM. Nonparametric hierarchical modeling for detecting boundaries in areally referenced spatial datasets. Research Report. Division of Biostatistics, University of Minnesota; 2010.
20. Reich BJ, Fuentes M, Herring AH, Evenson KR. Bayesian Variable Selection for Multivariate Spatially-Varying Coefficient Regression. Biometrics. 2010;66:772–782. doi: 10.1111/j.1541-0420.2009.01333.x.
21. Kuo L, Mallick B. Variable selection for regression models. Sankhyá B. 1998;60:65–81.
22. Geisser S. On prior distribution for binary trials (with discussion). American Statistician. 1984;38(4):244–251.
23. Scott J, Berger J. An exploration of aspects of Bayesian multiple testing. Journal of Statistical Planning and Inference. 2006;136:2144–2162.
24. Cui W, George EI. Empirical Bayes vs. fully Bayes variable selection. Journal of Statistical Planning and Inference. 2008;138:888–900.
25. Ghosal S, van der Vaart AW. Fundamentals of Nonparametric Bayesian Inference. Cambridge University Press; (expected in 2013).
26. Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica. 1994;4:639–650.
27. Blackwell D, Macqueen JB. Ferguson distributions via Pólya urn schemes. Annals of Statistics. 1973;1:353–355.
28. Bush CA, MacEachern SN. A semiparametric Bayesian model for randomised block designs. Biometrika. 1996;83:275–285.
29. Ishwaran H, James LF. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association. 2001;96:161–173.
30. Ishwaran H, Zarepour M. Markov Chain Monte Carlo in Approximate Dirichlet and Beta Two-Parameter Process Hierarchical Models. Biometrika. 2000;87:371–390.
31. Aranda-Ordaz F. On two families of transformations to additivity for binary response data. Biometrika. 1981;68:357–363.
32. Ohlssen DI, Sharples LD, Spiegelhalter DJ. Flexible random-effects models using Bayesian semi-parametric models: Application to institutional comparisons. Statistics in Medicine. 2007;26:2088–2112. doi: 10.1002/sim.2666.
33. Gilks WR, Best NG, Tan KKC. Adaptive Rejection Metropolis Sampling within Gibbs Sampling. Journal of Applied Statistics. 1995;44:455–472.
34. Spiegelhalter DJ, Best NG, Carlin BP, Linde AVD. Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society, Series B. 2002;64:1–34.
35. Plummer M. Penalized loss functions for Bayesian model comparison. Biostatistics. 2008;9:523–539. doi: 10.1093/biostatistics/kxm049.
36. Geisser S. Predictive Inference: An Introduction. London: Chapman & Hall; 1993.
37. Gelfand AE, Dey D, Chang H. Model determination using predictive distributions with implementation via sampling based methods (with discussion). In: Bernardo J, et al., editors. Bayesian Statistics 4. Oxford University Press; 1992. pp. 147–167.
38. Dey D, Chen MH, Chang H. Bayesian Approach for Nonlinear Random Effects Models. Biometrics. 1997;53:1239–1252.
39. Sinha D, Dey DK. Semiparametric Bayesian Analysis of Survival Data. Journal of the American Statistical Association. 1997;92:1195–1212.
40. Draper D, Krnjajić M. Bayesian model specification. Technical Report. Santa Cruz: Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California; 2006.
41. Cowles MK, Carlin BP. Markov Chain Monte Carlo diagnostics: A comparative review. Journal of the American Statistical Association. 1995;91:883–904.
42. Plummer M, Best NG, Cowles K, Vines K. CODA: Convergence Diagnosis and Output Analysis for MCMC. R News. 2006;6:7–11.
43. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2007. ISBN 3-900051-07-0, URL http://www.R-project.org.
44. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing. 2000;10:325–337.
45. Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;1:515–534.
46. Jeffreys H. The Theory of Probability. 3rd ed. Oxford: 1961.
47. Kirby R, Liu J, Lawson A, Choi J, Cai B, Hossain M. Small area low birth weight incidence and socio-economic predictors: a latent spatial structure approach. Spatial and Spatio-Temporal Epidemiology. 2011;2(4):265–271. doi: 10.1016/j.sste.2011.07.011.
48. Olson ME, Diekema D, Elliott BA, Renier CM. Impact of Income and Income Inequality on Infant Health Outcomes in the United States. Pediatrics. 2010;126:1165–1173. doi: 10.1542/peds.2009-3378.

