Bayesian latent structure models with space-time-dependent covariates

Bo Cai; Andrew B Lawson; Md Monir Hossain; Jungsoon Choi

doi:10.1177/1471082X1001200202

Author manuscript; available in PMC: 2013 Jun 3.

Published in final edited form as: Stat Modelling. 2012 Apr 1;12(2):145–164. doi: 10.1177/1471082X1001200202

PMCID: PMC3670235 NIHMSID: NIHMS389179 PMID: 23741176

Abstract

Spatial-temporal data require flexible regression models which can model the dependence of responses on space- and time-dependent covariates. In this paper, we describe a semiparametric space-time model from a Bayesian perspective. Nonlinear time dependence of covariates and the interactions among the covariates are constructed by local linear and piecewise linear models, allowing for more flexible orientation and position of the covariate plane by using time-varying basis functions. Space-varying covariate linkage coefficients are also incorporated to allow for the variation of space structures across geographical locations. The formulation accommodates uncertainty in the number and locations of the piecewise basis functions to characterize the global effects, spatially structured and unstructured random effects in relation to covariates. The proposed approach relies on variable selection-type mixture priors for uncertainty in the number and locations of basis functions and in the space-varying linkage coefficients. A simulation example is presented to evaluate the performance of the proposed approach against competing models. A real data example is used for illustration.

Keywords: Bayesian regression, latent structure model, piecewise linear splines, space-time models, variable selection

1 Introduction

There is increasing attention to the analysis of spatially and temporally referenced data in both methodological and applied research. Such data are of substantial interest in a variety of disciplines such as epidemiology, ecology, political science and economics. For example, one might be interested in geographical patterns and trends of a certain disease in a particular region over time. We start by describing a general space-time model.

Suppose that the dependent variable y_it is observed in the ith spatial unit (e.g., region or individual) and the tth time point with i = 1, …, n and t = 1, …, T. A general space-time model can be expressed as

$$y_{it} \sim f(y_{it} \mid \cdot), \tag{1.1}$$

where f (y_it|·) denotes a conditional distribution of y_it given observed covariates, latent variables and measurement errors, with mean μ_it = E(y_it), which is typically related to a linear predictor η_it through a suitable link function g(·), where η_it = g(μ_it). The response variable could be observed as a continuous (e.g., disease rate), categorical (e.g., indicators of disease or health status) or count (e.g., number of disease cases or deaths) outcome. The predictor η_it is usually expressed as

$$\eta_{it} = x_{it}' \beta + u_{i} + v_{i} + \delta_{t}, \tag{1.2}$$

where x_it = (1, x_it2, …, x_itp)′ denotes a p × 1 vector of covariates associated with unit i and time t, β = (β₁, …, β_p)′ denotes a p × 1 vector of population parameters, u_i and v_i denote random effects measuring spatial similarity and excess heterogeneity, respectively, and δ_t denotes a structured temporal random component. Conventionally, the fixed effects β are given a multivariate normal prior. The parameters u_i and v_i are assumed to be independent. The parameter v_i captures the heterogeneity among the units and is given an exchangeable normal prior, while u_i captures the clustering property of spatial data and is assumed to follow a conditional autoregressive (CAR) distribution (a special case of the general class of Markov random fields) (Besag, 1974), $u_{i} \mid u_{-i} \sim N(\bar{u}_{i}, (\tau m_{i})^{-1})$, where u_−i = (u₁, …, u_i−1, u_i+1, …, u_n)′, $\bar{u}_{i} = m_{i}^{-1} \sum_{j \in \partial_{i}} u_{j}$ with ∂_i denoting the neighbour set of unit i, m_i denotes the number of neighbours of unit i and τ denotes the precision parameter. The constraint $\sum_{i=1}^{n} u_{i} = 0$ is imposed for identifiability of the overall intercept. The temporal parameter δ_t is assumed to follow an AR(1) prior.
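As a minimal sketch of the CAR full conditional described above, the following Python computes the conditional mean (the average of the neighbours' values) and the conditional variance 1/(τ m_i) for one unit. The four-unit adjacency structure, the values of u and the function name `car_conditional` are illustrative assumptions, not taken from the paper.

```python
def car_conditional(i, u, neighbours, tau):
    """Return (mean, variance) of u_i | u_{-i} under the CAR prior.

    mean = average of the neighbours' values,
    variance = 1 / (tau * m_i), with m_i the number of neighbours of unit i.
    """
    nbrs = neighbours[i]
    m_i = len(nbrs)
    mean = sum(u[j] for j in nbrs) / m_i
    variance = 1.0 / (tau * m_i)
    return mean, variance


# Hypothetical map of four units arranged in a line: 0 - 1 - 2 - 3
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
u = [0.3, -0.1, 0.2, -0.4]  # illustrative values satisfying the sum-to-zero constraint
mean_1, var_1 = car_conditional(1, u, neighbours, tau=2.0)
```

Units with more neighbours receive a smaller conditional variance, which is how the prior borrows more strength in densely connected parts of the map.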

When responses are count data, model (1.1) becomes a typical spatio-temporal model on which several hierarchical models have been built (Waller et al., 1997; Knorr-Held and Besag, 1998; Lagazio et al., 2003; among others). More complex issues arise when a space-time interaction effect, θ_it, is included in the predictor (1.2). Knorr-Held (2000) incorporated a space-time interaction for inseparable space-time variation in disease risk and described four types of space-time interaction. Richardson et al. (2006) proposed joint spatio-temporal modelling of two diseases with a shared space-time interaction. Lagazio et al. (2001) focussed on the birth cohort model to assess latent effects associated with temporal trends. Ugarte et al. (2009) evaluated the performance of various spatio-temporal Bayesian models. Hossain and Lawson (2010) also evaluated ST (spatial-temporal) small area models but with an emphasis on cluster recovery/detection. When the effect of covariates on the response is the main focus, some space-time models with space-dependent coefficients (e.g., Assunção, 2003; Gamerman et al., 2003) or time-dependent coefficients (Dreassi et al., 2005) have been developed. In practice, the space-time-dependent effect of a specific space-time covariate on responses is also of substantial interest (e.g., effects of poverty rates on low birth weight may vary across different regions and time points). Although some work has been done (Gelfand et al., 2005; Paez et al., 2008), there is still a lack of development for such models. Typically, one may consider a space-time model based on linear combinations of covariates such as $\eta_{it} = x_{it}' \beta_{it} + u_{i} + v_{i} + \delta_{t}$. This expression can be treated as a general case including the model with a space-time interaction term, the model with space-dependent covariates (Assunção, 2003) and the model with time-dependent covariates (Dreassi et al., 2005). However, linear models prove too rigid when large quantities of data are considered and nonlinearity is present (Hastie and Tibshirani, 1990).

Outside the context of space-time data analysis, there exists a fairly rich literature on nonlinear regression modelling in both the frequentist and Bayesian frameworks. For example, Friedman (1991) proposed multivariate adaptive regression splines by using flexible tensor product splines. Holmes and Mallick (2001, 2003) described a Bayesian approach to piecewise linear models with the covariate surface constructed by basis functions. Bigelow and Dunson (2007) extended the method to allow the spline coefficients to be subject specific. Pintore et al. (2006) derived spatially adaptive smoothing splines based on a reproducing kernel Hilbert space representation. Numerous references can be found in Denison et al. (2002) and Ruppert et al. (2003), and references therein.

Within the context of spatial and temporal modelling, Schmid and Held (2004) investigated space-time trends by incorporating intercept terms of covariates of interest with an additional spatial component into the model. Banerjee and Johnson (2006) proposed to model single- and multi-resolution spatially varying growth curves as Gaussian processes that capture associations at single and multiple resolutions. Kneib and Fahrmeir (2006) described a general class of structured additive regression models for categorical responses, allowing for a semiparametric predictor. Zhao et al. (2006) developed general design generalized linear mixed models in which random effects with spatial correlation structure are included. None of these methods, however, simultaneously considers space- and time-specific effects of space-time covariates on responses.

In this paper, we focus on developing a general space-time model with main interest in the effect of space- and time-dependent covariates on the response. We extend the generalized multivariate regression splines (Holmes and Mallick, 2001) to flexibly accommodate the space- and time-specific covariates, allowing for flexible orientation and position of the covariate plane by using time-varying basis functions. Space-varying covariate linkage coefficients are incorporated for variation of space structures across geographical locations. Such multivariate regression models allow for effects of covariates on responses not only across space but also over time (i.e., interactions) in a flexible manner. We develop an approach which relies on variable selection-type mixture priors for uncertainty in the number and locations of the piecewise linear basis functions and in the space-varying linkage coefficients.

The remainder of the paper is organized as follows. Section 2 describes the space-time latent structure model with multivariate linear splines for a covariate linkage, together with the prior specification and posterior implementation. Section 3 discusses model evaluation and comparison. Section 4 evaluates the performance of the approach based on a simulated example. Section 5 illustrates the approach with a real spatial-temporal data set. Finally, Section 6 summarizes and discusses the results.

2 Space-time models with latent structure

2.1 The model and prior specification

We model the unknown linear predictor as

$$\eta_{it} = \sum_{k=1}^{K} \beta_{ik} (x_{it}' u_{tk})_{+}, \tag{2.1}$$

where β_i = (β_i1, …, β_iK)′ denotes a K × 1 vector of candidate space-specific linkage parameters for the underlying latent effects, u_tk denotes a p × 1 vector of time-varying basis parameters and (x_it′u_tk)₊ denotes a basis function, namely the inner product of x_it and u_tk truncated below at 0. To allow each model to include an intercept term, we define (x_it′u_t1)₊ to be one for all i and t. When the β_i are subject-specific coefficients, this formula is a generalization of the formulae by Holmes and Mallick (2001) and Bigelow and Dunson (2007). In model (2.1), the linkage coefficients β_i have spatially structured effects on covariates. To avoid an identifiability problem, we constrain $\sum_{i=1}^{n} \beta_{ik} = C_{k}$, where C_k is some constant (Assunção et al., 2002). One may typically decompose β_ik as α_k + ξ_ik, where α_k denotes the global effect and ξ_ik denotes the spatially structured effect, with $\sum_{i=1}^{n} \xi_{ik} = 0$ needed to identify the overall effect because of the location invariance of the CAR prior. An attractive property of the structure of equation (2.1) is that the basis functions are time dependent, which provides flexible orientation and position of the covariate plane and captures important trends in the impact of covariates over time. Each of the (K − 1) non-intercept basis functions contains a linear effect for at least one covariate. When a basis contains multiple covariate effects, the proposed model allows for the effects of interactions (i.e., dependence) of space-time covariates on the response. The proposed model can be thought of as a more general spatio-temporal model. For example, if the observations are only spatially dependent, then the basis functions do not depend on time. In this case, the model reduces to the space-varying regression model (Assunção, 2003; Gamerman et al., 2003) with time-dependent covariates, i.e., $\eta_{it} = x_{it}' \theta_{i}$.
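To make the structure of equation (2.1) concrete, here is a small Python sketch that evaluates the predictor for one unit and one time point. The covariate vector, linkage coefficients and basis parameters are made-up numbers, and the function names `truncated_plane` and `linear_predictor` are hypothetical helpers rather than part of the authors' implementation.

```python
def truncated_plane(x, u):
    """(x'u)_+ : inner product of covariates and basis parameters, truncated below at 0."""
    return max(0.0, sum(xj * uj for xj, uj in zip(x, u)))


def linear_predictor(x_it, beta_i, u_t):
    """eta_it = sum_k beta_ik * (x_it' u_tk)_+, with the first basis fixed at one."""
    eta = beta_i[0] * 1.0  # intercept basis: (x_it' u_t1)_+ is defined to be one
    for beta_ik, u_tk in zip(beta_i[1:], u_t[1:]):
        eta += beta_ik * truncated_plane(x_it, u_tk)
    return eta


# Hypothetical values: p = 3 covariates (including the leading 1), K = 3 bases
x_it = [1.0, 0.5, 2.0]
beta_i = [0.2, 1.0, -0.5]
u_t = [None, [-0.25, 1.0, 0.0], [1.0, 0.0, -1.0]]  # u_t[0] is unused (intercept basis)
eta_it = linear_predictor(x_it, beta_i, u_t)
```

In this example the second basis is active (its inner product is 0.25) while the third is truncated to zero, so only the intercept and the second basis contribute to η_it.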

The proposed approach allows for a flexible number of unknown basis functions and the linkage coefficients. Since the number of basis functions related to covariates is unknown a priori, one may consider the reversible jump MCMC (Markov chain Monte Carlo) (Green, 1995) for such models (e.g., Holmes and Mallick, 2001; Bigelow and Dunson, 2007). However, it involves complicated marginal likelihood calculation or approximation. To avoid this complexity, we adopt variable selection-type mixture priors for uncertainty of the number and locations of piecewise basis functions. To allow the kth basis to be effectively excluded from the model, we choose a mixture prior including a point mass at zero and a CAR distribution for the linkage coefficient β_ik given the indicator γ_k:

$$\beta_{ik} \mid \gamma_{k} \sim \gamma_{k} \delta_{0}(\beta_{ik}) + (1 - \gamma_{k}) N(\bar{\beta}_{\partial_{i}, k}, (\tau_{k} m_{i})^{-1}), \tag{2.2}$$

where γ_k is an indicator variable which is 1 for exclusion or 0 for inclusion of the kth basis function, δ₀(·) denotes a point mass at zero, $\bar{\beta}_{\partial_{i}, k} = m_{i}^{-1} \sum_{j \in \partial_{i}} \beta_{jk}$ with ∂_i denoting the neighbour set of unit i, m_i denotes the number of neighbours of unit i and τ_k denotes the precision parameter. We refer to prior (2.2) as a zero-inflated CAR prior, ZI-CAR(γ_k, τ_k). The prior probability of the kth basis out of the K candidate bases related to covariates being excluded is p_1,k0 = Pr(H_1,k0: β_ik = 0). The prior for γ_k is then chosen as a Bernoulli distribution, Bern(p_1,k0). The prior for τ_k is chosen as Gamma(a_τ, b_τ), where Gamma(a, b) is a gamma distribution with mean a/b and variance a/b².
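The mixture in prior (2.2) can be illustrated with a short Python sketch that draws one β_ik given the indicator γ_k and the current values at the neighbouring units. This is only a draw from the conditional prior, not a step of the authors' sampler; the function name `draw_zi_car`, the neighbour values, the precision and the exclusion probability are all illustrative assumptions.

```python
import random


def draw_zi_car(gamma_k, nbr_values, tau_k, rng):
    """Draw beta_ik from the zero-inflated CAR prior, given gamma_k and neighbour values.

    gamma_k = 1 excludes the basis (point mass at zero); gamma_k = 0 draws from
    N(mean of neighbours, 1 / (tau_k * m_i)).
    """
    if gamma_k == 1:
        return 0.0
    m_i = len(nbr_values)
    mean = sum(nbr_values) / m_i
    sd = (1.0 / (tau_k * m_i)) ** 0.5
    return rng.gauss(mean, sd)


rng = random.Random(0)   # fixed seed, for reproducibility only
p_excl = 0.5             # hypothetical prior exclusion probability p_{1,k0}
gamma_k = 1 if rng.random() < p_excl else 0  # gamma_k ~ Bern(p_excl)
beta_excluded = draw_zi_car(1, [0.4, 0.6], tau_k=4.0, rng=rng)
beta_included = draw_zi_car(0, [0.4, 0.6], tau_k=4.0, rng=rng)
```

Because γ_k is shared by all units, setting it to 1 removes the kth basis from the predictor everywhere on the map at once.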

To reflect time-dependent measurements in each region, we use multivariate dynamic normal priors for u_tk,−1 which can be written as

$$u_{tk,-1} \sim N_{p-1}(\rho_{k} u_{t-1,k,-1}, \nu), \quad t = 1, \dots, T,$$

where ρ_k denotes the variation of the temporal autocorrelation in the risk, u_0k,−1 denotes the starting vector of u_tk,−1 and ν denotes a diagonal covariance matrix, diag(ν₂, …, ν_p). Because β_ik and u_tk are not uniquely determined for each k, we follow Holmes and Mallick (2001) and normalize u_tk,−1 = (u_tk,2, …, u_tk,p)′, that is, u_tk without its first element u_tk,1, so that ||u_tk,−1|| = 1 for t = 1, …, T and k = 1, …, K. Then u_tk,−1 gives the orientation of the plane in the (p − 1)-dimensional covariate space and u_tk,1 gives the position of the plane. To flexibly select the components from the p − 1 covariates at each time point, we first choose u_0k,−1 to be zero. We then choose a variable selection-type mixture prior with a point mass at zero for the variance ν_l, for l = 2, …, p,

$$\nu_{l} \sim \kappa_{l} \delta_{0}(\nu_{l}) + (1 - \kappa_{l}) \mathrm{IG}(\nu_{l}; a_{\nu}, b_{\nu}), \tag{2.3}$$

where κ_l is an indicator variable which is 1 for exclusion or 0 for inclusion of the lth covariate and IG(·) denotes an inverse gamma distribution. We refer to prior (2.3) as ZI-IG(κ_l, a_ν, b_ν). The prior for κ_l is chosen as Bern(p_2,l0), where p_2,l0 denotes the probability of the lth covariate being excluded. The first element of u_tk can be defined as $u_{tk,1} = - x_{i't,-1}' u_{tk,-1}$, where x_i′t,−1 is randomly selected with i′ ∈ {1, …, n}. With probability p_2,l0, all u_tk,l are zero for t = 1, …, T and k = 1, …, K, indicating that the lth covariate is excluded from the model. The mixture prior allows the locations of the splines to vary over time by effectively excluding elements from each basis function. The overall prior probability of excluding all covariates (except the intercept) from the model at time t is $\prod_{l=2}^{p} p_{2,l0}$.
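The way a zero variance removes a covariate can be seen in a short simulation of the dynamic prior with a zero starting vector. The sketch below is a simplification under stated assumptions: it omits the normalization of the basis vector, uses made-up variances, and the function name `simulate_basis_path` is hypothetical.

```python
import random


def simulate_basis_path(T, rho, nu, rng):
    """Simulate u_{t,-1}, t = 1..T, under u_t ~ N(rho * u_{t-1}, diag(nu)) with u_0 = 0.

    A variance nu[l] == 0 (the point mass in the ZI-IG prior) pins component l
    at zero for every t, which drops that covariate from the basis.
    """
    u_prev = [0.0] * len(nu)
    path = []
    for _ in range(T):
        u_t = [rho * up + (rng.gauss(0.0, v ** 0.5) if v > 0 else 0.0)
               for up, v in zip(u_prev, nu)]
        path.append(u_t)
        u_prev = u_t
    return path


rng = random.Random(1)  # fixed seed, for reproducibility only
nu = [0.5, 0.0, 0.2]    # hypothetical variances; the second component is excluded
path = simulate_basis_path(T=10, rho=1.0, nu=nu, rng=rng)
```

Since the path starts at zero and receives no innovations, the excluded component stays at zero at all ten time points, while the other components follow random walks.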

To allow for flexibility in the prior probability p_1,k0, we choose a beta hyperprior for the prior exclusion probability, p_1,k0 ~ Beta(c₁, d₁). Given the assumption that all prior probabilities are equal (say, p₀), the full conditional for p_1,k0 can be easily calculated (see details in the Appendix). Similarly, the prior of p_2,l0 is chosen as a beta distribution, Beta(c₂, d₂), allowing for more flexibility in adapting to the desired model. For the choice of c_i and d_i (i = 1, 2), following the suggestion by Geisser (1984), we choose c_i = d_i = 1, which yields the uniform hyperprior. Scott and Berger (2006) discuss the choice of priors for the prior probability. They conclude that the objective prior (i.e., the uniform prior) for the prior probability can easily be implemented computationally, while incorporation of subjective prior information can be beneficial when available. In our case, we have no subjective information about the prior probability of inclusion of the covariates, so we choose a uniform prior. For more details, please refer to Geisser (1984), Scott and Berger (2006, 2008) and Cui and George (2008), among others.

2.2 Posterior computation

The joint posterior distribution for the parameters is

$$\pi(\beta, u, \tau, \nu, \gamma, \kappa \mid y, x) \propto \prod_{i=1}^{n} \prod_{t=1}^{T} f\Big\{\sum_{k=1}^{K} \beta_{ik} (x_{it}' u_{tk})_{+}\Big\} \, \pi(\beta \mid \gamma) \, \pi(\gamma) \, \pi(u \mid \nu, \kappa) \, \pi(\nu \mid \kappa) \, \pi(\tau),$$

where β = (β₁, …, β_n)′, u = (u₁, …, u_T)′ and f(·) can be normal linear, Poisson and logistic regression models for the continuous, count and binary outcomes, respectively, described as

$$f(y_{it} \mid \eta_{it}) = \begin{cases} \sqrt{\dfrac{\tau}{2\pi}} \exp\left\{-\dfrac{\tau}{2} (y_{it} - \eta_{it})^{2}\right\} & \text{(normal linear with } \eta_{it} = \mu_{it}\text{)} \\[2ex] \dfrac{1}{y_{it}!} \exp\left\{y_{it} \eta_{it} - \exp(\eta_{it})\right\} & \text{(Poisson with } \eta_{it} = \log \mu_{it}\text{)} \\[2ex] \dfrac{\exp(y_{it} \eta_{it})}{1 + \exp(\eta_{it})} & \text{(logistic with } \eta_{it} = \log \dfrac{\mu_{it}}{1 - \mu_{it}}\text{)}. \end{cases}$$
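For reference, the three densities above translate directly into log-likelihood functions. The Python sketch below is one way to write them; the function names are illustrative, and the Poisson version does not include the offset E_it used later in the simulated example.

```python
import math


def log_lik_normal(y, eta, tau):
    """Normal linear model with identity link; tau is the precision."""
    return 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (y - eta) ** 2


def log_lik_poisson(y, eta):
    """Poisson model with log link: log(mu) = eta."""
    return y * eta - math.exp(eta) - math.lgamma(y + 1)


def log_lik_logistic(y, eta):
    """Bernoulli model with logit link, y in {0, 1}."""
    return y * eta - math.log1p(math.exp(eta))


# Illustrative evaluations at eta = 0
ll_pois = log_lik_poisson(0, 0.0)    # log of exp(-1)
ll_logit = log_lik_logistic(1, 0.0)  # log of 1/2
```

Working on the log scale avoids underflow when these terms are multiplied over all units and time points in the joint posterior.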

We choose priors for the parameters as described in Section 2.1. The posterior computation relies on a stochastic search variable selection Gibbs sampling algorithm (George and McCulloch, 1993), in which we iteratively sample from the full conditional distributions of each of the parameters. For each element of β_i and ν, the posterior has a mixture structure with a point mass at zero and a conjugate (for normal linear) or non-conjugate (for Poisson and logistic) distribution. To sample from the non-conjugate distribution, we use adaptive rejection Metropolis sampling (Gilks et al., 1995). Under the linear normal case, reparameterization allows the model to have conditionally linear structure for each parameter which facilitates the use of conjugate priors. For the purpose of generality, we instead provide a general full conditional posterior distribution for sampling.

The posterior computation relies on the Gibbs sampler and Metropolis-Hastings algorithms. After the parameters are initialized, the proposed MCMC algorithm proceeds as detailed in the Appendix. Samples from the joint posterior distribution of the parameters are generated by repeating these steps for a large number of iterations after apparent convergence. For the identity link, the parameters can be sampled from the conjugate full conditional distributions.

As guidance on how to specify the initial number of truncated planes for a particular analysis, the simulation experiments that we conducted suggest that a large initial number of truncated planes, K, provides sufficient room for changes of dimension. However, once a minimum necessary number is reached, any further increase only marginally affects the fit while the computation time increases dramatically. Too low a value of K, on the other hand, results in inflexible modelling of the unknown linear predictor. Thus, we recommend starting with at least 10 truncated planes for small sample sizes and, for larger sample sizes (n > 20), referring to the rule of thumb K = min(40, n/4) given by Ruppert (2002) in the context of penalized splines.

3 Model comparison

The deviance information criterion (DIC) (Spiegelhalter et al., 2002) is widely used as a model comparison tool. DIC has been shown to be an approximation to a penalized loss function based on the deviance, with a penalty derived from a cross-validation argument. However, the implicit approximation is valid only when the effective number of parameters is much smaller than the number of independent observations (Plummer, 2008). Plummer (2008) pointed out that in disease mapping this assumption does not hold, so that DIC under-penalizes complex models. Plummer (2008) proposed penalized loss functions instead of p_D, the effective number of parameters, to assess model adequacy. However, as Plummer (2008) noted, this method requires MCMC runs with each observation left out in turn. Such a calculation is not feasible in general, especially for large datasets. In this paper, we consider the comparison method based on the conditional predictive ordinate (CPO) (Gelfand et al., 1992; Geisser, 1993; Dey et al., 1997; Sinha and Dey, 1997). The CPO for the ith observation at time t is defined as the cross-validated marginal posterior predictive density

$$\begin{aligned} \mathrm{CPO}_{it} &= f(y_{it} \mid y_{(it)}) \\ &= \int f(y_{it} \mid \theta) \, f(\theta \mid y_{(it)}, x_{(it)}) \, d\theta \\ &= \left( \int \frac{1}{f(y_{it} \mid \theta, x_{i})} \, f(\theta \mid y, x) \, d\theta \right)^{-1}, \end{aligned}$$

where y_(it) denotes the vector of observations with the ith observation at time t deleted and θ is the vector of model parameters. The cross-validation likelihood can be estimated by

$$L_{CV} = \prod_{i=1}^{n} \prod_{t=1}^{T} \mathrm{CPO}_{it}.$$

Since the quantity of the cross-validation likelihood is typically close to zero, the negative cross-validatory predictive log-likelihood (Spiegelhalter et al., 1996; Draper and Krnjajić, 2006) can be used

$$\mathrm{NLLK}_{CV} = - \sum_{i=1}^{n} \sum_{t=1}^{T} \log \mathrm{CPO}_{it}.$$

Since a closed form of CPO_it is usually unavailable, a Monte Carlo estimate of CPO_it can be obtained straightforwardly from MCMC samples $\{\theta^{(s)}\}_{s=1}^{N}$ from the posterior distribution f(θ|y,x)

$$\widehat{\mathrm{CPO}}_{it} = \left( \frac{1}{N} \sum_{s=1}^{N} \frac{1}{f(y_{it} \mid \theta^{(s)}, x_{i})} \right)^{-1},$$

where N is the number of iterations after a burn-in period. The estimate of the negative cross-validatory predictive log-likelihood can be calculated accordingly. Since a large CPO indicates agreement between the observation and the model, a model with a smaller NLLK_CV for all observations implies a better fit.
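The Monte Carlo estimate above is a harmonic mean of likelihood values, which can underflow if computed naively. The following Python sketch computes log CPO_it and NLLK_CV from per-draw log-likelihoods using a log-sum-exp step; the function names and the small table of likelihood values are illustrative assumptions, not output from the paper's models.

```python
import math


def log_cpo(log_lik_draws):
    """log CPO_it from per-draw log-likelihoods log f(y_it | theta^(s)).

    CPO is the harmonic mean of the likelihoods, so
    log CPO = log N - logsumexp(-log_lik_draws).
    """
    neg = [-ll for ll in log_lik_draws]
    m = max(neg)
    lse = m + math.log(sum(math.exp(v - m) for v in neg))
    return math.log(len(neg)) - lse


def nllk_cv(log_lik_by_obs):
    """Negative cross-validatory predictive log-likelihood: minus the sum of log CPO_it."""
    return -sum(log_cpo(draws) for draws in log_lik_by_obs)


# Hypothetical likelihood values for two observations and three posterior draws
log_lik_by_obs = [
    [math.log(0.2), math.log(0.5), math.log(0.5)],  # CPO = 1 / mean(5, 2, 2) = 1/3
    [math.log(0.1), math.log(0.1), math.log(0.1)],  # CPO = 0.1
]
value = nllk_cv(log_lik_by_obs)  # equals log(3) + log(10)
```

A single draw with a very small likelihood dominates the harmonic mean, so in practice it can be worth inspecting the individual log CPO values as well as their sum.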

4 A simulated example

The motivation of this simulation was to evaluate the performance of the proposed approach, including the accuracy of the estimates, the sensitivity to different choices of hyperparameters and the comparison of the proposed model with other space-time models. Without loss of generality and for illustration purposes, we created the data based on the Belgium map available from GeoBUGS in WinBUGS (Lunn et al., 2000), which contains 43 districts. We considered the case of count responses. The data were generated for each of n = 43 districts over an observation period of T = 10 based on the model y_it ~ Poisson(E_it exp(η_it)), with log-relative risk $\eta_{it} = x_{it}' \alpha + \tilde{x}_{it}' \xi_{i} + v_{i} + \delta_{t}$, where $x_{it} = \tilde{x}_{it} = (1, x_{it2}, x_{it3}, x_{it4}, x_{it5})'$. E_it is an expected number of events obtained as R n_it, where n_it is the population count in district i at time t and $R = \sum_{it} y_{it} / \sum_{it} n_{it}$. This model is similar to the one by Assunção (2003), where space-dependent covariates are included. The fixed effect α was chosen as (1, 1, 1, 0, 0)′, implying that the last two covariates are irrelevant. We generated ξ_i from a multivariate CAR, MVCAR(τ), where τ⁻¹ is a 3 × 3 covariance matrix with components along the rows {0.5, 0.2, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.8}, v_i ~ N(0, 1) and δ_t ~ N(δ_t−1, 2) for t = 2, …, T with δ₁ ~ N(0, 2). We generated x_itl ~ U(0, 1) for l = 2, …, 5. For practical reasons, the expected count E_it was sampled from U(1, 5).

We specified the priors for the parameters of the proposed model as follows. We used Gamma(0.05, 0.05) as the prior for τ_k. The prior for the spatially structured random effects β_ik was chosen as the prior in (2.2) with $N(\bar{\beta}_{\partial_{i}, k}, (\tau_{k} m_{i})^{-1})$. For the time-varying bases u_tk,−1, the prior was chosen as N(u_t−1,k,−1, ν), where ν_l ~ ZI-IG(κ_l, 0.05, 0.05) for l = 2, …, 5. The starting vector u_0k,−1 was chosen as 0 and ρ_k as 1. For the beta hyperpriors of p_1,k0 and p_2,l0, we chose c = d = 1, which yields the uniform hyperprior. Following Holmes and Mallick (2001), we chose the initial number of truncated planes as K = 30. We also tried several larger initial numbers of truncated planes, which yielded essentially identical results.

We implemented the analysis using the Gibbs sampling algorithm described in Section 2. We generated 50 000 iterations after a burn-in of 10 000 iterations. Convergence was assessed by using a variety of diagnostics described by Cowles and Carlin (1995) and implemented using CODA (Plummer et al., 2006) in R. The diagnostic tests showed rapid convergence and efficient mixing. The parameters were estimated after thinning the chain by a factor of 5 to obtain a sample of size 10 000. Sensitivity of the results to the prior specification was assessed by repeating the analysis with different hyperparameters, which gave very similar results.
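The arithmetic of the burn-in and thinning scheme can be checked with a few lines of Python. This is only a bookkeeping sketch: the chain here is a list of iteration indices standing in for real posterior draws, and `thin_chain` is a hypothetical helper.

```python
def thin_chain(draws, burn_in, thin):
    """Discard the first `burn_in` draws, then keep every `thin`-th draw."""
    return draws[burn_in::thin]


# Stand-in chain: 10 000 burn-in iterations followed by 50 000 retained iterations
chain = list(range(60_000))
kept = thin_chain(chain, burn_in=10_000, thin=5)
n_kept = len(kept)  # 50 000 / 5 posterior draws
```

Thinning reduces storage and autocorrelation between saved draws; the posterior summaries are then computed from the retained sample only.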

We compared the proposed model (Model 5) with four competing spatio-temporal models. The log-relative risks of these models are as follows:

$$\begin{aligned} \text{Model 1}: \quad \eta_{it} &= x_{it}' \alpha + \xi_{i} + \delta_{t}, \\ \text{Model 2}: \quad \eta_{it} &= x_{it}' \alpha + \xi_{i} + v_{i} + \delta_{t}, \\ \text{Model 3}: \quad \eta_{it} &= x_{it}' \alpha + \xi_{i} + v_{i} + \delta_{t} + b_{it}, \\ \text{Model 4}: \quad \eta_{it} &= x_{it}' \alpha + x_{it}' \xi_{i} + v_{i} + \delta_{t}. \end{aligned}$$

In the first three models, we followed conventional settings by specifying the prior of α as $N_{p}(0, \Sigma_{\alpha})$ with $\Sigma_{\alpha} \sim \mathrm{IWishart}(p, \Sigma_{0}^{-1})$ and Σ₀ = {0.1, 0.005, 0.005, 0.005, 0.1, 0.005, 0.005, 0.005, 0.1}. The prior of ξ_i was chosen as $N(\bar{\xi}_{i}, (\tau_{1} m_{i})^{-1})$ with $\bar{\xi}_{i} = m_{i}^{-1} \sum_{j \in \partial_{i}} \xi_{j}$ and τ₁ ~ Gamma(0.005, 0.005). The prior of v_i was taken as $N(0, \tau_{2}^{-1})$ with τ₂ ~ Gamma(0.005, 0.005). The prior of δ_t was chosen as $N(\delta_{t-1}, \tau_{3}^{-1})$ with $\delta_{0} \sim N(0, \tau_{3}^{-1})$ and τ₃ ~ Gamma(0.005, 0.005). We chose the prior of b_it to be $N(0, \tau_{4}^{-1})$ with τ₄ ~ Gamma(0.005, 0.005). For Model 4, the prior of ξ_i was chosen as $N(\bar{\xi}_{\partial_{i}}, (\Sigma_{\xi} m_{i})^{-1})$ with $\bar{\xi}_{\partial_{i}} = m_{i}^{-1} \sum_{j \in \partial_{i}} \xi_{j}$ and $\Sigma_{\xi} \sim \mathrm{Wishart}(p, \Sigma_{0})$. We implemented Models 1-4 using WinBUGS (Lunn et al., 2000). Although Model 5 can also be implemented in WinBUGS, doing so is computationally intensive. A C program was instead written to carry out the proposed algorithm.

The second column in Table 1 presents the estimated negative cross-validatory predictive log-likelihoods for the five models. Models 1 and 2 perform essentially the same. This is because the unstructured random effects vary only moderately across regions, which is consistent with the simulation setting. Models 3-5 perform much better than the first two models, with Model 5 being the best. Since Model 4 is the model from which the data were generated, its performance is very close to that of Model 5.

Table 1.

Model comparison based on the negative cross-validatory log-likelihood (NLLK_CV) for the simulated example (sim) and the application to the low birth weight data in South Carolina (app)

Model	NLLK_CV,sim	NLLK_CV,app
Model 1	1160.967	1820.849
Model 2	1161.409	1802.395
Model 3	767.532	1746.393
Model 4	637.534	1763.222
Model 5	634.297	1738.383

In Figure 1, the upper plot shows the true and pointwise estimated relative risks with 95% credible intervals from the proposed approach (Model 5) across all districts at time 5. The lower plot shows the true and estimated relative risks for district 20 over time based on the five models, along with 95% credible intervals from Model 5. The proposed model provides estimates closer to the true values than the other models.

Figure 1. Results from the simulated example. Top panel: true and estimated relative risks with 95% pointwise credible intervals from the proposed approach (Model 5) across all districts at time 5. Bottom panel: true and estimated relative risks with 95% pointwise credible intervals from the proposed approach across time points for district 20, along with posterior estimates from Models 1-4

Figure 2 shows posterior densities of the variance parameters ν₄ and ν₅ and boxplots of the posterior means for the time-varying basis components u₄ and u₅ at each time point in the simulated example. The two variances ν₄ and ν₅, corresponding to the time-varying coefficients u_tk,4 and u_tk,5, are close to zero, implying that x_it4 and x_it5 are effectively excluded from the model. This is consistent with the simulation design.

Figure 2. Left panel: posterior densities of the parameters ν₄ and ν₅ in the simulated example. Right panel: boxplots of the posterior means for the time-varying basis components u₄ and u₅ at each time point in the simulation. The horizontal line denotes the true value

Sensitivity of the results to the prior specification was assessed by repeating the analysis with different hyperparameters. Figure 3 shows the histogram of the posterior number of truncated planes and the probabilities of inclusion of covariates in the basis functions. The average number of components varies little across the various choices of hyperparameters. The last two covariates are essentially excluded from the model, which is consistent with the design of the covariates.

Figure 3. (a) Histogram of the posterior number of truncated planes and (b) the probabilities of inclusion of covariates in the basis functions for the simulated data

5 Application

As an illustration, we applied the approach to county-specific low birth weight data (i.e., birth weight less than 2500 grams) across 46 counties in the state of South Carolina during the period 1997-2007. The county-level numbers of low-birth-weight births were obtained from the South Carolina Department of Health and Environmental Control. The population density, the proportion of African American population, the household income and the poverty rate were acquired from the U.S. census. The unemployment rates were obtained from the U.S. Bureau of Labor Statistics.

In the data, y_it denotes the number of low-birth-weight births in county i during year t and x_it = (1, x_it1, x_it2, x_it3, x_it4, x_it5)′, with x_it1 denoting the county-level population density, x_it2 the proportion of black people, x_it3 the median household income, x_it4 the poverty rate and x_it5 the unemployment rate in county i for year t, for i = 1, …, 46 and t = 1, …, 11. The population density is defined as the population divided by the total land area in square miles. The expected low birth weight count for county i in year t, E_it, is calculated as n_it R, where n_it is the total number of births for county i in year t and R is the overall statewide low birth weight rate, calculated as the total low birth weight count divided by the total number of births over all counties and time periods.
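The expected counts described above follow from a simple internal standardization, sketched here in Python. The counts for two counties and three years are invented for illustration, and `expected_counts` is a hypothetical helper name.

```python
def expected_counts(y, n):
    """E_it = n_it * R, with R = (total cases) / (total births) over all i and t."""
    total_cases = sum(sum(row) for row in y)
    total_births = sum(sum(row) for row in n)
    R = total_cases / total_births
    E = [[n_it * R for n_it in row] for row in n]
    return E, R


# Hypothetical data: 2 counties (rows) by 3 years (columns)
y = [[8, 10, 12], [20, 25, 25]]          # low birth weight counts
n = [[100, 100, 100], [200, 250, 250]]   # total births
E, R = expected_counts(y, n)
```

By construction the expected counts sum to the observed total, so a relative risk above one marks a county and year with more low-birth-weight births than the statewide rate would predict.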

We completed the specification of the proposed model by choosing the priors Gamma(0.05, 0.05) for τ_k and ZI-IG(κ_l, 0.05, 0.05) for ν_l. The prior probability of a point mass at zero for the variance components of β_i and u_tk,−1 is chosen to follow Beta(1, 1). Since five covariates were initially included in the model, the initial number of truncated planes was chosen as 30. We collected 10 000 samples by thinning 50 000 samples by a factor of 5 after a burn-in of 10 000 iterations.

Figure 4 displays spatial maps of the posterior means and the standard deviations of relative risk in years 1997, 2002 and 2007. Figure 5 shows the posterior means and the 95% pointwise credible intervals for the relative risk of low birth weight in four representative counties (randomly selected) along with their corresponding poverty rates over the years 1997-2007. The estimated relative risk of low birth weight in Dorchester County, where poverty rates decrease, decreases slightly over the 11-year period. Abbeville and Greenwood Counties, with increasing poverty rates, show broadly ascending trends over time. The poverty rate in Sumter County roughly follows a ‘V’ shape over time and, interestingly, the estimated relative risk follows a similar curve. We also investigated the sensitivity of the number of truncated planes for β_i to various choices of the hyperparameters and found that it varies only slightly. The posterior mean of the number of truncated planes is 12.4 with 95% credible interval (9.6, 16.5). The proportion of black people, the median household income, the poverty rate and the unemployment rate are included in the model with posterior probability of inclusion over 97% for each covariate, while the population density has over 98% posterior probability of exclusion, implying that the population density can be excluded from the model at the 5% significance level.

Figure 4. Spatial maps of (a) posterior mean and (b) posterior standard deviation (STD) of relative risks in years 1997, 2002 and 2007 for the low birth weight data in South Carolina. Left panel: posterior means of relative risk. Right panel: posterior STD of relative risk.

Figure 5. Posterior means and 95% pointwise credible intervals of relative risks of low birth weight for the four counties along with the corresponding poverty rates in SC through the 11-year time period. Left panel: solid lines denote posterior means of relative risks and dashed lines denote 95% pointwise credible intervals. Right panel: poverty rate over time.

The third column in Table 1 shows the estimated negative cross-validatory predictive log-likelihoods for the proposed model along with the four competing models. The estimated NLLK_CV,app values for Models 1 and 2 are close, while the other three models have much lower NLLK_CV,app values. The proposed model (Model 5) has the smallest NLLK_CV,app value, indicating that it performs best among the models considered. The priors and hyperparameter settings were similar to those used for the models in the simulated example.
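For readers who want to compute a criterion of this type, the sketch below uses one common approximation: the conditional predictive ordinate (CPO) of each observation is estimated by the harmonic mean of its likelihood over posterior draws, and the criterion is the negative sum of log CPOs. This is an assumption for illustration and may differ in detail from the definition of NLLK_CV,app used in the paper. The Poisson likelihood, function names and toy numbers are likewise hypothetical.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the Poisson probability mass function at count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def neg_cv_loglik(y, mu_draws):
    """Approximate -sum_j log CPO_j, where CPO_j is the harmonic mean over
    posterior draws of p(y_j | theta_s). mu_draws[s][j] is the Poisson mean
    of observation j at draw s."""
    n_draws = len(mu_draws)
    total = 0.0
    for j, y_j in enumerate(y):
        neg_ll = [-poisson_logpmf(y_j, draw[j]) for draw in mu_draws]
        m = max(neg_ll)  # log-sum-exp shift for numerical stability
        # -log CPO_j = log of the mean of 1 / p(y_j | theta_s)
        total += m + math.log(sum(math.exp(v - m) for v in neg_ll) / n_draws)
    return total

# Toy data: three observations, three posterior draws of their means.
y = [2, 0, 5]
mu_draws = [[2.1, 0.4, 4.0], [1.8, 0.6, 5.5], [2.4, 0.5, 4.7]]
nllk = neg_cv_loglik(y, mu_draws)
```

Smaller values indicate better out-of-sample predictive performance, which is the sense in which the models are ranked in the text.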

6 Discussion

We proposed a Bayesian regression model with multivariate linear splines for the analysis of space-time data. The proposed approach extends generalized multivariate regression splines (Holmes and Mallick, 2001) to flexibly accommodate the space- and time-specific covariates, allowing for flexible orientation and position of the covariate plane by incorporating time-varying basis functions.

One of the major advantages of a semiparametric modelling specification is the ability to flexibly model variation within localized areas of a study region. In the proposed model, we allow a geographically localized definition of the dependence on covariates and provide a flexible method of incorporating variates via zero-inflation mixture priors. Although in the examples the covariate profiles show some impact on the overall county rates, it is evident that the estimated negative cross-validatory predictive log-likelihoods support the proposed model over conventional space-time random effect models. This suggests that, even with this degree of parameterization, there is an overall benefit in the use of such semiparametric models, especially when covariates are to be flexibly accommodated. The proposed approach is computationally intensive, though it is reasonably efficient when coded in C. Future work will focus on developing space-time models with nonparametric modelling and clustering of the spatial effect coefficients, and on developing a more efficient sampling method.

Acknowledgements

The authors would like to thank the editor, the associate editor and the referees for valuable comments which greatly improved the presentation of the paper. This work was supported by NIH/NHLBI 1R21HL088654-01A2.

Appendix

Full conditional distributions in Section 2.2.

Step 1: Update β_ik, for k = 1, …, K, from its full conditional posterior distribution,

$$\prod_{i=1}^{n} \prod_{t=1}^{T} f\left\{\sum_{k=1}^{K} \beta_{ik} (x_{it}' u_{tk})_{+}\right\} \exp\left\{-\frac{\tau_k}{2} \sum_{i=1}^{n} \sum_{i \sim j} (\beta_{ik} - \beta_{jk})^{2}\right\}$$

with the conditional posterior probability

$$1 - \hat{p}_{1,k} = \Pr(\gamma_k = 0 \mid \beta, \gamma_{-k}) = \frac{1}{1 + C_k},$$

where $C_k = \frac{p_{1,k0}}{1 - p_{1,k0}} \times \frac{L(\beta_k = 0, \beta_{-k}, u, \tau, \nu, \gamma)}{L(\beta_k = \beta_k, \beta_{-k}, u, \tau, \nu, \gamma)}$ with $L(\beta_k, \beta_{-k}, u, \tau, \nu, \gamma) = \prod_{i=1}^{n} \prod_{t=1}^{T} f\{\sum_{k=1}^{K} \beta_{ik} (x_{it}' u_{tk})_{+}\}$ and $\beta_k = (\beta_{1k}, \ldots, \beta_{nk})'$. Otherwise, $\beta_{ik}$ is set to zero. Simultaneously, $\gamma_k$ ($k = 1, \ldots, K$) can be sampled from its full conditional posterior distribution, $\mathrm{Bern}(\hat{p}_{1,k})$.
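The likelihood ratio in C_k can be extremely large or small, so in practice it is safer to work with log-likelihoods. The following Python sketch computes the probability C_k/(1 + C_k) of setting β_k to zero from log C_k without overflow. The function and argument names are hypothetical, and the log-likelihood values are placeholders rather than output from the model.

```python
import math

def prob_zero(prior_p, loglik_zero, loglik_current):
    """Return p_hat = C / (1 + C), where
    log C = logit(prior_p) + loglik_zero - loglik_current."""
    log_c = (math.log(prior_p) - math.log1p(-prior_p)
             + loglik_zero - loglik_current)
    if log_c >= 0:
        # C >= 1: use 1 / (1 + 1/C) so exp never sees a large positive argument
        return 1.0 / (1.0 + math.exp(-log_c))
    e = math.exp(log_c)
    return e / (1.0 + e)

# Placeholder log-likelihoods with beta_k set to zero and at its current value.
p_hat = prob_zero(prior_p=0.5, loglik_zero=-120.0, loglik_current=-118.0)
```

Here the current value of β_k fits better by two log-likelihood units, so the probability of zeroing the plane is small, and 1 − p_hat is the probability of keeping it.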

Step 2: Update p_1,k0 from its full conditional distribution,

$$p_{1,k0} \mid \gamma \sim \mathrm{Beta}(c_1 + n_\gamma,\; d_1 + K - n_\gamma),$$

where γ corresponds to a model from the model space M containing 2^K models, and n_γ denotes the number of excluded predictors in the model, i.e., $\sum_{k = 1}^{K} γ_{k}$ .
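The conjugate Beta update in Step 2 (and the analogous update in Step 6, with c_2, d_2 and p − 1 in place of c_1, d_1 and K) can be sketched with Python's standard library. The function name, the seed and the toy indicator vector are hypothetical.

```python
import random

def update_point_mass_prob(indicators, c, d, rng):
    """Draw from Beta(c + n_excl, d + K - n_excl), where n_excl = sum(indicators)
    is the number of excluded terms and K = len(indicators)."""
    n_excl = sum(indicators)
    K = len(indicators)
    return rng.betavariate(c + n_excl, d + K - n_excl)

rng = random.Random(2012)                 # fixed seed for reproducibility
gamma = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]    # toy indicators, K = 10, 3 excluded
draws = [update_point_mass_prob(gamma, 1.0, 1.0, rng) for _ in range(20000)]
```

With a Beta(1, 1) prior and 3 of 10 terms excluded, the draws come from Beta(4, 8), whose mean is 1/3.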

Step 3: Update τ_k, for k = 1, …, K, from its full conditional posterior distribution,

$$\mathrm{Gamma}\left(a_\tau + \frac{n}{2},\; b_\tau + \frac{1}{2} \sum_{i=1}^{n} \sum_{i \sim j} (\beta_{ik} - \beta_{jk})^{2}\right).$$
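A minimal sketch of the Step 3 draw is given below. It assumes that Gamma(a, b) is parameterized by shape a and rate b, which is the usual reading of a conjugate update of this form but is not stated explicitly here. The neighbourhood structure, function names and seed are hypothetical.

```python
import random

def pairwise_sq_diff(beta_k, neighbours):
    """Sum over i and over j ~ i of (beta_ik - beta_jk)^2.
    Each neighbouring pair appears twice, following the double sum as written."""
    return sum((beta_k[i] - beta_k[j]) ** 2
               for i, nbrs in enumerate(neighbours) for j in nbrs)

def update_tau_k(beta_k, neighbours, a_tau, b_tau, rng):
    """Draw tau_k from Gamma(shape = a_tau + n/2, rate = b_tau + sum / 2)."""
    shape = a_tau + len(beta_k) / 2.0
    rate = b_tau + 0.5 * pairwise_sq_diff(beta_k, neighbours)
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate
    return rng.gammavariate(shape, 1.0 / rate)

rng = random.Random(7)
neighbours = [[1], [0, 2], [1]]     # toy map: three regions in a row
beta_k = [0.0, 1.0, 3.0]
tau_k = update_tau_k(beta_k, neighbours, a_tau=0.05, b_tau=0.05, rng=rng)
```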

Step 4: Update ν_l, for l = 2, …, p, from its full conditional distribution with a point mass at zero

$$\kappa_l \delta_0(\nu_l) + (1 - \kappa_l)\, \mathrm{IG}\left\{a_\nu + \frac{KT}{2},\; b_\nu + \frac{1}{2} \sum_{t=1}^{T} \sum_{k=1}^{K} (u_{tkl} - \rho_k u_{t-1,kl})^{2}\right\},$$

where $\kappa_l \sim \mathrm{Bern}(\hat{p}_{2,l})$ with $\hat{p}_{2,l} = \frac{p_{2,l0} L(\beta, u_l = 0, u_{-l}, \tau, \nu, \gamma)}{p_{2,l0} L(\beta, u_l = 0, u_{-l}, \tau, \nu, \gamma) + (1 - p_{2,l0}) L(\beta, u_l = u_l, u_{-l}, \tau, \nu, \gamma)}$ and $u_l = \{u_{tkl}\}_{t,k}$.
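Drawing from the zero-inflated inverse-gamma conditional in Step 4 amounts to a Bernoulli draw followed, when the point mass is not selected, by an inverse-gamma draw. The sketch below assumes the shape and rate have already been computed from the expression above, and that IG(a, b) denotes the distribution of 1/X for X ~ Gamma(shape a, rate b). The function name and the numeric inputs are hypothetical.

```python
import random

def draw_nu_l(p_hat, shape, rate, rng):
    """Return 0 with probability p_hat (kappa_l = 1); otherwise return an
    inverse-gamma(shape, rate) draw."""
    if rng.random() < p_hat:
        return 0.0
    # If X ~ Gamma(shape, rate) then 1/X ~ IG(shape, rate);
    # random.gammavariate takes (shape, scale), hence scale = 1 / rate.
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

rng = random.Random(11)
nu_l = draw_nu_l(p_hat=0.3, shape=3.0, rate=2.0, rng=rng)
```

A draw of exactly zero removes covariate l from all truncated planes at that iteration, which is how the variable selection operates.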

Step 5: Update u_tk from its full conditional distribution, for t = 1, …, T and k = 1, …, K,

$$\prod_{i=1}^{n} \prod_{t=1}^{T} f\left\{\sum_{k=1}^{K} \beta_{ik} (x_{it}' u_{tk})_{+}\right\} \exp\left\{-\frac{1}{2} (u_{tk} - \rho_k u_{t-1,k})' \nu^{-1} (u_{tk} - \rho_k u_{t-1,k})\right\}.$$

For each t and k, we standardize the components of u_tk,−1 and u_tk1 = x_it,−1
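The likelihood terms in Steps 1 and 5 depend on the parameters only through the sum of truncated planes, Σ_k β_ik (x_it′ u_tk)_+. A minimal Python sketch of this quantity for a single county and year follows; the function names and toy values are hypothetical.

```python
def truncated_plane(x_it, u_tk):
    """(x_it' u_tk)_+ : the positive part of the inner product."""
    return max(0.0, sum(x * u for x, u in zip(x_it, u_tk)))

def linear_predictor(x_it, beta_i, u_t):
    """sum_k beta_ik * (x_it' u_tk)_+ for one county i and one year t."""
    return sum(b * truncated_plane(x_it, u_k) for b, u_k in zip(beta_i, u_t))

x_it = [1.0, 0.5]                   # intercept plus one covariate
beta_i = [2.0, -1.0]                # coefficients of K = 2 planes
u_t = [[0.0, 1.0], [1.0, -4.0]]     # one direction vector per plane
eta = linear_predictor(x_it, beta_i, u_t)
```

In this toy case the first plane is active (inner product 0.5) and the second is truncated to zero (inner product −1), so only the first contributes to the predictor.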

Step 6: Update p_2,l0 from its full conditional distribution,

$$p_{2,l0} \mid \kappa \sim \mathrm{Beta}(c_2 + n_\kappa,\; d_2 + p - 1 - n_\kappa),$$

where n_κ denotes the number of excluded predictors in the model, i.e., $\sum_{l = 2}^{p} κ_{l}$ .

Step 7: When the link function is the identity, update τ from its full conditional distribution

$$\mathrm{Gamma}\left\{c_\tau + \frac{nT}{2},\; d_\tau + \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{it} - \eta_{it})^{2}\right\},$$

where the prior for τ is Gamma(c_τ, d_τ).

References

Assunção RM. Space varying coefficient models for small area data. Environmetrics. 2003;14:453–73.
Assunção RM, Potter JE, Cavenaghi SM. A Bayesian space varying parameter model applied to estimating fertility schedules. Statistics in Medicine. 2002;21:2057–2075. doi: 10.1002/sim.1153.
Banerjee S, Johnson GA. Coregionalized single and multi-resolution spatially varying growth-curve modelling. Biometrics. 2006;62:864–76. doi: 10.1111/j.1541-0420.2006.00535.x.
Besag J. Spatial interaction and statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B. 1974;36:192–236.
Bigelow J, Dunson DB. Bayesian adaptive regression splines for hierarchical data. Biometrics. 2007;63:724–32. doi: 10.1111/j.1541-0420.2007.00761.x.
Cowles MK, Carlin BP. Markov Chain Monte Carlo diagnostics: A comparative review. Journal of the American Statistical Association. 1995;91:883–904.
Cui W, George EI. Empirical Bayes vs. fully Bayes variable selection. Journal of Statistical Planning and Inference. 2008;138:888–900.
Denison DGT, Holmes CC, Mallick BK, Smith AFM. Bayesian Methods for Nonlinear Classification and Regression. John Wiley; Chichester: 2002.
Dey D, Chen MH, Chang H. Bayesian approach for nonlinear random effects models. Biometrics. 1997;53:1239–1252.
Draper D, Krnjajić M. Bayesian model specification. Technical report, Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California; Santa Cruz: 2006.
Dreassi E, Biggeri A, Catelan D. Space-time models with time dependent covariates for the analysis of the temporal lag between socio-economic factors and mortality. Statistics in Medicine. 2005;24:1919–32. doi: 10.1002/sim.2063.
Friedman JH. Multivariate adaptive regression splines. Annals of Statistics. 1991;19:1–141.
Gamerman D, Moreira ARB, Rue H. Space-varying regression models: specifications and simulation. Computational Statistics and Data Analysis. 2003;42:513–33.
Geisser S. On prior distribution for binary trials (with discussion). American Statistician. 1984;38:244–51.
Geisser S. Predictive inference: An introduction. Chapman & Hall; London: 1993.
Gelfand AE, Dey D, Chang H. Model determination using predictive distributions with implementation via sampling based methods (with discussion). In: Bernardo J, et al., editors. Bayesian Statistics 4. Oxford University Press; 1992. pp. 147–167.
Gelfand A, Banerjee S, Gamerman D. Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics. 2005;16:465–79.
George EI, McCulloch RE. Variable selection via Gibbs sampling. Journal of the American Statistical Association. 1993;88:881–9.
Gilks WR, Best NG, Tan KKC. Adaptive rejection Metropolis sampling within Gibbs sampling. Journal of Applied Statistics. 1995;44:455–72.
Green P. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–32.
Hastie TJ, Tibshirani RJ. Generalized additive models. Chapman & Hall; London: 1990.
Holmes C, Mallick B. Bayesian regression with multivariate linear splines. Journal of the Royal Statistical Society, Series B. 2001;63:3–17.
Holmes C, Mallick B. Generalized nonlinear modelling with multivariate free-knot regression splines. Journal of the American Statistical Association. 2003;98:352–68.
Hossain MM, Lawson AB. Space-time Bayesian small area disease risk models: development and evaluation with a focus on cluster detection. Environmental and Ecological Statistics. 2010;17:73–95. doi: 10.1007/s10651-008-0102-z.
Kneib T, Fahrmeir L. Structured additive regression for categorical space-time data: a mixed model approach. Biometrics. 2006;62:109–18. doi: 10.1111/j.1541-0420.2005.00392.x.
Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Statistics in Medicine. 2000;19:2555–67. doi: 10.1002/1097-0258(20000915/30)19:17/18<2555::aid-sim587>3.0.co;2-#.
Knorr-Held L, Besag J. Modelling risk from a disease in time and space. Statistics in Medicine. 1998;17:2045–60. doi: 10.1002/(sici)1097-0258(19980930)17:18<2045::aid-sim943>3.0.co;2-p.
Lagazio C, Biggeri A, Dreassi E. Age-period-cohort models and disease mapping. Environmetrics. 2003;14:475–90.
Lagazio C, Dreassi E, Biggeri A. A hierarchical Bayesian model for space-time variation of disease risk. Statistical Modelling. 2001;1:17–29.
Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing. 2000;10:325–37.
Natarajan R, McCulloch CE. A note on the existence of the posterior distribution for a class of mixed models for binomial responses. Biometrika. 1995;82:639–43.
Paez MS, Gamerman D, Landim FMP, Salazar E. Spatially varying dynamic coefficient models. Journal of Statistical Planning and Inference. 2008;138:1038–58.
Pintore A, Speckman P, Holmes C. Spatially adaptive smoothing splines. Biometrika. 2006;93:113–25.
Plummer M. Penalized loss functions for Bayesian model comparison. Biostatistics. 2008;9:523–539. doi: 10.1093/biostatistics/kxm049.
Plummer M, Best NG, Cowles K, Vines K. CODA: Convergence Diagnosis and Output Analysis for MCMC. R News. 2006;6:7–11.
Richardson S, Abellán JJ, Best N. Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer risks in Yorkshire. Statistical Methods in Medical Research. 2006;15:385–407. doi: 10.1191/0962280206sm458oa.
Ruppert D. Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics. 2002;11:735–57.
Ruppert D, Wand MP, Carroll RJ. Semiparametric regression. Cambridge University Press; Cambridge: 2003.
Schmid V, Held L. Bayesian extrapolation of space-time trends in cancer registry data. Biometrics. 2004;60:1034–42. doi: 10.1111/j.0006-341X.2004.00259.x.
Scott J, Berger J. An exploration of aspects of Bayesian multiple testing. Journal of Statistical Planning and Inference. 2006;136:2144–62.
Scott J, Berger J. Multiple testing, empirical Bayes, and the variable-selection problem. Discussion Paper 2008-10, Department of Statistical Science, Duke University; 2008.
Sinha D, Dey DK. Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association. 1997;92:1195–1212.
Spiegelhalter D, Thomas A, Best N, Gilks W. BUGS 0.5: Bayesian inference using Gibbs sampling manual. MRC Biostatistics Unit, Institute of Public Health; Cambridge, UK: 1996.
Spiegelhalter DJ, Best NG, Carlin BP, Linde AVD. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B. 2002;64:1–34.
Ugarte MD, Goicoa T, Ibáñez B, Militino AF. Evaluating the performance of spatio-temporal Bayesian models in disease mapping. Environmetrics. 2009;20:647–65.
Waller LA, Carlin BP, Xia H, Gelfand AE. Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association. 1997;92:607–17.
Zhao Y, Staudenmayer J, Coull BA, Wand MP. General design Bayesian generalized linear mixed models. Statistical Science. 2006;21:35–51.
