Author manuscript; available in PMC: 2014 Sep 20.
Published in final edited form as: Stat Med. 2013 Mar 25;32(21):3670–3685. doi: 10.1002/sim.5789

Bayesian semiparametric model with spatially-temporally varying coefficients selection

Bo Cai 1,*, Andrew B Lawson 2, Md Monir Hossain 3, Jungsoon Choi 2, Russell S Kirby 4, Jihong Liu 1
PMCID: PMC3744634  NIHMSID: NIHMS461190  PMID: 23526312

Abstract

In spatio-temporal analysis, the effect of a covariate on the outcome usually varies across areas and time. The spatial configuration of the areas may depend not only on the structured random intercept but also on spatially varying coefficients of covariates. In addition, assuming normality for the distribution of spatially varying coefficients could bias estimation. In this article, we propose a Bayesian semiparametric space-time model in which the spatially-temporally varying coefficient is decomposed into fixed, spatially varying and temporally varying coefficients. The spatially varying coefficients of space-time covariates are modeled nonparametrically using an area-specific Dirichlet process prior with weights transformed via a generalized transformation. Temporally varying coefficients of covariates are modeled through a dynamic model. Uncertainty about inclusion of the spatially-temporally varying coefficients is taken into account by a variable selection procedure that determines the probabilities of different effects for each covariate. The proposed semiparametric approach shows improvement over Bayesian spatial-temporal models with a normality assumption on spatial random effects and over the Bayesian model with the Dirichlet process prior on the random intercept. A simulation example is presented to evaluate the performance of the proposed approach against the competing models. An application to low birth weight data in South Carolina is used for illustration.

Keywords: Area-specific Dirichlet process, Bayesian space-time models, Spatially-temporally varying coefficients, Variable selection

1 Introduction

Spatial-temporal data are often encountered in various disciplines such as epidemiology, ecology, political sciences, and economics. For example, the average of household income varies across different areas and time. In many applications, spatial-temporal regression models are used to explain the response variable observed over areas and time.

Suppose that the dependent variable $y_{it}$ is observed in the $i$th spatial unit and at the $t$th time point, for $i = 1, \ldots, n$ and $t = 1, \ldots, T$. A general space-time model can be expressed as

$$y_{it} \sim f(y_{it} \mid \cdot), \qquad (1)$$

where $f(y_{it} \mid \cdot)$ denotes a conditional distribution of $y_{it}$ given observed covariates, latent variables, and measurement errors, with mean $\mu_{it} = E(y_{it})$, which is typically related to a linear predictor $\eta_{it}$ through a suitable link function $g(\cdot)$, where $\eta_{it} = g(\mu_{it})$. The response variable could be observed as a continuous (e.g. disease rate), categorical (e.g. indicators of disease or health status) or count (e.g. number of disease cases or deaths) outcome. When the response is an area-referenced count,

$$y_{it} \sim \text{Poisson}(E_{it} \exp(\eta_{it})), \qquad (2)$$

where $E_{it}$ is an expected number of events which is thought of as fixed and sometimes obtained by applying a standard table of sex- and age group-specific rates to the population count in district $i$ at time $t$, $n_{it}$, subdivided by age and sex [1]. In our case, we set $E_{it} = R n_{it}$ with $R = \sum_i \sum_t y_{it} / \sum_i \sum_t n_{it}$. The standardization here is often referred to as internal standardization because we have used the same data to compute the reference rate $R$. The logarithm of the relative risk, $\eta_{it}$, can usually be expressed as

$$\eta_{it} = x_{it}'\beta + u_i + v_i + \gamma_t, \qquad (3)$$

where $x_{it} = (1, x_{it2}, \ldots, x_{itp})'$ denotes a $p \times 1$ vector of covariates associated with unit $i$ and time $t$, $\beta = (\beta_1, \ldots, \beta_p)'$ denotes a $p \times 1$ vector of population parameters, $u_i$ and $v_i$ denote random effects measuring spatial similarity and excess heterogeneity, respectively, and $\gamma_t$ denotes a structured temporal random component. Conventionally, the fixed effects $\beta$ can be modeled by a multivariate normal prior. The parameters $u_i$ and $v_i$ are assumed to be independent. The parameter $v_i$ captures the heterogeneity among the units and is chosen to follow an exchangeable normal prior, while $u_i$ captures the spatial heterogeneity of the data and is assumed to follow an intrinsic conditional autoregressive (CAR) distribution (a special case of the general class of Markov random fields) [2], $u_i \mid u_{-i} \sim \text{CAR}(\tau)$, i.e. $u_i \mid u_{-i} \sim N(\bar{u}_i, (\tau m_i)^{-1})$, where $u_{-i} = (u_1, \ldots, u_{i-1}, u_{i+1}, \ldots, u_n)'$, $\bar{u}_i = m_i^{-1} \sum_{j \in \partial_i} u_j$ with $\partial_i$ denoting the neighbor set of unit $i$, $m_i$ denotes the number of neighbors of unit $i$, and $\tau$ denotes the precision parameter. The constraint $\sum_{i=1}^{n} u_i = 0$ is imposed for identifiability of the overall intercept. The temporal parameter $\gamma_t$ is assumed to follow an autoregressive (AR) prior.
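The internal standardization used for model (2) can be illustrated with a few lines of Python. This is only a sketch on made-up toy counts; the function name `expected_counts` and the data are assumptions for illustration, not part of the original analysis.

```python
def expected_counts(y, n):
    """Internal standardization: E_it = R * n_it with R = sum(y) / sum(n).

    y[i][t] are observed counts and n[i][t] are populations at risk
    (both are hypothetical toy inputs here).
    """
    total_cases = sum(sum(row) for row in y)
    total_pop = sum(sum(row) for row in n)
    R = total_cases / total_pop  # overall reference rate
    E = [[R * n_it for n_it in row] for row in n]
    return R, E


# Toy data: 2 areas observed at 3 time points.
y = [[2, 3, 1], [10, 12, 8]]
n = [[100, 120, 80], [500, 600, 400]]
R, E = expected_counts(y, n)
print(R)
print(E)
```

By construction the expected counts add up to the observed total, which is why the relative risks $\exp(\eta_{it})$ are interpreted relative to the overall rate in the same data.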

Model (2) is a typical spatio-temporal model for areal data, on which several hierarchical structures have been built [3–5]. More complex issues arise when a space-time interaction effect (e.g. $w_{it}$) is included in the predictor (3) [6–8]. Ugarte et al. [9] evaluated the performance of various simple spatio-temporal Bayesian models. Much of this work, however, assumed that the effects of covariates on the response were constant across areas and time. In some applications, this assumption is inappropriate. For example, the effect of the poverty rate on low birth weight may vary across regions and time points. To allow coefficients to vary spatially, Assunção [10] and Gamerman et al. [11], among others, proposed spatially varying coefficient models for small area data. Dreassi et al. [12] developed a space model with time dependent covariates for small area data. Cai et al. [13] proposed a Bayesian regression model with multivariate linear splines for the analysis of space-time data. Several approaches have also been developed for point-referenced data. Gelfand et al. [14] proposed spatial process modelling for univariate and multivariate dynamic spatial data. Paez et al. [15] developed spatially varying dynamic coefficient models.

In the aforementioned spatial-temporal models, the spatially varying coefficients are often assumed to follow Gaussian distributions. In practice, the normality assumption is difficult to verify empirically and may be overly restrictive, as spatially varying coefficients may follow other distributions and may be clustered. Recently, some approaches have been developed to relax the normality assumption for modeling point-referenced data. Gelfand et al. [16] proposed Bayesian nonparametric spatial modeling with spatial Dirichlet process (SDP) mixture models. Duan et al. [17] developed a generalized spatial Dirichlet process. Reich and Fuentes [18] described a multivariate semiparametric Bayesian spatial model for spatial data. In contrast, for areal data, semiparametric models with spatially-temporally varying coefficients of covariates remain underdeveloped. Li et al. [19] proposed nonparametric hierarchical models for areal data. They modeled the spatial random intercept using an areal-referenced spatial stick-breaking prior with a logit link between the weight and the random variate from the CAR.

In this paper, we focus on developing a Bayesian semiparametric space-time model with spatial-temporally varying coefficients of covariates. We model the spatially varying coefficients by using the area-specific stick-breaking representation for the Dirichlet process prior with the generalized transformation between the weight in the stick-breaking prior and the spatially-specific random variate from the CAR. The generalized transformation includes the linear link, logit link and probit link, providing more realistic weights associated with the spatial information in the areal data. Temporally varying coefficients are modeled by using a dynamic model. Each covariate could have different effects on the response variable, including no effect, the overall-only effect, the spatial-only effect, the temporal-only effect and the spatial-temporal effect. The proposed model allows for the uncertainty of inclusion of different effects for each covariate. We use the variable selection procedure through determining the probabilities of different effects for each covariate.

The remainder of the article proceeds as follows. Section 2 describes the semiparametric model with spatial-temporally varying coefficients while allowing for uncertainty of inclusion of the coefficients. Prior specification and posterior implementation are also described. Section 3 discusses model evaluation and comparison. Section 4 evaluates the performance of the approach based on a simulated example. Section 5 illustrates the approach using real spatial-temporal data. Finally, Section 6 summarizes and discusses the results.

2 Semiparametric Model with Uncertainty of Spatial-Temporally Varying Coefficients

2.1 The model with selection of spatial-temporally varying coefficients

We model the logarithm of the relative risk as

$$\eta_{it} = x_{it}'\theta_{it}, \qquad (4)$$

where $\theta_{it} = (\theta_{it1}, \ldots, \theta_{itp})'$ denotes a $p \times 1$ vector of spatial-temporally varying coefficients of covariates. One can decompose each element of $\theta_{it}$ as $\theta_{itk} = \alpha_k + \beta_{ik} + \gamma_{tk}$, for $k = 1, \ldots, p$, where $\alpha_k$ denotes the global effect of the $k$th covariate, $\beta_{ik}$ denotes the spatially structured random effect of the $k$th covariate and $\gamma_{tk}$ denotes the temporal-specific effect of the $k$th covariate. The regression coefficients are taken to be independent across covariates. However, this decomposition assumes that each covariate simultaneously has an overall effect and spatially and temporally varying effects on the response. This assumption is too restrictive in general and may over-parameterize the fitted model, because a covariate may have 1) no effect; 2) only a fixed effect; 3) only a spatial-specific effect given the fixed effect; 4) only a temporal-specific effect given the fixed effect; or 5) all three effects on the response. To account for this uncertainty, we define $\theta_{itk}$ as

$$\theta_{itk} = \delta_{1k}\alpha_k + \delta_{1k}(\delta_{2k}\beta_{ik} + \delta_{3k}\gamma_{tk}), \qquad (5)$$

where $\delta_{1k}$, $\delta_{2k}$ and $\delta_{3k}$ denote the indicator variables for $\alpha_k$, $\beta_{ik}$ and $\gamma_{tk}$, respectively. We fix $\delta_{11}$, $\delta_{21}$ and $\delta_{31}$ to be one for all $i$ and $t$ to reflect some overall spatial-temporal effect. For $k \ge 2$, under this construction, a covariate has

  1. no effect (i.e. $\theta_{itk} = 0$) for all $i$ and $t$ if $\delta_{1k} = 0$,

  2. only fixed effect (i.e. $\theta_{itk} = \alpha_k$) for all $i$ and $t$ if $\delta_{1k} = 1$ and $\delta_{2k} = \delta_{3k} = 0$,

  3. only spatial-specific effect given the fixed effect (i.e. $\theta_{itk} = \alpha_k + \beta_{ik}$) for all $t$ if $\delta_{1k} = \delta_{2k} = 1$ and $\delta_{3k} = 0$,

  4. only temporal-specific effect given the fixed effect (i.e. $\theta_{itk} = \alpha_k + \gamma_{tk}$) for all $i$ if $\delta_{1k} = \delta_{3k} = 1$ and $\delta_{2k} = 0$,

  5. spatial-temporally varying effects (i.e. $\theta_{itk} = \alpha_k + \beta_{ik} + \gamma_{tk}$) across areas and time if $\delta_{1k} = \delta_{2k} = \delta_{3k} = 1$.
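The five scenarios above follow mechanically from construction (5). The short Python sketch below evaluates (5) for each indicator pattern; the numeric values of `alpha_k`, `beta_ik` and `gamma_tk` are arbitrary illustrative numbers, and the function name `theta_itk` is our own.

```python
def theta_itk(delta, alpha_k, beta_ik, gamma_tk):
    """Coefficient (5): delta1*alpha + delta1*(delta2*beta + delta3*gamma)."""
    d1, d2, d3 = delta
    return d1 * alpha_k + d1 * (d2 * beta_ik + d3 * gamma_tk)


# Arbitrary illustrative values for one covariate, one area, one time point.
alpha_k, beta_ik, gamma_tk = 1.0, 0.3, -0.2

scenarios = {
    "no effect": (0, 0, 0),
    "fixed only": (1, 0, 0),
    "fixed + spatial": (1, 1, 0),
    "fixed + temporal": (1, 0, 1),
    "fixed + spatial + temporal": (1, 1, 1),
}
results = {name: theta_itk(d, alpha_k, beta_ik, gamma_tk)
           for name, d in scenarios.items()}
for name, value in results.items():
    print(name, value)
```

Because $\delta_{1k}$ multiplies every term, patterns such as (0, 1, 1) also give a zero coefficient, which is why only five distinct scenarios need to be considered.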

It is assumed that a covariate has no spatial- or temporal-specific effect if it does not have a global effect. In addition, given a global effect, whether a covariate has a spatial-specific effect is assumed to be independent of whether it has a temporal-specific effect. Thus, for the priors of the indicators, we have $\pi(\delta_{1k}, \delta_{2k}, \delta_{3k}) = \pi(\delta_{2k} \mid \delta_{1k})\,\pi(\delta_{3k} \mid \delta_{1k})\,\pi(\delta_{1k})$, where $\pi(\delta_{1k} = 1) = p_{1k}$, $\pi(\delta_{2k} = 0 \mid \delta_{1k} = 0) = \pi(\delta_{3k} = 0 \mid \delta_{1k} = 0) = 1$, $\pi(\delta_{2k} = 1 \mid \delta_{1k} = 1) = p_{2k}$ and $\pi(\delta_{3k} = 1 \mid \delta_{1k} = 1) = p_{3k}$. Then the prior for $\delta_{1k}$, $\delta_{2k}$ and $\delta_{3k}$ is expressed as

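$$\pi(\delta_{1k}, \delta_{2k}, \delta_{3k}) = \begin{cases} 1 - p_{1k} & \text{if } \delta_{1k} = \delta_{2k} = \delta_{3k} = 0 \\ p_{1k}(1 - p_{2k})(1 - p_{3k}) & \text{if } \delta_{1k} = 1 \text{ and } \delta_{2k} = \delta_{3k} = 0 \\ p_{1k}\,p_{2k}(1 - p_{3k}) & \text{if } \delta_{1k} = \delta_{2k} = 1 \text{ and } \delta_{3k} = 0 \\ p_{1k}\,p_{3k}(1 - p_{2k}) & \text{if } \delta_{1k} = \delta_{3k} = 1 \text{ and } \delta_{2k} = 0 \\ p_{1k}\,p_{2k}\,p_{3k} & \text{if } \delta_{1k} = \delta_{2k} = \delta_{3k} = 1 \end{cases} \qquad (6)$$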
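As a quick numerical check of the prior in (6), the following Python sketch tabulates the five scenario probabilities for given $p_{1k}$, $p_{2k}$, $p_{3k}$. The function name `indicator_prior` and the example values are assumptions chosen for illustration.

```python
def indicator_prior(p1, p2, p3):
    """Prior (6) over the five admissible patterns (delta1, delta2, delta3)."""
    return {
        (0, 0, 0): 1.0 - p1,
        (1, 0, 0): p1 * (1.0 - p2) * (1.0 - p3),
        (1, 1, 0): p1 * p2 * (1.0 - p3),
        (1, 0, 1): p1 * p3 * (1.0 - p2),
        (1, 1, 1): p1 * p2 * p3,
    }


prior = indicator_prior(0.5, 0.5, 0.5)
for pattern, prob in prior.items():
    print(pattern, prob)
print("total:", sum(prior.values()))
```

With all three probabilities equal to 0.5, exclusion has prior mass 0.5 and each of the four inclusion patterns has mass 0.125.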

The probabilities in (6) sum to one, since the last four terms add up to $p_{1k}$. These priors provide prior probabilities of the five different scenarios. The indicator $\delta_{1k}$ allows the $k$th covariate to be included in or excluded from the model, while $\delta_{2k}$ and $\delta_{3k}$ indicate whether the $k$th covariate has spatially or temporally varying effects given that it is included in the model. The proposed variable selection structure (5) can be thought of as a general case of the variable selection method for spatial-temporally varying effects of covariates. When there are only global effects for the covariates, the proposed model reduces to the model in (3). When the observations are only spatially dependent, the proposed structure reduces to that of Reich et al. [20], who focused on variable selection in the parametric model with spatially varying coefficients. If all indicators equal one (i.e. there is no variable selection), the model becomes a spatial-temporally varying coefficient model. In particular, when the observations are only spatially dependent, the proposed model reduces to the space-varying regression model [10, 11], i.e. $\eta_i = x_i'\theta_i$ with $\theta_i = \alpha + \beta_i$. One might be concerned with the identifiability of the indicators and the coefficients in (5). This concern is relieved because the quantities of interest are $\delta_{1k}\alpha_k$, $\delta_{1k}\delta_{2k}\beta_{ik}$ and $\delta_{1k}\delta_{3k}\gamma_{tk}$, which are identifiable. On the other hand, Bayesian identifiability concerns whether the data and prior provide information for updating the indicators and the coefficients [21]. For example, when the indicator $\delta_{1k} = 0$, all the coefficients (i.e. $\alpha_k$, $\beta_{ik}$ and $\gamma_{tk}$) rely only on the priors. The data are involved in updating the coefficient(s) when $\delta_{1k} = 1$.

To allow for flexibility in the prior probability $p_{lk}$, for $l = 1, 2, 3$, we place a Beta hyperprior on the prior inclusion probability, $p_{lk} \sim \text{Beta}(c_l, d_l)$. Given these prior probabilities, the full conditional probabilities of the scenarios in (6) can be easily calculated through the categorical distribution (see details in the Appendix). For the choice of $c_l$ and $d_l$ ($l = 1, 2, 3$), following the suggestion by Geisser [22], we choose $c_l = d_l = 1$, which yields the uniform hyperprior. Scott and Berger [23] discuss the choice of priors for the prior probability. They conclude that the objective prior (i.e. the uniform prior) for the prior probability is easy to implement computationally, while incorporating subjective prior information can be beneficial when available. In our case, we have no subjective information about the prior probability of inclusion of the covariates, so we choose a uniform prior. For more details, see Geisser [22], Scott and Berger [23], and Cui and George [24], among others.

2.2 Nonparametric modeling for spatially varying coefficients

Typically, the prior of the global effect of the $k$th covariate, $\alpha_k$, is taken to be $N(0, \tau_{\alpha,k}^{-1})$, where $\tau_{\alpha,k}$ is the precision following a gamma prior $\text{Gamma}(a_{\alpha,k}, b_{\alpha,k})$ with mean $a_{\alpha,k}/b_{\alpha,k}$ and variance $a_{\alpha,k}/b_{\alpha,k}^2$. The conditional specification of the prior for the temporally varying effect of the $k$th covariate, $\gamma_{tk}$, can be taken as $N(\gamma_{t-1,k}, \tau_{\gamma,k}^{-1})$, for $t = 1, \ldots, T$, with $\tau_{\gamma,k}$ being the precision following $\text{Gamma}(a_{\gamma,k}, b_{\gamma,k})$. The initial element $\gamma_{0k}$ is chosen to be zero. For the spatially varying effect of the $k$th covariate, a conventional choice is the conditional distribution $\beta_{ik} \mid \beta_{-i,k} \sim N(\bar{\beta}_{ik}, (m_i \tau_{\beta,k})^{-1})$, where $\beta_{-i,k} = (\beta_{1k}, \ldots, \beta_{i-1,k}, \beta_{i+1,k}, \ldots, \beta_{nk})'$, $\bar{\beta}_{ik} = m_i^{-1} \sum_{j \in \partial_i} \beta_{jk}$, $m_i$ denotes the number of neighbors of area $i$, and $\tau_{\beta,k}$ denotes the precision following a gamma prior $\text{Gamma}(a_{\beta,k}, b_{\beta,k})$. For identifiability, the constraint on the $\beta_{ik}$'s is $\sum_i \beta_{ik} = 0$ for $k = 1, \ldots, p$. However, the normal prior assumption constrains the distributions that the spatially varying random effects may follow. In contrast, a nonparametric prior over distributions provides wider support, typically the space of all distributions (i.e. an infinite-dimensional space). As a result, a nonparametric assumption allows for various shapes of the distribution, which may more accurately reflect our prior belief about the true distribution of the spatially varying random effects. To allow for uncertainty about the distribution that $\beta_{ik}$ may follow, we consider $\beta_{ik} \sim G_{ik}$, where $G_{ik}$ is an unknown random distribution varying across areas. We can then choose a prior distribution for $G_{ik}$ with support on the space of all probability measures.
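To make the conventional CAR specification concrete, the sketch below computes the conditional mean $\bar{\beta}_{ik}$ and variance $(m_i \tau_{\beta,k})^{-1}$ for one area and applies the sum-to-zero constraint. The four-area neighborhood structure, the values and the helper names are hypothetical and serve only as an illustration.

```python
def car_conditional(i, values, neighbors, tau):
    """Mean and variance of the CAR full conditional for area i."""
    nbrs = neighbors[i]
    m_i = len(nbrs)  # number of neighbors of area i
    mean = sum(values[j] for j in nbrs) / m_i
    variance = 1.0 / (m_i * tau)
    return mean, variance


def center(values):
    """Impose the sum-to-zero identifiability constraint."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical map: four areas on a line, 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
beta = center([0.6, -0.1, 0.2, -0.3])
mean, variance = car_conditional(1, beta, neighbors, tau=2.0)
print(beta, mean, variance)
```

Areas with more neighbors get a smaller conditional variance, which is the source of spatial smoothing under the normal CAR prior that the nonparametric prior is meant to relax.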

Among the nonparametric processes (e.g. Gaussian process, Pólya tree process, etc.), the Dirichlet process (DP) is one of the most prominent random probability measures due to its richness, computational ease, and interpretability. We consider using the DP in our approach for several reasons. First, any distribution over its space can be approximated arbitrarily and accurately in the weak limit by a sequence of draws from the DP [25]. Second, since the distributions drawn from the DP are discrete, the DP has the clustering property which allows for the repeated values, implying that multiple βik’s can take on the same value simultaneously. This feature of the DP is desirable and reflects the attribute of the spatially varying coefficients which are typically clustered. Third, the stick-breaking representation [26] (which will be described later on) of the DP provides a convenient way of incorporating the area-specific information into the random distribution of βik’s. Finally, with the nice representation such as stick-breaking, the DP can be efficiently implemented. For more details on nonparametric Bayesian processes, one may refer to Ghosal and van der Vaart [25].

The Dirichlet process (DP) prior can be specified as $DP(MG_0)$, where $M$ is a concentration parameter and $G_0$ is the base measure of the Dirichlet process. Under this specification, for any partition $B = (B_1, \ldots, B_q)'$ of $\mathcal{R}$, we have

$$\{G(B_1), \ldots, G(B_q)\} \sim D(MG_0(B_1), \ldots, MG_0(B_q)),$$

where $D(\cdot)$ denotes the Dirichlet distribution on the simplex of $\mathcal{R}^n$. This structure centers the distribution at the parametric base distribution, $G_0$, while allowing the true distribution to deviate from the parametric form. The amount of uncertainty in the parametric assumption is controlled by $M$. As $M$ tends to zero, most of the samples share the same value sampled from the base measure $G_0$, whereas when $M$ tends to infinity, the samples are almost i.i.d. samples from $G_0$. One of the popular representations of the DP prior is the Pólya urn representation [27, 28]. Briefly, a Pólya urn prior of $\beta_i$ can be expressed as $(M + n - 1)^{-1} M G_0 + (M + n - 1)^{-1} \sum_{s=1}^{k^{(i)}} r_s^{(i)} \delta_{\beta_s^{*(i)}}(\cdot)$, where $k^{(i)}$ denotes the number of distinct values across all $\beta_j$'s excluding $\beta_i$, $r_s^{(i)}$ denotes the frequency of all $\beta_j$'s (excluding $\beta_i$) being equal to the unique value $\beta_s^{*(i)}$, and $\delta_{\beta^*}(\cdot)$ denotes the degenerate distribution at $\beta^*$. Although Pólya urn Gibbs sampling can be implemented straightforwardly, some limitations remain. When model (1) is not a normal distribution, it is problematic to calculate the probability of generating new samples from the posterior based on the prior and the likelihood, owing to nonconjugacy. In addition, the parameters are updated one at a time from the posterior distribution by Gibbs sampling. This procedure could lead to slow mixing. Although an accelerated step [28] can improve the mixing behavior, slow mixing may still occur because of the inherent one-at-a-time updates.

To avoid the limitations of Pólya urn Gibbs sampling, we consider blocked Gibbs sampling based on the finite dimensional Dirichlet priors [29]. With the stick-breaking representation [26], the finite dimensional prior $G$ can be expressed as $G \overset{d}{=} \sum_{s=1}^{r} \omega_s \delta_{\theta_s}(\cdot)$, where $r$ denotes the number of mixture components, $\omega_s$ denotes the weight and $\delta_{\theta_s}(\cdot)$ denotes a discrete measure concentrated at $\theta_s$, which is randomly generated from the base measure $G_0$. For the choice of the truncation of the mixture, Ishwaran and Zarepour [30] suggested using a reasonably large value such as 50 or the sample size.
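A truncated stick-breaking draw of this kind can be sketched in a few lines of Python. The sketch assumes stick proportions drawn from Beta(1, M) and sets the last proportion to one so that the $r$ weights sum to one, which is a common truncation convention and an assumption here; the standard normal base measure and all function names are likewise illustrative choices.

```python
import random


def stick_breaking_weights(M, r, rng):
    """Weights omega_1..omega_r of a DP truncated at r components."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for s in range(r):
        # Assumption: last proportion fixed at 1 so the weights sum to one.
        v = 1.0 if s == r - 1 else rng.betavariate(1.0, M)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def draw_truncated_dp(M, r, base_sampler, rng):
    """One random discrete measure: (weights, atoms drawn from G0)."""
    weights = stick_breaking_weights(M, r, rng)
    atoms = [base_sampler(rng) for _ in range(r)]
    return weights, atoms


rng = random.Random(1)
weights, atoms = draw_truncated_dp(
    M=1.0, r=50, base_sampler=lambda g: g.gauss(0.0, 1.0), rng=rng
)
print(sum(weights), max(weights))
```

Small values of `M` tend to concentrate most of the mass on a few atoms, while large values spread it across many atoms, matching the role of the concentration parameter described earlier.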

To allow the unknown distribution of $\beta_{ik}$ to vary across areas, we propose to model the spatially varying coefficients using the area-specific stick-breaking prior. Let $S_k = (S_{1k}, \ldots, S_{nk})'$ be a configuration determining a classification of $\beta_k = (\beta_{1k}, \ldots, \beta_{nk})'$ into $r_k$ distinct values $\beta_k^* = (\beta_{1k}^*, \ldots, \beta_{r_k k}^*)$, with $S_{ik} = s$ if $\beta_{ik}$ in area $i$ belongs to group $s$ for covariate $k$ in terms of the spatially varying effect, i.e. $\beta_{ik} = \beta_{sk}^*$, for $s = 1, \ldots, r_k$. Then we can model $\beta_{ik}$ as follows:

$$S_{ik} \sim \sum_{s=1}^{r_k} \omega_{isk}\,\delta_s(\cdot), \quad \omega_{isk} = V_{isk}^* \prod_{l=1}^{s-1} (1 - V_{ilk}^*), \quad \beta_{sk}^* \mid \cdot \sim N(0, \tau_{\beta,k}^{-1}), \quad \text{for } s = 1, \ldots, r_k,$$

where $V_{isk}^* = u_{isk} V_{sk}$, $V_{sk} \overset{iid}{\sim} \text{Beta}(1, M_k)$ and $\prod_{l<s} (1 - V_{ilk}^*) = 1$ for $s = 1$. The parameter $u_{isk}$ is defined as a covariate-specific spatial weight which depends on the location-associated random variate. Since $u_{isk} \in (0, 1)$, following Ishwaran and James [29], it can be shown that $\sum_{s=1}^{r_k} \omega_{isk} = 1$ almost surely in the aforementioned area-specific stick-breaking. We use a transformation, $g(u_{isk}) = \phi_{isk}$, where $\phi_{isk}$ is assumed to follow a $\text{CAR}(\tau_k)$ prior. The transformation links the spatial weight to the CAR-distributed variate. Unlike the logit transformation used by Li et al. [19], we consider a more general transformation family introduced by Aranda-Ordaz [31],

$$g(u) = \frac{2}{\lambda} \cdot \frac{u^{\lambda} - (1 - u)^{\lambda}}{u^{\lambda} + (1 - u)^{\lambda}}, \qquad (7)$$

where $\lambda$ denotes the transformation parameter. Different values of $\lambda$ yield different link functions: $\lambda = 0$ gives the logit transformation in the limit, $\lambda = 0.4$ approximates the probit link, and $\lambda = 1$ gives the linear transformation. The inverse transformation function is then defined as

$$u = h(\phi) = \frac{(1 + \phi\lambda/2)^{1/\lambda}}{(1 + \phi\lambda/2)^{1/\lambda} + (1 - \phi\lambda/2)^{1/\lambda}}, \quad \text{for } |\phi\lambda| < 2, \qquad (8)$$

with $h(\phi) = 0$ for $\phi\lambda \le -2$ and $h(\phi) = 1$ for $\phi\lambda \ge 2$. Since the transformation is symmetric, we can focus on $\lambda \ge 0$. We choose a uniform prior for $\lambda$ on the range (0, 0.5), which covers the logit and probit links. This setting also allows $\phi$ to vary in a reasonable range. The prior of the concentration parameter $M_k$ is chosen to be Uniform(0, 10) [32].
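The transformation (7), its inverse (8) and the resulting area-specific weights can be sketched as follows. The function names, the explicit handling of $\lambda = 0$ as the logit limit and the toy inputs are assumptions for illustration. How the final component is treated under truncation is not specified in this section, so the sketch simply leaves any leftover mass unassigned.

```python
import math


def g(u, lam):
    """Transformation (7); lam == 0 is treated as the logit limit."""
    if lam == 0.0:
        return math.log(u / (1.0 - u))
    a, b = u ** lam, (1.0 - u) ** lam
    return (2.0 / lam) * (a - b) / (a + b)


def h(phi, lam):
    """Inverse transformation (8), including the two boundary cases."""
    if lam == 0.0:
        return 1.0 / (1.0 + math.exp(-phi))
    z = phi * lam / 2.0
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    a = (1.0 + z) ** (1.0 / lam)
    b = (1.0 - z) ** (1.0 / lam)
    return a / (a + b)


def area_weights(phi_i, V, lam):
    """omega_is = V*_is * prod_{l<s}(1 - V*_il), with V*_is = h(phi_is) * V_s."""
    weights, remaining = [], 1.0
    for phi_is, v_s in zip(phi_i, V):
        v_star = h(phi_is, lam) * v_s
        weights.append(v_star * remaining)
        remaining *= 1.0 - v_star
    return weights


u = 0.3
print(g(u, 1.0), g(u, 0.4), g(u, 0.0))
print(h(g(u, 0.4), 0.4))
print(area_weights([0.0, 0.0, 0.0], [0.5, 0.5, 0.5], lam=0.4))
```

For $\lambda = 1$ the transformation is the linear map $2(2u - 1)$, and for very small $\lambda$ it is numerically close to the logit, which is consistent with the special cases listed above.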

2.3 Posterior computation

We choose priors for the parameters as described in Section 2.1. The posterior computation relies on a blocked Gibbs sampling algorithm in which we iteratively sample from the full conditional distributions of a block of the parameters. For update of a single parameter from the non-conjugate distribution, we use adaptive rejection Metropolis sampling [33]. For a block of parameters, the posterior computation relies on the Gibbs sampler and Metropolis-Hastings algorithms. After initializing values for the parameters, the proposed MCMC algorithm proceeds in a series of steps outlined in the Appendix. Samples from the joint posterior distribution of the parameters are generated by repeating those steps for a large number of iterations after apparent convergence.

3 Model Comparison

The deviance information criterion (DIC) [34] is widely used as a model comparison tool. DIC has been shown to approximate a penalized loss function based on the deviance, with a penalty derived from a cross-validation argument. However, the implicit approximation is valid only when the effective number of parameters is much smaller than the number of independent observations [35]. Plummer [35] pointed out that this assumption does not hold in disease mapping, so that DIC under-penalizes complex models. Plummer [35] proposed penalized loss functions instead of $p_D$, the effective number of parameters, to assess model adequacy. However, as Plummer [35] noted, this method requires MCMC runs with each observation left out in turn. Such calculation is generally not feasible, especially for large data sets. In this article, we consider a comparison method based on the conditional predictive ordinate (CPO) [36–39]. The CPO for the $i$th observation at time $t$ is defined as the cross-validated marginal posterior predictive density

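$$\text{CPO}_{it} = f(y_{it} \mid y_{(it)}) = \int f(y_{it} \mid \theta)\, f(\theta \mid y_{(it)}, x_{(it)})\, d\theta = \left( \int \frac{1}{f(y_{it} \mid \theta, x_i)}\, f(\theta \mid y, x)\, d\theta \right)^{-1},$$

where $y_{(it)}$ denotes the vector of observations with the $i$th observation at time $t$ deleted and $\theta$ is the vector of model parameters. The cross-validation likelihood can be estimated by

$$L_{CV} = \prod_{i=1}^{n} \prod_{t=1}^{T} \text{CPO}_{it}.$$

Since the quantity of the cross-validation likelihood is typically close to zero, the negative cross-validatory predictive log-likelihood [40] can be used,

$$\text{NLLK}_{CV} = -\sum_{i=1}^{n} \sum_{t=1}^{T} \log \text{CPO}_{it}.$$

Since a closed form of $\text{CPO}_{it}$ is usually unavailable, a Monte Carlo estimate of $\text{CPO}_{it}$ can be obtained straightforwardly through MCMC samples $\{\theta^{(s)}\}_{s=1}^{N}$ from the posterior distribution $f(\theta \mid y, x)$,

$$\widehat{\text{CPO}}_{it} = \left( \frac{1}{N} \sum_{s=1}^{N} \frac{1}{f(y_{it} \mid \theta^{(s)}, x_i)} \right)^{-1},$$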

where N is the number of iterations after a burn-in period. The estimate of the negative cross-validatory predictive log-likelihood can be calculated accordingly. Since a large CPO indicates agreement between the observation and the model, a model with a smaller NLLKCV for all observations implies a better fit.
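The Monte Carlo estimate of CPO is a harmonic mean of likelihood values over posterior draws, so it is convenient to compute it on the log scale. The sketch below does this for the Poisson likelihood in (2); the function names and the toy "posterior draws" of $\eta_{it}$ are made up for illustration and do not come from a fitted model.

```python
import math


def poisson_logpmf(y, mu):
    """log of the Poisson(mu) probability mass at y."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def log_cpo(loglik_draws):
    """log CPO-hat = -log( (1/N) * sum_s exp(-loglik_s) ), via log-sum-exp."""
    neg = [-ll for ll in loglik_draws]
    m = max(neg)
    log_mean_inverse = m + math.log(sum(math.exp(v - m) for v in neg) / len(neg))
    return -log_mean_inverse


def nllk_cv(loglik):
    """loglik[obs][s] = log f(y_obs | theta^(s)); returns -sum of log CPO-hat."""
    return -sum(log_cpo(draws) for draws in loglik)


# Toy example: two observations, three hypothetical posterior draws of eta.
eta_draws = [0.1, 0.3, 0.5]
loglik = [
    [poisson_logpmf(3, 2.0 * math.exp(e)) for e in eta_draws],  # y=3, E=2.0
    [poisson_logpmf(0, 1.5 * math.exp(e)) for e in eta_draws],  # y=0, E=1.5
]
value = nllk_cv(loglik)
print(value)
```

Working with log-likelihoods avoids underflow when individual likelihood values are very small, which is the usual situation with many observations.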

4 A Simulation Study

We evaluated the performance of the proposed approach, including the accuracy of the estimates, the sensitivity to different choices of hyperparameters, and a comparison of the proposed model with other space-time models. Without loss of generality and for illustration purposes, we created the spatial data using the South Carolina geographical structure, which contains 46 counties. The data were generated for $n = 46$ counties over $T = 10$ time points based on the model $y_{it} \sim \text{Poisson}(E_{it} \exp(\eta_{it}))$, where the log-relative risk is $\eta_{it} = x_{it}'\theta_{it}$ with $x_{it} = (1, x_{it2}, x_{it3}, x_{it4}, x_{it5})'$ and $\theta_{it} = \alpha + \beta_i + \gamma_t$. We chose $\alpha = (1, 1, 1, 1, 0)'$, $\beta_i = (0, \beta_{i2}, \beta_{i3}, 0, 0)'$ with $\beta_{i2}$ being clustered to follow four different distributions and $\beta_{i3}$ to follow five different distributions (Figure 1), and $\gamma_t = (0, \gamma_{t2}, 0, \gamma_{t4}, 0)'$ with $\gamma_{t2} \sim N(\gamma_{t-1,2}, 0.5)$ and $\gamma_{t4} \sim N(\gamma_{t-1,4}, 1)$, for $t = 1, \ldots, T$. This setting implies that the first covariate (i.e. the intercept) has only an overall effect, the second covariate has fixed and spatial-temporal effects, the third covariate has fixed and spatial effects, the fourth covariate has fixed and temporal effects, and the fifth covariate has no effect. We generated $x_{itl} \sim \text{Uniform}(0, 1)$ for $l = 2, \ldots, 5$.

Figure 1. The design of two spatial random effects, $\beta_{i2}$ and $\beta_{i3}$, in the simulation study, where the clusters with different colors in the map show different distributions.

We specified the priors for the parameters of the proposed model as follows. We used Gamma(0.05, 0.05) as the prior for τα,k and τγ,k. Following Ishwaran and James [29], we chose Gamma(2,2) as the prior for Mk to encourage both small and large values of Mk. Following Ishwaran and Zarepour [30], we chose rk = n = 46. The prior for the spatially structured random effects βik was chosen as the nonparametric prior described in Section 2.2. We chose prior probabilities in equation (6) to be 0.5 to express an equal chance for inclusion and exclusion.

We implemented the analysis using the Gibbs sampler described in Section 2.3. We generated 50,000 iterations after a burn-in of 10,000 iterations. Convergence was assessed using a variety of diagnostics described by Cowles and Carlin [41] and implemented using CODA [42] in R [43]. The diagnostic tests showed rapid convergence and efficient mixing. The parameters were estimated after thinning the chain by a factor of 5 to obtain a sample of size 10,000.

We compared the proposed model (Model 5) with four competing spatio-temporal models. The log-relative risks of these models are as follows:

  • Model 1: $\eta_{it} = x_{it}'\alpha + u_i + v_i + \gamma_t$,

  • Model 2: $\eta_{it} = x_{it}'\alpha + x_{it}'u_i + v_i + \gamma_t$,

  • Model 3: $\eta_{it} = x_{it}'\gamma_t + u_{it} + v_{it}$, and

  • Model 4: $\eta_{it} = x_{it}'\alpha + u_i + \gamma_t$.

In the four models, we followed conventional settings by specifying the prior of $\alpha$ as $N_p(0, \Sigma_\alpha)$ with $\Sigma_\alpha \sim \text{IWishart}(p, \Sigma_0^{-1})$, where $\Sigma_0^{-1}$ is a $5 \times 5$ precision matrix with diagonal elements equal to 0.1 and off-diagonal elements equal to 0.05. The prior of $u_i$ in Model 1 was chosen as $\text{CAR}(\tau_1^*)$ with $\tau_1^* \sim \text{Gamma}(0.005, 0.005)$. The prior of $v_i$ was taken as $N(0, \tau_2^{*-1})$ with $\tau_2^* \sim \text{Gamma}(0.005, 0.005)$. The conditional specification of the prior of $\gamma_t$ was chosen as $N(\gamma_{t-1}, \tau_3^{*-1})$ with $\gamma_0 \sim N(0, \tau_3^{*-1})$ and $\tau_3^* \sim \text{Gamma}(0.005, 0.005)$. In Model 2, the prior of $u_i$ was chosen as a multivariate CAR model, $\text{MCAR}(\Sigma_u^{-1})$. In Model 3, the prior of $\gamma_t$ was chosen as $N(\gamma_{t-1}, \tau_4^{*-1})$. The priors of $u_t$ and $v_t$ in Model 3 were taken as $\text{CAR}(\tau_5^*)$ and $N(0, \tau_6^{*-1} I)$, respectively, for $t = 1, \ldots, T$. In Model 4, the prior of $u_i$ was assumed to be a typical DP prior. Relying on the BlackBox component builder, WinBUGS [44] allows one to carry out relatively simple Bayesian statistical modeling by simply specifying a model and the priors for its parameters. For this reason, we implemented Models 1–4 using WinBUGS. Although the proposed model can conceptually be implemented in WinBUGS, it was written in R because of the slowness and lack of flexibility of WinBUGS. In our experience, the WinBUGS and R programs give very similar results.

The second column in Table 1 presents the estimated negative cross-validatory predictive log-likelihoods for the five models. Model 5, with the smallest value of $\text{NLLK}_{CV,sim}$, outperforms the other four models. Table 2 shows the posterior probabilities of inclusion of covariates in the five different cases listed in (6). For each covariate, the model assigns the highest posterior probability to the covariate structure used in the simulation design.

Table 1.

Model comparison based on the negative cross-validatory log-likelihood for the simulated example and the application to the low birth weight data in South Carolina.

Model      NLLK_CV,sim   NLLK_CV,app
Model 1    1189.78       1897.18
Model 2    1166.20       1854.43
Model 3    1175.92       1869.29
Model 4    1191.93       1810.67
Model 5     879.51       1689.83

Table 2.

Posterior probabilities of inclusion of the four covariates in the simulation example. Case $(\delta_1, \delta_2, \delta_3)$ indicates the five scenarios described in (6).

             Case
Predictor    (0,0,0)   (1,0,0)   (1,1,0)   (1,0,1)   (1,1,1)
x2           0.03      0.06      0.14      0.15      0.62
x3           0.02      0.04      0.65      0.13      0.16
x4           0.03      0.09      0.08      0.66      0.14
x5           0.70      0.16      0.05      0.07      0.02
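The scenario probabilities in Table 2 can be collapsed into marginal inclusion probabilities for each indicator by summing over the cases in which that indicator equals one. The Python sketch below does this with the values from Table 2; the variable and function names are our own.

```python
CASES = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]

# Posterior scenario probabilities copied from Table 2.
TABLE2 = {
    "x2": [0.03, 0.06, 0.14, 0.15, 0.62],
    "x3": [0.02, 0.04, 0.65, 0.13, 0.16],
    "x4": [0.03, 0.09, 0.08, 0.66, 0.14],
    "x5": [0.70, 0.16, 0.05, 0.07, 0.02],
}


def marginal_inclusion(probs):
    """Pr(delta_j = 1) for j = 1, 2, 3, summing over matching cases."""
    return tuple(
        sum(p for case, p in zip(CASES, probs) if case[j] == 1)
        for j in range(3)
    )


def most_probable_case(probs):
    """Scenario with the highest posterior probability."""
    return max(zip(CASES, probs), key=lambda pair: pair[1])[0]


for name, probs in TABLE2.items():
    print(name, most_probable_case(probs), marginal_inclusion(probs))
```

For example, for x2 the marginal probabilities of the global, spatial and temporal indicators are 0.97, 0.76 and 0.77, and the most probable scenario for each covariate matches the structure used to generate the data.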

In the simulation study (and the real data example), we checked sensitivity of the results to the prior specification by repeating the analyses with different hyperparameters. Particularly, we applied the Gamma prior, Gamma(0.01, 0.005), for the precision, and the uniform prior, Uniform(0, 50), for the standard deviation. Although we do not show details, there is basically no difference in parameter estimates, inferences or model ranking for the prior specification. One may also choose other potential priors such as a half-Cauchy prior. According to Gelman [45], the choice of noninformative priors for some scale parameter of the parameter with a common distribution may have a big impact on inferences, especially when the number of clusters is small (say, below 5) or the cluster-level variance is close to zero. However, using the traditional Gamma prior does not seem to sensitively affect the inference in our cases. The reasons might be: first, the number of subjects (i.e. clusters) is relatively large (n=46); second, since the random effects in our hierarchical model follow a random distribution rather than common distributions, it is not clear which prior for the variance of random effects should be more appropriate. In addition, our sensitivity analysis shows the appropriateness of the prior specification in the proposed model.

5 Application to Low Birth Weight Data in South Carolina

As an illustration, we applied the approach to data on county-specific low birth weights (i.e. birth weight less than 2,500 grams) across the 46 counties in the state of South Carolina during the period 1997–2006. As the observations were made yearly, a total of 460 observations were included in the data. The county-level numbers of low birth weights were obtained from the South Carolina Department of Health and Environmental Control (DHEC). We considered the county-level population density (defined as population divided by the total land area in square miles), the proportion of African Americans, median household income and unemployment rate as socio-economic predictors of low birth weights. The population density, the proportion of African American population and the household income were acquired from the U.S. census. The unemployment rates were obtained from the U.S. Bureau of Labor Statistics. In addition, we considered aggregate data based on birth certificates for other known socio-demographic and behavioral risk factors for low birth weights, including the proportion of mothers with less than 12th grade education (i.e. high school), the proportion of mothers smoking during pregnancy and the proportion of mothers receiving inadequate prenatal care based on the Kotelchuck Index (IKI value). See Kirby et al. [46] for details of the choice of the covariates. We calculated the correlation for each pair of covariates. The correlations range from 0.01 to 0.46, indicating that the covariates have reasonably low correlation. Multicollinearity was diagnosed by calculating the variance inflation factor (VIF) for each covariate. The VIFs range from 1.15 to 3.68, implying low multicollinearity.

In the data, yit denotes the number of low birth weights in county i during year t, and xit = (1, xit2, xit3, xit4, xit5, xit6, xit7, xit8)′, with xit2 indicating the county-level population density, xit3 the proportion of black people, xit4 the median household income, xit5 the unemployment rate, xit6 the proportion of mothers with less than 12th grade education, xit7 the proportion of mothers smoking during pregnancy and xit8 the proportion of mothers receiving inadequate prenatal care based on the Kotelchuck Index (IKI value) in county i for year t, for i = 1, …, 46 and t = 1, …, 10.

We completed the specification of the proposed model by choosing a Gamma(0.005, 0.005) prior for τα,k, τβ,k, τγ,k and τk. The prior probability of selection of the regression coefficients follows Beta(1, 1). Since the number of regions is 46, we truncated the stick-breaking representation at 15 [29]. Larger truncation values gave similar results. We collected 10,000 samples by thinning 50,000 samples by a factor of 5 after a burn-in of 10,000 iterations.
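The sampling schedule can be checked with a small sketch. It assumes one reading of the schedule above, namely 10,000 burn-in iterations followed by 50,000 sampling iterations of which every 5th draw is kept. The function name is hypothetical.

```python
def retained_iterations(total_iterations, burn_in, thin):
    """Return the 1-based iteration numbers kept after burn-in and thinning."""
    return list(range(burn_in + thin, total_iterations + 1, thin))


# Assumed schedule: 10,000 burn-in + 50,000 sampling iterations, thinning factor 5.
kept = retained_iterations(total_iterations=60_000, burn_in=10_000, thin=5)
print(len(kept), kept[0], kept[-1])
```

Under this reading the chain retains 10,000 draws, the first at iteration 10,005 and the last at iteration 60,000.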

The third column of Table 1 shows the estimated negative cross-validatory predictive log-likelihoods for the proposed model and the four competing models (as outlined in Section 4). The estimated NLLKCV,app values for Models 1–4 are much higher than that for Model 5, indicating that Model 5 is the best among all the models. The priors of the parameters and the hyperparameter settings were similar to those used for the models in the simulated example.

Table 3 reports the marginal posterior probabilities of inclusion of the seven predictors in terms of fixed effects, spatial random effects and temporal effects. To assess whether the fixed effects and the space/time variations are significant, we calculated the Bayes factor based on the marginal posterior probabilities of the indicators (i.e. δ1k, δ2k and δ3k, k = 2, …, 8). More precisely, the Bayes factor is calculated as

\[
\mathrm{BF} = \frac{\Pr(\delta_{jk}=1 \mid x, y)\,/\,\Pr(\delta_{jk}=1)}{\Pr(\delta_{jk}=0 \mid x, y)\,/\,\Pr(\delta_{jk}=0)}, \quad j = 1, 2, 3,
\]

where Pr(δjk = 1) denotes the prior probability of inclusion and Pr(δjk = 1|x, y) denotes the marginal posterior probability of inclusion. Since we assume equal prior probabilities of inclusion and exclusion (i.e. 0.5), the Bayes factor reduces to the posterior odds Pr(δjk = 1|x, y)/Pr(δjk = 0|x, y). When Pr(δjk = 1|x, y) ≥ 0.94, the Bayes factor exceeds 15. Based on Jeffreys' Bayes factor criteria ([46], p. 432), a posterior probability of inclusion over 0.94 therefore indicates very strong evidence of an effect. For the fixed effects, the proportion of black people and the median household income are included in the model with posterior probabilities of inclusion over 94%. For the spatial random effects, the evidence of spatial variation in the effect of the proportion of black people falls marginally below this threshold (93%), suggesting that the effect of this covariate on low birth weight varies across the counties. On the other hand, there is no strong evidence of temporal variation in the effect of any covariate, implying little variation of the covariate effects over time.
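As a quick numerical check of the threshold, the Bayes factor can be computed directly from a posterior inclusion probability. This is a minimal sketch; the function name is hypothetical, and the default prior probability of 0.5 follows the equal-prior assumption stated above.

```python
def bayes_factor(posterior_inclusion, prior_inclusion=0.5):
    """Posterior odds of inclusion divided by prior odds of inclusion."""
    posterior_odds = posterior_inclusion / (1.0 - posterior_inclusion)
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / prior_odds


# Posterior inclusion probabilities: the 0.94 threshold and two Table 3 entries.
for prob in (0.94, 0.97, 0.93):
    print(prob, round(bayes_factor(prob), 2))
```

A probability of 0.94 gives a Bayes factor of about 15.67, whereas 0.93 gives about 13.29, which is why the 93% spatial effect sits just below the cut-off.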

Table 3.

Marginal posterior probabilities of inclusion of the 7 predictors in the application.

| Predictor | Fixed Effect | Spatial Effect | Temporal Effect |
|---|---|---|---|
| Population Density | 0.91 | 0.82 | 0.63 |
| Proportion of Black | 0.97 | 0.93 | 0.89 |
| Median Household Income | 0.96 | 0.91 | 0.90 |
| Unemployment Rate | 0.88 | 0.82 | 0.66 |
| Proportion of Less Education | 0.85 | 0.74 | 0.70 |
| Proportion of Smoking | 0.92 | 0.90 | 0.78 |
| Proportion of IKI | 0.89 | 0.84 | 0.69 |

Table 4 provides the estimates and 95% credible intervals (CIs) of the fixed effects for the seven predictors based on the posterior samples. The proportion of black people is significantly and positively associated with low birth weight (95% CI is (0.30, 3.06)), indicating that counties with a larger proportion of African Americans have a higher risk of low birth weight. Besides the proportion of black people, the median household income is marginally negatively associated with the probability of low birth weight (95% CI is (−1.92, 0.02)). This is broadly consistent with its posterior probability of inclusion of 0.96. For the population density and the proportion of smoking, whose posterior probabilities of inclusion are 0.91 and 0.92, respectively, the 95% CIs cover zero as anticipated, implying that these two covariates contribute less to explaining low birth weight. The remaining predictors are not significant in predicting low birth weight. These results are consistent with previous studies [47, 48].

Table 4.

Estimates and 95% credible intervals of fixed effects from the posterior samples.

| Predictor | Fixed Effect | 95% CI |
|---|---|---|
| Population Density | −0.82 | (−1.76, 0.16) |
| Proportion of Black | 1.71 | (0.30, 3.06) |
| Median Household Income | −1.00 | (−1.92, 0.02) |
| Unemployment Rate | 0.86 | (−0.43, 2.23) |
| Proportion of Less Education | 0.86 | (−0.47, 2.11) |
| Proportion of Smoking | 1.13 | (−0.21, 2.56) |
| Proportion of IKI | 0.38 | (−0.87, 1.54) |

Figure 2 depicts the posterior densities of the spatially varying effects of the predictors. It shows that the spatial random effects of different predictors follow different distributions. Figure 3 shows boxplots of the posterior samples of the spatially-varying coefficient of the proportion of black people by county, where the points at zero arise from variable selection. Figure 4 presents boxplots of the posterior samples of the temporally-varying coefficients of all covariates over time. Although there are some trends, the temporally-varying coefficients vary on a small scale and almost all of the ranges cover zero, implying insignificant temporal effects for the covariates. Interestingly, even where the model coefficients do not vary significantly in time and space, the proposed approach still fits the data better in this application, based on the NLLKCV,app values in Table 1. This advantage stems from the area-specific nonparametric distribution assumed for the spatially-varying coefficients, together with variable selection, which allows positive probabilities of excluding coefficients (i.e. zero values).

Figure 2.

Posterior densities for spatially varying coefficients (βi) across areas in the application.

Figure 3.

The boxplots of the posterior samples of the spatially-varying coefficient of the proportion of black people over county.

Figure 4.

The boxplots of the posterior samples of the temporally-varying coefficients of the 7 covariates over time.

Figure 5 displays choropleth maps comparing the raw standardized mortality ratio (SMR) of low birth weight with the SMR estimated from the proposed model in 1997, 2002 and 2006. The estimated relative risks of low birth weight capture the main geographical pattern and the temporal trend. In general, the relative risk of low birth weight increased in many counties of South Carolina during 1997–2006. More precisely, high relative risks were initially concentrated in the center and the East of the region. Relative risks then worsened in the East, remained in the same interval in the center, and increased markedly in the North and the Southwest.

Figure 5.

The choropleth maps of comparison of the raw SMR of low birth weight in SC and the estimated SMR based on the proposed method in years 1997, 2002 and 2006.

6 Discussion

We proposed a Bayesian semiparametric model with variable selection for the analysis of space-time data. The proposed approach relaxes the normality assumption for the spatial random effects of covariates while allowing for uncertainty of the inclusion of the fixed effects, spatial random effects and temporal effects. The spatial information is incorporated into the nonparametric distributions for the spatial random effects via the generalized transformation which includes various popular links.

One of the major advantages of the proposed semiparametric model is its ability to flexibly model variation within localised areas of a study region. The proposed model allows a geographically localised definition of the dependence on covariates and provides a flexible method of incorporating covariates with pre-defined inclusion probabilities. Although in the examples the covariate profiles show some impact on the overall county rates, the estimated negative cross-validatory predictive log-likelihoods clearly support the proposed model over conventional space-time random effect models. This suggests that, even with this degree of parameterization, there is an overall benefit in using such semiparametric models, especially when covariates are to be flexibly accommodated. The proposed approach is computationally intensive, although it is reasonably efficient for code written in R: it took about 84 hours to run 50,000 iterations for the real data example on a Linux server with a Xeon(R) X5355 CPU at 2.66 GHz. Future work will focus on developing more efficient semiparametric space-time models for areal data.

Acknowledgements

The authors would also like to thank the editor, the associate editor, and the three referees for valuable comments which greatly improved the presentation of the article. This work was supported by NIH/NHLBI 5R21HL088654-02.

Appendix

Full conditional distributions in Section 2.3

Step 1: Update (δ1k, δ2k, δ3k), for k = 1, …, p, from its full conditional posterior distribution,

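\[
\exp\left[\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{itk}\,y_{it}\,\theta_{itk} - E_{it}\exp\left(\sum_{k=1}^{p}x_{itk}\theta_{itk}\right)\right)\right]\pi(\delta_{1k},\delta_{2k},\delta_{3k}).
\]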

With this posterior distribution, we can calculate the posterior probability for each scenario in (6). After standardization, we can generate a sample of (δ1k, δ2k, δ3k) from a discrete probability measure as

\[
(\delta_{1k},\delta_{2k},\delta_{3k}) \sim \sum_{l=1}^{5} p_{lk}^{*}\,\delta_{\kappa_{lk}}(\cdot),
\]

where \(p_{lk}^{*}\) denotes the standardized probability and \(\kappa_{lk}\) denotes the different scenarios for (δ1k, δ2k, δ3k) shown in (6).
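The standardization and discrete draw in Step 1 can be sketched as follows. The log posterior values below are hypothetical placeholders for the five scenarios, and the log-sum-exp shift is an implementation choice made here for numerical stability rather than something stated in the text.

```python
import math
import random


def standardize(log_weights):
    """Turn unnormalized log posterior values into probabilities that sum to one."""
    peak = max(log_weights)  # subtract the maximum to avoid underflow in exp
    shifted = [math.exp(w - peak) for w in log_weights]
    total = sum(shifted)
    return [s / total for s in shifted]


def sample_scenario(log_weights, rng):
    """Draw one scenario index from the standardized discrete distribution."""
    probs = standardize(log_weights)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical unnormalized log posterior values for the five scenarios.
log_post = [-1050.2, -1048.7, -1049.1, -1051.0, -1047.9]
probs = standardize(log_post)
rng = random.Random(0)
scenario = sample_scenario(log_post, rng)
print([round(p, 3) for p in probs], scenario)
```

Working on the log scale matters here because the exponentiated kernel sums over all areas and years and would otherwise underflow to zero for every scenario.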

Step 2: Update αk, for k = 1, …, p, from its full conditional posterior distribution,

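\[
\exp\left[\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{itk}\,y_{it}\,\delta_{1k}\alpha_{k} - E_{it}\exp\left(\sum_{k=1}^{p}x_{itk}\theta_{itk}\right)\right) - \frac{\tau_{\alpha,k}}{2}\alpha_{k}^{2}\right],
\]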
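The kernel in Step 2 does not correspond to a standard distribution, so any sampler for αk needs to evaluate it pointwise. The sketch below evaluates its logarithm, assuming a Poisson-type kernel in which θitk already reflects the candidate value of αk. The function name and the nested-list data layout are hypothetical.

```python
import math


def log_conditional_alpha(alpha_k, k, delta1_k, tau_alpha_k, x, y, E, theta):
    """Log of the unnormalized full conditional of alpha_k.

    x[i][t] and theta[i][t] are lists over covariates; y[i][t] and E[i][t]
    are the observed and expected counts for area i and time t.
    """
    total = 0.0
    for i in range(len(y)):
        for t in range(len(y[i])):
            # Linear predictor: sum over covariates of x_itk * theta_itk.
            eta = sum(x_c * th_c for x_c, th_c in zip(x[i][t], theta[i][t]))
            total += x[i][t][k] * y[i][t] * delta1_k * alpha_k - E[i][t] * math.exp(eta)
    # Gaussian prior contribution with precision tau_alpha_k.
    return total - 0.5 * tau_alpha_k * alpha_k ** 2


# Toy check with one area, one year and an intercept only (hypothetical numbers).
alpha = 0.1
value = log_conditional_alpha(
    alpha_k=alpha, k=0, delta1_k=1, tau_alpha_k=1.0,
    x=[[[1.0]]], y=[[3]], E=[[2.0]], theta=[[[alpha]]],
)
print(round(value, 4))
```

The same pattern applies to the kernels for γtk and the distinct values of βik, with the prior term swapped for the corresponding random-walk or Gaussian penalty.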

Step 3: Update τα,k, for k = 1, …, p, from its full conditional posterior distribution,

\[
\mathrm{Gamma}\left(a_{\alpha,k}+\frac{1}{2},\; b_{\alpha,k}+\frac{\alpha_{k}^{2}}{2}\right)
\]
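The conjugate update in Step 3 can be sketched as below, assuming that Gamma(a, b) is parameterized by shape a and rate b. Python's `random.gammavariate` takes a scale argument, so the rate is inverted before drawing. The function names and the value of αk are hypothetical; a = b = 0.005 matches the prior used in the application.

```python
import random


def precision_posterior(a, b, alpha_k):
    """Shape and rate of the Gamma full conditional for the precision of alpha_k."""
    return a + 0.5, b + 0.5 * alpha_k ** 2


def draw_precision(a, b, alpha_k, rng):
    """Draw the precision; gammavariate expects (shape, scale), so scale = 1 / rate."""
    shape, rate = precision_posterior(a, b, alpha_k)
    return rng.gammavariate(shape, 1.0 / rate)


# Hypothetical current value of alpha_k with the Gamma(0.005, 0.005) prior.
shape, rate = precision_posterior(a=0.005, b=0.005, alpha_k=1.71)
tau_draw = draw_precision(a=0.005, b=0.005, alpha_k=1.71, rng=random.Random(1))
print(shape, round(rate, 5), tau_draw > 0)
```

Steps 5, 7 and 12 follow the same shape-plus-count and rate-plus-half-sum-of-squares pattern with their own sufficient statistics.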

Step 4: Update γtk, for t = 1, …, T and k = 1, …, p, from its full conditional posterior distribution,

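\[
\exp\left[\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{itk}\,y_{it}\,\delta_{3k}\gamma_{tk} - E_{it}\exp\left(\sum_{k=1}^{p}x_{itk}\theta_{itk}\right)\right) - \frac{\tau_{\gamma,k}}{2}\sum_{t=1}^{T}\left(\gamma_{tk}-\gamma_{t-1,k}\right)^{2}\right],
\]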

Step 5: Update τγ,k, for k = 1, …, p, from its full conditional posterior distribution,

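\[
\mathrm{Gamma}\left(a_{\gamma,k}+\frac{T}{2},\; b_{\gamma,k}+\frac{1}{2}\sum_{t=1}^{T}\left(\gamma_{tk}-\gamma_{t-1,k}\right)^{2}\right)
\]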

Step 6: Update βik, for i = 1, …, n and k = 1, …, p. Let \(S_{k}^{*} = (S_{1k}^{*}, \ldots, S_{r_k k}^{*})\) be the configuration of the current distinct values of Sk. Then we sample \(\beta^{*}_{S^{*}_{sk}}\), for s = 1, …, rk, from its full conditional posterior distribution,

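\[
\exp\left[\sum_{i:\,S_{ik}=S_{sk}^{*}}\sum_{t=1}^{T}\left(x_{itk}\,y_{it}\,\delta_{2k}\beta^{*}_{S^{*}_{sk}} - E_{it}\exp\left(\sum_{k=1}^{p}x_{itk}\theta_{itk}\right)\right) - \frac{\tau_{\beta,k}\left(\beta^{*}_{S^{*}_{sk}}\right)^{2}}{2}\right].
\]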

Step 7: Update τβ,k, for k = 1, …, p, from its full conditional posterior distribution,

\[
\mathrm{Gamma}\left(a_{\beta,k}+\frac{n}{2},\; b_{\beta,k}+\frac{1}{2}\sum_{i=1}^{n}\beta_{ik}^{2}\right)
\]

Step 8: Update Sik, for i = 1, …, n and k = 1, …, p from its full conditional posterior distribution,

\[
\sum_{s=1}^{r_k}\hat{\omega}_{isk}\,\delta_{s}(\cdot),
\]

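where \(\hat{\omega}_{isk} \propto \omega_{isk}\exp\left[\sum_{t=1}^{T}\left(x_{itk}\,y_{it}\,\delta_{2k}\beta^{*}_{S^{*}_{sk}} - E_{it}\exp\left(\sum_{k=1}^{p}x_{itk}\theta_{itk}\right)\right)\right]\) for s = 1, …, rk.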

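Step 9: Update ωisk, for s = 1, …, rk and k = 1, …, p, as follows,

\[
\omega_{isk} = \mathcal{V}_{isk}^{*}\prod_{l<s}\left(1-\mathcal{V}_{ilk}^{*}\right),
\]

where \(\mathcal{V}_{isk}^{*} = u_{isk}\mathcal{V}_{sk}\), \(u_{isk} = h(\phi_{isk})\) and \(\mathcal{V}_{sk} \sim \mathrm{Beta}\left(1+L_{sk},\, M_{k}+\sum_{l=s+1}^{r_k}L_{lk}\right)\), with Lsk = #{Sik = s, for i = 1, …, n} denoting the number of Sik values that equal s.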
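The stick-breaking construction in Step 9 can be sketched as follows. The code takes the area-specific stick-breaking proportions as given (named `v_star` here, a hypothetical name) and also shows how the Beta parameters are formed from the cluster counts Lsk. The inputs are toy values chosen so that the arithmetic is easy to follow.

```python
def stick_breaking_weights(v_star):
    """Weights w_s = v*_s * prod_{l < s} (1 - v*_l) for one area."""
    weights = []
    remaining = 1.0  # length of the stick not yet allocated
    for v in v_star:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def beta_parameters(labels, concentration, n_components):
    """Beta(1 + L_s, M + sum_{l > s} L_l) parameters from cluster labels 1..r."""
    counts = [sum(1 for lab in labels if lab == s) for s in range(1, n_components + 1)]
    return [
        (1 + counts[s], concentration + sum(counts[s + 1:]))
        for s in range(n_components)
    ]


# Toy inputs: three components, with the last proportion set to 1.
weights = stick_breaking_weights([0.5, 0.5, 1.0])
params = beta_parameters(labels=[1, 1, 2, 3, 1], concentration=1.0, n_components=3)
print(weights, params)
```

Setting the final proportion to 1 makes the truncated weights sum to one. Whether that still holds after the area-specific scaling by uisk depends on details of the truncation that are not restated in this appendix.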

Step 10: Update Mk, for k = 1, …, p, from its full conditional posterior distribution,

\[
\mathrm{Gamma}\left(r_{k},\; -\sum_{l=1}^{r_k-1}\log\left(1-\mathcal{V}_{lk}\right)\right).
\]

Step 11: Update ϕisk, for k = 1, …, p, from its full conditional posterior distribution,

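\[
\sum_{s=1}^{r_k}\mathcal{V}_{isk}^{*}\prod_{l<s}\left(1-\mathcal{V}_{ilk}^{*}\right)\delta_{\beta_{is}^{*}}\exp\left(-\frac{m_{i}\tau_{k}}{2}\left(\phi_{isk}-\bar{\phi}_{isk}\right)^{2}\right),
\]

where \(\mathcal{V}_{isk}^{*} = u_{isk}\mathcal{V}_{sk}\) with \(u_{isk} = \dfrac{\left(1+\phi_{isk}\lambda_{k}/2\right)^{1/\lambda_{k}}}{\left(1+\phi_{isk}\lambda_{k}/2\right)^{1/\lambda_{k}}+\left(1-\phi_{isk}\lambda_{k}/2\right)^{1/\lambda_{k}}}\).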
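The generalized transformation h(·) can be illustrated with a short sketch. It assumes the symmetric Aranda-Ordaz family [31], in which the second term of the denominator carries a minus sign, so that λ = 1 gives a linear link and λ → 0 gives the logistic link. The function name and the handling of λ near zero are choices made for this example.

```python
import math


def aranda_ordaz_symmetric(phi, lam):
    """Map phi to (0, 1) with the symmetric Aranda-Ordaz transformation."""
    if abs(lam) < 1e-8:
        # Limiting case lam -> 0: the logistic transformation.
        return 1.0 / (1.0 + math.exp(-phi))
    half = 0.5 * lam * phi
    if abs(half) >= 1.0:
        raise ValueError("the transformation requires |lam * phi / 2| < 1")
    upper = (1.0 + half) ** (1.0 / lam)
    lower = (1.0 - half) ** (1.0 / lam)
    return upper / (upper + lower)


linear_case = aranda_ordaz_symmetric(phi=0.5, lam=1.0)       # equals 1/2 + phi/4
near_logistic = aranda_ordaz_symmetric(phi=1.0, lam=1e-4)    # close to logistic(1)
logistic = 1.0 / (1.0 + math.exp(-1.0))
print(linear_case, round(near_logistic, 6), round(logistic, 6))
```

Treating λk as a parameter therefore lets the data choose among a continuum of links, which is the role of the update for λk in Step 13.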

Step 12: Update τk, for k = 1, …, p, from its full conditional posterior distribution,

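\[
\mathrm{Gamma}\left(a_{k}+\frac{n r_{k}}{2},\; b_{k}+\frac{1}{2}\sum_{s=1}^{r_k}\sum_{i\sim j}\left(\phi_{isk}-\phi_{jsk}\right)^{2}\right)
\]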

Step 13: Update λk, for k = 1, …, p, from its full conditional posterior distribution,

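\[
\prod_{i=1}^{n}\left(\sum_{s=1}^{r_k}\mathcal{V}_{isk}^{*}\prod_{l<s}\left(1-\mathcal{V}_{ilk}^{*}\right)\delta_{\beta_{is}^{*}}\right).
\]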

REFERENCES

1. Bernardinelli L, Montomoli C. Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. Statistics in Medicine. 1992;11:983–1007. doi: 10.1002/sim.4780110802.
2. Besag J. Spatial interaction and statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B. 1974;36:192–236.
3. Waller LA, Carlin BP, Xia H, Gelfand AE. Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association. 1997;92:607–617.
4. Knorr-Held L, Besag J. Modelling risk from a disease in time and space. Statistics in Medicine. 1998;17:2045–2060. doi: 10.1002/(sici)1097-0258(19980930)17:18<2045::aid-sim943>3.0.co;2-p.
5. Lagazio C, Biggeri A, Dreassi E. Age-period-cohort models and disease mapping. Environmetrics. 2003;14:475–490.
6. Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Statistics in Medicine. 2000;19:2555–2567. doi: 10.1002/1097-0258(20000915/30)19:17/18<2555::aid-sim587>3.0.co;2-#.
7. Lagazio C, Dreassi E, Biggeri A. A hierarchical Bayesian model for space-time variation of disease risk. Statistical Modelling. 2001;1:17–29.
8. Richardson S, Abellán JJ, Best N. Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer risks in Yorkshire. Statistical Methods in Medical Research. 2006;15:385–407. doi: 10.1191/0962280206sm458oa.
9. Ugarte MD, Goicoa T, Ibáñez B, Militino AF. Evaluating the performance of spatiotemporal Bayesian models in disease mapping. Environmetrics. 2009;20:647–665.
10. Assunção RM. Space varying coefficient models for small area data. Environmetrics. 2003;14:453–473.
11. Gamerman D, Moreira ARB, Rue H. Space-varying regression models: specifications and simulation. Computational Statistics and Data Analysis. 2003;42:513–533.
12. Dreassi E, Biggeri A, Catelan D. Space-time models with time dependent covariates for the analysis of the temporal lag between socio-economic factors and mortality. Statistics in Medicine. 2005;24:1919–1932. doi: 10.1002/sim.2063.
13. Cai B, Lawson AB, Hossain Md, Choi J. Bayesian Latent Structure Models with Space-Time Dependent Covariates. Statistical Modelling. 2012;12:145–164. doi: 10.1177/1471082X1001200202.
14. Gelfand A, Banerjee S, Gamerman D. Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics. 2005;16:465–479.
15. Paez MS, Gamerman D, Landim FMP, Salazar E. Spatially varying dynamic coefficient models. Journal of Statistical Planning and Inference. 2008;138(4):1038–1058.
16. Gelfand AE, Kottas A, MacEachern SN. Bayesian nonparametric spatial modeling with Dirichlet process mixing. Journal of the American Statistical Association. 2005;100:1021–1035.
17. Duan J, Guindani M, Gelfand A. Generalized spatial Dirichlet process models. Biometrika. 2007;94:809–825.
18. Reich BJ, Fuentes M. A multivariate semiparametric Bayesian spatial modeling framework for hurricane surface wind fields. Annals of Applied Statistics. 2007;1:249–264.
19. Li P, Banerjee S, Hanson TA, McBean AM. Nonparametric hierarchical modeling for detecting boundaries in areally referenced spatial datasets. Research Report. Division of Biostatistics, University of Minnesota; 2010.
20. Reich BJ, Fuentes M, Herring AH, Evenson KR. Bayesian Variable Selection for Multivariate Spatially-Varying Coefficient Regression. Biometrics. 2010;66:772–782. doi: 10.1111/j.1541-0420.2009.01333.x.
21. Kuo L, Mallick B. Variable selection for regression models. Sankhyá B. 1998;60:65–81.
22. Geisser S. On prior distribution for binary trials (with discussion). American Statistician. 1984;38(4):244–251.
23. Scott J, Berger J. An exploration of aspects of Bayesian multiple testing. Journal of Statistical Planning and Inference. 2006;136:2144–2162.
24. Cui W, George EI. Empirical Bayes vs. fully Bayes variable selection. Journal of Statistical Planning and Inference. 2008;138:888–900.
25. Ghosal S, van der Vaart AW. Fundamentals of Nonparametric Bayesian Inference. Cambridge University Press; (expected in 2013).
26. Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica. 1994;4:639–650.
27. Blackwell D, Macqueen JB. Ferguson distributions via Pólya urn schemes. Annals of Statistics. 1973;1:353–355.
28. Bush CA, MacEachern SN. A semiparametric Bayesian model for randomised block designs. Biometrika. 1996;83:275–285.
29. Ishwaran H, James LF. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association. 2001;96:161–173.
30. Ishwaran H, Zarepour M. Markov Chain Monte Carlo in Approximate Dirichlet and Beta Two-Parameter Process Hierarchical Models. Biometrika. 2000;87:371–390.
31. Aranda-Ordaz F. On two families of transformations to additivity for binary response data. Biometrika. 1981;68:357–363.
32. Ohlssen DI, Sharples LD, Spiegelhalter DJ. Flexible random-effects models using Bayesian semi-parametric models: Application to institutional comparisons. Statistics in Medicine. 2007;26:2088–2112. doi: 10.1002/sim.2666.
33. Gilks WR, Best NG, Tan KKC. Adaptive Rejection Metropolis Sampling within Gibbs Sampling. Journal of Applied Statistics. 1995;44:455–472.
34. Spiegelhalter DJ, Best NG, Carlin BP, Linde AVD. Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society, Series B. 2002;64:1–34.
35. Plummer M. Penalized loss functions for Bayesian model comparison. Biostatistics. 2008;9:523–539. doi: 10.1093/biostatistics/kxm049.
36. Geisser S. Predictive Inference: An Introduction. London: Chapman & Hall; 1993.
37. Gelfand AE, Dey D, Chang H. Model determination using predictive distributions with implementation via sampling based methods (with discussion). In: Bernardo J, et al., editors. Bayesian Statistics 4. Oxford University Press; 1992. pp. 147–167.
38. Dey D, Chen MH, Chang H. Bayesian Approach for Nonlinear Random Effects Models. Biometrics. 1997;53:1239–1252.
39. Sinha D, Dey DK. Semiparametric Bayesian Analysis of Survival Data. Journal of the American Statistical Association. 1997;92:1195–1212.
40. Draper D, Krnjajić M. Bayesian model specification. Technical Report. Santa Cruz: Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California; 2006.
41. Cowles MK, Carlin BP. Markov Chain Monte Carlo diagnostics: A comparative review. Journal of the American Statistical Association. 1995;91:883–904.
42. Plummer M, Best NG, Cowles K, Vines K. CODA: Convergence Diagnosis and Output Analysis for MCMC. R News. 2006;6:7–11.
43. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2007. ISBN 3-900051-07-0, URL http://www.R-project.org.
44. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing. 2000;10:325–337.
45. Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;1:515–534.
46. Jeffreys H. The Theory of Probability. 3rd ed. Oxford: 1961.
47. Kirby R, Liu J, Lawson A, Choi J, Cai B, Hossain M. Small area low birth weight incidence and socio-economic predictors: a latent spatial structure approach. Spatial and Spatio-Temporal Epidemiology. 2011;2(4):265–271. doi: 10.1016/j.sste.2011.07.011.
48. Olson ME, Diekema D, Elliott BA, Renier CM. Impact of Income and Income Inequality on Infant Health Outcomes in the United States. Pediatrics. 2010;126:1165–1173. doi: 10.1542/peds.2009-3378.
