Abstract
We propose a space-time stick-breaking process for disease cluster estimation. The dependencies for spatial and temporal effects are introduced by using space-time covariate dependent kernel stick-breaking processes. We compare this model with the standard space-time random effect model by checking each model’s ability to detect clusters of various shapes and sizes. The comparison is made for simulated data where the true risks are known. For the simulated data, we observe that the space-time stick-breaking process performs better in detecting medium- and high-risk clusters. For the real data, county-specific low birth weight incidences for the state of South Carolina for the years 1997–2007, we illustrate how the proposed model can be used to find groupings of counties with higher incidence rates.
Keywords: Cluster, Dependence, Dirichlet process mixture, Space-time, Stick-breaking processes
1 Introduction
Health data are now routinely available for many spatial locations over successive years. This repeated yearly information introduces a further dimension that is often important because of changes in socio-demographic structure or other health factors during the study period. It has also been observed that significant changes that are close in space can be close in time too. This observation underscores the importance of space-time modeling and emphasizes the consideration of two potential dimensions of dependency: between spatially neighboring locations, and between successive time points. Including these dimensions in the analysis adjusts the parameter estimates for spatial, temporal and spatio-temporal dependence, and contributes to a better understanding of disease etiology. An important application of these models, central to small area health investigations, is the identification of areas associated with high rates of disease incidence, i.e., cluster detection.
Recent works by Gelfand et al. (2005), Griffin and Steel (2006) and Duan et al. (2007) have proposed using a Dirichlet process (DP) prior distribution for better risk estimation of spatially referenced health data. In these works, spatial dependence was introduced by defining a spatial model either for the mixing weights or for the mixing components. In Gelfand et al. (2005), dependent DP prior distributions are assumed for the random mixing components, and stick-breaking prior distributions are assumed for the weight components. The weights are not indexed by spatial location. To account for dependency among neighboring spatial locations, a zero-mean stationary Gaussian process model was considered for the base distribution from which the random mixing components are drawn. A generalization of this spatially dependent DP was proposed by Duan et al. (2007). In their proposal, instead of a common surface selection (i.e., common weights) for all spatial locations, a latent covariate was introduced to determine a random surface selection (i.e., the weights are spatial location dependent). The latent covariates are generated from a Gaussian random field. In these works (Gelfand et al. 2005; Duan et al. 2007), an error process of Gaussian distributions was used for the mixing components to give continuous support for the observed data, and in this way the discreteness of the DP prior distribution was resolved. In addition, the dependent DP was introduced through the mixing components by assigning a zero-mean stationary Gaussian process prior distribution, i.e., by allowing the mixing parameter to be drawn from a random field. In a different approach, Griffin and Steel (2006) proposed a dependent DP by introducing an ordered stick-breaking prior distribution for the mixing weights. The ordering depends on the closeness of covariate values, i.e., distributions for similar covariate values are assigned similar orderings.
In a non-spatial setup, Dunson and Park (2008) proposed kernel based stick-breaking processes to smooth the mixing weights. Reich and Fuentes (2007) extended these kernels to include spatial processes in order to model the spatial dependence where the weights are spatially indexed.
Work on space-time dependent Dirichlet processes is very limited. Gelfand et al. (2005) used the temporal observations as replications at each spatial location in their development of a spatially dependent DP. Instead of treating this temporal information as independent replications in a spatially dependent DP, Kottas et al. (2007) viewed it as a temporally evolving spatial process. Their proposal is based on the decomposition of the temporally indexed mixing components into a first-order autoregressive term and a random innovation term. Dependent DP prior distributions are assumed for the random innovations, and a zero-mean stationary Gaussian process is considered for the base distribution.
In this paper, we propose a different space-time dependence DP mixture model for the modeling of incidence count data that are observed at spatial locations for successive time points. In our proposal, we introduce the dependencies for spatial and temporal effects by using space-time covariate dependent kernel stick-breaking processes. The space-time covariate dependent kernel can include spatially varying regression coefficients and separable/non-separable space-time effects. The spatially varying regression coefficients are modeled by a multivariate conditional autoregressive (MCAR) type model, and the temporal effects are modeled by a first order autoregressive type model. Our proposal is different from Kottas et al. (2007) in at least two ways: i) we use the mixing weights instead of mixing components to introduce space-time dependency; and ii) our space-time dependent kernel includes separate effects for space, time, and their interaction, instead of considering a dynamic spatial process.
The rest of this paper is structured as follows. First, we give a brief description of DP nonparametrics (Sect. 1.1). Following that, we introduce the proposed space-time stick-breaking process (Sect. 2). Section 3 describes a standard space-time random effect model, employed here for comparison with the proposed stick-breaking process. Implementation of the posterior sampling algorithm and the prior distribution specifications are discussed in Sect. 4. In Sect. 5 we describe a simulation study for the comparative evaluation of the two modeling approaches. The comparisons use cluster detection diagnostics that check each model’s ability to detect clusters of various shapes and sizes. An application to county level low birth weight incidences in the state of South Carolina is described in Sect. 6. With the real data, we describe how the grouping information of the space-time stick-breaking process can be used for finding clusters of similar risk areas. Section 7 provides the concluding remarks.
1.1 Dirichlet process mixture background
The data y1, …, yn are assumed to be generated from an unknown density r, i.e., $y_i \sim r$, i = 1, …, n. Then, the unknown r in the Dirichlet process mixture (DPM) model framework (Ferguson 1983; Lo 1984) can be written as

$$r(y) = \int_{\Theta} f(y \mid \theta)\, dG(\theta), \qquad (1.1)$$
where f(·|θ) is a conditional density for a given cluster membership, indexed by θ, which can be a scalar or a vector, with θ ∈ Θ. G(·) in (1.1) is the mixing distribution. In Bayesian nonparametrics, G(·) is a random probability measure (RPM), i.e., a probability measure over a set of probability distributions. Another way of seeing (1.1) is that, marginally, the distribution of y is a mixture distribution over the density f(y|θ). The RPM G(θ) can be discrete or continuous. In order to achieve grouping in the observed data, we need to introduce discreteness in G(θ). We therefore select a DP prior distribution for this RPM, G(·).
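The stick-breaking construction of Sethuraman (1994) confirms that the measures drawn from a DP are discrete with probability one. The probability measure G can be represented as an infinite mixture of point masses, as $G = \sum_{j=1}^{\infty} \omega_j \delta_{\theta_j}$, where the locations $\theta_j \sim G_0$ independently, δθ denotes the point mass located at θ, and the stick-breaking probabilities are $\omega_j = p_j \prod_{k<j} (1 - p_k)$ with $p_j \sim \mathrm{Beta}(1, \alpha)$. Here G0 is the base distribution and α is the scale (or concentration) parameter of the DP.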
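Using this constructive definition of the DP, Eq. (1.1) can be written as $r(y) = \int_{\Theta} f(y \mid \theta) \sum_{j=1}^{\infty} \omega_j \delta_{\theta_j}(d\theta)$. It reveals the fact that the prior distribution for the unknown r can be defined by infinite mixtures of the density f. The above can also be written as $r(y) = \sum_{j=1}^{\infty} \omega_j f(y \mid \theta_j)$.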
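The stick-breaking construction can be made concrete with a short sketch. The following Python fragment is only an illustration under stated assumptions: the function name `stick_breaking_weights`, the truncation at K = 20 components, and the choice α = 1 are hypothetical and are not taken from the paper. The last break is set to one so that the truncated weights sum to one.

```python
import random

def stick_breaking_weights(alpha, K, rng):
    """Draw K weights from a stick-breaking prior truncated at K components."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for j in range(K):
        # The last break takes the whole remainder so the weights sum to one.
        p = 1.0 if j == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(p * remaining)
        remaining *= (1.0 - p)
    return weights

rng = random.Random(1)  # fixed seed for reproducibility
w = stick_breaking_weights(alpha=1.0, K=20, rng=rng)
```

Smaller values of α tend to put most of the mass on the first few components, which is one way to see how α influences the number of occupied clusters.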
The DPM is defined by writing the above model in a Bayesian hierarchical fashion
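$$y_i \mid \theta_i \sim f(y_i \mid \theta_i), \qquad \theta_i \mid G \sim G, \qquad G \sim \mathrm{DP}(\alpha, G_0). \qquad (1.2)$$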
In words, yi is drawn from f(·|θi), parameterized by θi; each θi is drawn independently from an identical RPM G; and G has a DP prior distribution with a base distribution G0 and a scale (or concentration) parameter α. The DPM is a mixture model since a number of θi’s can take the same value because G is discrete, and yi’s with the same value of θi belong to the same cluster. Hence, the θi’s give rise to a partition of the set of yi’s, and α controls the distribution of the number of clusters in this partition.
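Another way of writing the above DPM model is for a finite number of clusters (say, K), with a cluster assignment variable zi and mixing weights ω = (ω1, …, ωK), as

$$y_i \mid z_i, \theta \sim f(y_i \mid \theta_{z_i}), \quad z_i \mid \omega \sim \mathrm{Multinomial}(\omega_1, \ldots, \omega_K), \quad \omega \sim \mathrm{Dirichlet}(\alpha/K, \ldots, \alpha/K), \quad \theta_j \sim G_0.$$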
In this representation, the cluster assignment variable zi loses its discrete meaning in the limit K → ∞, since the number of clusters becomes countably infinite and the prior expectation of the weight of any specific cluster tends to zero. Hence, the assignment variable zi can be ignored and θzi can be replaced by θj. The parameter θj is instead drawn from a DP with base distribution G0. Green and Richardson (2001) showed that for large K, when α/K → 0, the finite mixture model with multinomial prior distributions for the assignment variables and a Dirichlet distribution for the weights (DMA in their terminology) converges to the DPM.
2 Stick-breaking process space-time (SBPST) model
We assume that disease incidence data are observed in a fixed set of small areas and for discrete time periods. Denote the small area units as i = 1, …, n, and the temporal units as t = 1, …, T. The observed count of incidence is denoted by Oit, and the expected count by Eit for location i and time t. The expected count is usually assumed to be fixed and obtained from a standard population after adjustment for case-mix. The observed count can be modeled as a mixture of Poisson distributions as
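$$O_{it} \mid \theta, \omega \sim \sum_{j=1}^{K} \omega_{itj}\, \mathrm{Poisson}(E_{it}\theta_j), \qquad (2.1)$$

where θ = (θ1, …, θK) is a vector of mixing components, and the non-negative ωitj’s are the mixing weights, which satisfy the condition $\sum_{j=1}^{K} \omega_{itj} = 1$. The number of mixing components K is not fixed beforehand and is considered a random variable.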
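Following de Finetti’s theorem, if (O11, O12, …) are infinitely exchangeable, then the joint probability p(O11, O12, …, OnT) has a representation as a mixture:

$$p(O_{11}, O_{12}, \ldots, O_{nT}) = \int \prod_{i=1}^{n} \prod_{t=1}^{T} f(O_{it} \mid \theta)\, dG(\theta)$$

for RPM, G(θ). A general representation of this can be written in the DPM framework (1.2) as

$$O_{it} \mid \theta_{it} \sim f(O_{it} \mid E_{it}\theta_{it}), \qquad \theta_{it} \mid G \sim G, \qquad G \sim \mathrm{DP}(\alpha, G_0).$$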
The functional form for f(·|·) can be any probability distribution that belongs to the exponential family and is defined for positive random variables. Depending on the specific distribution, θj could be a scalar or a vector of parameters. We assume a Poisson distribution for f(·|Eitθj), which is a reasonable assumption for count data; because of this specification, θj can be interpreted as a relative-risk parameter. This interpretation of θj helps in specifying the baseline distribution, G0. We assume a gamma distribution of the form Gamma(ς, ξ) for the baseline measure G0, where ς and ξ are the shape and scale parameters, respectively.
The stick-breaking construction of the RPM is $G_{it} = \sum_{j \ge 1} \omega_{itj}\, \delta_{\theta_j}$, where δθj is the point mass at θj. The mixing weight ωitj is indexed by space and time so that the component-specific weights can incorporate spatial and temporal dependence into the mixture model. We define the stick-breaking representation for the weights by $\omega_{itj} = B_{itj} \prod_{k<j} (1 - B_{itk})$, where Bitj = qitj pj, qitj is a covariate dependent kernel function and pj ~ Beta(1, α). In this representation, the unit probability stick is broken off by the amount qit1 p1 for the first basis, then by the fraction qit2 p2 of the remaining probability (1 − qit1 p1) for the second basis, and so on until the unit stick is used up in the limit. Although there are many possible kernel functions that can be used for spatially correlated data, e.g., the uniform or squared-exponential kernel (Reich and Fuentes 2007), or the logistic normal or grouped continuous kernel (Fernandez and Green 2002), we prefer the logistic normal kernel, since Fernandez and Green (2002) already demonstrated its performance in disease mapping and it is flexible enough to accommodate spatially varying regression coefficients with space-time covariates. The covariate dependent kernel qitj is specified by a logistic normal form in terms of a function hitj, where the scaling parameter φ (> 0) controls the smoothness of hitj; the specific form of hitj will depend on the application. Instead of the logit link function, other link functions such as the probit are also possible, but we did not explore this possibility here. The parameter φ has a uniform prior distribution with the range (0, φmax), where φmax is assumed to be a small number (generally, 10) (Green and Richardson 2002).
The dependency on covariates can be introduced by defining a function $h_{itj} = X_{it}^{T} \beta_{ij}$, where Xit = (1, X2it, …, Xpit)T and βi1, βi2, …, βiK are K vectors of unknown regression parameters, each of which has p elements. These parameters are component specific, even though Xit is constant across mixture components. In addition to heterogeneous covariate effects, this model also considers the spatial dependence between the covariates. In Hossain and Lawson (2010), we implemented a simpler model, hitj = κij + γtj, where the κ’s are the structured spatial random effects and the γ’s are the structured temporal random effects. However, the function hitj can also include unstructured spatial and temporal effects, and separable and non-separable space-time interaction effects. In defining the exact form of hitj, we need to remember that in many applications with real data, including the non-separable interaction effects does not improve the goodness-of-fit statistics (e.g., Knorr-Held 2000). In the current application to the South Carolina low birth weight data, we illustrate two other functional forms for hitj and also explain how to model non-separable space-time interaction effects.
To ensure that the above space-time stick-breaking prior is proper, i.e., $\sum_{j} \omega_{itj} = 1$ almost surely, according to Ishwaran and James (2001) (see also Ishwaran and Zarepour 2002) we must show for infinite K that $\sum_{j=1}^{\infty} E[\log(1 - B_{itj})] = -\infty$. The proof is straightforward since E(qitj) and E(pj) are both positive. It is relevant to mention that qitj is restricted to the interval [0, 1]. For finite K, $\sum_{j=1}^{K} \omega_{itj} = 1$ can be ensured by setting BitK = 1. Setting BitK = 1 means that the infinite mixture is truncated to a finite K by putting all of the mass for the components with j ≥ K on ωitK.
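As a rough illustration of how the space-time weights and the truncation BitK = 1 fit together, the sketch below computes ωitj for a single county-year cell and then draws a component label. It is a minimal sketch, not the authors’ WinBUGS code: the kernel form (inverse logit of hitj/φ), the function names, and the numerical values of K, α, φ and hitj are all assumptions made for the example.

```python
import math
import random

def logistic_kernel(h, phi):
    """Assumed kernel form: inverse logit of h / phi, so that 0 < q < 1."""
    return 1.0 / (1.0 + math.exp(-h / phi))

def sbpst_weights(h_row, p, phi):
    """Weights omega_{itj}, j = 1..K, for one (i, t) cell.

    h_row[j] plays the role of h_{itj}; p[j] is the stick proportion p_j.
    """
    K = len(h_row)
    weights, remaining = [], 1.0
    for j in range(K):
        # B_{itj} = q_{itj} * p_j, with B_{itK} = 1 to truncate at K components.
        B = 1.0 if j == K - 1 else logistic_kernel(h_row[j], phi) * p[j]
        weights.append(B * remaining)
        remaining *= (1.0 - B)
    return weights

rng = random.Random(7)
K, alpha, phi = 7, 1.0, 2.0                          # hypothetical values
p = [rng.betavariate(1.0, alpha) for _ in range(K)]  # p_j ~ Beta(1, alpha)
h_row = [rng.gauss(0.0, 1.0) for _ in range(K)]      # stand-in for h_{itj}
omega = sbpst_weights(h_row, p, phi)

# Draw a component label for this cell with probabilities omega_{itj}.
z = rng.choices(range(1, K + 1), weights=omega, k=1)[0]
```

Because the pj are shared across cells while qitj varies with i and t, cells with similar hitj values receive similar weight vectors, which is how the dependence enters the mixture.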
3 Standard random effect space-time (SREST) model
We compare the above SBPST model to the standard random effect space-time (SREST) model since it is one of the best performing models in disease mapping for space-time data (Knorr-Held and Besag 1998; Knorr-Held 2000). Because of its simple hierarchical structure, it is relatively easy to implement in statistical software (specifically, in WinBUGS). Although the SREST model was initially introduced for smoothing risks, it can also be used to find clusters of excess risk regions.
The SREST model can be regarded as a hierarchical generalized mixed model, which allows any probability distribution that belongs to the exponential family to be specified at the first level of the hierarchy. For example, Knorr-Held and Besag (1998) used a Bernoulli distribution for modeling the Ohio lung cancer mortality data. Since our outcome variable of interest is a count, we assume a Poisson distribution at the first level of the hierarchy, specified as $O_{it} \mid \theta_{it} \sim \mathrm{Poisson}(E_{it}\theta_{it})$.
The specification of the model for the relative risk, θit, will depend on the study objective. For example, the logarithm of the relative risk could have the form $\log \theta_{it} = X_{it}^{T}\beta_i$, where Xit = (1, X2it, …, Xpit)T, βi = (βi1, βi2, …, βip)T are p unknown regression coefficients, and the joint probability distribution for these coefficients can be specified by a multivariate CAR model (Assunção 2003). Since our interest is in obtaining better area-specific relative-risk estimates for the simulated lung cancer mortality data, in order to compare each model’s ability to detect clusters of various shapes and sizes, it is useful to include random effects to represent each source of variation. Random effects models are also common in small area health investigations (Best et al. 2005), since the disease under study is most often rare and the area population sizes are small. These conditions increase the random variation associated with risk estimates, and including random effects for each source of this variation may improve the relative risk estimates. We consider a nearly saturated model for the logarithm of the relative risk, defined as

$$\log \theta_{it} = \rho + \kappa_i + \gamma_t + \varepsilon_i + \xi_t + \lambda_{it},$$

where ρ is an overall mean parameter, κi and γt are spatial and temporal random effects for structured heterogeneity, εi and ξt are for unstructured heterogeneity, and λit is a space-time separable random effect.
4 Posterior sampling and the prior distribution specification
Recent research (Ohlssen et al. 2007) suggested that, from a computational perspective, a finite approximation of the stick-breaking representation of the full DP model is more feasible to apply. The SBPST model in (2.1) can be regarded as a finite approximation of the full model where the number of mixing components, K, is fixed to a number for which very little information is lost. This finite approximation of the SBPST model can be implemented in WinBUGS (Spiegelhalter et al. 2003) by introducing space-time latent variables Z = (Z11, Z12, …, ZnT). These latent variables indicate the group membership for the unobserved variables θj, where the mixing weights are defined as ωitj = pr(Zit = j). WinBUGS uses Gibbs sampling with Metropolis-Hastings steps where necessary to obtain samples from the posterior distributions. The WinBUGS implementation of the SREST model is considerably more straightforward.
4.1 Prior distribution specification and the identifiability condition for the SBPST model
The prior distribution for the spatial dependence between the βij’s will be specified by a proper multivariate CAR model (Gelfand and Vounatsou 2003) which takes the form (βij·|{βi′j·, i′ ≠ i}, ρj, Σj) ~ MCAR(ρj, Σj), where Σj is a variance-covariance matrix of order p × p and ρj is a vector of spatial autocorrelation of order p. In the case of independent covariates, this multivariate specification will be reduced to the proper univariate CAR model. The prior distributions for spatial autocorrelation and covariance parameters can be simplified by assuming common prior distributions for all mixture components so that information can be pooled over all components. We implemented the latter approach. A standard choice of hyperprior distribution for the precision matrix, Σ−1, is a Wishart distribution with p degrees of freedom, Σ−1 ~ W(Ω, p). A common practice is to set Ω to a unit matrix (Gelman et al. 2004). The hyperprior distributions for the scale and shape parameters of baseline measure are assumed as, respectively, exp(0.1) and Gamma(0.001, 0.001) (Ohlssen et al. 2007). The spread parameter α is assumed to have uniform distribution with the range 0.3–10 (Ohlssen et al. 2007).
We are aware that when covariates are introduced in the finite mixture model and the parameters are space and component dependent, the issues of label switching and potential overfitting become more delicate. Hennig (2000) showed that the regression parameters are identifiable if the number K of clusters is smaller than the number of distinct (p − 1)-dimensional hyperplanes generated by the covariates, i.e., identifiability can fail if the covariates show too little variability. To ensure identifiability, we propose imposing the restriction suggested by Assunção (2003) for the Bayesian spatially varying parameter model.
4.2 Prior distribution specification for the SREST model
The prior distribution for the structured spatial effect is a CAR model with a Gamma(0.5, 0.0005) hyperprior distribution for the precision parameter (Bernardinelli et al. 1995). The structured temporal effect is assigned a first-order autoregressive model with hyperprior distributions Beta(1, 1) for the autocorrelation parameter (Lawson et al. 2010) and Uniform(0, 20) for the temporal standard deviation (Lambert et al. 2005). The other random effects εi, ξt and λit are assumed to have zero-mean normal prior distributions, $N(0, \sigma_\varepsilon^2)$, $N(0, \sigma_\xi^2)$ and $N(0, \sigma_\lambda^2)$, respectively. A flat prior distribution can be assumed for the overall mean parameter ρ as N(0, kρ), where kρ is a large quantity (say, 10,000). All the standard deviation parameters are assumed to have the same prior distribution, Uniform(0, 20) (Lambert et al. 2005).
5 Simulation design: Ohio County geographies
We conduct a simulation study to assess the performance of the SBPST model in comparison to the SREST model. The aim is to check how these two models perform in estimating clusters of different shapes and sizes. We use the Ohio lung cancer mortality dataset for our simulation since it is freely available (www.stat.uni-muenchen.de/service/datenarchiv/ohio/ohio_e.html) and has been analyzed previously in many studies (e.g., Xia and Carlin 1998; Knorr-Held and Besag 1998; Waller et al. 1997).
The state of Ohio in the USA has 88 counties. We use the Ohio geographies and the county and year specific expected lung cancer mortality for the period 1968–1988 as a base for our simulation. The simulation is as described in Hossain and Lawson (2010). In short, the steps of the simulation are as follows.
- The underlying risks θit are generated to be close to one with some small variation, since in a normal situation the relative risk at each county and year is expected to be close to one.
- Clusters of different shapes and magnitudes (δit) are embedded in the desired counties and years to give $\theta_{it}^{true} = \theta_{it} + \delta_{it}$, where $\theta_{it}^{true}$ is assumed to be the true value of θit. The values of the δ’s are chosen in the range 0.0–3.39 such that the θtrue vary in the range 0.5–4.0.
- The county and year specific lung cancer mortality counts are generated from a Poisson model, $O_{it}^{s} \sim \mathrm{Poisson}(E_{it}\theta_{it}^{true})$, where s = 1, …, S indexes the replications.
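The three simulation steps can be sketched on a toy grid as follows. This is only an illustration under stated assumptions: the log-normal form for the baseline risks, the additive embedding of δit, the expected counts, the cluster location and the helper `poisson_draw` are hypothetical choices for the example and are not the settings of Hossain and Lawson (2010).

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(2010)
n, T, S = 5, 4, 100  # toy sizes; the paper uses 88 counties and S = 100

# Hypothetical expected counts E_it (the paper takes these from the Ohio data).
E = [[10.0 + 2.0 * i + t for t in range(T)] for i in range(n)]

# Step 1: baseline risks near one (the log-normal form is an assumption).
theta = [[math.exp(rng.gauss(0.0, 0.05)) for _ in range(T)] for _ in range(n)]

# Step 2: embed a hypothetical cluster (counties 1 and 2, year index 2).
delta = [[0.0] * T for _ in range(n)]
delta[1][2] = delta[2][2] = 2.0
theta_true = [[theta[i][t] + delta[i][t] for t in range(T)] for i in range(n)]

# Step 3: S replicated Poisson datasets, indexed as O[s][i][t].
O = [[[poisson_draw(E[i][t] * theta_true[i][t], rng) for t in range(T)]
      for i in range(n)] for s in range(S)]
```

Each replicate `O[s]` can then be passed to both models, so that detection results are averaged over the same set of simulated datasets.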
In our simulation experiment, the number of replications was set at S = 100, chosen as a balance between computation time and accuracy. We assume that 100 replications are sufficient to capture the variation in the replicated datasets. In Fig. 1, the maps of the true relative risks $\theta_{it}^{true}$ are given. The darker colors show the higher risk regions. Some of the clusters are embedded with fairly high values, in order to check model performance in this scenario.
Fig. 1.
Thematic maps of true relative risks of mortality from lung cancer incidences that have been assumed for simulating the datasets for Ohio geographies for the years 1968–1988
5.1 SBPST model
In the assessment of SBPST model for the simulated datasets, we have considered the function, hitj = κij + γtj, where the κ’s are the structured spatial random effects, γ’s are the structured temporal random effects.
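The structured spatial random effects κj = (κ1j, …, κnj) have a CAR prior distribution, specified by $\kappa_{ij} \mid \{\kappa_{-ij}\}, \tau_{\kappa j} \sim N(\bar{\kappa}_{ij}, 1/(n_i \tau_{\kappa j}))$, j = 1, …, K, where $\bar{\kappa}_{ij} = n_i^{-1} \sum_{l \in \Delta_i} \kappa_{lj}$, ni is the number of regions in {Δi}, {Δi} is the set of first-order neighbors of the ith region and {κ−ij} is the set of all κ’s excluding κij. A vague prior distribution is assigned to the spatial precision hyperparameter τκj, specified as Gamma(0.5, 0.0005) (Bernardinelli et al. 1995).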
The structured temporal random effects γj = (γ1j, …, γTj) have a first-order autoregressive prior distribution, specified through the distribution of γ1j given χj and σγj, and the conditional distribution of γtj given γt−1,j, χj and σγj for t = 2, …, T, and j = 1, …, K. The first-order autocorrelation coefficient χj lies in (0, 1), and the prior for γtj reduces to temporal independence when χj = 0. When estimating γ1j (j = 1, …, K), we imposed a less-than-one restriction to ensure identifiability. We assign a noninformative prior distribution for χj, specified as Beta(1, 1) (Lawson et al. 2010), which is the standard uniform distribution. A noninformative prior distribution is also assigned for the temporal standard deviation σγj (j = 1, …, K), specified as Uniform(0, 20) (Lambert et al. 2005).
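To give a feel for the temporal prior, the sketch below draws one path from a first-order autoregressive prior with the stated hyperpriors. The stationary form of the initial variance, σ²/(1 − χ²), and the normal innovations are assumptions made for the illustration; the function name `ar1_path` is hypothetical.

```python
import math
import random

def ar1_path(T, chi, sigma, rng):
    """Draw gamma_1..gamma_T from an AR(1) prior with coefficient chi."""
    # Assumed initial state: stationary variance sigma^2 / (1 - chi^2).
    gamma = [rng.gauss(0.0, sigma / math.sqrt(1.0 - chi ** 2))]
    for t in range(1, T):
        # gamma_t is centred on chi * gamma_{t-1} with standard deviation sigma.
        gamma.append(rng.gauss(chi * gamma[-1], sigma))
    return gamma

rng = random.Random(3)
chi = rng.betavariate(1.0, 1.0)   # Beta(1, 1), i.e. Uniform(0, 1)
sigma = rng.uniform(0.0, 20.0)    # Uniform(0, 20) standard deviation
path = ar1_path(21, chi, sigma, rng)  # 21 years, 1968-1988
```

With χ = 0 the loop reduces to independent draws around zero, which matches the remark that the prior collapses to temporal independence in that case.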
5.2 Results
For the simulated data, where the true excess risk regions are known, it is possible to check the relative performance of the SBPST and SREST models in the recovery of clusters. We use the spatial-temporal diagnostic criteria previously developed in Hossain and Lawson (2010) for this comparison. The SBPST and SREST models were implemented in WinBUGS. The WinBUGS code developed for the SBPST model is available from the first author on request. The SBPST model was implemented for a finite number of components, here set to 7. The number was chosen to balance model complexity and computation time. To estimate all the model parameters, we ran two parallel chains with widely different initial values for 15,000 iterations. After discarding the first 10,000 iterations as burn-in samples, 5,000 from each chain were used for posterior inference. The reported results are the posterior means over these 10,000 MCMC samples. We checked the Gelman–Rubin statistic, kernel density plot and trace plot of each parameter to ensure convergence. In addition, we performed a limited sensitivity analysis for the SBPST model with respect to the number of components. We compared the posterior exceedance probability obtained with 7 components with that obtained with 12 components, only for the first simulated dataset, and observed that the estimates were consistent. Hence, the results reported for the SBPST model hereafter are for 7 components.

The thematic maps for R (R is defined as the average number of realizations where the posterior exceedance probability is >0.95) for the selected years 1969, 1973, 1980, 1981, and 1987 are given in Fig. 2, where the top row is the result for the SBPST model and the bottom row is for the SREST model. Because of space limitations we report results only for those years where the clusters vary in shape and size. The posterior exceedance probability (Richardson et al. 2004) was calculated for the threshold value ‘1’, i.e., $\widehat{\Pr}(\theta_{it} > 1 \mid \text{data}) = \frac{1}{G} \sum_{g=1}^{G} I(\theta_{it}^{(g)} > 1)$, where G is the posterior sample size and $\theta_{it}^{(g)}$ is the estimated risk for the gth sample value from the converged posterior sampling output. The reported results are averages over the 100 replicates. This threshold level seems to signal the clusters of high intensity well. It is also visible in these maps that both models seem to correctly signal all the spatio-temporal clusters. For increasing threshold values (i.e., 2–3), the R maps appear to become ‘cleaned out’, in that lower risk areas no longer signal (the maps are not included here because of space limitations).
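The exceedance probability and the replicate-level summary R can be computed from posterior draws along the following lines. This is a minimal sketch: the function names are hypothetical, R is interpreted here as the proportion of replicates whose exceedance probability is above 0.95, and the sample values are made up for the example.

```python
def exceedance_probability(samples, c=1.0):
    """Share of posterior draws theta^(g) that exceed the threshold c."""
    return sum(1 for th in samples if th > c) / len(samples)

def signal_rate(replicate_samples, c=1.0, cutoff=0.95):
    """Share of replicates whose exceedance probability is above the cutoff."""
    flags = [exceedance_probability(s, c) > cutoff for s in replicate_samples]
    return sum(flags) / len(flags)

# Made-up posterior draws for one county-year cell.
draws = [0.9, 1.2, 1.5, 2.0]
prob = exceedance_probability(draws, c=1.0)

# Two made-up replicates for the same cell.
R = signal_rate([[1.1, 1.2, 1.3], [0.5, 1.5, 2.0]], c=1.0)
```

Raising `c` to 2 or 3 in the same functions gives the medium- and high-risk versions of the diagnostic.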
Fig. 2.
Thematic maps of R for the selected years (1969, 1973, 1980, 1981, and 1987) of Ohio simulated data for the threshold value ‘one’. Top row stick-breaking process space-time (SBPST) model, and bottom row standard random effects space-time (SREST) model
Table 1 gives the threshold value specific cluster misspecification rates and the mean square error (MSE) for each model. The threshold values are set at c = 1, 2, and 3. These threshold values define three risk levels: low risk when c = 1, medium risk when c = 2 and high risk when c = 3. In detecting clusters of medium risk (c = 2) and high risk (c = 3), the SBPST model performs better than the SREST model on both criteria. In detecting clusters of low risk (c = 1), the SREST model performs slightly better than the SBPST model in terms of MSE (0.014 versus 0.015), although its misspecification rate is slightly higher (0.016 versus 0.013).
Table 1.
Cluster misspecification rates and mean square error for the stick-breaking process space-time (SBPST) model and the standard random effects space-time (SREST) model for the threshold values, c = 1, 2, and 3
| Threshold value (c) | Cluster misspecification rate (SBPST) | Cluster misspecification rate (SREST) | Mean square error (SBPST) | Mean square error (SREST) |
|---|---|---|---|---|
| 1.00 | 0.013 | 0.016 | 0.015 | 0.014 |
| 2.00 | 0.006 | 0.011 | 0.036 | 0.062 |
| 3.00 | 0.100 | 0.175 | 0.021 | 0.087 |
6 South Carolina low birth weight data and results
Low birth weight (LBW) is an important indicator of women’s reproductive health and of the general health status of a population (Goldenberg and Culhane 2007). It is defined as a baby weighing less than 2,500 g at live birth. The causes of LBW may include individual level behavioral and psychosocial factors, neighborhood characteristics, environmental exposures, access to prenatal care, and biological factors. Recent studies on LBW have focused on the effects of neighborhood factors such as environmental health factors, residential segregation, and income inequalities using ecological or multilevel study designs (Grady 2006; Janevic et al. 2010). In our current study, we seek groupings of areas where rates of LBW incidence are relatively higher by modeling the spatio-temporal patterns of LBW with SBPST methods, while considering space- and time-varying covariates and the potential interactions between the spatio-temporal factors.
The state of South Carolina has 46 counties, and the study period is 1997–2007. We consider three models for the county and year specific South Carolina LBW incidences. The models differ in their specification of the function hitj. Model 1 considers no covariates and is specified only by the structured spatial and structured temporal random effects, i.e., hitj = κij + γtj. Model 2 considers the county and year specific covariates population density (PD), proportion of African-Americans (PAA), median household income (MHI), proportion of poverty (PP), and unemployment rate (UR). These covariates are chosen following the previous studies by Fang et al. (1999) and Pearl et al. (2001). Thus, we use the form hitj = β0j + β1jPDit + β2jPAAit + β3jMHIit + β4jPPit + β5jURit + κij + γtj. This model assumes that the regression parameters are component specific and has the flexibility to capture any effect of discontinuity or clustering pattern in the covariates on LBW. Model 3 takes a more general approach by defining the function hitj = β0ij + β1ijPDit + β2ijPAAit + β3ijMHIit + β4ijPPit + β5ijURit + β6ij t. This model illustrates a simple way to model a space-time interaction effect, and specifying a multivariate prior distribution for the βij’s ensures a nonseparable covariance for this effect. Using a multivariate prior distribution for the joint modeling of covariate effects is practical in the sense that counties with a higher UR are likely to have a higher PP and a lower MHI.
As an aid to model selection we use the deviance information criterion (DIC) (Spiegelhalter et al. 2003) and the mean square prediction error (MSPE) (Gelfand and Ghosh 1998). Both criteria are based on the predictive ability of the model, while the DIC also captures model complexity through the effective number of parameters. The DIC for model M is defined as $\mathrm{DIC}_M = \bar{D}(\Theta_M) + p_M$, where ΘM is the set of all parameters under model M, $\bar{D}(\Theta_M)$ is the posterior mean deviance and pM is the effective number of parameters, which is a measure of model complexity. The pM is commonly measured by $p_M = \bar{D}(\Theta_M) - D(\hat{\Theta}_M)$, where D(Θ̂M) is the deviance at the posterior means. In our mixture model, the latent variables that indicate the group membership are discrete, and this membership indicator changes over the MCMC iterations. Thus, it is not clear which group membership information should be used to calculate D(Θ̂M). Instead, we follow the proposal of Gelman et al. (2004) and measure pM as half the posterior variance of the deviance. The MSPE for model M is calculated by $\mathrm{MSPE}_M = \frac{1}{G\,n\,T} \sum_{g=1}^{G} \sum_{i=1}^{n} \sum_{t=1}^{T} \big(O_{it} - O_{it}^{(pred, g)}\big)^2$, where $O_{it}^{(pred, g)}$ is the predicted O for model M at the gth iteration and G is the MCMC sample size. The predicted values are generated from the posterior predictive distribution, p(O(pred)|O) = ∫ p(O(pred)|Ψ) p(Ψ|O)dΨ, after ensuring the convergence of all model parameters, Ψ. MSPE does not separate out any measure of model complexity, although Gelfand and Ghosh (1998) illustrated that MSPE increases with increasing model complexity.
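The two model comparison criteria can be computed from MCMC output roughly as follows. This is a sketch under stated assumptions: the Poisson deviance is taken as minus twice the log-likelihood, pM is half the sample variance of the deviance draws, and the MSPE is averaged over both iterations and cells; the function names and the small numerical inputs are hypothetical.

```python
import math

def poisson_deviance(O, mu):
    """-2 * log-likelihood of counts O under Poisson means mu."""
    return -2.0 * sum(o * math.log(m) - m - math.lgamma(o + 1)
                      for o, m in zip(O, mu))

def dic_half_variance(deviances):
    """DIC with p_M taken as half the posterior variance of the deviance."""
    G = len(deviances)
    mean_dev = sum(deviances) / G
    var_dev = sum((d - mean_dev) ** 2 for d in deviances) / (G - 1)
    p_M = 0.5 * var_dev
    return mean_dev + p_M, p_M

def mspe(O, predicted):
    """predicted[g][k] is the posterior predictive draw for cell k at iteration g."""
    G = len(predicted)
    total = sum((o - pr) ** 2 for draw in predicted for o, pr in zip(O, draw))
    return total / (G * len(O))

# Made-up deviance draws and predictive draws, for illustration only.
dic, p_M = dic_half_variance([10.0, 12.0, 14.0])
err = mspe([1, 2], [[1, 2], [2, 4]])
dev = poisson_deviance([0], [1.0])
```

In practice `deviances` would hold one value of `poisson_deviance` per retained MCMC iteration, evaluated at that iteration's fitted means.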
The data sources for this study are as follows: LBW data were acquired from the South Carolina Department of Health and Environmental Control (http://www.scdhec.gov/), and population, income, poverty and unemployment data are from the US Census Bureau (http://www.census.gov). The county and year specific expected LBW incidence is calculated as $E_{it} = n_{it} D$, where $n_{it}$ is the total number of births in county i and year t, and D is the South Carolina overall LBW rate, calculated as the ratio of total LBW to total births over the entire spatial-temporal domain (Banerjee et al. 2004). The county specific standardized incidence ratios (SIRs) for the years 1997, 2001, 2004, and 2007 are given in Fig. 4 (top row). The observed SIRs show a small cluster of excess risk in the east in the year 2001. This cluster appears again in the year 2004. Although no spatial cluster persists over the whole study period, there are some sporadic regions of excess risk.
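The internal standardization described above can be sketched as follows. The birth and LBW counts are made-up toy numbers (two counties, two years), and the function names are hypothetical; the sketch assumes the SIR is the ratio of the observed to the expected count in each county-year cell.

```python
def expected_counts(births, lbw):
    """Return the overall LBW rate D and expected counts E[i][t] = n[i][t] * D.

    births[i][t]: total births in county i, year t.
    lbw[i][t]:    observed LBW count in county i, year t.
    """
    total_births = sum(sum(row) for row in births)
    total_lbw = sum(sum(row) for row in lbw)
    rate = total_lbw / total_births  # D: pooled over all counties and years
    expected = [[n * rate for n in row] for row in births]
    return rate, expected


def standardized_incidence_ratios(lbw, expected):
    """SIR[i][t] = observed / expected for each county-year cell."""
    return [[o / e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(lbw, expected)]


# Toy data, not the South Carolina counts
births = [[1000, 1200], [500, 300]]
lbw = [[90, 120], [60, 30]]
rate, expected = expected_counts(births, lbw)
sir = standardized_incidence_ratios(lbw, expected)
print(rate, expected, sir)
```

A ratio above 1 marks a county-year with more LBW births than the statewide rate would predict.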
Fig. 4.

Thematic maps of standardized incidence ratios (top row) and the group membership labels of South Carolina low birth weight (LBW) incidences from SBPST Model 1 (bottom row), for the selected years (1997, 2001, 2004, and 2007)
6.1 Results
The three models specified above were fitted in WinBUGS with a fixed number of components. The number of components was set to 10 to balance computational time against model complexity. We chose a slightly higher number of components for the real data than for the simulated data, since the number of counties in SC is much smaller than in the state of Ohio. The posterior inference is based on 10,000 samples from two parallel chains, after a burn-in period of 10,000 samples for each chain. The initial values for each chain were set to widely different values. We checked the convergence of each estimate using the Gelman–Rubin statistic, the kernel density plot and the trace plot. The DIC and MSPE values for these models are reported in Table 2. The DIC and MSPE values show similar trends across the three models. The smallest values are obtained for Model 1, which includes only the structured spatial and structured temporal components. The second smallest values are obtained for Model 3. This model uses an MCAR model as the prior distribution for the spatially varying regression parameters, and also includes a space-time nonseparable effect. The DIC and MSPE values for Model 3 are lower than those for Model 2, which indicates the presence of spatial correlation within the covariates, although this evidence is not substantial since Model 1 has the smallest DIC and MSPE values. In the following we report the results for Model 1.
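For readers unfamiliar with the convergence check, the classical variance-based form of the Gelman–Rubin statistic for a single scalar parameter can be sketched as below. This is an assumption-laden illustration: WinBUGS reports its own variant of the diagnostic, which may differ in detail, and the chains shown are toy values rather than output from the fitted models.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: list of m equal-length lists of post-burn-in draws.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    # W: average within-chain variance
    w = mean([variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


# Two toy chains started from different values
toy_chains = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
r_hat = gelman_rubin(toy_chains)
print(r_hat)
```

Values close to 1 suggest that the chains are sampling from the same distribution; they are read alongside the trace and density plots rather than in isolation.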
Table 2.
The deviance information criteria (DIC) and the mean square predictive error (MSPE) values for the three competing SBPST models for the South Carolina low birth weight (LBW) incidences for the years 1997 to 2007
| Model | Mean of deviance | Variance of deviance | DIC | MSPE |
|---|---|---|---|---|
| Model 1 | 3636.0 | 27.71 | 3649.86 | 224.2 |
| Model 2 | 3676.0 | 28.66 | 3690.33 | 256.0 |
| Model 3 | 3664.0 | 20.76 | 3674.38 | 244.6 |
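As a quick arithmetic check, the DIC column of Table 2 can be recovered from the other two columns, assuming DIC is the mean deviance plus half the variance of the deviance (the Gelman et al. 2004 measure of $p_M$ described earlier). The dictionary below simply copies the table values; the variable names are illustrative.

```python
# (mean of deviance, variance of deviance, reported DIC), copied from Table 2
table2 = {
    "Model 1": (3636.0, 27.71, 3649.86),
    "Model 2": (3676.0, 28.66, 3690.33),
    "Model 3": (3664.0, 20.76, 3674.38),
}

# DIC = mean deviance + p_M, with p_M = variance of deviance / 2
recomputed = {
    name: mean_dev + var_dev / 2.0
    for name, (mean_dev, var_dev, _reported) in table2.items()
}

# The preferred model is the one with the smallest DIC
best_model = min(recomputed, key=recomputed.get)

for name, dic in recomputed.items():
    print(f"{name}: recomputed DIC = {dic:.3f}, reported = {table2[name][2]}")
print("Smallest DIC:", best_model)
```

The recomputed values agree with the reported DIC to within rounding, and the ranking of the models is unchanged.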
In cluster analysis, we intend to find groups in which observations are more similar to each other than to observations in other groups. One way of assigning a group label to each observation is to calculate the frequency distribution of the latent variable $Z_{it}$ over all posterior samples, and then to assign the label for which the frequency is maximized. The frequency value for each l can be calculated by $f_{it}(l) = \frac{1}{G}\sum_{g=1}^{G} I\bigl(Z_{it}^{(g)} = l\bigr)$, where $Z_{it}^{(g)}$ is the gth posterior sample of the latent variable $Z_{it}$, which indicates the group membership, l = 1, …, K, and G is the posterior sample size. The group label for the (i, t)th observation will be l′ for which $f_{it}(l') = \max_{l} f_{it}(l)$. In Fig. 3, we report the histogram of group membership labels over the 11 years (1997–2007) for each county's South Carolina LBW incidences. The membership labels are in the range 1–10, since the number of mixing components K was set to 10 in the computation. The number of groups for each county varied over the 11 year study period, ranging from 3 to 6. The maximum number of groups occurred for the counties Charleston and Greenwood. The thematic maps of this membership label for the years 1997, 2001, 2004 and 2007 are presented in Fig. 4 (bottom row). The darker colors do not bear the same meaning in the top and bottom rows of Fig. 4. In the top row, darker colors indicate high risk counties; in the bottom row, the colors only distinguish the group memberships.
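The label assignment described above amounts to taking the posterior mode of each membership indicator. A minimal sketch, with hypothetical function names and toy draws, is given below; breaking ties in favor of the smallest label is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter


def modal_label(z_draws):
    """Return the most frequent label and the relative frequency of every label.

    z_draws: posterior draws of the membership indicator for one
    county-year cell, each an integer in 1..K.
    """
    g = len(z_draws)
    freqs = {label: count / g for label, count in Counter(z_draws).items()}
    # Highest frequency wins; ties go to the smallest label (assumption)
    best = max(freqs, key=lambda label: (freqs[label], -label))
    return best, freqs


# Toy draws for one county in three years (not study output)
draws_by_year = {
    1997: [3, 3, 7, 3, 1, 7],
    1998: [7, 7, 7, 3, 3, 7],
    1999: [3, 1, 3, 3, 3, 1],
}
labels_by_year = {year: modal_label(z)[0] for year, z in draws_by_year.items()}

# Number of distinct groups the county visits over the period
n_groups = len(set(labels_by_year.values()))
print(labels_by_year, n_groups)
```

Counting the distinct modal labels per county over the years gives the kind of summary shown in the county specific histograms.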
Fig. 3.
County specific histogram plot of group membership labels over the 11 year study period (1997–2007) of South Carolina low birth weight (LBW) incidences from SBPST Model 1
7 Conclusions
In this paper, we proposed a space-time stick-breaking process for modeling incidence count data that are observed at spatially varying locations and at successive time points, and illustrated how this model can be used to find clusters of high risk areas. We used the Poisson distribution as the natural choice for modeling the count data at the first level of the hierarchy. The dependencies for spatial and temporal effects are introduced by using space-time covariate dependent kernel stick-breaking processes for the weight component. The space-time dependent kernel can include component specific spatially varying regression coefficients and non-separable space-time effects. The spatially varying regression coefficients are modeled by an MCAR type model.
The proposed model was extensively validated using simulated data and compared with the SREST model by checking each model's ability to detect clusters of various shapes and sizes. In generating artificial geo-referenced lung cancer incidences, we used Ohio geographies and 21 years of county specific expected lung cancer incidences for the years 1968–1988. Clusters of various shapes and sizes were then embedded in the simulated data. The findings are mixed; neither of the two models can be singled out as the best performing model for all levels of cluster detection. We observed that the SBPST model performs better than the SREST model in detecting medium- and high-risk clusters, while the SREST model performs better in detecting low-risk clusters. After the validation, we applied the SBPST model to a real dataset, the county specific low birth weight incidences for the state of South Carolina for the years 1997–2007. Using the latent variable labels in the posterior samples, we reported the county specific group membership labels over the 11 year study period as histograms (Fig. 3). We observed some similarities between these group membership labels and the observed SIRs (Fig. 4).
In our current implementation of the SBPST model, we fixed the number of components to a finite number. In future work we will treat this number as unknown and estimate it using the reversible jump MCMC (RJMCMC) method.
Acknowledgments
The support of NIH grants UL1 RR024148 (CTSA) and R21 HL088654-01A2 is gratefully acknowledged. Our sincere thanks go to the editor for many constructive comments which contributed to the further improvement of this manuscript.
Biographies
Md. Monir Hossain is an Assistant Professor of Biostatistics in the Center for Clinical and Translational Sciences, University of Texas Health Science Center at Houston, USA. His research interests include developing models for spatial and spatial-temporal data, and developing methods for cluster diagnostics.
Andrew B. Lawson is a Professor of Biostatistics in the Division of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina at Charleston, USA. He has a research specialization in statistical methods in spatial epidemiology. He has published extensively, including 6 books in the area of spatial epidemiology.
Bo Cai is an Assistant Professor of Biostatistics in the Department of Epidemiology and Biostatistics, University of South Carolina at Columbia, USA. He has research interests in semiparametric modeling, latent variable modeling and spatial modeling. He has published extensively in statistical methodological journals.
Jungsoon Choi is a Post-Doctoral Scholar in Biostatistics in the Division of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina at Charleston, USA. Her postdoctoral training is in the area of spatial statistics and latent structure modeling.
Jihong Liu is an Associate Professor of Epidemiology in the Department of Epidemiology and Biostatistics, University of South Carolina at Columbia, USA. Her research interests include perinatal epidemiology and reproductive health.
Russell S. Kirby is a Professor of Epidemiology in the Department of Community and Family Health, University of South Florida at Tampa, USA. His research specializations include perinatal epidemiology, birth defects and developmental disabilities epidemiology and prevention, GIS and spatial analysis.
Contributor Information
Md. Monir Hossain, Email: md.hossain@cchmc.org, Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio.
Andrew B. Lawson, Division of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, USA
Bo Cai, Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA.
Jungsoon Choi, Division of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, USA.
Jihong Liu, Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA.
Russell S. Kirby, Department of Community and Family Health, University of South Florida, Tampa, FL, USA
References
- Assunção RM. Space varying coefficient models for small area data. Environmetrics. 2003;14:453–473.
- Banerjee S, Carlin B, Gelfand AE. Hierarchical modeling and analysis for spatial data. Chapman and Hall; New York: 2004.
- Bernardinelli L, Clayton D, Montomoli C. Bayesian estimates of disease maps: how important are priors? Stat Med. 1995;14:2411–2431. doi: 10.1002/sim.4780142111.
- Best N, Richardson S, Thomas A. A comparison of Bayesian spatial models for disease mapping. Stat Methods Med Res. 2005;14:35–59. doi: 10.1191/0962280205sm388oa.
- Duan JA, Guindani M, Gelfand AE. Generalized spatial Dirichlet process models. Biometrika. 2007;94:809–825.
- Dunson DB, Park JH. Kernel stick-breaking processes. Biometrika. 2008;95:307–323. doi: 10.1093/biomet/asn012.
- Fang J, Madhavan S, Alderman MH. Low birthweight: race and maternal nativity-impact of community income. Pediatrics. 1999;103:E5. doi: 10.1542/peds.103.1.e5.
- Ferguson TS. Bayesian density estimation by mixtures of normal distributions. In: Rizvi MH, Rustagi JS, Siegmund D, editors. Recent advances in statistics. Academic Press; New York: 1983. pp. 287–302.
- Fernandez C, Green PJ. Modelling spatially correlated data via mixtures: a Bayesian approach. J R Stat Soc Ser B. 2002;64:805–826.
- Gelfand AE, Ghosh SK. Model choice: a minimum posterior predictive loss. Biometrika. 1998;85:1–11.
- Gelfand AE, Kottas A, MacEachern SN. Bayesian nonparametric spatial modeling with Dirichlet process mixing. J Am Stat Assoc. 2005;100:1021–1035.
- Gelfand AE, Vounatsou P. Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics. 2003;4:11–25. doi: 10.1093/biostatistics/4.1.11.
- Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. Chapman & Hall; Boca Raton: 2004.
- Goldenberg RL, Culhane JF. Low birth weight in the United States. Am J Clin Nutr. 2007;85:584S–590S. doi: 10.1093/ajcn/85.2.584S.
- Grady S. Racial disparities in low birthweight and the contribution of residential segregation: a multilevel analysis. Soc Sci Med. 2006;63:3013–3029. doi: 10.1016/j.socscimed.2006.08.017.
- Green PJ, Richardson S. Hidden Markov models and disease mapping. J Am Stat Assoc. 2002;97:1055–1070.
- Green PJ, Richardson S. Modelling heterogeneity with and without the Dirichlet process. Scand J Stat. 2001;28:355–375.
- Griffin JE, Steel MF. Order-based dependent Dirichlet processes. J Am Stat Assoc. 2006;101:179–194.
- Hennig C. Identifiability of models for clusterwise linear regression. J Classif. 2000;17:273–296.
- Hossain MM, Lawson AB. Space-time Bayesian small area disease risk models: development and evaluation with a focus on cluster detection. Environ Ecol Stat. 2010;17:73–95. doi: 10.1007/s10651-008-0102-z.
- Ishwaran H, James LF. Gibbs sampling methods for stick-breaking priors. J Am Stat Assoc. 2001;96:161–173.
- Ishwaran H, Zarepour M. Exact and approximate sum representations for the Dirichlet process. Can J Stat. 2002;30:269–283.
- Janevic T, Stein CR, Savitz DA, Kaufman JS, Mason SM, Herring AH. Neighborhood deprivation and adverse birth outcomes among diverse ethnic groups. Ann Epidemiol. 2010;20:445–451. doi: 10.1016/j.annepidem.2010.02.010.
- Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Stat Med. 2000;19:2555–2567. doi: 10.1002/1097-0258(20000915/30)19:17/18<2555::aid-sim587>3.0.co;2-#.
- Knorr-Held L, Besag J. Modelling risk from a disease in time and space. Stat Med. 1998;17:2045–2060. doi: 10.1002/(sici)1097-0258(19980930)17:18<2045::aid-sim943>3.0.co;2-p.
- Kottas A, Duan JA, Gelfand AE. Modeling disease incidence data with spatial and spatial-temporal Dirichlet process mixtures. Biom J. 2007;49:1–14. doi: 10.1002/bimj.200610375.
- Lambert PC, Sutton AJ, Burton PR, Abrams KR, Jones DR. How vague is vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Stat Med. 2005;24:2401–2428. doi: 10.1002/sim.2112.
- Lawson AB, Song HR, Cai B, Hossain MM, Huang K. Space-time latent component modeling of geo-referenced health data. Stat Med. 2010;29:2012–2027. doi: 10.1002/sim.3917.
- Lo AY. On a class of Bayesian nonparametric estimates: I. Density estimates. Ann Stat. 1984;12:351–357.
- Ohlssen DI, Sharples LD, Spiegelhalter DJ. Flexible random-effects models using Bayesian semi-parametric models: application to institutional comparisons. Stat Med. 2007;26:2088–2112. doi: 10.1002/sim.2666.
- Pearl M, Braveman P, Abrams B. The relationship of neighborhood socioeconomic characteristics to birthweight among 5 ethnic groups in California. Am J Public Health. 2001;91:1808–1814. doi: 10.2105/ajph.91.11.1808.
- Reich BJ, Fuentes M. A multivariate semiparametric Bayesian spatial modeling framework for hurricane surface wind fields. Ann Appl Stat. 2007;1:249–264.
- Richardson S, Thomas A, Best N, Elliott P. Interpreting posterior relative risk estimates in disease-mapping studies. Environ Health Perspect. 2004;112:1016–1025. doi: 10.1289/ehp.6740.
- Sethuraman J. A constructive definition of Dirichlet priors. Stat Sin. 1994;4:639–650.
- Spiegelhalter D, Thomas A, Best N, Lunn D. WinBUGS user manual (version 1.4). MRC Biostatistics Unit, Institute of Public Health; Cambridge: 2003.
- Waller LA, Carlin BP, Xia H, Gelfand AE. Hierarchical spatio-temporal mapping of disease rates. J Am Stat Assoc. 1997;92:607–617.
- Xia H, Carlin BP. Spatio-temporal models with errors in covariates: mapping Ohio lung cancer mortality. Stat Med. 1998;17:2025–2043. doi: 10.1002/(sici)1097-0258(19980930)17:18<2025::aid-sim865>3.0.co;2-m.



