Author manuscript; available in PMC: 2013 Jun 3.
Published in final edited form as: Bayesian Anal. 2012 Dec 1;7(4):813–840. doi: 10.1214/12-BA727

Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process

Mingtao Ding, Lihan He, David Dunson, Lawrence Carin
PMCID: PMC3670617  NIHMSID: NIHMS444983  PMID: 23741284

Abstract

A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.

Keywords: Bayesian hierarchical model, spatial segmentation, temporal dynamics, Gaussian process, logistic stick breaking process, inhomogeneous Poisson process

1 Introduction

1.1 Motivating application

Assume access to the locations of various types of crimes occurring in a given city, as a function of time. As a motivating example, in Figure 1(a) data are shown for 3090 crimes (of 17 crime types) in Cincinnati in Jan 2008. Our focus is on obtaining a spatial segmentation, such as that shown in Figure 1(b). In addition to the spatial dependence of point process data, we wish to simultaneously explore time dynamics. For example, in the crime data analysis, the crime intensity in summer may be different statistically from that in winter, and this intensity may change smoothly over seasons; consequently, the spatial segmentation of the city may also vary smoothly over time.

Figure 1.

Crime events and the segmentation of the city. In (a) 3090 crime events are shown as black dots; in (b) each color indexes a segment with associated crime intensities in 17 crime types (see Section 4 for details).

The analysis of time dynamics helps to discover the temporal pattern of the events and to predict the spatial segmentation at an unobserved time instance or in the future. We desire that the analysis provide a simple summary that is useful to police forces and city planners in targeting resources, as well as to researchers in studying crime trends. We would like to obtain this space-time segmentation quickly, utilizing data from different types of events, while allowing temporal interpolation and forecasting.

1.2 Summary of proposed model

Consider the data 𝒟 = {si, vit}i=1, …, M, t=1, …, T, where vit is a d-dimensional vector of the counts of d types of events, occurring in a (small) spatial region Δ(si), with the center of the region being si ∈ ℝ2. In the context of Figure 1, we are interested in d types of crime. The contiguous grid of spatial regions Δ(·) is fixed in advance, and the size of Δ(·) is very small relative to the size of the entire spatial domain, providing justification for an approximation in which we index regions by the center point and assume homogeneity within regions (using the model developed below, in the limit Δ → 0 we have a Poisson process). There are T time points at which data are observed, not necessarily uniformly spaced in time. Although not done here, one may envision aligning the grid Δ(·) with the geometry of the terrain (e.g., roads).

The proposed space-time model may be summarized as

$$v_{it} \sim \prod_{j=1}^{d} \mathrm{Poisson}(\lambda_{ijt}), \qquad \lambda_{it} \sim \sum_{k=1}^{K} w_k(s_i;\theta_{kt})\,\delta_{\lambda_{kt}^*} \tag{1}$$

where $w_k(s_i;\theta_{kt}) \ge 0$ and $\sum_{k=1}^{K} w_k(s_i;\theta_{kt}) = 1$ for all $s_i$, $\delta_{\lambda_{kt}^*}$ is a unit measure concentrated at $\lambda_{kt}^*$, and $\lambda_{ijt}$ is the $j$th component of $\lambda_{it}$. This corresponds to a mixture model, with space-time varying mixture weights $w_k(s_i;\theta_{kt})$ and time-varying atoms $\delta_{\lambda_{kt}^*}$.

Expression $w_k(s;\theta_{kt})$ represents a general parametric function capable of modeling the probability of cluster $k$ at spatial location $s$. In the details of the proposed model, one of the $\{w_k(s;\theta_{kt})\}_{k=1,\dots,K}$ is likely to be dominant (large probability) over a contiguous region, yielding a segmentation. Since the parameters $\theta_{kt}$ change in general with time $t$, a probabilistic space-time segmentation is manifested. Within the proposed model, the prior encourages that $\{\theta_{kt}\}$ and $\lambda^*_{kt}$ vary smoothly as a function of time, and hence the model imposes smooth space-time variation in the shape/form of the segments, and smooth temporal variation of the Poisson rates associated with a given segment.

Two methods are considered for imposing temporal smoothness, representing two perspectives on imposing the same temporal structure. For discrete-time data with uniform temporal spacing, it is natural to consider the first-order autoregressive model, i.e., AR(1), as $\theta_{kpt} \sim \mathcal{N}(\zeta\,\theta_{kp(t-1)}, \alpha_0^{-1})$, with $\theta_{kpt}$ the $p$th component of $\theta_{kt}$, $\zeta$ the AR(1) coefficient (with $|\zeta| < 1$), and $\alpha_0$ a precision to be inferred ($\zeta$ and $\alpha_0$ could also be extended to depend on $k$ and $p$). The log of each component of $\lambda^*_{kt}$ may be similarly modeled.

We also consider a Gaussian process (GP) model (Rasmussen and Williams 2006) in time for each component $\theta_{kpt}$, and for the log of each component of $\lambda^*_{kt}$, thus allowing non-uniform temporal sampling. To make the AR(1) and GP models consistent, we assume an exponential model for the GP covariance between times $t_i$ and $t_l$, $c_0 c_1^{|t_i - t_l|}$, with $c_1$ playing a role analogous to $\zeta$ in the AR(1) model, and the variance $c_0$ corresponding to $[(1-\zeta^2)\alpha_0]^{-1}$ from the AR(1) model. The AR(1) and chosen GP representations are therefore essentially different means of imposing the same temporal prior, with the former restricted to uniform temporal sampling.
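
This equivalence is easy to check numerically. The following sketch (ours, not from the paper) simulates many stationary AR(1) chains and compares their empirical covariance to the exponential GP kernel with $c_1 = \zeta$ and $c_0 = [(1-\zeta^2)\alpha_0]^{-1}$:

    import numpy as np

    # Sketch (ours): a stationary AR(1) chain theta_t = zeta*theta_{t-1} + eps_t,
    # eps_t ~ N(0, 1/alpha0), has covariance c0 * c1**|ti - tl| with
    # c1 = zeta and c0 = 1 / ((1 - zeta**2) * alpha0).
    rng = np.random.default_rng(0)
    zeta, alpha0, T, n_chains = 0.8, 4.0, 10, 200_000
    c0 = 1.0 / ((1.0 - zeta**2) * alpha0)

    theta = rng.normal(0.0, np.sqrt(c0), size=n_chains)  # start at stationarity
    chain = np.empty((T, n_chains))
    for t in range(T):
        theta = zeta * theta + rng.normal(0.0, np.sqrt(1.0 / alpha0), size=n_chains)
        chain[t] = theta

    ti = np.arange(T)
    gp_cov = c0 * zeta ** np.abs(ti[:, None] - ti[None, :])  # exponential GP kernel
    print(np.abs(np.cov(chain) - gp_cov).max())              # ~0 up to Monte Carlo error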

In addition to developing a new model for multivariate inhomogeneous space-time Poisson process data, a contribution of this paper concerns computations, in the form of a detailed comparison of Markov chain Monte Carlo (MCMC) and variational Bayesian (VB) inference for this class of models. The former is widely used, but it can be computationally prohibitive for the motivating large-scale problems considered here. Computations based on VB are attractive for large-scale modeling studies, but many simplifying assumptions must be made.

1.3 Related research

A natural model for exploiting spatial information, and for modeling point process data, is the inhomogeneous Poisson process (Diggle 2003; Møller and Waagepetersen 2004). Researchers have recently studied nonparametric Bayesian approaches for such applications. One of these approaches models the Poisson intensity function by a variation of a Gaussian process (GP) (Adams et al. 2009; Rathbun and Cressie 1994; Møller et al. 1998). The log-Gaussian Cox process (Møller et al. 1998), corresponding to an intensity function modeled as an exponentiated GP, has proven highly successful in point process (Hossain and Lawson 2009) and geostatistical modeling (Diggle et al. 2010; Pati et al. 2010). Mixture models provide another approach to representing the Poisson intensity function (Wolpert and Ickstadt 1998). Kottas and Sansó (2007) proposed a Dirichlet process (DP) mixture model of bivariate beta densities to model heterogeneity in intensity functions. Dirichlet process mixture models of multivariate normal densities can also be found in (Ji et al. 2009; Chakraborty and Gelfand 2010).

In Taddy (2008, 2010); Taddy and Kottas (2012) a dynamic model was proposed for Poisson point processes, based on a novel version of the dependent Dirichlet process. Models of this type have been applied to the data considered in Figure 1, although the problem of segmentation was not considered. In Achcar et al. (2011) a time inhomogeneous Poisson model was proposed, with change-points to estimate the number of times that a given environmental standard is violated in a time interval of interest.

Rather than modeling the Poisson intensity via a GP or a DP mixture model, the model in (1) constitutes a mixture model with space-time mixture weights, and the spatial locations $\{s_i\}$ of the grid are modeled as covariates. The details of how $w_k(s;\theta_{kt})$ is modeled encourage contiguous regions in space and time for which a single component (cluster) dominates, yielding a piecewise-constant Poisson intensity function. In Heikkinen and Arjas (1998) the authors similarly build a piecewise-constant prior model for spatial Poisson intensities, using Voronoi tessellations. We model $w_k(s;\theta_{kt})$ via an extension of the logistic stick-breaking process (LSBP) (Ren et al. 2011). The region of interest is partitioned into a set of contiguous small square cells, with related ideas considered in Hossain and Lawson (2009). Within the context of the aforementioned GP construction for the temporal dependence of $\theta_{kt}$, related ideas were presented in the context of factor analysis (Luttinen and Ilin 2009), where GPs were used to describe smoothness in both space and time. An AR model for temporal dynamics was considered in Taddy (2008, 2010).

2 Model Details

2.1 Basic construction

The proposed space-time model for data 𝒟 = {si, vit}i=1, …, M, t=1, …, T is summarized as

$$v_{it} \sim \prod_{j=1}^{d} \mathrm{Poisson}(\lambda_{ijt}), \qquad \lambda_{it} \sim \sum_{k=1}^{K} w_k(s_{it})\,\delta_{\lambda_{kt}^*} \tag{2}$$
$$w_k(s_{it}) = p_k(s_{it}) \prod_{h=1}^{k-1}\left[1 - p_h(s_{it})\right] \tag{3}$$
$$p_k(s_{it}) = \sigma(g_k(s_{it})) \ \ \text{for}\ \ k = 1, \dots, K-1, \qquad p_K(s_{it}) = 1 \tag{4}$$
$$g_k(s_{it}) = \sum_{j=1}^{J} \beta_{kjt}\,\mathcal{K}(s_{it}, \hat{s}_j; \psi_k) + \beta_{k0t} \tag{5}$$

where (2) is repeated here from (1), for convenience. Below we explain and motivate each term in this construction. Parameters $\theta_{kt}$ from the Introduction correspond here to $\{\beta_{kjt}\}_{j=0,\dots,J}$ and $\psi_k$. In what follows, the notation $s_{it}$ is meant to assign statistics to spatial location $s_i$ at time $t$; for example, $w_k(s_{it})$ is the $k$th mixture weight as observed at $s_i$ and time $t$. The spatial grid defining the regions $\{\Delta(s_i)\}_{i=1,\dots,M}$ does not change with time.

The expression in (3), with $p_k(s_{it}) \in [0,1]$ for all $s_{it}$, is suggestive of the stick-breaking representation of the Dirichlet process (Sethuraman 1994). The function $\sigma(x) = \exp(x)/(1+\exp(x))$ is associated with a logistic model, and $p_K(s_{it}) = 1$ such that $\sum_{k=1}^{K} w_k(s_{it}) = 1$ for all $s_{it}$. By the construction of $g_k(s_{it})$ in (5), the probabilities $p_k(s_{it})$ have space-time variation, with such variation transferred to the mixture weights $w_k(s_{it})$ via (3). Therefore, via the mixture weights $w_k(s_{it})$ in (2) we constitute a multivariate Poisson mixture model, with weights that vary as a function of $s_{it}$.

Function $\mathcal{K}(s, \hat{s}_j; \psi_k)$ denotes a kernel with parameter $\psi_k$. Here we employ the radial basis function $\mathcal{K}(s, \hat{s}_j; \psi_k) = \exp(-\|s - \hat{s}_j\|_2^2/\psi_k)$, with $J$ predefined kernel centers $\{\hat{s}_j\}_{j=1,\dots,J}$; for convenience these $J$ centers are here aligned with the centers of the spatial grid defined by $\Delta(\cdot)$ (recall discussion in the Introduction). The appropriate kernel parameters $\{\psi_k\}$ will be inferred. To ease computations, we assume a discrete set of parameters $\{\psi_1^*, \dots, \psi_L^*\}$ over which a uniform prior is placed; each kernel parameter $\psi_k$ is assumed drawn from this finite library of parameters.
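
To make the construction in (2)-(5) concrete, the following sketch (our own illustration; function and variable names are not from the paper) computes the mixture weights $w_k(s)$ at a set of locations from given kernel weights and widths:

    import numpy as np

    def lsbp_weights(s, centers, beta, beta0, psi):
        """Illustrative sketch of (3)-(5): map locations s (N x 2) to mixture
        weights (N x K) via RBF-kernel logistic regressions. beta: K x J kernel
        weights, beta0: length-K intercepts, psi: length-K kernel widths.
        The last probability is fixed to 1 so the weights sum to one."""
        K = beta.shape[0]
        # squared distances between the N locations and the J kernel centers
        d2 = ((s[:, None, :] - centers[None, :, :]) ** 2).sum(-1)       # N x J
        p = np.empty((s.shape[0], K))
        for k in range(K - 1):
            g = (np.exp(-d2 / psi[k]) * beta[k]).sum(-1) + beta0[k]     # eq. (5)
            p[:, k] = 1.0 / (1.0 + np.exp(-g))                          # eq. (4)
        p[:, K - 1] = 1.0                                               # p_K = 1
        stick = np.cumprod(np.hstack([np.ones((s.shape[0], 1)),
                                      1.0 - p[:, :-1]]), axis=1)
        return p * stick                                                # eq. (3)

    # toy usage: the weights at random 2-D locations sum to one
    rng = np.random.default_rng(1)
    w = lsbp_weights(rng.uniform(0, 1, (5, 2)), rng.uniform(0, 1, (4, 2)),
                     rng.normal(0, 3, (3, 4)), rng.normal(0, 1, 3), np.full(3, 0.2))
    print(w.sum(axis=1))  # all ones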

The space-time dependence of the model is manifested in how $\{\beta_{kjt}\}_{j=0,\dots,J}$ and $\{\lambda^*_{kt}\}$ are modeled.

2.2 Temporal modeling

When the data are sampled uniformly in time, an autoregressive (AR) temporal model is natural. Specifically, we consider

$$\beta_{kjt} \sim \mathcal{N}(\zeta\,\beta_{kj(t-1)}, \alpha_\beta^{-1}), \quad j = 0, \dots, J \tag{6}$$
$$\log\lambda^*_{kjt} \sim \mathcal{N}(\xi\,\log\lambda^*_{kj(t-1)}, \alpha_\lambda^{-1}), \quad j = 1, \dots, d \tag{7}$$

with $\beta_{kj0} = \log\lambda^*_{kj0} = 0$. Gamma priors are placed on $\alpha_\beta$ and $\alpha_\lambda$. Further, $\zeta$ and $\xi$ are drawn from a standard normal truncated to the unit interval, $\mathcal{N}_{(0,1)}(0,1)$, so that $0 < \zeta, \xi < 1$.

The collection of data may be expensive, and there may be situations for which nonuniform temporal sampling is desired (e.g., to provide fine-scale sampling in particular regions – seasons – of time that may be interesting). This suggests using a Gaussian process (GP) model (Rasmussen and Williams 2006) for the temporal variation of $\beta_{kjt}$ and $\log\lambda^*_{kjt}$.

For the kth mixture component, we let

$$B_k \sim \mathcal{N}(B_k \mid 0, \Omega_k) = \prod_{j=0}^{J} \mathcal{N}(\beta_{kj:} \mid 0, \Sigma_{kj}), \qquad [\Sigma_{kj}]_{il} = c_0 c_1^{|t_i - t_l|} \tag{8}$$

where $\beta_{kj:} = [\beta_{kj1}, \dots, \beta_{kjT}]^T$, and $B_k \in \mathbb{R}^{T(J+1)}$ denotes a vector formed by concatenating $\beta_{kj:}$ for $j = 0, \dots, J$. The covariance $\Omega_k$ is a block-diagonal matrix of size $T(J+1) \times T(J+1)$, and each block $\Sigma_{kj}$ is a $T \times T$ covariance matrix; the entry at row $i$ and column $l$, denoted $[\Sigma_{kj}]_{il}$, is evaluated using the GP covariance function with the hyperparameters $\{c_0, c_1\}$. A gamma prior is placed on $c_0$. Since $c_1$ plays the same role as $\zeta$, we also draw $c_1$ from the truncated normal $\mathcal{N}_{(0,1)}(0,1)$, so that $0 < c_1 < 1$.

Gaussian process priors are also placed on $\log\lambda^*_{kjt}$. For mixture component $k$,

$$\log(\lambda^*_{kj:}) \sim \mathcal{N}(0, \Gamma_{kj}), \qquad [\Gamma_{kj}]_{il} = d_0 d_1^{|t_i - t_l|} \tag{9}$$

where $\log(\lambda^*_{kj:}) = [\log(\lambda^*_{kj1}), \dots, \log(\lambda^*_{kjT})]^T$, and the covariance matrix $\Gamma_{kj} \in \mathbb{R}^{T \times T}$ has entries defined by the GP covariance function with the hyperparameters $\{d_0, d_1\}$. A gamma prior is placed on $d_0$ and a truncated normal prior on $d_1$. As discussed in the Introduction, the considered AR(1) and GP priors are consistent, and provide different modeling strategies for the same imposed temporal dynamics.

2.3 Model interpretation

Equations (3)-(5) are of the form of the logistic stick-breaking process (LSBP) introduced in Ren et al. (2011); however, that paper did not consider Poisson data, and space-time processes were not addressed. Recall that $\sigma(x) \approx 1$ for $x > 4$; we refer to this as the "clipping" property of the logistic, as all $x$ larger than about 4 contribute effectively in the same manner to $\sigma(x)$; one may alternatively use a probit model, to achieve the same end. If $\beta_{kjt} > 4$, then $p_k(s) \approx 1$ for $\|s - \hat{s}_j\|_2^2 < \psi_k$. This implies via (3) that within the region $\|s - \hat{s}_j\|_2^2 < \psi_k$, if $\beta_{kjt} > 4$ mixture component $k$ is highly probable (assuming that other clusters $k' \ne k$ do not have large $p_{k'}(s)$ in the vicinity of $\hat{s}_j$). The "clipping" nature of the logistic function, and large values of $\beta_{kjt} > 4$, encourage contiguous regions for which a given cluster $k$ has high space-time probability of being manifested (all locations $s$ at which $g_k(s) > 4$ have similarly high probability of being associated with cluster $k$, regardless of the exact value of $g_k(s)$). The weights $\{\beta_{kjt}\}$ play the role of assigning which regions in space-time are most likely to be associated with a given cluster $k$, and $\psi_k$ defines the size scale of the cluster. Note that while we truncate the model to $K$ mixture components, this does not mean that all components need actually be used to represent the data. For example, if a given $\beta_{k0t}$ is large and negative, then the $k$th mixture component is unlikely to be utilized at all spatial locations at time $t$; $K$ is simply an upper bound on the number of mixture components (segment types).
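
The clipping effect is easy to see numerically; the short check below (ours) shows how quickly the logistic saturates:

    import numpy as np

    # Illustration of the "clipping" property: logistic outputs for inputs above
    # about 4 are all essentially 1, so any location with g_k(s) > 4 is assigned
    # to segment k with near-certainty regardless of the exact value of g_k(s).
    sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
    for x in [2.0, 4.0, 6.0, 10.0]:
        print(x, sigma(x))   # 0.881, 0.982, 0.998, 0.99995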

3 Posterior Inference

The posterior distribution of the model parameters is inferred via an MCMC sampler and via variational Bayesian (VB) inference (Beal 2003). The VB inference typically converges quickly and is computationally efficient; by contrast, MCMC convergence may be difficult to diagnose, and a large number of iterations is required to collect samples representing the joint posterior distribution. The detailed MCMC and VB update equations are provided in the Appendix (we provide equations for the GP model, with minor changes manifested for the AR case). Since VB analysis is not as widely used in the statistics literature, for completeness we provide details on its modeling assumptions.

Let Θ represent a vector of all model parameters; the goal is to infer the posterior p(Θ|𝒟). The likelihood of the data is represented by p(𝒟|Θ) and the prior on the model parameters is denoted by p(Θ). Let q(Θ; Γ) be a parametric distribution with hyperparameters Γ, and consider the variational expression

$$\mathcal{F}(\Gamma) = \int d\Theta\, q(\Theta;\Gamma)\,\ln\frac{q(\Theta;\Gamma)}{p(\mathcal{D}\mid\Theta)\,p(\Theta)} = D_{KL}\left[q(\Theta;\Gamma)\,\|\,p(\Theta\mid\mathcal{D})\right] - \ln p(\mathcal{D}). \tag{10}$$

In VB analysis the goal is to optimize the hyperparameters $\Gamma$ to minimize the Kullback-Leibler divergence between $q(\Theta;\Gamma)$ and the true posterior $p(\Theta\mid\mathcal{D})$; this corresponds to adjusting $\Gamma$ in $q(\Theta;\Gamma)$ such that $\mathcal{F}(\Gamma)$ is minimized. Note that $\int d\Theta\, q(\Theta;\Gamma)\ln\frac{q(\Theta;\Gamma)}{p(\mathcal{D}\mid\Theta)p(\Theta)}$ is only a function of the likelihood $p(\mathcal{D}\mid\Theta)$ and the prior $p(\Theta)$, and not the unknown posterior; with careful selection of $q(\Theta;\Gamma)$, numerical techniques akin to expectation-maximization (EM) (Beal 2003) can be employed to minimize $\mathcal{F}(\Gamma)$, with assurance of convergence to a locally optimal solution.

Focusing on the GP temporal model (the AR case is very similar), the model parameters are

$$\Theta = \left\{\{\lambda^*_{kj:}\}_{j=1,\dots,d;\, k=1,\dots,K},\ \{B_k\}_{k=1,\dots,K},\ \{Z_k(s_{it})\}_{t=1,\dots,T;\, i=1,\dots,M;\, k=1,\dots,K},\ c_0, c_1, d_0, d_1\right\} \tag{11}$$

where $Z_k(s_{it}) \sim \mathrm{Bernoulli}(p_k(s_{it}))$, with $p_k(s_{it})$ defined in (4). Completing the generative process, $v_{it} \sim \prod_{j=1}^{d}\mathrm{Poisson}(\lambda^*_{\ell jt})$ if $Z_k(s_{it}) = 0$ for $k < \ell$ and $Z_\ell(s_{it}) = 1$; $\lambda^*_{\ell jt}$ is the $j$th component of vector $\lambda^*_{\ell t}$.

In VB one typically assumes a factorized form for $q(\Theta;\Gamma)$, i.e., $q(\Theta;\Gamma) = \prod_l q_l(\Theta_l;\Gamma_l)$, where $\Theta_l$ represents the $l$th set of model parameters and $q_l(\Theta_l;\Gamma_l)$ is a parametric density function with hyperparameters $\Gamma_l$; the union of all $\Theta_l$ corresponds to $\Theta$. Through careful selection of $q_l(\Theta_l;\Gamma_l)$ one may iteratively optimize the variational expression $\mathcal{F}(\Gamma)$.
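
For factors with conjugate structure, each iteration uses the standard mean-field coordinate update (Beal 2003), stated here for reference:

$$q_l(\Theta_l;\Gamma_l) \propto \exp\left\{\mathbb{E}_{\prod_{m \ne l} q_m}\left[\ln\left(p(\mathcal{D}\mid\Theta)\,p(\Theta)\right)\right]\right\},$$

with each such update monotonically decreasing $\mathcal{F}(\Gamma)$; the non-conjugate factors in our model are instead handled by the point estimates described next.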

For the proposed model, $q(B_k)$ is a multivariate normal distribution, $q(Z_k(s_{it}))$ is Bernoulli (with Bernoulli probability defined by a logistic function), $q(\psi_k)$ is multinomial based upon a finite library of possible parameters $\{\psi^*_l\}_{l=1,\dots,L}$, and $q(c_0)$ and $q(d_0)$ are gamma distributions. It is not possible to define a $q(\lambda^*_{kj:})$ that yields closed-form updates. Therefore, the parameters $\lambda^*_{kj:}$ within the VB analysis are approximated at each iteration via a point estimate that maximizes the functional $\mathcal{F}(\Gamma)$. Similarly, $q(c_1)$ and $q(d_1)$ cannot be obtained in closed form; the parameters $c_1$ and $d_1$ are updated at each VB iteration with point estimates that maximize the functional $\mathcal{F}(\Gamma)$.

4 Example Results

While the proposed model may appear relatively complicated, the number of hyperparameters that need be set is actually modest. We compare the AR-LSBP and GP-LSBP models for imposing a prior on the temporal dependence with a simpler model in which the priors for each time point t are independent. In the context of this independent LSBP (ind-LSBP), we impose

$$\beta_{kjt} \sim \mathcal{N}(0, \alpha_{kjt}^{-1}), \qquad \alpha_{kjt} \sim \mathrm{Gamma}(a_0, b_0) \tag{12}$$

and we set $a_0 = b_0 = 10^{-6}$ as in the relevance vector machine (RVM) (Tipping 2001). The same gamma priors are placed on $\alpha_\beta$ and $\alpha_\lambda$ for the AR-LSBP model, and on $c_0$ and $d_0$ for the GP-LSBP model. In all examples the truncation level on the LSBP was set at $K = 20$, and the results are insensitive to this parameter, as long as it is large relative to the actual number of clusters/segments inferred by the model. Finally, we must specify the library of kernel parameters $\{\psi^*_l\}_{l=1,\dots,L}$; the manner in which these are specified is discussed when presenting the specific examples.

For uniform temporal sampling, the AR(1) and GP impositions of temporal dynamics are theoretically identical, for the imposed GP covariance. Nevertheless, even for uniform temporal sampling we show results for both implementations, because the details of the numerics dictate that the two models are slightly different in practice. Specifically, within the GP model a point estimate is employed for the kernel hyperparameters, which is unnecessary for the direct AR(1) model. The comparison allows examination of the accuracy of this approximation within the GP inference, relative to the direct AR(1) implementation; this sheds light on the quality of the computations for non-uniform temporal sampling, where the GP implementation is required.

4.1 Simulation Example

We assume the data are observed at a total of 9 equally spaced time instances, $t = 1, 2, \dots, 9$. At each time we randomly draw 50 spatial locations in one-dimensional space from a uniform distribution with support $[0, 20]$, denoted $s_{it} \sim \mathrm{Uniform}[0, 20]$, $i = 1, \dots, 50$, $t = 1, \dots, 9$. For each location, we draw an event count $v_{it}$ from a Poisson distribution with intensity parameter $\lambda_{it}$. To represent the time dynamics, we let $\lambda_{it} = 20$ when $5 + \frac{5}{8}(t-1) \le s_{it} \le 10 + \frac{5}{8}(t-1)$, and $\lambda_{it} = 1$ otherwise. By this setting the high-intensity window moves gradually from $[5, 10]$ to $[10, 15]$ as time $t$ increases. Note that here $s_{it} \in \mathbb{R}^1$ and $v_{it} \in \mathbb{R}^1$. The kernel centers are defined as $\hat{s}_j = 0.5(j-1)$ for $j = 1, \dots, J$. The data are depicted in Figure 2. Within the analysis, the library of kernel parameters is the union of the following two sets: $\{0.05, 0.1, 0.15, \dots, 0.5\}$ and $\{0.5, 1, 1.5, \dots, 5\}$.
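
The following is a short sketch (ours) of this generating process:

    import numpy as np

    # Sketch of the simulated-data generation described above (our reading).
    rng = np.random.default_rng(2)
    T, N = 9, 50
    s = rng.uniform(0.0, 20.0, size=(T, N))            # s_it ~ Uniform[0, 20]
    lo = 5.0 + 5.0 / 8.0 * np.arange(T)                # window start moves 5 -> 10
    lam = np.where((s >= lo[:, None]) & (s <= lo[:, None] + 5.0), 20.0, 1.0)
    v = rng.poisson(lam)                               # event counts v_it, shape (T, N)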

Figure 2.

Simulation example. The high-intensity window moves gradually from [5, 10] to [10, 15] when time increases.

The mean results from VB are shown in Figure 3, in which the inferred Poisson rate is displayed; for these and all VB results the computations were stopped when the relative change in the variational bound fell below $10^{-4}$. Further, all VB results are initialized at random. The VB results presented below represent a locally optimal solution, which forms one source of error, and this is compounded by the factorized approximation to the posterior. Nevertheless, the VB implementation of the GP-LSBP and AR-LSBP models yields results comparable to those of the MCMC implementation. When implementing MCMC, a total of 10,000 iterations are run, with the first 1000 discarded as burn-in. On the same PC (and both codes written in Matlab), the VB GP-LSBP and AR-LSBP results required approximately 158 seconds of CPU time, while the VB ind-LSBP results required approximately 96 seconds. In contrast, the GP-LSBP and AR-LSBP results based on the MCMC sampler required 6517 seconds, and ind-LSBP required 2913 seconds (109 and 48 minutes, respectively). The software was not optimized, and these numbers therefore represent a relative view of the computational expense of the VB and MCMC solutions.

Figure 3.

Segmentation and latent intensity inferred by VB: Comparison between GP-LSBP and ind-LSBP, considering the simulated-data example. The AR-LSBP results are similar to the GP-LSBP results, and are omitted for brevity.

From Figure 3 it is observed that, for the VB solution, incorporation of temporal smoothness in the GP-LSBP model yields significant improvements in the inferred Poisson rate, as compared to the VB ind-LSBP solution (with temporal dependence not accounted for in the prior); the AR-LSBP model performed similarly to GP-LSBP. It appears that the prior constraint imposed by GP/AR within the VB solution plays an important role in mitigating the underlying VB approximations. By contrast, for the MCMC results improvements are manifested via GP-LSBP and AR-LSBP relative to ind-LSBP, but in this case the differences are less dramatic (plots of MCMC results are not shown, for brevity).

We next examine the generative performance of the proposed model. After the model has been learned, either via VB or MCMC, we randomly generate 100 new test data, following the same procedure that generated the training data. We then compute the average log-likelihood and the accuracy rate of segmentation from the learned GP-LSBP, AR-LSBP and ind-LSBP models. The accuracy rate of segmentation is defined as the number of test data points segmented correctly as a fraction of the total number of test data points. The results are summarized in Table 1. We find that the GP-LSBP and AR-LSBP models achieve a higher likelihood and accuracy of segmentation compared to the ind-LSBP. Note that the differences between GP-LSBP, AR-LSBP and ind-LSBP are relatively modest for the MCMC solution, while there are again marked advantages in the GP-LSBP and AR-LSBP solutions relative to ind-LSBP when employing VB inference.

Table 1.

Comparison of generative performance between AR-LSBP, GP-LSBP and ind-LSBP, on simulated data.

Method      Average log-likelihood        Accuracy rate of segmentation
            VB          MCMC              VB          MCMC
AR-LSBP     −3.702      −1.749            0.9796      0.9801
GP-LSBP     −3.882      −2.082            0.9765      0.9757
ind-LSBP    −15.544     −2.274            0.9478      0.9741

Finally we test the prediction performance of the model. We first generate data $\mathcal{D} = \{s_i, v_{it}\}_{i=1,\dots,50,\ t=1,\dots,9}$ as discussed above, and then randomly select $N_{miss}$ time instances $\hat{t}_1, \dots, \hat{t}_{N_{miss}}$ from $t = 1, \dots, 9$; these constitute our test data $\mathcal{D}_{tst}$, and the training data $\mathcal{D}_{trn}$ are composed of the data in $\mathcal{D}$ but not in $\mathcal{D}_{tst}$. We learn the model via VB or MCMC analysis with $\mathcal{D}_{trn}$, and predict the kernel weights $\hat{\beta}_{kj\hat{t}}$ and Poisson intensities $\hat{\lambda}^*_{k\hat{t}}$ at time $\hat{t}$. The average log-likelihood and accuracy of segmentation are evaluated based on the prediction results for $\mathcal{D}_{tst}$, given only the spatial locations $s_{i\hat{t}}$. We perform 100 trials, and at each trial $N_{miss}$ time instances are selected randomly to construct $\mathcal{D}_{tst}$. The average results are shown in Table 2.

Table 2.

Comparison of prediction performance between AR-LSBP, GP-LSBP and ind-LSBP.

Nmiss    Average log-likelihood
         AR-LSBP              GP-LSBP              ind-LSBP
         VB        MCMC       VB        MCMC       VB         MCMC
1        −3.948    −1.975     −4.102    −2.123     −21.194    −2.641
2        −4.211    −2.241     −4.526    −2.473     −27.195    −3.077
3        −4.468    −2.573     −4.718    −2.652     −27.776    −3.507
4        −4.882    −2.740     −5.133    −3.108     −26.682    −3.963
5        −5.801    −3.014     −5.987    −3.521     −31.217    −4.316

Nmiss    Accuracy rate of segmentation
         AR-LSBP              GP-LSBP              ind-LSBP
         VB        MCMC       VB        MCMC       VB         MCMC
1        0.9792    0.9794     0.9767    0.9758     0.7165     0.9545
2        0.9787    0.9786     0.9761    0.9754     0.6669     0.9581
3        0.9787    0.9785     0.9763    0.9752     0.6458     0.9379
4        0.9780    0.9783     0.9752    0.9740     0.6647     0.9274
5        0.9763    0.9770     0.9741    0.9633     0.6131     0.9066

Only the GP-LSBP results are fully principled in this analysis, where we use the learned parameters of the GP covariance matrix to interpolate to new time points (Rasmussen and Williams 2006). The AR model implicitly assumes that the data are sampled uniformly in time, while the ind-LSBP has no principled means of interpolating to missing time points. Nevertheless, as a comparison, for the AR-LSBP computations in this test the AR component was simply applied to consecutive observed time points, essentially assuming that the temporal variation was smooth, even if not sampled uniformly. To interpolate the learned AR-LSBP and ind-LSBP results to a new point $\hat{t}$, we average the learned model parameters from the two closest observed points, before and after $\hat{t}$. From Table 2 it is observed that, again for the VB solution, there is a marked advantage manifested via the GP-LSBP and AR-LSBP priors, as compared to ind-LSBP. For the MCMC solution, there is also a noticeable advantage manifested via the GP-LSBP and AR-LSBP solutions, particularly in segmentation accuracy for relatively large $N_{miss}$. Based upon the average log-likelihood, we note a small but consistent advantage of the AR-LSBP model over its GP-LSBP counterpart, for both VB and MCMC computations. This observation on simulated data carries over to the analysis of real data.
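
For the GP-LSBP prediction, interpolation to a held-out time reduces to the standard GP conditional mean (Rasmussen and Williams 2006, Ch. 2). A minimal sketch, with variable names of our own choosing:

    import numpy as np

    def gp_interpolate(t_obs, f_obs, t_new, c0, c1, jitter=1e-8):
        """Conditional mean of a zero-mean GP with covariance c0 * c1**|ti - tl|;
        f_obs holds e.g. a learned beta_kj: trajectory at the observed times."""
        K = c0 * c1 ** np.abs(t_obs[:, None] - t_obs[None, :])      # train covariance
        k_star = c0 * c1 ** np.abs(t_new[:, None] - t_obs[None, :]) # cross covariance
        return k_star @ np.linalg.solve(K + jitter * np.eye(len(t_obs)), f_obs)

    # toy usage: interpolate a smooth trajectory to a missing time point
    t = np.array([1.0, 2.0, 3.0, 5.0, 6.0])
    f = np.sin(0.5 * t)
    print(gp_interpolate(t, f, np.array([4.0]), c0=1.0, c1=0.7))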

4.2 Crime Data

We investigate crime events in Cincinnati, Ohio, USA; the data are available online at http://www.cincinnati-oh.gov. The data include the date, time, location and other information of all reported crimes in Cincinnati since 2006. This data set was first studied in Taddy (2008, 2010), where a mixture of beta distributions was employed to model the event density ν(s), and to discover the evolution of the density with time. In our problem we seek to segment the city into contiguous regions, with crime events at each region characterized by a common constant Poisson intensity vector.

We consider 117,314 crime events within the city, reported from January 2006 to December 2008. Each crime is assigned a uniform crime reporting (UCR) code; in total, more than 170 different UCR codes describe a variety of crimes. These crime events can be categorized into 17 different crime types, based on the prefix of their UCR codes: 1) murder, 2) rape, 3) robbery, 4) assault with weapon, 5) burglary, 6) nonvehicle theft, 7) vehicle theft, 8) general assault, 9) arson, 10) forgery, 11) fraud, 12) receiving stolen property, 13) vandalism, 14) weapons related but no physical harm, 15) sexual crime, 16) children related, 17) general harassment. As an example, the locations (latitude and longitude coordinates) of the 3090 crime events in January 2008 are shown in Figure 1(a). Based on the locations of all 117,314 crime events, the observation window is taken as the rectangular region of [39.06°, 39.24°] latitude and [−84.70°, −84.35°] longitude.

We construct the data $\mathcal{D} = \{s_i, v_{it}\}_{i=1,\dots,M,\ t=1,\dots,T}$ as follows. All crime events within one month are aggregated into one time instance, and therefore there are in total 36 time points. At each time, the observation window is divided into 15,750 small square cells (90 rows by 175 columns) of size 0.002° × 0.002°, and the event location $s_{it}$ is defined as the center of each small square area, with this denoted $\Delta(s_i)$. The count $v_{ijt}$ is then the number of Type $j$ crimes within $\Delta(s_i)$ over the corresponding month indexed by $t$. This produces a 17-dimensional count vector $v_{it}$ at $s_i$ for $i = 1, \dots, 15750$ and $t = 1, \dots, 36$. Related research in Taddy (2008, 2010) applied marked Poisson processes to address the crime types, regarding each crime type at $s_{it}$ as a random mark. Here we attempt to segment the city by considering all the crime types within a local region $\Delta(s_{it})$ as a correlated variable (a vector), instead of treating each event as a random type.
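
A hedged sketch of this binning step is given below; the column names ('lat', 'lon', 'date', 'crime_type', with crime_type coded 0-16) are our own illustration, and the actual file's schema may differ:

    import numpy as np
    import pandas as pd

    def build_counts(events: pd.DataFrame, lat0=39.06, lon0=-84.70, cell=0.002,
                     n_rows=90, n_cols=175, n_months=36, n_types=17):
        # map each event to a 0.002-degree grid cell and a calendar month
        row = ((events['lat'] - lat0) // cell).astype(int)
        col = ((events['lon'] - lon0) // cell).astype(int)
        month = (events['date'].dt.year - 2006) * 12 + events['date'].dt.month - 1
        keep = ((row >= 0) & (row < n_rows) & (col >= 0) & (col < n_cols)
                & (month >= 0) & (month < n_months))
        v = np.zeros((n_rows * n_cols, n_months, n_types), dtype=int)
        np.add.at(v, ((row[keep] * n_cols + col[keep]).to_numpy(),
                      month[keep].to_numpy(),
                      events.loc[keep, 'crime_type'].to_numpy()), 1)
        return v  # v[i, t, j]: number of type-j events in cell i during month t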

The proposed GP-LSBP, AR-LSBP and ind-LSBP models are inferred via VB and MCMC, with truncation level $K = 20$. The kernel centers are uniformly spaced every 0.04° (latitude and longitude) in the observation window, with a total of 60 kernel centers defined. The library of kernel parameters $\{\psi^*_l\}_{l=1,\dots,L}$ is the union of the following sets: $\{0.006°, 0.012°, 0.018°, \dots, 0.06°\}$ and $\{0.06°, 0.12°, 0.18°, \dots, 0.6°\}$. On the same PC, the VB GP-LSBP and AR-LSBP results required approximately 2.8 hours of CPU time, while the VB ind-LSBP results required approximately 1.3 hours. By contrast, due to the large size of the data, 3000 MCMC samples are employed, with 1000 discarded as burn-in. On the same PC, the MCMC GP-LSBP and AR-LSBP results required approximately 47.5 hours. We also considered 10,000 MCMC samples, with 1000 discarded as burn-in (at very significant computational cost), with little change in the results relative to those presented below.

Figure 4(a) shows the VB-based segmentation of the entire spatial observation window at the 36 time instances, using GP-LSBP (similar results were found using AR-LSBP, omitted for brevity). The city is segmented into 4 regions (inferred by the model), and the segmentation changes smoothly with time. For comparison, Figure 4(b) shows the segmentation results obtained by applying an independent LSBP (VB computations) at each time instance. It is observed that with GP priors the proposed model yields a spatial segmentation that is more consistent over time and more spatially contiguous than that of ind-LSBP.

Figure 4.

Comparison of spatial segmentation for crime data in Cincinnati, Ohio from January 2006 to December 2008 (VB results). Each color represents a segment with an associated intensity vector λkt*, and there are a total of four segments inferred: 1 - dark blue, 2 - light blue, 3 - yellow, and 4 - dark red. (a) GP-LSBP, (b) ind-LSBP.

We are also interested in examining the clustering manifested by the MCMC computations, which is complicated by label switching between samples. We compute an MCMC clustering that may be compared to the VB results as follows. We consider one spatial location from Segment 1 in Figure 4, denoted $s_1^*$. Based upon the collected MCMC samples, for each other spatial location $s \ne s_1^*$ in the scene, we compute the probability that positions $s$ and $s_1^*$ are in the same cluster. All positions $s$ with high probability of such clustering should (ideally) constitute a spatial region similar to Segment 1 inferred via VB. In Figure 5(a) we show MCMC results for Segment 1, and the high-probability regions (red) do indeed align well with the VB results in Figure 4. In Figure 5(b) we compute similar MCMC results for Segment 2, and in this case the high-probability spatial locations align well with the VB results for Segment 2 in Figure 4. We found in general good agreement between the VB and MCMC segmentation results for GP-LSBP and AR-LSBP on these data.
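
This co-clustering summary is invariant to label switching and is simple to compute; a sketch (ours, with argument names of our own choosing) is:

    import numpy as np

    def same_cluster_prob(labels, ref):
        """Given per-sample segment labels (n_samples x M locations) from MCMC,
        return, for every location, the posterior probability of sharing a
        cluster with the reference location `ref`."""
        return (labels == labels[:, [ref]]).mean(axis=0)

    # toy usage: 4 MCMC samples over 5 locations
    labels = np.array([[0, 0, 1, 1, 0],
                       [2, 2, 1, 1, 2],
                       [0, 0, 0, 1, 0],
                       [1, 1, 2, 2, 1]])
    print(same_cluster_prob(labels, ref=0))  # [1. 1. 0.25 0. 1.]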

Figure 5.

Comparison of spatial segmentation for crime data in Cincinnati, Ohio from January 2006 to December 2008 (MCMC results). (a) Segment 1, (b) Segment 2, where these segments are related to the results in Figure 4(a). The color scale is the same in (a) and (b).

Figures 6(a)–(d) show the dynamic change of the VB-inferred Poisson intensities for each segment. To make the figure easier to read, we only plot components 3, 5 and 6 of the 17-dimensional vector $\lambda^*_{kt}$; these components correspond to crime types "robbery", "burglary", and "nonvehicle theft", respectively. From these figures we observe that in all segments the crime intensities fluctuated periodically with the seasons. Generally in summer there were more crime events of all types than in winter. The overall crime intensities varied with region. Segment 4 was in the downtown region, and had many more crime events than the other regions. In all four regions Type 6 crime (nonvehicle theft) was dominant. In addition, the crime patterns differed across regions. For example, Segment 4 had relatively little Type 5 crime (burglary), while in the other 3 segments the intensity of Type 5 crime was almost half that of Type 6 crime. In Segment 4, Type 3 crime (robbery) was prevalent, while Segment 1 had relatively little Type 3 crime. For comparison, we also present the MCMC-inferred Poisson intensities of Segment 3, as a representative (typical) example. It is observed that the MCMC and VB results are in generally good agreement for the GP-LSBP and AR-LSBP models.

Figure 6.

Inferred intensity vector λkt* associated with the segments shown in Figure 4(a). Only 3 crime types are shown here to make the figure easy to read.

These results may be used by police to assign resources (personnel) to segmented regions in a consistent manner, to address varying levels of crime. The segments typically change with season, and the spatial distribution of resources may be temporally adjusted as well. By relating the demographics of regions to the spatial segments (we did not have access to such demographics), one may deduce relationships between types of crimes and the types of people living and working in given regions, of interest to criminologists and city planners.

Following the same procedure as in the simulated example, we now examine the prediction performance of our model on the crime data. We randomly select $N_{miss}$ time instances to construct a test set, and let the remaining data be the training set. Ten random trials are performed, and the comparison of average log-likelihood between GP-LSBP, AR-LSBP and ind-LSBP inferred by VB is shown in Table 3. Since in this real application there is no ground truth, we cannot evaluate the accuracy rate of segmentation as done in the simulated example. From Table 3, GP-LSBP and AR-LSBP consistently achieve higher likelihood than the independent LSBP for various $N_{miss}$ values. Note also that for these real data there is less of a difference between the AR/GP-LSBP and ind-LSBP results for the VB solution, as compared to the synthetic data considered above. We do not perform this experiment for MCMC inference, as the computational requirements needed to perform this many experiments are prohibitive with this large data set (however, in isolated tests, the results were slightly better than the VB-based GP-LSBP and AR-LSBP models, consistent with the simulated example above).

Table 3.

Comparison of average log-likelihood in the prediction for the crime data (VB inference).

Nmiss       1         2         3         4         5         6
AR-LSBP     −6.131    −6.352    −7.204    −7.631    −7.957    −8.338
GP-LSBP     −6.570    −6.762    −7.713    −7.965    −8.426    −8.721
ind-LSBP    −8.666    −9.247    −9.595    −8.840    −9.848    −8.762

4.3 Pearson residuals

Following Taddy (2010), we check model quality via computation of Pearson residuals (see Turner et al. (2005) for a detailed discussion of residuals for spatial point processes). For the modeling framework considered here, the Pearson residual reduces to

$$R(\Delta(s_{it}), \hat{\lambda}_{it}) = \frac{n_{it} - \hat{\lambda}_{it}}{\sqrt{\hat{\lambda}_{it}}} \tag{13}$$

where $n_{it}$ is the number of events in region $\Delta(s_{it})$ and $\hat{\lambda}_{it}$ is the inferred Poisson rate parameter in the small region $\Delta(s_{it})$. Ideally the residual should be close to zero, if the underlying Poisson assumption is valid. Note that within the proposed model we have a vector of counts $v_{it}$, and therefore we may compute the residual for each of the different types of crimes.
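
As a quick sketch (ours) of this check, with toy counts and rates standing in for the inferred quantities:

    import numpy as np

    # Pearson residuals as in (13), per cell/month/crime-type; values near zero
    # support the fitted Poisson model (array names are ours).
    def pearson_residuals(counts, rates):
        return (counts - rates) / np.sqrt(rates)

    counts = np.array([[3, 0], [18, 22]])           # toy n_it for two cells, two months
    rates = np.array([[2.5, 0.8], [20.0, 20.0]])    # toy inferred lambda-hat
    print(pearson_residuals(counts, rates))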

From Figure 7, which is based upon VB inference, we observe that the Pearson residuals tend to decrease substantially based upon a model that explicitly imposes temporal smoothness (note that the residuals are significantly lower for GP-LSBP and AR-LSBP, relative to ind-LSBP). Further, the AR-LSBP residuals are smaller than those of the GP-LSBP. Although we omit the MCMC results for brevity, similar phenomena were observed in that case. The residuals tend to be small, in the range [−2,2], with the larger values manifested on the edges of segments, as might be expected (segment interfaces are characterized typically by abrupt changes in statistical properties).

Figure 7.

Pearson residuals for "nonvehicle theft," using VB inference; best viewed electronically, zoomed in. (a) ind-LSBP, (b) GP-LSBP, (c) AR-LSBP.

5 Conclusions

A Bayesian hierarchical model has been presented for segmenting time-evolving point process data, when the events are in vector form. The spatially dependent point process is modeled using a generalization of a Poisson process, with piecewise-constant Poisson intensities defined within the observation window. The logistic stick-breaking process is employed to favor spatially contiguous segments, and GP and AR models are considered for imposing temporal smoothness of the segmentation and the Poisson intensities.

In addition to developing the model, a contribution of this paper concerns a detailed comparison between MCMC sampling and a VB approximation. For both the synthetic and real data, it was found that the GP-LSBP and AR-LSBP results computed via VB and MCMC were in close agreement, and the imposition of temporal smoothness via GP/AR (compared to treating the different temporal samples independently) yielded significant improvements in the VB results. While the VB results are approximate, and are subject to locally optimal solutions (although the GP/AR models seemed to mitigate this to some extent), the VB approach provides significant advantages with regard to computations. For the large crime data set considered, while the MCMC results are in principle convergent if run for enough samples, this attractiveness is tempered by the very significant computation time required to collect enough samples to assure that we are indeed sampling from the posterior. Given that computational requirements will in practice limit the ability to collect as many MCMC samples as desired (and therefore MCMC is also an approximation), the VB solution appears to be an attractive option. However, the results presented here indicate that imposition of as much prior information as possible (here smoothness via GP/AR) is desirable. In future research it is of interest to consider online VB analysis (Hoffman et al. 2010), which provides further acceleration for large datasets, and which is appropriate for time-dependent data observed in an online/sequential manner, like the time-evolving crime data considered here.

Acknowledgments

The authors wish to thank the reviewers and editors for their comments, which have substantially improved the paper. The research reported here was supported by the Army Research Office (Dr. Liyi Dai) and the Office of Naval Research (Dr. Wen Masters).

Appendix: MCMC and VB Update Equations

A.1 MCMC Inference

The MCMC computations are performed using Gibbs sampling where the conditional density functions are analytic, and samples are drawn from the conditional density functions via Metropolis-Hastings when not analytic. The update equations are summarized as follows.

  • Sample $\lambda^*_{kj:}$ from its posterior conditional on $\{Z_k(s_{it})\}$ and $\{\nu_{ijt}\}$:
    $$p(\lambda^*_{kj:} \mid -) \propto \prod_{t=1}^{T}\prod_{i=1}^{M}\mathrm{Poisson}(\nu_{ijt} \mid \lambda^*_{kjt})^{I(c_i = k)}\ \ln\!\mathcal{N}(\lambda^*_{kj:} \mid 0, \Gamma_{kj}). \tag{14}$$
    It is not possible to sample $\lambda^*_{kj:}$ from the full conditional. We update each $\lambda^*_{kj:}$ by the Metropolis-Hastings algorithm (a simplified numerical sketch of this step appears after this list). When updating $\lambda^*_{kj:}$, the proposal $\lambda^{*(\tau+1)}_{kj:}$ is generated from the distribution
    $$q(\ln\lambda^{*(\tau+1)}_{kj:} \mid \ln\lambda^{*(\tau)}_{kj:}) = \mathcal{N}(\ln\lambda^{*(\tau)}_{kj:},\ (d_0 + d_2)I_T) \tag{15}$$
    where $I_T$ is the $T \times T$ identity matrix, and $T$ denotes the number of time points. The acceptance probability for the proposed $\lambda^{*(\tau+1)}_{kj:}$ is $\min\left(1, \alpha(\lambda^{*(\tau+1)}_{kj:}, \lambda^{*(\tau)}_{kj:})\right)$, where
    $$\alpha(\lambda^{*(\tau+1)}_{kj:}, \lambda^{*(\tau)}_{kj:}) = \exp\left(-\tfrac12(\ln\lambda^{*(\tau+1)}_{kj:})^T\Gamma_{kj}^{-1}\ln\lambda^{*(\tau+1)}_{kj:} + \tfrac12(\ln\lambda^{*(\tau)}_{kj:})^T\Gamma_{kj}^{-1}\ln\lambda^{*(\tau)}_{kj:}\right) \cdot \prod_{t=1}^{T}\left[\left(\frac{\lambda^{*(\tau+1)}_{kjt}}{\lambda^{*(\tau)}_{kjt}}\right)^{\sum_{i=1}^{M} w_k(s_{it})\nu_{ijt}} \exp\left(-\sum_{i=1}^{M} w_k(s_{it})\left(\lambda^{*(\tau+1)}_{kjt} - \lambda^{*(\tau)}_{kjt}\right)\right)\right]. \tag{16}$$
  • Sample $B_k$ from its posterior conditional on $\{Z_k(s_{it})\}$:
    $$p(B_k \mid -) \propto \prod_{t=1}^{T}\prod_{i=1}^{M} p(Z_k(s_{it}) \mid B_k)\prod_{j=0}^{J}\mathcal{N}(\beta_{kj:} \mid 0, \Sigma_{kj}). \tag{17}$$
    Reordering the entries of $B_k$ (and the associated $\Omega_k$) in (8) such that $B_k = [\beta_{k:1}, \dots, \beta_{k:T}]^T$, we obtain
    $$p(B_k \mid -) \propto \exp\left\{-\sum_{t=1}^{T}\sum_{i=1}^{M} f(\eta_{kit})\,\beta_{k:t}^T\phi_{kit}\phi_{kit}^T\beta_{k:t}\right\} \times \exp\left\{-\tfrac12 B_k^T\Omega_k^{-1}B_k + \sum_{t=1}^{T}\sum_{i=1}^{M}\left(Z_k(s_{it}) - \tfrac12\right)\phi_{kit}^T\beta_{k:t}\right\}. \tag{18}$$
    Hence $B_k$ can be drawn from a normal distribution,
    $$p(B_k \mid -) = \mathcal{N}\left(B_k;\ (\Omega_k^{-1} + U_k)^{-1}Y_k,\ (\Omega_k^{-1} + U_k)^{-1}\right), \tag{19}$$
    where $U_k$ is a $(J+1)T \times (J+1)T$ block-diagonal matrix with the $t$th $(J+1) \times (J+1)$ block expressed as $u_{kt} = 2\sum_{i=1}^{M} f(\eta_{kit})\,\phi_{kit}\phi_{kit}^T$, and $Y_k$ is a $(J+1)T \times 1$ vector formed by concatenating the $T$ vectors $y_{kt} = \sum_{i=1}^{M}\left(Z_k(s_{it}) - \tfrac12\right)\phi_{kit}$, $t = 1, \dots, T$. In these expressions $\phi_{kit} = [1, \mathcal{K}(s_{it}, \hat{s}_1; \psi_k), \dots, \mathcal{K}(s_{it}, \hat{s}_J; \psi_k)]^T$, $f(\eta) = \tanh(\eta/2)/(4\eta)$, and $\eta_{kit} = \phi_{kit}^T\beta_{k:t}$.
  • Sample $Z_k(s_{it})$ from its posterior conditional on $B_k$ and $\{\nu_{ijt}\}$. According to the definition of the LSBP,
    $$p(Z_k(s_{it}) = 1 \mid -) = \begin{cases} \dfrac{\sigma(g_k(s_{it}))\,p(\nu_{it}\mid\lambda^*_{kt})}{\sigma(g_k(s_{it}))\,p(\nu_{it}\mid\lambda^*_{kt}) + \left(1 - \sigma(g_k(s_{it}))\right)p(\nu_{it}\mid\lambda^*_{k't})}, & \text{if } Z_l(s_{it}) = 0 \text{ for all } l < k \\ \sigma(g_k(s_{it})), & \text{if } \exists\, l < k \text{ such that } Z_l(s_{it}) = 1 \end{cases} \tag{20}$$
    where $k'$ is the first integer larger than $k$ associated with a non-zero indicator. The equation can be expressed as
    $$p(Z_k(s_{it}) = 1 \mid -) = \frac{1}{1 + \exp(-\rho_{kit})}, \tag{21}$$
    with
    $$\rho_{kit} = \prod_{l<k}\left(1 - Z_l(s_{it})\right)\log p(\nu_{it}\mid\lambda^*_{kt}) - \sum_{k'>k} Z_{k'}(s_{it})\prod_{l<k',\, l \ne k}\left(1 - Z_l(s_{it})\right)\log p(\nu_{it}\mid\lambda^*_{k't}) + \phi_{kit}^T\beta_{k:t}. \tag{22}$$
  • With a uniform prior assumed on the kernel parameter library (a predefined finite set), the posterior distribution for each $\psi_k$ can be represented as
    $$p(\psi_k = \psi_l^*) \propto \prod_{t=1}^{T}\prod_{i=1}^{M}\sigma\left(g_{kl}(s_{it})\right)^{w_k(s_{it})}\prod_{t=1}^{T}\prod_{i=1}^{M}\prod_{k'>k}\left(1 - \sigma(g_{kl}(s_{it}))\right)^{w_{k'}(s_{it})}. \tag{23}$$
    For each specific $k$ from $k = 1, \dots, K$, we have the update equation
    $$\psi_k = \psi^*_{r_k}, \qquad r_k \sim \mathrm{Mult}(p_{k1}, \dots, p_{kL}), \qquad p_{kl} = \frac{p(\psi_k = \psi_l^*)}{\sum_{l'=1}^{L} p(\psi_k = \psi_{l'}^*)}. \tag{24}$$
    We sample the kernel parameters based on these multinomial distributions over the given discrete set in each MCMC iteration.
  • Sample $c_0$ from its posterior conditional on $\{B_k\}$ and $\{a_0, b_0\}$:
    $$p(c_0 \mid -) \propto \mathrm{Gamma}(c_0; a_0, b_0)\prod_{k=1}^{K}\mathcal{N}(B_k; 0, \Omega_k). \tag{25}$$
    Therefore, $c_0$ can be drawn from a gamma distribution,
    $$p(c_0 \mid -) = \mathrm{Gamma}(c_0; \tilde{a}_0, \tilde{b}_0), \tag{26}$$
    where $\tilde{a}_0 = a_0 + 0.5KT(J+1)$ and $\tilde{b}_0 = b_0 + 0.5\sum_{k=1}^{K}\sum_{j=0}^{J}\beta_{kj:}^T\tilde{\Sigma}_{kj}^{-1}\beta_{kj:}$, with $[\tilde{\Sigma}_{kj}]_{il} = c_1^{|t_i - t_l|}$.
  • Sample $c_1$ from its posterior conditional on $\{B_k\}$:
    $$p(c_1 \mid -) \propto \mathcal{N}_{(0,1)}(c_1; 0, 1)\prod_{k=1}^{K}\mathcal{N}(B_k; 0, \Omega_k). \tag{27}$$
    When updating $c_1$, the proposal $c_1^{(\tau+1)}$ is generated from the distribution
    $$q(c_1^{(\tau+1)} \mid c_1^{(\tau)}) = \mathcal{N}_{(0,1)}(c_1^{(\tau+1)}; c_1^{(\tau)}, 1). \tag{28}$$
    The acceptance probability for the proposed $c_1^{(\tau+1)}$ is $\min\left(1, \alpha(c_1^{(\tau+1)}, c_1^{(\tau)})\right)$, where
    $$\alpha(c_1^{(\tau+1)}, c_1^{(\tau)}) = \frac{|\Sigma_{kj}^{-1}(c_1^{(\tau)})|^{K(J+1)/2}}{|\Sigma_{kj}^{-1}(c_1^{(\tau+1)})|^{K(J+1)/2}}\exp\left\{-\tfrac12\left(c_1^{(\tau+1)\,2} - c_1^{(\tau)\,2}\right)\right\} \times \exp\left\{\tfrac12\left(\sum_{k=1}^{K}\sum_{j=0}^{J}\beta_{kj:}^T\Sigma_{kj}^{-1}(c_1^{(\tau)})\beta_{kj:} - \sum_{k=1}^{K}\sum_{j=0}^{J}\beta_{kj:}^T\Sigma_{kj}^{-1}(c_1^{(\tau+1)})\beta_{kj:}\right)\right\}. \tag{29}$$
  • Similarly, $d_0$ can be drawn from a gamma distribution,
    $$p(d_0 \mid -) = \mathrm{Gamma}(d_0; \tilde{a}_0, \tilde{b}_0), \tag{30}$$
    where $\tilde{a}_0 = a_0 + 0.5dKT$ and $\tilde{b}_0 = b_0 + 0.5\sum_{k=1}^{K}\sum_{j=1}^{d}(\ln\lambda^*_{kj:})^T\tilde{\Gamma}_{kj}^{-1}\ln\lambda^*_{kj:}$, with $[\tilde{\Gamma}_{kj}]_{il} = d_1^{|t_i - t_l|}$.
  • Similar to $c_1$, we update $d_1$ by the Metropolis-Hastings algorithm. The proposal $d_1^{(\tau+1)}$ is generated from the distribution
    $$q(d_1^{(\tau+1)} \mid d_1^{(\tau)}) = \mathcal{N}_{(0,1)}(d_1^{(\tau+1)}; d_1^{(\tau)}, 1). \tag{31}$$
    The acceptance probability for the proposed $d_1^{(\tau+1)}$ is $\min\left(1, \alpha(d_1^{(\tau+1)}, d_1^{(\tau)})\right)$, where
    $$\alpha(d_1^{(\tau+1)}, d_1^{(\tau)}) = \frac{|\Gamma_{kj}^{-1}(d_1^{(\tau)})|^{dK/2}}{|\Gamma_{kj}^{-1}(d_1^{(\tau+1)})|^{dK/2}}\exp\left\{-\tfrac12\left(d_1^{(\tau+1)\,2} - d_1^{(\tau)\,2}\right)\right\} \times \exp\left\{\tfrac12\left(\sum_{k=1}^{K}\sum_{j=1}^{d}(\ln\lambda^*_{kj:})^T\Gamma_{kj}^{-1}(d_1^{(\tau)})\ln\lambda^*_{kj:} - \sum_{k=1}^{K}\sum_{j=1}^{d}(\ln\lambda^*_{kj:})^T\Gamma_{kj}^{-1}(d_1^{(\tau+1)})\ln\lambda^*_{kj:}\right)\right\}. \tag{32}$$

A.2 VB Inference

The log-normal priors placed on the Poisson intensities introduce non-conjugacy, which creates difficulty for VB inference. Therefore, we employ a point estimate for the Poisson intensities, obtained by maximizing the lower bound $\mathcal{F}$. For the GP hyperparameters $c_1$ and $d_1$, the truncated normal prior also introduces non-conjugacy, and these parameters are likewise handled by point estimation, maximizing the VB lower bound. The update equations for the posterior inference of $\Theta$ are summarized below. In our model,

$$\Theta = \left\{\{\lambda^*_{kj:}\}_{j=1,\dots,d;\, k=1,\dots,K},\ \{B_k\}_{k=1,\dots,K},\ \{Z_k(s_{it})\}_{t=1,\dots,T;\, i=1,\dots,M;\, k=1,\dots,K},\ c_0, c_1, d_0, d_1\right\}.$$
  • The lower bound for the Poisson intensity $\lambda^*_{kj:}$ may be derived as
    $$\mathcal{F}(\lambda^*_{kj:}) = -\tfrac12\Lambda_{kj}^T\Gamma_{kj}^{-1}\Lambda_{kj} - Q_{kj}^T e^{\Lambda_{kj}} + R_{kj}^T\Lambda_{kj} + \mathrm{constant} \tag{33}$$
    where $\Lambda_{kj} = \log(\lambda^*_{kj:})$, $R_{kj} = \left[\sum_{i=1}^{M}\langle w_k(s_{i1})\rangle\nu_{ij1}, \dots, \sum_{i=1}^{M}\langle w_k(s_{iT})\rangle\nu_{ijT}\right]^T$, and $Q_{kj} = \left[\sum_{i=1}^{M}\langle w_k(s_{i1})\rangle, \dots, \sum_{i=1}^{M}\langle w_k(s_{iT})\rangle\right]^T$, with $\langle\cdot\rangle$ denoting the expectation such that $\langle w_k(s_{it})\rangle = q(w_k(s_{it}) = 1)$ (see Section 2 for details of $w_k(s_{it})$). The point estimate for $\lambda^*_{kj:}$ can be updated at each VB iteration by maximizing the lower bound $\mathcal{F}(\lambda^*_{kj:})$. One may easily verify that $\mathcal{F}(\lambda^*_{kj:})$ is a concave function of $\Lambda_{kj}$, and therefore a global maximum can be obtained by any appropriate convex optimization method (a sketch appears after this list). Note that if $\Gamma_{kj}^{-1} \to 0$ (setting large variance for the prior distribution), by taking the derivative of (33) and setting it to zero, we have $\lambda^*_{kj:} = e^{\Lambda_{kj}} = R_{kj}/Q_{kj}$ (elementwise division), which is consistent with the update equation obtained if independent gamma priors are placed on $\lambda^*_{kjt}$ for $t = 1, \dots, T$. Therefore, the GP prior represented by $\Gamma_{kj}$ introduces correlation among the components of $\lambda^*_{kj:}$.
  • To update the variational distribution for the kernel weights $\beta_{kjt}$, note that the logistic link function $\sigma(\cdot)$ is not within the exponential family and therefore introduces non-conjugacy. We here follow Jaakkola and Jordan (1998) by introducing a variational bound using the inequality
    $$\sigma(y)^z\left[1 - \sigma(y)\right]^{1-z} = \sigma(x) \ge \sigma(\eta)\exp\left(\frac{x - \eta}{2} - f(\eta)(x^2 - \eta^2)\right)$$
    where $x = (2z-1)y$, $f(\eta) = \frac{\tanh(\eta/2)}{4\eta}$, and $\eta$ is a variational parameter. An exact bound is achieved at $\eta = \pm x$. If we reorder the entries of $B_k$ (and the associated $\Omega_k$) in (8) such that $B_k = [\beta_{k:1}, \dots, \beta_{k:T}]^T$, the update equation for $B_k$ can be expressed as
    $$q(B_k) = \mathcal{N}\left((\Omega_k^{-1} + U_k)^{-1}Y_k,\ (\Omega_k^{-1} + U_k)^{-1}\right) \tag{34}$$
    where $U_k$ is a $(J+1)T \times (J+1)T$ block-diagonal matrix with the $t$th $(J+1) \times (J+1)$ block expressed as
    $$u_{kt} = 2\sum_{i=1}^{M} f(\eta_{kit})\,\phi_{kit}\phi_{kit}^T$$
    and $Y_k$ is a $(J+1)T \times 1$ vector formed by concatenating the $T$ vectors
    $$y_{kt} = \sum_{i=1}^{M}\left(\langle Z_k(s_{it})\rangle - \tfrac12\right)\phi_{kit}, \qquad t = 1, \dots, T.$$
    In the above expressions $\phi_{kit} = [1, \mathcal{K}(s_{it}, \hat{s}_1; \psi_k), \dots, \mathcal{K}(s_{it}, \hat{s}_J; \psi_k)]^T$. The variational parameters $\eta_{kit}$ are then updated as
    $$\eta_{kit}^2 = \phi_{kit}^T\langle\beta_{k:t}\beta_{k:t}^T\rangle\phi_{kit} \tag{35}$$
    where $\langle\beta_{k:t}\beta_{k:t}^T\rangle = \mathrm{Cov}(\beta_{k:t}, \beta_{k:t}) + \langle\beta_{k:t}\rangle\langle\beta_{k:t}\rangle^T$ may be evaluated from $q(B_k)$ using the mean and covariance associated with time $t$.
  • The variational distribution for the binary indicator $Z_k(s_{it})$ may be updated as
    $$q(Z_k(s_{it}) = 1) = \frac{1}{1 + \exp(-\langle\rho_{kit}\rangle)} \tag{36}$$
    with
    $$\langle\rho_{kit}\rangle = \prod_{l<k}\left(1 - \langle Z_l(s_{it})\rangle\right)\langle\log p(\nu_{it}\mid\lambda^*_{kt})\rangle - \sum_{k'>k}\langle Z_{k'}(s_{it})\rangle\prod_{l<k',\, l \ne k}\left(1 - \langle Z_l(s_{it})\rangle\right)\langle\log p(\nu_{it}\mid\lambda^*_{k't})\rangle + \sum_{j=1}^{J}\langle\beta_{kjt}\rangle\mathcal{K}(s_{it}, \hat{s}_j; \psi_k) + \langle\beta_{k0t}\rangle$$
    where $\log p(\nu_{it}\mid\lambda^*_{kt})$ is the data log-likelihood from the Poisson distribution, $\log p(\nu_{it}\mid\lambda^*_{kt}) = \log\left(\prod_{j=1}^{d}\mathrm{Poisson}(\nu_{ijt}\mid\lambda^*_{kjt})\right)$, and the expectation $\langle\beta_{kjt}\rangle$ can be obtained from $q(B_k)$.
  • Due to the non-conjugacy of the sigmoid function, we cannot acquire a closed-form variational distribution for $\psi_k$. However, we can sample it from its posterior distribution by establishing a discrete set of potential kernel widths $\{\psi^*_l\}_{l=1,\dots,L}$. The posterior distribution for each $\psi_k$ is represented as
    $$p(\psi_k = \psi_l^*) \propto \exp\left\{\sum_{t=1}^{T}\sum_{i=1}^{M}\langle w_k(s_{it})\rangle\langle\log\sigma(g_{kl}(s_{it}))\rangle\right\} \times \exp\left\{\sum_{t=1}^{T}\sum_{i=1}^{M}\sum_{k'>k}\langle w_{k'}(s_{it})\rangle\langle\log\left(1 - \sigma(g_{kl}(s_{it}))\right)\rangle\right\}, \tag{37}$$
    where $g_{kl}(s_{it}) = \sum_{j=1}^{J}\beta_{kjt}\mathcal{K}(s_{it}, \hat{s}_j; \psi^*_l) + \beta_{k0t}$. The detailed calculations of $\langle\log\sigma(g_{kl}(s_{it}))\rangle$ and $\langle\log(1 - \sigma(g_{kl}(s_{it})))\rangle$ can be found in Ren et al. (2011).
  • The variational distribution for $c_0$ may be updated as
    $$q(c_0) = \mathrm{Gamma}(c_0; \tilde{a}_0, \tilde{b}_0), \tag{38}$$
    with $\tilde{a}_0 = a_0 + 0.5KT(J+1)$ and $\tilde{b}_0 = b_0 + 0.5\sum_{k=1}^{K}\sum_{j=0}^{J}\sum_{i=1}^{T}\sum_{l=1}^{T}[\tilde{\Sigma}_{kj}^{-1}]_{il}\langle\beta_{kji}\beta_{kjl}\rangle$, where $[\tilde{\Sigma}_{kj}]_{il} = c_1^{|t_i - t_l|}$.
  • The VB lower bound for $c_1$ may be derived as
    $$\mathcal{F}(c_1) = \log\mathcal{N}_{(0,1)}(c_1; 0, 1) + \sum_{k=1}^{K}\langle\log\mathcal{N}(B_k; 0, \Omega_k)\rangle + \mathrm{constant}. \tag{39}$$
    The point estimate for $c_1$ can be updated at each VB iteration by maximizing the lower bound $\mathcal{F}(c_1)$.
  • Since a point estimate of $\lambda^*_{kj:}$ is employed at each VB iteration, the variational distribution for $d_0$ takes the same form as (30),
    $$q(d_0) = \mathrm{Gamma}(d_0; \tilde{a}_0, \tilde{b}_0), \tag{40}$$
    where $\tilde{a}_0 = a_0 + 0.5dKT$ and $\tilde{b}_0 = b_0 + 0.5\sum_{k=1}^{K}\sum_{j=1}^{d}(\ln\lambda^*_{kj:})^T\tilde{\Gamma}_{kj}^{-1}\ln\lambda^*_{kj:}$.
  • Similarly, the lower bound for $d_1$ is
    $$\mathcal{F}(d_1) = \log\mathcal{N}_{(0,1)}(d_1; 0, 1) + \sum_{k=1}^{K}\sum_{j=1}^{d}\log\mathcal{N}(\Lambda_{kj}; 0, \Gamma_{kj}) + \mathrm{constant} \tag{41}$$
    and the point estimate for $d_1$ is obtained by maximizing $\mathcal{F}(d_1)$.

By following (33)-(41), the model parameters and GP hyperparameters can be updated iteratively until convergence. In our experiments, we observed fast convergence; typically the relative change of the lower bound falls below $10^{-4}$ within 100 iterations.

References

  1. Achcar JA, Rodrigues ER, Tzintzun G. Using non-homogeneous Poisson models with multiple change-points to estimate the number of ozone exceedances in Mexico City. Environmetrics. 2011;22:1–12.
  2. Adams RP, Murray I, MacKay D. Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. International Conference on Machine Learning; 2009. pp. 9–16.
  3. Beal MJ. Variational algorithms for approximate Bayesian inference. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London; 2003.
  4. Chakraborty A, Gelfand AE. Analyzing spatial point patterns subject to measurement error. Bayesian Analysis. 2010;5:97–122.
  5. Diggle PJ. Statistical Analysis of Spatial Point Patterns. 2nd edition. Arnold; 2003.
  6. Diggle PJ, Menezes R, Su T. Geostatistical inference under preferential sampling (with discussion). Journal of the Royal Statistical Society - Series C. 2010;59:191–232.
  7. Heikkinen J, Arjas E. Non-parametric Bayesian estimation of a spatial Poisson intensity. Scandinavian Journal of Statistics. 1998;25:435–450.
  8. Hoffman M, Blei D, Bach F. Online learning for latent Dirichlet allocation. Advances in Neural Information Processing Systems; Vancouver, Canada; 2010. pp. 993–1022.
  9. Hossain MM, Lawson AB. Approximate methods in Bayesian point process spatial models. Computational Statistics and Data Analysis. 2009;53:2831–2842. doi:10.1016/j.csda.2008.05.017.
  10. Jaakkola T, Jordan MI. Bayesian parameter estimation through variational methods. Statistics and Computing. 1998;10:25–37.
  11. Ji C, Merl D, Kepler TB. Spatial mixture modeling for unobserved point processes: Examples in immunofluorescence histology. Bayesian Analysis. 2009;4:297–315. doi:10.1214/09-BA411.
  12. Kottas A, Sansó B. Bayesian mixture modeling for spatial Poisson process intensities, with applications to extreme value analysis. Journal of Statistical Planning and Inference. 2007;137:3151–3163.
  13. Luttinen J, Ilin A. Variational Gaussian-process factor analysis for modeling spatio-temporal data. Advances in Neural Information Processing Systems; Vancouver, Canada; 2009. pp. 1177–1185.
  14. Møller J, Syversveen AR, Waagepetersen RP. Log Gaussian Cox processes. Scandinavian Journal of Statistics. 1998;25:451–482.
  15. Møller J, Waagepetersen RP. Statistical Inference and Simulation for Spatial Point Processes. Chapman & Hall/CRC; 2004.
  16. Pati D, Reich BJ, Dunson DB. Bayesian geostatistical modeling with informative sampling locations. Biometrika. 2010;98:35–48. doi:10.1093/biomet/asq067.
  17. Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. MIT Press; 2006.
  18. Rathbun SL, Cressie N. Asymptotic properties of estimators for the parameters of spatial inhomogeneous Poisson point processes. Advances in Applied Probability. 1994;26:122–154.
  19. Ren L, Du L, Carin L, Dunson DB. Logistic stick-breaking process. Journal of Machine Learning Research. 2011;12:203–239.
  20. Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica. 1994;4:639–650.
  21. Taddy M. Autoregressive mixture models for dynamic spatial Poisson processes: Application to tracking intensity of violent crime. Journal of the American Statistical Association. 2010;105:1403–1417.
  22. Taddy M, Kottas A. Mixture modeling for marked Poisson processes. Bayesian Analysis. 2012;7:335–362.
  23. Taddy MA. Bayesian nonparametric analysis of conditional distributions and inference for Poisson point processes. Ph.D. thesis, Statistics and Stochastic Modeling, University of California, Santa Cruz; 2008.
  24. Tipping ME. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research. 2001;1:211–244.
  25. Turner R, Møller J, Hazelton M. Residual analysis for spatial point processes (with discussion). Journal of the Royal Statistical Society - Series B. 2005;67:617–666.
  26. Wolpert R, Ickstadt K. Poisson/Gamma random field models for spatial statistics. Biometrika. 1998;85:251–267.
