Abstract
In disease surveillance applications, the disease events are modeled by spatio-temporal point processes. We propose a new class of semiparametric generalized linear mixed model for such data, where the event rate is related to some known risk factors and some unknown latent random effects. We model the latent spatio-temporal process as spatially correlated functional data, and propose Poisson maximum likelihood and composite likelihood methods based on spline approximations to estimate the mean and covariance functions of the latent process. By performing functional principal component analysis to the latent process, we can better understand the correlation structure in the point process. We also propose an empirical Bayes method to predict the latent spatial random effects, which can help highlight hot areas with unusually high event rates. Under an increasing domain and increasing knots asymptotic framework, we establish the asymptotic distribution for the parametric components in the model and the asymptotic convergence rates for the functional principal component estimators. We illustrate the methodology through a simulation study and an application to the Connecticut Tumor Registry data.
Keywords: Composite likelihood, Functional data, Latent process, Point process, Semi-parametric methods, Splines, Spatio-temporal data, Strong mixing
1 Introduction
Spatio-temporal point patterns commonly arise in many fields including ecology, epidemiology and seismology (e.g., Brix and Møller, 2001; Diggle, 2006; Schoenberg, 2003). The log-Gaussian Cox processes (LGCPs), first introduced by Møller et al. (1998) in the spatial case and later extended to the spatio-temporal setting by Brix and Møller (2001) and Brix and Diggle (2001), provide a wide class of useful models for such data. For a typical spatio-temporal LGCP, the intensity function is assumed to be a log-linear model of some latent spatio-temporal Gaussian process, where the mean of the process may depend on some observed covariates. Borrowing ideas from recent developments in functional data analysis (Ramsay and Silverman, 2005), we model the latent temporal process at any fixed spatial location as a functional process with a standard functional principal component expansion. We allow the functional principal component scores at different locations to be spatially correlated. The proposed model can accommodate both nonparametric temporal trend and spatio-temporal correlations in the point process.
In functional data analysis (FDA), the data considered are collections of curves, which are usually modeled as independent realizations of a stochastic process. Some recent papers on this topic include Yao, Müller and Wang (2005ab), Hall and Hosseini-Nasab (2006), Hall, Müller and Wang (2006) and Li and Hsing (2010ab). Di et al. (2009) and Zhou et al. (2010) studied multi-level functional data, where functional data at the lower level of the hierarchy are allowed to be temporally correlated. All the aforementioned papers considered only Gaussian type of functional data. Recently, Hall, Müller and Yao (2008) studied generalized longitudinal data, where the non-Gaussian longitudinal trajectories are linked to some Gaussian latent processes through a nonlinear link function and these latent random processes are modeled as functional data. For such non-Gaussian longitudinal data, Hall et al. proposed a nonparametric estimation procedure based on a delta method, which is an approximation by ignoring the higher order influence of the latent processes. There has also been some recent work on functional data modeling of point processes, including Bouzas et al. (2006), Illian et al. (2006) and Wu et al. (2013). These authors considered data with independent replicates of the point process and modeled a summary measure of the point process (e.g. the intensity function or the L-function) as functional data.
To develop FDA tools for spatio-temporal point processes, we encounter many new challenges, and our proposed method hence differs from those in the literature in a number of ways. First, in most FDA papers in the literature, the data consist of n independent units (subjects). In our setting, however, there is only one realization of the spatio-temporal process, and the data are correlated both spatially and temporally. Second, unlike the scenarios considered in the classic FDA literature where the functional trajectories can be directly observed, the functional data in our setting are latent processes that determine the rate of events. To estimate the covariance structure of the process, we propose a novel method based on composite likelihood and spline approximation. We develop asymptotic properties of our estimators under an increasing domain asymptotic framework. Third, we perform spatial prediction of the latent principal component scores using an empirical Bayes method. These predicted spatial random effects can be put into maps to highlight hot areas with unusually high event rates or increasing trends in event rates. Such information can be valuable to government agencies when making public health policies.
Our work is motivated by cancer surveillance data collected by the Connecticut Tumor Registry (CTR). The CTR is a population-based resource for examining cancer patterns in Connecticut, and its computerized database includes all reported cancer cases diagnosed in Connecticut residents from 1935 to the present. Our primary interest here is to study the spatio-temporal pattern of pancreatic cancer incidences based on 8,230 pancreatic cancer cases in the CTR database from 1992 to 2009. The residential addresses and time of diagnosis are both available and are assumed to be generated by a spatio-temporal point process.
The rest of the paper is organized in the following way. We introduce the model assumptions in Section 2 and propose our estimation procedures in Section 3. Then we study the asymptotic properties of the proposed estimators in Section 4. The proposed methods are tested by a simulation study in Section 5 and are applied to the CTR data in Section 6. Assumptions for our asymptotic theory are collected in the appendix. All technical proofs and implementation details, including variance estimation, model selection and model diagnostic, are provided in the online Supplementary Material.
2 Model Assumptions
Let N denote a spatio-temporal point process that is observed on W = D⊗T, where D ⊂ ℝ2 is a spatial domain and T is a time domain. Let X be an L2 Gaussian random field on W. We assume that conditional on X, N is a Poisson process with an intensity function λ(s, t) given by
$$\lambda(s,t)=g^{-1}\{Z^T(s,t)\beta+X(s,t)\}, \tag{1}$$
where g is a known link function such that g−1(·) is nonnegative, Z(s, t) is a d-dimensional covariate vector, and X represents spatio-temporal random effects that cannot be explained by Z.
In this paper, we will focus on the log link function, i.e. g(·) = log(·). The model given in (1) then becomes an LGCP model. In point process literature, the effect of the covariate Z(s, t) is often assumed to be parametric (Møller and Waagepetersen, 2007), although nonparametric approaches have also been recently proposed (Guan, 2008a; Guan and Wang, 2010). Similarly, a parametric model is generally used for the covariance structure of the latent process X(s, t). For example, Brix and Diggle (2001) assumed a covariance structure from a class of Ornstein-Uhlenbeck processes, while Diggle et al. (2005) used a parametric covariance model that is stationary both in space and time. However, we are not aware of any existing literature that models the latent process in a spatio-temporal log-Gaussian Cox process nonparametrically, as we do next.
For a fixed location s, X(s, t) can be considered as an L2 Gaussian process on T, and hence by the standard Karhunen-Loève expansion (Ash and Gardner, 1975),
$$X(s,t)=\mu(t)+\sum_{j=1}^{p}\xi_j(s)\psi_j(t), \tag{2}$$
where μ(t) = E{X(s, t)} with the expectation taken over all locations, ψj(·)'s are orthonormal functions, and ξj(·)'s are independent spatial Gaussian random fields. We assume that ξj(s) is a zero-mean random field with variance ωj and covariance function, Cj(s1, s2) = cov{ξj(s1), ξj(s2)}, for j = 1, 2, ⋯, p. The functions ψj(·)'s are called the eigenfunctions of process X. We assume that ψj(·)'s are kept in a descending order of ωj's, i.e. ω1 ≥ ω2 ≥ ⋯ ≥ ωp > 0. The number of principal components p can be ∞ in theory, but is often assumed to be finite for practical considerations.
The general covariance function of X(s, t) is
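$$R(s_1,s_2,t_1,t_2)=\operatorname{cov}\{X(s_1,t_1),X(s_2,t_2)\}=\sum_{j=1}^{p}C_j(s_1,s_2)\psi_j(t_1)\psi_j(t_2), \tag{3}$$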
which implies that X(s, t) is not necessarily stationary in t. Note that the above model coincides with the spatial coregionalization model commonly used for multivariate Gaussian random fields (Gelfand et al., 2004) and is not separable when p > 1. To connect with the FDA literature, it is helpful to consider the covariance function of the latent process X(s, t) at the same location s. By setting s1 = s2, (3) is simplified to
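$$R_T(t_1,t_2)=\operatorname{cov}\{X(s,t_1),X(s,t_2)\}=\sum_{j=1}^{p}\omega_j\psi_j(t_1)\psi_j(t_2), \tag{4}$$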
where ωj and ψj(·)'s are the eigenvalues and eigenfunctions of RT(·, ·). If a consistent estimator R̂T(·, ·) exists, one can then estimate {ωj,ψj(·)} by an eigenvalue decomposition of R̂T using a standard functional data analysis approach (Ramsay and Silverman, 2005).
We estimate the proposed model through the use of the first- and second-order intensity functions of N. Let N(ds, dt) denote the number of events in an infinitesimal window (ds, dt), and let |ds| and |dt| denote the volumes of ds and dt, respectively. The marginal first-order intensity function, which characterizes the probability to observe an event at a given location and time, is defined as
$$\lim_{|ds|,|dt|\to 0}\frac{E\{N(ds,dt)\}}{|ds||dt|}=E[\exp\{Z^T(s,t)\beta+X(s,t)\}]=\exp\{Z^T(s,t)\beta+\gamma(t)\}, \tag{5}$$
where γ(t) = μ(t) + RT(t, t)/2. In the derivation of (5), we use the fact that E{exp(Y)} = exp(μ + σ2/2) for Y ∼ Normal(μ, σ2), together with the covariance of X(s, t) in (4). Non-stationarity in the first-order intensity function can be modeled by including proper spatio-temporal covariates Z(s, t). For example, in our disease surveillance application, non-stationarity in the cancer rate caused by spatially varying population level is accommodated by including population density as a covariate.
The second-order intensity function, which characterizes the correlation within the process, is defined as
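$$\lambda_2(s_1,s_2,t_1,t_2)=\lim_{|ds_1|,|dt_1|,|ds_2|,|dt_2|\to 0}\frac{E\{N(ds_1,dt_1)N(ds_2,dt_2)\}}{|ds_1||dt_1||ds_2||dt_2|}=\exp\Big\{Z^T(s_1,t_1)\beta+\gamma(t_1)+Z^T(s_2,t_2)\beta+\gamma(t_2)+\sum_{j=1}^{p}C_j(s_1,s_2)\psi_j(t_1)\psi_j(t_2)\Big\}, \tag{6}$$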
for (s1, t1) ≠ (s2, t2), where the last equality is a result of the Gaussian assumption for the principal component random processes ξj(s). The Gaussian assumption is also commonly made in other settings such as generalized linear mixed models and spatial hierarchical models (Banerjee et al., 2003).
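The Gaussian step behind the last equality in (6) can be spelled out as follows. This is a sketch that uses only the lognormal moment identity quoted after (5); the shorthand $X_i = X(s_i,t_i)$ is introduced here for illustration, and γ(t) = μ(t) + RT(t, t)/2 is the relation implied by μ̂(t) = γ̂(t) − R̂T(t, t)/2 in Section 3.4.

```latex
\begin{aligned}
E\{\exp(X_1+X_2)\}
 &= \exp\big[E(X_1+X_2)+\tfrac12\operatorname{var}(X_1+X_2)\big] \\
 &= \exp\big[\mu(t_1)+\mu(t_2)+\tfrac12\{R_T(t_1,t_1)+R_T(t_2,t_2)\}+\operatorname{cov}(X_1,X_2)\big] \\
 &= \exp\Big\{\gamma(t_1)+\gamma(t_2)+\sum_{j=1}^{p}C_j(s_1,s_2)\psi_j(t_1)\psi_j(t_2)\Big\}.
\end{aligned}
```

Multiplying by $\exp\{Z^T(s_1,t_1)\beta+Z^T(s_2,t_2)\beta\}$ gives the second-order intensity, and dividing by the product of the two first-order intensities leaves only the exponential of the covariance term.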
Given the first- and second-order intensity functions, the pair correlation function (e.g., Møller and Waagepetersen, 2004) for the point process is
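$$\frac{\lambda_2(s_1,s_2,t_1,t_2)}{\exp\{Z^T(s_1,t_1)\beta+\gamma(t_1)\}\exp\{Z^T(s_2,t_2)\beta+\gamma(t_2)\}}=\exp\Big\{\sum_{j=1}^{p}C_j(s_1,s_2)\psi_j(t_1)\psi_j(t_2)\Big\}. \tag{7}$$
If Cj(s1, s2) is stationary, i.e., it only depends on the spatial lag s1 – s2, then the pair correlation function in (7) is a function of s1 – s2 and the time points (t1, t2). Hence, the point process is second-order intensity reweighted stationary in space (Baddeley et al., 2000).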
3 Estimation procedure
3.1 Estimation of the mean components
The Poisson maximum likelihood (Schoenberg, 2005) method is a general approach to fit parametric models for the intensity function of a point process, where the point process can be purely spatial, temporal or spatio-temporal. Asymptotic properties of the resulting estimator such as consistency and asymptotic normality were considered in Guan and Loh (2007). In the spatio-temporal case, let λ(s, t; θ) be such a model under consideration where θ is some unknown parameter. Then, θ can be estimated by maximizing
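$$\ell(\theta)=\sum_{(s,t)\in N\cap(D\otimes T)}\log\lambda(s,t;\theta)-\int_D\int_T\lambda(s,t;\theta)\,dt\,ds. \tag{8}$$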
In our setting, we will apply the above method to estimate β and γ(t). Observe that λ(s, t; θ) = exp{ZT (s, t)β + γ(t)}. We then modify (8) as
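$$\ell(\beta,\gamma)=\sum_{(s,t)\in N\cap(D\otimes T)}\{Z^T(s,t)\beta+\gamma(t)\}-\int_D\int_T\exp\{Z^T(s,t)\beta+\gamma(t)\}\,dt\,ds. \tag{9}$$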
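As an illustration of how a Poisson log-likelihood of the form (8) might be evaluated in practice, the following minimal sketch sums the log-intensity over the events and approximates the integral by a midpoint rule. The function name `poisson_loglik`, the square window [0, side]² × [0, 1] and the grid size `m` are assumptions made for this example only, not part of the proposed method.

```python
import math


def poisson_loglik(events, log_intensity, side, m=20):
    """Poisson log-likelihood on the window [0, side]^2 x [0, 1].

    events        : list of (x, y, t) tuples (hypothetical data layout)
    log_intensity : callable (x, y, t) -> log lambda(x, y, t)
    m             : number of midpoint-rule cells per axis (illustrative choice)
    """
    # First term: sum of the log-intensity over the observed events.
    point_term = sum(log_intensity(x, y, t) for x, y, t in events)

    # Second term: integral of the intensity, approximated by a midpoint rule.
    hs, ht = side / m, 1.0 / m
    integral = 0.0
    for i in range(m):
        x = (i + 0.5) * hs
        for j in range(m):
            y = (j + 0.5) * hs
            for k in range(m):
                t = (k + 0.5) * ht
                integral += math.exp(log_intensity(x, y, t))
    integral *= hs * hs * ht
    return point_term - integral
```

For a constant intensity c the integral is c times the window volume, which gives a quick sanity check of the quadrature.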
To further parameterize γ(t), we propose to approximate it by regression splines (Zhou et al., 1998; Zhu et al., 2008). For simplicity, we assume the time domain to be T = [0, 1]. Let κj = j/(J1 + 1), j = 0, …, J1 + 1 be equally spaced knots on T, then we can define K1 = J1 + r1 normalized B-spline basis functions of order r1, which form the basis of a spline function space on T. The B-spline basis functions are
$$B_j(t)=(\kappa_j-\kappa_{j-r_1})\,[\kappa_{j-r_1},\dots,\kappa_j](\kappa-t)_+^{r_1-1},\qquad j=1,\dots,K_1,$$
where [κj−r1, …,κj]ϕ(κ) denotes the r1th order divided difference of the function ϕ(κ) on r1 + 1 points κj−r1, …, κj, κj = κmin{max(j,0), J1+1} for j = 1 − r1, …, K1, and (x)+ = max(x, 0).
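In computations, B-splines are usually evaluated by the Cox-de Boor recursion rather than by divided differences. The sketch below is one such evaluation on [0, 1] with equally spaced interior knots and boundary knots repeated `order` times, which mirrors the knot convention above; the function name and arguments are hypothetical.

```python
def bspline_basis(t, n_interior, order):
    """Evaluate all K = n_interior + order B-splines of the given order at t in [0, 1].

    Uses the Cox-de Boor recursion on a clamped knot vector
    (boundary knots 0 and 1 are each repeated `order` times).
    """
    interior = [j / (n_interior + 1) for j in range(1, n_interior + 1)]
    knots = [0.0] * order + interior + [1.0] * order

    # Order-1 (piecewise constant) splines; the last interval is closed at t = 1.
    vals = []
    for i in range(len(knots) - 1):
        left, right = knots[i], knots[i + 1]
        inside = left <= t < right or (t == 1.0 and left < right == 1.0)
        vals.append(1.0 if inside else 0.0)

    # Raise the order one step at a time.
    for k in range(2, order + 1):
        new_vals = []
        for i in range(len(knots) - k):
            term = 0.0
            d1 = knots[i + k - 1] - knots[i]
            if d1 > 0:
                term += (t - knots[i]) / d1 * vals[i]
            d2 = knots[i + k] - knots[i + 1]
            if d2 > 0:
                term += (knots[i + k] - t) / d2 * vals[i + 1]
            new_vals.append(term)
        vals = new_vals
    return vals
```

With six interior knots and cubic splines (order 4), the function returns ten basis values that are nonnegative and sum to one at any t in [0, 1].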
Denote the estimators of β and γ(t) as
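$$(\hat\beta,\hat\gamma)=\arg\max_{\beta\in\mathbb{R}^d,\ \gamma\in\operatorname{span}\{B_1,\dots,B_{K_1}\}}\Big[\sum_{(s,t)\in N\cap(D\otimes T)}\{Z^T(s,t)\beta+\gamma(t)\}-\int_D\int_T\exp\{Z^T(s,t)\beta+\gamma(t)\}\,dt\,ds\Big]. \tag{10}$$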
Let B(t) = {B1(t), …, BK1(t)}T be the vector of spline basis, and write γ(t) = BT (t)v. For convenience of developing asymptotic theory, we denote , and . Then θ̂ is the solution of the estimating equation
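$$\sum_{(s,t)\in N\cap(D\otimes T)}\begin{pmatrix}Z(s,t)\\ B(t)\end{pmatrix}-\int_D\int_T\begin{pmatrix}Z(s,t)\\ B(t)\end{pmatrix}\exp\{Z^T(s,t)\beta+B^T(t)v\}\,dt\,ds=0. \tag{11}$$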
The estimating equation can be solved numerically by a Newton-Raphson algorithm, where the integral in the equation is evaluated numerically. Asymptotic properties of these estimators are studied in Section 4. The number of spline basis functions K1 is often deemed as a tuning parameter in spline smoothing. Selection of this tuning parameter is discussed in Section W.5 of the Supplementary Material.
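The Newton-Raphson iteration can be illustrated on a deliberately simplified version of the problem. The sketch below assumes a single covariate that is constant within each spatial cell and a constant γ (no spline), so the integral reduces to a sum over cells; the function name, the cell-level inputs and the starting values are all assumptions of this example rather than the implementation used in the paper.

```python
import math


def fit_poisson_loglinear(counts, areas, z, n_iter=50, tol=1e-12):
    """Newton-Raphson for the log-likelihood
        sum_c n_c (z_c * beta + gamma) - sum_c a_c * exp(z_c * beta + gamma),
    where a_c is the space-time volume of cell c (simplified, hypothetical setting).
    """
    beta, gamma = 0.0, math.log(sum(counts) / sum(areas))
    for _ in range(n_iter):
        mu = [a * math.exp(zc * beta + gamma) for a, zc in zip(areas, z)]
        # Score vector: observed minus expected counts, weighted by the design.
        g_b = sum((n - m) * zc for n, m, zc in zip(counts, mu, z))
        g_g = sum(n - m for n, m in zip(counts, mu))
        # Information matrix (negative Hessian), a 2 x 2 symmetric matrix.
        h_bb = sum(m * zc * zc for m, zc in zip(mu, z))
        h_bg = sum(m * zc for m, zc in zip(mu, z))
        h_gg = sum(mu)
        det = h_bb * h_gg - h_bg ** 2
        # Newton step: solve information * step = score in closed form.
        d_b = (h_gg * g_b - h_bg * g_g) / det
        d_g = (h_bb * g_g - h_bg * g_b) / det
        beta += d_b
        gamma += d_g
        if abs(d_b) + abs(d_g) < tol:
            break
    return beta, gamma
```

If the counts are set equal to their expected values under some (β, γ), the score vanishes at those values, so the iteration should return them; this gives a convenient check of the code.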
3.2 Estimation of the eigenvalues and eigenfunctions
As mentioned before, the eigenvalues and eigenfunctions, {ωj, ψj(·)}, can be estimated by an eigen-decomposition of the covariance function RT. By (6), the second-order intensity of the point process across time at a given spatial location is
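$$\lambda_{2,s}(t_1,t_2)=\lambda_2(s,s,t_1,t_2)=\exp\{Z^T(s,t_1)\beta+\gamma(t_1)+Z^T(s,t_2)\beta+\gamma(t_2)+R_T(t_1,t_2)\}. \tag{12}$$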
We will approximate RT by tensor product splines. Let {Bj(t);j = 1, …, K2} be B-spline functions with order r2 defined on J2 equally spaced knots on [0, 1]. The tensor product spline basis functions are given by Bjj′(t1, t2) = Bj(t1)Bj′(t2), j, j′ = 1, …, K2. Denote B[2](t1, t2) = (B11, B12, …, B1K2, B21, …, BK2K2)T(t1, t2), and consider the functional space spanned by B[2]. Then the spline approximation for the covariance function is $R_T(t_1,t_2)\approx B_{[2]}^T(t_1,t_2)\,b$, where b is a coefficient vector with one entry for each of the tensor product basis functions.
We estimate b and hence RT by generalizing the composite likelihood approach of Guan (2006) and Waagepetersen (2007). Let λ2(s1, s2, t1, t2; η) be a parametric model for the second-order intensity function of a point process depending on some parameter vector η. Then, η can be estimated by maximizing
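$$\ell_c(\eta)=\sum_{(s_1,t_1)\in N}\ \sum_{(s_2,t_2)\in N\cap\{D\otimes T-(s_1,t_1)\}}w(s_1,s_2,t_1,t_2)\log\lambda_2(s_1,s_2,t_1,t_2;\eta)-\int_{D\otimes T}\int_{D\otimes T-(s_1,t_1)}w(s_1,s_2,t_1,t_2)\,\lambda_2(s_1,s_2,t_1,t_2;\eta)\,ds_2\,dt_2\,ds_1\,dt_1, \tag{13}$$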
where D⊗T − (s1, t1) = {(s2, t2) : (s2, t2) ∈ D⊗T, and (s2, t2) ≠ (s1, t1)} and w(s1, s2, t1, t2) is a prespecified weight function. For our purpose, it is sufficient to estimate λ2,s(t1, t2) as a result of (12). Thus, we may want to consider pairs of events that occurred at the same location. However, for an orderly spatial point process, the probability of observing two events at the same location is zero. Instead, we define a small spatial neighborhood for every event location s, denoted as Ds,δ = {u ∈ D, ∥u − s∥ < δ}, and consider pairs of the given event and any other events within the neighborhood. This can be achieved by defining the weight function as
$$w(s_1,s_2,t_1,t_2)=I(s_2\in D_{s_1,\delta}),$$
where I(·) is an indicator function. We assume that the spatial covariance functions Cj(·)'s are continuous at 0, and we choose δ to be small so that λ2(s1, s2, t1, t2) ≈ λ2,s1(t1, t2) for any s2 ∈ Ds1,δ. The role of δ in our estimation procedure will be discussed in Section 4, after developing the asymptotic theory of the proposed covariance estimator, and a practical criterion to choose δ is provided in the online Supplementary Material.
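The effect of this weight function is simply to restrict attention to pairs of events whose locations fall within the same δ-neighborhood. A brute-force sketch of that pair selection is given below; the function name and the (x, y, t) event layout are hypothetical, and a spatial index would be preferable for large data sets.

```python
import math


def neighbor_pairs(events, delta):
    """Return ordered index pairs (i, j), i != j, whose spatial distance is below delta.

    events : list of (x, y, t) tuples; only the spatial coordinates are compared,
             so the two events of a pair may be far apart in time.
    """
    pairs = []
    for i, (x1, y1, _) in enumerate(events):
        for j, (x2, y2, _) in enumerate(events):
            if i != j and math.hypot(x1 - x2, y1 - y2) < delta:
                pairs.append((i, j))
    return pairs
```

Each unordered pair appears twice, once in each order, which matches the double sum over ordered pairs in the composite likelihood.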
With the above modifications, the composite likelihood criterion in (13) becomes
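$$\ell_c(R_T,\beta,\gamma)=\sum_{(s_1,t_1)\in N}\ \sum_{(s_2,t_2)\in N\cap\{D_{s_1,\delta}\otimes T-(s_1,t_1)\}}\{Z^T(s_1,t_1)\beta+\gamma(t_1)+Z^T(s_2,t_2)\beta+\gamma(t_2)+R_T(t_1,t_2)\}-\int_{D\otimes T}\int_{D_{s_1,\delta}\otimes T-(s_1,t_1)}\exp\{Z^T(s_1,t_1)\beta+\gamma(t_1)+Z^T(s_2,t_2)\beta+\gamma(t_2)+R_T(t_1,t_2)\}\,ds_2\,dt_2\,ds_1\,dt_1, \tag{14}$$
where Ds1,δ ⊗ T − (s1, t1) = {(s2, t2) : ∥s2 − s1∥ ≤ δ, and (s2, t2) ≠ (s1, t1)}. We propose to estimate the covariance function as the maximizer of the composite likelihood restricted in the spline space, i.e.,
$$\hat R_T(\cdot,\cdot)=\arg\max_{R_T\in\operatorname{span}\{B_{jj'};\ j,j'=1,\dots,K_2\}}\ell_c(R_T,\hat\beta,\hat\gamma), \tag{15}$$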
where β̂ and γ̂ are the estimators defined in (10).
The covariance estimator can be rewritten as , where b̂ is the solution of
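$$\sum_{(s_1,t_1)\in N}\ \sum_{(s_2,t_2)\in N\cap\{D_{s_1,\delta}\otimes T-(s_1,t_1)\}}B_{[2]}(t_1,t_2)-\int_{D\otimes T}\int_{D_{s_1,\delta}\otimes T-(s_1,t_1)}B_{[2]}(t_1,t_2)\,\hat\lambda(s_1,t_1)\hat\lambda(s_2,t_2)\exp\{B_{[2]}^T(t_1,t_2)b\}\,ds_2\,dt_2\,ds_1\,dt_1=0, \tag{16}$$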
where λ̂(s, t) = exp{ZT (s, t)β̂+ γ̂ (t)}. When the neighborhood Ds,δ is sufficiently small and the number of knots of the spline basis is sufficiently large, (16) is an approximately unbiased estimating equation.
The estimates of the eigenvalues and eigenfunctions are obtained by solving the eigen-decomposition problems
$$\int_T\hat R_T(t_1,t_2)\hat\psi_j(t_2)\,dt_2=\hat\omega_j\hat\psi_j(t_1),\qquad j=1,\dots,p. \tag{17}$$
Since our estimator R̂T (·, ·) is constrained in a functional subspace spanned by tensor products of a spline basis B(·), each estimated eigenfunction lies in the span of the same spline basis. Hence, the functional eigen-decomposition problem in (17) can be translated into a multivariate problem. Notice that our estimator R̂T is inherently symmetric because the same pairs of events contribute equally in estimating RT(t1, t2) and RT(t2, t1). We can arrange the coefficient vector b̂ into a symmetric matrix Ĝ, so that R̂T(t1, t2) = BT (t1)ĜB(t2). Define an inner product matrix $\mathcal{J}$ = ∫T B(t)BT (t)dt, then the eigen-decomposition problem in (17) is equivalent to the multivariate generalized eigenvalue decomposition
$$\mathcal{J}\hat G\mathcal{J}\hat\phi_j=\hat\omega_j\mathcal{J}\hat\phi_j,\qquad \hat\phi_j^T\mathcal{J}\hat\phi_{j'}=I(j=j'), \tag{18}$$
where I(·) is an indicator function. Then, ψ̂j(t) = BT(t)ϕ̂j, j = 1, …, p.
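To see why the functional problem (17) reduces to a matrix problem, one can substitute the spline representations directly. The sketch below writes $\mathcal{J}$ for the inner product matrix $\int_T B(t)B^T(t)\,dt$ and $\phi$ for a generic coefficient vector (labels chosen here for illustration), and assumes the basis functions are linearly independent so that coefficients can be matched and $\mathcal{J}$ is invertible.

```latex
\begin{aligned}
\int_T \hat R_T(t_1,t_2)\,\hat\psi(t_2)\,dt_2
 &= B^T(t_1)\,\hat G\Big\{\int_T B(t_2)B^T(t_2)\,dt_2\Big\}\phi
  = B^T(t_1)\,\hat G\,\mathcal{J}\,\phi, \\
\hat\omega\,\hat\psi(t_1) &= \hat\omega\,B^T(t_1)\,\phi
 \;\Longrightarrow\; \hat G\,\mathcal{J}\,\phi=\hat\omega\,\phi
 \;\Longleftrightarrow\; \mathcal{J}\hat G\mathcal{J}\,\phi=\hat\omega\,\mathcal{J}\phi, \\
\int_T\hat\psi_j(t)\hat\psi_{j'}(t)\,dt &= \phi_j^T\mathcal{J}\phi_{j'} = I(j=j').
\end{aligned}
```

The last line shows that orthonormality of the eigenfunctions in L2 corresponds to orthonormality of the coefficient vectors with respect to $\mathcal{J}$, which is why a generalized rather than an ordinary eigen-decomposition is required.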
In the procedures described above, selecting the tuning parameters K2 and δ as well as selecting the number of principal components p are important issues, which are addressed in Section W.5 of the Supplementary Material.
3.3 Estimation of the spatial correlation
In the previous section, we estimate the eigenfunctions ψj's and eigenvalues ωj's using pairs of events that occurred within a close distance to avoid the complications of spatial correlation. With ψj's and ωj's consistently estimated, we now estimate the spatial correlation functions using another composite likelihood that includes pairs of events further apart. Suppose the spatial covariance functions are of a parametric form Cj(s1, s2; ϑj), where ϑj's are unknown parameters. We will focus on stationary covariance models such as the flexible class of Matérn covariance models (Stein, 1999). Stationarity in space is commonly assumed in spatial statistics including spatio-temporal log Gaussian Cox processes (e.g., Brix and Møller, 2001; Diggle et al., 2005). In what follows, we use Cj(s1 − s2; ϑj) instead in order to reflect the assumption of stationarity.
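Define $\vartheta=(\vartheta_1^T,\dots,\vartheta_p^T)^T$. To estimate ϑ, we again modify the composite likelihood (13) through the use of a proper weight function w. Specifically, we use
$$w(s_1,s_2,t_1,t_2)=\frac{I(\|s_1-s_2\|\le\varrho)}{\exp\{Z^T(s_1,t_1)\beta+\gamma(t_1)\}\exp\{Z^T(s_2,t_2)\beta+\gamma(t_2)\}},$$
where ϱ is a pre-specified spatial distance. By (6),
$$w(s_1,s_2,t_1,t_2)\,\lambda_2(s_1,s_2,t_1,t_2)=I(\|s_1-s_2\|\le\varrho)\exp\Big\{\sum_{j=1}^{p}C_j(s_1-s_2;\vartheta_j)\psi_j(t_1)\psi_j(t_2)\Big\}.$$
Thus, we avoid integrating the covariate process Z(s, t) over the entire spatio-temporal domain while evaluating the integral in (13). Let ℓc,spat(ϑ, β, γ, ω, ψ) be the resulting modified composite likelihood, where ω = (ω1, …,ωp)T and ψ(t) = (ψ1, …, ψp)T(t). We substitute (β, γ, ω, ψ) with their estimators described in Sections 3.1 and 3.2 and therefore define the estimator of ϑ as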
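$$\hat\vartheta=\arg\max_{\vartheta}\ \ell_{c,\mathrm{spat}}(\vartheta,\hat\beta,\hat\gamma,\hat\omega,\hat\psi). \tag{19}$$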
The proposed weight function w excludes the pairs of events with a distance greater than ϱ, since the spatial correlation tends to diminish as the spatial lag increases and including events that are too far apart may provide little information about the correlation function. The parameter ϱ can be considered as a tuning parameter. A reasonable choice of ϱ is the range of the spatial correlation, which can be estimated by fitting a pilot parametric model to the data or by checking the empirical pair correlation function of the point pattern (e.g., Guan, 2008b).
3.4 Prediction of the spatial random effects
To predict the random fields ξk(s), we use a maximum a posteriori (MAP) predictor as in Møller et al. (1998). For ease of presentation, we assume that the spatial domain D is a rectangle [0, L]2. We partition D into smaller rectangles, Dij = [(i − 1)/M, i/M) × [(j − 1)/M, j/M), i, j = 1, …, ML. We take each Dij sufficiently small so that ξk(s) is approximately a constant for s ∈ Dij, and denote this value as ξij,k, for k = 1, …, p. Given ξij = (ξij,1, …, ξij,p)T, the conditional log-likelihood for the events in Dij ⊗ T is
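$$\ell_{ij}(\xi_{ij})=\sum_{(s,t)\in N\cap(D_{ij}\otimes T)}\Big\{Z^T(s,t)\beta+\mu(t)+\sum_{k=1}^{p}\xi_{ij,k}\psi_k(t)\Big\}-\int_{D_{ij}}\int_T\exp\Big\{Z^T(s,t)\beta+\mu(t)+\sum_{k=1}^{p}\xi_{ij,k}\psi_k(t)\Big\}\,dt\,ds. \tag{20}$$
By the model, ξk has a prior distribution Normal(0, Σk), where Σk is the covariance matrix for the k-th principal component obtained by interpolating Ck(s1 − s2; ϑk) on the discrete grid points. Collect the grid point values of the kth principal component score into ξk = {ξij,k; i, j = 1, …, ML}, then the log posterior density of ξ = {ξk; k = 1, …, p} is, up to an additive constant,
$$\log f(\xi\mid N)=\sum_{i,j}\ell_{ij}(\xi_{ij})-\frac12\sum_{k=1}^{p}\xi_k^T\Sigma_k^{-1}\xi_k. \tag{21}$$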
We substitute β, ψk(t) and ωk with their estimators defined above, and μ(t) with μ̂(t) = γ̂(t) − R̂T(t, t)/2. The empirical Bayes estimator ξ̂ is then obtained by maximizing the posterior (21). We can also draw samples from the posterior (21) using the Metropolis-adjusted Langevin algorithm (MALA) described in Møller et al. (1998), and estimate the prediction error by the posterior variance.
Choosing the partition in spatial prediction is a compromise between prediction bias and computational feasibility. The latent processes are defined in a continuous space, hence a finer spatial grid leads to smaller bias. On the other hand, using a finer grid increases the dimension of the latent random vector ξk and makes it harder to simulate from the posterior distribution in (21). Specifically, a higher dimension of ξk makes it harder for the Markov chain to mix and, as a result, the Markov chain takes a longer time to converge. In many real applications such as the CTR data considered in this paper, there are natural choices for the partition of the spatial domain, e.g. we used the census tracts to partition the state of Connecticut.
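The shrinkage behavior of the MAP predictor can be seen in a toy version of the problem. The sketch below assumes a single cell, a single principal component with ψ1 ≡ 1, and no spatial correlation with other cells, so the log posterior reduces to nξ − m0·exp(ξ) − ξ²/(2ω), where n is the event count and m0 the expected count when ξ = 0. These simplifications and all names are assumptions of the example, not the full procedure based on (21).

```python
import math


def map_score_single_cell(n_events, baseline, omega, n_iter=100, tol=1e-12):
    """Maximize n*xi - baseline*exp(xi) - xi^2/(2*omega) by Newton's method.

    n_events : observed number of events in the cell
    baseline : expected number of events when the random effect xi is zero
    omega    : prior variance of the random effect
    """
    xi = 0.0
    for _ in range(n_iter):
        grad = n_events - baseline * math.exp(xi) - xi / omega
        hess = -baseline * math.exp(xi) - 1.0 / omega  # always negative (concave)
        step = grad / hess
        xi -= step
        if abs(step) < tol:
            break
    return xi
```

With 20 events against a baseline of 10, the unpenalized estimate would be log 2, and the Gaussian prior pulls the prediction toward zero, so the result lies strictly between 0 and log 2.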
4 Asymptotic properties
To distinguish from other possible values in the parameter space, we denote the true parameters (functions) as β0, γ0, RT0, ωj0 and ψj0. We study the asymptotic properties of the proposed estimators under an increasing domain asymptotic framework as in Guan (2006). We consider a sequence of spatial domains Dn with expanding areas, but the time domain T remains fixed. Specifically, we assume
| (22) |
where |∂Dn| denotes the perimeter of Dn. Condition (22) is satisfied by many commonly encountered shape sequences. For example, let D ⊂ (0, 1] × (0, 1] be the interior of a simple closed curve with nonempty interior. If we multiply D by n to obtain Dn, then Dn satisfies (22). This formulation allows for a wide variety of shapes as the sequence of observation windows, including both rectangular and elliptical regions.
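For any function f(x) defined on a compact set $\mathcal{I}$, where $\mathcal{I}$ ⊂ ℝ or ℝ2, define the supremum and L2 norms of f to be $\|f\|_\infty=\sup_{x\in\mathcal{I}}|f(x)|$ and $\|f\|=\{\int_{\mathcal{I}}f^2(x)\,dx\}^{1/2}$, respectively. For any m dimensional vector a, define its L2 norm $\|a\|=(a^Ta)^{1/2}$, and its L∞ norm $\|a\|_\infty=\max_{1\le i\le m}|a_i|$. For any real valued m1 × m2 matrix A = (aij), define its L2 norm as $\|A\|=\sup_{x\in\mathbb{R}^{m_2},\,x\neq 0}\|Ax\|/\|x\|$, its L∞ norm as $\|A\|_\infty=\max_{1\le i\le m_1}\sum_{j=1}^{m_2}|a_{ij}|$, and its Frobenius norm as $\|A\|_F=\{\operatorname{tr}(A^TA)\}^{1/2}$.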
Theorem 1
Let T be a fixed time domain, Dn satisfies condition (22), then under Assumptions 1-5 in Appendix A,
The convergence rate in Theorem 1 is not optimal. A more detailed study in Theorem 2 below reveals that β̂ converges to β0 at a parametric rate and is asymptotically normal, and γ̂(t) converges to γ0(t) at the usual nonparametric rate. To facilitate this result, we first define the residual process in the spatio-temporal pattern (Baddeley et al., 2005) as
| (23) |
We also define
| (24) |
where q(t; β, γ) is defined in Assumption 3, and
| (25) |
where x⊗2 = xxT for any vector x.
Theorem 2
Under the same conditions as in Theorem 1, we have the following weak convergence result
where ΣZ,0 is a shorthand for ΣZ(β0,γ0) and
A tighter asymptotic convergence rate for γ̂ is .
For statistical inference, we need to estimate the covariance matrix of β̂. We follow Heinrich and Prokešová (2010) to derive a consistent moment estimator for cov(β̂). Details of the derivations are given in the Web Appendix B. We also outline a strategy on how to estimate the variance of γ̂(·), in light of the fact that both β̂ and γ̂(·) are obtained by solving the estimating equation (11).
Next, we study the asymptotic properties of the estimated covariance function and those of the estimated eigenvalues and eigenfunctions. The radius of the local neighborhood in the composite likelihood (14) should depend on n. However, we will continue to use δ for ease of exposition.
Theorem 3
Assume that condition (22) and Assumptions 1-9 in Appendix A are true. Then, .
Theorem 3 implies that the radius parameter δ plays a similar role to the bandwidth used in nonparametric regressions. As such, there is a trade-off between bias and variance when choosing the optimal δ. Specifically, increasing δ will include more data into the estimation and hence reduce the variance of R̂T, but it will increase the bias due to the use of pairs of events that are much further apart; the opposite can be said when decreasing δ. A practical method to select δ is proposed in Section W.5.2.
Following Theorem 1 in Hall and Hosseini-Nasab (2006), the following asymptotic properties for the functional principal component estimators in (17) and (18) are immediate.
Corollary 1
Letting Δn = ∥ R̂T − RT0∥, then supj |ω̂j − ωj| ≤ Δn. If p is finite, define ωp+1 = 0. Put τj = min1≤k≤j(ωk − ωk+1), J = inf{j ≥ 1 : ωj − ωj+1 ≤ 2Δn}, then ∥ψ̂j − ψj∥ ≤ CΔn/τj, for 1 ≤ j ≤ J − 1.
Theorems 2 and 3 and Corollary 1 show that our estimators β̂, γ̂(·) and {ω̂j,ψ̂j(·);j = 1, …, p} are consistent, and hence by plugging in these consistent estimators the method described in Section 3.3 also provides a consistent estimator for the spatial correlation parameter ϑ. Using the theory in Guan (2006), we have the following corollary.
Corollary 2
Under condition (22), the assumptions in Appendix A and the regularity conditions in Theorem 1 of Guan (2006), ϑ̂ defined by (19) is a consistent estimator of the true correlation parameter ϑ.
5 Simulation study
Let the spatial region be D = [0, 2]⊗2, and the time window be T = [0, 1]. We assume that Z(s) is a one dimensional covariate, which is generated as an isotropic Gaussian random field on D with an exponential covariance structure. In particular, cov{Z(s1), Z(s2)} = exp(−∥s1 − s2∥/ρ), and we set the scale parameter to be ρ = 0.2. The random field X(s, t) is generated with μ(t) = 3 + 2t2 and p = 2 principal components, where (ω1, ω2) = (2, 1), ψ1(t) = 1, and ψ2(t) = √2 cos(2πt). Both principal component scores, ξj(s), j = 1, 2, are generated from Gaussian random fields with isotropic exponential covariance structures Cj(s1 − s2;ϑj) = ωj exp(− ∥s1 − s2∥/ϑj), and the scale parameters ϑj are also set to be 0.2. Both Z(s) and ξj(s) are simulated on a regular grid with increments 0.01, using the RandomFields package in R. The events are generated using rejection sampling.
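Rejection sampling for a point process is often implemented as thinning: simulate a homogeneous Poisson process at a dominating rate and retain each candidate with probability proportional to the target intensity. The sketch below shows this for a deterministic intensity on [0, side]² × [0, 1]; it does not simulate the latent Gaussian fields, and the function name, arguments and the bound `lam_max` are assumptions of the example.

```python
import math
import random


def simulate_by_thinning(intensity, lam_max, side, rng):
    """Simulate a Poisson process on [0, side]^2 x [0, 1] by thinning.

    intensity : callable (x, y, t) -> rate, assumed bounded above by lam_max
    rng       : random.Random instance supplying the randomness
    """
    mean = lam_max * side * side  # expected number of candidates (time length is 1)
    # Draw the Poisson candidate count by counting unit-rate exponential arrivals.
    n, total = 0, rng.expovariate(1.0)
    while total < mean:
        n += 1
        total += rng.expovariate(1.0)
    events = []
    for _ in range(n):
        x, y, t = rng.uniform(0.0, side), rng.uniform(0.0, side), rng.random()
        # Keep the candidate with probability intensity / lam_max.
        if rng.random() < intensity(x, y, t) / lam_max:
            events.append((x, y, t))
    return events


# Example: intensity exp{3 + 2 t^2}, bounded by exp(5) on the time window [0, 1].
example_events = simulate_by_thinning(
    lambda x, y, t: math.exp(3.0 + 2.0 * t * t), math.exp(5.0), 2.0, random.Random(0)
)
```

For an LGCP, the same thinning step would be applied conditionally on a simulated realization of the latent field, with `lam_max` taken as the maximum of the realized intensity over the simulation grid.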
In this setting, the covariance function is RT(t1, t2) = 2 + 2cos(2πt1) cos(2πt2). The two principal components have clear interpretations. The first principal component is a random intercept. If ξ1(s) is high in a location s, the event intensity is high at that location. The second principal component can be interpreted as a periodic random effect. On average, there are 1,661 events in the defined spatio-temporal domain.
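The stated covariance function can be checked numerically against the eigenpairs of the simulation design. The sketch below assumes ψ2(t) = √2 cos(2πt), the form implied by RT(t1, t2) = 2 + 2cos(2πt1)cos(2πt2) together with ω2 = 1, and verifies the eigen-equation ∫ RT(t1, t2)ψ(t2)dt2 = ωψ(t1) on a midpoint grid; the function name and grid size are illustrative.

```python
import math


def check_eigenpair(omega, psi, m=64):
    """Largest absolute error in  int_0^1 R_T(t1, t2) psi(t2) dt2 = omega * psi(t1),
    with the integral approximated by a midpoint rule on m equally spaced points."""
    grid = [(k + 0.5) / m for k in range(m)]

    def r_t(t1, t2):
        # Covariance function of the simulation design.
        return 2.0 + 2.0 * math.cos(2 * math.pi * t1) * math.cos(2 * math.pi * t2)

    worst = 0.0
    for t1 in grid:
        integral = sum(r_t(t1, t2) * psi(t2) for t2 in grid) / m
        worst = max(worst, abs(integral - omega * psi(t1)))
    return worst


err1 = check_eigenpair(2.0, lambda t: 1.0)
err2 = check_eigenpair(1.0, lambda t: math.sqrt(2.0) * math.cos(2 * math.pi * t))
```

Both errors are at the level of floating-point rounding, because the midpoint rule integrates these low-order trigonometric terms over a full period without discretization error.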
The simulation is repeated 200 times, and the proposed model selection and estimation procedures are applied to each simulated data set. We first choose the tuning parameters as described in Section W.5 of the Supplementary Material. The AIC (W.9) picks K1 = 10 for most of the simulated data sets, and the cross-validation procedure in Section W.5.2 chooses K2 = 7 and δ = 0.01 most frequently. Therefore, we fix the value of these tuning parameters for further estimation. Under our choice of δ, we include, on average, one neighboring event for every event in the composite likelihood (14). Next, we apply our second AIC (W.11) to choose the number of principal components, and it chooses the correct number, p = 2, of principal components 57% of the time. We find that AIC tends to choose an over-fitted model, and 88% of the time, it chooses the number of principal components to be between 2 and 4. Such an over-fitting tendency is consistent with what has been discovered in the literature. Since under-fitting is usually a more serious problem than over-fitting, the performance of AIC seems satisfactory.
The estimation results are summarized in Figure 1. In panel (a) of Figure 1, we show the box-plots of β̂, ω̂1 and ω̂2. As we can see, β̂ is nearly unbiased for the true value β0 = 1, which is consistent with our asymptotic theory. The estimated eigenvalues are slightly biased but nevertheless close to the truth. Although these estimators are consistent, some bias is often reported in the FDA literature in finite-sample settings; see Li and Hsing (2010b). This is true even when direct measurements are made on the curves. In our setting, X(s, t) are latent processes, which makes estimation of these parameters considerably harder. In that sense, the behavior of these estimators is reasonable. In panels (b) – (d) of Figure 1, we summarize the estimation results for γ(t) and the two eigenfunctions, where we compare the mean, 5% and 95% pointwise percentiles of the functional estimators with the true functions. The plots suggest that these estimators behave reasonably well. We also provide, in panel (e) of Figure 1, the box-plots of ϑ̂1 and ϑ̂2, which are the spatial correlation parameters for the two principal components estimated using the composite likelihood method in Section 3.3. The box-plots show that these estimates are reasonably close to the true value 0.2. Since the second principal component is less prominent in the data, its spatial correlation is harder to estimate. Consequently, ϑ̂2 is more variable than ϑ̂1.
Figure 1.
Estimation results in the simulation study. Panel (a) shows the boxplots of β̂, ω̂1 and ω̂2. Panels (b) – (d) show the estimation results for γ̂(t), ψ̂1(t) and ψ̂2(t) respectively, where the solid curve in each panel is the true curve, the dashed curve is the mean of the estimator, and the two dotted curves are the pointwise 5% and 95% percentiles of the estimator. Panel (e) shows box plots of the estimated spatial correlation parameters ϑ1 and ϑ2 for the two principal components.
We also perform spatial prediction for the latent processes ξ1(s) and ξ2(s) as described in Section 3.4. Plots of the prediction results in a typical run are provided in Section W.6 of the online Supplementary Material. These predicted maps can provide useful information on hot spatial regions due to clustering.
6 Data Analysis
We apply the proposed methodology to historical cancer incidence records collected by the CTR. The CTR is the oldest cancer registry in the United States. Since the Surveillance, Epidemiology and End Results (SEER) Program was launched by the National Cancer Institute in 1973, the CTR has always been a participating SEER site. The CTR has reciprocal reporting agreements with cancer registries in all adjacent states (and Florida, a popular winter destination for retirees) to identify Connecticut residents with cancer diagnosed and/or treated in these states. For each identified CTR case, both the date of diagnosis and residential address at the time of diagnosis were recorded, along with a list of demographic and diagnostic variables. The longitude and latitude of a diagnosis are recorded in the Universal Transverse Mercator (UTM) coordinate system.
Pancreatic cancer is the fourth most common cause of cancer-related deaths in both men and women in the United States. We consider the CTR data of 8,230 pancreatic cancer incidences that were diagnosed from 1992 to 2009. Our primary interest is to study the spatio-temporal pattern of pancreatic cancer incidences, after having accounted for heterogeneities in both population density and socioeconomic status (SES) scores at the census block group level. There are 2,616 block groups within the state of Connecticut. The SES score is an aggregated measure to reflect poverty level in a neighborhood, where a higher SES score indicates a more deprived neighborhood (Wang et al., 2009).
Similar to model (1), we assume that the conditional intensity for the cancer incidences is
$$\lambda(s,t)=\lambda_0(s)\exp\{Z(s)\beta+X(s,t)\},$$
where λ0(s) and Z(s) are the population density and SES score at s, and X(s, t) is a latent process with the same structure as in (2). We assume that λ0(s) and Z(s) are constants within a block group.
We first estimate the parameters in the first-order intensity. The AIC in (W.9) picks K1 = 9 cubic B-splines to model the function γ(t). We apply the proposed method in Section 3.1 to estimate β and γ(t), and use the method described in Section W.4 of the Supplementary Material to estimate the standard errors of the estimators. The estimated coefficient for the SES score is β̂ = 1.63 × 10−2, with standard error 3.87 × 10−3. We therefore conclude that the SES score is positively associated with the pancreatic cancer rate in a neighborhood. The estimated temporal trend function γ(t) and the 95% confidence bands are presented in Figure 2. The plot suggests that the pancreatic cancer rate was increasing over the years in the study period.
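The conclusion about the SES coefficient can be checked with a short calculation. This sketch assumes a Wald-type interval based on the asymptotic normality of β̂ in Theorem 2, with the reported estimate and standard error plugged in.

```latex
z=\frac{\hat\beta}{\operatorname{se}(\hat\beta)}
 =\frac{1.63\times10^{-2}}{3.87\times10^{-3}}\approx 4.21,
\qquad
\hat\beta\pm1.96\,\operatorname{se}(\hat\beta)
 \approx\big(0.87\times10^{-2},\ 2.39\times10^{-2}\big).
```

The approximate 95% interval lies entirely above zero, which is consistent with a positive association between the SES score and the pancreatic cancer rate.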
Figure 2.
Estimation results for the Connecticut Tumor Registry data. The first plot shows the estimated temporal trend γ̂(t) (the dashed curves are the 95% confidence band); the second plot shows the first two estimated eigenfunctions (the solid curve is ψ̂1(t) and the dashed curve is ψ̂2(t)).
To estimate the covariance function RT, the cross-validation procedure in Section W.5.2 picks K2 = 9 and δ = 1000 UTM units. Therefore, block groups with a UTM distance less than 1000 are considered neighbors, the pancreatic cancer incidences in neighboring block groups are considered neighboring events, and all such pairs of neighboring events are used in the composite likelihood (14). Note that 1000 UTM units is about 1 kilometer, which is a small distance on the scale of this application. The AIC defined in (W.11) suggests that there are two principal components in X(s, ·). The first two eigenvalues are 8.742 and 0.345, which together explain 93% of the variation in the covariance function. The first two eigenfunctions are shown in the second plot of Figure 2. As we can see, ψ̂1(t), given by the solid curve, is almost constant over time, indicating that the first principal component score ξ1(s) acts as a spatial random intercept: when ξ1(s) is higher, the cancer rate at s is higher than the average rate. On the other hand, ψ̂2(t) represents an increasing trend in time, even though it does not increase linearly. Hence, when ξ2(s) is higher, the cancer rate at s increases faster than average.
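To make the link between a covariance function and its eigenvalues and eigenfunctions concrete, the following sketch eigen-decomposes a covariance surface sampled on a grid. It is not the tensor-product spline and composite likelihood estimator of the paper: the covariance here is synthetic, built from a constant function and a linear trend (both chosen for illustration, with time rescaled to [0, 1]) and weighted by the two reported eigenvalues. The function name `fpca_from_covariance` is hypothetical.

```python
import numpy as np

def fpca_from_covariance(R, grid_width):
    """Eigen-decompose a covariance surface sampled on an equally spaced grid."""
    # Multiplying by the grid width turns matrix eigenvalues into approximations
    # of the eigenvalues of the integral operator with kernel R.
    evals, evecs = np.linalg.eigh(R * grid_width)
    order = np.argsort(evals)[::-1]  # sort from largest to smallest
    evals = evals[order]
    # Rescale eigenvectors so the discretized integral of psi^2 equals 1.
    efuns = evecs[:, order] / np.sqrt(grid_width)
    return evals, efuns

m = 200
t = (np.arange(m) + 0.5) / m           # grid midpoints on [0, 1]
h = 1.0 / m
psi1 = np.ones(m)                      # assumed constant first eigenfunction
psi2 = np.sqrt(3.0) * (2.0 * t - 1.0)  # assumed linear trend, orthogonal to psi1
omega = np.array([8.742, 0.345])       # eigenvalues reported in the text

# Synthetic rank-two covariance R(t1, t2) = sum_j omega_j psi_j(t1) psi_j(t2).
R = omega[0] * np.outer(psi1, psi1) + omega[1] * np.outer(psi2, psi2)

evals, efuns = fpca_from_covariance(R, h)
share_first = evals[0] / evals[:2].sum()
print(evals[:2], share_first)
```

In this synthetic rank-two example the decomposition recovers the two input eigenvalues up to discretization error, and the first eigenfunction is flat (up to an arbitrary sign), mirroring the random-intercept interpretation of ξ1(s).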
We also model the spatial correlation in ξ1(s) and ξ2(s) by the exponential correlation function, and estimate correlation range parameters by the composite likelihood method in Section 3.3. The estimated range is about 2400 UTM units for both principal components. We perform spatial prediction for ξ1(s) and ξ2(s) at the census tract level, by simulating samples from the posterior (21). We use the posterior means as the predicted values of the principal component scores, and the posterior standard deviations as the prediction errors.
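As a rough illustration of what a range of about 2400 UTM units implies, the sketch below evaluates an exponential correlation function on a few made-up locations. It assumes the parameterization ρ(d) = exp(−d/φ) with φ equal to the estimated range; other conventions (for example, a "practical range" of about 3φ) exist, and the text does not say which one is used. The coordinates and the function name `exponential_correlation` are hypothetical.

```python
import numpy as np

def exponential_correlation(coords, phi):
    """Correlation matrix exp(-d / phi) for points given as rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return np.exp(-dist / phi)

# Hypothetical locations in UTM units (roughly meters).
coords = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 2400.0]])
corr = exponential_correlation(coords, phi=2400.0)
print(np.round(corr, 3))
```

Under this assumed parameterization, two tracts separated by the neighborhood distance δ = 1000 have correlation exp(−1000/2400), and the correlation drops to exp(−1) at a separation equal to the range.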
In the two panels of Figure 3, the predicted values of ξ1(s) and ξ2(s) are shown in gray levels on maps of Connecticut, where black represents the highest positive values and white represents the lowest negative values. By the interpretation of the two principal components described above, we believe the dark census tracts in panel (a) of Figure 3 have higher pancreatic cancer rates than average, while the dark tracts in panel (b) have pancreatic cancer rates that increase faster than in other tracts. The posterior standard deviations of the two latent random fields are given in the two panels of Figure 4. These maps help us understand the uncertainty in the spatial prediction. We perform z-tests on the predicted principal component scores and find that about 27% of census tracts have ξ1 significantly different from 0, whereas none of the predictions for ξ2 are significant. The latter result is due to the relatively large prediction error for ξ2. We think that the signal in the second principal component is much weaker than that in the first, and the test on a local signal (i.e., at any single census tract) does not have enough power.
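The tract-level z-tests described above can be sketched as follows: posterior draws of a score are summarized by their mean and standard deviation, and a tract is flagged when the ratio exceeds the normal critical value. The draws below are simulated from made-up centers purely for illustration; they are not samples from the posterior (21), and the function name `summarize_posterior` is hypothetical.

```python
import numpy as np

def summarize_posterior(draws, z_crit=1.959963984540054):
    """Summarize posterior draws of shape (n_draws, n_tracts) for one score."""
    post_mean = draws.mean(axis=0)       # predicted score per tract
    post_sd = draws.std(axis=0, ddof=1)  # prediction error per tract
    z = post_mean / post_sd
    significant = np.abs(z) > z_crit     # two-sided test at the 5% level
    return post_mean, post_sd, significant

rng = np.random.default_rng(0)
n_draws, n_tracts = 2000, 5
# Hypothetical tract-level centers; each tract has unit posterior spread.
centers = np.array([-3.0, -0.1, 0.0, 0.2, 4.0])
draws = centers + rng.normal(size=(n_draws, n_tracts))

post_mean, post_sd, significant = summarize_posterior(draws)
print(significant, significant.mean())
```

Only tracts whose predicted score is large relative to its posterior standard deviation are flagged, which is why a component with a weak signal and large prediction error, such as ξ2 in the application, can yield no significant tracts.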
Figure 3.
Estimated scores for the first two principal components in the Connecticut Tumor Registry data. The principal component scores are estimated at census tract level and highlighted by grey levels on the map of Connecticut.
Figure 4.
Estimated prediction errors for the first two principal components in the Connecticut Tumor Registry data. The prediction errors are the square root of the posterior variance of the scores at census tract level.
Supplementary Material
Acknowledgments
This research has been partially supported by NIH grant 1R01CA169043 and NSF grants DMS-0845368, DMS-1105634 and DMS-1317118. The Connecticut Tumor Registry is supported by Contract No. HHSN261201300019I between the National Cancer Institute and State of Connecticut Department of Public Health. This study was approved by the Connecticut Department of Public Health (CDPH). Certain data used in this paper were obtained from the CDPH. The authors assume full responsibility for analysis and interpretation of these data.
Appendix A: Assumptions for the theoretical results
A.1 Notation
For any subset E ⊂ ℝ², let ℱ(E) be the σ-algebra generated by N ∩ (E ⊗ T) and {Z(s, t), ξj(s, t), j = 1, …, p : (s, t) ∈ E ⊗ T}. To quantify the spatial dependence, we introduce the strong mixing coefficient (Rosenblatt, 1956),
| (A.1) |
where d(E1, E2) denotes the minimal spatial distance between E1 and E2.
Define
| (A.2) |
and
| (A.3) |
where |Dδ| is the common area for all Ds,δ, e.g., |Dδ| = πδ² if Ds,δ is a disc centered at s. Put μ
(t1, t2) = E{
(t1, t2)}.
A.2 Assumptions
We make the following assumptions in order to derive our asymptotic theory.
Assumption 1. Define the class of Hölder continuous functions on [0, 1] as for some nonnegative integer r and some a > 0. We assume where r1 ≥ 2 is the order of the spline estimator and a > 0.
Assumption 2. We assume that the processes N, Z and ξj, j = 1, …, p, are strictly stationary in s and satisfy the following mixing condition (Guyon, 1995):
We also assume that E[Z(s, t) exp{Z^T(s, t)β + γ(t)}]^C < ∞ for some C > 2, and sup_{t1,t2} ∫ℝ² |
2(u, t1, t2) − 1 | du < ∞.
Assumption 3. Define q(t; β, γ) = E[exp{Z^T(s, t)β + γ(t)}], which does not depend on s by the stationarity of Z(s, t) for any fixed t. Assume that 0 < C1 ≤ min_t q(t; θ, γ) ≤ max_t q(t; θ, γ) ≤ C2 < ∞, for all .
Assumption 4. Let μZ(t; β, γ) and ΣZ(β, γ) be defined in (24) and (25), and let λmax(·) and λmin(·) be the functionals that take the maximum and minimum eigenvalues of a matrix. Assume that and 0 < C3 ≤ λmin(ΣZ) ≤ λmax(ΣZ) ≤ C4 < ∞, for all (β, γ) ∈ C0. We also assume that μZ is continuous in β and γ, with ∥μZ(•; β1, γ1) − μZ(•; β2, γ2)∥ ≤ C(∥β1 − β2∥ + ∥γ1 − γ2∥), and similarly for ΣZ.
Assumption 5. Let C > 0 be a generic constant. We assume that K = C|Dn|^υ1, with 1/(4r1) < υ1 < 1/2.
Assumption 6. The spatial covariance functions Cj(u) = cov{ξj(s),ξj(s + u)} are Lipschitz continuous at 0. There exists a constant M0 such that |Cj(u) − ωj| ≤ M0∥u∥, for j = 1, …, p.
Assumption 7. Define the class of bivariate Hölder continuous functions on [0, 1]⊗2 as , for u1, u2 ≥ 0, u1 + u2 ≤ r}. We assume that , where r2 ≥ 2 is the order of the tensor product spline function and a > 0.
Assumption 8. Assume that 0 < C5 ≤ μm(t1, t2) ≤ C6 < ∞ for all t1, t2 ∈ T.
Assumption 9. Assume that δ → 0, |Dn||Dδ| → ∞ and K2 = C(|Dn||Dδ|)^u2, with 1/(4r2) < u2 < 1/2.
Contributor Information
Yehua Li, Email: yehuali@iastate.edu, Department of Statistics and Statistical Laboratory, Iowa State University, Ames, IA 50011.
Yongtao Guan, Email: yguan@bus.miami.edu, Department of Management Science, University of Miami, Coral Gables, FL 33124.
References
- Ash RB, Gardner MF. Topics in stochastic processes. Academic Press; 1975.
- Baddeley AJ, Møller J, Waagepetersen R. Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica. 2000;54:329–350.
- Baddeley A, Turner R, Møller J, Hazelton M. Residual analysis for spatial point processes (with discussion). Journal of the Royal Statistical Society, Series B. 2005;67:617–666.
- Banerjee S, Gelfand AE, Carlin BP. Hierarchical Modeling and Analysis for Spatial Data. Chapman & Hall/CRC; 2003.
- Brix A, Diggle PJ. Spatiotemporal prediction for log-Gaussian Cox processes. Journal of the Royal Statistical Society, Series B. 2001;63:823–841.
- Brix A, Møller J. Space-time multi type log Gaussian Cox processes with a view to modelling weeds. Scandinavian Journal of Statistics. 2001;28:471–488.
- Bouzas PR, Valderrama M, Aguilera AM, Ruiz-Fuentes N. Modeling the mean of a doubly stochastic Poisson process by functional data analysis. Computational Statistics & Data Analysis. 2006;50:2655–2667.
- Demko S. Inverses of band matrices and local convergence of spline projection. SIAM Journal on Numerical Analysis. 1977;14:616–619.
- Di C, Crainiceanu CM, Caffo BS, Punjabi NM. Multilevel functional principal component analysis. The Annals of Applied Statistics. 2009;3:458–488. doi: 10.1214/08-AOAS206SUPP.
- Diggle P, Rowlingson B, Su T. Point process methodology for on-line spatio-temporal disease surveillance. Environmetrics. 2005;16:423–434.
- Diggle PJ. Spatio-temporal point processes, partial likelihood, foot and mouth disease. Statistical Methods in Medical Research. 2006;15:325–336. doi: 10.1191/0962280206sm454oa.
- Gelfand AE, Schmidt AM, Banerjee S, Sirmans CF. Nonstationary multivariate process modeling through spatially varying coregionalization. TEST. 2004:263–312.
- Guan Y. A composite likelihood approach in fitting spatial point process models. Journal of the American Statistical Association. 2006;101(476):1502–1512.
- Guan Y, Loh JM. A thinned block bootstrap variance estimation procedure for inhomogeneous spatial point patterns. Journal of the American Statistical Association. 2007:1377–1386.
- Guan Y. On consistent nonparametric intensity estimation for inhomogeneous spatial point processes. Journal of the American Statistical Association. 2008a;103:1238–1247.
- Guan Y. A KPSS test for stationarity for spatial point processes. Biometrics. 2008b;64:800–806. doi: 10.1111/j.1541-0420.2007.00977.x.
- Guan Y. On nonparametric variance estimation for second-order statistics of inhomogeneous spatial point processes with known parametric intensity form. Journal of the American Statistical Association. 2009;104:1482–1491. doi: 10.1198/jasa.2009.tm08541.
- Guan Y, Sherman M, Calvin JA. A nonparametric test for spatial isotropy using subsampling. Journal of the American Statistical Association. 2004;99:810–821.
- Guan Y, Wang H. Sufficient dimension reduction for spatial point processes directed by Gaussian random fields. Journal of the Royal Statistical Society, Series B. 2010:367–387.
- Guyon X. Random fields on a network: modeling, statistics, and applications. Springer-Verlag; New York: 1995.
- Hall P, Hosseini-Nasab M. On properties of functional principal components analysis. Journal of the Royal Statistical Society, Series B. 2006;68:109–126.
- Hall P, Müller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics. 2006;34:1493–1517.
- Hall P, Müller HG, Yao F. Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society, Series B. 2008;70:703–723.
- Heinrich L, Prokešová M. On estimating the asymptotic variance of stationary point processes. Methodology and Computing in Applied Probability. 2010;12(3):451–471.
- Huang J, Yang L. Identification of nonlinear additive autoregressive models. Journal of the Royal Statistical Society, Series B. 2004;66:463–477.
- Illian J, Benson E, Crawford J, Staines H. Case studies in spatial point process modeling, vol. 185 of Lecture Notes in Statistics. Springer; New York: 2006. Principal component analysis for spatial point processes – assessing the appropriateness of the approach in an ecological context; pp. 135–150.
- Li Y, Hsing T. Deciding the dimension of effective dimension reduction space for functional and high-dimensional data. Annals of Statistics. 2010a;38:3028–3062.
- Li Y, Hsing T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Annals of Statistics. 2010b;38:3321–3351.
- Møller J, Syversveen AR, Waagepetersen RP. Log Gaussian Cox processes. Scandinavian Journal of Statistics. 1998;25:451–482.
- Møller J, Waagepetersen R. Modern statistics for spatial point processes. Scandinavian Journal of Statistics. 2007;34:643–684.
- Ramsay JO, Silverman BW. Functional Data Analysis. 2nd ed. Springer-Verlag; New York: 2005.
- Rosenblatt M. A central limit theorem and a strong mixing condition. Proceedings of the National Academy of Sciences. 1956;42:43–47. doi: 10.1073/pnas.42.1.43.
- Schoenberg FP. Multidimensional residual analysis of point process models for earthquake occurrences. Journal of the American Statistical Association. 2003;98:789–795.
- Schoenberg FP. Consistent parametric estimation of the intensity of a spatial-temporal point process. Journal of Statistical Planning and Inference. 2005;128:79–93.
- Schumaker LL. Spline Functions. Wiley; New York: 1981.
- Stein ML. Interpolation of Spatial Data. Springer; New York: 1999.
- Stone C. The dimensionality reduction principle for generalized additive models. Annals of Statistics. 1986;14:590–606.
- Stone C. The use of polynomial splines and their tensor products in multivariate function estimation. Annals of Statistics. 1994;22:118–184.
- Tanaka U, Ogata Y, Stoyan D. Parameter estimation and model selection for Neyman-Scott point processes. Biometrical Journal. 2007;49:1–15. doi: 10.1002/bimj.200610339.
- Waagepetersen RP. An estimating function approach to inference for inhomogeneous Neyman-Scott processes. Biometrics. 2007;63:252–258. doi: 10.1111/j.1541-0420.2006.00667.x.
- Wang R, Gross CP, Halene S, Ma X. Neighborhood socioeconomic status influences the survival of elderly patients with myelodysplastic syndromes in the United States. Cancer Causes and Control. 2009;20(8):1369–1376. doi: 10.1007/s10552-009-9362-7.
- Wu S, Müller HG, Zhang Z. Functional data analysis for point processes with rare events. Statistica Sinica. 2013;23:1–23.
- Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005a;100:577–590.
- Yao F, Müller HG, Wang JL. Functional linear regression analysis for longitudinal data. Annals of Statistics. 2005b;33:2873–2903.
- Zhou L, Huang J, Martinez JG, Maity A, Baladandayuthapani V, Carroll RJ. Reduced rank mixed effects models for spatially correlated hierarchical functional data. Journal of the American Statistical Association. 2010;105:390–400. doi: 10.1198/jasa.2010.tm08737.
- Zhou S, Shen X, Wolfe DA. Local asymptotics for regression splines and confidence regions. Annals of Statistics. 1998;26(5):1760–1782.
- Zhu Z, Fung WK, He X. On the asymptotics of marginal regression splines with longitudinal data. Biometrika. 2008;95(4):907–917.