Quasi-likelihood for Spatial Point Processes

Yongtao Guan; Abdollah Jalilian; Rasmus Waagepetersen

doi:10.1111/rssb.12083

Author manuscript; available in PMC: 2016 Jun 1.

Published in final edited form as: J R Stat Soc Series B Stat Methodol. 2015 Sep 4;77(3):677–697. doi: 10.1111/rssb.12083

PMCID: PMC4450110 NIHMSID: NIHMS605050 PMID: 26041970

Summary

Fitting regression models for intensity functions of spatial point processes is of great interest in ecological and epidemiological studies of association between spatially referenced events and geographical or environmental covariates. When Cox or cluster process models are used to accommodate clustering not accounted for by the available covariates, likelihood based inference becomes computationally cumbersome due to the complicated nature of the likelihood function and the associated score function. It is therefore of interest to consider alternative, more easily computable estimating functions. We derive the optimal estimating function in a class of first-order estimating functions. The optimal estimating function depends on the solution of a certain Fredholm integral equation, which in practice is solved numerically. The derivation of the optimal estimating function has close similarities to the derivation of quasi-likelihood for standard data sets. The approximate solution is further equivalent to a quasi-likelihood score for binary spatial data. We therefore use the term quasi-likelihood for our optimal estimating function approach. We demonstrate in a simulation study and a data example that our quasi-likelihood method for spatial point processes is both statistically and computationally efficient.

Keywords: Estimating function, Fredholm integral equation, Godambe information, Intensity function, Regression model, Spatial point process

1. INTRODUCTION

In many applications of spatial point processes it is of interest to fit a regression model for the intensity function. In the case of a Poisson point process, maximum likelihood estimation of regression parameters is rather straightforward, with a user-friendly implementation available in the R package spatstat (Baddeley and Turner, 2005). However, if e.g. Cox or cluster point process models are used to accommodate clustering not explained by a Poisson process, then maximum likelihood estimation is in general difficult from a computational point of view (see e.g. Møller and Waagepetersen, 2004). Alternatively, one may follow composite likelihood arguments (e.g. Møller and Waagepetersen, 2007) to obtain an estimating function that is equivalent to the score of the Poisson likelihood function. This provides a computationally tractable estimating function, and the theoretical properties of the resulting estimator are well understood; see e.g. Schoenberg (2005), Waagepetersen (2007) and Guan and Loh (2007).

A drawback of the Poisson score function approach is the loss of efficiency, since possible dependence between points is ignored. In the context of intensity estimation, it appears that only Mrkvička and Molchanov (2005) and Guan and Shen (2010) have tried to incorporate second-order properties in the estimation so as to improve efficiency. Mrkvička and Molchanov (2005) show that their proposed estimator is optimal among a class of linear, unbiased intensity estimators, where the word ‘optimal’ refers to minimum variance. However, their approach is confined to a very restrictive type of intensity function known up to a one-dimensional scaling factor. In contrast, Guan and Shen (2010) propose a weighted estimating equation approach that is applicable to intensity functions in more general forms. On the other hand, a similar optimality result cannot be established for their approach.

In this paper we derive an optimal estimating function that not only takes into account possible spatial correlation but is also applicable to point processes with a general regression model for the intensity function. In the spirit of generalized linear models the intensity is given by a differentiable function of a linear predictor depending on spatial covariates. The optimal estimating function depends on the solution of a certain Fredholm integral equation and reduces to the likelihood score in the case of a Poisson process. We show in Section 3.2 that the optimality result in Mrkvička and Molchanov (2005) is a special case of our more general result, and that the estimation method in Guan and Shen (2010) is only a crude approximation of our new approach. Apart from being computationally efficient, our estimating function only requires specification of the intensity function and the so-called pair correlation function, which is another advantage compared with maximum likelihood estimation.

For many types of correlated data other than spatial point patterns, estimating functions have been widely used for model fitting when maximum likelihood estimation is computationally challenging. Examples of such data include longitudinal data (Liang and Zeger, 1986), time series data (Zeger, 1988), clustered failure time data (Gray, 2003) and spatial binary or count data (Gotway and Stroup, 1997; Lin and Clayton, 2005). For most of these methods, the inverse of a covariance matrix is used to account for the correlation in data, and optimality can be established when the so-called quasi-score estimating functions are used (Wedderburn, 1974; Heyde, 1997). For a quasi-score estimating function the inverse covariance matrix contributes to an optimal linear transformation of the residual vector. For point processes, an analogue of residuals is given by the so-called residual measure and our optimal estimating function can be viewed as an optimal linear transformation of the residual measure. Moreover, the numerical implementation of our method is closely related to the quasi-likelihood for spatial data considered in Gotway and Stroup (1997) and Lin and Clayton (2005). Our work hence not only lays the theoretical foundation for optimal intensity estimation, but also fills in a critical gap between existing literature on spatial point processes and the well-established quasi-likelihood estimation method. We therefore adopt the term quasi-likelihood for our approach.

Following some background material on point processes and estimating functions, we derive our optimal quasi-likelihood score estimating function and discuss its practical implementation based on a numerical solution of the Fredholm integral equation. Asymptotic properties of the resulting parameter estimator are then considered, and the superior performance of the quasi-likelihood method compared with existing ones is demonstrated through a simulation study. We finally illustrate the practical use of the quasi-likelihood method in a data example of three tropical tree species.

2. BACKGROUND

In this section we provide background on the intensity and pair correlation functions of a spatial point process and state the basic assumptions on these that are needed for our quasi-likelihood method. We also review composite likelihood estimation and estimating functions. Throughout the presentation, we use E, Var and Cov to denote expectation, variance and covariance, respectively.

2.1. Basic Assumptions on the Intensity and Pair Correlation Function

Let X be a point process on $R^{2}$ and let N(B) denote the number of points in X ⋂ B for any bounded set $B \subset R^{2}$ . We assume that X has an intensity function λ(·) and a pair correlation function g(·, ·) whereby the first- and second-order moments of the counts N(B) are given by

E N (B) = \int_{B} λ (u) d u

(1)

and

\mathrm{Cov}[N(A), N(B)] = \int_{A \cap B} λ(u) \, d u + \int_{A} \int_{B} λ(u) λ(v) [g(u, v) - 1] \, d u \, d v

(2)

for bounded sets $A, B \subseteq R^{2}$ (Møller and Waagepetersen, 2004).
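To make (1) and (2) concrete, the following minimal NumPy sketch computes E N(B) and Var N(B) for a square set B = [0, side]². It assumes a constant intensity λ, midpoint quadrature on an n × n grid, and, purely as an example, the Thomas pair correlation function (30) used in Section 6.1; the function names are hypothetical. For a clustered process the double integral in (2) is positive, so the variance exceeds the mean, whereas g ≡ 1 recovers the Poisson identity Var N(B) = E N(B).

```python
import numpy as np

def thomas_pcf(r, kappa, omega):
    # Thomas process pair correlation function, equation (30)
    return 1.0 + np.exp(-r**2 / (4.0 * omega**2)) / (4.0 * np.pi * omega**2 * kappa)

def count_moments(lam, side, kappa, omega, n=30):
    """Mean and variance of N(B), B = [0, side]^2, from (1) and (2) with a
    constant intensity lam, using midpoint quadrature on an n x n grid."""
    h = side / n
    c = (np.arange(n) + 0.5) * h                        # cell-centre coordinates
    xx, yy = np.meshgrid(c, c, indexing="ij")
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    mean = lam * side**2                                # (1)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    excess = thomas_pcf(dist, kappa, omega) - 1.0       # g(u - v) - 1
    var = mean + lam**2 * excess.sum() * h**4           # (2) with A = B
    return mean, var

mean, var = count_moments(lam=100.0, side=0.2, kappa=50.0, omega=0.02)
print(mean, var)   # the variance exceeds the mean for this clustered model
```

Setting kappa to infinity removes the clustering term, which gives a quick check of the Poisson case.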

We assume that the intensity function is given in terms of a parametric model λ(u) = λ(u; β), where $β = (β_{1}, \dots, β_{p}) \in R^{p}$ is a vector of regression parameters. The intensity function is further assumed to be positive and differentiable with respect to β with gradient λ’(u; β) = dλ(u; β)/dβ. A popular example is the log-linear model log λ(u; β) = z(u)β^T, where z(u) = (z₁(u), … , z_p(u)) is a covariate vector for each $u \in R^{2}$. For convenience of exposition we assume that g(u, v) depends only on the difference u − v, since this is the common assumption in practice. This implies that X is second-order re-weighted stationary (Baddeley et al., 2000). In the following we thus let g(r) denote the pair correlation function for two points u and v with u − v = r. However, our proposed optimal estimating function is also applicable in the case of a non-translation-invariant pair correlation function. For the derivations in Section 3 we further need that g(·) is continuous and that g(·) − 1 is a non-negative definite function.

For given functions λ(·) and g(·), in addition to the assumptions specified above, it is of course required that these functions are indeed respectively an intensity function and a pair correlation function of some spatial point process. To be an intensity function, λ(·) just needs to be non-negative and integrable. We are not aware of simple necessary and sufficient conditions that ensure that a function g(·) is a pair correlation function. We hence restrict attention to functions g(·) which conform with pair correlation functions of existing point process models. From a practical point of view, we also need a computationally tractable expression for g(·). This precludes pair correlation functions of Markov and Gibbs point processes (e.g. Møller and Waagepetersen, 2007). On the other hand, a wide range of shot-noise Cox processes, log Gaussian Cox processes and Poisson cluster processes have pair correlation functions given in closed form (Møller and Waagepetersen, 2004) and satisfying the assumptions stated in the previous paragraph (see also Section 3.1 regarding the condition of non-negative definiteness).

2.2. Composite Likelihood

A first-order log composite likelihood function (Schoenberg, 2005; Waagepetersen, 2007) for estimation of β is given by

\sum_{u \in X \cap W} \log λ (u; β) - \int_{W} λ (u; β) d u,

(3)

where $W \subset R^{2}$ is the observation window. This can be viewed as a limit of log composite likelihood functions for binary variables Y_i = 1[N(B_i) > 0], i = 1, … , m, where the cells B_i form a disjoint partitioning of W and 1[·] is an indicator function (e.g. Møller and Waagepetersen, 2007). The limit is obtained when the number of cells tends to infinity and the areas of the cells tend to zero. In case of a Poisson process, the composite likelihood function coincides with the likelihood function.
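Once the integral in (3) is replaced by a quadrature sum, the composite likelihood is a one-line computation. The following is a minimal sketch, assuming the log-linear intensity λ(u; β) = exp(z(u)β^T) of Section 2.1 and user-supplied quadrature points and weights on W; log_composite_likelihood is a hypothetical helper, not a spatstat function.

```python
import numpy as np

def log_composite_likelihood(beta, z_points, z_quad, w_quad):
    """Log composite likelihood (3) for log λ(u; β) = z(u) β^T.
    z_points: (n, p) covariates at the observed points of X ∩ W
    z_quad:   (m, p) covariates at quadrature points u_1, ..., u_m in W
    w_quad:   (m,) quadrature weights approximating the integral over W"""
    beta = np.asarray(beta, dtype=float)
    sum_log_lam = np.sum(z_points @ beta)               # Σ log λ(u; β) over the points
    integral = np.sum(np.exp(z_quad @ beta) * w_quad)   # ≈ ∫_W λ(u; β) du
    return sum_log_lam - integral

# Intercept-only model on W = [0, 1]^2 with 50 observed points and a
# 10 x 10 grid of equal weights: (3) equals 50 β0 - exp(β0).
z_pts = np.ones((50, 1))
z_q = np.ones((100, 1))
w_q = np.full(100, 0.01)
value = log_composite_likelihood([np.log(50.0)], z_pts, z_q, w_q)
```

In this intercept-only case the maximizer is β0 = log 50, the familiar Poisson estimate log(n/|W|).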

The composite likelihood is computationally simple and enjoys considerable popularity, in particular in studies of tropical rain forest ecology, where spatial point process models are fitted to point pattern data sets containing the locations of thousands of rain forest trees (see e.g. Shen et al., 2009; Lin et al., 2011; Renner and Warton, 2013). However, it is not statistically efficient for non-Poisson data since possible correlations between counts of points are ignored.

2.3. Primer on Estimating Functions and Quasi-likelihood

As noted in Section 2.2, the composite likelihood estimator of β is obtained by maximizing the log composite likelihood (3). This is equivalent to solving the following equation:

e (β) = 0,

(4)

where

e (β) = \sum_{u \in X \cap W} \frac{λ^{'} (u; β)}{λ (u; β)} - \int_{W} λ^{'} (u; β) d u .

(5)

Equations in the form of (4) are typically referred to as estimating equations and functions like e(β) are called estimating functions (Heyde, 1997). Note that many other statistical estimation procedures, such as maximum likelihood estimation, moment based estimation and minimum contrast estimation, can all be written in terms of estimating functions.
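Solving the estimating equation (4) with the estimating function (5) is a standard root-finding problem. A minimal Newton–Raphson sketch follows, again assuming a log-linear intensity (so that λ'(u; β)/λ(u; β) = z(u)) and a quadrature approximation of the integral; the helper names and the example covariate are hypothetical.

```python
import numpy as np

def cl_score(beta, z_points, z_quad, w_quad):
    # Estimating function (5) for log λ = z β^T: Σ z(u) - ∫_W z(u) λ(u; β) du
    lam_w = np.exp(z_quad @ beta) * w_quad
    return z_points.sum(axis=0) - z_quad.T @ lam_w

def solve_cl(z_points, z_quad, w_quad, n_iter=100, tol=1e-12):
    """Solve e(β) = 0 in (4) by Newton-Raphson."""
    beta = np.zeros(z_quad.shape[1])
    beta[0] = np.log(len(z_points) / w_quad.sum())    # assumes column 0 is an intercept
    for _ in range(n_iter):
        lam_w = np.exp(z_quad @ beta) * w_quad
        e = cl_score(beta, z_points, z_quad, w_quad)
        J = z_quad.T @ (z_quad * lam_w[:, None])      # -de/dβ^T, symmetric here
        step = np.linalg.solve(J, e)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Example: grid of cell centres on [0, 1]^2 and covariates z(u) = (1, x-coordinate)
c = (np.arange(20) + 0.5) / 20
xx, yy = np.meshgrid(c, c, indexing="ij")
z_q = np.column_stack([np.ones(400), xx.ravel()])
w_q = np.full(400, 1.0 / 400)
x_obs = np.linspace(0.3, 0.95, 30)                    # points concentrated at larger x
z_pts = np.column_stack([np.ones(30), x_obs])
beta_hat = solve_cl(z_pts, z_q, w_q)
```

Because the observed points lie mostly at larger x, the fitted slope is positive.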

We defer rigorous asymptotic details to Section 5 and here just provide an informal overview of properties of an estimator $\hat{β}$ based on an estimating function e(β). By a first-order Taylor series expansion at $\hat{β}$ ,

e(β) \approx e(\hat{β}) + (\hat{β} - β) S = (\hat{β} - β) S,

where $S = - E\, d e(β) / d β^{⊺}$ is the so-called sensitivity matrix (e.g. page 62 in Song, 2007) and the equality is due to $e(\hat{β}) = 0$ as required by (4). It then follows immediately that $\hat{β} \approx β + e(β) S^{-1}$. Thus, with β equal to the true parameter value, $\hat{β}$ is approximately unbiased if $E e(β) = 0$, i.e. if e(β) is an unbiased estimating function. Moreover, $\mathrm{Var}\, \hat{β} \approx S^{-1} Σ S^{-1}$, where $Σ = \mathrm{Var}\, e(β)$, and S⁻¹ΣS⁻¹ is the asymptotic covariance matrix when the size of the data set goes to infinity in a suitable manner (Section 5). The inverse of S⁻¹ΣS⁻¹, i.e. SΣ⁻¹S, is called the Godambe information (e.g. Definition 3.7 in Song, 2007).

Suppose that two competing estimating functions e₁(β) and e₂(β) with respective Godambe informations I₁ and I₂ are used to obtain the estimators ${\hat{β}}_{1}$ and ${\hat{β}}_{2}$. Then e₁(β) is said to be superior to e₂(β) if I₁ − I₂ is positive definite, since this essentially means that ${\hat{β}}_{1}$ has a smaller asymptotic variance than ${\hat{β}}_{2}$. If I₁ − I₂ is non-negative definite for all possible e₂(β), then we say that e₁(β) has the maximal Godambe information and is an optimal estimating function. The resulting estimator ${\hat{β}}_{1}$ is then asymptotically the most efficient.
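The ordering of estimating functions by Godambe information can be checked numerically. The sketch below assumes symmetric sensitivity matrices (as holds for the optimal estimating function, where S = Σ by (14)) and declares e₁ at least as good as e₂ when I₁ − I₂ has no negative eigenvalues; the matrices are arbitrary illustrations, not values from the paper, and the function names are hypothetical.

```python
import numpy as np

def godambe(S, Sigma):
    # Godambe information S Σ^{-1} S (S assumed symmetric)
    return S @ np.linalg.solve(Sigma, S)

def sandwich(S, Sigma):
    # Asymptotic covariance S^{-1} Σ S^{-1}, the inverse of the Godambe information
    S_inv = np.linalg.inv(S)
    return S_inv @ Sigma @ S_inv

def at_least_as_efficient(I1, I2, tol=1e-10):
    # True if I1 - I2 is non-negative definite (up to a numerical tolerance)
    diff = 0.5 * ((I1 - I2) + (I1 - I2).T)
    return bool(np.linalg.eigvalsh(diff).min() >= -tol)

# Two hypothetical estimating functions sharing S but with different variances
S = np.array([[2.0, 0.3], [0.3, 1.0]])
I1 = godambe(S, S)                      # Σ = S gives I = S, as for (14)
I2 = godambe(S, S + 0.5 * np.eye(2))    # extra variance lowers the information
```

Here I₁ − I₂ has eigenvalues s·c/(s + c) > 0 in the eigenbasis of S (with c = 0.5), so e₁ is superior.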

Consider an m-dimensional data vector Y with covariance matrix V and mean vector μ, a differentiable function of some p-dimensional parameter vector β. Let D = dμ^T/dβ be the m × p matrix of partial derivatives dμ_i/dβ_j. The quasi-likelihood score function is then

(Y - μ) V^{- 1} D,

(6)

which is optimal among all estimating functions of the form (Y − μ)A for some m × p matrix A (e.g. Heyde, 1997).
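As a quick sanity check of (6), the sketch below computes the quasi-score for a log-linear mean with a hypothetical design matrix X (so μ = exp(Xβ^T); column vectors are used in the code). When V = Diag(μ), as for independent Poisson counts, V⁻¹D = X and the quasi-score reduces to the familiar Poisson score (Y − μ)X.

```python
import numpy as np

def quasi_score(Y, mu, V, D):
    # Quasi-likelihood score (6), (Y - μ) V^{-1} D, as a length-p array;
    # V is symmetric, so (Y - μ) V^{-1} = [V^{-1} (Y - μ)]^T
    return np.linalg.solve(V, Y - mu) @ D

X = np.column_stack([np.ones(6), np.linspace(-1.0, 1.0, 6)])   # m x p design
beta = np.array([0.5, -0.2])
mu = np.exp(X @ beta)                     # mean vector μ(β)
D = mu[:, None] * X                       # D = dμ^T/dβ for the log link
Y = np.array([1.0, 3.0, 0.0, 2.0, 1.0, 2.0])
score_poisson_var = quasi_score(Y, mu, np.diag(mu), D)
```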

3. AN OPTIMAL FIRST-ORDER ESTIMATING EQUATION

The estimating function given in (5) can be rewritten as

e_{f} (β) = \sum_{u \in X \cap W} f (u) - \int_{W} f (u) λ (u; β) d u,

(7)

where f(u) = λ’(u; β)/λ(u; β). In general, f(u) can be any 1 × p real vector valued function, where p is the dimension of β. We call (7) a first-order estimating function. Our aim is to find a function ϕ so that e_ϕ is optimal within the class of first-order estimating functions; in other words, the resulting estimator of β associated with e_ϕ is asymptotically most efficient.

The estimating function (7) can be further re-expressed in terms of the residual measure (Baddeley et al., 2005; Waagepetersen, 2005) defined for bounded $B \subset R^{2}$ as

R (B) = \sum_{u \in X \cap W} 1 [u \in B] - \int_{B} λ (u; β) d u .

Thus e_f(β) = ∫_W f(u)R(du) so our estimating function can be viewed as a linear transformation of the residual measure. Hence just like the quasi-likelihood score for ordinary numerical data is the optimal linear transformation of the residual vector (Section 2.3), our optimal estimating function will be the optimal linear transformation of the residual measure.

Let $Σ_{f} = \mathrm{Var}\, e_{f}(β)$, $J_{f} = - d e_{f}(β) / d β^{⊺}$ and $S_{f} = E J_{f}$. Note that Σ_f, J_f and S_f all depend on β, but we suppress this dependence in this section for ease of presentation. Recalling the definition of optimality in Section 2.3, for e_ϕ to be optimal we must have that

S_{ϕ} Σ_{ϕ}^{- 1} S_{ϕ} - S_{f} Σ_{f}^{- 1} S_{f}

(8)

is non-negative definite for all $f : W \to R^{p}$ . A sufficient condition for this is

Σ_{ϕ f} = S_{f}

(9)

for all f, where $Σ_{ϕ f} = \mathrm{Cov}[e_{ϕ}(β), e_{f}(β)]$. To understand the intuition behind (9), view e_ϕ as the score function from maximum likelihood estimation (MLE) and e_f as an arbitrary unbiased estimating function not necessarily in the form of (7). It is then well known that (9) holds and in fact leads to the optimality of the MLE score function among all unbiased estimating functions. In our setting, (9) therefore suggests that e_ϕ plays the role of the MLE score function and is expected to be optimal within the class of first-order estimating functions e_f defined by (7). This type of condition is also provided in Theorem 2.1 in Heyde (1997) for both discrete and continuous vector-valued data. In Appendix A, we give a short self-contained proof of the sufficiency of (9) in our setting.

By the Campbell formulae (e.g. Møller and Waagepetersen, 2004, Chapter 4),

\begin{matrix} Σ_{ϕ f} & = \int_{W} f^{⊺} (u) ϕ (u) λ (u; β) d u + \int_{W^{2}} f^{⊺} (u) ϕ (v) λ (u; β) λ (v; β) [g (u - v) - 1] d u d v, \\ S_{f} & = \int_{W} f^{⊺} (u) λ^{'} (u; β) d u . \end{matrix}

Hence, (9) is equivalent to

\int_{W} f^{⊺}(u) \left\{ λ^{'}(u; β) - ϕ(u) λ(u; β) - λ(u; β) \int_{W} ϕ(v) λ(v; β) [g(u - v) - 1] \, d v \right\} d u = 0

for all $f : W \to R^{p}$ , which is true if

λ^{'} (u; β) - ϕ (u) λ (u; β) - λ (u; β) \int_{W} ϕ (v) λ (v; β) [g (u - v) - 1] d v = 0

(10)

for all u ∈ W. Since λ > 0, (10) implies that ϕ is a solution to the Fredholm integral equation (e.g. Hackbusch, 1995, Chapter 3)

ϕ = \frac{λ^{'}}{λ} - T ϕ,

(11)

where T is the operator given by

(Tf)(u) = \int_{W} t(u, v) f(v) \, d v \quad \text{with} \quad t(u, v) = λ(v; β) [g(u - v) - 1] .

(12)

By continuity of g, T is compact in the space of continuous functions on W (Hackbusch, 1995, Theorem 3.2.5). Moreover, −1 is not an eigenvalue of T, since g(·) − 1 is non-negative definite, which implies that all eigenvalues of T are non-negative (Section 3.1). It then follows by Theorem 3.2.1 in Hackbusch (1995) that (11) has a unique solution

ϕ = {(I + T)}^{- 1} \frac{λ^{'}}{λ},

where I is the identity operator (or, depending on context, the identity matrix) and (I+T)⁻¹ is the bounded linear inverse of I + T. We define

\begin{matrix} e (β) & = e_{ϕ} (β) = \sum_{u \in X \cap W} ϕ (u) - \int_{W} ϕ (u) λ (u; β) d u, \\ Σ & = \mathrm{Var}\, e (β), \quad J = - d e (β) / d β^{⊺}, \quad S = E J, \end{matrix}

(13)

where by the above derivations,

S = Σ = \int_{W} ϕ^{⊺} (u) λ^{'} (u; β) d u .

(14)

In the Poisson process case where g(·) = 1, T is the zero operator, so ϕ = λ’/λ and (13) reduces to the Poisson likelihood score (5).

We develop a more explicit expression for ϕ using a Neumann series expansion in Appendix B. The Neumann series expansion is also useful for checking the conditions for our asymptotic results; see Section 1 in the supplementary material. However, it is not essential for our approach, so we omit the detailed discussion here.

3.1. Condition for non-negative eigenvalues of T

In general it is difficult to assess the eigenvalues of T given by (12). However, since g − 1 is non-negative definite, T^s is a positive operator (i.e., $\int_{W} f^{⊺}(u) (T^{s} f)(u) \, d u \geq 0$), where T^s is given by the symmetric kernel $t^{s}(u, v) = λ(u; β)^{1/2} λ(v; β)^{1/2} [g(u - v) - 1]$. Hence all eigenvalues of T^s are non-negative (Lax, 2002, Corollary 1, p. 320); in particular, −1 is not an eigenvalue. The same holds for T, since its eigenvalues coincide with those of T^s: because $t^{s}(u, v) = λ(u; β)^{1/2} t(u, v) λ(v; β)^{-1/2}$, Tf = αf implies $T^{s}(λ^{1/2} f) = λ^{1/2} T f = α λ^{1/2} f$, and conversely.
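The similarity argument can be checked on a discretization. In the sketch below (an illustration under assumed inputs: a midpoint grid on [0, 1]², a hypothetical log-linear intensity in the x-coordinate, and the Gaussian-shaped g − 1 of the Thomas process), the matrix approximating T has entries λ(u_j; β)[g(u_i − u_j) − 1]w_j. Conjugating it by Diag(λw)^{1/2} gives a symmetric non-negative definite matrix with the same eigenvalues.

```python
import numpy as np

n = 15
c = (np.arange(n) + 0.5) / n
xx, yy = np.meshgrid(c, c, indexing="ij")
pts = np.column_stack([xx.ravel(), yy.ravel()])
w = np.full(n * n, 1.0 / (n * n))                 # midpoint quadrature weights
lam = np.exp(1.0 + pts[:, 0])                      # hypothetical λ(u; β)
kappa, omega = 50.0, 0.05
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
excess = np.exp(-dist**2 / (4 * omega**2)) / (4 * np.pi * omega**2 * kappa)  # g - 1

T = excess * (lam * w)[None, :]            # (T f)(u_i) ≈ Σ_j T[i, j] f(u_j)
s = np.sqrt(lam * w)
Ts = s[:, None] * excess * s[None, :]      # Diag(s) T Diag(s)^{-1}, symmetric

eig_T = np.sort(np.linalg.eigvals(T).real)
eig_Ts = np.linalg.eigvalsh(Ts)            # returned in ascending order
```

Both spectra agree and are non-negative, so I + T is invertible, mirroring the argument for (11).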

For the wide class of second-order re-weighted stationary Cox point processes, $g(r) = 1 + \mathrm{Cov}[Λ(u), Λ(u + r)] / [λ(u) λ(u + r)]$, where Λ denotes the random intensity function of the Cox process. Thus g(·) − 1 is the covariance function of the random field Λ(u)/λ(u), and hence g(·) − 1 is non-negative definite for this class of processes.

3.2. Relation to Existing Methods

Guan and Shen (2010) consider a subset of first-order estimating functions of the form

\sum_{u \in X \cap W} w (u) \frac{λ^{'} (u; β)}{λ (u; β)} - \int_{W} w (u) λ^{'} (u; β) d u

(15)

obtained by introducing a weight function w(·) in the composite likelihood estimating function (5). They then seek to minimize the parameter estimation variance with respect to w(·) and obtain an approximate solution of this minimization problem. In contrast, our estimating function (13) is optimal among all first-order estimating functions including those of the form (15).

An approximate version of our estimating function coincides with the one obtained by Guan and Shen (2010). This follows by approximating the operator T by

(Tf)(u) = \int_{W} f(v) λ(v; β) [g(u - v) - 1] \, d v \approx λ(u; β) f(u) \int_{W} [g(u - v) - 1] \, d v .

(16)

This is justified if f(v)λ(v; β) is close to f(u)λ(u; β) for the v where g(u − v) − 1 differs substantially from zero. Then the Fredholm integral equation (11) can be approximated by $ϕ = \frac{λ^{'}}{λ} - λ A ϕ$ , where

A (u) = \int_{W} [g (u - v) - 1] d v .

(17)

We hence obtain an approximate solution ϕ = wλ’/λ with w(u) = [1 + λ(u; β)A(u)]⁻¹. Using this approximation in (13) the resulting estimating function is precisely of the form (15) suggested by Guan and Shen (2010).
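A minimal sketch of these approximate weights, assuming midpoint quadrature on W = [0, 1]², a Thomas-type g − 1 and a hypothetical log-linear intensity; guan_shen_weights is an illustrative name, not code from Guan and Shen (2010). A(u) from (17) is smaller near the boundary of W because part of the neighbourhood of u falls outside the window.

```python
import numpy as np

def guan_shen_weights(lam, pts, w, excess_fn):
    """Weights w(u) = [1 + λ(u; β) A(u)]^{-1} at the quadrature points, with
    A(u) = ∫_W [g(u - v) - 1] dv from (17) approximated by quadrature."""
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    A = excess_fn(dist) @ w                     # A(u_i) ≈ Σ_j [g(u_i - u_j) - 1] w_j
    return 1.0 / (1.0 + lam * A), A

n = 20
c = (np.arange(n) + 0.5) / n
xx, yy = np.meshgrid(c, c, indexing="ij")
pts = np.column_stack([xx.ravel(), yy.ravel()])
w = np.full(n * n, 1.0 / (n * n))
lam = np.exp(5.0 + pts[:, 0])                   # hypothetical λ(u; β)
kappa, omega = 50.0, 0.05

def thomas_excess(r):
    # g(r) - 1 for the Thomas process, equation (30)
    return np.exp(-r**2 / (4 * omega**2)) / (4 * np.pi * omega**2 * kappa)

weights, A = guan_shen_weights(lam, pts, w, thomas_excess)
```

Points with high intensity or many nearby correlated points are downweighted, which is the intuition behind (15).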

Mrkvička and Molchanov (2005) derived optimal intensity estimators in the situation of λ(u; ρ) = ργ(u) for some known function γ(u) and unknown parameter ρ > 0. Since ρ is the only unknown parameter, a direct application of (11) yields

ρ ϕ (u) + ρ^{2} \int_{W} ϕ (v) γ (v) [g (u - v) - 1] d v = 1,

which is essentially Corollary 3.1 of Mrkvička and Molchanov (2005). It is uncommon for an intensity function to be known up to a one-dimensional scaling factor. In contrast, our proposed modeling framework for the intensity function closely mimics that used in classical regression analysis and is more general. As a result, our method of derivation is completely different from that in Mrkvička and Molchanov (2005).

4. IMPLEMENTATION

In this section we discuss practical issues concerning the implementation of our proposed optimal estimating function. In particular we show in Section 4.2 that a particular numerical approximation of our optimal estimating function is equivalent to a quasi-likelihood for binary spatial data for which an iterative generalized least squares solution can be implemented. An R implementation ql.ppm() will appear in future releases of spatstat (Baddeley and Turner, 2005).

4.1. Numerical Approximation

To estimate ϕ, consider the numerical approximation

(T ϕ) (u) = \int_{W} t (u, v) ϕ (v) d v \approx \sum_{i = 1}^{m} t (u, u_{i}) ϕ (u_{i}) w_{i},

(18)

where u_i, i = 1, … , m, are quadrature points with associated weights w_i. Inserting this approximation in (11) with u = u_l we obtain estimates $\hat{ϕ} (u_{l})$ of $ϕ (u_{l}), l = 1, \dots, m$ , by solving the system of linear equations,

ϕ (u_{l}) + \sum_{i = 1}^{m} t (u_{l}, u_{i}) ϕ (u_{i}) w_{i} = \frac{λ^{'} (u_{l}; β)}{λ (u_{l}; β)}, l = 1, \dots, m .

Then $(T ϕ) (u) \approx \sum_{i = 1}^{m} t (u, u_{i}) \hat{ϕ} (u_{i}) w_{i}$ and plugging this further approximation into (11), the Nyström approximate solution of (11) directly becomes

\hat{ϕ} (u) = \frac{λ^{'} (u; β)}{λ (u; β)} - \sum_{i = 1}^{m} t (u, u_{i}) \hat{ϕ} (u_{i}) w_{i} .

(19)
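The Nyström scheme amounts to one dense linear solve followed by the interpolation formula (19). Below is a minimal sketch, assuming the caller supplies λ'/λ and the kernel t(u, v) as vectorized functions (both hypothetical interfaces). With g ≡ 1 the kernel vanishes and ϕ̂ reduces to λ'/λ, matching the Poisson case noted in Section 3.

```python
import numpy as np

def nystrom_phi(grad_log_lam, kernel, pts, w):
    """Nyström solution of ϕ = λ'/λ - Tϕ, equation (11).
    grad_log_lam(u): (k, p) values of λ'(u; β)/λ(u; β) at locations u of shape (k, 2)
    kernel(u, v):    (k, m) values of t(u, v) = λ(v; β)[g(u - v) - 1]
    pts, w:          quadrature points u_1..u_m and weights w_1..w_m"""
    K = kernel(pts, pts) * w[None, :]                                # K[l, i] = t(u_l, u_i) w_i
    phi_q = np.linalg.solve(np.eye(len(w)) + K, grad_log_lam(pts))  # ϕ̂(u_l)

    def phi_hat(u):
        # Interpolation formula (19) at arbitrary locations u
        return grad_log_lam(u) - (kernel(u, pts) * w[None, :]) @ phi_q

    return phi_q, phi_hat

# Hypothetical example: log λ(u; β) = β0 + β1 x with a Thomas-type g - 1
beta = np.array([5.0, 1.0])
kappa, omega = 50.0, 0.05

def grad_log_lam(u):
    return np.column_stack([np.ones(len(u)), u[:, 0]])

def kernel(u, v):
    d = np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1)
    lam_v = np.exp(beta[0] + beta[1] * v[:, 0])
    return lam_v[None, :] * np.exp(-d**2 / (4 * omega**2)) / (4 * np.pi * omega**2 * kappa)

n = 15
c = (np.arange(n) + 0.5) / n
xx, yy = np.meshgrid(c, c, indexing="ij")
pts = np.column_stack([xx.ravel(), yy.ravel()])
w = np.full(n * n, 1.0 / (n * n))
phi_q, phi_hat = nystrom_phi(grad_log_lam, kernel, pts, w)
```

Evaluating phi_hat at the quadrature points returns phi_q, which is the consistency property of the Nyström interpolant.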

In (13) we replace ϕ by $\hat{ϕ}$ and we approximate the integral term applying again the quadrature rule used to obtain $\hat{ϕ}$ . This leads to

\hat{e}(β) = \sum_{u \in X \cap W} \hat{ϕ}(u) - \sum_{i = 1}^{m} \hat{ϕ}(u_{i}) λ(u_{i}; β) w_{i} .

(20)

To estimate β, we solve $\hat{e} (β) = 0$ iteratively using Fisher scoring. Suppose that the current estimate is β^(l). Then β^(l+1) is obtained by the Fisher scoring update

β^{(l + 1)} = β^{(l)} + \hat{e} (β^{(l)}) {\hat{S}}^{- 1},

(21)

where

\hat{S} = \sum_{i = 1}^{m} \hat{ϕ} {(u_{i})}^{⊺} λ^{'} (u_{i}; β^{(l)}) w_{i}

(22)

is the numerical approximation of the sensitivity matrix S = ∫_Wϕ^T(u)λ’(u; β^(l))du.

Provided the quadrature scheme is convergent, it follows by Lemma 4.7.4, Lemma 4.7.6 and Theorem 4.7.7 in Hackbusch (1995) that ${‖ ϕ - \hat{ϕ} ‖}_{\infty}$ converges to zero as m → ∞ where ∥ · ∥_∞ denotes supremum norm of a function. This justifies the use of the Nyström method to obtain an approximate solution of the Fredholm integral equation.
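Putting (19) to (22) together gives a short estimation loop. The sketch below is an illustration, not the ql.ppm() implementation mentioned above. It assumes a log-linear intensity, so that λ'/λ = z(u) does not depend on β, and it evaluates the kernel t at a fixed preliminary estimate β̃ as in Section 4.4, so ϕ̂ is computed once and only λ(u_i; β) changes between Fisher scoring updates. The point pattern in the usage example is uniform noise used purely to exercise the code; all names are hypothetical.

```python
import numpy as np

def pairwise_dist(a, b):
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def ql_fit_loglinear(X, z_data, pts, z_quad, w, pcf, beta_tilde, n_iter=100, tol=1e-10):
    """Quasi-likelihood estimate for log λ(u; β) = z(u) β^T using (19)-(22).
    X, z_data: observed points (n, 2) and their covariates (n, p)
    pts, z_quad, w: quadrature points (m, 2), covariates (m, p), weights (m,)
    pcf: vectorized g(r); beta_tilde: preliminary estimate used inside t(u, v)"""
    lam_w = np.exp(z_quad @ beta_tilde) * w                     # λ(u_i; β~) w_i
    K = (pcf(pairwise_dist(pts, pts)) - 1.0) * lam_w[None, :]
    phi_q = np.linalg.solve(np.eye(len(w)) + K, z_quad)         # ϕ̂(u_i), with λ'/λ = z
    K_x = (pcf(pairwise_dist(X, pts)) - 1.0) * lam_w[None, :]
    phi_x = z_data - K_x @ phi_q                                # (19) at the data points

    def score_and_sensitivity(beta):
        lam_q_w = np.exp(z_quad @ beta) * w
        e_hat = phi_x.sum(axis=0) - phi_q.T @ lam_q_w           # (20)
        S_hat = phi_q.T @ (z_quad * lam_q_w[:, None])           # (22), using λ' = zλ
        return e_hat, S_hat

    beta = np.array(beta_tilde, dtype=float)
    for _ in range(n_iter):
        e_hat, S_hat = score_and_sensitivity(beta)
        step = np.linalg.solve(S_hat.T, e_hat)                  # row-vector update e_hat S_hat^{-1}, (21)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    e_hat, S_hat = score_and_sensitivity(beta)
    return beta, e_hat, S_hat

rng = np.random.default_rng(1)
X = rng.uniform(size=(60, 2))                    # synthetic points on [0, 1]^2
n = 15
c = (np.arange(n) + 0.5) / n
xx, yy = np.meshgrid(c, c, indexing="ij")
pts = np.column_stack([xx.ravel(), yy.ravel()])
w = np.full(n * n, 1.0 / (n * n))
kappa, omega = 30.0, 0.05

def thomas_pcf(r):
    return 1.0 + np.exp(-r**2 / (4 * omega**2)) / (4 * np.pi * omega**2 * kappa)

beta_hat, e_final, S_final = ql_fit_loglinear(
    X, np.ones((60, 1)), pts, np.ones((n * n, 1)), w, thomas_pcf, np.array([np.log(60.0)]))
```

Fixing β̃ inside the kernel is what makes ϕ̂ reusable across iterations; it parallels the one-off Cholesky factorization described in Section 4.3.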

4.2. Implementation as quasi-likelihood

Suppose that we are using simple Riemann quadrature in (18). Then the w_i’s correspond to areas of sets B_i that partition W and for each i, u_i ∈ B_i. Let Y_i denote the number of points from X falling in B_i and define μ_i = λ(u_i; β)w_i. If the B_i’s are sufficiently small so that the Y_i’s are binary then (20) is approximately equal to

\sum_{i = 1}^{m} \hat{ϕ} (u_{i}) (Y_{i} - μ_{i}) .

(23)

Further, by (1) and (2), $E Y_{i} \approx μ_{i}$ and

\mathrm{Cov}(Y_{i}, Y_{j}) = 1(i = j) \int_{B_{i}} λ(u; β) \, d u + \int_{B_{i} \times B_{j}} λ(u; β) λ(v; β) [g(u - v) - 1] \, d u \, d v \approx V_{ij} = μ_{i} 1(i = j) + μ_{i} μ_{j} [g(u_{i}, u_{j}) - 1] .

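Define Y = (Y_i)_i, μ = (μ_i)_i and V = [V_ij]_ij. Then $E Y \approx μ$ and $\mathrm{Cov}\, Y \approx V$. Moreover, multiplying (19) evaluated at u = u_l by μ_l = λ(u_l; β)w_l gives $V {[\hat{ϕ}(u_{i})]}_{i} = D$, that is, ${[\hat{ϕ}(u_{i})]}_{i} = V^{-1} D$, where D = dμ^T/dβ is the m × p matrix of partial derivatives dμ_i/dβ_j (here dμ_l/dβ = λ’(u_l; β)w_l). Hence, (22) becomes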

D^{⊺} V^{- 1} D

(24)

and (23) becomes

(Y - μ) V^{- 1} D,

(25)

which is formally a quasi-likelihood score for spatial data Y with mean μ and covariance matrix V (Gotway and Stroup, 1997).
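The identities ${[\hat{ϕ}(u_{i})]}_{i} = V^{-1} D$ and (24) can be confirmed numerically. The sketch below assumes a log-linear intensity on a grid of Riemann cells with hypothetical parameter values, forms μ, D and V as in this section, solves the Nyström system behind (19) at the quadrature points, and compares the two representations.

```python
import numpy as np

n = 12
c = (np.arange(n) + 0.5) / n
xx, yy = np.meshgrid(c, c, indexing="ij")
pts = np.column_stack([xx.ravel(), yy.ravel()])
w = np.full(n * n, 1.0 / (n * n))                    # cell areas |B_i|
Z = np.column_stack([np.ones(n * n), pts[:, 0]])     # hypothetical covariates z(u_i)
beta = np.array([5.0, 1.0])
lam = np.exp(Z @ beta)
kappa, omega = 50.0, 0.05
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
excess = np.exp(-dist**2 / (4 * omega**2)) / (4 * np.pi * omega**2 * kappa)  # g - 1

mu = lam * w                                         # μ_i = λ(u_i; β) w_i
D = mu[:, None] * Z                                  # dμ_i/dβ_j for the log link
V = np.diag(mu) + np.outer(mu, mu) * excess          # V_ij as defined above

K = excess * (lam * w)[None, :]                      # t(u_l, u_i) w_i
phi_q = np.linalg.solve(np.eye(n * n) + K, Z)        # Nyström values, with λ'/λ = z
S_hat = phi_q.T @ (Z * (lam * w)[:, None])           # (22)
```

The resulting Ŝ is also symmetric, as expected of D^⊺V⁻¹D.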

4.3. Computational details

With the quasi-likelihood formulation discussed in the previous section, $\hat{S}$ and $\hat{e}$ in (21) are given by (24) and (25), respectively. The Fisher scoring updates (21) thus take the form of generalized least squares updates

(β^{(l + 1)} - β^{(l)}) D(β^{(l)})^{⊺} V(β^{(l)})^{-1} D(β^{(l)}) = [Y - μ(β^{(l)})] V(β^{(l)})^{-1} D(β^{(l)}),

(26)

where we here use the notation D(β), V(β) and μ(β) to emphasize the dependence of D, V, and μ on β.

Let $V = V_{μ}^{1/2} (I + G) V_{μ}^{1/2}$, where V_μ = Diag(μ_i) and $G_{ij} = \sqrt{μ_{i} μ_{j}} [g(u_{i}, u_{j}) - 1]$, so that G = [G_ij]_ij is the matrix analogue of the symmetric operator T^s from Section 3.1. A computational difficulty in (26) arises from the inversion of the high-dimensional matrix I + G. However, we can approximate G by a sparse matrix G_taper obtained using tapering (e.g. Furrer et al., 2006). Thus in (26) V is replaced by $V_{taper} = V_{μ}^{1/2} (I + G_{taper}) V_{μ}^{1/2}$. A sparse matrix Cholesky decomposition I + G_taper = LL^T is obtained using the R Matrix package by Doug Bates and Martin Maechler. The matrix product $V_{taper}^{-1} D$ can then be efficiently computed by solving the equation $L L^{⊺} V_{μ}^{1/2} x = V_{μ}^{-1/2} D$ with respect to x, using forward and backward substitution for the sparse Cholesky factors L and L^T, respectively.
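The Cholesky-based solve can be sketched as follows. This is a dense NumPy stand-in written under our own assumptions: in practice G_taper is sparse, a sparse Cholesky factorization (the text uses the R Matrix package) would replace np.linalg.cholesky, and dedicated triangular solvers would replace the generic np.linalg.solve calls used here for brevity. The inputs are hypothetical.

```python
import numpy as np

def solve_V_inv_D(mu, G, D):
    """Return V^{-1} D for V = V_mu^{1/2} (I + G) V_mu^{1/2}, by solving
    L L^T V_mu^{1/2} x = V_mu^{-1/2} D with I + G = L L^T."""
    s = np.sqrt(mu)                                   # diagonal of V_mu^{1/2}
    L = np.linalg.cholesky(np.eye(len(mu)) + G)       # lower-triangular factor
    y = np.linalg.solve(L, D / s[:, None])            # forward step: L y = V_mu^{-1/2} D
    z = np.linalg.solve(L.T, y)                       # backward step: L^T z = y
    return z / s[:, None]                             # x = V_mu^{-1/2} z

# Hypothetical inputs: 1-d locations with a Gaussian-shaped excess g - 1
u = np.linspace(0.0, 1.0, 50)
mu = 0.5 + u
excess = 0.3 * np.exp(-(u[:, None] - u[None, :])**2 / 0.01)
G = np.sqrt(np.outer(mu, mu)) * excess
D = np.column_stack([mu, mu * u])
x = solve_V_inv_D(mu, G, D)
```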

In practice, it is often assumed that g(r) = g₀(∥r∥) for some function g₀. If g₀ is a decreasing function of ∥r∥, then we may define the entries in G_taper as G_ij 1[∥u_i − u_j∥ ≤ d_taper], where d_taper solves [g₀(d) − 1]/[g₀(0) − 1] = ∊ for some small ∊. That is, we replace entries G_ij by zero if g₀(∥u_i − u_j∥) − 1 is below a small fraction ∊ of the maximal value g₀(0) − 1. In general g inside G_taper is unknown and must be replaced by an estimate. We replace β and g in G_taper by preliminary estimates (see Section 4.4), which are fixed during the generalized least squares iterations. This yields further computational simplification since the Cholesky factorization of I + G_taper then only needs to be computed once.
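For a general decreasing g₀ the taper distance has no closed form, but it can be found by bisection. The sketch below uses hypothetical helper names and takes the Thomas pair correlation function (30) of Section 6.1 only as an example (for it, d_taper = 2ω(log 1/∊)^{1/2} in closed form). It computes d_taper and zeroes the entries of G beyond it.

```python
import numpy as np

def taper_distance(g0, eps, r_max, n_bisect=100):
    """Solve [g0(d) - 1] / [g0(0) - 1] = eps by bisection on [0, r_max],
    assuming g0 - 1 is positive, decreasing, and below eps * [g0(0) - 1] at r_max."""
    target = eps * (g0(0.0) - 1.0)
    lo, hi = 0.0, r_max
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if g0(mid) - 1.0 > target:
            lo = mid            # still above the threshold: root lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tapered_G(mu, pts, g0, d_taper):
    # G_ij = sqrt(mu_i mu_j)[g0(||u_i - u_j||) - 1], set to zero when ||u_i - u_j|| > d_taper
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    G = np.sqrt(np.outer(mu, mu)) * (g0(dist) - 1.0)
    return np.where(dist <= d_taper, G, 0.0)

kappa, omega, eps = 50.0, 0.02, 0.01

def thomas_g0(r):
    return 1.0 + np.exp(-r**2 / (4 * omega**2)) / (4 * np.pi * omega**2 * kappa)

d_taper = taper_distance(thomas_g0, eps, r_max=1.0)
n = 20
c = (np.arange(n) + 0.5) / n
xx, yy = np.meshgrid(c, c, indexing="ij")
pts = np.column_stack([xx.ravel(), yy.ravel()])
G_taper = tapered_G(np.full(n * n, 0.5), pts, thomas_g0, d_taper)
```

With this grid spacing each row of G_taper keeps at most nine non-zero entries, which is the sparsity that the sparse Cholesky factorization exploits. Note that hard thresholding does not by itself guarantee that I + G_taper stays positive definite.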

By the asymptotic result (29) in Section 5 the asymptotic covariance matrix of $\hat{β}$ is given by the inverse sensitivity where the sensitivity is estimated by (24). When V is replaced by V_taper in (26) we need the following adjusted estimate of the covariance matrix of $\hat{β}$ :

S_{taper}^{-1} D^{⊺} V_{taper}^{-1} V V_{taper}^{-1} D S_{taper}^{-1},

(27)

where $S_{taper} = D^{⊺} V_{taper}^{-1} D$. Parameter standard errors are given by the square roots of the diagonal elements in (27). Note that it is not required to invert the non-sparse covariance matrix V in order to compute (27). Nevertheless, evaluating (27) can still be computationally intensive, as shown in the practical examples in Section 6.2.
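A dense sketch of the adjusted covariance (27), for illustration only (hypothetical function name and toy inputs). With V_taper = V it collapses to the inverse sensitivity (D^⊺V⁻¹D)⁻¹ from (24). Any other positive definite V_taper gives a covariance that is at least as large in the non-negative definite ordering, reflecting the efficiency loss from tapering.

```python
import numpy as np

def tapered_sandwich_cov(D, V, V_taper):
    """Adjusted covariance (27): S_t^{-1} D^T V_t^{-1} V V_t^{-1} D S_t^{-1},
    with S_t = D^T V_t^{-1} D; V and V_taper assumed symmetric positive definite."""
    VtD = np.linalg.solve(V_taper, D)            # V_taper^{-1} D
    S_t_inv = np.linalg.inv(D.T @ VtD)
    middle = VtD.T @ V @ VtD                     # D^T V_t^{-1} V V_t^{-1} D
    return S_t_inv @ middle @ S_t_inv

# Toy example on 1-d locations with a Gaussian-shaped excess g - 1
u = np.linspace(0.0, 1.0, 40)
mu = 0.5 + u
V = np.diag(mu) + np.outer(mu, mu) * 0.2 * np.exp(-(u[:, None] - u[None, :])**2 / 0.02)
D = np.column_stack([mu, mu * u])
cov_full = tapered_sandwich_cov(D, V, V)
cov_diag = tapered_sandwich_cov(D, V, np.diag(np.diag(V)))   # extreme "taper": diagonal only
std_errors = np.sqrt(np.diag(cov_diag))
```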

The discretization of W should be chosen as fine as possible and the tapering parameter ∊ as small as possible given the available computational resources. Typically the discretization of W is generated by the cells B_i of a regular grid. In the practical data example in Section 6.2 we investigate the sensitivity of the parameter estimates and parameter standard errors to the choice of grid size and ∊. In the data example very similar results are obtained with ∊ = 0.002, 0.01 and 0.05, suggesting that ∊ = 0.05 is in fact sufficiently small. Tapering entails a loss of statistical efficiency relative to quasi-likelihood estimation without tapering. However, valid parameter estimates and standard errors from the adjusted covariance matrix (27) are obtained regardless of the chosen value of the tapering tuning parameter ∊.

4.4. Preliminary Estimation of Intensity and Pair Correlation

To obtain a preliminary estimate of g we assume that g(r) = g(r; ψ), where g(·; ψ) is a translation invariant parametric pair correlation function model. We replace β and ψ inside G by preliminary estimates $\tilde{β}$ and $\tilde{ψ}$, which are fixed during the iterations (26). The estimates $\tilde{β}$ and $\tilde{ψ}$ can be obtained using the two-step approach in Waagepetersen and Guan (2009), where $\tilde{β}$ is obtained from the composite likelihood function and $\tilde{ψ}$ is a minimum contrast estimate based on the K-function. That is,

\tilde{ψ} = \underset{ψ}{\arg\min} \int_{0}^{r_{\max}} [K(t; ψ)^{1/4} - \hat{K}(t)^{1/4}]^{2} \, d t

where

K (t; ψ) = \int_{‖ r ‖ \leq t} g (r; ψ) d r

(28)

is the K-function and $\hat{K}$ is a non-parametric estimate of the K-function (Baddeley et al., 2000). If translation invariance cannot be assumed, ψ may instead be estimated using a second-order composite likelihood as in Jalilian et al. (2013).
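For the Thomas process with pair correlation function (30), the integral (28) has the closed form K(t; κ, ω) = πt² + [1 − exp(−t²/(4ω²))]/κ, which makes the minimum contrast step cheap. The sketch below uses a crude grid search in place of a numerical optimizer. For illustration it also uses a model-generated curve in place of a non-parametric estimate $\hat{K}$, so it recovers the generating parameters exactly. The helper names are hypothetical.

```python
import numpy as np

def thomas_K(t, kappa, omega):
    # Closed form of (28) for the Thomas pair correlation function (30)
    return np.pi * t**2 + (1.0 - np.exp(-t**2 / (4.0 * omega**2))) / kappa

def min_contrast_grid(K_hat, t, kappas, omegas, c=0.25):
    """Minimise a Riemann-sum approximation of ∫ [K(t; ψ)^c - K_hat(t)^c]^2 dt
    over a grid of ψ = (κ, ω); t must be an equally spaced grid on [0, r_max]."""
    dt = t[1] - t[0]
    best, best_val = None, np.inf
    for kappa in kappas:
        for omega in omegas:
            crit = np.sum((thomas_K(t, kappa, omega)**c - K_hat**c) ** 2) * dt
            if crit < best_val:
                best, best_val = (kappa, omega), crit
    return best

t = np.linspace(0.0, 0.25, 251)
K_hat = thomas_K(t, 100.0, 0.04)          # stand-in for a non-parametric estimate
psi_tilde = min_contrast_grid(K_hat, t, [50.0, 100.0, 150.0, 200.0], [0.02, 0.03, 0.04, 0.05])
```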

5. ASYMPTOTIC THEORY

Let $W_{n} \subset R^{2}$ be an increasing sequence of observation windows. Following Section 4.4 we assume that the true pair correlation function is given by a parametric model g(r) = g(r; ψ) for some unknown parameter vector $ψ \in R^{q}$. Let $θ = (β, ψ) \in R^{p + q}$. We denote the true value of θ by θ* = (β*, ψ*). In what follows, $E$ and $\mathrm{Var}$ denote expectation and variance under the distribution corresponding to θ*.

Introducing the dependence on n and θ in the notation from Section 3, we have

ϕ_{n, θ}(u, β) = \left[(I + T_{n, θ})^{-1} \frac{λ^{'}(\cdot; β)}{λ(\cdot; β)}\right](u), \qquad (T_{n, θ} f)(u) = \int_{W_{n}} t_{θ}(u, v) f(v) \, d v

and

t_{θ} (u, v) = λ (v; β) [g (u - v; ψ) - 1] .

Following Section 4.4 we replace θ in the kernel t_θ by a preliminary estimate ${\tilde{θ}}_{n} = ({\tilde{β}}_{n}, {\tilde{ψ}}_{n})$ . The estimating function (13) then becomes $e_{n, {\tilde{θ}}_{n}} (β)$ where

e_{n, θ} (β) = \sum_{u \in X \cap W_{n}} ϕ_{n, θ} (u, β) - \int_{W_{n}} ϕ_{n, θ} (u, β) λ (u; β) d u .

Let ${\hat{β}}_{n}$ denote the estimator obtained by solving $e_{n, {\tilde{θ}}_{n}} (β) = 0$ . Further, define

\bar{Σ}_{n} = |W_{n}|^{-1} \mathrm{Var}\, e_{n, θ^{*}}(β^{*}), \quad J_{n, θ}(β) = - \frac{d}{d β^{⊺}} e_{n, θ}(β) \quad \text{and} \quad \bar{S}_{n, θ}(β) = |W_{n}|^{-1} E J_{n, θ}(β) .

Note that $\bar{Σ}_{n}$ and $\bar{S}_{n, θ}(β)$ are ‘averaged’ versions of $Σ_{n} = \mathrm{Var}\, e_{n, θ^{*}}(β^{*})$ and $S_{n, θ}(β) = E J_{n, θ}(β)$.

In Section 2 in the supplementary material we verify the existence of a $|W_{n}|^{1/2}$-consistent sequence of solutions $\hat{β}_{n}$, i.e., $|W_{n}|^{1/2}(\hat{β}_{n} - β^{*})$ is bounded in probability. We further show in Section 3 in the supplementary material that $|W_{n}|^{-1/2} e_{n, \tilde{θ}_{n}}(β^{*}) \bar{Σ}_{n}^{-1/2}$ is asymptotically standard normal. The conditions needed for these results are listed in Section 1 in the supplementary material. By a Taylor series expansion,

|W_{n}|^{-1/2} e_{n, \tilde{θ}_{n}}(β^{*}) \bar{Σ}_{n}^{-1/2} = |W_{n}|^{1/2} (\hat{β}_{n} - β^{*}) \frac{J_{n, \tilde{θ}_{n}}(b_{n})}{|W_{n}|} \bar{Σ}_{n}^{-1/2}

for some $b_{n} \in R^{p}$ satisfying $‖b_{n} - β^{*}‖ \leq ‖\hat{β}_{n} - β^{*}‖$. Invoking further the convergence $J_{n, \tilde{θ}_{n}}(b_{n}) / |W_{n}| - \bar{S}_{n, θ^{*}}(β^{*}) \to 0$ (results R2 and R3 in Section 2 in the supplementary material) and the identity $\bar{Σ}_{n} = \bar{S}_{n, θ^{*}}(β^{*})$, which follows from (14), we obtain

{∣ W_{n} ∣}^{1 ∕ 2} ({\hat{β}}_{n} - β^{*}) {\bar{S}}_{n, θ^{*}} {(β^{*})}^{1 ∕ 2} \to N_{p} (0, I) .

(29)

Hence, for a given (large) n, ${\hat{β}}_{n}$ is approximately normal with mean $β^{*}$ and covariance matrix estimated by ${∣ W_{n} ∣}^{- 1} {\bar{S}}_{n, ({\hat{β}}_{n}, {\tilde{ψ}}_{n})}^{- 1} ({\hat{β}}_{n})$.
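Since $|W_{n}|^{-1} \bar{S}_{n}^{-1} = S_{n}^{-1}$ and, by the Campbell formula, $S_{n, θ}(β^{*}) = \int_{W_{n}} ϕ_{n, θ}(u, β^{*}) λ^{'}(u; β^{*})^{⊺} du$, the covariance estimate only requires ϕ and λ′ on the quadrature grid. The Python sketch below is a plain Riemann-sum version of this estimate, assuming the identity $\bar{Σ}_{n} = \bar{S}_{n}$ of (14) (which holds for the optimal ϕ); it is a simplified illustration and need not coincide with the estimator (27) used in Section 6. The function name is hypothetical.

```python
import numpy as np

def ql_standard_errors(phi_grid, dlam_grid, cell_area):
    """Approximate standard errors of beta-hat from S = int_W phi(u) lambda'(u)^T du.

    phi_grid, dlam_grid: (m, p) arrays with phi(u_i) and lambda'(u_i; beta) = d lambda / d beta
    at the quadrature points. Assumes Sigma = S, so Cov(beta-hat) is approximately S^{-1}.
    """
    S = cell_area * phi_grid.T @ dlam_grid     # Riemann sum for the p x p sensitivity matrix
    cov = np.linalg.inv(S)
    return np.sqrt(np.diag(cov)), cov

# Poisson check: intercept-only intensity 400 on the unit square, phi = lambda'/lambda = 1.
m = 400
se, cov = ql_standard_errors(np.ones((m, 1)), np.full((m, 1), 400.0), 1.0 / m)
```

For a Poisson process with constant intensity 400 on the unit square, ϕ = 1 and λ′ = λ, and the sketch returns the familiar standard error $1/\sqrt{400} = 0.05$ for the log-intensity.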

6. SIMULATION STUDY AND DATA EXAMPLE

To examine the performance of our optimal quasi-likelihood estimator relative to composite likelihood and weighted composite likelihood, we carry out a simulation study under the Guan and Shen (2010) setting. We refrain from a comparison with maximum likelihood estimation due to the lack of a computationally feasible implementation of this method. In addition to the simulation study we demonstrate the practical usefulness of our method and discuss computational issues in a tropical rain forest data example.

6.1. Simulation Study

Following Guan and Shen (2010), realizations of Cox processes are generated on a square window W. Each simulation involves first the generation of a zero-mean Gaussian random covariate field Z = {Z(u) : u ∈ W} with exponential covariance function c(u) = exp(−∥u∥/γ), γ > 0, and then the generation of an inhomogeneous Thomas point process given Z with intensity function λ(u; β) = exp[β₀ + β₁Z(u)] and pair correlation function

g (r) = 1 + \exp [- {‖ r ‖}^{2} ∕ (4 ω^{2})] ∕ (4 π ω^{2} κ),

(30)

where κ > 0 is the intensity of the parent process and ω > 0 is the dispersal parameter (Waagepetersen, 2007). For each simulation, preliminary estimates $\tilde{β}$ and $\tilde{ψ}$ of β and ψ = (κ, ω) are obtained using the two-step method of Waagepetersen and Guan (2009) (i.e. $\tilde{β}$ is the composite likelihood (CL) estimate of β). These preliminary estimates are then used to obtain the weighted composite likelihood (WCL) and quasi-likelihood (QL) estimates; see Section 3.2 and Sections 4.3 and 4.4 for details.
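The simulation design can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the code behind Table 1: the covariate field is simulated on a coarse grid by a Cholesky factorisation and treated as piecewise constant, and the inhomogeneous Thomas process is obtained by independently thinning a homogeneous Thomas process with intensity $\max_u λ(u; β)$, using retention probability $λ(u; β)/\max_v λ(v; β)$. Independent thinning leaves the pair correlation function unchanged, so the thinned process has intensity λ(u; β) and pair correlation function (30). Parents are generated on a window enlarged by 4ω to reduce edge effects; the grid size, buffer width, parameter values and function names are hypothetical choices.

```python
import numpy as np

def gaussian_field_exp_cov(points, gamma, rng):
    """Zero-mean, unit-variance Gaussian field with covariance exp(-||u - v|| / gamma),
    simulated at the given points via a Cholesky factorisation."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    L = np.linalg.cholesky(np.exp(-d / gamma) + 1e-10 * np.eye(len(points)))  # jitter for stability
    return L @ rng.standard_normal(len(points))

def sim_inhom_thomas(z_grid, nx, beta0, beta1, kappa, omega, rng):
    """Inhomogeneous Thomas process on [0, 1]^2 given covariate values z_grid at the
    centres of an nx-by-nx grid (row-major), obtained by independent thinning."""
    lam_grid = np.exp(beta0 + beta1 * z_grid)
    lam_max = lam_grid.max()
    mu = lam_max / kappa                       # mean number of offspring per parent
    buf = 4.0 * omega                          # enlarged parent window against edge effects
    n_par = rng.poisson(kappa * (1.0 + 2.0 * buf) ** 2)
    parents = rng.uniform(-buf, 1.0 + buf, size=(n_par, 2))
    centres = np.repeat(parents, rng.poisson(mu, size=n_par), axis=0)
    pts = centres + rng.normal(scale=omega, size=centres.shape)  # isotropic N(0, omega^2 I) dispersal
    pts = pts[np.all((pts >= 0.0) & (pts < 1.0), axis=1)]
    # Retain each point with probability lambda(u) / lambda_max (Z piecewise constant per cell)
    ix = np.minimum((pts[:, 0] * nx).astype(int), nx - 1)
    iy = np.minimum((pts[:, 1] * nx).astype(int), nx - 1)
    keep = rng.uniform(size=len(pts)) < lam_grid[iy * nx + ix] / lam_max
    return pts[keep]

rng = np.random.default_rng(42)
nx = 20
xs = (np.arange(nx) + 0.5) / nx
gx, gy = np.meshgrid(xs, xs)                   # row-major: index = iy * nx + ix
grid = np.column_stack([gx.ravel(), gy.ravel()])
z = gaussian_field_exp_cov(grid, gamma=0.1, rng=rng)
X = sim_inhom_thomas(z, nx, beta0=5.5, beta1=0.5, kappa=100.0, omega=0.02, rng=rng)
```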

The root mean square error (RMSE) of the CL, WCL and QL estimates is computed from 1000 simulations for different settings of parameters and observation windows W = [0, 1]² or W = [0, 2]². We consider covariate spatial correlation scales γ* = 0.05, 0.1, 0.2. The combinations of parent point intensities κ* = 100, 200 and dispersal parameters ω* = 0.02, 0.04 further create a wide selection of clustering behaviours given Z as reflected by the corresponding pair correlation functions. We moreover consider different inhomogeneity levels $β_{1}^{*}$ = 0.5, 1 and adjust the intercept $β_{0}^{*}$ so that the expected number of points is always 400 in the case of W = [0, 1]² and 1600 in the case of W = [0, 2]². The integral terms in the CL, WCL and QL estimating functions are approximated using a 50 × 50 grid for W = [0, 1]² and a 100 × 100 grid for W = [0, 2]². Tapering for QL is carried out as described in Section 4.4 using d_taper obtained with ∊ = 0.01 for each estimated pair correlation function $g (\cdot; \tilde{ψ})$ . For WCL we use $A (u) \approx K (d_{taper}; \tilde{ψ}) - π d_{taper}^{2}$ where A(u) is given by (17) and K by (28).
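If "expected number of points" is read as the expectation over both the covariate field and the point process given Z (an interpretation assumed here), the intercept adjustment has a closed form. Since Z(u) is standard normal (c(0) = 1), $E \exp[β_{1} Z(u)] = \exp(β_{1}^{2}/2)$, and

E [N (W)] = \int_{W} E \exp [β_{0} + β_{1} Z (u)] d u = ∣ W ∣ \exp (β_{0} + β_{1}^{2} ∕ 2), \quad \text{so} \quad β_{0}^{*} = \log \{E [N (W)] ∕ ∣ W ∣\} - {(β_{1}^{*})}^{2} ∕ 2 .

For example, $β_{1}^{*} = 1$ gives $β_{0}^{*} = \log 400 - 0.5 \approx 5.49$ for W = [0, 1]²; because E[N(W)]/|W| = 400 for both windows, the same intercept applies to W = [0, 2]².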

The intercept β₀ is typically not of prime interest when it comes to statistical inference in regression models, so in the following we focus on the results for β₁. Table 1 shows, for each setting of $(γ^{*}, κ^{*}, ω^{*}, β_{1}^{*})$ and each size of window W, the root mean square errors (RMSEs) of the QL estimates of β₁ as well as the increases in RMSE for WCL and CL relative to QL. The table also shows empirical standard errors (SD) for the QL estimates as well as the square root of the average of the estimated variances obtained for the QL estimates using (27) (column ASD in Table 1). The RMSE and the empirical standard errors coincide, which indicates that the QL estimates are essentially unbiased. There is further close agreement between SD and ASD, as expected from the decomposition $\sqrt{\mathrm{Var}\, {\hat{β}}_{1}} = \sqrt{E\, \mathrm{Var} [{\hat{β}}_{1} ∣ Z] + \mathrm{Var}\, E [{\hat{β}}_{1} ∣ Z]}$: the expression (27) provides an estimate of $\mathrm{Var} [{\hat{β}}_{1} ∣ Z]$, and $\mathrm{Var}\, E [{\hat{β}}_{1} ∣ Z]$ is zero by (approximate) unbiasedness of ${\hat{β}}_{1}$ given Z. The standard errors increase with increasing γ*, but they do not depend strongly on $(κ^{*}, ω^{*}, β_{1}^{*})$. For the larger observation window the standard errors are half as big as for the smaller one. This is in accordance with the asymptotic result (29), which implies that the standard deviation is inversely proportional to the square root of the area of the observation window (here the area increases by a factor of four).

Table 1.

RMSE for QL estimates of β₁ and the increase (in %) in RMSE for WCL and CL relative to QL. The empirical standard error (SD) and asymptotic standard error (ASD) for the QL estimates are shown too.

ψ* = (κ*, ω*)	γ*	$β_{1}^{*}$	$W = {[0, 1]}^{2}$, $E [N (W)] = 400$					$W = {[0, 2]}^{2}$, $E [N (W)] = 1600$

			QL	WCL (%)	CL (%)	SD	ASD	QL	WCL (%)	CL (%)	SD	ASD
(100, .02)	.05	.5	.08	21	24	.08	.08	.04	21	26	.04	.04
	.05	1	.09	22	44	.09	.08	.04	25	59	.04	.04
	.1	.5	.09	13	21	.09	.09	.04	15	25	.04	.04
	.1	1	.10	17	39	.10	.10	.05	17	61	.05	.05
	.2	.5	.11	11	15	.11	.10	.05	11	20	.05	.05
	.2	1	.12	13	31	.12	.12	.06	12	49	.06	.06

(100, .04)	.05	.5	.06	23	20	.06	.06	.03	19	21	.03	.03
	.05	1	.07	29	26	.07	.07	.03	31	40	.03	.03
	.1	.5	.08	23	23	.08	.07	.04	22	28	.04	.04
	.1	1	.08	29	34	.08	.08	.04	24	42	.04	.04
	.2	.5	.09	14	20	.09	.09	.05	18	25	.05	.04
	.2	1	.11	16	25	.11	.10	.05	14	38	.05	.05

(200, .02)	.05	.5	.07	7	9	.07	.07	.03	10	13	.03	.03
	.05	1	.08	12	23	.08	.07	.04	21	34	.04	.04
	.1	.5	.08	7	10	.08	.08	.04	7	11	.04	.04
	.1	1	.09	6	19	.09	.08	.04	13	43	.04	.04
	.2	.5	.09	4	7	.09	.09	.04	6	11	.04	.04
	.2	1	.11	8	19	.11	.10	.05	4	30	.05	.05

(200, .04)	.05	.5	.06	9	8	.06	.07	.03	11	10	.03	.03
	.05	1	.07	20	10	.07	.06	.03	25	21	.03	.03
	.1	.5	.07	9	10	.07	.07	.03	11	13	.03	.03
	.1	1	.08	13	9	.08	.07	.04	16	25	.04	.03
	.2	.5	.08	7	8	.08	.08	.04	9	9	.04	.04
	.2	1	.10	14	11	.10	.08	.04	11	23	.04	.04


As expected from the theoretical results, the QL estimator performs better than both CL and WCL in all cases. The improvement over CL is especially substantial in the more clustered (small κ* and ω*) and more inhomogeneous ($β_{1}^{*} = 1$) cases, where CL has up to 44% larger RMSE than QL for W = [0, 1]² and up to 61% larger RMSE for W = [0, 2]². As we alluded to in Section 3.2, the performance of WCL may rely on the validity of the approximation (16). For a longer dependence range the approximation is expected to be less accurate, which explains the large drop in efficiency of WCL relative to CL when ω* increases from 0.02 to 0.04. In particular, for ψ* = (200, 0.04) WCL performs even worse than CL in several settings. In contrast, QL still fares better than CL, with increases in RMSE of 8-11% (W = [0, 1]²) and 9-25% (W = [0, 2]²) for CL relative to QL. There is no clear pattern in how the relative increases depend on γ*.

6.2. Data Example

A fundamental problem in biological research is to understand the very high biodiversity in tropical rain forests. One explanation is the niche assembly hypothesis, which states that different species coexist by adapting to different environmental niches. Data available for studying this hypothesis consist of point patterns of locations of trees as well as observations of environmental covariates. Figure 1 shows the spatial locations of three tree species, Acalypha diversifolia (528 trees), Lonchocarpus heptaphyllus (836 trees) and Capparis frondosa (3299 trees), in a 1000 m × 500 m observation window on Barro Colorado Island (Condit et al., 1996; Condit, 1998; Hubbell and Foster, 1983). One example of an environmental variable (potassium content in the soil) is also shown.

Fig. 1 — Locations of Acalypha, Lonchocarpus, and Capparis trees and image of interpolated potassium content in the surface soil (from top to bottom).

In order to study the niche assembly hypothesis we use our quasi-likelihood method to fit log-linear regression models for the intensity functions depending on environmental variables. In addition to soil potassium content (K, divided by 1000), we consider nine other covariates for the intensity functions: pH, elevation (dem, divided by 100), slope gradient (grad), multi-resolution index of valley bottom flatness (mrvbf), incoming mean solar radiation (solar), topographic wetness index (twi) as well as soil contents of copper (Cu), mineralized nitrogen (Nmin, divided by 100) and phosphorus (P). The quasi-likelihood estimation was implemented as in the simulation study using a 100×50 grid for the numerical quadrature and tapering tuning parameter ∊ = 0.01.

6.2.1. Modeling of pair correlation function

For each species we initially fit, using the two-step procedure of Waagepetersen and Guan (2009), the following pair correlation functions of normal variance mixture type (Jalilian et al., 2013):

g (r; ψ) = 1 + c (r; ψ), r \in R^{2},

where the covariance function c(r; ψ) is either Gaussian

c (r; (σ^{2}, α)) = σ^{2} \exp [- {(‖ r ‖ ∕ α)}^{2}],

Matérn

c (r; (σ^{2}, α, ν)) = σ^{2} \frac{{(‖ r ‖ ∕ α)}^{ν} K_{ν} (‖ r ‖ ∕ α)}{2^{ν - 1} Γ (ν)},

(K_ν is the modified Bessel function of the second kind) or Cauchy

c (r; (σ^{2}, α)) = σ^{2} {[1 + {(‖ r ‖ ∕ α)}^{2}]}^{- 3 ∕ 2} .

These covariance functions represent very different tail behaviours, ranging from light (Gaussian) through exponential (Matérn) to heavy (Cauchy) tails. The pair correlation function obtained with the Gaussian covariance function is just a re-parametrization of the Thomas process pair correlation function (30). For the Matérn covariance we consider three values of the shape parameter, ν = 0.25, 0.5 and 1. With ν = 0.5 the exponential model c[r; (σ², α, 0.5)] = σ² exp(−∥r∥/α) is obtained, while ν = 0.25 and ν = 1 yield a log-convex and a log-concave covariance function, respectively.
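The three covariance families are easy to code. The Python sketch below (function names hypothetical) uses `scipy.special.kv` for the modified Bessel function of the second kind and handles the removable singularity of the Matérn form at r = 0, where c(0) = σ². For ν = 0.5 it reproduces the exponential model, which gives a simple numerical check.

```python
import numpy as np
from scipy.special import gamma as gamma_fn, kv

def cov_gaussian(r, sigma2, alpha):
    """Gaussian covariance sigma2 * exp(-(r / alpha)^2)."""
    return sigma2 * np.exp(-(np.asarray(r, dtype=float) / alpha) ** 2)

def cov_matern(r, sigma2, alpha, nu):
    """Matern covariance; returns an array and uses the limit c(0) = sigma2."""
    x = np.atleast_1d(np.asarray(r, dtype=float)) / alpha
    out = np.full_like(x, sigma2)
    pos = x > 0
    out[pos] = sigma2 * x[pos] ** nu * kv(nu, x[pos]) / (2.0 ** (nu - 1.0) * gamma_fn(nu))
    return out

def cov_cauchy(r, sigma2, alpha):
    """Cauchy covariance sigma2 * [1 + (r / alpha)^2]^(-3/2)."""
    return sigma2 * (1.0 + (np.asarray(r, dtype=float) / alpha) ** 2) ** (-1.5)

def pcf(r, cov, *params):
    """Pair correlation function of normal variance mixture type, g(r) = 1 + c(r)."""
    return 1.0 + cov(r, *params)

r = np.linspace(0.0, 50.0, 11)
g_matern = pcf(r, cov_matern, 2.3, 15.4, 0.5)
```

For example, `pcf(r, cov_cauchy, 15.4, 4.6)` evaluates the Cauchy model at the Acalypha estimate reported in the next paragraph.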

The solid curves in Figure 2 show $g (\cdot; \hat{ψ})$ for the best fitting (in terms of the minimum contrast criterion for the corresponding K-function) pair correlation functions: Cauchy for Acalypha $(\hat{ψ} = (15.4, 4.6))$ , Matérn $(\hat{ψ} = (2.3, 15.4, 0.5))$ for Lonchocarpus and Matérn $(\hat{ψ} = (1.3, 22.9, 0.25))$ for Capparis. The tapering distances corresponding to ∊ = 0.01 are respectively 20.1, 71.3 and 84.9 for the three species. Hence Capparis is the computationally most challenging case.

Fig. 2 — Solid curves: best fitting pair correlation functions $g (\cdot; \hat{ψ}) - 1$ for Acalypha (left), Lonchocarpus (middle), and Capparis (right). Dotted curve: non-parametric estimate of the pair correlation function. Shaded area: 95% point-wise envelopes based on 199 simulations of fitted model. Dashed curve: mean of non-parametric estimates of pair correlation functions obtained from the simulations.

6.2.2. Results of quasi-likelihood estimation and comparison with previous methods

Backward model selection based on the quasi-likelihood estimates is carried out for each species using significance level 5%. The selected covariates are potassium (K) for Acalypha, mineralized nitrogen (Nmin) and phosphorus (P) for Lonchocarpus, and elevation (dem), gradient (grad) and potassium for Capparis. The associated parameter estimates and standard errors are shown in Table 2. The qualitative biological findings from the selected models and the associated parameter estimates are that Acalypha and Capparis both have a preference for niches with high levels of potassium. Regarding topography, Capparis prefers flat areas at high altitudes. Finally, Lonchocarpus appears to be a frugal species adapted to low levels of mineralized nitrogen and phosphorus. For comparison, Table 2 also shows the composite and weighted composite likelihood estimates, which are in general quite similar to the quasi-likelihood estimates. One exception is the gradient (grad) regression parameter for Capparis, where the absolute value of the quasi-likelihood estimate is about twice as large as the other two estimates.
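Backward selection of this kind can be sketched as repeated Wald tests: fit the model, drop the covariate with the largest p-value if it exceeds the significance level, and refit. The Python sketch below assumes a hypothetical `fit` callable that returns estimates and standard errors keyed by covariate name (for instance from a quasi-likelihood fit); the covariate names and numbers in the usage example are made up.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal Wald statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def backward_select(covariates, fit, level=0.05):
    """Backward elimination by Wald tests.

    fit: hypothetical callable taking a list of covariate names and returning
         (estimates, standard_errors) as dicts keyed by covariate name; the
         intercept is assumed to be handled inside fit and is never dropped.
    """
    current = list(covariates)
    while current:
        est, se = fit(current)
        pvals = {name: two_sided_p(est[name] / se[name]) for name in current}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= level:
            break                          # every remaining covariate is significant
        current.remove(worst)              # drop the least significant one and refit
    return current

# Made-up fits: (estimate, standard error) per covariate, independent of the model.
toy = {"x1": (3.0, 1.0), "x2": (2.5, 1.0), "x3": (0.2, 1.0)}

def toy_fit(names):
    return {n: toy[n][0] for n in names}, {n: toy[n][1] for n in names}

selected = backward_select(["x1", "x2", "x3"], toy_fit)
```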

Table 2.

Quasi-likelihood parameter estimates and standard errors for the selected intensity function models. The table also shows the composite and weighted composite likelihood estimates and the increases in percent of their standard errors relative to quasi-likelihood.

		Acalypha		Lonchocarpus			Capparis
		Intercept	K	Intercept	Nmin	P	Intercept	dem	grad	K
Estm.	QL	−6.9	4.4	−6.5	−2.8	−0.15	−5.1	2.3	−2.0	4.1
	WCL	−6.9	4.3	−6.5	−2.8	−0.16	−5.1	2.8	−0.9	4.2
	CL	−6.9	4.0	−6.5	−2.7	−0.16	−5.1	2.9	−1.1	4.3

se.	QL	0.085	1.2	0.088	0.69	0.055	0.066	0.80	0.95	1.0

Incr. se.	WCL	0.0	0.1	0.9	1.4	2.1	1.3	4.6	11.5	5.3
Incr. se.	CL	0.0	2.3	1.2	7.0	4.5	1.5	8.5	17.1	7.4


The advantage of using quasi-likelihood instead of composite likelihood or weighted composite likelihood is that we obtain more precise regression parameter estimates, as reflected in smaller standard errors and narrower confidence intervals. Table 2 also shows the quasi-likelihood standard errors and the increase in percent in the standard errors when composite or weighted composite likelihood is used instead. As demonstrated in the simulation study, the advantage of using quasi-likelihood depends strongly on the strength of correlation as measured by the pair correlation function. For Acalypha, where the pair correlation function drops off quickly, the standard errors for the composite likelihood approaches are at most 2.3% larger than for quasi-likelihood. For Capparis, on the other hand, the increase in standard error is up to 17.1%. The differences in the parameter estimates and standard errors imply that gradient is a significant covariate at the 5% level according to the quasi-likelihood results but not according to the other two approaches.
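The significance claim for gradient can be checked from Table 2 with a two-sided Wald test. The Python sketch below recovers the WCL and CL standard errors from the QL standard error 0.95 and the reported percentage increases, and uses the normal approximation to the Wald statistic; since the table entries are rounded, the resulting p-values are only approximate.

```python
import math

def wald_p(estimate, se):
    """Two-sided p-value of the Wald statistic estimate / se (normal approximation)."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))

se_ql = 0.95                       # QL standard error for grad (Capparis), Table 2
fits = {                           # method: (estimate, % increase in se relative to QL)
    "QL": (-2.0, 0.0),
    "WCL": (-0.9, 11.5),
    "CL": (-1.1, 17.1),
}
pvals = {}
for method, (est, incr) in fits.items():
    se = se_ql * (1.0 + incr / 100.0)
    pvals[method] = wald_p(est, se)
    print(f"{method}: z = {est / se:.2f}, p = {pvals[method]:.3f}")
```

This gives p of roughly 0.035 for QL and p above 0.3 for both WCL and CL, in line with the conclusion drawn in the text.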

We carry out model assessment using the empirical empty space (F) function and a non-parametric estimate of the pair correlation function adjusted for the inhomogeneous intensity function where 95% point-wise envelopes are computed using simulations under the fitted models (for details see e.g. Sections 4.3.4-4.3.6 in Møller and Waagepetersen, 2004). For Lonchocarpus and Capparis the non-parametric estimates of the pair correlation function are contained within the envelopes while a few minor excursions occur for Acalypha (Figure 2). Plots based on the F function (not shown) did not reveal any deficiencies of the fitted models.
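Pointwise envelopes of this kind can be computed from the simulated summary statistics by order statistics. The exact ranks are not specified here; the Python sketch below assumes the common choice of the k-th smallest and k-th largest simulated values with k = α(n_sim + 1)/2, which for 199 simulations and α = 0.05 gives k = 5. The simulated F-function or pair correlation estimates are assumed to be computed elsewhere (for instance with spatstat); the function name is hypothetical.

```python
import numpy as np

def pointwise_envelope(sim_values, alpha=0.05):
    """Pointwise envelope from simulated summary statistics.

    sim_values: (nsim, nr) array with one row per simulation and one column per distance.
    Returns the k-th smallest and k-th largest value at each distance, where
    k = alpha / 2 * (nsim + 1); for nsim = 199 and alpha = 0.05 this gives k = 5.
    """
    nsim = sim_values.shape[0]
    k = int(round(alpha / 2.0 * (nsim + 1)))
    s = np.sort(sim_values, axis=0)
    return s[k - 1], s[nsim - k]

sims = np.tile(np.arange(199.0)[:, None], (1, 3))   # toy input: values 0, ..., 198 at each distance
lo, hi = pointwise_envelope(sims)
```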

6.2.3. Dependence on grid size and tapering parameter

The computing time for the QL estimation depends both on the grid used for the numerical quadrature and on the tapering tuning parameter ∊. We also tried a 150 × 75 grid and ∊ = 0.05 and 0.002 for the QL fitting of the final models. Parameter estimates and parameter estimation computing times (system plus CPU time on a 2.90 GHz laptop) for all combinations of grid size, ∊ and species are shown in Table 3. The computing time for the parameter estimation depends strongly on the grid size, ∊ and the species (i.e. the range of spatial dependence). Computing times including computation of standard errors are shown in Table 4, together with the computed standard errors for the parameter estimates in Table 3. The computing time with computation of standard errors is less sensitive to ∊ and species, since in this case the main computational burden arises from the non-sparse matrix in (27). For the 100 × 50 grid and ∊ = 0.01, the computing time including computation of standard errors is 13.5 seconds for Capparis and at most 24.0 seconds (Lonchocarpus) across the three species. In contrast to the large variations in computing time, the parameter estimates and estimated standard errors for each species are very stable across the combinations of grid size and tapering parameter ∊.
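Tapering makes the discretised kernel sparse, which plausibly explains why the computing times in Table 3 grow as ∊ decreases (larger tapering distance) and as the grid is refined. The Python sketch below illustrates one way of exploiting this with SciPy: kernel entries are computed only for pairs of quadrature points within distance d_taper (found with `scipy.spatial.cKDTree.query_pairs`), and the sparse system is solved by a sparse LU factorisation (`scipy.sparse.linalg.splu`). It uses a hard truncation of t(u, v) beyond d_taper, which is an assumption and may differ from the tapering scheme of Section 4.4; the Thomas pair correlation function, the covariate and all names are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix, identity
from scipy.sparse.linalg import splu
from scipy.spatial import cKDTree

def thomas_pcf_minus_one(r, kappa, omega):
    """g(r) - 1 for the Thomas pair correlation function (30)."""
    return np.exp(-r**2 / (4.0 * omega**2)) / (4.0 * np.pi * omega**2 * kappa)

def solve_phi_tapered(points, cell_area, lam, rhs, d_taper, kappa, omega):
    """Sparse Nystrom solution of (I + T) phi = rhs, where the kernel
    t(u, v) = lambda(v) [g(u - v) - 1] is set to zero for ||u - v|| > d_taper."""
    m = len(points)
    pairs = cKDTree(points).query_pairs(d_taper, output_type="ndarray")  # pairs with i < j
    idx = np.arange(m)
    i = np.concatenate([pairs[:, 0], pairs[:, 1], idx])   # both orderings plus the diagonal
    j = np.concatenate([pairs[:, 1], pairs[:, 0], idx])
    r = np.linalg.norm(points[i] - points[j], axis=1)
    vals = cell_area * lam[j] * thomas_pcf_minus_one(r, kappa, omega)
    T = coo_matrix((vals, (i, j)), shape=(m, m))
    A = (identity(m, format="csc") + T).tocsc()
    return splu(A).solve(rhs)

nx = 15
xs = (np.arange(nx) + 0.5) / nx
gx, gy = np.meshgrid(xs, xs)
pts = np.column_stack([gx.ravel(), gy.ravel()])
z = np.cos(2.0 * np.pi * pts[:, 1])
lam = np.exp(5.0 + 0.5 * z)
rhs = np.column_stack([np.ones_like(z), z])         # lambda'/lambda for a log-linear model
phi = solve_phi_tapered(pts, 1.0 / nx**2, lam, rhs, d_taper=0.1, kappa=100.0, omega=0.02)
```

With d_taper larger than the diameter of the window nothing is truncated, and the result agrees with a dense solve; this provides a simple correctness check.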

Table 3.

Computing times (T) in seconds (without computation of standard errors) and QL parameter estimates for different combinations of grid size and tapering.

		Acalypha		Lonchocarpus		Capparis
Grid	∊	T	estm.	T	estm.	T	estm.
100×50	0.05	0.3	−6.9 4.4	1.1	−6.5 −2.8 −0.16	1.0	−5.1 2.4 −2.0 4.3
	0.01	0.4	−6.9 4.4	2.6	−6.5 −2.8 −0.15	2.2	−5.1 2.3 −2.0 4.1
	0.002	0.6	−6.9 4.4	4.4	−6.5 −2.8 −0.15	4.3	−5.1 2.3 −2.0 4.1

150×75	0.05	0.5	−6.9 4.3	8.5	−6.5 −2.8 −0.16	10.2	−5.1 2.4 −1.9 4.2
	0.01	1.8	−6.9 4.3	23.7	−6.5 −2.8 −0.15	25.0	−5.1 2.3 −1.9 4.1
	0.002	5.3	−6.9 4.3	41.6	−6.5 −2.8 −0.15	108.2	−5.1 2.3 −1.9 4.0


Table 4.

Computing times (T) in seconds (including computation of standard errors) and estimated standard errors of QL parameter estimates for different combinations of grid size and tapering

		Acalypha		Lonchocarpus		Capparis
Grid	∊	T	sd.	T	sd.	T	sd.
100×50	0.05	12.1	0.085 1.2	22.4	0.088 0.69 0.055	13.0	0.067 0.86 1.1 1.0
	0.01	12.0	0.085 1.2	24.0	0.088 0.69 0.055	13.5	0.067 0.86 1.1 1.0
	0.002	12.1	0.085 1.2	25.9	0.088 0.69 0.055	13.6	0.067 0.86 1.1 1.0

150×75	0.05	59.4	0.079 1.1	187.2	0.087 0.69 0.055	158.6	0.066 0.86 1.1 1.0
	0.01	58.9	0.079 1.1	204.6	0.087 0.69 0.055	169.5	0.066 0.86 1.1 1.0
	0.002	63.6	0.079 1.1	226.5	0.087 0.69 0.055	170.7	0.066 0.86 1.1 1.0


7. DISCUSSION

In contrast to maximum likelihood estimation our quasi-likelihood estimation method only requires the specification of the intensity function and a pair correlation function. Moreover, the estimation of the regression parameters can be expected to be quite robust toward misspecification of the pair correlation function since the resulting estimating equation is unbiased for any choice of pair correlation function. In the data example we considered pair correlation functions obtained from covariance functions of normal variance mixture type. Alternatively one might consider pair correlation functions of the log Gaussian Cox process type (Møller et al., 1998), i.e., g(r) = exp [c(r)], where c(·) is an arbitrary covariance function.

If a log Gaussian Cox process is deemed appropriate, a computationally feasible alternative to our approach is to use the method of integrated nested Laplace approximation (INLA, Rue et al., 2009; Illian et al., 2012) to implement Bayesian inference. However, in order to apply INLA it is required that the Gaussian field can be approximated well by a Gaussian Markov random field and this can limit the choice of covariance function. For example, the accurate Gaussian Markov random field approximations in Lindgren et al. (2011) of Gaussian fields with Matérn covariance functions are restricted to integer ν in the planar case. In contrast, our approach is not subject to such limitations and is not restricted to log Gaussian Cox processes.

We finally note that for the Nyström approximate solution of the Fredholm equation we used the simplest possible quadrature scheme given by a Riemann sum for a fine grid. This entails a minimum of assumptions regarding the integrand but at the expense of a typically high-dimensional covariance matrix V. There may be scope for further development considering more sophisticated numerical quadrature schemes.

Supplementary Material

Supp Material

NIHMS605050-supplement-Supp_Material.pdf (103.2 KB, PDF)

Acknowledgements

We thank the editors and a referee for their helpful and constructive comments. Abdollah Jalilian and Rasmus Waagepetersen’s research was supported by the Danish Natural Science Research Council, grant 09-072331 ‘Point process modeling and statistical inference’, Danish Council for Independent Research — Natural Sciences, Grant 12-124675, ‘Mathematical and Statistical Analysis of Spatial Data’, and by Centre for Stochastic Geometry and Advanced Bioimaging, funded by a grant from the Villum Foundation. Yongtao Guan’s research was supported by NSF grant DMS-0845368, by NIH grant 1R01DA029081-01A1 and by the VELUX Visiting Professor Programme.

The BCI forest dynamics research project was made possible by National Science Foundation grants to Stephen P. Hubbell: DEB-0640386, DEB-0425651, DEB-0346488, DEB-0129874, DEB-00753102, DEB-9909347, DEB-9615226, DEB-9615226, DEB-9405933, DEB-9221033, DEB-9100058, DEB-8906869, DEB-8605042, DEB-8206992, DEB-7922197, support from the Center for Tropical Forest Science, the Smithsonian Tropical Research Institute, the John D. and Catherine T. MacArthur Foundation, the Mellon Foundation, the Celera Foundation, and numerous private individuals, and through the hard work of over 100 people from 10 countries over the past two decades. The plot project is part of the Center for Tropical Forest Science, a global network of large-scale demographic tree plots.

The BCI soils data set was collected and analyzed by J. Dalling, R. John, K. Harms, R. Stallard and J. Yavitt with support from NSF DEB021104, 021115, 0212284, 0212818 and OISE 0314581, STRI and CTFS. Paolo Segre and Juan Di Trani provided assistance in the field. The covariates dem, grad, mrvbf, solar and twi were computed in SAGA GIS by Tomislav Hengl (http://spatial-analyst.net/).

APPENDIX A. Condition for optimality

To show that (9) implies non-negative definiteness of (8), let ${\hat{e}}_{ϕ} (β) = e_{f} (β) Σ_{f}^{- 1} Σ_{f ϕ}$ be the optimal linear predictor of e_ϕ(β) given e_f(β). Then

V ar [{\hat{e}}_{ϕ} (β) - e_{ϕ} (β)] = Σ_{ϕ} - Σ_{ϕ f} Σ_{f}^{- 1} Σ_{f ϕ}

is non-negative definite, and hence

S_{ϕ} Σ_{ϕ}^{- 1} S_{ϕ} - S_{ϕ} Σ_{ϕ}^{- 1} Σ_{ϕ f} Σ_{f}^{- 1} Σ_{f ϕ} Σ_{ϕ}^{- 1} S_{ϕ}

is non-negative definite too. Hence, (8) is non-negative definite provided

S_{ϕ} Σ_{ϕ}^{- 1} Σ_{ϕ f} = S_{f}

which follows from (9) (in particular, (9) implies Σ_ϕ = Σ_{ϕϕ} = S_ϕ).

APPENDIX B. SOLUTION USING NEUMANN SERIES EXPANSION

Suppose that ∥T∥_op = sup{∥Tf∥_∞/∥f∥_∞ : ∥f∥_∞ ≠ 0} < 1. Then we can obtain the solution ϕ of (11) using a Neumann series expansion, which may provide additional insight into the properties of ϕ. More specifically,

ϕ = \sum_{k = 0}^{\infty} {(- T)}^{k} \frac{λ^{'}}{λ} .

(31)
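On a quadrature grid T becomes a matrix, and with the sup norm the operator norm ∥T∥_op is the maximum absolute row sum, the discrete analogue of the bound $\sup_{u} \int_{W} ∣ t(u, v) ∣ dv$. The Python sketch below compares the truncated series (31) with a direct solve for an arbitrary matrix scaled to have this norm equal to 0.5; keeping only the first term (k = 0) simply returns the right-hand side, i.e. the Poisson-score weight λ′/λ. The matrix and function name are illustrative.

```python
import numpy as np

def neumann_solve(T, b, n_terms):
    """Truncated Neumann series sum_{k=0}^{n_terms-1} (-T)^k b, which approximates
    (I + T)^{-1} b when the operator norm of T is below one."""
    term = np.array(b, dtype=float)
    phi = term.copy()
    for _ in range(1, n_terms):
        term = -T @ term                       # next term (-T)^k b
        phi = phi + term
    return phi

rng = np.random.default_rng(0)
m = 50
T = rng.uniform(size=(m, m))
T *= 0.5 / np.abs(T).sum(axis=1).max()         # max absolute row sum (sup-norm operator norm) = 0.5
b = rng.normal(size=m)
phi_series = neumann_solve(T, b, n_terms=60)
phi_direct = np.linalg.solve(np.eye(m) + T, b)
```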

If the infinite sum in (31) is truncated to the first term (k = 0) then (13) becomes the Poisson score. Note that

{‖ T ‖}_{op} \leq \sup_{u \in W} \int_{W} ∣ t (u, v) ∣ d v .

Hence, a sufficient condition for the validity of the Neumann series expansion is

\sup_{u \in W} λ (u; β) \int_{R^{2}} ∣ g (r) - 1 ∣ d r < 1 .

(32)

Condition (32) roughly requires that g(r)−1 does not decrease too slowly to zero and/or that λ is moderate. For example, suppose that g is the Thomas process pair correlation function (30). Then,

\int_{R^{2}} ∣ g (r) - 1 ∣ d r = \frac{1}{4 π κ ω^{2}} \int_{R^{2}} \exp (- \frac{{‖ r ‖}^{2}}{4 ω^{2}}) d r = 1 ∕ κ

and (32) is equivalent to sup_{u∈W} λ(u; β) < κ. In this case, Condition (32) can be quite restrictive. However, the Neumann series expansion is not essential for our approach and we use it only for checking the conditions for asymptotic results; see Section 1 in the supplementary material.
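The Thomas-case integral can also be checked numerically. In polar coordinates $\int_{R^{2}} ∣ g(r) - 1 ∣ dr = \int_{0}^{\infty} 2 π r [g(r) - 1] dr$, which the Python sketch below evaluates with the trapezoidal rule (function names hypothetical). It also illustrates how restrictive (32) is in the simulation study of Section 6.1: with E[N(W)] = 400 on the unit square, the intensity typically exceeds κ* = 100 or 200 somewhere in W, so the sufficient condition fails there.

```python
import numpy as np

def thomas_excess_integral(kappa, omega, n=200001):
    """Trapezoidal approximation of int_0^inf 2 pi r [g(r) - 1] dr for the
    Thomas pair correlation function (30), truncated at r = 20 omega where
    the integrand is negligible (relative size exp(-100))."""
    r = np.linspace(0.0, 20.0 * omega, n)
    f = 2.0 * np.pi * r * np.exp(-r**2 / (4.0 * omega**2)) / (4.0 * np.pi * omega**2 * kappa)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r)))

def neumann_condition_thomas(lam_max, kappa):
    """Sufficient condition (32) in the Thomas case: sup_u lambda(u) < kappa."""
    return lam_max < kappa

val = thomas_excess_integral(kappa=100.0, omega=0.02)   # close to 1 / kappa = 0.01
```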

Contributor Information

Yongtao Guan, Miami, USA.

Abdollah Jalilian, Kermanshah, Iran.

Rasmus Waagepetersen, Aalborg, Denmark.

References

Baddeley A, Turner R. spatstat: An R package for analyzing spatial point patterns. Journal of Statistical Software. 2005;12(6):1–42.
Baddeley AJ, Møller J, Waagepetersen R. Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica. 2000;54:329–350.
Baddeley AJ, Turner R, Møller J, Hazelton M. Residual analysis for spatial point processes (with discussion). Journal of the Royal Statistical Society, Series B. 2005;67:617–666.
Condit R. Tropical Forest Census Plots. Springer-Verlag and R. G. Landes Company; Berlin, Germany and Georgetown, Texas: 1998.
Condit R, Hubbell SP, Foster RB. Changes in tree species abundance in a neotropical forest: impact of climate change. Journal of Tropical Ecology. 1996;12:231–256.
Furrer R, Genton MG, Nychka D. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics. 2006;15:502–523.
Gotway CA, Stroup WW. A generalized linear model approach to spatial data analysis and prediction. Journal of Agricultural, Biological, and Environmental Statistics. 1997;2:157–178.
Gray RJ. Weighted estimating equations for linear regression analysis of clustered failure time data. Lifetime Data Analysis. 2003;9(2):123–138. doi: 10.1023/a:1022932117951.
Guan Y, Loh JM. A thinned block bootstrap procedure for modeling inhomogeneous spatial point patterns. Journal of the American Statistical Association. 2007;102:1377–1386.
Guan Y, Shen Y. A weighted estimating equation approach for inhomogeneous spatial point processes. Biometrika. 2010;97:867–880.
Hackbusch W. Integral equations: theory and numerical treatment. Birkhäuser; 1995.
Heyde CC. Quasi-likelihood and its application: a general approach to optimal parameter estimation. Springer Series in Statistics. Springer; 1997.
Hubbell SP, Foster RB. Diversity of canopy trees in a neotropical forest and implications for conservation. In: Sutton SL, Whitmore TC, Chadwick AC, editors. Tropical Rain Forest: Ecology and Management. Blackwell Scientific Publications; Oxford: 1983. pp. 25–41.
Illian JB, Sørbye SH, Rue H. A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA). Annals of Applied Statistics. 2012;6:1499–1530.
Jalilian A, Guan Y, Waagepetersen R. Decomposition of variance for spatial Cox processes. Scandinavian Journal of Statistics. 2013;40:119–137. doi: 10.1111/j.1467-9469.2012.00795.x.
Lax PD. Functional analysis. Wiley; 2002.
Liang K, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
Lin P-S, Clayton MK. Analysis of binary spatial data by quasi-likelihood estimating equations. Annals of Statistics. 2005;33:542–555.
Lin Y-C, Chang L-W, Yang K-C, Wang H-H, Sun I-F. Point patterns of tree distribution determined by habitat heterogeneity and dispersal limitation. Oecologia. 2011;165:175–184. doi: 10.1007/s00442-010-1718-x.
Lindgren F, Rue H, Lindström J. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society, Series B. 2011;73:423–498.
Møller J, Syversveen AR, Waagepetersen RP. Log Gaussian Cox processes. Scandinavian Journal of Statistics. 1998;25:451–482.
Møller J, Waagepetersen RP. Statistical inference and simulation for spatial point processes. Chapman and Hall/CRC; Boca Raton: 2004.
Møller J, Waagepetersen RP. Modern statistics for spatial point processes. Scandinavian Journal of Statistics. 2007;34:643–684.
Mrkvička T, Molchanov I. Optimisation of linear unbiased intensity estimators for point processes. Annals of the Institute of Statistical Mathematics. 2005;57:71–81.
Renner IW, Warton DI. Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics. 2013;69:274–281. doi: 10.1111/j.1541-0420.2012.01824.x.
Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B. 2009;71:319–392.
Schoenberg FP. Consistent parametric estimation of the intensity of a spatial-temporal point process. Journal of Statistical Planning and Inference. 2005;128:79–93.
Shen G, Yu M, Hu X-S, Mi X, Ren H, Sun I-F, Ma K. Species-area relationships explained by the joint effects of dispersal limitation and habitat heterogeneity. Ecology. 2009;90:3033–3041. doi: 10.1890/08-1646.1.
Song PX-K. Correlated data analysis: modeling, analytics, and applications. Springer Series in Statistics. Springer; New York, NY: 2007.
Waagepetersen R. Discussion of ‘Residual analysis for spatial point processes’. Journal of the Royal Statistical Society, Series B. 2005;67:662.
Waagepetersen R. An estimating function approach to inference for inhomogeneous Neyman-Scott processes. Biometrics. 2007;63:252–258. doi: 10.1111/j.1541-0420.2006.00667.x.
Waagepetersen R, Guan Y. Two-step estimation for inhomogeneous spatial point processes. Journal of the Royal Statistical Society, Series B. 2009;71:685–702.
Wedderburn RWM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. 1974;61(3):439–447.
Zeger SL. A regression model for time series of counts. Biometrika. 1988;75:621–629.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supp Material

NIHMS605050-supplement-Supp_Material.pdf^{(103.2KB, pdf)}

[R1] Baddeley A, Turner R. spatstat: An R package for analyzing spatial point patterns. Journal of Statistical Software. 2005;12(6):1–42. [Google Scholar]

[R2] Baddeley AJ, Møller J, Waagepetersen R. Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica. 2000;54:329–350. [Google Scholar]

[R3] Baddeley AJ, Turner R, Møller J, Hazelton M. Residual analysis for spatial point processes (with discussion) Journal of the Royal Statistical Society, Series B. 2005;67:617–666. [Google Scholar]

[R4] Condit R. Tropical Forest Census Plots. Springer-Verlag and R. G. Landes Company; Berlin, Germany and Georgetown, Texas: 1998. [Google Scholar]

[R5] Condit R, Hubbell SP, Foster RB. Changes in tree species abundance in a neotropical forest: impact of climate change. Journal of Tropical Ecology. 1996;12:231–256. [Google Scholar]

[R6] Furrer R, Genton MG, Nychka D. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics. 2006;15:502–523. [Google Scholar]

[R7] Gotway CA, Stroup WW. A generalized linear model approach to spatial data analysis and prediction. Journal of Agricultural, Biological, and Environmental Statistics. 1997;2:157–178. [Google Scholar]

[R8] Gray, Robert J. Weighted estimating equations for linear regression analysis of clustered failure time data. Lifetime Data Analysis. 2003;9(2):123–138. doi: 10.1023/a:1022932117951. [DOI] [PubMed] [Google Scholar]

[R9] Guan Y, Loh JM. A thinned block bootstrap procedure for modeling inhomogeneous spatial point patterns. Journal of the American Statistical Association. 2007;102:1377–1386. [Google Scholar]

[R10] Guan Y, Shen Y. A weighted estimating equation approach for inhomogeneous spatial point processes. Biometrika. 2010;97:867–880. [Google Scholar]

[R11] Hackbusch W. Integral equations - theory and numerical treatment. Birkhäuser; 1995. [Google Scholar]

[R12] Heyde CC. Springer Series in Statistics. Springer; 1997. Quasi-likelihood and its application - a general approach to optimal parameter estimation. [Google Scholar]

[R13] Hubbell SP, Foster RB. Diversity of canopy trees in a neotropical forest and implications for conservation. In: Sutton SL, Whitmore TC, Chadwick AC, editors. Tropical Rain Forest: Ecology and Management. Blackwell Scientific Publications; Oxford: 1983. pp. 25–41. [Google Scholar]

[R14] Illian JB, Sørbye SH, Rue H. A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA) Annals of Applied Statistics. 2012;6:1499–1530. [Google Scholar]

[R15] Jalilian A, Guan Y, Waagepetersen R. Decomposition of variance for spatial Cox processes. Scandinavian Journal of Statistics. 2013;40:119–137. doi: 10.1111/j.1467-9469.2012.00795.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] Lax PD. Functional analysis. Wiley; 2002.

[R17] Liang K, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.

[R18] Lin P-S, Clayton MK. Analysis of binary spatial data by quasi-likelihood estimating equations. Annals of Statistics. 2005;33:542–555.

[R19] Lin Y-C, Chang L-W, Yang K-C, Wang H-H, Sun I-F. Point patterns of tree distribution determined by habitat heterogeneity and dispersal limitation. Oecologia. 2011;165:175–184. doi: 10.1007/s00442-010-1718-x.

[R20] Lindgren F, Rue H, Lindström J. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society, Series B. 2011;73:423–498.

[R21] Møller J, Syversveen AR, Waagepetersen RP. Log Gaussian Cox processes. Scandinavian Journal of Statistics. 1998;25:451–482.

[R22] Møller J, Waagepetersen RP. Statistical inference and simulation for spatial point processes. Chapman and Hall/CRC; Boca Raton: 2004.

[R23] Møller J, Waagepetersen RP. Modern statistics for spatial point processes. Scandinavian Journal of Statistics. 2007;34:643–684.

[R24] Mrkvička T, Molchanov I. Optimisation of linear unbiased intensity estimators for point processes. Annals of the Institute of Statistical Mathematics. 2005;57:71–81.

[R25] Renner IW, Warton DI. Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics. 2013;69:274–281. doi: 10.1111/j.1541-0420.2012.01824.x.

[R26] Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B. 2009;71:319–392.

[R27] Schoenberg FP. Consistent parametric estimation of the intensity of a spatial-temporal point process. Journal of Statistical Planning and Inference. 2005;128:79–93.

[R28] Shen G, Yu M, Hu X-S, Mi X, Ren H, Sun I-F, Ma K. Species-area relationships explained by the joint effects of dispersal limitation and habitat heterogeneity. Ecology. 2009;90:3033–3041. doi: 10.1890/08-1646.1.

[R29] Song PX-K. Correlated data analysis: modeling, analytics, and applications. Springer Series in Statistics. Springer; New York, NY: 2007.

[R30] Waagepetersen R. Discussion of ‘Residual analysis for spatial point processes’. Journal of the Royal Statistical Society, Series B. 2005;67:662.

[R31] Waagepetersen R. An estimating function approach to inference for inhomogeneous Neyman-Scott processes. Biometrics. 2007;63:252–258. doi: 10.1111/j.1541-0420.2006.00667.x.

[R32] Waagepetersen R, Guan Y. Two-step estimation for inhomogeneous spatial point processes. Journal of the Royal Statistical Society, Series B. 2009;71:685–702.

[R33] Wedderburn RWM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. 1974;61(3):439–447.

[R34] Zeger SL. A regression model for time series of counts. Biometrika. 1988;75:621–629.
