Author manuscript; available in PMC: 2016 Sep 16.
Published in final edited form as: J Comput Graph Stat. 2014 Jul 31;24(3):733–755. doi: 10.1080/10618600.2014.948178

copCAR: A Flexible Regression Model for Areal Data

John Hughes 1
PMCID: PMC4628820  NIHMSID: NIHMS642503  PMID: 26539023

Abstract

Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. The new model is called copCAR because it is copula-based and employs the areal GLMM’s conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online.

Keywords: Composite likelihood, Copula, Distributional transform, Generalized linear model, Markov random field, Non-Gaussian data, Spatial confounding, Spatial regression

1. INTRODUCTION

Spatially aggregated, or areal, data are common in many fields, including forestry, marketing, image analysis, ecology, epidemiology, geography, and dentistry. Researchers in these fields are often interested in scientific explanation rather than prediction, in which case reliable, i.e., accurate and precise, spatial regression inference is important.

The two most often used models for areal regression are the automodel (Besag 1974) and the areal generalized linear mixed model (GLMM) (Besag, York and Mollié 1991), both of which have long histories and enjoy widespread popularity. But these models pose numerous analytical, statistical, and computational challenges. We develop a new areal model that addresses many of these challenges. The new model is called copCAR because it is copula-based and because it borrows the conditional autoregression (CAR) from the areal GLMM.

In Section 2 we review the automodel and the areal GLMM and describe their drawbacks. In Section 3 we develop the copCAR model and discuss its advantages and limitations. In Section 4 we recommend three approaches to frequentist inference for copCAR. In Section 5 we describe how to efficiently fit copCAR to data. In Section 6 we present the results of a simulation study designed to reveal the behavior of the copCAR estimators for realistic sample sizes. In Section 7 we apply copCAR and the areal GLMM to Slovenian stomach cancer data (Zadnik and Reich 2006; Reich, Hodges and Zadnik 2006).

2. AREAL MODELS

In this section we provide the motivation for copCAR by reviewing the automodel and the areal GLMM and describing the difficulties they pose. Although the automodel and the areal GLMM can be distinguished by their dependence components (one is direct, the other hierarchical), the two models share the smoothing approach, which is to say they induce spatial dependence by adding a spatial term to the linear predictor of the classical GLM. (Although this article uses GLM terminology, these ideas also apply to similar models for which the response does not follow an exponential family distribution, e.g., beta regression models.) Although smoothing models have proved quite successful, smoothing can be problematic. copCAR, by contrast, exploits the modularity of copula-based modeling, which allows us to model the marginal distributions and dependence structure separately before joining them by way of the probability integral transform (Sklar 1959). In our view, this is the simplest, most natural, and most flexible way to construct a multivariate distribution. And this approach allows copCAR to overcome many of the challenges faced by the smoothing models.

2.1 The Automodel

The automodel is a Markov random field (MRF) model (Kindermann and Snell 1980). That is, the model is specified in terms of full conditional distributions that correspond to a valid joint distribution if certain conditions are satisfied. MRF models were developed by physicists in the 1920s—in connection with the study of ferromagnetism (Ising 1925)—but did not make their way into the statistics literature until the late Julian Besag formulated the automodel in a seminal 1974 paper.

The automodel can be specified as follows. Let G = (V, E) be the underlying graph, where V = {1, 2, …, n} are the vertices and EV × V are the edges of G. Each vertex of G corresponds to a region over which measurements have been aggregated—e.g., pixel, political district, province. And each edge of G represents the spatial adjacency of two such regions. We assume that G is undirected and free of loops and parallel edges.

Associate with the ith vertex the random variable Zi, so that Z = (Z1, …, Zn)′ is the random field of interest. Then the transformed conditional means are

g{E(Z_i | θ, Z_{−i})} = x_i′β + η ∑_{j:(i,j)∈E} (Z_j − μ_j),   (1)

where g is a link function, θ = (β′, η)′ is the complete vector of parameters, Z_{−i} = Z \ {Z_i} is the field with the ith observation excluded, x_i is a p-vector of spatial predictors for the ith areal unit, β is a p-vector of regression coefficients, η is a spatial dependence parameter, and μ_j is the independence expectation of Z_j, i.e., μ_j = E(Z_j | β, η = 0) = g^{−1}(x_j′β). Note that η > 0 implies positive dependence, η = 0 implies spatial independence, and η < 0 implies spatial repulsion. In the sequel we take η to be non-negative since repulsion is not often observed in the phenomena to which these models are typically applied.

The second term in (1) is called the autocovariate. It is by means of this “device” that the automodel induces dependence. To see this, note that η measures the reactivity of Zi to its neighbors, conditional on the large-scale structure represented by the neighbors’ independence expectations {μj : (i, j) ∈ E}. As η increases, Zi becomes more dependent on its neighbors and so less dependent on xjβ. Eventually the dependence becomes so strong that the large-scale structure is overwhelmed by small-scale interactions, making recovery of β impossible.
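The autocovariate in (1) is straightforward to evaluate given the adjacency structure. The following minimal sketch (Python for illustration; the function and variable names are ours, not the paper's) computes the conditional means for a Poisson automodel with log link:

```python
import numpy as np

# Conditional means from (1) for a Poisson automodel with log link.
# A is the adjacency matrix of G, X the design matrix; all names are
# illustrative, not part of any copCAR software.
def conditional_means(A, X, beta, eta, Z):
    mu = np.exp(X @ beta)          # independence expectations g^{-1}(x_j' beta)
    autocov = A @ (Z - mu)         # autocovariate: sum over neighbors of (Z_j - mu_j)
    return np.exp(X @ beta + eta * autocov)

# Two adjacent regions, intercept-only design.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
X = np.ones((2, 1))
beta = np.array([0.5])
Z = np.array([3.0, 1.0])

print(conditional_means(A, X, beta, 0.0, Z))   # eta = 0 recovers the independence means
```

Setting η = 0 recovers the independence means g^{−1}(x_i′β), while increasing η makes each conditional mean react more strongly to the neighbors' departures from their independence expectations.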

Although the automodel has much to recommend it for certain types of data, the model presents various theoretical and computational difficulties in general. For example, the normalizing function of the joint distribution is intractable for all but the smallest datasets, and some members of the family, most notably the auto-Poisson, permit only negative spatial dependence unless the conditional distributions are modified.

We turn now to the areal GLMM, which addresses some of the automodel’s challenges but faces formidable difficulties of its own.

2.2 The Areal GLMM

The areal GLMM also induces dependence through smoothing, but instead of modeling dependence directly, as the automodel does, the mixed model is hierarchical with an autonormal second stage. The model was introduced by Besag et al. in 1991, and it has since been the most commonly used areal model, owing to its flexible specification, the availability of the WinBUGS software application (Lunn, Thomas, Best and Spiegelhalter 2000), and the difficulties posed by the automodel.

The first stage of the areal GLMM is

g{E(Z_i | β, S_i)} = x_i′β + S_i.   (2)

Distortion of the large-scale structure is accomplished by way of spatially dependent random effects S = (S_1, …, S_n)′, which are assumed to have come from a zero-mean Gaussian Markov random field (GMRF) (Rue and Held 2005). That is, S ~ N{0, (τQ)^{−1}}, where τ > 0 is a smoothing parameter and Q is a precision matrix. Two forms for Q are commonly used: the proper CAR and the improper (or intrinsic) CAR (Besag and Kooperberg 1995), where CAR stands for conditional autoregression. The proper CAR can be specified as follows.

Let d_i be the degree of vertex i, let D = diag(d_1, …, d_n), let ρ ∈ [0, 1), and let A = [1{(i, j) ∈ E}] be the adjacency matrix of G, where 1{•} is the indicator function. Then the precision matrix for the proper CAR is Q = D − ρA. This marginal definition corresponds to the conditional distributions

S_i | ρ, τ, S_{−i} ~ N( (ρ/d_i) ∑_{j:(i,j)∈E} S_j , 1/(τd_i) ),

which reveal that ρ is analogous to η in (1).

The restriction of ρ to [0, 1) is stronger than necessary to ensure that Q is nonsingular: it is sufficient to take ρ ∈ (1/min_i λ_i, 1/max_i λ_i), where λ = (λ_1, …, λ_n)′ are the eigenvalues of Ω = D^{−1}A, the row-normalized adjacency matrix (Banerjee, Carlin and Gelfand 2004, p. 80). In short, the restriction to [0, 1) ensures both that Q is nonsingular and that the model has a sensible interpretation for the desired sphere of application (Gelfand and Vounatsou 2003).
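The precision matrix and the eigenvalue bounds above are easy to check numerically. A sketch (Python for illustration; the helper name is ours) for a 3 × 3 square lattice:

```python
import numpy as np

# Build the proper CAR precision matrix Q = D - rho * A for a 3x3 square
# lattice and verify the admissibility conditions discussed in the text.
def lattice_adjacency(nrow, ncol):
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                A[i, i + 1] = A[i + 1, i] = 1.0   # horizontal neighbor
            if r + 1 < nrow:
                A[i, i + ncol] = A[i + ncol, i] = 1.0   # vertical neighbor
    return A

A = lattice_adjacency(3, 3)
D = np.diag(A.sum(axis=1))
Q = D - 0.9 * A                     # rho = 0.9 lies in [0, 1)

# Q is positive definite (hence nonsingular) for this rho.
assert np.all(np.linalg.eigvalsh(Q) > 0)

# The wider admissible interval is (1/min(lambda), 1/max(lambda)), where
# lambda are the eigenvalues of the row-normalized adjacency matrix.
lam = np.linalg.eigvals(np.linalg.inv(D) @ A).real
print(1 / lam.min(), 1 / lam.max())
```

Because Ω is row-stochastic, its largest eigenvalue is 1, so the upper end of the admissible interval is always 1; the lower end is negative and graph-dependent.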

The areal GLMM has at least the following three unappealing aspects.

(1) It can be difficult to deal with S. In a Bayesian framework, it is well known that a univariate Metropolis–Hastings algorithm for sampling from the posterior distribution of S leads to a slow-mixing Markov chain because the coordinates of S exhibit strong a posteriori dependence. This has led to a number of approaches for updating S in blocks. Constructing proposals for these updates is challenging, and the better mixing comes at the cost of increased running time per iteration—see, for instance, Knorr-Held and Rue (2002), Haran, Hodges and Carlin (2003), and Haran and Tierney (2012).

(2) The spatial random effects are collinear with the fixed-effect predictors. This spatial confounding causes variance inflation that may make important predictors appear insignificant. This was discovered by Clayton, Bernardinelli and Montomoli (1993) and proved (for the intrinsic CAR) by Reich et al. (2006). The argument given by Reich et al. (2006) carries over essentially unchanged to the proper CAR model and to the generalized linear geostatistical model (Diggle, Tawn and Moyeed 1998).

(3) The areal GLMM is generally unable to recover ρ (Banerjee et al. 2004). In a Bayesian setting, this problem is customarily handled by assigning ρ an informative prior distribution that encourages ρ's posterior to be concentrated near 1 (Banerjee et al. 2004; Gelfand and Vounatsou 2003). This forces smoothing, which can improve prediction, but does not result in reliable inference for ρ.

The last two of these liabilities are the most troubling because they can lead to erroneous regression inference. Reich et al. (2006) and Hughes and Haran (2013) proposed reparameterized versions of the areal GLMM that alleviate the spatial confounding and also result in a faster mixing Markov chain. Hughes and Haran (2013) demonstrated the effectiveness of these reparameterizations for the intrinsic CAR, but it is not clear that the techniques can be carried over to the proper CAR in a way that permits efficient computation. By inducing dependence using a copula instead of an augmented linear predictor, copCAR eliminates problems (1) and (2) and is able to recover ρ.

3. THE COPCAR MODEL FOR AREAL DATA

During the last decade or so, copula regression models have been applied in several fields, most notably economics, finance, and insurance (Kolev and Paiva 2009). Only recently were copula models developed for spatial applications, and the copula approach has thus far been limited to geostatistical data (Madsen 2009; Kazianka and Pilz 2010). To our knowledge, copCAR is the first copula model for areal data.

The chief advantage of copula modeling is modularity. The dependence structure and the marginal distributions can be modeled separately (Sklar 1959) and then joined by way of the probability integral transform. This approach allows copCAR to overcome the problems associated with the smoothing models described in the previous section. (See Nelsen (2006) for an introduction to copulas.)

3.1 The Gaussian Copula

The focus of this article is the Gaussian copula, which can be constructed as follows. Let Φ_R be the cdf of a multinormal random variable with mean 0 and correlation matrix R(ξ) (where ξ are dependence parameters), and let Φ be the standard normal cdf. Then the Gaussian copula is

C_R(u) = Φ_R{Φ^{−1}(u_1), …, Φ^{−1}(u_n)}.   (3)

Note that C_R is defined using a correlation matrix rather than a covariance matrix—the variances, being properties of the marginal distributions, are irrelevant to the dependence structure and can be omitted.

3.2 The copCAR Specification

copCAR employs what we call the CAR copula, a Gaussian copula (or other suitable copula) based on the proper CAR described above. Recall that the proper CAR has precision matrix τQ, where Q = D − ρA. Since a copula is scale-free, we do not need τ, but omitting τ does not leave us with an inverse correlation matrix because the variances σ² = (σ_1², …, σ_n²)′ = vecdiag(Q^{−1}) are not equal to 1. We could rescale Q so that its inverse is a correlation matrix, i.e., we could construct a Gaussian copula using Σ^{1/2}QΣ^{1/2}, where Σ = diag(σ²). In fact, rescaling is necessary in the general case lest the model be unidentifiable with respect to the variances. For copCAR, however, rescaling is unnecessary because the variances σ² are not free parameters. The variances are entirely determined by Q's only dependence parameter, ρ, which is not a scale parameter. (Assunção and Krainski (2009) showed that ρ is best thought of as a range parameter.) Since using Q itself leads to an identifiable model, rescaling would slow computation but accomplish nothing. Thus copCAR employs the CAR correlation structure indirectly, by using Q along with the variances σ². This leads to the CAR copula:

C_{Q^{−1}}(u) = Φ_{Q^{−1}}{Φ_{σ_1}^{−1}(u_1), …, Φ_{σ_n}^{−1}(u_n)},   (4)

where Φ_{σ_i} denotes the distribution function of the normal distribution with mean 0 and variance σ_i².

The model specification can be completed by pairing the CAR copula with a set of suitable marginal distributions for the outcomes. The copula and the marginals are linked by way of the probability integral transform. Specifically, if Z = (Z_1, …, Z_n)′ are the observations, and F_1, …, F_n are the desired marginal distribution functions, we have Z_i = F_i^{−1}(U_i), where U = (U_1, …, U_n)′ is a realization of the copula. In a spatial setting, Bernoulli or Poisson marginals, with expectations {1 + exp(−x_i′β)}^{−1} and exp(x_i′β), respectively, are common, but there are many other options for the marginal specifications, e.g., extreme-value distributions, beta regression models, zero-inflated models, heavy-tailed distributions, etc.
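The two-step construction above also gives a direct recipe for simulating from copCAR: draw a multinormal vector with precision Q, push each coordinate through its own normal cdf to obtain a copula realization, then apply the inverse marginal cdfs. A sketch with Poisson marginals (Python for illustration; not the copCAR package, and the intercept value is an assumption):

```python
import numpy as np
from scipy import stats

# Simulate areal counts from copCAR with Poisson marginals on a 4x4
# lattice. Illustrative sketch only; beta0 is an assumed intercept.
rng = np.random.default_rng(1)
n, rho = 16, 0.95
A = np.zeros((n, n))                        # adjacency of a 4x4 lattice
for r in range(4):
    for c in range(4):
        i = 4 * r + c
        if c < 3: A[i, i + 1] = A[i + 1, i] = 1.0
        if r < 3: A[i, i + 4] = A[i + 4, i] = 1.0
Q = np.diag(A.sum(1)) - rho * A             # proper CAR precision (tau omitted: scale-free)

Sigma = np.linalg.inv(Q)
sigma2 = np.diag(Sigma)                     # CAR variances
Y = rng.multivariate_normal(np.zeros(n), Sigma)
U = stats.norm.cdf(Y, scale=np.sqrt(sigma2))    # U_i = Phi_{sigma_i}(Y_i): a CAR-copula draw
beta0 = 1.0                                 # intercept-only linear predictor (assumed)
Z = stats.poisson.ppf(U, mu=np.exp(beta0))  # Z_i = F_i^{-1}(U_i): Poisson marginals
print(Z)
```

The resulting counts have the specified Poisson marginal means while inheriting spatial dependence from the CAR copula.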

3.3 Advantages and Limitations of copCAR

copCAR addresses the problems associated with the above-mentioned smoothing models. Unlike the automodel, copCAR is flexible—a new type of data or change in G does not necessitate new analytical work—and permits positive spatial dependence for any type of data. Unlike the areal GLMM, copCAR is able to recover ρ, as we will show in Section 6; is free of computationally burdensome random effects; and cannot exhibit spatial confounding. Additionally, copCAR is a marginal regression model, which is to say that estimates of regression coefficients have the same interpretation as in an independence model. Practitioners may find this less confusing than interpreting estimates conditional on random effects or neighboring observations.

Spatial confounding deserves further discussion. Let P be the orthogonal projection onto C(X), where X is the design matrix. Now, spectrally decompose the operators P and I − P to acquire orthogonal bases K and L for C(X) and C(X)^⊥, respectively. These bases allow us to rewrite (2) as

g{E(Z_i | β, S_i)} = x_i′β + S_i = x_i′β + k_i′γ + l_i′δ,

where γ and δ are random coefficients (Reich et al. 2006). This form exposes the source of the areal GLMM's spatial confounding: K and X have the same column space. This cannot happen for copCAR because copCAR uses the “bare” linear predictor x_i′β.

Although copCAR is an appealing alternative to the automodel and the areal GLMM, it does have some limitations qua spatial model. copCAR obviously cannot capture non-Gaussian dependence, e.g., tail dependence or asymmetric dependence. But asymmetry could easily be accommodated by incorporating Q into a non-central χ2 copula (Bárdossy and Li 2008). And it would be straightforward to accommodate tail dependence by using Q to construct an elliptical copula such as a t copula (Demarta and McNeil 2005), for example. Another attractive choice is the skew-t copula, which can accommodate both asymmetric and tail dependence (Smith, Gan and Kohn 2012). Neither the t nor the skew-t copula would permit spatial independence, however, which some practitioners may find unacceptable, or at least unappealing.

A second potential drawback has to do with identifiability. When the marginal distributions are discrete, the joint distribution is uniquely defined only on the support of the marginals, and the dependence between a pair of random variables depends on the marginal distributions as well as on the copula. Genest and Neslehova (2007) described the implications of this and warned that, for discrete data, “modeling and interpreting dependence through copulas is subject to caution.” But Genest and Neslehova (2007) go on to say that copula parameters can still be interpreted as dependence parameters, and estimation of copula parameters is often possible using fully parametric likelihood methods. It is precisely such methods that we recommend in the next section, and the simulation results in Section 6 suggest that copCAR’s parameters can be recovered using those methods.

4. FREQUENTIST INFERENCE FOR COPCAR

In this section we recommend three approaches—continuous extension (CE), composite marginal likelihood (CML), and distributional transform (DT)—to frequentist inference for copCAR with discrete margins. We begin by describing maximum likelihood inference for copCAR with continuous margins, since the approximate likelihoods for the CE and DT approaches are reminiscent of the true likelihood for continuous data.

4.1 Continuous Marginals

Maximum likelihood inference is straightforward for Gaussian copula models with continuous marginal distributions. To construct the relevant likelihood for copCAR, we begin with the density for the CAR copula, which can be obtained by applying the chain rule to (4):

c_{Q^{−1}}(u) ∝ |Q|^{1/2} |Σ|^{1/2} exp{−(1/2) y′(Q − Σ^{−1})y},

where y = (y_1, …, y_n)′ = (Φ_{σ_1}^{−1}(u_1), …, Φ_{σ_n}^{−1}(u_n))′ and Σ = diag(σ²). Now, given observations z_i with continuous marginal cdfs F_i and corresponding pdfs f_i, the likelihood of the parameters θ given the data z is the product of the copula density and the marginal densities:

L(θ | z) ∝ c_{Q^{−1}}{F_1(z_1), …, F_n(z_n)} ∏_{i=1}^n f_i(z_i).

Notice that the argument to the copula density is the vector of probability integral transformed observations, which are standard uniform random variables, as required. This implies the copCAR log likelihood

ℓ(θ | z) = (1/2) log|Q| + (1/2) log|Σ| − (1/2) y′(Q − Σ^{−1})y + ∑_{i=1}^n log f_i(z_i),

where y_i = Φ_{σ_i}^{−1}{F_i(z_i)}. This log likelihood can be optimized to arrive at the maximum likelihood estimate θ̂ of θ.
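The continuous-margin log likelihood above can be sketched in a few lines. This illustration (Python; exponential(1) marginals and a 2 × 2 lattice are our assumptions, and the function is not the copCAR package's implementation) also checks a useful sanity property: at ρ = 0 the copula factor vanishes, so ℓ reduces to the independence log likelihood.

```python
import numpy as np
from scipy import stats

# Sketch of the continuous-margin copCAR log likelihood; illustrative only.
def copcar_loglik(Q, z, F_cdf, log_f):
    sigma2 = np.diag(np.linalg.inv(Q))                    # CAR variances
    y = stats.norm.ppf(F_cdf(z), scale=np.sqrt(sigma2))   # y_i = Phi_{sigma_i}^{-1}{F_i(z_i)}
    _, logdetQ = np.linalg.slogdet(Q)
    quad = y @ (Q - np.diag(1.0 / sigma2)) @ y
    return (0.5 * logdetQ + 0.5 * np.log(sigma2).sum()
            - 0.5 * quad + log_f(z).sum())

# 2x2 lattice (a 4-cycle) with assumed exponential(1) marginals.
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
D = np.diag(A.sum(1))
z = np.array([0.3, 1.2, 0.7, 2.1])

ll_indep = copcar_loglik(D, z, stats.expon.cdf, stats.expon.logpdf)   # rho = 0
print(ll_indep, stats.expon.logpdf(z).sum())
```

At ρ = 0, Q = D, the quadratic form is identically zero, and (1/2)log|Q| + (1/2)log|Σ| cancels exactly, which the printed values confirm.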

4.2 Discrete Marginals

When the marginal distributions are discrete, the likelihood with respect to counting measure is

L(θ | z) = ∑_{j_1=0}^{1} ⋯ ∑_{j_n=0}^{1} (−1)^k C_{Q^{−1}}(u_{1j_1}, …, u_{nj_n}),   (5)

where k = ∑_{i=1}^n j_i, u_{i0} = F_i(z_i), and u_{i1} = lim_{z↑z_i} F_i(z) = F_i(z_i−) = F_i(z_i − 1). The last equality holds when the marginals have integer support.

Unless n is quite small, computation of (5) is infeasible because the multinormal cdf is unstable in high dimensions and because the sum contains 2n terms. To put this last fact in perspective, consider that evaluating the true likelihood for a dataset not much larger than the cancer dataset analyzed in Section 7 would require approximately the same number of operations as there are atoms in the known universe.
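For very small n, though, (5) can be evaluated directly, which is useful for testing. A sketch (Python for illustration; the path graph, Bernoulli(0.6) marginals, and the cap on u just below 1 to keep the normal quantiles finite are all our choices):

```python
import itertools

import numpy as np
from scipy import stats

# Evaluate the exact 2^n-term likelihood (5) for n = 3; illustration only.
def exact_lik(Q, z, F):
    n = len(z)
    Sigma = np.linalg.inv(Q)
    s = np.sqrt(np.diag(Sigma))
    total = 0.0
    for j in itertools.product([0, 1], repeat=n):
        u = np.array([F(z[i] - j[i]) for i in range(n)])   # u_{i0} or u_{i1}
        if np.any(u <= 0):
            continue                                       # C(u) = 0 when some u_i = 0
        y = stats.norm.ppf(np.minimum(u, 1 - 1e-12), scale=s)
        total += (-1) ** sum(j) * stats.multivariate_normal.cdf(y, cov=Sigma)
    return total

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path graph
D = np.diag(A.sum(1))
z = np.array([1, 0, 1])
F = lambda x: stats.bernoulli.cdf(x, 0.6)

# Sanity check: at rho = 0 the likelihood is the product of the pmfs.
print(exact_lik(D, z, F), 0.6 * 0.4 * 0.6)
```

Even here the sum has 2³ = 8 rectangle terms; the count doubles with every added region, which is why (5) is hopeless beyond toy problems.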

The following subsections describe three computationally feasible approaches to inference for copCAR with discrete margins. The first approach is a form of Monte Carlo maximum likelihood. We will see that this approach does not scale as well as the others, but, because it is exact (up to Monte Carlo error), the approach can serve as a standard against which to judge the other two approaches, each of which entails a misspecification.

The Continuous Extension

The continuous extension approach to maximum likelihood inference for Gaussian copula models with discrete marginals was developed by Madsen (2009). The approach gets its name from a technique whereby a discrete random variable is transformed to a continuous one by introducing an auxiliary random variable supported on the unit interval (Denuit and Lambert 2005).

To see how this can be accomplished, first suppose that Z ~ F is a discrete random variable, and let f be the mass function corresponding to F. Let W be a continuous random variable supported on the unit interval, and suppose that W has distribution function G, density function g, and is independent of Z. Then the continuation of Z is the continuous random variable Z* = Z + (W − 1). Denuit and Lambert (2005) showed that Z* has distribution function F*(z) = F([z]) + G(z − [z])f([z + 1]) and density function f*(z) = g(z − [z])f([z + 1]), where [•] returns the integer part of its argument. If we take W to be standard uniform, Z* is distributed as Z − W, and the distribution and density functions simplify to F*(z) = F([z]) + (z − [z])f([z + 1]) and f*(z) = f([z + 1]), respectively.
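A quick Monte Carlo check of the continuation (Python for illustration; the Poisson(2) choice and evaluation point are arbitrary) confirms the stated distribution function:

```python
import numpy as np
from scipy import stats

# Check that Z* = Z - W (W standard uniform) has cdf
# F*(z) = F([z]) + (z - [z]) f([z] + 1), here for Z ~ Poisson(2).
rng = np.random.default_rng(7)
m = 200_000
Z = rng.poisson(2.0, size=m)
W = rng.uniform(size=m)
Zstar = Z - W

zval = 1.3
target = (stats.poisson.cdf(np.floor(zval), 2.0)
          + (zval - np.floor(zval)) * stats.poisson.pmf(np.floor(zval) + 1, 2.0))
empirical = (Zstar <= zval).mean()
print(empirical, target)
```

The empirical and theoretical values agree to Monte Carlo accuracy, illustrating how the uniform jitter spreads each atom of F over the preceding unit interval.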

For copCAR we continue Z = (Z1,…, Zn)′ using n independent standard uniforms W = (W1,…, Wn)′ and form the expected likelihood

L(θ | z) ∝ E_W[ |Q|^{1/2} |Σ|^{1/2} exp{−(1/2) y′(Q − Σ^{−1})y} ∏_{i=1}^n f_i(z_i) ],

where y = (Φ_{σ_1}^{−1}{F_1*(z_1 − W_1)}, …, Φ_{σ_n}^{−1}{F_n*(z_n − W_n)})′. Using a result proved by Madsen (2009), one can show that this expectation is equal to the true likelihood given in (5).

We approximate the expectation using a sample-based approach. Let m be a positive integer, and simulate a vector of independent standard uniforms Wj = (Wj,1,…, Wj,n)′ for j = 1, 2,…, m. Then approximate the expected likelihood using

L_CE(θ | z) = (1/m) ∑_{j=1}^m |Q|^{1/2} |Σ|^{1/2} exp{−(1/2) y_j′(Q − Σ^{−1})y_j} ∏_{i=1}^n f_i(z_i),   (6)

where y_{j,i} = Φ_{σ_i}^{−1}{F_i*(z_i − w_{j,i})}. This approximate likelihood can then be optimized to arrive at an approximate maximum likelihood estimate θ̂_CE of θ.

The Distributional Transform

When the marginal distributions have fairly smooth cdfs, we recommend that the likelihood be approximated using the distributional transform. This approach was proposed by Kazianka and Pilz (2010) for fitting Gaussian copula geostatistical models.

It is well known that if Z ~ F is continuous, F(Z) has a standard uniform distribution. But if Z is discrete, F(Z) tends to be stochastically larger, and F(Z−) tends to be stochastically smaller, than a standard uniform random variable. This can be remedied by stochastically “smoothing” F at its jumps, a technique that goes at least as far back as Ferguson (1967), who used it in connection with hypothesis tests. More recently, the DT has been applied to stochastic ordering (Rüschendorf 1981), conditional value at risk (Burgert and Rüschendorf 2006), and the extension of limit theorems for the empirical copula process to general distributions (Rüschendorf 2009), for example.

Let W ~ U(0, 1), and suppose that Z ~ F and is independent of W. Then the distributional transform G(W, Z) = (1 − W)F(Z−) + WF(Z) follows a standard uniform distribution and F^{−1}{G(W, Z)} follows the same distribution as Z. See Rüschendorf (2009) for a proof.

Turning back to the problem at hand, the DT-based approximate likelihood for copCAR models with discrete marginals can be developed as follows. For each i ∈ {1,…, n}, let

G_i(W_i, Z_i) = (1 − W_i)F_i(Z_i−) + W_i F_i(Z_i),

where the Wi are standard uniform random variables and are independent of one another and of the Zi. Now put

u_i = E{G_i(W_i, Z_i) | Z_i = z_i} = {F_i(z_i−) + F_i(z_i)}/2 = (u_{i0} + u_{i1})/2.   (7)

Then the approximate likelihood for copCAR is

L_DT(θ | z) = c_{Q^{−1}}(u_1, …, u_n) ∏_{i=1}^n f_i(z_i),

which implies the approximate log likelihood

ℓ_DT(θ | z) = (1/2) log|Q| + (1/2) log|Σ| − (1/2) y′(Q − Σ^{−1})y + ∑_{i=1}^n log f_i(z_i),   (8)

where y_i = Φ_{σ_i}^{−1}(u_i). Optimization of (8) yields θ̂_DT.
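The DT approximation amounts to replacing each discrete observation's probability integral transform with the midpoint in (7) and then reusing the continuous-margin formula. A sketch (Python for illustration; Poisson marginals and the lattice are our assumptions, and this is not the copCAR package's code):

```python
import numpy as np
from scipy import stats

# DT-based approximate log likelihood (8) with Poisson(mu) marginals.
def loglik_dt(Q, z, mu):
    sigma2 = np.diag(np.linalg.inv(Q))                 # CAR variances
    u = 0.5 * (stats.poisson.cdf(z, mu) + stats.poisson.cdf(z - 1, mu))   # (7)
    y = stats.norm.ppf(u, scale=np.sqrt(sigma2))       # y_i = Phi_{sigma_i}^{-1}(u_i)
    _, logdetQ = np.linalg.slogdet(Q)
    quad = y @ (Q - np.diag(1.0 / sigma2)) @ y
    return (0.5 * logdetQ + 0.5 * np.log(sigma2).sum()
            - 0.5 * quad + stats.poisson.logpmf(z, mu).sum())

A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
D = np.diag(A.sum(1))
z = np.array([2, 0, 1, 3])
print(loglik_dt(D - 0.5 * A, z, 1.5))
```

As with the exact continuous-margin likelihood, ℓ_DT collapses to the sum of the marginal log pmfs at ρ = 0, which gives a convenient unit test.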

Kazianka and Pilz (2010) noted that the DT-based approximation performs well when the cdfs are not too rough. A subsequent investigation by Kazianka (2013) revealed that the marginal variance is a suitable indicator of the level of discretization, and hence of the performance of the DT-based approximation. Specifically, smaller variances lead to worse performance. Our simulation studies (Section 6) show that the approach performs well for copCAR when the marginals are Poisson with small means, and so we should expect the approach to perform well for copCAR analyses of similar types of data, e.g., negative binomial data. Other simulation studies, the results of which will not be detailed here, confirmed this conjecture. Specifically, the DT-based approximation performs well for copCAR with negative binomial marginals, but performance is unacceptable when the marginals are Bernoulli.

Composite Marginal Likelihood

When the marginal cdfs are rather rough (e.g., Bernoulli), we recommend a composite marginal likelihood approach (Varin 2008), where the objective function is a product of pairwise likelihoods:

L_CML(θ | z) = ∏_{i,j∈{1,…,n}: i<j} ∑_{j_1=0}^{1} ∑_{j_2=0}^{1} (−1)^k C_{ij}(u_{ij_1}, u_{jj_2}),

where Cij denotes the bivariate Gaussian copula with covariance matrix

V_ij = ( σ_i²           (Q^{−1})_{ij}
         (Q^{−1})_{ij}   σ_j²          ).

This implies the log composite likelihood

ℓ_CML(θ | z) = ∑_{i∈{1,…,n−1}} ∑_{j∈{i+1,…,n}} log{ ∑_{j_1=0}^{1} ∑_{j_2=0}^{1} (−1)^k Φ_{V_ij}(y_{ij_1}, y_{jj_2}) },   (9)

where y_{i0} = Φ_{σ_i}^{−1}{F_i(z_i)}, y_{i1} = Φ_{σ_i}^{−1}{F_i(z_i − 1)}, and k = j_1 + j_2. Optimization of (9) yields θ̂_CML. (For more information about composite likelihood methods, see, e.g., the seminal paper by Lindsay (1988), or the more recent overview by Varin, Reid and Firth (2011).)
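Each pairwise factor in (9) is a four-term rectangle probability under the bivariate Gaussian copula with covariance V_ij. A sketch (Python for illustration; Bernoulli(p) marginals and the graph are our choices, and the copCAR package's implementation differs):

```python
import itertools

import numpy as np
from scipy import stats

# Pairwise log composite likelihood (9) for Bernoulli(p) marginals.
def loglik_cml(Q, z, p):
    n = len(z)
    Sigma = np.linalg.inv(Q)
    F = lambda x: stats.bernoulli.cdf(x, p)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            V = np.array([[Sigma[i, i], Sigma[i, j]],
                          [Sigma[i, j], Sigma[j, j]]])
            term = 0.0
            for j1, j2 in itertools.product([0, 1], repeat=2):
                u = np.array([F(z[i] - j1), F(z[j] - j2)])
                if np.any(u <= 0):
                    continue                       # the copula is 0 when some u = 0
                y = stats.norm.ppf(np.minimum(u, 1 - 1e-12),
                                   scale=np.sqrt(np.diag(V)))
                term += (-1) ** (j1 + j2) * stats.multivariate_normal.cdf(y, cov=V)
            total += np.log(term)
    return total

A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
D = np.diag(A.sum(1))
z = np.array([1, 0, 1, 1])

# At rho = 0 each pairwise term factorizes, so the log composite likelihood
# equals (n - 1) times the ordinary independence log likelihood.
print(loglik_cml(D, z, 0.6), 3 * stats.bernoulli.logpmf(z, 0.6).sum())
```

The ρ = 0 identity printed at the end is a handy correctness check, since every observation appears in exactly n − 1 pairs.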

Computing Confidence Intervals

A preliminary simulation study revealed that β̂_CE, β̂_CML, and β̂_DT tend to be approximately normally distributed, whereas ρ̂_CE, ρ̂_CML, and ρ̂_DT tend to be left-skewed when ρ is close to 1. This implies that asymptotic inference for ρ tends to result in poor coverage rates. This can be avoided by using a parametric bootstrap, but a parametric bootstrap is rather burdensome computationally, especially for the continuous extension.

Luckily, a simple reparameterization yields an approximately normally distributed estimator because the objective function for the reparameterized model is approximately quadratic with constant Hessian (Geyer 2013). This is true for all three procedures, at least for the realistic scenarios considered in this article. Specifically, for θ = (β′, Φ^{−1}(ρ))′, we have

√n(θ̂_CE − θ) ⇝ N{0, I_CE^{−1}(θ)},
√n(θ̂_CML − θ) ⇝ N{0, I_CML^{−1}(θ) J_CML(θ) I_CML^{−1}(θ)},
√n(θ̂_DT − θ) ⇝ N{0, I_DT^{−1}(θ) J_DT(θ) I_DT^{−1}(θ)},

where I is the appropriate Fisher information matrix and J is the variance of the score:

J(θ) = V{∇ℓ(θ | Z)}.

Note that the asymptotic covariance matrices for the CML and DT estimators are Godambe information matrices (Godambe 1960) because ℓCML and ℓDT are misspecified.

A sensible estimator of I is the observed Fisher information matrix Î(θ) = −H(θ̂), where H denotes the Hessian matrix ∇²ℓ(θ | Z). This matrix can be approximated efficiently as a side effect of optimization. Estimation of J is more challenging. One appealingly general approach is window subsampling (Heagerty and Lumley 2000), but the quality of the resulting estimate can be sensitive to the manner in which internal replicates are defined. We recommend that J be estimated using a parametric bootstrap instead, i.e., our estimator of J is

Ĵ(θ) = (1/b) ∑_{k=1}^{b} ∇ℓ(θ̂ | Z^{(k)}) ∇ℓ(θ̂ | Z^{(k)})′,

where b is the bootstrap sample size and the Z(k) are datasets simulated from copCAR at θ=θ^. This approach performs well, as our simulation results show, and is considerably more efficient computationally than a “full” bootstrap (it is much faster to approximate the score than to optimize the objective function). The procedure can be made even more efficient through parallelization. We recommend the snow package (Tierney, Rossini and Li 2009) for R (Ihaka and Gentleman 1996), which makes embarrassingly parallel computation embarrassingly easy to do.
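The score-only bootstrap above can be sketched generically. In this illustration (Python; `loglik` and `simulate` are hypothetical stand-ins for a fitted model, not copCAR package functions), the score is obtained by central differences and Ĵ is the average of its outer products:

```python
import numpy as np

# Parametric-bootstrap estimator of J: average the outer products of
# (numerically differentiated) scores over b datasets simulated at theta-hat.
def jhat(loglik, simulate, theta_hat, b=400, h=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    p = len(theta_hat)
    J = np.zeros((p, p))
    for _ in range(b):
        z = simulate(theta_hat, rng)
        score = np.empty(p)
        for r in range(p):                        # central-difference score
            e = np.zeros(p)
            e[r] = h
            score[r] = (loglik(theta_hat + e, z) - loglik(theta_hat - e, z)) / (2 * h)
        J += np.outer(score, score)
    return J / b

# Toy check: iid N(theta, 1) data of size 50, so Var(score) = 50.
loglik = lambda theta, z: -0.5 * np.sum((z - theta[0]) ** 2)
simulate = lambda theta, rng: rng.normal(theta[0], 1.0, size=50)
J = jhat(loglik, simulate, np.array([0.0]))
print(J[0, 0])   # should be near 50
```

Only scores are computed, never a full re-optimization, which is why this is so much cheaper than a full bootstrap; the b simulations are also embarrassingly parallel.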

5. COMPUTING ISSUES

In this section we will show how to do reliable and efficient computation for copCAR. The recipe for efficiency is threefold: for CE and DT estimation, (1) exploit the sparsity of Q to compute |Q| quickly, and (2) employ matrix identities that allow the CAR variances to be approximated quickly and accurately; and (3) use G’s adjacency structure to speed up maximum CML estimation. The use of these techniques makes copCAR suitable for the analyses of larger areal datasets. Additionally, we describe how to ensure the accuracy and numerical stability of the continuous extension.

5.1 Computing |Q| Efficiently

Numerical methods for sparse matrices can be used to compute |Q| quickly. Let C be the lower Cholesky triangle of Q, so that Q = CC′. Then |Q| = |C||C′| = |C|2, which implies that

(1/2) log|Q| = ∑_{i=1}^n log C_{ii}.   (10)

The right-hand side of (10) can be computed efficiently because C can be computed efficiently after Q has been reordered to reduce its bandwidth. Also note that C needs to be computed just once—the structure of C depends on the sparsity structure of Q and not on ρ, and so C can simply be updated to reflect a change in ρ (Ng and Peyton 1993).

We use the spam package (Furrer and Sain 2010) for R to do sparse matrix computations. The chol and update functions of package spam perform the fast Cholesky decomposition and updating described in the preceding paragraph. (See Rue and Held (2005) for more information about the use of sparse matrix techniques in statistical computing.)
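Identity (10) is easy to verify numerically. A dense sketch (Python for illustration; the spam package performs the same computation sparsely, with reordering and updating):

```python
import numpy as np

# Check (10): one half the log determinant of Q equals the sum of the
# logs of the diagonal entries of Q's lower Cholesky triangle.
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
Q = np.diag(A.sum(1)) - 0.9 * A            # proper CAR precision, rho = 0.9
C = np.linalg.cholesky(Q)                  # Q = C C'
half_logdet = 0.5 * np.linalg.slogdet(Q)[1]
sum_log_diag = np.log(np.diag(C)).sum()
print(half_logdet, sum_log_diag)
```

Working with the Cholesky diagonal avoids forming |Q| itself, which would overflow for large n even though its logarithm is perfectly well behaved.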

5.2 Approximating the CAR Variances

Since computing σ2 requires inversion of Q (the relevant spam function is chol2inv), it would seem that using the CAR copula given in (4) must leave us unable to fully exploit the sparsity of Q. This is not the case, however, because the variances in (4) can be replaced with approximations so that Q need not be inverted. Here we present an algorithm for computing the approximate variances quickly and with arbitrary accuracy. This scheme has a greater upfront cost than inversion of Q, but the amortized cost of approximating σ2 is much lower. And the difference between the approximate CAR copula and (4) is negligible except with respect to computational complexity.

First, choose a maximum value for ρ, say ρ_max, and a tolerance ε > 0. Then if k is chosen such that

k > [log(1 − ρ_max) + log ε − log c(G)] / log ρ_max − 1,

where

c(G) = max_i {(Ω²)_{ii} / d_i},

the approximate variances σ̃² = (σ̃_1², …, σ̃_n²)′ will satisfy σ_i² − σ̃_i² < ε for all i and for every ρ ∈ [0, ρ_max). See the appendix for a proof of this claim and an efficient algorithm for finding the smallest possible value of k.

For a given value of ρ, the approximate variances are σ̃² = D^{−1}Mρ, where M_{•1} = 1; M_{•j} = vecdiag(Ω^{j−1}) for j = 2, …, k + 1; and ρ = (1, ρ, ρ², …, ρ^k)′. The diagonal of Ω^{j−1} can be computed quickly using the eigendecomposition of Ω. Let Ω = ΓΛΓ^{−1}, where Λ = diag(λ) and Γ has as its columns the eigenvectors of Ω. Then

vecdiag(Ω^{j−1}) = {Γ ∘ (Γ^{−1})′} λ^{j−1}   (11)

(Horn and Johnson 1994, p. 305), where ∘ denotes the Hadamard product and λ^{j−1} = (λ_1^{j−1}, …, λ_n^{j−1})′. This simple yet somewhat obscure identity is crucial for making computation of σ̃² much more efficient than computation of σ². Naive computation of the desired vectors is rather expensive, whereas the use of (11) allows M to be computed efficiently, even for large datasets. For example, with ρ_max = 0.999, ε = 0.01, and G the 50 × 50 square lattice, k is equal to 4,100. Under these circumstances, a pure R implementation of (11) can build M in approximately eleven minutes on a 2.8 GHz quad-core Intel Xeon processor. A C++ implementation based on the Armadillo library (Sanderson 2010) built M in just eight minutes. (Armadillo code can be called from R using the Rcpp and RcppArmadillo packages (Eddelbuettel and Francois 2011).)
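Identity (11) can be verified directly on a small graph. A sketch (Python for illustration; the 3 × 3 lattice is our choice):

```python
import numpy as np

# Check identity (11): the diagonal of Omega^(j-1) equals
# {Gamma o (Gamma^{-1})'} lambda^(j-1), with o the Hadamard product
# and Gamma the eigenvector matrix of Omega.
n = 9
A = np.zeros((n, n))                     # adjacency of a 3x3 lattice
for r in range(3):
    for c in range(3):
        i = 3 * r + c
        if c < 2: A[i, i + 1] = A[i + 1, i] = 1.0
        if r < 2: A[i, i + 3] = A[i + 3, i] = 1.0
D = np.diag(A.sum(1))
Omega = np.linalg.inv(D) @ A             # row-normalized adjacency matrix

lam, Gamma = np.linalg.eig(Omega)
Ginv = np.linalg.inv(Gamma)
j = 6
lhs = np.diag(np.linalg.matrix_power(Omega, j - 1))
rhs = ((Gamma * Ginv.T) @ lam ** (j - 1)).real
print(np.max(np.abs(lhs - rhs)))
```

The payoff is that the eigendecomposition is computed once; every column of M then costs a single Hadamard-weighted matrix-vector product rather than a fresh matrix power.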

In our simulation studies, and for the analysis of the cancer data, we used ρ_max = 0.999 and ε = 0.01. Not once did ρ̂ equal ρ_max for any of the thousands of datasets analyzed. This is because the estimator ρ̂ becomes very precise as ρ approaches 1. A practitioner who wishes to be more conservative might use, say, ρ_max = 0.9999 instead, but this would be unnecessary in most cases and would increase the computational burden.

The value of 0.01 for ε is already conservative because the proof given in the appendix does not provide a tight upper bound for the discrepancy between the true and approximate variances. For the scenarios described in this article, using ε = 0.01 leads to approximate variances that are numerically indistinguishable from the true variances.

A thoughtful referee suggested that, in some scenarios, it might be more efficient to simply invert Q by inverting C. This is true for small datasets, but inversion does not scale well. For the graph used in our simulation study, i.e., the square lattice with n vertices, matrix inversion has a running time in O(n^2). Our approach, which exploits the special structure of the CAR precision matrix (see the proof for details), has a running time in O(n). This advantage can reduce the running time considerably when n is large (greater than 1,000, say).

5.3 Fast Maximum CML Estimation

The form of ℓ_CML given in (9) requires four evaluations of the bivariate normal cdf for each of the n(n − 1)/2 pairs of observations. This computation is rather expensive even for fairly small samples.

In a spatial setting we can expect a pair of nearby observations to carry more information about dependence than a pair of more distant observations. Others have found, in a variety of contexts, that retaining the contributions to the CML made by more distant pairs of observations decreases not only the computational efficiency of the procedure but also the statistical efficiency of the estimator (Varin and Vidoni 2009; Apanasovich, Ruppert, Lupton, Popovic, Turner, Chapkin and Carroll 2008). We investigated this by simulation and found that including second-order, not to mention third-order, neighbors in the copCAR CML does yield a less efficient estimator. Hence, we allow only pairs of adjacent observations to contribute to the copCAR CML. This means replacing (9) with

ℓ_CML(θ | z) = Σ_{i<j : (i,j) ∈ E} log { Σ_{j1=0}^{1} Σ_{j2=0}^{1} (−1)^k Φ_{V_ij}(y_{i,j1}, y_{j,j2}) }.   (12)

If thoughtfully implemented, optimization of (12) is efficient enough to permit the analysis of larger areal datasets. For example, Brenton Kenkel’s pbivnorm package (Kenkel 2012) for R offers a vectorized implementation of the bivariate normal cdf. The vectorization minimizes the number of calls to the underlying Fortran workhorse function. Through further specialization of Kenkel’s R wrapper (e.g., removal of many error-checking statements that are unnecessary in our setting), we were able to speed up the evaluation of (12) by a factor of ten relative to a more naive implementation.
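As an illustration of the rectangle probabilities that make up each term of (12), the following Python sketch evaluates the four-term inclusion–exclusion sum for one pair. It uses SciPy's bivariate normal cdf rather than pbivnorm, and the function name and independence check are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def pair_log_term(y_lo, y_hi, V):
    """One term of the pairwise CML: the log rectangle probability
    P(y_lo[0] < Z1 <= y_hi[0], y_lo[1] < Z2 <= y_hi[1]) for (Z1, Z2) ~ N(0, V),
    via four bivariate normal cdf evaluations with alternating signs (-1)^(j1+j2)."""
    total = 0.0
    for j1 in (0, 1):
        for j2 in (0, 1):
            pt = [(y_hi[0], y_lo[0])[j1], (y_hi[1], y_lo[1])[j2]]
            total += (-1) ** (j1 + j2) * multivariate_normal.cdf(
                pt, mean=[0.0, 0.0], cov=V)
    return np.log(total)

# Sanity check: with V = I the rectangle probability factorizes,
# so exp(val) should equal (Phi(1) - Phi(0))^2.
val = pair_log_term([0.0, 0.0], [1.0, 1.0], np.eye(2))
print(np.exp(val))
```

Summing such terms over the edge set E, rather than over all n(n − 1)/2 pairs, is exactly the restriction to adjacent pairs described above.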

5.4 Stabilizing the Continuous Extension

Computation of L_CE is numerically stable for smaller sample sizes but becomes unstable as the sample size increases. This is because the quadratic form v_j = y_j′(Q^{−1})y_j can be so large that computation of exp(−v_j/2) results in underflow. If the sample size is large enough, computation of exp(−v_j/2) may cause underflow for every j, in which case application of the log function then causes overflow. And even if this worst-case scenario does not obtain, many of the exponentials may underflow, which means the procedure is not able to benefit fully from the current Monte Carlo sample size m.

This instability can be alleviated, using the so-called log-sum-exp trick, as follows. Suppose our aim is to compute log(e^a + e^b), where a, b < 0 are potentially large in absolute value. Choose an appropriate c, and rewrite the expression as log{(e^a + e^b)e^{−c}e^{c}} = log(e^{a−c} + e^{b−c}) + c. Choosing c = max{a, b} sets one of the exponents to 0 and shifts the other toward 0, which reduces the likelihood of underflow. The resulting stabilized form of the log likelihood is

ℓ_CE(θ | z) = ½ log|Q| + ½ Σ_{i=1}^{n} log σ_i^2 + Σ_{i=1}^{n} log f_i(z_i) − ½ s + log Σ_{j=1}^{m} exp{−½(v_j − s)},

where s = min{v_1, …, v_m}, so that the largest term in the sum of exponentials is shifted to 1. The added computational cost is in O(m) and so is dominated by the cost of evaluating the quadratic forms.
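The device is generic and worth having as a standalone routine; a minimal Python sketch with illustrative values:

```python
import numpy as np

def log_sum_exp(x):
    """Numerically stable log(sum(exp(x))): factor out the largest exponent first."""
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

# Exponents far below the log of the smallest representable double.
x = np.array([-1000.0, -1001.0, -1002.0])
with np.errstate(divide="ignore"):
    naive = np.log(np.sum(np.exp(x)))   # every exp underflows to 0, so this is -inf
stable = log_sum_exp(x)                 # = -1000 + log(1 + e^-1 + e^-2)
print(naive, stable)
```

The same shift applied to the exponents −v_j/2 yields the stabilized log likelihood above.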

5.5 Choosing the Monte Carlo Sample Size for the Continuous Extension

One way to choose m is to compute Monte Carlo standard errors (Flegal, Haran and Jones 2008). In the case of ρ̂_CE, for example, observe that V(ρ̂_CE) = E{V(ρ̂_CE | W)} + V{E(ρ̂_CE | W)}, where W denotes the collection of the W_j. The second term in this sum is the Monte Carlo variance of ρ̂_CE, and it can be estimated by computing ρ̂_CE^{(k)} for many different W^{(k)} (k = 1, …, b) and taking the sample standard deviation of the estimates:

mcse(ρ̂_CE) = { (b − 1)^{−1} Σ_{k=1}^{b} (ρ̂_CE^{(k)} − μ̂)^2 }^{1/2},

where μ̂ is the sample mean b^{−1} Σ_{k=1}^{b} ρ̂_CE^{(k)} (Madsen and Fang 2011). If the Monte Carlo standard error of ρ̂_CE is small relative to the sample mean, i.e., if the estimated coefficient of variation cv(ρ̂_CE) = mcse(ρ̂_CE)/μ̂ is sufficiently small, the current value of m is sufficiently large.
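In code, the check is a one-liner over replicate fits; a Python sketch with made-up replicate estimates (the values and the b = 5 replicates are purely illustrative):

```python
import numpy as np

def mc_cv(estimates):
    """Estimated coefficient of variation of replicate Monte Carlo estimates:
    mcse (the sample SD of the b replicates) divided by their sample mean."""
    est = np.asarray(estimates, dtype=float)
    mcse = est.std(ddof=1)      # (b - 1)^(-1) normalization, as in the mcse formula
    return mcse / est.mean()

# Hypothetical replicate estimates of rho from b = 5 independent W^(k).
cv = mc_cv([0.80, 0.79, 0.81, 0.80, 0.78])
print(cv)   # a small cv suggests the current m is large enough
```

One would refit with a larger m whenever this ratio exceeds the desired threshold.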

Another approach is to fit a sequence of models for values of m that range over several orders of magnitude. We used this method while preparing to conduct our simulation studies and found that m = 100 is generally sufficient when the dependence is weak to moderate in strength. When the dependence is strong, however, m should be 1,000 or even 10,000. Thus the CE approach scales poorly relative to the CML and DT approaches, even though the latter two approaches require bootstrap estimation of J. For example, we took G to be the 50×50 square lattice, and simulated from copCAR with Poisson marginals (see below for details). Analyzing such data using the CE approach with m = 10,000 took about 225 minutes, on average, while CML and DT analyses (with a bootstrap sample size of 500 and using five CPU cores in parallel) averaged approximately 75 minutes and 30 minutes, respectively.

6. SIMULATION STUDY

To determine the finite-sample behavior of the CE, CML, and DT estimators for copCAR, we conducted an extensive simulation study. For this study we simulated data on the 20 × 20 and 30 × 30 square lattices, where the coordinates of the vertices were restricted to the unit square. We chose to focus on these two lattice sizes for two reasons: (1) most areal datasets have a few hundred observations, and so the 20 × 20 lattice represents a realistic scenario, and (2) the 30 × 30 lattice is large enough by comparison to reveal large-sample behavior.

The marginal distributions were Poisson with means exp(x_i + y_i), where x_i and y_i are the coordinates of vertex i. Each coordinate was treated as a separate spatial predictor, and so β = (β_1, β_2)′ = (1, 1)′. The resulting means range from 1 to approximately 7.4 and increase in the direction (1, 1)′.
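A Python sketch of this design (the evenly spaced grid construction is an assumption about the layout; the 20 × 20 side length is one of the two sizes studied):

```python
import numpy as np

# Vertices of a square lattice with coordinates restricted to the unit square;
# each coordinate acts as a spatial predictor with beta = (1, 1)'.
side = 20
grid = np.linspace(0.0, 1.0, side)
coords = np.array([(x, y) for x in grid for y in grid])
means = np.exp(coords @ np.array([1.0, 1.0]))   # Poisson means exp(x_i + y_i)
print(means.min(), means.max())                  # 1 at (0,0) up to exp(2) at (1,1)
```

The largest mean, exp(2) ≈ 7.39, matches the "approximately 7.4" quoted above.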

We used three values for ρ: 0.8, 0.975, and 0.995. These correspond to moderate, strong, and very strong dependence, respectively. This is illustrated in Figure 1. The dots in Panel A mark the correlations between the first CAR variable and every other variable in the same row of the 20 × 20 lattice. The level plots in Panel B show simulated realizations of the CAR copula, one for each value of ρ.

Figure 1.

Figure 1

Panel A shows the correlations between the first CAR variable of the simulation study and every other variable in the same row of the 20 × 20 lattice, for three values of the dependence parameter ρ. The circles mark the correlations; the curves merely guide the eye. The horizontal line allows us to see the effective range of the correlation function, i.e., the “distance” at which the correlation has dropped to 0.05. Panel B shows a simulated realization from the CAR copula for each value of ρ.

For each lattice size and value of ρ we analyzed 1,000 simulated datasets, 6,000 datasets altogether. We used a BFGS quasi-Newton method (Byrd, Lu, Nocedal and Zhu 1995) to optimize the objective functions. For the CE and DT procedures we chose ρmax = 0.999 and constrained the estimate of ρ to (0, ρmax). For the CE approach we used m = 100 for ρ = 0.8, and m = 1,000 for the two larger values of ρ.

We obtained approximate 95% confidence intervals for the coordinates of θ using the approach described near the end of Section 4, with a bootstrap sample size of 500 for estimation of JCML and JDT.

Tables 1 and 2 show mean estimates, mean standard errors, simulation standard deviations, and coverage rates for the various scenarios. We see that the three approaches perform well and, for the most part, comparably. As expected, θ̂_CE is the most efficient of the three estimators, and θ̂_CML is the least efficient. What is perhaps surprising is the excellent performance of the DT estimator: β̂_DT is nearly as efficient as β̂_CE, and ρ̂_DT is more efficient than ρ̂_CE.

Table 1.

The results of the simulation study for the 20 × 20 lattice and Poisson marginals.

Mean Estimate Mean SE Simulation SD Coverage
Parameter CE CML DT CE CML DT CE CML DT CE CML DT
β1 = 1 1.002 1.008 0.999 0.130 0.136 0.142 0.122 0.128 0.124 94% 96% 95%
ρ = 0.8 0.769 0.772 0.806 0.065 0.067 0.070 0.065 0.065 0.063 96% 96% 96%

β1 = 1 0.981 0.969 0.991 0.236 0.267 0.261 0.252 0.287 0.253 94% 94% 95%
ρ = 0.975 0.971 0.968 0.977 0.013 0.016 0.011 0.013 0.012 0.008 94% 96% 95%

β1 = 1 0.966 0.956 0.980 0.303 0.377 0.331 0.322 0.406 0.336 94% 95% 97%
ρ = 0.995 0.993 0.992 0.995 0.003 0.005 0.002 0.003 0.004 0.002 93% 97% 98%

Table 2.

The results of the simulation study for the 30 × 30 lattice and Poisson marginals.

Mean Estimate Mean SE Simulation SD Coverage
Parameter CE CML DT CE CML DT CE CML DT CE CML DT
β1 = 1 1.004 1.004 1.001 0.089 0.095 0.090 0.085 0.088 0.087 96% 97% 97%
ρ = 0.8 0.779 0.791 0.821 0.042 0.042 0.049 0.042 0.044 0.041 94% 96% 96%

β1 = 1 0.988 0.991 1.001 0.178 0.197 0.190 0.181 0.202 0.185 94% 94% 96%
ρ = 0.975 0.972 0.970 0.976 0.009 0.010 0.008 0.009 0.010 0.008 96% 93% 94%

β1 = 1 0.985 1.005 1.016 0.244 0.311 0.285 0.251 0.326 0.283 95% 94% 96%
ρ = 0.995 0.995 0.993 0.995 0.002 0.003 0.002 0.002 0.002 0.002 95% 98% 98%

And we see that copCAR, unlike the areal GLMM, is able to recover ρ. In fact, ρ̂ becomes considerably more precise as ρ approaches 1. This is a useful property: when ρ is close to 1, we can be more certain of that fact and so more certain that the data are strongly dependent, in which case regression inference is less likely to be reliable.

7. COPCAR APPLIED TO THE SLOVENIA DATA

The plot in Figure 2 shows stomach cancer incidence data for Slovenia for the period 1995–2001. These data were first analyzed by Zadnik and Reich (2006), who investigated the relationship between stomach cancer incidence and socioeconomic factors. Then Reich et al. (2006) reconsidered the data while exploring the spatial confounding of the areal GLMM. The plotted values are observed standardized incidence ratios (SIR), i.e., the ratio of the observed to the expected number of cases (O_i/E_i), for 194 municipalities.

Figure 2.

Figure 2

The Slovenia stomach cancer data. A darker shade of gray indicates a larger SIR. The values range from 0 to approximately 4.1.

The covariate of interest is each municipality’s socioeconomic status (SE) as determined by Slovenia’s Institute of Macroeconomic Analysis and Development. Each status takes one of five ordered values. Visual inspection reveals a negative relationship between SIR and SE: western Slovenia tends to exhibit low SIR and high SE, while eastern Slovenia tends to exhibit high SIR and low SE.

To these data we fitted one nonspatial and two spatial Poisson regressions. For the areal GLMM with proper CAR, the model is

log E(O_i) = log E_i + α + β SE_i + S_i,

where α is an intercept and β is the fixed effect for SE. (Note that SE was standardized prior to the analyses.) The corresponding expressions for the classical GLM and copCAR are the same except that S_i does not appear.
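For the nonspatial fit, the model is an ordinary Poisson log-linear regression with offset log E_i. A self-contained Python sketch of the IRLS fit on simulated data (not the Slovenia analysis, which the supplementary R code performs; the simulated sample size, coefficients, and expected counts are illustrative):

```python
import numpy as np

def poisson_glm_offset(X, y, offset, iters=25):
    """IRLS for the Poisson log-linear model log E(y_i) = offset_i + x_i' beta."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = offset + X @ beta
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu   # working response, offset removed
        XtW = X.T * mu                       # IRLS weights W = diag(mu)
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Simulated check with known coefficients (alpha, beta) = (0.2, -0.15).
rng = np.random.default_rng(0)
n = 5000
se = rng.standard_normal(n)                  # standardized covariate, like SE
X = np.column_stack([np.ones(n), se])
offset = np.log(rng.uniform(5.0, 50.0, n))   # expected counts E_i
y = rng.poisson(np.exp(offset + X @ np.array([0.2, -0.15])))
print(poisson_glm_offset(X, y, offset))      # close to (0.2, -0.15)
```

The spatial fits replace only the dependence structure; the offset and linear predictor are the same.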

Since the areal GLMM is usually applied in a Bayesian setting, we fitted it using MCMC for Bayesian inference. We chose uninformative prior distributions for (α, β)′ and ρ: N(0, 1,000,000 I) and U(0, 1), respectively. And the prior for τ (which is present only for the areal GLMM) was gamma with shape parameter 0.5 and scale parameter 2,000 (Kelsall and Wakefield 1999). This prior is appealing because it corresponds to the prior assumption that the fixed effects are sufficient to explain the data, and because it discourages artifactual spatial structure in the posterior. We used a Metropolis–Hastings random walk to draw a sample of size 100,000. The MCMC estimates stabilized after approximately 75,000 iterations. The effective sample size for β was just under 1,700, and the estimated coefficients of variation were smaller than 0.02.
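The sampler itself is a standard Gaussian random-walk Metropolis algorithm; a generic Python sketch (toy standard-normal target, tuning constants, and function names are illustrative, not the actual GLMM posterior):

```python
import numpy as np

def rw_metropolis(logpost, init, step, n_iter, seed=0):
    """Random-walk Metropolis: propose theta' = theta + step * N(0, I) and
    accept with probability min(1, exp(logpost(theta') - logpost(theta)))."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(init, dtype=float)
    lp = logpost(theta)
    draws = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[t] = theta
    return draws

# Toy check: sampling a standard normal target.
draws = rw_metropolis(lambda th: -0.5 * float(th @ th), [0.0], step=1.0, n_iter=20000)
print(draws.mean(), draws.std())   # near 0 and 1
```

In the actual analysis, logpost would be the log posterior of (α, β, ρ, τ) under the priors above, and the step sizes would be tuned for a reasonable acceptance rate.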

For copCAR we used m = 1,000 (for CE), ρmax = 0.999 (for CE and DT), and a bootstrap sample size of 500 (for CML and DT).

The results of the analyses are shown in Table 3. copCAR found SE to be a significant predictor of SIR, while the areal GLMM analysis indicated no regression effect. The copCAR estimates of ρ were small, which suggests that SE accounts for nearly all of the spatial clustering in the response. The copCAR intervals are appreciably wider than the intervals from the nonspatial fit, however, which implies that a spatial model is indicated for these data. The areal GLMM had to find a much larger degree of spatial dependence in order to explain the data. This is presumably due to confounding, which would also explain the much wider intervals for α and β.

Table 3.

The results of applying the classical GLM, copCAR, and the areal GLMM to the stomach cancer data.

Model α β ρ τ
Nonspatial 0.156 (0.120, 0.193) −0.137 (−0.176, −0.098) NA NA
copCAR (CE) 0.153 (0.111, 0.195) −0.128 (−0.171, −0.084) 0.289 (0.097, 0.574) NA
copCAR (CML) 0.170 (0.124, 0.216) −0.149 (−0.195, −0.101) 0.279 (0.139, 0.466) NA
copCAR (DT) 0.153 (0.112, 0.194) −0.128 (−0.174, −0.081) 0.283 (0.138, 0.475) NA
areal GLMM 0.137 (0.003, 0.216) −0.070 (−0.144, 0.022) 0.980 (0.494, 1.000) 4.54 (2.66, 8.49)

These results are comparable to those given by Reich et al. (2006). They applied both the traditional ICAR GLMM and their reparameterized ICAR GLMM, the RHZ model, to the Slovenia data. copCAR produced inference very similar to that produced by the RHZ model. And the proper CAR GLMM and the traditional ICAR GLMM, both of which are known to be spatially confounded, resulted in 95% HPD intervals for β that cover 0.

We have not provided measures of model complexity and fit because such measures would be misleading. We would expect the areal GLMM to provide a better fit than copCAR simply because the former has spatial random effects. But closeness of fit is beside the point. copCAR’s purpose is regression, not smoothing. And the mixed model’s facility for smoothing does not imply that the model also performs well with respect to regression. The results of Reich et al. (2006) and Hughes and Haran (2013) show that the mixed model tends to result in a close fit and erroneous regression inference. copCAR tends to provide a poorer fit, but regression inference is improved because copCAR cannot be spatially confounded.

In any case, copCAR is capable of smoothing if smoothing is desired. This can be accomplished using a technique known as spatial filtering (Griffith 2003), the details of which are beyond the scope of this article.

8. CONCLUSION

This article developed copCAR, a flexible regression model for areal data. Because copCAR induces spatial dependence using a copula instead of an augmented linear predictor, copCAR overcomes many of the problems posed by the most commonly used areal models. Specifically, copCAR is flexible, permits positive spatial dependence for all types of data, is able to recover the dependence parameter ρ, cannot exhibit spatial confounding, and does not require conditional interpretation of estimated regression coefficients. Moreover, copCAR permits efficient computation for approximate maximum likelihood and maximum composite marginal likelihood inference. This makes copCAR suitable for the analyses of larger areal datasets.

Supplementary Material

Supplementary Materials

Acknowledgments

The author is grateful to Jim Hodges, Brian Reich, and Vesna Zadnik for providing the Slovenia data; and to an associate editor and two referees, whose many helpful comments led to a much better article.

This work was partially supported by a grant from the National Cancer Institute, National Institutes of Health (R03 CA179555).

APPENDIX

Proposition 1

Let Y ~ N{0, (D − ρA)^{−1}}. Then for any ε > 0,

|V(Y_i) − (1/d_i){1 + ρ(Ω)_ii + ρ^2(Ω^2)_ii + ⋯ + ρ^k(Ω^k)_ii}| < ε

for all i so long as

k > {log(1 − ρ) + log ε − log c(G)} / log ρ − 1,

where

c(G) = max_i {(Ω^2)_ii / d_i}.

Proof

Observe that

V(Y) = (D − ρA)^{−1} = {D(I − ρΩ)}^{−1} = (I − ρΩ)^{−1} D^{−1},

which implies that V(Y_i) = (1/d_i){(I − ρΩ)^{−1}}_ii. Now, if |ρ| < 1, (I − ρΩ)^{−1} can be written as

(I − ρΩ)^{−1} = I + ρΩ + ρ^2Ω^2 + ⋯ + ρ^kΩ^k + ⋯

(Iosifescu 1980, p. 45), and so we see that

V(Y_i) = (1/d_i){1 + ρ(Ω)_ii + ρ^2(Ω^2)_ii + ⋯ + ρ^k(Ω^k)_ii} + r_i,

where r_i = (1/d_i) Σ_{m=k+1}^{∞} ρ^m (Ω^m)_ii. Since (Ω^m)_ii ≤ (Ω^2)_ii when m > 2, we have

r_i ≤ (1/d_i) Σ_{m=k+1}^{∞} ρ^m (Ω^2)_ii = {(Ω^2)_ii / d_i} Σ_{m=k+1}^{∞} ρ^m = {(Ω^2)_ii / d_i} ρ^{k+1}/(1 − ρ).

Thus r_i < ε for all i if

max_i {(Ω^2)_ii / d_i} ρ^{k+1}/(1 − ρ) < ε.

And a bit of algebra establishes the result.

Note that k can be decreased considerably by using a larger power of Ω in place of Ω^2. An efficient way to find the smallest possible value for k is to first generate the sequence

c(G)_2, c(G)_4, …, c(G)_{2^j}, … = max_i{(Ω^2)_ii / d_i}, max_i{(Ω^4)_ii / d_i}, …, max_i{(Ω^{2^j})_ii / d_i}, …

until the sequence converges, to c(G)*, say. Then take

k > {log(1 − ρ) + log ε − log c(G)*} / log ρ − 1.
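A Python sketch of this squaring scheme (the convergence tolerance, the safeguard on the number of squarings, and the path-graph example are illustrative assumptions):

```python
import numpy as np

def smallest_k(A, rho_max, eps, tol=1e-12, max_squarings=60):
    """Drive c(G)_{2^j} = max_i (Omega^{2^j})_ii / d_i toward its limit c(G)*
    by repeated squaring, then apply the bound
    k > {log(1 - rho_max) + log(eps) - log c(G)*} / log(rho_max) - 1."""
    d = A.sum(axis=1)
    Omega = A / d[:, None]
    P = Omega @ Omega                       # start from Omega^2
    c = np.max(np.diag(P) / d)
    for _ in range(max_squarings):
        P = P @ P                           # Omega^4, Omega^8, ...
        c_new = np.max(np.diag(P) / d)
        if abs(c_new - c) < tol:
            break
        c = c_new
    k = (np.log(1 - rho_max) + np.log(eps) - np.log(c)) / np.log(rho_max) - 1
    return int(np.ceil(k))

# Path-graph check: the k-term truncation should then be within eps of the truth.
n, rho, eps = 10, 0.9, 0.01
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
k = smallest_k(A, rho, eps)
print(k)
```

Each squaring doubles the power of Ω, so the sequence converges after only a handful of matrix products.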

Footnotes

SUPPLEMENTARY MATERIALS

R code: This file contains the R code used to apply copCAR to the Slovenia data. (copCAR.R)

Slovenia data: This file contains the Slovenia data. (Slovenia.R)

References

  1. Apanasovich T, Ruppert D, Lupton J, Popovic N, Turner N, Chapkin R, Carroll R. Aberrant crypt foci and semiparametric modeling of correlated binary data. Biometrics. 2008;64(2):490–500. doi: 10.1111/j.1541-0420.2007.00892.x.
  2. Assunção R, Krainski E. Neighborhood dependence in Bayesian spatial models. Biometrical Journal. 2009;51(5):851–869. doi: 10.1002/bimj.200900056.
  3. Banerjee S, Carlin B, Gelfand A. Hierarchical Modeling and Analysis for Spatial Data. Boca Raton: Chapman & Hall Ltd; 2004.
  4. Bárdossy A, Li J. Geostatistical interpolation using copulas. Water Resources Research. 2008;44(7):W07412.
  5. Besag J. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B (Methodological). 1974;36(2):192–236.
  6. Besag J, Kooperberg C. On conditional and intrinsic autoregression. Biometrika. 1995;82(4):733–746.
  7. Besag J, York J, Mollié A. Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics. 1991;43(1):1–20.
  8. Burgert C, Rüschendorf L. On the optimal risk allocation problem. Statistics & Decisions. 2006;24(1):153–171.
  9. Byrd R, Lu P, Nocedal J, Zhu C. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing. 1995;16(5):1190–1208.
  10. Clayton D, Bernardinelli L, Montomoli C. Spatial correlation in ecological analysis. International Journal of Epidemiology. 1993;22(6):1193–1202. doi: 10.1093/ije/22.6.1193.
  11. Demarta S, McNeil A. The t copula and related copulas. International Statistical Review. 2005;73(1):111–129.
  12. Denuit M, Lambert P. Constraints on concordance measures in bivariate discrete data. Journal of Multivariate Analysis. 2005;93(1):40–57.
  13. Diggle P, Tawn J, Moyeed R. Model-based geostatistics. Applied Statistics. 1998:299–350.
  14. Eddelbuettel D, Francois R. Rcpp: Seamless R and C++ integration. Journal of Statistical Software. 2011;40(8):1–18.
  15. Ferguson T. Mathematical Statistics: A Decision Theoretic Approach. New York: Academic Press; 1967.
  16. Flegal J, Haran M, Jones G. Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science. 2008;23(2):250–260.
  17. Furrer R, Sain SR. spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields. Journal of Statistical Software. 2010;36(10):1–25.
  18. Gelfand A, Vounatsou P. Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics. 2003;4(1):11. doi: 10.1093/biostatistics/4.1.11.
  19. Genest C, Neslehova J. A primer on copulas for count data. Astin Bulletin. 2007;37(2):475.
  20. Geyer C. On the convergence of Monte Carlo maximum likelihood calculations. Journal of the Royal Statistical Society, Series B (Methodological). 1994:261–274.
  21. Geyer CJ. Le Cam made simple: Asymptotics of maximum likelihood without the LLN or CLT or sample size going to infinity. In: Jones GL, Shen X, editors. Advances in Modern Statistical Theory and Applications: A Festschrift in Honor of Morris L. Eaton. Beachwood, Ohio, USA: Institute of Mathematical Statistics; 2013.
  22. Godambe V. An optimum property of regular maximum likelihood estimation. The Annals of Mathematical Statistics. 1960:1208–1211.
  23. Griffith DA. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization. Berlin: Springer; 2003.
  24. Haran M, Hodges J, Carlin B. Accelerating computation in Markov random field models for spatial data via structured MCMC. Journal of Computational and Graphical Statistics. 2003;12(2):249–264.
  25. Haran M, Tierney L. On automating Markov chain Monte Carlo for a class of spatial models. 2012. arXiv preprint arXiv:1205.0499.
  26. Heagerty P, Lumley T. Window subsampling of estimating functions with application to regression models. Journal of the American Statistical Association. 2000;95(449):197–211.
  27. Hoef JV, Jansen J. Space-time zero-inflated count models of harbor seals. Environmetrics. 2007;18(7):697–712.
  28. Horn RA, Johnson CR. Topics in Matrix Analysis. New York: Cambridge University Press; 1994.
  29. Hughes J, Haran M. Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society, Series B (Statistical Methodology). 2013;75(1):139–159.
  30. Hughes J, Haran M, Caragea PC. Autologistic models for binary data on a lattice. Environmetrics. 2011;22(7):857–871.
  31. Ihaka R, Gentleman R. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics. 1996;5:299–314.
  32. Iosifescu M. Finite Markov Processes and Their Applications. New York: Wiley; 1980.
  33. Ising E. Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik A Hadrons and Nuclei. 1925;31(1):253–258.
  34. Kaiser M, Cressie N. Modeling Poisson variables with positive spatial dependence. Statistics & Probability Letters. 1997;35(4):423–432.
  35. Kaiser M, Cressie N. The construction of multivariate distributions from Markov random fields. Journal of Multivariate Analysis. 2000;73(2):199–220.
  36. Kazianka H. Approximate copula-based estimation and prediction of discrete spatial data. Stochastic Environmental Research and Risk Assessment. 2013;27(8):2015–2026.
  37. Kazianka H, Pilz J. Copula-based geostatistical modeling of continuous and discrete data including covariates. Stochastic Environmental Research and Risk Assessment. 2010;24(5):661–673.
  38. Kelsall J, Wakefield J. Discussion of “Bayesian models for spatially correlated disease and exposure data,” by Best et al. In: Bernardo J, Berger J, Dawid A, Smith A, editors. Bayesian Statistics 6. New York: Oxford University Press; 1999.
  39. Kenkel B. pbivnorm: Vectorized Bivariate Normal CDF. 2012. R package version 0.5-1. URL: http://CRAN.R-project.org/package=pbivnorm.
  40. Kindermann R, Snell J. Markov Random Fields and Their Applications. Providence, RI: American Mathematical Society; 1980.
  41. Knorr-Held L, Rue H. On block updating in Markov random field models for disease mapping. Scandinavian Journal of Statistics. 2002;29(4):597–614.
  42. Kolev N, Paiva D. Copula-based regression models: A survey. Journal of Statistical Planning and Inference. 2009;139(11):3847–3856.
  43. Lindsay B. Composite likelihood methods. Contemporary Mathematics. 1988;80(1):221–239.
  44. Lunn D, Thomas A, Best N, Spiegelhalter D. WinBUGS – A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing. 2000;10(4):325–337.
  45. Madsen L. Maximum likelihood estimation of regression parameters with spatially dependent discrete data. JABES. 2009;14(4):375–391.
  46. Madsen L, Fang Y. Joint regression analysis for discrete longitudinal data. Biometrics. 2011;67(3):1171–1175. doi: 10.1111/j.1541-0420.2010.01494.x.
  47. Møller J. Perfect simulation of conditionally specified models. Journal of the Royal Statistical Society, Series B (Methodological). 1999;61:251–264.
  48. Moller J, Pettitt A, Reeves R, Berthelsen K. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika. 2006;93(2):451–458.
  49. Nelsen RB. An Introduction to Copulas. New York: Springer; 2006.
  50. Ng EG, Peyton BW. Block sparse Cholesky algorithms on advanced uniprocessor computers. SIAM Journal on Scientific Computing. 1993;14(5):1034–1056.
  51. Reich B, Hodges J, Zadnik V. Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics. 2006;62(4):1197–1206. doi: 10.1111/j.1541-0420.2006.00617.x.
  52. Rue H, Held L. Gaussian Markov Random Fields: Theory and Applications. Vol. 104 of Monographs on Statistics and Applied Probability. London: Chapman & Hall; 2005.
  53. Rüschendorf L. Stochastically ordered distributions and monotonicity of the OC-function of sequential probability ratio tests. Statistics. 1981;12(3):327–338.
  54. Rüschendorf L. On the distributional transform, Sklar’s theorem, and the empirical copula process. Journal of Statistical Planning and Inference. 2009;139(11):3921–3927.
  55. Sanderson C. Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments. Technical report, NICTA; 2010.
  56. Sklar A. Fonctions de répartition à n dimensions et leurs marges. Publ Inst Statist Univ Paris. 1959;8(1):11.
  57. Smith MS, Gan Q, Kohn RJ. Modelling dependence using skew t copulas: Bayesian inference and applications. Journal of Applied Econometrics. 2012;27(3):500–522.
  58. Tierney L, Rossini A, Li N. snow: A parallel computing framework for the R system. International Journal of Parallel Programming. 2009;37(1):78–90.
  59. Varin C. On composite marginal likelihoods. AStA Advances in Statistical Analysis. 2008;92(1):1–28.
  60. Varin C, Reid N, Firth D. An overview of composite likelihood methods. Statistica Sinica. 2011;21(1):5–42.
  61. Varin C, Vidoni P. Pairwise likelihood inference for general state space models. Econometric Reviews. 2009;28(1–3):170–185.
  62. Zadnik V, Reich B. Analysis of the relationship between socioeconomic factors and stomach cancer incidence in Slovenia. Neoplasma. 2006;53(2):103.
