Author manuscript; available in PMC: 2021 Aug 27.
Published before final editing as: Stat Med. 2020 Feb 27:10.1002/sim.8514. doi: 10.1002/sim.8514

Spatiotemporal signal detection using continuous shrinkage priors

An-Ting Jhuang 1, Montserrat Fuentes 2, Dipankar Bandyopadhyay 3, Brian J Reich 4
PMCID: PMC7561003  NIHMSID: NIHMS1634844  PMID: 32106341

Abstract

Periodontal disease (PD) is a chronic inflammatory disease that affects the gum tissue and bone supporting the teeth. Although tooth-site level PD progression is believed to be spatio-temporally referenced, the whole-mouth average periodontal pocket depth (PPD) has been commonly used as an indicator of the current/active status of PD. This leads to loss of information and imprecise parameter estimates. Despite the availability of statistical methods that accommodate spatiotemporal information for responses collected at the tooth-site level, the enormity of longitudinal databases derived from oral health practice-based settings renders them unscalable for application. To mitigate this, we introduce a Bayesian spatiotemporal model to detect problematic/diseased tooth-sites dynamically inside the mouth for any subject obtained from large databases. This is achieved via a spatial continuous sparsity-inducing shrinkage prior on spatially varying linear-trend regression coefficients. A low-rank representation captures the nonstationary covariance structure of the PPD outcomes, and facilitates the relevant Markov chain Monte Carlo computing steps applicable to thousands of study subjects. Application of our method to both simulated data and to a rich database of electronic dental records from the HealthPartners® Institute reveals improved prediction performance, compared with alternative models with usual Gaussian priors for regression parameters and conditionally autoregressive specification of the covariance structure.

Keywords: nonstationary covariance, periodontal disease, shrinkage priors, space-time disease surveillance

1. INTRODUCTION

Disease surveillance, which consists of an ongoing systematic collection, collation, analysis, and interpretation of data to establish patterns of a chronic disease progression leading to dissemination of informed action for its prevention and control,1 has remained an active area of epidemiological research. For example, oral health surveillance techniques2 are widely used to detect and prevent periodontal disease (PD), a chronic inflammatory disease that affects the gum tissues and bone supporting the teeth. In the United States, during 2009 and 2010, 47.2% of adults aged ≥ 30 years had some form of PD, affecting approximately 64.7 million people.3 Furthermore, PD is associated with a number of comorbid diseases, such as cancer,4 cardiovascular diseases,5 inflammatory bowel diseases,6 and so on, and hence can only increase the cost and disease burden if not properly detected.

Under various temporal settings (such as in clinical studies, or practice-based), oral health clinicians often use partial- or whole-mouth averages7 of tooth-site level periodontal pocket depth (PPD) to detect a subject’s active/current PD status8,9 at an observed time-point. The PPD, recorded in whole millimeters, is defined as the distance from the gingival margin to the epithelial attachment.10 In addition to the loss of information by taking averages, the corresponding surveillance tools developed mostly ignore the hypothesized spatial referencing11 of PD progression inside the mouth, given that the level of PD for a group of proximal sites can be different from that of sites located distally. The ability to detect specific regions (say, tooth-sites) in the mouth where PD is rapidly progressing compared with others can lead to quicker interventions with treatments (such as scaling and root planing), medication, and surgery. This early detection and flagging of anomalies can enhance coordination and control activities, especially for chronic PD.

It is believed that the incorporation of spatial information strengthens the power of surveillance, and can localize outbreaks of a disease or characterize variations in regional patterns.12 Methods for space-time disease surveillance can be broadly classified into the test-based and model-based approaches. The tools to test space-time interactions include Knox test,13 Mantel’s test,14 the k nearest neighbor test,15 and the computationally heavy cumulative sum methods16 for detecting change-points. An alternative popular method that detects disease clusters in space and time and provides more information than the space-time interaction tests is the scan statistic,17 that also inspired a Bayesian test to detect unusual temporal patterns in small area data.18 However, compared with the test-based methods, model-based approaches estimate disease risk, and thus provide better insight into etiology, spread, prediction, and control of the disease. Under Poisson assumptions for count data in specified time and areas, there exists a number of methods, such as the residual-based approach,19 exponentially weighted moving average,20 hidden Markov models,21 and so on. Recent surveillance techniques in this era of big data can now incorporate data from a variety of sources, such as electronic health records, mobile phone call records, geographically tagged tweets, and so on.22

Contrary to these geographically aggregated data methods, oral health surveillance for PD mostly considered subject-level exploration for a cross-sectional spatial setting under various complex scenarios, such as complex covariance,23,24 informative missingness,11 and non-Gaussian responses.25,26 While the whole-mouth average may be sufficient for studying population effects of systematic treatment, we focus on site-level modeling to quickly detect local changes to guide site-specific treatments. In this article, our motivation for developing short-term spatiotemporal PD surveillance comes from an observational database in a dental practice-based setting, maintained by the HealthPartners® (HP) Institute (henceforth, HP data) located in suburban Minneapolis, Minnesota. Although Bayesian methods exist under nonstationary space-time dependence assumptions,27 their computational scalability in light of the enormity of the HP database is questionable. In addition, the assumed spatial association inside a mouth can be nonstationary, because the association among posteriorly located tooth-sites in the molars can be different from that in the anterior incisors. We set out to address this problem via a Bayesian spatiotemporal proposal that can detect problematic sites in the mouth via the spatial horseshoe (SHS)28 prior on the site-specific linear-trend coefficients.

Bayesian sparsity-inducing regressions can be broadly classified into two categories: (a) the (discrete) mixture “spike-and-slab” priors,29 which place a point mass at zero and an absolutely continuous prior on the remaining nonzero elements of the parameter vector, and (b) the (continuous) shrinkage priors30 with absolutely continuous shrinkage on the entire parameter vector. Although the spike-and-slab models are theoretically attractive, the discrete indicators there give rise to poor mixing and slow convergence, often complicating the full exploration of the posterior via Markov chain Monte Carlo (MCMC) techniques.31-33 On the other hand, the “global-local” shrinkage prior models are computationally elegant as they model the posterior inclusion probabilities directly, thereby adjusting to sparsity via global shrinkage, and identifying signals via local shrinkage.34,35

As the first spatial continuous shrinkage prior, our SHS proposal28 reflects the (realistic) prior belief that there are usually only a few unhealthy sites in one’s mouth during a short period of time, while simultaneously incorporating spatial dependence in the signal at nearby observations. That article explores some of its theoretical properties, such as high concentration around zero for sparsity, and heavy tails to avoid excessive shrinkage. In this article, we extend this approach to the multisubject spatiotemporal setting, and develop a low-rank representation to capture the nonstationary spatial covariance structure of the HP data with reduced computing time.

The rest of the article proceeds as follows. We describe the motivating HP dataset in Section 2. In Section 3, we introduce our spatiotemporal model for sparse signal detection, and the low-rank representation. We present the Bayesian inferential setup through prior specifications, and related MCMC-based computing details in Section 4. In Section 5, we apply our method to the HP dataset, and summarize our findings. We conduct a simulation study to evaluate the prediction performance of the proposed model in Section 6. Finally, we conclude with a brief discussion in Section 7.

2. MOTIVATING HP DATA

The longitudinal HP dataset consists of information on periodontal health collected for 25 763 subjects from routine dental practices located in suburban Minneapolis. The study period we selected was 2007 to 2014, and the subjects were at least 18-years-old as of January 1, 2007. PPD was measured at six prespecified sites for each tooth, excluding the wisdom teeth (third molars), of each subject via a periodontal probe, and recorded as integer values (in mm). Figure 1 illustrates the tooth numbering system and measured locations for each tooth, excluding wisdom teeth, that is, tooth number 1, 16, 17, and 32. Maxillary (upper) and mandibular (lower) are the jaw indicators, while buccal and lingual represent the cheek-side, and the side closest to the tongue, respectively. The left side of the plots in this article represents the right side of the individual, and vice versa.

FIGURE 1.


Mean and SD of periodontal pocket depth (PPD) in millimeters across all subjects and their visits during the first 2 years. Tooth numbering system is described in numbers and texts for the 28 teeth (2–15 and 18–31), excluding the four third molars (wisdom teeth). There are six locations measured for each tooth, numbered from the buccal to the lingual surface, and the mesial to the distal surface. Note that increments of 0.5 mm reflect the smallest clinically meaningful increment

The objective of our analysis is to flag unhealthy sites at an early stage before severe PD progression. Hence, we restricted our analysis to data collected only during the first 2 years for each subject. This short period also makes the assumption of a linear change of PPD in time reasonable. Although each examining oral clinician provided a recommended follow-up time for periodontal checkups at each time point for each subject, there is no reason to believe that subjects abide by those recommendations in this practice-based setting. This leads to a longitudinal database with irregular observation times, and we exclude subjects with fewer than four visits in the first 2 years. This leaves 7279 subjects satisfying the above conditions, with the number of visits ranging from 4 to 8. Figure 1 presents the mean and SD of the recorded PPD for these subjects. The average ranged between 1.1 and 3.0 mm, with higher values in the posteriorly located sites (molars) than the anterior, confirming previous findings.36 The variation of PPD exhibits a similar pattern, with the SD ranging from 0.7 to 1.8 mm. Although the database is not publicly available, it can be requested via a relevant data use agreement with the HP.

3. MODEL DESCRIPTION

3.1. Spatiotemporal model

Denote $y_{ijk}$ as the recorded PPD in millimeters for subject $i = 1, \ldots, n_p$ at visit $j = 1, \ldots, n_{v_i}$ and at site $k = 1, \ldots, n_s$, where $n_p = 7279$ is the number of subjects, $n_s = 168$ is the number of sites, and $n_{v_i}$ is the number of visits for subject $i$. With PPD recorded as an integer, we define $y_{ijk}^*$ as a latent variable for subject $i$ at site $k$ for the $j$th visit, related to the observed PPD as $y_{ijk} = \max\{[y_{ijk}^*], 0\}$, where $[x]$ is the nearest integer to $x$. We use the data at the visits in the first 2 years, and assume a latent linear trend of PPD in time. We specify the complete data model for subject $i$ as:

$$y_{ijk}^* = \alpha_{ik} + \beta_{ik} t_{ij} + \varepsilon_{ijk}, \tag{1}$$

where $\alpha_{ik}$ is the baseline PPD at site $k$, $\beta_{ik}$ is the slope at site $k$, $t_{ij}$ is the years since baseline for the $j$th visit, and $\varepsilon_{ijk}$ is the random error. We account for missing observations using standard Bayesian missing data methods, assuming the data are missing at random (MAR).37
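As a minimal sketch of the rounding link and the latent linear trend in Equation (1), the following Python functions map a latent value to a recorded integer PPD. The function names are hypothetical, and rounding ties upward is an assumption, since the text only says "nearest integer".

```python
import math

def latent_ppd(alpha: float, beta: float, t: float, eps: float = 0.0) -> float:
    """Latent PPD from Equation (1): baseline + slope * years since baseline + error."""
    return alpha + beta * t + eps

def observe_ppd(y_star: float) -> int:
    """Recorded PPD: nearest integer to the latent value, floored at zero."""
    nearest = math.floor(y_star + 0.5)  # ties round up (an assumption)
    return max(nearest, 0)

# A site with baseline 2.0 mm and slope 0.5 mm/year, seen 1.5 years after baseline.
recorded = observe_ppd(latent_ppd(2.0, 0.5, 1.5))
```

Negative latent values are recorded as 0, which is why the latent scale is unbounded while the recorded scale is not.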

To capture the prior belief that only a few sites may have deteriorated during the study period, we assume that the slope $\beta_{ik}$ marginally follows a horseshoe prior.30 The horseshoe prior can be written hierarchically as

$$\beta_{ik} \mid \lambda_{ik} \sim N(0, \lambda_{ik}^2), \qquad \lambda_{ik} \sim C^+(0, 1), \tag{2}$$

where $\lambda_{ik}$ is the prior SD, and follows the standard half-Cauchy distribution on the positive reals. Marginally over $\lambda_{ik}$, the prior for $\beta_{ik}$ has a mass concentration near zero with heavy tails. The shape of the density shrinks null signals toward zero and avoids shrinking the true signals. This property facilitates separating signals from the noise.
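The two properties of the horseshoe prior (mass near zero, heavy tails) can be checked by simulation. The sketch below draws from the hierarchy in Equation (2) and compares it with a standard normal; the cutoffs 0.1 and 3, the sample size, and the seed are arbitrary illustrative choices.

```python
import math
import random

def sample_horseshoe(rng: random.Random) -> float:
    """One draw of beta: lambda ~ C+(0, 1), then beta | lambda ~ N(0, lambda^2)."""
    lam = abs(math.tan(math.pi * (rng.random() - 0.5)))  # |Cauchy| is half-Cauchy
    return rng.gauss(0.0, lam)

def fraction(xs, predicate) -> float:
    """Share of draws satisfying a condition."""
    return sum(1 for x in xs if predicate(x)) / len(xs)

rng = random.Random(1)
n = 20000
horseshoe = [sample_horseshoe(rng) for _ in range(n)]
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]

near_hs = fraction(horseshoe, lambda x: abs(x) < 0.1)  # mass near zero
near_gs = fraction(gaussian, lambda x: abs(x) < 0.1)
tail_hs = fraction(horseshoe, lambda x: abs(x) > 3.0)  # mass in the tails
tail_gs = fraction(gaussian, lambda x: abs(x) > 3.0)
```

Under these settings the horseshoe draws should show both a larger share near zero and a larger share in the tails than the Gaussian draws.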

Define the vector of slopes for subject $i$ as $\beta_i = (\beta_{i1}, \ldots, \beta_{in_s})^T$. To incorporate spatial dependence into a multivariate horseshoe prior, we propose to set $\beta_{ik} = \lambda_{ik} \zeta_{ik}$, where $\lambda_{ik}$ is a shrinkage parameter with half-Cauchy prior and $\zeta_{ik}$ is normal. Spatial shrinkage is induced by the spatial process model for $\lambda_i = (\lambda_{i1}, \ldots, \lambda_{in_s})^T$. We propose a Gaussian copula model38 that preserves the marginal half-Cauchy distribution as in Equation (2), and captures spatial dependence, such that

$$\lambda_{ik} = f(\delta_{ik}), \tag{3}$$

where $\delta_{ik}$ is the $k$th variable of the latent process $\delta_i = (\delta_{i1}, \ldots, \delta_{in_s})^T$, $f(\cdot) = F_{C^+}^{-1}[\Phi(\cdot)]$ is the half-Cauchy link function, $F_{C^+}^{-1}(\cdot)$ is the inverse cumulative distribution function of the half-Cauchy distribution, and $\Phi(\cdot)$ is the standard normal cumulative distribution function. The model can be expressed as

$$y_{ij}^* = \alpha_i + t_{ij}\, f(\delta_i) \odot \zeta_i + \varepsilon_{ij}, \tag{4}$$

where $y_{ij}^* = (y_{ij1}^*, \ldots, y_{ijn_s}^*)^T$ is the vector of latent variables for subject $i$ at the $j$th visit, $\alpha_i = (\alpha_{i1}, \ldots, \alpha_{in_s})^T$ is the vector of baseline PPD for subject $i$, the operator $\odot$ defines the pointwise vector product, $\delta_i$ is the spatial latent vector, $\zeta_i = (\zeta_{i1}, \ldots, \zeta_{in_s})^T$ is the normal vector, and $\varepsilon_{ij} = (\varepsilon_{ij1}, \ldots, \varepsilon_{ijn_s})^T$ is the vector of random error.
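The half-Cauchy link in Equation (3) has a closed form, because the standard half-Cauchy distribution function is $(2/\pi)\arctan(x)$ and its inverse is $\tan(\pi u/2)$. A minimal sketch, with hypothetical function names, is:

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, provides Phi via .cdf

def half_cauchy_quantile(u: float) -> float:
    """Inverse CDF of the standard half-Cauchy distribution."""
    return math.tan(math.pi * u / 2.0)

def link(delta: float) -> float:
    """f(delta) = F^{-1}_{C+}[Phi(delta)]: a standard normal input yields a half-Cauchy output."""
    return half_cauchy_quantile(_STD_NORMAL.cdf(delta))

# The link is increasing, so nearby latent values give nearby shrinkage parameters.
lambdas = [link(d) for d in (-1.0, 0.0, 1.0)]
```

Because the link is monotone, spatial correlation in the latent $\delta_{ik}$ carries over to the shrinkage parameters $\lambda_{ik}$ while each keeps its half-Cauchy margin, provided $\delta_{ik}$ is standard normal.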

3.2. Low-rank representation

We use a low-rank representation to capture the complex spatial dependence of the PPD responses, and to facilitate computing for the vectors $\alpha_i$, $\delta_i$, and $\zeta_i$. Let $Q$ be an $n_s \times L$ basis function matrix that determines the covariance of PPD among sites. We set $\alpha_i = Q a_i$, $\delta_i = Q d_i$, and $\zeta_i = Q z_i$, such that the model becomes

$$y_{ij}^* = Q a_i + t_{ij}\, f(Q d_i) \odot Q z_i + \varepsilon_{ij}, \tag{5}$$

where $a_i = (a_{i1}, \ldots, a_{iL})^T$ is the vector related to baseline PPD for subject $i$, $d_i = (d_{i1}, \ldots, d_{iL})^T$ is the vector related to the spatial latent vector, and $z_i = (z_{i1}, \ldots, z_{iL})^T$ is the vector related to the slope.
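To make the structure of Equation (5) concrete, the sketch below evaluates the latent mean $Q a_i + t_{ij} f(Q d_i) \odot Q z_i$ for a toy basis. The 3 × 2 matrix `Q` and all coefficient values are hypothetical and chosen only so the arithmetic is easy to follow; the unscaled link of Equation (3) is used.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def link(delta: float) -> float:
    """Half-Cauchy link f(delta) = tan(pi * Phi(delta) / 2)."""
    return math.tan(math.pi * _STD_NORMAL.cdf(delta) / 2.0)

def matvec(Q, v):
    """Multiply an n_s x L matrix (list of rows) by a length-L vector."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def latent_mean(Q, a, d, z, t):
    """Site-level mean of y* at time t under the low-rank model."""
    alpha = matvec(Q, a)                    # baseline PPD per site
    lam = [link(x) for x in matvec(Q, d)]   # shrinkage parameter per site
    zeta = matvec(Q, z)                     # normal component per site
    return [al + t * l * ze for al, l, ze in zip(alpha, lam, zeta)]

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # toy basis: 3 sites, L = 2
mu = latent_mean(Q, a=[1.0, 2.0], d=[0.0, 0.0], z=[0.5, -1.0], t=2.0)
```

With `d` set to zero every site has $\lambda = 1$, so the slopes reduce to $Qz$ and the mean at $t = 2$ is approximately `[2, 0, 2]`.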

We use principal component analysis (PCA)39 to form the basis function matrix $Q$. PCA does not merely increase computational efficiency, but also provides an interpretable decomposition of our data. Denote the total visit times as $N_v = \sum_{i=1}^{n_p} n_{v_i}$ and $S$ as the $n_s \times n_s$ sample covariance matrix of the $N_v$ response vectors $y_{ij} = (y_{ij1}, \ldots, y_{ijn_s})^T$ for $i = 1, \ldots, n_p$ and $j = 1, \ldots, n_{v_i}$. The eigendecomposition of $S$ is $S = \tilde{Q} \tilde{D} \tilde{Q}^T$, where $\tilde{Q}$ is the matrix of ordered eigenvectors with $q_{(i)}$ in the $i$th column, and $\tilde{D}$ is the diagonal matrix with the ordered eigenvalues $\tilde{d}_{(1)} \ge \cdots \ge \tilde{d}_{(n_s)}$. We take the basis matrix $Q$ to be the first $L$ columns of $\tilde{Q}$. The choice of $L$ depends on the proportion of explained variation, $\sum_{l=1}^{L} \tilde{d}_{(l)} / \sum_{k=1}^{n_s} \tilde{d}_{(k)}$.

In this article, we assume the vectors $\alpha_i$ and $\beta_i$ are both expanded using the same basis function matrix $Q$. However, it is possible to have a different basis for different model components. For example, one option is to perform PCA on the sample covariances of least squares estimates of $\alpha_i$ and $\beta_i$, and use the resulting eigenvectors as the basis functions for $\alpha_i$ and $\beta_i$, respectively.
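The rank selection rule in this subsection (keep the smallest number of leading eigenvectors whose eigenvalues reach a target share of the total) can be sketched as follows. The function name and the toy eigenvalues are hypothetical; the eigenvalues themselves would come from the eigendecomposition of the sample covariance.

```python
def choose_rank(eigenvalues, target: float) -> int:
    """Smallest L such that the top-L eigenvalues explain at least `target` of the variation."""
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    total = sum(ordered)
    running = 0.0
    for rank, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return rank
    return len(ordered)

# Toy spectrum: cumulative shares are 0.4, 0.7, 0.9, 1.0.
rank_60 = choose_rank([4.0, 3.0, 2.0, 1.0], 0.60)
rank_95 = choose_rank([4.0, 3.0, 2.0, 1.0], 0.95)
```

Raising the target from 60% to 95% moves the rank from 2 to 4 in this toy spectrum, which mirrors the trade-off between a compact basis and a faithful one.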

4. BAYESIAN INFERENCE

4.1. Prior specification

We select multivariate normal priors for the vectors related to the baseline PPD $a_i$, the slope $z_i$, and the latent $d_i$, such that $a_i \sim N(\mu_a, \sigma_{ai}^2 \Sigma_a)$, $d_i \sim N(0, \sigma_{di}^2 \Sigma_d)$, and $z_i \sim N(0, \gamma_i^2 \Sigma_z)$. The mean of $a_i$ is nonzero to capture the overall mean spatial trend, and is assigned a noninformative prior, $\mu_a \sim N(0, 100^2 I_L)$. The priors for the variance parameters $\sigma_{ai}^2$, $\sigma_{di}^2$, and $\gamma_i^2$ are the uninformative inverse gamma distribution, IG(0.1, 0.1). The random error $\varepsilon_{ijk}$ follows an independent and identical normal prior with zero mean and variance $\sigma_\varepsilon^2$, with an uninformative inverse gamma hyperprior IG(0.1, 0.1) for $\sigma_\varepsilon^2$.

We select inverse Wishart priors for the covariance matrices $\Sigma_a$ and $\Sigma_z$. The covariance of the intercepts and slopes across subjects may not be the same as the sample covariances, and our model allows for this through the inverse Wishart priors for $\Sigma_a$ and $\Sigma_z$. We hope that the basis matrix $Q$ captures the main features, and allow $\Sigma_a$ to specify the best covariance in the span of $Q$. If $Q$ is full rank, this model spans all possible covariance matrices for $a_i$ and $z_i$, and is thus a flexible model in this limiting sense. The slopes $\beta_i$ are the product of the two terms, $f(\delta_i)$ and $\zeta_i$, which can pose difficulty in estimating the scale of both $\delta_i$ and $\zeta_i$. We therefore fix $\Sigma_d$ at $D$, the diagonal matrix with the first $L$ eigenvalues $\tilde{d}_{(1)}, \ldots, \tilde{d}_{(L)}$. Furthermore, to preserve the half-Cauchy marginal distribution for $\lambda_i$, we modify the link function to be $f(\delta_{ik}) = F_{C^+}^{-1}[\Phi(\delta_{ik}/w_k)]$, where $w_k$ is the $k$th diagonal element of $Q D Q^T$. This produces an identifiable model that is still quite flexible, with the slope process $\beta_i$ nonstationary such that $\mathrm{Cov}(\beta_{ik}, \beta_{ik'}) = \lambda_{ik} \lambda_{ik'} \mathrm{Cov}(\zeta_{ik}, \zeta_{ik'})$ depends on the two sites $k$ and $k'$ for $k, k' = 1, \ldots, n_s$ and $k \ne k'$.

In summary, we formulate the priors

$$\begin{aligned}
a_i \mid \mu_a, \Sigma_a &\sim N(\mu_a, \sigma_{ai}^2 \Sigma_a), \\
d_i \mid \Sigma_d &\sim N(0, D), \\
z_i \mid \Sigma_z &\sim N(0, \gamma_i^2 \Sigma_z), \\
\mu_a &\sim N(0, 100^2 I_L), \\
\sigma_{ai}^2, \gamma_i^2, \sigma_\varepsilon^2 &\sim \mathrm{IG}(0.1, 0.1), \\
\Sigma_a^{-1}, \Sigma_z^{-1} &\sim \mathrm{Wishart}(L, D^{-1}),
\end{aligned} \tag{6}$$

where $L$ is the degrees of freedom.

4.2. Computing details

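We perform MCMC sampling using R. We implement blocked Metropolis-Hastings (MH) sampling40 for the vectors $d_i$ and $z_i$. The full conditional distributions for these parameters are

$$\begin{aligned}
P(d_i \mid \cdot) &\propto \Big[\prod_{j=1}^{n_{v_i}} P(y_{ij}^* \mid d_i)\Big] \times P(d_i) \\
&\propto \exp\Big[-\frac{1}{2} \sum_{j=1}^{n_{v_i}} (y_{ij}^* - \mu_{ij})^T (\sigma_\varepsilon^2 I)^{-1} (y_{ij}^* - \mu_{ij}) - \frac{1}{2} d_i^T D^{-1} d_i\Big], \text{ and} \\
P(z_i \mid \cdot) &\propto \Big[\prod_{j=1}^{n_{v_i}} P(y_{ij}^* \mid z_i)\Big] \times P(z_i) \\
&\propto \exp\Big[-\frac{1}{2} \sum_{j=1}^{n_{v_i}} (y_{ij}^* - \mu_{ij})^T (\sigma_\varepsilon^2 I)^{-1} (y_{ij}^* - \mu_{ij}) - \frac{1}{2\gamma_i^2} z_i^T \Sigma_z^{-1} z_i\Big],
\end{aligned} \tag{7}$$

where the latent mean vector is $\mu_{ij} = Q a_i + t_{ij}\, f(Q d_i) \odot Q z_i$. We use Gaussian candidate distributions $N(0, D)$ and $N(0, \gamma_i^2 \Sigma_z)$. We tune the blocked MH algorithm for $d_i$ and $z_i$ via $D$ and $\Sigma_z$ to attain an acceptance probability near 40%. We monitor convergence using trace plots of several representative parameters.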
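The mechanics of a blocked MH update (propose the whole vector at once, then accept or reject it as a unit) can be illustrated on a toy target. This is a generic sketch and not the authors' sampler: the random-walk form of the proposal, the standard normal toy target, and the step size are all assumptions made for illustration.

```python
import math
import random

def log_post(x):
    """Toy log posterior (an assumption): independent standard normals."""
    return -0.5 * sum(v * v for v in x)

def blocked_mh(log_density, start, step_sd, n_iter, rng):
    """Random-walk MH that updates every coordinate of the block in one proposal."""
    current = list(start)
    current_lp = log_density(current)
    draws, n_accept = [], 0
    for _ in range(n_iter):
        candidate = [v + rng.gauss(0.0, step_sd) for v in current]
        candidate_lp = log_density(candidate)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        accept_prob = math.exp(min(0.0, candidate_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = candidate, candidate_lp
            n_accept += 1
        draws.append(list(current))
    return draws, n_accept / n_iter

rng = random.Random(0)
draws, accept_rate = blocked_mh(log_post, [0.0, 0.0], 1.0, 5000, rng)
first_coord_mean = sum(d[0] for d in draws) / len(draws)
```

In practice `step_sd` (or a proposal covariance) would be adjusted until `accept_rate` is near the desired level, which is the tuning step described above.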

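Gibbs sampling is used for the remaining parameters: the vectors $y_{ij}^*$, $a_i$, $\mu_a$, the parameters $\sigma_{ai}^2$, $\gamma_i^2$, $\sigma_\varepsilon^2$, and the inverse covariance matrices $\Sigma_a^{-1}$, $\Sigma_z^{-1}$. Given the priors in Equation (6), the full conditional distributions used for the Gibbs updates are given below. Define the latent mean as $\mu_{ijk} = \alpha_{ik} + t_{ij} \beta_{ik}$. The latent PPD for subject $i$ at visit $j$ and site $k$, $y_{ijk}^*$, follows a truncated normal distribution with mean $\mu_{ijk}$, variance $\sigma_\varepsilon^2$, lower bound $\min\{0, y_{ijk} - 0.5\}$ and upper bound $y_{ijk} + 0.5$, that is,

$$y_{ijk}^* \mid y_{ijk}, \mu_{ijk}, \sigma_\varepsilon^2 \sim \mathrm{TN}[\mu_{ijk}, \sigma_\varepsilon^2, \min(0, y_{ijk} - 0.5), y_{ijk} + 0.5], \tag{8}$$

where $\mathrm{TN}(\mu, \sigma^2, l, u)$ is the truncated normal density with the mean $\mu$, the variance $\sigma^2$, the lower bound $l$ and the upper bound $u$. The low-rank vector of baseline PPD $a_i$ follows a multivariate normal posterior distribution,

$$a_i \sim N\Big\{ W_{ai} Q^T \Big[ \frac{1}{\sigma_\varepsilon^2} \sum_{j=1}^{n_{v_i}} (y_{ij}^* - t_{ij} \beta_i) + \frac{1}{\sigma_{ai}^2} \Sigma_a^{-1} \mu_a \Big], W_{ai} \Big\}, \tag{9}$$

where $W_{ai} = \big( \frac{1}{\sigma_{ai}^2} \Sigma_a^{-1} + \frac{n_{v_i}}{\sigma_\varepsilon^2} Q^T Q \big)^{-1}$. In the next layer, the mean $\mu_a$, the scale parameters $\sigma_{ai}^2$, $\gamma_i^2$, $\sigma_\varepsilon^2$, and the scale matrices $\Sigma_a^{-1}$, $\Sigma_z^{-1}$ all have full conditionals as below. Denote the total visit times as $N_v = \sum_{i=1}^{n_p} n_{v_i}$.

$$\begin{aligned}
\mu_a &\sim N\Big[ \Big( \sum_{i=1}^{n_p} \frac{1}{\sigma_{ai}^2} \Sigma_a^{-1} + \frac{1}{100^2} I \Big)^{-1} \Big( \sum_{i=1}^{n_p} \frac{1}{\sigma_{ai}^2} \Sigma_a^{-1} a_i \Big), \Big( \sum_{i=1}^{n_p} \frac{1}{\sigma_{ai}^2} \Sigma_a^{-1} + \frac{1}{100^2} I \Big)^{-1} \Big], \\
\sigma_{ai}^2 &\sim \mathrm{IG}\Big[ 0.1 + \frac{1}{2},\; 0.1 + \frac{1}{2} (a_i - \mu_a)^T \Sigma_a^{-1} (a_i - \mu_a) \Big], \\
\gamma_i^2 &\sim \mathrm{IG}\Big( 0.1 + \frac{1}{2},\; 0.1 + \frac{1}{2} z_i^T \Sigma_z^{-1} z_i \Big), \\
\sigma_\varepsilon^2 &\sim \mathrm{IG}\Big[ 0.1 + \frac{N_v n_s}{2},\; 0.1 + \frac{1}{2} \sum_{i=1}^{n_p} \sum_{j=1}^{n_{v_i}} \sum_{k=1}^{n_s} (y_{ijk}^* - \mu_{ijk})^2 \Big], \\
\Sigma_a^{-1} &\sim \mathrm{Wishart}\Big\{ n_p + L,\; \Big[ \sum_{i=1}^{n_p} \frac{1}{\sigma_{ai}^2} (a_i - \mu_a)(a_i - \mu_a)^T + D^{-1} \Big]^{-1} \Big\}, \\
\Sigma_z^{-1} &\sim \mathrm{Wishart}\Big[ n_p + L,\; \Big( \sum_{i=1}^{n_p} \frac{1}{\gamma_i^2} z_i z_i^T + D^{-1} \Big)^{-1} \Big].
\end{aligned} \tag{10}$$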
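Two of the Gibbs updates above are simple enough to sketch with the Python standard library: a truncated normal draw by inverse-CDF sampling, and the inverse gamma update for the error variance. The function names are hypothetical, the truncation bounds are passed in by the caller, and the numeric inputs are illustrative only.

```python
import random
from statistics import NormalDist

def sample_truncated_normal(mu, sigma, lower, upper, rng):
    """Inverse-CDF draw from N(mu, sigma^2) restricted to [lower, upper]."""
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    u = p_lo + rng.random() * (p_hi - p_lo)   # uniform on [Phi(lower), Phi(upper))
    u = min(max(u, 1e-12), 1.0 - 1e-12)       # keep inv_cdf strictly inside (0, 1)
    return min(max(dist.inv_cdf(u), lower), upper)

def error_variance_params(residuals, shape0=0.1, rate0=0.1):
    """Shape and rate of the inverse gamma full conditional for the error variance."""
    shape = shape0 + len(residuals) / 2.0
    rate = rate0 + 0.5 * sum(r * r for r in residuals)
    return shape, rate

def sample_error_variance(residuals, rng):
    """If G ~ Gamma(shape, rate) then 1/G ~ IG(shape, rate); gammavariate takes a scale."""
    shape, rate = error_variance_params(residuals)
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

rng = random.Random(2)
latent = [sample_truncated_normal(2.2, 0.8, 1.5, 2.5, rng) for _ in range(1000)]
shape, rate = error_variance_params([1.0, -1.0, 2.0, 0.0])
variance_draw = sample_error_variance([1.0, -1.0, 2.0, 0.0], rng)
```

Here `residuals` stands for the collection of $y_{ijk}^* - \mu_{ijk}$ values, so the shape grows with the number of observations and the rate with their squared sum, matching the form of the $\sigma_\varepsilon^2$ update in Equation (10).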

We generate 10 000 samples and discard the first 2000 as burn-in for data analysis in Section 5.

5. APPLICATION: HP DATA

In this section, we apply the proposed model in Section 3 to the HP data described in Section 2.

5.1. Model comparisons

We fit the model to the visits during the first 2 years for all 7279 subjects simultaneously, and evaluate the prediction of PPD at the next visit for each subject. We compare models with varying flexibility of shrinkage across space and different covariances. We consider two priors (Gaussian and SHS) for the slopes $\beta_i$, and two covariances (the sample covariance and the conditional autoregressive (CAR) covariance41) across space, via the basis function matrix $Q$. The Gaussian $\beta_i$ has a constant shrinkage parameter across space, that is, $\lambda_{ik} = 1$ for all subjects $i = 1, \ldots, n_p$ and sites $k = 1, \ldots, n_s$. By contrast, the SHS prior for the slopes allows spatially varying shrinkage, $\beta_i = f(\delta_i) \odot \zeta_i$. Regarding the basis function matrix $Q$, we consider a low-rank representation of the sample covariance, or of the CAR covariance41 with first-order neighbors. Here, a site neighbors the one or two sites on the same buccal/lingual side of the same tooth on the same side of the same jaw, the site on the tooth’s opposite buccal/lingual side, and the site directly above/below on the opposite jaw. Therefore, the four most posterior sites in the buccal side have two neighbors, the other sites in the buccal side have three, and all others have four. Consider the site at location 5 of tooth 15 in Figure 1 as an example. Its four neighbors are locations 4 and 6 on the same lingual side of tooth 15, location 2 on tooth 15’s buccal side, and location 5 of tooth 18 directly below on the opposite jaw. The CAR covariance is proportional to $(M - \rho A)^{-1}$, where $M$ is the diagonal matrix with the elements $m_1, \ldots, m_{n_s}$ indicating the number of neighbors for sites $1, \ldots, n_s$, $\rho$ is the spatial dependence parameter, and $A$ is the adjacency matrix, with $A_{ij} = 1$ if sites $i$ and $j$ are neighbors and $A_{ij} = 0$ otherwise. The spatial dependence parameter $\rho$ does not quantify the correlation between neighbors; however, correlations generally increase with $\rho$. We set $\rho = 0.99$, which gives moderate spatial dependence.42 The numbers of eigenvectors, $L = 11$ and $L = 53$, in the basis function matrix $Q$ for the sample and CAR covariance are chosen for 70% and 90% explained variation in the sample covariance, respectively. We also compared $\rho = 0.5$ and $\rho = 0.9$, and found no substantial improvement.
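The statement that $\rho$ is not itself the neighbor correlation, but that correlations increase with it, can be checked numerically on a small graph. The sketch below builds $M - \rho A$ for a hypothetical chain of four sites (not the dental neighborhood structure), inverts it with a small Gauss-Jordan routine, and reads off the correlation between the first two sites.

```python
import math

def car_precision(neighbors, rho):
    """M - rho * A, where neighbors[i] lists the sites adjacent to site i."""
    n = len(neighbors)
    P = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        P[i][i] = float(len(nbrs))   # m_i: number of neighbors of site i
        for j in nbrs:
            P[i][j] = -rho
    return P

def invert(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(A)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def neighbor_correlation(neighbors, rho, i, j):
    """Correlation between sites i and j implied by covariance (M - rho * A)^{-1}."""
    S = invert(car_precision(neighbors, rho))
    return S[i][j] / math.sqrt(S[i][i] * S[j][j])

chain = [[1], [0, 2], [1, 3], [2]]   # hypothetical 4-site chain
corr_low = neighbor_correlation(chain, 0.5, 0, 1)
corr_high = neighbor_correlation(chain, 0.99, 0, 1)
```

For this chain the implied correlation between the first two sites is roughly 0.37 at $\rho = 0.5$ and roughly 0.97 at $\rho = 0.99$, so it rises with $\rho$ without being equal to it.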

Table 1 presents the prediction results, based on 100 MCMC iterations. For both L = 11 and L = 53 basis functions, the SHS model with the low-rank representation of the sample covariance produces the smallest predicted mean squared error (MSE) for the observed yijk. Compared with L = 11, the MSE is smaller with L = 53 for all models, and with L = 53 the MSE of the SHS model based on the sample covariance is roughly half the MSE of the Gaussian CAR model. Coverage is close to the nominal level 95% for all models. Using a Dell Optiplex 9020 computer with 64-Bit Windows 10, Intel i7–4790 3.6 GHz processor and 32 GB RAM, the computing times (in minutes) for the Gaussian (SHS) models are approximately 17 (23) and 34 (46), for L = 11 and 53, respectively.

TABLE 1.

HP data analysis results

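| Statistic | Model | Covariance | Estimate (L = 11) | SE (L = 11) | Estimate (L = 53) | SE (L = 53) |
|---|---|---|---|---|---|---|
| 100 × MSE | Gaussian | Sample | 89.57 | 1.23 | 54.82 | 0.87 |
| 100 × MSE | Gaussian | CAR | 101.31 | 1.37 | 84.08 | 0.91 |
| 100 × MSE | SHS | Sample | 83.85 | 2.02 | 43.85 | 0.99 |
| 100 × MSE | SHS | CAR | 94.36 | 1.44 | 60.11 | 1.02 |
| Coverage (%) | Gaussian | Sample | 93.72 | 0.12 | 94.79 | 0.13 |
| Coverage (%) | Gaussian | CAR | 93.86 | 0.11 | 96.15 | 0.08 |
| Coverage (%) | SHS | Sample | 93.64 | 0.12 | 94.31 | 0.15 |
| Coverage (%) | SHS | CAR | 94.10 | 0.11 | 95.40 | 0.12 |
| Computing time (min) | Gaussian | Sample | 16.89 | | 34.72 | |
| Computing time (min) | Gaussian | CAR | 17.01 | | 34.26 | |
| Computing time (min) | SHS | Sample | 22.63 | | 45.57 | |
| Computing time (min) | SHS | CAR | 23.90 | | 47.13 | |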

Note: Comparison of prediction accuracy between the Gaussian and spatial horseshoe (SHS) models using the low-rank representation of the sample covariance and conditional autoregressive (CAR) covariance, with the number of basis functions L = 11, 53. Methods are compared using mean squared error (MSE), coverage %, and computing time (in minutes) for 100 MCMC iterations.

5.2. Interpreting eigenvectors

Figures 2 and 3 illustrate the first to fourth and the fifth to eighth eigenvectors of the sample covariance, respectively. We interpret the first eigenvector as the overall mean of PPD; the second as a weighted average of PPD with more emphasis on the posterior teeth; the third puts more weight on the teeth in the mandibular side (ie, lower jaw); and the fourth puts more weight on the posterior teeth but more anterior part than the second eigenvector. The other four eigenvectors in Figure 3 exhibit several local features.

FIGURE 2.


The first four eigenvectors of the sample covariance

FIGURE 3.


The fifth to eighth eigenvectors of the sample covariance

Similarly, Figures A1 and A2 (in Appendix A1) present the first to fourth eigenvectors, and the fifth to seventh and twelfth eigenvectors, of the CAR covariance in a tooth map, respectively. The first 11 eigenvectors of the CAR covariance change horizontally, from the posterior to the anterior to the posterior region, and are nearly identical for both jaws. Starting from the twelfth eigenvector, there are differences between the maxillary side (ie, upper jaw) and the mandibular side, and between the buccal side and the lingual side. The first eigenvector serves as the overall mean of PPD. The second eigenvector puts more emphasis on the left-posterior region and decreases toward the right-posterior region. The third eigenvector is similar to the second in the sample covariance. The remaining eigenvectors in the CAR covariance depict varying characteristics in subregions of a mouth.

5.3. Summary of the fitted models

In this subsection, we summarize the fit of the Gaussian and SHS models with L = 53 eigenvectors to the HP data. To avoid excessive false positives, we implement the Bayesian spatial false discovery rate (BSFDR) procedure43 with rate 0.01 to control for multiple testing. We consider the one-sided null and alternative hypotheses H0 : βik ≤ 0 and H1 : βik > 0, for i = 1, …, 7279 and k = 1, …, 168. We reject the null if the posterior probability of the alternative exceeds the threshold T. The BSFDR procedure determines T, such that the false discovery rate is approximately 0.01. The critical probabilities are T = 94.95% for the Gaussian models and T = 96.97% for the SHS models. The proportions of sites for which H0ik is rejected across subjects are 4.81% and 7.29% for the Gaussian and SHS models. Hence, the SHS model appears to be more powerful.
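One common way to turn posterior probabilities into a rejection threshold is to reject the hypotheses with the largest posterior probabilities of the alternative for as long as the average posterior probability of the null among the rejected stays below the target rate. The sketch below implements that generic rule; whether the BSFDR procedure of reference 43 matches it in every detail is an assumption, and the function name and inputs are hypothetical.

```python
def fdr_threshold(post_probs, rate):
    """Smallest rejected posterior probability under a posterior-expected FDR bound.

    post_probs: posterior probabilities of the alternative, one per hypothesis.
    Returns None when no hypothesis can be rejected at the given rate.
    """
    ranked = sorted(post_probs, reverse=True)   # strongest evidence first
    running_error = 0.0
    threshold = None
    for n_rejected, p in enumerate(ranked, start=1):
        running_error += 1.0 - p                # posterior probability of a false discovery
        if running_error / n_rejected <= rate:
            threshold = p
        else:
            break                               # the running average only grows from here
    return threshold

threshold = fdr_threshold([0.6, 0.999, 0.98, 0.995], rate=0.01)
flagged = [p >= threshold for p in [0.6, 0.999, 0.98, 0.995]]
```

In this toy input the three strongest sites are flagged and the weakest is not, because including it would push the estimated false discovery rate well above 0.01.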

Figures 4 and 5 plot the fitted results with L = 53 eigenvectors for the two subjects (henceforth labelled “Subject 1” and “Subject 2”) with the greatest difference in the posterior mean of βik between the Gaussian and SHS models. The posterior mean for Subject 1 in the SHS model is larger in teeth 5, 6, 14, 15, and 18 than in the Gaussian model. The map of posterior probability, P(βik > 0|Y), indicates that the PPD in the left side of the mouth has increased significantly within the first 2 years. Compared with the Gaussian model, SHS finds more significant deterioration of PPD in the buccal side of the lower jaw and in the middle of the right upper jaw (eg, teeth 2, 5, and 6). For Subject 2, the posterior means are larger in the left maxillary side for the SHS model compared with the Gaussian model. The map of posterior probability shows similar results. Comparing models, we find more significant sites and stronger spatial clustering of the signal in the SHS model than in the Gaussian model.

FIGURE 4.


Posterior mean of βk and posterior probability P(β1k > 0|Y), k = 1,…, 168 for Subject 1 in the Gaussian and spatial horseshoe models with L = 53 eigenvectors

FIGURE 5.


Posterior mean of βk and posterior probability P(β2k > 0|Y), k = 1, …, 168 for Subject 2 in the Gaussian and spatial horseshoe models with L = 53 eigenvectors

Figure 6 illustrates that compared with the Gaussian model, the density of the posterior means of βik (combining all subjects) from the SHS model has higher concentration around zero, with heavier tails. Moreover, Figure 7 plots the average rejection rate among all subjects by teeth. Aside from teeth in the posterior region, SHS model also detects progression of teeth in the mandibular side and few in the right-maxillary side of the mouth. In addition, Figure 8 presents the correlation among the basis functions for the covariances Σa and Σz using low-rank representation of the sample covariance and CAR covariance. It is not surprising that almost all basis functions are unrelated due to the eigendecomposition of the two covariances. The only exception is the first and second basis functions in Σa, where we observe a weak negative correlation using the sample covariance.

FIGURE 6.


Density plot (left) and quantile-quantile plot (right) of the posterior mean of site-specific linear-trend coefficient βik for subjects i = 1, …, 7279 and k = 1, …, 168 from the Gaussian and spatial horseshoe models with the low-rank representation of the sample covariance under 90% explained variation

FIGURE 7.


The average rejection rate across subjects by teeth from the Gaussian and spatial horseshoe models with L = 53 eigenvectors. The rejection rule is available in the beginning of Section 5.3

FIGURE 8.


Posterior mean of the correlation matrix corresponding to the covariances Σa and Σz from the spatial horseshoe model using the sample covariance (left column) and conditional autoregressive (CAR) covariance (right column) in the basis function matrix Q. The diagonal values are all 1, and removed for better illustration

Fitting our model to the entire dataset of 7279 subjects is time-consuming. However, this is required only once offline to estimate population parameters. Fitting the model to one subject as would be done in practice is fast. The computing times are 0.27 and 0.36 minutes, with L = 11 using the low-rank representation of the sample covariance for the Gaussian and SHS models, respectively. When L = 53, it takes 0.63 and 1.17 minutes for the Gaussian and SHS models, respectively.

6. SIMULATION STUDY

In this section, we conduct a brief simulation study to examine the benefits of using shrinkage priors to detect increases in PD. For all simulations, we restrict the spatial domain to be one jaw (ie, the 84 sites on 14 teeth) and generate data for 50 subjects. For each subject, the intercept is generated from the CAR model (defined in Section 5.1), $(\alpha_{i1}, \ldots, \alpha_{i84})^T \sim \mathrm{Normal}(3 \cdot \mathbf{1}, 0.5^2 S)$, where $S = (M - 0.99A)^{-1}$, $A$ is the 84 × 84 adjacency matrix with $(u, v)$ element equal to one if sites $u$ and $v$ are adjacent, and zero otherwise (including the diagonal), and $M$ is the diagonal matrix with $i$th element equal to the number of sites that are adjacent to site $i$. The slopes $\beta_{ik}$ are generated to be the same within a tooth, and independent across teeth and subjects. The slopes for a tooth are assigned value $\beta_0$ with probability $\pi_0$, and 0 with probability $1 - \pi_0$. Given the slopes and intercepts, the data are generated as $y_{ijk}^* \sim \mathrm{Normal}(\alpha_{ik} + (j - 1)\beta_{ik}, 1)$ for time steps $j = 1, \ldots, 5$. Therefore, in the simulation, the data are not integer-valued as in the real data analysis. The simulations vary by the effect size $\beta_0 \in \{0.50, 1.00\}$ and proportion of nonnull slopes $\pi_0 \in \{0.05, 0.20\}$. For each combination of these factors, we generate 100 datasets.
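The slope and response generation described above can be sketched for a single subject as follows. To stay within the standard library, the intercepts are held at a constant 3.0 rather than drawn from the CAR model, which is a simplifying assumption; the function names and the seed are hypothetical.

```python
import random

def simulate_slopes(n_teeth, sites_per_tooth, beta0, pi0, rng):
    """Tooth-level slopes: beta0 with probability pi0, else 0, shared by all sites of a tooth."""
    slopes = []
    for _ in range(n_teeth):
        value = beta0 if rng.random() < pi0 else 0.0
        slopes.extend([value] * sites_per_tooth)
    return slopes

def simulate_visits(alpha, beta, n_visits, rng):
    """y*_jk ~ Normal(alpha_k + (j - 1) * beta_k, 1) for visits j = 1, ..., n_visits."""
    return [[a + (j - 1) * b + rng.gauss(0.0, 1.0) for a, b in zip(alpha, beta)]
            for j in range(1, n_visits + 1)]

rng = random.Random(3)
beta = simulate_slopes(n_teeth=14, sites_per_tooth=6, beta0=0.5, pi0=0.20, rng=rng)
alpha = [3.0] * len(beta)   # constant intercepts (simplification of the CAR draw)
y_star = simulate_visits(alpha, beta, n_visits=5, rng=rng)
```

Each call produces one subject's 5 × 84 array of continuous responses; repeating it for 50 subjects and each $(\beta_0, \pi_0)$ combination would give datasets shaped like those in the study.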

For each dataset, we fit three models. The first model is the Gaussian model with $\delta_i$ set to zero (“Gaussian”). The second model is the horseshoe model that uses data from all five visits (“HS5”), and the third model is the horseshoe model that uses data from only the first four visits (“HS4”). For each model, we use the full CAR covariance to determine the latent-factor structure, that is, $Q$ and $D$ are set to the eigenvectors and eigenvalues, respectively, of $S$. For each model, we use the priors given in Section 4.1, and generate 5000 MCMC samples after discarding 1000 as burn-in. This gives estimates of the posterior means $\hat{\beta}_{ik}$ and posterior probabilities that $\beta_{ik}$ is positive, denoted $q_{ik}$. We conclude that the slope is positive if $q_{ik} > 0.9$. Table 2 reports the MSE of $\hat{\beta}_{ik}$ (averaged over tooth-site and subject) and the Type I error and power (also averaged over tooth-site and subject) of the test for a positive slope.
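Given posterior probabilities and the true slopes, the Type I error and power of the rule "flag a site when the posterior probability exceeds 0.9" follow by simple counting. A minimal sketch with hypothetical names and toy inputs:

```python
def error_rates(q, beta_true, cutoff=0.9):
    """Type I error and power of flagging sites whose posterior probability exceeds cutoff."""
    flagged = [p > cutoff for p in q]
    null_flags = [f for f, b in zip(flagged, beta_true) if b == 0]
    signal_flags = [f for f, b in zip(flagged, beta_true) if b != 0]
    type1 = sum(null_flags) / len(null_flags) if null_flags else float("nan")
    power = sum(signal_flags) / len(signal_flags) if signal_flags else float("nan")
    return type1, power

# Toy example: four sites, two with a true positive slope.
type1, power = error_rates(q=[0.95, 0.2, 0.91, 0.5], beta_true=[0.5, 0.0, 0.0, 0.5])
```

In the toy input one of the two null sites and one of the two nonnull sites are flagged, so both rates equal 0.5.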

TABLE 2.

Summary of the simulation study

| Statistic | Effect size ($\beta_0$) | Proportion nonnull ($\pi_0$) | Gaussian | HS5 | HS4 |
|---|---|---|---|---|---|
| MSE | 0.5 | 0.05 | 19.6 (0.3) | 1.4 (0.1) | 2.4 (0.1) |
| MSE | 0.5 | 0.20 | 19.6 (0.3) | 3.2 (0.1) | 4.7 (0.1) |
| MSE | 1.0 | 0.05 | 19.6 (1.5) | 1.5 (0.1) | 3.1 (0.0) |
| MSE | 1.0 | 0.20 | 19.7 (0.3) | 3.8 (0.1) | 7.5 (0.1) |
| Type I error | 0.5 | 0.05 | 1.1 (0.1) | 0.3 (0.1) | 0.4 (0.1) |
| Type I error | 0.5 | 0.20 | 1.1 (0.4) | 0.4 (0.1) | 0.4 (0.1) |
| Type I error | 1.0 | 0.05 | 1.1 (0.1) | 0.4 (0.1) | 0.4 (0.1) |
| Type I error | 1.0 | 0.20 | 1.1 (0.1) | 0.8 (0.1) | 0.6 (0.1) |
| Power | 0.5 | 0.05 | 12.1 (0.5) | 23.0 (0.4) | 8.5 (0.3) |
| Power | 0.5 | 0.20 | 11.9 (0.5) | 22.6 (0.2) | 8.4 (0.1) |
| Power | 1.0 | 0.05 | 48.7 (1.0) | 90.9 (0.2) | 58.7 (0.5) |
| Power | 1.0 | 0.20 | 48.0 (0.9) | 91.0 (0.1) | 58.5 (0.2) |

Note: The competing models are the Gaussian model and the horseshoe (“HS”) model that uses data from four (HS4) or five (HS5) visits. The simulations vary depending on the effect size $\beta_0$ and the proportion of nonnull slopes $\pi_0$. MSE, Type I error and power are multiplied by 100, and standard errors are given in parentheses.
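A table entry of this form can be produced from per-dataset results along the following lines. The sketch assumes that the standard error in parentheses is the Monte Carlo standard error across the replicate datasets, which the note does not state explicitly; the function names are hypothetical.

```python
import numpy as np


def summarize_replicates(values, scale=100.0):
    """Return (scaled mean, scaled standard error) across replicate datasets."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    # Monte Carlo standard error of the mean (assumed meaning of the parentheses)
    se = values.std(ddof=1) / np.sqrt(values.size)
    return scale * mean, scale * se


def format_cell(values, scale=100.0):
    """Format replicate results as 'mean (se)' with one decimal place."""
    mean, se = summarize_replicates(values, scale)
    return f"{mean:.1f} ({se:.1f})"
```

Here `values` would hold one metric (for instance the power) from each of the 100 simulated datasets in a given setting.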

The MSE is dramatically smaller for the HS prior than the Gaussian prior, especially when the proportion of nonnull slopes is low ($\pi_0 = 0.05$). All three methods are conservative, with Type I error less than 0.05 in all cases. The HS prior that uses the full dataset is more powerful than the Gaussian prior in every setting, with nearly twice the power. In fact, when the effect size is large ($\beta_0 = 1.0$), the horseshoe prior that uses data from only the first four visits is more powerful than the Gaussian model that uses data from all five visits, although it is less powerful when $\beta_0 = 0.5$.

7. DISCUSSION

In this article, we propose a spatiotemporal model for detecting local changes in PD. We place the SHS prior on the site-level linear time trend for each subject. We introduce a low-rank representation to reduce the computational load and to obtain a nonstationary spatial covariance, which suits the HP data and provides more flexibility. The empirical results show improved prediction compared with alternatives that rely on the usual Gaussian priors for the regression parameters and a CAR specification for the covariance structure. R code for fitting the proposed model is available on request from the corresponding author.

A potential limitation of our model is the assumption that PPD changes linearly in time. We believe this is reasonable because we use PPD responses collected at the subjects’ dental visits within a 2-year period. For a longer study duration, the linear time trend can be replaced with splines or other functional structures.44 One possibility is to allow for a higher order trend at each site, but assume that the same shrinkage parameter $\lambda_{ik}$ appears in the prior SD of all terms so that they shrink toward the static-mean model, following developments45 in nonspatial data. A second restriction is that although some spatiotemporal dependence is induced by the random slopes and intercepts, the errors are assumed to be independent. We think this is sufficient for the HP data because PPDs were measured independently across subjects, visits and sites. However, this may not hold for other datasets. Although we found the eigen-decomposition based on the sample covariance matrix to yield better results than the spatial CAR model, we are yet to explore more sophisticated parametric correlation structures.23,46 In addition, our shrinkage prior for the slopes is symmetric. While negative regression coefficients are plausible as PPD can decrease with interventions such as improvements in dental hygiene, increasing PPD is more common and relevant for disease monitoring, and so an asymmetric shrinkage prior could prove useful.

Relying on previous oral health studies,11,25 one may be tempted to consider informative missingness, or the “missing-not-at-random” scenario, within a spatiotemporal setup. Missing teeth are indicative of poor periodontal health. Hence, a specific region with many missing teeth is likely to have higher PPD at the nonmissing sites (in that region), an observation which can be attributed to spatial clustering. However, in the present analysis, we feel the working MAR assumption is reasonable, given that subjects rarely lose teeth during the short follow-up time we are considering, and thus periodontal health assessment can rely on changes in PPD over time at nonmissing sites. Furthermore, our current flagging algorithm only considers the baseline PPD, whereas other covariates (sociodemographic, behavioral, and so on) may also influence signal detection. All these are important avenues of future research, and will be considered elsewhere.

ACKNOWLEDGEMENTS

This work was supported by grant R01DE024984 from the National Institutes of Health. The authors thank B.S.M., B.D.R., S.K., and B.A.R. for providing the HealthPartners dataset, and the context behind this work.

Funding information

Foundation for the National Institutes of Health, Grant/Award Number: R01-DE024984-01A1

APPENDIX

A1. Plots of estimated parameters

FIGURE A1.

The first four eigenvectors of the conditional autoregressive (CAR) covariance

FIGURE A2.

The fifth to seventh and the twelfth eigenvectors of the conditional autoregressive (CAR) covariance

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of this article.

REFERENCES

  • 1. World Health Organization. Global early warning system for major animal diseases, including Zoonoses (GLEWS) http://www.who.int/zoonoses/outbreaks/glews/en/; 2007.
  • 2. Beltrán-Aguilar ED, Malvitz DM, Lockwood SA, Rozier R, Gary TSL. Oral health surveillance: past, present, and future challenges. J Public Health Dentist. 2003;63:141–149.
  • 3. Eke PI, Dye BA, Wei L, Thornton-Evans GO, Genco RJ. Prevalence of periodontitis in adults in the United States: 2009 and 2010. J Dental Res. 2012;91:914–920.
  • 4. Cheng Y-SL, Jordan L, Chen H-S, et al. Chronic periodontitis can affect the levels of potential oral cancer salivary mRNA biomarkers. J Periodontal Res. 2017;52:428–437.
  • 5. Persson GR, Persson RE. Cardiovascular disease and periodontitis: an update on the associations and risk. J Clinical Periodontology. 2008;35:362–379.
  • 6. Vavricka SR, Manser CN, Hediger S, et al. Periodontitis and gingivitis in inflammatory bowel disease: a case-control study. Inflammatory Bowel Diseases. 2013;19:2768–2777.
  • 7. Tran DT, Gay I, Du Xianglin L, et al. Assessment of partial-mouth periodontal examination protocols for periodontitis surveillance. J Clinical Periodontology. 2014;41:846–852.
  • 8. Page RC, Eke PI. Case definitions for use in population-based surveillance of periodontitis. J Periodontology. 2007;78:1387–1399.
  • 9. Michalowicz HJS, Philstrom BL. Is change in probing depth a reliable predictor of change in clinical attachment loss? J Am Dental Assoc. 2013;144:171–178.
  • 10. Bandyopadhyay LVH, Abanto-Valle CA, Ghosh P. Linear mixed models for skew-normal/independent bivariate responses with an application to periodontal disease. Stat Med. 2010;29:2643–2655.
  • 11. Reich BJ, Bandyopadhyay D. A latent factor model for spatial data with informative missingness. The Annals of Applied Statistics. 2010;4:439–459.
  • 12. Unkel S, Farrington CP, Garthwaite PH, Robertson C, Andrews N. Statistical methods for the prospective detection of infectious disease outbreaks: a review. J Royal Stat Soc Ser A (Stat Soc). 2012;175:49–82.
  • 13. Knox EG, Bartlett MS. The detection of space-time interactions. J Royal Stat Soc Ser C (Appl Stat). 1964;13:25–30.
  • 14. Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Res. 1967;27:209–220.
  • 15. Jacquez GM. A k nearest neighbor test for space-time interaction. Stat Med. 1996;15:1935–1949.
  • 16. Rogerson PA, Ikuho Y. Monitoring change in spatial patterns of disease: comparing univariate and multivariate cumulative sum approaches. Stat Med. 2004;23:2195–2214.
  • 17. Kulldorff M, Heffernan R, Hartman J, Assunção R, Mostashari F. A space–time permutation scan statistic for disease outbreak detection. PLOS Med. 2005;2:e59.
  • 18. Li G, Best N, Hansell AL, Ahmed I, Richardson S. BaySTDetect: detecting unusual temporal patterns in small area data via Bayesian model choice. Biostatistics. 2012;13:695–710.
  • 19. Vidal Rodeiro CL, Lawson Andrew B. Monitoring changes in spatio-temporal maps of disease. Biomet J. 2006;48:463–480.
  • 20. Zhou H, Lawson AB. EWMA smoothing and Bayesian spatial modeling for health surveillance. Stat Med. 2008;27:5907–5928.
  • 21. Watkins RE, Eagleson S, Veenendaal B, Wright G, Plant AJ. Disease surveillance using a hidden Markov model. BMC Med Inform Decis Mak. 2009;9:39.
  • 22. Lee EC, Asher JM, Goldlust S, Kraemer JD, Lawson AB, Bansal S. Mind the scales: harnessing spatial big data for infectious disease surveillance and inference. J Infect Diseas. 2016;214:S409–S413.
  • 23. Reich BJ, Hodges JS, Carlin BP. Spatial analyses of periodontal data using conditionally autoregressive priors having two classes of neighbor relations. J Am Stat Assoc. 2007;102:44–55.
  • 24. Jin IH, Yuan Y, Bandyopadhyay D. A Bayesian hierarchical spatial model for dental caries assessment using non-Gaussian Markov random fields. Ann Appl Stat. 2016;10:884–905.
  • 25. Reich BJ, Bandyopadhyay D, Bondell HD. A nonparametric spatial model for periodontal data with non-random missingness. J Am Stat Assoc. 2013;108:820–831.
  • 26. Cai B, Bandyopadhyay D. Bayesian semiparametric variable selection with applications to periodontal data. Stat Med. 2017;36:2251–2264.
  • 27. Reich BJ, Hodges JS. Modeling longitudinal spatial periodontal data: a spatially-adaptive model with tools for specifying priors and checking fit. Biometrics. 2008;64:790–799.
  • 28. Jhuang A-T, Fuentes M, Jones JL, et al. Spatial signal detection using continuous shrinkage priors. Technometrics. 2019;61:494–506.
  • 29. Ishwaran H, Rao JS. Spike and slab variable selection: frequentist and Bayesian strategies. Ann Stat. 2005;33:730–773.
  • 30. Carvalho CM, Polson NG, Scott JG. The Horseshoe estimator for sparse signals. Biometrika. 2010;97:465–480.
  • 31. Goldsmith J, Huang L, Crainiceanu CM. Smooth scalar-on-image regression via spatial Bayesian variable selection. J Comput Graphical Stat. 2014;23:46–64.
  • 32. Boehm Vock LF, Reich BJ, Fuentes M, Dominici F. Spatial variable selection methods for investigating acute health effects of fine particulate matter components. Biometrics. 2015;71:167–177.
  • 33. Ročková V, George EI. The Spike-and-Slab LASSO. J Am Stat Assoc. 2018;113:431–444.
  • 34. Polson NG, Scott JG. Shrink globally, act locally: sparse Bayesian regularization and prediction. Bayesian Stat. 2010;9:501–538.
  • 35. Bhadra A, Datta J, Polson NG, Willard B. Default Bayesian analysis with global-local shrinkage priors. Biometrika. 2016;103:955–969.
  • 36. Quteish TDSM. Periodontal reasons for tooth extraction in an adult population in Jordan. J Oral Rehabilitat. 2003;30:110–112.
  • 37. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2019.
  • 38. Nelsen RB. An Introduction to Copulas. New York, NY: Springer; 2006.
  • 39. Pearson K. LIII. On lines and planes of closest fit to systems of points in space. London Edinburgh Dublin Philosoph Mag J Sci. 1901;2(11):559–572.
  • 40. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3rd ed. Boca Raton, FL: Chapman and Hall/CRC; 2013.
  • 41. Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. 2nd ed. Boca Raton, FL: Chapman and Hall/CRC; 2014.
  • 42. Gelfand AE, Vounatsou P. Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics. 2003;4:11–15.
  • 43. Sun W, Reich BJ, Tony CT, Guindani M, Schwartzman A. False discovery control in large-scale spatial multiple testing. J Royal Stat Soc Ser B (Stat Methodol). 2015;77:59–83.
  • 44. Corberán-Vallet A, Lawson AB. Chapter 27: Spatial health surveillance. In: Lawson AB, Banerjee S, Haining RP, Ugarte MD, eds. Handbook of Spatial Epidemiology. Boca Raton, FL: Chapman and Hall/CRC; 2016:501–519.
  • 45. Wei R, Reich BJ, Hoppin JA, Ghosal S. Sparse Bayesian additive nonparametric regression with application to health effects of pesticides mixtures. Statistica Sinica. 2020;30:55–79.
  • 46. Mancl LA, Leroux BG. Efficiency of regression estimates for clustered data. Biometrics. 1996;52:500–511.
