Abstract
Periodontal disease (PD) is a chronic inflammatory disease that affects the gum tissue and bone supporting the teeth. Although tooth-site level PD progression is believed to be spatio-temporally referenced, the whole-mouth average periodontal pocket depth (PPD) has been commonly used as an indicator of the current/active status of PD. This leads to an inevitable loss of information and imprecise parameter estimates. Despite the availability of statistical methods that accommodate spatiotemporal information for responses collected at the tooth-site level, the enormity of longitudinal databases derived from oral health practice-based settings renders them unscalable for application. To mitigate this, we introduce a Bayesian spatiotemporal model to detect problematic/diseased tooth-sites dynamically inside the mouth for any subject obtained from large databases. This is achieved via a spatial continuous sparsity-inducing shrinkage prior on spatially varying linear-trend regression coefficients. A low-rank representation captures the nonstationary covariance structure of the PPD outcomes, and facilitates the relevant Markov chain Monte Carlo computing steps applicable to thousands of study subjects. Application of our method to both simulated data and to a rich database of electronic dental records from the HealthPartners® Institute reveals improved prediction performance, compared with alternative models with usual Gaussian priors for regression parameters and conditionally autoregressive specification of the covariance structure.
Keywords: nonstationary covariance, periodontal disease, shrinkage priors, space-time disease surveillance
1. INTRODUCTION
Disease surveillance, which consists of an ongoing systematic collection, collation, analysis, and interpretation of data to establish patterns of chronic disease progression leading to dissemination of informed action for its prevention and control,1 has remained an active area of epidemiological research. For example, oral health surveillance techniques2 are widely used to detect and prevent periodontal disease (PD), a chronic inflammatory disease that affects the gum tissues and bone supporting the teeth. In the United States, during 2009 and 2010, 47.2% of adults aged ≥ 30 years had some form of PD, affecting approximately 64.7 million people.3 Furthermore, PD is associated with a number of comorbid diseases, such as cancer,4 cardiovascular diseases,5 and inflammatory bowel diseases,6 and hence can only increase the cost and disease burden if not properly detected.
Under various temporal settings (such as in clinical studies, or practice-based), oral health clinicians often use partial- or whole-mouth averages7 of tooth-site level periodontal pocket depth (PPD) to detect a subject’s active/current PD status8,9 at an observed time-point. The PPD, recorded in whole millimeters, is defined as the distance from the gingival margin to the epithelial attachment.10 In addition to the loss of information from taking averages, the corresponding surveillance tools mostly ignore the hypothesized spatial referencing11 of PD progression inside the mouth, given that the level of PD for a group of proximal sites can differ from that of sites located distally. The ability to detect specific regions (say, tooth-sites) in the mouth where PD is progressing rapidly compared with others can lead to quicker interventions with treatments (such as scaling and root planing), medication, and surgery. This early detection and flagging of anomalies can enhance coordination and control activities, especially for chronic PD.
It is believed that the incorporation of spatial information strengthens the power of surveillance, and can localize outbreaks of a disease or characterize variations in regional patterns.12 Methods for space-time disease surveillance can be broadly classified into test-based and model-based approaches. The tools to test space-time interactions include the Knox test,13 Mantel’s test,14 the k nearest neighbor test,15 and the computationally heavy cumulative sum methods16 for detecting change-points. An alternative popular method that detects disease clusters in space and time and provides more information than the space-time interaction tests is the scan statistic,17 which also inspired a Bayesian test to detect unusual temporal patterns in small area data.18 However, compared with the test-based methods, model-based approaches estimate disease risk, and thus provide better insight into etiology, spread, prediction, and control of the disease. Under Poisson assumptions for count data in specified times and areas, there exist a number of methods, such as the residual-based approach,19 exponentially weighted moving average,20 and hidden Markov models.21 Recent surveillance techniques in this era of big data can now incorporate data from a variety of sources, such as electronic health records, mobile phone call records, and geographically tagged tweets.22
Contrary to these geographically aggregated data methods, oral health surveillance for PD has mostly considered subject-level exploration in a cross-sectional spatial setting under various complex scenarios, such as complex covariance,23,24 informative missingness,11 and non-Gaussian responses.25,26 While the whole-mouth average may be sufficient for studying population effects of systematic treatment, we focus on site-level modeling to quickly detect local changes to guide site-specific treatments. In this article, our motivation for developing short-term spatiotemporal PD surveillance comes from an observational database in a dental practice-based setting, maintained by the HealthPartners® (HP) Institute (henceforth, HP data) located in suburban Minneapolis, Minnesota. Although Bayesian methods exist under nonstationary space-time dependence assumptions,27 their computational scalability in light of the enormity of the HP database is questionable. In addition, the assumed spatial association inside a mouth can be nonstationary, because the association among posteriorly located tooth-sites in the molars can be different from that among sites in the anterior incisors. We set out to address this problem via a Bayesian spatiotemporal proposal that can detect problematic sites in the mouth via the spatial horseshoe (SHS)28 prior on the site-specific linear-trend coefficients.
Bayesian sparsity-inducing regressions can be broadly classified into two categories: (a) the (discrete) mixture “spike-and-slab” priors,29 which place a point mass at zero and an absolutely continuous prior on the remaining nonzero elements of the parameter vector, and (b) the (continuous) shrinkage priors30 with absolutely continuous shrinkage on the entire parameter vector. Although the spike-and-slab models are theoretically attractive, their discrete indicators give rise to poor mixing and slow convergence, often complicating the full exploration of the posterior via Markov chain Monte Carlo (MCMC) techniques.31–33 On the other hand, the “global-local” shrinkage prior models are computationally elegant as they model the posterior inclusion probabilities directly, thereby adjusting to sparsity via global shrinkage, and identifying signals via local shrinkage.34,35
As the first spatial continuous shrinkage prior, our SHS proposal28 reflects the (realistic) prior belief that there are usually only a few unhealthy sites in one’s mouth during a short period of time, while simultaneously incorporating spatial dependence in the signal at nearby observations. That article explores some of its attractive theoretical properties, such as high concentration around zero for sparsity, and heavy tails to avoid excessive shrinkage. In this article, we extend this approach to the multisubject spatiotemporal setting, and develop a low-rank representation to capture the nonstationary spatial covariance structure of the HP data with reduced computing time.
The rest of the article proceeds as follows. We describe the motivating HP dataset in Section 2. In Section 3, we introduce our spatiotemporal model for sparse signal detection, and the low-rank representation. We present the Bayesian inferential setup through prior specifications, and related MCMC-based computing details in Section 4. In Section 5, we apply our method to the HP dataset, and summarize our findings. We conduct a simulation study to evaluate the prediction performance of the proposed model in Section 6. Finally, we conclude with a brief discussion in Section 7.
2. MOTIVATING HP DATA
The longitudinal HP dataset consists of information on periodontal health collected for 25 763 subjects from routine dental practices located in suburban Minneapolis. The study period we selected was 2007 to 2014, and the subjects were at least 18 years old as of January 1, 2007. PPD was measured at six prespecified sites for each tooth, excluding the wisdom teeth (third molars), of each subject via a periodontal probe, and recorded as integer values (in mm). Figure 1 illustrates the tooth numbering system and measured locations for each tooth, excluding wisdom teeth, that is, tooth numbers 1, 16, 17, and 32. Maxillary (upper) and mandibular (lower) are the jaw indicators, while buccal and lingual represent the cheek-side, and the side closest to the tongue, respectively. The left side of the plots in this article represents the right side of the individual, and vice versa.
FIGURE 1.

Mean and SD of periodontal pocket depth (PPD) in millimeters across all subjects and their visits during the first 2 years. Tooth numbering system is described in numbers and texts for the 28 teeth (2–15 and 18–31), excluding the four third molars (wisdom teeth). There are six locations measured for each tooth, numbered from the buccal to the lingual surface, and the mesial to the distal surface. Note that increments of 0.5 mm reflect the smallest clinically meaningful increment
The objective of our analysis is to flag unhealthy sites at an early stage before severe PD progression. Hence, we restricted our analysis to data collected only during the first 2 years for each subject. This short period also makes the assumption of a linear change of PPD in time reasonable. Although each examining oral clinician provided a recommended follow-up time for periodontal checkups at each time point for each subject, there is no reason to believe that subjects abide by those recommendations in this practice-based setting. This leads to a longitudinal database with irregular observation times, and we exclude subjects with fewer than four visits in the first 2 years. This leaves 7279 subjects satisfying the above conditions, with the number of visits ranging from 4 to 8. Figure 1 presents the mean and SD of the recorded PPD for these subjects. The average ranged between 1.1 and 3.0 mm, with higher values in the posteriorly located sites (molars) than the anterior, confirming previous findings.36 The variation of PPD exhibits a similar pattern, with the SD ranging from 0.7 to 1.8 mm. Although the database is not publicly available, it can be requested via a relevant data use agreement with the HP.
3. MODEL DESCRIPTION
3.1. Spatiotemporal model
Denote $y_{ijk}$ as the recorded PPD in millimeters for subject $i = 1, \ldots, n_p$ at visit $j = 1, \ldots, n_{vi}$ and at site $k = 1, \ldots, n_s$, where $n_p = 7279$ is the number of subjects, $n_s = 168$ is the number of sites, and $n_{vi}$ is the number of visits for subject $i$. With PPD recorded as an integer, we define $y^*_{ijk}$ as a latent variable for subject $i$ at site $k$ for the $j$th visit, related to the observed PPD as $y_{ijk} = [y^*_{ijk}]$, where $[x]$ is the nearest integer to $x$. We use the data at the visits in the first 2 years, and assume a latent linear trend of PPD in time. We specify the complete data model for subject $i$ as:
$$y^*_{ijk} = \alpha_{ik} + t_{ij}\,\beta_{ik} + \varepsilon_{ijk}, \tag{1}$$
where αik is the baseline PPD at site k, βik is the slope at site k, tij is the years since baseline for the jth visit, and εijk is the random error. We account for missing observations using standard Bayesian missing data methods, assuming the data are missing at random (MAR).37
To capture the prior belief that only a few sites may have deteriorated during the study period, we assume that the slope βik marginally follows a horseshoe prior.30 The horseshoe prior can be written hierarchically as
$$\beta_{ik} \mid \lambda_{ik} \sim N(0, \lambda_{ik}^2), \qquad \lambda_{ik} \sim C^{+}(0, 1), \tag{2}$$
where λik is the prior SD, and follows the standard half-Cauchy distribution on the positive reals. Marginally over λik, the prior for βik has a mass concentration near zero with heavy tails. The shape of the density shrinks null signals toward zero and avoids shrinking the true signals. This property facilitates separating signals from the noise.
Define the vector of slopes for subject i as $\beta_i = (\beta_{i1}, \ldots, \beta_{in_s})^T$. To incorporate spatial dependence into a multivariate horseshoe prior, we propose to set βik = λik ζik, where λik is a shrinkage parameter with half-Cauchy prior and ζik is normal. Spatial shrinkage is induced by the spatial process model for $\lambda_i = (\lambda_{i1}, \ldots, \lambda_{in_s})^T$. We propose a Gaussian copula model38 that preserves the marginal half-Cauchy distribution as in Equation (2), and captures spatial dependence, such that
$$\lambda_{ik} = f(\delta_{ik}) = F^{-1}\{\Phi(\delta_{ik})\}, \tag{3}$$
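where δik is the kth variable of the latent process $\delta_i = (\delta_{i1}, \ldots, \delta_{in_s})^T$, $f(\cdot)$ is the half-Cauchy link function, $F^{-1}(\cdot)$ is the inverse cumulative distribution function of the half-Cauchy distribution, and Φ(⋅) is the standard normal cumulative distribution function. The model can be expressed as
$$y^*_{ij} = \alpha_i + t_{ij}\, f(\delta_i) \odot \zeta_i + \varepsilon_{ij}, \tag{4}$$
where $y^*_{ij} = (y^*_{ij1}, \ldots, y^*_{ijn_s})^T$ is the vector of latent variables for subject i at the jth visit, $\alpha_i = (\alpha_{i1}, \ldots, \alpha_{in_s})^T$ is the vector of baseline PPD for subject i, the operator ⊙ defines the pointwise vector product, δi is the spatial latent vector, $\zeta_i = (\zeta_{i1}, \ldots, \zeta_{in_s})^T$ is the normal vector, and $\varepsilon_{ij} = (\varepsilon_{ij1}, \ldots, \varepsilon_{ijn_s})^T$ is the vector of random error.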
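The half-Cauchy link in the Gaussian copula construction can be illustrated with a few lines of code. The sketch below is only an illustration, not the authors' implementation: it assumes the standard half-Cauchy cumulative distribution function F(x) = (2/π) arctan(x), so that F⁻¹(u) = tan(πu/2), and the function names `half_cauchy_link` and `slope` are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def half_cauchy_link(delta: float) -> float:
    """Map a standard-normal value delta to a standard half-Cauchy value.

    Assumes F(x) = (2 / pi) * atan(x) for the standard half-Cauchy CDF,
    so that F^{-1}(u) = tan(pi * u / 2).
    """
    u = _STD_NORMAL.cdf(delta)          # Phi(delta), uniform on (0, 1)
    return math.tan(math.pi * u / 2.0)  # F^{-1}(u), always positive


def slope(delta: float, zeta: float) -> float:
    """Site-level slope beta = lambda * zeta, with lambda = f(delta)."""
    return half_cauchy_link(delta) * zeta
```

A latent value of zero maps to the half-Cauchy median of 1, large positive latent values give large shrinkage parameters (weak shrinkage of the slope), and large negative latent values push the slope toward zero. Spatially correlated latent values therefore yield spatially correlated shrinkage.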
3.2. Low-rank representation
We use a low-rank representation to capture the complex spatial dependence of the PPD responses, and to facilitate computing for the vectors αi, δi, and ζi. Let Q be an ns × L basis function matrix that determines the covariance of PPD among sites. We set αi = Qai, δi = Qdi, and ζi = Qzi, such that the model becomes
$$y^*_{ij} = Q a_i + t_{ij}\, f(Q d_i) \odot Q z_i + \varepsilon_{ij}, \tag{5}$$
where ai = (ai1, … , aiL)T is the vector related to baseline PPD for subject i, di = (di1, … , diL)T is the vector related to spatial latent vector, and zi = (zi1, … , ziL)T is the vector related to slope.
We use principal component analysis (PCA)39 to form the basis function matrix Q. PCA not only increases computational efficiency, but also provides an interpretable decomposition of our data. Denote the total number of visits as $N_v = \sum_{i=1}^{n_p} n_{vi}$ and S as the ns × ns sample covariance matrix of the Nv response vectors $y_{ij} = (y_{ij1}, \ldots, y_{ijn_s})^T$ for i = 1, … , np and j = 1, … , nvi. The eigen decomposition of S is $S = \tilde{Q}\tilde{D}\tilde{Q}^T$, where $\tilde{Q}$ is the matrix of ordered eigenvectors q(i) in the ith column, and $\tilde{D}$ is the diagonal matrix with the ordered eigenvalues $e_1 \ge \cdots \ge e_{n_s}$. We take the basis matrix Q to be the first L columns of $\tilde{Q}$. The choice of L depends on the proportion of explained variation, $\sum_{l=1}^{L} e_l \big/ \sum_{l=1}^{n_s} e_l$.
In this article, we assume the vectors αi and βi are both expanded using the same basis function matrix Q. However, it is possible to have a different basis for different model components. For example, one option is to perform PCA on the sample covariance of least squares estimates of αi and βi, and use them as the basis function for αi and βi.
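The choice of L from the proportion of explained variation can be sketched as follows. This is a minimal illustration that assumes the eigenvalues of the sample covariance have already been computed; the function name `choose_rank` and the toy eigenvalues are hypothetical and do not come from the HP data.

```python
from typing import Sequence


def choose_rank(eigenvalues: Sequence[float], target: float) -> int:
    """Smallest L whose leading eigenvalues explain at least `target` of the total variation."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must lie in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    total = sum(ordered)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for rank, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return rank
    return len(ordered)  # only reached through floating-point round-off


# Toy eigenvalues (hypothetical): the first two explain 80% of the variation.
toy_eigenvalues = [5.0, 3.0, 1.0, 1.0]
rank_70 = choose_rank(toy_eigenvalues, 0.70)
rank_85 = choose_rank(toy_eigenvalues, 0.85)
```

The basis matrix Q would then consist of the eigenvectors paired with the `rank_70` (or `rank_85`) largest eigenvalues.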
4. BAYESIAN INFERENCE
4.1. Prior specification
We select multivariate normal priors for the vectors related to baseline PPD ai, the slope zi and the latent di, such that , and . The mean of ai is nonzero to capture the overall mean spatial trend, and assigned a noninformative prior as . The priors for the variance parameters , and are the uninformative inverse gamma distribution, IG(0.1, 0.1). The random error εijk follows an independent and identical normal prior with zero mean and variance , with an uninformative inverse gamma hyperprior IG(0.1, 0.1) for .
We select inverse Wishart priors for the covariance matrices Σa and Σz. The covariance of the intercepts and slopes across subjects may not be the same as the sample covariances, and our model allows for this due to the assignment of an inverse Wishart prior for Σa and Σz. We hope that the basis matrix Q captures the main features, and allow Σa to specify the best covariance in the span of Q. If Q is full rank, this model spans all possible covariance matrices for ai and zi, and is thus a flexible model in this limiting sense. The slopes βi are the product of the two terms, f(δi) and ζi, which can pose difficulty in estimating the scale of both δi and ζi. We therefore fix Σd at D, the diagonal matrix with the first L eigenvalues on its diagonal. Furthermore, to preserve the half-Cauchy marginal distribution for λi, we modify the link function to be $f(\delta_{ik}) = F^{-1}\{\Phi(\delta_{ik}/\sqrt{w_k})\}$, where wk is the kth diagonal element of QDQT. This produces an identifiable model that is still quite flexible, with the slope process βi nonstationary, such that $\mathrm{Cov}(\beta_{ik}, \beta_{ik'})$ depends on the two sites k and k′ for k, k′ = 1, … , ns and k ≠ k′.
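The reason for rescaling by wk can be seen in a short derivation. It is a sketch that assumes a zero-mean normal prior on di with covariance D, together with δi = Qdi from Section 3.2; the notation C⁺(0, 1) for the standard half-Cauchy distribution is used here for convenience.

```latex
\begin{align*}
d_i \sim N(0, D),\quad \delta_i = Q d_i
  &\;\Longrightarrow\; \delta_i \sim N(0,\, Q D Q^{T}),
  \qquad \operatorname{Var}(\delta_{ik}) = w_k = [Q D Q^{T}]_{kk}, \\
\delta_{ik} / \sqrt{w_k} \sim N(0, 1)
  &\;\Longrightarrow\; \Phi\!\left(\delta_{ik} / \sqrt{w_k}\right) \sim \mathrm{Uniform}(0, 1), \\
  &\;\Longrightarrow\; \lambda_{ik}
     = F^{-1}\!\left\{ \Phi\!\left(\delta_{ik} / \sqrt{w_k}\right) \right\}
     \sim C^{+}(0, 1).
\end{align*}
```

Without the division by √wk the argument of Φ would not be standard normal, so the marginal distribution of λik would no longer be half-Cauchy.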
In summary, we formulate the priors
| (6) |
where L is the degrees of freedom.
4.2. Computing details
We perform MCMC sampling using R. We implement blocked Metropolis-Hastings (MH) sampling40 for the vectors di and zi. The full conditional distributions for these parameters are
| (7) |
where the latent mean vector is μij = Qai + tij f(Qdi) ⊙ Qzi. We use Gaussian candidate distributions N(0, D) and N(0, Σz). We tune the blocked MH algorithm of di and zi via D and Σz to attain acceptance probability near 40%. We monitor convergence using trace plots of several representative parameters.
Gibbs sampling is used for the remaining parameters: the latent vectors $y^*_{ij}$, the vectors ai and μa, the variance parameters, and the inverse covariance matrices $\Sigma_a^{-1}$ and $\Sigma_z^{-1}$. Given the priors in Equation (6), the full conditional distributions used for the Gibbs updates are given below. Define the latent mean as μijk = αik + tij βik. The latent PPD for subject i at visit j and site k, $y^*_{ijk}$, follows a truncated normal distribution with mean μijk, variance equal to the error variance, lower bound max{0, yijk − 0.5} and upper bound yijk + 0.5, that is,
| (8) |
where TN(μ, σ2, l, u) is the truncated normal density with the mean μ, the variance σ2, the lower bound l and the upper bound u. The low-rank vector of baseline PPD ai follows a multivariate normal posterior distribution.
| (9) |
where . In the next layer, the posterior mean μ , scale parameters , and the scale matrices , all have full conditionals as below. Denote the total visit times as .
| (10) |
We generate 10 000 samples and discard the first 2000 as burn-in for data analysis in Section 5.
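The truncated-normal update for the latent PPD described above can be sketched with inverse-CDF sampling. This is an illustrative sketch rather than the authors' R code: the function name `draw_latent_ppd` is hypothetical, and the bounds assume that the observed PPD is the latent value rounded to the nearest integer and that the latent value is nonnegative, so the lower bound is the larger of 0 and y − 0.5.

```python
import random
from statistics import NormalDist


def draw_latent_ppd(y_obs: int, mu: float, sigma: float, rng: random.Random) -> float:
    """Draw y* from N(mu, sigma^2) truncated to [max(0, y_obs - 0.5), y_obs + 0.5]."""
    lower = max(0.0, y_obs - 0.5)
    upper = y_obs + 0.5
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    if p_hi - p_lo < 1e-12:
        # Negligible normal mass on the interval: fall back to a uniform draw
        # (a numerical safeguard chosen for this sketch, not part of the model).
        return rng.uniform(lower, upper)
    # Uniform draw on (p_lo, p_hi), kept strictly inside (0, 1) for inv_cdf
    u = p_lo + (p_hi - p_lo) * rng.random()
    u = min(max(u, 1e-15), 1.0 - 1e-15)
    # Clamp to guard against floating-point overshoot at the bounds
    return min(max(dist.inv_cdf(u), lower), upper)


rng = random.Random(2020)
draws = [draw_latent_ppd(3, mu=2.0, sigma=1.0, rng=rng) for _ in range(200)]
```

Every value in `draws` lies between 2.5 and 3.5, so rounding it recovers the observed PPD of 3 mm.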
5. APPLICATION: HP DATA
In this section, we apply the proposed model in Section 3 to the HP data described in Section 2.
5.1. Model comparisons
We fit the model to the visits during the first 2 years for all 7279 subjects simultaneously, and evaluate the prediction of PPD at the next visit for each subject. We compare models with varying flexibility of shrinkage across space and different covariances. We consider two priors (Gaussian and SHS) for the slopes βi, and two covariances (the sample covariance and the conditionally autoregressive (CAR) covariance41) across space, via the basis function matrix Q. The Gaussian βi has a constant shrinkage parameter across space, that is, λik = 1 for all subjects i = 1, … , np and sites k = 1, … , ns. By contrast, the SHS prior for the slopes allows spatially varying shrinkage, βi = f(δi) ⊙ ζi. Regarding the basis function matrix Q, we consider a low-rank representation of either the sample covariance or the CAR covariance41 with first-order neighbors. Here, a site neighbors the one or two sites on the same buccal/lingual side of the same tooth on the same side of the same jaw, the site on the tooth’s opposite buccal/lingual side, and the site directly above/below on the opposite jaw. Therefore, the four most posterior sites in the buccal side have two neighbors, the other sites in the buccal side have three, and all others have four. Consider the site at location 5 of tooth 15 in Figure 1 as an example. Its four neighbors are locations 4 and 6 on the same lingual side of tooth 15, location 2 on tooth 15’s buccal side, and location 5 of tooth 18 directly below on the opposite jaw. The CAR covariance is proportional to (M − ρA)⁻¹, where M is the diagonal matrix with the elements m1, … , mns indicating the number of neighbors for sites 1, … , ns, ρ is the spatial dependence parameter, and A is the adjacency matrix, with Aij = 1 if sites i and j are neighbors and Aij = 0 otherwise. The spatial dependence parameter ρ does not directly quantify the correlation between neighbors; however, correlations generally increase with ρ.
We set ρ = 0.99, which gives moderate spatial dependence.42 The numbers of eigenvectors in the basis function matrix Q, L = 11 and L = 53, are chosen to give 70% and 90% explained variation in the sample covariance, respectively, and are used for both the sample and CAR covariances. We also compared ρ = 0.5 and ρ = 0.9, and found no substantial improvement.
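The CAR structure described above can be made concrete with a small sketch that builds the matrix M − ρA from a neighbor list. The function name `car_precision` and the four-site layout are hypothetical and chosen only for illustration; the actual tooth map has 168 sites with the neighbor rules given in the text.

```python
from typing import Dict, List, Set


def car_precision(neighbors: Dict[int, Set[int]], rho: float) -> List[List[float]]:
    """Return M - rho * A for sites 0..n-1; its inverse is proportional to the CAR covariance."""
    n = len(neighbors)
    precision = [[0.0] * n for _ in range(n)]
    for site, adjacent in neighbors.items():
        if site in adjacent:
            raise ValueError("a site cannot neighbor itself")
        precision[site][site] = float(len(adjacent))  # M: number of neighbors
        for other in adjacent:
            if site not in neighbors[other]:
                raise ValueError("the neighbor relation must be symmetric")
            precision[site][other] = -rho             # -rho * A
    return precision


# Toy layout (hypothetical, not the tooth map): four sites on a line.
toy_neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
toy_precision = car_precision(toy_neighbors, rho=0.99)
```

For ρ strictly between 0 and 1 each diagonal entry exceeds the sum of the absolute off-diagonal entries in its row, so the matrix is strictly diagonally dominant and therefore invertible. The eigenvectors of the resulting covariance could then play the role of the basis matrix Q.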
Table 1 presents the prediction results, based on 100 MCMC iterations. For both L = 11 and L = 53 basis functions, the SHS model with the low-rank representation of the sample covariance produces the smallest predicted mean squared error (MSE) for the observed yijk. Compared with L = 11, the MSE is smaller with L = 53 for all models, and with L = 53 the MSE of the SHS model based on the sample covariance is roughly half the MSE of the Gaussian CAR model. Coverage is close to the nominal level 95% for all models. Using a Dell Optiplex 9020 computer with 64-Bit Windows 10, Intel i7–4790 3.6 GHz processor and 32 GB RAM, the computing times (in minutes) for the Gaussian (SHS) models are approximately 17 (23) and 34 (46), for L = 11 and 53, respectively.
TABLE 1.
HP data analysis results
| Statistic | Model | Covariance | Estimate (L = 11) | SE (L = 11) | Estimate (L = 53) | SE (L = 53) |
|---|---|---|---|---|---|---|
| 100×MSE | Gaussian | Sample | 89.57 | 1.23 | 54.82 | 0.87 |
| 100×MSE | Gaussian | CAR | 101.31 | 1.37 | 84.08 | 0.91 |
| 100×MSE | SHS | Sample | 83.85 | 2.02 | 43.85 | 0.99 |
| 100×MSE | SHS | CAR | 94.36 | 1.44 | 60.11 | 1.02 |
| Coverage (%) | Gaussian | Sample | 93.72 | 0.12 | 94.79 | 0.13 |
| Coverage (%) | Gaussian | CAR | 93.86 | 0.11 | 96.15 | 0.08 |
| Coverage (%) | SHS | Sample | 93.64 | 0.12 | 94.31 | 0.15 |
| Coverage (%) | SHS | CAR | 94.10 | 0.11 | 95.40 | 0.12 |
| Computing time (min) | Gaussian | Sample | 16.89 | – | 34.72 | – |
| Computing time (min) | Gaussian | CAR | 17.01 | – | 34.26 | – |
| Computing time (min) | SHS | Sample | 22.63 | – | 45.57 | – |
| Computing time (min) | SHS | CAR | 23.90 | – | 47.13 | – |
Note: Comparison of prediction accuracy between the Gaussian and spatial horseshoe (SHS) models using the low-rank representation of the sample covariance and conditional autoregressive (CAR) covariance, with the number of basis functions L = 11, 53. Methods are compared using mean squared error (MSE), coverage %, and computing time (in minutes) for 100 MCMC iterations.
5.2. Interpreting eigenvectors
Figures 2 and 3 illustrate the first to fourth and the fifth to eighth eigenvectors of the sample covariance, respectively. We interpret the first eigenvector as the overall mean of PPD; the second as a weighted average of PPD with more emphasis on the posterior teeth; the third puts more weight on the teeth in the mandibular side (ie, lower jaw); and the fourth puts more weight on the posterior teeth but more anterior part than the second eigenvector. The other four eigenvectors in Figure 3 exhibit several local features.
FIGURE 2.

The first four eigenvectors of the sample covariance
FIGURE 3.

The fifth to eighth eigenvectors of the sample covariance
Similarly, Figures A1 and A2 (in Appendix A1) present the first to fourth, and the fifth to seventh and twelfth, eigenvectors of the CAR covariance in a tooth map, respectively. The first 11 eigenvectors of the CAR covariance change horizontally, from the posterior to the anterior to the posterior region, and are nearly identical for both jaws. Starting from the twelfth eigenvector, there are differences between the maxillary side (ie, upper jaw) and the mandibular side, and between the buccal side and the lingual side. The first eigenvector serves as the overall mean of PPD. The second eigenvector puts more emphasis on the left-posterior region and decreases toward the right-posterior region. The third eigenvector is similar to the second in the sample covariance. The remaining eigenvectors in the CAR covariance depict varying characteristics in subregions of a mouth.
5.3. Summary of the fitted models
In this subsection, we summarize the fit of the Gaussian and SHS models with L = 53 eigenvectors to the HP data. To avoid excessive false positives, we implement the Bayesian spatial false discovery rate (BSFDR) procedure43 with rate 0.01 to control for multiple testing. We consider the one-sided null and alternative hypotheses H0 : βik ≤ 0 and H1 : βik > 0, for i = 1, …, 7279 and k = 1, …, 168. We reject the null if the posterior probability of the alternative exceeds the threshold T. The BSFDR procedure determines T, such that the false discovery rate is approximately 0.01. The critical probabilities are T = 94.95% for the Gaussian models and T = 96.97% for the SHS models. The proportions of sites for which H0 is rejected across subjects are 4.81% and 7.29% for the Gaussian and SHS models, respectively. Hence, the SHS model appears to be more powerful.
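The idea of choosing a probability threshold to control a Bayesian false discovery rate can be sketched as follows. This is a generic, nonspatial thresholding rule given only for illustration and is not claimed to be the cited BSFDR procedure: it ranks the posterior probabilities of the alternative and rejects the largest top-ranked set whose average posterior null probability stays at or below the target rate. The function name `bayes_fdr_threshold` and the toy probabilities are hypothetical.

```python
from typing import Sequence, Tuple


def bayes_fdr_threshold(post_prob: Sequence[float], rate: float) -> Tuple[float, int]:
    """Return (T, number rejected), rejecting H0 wherever P(H1 | data) >= T.

    The rejected set is the largest top-ranked set whose average posterior
    null probability, mean(1 - p), is at most `rate`.
    """
    ranked = sorted(post_prob, reverse=True)
    null_mass = 0.0
    n_reject = 0
    for count, p in enumerate(ranked, start=1):
        null_mass += 1.0 - p                 # expected number of false discoveries
        if null_mass / count <= rate:
            n_reject = count
    # Drop a tie group that straddles the cut so that "p >= T" selects exactly n_reject sites
    while 0 < n_reject < len(ranked) and ranked[n_reject] == ranked[n_reject - 1]:
        n_reject -= 1
    if n_reject == 0:
        return 1.0, 0                        # nothing can be rejected at this rate
    return ranked[n_reject - 1], n_reject


# Toy posterior probabilities P(beta > 0 | Y) for six sites (hypothetical values).
toy_probs = [0.999, 0.50, 0.995, 0.10, 0.99, 0.90]
threshold, n_rejected = bayes_fdr_threshold(toy_probs, rate=0.01)
```

In this toy example the three sites with probabilities 0.999, 0.995, and 0.99 are rejected; adding the site with probability 0.90 would raise the average null probability to about 0.029, above the 0.01 target.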
Figures 4 and 5 plot the fitted results with L = 53 eigenvectors for the two subjects (henceforth labelled “Subject 1” and “Subject 2”) with the greatest difference in the posterior mean of βik between the Gaussian and SHS models. The posterior mean for Subject 1 in the SHS model is larger in teeth 5, 6, 14, 15, and 18 than in the Gaussian model. The map of posterior probability, P(βik > 0|Y), indicates that the PPD in the left side of the mouth has increased significantly within the first 2 years. Compared with the Gaussian model, SHS finds more significant deterioration of PPD in the buccal side of the lower jaw and in the middle of the right upper jaw (eg, teeth 2, 5, and 6). For Subject 2, the posterior means are larger in the left maxillary side for the SHS model compared with the Gaussian model. The map of posterior probability shows similar results. Comparing models, we find more significant sites and stronger spatial clustering of the signal in the SHS model, compared with the Gaussian model.
FIGURE 4.

Posterior mean of βk and posterior probability P(β1k > 0|Y), k = 1,…, 168 for Subject 1 in the Gaussian and spatial horseshoe models with L = 53 eigenvectors
FIGURE 5.

Posterior mean of βk and posterior probability P(β2k > 0|Y), k = 1, …, 168 for Subject 2 in the Gaussian and spatial horseshoe models with L = 53 eigenvectors
Figure 6 illustrates that, compared with the Gaussian model, the density of the posterior means of βik (combining all subjects) from the SHS model has higher concentration around zero, with heavier tails. Moreover, Figure 7 plots the average rejection rate among all subjects by teeth. Aside from teeth in the posterior region, the SHS model also detects progression of teeth in the mandibular side and a few in the right-maxillary side of the mouth. In addition, Figure 8 presents the correlation among the basis functions for the covariances Σa and Σz using the low-rank representation of the sample covariance and CAR covariance. It is not surprising that almost all basis functions are uncorrelated, given the eigendecomposition of the two covariances. The only exception is the first and second basis functions in Σa, where we observe a weak negative correlation using the sample covariance.
FIGURE 6.

Density plot (left) and quantile-quantile plot (right) of the posterior mean of site-specific linear-trend coefficient βik for subjects i = 1, …, 7279 and k = 1, …, 168 from the Gaussian and spatial horseshoe models with the low-rank representation of the sample covariance under 90% explained variation
FIGURE 7.

The average rejection rate across subjects by teeth from the Gaussian and spatial horseshoe models with L = 53 eigenvectors. The rejection rule is available in the beginning of Section 5.3
FIGURE 8.

Posterior mean of the correlation matrix corresponding to the covariances Σa and Σz from the spatial horseshoe model using the sample covariance (left column) and conditional autoregressive (CAR) covariance (right column) in the basis function matrix Q. The diagonal values are all 1, and removed for better illustration
Fitting our model to the entire dataset of 7279 subjects is time-consuming. However, this is required only once offline to estimate population parameters. Fitting the model to one subject as would be done in practice is fast. The computing times are 0.27 and 0.36 minutes, with L = 11 using the low-rank representation of the sample covariance for the Gaussian and SHS models, respectively. When L = 53, it takes 0.63 and 1.17 minutes for the Gaussian and SHS models, respectively.
6. SIMULATION STUDY
In this section, we conduct a brief simulation study to examine the benefits of using shrinkage priors to detect increases in PD. For all simulations, we restrict the spatial domain to be one jaw (ie, the 84 sites on 14 teeth) and generate data for 50 subjects. For each subject, the intercept is generated from the CAR model (defined in Section 5.1) (αi1, …, αi84)T ~ Normal(3·1, 0.5²S), where 1 denotes the vector of ones, S = (M − 0.99A)⁻¹, A is the 84 × 84 adjacency matrix with (u, v) element equal to one if sites u and v are adjacent, and zero otherwise (including the diagonal), and M is the diagonal matrix with ith element equal to the number of sites that are adjacent to site i. The slopes βik are generated to be the same within a tooth, and independent across teeth and subjects. The slopes for a tooth are assigned value β0 with probability π0, and 0 with probability 1 − π0. Given the slopes and intercepts, the data are generated as for time steps j = 1, …, 5. Therefore, in the simulation, the data are not integer-valued as in the real data analysis. The simulations vary by the effect size β0 ∈ {0.50, 1.00} and proportion of nonnull slopes π0 ∈ {0.05, 0.20}. For each combination of these factors, we generate 100 datasets.
For each dataset, we fit three models. The first model is the Gaussian model with δi set to zero (“Gaussian”). The second model is the horseshoe model that uses data from all five visits (“HS5”), and the third model is the horseshoe model that uses data from only the first four visits (“HS4”). For each model, we use the full CAR covariance to determine the latent-factor structure, that is, Q and D are set to the eigenvectors and eigenvalues, respectively, of S. For each model, we use the priors given in Section 4.1, and generate 5000 MCMC samples after discarding 1000 as burn-in. This gives estimates of the posterior means and posterior probabilities that βik is positive, denoted qik. We conclude that the slope is positive if qik > 0.9. Table 2 reports the MSE of the posterior mean of βik (averaged over tooth-site and subject) and the Type I error and power (also averaged over tooth-site and subject) of the test for a positive slope.
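The Type I error and power summaries for the decision rule qik > 0.9 can be computed as in the sketch below. The function name `type1_and_power` and the toy inputs are hypothetical; in the simulation the probabilities qik would come from the MCMC output and the truth from the generated slopes.

```python
from typing import Sequence, Tuple


def type1_and_power(
    q: Sequence[float], truly_positive: Sequence[bool], cutoff: float = 0.9
) -> Tuple[float, float]:
    """Return (Type I error, power) for the rule "declare a positive slope if q > cutoff"."""
    if len(q) != len(truly_positive):
        raise ValueError("q and truly_positive must have the same length")
    flags_null = [qi > cutoff for qi, pos in zip(q, truly_positive) if not pos]
    flags_signal = [qi > cutoff for qi, pos in zip(q, truly_positive) if pos]
    # Share of truly null sites that are flagged (NaN if there are none)
    type1 = sum(flags_null) / len(flags_null) if flags_null else float("nan")
    # Share of truly positive sites that are flagged (NaN if there are none)
    power = sum(flags_signal) / len(flags_signal) if flags_signal else float("nan")
    return type1, power


# Toy example (hypothetical values): six sites, three with truly positive slopes.
toy_q = [0.95, 0.20, 0.91, 0.50, 0.99, 0.30]
toy_truth = [True, False, False, False, True, True]
toy_type1, toy_power = type1_and_power(toy_q, toy_truth)
```

Here one of the three null sites is flagged (Type I error 1/3) and two of the three truly positive sites are flagged (power 2/3).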
TABLE 2.
Summary of the simulation study
| Statistic | Effect size β0 | Proportion nonnull π0 | Gaussian | HS5 | HS4 |
|---|---|---|---|---|---|
| MSE | 0.5 | 0.05 | 19.6(0.3) | 1.4(0.1) | 2.4(0.1) |
| MSE | 0.5 | 0.20 | 19.6(0.3) | 3.2(0.1) | 4.7(0.1) |
| MSE | 1.0 | 0.05 | 19.6(1.5) | 1.5(0.1) | 3.1(0.0) |
| MSE | 1.0 | 0.20 | 19.7(0.3) | 3.8(0.1) | 7.5(0.1) |
| Type I error | 0.5 | 0.05 | 1.1(0.1) | 0.3(0.1) | 0.4(0.1) |
| Type I error | 0.5 | 0.20 | 1.1(0.4) | 0.4(0.1) | 0.4(0.1) |
| Type I error | 1.0 | 0.05 | 1.1(0.1) | 0.4(0.1) | 0.4(0.1) |
| Type I error | 1.0 | 0.20 | 1.1(0.1) | 0.8(0.1) | 0.6(0.1) |
| Power | 0.5 | 0.05 | 12.1(0.5) | 23.0(0.4) | 8.5(0.3) |
| Power | 0.5 | 0.20 | 11.9(0.5) | 22.6(0.2) | 8.4(0.1) |
| Power | 1.0 | 0.05 | 48.7(1.0) | 90.9(0.2) | 58.7(0.5) |
| Power | 1.0 | 0.20 | 48.0(0.9) | 91.0(0.1) | 58.5(0.2) |
Note: The competing models are the Gaussian model, and the horseshoe (“HS”) model that uses data from four (HS4), or five (HS5) visits. The simulations vary depending on the effect size β0 and proportion of nonnull slopes π0. MSE, Type I error and power are multiplied by 100, and standard errors are given in parentheses.
The MSE is dramatically smaller for the HS prior than the Gaussian prior, especially when the proportion of nonnull slopes is low (π0 = 0.05). All three methods are conservative, with Type I error less than 0.05 in all cases. The HS prior that uses the full dataset is more powerful than the Gaussian prior in every scenario. When the effect size is large (β0 = 1.0), even the horseshoe prior that uses data from only the first four visits is more powerful than the Gaussian model that uses data from all five visits, although this does not hold for the smaller effect size (β0 = 0.5).
7. DISCUSSION
In this article, we propose a spatiotemporal model for detecting local changes in PD. We place the SHS prior on the site-specific linear time trends for each subject. We introduce a low-rank representation to reduce the computational load, and obtain a nonstationary spatial covariance that suits the HP data and provides more flexibility. The empirical results show improved prediction compared with alternatives that rely on the usual Gaussian priors for the regression parameters and a CAR specification for the covariance structure. R code for fitting the proposed model is available on request from the corresponding author.
A potential limitation of our model is the assumption that PPD changes linearly in time. We believe this is reasonable because we use PPD responses collected at the subjects’ dental visits within a 2-year period. This linear time trend can be modified (via splines, or other functional structures) to suit a longer study duration.44 One possibility is to allow for a higher order trend at each site, but let the same shrinkage parameter λik appear in the prior SD of all terms so that they shrink jointly toward the static-mean model, following developments45 for nonspatial data. A second restriction is that, although some spatiotemporal dependence is induced by the random slopes and intercepts, the errors are assumed to be independent. We think this is sufficient for the HP data in that PPDs were measured independently across subjects, visits, and sites. However, this may not hold for other datasets. Although we found the eigen-decomposition based on the sample covariance matrix to yield better results than the spatial CAR model, we have yet to explore more sophisticated parametric correlation structures.23,46 In addition, our shrinkage prior for the slopes is symmetric. While negative regression coefficients are plausible, as PPD can decrease with interventions such as improvements in dental hygiene, increasing PPD is more common and more relevant for disease monitoring, and so an asymmetric shrinkage prior could prove useful.
Relying on previous oral health studies,11,25 one may be tempted to consider informative missingness, or the “missing-not-at-random” scenario, within a spatiotemporal setup. Missing teeth are indicative of poor periodontal health. Hence, a specific region with many missing teeth is likely to have higher PPD at the nonmissing sites in that region, an observation that can be attributed to spatial clustering. However, in the present analysis, we feel the working MAR assumption is reasonable, given that subjects rarely lose teeth during the short follow-up time we are considering, and thus periodontal health assessment can rely on changes in PPD over time at nonmissing sites. Furthermore, our current flagging algorithm considers only the baseline PPD, whereas other covariates (sociodemographic, behavioral, and so on) may also influence signal detection. All these are important avenues of future research, and will be considered elsewhere.
ACKNOWLEDGEMENTS
This work was supported by grant R01DE024984 from the National Institutes of Health. The authors thank B.S.M., B.D.R., S.K., and B.A.R. for providing the HealthPartners dataset, and the context behind this work.
Funding information
Foundation for the National Institutes of Health, Grant/Award Number: R01-DE024984-01A1
APPENDIX
A1. Plots of estimated parameters
FIGURE A1.

The first four eigenvectors of the conditional autoregressive (CAR) covariance
FIGURE A2.

The fifth to seventh and the twelfth eigenvectors of the conditional autoregressive (CAR) covariance
SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.
REFERENCES
- 1. World Health Organization. Global early warning system for major animal diseases, including zoonoses (GLEWS). http://www.who.int/zoonoses/outbreaks/glews/en/; 2007.
- 2. Beltrán-Aguilar ED, Malvitz DM, Lockwood SA, Rozier R, Gary TSL. Oral health surveillance: past, present, and future challenges. J Public Health Dentist. 2003;63:141–149.
- 3. Eke PI, Dye BA, Wei L, Thornton-Evans GO, Genco RJ. Prevalence of periodontitis in adults in the United States: 2009 and 2010. J Dental Res. 2012;91:914–920.
- 4. Cheng Y-SL, Jordan L, Chen H-S, et al. Chronic periodontitis can affect the levels of potential oral cancer salivary mRNA biomarkers. J Periodontal Res. 2017;52:428–437.
- 5. Persson GR, Persson RE. Cardiovascular disease and periodontitis: an update on the associations and risk. J Clinical Periodontology. 2008;35:362–379.
- 6. Vavricka SR, Manser CN, Hediger S, et al. Periodontitis and gingivitis in inflammatory bowel disease: a case-control study. Inflammatory Bowel Diseases. 2013;19:2768–2777.
- 7. Tran DT, Gay I, Du Xianglin L, et al. Assessment of partial-mouth periodontal examination protocols for periodontitis surveillance. J Clinical Periodontology. 2014;41:846–852.
- 8. Page RC, Eke PI. Case definitions for use in population-based surveillance of periodontitis. J Periodontology. 2007;78:1387–1399.
- 9. Michalowicz HJS, Philstrom BL. Is change in probing depth a reliable predictor of change in clinical attachment loss? J Am Dental Assoc. 2013;144:171–178.
- 10. Bandyopadhyay LVH, Abanto-Valle CA, Ghosh P. Linear mixed models for skew-normal/independent bivariate responses with an application to periodontal disease. Stat Med. 2010;29:2643–2655.
- 11. Reich BJ, Bandyopadhyay D. A latent factor model for spatial data with informative missingness. The Annals of Applied Statistics. 2010;4:439–459.
- 12. Unkel S, Farrington CP, Garthwaite PH, Robertson C, Andrews N. Statistical methods for the prospective detection of infectious disease outbreaks: a review. J Royal Stat Soc Ser A (Stat Soc). 2012;175:49–82.
- 13. Knox EG, Bartlett MS. The detection of space-time interactions. J Royal Stat Soc Ser C (Appl Stat). 1964;13:25–30.
- 14. Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Res. 1967;27:209–220.
- 15. Jacquez GM. A k nearest neighbor test for space-time interaction. Stat Med. 1996;15:1935–1949.
- 16. Rogerson PA, Ikuho Y. Monitoring change in spatial patterns of disease: comparing univariate and multivariate cumulative sum approaches. Stat Med. 2004;23:2195–2214.
- 17. Kulldorff M, Heffernan R, Hartman J, Assunção R, Mostashari F. A space–time permutation scan statistic for disease outbreak detection. PLOS Med. 2005;2:e59.
- 18. Li G, Best N, Hansell AL, Ahmed I, Richardson S. BaySTDetect: detecting unusual temporal patterns in small area data via Bayesian model choice. Biostatistics. 2012;13:695–710.
- 19. Vidal Rodeiro CL, Lawson Andrew B. Monitoring changes in spatio-temporal maps of disease. Biomet J. 2006;48:463–480.
- 20. Zhou H, Lawson AB. EWMA smoothing and Bayesian spatial modeling for health surveillance. Stat Med. 2008;27:5907–5928.
- 21. Watkins RE, Eagleson S, Veenendaal B, Wright G, Plant AJ. Disease surveillance using a hidden Markov model. BMC Med Inform Decis Mak. 2009;9:39.
- 22. Lee EC, Asher JM, Goldlust S, Kraemer JD, Lawson AB, Bansal S. Mind the scales: harnessing spatial big data for infectious disease surveillance and inference. J Infect Diseas. 2016;214:S409–S413.
- 23. Reich BJ, Hodges JS, Carlin BP. Spatial analyses of periodontal data using conditionally autoregressive priors having two classes of neighbor relations. J Am Stat Assoc. 2007;102:44–55.
- 24. Jin IH, Yuan Y, Bandyopadhyay D. A Bayesian hierarchical spatial model for dental caries assessment using non-Gaussian Markov random fields. Ann Appl Stat. 2016;10:884–905.
- 25. Reich BJ, Bandyopadhyay D, Bondell HD. A nonparametric spatial model for periodontal data with non-random missingness. J Am Stat Assoc. 2013;108:820–831.
- 26. Cai B, Bandyopadhyay D. Bayesian semiparametric variable selection with applications to periodontal data. Stat Med. 2017;36:2251–2264.
- 27. Reich BJ, Hodges JS. Modeling longitudinal spatial periodontal data: a spatially-adaptive model with tools for specifying priors and checking fit. Biometrics. 2008;64:790–799.
- 28. Jhuang A-T, Fuentes M, Jones JL, et al. Spatial signal detection using continuous shrinkage priors. Technometrics. 2019;61:494–506.
- 29. Ishwaran H, Rao JS. Spike and slab variable selection: frequentist and Bayesian strategies. Ann Stat. 2005;33:730–773.
- 30. Carvalho CM, Polson NG, Scott JG. The Horseshoe estimator for sparse signals. Biometrika. 2010;97:465–480.
- 31. Goldsmith J, Huang L, Crainiceanu CM. Smooth scalar-on-image regression via spatial Bayesian variable selection. J Comput Graphical Stat. 2014;23:46–64.
- 32. Boehm Vock LF, Reich BJ, Fuentes M, Dominici F. Spatial variable selection methods for investigating acute health effects of fine particulate matter components. Biometrics. 2015;71:167–177.
- 33. Ročková V, George EI. The Spike-and-Slab LASSO. J Am Stat Assoc. 2018;113:431–444.
- 34. Polson NG, Scott JG. Shrink globally, act locally: sparse Bayesian regularization and prediction. Bayesian Stat. 2010;9:501–538.
- 35. Bhadra A, Datta J, Polson NG, Willard B. Default Bayesian analysis with global-local shrinkage priors. Biometrika. 2016;103:955–969.
- 36. Quteish TDSM. Periodontal reasons for tooth extraction in an adult population in Jordan. J Oral Rehabilitat. 2003;30:110–112.
- 37. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2019.
- 38. Nelsen RB. An Introduction to Copulas. New York, NY: Springer; 2006.
- 39. Pearson K. LIII. On lines and planes of closest fit to systems of points in space. London Edinburgh Dublin Philosoph Mag J Sci. 1901;2(11):559–572.
- 40. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3rd ed. Boca Raton, FL: Chapman and Hall/CRC; 2013.
- 41. Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. 2nd ed. Boca Raton, FL: Chapman and Hall/CRC; 2014.
- 42. Gelfand AE, Vounatsou P. Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics. 2003;4:11–15.
- 43. Sun W, Reich BJ, Tony CT, Guindani M, Schwartzman A. False discovery control in large-scale spatial multiple testing. J Royal Stat Soc Ser B (Stat Methodol). 2015;77:59–83.
- 44. Corberán-Vallet A, Lawson AB. Chapter 27: Spatial health surveillance. In: Lawson AB, Banerjee S, Haining RP, Ugarte MD, eds. Handbook of Spatial Epidemiology. Boca Raton, FL: Chapman and Hall/CRC; 2016:501–519.
- 45. Wei R, Reich BJ, Hoppin JA, Ghosal S. Sparse Bayesian additive nonparametric regression with application to health effects of pesticides mixtures. Statistica Sinica. 2020;30:55–79.
- 46. Mancl LA, Leroux BG. Efficiency of regression estimates for clustered data. Biometrics. 1996;52:500–511.
