Author manuscript; available in PMC: 2021 Jul 20.
Published in final edited form as: Stat Med. 2020 Apr 3;39(16):2139–2151. doi: 10.1002/sim.8536

BAREB: A Bayesian repulsive biclustering model for periodontal data

Yuliang Li 1, Dipankar Bandyopadhyay 2, Fangzheng Xie 1, Yanxun Xu 1,*
PMCID: PMC7272289  NIHMSID: NIHMS1577761  PMID: 32246534

Summary

Preventing periodontal diseases (PD) and maintaining the structure and function of teeth are important goals for personal oral care. To understand the heterogeneity in patients with diverse PD patterns, we develop BAREB, a Bayesian repulsive biclustering method that can simultaneously cluster the PD patients and their tooth sites after taking the patient- and site-level covariates into consideration. BAREB uses the determinantal point process (DPP) prior to induce diversity among different biclusters to facilitate parsimony and interpretability. Since PD progression is hypothesized to be spatially referenced, BAREB factors in the spatial dependence among tooth sites. In addition, since PD is the leading cause of tooth loss, the missing data mechanism is non-ignorable. Such nonrandom missingness is incorporated into BAREB. For posterior inference, we design an efficient reversible jump Markov chain Monte Carlo sampler. Simulation studies show that BAREB is able to accurately estimate the biclusters, and compares favorably to alternatives. For a real-world application, we apply BAREB to a dataset from a clinical PD study, and obtain desirable and interpretable results. A major contribution of this paper is the Rcpp implementation of our methodology, available in the R package BAREB.

Keywords: Biclustering, Determinantal point process, Markov chain Monte Carlo, Periodontal disease, Spatial association

1 |. INTRODUCTION

Periodontal disease (PD), a chronic and widespread inflammatory disease, can damage the soft tissues and bones that support the teeth, leading to loosening and eventual loss of teeth. In addition to impacting the quality of life,1 PD has been linked to a number of systemic diseases, such as heart disease and diabetes.2,3 Therefore, preventing PD and maintaining the structure and function of teeth are important goals of personal oral care. The most popular clinical biomarker quantifying the progression of PD is the clinical attachment level (CAL), defined as the distance (in mm) from the cementoenamel junction to the base of the periodontal pocket. In clinical studies, the CAL is measured at six pre-specified sites on each tooth (excluding the four third molars) using a periodontal probe, which leads to 168 measurements for a full mouth with no missing teeth (for illustration, see Figure F1 in the Supplementary Materials).

PD data are complex and multi-level (patient-, tooth-, and site-level), and traditional summary-based statistical approaches, such as the mean, sum scores, or maximum site-level values, lead to an inevitable loss of information when applied to patient-level evaluation.4,5,6 To mitigate this, and to address other salient features of PD data such as non-random missingness, spatial association, and non-Gaussianity of responses, Reich and colleagues7,8 proposed Bayesian inference under a mixed-effects modeling framework. However, in estimating covariate effects on the CAL response, the authors assumed that all patients share the same coefficients. This assumption is questionable, as the rate of PD progression can be very different among patients, with possible clustering of patients according to PD incidence.

The motivation for this work comes from a clinical study of oral PD assessment for Gullah-speaking African-American Diabetic patients (henceforth, GAAD study) residing in the coastal South Carolina sea-islands.3 To evaluate the heterogeneity of PD incidence among patients, available one-dimensional clustering methods group either the patients or their tooth sites separately,9 according to disease status. While useful, such clustering techniques cannot identify co-localized tooth sites that are important in inferring patient-level clustering. Furthermore, the clustering of tooth sites should depend on which subgroup of patients we focus upon, given that different subgroups may partition the tooth sites in different ways, indicating different PD patterns. Therefore, it is desirable to learn whether there exist subgroups of patients such that, within each subgroup, the PD incidence at some tooth sites differs from that at others, and also differs from that of patients in other subgroups. This inferential framework is further complicated by missing data. In the GAAD dataset, a considerable proportion of patients (around 95%) have missing teeth, with an average of 32% of teeth missing per patient. This missingness is often assumed non-ignorable in PD studies,7 since PD is a major cause of tooth loss; estimating the counterfactual CAL values that the sites of a missing tooth would have had if the tooth were present can therefore facilitate subgroup identification. This is especially relevant given the spatial associations observed in PD progression studies, i.e., proximally located teeth and tooth sites may exhibit more similar PD patterns than distally located ones, because a missing (failed) tooth is predictive of higher PD status and is often expected to be surrounded by teeth with high CAL.10 We aim to fill these gaps in the existing PD literature by developing a probabilistic biclustering, or two-dimensional clustering, method.11

The current literature on biclustering is considerably rich,12,13,14,15 with applications to genomics and other fields. However, all these biclustering methods focus on the mean in each bicluster and fail to incorporate important covariate information (such as age and gender) and the aforementioned data characteristics typical of PD. Also, from a Bayesian standpoint, placing independent priors on the bicluster-specific parameters remains popular due to its computational convenience and flexibility.13,15,16 However, such an approach can cause over-fitting and redundant biclusters, leading to inferences that are hard to interpret.17 For example, the NoB-LCP biclustering method of Xu et al.16 used Dirichlet process (DP) priors,18 where the atoms are independent and identically distributed (i.i.d.) from a base distribution, to infer clustering of histone modifications and genomic locations. Due to the properties of the DP, NoB-LCP inferred a large number of small clusters with very few genomic locations, leading to unnecessarily complex models and poor interpretability.

To this end, we develop BAREB, a BAyesian REpulsive Biclustering model to study the heterogeneity in patients with diverse PD patterns. Our contributions are threefold. First, under a matrix formulation of the CAL responses (with rows as patients and columns as tooth sites), the proposed method simultaneously clusters the study patients and their tooth sites, taking into account the spatial association among tooth sites and non-random missingness patterns, and provides model-based posterior probabilities for these random partitions. These biclusters are defined via consistent associations between CAL values and covariates among a subset of tooth sites for a subgroup of patients. In other words, our proposed model clusters two patients together if they give rise to the same partition of tooth sites. Second, to address the issues with independent priors, we place a repulsive prior, the determinantal point process (DPP),19 on the bicluster-specific regression coefficients, which encourages diversity in PD patterns among different biclusters. Bayesian inference using DPP priors has proved to be extremely effective in facilitating parsimony and interpretability in a variety of mixture and latent feature allocation models, with biomedical applications to infer clinically meaningful subpopulations.20,21 Third, BAREB introduces a (latent) shared-parameter framework to deal with the non-ignorable missingness, with the resulting marginal mixture density accommodating the non-Gaussianity of CAL responses. In addition, integrated R and C++ code for implementing BAREB in other application domains is available via the R package BAREB.22

The rest of the paper proceeds as follows. In Section 2, we present the statistical formulation for BAREB and develop the Bayesian inferential framework, with the associated choice of priors, joint likelihood, posteriors, and model comparison measures. In Section 3, we evaluate the finite sample performance of BAREB, and the advantages of using the repulsive DPP over plausible alternatives via a simulation study. Application of our method to the motivating GAAD data is presented in Section 4. Finally, we conclude, with a discussion in Section 5.

2 |. STATISTICAL MODEL AND INFERENCE

2.1 |. Formulating BAREB

We use an N × J matrix $Y = [y_{ij}]$ to represent the observed CAL values, with $y_{ij}$ denoting the CAL for patient i at tooth site j, where i = 1, …, N; j = 1, …, J. From the motivating GAAD study, we consider patients with at least one tooth present and a complete set of covariates. Missing CAL values are denoted as $y_{ij}$ = NA. Note that a tooth site is missing if and only if the corresponding tooth is missing. Hence, we define the missingness indicator $\delta_i^{(t)}$ at the tooth level, i.e., $\delta_i^{(t)} = 1$ if tooth t of patient i is missing, and $\delta_i^{(t)} = 0$ otherwise, i = 1, …, N; t = 1, …, T.

BAREB clusters any two patients together if they have the same partition of tooth sites after accounting for patient-level covariates. Since the clustering of tooth sites is nested within clusters of patients, we start the model construction with a random partition of patients {1, …, N}, denoting by $e = (e_1, \ldots, e_N)$ the vector of patient cluster membership indicators. Let S be the number of patient clusters, where $e_i = s$ indicates that patient i belongs to patient cluster s, s = 1, …, S. We place a categorical prior on e, such that

$$e_i \overset{\text{iid}}{\sim} \text{Categorical}(w), \quad i = 1, \ldots, N, \qquad (1)$$

where $w = (w_1, \ldots, w_S)$ with $\sum_{s=1}^{S} w_s = 1$. We assume a Dirichlet prior on w, such that $w \sim \text{Dirichlet}(\alpha)$, where $\alpha = (\alpha_1, \ldots, \alpha_S)$.

Next, we consider the clustering of tooth sites within each of the S patient clusters. Recall that the partition of tooth sites is nested within patient clusters, i.e., site clusters can differ across patient clusters. Let $D_s$ be the number of tooth site clusters for the s-th patient cluster. Define $r_s = (r_{s1}, \ldots, r_{sJ})$ as the vector of clustering labels $r_{sj} \in \{1, \ldots, D_s\}$ that describe the partition of tooth sites corresponding to the s-th patient cluster, where $r_{sj} = d$ denotes that tooth site j is assigned to site cluster d in patient cluster s. Letting $r = (r_1, \ldots, r_S)$, we assume independent categorical priors for each $r_s$, given by

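$$p(r \mid e) = \prod_{s=1}^{S} p(r_s) \quad \text{and} \quad r_{sj} \overset{\text{iid}}{\sim} \text{Categorical}(\phi_s), \quad j = 1, \ldots, J, \qquad (2)$$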

where $\phi_s \sim \text{Dirichlet}(\alpha_s^{\phi})$, with $\alpha_s^{\phi} = (\alpha_{s1}^{\phi}, \ldots, \alpha_{sD_s}^{\phi})$. The prior probability models on the biclustering in (1) and (2) can be characterized as a partition of patients and a partition of tooth sites nested within each cluster of patients. These biclusters provide valuable information on the periodontal decay patterns of teeth and the heterogeneity among PD patients for developing subsequent prevention and treatment strategies.

Given e and r, we construct a sampling model for the CAL as:

$$y_{ij} = x_i^{\top}\beta_i + z_j^{\top}\gamma_{ij} + v_{ij} + \epsilon_{ij}. \qquad (3)$$

Here, $x_i$ is the vector of patient-level covariates (e.g., age) for patient i with the corresponding regression parameter $\beta_i$; $z_j$ is the vector of tooth site-level covariates (e.g., jaw indicator), including an intercept term, with the corresponding regression parameter $\gamma_{ij}$; $v_{ij}$ models the spatial dependence among tooth sites; and $\epsilon_{ij}$ are independent errors, $\epsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$.

The prior probability models for $\beta_i$ and $\gamma_{ij}$ make use of the biclustering. We define $p(\beta_i, \gamma_{ij} \mid e, r)$ as follows. If $e_i = s$ and $r_{sj} = d$, we assume that all sites in site cluster d of patient cluster s share the parameter $\tilde{\gamma}_{sd}$, and all patients in patient cluster s share the parameter $\tilde{\beta}_s$, i.e., $\beta_i = \tilde{\beta}_s$ for all i with $e_i = s$, and $\gamma_{ij} = \tilde{\gamma}_{sd}$ for all i and j with $e_i = s$ and $r_{sj} = d$. Figure 1 presents a graphical illustration of the proposed BAREB model with 8 patients and 6 tooth sites. Here, we assume three patient clusters, with cluster #1 having two site clusters, cluster #2 having three site clusters, and cluster #3 having two site clusters. Different colors indicate the different biclusters with the corresponding parameters. Within each patient cluster, all patients share the same $\tilde{\beta}_s$, but have different $\tilde{\gamma}_{sd}$'s across the site clusters, with d = 1, …, $D_s$; s = 1, …, S. We discuss the priors for $\tilde{\beta}_s$ and $\tilde{\gamma}_{sd}$ in Section 2.2.
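To make the nested structure concrete, the following is a minimal NumPy sketch (not the implementation in the R package BAREB) that draws e and r from the priors (1) and (2) and then assembles the mean part of (3), $x_i^{\top}\beta_i + z_j^{\top}\gamma_{ij} + v_{ij}$, from the bicluster-specific parameters. The function names `sample_biclustering` and `mean_matrix`, the 0-based labels, the common Dirichlet concentration, and the toy dimensions (8 patients, 6 sites, site-cluster counts (2, 3, 2) as in Figure 1) are assumptions made for illustration.

```python
import numpy as np


def sample_biclustering(N, J, S, D, alpha=1.0, rng=None):
    """Draw patient labels e (eq. 1) and nested site labels r (eq. 2).

    D[s] is the number of site clusters in patient cluster s.
    Labels are 0-based here, whereas the text uses 1-based labels.
    """
    rng = np.random.default_rng(rng)
    w = rng.dirichlet(alpha * np.ones(S))              # w ~ Dirichlet(alpha)
    e = rng.choice(S, size=N, p=w)                      # e_i ~ Categorical(w)
    r = np.empty((S, J), dtype=int)
    for s in range(S):
        phi_s = rng.dirichlet(alpha * np.ones(D[s]))    # phi_s ~ Dirichlet(alpha_s^phi)
        r[s] = rng.choice(D[s], size=J, p=phi_s)        # r_sj ~ Categorical(phi_s)
    return e, r


def mean_matrix(X, Z, e, r, beta_tilde, gamma_tilde, V):
    """mu_ij = x_i' beta_tilde[s] + z_j' gamma_tilde[s][r_sj] + v_ij, with s = e_i."""
    N, J = X.shape[0], Z.shape[0]
    mu = np.empty((N, J))
    for i in range(N):
        s = e[i]
        patient_part = X[i] @ beta_tilde[s]             # shared by patient cluster s
        for j in range(J):
            d = r[s, j]                                 # site cluster of j within s
            mu[i, j] = patient_part + Z[j] @ gamma_tilde[s][d] + V[i, j]
    return mu


# Toy configuration mirroring Figure 1; all parameter values are random.
rng = np.random.default_rng(0)
N, J, S, D = 8, 6, 3, [2, 3, 2]
e, r = sample_biclustering(N, J, S, D, rng=1)
X = rng.normal(size=(N, 2))
Z = np.column_stack([np.ones(J), rng.normal(size=J)])   # intercept + one site covariate
beta_tilde = [rng.normal(size=2) for _ in range(S)]
gamma_tilde = [[rng.normal(size=2) for _ in range(D[s])] for s in range(S)]
mu = mean_matrix(X, Z, e, r, beta_tilde, gamma_tilde, np.zeros((N, J)))
```

The double loop mirrors the definition entry by entry rather than vectorizing it, so that the mapping from $(e_i, r_{sj})$ to $(\tilde{\beta}_s, \tilde{\gamma}_{sd})$ stays visible.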

FIGURE 1.


An illustration of the proposed BAREB model with 8 patients and 6 tooth sites. Here, we assume 3 patient clusters.

To account for non-randomly missing data, we introduce a latent variable gi(t), such that δi(t) = I(gi(t) > 0), t = 1, …, T. We model gi(t) as

$$g_i^{(t)} \sim N\big(\mu_i^{*(t)}, 1\big) \quad \text{and} \quad \mu_i^{*(t)} = c_0 + c_1 R_t^{\top}\mu_i, \qquad (4)$$

where $\mu_i = (\mu_{i1}, \ldots, \mu_{iJ})$ with each element $\mu_{ij} = x_i^{\top}\beta_i + z_j^{\top}\gamma_{ij} + v_{ij}$; $R_t$ is a J-dimensional vector with $R_t(j) = 1/6$ if site j is on tooth t, and 0 otherwise; and $c = (c_0, c_1)$ is the unknown (estimable) parameter vector that controls the relationship between the CAL response and the probability of a missing tooth. Under this shared-parameter joint modeling framework,7 the mean $\mu_i$ is shared between the two regression models, with the mean of $g_i^{(t)}$ depending on the average CAL value over the six sites of tooth t for patient i. If $c_1 > 0$, higher CAL values lead to an increased probability of a missing tooth, and vice versa. Integrating out $g_i^{(t)}$ in (4) gives $P(\delta_i^{(t)} = 1) = \Phi(\mu_i^{*(t)})$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution.
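The missingness model can be checked numerically. The sketch below, in Python with NumPy, assumes that sites are ordered so that tooth t occupies six consecutive positions (a hypothetical layout; the text only specifies that $R_t$ averages the six sites of tooth t), computes $\Phi(\mu_i^{*(t)})$ in closed form, and compares it with the missingness rate obtained by simulating $g_i^{(t)}$ directly. The values $c_0 = -1$, $c_1 = 0.2$ and the constant site mean of 2 mm are illustrative only, not estimates from the paper.

```python
import math

import numpy as np

SITES_PER_TOOTH = 6


def tooth_means(mu_i):
    """R_t' mu_i for every tooth t: the average of its six site-level means.

    Assumes (hypothetically) that tooth t occupies sites 6t, ..., 6t + 5.
    """
    return mu_i.reshape(-1, SITES_PER_TOOTH).mean(axis=1)


def prob_missing(mu_i, c0, c1):
    """P(delta_i(t) = 1) = Phi(c0 + c1 * R_t' mu_i), with g_i(t) integrated out."""
    mu_star = c0 + c1 * tooth_means(mu_i)
    return np.array([0.5 * (1.0 + math.erf(m / math.sqrt(2.0))) for m in mu_star])


def simulate_missing(mu_i, c0, c1, rng):
    """Draw g_i(t) ~ N(mu*_i(t), 1) and return delta_i(t) = I(g_i(t) > 0)."""
    mu_star = c0 + c1 * tooth_means(mu_i)
    g = rng.normal(loc=mu_star, scale=1.0)
    return (g > 0).astype(int)


mu_i = np.full(168, 2.0)                      # 28 teeth x 6 sites, illustrative means
p = prob_missing(mu_i, c0=-1.0, c1=0.2)       # Phi(-0.6), about 0.274 for every tooth
rng = np.random.default_rng(0)
draws = np.array([simulate_missing(mu_i, -1.0, 0.2, rng) for _ in range(2000)])
```

With $c_1 > 0$, raising the site means raises every tooth's missingness probability, which is the non-ignorable mechanism described above.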

2.2 |. Priors and joint likelihood

In this subsection, we discuss the priors on the linear coefficients $\tilde{\beta}_s$'s and $\tilde{\gamma}_{sd}$'s, and on the spatial term $v_{ij}$. In practice, independent priors on the bicluster-specific parameters $\tilde{\beta}_s$'s and $\tilde{\gamma}_{sd}$'s are often preferred due to their computational tractability. However, such an approach can produce redundant clusters, resulting in inferences that are hard to interpret as biologically or clinically meaningful subpopulations.15,16 Xu et al.21 proposed using the DPP as a prior to induce repulsiveness among component-specific parameters in Gaussian mixture models, and showed that the DPP prior yields more parsimonious and interpretable inference than independent priors. Here, we extend the use of the DPP to our biclustering setup to encourage repulsive coefficients $\tilde{\beta}_s$'s and $\tilde{\gamma}_{sd}$'s, thereby inducing diverse PD patterns among different biclusters.

Let $C^{\beta}$ denote an S × S positive semidefinite matrix constructed through a covariance function, $C^{\beta}_{ss'} = C_{\beta}(\tilde{\beta}_s, \tilde{\beta}_{s'})$. We modify the DPP prior used in Xu et al.21 by defining a prior on $(\tilde{\beta}_1, \ldots, \tilde{\beta}_S)$, with respect to the S-dimensional Lebesgue measure on $\mathbb{R}^S$, as

$$p(\tilde{\beta}_1, \ldots, \tilde{\beta}_S \mid S) = \frac{1}{Z_S} \det[C^{\beta}](\tilde{\beta}_1, \ldots, \tilde{\beta}_S), \qquad (5)$$

where $Z_S$ is the normalizing constant and $\det[C^{\beta}](\tilde{\beta}_1, \ldots, \tilde{\beta}_S)$ is the determinant of the matrix $[C^{\beta}_{ss'}]_{S \times S}$. Geometrically, the determinant can be interpreted as the volume of the parallelotope spanned by the column vectors of $C^{\beta}$. Therefore, the prior in (5) defines a repulsive point process, since equal or similar column vectors span a smaller volume than very diverse ones. Specifically, $p(\tilde{\beta}_1, \ldots, \tilde{\beta}_S \mid S) = 0$ whenever $\tilde{\beta}_s = \tilde{\beta}_{s'}$ for some $s \neq s'$; in other words, the density vanishes at point configurations that contain replicates. In this paper, we use
$$C_{\beta}(\tilde{\beta}_s, \tilde{\beta}_{s'}) = \psi(\tilde{\beta}_s; 0, \sigma_{\beta}^2 I)\, \exp\left\{-\frac{\|\tilde{\beta}_s - \tilde{\beta}_{s'}\|_2^2}{2\theta_{\beta}^2}\right\} \psi(\tilde{\beta}_{s'}; 0, \sigma_{\beta}^2 I),$$
where $\theta_{\beta}$ is an unknown parameter that controls how repulsive the prior is, and $\psi(\cdot; \mu, \Sigma)$ is the density function of a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. If desired, alternative covariance functions can be used here without complicating the model. Similarly, we define the repulsive prior on $\tilde{\gamma}_{sd}$, s = 1, …, S, as follows:
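The repulsion in (5) can be seen directly by evaluating the unnormalized density, that is, the determinant without $1/Z_S$. The following NumPy sketch builds $C^{\beta}$ with the covariance function above for two-dimensional coefficient vectors and compares well-separated, nearly coincident, and duplicated configurations. The function names and the values $\sigma_{\beta}^2 = 10$ and $\theta_{\beta} = 1$ are hypothetical choices for illustration, not the settings used in the paper.

```python
import numpy as np


def mvn_density_iso(x, var):
    """psi(x; 0, var * I): density of N(0, var * I) evaluated at the vector x."""
    p = x.shape[0]
    return (2 * np.pi * var) ** (-p / 2) * np.exp(-(x @ x) / (2 * var))


def dpp_kernel(points, sigma2, theta):
    """C[s, s'] = psi(b_s) * exp(-||b_s - b_s'||^2 / (2 theta^2)) * psi(b_s')."""
    S = len(points)
    psi = np.array([mvn_density_iso(b, sigma2) for b in points])
    C = np.empty((S, S))
    for s in range(S):
        for t in range(S):
            d2 = np.sum((points[s] - points[t]) ** 2)
            C[s, t] = psi[s] * np.exp(-d2 / (2 * theta ** 2)) * psi[t]
    return C


def dpp_unnormalized_density(points, sigma2, theta):
    """det[C](b_1, ..., b_S), i.e. eq. (5) without the constant 1 / Z_S."""
    return np.linalg.det(dpp_kernel(points, sigma2, theta))


spread = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]   # diverse coefficients
close = [np.array([-0.1, 0.0]), np.array([0.1, 0.0])]    # nearly identical
dup = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]       # exact replicate
d_spread = dpp_unnormalized_density(spread, 10.0, 1.0)
d_close = dpp_unnormalized_density(close, 10.0, 1.0)
d_dup = dpp_unnormalized_density(dup, 10.0, 1.0)
```

Although the nearly coincident points sit closer to the origin (where the Gaussian factors $\psi$ are larger), the determinant still favors the spread configuration by roughly an order of magnitude, and it is zero for the duplicate.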

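$$p(\tilde{\gamma}_{s1}, \ldots, \tilde{\gamma}_{sD_s} \mid D_s) = \frac{1}{Z_{\gamma_s}} \det[C^{\gamma_s}](\tilde{\gamma}_{s1}, \ldots, \tilde{\gamma}_{sD_s}), \qquad (6)$$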

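where $C_{\gamma_s}(\tilde{\gamma}_{sd}, \tilde{\gamma}_{sd'}) = \psi(\tilde{\gamma}_{sd}; 0, \sigma_{\gamma}^2 I)\, \exp\{-\|\tilde{\gamma}_{sd} - \tilde{\gamma}_{sd'}\|_2^2 / (2\theta_{\gamma_s}^2)\}\, \psi(\tilde{\gamma}_{sd'}; 0, \sigma_{\gamma}^2 I)$. We further assume the priors $p(D_s) \propto Z_{\gamma_s}/D_s!$, $p(\theta_{\beta}) = N(0, \sigma_{\theta_{\beta}}^2)$, and $p(\theta_{\gamma_s}) = N(0, \sigma_{\theta_{\gamma}}^2)$, s = 1, …, S.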

For the parameters of the spatial model (7) below, we assume the priors $\sigma_{sp}^2 \sim IG(a_{sp}, b_{sp})$ and $\rho \sim \text{Unif}(a_{\rho}, b_{\rho})$, where IG(·, ·) and Unif(·, ·) denote the Inverse Gamma and Uniform densities, respectively. For the variance of the error $\epsilon_{ij}$, we assume a conjugate prior $\sigma^2 \sim IG(a_{\sigma^2}, b_{\sigma^2})$. The elements of the parameter vector c determining non-random missingness are assigned conjugate normal priors, i.e., $p(c_i) = N(0, \sigma_{c_0}^2)$, i = 0, 1. Note that we do not assume a prior on the number of patient clusters S, as allowing both S and $D_s$, s = 1, …, S, to be random would complicate posterior computation. We discuss how to choose S in Section 2.3.

Finally, we complete the model construction by assigning a prior to the spatial term vij. Let vi = (vi1, …, viJ). For each patient i, we assume

$$v_i \overset{\text{iid}}{\sim} \text{MVN}(0, \Sigma), \qquad (7)$$

where MVN(·, ·) denotes a multivariate normal density. Following Besag,23 we model the spatial effect with a conditional autoregressive (CAR) prior: $\Sigma = \sigma_{sp}^2 G(\rho)^{-1}$, where $G(\rho) = B - \rho W$. Here, B is a diagonal matrix whose jth diagonal entry is the number of neighbors of the jth site, and W denotes the adjacency matrix of the 168 tooth sites in the mouth. To construct the adjacency matrix, we treat as "neighbors" the adjacent sites on the same tooth and the sites that share a gap between teeth. The adjacency structure used in both the data analysis and the simulation studies is presented in Figure F2 in the Supplementary Materials (see Type I & II neighbors).
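As a sketch of the CAR construction, the code below forms $\Sigma = \sigma_{sp}^2 (B - \rho W)^{-1}$ and draws the $v_i$ in (7). The adjacency matrix is a hypothetical six-site ring standing in for the 168-site structure of Figure F2, which is not reproduced here, and the function names are illustrative. Assuming every site has at least one neighbor, $B - \rho W$ is symmetric and strictly diagonally dominant with a positive diagonal for $|\rho| < 1$, so $\Sigma$ is positive definite for such $\rho$.

```python
import numpy as np


def car_covariance(W, rho, sigma_sp2):
    """Sigma = sigma_sp^2 * (B - rho W)^{-1}, with B = diag(number of neighbours)."""
    W = np.asarray(W, dtype=float)
    B = np.diag(W.sum(axis=1))
    G = B - rho * W
    Sigma = sigma_sp2 * np.linalg.inv(G)
    return (Sigma + Sigma.T) / 2          # symmetrize away round-off


def sample_spatial_effects(W, rho, sigma_sp2, n_patients, rng):
    """Draw v_i ~ MVN(0, Sigma) independently for each patient (eq. 7)."""
    Sigma = car_covariance(W, rho, sigma_sp2)
    J = Sigma.shape[0]
    return rng.multivariate_normal(np.zeros(J), Sigma, size=n_patients)


# Hypothetical adjacency: six sites arranged in a ring, each with two neighbours.
J = 6
W = np.zeros((J, J))
for j in range(J):
    W[j, (j + 1) % J] = W[(j + 1) % J, j] = 1.0

Sigma = car_covariance(W, rho=0.96, sigma_sp2=4.0)   # values used in the simulation
V = sample_spatial_effects(W, 0.96, 4.0, n_patients=10, rng=np.random.default_rng(0))
```

With $\rho$ close to 1, neighboring sites are more strongly correlated than sites on opposite sides of the ring, which is the spatial smoothing the CAR prior is meant to encode.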

In summary, the joint model of BAREB factors as

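$$
\begin{aligned}
&\underbrace{p\big(Y \mid \{\tilde{\beta}_s\}_{s=1}^{S}, \{\tilde{\gamma}_{sd}\}_{d=1,s=1}^{D_s,\,S}, \{v_i\}_{i=1}^{N}, e, r, \sigma^2\big)}_{(3)}\,
\underbrace{p(e \mid S, w)}_{(1)}\,
\underbrace{p\big(r \mid e, S, \{\phi_s\}_{s=1}^{S}\big)}_{(2)}\,
\underbrace{\prod_{i=1}^{N}\prod_{t=1}^{T} p\big(\delta_i^{(t)} \mid c, \mu_i^{*(t)}\big)}_{(4)} \\
&\times \underbrace{p\big(\{\tilde{\beta}_s\}_{s=1}^{S}\big)}_{(5)}\, p(\sigma^2)\, p(c)\, p(\theta_{\beta})\, p(S)
\prod_{s=1}^{S} \Big[\underbrace{p\big(\{\tilde{\gamma}_{sd}\}_{d=1}^{D_s}\big)}_{(6)}\, p(D_s)\, p(\theta_{\gamma_s})\Big]\,
\underbrace{p\big(\{v_i\}_{i=1}^{N} \mid \sigma_{sp}^2, \rho\big)}_{(7)}\, p(\sigma_{sp}^2)\, p(\rho).
\end{aligned}
$$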

2.3 |. Posterior inference

We carry out Markov chain Monte Carlo (MCMC) simulations for posterior inference. One challenging step is updating the number of site clusters $D_s$ nested within each patient cluster. We design a reversible jump MCMC (RJMCMC) sampler24 that allows a random $D_s$ under the DPP prior for $(\tilde{\gamma}_{s1}, \ldots, \tilde{\gamma}_{sD_s})$, using the moment-matching principle25 in a multivariate setting. Details of the sampling procedures are described in Appendix B of the Supplementary Materials.

To determine the number of patient clusters S, we use, for computational efficiency, a model selection procedure based on the Watanabe-Akaike information criterion (WAIC)26 instead of designing an RJMCMC sampler over S. Compared to other popular model selection criteria such as AIC, BIC, and DIC,27 WAIC is based on the point-wise predictive density, is fast and computationally convenient, and is fully Bayesian in that it uses the full posterior distribution rather than a point estimate. WAIC has been widely used for model comparison and for estimating the number of components in mixture models.28,29,30 WAIC uses the estimated expected log point-wise predictive density, $\widehat{\text{elppd}}$, as the measure of model performance, and is defined as $\text{WAIC} = -2\,\widehat{\text{elppd}}$, where

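$$\widehat{\text{elppd}} = \sum_{i=1}^{N} \left\{ \log\left(\frac{1}{B}\sum_{b=1}^{B} p(y_i \mid \theta^{(b)})\right) - V_{b=1}^{B}\big(\log p(y_i \mid \theta^{(b)})\big) \right\}. \qquad (8)$$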

Here $y_i = (y_{i1}, \ldots, y_{iJ})$; B is the number of post-burn-in MCMC samples; $\theta^{(b)}$ is the posterior draw of the parameter vector at the bth iteration; and $V_{b=1}^{B}$ denotes the sample variance, $V_{b=1}^{B}(a_b) = \frac{1}{B-1}\sum_{b=1}^{B}(a_b - \bar{a})^2$ with $\bar{a} = \frac{1}{B}\sum_{b=1}^{B} a_b$. The first term within the braces in (8) is the log point-wise predictive density for the non-missing data, which can be regarded as a measure of goodness of fit; the second term is the estimated effective number of parameters, which acts as a penalty for model complexity. We run BAREB for a set of different S values and choose the S that yields the smallest WAIC.
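Given the pointwise log-likelihoods saved from the B post-burn-in draws, (8) takes only a few lines. The following Python sketch (the function name `waic` and the B × N input layout are assumptions) computes $\widehat{\text{elppd}}$ with a log-sum-exp for numerical stability and the 1/(B − 1) sample variance from the text, and returns $\text{WAIC} = -2\,\widehat{\text{elppd}}$; choosing S then amounts to evaluating it once per fitted S and taking the minimum.

```python
import numpy as np


def waic(loglik):
    """WAIC = -2 * elppd_hat from a (B x N) array of log p(y_i | theta^(b)).

    Row b holds the pointwise log-likelihoods of the N patients under the
    b-th post-burn-in draw.
    """
    loglik = np.asarray(loglik, dtype=float)
    B = loglik.shape[0]
    # log((1/B) * sum_b p(y_i | theta^(b))), computed stably via log-sum-exp
    m = loglik.max(axis=0)
    lppd = m + np.log(np.exp(loglik - m).sum(axis=0)) - np.log(B)
    # V_B: sample variance with the 1/(B - 1) divisor, as in the text
    p_waic = loglik.var(axis=0, ddof=1)
    elppd_hat = np.sum(lppd - p_waic)
    return -2.0 * elppd_hat
```

For example, if `loglik_by_S` were a (hypothetical) dictionary mapping each candidate S to its B × N log-likelihood array, `min(loglik_by_S, key=lambda S: waic(loglik_by_S[S]))` would return the selected S.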

Another challenge in implementing BAREB is summarizing a distribution over random partitions. We follow Dahl's approach31 and report a point estimate of the biclustering. Consider an N × N matrix H, where the element $H_{i_1,i_2} = \hat{P}(e_{i_1} = e_{i_2} \mid \text{data})$ represents the estimated posterior probability that patients $i_1$ and $i_2$ are clustered together, with $\hat{P}(\cdot)$ being the empirical posterior mean computed from the MCMC samples. Within each MCMC iteration, the sampled patient clustering indicator e defines an N × N clustering matrix $V^e$, with element $V^e_{i_1,i_2} = I(e_{i_1} = e_{i_2})$ indicating whether patient $i_1$ is clustered with patient $i_2$. With this, we propose a least-squares (LS) summary of the patient clustering that minimizes the Frobenius distance between $V^e$ and the matrix H of posterior pairwise co-clustering probabilities, $e^{LS} = \arg\min_{e} \|V^e - H\|_F^2$, which serves as a point estimate of the patient clustering. Conditional on $e^{LS}$, we extract the site-level clusterings $r_s$ from the MCMC iterations in which the patient clustering indicator e equals $e^{LS}$. We then compute the LS summary $r_s^{LS}$ of the site-level clustering for each patient-level cluster s through the same formulation.
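A compact sketch of the LS summary follows. It assumes, as in Dahl's original proposal, that the minimization is carried out over the clusterings visited by the sampler rather than over all possible partitions; the function names and the toy label draws are hypothetical.

```python
import numpy as np


def coclustering_matrix(labels):
    """V^e with V[i1, i2] = I(e_i1 == e_i2)."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def least_squares_clustering(samples):
    """LS summary over a (B x N) array of sampled label vectors.

    H is the empirical posterior co-clustering probability matrix; the summary
    is the sampled clustering whose V^e is closest to H in squared Frobenius norm.
    """
    samples = np.asarray(samples)
    H = np.mean([coclustering_matrix(e) for e in samples], axis=0)
    losses = [np.sum((coclustering_matrix(e) - H) ** 2) for e in samples]
    best = int(np.argmin(losses))
    return samples[best], H


# Four toy draws for four patients; {1,2 | 3,4} is visited three times
# (under two different labelings, which the co-clustering matrix ignores).
draws = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
e_ls, H = least_squares_clustering(draws)
```

Because $V^e$ depends only on which patients share a label, this summary is invariant to label switching, which is why it can serve as the reference for relabeling.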

The LS summary also plays a crucial role in handling the label switching problem typical of any RJMCMC implementation.32 Here, in a post-processing step after the MCMC run, we relabel the cluster membership indicator at each iteration to match the LS summary. For example, consider patient-level clustering, and let $e^b$ be the patient cluster membership indicator drawn at the bth iteration. The relabeled indicator is $e^b_{\text{new}} = \arg\min_{e_{\text{new}} \in \mathcal{A}(e^b)} \text{dist}(e^{LS}, e_{\text{new}})$. Here, dist(·, ·) is the number of positions at which the two input vectors differ, and $\mathcal{A}(e^b)$ denotes the set of $S_b!$ possible relabelings of $e^b$, where $S_b$ is the number of patient-level clusters at the bth iteration.
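The relabeling step can be written as a brute-force search over the $S_b!$ permutations of the labels, which is cheap for the small numbers of clusters considered here. The sketch below (hypothetical function name, 0-based labels) returns the permuted label vector closest to $e^{LS}$ in the number of mismatched positions.

```python
from itertools import permutations

import numpy as np


def relabel_to_reference(labels, reference, n_clusters):
    """Among all n_clusters! relabelings of `labels`, return the one that
    disagrees with `reference` in the fewest positions, and that count."""
    labels = np.asarray(labels)
    reference = np.asarray(reference)
    best, best_dist = labels, np.inf
    for perm in permutations(range(n_clusters)):
        candidate = np.asarray(perm)[labels]     # old label k becomes perm[k]
        dist = int(np.sum(candidate != reference))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


e_ls = np.array([0, 0, 1, 1, 2, 2])
e_b = np.array([2, 2, 0, 0, 1, 1])               # same partition, switched labels
e_new, d = relabel_to_reference(e_b, e_ls, 3)
```

For larger numbers of clusters, the same minimization can be solved as an assignment problem, for example with `scipy.optimize.linear_sum_assignment(..., maximize=True)` applied to the label contingency table, instead of enumerating all permutations.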

3 |. SIMULATION STUDY

In this section, we conduct simulation studies to evaluate the performance of BAREB by comparing the posterior inference to the simulation truth. Furthermore, to elucidate the advantages of using the repulsive DPP prior, which encourages the linear coefficients in different biclusters to be diverse, we compare BAREB to an alternative model that uses independent priors on these linear coefficients.

We simulated a data matrix Y with N = 80 patients and J = 168 tooth sites, with the true number of patient clusters $S^0 = 3$. Each of the 80 patients had the same probability of being assigned to each of the three clusters. We assumed that the three patient clusters partitioned the tooth sites into $(D_1^0, D_2^0, D_3^0) = (2, 3, 4)$ site clusters, respectively, where the tooth sites were allocated equally among the two site clusters in patient cluster 1, the three site clusters in patient cluster 2, and the four site clusters in patient cluster 3. Figure 2(a) illustrates the true biclustering scheme, with rows representing patients and columns representing tooth sites. We generated three patient-level covariates, $x_i = (x_{i1}, x_{i2}, x_{i3})$, with two continuous covariates $x_{i1}$ and $x_{i2}$ generated from $N(0, 3^2)$ and one binary covariate $x_{i3}$ generated from Bernoulli(0.5). The site-level covariate vector $z_j = (1, z_{j1}, z_{j2})$ was generated with $z_{j1} \sim N(0, 3^2)$ and $z_{j2} \sim$ Binomial(5, 0.5). The linear coefficients $\{\tilde{\beta}_s\}_{s=1}^{S^0}$ and $\{\tilde{\gamma}_{sd}\}_{d=1,s=1}^{D_s^0,\,S^0}$ were fixed as in Table T1 (Supplementary Materials). Conditional on $e_i = s$ and $r_{sj} = d$, the observed response was generated as $y_{ij} \mid e_i = s, r_{sj} = d \sim N(x_i^{\top}\tilde{\beta}_s + z_j^{\top}\tilde{\gamma}_{sd} + v_{ij}, \sigma_0^2)$, where $\sigma_0 = 1$ and $v_i$ was generated from the CAR model with parameters $\sigma_{sp}^2 = 4$ and $\rho = 0.96$. In the missingness model (4), we assumed $c = (c_0, c_1) = (0.1, 0.2)$, leading to about 20% missing teeth in the simulated data.

FIGURE 2.


Simulation illustration of biclustering: (panel a) heatmap of the true biclustering, (panel b) estimated biclustering under BAREB, and (panel c) estimated biclustering under the Indep model. Different colors represent different biclusters.

We applied the proposed BAREB to the simulated dataset. The hyperparameters were set to α = (1, …, 1), $\alpha_s^{\phi} = (1, \ldots, 1)$ for s = 1, …, S, $a_{\sigma^2} = b_{\sigma^2} = 1/2$, $\sigma_{\theta_{\beta}}^2 = 100$, $a_{sp} = 1$, $b_{sp} = 1$, $a_{\rho} = 0.95$, $b_{\rho} = 1$, $\sigma_{\theta_{\gamma}}^2 = 100$, $\sigma_{\beta}^2 = \sigma_{\gamma}^2 = 10^5$, and $\sigma_{c_0}^2 = 100$. We considered S ∈ {2, …, 10}. The RJMCMC sampler was run with an initial burn-in of 3000 iterations, followed by B = 2000 post-burn-in iterations. On a laptop computer with a 2 GHz Intel Core i5 processor and 8 GB of memory, 5000 iterations took around 45 minutes. Convergence diagnostics assessed using the R package coda revealed no issues. Figure F3, panel a (Supplementary Materials) plots WAIC versus S, indicating $\hat{S} = 3$, which matches the simulation truth. To further investigate the effectiveness of WAIC for model selection (a question raised by a reviewer), we also simulated data under the ground truth S = 4, and again found that WAIC is smallest at $\hat{S} = 4$. Next, we calculated the LS summary of the posterior on e and then, conditional on $e^{LS}$, the LS estimates of the site clusters $r^{LS}$. Figure 2(b) plots the LS summary of the posterior on e and r, showing that BAREB correctly assigns the patients to the three patient clusters. For the site clusters in each patient cluster, BAREB identified $\hat{D}_1 = 2$, $\hat{D}_2 = 3$, and $\hat{D}_3 = 4$, which also matches the simulation truth. As shown in Figure 2(b), BAREB assigns most of the tooth sites to their true site clusters. For instance, in patient cluster 1, only four sites in site cluster 1 were misclassified; in patient cluster 3, site clusters 1 and 4 were correctly identified, while only one site in cluster 2 and two sites in cluster 3 were misclassified. Table T1 (Supplementary Materials) reports the posterior means and mean squared errors (MSEs) of the estimated coefficients, where the MSE is computed as the mean squared difference between the posterior samples and the true values across post-burn-in iterations. Compared to the simulation truth, BAREB accurately estimates these coefficients with small MSEs. We also plot the 95% credible intervals (CIs) of the coefficients in Figure 3, where the black dots represent the true values. We observe that the 95% CIs are centered around the true values.

FIGURE 3.


Simulation study: posterior means and 95% CIs for the estimated $\{\tilde{\beta}_s\}_{s=1}^{\hat{S}}$ and $\{\tilde{\gamma}_{sd}\}_{d=1,s=1}^{\hat{D}_s,\,\hat{S}}$, the patient- and site-level parameters under BAREB. Black dots represent the simulated true values.

Furthermore, we conducted sensitivity analyses of BAREB with respect to both the hyperparameters and the covariance function used in the DPP prior. For the hyperparameters, we considered $\sigma_{\theta_{\beta}}^2 \in \{50, 100, 200\}$ and $\sigma_{\theta_{\gamma}}^2 \in \{50, 100, 200\}$. For the covariance function, we replaced the exponential covariance function used in the DPP prior (Section 2.2) with the rational quadratic covariance function $C_{\alpha,k}(x_1, x_2) = \left(1 + \frac{\|x_1 - x_2\|_2^2}{2\alpha k^2}\right)^{-\alpha}$ for both $C_{\beta}(\tilde{\beta}_s, \tilde{\beta}_{s'})$ and $C_{\gamma_s}(\tilde{\gamma}_{sd}, \tilde{\gamma}_{sd'})$. Figure F6 (Supplementary Materials) plots the estimated LS summary of the posterior on both patient clustering and site clustering for these sensitivity analyses. We observe that BAREB still correctly assigns the patients to the three patient clusters, and most of the tooth sites to their true site clusters, irrespective of the hyperparameter values and covariance function. We also summarize the corresponding MSEs for the patient-level and site-level coefficients in Table T2 (Supplementary Materials). These values are similar across the various choices of hyperparameters and covariance functions, indicating that BAREB is robust to these choices.

Next, to highlight the advantage of the repulsive DPP prior, we conducted another study in which the DPP priors on the $\tilde{\beta}_s$'s and $\tilde{\gamma}_{sd}$'s are replaced with independent multivariate normal (MVN) priors. To be precise, for the patient-level $\tilde{\beta}_s$, s = 1, …, S, we assume independent MVN priors. For the site cluster membership indicator $r_s$ in patient cluster s, we consider a Pólya urn prior $p(r_s) \propto \alpha_1^{D_s}\prod_{d=1}^{D_s}\Gamma(n_{sd})$, where $\alpha_1$ is the total mass parameter of the Pólya urn scheme, and $n_{sd}$ is the number of tooth sites in site cluster d of patient cluster s. Conditional on e and r, we assume independent MVN priors on $\tilde{\gamma}_{sd}$, d = 1, …, $D_s$; s = 1, …, S. We call this the 'Indep' model. Based on the WAIC criterion, Indep identified $\hat{S} = 3$ patient clusters, which agrees with the truth. Figure 2(c) plots the LS summary of the posterior on e and r under the Indep model. We observe that although Indep assigns patients to their true patient clusters, it fails to identify the tooth-site clusters within the patient clusters.
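For reference, the Pólya urn prior used by Indep can be evaluated in closed form once its normalizing constant is restored: for J sites the constant is $\Gamma(\alpha_1)/\Gamma(\alpha_1 + J)$, which gives the exchangeable partition probability function of the Chinese restaurant process. The Python sketch below (hypothetical function name) computes $\log p(r_s)$ from a label vector; as a sanity check, the probabilities of the five partitions of three sites sum to one.

```python
import math
from collections import Counter


def log_polya_urn_prior(labels, alpha1):
    """log p(r_s) under the Polya urn (Chinese restaurant process) prior.

    p(r_s) = alpha1^{D_s} * prod_d Gamma(n_sd) * Gamma(alpha1) / Gamma(alpha1 + J),
    where the last factor is the normalizing constant hidden by the proportionality.
    """
    sizes = list(Counter(labels).values())   # n_sd for each occupied site cluster
    J = sum(sizes)
    D = len(sizes)
    return (D * math.log(alpha1)
            + sum(math.lgamma(n) for n in sizes)
            + math.lgamma(alpha1) - math.lgamma(alpha1 + J))


# One label vector per set partition of three sites.
partitions = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [0, 1, 2]]
total = sum(math.exp(log_polya_urn_prior(p, 0.7)) for p in partitions)
```

The factor $\alpha_1^{D_s}$ rewards new clusters at a rate that does not shrink relative to J, which is consistent with the overestimated number of site clusters that Indep exhibits.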

Figure F4 (Supplementary Materials) compares the histograms of the posterior number of site clusters within each patient cluster under BAREB and Indep. Clearly, BAREB recovers the ground truth, while Indep overestimates the number of site clusters with substantial probability. This is a well-known phenomenon when Bayesian nonparametric priors are applied to clustering.16 Since Indep correctly estimates the number of patient clusters but incorrectly estimates the number of site clusters within each patient cluster, the estimated cluster-specific linear coefficients $\tilde{\beta}_s$ and $\tilde{\gamma}_{sd}$ cannot be compared directly between BAREB and Indep. Instead, using $\beta_i = \tilde{\beta}_s$ for all i with $e_i = s$ and $\gamma_{ij} = \tilde{\gamma}_{sd}$ for all i and j with $e_i = s$ and $r_{sj} = d$, we computed the MSEs between the posterior estimated and the true patient- and site-level coefficients under BAREB and Indep. For the patient-level coefficients $\beta_i$, the MSEs are 7.66e-3 (BAREB) and 8.50e-3 (Indep); for the site-level coefficients $\gamma_{ij}$, the MSEs are 0.278 (BAREB) and 0.467 (Indep). Thus, the DPP prior is advantageous over independent priors in this biclustering scenario.

4 |. APPLICATION: GAAD DATA

The GAAD study3 was primarily designed to explore the relationship between PD and diabetes status, the latter determined by the glycosylated haemoglobin (HbA1c) level. Excluding patients with all teeth missing, we have N = 288 patients in the dataset. We considered several patient-level covariates as potential risk factors for PD, which include Age (in years), Gender (female=1, male=0), Smoking indicator (smoker=1, non-smoker=0), and HbA1c (high level=1, controlled=0). We also considered a site-level covariate, the jaw indicator (upper jaw=1, lower jaw=0). Table T3 (Supplementary Materials) lists the patient characteristics in the dataset.

We applied BAREB to the GAAD dataset, considering S ∈ {2, …, 10}. The hyperparameters were set to α = (1, …, 1), $\alpha_s^{\phi} = (1, \ldots, 1)$ for s = 1, …, S, $a_{\sigma^2} = b_{\sigma^2} = 1/2$, $\sigma_{\theta_{\beta}}^2 = 100$, $a_{sp} = 1$, $b_{sp} = 1$, $a_{\rho} = 0.8$, $b_{\rho} = 1$, $\sigma_{\theta_{\gamma}}^2 = 100$, and $\sigma_{c_0}^2 = 100$. For each S, we used 5,000 post-burn-in samples after 10,000 iterations to compute the posterior estimates. WAIC identified $\hat{S} = 4$ patient clusters. We then computed the LS estimates $e^{LS}$ and $r^{LS}$ to summarize the posterior inference for the biclustering. The four patient clusters have sizes 3, 174, 80, and 31, and contain 3, 2, 2, and 1 site clusters, respectively. To examine the goodness of fit of BAREB to the observed GAAD data, we used the posterior predictive check of Gelman et al.,33 with the likelihood as the discrepancy function. The resulting posterior predictive p-value of 0.494 indicates adequate model fit.
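The posterior predictive check of Gelman et al. estimates $\Pr\{T(y^{rep}, \theta) \ge T(y, \theta) \mid y\}$ by drawing replicated data under each posterior draw. The sketch below illustrates the computation for a deliberately simple independent-normal model, using $T = -2 \times$ log-likelihood as the discrepancy; the toy model, data, and function names are assumptions for illustration and do not reproduce the BAREB likelihood. A value near 0 or 1 signals misfit, whereas values near 0.5, such as the 0.494 reported above, do not.

```python
import numpy as np


def gaussian_deviance(y, mean, sd):
    """Discrepancy T(y, theta) = -2 log p(y | theta) for independent normals."""
    z = (y - mean) / sd
    return np.sum(z ** 2 + np.log(2 * np.pi * sd ** 2))


def posterior_predictive_pvalue(y, mean_draws, sd_draws, rng):
    """Estimate Pr(T(y_rep, theta) >= T(y, theta) | y) over posterior draws."""
    exceed = 0
    for mean, sd in zip(mean_draws, sd_draws):
        y_rep = rng.normal(mean, sd)               # replicate data under this draw
        exceed += int(gaussian_deviance(y_rep, mean, sd) >= gaussian_deviance(y, mean, sd))
    return exceed / len(mean_draws)


rng = np.random.default_rng(1)
n, B = 50, 500


def toy_check(y):
    """Toy posterior for a model assuming y ~ N(mu, 1) with a flat prior on mu."""
    mu_draws = rng.normal(y.mean(), 1 / np.sqrt(n), size=B)
    return posterior_predictive_pvalue(y, [np.full(n, m) for m in mu_draws], np.ones(B), rng)


p_ok = toy_check(rng.normal(0.0, 1.0, size=n))    # data consistent with the model
p_bad = toy_check(rng.normal(0.0, 5.0, size=n))   # data far more dispersed than assumed
```

In the misspecified case the observed deviance is far larger than any replicated deviance, so the p-value collapses to 0.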

Figure 4 summarizes the posterior means and 95% credible intervals (CIs) of the covariate effects within the four estimated patient clusters. Note that we plot the parameters for patient cluster 1 separately, because their scale is very different from that of the other clusters. Also, because patient cluster 1 is small ($n_1 = 3$), its parameter estimates may not be reliable; as expected, its 95% CIs are much wider than those of the other patient clusters. A closer look at patient cluster 1 reveals that these 3 patients have at least half of their teeth missing, with an average missingness rate of 69%. Therefore, we excluded patient cluster 1 from the following clinical interpretations. As shown in Figure 4, the effects of the patient-level covariates are quite distinct among the patient clusters. Panel (a) shows that periodontal health deteriorates with age, since the 95% CIs for Age in all clusters lie entirely above 0. Panel (b) shows that males are more likely than females to have severe PD, confirming previous findings,35 while panel (c) reveals smoking, a factor long believed to increase the risk of PD,34 to be significantly and positively associated with PD, mostly in patient clusters 3 and 4. From panel (d), we observe that PD is positively associated with diabetes (HbA1c), suggesting that diabetes may be an important risk factor for PD.36 The estimates for the site-level Jaw indicator in panel (e) imply that teeth in the upper jaw (maxilla) are more likely to develop PD. Although this finding is inconclusive, previous studies37,38 suggest that some maxillary teeth, such as the left and right central incisors and the first premolar, experience a higher rate of missingness (due to PD) than the mandibular (lower jaw) teeth. In the GAAD dataset, the tooth missingness rates in the maxilla and mandible are 37% and 28%, respectively. Among the non-missing teeth, the mean CAL values (pooled over all teeth) are 2.03 mm and 1.83 mm for the maxilla and mandible, respectively.
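
Statements such as "the 95% CIs for Age lie entirely above 0" are read directly off the MCMC output. A minimal sketch, assuming `draws` holds the post-burn-in samples of one coefficient within one patient cluster (a hypothetical input), computes the posterior mean and a 95% credible interval. Equal-tailed intervals are an assumption here; a highest-density interval would be computed differently.

```python
import math

# Illustrative sketch; function names are hypothetical, not BAREB's API.


def quantile(sorted_x, p):
    """Linear-interpolation quantile of pre-sorted data (R type 7, NumPy's default)."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def summarize_draws(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval from MCMC draws."""
    xs = sorted(draws)
    tail = (1.0 - level) / 2.0
    lower = quantile(xs, tail)
    upper = quantile(xs, 1.0 - tail)
    return {
        "mean": sum(xs) / len(xs),
        "lower": lower,
        "upper": upper,
        # The interval excludes 0 when both endpoints lie on the same side of 0.
        "excludes_zero": lower > 0.0 or upper < 0.0,
    }
```

For example, `summarize_draws(age_draws)["excludes_zero"]`, with `age_draws` a hypothetical list of samples for the Age coefficient, is the check behind the panel (a) statement.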

FIGURE 4.

Posterior mean and 95% credible intervals for the parameters corresponding to the patient-level covariates: (a) Age, (b) Gender, (c) Smoking status indicator, (d) HbA1c, and the site-level covariate (e) Jaw indicator. In panel (e), P1S1 denotes site cluster 1 in patient cluster 1, P1S2 denotes site cluster 2 in patient cluster 1, etc.

Next, we report the site clustering in patient clusters 2 and 3. Patient cluster 1 is excluded owing to its small size, while patient cluster 4 is excluded because its LS estimate contains only one site cluster. Figure 5 (panels a and c) displays the ordinal CAL values of two randomly selected patients from patient clusters 2 and 3, respectively, based on the classification of the American Academy of Periodontology.39 The five ordinal categories are (i) no PD (CAL 0–1 mm), (ii) slight PD (CAL 1–2 mm), (iii) moderate PD (CAL 3–4 mm), (iv) severe PD (CAL ≥ 5 mm), and (v) missing (if the tooth is missing). Figure 5 (panels b and d) plots heatmaps of the estimated posterior probability of each pair of sites being clustered together, as described in subsection 2.3. BAREB estimates distinct site clustering patterns in different patient clusters. In particular, the randomly selected patient from patient cluster 2 is missing two maxillary molars and four mandibular molars. The dark squares in the corners of Figure 5 (panel b) show that these missing molar sites are clustered together; for example, the missing maxillary molar sites 1–6 and 79–84 are clustered along with the missing mandibular sites 157–168. Also, the four rectangular black patches reveal that tooth sites of the same type are more likely to be clustered together. In contrast, for the randomly selected patient from patient cluster 3, the corresponding plot (panel d) shows a checkerboard pattern, in which missing sites at different locations tend to be clustered together with high probability.
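
Both the heatmaps in panels (b) and (d) and the LS estimates used to summarize the biclustering start from the posterior co-clustering matrix. A minimal sketch, assuming the site labels within one patient cluster were saved at every MCMC draw (a hypothetical input layout), computes that matrix and the least-squares clustering of Dahl,31 which we take to be the basis of the LS estimates: it picks the sampled partition whose 0/1 association matrix is closest to the co-clustering matrix. Because labels enter only through equality tests, the result is unaffected by label switching.32

```python
# Illustrative sketch; function names are hypothetical, not BAREB's API.


def coclustering_matrix(label_draws):
    """Entry [i][j] estimates P(sites i and j share a cluster | data).

    label_draws[m][i] is the cluster label of site i at MCMC draw m.
    """
    n_draws = len(label_draws)
    n_items = len(label_draws[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in label_draws:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_draws for c in row] for row in counts]


def least_squares_clustering(label_draws):
    """Dahl's least-squares clustering: the sampled partition minimizing the
    summed squared difference between its association matrix and the
    posterior co-clustering matrix."""
    prob = coclustering_matrix(label_draws)
    n_items = len(prob)

    def squared_error(labels):
        return sum(
            (float(labels[i] == labels[j]) - prob[i][j]) ** 2
            for i in range(n_items)
            for j in range(n_items)
        )

    return min(label_draws, key=squared_error)
```

Plotting `coclustering_matrix(draws)` as an image gives a heatmap of the kind shown in panels (b) and (d); restricting the search to sampled partitions keeps the LS step cheap, at the cost of never proposing a partition the sampler did not visit.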

FIGURE 5.

Site-level clustering in GAAD data. Panels (a) and (b) plot the ordinal CAL levels and the heatmap of the posterior clustering probability of tooth-sites for a randomly selected patient from patient cluster 2. Panels (c) and (d) show the same quantities for a randomly selected patient from patient cluster 3.

Next, in Figure 6, we report the posterior predictive probabilities of the aforementioned ordinal CAL categories. For illustration, we consider a specific tooth-site (#120) on a mandibular incisor of a hypothetical patient of mean age (55.27 years), under all possible combinations of gender, smoking, and HbA1c levels, where F, M, N, S, L, and H denote female, male, non-smoker, smoker, controlled HbA1c, and high HbA1c, respectively. Considering a tooth-site with no or slight PD as ‘healthy’, the probability of being healthy is 0.561 for a female non-smoker with controlled HbA1c, while it is 0 for a male smoker with high HbA1c. Specifically, females have a higher probability of healthy teeth than males (for example, FNL = 0.561 versus MNL = 0.461); smokers have a lower probability than non-smokers (FSL = 0.542 versus FNL = 0.561); patients with controlled HbA1c have a higher probability of having no PD than those with high HbA1c (FNL = 0.423 versus MNH = 0.000); and so on. Figure F5 (Supplementary Materials) presents the density histogram of the CAL response, overlaid with the fitted curve generated from the marginal posteriors using Silverman’s rule-of-thumb smoothing bandwidth. The marginal density induced by BAREB is capable of accommodating the possible non-Gaussian features of the CAL. Thus, in lieu of non-Gaussian parametric (say, skew-t) or semiparametric assumptions for the error term $\epsilon_{ij}$, which may induce computational issues, we consider our $N(0, \sigma^2)$ assumption adequate. The posterior means and 95% credible intervals for the remaining parameters are presented in Table T4 (Supplementary Materials). Notably, the estimate of $c_1$ is positive and significant, which, unsurprisingly, implies that higher CAL values lead to a higher probability of missing teeth. The estimate of ρ, the spatial association parameter in the CAR model, is 0.843, which corresponds to moderate spatial correlation based on the calibration of ρ.40
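
Figure F5 smooths the fitted CAL distribution with Silverman's rule-of-thumb bandwidth. This is a minimal sketch that makes two assumptions: the kernel is Gaussian, and the curve is a kernel density estimate of CAL values drawn from the fitted model (our reading of the description, not a confirmed detail). It computes the bandwidth h = 0.9 · min(sd, IQR/1.34) · n^(−1/5), which is the same rule as R's `bw.nrd0`, and then the resulting density.

```python
import math
from statistics import stdev

# Illustrative sketch; function names are hypothetical, not BAREB's API.


def quantile(sorted_x, p):
    """Linear-interpolation quantile of pre-sorted data (R type 7)."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def silverman_bandwidth(x):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    xs = sorted(x)
    iqr = quantile(xs, 0.75) - quantile(xs, 0.25)
    spread = min(stdev(xs), iqr / 1.34)
    if spread <= 0.0:
        # Fall back to the standard deviation (or 1) when the IQR is zero.
        spread = stdev(xs) or 1.0
    return 0.9 * spread * len(xs) ** (-0.2)


def gaussian_kde(x, bandwidth):
    """Return f(t) = (1 / (n h)) * sum_i phi((t - x_i) / h), phi the N(0, 1) density."""
    n = len(x)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(t):
        return scale * sum(math.exp(-0.5 * ((t - xi) / bandwidth) ** 2) for xi in x)

    return density
```

Taking the minimum of the standard deviation and the scaled IQR makes the bandwidth less sensitive to heavy tails or skewness, which matters for a right-skewed response such as CAL.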

FIGURE 6.

Posterior predictive probabilities of ordinal CAL categories for site # 120 of a mandibular incisor from a hypothetical patient with age 55.27 years, under various combinations of gender, smoking and HbA1c levels.
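
The probabilities in Figure 6 are frequencies of ordinal categories among posterior predictive draws. The sketch below assumes one already has predictive draws of (CAL, missing) for site #120 under a given covariate profile such as FNL. Generating those draws is model-specific and not shown. It maps each draw to the five categories and tabulates them. The half-open cut points [0, 1), [1, 3), [3, 5), and [5, ∞) mm are an assumed convention for continuous CAL values, and all names are hypothetical.

```python
from collections import Counter

# Illustrative sketch; names and cut points are hypothetical assumptions.
CATEGORIES = ("none", "slight", "moderate", "severe", "missing")


def cal_category(cal, missing, cutoffs=(1.0, 3.0, 5.0)):
    """Map one predictive draw to an ordinal CAL category (assumed half-open bins)."""
    if missing:
        return "missing"
    if cal < cutoffs[0]:
        return "none"
    if cal < cutoffs[1]:
        return "slight"
    if cal < cutoffs[2]:
        return "moderate"
    return "severe"


def category_probabilities(draws):
    """draws: iterable of (cal, missing) pairs from the posterior predictive."""
    counts = Counter(cal_category(cal, miss) for cal, miss in draws)
    total = sum(counts.values())
    return {c: counts[c] / total for c in CATEGORIES}


def prob_healthy(probs):
    """'Healthy' = no PD or slight PD, as defined in the text."""
    return probs["none"] + probs["slight"]
```

Repeating `category_probabilities` for each of the eight gender, smoking, and HbA1c profiles would give one column of a plot like Figure 6.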

5 |. CONCLUSION

Our proposed BAREB can detect simultaneous clustering patterns among PD patients and tooth sites, factoring in patient- and site-level covariates, the spatial dependence among tooth sites, and possible nonrandom missingness patterns. In doing so, it improves upon available clustering techniques for learning the heterogeneity in PD among patients with distinct disease patterns. We also demonstrate the advantages of the DPP prior over independent priors for inducing diversity among biclusters, which promotes parsimony and interpretability. In addition, BAREB is readily available in R, and can be a welcome addition to a user’s toolbox.
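
The repulsion that distinguishes the DPP prior from independent priors can be seen numerically. A minimal sketch, assuming a Gaussian kernel with a hypothetical length scale (the kernel actually used in BAREB may differ), evaluates the unnormalized DPP term det(K) for a set of cluster-specific locations. For such a kernel det(K) ≤ 1 by Hadamard's inequality, and det(K) shrinks toward 0 as two locations coincide, so near-duplicate clusters receive little prior mass; an independent prior has no such term.

```python
import math

# Illustrative sketch; function names are hypothetical, not BAREB's API.


def determinant(a):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in a]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def gaussian_kernel_matrix(points, length_scale=1.0):
    """K[k][l] = exp(-||x_k - x_l||^2 / (2 * length_scale^2))."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    return [
        [math.exp(-sq_dist(p, q) / (2.0 * length_scale ** 2)) for q in points]
        for p in points
    ]


def dpp_repulsion(points, length_scale=1.0):
    """Unnormalized DPP term det(K): near 0 when two locations nearly coincide,
    approaching 1 when all locations are well separated."""
    return determinant(gaussian_kernel_matrix(points, length_scale))
```

For instance, `dpp_repulsion([[0.0], [0.1]])` is close to 0 while `dpp_repulsion([[0.0], [2.0]])` is close to 1; the length scale controls how far apart cluster locations must be before the penalty fades.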

Although motivated by an oral health application, BAREB provides a general framework for biclustering inference in many other applications involving a data matrix and covariates. For example, in gene expression data (where rows and columns represent genes and patient samples, respectively), BAREB can discover functionally related genes under different subsets of patients, factoring in various clinical covariates. Furthermore, our choice of a parametric selection model to tackle nonrandom missingness was also driven by computational considerations, which left no opportunity to introduce sensitivity parameters for assessing the nonrandom missingness.41 These are viable areas for future research, and will be pursued elsewhere.
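
The selection-model idea behind the positive estimate of $c_1$ reported earlier can be illustrated with a small simulation. The sketch assumes a logistic link, P(tooth missing | y) = 1/(1 + exp(−(c0 + c1·y))), with y a latent CAL value. The values of c0 and c1 and the CAL distribution are hypothetical. BAREB's actual missingness model may use a different link, additional covariates, or a tooth-level rather than site-level outcome. With c1 > 0, high-CAL values are preferentially lost, so the observed CAL values are biased downward. This is why an analysis that ignores the missingness mechanism would understate disease severity.

```python
import math
import random

# Illustrative sketch; the logistic link and every numeric value are assumptions.


def simulate_selection(n, c0, c1, mean=2.0, sd=1.0, seed=0):
    """Draw latent CAL values y ~ N(mean, sd^2) and drop each one with
    probability P(missing | y) = 1 / (1 + exp(-(c0 + c1 * y)))."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        y = rng.gauss(mean, sd)
        p_missing = 1.0 / (1.0 + math.exp(-(c0 + c1 * y)))
        full.append(y)
        if rng.random() >= p_missing:  # the site stays observed
            observed.append(y)
    return full, observed


full, observed = simulate_selection(20000, c0=-2.0, c1=1.0)
# With c1 > 0 the observed mean falls below the complete-data mean.
print(sum(full) / len(full), sum(observed) / len(observed))
```

Setting `c1=0.0` makes the missingness completely at random. In that case the two printed means agree up to sampling noise, which is the ignorable case that BAREB does not assume.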

Supplementary Material

Supp info2
Supp info1

ACKNOWLEDGEMENTS

The authors thank the Center for Oral Health Research at the Medical University of South Carolina for providing the GAAD dataset. The work of Bandyopadhyay was supported by NIH grant R01-DE024984. The work of Xu was supported by NSF grant 1918854.

References

  • 1. Ferreira M, Dias-Pereira A, Branco-de-Almeida L, Martins C, Paiva S. Impact of periodontal disease on quality of life: a systematic review. Journal of Periodontal Research 2017; 52(4): 651–665.
  • 2. Bahekar AA, Singh S, Saha S, Molnar J, Arora R. The prevalence and incidence of coronary heart disease is significantly increased in periodontitis: a meta-analysis. American Heart Journal 2007; 154(5): 830–837.
  • 3. Fernandes JK, Wiegand RE, Salinas CF, et al. Periodontal disease status in Gullah African Americans with type 2 diabetes living in South Carolina. Journal of Periodontology 2009; 80(7): 1062–1068.
  • 4. Pilgram TK, Hildebolt CF, Dotson M, et al. Relationships between clinical attachment level and spine and hip bone mineral density: data from healthy postmenopausal women. Journal of Periodontology 2002; 73(3): 298–301.
  • 5. Cho YI, Kim HY. Analysis of periodontal data using mixed effects models. Journal of Periodontal & Implant Science 2015; 45(1): 2–7.
  • 6. Nomura Y, et al. Site-level progression of periodontal disease during a follow-up period. PloS One 2017; 12(12): e0188670.
  • 7. Reich BJ, Bandyopadhyay D. A latent factor model for spatial data with informative missingness. The Annals of Applied Statistics 2010; 4(1): 439–459.
  • 8. Reich BJ, Bandyopadhyay D, Bondell HD. A nonparametric spatial model for periodontal data with nonrandom missingness. Journal of the American Statistical Association 2013; 108(503): 820–831.
  • 9. Bandyopadhyay D, Canale A. Non-parametric spatial models for clustered ordered periodontal data. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2016; 65(4): 619–640.
  • 10. Schnell P, Bandyopadhyay D, Reich BJ, Nunn M. A marginal cure rate proportional hazards model for spatial survival data. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2015; 64(4): 673–691.
  • 11. Cheng Y, Church GM. Biclustering of expression data In: Bourne P, et al., eds. Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB 2000). Vol. 8; 2000. (pp. 93–103).
  • 12. Getz G, Levine E, Domany E. Coupled two-way clustering analysis of gene microarray data. Proceedings of the National Academy of Sciences 2000; 97(22): 12079–12084.
  • 13. Gu J, Liu JS. Bayesian biclustering of gene expression data. BMC Genomics 2008; 9(Suppl 1): S4.
  • 14. Li G, Ma Q, Tang H, Paterson AH, Xu Y. QUBIC: a qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Research 2009; 37(15): e101. doi: 10.1093/nar/gkp491
  • 15. Lee J, Müller P, Zhu Y, Ji Y. A nonparametric Bayesian model for local clustering with application to proteomics. Journal of the American Statistical Association 2013; 108(503): 775–788.
  • 16. Xu Y, Lee J, Yuan Y, et al. Nonparametric Bayesian bi-clustering for next generation sequencing count data. Bayesian Analysis 2013; 8(4): 759–780.
  • 17. Xie F, Xu Y. Bayesian repulsive Gaussian mixture model. Journal of the American Statistical Association 2019.
  • 18. Neal RM. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 2000; 9(2): 249–265.
  • 19. Macchi O. The coincidence approach to stochastic point processes. Advances in Applied Probability 1975; 7(1): 83–122.
  • 20. Affandi RH, Fox E, Taskar B. Approximate inference in continuous determinantal processes. Advances in Neural Information Processing Systems 2013: 1430–1438.
  • 21. Xu Y, Müller P, Telesca D. Bayesian inference for latent biologic structure with determinantal point processes (DPP). Biometrics 2016; 72(3): 955–964.
  • 22. Li Y, Bandyopadhyay D, Xie F, Xu Y. R package BAREB, Version 0.1.0. 2020.
  • 23. Besag J. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological) 1974; 36(2): 192–225.
  • 24. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 1995; 82(4): 711–732.
  • 25. Zhang Z, Chan KL, Wu Y, Chen C. Learning a multivariate Gaussian mixture model with the reversible jump MCMC algorithm. Statistics and Computing 2004; 14(4): 343–355.
  • 26. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 2017; 27(5): 1413–1432.
  • 27. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002; 64(4): 583–639.
  • 28. Almond R. Comparison of two MCMC algorithms for hierarchical mixture models In: Kathryn Laskey RA., ed. Bayesian Modeling Application Workshop at the Uncertainty in Artificial Intelligence Conference 3 ed. 2014. (pp. 1–19).
  • 29. Zhao L, Shi J, Shearon TH, Li Y. A Dirichlet process mixture model for survival outcome data: assessing nationwide kidney transplant centers. Statistics in Medicine 2015; 34(8): 1404–1416.
  • 30. Susanto I, Iriawan N, Kuswanto H, Suhartono. Bayesian inference for the finite gamma mixture model of income distribution. Journal of Physics: Conference Series 2019; 1217: 012077. doi: 10.1088/1742-6596/1217/1/012077
  • 31. Dahl DB. Model-based clustering for expression data via a Dirichlet process mixture model In: Do KA, Müller P, Vannucci M., eds. Bayesian Inference for Gene Expression and Proteomics Cambridge University Press; 2006. (pp. 201–218).
  • 32. Jasra A, Holmes CC, Stephens DA. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science 2005; 20(1): 50–67.
  • 33. Gelman A, Meng XL, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 1996; 6(4): 733–760.
  • 34. Leite FR, Nascimento GG, Scheutz F, López R. Effect of smoking on periodontitis: A systematic review and metaregression. American Journal of Preventive Medicine 2018; 54(6): 831–841.
  • 35. Mamai-Homata E, Koletsi-Kounari H, Margaritis V. Gender differences in oral health status and behavior of Greek dental students: A meta-analysis of 1981, 2000, and 2010 data. Journal of International Society of Preventive & Community Dentistry 2016; 6(1): 60–68.
  • 36. Jansson H, Lindholm E, Lindh C, Groop L, Bratthall G. Type 2 diabetes and risk for periodontal disease: a role for dental health awareness. Journal of Clinical Periodontology 2006; 33(6): 408–414.
  • 37. Shigli K, Hebbal M, Angadi GS. Relative contribution of caries and periodontal disease in adult tooth loss among patients reporting to the Institute of Dental Sciences, Belgaum, India. Gerodontology 2009; 26(3): 214–218.
  • 38. Volchansky A, Cleaton-Jones P, Evans W, Shackleton J. Patterns of previous tooth loss in patients presenting at five different types of dental practice. South African Dental Journal 2016; 71(2): 70–74.
  • 39. Armitage GC. Development of a classification system for periodontal diseases and conditions. Annals of Periodontology 1999; 4(1): 1–6.
  • 40. Carlin BP, Gelfand AE, Banerjee S. Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC; 2014.
  • 41. Daniels MJ, Hogan JW. Missing data in longitudinal studies: Strategies for Bayesian modeling and sensitivity analysis. Chapman and Hall/CRC; 2008.
