Marginal Bayesian Semiparametric Modeling of Mismeasured Multivariate Interval-Censored Data

Li Li; Alejandro Jara; María José García-Zattera; Timothy E Hanson

doi:10.1080/01621459.2018.1476240

Author manuscript; available in PMC: 2019 Oct 26.

Published in final edited form as: J Am Stat Assoc. 2018 Oct 26;114(525):129–145. doi: 10.1080/01621459.2018.1476240


PMCID: PMC6711609 NIHMSID: NIHMS1515024 PMID: 31456598

Abstract

Motivated by data gathered in an oral health study, we propose a Bayesian nonparametric approach for population-averaged modeling of correlated time-to-event data when the responses can only be determined to lie in an interval obtained from a sequence of examination times and the determination of the occurrence of the event is subject to misclassification. The joint model for the true, unobserved time-to-event data is defined semiparametrically; proportional hazards, proportional odds, and accelerated failure time (proportional quantiles) marginal models are all fit and compared. The baseline distribution is modeled with a flexible tailfree prior. The joint model is completed by a parametric copula function. A general misclassification model is discussed in detail, allowing different examiners to be involved in assessing the occurrence of the events for a given subject across time. We provide empirical evidence that the model can be used to estimate the underlying time-to-event distribution and the misclassification parameters without any external information about the latter. We also illustrate the effect of neglecting misclassification on statistical inferences.

Keywords: Mismeasured continuous response, Multivariate survival data, Population-averaged modeling, Copula function

1. Introduction

Considerable attention has been given to estimation of survival functions and of regression coefficients from a variety of standard models for univariate and multivariate censored data (see, e.g. Hougaard, 2000; Ibrahim et al., 2001). For the analysis of multivariate censored survival data, frailty and marginal models have been discussed, including versions of the proportional hazards (PH) (Cox, 1972), accelerated failure time (AFT) (e.g., Hanson & Johnson, 2004), additive hazards (AH) (e.g., Lin & Ying, 1994), and proportional odds (PO) (e.g., Hanson & Yang, 2007) models.

These models usually assume that the event of interest is determined without error, which can be unrealistic. In fact, in many applications the ascertainment of the event of interest is based on a screening test that may not have perfect sensitivity and specificity. In this context, standard survival models can lead to wrong inferences about the distribution of the event times (García-Zattera et al., 2016).

Compared to the rich literature on methods correcting for misclassification in regression models for categorical data (see, e.g. García-Zattera et al., 2010, 2012, and references therein), models for time-to-event data have received much less attention, and the existing work has focused almost exclusively on misclassification and measurement error in covariates (see, e.g. Gong et al., 1990). We are aware of only two exceptions: McKeown & Jewell (2010), who propose a nonparametric maximum likelihood approach for misclassified univariate current status data, and García-Zattera et al. (2016), who extend the AFT frailty modeling approach to account for misclassification in the response for multivariate interval-censored data.

Frailty models are one of the most popular approaches to account for the association structure in time-to-event data. These models provide useful summary information in the absence of estimates of a baseline survival distribution and may be formulated in a parametric or semiparametric fashion. However, under these models the regression coefficients describe changes in individual responses due to changes in covariates, the models induce a particular association structure for the clustered variables, and they rely heavily on the (conditional or subject-specific) assumptions about the relationship between the covariates and the event times (e.g., AFT, PH, or PO), which are not always inherited by the induced marginal model. Furthermore, an often overlooked limitation of this approach is that the interpretation of regression coefficients can be highly sensitive to difficult-to-verify assumptions about the distribution of the random effects, particularly its dependence on covariates. This issue is especially relevant for interval-censored data, where the censoring mechanism limits the information available for diagnostic techniques.

In this article, we propose a general framework for analyzing the marginal effects of predictors on the distribution of mis-measured multivariate interval-censored data. Specifically, we define the joint distribution of the multivariate time-to-event variables by combining marginal distributions arising from standard assumptions on the relationship of the predictors and time-to-event responses, and a parametric copula function, which describes the dependence structure among the event times. To avoid the potential disadvantages of adopting a fully parametric probability model, we consider a Bayesian semiparametric specification of the marginal distributions, where the baseline distribution of the event times is modeled using a Bayesian nonparametric (BNP) prior. Different misclassification models allowing for different classifiers for each subject across examinations are discussed.

The rest of the paper is organized as follows. Section 2 introduces the motivating data and research questions. The proposed model is introduced in Section 3, including the discussion of aspects associated with its computational implementation. In Section 4, the performance of the proposed model is assessed by means of simulation studies. The simulated data are also used to evaluate the effect of neglecting the presence of misclassification in the statistical analysis. The proposed model is applied to our motivating problem in Section 5. A final discussion section concludes the paper.

2. The Dental Research Questions

The Signal-Tandmobiel^® (ST) study is a longitudinal prospective oral health study conducted in Flanders, Belgium, between 1996 and 2001. This study involved a sample of 4468 Flemish primary school children (2315 boys and 2153 girls). The sample represents 7.3% of the children born in 1989 in Flanders and first examined in 1996. At the first examination the average age of the children in the sample was 7.1 years, with a standard deviation of 0.4 years. The age of the children at the first examination varied from 6.1 to 8.1 years.

The children were randomly drawn through a stratified cluster sampling design without replacement. The selection units were the schools, which were stratified by province and educational system. Thus, the target population was divided into 15 different strata, comprising the three types of the Belgian educational system (private, public, and municipal schools) for the five provinces of Flanders. Schools were selected with a probability proportional to the number of children in the first year of primary school. Whenever a school was selected, all children in the first class of the selected school were included in the sample. The children were examined on a yearly basis during their primary school time (between 7 and 12 years of age).

The children were examined annually by one of sixteen dental examiners in a mobile dental clinic on the school premises, and the visit dates for each school were mainly determined by logistical considerations. Therefore, the number of visits and their timing were not related to any potential response variable or covariate gathered in the study. Clinical information was obtained from visual and tactile observations. It included information about gingival condition, dental trauma, presence and extent of enamel developmental defects, tooth decay, presence of restorations, missing teeth, stage of tooth eruption, and orthodontic treatment need, all scored using established criteria, as recommended by the 1987 WHO report, and based on the diagnostic criteria for caries prevalence surveys published by the British Association for the Study of Community Dentistry (BASCD) (Pitts et al., 1997). Besides the oral health data, information on oral hygiene and dietary habits, use of fluorides, dental attendance, medical history, and social demographic background of the children was also obtained from questionnaires completed by parents and school medical centers. For a more detailed description of the ST study we refer to Vanobbergen et al. (2000).

One of the main purposes of the ST study was to assess the marginal effect of covariates on the time to caries experience (CE). Caries lesions are typically scored at four levels of lesion severity: D₄ (dentine caries with pulpal involvement), D₃ (dentine caries with obvious cavitation), D₂ (hidden dentine caries), and D₁ (white or brown-spot initial lesions in enamel without cavitation). CE is an event indicating whether a particular tooth is decayed at least at the D₃ level, missing, or filled due to caries. Teeth extracted for reasons other than caries, e.g. orthodontic reasons, were coded differently and treated as missing values for CE.

CE as just defined is a monotone process. Thus, reversals in the longitudinal data, that is, teeth or surfaces initially recorded as carious and subsequently recorded as caries-free, provide evidence of classification errors. For the teeth considered here, the proportion of reversals varied from 1.3% to 3.8% across the study. Diagnosing CE poses several challenges. For instance, composite materials can now imitate natural enamel so well that it is sometimes difficult to spot a restored lesion. Also, the location of a cavity, e.g. far back in the mouth, may hamper the view of the dental examiner. Hence, CE is likely to be overlooked in practice; conversely, a dental examiner could also classify discolorations as CE.

The selected examiners participated every year in training and calibration sessions, according to the guidelines issued by the BASCD. At the end of each calibration exercise, the sensitivity and specificity of each dental examiner vis-à-vis the benchmark examiner were determined, yielding a misclassification table for each examiner for scoring of caries at tooth and surface levels. The results suggest that some examiners over- or under-score the true caries status and that the scoring behavior of the examiners was constant across the study period. It is also important to stress that the children who participated in the calibration exercises were not sampled at random from the main study; rather, a school with a presumed high caries prevalence was selected.

Finally, the analyses reported in Section 5 involve the four permanent first molars, that is, teeth 16 and 26 on the maxilla (upper quadrants) and teeth 36 and 46 on the mandible (lower quadrants). The numbering of the teeth follows the FDI (Federation Dentaire Internationale) notation, which indicates the position of the tooth in the mouth. Position 26, for instance, means that the tooth is in quadrant 2 (the upper left quadrant from the patient's viewpoint) and is the sixth tooth counted from the mid-sagittal plane. These teeth were chosen for the statistical analyses primarily because of the non-negligible prevalence of the disease at this age in this population.

3. The Bayesian Semiparametric Models

Let $T_{(i, j)} \in R_{+}$ be the continuous time-to-event (time to CE) for the jth unit (tooth) of the ith subject (child), i = 1, … , N, j = 1, … , J. Suppose that the occurrence of the event is assessed by using a sequence of subject-specific evaluations. Let $0 < v_{(i,1)} < v_{(i,2)} < \cdots < v_{(i,K_i)} < +\infty$ be the ordered examination times for the ith subject, i = 1, … , N, where K_i is the number of examinations. In a regular interval-censored data context, the time-to-event T_(i,j) is unobserved but is known with certainty to lie in an interval $T_{(i,j)} \in (v_{(i,l_{(i,j)}-1)}, v_{(i,l_{(i,j)})}]$ obtained from the sequence of examinations, $l_{(i,j)} \in \{1, \dots, K_i+1\}$, where $v_{(i,0)} \equiv 0$ and $v_{(i,K_i+1)} \equiv +\infty$. However, in our setting the determination of the event is prone to misclassification and the observed data are given by the binary variables D_(i,j,k), k = 1, … , K_i, indicating whether the (potentially) error-corrupted evaluation concludes that the event has occurred by time v_(i,k) (D_(i,j,k) = 1) or not (D_(i,j,k) = 0). An illustration of the observed data generating mechanism is given in Appendix A of the online supplementary material.
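To make the interval-censoring construction concrete, the following Python sketch locates the error-free interval $(v_{(i,l-1)}, v_{(i,l)}]$ containing a given event time, using the conventions $v_{(i,0)} \equiv 0$ and $v_{(i,K_i+1)} \equiv +\infty$. The function name `censoring_interval` and the examination ages are hypothetical; under misclassification this interval is not itself observed.

```python
import bisect
import math


def censoring_interval(t, exam_times):
    """Return (l, lower, upper) with t in (lower, upper] = (v_{l-1}, v_l].

    exam_times: ordered examination times v_1 < ... < v_K of one subject.
    Conventions: v_0 = 0 and v_{K+1} = +inf.
    """
    l = bisect.bisect_left(exam_times, t) + 1  # (number of exam times < t) + 1
    v = [0.0] + list(exam_times) + [math.inf]  # pad with v_0 and v_{K+1}
    return l, v[l - 1], v[l]


# Hypothetical examination ages (years) for one child.
visits = [7.1, 8.0, 9.1, 10.0]
print(censoring_interval(8.5, visits))   # (3, 8.0, 9.1)
print(censoring_interval(11.0, visits))  # (5, 10.0, inf)
```

Using `bisect_left` makes the intervals right-closed: an event exactly at an examination time is attributed to the interval ending at that examination.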

In the following, set T = (T₁, … , T_N), where T_i = (T_(i,1), … , T_(i,J)), i = 1, … , N, is a vector of unobserved event times, and D = (D₁, … , D_N), where D_i = (D_(i,1), … , D_{(i,K_i)}), D_(i,k) = (D_(i,1,k), … , D_(i,J,k)), i = 1, … , N, k = 1, … , K_i, is a vector of observed binary indicators of potentially misclassified event status. We assume that for each subject and unit, a p-dimensional design vector including exogenous covariates, x_(i,j), i = 1, … , N, j = 1, … , J, is recorded. The main aim here is to develop a method to infer the marginal dependence of the event times T_(i,j) on covariates x_(i,j), where the event times are observed only through sequences of possibly misclassified binary indicators D_(i,j,k) of the event status. To this end, we first specify marginal models for the dependence of event times on covariates in Section 3.1. Second, the link between the observable binary variables D and the unobservable event times T is given by the misclassification models in Section 3.2. The event-time and misclassification models together induce marginal models for the observed data D, described in Section 3.3.

3.1. The Semiparametric Time-to-Event Models

Let f_{x_(i,1),…,x_(i,J)} be the joint density function for the unobserved time-to-event responses for the ith subject. We build on Sklar’s theorem (Sklar, 1959) and model f_{x_(i,1),…,x_(i,J)} by using its unique marginal-copula representation

f_{x_{(i, 1)}, \dots, x_{(i, J)}} (t_{1}, \dots, t_{J}) = c_{ρ} (F_{x_{(i, 1)}} (t_{1}), \dots, F_{x_{(i, J)}} (t_{J})) {\prod_{j = 1}^{J} f_{x_{(i, j)}} (t_{j})},

where $(t_{1}, \dots, t_{J}) \in R_{+}^{J}$, c_ρ is the density of a copula function parametrized by the finite-dimensional parameter ρ, and F_{x_(i,j)}(t) and f_{x_(i,j)}(t) denote, respectively, the marginal cumulative distribution and density functions for the jth unit of the ith subject, with covariates x_(i,j). A Gaussian copula function is assumed throughout, such that

c_{ρ} (u_{1}, \dots, u_{J}) = ∣ R_{ρ} ∣^{- 1 ∕ 2} exp {- \frac{1}{2} (Φ^{- 1} (u_{1}), \dots, Φ^{- 1} (u_{J})) U_{ρ} {(Φ^{- 1} (u_{1}), \dots, Φ^{- 1} (u_{J}))}^{'}},

where Φ⁻¹ is the inverse cumulative distribution function of a standard Normal distribution, $U_{ρ} = R_{ρ}^{- 1} - I_{J}$, I_J is the identity matrix of dimension J, and $R_{ρ}$ is a correlation matrix.
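The Gaussian copula log-density can be evaluated directly from the expression above. The sketch below assumes NumPy and SciPy are available; `gaussian_copula_logpdf` is a hypothetical helper name, not part of either package.

```python
import numpy as np
from scipy.stats import norm


def gaussian_copula_logpdf(u, R):
    """log c_rho(u) = -0.5 log|R| - 0.5 z' (R^{-1} - I) z, with z = Phi^{-1}(u)."""
    z = norm.ppf(np.asarray(u, dtype=float))   # Phi^{-1}(u_j), j = 1..J
    R = np.asarray(R, dtype=float)
    U = np.linalg.inv(R) - np.eye(len(z))      # U_rho = R_rho^{-1} - I_J
    _, logdet = np.linalg.slogdet(R)           # log |R_rho| (R positive definite)
    return -0.5 * logdet - 0.5 * z @ U @ z


rho = 0.6
R = np.array([[1.0, rho], [rho, 1.0]])
print(gaussian_copula_logpdf([0.2, 0.7], R))
```

For J = 2 with off-diagonal ρ this reduces to the familiar bivariate closed form, and for $R_ρ = I_J$ the copula density is identically one, which gives two convenient checks.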

The PH, AFT and PO marginal regression models are considered by expressing the covariate-dependent cumulative distribution function (CDF), F_x(t), as

1 - F_{x_{(i, j)}} (t) = (1 - F_{0} (t))^{exp {x_{(i, j)}^{'} β_{j}}},

(1)

1 - F_{x_{(i, j)}} (t) = 1 - F_{0} (exp {x_{(i, j)}^{'} β_{j}} t),

(2)

and

\frac{F_{x_{(i, j)}} (t)}{1 - F_{x_{(i, j)}} (t)} = exp {x_{(i, j)}^{'} β_{j}} (\frac{F_{0} (t)}{1 - F_{0} (t)}),

(3)

respectively, where $β_{j} \in R^{p}$ , j = 1, … , J, is a vector of regression coefficients and F₀ is the marginal baseline CDF. Finally, we assume that, for i = 1, … , N,

T_{i} ∣ β, ρ, F_{0} \overset{i n d .}{\sim} f_{x_{(i, 1)}, \dots, x_{(i, J)}} (\cdot ∣ β, ρ, F_{0}),

(4)

where β = (β₁, … , β_J).
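A minimal sketch of the marginal survival functions $1 - F_{x}(t)$ under models (1)–(3). It assumes a Weibull baseline $F_0 = G_θ$ purely for illustration (in the proposed model $F_0$ is the random tailfree CDF), and the function names are hypothetical.

```python
import numpy as np


def weibull_cdf(t, eta1, eta2):
    """Weibull CDF G(t) = 1 - exp(-(t / eta2)^eta1), used here as a stand-in F0."""
    return 1.0 - np.exp(-(np.asarray(t, dtype=float) / eta2) ** eta1)


def marginal_survival(t, x, beta, model, F0):
    """Survival 1 - F_x(t) under the PH (1), AFT (2) or PO (3) model."""
    lp = float(np.dot(x, beta))               # linear predictor x' beta_j
    t = np.asarray(t, dtype=float)
    if model == "PH":                         # (1): S_x(t) = S_0(t)^{exp(x'beta)}
        return (1.0 - F0(t)) ** np.exp(lp)
    if model == "AFT":                        # (2): S_x(t) = S_0(exp(x'beta) t)
        return 1.0 - F0(np.exp(lp) * t)
    if model == "PO":                         # (3): odds_x(t) = exp(x'beta) odds_0(t)
        odds = np.exp(lp) * F0(t) / (1.0 - F0(t))
        return 1.0 / (1.0 + odds)
    raise ValueError(f"unknown model: {model}")


def baseline(s):
    """Hypothetical Weibull baseline with eta1 = 1.5, eta2 = 2.0."""
    return weibull_cdf(s, 1.5, 2.0)


t = np.array([0.5, 1.0, 3.0])
for m in ("PH", "AFT", "PO"):
    print(m, marginal_survival(t, [1.0], [0.4], m, baseline))
```

With a Weibull baseline of shape η₁, the AFT model with coefficient β coincides with the PH model with coefficient η₁β, and with β = 0 all three models return the baseline survival; both facts make quick sanity checks.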

There is a rich Bayesian nonparametric (BNP) literature on robustifying the modeling of a baseline CDF F₀ (or, equivalently, its hazard function) in the context of univariate and multivariate frailty-based models (see, e.g. Müller et al., 2015), including the use of gamma processes (Kalbfleisch, 1978), beta processes (Hjort, 1990), piecewise exponential priors (Ibrahim et al., 2001), correlated increments priors (Sinha & Dey, 1997), Bernstein polynomials (Gelfand & Mallick, 1995), and tailfree processes (Hanson, 2006; Hanson & Yang, 2007; Zhao et al., 2009; Hanson et al., 2011). Among the BNP approaches, we opted for tailfree processes because they allow the same BNP model for F₀ to be used under the different formulations of the model given by expressions (1), (2), and (3). By placing the three time-to-event models on common ground, differences in fit and/or predictive performance between the models can be attributed to the time-to-event model assumptions alone, rather than to differences between the nonparametric models or estimation methods used. Furthermore, the BNP model can be specified such that standard parametric models arise as special cases.

We assign the baseline CDF a mixture of tailfree processes prior, centered at the Weibull family. Tailfree processes are stochastic processes that can be defined to have trajectories on the space of all probability distributions on a given space (see, e.g. Freedman, 1963; Fabius, 1964; Ferguson, 1974; Jara & Hanson, 2011). A tailfree random probability measure F₀ supported on $R_{+}$ is defined by allocations of random probabilities to increasingly refined partitions of $R_{+}$. Let E = {0, 1} and let E^m be the m-fold Cartesian product E × ⋯ × E. Further, set $E^{*} = ⋃_{m = 1}^{\infty} E^{m}$. Consider the sequence of partitions of $R_{+}$ given by $π_{0} = \{R_{+}\}$, π₁ = {B₀, B₁}, π₂ = {B₀₀, B₀₁, B₁₀, B₁₁}, …, such that $R_{+} = B_{0} \cup B_{1}$ and B₀ ∩ B₁ = ∅, and for each $m \in N$ and every ϵ = ε₁ ⋯ ε_m ∈ E^m, $B_{ϵ} = B_{ϵ0} \cup B_{ϵ1}$ and $B_{ϵ0} \cap B_{ϵ1} = ∅$. Assume that B_ϵ0 lies below B_ϵ1 and that, for all ϵ ∈ E*, B_ϵ is a left-open right-closed interval unless ϵ is a string of ones only. Throughout the paper, we use the convention that $ε_{1} \cdots ε_{m-1}0 = 0$ and $ε_{1} \cdots ε_{m-1}1 = 1$ if m = 1. Let $Π = \cup_{m = 0}^{\infty} π_{m}$ and further assume that the partitions form a rich class in the sense that Π generates the Borel σ-field of $R_{+}$, $B \equiv B (R_{+})$.

Definition 1. Let Π be a sequence of binary partitions as before and let $A = \{a_{ϵ} : ϵ \in E^{*}\}$ be a collection of real numbers. A random probability measure F₀ on $(R_{+}, B)$ is said to be a tailfree process with parameters ( $Π, A$ ), denoted $F_{0} ∣ Π, A \sim TFP (Π, A)$, if there exists a collection $Y = \{Y_{ϵ} : ϵ \in E^{*}\}$ of [0, 1]-valued random variables such that the following hold:

The vectors (Y₀, Y₁), (Y₀₀, Y₀₁, Y₁₀, Y₁₁), …, are mutually independent and with probability law determined by (a₀, a₁), (a₀₀, a₀₁, a₁₀, a₁₁), …, respectively.
For every ϵ = ε₁ ⋯ ε_m ∈ E*, $Y_{ε_{1} \cdots ε_{m-1}0} + Y_{ε_{1} \cdots ε_{m-1}1} = 1$ almost surely.
For every ϵ = ε₁ ⋯ ε_m ∈ E*, the random probability measure F₀ is related to $Y$ through the relations

F_{0} (B_{ϵ}) = \prod_{j = 1}^{m} Y_{ε_{1} \dots ε_{j}} .
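Definition 1 assigns probability to each partition set by multiplying conditional probabilities along the binary path leading to that set. The sketch below stores the $Y_ϵ$ in a dictionary keyed by binary strings (an illustrative assumption; helper names are hypothetical), computes $F_0(B_ϵ)$, and lets one check that the probabilities at any level sum to one whenever sibling probabilities do.

```python
import itertools
import math


def set_probability(eps, Y):
    """F0(B_eps) = prod_{j=1}^{m} Y_{eps_1...eps_j} for a binary string eps."""
    return math.prod(Y[eps[:j]] for j in range(1, len(eps) + 1))


def level_probabilities(m, Y):
    """F0(B_eps) for every eps in E^m, in left-to-right order."""
    return {"".join(e): set_probability("".join(e), Y)
            for e in itertools.product("01", repeat=m)}


# Hypothetical conditional probabilities; siblings sum to one as required.
Y = {"0": 0.3, "1": 0.7,
     "00": 0.6, "01": 0.4, "10": 0.1, "11": 0.9}
probs = level_probabilities(2, Y)
print(probs)   # F0(B_00), F0(B_01), F0(B_10), F0(B_11)
```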

We consider partition sets in Π such that their limits correspond to quantiles of a parametric distribution G_θ, θ ∈ Θ, defined on $(R_{+}, B)$ (Lavine, 1992). Specifically, we consider sets in Π of the form $B_{ϵ}^{θ} = (l_{ϵ}^{θ}, u_{ϵ}^{θ}]$ , where $l_{ϵ}^{θ} = G_{θ}^{- 1} (k ∕ 2^{m})$ and $u_{ϵ}^{θ} = G_{θ}^{- 1} {(k + 1) ∕ 2^{m}}$ , with $G_{θ}^{- 1} (\cdot)$ being the quantile function of G_θ, and k is the decimal representation of ϵ = ε₁ ⋯ ε_m ∈ E*. If needed, the notation Π^θ will be used to make the dependence of Π on the parameters of G_θ explicit. Without loss of generality, for the rest of the paper we assume that the sets are constructed based on the quantiles of the Weibull distribution, such that G_θ (t) = 1 – exp(−(t/η₂)^η₁) for t ≥ 0, θ = (log(η₁), log(η₂)).
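Under the Weibull centering, the endpoints of $B_{ϵ}^{θ}$ are the Weibull quantiles at $k/2^m$ and $(k+1)/2^m$. The sketch below parametrizes the Weibull directly by $(η_1, η_2)$ rather than by $θ = (\log η_1, \log η_2)$; here η₁ and η₂ are the Weibull shape and scale, not specificities, and the function names are hypothetical.

```python
import math


def weibull_quantile(p, eta1, eta2):
    """G_theta^{-1}(p) = eta2 * (-log(1 - p))^(1/eta1), with G^{-1}(1) = +inf."""
    if p >= 1.0:
        return math.inf
    return eta2 * (-math.log1p(-p)) ** (1.0 / eta1)


def partition_set(eps, eta1, eta2):
    """Endpoints (l, u] of B_eps^theta for a nonempty binary string eps."""
    m = len(eps)
    k = int(eps, 2)                   # decimal representation of eps
    return (weibull_quantile(k / 2 ** m, eta1, eta2),
            weibull_quantile((k + 1) / 2 ** m, eta1, eta2))


for eps in ("0", "1", "00", "01", "10", "11"):
    print(eps, partition_set(eps, 1.5, 2.0))
```

Only the set indexed by a string of ones has an infinite upper endpoint, matching the convention that all other sets are left-open right-closed intervals.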

Following Jara & Hanson (2011), we consider a logistic-Normal specification of the tailfree conditional probabilities, such that for every $ϵ0 = ε_{1} \cdots ε_{m-1}0 \in E^{*}$,

Y_{ϵ 0} = \frac{exp {λ_{ϵ 0}}}{1 + exp {λ_{ϵ 0}}},

and

λ_{ϵ 0} ∣ c, τ \overset{i n d .}{\sim} N (0, 2 ∕ [c τ (j)]),

where j is the level of ϵ0 (the length of the binary string) and τ(j) is a nondecreasing known function of j. A common choice for τ(j) is j². The parameter c is a precision parameter; lower values of c allow the mass of F₀ to move more easily away from the centering distribution G_θ. As c → 0⁺, the posterior mean of F₀ tends to the empirical CDF of the data (Hanson & Johnson, 2002); as c → ∞, all conditional probabilities go to 0.5 and hence F₀(A) → G_θ(A) a.s. for every measurable set A. A common choice is simply to fix c at a small value, e.g. c = 1.
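The logistic-Normal specification can be simulated level by level. The sketch below draws one set of conditional probabilities up to a finite level L, with τ(j) = j² by default; the name `sample_tailfree_Y` and the dictionary layout are illustrative assumptions.

```python
import itertools
import numpy as np


def sample_tailfree_Y(L, c, rng, tau=lambda j: j ** 2):
    """Draw Y_eps for all eps of length 1..L under the logistic-Normal specification.

    For each parent string p of length j - 1:
        lambda_{p0} ~ N(0, 2 / (c * tau(j))),
        Y_{p0} = exp(lambda) / (1 + exp(lambda)),  Y_{p1} = 1 - Y_{p0}.
    """
    Y = {}
    for j in range(1, L + 1):
        sd = np.sqrt(2.0 / (c * tau(j)))          # standard deviation at level j
        for parent in itertools.product("01", repeat=j - 1):
            p = "".join(parent)
            lam = rng.normal(0.0, sd)
            Y[p + "0"] = 1.0 / (1.0 + np.exp(-lam))
            Y[p + "1"] = 1.0 - Y[p + "0"]
    return Y


rng = np.random.default_rng(1)
Y = sample_tailfree_Y(L=4, c=1.0, rng=rng)
print(len(Y), Y["0"], Y["1"])
```

Setting c very large drives every $Y_ϵ$ toward 0.5, in line with the limiting behavior described above.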

Under this specification, Y_ϵ0 approximately follows a beta(cτ(j), cτ(j)) distribution (Jara & Hanson, 2011) and the resulting process closely matches a Pólya tree prior (see, e.g. Lavine, 1992, 1994; Christensen et al., 2008). As is usually done for Pólya tree priors, the tailfree model is partially specified: the tailfree process is terminated at level L, and on sets in the finest partition $π_{L}^{θ}$ the random F₀ matches exactly the parametric distribution G_θ (Hanson, 2006). We typically consider L ≈ log₂(n/M), where n is the sample size and M is 5 to 10 (Hanson, 2006). The resulting process is denoted by

F_{0} ∣ c, θ \sim {TFP}^{L} (Π^{θ}, A^{c})

(5)

Under this prior specification, the density of a realization of the process is given by

\begin{aligned} f_{0} (t) & = 2^{L} g_{θ} (t) \prod_{l = 1}^{L} Y_{ϵ_{θ} (t, l)} \\ & = 2^{L} g_{θ} (t) \prod_{l = 1}^{L} \frac{\exp \{λ_{ϵ_{θ} (t, l - 1) 0}\}^{I \{t \in B_{ϵ_{θ} (t, l - 1) 0}^{θ}\}}}{1 + \exp \{λ_{ϵ_{θ} (t, l - 1) 0}\}}, \end{aligned}

(6)

where $t \in R_{+}$, I_{A} is the indicator function for A, ϵ_θ(t, l) = ε₁ε₂ ⋯ ε_l is the index of the set in $π_{l}^{θ}$ that contains t, and g_θ(·) is the density of a Weibull distribution. This expression can be used to derive closed-form expressions for the cumulative distribution function F₀ and to construct the likelihood in different settings.
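Expression (6) can be evaluated by mapping t to the probability scale $u = G_θ(t)$, reading off the level-l set index from u, and multiplying the corresponding conditional probabilities. The sketch assumes the Weibull centering above and stores $λ_{ϵ0}$ in a dictionary keyed by the parent string ϵ (the empty string at level 1); these conventions and the function name are hypothetical.

```python
import math


def tailfree_density(t, lam, L, eta1, eta2):
    """f0(t) = 2^L g_theta(t) prod_{l=1}^L Y_{eps_theta(t, l)} as in (6).

    lam[p] holds lambda_{p0} for each parent string p of length 0..L-1
    (p = "" at level 1); Weibull(shape eta1, scale eta2) centering.
    """
    if t <= 0.0:
        return 0.0
    z = (t / eta2) ** eta1
    g = (eta1 / eta2) * (t / eta2) ** (eta1 - 1.0) * math.exp(-z)  # g_theta(t)
    u = -math.expm1(-z)                                            # G_theta(t)
    dens = 2.0 ** L * g
    for l in range(1, L + 1):
        # B_eps at level l is (k/2^l, (k+1)/2^l] on the probability scale.
        k = max(math.ceil(u * 2 ** l) - 1, 0)
        eps = format(k, f"0{l}b")                      # eps_theta(t, l), l bits
        y0 = 1.0 / (1.0 + math.exp(-lam[eps[:-1]]))    # Y_{eps_theta(t, l-1)0}
        dens *= y0 if eps[-1] == "0" else 1.0 - y0
    return dens


lam = {"": 0.8, "0": -0.3, "1": 1.1}   # hypothetical lambda values, L = 2
print(tailfree_density(1.3, lam, 2, 1.5, 2.0))
```

When every $λ_{ϵ0} = 0$, each conditional probability equals 1/2 and the density reduces to $g_θ$.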

It may be difficult in practice to specify a single centering Weibull distribution for the tailfree process, and, once specified, a single centering distribution may unduly affect inference. One way to mitigate the dependence of the process on the partitioning sets is to specify a mixture of prior distributions. A mixture of tailfree processes is induced for F₀ by allowing the parameters of the centering distribution G_θ and/or the precision parameter c to be random, that is,

F_{0} ∣ c, θ \sim {TFP}^{L} (Π^{θ}, A^{c}) and (θ, c) \sim p (θ, c),

where p(θ, c) refers to the joint prior for θ and c. Smoothness properties in terms of continuity and differentiability of the densities for F₀ under the mixture of tailfree processes carry over from the results reported by Hanson (2006). One important property is posterior propriety under improper priors on the mixing parameter θ, following a simple application of Tonelli’s theorem.

3.2. The Misclassification Models

As in the ST study, suppose now that the event status at each visit is evaluated by one of Q examiners. Denote by ξ_(i,k) ∈ {1, … , Q} the variable indexing the examiner who evaluates all J units (in the ST study, the four molars) of subject i at examination time v_(i,k), and let ξ_i = (ξ_(i,1), … , ξ_{(i,K_i)}) be the vector of indicators of the examiners who score the responses of subject i over time. We further assume that the scoring behavior of each examiner is the same across the study. Let η_q = (η_(q,1), … , η_(q,J)) and α_q = (α_(q,1), … , α_(q,J)), q = 1, … , Q, be the vectors containing the unit-specific specificity and sensitivity parameters for the qth examiner, respectively. Finally, let α = (α₁, … , α_Q) and η = (η₁, … , η_Q) be the matrices containing all sensitivity and specificity parameters, respectively. In this setting, the misclassification model assumes that

\begin{matrix} Pr (D_{(i, j, k)} = 1 ∣ T_{(i, j)} \in (0, v_{(i, k)}]) & = α_{(ξ_{(i, k)}, j)}, \\ Pr (D_{(i, j, k)} = 0 ∣ T_{(i, j)} \in (v_{(i, k)}, + \infty)) & = η_{(ξ_{(i, k)}, j)}, \end{matrix}

and the process is characterized by the following conditional independence assumptions. Note that assumptions (A.1) - (A.5) represent natural extensions of the commonly used assumptions for the analysis of misclassified binary data (see, e.g. García-Zattera et al., 2010, 2012).

(A.1)
$⫫_{1 \le i \le N} D_{i} ∣ T_{1}, \dots, T_{N}, ξ_{1}, \dots, ξ_{N}, η, α$, i.e. the observed response matrices for each subject are independent given the true unobserved event times, examiner indicators, and sensitivity and specificity parameters,
(A.2)
$D_{i} ⫫ (D_{1}, \dots, D_{i-1}, D_{i+1}, \dots, D_{N}) ∣ T_{i}, ξ_{i}, η, α$, ∀ i, i.e. the distribution of the observed response matrix for a subject depends only on his true unobserved time-to-event vector, the examiners that score his responses, and the sensitivity and specificity parameters,
(A.3)
$⫫_{1 \le k \le K_{i}} D_{(i,k)} ∣ T_{i}, ξ_{i}, η, α$, ∀ i, i.e. the observed response vectors for a subject are independent across time given his unobserved time-to-event vector, the examiners that score his responses, and the sensitivity and specificity parameters,
(A.4)
$⫫_{1 \le j \le J} D_{(i,j,k)} ∣ T_{i}, ξ_{(i,k)}, η_{ξ_{(i,k)}}, α_{ξ_{(i,k)}}$, ∀ i, k, i.e. the observed responses at the kth examination are independent given the unobserved time-to-event vector, the examiner that scores his responses at the kth examination, and the examiner-specific sensitivity and specificity parameters,
(A.5)
$D_{(i,j,k)} ⫫ (T_{(i,1)}, \dots, T_{(i,j-1)}, T_{(i,j+1)}, \dots, T_{(i,J)}) ∣ T_{(i,j)}, ξ_{(i,k)}, η_{(ξ_{(i,k)},j)}, α_{(ξ_{(i,k)},j)}$, i.e. the distribution of the jth observed variable at the kth examination depends only on the true unobserved time-to-event for the same variable, the examiner that scores his responses at examination k, and the sensitivity and specificity parameters of this examiner for the jth variable.
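Under (A.1)–(A.5), each $D_{(i,j,k)}$ is an independent Bernoulli draw given the true status $I\{T_{(i,j)} \le v_{(i,k)}\}$ and the examiner at visit k. The sketch below simulates one subject's observed J × K matrix; the array shapes, 0-based examiner indices, parameter values, and function name are illustrative assumptions.

```python
import numpy as np


def simulate_D(T, v, xi, alpha, eta, rng):
    """Simulate the observed J x K matrix D_i for one subject.

    T: (J,) true event times; v: (K,) exam times; xi: (K,) 0-based examiner
    index per exam; alpha, eta: (Q, J) sensitivities and specificities.
    """
    T, v, xi = np.asarray(T), np.asarray(v), np.asarray(xi)
    occurred = T[:, None] <= v[None, :]        # true status I{T_(i,j) <= v_(i,k)}
    a = np.asarray(alpha)[xi].T                # alpha_(xi_(i,k), j), shape (J, K)
    e = np.asarray(eta)[xi].T                  # eta_(xi_(i,k), j), shape (J, K)
    p_one = np.where(occurred, a, 1.0 - e)     # Pr(D_(i,j,k) = 1 | T_(i,j))
    # Independent Bernoulli draws, as implied by (A.3)-(A.5).
    return (rng.random(occurred.shape) < p_one).astype(int)


rng = np.random.default_rng(0)
alpha = np.array([[0.90, 0.85], [0.80, 0.95]])   # hypothetical, Q = 2, J = 2
eta = np.array([[0.95, 0.90], [0.85, 0.92]])
print(simulate_D([8.5, 11.0], [7.1, 8.0, 9.1, 10.0], [0, 1, 0, 1], alpha, eta, rng))
```

Non-monotone rows in the output are exactly the "reversals" that signal classification errors in the ST data.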

A simplified version of the general misclassification model defined above, which has unstructured examiner- and unit-specific sensitivity and specificity parameters, assumes the same misclassification parameters across units for each examiner: for q = 1, … , Q, η_(q,j) = η_(q) and α_(q,j) = α_(q), ∀ j. Extensions of the general misclassification model can also be considered. For instance, the model could be extended by including examinee-specific or examiner-specific characteristics in the misclassification parameters, which would help explain potential heterogeneity in the scoring behavior of the examiners. Unfortunately, no information about the specific characteristics of the examiners is available in the ST study, so we do not pursue this here. However, tooth position, gender, and age of the examinee are considered in Section 4.2.

Following García-Zattera et al. (2010), García-Zattera et al. (2012) and García-Zattera et al. (2016), the following restricted parameter spaces for the misclassification parameters are considered to avoid identification problems,

{(η_{(q, j)}, α_{(q, j)}) \in [0, 1]^{2} : η_{(q, j)} + α_{(q, j)} > 1}, q = 1, \dots, Q, j = 1, \dots, J .
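Draws from a Beta × Beta prior restricted to this region, as used for the misclassification parameters in the hierarchical representation of Section 3.3, can be obtained by simple rejection sampling. The sketch below is one such approach, not necessarily the one used by the authors; `a_eta`, `b_eta`, `a_alpha`, `b_alpha` play the roles of $a^{(η,0)}, a^{(η,1)}, a^{(α,0)}, a^{(α,1)}$, and the function name is hypothetical. Rejection is efficient only when the unrestricted prior puts reasonable mass on $η + α > 1$.

```python
import numpy as np


def sample_restricted_beta_prior(a_eta, b_eta, a_alpha, b_alpha, size, rng):
    """Draw (eta, alpha) from Beta(a_eta, b_eta) x Beta(a_alpha, b_alpha)
    restricted to eta + alpha > 1, by rejection sampling."""
    draws = np.empty((0, 2))
    while len(draws) < size:
        eta = rng.beta(a_eta, b_eta, size)
        alpha = rng.beta(a_alpha, b_alpha, size)
        keep = eta + alpha > 1.0               # identifiability restriction
        draws = np.vstack([draws, np.column_stack([eta[keep], alpha[keep]])])
    return draws[:size]


rng = np.random.default_rng(3)
prior_draws = sample_restricted_beta_prior(1.0, 1.0, 1.0, 1.0, 1000, rng)
print(prior_draws.mean(axis=0))
```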

3.3. The Implied Statistical Models and Stochastic Representations

Regardless of the misclassification model, the assumptions (A.1) – (A.5), along with the joint probability model for the time-to-event responses (4) and the BNP prior for the baseline probability distribution (5), imply that the joint probability model for the observed binary indicators and unobserved time-to-event variables for each subject is given by

\begin{aligned} p (D_{1}, \dots, D_{N}, T_{1}, \dots, T_{N} ∣ α, η, β, ρ, c, θ) & = \prod_{i = 1}^{N} p (D_{i} ∣ T_{i}, α, η) \, p (T_{i} ∣ β, ρ, c, θ) \\ & = \prod_{i = 1}^{N} \Big\{\prod_{j = 1}^{J} \prod_{k = 1}^{K_{i}} p (D_{(i, j, k)} ∣ T_{(i, j)}, η_{(ξ_{(i, k)}, j)}, α_{(ξ_{(i, k)}, j)})\Big\} \\ & \qquad \times \int p (T_{i} ∣ β, ρ, F_{0}) \, p (F_{0} ∣ c, θ) \, d F_{0}, \end{aligned}

where

\begin{aligned} p (D_{(i, j, k)} ∣ T_{(i, j)}, η_{(ξ_{(i, k)}, j)}, α_{(ξ_{(i, k)}, j)}) & = \Big\{α_{(ξ_{(i, k)}, j)}^{D_{(i, j, k)}} (1 - α_{(ξ_{(i, k)}, j)})^{1 - D_{(i, j, k)}}\Big\}^{I \{T_{(i, j)} \in (0, v_{(i, k)}]\}} \\ & \qquad \times \Big\{(1 - η_{(ξ_{(i, k)}, j)})^{D_{(i, j, k)}} η_{(ξ_{(i, k)}, j)}^{1 - D_{(i, j, k)}}\Big\}^{I \{T_{(i, j)} \in (v_{(i, k)}, + \infty)\}} \\ & = \prod_{l = 1}^{k} \Big\{α_{(ξ_{(i, k)}, j)}^{D_{(i, j, k)}} (1 - α_{(ξ_{(i, k)}, j)})^{1 - D_{(i, j, k)}}\Big\}^{I \{T_{(i, j)} \in (v_{(i, l - 1)}, v_{(i, l)}]\}} \\ & \qquad \times \prod_{l = k + 1}^{K_{i} + 1} \Big\{(1 - η_{(ξ_{(i, k)}, j)})^{D_{(i, j, k)}} η_{(ξ_{(i, k)}, j)}^{1 - D_{(i, j, k)}}\Big\}^{I \{T_{(i, j)} \in (v_{(i, l - 1)}, v_{(i, l)}]\}}, \end{aligned}

and p(T_i ∣ β, ρ, F₀) is given by f_{x_(i,1),…,x_(i,J)}(T_(i,1), … , T_(i,J) ∣ β, ρ, F₀) under each specific time-to-event marginal model assumption ((1)–(3)). Therefore, the likelihood function for the observed data is given by

\begin{aligned} p (D_{1}, \dots, D_{N} ∣ α, η, β, ρ, c, θ) & = \prod_{i = 1}^{N} \int_{R_{+}^{J}} p (D_{i} ∣ T_{i}, α, η) \, p (T_{i} ∣ β, ρ, c, θ) \, d T_{i} \\ & = \prod_{i = 1}^{N} \int_{R_{+}^{J}} \Big\{\prod_{j = 1}^{J} \prod_{k = 1}^{K_{i}} p (D_{(i, j, k)} ∣ T_{(i, j)}, η_{(ξ_{(i, k)}, j)}, α_{(ξ_{(i, k)}, j)})\Big\} \\ & \qquad \times \int p (T_{i} ∣ β, ρ, F_{0}) \, p (F_{0} ∣ c, θ) \, d F_{0} \, d T_{i} . \end{aligned}

(7)
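For a single unit (J = 1), the copula plays no role and the integral in (7) reduces to a finite sum over the K_i + 1 intervals $(v_{(i,l-1)}, v_{(i,l)}]$, using the second form of the misclassification probability above. The sketch below evaluates that sum for a user-supplied marginal CDF F; the function name, the Weibull F, and the parameter values are hypothetical. For J > 1 the sum would run over all $(K_i+1)^J$ combinations of intervals, weighted by copula rectangle probabilities, which this sketch does not implement.

```python
import numpy as np


def unit_likelihood(D, v, xi, alpha, eta, F):
    """Marginal probability of D_(i,j,1), ..., D_(i,j,K) for one unit (J = 1).

    Sums over the K + 1 intervals (v_{l-1}, v_l] that T may fall in.
    alpha, eta: per-examiner sensitivity and specificity for this unit;
    xi: 0-based examiner index per exam; F: marginal CDF of T.
    """
    D, xi = np.asarray(D), np.asarray(xi)
    K = len(v)
    a, e = np.asarray(alpha)[xi], np.asarray(eta)[xi]       # per-exam parameters
    cdf = np.array([0.0] + [F(x) for x in v] + [1.0])     # F(v_0), ..., F(v_{K+1})
    total = 0.0
    for l in range(1, K + 2):
        occurred = np.arange(1, K + 1) >= l                 # T <= v_k iff l <= k
        p = np.where(occurred,
                     a ** D * (1 - a) ** (1 - D),           # sensitivity branch
                     (1 - e) ** D * e ** (1 - D))           # specificity branch
        total += (cdf[l] - cdf[l - 1]) * np.prod(p)
    return total


def F(t):
    return 1.0 - np.exp(-(t / 9.0) ** 1.5)                  # hypothetical Weibull CDF


v = [7.0, 8.0, 9.0, 10.0]
print(unit_likelihood([0, 1, 0, 1], v, [0, 1, 0, 1], [0.9, 0.8], [0.95, 0.85], F))
```

With perfect classification (all sensitivities and specificities equal to 1), only the interval consistent with D contributes, recovering the usual interval-censored likelihood F(v_l) - F(v_{l-1}).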

An alternative stochastic representation of the joint model for the unobserved event times greatly simplifies the posterior computation for the proposed models. Under this representation, the event times are viewed as transformed Gaussian random variables,

T_{(i, j)} = F_{x_{(i, j)}}^{- 1} (Φ (Z_{(i, j)})),

where

Z_{i} = (Z_{(i, 1)}, \dots, Z_{(i, J)}) ∣ ρ \overset{i . i . d .}{\sim} N_{J} (0_{J}, R_{ρ}),

i = 1, … , N, with N_d(m, S) denoting a d-variate Normal distribution with mean m and covariance matrix S, and density denoted by ϕ_d (· ∣ m, S). The joint density implied by this transformation is then given by

\prod_{j = 1}^{J} \Big\{\frac{f_{x_{(i, j)}} (t_{j})}{ϕ (Φ^{- 1} (F_{x_{(i, j)}} (t_{j})))}\Big\} ϕ_{J} (Φ^{- 1} (F_{x_{(i, 1)}} (t_{1})), \dots, Φ^{- 1} (F_{x_{(i, J)}} (t_{J})) ∣ 0_{J}, R_{ρ}),

which is equivalent to f_{x_(i,1),…,x_(i,J)} (t₁, … , t_J ∣ β, ρ, F₀). This distribution can also be viewed as the marginal distribution for T_i arising from the joint model

p (T_{i} ∣ Z_{i}, β, F_{0}) p (Z_{i} ∣ ρ),

where p (· ∣ Z_i, β, F₀) is a degenerate probability distribution arising from

T_{(i, j)} ∣ Z_{(i, j)}, β, F_{0} \overset{ind.}{\sim} p (\cdot ∣ Z_{(i, j)}, β, F_{0}) = δ_{F_{x_{(i, j)}}^{- 1} (Φ (Z_{(i, j)}))} (\cdot),

j = 1, … , J, where δ_a(·) is the Dirac measure at a. Based on this, the data-augmented hierarchical representation of the proposed models, along with the employed prior distributions, is given by

\begin{aligned} D_{(i, j, k)} ∣ T_{(i, j)}, η_{(ξ_{(i, k)}, j)}, α_{(ξ_{(i, k)}, j)} & \overset{ind.}{\sim} p (\cdot ∣ T_{(i, j)}, η_{(ξ_{(i, k)}, j)}, α_{(ξ_{(i, k)}, j)}), \\ T_{(i, j)} ∣ Z_{(i, j)}, β, F_{0} & \overset{ind.}{\sim} p (\cdot ∣ Z_{(i, j)}, β, F_{0}), \\ Z_{i} ∣ ρ & \overset{iid}{\sim} N_{J} (0_{J}, R_{ρ}), \\ (η_{(q, j)}, α_{(q, j)}) ∣ a^{(η, 0)}, a^{(η, 1)}, a^{(α, 0)}, a^{(α, 1)} & \overset{ind.}{\sim} Beta (a^{(η, 0)}, a^{(η, 1)}) \times Beta (a^{(α, 0)}, a^{(α, 1)}) \times I \{η_{(q, j)} + α_{(q, j)} > 1\}, \\ β ∣ m_{β}, V_{β} & \sim N_{p} (m_{β}, V_{β}), \\ F_{0} ∣ c, θ & \sim {TFP}^{L} (Π^{θ}, A^{c}), \\ θ ∣ m_{θ}, V_{θ} & \sim N_{2} (m_{θ}, V_{θ}) . \end{aligned}

A similar hierarchical representation is obtained under the misclassification model assuming equal misclassification parameters across variables for each examiner. The model specification is completed by assuming a prior distribution on the parameters of the Gaussian copula model ρ, which depends on the parameterization of the correlation matrix R_ρ. We assume priors on ρ such that the resulting prior on the correlation matrix R_ρ is uniform on the corresponding space of correlation matrices.
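The transformed-Gaussian representation $T_{(i,j)} = F_{x_{(i,j)}}^{-1}(Φ(Z_{(i,j)}))$ also gives a direct way to simulate event times. The sketch below assumes PH marginals (1) with a Weibull baseline in place of the tailfree F₀ and, in the usage example, an exchangeable correlation matrix and zero covariates; these choices and the function names are illustrative only. Inverting (1) gives $F_{x}^{-1}(u) = F_0^{-1}(1 - (1-u)^{\exp\{-x'β\}})$.

```python
import numpy as np
from scipy.stats import norm


def weibull_inv(p, eta1, eta2):
    """Weibull quantile G^{-1}(p) = eta2 * (-log(1 - p))^(1 / eta1)."""
    return eta2 * (-np.log1p(-p)) ** (1.0 / eta1)


def simulate_event_times(X, beta, R, eta1, eta2, rng):
    """Draw T_i = F_x^{-1}(Phi(Z_i)) with Z_i ~ N_J(0, R), i = 1..N.

    X: (N, J, p) covariates; beta: (J, p) unit-specific coefficients.
    Marginals follow the PH model (1) with a Weibull baseline F0.
    """
    N, J, _ = X.shape
    Z = rng.multivariate_normal(np.zeros(J), R, size=N)   # latent Gaussian Z_i
    U = norm.cdf(Z)                                       # Phi(Z_(i,j))
    lp = np.einsum("njp,jp->nj", X, beta)                 # x_(i,j)' beta_j
    # Inverting (1): F_x^{-1}(u) = F0^{-1}(1 - (1 - u)^{exp(-x'beta)})
    return weibull_inv(1.0 - (1.0 - U) ** np.exp(-lp), eta1, eta2)


N, J = 20000, 4
R = 0.5 * np.ones((J, J)) + 0.5 * np.eye(J)             # exchangeable, rho = 0.5
T = simulate_event_times(np.zeros((N, J, 1)), np.zeros((J, 1)), R, 1.5, 9.0,
                         np.random.default_rng(4))
print(np.median(T, axis=0))
```

With zero covariates each column is marginally Weibull, so the sample medians should be close to $η_2 (\log 2)^{1/η_1}$, while the copula induces positive dependence between columns.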

3.4. Main Aspects of the Posterior Computation

Samples from the posterior distribution of the model parameters are obtained with a Gibbs sampler based on the augmented posterior distribution described in the previous section. Within this Gibbs sampler, blocks of parameters are either updated using Metropolis-Hastings steps (Tierney, 1994) or sampled directly from the corresponding full conditional distributions. The parameters defining the conditional tailfree probabilities are updated in a single block using the adaptive Gaussian random-walk proposal described by Haario et al. (2001), where the covariance matrix of the candidate-generating distribution is tuned to obtain acceptance rates in the 20% to 50% range. The latent variables Z_i, i = 1, … , N, underlying the time-to-event responses, the regression parameters β, and the parameters θ of the centering distribution of the tailfree process can be updated in a similar way.
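The adaptive random-walk update can be sketched as follows. This is a minimal Python illustration of the adaptive Metropolis idea of Haario et al. (2001), not the authors' implementation: the function name `adaptive_rw_metropolis`, the initial proposal covariance, the iteration at which adaptation starts, and the regularization `eps` are hypothetical choices, and `log_post` stands for the log full conditional of whichever parameter block is being updated.

```python
import numpy as np


def adaptive_rw_metropolis(log_post, x0, n_iter=5000, adapt_start=500, eps=1e-6, seed=0):
    """Adaptive Gaussian random-walk Metropolis (in the spirit of Haario et al., 2001).

    log_post    : callable returning the log target density of a parameter block
    x0          : starting value (1-D array)
    adapt_start : iterations run with a fixed proposal before adaptation begins
    After `adapt_start` iterations the proposal covariance is
    s_d * (empirical covariance of the chain so far + eps * I), with s_d = 2.4**2 / d.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    s_d = 2.4**2 / d
    cov = 0.1 * np.eye(d)                     # hypothetical initial proposal covariance
    chain = np.empty((n_iter, d))
    lp = log_post(x)
    accepted = 0
    for t in range(n_iter):
        if t >= adapt_start:
            # Empirical covariance of the history; reshape handles d = 1 (np.cov returns a scalar).
            emp = np.cov(chain[:t].T).reshape(d, d)
            cov = s_d * (emp + eps * np.eye(d))
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        # Symmetric proposal, so the acceptance ratio is the ratio of target densities.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        chain[t] = x
    return chain, accepted / n_iter
```

With the 2.4²/d scaling, acceptance rates for roughly Gaussian targets usually fall in the 20% to 50% range mentioned above; in practice one would monitor the returned rate and adjust the scaling if it drifts outside that range.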

Assumptions (A.1) – (A.5), along with the assumptions of the semiparametric time-to-event models for clustered data, imply that the full conditionals for the misclassification parameters under the more general misclassification model are truncated beta distributions given by

η_{(q, j)} ∣ \dots \sim \mathrm{Beta} (a^{(η, 0)} + n_{(q, j)}^{00},\ a^{(η, 1)} + n_{(q, j)}^{+0} - n_{(q, j)}^{00})\, I_{\{η_{(q, j)} : η_{(q, j)} > 1 - α_{(q, j)}\}}(η_{(q, j)}),

and

α_{(q, j)} ∣ \dots \sim \mathrm{Beta} (a^{(α, 0)} + n_{(q, j)}^{11},\ a^{(α, 1)} + n_{(q, j)}^{+1} - n_{(q, j)}^{11})\, I_{\{α_{(q, j)} : α_{(q, j)} > 1 - η_{(q, j)}\}}(α_{(q, j)}),

where

\begin{aligned} n_{(q, j)}^{00} &= \sum_{i = 1}^{N} \sum_{k = 1}^{K_{i}} I_{\{D_{(i, j, k)} = 0,\ T_{(i, j)} \in (v_{(i, k)}, +\infty)\}}(D_{(i, j, k)}, T_{(i, j)})\, I_{\{q\}}(ξ_{(i, k)}), \\ n_{(q, j)}^{+0} &= \sum_{i = 1}^{N} \sum_{k = 1}^{K_{i}} I_{\{T_{(i, j)} \in (v_{(i, k)}, +\infty)\}}(T_{(i, j)})\, I_{\{q\}}(ξ_{(i, k)}), \\ n_{(q, j)}^{11} &= \sum_{i = 1}^{N} \sum_{k = 1}^{K_{i}} I_{\{D_{(i, j, k)} = 1,\ T_{(i, j)} \in (0, v_{(i, k)}]\}}(D_{(i, j, k)}, T_{(i, j)})\, I_{\{q\}}(ξ_{(i, k)}), \end{aligned}

and

n_{(q, j)}^{+1} = \sum_{i = 1}^{N} \sum_{k = 1}^{K_{i}} I_{\{T_{(i, j)} \in (0, v_{(i, k)}]\}}(T_{(i, j)})\, I_{\{q\}}(ξ_{(i, k)}).
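The counts and the truncated beta draws can be sketched as below. For simplicity this assumes a common number of visits K for all subjects, 0-based examiner labels, and the same Beta hyperparameters for all examiners; the function names are hypothetical. The truncation is handled by inverse-CDF sampling with `scipy.stats.beta`, which can lose accuracy when almost all of the Beta mass lies below the truncation point.

```python
import numpy as np
from scipy.stats import beta


def misclassification_counts(T, D, V, xi, q, j):
    """Counts n^00, n^+0, n^11, n^+1 for examiner q and variable j (0-based labels).

    T  : (N, J) current latent event times T_(i,j)
    D  : (N, J, K) observed scores D_(i,j,k); assumes K_i = K for all subjects
    V  : (N, K) visit times v_(i,k)
    xi : (N, K) examiner labels xi_(i,k)
    """
    by_q = xi == q                               # I_{q}(xi_(i,k))
    event_free = T[:, j][:, None] > V            # T_(i,j) in (v_(i,k), +inf)
    d = D[:, j, :]
    n00 = np.sum(by_q & event_free & (d == 0))
    n_plus0 = np.sum(by_q & event_free)
    n11 = np.sum(by_q & ~event_free & (d == 1))  # ~event_free: T_(i,j) in (0, v_(i,k)]
    n_plus1 = np.sum(by_q & ~event_free)
    return n00, n_plus0, n11, n_plus1


def truncated_beta(a, b, lower, rng):
    """Draw from Beta(a, b) restricted to (lower, 1) by inverse-CDF sampling."""
    u = rng.uniform(beta.cdf(lower, a, b), 1.0)
    return float(beta.ppf(u, a, b))


def update_eta_alpha(eta, alpha, counts, a_eta=(1.0, 1.0), a_alpha=(1.0, 1.0), rng=None):
    """One Gibbs update of (eta_(q,j), alpha_(q,j)) from the truncated beta full conditionals."""
    rng = np.random.default_rng() if rng is None else rng
    n00, n_plus0, n11, n_plus1 = counts
    eta = truncated_beta(a_eta[0] + n00, a_eta[1] + n_plus0 - n00, 1.0 - alpha, rng)
    alpha = truncated_beta(a_alpha[0] + n11, a_alpha[1] + n_plus1 - n11, 1.0 - eta, rng)
    return eta, alpha
```

Updating η first and then α conditional on the new η keeps every draw inside the region η + α > 1 imposed by the prior.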

Similar expressions are obtained for the model assuming the same examiner-specific misclassification parameters for each variable.

The updating scheme for the association parameters ρ of the Gaussian copula model depends on the parameterization of the correlation matrix R_ρ. Under an unstructured correlation matrix, parameter-expanded data augmentation strategies can be used (Liu & Wu, 1999; van Dyk & Meng, 2001; Imai & van Dyk, 2005). A compound symmetric parameterization, with equal and positive off-diagonal elements, yields a simpler marginal joint likelihood for the proposed models. Specifically, a compound symmetric correlation matrix can be obtained from the stochastic representation Z_(i,j) = γ_i + ϵ_(i,j), where $γ_{i} ∣ ρ \overset{i.i.d.}{\sim} N (0, ρ)$ and $ϵ_{(i, j)} ∣ ρ \overset{i.i.d.}{\sim} N (0, 1 - ρ)$, so that Var(Z_(i,j)) = 1 and Corr(Z_(i,j), Z_(i,j′)) = ρ for j ≠ j′. Thus, given γ_i, the conditional CDF for T_(i,j) is given by

F_{x_{(i, j)}} (t ∣ γ_{i}) = Φ (\frac{Φ^{- 1} (F_{x_{(i, j)}} (t)) - γ_{i}}{\sqrt{1 - ρ}}),

and p (D_i ∣ γ_i, α, η, β, ρ, F₀) is given by

\prod_{j = 1}^{J} {\sum_{k = 1}^{K_{i} + 1} A_{(i, j, k)} [Φ (\frac{Φ^{- 1} (F_{x_{(i, j)}} (v_{(i, k)})) - γ_{i}}{\sqrt{1 - ρ}}) - Φ (\frac{Φ^{- 1} (F_{x_{(i, j)}} (v_{(i, k - 1)})) - γ_{i}}{\sqrt{1 - ρ}})]},

(8)

where $A_{(i, j, k)} = \prod_{l = k}^{K_{i}} α_{(ξ_{(i, l)}, j)}^{D_{(i, j, l)}} {(1 - α_{(ξ_{(i, l)}, j)})}^{1 - D_{(i, j, l)}} \prod_{l = 1}^{k - 1} η_{(ξ_{(i, l)}, j)}^{1 - D_{(i, j, l)}} {(1 - η_{(ξ_{(i, l)}, j)})}^{D_{(i, j, l)}}$. A detailed description of the MCMC algorithm employed under a general correlation matrix is given in Appendix B of the online supplementary material.
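To make expression (8) concrete, the sketch below evaluates the factor for a single tooth j of subject i under the compound symmetry parameterization. It assumes that the marginal CDF values F_{x_(i,j)}(v_(i,k)) have already been computed from the chosen PH, AFT, or PO model and that the examiner-specific α and η have been looked up for each visit; `cond_cdf` and `lik_tooth` are hypothetical names.

```python
import numpy as np
from scipy.stats import norm


def cond_cdf(F_marg, gamma, rho):
    """Conditional CDF given gamma_i: Phi((Phi^{-1}(F(t)) - gamma) / sqrt(1 - rho))."""
    return norm.cdf((norm.ppf(F_marg) - gamma) / np.sqrt(1.0 - rho))


def lik_tooth(d, F_v, alpha, eta, gamma, rho):
    """Factor j of expression (8) for one subject.

    d     : (K,) observed scores D_(i,j,1..K) (integers 0/1)
    F_v   : (K,) marginal CDF values F_{x_(i,j)}(v_(i,1..K)), increasing
    alpha : (K,) sensitivity of the examiner at each visit, alpha_(xi_(i,k), j)
    eta   : (K,) specificity of the examiner at each visit, eta_(xi_(i,k), j)
    """
    K = len(d)
    # Endpoints v_0 = 0 and v_{K+1} = +inf correspond to CDF values 0 and 1.
    G = np.concatenate(([0.0], cond_cdf(np.asarray(F_v), gamma, rho), [1.0]))
    probs = np.diff(G)                 # P(T in (v_{k-1}, v_k] | gamma_i), k = 1..K+1
    pos = alpha**d * (1 - alpha)**(1 - d)   # visit probability once the event has occurred
    neg = eta**(1 - d) * (1 - eta)**d       # visit probability before the event
    total = 0.0
    for k in range(1, K + 2):          # k indexes the interval containing T_(i,j)
        A = np.prod(pos[k - 1:]) * np.prod(neg[:k - 1])   # A_(i,j,k)
        total += A * probs[k - 1]
    return total
```

As a sanity check, the factor should sum to one over all 2^K possible score patterns, and with α = η = 1 a pattern of zeros followed by ones returns the probability of the interval in which the first positive score occurs.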

4. A Simulation Study

To validate the proposed models, we analyzed simulated datasets. The main aim of this study is to provide empirical evidence that, under the proposed semiparametric marginal approach to modeling misclassified time-to-event data, the model parameters can be estimated from the observed data alone, without external information about the misclassification parameters. This point deserves emphasis because external information beyond the observed data is often required for misclassified data in other settings. The simulation study is also used to evaluate the performance of classical model selection criteria in discriminating among the time-to-event model assumptions, to show the effect of naive analyses that neglect the misclassification process, and to assess the effect on inferences of fitting a wrong time-to-event model.

4.1. The Simulation Settings

Three different marginal models are considered for the underlying time-to-event data T_(i,j): PH, AFT, and PO marginal assumptions are used, in turn, in the definition of the true model. Under all three models we considered J = 4 teeth, and the joint model was completed with a Gaussian copula function. For all models, a bimodal baseline distribution is assumed, F₀(·) = 0.5 × LN(· ∣ −0.5, 0.8²) + 0.5 × LN(· ∣ 0.5, 0.3²), where LN(· ∣ μ, σ²) denotes the CDF of a log-normal distribution whose logarithm has mean μ and variance σ². For each model we set x_(i,j) = (x_(i,j,1), x_(i,j,2)), where $x_{(i, j, 1)} \overset{i.i.d.}{\sim} Bernoulli(0.5)$ and $x_{(i, j, 2)} \overset{i.i.d.}{\sim} Uniform(0, 1)$. The true time-to-event marginal models are shown in Appendix C of the online supplementary material.
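The bimodal baseline F₀ can be evaluated and sampled as in the sketch below (Python with NumPy and SciPy; the function names are hypothetical). SciPy's `lognorm` parameterizes LN(μ, σ²) with shape `s = σ` and `scale = exp(μ)`.

```python
import numpy as np
from scipy.stats import lognorm

# (mu, sigma) of log T for each component of F0, and the mixture weights.
COMPONENTS = [(-0.5, 0.8), (0.5, 0.3)]
WEIGHTS = [0.5, 0.5]


def F0_cdf(t):
    """Mixture CDF F0(t); scipy's lognorm uses shape s = sigma and scale = exp(mu)."""
    t = np.asarray(t, dtype=float)
    return sum(w * lognorm.cdf(t, s=sig, scale=np.exp(mu))
               for w, (mu, sig) in zip(WEIGHTS, COMPONENTS))


def F0_sample(n, rng):
    """Draw n event times from F0: pick a component, then exponentiate a normal draw."""
    comp = rng.choice(len(WEIGHTS), size=n, p=WEIGHTS)
    mu = np.array([c[0] for c in COMPONENTS])[comp]
    sig = np.array([c[1] for c in COMPONENTS])[comp]
    return np.exp(mu + sig * rng.standard_normal(n))
```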

For each marginal model, three different simulation scenarios were considered. In Scenario I, a compound symmetry correlation matrix and common effects of the predictors across teeth were assumed. In this case, we set ρ = 0.2 and β_j = (−0.5, 1) for every j. The true event times were interval-censored by simulating the “visit” times for each subject, with K_i = 10 visits. The first visit time was drawn from an LN(−1.0, 2²) distribution, and the time between consecutive visits, v_(i,k) − v_(i,k−1), was drawn from an LN(−0.7, 0.2²) distribution. We assumed that the occurrence of the event was assessed by Q = 4 examiners, allocated randomly to each subject and visit. We further assumed common misclassification parameters for each examiner across variables and set α = (0.95, 0.90, 0.85, 0.80) and η = (0.80, 0.85, 0.90, 0.95).
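The interval-censoring and misclassification mechanism of Scenario I can be sketched as follows, taking the true event times T as given. This is an illustration under stated assumptions, not the code used in the study: examiners are labeled 0, …, Q − 1 and drawn independently and uniformly for each subject and visit, and a positive score occurs with probability α of the assigned examiner once the event has occurred (T ≤ v) and with probability 1 − η otherwise. The function name is hypothetical.

```python
import numpy as np


def simulate_visits_and_scores(T, alpha, eta, K=10, rng=None):
    """Scenario I-style interval censoring with misclassified scores (a sketch).

    T     : (N, J) true event times
    alpha : (Q,) examiner sensitivities; eta : (Q,) examiner specificities
    First visit ~ LN(-1.0, 2^2); gaps between visits ~ LN(-0.7, 0.2^2).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, J = T.shape
    first = rng.lognormal(mean=-1.0, sigma=2.0, size=(N, 1))
    gaps = rng.lognormal(mean=-0.7, sigma=0.2, size=(N, K - 1))
    V = np.cumsum(np.hstack([first, gaps]), axis=1)        # v_(i,1) < ... < v_(i,K)
    xi = rng.integers(0, len(alpha), size=(N, K))           # examiner per subject and visit
    true_status = T[:, :, None] <= V[:, None, :]            # (N, J, K): event by visit k
    a = alpha[xi][:, None, :]                               # (N, 1, K), broadcast over teeth
    e = eta[xi][:, None, :]
    p_positive = np.where(true_status, a, 1.0 - e)          # P(D = 1 | true status)
    D = (rng.uniform(size=true_status.shape) < p_positive).astype(int)
    return V, xi, D
```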

In Scenario II, a general correlation matrix was assumed, keeping everything else the same as in Scenario I. In this case, we set

R_{ρ} = (\begin{matrix} 1.0 & ρ_{12} & ρ_{13} & ρ_{14} \\ ρ_{21} & 1.0 & ρ_{23} & ρ_{24} \\ ρ_{31} & ρ_{32} & 1.0 & ρ_{34} \\ ρ_{41} & ρ_{42} & ρ_{43} & 1.0 \end{matrix}) = (\begin{matrix} 1.0 & 0.4 & 0.2 & 0.1 \\ 0.4 & 1.0 & 0.4 & 0.2 \\ 0.2 & 0.4 & 1.0 & 0.4 \\ 0.1 & 0.2 & 0.4 & 1.0 \end{matrix}) .

In Scenario III, data were generated using the same setup as in Scenario II, but in estimation we allowed tooth-specific predictor effects and tooth-specific misclassification parameters for each of the four examiners. For each simulation scenario and true marginal model, we considered three sample sizes, N = 100, 200, and 300. For each scenario, true marginal model, and sample size, 200 datasets were generated.

To evaluate the ability to identify the correct time-to-event modeling assumption, the AFT, PH, and PO versions of the proposed marginal model were fit to each dataset using the algorithms described in Section 3.4. Under Scenarios I and II, we considered versions of the proposed model assuming common effects of the predictors across teeth and common misclassification parameters across teeth for each examiner. In this case, we set m_β = m_θ = 0₂ and V_β = V_θ = 10³ × I₂, and constrained uniform prior distributions were assumed for the misclassification parameters by taking $a_{(1)}^{(α, 0)} = a_{(1)}^{(α, 1)} = \dots = a_{(Q)}^{(α, 0)} = a_{(Q)}^{(α, 1)} = 1$ and $a_{(1)}^{(η, 0)} = a_{(1)}^{(η, 1)} = \dots = a_{(Q)}^{(η, 0)} = a_{(Q)}^{(η, 1)} = 1$. Under Scenario I, a compound symmetry correlation matrix was assumed with a uniform prior for ρ. Under Scenario II, an unstructured correlation matrix was assumed with a uniform Haar prior over all correlation matrices. Under Scenario III, on the other hand, we considered versions of the proposed model assuming different effects of the predictors across teeth, an unstructured correlation matrix, and different misclassification parameters across teeth for each examiner. In this case, we set m_β = m_θ = 0₈ and V_β = V_θ = 10³ × I₈, and considered a uniform prior for the general correlation matrix and constrained uniform priors for the misclassification parameters. For all models we set c = 1.

For each model and dataset, we obtained a posterior sample of size 5,000 after a burn-in period of 20,000 iterations, retaining every 5th iteration thereafter. The three versions of the proposed marginal model fit to each dataset were compared by means of the pseudo Bayes factor (PsBF), originally developed by Geisser & Eddy (1979) and further considered by Gelfand & Dey (1994). The PsBF for the comparison of M_i versus M_j is the ratio between the pseudo marginal likelihood (PML) of model M_i and that of model M_j. In our context, the PML for model M_i is defined as

{PML}_{M_{i}} = \prod_{i = 1}^{N} \prod_{j = 1}^{J} p_{M_{i}} (D_{(i, j, 1)}, \dots, D_{(i, j, K_{i})} ∣ D^{[- (i, j)]}),

where p_{M_i}(D_(i,j,1), … , D_{(i,j,K_i)}∣D^[−(i,j)]) is the predictive distribution for observations associated with the jth tooth of the ith subject, based on the data D^[−(i,j)] and under model M_i, with D^[−(i,j)] being the observed data matrix that excludes the observation for the jth tooth of subject i. Therefore, PsBF for model M_i versus model M_j is defined as

{PsBF}_{M_{i}, M_{j}} = \prod_{i = 1}^{N} \prod_{j = 1}^{J} \frac{p_{M_{i}} (D_{(i, j, 1)}, \dots, D_{(i, j, K_{i})} ∣ D^{[- (i, j)]})}{p_{M_{j}} (D_{(i, j, 1)}, \dots, D_{(i, j, K_{i})} ∣ D^{[- (i, j)]})} .

(9)

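The method suggested by Gelfand & Dey (1994) was used to estimate, from the MCMC output, the conditional predictive ordinate (CPO) statistics, that is, the leave-one-out predictive densities p_{M_i}(D_(i,j,1), … , D_{(i,j,K_i)} ∣ D^[−(i,j)]) appearing in (9). Under a compound symmetry correlation matrix, the CPO can be computed as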

\begin{aligned} p_{M_{i}} (D_{(i, j, 1)}, \dots, D_{(i, j, K_{i})} ∣ D^{[- (i, j)]}) &= {\left\{E_{γ_{i}, α, η, β, ρ, F_{0} ∣ D} \left(\frac{1}{p (D_{(i, j)} ∣ γ_{i}, α, η, β, ρ, F_{0})}\right)\right\}}^{- 1} \\ &\approx {\left\{\frac{1}{B} \sum_{b = 1}^{B} \frac{1}{p (D_{(i, j)} ∣ γ_{i}^{(b)}, α^{(b)}, η^{(b)}, β^{(b)}, ρ^{(b)}, F_{0}^{(b)})}\right\}}^{- 1}, \end{aligned}

where (γ_i^{(b)}, α^{(b)}, η^{(b)}, β^{(b)}, ρ^{(b)}, F_0^{(b)}), b = 1, … , B, are MCMC samples from the posterior distribution, D_(i,j) = (D_(i,j,1), … , D_{(i,j,K_i)}), and p(D_(i,j) ∣ γ_i, α, η, β, ρ, F₀) is given by the jth factor of expression (8). The expression for approximating the CPO under an unstructured correlation matrix is given in Appendix D of the online supplementary material.
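Given the per-draw log-likelihoods log p(D_(i,j) ∣ ·) obtained from expression (8), the harmonic-mean CPO estimate and the LPML can be computed stably on the log scale, as in this sketch. The array layout and the function names are assumptions; `scipy.special.logsumexp` is used to avoid underflow when the likelihood values are tiny.

```python
import numpy as np
from scipy.special import logsumexp


def log_cpo(loglik_draws):
    """Harmonic-mean CPO estimates on the log scale.

    loglik_draws : (B, n_units) array; entry (b, u) is log p(D_u | parameters^(b))
                   for MCMC draw b and unit u = (i, j).
    Returns log CPO_u = -log( (1/B) * sum_b exp(-loglik[b, u]) ).
    """
    B = loglik_draws.shape[0]
    return -(logsumexp(-loglik_draws, axis=0) - np.log(B))


def lpml(loglik_draws):
    """Log pseudo marginal likelihood: the sum of the log CPOs over all units."""
    return float(np.sum(log_cpo(loglik_draws)))
```

The logarithm of the PsBF in (9) for M_i versus M_j is then `lpml(ll_i) - lpml(ll_j)`, where `ll_i` and `ll_j` hold the per-draw log-likelihoods under each model.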

To assess the effect of ignoring the misclassification process on statistical inferences, we performed naive analyses of data generated under misclassification. Specifically, we implemented the semiparametric marginal models described in Section 3.1 for regular interval-censored data. These models were fit to the data obtained by assuming that the identification of the interval in which each event occurred is free of error, which leads to regular interval-censored data. In this case, each response was assumed to lie in the kth interval, where k is the first visit at which D_(i,j,k) = 1, regardless of the values of D_(i,j,k+1), … , D_{(i,j,K_i)}. The naive analyses were performed for the data generated under Scenario I, using the same MCMC and prior specifications as for the corresponding semiparametric marginal models that account for the misclassification process.
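The construction of the naive intervals can be sketched with a hypothetical helper. Treating a tooth with no positive score as right-censored at the last visit is an assumption, mirroring the interval (v_(i,K_i), +∞) implicit in the sum up to K_i + 1 in (8).

```python
import numpy as np


def naive_interval(d, v):
    """Interval (v_(k-1), v_k] for the first visit k with D = 1; later scores are ignored.

    d : (K,) observed scores; v : (K,) visit times (v_0 = 0 is implicit).
    If no score is positive, the response is treated as right-censored at v_K.
    """
    d = np.asarray(d)
    v = np.asarray(v, dtype=float)
    positives = np.flatnonzero(d == 1)
    if positives.size == 0:
        return v[-1], np.inf
    k = positives[0]                       # 0-based index of the first positive visit
    left = 0.0 if k == 0 else v[k - 1]
    return left, v[k]
```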

Finally, to assess the effect on inferences of using a wrong time-to-event model, we also simulated data from an extended hazard (EH) model (see, e.g., Li et al., 2015). The EH model assumes the following relationship among the baseline survival distribution, the predictors, and the marginal survival distributions:

1 - F_{x_{(i, j)}} (t) = \{1 - F_{0} (\exp(x_{(i, j)}^{'} ζ)\, t)\}^{\exp(x_{(i, j)}^{'} β)},

where β and ζ are vectors of regression coefficients. The EH model is a more flexible survival model that includes the AFT and PH models as special cases: setting ζ = 0 gives a PH model, and setting β = 0 gives an AFT model.
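The EH relationship is easy to check numerically. The sketch below evaluates the marginal survival function for a given predictor vector; the exponential baseline is hypothetical and chosen only for illustration.

```python
import numpy as np


def eh_survival(t, x, beta, zeta, S0):
    """Marginal survival 1 - F_x(t) = S0(exp(x'zeta) t) ** exp(x'beta) under the EH model."""
    x = np.asarray(x, dtype=float)
    return S0(np.exp(x @ zeta) * t) ** np.exp(x @ beta)


def exponential_baseline(t):
    """Hypothetical baseline survival S0(t) = exp(-t), used only for illustration."""
    return np.exp(-t)
```

Setting `zeta` to zeros recovers S₀(t)^{exp(x′β)} (PH), and setting `beta` to zeros recovers S₀(exp(x′ζ) t) (AFT).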

4.2. The Results

The results suggest that the regression and association parameters can be estimated with minimal bias and reasonable precision under all simulation settings. Table 1 shows the Monte Carlo mean, bias, and MSE of the posterior mean of the parameters under each time-to-event modeling assumption in Scenario I. The results under Scenarios II and III, and under a variation of Scenario I with a different baseline time-to-event distribution (Scenario IV), are shown in Appendix E of the online supplementary material.

Table 1:

Simulated data - Scenario I. True value, Monte Carlo mean, bias, and mean square error (MSE) of the posterior mean of the time-to-event model parameters. The results are presented for different group sample sizes (N) and true underlying time-to-event model assumptions (PH, AFT and PO). In this table, the same true time-to-event model is assumed to simulate and to fit the data.

N	Parameter	True Value	PH: Mean	PH: Bias	PH: MSE	AFT: Mean	AFT: Bias	AFT: MSE	PO: Mean	PO: Bias	PO: MSE
100	β₁	−0.5	−0.502	0.002	0.036868	−0.495	0.005	0.005954	−0.502	0.002	0.036868
	β₂	1.0	1.032	0.032	0.114593	1.004	0.004	0.016145	1.032	0.032	0.114593
	ρ	0.2	0.2350	0.035	0.005422	0.226	0.026	0.004412	0.244	0.044	0.006299
200	β₁	−0.5	−0.496	0.004	0.007412	−0.502	0.002	0.002504	−0.479	0.021	0.019762
	β₂	1.0	1.022	0.022	0.025133	1.015	0.015	0.008689	1.070	0.070	0.067400
	ρ	0.2	0.218	0.018	0.002634	0.214	0.014	0.002304	0.217	0.017	0.002817
300	β₁	−0.5	−0.504	0.004	0.005200	−0.505	0.005	0.001625	−0.502	0.002	0.015133
	β₂	1.0	1.001	0.001	0.017162	0.997	0.003	0.004365	1.029	0.029	0.046210
	ρ	0.2	0.212	0.012	0.001533	0.212	0.012	0.00168	0.214	0.014	0.001809
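For reference, the summaries in Tables 1 and 3 can be computed from the 200 posterior means of a parameter as sketched below. The bias column appears to be reported in absolute value (for example, a mean of −0.502 for β₁ = −0.5 is listed with bias 0.002), so the sketch returns |mean − true|; this reading is an inference from the tables, and the function name is hypothetical.

```python
import numpy as np


def mc_summary(post_means, true_value):
    """Monte Carlo mean, absolute bias, and MSE of the posterior mean across datasets."""
    est = np.asarray(post_means, dtype=float)
    mean = est.mean()
    bias = abs(mean - true_value)
    mse = np.mean((est - true_value) ** 2)   # equals bias**2 + variance of the estimates
    return mean, bias, mse
```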


Similar results regarding bias and MSE were observed for the misclassification parameters under all simulation settings. Figures 1 and 2 show the results for Scenario I. In general, the MSE of the misclassification parameters is similar across true time-to-event models, and the estimates of the specificity parameters are more variable. This is explained by the distribution of the visit times. The assessment intervals were simulated to roughly capture all possible survival times, i.e., to approximately cover the support of the true survival distributions. However, relatively more assessment visits fall toward the tail of the survival distribution, after most events have occurred. Therefore, fewer event-free assessments, and hence less information, are available to estimate the specificity parameters.

As illustrated in Table 1, Figure 1, and Figure 2 for Scenario I, important reductions in the MSE were observed for all parameters when the sample size increased for all simulation settings, suggesting that the posterior mean is a consistent estimator of the model parameters. These results on bias, MSE, and consistency strongly suggest that prior information on the misclassification parameters is not needed to obtain nearly unbiased and precise estimates for the regression coefficients, association parameters and misclassification parameters. Thus, the model parameters can be estimated from the observed data without extra information on the misclassification parameters.

Table 2 displays the behavior of the model selection criterion under Scenario I. It shows the percentage of simulated datasets for which the logarithm of the PML (LPML) selects the correct time-to-event model assumption. The results show that the LPML is an adequate model selection criterion and that the power for selecting the correct regression model assumption is high even for sample sizes as small as N = 100. Furthermore, the power of the LPML to select the correct model assumption increases rapidly with the sample size. The lower power of the LPML to detect the correct regression assumption under the PH and PO models is explained by the distribution of the visit times: more assessment visits fall toward the tail of the time-to-event distribution under the PH and PO models than under the AFT model.

Table 2:

Simulated data - Scenario I. Percentage of time, across simulations, in which the LPML favors the correct true underlying time-to-event regression model assumption. The results are shown for the different group sample sizes (N) and true underlying time-to-event regression model assumption.

	True Marginal Model
N	PH	AFT	PO
100	67.5	84.0	67.5
200	87.1	94.4	81.0
300	90.0	98.4	86.2


Table 3 and Figure 3 show the results of the naive analysis assuming no misclassification. The increased bias and MSE strongly support the benefits of the proposed model in the presence of misclassification. Indeed, strong and systematic biases were observed for the regression coefficients and the variance components. The posterior means of the regression coefficients under the naive model were biased towards the null effect, and the correlation was underestimated. As expected from these results, the posterior mean is a strongly biased estimator of the baseline survival function if the misclassification process is not taken into account: most of the marginal survival probabilities are significantly underestimated by the posterior mean under the naive analysis.

Table 3:

Simulated data - Scenario I. True value, and Monte Carlo mean, bias and mean square error (MSE) of the posterior mean of the time-to-event model parameters for different sample sizes. The results are presented for naive fitting of AFT, PO, and PH models. In this table, the same true time-to-event model is assumed to simulate and to fit the data.

N	Parameter	True Value	PH: Mean	PH: Bias	PH: MSE	AFT: Mean	AFT: Bias	AFT: MSE	PO: Mean	PO: Bias	PO: MSE
100	β₁	−0.5	−0.308	0.192	0.046564	−0.386	0.114	0.017434	−0.325	0.175	0.059050
	β₂	1.0	0.686	0.314	0.129580	0.791	0.209	0.057670	0.735	0.265	0.165507
	ρ	0.2	0.146	0.054	0.005063	0.141	0.059	0.005210	0.144	0.056	0.005398
200	β₁	−0.5	−0.312	0.188	0.040296	−0.390	0.11	0.014599	−0.347	0.153	0.037034
	β₂	1.0	0.710	0.290	0.100839	0.798	0.202	0.048498	0.717	0.283	0.125007
	ρ	0.2	0.125	0.075	0.006762	0.125	0.075	0.006812	0.130	0.070	0.006303
300	β₁	−0.5	−0.315	0.185	0.037274	−0.386	0.114	0.014451	−0.332	0.168	0.038471
	β₂	1.0	0.713	0.287	0.095330	0.809	0.191	0.040722	0.687	0.313	0.128777
	ρ	0.2	0.120	0.080	0.007240	0.118	0.082	0.007573	0.122	0.078	0.006896


Figure 3: — Simulated data - Scenario I. Mean across simulations of the posterior mean of the baseline survival function (dashed line) and point-wise 95% confidence region (shaded) under a naive analysis ignoring the misclassification process. The true survival function is represented as a solid line. Panels (a), (b), and (c) display the results for N = 100 under a true PH, AFT, and PO marginal model, respectively. Panels (d), (e), and (f) display the results for N = 200 under a true PH, AFT, and PO marginal model, respectively. Panels (g), (h), and (i) display the results for N = 300 under a true PH, AFT, and PO marginal model, respectively.

Finally, when an incorrect probability model is fit to the data, misleading inferences are expected for parameters whose interpretation differs across models (e.g., the regression coefficients) and for quantities strongly influenced by the model assumptions (e.g., the marginal survival functions, which vary as a function of the predictors in different ways under the different models). However, little or no effect is expected on parameters with a common interpretation across models, such as the association and misclassification parameters. A more detailed discussion of this aspect is provided in Appendix F of the online supplementary material.

5. The analysis of the Signal-Tandmobiel^® data

In this section, analyses of the ST study data are presented. We are interested in evaluating the marginal effect of gender, age at baseline, age when brushing starts, number of between-meal snacks (two or fewer a day versus more than two a day), and geographical location of the school, expressed in terms of the x– and y–coordinates, on the time-to-CE for the permanent first molars: teeth 16 and 26 on the maxilla (upper quadrants), and teeth 36 and 46 on the mandible (lower quadrants). The inclusion of the geographical components was motivated by the results of exploratory data analyses without correction for misclassification, which showed a significant East-West gradient in the apparent prevalence of CE in Flanders (estimated as the number of teeth scored positive for CE by the dentists divided by the number of teeth in the sample, and shown in Figure 10 of Appendix G of the online supplementary material). Therefore, one of the research questions is whether there is a geographical trend in the true prevalence of CE, or whether the observed trend in the apparent prevalence is completely explained by the geographic distribution of the dentists. For practical reasons, the dentists were active in relatively restricted geographical areas; for instance, the spatial distribution of the dentists in the first year of examination of the ST study is shown in Figure 11 of Appendix G of the online supplementary material. Thus, a possible cause of the apparent trend in CE is a different scoring behavior of the 16 dental examiners combined with their non-homogeneous spatial distribution in the study area. The proposed model addresses this question by correcting for the misclassification of the examiners while, at the same time, evaluating the effect of the geographic location of the school on the underlying distribution of the time-to-CE.
Note that the two possible sources of the geographic trend can be distinguished because in each year more than one examiner was active in each geographical area, and the areas where individual examiners were active partly spanned several regions. For instance, in the first year of examination at least 4 examiners were active in each province, and 14 of the 16 examiners were active in more than one province.

Different versions of the proposed models were fit to the ST data. Specifically, we considered different marginal modeling assumptions, common or tooth-specific regression coefficients, compound symmetry (structured) or unstructured correlation matrices, nonlinear and linear models for the effects of the geographic location of the schools, and common or covariate-specific misclassification parameters. For the geographic location of the schools, we considered a model based on a tensor product of spline basis functions for x and y (i.e., nonlinear and with interaction) (Hennerfeind et al., 2006), an additive spline basis for x and y (i.e., nonlinear and without interaction), linear terms for x and y with an interaction term, and a linear version without interaction. For the misclassification parameters, we considered models assuming common sensitivity and specificity parameters across teeth for every examiner, along with a model where these parameters were allowed to vary with the tooth's position, the child's gender, and age at baseline.

The models were fit assuming priors similar to those described in the analyses of simulated data. For each model, we ran the Markov chain cycle described in Section 3.4 for a conservative total of 1,000,000 iterations. After a burn-in period of 250,000 iterations, the chain was subsampled every 50 iterations, giving a reduced chain of length 15,000. Standard MCMC convergence diagnostics (not shown) suggested convergence of the chains.

Table 4 shows the LPML for the different models. The results suggest that, from a predictive point of view, the PO version of the Bayesian semiparametric marginal model predicts these data best. Furthermore, the simplest version of the model provides the best fit. Specifically, we conclude that there is no need for a “nonparametric” model of the geographic information, and no evidence supporting a spatial interaction, an interaction between the predictors and the tooth's location, an unstructured correlation matrix, tooth-specific misclassification parameters, or predictor-dependent misclassification parameters. More importantly, the results also suggest that the marginal models outperform the flexible AFT frailty model proposed by García-Zattera et al. (2016) for these data: the LPML for the frailty AFT model with the same predictors and misclassification model was −5560, versus −5543 for the simplest PO model.

Table 4:

Signal-Tandmobiel^® data. Log pseudo marginal likelihood (LPML) for the considered models. For the geographic location of the schools the tensor product of spline basis functions for x and y, additive spline basis for x and y, linear terms for x and y with an interaction term, and a linear version without interaction are represented by g(x, y), g_x(x) + g_y(y), x + y + x × y, and x + y, respectively.

Marginal model	β (across teeth)	R_ρ	x and y	α and η	LPML
AFT	Common	Structured	x + y	Common	−5552
PH	Common	Structured	x + y	Common	−5545
PO	Common	Structured	x + y	Common	−5543
PO	Common	Unstructured	x + y	Common	−5828
PO	Common	Structured	x + y	Different	−5610
PO	Common	Structured	x + y	Depending on x	−5547
PO	Different	Structured	x + y	Common	−5556
PO	Common	Structured	g(x, y)	Common	−5544
PO	Common	Structured	g_x(x) + g_y(y)	Common	−5545
PO	Common	Structured	x + y + x × y	Common	−5546
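Since log PsBF = LPML_i − LPML_j, the entries of Table 4 translate directly into pseudo Bayes factors. A small sketch using three rows of the table and the frailty AFT value reported in the text (the variable and function names are hypothetical):

```python
import math

# LPML values from Table 4 (common beta, structured R_rho, x + y, common alpha and eta).
lpml = {"AFT": -5552, "PH": -5545, "PO": -5543}
frailty_aft = -5560  # frailty AFT model of García-Zattera et al. (2016), same predictors


def pseudo_bayes_factor(lpml_i, lpml_j):
    """PsBF for M_i versus M_j, using log PsBF = LPML_i - LPML_j."""
    return math.exp(lpml_i - lpml_j)


best = max(lpml, key=lpml.get)   # model with the largest LPML
```

For instance, the PO model is favored over the PH model by a PsBF of e² ≈ 7.4, and over the frailty AFT model by e¹⁷ ≈ 2.4 × 10⁷.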


To assess the goodness of fit of the proposed model, two different measures were considered. First, we used a posterior predictive check strategy (Gelman et al., 2014), comparing the predictive distribution of the error-corrupted binary variables with the observed ones. A summary of the results for the ST data under different models is given in Appendix H of the online supplementary material. The results show no evidence of lack of fit for the selected model; for instance, the 95% credible band from the posterior predictive distribution contains the observed count in all cases. Furthermore, the PO versions of the proposed model (under a compound symmetry correlation matrix and under an unstructured correlation matrix) showed the best performance. In fact, the posterior predictive mean (95% credible interval) of the “chi-square” goodness-of-fit statistic was 48.76 (17.67, 118.34) for the PO model under a compound symmetry correlation matrix, 47.54 (19.22, 118.87) for the PO model under an unstructured correlation matrix, 83.28 (48.44, 153.12) for a parametric version of the PO model (with a compound symmetry correlation matrix) using a Weibull baseline distribution, 53.62 (19.55, 144.49) for the PH version of the model, and 65.06 (31.42, 138.80) for the AFT version of the model.

As a second measure of goodness of fit, we considered the posterior predictive distribution of the latent time-to-event residuals and compared it with the theoretical distribution under the assumption that the model is correct. The results for the different teeth under the PO version of the proposed model, for both a compound symmetry and an unstructured correlation matrix, are given in Appendix H of the online supplementary material. The results show no significant deviations from the theoretical distribution for any tooth. The PO model under an unstructured correlation matrix shows a slightly better performance: the point-wise 95% credible band for the quantiles of the latent residuals covers the theoretical straight line completely. For the PO model under a compound symmetry correlation matrix, the point-wise 95% credible band covers most of the theoretical straight line, with small deviations observed for tooth 46.

Table 5 shows the posterior means and 95% highest posterior density (HPD) credible intervals for the regression and association parameters under the selected PO semiparametric model. The 95% HPD interval for each regression coefficient suggests that gender, age, age when brushing starts, and geographical location (y-coordinate) have a significant effect on the marginal odds of CE at any given time. To evaluate the posterior evidence about the effect of the predictors on the time to CE, we also computed the pseudo contour probability (PsCP) for each of the corresponding null hypotheses. The PsCP was computed from equi-tailed credible bands and is defined as one minus the smallest credible level for which the null-hypothesis parameter value is contained in the corresponding credible band. The PsCP was 0.007 for the marginal effect of gender, 0.001 for the marginal effects of both age and age when brushing starts, 0.185 for the number of between-meal snacks, 0.275 for the x-coordinate, and 0.006 for the y-coordinate. These results suggest strong posterior evidence against several of the corresponding null hypotheses: namely, boys have greater odds of developing CE, and the older the child is when he/she starts brushing, the greater the odds of developing CE. Furthermore, the results on β₅ and β₆ support the hypothesis that the observed geographical gradient reflects real local geographical differences and is not due to the different scoring behavior of the examiners.
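Because the PsCP is defined through equi-tailed bands, it can be computed directly from posterior draws: the equi-tailed band of level g contains the null value exactly when g ≥ |2p − 1|, where p is the posterior probability that the parameter is at most the null value, so PsCP = 1 − |2p − 1| = 2 min(p, 1 − p). A minimal sketch, assuming a continuous posterior (the function name is hypothetical):

```python
import numpy as np


def pseudo_contour_probability(draws, null_value=0.0):
    """PsCP from equi-tailed credible bands, estimated from posterior draws.

    p = P(parameter <= null_value | data); the smallest equi-tailed level that
    contains null_value is |2p - 1|, hence PsCP = 1 - |2p - 1| = 2 * min(p, 1 - p).
    """
    p = np.mean(np.asarray(draws) <= null_value)
    return 2.0 * min(p, 1.0 - p)
```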

Table 5:

Signal-Tandmobiel^® data. Posterior mean (95% credible interval) for the time-to-event model parameters under the PO version of the proposed model (PO), a semiparametric version of the PO model for error-free interval-censored data, neglecting the misclassification process (PO naive), and for a Weibull parametric PO model, taking into account the misclassification process (PO parametric).

	Model
Parameter	PO	PO naive	PO parametric
β₁(Gender; Girl)	0.2853 ( 0.0677 ; 0.5049)	0.2126 ( 0.0018 ; 0.4242)	0.3742 ( 0.1420 ; 0.5730)
β₂ (Age at baseline; years)	0.2275 ( 0.1343 ; 0.3266)	0.1927 ( 0.1165 ; 0.2667)	0.2381 ( 0.1348 ; 0.3376)
β₃ (Age when brushing starts; years)	−0.3108 (−0.5082; −0.1202)	−0.2642 (−0.4042; −0.1139)	0.1362 (−0.0649 ; 0.3059)
β₄ (In between–meal snacks; ≥ 2 a day)	0.1609 (−0.0718 ; 0.4039)	0.1554 (−0.0565 ; 0.3804)	0.1947 (−0.0323 ; 0.4258)
β₅ (x-coordinate)	1.2029 (−0.9065 ; 3.2985)	1.1348 (−0.6141 ; 2.6372)	2.0010 (−0.1179 ; 3.8020)
β₆ (y-coordinate)	−8.5588 (−14.7564 ; −2.7341)	−7.9773 (−12.8916 ; −3.1642)	−4.0854 (−9.8616 ; 1.9983)
ρ	0.6935 ( 0.6520 ; 0.7309)	0.5536 ( 0.4672 ; 0.6512)	0.4241 ( 0.3342 ; 0.4966)


Figure 4 shows the posterior mean and 95% HPD credible interval for the sensitivity and specificity of each examiner under the selected PO model. The results under the corresponding AFT and PH models are also displayed in this figure. The results suggest a greater variability in the sensitivity than in the specificity estimates, which can be explained by the low prevalence of CE at this age. All examiners showed a sensitivity greater than 0.75, with rather narrow 95% HPD credible intervals, with one exception. The latter result is explained by the fact that this examiner (examiner 9) was only involved in the first two years of the ST study, having less information for the estimation of his parameters. The posterior means for the specificity parameters were higher than 0.93 for all examiners.

To illustrate the contributions of both the nonparametric and misclassification components of the proposed model for the ST data, we also fit parametric versions of the proposed models and performed naive analyses using a Bayesian semiparametric model for error-free interval-censored data (i.e., neglecting the misclassification process). The results show the need for the Bayesian nonparametric component in the time-to-event model: the LPML values for the parametric counterparts of the simplest PH, AFT, and PO versions of the model were −5570, −5583, and −5564, respectively, compared with −5545, −5552, and −5543 for the corresponding semiparametric versions in Table 4. In these cases, the parametric models were fit using a Weibull baseline distribution, the same predictors, and the same misclassification model.

Table 5 shows the posterior inferences for the regression parameters under a semiparametric PO model fit with a naive analysis. Ignoring the misclassification process in the ST data attenuates the effects of the predictors towards zero and reduces the power to detect differences. The results also show that correcting the point estimates of the predictor effects through the model that accounts for misclassification, relative to the naive analysis, does not increase their variability. This is an important advantage over contexts where the data do not contain information on the misclassification parameters (see, e.g., Luan et al., 2015). On the other hand, under a Weibull parametric PO model the differences in the posterior inferences do not follow a systematic pattern: some coefficients take larger values and others smaller values than under the semiparametric PO model. More importantly, the significant effects of the age when brushing starts and of the geographic location are not detected under the parametric version of the model.
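A simplified, single-time-point illustration may help explain the attenuation. Assume a PO parameterization in which the odds of having experienced the event by time t are multiplied by exp(x'β). This sign convention is an assumption here. Under it, each regression coefficient is a log odds ratio of event status at every fixed t. Now suppose event status is recorded with nondifferential sensitivity Se and specificity Sp. A recorded positive then occurs with probability Se·p + (1 − Sp)(1 − p), and the naive log odds ratio computed from the recorded statuses is pulled towards zero. All numbers below are hypothetical, and this is not the interval-censored model fit in the paper.

```python
import math


def observed_positive_prob(p, se, sp):
    """P(recorded positive) given true event probability p, sensitivity se and
    specificity sp, assuming nondifferential misclassification."""
    return se * p + (1.0 - sp) * (1.0 - p)


def log_odds_ratio(p1, p0):
    """Log odds ratio of event status between two groups."""
    return math.log(p1 / (1.0 - p1)) - math.log(p0 / (1.0 - p0))


se, sp = 0.80, 0.95  # hypothetical examiner accuracy
p0, p1 = 0.20, 0.30  # hypothetical true P(event by t) in two predictor groups

true_lor = log_odds_ratio(p1, p0)
naive_lor = log_odds_ratio(observed_positive_prob(p1, se, sp),
                           observed_positive_prob(p0, se, sp))
print(f"true log-OR = {true_lor:.3f}, naive log-OR = {naive_lor:.3f}")
# prints: true log-OR = 0.539, naive log-OR = 0.417
```

Recorded positives are a mixture of true positives and false positives, which compresses the group differences in the recorded probabilities. In this toy example the naive estimate is smaller in magnitude than the true effect, consistent with the pattern described for Table 5.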

Figures 5 and 6 display the estimated survival functions for some combinations of the predictors under the different models. The results show that substantially different inferences are also obtained when the objects of interest are the predictor-dependent marginal survival functions. Naive analyses that ignore the misclassification process, as well as the parametric version of the model, can even produce survival point estimates that fall outside the credible region under the PO model (see also Figures 14 and 15 in the online supplementary material).
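Predictor-dependent survival functions like those in Figures 5 and 6 are obtained by transforming a baseline survival function S₀ = 1 − F₀ according to the regression assumption. Below is a minimal sketch under commonly used parameterizations with linear predictor η = x'β:
- PH: S(t | x) = S₀(t)^{exp(η)}.
- PO: the odds of the event by t are multiplied by exp(η).
- AFT: S(t | x) = S₀(exp(η) t).

These sign conventions, the Weibull baseline, and all names and numbers are assumptions for illustration. In the paper, S₀ is given a flexible tailfree prior rather than being a fixed curve.

```python
import numpy as np


def weibull_baseline(t, shape=2.0, scale=10.0):
    """Hypothetical baseline survival S0(t) = exp(-(t / scale) ** shape)."""
    return np.exp(-(np.asarray(t, dtype=float) / scale) ** shape)


def survival_given_x(t, eta, model, s0=weibull_baseline):
    """S(t | x) for linear predictor eta = x'beta under PH, PO or AFT."""
    t = np.asarray(t, dtype=float)
    if model == "PH":   # hazard multiplied by exp(eta)
        return s0(t) ** np.exp(eta)
    if model == "PO":   # odds of the event by t multiplied by exp(eta)
        base = s0(t)
        return base / (base + np.exp(eta) * (1.0 - base))
    if model == "AFT":  # time scale accelerated by exp(eta)
        return s0(np.exp(eta) * t)
    raise ValueError(f"unknown model: {model}")


times = np.linspace(0.0, 20.0, 5)
for m in ("PH", "PO", "AFT"):
    print(m, np.round(survival_given_x(times, 0.5, m), 3))
```

Because the three models share the same S₀, their curves differ only through the regression assumption. This mirrors the "common ground" argument in the concluding remarks.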

Figure 5: Signal-Tandmobiel® data, PO model, misclassification. Posterior predictive mean of the survival function under the selected model (solid line) and under a semiparametric PO model neglecting the misclassification process (dashed line). The pointwise 60% credible bands for each model are displayed as gray areas. Panel (a) displays the results for a girl, 7.2 years old at baseline, 3 years old when brushing starts, having two or fewer snacks between meals, and with the sample mean x- and y-coordinates. Panel (b) displays the results for a girl, 7.2 years old at baseline, 2 years old when brushing starts, having two or fewer snacks between meals, and with the sample mean x- and y-coordinates. Panel (c) displays the results for a boy, 7.2 years old at baseline, 3 years old when brushing starts, having two or fewer snacks between meals, and with the sample mean x- and y-coordinates. Panel (d) displays the results for a boy, 7.2 years old at baseline, 2 years old when brushing starts, having two or fewer snacks between meals, and with the sample mean x- and y-coordinates.

Figure 6: Signal-Tandmobiel® data, PO model, nonparametric. Posterior predictive mean of the survival function under the selected model (solid line) and under a Weibull parametric PO model (dashed line). The pointwise 60% credible bands for each model are displayed as gray areas. Panel (a) displays the results for a girl, 7.2 years old at baseline, 3 years old when brushing starts, having two or fewer snacks between meals, and with the sample mean x- and y-coordinates. Panel (b) displays the results for a girl, 7.2 years old at baseline, 2 years old when brushing starts, having two or fewer snacks between meals, and with the sample mean x- and y-coordinates. Panel (c) displays the results for a boy, 7.2 years old at baseline, 3 years old when brushing starts, having two or fewer snacks between meals, and with the sample mean x- and y-coordinates. Panel (d) displays the results for a boy, 7.2 years old at baseline, 2 years old when brushing starts, having two or fewer snacks between meals, and with the sample mean x- and y-coordinates.
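Posterior predictive means and pointwise credible bands of the kind shown in Figures 5 and 6 can be computed from survival curves evaluated at each saved MCMC draw on a common time grid. The sketch below averages the curves and takes pointwise quantiles. The equal-tailed band is an assumed choice (pointwise HPD bands would be an alternative), and the function name `survival_band` and the simulated Weibull draws are hypothetical.

```python
import numpy as np


def survival_band(curves, level=0.60):
    """Posterior predictive mean and pointwise equal-tailed band.

    curves[m, g] = S(t_g | x, theta^(m)) for posterior draw m and grid time t_g.
    """
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)  # average of S(t_g | x, theta) over the draws
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(curves, [alpha, 1.0 - alpha], axis=0)
    return mean, lower, upper


# Usage with hypothetical posterior draws of a Weibull scale parameter.
rng = np.random.default_rng(2)
scales = rng.normal(10.0, 0.5, size=3000)
grid = np.linspace(0.0, 20.0, 41)
curves = np.exp(-(grid[None, :] / scales[:, None]) ** 2)
mean, lower, upper = survival_band(curves, level=0.60)
```

The band is pointwise. Each time point has the stated posterior coverage separately, so the whole curve is not guaranteed to lie inside the band jointly.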

6. Concluding Remarks

We have proposed a Bayesian semiparametric approach for the marginal modeling of misclassified correlated interval-censored data. We showed that models under this framework can outperform standard frailty models, even when the latter are specified flexibly with respect to the distributional assumptions. Although the methodology was motivated by an oral health application, it can be applied to any situation where correlated responses are of interest, the responses can only be determined to lie in an interval of time, and the assessment of the event is subject to misclassification. Examples include studies of kidney failure or vision loss. An important aspect of the Bayesian nonparametric formulation of the model is that the different regression model assumptions are placed on common ground, because the same flexible model is assumed for the baseline marginal time-to-event distribution function F₀. Furthermore, parametric models are special cases of the nonparametric models. Thus, differences in the performance of the models can be attributed to the regression model assumption only, rather than to additional differences in the nonparametric models or estimation methods. The proposed approach is illustrated under the three most commonly used regression assumptions (PH, AFT, and PO). However, it can easily be extended to other specifications, such as the extended hazard (EH) model (see, e.g., Li et al., 2015), or to a fully nonparametric specification of the marginal distributions (see, e.g., Jara et al., 2010). Indeed, we also fit an EH version of the proposed model to the ST data. Its LPML was −5564, so the LPML criterion still favors the simplest PO version of the proposed model.

We provided empirical evidence that, under simple restrictions on the parameter space, the parameters of the proposed model can be estimated from the observed data of a longitudinal study in which the follow-up of individuals and variables continues after the first positive result. This avoids the need for external information on the misclassification parameters. The results suggest that, even under uniform priors on the misclassification parameters, the posterior means of the model parameters are unbiased, precise, and consistent. We note that if external information on the misclassification parameters is available, it can easily be incorporated into the model specification.

The generalization of the proposed modeling approach to account for potential time trends in the misclassification parameters is also of interest in some applications, for instance, when the examiners follow a learning-by-doing process. The most important questions in such generalizations relate to the potential lack of identification of the model parameters. In the context of models for categorical data, the assumption of constant misclassification parameters is a necessary and sufficient identification restriction when at least three time points are considered (García-Zattera et al., 2010, 2012). The empirical results provided in this paper suggest that this constraint is at least a sufficient identification restriction when more time points are considered. These and other generalizations are the subject of ongoing research.

The MCMC algorithms were coded in C++. The code was compiled into a shared library and linked into R through the foreign language interface of the Rcpp package. For a simulated data set of size N = 100 and J = 4, obtaining a Markov chain of length 45,000 takes 25 minutes on average on an iMac with a 3.2 GHz Intel Core i5 processor and 16 GB of 1600 MHz DDR3 memory. When the number of covariates is increased from 2 to 10, the computation time for N = 100 increases to 30 minutes. For a sample size of N = 300, the computation time is one hour on average.

Supplementary Material

Supp1

NIHMS1515024-supplement-Supp1.pdf (1.5 MB, PDF)

Acknowledgements

The research time of Li was supported in part by a National Cancer Institute grant (5P30CA118100-11; National Cancer Institute, USA; PI: Willman). The second author was supported by Fondecyt grants 1141193 and 1180640. The third author was supported by Fondecyt grant 11110033. Part of the work was performed during a visit of the fourth author to the Pontificia Universidad Católica de Chile, supported by Fondecyt grant 11110033. The Signal-Tandmobiel® study comprises the following partners: D. Declerck (Dental School, Catholic University Leuven), L. Martens (Dental School, University Ghent), J. Vanobbergen (Dental School, University Ghent), P. Bottenberg (Dental School, University Brussels), E. Lesaffre (L-BioStat, Catholic University Leuven) and K. Hoppenbrouwers (Youth Health Department, Catholic University Leuven; Flemish Association for Youth Health Care). The authors thank Sofía and Josefa Jara for proofreading.

Contributor Information

Li Li, Department of Mathematics and Statistics, The University of New Mexico, Albuquerque, NM 87131, USA (llis@unm.edu).

Alejandro Jara, Department of Statistics, Pontificia Universidad Católica de Chile, Casilla 306, Correo 22, Santiago, Chile (atjara@uc.cl).

María José García-Zattera, Department of Statistics, Pontificia Universidad Católica de Chile, Casilla 306, Correo 22, Santiago, Chile (mjgarcia@uc.cl).

Timothy E. Hanson, Statistician, Medtronic, Inc., 710 Medtronic Parkway N.E., Minneapolis, MN 55432, USA (tim.hanson2@medtronic.com).

References

Christensen R, Hanson T & Jara A (2008). Parametric nonparametric statistics: An introduction to mixtures of finite Polya trees. The American Statistician 62 296–306.
Cox DR (1972). Regression models and life-tables (with Discussion). Journal of the Royal Statistical Society, Series B 34 187–220.
Fabius J (1964). Asymptotic behavior of Bayes' estimates. The Annals of Mathematical Statistics 35 846–856.
Ferguson TS (1974). Prior distributions on spaces of probability measures. The Annals of Statistics 2 615–629.
Freedman D (1963). On the asymptotic behavior of Bayes' estimates in the discrete case. The Annals of Mathematical Statistics 34 1386–1403.
García-Zattera MJ, Jara A & Komárek A (2016). A flexible AFT model for misclassified clustered interval-censored data. Biometrics 72 473–483.
García-Zattera MJ, Jara A, Lesaffre E & Marshall G (2012). Modeling of multivariate monotone disease processes in the presence of misclassification. Journal of the American Statistical Association 107 976–989.
García-Zattera MJ, Mutsvari T, Jara A, Declerck D & Lesaffre E (2010). Correcting for misclassification for a monotone disease process with an application in dental research. Statistics in Medicine 29 3103–3117.
Geisser S & Eddy W (1979). A predictive approach to model selection. Journal of the American Statistical Association 74 153–160.
Gelfand AE & Dey D (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B 56 501–514.
Gelfand AE & Mallick BK (1995). Bayesian analysis of proportional hazards models built from monotone functions. Biometrics 51 843–852.
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A & Rubin DB (2014). Bayesian Data Analysis. CRC Press, 3rd ed.
Gong G, Whittemore AS & Grosser S (1990). Censored survival data with misclassified covariates: A case study of breast-cancer mortality. Journal of the American Statistical Association 85 20–28.
Haario H, Saksman E & Tamminen J (2001). An adaptive Metropolis algorithm. Bernoulli 7 223–242.
Hanson T (2006). Inference for mixtures of finite Polya tree models. Journal of the American Statistical Association 101 1548–1565.
Hanson T & Johnson WO (2002). Modeling regression error with a mixture of Polya trees. Journal of the American Statistical Association 97 1020–1033.
Hanson T & Johnson WO (2004). A Bayesian semiparametric AFT model for interval-censored data. Journal of Computational and Graphical Statistics 13 341–361.
Hanson T & Yang M (2007). Bayesian semiparametric proportional odds models. Biometrics 63 88–95.
Hanson TE, Branscum A & Johnson WO (2011). Predictive comparison of joint longitudinal–survival modeling: a case study illustrating competing approaches. Lifetime Data Analysis 17 3–28.
Hennerfeind A, Brezger A & Fahrmeir L (2006). Geoadditive survival models. Journal of the American Statistical Association 101 1065–1075.
Hjort NL (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics 18 1259–1294.
Hougaard P (2000). Analysis of Multivariate Survival Data. New York, USA: Springer.
Ibrahim JG, Chen M-H & Sinha D (2001). Bayesian Survival Analysis. New York, USA: Springer.
Imai K & van Dyk D (2005). A Bayesian analysis of the multinomial probit model using marginal data augmentation. Journal of Econometrics 124 311–334.
Jara A & Hanson T (2011). A class of mixtures of dependent tail-free processes. Biometrika 98 553–566.
Jara A, Lesaffre E, De Iorio M & Quintana FA (2010). Bayesian semiparametric inference for multivariate doubly-interval-censored data. The Annals of Applied Statistics 4 2126–2149.
Kalbfleisch JD (1978). Nonparametric Bayesian analysis of survival time data. Journal of the Royal Statistical Society, Series B 40 214–221.
Lavine M (1992). Some aspects of Polya tree distributions for statistical modeling. The Annals of Statistics 20 1222–1235.
Lavine M (1994). More aspects of Polya tree distributions for statistical modeling. The Annals of Statistics 22 1161–1176.
Li L, Hanson T & Zhang J (2015). Spatial extended hazard model with application to prostate cancer survival. Biometrics 71 313–322.
Lin DY & Ying Z (1994). Semiparametric analysis of the additive risk model. Biometrika 81 61–71.
Liu JS & Wu Y (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association 94 1264–1274.
Luan X, Pan W, Gerberich S & Carlin B (2015). Does it always help to adjust for misclassification of a binary outcome in logistic regression? Statistics in Medicine 24 2221–2234.
McKeown K & Jewell NP (2010). Misclassification of current status data. Lifetime Data Analysis 16 215–230.
Müller P, Quintana FA, Jara A & Hanson TE (2015). Bayesian Nonparametric Data Analysis. New York, USA: Springer.
Pitts NB, Evans DJ & Pine CM (1997). British Association for the Study of Community Dentistry (BASCD) diagnostic criteria for caries prevalence surveys 1996/97. Community Dental Health 14 (Suppl 1) 6–9.
Sinha D & Dey DK (1997). Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association 92 1195–1212.
Sklar A (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris 8 229–231.
Tierney L (1994). Markov chains for exploring posterior distributions. The Annals of Statistics 22 1701–1762.
van Dyk D & Meng X (2001). The art of data augmentation. Journal of Computational and Graphical Statistics 10 1–50.
Vanobbergen J, Martens L, Lesaffre E & Declerck D (2000). The Signal-Tandmobiel project, a longitudinal intervention health promotion study in Flanders (Belgium): Baseline and first year results. European Journal of Paediatric Dentistry 2 87–96.
Zhao L, Hanson T & Carlin BP (2009). Flexible spatial frailty modeling via mixtures of Polya trees. Biometrika 96 263–276.
