Semiparametric Latent Class Analysis of Recurrent Event Data

Wei Zhao; Limin Peng; John Hanfelt

doi:10.1111/rssb.12499

Author manuscript; available in PMC: 2022 Dec 2.

Published in final edited form as: J R Stat Soc Series B Stat Methodol. 2022 Apr 14;84(4):1175–1197. doi: 10.1111/rssb.12499

PMCID: PMC9718440 NIHMSID: NIHMS1778095 PMID: 36465280

Summary.

Recurrent events data frequently arise in chronic disease studies, providing rich information on disease progression. The concept of latent class offers a sensible perspective to characterize complex population heterogeneity in recurrent event trajectories that may not be adequately captured by a single regression model. However, the development of latent class methods for recurrent events data has been sparse, typically requiring strong parametric assumptions and involving algorithmic issues. In this work, we investigate latent class analysis of recurrent events data based on flexible semiparametric multiplicative modeling. We derive a robust estimation procedure through novelly adapting the conditional score technique and utilizing the special characteristics of multiplicative intensity modeling. The proposed estimation procedure can be stably and efficiently implemented based on existing computational routines. We provide solid theoretical underpinnings for the proposed method, and demonstrate its satisfactory finite sample performance via extensive simulation studies. An application to a dataset from research participants at Goizueta Alzheimer’s Disease Research Center illustrates the practical utility of our proposals.

Keywords: Latent class analysis, Recurrent events data, Multiplicative intensity model, Estimating equation, Conditional score

1. Introduction

Recurrent events data frequently arise in chronic disease follow-up studies when repeated occurrences of a disease-related event, such as hospitalization and infection, are monitored over time. Such data contain rich information on disease progression which often presents complex heterogeneous patterns across individuals. A common strategy to accommodate the heterogeneity in recurrent events data is to perform regression analysis that links the recurrent events outcome with a set of observed explanatory variables based on a specified model. Well-known methods include assessing or modeling the intensity function of recurrent events (Andersen and Gill, 1982; Pepe and Cai, 1993; Wang et al., 2001, for example), the gap time between recurrent events (Prentice et al., 1981; Lin et al., 1999; Luo et al., 2013, for example), and the mean or rate function of recurrent events (Cook and Lawless, 1997; Lin et al., 2000, for example). Despite the success in many applications, the standard regression strategy may have poor performance when data are embedded with multiple distinct subgroups owing to heterogeneous underlying etiology or other factors. Results from our simulation studies (see Section 5) suggest that ignoring the existence of distinct subgroups and fitting one common model for all can lead to substantially increased prediction errors.

Latent class analysis (LCA) offers a parsimonious solution to tackle complex heterogeneity structure that cannot be adequately captured by a single regression model. A dominant type of LCA approaches is to adopt latent class mixture modeling, which views the observed data as a manifestation of multiple latent classes or subgroups. Such LCA methods have been well studied for various kinds of data, including standard uncensored data (Wedel et al., 1993; Gallop et al., 2009; Lim et al., 2014, for example), censored data (Farewell, 1982; Jedidi et al., 1993; Mair and Hudec, 2009; Qu et al., 2015; Egleston et al., 2017, for example), longitudinal data (Muthén and Shedden, 1999; Nagin, 1999; Muthén, 2004; Reinecke and Seddig, 2011; Lai et al., 2016; Jo et al., 2017; Bacci et al., 2019, among others), and longitudinal data in combination with survival data (Lin et al., 2002, 2004; Altstein et al., 2011; Proust-Lima et al., 2016; Hilton et al., 2018; Han et al., 2007; Han, 2009). However, LCA methods tailored to delineate the heterogeneity in recurrent event trajectory are limited. Relevant efforts include Han et al. (2007) and Han (2009) that investigated a joint latent class model of longitudinal biomarkers and recurrent events. These methods rely on parametric model assumptions and require a rather complex expectation-maximization (EM) algorithm to obtain the maximum likelihood estimates. This raises concerns on theoretical bias due to misspecification of the parametric model and computational instability due to the algorithmic complexity.

To help uncover the heterogeneity structure underlying the observed recurrent events with the above issues alleviated, we propose a robust semiparametric latent class method for recurrent events data based on the popular multiplicative intensity modeling. That is, we assume the whole population consists of K latent classes. Within each latent class, the occurrences of recurrent events can be captured by a semiparametric multiplicative intensity model (Prentice et al., 1981; Andersen and Gill, 1982), while the assumed recurrent events models have distinct covariate coefficients and baseline intensity function across different latent classes. To estimate the proposed latent class mixture model, the parametric likelihood approach is precluded by the nonparametric formulation of the baseline intensity function under the semiparametric multiplicative intensity model. Adapting existing methods for semiparametric multiplicative intensity models (Andersen and Gill, 1982; Wang et al., 2001, for example), however, confronts a notable challenge due to the unobservable latent class membership for all subjects.

In this work, we investigate the latent class analysis of recurrent events data based on the proposed semiparametric latent class mixture model. Utilizing the special stochastic properties implied by the adopted multiplicative intensity modeling, we construct a Nelson-Aalen type equation and derive a nonparametric estimator of the baseline mean function. We address the difficulty with the unobservable latent class membership by adapting the principle of conditional score (Stefanski and Carroll, 1987). Using empirical process arguments and estimating equation theory, we establish desirable asymptotic properties, including the uniform consistency and weak convergence of the baseline mean function estimator, and the consistency and asymptotic normality of parameter estimators. We also discuss the selection of K, the number of latent classes, using the classic relative entropy measure (Ramaswamy et al., 1993). Of practical appeal, our estimating equations can be solved by a simple iterative estimation procedure, which can be stably and efficiently implemented based on existing computational routines.

The remainder of this paper is organized as follows. We describe the latent class mixture modeling of recurrent events data and other model assumptions in Section 2. We present the proposed estimating equations and algorithm in Section 3. Section 4 covers the asymptotic properties of the proposed estimators and inferences. In Section 5, we report results from our simulation studies, which demonstrate satisfactory finite sample performance of the proposed method as well as its practical advantages. In Section 6, we apply the new method to a dataset from a group of research participants at Goizueta Alzheimer’s Disease Research Center who were diagnosed with a cognitive disorder. Some concluding remarks are contained in Section 7.

2. Data and Model Assumptions

For subject i, let $T_{i}^{(j)}$ denote the time to the jth recurrent event and let ${\tilde{Z}}_{i}$ denote a p × 1 vector of time-independent covariates. The underlying counting process for the recurrent events is defined as $N_{i}^{*} (t) = \sum_{j = 1}^{\infty} I (T_{i}^{(j)} \leq t) (i = 1, \dots, n)$ . Suppose the observation of recurrent events is terminated by a censoring time C_i. Then the observed counting process of recurrent events is given by $N_{i} (t) = N_{i}^{*} (t \land C_{i}) = \sum_{j = 1}^{\infty} I (T_{i}^{(j)} \leq t \land C_{i})$ , where a ∧ b denotes the minimum of a and b. The observed recurrent events data consist of ${\{N_{i} (t), C_{i}, {\tilde{Z}}_{i}\}}_{i = 1}^{n}$ . In the sequel, notation without the subscript i represents the corresponding population analogues.

Suppose the whole population comprises K latent classes; each latent class represents a subpopulation that has its own mechanism governing the interplay between recurrent event occurrences and the observed covariates. To depict such a data scenario, we assume a latent class mixture model, where $N_{i}^{*} (t)$ is a nonstationary Poisson process with the intensity function,

λ_{i} (t) = \sum_{k = 1}^{K} I (ξ_{i} = k) \times λ_{0} (t) \times W_{i} \times η_{0, k} \times \exp ({\tilde{Z}}_{i}^{T} {\tilde{β}}_{0, k}) .

(1)

Here the number of latent classes, K, is pre-determined, ξ_i stands for the unobservable latent class membership with I(ξ_i = k) indicating whether or not subject i belongs to class k, λ₀(t) is an unspecified, continuous, nonnegative baseline intensity function shared among different latent classes, W_i is a positive subject-specific latent variable (or frailty) independent of $(ξ_{i}, {\tilde{Z}}_{i}, C_{i})$ , and η_0,k > 0 and ${\tilde{β}}_{0, k} (k = 1, \dots, K)$ are unknown class-specific parameters. Here η_0,k captures the class-k scale shift in the baseline intensity function, and ${\tilde{β}}_{0, k}$ represents the class-k covariate effects on the intensity function of recurrent events. The frailty W_i offers the flexibility to accommodate individual difference with a larger (or smaller) value indicating more (or less) frequent occurrences of recurrent events. To ensure the identifiability of λ₀(t) and η_0,k’s in (1), we assume $E (W_{i} | {\tilde{Z}}_{i}, ξ_{i} = k) = 1$ for k = 1,...,K and impose the constraint,

\int_{0}^{ν^{*}} λ_{0} (t) d t = 1,

(2)

where ν^∗ is a predetermined constant. In practice, ν^∗ may be chosen to be slightly smaller than the upper bound of C_i’s support. A different choice of ν^∗ would only imply a scale shift to λ₀(t) by a constant with η_0,k and ${\tilde{β}}_{0, k}$ remaining the same.

Note that the specification of the class-k recurrent event process under model (1) takes the same form as the multiplicative intensity model studied by Wang et al. (2001). Both models include a multiplicative subject-specific frailty W_i, which helps relax the memoryless constraint inherent in Poisson processes (Cook and Lawless, 2007), while our model involves one additional latent variable ξ_i to account for subject-specific variability induced by the underlying latent class membership. By writing (1) as $λ_{i} (t) = \{\sum_{k = 1}^{K} I (ξ_{i} = k) η_{0, k}\} λ_{0} (t) W_{i} \exp ({\tilde{Z}}_{i}^{T} \{\sum_{k = 1}^{K} I (ξ_{i} = k) {\tilde{β}}_{0, k}\})$ , we further note that, unlike in the model of Wang et al. (2001), the latent variable ξ_i is incorporated in a non-multiplicative manner, and influences the covariate effects (i.e. $\sum_{k = 1}^{K} I (ξ_{i} = k) {\tilde{β}}_{0, k}$ ). The design of our model is tailored to tackle complex heterogeneity structure of recurrent events data through the perspective of latent class analysis (LCA).

To address the difficulty with the unobservable latent class membership, we further assume a multinomial logistic regression model for ξ_i:

P (ξ_{i} = k | {\tilde{Z}}_{i}) = p_{k} (α_{0}, {\tilde{Z}}_{i}) ≐ \frac{\exp ({\tilde{Z}}_{i}^{T} α_{0, k})}{\sum_{l = 1}^{K} \exp ({\tilde{Z}}_{i}^{T} α_{0, l})}, k = 1, \dots, K,

(3)

where $α_{0} = {(α_{0, 1}^{T}, \dots, α_{0, K}^{T})}^{T}$ with α_0,1 = 0_p×1. Model (3) is commonly adopted in latent class analysis literature to facilitate recovering information on the unobservable latent class structure based on the observed data. Note that model (3) can be readily adapted to allow only a subset of the covariates in ${\tilde{Z}}_{i}$ or a separate set of covariates to influence the distribution of the latent class membership. In addition, we assume that C_i and $(N_{i}^{*} (\cdot), ξ_{i})$ are independent given ${\tilde{Z}}_{i}$ .
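To make models (1)–(3) concrete, the following minimal sketch draws one subject from a simple instance of the latent class mixture model. Several choices here are illustrative assumptions and are not taken from the paper: a single standard normal covariate, a constant baseline intensity λ₀(t) = 1/ν^∗ (so that constraint (2) holds with µ₀(t) = t/ν^∗), a Gamma frailty with shape r and rate r, and uniform censoring on [ν^∗/2, ν^∗]. The function name `simulate_subject` is hypothetical, and classes are indexed from 0 in the code.

```python
import math
import random


def simulate_subject(rng, alpha, beta, eta, r=5.0, nu_star=10.0):
    """Draw one subject from an illustrative instance of models (1)-(3).

    Assumptions made only for this sketch: one N(0, 1) covariate,
    lambda_0(t) = 1 / nu_star on [0, nu_star], W ~ Gamma(shape=r, rate=r),
    and C ~ Uniform(nu_star / 2, nu_star).
    alpha, beta, eta hold one entry per latent class (alpha[0] should be 0).
    """
    z = rng.gauss(0.0, 1.0)

    # Model (3): multinomial logistic probabilities for the latent class.
    scores = [math.exp(z * a_k) for a_k in alpha]
    total = sum(scores)
    probs = [s / total for s in scores]
    xi = rng.choices(range(len(alpha)), weights=probs)[0]

    # Frailty with mean 1 (gammavariate takes shape and scale).
    w = rng.gammavariate(r, 1.0 / r)
    c = rng.uniform(nu_star / 2.0, nu_star)

    # On the mu_0 scale the events form a homogeneous Poisson process
    # with rate W * eta_k * exp(z * beta_k); map back via t = nu_star * s.
    rate = w * eta[xi] * math.exp(z * beta[xi])
    events = []
    s = rng.expovariate(rate)
    while s * nu_star <= c:
        events.append(s * nu_star)
        s += rng.expovariate(rate)

    return {"z": z, "class": xi, "C": c, "events": events}
```

Calling `simulate_subject(random.Random(1), alpha=[0.0, 0.5], beta=[0.2, -0.3], eta=[1.0, 4.0])` returns the covariate, the (normally unobserved) class label, the censoring time, and the observed event times for one subject.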

3. The Proposed Estimation Procedure

3.1. Estimating equation

Let $Z_{i} = {(1, {\tilde{Z}}_{i}^{⊤})}^{⊤}$ and $β_{0, k} = {(\log η_{0, k}, {\tilde{β}}_{0, k}^{⊤})}^{⊤}$ . By model assumptions (1) and (2), and $E (W_{i} | {\tilde{Z}}_{i}, ξ_{i} = k) = 1$ , it holds that

Ε [N_{i}^{*} (t) | ξ_{i} = k, Z_{i}] = Ε \{μ_{0} (t) \cdot W_{i} \cdot \exp (Z_{i}^{T} β_{0, k}) | ξ_{i} = k, Z_{i}\} = μ_{0} (t) \exp (Z_{i}^{T} β_{0, k}),

(4)

where $μ_{0} (t) = \int_{0}^{t} λ_{0} (s) d s$ and µ₀(ν^∗) = 1. Given that C_i and $(N_{i}^{*} (\cdot), ξ_{i})$ are independent given Z_i, this implies $Ε [\frac{N_{i}^{*} (C_{i})}{μ_{0} (C_{i})} | ξ_{i} = k, Z_{i}] = \exp (Z_{i}^{T} β_{0, k})$ , and consequently

Ε [I (ξ_{i} = k) Z_{i} \{\frac{N_{i}^{*} (C_{i})}{μ_{0} (C_{i})} - \exp (Z_{i}^{T} β_{0, k})\}] = 0.

(5)

However, (5) cannot be directly utilized to construct an estimating equation for β_0,k’s because ξ_i’s are not observable. To overcome this difficulty, we adapt the principle of conditional score (Stefanski and Carroll, 1987) commonly used for handling missing data. Our specific idea is to recover the missing information on I(ξ_i = k) by conditioning it on the observed Z_i, C_i and $D_{i} ≐ N_{i} (C_{i})$ . This results in the following equation:

Ε [τ_{i k} Z_{i} \{\frac{N_{i}^{*} (C_{i})}{μ_{0} (C_{i})} - \exp (Z_{i}^{T} β_{0, k})\}] = 0,

(6)

where $τ_{i k} = Ε [I (ξ_{i} = k) | Z_{i}, D_{i}, C_{i}]$ . This equation provides a feasible platform for constructing an estimating equation for β_0,k’s.

It is remarkable that $τ_{i k}$ involved in (6) has a convenient analytic form, which is an appealing feature of the proposed estimation strategy. Specifically, by the definition, $τ_{i k}$ can be expressed as

τ_{i k} = \frac{P (D_{i} = d_{i} | ξ_{i} = k, Z_{i}, C_{i}) P (ξ_{i} = k | Z_{i}, C_{i})}{\sum_{l = 1}^{K} P (D_{i} = d_{i} | ξ_{i} = l, Z_{i}, C_{i}) P (ξ_{i} = l | Z_{i}, C_{i})} .

(7)

Under model (3) and the censoring assumption that C_i and $(N_{i}^{*} (\cdot), ξ_{i})$ are independent given Z_i, we have

Ρ (ξ_{i} = k | Z_{i}, C_{i}) = p_{k} (α_{0}, {\tilde{Z}}_{i}) = \frac{\exp ({\tilde{Z}}_{i}^{T} α_{0, k})}{\sum_{l = 1}^{K} \exp ({\tilde{Z}}_{i}^{T} α_{0, l})} .

(8)

To assess P(D_i = d_i|ξ_i = k,Z_i,C_i), it is important to note that under model (1), $N_{i}^{*} (t)$ , given ξ_i = k, W_i, and Z_i, is a nonhomogeneous Poisson process with mean function $μ_{0} (t) W_{i} \exp (Z_{i}^{⊤} β_{0, k})$ (Lin et al., 2000). Thus, $\{μ_{0} (T_{i}^{(1)}), μ_{0} (T_{i}^{(2)}), \dots\}$ can be viewed as random variates generated from a homogeneous Poisson process with mean function of the form, $W_{i} \exp (Z_{i}^{⊤} β_{0, k}) t$ . Using standard probabilistic arguments presented in Section 1.1 of Supplementary Materials, we show that

\begin{array}{l} Ρ (D_{i} = d_{i} | ξ_{i} = k, Z_{i}, C_{i}) \\ = \int_{0}^{\infty} \frac{{\{\exp (Z_{i}^{T} β_{0, k}) w \cdot μ_{0} (C_{i})\}}^{d_{i}}}{d_{i}!} \exp \{- \exp (Z_{i}^{T} β_{0, k}) w \cdot μ_{0} (C_{i})\} \cdot f_{W} (w) d w, \end{array}

(9)

where f_W (·) denotes the known density function of W, the population analogue of the subject-specific frailty W_i. A common choice of the frailty density f_W (·) is the density of the Gamma(r,r) distribution (r > 0). The selection of f_W (·) and an extension to unknown f_W (·) are discussed in Section 3.5 and Section 7 respectively. Plugging (8) and (9) into (7), we can obtain an explicit expression of $τ_{i k}$ in terms of α₀, β₀ and µ₀(·), denoted by $τ_{i k} (α_{0}, β_{0}, μ_{0})$ , where $β_{0} = {(β_{0, 1}^{T}, \dots, β_{0, K}^{T})}^{T}$ .
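As an illustration of how (7)–(9) can be evaluated, the sketch below computes the posterior membership probabilities for one subject under the Gamma(r,r) frailty. It relies on the standard fact that a Poisson count with a Gamma(shape r, rate r) mixing variable is negative binomial, so the integral in (9) has the closed form Γ(d+r)/{d! Γ(r)} · {r/(r+m)}^r · {m/(r+m)}^d with m = exp(Z_i^T β_k) µ₀(C_i). The function names and the argument layout are hypothetical, and the normalising constant of (8) is dropped because it cancels in (7).

```python
import math


def neg_binom_log_pmf(d, m, r):
    """log P(D = d) when D | W ~ Poisson(m * W) and W ~ Gamma(shape=r, rate=r)."""
    return (math.lgamma(d + r) - math.lgamma(r) - math.lgamma(d + 1)
            + r * math.log(r / (r + m)) + d * math.log(m / (r + m)))


def posterior_membership(d, z, mu_c, alpha, beta, r):
    """Posterior class probabilities tau_{i1}, ..., tau_{iK} for one subject.

    d     : observed number of events D_i
    z     : covariate vector without intercept (list of floats)
    mu_c  : value of the baseline mean function at C_i (must be > 0)
    alpha : alpha[k] is the class-k coefficient list of model (3)
    beta  : beta[k] = [log eta_k, covariate effects ...] for class k
    r     : Gamma frailty parameter (assumed known)
    """
    z_full = [1.0] + list(z)
    log_terms = []
    for a_k, b_k in zip(alpha, beta):
        # Unnormalised log prior from (8); the common denominator cancels in (7).
        log_prior = sum(zj * aj for zj, aj in zip(z, a_k))
        m = math.exp(sum(zj * bj for zj, bj in zip(z_full, b_k))) * mu_c
        log_terms.append(log_prior + neg_binom_log_pmf(d, m, r))
    top = max(log_terms)  # subtract the maximum for numerical stability
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]
```

Working on the log scale avoids underflow when d_i is large; for another frailty density the closed form would be replaced by numerical integration of (9).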

Based on equation (6), if µ₀(·) were known, we may consider the following estimating equations:

S_{1, n, k} (α, β, μ_{0}) ≐ \frac{1}{n} \sum_{i = 1}^{n} τ_{i k} (α, β, μ_{0}) Z_{i} \{\frac{N_{i}^{*} (C_{i})}{μ_{0} (C_{i})} - \exp (Z_{i}^{T} β_{k})\} = 0, k = 1, \dots, K .

(10)

In addition, under model (3), the likelihood assuming ξ_i’s are observed is given by $\prod_{i = 1}^{n} \prod_{k = 1}^{K} p_{k} {(α, {\tilde{Z}}_{i})}^{I (ξ_{i} = k)}$ , which leads to the score equation,

\sum_{i = 1}^{n} \sum_{k = 1}^{K} I (ξ_{i} = k) \frac{\partial}{\partial α} \log p_{k} (α, {\tilde{Z}}_{i}) = 0.

The same reasoning used to derive (6) motivates us to consider another set of estimating equations,

S_{2, n, k} (α, β, μ_{0}) ≐ \frac{1}{n} \sum_{i = 1}^{n} τ_{i k} (α, β, μ_{0}) ({\tilde{Z}}_{i} - \frac{\exp ({\tilde{Z}}_{i}^{T} α_{k}) {\tilde{Z}}_{i}}{\sum_{j = 1}^{K} \exp ({\tilde{Z}}_{i}^{T} α_{j})}) = 0, k = 1, \dots, K .

(11)

However, µ₀(·) is generally unknown. To overcome this obstacle, we propose a Nelson-Aalen type estimator of µ₀(·) under the assumed multiplicative intensity modeling of recurrent events. Specifically, define S_C(t|Z_i) = P(C ≥ t|Z_i) and H₀(t) = log{µ₀(t)/µ₀(ν^∗)}. Since µ₀(ν^∗) = 1, it is easy to see that H₀(ν^∗) = 0 and µ₀(t) = exp{H₀(t)}. Under the conditional independent censoring assumption, the multiplicative intensity structure imposed by equation (4) implies that $Ε [d N_{i} (t) | Z_{i}, ξ_{i} = k] = S_{C} (t | Z_{i}) \exp (Z_{i}^{T} β_{0, k}) λ_{0} (t) d t$ and $Ε [I (C_{i} \geq t) N_{i} (t) d H_{0} (t) | Z_{i}, ξ_{i} = k] = S_{C} (t | Z_{i}) \exp (Z_{i}^{T} β_{0, k}) λ_{0} (t) d t$ . It then follows that E[dM_i(t)] = 0, where dM_i(t) = dN_i(t)−I(C_i ≥ t)N_i(t)dH₀(t). Solving $\sum_{i = 1}^{n} d M_{i} (t) = 0$ yields an estimator of µ₀(t),

\hat{μ} (t) = \exp \{\hat{H} (t)\}

(12)

with $\hat{H} (t) = - \int_{t}^{ν^{*}} \frac{\sum_{i = 1}^{n} d N_{i} (s)}{\sum_{i = 1}^{n} I (C_{i} \geq s) N_{i} (s)}$ . Using $\hat{μ} (\cdot)$ in place of µ₀(·) in (10) and (11), we propose the following estimating equations for α₀ and β₀:

n^{1 / 2} S_{1, n} (α, β, \hat{μ}) = 0,

(13)

n^{1 / 2} S_{2, n} (α, β, \hat{μ}) = 0,

(14)

where $S_{j, n} (α, β, \hat{μ}) = {(S_{j, n, 1} {(α, β, \hat{μ})}^{T}, \dots, S_{j, n, K} {(α, β, \hat{μ})}^{T})}^{T}, j = 1, 2$ . Note that (13), which includes (p + 1)K equations, and (14), which includes pK equations, have sufficient dimensions to estimate a total number of (2p + 1)K unknown parameters, $\{(α_{0, k}^{T}, η_{0, k}, {\tilde{β}}_{0, k}^{T}) : k = 1, \dots, K\}$ .
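The estimator in (12) only changes at observed event times, so the integral defining $\hat{H} (t)$ reduces to a finite sum over the pooled jump points in (t, ν^∗]. A minimal sketch follows; the function name `mu_hat` and the list-based data layout are assumptions made for illustration, and each subject's list is assumed to contain only the observed events (those not exceeding C_i).

```python
import math


def mu_hat(t, event_times, censor_times, nu_star):
    """Evaluate the Nelson-Aalen type estimator (12) at time t.

    event_times[i]  : observed event times of subject i (all <= censor_times[i])
    censor_times[i] : censoring time C_i
    nu_star         : the constant appearing in constraint (2)
    """
    # Distinct pooled event times in (t, nu_star].
    jumps = sorted({s for ev in event_times for s in ev if t < s <= nu_star})
    h = 0.0
    for s in jumps:
        # Numerator: number of events occurring exactly at s.
        d_s = sum(ev.count(s) for ev in event_times)
        # Denominator: sum over subjects still under observation at s
        # of their event counts N_i(s), the event at s included.
        r_s = sum(sum(1 for u in ev if u <= s)
                  for ev, c in zip(event_times, censor_times) if c >= s)
        h -= d_s / r_s
    return math.exp(h)
```

For example, with two subjects having events at {1, 2} and {2.5}, both censored at ν^∗ = 3, the only jump in (2, 3] is at 2.5 with one event and a denominator of 3, so the estimate at t = 2 is exp(−1/3).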

3.2. Estimation algorithm

The proposed estimation procedure can be implemented as follows.

Step 1: Compute $\hat{μ} (\cdot)$ based on (12). Set r = 0 and initial estimates for α₀ and β₀, denoted by ${\hat{α}}^{[0]}$ and ${\hat{β}}^{[0]}$ . Calculate ${\hat{τ}}_{i k} ≐ τ_{i k} ({\hat{α}}^{[0]}, {\hat{β}}^{[0]}, \hat{μ})$ based on equations (7)–(9).

Step 2: Increase r by 1. Solve estimating equations, (13) and (14), with $τ_{i k} (α, β, \hat{μ})$ fixed as ${\hat{τ}}_{i k}$ . Denote the resulting solutions by ${\hat{α}}^{[r]}$ and ${\hat{β}}^{[r]}$ .

Step 3: Update ${\hat{τ}}_{i k}$ by $τ_{i k} ({\hat{α}}^{[r]}, {\hat{β}}^{[r]}, \hat{μ})$ .

Step 4: Repeat Steps 2 and 3 until pre-specified convergence criteria are met. Denote the final estimators of α₀ and β₀ by $\hat{α}$ and $\hat{β}$ respectively.

In Step 1, we obtain the initial estimate, ${\hat{α}}^{[0]}$ , by following the strategy of Lin et al. (2002). Specifically, we randomly assign class memberships to all subjects and then fit the multinomial logistic regression to obtain ${\hat{α}}^{[0]}$ . We obtain the initial estimate, ${\hat{β}}^{[0]}$ , by fitting the multiplicative intensity model studied by Wang et al. (2001) using the reReg() function in R package reReg, stratified by the randomly assigned latent class membership.

In Step 2, we obtain ${\hat{β}}^{[r]}$ by solving the equation

\sum_{i = 1}^{n} {\hat{τ}}_{i k} Z_{i} \{\frac{N_{i}^{*} (C_{i})}{\hat{μ} (C_{i})} - \exp (Z_{i}^{T} β_{k})\} = 0,

for $β_{k}$ (k = 1,...,K). This equation is a monotone estimating equation (Fygenson and Ritov, 1994). Furthermore, we find that solving this equation can be equivalently transformed to fitting a “pseudo” weighted Poisson regression model with response $\frac{N_{i}^{*} (C_{i})}{\hat{μ} (C_{i})}$ and covariates Z_i along with weights ${\hat{τ}}_{i k}$ . To obtain ${\hat{α}}^{[r]}$ , it is easy to show that solving equation (14) with $τ_{i k} (α, β, \hat{μ})$ fixed at ${\hat{τ}}_{i k}$ can be equivalently carried out by first generating $ξ_{i}^{*}$ from Multinomial $(1, ({\hat{τ}}_{i 1}, \dots, {\hat{τ}}_{i K}))$ distribution (i = 1,...,n) and then performing multinomial regression with responses ${\{ξ_{i}^{*}\}}_{i = 1}^{n}$ and covariates ${\{{\tilde{Z}}_{i}\}}_{i = 1}^{n}$ . These procedures can be readily implemented by existing computational routines, such as the R function glm() and the R function multinom() in the R package nnet.
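The class-k update of Step 2 can be sketched without any regression library. The code below solves the weighted equation for $β_{k}$ by Newton's method, which is a swapped-in technique for illustration rather than the glm() route described above; the function names are hypothetical, the first column of each covariate row is assumed to be the intercept 1, and the weighted mean of the responses is assumed to be positive.

```python
import math


def solve_linear(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_beta_step(y, z, tau, n_iter=50, tol=1e-10):
    """Solve sum_i tau_i * z_i * {y_i - exp(z_i' beta)} = 0 for beta.

    y   : pseudo responses N_i(C_i) / mu_hat(C_i)
    z   : covariate rows with leading 1 (intercept), i.e. Z_i
    tau : fixed weights tau_hat_{ik} for the class being updated
    """
    p = len(z[0])
    ybar = sum(t * yi for t, yi in zip(tau, y)) / sum(tau)
    beta = [math.log(ybar)] + [0.0] * (p - 1)  # start at the weighted mean
    for _ in range(n_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for yi, zi, ti in zip(y, z, tau):
            mu = math.exp(sum(a * b for a, b in zip(zi, beta)))
            for j in range(p):
                score[j] += ti * zi[j] * (yi - mu)
                for l in range(p):
                    info[j][l] += ti * mu * zi[j] * zi[l]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Because the pseudo response is generally not an integer, a Poisson routine that insists on integer counts may warn or fail; solving the estimating equation directly, as here, sidesteps that issue.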

In Step 4, the convergence criterion can be specified as the magnitude of parameter estimate change between two consecutive iterations falling below a certain tolerance value. The magnitude of parameter estimate change may be measured by an absolute difference, for example, $\max ({‖{\hat{β}}^{[r]} - {\hat{β}}^{[r - 1]}‖}_{\infty}, {‖{\hat{α}}^{[r]} - {\hat{α}}^{[r - 1]}‖}_{\infty})$ , or by a relative difference, for example, $\max ({‖\frac{{\hat{β}}^{[r]} - {\hat{β}}^{[r - 1]}}{{\hat{β}}^{[r - 1]}}‖}_{\infty}, {‖\frac{{\hat{α}}^{[r]} - {\hat{α}}^{[r - 1]}}{{\hat{α}}^{[r - 1]}}‖}_{\infty})$ . Here || · ||_∞ denotes the L_∞ norm and the fraction between vectors stands for the component-wise fraction.
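The two stopping rules can be written as a small helper. This is a sketch with hypothetical function names, applied to the stacked vector of α and β estimates. One choice here is an assumption of this sketch and not part of the paper: components whose previous value is exactly zero (such as the constrained α_0,1 = 0) are skipped in the relative criterion to avoid division by zero.

```python
def abs_change(new, old):
    """L-infinity norm of the difference between consecutive estimates."""
    return max(abs(a - b) for a, b in zip(new, old))


def rel_change(new, old):
    """L-infinity norm of the component-wise relative difference.

    Components with a previous value of exactly 0 are skipped
    (an assumption of this sketch, e.g. for the constrained alpha_{0,1}).
    """
    return max((abs(a - b) / abs(b) for a, b in zip(new, old) if b != 0),
               default=0.0)


def converged(theta_new, theta_old, tol=1e-6, relative=False):
    """Return True when the chosen change measure falls below tol."""
    change = rel_change(theta_new, theta_old) if relative else abs_change(theta_new, theta_old)
    return change < tol
```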

3.3. Selection of the number of latent classes

To fit the proposed latent class mixture model to a real dataset, a critical question is how to select the value of K, which represents the number of latent classes. Common practice to address this question includes using domain knowledge, or through model evaluations based on information criteria, statistical tests, entropy, reliability, or other criteria. In this work, we consider a relative entropy measure (Ramaswamy et al., 1993) defined as,

E_{K} = 1 - \frac{\sum_{i = 1}^{n} \sum_{k = 1}^{K} - {\hat{τ}}_{i k} \log ({\hat{τ}}_{i k})}{n \log (K)},

(15)

where ${\hat{τ}}_{i k} = τ_{i k} (\hat{α}, \hat{β}, \hat{μ})$ . By the definition, E_K is bounded between 0 and 1. Following the discussions of Celeux and Soromenho (1996), E_K is expected to be close to 1 when latent classes are well separated, and take a small value when latent classes are heavily overlapped. Therefore, we propose to select K as the maximizer of E_K, that is, $\hat{K} = \arg \max_{K \geq 2} E_{K}$ . As suggested by the simulation studies presented in Section 5, this approach has a high chance of selecting the true value of K when latent classes are well separated. However, when latent classes overlap considerably, this approach may tend to select a value of K smaller than the true one. This may reflect that given a small or moderate sample size, the relative entropy measure may not be sufficiently “powered” to differentiate all latent classes that are heavily overlapped. Nevertheless, the simulation studies also suggest that under-selecting K in such a case may still yield reliable predictions of recurrent event numbers based on the proposed models.
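The relative entropy measure (15) and the resulting choice of K are straightforward to compute once the fitted membership probabilities are available. In the sketch below, the function names and the dictionary layout (candidate K mapped to its n × K matrix of fitted probabilities) are hypothetical, and the usual convention 0 · log 0 = 0 is applied.

```python
import math


def relative_entropy(tau):
    """Relative entropy E_K of (15) for an n x K matrix of posterior probabilities."""
    n, k = len(tau), len(tau[0])
    # Terms with tau = 0 contribute nothing (convention 0 * log 0 = 0).
    entropy = -sum(p * math.log(p) for row in tau for p in row if p > 0)
    return 1.0 - entropy / (n * math.log(k))


def select_k(tau_by_k):
    """Pick the candidate K (K >= 2) whose fitted model maximises E_K.

    tau_by_k maps each candidate K to the tau matrix of the fitted K-class model.
    """
    return max(tau_by_k, key=lambda k: relative_entropy(tau_by_k[k]))
```

Perfectly separated classes (each row of the matrix equal to a unit vector) give E_K = 1, while completely uninformative memberships (every entry equal to 1/K) give E_K = 0.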

3.4. Model checking

Model checking is of practical importance. A simple graphic approach can be conducted to evaluate the overall fit of the proposed models. The basic idea is to contrast the number of the observed recurrent events, $D_{i} ≐ N_{i} (C_{i})$ , versus its prediction under the assumed models. Specifically, the model assumptions in (1)–(3) imply $E \{N_{i} (C_{i}) | Z_{i}\} = E \{N_{i}^{*} (C_{i}) | Z_{i}\} = \sum_{k = 1}^{K} τ_{i k} \cdot μ_{0} (C_{i}) \exp (Z_{i}^{T} β_{0, k})$ . Thus, under the proposed models, N_i(C_i) may be predicted by ${\hat{D}}_{i} ≐ \sum_{k = 1}^{K} {\hat{τ}}_{i k} \cdot \hat{μ} (C_{i}) \exp (Z_{i}^{T} {\hat{β}}_{k})$ with the observed data. Therefore, the graphic model checking may be conducted via examining the scatter plot of ${\hat{D}}_{i}$ versus D_i. Observing a systematic departure of the pairs of $({\hat{D}}_{i}, D_{i})$ from the 45 degree line may suggest a lack-of-fit of the assumed models.

3.5. Selection of the frailty density

In practice, the selection of f_W (·) may be guided by the model checking procedure presented in Section 3.4. Specifically, for each candidate $f_{W}^{S} (\cdot)$ , we compute the predictions of D_i based on the observed data, denoted by ${\hat{D}}_{i}^{S}$ . Then we select $f_{W}^{S} (\cdot)$ that yields the smallest discrepancy between ${\{{\hat{D}}_{i}^{S}\}}_{i = 1}^{n}$ and ${\{D_{i}\}}_{i = 1}^{n}$ , which may be summarized by ${APE}_{M} ≐ n^{- 1} \sum_{i = 1}^{n} |{\hat{D}}_{i}^{S} - D_{i}|$ , ${MPE}_{M} ≐ \mathrm{median} \{|{\hat{D}}_{i}^{S} - D_{i}| : i = 1, \dots, n\}$ , or ${SMSPE}_{M} ≐ \sqrt{n^{- 1} \sum_{i = 1}^{n} {({\hat{D}}_{i}^{S} - D_{i})}^{2}}$ . Our simulation studies suggest that the proposed estimates that use the f_W (·) determined by this approach have quite comparable performance to the proposed estimates that adopt the true f_W (·); see Table 5.
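The predicted counts of Section 3.4 and the three discrepancy summaries can be computed as in the following sketch. The function names and argument layout are hypothetical; the fitted quantities (membership probabilities, baseline mean at C_i, and class-specific coefficients) are assumed to come from a completed fit under one candidate frailty density.

```python
import math
import statistics


def predicted_count(tau_i, mu_c, z_full, beta):
    """Predicted number of events for one subject.

    tau_i  : fitted membership probabilities for the subject
    mu_c   : estimated baseline mean function at C_i
    z_full : covariate vector with leading 1, i.e. Z_i
    beta   : beta[k] is the fitted coefficient vector of class k
    """
    return sum(t * mu_c * math.exp(sum(a * b for a, b in zip(z_full, b_k)))
               for t, b_k in zip(tau_i, beta))


def prediction_errors(d_hat, d_obs):
    """Return the APE, MPE and SMSPE summaries of |D_hat_i - D_i|."""
    abs_err = [abs(a - b) for a, b in zip(d_hat, d_obs)]
    n = len(abs_err)
    return {
        "APE": sum(abs_err) / n,
        "MPE": statistics.median(abs_err),
        "SMSPE": math.sqrt(sum(e * e for e in abs_err) / n),
    }
```

Running `prediction_errors` once per candidate frailty density and keeping the candidate with the smallest chosen summary mirrors the selection rule described above.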

Table 5.

Simulation results from the proposed estimation that uses the true f_W(·) (i.e. r = 5) or uses f_W(·) selected by model-checking measures APE_M, MPE_M, or SMSPE_M in scenario S1.

Model	r = 5				APE_M				MPE_M				SMSPE_M
Parameter	BIAS	SD	SE	CP	BIAS	SD	SE	CP	BIAS	SD	SE	CP	BIAS	SD	SE	CP
${\tilde{α}}_{11}$	−0.009	0.452	0.450	0.954	0.032	0.440	0.448	0.946	0.008	0.421	0.430	0.949	0.021	0.437	0.454	0.953
${\tilde{α}}_{12}$	0.009	0.263	0.263	0.949	−0.016	0.255	0.247	0.941	−0.021	0.257	0.233	0.940	−0.019	0.254	0.267	0.951
${\tilde{α}}_{21}$	−0.004	0.392	0.392	0.951	0.051	0.352	0.372	0.922	−0.032	0.322	0.312	0.929	0.009	0.351	0.329	0.923
${\tilde{α}}_{22}$	0.001	0.229	0.229	0.948	0.008	0.227	0.227	0.943	0.011	0.217	0.198	0.931	0.013	0.226	0.244	0.949
log(η₁)	0.022	0.090	0.101	0.953	0.046	0.137	0.128	0.946	0.046	0.129	0.130	0.946	0.048	0.137	0.156	0.951
log(η₂)	0.029	0.116	0.124	0.952	0.032	0.132	0.111	0.947	0.035	0.133	0.121	0.942	0.027	0.134	0.123	0.917
log(η₃)	0.036	0.086	0.103	0.950	0.025	0.131	0.115	0.940	0.038	0.120	0.127	0.951	0.019	0.131	0.120	0.949
${\tilde{β}}_{11}$	0.061	0.116	0.125	0.957	−0.006	0.143	0.120	0.927	−0.012	0.134	0.145	0.951	−0.024	0.135	0.145	0.951
${\tilde{β}}_{12}$	−0.038	0.064	0.080	0.949	0.012	0.098	0.103	0.947	−0.043	0.076	0.078	0.927	−0.028	0.087	0.104	0.946
${\tilde{β}}_{21}$	0.037	0.293	0.311	0.951	0.043	0.183	0.101	0.910	−0.008	0.156	0.167	0.921	0.024	0.166	0.146	0.921
${\tilde{β}}_{22}$	−0.009	0.111	0.116	0.948	0.009	0.102	0.109	0.945	0.021	0.096	0.109	0.944	−0.009	0.100	0.122	0.949
${\tilde{β}}_{31}$	−0.017	0.126	0.133	0.953	−0.032	0.133	0.125	0.939	−0.027	0.118	0.104	0.938	−0.036	0.139	0.147	0.953
${\tilde{β}}_{32}$	−0.008	0.069	0.072	0.950	−0.009	0.079	0.056	0.932	0.021	0.080	0.065	0.931	−0.021	0.079	0.080	0.950

3.6. Efficiency augmentation via optimally weighted averaging

In this subsection, we discuss an efficiency augmentation approach. Let 0 < a₁ < a₂ < … < a_L ≤ 1 be pre-specified constants. Following the same arguments in Section 3.1, we can show that a modified equation (5) with C_i replaced by a_lC_i leads to a valid variant of the proposed estimator (l = 1,...,L). Let θ^(l) denote the resulting estimator of a given parameter in α₀ or β₀. Mimicking the oracle convex combination procedure studied by Lavancier and Rochet (2016), we combine ${\{θ^{(l)}\}}_{l = 1}^{L}$ by their weighted average, where the weights are chosen to minimize the estimated standard errors, i.e.

{\hat{θ}}_{W E} (w_{1}, \dots, w_{L - 1}) = w_{1} θ^{(1)} + \dots + w_{L - 1} θ^{(L - 1)} + (1 - \sum_{j = 1}^{L - 1} w_{j}) θ^{(L)},

where $(w_{1}, \dots, w_{L - 1}) = \arg \min_{(x_{1}, \dots, x_{L - 1}) \in X} SE ({\hat{θ}}_{W E} (x_{1}, \dots, x_{L - 1}))$ , and $X = \{(x_{1}, \dots, x_{L - 1}) : x_{1}, \dots, x_{L - 1} \in S, \sum_{l = 1}^{L - 1} x_{l} \leq 1\}$ . Here $S$ is a pre-specified subset of [0,1] including candidate values for weight, and $SE (\hat{θ})$ denotes the bootstrap-based standard error estimate for $\hat{θ}$ . This optimally weighted averaging procedure, by its definition, is expected to produce parameter estimators that are more efficient than each individual estimator being combined.
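The weight search can be sketched as a grid search over the candidate set S. The layout below is an assumption made for illustration: `boot[l][b]` holds the estimate θ^(l) computed on the b-th bootstrap resample, with the same resample used for all L variants so that their correlation is reflected in the standard error of the combination. The function name is hypothetical.

```python
import itertools
import statistics


def optimal_weights(boot, grid):
    """Grid search for the weights minimising the bootstrap SE of the weighted average.

    boot : list of L lists; boot[l][b] is theta^(l) on bootstrap resample b
    grid : candidate weight values in [0, 1] (the set S)
    Returns (weights, se), where the last weight is one minus the sum of the others.
    """
    n_est = len(boot)
    n_boot = len(boot[0])
    best = None
    for head in itertools.product(grid, repeat=n_est - 1):
        if sum(head) > 1.0 + 1e-12:  # enforce the constraint defining X
            continue
        w = list(head) + [1.0 - sum(head)]
        combo = [sum(wl * rep[b] for wl, rep in zip(w, boot)) for b in range(n_boot)]
        se = statistics.stdev(combo)
        if best is None or se < best[1]:
            best = (w, se)
    return best
```

The search cost grows as |S|^(L−1), so a coarse grid or a small L keeps it manageable.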

As suggested by our simulation studies, θ^(l) corresponding to a small constant a_l ∈ (0,1) may have considerably reduced estimation efficiency and stability. Therefore, when applying the optimally weighted averaging procedure in practice, one may need to set all a_l’s large enough so that a reasonable amount of recurrent event information is captured up to time a_lC. As an empirical rule from our numerical experience, we recommend choosing a_l such that the standard errors of θ^(l), on average across different parameters, do not exceed three times those of the proposed estimators corresponding to the constant 1.

4. Asymptotic properties and estimation of asymptotic variance

4.1. Asymptotic properties

We study the asymptotic properties of the proposed estimators. We first introduce some necessary notation and regularity conditions. Let $θ = {(α^{T}, β^{T})}^{T}$ and $θ_{0} = {(α_{0}^{T}, β_{0}^{T})}^{T}$ . Let $L$ denote the covariate space containing all possible values of $\tilde{Z}$ , and Θ denote the parameter space for θ. Write $S_{n} (θ, μ) = {(S_{1 n}^{T} (α, β, μ), S_{2 n}^{T} (α, β, μ))}^{T}$ , and let $s (θ, μ) = Ε \{S_{n} (θ, μ)\}$ . Note that $τ_{i k} (θ, μ)$ depends on µ(·) only via µ(C_i). Therefore, we can define a function ${\tilde{τ}}_{i k} (θ, y) : Θ \times R \to [0, 1]$ such that $τ_{i k} (θ, μ) = {\tilde{τ}}_{i k} (θ, μ (C_{i}))$ for all θ ∈ Θ, i = 1,...,n, and k = 1,...,K. Let

ζ_{1, k, i} (θ, y) = {\tilde{τ}}_{i k} (α, β, y) Z_{i} \{\frac{N_{i}^{*} (C_{i})}{y} - \exp (Z_{i}^{T} β_{k})\}, k = 1, \dots, K

and

ζ_{2, k, i} (θ, y) = {\tilde{τ}}_{i k} (α, β, y) \{{\tilde{Z}}_{i} - \frac{\exp ({\tilde{Z}}_{i}^{T} α_{k}) {\tilde{Z}}_{i}}{\sum_{j = 1}^{K} \exp ({\tilde{Z}}_{i}^{T} α_{j})}\}, k = 1, \dots, K .

It is easy to see that $S_{n} (θ, μ) = n^{- 1} \sum_{i = 1}^{n} ζ_{i} (θ, μ (C_{i}))$ , where $ζ_{i} (θ, y) = {(ζ_{1, 1, i} {(θ, y)}^{T}, \dots, ζ_{1, K, i} {(θ, y)}^{T}, ζ_{2, 1, i} {(θ, y)}^{T}, \dots, ζ_{2, K, i} {(θ, y)}^{T})}^{T}$ . Define ${\dot{ζ}}_{i, θ} (θ, y) = \frac{\partial ζ_{i} (θ, y)}{\partial θ}$ and ${\dot{ζ}}_{i, μ} (θ, y) = \frac{\partial ζ_{i} (θ, y)}{\partial y}$ . Let ${\dot{H}}_{0} (t) = d H_{0} (t) / d t$ . For a vector u, let ||u|| denote its Euclidean norm and u^(j) denote its j-th component.

We assume the following regularity conditions:

C1 The parameter space Θ and the covariate space $L$ are compact.

C2 (a) N^∗(ν^∗) is bounded, a.s.; (b) P(M = m|ξ = k,Z,C) and P(ξ = k|Z) are bounded away from zero for all m ≥ 0, k = 1,...,K and Z.

C3 For some ν_∗ ∈ (0,ν^∗), (a) P(C < ν_∗|Z) = P(C > ν^∗|Z) = 0, and P(C = ν^∗|Z) > 0; (b) $\inf_{t \in [ν_{*}, ν^{*}]} Ε \{S_{C} (t | Z_{i})\} μ_{0} (t) > 0$ .

C4 {∂s(θ,µ₀)/∂θ^T}⁻¹ exists and is uniformly bounded in θ ∈ Θ.

C5 (a) ${\dot{ζ}}_{i, θ} (θ, y)$ and ${\dot{ζ}}_{i, μ} (θ, y)$ are uniformly bounded for all i, θ ∈ Θ and y ∈ [0,1]; (b) each component of ${\dot{ζ}}_{i, θ} (θ, y)$ or ${\dot{ζ}}_{i, μ} (θ, y)$ has bounded partial derivative with respect to θ.

Condition C1 assumes a bounded parameter space and bounded covariates. By condition C2(a), the number of the observed recurrent events is bounded. This is realistic for studies with finite duration of follow-up. Condition C2(b) is assumed to guarantee that the posterior probability $τ_{i k}$ is always meaningful. Condition C3 ensures that $\hat{μ} (\cdot)$ exists and is well defined, and $\hat{μ} (C_{i})$ is bounded between 0 and 1. Conditions C4 and C5 are rather standard assumptions for estimating equations and play important roles in establishing the consistency and asymptotic normality of the proposed estimators of α₀ and β₀.

We summarize the asymptotic properties of the proposed estimators in the following three Theorems. The detailed proofs are provided in Section 1.2 of the Supplementary Materials.

Theorem 1. Under the regularity conditions (C1)–(C3),

\sup_{t \in [ν_{*}, ν^{*}]} |\hat{μ} (t) - μ_{0} (t)| \to_{P} 0.

(16)

Furthermore, $\sqrt{n} \{\hat{μ} (t) - μ_{0} (t)\}$ converges weakly to a zero-mean Gaussian process with covariance function $Ε \{μ_{0} (s) ϕ_{i} (s) ϕ_{i} (t) μ_{0} (t)\}$ at (s,t), where $ϕ_{i} (t) = - \int_{t}^{ν^{*}} \frac{d M_{i} (s)}{Ε \{S_{C} (s | Z_{i})\} μ_{0} (s)}$ and $ν_{*} \leq s < t \leq ν^{*}$ .

Corollary 1. Under the regularity conditions (C1)–(C3),

|n^{1 / 2} \{\hat{μ} (C) - μ_{0} (C)\} - n^{- 1 / 2} \sum_{i = 1}^{n} μ_{0} (C) ϕ_{i} (C)| = o_{P} (1) .

Theorem 2. Under the regularity conditions (C1)–(C5), we have $\hat{θ} \to_{P} θ_{0}$ .

Theorem 3. Under the regularity conditions (C1)–(C5), $\sqrt{n} (\hat{θ} - θ_{0}) \to_{d} N (0, V)$ , where N(0,V) denotes a multivariate normal distribution with mean zero and covariance matrix V, and the definition of V is provided in Section 1.2 of the Supplementary Materials.

As shown in the proof of Theorem 3, the asymptotic covariance matrix V takes a complex form. Therefore, we recommend using bootstrapping to conduct variance estimation and other inferences. Specifically, we can resample the observed data with replacement and obtain an estimate for θ₀ based on the resampled data, denoted by $θ^{*} ≐ {(α^{* T}, β^{* T})}^{T}$ . Repeating the resampling procedure B times, where B is a large predetermined number, we can obtain many realizations of θ^∗. Then the asymptotic covariance of $\hat{θ}$ can be estimated by the empirical covariance of θ^∗. The confidence intervals for each component of θ₀ can be constructed by using normal approximations or by referring to the empirical distribution of the corresponding component of θ^∗.
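The resampling scheme described above can be sketched as follows. This is a minimal illustration for a scalar estimator, not the authors' implementation: the names `bootstrap_se_ci` and `mean_count` are hypothetical, and the sample mean of event counts stands in for the proposed estimator of a single component of θ₀.

```python
import random
import statistics

def mean_count(sample):
    """Stand-in estimator (an assumption): the sample mean of event counts."""
    return sum(sample) / len(sample)

def bootstrap_se_ci(data, estimator, B=200, seed=0):
    """Nonparametric bootstrap: resample subjects with replacement B times."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    # Each replicate re-applies the estimator to n subjects drawn with replacement.
    reps = [estimator([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(B)]
    se = statistics.stdev(reps)  # empirical SD of the bootstrap replicates
    z = statistics.NormalDist().inv_cdf(0.975)
    normal_ci = (theta_hat - z * se, theta_hat + z * se)
    # Crude percentile interval read off the sorted replicates.
    reps_sorted = sorted(reps)
    lower = reps_sorted[int(0.025 * B)]
    upper = reps_sorted[min(B - 1, int(0.975 * B))]
    return theta_hat, se, normal_ci, (lower, upper)
```

For a vector-valued estimator, the same loop would be applied componentwise, with the empirical covariance of the replicates replacing the scalar standard deviation.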

Formulating the proposed estimators, $\hat{μ} (t)$ and $\hat{θ}$ , as Hadamard-differentiable functionals of Donsker empirical processes, we can formally justify the presented nonparametric bootstrapping inference procedure by applying the theory for bootstrapped empirical processes in combination with the delta-method (Van Der Vaart et al., 1996; Kosorok, 2008). The simulation results reported in Section 5 further provide strong empirical evidence to support the validity of nonparametric bootstrapping inferences in this work.

5. Simulation

We conduct extensive simulation studies to assess the finite-sample performance of the proposed method and demonstrate its advantages.

Considering three latent classes (i.e. K = 3), we generate the latent class membership, ξ, based on model (3) with two covariates $\tilde{Z} = {(Z_{1}, Z_{2})}^{T}$ . Given ξ, we generate T^(j) according to model (1). The true parameters, $\{α_{0, k}, {\tilde{β}}_{0, k}, η_{0, k} : k = 1, 2, 3\}$ are listed in Table S.1 in the Supplementary Materials. We let the censoring time C follow the Unif(2/3,1) distribution, independent of T^(j), Z and ξ. We set ν^∗ = 0.98 because the upper bound of C’s support is 1.

We investigate five data scenarios, denoted by S0–S4, with different specifications of λ₀(t) and the distributions of Z₁, Z₂, and W, which are shown in Table S.1 in the Supplementary Materials. In scenario S0, we let W = 1 to represent cases where model (1) holds without the subject-specific frailty. In scenarios S1–S4, we incorporate the subject-specific frailty W that follows the distribution, Gamma(5,5), Exponential(1), Weibull(2,1/Γ(1.5)), or truncated Norm(1,0.1), respectively, where truncated Norm(1,0.1) denotes the normal distribution Norm(1,0.1) truncated by the interval (0,2). In Table 1, we provide average summary statistics (including mean, standard deviation (SD), median, interquartile range, and range) for D by the latent class, which reflect how the number of the observed recurrent events varies across latent classes, P(ξ = k)’s (k = 1,2,3), which depict the proportions of subjects belonging to different latent classes, and relative entropy values, which capture the levels of separation among the three latent classes. We note that the distributions of D are rather different across the three latent classes in scenarios S0 and S1, but are very similar in scenarios S3 and S4. In scenario S2, in terms of D’s distribution, class 1 resembles class 2, but differs dramatically from class 3. These observations are consistent with the values of relative entropy, which show a decreasing trend as the scenario is changed from S0 to S4, suggesting well separated latent classes in scenarios S0 and S1 and heavily overlapped latent classes in scenarios S3 and S4. In scenario S2, roughly equal proportions of subjects belong to the three latent classes, while in the other scenarios, these proportions vary considerably between at least two latent classes. Thus, scenarios S0-S4 represent various data situations pertaining to different combinations of balanced versus unbalanced latent class distributions and separated versus overlapped outcomes across latent classes.
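A rough sketch of one way to generate a subject under a design of this kind is given below. Several ingredients are assumptions made only for illustration: class 1 is taken as the reference class with α₁ = 0, Gamma(5,5) is read as shape 5 and rate 5, the covariate distributions and the cumulative baseline `cum_baseline` are hypothetical (the actual choices are in Table S.1), and the event count given the frailty is drawn from a Poisson distribution with mean W η_k exp(Z̃ᵀβ̃_k) times the cumulative baseline at C. The parameter values are the true values for scenario S1 listed in Table 2.

```python
import math
import random

# True values for scenario S1 as listed in Table 2; alpha_1 = 0 is an assumption.
ALPHAS = [(0.0, 0.0), (-1.0, 1.0), (0.5, 0.8)]
BETAS = [(0.8, 0.5), (-4.0, 1.0), (3.0, 0.5)]
ETAS = [4.5, 4.0, 3.0]

def softmax_probs(z, alphas):
    """Class probabilities exp(z'a_k) / sum_j exp(z'a_j)."""
    scores = [sum(a * x for a, x in zip(alpha, z)) for alpha in alphas]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [v / total for v in w]

def poisson_draw(rng, mean):
    """Poisson variate: count unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_subject(rng, cum_baseline=lambda t: t):
    # Hypothetical covariates: one Uniform(0,1) and one Bernoulli(0.5).
    z = [rng.random(), float(rng.random() < 0.5)]
    k = rng.choices(range(len(ALPHAS)), weights=softmax_probs(z, ALPHAS))[0]
    w = rng.gammavariate(5.0, 1.0 / 5.0)  # shape 5, scale 1/5, so mean 1
    c = rng.uniform(2.0 / 3.0, 1.0)       # censoring time
    lin = sum(b * x for b, x in zip(BETAS[k], z))
    mean = w * ETAS[k] * math.exp(lin) * cum_baseline(c)
    return {"class": k + 1, "z": z, "w": w, "c": c, "d": poisson_draw(rng, mean)}
```

Calling `simulate_subject(random.Random(1))` repeatedly yields one record per subject; only the event count is simulated here, not the individual event times.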

Table 1.

Summary statistics of the number of the observed recurrent events.

	Simulation scenario
		S0	S1	S2	S3	S4
	Summary Statistics for D_i
mean± SD	Class 1	7.0 ± 3.7	7.0 ± 4.9	7.9 ± 11.2	6.0 ± 8.2	4.9 ± 5.9
	Class 2	2.1 ± 2.6	2.0 ± 2.9	6.4 ± 8.3	5.1 ± 5.6	3.3 ± 3.2
	Class 3	13.9 ± 12.1	13.9 ± 14.4	10.4 ± 15.9	7.7 ± 8.1	3.3 ± 3.0
median	Class 1	6.5	6.0	5.3	4.1	4.2
	Class 2	1.0	1.0	3.0	3.5	3.1
	Class 3	10.1	9.1	6.1	5.8	3.3
interquartile range	Class 1	[4.4,9.1]	[3.5,9.3]	[2.3,9.8]	[1.3,7.7]	[1.5,11.3]
	Class 2	[0.0,3.0]	[0.0,2.8]	[1.1,9.7]	[1.2,6.3]	[1.2,5.4]
	Class 3	[5.6,20.1]	[3.9,19.2]	[2.2,11.3]	[2.0,11.4]	[1.0,5.8]
range	Class 1	[0.5,19.5]	[0.1,27.0]	[0.0,87.7]	[0.1,43.3]	[0.0,13.1]
	Class 2	[0.0,12.7]	[0.0,16.4]	[0.0,50.5]	[0.0,28.2]	[0.0,13.2]
	Class 3	[0.0,59.6]	[0.0,84.9]	[0.0,100.0]	[0.0,50.5]	[0.0,16.5]

P(ξ = k)	Class 1	25.9%	25.9%	33.0%	25.4%	26.8%
	Class 2	26.0%	26.2%	33.3%	24.0%	29.0%
	Class 3	48.1%	47.9%	33.7%	50.6%	44.2%
Relative entropy		0.661	0.656	0.462	0.397	0.356


For each scenario, we generate 1000 simulated datasets with sample size n = 200 and n = 500. For each simulated dataset, 200 bootstrapping samples are drawn to calculate standard error estimates and confidence intervals based on normal approximation. For the proposed iterative algorithm, the maximum iteration number is set as 200, and the convergence criterion is $\max ({‖{\hat{β}}^{[r]} - {\hat{β}}^{[r - 1]}‖}_{\infty}, {‖{\hat{α}}^{[r]} - {\hat{α}}^{[r - 1]}‖}_{\infty}) < 10^{- 2}$ . Algorithm convergence is achieved for each simulated dataset within 200 iterations. In practice, specification of the maximum iteration number may need to be adjusted according to specific data scenarios.
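The stopping rule above can be written compactly as follows. This is only a sketch: `update` is a hypothetical placeholder for one pass of the iterative algorithm, which is not reproduced here.

```python
def sup_norm_diff(new, old):
    """Largest absolute componentwise change between two iterates."""
    return max(abs(a - b) for a, b in zip(new, old))

def has_converged(alpha_new, alpha_old, beta_new, beta_old, tol=1e-2):
    """True when both sup-norm changes fall below the tolerance."""
    change = max(sup_norm_diff(beta_new, beta_old),
                 sup_norm_diff(alpha_new, alpha_old))
    return change < tol

def iterate(update, alpha, beta, max_iter=200, tol=1e-2):
    """Run `update` until convergence or until max_iter passes are used."""
    for r in range(1, max_iter + 1):
        alpha_new, beta_new = update(alpha, beta)
        if has_converged(alpha_new, alpha, beta_new, beta, tol):
            return alpha_new, beta_new, r, True
        alpha, beta = alpha_new, beta_new
    return alpha, beta, max_iter, False
```

The returned flag makes it easy to record datasets for which the iteration cap was reached without convergence.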

We first assume f_W (·) is correctly pre-specified. In Figure S.1 in the Supplementary Materials and Figure 1, we present simulation results on the estimation and inference of µ₀(t), including the empirical biases, average estimated standard errors, average empirical standard deviations, and average empirical coverages of 95% confidence intervals (CP (95%)), for the five data scenarios with n = 200 and n = 500 respectively. It is seen that the proposed estimator of µ₀(t) produces reasonably small biases, and the empirical and estimated standard deviations are fairly close except for large t’s. In all cases, the average empirical coverage probabilities are close to the nominal level 95%.

Fig. 1. Simulation results for the estimator $\hat{μ} (t)$ under the five scenarios.

Table 2 reports the simulation results for estimating α₀ and β₀ in scenario S1. Results for scenarios S0 and S2–S4 are similar, and thus are relegated to Tables S.2–S.5 in the Supplementary Materials. The reported results include the average empirical biases (BIAS), average estimated standard errors based on bootstrapping (SE), average empirical standard deviations (SD), and average empirical coverage probabilities of 95% confidence intervals. We observe that, in all cases, the empirical biases are small, between 0.1% and 7.6% of the true values. The SDs and SEs are close to each other, indicating that the bootstrap-based inference works well. The empirical coverage probabilities are close to the nominal level, 0.95. In addition, we note that the agreement between SDs and SEs improves as the sample size increases. The results in Figure 1, Figure S.1, Table 2, and Tables S.2–S.5 demonstrate satisfactory finite-sample performance of the proposed estimators, regardless of the degree of separation or proportion balance among latent classes, and the types of frailty distributions.

Table 2.

Simulation results for estimating α₀ and β₀ in scenario S1.

n	Parameter	True value	BIAS	SE	SD	CP (95%)
n = 200	α	α₂₁ = −1	−0.019	0.712	0.741	0.949
		α₂₂ = 1	0.026	0.425	0.426	0.949
		α₃₁ = 0.5	0.010	0.646	0.637	0.947
		α₃₂ = 0.8	0.023	0.383	0.372	0.939
	β	log(η₁) = log(4.5)	0.043	0.154	0.150	0.964
		log(η₂) = log(4)	0.042	0.198	0.170	0.958
		log(η₃) = log(3)	0.055	0.164	0.164	0.955
		${\tilde{β}}_{11} = 0.8$	0.011	0.203	0.182	0.954
		${\tilde{β}}_{12} = 0.5$	−0.043	0.109	0.107	0.948
		${\tilde{β}}_{21} = - 4$	0.045	0.521	0.453	0.946
		${\tilde{β}}_{22} = 1$	−0.010	0.184	0.178	0.951
		${\tilde{β}}_{31} = 3$	−0.019	0.205	0.197	0.955
		${\tilde{β}}_{32} = 0.5$	−0.007	0.115	0.109	0.951

n = 500	α	α₂₁ = −1	−0.009	0.450	0.452	0.954
		α₂₂ = 1	0.009	0.263	0.262	0.949
		α₃₁ = 0.5	−0.004	0.392	0.392	0.951
		α₃₂ = 0.8	0.001	0.229	0.229	0.948
	β	log(η₁) = log(4.5)	0.022	0.101	0.090	0.953
		log(η₂) = log(4)	0.029	0.124	0.116	0.952
		log(η₃) = log(3)	0.036	0.103	0.086	0.950
		${\tilde{β}}_{11} = 0.8$	0.061	0.125	0.116	0.957
		${\tilde{β}}_{12} = 0.5$	−0.038	0.080	0.064	0.949
		${\tilde{β}}_{21} = - 4$	0.037	0.311	0.293	0.951
		${\tilde{β}}_{22} = 1$	−0.009	0.116	0.111	0.948
		${\tilde{β}}_{31} = 3$	−0.017	0.133	0.126	0.953
		${\tilde{β}}_{32} = 0.5$	−0.008	0.072	0.069	0.950


We compare the proposed method with Wang et al. (2001)’s multiplicative intensity model by the performance in predicting the number of the observed recurrent events, D_i. To this end, we employ 5-fold cross validation. Using the proposed method, we obtain $\hat{α}$ , $\hat{β}$ , and $\hat{μ} (\cdot)$ based on the training dataset and predict D_i in the test dataset by ${\hat{D}}_{i, K} = \sum_{k = 1}^{K} I ({\hat{ξ}}_{i} = k) \cdot \hat{μ} (C_{i}) \exp (Z_{i}^{T} {\hat{β}}_{k})$ , where ${\hat{ξ}}_{i}$ follows the distribution $\mathrm{Multinomial} (1, p_{1} (\hat{α}, {\tilde{Z}}_{i}), \dots, p_{K} (\hat{α}, {\tilde{Z}}_{i}))$ . To evaluate Wang et al. (2001)’s method, we compute ${\hat{β}}_{A}$ and ${\hat{μ}}_{A}$ based on the training dataset using the R functions reReg() and plotRate() in the R package reReg, and predict D_i in the test dataset by ${\hat{D}}_{i, A} = {\hat{μ}}_{A} (C_{i}) \exp (Z_{i}^{T} {\hat{β}}_{A})$ . In each fold, we compute ${(n / 5)}^{- 1} \sum_{i = 1}^{n / 5} |x - D_{i}|$ , $\mathrm{median} \{|x - D_{i}|, i = 1, \dots, (n / 5)\}$ , and $\sqrt{{(n / 5)}^{- 1} \sum_{i = 1}^{(n / 5)} {|x - D_{i}|}^{2}}$ , with x standing for ${\hat{D}}_{i, K}$ or ${\hat{D}}_{i, A}$ , and calculate their averages over the 5 folds and 1000 datasets. The corresponding results are referred to as average prediction errors (APE), median prediction errors (MPE), and average square root of mean squared prediction errors (SMSPE), respectively. Results presented in Table 3 show that fitting a single multiplicative intensity model without accounting for the existence of latent classes always produces less accurate prediction of D_i compared to the proposed analysis; the resulting SMSPE can be as large as 165% of that of the proposed analysis (see S0 with n = 500). We also note that the predictive performance of the proposed method and Wang et al. (2001)’s method is more similar in scenarios S3 and S4 than in scenarios S0–S2. This observation is reasonable and can be explained by the less distinct latent classes in scenarios S3 and S4.
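For a single test fold, the three prediction error summaries can be computed as in the short sketch below. The function name `prediction_errors` is hypothetical, and the averaging over folds and simulated datasets is left out.

```python
import math
import statistics

def prediction_errors(predicted, observed):
    """Return (APE, MPE, SMSPE) for one test fold.

    APE   : mean absolute prediction error
    MPE   : median absolute prediction error
    SMSPE : square root of the mean squared prediction error
    """
    abs_err = [abs(p - d) for p, d in zip(predicted, observed)]
    ape = sum(abs_err) / len(abs_err)
    mpe = statistics.median(abs_err)
    smspe = math.sqrt(sum(e * e for e in abs_err) / len(abs_err))
    return ape, mpe, smspe
```

Because SMSPE squares the errors, it is more sensitive than APE or MPE to a few badly predicted subjects with large event counts.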

Table 3.

Comparisons between the proposed method and Wang et al. (2001)’s method in predicting the number of the observed recurrent events.

Sample size	Scenario	the proposed model			Wang’s method
Sample size	Scenario	APE	MPE	SMSPE	APE	MPE	SMSPE
n = 200	S0	5.512	2.579	8.271	8.632	4.591	13.549
	S1	7.199	3.019	11.200	8.691	4.799	13.590
	S2	6.224	2.763	9.561	8.562	4.429	13.492
	S3	3.493	2.407	5.627	3.655	2.649	5.694
	S4	2.060	1.367	2.964	2.014	1.342	2.589

n = 500	S0	5.503	2.552	8.264	8.734	4.829	13.659
	S1	7.148	2.315	10.483	8.321	4.782	13.289
	S2	7.060	3.391	10.387	8.282	4.209	13.267
	S3	3.741	2.304	5.632	3.752	2.527	5.699
	S4	2.022	1.365	2.967	2.220	1.373	2.998


We evaluate the selection of K, the number of latent classes, based on the relative entropy measure E_K discussed in Section 3.3. The left section of Table 4 reports the average relative entropy measure E_K given K = 2, 3, 4, or 5 with n = 200 and n = 500. It is shown that the highest relative entropy is attained at the true K = 3 in all data scenarios. The right section of Table 4 presents the percentages of selecting K as 2, 3, 4, or 5 based on 1000 simulated datasets. The percentages of selecting the true K = 3 are always the highest among K = 2,3,4, and 5, and can be as large as 93.3% (see n = 500 in scenario S1). These results suggest good empirical performance of the proposed approach to determining the number of latent classes.
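The selection rule can be illustrated with the sketch below. The exact definition of E_K is given in Section 3.3 and is not reproduced here; the code assumes the commonly used form E_K = 1 − Σ_i Σ_k (−τ_ik log τ_ik) / (n log K), which may differ in detail from the authors' definition. The function names are hypothetical.

```python
import math

def relative_entropy(tau):
    """Relative entropy of an n-by-K matrix of posterior class probabilities.

    Assumed form: 1 - sum_i sum_k (-tau_ik * log tau_ik) / (n * log K).
    Values near 1 indicate well separated classes.
    """
    n, K = len(tau), len(tau[0])
    entropy = -sum(p * math.log(p) for row in tau for p in row if p > 0.0)
    return 1.0 - entropy / (n * math.log(K))

def select_K(tau_by_K):
    """Pick the K whose fitted posterior probabilities maximize E_K."""
    return max(tau_by_K, key=lambda K: relative_entropy(tau_by_K[K]))
```

Here `tau_by_K` maps each candidate K to the matrix of estimated posterior probabilities obtained from the model fitted with that K.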

Table 4.

Simulation results on selecting K based on relative entropy E_K.

Scenario	n	average relative entropy E_k				proportion of selecting K
Scenario	n	K = 2	K = 3	K = 4	K = 5	K = 2	K = 3	K = 4	K = 5
S0	200	0.518	0.616	0.598	0.217	1.8%	90.3%	7.9%	0.0%
	500	0.504	0.661	0.607	0.385	0.0%	92.7%	7.3%	0.0%
S1	200	0.510	0.614	0.565	0.220	4.2%	87.3%	8.5%	0.0%
	500	0.545	0.656	0.604	0.379	1.9%	93.3%	4.8%	0.0%
S2	200	0.401	0.457	0.318	0.217	17.9%	82.1%	0.0%	0.0%
	500	0.405	0.462	0.373	0.208	18.2%	81.8%	0.0%	0.0%
S3	200	0.345	0.374	0.275	0.203	38.3%	61.7%	0.0%	0.0%
	500	0.345	0.397	0.269	0.208	27.8%	72.2%	0.0%	0.0%
S4	200	0.311	0.314	0.217	0.201	44.7%	55.3%	0.0%	0.0%
	500	0.304	0.356	0.206	0.176	29.8%	70.2%	0.0%	0.0%


In Section 2.2 of the Supplementary Materials, we report simulation studies that assess the impact of mis-specifying K. Since the coefficient estimates with different selections of K involve different numbers of coefficients, we compare the accuracy of predicting D_i under correct and incorrect specifications of K. The observations from Table S.6 in the Supplementary Materials, combined with the findings from Table 4, suggest that the proposed method can yield reliable predictions despite the possibility of mis-specifying K in data analysis.

We also examine the empirical performance of our proposal in Section 3.5 for selecting f_W (·). Specifically, we generate data according to scenario S1, where W follows the Gamma(5,5) distribution. For each simulated dataset, we adopt APE_M, MPE_M, or SMSPE_M to decide f_W (·) among five candidate distributions for W, which are W = 1 and Gamma(r,r) with r = 1,3,5,7. Note that assuming W = 1 means completely ignoring the subject-specific frailty and Gamma(1,1) is the standard Exponential distribution. Thus, the five candidate frailty distributions represent various choices of f_W (·) that are similar to or very different from the true f_W (·). Table 5 presents the empirical biases (BIAS), empirical standard deviations (SD), average estimated standard errors (SE), and empirical coverage probabilities of 95% confidence intervals (CP) of the proposed parameter estimates with the true f_W (·) (corresponding to r = 5) and the selected f_W (·). We observe that the proposed estimation with the selected f_W (·), compared to that based on the true f_W (·), shows comparable or only slightly elevated empirical biases and standard deviations, and the corresponding CPs are reasonably close to 95%. This suggests a promising utility of our proposal for deciding the frailty distribution in real data analyses. In Section 2.3 of the Supplementary Materials, we also report simulation results on the proposed estimates with f_W (·) fixed to each candidate density. The results in Table S.7 suggest that mis-specifying f_W (·) has a rather minor influence on empirical biases in all cases. This indicates that the proposed estimation is very robust to either minor or major misspecification of f_W (·).

In addition, we conduct simulation studies to assess the robustness of the proposed method to the misspecification of model (3) for the latent class membership probability. The details are presented in Section 2.4 of the Supplementary Materials. We find that the proposed estimation can be biased when model (3) is severely mis-specified but is quite robust when model (3) represents only minor-to-moderate departure from the underlying true model.

In Section 2.5 of the Supplementary Materials, we present simulation studies that evaluate the variant of the proposed estimator and the efficiency augmentation approach discussed in Section 3.6. The results in Table S.10 and Table S.12 suggest that the presented variant of the proposed estimator generally works well but can be unstable when a_l is small. This is expected because with a small a_l, N_i(a_lC_i) may not carry enough information to stably estimate the unknown parameters. Table S.11 and Table S.13 present comparisons between the proposed estimator and an augmented estimator. We observe that augmenting the proposed estimation with optimally weighted averaging results in similar empirical bias. The augmented estimator is generally more efficient than the proposed estimator. The proposed estimator has only slightly reduced efficiency compared to that of the augmented estimator with respect to β₀ (the parameter of main interest). These observations confirm the efficiency benefit of the proposed augmentation procedure and also suggest that the proposed estimator has reasonably good efficiency.

6. A real application

We apply the proposed method to a dataset from research participants at Goizueta Alzheimer’s Disease Research Center who were diagnosed with a cognitive disorder during the period 1997–2019. The main interest of our analysis is to explore the heterogeneity in the patterns of clinical phone calls made for these patients for purposes such as general inquiries and reporting clinical concerns or problems. The inter-individual variability in the phone call pattern reflects underlying disease severity and co-morbidities. It can also provide useful insight into the level of disease education and understanding by care partners and the availability of caregiver support and resources, which constitute a critical part of Alzheimer’s disease care.

To address this interest, we use a dataset extracted from the Emory Healthcare Clinical Data Warehouse (CDW) by selecting document types coded as “Phone Message” linked to each individual patient’s medical records. All such documents are time stamped. The dataset used in our analysis is confined to 398 patients who had clinical phone call records between September 1, 2016 and October 23, 2019. The recurrent event of interest is the occurrence of a clinical phone call, with T^(j) representing the time (in years) from September 1, 2016 to the jth phone call. The censoring time C is the time to October 23, 2019 or death, whichever occurs first. Since the rate of death during the study period is low, around 3%, we expect that the potential violation of the independent censoring assumption due to the presence of death is very minor and has a minimal impact on the application of the proposed method to this dataset. We consider four potential contributors for this recurrent event, which are gender defined as Z₁ = 1 if female and 0 if male, age in years denoted by Z₂, number of years of education denoted by Z₃, and baseline Montreal Cognitive Assessment (MOCA) total score denoted by Z₄. The continuous covariates, Z₂, Z₃, and Z₄, are scaled to [0,1]. Excluding patients with missing data on these covariates, we have 246 patients included in the final analysis dataset. In addition, we exclude 61 phone calls, which were made for medical refills or appointment scheduling, from 37 patients’ records. This is due to the concern that these regular-care related phone calls, even after conditioning on the individual frailty and latent class membership, may not be “memoryless”, a property implied by the non-stationary Poisson process assumption adopted by the proposed model. The final dataset can be made available upon request.

We fit models (1)–(3) to this dataset with ν^∗ = 3, which is chosen because the longest follow-up time in this dataset is 3.14 years. We consider five candidate distributions for W, i.e. W = 1 and Gamma(r,r), r = 1,3,5,7. Given each candidate f_W (w), we calculate the relative entropy measure E_K presented in Section 3.3 with the number of latent classes K equal to 2, 3, 4, or 5. The results are presented in Table S.14 in the Supplementary Materials. It is shown that the maximum relative entropy is always achieved at K = 3 with the different choices of f_W (w), suggesting that three latent classes may provide the best fit of the data. Next, pre-specifying K = 3, we select the frailty density f_W (·) among the five candidate distributions, following the procedure in Section 3.5. Based on the results in Table S.15 in the Supplementary Materials, all the three model-checking measures attain the smallest value when W follows the distribution Gamma(7,7). Therefore, we set K = 3 and select f_W (·) as the density of Gamma(7,7) for the rest of the analyses.

We first examine the characteristics associated with the three latent classes. We apply the modal class assignment rule to categorize patients into three subgroups based on ${\hat{τ}}_{i k}$ obtained from our estimation procedure. Table S.16 in the Supplementary Materials summarizes the characteristics of the three classes. It is observed that Class 1 is the subgroup which tends to have higher education, higher MOCA scores and more females, as compared to the two other classes. Class 3 is characterized by the fewest years of education, the youngest age, and the highest proportion of males. Class 2, while standing in the middle in terms of gender distribution, distinguishes itself from Class 1 and Class 3 by the lowest MOCA scores, which indicates severe cognitive impairment before the start of clinical phone call tracking. The age distribution is comparable between Class 1 and Class 2. These observations are rather consistent with the estimation results for α₀ provided in Table S.17 in the Supplementary Materials. For example, the results in Table S.17 suggest that younger patients are more likely to belong to Class 3, and Class 3 is associated with fewer years of education.

Table 6 presents the estimation results for β₀ and η₀, including the parameter estimates (Est), the estimated standard errors (SE), and the associated p-values. It is shown that female patients, compared to male patients, may be associated with a higher frequency of clinical phone calls in Class 1 but less frequent clinical phone calls in Class 2. The different directions of the gender effect may be explained by the different cognitive status between Class 1 and Class 2. That is, patients in Class 1 tend to have good cognitive functions and thus are more likely to be capable of self-care, while spouse-care may be more common in Class 2 due to the low cognitive function of patients. Thus the gender effects identified for Class 1 and Class 2 consistently suggest that female care-takers (self or spouse) tend to make more frequent clinical phone calls. The results in Table 6, particularly those for Class 1 and Class 3, suggest that younger patients tend to be associated with less frequent clinical phone calls, likely owing to their better underlying health conditions. It is also noted that in Class 1, higher education may contribute to an increase in clinical phone call frequency. This may reflect the higher health consciousness associated with higher education in the old but less cognitively impaired population (e.g. Class 1). We observe that the estimated scale parameter for Class 3 (i.e. ${\hat{η}}_{3}$ ) is considerably larger than its counterparts for Class 1 and Class 2. This may imply an overall higher frequency of clinical phone calls in Class 3, likely driven by the lower level of disease education in this subgroup of patients.

Table 6.

Analysis of the clinical phone call data: estimation results for β₀.

Variable		Latent class 1	Latent class 2	Latent class 3
$Gender ({\hat{β}}_{1 k})$	Est	0.817	−1.187	−0.672
	SE	0.330	0.630	0.476
	p-value	0.013	0.060	0.158

$Age ({\hat{β}}_{2 k})$	Est	−1.320	−0.442	−0.717
	SE	0.639	0.254	0.364
	p-value	0.039	0.082	0.049

$Education ({\hat{β}}_{3 k})$	Est	1.540	−0.677	−0.465
	SE	0.742	0.454	0.203
	p-value	0.038	0.136	0.022

$MOCA ({\hat{β}}_{4 k})$	Est	−0.683	0.827	−1.217
	SE	0.466	0.432	0.541
	p-value	0.143	0.056	0.025

$scale parameter ({\hat{η}}_{k})$	Est	1.398	1.657	2.121
	SE	0.638	0.873	0.999
	p-value	0.028	0.058	0.034


For each class, we calculate the average estimated mean function defined as

\sum_{i = 1}^{n} I ({\hat{ξ}}_{i} = k) \hat{μ} (t) \exp (Z_{i}^{T} {\hat{β}}_{k}) / \sum_{i = 1}^{n} I ({\hat{ξ}}_{i} = k),

where ${\hat{ξ}}_{i}$ denotes the latent class membership assignment based on the modal rule (i.e. ${\hat{ξ}}_{i} = \arg \max_{1 \leq k \leq K} {\hat{τ}}_{i k}$ ), and k = 1,...,K. In Figure S.2 in the Supplementary Materials, we plot the average estimated mean functions for Classes 1–3. The results confirm the conjectured highest frequency of clinical phone calls in Class 3 based on Table 6. Figure S.2 shows that Class 2 is associated with lower frequency of clinical phone calls than Class 1. This may be explained by the gender distribution difference between these two classes.
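The class-average estimated mean function can be evaluated as in the sketch below, which follows the displayed formula and the modal assignment rule. The function and argument names are hypothetical, and `mu_hat` is any callable standing in for the fitted $\hat{μ} (\cdot)$.

```python
import math

def modal_class(tau_row):
    """Index of the class with the largest posterior probability."""
    return max(range(len(tau_row)), key=lambda k: tau_row[k])

def class_average_mean(t, k, tau, Z, beta, mu_hat):
    """Average of mu_hat(t) * exp(Z_i' beta_k) over subjects assigned to class k.

    tau  : n-by-K posterior probabilities
    Z    : n covariate vectors
    beta : K coefficient vectors (class k uses beta[k], zero-based)
    """
    members = [i for i, row in enumerate(tau) if modal_class(row) == k]
    if not members:
        return float("nan")  # no subject was assigned to this class
    total = sum(
        mu_hat(t) * math.exp(sum(b * z for b, z in zip(beta[k], Z[i])))
        for i in members
    )
    return total / len(members)
```

Evaluating `class_average_mean` over a grid of t values for each k gives curves of the kind plotted in Figure S.2.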

We further check the overall fit of the proposed latent class mixture model to the clinical phone call data using the graphic model checking approach discussed in Section 3.4. In Figure 2, we present the scatter plot of ${\hat{D}}_{i}$ based on the proposed models versus D_i, and the scatter plot of ${\hat{D}}_{i, A}$ based on Wang et al. (2001)’s method versus D_i. It is shown that the pairs of $({\hat{D}}_{i}, D_{i})$ cluster around the 45 degree line fairly closely, while ${\hat{D}}_{i, A}$ and D_i do not demonstrate an agreeable pattern. Such an observation suggests that the proposed method provides an overall good fit to the data and clearly has an improved utility over the standard analysis.

Fig. 2. Predicted numbers of phone calls by the proposed model and Wang et al. (2001)’s method versus the observed numbers of phone calls.

7. Concluding remarks

In this work, we propose to tackle the complex heterogeneity structure of recurrent events data through the perspective of latent class analysis. The proposed semiparametric latent class model is inherently more robust than existing parametric models, while permitting easy and stable implementation. Each step of our estimation algorithm only involves either computing a closed-form estimator or obtaining estimates via standard software such as the R functions glm() and multinom(). In contrast, existing latent class methods are commonly plagued by algorithmic complexity and the resulting computational issues (McLachlan and Peel, 2000). The proposed method strikes a good balance between modeling flexibility and implementation reliability.

As pointed out by one referee, a more efficient estimation approach may be developed by using the nonparametric maximum likelihood estimation (NPMLE) technique (Zeng et al., 2016, 2017, for example). However, we expect that the potential NPMLE approach may involve greater implementation complexity, which needs to be carefully addressed in order to facilitate its application to real data. As another option to improve estimation efficiency, the optimally weighted averaging approach discussed in Section 3.6, while requiring extra computational effort, is more straightforward and stable to implement. The efficiency benefit of this procedure is confirmed by our simulation studies.

Our numerical studies suggest that using goodness-of-fit measures to guide the selection of frailty density f_W (·) performs very well. Alternatively we may consider formulating f_W (·) in a parametric form, f_W (w;ν₀), and then estimating the unknown parameters in ν₀. In this case, by (7)–(9), we can express $τ_{i k}$ in terms of α₀,β₀,ν₀, and µ₀, and denote it by $τ_{i k} (α_{0}, β_{0}, ν_{0}, μ_{0})$ . To estimate ν₀, we may utilize the fact that $Ε (D_{i} | Z_{i}, C_{i}) = \sum_{k = 1}^{K} τ_{i k} (α_{0}, β_{0}, ν_{0}, μ_{0}) μ_{0} (C_{i}) \exp (Z_{i}^{T} β_{0, k})$ , and construct additional estimating equations, given by

\begin{array}{l} S_{3, n} (α, β, ν, \hat{μ}) ≐ \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} \hat{μ} (C_{i}) \exp (Z_{i}^{T} β_{k}) \frac{\partial τ_{i k} (α, β, ν, \hat{μ})}{\partial ν} \\ \cdot [D_{i} - \sum_{k = 1}^{K} τ_{i k} (α, β, ν, \hat{μ}) \hat{μ} (C_{i}) \exp (Z_{i}^{T} β_{k})] = 0. \end{array}

(17)

Solving (17) in conjunction with equations (13) and (14) can lead to consistent estimates for α₀, β₀, and ν₀. Note that $S_{3, n} (α, β, ν, \hat{μ})$ may have a non-monotone irregular surface even after simplification that fixes some component. This can cause undesirable implementation complexities. The strategies that facilitate solving equations (13) and (14), such as the use of Multinomial regression and Poisson regression, are no longer applicable to (17). In contrast, addressing model (1) with a reasonably specified distribution of W as presented in Section 3 enjoys appealing algorithmic simplicity and stability, and thus may be preferable in real data analyses.

Several potential extensions of the proposed models merit further research efforts. One is to allow for more flexible class-specific formulation of the baseline intensity function beyond the simple scale shift captured by the constant η_{0,k}. Another direction is to apply the proposed modeling and estimation strategies to conduct latent class analysis jointly for recurrent events data and longitudinal data. This is part of our ongoing work which will be reported separately.

Acknowledgements

The authors greatly appreciate valuable comments from the Editor, the Associate Editor, and Referees. The authors are grateful to Drs. James Lah and Felicia Goldstein, and Ms. Noy Hawkins for their help with the extraction and manipulation of the clinical phone call data and the interpretation of the analysis results. This work was supported by NIH grants R01 AG055634 and R01 HL113548.

Contributor Information

Wei Zhao, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.

Limin Peng, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.

John Hanfelt, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.

References

Altstein LL, Li G and Elashoff RM (2011) A method to estimate treatment efficacy among latent subgroups of a randomized clinical trial. Statist. Med., 30, 709–717.
Andersen PK and Gill RD (1982) Cox’s regression model for counting processes: a large sample study. Ann. Statist., 10, 1100–1120.
Bacci S, Bartolucci F, Bettin G and Pigini C (2019) A latent class growth model for migrants’ remittances: an application to the German Socio-Economic Panel. J. R. Statist. Soc. A, 182, 1607–1632.
Celeux G and Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. Journal of Classification, 13, 195–212.
Cook RJ and Lawless J (2007) The Statistical Analysis of Recurrent Events. Springer Science & Business Media.
Cook RJ and Lawless JF (1997) Marginal analysis of recurrent events and a terminating event. Statist. Med., 16, 911–924.
Egleston BL, Uzzo RG and Wong Y-N (2017) Latent class survival models linked by principal stratification to investigate heterogenous survival subgroups among individuals with early-stage kidney cancer. J. Am. Statist. Ass., 112, 534–546.
Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics, 1041–1046.
Fygenson M and Ritov Y (1994) Monotone estimating equations for censored data. Ann. Statist., 732–746.
Gallop R, Small DS, Lin JY, Elliott MR, Joffe M and Ten Have TR (2009) Mediation analysis with principal stratification. Statist. Med., 28, 1108–1130.
Han J (2009) Initial classification of joint data in EM estimation of latent class joint model. Journal of Multivariate Analysis, 100, 2313–2323.
Han J, Slate EH and Peña EA (2007) Parametric latent class joint model for a longitudinal biomarker and recurrent events. Statist. Med., 26, 5285–5302.
Hilton RP, Zheng Y and Serban N (2018) Modeling heterogeneity in healthcare utilization using massive medical claims data. J. Am. Statist. Ass., 113, 111–121.
Jedidi K, Ramaswamy V and DeSarbo WS (1993) A maximum likelihood method for latent class regression involving a censored dependent variable. Psychometrika, 58, 375–394.
Jo B, Findling RL, Wang C-P, Hastie TJ, Youngstrom EA, Arnold LE, Fristad MA and Horwitz SM (2017) Targeted use of growth mixture modeling: a learning perspective. Statist. Med., 36, 671–686.
Kosorok MR (2008) Introduction to Empirical Processes and Semiparametric Inference. Springer.
Lai D, Xu H, Koller D, Foroud T and Gao S (2016) A multivariate finite mixture latent trajectory model with application to dementia studies. Journal of Applied Statistics, 43, 2503–2523.
Lavancier F and Rochet P (2016) A general procedure to combine estimators. Computational Statistics & Data Analysis, 94, 175–192.
Lim HK, Li WK and Yu PLH (2014) Zero-inflated Poisson regression mixture model. Computational Statistics & Data Analysis, 71, 151–158.
Lin D, Sun W and Ying Z (1999) Nonparametric estimation of the gap time distribution for serial events with censored data. Biometrika, 86, 59–70.
Lin DY, Wei L-J, Yang I and Ying Z (2000) Semiparametric regression for the mean and rate functions of recurrent events. J. R. Statist. Soc. B, 62, 711–730.
Lin H, McCulloch CE and Rosenheck RA (2004) Latent pattern mixture models for informative intermittent missing data in longitudinal studies. Biometrics, 60, 295–305.
Lin H, Turnbull BW, McCulloch CE and Slate EH (2002) Latent class models for joint analysis of longitudinal biomarker and event process data: application to longitudinal prostate-specific antigen readings and prostate cancer. J. Am. Statist. Ass., 97, 53–65.
Luo X, Huang C-Y and Wang L (2013) Quantile regression for recurrent gap time data. Biometrics, 69, 375–385.
Mair P and Hudec M (2009) Multivariate Weibull mixtures with proportional hazard restrictions for dwell-time-based session clustering with incomplete data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 58, 619–639.
McLachlan G and Peel D (2000) Finite Mixture Models. John Wiley & Sons.
Muthén B (2004) Latent variable analysis. The Sage Handbook of Quantitative Methodology for the Social Sciences, 345, 106–109.
Muthén B and Shedden K (1999) Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55, 463–469.
Nagin DS (1999) Analyzing developmental trajectories: a semiparametric, group-based approach. Psychological Methods, 4, 139.
Pepe MS and Cai J (1993) Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. J. Am. Statist. Ass., 88, 811–820.
Prentice RL, Williams BJ and Peterson AV (1981) On the regression analysis of multivariate failure time data. Biometrika, 68, 373–379.
Proust-Lima C, Dartigues J-F and Jacqmin-Gadda H (2016) Joint modeling of repeated multivariate cognitive measures and competing risks of dementia and death: a latent process and latent class approach. Statist. Med., 35, 382–398.
Qu P, Barlogie B and Crowley J (2015) Using a latent class model to refine risk stratification in multiple myeloma. Statist. Med., 34, 2971–2980.
Ramaswamy V, DeSarbo WS, Reibstein DJ and Robinson WT (1993) An empirical pooling approach for estimating marketing mix elasticities with PIMS data. Marketing Science, 12, 103–124.
Reinecke J and Seddig D (2011) Growth mixture models in longitudinal research. AStA Advances in Statistical Analysis, 95, 415–434.
Stefanski LA and Carroll RJ (1987) Conditional scores and optimal scores for generalized linear measurement-error models. Biometrika, 74, 703–716.
van der Vaart AW and Wellner JA (1996) Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Science & Business Media.
Wang M-C, Qin J and Chiang C-T (2001) Analyzing recurrent event data with informative censoring. J. Am. Statist. Ass., 96, 1057–1065.
Wedel M, DeSarbo WS, Bult JR and Ramaswamy V (1993) A latent class Poisson regression model for heterogeneous count data. Journal of Applied Econometrics, 8, 397–411.
Zeng D, Gao F and Lin D (2017) Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika, 104, 505–525.
Zeng D, Mao L and Lin DY (2016) Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika, 103, 253–271.
