Author manuscript; available in PMC: 2022 Dec 2.
Published in final edited form as: J R Stat Soc Series B Stat Methodol. 2022 Apr 14;84(4):1175–1197. doi: 10.1111/rssb.12499

Semiparametric Latent Class Analysis of Recurrent Event Data

Wei Zhao 1, Limin Peng 2, John Hanfelt 3
PMCID: PMC9718440  NIHMSID: NIHMS1778095  PMID: 36465280

Summary.

Recurrent events data frequently arise in chronic disease studies, providing rich information on disease progression. The concept of latent class offers a sensible perspective to characterize complex population heterogeneity in recurrent event trajectories that may not be adequately captured by a single regression model. However, the development of latent class methods for recurrent events data has been sparse, typically requiring strong parametric assumptions and involving algorithmic issues. In this work, we investigate latent class analysis of recurrent events data based on flexible semiparametric multiplicative modeling. We derive a robust estimation procedure through novelly adapting the conditional score technique and utilizing the special characteristics of multiplicative intensity modeling. The proposed estimation procedure can be stably and efficiently implemented based on existing computational routines. We provide solid theoretical underpinnings for the proposed method, and demonstrate its satisfactory finite sample performance via extensive simulation studies. An application to a dataset from research participants at Goizueta Alzheimer’s Disease Research Center illustrates the practical utility of our proposals.

Keywords: Latent class analysis, Recurrent events data, Multiplicative intensity model, Estimating equation, Conditional score

1. Introduction

Recurrent events data frequently arise in chronic disease follow-up studies when repeated occurrences of a disease-related event, such as hospitalization and infection, are monitored over time. Such data contain rich information on disease progression which often presents complex heterogeneous patterns across individuals. A common strategy to accommodate the heterogeneity in recurrent events data is to perform regression analysis that links the recurrent events outcome with a set of observed explanatory variables based on a specified model. Well-known methods include assessing or modeling the intensity function of recurrent events (Andersen and Gill, 1982; Pepe and Cai, 1993; Wang et al., 2001, for example), the gap time between recurrent events (Prentice et al., 1981; Lin et al., 1999; Luo et al., 2013, for example), and the mean or rate function of recurrent events (Cook and Lawless, 1997; Lin et al., 2000, for example). Despite the success in many applications, the standard regression strategy may have poor performance when data are embedded with multiple distinct subgroups owing to heterogeneous underlying etiology or other factors. Results from our simulation studies (see Section 5) suggest that ignoring the existence of distinct subgroups and fitting one common model for all can lead to substantially increased prediction errors.

Latent class analysis (LCA) offers a parsimonious solution to tackle complex heterogeneity structure that cannot be adequately captured by a single regression model. A dominant type of LCA approaches is to adopt latent class mixture modeling, which views the observed data as a manifestation of multiple latent classes or subgroups. Such LCA methods have been well studied for various kinds of data, including standard uncensored data (Wedel et al., 1993; Gallop et al., 2009; Lim et al., 2014, for example), censored data (Farewell, 1982; Jedidi et al., 1993; Mair and Hudec, 2009; Qu et al., 2015; Egleston et al., 2017, for example), longitudinal data (Muthén and Shedden, 1999; Nagin, 1999; Muthén, 2004; Reinecke and Seddig, 2011; Lai et al., 2016; Jo et al., 2017; Bacci et al., 2019, among others), and longitudinal data in combination with survival data (Lin et al., 2002, 2004; Altstein et al., 2011; Proust-Lima et al., 2016; Hilton et al., 2018; Han et al., 2007; Han, 2009). However, LCA methods tailored to delineate the heterogeneity in recurrent event trajectory are limited. Relevant efforts include Han et al. (2007) and Han (2009) that investigated a joint latent class model of longitudinal biomarkers and recurrent events. These methods rely on parametric model assumptions and require a rather complex expectation-maximization (EM) algorithm to obtain the maximum likelihood estimates. This raises concerns on theoretical bias due to misspecification of the parametric model and computational instability due to the algorithmic complexity.

To help uncover the heterogeneity structure underlying the observed recurrent events with the above issues alleviated, we propose a robust semiparametric latent class method for recurrent events data based on the popular multiplicative intensity modeling. That is, we assume the whole population consists of K latent classes. Within each latent class, the occurrences of recurrent events can be captured by a semiparametric multiplicative intensity model (Prentice et al., 1981; Andersen and Gill, 1982), while the assumed recurrent events models have distinct covariate coefficients and baseline intensity function across different latent classes. To estimate the proposed latent class mixture model, the parametric likelihood approach is precluded by the nonparametric formulation of the baseline intensity function under the semiparametric multiplicative intensity model. Adapting existing methods for semiparametric multiplicative intensity models (Andersen and Gill, 1982; Wang et al., 2001, for example), however, confronts a notable challenge due to the unobservable latent class membership for all subjects.

In this work, we investigate the latent class analysis of recurrent events data based on the proposed semiparametric latent class mixture model. Utilizing the special stochastic properties implied by the adopted multiplicative intensity modeling, we construct a Nelson-Aalen type equation and derive a nonparametric estimator of the baseline mean function. We address the difficulty with the unobservable latent class membership by adapting the principle of conditional score (Stefanski and Carroll, 1987). Using empirical process arguments and estimating equation theory, we establish desirable asymptotic properties, including the uniform consistency and weak convergence of the baseline mean function estimator, and the consistency and asymptotic normality of the parameter estimators. We also discuss the selection of K, the number of latent classes, using the classic relative entropy measure (Ramaswamy et al., 1993). Of practical appeal, our estimating equations can be solved by a simple iterative estimation procedure, which can be stably and efficiently implemented based on existing computational routines.

The remainder of this paper is organized as follows. We describe the latent class mixture modeling of recurrent events data and other model assumptions in Section 2. We present the proposed estimating equations and algorithm in Section 3. Section 4 covers the asymptotic properties of the proposed estimators and inferences. In Section 5, we report results from our simulation studies, which demonstrate satisfactory finite sample performance of the proposed method as well as its practical advantages. In Section 6, we apply the new method to a dataset from a group of research participants at Goizueta Alzheimer’s Disease Research Center who were diagnosed with a cognitive disorder. Some concluding remarks are contained in Section 7.

2. Data and Model Assumptions

For subject $i$, let $T_i^{(j)}$ denote the time to the $j$th recurrent event and let $\tilde{Z}_i$ denote a $p \times 1$ vector of time-independent covariates. The underlying counting process for the recurrent events is defined as $N_i^*(t) = \sum_{j=1}^{\infty} I(T_i^{(j)} \le t)$, $i = 1, \ldots, n$. Suppose the observation of recurrent events is terminated by a censoring time $C_i$. Then the observed counting process of recurrent events is given by $N_i(t) = N_i^*(t \wedge C_i) = \sum_{j=1}^{\infty} I(T_i^{(j)} \le t \wedge C_i)$, where $a \wedge b$ denotes the minimum of $a$ and $b$. The observed recurrent events data consist of $\{N_i(t), C_i, \tilde{Z}_i\}_{i=1}^{n}$. In the sequel, notation without the subscript $i$ represents the corresponding population analogues.

Suppose the whole population comprises $K$ latent classes; each latent class represents a subpopulation that has its own mechanism governing the interplay between recurrent event occurrences and the observed covariates. To depict such a data scenario, we assume a latent class mixture model, where $N_i^*(t)$ is a nonstationary Poisson process with the intensity function,

$$\lambda_i(t) = \sum_{k=1}^{K} I(\xi_i = k) \times \lambda_0(t) \times W_i \times \eta_{0,k} \times \exp(\tilde{Z}_i^T \tilde{\beta}_{0,k}). \tag{1}$$

Here the number of latent classes, $K$, is pre-determined, $\xi_i$ stands for the unobservable latent class membership with $I(\xi_i = k)$ indicating whether or not subject $i$ belongs to class $k$, $\lambda_0(t)$ is an unspecified, continuous, nonnegative baseline intensity function shared among different latent classes, $W_i$ is a positive subject-specific latent variable (or frailty) independent of $(\xi_i, \tilde{Z}_i, C_i)$, and $\eta_{0,k} > 0$ and $\tilde{\beta}_{0,k}$ ($k = 1, \ldots, K$) are unknown class-specific parameters. Here $\eta_{0,k}$ captures the class-$k$ scale shift in the baseline intensity function, and $\tilde{\beta}_{0,k}$ represents the class-$k$ covariate effects on the intensity function of recurrent events. The frailty $W_i$ offers the flexibility to accommodate individual differences, with a larger (or smaller) value indicating more (or less) frequent occurrences of recurrent events. To ensure the identifiability of $\lambda_0(t)$ and the $\eta_{0,k}$’s in (1), we assume $E(W_i \mid \tilde{Z}_i, \xi_i = k) = 1$ for $k = 1, \ldots, K$ and impose the constraint,

$$\int_0^{\nu} \lambda_0(t)\,dt = 1, \tag{2}$$

where ν is a predetermined constant. In practice, ν may be chosen to be slightly smaller than the upper bound of Ci’s support. A different choice of ν would only imply a scale shift to λ0(t) by a constant with η0,k and β˜0,k remaining the same.

Note that the specification of the class-$k$ recurrent event process under model (1) takes the same form as the multiplicative intensity model studied by Wang et al. (2001). Both models include a multiplicative subject-specific frailty $W_i$, which helps relax the memoryless constraint inherent in Poisson processes (Cook and Lawless, 2007), while our model involves one additional latent variable $\xi_i$ to account for subject-specific variability induced by the underlying latent class membership. By writing (1) as $\lambda_i(t) = \{\sum_{k=1}^{K} I(\xi_i = k)\,\eta_{0,k}\}\,\lambda_0(t)\,W_i \exp\{\tilde{Z}_i^T \sum_{k=1}^{K} I(\xi_i = k)\,\tilde{\beta}_{0,k}\}$, we further note that, unlike in the model of Wang et al. (2001), the latent variable $\xi_i$ is incorporated in a non-multiplicative manner and influences the covariate effects (i.e. $\sum_{k=1}^{K} I(\xi_i = k)\,\tilde{\beta}_{0,k}$). The design of our model is tailored to tackle the complex heterogeneity structure of recurrent events data through the perspective of latent class analysis (LCA).

To address the difficulty with the unobservable latent class membership, we further assume a multinomial logistic regression model for ξi:

$$P(\xi_i = k \mid \tilde{Z}_i) = p_k(\alpha_0, \tilde{Z}_i) \equiv \frac{\exp(\tilde{Z}_i^T \alpha_{0,k})}{\sum_{l=1}^{K} \exp(\tilde{Z}_i^T \alpha_{0,l})}, \quad k = 1, \ldots, K, \tag{3}$$

where $\alpha_0 = (\alpha_{0,1}^T, \ldots, \alpha_{0,K}^T)^T$ with $\alpha_{0,1} = 0_{p \times 1}$. Model (3) is commonly adopted in the latent class analysis literature to facilitate recovering information on the unobservable latent class structure based on the observed data. Note that model (3) can be readily adapted to allow only a subset of the covariates in $\tilde{Z}_i$ or a separate set of covariates to influence the distribution of the latent class membership. In addition, we assume that $C_i$ and $(N_i^*(\cdot), \xi_i)$ are independent given $\tilde{Z}_i$.
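As a small illustration of the multinomial logistic model for the latent class membership, the sketch below evaluates the class probabilities for one subject. The function name, the covariate values, and the coefficient values are hypothetical and chosen only for illustration; the first coefficient vector is fixed at zero to mirror the reference-class constraint.

```python
import math


def class_probabilities(z_tilde, alphas):
    """Return [p_1, ..., p_K] for one subject under a multinomial logistic model.

    z_tilde : list of p covariate values (no intercept term)
    alphas  : list of K coefficient vectors of length p; alphas[0] is all zeros
              (reference class)
    """
    scores = [sum(a_j * z_j for a_j, z_j in zip(a, z_tilde)) for a in alphas]
    top = max(scores)  # subtract the largest score for numerical stability
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical inputs (not taken from the paper): p = 2 covariates, K = 3 classes.
probs = class_probabilities([0.5, 1.0], [[0.0, 0.0], [1.0, -0.5], [-1.0, 0.2]])
print(probs)
```

The probabilities always sum to one, and setting all coefficient vectors to zero gives equal class probabilities.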

3. The Proposed Estimation Procedure

3.1. Estimating equation

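Let $Z_i = (1, \tilde{Z}_i^T)^T$ and $\beta_{0,k} = (\log \eta_{0,k}, \tilde{\beta}_{0,k}^T)^T$. By model assumptions (1) and (2), and $E(W_i \mid \tilde{Z}_i, \xi_i = k) = 1$, it holds that

$$E\{N_i^*(t) \mid \xi_i = k, Z_i\} = E\{\mu_0(t)\,W_i \exp(Z_i^T \beta_{0,k}) \mid \xi_i = k, Z_i\} = \mu_0(t) \exp(Z_i^T \beta_{0,k}), \tag{4}$$

where $\mu_0(t) = \int_0^t \lambda_0(s)\,ds$ and $\mu_0(\nu) = 1$. Given that $C_i$ and $(N_i^*(\cdot), \xi_i)$ are independent given $Z_i$, this implies $E\{N_i(C_i)/\mu_0(C_i) \mid \xi_i = k, Z_i\} = \exp(Z_i^T \beta_{0,k})$, and consequently

$$E\left[I(\xi_i = k)\,Z_i\left\{\frac{N_i(C_i)}{\mu_0(C_i)} - \exp(Z_i^T \beta_{0,k})\right\}\right] = 0. \tag{5}$$

However, (5) cannot be directly utilized to construct an estimating equation for the $\beta_{0,k}$’s because the $\xi_i$’s are not observable. To overcome this difficulty, we adapt the principle of conditional score (Stefanski and Carroll, 1987) commonly used for handling missing data. Our specific idea is to recover the missing information on $I(\xi_i = k)$ by conditioning it on the observed $Z_i$, $C_i$ and $D_i \equiv N_i(C_i)$. This results in the following equation:

$$E\left[\tau_{ik}\,Z_i\left\{\frac{N_i(C_i)}{\mu_0(C_i)} - \exp(Z_i^T \beta_{0,k})\right\}\right] = 0, \tag{6}$$

where $\tau_{ik} = E\{I(\xi_i = k) \mid Z_i, D_i, C_i\}$. This equation provides a feasible platform for constructing an estimating equation for the $\beta_{0,k}$’s.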

It is remarkable that τik involved in (6) has a convenient analytic form, which is an appealing feature of the proposed estimation strategy. Specifically, by the definition, τik can be expressed as

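$$\tau_{ik} = \frac{P(D_i = d_i \mid \xi_i = k, Z_i, C_i)\,P(\xi_i = k \mid Z_i, C_i)}{\sum_{l=1}^{K} P(D_i = d_i \mid \xi_i = l, Z_i, C_i)\,P(\xi_i = l \mid Z_i, C_i)}. \tag{7}$$

Under model (3) and the censoring assumption that $C_i$ and $(N_i^*(\cdot), \xi_i)$ are independent given $Z_i$, we have

$$P(\xi_i = k \mid Z_i, C_i) = p_k(\alpha_0, \tilde{Z}_i) = \frac{\exp(\tilde{Z}_i^T \alpha_{0,k})}{\sum_{l=1}^{K} \exp(\tilde{Z}_i^T \alpha_{0,l})}. \tag{8}$$

To assess $P(D_i = d_i \mid \xi_i = k, Z_i, C_i)$, it is important to note that under model (1), $N_i^*(t)$, given $\xi_i = k$, $W_i$, and $Z_i$, is a nonhomogeneous Poisson process with mean function $\mu_0(t)\,W_i \exp(Z_i^T \beta_{0,k})$ (Lin et al., 2000). Thus, $\mu_0(T_i^{(1)}), \mu_0(T_i^{(2)}), \ldots$ can be viewed as random variates generated from a homogeneous Poisson process with mean function of the form $W_i \exp(Z_i^T \beta_{0,k})\,t$. Using standard probabilistic arguments presented in Section 1.1 of the Supplementary Materials, we show that

$$P(D_i = d_i \mid \xi_i = k, Z_i, C_i) = \int_0^{\infty} \frac{\{\exp(Z_i^T \beta_{0,k})\,w\,\mu_0(C_i)\}^{d_i}}{d_i!} \exp\{-\exp(Z_i^T \beta_{0,k})\,w\,\mu_0(C_i)\}\,f_W(w)\,dw, \tag{9}$$

where $f_W(\cdot)$ denotes the known density function of $W$, the population analogue of the subject-specific frailty $W_i$. A common choice of the frailty density $f_W(\cdot)$ is the density of the Gamma$(r, r)$ distribution ($r > 0$). The selection of $f_W(\cdot)$ and an extension to unknown $f_W(\cdot)$ are discussed in Section 3.5 and Section 7 respectively. Plugging (8) and (9) into (7), we can obtain an explicit expression of $\tau_{ik}$ in terms of $\alpha_0$, $\beta_0$ and $\mu_0(\cdot)$, denoted by $\tau_{ik}(\alpha_0, \beta_0, \mu_0)$, where $\beta_0 = (\beta_{0,1}^T, \ldots, \beta_{0,K}^T)^T$.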
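To make the posterior membership probability concrete, the following sketch evaluates it for one subject when the frailty is Gamma with shape $r$ and rate $r$ (so its mean is 1). In that case the Poisson-Gamma mixture integral reduces to the standard negative binomial probability with size $r$ and mean $m = \mu_0(C_i)\exp(Z_i^T\beta_k)$. The function names and argument layout are assumptions made for this illustration, not code from the paper.

```python
import math


def log_marginal_count_pmf(d, m, r):
    """log P(D = d) when D | W ~ Poisson(m * W) and W ~ Gamma(shape=r, rate=r).

    This is the negative binomial pmf with size r and mean m (requires m > 0).
    """
    return (math.lgamma(d + r) - math.lgamma(r) - math.lgamma(d + 1)
            + r * math.log(r / (r + m)) + d * math.log(m / (r + m)))


def posterior_class_probs(d, z, mu_at_c, alphas, betas, r):
    """Posterior class probabilities for one subject given the observed count.

    d       : observed number of events D_i
    z       : covariate vector with a leading 1, i.e. Z_i = (1, tilde Z_i)
    mu_at_c : value of the baseline mean function at C_i
    alphas  : K coefficient vectors for tilde Z_i (first one all zeros)
    betas   : K coefficient vectors for Z_i
    r       : Gamma(r, r) frailty parameter (assumed known here)
    """
    z_tilde = z[1:]
    log_terms = []
    for a, b in zip(alphas, betas):
        # Unnormalised log prior; the normalising constant cancels in the ratio.
        log_prior = sum(a_j * z_j for a_j, z_j in zip(a, z_tilde))
        mean_count = mu_at_c * math.exp(sum(b_j * z_j for b_j, z_j in zip(b, z)))
        log_terms.append(log_prior + log_marginal_count_pmf(d, mean_count, r))
    top = max(log_terms)
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]
```

Working on the log scale avoids overflow in the factorial and power terms when the event count is large.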

Based on equation (6), if µ0(·) were known, we may consider the following estimating equations:

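$$S_{1,n,k}(\alpha, \beta, \mu_0) \equiv \frac{1}{n}\sum_{i=1}^{n} \tau_{ik}(\alpha, \beta, \mu_0)\,Z_i\left\{\frac{N_i(C_i)}{\mu_0(C_i)} - \exp(Z_i^T \beta_k)\right\} = 0, \quad k = 1, \ldots, K. \tag{10}$$

In addition, under model (3), the likelihood assuming the $\xi_i$’s are observed is given by $\prod_{i=1}^{n}\prod_{k=1}^{K} p_k(\alpha, \tilde{Z}_i)^{I(\xi_i = k)}$, which leads to the score equation,

$$\sum_{i=1}^{n}\sum_{k=1}^{K} I(\xi_i = k)\,\frac{\partial}{\partial \alpha} \log p_k(\alpha, \tilde{Z}_i) = 0.$$

The same reasoning used to derive (6) motivates us to consider another set of estimating equations,

$$S_{2,n,k}(\alpha, \beta, \mu_0) \equiv \frac{1}{n}\sum_{i=1}^{n}\left\{\tau_{ik}(\alpha, \beta, \mu_0)\,\tilde{Z}_i - \frac{\exp(\tilde{Z}_i^T \alpha_k)\,\tilde{Z}_i}{\sum_{j=1}^{K} \exp(\tilde{Z}_i^T \alpha_j)}\right\} = 0, \quad k = 1, \ldots, K. \tag{11}$$

However, $\mu_0(\cdot)$ is generally unknown. To overcome this obstacle, we propose a Nelson-Aalen type estimator of $\mu_0(\cdot)$ under the assumed multiplicative intensity modeling of recurrent events. Specifically, define $S_C(t \mid Z_i) = P(C \ge t \mid Z_i)$ and $H_0(t) = \log\{\mu_0(t)/\mu_0(\nu)\}$. Since $\mu_0(\nu) = 1$, it is easy to see that $H_0(\nu) = 0$ and $\mu_0(t) = \exp\{H_0(t)\}$. Under the conditional independent censoring assumption, the multiplicative intensity structure imposed by equation (4) implies that $E\{dN_i(t) \mid Z_i, \xi_i = k\} = S_C(t \mid Z_i) \exp(Z_i^T \beta_{0,k})\,\lambda_0(t)\,dt$ and $E\{I(C_i \ge t)\,N_i(t)\,dH_0(t) \mid Z_i, \xi_i = k\} = S_C(t \mid Z_i) \exp(Z_i^T \beta_{0,k})\,\lambda_0(t)\,dt$. It then follows that $E\{dM_i(t)\} = 0$, where $dM_i(t) = dN_i(t) - I(C_i \ge t)\,N_i(t)\,dH_0(t)$. Solving $\sum_{i=1}^{n} dM_i(t) = 0$ yields an estimator of $\mu_0(t)$,

$$\hat{\mu}(t) = \exp\{\hat{H}(t)\} \tag{12}$$

with $\hat{H}(t) = -\int_t^{\nu} \frac{\sum_{i=1}^{n} dN_i(s)}{\sum_{i=1}^{n} I(C_i \ge s)\,N_i(s)}$. Using $\hat{\mu}$ in place of $\mu_0(\cdot)$ in (10) and (11), we propose the following estimating equations for $\alpha_0$ and $\beta_0$:

$$n^{1/2} S_{1,n}(\alpha, \beta, \hat{\mu}) = 0, \tag{13}$$
$$n^{1/2} S_{2,n}(\alpha, \beta, \hat{\mu}) = 0, \tag{14}$$

where $S_{j,n}(\alpha, \beta, \hat{\mu}) = (S_{j,n,1}(\alpha, \beta, \hat{\mu})^T, \ldots, S_{j,n,K}(\alpha, \beta, \hat{\mu})^T)^T$, $j = 1, 2$. Note that (13), which includes $(p + 1)K$ equations, and (14), which includes $pK$ equations, have sufficient dimensions to estimate a total number of $(2p + 1)K$ unknown parameters, $\{\alpha_{0,k}^T, \eta_{0,k}, \tilde{\beta}_{0,k}^T\}_{k=1,\ldots,K}$.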
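The Nelson-Aalen type estimator of the baseline mean function can be sketched in a few lines. The version below assumes that the integral defining $\hat{H}(t)$ runs over event times $s$ with $t < s \le \nu$, that $N_i(s)$ counts subject $i$'s observed events up to and including $s$, and that $\hat{\mu}(t) = \exp\{\hat{H}(t)\}$; the function name and data layout are hypothetical.

```python
import math


def mu_hat(t, event_times, censor_times, nu):
    """Evaluate the estimated baseline mean function at t (for t <= nu).

    event_times  : list of lists; event_times[i] holds subject i's observed
                   event times, all at or before censor_times[i]
    censor_times : list of censoring times C_i
    nu           : normalisation point, so that mu_hat(nu) = 1
    """
    # Distinct observed event times s with t < s <= nu contribute a jump.
    jump_times = sorted({s for ev in event_times for s in ev if t < s <= nu})
    h = 0.0
    for s in jump_times:
        # Numerator: total number of events occurring exactly at s.
        d_s = sum(ev.count(s) for ev in event_times)
        # Denominator: sum over subjects with C_i >= s of N_i(s).
        r_s = sum(sum(1 for u in ev if u <= s)
                  for ev, c in zip(event_times, censor_times) if c >= s)
        h -= d_s / r_s
    return math.exp(h)


# Toy data (hypothetical): two subjects followed up to 0.9 and 0.8.
events = [[0.2, 0.5], [0.4]]
censor = [0.9, 0.8]
print([mu_hat(t, events, censor, nu=0.8) for t in (0.0, 0.3, 0.45, 0.8)])
```

The denominator is never zero at a jump time, because any subject with an event at $s$ is still under observation at $s$ and has at least one event by then.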

3.2. Estimation algorithm

The proposed estimation procedure can be implemented as follows.

Step 1: Compute $\hat{\mu}$ based on (12). Set $r = 0$ and choose initial estimates for $\alpha_0$ and $\beta_0$, denoted by $\hat{\alpha}^{[0]}$ and $\hat{\beta}^{[0]}$. Calculate $\hat{\tau}_{ik} \equiv \tau_{ik}(\hat{\alpha}^{[0]}, \hat{\beta}^{[0]}, \hat{\mu})$ based on equations (7)–(9).

Step 2: Increase $r$ by 1. Solve estimating equations (13) and (14) with $\tau_{ik}(\alpha, \beta, \hat{\mu})$ fixed as $\hat{\tau}_{ik}$. Denote the resulting solutions by $\hat{\alpha}^{[r]}$ and $\hat{\beta}^{[r]}$.

Step 3: Update $\hat{\tau}_{ik}$ by $\tau_{ik}(\hat{\alpha}^{[r]}, \hat{\beta}^{[r]}, \hat{\mu})$.

Step 4: Repeat Steps 2 and 3 until pre-specified convergence criteria are met. Denote the final estimators of $\alpha_0$ and $\beta_0$ by $\hat{\alpha}$ and $\hat{\beta}$ respectively.

In Step 1, we obtain the initial estimate, $\hat{\alpha}^{[0]}$, by following the strategy of Lin et al. (2002). Specifically, we randomly assign class memberships to all subjects and then fit the multinomial logistic regression to obtain $\hat{\alpha}^{[0]}$. We obtain the initial estimate, $\hat{\beta}^{[0]}$, by fitting the multiplicative intensity model studied by Wang et al. (2001) using the reReg() function in R package reReg, stratified by the randomly assigned latent class membership.

In Step 2, we obtain $\hat{\beta}^{[r]}$ by solving the equation

$$\sum_{i=1}^{n} \hat{\tau}_{ik}\,Z_i\left\{\frac{N_i^*(C_i)}{\hat{\mu}(C_i)} - \exp(Z_i^T \beta_k)\right\} = 0,$$

for $\beta_k$ ($k = 1, \ldots, K$). This equation is a monotone estimating equation (Fygenson and Ritov, 1994). Furthermore, we find that solving this equation can be equivalently transformed to fitting a “pseudo” weighted Poisson regression model with response $N_i(C_i)/\hat{\mu}(C_i)$ and covariates $Z_i$ along with weights $\hat{\tau}_{ik}$. To obtain $\hat{\alpha}^{[r]}$, it is easy to show that solving equation (14) with $\tau_{ik}(\alpha, \beta, \hat{\mu})$ fixed at $\hat{\tau}_{ik}$ can be equivalently carried out by first generating $\xi_i$ from the Multinomial$(1; \hat{\tau}_{i1}, \ldots, \hat{\tau}_{iK})$ distribution ($i = 1, \ldots, n$) and then performing multinomial regression with responses $\{\xi_i\}_{i=1}^{n}$ and covariates $\{\tilde{Z}_i\}_{i=1}^{n}$. These procedures can be readily implemented by existing computational routines, such as the R function glm() and the R function multinom() in the R package nnet.
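The weighted pseudo-Poisson update in Step 2 has a closed form in the simplest special case where $Z_i$ contains only the intercept: the weighted score $\sum_i \hat{\tau}_{ik}\{D_i/\hat{\mu}(C_i) - \exp(\beta)\} = 0$ is solved by the log of the weighted mean response. The sketch below shows that special case, together with a draw of class labels from the membership probabilities; it is an illustration in Python under these simplifying assumptions, not the R routines named in the text, and all function names are hypothetical.

```python
import math
import random


def intercept_only_beta(counts, mu_at_c, tau_k):
    """Solve sum_i tau_ik * (D_i / mu_hat(C_i) - exp(beta)) = 0 for scalar beta.

    Requires at least one subject with positive weight and a positive count.
    """
    responses = [d / m for d, m in zip(counts, mu_at_c)]
    weighted_sum = sum(t * y for t, y in zip(tau_k, responses))
    return math.log(weighted_sum / sum(tau_k))


def draw_memberships(tau, rng):
    """Draw one class label in {1, ..., K} per subject with probabilities tau[i]."""
    k = len(tau[0])
    return [rng.choices(range(1, k + 1), weights=row)[0] for row in tau]


# Hypothetical inputs for three subjects and one class k.
beta_k = intercept_only_beta([2, 0, 5], [0.5, 1.0, 0.8], [0.9, 0.2, 0.5])
labels = draw_memberships([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], random.Random(2022))
print(beta_k, labels)
```

With covariates beyond the intercept there is no closed form, and an iterative weighted Poisson fit is needed instead.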

In Step 4, the convergence criterion can be specified as the magnitude of the parameter estimate change between two consecutive iterations falling below a certain tolerance value. The magnitude of the parameter estimate change may be measured by an absolute difference, for example, $\max\{\|\hat{\beta}^{[r]} - \hat{\beta}^{[r-1]}\|, \|\hat{\alpha}^{[r]} - \hat{\alpha}^{[r-1]}\|\}$, or by a relative difference, for example, $\max\{\|(\hat{\beta}^{[r]} - \hat{\beta}^{[r-1]})/\hat{\beta}^{[r-1]}\|, \|(\hat{\alpha}^{[r]} - \hat{\alpha}^{[r-1]})/\hat{\alpha}^{[r-1]}\|\}$. Here $\|\cdot\|$ denotes the L norm and the fraction between vectors stands for the component-wise fraction.
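A stopping rule of this kind can be sketched as follows. The choice of the maximum absolute component as the norm and the default tolerance are assumptions made for illustration; parameter vectors are flattened into plain lists.

```python
def max_abs_change(new, old):
    """Largest absolute component-wise change between two parameter vectors."""
    return max(abs(a - b) for a, b in zip(new, old))


def max_rel_change(new, old):
    """Largest component-wise relative change; every entry of old must be nonzero."""
    return max(abs((a - b) / b) for a, b in zip(new, old))


def converged(alpha_new, alpha_old, beta_new, beta_old, tol=1e-2, relative=False):
    """Return True when both alpha and beta changed by less than tol."""
    change = max_rel_change if relative else max_abs_change
    return max(change(alpha_new, alpha_old), change(beta_new, beta_old)) < tol


# Hypothetical consecutive iterates.
print(converged([0.0, 1.0], [0.0, 1.005], [2.0], [2.001]))
```

The relative version is undefined when a previous estimate is exactly zero, so the absolute version is the safer default for parameters that may be near zero.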

3.3. Selection of the number of latent classes

To fit the proposed latent class mixture model to a real dataset, a critical question is how to select the value of K, which represents the number of latent classes. Common practice to address this question includes using domain knowledge, or through model evaluations based on information criteria, statistical tests, entropy, reliability, or other criteria. In this work, we consider a relative entropy measure (Ramaswamy et al., 1993) defined as,

$$E_K = 1 - \frac{\sum_{i=1}^{n}\sum_{k=1}^{K} \{-\hat{\tau}_{ik} \log \hat{\tau}_{ik}\}}{n \log K}, \tag{15}$$

where $\hat{\tau}_{ik} = \tau_{ik}(\hat{\alpha}, \hat{\beta}, \hat{\mu})$. By definition, $E_K$ is bounded between 0 and 1. Following the discussions of Celeux and Soromenho (1996), $E_K$ is expected to be close to 1 when latent classes are well separated, and to take a small value when latent classes overlap heavily. Therefore, we propose to select $K$ as the maximizer of $E_K$, that is, $\hat{K} = \arg\max_{K \ge 2} E_K$. As suggested by the simulation studies presented in Section 5, this approach has a high chance of selecting the true value of $K$ when latent classes are well separated. However, when latent classes overlap considerably, this approach may tend to misspecify $K$ by a value smaller than the true one. This may reflect that, given a small or moderate sample size, the relative entropy measure may not be sufficiently “powered” to differentiate all latent classes that overlap heavily. Nevertheless, the simulation studies also suggest that under-selecting $K$ in such a case may still yield reliable predictions of recurrent event numbers based on the proposed models.
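The relative entropy measure, $E_K = 1 - \sum_i\sum_k(-\hat{\tau}_{ik}\log\hat{\tau}_{ik})/(n\log K)$, is straightforward to compute from the matrix of estimated membership probabilities. The sketch below is a minimal illustration; the function names and the example values are hypothetical, and $0 \log 0$ is treated as 0.

```python
import math


def relative_entropy(tau):
    """Relative entropy E_K for an n-by-K matrix of membership probabilities (K >= 2)."""
    n, k = len(tau), len(tau[0])
    entropy = 0.0
    for row in tau:
        for p in row:
            if p > 0.0:  # treat 0 * log(0) as 0
                entropy -= p * math.log(p)
    return 1.0 - entropy / (n * math.log(k))


def select_k(entropy_by_k):
    """Pick the candidate K with the largest relative entropy."""
    return max(entropy_by_k, key=entropy_by_k.get)


# Hypothetical fitted probabilities for n = 3 subjects and K = 2 classes.
print(relative_entropy([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]))
```

Perfectly separated classes (every row a 0/1 vector) give a value of 1, and completely uninformative rows (all entries equal to 1/K) give a value of 0.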

3.4. Model checking

Model checking is of practical importance. A simple graphic approach can be conducted to evaluate the overall fit of the proposed models. The basic idea is to contrast the number of the observed recurrent events, $D_i \equiv N_i(C_i)$, with its prediction under the assumed models. Specifically, the model assumptions in (1)–(3) imply $E\{N_i(C_i) \mid Z_i\} = E\{N_i^*(C_i) \mid Z_i\} = \sum_{k=1}^{K} \tau_{ik}\,\mu_0(C_i) \exp(Z_i^T \beta_{0,k})$. Thus, under the proposed models, $N_i(C_i)$ may be predicted by $\hat{D}_i \equiv \sum_{k=1}^{K} \hat{\tau}_{ik}\,\hat{\mu}(C_i) \exp(Z_i^T \hat{\beta}_k)$ with the observed data. Therefore, the graphic model checking may be conducted by examining the scatter plot of $\hat{D}_i$ versus $D_i$. Observing a systematic departure of the pairs $(\hat{D}_i, D_i)$ from the 45 degree line may suggest a lack of fit of the assumed models.
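The predicted event count used in this check is a membership-weighted sum of class-specific means. The sketch below computes it for one subject and summarizes the discrepancy over subjects by the mean absolute difference; the function names, the discrepancy summary, and the numeric inputs are illustrative assumptions.

```python
import math


def predicted_count(tau_i, mu_at_c, z, betas):
    """Membership-weighted prediction of the event count for one subject.

    tau_i   : estimated membership probabilities for the K classes
    mu_at_c : estimated baseline mean function evaluated at C_i
    z       : covariate vector with a leading 1
    betas   : K coefficient vectors matching z
    """
    return sum(t * mu_at_c * math.exp(sum(b_j * z_j for b_j, z_j in zip(b, z)))
               for t, b in zip(tau_i, betas))


def mean_abs_discrepancy(predicted, observed):
    """Average absolute gap between predicted and observed counts."""
    return sum(abs(p - d) for p, d in zip(predicted, observed)) / len(observed)


# Hypothetical subject: K = 2 classes, one covariate equal to 0.
d_hat = predicted_count([0.25, 0.75], 0.5, [1.0, 0.0], [[0.0, 1.0], [math.log(4.0), 2.0]])
print(d_hat)
```

Plotting the predictions against the observed counts for all subjects then gives the scatter plot described above.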

3.5. Selection of the frailty density

In practice, the selection of $f_W(\cdot)$ may be guided by the model checking procedure presented in Section 3.4. Specifically, for each candidate $f_W^{S}$, we compute the predictions of $D_i$ based on the observed data, denoted by $\hat{D}_i^{S}$. Then we select the $f_W^{S}$ that yields the smallest discrepancy between $\{\hat{D}_i^{S}\}_{i=1}^{n}$ and $\{D_i\}_{i=1}^{n}$, which may be summarized by APEM $\equiv n^{-1}\sum_{i=1}^{n} |\hat{D}_i^{S} - D_i|$, MPEM $\equiv \mathrm{median}\{|\hat{D}_i^{S} - D_i| : i = 1, \ldots, n\}$, or SMSPEM $\equiv n^{-1}\sum_{i=1}^{n} (\hat{D}_i^{S} - D_i)^2$. Our simulation studies suggest that the proposed estimates that use the $f_W(\cdot)$ determined by this approach have quite comparable performance to the proposed estimates that adopt the true $f_W(\cdot)$; see Table 5.

Table 5.

Simulation results from the proposed estimation that uses the true fW(·) (i.e. r = 5) or uses fW(·) selected by model-checking measures APEM, MPEM, or SMSPEM in scenario S1.

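| Parameter | r = 5: BIAS | SD | SE | CP | APEM: BIAS | SD | SE | CP | MPEM: BIAS | SD | SE | CP | SMSPEM: BIAS | SD | SE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\tilde{\alpha}_{11}$ | −0.009 | 0.452 | 0.450 | 0.954 | 0.032 | 0.440 | 0.448 | 0.946 | 0.008 | 0.421 | 0.430 | 0.949 | 0.021 | 0.437 | 0.454 | 0.953 |
| $\tilde{\alpha}_{12}$ | 0.009 | 0.263 | 0.263 | 0.949 | −0.016 | 0.255 | 0.247 | 0.941 | −0.021 | 0.257 | 0.233 | 0.940 | −0.019 | 0.254 | 0.267 | 0.951 |
| $\tilde{\alpha}_{21}$ | −0.004 | 0.392 | 0.392 | 0.951 | 0.051 | 0.352 | 0.372 | 0.922 | −0.032 | 0.322 | 0.312 | 0.929 | 0.009 | 0.351 | 0.329 | 0.923 |
| $\tilde{\alpha}_{22}$ | 0.001 | 0.229 | 0.229 | 0.948 | 0.008 | 0.227 | 0.227 | 0.943 | 0.011 | 0.217 | 0.198 | 0.931 | 0.013 | 0.226 | 0.244 | 0.949 |
| $\log(\eta_1)$ | 0.022 | 0.090 | 0.101 | 0.953 | 0.046 | 0.137 | 0.128 | 0.946 | 0.046 | 0.129 | 0.130 | 0.946 | 0.048 | 0.137 | 0.156 | 0.951 |
| $\log(\eta_2)$ | 0.029 | 0.116 | 0.124 | 0.952 | 0.032 | 0.132 | 0.111 | 0.947 | 0.035 | 0.133 | 0.121 | 0.942 | 0.027 | 0.134 | 0.123 | 0.917 |
| $\log(\eta_3)$ | 0.036 | 0.086 | 0.103 | 0.950 | 0.025 | 0.131 | 0.115 | 0.940 | 0.038 | 0.120 | 0.127 | 0.951 | 0.019 | 0.131 | 0.120 | 0.949 |
| $\tilde{\beta}_{11}$ | 0.061 | 0.116 | 0.125 | 0.957 | −0.006 | 0.143 | 0.120 | 0.927 | −0.012 | 0.134 | 0.145 | 0.951 | −0.024 | 0.135 | 0.145 | 0.951 |
| $\tilde{\beta}_{12}$ | −0.038 | 0.064 | 0.080 | 0.949 | 0.012 | 0.098 | 0.103 | 0.947 | −0.043 | 0.076 | 0.078 | 0.927 | −0.028 | 0.087 | 0.104 | 0.946 |
| $\tilde{\beta}_{21}$ | 0.037 | 0.293 | 0.311 | 0.951 | 0.043 | 0.183 | 0.101 | 0.910 | −0.008 | 0.156 | 0.167 | 0.921 | 0.024 | 0.166 | 0.146 | 0.921 |
| $\tilde{\beta}_{22}$ | −0.009 | 0.111 | 0.116 | 0.948 | 0.009 | 0.102 | 0.109 | 0.945 | 0.021 | 0.096 | 0.109 | 0.944 | −0.009 | 0.100 | 0.122 | 0.949 |
| $\tilde{\beta}_{31}$ | −0.017 | 0.126 | 0.133 | 0.953 | −0.032 | 0.133 | 0.125 | 0.939 | −0.027 | 0.118 | 0.104 | 0.938 | −0.036 | 0.139 | 0.147 | 0.953 |
| $\tilde{\beta}_{32}$ | −0.008 | 0.069 | 0.072 | 0.950 | −0.009 | 0.079 | 0.056 | 0.932 | 0.021 | 0.080 | 0.065 | 0.931 | −0.021 | 0.079 | 0.080 | 0.950 |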

3.6. Efficiency augmentation via optimally weighted averaging

In this subsection, we discuss an efficiency augmentation approach. Let $0 < a_1 < a_2 < \ldots < a_L \le 1$ be pre-specified constants. Following the same arguments as in Section 3.1, we can show that a modified equation (5) with $C_i$ replaced by $a_l C_i$ leads to a valid variant of the proposed estimator ($l = 1, \ldots, L$). Let $\theta^{(l)}$ denote the resulting estimator of a given parameter in $\alpha_0$ or $\beta_0$. Mimicking the oracle convex combination procedure studied by Lavancier and Rochet (2016), we combine $\{\theta^{(l)}\}_{l=1}^{L}$ by their weighted average, where the weights are chosen to minimize the estimated standard errors, i.e.

$$\hat{\theta}_{WE}(w_1, \ldots, w_{L-1}) = w_1\theta^{(1)} + \ldots + w_{L-1}\theta^{(L-1)} + \Big(1 - \sum_{j=1}^{L-1} w_j\Big)\theta^{(L)},$$

where $(w_1, \ldots, w_{L-1}) = \arg\min_{(x_1, \ldots, x_{L-1}) \in X} SE\{\hat{\theta}_{WE}(x_1, \ldots, x_{L-1})\}$, and $X = \{(x_1, \ldots, x_{L-1}) : x_1, \ldots, x_{L-1} \in S,\ \sum_{l=1}^{L-1} x_l \le 1\}$. Here $S$ is a pre-specified subset of $[0, 1]$ including candidate values for the weights, and $SE(\hat{\theta})$ denotes the bootstrap-based standard error estimate for $\hat{\theta}$. This optimally weighted averaging procedure, by its definition, is expected to produce parameter estimators that are more efficient than each individual estimator being combined.

As suggested by our simulation studies, $\theta^{(l)}$ corresponding to a small constant $a_l \in (0, 1)$ may have considerably reduced estimation efficiency and stability. Therefore, when applying the optimally weighted averaging procedure in practice, one may need to set all $a_l$’s large enough so that a reasonable amount of recurrent event information is captured up to time $a_l C$. As an empirical rule from our numerical experience, we recommend choosing $a_l$ such that the standard errors of $\theta^{(l)}$, on average across different parameters, do not exceed three times those of the proposed estimators corresponding to the constant 1.
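For intuition, the weight search can be sketched for the simplest case of two candidate estimators ($L = 2$), where a single weight $w$ is chosen from a grid to minimize the bootstrap standard error of $w\theta^{(1)} + (1 - w)\theta^{(2)}$. The sketch assumes that paired bootstrap replicates of the two estimators, computed on the same resamples, are already available; the function names and numbers are hypothetical.

```python
import statistics


def best_weight(boot_first, boot_second, grid):
    """Grid value w minimising the bootstrap SD of w*theta1 + (1 - w)*theta2.

    boot_first, boot_second : paired bootstrap replicates of the two estimators
    grid                    : candidate weights, a subset of [0, 1]
    """
    def bootstrap_se(w):
        combined = [w * a + (1.0 - w) * b for a, b in zip(boot_first, boot_second)]
        return statistics.stdev(combined)

    return min(grid, key=bootstrap_se)


def weighted_estimate(theta_first, theta_second, w):
    """Weighted average of the two point estimates."""
    return w * theta_first + (1.0 - w) * theta_second


# Hypothetical replicates in which the two estimators err in opposite directions.
w_opt = best_weight([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [0.0, 0.25, 0.5, 0.75, 1.0])
print(w_opt, weighted_estimate(2.4, 2.6, w_opt))
```

For more than two estimators the same idea applies with a search over weight vectors whose entries lie in the grid and sum to at most 1.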

4. Asymptotic properties and estimation of asymptotic variance

4.1. Asymptotic properties

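We study the asymptotic properties of the proposed estimators. We first introduce some necessary notation and regularity conditions. Let $\theta = (\alpha^T, \beta^T)^T$ and $\theta_0 = (\alpha_0^T, \beta_0^T)^T$. Let L denote the covariate space containing all possible values of $\tilde{Z}$, and $\Theta$ denote the parameter space for $\theta$. Write $S_n(\theta, \mu) = (S_{1,n}^T(\alpha, \beta, \mu), S_{2,n}^T(\alpha, \beta, \mu))^T$, and let $s(\theta, \mu) = E\{S_n(\theta, \mu)\}$. Note that $\tau_{ik}(\theta, \mu)$ depends on $\mu(\cdot)$ only via $\mu(C_i)$. Therefore, we can define a function $\tilde{\tau}_{ik}(\theta, y): \Theta \times \mathbb{R} \to [0, 1]$ such that $\tau_{ik}(\theta, \mu) = \tilde{\tau}_{ik}(\theta, \mu(C_i))$ for all $\theta \in \Theta$, $i = 1, \ldots, n$, and $k = 1, \ldots, K$. Let

$$\zeta_{1,k,i}(\theta, y) = \tilde{\tau}_{ik}(\alpha, \beta, y)\,Z_i\left\{\frac{N_i(C_i)}{y} - \exp(Z_i^T \beta_k)\right\}, \quad k = 1, \ldots, K$$

and

$$\zeta_{2,k,i}(\theta, y) = \tilde{\tau}_{ik}(\alpha, \beta, y)\,\tilde{Z}_i - \frac{\exp(\tilde{Z}_i^T \alpha_k)\,\tilde{Z}_i}{\sum_{j=1}^{K} \exp(\tilde{Z}_i^T \alpha_j)}, \quad k = 1, \ldots, K.$$

It is easy to see that $S_n(\theta, \mu) = n^{-1}\sum_{i=1}^{n} \zeta_i(\theta, \mu(C_i))$, where $\zeta_i(\theta, y) = (\zeta_{1,1,i}(\theta, y)^T, \ldots, \zeta_{1,K,i}(\theta, y)^T, \zeta_{2,1,i}(\theta, y)^T, \ldots, \zeta_{2,K,i}(\theta, y)^T)^T$. Define $\dot{\zeta}_{i,\theta}(\theta, y) = \partial \zeta_i(\theta, y)/\partial \theta$ and $\dot{\zeta}_{i,\mu}(\theta, y) = \partial \zeta_i(\theta, y)/\partial y$. Let $\dot{H}_0(t) = dH_0(t)/dt$. For a vector $u$, let $\|u\|$ denote its Euclidean norm and $u^{(j)}$ denote its $j$-th component.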

We assume the following regularity conditions:

C1 The parameter space Θ and the covariate space L are compact.

C2 (a) N(ν) is bounded, a.s.; (b) P(M = m|ξ = k,Z,C) and P(ξ = k|Z) are bounded away from zero for all m ≥ 0, k = 1,...,K and Z.

C3 For some $\nu^* \in (0, \nu)$, (a) P(C < ν|Z) = P(C > ν|Z) = 0, and P(C = ν|Z) > 0; (b) $\inf_{t \in [\nu^*, \nu]} E\{S_C(t \mid Z_i)\}\,\mu_0(t) > 0$.

C4 $\{\partial s(\theta_0)/\partial \theta^T\}^{-1}$ exists and is uniformly bounded in $\theta \in \Theta$.

C5 (a) $\dot{\zeta}_{i,\theta}(\theta, y)$ and $\dot{\zeta}_{i,\mu}(\theta, y)$ are uniformly bounded for all $i$, $\theta \in \Theta$ and $y \in [0, 1]$; (b) each component of $\dot{\zeta}_{i,\theta}(\theta, y)$ or $\dot{\zeta}_{i,\mu}(\theta, y)$ has a bounded partial derivative with respect to $\theta$.

Condition C1 assumes a bounded parameter space and bounded covariates. By condition C2(a), the number of the observed recurrent events is bounded. This is realistic for studies with a finite duration of follow-up. Condition C2(b) is assumed to guarantee that the posterior probability $\tau_{ik}$ is always meaningful. Condition C3 ensures that $\hat{\mu}$ exists and is well defined, and that $\hat{\mu}(C_i)$ is bounded between 0 and 1. Conditions C4 and C5 are rather standard assumptions for estimating equations and play important roles in establishing the consistency and asymptotic normality of the proposed estimators of $\alpha_0$ and $\beta_0$.

We summarize the asymptotic properties of the proposed estimators in the following three Theorems. The detailed proofs are provided in Section 1.2 of the Supplementary Materials.

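Theorem 1. Under the regularity conditions (C1)–(C3),

$$\sup_{t \in [\nu^*, \nu]} |\hat{\mu}(t) - \mu_0(t)| \to_P 0. \tag{16}$$

Furthermore, $\sqrt{n}\{\hat{\mu}(t) - \mu_0(t)\}$ converges weakly to a zero-mean Gaussian process with covariance function $E\{\mu_0(s)\,\phi_i(s)\,\phi_i(t)\,\mu_0(t)\}$ at $(s, t)$, where $\phi_i(t) = -\int_t^{\nu} \frac{dM_i(s)}{E\{S_C(s \mid Z_i)\}\,\mu_0(s)}$ and $\nu^* \le s < t \le \nu$.

Corollary 1. Under the regularity conditions (C1)–(C3),

$$n^{1/2}\{\hat{\mu}(C) - \mu_0(C)\} - n^{-1/2}\sum_{i=1}^{n} \mu_0(C)\,\phi_i(C) = o_P(1).$$

Theorem 2. Under the regularity conditions (C1)–(C5), we have $\hat{\theta} \to_P \theta_0$.

Theorem 3. Under the regularity conditions (C1)–(C5), $\sqrt{n}(\hat{\theta} - \theta_0) \to_d N(0, V)$, where $N(0, V)$ denotes a multivariate normal distribution with mean zero and covariance matrix $V$, and the definition of $V$ is provided in Section 1.2 of the Supplementary Materials.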

As shown in the proof of Theorem 3, the asymptotic covariance matrix $V$ takes a complex form. Therefore, we recommend using bootstrapping to conduct variance estimation and other inferences. Specifically, we can resample the observed data with replacement and obtain an estimate for $\theta_0$ based on the resampled data, denoted by $\theta^* \equiv (\alpha^{*T}, \beta^{*T})^T$. Repeating the resampling procedure $B$ times, where $B$ is a large predetermined number, we can obtain many realizations of $\theta^*$. Then the asymptotic covariance of $\hat{\theta}$ can be estimated by the empirical covariance of $\theta^*$. The confidence intervals for each component of $\theta_0$ can be constructed by using normal approximations or by referring to the empirical distribution of the corresponding component of $\theta^*$.

Formulating the proposed estimators, μ^t and θ^, as Hadamard-differentiable functionals of Donsker empirical processes, we can formally justify the presented nonparametric bootstrapping inference procedure by applying the theory for bootstrapped empirical processes in combination with the delta-method (Van Der Vaart et al., 1996; Kosorok, 2008). The simulation results reported in Section 5 further provide strong empirical evidence to support the validity of nonparametric bootstrapping inferences in this work.
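The nonparametric bootstrap described above can be sketched generically: resample subjects with replacement, recompute the estimator on each resample, and use the spread of the replicates for a standard error and a normal-approximation interval. In the sketch below the estimator is passed in as a function and a simple sample mean stands in for the full fitting procedure; the function name and data are hypothetical.

```python
import random
import statistics


def bootstrap_se_ci(data, estimator, n_boot, rng, z_crit=1.96):
    """Bootstrap standard error and normal-approximation 95% interval.

    data      : list of per-subject records (resampled as whole units)
    estimator : function mapping a list of records to a scalar estimate
    n_boot    : number of bootstrap resamples B (at least 2)
    rng       : random.Random instance, so that results are reproducible
    """
    estimate = estimator(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(len(data))] for _ in data]
        replicates.append(estimator(resample))
    se = statistics.stdev(replicates)
    return estimate, se, (estimate - z_crit * se, estimate + z_crit * se)


# Stand-in example: the "estimator" is the mean event count per subject.
counts = [1.0, 2.0, 3.0, 4.0, 5.0]
print(bootstrap_se_ci(counts, statistics.mean, n_boot=200, rng=random.Random(1)))
```

Resampling whole subjects keeps each subject's event history, censoring time and covariates together, which is what the procedure in the text requires.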

5. Simulation

We conduct extensive simulation studies to assess the finite-sample performance of the proposed method and demonstrate its advantages.

Considering three latent classes (i.e. $K = 3$), we generate the latent class membership, $\xi$, based on model (3) with two covariates $\tilde{Z} = (Z_1, Z_2)^T$. Given $\xi$, we generate $T^{(j)}$ according to model (1). The true parameters, $\{\alpha_{0,k}, \tilde{\beta}_{0,k}, \eta_{0,k} : k = 1, 2, 3\}$, are listed in Table S.1 in the Supplementary Materials. We let the censoring time $C$ follow the Unif(2/3,1) distribution, independent of $T^{(j)}$, $Z$ and $\xi$. We set $\nu = 0.98$ because the upper bound of $C$’s support is 1.

We investigate five data scenarios, denoted by S0–S4, with different specifications of λ0(t) and the distributions of Z1, Z2, and W, which are shown in Table S.1 in the Supplementary Materials. In scenario S0, we let W = 1 to represent cases where model (1) holds without the subject-specific frailty. In scenarios S1–S4, we incorporate the subject-specific frailty W that follows the distribution, Gamma(5,5), Exponential(1), Weibull(2,1/Γ(1.5)), or truncated Norm(1,0.1), respectively, where truncated Norm(1,0.1) denotes the normal distribution Norm(1,0.1) truncated by the interval (0,2). In Table 1, we provide average summary statistics (including mean, standard deviation (SD), median, interquartile range, and range) for D by the latent class, which reflect how the number of the observed recurrent events varies across latent classes, P(ξ = k)’s (k = 1,2,3), which depict the proportions of subjects belonging to different latent classes, and relative entropy values, which capture the levels of separation among the three latent classes. We note that the distributions of D are rather different across the three latent classes in scenarios S0 and S1, but are very similar in scenarios S3 and S4. In scenario S2, in terms of D’s distribution, class 1 resembles class 2, but differs dramatically from class 3. These observations are consistent with the values of relative entropy, which show a decreasing trend as the scenario is changed from S0 to S4, suggesting well separated latent classes in scenarios S0 and S1 and heavily overlapped latent classes in scenarios S3 and S4. In scenario S2, roughly equal proportions of subjects belong to the three latent classes, while in the other scenarios, these proportions vary considerably between at least two latent classes. Thus, scenarios S0-S4 represent various data situations pertaining to different combinations of balanced versus unbalanced latent class distributions and separated versus overlapped outcomes across latent classes.

Table 1.

Summary statistics of the number of the observed recurrent events.

| Summary statistics for $D_i$ | Class | S0 | S1 | S2 | S3 | S4 |
|---|---|---|---|---|---|---|
| mean ± SD | Class 1 | 7.0 ± 3.7 | 7.0 ± 4.9 | 7.9 ± 11.2 | 6.0 ± 8.2 | 4.9 ± 5.9 |
| | Class 2 | 2.1 ± 2.6 | 2.0 ± 2.9 | 6.4 ± 8.3 | 5.1 ± 5.6 | 3.3 ± 3.2 |
| | Class 3 | 13.9 ± 12.1 | 13.9 ± 14.4 | 10.4 ± 15.9 | 7.7 ± 8.1 | 3.3 ± 3.0 |
| median | Class 1 | 6.5 | 6.0 | 5.3 | 4.1 | 4.2 |
| | Class 2 | 1.0 | 1.0 | 3.0 | 3.5 | 3.1 |
| | Class 3 | 10.1 | 9.1 | 6.1 | 5.8 | 3.3 |
| interquartile range | Class 1 | [4.4,9.1] | [3.5,9.3] | [2.3,9.8] | [1.3,7.7] | [1.5,11.3] |
| | Class 2 | [0.0,3.0] | [0.0,2.8] | [1.1,9.7] | [1.2,6.3] | [1.2,5.4] |
| | Class 3 | [5.6,20.1] | [3.9,19.2] | [2.2,11.3] | [2.0,11.4] | [1.0,5.8] |
| range | Class 1 | [0.5,19.5] | [0.1,27.0] | [0.0,87.7] | [0.1,43.3] | [0.0,13.1] |
| | Class 2 | [0.0,12.7] | [0.0,16.4] | [0.0,50.5] | [0.0,28.2] | [0.0,13.2] |
| | Class 3 | [0.0,59.6] | [0.0,84.9] | [0.0,100.0] | [0.0,50.5] | [0.0,16.5] |
| P(ξ = k) | Class 1 | 25.9% | 25.9% | 33.0% | 25.4% | 26.8% |
| | Class 2 | 26.0% | 26.2% | 33.3% | 24.0% | 29.0% |
| | Class 3 | 48.1% | 47.9% | 33.7% | 50.6% | 44.2% |
| Relative entropy | | 0.661 | 0.656 | 0.462 | 0.397 | 0.356 |

For each scenario, we generate 1000 simulated datasets with sample size n = 200 and n = 500. For each simulated dataset, 200 bootstrapping samples are drawn to calculate standard error estimates and confidence intervals based on normal approximation. For the proposed iterative algorithm, the maximum iteration number is set as 200, and the convergence criterion is $\max\{\|\hat{\beta}^{[r]} - \hat{\beta}^{[r-1]}\|, \|\hat{\alpha}^{[r]} - \hat{\alpha}^{[r-1]}\|\} < 10^{-2}$. Algorithm convergence is achieved for each simulated dataset within 200 iterations. In practice, specification of the maximum iteration number may need to be adjusted according to specific data scenarios.

We first assume fW (·) is correctly pre-specified. In Figure S.1 in the Supplementary Materials and Figure 1, we present simulation results on the estimation and inference of µ0(t), including the empirical biases, average estimated standard errors, average empirical standard deviations, and average empirical coverages of 95% confidence intervals (CP (95%)), for the five data scenarios with n = 200 and n = 500 respectively. It is seen that the proposed estimator of µ0(t) produces reasonably small biases, and the empirical and estimated standard deviations are fairly close except for large t’s. In all cases, the average empirical coverage probabilities are close to the nominal level 95%.

Fig. 1.

Simulation results for estimated $\hat{\mu}(t)$ under five scenarios.

Table 2 reports the simulation results for estimating α0 and β0 in scenario S1. Results for scenarios S0 and S2–S4 are similar, and thus are relegated to Tables S.2–S.5 in the Supplementary Materials. The reported results include the average empirical biases (BIAS), average estimated standard errors based on bootstrapping (SE), average empirical standard deviations (SD), and average empirical coverage probabilities of 95% confidence intervals. We observe that, in all cases, the empirical biases are small, between 0.1% and 7.6% of the true values. The SDs and SEs are close to each other, indicating that the bootstrap-based inference works well. The empirical coverage probabilities are close to the nominal level, 0.95. In addition, we note that the agreement between SDs and SEs improves as the sample size increases. The results in Figure 1, Figure S.1, Table 2, and Tables S.2–S.5 demonstrate satisfactory finite-sample performance of the proposed estimators, regardless of the degree of separation or proportion balance among latent classes, and of the type of frailty distribution.
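For readers who wish to reproduce summaries of this kind, the sketch below shows one way to compute BIAS, SE, SD, and CP for a single scalar parameter from per-replicate point estimates and bootstrap standard errors. The function name `summarize` and the toy inputs are hypothetical; the confidence intervals follow the normal approximation mentioned earlier.

```python
import statistics

def summarize(estimates, boot_ses, truth, z=1.959964):
    """Summarize simulation replicates for one scalar parameter.

    estimates[m] : point estimate from simulated dataset m
    boot_ses[m]  : bootstrap standard error from simulated dataset m
    truth        : true parameter value used to generate the data
    """
    bias = statistics.fmean(estimates) - truth   # empirical bias
    se = statistics.fmean(boot_ses)              # average estimated standard error
    sd = statistics.stdev(estimates)             # empirical standard deviation
    # Normal-approximation 95% interval: estimate +/- z * SE
    covered = [abs(est - truth) <= z * s for est, s in zip(estimates, boot_ses)]
    cp = sum(covered) / len(covered)             # empirical coverage probability
    return {"BIAS": bias, "SE": se, "SD": sd, "CP": cp}

# Toy replicates (illustrative numbers only, not from the paper).
summary = summarize([0.9, 1.1, 1.0, 1.3], [0.1, 0.1, 0.1, 0.1], truth=1.0)
```

Close agreement between SE and SD in such a summary is what supports the claim that the bootstrap standard errors are reliable.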

Table 2.

Simulation results for estimating α0 and β0 in scenario S1.

n Parameter True value BIAS SE SD CP (95%)
n = 200 α α21 = −1 −0.019 0.712 0.741 0.949
α22 = 1 0.026 0.425 0.426 0.949
α31 = 0.5 0.010 0.646 0.637 0.947
α32 = 0.8 0.023 0.383 0.372 0.939
β log(η1) = log(4.5) 0.043 0.154 0.150 0.964
log(η2) = log(4) 0.042 0.198 0.170 0.958
log(η3) = log(3) 0.055 0.164 0.164 0.955
β˜11=0.8 0.011 0.203 0.182 0.954
β˜12=0.5 −0.043 0.109 0.107 0.948
β˜21=4 0.045 0.521 0.453 0.946
β˜22=1 −0.010 0.184 0.178 0.951
β˜31=3 −0.019 0.205 0.197 0.955
β˜32=0.5 −0.007 0.115 0.109 0.951

n = 500 α α21 = −1 −0.009 0.450 0.452 0.954
α22 = 1 0.009 0.263 0.262 0.949
α31 = 0.5 −0.004 0.392 0.392 0.951
α32 = 0.8 0.001 0.229 0.229 0.948
β log(η1) = log(4.5) 0.022 0.101 0.090 0.953
log(η2) = log(4) 0.029 0.124 0.116 0.952
log(η3) = log(3) 0.036 0.103 0.086 0.950
β˜11=0.8 0.061 0.125 0.116 0.957
β˜12=0.5 −0.038 0.080 0.064 0.949
β˜21=4 0.037 0.311 0.293 0.951
β˜22=1 −0.009 0.116 0.111 0.948
β˜31=3 −0.017 0.133 0.126 0.953
β˜32=0.5 −0.008 0.072 0.069 0.950

We compare the proposed method with Wang et al. (2001)’s multiplicative intensity model in terms of their performance in predicting the number of observed recurrent events, Di. To this end, we employ 5-fold cross validation. Using the proposed method, we obtain $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\mu}$ based on the training dataset and predict Di in the test dataset by $\hat{D}_{i,K} = \sum_{k=1}^{K} I(\hat{\xi}_i = k)\,\hat{\mu}(C_i)\exp(Z_i^{T}\hat{\beta}_k)$, where $\hat{\xi}_i$ follows the distribution $\mathrm{Multinomial}\{1; p_1(\hat{\alpha}, \tilde{Z}_i), \ldots, p_K(\hat{\alpha}, \tilde{Z}_i)\}$. To evaluate Wang et al. (2001)’s method, we compute $\hat{\beta}_A$ and $\hat{\mu}_A$ based on the training dataset using the R functions reReg() and plotRate() in the R package reReg, and predict Di in the test dataset by $\hat{D}_{i,A} = \hat{\mu}_A(C_i)\exp(Z_i^{T}\hat{\beta}_A)$. In each fold, we compute $(n/5)^{-1}\sum_{i=1}^{n/5}|x - D_i|$, $\mathrm{median}\{|x - D_i|,\ i = 1, \ldots, n/5\}$, and $\{(n/5)^{-1}\sum_{i=1}^{n/5}(x - D_i)^2\}^{1/2}$, with x standing for $\hat{D}_{i,K}$ or $\hat{D}_{i,A}$, and calculate their averages over the 5 folds and 1000 datasets. The corresponding results are referred to as the average prediction errors (APE), median prediction errors (MPE), and average square root of mean squared prediction errors (SMSPE), respectively. Results presented in Table 3 show that fitting a single multiplicative intensity model without accounting for the existence of latent classes produces less accurate prediction of Di than the proposed analysis in nearly all settings; the increase in SMSPE can be as large as 65% (13.659 versus 8.264; see S0 with n = 500). The only exception is S4 with n = 200, where Wang et al. (2001)’s method attains slightly smaller prediction errors. We also note that the two methods have more similar predictive performance in scenarios S3 and S4 than in scenarios S0–S2. This observation is reasonable and can be explained by the less distinct latent classes in scenarios S3 and S4.
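The three prediction error summaries can be computed for one test fold as in the following sketch. It assumes that APE and MPE are based on absolute differences between predicted and observed counts and that SMSPE is the square root of the mean squared difference, which is how the names read; the function name and toy numbers are hypothetical.

```python
import math
import statistics

def prediction_errors(predicted, observed):
    """Return (APE, MPE, SMSPE) for one test fold.

    predicted : predicted numbers of events for the test subjects
    observed  : observed numbers of events D_i for the same subjects
    """
    abs_err = [abs(p - d) for p, d in zip(predicted, observed)]
    ape = statistics.fmean(abs_err)                              # average absolute error
    mpe = statistics.median(abs_err)                             # median absolute error
    smspe = math.sqrt(statistics.fmean([e * e for e in abs_err]))  # root mean squared error
    return ape, mpe, smspe

# Toy fold (illustrative numbers only).
ape, mpe, smspe = prediction_errors([2.0, 5.0, 9.0], [1.0, 5.0, 5.0])

# Relative increase in SMSPE for S0 with n = 500, using the Table 3 entries.
smspe_increase = 13.659 / 8.264 - 1.0
```

In the simulation study these per-fold values would then be averaged over the 5 folds and the 1000 simulated datasets.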

Table 3.

Comparisons between the proposed method and Wang et al. (2001)’s method in predicting the number of the observed recurrent events.

Sample size Scenario Proposed method (APE MPE SMSPE) Wang et al. (2001)’s method (APE MPE SMSPE)
n = 200 S0 5.512 2.579 8.271 8.632 4.591 13.549
S1 7.199 3.019 11.200 8.691 4.799 13.590
S2 6.224 2.763 9.561 8.562 4.429 13.492
S3 3.493 2.407 5.627 3.655 2.649 5.694
S4 2.060 1.367 2.964 2.014 1.342 2.589

n = 500 S0 5.503 2.552 8.264 8.734 4.829 13.659
S1 7.148 2.315 10.483 8.321 4.782 13.289
S2 7.060 3.391 10.387 8.282 4.209 13.267
S3 3.741 2.304 5.632 3.752 2.527 5.699
S4 2.022 1.365 2.967 2.220 1.373 2.998

We evaluate the selection of K, the number of latent classes, based on the relative entropy measure EK discussed in Section 3.3. The left section of Table 4 reports the average relative entropy measure EK given K = 2, 3, 4, or 5 with n = 200 and n = 500. It is shown that the highest relative entropy is attained at the true K = 3 in all data scenarios. The right section of Table 4 presents the percentages of selecting K as 2, 3, 4, or 5 based on 1000 simulated datasets. The percentages of selecting the true K = 3 are always the highest among K = 2,3,4, and 5, and can be as large as 93.3% (see n = 500 in scenario S1). These results suggest good empirical performance of the proposed approach to determining the number of latent classes.
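Section 3.3 is not reproduced in this part of the paper, so the sketch below assumes the commonly used form of the relative entropy (Ramaswamy et al., 1993), $E_K = 1 - \sum_{i}\sum_{k}(-\hat{\tau}_{ik}\log\hat{\tau}_{ik})/(n\log K)$, where $\hat{\tau}_{ik}$ is the estimated posterior probability that subject i belongs to class k. The function names are hypothetical, and the authors' exact definition may differ.

```python
import math

def relative_entropy(tau):
    """Relative entropy of an n-by-K matrix of posterior membership probabilities.

    Values near 1 indicate well separated classes; values near 0 indicate
    heavy overlap. Terms with probability 0 contribute 0 by convention.
    """
    n, K = len(tau), len(tau[0])
    entropy = -sum(p * math.log(p) for row in tau for p in row if p > 0.0)
    return 1.0 - entropy / (n * math.log(K))

def select_K(tau_by_K):
    """Pick the K whose fitted posterior probabilities give the largest E_K."""
    return max(tau_by_K, key=lambda K: relative_entropy(tau_by_K[K]))

# Toy posterior matrices (illustrative only).
separated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
overlapped = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
chosen_K = select_K({2: [[0.6, 0.4], [0.5, 0.5]], 3: separated})
```

Under this definition, perfectly separated classes give a value of 1 and completely uninformative posterior probabilities give a value of 0.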

Table 4.

Simulation results on selecting K based on relative entropy EK

Scenario n Average relative entropy EK (K = 2, K = 3, K = 4, K = 5) Proportion of selecting K (K = 2, K = 3, K = 4, K = 5)
S0 200 0.518 0.616 0.598 0.217 1.8% 90.3% 7.9% 0.0%
500 0.504 0.661 0.607 0.385 0.0% 92.7% 7.3% 0.0%
S1 200 0.510 0.614 0.565 0.220 4.2% 87.3% 8.5% 0.0%
500 0.545 0.656 0.604 0.379 1.9% 93.3% 4.8% 0.0%
S2 200 0.401 0.457 0.318 0.217 17.9% 82.1% 0.0% 0.0%
500 0.405 0.462 0.373 0.208 18.2% 81.8% 0.0% 0.0%
S3 200 0.345 0.374 0.275 0.203 38.3% 61.7% 0.0% 0.0%
500 0.345 0.397 0.269 0.208 27.8% 72.2% 0.0% 0.0%
S4 200 0.311 0.314 0.217 0.201 44.7% 55.3% 0.0% 0.0%
500 0.304 0.356 0.206 0.176 29.8% 70.2% 0.0% 0.0%

In Section 2.2 of the Supplementary Materials, we report simulation studies that assess the impact of mis-specifying K. Since the coefficient estimates with different selections of K involve different numbers of coefficients, we compare the accuracy of predicting Di under correct and incorrect specifications of K. The observations from Table S.6 in the Supplementary Materials, combined with the findings from Table 4, suggest that the proposed method can yield reliable predictions despite the possibility of mis-specifying K in data analysis.

We also examine the empirical performance of our proposal in Section 3.5 for selecting fW (·). Specifically, we generate data according to scenario S1, where W follows the Gamma(5,5) distribution. For each simulated dataset, we adopt APEM, MPEM, or SMSPEM to decide fW (·) among five candidate distributions for W, which are W = 1 and Gamma(r,r) with r = 1,3,5,7. Note that assuming W = 1 means completely ignoring the subject-specific frailty and Gamma(1,1) is the standard Exponential distribution. Thus, the five candidate frailty distributions represent various choices of fW (·) that are similar to or very different from the true fW (·). Table 5 presents the empirical biases (BIAS), empirical standard deviations (SD), average estimated standard errors (SE), and empirical coverage probabilities of 95% confidence intervals (CP) of the proposed parameter estimates with the true fW (·) (corresponding to r = 5) and the selected fW (·). We observe that the proposed estimation with the selected fW (·), compared to that based on the true fW (·), shows comparable or only slightly elevated empirical biases and standard deviations, and the corresponding CPs are reasonably close to 95%. This suggests a promising utility of our proposal for deciding the frailty distribution in real data analyses. In Section 2.3 of the Supplementary Materials, we also report simulation results on the proposed estimates with fW (·) fixed to each candidate density. The results in Table S.7 suggest that mis-specifying fW (·) has a rather minor influence on empirical biases in all cases. This indicates that the proposed estimation is very robust to either minor or major misspecification of fW (·).

In addition, we conduct simulation studies to assess the robustness of the proposed method to the misspecification of model (3) for the latent class membership probability. The details are presented in Section 2.4 of the Supplementary Materials. We find that the proposed estimation can be biased when model (3) is severely mis-specified but is quite robust when model (3) represents only minor-to-moderate departure from the underlying true model.

In Section 2.5 of the Supplementary Materials, we present simulation studies that evaluate the variant of the proposed estimator and the efficiency augmentation approach discussed in Section 3.6. The results in Table S.10 and Table S.12 suggest that the presented variant of the proposed estimator generally works well but can be unstable when $a_l$ is small. This is expected because, with a small $a_l$, $N_i(a_l C_i)$ may not carry enough information to stably estimate the unknown parameters. Table S.11 and Table S.13 present comparisons between the proposed estimator and an augmented estimator. We observe that augmenting the proposed estimation with optimally weighted averaging results in similar empirical bias. The augmented estimator is generally more efficient than the proposed estimator. The proposed estimator has only slightly reduced efficiency compared to that of the augmented estimator with respect to β0 (the parameter of main interest). These observations confirm the efficiency benefit of the proposed augmentation procedure and also suggest that the proposed estimator has reasonably good efficiency.

6. A real application

We apply the proposed method to a dataset from research participants at Goizueta Alzheimer’s Disease Research Center who were diagnosed with a cognitive disorder during the period 1997–2019. The main interest of our analysis is to explore the heterogeneity in the patterns of clinical phone calls made for these patients for purposes such as general inquiries and reporting clinical concerns or problems. The inter-individual variability in the phone call pattern reflects underlying disease severity and co-morbidities. It can also provide useful insight into the level of disease education and understanding by care partners and the availability of caregiver support and resources, which constitute a critical part of Alzheimer’s disease care.

To address this interest, we use a dataset extracted from the Emory Healthcare Clinical Data Warehouse (CDW) by selecting document types coded as “Phone Message” linked to each individual patient’s medical records. All such documents are time stamped. The dataset used in our analysis is confined to 398 patients who had clinical phone call records between September 1, 2016 and October 23, 2019. The recurrent event of interest is the occurrence of a clinical phone call, with T(j) representing the time (in years) from September 1, 2016 to the jth phone call. The censoring time C is the time to October 23, 2019 or death, whichever occurs first. Since the rate of death during the study period is low, around 3%, we expect the potential violation of the independent censoring assumption due to death to be very minor and to have minimal impact on the application of the proposed method to this dataset. We consider four potential contributors to this recurrent event: gender, defined as Z1 = 1 if female and 0 if male; age in years, denoted by Z2; number of years of education, denoted by Z3; and baseline Montreal Cognitive Assessment (MOCA) total score, denoted by Z4. The continuous covariates, Z2, Z3, and Z4, are scaled to [0,1]. Excluding patients with missing data on these covariates, we have 246 patients included in the final analysis dataset. In addition, we exclude 61 phone calls, which were made for medical refills or appointment scheduling, from 37 patients’ records. This is due to the concern that these regular-care related phone calls, even after conditioning on the individual frailty and latent class membership, may not be “memoryless”, a property implied by the non-stationary Poisson process assumption adopted by the proposed model. The final dataset can be made available upon request.
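The data preparation steps described above (event times in years from the study start, censoring at the study end or death, and rescaling of continuous covariates) might be coded along the following lines. This is a sketch under assumptions: years are taken as 365.25 days, the scaling to [0,1] is assumed to be min-max scaling, and all function names are hypothetical.

```python
from datetime import date

STUDY_START = date(2016, 9, 1)    # time origin stated in the text
STUDY_END = date(2019, 10, 23)    # administrative end of follow-up

def years_since_start(d, start=STUDY_START):
    """Time in years from the study start (assumption: 365.25 days per year)."""
    return (d - start).days / 365.25

def censoring_time(death_date=None):
    """Censoring time C: time to study end or death, whichever occurs first."""
    end = STUDY_END if death_date is None else min(death_date, STUDY_END)
    return years_since_start(end)

def min_max_scale(values):
    """Rescale a continuous covariate to [0, 1] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

max_follow_up = censoring_time()                 # subject alive at study end
scaled_age = min_max_scale([60.0, 75.0, 90.0])   # toy ages, illustrative only
```

With these conventions the longest possible follow-up is about 3.14 years.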

We fit models (1)–(3) to this dataset with ν = 3, which is chosen because the longest follow-up time in this dataset is 3.14 years. We consider five candidate distributions for W, i.e. W = 1 and Gamma(r,r), r = 1,3,5,7. Given each candidate fW (w), we calculate the relative entropy measure EK presented in Section 3.3 with the number of latent classes K equal to 2, 3, 4, or 5. The results are presented in Table S.14 in the Supplementary Materials. It is shown that the maximum relative entropy is always achieved at K = 3 with the different choices of fW (w), suggesting that three latent classes may provide the best fit of the data. Next, pre-specifying K = 3, we select the frailty density fW (·) among the five candidate distributions, following the procedure in Section 3.5. Based on the results in Table S.15 in the Supplementary Materials, all the three model-checking measures attain the smallest value when W follows the distribution Gamma(7,7). Therefore, we set K = 3 and select fW (·) as the density of Gamma(7,7) for the rest of the analyses.

We first examine the characteristics associated with the three latent classes. We apply the modal class assignment rule to categorize patients into three subgroups based on $\hat{\tau}_{ik}$ obtained from our estimation procedure. Table S.16 in the Supplementary Materials summarizes the characteristics of the three classes. It is observed that Class 1 is the subgroup which tends to have higher education and higher MOCA scores and consists of more females, as compared to the two other classes. Class 3 is characterized by the fewest years of education, the youngest age, and the highest proportion of males. Class 2, while standing in the middle in terms of gender distribution, distinguishes itself from Class 1 and Class 3 by the lowest MOCA scores, which indicates severe cognitive impairment before the start of clinical phone call tracking. The age distribution is comparable between Class 1 and Class 2. These observations are rather consistent with the estimation results for α0 provided in Table S.17 in the Supplementary Materials. For example, the results in Table S.17 suggest that younger patients are more likely to belong to Class 3, and Class 3 is associated with fewer years of education.

Table 6 presents the estimation results for β0 and η0, including the parameter estimates (Est), the estimated standard errors (SE), and the associated p-values. It is shown that female patients, compared to male patients, may be associated with higher frequency of clinical phone calls in Class 1 but less frequent clinical phone calls in Class 2. The different directions of the gender effect may be explained by the different cognitive status between Class 1 and Class 2. That is, patients in Class 1 tend to have good cognitive functions and thus are more likely to be capable of self-care, while spouse-care may be more common in Class 2 due to the low cognitive function of patients. Thus the gender effects identified for Class 1 and Class 2 consistently suggest female care-takers (self or spouse) tend to make more frequent clinical phone calls. The results in Table 6, particularly those for Class 1 and Class 3, suggest that younger patients tend to be associated with less frequent clinical phone calls, likely owing to their better underlying health conditions. It is also noted that in Class 1, higher education may contribute to an increase in clinical phone call frequency. This may reflect the higher health consciousness associated with higher education in the old but less cognitively impaired population (e.g. Class 1). We observe that the estimated scale parameter for Class 3 (i.e. $\hat{\eta}_3$) is considerably larger than its counterparts for Class 1 and Class 2. This may imply an overall higher frequency of clinical phone calls in Class 3, likely driven by the lower level of disease education in this subgroup of patients.

Table 6.

Analysis of the clinical phone call data: estimation results for β0.

Variable Latent class 1 Latent class 2 Latent class 3
Gender β^1k Est 0.817 −1.187 −0.672
SE 0.330 0.630 0.476
p-value 0.013 0.060 0.158

Age β^2k Est −1.320 −0.442 −0.717
SE 0.639 0.254 0.364
p-value 0.039 0.082 0.049

Education β^3k Est 1.540 −0.677 −0.465
SE 0.742 0.454 0.203
p-value 0.038 0.136 0.022

MOCA β^4k Est −0.683 0.827 −1.217
SE 0.466 0.432 0.541
p-value 0.143 0.056 0.025

scale parameter η^k Est 1.398 1.657 2.121
SE 0.638 0.873 0.999
p-value 0.028 0.058 0.034

For each class, we calculate the average estimated mean function defined as

$$\frac{\sum_{i=1}^{n} I(\hat{\xi}_i = k)\,\hat{\mu}(t)\exp(Z_i^{T}\hat{\beta}_k)}{\sum_{i=1}^{n} I(\hat{\xi}_i = k)},$$

where $\hat{\xi}_i$ denotes the latent class membership assignment based on the modal rule (i.e. $\hat{\xi}_i = \arg\max_{1 \le k \le K} \hat{\tau}_{ik}$), and k = 1,...,K. In Figure S.2 in the Supplementary Materials, we plot the average estimated mean functions for Classes 1–3. The results confirm the conjectured highest frequency of clinical phone calls in Class 3 based on Table 6. Figure S.2 shows that Class 2 is associated with lower frequency of clinical phone calls than Class 1. This may be explained by the gender distribution difference between these two classes.
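The modal assignment rule and the class-specific average of the estimated mean function can be sketched as follows. The inputs are hypothetical placeholders: `tau` holds estimated posterior membership probabilities, `Z` holds covariate vectors (with a leading 1 assumed for the log scale parameter), `beta` holds class-specific coefficient vectors, and `mu_hat` is any callable returning the estimated baseline mean function at time t.

```python
import math

def modal_class(tau_i):
    """Modal rule: the class (1-based) with the largest posterior probability."""
    return max(range(len(tau_i)), key=lambda k: tau_i[k]) + 1

def class_average_mean(t, k, tau, Z, beta, mu_hat):
    """Average of mu_hat(t) * exp(Z_i' beta_k) over subjects assigned to class k."""
    members = [i for i, tau_i in enumerate(tau) if modal_class(tau_i) == k]
    if not members:
        return float("nan")          # no subject assigned to this class
    total = sum(
        mu_hat(t) * math.exp(sum(z * b for z, b in zip(Z[i], beta[k - 1])))
        for i in members
    )
    return total / len(members)

# Toy inputs (illustrative only).
tau = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
beta = [[math.log(2.0), 0.0], [0.0, 0.0], [0.0, 0.0]]
assigned = [modal_class(row) for row in tau]
avg_class1 = class_average_mean(1.5, 1, tau, Z, beta, mu_hat=lambda t: t)
```

Evaluating `class_average_mean` over a grid of t values for each k would give curves of the kind plotted in Figure S.2.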

We further check the overall fit of the proposed latent class mixture model to the clinical phone call data using the graphic model checking approach discussed in Section 3.4. In Figure 2, we present the scatter plot of $\hat{D}_i$ based on the proposed models versus Di, and the scatter plot of $\hat{D}_{i,A}$ based on Wang et al. (2001)’s method versus Di. It is shown that the pairs $(\hat{D}_i, D_i)$ cluster around the 45 degree line fairly closely, while $\hat{D}_{i,A}$ and Di do not demonstrate an agreeable pattern. Such an observation suggests that the proposed method provides an overall good fit to the data and clearly has an improved utility over the standard analysis.

Fig. 2.

Predicted numbers of phone calls by the proposed model and Wang et al. (2001)’s method versus the observed numbers of phone calls.

7. Concluding remarks

In this work, we propose to tackle the complex heterogeneity structure of recurrent events data through the perspective of latent class analysis. The proposed semiparametric latent class model is inherently more robust than existing parametric models, while permitting easy and stable implementation. Each step of our estimation algorithm only involves either computing a closed-form estimator or obtaining estimates via standard software such as the R functions glm() and multinom(). In contrast, existing latent class methods are commonly plagued by algorithmic complexity and the resulting computational issues (McLachlan and Peel, 2000). The proposed method strikes a good balance between modeling flexibility and implementation reliability.

As pointed out by one referee, a more efficient estimation approach may be developed by using nonparametric maximum likelihood (NPMLE) technique (Zeng et al., 2016, 2017, for example). However, we expect the potential NPMLE approach may involve greater implementation complexity, which needs to be carefully addressed in order to facilitate its real data applications. As another option to improve estimation efficiency, the optimally weighted averaging approach discussed in Section 3.6, while requiring extra computational efforts, is more straightforward and stable to implement. The efficiency benefit of this procedure is confirmed by our simulation studies.

Our numerical studies suggest that using goodness-of-fit measures to guide the selection of the frailty density fW (·) performs very well. Alternatively, we may consider formulating fW (·) in a parametric form, fW (w;ν0), and then estimating the unknown parameters in ν0. In this case, by (7)–(9), we can express τik in terms of α0, β0, ν0, and µ0, and denote it by $\tau_{ik}(\alpha_0, \beta_0, \nu_0, \mu_0)$. To estimate ν0, we may utilize the fact that $E(D_i \mid Z_i, C_i) = \sum_{k=1}^{K} \tau_{ik}(\alpha_0, \beta_0, \nu_0, \mu_0)\,\mu_0(C_i)\exp(Z_i^{T}\beta_{0,k})$, and construct additional estimating equations, given by

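$$S_{3,n}(\alpha, \beta, \nu, \hat{\mu}) \equiv \frac{1}{n}\sum_{i=1}^{n}\left\{\sum_{k=1}^{K}\hat{\mu}(C_i)\exp(Z_i^{T}\beta_k)\,\frac{\partial \tau_{ik}(\alpha, \beta, \nu, \hat{\mu})}{\partial \nu}\right\}\left\{D_i - \sum_{k=1}^{K}\tau_{ik}(\alpha, \beta, \nu, \hat{\mu})\,\hat{\mu}(C_i)\exp(Z_i^{T}\beta_k)\right\} = 0. \qquad (17)$$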

Solving (17) in conjunction with equations (13) and (14) can lead to consistent estimates for α0, β0, and ν0. Note that $S_{3,n}(\alpha, \beta, \nu, \hat{\mu})$ may have a non-monotone irregular surface even after a simplification that fixes some of its components. This can cause undesirable implementation complexities. The strategies that facilitate solving equations (13) and (14), such as the use of multinomial regression and Poisson regression, are no longer applicable to (17). In contrast, addressing model (1) with a reasonably specified distribution of W, as presented in Section 3, enjoys appealing algorithmic simplicity and stability, and thus may be preferable in real data analyses.

Several potential extensions of the proposed models merit further research efforts. One is to allow for more flexible class-specific formulation of the baseline intensity function beyond the simple scale shift captured by the constant η0,k. Another direction is to apply the proposed modeling and estimation strategies to conduct latent class analysis jointly for recurrent events data and longitudinal data. This is part of our ongoing work which will be reported separately.

Supplementary Material

Acknowledgements

The authors greatly appreciate valuable comments from the Editor, the Associate Editor, and Referees. The authors are grateful to Drs. James Lah and Felicia Goldstein, and Ms. Noy Hawkins for their help with the extraction and manipulation of the clinical phone call data and the interpretation of the analysis results. This work was supported by NIH grants R01 AG055634 and R01 HL113548.

Contributor Information

Wei Zhao, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.

Limin Peng, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.

John Hanfelt, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.

References

  1. Altstein LL, Li G and Elashoff RM (2011) A method to estimate treatment efficacy among latent subgroups of a randomized clinical trial. Statist. Med., 30, 709–717.
  2. Andersen PK and Gill RD (1982) Cox’s regression model for counting processes: a large sample study. Ann. Statist., 10, 1100–1120.
  3. Bacci S, Bartolucci F, Bettin G and Pigini C (2019) A latent class growth model for migrants’ remittances: an application to the German socio-economic panel. J. R. Statist. Soc. A, 182, 1607–1632.
  4. Celeux G and Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. Journal of Classification, 13, 195–212.
  5. Cook RJ and Lawless J (2007) The Statistical Analysis of Recurrent Events. Springer Science & Business Media.
  6. Cook RJ and Lawless JF (1997) Marginal analysis of recurrent events and a terminating event. Statist. Med., 16, 911–924.
  7. Egleston BL, Uzzo RG and Wong Y-N (2017) Latent class survival models linked by principal stratification to investigate heterogenous survival subgroups among individuals with early-stage kidney cancer. J. Am. Statist. Ass., 112, 534–546.
  8. Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics, 1041–1046.
  9. Fygenson M and Ritov Y (1994) Monotone estimating equations for censored data. Ann. Statist., 732–746.
  10. Gallop R, Small DS, Lin JY, Elliott MR, Joffe M and Ten Have TR (2009) Mediation analysis with principal stratification. Statist. Med., 28, 1108–1130.
  11. Han J (2009) Initial classification of joint data in EM estimation of latent class joint model. Journal of Multivariate Analysis, 100, 2313–2323.
  12. Han J, Slate EH and Peña EA (2007) Parametric latent class joint model for a longitudinal biomarker and recurrent events. Statist. Med., 26, 5285–5302.
  13. Hilton RP, Zheng Y and Serban N (2018) Modeling heterogeneity in healthcare utilization using massive medical claims data. J. Am. Statist. Ass., 113, 111–121.
  14. Jedidi K, Ramaswamy V and DeSarbo WS (1993) A maximum likelihood method for latent class regression involving a censored dependent variable. Psychometrika, 58, 375–394.
  15. Jo B, Findling RL, Wang C-P, Hastie TJ, Youngstrom EA, Arnold LE, Fristad MA and Horwitz SM (2017) Targeted use of growth mixture modeling: a learning perspective. Statist. Med., 36, 671–686.
  16. Kosorok MR (2008) Introduction to Empirical Processes and Semiparametric Inference. Springer.
  17. Lai D, Xu H, Koller D, Foroud T and Gao S (2016) A multivariate finite mixture latent trajectory model with application to dementia studies. Journal of Applied Statistics, 43, 2503–2523.
  18. Lavancier F and Rochet P (2016) A general procedure to combine estimators. Computational Statistics & Data Analysis, 94, 175–192.
  19. Lim HK, Li WK and Yu PLH (2014) Zero-inflated Poisson regression mixture model. Computational Statistics & Data Analysis, 71, 151–158.
  20. Lin D, Sun W and Ying Z (1999) Nonparametric estimation of the gap time distribution for serial events with censored data. Biometrika, 86, 59–70.
  21. Lin DY, Wei L-J, Yang I and Ying Z (2000) Semiparametric regression for the mean and rate functions of recurrent events. J. R. Statist. Soc. B, 62, 711–730.
  22. Lin H, McCulloch CE and Rosenheck RA (2004) Latent pattern mixture models for informative intermittent missing data in longitudinal studies. Biometrics, 60, 295–305.
  23. Lin H, Turnbull BW, McCulloch CE and Slate EH (2002) Latent class models for joint analysis of longitudinal biomarker and event process data: application to longitudinal prostate-specific antigen readings and prostate cancer. J. Am. Statist. Ass., 97, 53–65.
  24. Luo X, Huang C-Y and Wang L (2013) Quantile regression for recurrent gap time data. Biometrics, 69, 375–385.
  25. Mair P and Hudec M (2009) Multivariate Weibull mixtures with proportional hazard restrictions for dwell-time-based session clustering with incomplete data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 58, 619–639.
  26. McLachlan G and Peel D (2000) Finite Mixture Models. John Wiley & Sons.
  27. Muthén B (2004) Latent variable analysis. The Sage Handbook of Quantitative Methodology for the Social Sciences, 345, 106–109.
  28. Muthén B and Shedden K (1999) Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55, 463–469.
  29. Nagin DS (1999) Analyzing developmental trajectories: a semiparametric, group-based approach. Psychological Methods, 4, 139.
  30. Pepe MS and Cai J (1993) Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. J. Am. Statist. Ass., 88, 811–820.
  31. Prentice RL, Williams BJ and Peterson AV (1981) On the regression analysis of multivariate failure time data. Biometrika, 68, 373–379.
  32. Proust-Lima C, Dartigues J-F and Jacqmin-Gadda H (2016) Joint modeling of repeated multivariate cognitive measures and competing risks of dementia and death: a latent process and latent class approach. Statist. Med., 35, 382–398.
  33. Qu P, Barlogie B and Crowley J (2015) Using a latent class model to refine risk stratification in multiple myeloma. Statist. Med., 34, 2971–2980.
  34. Ramaswamy V, DeSarbo WS, Reibstein DJ and Robinson WT (1993) An empirical pooling approach for estimating marketing mix elasticities with PIMS data. Marketing Science, 12, 103–124.
  35. Reinecke J and Seddig D (2011) Growth mixture models in longitudinal research. AStA Advances in Statistical Analysis, 95, 415–434.
  36. Stefanski LA and Carroll RJ (1987) Conditional scores and optimal scores for generalized linear measurement-error models. Biometrika, 74, 703–716.
  37. van der Vaart AW and Wellner JA (1996) Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Science & Business Media.
  38. Wang M-C, Qin J and Chiang C-T (2001) Analyzing recurrent event data with informative censoring. J. Am. Statist. Ass., 96, 1057–1065.
  39. Wedel M, DeSarbo WS, Bult JR and Ramaswamy V (1993) A latent class Poisson regression model for heterogeneous count data. Journal of Applied Econometrics, 8, 397–411.
  40. Zeng D, Gao F and Lin D (2017) Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika, 104, 505–525.
  41. Zeng D, Mao L and Lin DY (2016) Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika, 103, 253–271.
