Author manuscript; available in PMC: 2025 May 30.
Published in final edited form as: Stat Med. 2024 Apr 10;43(12):2452–2471. doi: 10.1002/sim.10079

A Discrete Approximation Method for Modeling Interval-Censored Multistate Data

Lu You 1, Xiang Liu 1, Jeffrey Krischer 1
PMCID: PMC11109708  NIHMSID: NIHMS1984079  PMID: 38599784

Summary

Many longitudinal studies are designed to monitor participants for major events related to the progression of diseases. Data arising from such longitudinal studies are usually subject to interval censoring since the events are only known to occur between two monitoring visits. In this work, we propose a new method to handle interval-censored multistate data within a proportional hazards model framework where the hazard rate of events is modeled by a nonparametric function of time and the covariates affect the hazard rate proportionally. The main idea of this method is to simplify the likelihood functions of a discrete-time multistate model through an approximation and the application of data augmentation techniques, where the assumed presence of censored information facilitates a simpler parameterization. Then the expectation-maximization algorithm is used to estimate the parameters in the model. The performance of the proposed method is evaluated by numerical studies. Finally, the method is employed to analyze a dataset tracking the advancement of coronary allograft vasculopathy following heart transplantation.

Keywords: data augmentation, interval censoring, multistate model, proportional hazards model, time-to-event data

1 |. INTRODUCTION

The advancement of diseases can usually be described as several stages or states based on the etiology, severity, and presentation of diseases. For example, the stages of cancer are defined by the spread and size of cancerous tumors in tissues. The stages of Alzheimer’s disease are characterized by cognitive impairment, and the progression of Alzheimer’s disease is further complicated by various disease subtypes. Many clinical studies are designed to understand the natural history of disease progression. These studies follow individuals with elevated risks of disease and monitor the disease progression through scheduled clinical visits. The typical research questions that these studies aim to answer are whether the individuals are at high risk of progressing to the next state and what factors will accelerate or delay the state transitions. To analyze data from these studies, multistate models are an indispensable tool, especially when there are multiple disease states of interest.

In this research, we focus on the problem of interval censoring in multistate models. In clinical studies, the disease status of individuals is evaluated at their clinical visits, so usually the investigators only know that a change of status occurred sometime between two consecutive visits, and the exact times of state transitions are not known. In practice, some study individuals may not strictly adhere to the monitoring schedules, and missed visits are anticipated. As a consequence, the sequences of monitoring times are irregularly spaced and the problem of interval censoring is not negligible. In the literature, a variety of methods have been proposed to analyze interval-censored data in a single-event survival model. Lindsey1 considered a parametric likelihood-based method; Turnbull2, Finkelstein et al.3 and Farrington4 used nonparametric maximum likelihood estimation methods in semiparametric models; Tanner et al.5 considered a data augmentation algorithm using a multiple imputation method; Wang et al.6 considered a two-stage data augmentation method using a latent Poisson process. Interval censoring in multistate models is usually a more complex problem. Unlike single-event models, which have at most one unobserved event per interval, multistate models allow multiple unobserved events to occur within a censoring interval, where both the number of events and the actual event sequence are unknown. Because of the difficulties mentioned above, many multistate models in the literature impose restrictive assumptions on the structure of the multistate model or the distribution of state transition times.
In the frequentist framework, both Marshall7 and Satten8 considered using stationary Markov chains to model the multistate data; Alioum and Commenges9 used piecewise-constant intensities in Markov models to relax the stationarity assumptions; Frydman and Szarek10 considered nonparametric estimation of transition intensities in a three-state “illness-death” model; Pak et al.11 considered the semiparametric estimation of a progressive three-state model; Zhang and Sun12 considered a Monte Carlo expectation-maximization algorithm in the semiparametric estimation of a four-state model with informative missingness. In the Bayesian framework, multistate models have been considered by Sharples13, Pan14, Van Den Hout and Matthews15, Kneib and Hennerfeind16 and De Iorio et al.17 with different prior specifications and model structures, and all these methods deal with the censored information by sampling from the posterior distribution of the latent variables using Markov chain Monte Carlo. In sum, the existing methods are usually limited by restrictive assumptions (e.g., stationarity and parametric forms), limited flexibility to accommodate complicated multistate structures, and computationally intensive algorithms (e.g., constrained optimization and Monte Carlo methods). To address the problems mentioned above, we propose a new method for fitting interval-censored multistate models with arbitrary model structures. The proposed method is semiparametric: the transition hazards are a nonparametric function of time, and model covariates affect the transition hazards proportionally as in a proportional hazards model. Parameters in the model can be estimated using an expectation-maximization algorithm.

In this paper, we will propose a novel method for fitting interval-censored multistate data. The proposed method applies an approximation to reduce the number of parameters in the likelihood and increase the computational efficiency of the algorithm. The remainder of the paper is organized as follows. In Section 2, we will introduce the proposed method along with the model estimation technique. Some simulation studies for evaluating the proposed method are presented in Section 3. A real data application to the heart transplant data is presented in Section 4. Section 5 concludes the paper with some discussions and directions for future research. Appendix A presents the proposed method in the single-event survival models as a special case of the multistate model, and some technical details about the model estimation are provided in Appendix B.

2 |. METHOD

2.1 |. Data and Notations

In this section, we will give a full description of the method for modeling multistate interval-censored data. Let us first consider a typical multistate interval-censored dataset. The relationship between different states can be described by a directed graph $(V,E)$, where the set of vertices $V$ represents the collection of $N_s$ states and the set of directed edges $E$ represents all possible transitions from one state to the next. In this paper, we will number the states by $1,2,\dots,N_s$ (i.e., $V=\{1,\dots,N_s\}$), and $(s_1,s_2)\in E$ if and only if the individuals can make a transition from state $s_1$ to state $s_2$. We will let $\mathcal{N}(s_1)=\{s_2:(s_1,s_2)\in E\}$ denote all states that can be transitioned to starting from state $s_1$. Suppose that there are $m$ individuals in the dataset. The $i$th individual is sequentially monitored at the time sequence $0=T_{i0}<T_{i1}<T_{i2}<\cdots<T_{in_i}$ for the occurrences of transitions. Here we let $S_i(t)$ be the state occupied by the $i$th individual at time $t$, and $S_{ij}=S_i(T_{ij})$. The exact times of state transitions are not directly observed and can only be inferred from the observed data. We let $(0,\mathcal{T}]$ be the design interval, where $\mathcal{T}$ can be chosen arbitrarily large to cover all observation times; thus the design interval $(0,\mathcal{T}]$ does not have to be specified a priori. Next, we introduce some additional notations that are needed to describe the discrete-time multistate model in this paper. Different from other discrete-time survival models that assume that event time distributions have discrete masses over the design interval, in this paper, we will formulate our model in terms of intervals instead of discrete times. Such consideration helps simplify the analysis of interval-censored data and the interpretation of results. The design interval $(0,\mathcal{T}]$ is discretized into a union of $N_t$ disjoint small intervals $\mathcal{I}_1=(t_0,t_1], \mathcal{I}_2=(t_1,t_2], \dots, \mathcal{I}_{N_t}=(t_{N_t-1},t_{N_t}]$, where the time sequence $t_0,\dots,t_{N_t}$ encompasses all observation times $\{T_{ij}: i=1,\dots,m;\ j=1,\dots,n_i\}$.
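As a concrete illustration of the discretization step, the following Python sketch (hypothetical helper name; not from the paper) builds the interval partition $\mathcal{I}_1,\dots,\mathcal{I}_{N_t}$ from the pooled observation times, using the sorted unique times as the grid so that every $T_{ij}$ is an interval endpoint.

```python
import numpy as np

def build_partition(obs_times):
    """Discretize the design interval into I_j = (t_{j-1}, t_j].

    The only requirement in the discrete-time formulation is that the grid
    t_0 < t_1 < ... < t_{N_t} contains every observation time T_ij; the
    simplest valid choice is the sorted unique observation times with the
    origin t_0 = 0 prepended.
    """
    grid = np.unique(np.concatenate(([0.0], np.asarray(obs_times, dtype=float))))
    # Each interval is the half-open span between consecutive grid points.
    return [(float(a), float(b)) for a, b in zip(grid[:-1], grid[1:])]

intervals = build_partition([0.5, 1.0, 0.25, 1.0])
```

With pooled visit times {0.25, 0.5, 1.0}, this yields the three intervals (0, 0.25], (0.25, 0.5], and (0.5, 1.0].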
Here we let $\delta_{i,s_1s_2}(t)$ be the event indicator process such that $\delta_{i,s_1s_2}(t)=1$ if the $i$th individual transitions from state $s_1$ to $s_2$ at time $t$ and 0 otherwise. Let $g_{i,s_1}(t)$ be the at-risk process such that $g_{i,s_1}(t)=1$ if the $i$th individual is at risk of transitioning out of state $s_1$ at time $t$. For ease of exposition, we introduce the following notations to allow $\delta_{i,s_1s_2}(\cdot)$ and $g_{i,s_1}(\cdot)$ to be functions of intervals. For an interval $\mathcal{I}_j$, we let $\delta_{i,s_1s_2}(\mathcal{I}_j)=\max_{t\in\mathcal{I}_j}\delta_{i,s_1s_2}(t)$, which means $\delta_{i,s_1s_2}(\mathcal{I}_j)=1$ if and only if the transition from state $s_1$ to $s_2$ is observed before censoring and in the interval $\mathcal{I}_j$. Similarly, we let $g_{i,s_1}(\mathcal{I}_j)=\max_{t\in\mathcal{I}_j}g_{i,s_1}(t)$, which means $g_{i,s_1}(\mathcal{I}_j)=1$ if and only if the individual is at risk of transitioning out of state $s_1$ before entering the interval $\mathcal{I}_j$ and has not been censored in the interval $\mathcal{I}_j$. Let $x_i$ be a $p$-dimensional vector of covariates; we assume the following discrete-time multistate model with a complementary log-log link

$$P\{\delta_{i,s_1s_2}(\mathcal{I}_j)=1 \mid g_{i,s_1}(\mathcal{I}_j)=1\} = 1-\exp\{-h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\},$$

where $\beta_{s_1s_2}$ is a $p$-dimensional vector of regression coefficients and $h_{0,s_1s_2}=\{h_{0,s_1s_2}(\mathcal{I}_j)\}_{j=1}^{N_t}$ are parameters that characterize the rate of transitions in each interval $\mathcal{I}_j$. Many existing works have pointed out the similarity between discrete-time survival models with a complementary log-log link and continuous-time survival models.3,18 In discrete-time survival models, the probability mass of event time distributions is assumed to be placed on the observed event times, while the proposed method allocates the mass of transition time distributions to intervals. $h_{0,s_1s_2}(\mathcal{I}_j)$ can similarly be interpreted as the cause-specific baseline hazard in competing risks models, and $h_{0,s_1s_2}$ is usually treated as a nuisance parameter when the focus of statistical inference is on the regression coefficient $\beta_{s_1s_2}$. We will further explore the relationship between the proposed model and the Cox proportional hazards model in the next section. In this manuscript, we will assume that the observation time process $T_{i0}<T_{i1}<T_{i2}<\cdots<T_{in_i}$ is independent of the event time process $\{\delta_{i,s_1s_2}(\mathcal{I}_j)\}_{j=1,\dots,N_t;(s_1,s_2)\in E}$, which is usually referred to as the “independent inspection time” model. As noted by Lawless, the “independent inspection time” model satisfies the constant sum condition by Oller et al.19,20
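To make the model concrete, here is a minimal sketch (function names are ours, not the paper's) of the per-interval transition probability under the complementary log-log link, together with the link-scale identity $\log\{-\log(1-p)\}=\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i$ that motivates the name of the link.

```python
import numpy as np

def transition_prob(h0_j, beta, x):
    """P{delta_{i,s1s2}(I_j) = 1 | g_{i,s1}(I_j) = 1}
    = 1 - exp(-h0_{s1s2}(I_j) * exp(beta' x))."""
    return 1.0 - np.exp(-h0_j * np.exp(np.dot(beta, x)))

# Complementary log-log of the probability is linear in the covariates:
p = transition_prob(0.05, np.array([0.6, 0.2]), np.array([1.0, -1.0]))
cloglog = np.log(-np.log(1.0 - p))  # equals log(h0_j) + beta' x
```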

2.2 |. The Observed and Complete Likelihood

Following the frequentist inference approach, we estimate parameters by maximizing the observed likelihood, which is the likelihood function given all the observed data. However, due to interval censoring, the observed likelihood takes a complicated form that is hard to deal with. To overcome the difficulty, we will utilize the technique of data augmentation, as described by Van Dyk and Meng21. The data augmentation technique assumes the existence of certain unknown parameters and data to create an augmented dataset, which allows us to derive a complete likelihood function of a simpler form. This approach is particularly useful when the statistical problem is complicated by unobserved data and missing information.22,23 According to the theory of the EM algorithm, maximum likelihood estimation can be achieved by iteratively maximizing the expected complete log-likelihood, enabling us to work directly with the complete likelihood instead of the observed likelihood.22 Throughout this paper, we will follow this idea to develop the method for handling interval-censored data. In this section, we will first derive the observed and complete likelihood functions. In addition, we propose to apply an approximation to the complete likelihood function to further simplify the parameter estimation and numerical computation of the method. The benefits of utilizing the complete likelihood and the approximation will be demonstrated in this section.

We let $\delta_i=\{\delta_{i,s_1s_2}(\mathcal{I}_j)\}_{j=1,\dots,N_t;(s_1,s_2)\in E}$ and $g_i=\{g_{i,s_1}(\mathcal{I}_j)\}_{j=1,\dots,N_t;(s_1,s_2)\in E}$ denote the underlying event and at-risk processes, $O_i=(\{T_{ij}\}_{j=1}^{n_i},\{S_{ij}\}_{j=1}^{n_i})$ denote the observed data, and $\beta=\{\beta_{s_1s_2}\}_{(s_1,s_2)\in E}$ and $h_0=\{h_{0,s_1s_2}(\mathcal{I}_j)\}_{j=1,\dots,N_t;(s_1,s_2)\in E}$ denote the collection of parameters. To simplify the notations, we will let $[A]_{s_1,s_2}$ be the $(s_1,s_2)$-th element of a matrix $A$. The log probability density function of the observed data $O_i$ is

$$\log f(O_i \mid \beta,h_0) = \sum_{j=1}^{n_i} \log P\{S_{ij}=S_i(T_{ij}) \mid S_{i,j-1}=S_i(T_{i,j-1})\} = \sum_{j=1}^{n_i} \log \bigg[\prod_{j':\,\mathcal{I}_{j'} \subset (T_{i,j-1},T_{ij}]} P_{ij'}\bigg]_{S_{i,j-1},S_{ij}},$$

where Pij is the stochastic matrix given by

$$[P_{ij}]_{s_1,s_2} = \begin{cases} 1-\exp\{-h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\} & \text{if } s_1 \neq s_2, \\ 1-\sum_{s:\,s\neq s_1} [P_{ij}]_{s_1,s} & \text{if } s_1=s_2. \end{cases}$$

Therefore, the observed log-likelihood function for $\beta$ and $h_0$ is $l(\beta,h_0)=\sum_{i=1}^m \log f(O_i \mid \beta,h_0)$. Calculating the derivatives of $l(\beta,h_0)$ with respect to $\beta$ and $h_0$ will require much effort because it involves multiplication of matrices that depend on $\beta$ and $h_0$. However, we can construct augmented data that give a simple complete likelihood function by assuming complete knowledge of $\delta_i$ and $g_i$. Given the values of $\delta_i$ and $g_i$, we can easily write down the joint log probability density function of $g_i$, $\delta_i$ and the observed data $O_i$

$$\log f(O_i,\delta_i,g_i \mid \beta,h_0) = \log f(O_i \mid \delta_i,g_i) + \log f(\delta_i,g_i \mid \beta,h_0) = \log f(O_i \mid \delta_i,g_i) + \sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\Big[\delta_{i,s_1s_2}(\mathcal{I}_j)\log\big(\exp\{h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\}-1\big) - h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\Big].$$

We note that $f(O_i \mid \delta_i,g_i)$ is the probability density function that specifies the probability of observing the interval-censored outcomes given the true event process. By our assumption that the observation process is independent of the event process, $f(O_i \mid \delta_i,g_i)$ does not depend on the parameters $\beta$ and $h_0$. The complete log-likelihood function for $\beta$ and $h_0$ is then given by

$$l^*(\beta,h_0)=\sum_{i=1}^m \log f(O_i,\delta_i,g_i \mid \beta,h_0),$$

where the term $\log f(O_i \mid \delta_i,g_i)$, which does not depend on $\beta$ and $h_0$, can be dropped. We can see from the expression for $l^*(\beta,h_0)$ that taking derivatives of the complete likelihood is as easy as taking derivatives of the likelihood function of a complementary log-log model.
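As a sanity check on this form, each $(i,j,s_1\to s_2)$ term $g[\delta\log(\exp\{z\}-1)-z]$, with $z=h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)$, is algebraically identical to the Bernoulli log-likelihood $\delta\log p+(1-\delta)\log(1-p)$ with $p=1-\exp(-z)$. The sketch below (our own illustration, not the paper's code) verifies this numerically.

```python
import numpy as np

def complete_term(delta, z):
    """One term of the complete log-likelihood: delta*log(e^z - 1) - z."""
    return delta * np.log(np.expm1(z)) - z

def bernoulli_loglik(delta, z):
    """Equivalent Bernoulli form with success probability p = 1 - exp(-z)."""
    p = -np.expm1(-z)  # 1 - exp(-z), computed stably
    return delta * np.log(p) + (1 - delta) * np.log(1 - p)

# The two forms agree for any event indicator and any positive rate z.
for z in (0.01, 0.5, 2.0):
    for delta in (0, 1):
        assert np.isclose(complete_term(delta, z), bernoulli_loglik(delta, z))
```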

2.3 |. An Approximation Strategy for Attaining the Partial Likelihood

Next, we will apply an approximation to the complete likelihood function that will eventually make the nuisance parameters $h_0$ implicit in the likelihood function. The result of the approximation resembles the partial likelihood of the Cox cause-specific hazard multistate model, and we will see the connections between the proposed discrete-time survival model and the continuous-time Cox proportional hazards models. Let $z=h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)$ be the rate of transition, and consider the following Taylor expansion

$$\log\{\exp(z)-1\} = \log(z) + \frac{z}{2} + \frac{z^2}{24} + O(z^3),$$

which holds for small, positive values of $z$, that is, when the intervals $\mathcal{I}_j$ are small enough. If we apply the Taylor expansion to the zeroth order, $\log\{\exp(z)-1\} \approx \log(z)$, the complete log-likelihood function $l^*(\beta,h_0)$ can be approximated by

$$l_0^*(\beta,h_0)=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\Big[\delta_{i,s_1s_2}(\mathcal{I}_j)\big\{\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i\big\} - h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\Big].$$

We can re-write l0*β,h0 as a partial log-likelihood for β, following the arguments by Murphy and Van der Vaart.24 Given the importance of this technique in the context of this paper, we provide a detailed derivation in this section. Notice that when the complete likelihood is maximized, the following equation holds

$$\frac{\partial l_0^*(\beta,h_0)}{\partial h_{0,s_1s_2}(\mathcal{I}_j)}=0.$$

Solving the equation, we can derive the following expression for $h_{0,s_1s_2}(\mathcal{I}_j)$

$$h_{0,s_1s_2}(\mathcal{I}_j)=\frac{\sum_{i=1}^m g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)}{\sum_{i=1}^m g_{i,s_1}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)},\quad s_1\neq s_2.$$

After we plug the above expression back into $l_0^*(\beta,h_0)$, $l_0^*(\beta,h_0)$ can be written as a partial likelihood function that is free of the nuisance parameters $h_0$

$$pl_0^*(\beta)=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)\Big[\beta_{s_1s_2}'x_i-\log\sum_{l=1}^m g_{l,s_1}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)\Big].$$

The zeroth order approximation reveals that the complete log-likelihood function can be expressed in a form that closely resembles the partial likelihood function of the Cox cause-specific hazard multistate model.25 Similarly, we can apply the first order approximation, $\log\{\exp(z)-1\} \approx \log(z)+z/2$, to $l^*(\beta,h_0)$, which reduces the log-likelihood function to

$$l_1^*(\beta,h_0)=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\Big[\delta_{i,s_1s_2}(\mathcal{I}_j)\big\{\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i\big\} - \Big(1-\frac{\delta_{i,s_1s_2}(\mathcal{I}_j)}{2}\Big) h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\Big].$$

We can similarly derive the partial log-likelihood form using the same technique. To re-write the complete log-likelihood function as a partial log-likelihood function, we can solve for

$$\frac{\partial l_1^*(\beta,h_0)}{\partial h_{0,s_1s_2}(\mathcal{I}_j)}=0,$$

which will give us the following expression for $h_{0,s_1s_2}(\mathcal{I}_j)$,

$$h_{0,s_1s_2}(\mathcal{I}_j)=\frac{\sum_{i=1}^m g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)}{\sum_{i=1}^m g_{i,s_1}(\mathcal{I}_j)\{1-\delta_{i,s_1s_2}(\mathcal{I}_j)/2\}\exp(\beta_{s_1s_2}'x_i)}.$$

After plugging the expression for $h_{0,s_1s_2}(\mathcal{I}_j)$ back into the complete log-likelihood $l_1^*(\beta,h_0)$, we obtain the corresponding partial log-likelihood function

$$pl_1^*(\beta)=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)\bigg[\beta_{s_1s_2}'x_i-\log\sum_{l=1}^m g_{l,s_1}(\mathcal{I}_j)\Big(1-\frac{\delta_{l,s_1s_2}(\mathcal{I}_j)}{2}\Big)\exp(\beta_{s_1s_2}'x_l)\bigg].$$

We note that the partial log-likelihood form cannot be obtained for approximations of second order or higher; due to limitations on space, we omit the details. In this manuscript, we focus on the first-order approximation because it tends to improve the efficiency of parameter estimation compared to the zeroth-order approximation, as discussed in Appendix A.5, and because the simple partial log-likelihood form associated with the first-order approximation significantly simplifies the problem and numerical computations.
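The quality of the zeroth- and first-order approximations can be checked numerically. The sketch below (illustrative code, not from the paper) compares $\log\{\exp(z)-1\}$ with its truncations; the zeroth-order error behaves like $z/2$ while the first-order error behaves like $z^2/24$, which is why shrinking the intervals makes the approximation increasingly accurate.

```python
import numpy as np

def expansion_error(z, order):
    """Absolute error of truncating log(e^z - 1) = log z + z/2 + z^2/24 + ..."""
    exact = np.log(np.expm1(z))
    approx = np.log(z)
    if order >= 1:
        approx += z / 2.0
    return abs(exact - approx)

# The first-order truncation is far more accurate for small z (short intervals).
e0 = expansion_error(0.01, order=0)  # roughly z/2 = 5e-3
e1 = expansion_error(0.01, order=1)  # roughly z^2/24, about 4e-6
```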

2.4 |. Model Estimation by EM Algorithm

In the literature on interval-censored single-event data, there are two major types of methods for parameter estimation: the EM algorithm introduced by Turnbull2, originally known as the self-consistency algorithm, and the iterative convex minorant algorithm introduced by Groeneboom and Wellner26. The self-consistency algorithm has a concise form that makes it easy to implement, but it may converge slowly, especially when the nonparametric component $h_0$ has a large number of parameters.27 The iterative convex minorant algorithm is a gradient-based method that optimizes the observed likelihood function directly and is typically much faster. However, the implementation is complicated due to the high-dimensional parameter space and the need to satisfy bound constraints. Many existing works on interval-censored multistate data have relied on optimization algorithms to directly maximize the observed likelihood.8,9,28 Recently, Wang et al.6 discovered a novel use of the EM algorithm by considering a two-stage data augmentation using Poisson processes to improve the self-consistency algorithm. Motivated by this method, we present an EM algorithm for estimating the parameters of the proposed model in this section. We have already set up our problem to apply the EM algorithm in the previous sections. The complete likelihood based on the augmented data has a simple form that is convenient to work with. The approximation we apply leads to a partial likelihood function that is free of the nuisance parameters $h_0$. Therefore, it has the potential to overcome the slow convergence of the self-consistency algorithm, especially when the dimension of $h_0$ is high. The theory of the EM algorithm by Dempster et al.22 suggests that maximizing the observed log-likelihood can be achieved by iteratively maximizing the expected complete log-likelihood given the observed data.
In the calculation of the expected complete log-likelihood, we will adopt the method of fractional re-weighted at-risk process proposed by Datta and Satten29 when the survival status of the individual is unknown. Combining all the techniques mentioned above, we will be able to derive an algorithm that is easy to implement and also computationally efficient.

The EM algorithm primarily consists of the expectation step and the maximization step. In the expectation step, we are concerned with evaluating the expectation of the complete log-likelihood given the observed data and current parameter estimates. To simplify the notations, we let $\tilde{P}(\cdot)$ and $\tilde{E}[\cdot]$ be the conditional probability and conditional expectation of a random variable given all observed data $\{O_i\}_{i=1}^m$, and $\tilde{P}_i(\cdot)$ and $\tilde{E}_i[\cdot]$ be the conditional probability and conditional expectation of a random variable given all observed data from individual $i$, $O_i$. The expected complete log-likelihood given the observed data is given by

$$\tilde{E}[l_1^*(\beta,h_0)]=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E}\bigg[\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]\big\{\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i\big\}-\Big(\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)]-\frac{\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]}{2}\Big)h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\bigg].$$

The quantities that we need to evaluate are $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]$ and $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)]$. Since $g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)$ and $g_{i,s_1}(\mathcal{I}_j)$ can only take values of 0 and 1, we have

$$\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]=\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)=1\}$$

and

$$\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)]=\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)=1\}.$$

The above conditional probabilities can be calculated by some routine manipulations of stochastic matrices. Given $\beta$ and $h_0$, we can construct a stochastic matrix $P_{ij}$ representing the transition probabilities in the interval $\mathcal{I}_j$ for the $i$th individual, where $[P_{ij}]_{s_1,s_2}$ is the probability of transitioning from state $s_1$ to $s_2$ when $s_1\neq s_2$, and $[P_{ij}]_{s_1,s_2}$ is the probability of remaining in state $s_1$ when $s_1=s_2$. Namely,

$$[P_{ij}]_{s_1,s_2} = \begin{cases} 1-\exp\{-h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\} & \text{if } s_1 \neq s_2, \\ 1-\sum_{s:\,s\neq s_1} [P_{ij}]_{s_1,s} & \text{if } s_1=s_2. \end{cases}$$

Let $(T_{i,l-1},T_{il}]$ be the unique interval among $(T_{i0},T_{i1}],(T_{i1},T_{i2}],\dots,(T_{i,n_i-1},T_{in_i}]$ that contains $\mathcal{I}_j$. To calculate the above conditional probabilities,

$$\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)=1\}=P\{S_i(t_{j-1})=s_1,S_i(t_j)=s_2 \mid S_i(T_{i,l-1})=S_{i,l-1},S_i(T_{il})=S_{il}\}=\frac{P\{S_i(T_{i,l-1})=S_{i,l-1},S_i(t_{j-1})=s_1,S_i(t_j)=s_2,S_i(T_{il})=S_{il}\}}{P\{S_i(T_{i,l-1})=S_{i,l-1},S_i(T_{il})=S_{il}\}}=\frac{\Big[\prod_{j':\mathcal{I}_{j'}\subset(T_{i,l-1},t_{j-1}]}P_{ij'}\Big]_{S_{i,l-1},s_1}[P_{ij}]_{s_1,s_2}\Big[\prod_{j':\mathcal{I}_{j'}\subset(t_j,T_{il}]}P_{ij'}\Big]_{s_2,S_{il}}}{\Big[\prod_{j':\mathcal{I}_{j'}\subset(T_{i,l-1},T_{il}]}P_{ij'}\Big]_{S_{i,l-1},S_{il}}}.$$

Similarly,

$$\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)=1\}=P\{S_i(t_{j-1})=s_1 \mid S_i(T_{i,l-1})=S_{i,l-1},S_i(T_{il})=S_{il}\}=\frac{P\{S_i(T_{i,l-1})=S_{i,l-1},S_i(t_{j-1})=s_1,S_i(T_{il})=S_{il}\}}{P\{S_i(T_{i,l-1})=S_{i,l-1},S_i(T_{il})=S_{il}\}}=\frac{\Big[\prod_{j':\mathcal{I}_{j'}\subset(T_{i,l-1},t_{j-1}]}P_{ij'}\Big]_{S_{i,l-1},s_1}\Big[\prod_{j':\mathcal{I}_{j'}\subset(t_{j-1},T_{il}]}P_{ij'}\Big]_{s_1,S_{il}}}{\Big[\prod_{j':\mathcal{I}_{j'}\subset(T_{i,l-1},T_{il}]}P_{ij'}\Big]_{S_{i,l-1},S_{il}}}.$$

We can see that the expected complete log-likelihood $\tilde{E}[l_1^*(\beta,h_0)]$ is a re-weighted version of the complete log-likelihood $l_1^*(\beta,h_0)$, where the weights can be interpreted as the probability of observing certain information. The first term $\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i$ is weighted by $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]$, representing the expected values of the event process, and the second term $h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)$ is weighted by $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)-g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)/2]$, representing the expected values of the at-risk process. We can draw a parallel between the weight $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)-g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)/2]$ and the risk sets determined by the probability that the individual is still at risk in the interval, as considered by others.29,30 When the survival status of an individual is unknown, the individual contributes a fractional weight to the risk set in the interval $\mathcal{I}_j$. The proposed method extends the idea of the fractional re-weighted at-risk process by Datta et al.30 from right-censored cases to interval-censored cases.
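The E-step probabilities above are just ratios of products of the per-interval matrices $P_{ij}$. The sketch below (our own illustration; the array layout and function names are assumptions) computes $\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)=1\}$ for all states at once by splitting the product over $(T_{i,l-1},T_{il}]$ at the left endpoint of $\mathcal{I}_j$.

```python
from functools import reduce
import numpy as np

def _prod(mats, n):
    """Ordered product of a list of n x n matrices (identity if empty)."""
    return reduce(np.matmul, mats, np.eye(n))

def prob_at_risk(P_list, k, a, b):
    """P{S(t_{k-1}) = s1 | state a at the left visit, state b at the right visit},
    returned for every candidate state s1.  P_list holds the per-interval
    matrices P_{ij'} covering (T_{l-1}, T_l]; entry k (0-based) is I_j."""
    n = P_list[0].shape[0]
    left = _prod(P_list[:k], n)   # covers (T_{l-1}, t_{k-1}]
    right = _prod(P_list[k:], n)  # covers (t_{k-1}, T_l]
    denom = (left @ right)[a, b]  # full-interval transition probability
    return left[a, :] * right[:, b] / denom

# Two-state example: state 1 is absorbing; condition on start 0, end 1.
P = np.array([[0.9, 0.1], [0.0, 1.0]])
probs = prob_at_risk([P, P, P], k=1, a=0, b=1)
```

Because the numerator terms sum over $s_1$ to the denominator, the returned probabilities always sum to one, which is a convenient internal check.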

In the maximization step, we update parameters to maximize the expected complete log-likelihood function $\tilde{E}[l_1^*(\beta,h_0)]$. Directly taking derivatives of $\tilde{E}[l_1^*(\beta,h_0)]$ to update $\beta$ and $h_0$ via a Newton-Raphson step is possible, but impractical due to the large number of parameters in $h_0$ and the constraint that $h_{0,s_1s_2}(\mathcal{I}_j)\geq 0$. In practice, we found that directly working on $\tilde{E}[l_1^*(\beta,h_0)]$ is not convenient and leads to slow convergence. In contrast, the partial log-likelihood $pl_1^*(\beta)$ only depends on the parameters $\beta$ and is free of bound constraints. As $pl_1^*(\beta)$ and $l_1^*(\beta,h_0)$ are equivalent under different parameterizations, we suggest updating $\beta$ using the partial log-likelihood function and then updating $h_0$ by plugging in the updated values of $\beta$. The algorithm for the maximization step is described below

$$S(\beta_{s_1s_2})\leftarrow\sum_{i=1}^m\sum_{j=1}^{N_t}\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]\Bigg\{x_i-\frac{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)\,x_l}{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)}\Bigg\}$$
$$I(\beta_{s_1s_2})\leftarrow\sum_{i=1}^m\sum_{j=1}^{N_t}\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]\Bigg\{-\frac{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)\,x_l x_l'}{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)}+\Bigg(\frac{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)\,x_l}{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)}\Bigg)^{\otimes 2}\Bigg\}$$
$$\beta_{s_1s_2}\leftarrow\beta_{s_1s_2}-I(\beta_{s_1s_2})^{-1}S(\beta_{s_1s_2})$$
$$h_{0,s_1s_2}(\mathcal{I}_j)\leftarrow\frac{\sum_{i=1}^m\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]}{\sum_{i=1}^m w_{i,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)},$$
with the shorthand $w_{l,s_1s_2}(\mathcal{I}_j)=\tilde{E}_l[g_{l,s_1}(\mathcal{I}_j)]-\tilde{E}_l[g_{l,s_1}(\mathcal{I}_j)\delta_{l,s_1s_2}(\mathcal{I}_j)]/2$,

where $S(\beta_{s_1s_2})$ and $I(\beta_{s_1s_2})$ are the first and second derivatives of the following expected partial likelihood with respect to $\beta_{s_1s_2}$

$$\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E}\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]\bigg[\beta_{s_1s_2}'x_i-\log\sum_{l=1}^m\tilde{E}_l\Big[g_{l,s_1}(\mathcal{I}_j)\Big(1-\frac{\delta_{l,s_1s_2}(\mathcal{I}_j)}{2}\Big)\Big]\exp(\beta_{s_1s_2}'x_l)\bigg].$$

Compared to directly taking derivatives of the expected complete log-likelihood, the above procedure typically speeds up the convergence of the parameter estimates and reduces the likelihood of numerical singularities due to the bound constraints on $h_0$. The EM algorithm proceeds by iterating between the expectation step and the maximization step until convergence holds. The convergence properties of the proposed EM algorithm are established as a proposition in Appendix B. The step-by-step implementation of the algorithm is described in Section S1 of the Supplementary Materials.
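To illustrate the maximization step, the following sketch (a minimal single-transition-type implementation with an assumed array layout; not the authors' code) carries out one Newton update of $\beta_{s_1s_2}$ from the expected partial likelihood and then updates the baseline hazards by the plug-in formula.

```python
import numpy as np

def m_step(beta, x, Egd, Eg):
    """One M-step for a single transition type s1 -> s2.

    x   : (m, p) covariate matrix
    Egd : (m, Nt) expected values E_i[g * delta] per individual and interval
    Eg  : (m, Nt) expected values E_i[g]
    Returns the Newton-updated beta and the plug-in baseline hazards h0.
    """
    m, p = x.shape
    w = Eg - Egd / 2.0                       # fractional at-risk weights
    S, I = np.zeros(p), np.zeros((p, p))
    for j in range(Egd.shape[1]):
        r = w[:, j] * np.exp(x @ beta)       # weighted risk contributions
        tot = r.sum()
        if tot <= 0:
            continue
        xbar = (r @ x) / tot                 # risk-set weighted mean covariate
        S += Egd[:, j] @ (x - xbar)          # score contribution
        V = (x.T * r) @ x / tot - np.outer(xbar, xbar)
        I -= Egd[:, j].sum() * V             # second derivative (Hessian)
    beta_new = beta - np.linalg.solve(I, S)  # Newton step
    h0 = Egd.sum(axis=0) / (w * np.exp(x @ beta_new)[:, None]).sum(axis=0)
    return beta_new, h0
```

In a full implementation this update would be run once per transition type within each EM iteration, alternating with the E-step.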

The proposed EM algorithm is relatively easy to implement and has a clear interpretation. As a comparison, many other methods that directly optimize the likelihood functions usually involve calculating the derivatives of complicated functions and checking the bounds and singularities of parameter estimates.8,9,31 We can now further discuss how the proposed method is different from the EM methods proposed by Wang et al. and Gu et al.6,32 Both methods use data augmentation and the EM algorithm in the estimation of parameters, but we highlight the following main differences between their methods and the proposed method. First, the proposed method is derived based on a discrete-time survival model where the event rate in each interval $\mathcal{I}_j$ is described by a binomial model with a complementary log-log link, while the method in Wang et al. is derived based on a continuous-time survival model where the survival events are described by Poisson point processes. Second, the proposed method treats both the event process $\delta_{i,s_1s_2}(t)$ and the at-risk process $g_{i,s_1}(t)$ as incompletely observed data, and applies the method of the fractional re-weighted process by Datta et al.30 to account for missing information. Finally, Wang et al. and Gu et al. employ a two-stage data augmentation approach that partitions the Poisson processes into two levels, while our proposed method unifies the two stages into a single-stage procedure.

3 |. SIMULATION STUDIES

We performed simulation studies to evaluate the proposed method for interval-censored multistate data. We let the design interval be $(0,1]$ and discretize it into 200 sub-intervals $\mathcal{I}_1=(0.000,0.005], \mathcal{I}_2=(0.005,0.010], \dots, \mathcal{I}_{200}=(0.995,1.000]$. We set $p=4$ and let $x_i=(x_{i1},\dots,x_{ip})'$ follow a multivariate normal distribution with mean $0_p$ and variance $I_{p\times p}$. In Case (A), we consider a 4-state model as shown in the left panel of Figure 1. In Case (B), we consider a more complicated 6-state model as shown in the right panel of Figure 1. The true values of $\beta_{s_1s_2}$ are given in Table 1. In Case (A), $h_{12}(\mathcal{I}_k)=h_{13}(\mathcal{I}_k)=0.1\times t\exp(-5t)$ and $h_{24}(\mathcal{I}_k)=h_{34}(\mathcal{I}_k)=0.005/[1+\exp(5-10t)]$, where $t=k/200$ is the right end of the interval $\mathcal{I}_k$. In Case (B), $h_{12}(\mathcal{I}_k)=h_{13}(\mathcal{I}_k)=0.01\times\exp(-2t)$ and $h_{24}(\mathcal{I}_k)=h_{25}(\mathcal{I}_k)=h_{35}(\mathcal{I}_k)=h_{36}(\mathcal{I}_k)=0.01\times t^2$, where $t=k/200$ is the right end of the interval $\mathcal{I}_k$.
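For reference, the data-generating mechanism used in such simulations can be sketched as follows (illustrative Python; state indices are 0-based here and the hazards and coefficients below are placeholders rather than the exact Case (A) settings). Each interval's transition probabilities form the row of the stochastic matrix $P_{ij}$, and the next state is drawn from that row.

```python
import numpy as np

def simulate_path(h0, beta, x, n_states, n_t, rng, start=0):
    """Simulate one individual's state at the end of each interval.

    h0[(s1, s2)]   : length-n_t baseline hazards for the edge s1 -> s2
    beta[(s1, s2)] : regression coefficients for that edge
    """
    state, path = start, []
    for j in range(n_t):
        probs = np.zeros(n_states)
        for (s1, s2), h in h0.items():
            if s1 == state:
                # Off-diagonal entries of P_ij: 1 - exp(-h0 * exp(beta' x)).
                probs[s2] = 1.0 - np.exp(-h[j] * np.exp(beta[(s1, s2)] @ x))
        probs[state] = 1.0 - probs.sum()  # diagonal entry: stay put
        state = int(rng.choice(n_states, p=probs))
        path.append(state)
    return path

# A 4-state progressive structure like Case (A): 0 -> {1, 2} -> 3.
rng = np.random.default_rng(1)
n_t = 200
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
h0 = {e: np.full(n_t, 0.02) for e in edges}      # placeholder hazards
beta = {e: np.array([0.5, 0.2]) for e in edges}  # placeholder coefficients
path = simulate_path(h0, beta, np.array([0.3, -0.1]), 4, n_t, rng)
```

Interval-censored observations are then obtained by recording the simulated state only at a subset of the grid points, the individual's visit times.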

FIGURE 1. The 4-state model considered in Case (A) (left) and the 6-state model considered in Case (B) (right) in the simulation studies.

TABLE 1.

Biases and RMSE of the estimated regression coefficients in the simulation study of interval-censored multistate data. Values in the parentheses are the corresponding standard errors.

s1 s2 | Method | βs1s2,1: Bias (SE), Truth | βs1s2,2: Bias (SE), Truth | βs1s2,3: Bias (SE), Truth | βs1s2,4: Bias (SE), Truth | RMSE (SE)

Case (A)
1 2 1st Order Approx 0.008 (0.003) 0.6 0.003 (0.003) 0.2 0.000 (0.003) 0.2 0.003 (0.003) 0.2 0.129 (0.002)
1 2 Package msm 0.048 (0.003) 0.6 0.041 (0.003) 0.2 0.018 (0.003) 0.2 0.023 (0.003) 0.2 0.151 (0.003)
1 3 1st Order Approx −0.000 (0.003) 0.2 0.006 (0.003) 0.6 0.001 (0.003) 0.2 0.002 (0.003) 0.2 0.126 (0.002)
1 3 Package msm 0.038 (0.003) 0.2 0.045 (0.003) 0.6 0.020 (0.003) 0.2 0.021 (0.003) 0.2 0.147 (0.002)
2 4 1st Order Approx 0.004 (0.005) 0.2 0.004 (0.004) 0.2 0.016 (0.004) 0.6 0.008 (0.004) 0.2 0.199 (0.004)
2 4 Package msm −0.058 (0.004) 0.2 −0.056 (0.004) 0.2 −0.090 (0.004) 0.6 −0.038 (0.004) 0.2 0.211 (0.003)
3 4 1st Order Approx 0.012 (0.004) 0.2 0.008 (0.005) 0.2 0.006 (0.004) 0.2 0.010 (0.004) 0.6 0.201 (0.003)
3 4 Package msm −0.048 (0.004) 0.2 −0.055 (0.004) 0.2 −0.037 (0.004) 0.2 −0.093 (0.004) 0.6 0.211 (0.003)
Case (B)
1 2 1st Order Approx 0.004 (0.002) 0.5 −0.000 (0.002) 0.1 0.000 (0.002) 0.1 0.002 (0.002) 0.1 0.105 (0.002)
1 2 Package msm 0.075 (0.003) 0.5 0.074 (0.004) 0.1 0.022 (0.003) 0.1 0.028 (0.003) 0.1 0.163 (0.003)
1 3 1st Order Approx −0.001 (0.002) 0.1 0.008 (0.002) 0.5 −0.003 (0.002) 0.1 0.002 (0.002) 0.1 0.101 (0.002)
1 3 Package msm 0.068 (0.003) 0.1 0.077 (0.004) 0.5 0.021 (0.003) 0.1 0.027 (0.003) 0.1 0.159 (0.003)
2 4 1st Order Approx 0.001 (0.004) 0.1 0.006 (0.004) 0.1 0.005 (0.004) 0.4 0.004 (0.004) 0.2 0.182 (0.003)
2 4 Package msm −0.123 (0.004) 0.1 −0.060 (0.005) 0.1 −0.085 (0.004) 0.4 −0.054 (0.005) 0.2 0.231 (0.004)
2 5 1st Order Approx 0.011 (0.004) 0.5 −0.004 (0.004) 0.1 0.002 (0.004) 0.1 −0.005 (0.004) 0.1 0.169 (0.003)
2 5 Package msm −0.139 (0.004) 0.5 −0.069 (0.004) 0.1 −0.080 (0.004) 0.1 −0.061 (0.004) 0.1 0.234 (0.004)
3 5 1st Order Approx 0.003 (0.004) 0.1 0.009 (0.004) 0.5 0.001 (0.004) 0.1 0.008 (0.004) 0.1 0.172 (0.003)
3 5 Package msm −0.062 (0.004) 0.1 −0.139 (0.004) 0.5 −0.051 (0.004) 0.1 −0.073 (0.004) 0.1 0.227 (0.003)
3 6 1st Order Approx 0.001 (0.004) 0.1 −0.014 (0.004) 0.1 0.001 (0.004) 0.2 0.009 (0.004) 0.4 0.181 (0.003)
3 6 Package msm −0.063 (0.004) 0.1 −0.136 (0.005) 0.1 −0.064 (0.004) 0.2 −0.092 (0.005) 0.4 0.245 (0.004)

We compared the proposed method with the method in the R package “msm”, which is one of the few R packages that can implement regression analysis on multistate models with arbitrary structures. The method similarly assumes a proportional hazards model for transition times and covariates, so the estimates by “msm” are comparable to the estimates from the proposed model. However, the method in “msm” assumes that transition times follow a parametric exponential multistate model. In contrast, the proposed method is semiparametric, with nonparametric baseline hazard functions, so the proposed method is more flexible than the method in “msm”. In Table 1, we compare the bias and RMSE of the estimates produced by the package “msm” and the proposed method with the first-order approximation (denoted “1st Order Approx”). From the table, we can see that the proposed method generally gives estimates with smaller biases and smaller RMSE compared to the method in the package “msm”. Furthermore, in Case (B), the algorithm in the package “msm” failed to converge on 195 simulations, with error messages that the Hessian matrices were not positive-definite. The “msm” algorithm is based on the scoring procedure by Kalbfleisch and Lawless33. In formulating the scoring procedure, the chain rule is used to obtain derivatives of the likelihood function, first with respect to the transition probabilities and then with respect to the coefficients. As a result, the transition probabilities appear in the denominator of the Hessian matrix, as shown in Equation (3.6) of Kalbfleisch and Lawless33. Consequently, their method is more likely to suffer from a singular Hessian matrix. By contrast, the proposed EM algorithm is less likely to suffer from numerical singularities because of the techniques we use to simplify the likelihood function being optimized.

4 |. DATA APPLICATION TO HEART TRANSPLANT DATA

In this section, we apply the proposed method to analyze a real dataset that tracks the progression of coronary allograft vasculopathy (CAV) after heart transplantation. The dataset can be accessed through the R package "msm" on CRAN: https://cran.r-project.org/web/packages/msm/index.html. The dataset comprises 2846 visits from 622 patients, with each visit recording the severity of CAV. We categorize CAV into States 1, 2, and 3, representing no, mild, and severe CAV, respectively, and State 4 signifies death. The time origin is chosen to be the time of transplantation. We note that in this model, patients can transition from a more severe state to a less severe state (e.g., from State 2 to State 1 and from State 3 to State 2). In this dataset, we observed such transitions in 58 patients (10.3%), of whom 53 patients (9.4%) had one such transition and 5 patients had two such transitions. State 4 is the end state from which patients cannot recover. The multistate models in many other works assume that there are no bidirectional transitions12,32,34, and the backward transitions are usually assumed to be misclassifications and removed from the data analysis (e.g., Section 1.3.1 of Van Den Hout34). The proposed method is not constrained by such assumptions, which could otherwise lead to an underestimation of the transition risks. The model incorporates two covariates: the age group of organ recipients (coded as 0 for "under 50 years" and 1 for "over 50 years") and the age group of organ donors (coded as 0 for "under 30 years" and 1 for "over 30 years"). Figure 2 illustrates the transitions between states, Table 2 summarizes the frequency of each observed transition, and Table 3 summarizes the demographics and distribution of covariates in the analysis population.

FIGURE 2. A four-state model for describing the progression of severity of CAV following heart transplantation.

TABLE 2.

Observed number of transitions in the heart transplant dataset.

                        To
From         State 1  State 2  State 3  State 4
State 1         1367      204       44      148
State 2           46      134       54       48
State 3            4       13      107       55

TABLE 3.

Summary of the demographics and distribution of covariates in the analysis population.

Variable Frequency (Percent)

Sex (Female) 87 (14.0%)
Age of Organ Recipient ≥ 50 291 (46.8%)
Age of Organ Donor ≥ 30 297 (47.7%)

Table 4 presents the estimated regression coefficients. We find that the risk of transitioning from State 1 to State 2 is higher for older donor age groups (p < 0.001), and there is also a reduced likelihood of recovering from State 2 to State 1 (p < 0.001) and from State 3 to State 2 (p = 0.038) when donors are in the older group. Recipients in the older age group face a higher risk of transitioning from State 1 to State 4 (p < 0.001) and a lower chance of recovering from State 3 to State 2 (p = 0.005). At the 0.05 significance level, we do not find other estimates of regression coefficients to be statistically significant.
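Because the model is a proportional hazards model, each coefficient in Table 4 is a log hazard ratio, and exponentiating an estimate together with its confidence limits gives the multiplicative effect on the transition hazard. A minimal Python illustration using the donor age coefficient for the State 1 → State 2 transition from Table 4:

```python
import math

# Log hazard ratio and 95% CI for donor age group, State 1 -> State 2
# (estimate 0.409, CI (0.195, 0.623), taken from Table 4).
est, lo, hi = 0.409, 0.195, 0.623
hr = tuple(round(math.exp(v), 2) for v in (est, lo, hi))
print(hr)  # (1.51, 1.22, 1.86)
```

That is, an older donor multiplies the hazard of developing mild CAV by roughly 1.5.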

TABLE 4.

Estimated regression coefficients and the corresponding 95% CI and p-values in the application to heart transplant data.

Variable Estimate 95% CI P values

State 1 → State 2
 Recipient Age Group 0.033 (−0.188,0.253) 0.772
 Donor Age Group 0.409 (0.195,0.623) < .001
State 1 → State 4
 Recipient Age Group 0.983 (0.592,1.374) < .001
 Donor Age Group 0.157 (−0.222,0.535) 0.417
State 2 → State 1
 Recipient Age Group 0.245 (−0.085,0.575) 0.146
 Donor Age Group −0.583 (−0.915,−0.250) < .001
State 2 → State 3
 Recipient Age Group −0.252 (−0.699,0.195) 0.269
 Donor Age Group 0.023 (−0.357,0.403) 0.904
State 2 → State 4
 Recipient Age Group −0.074 (−0.856,0.709) 0.853
 Donor Age Group 0.143 (−0.569,0.856) 0.694
State 3 → State 2
 Recipient Age Group −1.121 (−1.908,−0.334) 0.005
 Donor Age Group −0.645 (−1.254,−0.036) 0.038
State 3 → State 4
 Recipient Age Group 0.307 (−0.234,0.848) 0.266
 Donor Age Group −0.138 (−0.626,0.350) 0.579

Figure 3 presents our estimates of the 1-year probabilities of state occupation for recipients and donors in the younger age group within 10 years of transplantation. The results indicate that given a patient is currently in State 1, the probability of leaving State 1 is generally less than 0.2. Patients in State 2 have a higher probability of transitioning to State 3, and the probability of returning to State 1 decreases over time. Notably, the 1-year probability of death (State 4) increases for patients in more severe states (State 1 < State 2 < State 3). The Julia code for implementing the data application can be found in the GitHub repository at https://github.com/luyouepiusf/approximation_method.

FIGURE 3. Estimated probabilities of the recipient's future state occupation after one year, based on the current state and the time elapsed since transplantation. These probabilities are calculated assuming both the recipient and the donor belong to the younger age groups.

5 |. CONCLUSIONS AND DISCUSSIONS

In this paper, a novel method based on the idea of data augmentation is proposed to handle interval censoring in both single-event survival models and multistate models. An efficient EM algorithm is proposed to estimate the parameters in the model. Theoretical and numerical results have shown that the proposed method gives sound parameter estimates and is computationally efficient. The proposed method is applied to the heart transplant dataset to model the advancement of CAV following heart transplantation.

There are still some questions and future research topics that can be explored following this research effort. First, the proposed method relies on an approximation that compromises the precision of the parameter estimates; it is worth investigating whether a higher-order approximation or a correction method can improve the estimation. Second, in many observational studies, longitudinal measures of risk factors are also collected from the participants. These longitudinal risk factors can be important predictors of disease progression and can be included as covariates in the model, so the proposed method can be extended to incorporate longitudinal data. Third, the proportional hazards assumption can be relaxed to allow more flexibility; for example, we may introduce time-dependent coefficients or use a semiparametric single-index model.35,36 Finally, the current research only discusses cases in which the censoring is independent, but in some applications censoring can depend on the covariates or the past state occupation; methods to handle dependent censoring need to be considered in this case.37,38

Supplementary Material


ACKNOWLEDGMENTS

Research reported in this publication was supported by the National Institute Of Diabetes And Digestive And Kidney Diseases of the National Institutes of Health under Award Number R03DK135437. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Abbreviations:

EM, expectation-maximization; CAV, coronary allograft vasculopathy; RMSE, root-mean-squared error

APPENDIX

A. METHOD FOR INTERVAL-CENSORED SINGLE-EVENT DATA

We give a description of the method for single-event interval-censored data as a special case of the multistate model.

A.1. Data and Notations

While we keep the notations consistent with those in Section 2, some adaptations are needed to represent single-event data. We consider a typical interval-censored dataset where each individual can experience the survival event at most once. Suppose that there are $m$ individuals in the dataset. The $i$th individual is sequentially monitored for survival status at the times $T_{i1} < T_{i2} < \cdots < T_{in_i}$. We let $\Delta_i = 1$ if the $i$th individual is known to have experienced the survival event by one of the monitoring times, and $\Delta_i = 0$ otherwise. For individuals with $\Delta_i = 1$, we know that the true event time $T_i^*$ falls in one of the intervals $(0, T_{i1}], (T_{i1}, T_{i2}], \ldots, (T_{i,n_i-1}, T_{in_i}]$; we denote this interval by $(L_i, R_i]$ and define the censoring time $C_i = R_i$. For those who did not have an event, we know that the individual remains event-free until $T_{in_i}$, and we let the censoring time $C_i$ be the last follow-up time $T_{in_i}$. Let $(0, \mathcal{T}]$ be the design interval that encompasses all monitoring times $T_{ij}$. The design interval $(0, \mathcal{T}]$ is discretized into a union of $N_t$ disjoint small intervals $\mathcal{I}_1 = (t_0, t_1], \mathcal{I}_2 = (t_1, t_2], \ldots, \mathcal{I}_{N_t} = (t_{N_t-1}, t_{N_t}]$, where $t_0 = 0$, $t_{N_t} = \mathcal{T}$, and the time sequence $t_0, t_1, \ldots, t_{N_t}$ encompasses all $L_i$, $R_i$ and $C_i$. Here we let $\delta_i(t) = I\{t = T_i^*, t \le C_i\}$ be the event indicator process, and $g_i(t) = I\{t \le T_i^*, t \le C_i\}$ be the at-risk process. For ease of exposition, we introduce the following notations to allow $\delta_i(\cdot)$ and $g_i(\cdot)$ to be functions of intervals. For an interval $\mathcal{I}_j$, we let $\delta_i(\mathcal{I}_j) = \max_{t \in \mathcal{I}_j} \delta_i(t) = I\{T_i^* \in \mathcal{I}_j, t_j \le C_i\}$, which means $\delta_i(\mathcal{I}_j) = 1$ if and only if the event is observed before censoring and in the interval $\mathcal{I}_j$. Similarly, we let $g_i(\mathcal{I}_j) = \max_{t \in \mathcal{I}_j} g_i(t) = I\{t_{j-1} < T_i^*, t_j \le C_i\}$, which means $g_i(\mathcal{I}_j) = 1$ if and only if the individual is at risk of the event upon entering the interval $\mathcal{I}_j$ and is not censored within the interval $\mathcal{I}_j$. Let $x_i$ be a $p$-dimensional vector of covariates for the $i$th individual. The survival times are modeled by a discrete-time survival model with complementary log-log link

$$P\{\delta_i(\mathcal{I}_j) = 1 \mid g_i(\mathcal{I}_j) = 1\} = 1 - \exp\{-h_0(j)\exp(\beta^\top x_i)\},$$

where $j = 1, \ldots, N_t$, $\beta$ is a $p$-dimensional vector of regression coefficients, and $h_0 = \{h_0(1), \ldots, h_0(N_t)\}$ are parameters that characterize the rate of events in the intervals.
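To make the link concrete, the sketch below (in Python, with hypothetical values of $h_0(j)$ and $\beta$ chosen for illustration) evaluates the per-interval event probability and verifies that a unit shift in a covariate moves the complementary log-log of that probability by exactly the corresponding coefficient:

```python
import math

def event_prob(h0_j, beta, x):
    """P{delta_i(I_j) = 1 | g_i(I_j) = 1} = 1 - exp(-h0(j) exp(beta'x))."""
    lin = sum(b * xv for b, xv in zip(beta, x))
    return 1.0 - math.exp(-h0_j * math.exp(lin))

cloglog = lambda p: math.log(-math.log(1.0 - p))

# Hypothetical baseline hazard and coefficients:
h0_j, beta = 0.03, [0.4, 0.3]
p0 = event_prob(h0_j, beta, [0.0, 0.0])
p1 = event_prob(h0_j, beta, [1.0, 0.0])
print(round(cloglog(p1) - cloglog(p0), 6))  # 0.4, i.e. beta[0]
```

The covariates act proportionally on the hazard, which is additive on the complementary log-log scale.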

A.2. The Observed, Complete, and Partial Likelihood

Given the observed data, we only know the values of $\delta_i(\mathcal{I}_j)$ and $g_i(\mathcal{I}_j)$ partially. When $\Delta_i = 1$, we know that $\delta_i(\mathcal{I}_j)$ is 0 for intervals before $L_i$, and that at least one $\delta_i(\mathcal{I}_j)$ is non-zero in $(L_i, R_i]$. So the log probability density function of the observed data $O_i = (\Delta_i, L_i, R_i, C_i)$ is

$$\log f(O_i \mid \beta, h_0) = \log \left[ \prod_{j:\, \mathcal{I}_j \subseteq (0, L_i]} P\{\delta_i(\mathcal{I}_j) = 0 \mid g_i(\mathcal{I}_j) = 1\} \left\{ 1 - \prod_{j:\, \mathcal{I}_j \subseteq (L_i, R_i]} P\{\delta_i(\mathcal{I}_j) = 0 \mid g_i(\mathcal{I}_j) = 1\} \right\} \right]. \tag{A1}$$

When $\Delta_i = 0$, we know that $\delta_i(\mathcal{I}_j)$ is 0 for all intervals before $C_i$, so the log probability density function of the observed data $O_i$ is

$$\log f(O_i \mid \beta, h_0) = \log \prod_{j:\, \mathcal{I}_j \subseteq (0, C_i]} P\{\delta_i(\mathcal{I}_j) = 0 \mid g_i(\mathcal{I}_j) = 1\}.$$

Therefore, the observed log-likelihood function for $\beta$ and $h_0$ is given by $l(\beta, h_0) = \sum_{i=1}^m \log f(O_i \mid \beta, h_0)$. Similarly, we can construct augmented data that give a simple complete likelihood function by assuming complete knowledge of $\delta_i = \{\delta_i(\mathcal{I}_j)\}_{j=1}^{N_t}$ and $g_i = \{g_i(\mathcal{I}_j)\}_{j=1}^{N_t}$. Given the values of $\delta_i$ and $g_i$, the joint log probability density function of $g_i$, $\delta_i$ and the observed data $O_i$ is

$$\log f(O_i, \delta_i, g_i \mid \beta, h_0) = \log f(O_i \mid \delta_i, g_i) + \log f(\delta_i, g_i \mid \beta, h_0) = \log f(O_i \mid \delta_i, g_i) + \sum_{j=1}^{N_t} g_i(\mathcal{I}_j) \left[ \delta_i(\mathcal{I}_j)\log\{\exp(h_0(j)\exp(\beta^\top x_i)) - 1\} - h_0(j)\exp(\beta^\top x_i) \right].$$

We note that $f(O_i \mid \delta_i, g_i)$ is the probability density function that specifies the probability of observing the interval-censored outcomes given the true survival information. By our assumption that the observation process is independent of the event process, $f(O_i \mid \delta_i, g_i)$ does not depend on the parameters $\beta$ and $h_0$. The complete log-likelihood function for $\beta$ and $h_0$ is then given by

$$l^*(\beta, h_0) = \sum_{i=1}^m \log f(O_i, \delta_i, g_i \mid \beta, h_0),$$

where the term that does not depend on $\beta$ and $h_0$ can be dropped. Applying the Taylor approximation of Section 2.2, the complete log-likelihood function $l^*(\beta, h_0)$ can be approximated by

$$l_0^*(\beta, h_0) = \sum_{i=1}^m \sum_{j=1}^{N_t} g_i(\mathcal{I}_j)\left[\delta_i(\mathcal{I}_j)\{\log h_0(j) + \beta^\top x_i\} - h_0(j)\exp(\beta^\top x_i)\right].$$

We can similarly re-write $l_0^*(\beta, h_0)$ as a partial log-likelihood. Solving $\partial l_0^*(\beta, h_0)/\partial h_0(j) = 0$ yields the following expression for $h_0(j)$:

$$h_0(j) = \frac{\sum_{i=1}^m g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)}{\sum_{i=1}^m g_i(\mathcal{I}_j)\exp(\beta^\top x_i)}.$$

After we plug this expression back into $l_0^*(\beta, h_0)$, it can be written as a partial likelihood function that is free of the nuisance parameters $h_0$:

$$pl_0^*(\beta) = \sum_{i=1}^m \sum_{j=1}^{N_t} g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)\left[\beta^\top x_i - \log\left\{\sum_{l=1}^m g_l(\mathcal{I}_j)\exp(\beta^\top x_l)\right\}\right].$$

After applying the zeroth-order approximation, the complete log-likelihood function closely resembles the Cox partial likelihood function.25 Similarly, we can apply the first-order approximation, $\log\{\exp(x) - 1\} \approx \log(x) + x/2$, to $l^*(\beta, h_0)$, which reduces the log-likelihood function to

$$l_1^*(\beta, h_0) = \sum_{i=1}^m \sum_{j=1}^{N_t} g_i(\mathcal{I}_j)\left[\delta_i(\mathcal{I}_j)\{\log h_0(j) + \beta^\top x_i\} - \left\{1 - \frac{\delta_i(\mathcal{I}_j)}{2}\right\} h_0(j)\exp(\beta^\top x_i)\right].$$
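The quality of the two approximations is easy to check numerically: the zeroth-order error of $\log\{\exp(x) - 1\} \approx \log(x)$ is roughly $x/2$, while the first-order error is of order $x^2$, so for small per-interval hazards the first-order version is considerably more accurate. A quick Python check (our own illustration):

```python
import math

exact = lambda x: math.log(math.expm1(x))   # log(e^x - 1)
order0 = lambda x: math.log(x)              # zeroth-order approximation
order1 = lambda x: math.log(x) + x / 2.0    # first-order approximation

# Per-interval hazards of the magnitude typical in discrete-time models:
for x in (0.01, 0.05, 0.2):
    print(x, round(exact(x) - order0(x), 5), round(exact(x) - order1(x), 5))
```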

We can similarly derive the partial log-likelihood form using the same technique:

$$pl_1^*(\beta) = \sum_{i=1}^m \sum_{j=1}^{N_t} g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)\left[\beta^\top x_i - \log\left\{\sum_{l=1}^m g_l(\mathcal{I}_j)\left(1 - \frac{\delta_l(\mathcal{I}_j)}{2}\right)\exp(\beta^\top x_l)\right\}\right].$$
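As an illustration of how $pl_1^*(\beta)$ can be evaluated, the Python sketch below (our own helper with a hypothetical toy dataset, not the authors' implementation) loops over the intervals and applies the fractional weight $1 - \delta_l(\mathcal{I}_j)/2$ inside each risk-set sum:

```python
import math

def pl1(beta, x, g_delta, g):
    """First-order partial log-likelihood pl1*(beta).

    x:       covariate vectors, one per individual
    g_delta: g_delta[i][j] = g_i(I_j) * delta_i(I_j)
    g:       g[i][j] = g_i(I_j)
    """
    m, Nt = len(x), len(g[0])
    lin = [sum(b * xv for b, xv in zip(beta, xi)) for xi in x]
    total = 0.0
    for j in range(Nt):
        # Individuals with an event in I_j enter the risk-set sum with
        # the fractional weight (1 - delta/2).
        denom = sum((g[l][j] - g_delta[l][j] / 2.0) * math.exp(lin[l])
                    for l in range(m))
        for i in range(m):
            if g_delta[i][j]:
                total += lin[i] - math.log(denom)
    return total

# Toy data: individual 0 (x = 1) has its event in the second of three
# intervals; individual 1 (x = 0) stays event-free through all three.
x = [[1.0], [0.0]]
g_delta = [[0, 1, 0], [0, 0, 0]]
g = [[1, 1, 0], [1, 1, 1]]
print(round(pl1([0.5], x, g_delta, g), 4))  # -0.1012
```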

A.3. Model Estimation by EM Algorithm

Similarly, the EM algorithm consists of an expectation step and a maximization step. In the expectation step, we evaluate the expectation of the complete log-likelihood given the observed data and the current parameter estimates. We let $\tilde{P}(\cdot)$ and $\tilde{E}[\cdot]$ denote the conditional probability and conditional expectation of a random variable given all observed data $\{(\Delta_i, L_i, R_i, C_i)\}_{i=1}^m$, and $\tilde{P}_i(\cdot)$ and $\tilde{E}_i[\cdot]$ denote the conditional probability and conditional expectation given the observed data from individual $i$, $(\Delta_i, L_i, R_i, C_i)$. The expected complete log-likelihood given the observed data is

$$\tilde{E}\left[l_1^*(\beta, h_0)\right] = \sum_{i=1}^m \sum_{j=1}^{N_t} \left( \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]\{\log h_0(j) + \beta^\top x_i\} - \left\{\tilde{E}_i[g_i(\mathcal{I}_j)] - \frac{\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]}{2}\right\} h_0(j)\exp(\beta^\top x_i) \right). \tag{A2}$$

The quantities that we need to evaluate in the expectation step are essentially $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]$ and $\tilde{E}_i[g_i(\mathcal{I}_j)]$. Since $g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)$ and $g_i(\mathcal{I}_j)$ can only take the values 0 or 1, we have $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)] = \tilde{P}_i\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\}$ and $\tilde{E}_i[g_i(\mathcal{I}_j)] = \tilde{P}_i\{g_i(\mathcal{I}_j) = 1\}$.

Therefore, for individuals with Δi=1, we have

$$\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)] = \begin{cases} 0 & \text{if } \mathcal{I}_j \subseteq (0, L_i] \\ \tilde{P}_i\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\} & \text{if } \mathcal{I}_j \subseteq (L_i, R_i] \\ 0 & \text{if } \mathcal{I}_j \subseteq (R_i, \mathcal{T}] \end{cases} \quad\text{and}\quad \tilde{E}_i[g_i(\mathcal{I}_j)] = \begin{cases} 1 & \text{if } \mathcal{I}_j \subseteq (0, L_i] \\ \tilde{P}_i\{g_i(\mathcal{I}_j) = 1\} & \text{if } \mathcal{I}_j \subseteq (L_i, R_i] \\ 0 & \text{if } \mathcal{I}_j \subseteq (R_i, \mathcal{T}]. \end{cases}$$

For individuals with $\Delta_i = 0$, we have $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)] = 0$ and

$$\tilde{E}_i[g_i(\mathcal{I}_j)] = \begin{cases} 1 & \text{if } \mathcal{I}_j \subseteq (0, C_i] \\ 0 & \text{if } \mathcal{I}_j \subseteq (C_i, \mathcal{T}]. \end{cases}$$

So the problem boils down to the calculation of $\tilde{P}_i\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\}$ and $\tilde{P}_i\{g_i(\mathcal{I}_j) = 1\}$ when $\Delta_i = 1$ and $\mathcal{I}_j \subseteq (L_i, R_i]$. Given the observed information, we know the event has occurred in the interval $(L_i, R_i]$, so $g_i(\mathcal{I}_{j'})\delta_i(\mathcal{I}_{j'}) = 1$ for at least one of the $j'$ in the collection $\{j' : \mathcal{I}_{j'} \subseteq (L_i, R_i]\}$. Let $P\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\} = p_{ij}$, so we have

$$\tilde{P}_i\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\} = P\Big\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1 \,\Big|\, \max_{j':\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} g_i(\mathcal{I}_{j'})\delta_i(\mathcal{I}_{j'}) = 1\Big\} = \frac{\prod_{j':\, j' < j,\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'})\; p_{ij}}{1 - \prod_{j':\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'})}.$$

Similarly,

$$\tilde{P}_i\{g_i(\mathcal{I}_j) = 1\} = P\Big\{g_i(\mathcal{I}_j) = 1 \,\Big|\, \max_{j':\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} g_i(\mathcal{I}_{j'})\delta_i(\mathcal{I}_{j'}) = 1\Big\} = \frac{\prod_{j':\, j' < j,\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'}) \Big\{1 - \prod_{j':\, j' \ge j,\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'})\Big\}}{1 - \prod_{j':\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'})}.$$
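These two conditional probabilities can be computed with a single pass over the intervals contained in $(L_i, R_i]$. The Python sketch below (our own helper, not part of the paper's code) takes the vector of $p_{ij}$ for those intervals and returns both sets of expectation-step weights:

```python
def estep_probs(p):
    """Expectation-step weights for the intervals I_j inside (L_i, R_i],
    given p[j] = P(g_i(I_j) * delta_i(I_j) = 1).  Returns the conditional
    expectations E~[g*delta] and E~[g], conditioning on the event being
    observed somewhere in (L_i, R_i]."""
    k = len(p)
    # tails[j] = prod_{j' >= j} (1 - p[j'])
    tails = [1.0] * (k + 1)
    for j in range(k - 1, -1, -1):
        tails[j] = tails[j + 1] * (1.0 - p[j])
    denom = 1.0 - tails[0]  # P(event somewhere in (L, R])
    e_gd, e_g, surv = [], [], 1.0
    for j in range(k):
        e_gd.append(surv * p[j] / denom)             # event exactly in I_j
        e_g.append(surv * (1.0 - tails[j]) / denom)  # still at risk entering I_j
        surv *= 1.0 - p[j]  # prod_{j' <= j} (1 - p[j'])
    return e_gd, e_g

e_gd, e_g = estep_probs([0.1, 0.2, 0.3])
print([round(v, 3) for v in e_gd])  # [0.202, 0.363, 0.435]
```

For example, with $p = (0.1, 0.2, 0.3)$ the weights $\tilde{E}_i[g_i\delta_i]$ sum to one, and $\tilde{E}_i[g_i] = 1$ for the first interval, since the individual is certainly at risk when entering $(L_i, R_i]$.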

We can see that the expected complete log-likelihood $\tilde{E}[l_1^*(\beta, h_0)]$ is a re-weighted version of the complete log-likelihood, where the weights can be interpreted as the probability of observing certain information. The first term $\log h_0(j) + \beta^\top x_i$ is weighted by $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]$, representing the expected values of the event process, and the second term $h_0(j)\exp(\beta^\top x_i)$ is weighted by $\tilde{E}_i[g_i(\mathcal{I}_j) - g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)/2]$, representing the expected values of the at-risk process. We can easily draw a parallel between the proposed method and the fractional re-weighted at-risk process considered by others.29,30 When the survival status of an individual is unknown, the individual contributes a fractional weight $\tilde{E}_i[g_i(\mathcal{I}_j) - g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)/2]$ to the risk set, determined by the probability that the individual is still at risk in the interval $\mathcal{I}_j$. The proposed method extends the idea of the fractional re-weighted at-risk process of Datta et al.30 from right-censored to interval-censored cases.

In the maximization step, we update the parameters to maximize the expected complete log-likelihood function $\tilde{E}[l_1^*(\beta, h_0)]$. Similarly, we suggest updating $\beta$ using the partial log-likelihood function and then updating $h_0$ by plugging in the updated values of $\beta$. The maximization step is described below:

$$S(\beta) \leftarrow \sum_{i=1}^m \sum_{j=1}^{N_t} \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]\left\{ x_i - \frac{\sum_{l=1}^m w_l(j)\, x_l}{\sum_{l=1}^m w_l(j)} \right\},$$
$$I(\beta) \leftarrow \sum_{i=1}^m \sum_{j=1}^{N_t} \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]\left\{ -\frac{\sum_{l=1}^m w_l(j)\, x_l x_l^\top}{\sum_{l=1}^m w_l(j)} + \left(\frac{\sum_{l=1}^m w_l(j)\, x_l}{\sum_{l=1}^m w_l(j)}\right)^{\otimes 2} \right\},$$
$$\beta \leftarrow \beta - [I(\beta)]^{-1} S(\beta),$$
$$h_0(j) \leftarrow \frac{\sum_{i=1}^m \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]}{\sum_{i=1}^m \{\tilde{E}_i[g_i(\mathcal{I}_j)] - \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]/2\}\exp(\beta^\top x_i)},$$
with $w_l(j) = \{\tilde{E}_l[g_l(\mathcal{I}_j)] - \tilde{E}_l[g_l(\mathcal{I}_j)\delta_l(\mathcal{I}_j)]/2\}\exp(\beta^\top x_l)$ and $v^{\otimes 2} = vv^\top$,

where $S(\beta)$ and $I(\beta)$ are the first- and second-order derivatives of $\tilde{E}[pl_1^*(\beta)]$. The EM algorithm proceeds by iterating between the expectation step and the maximization step until convergence. The convergence properties of the proposed EM algorithm are established as a proposition in Appendix B.
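For a single covariate, one pass of the maximization step above can be sketched as follows (a simplified Python illustration under our own toy conventions, not the authors' Julia implementation; `E_gd[i][j]` and `E_g[i][j]` hold the expectation-step weights $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]$ and $\tilde{E}_i[g_i(\mathcal{I}_j)]$):

```python
import math

def mstep_update(beta, x, E_gd, E_g):
    """One Newton update of beta followed by the plug-in update of h0
    (scalar covariate for simplicity)."""
    m, Nt = len(x), len(E_g[0])
    score, info = 0.0, 0.0
    for j in range(Nt):
        w = [(E_g[l][j] - E_gd[l][j] / 2.0) * math.exp(beta * x[l])
             for l in range(m)]
        sw = sum(w)
        if sw <= 0.0:
            continue
        mean = sum(wl * x[l] for l, wl in enumerate(w)) / sw
        var = sum(wl * x[l] ** 2 for l, wl in enumerate(w)) / sw - mean ** 2
        for i in range(m):
            score += E_gd[i][j] * (x[i] - mean)  # contribution to S(beta)
            info -= E_gd[i][j] * var             # contribution to I(beta) < 0
    beta_new = beta - score / info               # Newton step
    h0 = []
    for j in range(Nt):
        num = sum(E_gd[i][j] for i in range(m))
        den = sum((E_g[i][j] - E_gd[i][j] / 2.0) * math.exp(beta_new * x[i])
                  for i in range(m))
        h0.append(num / den if den > 0.0 else 0.0)
    return beta_new, h0

# Fully observed, symmetric toy data: beta = 0 is already a stationary point.
beta1, h0 = mstep_update(0.0, [1.0, -1.0], [[1.0], [1.0]], [[1.0], [1.0]])
print(beta1, h0)  # beta stays at 0.0
```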

A.4. Comparing with Turnbull’s Method

We give a comparison between the proposed method and the seminal idea proposed by Turnbull2. Based on Turnbull’s idea, the log-likelihood function can be written as follows

$$l(\beta, h_0) = \sum_{i=1}^m \log f(O_i \mid \beta, h_0),$$

where

$$\log f(O_i \mid \beta, h_0) = \begin{cases} \log P\{T_i^* \in (L_i, R_i]\} = \log \sum_{j:\, \mathcal{I}_j \subseteq (L_i, R_i]} \alpha_{ij} & \text{when } \Delta_i = 1, \\ \log P\{T_i^* \in (C_i, \mathcal{T}]\} = \log \sum_{j:\, \mathcal{I}_j \subseteq (C_i, \mathcal{T}]} \alpha_{ij} & \text{when } \Delta_i = 0, \end{cases}$$

where $\alpha_{ij} = P(T_i^* \in \mathcal{I}_j)$. By contrast, our parameterization gives the following form of $\log f(O_i \mid \beta, h_0)$:

$$\log f(O_i \mid \beta, h_0) = \begin{cases} \log \left[ \prod_{j:\, \mathcal{I}_j \subseteq (0, L_i]} (1 - \lambda_{ij}) \left\{ 1 - \prod_{j:\, \mathcal{I}_j \subseteq (L_i, R_i]} (1 - \lambda_{ij}) \right\} \right] & \text{when } \Delta_i = 1, \\ \log \prod_{j:\, \mathcal{I}_j \subseteq (0, C_i]} (1 - \lambda_{ij}) & \text{when } \Delta_i = 0, \end{cases}$$

where $\lambda_{ij} = P\{T_i^* \in \mathcal{I}_j \mid g_i(\mathcal{I}_j) = 1\}$. From the above equations, we can see that the difference between $\alpha_{ij}$ and $\lambda_{ij}$ is whether the event time probabilities are modeled conditionally on the at-risk process. It is noteworthy that the parameterization with $\lambda_{ij}$ enables us to operate with logarithms of products rather than logarithms of sums. Also, the parameterization with $\alpha_{ij}$ is subject to the constraints $\sum_{j=1}^{N_t} \alpha_{ij} = 1$ and $\alpha_{ij} \ge 0$. These nuances explain why the parameterization with $\lambda_{ij}$ simplifies the problem.
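Concretely, the two parameterizations are linked by $\alpha_{ij} = \lambda_{ij}\prod_{j' < j}(1 - \lambda_{ij'})$, and any unconstrained values $\lambda_{ij} \in [0, 1]$ automatically yield non-negative $\alpha_{ij}$. A small Python illustration (our own, for intuition):

```python
def alpha_from_lambda(lam):
    """Convert conditional probabilities lambda_ij = P(T* in I_j | at risk
    entering I_j) into unconditional probabilities alpha_ij = P(T* in I_j)."""
    alphas, surv = [], 1.0
    for l in lam:
        alphas.append(surv * l)  # survive the earlier intervals, fail here
        surv *= 1.0 - l
    return alphas

print(alpha_from_lambda([0.2, 0.5, 1.0]))  # [0.2, 0.4, 0.4]
```

The output sums to one without the simplex constraint ever being imposed explicitly, which is what makes the $\lambda_{ij}$ parameterization convenient to optimize.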

A.5. Simulation Studies

We present simulation studies to evaluate the proposed method for interval-censored single-event data. In the simulations, we let the design interval be $(0, 1]$, discretized into 100 sub-intervals $\mathcal{I}_1 = (0.00, 0.01], \mathcal{I}_2 = (0.01, 0.02], \ldots, \mathcal{I}_{100} = (0.99, 1.00]$, and set $h_0(k) = 0.03 \times \sin(\pi t)$, where $t = k/100$ is the right end of the interval $\mathcal{I}_k$. We let $p = 4$, and $x_i = (x_{i1}, \ldots, x_{ip})^\top$ follows a multivariate normal distribution with mean $0_{p \times 1}$ and variance $I_{p \times p}$. The true regression coefficients are $\beta = (\beta_1, \ldots, \beta_p)^\top = (0.4, 0.3, 0.2, 0.1)^\top$. The true survival outcomes $(\Delta_i, T_i^*)$ are generated according to the assumed hazard rate $h_0(\cdot)$ and regression coefficients $\beta$. The monitoring time sequences are generated such that $T_{ij} - T_{i,j-1}$ follows an exponential distribution with mean $\lambda$, where $n_i = 20$, $1 \le j \le n_i$, $T_{i0} = 0$, and the $T_{ij}$ are rounded to multiples of 0.01. We considered five different cases. In Cases (I), (II), and (III), we let the sample size be $m = 500, 1000, 2000$, respectively, with $\lambda = 0.2$. In Cases (IV) and (V), we let $\lambda = 0.15, 0.25$, respectively, with $m = 1000$. All simulations are repeated 500 times. We compare the following five methods. The proposed methods that approximate the log-likelihood function by zeroth- and first-order Taylor expansions are denoted by "0th Order Approx" and "1st Order Approx". The method that directly maximizes the log-likelihood function without approximation is denoted by "No Approx". The standard Cox proportional hazards model that does not account for interval censoring is denoted by "Cox PH". The method in the R package "icenReg", which implements a state-of-the-art gradient ascent algorithm for fitting interval-censored data, is denoted by "Package icenReg". In Table A1, we summarize the results by evaluating the biases and the root-mean-squared error (RMSE) of the estimated regression coefficients $\beta$. In addition, we report the median and interquartile range of the CPU times and of the number of iterations taken to converge.
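The event-generation step of this design can be sketched by drawing one Bernoulli variable per sub-interval with the complementary log-log probability above (a simplified Python illustration of the setup just described, not the authors' simulation code):

```python
import math
import random

def simulate_event_interval(beta, x, rng, Nt=100):
    """Return the 1-based index of the sub-interval containing T*, or
    None if no event occurs within the design interval (0, 1]."""
    rel_risk = math.exp(sum(b * xv for b, xv in zip(beta, x)))
    for k in range(1, Nt + 1):
        h0k = 0.03 * math.sin(math.pi * k / Nt)  # h0(k) = 0.03 sin(pi t)
        if rng.random() < 1.0 - math.exp(-h0k * rel_risk):
            return k
    return None

rng = random.Random(0)
beta = [0.4, 0.3, 0.2, 0.1]
events = sum(
    simulate_event_interval(beta, [rng.gauss(0.0, 1.0) for _ in range(4)], rng)
    is not None
    for _ in range(1000)
)
print(events)  # most individuals experience an event within (0, 1]
```

With this hazard, the cumulative baseline hazard over $(0, 1]$ is close to 1.9, so the large majority of simulated individuals experience the event before the end of the design interval.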

TABLE A1.

Estimation results and computational performance of the five methods used in modeling interval-censored single-event data. The accuracy of the estimation is evaluated by measuring the biases and RMSE of the estimated regression coefficients, with the corresponding standard errors in parentheses. The computational performance is evaluated by the median CPU times in seconds and median number of iterations required to converge, along with the corresponding inter-quartile ranges in brackets.

Method Bias(β^1) Bias(β^2) Bias(β^3) Bias(β^4) RMSE(β^) CPU Time (Seconds) Iterations

Case (I)
 0th Order Approx −0.038 (0.002) −0.031 (0.002) −0.021 (0.002) −0.006 (0.002) 0.110 (0.002) 0.21 [0.20,0.21] 5 [5,5]
 1st Order Approx −0.034 (0.002) −0.028 (0.002) −0.019 (0.002) −0.005 (0.002) 0.109 (0.002) 0.22 [0.21,0.22] 5 [5,5]
 No Approx 0.009 (0.003) 0.004 (0.002) 0.002 (0.002) 0.005 (0.002) 0.109 (0.002) 46.41 [39.64,54.79] 940 [832,1046]
 Cox PH −0.043 (0.003) −0.034 (0.002) −0.024 (0.002) −0.007 (0.002) 0.120 (0.002) 0.01 [0.01,0.01] 3 [3,3]
 Package icenReg 0.009 (0.003) 0.005 (0.002) 0.002 (0.002) 0.006 (0.002) 0.110 (0.002) 0.09 [0.08,0.09] 10 [9,11]
Case (II)
 0th Order Approx −0.045 (0.002) −0.033 (0.002) −0.020 (0.001) −0.012 (0.002) 0.091 (0.001) 0.49 [0.48,0.49] 5 [5,5]
 1st Order Approx −0.041 (0.002) −0.030 (0.002) −0.018 (0.002) −0.011 (0.002) 0.088 (0.001) 0.51 [0.50,0.52] 5 [5,5]
 No Approx 0.002 (0.002) 0.002 (0.002) 0.003 (0.002) 0.000 (0.002) 0.076 (0.001) 100.52 [69.91,115.81] 957 [885,1034]
 Cox PH −0.051 (0.002) −0.037 (0.002) −0.024 (0.002) −0.014 (0.002) 0.100 (0.001) 0.02 [0.02,0.02] 3 [3,3]
 Package icenReg 0.002 (0.002) 0.002 (0.002) 0.003 (0.002) 0.000 (0.002) 0.077 (0.001) 0.75 [0.73,0.78] 8 [8,8]
Case (III)
 0th Order Approx −0.043 (0.001) −0.035 (0.001) −0.024 (0.001) −0.009 (0.001) 0.077 (0.001) 0.76 [0.75,0.77] 5 [5,5]
 1st Order Approx −0.039 (0.001) −0.032 (0.001) −0.022 (0.001) −0.008 (0.001) 0.074 (0.001) 0.79 [0.78,0.80] 5 [5,5]
 No Approx 0.002 (0.001) −0.001 (0.001) −0.002 (0.001) 0.002 (0.001) 0.054 (0.001) 137.35 [124.93,146.15] 970 [921,1024]
 Cox PH −0.049 (0.001) −0.039 (0.001) −0.027 (0.001) −0.011 (0.001) 0.086 (0.001) 0.02 [0.02,0.02] 3 [3,3]
 Package icenReg 0.002 (0.001) −0.001 (0.001) −0.002 (0.001) 0.002 (0.001) 0.054 (0.001) 0.43 [0.38,0.48] 12 [11,14]
Case (IV)
 0th Order Approx −0.033 (0.002) −0.023 (0.002) −0.014 (0.001) −0.008 (0.002) 0.081 (0.001) 0.69 [0.62,0.71] 5 [5,5]
 1st Order Approx −0.028 (0.002) −0.020 (0.002) −0.011 (0.001) −0.007 (0.002) 0.079 (0.001) 0.76 [0.66,0.78] 5 [5,5]
 No Approx 0.002 (0.002) 0.002 (0.002) 0.004 (0.002) 0.000 (0.002) 0.075 (0.001) 118.86 [99.55,129.15] 918 [849,987]
 Cox PH −0.036 (0.002) −0.025 (0.002) −0.015 (0.002) −0.010 (0.002) 0.086 (0.001) 0.01 [0.01,0.01] 3 [3,3]
 Package icenReg 0.002 (0.002) 0.003 (0.002) 0.004 (0.002) 0.000 (0.002) 0.075 (0.001) 0.76 [0.74,0.79] 8 [8,8]
Case (V)
 0th Order Approx −0.054 (0.001) −0.040 (0.001) −0.025 (0.001) −0.014 (0.002) 0.099 (0.001) 0.41 [0.41,0.42] 5 [5,5]
 1st Order Approx −0.051 (0.001) −0.038 (0.002) −0.023 (0.001) −0.013 (0.002) 0.096 (0.001) 0.43 [0.42,0.43] 5 [5,5]
 No Approx 0.003 (0.002) 0.003 (0.002) 0.004 (0.002) 0.001 (0.002) 0.078 (0.001) 144.31 [118.43,157.38] 1015 [937,1080]
 Cox PH −0.062 (0.002) −0.046 (0.002) −0.029 (0.002) −0.017 (0.002) 0.111 (0.001) 0.01 [0.01,0.01] 3 [3,3]
 Package icenReg 0.003 (0.002) 0.002 (0.002) 0.004 (0.002) 0.001 (0.002) 0.078 (0.001) 0.16 [0.15,0.17] 9 [9,10]

Here we briefly summarize the results. In terms of RMSE, "Cox PH" has the largest RMSE, implying that parameter estimates from models that do not account for interval censoring are not as efficient as those from methods that do. Among the three methods proposed in this manuscript ("0th Order Approx", "1st Order Approx", and "No Approx"), "No Approx" gives the estimates with the smallest RMSE while "0th Order Approx" gives the largest RMSE in most of the cases considered here. Since "1st Order Approx" improves the order of approximation over "0th Order Approx", it gives a smaller RMSE than "0th Order Approx". The RMSE of the parameter estimates by "Package icenReg" is close to that of "No Approx". As a state-of-the-art algorithm for fitting interval-censored data, "Package icenReg" demonstrates good computational efficiency overall, with short CPU times and convergence in around 10 iterations. The CPU times of "0th Order Approx" and "1st Order Approx" are comparable to those of "Package icenReg", and the algorithms take only around 5 iterations to converge. By contrast, "No Approx" is relatively slow and usually requires a large number of iterations to converge. The results show that the techniques used to approximate the log-likelihood and eliminate the nuisance parameters greatly facilitate the convergence of the algorithms. Among the three proposed methods, we find "1st Order Approx" promising, as it is computationally efficient enough to be extended to interval-censored multistate data, and its loss of estimation efficiency due to approximation is smaller than that of "0th Order Approx".

B. CONVERGENCE OF THE EM ALGORITHMS

B.1. Convergence of the EM Algorithm for Single-Event Models

We begin by demonstrating the convergence of the EM algorithm for single-event data, as it represents a more straightforward scenario. The convergence of the EM algorithm for multistate data can be obtained similarly with slight modifications. In this subsection, we follow the notations introduced in Appendix A.

We present the following proposition to show the convergence property of the EM algorithm.

Proposition 1

(Convergence of the EM algorithm for the single-event model). Let $A_i = (\delta_i, g_i)$ be the augmented data, $A = \{A_i\}_{i=1}^m$, and $\theta = (\beta, h_0)$ be the parameters in the model. Then under assumptions (a)–(d) listed below, there exists a neighborhood $\Theta$ of $\theta^*$ such that for any initial value $\theta^{(0)}$ in $\Theta$, the sequence of parameter estimates $\{\theta^{(k)}\}_{k=0}^\infty$ generated by the EM algorithm converges to the maximizer $\theta^*$, where $\theta^{(k+1)} = \arg\max_{\theta \in \Theta} Q(\theta \mid \theta^{(k)})$ and $Q(\tilde{\theta} \mid \theta) = E[l^*(\tilde{\theta}) \mid O, \theta]$.

  (a) Across all values of $i = 1, \ldots, m$, the event time processes $\{\delta_i(\mathcal{I}_j)\}_{j=1}^{N_t}$ and the processes of monitoring times $T_{i1} < \cdots < T_{in_i}$ are jointly independent. The processes of monitoring times are independent of the parameters $\beta$ and $h_0$.

  (b) The survival times can be modeled by a discrete-time survival model with a complementary log-log link
$$P\{\delta_i(\mathcal{I}_j) = 1 \mid g_i(\mathcal{I}_j) = 1\} = 1 - \exp\{-h_0(j)\exp(\beta^\top x_i)\}.$$

  (c) The parameter space $\Omega$ is compact with non-empty interior, and the maximizer $\theta^*$ of the log-likelihood function $l(\theta)$ lies in the interior of $\Omega$.

  (d) $l(\theta)$ has finitely many stationary points in $\Omega$.

Proof of Proposition 1. First, we verify the following smoothness properties.

  (i) $l(\theta)$ has at least second-order continuous derivatives with respect to $\theta$.

  (ii) $Q(\tilde{\theta} \mid \theta)$ has at least second-order continuous derivatives with respect to both $\tilde{\theta}$ and $\theta$.

To show (i), we note that

$$l(\theta) = \sum_{i=1}^m \log f(O_i \mid \theta) = \sum_{i=1}^m \log \int_{\mathcal{A}} f(O_i \mid A_i) f(A_i \mid \theta)\, d\mu(A_i).$$

Since both $\delta_i$ and $g_i$ can only take values 0 and 1, $A_i$ is defined on a discrete measure space $\mathcal{A}$, and the integral over $\mathcal{A}$ reduces to a finite sum. As a consequence, the smoothness of $l(\theta)$ follows from the smoothness of $f(A_i \mid \theta)$. Property (i) follows from the fact that

$$f(A_i \mid \theta) = \prod_{j=1}^{N_t} p_{ij}^{\delta_i(\mathcal{I}_j)}(1 - p_{ij})^{1 - \delta_i(\mathcal{I}_j)}$$

has at least second-order continuous derivatives with respect to θ. Similarly, by Bayes rule, we can write

$$Q(\tilde{\theta} \mid \theta) = E[l^*(\tilde{\theta}) \mid O, \theta] = \sum_{i=1}^m \frac{\int_{\mathcal{A}} \log p(A_i \mid \tilde{\theta})\, p(O_i \mid A_i)\, p(A_i \mid \theta)\, d\mu(A_i)}{\int_{\mathcal{A}} p(O_i \mid A_i)\, p(A_i \mid \theta)\, d\mu(A_i)}.$$

For the same reason, the smoothness of $Q(\tilde{\theta} \mid \theta)$ in (ii) follows from the smoothness of $p(A_i \mid \tilde{\theta})$ and $p(A_i \mid \theta)$.

The smoothness properties imply that the derivatives of $l(\theta)$ and $Q(\tilde{\theta} \mid \theta)$ are 0 at their stationary points. The finiteness of the stationary points and the uniqueness of the maximizer imply that there exists some $\delta > 0$ such that for any $\theta$ in $\Theta = \{\theta : l(\theta) \ge l(\theta^*) - \delta\}$, $\nabla_\theta^2 l(\theta)$ is negative definite, and $\nabla_\theta l(\theta) = 0$ if and only if $\theta = \theta^*$. Here we remark that the finiteness of stationary points is needed to rule out the possibility that the observed data $O_i$ are non-informative about the underlying survival process indicated by $A_i$. In the extreme case that $p(O_i \mid A_i) = p(O_i)$ is completely non-informative about the underlying survival process, $l(\theta)$ is a constant and all $\theta$ in $\Omega$ are stationary points.

Next, we apply Theorem 6 of Wu39 to complete the proof. Suppose the initial value $\theta^{(0)}$ is in $\Theta$. By Theorem 1 of Dempster et al.22, $l(\theta^{(k)})$ is non-decreasing, so all subsequent $\theta^{(k)}$ are also in $\Theta$. The compactness of $\Omega$ and the continuity of $l(\theta)$ imply that $\Theta$ is a closed set. As a result, $\nabla_\theta^2 Q(\theta \mid \tilde{\theta})$ is bounded on $\Theta$ and there exists a constant $\lambda < 0$ such that all eigenvalues of $\nabla_\theta^2 Q(\theta \mid \tilde{\theta})$ are smaller than $\lambda$. Applying Taylor's expansion to $Q(\theta \mid \theta^{(k)})$ at $\theta = \theta^{(k+1)}$, we have

$$Q(\theta \mid \theta^{(k)}) = Q(\theta^{(k+1)} \mid \theta^{(k)}) + \frac{1}{2}(\theta - \theta^{(k+1)})^\top \left[\nabla_\theta^2 Q(\theta \mid \theta^{(k)})\right]_{\theta = \theta_0^{(k)}} (\theta - \theta^{(k+1)}),$$

where the first-order term vanishes because $[\nabla_\theta Q(\theta \mid \theta^{(k)})]_{\theta = \theta^{(k+1)}} = 0$ by the fact that $\theta^{(k+1)}$ maximizes $Q(\theta \mid \theta^{(k)})$, and $\theta_0^{(k)}$ is some point on the line segment joining $\theta^{(k)}$ and $\theta^{(k+1)}$. Therefore, we have

$$Q(\theta^{(k+1)} \mid \theta^{(k)}) - Q(\theta^{(k)} \mid \theta^{(k)}) \ge -\frac{\lambda}{2}(\theta^{(k+1)} - \theta^{(k)})^\top(\theta^{(k+1)} - \theta^{(k)}).$$

At this point, the remainder of the proof follows from Theorem 6 of Wu39.

B.2. Convergence of the EM Algorithm for Multistate Models

We also present the following proposition to show the convergence property of the EM algorithm for multistate models. In this subsection, we will follow the notations introduced in Section 2.

Proposition 2

(Convergence of the EM algorithm for multistate models). Let $A_i = (\delta_i, g_i)$ be the augmented data and $A = \{A_i\}_{i=1}^m$. Let $\theta = (\beta, h_0)$ be the parameters in the model, lying in a compact space $\Omega$ with non-empty interior. Then under assumptions (a'), (b'), (c) and (d), there exists a neighborhood $\Theta$ of $\theta^*$ such that for any initial value $\theta^{(0)}$ in $\Theta$, the sequence of parameter estimates $\{\theta^{(k)}\}_{k=0}^\infty$ generated by the EM algorithm converges to the maximizer $\theta^*$, where $\theta^{(k+1)} = \arg\max_{\theta \in \Theta} Q(\theta \mid \theta^{(k)})$ and $Q(\tilde{\theta} \mid \theta) = E[l^*(\tilde{\theta}) \mid O, \theta]$.

  • (a') Across all values of $i = 1, \ldots, m$, the event time processes $\{\delta_{i,s_1 s_2}(\mathcal{I}_j) : j = 1, \ldots, N_t,\ (s_1, s_2) \in E\}$ and the processes of monitoring times $T_{i1} < \cdots < T_{in_i}$ are jointly independent. The processes of monitoring times are independent of the parameters $\beta$ and $h_0$.

  • (b') The multistate data can be modeled by a discrete-time multistate model with a complementary log-log link
$$P\{\delta_{i,s_1 s_2}(\mathcal{I}_j) = 1 \mid g_{i,s_1}(\mathcal{I}_j) = 1\} = 1 - \exp\{-h_{0,s_1 s_2}(j)\exp(\beta_{s_1 s_2}^\top x_i)\}.$$

Proof of Proposition 2. The proof is omitted for brevity since it closely mirrors the proof of Proposition 1.

DATA AVAILABILITY STATEMENT

The data in the case study and the Julia code for implementing the proposed methods can be found online at https://github.com/luyouepiusf/approximation_method.

References

  • 1. Lindsey J. A study of interval censoring in parametric regression models. Lifetime Data Analysis 1998; 4(4): 329–354.
  • 2. Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society: Series B (Methodological) 1976; 38(3): 290–295.
  • 3. Finkelstein DM, Wolfe RA. A semiparametric model for regression analysis of interval-censored failure time data. Biometrics 1985: 933–945.
  • 4. Farrington C. Interval censored survival data: a generalized linear modelling approach. Statistics in Medicine 1996; 15(3): 283–292.
  • 5. Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 1987; 82(398): 528–540.
  • 6. Wang L, McMahan CS, Hudgens MG, Qureshi ZP. A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 2016; 72(1): 222–231.
  • 7. Marshall G, Jones RH. Multi-state models and diabetic retinopathy. Statistics in Medicine 1995; 14(18): 1975–1983.
  • 8. Satten GA, Longini IM. Markov chains with measurement error: Estimating the "true" course of a marker of the progression of human immunodeficiency virus disease. Journal of the Royal Statistical Society: Series C (Applied Statistics) 1996; 45(3): 275–295.
  • 9. Alioum A, Commenges D. MKVPCI: a computer program for Markov models with piecewise constant intensities and covariates. Computer Methods and Programs in Biomedicine 2001; 64(2): 109–119.
  • 10. Frydman H, Szarek M. Nonparametric estimation in a Markov "illness–death" process from interval censored observations with missing intermediate transition status. Biometrics 2009; 65(1): 143–151.
  • 11. Pak D, Li C, Todem D, Sohn W. A multistate model for correlated interval-censored life history data in caries research. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2017; 66(2): 413–423.
  • 12. Zhang H, Kelvin EA, Carpio A, Allen Hauser W. A multistate joint model for interval-censored event-history data subject to within-unit clustering and informative missingness, with application to neurocysticercosis research. Statistics in Medicine 2020; 39(23): 3195–3206.
  • 13. Sharples LD. Use of the Gibbs sampler to estimate transition rates between grades of coronary disease following cardiac transplantation. Statistics in Medicine 1993; 12(12): 1155–1169.
  • 14. Pan SL, Wu HM, Yen AMF, Chen THH. A Markov regression random-effects model for remission of functional disability in patients following a first stroke: a Bayesian approach. Statistics in Medicine 2007; 26(29): 5335–5353.
  • 15. Van Den Hout A, Matthews FE. Estimating dementia-free life expectancy for Parkinson's patients using Bayesian inference and microsimulation. Biostatistics 2009; 10(4): 729–743.
  • 16. Kneib T, Hennerfeind A. Bayesian semiparametric multi-state models. Statistical Modelling 2008; 8(2): 169–198.
  • 17. De Iorio M, Gallot N, Valcarcel B, Wedderburn L. A Bayesian semiparametric Markov regression model for juvenile dermatomyositis. Statistics in Medicine 2018; 37(10): 1711–1731.
  • 18. Huang J. Efficient estimation for the proportional hazards model with interval censoring. The Annals of Statistics 1996; 24(2): 540–568.
  • 19. Lawless JF. A note on interval-censored lifetime data and the constant-sum condition of Oller, Gómez & Calle (2004). Canadian Journal of Statistics 2004; 32(3): 327–331.
  • 20. Oller R, Gómez G, Calle ML. Interval censoring: identifiability and the constant-sum property. Biometrika 2007; 94(1): 61–70.
  • 21. Van Dyk DA, Meng XL. The art of data augmentation. Journal of Computational and Graphical Statistics 2001; 10(1): 1–50.
  • 22. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 1977; 39(1): 1–22.
  • 23. Meng XL, Van Dyk D. Fast EM-type implementations for mixed effects models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 1998; 60(3): 559–578.
  • 24. Murphy SA, Van Der Vaart AW. On profile likelihood. Journal of the American Statistical Association 2000; 95(450): 449–465.
  • 25.Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 1972; 34(2): 187–202. [Google Scholar]
  • 26.Groeneboom P, Wellner JA. Information bounds and nonparametric maximum likelihood estimation. 19. Springer Science & Business Media. 1992. [Google Scholar]
  • 27.Zhang Z, Sun J. Interval censoring. Statistical Methods in Medical Research 2010; 19(1): 53–70. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Marshall G, Guo W, Jones RH. MARKOV: A computer program for multi-state Markov models with covariables. Computer Methods and Programs in Biomedicine 1995; 47(2): 147–156. [DOI] [PubMed] [Google Scholar]
  • 29.Datta S, Satten GA. Estimating future stage entry and occupation probabilities in a multistage model based on randomly right-censored data. Statistics & Probability Letters 2000; 50(1): 89–95. [Google Scholar]
  • 30.Datta S, Satten GA, Datta S. Nonparametric estimation for the three-stage irreversible illness–death model. Biometrics 2000; 56(3): 841–847. [DOI] [PubMed] [Google Scholar]
  • 31.Jackson CH, Sharples LD, Thompson SG, Duffy SW, Couto E. Multistate Markov models for disease progression with classification error. Journal of the Royal Statistical Society: Series D (The Statistician) 2003; 52(2): 193–209. [Google Scholar]
  • 32.Gu Y, Zeng D, Heiss G, Lin DY. Maximum Likelihood Estimation for Semiparametric Regression Models with Interval-Censored Multistate Data. Biometrika 2023: asad073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Kalbfleisch J, Lawless JF. The analysis of panel data under a Markov assumption. Journal of the American Statistical Association 1985: 863–871. [Google Scholar]
  • 34.Van Den Hout A. Multi-State Survival Models for Interval-Censored Data. CRC Press. 2016. [Google Scholar]
  • 35.Tian L, Zucker D, Wei L. On the Cox model with time-varying regression coefficients. Journal of the American Statistical Association 2005; 100(469): 172–183. [Google Scholar]
  • 36.Sun J, Kopciuk KA, Lu X. Polynomial spline estimation of partially linear single-index proportional hazards regression models. Computational Statistics & Data Analysis 2008; 53(1): 176–188. [Google Scholar]
  • 37.Ma L, Hu T, Sun J. Cox regression analysis of dependent interval-censored failure time data. Computational Statistics & Data Analysis 2016; 103: 79–90. [Google Scholar]
  • 38.Finkelstein DM, Goggins WB, Schoenfeld DA. Analysis of failure time data with dependent interval censoring. Biometrics 2002; 58(2): 298–304. [DOI] [PubMed] [Google Scholar]
  • 39.Wu CJ. On the convergence properties of the EM algorithm. The Annals of Statistics 1983; 11(1): 95–103. [Google Scholar]


Supplementary Materials

Code
Supinfo

Data Availability Statement

The data in the case study and the Julia code for implementing the proposed methods can be found online at https://github.com/luyouepiusf/approximation_method.
