Author manuscript; available in PMC: 2025 May 30.
Published in final edited form as: Stat Med. 2024 Apr 10;43(12):2452–2471. doi: 10.1002/sim.10079

A Discrete Approximation Method for Modeling Interval-Censored Multistate Data

Lu You 1, Xiang Liu 1, Jeffrey Krischer 1
PMCID: PMC11109708  NIHMSID: NIHMS1984079  PMID: 38599784

Summary

Many longitudinal studies are designed to monitor participants for major events related to the progression of diseases. Data arising from such longitudinal studies are usually subject to interval censoring since the events are only known to occur between two monitoring visits. In this work, we propose a new method to handle interval-censored multistate data within a proportional hazards model framework where the hazard rate of events is modeled by a nonparametric function of time and the covariates affect the hazard rate proportionally. The main idea of this method is to simplify the likelihood functions of a discrete-time multistate model through an approximation and the application of data augmentation techniques, where the assumed presence of censored information facilitates a simpler parameterization. Then the expectation-maximization algorithm is used to estimate the parameters in the model. The performance of the proposed method is evaluated by numerical studies. Finally, the method is employed to analyze a dataset tracking the advancement of coronary allograft vasculopathy following heart transplantation.

Keywords: data augmentation, interval censoring, multistate model, proportional hazards model, time-to-event data

1 |. INTRODUCTION

The advancement of diseases can usually be described as several stages or states based on the etiology, severity, and presentation of diseases. For example, the stages of cancer are defined by the spread and size of cancerous tumors in tissues. The stages of Alzheimer’s disease are characterized by cognitive impairment, and the progression of Alzheimer’s disease is further complicated by various disease subtypes. Many clinical studies are designed to understand the natural history of disease progression. These studies follow individuals with elevated risks of disease and monitor the disease progression through scheduled clinical visits. The typical research questions that these studies aim to answer are whether the individuals are at high risk of progressing to the next state and what factors will accelerate or delay the state transitions. To analyze data from these studies, multistate models are an indispensable tool, especially when there are multiple disease states of interest.

In this research, we focus on the problem of interval censoring in multistate models. In clinical studies, the disease status of individuals is evaluated at their clinical visits, so usually the investigators only know that a change of status occurred sometime between two consecutive visits, and the exact times of state transitions are not known. In practice, some study individuals may not strictly adhere to the monitoring schedules, and missed visits are anticipated. As a consequence, the sequences of monitoring times are irregularly spaced and the problem of interval censoring is not negligible. In the literature, a variety of methods have been proposed to analyze interval-censored data in a single-event survival model. Lindsey1 considered a parametric likelihood-based method; Turnbull2, Finkelstein et al.3 and Farrington4 used nonparametric maximum likelihood estimation methods in semiparametric models; Tanner et al.5 considered a data augmentation algorithm using a multiple imputation method; Wang et al.6 considered a two-stage data augmentation method using a latent Poisson process. Interval censoring in multistate models is usually a more complex problem. Unlike single-event models, which have at most one unobserved event per interval, multistate models allow multiple unobserved events to occur within a censoring interval, where both the number of events and the actual event sequence are unknown. Because of the difficulties mentioned above, many multistate models in the literature impose restrictive assumptions on the structure of the multistate model or the distribution of state transition times.
In the frequentist framework, both Marshall7 and Satten8 considered using stationary Markov chains to model the multistate data; Alioum and Commenges9 used piecewise-constant intensities in Markov models to relax the stationarity assumptions; Frydman and Szarek10 considered nonparametric estimation of transition intensities in a three-state “illness-death” model; Pak et al.11 considered the semiparametric estimation of a progressive three-state model; Zhang and Sun12 considered a Monte Carlo expectation-maximization algorithm in the semiparametric estimation of a four-state model with informative missingness. In the Bayesian framework, multistate models have been considered by Sharples13, Pan14, Van Den Hout and Matthews15, Kneib and Hennerfeind16 and De Iorio et al.17 with different prior specifications and model structures, and all these methods deal with the censored information by sampling from the posterior distribution of the latent variables using Markov chain Monte Carlo. In sum, the existing methods are usually limited by restrictive assumptions (e.g., stationarity and parametric forms), limited flexibility to accommodate complicated multistate structures, and computationally intensive algorithms (e.g., constrained optimization and Monte Carlo methods). To address the problems mentioned above, we propose a new method for fitting interval-censored multistate models with arbitrary model structures. The proposed method is semiparametric: the transition hazards are a nonparametric function of time, and model covariates affect the transition hazards proportionally as in a proportional hazards model. Parameters in the model can be estimated using an expectation-maximization algorithm.

In this paper, we will propose a novel method for fitting interval-censored multistate data. The proposed method applies an approximation to reduce the number of parameters in the likelihood and increase the computational efficiency of the algorithm. The remainder of the paper is organized as follows. In Section 2, we will introduce the proposed method along with the model estimation technique. Some simulation studies for evaluating the proposed method are presented in Section 3. A real data application to the heart transplant data is presented in Section 4. Section 5 concludes the paper with some discussions and directions for future research. Appendix A presents the proposed method in the single-event survival models as a special case of the multistate model, and some technical details about the model estimation are provided in Appendix B.

2 |. METHOD

2.1 |. Data and Notations

In this section, we will give a full description of the method for modeling multistate interval-censored data. Let us first consider a typical multistate interval-censored dataset. The relationship between different states can be described by a directed graph $(V,E)$, where the set of vertices $V$ represents the collection of $N_s$ states and the set of directed edges $E$ represents all possible transitions from one state to the next. In this paper, we will number the states by $1,2,\dots,N_s$ (i.e., $V=\{1,\dots,N_s\}$), and $(s_1,s_2)\in E$ if and only if the individuals can make a transition from state $s_1$ to state $s_2$. We will let $\mathcal{N}(s_1)=\{s_2:(s_1,s_2)\in E\}$ denote all states that can be transitioned to starting from state $s_1$. Suppose that there are $m$ individuals in the dataset. The $i$th individual is sequentially monitored at the time sequence $0=T_{i0}<T_{i1}<T_{i2}<\cdots<T_{in_i}$ for the occurrences of transitions. Here we let $S_i(t)$ be the state occupied by the $i$th individual at time $t$, and $S_{ij}=S_i(T_{ij})$. The exact times of state transitions are not directly observed and can only be inferred from the observed data. We let $(0,\mathcal{T}]$ be the design interval, where $\mathcal{T}$ can be chosen arbitrarily large to cover all observation times; thus the design interval $(0,\mathcal{T}]$ does not have to be specified a priori. Next, we introduce some additional notations that are needed to describe the discrete-time multistate model in this paper. Different from other discrete-time survival models that assume that event time distributions have discrete masses over the design interval, in this paper, we will formulate our model in terms of intervals instead of discrete times. Such consideration helps simplify the analysis of interval-censored data and the interpretation of results. The design interval $(0,\mathcal{T}]$ is discretized into a union of $N_t$ disjoint small intervals $\mathcal{I}_1=(t_0,t_1], \mathcal{I}_2=(t_1,t_2], \dots, \mathcal{I}_{N_t}=(t_{N_t-1},t_{N_t}]$, where the time sequence $t_0,\dots,t_{N_t}$ encompasses all observation times $\{T_{ij}: i=1,\dots,m;\ j=1,\dots,n_i\}$.
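As a concrete illustration of the discretization step, the following Python sketch (hypothetical helper name; not from the paper) builds the interval partition $\mathcal{I}_1,\dots,\mathcal{I}_{N_t}$ from the pooled observation times, using the sorted unique times as the grid so that every $T_{ij}$ is an interval endpoint.

```python
import numpy as np

def build_partition(obs_times):
    """Discretize the design interval into I_j = (t_{j-1}, t_j].

    The only requirement in the discrete-time formulation is that the grid
    t_0 < t_1 < ... < t_{N_t} contains every observation time T_ij; the
    simplest valid choice is the sorted unique observation times with the
    origin t_0 = 0 prepended.
    """
    grid = np.unique(np.concatenate(([0.0], np.asarray(obs_times, dtype=float))))
    # Each interval is the half-open span between consecutive grid points.
    return [(float(a), float(b)) for a, b in zip(grid[:-1], grid[1:])]

intervals = build_partition([0.5, 1.0, 0.25, 1.0])
```

With pooled visit times {0.25, 0.5, 1.0}, this yields the three intervals (0, 0.25], (0.25, 0.5], and (0.5, 1.0].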
Here we let $\delta_{i,s_1s_2}(t)$ be the event indicator process such that $\delta_{i,s_1s_2}(t)=1$ if the $i$th individual transitions from state $s_1$ to $s_2$ at time $t$ and 0 otherwise. Let $g_{i,s_1}(t)$ be the at-risk process such that $g_{i,s_1}(t)=1$ if the $i$th individual is at risk of transitioning out of state $s_1$ at time $t$. For ease of exposition, we introduce the following notations to allow $\delta_{i,s_1s_2}(\cdot)$ and $g_{i,s_1}(\cdot)$ to be functions of intervals. For an interval $\mathcal{I}_j$, we let $\delta_{i,s_1s_2}(\mathcal{I}_j)=\max_{t\in\mathcal{I}_j}\delta_{i,s_1s_2}(t)$, which means $\delta_{i,s_1s_2}(\mathcal{I}_j)=1$ if and only if the transition from state $s_1$ to $s_2$ is observed before censoring and in the interval $\mathcal{I}_j$. Similarly, we let $g_{i,s_1}(\mathcal{I}_j)=\max_{t\in\mathcal{I}_j}g_{i,s_1}(t)$, which means $g_{i,s_1}(\mathcal{I}_j)=1$ if and only if the individual is at risk of transitioning out of state $s_1$ before entering the interval $\mathcal{I}_j$ and has not been censored in the interval $\mathcal{I}_j$. Let $x_i$ be a $p$-dimensional vector of covariates; we assume the following discrete-time multistate model with a complementary log-log link

$$P\{\delta_{i,s_1s_2}(\mathcal{I}_j)=1 \mid g_{i,s_1}(\mathcal{I}_j)=1\} = 1-\exp\{-h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\},$$

where $\beta_{s_1s_2}$ is a $p$-dimensional vector of regression coefficients and $h_{0,s_1s_2}=\{h_{0,s_1s_2}(\mathcal{I}_j)\}_{j=1}^{N_t}$ are parameters that characterize the rate of transitions in each interval $\mathcal{I}_j$. Many existing works have pointed out the similarity between discrete-time survival models with a complementary log-log link and continuous-time survival models.3,18 In discrete-time survival models, the probability mass of event time distributions is assumed to be placed on the observed event times, while the proposed method allocates the mass of transition time distributions to intervals. $h_{0,s_1s_2}(\mathcal{I}_j)$ can similarly be interpreted as the cause-specific baseline hazard in competing risks models, and $h_{0,s_1s_2}$ is usually treated as a nuisance parameter when the focus of statistical inference is on the regression coefficient $\beta_{s_1s_2}$. We will further explore the relationship between the proposed model and the Cox proportional hazards model in the next section. In this manuscript, we will assume that the observation time process $T_{i0}<T_{i1}<T_{i2}<\cdots<T_{in_i}$ is independent of the event time process $\{\delta_{i,s_1s_2}(\mathcal{I}_j)\}_{j=1,\dots,N_t;(s_1,s_2)\in E}$, which is usually referred to as the “independent inspection time” model. As noted by Lawless, the “independent inspection time” model satisfies the constant sum condition by Oller et al.19,20
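To make the model concrete, here is a minimal sketch (function names are ours, not the paper's) of the per-interval transition probability under the complementary log-log link, together with the link-scale identity $\log\{-\log(1-p)\}=\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i$ that motivates the name of the link.

```python
import numpy as np

def transition_prob(h0_j, beta, x):
    """P{delta_{i,s1s2}(I_j) = 1 | g_{i,s1}(I_j) = 1}
    = 1 - exp(-h0_{s1s2}(I_j) * exp(beta' x))."""
    return 1.0 - np.exp(-h0_j * np.exp(np.dot(beta, x)))

# Complementary log-log of the probability is linear in the covariates:
p = transition_prob(0.05, np.array([0.6, 0.2]), np.array([1.0, -1.0]))
cloglog = np.log(-np.log(1.0 - p))  # equals log(h0_j) + beta' x
```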

2.2 |. The Observed and Complete Likelihood

Following the frequentist inference approach, we estimate parameters by maximizing the observed likelihood, which is the likelihood function given all the observed data. However, due to interval censoring, the observed likelihood takes a complicated form that is hard to deal with. To overcome the difficulty, we will utilize the technique of data augmentation, as described by Van Dyk and Meng21. The data augmentation technique assumes the existence of certain unknown parameters and data to create an augmented dataset, which allows us to derive a complete likelihood function of a simpler form. This approach is particularly useful when the statistical problem is complicated by unobserved data and missing information.22,23 According to the theory of the EM algorithm, maximum likelihood estimation can be achieved by iteratively maximizing the expected complete log-likelihood, enabling us to work directly with the complete likelihood instead of the observed likelihood.22 Throughout this paper, we will follow this idea to develop the method for handling interval-censored data. In this section, we will first derive the observed and complete likelihood functions. In addition, we propose to apply an approximation to the complete likelihood function to further simplify the parameter estimation and numerical computation of the method. The benefits of utilizing the complete likelihood and the approximation will be demonstrated in this section.

We let $\delta_i=\{\delta_{i,s_1s_2}(\mathcal{I}_j)\}_{j=1,\dots,N_t;(s_1,s_2)\in E}$ and $g_i=\{g_{i,s_1}(\mathcal{I}_j)\}_{j=1,\dots,N_t;(s_1,s_2)\in E}$ denote the underlying event and at-risk processes, $O_i=(\{T_{ij}\}_{j=1}^{n_i},\{S_{ij}\}_{j=1}^{n_i})$ denote the observed data, and $\beta=\{\beta_{s_1s_2}\}_{(s_1,s_2)\in E}$ and $h_0=\{h_{0,s_1s_2}(\mathcal{I}_j)\}_{j=1,\dots,N_t;(s_1,s_2)\in E}$ denote the collection of parameters. To simplify the notations, we will let $[A]_{s_1,s_2}$ be the $(s_1,s_2)$-th element of a matrix $A$. The log probability density function of the observed data $O_i$ is

$$\log f(O_i \mid \beta,h_0) = \sum_{j=1}^{n_i} \log P\{S_{ij}=S_i(T_{ij}) \mid S_{i,j-1}=S_i(T_{i,j-1})\} = \sum_{j=1}^{n_i} \log \bigg[\prod_{j':\,\mathcal{I}_{j'} \subset (T_{i,j-1},T_{ij}]} P_{ij'}\bigg]_{S_{i,j-1},S_{ij}},$$

where Pij is the stochastic matrix given by

$$[P_{ij}]_{s_1,s_2} = \begin{cases} 1-\exp\{-h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\} & \text{if } s_1 \neq s_2, \\ 1-\sum_{s:\,s\neq s_1} [P_{ij}]_{s_1,s} & \text{if } s_1=s_2. \end{cases}$$

Therefore, the observed log-likelihood function for $\beta$ and $h_0$ is $l(\beta,h_0)=\sum_{i=1}^m \log f(O_i \mid \beta,h_0)$. Calculating the derivatives of $l(\beta,h_0)$ with respect to $\beta$ and $h_0$ will require much effort because it involves multiplication of matrices that depend on $\beta$ and $h_0$. However, we can construct augmented data that give a simple complete likelihood function by assuming complete knowledge of $\delta_i$ and $g_i$. Given the values of $\delta_i$ and $g_i$, we can easily write down the joint log probability density function of $g_i$, $\delta_i$ and the observed data $O_i$

$$\log f(O_i,\delta_i,g_i \mid \beta,h_0) = \log f(O_i \mid \delta_i,g_i) + \log f(\delta_i,g_i \mid \beta,h_0) = \log f(O_i \mid \delta_i,g_i) + \sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\Big[\delta_{i,s_1s_2}(\mathcal{I}_j)\log\big(\exp\{h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\}-1\big) - h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\Big].$$

We note that $f(O_i \mid \delta_i,g_i)$ is the probability density function that specifies the probability of observing the interval-censored outcomes given the true event process. By our assumption that the observation process is independent of the event process, $f(O_i \mid \delta_i,g_i)$ does not depend on the parameters $\beta$ and $h_0$. The complete log-likelihood function for $\beta$ and $h_0$ is then given by

$$l^*(\beta,h_0)=\sum_{i=1}^m \log f(O_i,\delta_i,g_i \mid \beta,h_0),$$

where the term $\log f(O_i \mid \delta_i,g_i)$, which does not depend on $\beta$ and $h_0$, can be dropped. We can see from the expression for $l^*(\beta,h_0)$ that taking derivatives of the complete likelihood is as easy as taking derivatives of the likelihood function of a complementary log-log model.
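As a sanity check on this form, each $(i,j,s_1\to s_2)$ term $g[\delta\log(\exp\{z\}-1)-z]$, with $z=h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)$, is algebraically identical to the Bernoulli log-likelihood $\delta\log p+(1-\delta)\log(1-p)$ with $p=1-\exp(-z)$. The sketch below (our own illustration, not the paper's code) verifies this numerically.

```python
import numpy as np

def complete_term(delta, z):
    """One term of the complete log-likelihood: delta*log(e^z - 1) - z."""
    return delta * np.log(np.expm1(z)) - z

def bernoulli_loglik(delta, z):
    """Equivalent Bernoulli form with success probability p = 1 - exp(-z)."""
    p = -np.expm1(-z)  # 1 - exp(-z), computed stably
    return delta * np.log(p) + (1 - delta) * np.log(1 - p)

# The two forms agree for any event indicator and any positive rate z.
for z in (0.01, 0.5, 2.0):
    for delta in (0, 1):
        assert np.isclose(complete_term(delta, z), bernoulli_loglik(delta, z))
```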

2.3 |. An Approximation Strategy for Attaining the Partial Likelihood

Next, we will apply an approximation to the complete likelihood function that will eventually make the nuisance parameters $h_0$ implicit in the likelihood function. The result of the approximation resembles the partial likelihood of the Cox cause-specific hazard multistate model, and we will see the connections between the proposed discrete-time survival model and the continuous-time Cox proportional hazards models. Let $z=h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)$ be the rate of transition, and consider the following Taylor expansion

$$\log\{\exp(z)-1\} = \log(z) + \frac{z}{2} + \frac{z^2}{24} + O(z^3),$$

which holds for small, positive values of $z$, that is, when the intervals $\mathcal{I}_j$ are small enough. If we apply the Taylor expansion to the zeroth order, $\log\{\exp(z)-1\} \approx \log(z)$, the complete log-likelihood function $l^*(\beta,h_0)$ can be approximated by

$$l_0^*(\beta,h_0)=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\Big[\delta_{i,s_1s_2}(\mathcal{I}_j)\big\{\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i\big\} - h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\Big].$$

We can re-write l0*β,h0 as a partial log-likelihood for β, following the arguments by Murphy and Van der Vaart.24 Given the importance of this technique in the context of this paper, we provide a detailed derivation in this section. Notice that when the complete likelihood is maximized, the following equation holds

$$\frac{\partial l_0^*(\beta,h_0)}{\partial h_{0,s_1s_2}(\mathcal{I}_j)}=0.$$

Solving the equation, we can derive the following expression for $h_{0,s_1s_2}(\mathcal{I}_j)$

$$h_{0,s_1s_2}(\mathcal{I}_j)=\frac{\sum_{i=1}^m g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)}{\sum_{i=1}^m g_{i,s_1}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)},\quad s_1\neq s_2.$$

After we plug the above expression back into $l_0^*(\beta,h_0)$, $l_0^*(\beta,h_0)$ can be written as a partial likelihood function that is free of the nuisance parameters $h_0$

$$pl_0^*(\beta)=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)\Big[\beta_{s_1s_2}'x_i-\log\sum_{l=1}^m g_{l,s_1}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)\Big].$$

The zeroth order approximation reveals that the complete log-likelihood function can be expressed in a form that closely resembles the partial likelihood function of the Cox cause-specific hazard multistate model.25 Similarly, we can apply the first order approximation, $\log\{\exp(z)-1\} \approx \log(z)+z/2$, to $l^*(\beta,h_0)$, which reduces the log-likelihood function to

$$l_1^*(\beta,h_0)=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\Big[\delta_{i,s_1s_2}(\mathcal{I}_j)\big\{\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i\big\} - \Big(1-\frac{\delta_{i,s_1s_2}(\mathcal{I}_j)}{2}\Big) h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\Big].$$

We can similarly derive the partial log-likelihood form using the same technique. To re-write the complete log-likelihood function as a partial log-likelihood function, we can solve for

$$\frac{\partial l_1^*(\beta,h_0)}{\partial h_{0,s_1s_2}(\mathcal{I}_j)}=0,$$

which will give us the following expression for $h_{0,s_1s_2}(\mathcal{I}_j)$,

$$h_{0,s_1s_2}(\mathcal{I}_j)=\frac{\sum_{i=1}^m g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)}{\sum_{i=1}^m g_{i,s_1}(\mathcal{I}_j)\{1-\delta_{i,s_1s_2}(\mathcal{I}_j)/2\}\exp(\beta_{s_1s_2}'x_i)}.$$

After plugging the expression for $h_{0,s_1s_2}(\mathcal{I}_j)$ back into the complete log-likelihood $l_1^*(\beta,h_0)$, we obtain the corresponding partial log-likelihood function

$$pl_1^*(\beta)=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E} g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)\bigg[\beta_{s_1s_2}'x_i-\log\sum_{l=1}^m g_{l,s_1}(\mathcal{I}_j)\Big(1-\frac{\delta_{l,s_1s_2}(\mathcal{I}_j)}{2}\Big)\exp(\beta_{s_1s_2}'x_l)\bigg].$$

We note that the partial log-likelihood form cannot be obtained for approximations of second order or higher; due to limitations on space, we omit the details. In this manuscript, we focus on the first-order approximation because it tends to improve the efficiency of parameter estimation compared to the zeroth-order approximation, as discussed in Appendix A.5, and because the simple partial log-likelihood form associated with the first-order approximation significantly simplifies the problem and numerical computations.
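The quality of the zeroth- and first-order approximations can be checked numerically. The sketch below (illustrative code, not from the paper) compares $\log\{\exp(z)-1\}$ with its truncations; the zeroth-order error behaves like $z/2$ while the first-order error behaves like $z^2/24$, which is why shrinking the intervals makes the approximation increasingly accurate.

```python
import numpy as np

def expansion_error(z, order):
    """Absolute error of truncating log(e^z - 1) = log z + z/2 + z^2/24 + ..."""
    exact = np.log(np.expm1(z))
    approx = np.log(z)
    if order >= 1:
        approx += z / 2.0
    return abs(exact - approx)

# The first-order truncation is far more accurate for small z (short intervals).
e0 = expansion_error(0.01, order=0)  # roughly z/2 = 5e-3
e1 = expansion_error(0.01, order=1)  # roughly z^2/24, about 4e-6
```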

2.4 |. Model Estimation by EM Algorithm

In the literature on interval-censored single-event data, there are two major types of methods for parameter estimation: the EM algorithm introduced by Turnbull2, originally known as the self-consistency algorithm, and the iterative convex minorant algorithm introduced by Groeneboom and Wellner26. The self-consistency algorithm has a concise form that makes it easy to implement, but it may converge slowly, especially when the nonparametric component $h_0$ has a large number of parameters.27 The iterative convex minorant algorithm is a gradient-based method that optimizes the observed likelihood function directly and is typically much faster. However, the implementation is complicated due to the high-dimensional parameter space and the need to satisfy bound constraints. Many existing works on interval-censored multistate data have relied on optimization algorithms to directly maximize the observed likelihood.8,9,28 Recently, Wang et al.6 discovered a novel use of the EM algorithm by considering a two-stage data augmentation using Poisson processes to improve the self-consistency algorithm. Motivated by this method, we present an EM algorithm for estimating the parameters of the proposed model in this section. We have already set up our problem to apply the EM algorithm in the previous sections. The complete likelihood based on the augmented data has a simple form that is convenient to work with. The approximation we apply leads to a partial likelihood function that is free of the nuisance parameters $h_0$. Therefore, it has the potential to overcome the slow convergence of the self-consistency algorithm, especially when the dimension of $h_0$ is high. The theory of the EM algorithm by Dempster et al.22 suggests that maximizing the observed log-likelihood can be achieved by iteratively maximizing the expected complete log-likelihood given the observed data.
In the calculation of the expected complete log-likelihood, we will adopt the method of fractional re-weighted at-risk process proposed by Datta and Satten29 when the survival status of the individual is unknown. Combining all the techniques mentioned above, we will be able to derive an algorithm that is easy to implement and also computationally efficient.

The EM algorithm primarily consists of the expectation step and the maximization step. In the expectation step, we are concerned with evaluating the expectation of the complete log-likelihood given the observed data and current parameter estimates. To simplify the notations, we let $\tilde{P}(\cdot)$ and $\tilde{E}[\cdot]$ be the conditional probability and conditional expectation of a random variable given all observed data $\{O_i\}_{i=1}^m$, and $\tilde{P}_i(\cdot)$ and $\tilde{E}_i[\cdot]$ be the conditional probability and conditional expectation of a random variable given all observed data from individual $i$, $O_i$. The expected complete log-likelihood given the observed data is given by

$$\tilde{E}[l_1^*(\beta,h_0)]=\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E}\bigg[\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]\big\{\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i\big\}-\Big(\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)]-\frac{\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]}{2}\Big)h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\bigg].$$

The quantities that we need to evaluate are $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]$ and $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)]$. Since $g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)$ and $g_{i,s_1}(\mathcal{I}_j)$ can only take values of 0 and 1, we have

$$\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]=\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)=1\}$$

and

$$\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)]=\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)=1\}.$$

The above conditional probabilities can be calculated by some routine manipulations of stochastic matrices. Given $\beta$ and $h_0$, we can construct a stochastic matrix $P_{ij}$ representing the transition probabilities in the interval $\mathcal{I}_j$ for the $i$th individual, where $[P_{ij}]_{s_1,s_2}$ is the probability of transitioning from state $s_1$ to $s_2$ when $s_1\neq s_2$, and $[P_{ij}]_{s_1,s_2}$ is the probability of remaining in state $s_1$ when $s_1=s_2$. Namely,

$$[P_{ij}]_{s_1,s_2} = \begin{cases} 1-\exp\{-h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)\} & \text{if } s_1 \neq s_2, \\ 1-\sum_{s:\,s\neq s_1} [P_{ij}]_{s_1,s} & \text{if } s_1=s_2. \end{cases}$$

Let $(T_{i,l-1},T_{il}]$ be the unique interval among $(T_{i0},T_{i1}],(T_{i1},T_{i2}],\dots,(T_{i,n_i-1},T_{in_i}]$ that contains $\mathcal{I}_j$. To calculate the above conditional probabilities,

$$\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)=1\}=P\{S_i(t_{j-1})=s_1,S_i(t_j)=s_2 \mid S_i(T_{i,l-1})=S_{i,l-1},S_i(T_{il})=S_{il}\}=\frac{P\{S_i(T_{i,l-1})=S_{i,l-1},S_i(t_{j-1})=s_1,S_i(t_j)=s_2,S_i(T_{il})=S_{il}\}}{P\{S_i(T_{i,l-1})=S_{i,l-1},S_i(T_{il})=S_{il}\}}=\frac{\Big[\prod_{j':\mathcal{I}_{j'}\subset(T_{i,l-1},t_{j-1}]}P_{ij'}\Big]_{S_{i,l-1},s_1}[P_{ij}]_{s_1,s_2}\Big[\prod_{j':\mathcal{I}_{j'}\subset(t_j,T_{il}]}P_{ij'}\Big]_{s_2,S_{il}}}{\Big[\prod_{j':\mathcal{I}_{j'}\subset(T_{i,l-1},T_{il}]}P_{ij'}\Big]_{S_{i,l-1},S_{il}}}.$$

Similarly,

$$\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)=1\}=P\{S_i(t_{j-1})=s_1 \mid S_i(T_{i,l-1})=S_{i,l-1},S_i(T_{il})=S_{il}\}=\frac{P\{S_i(T_{i,l-1})=S_{i,l-1},S_i(t_{j-1})=s_1,S_i(T_{il})=S_{il}\}}{P\{S_i(T_{i,l-1})=S_{i,l-1},S_i(T_{il})=S_{il}\}}=\frac{\Big[\prod_{j':\mathcal{I}_{j'}\subset(T_{i,l-1},t_{j-1}]}P_{ij'}\Big]_{S_{i,l-1},s_1}\Big[\prod_{j':\mathcal{I}_{j'}\subset(t_{j-1},T_{il}]}P_{ij'}\Big]_{s_1,S_{il}}}{\Big[\prod_{j':\mathcal{I}_{j'}\subset(T_{i,l-1},T_{il}]}P_{ij'}\Big]_{S_{i,l-1},S_{il}}}.$$

We can see that the expected complete log-likelihood $\tilde{E}[l_1^*(\beta,h_0)]$ is a re-weighted version of the complete log-likelihood $l_1^*(\beta,h_0)$, where the weights can be interpreted as the probability of observing certain information. The first term $\log h_{0,s_1s_2}(\mathcal{I}_j)+\beta_{s_1s_2}'x_i$ is weighted by $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]$, representing the expected values of the event process, and the second term $h_{0,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)$ is weighted by $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)-g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)/2]$, representing the expected values of the at-risk process. We can draw a parallel between the weight $\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)-g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)/2]$ and the risk sets determined by the probability that the individual is still at risk in the interval, as considered by others.29,30 When the survival status of an individual is unknown, the individual contributes a fractional weight to the risk set in the interval $\mathcal{I}_j$. The proposed method extends the idea of the fractional re-weighted at-risk process by Datta et al.30 from right-censored cases to interval-censored cases.
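The E-step probabilities above are just ratios of products of the per-interval matrices $P_{ij}$. The sketch below (our own illustration; the array layout and function names are assumptions) computes $\tilde{P}_i\{g_{i,s_1}(\mathcal{I}_j)=1\}$ for all states at once by splitting the product over $(T_{i,l-1},T_{il}]$ at the left endpoint of $\mathcal{I}_j$.

```python
from functools import reduce
import numpy as np

def _prod(mats, n):
    """Ordered product of a list of n x n matrices (identity if empty)."""
    return reduce(np.matmul, mats, np.eye(n))

def prob_at_risk(P_list, k, a, b):
    """P{S(t_{k-1}) = s1 | state a at the left visit, state b at the right visit},
    returned for every candidate state s1.  P_list holds the per-interval
    matrices P_{ij'} covering (T_{l-1}, T_l]; entry k (0-based) is I_j."""
    n = P_list[0].shape[0]
    left = _prod(P_list[:k], n)   # covers (T_{l-1}, t_{k-1}]
    right = _prod(P_list[k:], n)  # covers (t_{k-1}, T_l]
    denom = (left @ right)[a, b]  # full-interval transition probability
    return left[a, :] * right[:, b] / denom

# Two-state example: state 1 is absorbing; condition on start 0, end 1.
P = np.array([[0.9, 0.1], [0.0, 1.0]])
probs = prob_at_risk([P, P, P], k=1, a=0, b=1)
```

Because the numerator terms sum over $s_1$ to the denominator, the returned probabilities always sum to one, which is a convenient internal check.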

In the maximization step, we update parameters to maximize the expected complete log-likelihood function $\tilde{E}[l_1^*(\beta,h_0)]$. Directly taking derivatives of $\tilde{E}[l_1^*(\beta,h_0)]$ to update $\beta$ and $h_0$ via a Newton-Raphson step is possible, but impractical due to the large number of parameters in $h_0$ and the constraint that $h_{0,s_1s_2}(\mathcal{I}_j)\geq 0$. In practice, we found that directly working on $\tilde{E}[l_1^*(\beta,h_0)]$ is not convenient and leads to slow convergence. In contrast, the partial log-likelihood $pl_1^*(\beta)$ only depends on the parameters $\beta$ and is free of bound constraints. As $pl_1^*(\beta)$ and $l_1^*(\beta,h_0)$ are equivalent under different parameterizations, we suggest updating $\beta$ using the partial log-likelihood function and then updating $h_0$ by plugging in the updated values of $\beta$. The algorithm for the maximization step is described below

$$S(\beta_{s_1s_2})\leftarrow\sum_{i=1}^m\sum_{j=1}^{N_t}\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]\Bigg\{x_i-\frac{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)\,x_l}{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)}\Bigg\}$$
$$I(\beta_{s_1s_2})\leftarrow\sum_{i=1}^m\sum_{j=1}^{N_t}\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]\Bigg\{-\frac{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)\,x_l x_l'}{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)}+\Bigg(\frac{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)\,x_l}{\sum_{l=1}^m w_{l,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_l)}\Bigg)^{\otimes 2}\Bigg\}$$
$$\beta_{s_1s_2}\leftarrow\beta_{s_1s_2}-I(\beta_{s_1s_2})^{-1}S(\beta_{s_1s_2})$$
$$h_{0,s_1s_2}(\mathcal{I}_j)\leftarrow\frac{\sum_{i=1}^m\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]}{\sum_{i=1}^m w_{i,s_1s_2}(\mathcal{I}_j)\exp(\beta_{s_1s_2}'x_i)},$$
with the shorthand $w_{l,s_1s_2}(\mathcal{I}_j)=\tilde{E}_l[g_{l,s_1}(\mathcal{I}_j)]-\tilde{E}_l[g_{l,s_1}(\mathcal{I}_j)\delta_{l,s_1s_2}(\mathcal{I}_j)]/2$,

where $S(\beta_{s_1s_2})$ and $I(\beta_{s_1s_2})$ are the first and second derivatives of the following expected partial likelihood with respect to $\beta_{s_1s_2}$

$$\sum_{i=1}^m\sum_{j=1}^{N_t}\sum_{(s_1,s_2)\in E}\tilde{E}_i[g_{i,s_1}(\mathcal{I}_j)\delta_{i,s_1s_2}(\mathcal{I}_j)]\bigg[\beta_{s_1s_2}'x_i-\log\sum_{l=1}^m\tilde{E}_l\Big[g_{l,s_1}(\mathcal{I}_j)\Big(1-\frac{\delta_{l,s_1s_2}(\mathcal{I}_j)}{2}\Big)\Big]\exp(\beta_{s_1s_2}'x_l)\bigg].$$

Compared to directly taking derivatives of the expected complete log-likelihood, the above procedure typically speeds up the convergence of the parameter estimates and reduces the likelihood of numerical singularities due to the bound constraints on $h_0$. The EM algorithm proceeds by iterating between the expectation step and the maximization step until convergence holds. The convergence properties of the proposed EM algorithm are established as a proposition in Appendix B. The step-by-step implementation of the algorithm is described in Section S1 of the Supplementary Materials.
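To illustrate the maximization step, the following sketch (a minimal single-transition-type implementation with an assumed array layout; not the authors' code) carries out one Newton update of $\beta_{s_1s_2}$ from the expected partial likelihood and then updates the baseline hazards by the plug-in formula.

```python
import numpy as np

def m_step(beta, x, Egd, Eg):
    """One M-step for a single transition type s1 -> s2.

    x   : (m, p) covariate matrix
    Egd : (m, Nt) expected values E_i[g * delta] per individual and interval
    Eg  : (m, Nt) expected values E_i[g]
    Returns the Newton-updated beta and the plug-in baseline hazards h0.
    """
    m, p = x.shape
    w = Eg - Egd / 2.0                       # fractional at-risk weights
    S, I = np.zeros(p), np.zeros((p, p))
    for j in range(Egd.shape[1]):
        r = w[:, j] * np.exp(x @ beta)       # weighted risk contributions
        tot = r.sum()
        if tot <= 0:
            continue
        xbar = (r @ x) / tot                 # risk-set weighted mean covariate
        S += Egd[:, j] @ (x - xbar)          # score contribution
        V = (x.T * r) @ x / tot - np.outer(xbar, xbar)
        I -= Egd[:, j].sum() * V             # second derivative (Hessian)
    beta_new = beta - np.linalg.solve(I, S)  # Newton step
    h0 = Egd.sum(axis=0) / (w * np.exp(x @ beta_new)[:, None]).sum(axis=0)
    return beta_new, h0
```

In a full implementation this update would be run once per transition type within each EM iteration, alternating with the E-step.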

The proposed EM algorithm is relatively easy to implement and has a clear interpretation. As a comparison, many other methods that directly optimize the likelihood functions usually involve calculating the derivatives of complicated functions and checking the bounds and singularities of parameter estimates.8,9,31 We can now further discuss how the proposed method is different from the EM methods proposed by Wang et al. and Gu et al.6,32 Both methods use data augmentation and the EM algorithm in the estimation of parameters, but we highlight the following main differences between their methods and the proposed method. First, the proposed method is derived based on a discrete-time survival model where the event rate in each interval $\mathcal{I}_j$ is described by a binomial model with a complementary log-log link, while the method in Wang et al. is derived based on a continuous-time survival model where the survival events are described by Poisson point processes. Second, the proposed method treats both the event process $\delta_{i,s_1s_2}(t)$ and the at-risk process $g_{i,s_1}(t)$ as incompletely observed data, and applies the method of the fractional re-weighted process by Datta et al.30 to account for missing information. Finally, Wang et al. and Gu et al. employ a two-stage data augmentation approach that partitions the Poisson processes into two levels, while our proposed method unifies the two stages into a single-stage procedure.

3 |. SIMULATION STUDIES

We performed simulation studies to evaluate the proposed method for interval-censored multistate data. We let the design interval be $(0,1]$ and discretize it into 200 sub-intervals $\mathcal{I}_1=(0.000,0.005], \mathcal{I}_2=(0.005,0.010], \dots, \mathcal{I}_{200}=(0.995,1.000]$. We set $p=4$ and let $x_i=(x_{i1},\dots,x_{ip})'$ follow a multivariate normal distribution with mean $0_p$ and variance $I_{p\times p}$. In Case (A), we consider a 4-state model as shown in the left panel of Figure 1. In Case (B), we consider a more complicated 6-state model as shown in the right panel of Figure 1. The true values of $\beta_{s_1s_2}$ are given in Table 1. In Case (A), $h_{12}(\mathcal{I}_k)=h_{13}(\mathcal{I}_k)=0.1\times t\exp(-5t)$ and $h_{24}(\mathcal{I}_k)=h_{34}(\mathcal{I}_k)=0.005/[1+\exp(5-10t)]$, where $t=k/200$ is the right end of the interval $\mathcal{I}_k$. In Case (B), $h_{12}(\mathcal{I}_k)=h_{13}(\mathcal{I}_k)=0.01\times\exp(-2t)$ and $h_{24}(\mathcal{I}_k)=h_{25}(\mathcal{I}_k)=h_{35}(\mathcal{I}_k)=h_{36}(\mathcal{I}_k)=0.01\times t^2$, where $t=k/200$ is the right end of the interval $\mathcal{I}_k$.
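For reference, the data-generating mechanism used in such simulations can be sketched as follows (illustrative Python; state indices are 0-based here and the hazards and coefficients below are placeholders rather than the exact Case (A) settings). Each interval's transition probabilities form the row of the stochastic matrix $P_{ij}$, and the next state is drawn from that row.

```python
import numpy as np

def simulate_path(h0, beta, x, n_states, n_t, rng, start=0):
    """Simulate one individual's state at the end of each interval.

    h0[(s1, s2)]   : length-n_t baseline hazards for the edge s1 -> s2
    beta[(s1, s2)] : regression coefficients for that edge
    """
    state, path = start, []
    for j in range(n_t):
        probs = np.zeros(n_states)
        for (s1, s2), h in h0.items():
            if s1 == state:
                # Off-diagonal entries of P_ij: 1 - exp(-h0 * exp(beta' x)).
                probs[s2] = 1.0 - np.exp(-h[j] * np.exp(beta[(s1, s2)] @ x))
        probs[state] = 1.0 - probs.sum()  # diagonal entry: stay put
        state = int(rng.choice(n_states, p=probs))
        path.append(state)
    return path

# A 4-state progressive structure like Case (A): 0 -> {1, 2} -> 3.
rng = np.random.default_rng(1)
n_t = 200
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
h0 = {e: np.full(n_t, 0.02) for e in edges}      # placeholder hazards
beta = {e: np.array([0.5, 0.2]) for e in edges}  # placeholder coefficients
path = simulate_path(h0, beta, np.array([0.3, -0.1]), 4, n_t, rng)
```

Interval-censored observations are then obtained by recording the simulated state only at a subset of the grid points, the individual's visit times.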

FIGURE 1. The 4-state model considered in Case (A) (left) and the 6-state model considered in Case (B) (right) in the simulation studies.

TABLE 1.

Biases and RMSE of the estimated regression coefficients in the simulation study of interval-censored multistate data. Values in the parentheses are the corresponding standard errors.

s1 s2 | Method | βs1s2,1: Bias (SE), Truth | βs1s2,2: Bias (SE), Truth | βs1s2,3: Bias (SE), Truth | βs1s2,4: Bias (SE), Truth | RMSE (SE)

Case (A)
1 2 1st Order Approx 0.008 (0.003) 0.6 0.003 (0.003) 0.2 0.000 (0.003) 0.2 0.003 (0.003) 0.2 0.129 (0.002)
1 2 Package msm 0.048 (0.003) 0.6 0.041 (0.003) 0.2 0.018 (0.003) 0.2 0.023 (0.003) 0.2 0.151 (0.003)
1 3 1st Order Approx −0.000 (0.003) 0.2 0.006 (0.003) 0.6 0.001 (0.003) 0.2 0.002 (0.003) 0.2 0.126 (0.002)
1 3 Package msm 0.038 (0.003) 0.2 0.045 (0.003) 0.6 0.020 (0.003) 0.2 0.021 (0.003) 0.2 0.147 (0.002)
2 4 1st Order Approx 0.004 (0.005) 0.2 0.004 (0.004) 0.2 0.016 (0.004) 0.6 0.008 (0.004) 0.2 0.199 (0.004)
2 4 Package msm −0.058 (0.004) 0.2 −0.056 (0.004) 0.2 −0.090 (0.004) 0.6 −0.038 (0.004) 0.2 0.211 (0.003)
3 4 1st Order Approx 0.012 (0.004) 0.2 0.008 (0.005) 0.2 0.006 (0.004) 0.2 0.010 (0.004) 0.6 0.201 (0.003)
3 4 Package msm −0.048 (0.004) 0.2 −0.055 (0.004) 0.2 −0.037 (0.004) 0.2 −0.093 (0.004) 0.6 0.211 (0.003)
Case (B)
1 2 1st Order Approx 0.004 (0.002) 0.5 −0.000 (0.002) 0.1 0.000 (0.002) 0.1 0.002 (0.002) 0.1 0.105 (0.002)
1 2 Package msm 0.075 (0.003) 0.5 0.074 (0.004) 0.1 0.022 (0.003) 0.1 0.028 (0.003) 0.1 0.163 (0.003)
1 3 1st Order Approx −0.001 (0.002) 0.1 0.008 (0.002) 0.5 −0.003 (0.002) 0.1 0.002 (0.002) 0.1 0.101 (0.002)
1 3 Package msm 0.068 (0.003) 0.1 0.077 (0.004) 0.5 0.021 (0.003) 0.1 0.027 (0.003) 0.1 0.159 (0.003)
2 4 1st Order Approx 0.001 (0.004) 0.1 0.006 (0.004) 0.1 0.005 (0.004) 0.4 0.004 (0.004) 0.2 0.182 (0.003)
2 4 Package msm −0.123 (0.004) 0.1 −0.060 (0.005) 0.1 −0.085 (0.004) 0.4 −0.054 (0.005) 0.2 0.231 (0.004)
2 5 1st Order Approx 0.011 (0.004) 0.5 −0.004 (0.004) 0.1 0.002 (0.004) 0.1 −0.005 (0.004) 0.1 0.169 (0.003)
2 5 Package msm −0.139 (0.004) 0.5 −0.069 (0.004) 0.1 −0.080 (0.004) 0.1 −0.061 (0.004) 0.1 0.234 (0.004)
3 5 1st Order Approx 0.003 (0.004) 0.1 0.009 (0.004) 0.5 0.001 (0.004) 0.1 0.008 (0.004) 0.1 0.172 (0.003)
3 5 Package msm −0.062 (0.004) 0.1 −0.139 (0.004) 0.5 −0.051 (0.004) 0.1 −0.073 (0.004) 0.1 0.227 (0.003)
3 6 1st Order Approx 0.001 (0.004) 0.1 −0.014 (0.004) 0.1 0.001 (0.004) 0.2 0.009 (0.004) 0.4 0.181 (0.003)
3 6 Package msm −0.063 (0.004) 0.1 −0.136 (0.005) 0.1 −0.064 (0.004) 0.2 −0.092 (0.005) 0.4 0.245 (0.004)

We compared the proposed method with the method in the R package “msm”, which is one of the few R packages that can implement regression analysis on multistate models with arbitrary structures. The method similarly assumes a proportional hazards model for transition times and covariates, so the estimates by “msm” are comparable to the estimates from the proposed model. However, the method in “msm” assumes that transition times follow a parametric exponential multistate model. In contrast, the proposed method is semiparametric, with nonparametric baseline hazard functions, so the proposed method is more flexible than the method in “msm”. In Table 1, we compare the bias and RMSE of the estimates produced by the package “msm” and the proposed method with the first-order approximation (denoted “1st Order Approx”). From the table, we can see that the proposed method generally gives estimates with smaller biases and smaller RMSE compared to the method in the package “msm”. Furthermore, in Case (B), the algorithm in the package “msm” failed to converge on 195 simulations, with error messages that the Hessian matrices were not positive-definite. The “msm” algorithm is based on the scoring procedure by Kalbfleisch and Lawless33. In formulating the scoring procedure, the chain rule is used to obtain derivatives of the likelihood function, first with respect to the transition probabilities and then with respect to the coefficients. As a result, the transition probabilities appear in the denominator of the Hessian matrix, as shown in Equation (3.6) of Kalbfleisch and Lawless33. Consequently, their method is more likely to suffer from a singular Hessian matrix. By contrast, the proposed EM algorithm is less likely to suffer from numerical singularities because of the techniques we use to simplify the likelihood function being optimized.

4 |. DATA APPLICATION TO HEART TRANSPLANT DATA

In this section, we apply the proposed method to analyze a real dataset that tracks the progression of coronary allograft vasculopathy (CAV) after heart transplantation. The dataset can be accessed through the R package "msm" on CRAN: https://cran.r-project.org/web/packages/msm/index.html. The dataset comprises 2846 visits from 622 patients, with each visit recording the severity of CAV. We categorize CAV into States 1, 2, and 3, representing no, mild, and severe CAV, respectively, and State 4 signifies death. The time origin is chosen to be the time of transplantation. We note that in this model, patients can transition from a more severe state to a less severe state (e.g., from State 2 to State 1 and from State 3 to State 2). In this dataset, we observed such transitions in 58 patients (10.3%), of whom 53 patients (9.4%) had one such transition and 5 patients had two such transitions. State 4 is the end state from which patients cannot recover. The multistate models in many other works assume that there are no bidirectional transitions12,32,34, and the backward transitions are usually assumed to be misclassifications and removed from the data analysis (e.g., Section 1.3.1 of Van Den Hout34). The proposed method is not constrained by such assumptions, which could otherwise lead to an underestimation of the transition risks. The model incorporates two covariates: the age group of organ recipients (coded as 0 for "under 50 years" and 1 for "over 50 years") and the age group of organ donors (coded as 0 for "under 30 years" and 1 for "over 30 years"). Figure 2 illustrates the transitions between states, Table 2 summarizes the frequency of each observed transition, and Table 3 summarizes the demographics and distribution of covariates in the analysis population.

FIGURE 2. A four-state model for describing the progression of severity of CAV following heart transplantation.

TABLE 2.

Observed number of transitions in the heart transplant dataset.

                        To
From         State 1  State 2  State 3  State 4
State 1         1367      204       44      148
State 2           46      134       54       48
State 3            4       13      107       55

TABLE 3.

Summary of the demographics and distribution of covariates in the analysis population.

Variable Frequency (Percent)

Sex (Female) 87 (14.0%)
Age of Organ Recipient ≥ 50 291 (46.8%)
Age of Organ Donor ≥ 30 297 (47.7%)

Table 4 presents the estimated regression coefficients. We find that the risk of transitioning from State 1 to State 2 is higher for older donor age groups (p < 0.001), and there is also a reduced likelihood of recovering from State 2 to State 1 (p < 0.001) and from State 3 to State 2 (p = 0.038) when donors are in the older group. Recipients in the older age group face a higher risk of transitioning from State 1 to State 4 (p < 0.001) and a lower chance of recovering from State 3 to State 2 (p = 0.005). At the 0.05 significance level, we do not find other estimates of regression coefficients to be statistically significant.
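Because the model is a proportional hazards model, each coefficient in Table 4 is a log hazard ratio, and exponentiating an estimate together with its confidence limits gives the multiplicative effect on the transition hazard. A minimal Python illustration using the donor age coefficient for the State 1 → State 2 transition from Table 4:

```python
import math

# Log hazard ratio and 95% CI for donor age group, State 1 -> State 2
# (estimate 0.409, CI (0.195, 0.623), taken from Table 4).
est, lo, hi = 0.409, 0.195, 0.623
hr = tuple(round(math.exp(v), 2) for v in (est, lo, hi))
print(hr)  # (1.51, 1.22, 1.86)
```

That is, an older donor multiplies the hazard of developing mild CAV by roughly 1.5.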

TABLE 4.

Estimated regression coefficients and the corresponding 95% CI and p-values in the application to heart transplant data.

Variable Estimate 95% CI P values

State 1 → State 2
 Recipient Age Group 0.033 (−0.188,0.253) 0.772
 Donor Age Group 0.409 (0.195,0.623) < .001
State 1 → State 4
 Recipient Age Group 0.983 (0.592,1.374) < .001
 Donor Age Group 0.157 (−0.222,0.535) 0.417
State 2 → State 1
 Recipient Age Group 0.245 (−0.085,0.575) 0.146
 Donor Age Group −0.583 (−0.915,−0.250) < .001
State 2 → State 3
 Recipient Age Group −0.252 (−0.699,0.195) 0.269
 Donor Age Group 0.023 (−0.357,0.403) 0.904
State 2 → State 4
 Recipient Age Group −0.074 (−0.856,0.709) 0.853
 Donor Age Group 0.143 (−0.569,0.856) 0.694
State 3 → State 2
 Recipient Age Group −1.121 (−1.908,−0.334) 0.005
 Donor Age Group −0.645 (−1.254,−0.036) 0.038
State 3 → State 4
 Recipient Age Group 0.307 (−0.234,0.848) 0.266
 Donor Age Group −0.138 (−0.626,0.350) 0.579

Figure 3 presents our estimates of the 1-year probabilities of state occupation for recipients and donors in the younger age group within 10 years of transplantation. The results indicate that given a patient is currently in State 1, the probability of leaving State 1 is generally less than 0.2. Patients in State 2 have a higher probability of transitioning to State 3, and the probability of returning to State 1 decreases over time. Notably, the 1-year probability of death (State 4) increases for patients in more severe states (State 1 < State 2 < State 3). The Julia code for implementing the data application can be found in the GitHub repository at https://github.com/luyouepiusf/approximation_method.

FIGURE 3. Estimated probabilities of the recipient's future state occupation after one year, based on the current state and the time elapsed since transplantation. These probabilities are calculated assuming both the recipient and the donor belong to the younger age groups.

5 |. CONCLUSIONS AND DISCUSSIONS

In this paper, a novel method based on the idea of data augmentation is proposed to handle interval censoring in both single-event survival models and multistate models. An efficient EM algorithm is proposed to estimate the parameters in the model. Theoretical and numerical results have shown that the proposed method gives sound parameter estimates and is computationally efficient. The proposed method is applied to the heart transplant dataset to model the advancement of CAV following heart transplantation.

There are still some questions and future research topics that can be explored following this research effort. First, the proposed method relies on an approximation that compromises the precision of the parameter estimates; it is worth investigating whether a higher-order approximation or a correction method can improve the estimation. Second, in many observational studies, longitudinal measures of risk factors are also collected from the participants. These longitudinal risk factors can be important predictors of disease progression and can be included as covariates in the model, so the proposed method can be extended to incorporate longitudinal data. Third, the proportional hazards assumption can be relaxed to allow more flexibility; for example, we may introduce time-dependent coefficients or use a semiparametric single-index model.35,36 Finally, the current research only discusses cases in which the censoring is independent, but in some applications censoring can depend on the covariates or the past state occupation; methods to handle dependent censoring need to be considered in this case.37,38

Supplementary Material


ACKNOWLEDGMENTS

Research reported in this publication was supported by the National Institute Of Diabetes And Digestive And Kidney Diseases of the National Institutes of Health under Award Number R03DK135437. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Abbreviations:

EM, expectation-maximization; CAV, coronary allograft vasculopathy; RMSE, root-mean-squared error

APPENDIX

A. METHOD FOR INTERVAL-CENSORED SINGLE-EVENT DATA

We give a description of the method for single-event interval-censored data as a special case of the multistate model.

A.1. Data and Notations

While we keep the notations consistent with those in Section 2, some adaptations are needed to represent single-event data. We consider a typical interval-censored dataset where each individual can experience the survival event at most once. Suppose that there are $m$ individuals in the dataset. The $i$th individual is sequentially monitored for survival status at the times $T_{i1} < T_{i2} < \cdots < T_{in_i}$. We let $\Delta_i = 1$ if the $i$th individual is known to have experienced the survival event by one of the monitoring times, and $\Delta_i = 0$ otherwise. For individuals with $\Delta_i = 1$, we know that the true event time $T_i^*$ falls in one of the intervals $(0, T_{i1}], (T_{i1}, T_{i2}], \ldots, (T_{i,n_i-1}, T_{in_i}]$; we denote this interval by $(L_i, R_i]$ and define the censoring time $C_i = R_i$. For those who did not have an event, we know that the individual remains event-free until $T_{in_i}$, and we let the censoring time $C_i$ be the last follow-up time $T_{in_i}$. Let $(0, \mathcal{T}]$ be the design interval that encompasses all monitoring times $T_{ij}$. The design interval $(0, \mathcal{T}]$ is discretized into a union of $N_t$ disjoint small intervals $\mathcal{I}_1 = (t_0, t_1], \mathcal{I}_2 = (t_1, t_2], \ldots, \mathcal{I}_{N_t} = (t_{N_t-1}, t_{N_t}]$, where $t_0 = 0$, $t_{N_t} = \mathcal{T}$, and the time sequence $t_0, t_1, \ldots, t_{N_t}$ encompasses all $L_i$, $R_i$ and $C_i$. Here we let $\delta_i(t) = I\{t = T_i^*, t \le C_i\}$ be the event indicator process, and $g_i(t) = I\{t \le T_i^*, t \le C_i\}$ be the at-risk process. For ease of exposition, we introduce the following notations to allow $\delta_i(\cdot)$ and $g_i(\cdot)$ to be functions of intervals. For an interval $\mathcal{I}_j$, we let $\delta_i(\mathcal{I}_j) = \max_{t \in \mathcal{I}_j} \delta_i(t) = I\{T_i^* \in \mathcal{I}_j, t_j \le C_i\}$, which means $\delta_i(\mathcal{I}_j) = 1$ if and only if the event is observed before censoring and in the interval $\mathcal{I}_j$. Similarly, we let $g_i(\mathcal{I}_j) = \max_{t \in \mathcal{I}_j} g_i(t) = I\{t_{j-1} < T_i^*, t_j \le C_i\}$, which means $g_i(\mathcal{I}_j) = 1$ if and only if the individual is at risk of the event upon entering the interval $\mathcal{I}_j$ and is not censored within the interval $\mathcal{I}_j$. Let $x_i$ be a $p$-dimensional vector of covariates for the $i$th individual. The survival times are modeled by a discrete-time survival model with complementary log-log link

$$P\{\delta_i(\mathcal{I}_j) = 1 \mid g_i(\mathcal{I}_j) = 1\} = 1 - \exp\{-h_0(j)\exp(\beta^\top x_i)\},$$

where $j = 1, \ldots, N_t$, $\beta$ is a $p$-dimensional vector of regression coefficients, and $h_0 = \{h_0(1), \ldots, h_0(N_t)\}$ are parameters that characterize the rate of events in the intervals.
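To make the link concrete, the sketch below (in Python, with hypothetical values of $h_0(j)$ and $\beta$ chosen for illustration) evaluates the per-interval event probability and verifies that a unit shift in a covariate moves the complementary log-log of that probability by exactly the corresponding coefficient:

```python
import math

def event_prob(h0_j, beta, x):
    """P{delta_i(I_j) = 1 | g_i(I_j) = 1} = 1 - exp(-h0(j) exp(beta'x))."""
    lin = sum(b * xv for b, xv in zip(beta, x))
    return 1.0 - math.exp(-h0_j * math.exp(lin))

cloglog = lambda p: math.log(-math.log(1.0 - p))

# Hypothetical baseline hazard and coefficients:
h0_j, beta = 0.03, [0.4, 0.3]
p0 = event_prob(h0_j, beta, [0.0, 0.0])
p1 = event_prob(h0_j, beta, [1.0, 0.0])
print(round(cloglog(p1) - cloglog(p0), 6))  # 0.4, i.e. beta[0]
```

The covariates act proportionally on the hazard, which is additive on the complementary log-log scale.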

A.2. The Observed, Complete, and Partial Likelihood

Given the observed data, we only know the values of $\delta_i(\mathcal{I}_j)$ and $g_i(\mathcal{I}_j)$ partially. When $\Delta_i = 1$, we know that $\delta_i(\mathcal{I}_j)$ is 0 for intervals before $L_i$, and that at least one $\delta_i(\mathcal{I}_j)$ is non-zero in $(L_i, R_i]$. So the log probability density function of the observed data $O_i = (\Delta_i, L_i, R_i, C_i)$ is

$$\log f(O_i \mid \beta, h_0) = \log \left[ \prod_{j:\, \mathcal{I}_j \subseteq (0, L_i]} P\{\delta_i(\mathcal{I}_j) = 0 \mid g_i(\mathcal{I}_j) = 1\} \left\{ 1 - \prod_{j:\, \mathcal{I}_j \subseteq (L_i, R_i]} P\{\delta_i(\mathcal{I}_j) = 0 \mid g_i(\mathcal{I}_j) = 1\} \right\} \right]. \tag{A1}$$

When $\Delta_i = 0$, we know that $\delta_i(\mathcal{I}_j)$ is 0 for all intervals before $C_i$, so the log probability density function of the observed data $O_i$ is

$$\log f(O_i \mid \beta, h_0) = \log \prod_{j:\, \mathcal{I}_j \subseteq (0, C_i]} P\{\delta_i(\mathcal{I}_j) = 0 \mid g_i(\mathcal{I}_j) = 1\}.$$

Therefore, the observed log-likelihood function for $\beta$ and $h_0$ is given by $l(\beta, h_0) = \sum_{i=1}^m \log f(O_i \mid \beta, h_0)$. Similarly, we can construct augmented data that give a simple complete likelihood function by assuming complete knowledge of $\delta_i = \{\delta_i(\mathcal{I}_j)\}_{j=1}^{N_t}$ and $g_i = \{g_i(\mathcal{I}_j)\}_{j=1}^{N_t}$. Given the values of $\delta_i$ and $g_i$, the joint log probability density function of $g_i$, $\delta_i$ and the observed data $O_i$ is

$$\log f(O_i, \delta_i, g_i \mid \beta, h_0) = \log f(O_i \mid \delta_i, g_i) + \log f(\delta_i, g_i \mid \beta, h_0) = \log f(O_i \mid \delta_i, g_i) + \sum_{j=1}^{N_t} g_i(\mathcal{I}_j) \left[ \delta_i(\mathcal{I}_j)\log\{\exp(h_0(j)\exp(\beta^\top x_i)) - 1\} - h_0(j)\exp(\beta^\top x_i) \right].$$

We note that $f(O_i \mid \delta_i, g_i)$ is the probability density function that specifies the probability of observing the interval-censored outcomes given the true survival information. By our assumption that the observation process is independent of the event process, $f(O_i \mid \delta_i, g_i)$ does not depend on the parameters $\beta$ and $h_0$. The complete log-likelihood function for $\beta$ and $h_0$ is then given by

$$l^*(\beta, h_0) = \sum_{i=1}^m \log f(O_i, \delta_i, g_i \mid \beta, h_0),$$

where the term that does not depend on $\beta$ and $h_0$ can be dropped. Applying the Taylor approximation of Section 2.2, the complete log-likelihood function $l^*(\beta, h_0)$ can be approximated by

$$l_0^*(\beta, h_0) = \sum_{i=1}^m \sum_{j=1}^{N_t} g_i(\mathcal{I}_j)\left[\delta_i(\mathcal{I}_j)\{\log h_0(j) + \beta^\top x_i\} - h_0(j)\exp(\beta^\top x_i)\right].$$

We can similarly re-write $l_0^*(\beta, h_0)$ as a partial log-likelihood. Solving $\partial l_0^*(\beta, h_0)/\partial h_0(j) = 0$ yields the following expression for $h_0(j)$:

$$h_0(j) = \frac{\sum_{i=1}^m g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)}{\sum_{i=1}^m g_i(\mathcal{I}_j)\exp(\beta^\top x_i)}.$$

After we plug this expression back into $l_0^*(\beta, h_0)$, it can be written as a partial likelihood function that is free of the nuisance parameters $h_0$:

$$pl_0^*(\beta) = \sum_{i=1}^m \sum_{j=1}^{N_t} g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)\left[\beta^\top x_i - \log\left\{\sum_{l=1}^m g_l(\mathcal{I}_j)\exp(\beta^\top x_l)\right\}\right].$$

After applying the zeroth-order approximation, the complete log-likelihood function closely resembles the Cox partial likelihood function.25 Similarly, we can apply the first-order approximation, $\log\{\exp(x) - 1\} \approx \log(x) + x/2$, to $l^*(\beta, h_0)$, which reduces the log-likelihood function to

$$l_1^*(\beta, h_0) = \sum_{i=1}^m \sum_{j=1}^{N_t} g_i(\mathcal{I}_j)\left[\delta_i(\mathcal{I}_j)\{\log h_0(j) + \beta^\top x_i\} - \left\{1 - \frac{\delta_i(\mathcal{I}_j)}{2}\right\} h_0(j)\exp(\beta^\top x_i)\right].$$
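The quality of the two approximations is easy to check numerically: the zeroth-order error of $\log\{\exp(x) - 1\} \approx \log(x)$ is roughly $x/2$, while the first-order error is of order $x^2$, so for small per-interval hazards the first-order version is considerably more accurate. A quick Python check (our own illustration):

```python
import math

exact = lambda x: math.log(math.expm1(x))   # log(e^x - 1)
order0 = lambda x: math.log(x)              # zeroth-order approximation
order1 = lambda x: math.log(x) + x / 2.0    # first-order approximation

# Per-interval hazards of the magnitude typical in discrete-time models:
for x in (0.01, 0.05, 0.2):
    print(x, round(exact(x) - order0(x), 5), round(exact(x) - order1(x), 5))
```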

We can similarly derive the partial log-likelihood form using the same technique:

$$pl_1^*(\beta) = \sum_{i=1}^m \sum_{j=1}^{N_t} g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)\left[\beta^\top x_i - \log\left\{\sum_{l=1}^m g_l(\mathcal{I}_j)\left(1 - \frac{\delta_l(\mathcal{I}_j)}{2}\right)\exp(\beta^\top x_l)\right\}\right].$$
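As an illustration of how $pl_1^*(\beta)$ can be evaluated, the Python sketch below (our own helper with a hypothetical toy dataset, not the authors' implementation) loops over the intervals and applies the fractional weight $1 - \delta_l(\mathcal{I}_j)/2$ inside each risk-set sum:

```python
import math

def pl1(beta, x, g_delta, g):
    """First-order partial log-likelihood pl1*(beta).

    x:       covariate vectors, one per individual
    g_delta: g_delta[i][j] = g_i(I_j) * delta_i(I_j)
    g:       g[i][j] = g_i(I_j)
    """
    m, Nt = len(x), len(g[0])
    lin = [sum(b * xv for b, xv in zip(beta, xi)) for xi in x]
    total = 0.0
    for j in range(Nt):
        # Individuals with an event in I_j enter the risk-set sum with
        # the fractional weight (1 - delta/2).
        denom = sum((g[l][j] - g_delta[l][j] / 2.0) * math.exp(lin[l])
                    for l in range(m))
        for i in range(m):
            if g_delta[i][j]:
                total += lin[i] - math.log(denom)
    return total

# Toy data: individual 0 (x = 1) has its event in the second of three
# intervals; individual 1 (x = 0) stays event-free through all three.
x = [[1.0], [0.0]]
g_delta = [[0, 1, 0], [0, 0, 0]]
g = [[1, 1, 0], [1, 1, 1]]
print(round(pl1([0.5], x, g_delta, g), 4))  # -0.1012
```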

A.3. Model Estimation by EM Algorithm

Similarly, the EM algorithm consists of an expectation step and a maximization step. In the expectation step, we evaluate the expectation of the complete log-likelihood given the observed data and the current parameter estimates. We let $\tilde{P}(\cdot)$ and $\tilde{E}[\cdot]$ denote the conditional probability and conditional expectation of a random variable given all observed data $\{(\Delta_i, L_i, R_i, C_i)\}_{i=1}^m$, and $\tilde{P}_i(\cdot)$ and $\tilde{E}_i[\cdot]$ denote the conditional probability and conditional expectation given the observed data from individual $i$, $(\Delta_i, L_i, R_i, C_i)$. The expected complete log-likelihood given the observed data is

$$\tilde{E}\left[l_1^*(\beta, h_0)\right] = \sum_{i=1}^m \sum_{j=1}^{N_t} \left( \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]\{\log h_0(j) + \beta^\top x_i\} - \left\{\tilde{E}_i[g_i(\mathcal{I}_j)] - \frac{\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]}{2}\right\} h_0(j)\exp(\beta^\top x_i) \right). \tag{A2}$$

The quantities that we need to evaluate in the expectation step are essentially $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]$ and $\tilde{E}_i[g_i(\mathcal{I}_j)]$. Since $g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)$ and $g_i(\mathcal{I}_j)$ can only take the values 0 or 1, we have $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)] = \tilde{P}_i\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\}$ and $\tilde{E}_i[g_i(\mathcal{I}_j)] = \tilde{P}_i\{g_i(\mathcal{I}_j) = 1\}$.

Therefore, for individuals with Δi=1, we have

$$\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)] = \begin{cases} 0 & \text{if } \mathcal{I}_j \subseteq (0, L_i] \\ \tilde{P}_i\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\} & \text{if } \mathcal{I}_j \subseteq (L_i, R_i] \\ 0 & \text{if } \mathcal{I}_j \subseteq (R_i, \mathcal{T}] \end{cases} \quad\text{and}\quad \tilde{E}_i[g_i(\mathcal{I}_j)] = \begin{cases} 1 & \text{if } \mathcal{I}_j \subseteq (0, L_i] \\ \tilde{P}_i\{g_i(\mathcal{I}_j) = 1\} & \text{if } \mathcal{I}_j \subseteq (L_i, R_i] \\ 0 & \text{if } \mathcal{I}_j \subseteq (R_i, \mathcal{T}]. \end{cases}$$

For individuals with $\Delta_i = 0$, we have $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)] = 0$ and

$$\tilde{E}_i[g_i(\mathcal{I}_j)] = \begin{cases} 1 & \text{if } \mathcal{I}_j \subseteq (0, C_i] \\ 0 & \text{if } \mathcal{I}_j \subseteq (C_i, \mathcal{T}]. \end{cases}$$

So the problem boils down to the calculation of $\tilde{P}_i\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\}$ and $\tilde{P}_i\{g_i(\mathcal{I}_j) = 1\}$ when $\Delta_i = 1$ and $\mathcal{I}_j \subseteq (L_i, R_i]$. Given the observed information, we know the event has occurred in the interval $(L_i, R_i]$, so $g_i(\mathcal{I}_{j'})\delta_i(\mathcal{I}_{j'}) = 1$ for at least one of the $j'$ in the collection $\{j' : \mathcal{I}_{j'} \subseteq (L_i, R_i]\}$. Let $P\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\} = p_{ij}$, so we have

$$\tilde{P}_i\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1\} = P\Big\{g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j) = 1 \,\Big|\, \max_{j':\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} g_i(\mathcal{I}_{j'})\delta_i(\mathcal{I}_{j'}) = 1\Big\} = \frac{\prod_{j':\, j' < j,\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'})\; p_{ij}}{1 - \prod_{j':\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'})}.$$

Similarly,

$$\tilde{P}_i\{g_i(\mathcal{I}_j) = 1\} = P\Big\{g_i(\mathcal{I}_j) = 1 \,\Big|\, \max_{j':\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} g_i(\mathcal{I}_{j'})\delta_i(\mathcal{I}_{j'}) = 1\Big\} = \frac{\prod_{j':\, j' < j,\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'}) \Big\{1 - \prod_{j':\, j' \ge j,\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'})\Big\}}{1 - \prod_{j':\, \mathcal{I}_{j'} \subseteq (L_i, R_i]} (1 - p_{ij'})}.$$
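These two conditional probabilities can be computed with a single pass over the intervals contained in $(L_i, R_i]$. The Python sketch below (our own helper, not part of the paper's code) takes the vector of $p_{ij}$ for those intervals and returns both sets of expectation-step weights:

```python
def estep_probs(p):
    """Expectation-step weights for the intervals I_j inside (L_i, R_i],
    given p[j] = P(g_i(I_j) * delta_i(I_j) = 1).  Returns the conditional
    expectations E~[g*delta] and E~[g], conditioning on the event being
    observed somewhere in (L_i, R_i]."""
    k = len(p)
    # tails[j] = prod_{j' >= j} (1 - p[j'])
    tails = [1.0] * (k + 1)
    for j in range(k - 1, -1, -1):
        tails[j] = tails[j + 1] * (1.0 - p[j])
    denom = 1.0 - tails[0]  # P(event somewhere in (L, R])
    e_gd, e_g, surv = [], [], 1.0
    for j in range(k):
        e_gd.append(surv * p[j] / denom)             # event exactly in I_j
        e_g.append(surv * (1.0 - tails[j]) / denom)  # still at risk entering I_j
        surv *= 1.0 - p[j]  # prod_{j' <= j} (1 - p[j'])
    return e_gd, e_g

e_gd, e_g = estep_probs([0.1, 0.2, 0.3])
print([round(v, 3) for v in e_gd])  # [0.202, 0.363, 0.435]
```

For example, with $p = (0.1, 0.2, 0.3)$ the weights $\tilde{E}_i[g_i\delta_i]$ sum to one, and $\tilde{E}_i[g_i] = 1$ for the first interval, since the individual is certainly at risk when entering $(L_i, R_i]$.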

We can see that the expected complete log-likelihood $\tilde{E}[l_1^*(\beta, h_0)]$ is a re-weighted version of the complete log-likelihood, where the weights can be interpreted as the probability of observing certain information. The first term $\log h_0(j) + \beta^\top x_i$ is weighted by $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]$, representing the expected values of the event process, and the second term $h_0(j)\exp(\beta^\top x_i)$ is weighted by $\tilde{E}_i[g_i(\mathcal{I}_j) - g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)/2]$, representing the expected values of the at-risk process. We can easily draw a parallel between the proposed method and the fractional re-weighted at-risk process considered by others.29,30 When the survival status of an individual is unknown, the individual contributes a fractional weight $\tilde{E}_i[g_i(\mathcal{I}_j) - g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)/2]$ to the risk set, determined by the probability that the individual is still at risk in the interval $\mathcal{I}_j$. The proposed method extends the idea of the fractional re-weighted at-risk process of Datta et al.30 from right-censored to interval-censored cases.

In the maximization step, we update the parameters to maximize the expected complete log-likelihood function $\tilde{E}[l_1^*(\beta, h_0)]$. Similarly, we suggest updating $\beta$ using the partial log-likelihood function and then updating $h_0$ by plugging in the updated values of $\beta$. The maximization step is described below:

$$S(\beta) \leftarrow \sum_{i=1}^m \sum_{j=1}^{N_t} \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]\left\{ x_i - \frac{\sum_{l=1}^m w_l(j)\, x_l}{\sum_{l=1}^m w_l(j)} \right\},$$
$$I(\beta) \leftarrow \sum_{i=1}^m \sum_{j=1}^{N_t} \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]\left\{ -\frac{\sum_{l=1}^m w_l(j)\, x_l x_l^\top}{\sum_{l=1}^m w_l(j)} + \left(\frac{\sum_{l=1}^m w_l(j)\, x_l}{\sum_{l=1}^m w_l(j)}\right)^{\otimes 2} \right\},$$
$$\beta \leftarrow \beta - [I(\beta)]^{-1} S(\beta),$$
$$h_0(j) \leftarrow \frac{\sum_{i=1}^m \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]}{\sum_{i=1}^m \{\tilde{E}_i[g_i(\mathcal{I}_j)] - \tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]/2\}\exp(\beta^\top x_i)},$$
with $w_l(j) = \{\tilde{E}_l[g_l(\mathcal{I}_j)] - \tilde{E}_l[g_l(\mathcal{I}_j)\delta_l(\mathcal{I}_j)]/2\}\exp(\beta^\top x_l)$ and $v^{\otimes 2} = vv^\top$,

where $S(\beta)$ and $I(\beta)$ are the first- and second-order derivatives of $\tilde{E}[pl_1^*(\beta)]$. The EM algorithm proceeds by iterating between the expectation step and the maximization step until convergence. The convergence properties of the proposed EM algorithm are established as a proposition in Appendix B.
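For a single covariate, one pass of the maximization step above can be sketched as follows (a simplified Python illustration under our own toy conventions, not the authors' Julia implementation; `E_gd[i][j]` and `E_g[i][j]` hold the expectation-step weights $\tilde{E}_i[g_i(\mathcal{I}_j)\delta_i(\mathcal{I}_j)]$ and $\tilde{E}_i[g_i(\mathcal{I}_j)]$):

```python
import math

def mstep_update(beta, x, E_gd, E_g):
    """One Newton update of beta followed by the plug-in update of h0
    (scalar covariate for simplicity)."""
    m, Nt = len(x), len(E_g[0])
    score, info = 0.0, 0.0
    for j in range(Nt):
        w = [(E_g[l][j] - E_gd[l][j] / 2.0) * math.exp(beta * x[l])
             for l in range(m)]
        sw = sum(w)
        if sw <= 0.0:
            continue
        mean = sum(wl * x[l] for l, wl in enumerate(w)) / sw
        var = sum(wl * x[l] ** 2 for l, wl in enumerate(w)) / sw - mean ** 2
        for i in range(m):
            score += E_gd[i][j] * (x[i] - mean)  # contribution to S(beta)
            info -= E_gd[i][j] * var             # contribution to I(beta) < 0
    beta_new = beta - score / info               # Newton step
    h0 = []
    for j in range(Nt):
        num = sum(E_gd[i][j] for i in range(m))
        den = sum((E_g[i][j] - E_gd[i][j] / 2.0) * math.exp(beta_new * x[i])
                  for i in range(m))
        h0.append(num / den if den > 0.0 else 0.0)
    return beta_new, h0

# Fully observed, symmetric toy data: beta = 0 is already a stationary point.
beta1, h0 = mstep_update(0.0, [1.0, -1.0], [[1.0], [1.0]], [[1.0], [1.0]])
print(beta1, h0)  # beta stays at 0.0
```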

A.4. Comparing with Turnbull’s Method

We give a comparison between the proposed method and the seminal idea proposed by Turnbull2. Based on Turnbull’s idea, the log-likelihood function can be written as follows

$$l(\beta, h_0) = \sum_{i=1}^m \log f(O_i \mid \beta, h_0),$$

where

$$\log f(O_i \mid \beta, h_0) = \begin{cases} \log P\{T_i^* \in (L_i, R_i]\} = \log \sum_{j:\, \mathcal{I}_j \subseteq (L_i, R_i]} \alpha_{ij} & \text{when } \Delta_i = 1, \\ \log P\{T_i^* \in (C_i, \mathcal{T}]\} = \log \sum_{j:\, \mathcal{I}_j \subseteq (C_i, \mathcal{T}]} \alpha_{ij} & \text{when } \Delta_i = 0, \end{cases}$$

where $\alpha_{ij} = P(T_i^* \in \mathcal{I}_j)$. By contrast, our parameterization gives the following form of $\log f(O_i \mid \beta, h_0)$:

$$\log f(O_i \mid \beta, h_0) = \begin{cases} \log \left[ \prod_{j:\, \mathcal{I}_j \subseteq (0, L_i]} (1 - \lambda_{ij}) \left\{ 1 - \prod_{j:\, \mathcal{I}_j \subseteq (L_i, R_i]} (1 - \lambda_{ij}) \right\} \right] & \text{when } \Delta_i = 1, \\ \log \prod_{j:\, \mathcal{I}_j \subseteq (0, C_i]} (1 - \lambda_{ij}) & \text{when } \Delta_i = 0, \end{cases}$$

where $\lambda_{ij} = P\{T_i^* \in \mathcal{I}_j \mid g_i(\mathcal{I}_j) = 1\}$. From the above equations, we can see that the difference between $\alpha_{ij}$ and $\lambda_{ij}$ is whether the event time probabilities are modeled conditionally on the at-risk process. It is noteworthy that the parameterization with $\lambda_{ij}$ enables us to operate with logarithms of products rather than logarithms of sums. Also, the parameterization with $\alpha_{ij}$ is subject to the constraints $\sum_{j=1}^{N_t} \alpha_{ij} = 1$ and $\alpha_{ij} \ge 0$. These nuances explain why the parameterization with $\lambda_{ij}$ simplifies the problem.
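Concretely, the two parameterizations are linked by $\alpha_{ij} = \lambda_{ij}\prod_{j' < j}(1 - \lambda_{ij'})$, and any unconstrained values $\lambda_{ij} \in [0, 1]$ automatically yield non-negative $\alpha_{ij}$. A small Python illustration (our own, for intuition):

```python
def alpha_from_lambda(lam):
    """Convert conditional probabilities lambda_ij = P(T* in I_j | at risk
    entering I_j) into unconditional probabilities alpha_ij = P(T* in I_j)."""
    alphas, surv = [], 1.0
    for l in lam:
        alphas.append(surv * l)  # survive the earlier intervals, fail here
        surv *= 1.0 - l
    return alphas

print(alpha_from_lambda([0.2, 0.5, 1.0]))  # [0.2, 0.4, 0.4]
```

The output sums to one without the simplex constraint ever being imposed explicitly, which is what makes the $\lambda_{ij}$ parameterization convenient to optimize.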

A.5. Simulation Studies

We present simulation studies to evaluate the proposed method for interval-censored single-event data. In the simulations, we let the design interval be $(0, 1]$, discretized into 100 sub-intervals $\mathcal{I}_1 = (0.00, 0.01], \mathcal{I}_2 = (0.01, 0.02], \ldots, \mathcal{I}_{100} = (0.99, 1.00]$, and set $h_0(k) = 0.03 \times \sin(\pi t)$, where $t = k/100$ is the right end of the interval $\mathcal{I}_k$. We let $p = 4$, and $x_i = (x_{i1}, \ldots, x_{ip})^\top$ follows a multivariate normal distribution with mean $0_{p \times 1}$ and variance $I_{p \times p}$. The true regression coefficients are $\beta = (\beta_1, \ldots, \beta_p)^\top = (0.4, 0.3, 0.2, 0.1)^\top$. The true survival outcomes $(\Delta_i, T_i^*)$ are generated according to the assumed hazard rate $h_0(\cdot)$ and regression coefficients $\beta$. The monitoring time sequences are generated such that $T_{ij} - T_{i,j-1}$ follows an exponential distribution with mean $\lambda$, where $n_i = 20$, $1 \le j \le n_i$, $T_{i0} = 0$, and the $T_{ij}$ are rounded to multiples of 0.01. We considered five different cases. In Cases (I), (II), and (III), we let the sample size be $m = 500, 1000, 2000$, respectively, with $\lambda = 0.2$. In Cases (IV) and (V), we let $\lambda = 0.15, 0.25$, respectively, with $m = 1000$. All simulations are repeated 500 times. We compare the following five methods. The proposed methods that approximate the log-likelihood function by zeroth- and first-order Taylor expansions are denoted by "0th Order Approx" and "1st Order Approx". The method that directly maximizes the log-likelihood function without approximation is denoted by "No Approx". The standard Cox proportional hazards model that does not account for interval censoring is denoted by "Cox PH". The method in the R package "icenReg", which implements a state-of-the-art gradient ascent algorithm for fitting interval-censored data, is denoted by "Package icenReg". In Table A1, we summarize the results by evaluating the biases and the root-mean-squared error (RMSE) of the estimated regression coefficients $\beta$. In addition, we report the median and interquartile range of the CPU times and of the number of iterations taken to converge.
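The event-generation step of this design can be sketched by drawing one Bernoulli variable per sub-interval with the complementary log-log probability above (a simplified Python illustration of the setup just described, not the authors' simulation code):

```python
import math
import random

def simulate_event_interval(beta, x, rng, Nt=100):
    """Return the 1-based index of the sub-interval containing T*, or
    None if no event occurs within the design interval (0, 1]."""
    rel_risk = math.exp(sum(b * xv for b, xv in zip(beta, x)))
    for k in range(1, Nt + 1):
        h0k = 0.03 * math.sin(math.pi * k / Nt)  # h0(k) = 0.03 sin(pi t)
        if rng.random() < 1.0 - math.exp(-h0k * rel_risk):
            return k
    return None

rng = random.Random(0)
beta = [0.4, 0.3, 0.2, 0.1]
events = sum(
    simulate_event_interval(beta, [rng.gauss(0.0, 1.0) for _ in range(4)], rng)
    is not None
    for _ in range(1000)
)
print(events)  # most individuals experience an event within (0, 1]
```

With this hazard, the cumulative baseline hazard over $(0, 1]$ is close to 1.9, so the large majority of simulated individuals experience the event before the end of the design interval.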

TABLE A1.

Estimation results and computational performance of the five methods used in modeling interval-censored single-event data. The accuracy of the estimation is evaluated by measuring the biases and RMSE of the estimated regression coefficients, with the corresponding standard errors in parentheses. The computational performance is evaluated by the median CPU times in seconds and median number of iterations required to converge, along with the corresponding inter-quartile ranges in brackets.

Method Bias(β^1) Bias(β^2) Bias(β^3) Bias(β^4) RMSE(β^) CPU Time (Seconds) Iterations

Case (I)
 0th Order Approx −0.038 (0.002) −0.031 (0.002) −0.021 (0.002) −0.006 (0.002) 0.110 (0.002) 0.21 [0.20,0.21] 5 [5,5]
 1st Order Approx −0.034 (0.002) −0.028 (0.002) −0.019 (0.002) −0.005 (0.002) 0.109 (0.002) 0.22 [0.21,0.22] 5 [5,5]
 No Approx 0.009 (0.003) 0.004 (0.002) 0.002 (0.002) 0.005 (0.002) 0.109 (0.002) 46.41 [39.64,54.79] 940 [832,1046]
 Cox PH −0.043 (0.003) −0.034 (0.002) −0.024 (0.002) −0.007 (0.002) 0.120 (0.002) 0.01 [0.01,0.01] 3 [3,3]
 Package icenReg 0.009 (0.003) 0.005 (0.002) 0.002 (0.002) 0.006 (0.002) 0.110 (0.002) 0.09 [0.08,0.09] 10 [9,11]
Case (II)
 0th Order Approx −0.045 (0.002) −0.033 (0.002) −0.020 (0.001) −0.012 (0.002) 0.091 (0.001) 0.49 [0.48,0.49] 5 [5,5]
 1st Order Approx −0.041 (0.002) −0.030 (0.002) −0.018 (0.002) −0.011 (0.002) 0.088 (0.001) 0.51 [0.50,0.52] 5 [5,5]
 No Approx 0.002 (0.002) 0.002 (0.002) 0.003 (0.002) 0.000 (0.002) 0.076 (0.001) 100.52 [69.91,115.81] 957 [885,1034]
 Cox PH −0.051 (0.002) −0.037 (0.002) −0.024 (0.002) −0.014 (0.002) 0.100 (0.001) 0.02 [0.02,0.02] 3 [3,3]
 Package icenReg 0.002 (0.002) 0.002 (0.002) 0.003 (0.002) 0.000 (0.002) 0.077 (0.001) 0.75 [0.73,0.78] 8 [8,8]
Case (III)
 0th Order Approx −0.043 (0.001) −0.035 (0.001) −0.024 (0.001) −0.009 (0.001) 0.077 (0.001) 0.76 [0.75,0.77] 5 [5,5]
 1st Order Approx −0.039 (0.001) −0.032 (0.001) −0.022 (0.001) −0.008 (0.001) 0.074 (0.001) 0.79 [0.78,0.80] 5 [5,5]
 No Approx 0.002 (0.001) −0.001 (0.001) −0.002 (0.001) 0.002 (0.001) 0.054 (0.001) 137.35 [124.93,146.15] 970 [921,1024]
 Cox PH −0.049 (0.001) −0.039 (0.001) −0.027 (0.001) −0.011 (0.001) 0.086 (0.001) 0.02 [0.02,0.02] 3 [3,3]
 Package icenReg 0.002 (0.001) −0.001 (0.001) −0.002 (0.001) 0.002 (0.001) 0.054 (0.001) 0.43 [0.38,0.48] 12 [11,14]
Case (IV)
 0th Order Approx −0.033 (0.002) −0.023 (0.002) −0.014 (0.001) −0.008 (0.002) 0.081 (0.001) 0.69 [0.62,0.71] 5 [5,5]
 1st Order Approx −0.028 (0.002) −0.020 (0.002) −0.011 (0.001) −0.007 (0.002) 0.079 (0.001) 0.76 [0.66,0.78] 5 [5,5]
 No Approx 0.002 (0.002) 0.002 (0.002) 0.004 (0.002) 0.000 (0.002) 0.075 (0.001) 118.86 [99.55,129.15] 918 [849,987]
 Cox PH −0.036 (0.002) −0.025 (0.002) −0.015 (0.002) −0.010 (0.002) 0.086 (0.001) 0.01 [0.01,0.01] 3 [3,3]
 Package icenReg 0.002 (0.002) 0.003 (0.002) 0.004 (0.002) 0.000 (0.002) 0.075 (0.001) 0.76 [0.74,0.79] 8 [8,8]
Case (V)
 0th Order Approx −0.054 (0.001) −0.040 (0.001) −0.025 (0.001) −0.014 (0.002) 0.099 (0.001) 0.41 [0.41,0.42] 5 [5,5]
 1st Order Approx −0.051 (0.001) −0.038 (0.002) −0.023 (0.001) −0.013 (0.002) 0.096 (0.001) 0.43 [0.42,0.43] 5 [5,5]
 No Approx 0.003 (0.002) 0.003 (0.002) 0.004 (0.002) 0.001 (0.002) 0.078 (0.001) 144.31 [118.43,157.38] 1015 [937,1080]
 Cox PH −0.062 (0.002) −0.046 (0.002) −0.029 (0.002) −0.017 (0.002) 0.111 (0.001) 0.01 [0.01,0.01] 3 [3,3]
 Package icenReg 0.003 (0.002) 0.002 (0.002) 0.004 (0.002) 0.001 (0.002) 0.078 (0.001) 0.16 [0.15,0.17] 9 [9,10]

Here we briefly summarize the results. In terms of RMSE, "Cox PH" has the largest RMSE, implying that parameter estimates from models that do not account for interval censoring are not as efficient as those from methods that do. Among the three methods proposed in this manuscript ("0th Order Approx", "1st Order Approx", and "No Approx"), "No Approx" gives the estimates with the smallest RMSE while "0th Order Approx" gives the largest RMSE in most of the cases considered here. Since "1st Order Approx" improves the order of approximation over "0th Order Approx", it gives a smaller RMSE than "0th Order Approx". The RMSE of the parameter estimates by "Package icenReg" is close to that of "No Approx". As a state-of-the-art algorithm for fitting interval-censored data, "Package icenReg" demonstrates good computational efficiency overall, with short CPU times and convergence in around 10 iterations. The CPU times of "0th Order Approx" and "1st Order Approx" are comparable to those of "Package icenReg", and the algorithms take only around 5 iterations to converge. By contrast, "No Approx" is relatively slow and usually requires a large number of iterations to converge. The results show that the techniques used to approximate the log-likelihood and eliminate the nuisance parameters greatly facilitate the convergence of the algorithms. Among the three proposed methods, we find "1st Order Approx" promising, as it is computationally efficient enough to be extended to interval-censored multistate data, and its loss of estimation efficiency due to approximation is smaller than that of "0th Order Approx".

B. CONVERGENCE OF THE EM ALGORITHMS

B.1. Convergence of the EM Algorithm for Single-Event Models

We begin by demonstrating the convergence of the EM algorithm for single-event data, as it represents a more straightforward scenario. The convergence of the EM algorithm for multistate data can be obtained similarly with slight modifications. In this subsection, we follow the notations introduced in Appendix A.

We present the following proposition to show the convergence property of the EM algorithm.

Proposition 1

(Convergence of the EM algorithm for the single-event model). Let $A_i = (\delta_i, g_i)$ be the augmented data, $A = \{A_i\}_{i=1}^m$, and $\theta = (\beta, h_0)$ be the parameters in the model. Then under assumptions (a)–(d) listed below, there exists a neighborhood $\Theta$ of $\theta^*$ such that for any initial value $\theta^{(0)}$ in $\Theta$, the sequence of parameter estimates $\{\theta^{(k)}\}_{k=0}^\infty$ generated by the EM algorithm converges to the maximizer $\theta^*$, where $\theta^{(k+1)} = \arg\max_{\theta \in \Theta} Q(\theta \mid \theta^{(k)})$ and $Q(\tilde{\theta} \mid \theta) = E[l^*(\tilde{\theta}) \mid O, \theta]$.

  (a) Across all values of $i = 1, \ldots, m$, the event time processes $\{\delta_i(\mathcal{I}_j)\}_{j=1}^{N_t}$ and the processes of monitoring times $T_{i1} < \cdots < T_{in_i}$ are jointly independent. The processes of monitoring times are independent of the parameters $\beta$ and $h_0$.

  (b) The survival times can be modeled by a discrete-time survival model with a complementary log-log link
$$P\{\delta_i(\mathcal{I}_j) = 1 \mid g_i(\mathcal{I}_j) = 1\} = 1 - \exp\{-h_0(j)\exp(\beta^\top x_i)\}.$$

  (c) The parameter space $\Omega$ is compact with non-empty interior, and the maximizer $\theta^*$ of the log-likelihood function $l(\theta)$ lies in the interior of $\Omega$.

  (d) $l(\theta)$ has finitely many stationary points in $\Omega$.

Proof of Proposition 1. First, we verify the following smoothness properties.

  (i) $l(\theta)$ has at least second-order continuous derivatives with respect to $\theta$.

  (ii) $Q(\tilde{\theta} \mid \theta)$ has at least second-order continuous derivatives with respect to both $\tilde{\theta}$ and $\theta$.

To show (i), we note that

$$l(\theta) = \sum_{i=1}^m \log f(O_i \mid \theta) = \sum_{i=1}^m \log \int_{\mathcal{A}} f(O_i \mid A_i) f(A_i \mid \theta)\, d\mu(A_i).$$

Since both $\delta_i$ and $g_i$ can only take values 0 and 1, $A_i$ is defined on a discrete measure space $\mathcal{A}$, and the integral over $\mathcal{A}$ reduces to a finite sum. As a consequence, the smoothness of $l(\theta)$ follows from the smoothness of $f(A_i \mid \theta)$. Property (i) follows from the fact that

$$f(A_i \mid \theta) = \prod_{j=1}^{N_t} p_{ij}^{\delta_i(\mathcal{I}_j)}(1 - p_{ij})^{1 - \delta_i(\mathcal{I}_j)}$$

has at least second-order continuous derivatives with respect to θ. Similarly, by Bayes rule, we can write

$$Q(\tilde{\theta} \mid \theta) = E[l^*(\tilde{\theta}) \mid O, \theta] = \sum_{i=1}^m \frac{\int_{\mathcal{A}} \log p(A_i \mid \tilde{\theta})\, p(O_i \mid A_i)\, p(A_i \mid \theta)\, d\mu(A_i)}{\int_{\mathcal{A}} p(O_i \mid A_i)\, p(A_i \mid \theta)\, d\mu(A_i)}.$$

For the same reason, the smoothness of $Q(\tilde{\theta} \mid \theta)$ in (ii) follows from the smoothness of $p(A_i \mid \tilde{\theta})$ and $p(A_i \mid \theta)$.

The smoothness properties imply that the derivatives of $l(\theta)$ and $Q(\tilde{\theta} \mid \theta)$ are 0 at their stationary points. The finiteness of the stationary points and the uniqueness of the maximizer imply that there exists some $\delta > 0$ such that for any $\theta$ in $\Theta = \{\theta : l(\theta) \ge l(\theta^*) - \delta\}$, $\nabla_\theta^2 l(\theta)$ is negative definite, and $\nabla_\theta l(\theta) = 0$ if and only if $\theta = \theta^*$. Here we remark that the finiteness of stationary points is needed to rule out the possibility that the observed data $O_i$ are non-informative about the underlying survival process indicated by $A_i$. In the extreme case that $p(O_i \mid A_i) = p(O_i)$ is completely non-informative about the underlying survival process, $l(\theta)$ is a constant and all $\theta$ in $\Omega$ are stationary points.

Next, we apply Theorem 6 of Wu39 to complete the proof. Suppose the initial value $\theta^{(0)}$ is in $\Theta$. By Theorem 1 of Dempster et al.22, $l(\theta^{(k)})$ is non-decreasing, so all subsequent $\theta^{(k)}$ are also in $\Theta$. The compactness of $\Omega$ and the continuity of $l(\theta)$ imply that $\Theta$ is a closed set. As a result, $\nabla_\theta^2 Q(\theta \mid \tilde{\theta})$ is bounded on $\Theta$ and there exists a constant $\lambda < 0$ such that all eigenvalues of $\nabla_\theta^2 Q(\theta \mid \tilde{\theta})$ are smaller than $\lambda$. Applying Taylor's expansion to $Q(\theta \mid \theta^{(k)})$ at $\theta = \theta^{(k+1)}$, we have

$$Q(\theta \mid \theta^{(k)}) = Q(\theta^{(k+1)} \mid \theta^{(k)}) + \frac{1}{2}(\theta - \theta^{(k+1)})^\top \left[\nabla_\theta^2 Q(\theta \mid \theta^{(k)})\right]_{\theta = \theta_0^{(k)}} (\theta - \theta^{(k+1)}),$$

where the first-order term vanishes because $[\nabla_\theta Q(\theta \mid \theta^{(k)})]_{\theta = \theta^{(k+1)}} = 0$ by the fact that $\theta^{(k+1)}$ maximizes $Q(\theta \mid \theta^{(k)})$, and $\theta_0^{(k)}$ is some point on the line segment joining $\theta^{(k)}$ and $\theta^{(k+1)}$. Therefore, we have

$$Q(\theta^{(k+1)} \mid \theta^{(k)}) - Q(\theta^{(k)} \mid \theta^{(k)}) \ge -\frac{\lambda}{2}(\theta^{(k+1)} - \theta^{(k)})^\top(\theta^{(k+1)} - \theta^{(k)}).$$

At this point, the remainder of the proof follows from Theorem 6 of Wu39.

B.2. Convergence of the EM Algorithm for Multistate Models

We also present the following proposition to show the convergence property of the EM algorithm for multistate models. In this subsection, we will follow the notations introduced in Section 2.

Proposition 2

(Convergence of the EM algorithm for multistate models). Let $A_i = (\delta_i, g_i)$ be the augmented data and $A = \{A_i\}_{i=1}^m$. Let $\theta = (\beta, h_0)$ be the parameters in the model, lying in a compact space $\Omega$ with non-empty interior. Then under assumptions (a'), (b'), (c) and (d), there exists a neighborhood $\Theta$ of $\theta^*$ such that for any initial value $\theta^{(0)}$ in $\Theta$, the sequence of parameter estimates $\{\theta^{(k)}\}_{k=0}^\infty$ generated by the EM algorithm converges to the maximizer $\theta^*$, where $\theta^{(k+1)} = \arg\max_{\theta \in \Theta} Q(\theta \mid \theta^{(k)})$ and $Q(\tilde{\theta} \mid \theta) = E[l^*(\tilde{\theta}) \mid O, \theta]$.

  • (a') Across all values of $i = 1, \ldots, m$, the event time processes $\{\delta_{i,s_1 s_2}(\mathcal{I}_j) : j = 1, \ldots, N_t,\ (s_1, s_2) \in E\}$ and the processes of monitoring times $T_{i1} < \cdots < T_{in_i}$ are jointly independent. The processes of monitoring times are independent of the parameters $\beta$ and $h_0$.

  • (b') The multistate data can be modeled by a discrete-time multistate model with a complementary log-log link
$$P\{\delta_{i,s_1 s_2}(\mathcal{I}_j) = 1 \mid g_{i,s_1}(\mathcal{I}_j) = 1\} = 1 - \exp\{-h_{0,s_1 s_2}(j)\exp(\beta_{s_1 s_2}^\top x_i)\}.$$

Proof of Proposition 2. The proof is omitted for brevity since it closely mirrors the proof of Proposition 1.

DATA AVAILABILITY STATEMENT

The data in the case study and the Julia code for implementing the proposed methods can be found online at https://github.com/luyouepiusf/approximation_method.

References

  • 1. Lindsey J. A study of interval censoring in parametric regression models. Lifetime Data Analysis 1998; 4(4): 329–354.
  • 2. Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society: Series B (Methodological) 1976; 38(3): 290–295.
  • 3. Finkelstein DM, Wolfe RA. A semiparametric model for regression analysis of interval-censored failure time data. Biometrics 1985: 933–945.
  • 4. Farrington C. Interval censored survival data: a generalized linear modelling approach. Statistics in Medicine 1996; 15(3): 283–292.
  • 5. Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 1987; 82(398): 528–540.
  • 6. Wang L, McMahan CS, Hudgens MG, Qureshi ZP. A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 2016; 72(1): 222–231.
  • 7. Marshall G, Jones RH. Multi-state models and diabetic retinopathy. Statistics in Medicine 1995; 14(18): 1975–1983.
  • 8. Satten GA, Longini IM. Markov chains with measurement error: Estimating the "true" course of a marker of the progression of human immunodeficiency virus disease. Journal of the Royal Statistical Society: Series C (Applied Statistics) 1996; 45(3): 275–295.
  • 9. Alioum A, Commenges D. MKVPCI: a computer program for Markov models with piecewise constant intensities and covariates. Computer Methods and Programs in Biomedicine 2001; 64(2): 109–119.
  • 10. Frydman H, Szarek M. Nonparametric estimation in a Markov "illness–death" process from interval censored observations with missing intermediate transition status. Biometrics 2009; 65(1): 143–151.
  • 11. Pak D, Li C, Todem D, Sohn W. A multistate model for correlated interval-censored life history data in caries research. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2017; 66(2): 413–423.
  • 12. Zhang H, Kelvin EA, Carpio A, Allen Hauser W. A multistate joint model for interval-censored event-history data subject to within-unit clustering and informative missingness, with application to neurocysticercosis research. Statistics in Medicine 2020; 39(23): 3195–3206.
  • 13. Sharples LD. Use of the Gibbs sampler to estimate transition rates between grades of coronary disease following cardiac transplantation. Statistics in Medicine 1993; 12(12): 1155–1169.
  • 14. Pan SL, Wu HM, Yen AMF, Chen THH. A Markov regression random-effects model for remission of functional disability in patients following a first stroke: a Bayesian approach. Statistics in Medicine 2007; 26(29): 5335–5353.
  • 15. Van Den Hout A, Matthews FE. Estimating dementia-free life expectancy for Parkinson's patients using Bayesian inference and microsimulation. Biostatistics 2009; 10(4): 729–743.
  • 16. Kneib T, Hennerfeind A. Bayesian semiparametric multi-state models. Statistical Modelling 2008; 8(2): 169–198.
  • 17. De Iorio M, Gallot N, Valcarcel B, Wedderburn L. A Bayesian semiparametric Markov regression model for juvenile dermatomyositis. Statistics in Medicine 2018; 37(10): 1711–1731.
  • 18. Huang J. Efficient estimation for the proportional hazards model with interval censoring. The Annals of Statistics 1996; 24(2): 540–568.
  • 19. Lawless JF. A note on interval-censored lifetime data and the constant-sum condition of Oller, Gómez & Calle (2004). Canadian Journal of Statistics 2004; 32(3): 327–331.
  • 20. Oller R, Gómez G, Calle ML. Interval censoring: identifiability and the constant-sum property. Biometrika 2007; 94(1): 61–70.
  • 21. Van Dyk DA, Meng XL. The art of data augmentation. Journal of Computational and Graphical Statistics 2001; 10(1): 1–50.
  • 22. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 1977; 39(1): 1–22.
  • 23. Meng XL, Van Dyk D. Fast EM-type implementations for mixed effects models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 1998; 60(3): 559–578.
  • 24. Murphy SA, Van Der Vaart AW. On profile likelihood. Journal of the American Statistical Association 2000; 95(450): 449–465.
  • 25.Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 1972; 34(2): 187–202. [Google Scholar]
  • 26.Groeneboom P, Wellner JA. Information bounds and nonparametric maximum likelihood estimation. 19. Springer Science & Business Media. 1992. [Google Scholar]
  • 27.Zhang Z, Sun J. Interval censoring. Statistical Methods in Medical Research 2010; 19(1): 53–70. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Marshall G, Guo W, Jones RH. MARKOV: A computer program for multi-state Markov models with covariables. Computer Methods and Programs in Biomedicine 1995; 47(2): 147–156. [DOI] [PubMed] [Google Scholar]
  • 29.Datta S, Satten GA. Estimating future stage entry and occupation probabilities in a multistage model based on randomly right-censored data. Statistics & Probability Letters 2000; 50(1): 89–95. [Google Scholar]
  • 30.Datta S, Satten GA, Datta S. Nonparametric estimation for the three-stage irreversible illness–death model. Biometrics 2000; 56(3): 841–847. [DOI] [PubMed] [Google Scholar]
  • 31.Jackson CH, Sharples LD, Thompson SG, Duffy SW, Couto E. Multistate Markov models for disease progression with classification error. Journal of the Royal Statistical Society: Series D (The Statistician) 2003; 52(2): 193–209. [Google Scholar]
  • 32.Gu Y, Zeng D, Heiss G, Lin DY. Maximum Likelihood Estimation for Semiparametric Regression Models with Interval-Censored Multistate Data. Biometrika 2023: asad073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Kalbfleisch J, Lawless JF. The analysis of panel data under a Markov assumption. Journal of the American Statistical Association 1985: 863–871. [Google Scholar]
  • 34.Van Den Hout A. Multi-State Survival Models for Interval-Censored Data. CRC Press. 2016. [Google Scholar]
  • 35.Tian L, Zucker D, Wei L. On the Cox model with time-varying regression coefficients. Journal of the American Statistical Association 2005; 100(469): 172–183. [Google Scholar]
  • 36.Sun J, Kopciuk KA, Lu X. Polynomial spline estimation of partially linear single-index proportional hazards regression models. Computational Statistics & Data Analysis 2008; 53(1): 176–188. [Google Scholar]
  • 37.Ma L, Hu T, Sun J. Cox regression analysis of dependent interval-censored failure time data. Computational Statistics & Data Analysis 2016; 103: 79–90. [Google Scholar]
  • 38.Finkelstein DM, Goggins WB, Schoenfeld DA. Analysis of failure time data with dependent interval censoring. Biometrics 2002; 58(2): 298–304. [DOI] [PubMed] [Google Scholar]
  • 39.Wu CJ. On the convergence properties of the EM algorithm. The Annals of Statistics 1983; 11(1): 95–103. [Google Scholar]


Supplementary Materials

Code
Supinfo

Data Availability Statement

The data in the case study and the Julia code for implementing the proposed methods can be found online at https://github.com/luyouepiusf/approximation_method.
