Biometrika. 2020 Jun 17;107(4):949–964. doi: 10.1093/biomet/asaa034

General regression model for the subdistribution of a competing risk under left-truncation and right-censoring

A Bellach 1, M R Kosorok 2, P B Gilbert 3, J P Fine 2
PMCID: PMC7799183  PMID: 33462536

Summary

Left-truncation poses extra challenges for the analysis of complex time-to-event data. We propose a general semiparametric regression model for left-truncated and right-censored competing risks data that is based on a novel weighted conditional likelihood function. Targeting the subdistribution hazard, our parameter estimates are directly interpretable with regard to the cumulative incidence function. We compare different weights from recent literature and develop a heuristic interpretation from a cure model perspective that is based on pseudo risk sets. Our approach accommodates external time-dependent covariate effects on the subdistribution hazard. We establish consistency and asymptotic normality of the estimators and propose a sandwich estimator of the variance. In comprehensive simulation studies we demonstrate solid performance of the proposed method. Comparing the sandwich estimator with the inverse Fisher information matrix, we observe a bias for the inverse Fisher information matrix and diminished coverage probabilities in settings with a higher percentage of left-truncation. To illustrate the practical utility of the proposed method, we study its application to a large HIV vaccine efficacy trial dataset.

Keywords: Cure model, Fine–Gray model, Proportional odds model, Pseudo risk set, Semiparametric transformation model, Time-varying covariate, Weighted conditional nonparametric maximum likelihood estimation

1. Introduction

Delayed entries arise in medical studies when individuals come under observation some time after the initial event. The time of the initial event is referred to as the left-truncation time or the delayed entry time, and we denote it by Inline graphic. We consider a competing risks setting, in which individuals are exposed to several distinct event types, denoted by Inline graphic, and only the first-occurring event is observable at time Inline graphic. Individuals are subject to independent right-censoring at Inline graphic. Defining Inline graphic and Inline graphic, we observe Inline graphic only for those individuals with Inline graphic. Such a setting is referred to as a competing risks setting with left-truncated and right-censored data. Individuals with Inline graphic are subject to left-truncation. We will make the common assumption that Inline graphic and Inline graphic are independent given a vector of covariates Inline graphic.

Because the sample represents only those individuals whose event and censoring times exceed the left-truncation time, delayed entries need to be taken into account when fitting survival regression models. For the analysis of survival data with independent left-truncation and right-censoring but without competing risks, a conditional likelihood approach was proposed by Andersen et al. (1988, 2013), Keiding & Gill (1990) and Owen (2001), among others. In particular, one can estimate the regression parameters in proportional hazards models using a conditional partial likelihood, where individuals are included in the risk set from their time of entrance until an event of interest or a censoring is observed.

A common approach used in competing risks settings with independent right-censoring is to model the cause-specific hazard Inline graphic, which can be interpreted as the instantaneous risk of dying from cause Inline graphic at time Inline graphic conditional on being alive at Inline graphic. The cumulative incidence function, defined as Inline graphic Inline graphic, is a complex function of all cause-specific hazards. Therefore, estimated parameters targeting the cause-specific hazard cannot be interpreted with regard to the corresponding cumulative incidence function. An alternative approach models the subdistribution hazard for the event of interest,

graphic file with name Equation1.gif

as defined by Gray (1988), which is the instantaneous risk of experiencing an event of interest at time Inline graphic conditional on not having experienced an event of interest until Inline graphic. The subdistribution hazard is directly related to the corresponding cumulative incidence function through Inline graphic. Regression parameters targeting the subdistribution hazard are thus directly interpretable with regard to the cumulative incidence function. The subdistribution hazard is also linked to a cure model formulation of the competing risks setting (Bellach et al., 2019) and may be interpreted as the hazard of the event time Inline graphic as in Fine & Gray (1999).
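As a minimal numerical sketch of these two points, consider two competing events with constant cause-specific hazards. This setting, the function names and the rate values are illustrative assumptions and are not taken from the paper. The cumulative incidence of the event of interest then depends on both hazards, and it can be recovered from the integrated subdistribution hazard through the standard identity F_1(t) = 1 - exp{-Lambda_1(t)}.

```python
import math

def cumulative_incidence(t, lam1, lam2):
    """CIF of cause 1 under constant cause-specific hazards (assumed setting)."""
    total = lam1 + lam2
    return lam1 / total * (1.0 - math.exp(-total * t))

def subdistribution_hazard(t, lam1, lam2):
    """Subdistribution hazard of cause 1: density of F_1 divided by 1 - F_1."""
    density = lam1 * math.exp(-(lam1 + lam2) * t)
    return density / (1.0 - cumulative_incidence(t, lam1, lam2))

def cumulative_subdistribution_hazard(t, lam1, lam2, steps=10000):
    """Trapezoidal integral of the subdistribution hazard over [0, t]."""
    width = t / steps
    values = [subdistribution_hazard(k * width, lam1, lam2)
              for k in range(steps + 1)]
    return width * (sum(values) - 0.5 * (values[0] + values[-1]))

# The CIF of cause 1 changes when only the competing hazard lam2 changes ...
f_low = cumulative_incidence(2.0, 0.5, 0.3)
f_high = cumulative_incidence(2.0, 0.5, 1.0)

# ... and it is recovered from the cumulative subdistribution hazard.
f_from_hazard = 1.0 - math.exp(
    -cumulative_subdistribution_hazard(2.0, 0.5, 0.3))
```

Raising the competing hazard lowers the cumulative incidence of cause 1 even though its own cause-specific hazard is unchanged, which is why cause-specific parameters do not translate directly into statements about the cumulative incidence function.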

Regression modelling for the subdistribution of a competing risk with independent left-truncation and right-censoring has so far been investigated only for the proportional hazards model (Geskus, 2011; Zhang et al., 2011). As the proportional hazards assumption is not valid in general, it is of interest to consider a more general direct regression model. Moreover, the existing literature does not provide intuition for the proposed weights, and in particular the weighted risk set appears to be unexplained. The purpose of this paper is to give comprehensible arguments for the proposed weights and risk sets, and to derive estimators for a general direct regression model targeting the subdistribution hazard based on a conditional nonparametric maximum likelihood procedure.

We first investigate a competing risks model in which the competing risk event is associated with a cure event. This is frequently the case with infectious diseases. As a historic example, Edward Jenner’s observation that prior infection with cowpox confers protective immunity against smallpox played a pivotal role in the development of the smallpox vaccine, which led to the eradication of smallpox in 1980. A more recent example is vaccination against the dengue virus; epidemiological data support the idea that disease caused by a given dengue genotype prevents disease due to other dengue genotypes within the same serotype (Juraska et al., 2018).

In the cure model setting, left-truncation and right-censoring times can be observed independently of the cure event. Cured individuals with Inline graphic and Inline graphic are incorporated into the risk set, which is denoted by Inline graphic. It is always possible to determine the size of such a risk set from the observed data, although the size of the cure fraction may be unknown. For the competing risks model, individuals with a previous competing risk event are incorporated into the risk set like a cure fraction, as they are no longer exposed to the event of interest. It should be noted that the terms risk process and risk set are rooted in the tradition of survival analysis and should not be understood in a literal sense. The risk set in a subdistribution hazard model consists of those individuals who have not experienced an event of interest up to time Inline graphic, and may never experience an event of interest because of a cure event or competing risk event prior to Inline graphic. In particular, not all individuals in the risk set are actually at risk for the event of interest.

First, we derive the conditional likelihood function for the cure model, adhering to the principle that the true parameter can be estimated efficiently by the value that maximizes the probability of the observed data. Second, we study the classical competing risks setting, where the occurrence of any event would terminate observation. We use the cure model to obtain new insights into the direct competing risks regression model with left-truncated and right-censored data. For estimation, we weight using the observed data, mimicking the structure that would have occurred in the cure model setting, and define the pseudo risk set. Specifically, the expected number of individuals in the pseudo risk set is by construction equal to the expected number of individuals in the corresponding cure model risk set.

We computed the proposed weighted conditional nonparametric maximum likelihood estimators using the R (R Development Core Team, 2020) function optim with the method specified as BFGS, which utilizes the Broyden–Fletcher–Goldfarb–Shanno algorithm, a quasi-Newton method. For all simulation studies and clinical trial data that we investigated, the optimization worked reliably.

2. Pseudo risk set and weighted likelihood function

2.1. Cure model with complete data

We start by investigating a competing risks model in which the competing risk event is connected to the cure of the subject. Denoting an event of interest by Inline graphic and the competing event by Inline graphic, the cure event at Inline graphic implies that Inline graphic, although the subject is still observable. Let Inline graphic denote the duration of the study, with administrative censoring at Inline graphic. For settings with only administrative censoring at Inline graphic, we have Inline graphic and Inline graphic. For a random sample Inline graphic, the observed data are Inline graphic for Inline graphic and Inline graphic for Inline graphic. The process counting the events of interest is Inline graphic where Inline graphic. The individual at-risk process is Inline graphic  Inline graphic, and Inline graphic is the cardinality of the risk set Inline graphic for Inline graphic.

We denote the cumulative subdistribution hazard for the event of interest by Inline graphic and define Inline graphic. As argued in Bellach et al. (2019), Inline graphic is the compensator of Inline graphic with respect to the filtration Inline graphic, and the Nelson–Aalen estimator for the cumulative subdistribution hazard can be derived from the Doob decomposition Inline graphic, with Inline graphic denoting the martingale. A likelihood function as in Bellach et al. (2019, § 2.1) can be derived by considering that the contribution for an event of interest is Inline graphic with Inline graphic, and the contribution for a cure event or administrative censoring at Inline graphic is Inline graphic. The estimator for the cumulative subdistribution hazard that is obtained from maximizing this likelihood function is equivalent to the Nelson–Aalen estimator.

2.2. Cure model with independent left-truncation and right-censoring

With independent left-truncation and right-censoring, we have Inline graphic and Inline graphic. The observed data are Inline graphic for subjects Inline graphic, Inline graphic for subjects Inline graphic and Inline graphic for subjects Inline graphic. The process counting the observed events of interest is denoted by Inline graphic with Inline graphic, and Inline graphic is the individual at-risk process for subject Inline graphic. Then Inline graphic is the cardinality of the risk set

graphic file with name Equation2.gif (1)
graphic file with name Equation3.gif (2)

The risk set in (1) is based on observable data. Subjects enter the study at Inline graphic independently of a possible cure event and remain in the risk set until an event of interest is observed at Inline graphic or until censoring at Inline graphic. The size of the cure fraction Inline graphic may be unobservable for the theoretical decomposition (2), with subjects entering the study after the cure event. For modelling of the subdistribution hazard, however, only knowledge about the size of the risk sets Inline graphic is required, which can be obtained from (1) regardless of the specific decomposition (2).

A Nelson–Aalen estimator for the cumulative subdistribution hazard can be derived from the weighted Doob decomposition Inline graphic. The same estimate may also be derived from a weighted conditional likelihood function obtained by considering that the contribution for an observed event of interest at Inline graphic is Inline graphic with Inline graphic, the contribution for an observed censoring is Inline graphic, and the contribution for a cured individual with Inline graphic is Inline graphic, regardless of whether the cure event occurred before or after the entrance time Inline graphic. We thus obtain the conditional likelihood function

graphic file with name Equation4.gif

where the cumulative baseline subdistribution hazard is approximated by a step function Inline graphic with jumps at the observed event times, which are denoted by Inline graphic.
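The risk-set logic of this subsection can be sketched as a Nelson–Aalen computation with delayed entry. This is a minimal illustration under stated assumptions: the argument names are hypothetical, each subject is taken to be in the risk set at time t when entry < t <= exit (the exact convention of (1) is not reproduced here), and cured subjects are represented simply by an exit time equal to their censoring time with no event of interest.

```python
def nelson_aalen_delayed_entry(entry, exit_time, event):
    """Nelson-Aalen steps for left-truncated, right-censored data.

    entry[i]     : delayed entry (left-truncation) time of subject i
    exit_time[i] : time of the event of interest or of censoring
    event[i]     : True if an event of interest was observed at exit_time[i]
    Cured subjects stay in the risk set until their censoring time.
    """
    event_times = sorted({x for x, d in zip(exit_time, event) if d})
    cumulative_hazard = 0.0
    steps = []
    for t in event_times:
        # Assumed risk-set convention: entered before t, not yet exited.
        at_risk = sum(1 for l, x in zip(entry, exit_time) if l < t <= x)
        events = sum(1 for x, d in zip(exit_time, event) if d and x == t)
        cumulative_hazard += events / at_risk
        steps.append((t, cumulative_hazard))
    return steps

# Toy data (made up): subject 1 is censored or cured, the others have events.
steps = nelson_aalen_delayed_entry(
    entry=[0.0, 0.0, 1.0, 2.0],
    exit_time=[3.0, 5.0, 4.0, 6.0],
    event=[True, False, True, True],
)
```

Only the size of the risk set at each event time enters the computation, so the sketch never needs to know which censored subjects belong to the cure fraction.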

2.3. Left-truncated and right-censored competing risks data

Modelling the subdistribution hazard for the competing risks model requires a reconstruction of the cure model risk set, where the cure fraction consists of Inline graphic. For the competing risks model, however, only subjects with Inline graphic are observable, and in particular subjects Inline graphic are assigned to the cure fraction. The observed data are Inline graphic for subjects Inline graphic and Inline graphic for subjects Inline graphic.

To maintain the cure model structure, an adjustment for hypothetical left-truncation after competing risk events and right-censoring of individuals in the cure fraction is required. We apply inverse probability of censoring weighting. A weight function Inline graphic that consistently estimates Inline graphic, where Inline graphic, would be justified by Inline graphic. We make the decomposition Inline graphic, with simplified weights Inline graphic, to define the pseudo risk set

graphic file with name Equation5.gif

The process counting the observable events of interest is defined by Inline graphic, with Inline graphic. A Nelson–Aalen-type estimator for the cumulative subdistribution hazard is obtained from the weighted Doob decomposition Inline graphic, and may also be derived from the weighted conditional likelihood function from § 2.2, where the risk set indicator Inline graphic is replaced by Inline graphic:

graphic file with name Equation6.gif

A product limit estimator for the cumulative incidence function can be derived based on the Nelson–Aalen-type estimator for the subdistribution hazard. Zhang et al. (2009) and Geskus (2011) proved its equivalence to the fully efficient Aalen–Johansen estimator. This suggests that the weighted conditional likelihood function is a promising candidate for likelihood-based inference on the subdistribution hazard.

3. Definition and estimation of the weights

We use the following representation of the theoretical weights:

graphic file with name Equation7.gif

Denoting the number of observable events by Inline graphic and the number of observable individuals by Inline graphic, we estimate Inline graphic consistently by Inline graphic. The unconditional probabilities Inline graphic and Inline graphic can be estimated with the product limit estimator. Let Inline graphic be the process counting the competing risk events. We obtain the estimated weights

graphic file with name Equation8.gif

where Inline graphic is the left-truncated version of the product limit estimator for the overall survival, as proposed by Zhang et al. (2011). Decomposing the weight yields Inline graphic, with

graphic file with name Equation9.gif

Geskus (2011) introduced weights based on the product limit estimator Inline graphic estimating Inline graphic and the inverse product limit estimator of the delayed entries Inline graphic estimating Inline graphic, which had previously been investigated by Keiding & Gill (1990) for a model with independent left-truncation and no censoring. A common assumption for left-truncated and right-censored data is independence of Inline graphic and Inline graphic, but not independence of Inline graphic and Inline graphic. For this reason, the validity of the weights

graphic file with name Equation10.gif

which may be decomposed as Inline graphic with

graphic file with name Equation11.gif

is not immediately apparent. A theoretical justification using results of He & Yang (1998) would require independence of Inline graphic and Inline graphic. Interestingly, the weights Inline graphic and Inline graphic are equivalent with continuous Inline graphic, as we prove in the Supplementary Material, thereby justifying the weights proposed by Geskus (2011).

For a competing risks model with only left-truncation and no right-censoring, the weights simplify; for example, the theoretical weights in this case are Inline graphic.
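The product limit estimator under left-truncation that enters the estimated weights can be sketched as follows. This is a generic illustration, not the authors' implementation: the argument names are hypothetical, the risk-set convention entry < t <= exit is an assumption, and the exact weight formulas of this section are not reproduced.

```python
def product_limit_left_truncated(entry, exit_time, any_event):
    """Product limit curve for overall survival with delayed entry.

    any_event[i] is True if subject i had an event of any type at
    exit_time[i], and False if the subject was censored.
    """
    event_times = sorted({x for x, d in zip(exit_time, any_event) if d})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for l, x in zip(entry, exit_time) if l < t <= x)
        events = sum(1 for x, d in zip(exit_time, any_event) if d and x == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

def evaluate_step(curve, t):
    """Value at time t of the right-continuous step function in `curve`."""
    value = 1.0
    for time, level in curve:
        if time <= t:
            value = level
    return value

# Toy data (made up) with one censored subject.
curve = product_limit_left_truncated(
    entry=[0.0, 0.0, 1.0, 2.0],
    exit_time=[3.0, 5.0, 4.0, 6.0],
    any_event=[True, False, True, True],
)
survival_at_4_5 = evaluate_step(curve, 4.5)
```

Weights of inverse-probability type are then built from ratios of such step functions evaluated at different times. With small risk sets early in follow-up, which are typical under left-truncation, the estimate can be unstable.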

4. General regression model

We consider estimation of a general model for the cumulative subdistribution hazard

graphic file with name Equation12.gif

as in Bellach et al. (2019), akin to a general semiparametric regression model for survival and recurrent event data without left-truncation and competing risks, which has been investigated by Zeng & Lin (2006); here Inline graphic is a vector of unknown regression parameters, Inline graphic is an unspecified increasing function, and Inline graphic is a thrice continuously differentiable and strictly increasing function with Inline graphic, Inline graphic and Inline graphic. Regularity conditions for this model, which are considerably weaker than those in Zeng & Lin (2006), are specified in the Appendix. Special cases of the general model include the Box–Cox transformation models with link function Inline graphic for Inline graphic, and the logarithmic transformation models with link function Inline graphic for Inline graphic (Chen et al., 2002). The Fine–Gray model under left-truncation and right-censoring (Geskus, 2011; Zhang et al., 2011) is also a special case. In particular, the proportional hazards model is the Box–Cox transformation model with Inline graphic and the limiting logarithmic transformation model with Inline graphic, while the proportional odds model is the logarithmic transformation model with Inline graphic and the limiting Box–Cox transformation model with Inline graphic. If Inline graphic is differentiable, the instantaneous subdistribution hazard rate is defined by

graphic file with name Equation13.gif (3)

where Inline graphic denotes the baseline subdistribution hazard and Inline graphic is a vector of external time-dependent or time-independent covariates.
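The special cases named above can be checked numerically. The sketch below assumes the usual forms of the two link families from the transformation-model literature, G(x) = {(1 + x)^rho - 1}/rho for the Box–Cox family and G(x) = log(1 + r x)/r for the logarithmic family, each with its limit at zero. The link formulas are not legible in this copy of the text, so these forms and the parameter names rho and r are assumptions.

```python
import math

def box_cox_link(x, rho):
    """Assumed Box-Cox link; rho = 0 is taken as the limit log(1 + x)."""
    if rho == 0.0:
        return math.log1p(x)
    return ((1.0 + x) ** rho - 1.0) / rho

def logarithmic_link(x, r):
    """Assumed logarithmic link; r = 0 is taken as the limit G(x) = x."""
    if r == 0.0:
        return x
    return math.log1p(r * x) / r

def cumulative_incidence(x, link, parameter):
    """CIF implied by a cumulative subdistribution hazard G(x)."""
    return 1.0 - math.exp(-link(x, parameter))

x = 1.5  # stands for the integrated exp(beta'Z) d(baseline), made-up value

# Proportional hazards: Box-Cox with rho = 1, or logarithmic with r -> 0.
ph_cif = cumulative_incidence(x, box_cox_link, 1.0)

# Proportional odds: logarithmic with r = 1, or Box-Cox with rho -> 0.
po_cif = cumulative_incidence(x, logarithmic_link, 1.0)
po_odds = po_cif / (1.0 - po_cif)  # equals x under proportional odds
```

Under the proportional odds link the odds of the cumulative incidence are linear in the covariate-dependent term, whereas under the proportional hazards link the cumulative subdistribution hazard itself is.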

For the cure model with independent left-truncation and right-censoring, we define Inline graphic to obtain the conditional loglikelihood

graphic file with name Equation14.gif

By rearranging terms in the second sum, this can be written as

graphic file with name Equation15.gif

With independently left-truncated and right-censored competing risks data, we obtain

graphic file with name Equation16.gif (4)

For statistical inference, the baseline subdistribution hazard Inline graphic is approximated by a sequence of step functions, and the weighted nonparametric maximum likelihood estimator Inline graphic is obtained by maximizing the discretized loglikelihood function with respect to Inline graphic and the jump sizes Inline graphic. In the Appendix and Supplementary Material we prove the following theorem.

Theorem 1.

The weighted nonparametric maximum likelihood estimator Inline graphic is uniformly consistent.

With an application of Lemma S2 from the Supplementary Material, which is based on Theorem 3.3.1 in van der Vaart & Wellner (1996), we derive the following theorem.

Theorem 2.

We have that Inline graphic converges weakly to a zero-mean Gaussian process.

We prove these theorems first for the cure model set-up, thereby adapting arguments by Murphy (1994, 1995), Parner (1998), Zeng & Lin (2006) and Bellach et al. (2019). These results are then adapted to the direct competing risks regression model, which has a similar mathematical structure.

To obtain weak convergence and for variance estimation, we consider the linear functionals

graphic file with name Equation17.gif

where Inline graphic, with Inline graphic being a function in the Skorohod space Inline graphic and Inline graphic. Let Inline graphic be the ordered times for the event of interest and let Inline graphic be the number of such events. Defining a vector Inline graphic with Inline graphic  Inline graphic, the asymptotic variance of the linear functional can be estimated by the sandwich estimator

graphic file with name Equation18.gif

where Inline graphic is the observed Fisher information matrix with respect to Inline graphic and the jump sizes, and Inline graphic is the estimated variance of the score with respect to Inline graphic and the jump sizes, with Inline graphic, Inline graphic and Inline graphic denoting the components of the independent and identically distributed decomposition given in the Supplementary Material. Analogously, we obtain the estimator for the covariances of Inline graphic and Inline graphic, with Inline graphic and with corresponding vectors Inline graphic:

graphic file with name Equation19.gif

For the cure model, the sandwich estimator simplifies to the inverse Fisher information. However, for the competing risks model with independent left-truncation and right-censoring, the middle term Inline graphic contains additional terms reflecting the extra variability from the estimated weights.
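The structure of the sandwich estimator can be illustrated with a two-parameter toy example. This is a sketch only: the matrices are made-up numbers, the helper names are hypothetical, any normalization by the sample size is ignored, and the closed-form inverse is limited to 2 x 2 matrices to keep the example dependency-free.

```python
def invert_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    """Plain matrix product for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sandwich(information, score_variance):
    """Sandwich form: inverse information, score variance, inverse information."""
    inverse = invert_2x2(information)
    return matmul(matmul(inverse, score_variance), inverse)

def functional_variance(h, covariance):
    """Variance of the linear functional h' theta, i.e. h' Sigma h."""
    size = len(h)
    return sum(h[i] * covariance[i][j] * h[j]
               for i in range(size) for j in range(size))

information = [[2.0, 0.0], [0.0, 4.0]]

# If the score variance equals the information, the sandwich collapses
# to the inverse information, as described for the cure model.
collapsed = sandwich(information, [[2.0, 0.0], [0.0, 4.0]])

# Extra variability in the middle term inflates the variance.
inflated = sandwich(information, [[3.0, 0.0], [0.0, 4.0]])
variance_h = functional_variance([1.0, 1.0], inflated)
```

In the toy example the inflated middle term raises the first diagonal entry from 0.5 to 0.75, which mirrors how estimated weights add variability that the inverse Fisher information alone does not capture.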

5. Simulation studies

We conducted simulation studies for the Fine–Gray model and for the proportional odds model to evaluate the bias, the empirical and model-based variances, and the coverage probabilities of the proposed method for different sample sizes. We considered settings with two different event types. All simulation results are based on 1000 repetitions.

For our first set of simulations with fixed covariates, displayed in Table 1, we used the model formulations in Fine & Gray (1999) and Fine (2001), which are specified in the Supplementary Material. The covariates Inline graphic and Inline graphic  Inline graphic were generated independently from a standard normal distribution. For the second set of simulations, displayed in Table 2, we developed a new method to generate event times for competing risks models with binary time-dependent covariates, the details of which are given in the Supplementary Material. We generated Inline graphic from a standard normal distribution and generated Inline graphic as a time-dependent covariate with one transition from 0 to 1, thereby assuming that the covariate process is observable after the competing risk event. The transition time for Inline graphic was generated from a standard exponential distribution.
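The covariate part of the second simulation design can be sketched as below. Only the covariate generation described in this paragraph is covered; the generation of event times is given in the Supplementary Material and is not reproduced. The function names, the seed handling and the convention that the covariate equals 1 from the transition time onwards are assumptions.

```python
import random

def generate_covariates(n, seed=0):
    """Draw a standard normal covariate and a standard exponential
    transition time for each of n subjects."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        fixed_covariate = rng.gauss(0.0, 1.0)
        transition_time = rng.expovariate(1.0)
        subjects.append((fixed_covariate, transition_time))
    return subjects

def time_dependent_covariate(transition_time, t):
    """Binary covariate with a single transition from 0 to 1.

    Assumed convention: the value is 1 at and after the transition time.
    """
    return 1 if t >= transition_time else 0

subjects = generate_covariates(5, seed=1)
paths = [[time_dependent_covariate(transition, t) for t in (0.0, 0.5, 1.0, 2.0)]
         for _, transition in subjects]
```

Because the covariate path is a function of the transition time alone, it remains defined after a competing risk event, in line with the assumption that the covariate process stays observable.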

Table 1.

Simulation results for the Fine–Gray and proportional odds models. For columns 3–8, Inline graphic, Inline graphic, Inline graphic is generated from Inline graphic with probability Inline graphic and set to zero otherwise, and Inline graphic with Inline graphic; approximately Inline graphic are left-truncated, Inline graphic are censored, Inline graphic are events of interest and Inline graphic are competing events. For columns 9–14, Inline graphic, Inline graphic, Inline graphic is generated from Inline graphic with probability Inline graphic and set to Inline graphic otherwise, and Inline graphic with Inline graphic; approximately Inline graphic are left-truncated, Inline graphic are censored, Inline graphic are events of interest and Inline graphic are competing events

Fine–Gray model Inline graphic; Proportional odds model Inline graphic
Size Parameter Bias SE Inline graphic Inline graphic CPInline graphic CPInline graphic Bias SE Inline graphic Inline graphic CPInline graphic CPInline graphic
[Table 1 body: all cell values were embedded as inline images and are not recoverable from this copy.]

Bias, absolute value of bias; SE, empirical standard error; Inline graphic, inverse Fisher information; Inline graphic, sandwich estimator; CPInline graphic and CPInline graphic, coverage probabilities of Inline graphic confidence intervals for Inline graphic and Inline graphic, respectively; all results are multiplied by Inline graphic.

Table 2.

Results of simulation studies with external time-dependent covariates. For columns 3–8, Inline graphic, Inline graphic, Inline graphic is generated from Inline graphic with probability Inline graphic and set to Inline graphic otherwise, and Inline graphic with Inline graphic; approximately Inline graphic are left-truncated, Inline graphic are censored, Inline graphic are events of interest and Inline graphic are competing events. For columns 9–14, Inline graphic, Inline graphic, Inline graphic is generated from Inline graphic with probability Inline graphic and set to Inline graphic otherwise, and Inline graphic with Inline graphic; approximately Inline graphic are left-truncated, Inline graphic are censored, Inline graphic are events of interest and Inline graphic are competing events

Fine–Gray model Inline graphic; Proportional odds model Inline graphic
Size Parameter Bias SE Inline graphic Inline graphic CPInline graphic CPInline graphic Bias SE Inline graphic Inline graphic CPInline graphic CPInline graphic
[Table 2 body: all cell values were embedded as inline images and are not recoverable from this copy.]
Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

Bias, absolute value of bias; SE, empirical standard error; Inline graphic, inverse Fisher information; Inline graphic, sandwich estimator; CPInline graphic and CPInline graphic, coverage probabilities of Inline graphic confidence intervals for Inline graphic and Inline graphic, respectively; all results are multiplied by Inline graphic.

In all simulation studies the performance of the weighted nonparametric maximum likelihood estimator is solid, even for small sample sizes, in terms of the bias, and the standard errors and coverage probabilities of the proposed sandwich estimator. The bias is generally small, the standard errors decrease at rate Inline graphic, the proposed sandwich estimators are close to the empirical variance estimators, and the coverage probabilities are close to the nominal level of Inline graphic, particularly for sample sizes Inline graphic, Inline graphic and Inline graphic.
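As an illustration of how coverage probabilities of this kind are typically computed, the following minimal Python sketch counts how often a Wald-type interval around each simulated estimate contains the true parameter value. The function name `coverage_probability`, its arguments and the toy numbers are assumptions made for this illustration; they are not taken from the simulation code used in the paper.

```python
from statistics import NormalDist


def coverage_probability(estimates, std_errors, true_value, level=0.95):
    """Share of Wald intervals (estimate +/- z * se) that contain true_value."""
    # Two-sided normal quantile, about 1.96 for level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return hits / len(estimates)


# Toy replicates (hypothetical values, true parameter equal to 0).
toy_estimates = [0.0, 0.1, 0.5, -0.3]
adequate_se = [0.10, 0.10, 0.10, 0.10]
too_small_se = [0.05, 0.05, 0.05, 0.05]

print(coverage_probability(toy_estimates, adequate_se, 0.0))   # 0.5
print(coverage_probability(toy_estimates, too_small_se, 0.0))  # 0.25
```

In the toy example, halving the standard errors narrows every interval and lowers the coverage, which is the mechanism by which an underestimated variance leads to coverage below the nominal level.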

Comparing the estimated variances, in particular in the additional simulation studies presented in the Supplementary Material, we observe that the proposed sandwich estimator has the desired properties, whereas the inverse Fisher information matrix underestimates the variance of the parameter estimates, and the corresponding coverage probabilities are clearly smaller than the desired value of Inline graphic.
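The contrast between a model-based variance and a sandwich variance can be sketched generically for a scalar parameter. The code below is not the weighted-likelihood sandwich estimator of this paper, whose details are given in the Supplementary Material; it is the textbook form, in which the inverse information is corrected by the empirical variability of the score contributions. The function name `variance_estimates` and the inputs are hypothetical.

```python
def variance_estimates(scores, neg_hessians):
    """Model-based and sandwich variance estimates for a scalar parameter.

    scores:       per-observation score contributions at the estimate
    neg_hessians: per-observation negative second derivatives of the
                  loglikelihood at the estimate
    """
    information = sum(neg_hessians)           # observed information, A
    score_outer = sum(s * s for s in scores)  # sum of squared scores, B
    model_based = 1.0 / information           # inverse information, 1 / A
    sandwich = score_outer / information ** 2  # B / A^2
    return model_based, sandwich


# Hypothetical contributions: the scores vary more than the information
# suggests, so the model-based variance is smaller than the sandwich variance.
model_based, sandwich = variance_estimates(
    scores=[1.0, -1.0, 2.0, -2.0],
    neg_hessians=[1.0, 1.0, 1.0, 1.0],
)
print(model_based, sandwich)  # 0.25 0.625
```

When the sum of squared scores equals the observed information, the two estimates coincide; a gap between them signals that the inverse information alone misstates the variability.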

Simulation results reported in the Supplementary Material confirm our theoretical results on the equivalence of the weights proposed by Geskus (2011) and Zhang et al. (2011).

6. Application to HIV vaccine efficacy trial data

We analysed data from the HVTN 503 preventive HIV vaccine efficacy trial in South Africa to assess how participant factors are associated with the time from reaching adulthood to diagnosis of genotype-specific HIV-1 infection. These data, from 784 individuals screened at age 18 and above, are left-truncated because only those who test negative for HIV-1 are eligible to enrol in the trial. It was of particular interest to investigate the efficacy of the vaccine and the effects of sex at birth and body mass index at enrolment on the time to infection with the HIV-1 169K virus, which has a vaccine-matched residue at position 169 of the V2 loop, compared with the effects on the time to infection with the HIV-1 K169! virus, which has a vaccine-mismatched residue at position 169. This question is relevant because an HIV vaccine is currently undergoing testing for efficacy in the HVTN 702 randomized, placebo-controlled trial in South Africa, and it is hypothesized that the efficacy of this vaccine against the virus type HIV-1 169K is superior to that against type HIV-1 K169!, because a previous trial of a similar HIV vaccine showed greater efficacy against HIV-1 type 169K (Rolland et al., 2012).

A total of 28 infections with HIV-1 virus type 169K and 15 infections with HIV-1 virus type K169! were observed. We conducted backward selection starting with the covariates vaccine assignment, sex at birth and body mass index, and removed body mass index, which was clearly nonsignificant.

Our findings, displayed in Tables 3 and 4, indicate that women in South Africa tend to be infected at earlier ages than men. For HIV-1 type 169K, the results can be interpreted as an Inline graphic increase in the expected subdistribution hazard for women, with Inline graphic, and an Inline graphic increase in the expected odds ratio, with Inline graphic. For HIV-1 type K169! this would be interpreted as a Inline graphic increase in the expected subdistribution hazard for women, with Inline graphic, and a Inline graphic increase in the expected odds ratio, with Inline graphic. As illustrated in Fig. 1, for infection with HIV-1 type 169K, the fit of the model may be improved with a logarithmic transformation and parameter Inline graphic, while for infection with HIV-1 type K169! a Box–Cox transformation with a larger parameter Inline graphic may yield a moderately improved fit over the Fine–Gray and proportional odds models. The observed direct association with risk of being female in comparison to being male was therefore significantly greater for HIV-1 type 169K than for HIV-1 type K169!. The result that women and men have different risks especially for HIV-1 type 169K motivates further exploratory analyses in the HVTN 702 trial, including assessment of whether sex modifies vaccine efficacy against HIV-1 type 169K.
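The percentage interpretations used above follow from exponentiating a regression coefficient. The sketch below assumes the usual convention in which covariates enter the model through exp(beta * Z), so that exp(beta) is the subdistribution hazard ratio under the Fine–Gray model and the odds ratio under the proportional odds model. The function names and the numerical inputs are hypothetical and are not the estimates reported in Tables 3 and 4.

```python
import math


def percent_increase(beta):
    """Percentage change in the hazard (or odds) per unit increase in Z."""
    return 100.0 * (math.exp(beta) - 1.0)


def percent_increase_interval(beta, std_error, z=1.96):
    """Wald interval for the percentage change, built on the log scale."""
    lower = percent_increase(beta - z * std_error)
    upper = percent_increase(beta + z * std_error)
    return lower, upper


# Hypothetical coefficient and standard error for an indicator covariate.
beta_hat, se_hat = 0.50, 0.20
print(round(percent_increase(beta_hat), 1))
print(tuple(round(v, 1) for v in percent_increase_interval(beta_hat, se_hat)))
```

Building the interval on the coefficient scale and transforming its endpoints keeps the lower limit above -100%, which a symmetric interval on the percentage scale would not guarantee.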

Table 3.

Time to infection with vaccine-matched virus (Inline graphic, 169K HIV-1 infection): effect of vaccination and of sex at birth on time to HIV-1 infection; reported are parameter estimates (with standard errors in parentheses) for different semiparametric transformation models and the estimated Inline graphic confidence intervals

Logarithmic transformation Box–Cox transformation
Inline graphic Inline graphic PO model FG model Inline graphic
Loglik Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Treat Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic CI Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Sex Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic CI Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

PO, proportional odds; FG, Fine–Gray; Loglik, loglikelihood; Treat, vaccination treatment; Sex, sex at birth (indicator of female); CI, confidence interval.

Table 4.

Time to infection with vaccine-mismatched virus (Inline graphic, K169! HIV-1 infection): effect of vaccination and of sex at birth on time to HIV-1 infection; reported are parameter estimates (with standard errors in parentheses) for different semiparametric transformation models and the estimated Inline graphic confidence intervals

Logarithmic transformation Box–Cox transformation
Inline graphic PO model FG model Inline graphic Inline graphic
Loglik Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Treat Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic CI Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Sex Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic CI Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

PO, proportional odds; FG, Fine–Gray; Loglik, loglikelihood; Treat, vaccination treatment; Sex, sex at birth (indicator of female); CI, confidence interval.

Fig. 1.

Model selection with Akaike criterion for the HIV vaccine data: the upper panels are for HIV-1 type 169K and the lower panels for HIV-1 type K169!; the left panels show logarithmic transformation models and the right panels Box–Cox transformation models.
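Model selection by the Akaike criterion can be sketched as follows. The criterion is computed as twice the number of parameters minus twice the maximized loglikelihood, and the candidate with the smallest value is preferred. How parameters were counted for Fig. 1 is not stated here, so the counts, labels and loglikelihood values below are hypothetical placeholders.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2 * k - 2 * loglikelihood."""
    return 2.0 * n_params - 2.0 * loglik


def select_by_aic(candidates):
    """Return the label with the smallest AIC.

    candidates: dict mapping a model label to (loglik, n_params).
    """
    return min(candidates, key=lambda label: aic(*candidates[label]))


# Hypothetical maximized loglikelihoods for three transformation models.
candidates = {
    "Fine-Gray": (-100.0, 2),
    "proportional odds": (-98.5, 2),
    "logarithmic, r = 2": (-98.0, 3),
}
print(select_by_aic(candidates))  # proportional odds
```

When all candidates have the same number of parameters, ranking by this criterion reduces to ranking by the maximized loglikelihood.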

7. Discussion

Any cure model may be interpreted as a competing risks model where the competing risk event is the cure. Therefore, the subdistribution hazard approach proposed in § 2.2 can be applied to any cure model under left-truncation and right-censoring.

A direct relation between the subdistribution hazard and the cumulative incidence function can be established for either time-independent covariates, such as baseline covariates, or external time-dependent covariates. But this is not possible for internal time-dependent covariates, as discussed in Bellach et al. (2019). It is well known that even in survival settings without competing risks, regression parameters for the hazard rate in the presence of internal time-dependent covariates are not interpretable with regard to the underlying distribution; see, for instance, Kalbfleisch & Prentice (2002).
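For time-independent covariates, this direct relation can be made concrete with a short sketch. It assumes the standard parametrization of the logarithmic and Box–Cox transformation classes of Zeng & Lin (2006), with the cumulative incidence function written as one minus the exponential of the negative transformed cumulative hazard; the function names and numerical inputs are hypothetical.

```python
import math


def log_transform(x, r):
    """Logarithmic class: log(1 + r x) / r, with r = 0 giving the identity."""
    if r == 0:
        return x
    return math.log1p(r * x) / r


def box_cox_transform(x, rho):
    """Box-Cox class: ((1 + x)^rho - 1) / rho, with rho = 0 giving log(1 + x)."""
    if rho == 0:
        return math.log1p(x)
    return ((1.0 + x) ** rho - 1.0) / rho


def cumulative_incidence(cum_baseline, linear_predictor, transform, param):
    """Cumulative incidence 1 - exp(-G(A(t) * exp(beta' Z))) for fixed Z."""
    x = cum_baseline * math.exp(linear_predictor)
    return 1.0 - math.exp(-transform(x, param))


# Fine-Gray model (identity transformation): r = 0 or rho = 1.
print(cumulative_incidence(0.5, 0.0, log_transform, 0))
# Proportional odds model: r = 1 or rho = 0.
print(cumulative_incidence(0.5, 0.0, log_transform, 1))
```

Under the proportional odds choice the odds of the cumulative incidence equal the cumulative baseline multiplied by exp(beta' Z), which is why exp(beta) can be read as an odds ratio in that model.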

The general regression model can be extended to a model for the marginal mean intensity for recurrent event data with competing terminal events under left-truncation and right-censoring. For the marginal mean intensity, defined as Inline graphic where Inline graphic denotes the process counting the number of recurrent events, the usual relationship with the marginal mean is retained; i.e., Inline graphic. A slightly modified version of the weighted loglikelihood function (4) may be applied to estimate the regression parameters for the general model for the marginal mean intensity as defined in (3).

Acknowledgement

The authors thank Erik Parner, Hein Putter, Donglin Zeng and two referees for helpful comments. They also thank the HVTN 503 study participants and personnel. Bellach’s research was in part conducted at the Department of Biostatistics at the University of Copenhagen, Denmark, and supported by the E.U. Programme FP7: Marie-Curie ITN mediasres. Gilbert’s research was supported by the U.S. National Institute of Allergy and Infectious Diseases. The authors received permission to use data from the CASCADE Collaboration in EuroCoord (CASCADE Collaboration, 2006) that was funded by the E.U. Programme FP7, and thank Kholoud Porter, Jannie van der Helm and Ronald Geskus for their support. Kosorok and Fine are also affiliated with the Department of Statistics and Operations Research at the University of North Carolina, Chapel Hill.

Appendix

Model conditions

Condition A1.

The cumulative baseline Inline graphic is a strictly increasing and continuously differentiable function, and Inline graphic lies in the interior of a compact set Inline graphic.

Condition A2.

The vector of covariates Inline graphic is, with probability one, of bounded variation on the observed interval Inline graphic. In combination with Condition A1 this implies that there is a constant Inline graphic such that Inline graphic.

Condition A3.

The endpoint of the study Inline graphic is chosen in such a way that with probability one there exists a constant Inline graphic such that Inline graphic and Inline graphic. There is a finite partition Inline graphic of the interval Inline graphic with Inline graphic for some Inline graphic.

Condition A4.

The function Inline graphic is thrice continuously differentiable and strictly increasing, with Inline graphic, Inline graphic and Inline graphic. Moreover, one of the following conditions holds:

  • (i) Inline graphic for Inline graphic;

  • (ii) Inline graphic for Inline graphic and, in addition, for any Inline graphic and any sequence Inline graphic with Inline graphic as Inline graphic,
    graphic file with name Equation20.gif (A1)

Condition A5.

(Identifiability). If a vector Inline graphic and a deterministic function Inline graphic, Inline graphic, exist such that Inline graphic with probability one, then Inline graphic and Inline graphic for Inline graphic-almost all Inline graphic, where Inline graphic denotes Lebesgue measure. This condition rules out perfect multicollinearity of the covariates Inline graphic for Inline graphic-almost all Inline graphic.

Condition A6.

Let Inline graphic denote a vector in Inline graphic and let Inline graphic. For a subset Inline graphic of nonzero Lebesgue measure, for all Inline graphic we have that

Condition A6.

Existence of the weighted nonparametric maximum likelihood estimator and consistency

We first ascertain that Inline graphic is bounded almost surely on Inline graphic. As a consequence, all estimated jump sizes must be finite. By the Helly–Bray lemma we obtain that for every sequence Inline graphic there exist a subsequence Inline graphic and Inline graphic such that Inline graphic. Together with Condition A1 this implies sequential compactness for Inline graphic, which means that for every sequence Inline graphic there exist a subsequence Inline graphic and Inline graphic such that Inline graphic. Using a Kullback–Leibler argument, one can then conclude that every subsequence converges to the true parameter Inline graphic. As Inline graphic is a continuous and increasing function, we obtain uniform almost sure convergence for the cumulative baseline hazard Inline graphic and almost sure convergence for Inline graphic.

Let Inline graphic be a constant such that Inline graphic. We define Inline graphic and consider the sequence Inline graphic. Under Condition A4(i) the sequence is bounded above by

graphic file with name Equation22.gif

and under Condition A4(ii) the sequence is bounded above by

graphic file with name Equation23.gif

In both cases the upper bound would become infinitely small if Inline graphic, which means that Inline graphic would go to Inline graphic as Inline graphic. This contradicts the definition of Inline graphic as a maximum likelihood estimator. Therefore Inline graphic is bounded almost surely on Inline graphic.

Writing Inline graphic and Inline graphic, we define the Kullback–Leibler distance for the cure model with independent left-truncation and right-censoring as

graphic file with name Equation24.gif

where Inline graphic is the subdistribution density for the event of interest. In the Supplementary Material we prove that Inline graphic and Inline graphic, and establish the identifiability, namely that if Inline graphic then Inline graphic. For the competing risks model with independent left-truncation and right-censoring, we define

graphic file with name Equation25.gif

Consistency then follows from the asymptotic equivalence of Inline graphic and Inline graphic.

Weak convergence to a Gaussian process

Let Inline graphic denote the space of elements Inline graphic with Inline graphic and Inline graphic. A norm on Inline graphic is then defined by Inline graphic, where Inline graphic denotes the Euclidean norm and Inline graphic the total variation norm. For Inline graphic we define Inline graphic. The parameter space is Inline graphic. For Inline graphic we define Inline graphic, so that Inline graphic. One-dimensional submodels of the form Inline graphic are considered with Inline graphic to define the empirical score operator

graphic file with name Equation26.gif

for a measurable function Inline graphic, where Inline graphic is the component related to the derivative with respect to Inline graphic and Inline graphic is the component related to the derivative along the submodel for Inline graphic, as stated in the Supplementary Material. The limiting version Inline graphic is defined by replacing the empirical measure Inline graphic by the probability measure Inline graphic. It is then sufficient to verify the conditions of Lemma S2 in the Supplementary Material for weighted Inline graphic-estimators.

To verify condition Inline graphic in Lemma S2, note that as our model is based on independent and identically distributed observations, it is sufficient to verify the two conditions of Lemma S3. Arguments like those in van der Vaart & Wellner (1996) and Kosorok (2008) along the lines of Donsker preservation theorems are immediately applicable. The second condition of Lemma S3 immediately follows, as pointwise convergence can be strengthened to Inline graphic convergence by dominated convergence. By the Donsker theorem, Inline graphic converges in distribution to the tight random element Inline graphic. Also by the Donsker theorem, Inline graphic converges in distribution to a tight random element Inline graphic. Joint convergence follows from the asymptotic linearity of the two components marginally, together with the fact that the composition of two Donsker classes is also Donsker. By definition Inline graphic. As argued in Parner (1998), from the Kullback–Leibler information being positive and by interchanging expectation and differentiation, we obtain Inline graphic.

To show continuous invertibility of Inline graphic and Inline graphic, it suffices to prove the invertibility of the operator Inline graphic corresponding to the cure model, as the scores for the cure model and for the competing risks model with left-truncation and right-censoring are asymptotically equivalent. Therefore, Inline graphic and Inline graphic are also asymptotically equivalent. The continuous Gâteaux derivatives are provided in the Supplementary Material. Gâteaux differentiability can be strengthened to Fréchet differentiability with an additional continuity condition.

Thus, from Lemma S2 we obtain weak convergence of Inline graphic, and the covariances for the limiting process Inline graphic with Inline graphic and Inline graphic are

graphic file with name Equation27.gif

for Inline graphic (Kosorok, 2008).

Supplementary material

Supplementary material available at Biometrika online includes proofs of Theorems 1 and 2, details on the sandwich estimator for the variance, results of additional simulation studies, and a proof of the equivalence of the weights proposed by Geskus (2011) and Zhang et al. (2011).

References

  1. Andersen, P. K., Borgan, Ø., Gill, R. D. & Keiding, N. (1988). Censoring, truncation and filtering in statistical models based on counting processes. Contemp. Math. 80, 19–60.
  2. Andersen, P. K., Borgan, Ø., Gill, R. D. & Keiding, N. (2013). Statistical Models Based on Counting Processes. New York: Springer.
  3. Bellach, A., Kosorok, M. R., Rüschendorf, L. & Fine, J. P. (2019). Weighted NPMLE for the subdistribution of a competing risk. J. Am. Statist. Assoc. 114, 259–70.
  4. CASCADE Collaboration (2006). Effective therapy has altered the spectrum of cause-specific mortality following HIV seroconversion. AIDS 20, 741–9.
  5. Chen, K., Jin, Z. & Ying, Z. (2002). Semiparametric analysis of transformation models with censored data. Biometrika 89, 659–68.
  6. Fine, J. P. (2001). Regression modeling of competing crude failure probabilities. Biostatistics 2, 85–97.
  7. Fine, J. P. & Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. J. Am. Statist. Assoc. 94, 496–509.
  8. Geskus, R. B. (2011). Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring. Biometrics 67, 39–49.
  9. Gray, R. J. (1988). A class of K-sample tests for comparing the cumulative incidence of a competing risk. Ann. Statist. 16, 1141–54.
  10. He, S. & Yang, G. L. (1998). Estimation of the truncation probability in the random truncation model. Ann. Statist. 26, 1011–27.
  11. Juraska, M., Magaret, C. A., Shao, J., Carpp, L. N., Fiore-Gartland, A., Benkeser, D., Girerd-Chambaz, Y., Langevin, E., Frago, C., Guy, B. et al. (2018). Viral genetic diversity and protective efficacy of a tetravalent dengue vaccine in two phase 3 trials. Proc. Nat. Acad. Sci. 115, E8378–87.
  12. Kalbfleisch, J. D. & Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data. Hoboken, New Jersey: John Wiley & Sons, 2nd ed.
  13. Keiding, N. & Gill, R. D. (1990). Random truncation models and Markov processes. Ann. Statist. 18, 582–602.
  14. Kosorok, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. New York: Springer.
  15. Murphy, S. A. (1994). Consistency in a proportional hazards model incorporating a random effect. Ann. Statist. 22, 712–31.
  16. Murphy, S. A. (1995). Asymptotic theory of the frailty model. Ann. Statist. 23, 182–98.
  17. Owen, A. B. (2001). Empirical Likelihood. Boca Raton, Florida: Chapman & Hall/CRC.
  18. Parner, E. (1998). Asymptotic theory for the correlated gamma-frailty model. Ann. Statist. 26, 183–214.
  19. R Development Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.
  20. Rolland, M., Edlefsen, P. T., Larsen, B., Tovanabutra, S., Sanders-Buell, E., Hertz, T., Decamp, A. C., Carrico, C., Menis, S., Magaret, C. A. et al. (2012). Increased HIV-1 vaccine efficacy against viruses with genetic signatures in Env V2. Nature 490, 417–20.
  21. van der Vaart, A. W. & Wellner, J. A. (1996). Weak Convergence and Empirical Processes. New York: Springer.
  22. Zeng, D. & Lin, D. Y. (2006). Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93, 627–40.
  23. Zhang, X., Zhang, M. J. & Fine, J. P. (2009). A mass redistribution algorithm for right-censored and left-truncated time to event data. J. Statist. Plan. Infer. 139, 3329–39.
  24. Zhang, X., Zhang, M. J. & Fine, J. P. (2011). A proportional hazards regression model for the subdistribution with right-censored and left-truncated competing risks data. Statist. Med. 30, 1933–51.

Articles from Biometrika are provided here courtesy of Oxford University Press