Abstract
Survival data is doubly truncated when only participants who experience an event during a random interval are included in the sample. Existing methods typically correct for double truncation bias in Cox regression through inverse probability weighting via the nonparametric maximum likelihood estimate (NPMLE) of the selection probabilities. This approach relies on two key assumptions, quasi-independent truncation and positivity of the sampling probabilities, yet there are no methods available to thoroughly assess these assumptions in the regression context. Furthermore, these estimators can be particularly sensitive to extreme event times. Finally, current double truncation methods rely on bootstrapping for variance estimation. Aside from the unnecessary computational burden, there are often identifiability issues with the NPMLE during bootstrap resampling. To address these limitations of current methods, we propose a class of robust Cox regression coefficient estimators with time-varying inverse probability weights and extend these estimators to conduct sensitivity analysis regarding possible non-positivity of the sampling probabilities. Also, we develop a nonparametric test and graphical diagnostic for verifying the quasi-independent truncation assumption. Finally, we provide closed-form standard errors for the NPMLE as well as for the proposed estimators. The proposed estimators are evaluated through extensive simulations and illustrated using an AIDS study.
Supplementary Information
The online version contains supplementary material available at 10.1007/s10985-025-09650-5.
Keywords: AIDS, Cox model, double truncation, robust inference
Introduction
Survival data is doubly truncated when inclusion in the sample requires that the event of interest occurs between two random truncation times, termed the study entry and end times. For example, autopsy confirmation remains the gold standard for verifying Alzheimer’s disease (AD) status. If only autopsy-confirmed cases are included in an analysis where the time origin is the time of AD symptom onset, however, the sample becomes biased because it excludes all patients who either succumb to the disease prior to enrollment or survive past the end of the study. Another example of double truncation is the AIDS data analyzed in Sect. 5. In this dataset, the inclusion criteria are that the event of interest, AIDS diagnosis, occurred after the discovery of the disease and before the end of the study, with time measured in months since HIV infection. Double truncation also occurs in non-biomedical applications (Dörre and Emura 2019). In astronomy, for example, Efron and Petrosian (1999) describe a sample of quasar data which consists of quasars with luminosity (the “event time” here) large enough to yield reliable redshifts, but small enough to avoid confusion with other stellar objects. Standard survival analysis methods for left truncated and right censored data cannot be directly applied to doubly truncated data, since the resulting estimates will generally be biased and classical martingale-based results no longer apply.
Problem statement
A commonly used approach to account for double truncation bias in Cox regression is inverse probability weighting (Rennert and Xie 2018; Mandel et al. 2018), where the weights can be obtained by a nonparametric maximum likelihood estimator (NPMLE) under some conditions which will be described shortly (Efron and Petrosian 1999; Shen 2010; de Uña-Álvarez and Keilegom 2021). These inverse probability weighted estimators are popular because they are flexible, requiring no modeling assumptions for the truncation distribution, and can be easily computed using standard software. In particular, they can directly use some residual diagnostics from standard Cox regression software, unlike the conditional maximum likelihood estimator of Rennert and Xie (2022). Unfortunately, they currently suffer from several limitations that are specific to doubly truncated data.
First, in addition to the usual Cox model assumptions, these estimators rely on two key conditions. The first key condition is quasi-independence (see Sect. 3) between the truncation times and the event times and covariates. While tests for quasi-independence have been studied in the context of nonparametric analyses (Martin and Betensky 2005; Shen 2011), there are no available methods to thoroughly assess this assumption under double truncation when covariates are involved. Under one-sided truncation, quasi-independence can be tested through a Cox model for the truncation time (Vakulenko-Lagun et al. 2022; Wang et al. 2024), but fitting a regression model for the bivariate truncation distribution under double truncation is more complex, prone to model misspecification, and cannot be done through standard methods. The second key condition is that the sampling (non-truncation) probabilities are strictly positive across the entire event time distribution, commonly known as the positivity assumption. This assumption depends on the study design (Rennert and Xie 2018, supplementary material). In situations where this assumption may be violated, e.g. in Sect. 5, there are no available methods to assess the potential impact on inference results from doubly truncated Cox regression (but see Vakulenko-Lagun et al. (2020) for right truncation).
Furthermore, the use of inverse probability weights can make these estimators highly sensitive to extreme event times. The reason is that long survivors, who appear in many risk sets of the partial likelihood and so are already potentially influential points, typically have a low estimated probability of being observed. Therefore they also tend to have larger inverse probability weights, which increases their influence over the fitted model even further. Thus a relatively small fraction of the sample may have undue influence on the Cox model estimates, and this can lead to increased variance in small samples, as well as substantial bias if these influential points are not representative of the target population, e.g. as a result of data contamination. In practice, distinguishing between contaminated and representative samples is often challenging if not impossible, which has motivated the development of robust estimators intended to reduce the influence of such outliers on the model estimates. Several modifications of the partial likelihood score function have been proposed to improve robustness for untruncated data, e.g. Sasieni (1993a) and Sasieni (1993b), but robust estimation has not yet been explored in doubly truncated data.
Finally, all the aforementioned methods for doubly truncated data rely on bootstrap resampling for calculating standard errors. There are several drawbacks to relying on resampling methods for inference with doubly truncated survival data, however. First, the NPMLE does not have a general closed-form expression, and must be computed by iterative methods. Thus, even basic nonparametric analyses could become fairly computationally intensive in large samples, because the NPMLE must be re-fit in every bootstrap replicate. In addition, the NPMLE is well-defined only under certain conditions on the data (Xiao and Hudgens 2019), so it may not be identifiable for an arbitrary subsample of the data. Moreover, as already mentioned, a key assumption when using inverse probability weighted estimators is that the sampling probability is strictly positive across all event times. This positivity assumption is not verifiable using solely the observed data, however, and in Sect. 5 we find evidence that it may not hold for the AIDS data. Therefore it is good practice to perform sensitivity analysis for possible positivity violations, which involves fitting several coefficient estimators (see Sect. 2.3), and using the nonparametric bootstrap to form standard errors for each of these estimators can take an unnecessarily long time. Such issues do not typically arise in the analysis of untruncated data.
Contributions
In this article, we study a new class of robust inverse probability weighted (IPW) Cox regression estimators that use weights based on the NPMLE, with an emphasis on deriving closed-form standard errors. This permits regression analysis without any unnecessary parametric assumptions on the truncation times, whose distribution is typically not of direct interest. As a preliminary, we derive a simplified form for the plug-in estimator of the influence function of the NPMLE and prove that it is well-defined as long as the NPMLE is identifiable. Therefore, our results also facilitate general nonparametric analysis of doubly truncated survival data, as illustrated by several examples provided in Sect. 3.1. For Cox regression we study the standard IPW partial likelihood estimator as well as a class of robust alternatives that use time-varying weights. In particular, we propose novel IPW estimators with time-varying weights based on the estimated survival function. This has the intuitive appeal of assigning relative importance to each event based on the size of its inverse probability weighted risk set, producing a highly robust estimator. For example, our simulation results in Sect. 4 show that the proposed survival function weights lead to a drastic reduction in mean squared error compared to the existing IPW estimators of Rennert and Xie (2018) and Mandel et al. (2018) under potential data contamination.
We also extend these IPW estimators to allow sensitivity analysis of the positivity assumption on the sampling probabilities and propose a simple nonparametric test and graphical diagnostic for the quasi-independent truncation assumption. We further provide closed-form standard errors that are both simple to compute through standard software and consistent under model misspecification. This is essential for sensitivity analysis of the positivity assumption, since the degree of the true positivity violation is unknown to the analyst. Theoretical properties of the estimators are developed, and we demonstrate the robustness of the proposed estimators in the presence of outliers through several simulation settings. Finally, we illustrate our proposed estimators using an AIDS study.
Related work
Assuming quasi-independence between the truncation and event times, Efron and Petrosian (1999) and Shen (2010) provided an estimation procedure and theoretical results for the NPMLE of the event and truncation time distribution functions, with standard errors later derived by Emura et al. (2015). Also, de Uña-Álvarez and Keilegom (2021) derived the influence function for the NPMLE and suggested a plug-in estimate. This estimate is cumbersome to implement, however, and its empirical performance was not evaluated.
Both Rennert and Xie (2018) and Mandel et al. (2018) proposed Cox regression estimators that account for double truncation bias by introducing inverse probability weights in the partial likelihood score function. They relied on the bootstrap for variance estimation when using NPMLE weights, however, since they were not able to derive closed-form standard errors. Later, Rennert and Xie (2022) proposed a conditional maximum likelihood estimator (MLE) that relaxed the quasi-independent truncation assumption to covariate-dependent truncation for the Cox model, also relying on the bootstrap for inference (note that their conditional MLE approach was initially deposited on a preprint server in 2018: https://arxiv.org/abs/1803.09830). Conditional maximum likelihood estimation has further been extended to doubly truncated data to fit semiparametric transformation models (Shen and Hsu 2020) and to analyze data that is also interval-censored (Shen 2025). If the stronger assumption of quasi-independent truncation is valid, however, IPW estimators with NPMLE weights hold several practical advantages over the MLE. First, they can be computed directly through standard Cox regression software once the weights are obtained, as described in Sect. 2.4, while the MLE uses an iterative EM algorithm. In fact, this ease of implementation also applies to our proposed standard errors (see Remark 3). Finally, IPW estimators can directly use standard Cox model diagnostics for model misspecification and non-proportional hazards based on plots of weighted residuals, e.g. Grambsch and Therneau (1994), which are widely available in standard software.
Organization
We describe the proposed doubly truncated Cox regression methods in Sect. 2, with the nonparametric test and graphical diagnostic for the quasi-independence assumption in Sect. 2.1, the proposed class of IPW estimators with time-varying weights in Sect. 2.2, and the sensitivity analysis for positivity violations in Sect. 2.3. The theoretical results, including closed-form standard errors for all proposed methods, are provided in Sect. 3. In Sect. 3.1 we describe the developments for the NPMLE, while Sect. 3.2 contains the asymptotics for the IPW estimators. The proposed methods are evaluated through extensive simulations in Sect. 4, and illustrated through an application to an AIDS dataset in Sect. 5.
Proposed methods
For any p-vector , let , , and . Furthermore, operations such as and are understood to be applied elementwise, and denotes the diagonal matrix with ith diagonal entry equal to . For any matrix , denotes its (i, j)th entry. denotes the identity matrix, while and are n-vectors of ones and zeros, respectively. If X is a random variable, then its support is denoted by .
For a random event time T, we define the at-risk process , the counting process , and the potentially time-varying p-dimensional covariate vector for . We also assume that, prior to truncation, the data follows a Cox regression model with hazard function
where is the true regression parameter and is the cumulative baseline hazard function. We use to denote the law of the pre-truncation data. Since the left and right truncation times (U, V) are assumed to be jointly independent of the event time (and covariates) within the observable data region , the non-truncation probability at event time t is .
We observe a random sample of doubly truncated data , , conditional on . For example, in the AIDS data analyzed in Sect. 5, is the ith AIDS incubation time, is the ith AIDS discovery time (1982), and is the ith end of study time (1986). Note that although the truncation times are fixed calendar dates in this case, do vary across individuals since time is measured from the date of HIV infection. Let be the law of the doubly truncated observed data, as opposed to the pre-truncation law . All probabilistic statements are understood to be with respect to this post-truncation distribution unless specified otherwise. By the quasi-independence between (U, V) and T, the pre-truncation event time cumulative distribution function (CDF) can be expressed as , where is the post-truncation CDF, , and is the overall non-truncation probability.
Nonparametric diagnostic for the quasi-independent truncation assumption
Consistency of the IPW estimators considered in this paper depends critically on the quasi-independent truncation assumption, which states that is independent of (U, V) within the observable region under . If this assumption holds, the selection probabilities for reduce to and can be estimated consistently by the NPMLE. This assumption may be violated if, for example, the truncation times and event time are associated with common baseline risk factors. Therefore we propose a two-step nonparametric diagnostic for assessing the quasi-independence assumption.
First, divide the data into S covariate strata. These strata should be based on binned values of covariates that are hypothesized to potentially be associated with the truncation times, and the diagnostic may be repeated using different stratifications. Since each additional stratum tends to reduce the power of the test, we recommend using only 2-4 strata in practice based on our simulation studies (Online Resource Section S5.2). Then, test for quasi-independent truncation within each stratum using, for example, conditional Kendall’s tau (Martin and Betensky 2005). This checks that the sth stratum-specific selection probabilities reduce to .
Finally, estimate the stratum-specific selection probabilities $a_s(t)$, $s = 1, \ldots, S$, by computing the NPMLE within each stratum. Plot these estimates $\hat{a}_s(t)$ against time, along with the unstratified NPMLE estimates $\hat{a}(t)$. Under quasi-independent truncation, the truncation time distribution should not vary by stratum, so we should have $a_s(t) = a(t)$ for all $s$ and $t$. Visually, this can be assessed by how close the stratum-specific estimates $\hat{a}_s(t)$ are to the unstratified estimates $\hat{a}(t)$, and the magnitude of these deviations quantifies the estimated departure from quasi-independent truncation. In Online Resource Section S2, we derive the asymptotic null distribution of the test statistic and describe how to compute a p-value and confidence band for the diagnostic plot based on sampling from this distribution, using closed-form estimates. Furthermore, we assess the power and Type I error rate of this diagnostic test through several simulation settings in Online Resource Section S5.2. See Sect. 5.1 for an illustration of this diagnostic when applied to the AIDS data.
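The within-stratum testing step can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it computes only the point estimate of conditional Kendall's tau between the event time and one truncation time, and it assumes that a pair of subjects is comparable when both event times fall inside both truncation intervals. The function names and the comparability rule as coded here are assumptions, and the variance needed for a formal test is not computed.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def conditional_kendall_tau(t, u, v, trunc="left"):
    """Conditional Kendall's tau between the event time and one truncation time.

    A pair (i, j) is treated as comparable when both event times lie inside
    both truncation intervals (an assumed comparability rule).
    Returns (tau, number of comparable pairs); tau is None with no such pairs.
    """
    other = u if trunc == "left" else v
    total, n_pairs = 0, 0
    for i, j in combinations(range(len(t)), 2):
        comparable = (max(u[i], u[j]) <= min(t[i], t[j])
                      and max(t[i], t[j]) <= min(v[i], v[j]))
        if comparable:
            total += sign((t[i] - t[j]) * (other[i] - other[j]))
            n_pairs += 1
    tau = total / n_pairs if n_pairs else None
    return tau, n_pairs


def stratified_tau(t, u, v, stratum, trunc="left"):
    """Apply the statistic separately within each covariate stratum."""
    out = {}
    for s in sorted(set(stratum)):
        idx = [k for k, g in enumerate(stratum) if g == s]
        out[s] = conditional_kendall_tau([t[k] for k in idx],
                                         [u[k] for k in idx],
                                         [v[k] for k in idx], trunc)
    return out
```

Values of tau near zero in every stratum are consistent with quasi-independence within strata. Calling the function with trunc="right" gives the analogous statistic for the right truncation time.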
Robust IPW Cox regression
In this section, we briefly review existing IPW estimators for doubly truncated Cox regression, and then introduce our proposed estimators that are robust to outliers in the event time.
Since the truncated data is generally a biased sample, the classical maximum partial likelihood estimator is no longer a consistent estimator for the regression parameters . One can correct for this bias, however, by weighting each individual in the sample proportional to the inverse probability of not being truncated, e.g. using weights . Intuitively, individuals with a low probability of being observed are given larger weights and the resulting pseudo-population is an approximately representative sample of the pre-truncation distribution. In order to avoid unnecessary parametric assumptions or modeling of the truncation distribution, we consider inverse probability weighting using the NPMLE of the weights, which can be obtained from the point masses of the NPMLE of the event time CDF (see Sect. 2.4). Let denote the estimated weights. de Uña-Álvarez and Keilegom (2021) showed that converges in probability to a(t) uniformly in t under some assumptions.
Assuming no tied event times, the standard inverse probability weighted (IPW) partial likelihood has the score function
where
for . This inverse probability weighted partial likelihood has been previously applied to survey data, where the weights are known (Binder 1992; Lin 2000), for general biased samples with weights estimated parametrically (Pan and Schaubel 2008), and specifically to account for double truncation (Rennert and Xie 2018).
An alternative approach proposed by Mandel et al. (2018), which we refer to as stabilized weighting, only applies weights to individuals in the risk set covariate averages of the score function, but still results in an unbiased estimating function. We consider this method, as well as other potentially robust alternatives to the IPW partial likelihood, under the general framework of time-varying weights with the structure , e.g. with for stabilized weights. The use of such weights can be considered an inverse probability weighted analogue of what is known in the untruncated data setting as weighted Cox regression (Schemper 1992; Sasieni 1993a; Schemper et al. 2009). It can be particularly useful for doubly truncated data because the standard maximum partial likelihood estimator is known to be sensitive to extreme event times, and the introduction of inverse probability weights naturally tends to increase the influence of such event times, which generally have a low probability of being observed, even further. The IPW partial likelihood with time-varying weights based on weight function has the score function
with its derivative denoted as . Now let and . One can show algebraically that , the uniform limit of , so under the theoretical assumptions outlined in Sect. 3. Thus the score function provides an approximately unbiased estimating equation for under a wide class of non-negative weight functions , intuitively because the weighting in the risk-set average is unaffected. Standard errors for this class of IPW estimators can be computed with standard software and are provided in Sect. 3.2.
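To make the structure of this estimating equation concrete, the following Python sketch evaluates an IPW score with time-varying weights for a single time-fixed covariate, assuming no ties and no censoring. The names `w` (inverse probability weights) and `omega` (weight function) are our own labels, not notation taken from the paper. The weight function multiplies each event's contribution only and does not enter the risk-set averages.

```python
def ipw_score(beta, times, z, w, omega=None):
    """IPW partial likelihood score for a scalar, time-fixed covariate.

    times : observed event times (no ties, no censoring)
    z     : covariate values
    w     : inverse probability weights, one per subject
    omega : optional weight function of time; None means omega(t) = 1
    """
    import math

    score = 0.0
    for i, t_i in enumerate(times):
        # Weighted risk-set sums at the ith event time; omega does not enter.
        s0 = s1 = 0.0
        for j, t_j in enumerate(times):
            if t_j >= t_i:
                r = w[j] * math.exp(beta * z[j])
                s0 += r
                s1 += r * z[j]
        # The event contribution carries the weight omega(t_i) * w_i.
        event_weight = (omega(t_i) if omega is not None else 1.0) * w[i]
        score += event_weight * (z[i] - s1 / s0)
    return score


def solve_score(times, z, w, omega=None, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection root of the score, which is non-increasing in beta.

    Assumes the root lies inside [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ipw_score(mid, times, z, w, omega) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the score is non-increasing in the coefficient whenever the weights are non-negative, bisection is sufficient in this scalar case. A multivariate implementation would more naturally use Newton's method or standard Cox regression software.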
Some options for the weight function include:
Stabilized weights:
Survival function:
Fleming-Harrington weights: for
Combinations of above, e.g. stabilized survival .
In order to produce an estimator that is robust to outliers in the event time, the weight function should de-emphasize unusually large event times, where the risk set averages in the score function are dominated by a small number of individuals. We advocate for survival function weighting, which accomplishes this goal in the following intuitive manner: with , each event is weighted by the number of individuals at-risk in the IPW pseudo-sample at the given event time, since . If the estimated sampling probabilities are observed to approach zero, this can be combined with stabilized weighting to further reduce the variance of the coefficient estimates. Stabilized weights alone, however, may not sufficiently protect against influential points. Intuitively this is because the sampling probability is a functional of only the truncation time distribution, so the shape of a(t) does not change under different event time distributions. This is also supported by our simulation results in Sect. 4, where the IPW estimator using the stabilized weights of Mandel et al. (2018) is shown to be much more sensitive to data contamination compared to using the proposed survival function weights.
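As a small illustration of survival function weighting, the Python sketch below computes, for each observed event time, the fraction of the IPW pseudo-sample that is still at risk. It assumes the weight is the left limit of the estimated survival function, so that a subject counts as at risk at its own event time. The function name and this convention are our assumptions.

```python
def survival_weights(times, w):
    """Left-limit IPW survival estimate at each observed event time.

    Returns, for each subject i, the weighted fraction of the pseudo-sample
    still at risk at times[i]: sum_j w_j I(T_j >= T_i) / sum_j w_j.
    """
    total = sum(w)
    return [sum(w_j for t_j, w_j in zip(times, w) if t_j >= t_i) / total
            for t_i in times]
```

Late event times receive small weights because few pseudo-individuals remain at risk, which is the mechanism that limits the influence of long survivors.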
Lastly, given coefficient estimates , the inverse probability weighted estimator for the cumulative baseline hazard is
Note that the form of the estimator is unchanged when using time-varying weights, apart from the different coefficient estimate .
Sensitivity analysis for potential violations of the positivity assumption on the sampling probabilities
A key condition underlying the inverse probability weighted estimators described above is that all event times have a positive probability of being observed, i.e. . Since this condition cannot be assessed using solely the observed data, it is generally good practice to conduct a sensitivity analysis in order to quantify the potential impact of positivity violations on the coefficient estimates. We extend the approach of Vakulenko-Lagun et al. (2020), which was developed for right-truncated data, to double truncation. Due to the complexity of the conditional survival function under time-varying covariates, we only consider time-independent covariates here.
Suppose that the observed event time distribution is restricted to the deterministic interval [L, R] due to zero sampling probability outside this interval, with and which results in a violation of the positivity assumption. This positivity violation introduces another layer of bias in the sample, in addition to the bias from the random truncation intervals. Without further assumptions, inference must be done conditional on and . The first condition alone is less problematic for Cox regression because the hazard rate conditional on is simply , which is unchanged at any time in the observable region , so hazard ratio estimation is unaffected. Intuitively this is because the hazard rate conditions on survival up to time t. The second condition , however, can change the hazard rate at all , so it will generally result in biased estimates.
In order to correct the bias from the positivity violation, we first quantify the degree of the positivity violation by the truncated baseline probability mass , where is the true baseline survival function. If were known, we could impute the follow-up time that was truncated due to non-positivity by defining the modified at-risk process . It is straightforward to show that for any , which accounts for the bias due to non-positivity within the risk-set average. Letting be the risk-set average based on , the regression coefficients would be estimated using the modified IPW score function
The true truncated probability mass is not known in practice, since it involves the true baseline hazard, and estimating it jointly with may not be feasible. One can instead obtain a range of plausible coefficient estimates by fitting estimates along a fixed grid of values for . A sensitivity interval (SI) for assuming truncated mass of at most q is then given by the union of the confidence intervals from . The standard errors needed to construct these confidence intervals are provided in Sect. 3.2.
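The construction of a sensitivity interval from a grid of fits can be sketched in Python as follows. The sketch assumes Wald-type confidence intervals and takes the coefficient estimates and standard errors as given inputs. The function name and the dictionary layout are hypothetical.

```python
from statistics import NormalDist


def sensitivity_interval(fits, q_max, level=0.95):
    """Union of Wald confidence intervals over grid values q <= q_max.

    fits : dict mapping each grid value q to (estimate, standard_error)
    Returns the (lower, upper) hull of the union, which equals the union
    itself whenever consecutive intervals overlap.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    bounds = [(est - z * se, est + z * se)
              for q, (est, se) in fits.items() if q <= q_max]
    if not bounds:
        raise ValueError("no grid value is <= q_max")
    return min(lo for lo, _ in bounds), max(hi for _, hi in bounds)
```

For example, with hypothetical fits {0.0: (1.0, 0.1), 0.1: (1.2, 0.1)}, the sensitivity interval for a truncated mass of at most 0.1 runs from the lower limit of the first confidence interval to the upper limit of the second.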
Software implementation
To estimate the pre-truncation event time CDF, consider the class of right-continuous step functions with positive increments at the observed event times . The NPMLE is , where the point masses maximize the nonparametric likelihood
subject to the constraints , , and (Efron and Petrosian 1999). It can be shown that the point masses of are inversely proportional to the estimated selection probabilities, that is where are the estimated inverse probability weights which are used in the IPW estimators discussed above.
The NPMLE is commonly computed by an EM algorithm that iteratively updates the estimates by setting and , then normalizing to sum to one (Efron and Petrosian 1999).
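One common form of this iteration can be sketched in Python as follows. It is a minimal illustration under our own variable names, not the authors' code. Each pass alternates between the mass that each truncation interval captures from the current event time distribution and the selection probability at each event time, normalizing after each update.

```python
def npmle_double_truncation(t, u, v, tol=1e-10, max_iter=10000):
    """Self-consistency iteration for the NPMLE under double truncation.

    t, u, v : event, left truncation, and right truncation times (u <= t <= v)
    Returns (f, a): point masses f_j at t[j] and estimated selection
    probabilities a_j at t[j], with f_j proportional to 1 / a_j.
    """
    n = len(t)
    # J[i][j] = 1 if the jth event time lies in the ith truncation interval.
    J = [[1.0 if u[i] <= t[j] <= v[i] else 0.0 for j in range(n)]
         for i in range(n)]
    f = [1.0 / n] * n
    a = [1.0] * n
    for _ in range(max_iter):
        # Event-time mass captured by each truncation interval.
        F = [sum(J[i][j] * f[j] for j in range(n)) for i in range(n)]
        # Truncation-pair masses, inversely proportional to F and normalized.
        k = [1.0 / F_i for F_i in F]
        k_sum = sum(k)
        k = [k_i / k_sum for k_i in k]
        # Selection probability at each event time.
        a = [sum(k[i] * J[i][j] for i in range(n)) for j in range(n)]
        # Event-time masses, inversely proportional to a and normalized.
        f_new = [1.0 / a_j for a_j in a]
        f_sum = sum(f_new)
        f_new = [f_j / f_sum for f_j in f_new]
        change = max(abs(x - y) for x, y in zip(f_new, f))
        f = f_new
        if change < tol:
            break
    return f, a
```

The inverse probability weights are then the reciprocals of the returned selection probabilities. When every truncation interval contains every event time, the iteration returns equal masses and selection probabilities of one, as expected in the absence of truncation.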
The proposed IPW estimators can be fit using standard software packages for Cox regression. In the R programming language (R Core Team 2023, v4.3.1), for example, one can use the coxph function from the survival package (Therneau and Grambsch 2000; Therneau 2023, v3.5-5) with weights set to $\omega(T_i)\hat{w}_i$ and an offset of $-\log \omega(T_i)$ to obtain an IPW estimator with time-varying weights, where $\omega$ denotes the weight function and $\hat{w}_i$ the ith estimated inverse probability weight. The offset term produces the correct risk set averages without $\omega$, since $\omega(T_j)\hat{w}_j \exp\{\beta^\top Z_j - \log \omega(T_j)\} = \hat{w}_j \exp(\beta^\top Z_j)$. The model components needed to compute the standard errors are also simple to extract this way (see Remark 3). This convenient workaround will, however, produce an incorrect baseline hazard estimate due to the presence of $\omega(T_i)$ in the estimated hazard increments. To create a model object that will produce the correct hazard, one can then call coxph again with init set to the estimated coefficients, weights of $\hat{w}_i$ (free of $\omega$), and control set to coxph.control(iter.max = 0) to not update the coefficient estimates further. Lastly, all standard errors should be adjusted to account for the variability of the NPMLE as described in Sect. 3 below.
Theoretical results
We make the following assumptions:
1. Positivity: , where , and hold for some constants and . In addition:
   (a) We have , , , and for some .
   (b) The marginal post-truncation densities of U and V are bounded on , where U and V are assumed to be absolutely continuous.
2. NPMLE identifiability: the directed graph with edges defined by if is strongly connected (for any i and j, there exists a path from i to j and from j to i). This condition is due to Xiao and Hudgens (2019) and can be easily checked with standard software.
3. Quasi-independent truncation: is independent of (U, V) within the observable region , under .
4. Bounded covariates: is bounded above almost surely.
5. Monotone hazard: where is continuous and strictly positive on .
6. The matrix is positive definite.
7. For time-varying weights, the weight function converges in probability to a deterministic function of bounded variation and uniformly in , for some function depending on only the ith data point.
The positivity assumption (Assumption 1) and the quasi-independent truncation assumption (Assumption 3) allow nonparametric estimation of the inverse probability weights (de Uña-Álvarez and Keilegom 2021). We propose a diagnostic test for the quasi-independent truncation assumption in Sect. 2.1. The positivity assumption depends on the study design (recruitment period, study duration, inclusion criteria) and may be checked using external information on the range of possible event times (see Sect. 5). In cases where such information is not available or when it suggests that the positivity assumption is invalid, the potential impact of a positivity violation can be assessed by our sensitivity analysis proposed in Sect. 2.3. The graphical condition in Assumption 2 can be easily checked with standard software. The remaining Assumptions 4-7 are mild and commonly applied in Cox regression. In particular, Assumption 7 holds for both the existing and proposed estimators described in Sect. 2.2.
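The graphical condition in Assumption 2 can be checked with two graph searches, as in the Python sketch below. The strong connectivity check itself is generic. The edge rule coded in `truncation_graph`, which links subject i to subject j when the jth event time falls inside the ith truncation interval, is our assumption about the graph of Xiao and Hudgens (2019) and should be verified against that reference. Because a directed graph is strongly connected exactly when its reverse is, the direction convention of the edges does not affect the result.

```python
def reachable(adj, start):
    """Return the set of vertices reachable from `start` by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j, edge in enumerate(adj[i]):
            if edge and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_strongly_connected(adj):
    """Check strong connectivity of a directed graph given as a 0/1 matrix."""
    n = len(adj)
    if n == 0:
        return True
    reverse = [[adj[j][i] for j in range(n)] for i in range(n)]
    return len(reachable(adj, 0)) == n and len(reachable(reverse, 0)) == n


def truncation_graph(t, u, v):
    """ASSUMED edge rule: i -> j when t[j] lies in [u[i], v[i]]."""
    n = len(t)
    return [[u[i] <= t[j] <= v[i] for j in range(n)] for i in range(n)]
```

Running this check on each bootstrap resample would flag resamples for which the NPMLE is not identifiable, which is one of the practical difficulties with resampling noted in the Introduction.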
Influence function estimation for the NPMLE
Recently, de Uña-Álvarez and Keilegom (2021) derived the influence function for the NPMLE of the event time distribution function , which involves an infinite sum of bounded linear operators. Since an estimate of this influence function can be used to compute standard errors for and , they suggested estimating the infinite sum by plugging in the corresponding sample quantities and truncating the sum at some cutoff c. This can be fairly computationally intensive, however, and we further show in Lemma 1 below that the exact plug-in estimate has a simple closed-form which exists whenever the NPMLE is identifiable in the observed sample (Assumption 2).
Under Assumptions 1 and 3, the results of de Uña-Álvarez and Keilegom (2021) imply that
where is a bounded linear operator with for
and is a function that depends on the ith data point
For they derived
with
Here , and the plug-in estimates for and would be the corresponding empirical distribution functions from the observed sample. Then the estimates for and are right continuous step functions with jumps at the observed event times.
In order to estimate , first note that the plug-in estimator for is given by
Also, denote the lower triangular matrix of ones by , so . Lastly, define the matrices
and
With sorted event times, is the plug-in estimate of when there are no ties. If there are ties, we define the matrix with jth row equal to the kth standard basis vector , where and either or . Then it is straightforward to verify that is the plug-in estimate of . Lemma 1 below provides estimates for the influence functions of and , evaluated at .
Lemma 1
(Estimator for the influence function of the NPMLE) Suppose the event times are sorted so that and assume the strong connectedness condition holds. Let , , and be the matrix of plug-in influence function estimates for , such that estimates . Then
and the plug-in influence function estimates for , evaluated at and the ith data point, are
Furthermore, letting be the matrix of influence function estimates for ,
is the estimate corresponding to and the ith data point.
The proof for Lemma 1 is provided in Online Resource Section S1.
Since the estimated influence function for is a right-continuous step function with jumps at the observed event times, can be used for covariance estimation of at any time points. The estimate , on the other hand, would have right-continuous jumps at observed left truncation times, and left-continuous jumps at right truncation times. Therefore it should be evaluated at additional time points, which is straightforward, when conducting nonparametric analysis of the weight process. Only is required to construct standard errors for the IPW estimators, as described in Sect. 3.2 below.
The utility of these simple closed-form influence function estimates extends beyond Cox regression, which is the focus of Sect. 3.2, to general nonparametric analysis of doubly truncated survival data. They allow one to obtain confidence intervals and conduct hypothesis tests based on smooth functionals of the NPMLE without needing further complicated theoretical derivations and without relying on the nonparametric bootstrap. We discuss this below in the context of testing for ignorable sampling bias, with more examples provided in Online Resource Section S3. Additional simulation results for the NPMLE are available in Online Resource Section S5.1.
Remark 1
(Ignorable sampling bias) The NPMLE typically has higher variance than the much simpler empirical distribution function, which does not correct for truncation bias, and the same can be said for IPW vs unweighted Cox regression estimates. Given a sample of doubly truncated data, one may wish to evaluate whether the sampling bias is ignorable, since then standard methods for untruncated data could be applied. This situation occurs when the selection probability is constant, i.e. , , or equivalently . Recently, de Uña-Álvarez (2023) proposed testing for ignorable sampling bias through the statistic , where is the empirical distribution function. They noted that the asymptotic null distribution of this statistic is that of the supremum of a mean-zero Gaussian process G(t) with and , but they declined to work with this distribution due to its complexity. Instead they relied on bootstrap resampling to approximate the null distribution. It is computationally simpler, however, to use the influence function estimates in Lemma 1 to estimate this covariance function, and then calculate a p-value by repeated simulation of the process G(t).
Remark 2
(Ignorable sampling bias continued) We may instead assess the null hypothesis of ignorable sampling bias directly through , using the statistic . Under the null, this converges weakly to the restricted supremum of a mean-zero Gaussian process G(t) with , where is the influence function for a(t). Using the estimates from Lemma 1, one can calculate the p-value for this test by simulation.
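The simulation step mentioned in Remarks 1 and 2 can be illustrated with a minimal sketch. It assumes that the influence function estimates have already been evaluated on a grid of time points and stored as an n-by-m array, and that the observed statistic is on the same root-n scale as the simulated process. The function name `sup_test_p_value` and its arguments are hypothetical. Conditional on the data, each simulated path is a mean-zero Gaussian vector whose covariance is the empirical second moment of the influence function estimates.

```python
import math
import random


def sup_test_p_value(influence, observed_stat, n_sim=2000, seed=0):
    """Approximate P(sup_t |G(t)| >= observed_stat) by simulation.

    influence[i][j] is the (assumed precomputed) influence function estimate
    for data point i at grid time j. Each simulated path is
    n^{-1/2} * sum_i xi_i * influence[i][j] with xi_i iid N(0, 1), which has
    covariance n^{-1} * sum_i influence[i][s] * influence[i][t].
    """
    rng = random.Random(seed)
    n = len(influence)
    m = len(influence[0])
    exceed = 0
    for _ in range(n_sim):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sup = 0.0
        for j in range(m):
            g = sum(xi[i] * influence[i][j] for i in range(n)) / math.sqrt(n)
            sup = max(sup, abs(g))
        if sup >= observed_stat:
            exceed += 1
    return exceed / n_sim
```

For the restricted supremum in Remark 2, the same sketch applies after dropping the grid columns that fall outside the restricted time range.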
Robust IPW Cox regression with time-varying weights
Let be the IPW coefficient estimates obtained using weight function . Define
for and let be the influence function of . The plug-in estimators are , and (see Lemma 1 above).
Consistency of can be shown by a concavity argument as in Rennert and Xie (2018). Theorem 1 below provides the standard errors for , while Theorem 2 provides the influence function estimates for the baseline hazard estimator.
Theorem 1
(Standard errors for robust IPW Cox regression with time-varying weights) Suppose . Under Assumptions 1-7,
where and, denoting the ith data point by and an independent data point , and Thus is asymptotically normal with plug-in variance estimator
Remark 3
(Computing the standard errors) The model components needed to compute the variance estimator in Theorem 1 are easy to obtain with standard software. If is obtained by the offset approach described in Sect. 2.4, then the ’s are the weighted score residuals and the ’s are weighted averages of those score residuals. The ’s are weighted averages of the Schoenfeld residuals and are asymptotically negligible when the fitted model is correctly specified. Lastly, is the observed inverse information matrix, which can be easily obtained from Cox regression software.
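As a rough illustration of the final step in Remark 3, the sketch below turns per-observation influence vectors into a plug-in covariance matrix and standard errors. It assumes the influence vectors have already been assembled from the model components (score residuals, the NPMLE correction terms, and the inverse information matrix); that assembly is not shown, and the function name `influence_standard_errors` is hypothetical. The generic formula used is the average outer product of the influence vectors divided by n.

```python
import math


def influence_standard_errors(influence):
    """Plug-in covariance and standard errors from influence vectors.

    influence[i] is the (assumed precomputed) influence vector of length p
    for data point i. The covariance estimate is
    n^{-2} * sum_i influence[i] influence[i]^T.
    """
    n = len(influence)
    p = len(influence[0])
    cov = [
        [sum(row[a] * row[b] for row in influence) / n**2 for b in range(p)]
        for a in range(p)
    ]
    se = [math.sqrt(cov[a][a]) for a in range(p)]
    return cov, se
```

Omitting the NPMLE correction from the influence vectors would correspond to the unadjusted sandwich standard errors, which ignore the variability of the estimated weights.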
Remark 4
(Comparison with previous work) Mandel et al. (2018) studied the special case of stabilized weights . They provided an asymptotic representation for as a U-statistic with a fairly complicated kernel of degree 3. The standard errors provided in Theorem 1, on the other hand, are more directly useful for practical implementation and cover a wide variety of weight functions .
Remark 5
(Model misspecification and non-proportional hazards) The standard errors in Theorem 1 are still consistent under model misspecification. When the fitted model is incorrect, the IPW estimator with time-varying weights can be shown to converge to a well-defined constant under fairly general conditions (Struthers and Kalbfleisch 1986; Sasieni 1993b). The limiting value will depend on the chosen weight function , and under non-proportional hazards it has an approximate interpretation as an average regression effect. In particular, survival function weighting in the case of a single binary covariate leads to a that closely approximates the log-odds of concordance between the two covariate groups, with the concordance probability defined as (Schemper et al. 2009). This quantity has a clear interpretation regardless of whether the proportional hazards assumption holds. Further details and related simulation results are provided in Online Resource Section S5.4.
Remark 6
(Sensitivity analysis for the positivity assumption) As mentioned above, the proposed standard errors are consistent under model misspecification, which may result in . This is particularly useful for the sensitivity analysis estimator , since in practice one will use multiple values for the truncated mass , and at most one can be correct. Define by replacing with in , and let be the derivative of . In Online Resource Section S4 we show that is asymptotically normal with plug-in variance estimator
These standard errors are also easily adapted to the right truncation setting, where Vakulenko-Lagun et al. (2020) instead relied on bootstrap resampling.
Theorem 2
(Influence function for the IPW baseline hazard estimator) Under Assumptions 1-6, suppose is asymptotically linear with influence function . Then, uniformly in for , the IPW baseline hazard estimator satisfies
where and, for and an independent , and The plug-in estimates when is the robust IPW estimator from Theorem 1 are given by , where estimates negative , , and .
The proof for Theorem 2 is provided in Online Resource Section S4.
Simulation results
We assessed the accuracy of the proposed estimators and their standard errors through extensive simulations. For each setting, 1000 simulations were conducted. In each simulation we fit the standard IPW partial likelihood estimator (W-1) as well as several choices of time-varying weights, with equal to the survival function (W-surv), general Fleming-Harrington weights (W-fhrs) for , and their stabilized weighting counterparts (W-a), (W-asurv), and (W-afhrs), using the novel methods developed in Sect. 2.2. For comparison we also included the standard maximum partial likelihood estimator, which does not account for truncation bias (UW), and the conditional maximum likelihood estimator (MLE) (Rennert and Xie 2022). Standard errors for the IPW estimators were computed using the sandwich estimator from Theorem 1, either (1) based on only , the naive sandwich standard errors that do not account for the variability of the NPMLE, (2) based on , the proposed standard errors trusting that the model is correct when using time-varying weights, or (3) based on , the proposed robust variance estimator that is consistent under model misspecification. We did not compute standard errors for the MLE due to the high computational burden of bootstrapping.
First, we ran simulations to check the accuracy of the proposed standard errors. The truncation times were generated as and . The event times followed a Cox model with hazard rate , with the baseline hazard from a distribution, covariates with Bernoulli(0.5) independent of Uniform(0, 1), and . We set the sample size to . The results for the coefficient estimators are summarized in Table 1 (under no data contamination), including a representative subset of the Fleming-Harrington weighted estimators for easier comparisons. Here all inverse probability weighted estimators are unbiased, and their confidence intervals have the correct 95% coverage probability when using the proposed standard errors (adj. and robust). As expected, the unadjusted sandwich standard errors (unadj.) tend to underestimate the true variability of the IPW estimators. On the other hand, including does not have a tangible impact on the accuracy of the standard errors because the fitted model is correctly specified. Overall, we find that the MLE and all IPW estimators have roughly equivalent performance in this setting of well-behaved data, although the MLE happens to have a slightly higher variance than some IPW estimators (see Online Resource Section S5.3.2 for results at and 700). We also evaluated the baseline survival function estimators and their proposed standard errors in this simulation setting. As shown in Fig. 1 (under no data contamination), the IPW estimators are all unbiased, have correct confidence interval coverage rates, and are all roughly equivalent in terms of performance.
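The summary measures reported in Table 1 can be computed from the simulation replicates along the following lines. This is a minimal sketch, assuming that the bias is reported relative to the true coefficient value and that confidence intervals use the normal approximation; the function name `summarize_simulation` is hypothetical.

```python
import statistics


def summarize_simulation(estimates, std_errors, truth, z=1.96):
    """Summarize replicate estimates of a single coefficient.

    Returns relative bias (%), MSE, empirical SD, mean estimated standard
    error, and the coverage rate of estimate +/- z * standard error.
    """
    mean_est = statistics.fmean(estimates)
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= truth <= est + z * se
    )
    return {
        "bias_pct": 100.0 * (mean_est - truth) / truth,
        "mse": statistics.fmean([(est - truth) ** 2 for est in estimates]),
        "sd": statistics.stdev(estimates),
        "mean_se": statistics.fmean(std_errors),
        "coverage": covered / len(estimates),
    }
```

Comparing `sd` with `mean_se` shows whether a given standard error formula tracks the true sampling variability of the estimator.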
Table 1.
Simulation results for regression coefficient estimation with contaminated data at . The truncation rates were 0.15 (left), 0.21 (right), 0.36 (overall). Includes estimators for using the standard unadjusted (UW) and IPW partial likelihood (W-1), as well as time-varying weights based on the survival function (W-surv), general Fleming-Harrington weights (W-fhrs), and their stabilized weighting counterparts (W-a), (W-asurv), and (W-afhrs). 95% confidence intervals (CI) are obtained from a normal approximation and either unadjusted sandwich standard errors (unadj.) or the proposed standard errors that account for the NPMLE variability (adj. and robust)
| Estimator | Bias (%) | MSE | SD | SE unadj. | SE adj. (proposed) | SE robust (proposed) | CI coverage unadj. | CI coverage adj. (proposed) | CI coverage robust (proposed) |
|---|---|---|---|---|---|---|---|---|---|
| No data contamination | | | | | | | | | |
| W-1 | 1.12 | 0.014 | 0.119 | 0.110 | 0.113 | | 0.930 | 0.940 | |
| W-surv | 0.72 | 0.015 | 0.120 | 0.116 | 0.118 | 0.119 | 0.943 | 0.946 | 0.947 |
| W-fh02 | 0.60 | 0.018 | 0.134 | 0.131 | 0.133 | 0.133 | 0.950 | 0.955 | 0.954 |
| W-fh11 | 1.15 | 0.015 | 0.122 | 0.115 | 0.118 | 0.119 | 0.934 | 0.939 | 0.940 |
| W-a | 1.06 | 0.014 | 0.117 | 0.110 | 0.113 | 0.112 | 0.933 | 0.939 | 0.939 |
| W-asurv | 0.75 | 0.014 | 0.120 | 0.115 | 0.118 | 0.118 | 0.941 | 0.948 | 0.949 |
| W-afh02 | 0.64 | 0.017 | 0.131 | 0.128 | 0.130 | 0.130 | 0.944 | 0.948 | 0.949 |
| W-afh11 | 1.11 | 0.015 | 0.122 | 0.116 | 0.118 | 0.119 | 0.933 | 0.940 | 0.944 |
| MLE | 1.65 | 0.018 | 0.134 | | | | | | |
| UW | 12.98 | 0.027 | 0.100 | 0.096 | | | 0.730 | | |
| Contamination probability: 1% | | | | | | | | | |
| W-1 | 12.69 | 0.040 | 0.154 | 0.120 | 0.122 | | 0.793 | 0.799 | |
| W-surv | 6.05 | 0.019 | 0.123 | 0.118 | 0.120 | 0.120 | 0.910 | 0.917 | 0.918 |
| W-fh02 | 4.06 | 0.020 | 0.135 | 0.131 | 0.133 | 0.133 | 0.930 | 0.934 | 0.935 |
| W-fh11 | 9.58 | 0.027 | 0.133 | 0.122 | 0.123 | 0.124 | 0.871 | 0.876 | 0.879 |
| W-a | 10.54 | 0.030 | 0.137 | 0.118 | 0.119 | 0.120 | 0.836 | 0.842 | 0.842 |
| W-asurv | 5.69 | 0.018 | 0.121 | 0.117 | 0.119 | 0.119 | 0.919 | 0.924 | 0.925 |
| W-afh02 | 4.05 | 0.019 | 0.131 | 0.128 | 0.131 | 0.130 | 0.930 | 0.935 | 0.937 |
| W-afh11 | 8.66 | 0.024 | 0.130 | 0.121 | 0.123 | 0.123 | 0.884 | 0.887 | 0.888 |
| MLE | 9.28 | 0.031 | 0.149 | | | | | | |
| UW | 21.59 | 0.059 | 0.110 | 0.101 | | | 0.419 | | |
| Contamination probability: 3% | | | | | | | | | |
| W-1 | 29.50 | 0.110 | 0.151 | 0.131 | 0.131 | | 0.390 | 0.393 | |
| W-surv | 15.52 | 0.039 | 0.121 | 0.119 | 0.120 | 0.120 | 0.733 | 0.737 | 0.738 |
| W-fh02 | 11.28 | 0.030 | 0.131 | 0.131 | 0.133 | 0.132 | 0.862 | 0.865 | 0.864 |
| W-fh11 | 23.10 | 0.071 | 0.131 | 0.127 | 0.127 | 0.128 | 0.548 | 0.546 | 0.557 |
| W-a | 24.92 | 0.080 | 0.135 | 0.124 | 0.125 | 0.125 | 0.478 | 0.478 | 0.480 |
| W-asurv | 14.75 | 0.036 | 0.120 | 0.119 | 0.120 | 0.120 | 0.759 | 0.766 | 0.766 |
| W-afh02 | 11.24 | 0.029 | 0.129 | 0.129 | 0.130 | 0.130 | 0.857 | 0.859 | 0.860 |
| W-afh11 | 21.16 | 0.061 | 0.128 | 0.125 | 0.126 | 0.126 | 0.600 | 0.604 | 0.607 |
| MLE | 22.13 | 0.070 | 0.146 | | | | | | |
| UW | 32.54 | 0.117 | 0.105 | 0.101 | | | 0.125 | | |
Fig. 1.

Simulation results for baseline survival function estimation with contaminated data at . Includes the standard unweighted estimator, which does not account for double truncation bias, in pink dashed lines, the existing estimators in black lines (dashed for Rennert and Xie (2018), dotted for Mandel et al. (2018)), and the proposed estimators in green lines (dashed for , dotted for ). The 95% pointwise confidence intervals are computed using either a or transformation of the baseline survival function
Next, we examined robustness to data contamination. The data was contaminated by replacing a fifth of the linear predictors that had values above a cutoff c with standard normal random variables, and then using these contaminated linear predictors to simulate the event times. In other words, the data followed the Cox model where and Bernoulli(0.2). Thus a portion of the highest-risk individuals (according to the fitted Cox model) had long survival times that followed a different model. We ran two settings, with the cutoff c chosen to produce an overall contamination probability of 0.01 and 0.03, respectively, motivated by a similar simulation study for weighted Cox regression under right censoring (Sasieni 1993a). Again the sample size was set to . We also explored sample sizes of , which is similar to the AIDS dataset in Sect. 5, and (see Online Resource Section S5.3.2).
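The contamination mechanism described above can be sketched as follows. This is an illustrative reading of the verbal description, not the code used for the simulations: each linear predictor above the cutoff is replaced, with probability 0.2, by a standard normal draw. The function name `contaminate_linear_predictors` and its arguments are hypothetical, and the subsequent generation of event times from the contaminated predictors is not shown.

```python
import random


def contaminate_linear_predictors(linear_predictors, cutoff, prob=0.2, seed=0):
    """Replace a random subset of high linear predictors with N(0, 1) draws.

    A predictor is eligible only if it exceeds `cutoff`; each eligible value
    is replaced independently with probability `prob`. Returns the
    contaminated predictors and a list of flags marking the replaced entries.
    """
    rng = random.Random(seed)
    contaminated = []
    flags = []
    for lp in linear_predictors:
        replace = lp > cutoff and rng.random() < prob
        contaminated.append(rng.gauss(0.0, 1.0) if replace else lp)
        flags.append(replace)
    return contaminated, flags
```

Raising or lowering `cutoff` changes the fraction of eligible individuals and hence the overall contamination probability, which is how the 1% and 3% settings differ.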
The results for the coefficient estimators under data contamination are summarized in Table 1. Although all estimators have increased bias as the contamination percentage increases, using the proposed time-varying weights based on (powers of) the survival function clearly leads to the lowest bias, lowest MSE, and most accurate confidence intervals out of all estimators by a wide margin. In particular, our proposed estimators (W-surv and W-asurv) attain 60% of the MSE of the existing estimators (Rennert and Xie 2018; Mandel et al. 2018; Rennert and Xie 2022), which do not have time-varying weights based on the event time distribution, and 80% of the MSE of Fleming-Harrington weights with , which use the NPMLE to downweight both early and late event times. Whether one uses the survival function alone (W-surv) or its stabilized counterpart (W-asurv) makes relatively little difference. The same trends occur at other sample sizes as well (Online Resource Section S5.3.2). Similarly, the proposed survival function weights produce the most robust estimator of the baseline survival function, as shown in Fig. 1. Again, they result in lower bias, lower MSE, and more accurate confidence interval coverage than the weights used by both existing estimators. Since the estimators based on Fleming-Harrington weights performed similarly to the Mandel et al. (2018) weights (for ) and the survival function weights (for ), we omit their results for clarity.
We observed several notable trends in the simulation results for the full set of Fleming-Harrington weighted estimators with (some not reported in Table 1). First, weights constructed from only the CDF () generally had the worst bias and variance across all settings, so we chose not to report their results. This may be explained by the fact that such weights emphasize long event times, which makes the estimator more sensitive to influential points. This would also explain why weights constructed from only the survival function () typically had the best MSE across all settings. Among this class of weights, choosing the best power s involved a bias-variance tradeoff, where larger s improved robustness under data contamination while also increasing the variance of the estimator. The best choice may depend on the distribution of the data and thus be difficult to determine in practice. Therefore, we recommend the proposed survival function weights as a generally reasonable choice with much better robustness than the existing double truncation methods.
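To make the comparison of weight choices concrete, the following minimal sketch evaluates a Fleming-Harrington type weight at a given value of the survival function. The parametrization is an assumption inferred from the discussion above: the power r is applied to the CDF and the power s to the survival function, so that (r, s) = (0, 1) gives the survival function weight. The function name `fh_weight` is hypothetical, and in practice the survival function value would come from the NPMLE.

```python
def fh_weight(surv, r, s):
    """Fleming-Harrington type weight F(t)^r * S(t)^s, with F = 1 - S.

    `surv` is the (estimated) survival function value at time t, in [0, 1].
    Assumed parametrization: r is the CDF power, s is the survival power.
    """
    if not 0.0 <= surv <= 1.0:
        raise ValueError("surv must lie in [0, 1]")
    return (1.0 - surv) ** r * surv ** s
```

With s > 0 and r = 0 the weight decreases toward zero as the survival function decreases, which downweights late event times; with r > 0 and s = 0 the pattern is reversed, which is consistent with the sensitivity to long event times noted above.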
Finally, we simulated data under violations of the positivity assumption. The event times followed the same Cox model as in Table 1 (without data contamination), which had support [0.1, 0.5]. The uniform covariate was centered for better interpretability of the baseline distribution. The truncation times were and , which resulted in a true truncated mass of . We fit sensitivity estimators for values of ranging from 0 to 1, in increments of 0.05, using the novel method outlined in Sect. 2.3 and the standard errors developed in Sect. 3.2.
Table 2 summarizes the results for a subset of the fitted sensitivity estimators at . First, we note that the IPW estimators which do not consider potential positivity violations (with ) have a heavy bias of about and 95% confidence interval coverage rates of less than . Thus all existing methods would heavily underestimate the true hazard ratio. In contrast, the sensitivity analysis estimators near the true truncated mass are approximately unbiased, and their sensitivity intervals have roughly 95% coverage. As long as the truncated mass is not underestimated, all sensitivity intervals achieve at least 95% coverage. The two sensitivity analysis estimators that use the proposed survival function weights (W-surv and W-asurv) generally have the lowest variance and narrowest sensitivity intervals, leading to higher power compared to the novel sensitivity analysis estimators that use weights adapted from the existing methods of Rennert and Xie (2018) (W-1) and Mandel et al. (2018) (W-a).
Table 2.
Simulation results for proposed sensitivity analysis under violated positivity assumption at . Includes novel IPW sensitivity analysis estimators with weights adapted from the standard IPW partial likelihood estimator (W-1), the proposed time-varying weights based on the survival function (W-surv), and their stabilized weighting counterparts (W-a) and (W-asurv). The true truncated mass was 0.1 on the left and 0.4 on the right, i.e. , while the true coefficient value was . The sensitivity intervals (SI) have nominal level 0.95 when the truncated mass is not underestimated
| Quantity | Estimator | Assumed truncated mass | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | 0.00 | 0.10 | 0.20 | 0.30 | 0.40 | 0.45 | 0.50 | 0.60 | 0.70 | 0.80 |
| Bias of coefficient estimate | W-1 | 0.62 | 0.39 | 0.27 | 0.16 | 0.05 | 0.02 | 0.08 | 0.24 | 0.44 | 0.71 |
| | W-surv | 0.56 | 0.38 | 0.27 | 0.16 | 0.05 | 0.01 | 0.07 | 0.23 | 0.41 | 0.68 |
| | W-a | 0.60 | 0.39 | 0.28 | 0.16 | 0.04 | 0.04 | 0.10 | 0.26 | 0.45 | 0.72 |
| | W-asurv | 0.57 | 0.39 | 0.29 | 0.17 | 0.05 | 0.01 | 0.07 | 0.23 | 0.42 | 0.68 |
| SD | W-1 | 0.37 | 0.47 | 0.49 | 0.55 | 0.60 | 0.63 | 0.67 | 0.76 | 0.84 | 0.98 |
| | W-surv | 0.31 | 0.41 | 0.45 | 0.51 | 0.57 | 0.61 | 0.65 | 0.71 | 0.81 | 0.95 |
| | W-a | 0.30 | 0.40 | 0.50 | 0.56 | 0.63 | 0.65 | 0.70 | 0.79 | 0.92 | 1.11 |
| | W-asurv | 0.27 | 0.36 | 0.43 | 0.49 | 0.53 | 0.57 | 0.62 | 0.67 | 0.76 | 0.92 |
| SI length | W-1 | 0.96 | 1.34 | 1.54 | 1.75 | 1.97 | 2.08 | 2.21 | 2.52 | 2.90 | 3.42 |
| | W-surv | 0.92 | 1.24 | 1.45 | 1.65 | 1.88 | 1.99 | 2.12 | 2.43 | 2.81 | 3.33 |
| | W-a | 0.87 | 1.25 | 1.50 | 1.75 | 2.00 | 2.14 | 2.30 | 2.65 | 3.07 | 3.66 |
| | W-asurv | 0.80 | 1.11 | 1.31 | 1.53 | 1.75 | 1.88 | 2.01 | 2.34 | 2.75 | 3.29 |
| SI coverage | W-1 | 0.35 | 0.72 | 0.85 | 0.90 | 0.93 | 0.94 | 0.95 | 0.97 | 0.97 | 0.98 |
| | W-surv | 0.36 | 0.73 | 0.88 | 0.93 | 0.95 | 0.96 | 0.97 | 0.98 | 0.99 | 0.99 |
| | W-a | 0.30 | 0.72 | 0.86 | 0.92 | 0.95 | 0.95 | 0.96 | 0.98 | 0.99 | 0.98 |
| | W-asurv | 0.28 | 0.64 | 0.83 | 0.90 | 0.95 | 0.96 | 0.97 | 0.98 | 0.99 | 0.99 |
Real data analysis
We applied the proposed estimators to analyze the relationship between age and AIDS incubation time from transfusion-acquired HIV. The data was collected by the Centers for Disease Control (CDC) and is publicly available in the R package gss (Gu 2014, v2.2-7). The patients were sampled retrospectively, conditional on AIDS diagnosis between its discovery in 1982 and the end of the study in 1986. Time was measured in months since HIV infection, so the truncation times and varied across individuals. The single covariate was age at infection, with three groups: children (age , ), adults (age 5-59, ), and elderly (age , ). The reference group was elderly patients. We did not find strong evidence against quasi-independent truncation (Assumption 3; see Sect. 5.1), and we found that the data satisfied the strong connectedness condition for NPMLE identifiability (Assumption 2) using the is.connected function from the R package igraph (Csárdi and Nepusz 2006; Csárdi et al. 2025, v1.5.0). Since we observed that the NPMLE approached zero for patients with longer incubation times, we chose to analyze this data using the stabilized survival function time weight to allow robustness against extreme inverse probability weights.
Existing literature has estimated the median incubation time of AIDS from transfusion-acquired HIV infection to be around 20 months for children, 90 months for adults, and 65 months for the elderly (Medley et al. 1987; Blaxhult et al. 1990; Kopec-Schrader et al. 1993). For comparison, we computed the NPMLE of the incubation time distribution for each of the three age groups. The estimated median (range) was 18 (4-43) months for children, 63 (4-89) months for adults, and 64 (0.5-83) months for the elderly age group. The estimated median for adults is well below the value reported in the literature, which suggests a severe positivity violation in this dataset, with possibly 50% of the probability mass for the adult incubation time distribution being right-truncated.
The results of the data analysis are provided in Table 3. Note that, relative to elderly patients, children are estimated to have shorter incubation times on average, while adults are estimated to have longer incubation times. This is intuitive and also supported by prior clinical literature (Medley et al. 1987; Blaxhult et al. 1990; Kopec-Schrader et al. 1993), since adults are expected to have the most robust immune systems among the three age groups. When we assumed there was no truncated mass (), the estimated hazard ratios were 8.5 for children and 0.5 for adults, relative to the elderly, but the latter regression effect was not statistically significant. Given the positivity issues outlined above, we also conducted a sensitivity analysis to assess the robustness of our Cox regression estimates. The results, also provided in Table 3, indicate that we are potentially severely underestimating the magnitude of the regression effect for adults. The estimates for children, on the other hand, are fairly stable. This is in line with the fact that adults tend to have the longest incubation times, so they are more likely to be right-truncated due to non-positivity compared to the other age groups.
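The hazard ratios quoted above follow from exponentiating the log hazard ratio estimates. The short sketch below shows this conversion for a coefficient and its interval limits, using the values for zero assumed truncated mass; the adult coefficient is taken to be negative, which is what a hazard ratio of 0.5 implies. The function name `hazard_ratio_summary` is hypothetical.

```python
import math


def hazard_ratio_summary(coef, lower, upper):
    """Convert a log hazard ratio and its interval limits to the HR scale."""
    return math.exp(coef), math.exp(lower), math.exp(upper)


# Children vs elderly and adults vs elderly, assuming no truncated mass.
children = hazard_ratio_summary(2.14, 1.47, 2.81)
adults = hazard_ratio_summary(-0.69, -1.77, 0.38)
```

An interval on the hazard ratio scale that contains 1 corresponds to an interval on the coefficient scale that contains 0, which is why the adult effect is not statistically significant in this setting.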
Table 3.
Regression coefficient estimates and 95% sensitivity intervals (SI) for AIDS incubation data, under several assumed values for the truncated mass . The single categorical covariate was age group, with reference group age . Here is the coefficient estimate for the given age group
| Assumed truncated mass | Children: estimate | Children: lower SI | Children: upper SI | Adults (age 5-59): estimate | Adults (age 5-59): lower SI | Adults (age 5-59): upper SI |
|---|---|---|---|---|---|---|
| 0.00 | 2.14 | 1.47 | 2.81 | -0.69 | -1.77 | 0.38 |
| 0.02 | 2.16 | 1.47 | 2.83 | -0.94 | -2.88 | 1.01 |
| 0.04 | 2.18 | 1.47 | 2.85 | -1.10 | -3.69 | 1.48 |
| 0.06 | 2.20 | 1.47 | 2.87 | -1.28 | -4.66 | 2.10 |
| 0.08 | 2.22 | 1.47 | 2.89 | -1.48 | -5.95 | 3.00 |
| 0.10 | 2.24 | 1.47 | 2.91 | -1.73 | -7.87 | 4.42 |
| 0.12 | 2.26 | 1.47 | 2.93 | -2.05 | -11.14 | 7.04 |
| 0.14 | 2.29 | 1.47 | 2.96 | -2.54 | -18.36 | 13.28 |
Assessing the quasi-independent truncation assumption
To check for potential violations of the quasi-independent truncation assumption, we applied the proposed nonparametric diagnostic described in Sect. 2.1, stratified by age group. First, we checked the quasi-independent truncation assumption within strata. The estimated conditional Kendall’s tau rank correlation between the incubation times and truncation times was 0.06 for children, 0.04 for adults, and 0.12 among the elderly. The aggregate p-value for testing whether all three rank correlations were zero was 0.05. Given the small effect sizes and marginally significant p-value, we did not consider this strong evidence against the quasi-independent truncation assumption within each stratum.
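For readers unfamiliar with conditional Kendall's tau, the following is a minimal sketch of the comparable-pairs idea (Martin and Betensky 2005), not the exact statistic used in Sect. 2.1. It assumes that a pair of observations is comparable when both event times fall inside both truncation intervals, and it measures the rank association between the event times and one chosen truncation variable. The function name `conditional_kendall_tau` and its arguments are hypothetical.

```python
def conditional_kendall_tau(event, left, right, trunc):
    """Kendall's tau between event times and `trunc` over comparable pairs.

    A pair (i, j) is treated as comparable when
    max(left_i, left_j) <= min(event_i, event_j) and
    max(event_i, event_j) <= min(right_i, right_j).
    `trunc` is the truncation variable of interest (e.g. `left` or `right`).
    Returns NaN when there are no comparable pairs.
    """
    n = len(event)
    total = 0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            comparable = (
                max(left[i], left[j]) <= min(event[i], event[j])
                and max(event[i], event[j]) <= min(right[i], right[j])
            )
            if not comparable:
                continue
            prod = (event[i] - event[j]) * (trunc[i] - trunc[j])
            total += (prod > 0) - (prod < 0)  # sign of the product
            pairs += 1
    return total / pairs if pairs else float("nan")
```

Restricting to comparable pairs removes the association that truncation alone induces, so a value near zero is what one would expect under quasi-independence.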
Then, we checked the quasi-independent truncation assumption across strata. The diagnostic plot in Fig. 2 includes a 95% uniform confidence band which shows no significant difference between the stratum-specific and unstratified selection probabilities. The p-value for testing the null hypothesis that was 0.2. The confidence band and p-value were computed based on 2000 simulations from the estimated asymptotic null distribution of . In conclusion, we did not find strong evidence that the quasi-independent truncation assumption was violated in this dataset.
Fig. 2.

Diagnostic plot for assessing the quasi-independent truncation assumption in the AIDS data, stratified by age group. Includes the stratum-specific NPMLE’s (gray lines), the unstratified NPMLE (solid black line), and a 95% uniform confidence band for centered at (black dashed lines)
Discussion
In this paper, we have proposed robust IPW estimation for doubly truncated Cox regression with time-varying weights based on the survival function, which has been shown to be highly robust against influential points with contaminated data. In particular, our simulation results have shown that the proposed estimators can achieve lower bias and MSE than the current IPW estimators of Rennert and Xie (2018) and Mandel et al. (2018) in such settings. Although the estimator of Mandel et al. (2018) can be thought of as a robust IPW estimator with time-varying weights based on the normalized sampling probabilities , our proposed estimators consistently showed better robustness properties across several simulation settings. This may be explained by the fact that the sampling probability is determined solely by the bivariate truncation time distribution, which does not allow their stabilized weights to automatically adapt to different event time distributions. We have further derived closed-form standard errors for a general class of IPW estimators based on NPMLE weights, which includes existing methods as well as our robust alternatives. Lastly, we have developed a graphical diagnostic and statistical test for the quasi-independent truncation assumption when it involves covariates, as well as sensitivity analysis approaches regarding the positivity assumption on the sampling probabilities. One possible direction for future work would be to relax the quasi-independent truncation assumption to covariate-dependent truncation.
Our proposed diagnostic for quasi-independent truncation is based on stratifying by binned covariate values, which leads to a relatively simple testing procedure. For each stratum, however, neither the within-stratum quasi-independence test nor the within-stratum selection probability estimates use any information from the other strata. This may limit the power of the test when there are a large number of covariates, particularly if many of them are continuous. To mitigate this issue, the within-strata quasi-independence tests could be replaced by a modification of the conditional Kendall’s tau test (Martin and Betensky 2005) with kernel smoothing across covariate values. Estimating the conditional selection probabilities nonparametrically without stratification, on the other hand, would require a novel estimator that is beyond the scope of this paper. Other approaches to (conditional) independence testing have been studied for untruncated samples (Su and White 2014; Shah and Peters 2020; Zhou et al. 2020; Cai et al. 2022), but their extension to doubly truncated data is complicated by the sampling bias.
Our results have provided useful tools for the analysis of observational studies subject to double truncation. Through our developments for the NPMLE and our robust IPW estimators, we provide fast extensions of standard methods for right censored data, where inference is based on Kaplan-Meier curves and the maximum partial likelihood estimator, to doubly truncated data. Testing for ignorable sampling bias through the NPMLE is easy to implement and computationally fast. Also, using our closed-form standard errors for Cox regression will expedite the model building process and facilitate sensitivity analysis for the positivity assumption without the unnecessary computational burden of resampling methods. The R code for the data analysis in Sect. 5 is available online at https://github.com/omar-vazquez/robust_ipw.
In practice, the analyst can apply standard weighted residual diagnostics with our proposed IPW estimators to detect model misspecification and update the model accordingly. To address non-proportional hazards, our proposed estimators can be easily extended to fit stratified Cox models. Another option is to add new time-varying covariates to the model based on interactions between existing covariates and functions of time. Unfortunately, this will complicate the interpretation of the coefficient estimates, since the log hazard ratios now vary over time. A third option would be to neither stratify nor add these interactions, and just fit the misspecified Cox model. This can provide a simple and useful summary of the covariate effects, since existing work implies that the coefficient estimates from an IPW estimator with time-varying weights based on the weight function w(t) will approximate a weighted average of the true log hazard ratios over time, with weights (see Remark 5 and Online Resource Section S5.4). The proposed survival function weights are an intuitive choice in this context, since each time point is weighted by the proportion of individuals still at risk.
Using our proposed diagnostic, we found that the quasi-independent truncation assumption was satisfied in the AIDS data to a reasonable extent. Nevertheless, we did observe a small positive Kendall’s tau correlation between the incubation and truncation times in Sect. 5.1, mostly in the elderly age group. This could be caused by a delay in diagnosis time for patients infected far before AIDS discovery. In that case, the reported AIDS diagnosis times would overestimate the true incubation times, resulting in measurement error for the event times. Accounting for this potential issue goes beyond correcting for double truncation bias and may require some additional modeling.
Acknowledgements
This work is supported in part by funds from the National Institutes of Health (NIH): R01-NS102324, P30-AG072979, P01-AG066597, U19-AG062418, and P01-AG084497.
Appendix A Proof of Theorem 1
By a first order Taylor series expansion of the score function, we have
where lies on the line between and . Therefore
First, by the convergence of to and uniform convergence of to , , we have
and this sum of iid random matrices converges in probability to the full rank matrix
by the law of large numbers.
We move on to the gradient. Let and note that . Therefore the gradient can be written as
For the first term , by construction the process is a sum of iid terms with uniformly mean zero. By the monotonicity of its components, it converges weakly to a mean-zero Gaussian process, so the uniform convergence of implies that
For the second term , we first apply the uniform convergence of , and then plug-in to get
Similarly, for the third term we apply the uniform convergence of to get
since , and then plug-in to get
Thus can be written in the form of a mean zero U-statistic
with kernel
Since and have uniformly mean zero, we have , which simplifies the form of the kernel’s Hájek projection. Thus,
with
This completes the proof of Theorem 1.
Declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Binder DA (1992) Fitting Cox’s proportional hazards models from survey data. Biometrika 79(1):139–147
- Blaxhult A, Granath F, Lidman K, Giesecke J (1990) The influence of age on the latency period to AIDS in people infected by HIV through blood transfusion. AIDS 4(2):125–130
- Cai Z, Li R, Zhang Y (2022) A distribution free conditional independence test with applications to causal discovery. J Mach Learn Res 23(85):1–41
- Csárdi G, Nepusz T (2006) The igraph software package for complex network research. Inter J Complex Syst. 1695, https://igraph.org
- Csárdi G, Nepusz T, Traag V, Horvát S, Zanini F, Noom D, Müller K (2025) igraph: Network analysis and visualization in R. 10.5281/zenodo.7682609, https://CRAN.R-project.org/package=igraph, R package version 1.5.0
- Dörre A, Emura T (2019) Analysis of doubly truncated data: an introduction. Springer, New York
- Efron B, Petrosian V (1999) Nonparametric methods for doubly truncated data. J Am Stat Assoc 94(447):824–834
- Emura T, Konno Y, Michimae H (2015) Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal 21:397–418
- Fleming TR, Harrington DP (2013) Counting processes and survival analysis, vol 625. John Wiley & Sons, Hoboken
- Grambsch PM, Therneau TM (1994) Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81(3):515–526
- Gu C (2014) Smoothing spline ANOVA models: R package gss. J Stat Softw 58(5):1–25
- Kopec-Schrader E, Tindall B, Learmontt J, Wyliet B, Kaldor JM (1993) Development of aids in people with transfusion-acquired hiv infection. AIDS 7(7):1009–1014 [DOI] [PubMed] [Google Scholar]
- Lin D (2000) On fitting Cox’s proportional hazards models to survey data. Biometrika 87(1):37–47 [Google Scholar]
- Mandel M, de Uña-Álvarez J, Simon DK, Betensky RA (2018) Inverse probability weighted Cox regression for doubly truncated data. Biometrics 74(2):481–487 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Martin EC, Betensky RA (2005) Testing quasi-independence of failure and truncation times via conditional Kendall’s tau. J Am Stat Assoc 100(470):484–492 [Google Scholar]
- Medley G, Anderson R, Cox D, Billard L (1987) Incubation period of aids in patients infected via blood transfusion. Nature 328(6132):719–721 [DOI] [PubMed] [Google Scholar]
- Pan Q, Schaubel DE (2008) Proportional hazards models based on biased samples and estimated selection probabilities. Can J Stat 36(1):111–127 [Google Scholar]
- R Core Team (2023) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/
- Rennert L, Xie SX (2018) Cox regression model with doubly truncated data. Biometrics 74(2):725–733 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rennert L, Xie SX (2022) Cox regression model under dependent truncation. Biometrics 78(2):460–473, the unofficial version of this Biometrics paper describing the same method was deposited in a preprint server in 2018 [DOI] [PMC free article] [PubMed]
- Sasieni P (1993) Maximum weighted partial likelihood estimators for the Cox model. J Am Stat Assoc 88(421):144–152 [Google Scholar]
- Sasieni P (1993) Some new estimators for Cox regression. Ann Stat 21(4):1721–1759 [Google Scholar]
- Schemper M (1992) Cox analysis of survival data with non-proportional hazard functions. J R Stat Soc Ser D: Stat 41(4):455–465 [Google Scholar]
- Schemper M, Wakounig S, Heinze G (2009) The estimation of average hazard ratios by weighted Cox regression. Stat Med 28(19):2473–2489 [DOI] [PubMed] [Google Scholar]
- Shah RD, Peters J (2020) The hardness of conditional independence testing and the generalised covariance measure. Ann Stat 48(3):1514–1538 [Google Scholar]
- Shen PS (2010) Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 62(5):835–853 [Google Scholar]
- Shen PS (2011) Testing quasi-independence for doubly truncated data. J Nonparametric Stat 23(3):753–761 [Google Scholar]
- Shen PS (2025) Cox regression model with doubly truncated and interval-censored data. Comput Stat & Data Anal 203:108090 [Google Scholar]
- Shen PS, Hsu H (2020) Conditional maximum likelihood estimation for semiparametric transformation models with doubly truncated data. Comput Stat & Data Anal 144:106862 [Google Scholar]
- Struthers CA, Kalbfleisch JD (1986) Misspecified proportional hazard models. Biometrika 73(2):363–369 [Google Scholar]
- Su L, White H (2014) Testing conditional independence via empirical likelihood. J Econom 182(1):27–44 [Google Scholar]
- Therneau TM (2023) A package for survival analysis in R. https://CRAN.R-project.org/package=survival, R package version 3.5-5
- Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer, New York [Google Scholar]
- de Uña-Álvarez J, Keilegom IV (2021) Efron–Petrosian integrals for doubly truncated data with covariates: an asymptotic analysis. Bernoulli 27(1):249–273 [Google Scholar]
- de Uña-Álvarez J (2023) Testing for an ignorable sampling bias under random double truncation. Stat Med 42(20):3732–3744 [DOI] [PubMed] [Google Scholar]
- Vakulenko-Lagun B, Mandel M, Betensky RA (2020) Inverse probability weighting methods for Cox regression with right-truncated data. Biometrics 76(2):484–495 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vakulenko-Lagun B, Qian J, Chiou SH, Wang N, Betensky RA (2022) Nonparametric estimation of the survival distribution under covariate-induced dependent truncation. Biometrics 78(4):1390–1401 [DOI] [PubMed] [Google Scholar]
- Wang Y, Ying A, Xu R (2024) Doubly robust estimation under covariate-induced dependent left truncation. Biometrika p asae005 [DOI] [PMC free article] [PubMed]
- Xiao J, Hudgens M (2019) On nonparametric maximum likelihood estimation with double truncation. Biometrika 106(4):989–996 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou Y, Liu J, Zhu L (2020) Test for conditional independence with application to conditional screening. J Multivar Anal 175:104557 [Google Scholar]
