Author manuscript; available in PMC: 2015 May 14.
Published in final edited form as: Comput Stat Data Anal. 2013 Dec 4;73:16–26. doi: 10.1016/j.csda.2013.11.018

Eliminating bias due to censoring in Kendall’s tau estimators for quasi-independence of truncation and failure

Matthew D Austin a, Rebecca A Betensky a,*
PMCID: PMC3912250  NIHMSID: NIHMS548321  PMID: 24505164

Abstract

While the currently available estimators for the conditional Kendall’s tau measure of association between truncation and failure are valid for testing the null hypothesis of quasi-independence, they are biased when the null does not hold. This is because they converge to quantities that depend on the censoring distribution. The magnitude of the bias relative to the theoretical Kendall’s tau measure of association between truncation and failure due to censoring has not been studied, and so its importance in real problems is not known. We quantify this bias in order to assess the practical usefulness of the estimators. Furthermore, we propose inverse probability weighted versions of the conditional Kendall’s tau estimators to remove the effects of censoring and provide asymptotic results for the estimators. In simulations, we demonstrate the decrease in bias achieved by these inverse probability weighted estimators. We apply the estimators to the Channing House data set and an AIDS incubation data set.

Keywords: Inverse probability weighting, Left truncation

1. Introduction

In observational studies, subjects are often followed from an initiating event to a failure event, and the lag time between these two events, the failure time, is of primary interest. Truncated survival data arise when the failure time is observed only if it falls within a subject specific region, known as the truncation set. The lower and upper limits of the truncation set are termed the left and right truncation times, respectively. This mechanism differs from censoring in that truncated subjects from the reference population are not observed at all, whereas censored observations are sampled, though their data are incomplete, as their exact failure times are unknown. One well known example of truncated data is an AIDS incubation cohort study of HIV positive subjects with AIDS (Lagakos et al., 1988). The failure time is the lag time between HIV infection and AIDS onset, which was observed only if it was less than the lag time between HIV infection and study recruitment. Thus, the lag time between HIV infection and AIDS onset is right truncated by the lag time between HIV infection and the end of recruitment. A second, well known example of left truncated, right censored data is the Channing House data (Hyde, 1977), in which 97 male residents of the Channing House retirement home were observed until death, or end of study or departure from the community. The failure time of interest is the age at death of the subjects, with age at entry into the retirement community as the left truncation time, as subjects were sampled only if their ages at death exceeded their ages at entry into the Channing House, and age at end of study or departure from the community as the right censoring time.

Quasi-independence of truncation and failure refers to their independence in the observable region (Tsai, 1990). Quasi-independence allows the joint density of the truncation time and the failure time over the observable region to be factored into a product that is proportional to the product of the marginal densities of each variable. Under quasi-independence, the distribution of the failure time can be consistently estimated by the risk set adjusted product limit estimator of Kaplan and Meier (1958) or the self-consistency algorithm of Turnbull (1976). Unlike the requisite and unidentifiable assumption of independence of failure and censoring (Tsiatis, 1981) for application of standard survival analysis methods to censored failure time data, quasi-independence can be tested using the observed data (Tsai, 1990). A popular nonparametric statistic that is used to quantify dependence is Kendall’s tau (Kendall, 1938), although it is a measure of association and not statistical dependence. Thus, while a non-zero Kendall’s tau implies dependence, the converse is not true, and so a Kendall’s tau test is most useful when it rejects the null. Tsai (1990) and Martin and Betensky (2005) proposed modified versions of Kendall’s tau that account for various types of truncation in the presence of censoring. While the estimators for the conditional Kendall’s tau are valid for testing the null hypothesis of quasi-independence (i.e., they are constructed to have expectation zero under the null hypothesis), they are biased for the “true” Kendall’s tau measure of association between the failure time and the truncation time when the null does not hold. This is because they converge to quantities that depend on the censoring distribution.

When quasi-independence of failure and truncation does not hold, estimation of the failure distribution requires a model for its dependence on truncation. This model dictates the nature of the dependence estimator that is required. A semi-parametric model, such as a transformation model (Efron and Petrosian, 1994; Austin and Betensky, 2012), requires a nonparametric estimator of dependence, and thus a Kendall’s tau type measure is appropriate. The transformation model operates by transforming the truncation variable to a latent, unobserved truncation time that would have been observed in the absence of dependence on the failure time, and then uses it in a standard estimator for the failure distribution that assumes quasi-independence of truncation and failure. The transformation requires an estimate of a dependence parameter, and it is desirable for this estimate to be free of the censoring distribution; this was shown in simulation studies conducted by Austin and Betensky (2012). In particular, there was considerably less bias for the transformation models that were based on the censoring-adjusted Kendall’s tau as compared to the unadjusted Kendall’s tau. Thus, an estimator of the conditional Kendall’s tau that does not depend on the underlying censoring is needed; the derivation of such a measure is the topic of this paper. An alternative approach is to use a copula model for the dependence between failure and truncation (Chaieb et al., 2006). This approach requires input of the copula-based dependence parameter that does not contain information on the censoring distribution; this is a parametric measure that depends on the selection of the particular copula family. The copula dependence estimator does not depend on the marginal distributions of failure and truncation, whereas the Kendall’s tau estimators do. For the ultimate purpose of estimation of the failure distribution, however, there is no advantage to this independence, while there is a downside to the parametric assumptions required for the copula estimator.

Beaudoin et al. (2007) reviewed the conditions for consistency of several existing estimation procedures for Kendall’s tau when one variable is subject to right censoring. All of these estimators either suffer from computational complexity, or are not consistent for the true Kendall’s tau under dependence. The computational complexities arise from kernel density estimation, permutation, or complex imputation for the censored observations. Oakes (2008) presented conditions for consistency of Kendall’s tau for bivariate random variables that are potentially subject to truncation, but like Beaudoin et al. (2007), did not allow for the case in which one variable truncates the observation of the other. Thus, for our setting of quasi-independence testing, if there is censoring, these estimators are biased for the true Kendall’s tau, and should not be used.

The magnitude of the bias due to censoring in the presence of truncation has not been studied, although it was recognized by Martin and Betensky (2005), and so its importance is not known. In this paper, we examine this issue and quantify this bias in order to assess the practical usefulness of the estimators. Furthermore, we propose inverse probability weighted versions of the conditional Kendall’s tau statistics to remove the effects of censoring. This follows the approach of Uno et al. (2009), who derived an inverse probability weighted C-statistic for assessing concordance between a failure time and a continuous marker in the context of right censoring, where the same problem arises. This problem was also addressed for Kendall’s tau in the presence of bivariate censoring (Lakhal et al., 2009). Through use of inverse probability weighting, we are able to eliminate the bias due to censoring up to the upper limit of the support of the censoring variable. We consider two commonly used sampling models, and derive corrected tau estimators for each. We also provide asymptotic results for the estimators.

In Section 2 we introduce notation, both sampling models, and provide an overview of the conditional Kendall’s tau estimators. In Sections 3 and 4 we derive our proposed estimators, prove their consistency and provide associated asymptotic results. In Section 5 we report simulation results, and results from the Channing House and AIDS studies. In Section 6 we conclude.

2. Notation and Kendall’s tau

Let X denote failure time, T denote truncation time, and C denote censoring time. We observe data of the form (T, Y, δ), where δ = I(X < C) and Y = min(X,C), and for which T < Y. These data are both left truncated and right censored. For the first sampling model, we define the residual failure and censoring times as R = X − T and D = C − T. We assume that R ⊥ D | T, D ⊥ T, and P(T < C) = 1. We refer to this model as the residual censoring model, as the censoring variable is independent of the failure time on the residual (i.e., post-truncation) time scale. This model was considered by Vardi and Stockmeyer (1985), Wang (1991) and Mandel and Betensky (2008). The second sampling model assumes that (T,X) ⊥ C | T < X, T < C; we refer to this model as the independent censoring model. This model was considered by Sun and Zhu (2000), De Uña-Álvarez (2004), and Chaieb et al. (2006).

For both of these models, if quasi-independence of X and T holds, the product limit estimator or the Turnbull algorithm can be used to estimate the failure time distribution. Unfortunately, it is not possible to identify these models from the data alone. However, the context, sampling, and drop-out of a particular study may be informative regarding the censoring model. For the residual censoring model, we show that these estimators are valid under quasi-independence in the Appendix. For the independent censoring model, this is obvious from the derivations of the estimators given in Wang (1989), Wang (1991), Vardi and Stockmeyer (1985), Tsai (1990), Turnbull (1976) and Woodroofe (1985). A U-statistic test for quasi-independence was proposed by Martin and Betensky (2005), based on a conditional Kendall’s tau,

$$\tau_c = E\left[\mathrm{sgn}\big((X_1 - X_2)(T_1 - T_2)\big) \mid \Omega_{12}\right],$$

where Ω12 = {max(T1, T2) ≤ min(X1,X2)} is the event of comparability of the pair of observations. This construction is necessary so that under H0 : X ⊥ T | X > T, τc = 0. However, in the presence of right censoring, τc is not directly estimable. Instead, the parameter

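$$\tau_c^* = E\left[\mathrm{sgn}\big((X_1 - X_2)(T_1 - T_2)\big) \mid \Lambda_{12}\right] = E\left[\mathrm{sgn}\big((Y_1 - Y_2)(T_1 - T_2)\big) \mid \Lambda_{12}\right]$$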

is easily estimated, where Λ12 = {max(T1, T2) ≤ min(Y1, Y2) ∩ δ(1) = 1} and δ(1) is the failure indicator for min(Y1, Y2). Λ12 is the event of comparability and orderability of the pair of observations (Martin and Betensky, 2005). We note that a comparable pair, (T1, Y1), (T2, Y2) is orderable if min(Y1, Y2) = min(X1,X2). Martin and Betensky (2005) proposed a consistent estimator of this parameter, for the independent censoring model, using a ratio of U statistics,

$$\hat\tau_c^* = \frac{1}{M}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \mathrm{sgn}\big((Y_i - Y_j)(T_i - T_j)\big)\, I(\Lambda_{ij}) = U_c/U_M,$$

where $M = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} I(\Lambda_{ij})$ is the number of comparable and orderable pairs. The U-statistics, Uc and UM, have expected values of τc*μ and P(Λ12) = μ, respectively. Martin and Betensky (2005) also showed that under the null hypothesis, H0 : τc = 0, it follows that τc* = 0. In the Appendix, we extend this result to the residual censoring model, so that the test based on the theoretical quantity, τc*, is valid for both models. That is, for testing purposes, it is sufficient to estimate τc*, instead of τc.
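As a minimal sketch of the unweighted ratio estimator described above, the following Python function counts comparable and orderable pairs and averages the sign of (Yi − Yj)(Ti − Tj) over them. The function names `sign` and `conditional_tau_unweighted` are illustrative and are not taken from Martin and Betensky (2005); the code assumes continuous data without ties and uses a plain double loop rather than any optimized algorithm.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def conditional_tau_unweighted(t, y, delta):
    """Unweighted conditional Kendall's tau over comparable, orderable pairs.

    t, y, delta: equal-length sequences of truncation times, observed times
    Y = min(X, C), and failure indicators (1 = failure observed).
    Returns (tau_hat, M), where M is the number of pairs used.
    """
    n = len(t)
    total = 0
    m = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Comparable: max(Ti, Tj) <= min(Yi, Yj).
            if max(t[i], t[j]) > min(y[i], y[j]):
                continue
            # Orderable: the smaller observed time must be a failure.
            first = i if y[i] <= y[j] else j
            if delta[first] != 1:
                continue
            total += sign((y[i] - y[j]) * (t[i] - t[j]))
            m += 1
    tau_hat = total / m if m > 0 else float("nan")
    return tau_hat, m
```

For example, with t = [1, 2, 3], y = [4, 5, 6] and all failures observed, every pair is comparable, orderable and concordant, so the function returns a value of 1 based on M = 3 pairs.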

3. Model 1: residual censoring model

Model 1 assumes that R ⊥ D | T, D ⊥ T, and P(T < C) = 1, where R = X − T and D = C − T, and the sampling requires that T < X. We further define Q(s) = P(C − T > s). We must establish H0 : X ⊥ T | T < X, so that we may apply standard survival analysis techniques to estimate FX(·). We propose to estimate τc with a new, inverse probability weighted estimator, τ̂c1 = Uc1/UM1, where

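$$U_{c1} = \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\mathrm{sgn}\big((Y_i - Y_j)(T_i - T_j)\big)\, I(\Lambda_{ij})}{\hat Q(Y_{ij}^* - T_i)\,\hat Q(Y_{ij}^* - T_j)}, \qquad U_{M1} = \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{I(\Lambda_{ij})}{\hat Q(Y_{ij}^* - T_i)\,\hat Q(Y_{ij}^* - T_j)},$$

Y*ij = min(Yi, Yj), and Q̂(u) = P̂(C − T > u). Applying the asymptotic theory for U-statistics (Randles and Wolfe, 1979) and assuming Q(·) to be known, the asymptotic variance of Uc1 is given by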

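$$\gamma_1 = E\left(\frac{\mathrm{sgn}\big((Y_1 - Y_2)(T_1 - T_2)(Y_1 - Y_3)(T_1 - T_3)\big)\, I(\Lambda_{12})\, I(\Lambda_{13})}{Q(Y_{12}^* - T_1)\, Q(Y_{12}^* - T_2)\, Q(Y_{13}^* - T_1)\, Q(Y_{13}^* - T_3)}\right) - (\tau_c \mu)^2,$$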

where μ = P(Ω12). Upon substitution of Q by its consistent estimator, Q̂, a computationally efficient estimator for γ1 is given by Martin and Betensky (2005). We evaluate the impact of not inflating the variance estimator for insertion of the estimator Q̂ in place of Q in our simulation studies. Alternatively, the approach taken by Datta et al. (2010) for the variance of a U-statistic with inverse probability of censoring weighting might be applied to this setting with truncation. Furthermore, it follows that $n^{1/2}(U_{c1} - \tau_c\mu)$ is approximately distributed as N(0, 4γ1), and thus $n^{1/2}(\hat\tau_{c1} - \tau_c)$ is approximately $N(0, 4\gamma_1/\mu^2)$. Under the null, H0 : X ⊥ T | T < X, $n\hat\tau_{c1}^2/(4\hat\gamma_1/\hat\mu^2)$ is approximately $\chi^2_1$.
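The chi-square test stated above can be sketched as follows, assuming that the estimate of tau, the variance component estimate, and the estimate of μ have already been computed by some other routine. The function name `quasi_independence_test` and its arguments are hypothetical; the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds because a χ²₁ variable is the square of a standard normal.

```python
import math


def quasi_independence_test(tau_hat, gamma_hat, mu_hat, n):
    """Wald-type test of tau_c = 0 using n * tau^2 / (4 * gamma / mu^2).

    tau_hat, gamma_hat, mu_hat are assumed to be supplied by the caller.
    Returns (statistic, p_value), with the statistic referred to chi-square(1).
    """
    variance_term = 4.0 * gamma_hat / mu_hat ** 2
    statistic = n * tau_hat ** 2 / variance_term
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

With the purely illustrative inputs tau_hat = 0.1, gamma_hat = 0.25, mu_hat = 0.5 and n = 400, the statistic equals 1 and the p-value is about 0.317, matching the two-sided normal tail probability beyond one standard deviation.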

We now establish the consistency of τ̂c1. Recall that we can decompose τc into a sum of its constituent parts as follows:

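$$\begin{aligned}
\tau_c &= E\left[\mathrm{sgn}\big((X_1 - X_2)(T_1 - T_2)\big) \mid \Omega_{12}\right] \\
&= P(X_1 < X_2, T_1 < T_2 \mid \Omega_{12}) - P(X_2 < X_1, T_1 < T_2 \mid \Omega_{12}) \\
&\quad - P(X_1 < X_2, T_2 < T_1 \mid \Omega_{12}) + P(X_2 < X_1, T_2 < T_1 \mid \Omega_{12}).
\end{aligned} \tag{1}$$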

We proceed by deriving a consistent estimator for the first term in (1), from which it is easily seen how the remaining three terms are estimated and combined to form τ̂c1 = Uc1/UM1.

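$$\begin{aligned}
P(T_1 < T_2, X_1 < X_2 \mid \Omega_{12}) &= P(T_1 < T_2, X_1 < X_2 \mid \max(T) < \min(X)) \\
&= \frac{P(T_1 < T_2, X_1 < X_2, \max(T) < \min(X))}{P(\max(T) < \min(X))} \\
&= \frac{P(T_1 < T_2, X_1 < X_2, T_2 < X_1)}{P(T_1 < X_1, T_2 < X_1, X_1 < X_2) + P(T_1 < X_2, T_2 < X_2, X_2 < X_1)}.
\end{aligned} \tag{2}$$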

Assuming for now that Q is known, we estimate the numerator of (2) with

$$\binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\delta_i\, I(T_i < T_j, Y_i < Y_j, T_j < Y_i)}{Q(Y_i - T_i)\, Q(Y_i - T_j)} \tag{3}$$

and the denominator with

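$$\binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\delta_i\, I(T_i < Y_i, Y_i < Y_j, T_j < Y_i)}{Q(Y_i - T_i)\, Q(Y_i - T_j)} + \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\delta_j\, I(T_j < Y_j, Y_j < Y_i, T_i < Y_j)}{Q(Y_j - T_j)\, Q(Y_j - T_i)}.$$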

We note that the denominator is the same for all four terms in (1) and is estimated by UM1. The consistency of (3) for the numerator of (2) follows from noting that

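$$\begin{aligned}
&E\left(\frac{\delta_1\, I(T_1 < T_2)\, I(Y_1 < Y_2)\, I(T_2 < Y_1)}{Q(Y_1 - T_1)\, Q(Y_1 - T_2)}\right) \\
&\quad = E\left(E\left[\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2, X_1 < C_1, X_1 < C_2)}{Q(X_1 - T_1)\, Q(X_1 - T_2)} \,\Big|\, T, X - T\right]\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{Q(X_1 - T_1)\, Q(X_1 - T_2)}\, E\left[I(X_1 < C_1, X_1 < C_2) \mid T, X - T\right]\right).
\end{aligned}$$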

After applying the independence assumptions of the model, this final expectation is equal to

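$$\begin{aligned}
&E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{Q(X_1 - T_1)\, Q(X_1 - T_2)}\, E\left[I(X_1 - T_1 < C_1 - T_1,\ X_1 - T_2 < C_2 - T_2) \mid T, X - T\right]\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{Q(X_1 - T_1)\, Q(X_1 - T_2)}\, P(C_1 - T_1 > X_1 - T_1 \mid T, X - T)\, P(C_2 - T_2 > X_1 - T_2 \mid T, X - T)\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{Q(X_1 - T_1)\, Q(X_1 - T_2)}\, P(C_1 - T_1 > X_1 - T_1)\, P(C_2 - T_2 > X_1 - T_2)\right),
\end{aligned}$$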

which is equal to P(T1 < T2, T2 < X1, X1 < X2), as desired. The consistency of the denominator of (2) follows analogously. This argument clearly illustrates that without the inverse weighting by Q, the distribution of C − T, i.e., as in τ̂c*, the estimator converges to a function of Q, and thus is inconsistent for τc. In practice, we do not know Q, but due to the model’s assumed independence of C − T and X − T, we are able to estimate it using a Kaplan-Meier estimator based on (yi − ti, 1 − δi), i.e., the residual censoring times, with reversal of the roles of censoring and failure.
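The construction above can be sketched in Python under stated assumptions: Q is estimated by a Kaplan-Meier step function applied to the residual times yi − ti with event indicator 1 − δi, and each comparable, orderable pair is then weighted by the inverse of the product of the two estimated Q values. The names `km_survival` and `tau_c1` are illustrative. Two choices here are ours and not the paper's: the estimate of Q is treated as right-continuous, and pairs whose estimated weight is zero are skipped. Ties are assumed absent.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def km_survival(times, events):
    """Kaplan-Meier estimate of P(time > s), returned as a step function."""
    order = sorted(range(len(times)), key=lambda k: times[k])
    at_risk = len(times)
    steps = []  # (event time, survival probability just after it)
    surv = 1.0
    k = 0
    while k < len(order):
        u = times[order[k]]
        d = 0  # events at time u
        c = 0  # observations leaving the risk set at time u
        while k < len(order) and times[order[k]] == u:
            d += events[order[k]]
            c += 1
            k += 1
        if d > 0:
            surv *= 1.0 - d / at_risk
            steps.append((u, surv))
        at_risk -= c

    def surv_fn(s):
        value = 1.0
        for u, sv in steps:
            if u > s:
                break
            value = sv
        return value

    return surv_fn


def tau_c1(t, y, delta):
    """Inverse probability weighted conditional tau (residual censoring model)."""
    n = len(t)
    # Roles reversed: residual time y - t, "event" means censored (1 - delta).
    q_hat = km_survival([y[k] - t[k] for k in range(n)],
                        [1 - delta[k] for k in range(n)])
    num = 0.0
    den = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if max(t[i], t[j]) > min(y[i], y[j]):
                continue  # not comparable
            first = i if y[i] <= y[j] else j
            if delta[first] != 1:
                continue  # not orderable
            y_star = y[first]
            w = q_hat(y_star - t[i]) * q_hat(y_star - t[j])
            if w <= 0.0:
                continue  # assumption: skip pairs with zero estimated weight
            num += sign((y[i] - y[j]) * (t[i] - t[j])) / w
            den += 1.0 / w
    return num / den if den > 0 else float("nan")
```

Because the normalizing constant is common to numerator and denominator, it cancels in the ratio and is omitted. When no observation is censored, the estimated Q is identically 1 and the function reduces to the unweighted ratio over comparable pairs.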

It is important to note that if the support of the random variable X − T contains the support of C − T, then we cannot estimate τc, but rather are limited to

$$E\left[\mathrm{sgn}\big((T_1 - T_2)(X_1 - X_2)\big) \mid \Omega'_{12}\right],$$

where Ω′12 = I{max(T1, T2) ≤ min(X1,X2) ∩ max(X − T) < K} and K is the upper limit of the support of C − T. Like τc*, this quantity mixes information about the dependence of failure and truncation with the censoring distribution, and thus is not of clear interest. However, in this case, inverse weighting will not remove the censoring effects.

4. Model 2: independent censoring model

Model 2 assumes that (T,X) ⊥ C | T < X, T < C and that sampling requires T < min(X,C). We further define SC(u) = P(C > u). We must establish H0 : X ⊥ T | T < X, so that we may apply standard survival analysis techniques to estimate FX(·). Under this model, we propose to estimate τc with a second, new, inverse probability weighted estimator, τ̂c2 = Uc2/UM2, where

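$$U_{c2} = \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\mathrm{sgn}\big((Y_i - Y_j)(T_i - T_j)\big)\, I(\Lambda_{ij})}{\hat S_C(Y_{ij}^*)^2 / [\hat S_C(T_j)\, \hat S_C(T_i)]}, \qquad U_{M2} = \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{I(\Lambda_{ij})}{\hat S_C(Y_{ij}^*)^2 / [\hat S_C(T_j)\, \hat S_C(T_i)]},$$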

Y*ij = min(Yi, Yj), and ŜC(u) = P̂(C > u). As for Model 1, upon substitution of SC by its consistent estimator, ŜC, a computationally efficient estimator for the asymptotic variance, γ2, is given by Martin and Betensky (2005), and we evaluate the impact of not inflating the variance estimator for insertion of the estimator ŜC in place of SC in our simulation studies. Applying standard U-statistic theory, with SC assumed known, the asymptotic variance of Uc2 is given by

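$$\gamma_2 = E\left(\frac{\mathrm{sgn}\big((Y_1 - Y_2)(T_1 - T_2)(Y_1 - Y_3)(T_1 - T_3)\big)\, I(\Lambda_{12})\, I(\Lambda_{13})}{S_C(Y_{12}^*)^2 / \big(S_C(T_1) S_C(T_2)\big) \times S_C(Y_{13}^*)^2 / \big(S_C(T_1) S_C(T_3)\big)}\right) - (\tau_c \mu)^2,$$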

where μ = P(Ω12). A consistent estimator of γ2 can be derived as in Martin and Betensky (2005). Under quasi-independence, H0 : X ⊥ T | T < X, it follows that $n\hat\tau_{c2}^2/(4\hat\gamma_2/\hat\mu^2)$ is approximately $\chi^2_1$.

As for Model 1, we establish consistency of our estimator for the numerator of the first term in τc (1), which is given in (2). This follows from expressing the expectation of each term of the estimator of the numerator of (2) as an iterated expectation, and then successively applying the independence assumptions of the model. Again, assuming for now that SC is known,

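$$\begin{aligned}
&E\left(\frac{\delta_1\, I(T_1 < T_2)\, I(Y_1 < Y_2)\, I(T_2 < Y_1)}{S_C(Y_1)^2 / [S_C(T_1) S_C(T_2)]}\right) \\
&\quad = E\left(E\left[\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2, X_1 < C_1, X_1 < C_2)}{S_C(X_1)^2 / (S_C(T_1) S_C(T_2))} \,\Big|\, T, X, C > T\right]\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{S_C(X_1)^2 / [S_C(T_1) S_C(T_2)]}\, E\left[I(X_1 < C_1, X_1 < C_2) \mid T, X, C > T\right]\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{S_C(X_1)^2 / [S_C(T_1) S_C(T_2)]}\, P(X_1 < C_1 \mid C_1 > T_1, T_1, X_1)\, P(X_1 < C_2 \mid C_2 > T_2, T_2, X_2)\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{S_C(X_1)^2 / [S_C(T_1) S_C(T_2)]} \cdot \frac{S_C(X_1)^2}{S_C(T_1) S_C(T_2)}\right) \\
&\quad = P(T_1 < T_2, T_2 < X_1, X_1 < X_2).
\end{aligned}$$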

The consistency of the estimator for the denominator of (2) follows similarly, as does consistency for estimators of the remaining terms in τc (1). As for Model 1, the use of inverse probability weighting to remove the effects of censoring is effective only if the support of the failure time variable is contained within that of the censoring time variable. Otherwise, the estimator for τc remains as a function of the distribution of C.
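A minimal Python sketch of the Model 2 estimator follows. Because the estimation of SC is not detailed here, the sketch takes the censoring survival function as a caller-supplied callable `surv_c`; this argument, and the function names, are assumptions made for illustration. Each comparable, orderable pair is divided by SC(Y*ij)² / [SC(Ti) SC(Tj)], as in the definitions of Uc2 and UM2.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_c2(t, y, delta, surv_c):
    """Inverse probability weighted conditional tau (independent censoring model).

    surv_c: callable u -> estimate of P(C > u); supplied by the caller.
    """
    n = len(t)
    num = 0.0
    den = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if max(t[i], t[j]) > min(y[i], y[j]):
                continue  # not comparable
            first = i if y[i] <= y[j] else j
            if delta[first] != 1:
                continue  # not orderable
            y_star = y[first]
            # Probability that both subjects remain uncensored at y_star,
            # given that each was uncensored at its own truncation time.
            w = surv_c(y_star) ** 2 / (surv_c(t[i]) * surv_c(t[j]))
            if w <= 0.0:
                continue  # assumption: skip pairs with zero weight
            num += sign((y[i] - y[j]) * (t[i] - t[j])) / w
            den += 1.0 / w
    return num / den if den > 0 else float("nan")


# Illustrative call with a hypothetical exponential censoring survival function.
example = tau_c2([0, 1, 2], [5, 4, 6], [1, 1, 0], lambda u: math.exp(-u))
```

Setting `surv_c` to the constant function 1 recovers the unweighted ratio over comparable, orderable pairs, which gives a quick check that the weighting is the only difference between the two estimators.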

We also consider a second inverse weighted estimator for τc under Model 2; this estimator includes only subjects who are uncensored. It is likely that this estimator will be less efficient than our first proposal given that it is using fewer observations. We were motivated to consider this estimator in our extension of the transformation model of Efron and Petrosian (1994) to handle dependent truncation and failure in the presence of censoring (Austin and Betensky, 2012). In that setting, due to the nature of the transformation model for dependency, it is necessary to use the uncensored observations only in certain steps of the estimation process. Thus, while this tau estimator does not make full use of the available data, it enables fuller use of the data (i.e., less modeling) for estimation of the failure distribution (Austin and Betensky, 2012). This new estimator for τc is given by τ̂c3 = Uc3/UM3, where

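$$U_{c3} = \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\mathrm{sgn}\big((Y_i - Y_j)(T_i - T_j)\big)\, I(\eta_{ij})}{\big(\hat S_C(Y_i)\, \hat S_C(Y_j)\big) / \big(\hat S_C(T_j)\, \hat S_C(T_i)\big)}, \qquad U_{M3} = \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{I(\eta_{ij})}{\big(\hat S_C(Y_i)\, \hat S_C(Y_j)\big) / \big(\hat S_C(T_j)\, \hat S_C(T_i)\big)},$$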

and η12 = I{max(T1, T2) ≤ min(Y1, Y2) ∩ δ1δ2 = 1}. The asymptotic variance of Uc3, with SC assumed known, is given by

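$$\gamma_3 = E\left(\frac{\mathrm{sgn}\big((Y_1 - Y_2)(T_1 - T_2)(Y_1 - Y_3)(T_1 - T_3)\big)\, I(\eta_{12})\, I(\eta_{13})}{\big(S_C(Y_1) S_C(Y_2)\big) / \big(S_C(T_1) S_C(T_2)\big) \times \big(S_C(Y_1) S_C(Y_3)\big) / \big(S_C(T_1) S_C(T_3)\big)}\right) - (\tau_c \mu)^2,$$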

where μ = P(Ω12). Under H0 : X ⊥ T | X > T, $n\hat\tau_{c3}^2/(4\hat\gamma_3/\hat\mu^2)$ is approximately $\chi^2_1$.

As for τ̂c2, the consistency of τ̂c3 follows from the law of large numbers and the unbiasedness of the components of the U-statistic, as seen in the following steps for one component of the numerator of the estimator:

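$$\begin{aligned}
&E\left(\frac{\delta_1 \delta_2\, I(T_1 < T_2)\, I(Y_1 < Y_2)\, I(T_2 < Y_1)}{\big(S_C(Y_1) S_C(Y_2)\big) / \big(S_C(T_1) S_C(T_2)\big)}\right) \\
&\quad = E\left(E\left[\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2, X_1 < C_1, X_2 < C_2)}{[S_C(X_1) S_C(X_2)] / [S_C(T_1) S_C(T_2)]} \,\Big|\, T, X, C > T\right]\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{\big(S_C(X_1) S_C(X_2)\big) / [S_C(T_1) S_C(T_2)]}\, E\left[I(X_1 < C_1, X_2 < C_2) \mid T, X, C > T\right]\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{\big(S_C(X_1) S_C(X_2)\big) / [S_C(T_1) S_C(T_2)]}\, P(X_1 < C_1 \mid C_1 > T_1, T_1, X_1)\, P(X_2 < C_2 \mid C_2 > T_2, T_2, X_2)\right) \\
&\quad = E\left(\frac{I(T_1 < T_2, T_2 < X_1, X_1 < X_2)}{S_C(X_1) S_C(X_2) / [S_C(T_1) S_C(T_2)]} \cdot \frac{S_C(X_1) S_C(X_2)}{S_C(T_1) S_C(T_2)}\right) \\
&\quad = P(T_1 < T_2, T_2 < X_1, X_1 < X_2).
\end{aligned}$$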
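The estimator restricted to uncensored subjects can be sketched in the same way. As before, the censoring survival function is passed in as a hypothetical callable `surv_c`, and the function names are illustrative. A pair contributes only if it is comparable and both members are observed failures, and it is divided by SC(Yi) SC(Yj) / [SC(Ti) SC(Tj)].

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_c3(t, y, delta, surv_c):
    """Inverse probability weighted conditional tau using uncensored pairs only.

    surv_c: callable u -> estimate of P(C > u); supplied by the caller.
    """
    n = len(t)
    num = 0.0
    den = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if delta[i] != 1 or delta[j] != 1:
                continue  # both members must be observed failures
            if max(t[i], t[j]) > min(y[i], y[j]):
                continue  # not comparable
            # Probability that both failures are observed, given that each
            # subject was uncensored at its own truncation time.
            w = (surv_c(y[i]) * surv_c(y[j])) / (surv_c(t[i]) * surv_c(t[j]))
            if w <= 0.0:
                continue  # assumption: skip pairs with zero weight
            num += sign((y[i] - y[j]) * (t[i] - t[j])) / w
            den += 1.0 / w
    return num / den if den > 0 else float("nan")


# Illustrative call with a hypothetical exponential censoring survival function.
example = tau_c3([0, 1, 2], [5, 4, 6], [1, 1, 1], lambda u: math.exp(-u))
```

Compared with the estimator that uses all comparable and orderable pairs, this version discards every pair that contains a censored subject, which is the source of the efficiency loss noted in the text.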

Just as τ̂c1 and τ̂c2 have their associated inconsistent unweighted version (τ̂c*), τ̂c3 improves upon τ̂c, where

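$$\hat\tau_c = \frac{1}{M}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \mathrm{sgn}\big((Y_i - Y_j)(T_i - T_j)\big)\, \delta_i \delta_j$$

and $M = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \delta_i \delta_j$. This estimator, like τ̂c*, is inconsistent away from the null in the presence of censoring; we present it for comparison to τ̂c3.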

5. Simulations and applications

In simulation studies, we explore the nature of the inconsistency of the standard Kendall’s tau estimator for left truncated and right censored data, τ̂c*, and confirm the consistency of our new inverse probability weighted estimators. We also assess the consistency of our new estimator for the AIDS incubation study, for which there was no censoring, but to which we artificially impose independent censoring. Finally, we compare tests based on τ̂c* and τ̂c1 and τ̂c2 for the Channing House data.

5.1. Simulations: residual censoring model

For investigation of the residual censoring model, we generated T and X as bivariate normal, with means 0 and 1, respectively, and variance 1, and retained those observations that satisfied T < X. We varied the correlation parameter, ρ, so that we could investigate how the estimator behaves under different degrees of dependence between T and X. We generated D as exponential with rate λ, and set C = T + D. This ensured that P(T < C) = 1 and D ⊥ T. Also, by construction, R = X − T ⊥ D | T. We varied the rate parameter, λ, to investigate the effects of heavy (70%) versus light (20%) censoring. In each simulation, we generated 1000 (5000 for the variance estimation) data sets with 500 observations each (after truncation). For each of the 1000 data sets, we calculated τ̂c1 and τ̂c*, along with their analytical variances. The variance for τ̂c* was given by Martin and Betensky (2005). In these simulations, the support of C − T contains the support of X − T so that there are no residual effects of censoring after inverse probability weighting.
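The data-generating mechanism described above can be sketched as follows, using only the Python standard library. The function name, the argument names and the seed handling are illustrative; the values of ρ and λ that produce 70% or 20% censoring are not given here, so any values passed in are the user's choice.

```python
import math
import random


def simulate_residual_censoring(n, rho, lam, seed=0):
    """Draw n left-truncated, right-censored observations under Model 1.

    (T, X) is bivariate normal with means (0, 1), unit variances and
    correlation rho; only draws with T < X are kept. D ~ Exponential(lam)
    is independent of (T, X) and C = T + D.
    Returns lists (t, y, delta) with y = min(x, c) and delta = I(x < c).
    """
    rng = random.Random(seed)
    t, y, delta = [], [], []
    while len(t) < n:
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        ti = z1
        xi = 1.0 + rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        if not ti < xi:
            continue  # left truncation: the subject is never observed
        ci = ti + rng.expovariate(lam)  # residual censoring time added to T
        t.append(ti)
        y.append(min(xi, ci))
        delta.append(1 if xi < ci else 0)
    return t, y, delta
```

Rejection sampling is used for the truncation step, so the requested sample size refers to the number of observations after truncation, in line with the description of 500 observations per data set.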

Table 1 displays the true value of τc, estimated from the simulated data prior to censoring, the averages across the data sets of the estimators, τ̂c1 and τ̂c*, and their analytical variances, along with their empirical variances and estimated power for the test of quasi-independence. It is apparent that under this model, the standard estimator, τ̂c*, displays substantial bias toward the null. This bias appears to increase with distance from the null. Our new estimator, τ̂c1, exhibits much smaller bias. As expected, there is more bias under heavy censoring than under light censoring. The analytic variances of our new estimator are quite close to their empirical variances, with small positive bias, confirming the quality and conservativeness of our variance estimators. Under heavy censoring, the variances of our new estimator are roughly twice those of τ̂c*; this is likely due to the additional variability in our estimator due to the inverse weighting by Q̂(·). This translates into slightly higher power for the test of the null of quasi-independence that is based on τ̂c*.

Table 1.

Consistency results for τ̂c1 for censoring probabilities of 70% and 20% for N = 500. Each average was taken over 1000 repetitions (5000 for variances for 70% censoring). The column τ̂c1 represents the average value of τ̂c1 and the column τ̂c* represents the average value of τ̂c*. The column σ̂2(τ̂c1) represents the average analytical variance of τ̂c1, just as σ̂2(τ̂c*) represents the average analytical variance of τ̂c*. P = Power; EV = empirical variance

Censoring τc τ̂c1 τ̂c* σ̂2(τ̂c1) EV(τ̂c1) σ̂2(τ̂c*) EV(τ̂c*) P(τ̂c1) P(τ̂c*)
0.70 −0.79 −0.791 −0.744 0.0073 0.0073 0.0049 0.0049 1.000 1.000
−0.38 −0.381 −0.345 0.0072 0.0070 0.0047 0.0044 1.000 1.000
−0.33 −0.318 −0.257 0.0072 0.0070 0.0029 0.0026 0.99 0.997
−0.22 −0.218 −0.174 0.0071 0.0069 0.0028 0.0025 0.816 0.907
−0.12 −0.116 −0.091 0.0070 0.0067 0.0025 0.0025 0.291 0.395
−0.018 −0.018 −0.014 0.0055 0.0050 0.0025 0.0025 0.061 0.07
0.000 −0.002 −0.005 0.0055 0.0048 0.0023 0.0024 0.049 0.051
0.019 0.018 0.015 0.0054 0.0047 0.0024 0.0024 0.072 0.066
0.10 0.103 0.082 0.0053 0.0053 0.0024 0.0023 0.37 0.378
0.17 0.172 0.142 0.0054 0.0045 0.0023 0.0024 0.719 0.801
0.24 0.237 0.205 0.0055 0.0049 0.0025 0.0025 0.922 0.982
0.44 0.436 0.403 0.0056 0.0051 0.0029 0.0027 1.000 1.000
0.66 0.678 0.642 0.0064 0.0060 0.0034 0.0033 1.000 1.000
0.20 −0.79 −0.790 −0.777 0.0019 0.0018 0.0016 0.0015 1.000 1.000
−0.38 −0.379 −0.365 0.0015 0.0016 0.0012 0.0011 1.000 1.000
−0.33 −0.319 −0.313 0.0010 0.0009 0.0010 0.0010 1.000 1.000
−0.22 −0.222 −0.217 0.0011 0.0010 0.0011 0.0010 1.000 1.000
−0.12 −0.118 −0.116 0.0010 0.0010 0.0011 0.0010 0.963 0.95
−0.018 −0.018 −0.018 0.0010 0.0010 0.0010 0.0010 0.088 0.091
0.000 0.001 −0.002 0.0011 0.0010 0.0010 0.0011 0.051 0.050
0.019 0.019 0.017 0.0010 0.0011 0.0010 0.0011 0.101 0.096
0.10 0.103 0.100 0.0009 0.0009 0.0010 0.0010 0.92 0.897
0.17 0.173 0.170 0.0009 0.0008 0.0009 0.0009 1.000 1.000
0.24 0.238 0.235 0.0009 0.0007 0.0009 0.0008 1.000 1.000
0.44 0.441 0.403 0.0008 0.0007 0.0007 0.0006 1.000 1.000

Under this normal copula model, an alternative measure of dependence between T and X is given by τ = 2 arcsin (ρ) /π. As explained in Section 1, τ and τc are not comparable measures of dependence, and the copula based τ is not the target of our estimation. Nonetheless, it is interesting to note that τ and τc are close when the copula model is correctly specified; they differ by at most 0.10.
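As a small worked example of the relation τ = 2 arcsin(ρ)/π for the normal copula, the following Python function evaluates it directly; the function name is illustrative. For ρ = 0.5, arcsin(0.5) = π/6, so τ = 1/3.

```python
import math


def normal_copula_tau(rho):
    """Kendall's tau implied by a bivariate normal correlation rho."""
    return 2.0 * math.asin(rho) / math.pi


tau_half = normal_copula_tau(0.5)  # arcsin(0.5) = pi/6, so this is 1/3
```

This is the unconditional, copula-based tau, which, as the text notes, is not the target of the estimators studied here.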

5.2. Simulations: independent censoring model

For investigation of the independent censoring model, we generated T and X as for the residual censoring model. We generated C ~ N(μc, 1), independently of T and X, and retained only those observations that satisfied T < min(X,C), thereby ensuring that (T,X) ⊥ C | T < C, T < X. We varied the mean parameter, μc, to again create censoring probabilities of 70% and 20%. In each simulation, we generated 1000 data sets (5000 for variance estimation) with 500 observations each (after truncation). For each of the 1000 data sets, we calculated τ̂c2, τ̂c3, and τ̂c*, along with their analytical variances. In these simulations, the support of C contains the support of X so that there are no residual effects of censoring after inverse probability weighting.
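The sampling scheme for the independent censoring model can be sketched in the same style. The function and argument names are illustrative, and the values of μc that yield 70% or 20% censoring are not stated here, so the value passed for `mu_c` is the user's choice.

```python
import math
import random


def simulate_independent_censoring(n, rho, mu_c, seed=0):
    """Draw n left-truncated, right-censored observations under Model 2.

    (T, X) is bivariate normal with means (0, 1), unit variances and
    correlation rho; C ~ N(mu_c, 1) is drawn independently. Only draws
    with T < min(X, C) are kept.
    Returns lists (t, y, delta) with y = min(x, c) and delta = I(x < c).
    """
    rng = random.Random(seed)
    t, y, delta = [], [], []
    while len(t) < n:
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        ti = z1
        xi = 1.0 + rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        ci = rng.gauss(mu_c, 1.0)
        if not (ti < xi and ti < ci):
            continue  # truncated: requires T < min(X, C)
        t.append(ti)
        y.append(min(xi, ci))
        delta.append(1 if xi < ci else 0)
    return t, y, delta
```

The only difference from the residual censoring sketch is that C is drawn on the original time scale and takes part in the truncation condition, rather than being constructed as T plus an independent residual.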

Table 2 displays the true value of τc, estimated from the simulated data, prior to censoring, the averages across the data sets of the estimators, τ̂c2 and τ̂c*, and their analytical variances, along with their empirical variances and estimated power for the test of quasi-independence. It is apparent that under this model, in contrast to the residual censoring model, there is substantial bias away from the null in the standard estimator, τ̂c*. This bias appears to increase with distance from the null. Our new estimator exhibits much smaller bias. As expected, there is more bias under heavy censoring than under light censoring. The analytic variances are quite close to the empirical variances, with small positive bias in some cases, confirming the quality and conservatism of our variance estimators. In contrast to the results of the simulation for the residual censoring model, in this simulation, the variances of the two estimators are comparable in magnitude, even though the variance estimator for τ̂c2 involves ŜC(·). There remains slightly higher power for the test of the null of quasi-independence that is based on τ̂c* than the test based on τ̂c2; under this model this is due to the positive bias of τ̂c* away from the null, rather than a smaller variance.

Table 2.

Consistency results for τ̂c2 for censoring probabilities of 70% and 20% for N = 500. Each average was taken over 1000 repetitions (5000 for variances for 70% censoring). The column τ̂c2 represents the average value of τ̂c2 and the column τ̂c* represents the average value of τ̂c*. The column σ̂2(τ̂c2) represents the average analytical variance of τ̂c2, just as σ̂2(τ̂c*) represents the average analytical variance of τ̂c*. P = Power; EV = empirical variance

Censoring τc τ̂c2 τ̂c* σ̂2(τ̂c2) EV(τ̂c2) σ̂2(τ̂c*) EV(τ̂c*) P(τ̂c2) P(τ̂c*)
0.70 −0.79 −0.788 −0.802 0.0049 0.0047 0.0047 0.0043 1.000 1.000
−0.38 −0.377 −0.412 0.0048 0.0045 0.0047 0.0044 1.000 1.000
−0.33 −0.326 −0.357 0.0029 0.0029 0.0031 0.0029 1.000 1.000
−0.22 −0.228 −0.250 0.0028 0.0026 0.0030 0.0030 0.995 0.999
−0.12 −0.126 −0.136 0.0024 0.0022 0.0025 0.0024 0.715 0.775
−0.018 −0.019 −0.020 0.0026 0.0023 0.0025 0.0024 0.077 0.066
0.00 −0.001 −0.002 0.0026 0.0020 0.0025 0.0024 0.050 0.049
0.019 0.018 0.019 0.0023 0.0021 0.0024 0.0023 0.089 0.071
0.10 0.104 0.114 0.0022 0.0021 0.0023 0.0022 0.610 0.646
0.17 0.183 0.200 0.0022 0.0019 0.0023 0.0021 0.936 0.991
0.25 0.251 0.272 0.0025 0.0023 0.0024 0.0021 0.992 1.000
0.44 0.434 0.463 0.0036 0.0034 0.0037 0.0036 1.000 1.000
0.66 0.658 0.671 0.0042 0.0041 0.0038 0.0040 1.000 1.000
0.20 −0.79 −0.791 −0.801 0.0006 0.0007 0.0008 0.0007 1.000 1.000
−0.38 −0.379 −0.393 0.0008 0.0007 0.0009 0.0009 1.000 1.000
−0.33 −0.321 −0.339 0.0008 0.0007 0.0009 0.0009 1.000 1.000
−0.22 −0.222 −0.233 0.0009 0.0009 0.0010 0.0010 1.000 1.000
−0.12 −0.121 −0.126 0.0009 0.0009 0.0010 0.0010 0.963 0.95
−0.018 −0.017 −0.019 0.0009 0.0009 0.0010 0.0010 0.088 0.091
0.00 −0.002 −0.003 0.0008 0.0008 0.0010 0.0011 0.050 0.052
0.019 0.019 0.020 0.0009 0.0009 0.0010 0.0010 0.101 0.096
0.10 0.103 0.105 0.0008 0.0009 0.0009 0.0010 0.92 0.897
0.17 0.173 0.180 0.0008 0.0007 0.0008 0.0008 1.000 1.000
0.25 0.238 0.246 0.0008 0.0007 0.0008 0.0008 1.000 1.000
0.44 0.442 0.451 0.0007 0.0006 0.0006 0.0005 1.000 1.000

Tables 3 and 4 display the true value of τc, estimated from the simulated data, prior to censoring, the averages across the data sets of the estimators, τ̂c3 and τ̂c, and their analytical variances, along with their empirical variances and estimated power for the test of quasi-independence. Table 3 contains results for sample sizes of 500, and Table 4 contains results for sample sizes of 1000. In Table 3, it is apparent that τ̂c3 is less biased than τ̂c, especially under heavy censoring. This behavior is likely due to the loss of information due to the restriction to the uncensored observations, and is confirmed in Table 4, which reflects diminished biases for both estimators with sample sizes of 1000.

Table 3.

Consistency results for τ̂c3 for censoring probabilities of 70% and 20% for N = 500. Each average was taken over 1000 repetitions (5000 for variances for 70% censoring). The column τ̂c3 represents the average value of τ̂c3 and the column τ̂c represents the average value of τ̂c. The column σ̂2(τ̂c3) represents the average analytical variance of τ̂c3, just as σ̂2(τ̂c) represents the average analytical variance of τ̂c. P = Power; EV = empirical variance

Censoring τc τ̂c3 τ̂c σ̂2(τ̂c3) EV(τ̂c3) σ̂2(τ̂c) EV(τ̂c) P(τ̂c3) P(τ̂c)
0.70 −0.32 −0.282 −0.198 0.0112 0.0125 0.0035 0.0028 0.892 0.953
−0.22 −0.188 −0.132 0.0099 0.0128 0.0029 0.0027 0.736 0.699
−0.12 −0.089 −0.060 0.0083 0.0093 0.0027 0.0029 0.463 0.282
−0.018 −0.012 −0.009 0.0066 0.0078 0.0027 0.0029 0.074 0.058
0.000 0.001 −0.004 0.0065 0.0073 0.0026 0.0025 0.047 0.051
0.019 0.021 0.009 0.0069 0.0073 0.0025 0.0027 0.077 0.067
0.10 0.112 0.067 0.0084 0.0099 0.0023 0.0025 0.601 0.302
0.17 0.167 0.115 0.0110 0.0134 0.0023 0.0025 0.798 0.687
0.25 0.240 0.169 0.0124 0.0133 0.0025 0.0023 0.995 0.934
0.20 −0.33 −0.315 −0.271 0.0010 0.0011 0.0010 0.0009 0.994 1.000
−0.22 −0.218 −0.183 0.0018 0.0016 0.0011 0.0010 0.987 1.000
−0.12 −0.119 −0.096 0.0020 0.0018 0.0011 0.0011 0.958 0.832
−0.018 −0.018 −0.016 0.0021 0.0018 0.0010 0.0010 0.088 0.065
0.000 0.001 −0.002 0.0023 0.0020 0.0010 0.0011 0.049 0.049
0.019 0.0178 0.013 0.0023 0.0019 0.0010 0.0010 0.092 0.068
0.10 0.105 0.088 0.0012 0.0013 0.0009 0.0009 0.888 0.812
0.17 0.172 0.151 0.0013 0.0016 0.0009 0.0008 0.992 0.998
0.24 0.238 0.213 0.0011 0.0012 0.0009 0.0008 1.000 1.000

Table 4.

Consistency results for τ̂c3 for censoring probabilities of 70% and 20% for N = 1000. Each average was taken over 1000 repetitions (5000 for variances for 70% censoring). The column τ̂c3 represents the average value of τ̂c3 and the column τ̂c* represents the average value of τ̂c*. The column σ̂²(τ̂c3) represents the average analytical variance of τ̂c3, just as σ̂²(τ̂c*) represents the average analytical variance of τ̂c*. P = Power; EV = empirical variance

Censoring τc τ̂c3 τ̂c* σ̂²(τ̂c3) EV(τ̂c3) σ̂²(τ̂c*) EV(τ̂c*) P(τ̂c3) P(τ̂c*)
0.70 −0.32 −0.312 −0.245 0.0101 0.0105 0.0027 0.0020 0.923 1.000
0.70 −0.22 −0.205 −0.187 0.0089 0.0101 0.0022 0.0020 0.875 0.712
0.70 −0.12 −0.119 −0.072 0.0077 0.0088 0.0019 0.0021 0.513 0.358
0.70 −0.018 −0.017 −0.010 0.0056 0.0059 0.0021 0.0021 0.068 0.071
0.70 0.000 0.001 −0.001 0.0053 0.0055 0.0020 0.0021 0.050 0.050
0.70 0.019 0.020 0.010 0.0065 0.0064 0.0020 0.0020 0.089 0.078
0.70 0.10 0.109 0.077 0.0081 0.0091 0.0019 0.0021 0.623 0.355
0.70 0.17 0.169 0.123 0.010 0.0116 0.0019 0.0020 0.888 0.772
0.70 0.25 0.244 0.182 0.0118 0.0122 0.0020 0.0018 1.0000 1.000
0.20 −0.33 −0.328 −0.335 0.0010 0.0011 0.0008 0.0007 0.994 1.000
0.20 −0.22 −0.218 −0.231 0.0017 0.0016 0.0009 0.0009 0.987 1.000
0.20 −0.12 −0.119 −0.124 0.0019 0.0018 0.0009 0.0008 0.958 0.920
0.20 −0.018 −0.018 −0.019 0.0029 0.0018 0.0007 0.0008 0.088 0.091
0.20 0.000 0.000 0.000 0.0024 0.0021 0.0008 0.0008 0.050 0.049
0.20 0.019 0.0185 0.021 0.0020 0.0019 0.0007 0.0008 0.092 0.091
0.20 0.10 0.102 0.104 0.0011 0.0011 0.0008 0.0008 0.888 0.911
0.20 0.17 0.169 0.177 0.0011 0.0012 0.0008 0.0007 0.992 1.000
0.20 0.24 0.239 0.242 0.0011 0.0010 0.0006 0.0006 1.000 1.000

5.3. AIDS incubation study

An AIDS incubation cohort study of HIV positive subjects with AIDS was conducted to estimate the distribution of the time from HIV infection to AIDS onset in individuals who were infected by contaminated blood transfusions (Lagakos et al., 1988). The failure time is the lag time between HIV infection and AIDS onset, which was observed only if it was less than the lag time between HIV infection and study recruitment. Thus, the lag time between HIV infection and AIDS onset was right truncated by the lag time between HIV infection and the end of recruitment. Specifically, let T denote the months from January 1978 to the date of HIV infection (i.e., time of blood transfusion) and let X denote the months from HIV infection to the date of AIDS onset. The study ended in July 1986, and so subjects were included only if they were diagnosed with AIDS prior to July 1986, i.e., if X < 102 − T. This sampling represents right truncation, and so we apply a time reversal to transform it into left truncation; that is, let X* = 102 − X so that the sampling is defined by T < X*. Since there is no censoring in this study, the standard estimator of τc, τ̂c* (Tsai, 1990; Martin and Betensky, 2005), is consistent, even under dependence. We therefore know the true value of τc, up to random variation due to the finite sample size, which offers us the opportunity to impose censoring artificially and to compare our estimators with the standard estimator in its presence. From the original data set, we calculate τ̂c* = 0.322. To impose censoring, we assumed the residual censoring model, simulated C_T as exponential with rate parameter λ, and varied λ to obtain censoring probabilities of 0.25, 0.50, and 0.75. For each censoring probability, we simulated 200 new data sets, and for each data set we calculated τ̂c1 and τ̂c*. Table 5 lists the results of our simulation; these values are to be compared to the “true” value of 0.322. As expected, the original estimator, τ̂c*, is quite biased in the presence of censoring; at 50% censoring it is 0.398 and at 75% censoring it is 0.497. In contrast, our new estimator, τ̂c1, is, at worst, 0.324 at 75% censoring.
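The time reversal and the artificial residual censoring described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the (T, X) pairs are synthetic placeholders (not the AIDS data), the residual censoring time is taken to be C − T, and the helper names (`reverse_time`, `censor`, `calibrate_rate`) and the bisection search for λ are hypothetical choices for hitting a target censoring probability.

```python
import random

STUDY_LENGTH = 102  # months from January 1978 to July 1986, as in the text


def reverse_time(x, study_length=STUDY_LENGTH):
    """Turn right truncation (X < 102 - T) into left truncation (T < X*)."""
    return study_length - x


def censor(pairs, unit_draws, rate):
    """Impose residual censoring: C = T + E / rate with E ~ Exp(1).

    Returns (T, Y, delta) with Y = min(X*, C) and delta = 1 for a failure.
    """
    out = []
    for (t, x_star), e in zip(pairs, unit_draws):
        c = t + e / rate
        out.append((t, min(x_star, c), 1 if x_star <= c else 0))
    return out


def censored_fraction(pairs, unit_draws, rate):
    data = censor(pairs, unit_draws, rate)
    return sum(1 - delta for _, _, delta in data) / len(data)


def calibrate_rate(pairs, unit_draws, target, lo=1e-6, hi=1e3, iters=60):
    """Geometric bisection for the rate giving the target censored fraction.

    Reusing the same Exp(1) draws makes the fraction monotone in the rate.
    Assumes the fraction at `hi` is at least `target`.
    """
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if censored_fraction(pairs, unit_draws, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


# Synthetic right-truncated pairs (placeholders, NOT the AIDS data).
rng = random.Random(1)
raw = []
while len(raw) < 500:
    t, x = rng.uniform(0, 100), rng.uniform(0, 100)
    if x < STUDY_LENGTH - t:  # right-truncation sampling condition
        raw.append((t, x))

pairs = [(t, reverse_time(x)) for t, x in raw]
unit_draws = [rng.expovariate(1.0) for _ in pairs]
rates = {p: calibrate_rate(pairs, unit_draws, p) for p in (0.25, 0.50, 0.75)}
```

In a full replication one would redraw the exponential residuals for each of the 200 data sets at the calibrated rate and recompute both estimators each time.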

Table 5.

This table shows the average value of the estimate for τc for two different estimators at 3 levels of censoring. Each average was taken over 200 repetitions.

P(C < X*|X* > T) τ̂c1 τ̂c*
0.75 0.324 0.497
0.50 0.322 0.398
0.25 0.323 0.351

5.4. Channing House study

We illustrate the use of our inverse probability weighted U statistic on the Channing House data set. This study followed 97 male retirees from their entry to the Channing House retirement community in Palo Alto, California until death or end of study or departure from the community. There is a moderate amount of censoring in this study; about 56% of the observations are censored. The value of τ̂c* is 0.197 for these data, with associated p-value of 0.041. In comparison, τ̂c1 is 0.185 with associated p-value of 0.064 and τ̂c2 is 0.133, with associated p-value of 0.153. The analytical variance of τ̂c1 is 0.010 and its simple bootstrap variance (2000 bootstrap samples) is 0.009. The analytical variance of τ̂c2 is 0.009 and its bootstrap variance is 0.010. This provides further support for the accuracy of our analytical variances, even with their substitution of the true distribution functions (Q and S_C) by consistent estimators.
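The simple bootstrap variance mentioned above can be sketched generically. This is a minimal sketch, assuming subjects are resampled with replacement as whole records; the function name `bootstrap_variance` is hypothetical, and the sample mean on toy Gaussian data stands in for the tau estimators, which are not reimplemented here.

```python
import random
import statistics


def bootstrap_variance(data, statistic, n_boot=2000, seed=0):
    """Simple nonparametric bootstrap variance of `statistic`.

    data      : list of per-subject records, e.g. (T, Y, delta) triples
    statistic : function mapping a list of records to a number
    n_boot    : number of bootstrap samples (2000 in the text)
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample subjects with replacement, keeping each record intact.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.variance(replicates)


# Stand-in example: variance of a sample mean on toy data.
toy_rng = random.Random(42)
toy = [toy_rng.gauss(0.0, 1.0) for _ in range(200)]
boot_var = bootstrap_variance(toy, statistics.fmean, n_boot=500)
```

Passing a function that computes τ̂c1 or τ̂c2 from a list of (T, Y, δ) records in place of `statistics.fmean` would give the quantity that the text compares with the analytical variance.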

It is noteworthy that there is a qualitative difference between the assumption of the residual censoring model and the independent censoring model with respect to the important assessment of quasi-independence. As censoring does not have any real meaning prior to entry to the retirement home, the residual censoring model, for which P(T < C) = 1, seems most appropriate. This choice raises the interesting question of how to estimate the survival distribution given the near-rejection of the null hypothesis of quasi-independence; we address this in Austin and Betensky (2012).

6. Discussion

Quasi-independence of failure and truncation is a necessary prerequisite for application of straightforward estimation and regression procedures for truncated and censored data. At the first level of analysis, a test of quasi-independence is of primary interest. For this purpose, the simple τ̂c* estimator is valid. If this test rejects the null hypothesis of quasi-independence, a second level of analysis is required. This involves some approach for modeling and adjusting for the dependence in order to obtain an estimate for the failure time distribution. For example, a function of an estimate of Kendall’s tau might be used in a copula model for failure and truncation (Chaieb et al., 2006) or in a transformation model (Efron and Petrosian, 1994; Austin and Betensky, 2012). At this level of analysis, the simple τ̂c* is not consistent for the true τc, as it converges to a quantity that depends on the underlying censoring distribution. A similar result was found by Uno et al. (2009) for Kendall’s tau for censored but non-truncated data. We have addressed this problem by proposing two new estimators for τc that are consistent away from the null and that apply to two commonly assumed models for dependent truncation. We also considered a third estimator, but found it to have poor large sample behavior due to its restriction to uncensored observations. While our proposed estimators do depend on the marginal distributions of failure and truncation, they are useful within the context of transformation models for failure in the presence of dependent truncation (Efron and Petrosian, 1994; Austin and Betensky, 2012). Interestingly, we found the original estimators to have greater power for detecting deviations from quasi-independence. This was due to their smaller variances in the case of the residual censoring model and to their greater bias in the case of the independent censoring model. Thus, it appears that if these Kendall’s tau measures of association are of interest entirely for the purpose of hypothesis testing, then the original estimators may suffice. Finally, we note that our simulations only considered cases in which the support of the (residual) censoring distribution contained that of the (residual) failure time distribution; in cases in which this does not occur, even our inverse probability weighted estimators will be functions of the associated censoring distributions.

Acknowledgments

This research was supported in part by NIH grants T32NS048005 and R01CA075971.

Appendix A

We first show that under quasi-independence, the assumptions of the residual censoring model permit use of the risk set adjusted Kaplan Meier estimator. For simplicity, we express the likelihood contributions for discrete random variables:

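$$
\begin{aligned}
H(T,X,C) &= P(T=t, X=x, C=c \mid X>T) \\
&= P(T=t, X_T=x_t, C_T=c_t \mid X>T) \\
&= P(T=t \mid X>T) \times P(X_T=x_t, C_T=c_t \mid T=t, X>T) \\
&= P(T=t \mid X>T)\, P(X_T=x_t \mid T=t, X>T) \times P(C_T=c_t \mid T=t, X>T) \\
&\propto P(T=t \mid X>T)\, P(X=x \mid X>t)\, P(C_T=c_t).
\end{aligned}
$$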

It is evident that quasi-independence allows for use of the conditional likelihood component for X for estimation of $F_X(\cdot)$ via standard survival analysis estimation procedures.

We now show that under $H_0: X \perp T \mid T < X$, and the remainder of the residual censoring model assumptions, $\tau_c^* = 0$:

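$$
\begin{aligned}
\tau_c^* &\propto E\{\operatorname{sgn}((T_1-T_2)(Y_1-Y_2))\, I(\Lambda_{12})\} \\
&= P(X_1<C_1,\, T_1<T_2,\, X_1<Y_2) - P(X_2<C_2,\, T_1<T_2,\, X_2<Y_1) \\
&\quad + P(X_2<C_2,\, T_2<T_1,\, X_2<Y_1) - P(X_1<C_1,\, T_2<T_1,\, X_1<Y_2).
\end{aligned}
$$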

Under the assumptions $X_T \perp C_T \mid T$, $C_T \perp T$, and $P(T < C) = 1$, along with quasi-independence, it follows that all four terms in $\tau_c^*$ are equivalent and hence $\tau_c^* = 0$. We first note that the first and third terms, and the second and fourth terms, are trivially equivalent to each other, simply by relabeling the indices. Defining $G(t,u) = \int_{c=u}^{\infty} g(t,c)\,dc$, with $g(t,c) = P(T=t, C=c)$, we can express the first term as

$$
P(X_1<C_1,\, T_1<T_2,\, X_1<Y_2) \propto \int_{t=0}^{\infty} \int_{s=t}^{\infty} \int_{u=s}^{\infty} f_X(u)\, S_X(u)\, G(t,u)\, G(s,u)\, du\, ds\, dt.
$$

It is easy to see that the second term has exactly this same expression.
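The cancellation argued above can also be checked numerically. The sketch below is an illustration under assumptions, not the authors' code: it takes Λ12 to be the event that a pair is comparable (max(T1, T2) ≤ min(Y1, Y2)) and orderable (the smaller observed time is a failure), and it simulates one particular null configuration with independent exponential T and X and an independent exponential residual censoring time, so that P(T < C) = 1. The function names and all distribution parameters are hypothetical choices.

```python
import random


def sign(v):
    return (v > 0) - (v < 0)


def conditional_tau_star(data):
    """Average sign over comparable, orderable pairs of (T, Y, delta)."""
    num = den = 0
    n = len(data)
    for i in range(n):
        ti, yi, di = data[i]
        for j in range(i + 1, n):
            tj, yj, dj = data[j]
            if max(ti, tj) > min(yi, yj):
                continue  # pair is not comparable under truncation
            if yi < yj:
                orderable = di == 1
            elif yj < yi:
                orderable = dj == 1
            else:
                orderable = di == 1 and dj == 1
            if not orderable:
                continue  # smaller observed time is censored
            num += sign((ti - tj) * (yi - yj))
            den += 1
    return num / den if den else float("nan")


def simulate_null(n, seed=0):
    """Quasi-independent T and X, left truncation, residual censoring."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        t = rng.expovariate(1.0)
        x = rng.expovariate(0.5)
        if x <= t:
            continue  # truncated: subject is never observed
        c = t + rng.expovariate(0.5)  # residual censoring, so T < C always
        data.append((t, min(x, c), 1 if x <= c else 0))
    return data


tau_null = conditional_tau_star(simulate_null(600))
```

Under these assumptions `tau_null` should be close to zero up to sampling variability, in line with the result that the four terms cancel.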

References

  1. Austin M, Betensky R. Estimating a survival distribution in the presence of dependent left truncation and right censoring. 2012 Unpublished Manuscript.
  2. Beaudoin D, Duchesne T, Genest C. Improving the estimation of Kendall’s tau when censoring affects only one of the variables. Computational Statistics & Data Analysis. 2007;51(12):5743–5764.
  3. Chaieb L, Rivest L, Abdous B. Estimating survival under a dependent truncation. Biometrika. 2006;93(3):655.
  4. Datta S, Bandyopadhyay D, Satten GA. Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scandinavian Journal of Statistics. 2010;37:680–700.
  5. De Uña-Álvarez J. Nonparametric estimation under length-biased sampling and Type I censoring: a moment based approach. Annals of the Institute of Statistical Mathematics. 2004;56(4):667–681.
  6. Efron B, Petrosian V. Survival Analysis of the Gamma-Ray Burst Data. Journal of the American Statistical Association. 1994;89(426)
  7. Hyde J. Testing survival under right censoring and left truncation. Biometrika. 1977;64(2):225.
  8. Kaplan E, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53(282):457–481.
  9. Kendall M. A new measure of rank correlation. Biometrika. 1938;30(1–2):81.
  10. Lagakos S, Barraj L, Gruttola V. Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika. 1988;75(3):515.
  11. Lakhal L, Rivest L, Beaudoin D. IPCW Estimator for Kendall’s tau under bivariate censoring. The International Journal of Biostatistics. 2009;5:1–20.
  12. Mandel M, Betensky R. Simultaneous confidence intervals based on the percentile bootstrap approach. Computational Statistics & Data Analysis. 2008;52(4):2158–2165. doi: 10.1016/j.csda.2007.07.005.
  13. Martin E, Betensky R. Testing quasi-independence of failure and truncation times via conditional Kendall’s tau. Journal of the American Statistical Association. 2005;100(470):484–492.
  14. Oakes D. On consistency of Kendall’s tau under censoring. Biometrika. 2008;95(4):997.
  15. Randles R, Wolfe D. Introduction to the theory of nonparametric statistics. New York: Wiley; 1979.
  16. Sun L, Zhu L. A semiparametric model for truncated and censored data. Statistics & Probability Letters. 2000;48(3):217–227.
  17. Tsai W. Testing the assumption of independence of truncation time and failure time. Biometrika. 1990;77(1):169.
  18. Tsiatis A. A large sample study of Cox’s regression model. The Annals of Statistics. 1981;9(1):93–108.
  19. Turnbull B. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society. Series B (Methodological) 1976;38(3):290–295.
  20. Uno H, Cai T, Pencina M, D’Agostino R, Wei L. On The C-Statistics For Evaluating Overall Adequacy Of Risk Prediction Procedures With Censored Survival Data. Harvard University Biostatistics Working Paper Series. 2009:101. doi: 10.1002/sim.4154.
  21. Vardi M, Stockmeyer L. Improved upper and lower bounds for modal logics of programs. Proceedings of the seventeenth annual ACM symposium on Theory of computing; ACM; 1985. pp. 240–251.
  22. Wang M. A semiparametric model for randomly truncated data. Journal of the American Statistical Association. 1989;84(407):742–748.
  23. Wang M. Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association. 1991;86(413):130–143.
  24. Woodroofe M. Estimating a distribution function with truncated data. The Annals of Statistics. 1985;13(1):163–177.
