Eliminating bias due to censoring in Kendall’s tau estimators for quasi-independence of truncation and failure

Matthew D Austin; Rebecca A Betensky

doi:10.1016/j.csda.2013.11.018

Author manuscript; available in PMC: 2015 May 14.

Published in final edited form as: Comput Stat Data Anal. 2013 Dec 4;73:16–26. doi: 10.1016/j.csda.2013.11.018


PMCID: PMC3912250 NIHMSID: NIHMS548321 PMID: 24505164

Abstract

While the currently available estimators for the conditional Kendall’s tau measure of association between truncation and failure are valid for testing the null hypothesis of quasi-independence, they are biased when the null does not hold. This is because they converge to quantities that depend on the censoring distribution. The magnitude of the bias relative to the theoretical Kendall’s tau measure of association between truncation and failure due to censoring has not been studied, and so its importance in real problems is not known. We quantify this bias in order to assess the practical usefulness of the estimators. Furthermore, we propose inverse probability weighted versions of the conditional Kendall’s tau estimators to remove the effects of censoring and provide asymptotic results for the estimators. In simulations, we demonstrate the decrease in bias achieved by these inverse probability weighted estimators. We apply the estimators to the Channing House data set and an AIDS incubation data set.

Keywords: Inverse probability weighting, Left truncation

1. Introduction

In observational studies, subjects are often followed from an initiating event to a failure event, and the lag time between these two events, the failure time, is of primary interest. Truncated survival data arise when the failure time is observed only if it falls within a subject specific region, known as the truncation set. The lower and upper limits of the truncation set are termed the left and right truncation times, respectively. This mechanism differs from censoring in that truncated subjects from the reference population are not observed at all, whereas censored observations are sampled, though their data are incomplete, as their exact failure times are unknown. One well known example of truncated data is an AIDS incubation cohort study of HIV positive subjects with AIDS (Lagakos et al., 1988). The failure time is the lag time between HIV infection and AIDS onset, which was observed only if it was less than the lag time between HIV infection and study recruitment. Thus, the lag time between HIV infection and AIDS onset is right truncated by the lag time between HIV infection and the end of recruitment. A second, well known example of left truncated, right censored data is the Channing House data (Hyde, 1977), in which 97 male residents of the Channing House retirement home were observed until death, or end of study or departure from the community. The failure time of interest is the age at death of the subjects, with age at entry into the retirement community as the left truncation time, as subjects were sampled only if their ages at death exceeded their ages at entry into the Channing House, and age at end of study or departure from the community as the right censoring time.

Quasi-independence of truncation and failure refers to their independence in the observable region (Tsai, 1990). Quasi-independence allows the joint density of the truncation time and the failure time over the observable region to be factored into a product that is proportional to the product of the marginal densities of each variable. Under quasi-independence, the distribution of the failure time can be consistently estimated by the risk set adjusted product limit estimator of Kaplan and Meier (1958) or the self-consistency algorithm of Turnbull (1976). Unlike the requisite and unidentifiable assumption of independence of failure and censoring (Tsiatis, 1981) for application of standard survival analysis methods to censored failure time data, quasi-independence can be tested using the observed data (Tsai, 1990). A popular nonparametric statistic that is used to quantify dependence is Kendall’s tau (Kendall, 1938), although it is a measure of association and not statistical dependence. Thus, while a non-zero Kendall’s tau implies dependence, the converse is not true, and so a Kendall’s tau test is most useful when it rejects the null. Tsai (1990) and Martin and Betensky (2005) proposed modified versions of Kendall’s tau that account for various types of truncation in the presence of censoring. While the estimators for the conditional Kendall’s tau are valid for testing the null hypothesis of quasi-independence (i.e., they are constructed to have expectation zero under the null hypothesis), they are biased for the “true” Kendall’s tau measure of association between the failure time and the truncation time when the null does not hold. This is because they converge to quantities that depend on the censoring distribution.

When quasi-independence of failure and truncation does not hold, estimation of the failure distribution requires a model for its dependence on truncation. This model dictates the nature of the dependence estimator that is required. A semi-parametric model, such as a transformation model (Efron and Petrosian, 1994; Austin and Betensky, 2012), requires a nonparametric estimator of dependence, and thus a Kendall’s tau type measure is appropriate. The transformation model operates by transforming the truncation variable to a latent, unobserved truncation time that would have been observed in the absence of dependence on the failure time, and then uses it in a standard estimator for the failure distribution that assumes quasi-independence of truncation and failure. The transformation requires an estimate of a dependence parameter, and it is desirable for this estimate to be free of the censoring distribution; this was shown in simulation studies conducted by Austin and Betensky (2012). In particular, there was considerably less bias for the transformation models that were based on the censoring-adjusted Kendall’s tau as compared to the unadjusted Kendall’s tau. Thus, an estimator of the conditional Kendall’s tau that does not depend on the underlying censoring is needed; the derivation of such a measure is the topic of this paper. An alternative approach is to use a copula model for the dependence between failure and truncation (Chaieb et al., 2006). This approach requires input of the copula-based dependence parameter that does not contain information on the censoring distribution; this is a parametric measure that depends on the selection of the particular copula family. The copula dependence estimator does not depend on the marginal distributions of failure and truncation, whereas the Kendall’s tau estimators do. For the ultimate purpose of estimation of the failure distribution, however, there is no advantage to this independence, while there is a downside to the parametric assumptions required for the copula estimator.

Beaudoin et al. (2007) reviewed the conditions for consistency of several existing estimation procedures for Kendall’s tau when one variable is subject to right censoring. All of these estimators either suffer from computational complexity, or are not consistent for the true Kendall’s tau under dependence. The computational complexities arise from kernel density estimation, permutation, or complex imputation for the censored observations. Oakes (2008) presented conditions for consistency of Kendall’s tau for bivariate random variables that are potentially subject to truncation, but like Beaudoin et al. (2007), did not allow for the case in which one variable truncates the observation of the other. Thus, for our setting of quasi-independence testing, if there is censoring, these estimators are biased for the true Kendall’s tau, and should not be used.

The magnitude of the bias due to censoring in the presence of truncation has not been studied, although it was recognized by Martin and Betensky (2005), and so its importance is not known. In this paper, we examine this issue and quantify this bias in order to assess the practical usefulness of the estimators. Furthermore, we propose inverse probability weighted versions of the conditional Kendall’s tau statistics to remove the effects of censoring. This follows the approach of Uno et al. (2009), who derived an inverse probability weighted C-statistic for assessing concordance between a failure time and a continuous marker in the context of right censoring, where the same problem arises. This problem was also addressed for Kendall’s tau in the presence of bivariate censoring (Lakhal et al., 2009). Through use of inverse probability weighting, we are able to eliminate the bias due to censoring up to the upper limit of the support of the censoring variable. We consider two commonly used sampling models, and derive corrected tau estimators for each. We also provide asymptotic results for the estimators.

In Section 2 we introduce notation, both sampling models, and provide an overview of the conditional Kendall’s tau estimators. In Sections 3 and 4 we derive our proposed estimators, prove their consistency and provide associated asymptotic results. In Section 5 we report simulation results, and results from the Channing House and AIDS studies. In Section 6 we conclude.

2. Notation and Kendall’s tau

Let X denote failure time, T denote truncation time, and C denote censoring time. We observe data of the form (T, Y, δ), where δ = I(X < C) and Y = min(X,C), and for which T < Y. These data are both left truncated and right censored. For the first sampling model, we define the residual failure and censoring times as: R = X − T and D = C − T. We assume that R ⊥ D|T, D ⊥ T, and P(T < C) = 1. We refer to this model as the residual censoring model, as the censoring variable is independent of the failure time on the residual (i.e., post-truncation) time scale. This model was considered by Vardi and Stockmeyer (1985), Wang (1991) and Mandel and Betensky (2008). The second sampling model assumes that (T,X) ⊥ C|T < X, T < C; we refer to this model as the independent censoring model. This model was considered by Sun and Zhu (2000), De Uña-Álvarez (2004), and Chaieb et al. (2006).

For both of these models, if quasi-independence of X and T holds, the product limit estimator or the Turnbull algorithm can be used to estimate the failure time distribution. Unfortunately, it is not possible to identify these models from the data alone. However, the context, sampling scheme, and drop-out pattern of a particular study may be informative regarding the censoring model. For the residual censoring model, we show that these estimators are valid under quasi-independence in the Appendix. For the independent censoring model, this is obvious from the derivations of the estimators given in Wang (1989), Wang (1991), Vardi and Stockmeyer (1985), Tsai (1990), Turnbull (1976) and Woodroofe (1985). A U-statistic test for quasi-independence was proposed by Martin and Betensky (2005), based on a conditional Kendall’s tau,

τ_{c} = E [sgn ((X_{1} - X_{2}) (T_{1} - T_{2})) | Ω_{12}],

where Ω₁₂ = {max(T₁, T₂) ≤ min(X₁,X₂)} is the event of comparability of the pair of observations. This construction is necessary so that under H₀ : X ⊥ T|X > T, τ_c = 0. However, in the presence of right censoring, τ_c is not directly estimable. Instead, the parameter

τ_{c}^{*} = E [sgn ((X_{1} - X_{2}) (T_{1} - T_{2})) | Λ_{12}] = E [sgn ((Y_{1} - Y_{2}) (T_{1} - T_{2})) | Λ_{12}]

is easily estimated, where Λ₁₂ = {max(T₁, T₂) ≤ min(Y₁, Y₂) ∩ δ₍₁₎ = 1} and δ₍₁₎ is the failure indicator for min(Y₁, Y₂). Λ₁₂ is the event of comparability and orderability of the pair of observations (Martin and Betensky, 2005). We note that a comparable pair, (T₁, Y₁), (T₂, Y₂) is orderable if min(Y₁, Y₂) = min(X₁,X₂). Martin and Betensky (2005) proposed a consistent estimator of this parameter, for the independent censoring model, using a ratio of U statistics,

{τ̂}_{c}^{*} = \frac{1}{M} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} sgn ((Y_{i} - Y_{j}) (T_{i} - T_{j})) I (Λ_{i j}) = U_{c} / U_{M},

where $M = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} I (Λ_{i j})$ is the number of comparable and orderable pairs. The U-statistics, U_c and U_M, have expected values of $τ_{c}^{*} μ$ and P(Λ₁₂) = μ, respectively. Martin and Betensky (2005) also showed that under the null hypothesis, H₀ : τ_c = 0, it follows that $τ_{c}^{*} = 0$ . In the Appendix, we extend this result to the residual censoring model, so that the test based on the theoretical quantity, $τ_{c}^{*}$ , is valid for both models. That is, for testing purposes, it is sufficient to estimate $τ_{c}^{*}$ , instead of τ_c.
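As an illustration, the ratio form of ${τ̂}_{c}^{*}$ can be computed by a direct double loop over pairs. The following Python sketch is not code from the original analysis; the function names `sign` and `tau_c_star` are hypothetical, and the treatment of tied observed times is an assumed convention.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_c_star(t, y, delta):
    """Unweighted conditional Kendall's tau over comparable, orderable pairs.

    t, y, delta: truncation times, observed times min(X, C), failure indicators.
    Returns (estimate, M), where M is the number of pairs satisfying Lambda_ij.
    """
    n = len(t)
    total, m = 0, 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Comparable: max(T_i, T_j) <= min(Y_i, Y_j).
            if max(t[i], t[j]) > min(y[i], y[j]):
                continue
            # Orderable: the smaller observed time must be a failure.
            # Assumed convention for ties in Y: accept if either is a failure.
            if y[i] < y[j]:
                d_min = delta[i]
            elif y[j] < y[i]:
                d_min = delta[j]
            else:
                d_min = max(delta[i], delta[j])
            if d_min != 1:
                continue
            m += 1
            total += sign((y[i] - y[j]) * (t[i] - t[j]))
    return (total / m if m else float("nan")), m
```

The double loop costs O(n²) operations, which is unproblematic for samples of a few hundred observations such as those used in the simulations.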

3. Model 1: residual censoring model

Model 1 assumes that R ⊥ D|T, D ⊥ T, and P(T < C) = 1, where R = X − T and D = C − T, and the sampling requires that T < X. We further define Q(s) = P(C − T > s). We must establish H₀ : X ⊥ T|T < X, so that we may apply standard survival analysis techniques to estimate F_X(·). We propose to estimate τ_c with a new, inverse probability weighted estimator, τ̂_c1 = U_c1/U_M1, where

U_{c 1} = {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{sgn ((Y_{i} - Y_{j}) (T_{i} - T_{j})) I (Λ_{i j})}{Q̂ (Y_{i j}^{*} - T_{i}) Q̂ (Y_{i j}^{*} - T_{j})},

U_{M 1} = {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{I (Λ_{i j})}{Q̂ (Y_{i j}^{*} - T_{i}) Q̂ (Y_{i j}^{*} - T_{j})},

$Y_{i j}^{*} = min (Y_{i}, Y_{j})$ , and Q̂(u) = P̂(C − T > u). Applying the asymptotic theory for U-statistics (Randles and Wolfe, 1979) and assuming Q(·) to be known, the asymptotic variance of U_c1 is given by

γ_{1} = E (\frac{sgn ((Y_{1} - Y_{2}) (T_{1} - T_{2}) (Y_{1} - Y_{3}) (T_{1} - T_{3})) I (Λ_{12}) I (Λ_{13})}{Q (Y_{12}^{*} - T_{1}) Q (Y_{12}^{*} - T_{2}) Q (Y_{13}^{*} - T_{1}) Q (Y_{13}^{*} - T_{3})}) - {(τ_{c} μ)}^{2},

where μ = P(Ω₁₂). Upon substitution of Q by its consistent estimator, Q̂, a computationally efficient estimator for γ₁ is given by Martin and Betensky (2005). We evaluate the impact of not inflating the variance estimator for insertion of the estimator Q̂ in place of Q in our simulation studies. Alternatively, the approach taken by Datta et al. (2010) for the variance of a U-statistic with inverse probability of censoring weighting might be applied to this setting with truncation. Furthermore, it follows that n^{1/2}(U_c1 − τ_cμ) is approximately distributed as N(0, 4γ₁), and thus n^{1/2}(τ̂_c1 − τ_c) is approximately N(0, 4γ₁/μ²). Under the null, H₀ : X ⊥ T|T < X, $n {τ̂}_{c 1}^{2} / (4 {γ̂}_{1} / {μ̂}^{2})$ is approximately $χ_{1}^{2}$ .
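The Wald-type statistic $n {τ̂}^{2} / (4 γ̂ / {μ̂}^{2})$ and its $χ_{1}^{2}$ p-value can be evaluated as in the following Python sketch. The function name `quasi_independence_test` is hypothetical, and the estimates τ̂, γ̂ and μ̂ are assumed to be supplied by the caller; the variance estimator of Martin and Betensky (2005) is not reproduced here.

```python
import math


def quasi_independence_test(tau_hat, gamma_hat, mu_hat, n):
    """Return (statistic, p_value) for the chi-square(1) test of tau_c = 0."""
    # Estimated asymptotic variance of sqrt(n) * tau_hat.
    var_tau = 4.0 * gamma_hat / mu_hat ** 2
    stat = n * tau_hat ** 2 / var_tau
    # For a chi-square(1) variable, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

The same function applies unchanged to the other inverse probability weighted estimators, given the corresponding variance estimates.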

We now establish the consistency of τ̂_c1. Recall that we can decompose τ_c into a sum of its constituent parts as follows:

τ_{c} = E [sgn ((X_{1} - X_{2}) (T_{1} - T_{2})) | Ω_{12}] = P (X_{1} < X_{2}, T_{1} < T_{2} | Ω_{12}) - P (X_{2} < X_{1}, T_{1} < T_{2} | Ω_{12}) - P (X_{1} < X_{2}, T_{2} < T_{1} | Ω_{12}) + P (X_{2} < X_{1}, T_{2} < T_{1} | Ω_{12}) .

(1)

We proceed by deriving a consistent estimator for the first term in (1), from which it is easily seen how the remaining three terms are estimated and combined to form τ̂_c1 = U_c1/U_M1.

P (T_{1} < T_{2}, X_{1} < X_{2} | Ω_{12}) = P (T_{1} < T_{2}, X_{1} < X_{2} | max (T) < min (X)) = \frac{P (T_{1} < T_{2}, X_{1} < X_{2}, max (T) < min (X))}{P (max (T) < min (X))} = \frac{P (T_{1} < T_{2}, X_{1} < X_{2}, T_{2} < X_{1})}{P (T_{1} < X_{1}, T_{2} < X_{1}, X_{1} < X_{2}) + P (T_{1} < X_{2}, T_{2} < X_{2}, X_{2} < X_{1})} .

(2)

Assuming for now that Q is known, we estimate the numerator of (2) with

{(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{δ_{i} I (T_{i} < T_{j}, Y_{i} < Y_{j}, T_{j} < Y_{i})}{Q (Y_{i} - T_{i}) Q (Y_{i} - T_{j})}

(3)

and the denominator with

{(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{δ_{i} I (T_{i} < Y_{i}, Y_{i} < Y_{j}, T_{j} < Y_{i})}{Q (Y_{i} - T_{i}) Q (Y_{i} - T_{j})} + {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{δ_{j} I (T_{j} < Y_{j}, Y_{j} < Y_{i}, T_{i} < Y_{j})}{Q (Y_{j} - T_{j}) Q (Y_{j} - T_{i})} .

We note that the denominator is the same for all four terms in (1) and is estimated by U_M1. The consistency of (3) for the numerator of (2) follows from noting that

E (\frac{δ_{1} I (T_{1} < T_{2}) I (Y_{1} < Y_{2}) I (T_{2} < Y_{1})}{Q (Y_{1} - T_{1}) Q (Y_{1} - T_{2})}) = E (E [\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2}, X_{1} < C_{1}, X_{1} < C_{2})}{Q (X_{1} - T_{1}) Q (X_{1} - T_{2})} | T, X - T]) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{Q (X_{1} - T_{1}) Q (X_{1} - T_{2})} E [I (X_{1} < C_{1}, X_{1} < C_{2}) | T, X - T]) .

After applying the independence assumptions of the model, this final expectation is equal to

= E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{Q (X_{1} - T_{1}) Q (X_{1} - T_{2})} E [I (X_{1} - T_{1} < C_{1} - T_{1}, X_{1} - T_{2} < C_{2} - T_{2}) | T, X - T]) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{Q (X_{1} - T_{1}) Q (X_{1} - T_{2})} P (C_{1} - T_{1} > X_{1} - T_{1} | T, X - T) P (C_{2} - T_{2} > X_{1} - T_{2} | T, X - T)) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{Q (X_{1} - T_{1}) Q (X_{1} - T_{2})} P (C_{1} - T_{1} > X_{1} - T_{1}) P (C_{2} - T_{2} > X_{1} - T_{2})),

which is equal to P(T₁ < T₂, T₂ < X₁,X₁ < X₂), as desired. The consistency of the denominator of (2) follows analogously. This argument illustrates that without the inverse weighting by Q, the survival function of C − T (i.e., as in ${τ̂}_{c}^{*}$ ), the estimator converges to a quantity that depends on Q, and thus is inconsistent for τ_c. In practice, we do not know Q, but due to the model’s assumed independence of C−T and X−T, we are able to estimate it using a Kaplan-Meier estimator based on (y_i−t_i, 1−δ_i), i.e., the residual censoring times, with reversal of the roles of censoring and failure.
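The two steps described here, a Kaplan-Meier estimate of Q from the residual times with the roles of censoring and failure reversed, followed by inverse weighting of each comparable and orderable pair, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the names `km_survival` and `tau_c1` are hypothetical, Q̂ is taken to be right-continuous, and pairs with zero estimated weight are skipped, all of which are assumptions.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def km_survival(times, events):
    """Kaplan-Meier estimate of P(time > s), as a right-continuous step function."""
    steps = []  # (event time, survival just after that time)
    surv = 1.0
    for u in sorted(set(times)):
        at_risk = sum(1 for v in times if v >= u)
        d = sum(1 for v, e in zip(times, events) if v == u and e == 1)
        if d > 0:
            surv *= 1.0 - d / at_risk
            steps.append((u, surv))

    def q(s):
        value = 1.0
        for u, sv in steps:
            if u > s:
                break
            value = sv
        return value

    return q


def tau_c1(t, y, delta):
    """Inverse probability weighted conditional tau under residual censoring."""
    # Q is estimated from residual times y - t, treating censoring as the event.
    q_hat = km_survival([yi - ti for ti, yi in zip(t, y)],
                        [1 - d for d in delta])
    num = den = 0.0
    n = len(t)
    for i in range(n - 1):
        for j in range(i + 1, n):
            y_star = min(y[i], y[j])
            if max(t[i], t[j]) > y_star:
                continue  # pair is not comparable
            if y[i] < y[j]:
                d_min = delta[i]
            elif y[j] < y[i]:
                d_min = delta[j]
            else:
                d_min = max(delta[i], delta[j])  # assumed tie convention
            if d_min != 1:
                continue  # pair is not orderable
            w = q_hat(y_star - t[i]) * q_hat(y_star - t[j])
            if w <= 0.0:
                continue  # assumed: drop pairs whose weight cannot be inverted
            num += sign((y[i] - y[j]) * (t[i] - t[j])) / w
            den += 1.0 / w
    return num / den if den > 0.0 else float("nan")
```

With no censored observations, Q̂ is identically one and the function reduces to the unweighted estimator.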

It is important to note that if the support of the random variable X −T contains the support of C − T, then we cannot estimate τ_c, but rather are limited to

E [sgn ((T_{1} - T_{2}) (X_{1} - X_{2})) | Ω_{12}^{'}],

where Ω′₁₂ = I{max(T₁, T₂) ≤ min(X₁,X₂) ∩ max(X − T) < K} and K is the upper limit of the support of C − T. Like $τ_{c}^{*}$ , this quantity mixes information about the dependence of failure and truncation with the censoring distribution, and thus is not of clear interest. Thus, in this case, inverse weighting does not remove the censoring effects.

4. Model 2: independent censoring model

Model 2 assumes that (T,X) ⊥ C|T < X, T < C and that sampling requires T < min(X,C). We further define S_C(u) = P(C > u). We must establish H₀ : X ⊥ T|T < X, so that we may apply standard survival analysis techniques to estimate F_X(·). Under this model, we propose to estimate τ_c with a second, new, inverse probability weighted estimator, τ̂_c2 = U_c2/U_M2, where

U_{c 2} = {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{sgn ((Y_{i} - Y_{j}) (T_{i} - T_{j})) I (Λ_{i j})}{Ŝ_{C} {(Y_{i j}^{*})}^{2} / [Ŝ_{C} (T_{j}) Ŝ_{C} (T_{i})]},

U_{M 2} = {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{I (Λ_{i j})}{Ŝ_{C} {(Y_{i j}^{*})}^{2} / [Ŝ_{C} (T_{j}) Ŝ_{C} (T_{i})]},

$Y_{i j}^{*} = min (Y_{i}, Y_{j})$ , and Ŝ_C(u) = P̂(C > u). As for Model 1, upon substitution of S_C by its consistent estimator, Ŝ_C, a computationally efficient estimator for the asymptotic variance, γ₂, is given by Martin and Betensky (2005), and we evaluate the impact of not inflating the variance estimator for insertion of the estimator Ŝ_C in place of S_C in our simulation studies. Applying standard U-statistic theory, the asymptotic variance of U_c2 is given by

γ_{2} = E (\frac{sgn ((Y_{1} - Y_{2}) (T_{1} - T_{2}) (Y_{1} - Y_{3}) (T_{1} - T_{3})) I (Λ_{12}) I (Λ_{13})}{S_{C} {(Y_{12}^{*})}^{2} / (S_{C} (T_{1}) S_{C} (T_{2})) \times S_{C} {(Y_{13}^{*})}^{2} / (S_{C} (T_{1}) S_{C} (T_{3}))}) - {(τ_{c} μ)}^{2},

where μ = P(Ω₁₂). A consistent estimator of γ₂ can be derived as in Martin and Betensky (2005). Under quasi-independence, H₀ : X ⊥ T|T < X, it follows that $n {τ̂}_{c 2}^{2} / (4 {γ̂}_{2} / {μ̂}^{2})$ is approximately $χ_{1}^{2}$ .

As for Model 1, we establish consistency of our estimator for the numerator of the first term in τ_c (1), which is given in (2). This follows from expressing the expectation of each term of the estimator of the numerator of (2) as an iterated expectation, and then successively applying the independence assumptions of the model. Again, assuming for now that S_C is known,

E (\frac{δ_{1} I (T_{1} < T_{2}) I (Y_{1} < Y_{2}) I (T_{2} < Y_{1})}{S_{C} {(Y_{1})}^{2} / [S_{C} (T_{1}) S_{C} (T_{2})]}) = E (E [\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2}, X_{1} < C_{1}, X_{1} < C_{2})}{S_{C} {(X_{1})}^{2} / (S_{C} (T_{1}) S_{C} (T_{2}))} | T, X, C > T]) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{S_{C} {(X_{1})}^{2} / [S_{C} (T_{1}) S_{C} (T_{2})]} E [I (X_{1} < C_{1}, X_{1} < C_{2}) | T, X, C > T]) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{S_{C} {(X_{1})}^{2} / [S_{C} (T_{1}) S_{C} (T_{2})]} P (X_{1} < C_{1} | C_{1} > T_{1}, T_{1}, X_{1}) P (X_{1} < C_{2} | C_{2} > T_{2}, T_{2}, X_{2})) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{S_{C} {(X_{1})}^{2} / [S_{C} (T_{1}) S_{C} (T_{2})]} S_{C} {(X_{1})}^{2} / [S_{C} (T_{1}) S_{C} (T_{2})]) = P (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2}) .

The consistency of the estimator for the denominator of (2) follows similarly, as does consistency for estimators of the remaining terms in τ_c (1). As for Model 1, the use of inverse probability weighting to remove the effects of censoring is effective only if the support of the failure time variable is contained within that of the censoring time variable. Otherwise, the estimator for τ_c remains as a function of the distribution of C.
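A minimal Python sketch of τ̂_c2 follows. It assumes that an estimate of S_C is already available and is passed in as a callable `s_c`; how that estimate is obtained is not addressed by the sketch. The function names are hypothetical, and pairs with a non-positive weight are skipped as an assumed safeguard.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_c2(t, y, delta, s_c):
    """Inverse probability weighted conditional tau under independent censoring.

    s_c: callable u -> estimated P(C > u), supplied by the caller (assumption).
    """
    num = den = 0.0
    n = len(t)
    for i in range(n - 1):
        for j in range(i + 1, n):
            y_star = min(y[i], y[j])
            if max(t[i], t[j]) > y_star:
                continue  # pair is not comparable
            if y[i] < y[j]:
                d_min = delta[i]
            elif y[j] < y[i]:
                d_min = delta[j]
            else:
                d_min = max(delta[i], delta[j])  # assumed tie convention
            if d_min != 1:
                continue  # pair is not orderable
            top = s_c(y_star) ** 2
            base = s_c(t[i]) * s_c(t[j])
            if top <= 0.0 or base <= 0.0:
                continue  # assumed: drop pairs whose weight is undefined
            w = top / base  # S_C(Y*)^2 / [S_C(T_i) S_C(T_j)]
            num += sign((y[i] - y[j]) * (t[i] - t[j])) / w
            den += 1.0 / w
    return num / den if den > 0.0 else float("nan")
```

Passing `s_c = lambda u: 1.0` makes every weight equal to one, which recovers the unweighted ratio over comparable and orderable pairs.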

We also consider a second inverse weighted estimator for τ_c under Model 2; this estimator includes only subjects who are uncensored. It is likely that this estimator will be less efficient than our first proposal given that it is using fewer observations. We were motivated to consider this estimator in our extension of the transformation model of Efron and Petrosian (1994) to handle dependent truncation and failure in the presence of censoring (Austin and Betensky, 2012). In that setting, due to the nature of the transformation model for dependency, it is necessary to use the uncensored observations only in certain steps of the estimation process. Thus, while this tau estimator does not make full use of the available data, it enables fuller use of the data (i.e., less modeling) for estimation of the failure distribution (Austin and Betensky, 2012). This new estimator for τ_c is given by τ̂_c3 = U_c3/U_M3, where

U_{c 3} = {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{sgn ((Y_{i} - Y_{j}) (T_{i} - T_{j})) I (η_{i j})}{(Ŝ_{C} (Y_{i}) Ŝ_{C} (Y_{j})) / (Ŝ_{C} (T_{j}) Ŝ_{C} (T_{i}))},

U_{M 3} = {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{I (η_{i j})}{(Ŝ_{C} (Y_{i}) Ŝ_{C} (Y_{j})) / (Ŝ_{C} (T_{j}) Ŝ_{C} (T_{i}))},

and η₁₂ = I{max(T₁, T₂) ≤ min(Y₁, Y₂) ∩ δ₁δ₂ = 1}. The asymptotic variance of U_c3, with S_C assumed known, is given by

γ_{3} = E (\frac{sgn ((Y_{1} - Y_{2}) (T_{1} - T_{2}) (Y_{1} - Y_{3}) (T_{1} - T_{3})) I (η_{12}) I (η_{13})}{(S_{C} (Y_{1}) S_{C} (Y_{2})) / (S_{C} (T_{1}) S_{C} (T_{2})) \times (S_{C} (Y_{1}) S_{C} (Y_{3})) / (S_{C} (T_{1}) S_{C} (T_{3}))}) - {(τ_{c} μ)}^{2},

where μ = P(Ω₁₂). Under H₀ : X ⊥ T|X > T, $n {τ̂}_{c 3}^{2} / (4 {γ̂}_{3} / {μ̂}^{2})$ is approximately $χ_{1}^{2}$ .

As for τ̂_c2, the consistency of τ̂_c3 follows from the law of large numbers and the unbiasedness of the components of the U-statistic, as seen in the following steps for one component of the numerator of the estimator:

E (\frac{δ_{1} δ_{2} I (T_{1} < T_{2}) I (Y_{1} < Y_{2}) I (T_{2} < Y_{1})}{(S_{C} (Y_{1}) S_{C} (Y_{2})) / (S_{C} (T_{1}) S_{C} (T_{2}))}) = E (E [\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2}, X_{1} < C_{1}, X_{2} < C_{2})}{[S_{C} (X_{1}) S_{C} (X_{2})] / [S_{C} (T_{1}) S_{C} (T_{2})]} | T, X, C > T]) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{(S_{C} (X_{1}) S_{C} (X_{2})) / [S_{C} (T_{1}) S_{C} (T_{2})]} E [I (X_{1} < C_{1}, X_{2} < C_{2}) | T, X, C > T]) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{(S_{C} (X_{1}) S_{C} (X_{2})) / [S_{C} (T_{1}) S_{C} (T_{2})]} P (X_{1} < C_{1} | C_{1} > T_{1}, T_{1}, X_{1}) P (X_{2} < C_{2} | C_{2} > T_{2}, T_{2}, X_{2})) = E (\frac{I (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2})}{S_{C} (X_{1}) S_{C} (X_{2}) / [S_{C} (T_{1}) S_{C} (T_{2})]} (S_{C} (X_{1}) S_{C} (X_{2})) / (S_{C} (T_{1}) S_{C} (T_{2}))) = P (T_{1} < T_{2}, T_{2} < X_{1}, X_{1} < X_{2}) .

Just as τ̂_c1 and τ̂_c2 have their associated inconsistent unweighted version $({τ̂}_{c}^{*})$ , τ̂_c3 improves upon ${τ̂}_{c}^{†}$ , where

{τ̂}_{c}^{†} = \frac{1}{M'} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} sgn ((Y_{i} - Y_{j}) (T_{i} - T_{j})) δ_{i} δ_{j}

and $M' = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} δ_{i} δ_{j}$ . This estimator, like ${τ̂}_{c}^{*}$ , is inconsistent away from the null in the presence of censoring; we present it for comparison to τ̂_c3.
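For comparison with the estimator based on all comparable and orderable pairs, the uncensored-pairs estimator τ̂_c3 can be sketched in Python as below. As an assumption of the sketch, the estimate of S_C is supplied by the caller as a callable `s_c`; the function names are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_c3(t, y, delta, s_c):
    """Inverse probability weighted conditional tau using uncensored pairs only.

    s_c: callable u -> estimated P(C > u), supplied by the caller (assumption).
    """
    num = den = 0.0
    n = len(t)
    for i in range(n - 1):
        for j in range(i + 1, n):
            # eta_ij: comparable pair in which both failures are observed.
            if delta[i] * delta[j] != 1:
                continue
            if max(t[i], t[j]) > min(y[i], y[j]):
                continue
            top = s_c(y[i]) * s_c(y[j])
            base = s_c(t[i]) * s_c(t[j])
            if top <= 0.0 or base <= 0.0:
                continue  # assumed: drop pairs whose weight is undefined
            w = top / base  # S_C(Y_i) S_C(Y_j) / [S_C(T_i) S_C(T_j)]
            num += sign((y[i] - y[j]) * (t[i] - t[j])) / w
            den += 1.0 / w
    return num / den if den > 0.0 else float("nan")
```

Because a pair contributes only when both members are uncensored, the number of contributing pairs can drop sharply under heavy censoring, in line with the efficiency concern noted for this estimator.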

5. Simulations and applications

In simulation studies, we explore the nature of the inconsistency of the standard Kendall’s tau estimator for left truncated and right censored data, ${τ̂}_{c}^{*}$ , and confirm the consistency of our new inverse probability weighted estimators. We also assess the consistency of our new estimator for the AIDS incubation study, for which there was no censoring, but to which we artificially impose independent censoring. Finally, we compare tests based on ${τ̂}_{c}^{*}$ and τ̂_c1 and τ̂_c2 for the Channing House data.

5.1. Simulations: residual censoring model

For investigation of the residual censoring model, we generated T and X as bivariate normal, with means 0 and 1, respectively, and variance 1, and retained those observations that satisfied T < X. We varied the correlation parameter, ρ, so that we could investigate how the estimator behaves under different degrees of dependence between T and X. We generated D as exponential with rate λ, and set C = T +D. This ensured that P(T < C) = 1 and D ⊥ T. Also, by construction, R = X −T ⊥ D|T. We varied the rate parameter, λ, to investigate the effects of heavy (70%) versus light (20%) censoring. In each simulation, we generated 1000 (5000 for the variance estimation) data sets with 500 observations each (after truncation). For each of the 1000 data sets, we calculated τ̂_c1 and ${τ̂}_{c}^{*}$ , along with their analytical variances. The variance for ${τ̂}_{c}^{*}$ was given by Martin and Betensky (2005). In these simulations, the support of C − T contains the support of X − T so that there are no residual effects of censoring after inverse probability weighting.
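The data-generating mechanism described above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' simulation code; the function name, argument names and the use of a seeded generator are assumptions.

```python
import math
import random


def simulate_residual_censoring(n, rho, lam, seed=0):
    """Draw n left truncated, right censored observations (t, y, delta)."""
    rng = random.Random(seed)
    t, y, delta = [], [], []
    while len(t) < n:
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        ti = z1                                               # T ~ N(0, 1)
        xi = 1.0 + rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # X ~ N(1, 1), corr(T, X) = rho
        if not ti < xi:
            continue  # left truncation: only pairs with T < X are observed
        ci = ti + rng.expovariate(lam)  # C = T + D with D ~ exponential(rate lam)
        t.append(ti)
        y.append(min(xi, ci))
        delta.append(1 if xi < ci else 0)
    return t, y, delta
```

Here `n` counts observations retained after truncation, matching the description of 500 observations per data set; the rate `lam` would be tuned by trial to reach a target censoring proportion.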

Table 1 displays the true value of τ_c, estimated from the simulated data prior to censoring, the averages across the data sets of the estimators, τ̂_c1 and ${τ̂}_{c}^{*}$ , and their analytical variances, along with their empirical variances and estimated power for the test of quasi-independence. It is apparent that under this model, the standard estimator, ${τ̂}_{c}^{*}$ , displays substantial bias toward the null. This bias appears to increase with distance from the null. Our new estimator, τ̂_c1, exhibits much smaller bias. As expected, there is more bias under heavy censoring than under light censoring. The analytic variances of our new estimator are quite close to their empirical variances, with small positive bias, confirming the quality and conservativeness of our variance estimators. Under heavy censoring, the variances of our new estimator are roughly twice those of ${τ̂}_{c}^{*}$ ; this is likely due to the additional variability in our estimator due to the inverse weighting by Q̂ (·). This translates into slightly higher power for the test of the null of quasi-independence that is based on ${τ̂}_{c}^{*}$ .

Table 1.

Consistency results for τ̂_c1 for censoring probabilities of 70% and 20% for N = 500. Each average was taken over 1000 repetitions (5000 for variances for 70% censoring). The column τ̂_c1 represents the average value of τ̂_c1 and the column ${τ̂}_{c}^{*}$ represents the average value of ${τ̂}_{c}^{*}$ . The column σ̂²(τ̂_c1) represents the average analytical variance of τ̂_c1, just as ${σ̂}^{2} ({τ̂}_{c}^{*})$ represents the average analytical variance of ${τ̂}_{c}^{*}$ . P = Power; EV = empirical variance

Censoring | τ_c | τ̂_c1 | ${τ̂}_{c}^{*}$ | σ̂²(τ̂_c1) | EV(τ̂_c1) | ${σ̂}^{2} ({τ̂}_{c}^{*})$ | $EV ({τ̂}_{c}^{*})$ | P(τ̂_c1) | $P ({τ̂}_{c}^{*})$
0.70 | −0.79 | −0.791 | −0.744 | 0.0073 | 0.0049 | 1.000
0.70 | −0.38 | −0.381 | −0.345 | 0.0072 | 0.0070 | 0.0047 | 0.0044 | 1.000
0.70 | −0.33 | −0.318 | −0.257 | 0.0072 | 0.0070 | 0.0029 | 0.0026 | 0.99 | 0.997
0.70 | −0.22 | −0.218 | −0.174 | 0.0071 | 0.0069 | 0.0028 | 0.0025 | 0.816 | 0.907
0.70 | −0.12 | −0.116 | −0.091 | 0.0070 | 0.0067 | 0.0025 | 0.291 | 0.395
0.70 | −0.018 | −0.014 | 0.0055 | 0.0050 | 0.0025 | 0.061 | 0.07
0.70 | 0.000 | −0.002 | −0.005 | 0.0055 | 0.0048 | 0.0023 | 0.0024 | 0.049 | 0.051
0.70 | 0.019 | 0.018 | 0.015 | 0.0054 | 0.0047 | 0.0024 | 0.072 | 0.066
0.70 | 0.10 | 0.103 | 0.082 | 0.0053 | 0.0024 | 0.0023 | 0.37 | 0.378
0.70 | 0.17 | 0.172 | 0.142 | 0.0054 | 0.0045 | 0.0023 | 0.0024 | 0.719 | 0.801
0.70 | 0.24 | 0.237 | 0.205 | 0.0055 | 0.0049 | 0.0025 | 0.922 | 0.982
0.70 | 0.44 | 0.436 | 0.403 | 0.0056 | 0.0051 | 0.0029 | 0.0027 | 1.000
0.70 | 0.66 | 0.678 | 0.642 | 0.0064 | 0.0060 | 0.0034 | 0.0033 | 1.000
0.20 | −0.79 | −0.790 | −0.777 | 0.0019 | 0.0018 | 0.0016 | 0.0015 | 1.000
0.20 | −0.38 | −0.379 | −0.365 | 0.0015 | 0.0016 | 0.0012 | 0.0011 | 1.000
0.20 | −0.33 | −0.319 | −0.313 | 0.0010 | 0.0009 | 0.0010 | 1.000
0.20 | −0.22 | −0.222 | −0.217 | 0.0011 | 0.0010 | 0.0011 | 0.0010 | 1.000
0.20 | −0.12 | −0.118 | −0.116 | 0.0010 | 0.0011 | 0.0010 | 0.963 | 0.95
0.20 | −0.018 | 0.0010 | 0.088 | 0.091
0.20 | 0.000 | 0.001 | −0.002 | 0.0011 | 0.0010 | 0.0011 | 0.051 | 0.050
0.20 | 0.019 | 0.017 | 0.0010 | 0.0011 | 0.0010 | 0.0011 | 0.101 | 0.096
0.20 | 0.10 | 0.103 | 0.100 | 0.0009 | 0.0010 | 0.92 | 0.897
0.20 | 0.17 | 0.173 | 0.170 | 0.0009 | 0.0008 | 0.0009 | 1.000
0.20 | 0.24 | 0.238 | 0.235 | 0.0009 | 0.0007 | 0.0009 | 0.0008 | 1.000
0.20 | 0.44 | 0.441 | 0.403 | 0.0008 | 0.0007 | 0.0006 | 1.000


Under this normal copula model, an alternative measure of dependence between T and X is given by τ = 2 arcsin (ρ) /π. As explained in Section 1, τ and τ_c are not comparable measures of dependence, and the copula based τ is not the target of our estimation. Nonetheless, it is interesting to note that τ and τ_c are close when the copula model is correctly specified; they differ by at most 0.10.
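The copula-based value τ = 2 arcsin(ρ)/π is straightforward to evaluate for any correlation used in the simulations; the short Python sketch below does so. The function name `normal_copula_tau` is hypothetical.

```python
import math


def normal_copula_tau(rho):
    """Kendall's tau implied by a normal copula with correlation rho."""
    return 2.0 * math.asin(rho) / math.pi
```

For example, ρ = 0.5 gives τ = 1/3, since arcsin(0.5) = π/6.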

5.2. Simulations: independent censoring model

For investigation of the independent censoring model, we generated T and X as for the residual censoring model. We generated C ~ N(μ_c, 1), independently of T and X, and retained only those observations that satisfied T < min(X,C), thereby ensuring that (T,X) ⊥ C|T < C, T < X. We varied the mean parameter, μ_c, to again create censoring probabilities of 70% and 20%. In each simulation, we generated 1000 data sets (5000 for variance estimation) with 500 observations each (after truncation). For each of the 1000 data sets, we calculated τ̂_c2, τ̂_c3, and ${τ̂}_{c}^{*}$ , along with their analytical variances. In these simulations, the support of C contains the support of X so that there are no residual effects of censoring after inverse probability weighting.
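A Python sketch of this second data-generating mechanism is given below. It is illustrative only; the function and argument names are assumptions, as is the use of a seeded generator.

```python
import math
import random


def simulate_independent_censoring(n, rho, mu_c, seed=0):
    """Draw n observations (t, y, delta) with C ~ N(mu_c, 1) independent of (T, X)."""
    rng = random.Random(seed)
    t, y, delta = [], [], []
    while len(t) < n:
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        ti = z1                                               # T ~ N(0, 1)
        xi = 1.0 + rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # X ~ N(1, 1), corr(T, X) = rho
        ci = rng.gauss(mu_c, 1.0)                             # C drawn independently of (T, X)
        if not ti < min(xi, ci):
            continue  # sampling requires T < min(X, C)
        t.append(ti)
        y.append(min(xi, ci))
        delta.append(1 if xi < ci else 0)
    return t, y, delta
```

Unlike the residual censoring sketch, C is drawn without reference to T, so the condition T < C is enforced by the sampling step rather than by construction.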

Table 2 displays the true value of τ_c, estimated from the simulated data, prior to censoring, the averages across the data sets of the estimators, τ̂_c2 and ${τ̂}_{c}^{*}$ , and their analytical variances, along with their empirical variances and estimated power for the test of quasi-independence. It is apparent that under this model, in contrast to the residual censoring model, there is substantial bias away from the null in the standard estimator, ${τ̂}_{c}^{*}$ . This bias appears to increase with distance from the null. Our new estimator exhibits much smaller bias. As expected, there is more bias under heavy censoring than under light censoring. The analytic variances are quite close to the empirical variances, with small positive bias in some cases, confirming the quality and conservatism of our variance estimators. In contrast to the results of the simulation for the residual censoring model, in this simulation, the variances of the two estimators are comparable in magnitude, even though the variance estimator for τ̂_c2 involves Ŝ_C(·). There remains slightly higher power for the test of the null of quasi-independence that is based on ${τ̂}_{c}^{*}$ than the test based on τ̂_c2; under this model this is due to the positive bias of ${τ̂}_{c}^{*}$ away from the null, rather than a smaller variance.

Table 2.

Consistency results for τ̂_c2 for censoring probabilities of 70% and 20% for N = 500. Each average was taken over 1000 repetitions (5000 for variances for 70% censoring). The column τ̂_c2 represents the average value of τ̂_c2 and the column ${τ̂}_{c}^{*}$ represents the average value of ${τ̂}_{c}^{*}$ . The column σ̂²(τ̂_c2) represents the average analytical variance of τ̂_c2, just as ${σ̂}^{2} ({τ̂}_{c}^{*})$ represents the average analytical variance of ${τ̂}_{c}^{*}$ . P = Power; EV = empirical variance

Censoring

τ_c

τ̂_c2

{τ̂}_{c}^{*}

σ̂²(τ̂_c2)

EV(τ̂_c2)

{σ̂}^{2} ({τ̂}_{c}^{*})

E V ({τ̂}_{c}^{*})

P(τ̂_c2)

P ({τ̂}_{c}^{*})

0.70

−0.79

−0.788

−0.802

0.0049

0.0047

0.0043

1.000

−0.38

−0.377

−0.412

0.0048

0.0045

0.0047

0.0044

1.000

−0.33

−0.326

−0.357

0.0029

0.0031

0.0029

1.000

−0.22

−0.228

−0.250

0.0028

0.0026

0.0030

0.995

0.999

−0.12

−0.126

−0.136

0.0024

0.0022

0.0025

0.0024

0.715

0.775

−0.018

−0.019

−0.020

0.0026

0.0023

0.0025

0.0024

0.077

0.066

0.00

−0.001

−0.002

0.0026

0.0020

0.0025

0.0024

0.050

0.049

0.019

0.018

0.019

0.0023

0.0021

0.0024

0.0023

0.089

0.071

0.10

0.104

0.114

0.0022

0.0021

0.0023

0.0022

0.610

0.646

0.17

0.183

0.200

0.0022

0.0019

0.0023

0.0021

0.936

0.991

0.25

0.251

0.272

0.0025

0.0023

0.0024

0.0021

0.992

1.000

0.44

0.434

0.463

0.0036

0.0034

0.0037

0.0036

1.000

0.66

0.658

0.671

0.0042

0.0041

0.0038

0.0040

1.000

0.20

−0.79

−0.791

−0.801

0.0006

0.0007

0.0008

0.0007

1.000

−0.38

−0.379

−0.393

0.0008

0.0007

0.0009

1.000

−0.33

−0.321

−0.339

0.0008

0.0007

0.0009

1.000

−0.22

−0.222

−0.233

0.0009

0.0010

1.000

−0.12

−0.121

−0.126

0.0009

0.0010

0.963

0.95

−0.018

−0.017

−0.019

0.0009

0.0010

0.088

0.091

0.00

−0.002

−0.003

0.0008

0.0010

0.0011

0.050

0.052

0.019

0.020

0.0009

0.0010

0.101

0.096

0.10

0.103

0.105

0.0008

0.0009

0.0010

0.92

0.897

0.17

0.173

0.180

0.0008

0.0007

0.0008

1.000

0.25

0.238

0.246

0.0008

0.0007

0.0008

1.000

0.44

0.442

0.451

0.0007

0.0006

0.0005

1.000


Tables 3 and 4 display the true value of τ_c, estimated from the simulated data prior to censoring, the averages across the data sets of the estimators, τ̂_c3 and ${τ̂}_{c}^{†}$ , and their analytical variances, along with their empirical variances and estimated power for the test of quasi-independence. Table 3 contains results for sample sizes of 500, and Table 4 contains results for sample sizes of 1000. In Table 3, it is apparent that τ̂_c3 is less biased than ${τ̂}_{c}^{†}$ , especially under heavy censoring. This behavior likely reflects the loss of information from the restriction to the uncensored observations, and is confirmed in Table 4, which shows diminished biases for both estimators with sample sizes of 1000.

Table 3.

Consistency results for τ̂_c3 for censoring probabilities of 70% and 20% for N = 500. Each average was taken over 1000 repetitions (5000 for variances for 70% censoring). The column τ̂_c3 represents the average value of τ̂_c3 and the column ${τ̂}_{c}^{†}$ represents the average value of ${τ̂}_{c}^{†}$ . The column σ̂²(τ̂_c3) represents the average analytical variance of τ̂_c3, just as ${σ̂}^{2} ({τ̂}_{c}^{†})$ represents the average analytical variance of ${τ̂}_{c}^{†}$ . P = Power; EV = empirical variance

Censoring

τ_c

τ̂_c3

{τ̂}_{c}^{†}

σ̂²(τ̂_c3)

EV(τ̂_c3)

{σ̂}^{2} ({τ̂}_{c}^{†})

E V ({τ̂}_{c}^{†})

P(τ̂_c3)

P ({τ̂}_{c}^{†})

0.70

−0.32

−0.282

−0.198

0.0112

0.0125

0.0035

0.0028

0.892

0.953

−0.22

−0.188

−0.132

0.0099

0.0128

0.0029

0.0027

0.736

0.699

−0.12

−0.089

−0.060

0.0083

0.0093

0.0027

0.0029

0.463

0.282

−0.018

−0.012

−0.009

0.0066

0.0078

0.0027

0.0029

0.074

0.058

0.000

0.001

−0.004

0.0065

0.0073

0.0026

0.0025

0.047

0.051

0.019

0.021

0.009

0.0069

0.0073

0.0025

0.0027

0.077

0.067

0.10

0.112

0.067

0.0084

0.0099

0.0023

0.0025

0.601

0.302

0.17

0.167

0.115

0.0110

0.0134

0.0023

0.0025

0.798

0.687

0.25

0.240

0.169

0.0124

0.0133

0.0025

0.0023

0.995

0.934

0.20

−0.33

−0.315

−0.271

0.0010

0.0011

0.0010

0.0009

0.994

1.000

−0.22

−0.218

−0.183

0.0018

0.0016

0.0011

0.0010

0.987

1.000

−0.12

−0.119

−0.096

0.0020

0.0018

0.0011

0.958

0.832

−0.018

−0.016

0.0021

0.0018

0.0010

0.088

0.065

0.000

0.001

−0.002

0.0023

0.0020

0.0010

0.0011

0.049

0.019

0.0178

0.013

0.0023

0.0019

0.0010

0.092

0.068

0.10

0.105

0.088

0.0012

0.0013

0.0009

0.888

0.812

0.17

0.172

0.151

0.0013

0.0016

0.0009

0.0008

0.992

0.998

0.24

0.238

0.213

0.0011

0.0012

0.0009

0.0008

1.000


Table 4.

Consistency results for τ̂_c3 for censoring probabilities of 70% and 20% for N = 1000. Each average was taken over 1000 repetitions (5000 for variances for 70% censoring). The column τ̂_c3 represents the average value of τ̂_c3 and the column ${τ̂}_{c}^{†}$ represents the average value of ${τ̂}_{c}^{†}$ . The column σ̂²(τ̂_c3) represents the average analytical variance of τ̂_c3, just as ${σ̂}^{2} ({τ̂}_{c}^{†})$ represents the average analytical variance of ${τ̂}_{c}^{†}$ . P = Power; EV = empirical variance

Censoring

τ_c

τ̂_c3

{τ̂}_{c}^{†}

σ̂²(τ̂_c3)

EV(τ̂_c3)

{σ̂}^{2} ({τ̂}_{c}^{†})

E V ({τ̂}_{c}^{†})

P(τ̂_c3)

P ({τ̂}_{c}^{†})

0.70

−0.32

−0.312

−0.245

0.0101

0.0105

0.0027

0.0020

0.923

1.000

−0.22

−0.205

−0.187

0.0089

0.0101

0.0022

0.0020

0.875

0.712

−0.12

−0.119

−0.072

0.0077

0.0088

0.0019

0.0021

0.513

0.358

−0.018

−0.017

−0.010

0.0056

0.0059

0.0021

0.068

0.071

0.000

0.001

−0.001

0.0053

0.0055

0.0020

0.0021

0.050

0.019

0.020

0.010

0.0065

0.0064

0.0020

0.089

0.078

0.10

0.109

0.077

0.0081

0.0091

0.0019

0.0021

0.623

0.355

0.17

0.169

0.123

0.010

0.0116

0.0019

0.0020

0.888

0.772

0.25

0.244

0.182

0.0118

0.0122

0.0020

0.0018

1.0000

1.000

0.20

−0.33

−0.328

−0.335

0.0010

0.0011

0.0008

0.0007

0.994

1.000

−0.22

−0.218

−0.231

0.0017

0.0016

0.0009

0.987

1.000

−0.12

−0.119

−0.124

0.0019

0.0018

0.0009

0.0008

0.958

0.920

−0.018

−0.019

0.0029

0.0018

0.0007

0.0008

0.088

0.091

0.000

0.0024

0.0021

0.0008

0.050

0.049

0.019

0.0185

0.021

0.0020

0.0019

0.0007

0.0008

0.092

0.091

0.10

0.102

0.104

0.0011

0.0008

0.888

0.911

0.17

0.169

0.177

0.0011

0.0012

0.0008

0.0007

0.992

1.000

0.24

0.239

0.242

0.0011

0.0010

0.0006

1.000


5.3. AIDS incubation study

An AIDS incubation cohort study of HIV positive subjects with AIDS was conducted to estimate the distribution of the time from HIV infection to AIDS onset in individuals who were infected by contaminated blood transfusions (Lagakos et al., 1988). The failure time is the lag time between HIV infection and AIDS onset, which was observed only if it was less than the lag time between HIV infection and study recruitment. Thus, the lag time between HIV infection and AIDS onset was right truncated by the lag time between HIV infection and the end of recruitment. Specifically, let T denote the months from January 1978 to the date of HIV infection (i.e., time of blood transfusion) and let X denote the months from HIV infection to the date of AIDS onset. The study ended in July 1986, and so subjects were included only if they were diagnosed with AIDS prior to July 1986, i.e., if X < 102 − T. This sampling represents right truncation, and so we apply a time reversal to transform this into left truncation; that is, let X* = 102 − X so that the sampling is defined by T < X*. Since there is no censoring in this study, the standard estimator of τ_c, τ̂_c (Tsai, 1990; Martin and Betensky, 2005), is consistent, even under dependence. This offers us the opportunity to evaluate our estimators in comparison to the standard estimator in the presence of censoring, which we artificially impose, since we know the true value of τ_c, up to random variation due to the finite sample size. From the original data set, we calculate ${τ̂}_{c}^{*} = 0.322$ . To impose censoring, we assumed the residual censoring model, and simulated C − T as exponential with rate parameter, λ, and varied λ to obtain censoring probabilities of 0.25, 0.50, and 0.75. For each censoring probability, we simulated 200 new data sets, and for each data set we calculated τ̂_c1 and ${τ̂}_{c}^{*}$ . Table 5 lists the results of our simulation; these values are to be compared to the “true” value of 0.322.
As expected, the original estimator, ${τ̂}_{c}^{*}$ , is quite biased in the presence of censoring; at 50% censoring it is 0.398 and at 75% censoring it is 0.497. In contrast, our new estimator, τ̂_c1, is, at worst, 0.324 at 75% censoring.
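The time reversal and the artificial residual censoring described in this subsection can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: all function names are hypothetical, and the bisection on the sample-conditional censoring probability is one plausible way to choose λ for a target censoring level, since the text does not say how λ was varied.

```python
import numpy as np

HORIZON = 102.0  # months from January 1978 to July 1986, as in the text

def reverse_time(x, horizon=HORIZON):
    """Time reversal X* = 102 - X, turning right truncation into left truncation."""
    return horizon - np.asarray(x, dtype=float)

def expected_censoring_fraction(t, x_star, lam):
    """Mean of P(C - T < X* - T) = 1 - exp(-lam (X* - T)) over the sample.

    ASSUMPTION: C - T is exponential with rate lam, drawn independently of the data.
    """
    resid = np.asarray(x_star, dtype=float) - np.asarray(t, dtype=float)
    return float(np.mean(1.0 - np.exp(-lam * resid)))

def calibrate_rate(t, x_star, target, lo=1e-8, hi=1e3, iters=200):
    """Bisection for the rate lam giving the target censoring probability."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_censoring_fraction(t, x_star, mid) < target:
            lo = mid   # too little censoring: increase the rate
        else:
            hi = mid
    return 0.5 * (lo + hi)

def impose_residual_censoring(t, x_star, lam, rng):
    """Draw C = T + Exp(rate lam) and return the observed (Y, delta)."""
    t = np.asarray(t, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    c = t + rng.exponential(scale=1.0 / lam, size=t.shape[0])
    y = np.minimum(x_star, c)
    delta = (x_star <= c).astype(int)   # 1 = failure observed, 0 = censored
    return y, delta
```

With `lam = calibrate_rate(t, x_star, 0.50)`, repeated calls to `impose_residual_censoring` produce censored copies of the same (T, X*) sample, on which any pair of estimators can then be averaged as in Table 5.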

Table 5.

This table shows the average value of the estimate for τ_c for two different estimators at 3 levels of censoring. Each average was taken over 200 repetitions.

| P(C < X* \| X* > T) | τ̂_c1 | ${τ̂}_{c}^{*}$ |
|---|---|---|
| 0.75 | 0.324 | 0.497 |
| 0.50 | 0.322 | 0.398 |
| 0.25 | 0.323 | 0.351 |


5.4. Channing House study

We illustrate the use of our inverse probability weighted U statistic on the Channing House data set. This study followed 97 male retirees from their entry to the Channing House retirement community in Palo Alto, California until death or end of study or departure from the community. There is a moderate amount of censoring in this study; about 56% of the observations are censored. The value of ${τ̂}_{c}^{*}$ is 0.197 for these data, with associated p-value of 0.041. In comparison, τ̂_c1 is 0.185 with associated p-value of 0.064 and τ̂_c2 is 0.133, with associated p-value of 0.153. The analytical variance of τ̂_c1 is 0.010 and its simple bootstrap variance (2000 bootstrap samples) is 0.009. The analytical variance of τ̂_c2 is 0.009 and its bootstrap variance is 0.010. This provides further support for the accuracy of our analytical variances, even with their substitution of the true distribution functions (Q and S_C) by consistent estimators.
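The simple bootstrap variance used as a check on the analytical variance can be sketched generically as below. The function name and the `estimator` callback are hypothetical; the sketch assumes only that each row of `data` holds one subject (for example, a (T, Y, δ) triple) and that subjects are resampled with replacement.

```python
import numpy as np

def bootstrap_variance(estimator, data, n_boot=2000, rng=None):
    """Simple (nonparametric) bootstrap variance of estimator(data).

    ASSUMPTION: rows of `data` are independent subjects; `estimator`
    maps a resampled array of rows to a single number.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    n = data.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample subjects with replacement
        reps[b] = estimator(data[idx])
    return float(reps.var(ddof=1))
```

Passing a function that computes a Kendall's tau estimate from the rows, with `n_boot=2000`, mirrors the comparison reported above between analytical and bootstrap variances.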

It is noteworthy that there is a qualitative difference between the assumption of the residual censoring model and the independent censoring model with respect to the important assessment of quasi-independence. As censoring does not have any real meaning prior to entry to the retirement home, the residual censoring model, for which P(T < C) = 1, seems most appropriate. This choice raises the interesting question of how to estimate the survival distribution given the near-rejection of the null hypothesis of quasi-independence; we address this in Austin and Betensky (2012).

6. Discussion

Quasi-independence of failure and truncation is a prerequisite for application of straightforward estimation and regression procedures for truncated and censored data. At the first level of analysis, a test of quasi-independence is of primary interest. For this purpose, the simple ${τ̂}_{c}^{*}$ estimator is valid. If this test rejects the null hypothesis of quasi-independence, a second level of analysis is required. This involves some approach for modeling and adjusting for the dependence in order to obtain an estimate for the failure time distribution. For example, a function of an estimate of Kendall’s tau might be used in a copula model for failure and truncation (Chaieb et al., 2006) or in a transformation model (Efron and Petrosian, 1994; Austin and Betensky, 2012). At this level of analysis, the simple ${τ̂}_{c}^{*}$ is not consistent for the true τ_c, as it converges to a quantity that depends on the underlying censoring distribution. A similar result was found by Uno et al. (2009) for Kendall’s tau for censored but non-truncated data. We have addressed this problem by proposing two new estimators for τ_c that are consistent away from the null and that apply to two commonly assumed models for dependent truncation. We also considered a third estimator, but found it to have poor large sample behavior due to its restriction to uncensored observations. While our proposed estimators do depend on the marginal distributions of failure and truncation, they are useful within the context of transformation models for failure in the presence of dependent truncation (Efron and Petrosian, 1994; Austin and Betensky, 2012). Interestingly, we found the original estimators to have greater power for detecting deviations from quasi-independence. This was due to their smaller variances in the case of the residual censoring model and to their greater bias in the case of the independent censoring model.
Thus, it appears that if these Kendall’s tau measures of association are of interest entirely for the purpose of hypothesis testing, then the original estimators may suffice. Finally, we note that our simulations only considered cases in which the support of the (residual) censoring distribution contained that of the (residual) failure time distribution; in cases in which this does not occur, even our inverse probability weighted estimators will be functions of the associated censoring distributions.

Acknowledgments

This research was supported in part by NIH grants T32NS048005 and R01CA075971.

Appendix A. Appendix

We first show that under quasi-independence, the assumptions of the residual censoring model permit use of the risk set adjusted Kaplan Meier estimator. For simplicity, we express the likelihood contributions for discrete random variables:

$$
\begin{aligned}
H(T, X, C) &= P(T = t, X = x, C = c \mid X > T) \\
&= P(T = t, X - T = x - t, C - T = c - t \mid X > T) \\
&= P(T = t \mid X > T) \times P(X - T = x - t, C - T = c - t \mid T = t, X > T) \\
&= P(T = t \mid X > T)\, P(X - T = x - t \mid T = t, X > T) \times P(C - T = c - t \mid T = t, X > T) \\
&\propto P(T = t \mid X > T)\, P(X = x \mid X > t)\, P(C - T = c - t).
\end{aligned}
$$

It is evident that quasi-independence allows for use of the conditional likelihood component for X for estimation of F_X(·) via standard survival analysis estimation procedures.

We now show that under H₀ : X ⊥ T|T < X, and the remainder of the residual censoring model assumptions, $τ_{c}^{*} = 0$ :

$$
\begin{aligned}
τ_{c}^{*} &\propto E\left(\operatorname{sgn}\left((T_{1} - T_{2})(Y_{1} - Y_{2})\right) I(Λ_{12})\right) \\
&= P(X_{1} < C_{1}, T_{1} < T_{2}, X_{1} < Y_{2}) - P(X_{2} < C_{2}, T_{1} < T_{2}, X_{2} < Y_{1}) \\
&\quad + P(X_{2} < C_{2}, T_{2} < T_{1}, X_{2} < Y_{1}) - P(X_{1} < C_{1}, T_{2} < T_{1}, X_{1} < Y_{2}).
\end{aligned}
$$

Under the assumptions X − T ⊥ C − T|T, C − T ⊥ T, and P(T < C) = 1, along with quasi-independence, it follows that all four terms in $τ_{c}^{*}$ are equivalent, so that they cancel and hence $τ_{c}^{*} = 0$ . We first note that the first and third terms and the second and fourth terms are trivially equivalent to each other, simply by relabeling the indices. Defining $G (t, u) = \int_{c = u}^{\infty} g (t, c) d c$ , with g(t, c) = P(T = t,C = c), we can express the first term as

$$
P(X_{1} < C_{1}, T_{1} < T_{2}, X_{1} < Y_{2}) \propto \int_{t = 0}^{\infty} \int_{s = t}^{\infty} \int_{u = s}^{\infty} f_{X}(u)\, S_{X}(u)\, G(t, u)\, G(s, u)\, du\, ds\, dt .
$$

It is easy to see that the second term has exactly this same expression.
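For readers who wish to check this null result numerically, the sample analogue of E(sgn((T₁ − T₂)(Y₁ − Y₂)) I(Λ₁₂)), normalized by the number of usable pairs, can be sketched as below. The definition of Λ₁₂ is not restated in this appendix, so the sketch assumes the usual one from the conditional Kendall's tau literature: a pair is comparable when max(T_i, T_j) ≤ min(Y_i, Y_j) and orderable when the smaller of the two observed times is an uncensored failure. The function name is hypothetical.

```python
import numpy as np

def conditional_kendall_tau(t, y, delta):
    """Conditional Kendall's tau over comparable, orderable pairs.

    ASSUMPTION: pair (i, j) is used when max(t_i, t_j) <= min(y_i, y_j)
    and the smaller observed time is a failure (delta = 1); for tied
    observed times both must be failures.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    num, den = 0.0, 0
    n = len(t)
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparable = max(t[i], t[j]) <= min(y[i], y[j])
            if y[i] < y[j]:
                orderable = delta[i] == 1
            elif y[j] < y[i]:
                orderable = delta[j] == 1
            else:
                orderable = delta[i] == 1 and delta[j] == 1
            if comparable and orderable:
                num += np.sign((t[i] - t[j]) * (y[i] - y[j]))
                den += 1
    return num / den if den > 0 else float("nan")
```

Applied to data simulated under quasi-independence and the residual censoring assumptions, the returned value should fluctuate around zero, in line with the argument above. The double loop is quadratic in the sample size, which is adequate for samples of a few hundred observations.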

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

Austin M, Betensky R. Estimating a survival distribution in the presence of dependent left truncation and right censoring. 2012 Unpublished Manuscript.
Beaudoin D, Duchesne T, Genest C. Improving the estimation of Kendall’s tau when censoring affects only one of the variables. Computational Statistics & Data Analysis. 2007;51(12):5743–5764.
Chaieb L, Rivest L, Abdous B. Estimating survival under a dependent truncation. Biometrika. 2006;93(3):655.
Datta S, Bandyopadhyay D, Satten GA. Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scandinavian Journal of Statistics. 2010;37:680–700.
De Uña-Álvarez J. Nonparametric estimation under length-biased sampling and Type I censoring: a moment based approach. Annals of the Institute of Statistical Mathematics. 2004;56(4):667–681.
Efron B, Petrosian V. Survival Analysis of the Gamma-Ray Burst Data. Journal of the American Statistical Association. 1994;89(426)
Hyde J. Testing survival under right censoring and left truncation. Biometrika. 1977;64(2):225.
Kaplan E, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53(282):457–481.
Kendall M. A new measure of rank correlation. Biometrika. 1938;30(1–2):81.
Lagakos S, Barraj L, Gruttola V. Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika. 1988;75(3):515.
Lakhal L, Rivest L, Beaudoin D. IPCW Estimator for Kendall’s tau under bivariate censoring. The International Journal of Biostatistics. 2009;5:1–20.
Mandel M, Betensky R. Simultaneous confidence intervals based on the percentile bootstrap approach. Computational Statistics & Data Analysis. 2008;52(4):2158–2165. doi: 10.1016/j.csda.2007.07.005.
Martin E, Betensky R. Testing quasi-independence of failure and truncation times via conditional Kendall’s tau. Journal of the American Statistical Association. 2005;100(470):484–492.
Oakes D. On consistency of Kendall’s tau under censoring. Biometrika. 2008;95(4):997.
Randles R, Wolfe D. Introduction to the theory of nonparametric statistics. New York: Wiley; 1979.
Sun L, Zhu L. A semiparametric model for truncated and censored data. Statistics & Probability Letters. 2000;48(3):217–227.
Tsai W. Testing the assumption of independence of truncation time and failure time. Biometrika. 1990;77(1):169.
Tsiatis A. A large sample study of Cox’s regression model. The Annals of Statistics. 1981;9(1):93–108.
Turnbull B. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society. Series B (Methodological) 1976;38(3):290–295.
Uno H, Cai T, Pencina M, D’Agostino R, Wei L. On The C-Statistics For Evaluating Overall Adequacy Of Risk Prediction Procedures With Censored Survival Data. Harvard University Biostatistics Working Paper Series. 2009:101. doi: 10.1002/sim.4154.
Vardi M, Stockmeyer L. Improved upper and lower bounds for modal logics of programs. Proceedings of the seventeenth annual ACM symposium on Theory of computing; ACM; 1985. pp. 240–251.
Wang M. A semiparametric model for randomly truncated data. Journal of the American Statistical Association. 1989;84(407):742–748.
Wang M. Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association. 1991;86(413):130–143.
Woodroofe M. Estimating a distribution function with truncated data. The Annals of Statistics. 1985;13(1):163–177.
