A flexible and robust method for assessing conditional association and conditional concordance

Xiangyu Liu; Jing Ning; Yu Cheng; Xuelin Huang; Ruosha Li

doi:10.1002/sim.8202

Author manuscript; available in PMC: 2020 Feb 27.

Published in final edited form as: Stat Med. 2019 May 9;38(19):3656–3668. doi: 10.1002/sim.8202

PMCID: PMC7045600 NIHMSID: NIHMS1561358 PMID: 31074082

Abstract

When analyzing bivariate outcome data, it is often of scientific interest to measure and estimate the association between the bivariate outcomes. In the presence of influential covariates for one or both of the outcomes, conditional association measures can quantify the strength of association without the disturbance of the marginal covariate effects, to provide cleaner and less-confounded insights into the bivariate association. In this work, we propose estimation and inferential procedures for assessing the conditional Kendall’s tau coefficient given the covariates, by adopting the quantile regression and quantile copula framework to handle marginal covariate effects. The proposed method can flexibly accommodate right censoring and be readily applied to bivariate survival data. It also facilitates an estimator of the conditional concordance measure, namely, a conditional C index, where the unconditional C index is commonly used to assess the predictive capacity for survival outcomes. The proposed method is flexible and robust and can be easily implemented using standard software. The method performed satisfactorily in extensive simulation studies with and without censoring. Application of our methods to two real-life data examples demonstrates their desirable practical utility.

Keywords: association measure, bivariate outcomes, C index, Kendall’s tau, predictive capacity, quantile regression

1 | INTRODUCTION

Bivariate data and bivariate survival data arise in various biomedical fields, where it is often of interest to examine the association between the two outcomes. Summary measures have been proposed in the literature to quantify the strength of association, including but not limited to Pearson’s product moment correlation coefficient, Spearman’s rho, and Kendall’s tau. In many practical scenarios, one or both of the bivariate outcomes can be skewed or contaminated by outliers. Moreover, the relationship between the two outcomes may not be linear but rather follow a curvature pattern. Under these circumstances, the latter two rank-based association measures are more appropriate, because they are robust to outliers and remain unchanged under monotonic transformations. In this article, we focus on Kendall’s tau coefficient, a widely used rank-based association measure that represents the difference between the concordance probability and the discordance probability for paired observations.

Understanding whether and in what magnitude the covariates change the associational strength between bivariate outcomes is an essential question in association studies. When one or more covariates are associated with the bivariate outcomes, confounding often occurs during the evaluation of the association between the two outcomes. Correlated random variables might be conditionally independent after covariate adjustments if their dependence is fully induced by the covariates. Conditional dependence is also very important as it implies a more inherent association that cannot be well explained by the observed covariates. In one of our motivating examples, we have data on the birth weights of monozygotic and dizygotic twins, plus a list of patient characteristics. It is important to examine whether there is an association, and if so, how much of the association between the twins’ birth weights is attributable to their gender, gestational age, and the medical institution. Conditional association measures, such as the conditional Kendall’s tau, can serve as a useful tool to dissect the bivariate association in the presence of covariates and to gain cleaner and more comprehensive insights.

The literature offers some approaches to estimate the conditional Kendall’s tau for uncensored complete data. For example, Veraverbeke et al estimated the conditional Kendall’s tau given a single covariate using nonparametric kernel smoothing methods.¹ Gijbels et al proposed an empirical copula estimator after adjusting for the covariate effects on the marginal distributions, where the effects can be specified as location-scale regression models.² Ji et al used a generalized Kendall’s tau to test the conditional independence between bivariate outcomes.³

Censoring occurs for various reasons, including limited follow-up for bivariate survival data, or detection limits for biomarker measurements. Estimating the association measures becomes more challenging in the presence of censoring. For Kendall’s tau, the vast majority of the literature handles censoring without considering covariate adjustment. Oakes proposed an estimator of Kendall’s tau by ignoring the bivariate pairs that have indeterminate ranking relations.⁴ Wang and Wells proposed a nonparametric estimator of Kendall’s tau that uses its relationship with the bivariate survival function.⁵ Fan et al defined a finite region version of Kendall’s tau and estimated it by plugging in nonparametric estimators of the hazard function and joint survival function.⁶ Lakhal et al used inverse probability censoring weights for estimation,⁷ and Hsieh used an imputation method to handle the missing information due to censoring.⁸ To accommodate both censoring and covariate adjustments, Fan and Prentice⁹ generalized the estimator of Fan et al⁶ by assuming proportional hazards models for the marginal hazard functions. Choi and Matthews proposed an estimator of Kendall’s tau that combines a parametric frailty model and an accelerated failure-time model.¹⁰ However, these estimators require relatively strong assumptions regarding the marginal models, which may not hold in many applications.

In this work, we propose a flexible and robust model and an easily implemented method to estimate the conditional Kendall’s tau, by adopting the frameworks from quantile regression¹¹ and quantile association.¹² Quantile regression is an appealing alternative to classical regression models, such as linear regression and Cox proportional hazards models, for several reasons. First, the model assumption is quite flexible, allowing all the covariates to exert varying effects on different quantile levels. Second, quantile regression poses minimal assumptions and is more robust to potential outliers. Third, just like Kendall’s tau, quantile regression is equivariant to monotone transformations. The recent development of quantile regression in survival analysis¹³,¹⁴ offers convenient tools for estimating marginal quantiles for censored outcomes.

The rest of this paper is organized as follows. In Section 2, we introduce the proposed semiparametric methods to estimate the conditional Kendall’s tau with and without considering right censoring. Our methods do not require any parametric assumption regarding the form of the underlying copula and therefore provide a robust model. For survival outcomes, we present the estimation of a conditional restricted-range C index, which reflects the predictive capacity of one outcome for the other outcome after controlling for covariates. In Section 3, we investigate the finite sample properties of the proposed estimators through simulations under various settings. We apply our methods to the analysis of two real datasets in Section 4. Some discussion and final remarks are provided in Section 5.

2 | METHOD

2.1 | Conditional Kendall’s tau with complete and censored bivariate outcomes

Consider the bivariate outcome (T₁, T₂), where T₁ and T₂ are two continuous random variables. Let F_j(t_j) = P(T_j ≤ t_j) denote the marginal cumulative distribution function (CDF) of T_j, where j = 1, 2. Kendall’s tau (τ) is equal to the probability that two pairs of observations are concordant minus the probability that they are discordant, namely,

$$\tau = P\{(T_{1a} - T_{1b})(T_{2a} - T_{2b}) > 0\} - P\{(T_{1a} - T_{1b})(T_{2a} - T_{2b}) < 0\} = E[\operatorname{sign}\{(T_{1a} - T_{1b})(T_{2a} - T_{2b})\}], \tag{1}$$

where $(T_{1a}, T_{2a})$ and $(T_{1b}, T_{2b})$ are two pairs of observations from subjects a and b. Kendall’s tau has a close connection with the copula function $P(u_{1}, u_{2})$. Let $\tilde{U}_{1} = F_{1}(T_{1})$ and $\tilde{U}_{2} = F_{2}(T_{2})$; then the copula of (T₁, T₂) is the joint CDF of $(\tilde{U}_{1}, \tilde{U}_{2})$, defined as

$$P(u_{1}, u_{2}) = P(\tilde{U}_{1} \leq u_{1}, \tilde{U}_{2} \leq u_{2}) = P\{T_{1} \leq Q_{1}(u_{1}), T_{2} \leq Q_{2}(u_{2})\}, \tag{2}$$

where Q_j(u) = inf {t : F_j(t) ≥ u}, j = 1, 2, represents the marginal quantile function. Let $D = [0, 1] \times [0, 1]$ . Schweizer and Wolff¹⁵ and Nelsen¹⁶ showed that

$$\tau = 4 \iint_{D} P(u_{1}, u_{2}) \, dP(u_{1}, u_{2}) - 1. \tag{3}$$
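As a concrete reading of definition (1), the following minimal sketch computes the sample analogue of Kendall’s tau by averaging the sign of the pairwise products over all pairs of subjects. It is an illustration only, assuming complete data without ties; the function name `kendall_tau` is hypothetical and is not part of the proposed method.

```python
import numpy as np

def kendall_tau(t1, t2):
    """Sample analogue of (1): mean sign of (T1a - T1b)(T2a - T2b) over pairs a < b."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    a, b = np.triu_indices(len(t1), k=1)  # index every unordered pair once
    return float(np.mean(np.sign((t1[a] - t1[b]) * (t2[a] - t2[b]))))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
tau_increasing = kendall_tau(x, np.exp(x))  # monotone increasing map: all pairs concordant
tau_decreasing = kendall_tau(x, -x**3)      # monotone decreasing map: all pairs discordant
```

Because only the signs of the pairwise differences enter the average, any strictly increasing transformation of either outcome leaves the value unchanged, which is the invariance property noted in the Introduction.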

In the presence of a (p + 1) × 1 covariate vector Z = (1, Z₁, Z₂, …, Z_p)^⊺, it is possible that part or all of the association between (T₁, T₂) is attributable to Z, and the residual association given Z can be quantified by a conditional Kendall’s tau,¹⁷

$$\tau_{Z} = 4 \iint_{D} P_{Z}(u_{1}, u_{2}) \, dP_{Z}(u_{1}, u_{2}) - 1, \tag{4}$$

where the conditional copula

$$P_{Z}(u_{1}, u_{2}) = P\{F_{1}(T_{1} \mid Z) \leq u_{1}, F_{2}(T_{2} \mid Z) \leq u_{2} \mid Z\} = P(U_{1} \leq u_{1}, U_{2} \leq u_{2} \mid Z). \tag{5}$$

Here, U_j ≡ F_j(T_j | Z), j = 1, 2, corresponds to a standardized version of T_j, where F_j(t_j | Z) = P(T_j ≤ t_j | Z) is the conditional CDF of T_j given Z. In this work, we assume that the underlying copula $P_{Z} (u_{1}, u_{2})$ does not further depend on Z, after adjusting for the covariate effects on the marginal CDFs. Thus, $P_{Z} (u_{1}, u_{2})$ represents the underlying copula for all study subjects. This assumption is commonly used in parametric copula models and was used by Gijbels et al² for empirical estimation of the conditional copula. More discussion of this assumption is deferred to Section 5.

Additional complications arise when the bivariate outcome (T₁, T₂) is subject to right censoring. Let (C₁, C₂) denote the vector of bivariate right censoring times, which are assumed to be independent of (T₁, T₂) given Z. The observed data are $\{(Y_{1i}, Y_{2i}, \delta_{1i}, \delta_{2i}, Z_{i})\}_{i=1}^{n}$, where $Y_{ji} = T_{ji} \wedge C_{ji}$, $\delta_{ji} = I(T_{ji} \leq C_{ji})$, j = 1, 2, ∧ denotes the minimum operator, and I(·) is the indicator function. Due to censoring, the conditional copula $P_{Z}(u_{1}, u_{2})$ may not be identifiable for u₁, u₂ close to 1. Denote the identifiable range as $R = (0, \tau_{U_{1}}] \times (0, \tau_{U_{2}}]$. To assess the conditional association between (T₁, T₂) given Z, we extend the work of Fan et al⁶ and define a restricted quantile range conditional Kendall’s tau as

$$\tau_{Z}(R) = E[\operatorname{sign}\{(U_{1a} - U_{1b})(U_{2a} - U_{2b})\} \mid U_{1a} \wedge U_{1b} \leq \tau_{U_{1}}, U_{2a} \wedge U_{2b} \leq \tau_{U_{2}}], \tag{6}$$

where U_ji = F_j(T_ji | Z_i) for subject i. Moreover, define the conditional survival copula

$$S_{Z}(u_{1}, u_{2}) = P(U_{1} > u_{1}, U_{2} > u_{2} \mid Z). \tag{7}$$

It can be shown that

$$\tau_{Z}(R) = \frac{\iint_{R} S_{Z}(u_{1}^{-}, u_{2}^{-}) \, S_{Z}(du_{1}, du_{2}) - \iint_{R} S_{Z}(u_{1}^{-}, du_{2}) \, S_{Z}(du_{1}, u_{2}^{-})}{\iint_{R} S_{Z}(u_{1}^{-}, u_{2}^{-}) \, S_{Z}(du_{1}, du_{2}) + \iint_{R} S_{Z}(u_{1}^{-}, du_{2}) \, S_{Z}(du_{1}, u_{2}^{-})}. \tag{8}$$
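To make the restriction in definition (6) concrete, the sketch below evaluates its pair-level sample analogue for fully observed conditional ranks (U₁, U₂): only pairs whose smaller rank falls inside the identifiable range contribute. This is an illustration under the assumption that the ranks are known and uncensored; it is not the proposed estimator, which works through the estimated survival copula. The function name `restricted_tau` is hypothetical.

```python
import numpy as np

def restricted_tau(u1, u2, tau_u1, tau_u2):
    """Pair-level analogue of (6): mean sign over pairs with
    min(U1a, U1b) <= tau_u1 and min(U2a, U2b) <= tau_u2."""
    u1 = np.asarray(u1, dtype=float)
    u2 = np.asarray(u2, dtype=float)
    a, b = np.triu_indices(len(u1), k=1)
    keep = (np.minimum(u1[a], u1[b]) <= tau_u1) & (np.minimum(u2[a], u2[b]) <= tau_u2)
    signs = np.sign((u1[a] - u1[b]) * (u2[a] - u2[b]))
    return float(signs[keep].mean())

u1 = np.array([0.10, 0.50, 0.90, 0.95])
u2 = np.array([0.20, 0.40, 0.95, 0.90])
tau_restricted = restricted_tau(u1, u2, 0.8, 0.8)  # drops the one pair lying beyond 0.8
tau_full = restricted_tau(u1, u2, 1.0, 1.0)        # no restriction: ordinary Kendall's tau
```

In this toy example the only discordant pair has both U₁ ranks above 0.8, so it is excluded from the restricted version but counted in the unrestricted one.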

When compared to the standard copula $P$ , the survival copula is more commonly adopted for bivariate survival data. The conditional copula $P_{Z} (u_{1}, u_{2})$ and the conditional survival copula $S_{Z} (u_{1}, u_{2})$ are closely connected and uniquely define each other through the following relationship:

$$S_{Z}(u_{1}, u_{2}) = 1 - u_{1} - u_{2} + P_{Z}(u_{1}, u_{2}). \tag{9}$$

For ease of notation and computation, in the following, we target the $P_{Z} (u_{1}, u_{2})$ for complete data and $S_{Z} (u_{1}, u_{2})$ for bivariate survival data, as the intermediate step for estimating the conditional Kendall’s tau.
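Relationship (9) can be checked numerically with a short sketch. Under the independence copula P(u₁, u₂) = u₁u₂, used here purely as an illustrative assumption, the survival copula should equal (1 − u₁)(1 − u₂). The helper names are hypothetical.

```python
def survival_from_copula(copula, u1, u2):
    """Relationship (9): S(u1, u2) = 1 - u1 - u2 + P(u1, u2)."""
    return 1.0 - u1 - u2 + copula(u1, u2)

def copula_from_survival(survival, u1, u2):
    """Relationship (9) solved for P: P(u1, u2) = S(u1, u2) - 1 + u1 + u2."""
    return survival(u1, u2) - 1.0 + u1 + u2

def independence_copula(u1, u2):
    # Illustrative choice only; any valid copula could be passed instead.
    return u1 * u2

s_value = survival_from_copula(independence_copula, 0.3, 0.4)  # equals 0.7 * 0.6
```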

2.2 | Estimation of conditional Kendall’s tau

We start with the setting without censoring, where $P_{Z} (u_{1}, u_{2})$ in (5) can be further written as

$$P_{Z}(u_{1}, u_{2}) = P\{T_{1} \leq Q_{1}(u_{1} \mid Z), T_{2} \leq Q_{2}(u_{2} \mid Z)\}.$$

Here, the conditional marginal quantiles are defined as Q_j(u | Z) = inf {t : F_j(t | Z) ≥ u} for j = 1, 2. In practice, it is possible that F_j(t | Z) and Q_j(u | Z) only depend on a subset of Z, denoted by Z_j, such that Q_j(u | Z) = Q_j(u | Z_j). In this case, one can estimate Q_j(u | Z) by incorporating Z_j instead of the whole Z.

A rather flexible model for estimating Q_j(u | Z) is the quantile regression model,¹¹ which postulates that

$$g\{Q_{j}(u \mid Z)\} = Z_{j}^{\top} \beta_{j}(u), \quad j = 1, 2. \tag{10}$$

The unknown quantile regression coefficient vector β_j(u) represents the effects of Z_j on the uth quantile of g(T_j). The g(·) function is prespecified and can be any monotone link function, such as the identity link or the log link. There is little restriction on β_j(u), and the model can accommodate a wide variety of effect patterns. At a specific quantile level, the estimated coefficient can be obtained as

$$\hat{\beta}_{j}(u) = \arg\min_{\beta_{j}} \sum_{i=1}^{n} \rho_{u}\{g(T_{ji}) - Z_{ji}^{\top} \beta_{j}\}, \quad j = 1, 2; \ u \in (0, 1), \tag{11}$$

where ρ_u(t) = {u − I(t < 0)}t is the quantile loss function.¹¹ The estimator $\hat{\beta}_{j}(u)$ can be obtained easily using the R package quantreg for a sequence of u levels of interest.
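The role of the quantile loss ρ_u in (11) can be illustrated with a minimal sketch. It assumes the simplest possible case, an intercept-only model with the identity link, so that the minimizer is a sample quantile. The function names are hypothetical, and this is not a substitute for a full quantile regression fit such as the one provided by quantreg.

```python
import numpy as np

def check_loss(t, u):
    """Quantile loss rho_u(t) = {u - I(t < 0)} * t, applied elementwise."""
    t = np.asarray(t, dtype=float)
    return (u - (t < 0)) * t

def fit_intercept_only(y, u):
    """Minimize sum_i rho_u(y_i - b) over b in the intercept-only case.
    The objective is piecewise linear and convex in b, so a minimizer
    can always be found among the observed values."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - b, u).sum() for b in y]
    return float(y[int(np.argmin(losses))])

y = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0])
median_fit = fit_intercept_only(y, 0.5)     # the sample median of y
quartile_fit = fit_intercept_only(y, 0.25)  # the second-smallest observation here
```

With covariates, the same loss is minimized over a coefficient vector rather than a single constant, which is a linear programming problem handled by standard software.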

We propose to estimate $P_{Z} (u_{1}, u_{2})$ by imposing an equally spaced fine grid on $D = [0, 1] \times [0, 1]$ . In practice, the grid size can be 0.01 or similar values. Without loss of generality, we let Q(0 | Z) ≡ −∞ and let Q(1 | Z) ≡ ∞. At other grid points, we can define an empirical estimator of $P_{Z} (u_{1}, u_{2})$ as

$$\hat{P}_{Z}(u_{1}, u_{2}) = \frac{1}{n} \sum_{i=1}^{n} I\{T_{1i} \leq \hat{Q}_{1}(u_{1} \mid Z_{1i}), T_{2i} \leq \hat{Q}_{2}(u_{2} \mid Z_{2i})\}, \tag{12}$$

where $\hat{Q}_{j}(u_{j} \mid Z_{ji}) = g^{-1}\{Z_{ji}^{\top} \hat{\beta}_{j}(u_{j})\}$. It follows that $\hat{P}_{Z}(u_{1}, u_{2})$ is a right-continuous bivariate step function that jumps only on the two-dimensional grid points. Plugging this estimator into (4) leads to an estimator for the conditional Kendall’s tau, denoted by $\hat{\tau}_{Z}$.
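The grid-based construction can be sketched as follows, under several stated assumptions. The arrays `q1` and `q2` are hypothetical inputs holding each subject’s estimated conditional quantiles at the grid levels 0, 1/m, …, 1 (first column −∞, last column +∞), as would be produced by a fitted quantile regression. The second function is one possible discretization of the plug-in step: it evaluates the concordance-minus-discordance form (1) on the grid cell masses, which agrees with (4) for continuous copulas. That choice is made here for illustration and may differ from the authors’ implementation.

```python
import numpy as np

def empirical_copula_grid(t1, t2, q1, q2):
    """Estimator (12) on the grid. q1, q2 have shape (n, m + 1)."""
    below1 = (np.asarray(t1, dtype=float)[:, None] <= q1).astype(float)
    below2 = (np.asarray(t2, dtype=float)[:, None] <= q2).astype(float)
    return below1.T @ below2 / below1.shape[0]  # entry (k, l) estimates P_Z(k/m, l/m)

def tau_from_copula_grid(P):
    """Concordance minus discordance for the step-function distribution
    implied by the grid of copula values P (first row and column zero)."""
    mass = np.diff(np.diff(P, axis=0), axis=1)  # probability mass of each grid cell
    total = P[-1, -1]
    lower_left = P[:-1, :-1]                                   # both ranks strictly smaller
    upper_right = total - P[1:, -1][:, None] - P[-1, 1:][None, :] + P[1:, 1:]
    upper_left = P[:-1, -1][:, None] - P[:-1, 1:]              # first smaller, second larger
    lower_right = P[-1, :-1][None, :] - P[1:, :-1]             # first larger, second smaller
    return float(np.sum(mass * (lower_left + upper_right - upper_left - lower_right)))

# Toy check without covariates: every subject shares the same quantile grid (m = 4).
t = np.array([1.0, 2.0, 3.0, 4.0])
q = np.tile(np.array([-np.inf, 1.0, 2.0, 3.0, np.inf]), (4, 1))
tau_same = tau_from_copula_grid(empirical_copula_grid(t, t, q, q))
tau_reversed = tau_from_copula_grid(empirical_copula_grid(t, t[::-1], q, q))
```

Pairs that fall in the same grid cell count as ties in this sketch, so with m cells a perfectly dependent sample yields 1 − 1/m rather than 1; a finer grid, such as the 0.01 spacing mentioned above, makes this discretization effect small.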

For bivariate censored data, we first adopt the methods of Li et al¹⁸ to estimate the survival copula

$$S_{Z}(u_{1}, u_{2}) = P(U_{1} > u_{1}, U_{2} > u_{2}) = P\{T_{1} > Q_{1}(u_{1} \mid Z), T_{2} > Q_{2}(u_{2} \mid Z)\}.$$

The quantile regression model in (10) was adopted for the quantile range of u_j ∈ (0, τ_{U_j}], j = 1, 2. With survival outcomes, the quantile regression model offers more flexibility than traditional censored regression models such as the Cox proportional hazards model and the accelerated failure time model. An application of existing censored quantile regression methods, such as that in the work of Peng and Huang,¹⁴ gives a consistent estimator ${\hat{β}}_{j} (u)$ for u ∈ (0, τ_{U_j}].

For the censored quantile regression, selection of the identifiable range is known to be a subtle issue. The upper bound τ_{U_j} should satisfy the regularity conditions C1-C4 in the work of Peng and Huang.¹⁴ In practice, we can follow the recommendations therein¹⁴ and select the upper bounds of $R$ in an adaptive manner. In general, τ_{U_j} is selected primarily according to the quantile range of interest and the censoring rate in the data. When the selected value exceeds the upper limit of identifiability, one often observes abnormal behaviors in the estimated $\hat{\beta}_{j}(u)$ and its standard error estimates, such as nonconvergence and/or large standard error estimates. When these occur, the upper bound τ_{U_j} must be reset to a smaller value. More details about the identifiable range can be found in the work of Peng and Huang.¹⁴

Li et al¹⁸ proposed an estimator for the subdistribution processes of $(U_{1i}, U_{2i})$ and then adapted the Volterra-type approach of Prentice and Cai¹⁹ to obtain the estimated survival copula as $\hat{S}_{Z}(u_{1}, u_{2})$ for $(u_{1}, u_{2}) \in R$. To derive an estimator for the conditional restricted-range Kendall’s tau, $\tau_{Z}(R)$, we impose a fine grid on the restricted quantile range $R$ and define $S_{Z}(u_{1}, u_{2})$ as a bivariate step function that only jumps on grid points. According to the relationship in (8), we can derive a plug-in type of estimator for $\tau_{Z}(R)$, denoted by $\hat{\tau}_{Z}(R)$.

Under the same regularity conditions as those required in supplemental material B of the work of Li et al,¹⁸ we can show that $\hat{\tau}_{Z}(R) \overset{P}{\to} \tau_{Z}(R)$ and that $\sqrt{n}\{\hat{\tau}_{Z}(R) - \tau_{Z}(R)\}$ converges in distribution to a zero-mean normal distribution. These follow from the result in the work of Li et al¹⁸ that $\hat{S}_{Z}(u_{1}, u_{2})$ is uniformly consistent for $S_{Z}(u_{1}, u_{2})$. Further, $\sqrt{n}\{\hat{S}_{Z}(u_{1}, u_{2}) - S_{Z}(u_{1}, u_{2})\}$ can be written as $\Xi\{\sqrt{n} \vec{Z}_{n}(u_{1}, u_{2})\} + o_{p}(1)$, where Ξ(·) is a linear operator and $\sqrt{n} \vec{Z}_{n}(u_{1}, u_{2})$ converges weakly to a tight zero-mean Gaussian process. Next, the proposed $\hat{\tau}_{Z}(R)$ is a plug-in estimator based on $\hat{S}_{Z}(u_{1}, u_{2})$ and can be expressed as $\Psi\{\hat{S}_{Z}(u_{1}, u_{2})\}$, where Ψ(·) is a uniformly Hadamard differentiable functional of $S$.⁶,²⁰ Letting dΨ(·) denote the derivative of Ψ(·), we have that $\sqrt{n}\{\hat{\tau}_{Z}(R) - \tau_{Z}(R)\}$ is asymptotically equivalent to $d\Psi \circ \Xi\{\sqrt{n} \vec{Z}_{n}(u_{1}, u_{2})\}$. The continuous mapping theorem and the fact that the Gaussian property is preserved under linear operations give the asymptotic results for $\hat{\tau}_{Z}(R)$.

2.3 | Conditional C index

Our method facilitates an estimator for the conditional concordance probability, namely, a conditional version of the widely adopted C index for survival data. In the presence of covariates, direct estimation of the conditional C index following its definition is quite challenging, due to the curse of dimensionality. However, we note that it can be naturally estimated semiparametrically, based on the nice relationship between Kendall’s tau and the C index.

Without covariates, the C index can be written as $C = P(T_{2a} > T_{2b} \mid T_{1a} > T_{1b})$.²¹⁻²⁴ When there is no censoring, we propose the conditional C index as

$$C_{Z} = E\{P(T_{2a} > T_{2b} \mid T_{1a} > T_{1b}, Z_{a} = Z_{b} = Z)\} = E\{P(T_{1a} > T_{1b} \mid T_{2a} > T_{2b}, Z_{a} = Z_{b} = Z)\}, \tag{13}$$

where Z_a and Z_b respectively represent the covariate vector for two independent subjects, a and b. It reflects the probability of rank concordance for a pair of independent subjects, given that the two subjects have the same covariate value.

The unconditional C index has been widely studied. The conditional C index in (13) reflects the prognostic value of T₁ for T₂, or vice versa, after controlling for covariates Z. Thus, the quantity addresses the important question as to whether T₁ carries additional prognostic value for T₂ given Z, where C_Z = 0.5 corresponds to no additional prognostic value and C_Z = 1 corresponds to the ideal prognostic value. This quantity bears important practical utility, for example, when one of the outcomes, say, T₁, is easier to observe, and the other outcome, T₂, is expensive or time consuming to measure. Under the common copula assumption, we can derive that

$$C_{Z} = E\{P(U_{2a} > U_{2b} \mid U_{1a} > U_{1b}, Z_{a} = Z_{b} = Z)\} = (1 + \tau_{Z})/2,$$

which entails a plug-in estimator for the conditional C index denoted by ${\hat{C}}_{Z}$ .

In the presence of censoring, let $I_{ab}$ be shorthand for the indicator function $I(U_{1a} \wedge U_{1b} \leq \tau_{U_{1}}, U_{2a} \wedge U_{2b} \leq \tau_{U_{2}})$ for an independent pair of subjects indexed by a and b. We can extend the unconditional C index under univariable censoring from the work of Uno et al²⁴ to bivariate censored data with covariates, and we define a restricted-range conditional C index as

$$C_{Z}(R) = E\{P(U_{2a} > U_{2b} \mid U_{1a} > U_{1b}, I_{ab} = 1, Z_{a} = Z_{b} = Z)\} = \{1 + \tau_{Z}(R)\}/2.$$

When there are no covariates and only one of the outcomes is subject to censoring, $C_{Z}(R)$ reduces to the commonly studied C index for censored data. An estimator of this statistic can be formulated as $\hat{C}_{Z}(R) = \{1 + \hat{\tau}_{Z}(R)\}/2$.
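The linear map between Kendall’s tau and the C index is simple enough to state as a one-line function. The sketch below applies it to two of the true restricted-range tau values reported for the Clayton copula under high censoring in Table 2 (0.292 and 0.572); the function name is hypothetical.

```python
def c_index_from_tau(tau):
    """Plug-in relationship C = (1 + tau) / 2 between Kendall's tau and the C index."""
    return (1.0 + tau) / 2.0

# True restricted-range tau values from Table 2 (Clayton, high censoring).
c_values = [round(c_index_from_tau(tau), 3) for tau in (0.292, 0.572)]
# c_values is [0.646, 0.786], matching the TRUE column of Table 3.
```

A tau of 0 maps to C = 0.5 (no additional prognostic value), and a tau of 1 maps to C = 1.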

2.4 | Variance estimation and inference

The first goal is to derive confidence limits for $\tau_{Z}(R)$. The standard error of $\hat{\tau}_{Z}(R)$ is approximated well by the bootstrap method. Let h(x) be a link function that maps (−1, 1) to (−∞, ∞), such as h(x) = 0.5 log{(1 + x)/(1 − x)}. We can build confidence intervals (CIs) for $\tau_{Z}(R)$ using a Wald-type CI of $h\{\tau_{Z}(R)\}$ and the delta method.
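A minimal sketch of the interval construction follows, assuming the link h(x) = 0.5 log{(1 + x)/(1 − x)} and a standard error that has already been obtained (for example, from bootstrap resamples). The delta method gives the standard error of h(τ̂) as h′(τ̂) times the standard error of τ̂, with h′(x) = 1/(1 − x²). Back-transforming the limits with the inverse link (the hyperbolic tangent) is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def fisher_h(x):
    """Link h(x) = 0.5 * log{(1 + x) / (1 - x)}, mapping (-1, 1) to the real line."""
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

def wald_ci_tau(tau_hat, se, z_crit=1.959963984540054):
    """Wald-type CI built on the h scale and mapped back to (-1, 1)."""
    center = fisher_h(tau_hat)
    se_h = se / (1.0 - tau_hat**2)  # delta method: h'(x) = 1 / (1 - x^2)
    return math.tanh(center - z_crit * se_h), math.tanh(center + z_crit * se_h)

lower, upper = wald_ci_tau(0.25, 0.06)  # illustrative inputs, not values from the paper
```

Working on the transformed scale keeps both limits inside (−1, 1), which a Wald interval built directly on τ̂ does not guarantee.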

Next, our examination of conditional independence can be formulated as the hypothesis testing problem of

$$H_{0}: \tau_{Z}(R) = 0 \quad \text{vs} \quad H_{1}: \tau_{Z}(R) \neq 0.$$

If this null hypothesis is rejected, there is statistical evidence that the association between the two outcomes is not merely explained by Z, and there must exist additional factors that underlie the residual association. To conduct the hypothesis testing, we can formulate a Wald-type test statistic as

$$Z = \frac{h\{\hat{\tau}_{Z}(R)\}}{h'\{\hat{\tau}_{Z}(R)\} \, \widehat{SE}\{\hat{\tau}_{Z}(R)\}},$$

where h′(x) = dh(x)/dx and $\hat{SE} (\cdot)$ denotes the estimated standard error of an estimator. This test statistic asymptotically follows the standard normal distribution under the null hypothesis. Similarly, we can calculate the confidence limits and conduct hypothesis testing for the conditional C statistic.
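The Wald-type test can be sketched in a few lines, again assuming the link h(x) = 0.5 log{(1 + x)/(1 − x)} with h′(x) = 1/(1 − x²) and a standard error supplied by the user. The two-sided p-value uses the standard normal reference distribution; the function name and the numerical inputs are hypothetical.

```python
import math

def wald_test_tau(tau_hat, se):
    """Return the Wald statistic h(tau_hat) / {h'(tau_hat) * se} and its two-sided p-value."""
    h = 0.5 * math.log((1.0 + tau_hat) / (1.0 - tau_hat))
    h_prime = 1.0 / (1.0 - tau_hat**2)
    z = h / (h_prime * se)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * {1 - Phi(|z|)}
    return z, p_value

z_stat, p_val = wald_test_tau(0.25, 0.065)  # illustrative inputs, not values from the paper
```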

3 | SIMULATION STUDY

To examine the finite-sample performance of the proposed methods, we conducted simulations under two configurations: one for the complete dataset and the other for the censored data. For both configurations, the bivariate outcomes (T₁, T₂) were generated from log-linear models. We let T₁ have independent and identically distributed errors and T₂ have covariate-dependent errors by setting log T₁ = b₁₁Z₁ + b₁₂Z₂ + ϵ₁ and log T₂ = b₂₁Z₁ + b₂₂Z₂ + I(Z₂ = 0)ϵ₂ + I(Z₂ = 1)ϵ₃, respectively, where Z₁ ~ unif(0, 2), Z₂ ~ Bernoulli(0.5), ϵ₁ ~ N(0, 0.5²), ϵ₂ ~ N(0, 0.15²), and ϵ₃ ~ N(0, 0.5²). Thus, the corresponding quantile regression models were

$$Q_{1}(u_{1} \mid Z) = \exp\{b_{11} Z_{1} + b_{12} Z_{2} + Q_{\epsilon_{1}}(u_{1})\}, \quad Q_{2}(u_{2} \mid Z) = \exp[b_{21} Z_{1} + \{Q_{\epsilon_{3}}(u_{2}) - Q_{\epsilon_{2}}(u_{2}) + b_{22}\} \times Z_{2} + Q_{\epsilon_{2}}(u_{2})].$$

Given the covariates, the paired outcomes follow the Clayton copula with different values of the association parameters, such that the corresponding Kendall’s tau is equal to 0.00, 0.25, or 0.50, respectively. For the regression coefficients of (Z₁, Z₂) on T₁, we set b₁₁ = 0.8 and b₁₂ = 0.6. We considered three different coefficient settings for T₂: (i) Z₁ and Z₂ are not associated with T₂, namely, b₂₁ = 0 and b₂₂ = 0; (ii) Z₁ and Z₂ contribute positively to T₂, where b₂₁ = 0.9 and b₂₂ = 0.7; and (iii) Z₁ and Z₂ contribute negatively to T₂, where b₂₁ = −0.9 and b₂₂ = −0.7.
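One way to generate data of this form is sketched below. It assumes the conditional-inversion sampler for the Clayton copula, with copula parameter θ = 2τ/(1 − τ) (the standard relation between the Clayton parameter and Kendall’s tau), and converts the uniform ranks to normal errors through the normal quantile function. The function names are hypothetical, and the authors’ actual simulation code may differ.

```python
import numpy as np
from statistics import NormalDist

def clayton_pair(n, tau, rng):
    """Draw n pairs (u1, u2) from a Clayton copula with Kendall's tau in [0, 1)."""
    u1 = np.clip(rng.uniform(size=n), 1e-12, 1 - 1e-12)
    w = np.clip(rng.uniform(size=n), 1e-12, 1 - 1e-12)
    if tau == 0:
        return u1, w  # independence
    theta = 2.0 * tau / (1.0 - tau)
    # Invert the conditional CDF of U2 given U1 = u1 at level w.
    u2 = ((w ** (-theta / (1.0 + theta)) - 1.0) * u1 ** (-theta) + 1.0) ** (-1.0 / theta)
    return u1, np.clip(u2, 1e-12, 1 - 1e-12)

def simulate_outcomes(n, tau, b21, b22, rng, b11=0.8, b12=0.6):
    """Log-linear outcomes with covariate-dependent error scale for T2."""
    z1 = rng.uniform(0.0, 2.0, size=n)
    z2 = rng.binomial(1, 0.5, size=n)
    u1, u2 = clayton_pair(n, tau, rng)
    normal_quantile = np.vectorize(NormalDist().inv_cdf)
    e1 = 0.5 * normal_quantile(u1)               # error of log T1, N(0, 0.5^2)
    sd2 = np.where(z2 == 0, 0.15, 0.5)           # N(0, 0.15^2) if Z2 = 0, else N(0, 0.5^2)
    e2 = sd2 * normal_quantile(u2)
    t1 = np.exp(b11 * z1 + b12 * z2 + e1)
    t2 = np.exp(b21 * z1 + b22 * z2 + e2)
    return t1, t2, z1, z2, u1, u2

rng = np.random.default_rng(2019)
t1, t2, z1, z2, u1, u2 = simulate_outcomes(1500, 0.5, 0.9, 0.7, rng)
```

Here `u1` and `u2` are the conditional ranks F_j(T_j | Z), so their Kendall’s tau is the conditional tau targeted by the method, whereas the tau of `t1` and `t2` is the unconditional one that also absorbs the covariate effects.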

For configuration II with right censoring, we generated the censoring time C₁ from a Weibull(shape = α, scale = β) distribution and C₂ from a uniform distribution. Two censoring scenarios were considered: (i) low censoring rates (LC) - 15% for T₁ and 20% for T₂; and (ii) high censoring rates (HC) - 40% for T₁ and 40% for T₂. The censoring times (C₁, C₂) were generated so that the censoring rates are consistent with LC and HC. Details of the setups are as follows.

LC, (b₂₁, b₂₂) = (0, 0), where C₁ ~ Weibull(α = 2.9, β = 9.0), C₂ ~ unif(0.3, 4.1)
HC, (b₂₁, b₂₂) = (0, 0), where C₁ ~ Weibull(α = 1.8, β = 5.1), C₂ ~ unif(0.1, 2.5)
LC, (b₂₁, b₂₂) = (0.9, 0.7), where C₁ ~ Weibull(α = 2.9, β = 9.0), C₂ ~ unif(2.0, 15.0)
HC, (b₂₁, b₂₂) = (0.9, 0.7), where C₁ ~ Weibull(α = 1.8, β = 5.1), C₂ ~ unif(0.0, 10.6)
LC, (b₂₁, b₂₂) = (−0.9, −0.7), where C₁ ~ Weibull(α = 2.9, β = 9.0), C₂ ~ unif(0.3, 0.9)
HC, (b₂₁, b₂₂) = (−0.9, −0.7), where C₁ ~ Weibull(α = 1.8, β = 5.1), C₂ ~ unif(0.0, 0.9)
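The censoring mechanism for one of these setups can be sketched as follows, using the low-censoring scenario with (b₂₁, b₂₂) = (0, 0) as the default. The observed data follow the definitions Y = T ∧ C and δ = I(T ≤ C) from Section 2.1. The function names are hypothetical; NumPy’s Weibull generator has unit scale, so the draw is multiplied by the scale parameter.

```python
import numpy as np

def draw_censoring_times(n, rng, shape=2.9, scale=9.0, low=0.3, high=4.1):
    """C1 ~ Weibull(shape, scale) and C2 ~ unif(low, high), drawn independently."""
    c1 = scale * rng.weibull(shape, size=n)
    c2 = rng.uniform(low, high, size=n)
    return c1, c2

def apply_right_censoring(t, c):
    """Return the observed time Y = min(T, C) and the event indicator delta = I(T <= C)."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    return np.minimum(t, c), (t <= c).astype(int)

y_obs, delta = apply_right_censoring([1.0, 5.0, 2.0], [2.0, 3.0, 2.0])
```

The censoring rate in a simulated dataset is then one minus the mean of `delta`, which can be compared against the target rates for the chosen scenario.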

In addition to Clayton’s copula, we incorporated a setting where (T₁, T₂) conditionally follow Frank’s copula, and where the copula parameters are chosen such that the unrestricted Kendall’s tau $τ_{Z} (D)$ is equal to 0, 0.25, and 0.5, respectively. The method of Li et al¹⁸ was used to obtain $\hat{S} (u_{1}, u_{2})$ , and an equally spaced fine grid of size 0.01 was adopted. We ran 2000 simulations with sample sizes n = 100, 200, 400 for configuration I, which does not involve censoring, and with n = 200, 400 for configuration II. For each simulation, we used the bootstrap resampling method with B = 400 to obtain the standard errors and Wald-type 95% CIs of the parameters.

Table 1 reports the results of the simulation study for the conditional Kendall’s tau ${\hat{τ}}_{Z}$ under configuration I, including its true value (TRUE), empirical biases (BIAS), empirical standard deviations (ESD), the average of bootstrap resampling-based standard errors (ASE), the empirical coverage probability of 95% Wald-type confidence intervals (ECP), and the empirical rejection rate (ERR) for testing conditional independence. We also present the true value of the unconditional Kendall’s tau (RAW). We observe that the estimators are virtually unbiased under all scenarios, while the empirical bias tends to shrink with the sample size. The bootstrap-based standard errors agree with the empirical standard deviations quite well, and as expected, decrease with the sample size at the $\sqrt{n}$ rate. The empirical coverage probabilities are close to the nominal level of 95% and are not compromised by the small sample size. The empirical rejection rates are close to 0.05 when the true conditional Kendall’s tau is 0 and close to 1.00 when the true Kendall’s tau is 0.25 or 0.50, suggesting that we have good power to detect the true association between the two outcomes, even with a small sample size of 100.
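The summary columns can be computed from the replicate-level output as in the following sketch. It assumes that each replicate supplies a point estimate and a bootstrap standard error, and that the intervals and tests use the link h(x) = 0.5 log{(1 + x)/(1 − x)} from Section 2.4. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def summarize_replicates(est, se, truth, z_crit=1.96):
    """Compute BIAS, ESD, ASE, ECP, and ERR from per-replicate estimates and SEs."""
    est = np.asarray(est, dtype=float)
    se = np.asarray(se, dtype=float)
    h = np.arctanh(est)                 # arctanh(x) = 0.5 * log{(1 + x) / (1 - x)}
    se_h = se / (1.0 - est**2)          # delta-method SE on the h scale
    lower = np.tanh(h - z_crit * se_h)
    upper = np.tanh(h + z_crit * se_h)
    return {
        "BIAS": float(est.mean() - truth),
        "ESD": float(est.std(ddof=1)),
        "ASE": float(se.mean()),
        "ECP": float(np.mean((lower <= truth) & (truth <= upper))),
        "ERR": float(np.mean(np.abs(h / se_h) > z_crit)),  # rejections of tau = 0
    }

summary = summarize_replicates([0.20, 0.30, 0.25, 0.25], [0.05] * 4, truth=0.25)
```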

TABLE 1.

Simulation results of estimating the true conditional Kendall’s tau (TRUE) for the complete configuration under different scenarios with fixed coefficients of (Z₁, Z₂) on T₁ (b₁₁ = 0.8, b₁₂ = 0.6). RAW is the mean of the unconditional Kendall’s tau. BIAS, ESD, ASE, ECP, and ERR represent the empirical bias (mean minus TRUE), standard deviation, mean of estimated standard error, coverage rates, and rejection rates for the proposed estimator

n	RAW	TRUE	BIAS	ESD	ASE	ECP	ERR
Clayton (b₂₁ = 0.0, b₂₂ = 0.0)

100	0.000	0.000	0.000	0.067	0.070	0.955	0.045
	0.154	0.250	−0.009	0.065	0.068	0.954	0.932
	0.286	0.500	−0.019	0.054	0.058	0.955	1.000

200	0.000	0.000	−0.001	0.047	0.048	0.950	0.050
	0.154	0.250	−0.005	0.046	0.047	0.953	0.999
	0.286	0.500	−0.010	0.038	0.040	0.946	1.000

400	0.000	0.000	0.001	0.034	0.034	0.946	0.054
	0.154	0.250	−0.002	0.033	0.033	0.937	1.000
	0.286	0.500	−0.004	0.026	0.027	0.953	1.000

Clayton (b₂₁ = 0.9, b₂₂ = 0.7)

100	0.460	0.000	0.000	0.067	0.070	0.958	0.042
	0.564	0.250	−0.010	0.065	0.068	0.954	0.930
	0.661	0.500	−0.020	0.054	0.058	0.955	1.000

200	0.460	0.000	−0.001	0.047	0.048	0.951	0.049
	0.564	0.250	−0.005	0.046	0.047	0.954	0.999
	0.661	0.500	−0.010	0.038	0.040	0.946	1.000

400	0.460	0.000	0.001	0.034	0.034	0.946	0.054
	0.564	0.250	−0.002	0.033	0.033	0.937	1.000
	0.661	0.500	−0.004	0.026	0.027	0.952	1.000

Clayton (b₂₁ = −0.9, b₂₂ = −0.7)

100	−0.454	0.000	0.000	0.067	0.070	0.956	0.044
	−0.377	0.250	−0.010	0.065	0.068	0.954	0.930
	−0.315	0.500	−0.020	0.054	0.058	0.956	1.000

200	−0.454	0.000	−0.001	0.047	0.048	0.952	0.048
	−0.377	0.250	−0.005	0.046	0.047	0.952	0.999
	−0.315	0.500	−0.010	0.038	0.040	0.946	1.000

400	−0.454	0.000	0.001	0.034	0.034	0.945	0.055
	−0.377	0.250	−0.002	0.033	0.033	0.935	1.000
	−0.315	0.500	−0.004	0.026	0.027	0.953	1.000

Compared to the values of the conditional Kendall’s tau, we observe that the unconditional counterparts (RAW) are smaller in magnitude when (b₂₁, b₂₂) = (0, 0) (top section of Table 1) and larger when b₂₁ > 0 and b₂₂ > 0 (middle section). The sign of the unconditional Kendall’s tau is reversed when b₂₁ < 0 and b₂₂ < 0 (bottom section). These results suggest that covariates may sometimes heavily distort the dependence between the bivariate outcomes, and a conditional dependence index such as $\hat{\tau}_{Z}$ can provide cleaner insights after removing the effects of covariates.

Table 2 presents the results of the simulation study for configuration II. We observe patterns similar to those in the uncensored configuration. In the presence of censoring, the estimators remain unbiased, and the standard error estimates, Wald-type CIs and hypothesis testing procedures continue to perform well. The bootstrap-based standard errors agree quite well with the empirical standard deviations. The results of Frank copula are displayed at the bottom of Table 2 and are comparable to those we observe for Clayton’s copula.

TABLE 2.

Simulation results for estimating the restricted-range conditional Kendall’s tau for the censoring configuration with fixed coefficients of (Z₁, Z₂) on T₁ (b₁₁ = 0.8, b₁₂ = 0.6), where (τ_U₁, τ_U₂) = (0.8, 0.8) for low censoring and (τ_U₁, τ_U₂) = (0.7, 0.7) for high censoring. RAW is the mean of the unconditional Kendall’s tau. BIAS, ESD, ASE, ECP, and ERR represent the empirical bias (mean minus TRUE), standard deviation, mean of estimated standard error, coverage rates, and rejection rates for the proposed estimator

		Low Censoring							High Censoring
n	RAW	TRUE	BIAS	ESD	ASE	ECP	ERR	RAW	TRUE	BIAS	ESD	ASE	ECP	ERR
Clayton (b₂₁ = 0.0, b₂₂ = 0.0)

200	0.000	0.000	0.000	0.058	0.060	0.957	0.043	0.000	0.000	−0.002	0.079	0.087	0.966	0.034
	0.158	0.269	−0.001	0.057	0.060	0.967	0.992	0.163	0.292	−0.005	0.079	0.085	0.965	0.922
	0.292	0.535	−0.002	0.051	0.055	0.966	1.000	0.299	0.572	−0.010	0.070	0.079	0.973	1.000

400	0.000	0.000	0.000	0.040	0.042	0.950	0.050	0.000	0.000	0.001	0.055	0.058	0.958	0.042
	0.158	0.269	0.002	0.039	0.041	0.965	1.000	0.163	0.292	−0.002	0.054	0.057	0.959	0.998
	0.292	0.535	0.004	0.034	0.037	0.965	1.000	0.299	0.572	−0.003	0.047	0.052	0.971	1.000

Clayton (b₂₁ = 0.9, b₂₂ = 0.7)

200	0.486	0.000	0.000	0.057	0.059	0.956	0.044	0.511	0.000	0.000	0.077	0.084	0.964	0.036
	0.591	0.269	−0.001	0.055	0.059	0.963	0.992	0.616	0.292	−0.006	0.075	0.082	0.971	0.940
	0.687	0.535	0.000	0.049	0.054	0.974	1.000	0.708	0.572	−0.010	0.067	0.076	0.972	1.000

400	0.486	0.000	0.000	0.039	0.041	0.956	0.044	0.511	0.000	0.000	0.054	0.056	0.960	0.040
	0.591	0.269	0.002	0.038	0.040	0.964	1.000	0.616	0.292	−0.001	0.052	0.055	0.966	1.000
	0.687	0.535	0.005	0.032	0.035	0.963	1.000	0.708	0.572	−0.001	0.045	0.050	0.973	1.000

Clayton (b₂₁ = −0.9, b₂₂ = −0.7)

200	−0.467	0.000	0.000	0.058	0.061	0.962	0.038	−0.480	0.000	−0.001	0.086	0.093	0.970	0.030
	−0.387	0.269	−0.003	0.058	0.061	0.953	0.992	−0.398	0.292	−0.010	0.083	0.092	0.970	0.864
	−0.324	0.535	−0.003	0.051	0.056	0.966	1.000	−0.333	0.572	−0.018	0.076	0.087	0.971	1.000

400	−0.467	0.000	0.000	0.040	0.042	0.958	0.042	−0.480	0.000	0.001	0.057	0.062	0.966	0.034
	−0.387	0.269	0.001	0.039	0.041	0.963	1.000	−0.398	0.292	−0.004	0.056	0.061	0.968	0.996
	−0.324	0.535	0.003	0.033	0.037	0.970	1.000	−0.333	0.572	−0.007	0.051	0.057	0.968	1.000

Frank (b₂₁ = −0.9, b₂₂ = −0.7)

200	−0.466	0.000	0.001	0.059	0.061	0.957	0.043	−0.480	0.000	−0.002	0.085	0.093	0.972	0.028
	−0.385	0.266	0.003	0.054	0.058	0.960	0.996	−0.398	0.281	−0.002	0.079	0.088	0.972	0.882
	−0.320	0.527	0.006	0.043	0.051	0.976	1.000	−0.331	0.549	−0.005	0.068	0.080	0.979	1.000

400	−0.466	0.000	0.000	0.041	0.042	0.962	0.038	−0.480	0.000	0.001	0.060	0.062	0.952	0.048
	−0.385	0.266	0.004	0.038	0.039	0.953	1.000	−0.398	0.281	0.003	0.054	0.058	0.958	0.998
	−0.320	0.527	0.009	0.029	0.032	0.963	1.000	−0.331	0.549	0.004	0.046	0.050	0.969	1.000

Table 3 provides the simulation results for the conditional C index. Due to space limitations, we present only the results under the high censoring configuration. The results suggest that the estimators are virtually unbiased and that the averages of the bootstrap-based standard errors agree quite well with the empirical standard deviations.
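
As a reading aid for the simulation tables, the summary columns can be computed from Monte Carlo replicates roughly as follows. This is a minimal sketch, not the authors' code: the function name `summarize_simulation`, the Wald-type interval (estimate ± 1.96 × SE), and the null value of 0 used for the rejection rate are assumptions made for illustration, and the toy replicates are synthetic.

```python
import numpy as np

def summarize_simulation(estimates, std_errors, true_value, z=1.96):
    """Summarize Monte Carlo replicates of an estimator.

    estimates, std_errors: one entry per simulated data set.
    Returns BIAS, ESD, ASE, ECP (coverage of an assumed Wald interval)
    and ERR (rejection rate of H0: parameter = 0, two-sided 5% level).
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower, upper = est - z * se, est + z * se
    return {
        "BIAS": est.mean() - true_value,   # mean estimate minus TRUE
        "ESD": est.std(ddof=1),            # empirical standard deviation
        "ASE": se.mean(),                  # average estimated standard error
        "ECP": np.mean((lower <= true_value) & (true_value <= upper)),
        "ERR": np.mean(np.abs(est / se) > z),
    }

# Toy replicates (synthetic, not the paper's data).
rng = np.random.default_rng(0)
toy_est = rng.normal(loc=0.3, scale=0.05, size=2000)
toy_se = np.full(2000, 0.05)
summary = summarize_simulation(toy_est, toy_se, true_value=0.3)
```

Under this reading, ERR estimates the type I error when TRUE equals 0 and the power otherwise.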

TABLE 3.

Simulation results for estimating the conditional C index under high censoring configuration with fixed coefficients of (Z₁, Z₂) on T₁ (b₁₁ = 0.8, b₁₂ = 0.6), where (τ_U₁, τ_U₂) = (0.7, 0.7)

	High Censoring
n	RAW	TRUE	BIAS	ESD	ASE	ECP
Clayton (b₂₁ = 0.0, b₂₂ = 0.0)

200	0.500	0.500	−0.001	0.040	0.043	0.966
	0.582	0.646	−0.002	0.039	0.043	0.965
	0.650	0.786	−0.005	0.035	0.040	0.973

400	0.500	0.500	0.000	0.028	0.029	0.958
	0.582	0.646	−0.001	0.027	0.028	0.959
	0.650	0.786	−0.001	0.024	0.026	0.971

Clayton (b₂₁ = 0.9, b₂₂ = 0.7)

200	0.756	0.500	0.000	0.039	0.042	0.964
	0.808	0.646	−0.003	0.037	0.041	0.971
	0.854	0.786	−0.005	0.034	0.038	0.972

400	0.756	0.500	0.000	0.027	0.028	0.960
	0.808	0.646	−0.001	0.026	0.028	0.966
	0.854	0.786	0.000	0.023	0.025	0.973

Clayton (b₂₁ = −0.9, b₂₂ = −0.7)

200	0.260	0.500	−0.001	0.043	0.047	0.970
	0.301	0.646	−0.005	0.041	0.046	0.970
	0.334	0.786	−0.009	0.038	0.044	0.971

400	0.260	0.500	0.000	0.029	0.031	0.966
	0.301	0.646	−0.002	0.028	0.031	0.968
	0.334	0.786	−0.003	0.025	0.028	0.968

Frank (b₂₁ = −0.9, b₂₂ = −0.7)

200	0.260	0.500	−0.001	0.043	0.047	0.972
	0.301	0.641	−0.001	0.040	0.044	0.972
	0.335	0.775	−0.003	0.034	0.040	0.979

400	0.260	0.500	0.001	0.030	0.031	0.952
	0.301	0.641	0.002	0.027	0.029	0.958
	0.335	0.775	0.002	0.023	0.025	0.969

In all the preceding simulation studies, T₁ and T₂ were generated using the same covariates Z₁ and Z₂. Our methods also allow the two marginal models for the individual outcomes to have different covariates. To illustrate this, we replaced Z₁ in the data generation procedure with Z₁₁ for T₁ and with Z₁₂ for T₂, where (Z₁₁, Z₁₂) ~ Gaussian copula(0.25). We display the results under the high censoring scenario (Table 4), where all estimators for the conditional Kendall’s tau and C index continue to perform satisfactorily.
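
A pair of correlated covariates of this kind could be drawn as sketched below. Two assumptions are made here that the text does not settle: 0.25 is taken to be the correlation parameter of the Gaussian copula, and the margins are taken to be uniform on (0, 1). The function name `sample_gaussian_copula` is illustrative.

```python
import numpy as np
from scipy.stats import norm, kendalltau

def sample_gaussian_copula(n, rho, rng):
    """Draw n pairs with uniform(0, 1) margins whose dependence follows
    a bivariate Gaussian copula with correlation parameter rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    latent = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n)
    return norm.cdf(latent)  # probability integral transform of each margin

rng = np.random.default_rng(1)
u = sample_gaussian_copula(5000, rho=0.25, rng=rng)
z11, z12 = u[:, 0], u[:, 1]  # covariate entering T1, covariate entering T2
tau_hat = kendalltau(z11, z12)[0]
```

For a Gaussian copula, Kendall’s tau equals (2/π)·arcsin(ρ), so under these assumptions the two covariates would have a tau of about 0.16.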

TABLE 4.

Simulation results for estimating the conditional Kendall’s tau and conditional C index under the high censoring configuration with fixed coefficients of (Z₁, Z₂) on T₁ (b₁₁ = 0.8, b₁₂ = 0.6), where (τ_U₁, τ_U₂) = (0.7, 0.7) and (Z₁₁, Z₁₂) ~ Gaussian copula(0.25). RAW is the mean of the unconditional Kendall’s tau or C indexes that do not consider covariates. BIAS, ESD, ASE, ECP, and ERR represent the empirical bias (mean minus TRUE), standard deviation, mean of estimated standard error, coverage rates, and rejection rates for the proposed estimator.

	Conditional Kendall’s Tau							Conditional C Index
n	RAW	TRUE	BIAS	ESD	ASE	ECP	ERR	RAW	TRUE	BIAS	ESD	ASE	ECP
	Clayton (b₂₁ = 0.0, b₂₂ = 0.0)							Clayton (b₂₁ = 0.0, b₂₂ = 0.0)

200	0.000	0.000	0.000	0.080	0.086	0.970	0.030	0.500	0.500	0.000	0.040	0.043	0.970
	0.163	0.292	−0.008	0.077	0.085	0.973	0.919	0.582	0.646	−0.004	0.038	0.042	0.973
	0.299	0.572	−0.016	0.069	0.079	0.972	1.000	0.650	0.786	−0.008	0.034	0.040	0.972

400	0.000	0.000	−0.001	0.054	0.058	0.962	0.038	0.500	0.500	0.000	0.027	0.029	0.962
	0.163	0.292	−0.004	0.053	0.057	0.964	1.000	0.582	0.646	−0.002	0.027	0.028	0.964
	0.299	0.572	−0.004	0.048	0.052	0.962	1.000	0.650	0.786	−0.002	0.024	0.026	0.962

	Clayton (b₂₁ = 0.9, b₂₂ = 0.7)							Clayton (b₂₁ = 0.9, b₂₂ = 0.7)

200	0.221	0.000	−0.001	0.079	0.085	0.968	0.032	0.611	0.500	0.000	0.039	0.043	0.968
	0.304	0.292	−0.009	0.078	0.084	0.962	0.920	0.652	0.646	−0.005	0.039	0.042	0.962
	0.367	0.572	−0.017	0.071	0.078	0.957	1.000	0.684	0.786	−0.008	0.036	0.039	0.957

400	0.221	0.000	−0.002	0.053	0.057	0.962	0.038	0.611	0.500	−0.001	0.027	0.029	0.962
	0.304	0.292	−0.003	0.052	0.056	0.964	1.000	0.652	0.646	−0.002	0.026	0.028	0.964
	0.367	0.572	−0.002	0.048	0.051	0.962	1.000	0.684	0.786	−0.001	0.024	0.026	0.962

	Clayton (b₂₁ = −0.9, b₂₂ = −0.7)							Clayton (b₂₁ = −0.9, b₂₂ = −0.7)

200	−0.208	0.000	−0.001	0.083	0.091	0.971	0.029	0.396	0.500	0.000	0.042	0.046	0.971
	−0.139	0.292	−0.012	0.083	0.090	0.964	0.870	0.431	0.646	−0.006	0.041	0.045	0.964
	−0.085	0.572	−0.020	0.076	0.086	0.962	1.000	0.458	0.786	−0.010	0.038	0.043	0.962

400	−0.208	0.000	0.000	0.057	0.061	0.960	0.040	0.396	0.500	0.000	0.028	0.030	0.960
	−0.139	0.292	−0.006	0.056	0.060	0.968	0.998	0.431	0.646	−0.003	0.028	0.030	0.968
	−0.085	0.572	−0.007	0.052	0.056	0.963	1.000	0.458	0.786	−0.003	0.026	0.028	0.963

	Frank (b₂₁ = −0.9, b₂₂ = −0.7)							Frank (b₂₁ = −0.9, b₂₂ = −0.7)

200	−0.208	0.000	−0.001	0.083	0.091	0.969	0.031	0.396	0.500	−0.001	0.042	0.045	0.969
	−0.142	0.281	−0.004	0.079	0.086	0.970	0.896	0.429	0.641	−0.002	0.039	0.043	0.970
	−0.086	0.549	−0.009	0.068	0.078	0.971	1.000	0.457	0.775	−0.004	0.034	0.039	0.971

400	−0.208	0.000	0.001	0.058	0.061	0.954	0.046	0.396	0.500	0.000	0.029	0.030	0.954
	−0.142	0.281	0.001	0.053	0.057	0.964	1.000	0.429	0.641	0.001	0.027	0.028	0.964
	−0.086	0.549	0.001	0.046	0.050	0.972	1.000	0.457	0.775	0.001	0.023	0.025	0.972

In addition, we conducted sensitivity studies to evaluate the robustness of the proposed method to violations of the independent censoring assumption. The details of the sensitivity analysis are provided in the online supporting information. We examined the scenario in which T_j is subject to dependent censoring by C_j, j = 1, 2, as well as the scenario in which T₁ is subject to dependent censoring by T₂. Under all the simulation settings considered, the proposed method was reasonably robust to violation of the independent censoring assumption.

4|. REAL DATA EXAMPLE

4.1|. Analysis of the premature twins data

We applied the proposed methods to data from a retrospective study of premature twins.²⁵ Variables recorded for 63 pairs of monozygotic (MZ) and 137 pairs of dizygotic (DZ) twins include birth weights (BW), gestational age (GA), sex (male vs female), and birth medical institutions (INST, a total of three institutions). The unconditional Kendall’s tau coefficient for the twins’ birth weights is equal to 0.604 (95% CI, 0.488 to 0.720) for MZ twins and 0.698 (95% CI, 0.629 to 0.767) for DZ twins. In this analysis, we aimed to assess whether and how different covariates explain the dependence between the birth weights.

Table 5 presents the estimated Kendall’s tau and C index with and without conditioning on covariates for MZ twins (top section) and DZ twins (bottom section), where the left panel was based on the linear quantile regression model with identity link, and the right panel was based on linear quantile regression models after log-transforming both birth weight outcomes. Standard errors were obtained using B = 400 bootstrap resamples. There is little difference between the left and right panels, suggesting that our method may be insensitive to the link function g(·). In the following, we focus on interpreting the results using the identity link.
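
For the unconditional (first-row) quantities, a pair-resampling bootstrap with B = 400 could look like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the helper `bootstrap_kendall_tau` is hypothetical, the data are synthetic, and the Wald-type interval (estimate ± 1.96 × SE) is assumed because it matches the reported MZ interval, 0.604 ± 1.96 × 0.059 ≈ (0.488, 0.720).

```python
import numpy as np
from scipy.stats import kendalltau, norm

def bootstrap_kendall_tau(y1, y2, n_boot=400, seed=0):
    """Kendall's tau with a pair-resampling bootstrap SE, Wald CI and p-value."""
    y1, y2 = np.asarray(y1), np.asarray(y2)
    rng = np.random.default_rng(seed)
    n = len(y1)
    tau_hat = kendalltau(y1, y2)[0]
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample pairs jointly
        # Resampling duplicates pairs; scipy's default tau-b handles the ties.
        boot[b] = kendalltau(y1[idx], y2[idx])[0]
    se = boot.std(ddof=1)
    ci = (tau_hat - 1.96 * se, tau_hat + 1.96 * se)
    p_value = 2 * norm.sf(abs(tau_hat) / se)  # Wald test of tau = 0
    return tau_hat, se, ci, p_value

# Synthetic paired outcomes sharing a common component (not the twins data).
rng = np.random.default_rng(2)
shared = rng.normal(size=200)
y1 = shared + rng.normal(scale=0.7, size=200)
y2 = shared + rng.normal(scale=0.7, size=200)
tau_hat, se, ci, p_value = bootstrap_kendall_tau(y1, y2)
```

For the conditional estimates, each resample would additionally require refitting the marginal quantile regression models, which this sketch does not attempt.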

TABLE 5.

Analysis of the premature twins’ birth weights without considering the covariates (first row) and adjusting for the covariates for monozygotic (MZ) twins (top section) and dizygotic (DZ) twins (bottom section)

		g(x) = x					g(x) = log(x)
		τ	SE(τ)	p-value	C	SE(C)	τ	SE(τ)	p-value	C	SE(C)
MZ	(Y₁, Y₂)	0.604	0.059	<0.001	0.802	0.030	0.604	0.059	<0.001	0.802	0.030
	(Y₁, Y₂ \| Sex)	0.594	0.055	<0.001	0.797	0.027	0.594	0.055	<0.001	0.797	0.027
	(Y₁, Y₂ \| GA)	0.310	0.076	<0.001	0.655	0.038	0.315	0.082	<0.001	0.657	0.041
	(Y₁, Y₂ \| INST)	0.609	0.056	<0.001	0.805	0.028	0.609	0.056	<0.001	0.805	0.028

DZ	(Y₁, Y₂)	0.698	0.035	<0.001	0.849	0.017	0.698	0.035	<0.001	0.849	0.017
	(Y₁, Y₂ \| Sex)	0.694	0.034	<0.001	0.847	0.017	0.694	0.034	<0.001	0.847	0.017
	(Y₁, Y₂ \| GA)	0.380	0.053	<0.001	0.690	0.027	0.390	0.051	<0.001	0.695	0.025
	(Y₁, Y₂ \| INST)	0.662	0.036	<0.001	0.831	0.018	0.662	0.036	<0.001	0.831	0.018

When only one covariate is taken into consideration, conditioning on gestational age reduces the dependence the most, from 0.604 to 0.310 for MZ twins and from 0.698 to 0.380 for DZ twins. At the same time, the conditional C index decreases to 0.655 (95% CI, 0.581 to 0.729) for MZ twins and to 0.690 (95% CI, 0.637 to 0.743) for DZ twins after adjusting for GA. These findings suggest that the shared gestational age may be the most important explanatory factor for the dependence in birth weights between the twins: gestational age alone explains approximately 50% of the positive dependence between the birth weights for both MZ and DZ twins. By comparison, sex and institution do not explain much of the association between the twins’ birth weights. We also examined different combinations of the covariates, but none of them showed a further decrease in the association. There remains a small to moderate level of dependence that cannot be explained by the covariates considered, and the residual association is likely attributable to other causes, such as maternal and genetic factors. For both the unconditional and conditional Kendall’s tau, we observed larger estimates for DZ twins than for MZ twins. This interesting pattern may arise because MZ twins often share one placenta, which may cause unbalanced birth weights due to limited nutrition and oxygen.
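
The "approximately 50%" figure can be checked from Table 5 if "explained" is read as the relative reduction in Kendall’s tau after conditioning on GA. That reading is an interpretation, not a definition given in the text:

$$
\frac{0.604 - 0.310}{0.604} \approx 0.49 \ \text{(MZ)}, \qquad \frac{0.698 - 0.380}{0.698} \approx 0.46 \ \text{(DZ)}.
$$

The tabulated C indexes are also consistent with the identity $C = (\tau + 1)/2$, which holds for continuous outcomes without ties: $(0.310 + 1)/2 = 0.655$ for MZ twins and $(0.380 + 1)/2 = 0.690$ for DZ twins.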

4.2|. Analysis of data from the German Breast Cancer Study

We next applied the proposed methods to data from the German Breast Cancer Study (GBCS), a prospective study to explore prognostic factors for node-positive breast cancer.²⁶ Time to recurrence (Y₁) and time to death (Y₂), along with eight prognostic factors (age, menopausal status, hormone therapy, tumor size, tumor grade, number of positive lymph nodes, progesterone and estrogen receptors) were recorded for n = 686 patients, out of whom 278 (40.5%) experienced cancer recurrence and 171 (24.9%) died. Death occurred before recurrence for 21 (3.1%) subjects, posing dependent censoring to the recurrence event. Since the rate of dependent censoring is very low, we ignored this complication below, in view of the results of the sensitivity studies in the online supporting information, Section 1.2. Some further discussions about this complication can be found in Section 5. In this analysis, the association between the time to recurrence and time to death is our main interest, and we log-transformed the tumor size, number of positive lymph nodes, number of progesterone receptors and estrogen receptors to make them less skewed.

The dependence between the time to disease recurrence and time to death carries significant information about the disease progression and can be very useful in clinical decision-making for disease prognosis. For example, conditional independence between the time to recurrence and time to death would imply that the association between the two event times is purely caused and explained by the recorded prognostic factors. Therefore, the recurrence event would provide no additional prognostic value for death in the presence of these prognostic factors. On the other hand, conditional dependence given these prognostic factors would imply a more intrinsic association between the two event times, suggesting the existence of other key factors that underlie such an association. Moreover, when making disease prognosis for death, one needs to consider the time and status of the recurrence event, after accounting for all the prognostic factors in the model.²⁷,²⁸

Table 6 presents Kendall’s tau and the C index with and without covariates. We chose the quantile range $R$ by considering the high censoring rate and following the procedure described in Section 2.2. As expected, the upper limits τ_{U_j}, j = 1, 2, vary slightly with the covariate vector Z. To facilitate comparison, we display the results for $R = (0, 0.5] \times (0, 0.3]$, setting τ_{U_j} to the smallest upper bound over all Z. We observe that the raw restricted-range Kendall’s tau and C index between time to disease recurrence and time to death are 0.751 (95% CI, 0.702 to 0.800) and 0.876 (95% CI, 0.851 to 0.900), respectively, suggesting strong association and concordance between the two event times. Interestingly, conditioning on the collected prognostic factors does not explain much of the dependence between the two outcomes. When conditioning on one factor at a time, the number of progesterone receptors reduces the estimated Kendall’s tau the most, but only from 0.751 to 0.713. After adjusting for all the covariates with statistically significant effects,²⁶ the restricted-range conditional Kendall’s tau and conditional C index decrease to 0.705 (95% CI, 0.585 to 0.825) and 0.852 (95% CI, 0.793 to 0.912), respectively. The estimates are not very sensitive to the choice of $R$. For example, when we set $R = (0, 0.6] \times (0, 0.4]$, the conditional Kendall’s tau changed from 0.705 to 0.697 for the last model in Table 6. This slight change may be due to some variation in the local association level.

TABLE 6.

Analysis of the German Breast Cancer Study dataset without covariates (first row) and with covariates, where (τ_U₁, τ_U₂) = (0.5, 0.3)

	g(x) = log(x)
	τ	SE(τ)	p-value	C	SE(C)
(Y₁, Y₂)	0.751	0.025	<0.001	0.876	0.012

(Y₁, Y₂ \| age)	0.751	0.028	<0.001	0.875	0.014

(Y₁, Y₂ \| menopause)	0.743	0.024	<0.001	0.872	0.012

(Y₁, Y₂ \| hormone)	0.740	0.027	<0.001	0.870	0.013

(Y₁, Y₂ \| grade)	0.737	0.036	<0.001	0.868	0.018

(Y₁, Y₂ \| estrg_recp)	0.736	0.040	<0.001	0.868	0.020

(Y₁, Y₂ \| size)	0.735	0.038	<0.001	0.867	0.019

(Y₁, Y₂ \| nodes)	0.718	0.046	<0.001	0.859	0.023

(Y₁, Y₂ \| prog_recp)	0.713	0.046	<0.001	0.857	0.023

(Y₁, Y₂ \| grade, nodes, prog_recp)	0.705	0.061	<0.001	0.852	0.031

In summary, the analysis results suggest that the time to recurrence and time to death are strongly associated, irrespective of whether the prognostic covariates are adjusted for. Even in the presence of all the important covariates, the time to disease recurrence bears strong concordance with, and thus carries desirable predictive value for, the time to death.

5|. DISCUSSION

We proposed general semiparametric methods for both complete bivariate outcomes and right-censored bivariate outcomes to quantify the dependence between the two outcomes, conditionally on one or more covariates. The estimator also gives rise to a conditional version of the C index to evaluate the predictive value of one outcome for another, after controlling for the covariates. Posing minimal assumptions, the estimators are rather flexible and robust. Extensive simulation studies and two data applications demonstrate the satisfactory finite-sample performance and practical usefulness of our proposed estimator. We have focused on Kendall’s tau and the C index for assessing conditional association and conditional concordance. The proposed methods can be easily extended to other measures that can be expressed as a function of the conditional copula, including Spearman’s rho and Gini’s coefficient.

Our methods pose the common copula assumption by assuming that covariates affect the marginal distributions of the outcomes but not the conditional association. This assumption simplifies the estimation procedure and has been adopted in many association studies. To verify the validity of this assumption, the methods of Li et al¹² can be adopted.

Due to the need to specify a marginal model for Q_j(u_j|Z_j), the estimator of the conditional Kendall’s tau is not invariant to monotone transformations. This is because the estimated quantile, ${\hat{Q}}_{j}(u_{j} \mid Z_{j})$, depends on the link function g(·) in the marginal quantile regression model, whereas the true quantile Q_j(u_j|Z_j) is equivariant to monotone transformations. However, our results in Table 5 suggest that the estimated conditional Kendall’s tau is insensitive to the choice of g(·). This is likely because the flexibility of the quantile regression model allows it to approximate the true quantiles Q_j(u_j|Z_j) well, even when the link function is misspecified.
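
The two properties contrasted here can be seen numerically in the covariate-free case. The sketch below uses synthetic data and a hypothetical helper, `empirical_quantile`. It shows that sample quantiles commute with the log transformation and that Kendall’s tau is unchanged by it. A fitted regression quantile under a misspecified link need not share this equivariance, which is the source of the non-invariance discussed above.

```python
import numpy as np
from scipy.stats import kendalltau

def empirical_quantile(y, u):
    """Inverse-ECDF sample quantile: the smallest order statistic whose
    empirical CDF value is at least u (0 < u <= 1)."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    k = int(np.ceil(u * len(y_sorted))) - 1
    return y_sorted[max(k, 0)]

rng = np.random.default_rng(3)
y1 = rng.lognormal(mean=0.0, sigma=1.0, size=500)
y2 = y1 * rng.lognormal(mean=0.0, sigma=0.5, size=500)

# Quantiles are equivariant: the u-th quantile of log(Y) is log of the
# u-th quantile of Y.
q_then_log = np.log(empirical_quantile(y1, 0.3))
log_then_q = empirical_quantile(np.log(y1), 0.3)

# Kendall's tau depends only on ranks, so it is invariant to the log.
tau_raw = kendalltau(y1, y2)[0]
tau_log = kendalltau(np.log(y1), np.log(y2))[0]
```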

In this work, we fit two marginal models for the bivariate outcomes separately. If the bivariate outcomes represent the same outcome measured from two correlated subjects, one may estimate β_j(u) by jointly considering the bivariate outcomes, using quantile regression methods for correlated data.²⁹,³⁰ This may lead to efficiency gains when compared to fitting the two marginal models separately. However, for most of the applications, the bivariate outcomes represent two different outcomes measured from the same subject. In this case, it is generally difficult to jointly estimate β_j(u), j = 1, 2 under the quantile regression framework.

When the bivariate event times consist of the time to a landmark disease event, such as recurrence, and the time to death, additional complications may arise because death dependently censors the disease outcome. When such a complication arises, the data structure becomes semicompeting risks data instead of bivariate survival data. In the GBCS data, the rate of dependent censoring was very low, and our sensitivity studies provide reassuring evidence that the proposed method can still perform well. However, in scenarios with higher rates of dependent censoring, it is necessary to account for the additional complication due to dependent censoring. Several methods exist for semicompeting risks data, mostly posing copula or frailty models to associate the disease event and the death event.²⁸,³¹⁻³³ It is of interest to extend the proposed methods along these directions to handle semicompeting risks data.

Next, though we considered only the right-censoring mechanism in the simulation study and real data analysis, we can easily adapt our method to handle left-censoring by reversing the outcome(s). This would enable us to apply the proposed method to study dependence between survival times and a left-censored biomarker. Such data are abundant in biomedical studies, such as data from the prehospital resuscitation on helicopter study.³⁴ Adapting the proposed methods to handle interval censoring and truncation is beyond the scope of this article but merits future research.
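
One plausible reading of "reversing the outcome(s)" is a reflection Y* = M − Y, which turns a lower detection limit into an upper censoring bound. The sketch below is illustrative only: the helper `reverse_left_censored`, the pivot constant M, and the toy biomarker values are assumptions, not part of the proposed method.

```python
import numpy as np

def reverse_left_censored(x, observed, pivot=None):
    """Turn a left-censored variable into a right-censored one.

    x: recorded values, equal to max(Y, detection limit).
    observed: True where Y itself was recorded, False where Y fell
        below the detection limit.
    pivot: constant M in the reflection Y* = M - Y; defaults to max(x).
    """
    x = np.asarray(x, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    if pivot is None:
        pivot = x.max()
    # M - max(Y, limit) = min(M - Y, M - limit): right censoring at M - limit.
    return pivot - x, observed

# Toy biomarker with detection limit 1.0 (illustrative numbers only).
y_true = np.array([0.4, 2.5, 1.7, 0.9, 3.1])
limit = 1.0
x = np.maximum(y_true, limit)
observed = y_true >= limit
x_rev, delta = reverse_left_censored(x, observed, pivot=5.0)
```

Because the reflection reverses the ranking of the transformed outcome, Kendall’s tau between the reflected biomarker and a survival time has the opposite sign of the tau on the original scale.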

Supplementary Material

Supplement: NIHMS1561358-supplement-Supplement.pdf (149.7 KB, PDF)

ACKNOWLEDGEMENTS

The authors are grateful to the editor, associate editor, and two referees for their helpful comments, which led to substantial improvements to this paper. This research was partially supported by the National Institutes of Health (NIH) through awards 1R01DK117209, 5R01CA193878, 1R03NS108136-01A1, and U01HL077863; by the Division of Mathematical Sciences, National Science Foundation (NSF), through award 1612965; and by the Andrew Sabin Family Fellowship. The authors acknowledge the Texas Advanced Computing Center at the University of Texas at Austin for providing high performance computing resources that have contributed to the research results reported within this paper (http://www.tacc.utexas.edu).

Funding information

National Institutes of Health (NIH), Grant/Award Number: 1R01DK117209, 5R01CA193878, 1R03NS108136-01A1, and U01HL077863; Division of Mathematical Sciences, National Science Foundation (NSF), Grant/Award Number: 1612965; Andrew Sabin Family Fellowship

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

DATA AVAILABILITY STATEMENT

The GBCS data²⁶ that support the findings of this study are openly available at ftp://ftp.wiley.com/public/sci_tech_med/survival/ and the Premature Twins Data²⁵ are openly available at https://publichealth.yale.edu/c2s2/software/twin_analysis/sample_sas_real.aspx.

REFERENCES

1. Veraverbeke N, Omelka M, Gijbels I. Estimation of a conditional copula and association measures. Scand J Stat. 2011;38(4):766–780.
2. Gijbels I, Omelka M, Veraverbeke N. Estimation of a copula when a covariate affects only marginal distributions. Scand J Stat. 2015;42(4):1109–1126.
3. Ji S, Ning J, Qin J, Follmann D. Conditional independence test by generalized Kendall’s tau with generalized odds ratio. Stat Methods Med Res. 2018;27(11):3224–3235.
4. Oakes D. A concordance test for independence in the presence of censoring. Biometrics. 1982;38(2):451–455.
5. Wang W, Wells MT. Estimation of Kendall’s tau under censoring. Statistica Sinica. 2000;10(4):1199–1215.
6. Fan J, Hsu L, Prentice RL. Dependence estimation over a finite bivariate failure time region. Lifetime Data Anal. 2000;6(4):343–355.
7. Lakhal L, Rivest L-P, Beaudoin D. IPCW estimator for Kendall’s tau under bivariate censoring. Int J Biostat. 2009;5(1):1–22.
8. Hsieh J-J. Estimation of Kendall’s tau from censored data. Comput Stat Data Anal. 2010;54(6):1613–1621.
9. Fan J, Prentice RL. Covariate-adjusted dependence estimation on a finite bivariate failure time region. Statistica Sinica. 2002;12(3):689–705.
10. Choi Y-H, Matthews DE. Accelerated life regression modelling of dependent bivariate time-to-event data. Can J Stat. 2005;33(3):449–464.
11. Koenker R. Quantile Regression. Cambridge, UK: Cambridge University Press; 2005. doi:10.1017/CBO9780511754098
12. Li R, Cheng Y, Fine JP. Quantile association regression models. J Am Stat Assoc. 2014;109(505):230–242.
13. Portnoy S. Censored regression quantiles. J Am Stat Assoc. 2003;98(464):1001–1012.
14. Peng L, Huang Y. Survival analysis with quantile regression models. J Am Stat Assoc. 2008;103(482):637–649.
15. Schweizer B, Wolff EF. On nonparametric measures of dependence for random variables. Ann Stat. 1981;9(4):879–885.
16. Nelsen RB. An Introduction to Copulas. New York, NY: Springer; 2006. ISBN 978-0-387-28678-5.
17. Gijbels I, Veraverbeke N, Omelka M. Conditional copulas, association measures and their applications. Comput Stat Data Anal. 2011;55(5):1919–1932.
18. Li R, Cheng Y, Chen Q, Fine J. Quantile association for bivariate survival data. Biometrics. 2016;73(2):506–516.
19. Prentice RL, Cai J. Covariance and survivor function estimation using censored multivariate failure time data. Biometrika. 1992;79(3):495–512.
20. Gill RD, van der Laan MJ, Wellner JA. Inefficient estimators of the bivariate survival function for three models. Annales de l’I.H.P. Probabilités et Statistiques. 1995;31(3):545–597.
21. Harrell FE Jr, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA. 1982;247(18):2543–2546.
22. Harrell FE Jr, Lee KL, Califf RM, Pryor DB, Rosati RA. Regression modelling strategies for improved prognostic prediction. Stat Med. 1984;3(2):143–152.
23. Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61(1):92–105.
24. Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105–1117.
25. Feng R, Zhou G, Zhang M, Zhang H. Analysis of twin data using SAS. Biometrics. 2009;65(2):584–589.
26. Sauerbrei W, Royston P, Bojar H, Schmoor C, Schumacher M, German Breast Cancer Study Group (GBSG). Modelling the effects of standard prognostic factors in node-positive breast cancer. Br J Cancer. 1999;79(11-12):1752–1760.
27. Mauguen A, Rachet B, Mathoulin-Pelissier S, MacGrogan G, Laurent A, Rondeau V. Dynamic prediction of risk of death using history of cancer recurrences in joint frailty models. Stat Med. 2013;32(30):5366–5380.
28. Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V. Personalized dynamic prediction of death according to tumour progression and high-dimensional genetic factors: meta-analysis with a joint model. Stat Methods Med Res. 2018;27(9):2842–2858.
29. Jung S-H. Quasi-likelihood for median regression models. J Am Stat Assoc. 1996;91(433):251–257.
30. Leng C, Zhang W. Smoothing combined estimating equations in quantile regression for longitudinal data. Stat Comput. 2014;24(1):123–136.
31. Fine JP, Jiang H, Chappell R. On semi-competing risks data. Biometrika. 2001;88(4):907–919.
32. Peng L, Jiang H, Chappell RJ, Fine JP. An overview of the semi-competing risks problem. In: Biswas A, Datta S, Fine JP, Segal MR, eds. Statistical Advances in the Biomedical Sciences: Clinical Trials, Epidemiology, Survival Analysis, and Bioinformatics. Hoboken, NJ: John Wiley & Sons; 2007.
33. Peng M, Xiang L, Wang S. Semiparametric regression analysis of clustered survival data with semi-competing risks. Comput Stat Data Anal. 2018;124(8):53–70.
34. Holcomb JB, Swartz MD, DeSantis SM, et al. Multicenter observational prehospital resuscitation on helicopter study (PROHS). J Trauma Acute Care Surg. 2017;83(1 Suppl 1):S83–S91.
