Spearman-like correlation measure adjusting for covariates in bivariate survival data

Svetlana K Eden; Chun Li; Bryan E Shepherd

doi:10.1002/bimj.202200137

Author manuscript; available in PMC: 2024 Dec 1.

Published in final edited form as: Biom J. 2023 Sep 27;65(8):e2200137. doi: 10.1002/bimj.202200137

PMCID: PMC10897866 NIHMSID: NIHMS1931307 PMID: 37753794

Abstract

We propose an extension of Spearman’s correlation for censored continuous and discrete data that permits covariate adjustment. Previously proposed non-parametric and semi-parametric Spearman’s correlation estimators require either non-parametric estimation of the bivariate survival surface or parametric assumptions about the dependence structure. In practice, non-parametric estimation of the bivariate survival surface is difficult, and parametric assumptions about the correlation structure may not be satisfied. Therefore, we propose a method that requires neither and uses only the marginal survival distributions. Our method estimates the correlation of probability-scale residuals, which has been shown to equal Spearman’s correlation when there is no censoring. Because this method relies only on marginal distributions, it tends to be less variable than the previously suggested non-parametric estimators, and the confidence intervals are easily constructed. Although under censoring it is biased for Spearman’s correlation, as our simulations show, it performs well under moderate censoring, with a smaller mean squared error than non-parametric approaches. We also extend it to partial (adjusted), conditional, and partial-conditional correlation, which makes it particularly relevant for practical applications. We apply our method to estimate the correlation between time to viral failure and time to regimen change in a multi-site cohort of persons living with HIV in Latin America.

Keywords: Bivariate survival data, Non-parametric, Probability-scale residuals, Semi-parametric, Spearman’s correlation

1. Introduction

Many scientific studies focus on measuring correlation between two variables. Correlation can occur when one variable affects the other or when a third variable affects both variables of interest. In either case, it can be measured using an unadjusted correlation, but to understand the mechanism behind the relationship between two variables, it is also essential to be able to compute the conditional and adjusted correlation. For example, for people living with HIV and receiving antiretroviral therapy (ART), an increase in viral load will often prompt the treating physician to change the person’s ART regimen. Therefore, time to viral failure (the failure of ART to suppress the HIV virus) and time to regimen change should be correlated. However, because of differences in clinical practices, this correlation may vary depending on the clinic, location, or patient characteristics such as age, lab results, and comorbidities.

Measuring correlation can be challenging in the presence of right censoring. Right censoring is a general term for data with values that are not always observed due to the end of study. We are interested in the correlation between variables in a single subject (e.g., time to viral failure and time to regimen change) or in paired subjects (e.g., income of father and income of son). Specifically, we are interested in Spearman’s correlation, a non-parametric rank correlation measure. Unlike Pearson’s correlation, it is invariant to monotonic transformations and robust in the presence of outliers. It also approximates Pearson’s correlation well for normally distributed variables (Kruskal, 1958), and therefore is more popular in practice than other non-parametric correlation measures (e.g., Kendall’s tau (Kruskal, 1958) or cross ratio (Clayton, 1978)). Also, the interpretation of Spearman’s correlation is straightforward for continuous and ordinal data (i.e., the correlation of the ranked data) and is desirable in the context of right-censored data.

In the setting of bivariate right censoring, several Spearman-like statistics have been suggested. Test statistics of Cuzick (1982) and Dabrowska (1986) resemble Spearman’s correlation under certain assumptions and are effectively unscaled estimates of Spearman’s correlation when applied to uncensored data. However, these test statistics are not easy to interpret. Semi-parametric approaches for measuring correlation using copulas have been considered by Carriere (2000), Romeo et al. (2006), and Zhang (2008); although some of these authors did not estimate Spearman’s correlation, it can be computed from the estimated copulas (Nelsen, 2007). Schemper et al. (2013) proposed a semi-parametric iterative multiple imputation method to estimate Spearman’s correlation based on a normal copula. These semi-parametric approaches tend to be stable and efficient when the copula is properly specified but can be misleading in the presence of misspecification.

Eden et al. (2021) proposed to estimate Spearman’s correlation based on non-parametric estimators of the bivariate survival surface; they used the non-parametric and consistent estimator of Dabrowska (1988). Although this approach does not make parametric assumptions, its reliance on non-parametric estimators of the bivariate survival surface, which is notoriously difficult to estimate (Kalbfleisch and Prentice, 2011), can lead to poor efficiency as well as instability when the sample size is small and there is heavy censoring.

Additionally, it is desirable to measure the rank correlation between bivariate right-censored data while adjusting for covariates. Several bivariate survival models have been suggested to estimate adjusted relationships. To mention a few, Clayton and Cuzick (1985) proposed a method of estimating a cross ratio (Clayton, 1978) in the context of a frailty model that may include covariates. Shih and Louis (1995) estimated Kendall’s tau by first fitting separate Cox models conditional on covariates for both of the time-to-event variables, and then using maximum likelihood to estimate association assuming different parametric dependence structures defined with copulas. Prentice and Hsu (1997) developed a method that used Cox models to estimate marginal distributions for each variable conditional on covariates and assumed a semi-parametric pairwise dependence structure. To our knowledge, there is no estimator of partial Spearman’s correlation for bivariate survival data that avoids assuming a correlation structure for the bivariate distribution.

In this manuscript, we derive unadjusted, partial, and conditional estimators of Spearman’s correlation for bivariate survival data. Our estimators are extensions of the approach of Liu et al. (2018), who showed that Spearman’s rank correlation for uncensored continuous or discrete variables is equivalent to the correlation between probability-scale residuals (PSRs). PSRs are well defined with continuous, ordinal, and right-censored data (Shepherd et al., 2016), and can be computed with unadjusted and adjusted estimates of the marginal survival distributions. This is advantageous because it avoids computing estimates of the bivariate survival surface, and it provides a straightforward extension for covariate-adjustment. In Section 2, we review the definition of PSR, define our unadjusted Spearman’s correlation estimator using PSRs, describe its population parameter, and show how it is related to other Spearman-like statistics in the literature. In Section 3, we focus on estimation and inference of partial and conditional Spearman’s correlation with right-censored data using PSRs. In Section 4, we use simulations to estimate the performance of our statistics and compare them to other approaches. In Section 5, we apply our method to an HIV study examining the association between times from treatment initiation to viral failure and to regimen change. Finally, in Section 6, we discuss our approach and future directions. All the methods are implemented using R code available at https://github.com/SvetlanaEden/survRhoPSR. The de-identified data (adults only) used in this manuscript are available at https://biostat.app.vumc.org/ArchivedAnalyses/.

2. Unadjusted correlation of PSRs

2.1. Notation and definitions

Let $T_{X}$ and $T_{Y}$ be time to event variables, either for a single subject or for a pair of subjects. Time to events $T_{X}$ and $T_{Y}$ can be censored at times $C_{X}$ and $C_{Y}$ , respectively. We assume independence between ( $T_{X}$ , $T_{Y}$ ) and ( $C_{X}$ , $C_{Y}$ ), but $C_{X}$ and $C_{Y}$ can be dependent. Without loss of generality we assume that ( $T_{X}$ , $T_{Y}$ ) and ( $C_{X}$ , $C_{Y}$ ) are defined on $[0, \infty) \times [0, \infty)$ . If $T_{X}$ and $T_{Y}$ are observed on a single subject then it is likely that $C_{X} = C_{Y}$ . When $C_{X} = C_{Y}$ with probability 1, we call this univariate censoring, otherwise censoring is bivariate. Often, studies are restricted by the maximum follow-up time, which we denote as $τ_{X}$ and $τ_{Y}$ for $T_{X}$ and $T_{Y}$ , respectively. When $τ_{X} = τ_{Y} = \infty$ , we call it unbounded censoring. When $τ_{X} < \infty$ or $τ_{Y} < \infty$ , we refer to it as type I censoring. Type I censoring can be strict or generalized (Klein and Moeschberger, 1997). Strict type I censoring implies that all subjects start the study at the same calendar time, and there is no censoring other than that at the end of the study. Generalized type I censoring allows other patterns of study entry and censoring before the end of the study as long as the resulting censoring mechanism is uninformative. As a result of censoring, we only observe $X = \min (T_{X}, C_{X})$ and $Y = \min (T_{Y}, C_{Y})$ and event indicators $Δ_{X} = 1 (T_{X} \leq C_{X})$ and $Δ_{Y} = 1 (T_{Y} \leq C_{Y})$ . We denote marginal cumulative distribution functions of $T_{X}$ and $T_{Y}$ as $F_{T_{X}} (x) = \Pr (T_{X} \leq x)$ and $F_{T_{Y}} (y) = \Pr (T_{Y} \leq y)$ . We define $F_{T_{X}} (x^{-}) = \lim_{t ↑ x} F_{T_{X}} (t)$ ; function $F_{T_{Y}} (y^{-})$ is defined similarly.

As mentioned by Liu et al. (2018), in the absence of censoring, the population parameter for Spearman’s correlation between $T_{X}$ and $T_{Y}$ can be defined as

ρ_{S} = Cor {\frac{F_{T_{X}} (T_{X}) + F_{T_{X}} (T_{X}^{-})}{2}, \frac{F_{T_{Y}} (T_{Y}) + F_{T_{Y}} (T_{Y}^{-})}{2}} .

When both $T_{X}$ and $T_{Y}$ are continuous the above definition translates into a better known expression, $ρ_{S} = Cor {F_{T_{X}} (T_{X}), F_{T_{Y}} (T_{Y})}$ , the grade correlation (Kruskal, 1958). Liu et al. (2018) showed that $ρ_{S}$ can be presented as

ρ_{S} / c_{ρ} = Cov [{F_{T_{X}} (T_{X}) + F_{T_{X}} (T_{X}^{-}) - 1}, {F_{T_{Y}} (T_{Y}) + F_{T_{Y}} (T_{Y}^{-}) - 1}],

(1)

where $c_{ρ} = {[Var {F_{T_{X}} (T_{X}) + F_{T_{X}} (T_{X}^{-}) - 1} Var {F_{T_{Y}} (T_{Y}) + F_{T_{Y}} (T_{Y}^{-}) - 1}]}^{- 1 / 2}$ , and $c_{ρ} = 3$ when $T_{X}$ and $T_{Y}$ are continuous. The right-hand side of (1) is the covariance of probability-scale residuals (PSRs) (Li and Shepherd, 2012; Shepherd et al., 2016), which are defined as

r (t_{X}, F_{T_{X}}) = E {sign (t_{X}, T_{X})} = \Pr (T_{X} < t_{X}) - \Pr (T_{X} > t_{X}) = F_{T_{X}} (t_{X}^{-}) + F_{T_{X}} (t_{X}) - 1,

where sign ( $t_{X}$ , $T_{X}$ ) is −1, 0, and 1 for $t_{X} < T_{X}$ , $t_{X} = T_{X}$ , and $t_{X} > T_{X}$ , respectively.

Shepherd et al. (2016) extended this definition to right-censored time-to-event data. When the time to event $T_{X}$ is unknown because of censoring, they suggested using the expectation of PSRs, $r (x, F_{T_{X}}, Δ_{X} = 0) = E {r (T_{X}, F_{T_{X}}) ∣ T_{X} > x} = F_{T_{X}} (x)$ . This led to the following definition: $r (x, F_{T_{X}}, δ_{X}) = F_{T_{X}} (x) - δ_{X} (1 - F_{T_{X}} (x^{-}))$ , where ( $x$ , $δ_{X}$ ) is a realization of ( $X$ , $Δ_{X}$ ), or as a random variable,

r (X, F_{T_{X}}, Δ_{X}) = F_{T_{X}} (X) - Δ_{X} (1 - F_{T_{X}} (X^{-})) .

(2)

For example, suppose $F_{T_{X}} (x) = F_{T_{X}} (x^{-}) = 0.5$ for continuous $X = x$ ; i.e., $x$ is the median. If $Δ_{X} = 1$ , the PSR is $2 (0.5) - 1 = 0$ . If $Δ_{X} = 0$ , the PSR is the expectation of $2 F_{T_{X}} (T_{X}) - 1$ given $T_{X} > x$ , which equals $F_{T_{X}} (x) = 0.5$ .
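The definition in (2) and the median example can be checked numerically. The following is a minimal sketch, assuming a continuous time to event so that $F_{T_{X}} (T_{X})$ is uniform on (0, 1); the function names `psr` and `midpoint_mean` are illustrative and are not taken from the authors' code.

```python
def psr(F_x, F_x_minus, delta):
    """Probability-scale residual, equation (2): F(x) - delta * (1 - F(x-))."""
    return F_x - delta * (1.0 - F_x_minus)


def midpoint_mean(g, lo, hi, cells=10000):
    """Average of g over the interval (lo, hi), computed with the midpoint rule."""
    h = (hi - lo) / cells
    return sum(g(lo + (k + 0.5) * h) for k in range(cells)) / cells


# Worked example from the text: x is the median of a continuous distribution.
event_at_median = psr(0.5, 0.5, 1)      # 2(0.5) - 1
censored_at_median = psr(0.5, 0.5, 0)   # F(x)

# For continuous T, U = F(T) is uniform on (0, 1); given T > x it is uniform
# on (F(x), 1), so the censored PSR is the average of 2U - 1 over that range.
expected_beyond_median = midpoint_mean(lambda u: 2 * u - 1, 0.5, 1.0)

# Second moment of an uncensored continuous PSR (its mean is zero). Plugging
# it into the expression for c_rho after (1) recovers the constant 3.
var_psr = midpoint_mean(lambda u: (2 * u - 1) ** 2, 0.0, 1.0)
c_rho = (var_psr * var_psr) ** -0.5
```

The same averaging argument with a general lower limit $F_{T_{X}} (x)$ gives $\{F_{T_{X}} (x) + 1\} - 1 = F_{T_{X}} (x)$ , which is the censored residual used in (2).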

2.2. Correlation of probability-scale residuals: population parameter and estimation

Note that the estimation of (1) requires the estimation of the joint distribution of $T_{X}$ and $T_{Y}$ , which is notoriously challenging to estimate non-parametrically (Dabrowska, 1988; Pruitt, 1991; Van Der Laan, 1996). In our earlier work (Eden et al., 2021), we plugged in the non-parametric bivariate survival surface estimator of Dabrowska (1988) into (1) to estimate $ρ_{S}$ , although other non-parametric estimators are feasible (e.g., Prentice and Zhao (2019)) and may be preferred in specific settings (e.g., Tsai et al. (1986) and Tsai and Crowley (1998) with univariate censoring). However, non-parametric estimators of the bivariate survival surface can be quite variable under heavy censoring, and under type I censoring the bivariate survival surface is not identifiable non-parametrically. Hence, we are interested in developing a correlation measure that does not require estimating the bivariate survival surface.

Because in the absence of censoring the correlation of PSRs equals Spearman’s correlation, it is natural to consider it as a measure of association in the presence of censoring, where the PSR for censored observations is replaced with its expectation conditional on the event occurring after the censoring time:

ρ_{P S R} = Cor {r (X, F_{T_{X}}, Δ_{X}), r (Y, F_{T_{Y}}, Δ_{Y})} = \frac{Cov {r (X, F_{T_{X}}, Δ_{X}), r (Y, F_{T_{Y}}, Δ_{Y})}}{\sqrt{Var {r (X, F_{T_{X}}, Δ_{X})} Var {r (Y, F_{T_{Y}}, Δ_{Y})}}} .

(3)

Shepherd et al. (2016) proved that PSRs have zero expectation when $F_{T_{X}}$ and $F_{T_{Y}}$ are properly specified and $T_{X} ⊥ C_{X}$ and $T_{Y} ⊥ C_{Y}$ ; therefore definition (3) can be rewritten as

ρ_{P S R} / c_{ρ} = E_{(X, Y, Δ_{X}, Δ_{Y})} {r (X, F_{T_{X}}, Δ_{X}) r (Y, F_{T_{Y}}, Δ_{Y})},

(4)

where $c_{ρ} = {[E_{(X, Δ_{X})} {r^{2} (X, F_{T_{X}}, Δ_{X})} E_{(Y, Δ_{Y})} {r^{2} (Y, F_{T_{Y}}, Δ_{Y})}]}^{- 1 / 2}$ .

The estimation of $ρ_{P S R}$ is straightforward. Let ( $x_{i}$ , $δ_{X, i}$ , $y_{i}$ , $δ_{Y, i}$ ) for $i = 1, \dots, n$ be independent and identically distributed (iid) draws from ( $X$ , $Δ_{X}$ , $Y$ , $Δ_{Y}$ ), and let ${\hat{F}}_{T_{X}}$ and ${\hat{F}}_{T_{Y}}$ be Kaplan–Meier estimates of $F_{T_{X}}$ and $F_{T_{Y}}$ , respectively. We estimate the right-hand side of (4) as

{\hat{ρ}}_{P S R} / {\hat{c}}_{ρ} = \frac{1}{n} \sum_{i} {r (x_{i}, {\hat{F}}_{T_{X}}, δ_{X, i}) r (y_{i}, {\hat{F}}_{T_{Y}}, δ_{Y, i})},

(5)

where {\hat{c}}_{ρ} = {[{\frac{1}{n} \sum_{i} r^{2} (x_{i}, {\hat{F}}_{T_{X}}, δ_{X, i})} {\frac{1}{n} \sum_{i} r^{2} (y_{i}, {\hat{F}}_{T_{Y}}, δ_{Y, i})}]}^{- 1 / 2} .

(6)

Note that this approach does not require estimating the joint distribution of $T_{X}$ and $T_{Y}$ . Note also that under strict type I censoring, (5) is the Spearman’s correlation after assigning all censored observations the highest rank value. Estimation of the bivariate or marginal survival distributions beyond the highest observed event time is not necessary to obtain $ρ_{P S R}$ .
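The plug-in estimator in (5) and (6) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' R code: the helper names `km_cdf`, `psrs`, and `rho_psr` are hypothetical, the Kaplan-Meier estimate is computed by brute force for readability, and no attempt is made to handle degenerate inputs (for example, all residuals equal to zero).

```python
import bisect
import math


def km_cdf(times, events):
    """Kaplan-Meier estimate of F = 1 - S, returned as a function F(t, left=False)."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    cdf_values = []
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        surv *= 1.0 - deaths / at_risk
        cdf_values.append(1.0 - surv)

    def F(t, left=False):
        # left=True returns F(t-): only event times strictly before t count.
        if left:
            k = bisect.bisect_left(event_times, t)
        else:
            k = bisect.bisect_right(event_times, t)
        return cdf_values[k - 1] if k > 0 else 0.0

    return F


def psrs(times, events):
    """PSRs from equation (2), with the Kaplan-Meier estimate plugged in for F."""
    F = km_cdf(times, events)
    return [F(t) - d * (1.0 - F(t, left=True)) for t, d in zip(times, events)]


def rho_psr(x, dx, y, dy):
    """Equations (5) and (6); the 1/n factors cancel in the ratio."""
    rx, ry = psrs(x, dx), psrs(y, dy)
    num = sum(a * b for a, b in zip(rx, ry))
    den = math.sqrt(sum(a * a for a in rx) * sum(b * b for b in ry))
    return num / den
```

With no censoring and no ties, each residual is a linear function of the rank, so `rho_psr` reduces to the usual Spearman's rank correlation. Only the two marginal Kaplan-Meier curves are needed; the bivariate survival surface never appears.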

The variance of the estimator can be computed using a large sample approximation approach with M-estimation and the delta method (Stefanski and Boos, 2002). M-estimation can be used under very general assumptions and requires computing estimating equations and their derivatives for each unknown parameter, including Kaplan-Meier (KM) estimates of $F_{T_{X}} (x_{i})$ and $F_{T_{Y}} (y_{i})$ for all $i$ . We use the results of Stute (1995), who developed estimating equations for the Kaplan-Meier estimator (see Sections 7.1 and 7.2 of the Supporting Information).

2.3. Relations between ${\hat{ρ}}_{P S R}$ and other Spearman-like statistics

In this section, we briefly review some Spearman-like statistics in the literature and highlight their relationship to ${\hat{ρ}}_{P S R}$ . The idea of testing association in bivariate survival data using marginal distributions was suggested previously. Cuzick (1982) studied a situation where the two underlying times $T_{X}$ and $T_{Y}$ are assumed to be connected to a common latent variable: $T_{X} = a Z + e_{X}$ , $T_{Y} = b Z + e_{Y}$ , where the parameters $a$ and $b$ are constrained to be $b = a λ$ and both $e_{X}$ and $e_{Y}$ follow logistic distributions. To test the null hypothesis $a = 0$ , Cuzick suggested statistic $\sum_{i} (s_{X, i} s_{Y, i})$ , where $s_{\cdot, i} = 1 - 2 {\hat{F}}_{\cdot, i}$ for uncensored observations and $1 - {\hat{F}}_{\cdot, i}$ for censored observations, which is “a statistic equivalent to Spearman’s rank correlation coefficient” when there is no censoring. In comparison, for continuous time, $r (\cdot, {\hat{F}}_{\cdot, i}, 1) = 2 {\hat{F}}_{\cdot, i} - 1$ for uncensored observations and $r (\cdot, {\hat{F}}_{\cdot, i}, 0) = {\hat{F}}_{\cdot, i}$ for censored observations. Therefore, when there is no censoring, the statistic of Cuzick is ( $n - 1$ ) times the covariance of PSRs.

Dabrowska (1986) defined a more general version of $\sum_{i} (s_{X, i} s_{Y, i})$ for testing the null of independence of $T_{X}$ and $T_{Y}$ when the underlying times are continuous, where $s_{\cdot, i} = δ_{\cdot, i} - (1 + δ_{\cdot, i}) {\hat{F}}_{\cdot, i}$ is “the censored-data version of the Spearman test.” Since $s_{\cdot, i} = δ_{\cdot, i} (1 - {\hat{F}}_{\cdot, i}) - {\hat{F}}_{\cdot, i} = - r (\cdot, {\hat{F}}_{\cdot, i}, δ_{\cdot, i})$ , Dabrowska’s statistic is ( $n - 1$ ) times the covariance of PSRs for continuous time.
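The algebra relating these scores to PSRs is easy to verify on a grid of values of ${\hat{F}}$ . The sketch below simply encodes the score definitions as quoted in the two preceding paragraphs for continuous time; the function names are illustrative.

```python
def psr_continuous(F_hat, delta):
    """PSR for continuous time, where F(x-) = F(x)."""
    return F_hat - delta * (1.0 - F_hat)


def dabrowska_score(F_hat, delta):
    """Score quoted from Dabrowska (1986): delta - (1 + delta) * F_hat."""
    return delta - (1 + delta) * F_hat


def cuzick_score(F_hat, delta):
    """Score quoted from Cuzick (1982): 1 - 2F if uncensored, 1 - F if censored."""
    return 1.0 - 2.0 * F_hat if delta == 1 else 1.0 - F_hat


grid = [k / 10 for k in range(11)]

# Dabrowska's score is the negative PSR for censored and uncensored observations.
dabrowska_is_negative_psr = all(
    abs(dabrowska_score(F, d) + psr_continuous(F, d)) < 1e-12
    for F in grid for d in (0, 1)
)

# Cuzick's score is the negative PSR for uncensored observations.
cuzick_is_negative_psr_uncensored = all(
    abs(cuzick_score(F, 1) + psr_continuous(F, 1)) < 1e-12 for F in grid
)
```

Because both scores in a pair change sign together, the product $s_{X, i} s_{Y, i}$ equals the product of the corresponding PSRs whenever each score is the negative PSR.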

To test independence for bivariate current status data, Ding and Wang (2004) suggested using the statistic $(1 / n) \sum_{i} [{δ_{X, i} - {\hat{F}}_{T_{X}} (c_{X, i})} {δ_{Y, i} - {\hat{F}}_{T_{Y}} (c_{Y, i})}]$ , where $c_{X, i}$ and $c_{Y, i}$ are the times of collecting the status data. Their statistic is $(n - 1) / n$ times the covariance of PSRs for current status data, as defined by Shepherd et al. (2016).

All three statistics above are related to the covariance of PSRs under certain conditions. The advantage of using the correlation instead of the covariance is that it has a convenient range from −1 to 1, and therefore can be used not only as a test statistic but as a Spearman-like correlation measure.

2.4. Comparison of $ρ_{P S R}$ and $ρ_{S}$

When there is no censoring, $ρ_{P S R} = ρ_{S}$ . In the presence of censoring, however, $ρ_{P S R}$ does not necessarily equal $ρ_{S}$ . The distribution of PSRs is a mixture of two distributions, one corresponding to the time to event and the other corresponding to the time to censoring. Because $ρ_{P S R}$ depends on the censoring distribution in a complicated manner, we are unable to derive a general expression of $ρ_{P S R}$ solely as a function of $ρ_{S}$ ; we provide an intuitive explanation in Section 5 of the Supporting Information. Based on our simulations, censoring has a tendency to affect the joint distribution of $(r (X, F_{T_{X}}, Δ_{X}), r (Y, F_{T_{Y}}, Δ_{Y}))$ in such a way as to attenuate the estimated correlation. However, as will be illustrated in Section 4 via simulations, the difference between $ρ_{P S R}$ and $ρ_{S}$ is often quite small, particularly when the probability of censoring is low. In addition, the mean squared error of ${\hat{ρ}}_{P S R}$ is often smaller than that of other non-parametric estimators of $ρ_{S}$ . Importantly, regardless of the censoring distribution, $ρ_{P S R} = 0$ when $T_{X} ⊥ T_{Y}$ (see Theorem A.1 in the Supporting Information).

To illustrate the difference between $ρ_{P S R}$ and $ρ_{S}$ , we derived algebraic expressions of $ρ_{P S R}$ for four specific scenarios. They are combinations of two extreme correlation structures, perfectly positive and perfectly negative Spearman’s correlation $(ρ_{S} \in \{- 1, 1\})$ , and two types of censoring, strict type I censoring and unbounded censoring, both with $(T_{X}, T_{Y}) ⊥ (C_{X}, C_{Y})$ and $C_{X} ⊥ C_{Y}$ (see Section 6 in the Supporting Information). Figure 1 shows that for these scenarios, heavier censoring leads to lower absolute values of $ρ_{P S R}$ except under strict type I censoring with equal proportions of censoring for both variables. Unbounded censoring lowers $| ρ_{P S R} |$ more than type I censoring does. The maximum change in $| ρ_{P S R} |$ under strict type I censoring is 0.06 when $ρ_{S} = 1$ and 0.14 when $ρ_{S} = - 1$ , and that under unbounded censoring is 0.20 when $ρ_{S} = 1$ and 0.30 when $ρ_{S} = - 1$ . The performance of $ρ_{P S R}$ under other dependence structures and censoring scenarios will be studied through simulations in Section 4.

Figure 1. Contour plots for the absolute value of $ρ_{P S R}$ as a function of the proportion censored when $| ρ_{S} | = 1$ . The left and right columns represent scenarios of perfect positive and perfect negative correlations, respectively. The top and bottom rows are strict type I and unbounded bivariate censoring, respectively. Each contour represents a change of 0.016 in absolute correlation.
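One of the four scenarios, $ρ_{S} = 1$ under strict type I censoring, is simple enough to evaluate directly. The sketch below is not the algebraic expression from the Supporting Information; it is a numerical approximation under the assumptions that the times are continuous with known marginals, that $T_{Y}$ is an increasing function of $T_{X}$ (so both map to the same uniform variable $U$ ), and that each variable is censored only at the end of follow-up. The function name is hypothetical.

```python
import math


def rho_psr_comonotone_type1(prop_cens_x, prop_cens_y, cells=20000):
    """Approximate rho_PSR when rho_S = 1 under strict type I censoring.

    An observation is censored when U exceeds a = 1 - (proportion censored),
    in which case its PSR equals a; otherwise its PSR equals 2U - 1.
    """
    a, b = 1.0 - prop_cens_x, 1.0 - prop_cens_y
    sxy = sxx = syy = 0.0
    for k in range(cells):
        u = (k + 0.5) / cells            # midpoint rule over U in (0, 1)
        rx = 2 * u - 1 if u <= a else a
        ry = 2 * u - 1 if u <= b else b
        sxy += rx * ry
        sxx += rx * rx
        syy += ry * ry
    # PSRs have mean zero, so the correlation is a ratio of second moments.
    return sxy / math.sqrt(sxx * syy)


equal_censoring = rho_psr_comonotone_type1(0.3, 0.3)
unequal_censoring = rho_psr_comonotone_type1(0.0, 0.5)
```

With equal censoring proportions the two residuals coincide and the correlation stays at 1, which matches the exception noted for strict type I censoring; with unequal proportions the value falls below 1.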

3. Adjusted correlations of PSRs

3.1. Population parameters

Correlation between two variables is often confounded by another variable or a set of variables, $Z$ . For instance, the correlation between the time to viral failure and time to regimen change could be confounded by study site, age, sex, and CD4 count at ART initiation. Confounded associations create a practical necessity for an adjusted correlation estimator. For uncensored data, Liu et al. (2018) showed that adjusted or partial Spearman’s correlation, $ρ_{S \cdot Z}$ , could be computed as the correlation between PSRs from models adjusting for confounders. For right-censored data, partial Spearman’s correlation, $ρ_{P S R \cdot Z}$ , can be defined in a similar way:

ρ_{P S R \cdot Z} / c_{ρ \cdot Z} = Cov {r (X, F_{T_{X} ∣ Z}, Δ_{X}), r (Y, F_{T_{Y} ∣ Z}, Δ_{Y})}

(7)

where $c_{ρ \cdot Z} = {[Var {r (X, F_{T_{X} ∣ Z}, Δ_{X})} Var {r (Y, F_{T_{Y} ∣ Z}, Δ_{Y})}]}^{- 1 / 2}$ , $F_{T_{X} ∣ Z}$ is the distribution of $T_{X}$ conditional on $Z$ , and $F_{T_{Y} ∣ Z}$ is similarly defined. Throughout this section, we assume independence between $T_{X}$ and $C_{X}$ conditional on $Z$ and between $T_{Y}$ and $C_{Y}$ conditional on $Z$ .

In practice, in addition to studying adjusted correlation, researchers might be interested in studying how the correlation is modified by another variable or a set of variables. For example, we might ask whether the correlation between time to viral failure and time to regimen change varies with the CD4 count at ART initiation. This question can be answered by computing a conditional correlation. Following Liu et al. (2018), we define an extension of the conditional Spearman’s correlation for right-censored data as

ρ_{P S R ∣ Z} / c_{ρ ∣ Z} = Cov {r (X, F_{T_{X} ∣ Z}, Δ_{X}), r (Y, F_{T_{Y} ∣ Z}, Δ_{Y}) ∣ Z},

(8)

where $c_{ρ ∣ Z} = {[Var {r (X, F_{T_{X} ∣ Z}, Δ_{X}) ∣ Z} Var {r (Y, F_{T_{Y} ∣ Z}, Δ_{Y}) ∣ Z}]}^{- 1 / 2}$ . Note that unlike partial correlation (7), the conditional correlation defined in (8) is a function of $Z$ , acknowledging that the dependence structure between $T_{X}$ and $T_{Y}$ may depend on $Z$ .

Finally, researchers may be interested in studying how association adjusted for variables $Z_{1}$ is modified by variables $Z_{2}$ . This can be addressed by combining the previously described approaches and computing a partial-conditional correlation. For example, we might be interested in the correlation between the time to viral failure and time to regimen change for different levels of CD4 count at ART initiation ( $Z_{2}$ ) adjusted for study site, age, and sex ( $Z_{1}$ ). Similar to Liu et al. (2018), we define the extension of partial-conditional Spearman’s correlation for right-censored data, $ρ_{P S R \cdot Z_{1} ∣ Z_{2}}$ , as

ρ_{P S R \cdot Z_{1} ∣ Z_{2}} / c_{ρ \cdot Z_{1} ∣ Z_{2}} = Cov {r (X, F_{T_{X} ∣ Z}, Δ_{X}), r (Y, F_{T_{Y} ∣ Z}, Δ_{Y}) ∣ Z_{2}},

(9)

where $c_{ρ \cdot Z_{1} ∣ Z_{2}} = {[Var {r (X, F_{T_{X} ∣ Z}, Δ_{X}) ∣ Z_{2}} Var {r (Y, F_{T_{Y} ∣ Z}, Δ_{Y}) ∣ Z_{2}}]}^{- 1 / 2}$ and $Z = (Z_{1}, Z_{2})$ .

Similar to the unadjusted parameters described in Section 2, these covariate-adjusted Spearman-like parameters depend on the censoring distribution and therefore, in the presence of censoring, are not equal to the underlying covariate-adjusted Spearman’s correlations without censoring. However, they are bounded between −1 and 1, and they are equal to 0 when $T_{X} ⊥ T_{Y} ∣ Z$ and $(T_{X}, T_{Y}) ⊥ (C_{X}, C_{Y}) ∣ Z$ (see Theorem A.2 in the Supporting Information). In addition, their estimation does not require estimating the joint distribution of $T_{X}$ and $T_{Y}$ conditional on $Z$ , as will be shown in the next subsection.

3.2. Estimation

Estimation of the parameters (7), (8), and (9) can be performed using the steps suggested by Liu et al. (2018). Under correctly specified models and independent censoring, Shepherd et al. (2016) showed that $E {r (X, F_{T_{X} ∣ Z}, Δ_{X}) ∣ Z} = E {r (Y, F_{T_{Y} ∣ Z}, Δ_{Y}) ∣ Z} = 0$ , so the covariance and variance estimates can be approximated as expectations of the product and squared PSRs, respectively. Therefore, a plug-in estimator for the partial correlation is

{\hat{ρ}}_{P S R \cdot Z} / {\hat{c}}_{ρ \cdot Z} = \frac{1}{n} \sum_{i} r (x_{i}, {\hat{F}}_{T_{X} ∣ Z = z_{i}}, δ_{X, i}) r (y_{i}, {\hat{F}}_{T_{Y} ∣ Z = z_{i}}, δ_{Y, i}),

where ${\hat{c}}_{ρ \cdot Z} = {[{\frac{1}{n} \sum_{i} r^{2} (x_{i}, {\hat{F}}_{T_{X} ∣ Z = z_{i}}, δ_{X, i})} {\frac{1}{n} \sum_{i} r^{2} (y_{i}, {\hat{F}}_{T_{Y} ∣ Z = z_{i}}, δ_{Y, i})}]}^{- 1 / 2}$ , and ${\hat{F}}_{T_{X} ∣ Z}$ and ${\hat{F}}_{T_{Y} ∣ Z}$ are fitted distributions.

Although one could use parametric survival models (e.g., exponential, Weibull, or log-normal regressions) to obtain ${\hat{F}}_{T_{X} ∣ Z}$ and ${\hat{F}}_{T_{Y} ∣ Z}$ , this choice would seem contrary to the non-parametric nature of Spearman’s rank correlation. To preserve that non-parametric nature, we would fit a non-parametric model for each outcome. However, such models are hard to estimate with multivariable or continuous $Z$ . We can compromise and use a rank-based semi-parametric model, for example, Cox proportional hazards regression. Other choices could be the larger class of semi-parametric linear transformation models proposed by Zeng and Lin (2007).
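When $Z$ is a single categorical covariate, the fully non-parametric option mentioned above is feasible: the conditional distributions can be estimated by a separate Kaplan-Meier curve in each level of $Z$ . The sketch below illustrates the partial correlation estimator under that assumption only. It is a stand-in for, not a reproduction of, the Cox-based fit described in the text, and the names `km_cdf`, `stratified_psrs`, and `partial_rho_psr` are hypothetical.

```python
import bisect
import math


def km_cdf(times, events):
    """Kaplan-Meier estimate of F = 1 - S, returned as a function F(t, left=False)."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    cdf_values, surv = [], 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        surv *= 1.0 - deaths / at_risk
        cdf_values.append(1.0 - surv)

    def F(t, left=False):
        k = (bisect.bisect_left if left else bisect.bisect_right)(event_times, t)
        return cdf_values[k - 1] if k > 0 else 0.0

    return F


def stratified_psrs(times, events, strata):
    """PSRs computed against the Kaplan-Meier curve of each subject's own stratum."""
    out = [0.0] * len(times)
    for level in set(strata):
        idx = [i for i, s in enumerate(strata) if s == level]
        F = km_cdf([times[i] for i in idx], [events[i] for i in idx])
        for i in idx:
            out[i] = F(times[i]) - events[i] * (1.0 - F(times[i], left=True))
    return out


def partial_rho_psr(x, dx, y, dy, z):
    """Plug-in partial correlation of PSRs, adjusting for a categorical z."""
    rx, ry = stratified_psrs(x, dx, z), stratified_psrs(y, dy, z)
    num = sum(a * b for a, b in zip(rx, ry))
    return num / math.sqrt(sum(a * a for a in rx) * sum(b * b for b in ry))
```

In a toy data set where the two outcomes move in opposite directions within each level of `z` but both increase with `z`, the unadjusted estimate is positive while `partial_rho_psr` is negative, which is the kind of confounding the adjustment is meant to remove.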

Estimating conditional and partial-conditional correlations is a little more complicated. For conditional correlation at $Z = z_{i}$ , after obtaining PSRs $r (x_{i}, {\hat{F}}_{T_{X} ∣ Z = z_{i}}, δ_{X, i})$ and $r (y_{i}, {\hat{F}}_{T_{Y} ∣ Z = z_{i}}, δ_{Y, i})$ , we follow the approach in Liu et al. (2018) to obtain a smoothed estimate. Specifically, we fit linear models of $K = r (x_{i}, {\hat{F}}_{T_{X} ∣ Z = z_{i}}, δ_{X, i}) r (y_{i}, {\hat{F}}_{T_{Y} ∣ Z = z_{i}}, δ_{Y, i})$ , $L = r^{2} (x_{i}, {\hat{F}}_{T_{X} ∣ Z = z_{i}}, δ_{X, i})$ , and $W = r^{2} (y_{i}, {\hat{F}}_{T_{Y} ∣ Z = z_{i}}, δ_{Y, i})$ on $Z$ . We suggest using flexible modeling techniques (e.g., restricted cubic splines for continuous covariates). Next, we obtain the fitted values $\hat{K} (Z)$ , $\hat{L} (Z)$ , and $\hat{W} (Z)$ , and then calculate ${\hat{ρ}}_{P S R ∣ Z}$ as $\frac{\hat{K} (Z)}{\sqrt{\hat{L} (Z) \hat{W} (Z)}}$ . Although unlikely, this ratio can be outside of [−1,1] for some $Z$ , in which case we assign ${\hat{ρ}}_{P S R ∣ Z}$ to be −1 or 1 depending on its sign. Also, in extreme cases with small sample sizes and heavy censoring, terms $\hat{L} (Z)$ and $\hat{W} (Z)$ can be negative, in which case the estimate is not defined. Estimating partial-conditional correlation is performed similarly, except that we fit $K$ , $L$ , and $W$ on $Z_{2}$ .
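The smoothing step for the conditional correlation can be sketched as follows. This is a simplified illustration that assumes a single continuous covariate with at least two distinct values and a straight-line fit in place of the restricted cubic splines suggested above; `ols_line` and `conditional_rho_psr` are hypothetical names, and returning `None` when a fitted second moment is not positive is one possible way to flag an undefined estimate.

```python
import math


def ols_line(z, v):
    """Least-squares line of v on z, returned as a prediction function."""
    n = len(z)
    zbar, vbar = sum(z) / n, sum(v) / n
    szz = sum((a - zbar) ** 2 for a in z)
    slope = sum((a - zbar) * (b - vbar) for a, b in zip(z, v)) / szz
    return lambda z0: vbar + slope * (z0 - zbar)


def conditional_rho_psr(rx, ry, z, z0):
    """Conditional correlation of PSRs at Z = z0 from fitted K, L, and W."""
    K = ols_line(z, [a * b for a, b in zip(rx, ry)])(z0)
    L = ols_line(z, [a * a for a in rx])(z0)
    W = ols_line(z, [b * b for b in ry])(z0)
    if L <= 0.0 or W <= 0.0:
        return None                      # estimate is not defined
    ratio = K / math.sqrt(L * W)
    return max(-1.0, min(1.0, ratio))    # truncate to [-1, 1]
```

Here `rx` and `ry` are PSRs already computed from models conditional on $Z$ . For the partial-conditional correlation the same function would be called with the values of $Z_{2}$ in place of `z`, while the PSRs come from models conditional on both $Z_{1}$ and $Z_{2}$ .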

We estimate the variance of partial, conditional, and partial-conditional correlations using the bootstrap or M-estimation (Stefanski and Boos, 2002); see also Section 7.1 of the Supporting Information. If PSRs are estimated using a parametric regression, then score equations required for M-estimation can be conveniently obtained from standard statistical software. If PSRs are estimated using Cox proportional hazards regression (Cox, 1972), the estimating equations for $β$ -coefficients and the baseline hazard can be obtained using the full likelihood approach suggested by Breslow (1972). M-estimation approaches for exponential, Weibull, log-logistic, and log-normal survival regressions as well as Cox proportional hazards regression are detailed in Section 7.3 of the Supporting Information. Variance estimation for conditional and partial-conditional correlations also requires estimating equations from the models that provide $K$ , $L$ , and $W$ (see the previous paragraph), and although tedious, the approach is straightforward and similar to that described by Liu et al. (2018).
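For the bootstrap option, a generic resampling routine is enough, because the whole estimation pipeline (marginal models, PSRs, correlation) is simply re-run on each resample. The sketch below assumes subjects are resampled with replacement and that a percentile interval is used; the text does not specify the type of bootstrap interval, so that choice, along with the function name and defaults, is an assumption.

```python
import random


def bootstrap_percentile_ci(records, estimator, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for estimator(records).

    records   : list with one entry per subject, e.g. (x, delta_x, y, delta_y, z)
    estimator : function mapping such a list to a single number
    """
    rng = random.Random(seed)
    n = len(records)
    stats = []
    for _ in range(n_boot):
        # Resample whole subjects so the pairing of X, Y, and Z is preserved.
        resample = [records[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    alpha = 1.0 - level
    lower = stats[round(alpha / 2 * (n_boot - 1))]
    upper = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper
```

The `estimator` argument would wrap the full procedure, including refitting the marginal models on each resample, so that uncertainty in the estimated survival distributions is carried into the interval.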

3.3. Choice of covariates

As an aside, the choice of adjustment variables, $Z$ , deserves some careful consideration. Note that $F_{T_{X} ∣ Z}$ and $F_{T_{Y} ∣ Z}$ are the distributions of $T_{X}$ and $T_{Y}$ conditional on the same set of covariates $Z$ . Therefore, when $T_{X}$ and $T_{Y}$ belong to the same subject (e.g., times to viral failure and to regimen change), $Z$ must be the same for $F_{T_{X} ∣ Z}$ and $F_{T_{Y} ∣ Z}$ . When the times to events belong to different paired subjects, $Z$ should contain the union of covariates relevant for both subjects. For example, if one were interested in the prostate-specific antigen (PSA)-adjusted association between times to prostate cancer in father-son pairs, $Z$ should include both PSA for the father and PSA for the son.

4. Simulations

To investigate the performance of our unadjusted, partial, and conditional estimators, we applied them to simulated data with different sample sizes, dependence structures, and censoring scenarios. All simulations used 1000 replications and were performed in the statistical language R (R Core Team, 2017) using the packages survival (Therneau, 2015), lcopula (Belzile and Genest, 2017), and cubature (Narasimhan and Johnson, 2017). The dependence structures were simulated using copulas (Nelsen, 2007). For random variables $T_{X}$ and $T_{Y}$ with marginal CDFs $F_{T_{X}} (x)$ and $F_{T_{Y}} (y)$ , a copula is the joint CDF of random variables $U = F_{T_{X}} (T_{X})$ and $V = F_{T_{Y}} (T_{Y})$ , $H (u, v) = \Pr (U \leq u, V \leq v)$ . Following Fan et al. (2000), we employed two commonly used copulas, Clayton’s copula and Frank’s copula. Clayton’s family produces only positive association; the magnitude of the association is defined by parameter $θ$ . Frank’s family also has one parameter $θ$ and can generate positive and negative correlations. The choices of $θ$ are detailed below.

4.1. Unadjusted correlation

The data ( $X_{i}$ , $Δ_{X, i}$ , $Y_{i}$ , $Δ_{Y, i}$ ) were simulated in the following manner:

(U_{i}, V_{i}) \sim H (u, v; θ);

(10)

T_{X, i} = F_{T_{X}}^{- 1} (U_{i}), where F_{T_{X}} (x) = 1 - e^{- x};

(11)

T_{Y, i} = F_{T_{Y}}^{- 1} (V_{i}), where F_{T_{Y}} (y) = 1 - e^{- y};

(12)

C_{X, i} \sim Exponential (rate = λ_{X}), C_{Y, i} \sim Exponential (rate = λ_{Y});

(13)

X_{i} = \min (T_{X, i}, C_{X, i}); Y_{i} = \min (T_{Y, i}, C_{Y, i});

Δ_{X, i} = 1 (T_{X, i} \leq C_{X, i}); Δ_{Y, i} = 1 (T_{Y, i} \leq C_{Y, i});

where (10) was one of the following:

$ρ_{S} = 0$ implemented using $H (u, v) = u v$ ;
$ρ_{S} = 0.2$ implemented using Clayton’s copula $H (u, v, θ = 0.311)$ ;
$ρ_{S} = 0.2$ implemented using Frank’s copula $H (u, v, θ = 1.224)$ ;
$ρ_{S} = - 0.2$ implemented using Frank’s copula $H (u, v, θ = - 1.224)$ .

We studied sample sizes of 100 and 200 and simulated two types of unbounded censoring, univariate $(C_{X} \equiv C_{Y})$ and bivariate $(C_{X} ⊥ C_{Y})$ . The desired censoring proportion, $P$ , was achieved by choosing parameter $λ = P / (1 - P)$ . For unbounded univariate censoring, we used censoring proportions ( $P_{X}$ , $P_{Y}$ ) of (0.3,0.3) and (0.7,0.7). For unbounded bivariate censoring, the censoring proportions were (0.3,0.3), (0.3,0.7), and (0.7,0.7). For strict type I censoring, median survival time of the underlying distribution was used as the follow-up period. To simulate data under generalized type I censoring, we first simulated data under unbounded censoring and then censored all observations after the median survival time. As a result, the censoring proportions for generalized type I censoring were a little higher, (0.56,0.56) and (0.73,0.73).
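The censoring proportions quoted above can be checked in closed form. The sketch below assumes, as in (11) through (13), that the event time is standard exponential and independent of an exponential censoring time with rate $λ$ , so that $\Pr (C < T) = λ / (1 + λ)$ ; the function names are hypothetical. For generalized type I censoring, an event is observed only if it occurs before both the random censoring time and the median follow-up time $\log 2$ .

```python
import math


def censoring_rate(p):
    """Rate lambda of the exponential censoring time that yields proportion p."""
    return p / (1.0 - p)


def unbounded_censoring_prob(lam):
    """Pr(C < T) for T ~ Exp(1) and an independent C ~ Exp(lam)."""
    return lam / (1.0 + lam)


def generalized_type1_censoring_prob(p, follow_up=math.log(2.0)):
    """Censoring proportion when follow-up is additionally cut at `follow_up`.

    Pr(event observed) = Pr(T <= C, T <= follow_up)
                       = (1 - exp(-(1 + lam) * follow_up)) / (1 + lam).
    """
    total = 1.0 + censoring_rate(p)
    return 1.0 - (1.0 - math.exp(-total * follow_up)) / total


for p in (0.3, 0.7):
    print(p, round(generalized_type1_censoring_prob(p), 2))
```

Under these assumptions the two values round to 0.56 and 0.73, in agreement with the proportions reported for generalized type I censoring.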

Type I error rate and power of ${\hat{ρ}}_{P S R}$ were compared to previously suggested methods:

Spearman’s correlation estimator, ${\hat{ρ}}_{S}^{H}$ with bootstrap confidence intervals obtained with 1000 bootstrap samples (Eden et al., 2021);
Spearman-like statistic, $S_{n}$ , proposed by Dabrowska (1986);
Log-rank statistic, $T_{n}$ , proposed by Dabrowska (1986);
Log-rank statistic, $U_{n}$ , as a special case of one of the martingale-based statistics proposed by Shih and Louis (1996);
Weighted log-rank statistic, $V_{n}$ , with optimal weights by Shih and Louis (1996).

Bias and root mean squared error (RMSE) of ${\hat{ρ}}_{P S R}$ and ${\hat{ρ}}_{S}^{H}$ were reported with respect to $ρ_{S}$ . For ${\hat{ρ}}_{P S R}$ , bias and RMSE were also reported with respect to $ρ_{P S R}$ . The 95%-confidence intervals of ${\hat{ρ}}_{P S R}$ were computed using M-estimation with estimating equations proposed by Stute (1995).
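For orientation, the point estimate ${\hat{ρ}}_{P S R}$ can be sketched as follows. This illustration is not the authors' implementation and omits the M-estimation variance entirely. It assumes the probability-scale residual of Shepherd et al. (2016), which is not restated in this section: $F (y -) + F (y) - 1$ for an observed event at $y$ and $F (y)$ for an observation censored at $y$ , with $F$ replaced by one minus the Kaplan-Meier estimate. The function names are hypothetical.

```python
import math
from collections import Counter


def km_cdf(times, events):
    """Map each distinct observed time t to (F(t-), F(t)), with F = 1 - Kaplan-Meier."""
    at_risk = len(times)
    n_obs = Counter(times)
    n_evt = Counter(t for t, e in zip(times, events) if e)
    surv = 1.0
    cdf = {}
    for t in sorted(n_obs):
        before = surv
        surv *= 1.0 - n_evt[t] / at_risk
        cdf[t] = (1.0 - before, 1.0 - surv)
        at_risk -= n_obs[t]
    return cdf


def psr(times, events):
    """Probability-scale residuals under the definition assumed in the lead-in."""
    cdf = km_cdf(times, events)
    out = []
    for t, e in zip(times, events):
        f_minus, f = cdf[t]
        out.append(f_minus + f - 1.0 if e else f)
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def rho_psr(x, delta_x, y, delta_y):
    """Correlation of the two sets of probability-scale residuals."""
    return pearson(psr(x, delta_x), psr(y, delta_y))


# Toy data (made up for illustration); 1 = event observed, 0 = censored.
x = [2.1, 0.4, 3.3, 1.7, 5.0]
y = [1.0, 0.2, 2.5, 3.1, 4.4]
print(rho_psr(x, [1, 1, 0, 1, 1], y, [1, 1, 1, 0, 1]))
```

With no censoring and no ties, the residuals are a linear function of the ranks, so this quantity coincides with the sample Spearman correlation.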

Figure 2 shows the type I error rate and power for ${\hat{ρ}}_{P S R}$ compared to the other test statistics for Frank’s copula at $n = 200$ . For zero or moderate censoring, the type I error rate of ${\hat{ρ}}_{P S R}$ tended to be conserved at the nominal level. For high levels of censoring, the type I error appeared to be slightly inflated. For $ρ_{S} = 0.2, - 0.2$ , the power of our method tended to be similar or higher compared to the previously suggested non-parametric test statistics. In particular, for heavy censoring, ${\hat{ρ}}_{P S R}$ was much more powerful than the only other interpretable estimate, ${\hat{ρ}}_{S}^{H}$ , which has a large variance as a result of having to estimate a non-parametric bivariate survival surface.

Figure 2. Type I error rate (top row) and power (middle and bottom rows) for unadjusted correlation and sample size of 200. The columns represent different types and proportions of censoring. The following methods are presented: 1) ${\hat{ρ}}_{P S R}$ with Stute’s estimating equations is represented by ○; 2) ${\hat{ρ}}_{S}^{H}$ (Eden *et al.*, 2021) is represented by △; 3) ${\hat{S}}_{N}$ (Dabrowska, 1986) is represented by +; 4) ${\hat{T}}_{N}$ (Dabrowska, 1986) is represented by ×; 5) $U_{N}$ (Shih and Louis, 1996) is represented by ◇; 6) ${\hat{V}}_{N}$ (Shih and Louis, 1996) is represented by ▽.

The bias and RMSE of ${\hat{ρ}}_{P S R}$ and ${\hat{ρ}}_{S}^{H}$ are reported in Figure 3. For the studied simulation scenarios, ${\hat{ρ}}_{P S R}$ was approximately unbiased for $ρ_{P S R}$ and biased towards zero for $ρ_{S}$ with larger bias observed for heavier censoring. In spite of the bias, however, the RMSE of ${\hat{ρ}}_{P S R}$ for $ρ_{S}$ was generally lower than that of ${\hat{ρ}}_{S}^{H}$ , as the variance of ${\hat{ρ}}_{P S R}$ was typically much smaller than that of ${\hat{ρ}}_{S}^{H}$ . In addition, ${\hat{ρ}}_{S}^{H}$ is also biased for $ρ_{S}$ under type I censoring because the full joint distribution cannot be non-parametrically estimated.
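The trade-off described here follows from the decomposition $\text{RMSE}^{2} = \text{bias}^{2} + \text{variance}$ : an estimator with some bias can still have the smaller RMSE if its variance is sufficiently small. The sketch below shows how bias and RMSE with respect to a target value might be computed from simulation replicates; the function names are hypothetical and the numbers in the example are made up for illustration, not taken from the simulations.

```python
import math


def bias(estimates, target):
    """Mean of the replicate estimates minus the target parameter."""
    return sum(estimates) / len(estimates) - target


def variance(estimates):
    """Variance of the replicate estimates (divisor n, to match the decomposition)."""
    m = sum(estimates) / len(estimates)
    return sum((e - m) ** 2 for e in estimates) / len(estimates)


def rmse(estimates, target):
    """Root mean squared error of the replicate estimates around the target."""
    return math.sqrt(sum((e - target) ** 2 for e in estimates) / len(estimates))


# Made-up replicate estimates of a target of 0.2:
stable_but_biased = [0.15, 0.17, 0.16, 0.18, 0.14]
unbiased_but_noisy = [0.45, -0.05, 0.30, 0.00, 0.30]
for est in (stable_but_biased, unbiased_but_noisy):
    print(round(bias(est, 0.2), 3), round(rmse(est, 0.2), 3))
```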

Figure 3. Point estimate ±SD for unadjusted correlation and sample size of 200. The columns represent different types and proportions of censoring. Solid horizontal lines represent the population parameter $ρ_{S}$ , and dotted horizontal lines represent the population parameter $ρ_{P S R}$ . The numbers on the bottom represent the corresponding RMSEs. The following methods are presented: 1) ${\hat{ρ}}_{P S R}$ with Stute’s estimating equations is represented by ○; 2) ${\hat{ρ}}_{S}^{H}$ (Eden *et al.*, 2021) is represented by △.

Figures 1, 2, 3, and 4 in the Supporting Information show simulation results for sample sizes of 100 and 200 with high levels of bivariate censoring and Clayton’s dependence structure. For heavier censoring, an elevated type I error rate and larger bias and RMSE are observed for ${\hat{ρ}}_{P S R}$ . For the sample size of 100, the type I error rate and RMSE are elevated, but the bias is very similar to that for the sample size of 200. Tables 1 and 2 in the Supporting Information provide numeric values for type I error rate, power, bias, and RMSE for all studied simulation scenarios. We also included simulation results using the Clayton and Frank copulas with Gompertz marginal distributions; the Marshall-Olkin copula with exponential and Gompertz marginal distributions; and the Clayton, Frank, and Marshall-Olkin copulas with discretized exponential marginal distributions (see Figures 5, 6 in Supporting Information). Results were generally similar to those already presented, with slightly greater attenuation of the point estimates and slightly higher RMSEs in the presence of heavy censoring.

4.2. Partial correlation

To simulate data for partial correlation, we followed steps similar to those in Section 4.1, but instead of (11) and (12) we used

$T_{X, i} = F_{T_{X} ∣ Z}^{- 1} (U_{i} ∣ Z_{i})$ , where $F_{T_{X} ∣ Z} (x ∣ Z_{i}) = 1 - e^{- (e^{- Z_{i} β_{X}}) x};$

$T_{Y, i} = F_{T_{Y} ∣ Z}^{- 1} (V_{i} ∣ Z_{i})$ , where $F_{T_{Y} ∣ Z} (y ∣ Z_{i}) = 1 - e^{- (e^{- Z_{i} β_{Y}}) y};$

where $Z = (Z_{0}, Z_{1}, Z_{2})$ , $Z_{0} \equiv 1$ , $Z_{1}$ was normally distributed with mean 0 and variance 1, and $Z_{2}$ was binary with $\Pr (Z_{2} = 1) = \frac{1}{2}$ , $β_{X} = (1, 1, 0.5)$ and $β_{Y} = (0, - 1, 2)$ . Times to censoring, $C_{X}$ and $C_{Y}$ , were simulated from exponential distributions with $λ = P / (1 - P)$ , where the censoring proportion $P$ was chosen so that the average censoring proportion in both variates was either 0.3 or 0.7. For type I censoring, these censoring proportions were approximately 0.56 or 0.73. We estimated the partial correlation of PSRs, as outlined in Section 3.2. We fit the following models for $F_{T_{X}} (\cdot ∣ Z)$ and $F_{T_{Y}} (\cdot ∣ Z)$ :

Log-normal survival model (misspecified model);
Exponential survival model (true model);
Cox proportional hazards model (true model) with variance estimated using partial likelihood score equations, and ignoring uncertainty in baseline hazard;
Cox proportional hazards model (true model) with variance estimated using full likelihood score equations.

We evaluated the performance of our method for the sample sizes of 100 and 200 under correctly and incorrectly specified models. The type I error rate, power, bias, and RMSE were reported with respect to $ρ_{S \cdot Z}$ , the partial Spearman’s correlation between $T_{X}$ and $T_{Y}$ .
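The covariate-dependent event times of this subsection have a closed-form inverse: solving $u = 1 - e^{- (e^{- Z β}) x}$ for $x$ gives $x = - \log (1 - u) e^{Z β}$ . The sketch below illustrates that inversion with the covariate distribution and coefficients stated above; it is written in Python for illustration, the function names are hypothetical, and the uniform pair $(U, V)$ is assumed to come from whichever copula sampler is in use.

```python
import math
import random

BETA_X = (1.0, 1.0, 0.5)
BETA_Y = (0.0, -1.0, 2.0)


def draw_covariates(rng):
    """Z = (Z0, Z1, Z2): intercept, standard normal, and Bernoulli(1/2)."""
    return (1.0, rng.gauss(0.0, 1.0), 1.0 if rng.random() < 0.5 else 0.0)


def linear_predictor(z, beta):
    """Inner product Z beta."""
    return sum(zi * bi for zi, bi in zip(z, beta))


def conditional_cdf(x, z, beta):
    """F(x | z) = 1 - exp(-exp(-z beta) x)."""
    return 1.0 - math.exp(-math.exp(-linear_predictor(z, beta)) * x)


def conditional_quantile(u, z, beta):
    """Inverse of conditional_cdf in x, valid for 0 <= u < 1."""
    return -math.log1p(-u) * math.exp(linear_predictor(z, beta))


rng = random.Random(1)
z = draw_covariates(rng)
u, v = 0.25, 0.60  # placeholders for one copula draw (U_i, V_i)
print(conditional_quantile(u, z, BETA_X), conditional_quantile(v, z, BETA_Y))
```

Because both event times are generated from the same $Z_{i}$ , part of their marginal association is induced by the covariates, which is what the partial correlation is meant to remove.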

Figure 4 shows that bias and RMSE were similar for the studied models, even for the misspecified log-normal model. Figure 10 in the Supporting Information also shows that the RMSE was similar for Cox model with full and partial likelihood. Figure 4 also shows that ${\hat{ρ}}_{P S R \cdot Z}$ was further from $ρ_{S \cdot Z}$ as the proportion censored increased. Results were somewhat similar for $n = 100$ (see Figure 9 and Tables 3, 4 in the Supporting Information).

Figure 4. Point estimate ±SD for partial correlation and sample size of 200. The columns represent different types and proportions of censoring. The numbers on the bottom represent the corresponding RMSEs. The following methods are presented: 1) Cox with estimating equations based on the full likelihood represented by ○; 2) Exponential survival model represented by △; 3) Log-normal survival model represented by □.

The type I error rate was near the nominal 0.05 level except with lower sample size and high rates of censoring (see Figures 7 and 8 in the Supporting Information).

4.3. Conditional correlation

To simulate data for conditional correlation, we followed steps similar to those in Section 4.2, but instead of a vector of covariates ( $Z_{0}$ , $Z_{1}$ , $Z_{2}$ ), we simulated a single covariate $Z$ , a uniformly distributed random variable with support on [0,3]. For the conditional correlation structure, the parameters of Frank’s copula were chosen in such a way that the average correlation $ρ_{S} (Z)$ was approximately 0.2: 1) constant correlation, $ρ_{S} (Z) = 0.2$ ; 2) linearly increasing correlation, $ρ_{S} (Z) = 0.133 Z$ ; and 3) quadratic (bell-shaped) correlation, $ρ_{S} (Z) = 0.001 + 0.48 Z - 0.16 Z^{2}$ . The parameters for univariate censoring were chosen in such a way that the proportions of censored events were approximately (0.3, 0.3). We used a sample size of 500 and 1000 replications. A Cox proportional hazards model was used to estimate PSRs. All cases were analyzed using correctly specified models. For bell-shaped conditional correlation, the regression model was fit with terms $Z$ and $Z^{2}$ . The performance of ${\hat{ρ}}_{P S R} (Z)$ was evaluated visually by plotting the bias and coverage probability of ${\hat{ρ}}_{P S R} (Z)$ for $ρ_{S} (Z)$ and $ρ_{P S R} (Z)$ .
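As a quick check on the three target curves, the sketch below evaluates each $ρ_{S} (Z)$ and its average over $Z$ uniform on [0,3] using a midpoint rule. The function names are hypothetical, and the code only evaluates the stated formulas; it does not reproduce how the copula parameter was mapped to each target correlation.

```python
def rho_constant(z):
    """Constant conditional correlation."""
    return 0.2


def rho_linear(z):
    """Linearly increasing conditional correlation."""
    return 0.133 * z


def rho_quadratic(z):
    """Bell-shaped conditional correlation, peaking at z = 1.5."""
    return 0.001 + 0.48 * z - 0.16 * z ** 2


def average_over_uniform(f, lo=0.0, hi=3.0, steps=3000):
    """Midpoint-rule approximation of E[f(Z)] for Z uniform on [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) * h / (hi - lo)


for name, f in (("constant", rho_constant), ("linear", rho_linear), ("quadratic", rho_quadratic)):
    print(name, round(average_over_uniform(f), 3))
```

The averages come out at 0.2 for the constant curve, just under 0.2 for the linear curve, and about 0.24 for the quadratic curve, so all three scenarios have an average correlation in the neighborhood of 0.2.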

Figure 5 shows the population parameters of $ρ_{S} (Z)$ and $ρ_{P S R} (Z)$ , bias, and coverage probability of ${\hat{ρ}}_{P S R} (Z)$ for $ρ_{S} (Z)$ and $ρ_{P S R} (Z)$ as functions of $Z$ under (0.3,0.3) unbounded univariate censoring.

Figure 5. Top row: the population parameters for $ρ_{S} (Z)$ (in black) and $ρ_{P S R} (Z)$ (in gray) as functions of $Z$ . Middle row: bias of ${\hat{ρ}}_{P S R} (Z)$ for $ρ_{S} (Z)$ (in black) and ${\hat{ρ}}_{P S R} (Z)$ for $ρ_{P S R} (Z)$ (in gray) as functions of $Z$ . Bottom row: coverage probability of ${\hat{ρ}}_{P S R} (Z)$ for $ρ_{S} (Z)$ (in black) and ${\hat{ρ}}_{P S R} (Z)$ for $ρ_{P S R} (Z)$ (in gray) as functions of $Z$ . Frank’s copula was used to model correlation. Survival probabilities were modeled using Cox proportional hazards regression (true model) with variance estimated using full likelihood score equations. For bell-shaped conditional correlation, the linear models were fit with a quadratic term. The data were simulated 1000 times with a sample size of 500. Unbounded univariate censoring of (0.3,0.3) was applied.

Survival probabilities were modeled using Cox proportional hazards regression (true model) with variance estimated using full likelihood score equations. Although $ρ_{P S R}$ is not the same as $ρ_{S}$ , the bias of ${\hat{ρ}}_{P S R} (Z)$ for $ρ_{S} (Z)$ was reasonable, and the coverage was mostly above 90%. Figure 11 in the Supporting Information shows that our method performs very well in the absence of censoring, although a small bias was still observed.

5. Application

We use our method to compute the correlation between the time from ART initiation to viral failure and the time from ART initiation to major regimen change among people living with HIV in Latin America with data from the Caribbean, Central, and South America Network for HIV epidemiology (McGowan et al., 2007). The anonymized version of this dataset (CCASAnet, 2020) was also used by Eden et al. (2021), and additional details can be found in that paper. In short, viral failure and regimen change tend to be correlated as failure to suppress the virus often triggers a regimen change. However, not all regimen changes are due to viral failure, and not all viral failures lead to a regimen change.

Variable definitions were as defined elsewhere (Cesar et al., 2015; Eden et al., 2021). Censoring was univariate. Adults (18 years or older when starting ART) and children (under 18) were analyzed separately. The analysis datasets included 6691 adults from anonymized sites A, B, C, D, and E, and 374 children from sites F and G. For adults and children, the median follow-up times were 4.1 years (ranging from 1 day to 18.2 years) and 7.4 years (ranging from 1 day to 19.1 years), respectively. For adults, 28.6% had viral failure, 28.3% had regimen change, 59.1% had neither, and only 16.1% had both events. For children, 63.4% had viral failure, 48.1% had regimen change, 30.8% had neither, and 42.3% had both events.

Table 1 presents rank correlation estimates and their 95% confidence intervals (CIs) for various subgroups. It shows that ${\hat{ρ}}_{P S R}$ was positive for all studied subgroups. The unadjusted correlations for adults and children were very similar, 0.32 (95% CI: 0.29, 0.35) and 0.32 (95% CI: 0.22, 0.42), respectively. For adults, the correlations across sites were a little more variable with lower correlations in site A (0.22) and site B (0.21) and higher correlations in site C (0.40) and site E (0.33).

Table 1.

Correlation between time to viral failure and time to regimen change. Columns ${\hat{ρ}}_{P S R}$ , ${\hat{ρ}}_{P S R \cdot Z}$ , and ${\hat{ρ}}_{S}^{H}$ (Eden et al., 2021) show the correlation values. The confidence intervals for ${\hat{ρ}}_{P S R}$ are computed using Stute’s estimating equations, for ${\hat{ρ}}_{P S R \cdot Z}$ using Cox full likelihood score equations, and for ${\hat{ρ}}_{S}^{H}$ using bootstrap with 1000 bootstrap samples.

Group	N	${\hat{ρ}}_{P S R}$	${\hat{ρ}}_{P S R \cdot Z}$	${\hat{ρ}}_{S}^{H}$

Adults	6691	0.32 (0.29, 0.35)	0.29 (0.26, 0.32)	0.35 (0.26, 0.43)
Male	5185	0.31 (0.28, 0.34)	0.29 (0.25, 0.32)	0.32 (0.22, 0.42)
Female	1506	0.35 (0.30, 0.40)	0.31 (0.25, 0.36)	0.45 (0.30, 0.58)
Site A	1040	0.22 (0.16, 0.28)	0.21 (0.15, 0.28)	−0.06 (−0.22, 0.29)
Site B	975	0.21 (0.14, 0.28)	0.22 (0.15, 0.29)	0.52 (0.16, 0.75)
Site C	2225	0.40 (0.35, 0.45)	0.39 (0.34, 0.44)	0.45 (0.37, 0.53)
Site D	138	0.28 (0.08, 0.47)	0.27 (0.07, 0.47)	−0.18 (−0.51, 0.48)
Site E	2313	0.33 (0.29, 0.37)	0.30 (0.26, 0.35)	0.45 (0.36, 0.53)
Children	374	0.32 (0.22, 0.42)	0.32 (0.22, 0.42)	0.36 (0.20, 0.53)
Male	191	0.26 (0.12, 0.40)	0.27 (0.12, 0.42)	0.33 (0.11, 0.52)
Female	183	0.38 (0.24, 0.52)	0.41 (0.28, 0.55)	0.34 (0.10, 0.63)
Site F	73	0.36 (0.15, 0.57)	0.40 (0.20, 0.61)	0.42 (0.15, 0.65)
Site G	301	0.31 (0.19, 0.43)	0.31 (0.19, 0.43)	0.36 (0.20, 0.53)


Table 1 also includes estimates based on ${\hat{ρ}}_{S}^{H}$ , the fully non-parametric approach that requires estimation of the bivariate survival surface. The estimates ${\hat{ρ}}_{P S R}$ consistently showed positive correlation. Although generally similar to ${\hat{ρ}}_{P S R}$ , ${\hat{ρ}}_{S}^{H}$ disagreed with it in two cases (adult sites A and D), where it gave negative but non-significant correlations. This is probably because ${\hat{ρ}}_{S}^{H}$ has much larger variance for smaller samples and heavier censoring, so ${\hat{ρ}}_{P S R}$ might be more robust here. We also computed the P-values of the non-parametric tests ${\hat{S}}_{n}$ , ${\hat{T}}_{n}$ , ${\hat{U}}_{n}$ , and ${\hat{V}}_{n}$ cited in Section 4.1. All four tests were significant, with the highest P-value being 0.024 (Adults, site D, for ${\hat{T}}_{n}$ ) and the majority of P-values being less than 0.001.

Table 1 also presents partial correlations computed using Cox proportional hazards models. Each model was adjusted for five covariates: sex, age at ART initiation, CD4 count at ART initiation (square-root transformed), viral load at ART initiation (log-transformed), and study site. Both CD4 count and viral load were included using restricted cubic splines with three knots at quantiles 0.1, 0.5, and 0.9. The covariate-adjusted correlation was generally similar to the unadjusted correlation, suggesting that the positive rank correlation between times to viral failure and regimen change was likely not due to confounding by these covariates.

Figure 6 shows the partial rank correlation of time to viral failure and time to regimen change conditional on CD4 count and age at ART initiation. Each of the conditional correlations was adjusted for the other four covariates. The modeling techniques were the same as for partial correlations. To allow for greater flexibility of the correlation’s functional form, the linear regression models of PSRs (see Section 3) included the variable of interest using restricted cubic spline with 3 knots (at quantiles 0.1, 0.5, and 0.9) for children and 5 knots (at quantiles 0.05, 0.275, 0.5, 0.725, and 0.95) for adults. We chose a smaller number of knots for children because of the smaller sample size. The figure shows that the partial-conditional correlation as a function of CD4 looks similar in children and adults. For both, it decreases with increasing CD4 count. Confidence intervals for children tended to be wider because of the smaller sample size. The correlation conditional on age is about the same for children at the age right below 18 and adults at age 18 and remains more or less the same (around 0.3) up until the age of 60, where it starts declining towards zero.

Figure 6. Partial-conditional correlation of PSRs between time to viral failure and time to regimen change computed as a function of CD4 count and age at the time of first ART. The left column shows results for children, the right column for adults. Cox proportional hazards regression was used to model survival probabilities. The variance was estimated using full likelihood score equations. Note that because of the skewed distribution of CD4 counts, the $x$ -axis for CD4 count is on the square-root scale.

6. Discussion

We proposed a method of measuring unadjusted, partial, and conditional correlations between right-censored variables by estimating the correlation of probability-scale residuals. In the absence of censoring, our method equals Spearman’s correlation; in the presence of censoring it approximates Spearman’s. Our method has several advantages. It is based on ranks and not affected by extreme values or by monotonic transformations. It does not assume the dependence structure, does not require estimating the joint bivariate survival distribution, and can be used with continuous and discrete data. Together with M-estimation, it provides a straightforward method to compute unadjusted correlation and confidence intervals for bivariate right-censored data. The unadjusted correlation of PSRs is purely non-parametric because it does not assume the form of the marginal distributions and can be estimated using non-parametric Kaplan-Meier estimates for marginal cumulative distribution functions. For moderate censoring and sample sizes of 200 or more, its power and type I error rate are comparable to those of previously suggested linear rank tests, and unlike those tests it also provides an interpretable measure of association.

A notable advantage of our method is that it is easily extended to conditional, partial, and partial-conditional correlation. Although parametric assumptions have to be made when computing the partial and partial-conditional correlations, using Cox regression or other semi-parametric survival models maintains the rank-based nature of Spearman’s correlation. In addition, our simulations suggest that partial correlations are quite robust to the choice of model.

Our method is not designed to address questions related to competing risk events. For instance, in our data example, those who died were treated as censored at the time of death, which implicitly assumes they are similar to patients remaining in the study and spreads their probability mass accordingly (e.g., like a Kaplan-Meier estimator). This is a simplification and is a potential limitation of our analysis. With this limitation noted, the proportion who died prior to experiencing both events in our study was small, 0.07 for adults and 0.05 for children, so this analysis choice had little impact on the results. One could alternatively assume that those who never experienced an event had infinite event times, which does not pose a problem for our method because it is based on ranks, and those with infinite values can simply be thought of as having the highest rank. If those who died prior to experiencing an event were instead assigned $Δ_{X} = Δ_{Y} = 1$ and $X = Y = \infty$ (which is in practice equivalent to assigning $X$ and $Y$ to be arbitrary numbers greater than any time to event in the dataset), then $ρ_{P S R}$ was estimated as 0.33 (95% CI [0.31,0.36]) for adults and 0.40 (95% CI [0.31,0.49]) for children, which are close to our original estimates of 0.32 (95% CI [0.29,0.35]) for adults and 0.32 (95% CI [0.22,0.42]) for children. However, more research on rank correlation estimation in the presence of competing risks is warranted. A reviewer noted that, in special situations, correlations can be computed in the presence of competing risks using extensions of multi-state models (e.g., extensions of Meller et al. (2019)).

Our motivating application illustrated our methods using data with univariate censoring. It is important to note that our proposed methods are also appropriate for settings with bivariate censoring. Following a reviewer’s suggestion, we also applied our method to a small data example using publicly available data with bivariate censoring. Details are in Supplemental Material Section 4.

The main limitation of our approach is that in the presence of censoring, the population parameter, $ρ_{P S R}$ , depends on the censoring distribution and generally does not equal $ρ_{S}$ . In particular, the bias of ${\hat{ρ}}_{P S R}$ for $ρ_{S}$ increases with heavier censoring. However, as shown by our simulations, even with heavy censoring our method can still detect correlation when it truly exists, and the mean squared error of ${\hat{ρ}}_{P S R}$ for $ρ_{S}$ tends to be smaller than that of the other non-parametric estimator of correlation with bivariate survival data, ${\hat{ρ}}_{S}^{H}$ (Eden et al., 2021). It should be noted that with heavy censoring, all methods for estimating the bivariate correlation have limitations. Parametric and semi-parametric approaches rely on parametric assumptions to extrapolate. The other non-parametric approach proposed by Eden et al. (2021) can be highly variable because it requires non-parametric estimation of the bivariate survival distribution, which can be unstable in the presence of heavy censoring and is computationally expensive. Also, under type I censoring it is impossible to non-parametrically estimate a bivariate survival surface after the last censoring point. An advantage of our estimator is that it does not require estimating the bivariate or marginal survival distributions after the last censoring point; it handles type I censoring simply and intuitively by plugging in the expectation of the PSR after the censoring point. However, the bias of $ρ_{P S R}$ for $ρ_{S}$ comes from using the expectation of PSRs after censoring. Thus, in situations with heavy censoring and when assumptions about marginal distributions or dependence structure are justified, parametric or semi-parametric methods may be preferred.

Supplementary Material

Supinfo1

NIHMS1931307-supplement-Supinfo1.pdf^{(557KB, pdf)}

Supinfo2

NIHMS1931307-supplement-Supinfo2.zip^{(4.4MB, zip)}

Acknowledgements

This work was partially supported by the US National Institutes of Health, grants R01 AI093234 (Shepherd), and U01 AI069923 (McGowan). We thank the members of CCASAnet for allowing us to present their data and Cathy Jenkins for her help with constructing the analysis dataset. We also thank Dr. Shih for sharing her code related to the paper of Shih and Louis (1996).

Footnotes

Conflict of Interest

The authors have declared no conflict of interest.

References

Belzile L. and Genest C. (2017). lcopula: Liouville Copulas. [R package version 1.0. https://CRAN.R-project.org/package=lcopula (accessed January 3, 2022)]
Breslow NE (1972). Discussion of Professor Cox’s paper. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 34, 216–217.
Carriere JF (2000). Bivariate survival models for coupled lives. Scandinavian Actuarial Journal 2000, 17–32.
CCASAnet (2020). Times to viral failure and regimen change data. [Anonymized for presentation, https://biostat.app.vumc.org/ArchivedAnalyses/ (accessed January 3, 2022)]
Cesar C. and Jenkins CA and Shepherd BE and Padgett D. and Mejía F. and Ribeiro SR and Cortes CP and Pape JW and Madero JS and Fink V. and others (2015). Incidence of virological failure and major regimen change of initial combination antiretroviral therapy in the Latin America and the Caribbean: an observational cohort study. The Lancet HIV 2, e492–e500.
Clayton DG (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65, 141–151.
Clayton DG and Cuzick J. (1985). Multivariate generalizations of the proportional hazards model. Journal of the Royal Statistical Society. Series A (General), 82–117.
Cox DR (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 34, 187–202.
Cuzick J. (1982). Rank tests for association with right censored data. Biometrika 69, 351–364.
Dabrowska DM (1986). Rank tests for independence for bivariate censored data. The Annals of Statistics 14, 250–264.
Dabrowska DM (1988). Kaplan–Meier estimate on the plane. The Annals of Statistics, 1475–1489.
Ding AA and Wang W. (2004). Testing independence for bivariate current status data. Journal of the American Statistical Association 99, 145–155.
Eden SK and Li C. and Shepherd BE (2021). Nonparametric estimation of Spearman’s rank correlation with bivariate survival data. Biometrics 78, 421–434.
Fan J. and Hsu L. and Prentice RL (2000). Dependence estimation over a finite bivariate failure time region. Lifetime Data Analysis 6, 343–355.
Kalbfleisch JD and Prentice RL (2011). The statistical analysis of failure time data. John Wiley & Sons.
Klein JP and Moeschberger ML (1997). Survival analysis: techniques for censored and truncated data. Springer Science & Business Media.
Kruskal WH (1958). Ordinal measures of association. Journal of the American Statistical Association 53, 814–861.
Li C. and Shepherd BE (2012). A new residual for ordinal outcomes. Biometrika, asr073.
Liu Q. and Li C. and Wanga V. and Shepherd BE (2018). Covariate-adjusted Spearman’s rank correlation with probability-scale residuals. Biometrics 74, 595–605.
Meller M. and Beyersmann J. and Rufibach K. (2019). Joint modeling of progression-free and overall survival and computation of correlation measures. Statistics in Medicine 38, 4270–4289.
McGowan CC and Cahn P. and Gotuzzo E. and Padgett D. and Pape JW and Wolff M. and Schechter M. and Masys DR (2007). Cohort profile: Caribbean, Central and South America Network for HIV research (CCASAnet) collaboration within the international Epidemiologic databases to evaluate AIDS (IeDEA) programme. International Journal of Epidemiology 36, 969–976.
Narasimhan B. and Johnson SG (2017). cubature: Adaptive Multivariate Integration over Hypercubes. [R package version 1.3–8. https://CRAN.R-project.org/package=cubature (accessed January 3, 2022)]
Nelsen RB (2007). An introduction to copulas. Springer Science & Business Media.
Prentice RL and Hsu L. (1997). Regression on hazard ratios and cross ratios in multivariate failure time analysis. Biometrika 84, 349–363.
Prentice RL and Zhao S. (2019). The Statistical Analysis of Multivariate Failure Time Data: A Marginal Modeling Approach. Chapman and Hall/CRC.
Pruitt RC (1991). On negative mass assigned by the bivariate Kaplan–Meier estimator. The Annals of Statistics 19, 443–453.
R Core Team (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. [Version 3.4.3. https://www.R-project.org/ (accessed January 3, 2022)]
Romeo JS and Tanaka NI and Pedroso-de-Lima AC (2006). Bivariate survival modeling: a Bayesian approach based on copulas. Lifetime Data Analysis 12, 205–222.
Schemper M. and Kaider A. and Wakounig S. and Heinze G. (2013). Estimating the correlation of bivariate failure times under censoring. Statistics in Medicine 32, 4781–4790.
Shepherd BE and Li C. and Liu Q. (2016). Probability-scale residuals for continuous, discrete, and censored data. Canadian Journal of Statistics 44, 463–479.
Shih JH and Louis TA (1995). Inferences on the association parameter in copula models for bivariate survival data. Biometrics, 1384–1399.
Shih JH and Louis TA (1996). Tests of independence for bivariate survival data. Biometrics, 1440–1449.
Stefanski LA and Boos DD (2002). The calculus of M-estimation. The American Statistician 56, 29–38.
Stute W. (1995). The central limit theorem under random censorship. The Annals of Statistics, 422–439.
Therneau TM (2015). A Package for Survival Analysis in S. [R package version 2.38. https://CRAN.R-project.org/package=survival (accessed January 3, 2022)]
Tsai WY and Leurgans S. and Crowley J. (1986). Nonparametric estimation of a bivariate survival function in the presence of censoring. The Annals of Statistics 14, 1351–1365.
Tsai WY and Crowley J. (1998). A note on nonparametric estimators of the bivariate survival function under univariate censoring. Biometrika 85, 573–580.
Van Der Laan MJ (1996). Efficient estimation in the bivariate censoring model and repairing NPMLE. The Annals of Statistics 24, 596–627.
Zeng D. and Lin DY (2007). Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69, 507–564.
Zhang S. (2008). Inference on the association measure for bivariate survival data with hybrid censoring and applications to an HIV study. Ph.D. Thesis.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supinfo1

NIHMS1931307-supplement-Supinfo1.pdf^{(557KB, pdf)}

Supinfo2

NIHMS1931307-supplement-Supinfo2.zip^{(4.4MB, zip)}
