Abstract
Most research on the study of associations among paired failure times has either assumed time invariance or been based on complex measures or estimators. Little has accommodated competing risks. This paper targets the conditional cause-specific hazard ratio, henceforth called the cause-specific cross ratio, a recent modification of the conditional hazard ratio designed to accommodate competing risks data. Estimation is accomplished by an intuitive, non-parametric method that localizes Kendall’s tau. Time variance is accommodated through a partitioning of space into ‘bins’ between which the strength of association may differ. Inferential procedures are developed, small-sample performance is evaluated and the methods are applied to the investigation of familial association in dementia onset.
Keywords: Cause-specific, Kendall’s tau, Multivariate, Paired, Survival, U-statistic
1. INTRODUCTION
Methodology for analyzing correlated failure-time data must ensure correctness of inferences, accounting for failure-time dependence within clusters. We propose methodology for estimating the strength of dependence for paired failure times.
Much work assessing failure time associations has focused on the cross, or conditional hazard, ratio function (Clayton, 1978; Oakes, 1982, 1986; Clayton & Cuzick, 1985). Various researchers have exploited parametric models for the cross ratio (Genest & MacKay, 1986; Oakes, 1989; Nielsen et al., 1992; Genest et al., 1995; Shih & Louis, 1995; Glidden, 2000; Ripatti & Palmgren, 2000; Ripatti et al., 2002). Measures allowing non-parametrically time-varying association have also been proposed. Common estimation strategies are to 'plug into' one's measure a non-parametric estimator of the multivariate survival or cumulative hazard function (Prentice & Cai, 1992; Hsu & Prentice, 1996; Fan et al., 2000; Wang & Wells, 2000), or to solve method-of-moments equations involving appropriately chosen empirical processes for the measure of interest (Oakes, 1982, 1989; Genest & Rivest, 1993; Barbe et al., 1996; Viswanathan & Manatunga, 2001; Chen & Bandeen-Roche, 2005). Related regression models have also been proposed (Prentice & Hsu, 1997; Fine & Jiang, 2000).
However, measures of failure time association have had slow uptake in biomedical studies, and we know of only three papers addressing the (semi-)competing risks that are unavoidable when conditions may lead to death or affect only a fraction of individuals; see Bandeen-Roche & Liang (2002), a University of Wisconsin technical report by Y. Cheng, J. P. Fine & M. R. Kosorok (2004), and a paper soon to appear by Cheng & Fine (2008). To narrow these gaps, this paper studies a simple, non-parametric estimator of an easily interpreted measure of failure time association, namely the cause-specific cross ratio (Bandeen-Roche & Liang, 2002). Cheng & Fine (2008) have studied an alternative approach to estimating the same measure. Our derivation yields insight into the convergence behaviour of cross-ratio estimators when time invariance of the cross ratio is wrongly assumed, and into factors affecting the estimator’s precision.
2. METHODS
2·1. Notation and estimand
Consider competing events 1, …, m. Let X* be the time to the first event, and let K* ∈ {1, …, m} be a code identifying that event. For each individual we observe X, the minimum of X* and the time at which X* is censored for non-competing reasons, together with K, a code equal to 0 if failure is censored and to K* if the earliest competing event precedes censoring. The individual's cause-specific hazard for the kth event is then

$$\lambda_k(t) = \lim_{h \downarrow 0} h^{-1}\, \text{pr}(t \le X^* < t + h, K^* = k \mid X^* \ge t), \qquad k = 1, \ldots, m$$

(Prentice et al., 1978; Benichou & Gail, 1990).
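With bivariate failure processes, the observable data are times (Xi, Yi) and associated causes (Ki, Li), sampled in pairs, for i = 1, …, n. We assume (Xi, Yi, Ki, Li) to be independently and identically distributed so that, with i suppressed, cause-specific densities

$$f_{kl}(x, y) = \lim_{h_1, h_2 \downarrow 0} \frac{\text{pr}(x \le X < x + h_1,\; y \le Y < y + h_2,\; K = k,\; L = l)}{h_1 h_2} \qquad (1)$$

exist for each combination of failure causes (k, l). Then (X, Y) has an absolutely continuous joint survival function

$$S(x, y) = \text{pr}(X > x, Y > y) = \sum_{k}\sum_{l} \int_x^\infty \int_y^\infty f_{kl}(u, v)\, du\, dv.$$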
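Bandeen-Roche & Liang (2002) defined the cause-specific conditional hazard ratio, i.e. cross ratio, as

$$\theta_{CS}(x, y; k, l) = \frac{\lambda_k(x \mid Y = y, L = l)}{\lambda_k(x \mid Y > y)} = \frac{f_{kl}(x, y)\, S(x, y)}{\int_y^\infty f_{k\cdot}(x, v)\, dv \int_x^\infty f_{\cdot l}(u, y)\, du}, \qquad (2)$$

where λk(x | ·) = lim_{h↓0} h^{-1} pr(x ≤ X < x + h, K = k | X ≥ x, ·), a dot subscript denotes summation over the corresponding cause, and hazards for Y are defined analogously. If X and Y are failure times for two family members, and causes K and L indicate dementia onset, if equal to 1, or death before dementia, if equal to 2, then the cause-specific cross ratio for k = 1, l = 1 gives the factor by which the hazard of dementia onset at age y is increased for family members whose relatives are diagnosed as cases at age x versus those whose relatives survive, disease-free, beyond age x, or vice versa:

$$\theta_{CS}(x, y; 1, 1) = \frac{\lambda_1(y \mid X = x, K = 1)}{\lambda_1(y \mid X > x)}.$$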
2·2. Estimator
A transformed, localized Kendall's tau serves to estimate (2) non-parametrically. Let (Xa, Ya) and (Xb, Yb) be independently drawn failure time pairs; let (X(ab), Y(ab)) = (Xa ∧ Xb, Ya ∧ Yb) be their componentwise minima; and let (K(ab), L(ab)) be the causes corresponding to (X(ab), Y(ab)). It can be shown that

$$\theta_{CS}(x, y; k, l) = \frac{\text{pr}\{(X_a - X_b)(Y_a - Y_b) > 0 \mid (X_{(ab)}, Y_{(ab)}) = (x, y), (K_{(ab)}, L_{(ab)}) = (k, l)\}}{\text{pr}\{(X_a - X_b)(Y_a - Y_b) < 0 \mid (X_{(ab)}, Y_{(ab)}) = (x, y), (K_{(ab)}, L_{(ab)}) = (k, l)\}},$$

that is, the quotient of conditional probabilities of concordance and discordance between the pairs' failure times, given (X(ab), Y(ab)) and (K(ab), L(ab)). Thus, a simple estimator determines the concordance status of every pairing of two pairs with (K(ab), L(ab)) = (k, l) and then divides the number of concordances by the number of discordances. However, if θCS(x, y; k, l) is a continuous function of (x, y) on ℝ²₊ = {(u, v) : u > 0, v > 0}, the counts or ratios must be binned or smoothed if stable ratios are to be obtained, because with continuous failure times essentially no two pairings share the same minima (x, y). Let ℬ = {B1, …, BJ} be a partition of ℝ²₊, with ℬ set a priori and J finite, and let ℐ{A} = 1 if A is true and 0 otherwise. Our estimator is
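$$\hat\theta_{CS}(x, y; k, l) = \frac{\sum_{a<b} \mathcal{I}\{(X_a - X_b)(Y_a - Y_b) > 0,\; (X_{(ab)}, Y_{(ab)}) \in B_{j(x,y)},\; (K_{(ab)}, L_{(ab)}) = (k, l)\}}{\sum_{a<b} \mathcal{I}\{(X_a - X_b)(Y_a - Y_b) < 0,\; (X_{(ab)}, Y_{(ab)}) \in B_{j(x,y)},\; (K_{(ab)}, L_{(ab)}) = (k, l)\}}, \qquad (3)$$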
with j(x, y) indexing the partition cell that includes (x, y). It parallels estimators employed by Bandeen-Roche & Liang (2002) and, with causes ignored, by Chen & Bandeen-Roche (2005), but defines ℬ on ℝ²₊ rather than on {S(x, y) : (x, y) ∈ ℝ²₊}.
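Estimator (3) needs only pairwise comparisons, so it is short to sketch. The Python function below is a minimal illustration, not the authors' implementation. It assumes a rectangular partition given by hypothetical cut-point arguments `x_cuts` and `y_cuts`, numeric cause codes with 0 for censoring, and cells closed on the right (a minimum equal to a cut-point falls in the lower cell). It forms all n(n − 1)/2 pairings at once, so memory grows quadratically in n.

```python
import numpy as np


def cs_cross_ratio_grid(x, y, k, l, x_cuts, y_cuts, cause=(1, 1)):
    """Binned cause-specific cross ratio, a sketch of estimator (3).

    x, y           : observed times of the two pair members (length n)
    k, l           : observed cause codes (0 = censored, treated as a cause)
    x_cuts, y_cuts : interior cut-points of a rectangular partition (may be empty)
    cause          : the (k, l) combination of interest
    Returns the ratio, concordance counts and discordance counts for every cell.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    k, l = np.asarray(k), np.asarray(l)
    a, b = np.triu_indices(len(x), 1)  # every pairing of pairs with a < b
    sign = (x[a] - x[b]) * (y[a] - y[b])  # > 0 concordant, < 0 discordant
    # Componentwise minima and the causes attached to them.
    a_first_x = x[a] < x[b]
    a_first_y = y[a] < y[b]
    xmin = np.where(a_first_x, x[a], x[b])
    ymin = np.where(a_first_y, y[a], y[b])
    kmin = np.where(a_first_x, k[a], k[b])
    lmin = np.where(a_first_y, l[a], l[b])
    keep = (kmin == cause[0]) & (lmin == cause[1])
    # Cell of each minimum: values <= a cut-point fall in the lower cell.
    ix = np.searchsorted(np.asarray(x_cuts, float), xmin, side="left")
    iy = np.searchsorted(np.asarray(y_cuts, float), ymin, side="left")
    shape = (len(x_cuts) + 1, len(y_cuts) + 1)
    conc = np.zeros(shape)
    disc = np.zeros(shape)
    c, d = keep & (sign > 0), keep & (sign < 0)
    np.add.at(conc, (ix[c], iy[c]), 1)
    np.add.at(disc, (ix[d], iy[d]), 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        theta = conc / disc  # inf or nan flags cells with no discordant pairing
    return theta, conc, disc


theta, conc, disc = cs_cross_ratio_grid(
    [1, 2, 3, 4], [1, 3, 2, 4], [1] * 4, [1] * 4, [2.5], [2.5])
print(theta)
```

For these four hypothetical pairs, five pairings are concordant and one is discordant. With both axes cut at 2·5, the lower-left cell receives three concordant pairings and the single discordant one, giving a ratio of 3; the other cells have no discordant pairing and are returned as inf or nan.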
2·3. Distributional properties
Asymptotic inference employing (3) seems only to have been developed for similar estimators that ignore competing risks and involve the global Kendall's tau (Kendall, 1948, p. 67; Oakes, 1982) or a weighted analogue (Oakes, 1986). Existing theory for localized versions of tau (Shieh, 1998) does not apply, because our localization procedure incorporates weights that may be stochastically related to the terms being weighted. Instead, let ri represent a generic four-tuple realization (xi, yi, ki, li) for pair i, and Ri the analogous random variable. At each (x, y, k, l), the numerator of (3) is a U-statistic with kernel

$$h_C^{(x,y,k,l)}(r_a, r_b) = \mathcal{I}\{(x_a - x_b)(y_a - y_b) > 0,\; (x_{(ab)}, y_{(ab)}) \in B_{j(x,y)},\; (k_{(ab)}, l_{(ab)}) = (k, l)\},$$
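and the denominator is similarly a U-statistic, with a kernel we label h_D^{(x,y,k,l)}(ra, rb), obtained by replacing > 0 with < 0. Thus, inferences follow directly from U-statistic theory (Serfling, 1980, Ch. 5). With sums replaced by averages, (3) converges almost surely to

$$\frac{E\{h_C^{(x,y,k,l)}(R_a, R_b)\}}{E\{h_D^{(x,y,k,l)}(R_a, R_b)\}} = \frac{\text{pr}\{(X_a - X_b)(Y_a - Y_b) > 0,\; (X_{(ab)}, Y_{(ab)}) \in B_{j(x,y)},\; (K_{(ab)}, L_{(ab)}) = (k, l)\}}{\text{pr}\{(X_a - X_b)(Y_a - Y_b) < 0,\; (X_{(ab)}, Y_{(ab)}) \in B_{j(x,y)},\; (K_{(ab)}, L_{(ab)}) = (k, l)\}} \qquad (4)$$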
under weak conditions, so long as the denominator exceeds 0. More interesting is the interpretation of (4) when pr{(Xa − Xb)(Ya − Yb) < 0 | (X(ab), Y(ab)) = (x, y), (K(ab), L(ab)) = (k, l)} is bounded away from 0 almost everywhere on Bj(x,y). Then, by an argument in a longer version of this paper available from the authors, (4) equals

$$E\{\theta_{CS}(X_{(ab)}, Y_{(ab)}; k, l) \mid (X_a - X_b)(Y_a - Y_b) < 0,\; (X_{(ab)}, Y_{(ab)}) \in B_j,\; (K_{(ab)}, L_{(ab)}) = (k, l)\}, \qquad (5)$$
where we now suppress the (x, y) notation indexing bins, retaining only j. If the cause-specific cross ratio is constant over Bj, (5) equals that constant value. Otherwise, (5) averages the cross ratio over potential time realizations within Bj, weighting them by the probability that pairs a and b are discordant. Because discordance is more probable where association is weaker, a time-varying cause-specific cross ratio leads (5) to up-weight the regions of weaker positive association within the bin, and the limiting target of (3) is closer to the null than an unweighted expectation of (2) over Bj.
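The pull toward the null can be checked with a toy calculation. Suppose, hypothetically, that a bin splits into two sub-regions with constant cross ratios 9 and 1, each carrying half of the probability that a pairing's minima, with causes (k, l), fall in the bin. With continuous times a pairing is either concordant or discordant given its minima, so pr(discordant) = 1/(1 + θ) within each sub-region; that is the only assumption the sketch below makes.

```python
import numpy as np


def bin_limit(mass, theta):
    """Limit (4) of estimator (3) for a bin made of sub-regions with constant cross ratio.

    mass  : probability that a pairing's minima, with causes (k, l), fall in each
            sub-region (hypothetical values in the example below)
    theta : cause-specific cross ratio on each sub-region
    Given the minima, conc/disc = theta and conc + disc = 1, so
    pr(discordant) = 1 / (1 + theta).
    """
    mass, theta = np.asarray(mass, float), np.asarray(theta, float)
    p_conc = theta / (1.0 + theta)
    p_disc = 1.0 / (1.0 + theta)
    return np.sum(mass * p_conc) / np.sum(mass * p_disc)


def discordance_weighted_mean(mass, theta):
    """Form (5): average of theta weighted by mass times pr(discordant)."""
    mass, theta = np.asarray(mass, float), np.asarray(theta, float)
    w = mass / (1.0 + theta)
    return np.sum(w * theta) / np.sum(w)


mass, theta = [0.5, 0.5], [9.0, 1.0]
print(bin_limit(mass, theta))                  # about 2.33
print(discordance_weighted_mean(mass, theta))  # about 2.33, the same value
print(np.average(theta, weights=mass))         # 5.0, the unweighted average
```

Both routes give 7/3, far closer to 1 than the unweighted average of 5, because the weakly associated sub-region supplies most of the discordant pairings.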
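We report asymptotic distributions of the numerator and denominator defining (3). Notation to index properly each ‘C’ and ‘D’ subscripted term in terms of (j, k, l) is suppressed. The numerator and denominator depend on the kernel means,

$$\bar E_C = E\{h_C(R_a, R_b)\}, \qquad \bar E_D = E\{h_D(R_a, R_b)\},$$

and the corresponding variances and covariances. The numerator variance depends on

$$\zeta_{1C} = \text{var}_a[E_{b|a}\{h_C(R_a, R_b)\}], \qquad \zeta_{2C} = \text{var}\{h_C(R_a, R_b)\},$$

and the denominator variance on

$$\zeta_{1D} = \text{var}_a[E_{b|a}\{h_D(R_a, R_b)\}], \qquad \zeta_{2D} = \text{var}\{h_D(R_a, R_b)\}.$$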
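The terms hC(Ra, Rb) and hD(Ra, Rb) are indicator functions, and thus have variances ζ2C = ĒC(1 − ĒC) and ζ2D = ĒD(1 − ĒD); the terms ζ1C and ζ1D are elucidated in Appendix 1. Writing UC and UD for the numerator and denominator of (3), each divided by \binom{n}{2}, U-statistic theory gives their variances as

$$\text{var}(U_C) = \binom{n}{2}^{-1}\{2(n-2)\zeta_{1C} + \zeta_{2C}\}, \qquad \text{var}(U_D) = \binom{n}{2}^{-1}\{2(n-2)\zeta_{1D} + \zeta_{2D}\}.$$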
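Moreover, the numerator and denominator of (3), each divided by \binom{n}{2} and denoted UC and UD, are asymptotically normal, provided ζ1C > 0 and ζ1D > 0. Under these and previous conditions,

$$n^{1/2}\begin{pmatrix} U_C - \bar E_C \\ U_D - \bar E_D \end{pmatrix} \to N\left\{\begin{pmatrix} 0 \\ 0 \end{pmatrix},\; 4\begin{pmatrix} \zeta_{1C} & \eta_1 \\ \eta_1 & \zeta_{1D} \end{pmatrix}\right\}$$

in distribution, where η1 = cova[Eb|a{hC(Ra, Rb)}, Eb|a{hD(Ra, Rb)}]. By the delta method, the proposed estimator is asymptotically normal with limiting mean (5) and variance

$$\frac{4}{n \bar E_D^{2}}\left(\zeta_{1C} - 2\frac{\bar E_C}{\bar E_D}\eta_1 + \frac{\bar E_C^{2}}{\bar E_D^{2}}\zeta_{1D}\right). \qquad (6)$$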
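Once the kernel means and ζ1C, ζ1D and η1 are available, (6) is a one-line computation. The sketch below evaluates it with hypothetical inputs chosen only for illustration, and adds a symmetric normal-theory (Wald) interval on the cross-ratio scale as one natural use of the asymptotic normality stated above; the function names are ours.

```python
import math


def asymptotic_variance(n, e_c, e_d, zeta1_c, zeta1_d, eta1):
    """Delta-method variance (6) of the ratio of the averaged numerator and denominator of (3)."""
    theta = e_c / e_d
    return 4.0 / (n * e_d ** 2) * (zeta1_c - 2.0 * theta * eta1 + theta ** 2 * zeta1_d)


def wald_interval(theta_hat, variance, z=1.959963984540054):
    """Symmetric normal-theory interval on the cross-ratio scale."""
    half = z * math.sqrt(variance)
    return theta_hat - half, theta_hat + half


# Hypothetical kernel quantities for one bin, chosen only for illustration.
v = asymptotic_variance(n=1000, e_c=0.02, e_d=0.01, zeta1_c=1e-4, zeta1_d=4e-5, eta1=2e-5)
print(v)  # approximately 0.0072
print(wald_interval(2.0, v))
```

With these inputs the target is 2, the bracketed term in (6) is 1·8 × 10⁻⁴, and the variance is 0·0072.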
To see what ζ1C > 0, ζ1D > 0 entail, consider the form of ζ1C, from Appendix 1:
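$$\zeta_{1C} = \text{var}_a\left[P_{jkl}(X_a, Y_a) + \mathcal{I}\{(X_a, Y_a) \in B_j, (K_a, L_a) = (k, l)\}\, S(X_a, Y_a)\right], \qquad (7)$$

where Pjkl(x, y) = pr{Xb < x, Yb < y, (Xb, Yb) ∈ Bj, (Kb, Lb) = (k, l)}.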
Trivially, bins and failure causes must be such that failures of type (k, l) can occur, excluding pr{(Xa, Ya) ∈ Bj, (Ka, La) = (k, l)} = 0. Furthermore, if the distribution of (X, Y) is concentrated on a monotone increasing curve, so that S(x, y) = 1 − F(x, y) with F(x, y) = pr(X ≤ x, Y ≤ y), then (7) equals 0 when there is a single bin, ℝ²₊, and a single failure cause, because the bracketed term in (7) is then F(Xa, Ya) + S(Xa, Ya) = 1. However, even in this case, all ζ1C > 0 if there are multiple causes or bins having well-defined positive measure.
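The degenerate case can be seen numerically. The sketch below assumes hypothetical exponential times with Y identical to X and a single cause, and computes the sample counterpart of ζ1C as the variance over a of the row averages of hC. It is exactly 0 with a single bin and becomes positive once the plane is split at the median of X. The predicate argument `in_bin` is a hypothetical interface, not notation from the paper.

```python
import numpy as np


def zeta1c_hat(x, y, k, l, in_bin, cause=(1, 1)):
    """Sample counterpart of zeta_1C: variance over a of the row means of h_C.

    in_bin : hypothetical predicate mapping arrays (xmin, ymin) to booleans
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    k, l = np.asarray(k), np.asarray(l)
    n = len(x)
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    # Cause attached to each minimum: member a's if a failed first, else b's.
    kmin = np.where(dx < 0, k[:, None], k[None, :])
    lmin = np.where(dy < 0, l[:, None], l[None, :])
    xmin = np.minimum(x[:, None], x[None, :])
    ymin = np.minimum(y[:, None], y[None, :])
    h_c = (dx * dy > 0) & in_bin(xmin, ymin) & (kmin == cause[0]) & (lmin == cause[1])
    row = h_c.sum(axis=1) / (n - 1)  # estimates E_{b|a} h_C(R_a, R_b)
    return np.mean((row - row.mean()) ** 2)


rng = np.random.default_rng(1)
x = rng.exponential(size=200)
y = x.copy()  # S(x, y) = 1 - F(x, y): no discordant pairing is possible
ones = np.ones(200, dtype=int)
one_bin = zeta1c_hat(x, y, ones, ones, lambda u, v: np.ones(u.shape, dtype=bool))
med = np.median(x)
two_bins = zeta1c_hat(x, y, ones, ones, lambda u, v: u <= med)  # lower of two bins
print(one_bin, two_bins)  # 0.0 and a positive value
```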
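Equation (3) estimates θCS(x, y; k, l) for each bin and pair of failure causes (k, l). There may also be interest in estimands involving more than one of these, such as the ratio comparing individuals whose relative is diagnosed with dementia, versus one whose relative dies dementia-free, at a given age:

$$\frac{\theta_{CS}(x, y; 1, 1)}{\theta_{CS}(x, y; 1, 2)} = \frac{\lambda_1(x \mid Y = y, L = 1)}{\lambda_1(x \mid Y = y, L = 2)},$$

estimated by θ̂CS(j; 1, 1)/θ̂CS(j; 1, 2).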
Extending inferences to such quantities is relatively simple, because the cause-specific cross ratios for different cause combinations and bins are asymptotically uncorrelated; see Appendix 2.
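Because the two estimates are asymptotically uncorrelated and jointly normal, a delta-method variance for their ratio needs only each estimate and its standard error. The sketch below works on the log scale, a choice of ours rather than the paper's, and uses hypothetical values for θ̂CS(j; 1, 1) and θ̂CS(j; 1, 2).

```python
import math


def ratio_of_cross_ratios(theta1, se1, theta2, se2, z=1.959963984540054):
    """Ratio of two asymptotically uncorrelated cross-ratio estimates, with a log-scale interval.

    Delta method: var(log theta_hat) is approximately var(theta_hat) / theta**2, and the
    covariance term vanishes because the estimates are asymptotically uncorrelated.
    """
    ratio = theta1 / theta2
    se_log = math.sqrt((se1 / theta1) ** 2 + (se2 / theta2) ** 2)
    ci = (ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log))
    return ratio, se_log, ci


# Hypothetical bin-level estimates (and standard errors) for causes (1, 1) and (1, 2).
print(ratio_of_cross_ratios(3.0, 1.0, 1.5, 0.5))
```

Here the ratio is 2 and the log-scale standard error is (2/9)^{1/2} ≈ 0·47.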
2·4. Variance estimation
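Because the discordance-associated terms are complicated, we estimate the quantities defining the limiting variance of our estimator by their sample counterparts. Estimators for ĒC and ĒD are given by the numerator and denominator of (3), each divided by \binom{n}{2}, and those for ζ1C, ζ1D and η1 are calculated similarly; for instance,

$$\hat\zeta_{1C} = \frac{1}{n}\sum_{a=1}^{n}\left\{\frac{1}{n-1}\sum_{b \ne a} h_C(R_a, R_b) - \hat{\bar E}_C\right\}^2.$$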
Computations involve nested sums and are thus O(n²) in complexity, but simple in form.
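The sample counterparts can be computed from the same n × n comparison arrays used for (3). The sketch below is an illustration rather than the authors' code and handles one bin and one cause pair: row averages of hC and hD estimate Eb|a{hC(Ra, Rb)} and Eb|a{hD(Ra, Rb)}, their variances and covariance estimate ζ1C, ζ1D and η1, and these are plugged into (6). The bin is passed as a hypothetical predicate `in_bin`; dense arrays make memory grow as n².

```python
import numpy as np


def cs_cross_ratio_se(x, y, k, l, in_bin, cause=(1, 1)):
    """Estimate (3) for one bin, with a standard error from sample counterparts of (6).

    in_bin : hypothetical predicate mapping arrays (xmin, ymin) to booleans
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    k, l = np.asarray(k), np.asarray(l)
    n = len(x)
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    xmin = np.minimum(x[:, None], x[None, :])
    ymin = np.minimum(y[:, None], y[None, :])
    # Cause attached to each minimum: member a's if a failed first, else b's.
    kmin = np.where(dx < 0, k[:, None], k[None, :])
    lmin = np.where(dy < 0, l[:, None], l[None, :])
    sel = in_bin(xmin, ymin) & (kmin == cause[0]) & (lmin == cause[1])
    h_c = (sel & (dx * dy > 0)).astype(float)  # diagonal is 0 since dx = dy = 0
    h_d = (sel & (dx * dy < 0)).astype(float)
    # Row means estimate E_{b|a} h(R_a, R_b); their means estimate E_C and E_D.
    row_c = h_c.sum(axis=1) / (n - 1)
    row_d = h_d.sum(axis=1) / (n - 1)
    e_c, e_d = row_c.mean(), row_d.mean()
    zeta1_c = np.mean((row_c - e_c) ** 2)
    zeta1_d = np.mean((row_d - e_d) ** 2)
    eta1 = np.mean((row_c - e_c) * (row_d - e_d))
    theta = e_c / e_d  # equals the ratio of counts in (3)
    var = 4.0 / (n * e_d ** 2) * (zeta1_c - 2.0 * theta * eta1 + theta ** 2 * zeta1_d)
    return theta, np.sqrt(var)


# Usage on simulated single-cause data (gamma frailty, variance 1), lower-left quadrant.
rng = np.random.default_rng(0)
g = rng.gamma(1.0, 1.0, size=400)
xs, ys = rng.exponential(size=400) / g, rng.exponential(size=400) / g
causes = np.ones(400, dtype=int)
print(cs_cross_ratio_se(xs, ys, causes, causes,
                        lambda u, v: (u <= np.median(xs)) & (v <= np.median(ys))))
```

On four toy pairs x = (1, 2, 3, 4), y = (1, 3, 2, 4) with a single bin, a hand calculation gives ĒC = 5/6 and ĒD = 1/6, so θ̂ = 5, and the standard error from (6) is 6.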
3. SIMULATION STUDY
3·1. Design
Each study scenario envisaged two failure causes, ‘disease’ (k = 1) and ‘death’ (k = 2), and comprised 500 runs. Bivariate data were generated according to the frailty model for subject- and cause-specific failure hazards given by equation (6) of Bandeen-Roche & Liang (2002), as described in Appendix 2 of that article. For a given baseline hazard function λ*(t), that model is
$$\lambda_k(t \mid G, W) = G\, W_k\, \lambda^*(t), \qquad k = 1, \ldots, m, \qquad (8)$$
with G as a scalar ‘size’ frailty, and W, a compositional ‘shape’ frailty, shared by a given pair. Throughout, we generated W to be beta-distributed with scale parameter 1 and set λ*(t) = 1.
We allowed the distribution of G to be gamma or positive stable. The cause-specific cross ratio is time-invariant with gamma frailties; we varied it to be 2·25 or 6·0. In positive stable runs, we generated size frailties as in Lee (1979) and fixed E(W) = (0·5, 0·5). Positive stable distributions have Laplace transform exp(−u^α); we varied α to be 0·4 or 0·8.
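The paper generates positive stable size frailties following Lee (1979). As a swapped-in alternative, not the paper's procedure, the sketch below uses Kanter's representation of the positive stable law with Laplace transform exp(−u^α), 0 < α < 1, and checks it against that transform by simulation. Sharing G within a pair and giving each member an exponential hazard equal to G yields a single-cause positive stable frailty model.

```python
import numpy as np


def positive_stable(alpha, size, rng):
    """Positive stable draws with E exp(-s G) = exp(-s**alpha), 0 < alpha < 1.

    Kanter's representation:
    G = sin(alpha U) / sin(U)**(1/alpha) * (sin((1 - alpha) U) / E)**((1 - alpha)/alpha),
    with U uniform on (0, pi) and E standard exponential.
    """
    u = np.pi * (1.0 - rng.random(size))  # uniform on (0, pi]
    e = rng.exponential(size=size)
    return (np.sin(alpha * u) / np.sin(u) ** (1.0 / alpha)
            * (np.sin((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha))


rng = np.random.default_rng(1979)
g = positive_stable(0.4, 200_000, rng)
# Empirical Laplace transform at s = 2 versus exp(-2**0.4); the two should agree closely.
print(np.mean(np.exp(-2.0 * g)), np.exp(-2.0 ** 0.4))

# Pairs sharing G, conditionally independent with hazard G:
x = rng.exponential(size=5) / g[:5]
y = rng.exponential(size=5) / g[:5]
```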
For each run, we applied (3) over a four-cell grid, bisecting each time dimension at its median. We considered two sample sizes, n = 500 and n = 1000. Standard errors were constructed as in § 2·4. For each scenario, we evaluated estimator bias, compared asymptotic and empirical estimator variance calculations and determined 95% confidence interval coverage. With positive stable size frailties, the cause-specific cross ratio decreases with time, and the bin-specific association parameters are given by equation (5). To estimate these, we generated ns = 20 000 pairs for each scenario and replaced the expectation in (5) by a sample average over the 20 000-choose-2 pairings of pairs. To select this sample size, we generated estimates over a range of ns; by ns = 20 000, an approximate asymptote was achieved.
Secondary studies compared the performance of our estimator and the cumulative hazard plug-in estimator proposed by Fan et al. (2000). This estimator addresses single-cause failure outcomes; data were generated accordingly, setting wk in equation (8) equal to 1. It estimates the cross ratio, θ(t), over lower (t1, t2) quadrants; we report findings over quadrants with t1 and t2 equal to (a) the lower quartile and (b) the upper quartile of the marginal survival function. We studied models of independence within pairs; gamma frailty with cross ratio equal to 2; and positive stable frailty with α = 0·4 and 0·8. For positive stable scenarios, the values of the parameters to be estimated were approximated as described in the previous paragraph. For the Fan estimator, equation (6) of Fan et al. (2000) was averaged, with weights as given at the top of page 183 of that article. We compared sample sizes of n = 100 and n = 1000 pairs, as well as scenarios without censoring and with independent standard-exponential-distributed random censoring of individual failure times, yielding a censoring probability of approximately 0·5.
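As a sanity check on the gamma frailty scenario, the sketch below simulates single-cause, uncensored pairs sharing a gamma frailty with mean 1 and variance 1, for which the cross ratio is constant and equal to 1 plus the frailty variance, that is 2. With a single bin covering the whole plane, (3) reduces to the ratio of concordant to discordant pairings, which should then be close to 2. The unit exponential baseline, the seed and the sample size are assumptions for illustration, not the paper's simulation settings.

```python
import numpy as np


def gamma_frailty_pairs(n, phi, rng):
    """Pairs sharing a gamma frailty G (mean 1, variance phi); given G, the two
    times are independent exponentials with hazard G. The cross ratio is 1 + phi."""
    g = rng.gamma(shape=1.0 / phi, scale=phi, size=n)
    x = rng.exponential(size=n) / g
    y = rng.exponential(size=n) / g
    return x, y


def global_ratio(x, y):
    """Estimator (3) with one bin (the whole plane), one cause and no censoring."""
    a, b = np.triu_indices(len(x), 1)
    s = (x[a] - x[b]) * (y[a] - y[b])
    return np.sum(s > 0) / np.sum(s < 0)


rng = np.random.default_rng(2008)
x, y = gamma_frailty_pairs(1500, 1.0, rng)
print(global_ratio(x, y))  # close to 1 + phi = 2
```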
3·2. Results
We first consider performance with gamma scenarios in our primary study; see Table 1. Both our estimator and its associated inferences were accurate on all time quadrants. With n = 1000, upward biases ranged from 1% to 7% for both point and standard error estimation; with n = 500, these were moderately exacerbated. Confidence interval coverage ranged from 93% to 97%.
Table 1.
Columns 3–6: R1 = 0·2, θCS(1, 1) = 6; columns 7–10: R1 = 0·8, θCS(1, 1) = 2·25.

Quadrant | n | Mean | SDE | SDM | Cov | Mean | SDE | SDM | Cov
---|---|---|---|---|---|---|---|---|---
1: (t1, t2) ≤ medians | 1000 | 6·11 | 1·06 | 1·00 | 0·96 | 2·24 | 0·16 | 0·16 | 0·95
1: (t1, t2) ≤ medians | 500 | 6·30 | 2·16 | 1·93 | 0·96 | 2·31 | 0·31 | 0·31 | 0·95
2: t1 ≤ median, t2 > median | 1000 | 6·12 | 1·64 | 1·60 | 0·93 | 2·27 | 0·24 | 0·24 | 0·94
2: t1 ≤ median, t2 > median | 500 | 7·18 | 4·39 | 4·16 | 0·94 | 2·34 | 0·47 | 0·47 | 0·96
3: t1 > median, t2 ≤ median | 1000 | 6·36 | 1·70 | 1·65 | 0·94 | 2·25 | 0·24 | 0·24 | 0·94
3: t1 > median, t2 ≤ median | 500 | 7·06 | 6·59 | 4·96 | 0·93 | 2·29 | 0·46 | 0·46 | 0·93
4: (t1, t2) > medians | 1000 | 6·15 | 1·38 | 1·28 | 0·96 | 2·26 | 0·20 | 0·20 | 0·96
4: (t1, t2) > medians | 500 | 7·16 | 4·27 | 3·85 | 0·93 | 2·31 | 0·40 | 0·39 | 0·95
θCS(t1, t2; 1, 1), cause-specific cross ratio function as in equation (2); θ(t), cross ratio function; SDE, square root of the average of variance estimates over 500 runs; SDM, empirical standard deviation of estimates over 500 runs; Cov, empirical confidence interval coverage.
Estimator performance was even better for positive stable scenarios (Table 2), yet upward biases persisted for the smaller sample size and less populated bins. As suggested by a referee, we explored the impact of skewness in the estimator's distribution on these biases by comparing geometric and arithmetic means for n = 500 and α = 0·4. Geometric means were generally closer to limiting target values than arithmetic means, with biases of 0·05, 0·01, < 0·01 and −0·03, compared with 0·17, 0·16, 0·17 and 0·01, for bins in the order of the rows of Table 2.
Table 2.
Columns 3–6: α = 0·4, with {θCS(Q1; 1, 1), …, θCS(Q4; 1, 1)} = {9·24, 3·68, 3·66, 2·89}; columns 7–10: α = 0·8, with {θCS(Q1; 1, 1), …, θCS(Q4; 1, 1)} = {2·66, 1·80, 1·80, 1·69}.

Quadrant | n | Mean | SDE | SDM | Cov | Mean | SDE | SDM | Cov
---|---|---|---|---|---|---|---|---|---
1: (t1, t2) ≤ medians | 1000 | 9·24 | 1·00 | 1·04 | 0·93 | 2·67 | 0·29 | 0·27 | 0·96
1: (t1, t2) ≤ medians | 500 | 9·41 | 1·92 | 1·99 | 0·94 | 2·79 | 0·56 | 0·59 | 0·94
2: t1 ≤ median, t2 > median | 1000 | 3·76 | 0·73 | 0·75 | 0·94 | 1·81 | 0·25 | 0·25 | 0·95
2: t1 ≤ median, t2 > median | 500 | 3·93 | 1·52 | 1·49 | 0·94 | 1·89 | 0·51 | 0·52 | 0·94
3: t1 > median, t2 ≤ median | 1000 | 3·77 | 0·74 | 0·76 | 0·94 | 1·82 | 0·25 | 0·23 | 0·96
3: t1 > median, t2 ≤ median | 500 | 4·02 | 1·61 | 1·60 | 0·95 | 1·85 | 0·49 | 0·50 | 0·93
4: (t1, t2) > medians | 1000 | 2·89 | 0·33 | 0·35 | 0·93 | 1·72 | 0·23 | 0·23 | 0·95
4: (t1, t2) > medians | 500 | 2·98 | 0·66 | 0·64 | 0·96 | 1·75 | 0·47 | 0·46 | 0·95
θCS(t1, t2; 1, 1), cause-specific cross ratio function as in equation (2); SDE, square root of the average of variance estimates over 500 runs; SDM, empirical standard deviation of estimates over 500 runs; Cov, empirical confidence interval coverage.
Table 3 compares our estimator and the Fan estimator. In independence and gamma frailty runs, both estimators achieved outstanding accuracy and exhibited nearly identical precision. In positive stable scenarios, our estimator's limiting values exceeded those of the Fan estimator, increasingly so with the strength of association and the quadrant size, as equation (5) predicts. Our estimator's standard error was higher in absolute terms, but only modestly higher as a percentage of its limiting target. In the absence of censoring, both estimators exhibited little bias. In censored positive stable scenarios, both estimators exhibited appreciable bias, ranging from approximately 10% to 50%. Overall, the estimators performed quite similarly.
Table 3.
Columns 4–7: (t1, t2) ≤ lower quartiles; columns 8–11: (t1, t2) ≤ upper quartiles. Ours, our estimator; Fan, the Fan et al. (2000) estimator.

Association model | n | Censored | Ours Mean | Ours SDE | Fan Mean | Fan SDE | Ours Mean | Ours SDE | Fan Mean | Fan SDE
---|---|---|---|---|---|---|---|---|---|---
Independent, θ(t) = 1 | 1000 | N | 1·00 | 0·13 | 1·00 | 0·13 | 1·00 | 0·05 | 1·00 | 0·05
Independent, θ(t) = 1 | 1000 | Y | 1·00 | 0·15 | 1·00 | 0·15 | 1·00 | 0·08 | 1·00 | 0·08
Independent, θ(t) = 1 | 100 | N | 1·05 | 0·43 | 1·05 | 0·42 | 1·01 | 0·16 | 1·03 | 0·15
Independent, θ(t) = 1 | 100 | Y | 1·05 | 0·52 | 1·05 | 0·51 | 1·01 | 0·19 | 1·02 | 0·19
Gamma, θ(t) = 2·0 | 1000 | N | 2·02 | 0·20 | 2·02 | 0·20 | 2·00 | 0·09 | 2·00 | 0·10
Gamma, θ(t) = 2·0 | 1000 | Y | 2·01 | 0·23 | 2·01 | 0·23 | 2·00 | 0·15 | 1·99 | 0·15
Gamma, θ(t) = 2·0 | 100 | N | 2·11 | 0·73 | 2·08 | 0·69 | 2·01 | 0·31 | 1·96 | 0·32
Gamma, θ(t) = 2·0 | 100 | Y | 2·13 | 0·84 | 2·11 | 0·82 | 2·07 | 0·52 | 2·05 | 0·51
Pos. stable, α = 0·4 | 1000 | N | 11·3 | 1·10 | 10·1 | 1·00 | 4·53 | 0·24 | 2·90 | 0·19
Pos. stable, α = 0·4 | 1000 | Y | 12·4 | 1·41 | 10·9 | 1·26 | 6·31 | 0·49 | 3·95 | 0·35
Pos. stable, α = 0·4 | 100 | N | 12·2 | 4·27 | 9·80 | 3·05 | 4·56 | 0·75 | 2·85 | 0·56
Pos. stable, α = 0·4 | 100 | Y | 13·7 | 5·86 | 10·8 | 3·97 | 6·69 | 1·64 | 4·11 | 1·11
Pos. stable, α = 0·4 | limit | | 11·2 | NA | 10·1 | NA | 4·51 | NA | 2·94 | NA
Pos. stable, α = 0·8 | 1000 | N | 2·53 | 0·28 | 2·32 | 0·26 | 1·56 | 0·08 | 1·28 | 0·07
Pos. stable, α = 0·8 | 1000 | Y | 2·68 | 0·31 | 2·43 | 0·28 | 1·82 | 0·13 | 1·42 | 0·12
Pos. stable, α = 0·8 | 100 | N | 2·62 | 0·90 | 2·38 | 0·80 | 1·57 | 0·23 | 1·29 | 0·21
Pos. stable, α = 0·8 | 100 | Y | 2·81 | 1·13 | 2·55 | 1·00 | 1·86 | 0·46 | 1·46 | 0·37
Pos. stable, α = 0·8 | limit | | 2·52 | NA | 2·33 | NA | 1·55 | NA | 1·28 | NA
θ(t), cross ratio function; SDE, square root of the average of variance estimates over 500 runs.
4. APPLICATION: AGGREGATION OF DEMENTIA IN FAMILIES
There is evidence that dementia aggregates in families (Hendrie, 1998) with greater heritability for early- than later-onset dementia (Silverman et al., 2005). If so, dementia onset ages should be associated within families, most strongly in a lower left quadrant of ages. In addition, death very often predates a dementia diagnosis.
We analyzed the same 3635 failure time pairs, from the Cache County Study on Memory in Aging (Breitner et al., 1999), as were analyzed by Bandeen-Roche & Liang (2002). Event times were ascertained for the oldest sibling inclusive of self (X) and mother (Y). Analyses treated censoring as a third ‘failure’ cause, besides dementia onset and death.
We first estimated θCS(j; 1, 1) on a grid dichotomizing children's and mothers' ages approximately at the respective medians for time to first event, yielding bins (x ≤ 75, y ≤ 80), (x ≤ 75, y > 80), (x > 75, y ≤ 80) and (x > 75, y > 80); see Table 4. Early maternal onset of dementia was strongly associated with early child onset: θ̂CS(j; 1, 1) = 3·81 on (x ≤ 75, y ≤ 80), with 95% confidence interval (1·48, 6·14). Surprisingly, the estimated strength of association was greater for paired late onsets: θ̂CS(j; 1, 1) = 5·89 on (x > 75, y > 80), with 95% confidence interval (1·67, 10·1). It was notably weaker only for early child onset and late maternal onset: θ̂CS(j; 1, 1) = 0·80 on (x ≤ 75, y > 80), with 95% confidence interval (−0·27, 1·86). Here, asymptotic inferences were derived as described in § 2·4. We also computed bootstrap standard errors and confidence intervals, following Bandeen-Roche & Liang (2002), taking 1000 bootstrap samples. Bootstrap standard errors closely matched asymptotic approximations except in the (late, late) quadrant, where the asymptotic approximation was roughly 10% smaller than the bootstrap estimate. Relative to their bias-corrected percentile bootstrap counterparts, asymptotic confidence intervals had lower limits that were 10–20% smaller and upper limits that were only slightly smaller.
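The bootstrap comparison can be sketched as follows. The paper follows Bandeen-Roche & Liang (2002) for its bootstrap; the version below assumes, as is usual for paired data, that pairs (child and mother families here) are resampled with replacement, and computes a bias-corrected percentile interval for one bin. The function names and the `in_bin` predicate are hypothetical; for the early-early quadrant of Table 4 one would pass `lambda u, v: (u <= 75) & (v <= 80)`.

```python
import numpy as np
from statistics import NormalDist


def ratio_in_bin(x, y, k, l, in_bin, cause=(1, 1)):
    """Concordant / discordant (k, l)-type pairings with minima in one bin, as in (3)."""
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    kmin = np.where(dx < 0, k[:, None], k[None, :])
    lmin = np.where(dy < 0, l[:, None], l[None, :])
    sel = (in_bin(np.minimum(x[:, None], x[None, :]), np.minimum(y[:, None], y[None, :]))
           & (kmin == cause[0]) & (lmin == cause[1]))
    n_disc = np.sum(sel & (dx * dy < 0))
    return np.sum(sel & (dx * dy > 0)) / n_disc if n_disc > 0 else np.nan


def bc_percentile_bootstrap(x, y, k, l, in_bin, n_boot=1000, level=0.95, seed=0):
    """Bias-corrected percentile interval, resampling pairs (families) with replacement."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k, l = np.asarray(k), np.asarray(l)
    rng = np.random.default_rng(seed)
    est = ratio_in_bin(x, y, k, l, in_bin)
    boots = np.empty(n_boot)
    for r in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))
        boots[r] = ratio_in_bin(x[idx], y[idx], k[idx], l[idx], in_bin)
    boots = boots[np.isfinite(boots)]  # drop replicates with no discordant pairing
    nd = NormalDist()
    # Keep the bias-correction proportion strictly inside (0, 1).
    p = np.clip(np.mean(boots < est), 1 / (len(boots) + 1), len(boots) / (len(boots) + 1))
    z0, z = nd.inv_cdf(float(p)), nd.inv_cdf(0.5 + level / 2)
    lower, upper = np.quantile(boots, [nd.cdf(2 * z0 - z), nd.cdf(2 * z0 + z)])
    return est, boots.std(ddof=1), (lower, upper)
```

Duplicated pairs in a resample have zero time differences and so count as neither concordant nor discordant, which is consistent with the strict inequalities in (3).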
Table 4.
Quadrant | Mean | SDE | SDM | SDB | 95% CI-asymptotic | 95% CI-bootstrap |
---|---|---|---|---|---|---|
1 : t1 ≤ 75, t2 ≤ 80 | 3·81 | 1·19 | 1·23 | 1·23 | (1·48, 6·14) | (1·68, 6·22) |
2 : t1 ≤ 75, t2 > 80 | 0·80 | 0·54 | 0·53 | 0·57 | (−0·27, 1·86) | (0·00, 1·87) |
3 : t1 > 75, t2 ≤ 80 | 2·41 | 0·73 | 0·76 | 0·77 | (0·97, 3·84) | (1·10, 3·92) |
4 : t1 > 75, t2 > 80 | 5·89 | 2·15 | 2·41 | 2·38 | (1·67, 10·11) | (2·08, 10·39) |
SDE, asymptotic standard deviation approximation; SDM, square-root of the average of asymptotic variance approximation estimates over bootstrap replicates; SDB, bootstrap standard deviation estimate; CI, confidence interval.
To explore further the unexpected strength of association found for paired late dementia onsets, we conducted analyses trichotomizing each dimension into the age ranges ≤ 70, 70–80 and > 80. The estimated strength of association for two early onsets increased: θ̂CS(j; 1, 1) = 5·35. Associations were weakest in the bins with maximally disparate children's and mothers' dementia onset ages. However, there was little evidence against a strength of association for two late onsets comparable to that for two early onsets. With small sample sizes in most cells, few of the estimated associations differed significantly from the null value θCS(j; 1, 1) = 1.
Bandeen-Roche & Liang (2002) estimated the cause-specific cross ratio as 8·86 for times with joint first-event survival probability greater than 0·80, and of the order of 2·5 for times with joint survival probability no greater than 0·80. Their analysis may have understated the heritability of late-onset dementia, because late times and disparate times can have comparably low joint survival probability and are therefore pooled on the survival scale. The assumptions made by copula-based association analysis appear inadequate for the current example.
5. BIN CHOICE AND STUDY DESIGN
Our methodology requires the choice of cut-points defining failure time ‘bins.’ For our dementia analysis and larger simulation runs, we originally attempted estimation on a 5 × 5 time grid. This partitioning yielded zero-count cells. The number of (a, b) pairings with (early failure, late failure) componentwise minima decreases with degree of discrepancy between ‘early’ and ‘late’, and the number of (a, b) pairings with (late, late) componentwise minima declines with time. Ultimately, a partitioning by equally spaced marginal quantiles is not optimal.
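Zero-count cells can be flagged before estimation by counting discordant pairings per cell, since it is the denominator of (3) that vanishes. The sketch below is a hypothetical helper, not part of the paper's method; it assumes cut-points at equally spaced marginal quantiles, so that `n_cuts = 4` gives a 5 × 5 grid, and the simulated data in the usage lines are purely illustrative.

```python
import numpy as np


def discordant_counts(x, y, k, l, n_cuts, cause=(1, 1)):
    """Discordant (k, l)-type pairings per cell of a grid cut at equally spaced
    marginal quantiles; a zero cell leaves the ratio in (3) undefined there."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k, l = np.asarray(k), np.asarray(l)
    probs = np.arange(1, n_cuts + 1) / (n_cuts + 1)
    x_cuts, y_cuts = np.quantile(x, probs), np.quantile(y, probs)
    a, b = np.triu_indices(len(x), 1)
    dx, dy = x[a] - x[b], y[a] - y[b]
    # Cause attached to each minimum: member a's if a failed first, else b's.
    kmin = np.where(dx < 0, k[a], k[b])
    lmin = np.where(dy < 0, l[a], l[b])
    keep = (dx * dy < 0) & (kmin == cause[0]) & (lmin == cause[1])
    ix = np.searchsorted(x_cuts, np.minimum(x[a], x[b])[keep])
    iy = np.searchsorted(y_cuts, np.minimum(y[a], y[b])[keep])
    counts = np.zeros((n_cuts + 1, n_cuts + 1), dtype=int)
    np.add.at(counts, (ix, iy), 1)
    return counts


# Hypothetical data: shared gamma frailty, causes drawn at random from {1, 2}.
rng = np.random.default_rng(3)
g = rng.gamma(0.5, 2.0, size=300)
xs, ys = rng.exponential(size=300) / g, rng.exponential(size=300) / g
ks, ls = rng.integers(1, 3, size=300), rng.integers(1, 3, size=300)
counts = discordant_counts(xs, ys, ks, ls, n_cuts=4)
print(counts)
print("empty cells:", np.argwhere(counts == 0).tolist())
```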
Beyond ad hoc considerations, it will sometimes be necessary to design studies, ensuring that strength of association is estimated with suitable precision in given regions of space. While full elaboration is beyond this paper’s scope, a tractable design basis emerges if we multiply and divide the asymptotic variance (6) by Given (6) equals
(9) |
Candidates for determine candidates for ĒC. Then, to complete (9), one must obtain candidates for ζ1C, ζ1D and η1. While these may be both complicated and unknown, Appendix 1 provides a template for their approximation with pilot data on bivariate failure location and cause frequencies, and marginal failure time distributions.
6. DISCUSSION
Our methodology accurately estimates failure time associations over a range of sample sizes and associations encountered in practice. Its accuracy appears comparable, and its precision nearly so, to that of the estimator proposed by Fan et al. (2000). Relative to survival function and cumulative hazard plug-in counterparts, our estimator's strengths include its ready interpretation and simplicity. We do not intend it as a replacement for alternative methods, but as a potentially more easily interpreted and readily implemented complement to them.
Our strategy handles censoring as a distinct failure cause. Handling censoring more efficiently is an advantage of competing methods, such as that of Fan et al. (2000). Doing so within our method is complicated, because censoring introduces uncertainty into both the determination of concordance and the value of (x, y) to which a given determination should be assigned.
Through (5), our estimator up-weights regions of weaker positive association when summarizing time-varying strength of association over a bin. This effect is not due to competing risks, and thus also applies to Kendall's tau-based cross-ratio estimators. The need to delineate estimation targets when cross-ratio constancy is mistakenly assumed was highlighted in our positive stable simulations, where the Fan estimator appeared to down-weight the conditional hazard ratio more strongly than ours, relative to an unweighted average over a bin.
Rather than binning, one might kernel-smooth the counts defining our estimator's numerator and denominator, and thus obtain pointwise association estimates. So long as the bandwidth is chosen a priori, inferences go through as in the current paper. However, for non-constant association, we find kernel-smoothed estimators to be quite biased.
Our failure to find strong evidence for greater familial aggregation of early- than late-onset dementia contrasts with a recent study (Silverman et al., 2005). Comparison is complicated by substantial differences in the sampling strategies and analytical methodologies used in the two studies. However, our analysis cautions against too strongly downplaying familial aggregation in later-onset dementia.
ACKNOWLEDGEMENT
The authors acknowledge the support of the National Institutes of Health. They are grateful to reviewers for their helpful comments and to Dr. Peter Zandi for providing the Cache County data.
APPENDIX 1
Elucidation of variance terms
As a first step, it is useful to write the respective means in a different form from that given preceding (4). We begin with the concordance, i.e. numerator, term. Note that the compound event {(Xa − Xb)(Ya − Yb) > 0, (X(ab), Y(ab)) ∈ Bj, (K(ab), L(ab)) = (k, l)} occurs if and only if one of the mutually exclusive events {Xa < Xb, Ya < Yb, (X(ab), Y(ab)) ∈ Bj, (K(ab), L(ab)) = (k, l)} or {Xa > Xb, Ya > Yb, (X(ab), Y(ab)) ∈ Bj, (K(ab), L(ab)) = (k, l)} occurs. Then,
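$$\begin{aligned} \bar E_C &= E_a E_{b|a}\big[\mathcal{I}\{X_a < X_b, Y_a < Y_b, (X_a, Y_a) \in B_j, (K_a, L_a) = (k, l)\} + \mathcal{I}\{X_b < X_a, Y_b < Y_a, (X_b, Y_b) \in B_j, (K_b, L_b) = (k, l)\}\big] \\ &= E_a\big[\mathcal{I}\{(X_a, Y_a) \in B_j, (K_a, L_a) = (k, l)\}\, S(X_a, Y_a) + \text{pr}\{X_b < X_a, Y_b < Y_a, (X_b, Y_b) \in B_j, (K_b, L_b) = (k, l) \mid R_a\}\big] \end{aligned} \qquad (A1)$$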
where Eb|a denotes expectation with respect to Rb conditioning on Ra, and so on. Term ζ1C follows from the second line of (A1), replacing Ea with vara. If we define
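$$P_{jkl}(X_a, Y_a) = \text{pr}\{X_b < X_a, Y_b < Y_a, (X_b, Y_b) \in B_j, (K_b, L_b) = (k, l) \mid R_a\},$$

i.e. the probability that (Xb, Yb) is a (k, l)-type failure occurring in the intersection of Bj and the quadrant {0 < x < Xa, 0 < y < Ya}, then

$$\zeta_{1C} = \text{var}_a\left[\mathcal{I}\{(X_a, Y_a) \in B_j, (K_a, L_a) = (k, l)\}\, S(X_a, Y_a) + P_{jkl}(X_a, Y_a)\right].$$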
To elucidate discordance terms concisely, we define one-dimensional analogues of a few quantities already defined. First, we define ‘slices’ of Bj: let Bj1(y) be the set of x-axis values such that (x, y) ∈ Bj, and analogously for Bj2(x). Let the versions without an argument, Bj1 (Bj2), be the sets of x-axis (y-axis) values such that (x, y) ∈ Bj for at least one y (x). Secondly, writing τ0:z for the interval from 0 to z on the real line, denote the intersections of the bin slices with this interval by δ1{(z, y)} = Bj1(y) ∩ τ0:z and δ2{(x, z)} = Bj2(x) ∩ τ0:z. Then, if we assume that all regions in question are measurable, the discordance analogues of the concordance-related quantities (6) and (7) follow:
where ‘.’ denotes the usual sum over all possibilities with respect to the argument at issue. If bins are defined on a rectangular grid, δ1{(z, y)} simplifies to Bj1 ∩ τ0:z, and similarly for δ2{(x, z)}. The expansion for η1 is similar to that for the variance terms, and we omit it.
APPENDIX 2
Asymptotic independence across causes and bins
Consider cause-specific cross ratios for the same, ‘jth’ bin but different cause combinations; without loss of generality, (k, l) = (1, 1) and (1, 2). We have,
where ρC1 = cov{hC(j,1,1)(Ra, Rb), hC(j,1,2)(Ra, Ri)}, a < b ≠ i; ρC2 = cov{hC(j,1,1)(Ra, Rb), hC(j,1,2)(Ra, Rb)}, and similarly for ρD1 and ρD2 with respect to discordance kernels. The last of these terms has summands that converge to equal limits, equal to
Hence, cov{θ̂CS(j; 1, 1), θ̂CS(j; 1, 2)} converges to 0. Asymptotic independence follows with asymptotic normality. The proof of independence of cause-specific cross ratios across bins is sufficiently similar that we omit it.
Contributor Information
Karen Bandeen-Roche, Email: kbandeen@jhsph.edu.
Jing Ning, Email: jning@jhsph.edu.
REFERENCES
- Bandeen-Roche K, Liang KY. Modelling multivariate failure time associations in the presence of a competing risk. Biometrika. 2002;89:299–314. doi: 10.1093/biomet/89.2.299.
- Barbe P, Genest C, Ghoudi K, Rémillard B. On Kendall's process. J. Multivariate Anal. 1996;58:197–229.
- Benichou J, Gail MH. Estimates of absolute cause-specific risk in cohort studies. Biometrics. 1990;46:813–826.
- Breitner JCS, Wyse BW, Anthony JC, Welsh-Bohmer KA, Steffens DC, Norton MC, Tschanz JT, Plassman BL, Meyer MR, Skoog I, Khachaturian A. APOE-ε4 count predicts age when prevalence of AD increases, then declines: The Cache County Study. Neurology. 1999;53:321–331. doi: 10.1212/wnl.53.2.321.
- Chen MC, Bandeen-Roche K. A diagnostic for association in bivariate survival models. Lifetime Data Anal. 2005;11:245–264. doi: 10.1007/s10985-004-0386-8.
- Cheng Y, Fine JP. Nonparametric estimation of cause-specific cross hazard ratio with bivariate competing risks data. Biometrika. 2008, to appear.
- Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978;65:141–151.
- Clayton D, Cuzick J. Multivariate generalizations of the proportional hazards model. J. Roy. Statist. Soc. A. 1985;148:82–108.
- Fan J, Prentice RL, Hsu L. A class of weighted dependence measures for bivariate failure time data. J. R. Stat. Soc. B. 2000;62:181–190.
- Fine JP, Jiang H. On association in a copula with time transformations. Biometrika. 2000;87:559–571.
- Genest C, Ghoudi K, Rivest L-P. A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika. 1995;82:543–552.
- Genest C, MacKay J. Copules archimédiennes et familles de lois bidimensionnelles dont les marges sont données. Canad. J. Statist. 1986;14:145–159.
- Genest C, Rivest L-P. Statistical inference procedures for bivariate Archimedean copulas. J. Amer. Statist. Assoc. 1993;88:1034–1043.
- Glidden DV. A two-stage estimator of the dependence parameter for the Clayton–Oakes model. Lifetime Data Anal. 2000;6:141–156. doi: 10.1023/a:1009664011060.
- Hendrie HC. Epidemiology of dementia and Alzheimer's disease. Am. J. Geriatr. Psychiatry. 1998;6:S3–S18. doi: 10.1097/00019442-199821001-00002.
- Hsu L, Prentice RL. On assessing the strength of dependency between failure time variates. Biometrika. 1996;83:491–506.
- Kendall MG. Rank Correlation Methods. 2nd ed. London: Griffin; 1955.
- Lee L. Multivariate distributions having Weibull properties. J. Multivariate Anal. 1979;9:267–277.
- Nielsen GG, Gill RD, Andersen PK, Sorensen TIA. A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Statist. 1992;19:25–43.
- Oakes D. A model for association in bivariate survival data. J. R. Stat. Soc. B. 1982;44:414–422.
- Oakes D. Semiparametric inference in a model for association in bivariate survival data. Biometrika. 1986;73:353–361.
- Oakes D. Bivariate survival models induced by frailties. J. Am. Statist. Assoc. 1989;84:487–493.
- Prentice RL, Cai J. Covariance and survivor function estimation using censored multivariate failure time data. Biometrika. 1992;79:495–512.
- Prentice RL, Hsu L. Regression on hazard ratios and cross ratios in multivariate failure time analysis. Biometrika. 1997;84:349–363.
- Prentice RL, Kalbfleisch JD, Peterson AV Jr, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–554.
- Ripatti S, Larsen K, Palmgren J. Maximum likelihood inference for multivariate frailty models using an automated Monte Carlo EM algorithm. Lifetime Data Anal. 2002;8:349–360. doi: 10.1023/a:1020566821163.
- Ripatti S, Palmgren J. Estimation of multivariate frailty models using penalized partial likelihood. Biometrics. 2000;56:1016–1022. doi: 10.1111/j.0006-341x.2000.01016.x.
- Serfling RJ. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980.
- Shieh GS. A weighted Kendall's tau statistic. Statist. Probab. Lett. 1998;39:17–24.
- Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1584–1599.
- Silverman JM, Ciresi G, Smith CJ, Marin DB, Schnaider-Beeri M. Variability of familial risk of Alzheimer disease across the late life span. Arch. Gen. Psychiatry. 2005;62:565–573. doi: 10.1001/archpsyc.62.5.565.
- Viswanathan B, Manatunga AK. Diagnostic plots for assessing the frailty distribution in multivariate survival data. Lifetime Data Anal. 2001;7:143–155. doi: 10.1023/a:1011348823081.
- Wang W, Wells MT. Estimation of Kendall's tau under censoring. Statist. Sinica. 2000;10:1199–1215.