Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up

Kwun Chuen Gary Chan; Jing Qin

Biostatistics. 2015 Mar 25;16(4):772–784. doi: 10.1093/biostatistics/kxv011

PMCID: PMC4570577 PMID: 25813647

Abstract

Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up, since all subjects are essentially censored. However, partial survival information is available from backward recurrence times, which are frequently collected in health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times, without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that uses follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed best among the proposed statistics. The method is applied to a cross-sectional health survey without follow-up and to a study of Alzheimer's disease with prospective follow-up.

Keywords: Accelerated failure time model, Backward recurrence time, Length biased sampling

1. Introduction

Cross-sectional sampling of survival data recruits individuals who have experienced a certain initial event but not yet a failure event at the sampling time. It is considered a focused and economical design for studying the natural history of disease (Wang, 1991) but is subject to selection bias; in particular, subjects with longer survival times are more likely to be sampled. Cross-sectional data can be collected without prospective follow-up, in which case only a backward recurrence time is collected, as discussed in Cox (1962), Allison (1985), and Yamaguchi (2003), among others. Estimation of the backward time distribution is one of the key elements in evaluating the statistical accuracy of current HIV incidence estimates from cross-sectional surveys. Backward recurrence times are also frequently collected in health surveys.

When cross-sectional data include prospective follow-up in addition to the backward time, statistical methods designed for left truncated and right censored data can be applied. When disease incidence is stationary over time, cross-sectional survival data are length biased (Wang, 1991). For length-biased survival data, conventional methods for analyzing left truncated right censored data have been shown to be inefficient, and more efficient estimators were proposed by Vardi (1989), Shen and others (2009), Ning and others (2011), and Chan and others (2012), among others.

In conventional survival analysis with right censored data, Prentice (1978) introduced a general class of linear rank tests for a semiparametric accelerated failure time model. Unlike the fully parametric approach, which depends on the correctness of the assumed distribution, the rank test is robust to distributional misspecification: if the error distribution underlying the rank scores is correctly specified, the rank test is fully efficient, and otherwise it still produces a valid test. Ying (1990) extended linear rank statistics to left truncated and right censored data, which can be applied to cross-sectional survival data. When the censoring proportion is high, however, these tests typically have low power. Recently, Ning and others (2010) developed a modified log-rank test for $k$-sample testing based on cross-sectional survival data with prospective follow-up. However, the tests of Ying (1990) and Ning and others (2010) cannot be applied to cross-sectional survival data without prospective follow-up.

In this paper, we extend the work of Prentice (1978) and Ying (1990) to length-biased cross-sectional data without follow-up, using backward recurrence times only; the resulting test can be optimally combined with Ying's rank test when prospective follow-up is present. As in Prentice (1978) and Ying (1990), our primary focus is to test the null hypothesis $H_0\colon \beta = 0$ against two-sided alternatives under a population accelerated failure time model

$$\log T = \beta^{\top} Z + \varepsilon, \qquad (1.1)$$

where $T$ is the survival time of interest, $Z$ is a $p$-vector covariate, $\beta$ is a $p$-vector coefficient, and $\varepsilon$ follows an unspecified distribution with density function $f$ and survival function $\bar F$. It is assumed that $\varepsilon$ and $Z$ are independent. The proposed statistics can be applied to cross-sectional survival data without follow-up, whereas the log-rank statistic of Ning and others (2010) cannot be applied in that case. Moreover, the sampling distribution of their statistic is obtained by resampling methods when censoring is present. In contrast, the asymptotic variance of the proposed test statistic has a closed-form consistent estimator and can be computed with existing software, which facilitates practical implementation of the proposed methods.

The rest of the paper is organized as follows. The proposed linear rank statistics for cross-sectional data without follow-up are given in Section 2. When follow-up data are also available in a cross-sectional sample, the linear rank statistics proposed in Section 2 can be combined with linear rank statistics based on the follow-up data to improve power. Four ways to combine the two statistics are discussed in Section 3. Results from simulation studies are presented in Section 4. Section 5 contains an analysis of the National Comorbidity Survey Replication, Section 6 contains an analysis of the Canadian Study of Health and Aging, and concluding remarks are given in Section 7.

2. Linear rank statistics for cross-sectional data without follow-up

In a cross-sectional sample with complete follow-up, we could observe $(A, V, Z)$, where $A$ is the time from an initial event to recruitment, also known as the backward recurrence time, and $V$ is the time from recruitment to a failure event, also known as the forward recurrence time, so that $T = A + V$. In a cross-sectional sample without follow-up, only $(A, Z)$ is observed, but not $V$ or $T$. In other words, every individual is right censored. Under length-biased sampling, the backward time has conditional density (Cox, 1962)

$$f_A(a \mid z) = \frac{S_T(a \mid z)}{\mu(z)}, \qquad a > 0,$$

where $S_T(a \mid z)$ is the conditional survival function of $T$ given $Z = z$, and

$$\mu(z) = \int_0^{\infty} S_T(u \mid z)\,du = e^{\beta^{\top} z}\int_{-\infty}^{\infty} e^{s}\,\bar F(s)\,ds. \qquad (2.1)$$

The last equality in (2.1) is shown in Section A of Supplementary material available at Biostatistics online. The density of $A$ given above is a consequence of length-biased sampling (Cox, 1962). In the context of cross-sectional survival data, length bias requires a stationary disease incidence assumption, under which disease incidence in the population remains constant over time and is independent of $Z$ (Wang, 1991). This assumption is imposed for the estimation of $\beta$ in model (1.1) by Shen and others (2009) and Ning and others (2011), among others, and for the test of Ning and others (2010). We derive the test statistic under this working assumption because the derivation is tractable. However, we show in Section D of Supplementary material available at Biostatistics online that the test remains valid when disease incidence is non-stationary over time, and the simulations reported in Section 4 indicate that the test can remain valid under misspecification of the stationary incidence assumption.
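To make the length-biased sampling mechanism concrete, the Python sketch below simulates a cross-sectional sample under stationary incidence. Purely for illustration, it assumes a standard exponential survival time $T$ and truncation times $A^{*}$ drawn from Uniform(0, tau) with a hypothetical tau = 15, far beyond typical survival; these settings and the function name `simulate_length_biased` are not taken from the paper. For exponential survival, $f_A(a) = S_T(a)/\mu$ is again standard exponential, and the recruited $T$ has the length-biased density $t e^{-t}$ with mean 2, so the sample means should approximate 1 and 2.

```python
import random


def simulate_length_biased(n, tau=15.0, seed=2015):
    """Simulate a length-biased cross-sectional sample of size n.

    Illustrative assumptions (not from the paper): T ~ Exp(1) and the
    truncation time A* ~ Uniform(0, tau), which mimics stationary
    incidence when tau is far beyond the bulk of the survival
    distribution. A subject is recruited only if T > A*; then the
    backward time is A = A* and the forward time is V = T - A*.
    """
    rng = random.Random(seed)
    backward, forward = [], []
    while len(backward) < n:
        t = rng.expovariate(1.0)          # population survival time
        a_star = rng.uniform(0.0, tau)    # time from onset to sampling
        if t > a_star:                    # still alive at sampling -> recruited
            backward.append(a_star)
            forward.append(t - a_star)
    return backward, forward


if __name__ == "__main__":
    a, v = simulate_length_biased(20000)
    mean_a = sum(a) / len(a)
    mean_t = sum(x + y for x, y in zip(a, v)) / len(a)
    # Theory for Exp(1) survival: E(A) = 1 and length-biased E(T) = 2.
    print(f"mean backward time {mean_a:.3f}, mean sampled T {mean_t:.3f}")
```

Rejection sampling is used here because it mirrors the physical design: subjects who fail before the sampling time are never seen.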

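When the distribution of $\varepsilon$ is known, parametric tests can be readily constructed for testing $H_0$, but their validity depends on correct model specification. As an alternative to parametric tests, the linear rank test offers greater robustness to model misspecification. A comprehensive review of linear rank statistics for right censored data can be found in Kalbfleisch and Prentice (2002, Chapter 7). In the following, we derive a linear rank test based on the backward recurrence time $A$. Under (1.1) and (2.1), $\log A = \beta^{\top} Z + \varepsilon^{*}$, where $\varepsilon^{*}$ is independent of $Z$ and has density $g(u) = e^{u}\bar F(u)\big/\int_{-\infty}^{\infty} e^{s}\bar F(s)\,ds$. Let $A_{(1)} < \cdots < A_{(n)}$ be the ordered outcomes with corresponding covariate vectors $z_{(1)}, \ldots, z_{(n)}$. The rank vector $r$ is given by the corresponding labels, and the rank likelihood (Kalbfleisch and Prentice, 1973) based on $r$ is

$$P(r; \beta) = \int_{u_{(1)} < \cdots < u_{(n)}} \prod_{i=1}^{n} g\{u_{(i)} - \beta^{\top} z_{(i)}\}\,du_{(1)} \cdots du_{(n)},$$

where $u_{(1)} < \cdots < u_{(n)}$ range over ordered values of $\log A$ and $z_{(i)}$ are the corresponding covariate values. Under model (1.1), the logarithm of the rank likelihood can be written as

$$\log P(r; \beta) = \log \int_{u_{(1)} < \cdots < u_{(n)}} \prod_{i=1}^{n} e^{u_{(i)} - \beta^{\top} z_{(i)}}\,\bar F\{u_{(i)} - \beta^{\top} z_{(i)}\}\,du_{(1)} \cdots du_{(n)} - n \log \int_{-\infty}^{\infty} e^{s}\bar F(s)\,ds.$$

A score test statistic can be constructed as

$$U = \left.\frac{\partial \log P(r; \beta)}{\partial \beta}\right|_{\beta = 0} = \sum_{i=1}^{n} z_{(i)}\,E\{\psi(V_{(i)})\}, \qquad \psi(u) = -\frac{g'(u)}{g(u)} = \lambda(u) - 1, \qquad (2.2)$$

where $\lambda(u) = f(u)/\bar F(u)$ is the hazard function of $\varepsilon$. The expectation is taken with respect to the $i$th order statistic $V_{(i)}$ of a sample of size $n$ generated from a distribution with density $g$. The derivation of (2.2) is shown in Section B of Supplementary material available at Biostatistics online. Furthermore, since $V_{(i)}$ has the same distribution as $G^{-1}(\xi_{(i)})$, where $G$ is the distribution function of $\varepsilon^{*}$ and $\xi_{(1)} < \cdots < \xi_{(n)}$ are order statistics of $n$ independent uniform (0, 1) random variables, the linear rank statistic can be written as $U = \sum_{i=1}^{n} z_{(i)} c_i$, where $c_i = E[\psi\{G^{-1}(\xi_{(i)})\}]$. Note that $\psi(u) = \lambda(u) - 1$ is quite different from $-f'(u)/f(u)$, which is given in Kalbfleisch and Prentice (2002, Chapter 7) for unbiased samples.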

We consider two important special cases in which $c_i$ and the associated rank tests have closed-form expressions, leading to easier computation and well-known test statistics. First, when the error term is extreme-value distributed, $\bar F(u) = \exp(-e^{u})$, it can be shown that $\exp(\varepsilon^{*})$, where $\varepsilon^{*} = \log A - \beta^{\top} Z$, has the survival function of a standard exponential distribution. Therefore, it follows from the same calculation as in Prentice (1978) that $c_i = \sum_{j=1}^{i} (n - j + 1)^{-1} - 1$. Thus $U_L = \sum_{i=1}^{n} z_{(i)} c_i$ is, up to sign, exactly the log-rank statistic treating backward recurrence times as if they were completely observed survival times in an unbiased sample.
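As a minimal sketch of the extreme-value case, the Python code below computes the scores $c_i = \sum_{j=1}^{i}(n-j+1)^{-1} - 1$ and the statistic $U_L = \sum_i z_{(i)} c_i$ from backward recurrence times. It assumes a scalar covariate and no ties among backward times; the helper names `logrank_scores` and `backward_rank_statistic` are hypothetical and the toy data are made up.

```python
def logrank_scores(n):
    """Extreme-value (log-rank) scores c_i = sum_{j<=i} 1/(n-j+1) - 1.

    c_i = E(W_(i)) - 1 for the i-th order statistic of n standard
    exponentials; these are the Savage scores up to sign.
    """
    scores, cumulative = [], 0.0
    for i in range(1, n + 1):
        cumulative += 1.0 / (n - i + 1)
        scores.append(cumulative - 1.0)
    return scores


def backward_rank_statistic(backward, z, scores):
    """U = sum_i z_(i) c_i with subjects ordered by backward time.

    Assumes a scalar covariate and no ties among backward times.
    """
    order = sorted(range(len(backward)), key=lambda k: backward[k])
    return sum(z[k] * c for k, c in zip(order, scores))


if __name__ == "__main__":
    a = [0.5, 2.0, 1.2]   # hypothetical backward recurrence times
    z = [1, 0, 1]         # hypothetical binary covariate
    c = logrank_scores(len(a))
    print(c, backward_rank_statistic(a, z, c))
```

The scores sum to zero, so $U_L$ is already centered under the permutation distribution of the covariates.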

Another important special case is a Wilcoxon statistic based on backward recurrence times. Suppose the error distribution follows the G-rho family with $\rho = 1/2$ (Harrington and Fleming, 1982), which has survival function

$$\bar F(u) = \left(1 + \tfrac{1}{2}e^{u}\right)^{-2}. \qquad (2.3)$$

It follows that $\int_{-\infty}^{\infty} e^{s}\bar F(s)\,ds = 2$ and that $\varepsilon^{*} = \log A - \beta^{\top} Z$ has the logistic distribution function $G(u) = (1 + 2e^{-u})^{-1}$. Therefore, $\psi(u) = 2G(u) - 1$ and $c_i = 2E(\xi_{(i)}) - 1 = (2i - n - 1)/(n + 1)$, where $\xi_{(i)}$ is the $i$th order statistic of $n$ independent uniform (0, 1) variables. The corresponding statistic is $U_W = \sum_{i=1}^{n} z_{(i)}(2i - n - 1)/(n + 1)$, which, up to a constant factor, is exactly the Wilcoxon statistic treating backward recurrence times as if they were completely observed survival times in an unbiased sample. In contrast to the construction of a Wilcoxon statistic for an unbiased sample, which corresponds to a logistic error distribution, the construction here requires an error distribution with a lighter tail. This is because the distribution of the backward recurrence time, called the stationary distribution in renewal theory (Cox, 1962), generally has a heavier tail than the survival distribution. One exception is an exponential survival distribution, whose stationary distribution is also exponential. This corresponds to an extreme-value error distribution, for which the log-rank test can be constructed from either unbiased survival data or backward recurrence times. Note that for a logistic error, the stationary distribution is undefined because $E(e^{\varepsilon})$ is infinite.

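The test statistics are asymptotically normal, following the standard theory of linear rank statistics, provided that the usual regularity conditions on the score function hold (Hájek and others, 1967; Prentice, 1978). Following their arguments, a consistent variance estimate of the log-rank statistic $U_L$ is $V_L = (n-1)^{-1}\sum_{i=1}^{n} c_i^{2}\sum_{i=1}^{n}(z_i - \bar z)^{\otimes 2}$ computed with the log-rank scores, and a consistent variance estimate $V_W$ of the Wilcoxon statistic $U_W$ is given by the same expression with the Wilcoxon scores, where $\bar z = n^{-1}\sum_{i=1}^{n} z_i$ and $a^{\otimes 2} = aa^{\top}$ for a column vector $a$. Therefore, Chi-square test statistics can be constructed as $X_L = U_L^{\top}V_L^{-1}U_L$ and $X_W = U_W^{\top}V_W^{-1}U_W$, which are asymptotically $\chi^2_p$ distributed under the null hypothesis.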
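The following Python sketch turns a set of rank scores into a Chi-square test for a scalar covariate ($p = 1$). As an assumption of this sketch, the variance is the exact permutation variance $(n-1)^{-1}\sum_i (c_i - \bar c)^2 \sum_i (z_i - \bar z)^2$, one standard consistent choice; the function name `rank_chi_square` is hypothetical. The $\chi^2_1$ tail is computed with `math.erfc`, since $P(\chi^2_1 > x) = P(|N(0,1)| > \sqrt{x})$.

```python
import math


def rank_chi_square(backward, z, scores):
    """Chi-square statistic X = U^2 / V for a scalar covariate.

    scores: c_1..c_n in ascending order of backward time.
    V is the permutation variance (n-1)^{-1} * sum (c_i - cbar)^2 * sum (z_i - zbar)^2
    (an assumed, standard choice). Returns (U, V, X, p-value from chi-square(1)).
    """
    n = len(backward)
    order = sorted(range(n), key=lambda k: backward[k])
    cbar = sum(scores) / n
    zbar = sum(z) / n
    u = sum(z[k] * (c - cbar) for k, c in zip(order, scores))
    v = sum((c - cbar) ** 2 for c in scores) * sum((zi - zbar) ** 2 for zi in z) / (n - 1)
    x = u * u / v
    p = math.erfc(math.sqrt(x / 2.0))  # P(chi2_1 > x)
    return u, v, x, p


if __name__ == "__main__":
    # Hypothetical toy data with Wilcoxon scores for n = 3
    print(rank_chi_square([0.5, 2.0, 1.2], [1, 0, 1], [-0.5, 0.0, 0.5]))
```

Centering the scores inside the function keeps the statistic valid even for score sets that do not sum exactly to zero.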

In the literature, there are a few prominent ways of choosing between the log-rank and Wilcoxon tests. One way is to specify the test statistic before the analysis, based on prior belief about the likely alternative hypothesis. When a proportional hazards alternative is assumed, the log-rank test is optimal; when the hazard ratio is assumed to decrease with time, the Wilcoxon test is usually more powerful. Another way is to use graphical diagnostics to choose the test statistic (Hess, 1995), and a related pre-test was recently proposed by Martinez (2010). Note that the choice between the log-rank and Wilcoxon tests affects only power and matters mainly in borderline cases. An advantage of rank-based statistics is their robustness against misspecification of the parametric error distribution. In fact, the log-rank statistic is often used in practice despite possible departures from the proportional hazards alternative, since it still provides a valid, albeit conservative, test in such cases.

In the above discussion, we constructed rank-based statistics from the backward recurrence times $A$ observed in a cross-sectional sample as if they were completely observed survival times in an unbiased sample. The result seems paradoxical but can be understood as follows. Cross-sectional sampling preferentially selects longer survivors. In subgroups with better survival, recruited individuals are more likely to have lived longer since disease incidence, so the backward time $A$ also tends to be longer in those subgroups. Allison (1985) discussed fitting a proportional hazards model to backward recurrence times to estimate the relative hazard in the population survival model, under the strong assumption of exponentially distributed failure times; the log-rank statistic is the corresponding score statistic based on a partial likelihood function. For parametric accelerated failure time models, Yamaguchi (2003) and Keiding and others (2011) also showed that survival model parameters can be estimated from backward recurrence times. In particular, they showed that $\log A = \beta^{\top} Z + \varepsilon^{*}$ with the same regression coefficient $\beta$ but a different error term $\varepsilon^{*}$, which has survival function $\bar G(u) = \int_u^{\infty} e^{s}\bar F(s)\,ds \big/ \int_{-\infty}^{\infty} e^{s}\bar F(s)\,ds$. Hence testing $H_0\colon \beta = 0$ in the accelerated failure time model for backward recurrence times is equivalent to testing the same null hypothesis in the target population. Therefore, linear rank statistics can also be constructed directly from backward recurrence times, and the resulting statistics coincide with ours.
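The stationary survival function $\bar G(u) = \int_u^{\infty} e^{s}\bar F(s)\,ds / \int_{-\infty}^{\infty} e^{s}\bar F(s)\,ds$ can be checked numerically. The Python sketch below assumes the error survival function $\bar F(u) = (1 + e^{u}/2)^{-2}$ (the G-rho member with $\rho = 1/2$) and uses simple trapezoidal quadrature with hypothetical truncation limits of plus or minus 40. It compares $\bar G$ with the logistic survival function $(1 + e^{u}/2)^{-1}$, which is the Wilcoxon case, and confirms that the normalizing constant equals 2.

```python
import math


def surv_eps(s):
    """Assumed error survival function: (1 + e^s / 2)^(-2)."""
    return (1.0 + 0.5 * math.exp(s)) ** -2


def integrand(s):
    # e^s * Fbar(s): unnormalized density of the stationary error eps*
    return math.exp(s) * surv_eps(s)


def trapezoid(fn, a, b, steps=40000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (fn(a) + fn(b))
    for k in range(1, steps):
        total += fn(a + k * h)
    return total * h


def stationary_survival(u, lo=-40.0, hi=40.0):
    """Numerical Gbar(u) = int_u^inf e^s Fbar(s) ds / int e^s Fbar(s) ds."""
    return trapezoid(integrand, u, hi) / trapezoid(integrand, lo, hi)


if __name__ == "__main__":
    for u in (-2.0, 0.0, 1.5):
        closed_form = 1.0 / (1.0 + 0.5 * math.exp(u))  # logistic survival
        print(u, round(stationary_survival(u), 6), round(closed_form, 6))
```

The tails of the integrand decay exponentially, so truncating at plus or minus 40 has no practical effect on the result.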

3. Combined rank statistics for cross-sectional data with follow-up

When prospective follow-up is present, a right-censored version of the forward recurrence time $V$ is observed in addition to the backward recurrence time $A$. Denote the observed length-biased data with possible right censoring as $\{(A_i, Y_i, \delta_i, Z_i),\ i = 1, \ldots, n\}$, where $Y_i = \min(V_i, C_i)$, $\delta_i = I(V_i \le C_i)$, and $C_i$ is the residual censoring time measured from recruitment. Suppose $C$ and $V$ are conditionally independent given $A$ and $Z$, and the censoring distribution does not involve any parameter of the survival distribution. Note that in length-biased sampling, the truncation time and $T$ are independent in the population. The observed data likelihood is

$$\prod_{i=1}^{n} \frac{S_T(A_i \mid Z_i)}{\mu(Z_i)} \times \prod_{i=1}^{n} \frac{f_T(A_i + Y_i \mid Z_i)^{\delta_i}\,S_T(A_i + Y_i \mid Z_i)^{1 - \delta_i}}{S_T(A_i \mid Z_i)},$$

where $f_T(\cdot \mid z)$ is the conditional density of $T$ given $Z = z$.

Based on the conditional likelihood function (the second product), linear rank statistics were proposed by Ying (1990). Based on the marginal likelihood function of $A$ (the first product), linear rank statistics were proposed in the previous section. In this section, we discuss how to combine the two statistics. Note that the backward recurrence time $A$ and the forward recurrence time $V$ are correlated (Cox, 1962). Also, in the context of cross-sectional survival data, informative censoring occurs because the survival time $T = A + V$ and the censoring time $A + C$ within a prevalent population share a common random component $A$. However, $T$ and $A + C$ are independent conditional on $A$, since $V$ and $C$ are independent conditional on $A$. Because of this conditional independence and the definition of risk sets for left truncated right censored data, Ying's test does not suffer from induced informative censoring. However, efficiency is lost through conditioning. We improve efficiency by utilizing the information in the marginal distribution of $A$.

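We introduce the following counting process notation for Ying's statistics: $N_i(t) = I(A_i + Y_i \le t, \delta_i = 1)$ and $R_i(t) = I(A_i \le t \le A_i + Y_i)$. When the error is extreme-value distributed, the locally most powerful test based on the conditional likelihood is the log-rank test $U_F = \sum_{i=1}^{n}\int_0^{\tau}\{Z_i - \bar Z(t)\}\,dN_i(t)$, where $\bar Z(t) = \sum_{j} R_j(t) Z_j \big/ \sum_{j} R_j(t)$ and $\tau$ is a constant corresponding to the maximal observable survival time. When the distribution of the error term is (2.3), the locally most powerful test (Harrington and Fleming, 1982) is the G-rho test with $\rho = 1/2$, $U_G = \sum_{i=1}^{n}\int_0^{\tau}\hat S(t-)^{1/2}\{Z_i - \bar Z(t)\}\,dN_i(t)$, where $\hat S(t-)$ is the left-hand limit of the product-limit estimator for left truncated right censored data (Tsai and others, 1987). The two statistics have consistent variance estimates $V_F = \sum_{i}\int_0^{\tau}\bar V(t)\,dN_i(t)$ and $V_G = \sum_{i}\int_0^{\tau}\hat S(t-)\,\bar V(t)\,dN_i(t)$, respectively, where $\bar V(t) = \sum_{j} R_j(t)\{Z_j - \bar Z(t)\}^{\otimes 2} \big/ \sum_{j} R_j(t)$. Chi-square test statistics can be constructed as $X_F = U_F^{\top}V_F^{-1}U_F$ and $X_G = U_G^{\top}V_G^{-1}U_G$, which are asymptotically $\chi^2_p$ distributed under the null hypothesis.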
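A direct, deliberately simple Python sketch of the left-truncated log-rank score $U_F$ and its variance estimate for a scalar covariate is given below. A subject enters the risk set at its backward time $A_i$ and leaves at $A_i + Y_i$, which is how left truncation is respected. The function name `ltrc_logrank` and the toy data are hypothetical, all events are used (no separate cutoff $\tau$), and tied event times are handled one event at a time with a common risk set, a Breslow-type simplification.

```python
def ltrc_logrank(entry, exit_time, event, z):
    """Log-rank score U_F and variance V_F for left-truncated right-censored data.

    entry[i] = A_i (backward time), exit_time[i] = A_i + Y_i, event[i] = delta_i,
    z[i] = scalar covariate. Subject j is at risk at t if A_j <= t <= A_j + Y_j.
    O(n^2) loop, written for clarity rather than speed.
    """
    n = len(entry)
    u = v = 0.0
    for i in range(n):
        if not event[i]:
            continue
        t = exit_time[i]
        at_risk = [j for j in range(n) if entry[j] <= t <= exit_time[j]]
        m = len(at_risk)
        zbar = sum(z[j] for j in at_risk) / m
        u += z[i] - zbar                                     # observed minus expected
        v += sum((z[j] - zbar) ** 2 for j in at_risk) / m    # risk-set variance
    return u, v


if __name__ == "__main__":
    # Hypothetical toy data: entry A, exit A + Y, event indicator, covariate
    entry, exit_time, event, z = [0.0, 1.0], [2.0, 3.0], [1, 1], [1, 0]
    print(ltrc_logrank(entry, exit_time, event, z))  # (0.5, 0.25)
```

In the toy data the second subject is already at risk at time 2 because it entered at time 1, while at time 3 only the second subject remains, so it contributes nothing to either sum.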

Since the two Chi-square test statistics are constructed from the marginal likelihood of $A$ and the conditional likelihood of the follow-up data given $A$, respectively, it is intuitive that a combined test statistic should improve power if the two statistics are not perfectly correlated. In fact, as we show below, the two statistics are asymptotically independent, so power can be improved substantially, much as if we had two independent samples instead of one. In the following, we focus on combining the log-rank statistic $U_L$ based on backward recurrence times and Ying's log-rank statistic $U_F$ under a working assumption that the error is extreme-value distributed; the combination of the Wilcoxon statistic $U_W$ and the G-rho statistic $U_G$ under the working error distribution (2.3) follows in a similar manner. We first need the correlation between $U_L$ and $U_F$ under the null hypothesis. Note that $U_L$ is a function of $\{(A_i, Z_i), i = 1, \ldots, n\}$ only, and that $E\{U_F \mid (A_i, Z_i), i = 1, \ldots, n\} = 0$, as shown in Section C of Supplementary material available at Biostatistics online. Therefore, $\operatorname{cov}(U_L, U_F) = E[U_L\,E\{U_F \mid (A_i, Z_i), i = 1, \ldots, n\}] = 0$, so that $U_L$ and $U_F$ are uncorrelated. Moreover, the two statistics have a joint limiting Gaussian distribution and are therefore asymptotically independent. Following this result, we investigate several methods for combining the two log-rank statistics.

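The first method is inspired by the method of combining independent $2 \times 2$ tables in Mantel and Haenszel (1959). Under $H_0$, with $U_L$ and $U_F$ oriented so that positive values indicate the same direction of departure, $U_L + U_F$ is asymptotically normal with mean zero and a variance consistently estimated by $V_L + V_F$, where $V_L$ and $V_F$ are the variance estimates of $U_L$ and $U_F$. A Mantel-Haenszel (MH) statistic is defined as $X_{MH} = (U_L + U_F)^{\top}(V_L + V_F)^{-1}(U_L + U_F)$, which is asymptotically $\chi^2_p$ distributed.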

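The second method is a modification of $X_{MH}$. The covariance matrices can be written as $V_L = V_L^{1/2}V_L^{1/2}$ and $V_F = V_F^{1/2}V_F^{1/2}$, respectively, and $V_L^{-1/2}U_L$ and $V_F^{-1/2}U_F$ are asymptotically normal with mean zero and variance $I_p$, the $p \times p$ identity matrix. A modified Mantel-Haenszel (MMH) statistic is defined as $X_{MMH} = \tfrac{1}{2}(V_L^{-1/2}U_L + V_F^{-1/2}U_F)^{\top}(V_L^{-1/2}U_L + V_F^{-1/2}U_F)$, which is asymptotically $\chi^2_p$ distributed.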

The third method is a sum of Chi-square tests (Bhattacharya, 1961), defined as $X_{SUM} = X_L + X_F$, which is asymptotically $\chi^2_{2p}$ distributed.

The fourth method is Fisher's inverse Chi-square test (Fisher, 1932, pp. 99–101), defined as $X_{Fisher} = -2\log P_L - 2\log P_F$, where $P_L$ and $P_F$ are the p-values based on $X_L$ and $X_F$, respectively. Fisher's statistic is asymptotically $\chi^2_4$ distributed, regardless of the dimension of $Z$, because the p-values are uniformly distributed under the null hypothesis and each of the two terms in $X_{Fisher}$ is $\chi^2_2$ distributed.
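The four combination rules can be sketched in a few lines of Python for a scalar covariate ($p = 1$), where the reference distributions are $\chi^2_1$, $\chi^2_2$, and $\chi^2_4$ and all have closed-form upper tails. The function names `chi2_sf` and `combine_tests` are hypothetical. The sketch assumes both score statistics are already oriented in the same direction; this matters for MH and MMH, which add the statistics, but not for SUM and Fisher, which use squared statistics.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1, 2 or 4 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("this sketch only supports df = 1, 2, 4")


def combine_tests(u_l, v_l, u_f, v_f):
    """Four combinations of two independent scalar score statistics.

    u_l, v_l: backward-time statistic and its variance estimate;
    u_f, v_f: follow-up (LTRC) statistic and its variance estimate.
    Returns {name: (statistic, p-value)}.
    """
    x_l, x_f = u_l ** 2 / v_l, u_f ** 2 / v_f
    p_l, p_f = chi2_sf(x_l, 1), chi2_sf(x_f, 1)
    mh = (u_l + u_f) ** 2 / (v_l + v_f)
    mmh = (u_l / math.sqrt(v_l) + u_f / math.sqrt(v_f)) ** 2 / 2.0
    total = x_l + x_f
    fisher = -2.0 * (math.log(p_l) + math.log(p_f))
    return {
        "MH": (mh, chi2_sf(mh, 1)),
        "MMH": (mmh, chi2_sf(mmh, 1)),
        "SUM": (total, chi2_sf(total, 2)),
        "Fisher": (fisher, chi2_sf(fisher, 4)),
    }


if __name__ == "__main__":
    # Hypothetical inputs: U_L = 2.1, V_L = 1.0, U_F = 1.4, V_F = 0.8
    for name, (stat, p) in combine_tests(2.1, 1.0, 1.4, 0.8).items():
        print(f"{name}: statistic = {stat:.3f}, p = {p:.4f}")
```

MH weights the two sources by their variances, whereas MMH standardizes each source first and so weights them equally.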

Other procedures have been proposed for combining independent Chi-square tests; see, for example, Marden (1982). We considered the above four procedures for the following reasons. The Mantel-Haenszel test has been shown to outperform other procedures for combining $2 \times 2$ tables when all tables show similar departures from the null hypothesis (Louv and Littell, 1986). In our case, the association parameter $\beta$ is the same for the tests based on backward recurrence time and on follow-up time, suggesting that Mantel-Haenszel type procedures should have good power. On the other hand, the sum of Chi-square and Fisher's tests were shown to be admissible in Marden (1982) and have reasonable power against a wide range of alternatives (Louv and Littell, 1986).

It is well known that the log-rank statistic has optimal power under proportional hazards alternatives. For decreasing hazard ratios, the Wilcoxon test or other members of the G-rho family are more powerful. For right-censored data, Kosorok (1999) derived a versatile maximum test of weighted log-rank statistics, which is powerful against a wide range of alternatives. We stress that our main contribution is rather different from the broad class of tests proposed for right-censored data, in two respects. First, we combine asymptotically independent test statistics from two sources (backward recurrence times and follow-up data) using the same set of data. This differs from maximum tests, which combine dependent test statistics from one source (follow-up data only). When dependent test statistics are combined, the gain in power is usually limited, whereas we show by simulation that the gain in power can be substantial because we combine independent sources of information. Second, the Mantel-Haenszel test, sum test, and Fisher's inverse test considered in this section are designed for combining asymptotically independent test statistics; their null distributions would be much more complex if the statistics were correlated. In fact, the maximum test requires extensive simulation to compute its null distribution (Kosorok, 1999). On the other hand, the proposed Fisher's inverse test can readily be extended to combine two maximum tests, one for the backward time and the other for the follow-up data, since the information from the two sources is asymptotically independent.

4. Simulation studies

We performed simulation studies to evaluate the performance of the proposed test statistics. In each case, independent data sets were generated 10 000 times under the null hypothesis and 1000 times under the alternative hypotheses. First, we considered a two-sample testing scenario in which $Z$ was a Bernoulli variable. The survival time was generated by model (1.1), with $\beta = 0$ under the null hypothesis and nonzero $\beta$ under the alternatives, and $\varepsilon = \log E$, where $E$ was standard exponentially distributed; that is, $\varepsilon$ is extreme-value distributed. We also considered $\varepsilon$ following a logistic or a normal distribution with the same mean and variance as the above extreme-value distribution, to show that the tests are robust against misspecification of the parametric error distribution. To obtain a length-biased sample, we generated random truncation times $A^{*}$ from a distribution consistent with stationary incidence; an observation was included in the cross-sectional sample if $T > A^{*}$. Data were generated until the cross-sectional sample reached one of two target sample sizes. The survival endpoint was censored if an individual in the prevalent cohort survived past a fixed number of time units after recruitment. We compared the proposed testing procedures, including the test based only on backward recurrence time without follow-up and the four combined test statistics, with two existing methods: the linear rank statistics of Ying (1990) based on left truncated right censored data (LTRC) and the modified log-rank statistics of Ning and others (2010), denoted by NQS. A 5% significance level was used. Table 1 summarizes the results. All tests had an empirical Type I error close to the nominal value, even for a small sample size and misspecified error distributions. Under the alternative hypotheses, incorporating information from backward recurrence times improved power substantially. In fact, the test based on backward recurrence times alone had at least as much power as the conventional log-rank test based on left truncated and right censored data.
All combined tests had higher power than the conventional log-rank test or the test based only on the backward recurrence time. The combined tests generally had higher power than the test of Ning and others (2010), which also recognizes the length-biased data structure but uses the information in a different way. Their test requires nonparametric estimation of a pooled survival distribution, which may compromise power under alternative hypotheses, whereas the proposed tests do not require such estimation. Among the proposed tests, the Mantel-Haenszel test and its modification had consistently higher power than the sum of Chi-square and Fisher's tests, and the modified Mantel-Haenszel test generally performed slightly better than the Mantel-Haenszel test.

Table 1.

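Percentage of simulated data sets in which the null hypothesis was rejected by the existing and proposed tests: two-sample testing. LTRC: conventional log-rank statistic for left truncated right censored data, NQS: modified log-rank statistic of Ning and others (2010), backward: proposed log-rank statistic based on backward recurrence time, MH: proposed Mantel-Haenszel statistic, MMH: proposed modified Mantel-Haenszel statistic, SUM: proposed sum of Chi-square statistic, Fisher: proposed Fisher's inverse Chi-square statistic. In each panel, the first five numeric columns correspond to one sample size and the last five to the other; within each block, the first column is the null hypothesis and the remaining columns are alternatives.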


Test	Null	Alt. 1	Alt. 2	Alt. 3	Alt. 4	Null	Alt. 1	Alt. 2	Alt. 3	Alt. 4
(a) Extreme value distribution

LTRC	5	11	29	55	71	5	21	61	85	96
NQS	5	16	53	86	98	5	30	86	100	100
Backward	5	15	42	73	94	5	26	72	96	100
MH	5	21	59	89	98	6	39	90	100	100
MMH	5	23	62	91	99	6	41	93	100	100
SUM	5	18	55	86	97	5	32	88	100	100
Fisher	5	18	54	87	98	5	32	88	100	100
(b) Logistic distribution

LTRC	5	11	31	54	78	5	17	53	85	97
NQS	5	14	42	77	95	5	24	77	97	100
Backward	5	11	33	58	80	5	20	58	86	99
MH	5	19	48	82	96	6	30	81	98	100
MMH	5	15	53	87	98	6	33	85	99	100
SUM	5	15	44	80	95	5	25	78	97	100
Fisher	5	15	44	79	96	5	25	79	97	100
(c) Normal distribution

LTRC	5	11	30	59	77	5	20	57	85	96
NQS	5	15	45	79	94	5	25	79	98	100
Backward	6	13	33	60	81	5	21	60	91	99
MH	5	18	50	84	96	6	33	84	99	100
MMH	5	18	54	87	98	6	34	87	99	100
SUM	5	14	45	80	95	5	27	80	98	100
Fisher	5	15	46	80	95	5	27	80	99	100

Next, we considered a scenario in which $Z$ was bivariate and contained a continuous variable. Suppose $Z = (Z_1, Z_2)^{\top}$, where $Z_1$ is a Bernoulli variable and $Z_2$ is standard uniformly distributed. The modified log-rank test of Ning and others (2010) is applicable only to a univariate categorical $Z$ and was not considered in this scenario. We considered alternative hypotheses that do not follow model (1.1), as well as two scenarios of non-stationary incidence. For stationary incidence, we generated random truncation times $A^{*}$ from a distribution consistent with stationary incidence; an observation was included in the cross-sectional sample if $T > A^{*}$. For covariate-independent non-stationary disease incidence, $A^{*}$ was generated from a distribution that gives an increasing disease incidence over calendar time. For covariate-dependent non-stationary disease incidence, $A^{*}$ was generated from a distribution that depends on $Z$. Survival times were generated from a distribution depending on $Z$ that does not follow an accelerated failure time model. The sampling of cross-sectional data and the censoring mechanism were the same as in the previous example. Results are shown in Table 2. The major conclusions were similar to those in the previous example: the proposed methods greatly improved power compared with the existing log-rank test, even when the alternative distributions do not follow an accelerated failure time model. The Mantel-Haenszel test and its modification continued to outperform the other tests, and the difference in power was greater in this scenario. The simulations also showed that the Type I error remained close to the nominal value under mild departures from the stationary disease incidence assumption.

Table 2.

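Percentage of simulated data sets in which the null hypothesis was rejected by the existing and proposed tests: bivariate covariates and a misspecified alternative model. Test procedures are the same as in Table 1. In each panel, the first four numeric columns correspond to one sample size and the last four to the other; within each block, the first column is the null hypothesis and the remaining columns are alternatives.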


(a) Stationary incidence
Test	Null	Alt. 1	Alt. 2	Alt. 3	Null	Alt. 1	Alt. 2	Alt. 3
LTRC	4	17	8	18	4	38	18	43
Backward	5	45	19	61	5	82	39	92
MH	5	64	25	76	5	93	54	97
MMH	5	62	23	70	5	91	53	95
SUM	4	45	18	61	4	86	41	93
Fisher	4	45	18	61	4	86	41	93
(b) Non-stationary, covariate-independent incidence
Test	Null	Alt. 1	Alt. 2	Alt. 3	Null	Alt. 1	Alt. 2	Alt. 3
LTRC	4	21	10	20	4	41	16	46
Backward	5	42	17	53	5	79	33	89
MH	4	64	25	71	5	93	47	97
MMH	4	61	23	66	4	92	46	96
SUM	4	48	16	54	5	85	36	92
Fisher	4	48	16	54	5	85	36	92
(c) Non-stationary, covariate-dependent incidence
Test	Null	Alt. 1	Alt. 2	Alt. 3	Null	Alt. 1	Alt. 2	Alt. 3
LTRC	5	23	11	26	5	48	21	50
Backward	5	59	23	74	5	90	44	98
MH	6	72	32	85	6	96	56	100
MMH	5	69	29	81	5	96	54	100
SUM	5	60	23	77	5	94	48	99
Fisher	5	60	23	77	5	94	48	99

5. Application to a health survey without follow-up

Partial survival information in the form of backward recurrence times is frequently collected in health surveys. Most surveys are administered cross-sectionally without follow-up, for cost and logistical reasons. Without any prospective follow-up, all collected survival times are censored and conventional statistical methods are not applicable; therefore, the partial survival information collected is seldom analyzed. An exception is McLaughlin and others (2010), who examined the associations between childhood adversities and the durations of adult mental disorders using backward recurrence times collected from a nationally representative sample in the National Comorbidity Survey Replication.

To illustrate the proposed test statistic, we analyzed a different survival outcome collected in the same survey: the duration between suicidal thoughts. Although suicidal thoughts are recurrent events, cross-sectional surveys such as this one usually collect only the most recent occurrence. Therefore, we only have the time since the last event, which is exactly the setting of Section 2. The survey was administered in 2001–2002 and collected the time of the last suicidal thought from 1010 respondents aged 18 to 91. We examined whether certain childhood experiences are associated with the duration between suicidal thoughts among adults. We first examined whether the duration between suicidal thoughts was associated with living with both biological parents until age 16. The proposed log-rank statistic was 9.40 with a $\chi^2_1$ null distribution, giving a p-value of 0.002, whereas the proposed Wilcoxon statistic was 13.10, with a p-value of 0.0003. As mentioned in Section 2, the log-rank statistic is conservative under departures from the proportional hazards alternative, but in this application its result is consistent with that of the Wilcoxon statistic. A direct comparison of backward recurrence times also revealed a consistent pattern: the mean backward recurrence time among individuals who lived with both parents until age 16 was 2.6 years (95% CI: 1.0–4.1) longer than among individuals who did not. We also examined the associations between the duration between suicidal thoughts and three childhood adversities: parental substance abuse, parental criminality, and family violence. The proposed log-rank statistics were 5.0, 12.1, and 0.6, respectively, yielding p-values of 0.025, 0.001, and 0.431.
Therefore, the data suggest that parental substance abuse and parental criminality were associated with the duration between suicidal thoughts at the 5% significance level, whereas they did not suggest any association between family violence and the duration between suicidal thoughts. We reiterate that the tests of Ying (1990) and Ning and others (2010) are not applicable to these data because there was no prospective follow-up.
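The p-values above follow from the upper tail of a $\chi^2_1$ distribution, $P(\chi^2_1 > x) = P(|N(0,1)| > \sqrt{x})$. The short Python sketch below recomputes them from the reported statistics; the helper name `chi2_1_pvalue` is hypothetical. Because the statistics are reported rounded to one or two decimals, recomputed p-values can differ from the reported ones in the last digit (for example, for 12.1 and 0.6).

```python
import math


def chi2_1_pvalue(x):
    """Upper-tail probability of a chi-square(1) variable: P(|N(0,1)| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


# Statistics as reported in the text (rounded)
reported = [
    ("lived with both parents (log-rank)", 9.40),
    ("lived with both parents (Wilcoxon)", 13.10),
    ("parental substance abuse (log-rank)", 5.0),
    ("parental criminality (log-rank)", 12.1),
    ("family violence (log-rank)", 0.6),
]

if __name__ == "__main__":
    for label, stat in reported:
        print(f"{label}: X = {stat}, p = {chi2_1_pvalue(stat):.4f}")
```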

6. Application to a prospective study of Alzheimer's disease

Next, we applied the proposed tests to the Canadian Study of Health and Aging (Wolfson and others, 2001). The same example has been used to study the performance of the modified log-rank test in Ning and others (2010). A prevalent sample of individuals suffered from dementia was collected in 1991. The date of dementia onset was collected from medical history, and a prospective follow-up was conducted between 1991 and 1996. It is well established that the sample is subject to length-biased sampling, see, for example, Shen and others (2009) and Ning and others (2010), among others. Subjects were being classified into one of the three diagnostic categories: probable Alzheimer's disease, possible Alzheimer's disease, and vascular dementia. We investigated whether the time from onset to death were associated with diagnostic categories. The available data included 818 subjects with dementia, among them 393 had probable Alzheimer's disease, 252 had possible Alzheimer's disease, and 173 had vascular dementia. We considered pairwise comparisons and an overall comparison among the three groups. The Inline graphic -values from the different tests are summarized in Table 3. As noted in Ning and others (2010), the conventional log-rank test for left truncated right censored data did not reject the null hypotheses that the survival distributions for any two of the three groups were equal. However, statistical significant differences were found between vascular and possible Alzheimer's disease, between probable and possible Alzheimer's disease and in the three-sample test at a 5% significant level using the proposed Mantel-Haenszel tests. Inline graphic -values were slightly higher for the sum test and the Fisher's test. For the three-sample test, the test of Ning and others (2010) obtained a marginal significant result at a 5% level of significance, whereas the proposed Mantel-Haenszel tests had a lower -value of 0.02. 
Interestingly, the same conclusion can be reached at baseline recruitment, before any follow-up began, by using the log-rank test based on backward recurrence times.
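As the concluding remarks note, the log-rank special case of the proposed statistic treats each backward recurrence time as if it were a completely observed survival time. Under that reading, the baseline-only analysis amounts to an ordinary two-sample log-rank test applied to backward times with no censoring. The sketch below computes such a statistic; the function names are hypothetical, the general weighted form of the proposed statistic is not reproduced here, and we assume the usual hypergeometric variance with a tie correction.

```python
import math

import numpy as np


def logrank_uncensored(times, group):
    """Two-sample log-rank chi-square statistic (1 df) that treats every
    value in `times` (e.g. backward recurrence times) as an observed event,
    i.e. no censoring. `group` holds 0/1 labels. Hypothetical helper."""
    times = np.asarray(times, dtype=float)
    group = np.asarray(group, dtype=int)
    u = 0.0  # observed minus expected number of events in group 1
    v = 0.0  # hypergeometric variance of u (with tie correction)
    for t in np.unique(times):
        at_risk = times >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = (times == t).sum()
        d1 = ((times == t) & (group == 1)).sum()
        u += d1 - d * n1 / n
        if n > 1:
            v += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return u * u / v


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))
```

For a data set with backward times `a` and group labels `g`, `chi2_1df_pvalue(logrank_uncensored(a, g))` gives a baseline-only p-value under these assumptions. The statistic is symmetric in the group labels, so relabelling the groups does not change it.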

Table 3.

P-values multiplied by 100 for the Canadian Study of Health and Aging data. Test procedures are the same as in Table 1. (a) Vascular vs probable, (b) vascular vs possible, (c) probable vs possible, and (d) three-sample test.

Test	(a)	(b)	(c)	(d)
LTRC	42	33	71	54
NQS	35	2	4	5
Backward	45	1	2	1
MH	28	1	4	2
MMH	28	2	5	2
SUM	56	3	5	5
Fisher	52	1	6	5


To further understand the power to detect a difference in the three-sample test, we ran a simulation study with parameter values estimated from the data. We fit the parametric survival model (1.1) with an extreme-value distributed error and estimated two relative survival times: one comparing subjects with possible Alzheimer's disease with subjects with probable Alzheimer's disease, and one comparing subjects with vascular dementia with subjects with probable Alzheimer's disease. The error term is distributed as the logarithm of an exponential random variable with an estimated mean of 3.82 years. The result is comparable with that of a semiparametric analysis based on log-rank estimating equations. The proportions of cases with probable Alzheimer's disease, possible Alzheimer's disease, and vascular dementia were estimated using the bias-corrected estimator of Chan and Wang (2012). Using these parameters, we simulated 1000 independent data sets, each containing a cross-sectional sample of 818 subjects generated as described in Section 4. The residual censoring time from recruitment was generated from a uniform distribution consistent with the observed data. The power of both the sum-of-chi-squares test and Fisher's test was 76%, whereas the power of both the Mantel-Haenszel test and the modified Mantel-Haenszel test was 86%. Therefore, the Mantel-Haenszel tests have a lower Type II error and are more likely to reject the null hypothesis when the alternative is true.
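The simulation design described above can be sketched as follows. This is a minimal illustration, not the authors' code: the generation scheme of Section 4 is not reproduced in this text, so we assume cross-sectional sampling from a stationary incidence process (onsets uniform on [0, tau], subjects kept only if still alive at tau), an accelerated failure time model T = r_k * E with E exponential with mean 3.82 years (the only fitted value stated above), and uniform residual censoring. The relative survival times `rel_time`, incident proportions `incid_prop`, and the bounds `tau` and `cens_max` are hypothetical placeholders. `estimate_power` accepts any user-supplied function that returns a p-value.

```python
import numpy as np


def simulate_cross_sectional(n, rel_time, incid_prop, base_mean=3.82,
                             tau=100.0, cens_max=5.0, rng=None):
    """Draw a length-biased cross-sectional sample of size n.

    Hypothetical scheme (assumption, not the paper's Section 4):
    onsets ~ Uniform(0, tau); a case is sampled if still alive at tau;
    T = rel_time[group] * E with E ~ Exponential(mean=base_mean);
    residual follow-up is censored by C ~ Uniform(0, cens_max).
    """
    rng = np.random.default_rng(rng)
    rel_time = np.asarray(rel_time, dtype=float)
    g_parts, a_parts, v_parts = [], [], []
    total = 0
    while total < n:
        m = 50 * n  # batch of incident cases; only a fraction are prevalent
        g = rng.choice(len(rel_time), size=m, p=incid_prop)
        onset = rng.uniform(0.0, tau, size=m)
        t = rel_time[g] * rng.exponential(base_mean, size=m)
        alive = onset + t > tau  # prevalent at the sampling time tau
        a = tau - onset[alive]   # backward recurrence time
        g_parts.append(g[alive])
        a_parts.append(a)
        v_parts.append(t[alive] - a)  # residual (forward) survival time
        total += int(alive.sum())
    g = np.concatenate(g_parts)[:n]
    a = np.concatenate(a_parts)[:n]
    v = np.concatenate(v_parts)[:n]
    c = rng.uniform(0.0, cens_max, size=n)  # residual censoring time
    return {
        "group": g,
        "backward": a,
        "forward": np.minimum(v, c),
        "event": (v <= c).astype(int),
    }


def estimate_power(pvalue_fn, n_sims=1000, alpha=0.05, seed=0, **sim_args):
    """Fraction of simulated data sets with pvalue_fn(data) < alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        data = simulate_cross_sectional(rng=rng, **sim_args)
        rejections += pvalue_fn(data) < alpha
    return rejections / n_sims
```

For example, `estimate_power(my_test, n=818, rel_time=[...], incid_prop=[...])` mirrors the 1000-replicate, 818-subject design, where `my_test` is a hypothetical function returning a p-value. Keeping onsets uniform over a long window makes the retained survival times length-biased, because a case with survival time t is sampled with probability proportional to t.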

7. Concluding remarks

A linear rank statistic is proposed for length-biased cross-sectional survival data without follow-up, based on backward recurrence times. Existing test statistics cannot be applied in this case because all subjects are essentially censored. Interestingly, the log-rank and Wilcoxon special cases treat the backward recurrence time as if it were the completely observed survival time. When prospective follow-up is present, the proposed rank statistic without follow-up can be combined with a conventional rank statistic for cross-sectional data with follow-up. Another interesting observation is that the two rank statistics based on the same sample are asymptotically independent. This property facilitates combining the two statistics, and four combination methods were explored. Combining the two independent statistics can substantially improve power. Based on the simulation studies and the data analysis, the proposed Mantel-Haenszel tests have larger power, and we recommend their use in practice.
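Asymptotic independence is what makes simple combination rules valid. The sketch below shows three generic ways to combine two independent two-sample statistics, assuming each is summarized by a score U with null variance V, so that U**2 / V is approximately chi-square with 1 df: the sum of chi-squares (chi-square, 2 df), Fisher's method on the two p-values (chi-square, 4 df), and a Mantel-Haenszel-style pooling (U1 + U2)**2 / (V1 + V2) (chi-square, 1 df). The pooled form is our assumption about how a Mantel-Haenszel combination would look here, not a transcription of the paper's definition, and the modified Mantel-Haenszel test is not reproduced. Closed-form chi-square tail probabilities keep the example free of external dependencies.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df in {1, 2, 4} (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("only df = 1, 2, 4 are supported in this sketch")


def combine_independent(u1, v1, u2, v2):
    """Combine two independent score statistics (U, V) into p-values.

    Assumes U / sqrt(V) is approximately N(0, 1) under the null for each
    statistic and that the two statistics are independent.
    """
    x1, x2 = u1 ** 2 / v1, u2 ** 2 / v2
    p1, p2 = chi2_sf(x1, 1), chi2_sf(x2, 1)
    pooled = (u1 + u2) ** 2 / (v1 + v2)  # Mantel-Haenszel-style pooling (assumed)
    fisher = -2.0 * (math.log(p1) + math.log(p2))
    return {
        "mh": chi2_sf(pooled, 1),
        "sum": chi2_sf(x1 + x2, 2),
        "fisher": chi2_sf(fisher, 4),
    }
```

When both components point in the same direction (for instance, two scores each at the two-sided 5% boundary with V = 1), the pooled statistic yields the smallest p-value of the three, while the sum and Fisher combinations ignore the signs of U1 and U2. This illustrates why a pooled, direction-aware combination can gain power when the backward and follow-up statistics agree.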

The test of Ning and others (2010) was designed for cross-sectional data with prospective follow-up, and it is not asymptotically equivalent to our proposed method. In a cross-sectional sample without prospective follow-up, the test of Ning and others (2010) has infinite variability, whereas the proposed method still works. When forward follow-up is much shorter than the backward time, which happens in practice because follow-up data are typically costly to collect, the proposed test is expected to outperform its existing competitors. Our test statistics are also applicable when follow-up time is observed only in a subsample. Existing methods would generally discard the backward information from observations without forward follow-up, whereas the proposed method can use backward information from the full sample.

Funding

K.C.G.C. is partially supported by grant R01 HL-122212 from the National Institutes of Health.

Supplementary Material

Supplementary Data

supp_16_4_772__index.html (883 B, HTML)

Acknowledgement

The authors thank Prof. Anastasios Tsiatis, an associate editor, and two reviewers for their helpful comments and suggestions, which greatly improved this paper. The authors thank Prof. Masoud Asgharian for sharing the Canadian Study of Health and Aging data. Conflict of Interest: None declared.

References

Allison P. D. (1985). Survival analysis of backward recurrence times. Journal of the American Statistical Association 80(390), 315–322.
Bhattacharya N. (1961). Sampling experiments on the combination of independent tests. Sankhya 11, 191–196.
Chan K. C. G., Chen Y. Q., Di C.-Z. (2012). Proportional mean residual life model for right-censored length-biased data. Biometrika 99(4), 995–1000.
Chan K. C. G., Wang M.-C. (2012). Estimating incident population distribution from prevalent data. Biometrics 68(2), 521–531.
Cox D. R. (1962). Renewal Theory. London: Methuen.
Fisher R. A. (1932). Statistical Methods for Research Workers. London: Oliver and Boyd.
Hájek J. (1967). Theory of Rank Tests. New York: Academic Press.
Harrington D. P., Fleming T. R. (1982). A class of rank test procedures for censored survival data. Biometrika 69(3), 553–566.
Hess K. R. (1995). Graphical methods for assessing violations of the proportional hazards assumption in Cox regression. Statistics in Medicine 14(15), 1707–1723.
Kalbfleisch J. D., Prentice R. L. (1973). Marginal likelihoods based on Cox's regression and life model. Biometrika 60(2), 267–278.
Kalbfleisch J. D., Prentice R. L. (2002). The Statistical Analysis of Failure Time Data. New York, NY: John Wiley & Sons.
Keiding N., Fine J. P., Hansen O. H., Slama R. (2011). Accelerated failure time regression for backward recurrence times and current durations. Statistics & Probability Letters 81(7), 724–729.
Kosorok M. R., Lin C.-Y. (1999). The versatility of function-indexed weighted log-rank statistics. Journal of the American Statistical Association 94(445), 320–332.
Louv W. C., Littell R. C. (1986). Combining one-sided binomial tests. Journal of the American Statistical Association 81(394), 550–554.
Mantel N., Haenszel W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22(4), 719–748.
Marden J. I. (1982). Combining independent noncentral chi squared or F tests. The Annals of Statistics 10(1), 266–277.
Martinez R. L. M. C., Naranjo J. D. (2010). A pretest for choosing between logrank and Wilcoxon tests in the two-sample problem. Metron 68(2), 111–125.
McLaughlin K., Green J. G., Gruber M. J., Sampson N. A., Zaslavsky A. M., Kessler R. C. (2010). Childhood adversities and adult psychiatric disorders in the National Comorbidity Survey Replication II: associations with persistence of DSM-IV disorders. Archives of General Psychiatry 67(2), 124–132.
Ning J., Qin J., Shen Y. (2010). Non-parametric tests for right-censored data with biased sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72(5), 609–630.
Ning J., Qin J., Shen Y. (2011). Buckley–James-type estimator with right-censored and length-biased data. Biometrics 67(4), 1369–1378.
Prentice R. L. (1978). Linear rank tests with right censored data. Biometrika 65(1), 167–179.
Shen Y., Ning J., Qin J. (2009). Analyzing length-biased data with semiparametric transformation and accelerated failure time models. Journal of the American Statistical Association 104(487), 1192–1202.
Tsai W.-Y., Jewell N. P., Wang M.-C. (1987). A note on the product-limit estimator under right censoring and left truncation. Biometrika 74(4), 883–886.
Vardi Y. (1989). Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika 76(4), 751–761.
Wang M.-C. (1991). Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association 86, 130–143.
Wolfson C., Wolfson D. B., Asgharian M., M'Lan C. E. (2001). A reevaluation of the duration of survival after the onset of dementia. New England Journal of Medicine 344(15), 1111–1116.
Yamaguchi K. (2003). Accelerated failure-time mover-stayer regression models for the analysis of last-episode data. Sociological Methodology 33(1), 81–110.
Ying Z. (1990). Linear rank statistics for truncated data. Biometrika 77(4), 909–914.


