Abstract
Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up, since all subjects are essentially censored. However, partial survival information is available from backward recurrence times, which are frequently collected in health surveys without prospective follow-up. Under length-biased sampling, we propose a class of linear rank statistics based only on backward recurrence times, without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposed statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up.
Keywords: Accelerated failure time model, Backward recurrence time, Length biased sampling
1. Introduction
Cross-sectional sampling of survival data recruits individuals who have experienced a certain initial event but not a failure event at the sampling time. It is considered a focused and economical design for studying the natural history of disease (Wang, 1991), but it is subject to selection bias. In particular, subjects with longer survival times are more likely to be sampled. Cross-sectional data can be collected without prospective follow-up; in this case only a backward recurrence time is collected, as discussed in Cox (1962), Allison (1985), Yamaguchi (2003), among others. Estimating the backward time distribution is one of the key elements in evaluating the statistical accuracy of current HIV incidence estimates from cross-sectional surveys. Backward recurrence times are also frequently collected in health surveys.
When cross-sectional data include prospective follow-up in addition to the backward time, statistical methods designed for left truncated and right censored data can be applied. When disease incidence is stationary over time, cross-sectional survival data are length biased (Wang, 1991). For length-biased survival data, conventional methods for left truncated right censored data have been shown to be inefficient, and more efficient estimators have been proposed by Vardi (1989), Shen and others (2009), Ning and others (2011), Chan and others (2012), among others.
In conventional survival analysis with right censored data, Prentice (1978) introduced a general class of linear rank tests for a semiparametric accelerated failure time model. Unlike the fully parametric approach, whose validity depends on the correctness of the underlying distributional assumption, the rank test is robust to distributional misspecification. If the assumed error distribution is correct, the rank test is fully efficient, and it still produces a valid test when that assumption is misspecified. Ying (1990) extended linear rank statistics to left truncated and right censored data, which makes them applicable to cross-sectional survival data. When the censoring proportion is high, the existing tests typically have low power. Recently, Ning and others (2010) developed a modified log-rank test for $K$-sample testing based on cross-sectional survival data with prospective follow-up. However, the tests developed by Ying (1990) and Ning and others (2010) cannot be applied to cross-sectional survival data without prospective follow-up.
In this paper, we extend the work of Prentice (1978) and Ying (1990) to length-biased cross-sectional data without follow-up, using backward recurrence times only. The resulting statistics can be optimally combined with Ying's rank test when prospective follow-up is present. As in Prentice (1978) and Ying (1990), our primary focus is to test the null hypothesis $H_0: \beta = 0$ against two-sided alternatives under a population accelerated failure time model

$$\log T = \beta^\top Z + \epsilon, \tag{1.1}$$

where $T$ is the survival time of interest, $Z$ is a $p$-vector covariate, $\beta$ is a $p$-vector coefficient, and $\epsilon$ follows an unspecified distribution with a density function $f$ and a survival function $S_\epsilon$. It is assumed that $\epsilon$ and $Z$ are independent. The proposed statistics can be applied to cross-sectional survival data without follow-up, whereas the log-rank statistic of Ning and others (2010) cannot. Moreover, the sampling distribution of their statistic is obtained by resampling methods when censoring is present. In contrast, the asymptotic variance of the proposed test statistic has a closed-form consistent estimator and can be computed with existing software, which facilitates practical implementation of the proposed methods.
The rest of the paper is organized as follows. The proposed linear rank statistics for cross-sectional data without follow-up are given in Section 2. When follow-up data are also available in a cross-sectional sample, the linear rank statistics proposed in Section 2 can be combined with linear rank statistics based on follow-up data to improve power. Four ways to combine the two statistics are discussed in Section 3. Results from simulation studies are presented in Section 4. Section 5 contains an analysis of the National Comorbidity Survey Replication, Section 6 contains an analysis of the Canadian Study of Health and Aging, and concluding remarks are given in Section 7.
2. Linear rank statistics for cross-sectional data without follow-up
In a cross-sectional sample with complete follow-up, we could observe $(A, V, Z)$. Here $A$ is the time from an initial event to recruitment, also known as the backward recurrence time, and $V$ is the time from recruitment to a failure event, also known as the forward recurrence time. In a cross-sectional sample without follow-up, only $A$ is observed, and neither $V$ nor $T = A + V$ is observed. In other words, every individual is right censored. Under length-biased sampling, the backward time has the conditional density (Cox, 1962)

$$f_A(a \mid Z) = \frac{S(a \mid Z)}{\mu(Z)}, \qquad a > 0,$$

where $S(t \mid Z) = S_\epsilon(\log t - \beta^\top Z)$ is the conditional survival function of $T$ given $Z$, and

$$\mu(Z) = \int_0^\infty S(u \mid Z)\, du = \exp(\beta^\top Z)\, E\{\exp(\epsilon)\}. \tag{2.1}$$
The last equality in (2.1) is shown in Section A of Supplementary material available at Biostatistics online. The density of $A$ given above is a consequence of length-biased sampling (Cox, 1962). For cross-sectional survival data, length bias requires a stationary disease incidence assumption: the disease incidence in the population remains constant over time and is independent of $Z$ (Wang, 1991). This assumption is also imposed when estimating $\beta$ in model (1.1) by Shen and others (2009) and Ning and others (2011), among others, and in the test of Ning and others (2010). We derive the test statistic under this working assumption because the derivation is tractable. However, we show in Section D of Supplementary material available at Biostatistics online that the test remains valid when disease incidence is non-stationary over time. Simulations reported in Section 4 also indicate that the test can remain valid when the stationary incidence assumption is misspecified.
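The density $f_A(a \mid Z) = S(a \mid Z)/\mu(Z)$ can be checked by simulation. The following Python sketch uses the standard library only and rests on three assumptions: uniform disease onsets over a long calendar window (stationary incidence), exponential survival times (model (1.1) with an extreme-value error), and a hypothetical coefficient $\beta = \log 2$. The function `sample_prevalent_cohort` is illustrative only and is not part of the original method. An exponential survival distribution has the same exponential stationary distribution. The sampled backward times should therefore average about $\exp(\beta z)$, while the sampled survival times are length biased and average about twice that.

```python
import math
import random


def sample_prevalent_cohort(mean_survival, n_onsets, window, rng):
    """Backward times A and survival times T of subjects alive at a survey.

    Hypothetical setup: disease onsets are uniform on [0, window] (stationary
    incidence), survival is exponential with mean `mean_survival` (model (1.1)
    with an extreme-value error), and the survey happens at calendar time
    `window`, so A = window - onset and a subject is sampled only if T > A.
    """
    backward, survival = [], []
    for _ in range(n_onsets):
        a = window - rng.uniform(0.0, window)
        t = rng.expovariate(1.0 / mean_survival)
        if t > a:  # still alive at the survey time (left truncation)
            backward.append(a)
            survival.append(t)
    return backward, survival


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(2024)
beta = math.log(2.0)  # hypothetical: group z = 1 survives twice as long
summary = {}
for z in (0, 1):
    a, t = sample_prevalent_cohort(math.exp(beta * z), 200_000, 50.0, rng)
    summary[z] = (mean(a), mean(t))
    print(f"z={z}: n={len(a)}, mean A={mean(a):.3f}, mean sampled T={mean(t):.3f}")
```

With these settings, the group means of $A$ should differ by a factor close to $\exp(\beta) = 2$. This illustrates why backward recurrence times alone carry information about $\beta$.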
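When the distribution of $\epsilon$ is known, parametric tests can be readily constructed for testing $H_0: \beta = 0$, but their validity depends on correct model specification. As an alternative to parametric tests, the linear rank test offers greater robustness to model misspecification. A comprehensive review of linear rank statistics for right censored data can be found in Kalbfleisch and Prentice (2002, Chapter 7). In the following, we derive a linear rank test based on the backward recurrence times. Let $A_{(1)} < \cdots < A_{(n)}$ be the ordered outcomes with corresponding covariate vectors $Z_{(1)}, \ldots, Z_{(n)}$. The rank vector $r$ is given by the corresponding labels $\{(1), \ldots, (n)\}$, and the rank likelihood (Kalbfleisch and Prentice, 1973) based on $r$ is

$$P(r) = \int_{a_{(1)} < \cdots < a_{(n)}} \prod_{i=1}^{n} f_A(a_{(i)} \mid Z_{(i)})\, da_{(1)} \cdots da_{(n)},$$

where $a_{(1)} < \cdots < a_{(n)}$ are the ordered values of $a$ and $Z_{(1)}, \ldots, Z_{(n)}$ are the corresponding covariate values. Under model (1.1), the substitution $w_{(i)} = \log a_{(i)}$ gives the logarithm of the rank likelihood as

$$\log P(r) = \log \int_{w_{(1)} < \cdots < w_{(n)}} \prod_{i=1}^{n} \frac{\exp(w_{(i)} - \beta^\top Z_{(i)})\, S_\epsilon(w_{(i)} - \beta^\top Z_{(i)})}{E\{\exp(\epsilon)\}}\, dw_{(1)} \cdots dw_{(n)}.$$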
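A score test statistic can be constructed as

$$U = \frac{\partial \log P(r)}{\partial \beta}\bigg|_{\beta = 0} = \sum_{i=1}^{n} Z_{(i)}\, E\{\lambda_\epsilon(\xi_{(i)}) - 1\}, \tag{2.2}$$

where $\lambda_\epsilon = f/S_\epsilon$ is the hazard function of $\epsilon$. The expectation is taken with respect to the $i$th order statistic $\xi_{(i)}$ of $n$ draws from a distribution with density $f^*(v) = \exp(v)\, S_\epsilon(v) / E\{\exp(\epsilon)\}$. The derivation of (2.2) is shown in Section B of Supplementary material available at Biostatistics online. Furthermore, since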
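$$\xi_{(i)} \overset{d}{=} F^{*-1}(\eta_{(i)}), \qquad i = 1, \ldots, n,$$

where $F^*$ is the distribution function corresponding to $f^*$, the linear rank statistic can be written as $U = \sum_{i=1}^{n} Z_{(i)} c_i$, where $c_i = E\{\varphi(\eta_{(i)})\}$, $\varphi(u) = \lambda_\epsilon\{F^{*-1}(u)\} - 1$, and $\eta_{(1)} < \cdots < \eta_{(n)}$ are the order statistics of $n$ independent uniform(0, 1) random variables. Note that $\varphi$ is quite different from $\varphi_0(u) = -f'\{F^{-1}(u)\}/f\{F^{-1}(u)\}$, with $F$ the distribution function of $\epsilon$, which is given in Kalbfleisch and Prentice (2002, Chapter 7) for unbiased samples.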
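The representation $c_i = E\{\varphi(\eta_{(i)})\}$ can be checked numerically. This minimal sketch assumes the extreme-value score function $\varphi(u) = -\log(1-u) - 1$ and $n = 5$, both chosen only for illustration. The Monte Carlo averages over sorted uniform draws should agree with the closed-form scores $\sum_{j \le i} (n-j+1)^{-1} - 1$ up to simulation error. The helper names are hypothetical.

```python
import math
import random


def mc_scores(phi, n, reps, rng):
    """Monte Carlo estimate of c_i = E{phi(eta_(i))}, i = 1..n, where
    eta_(1) < ... < eta_(n) are order statistics of n Uniform(0, 1) draws."""
    totals = [0.0] * n
    for _ in range(reps):
        for i, u in enumerate(sorted(rng.random() for _ in range(n))):
            totals[i] += phi(u)
    return [s / reps for s in totals]


def phi_logrank(u):
    # phi(u) = -log(1 - u) - 1, the score function for an extreme-value error
    return -math.log1p(-u) - 1.0


def exact_logrank_scores(n):
    # closed form c_i = sum_{j=1}^{i} 1/(n - j + 1) - 1
    return [sum(1.0 / (n - j + 1) for j in range(1, i + 1)) - 1.0 for i in range(1, n + 1)]


rng = random.Random(1)
approx = mc_scores(phi_logrank, 5, 20_000, rng)
exact = exact_logrank_scores(5)
for i, (m, e) in enumerate(zip(approx, exact), start=1):
    print(f"c_{i}: Monte Carlo {m:+.3f}, closed form {e:+.3f}")
```

The closed-form scores sum to zero, so the statistic does not change when a constant is added to every covariate value.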
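We consider two important special cases in which $c_i$ and the associated rank tests have closed-form expressions, which lead to easier computation and well-known test statistics. First, suppose the error term $\epsilon$ is extreme-value distributed, so that $S_\epsilon(w) = \exp(-e^w)$. It can be shown that $f^* = f$, so that $\exp(\xi)$ with $\xi$ drawn from $f^*$ has the survival function $\exp(-t)$ of a standard exponential distribution. Since $\lambda_\epsilon(v) = e^v$, we have $\varphi(u) = -\log(1-u) - 1$. It then follows from the same calculation as in Prentice (1978) that $c_i = \sum_{j=1}^{i} (n-j+1)^{-1} - 1$. Thus,

$$U_{\mathrm{LR}} = \sum_{i=1}^{n} Z_{(i)} \Big\{ \sum_{j=1}^{i} \frac{1}{n-j+1} - 1 \Big\},$$

which is exactly the log-rank statistic treating the backward recurrence times $A$ as if they were completely observed survival times in an unbiased sample.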
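To see why $U_{\mathrm{LR}}$ is a log-rank statistic, exchange the order of summation. This gives $\sum_i Z_{(i)} \sum_{j \le i} (n-j+1)^{-1} = \sum_j \bar Z_j$, where $\bar Z_j$ is the average covariate over subjects with $A \ge A_{(j)}$. Hence $U_{\mathrm{LR}} = \sum_i \{\bar Z_i - Z_{(i)}\}$, the familiar risk-set form. The sketch below checks this identity for a scalar covariate and distinct, hypothetical backward times. The function names are illustrative.

```python
def logrank_scores(n):
    """Scores c_i = sum_{j<=i} 1/(n-j+1) - 1 for i = 1..n."""
    scores, cum = [], 0.0
    for i in range(1, n + 1):
        cum += 1.0 / (n - i + 1)
        scores.append(cum - 1.0)
    return scores


def backward_logrank(a, z):
    """U_LR from distinct backward times `a` and scalar covariates `z`."""
    order = sorted(range(len(a)), key=lambda k: a[k])
    c = logrank_scores(len(a))
    return sum(z[k] * c[i] for i, k in enumerate(order))


def risk_set_form(a, z):
    """Same statistic as sum over ordered times of (risk-set mean of Z - Z_(i))."""
    order = sorted(range(len(a)), key=lambda k: a[k])
    zs = [z[k] for k in order]
    n = len(zs)
    return sum(sum(zs[i:]) / (n - i) - zs[i] for i in range(n))


a = [2.3, 0.4, 1.7, 3.1, 0.9]  # hypothetical backward recurrence times
z = [1, 0, 1, 1, 0]            # hypothetical group indicators
print(backward_logrank(a, z), risk_set_form(a, z))
```

Both functions give 1.35 for these data, up to floating-point rounding. The positive sign reflects longer backward times in the group with $z = 1$.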
Another important special case is a Wilcoxon statistic based on backward recurrence times. Suppose the error distribution belongs to the G-rho family with $\rho = 1/2$ (Harrington and Fleming, 1982), which has the survival function

$$S_\epsilon(w) = \left(1 + \tfrac{1}{2} e^{w}\right)^{-2}. \tag{2.3}$$

It follows that $\lambda_\epsilon(w) = e^w/(1 + e^w/2)$ and $F^*(w) = (e^w/2)/(1 + e^w/2)$, so $f^*$ is a shifted logistic density. Therefore, $\varphi(u) = 2u - 1$ and $c_i = 2i/(n+1) - 1$. The corresponding statistic is

$$U_{\mathrm{W}} = \sum_{i=1}^{n} Z_{(i)} \Big( \frac{2i}{n+1} - 1 \Big),$$

which is exactly the Wilcoxon statistic treating the backward recurrence times $A$ as if they were completely observed survival times in an unbiased sample. For an unbiased sample, the Wilcoxon statistic corresponds to a logistic error distribution. Here, in contrast, the construction requires an error distribution with a lighter tail. This is because the distribution of the backward recurrence time, called the stationary distribution in renewal theory (Cox, 1962), generally has a heavier tail than the survival distribution. One exception is the exponential survival distribution, whose stationary distribution is the same exponential distribution. This corresponds to an extreme-value error distribution, for which the log-rank test can be constructed from either unbiased survival data or backward recurrence times. Note that for a logistic error, the stationary distribution is undefined because $E\{\exp(\epsilon)\}$ is infinite.
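The claim $\varphi(u) = 2u - 1$ under (2.3) can be verified numerically. This minimal sketch assumes the error survival function $(1 + e^w/2)^{-2}$ in (2.3). It approximates the hazard by a central finite difference of $\log S_\epsilon$ and the stationary distribution function $F^*$ by a midpoint rule. It then compares $\lambda_\epsilon(w) - 1$ with $2F^*(w) - 1$ at a few points. The integration range and step count are arbitrary choices, and the function names are illustrative.

```python
import math


def surv_eps(w):
    # survival function (2.3) of the error: S(w) = (1 + e^w / 2)^(-2)
    return (1.0 + 0.5 * math.exp(w)) ** -2


def hazard_eps(w, h=1e-6):
    # hazard lambda(w) = -d log S(w) / dw, by a central finite difference
    return -(math.log(surv_eps(w + h)) - math.log(surv_eps(w - h))) / (2.0 * h)


def stationary_cdf(w, lo=-40.0, steps=100_000):
    # F*(w) = int_{-inf}^{w} e^v S(v) dv / E{exp(eps)}; midpoint rule, E{exp(eps)} = 2
    dv = (w - lo) / steps
    total = 0.0
    for k in range(steps):
        v = lo + (k + 0.5) * dv
        total += math.exp(v) * surv_eps(v)
    return total * dv / 2.0


for w in (-2.0, 0.0, 1.5):
    u = stationary_cdf(w)
    print(f"w={w:+.1f}: phi = {hazard_eps(w) - 1.0:.5f}, 2u - 1 = {2.0 * u - 1.0:.5f}")
```

Evaluating `stationary_cdf` at a large argument such as 40 returns approximately 1. This confirms the normalizing constant $E\{\exp(\epsilon)\} = 2$ used above.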
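The test statistics are asymptotically normal by the standard theory of linear rank statistics, provided that $0 < \int_0^1 \varphi^2(u)\, du < \infty$ (Hájek and others, 1967; Prentice, 1978). Following their arguments, a consistent variance estimate of $U_{\mathrm{LR}}$ is $\hat V_{\mathrm{LR}} = \sum_{i=1}^{n} (Z_i - \bar Z)^{\otimes 2}$, and a consistent variance estimate of $U_{\mathrm{W}}$ is $\hat V_{\mathrm{W}} = \frac{1}{3} \sum_{i=1}^{n} (Z_i - \bar Z)^{\otimes 2}$. Here $\bar Z = n^{-1} \sum_{i=1}^{n} Z_i$ and $a^{\otimes 2} = a a^\top$ for a column vector $a$. Therefore, chi-square test statistics can be constructed as $X_{\mathrm{LR}} = U_{\mathrm{LR}}^\top \hat V_{\mathrm{LR}}^{-1} U_{\mathrm{LR}}$ and $X_{\mathrm{W}} = U_{\mathrm{W}}^\top \hat V_{\mathrm{W}}^{-1} U_{\mathrm{W}}$, which are asymptotically $\chi^2_p$ distributed under the null hypothesis.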
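Here is a minimal sketch of both tests for a scalar covariate ($p = 1$). It uses the variance estimate $\sum_i (Z_i - \bar Z)^2$ for the log-rank statistic and one third of that for the Wilcoxon statistic. The chi-square tail with one degree of freedom is computed as $\operatorname{erfc}(\sqrt{x/2})$. The data and function names are hypothetical, and distinct backward times are assumed. With only five observations, the normal approximation is used purely for illustration.

```python
import math


def chi2_1_sf(x):
    """P(chi-square with 1 df > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def backward_rank_tests(a, z):
    """Log-rank and Wilcoxon chi-square tests from backward times only.

    Assumes a scalar covariate (p = 1) and distinct backward times `a`.
    """
    n = len(a)
    order = sorted(range(n), key=lambda k: a[k])
    zbar = sum(z) / n
    ss = sum((zi - zbar) ** 2 for zi in z)  # sum_i (Z_i - Zbar)^2

    u_lr = u_w = cum = 0.0
    for i, k in enumerate(order, start=1):
        cum += 1.0 / (n - i + 1)                  # sum_{j<=i} 1/(n-j+1)
        u_lr += z[k] * (cum - 1.0)                # log-rank score c_i
        u_w += z[k] * (2.0 * i / (n + 1) - 1.0)   # Wilcoxon score c_i

    x_lr = u_lr ** 2 / ss          # V_LR = sum (Z_i - Zbar)^2
    x_w = u_w ** 2 / (ss / 3.0)    # V_W = (1/3) sum (Z_i - Zbar)^2
    return {"X_LR": x_lr, "p_LR": chi2_1_sf(x_lr), "X_W": x_w, "p_W": chi2_1_sf(x_w)}


a = [2.3, 0.4, 1.7, 3.1, 0.9]  # hypothetical backward recurrence times
z = [1, 0, 1, 1, 0]            # hypothetical group indicators
print(backward_rank_tests(a, z))
```

For $p > 1$, the divisions become quadratic forms with the inverse of $\hat V$. A linear algebra library would then replace the scalar arithmetic.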
In the literature, there are a few prominent ways of choosing between the log-rank and Wilcoxon tests. One way is to specify the test statistic before the analysis, based on prior beliefs about the likely alternative hypothesis. When a proportional hazards alternative is assumed, the log-rank test is optimal. When the hazard ratio is assumed to decrease with time, the Wilcoxon test is usually more powerful. Another way is to choose the test statistic using graphical diagnostics (Hess, 1995), and a related pre-test was recently proposed by Martinez (2010). Note that the choice between the log-rank and Wilcoxon tests affects only power and is mainly useful in borderline cases. An advantage of rank-based statistics is their robustness against misspecification of the parametric error distribution. In fact, the log-rank statistic is often used in practice despite possible departures from the proportional hazards alternative, since it provides a conservative test in such cases.
In the above discussion, we constructed rank-based statistics from the backward recurrence times $A$ observed in a cross-sectional sample, as if they were completely observed survival times in an unbiased sample. The result seems paradoxical, but it can be understood as follows. Cross-sectional sampling preferentially selects longer survivors. In subgroups with better survival, individuals who have lived longer since disease incidence are more likely to be recruited. Therefore, the backward time $A$ also tends to be longer in those subgroups. Allison (1985) discussed fitting a proportional hazards model to backward recurrence times to estimate the relative hazard in the population survival model, under the strong assumption of exponentially distributed failure times. The log-rank statistic is the corresponding score statistic based on a partial likelihood function. For parametric accelerated failure time models, Yamaguchi (2003) and Keiding and others (2011) also showed that survival model parameters can be estimated from backward recurrence times. In particular, they showed that $\log A = \beta^\top Z + \epsilon^*$ for the same regression coefficient $\beta$ but a different error term $\epsilon^*$. This error term has the survival function $P(\epsilon^* > w) = \int_w^\infty \exp(v)\, S_\epsilon(v)\, dv / E\{\exp(\epsilon)\}$. This suggests that testing the null hypothesis $\beta = 0$ in the accelerated failure time model for backward recurrence times is equivalent to testing $H_0: \beta = 0$ in the target population. Therefore, linear rank statistics can also be constructed directly from backward recurrence times, and the resulting statistics coincide with ours.
3. Combined rank statistics for cross-sectional data with follow-up
When prospective follow-up is present, a right-censored version of the forward recurrence time $V$ is observed in addition to the backward recurrence time $A$. Denote the observed length-biased data with possible right censoring as $\{(A_i, \tilde V_i, \delta_i, Z_i), i = 1, \ldots, n\}$. Here $\tilde V_i = \min(V_i, C_i)$, $\delta_i = I(V_i \le C_i)$, and $C_i$ is the censoring time measured from recruitment. Suppose $C$ and $V$ are conditionally independent given $A$ and $Z$, and the censoring distribution does not involve any parameter of the survival distribution. Note that in length-biased sampling, the disease onset times and the survival times are independent in the population. The observed data likelihood is

$$L = \prod_{i=1}^{n} \frac{S(A_i \mid Z_i)}{\mu(Z_i)} \times \prod_{i=1}^{n} \frac{f_T(A_i + \tilde V_i \mid Z_i)^{\delta_i}\, S(A_i + \tilde V_i \mid Z_i)^{1 - \delta_i}}{S(A_i \mid Z_i)} = L_M \times L_C,$$

where $f_T(\cdot \mid Z)$ is the conditional density of $T$ given $Z$. Based on the conditional likelihood function $L_C$, linear rank statistics have been proposed by Ying (1990). Based on the marginal likelihood function $L_M$, linear rank statistics were proposed in the previous section. In this section, we discuss how to combine the two statistics. Note that the backward recurrence time $A$ and the forward recurrence time $V$ are correlated (Cox, 1962). Also, in cross-sectional survival data, informative censoring occurs because the survival time $T = A + V$ and the censoring time $A + C$ in a prevalent population share a common random component $A$. However, $T$ and $A + C$ are independent conditional on $A$, since $V$ and $C$ are independent conditional on $A$. Because of this conditional independence and the definition of risk sets for left truncated right censored data, Ying's test does not suffer from induced informative censoring. However, conditioning causes a loss of efficiency. We improve efficiency by using the information in the marginal distribution of $A$.
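We introduce the following counting process notation for representing Ying's statistics: $N_i(t) = \delta_i I(A_i + \tilde V_i \le t)$ and $Y_i(t) = I(A_i \le t \le A_i + \tilde V_i)$. When the error is extreme-value distributed, the locally most powerful test based on $L_C$ is the log-rank test

$$U_{\mathrm{F}} = \sum_{i=1}^{n} \int_0^{\tau} \{\bar Z(t) - Z_i\}\, dN_i(t), \qquad \bar Z(t) = \frac{\sum_{j=1}^{n} Y_j(t) Z_j}{\sum_{j=1}^{n} Y_j(t)},$$

where $\tau$ is a constant corresponding to the maximal observable survival time. The sign matches that of $U_{\mathrm{LR}}$ because, under an extreme-value error, the hazard of $T$ given $Z$ is proportional to $\exp(-\beta^\top Z)$. When the error term has the distribution (2.3), the locally most powerful test (Harrington and Fleming, 1982) is the G-rho test with $\rho = 1/2$,

$$U_{\mathrm{G}} = \sum_{i=1}^{n} \int_0^{\tau} \hat S(t-)^{1/2} \{\bar Z(t) - Z_i\}\, dN_i(t),$$

where $\hat S(t-)$ is the left-hand limit of the product-limit estimator for left truncated right censored data (Tsai and others, 1987). The two statistics have consistent variance estimates

$$\hat V_{\mathrm{F}} = \sum_{i=1}^{n} \int_0^{\tau} \frac{\sum_{j} Y_j(t) \{Z_j - \bar Z(t)\}^{\otimes 2}}{\sum_{j} Y_j(t)}\, dN_i(t) \quad \text{and} \quad \hat V_{\mathrm{G}} = \sum_{i=1}^{n} \int_0^{\tau} \hat S(t-) \frac{\sum_{j} Y_j(t) \{Z_j - \bar Z(t)\}^{\otimes 2}}{\sum_{j} Y_j(t)}\, dN_i(t),$$

respectively. Chi-square test statistics can be constructed as $X_{\mathrm{F}} = U_{\mathrm{F}}^\top \hat V_{\mathrm{F}}^{-1} U_{\mathrm{F}}$ and $X_{\mathrm{G}} = U_{\mathrm{G}}^\top \hat V_{\mathrm{G}}^{-1} U_{\mathrm{G}}$, which are asymptotically $\chi^2_p$ distributed under the null hypothesis.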
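Ying's log-rank score and its variance estimate can be sketched as follows for a scalar covariate. The sketch assumes no tied failure times and uses the sign convention $\{\bar Z(t) - Z_i\}$ above. The risk set at time $t$ consists of subjects with $A_j \le t \le A_j + \tilde V_j$, which is how left truncation enters. The toy data and the function name `ltrc_logrank` are hypothetical. A degenerate risk set with zero variance would need special handling that is not shown here.

```python
def ltrc_logrank(a, v_obs, delta, z):
    """Log-rank score U_F and variance V_F for left truncated right censored data.

    Scalar covariate. Subject i is at risk on [A_i, A_i + V~_i] on the survival
    time scale and fails at A_i + V~_i when delta_i = 1. Ties are not handled.
    """
    n = len(a)
    exit_time = [a[i] + v_obs[i] for i in range(n)]
    u_f = v_f = 0.0
    for i in range(n):
        if not delta[i]:
            continue
        t = exit_time[i]
        # risk set Y_j(t) = 1: already recruited (A_j <= t) and not yet exited
        risk = [j for j in range(n) if a[j] <= t <= exit_time[j]]
        zbar = sum(z[j] for j in risk) / len(risk)
        u_f += zbar - z[i]  # {Zbar(t) - Z_i} dN_i(t)
        v_f += sum((z[j] - zbar) ** 2 for j in risk) / len(risk)
    return u_f, v_f


a = [0.0, 0.5, 0.25, 0.125]      # hypothetical backward times
v_obs = [1.0, 1.5, 1.25, 2.875]  # hypothetical observed forward times
delta = [1, 1, 0, 1]             # 1 = death observed, 0 = censored
z = [1, 0, 1, 0]
u_f, v_f = ltrc_logrank(a, v_obs, delta, z)
print(u_f, v_f, u_f ** 2 / v_f)  # score, variance, chi-square statistic X_F
```

Only the first failure, at $t = 1$, has a risk set containing both groups. That failure alone therefore drives the score in this toy example.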
The two chi-square test statistics are constructed from $L_M$ and $L_C$, respectively. It is therefore intuitive that a combined test statistic should improve power whenever the two statistics are not perfectly correlated. In fact, as we show below, the two statistics are asymptotically independent. Power can then be greatly improved, as if we had two independent samples instead of one. In the following, we focus on combining the two log-rank statistics $U_{\mathrm{LR}}$ and $U_{\mathrm{F}}$ under the working assumption that the error is extreme-value distributed. The statistics $U_{\mathrm{W}}$ and $U_{\mathrm{G}}$ under the working error distribution (2.3) can be combined in a similar manner. We first need the correlation between $U_{\mathrm{LR}}$ and $U_{\mathrm{F}}$ under the null hypothesis. Note that $U_{\mathrm{LR}}$ is a function of $\{(A_i, Z_i)\}_{i=1}^{n}$ only. Also, $E\{U_{\mathrm{F}} \mid (A_i, Z_i), i = 1, \ldots, n\} = 0$, which is shown in Section C of Supplementary material available at Biostatistics online. Therefore, $E(U_{\mathrm{LR}} U_{\mathrm{F}}^\top) = E\{U_{\mathrm{LR}}\, E(U_{\mathrm{F}}^\top \mid A_1, Z_1, \ldots, A_n, Z_n)\} = 0$, so $U_{\mathrm{LR}}$ and $U_{\mathrm{F}}$ are uncorrelated. Moreover, the two statistics have a joint limiting Gaussian distribution and are therefore asymptotically independent. Following this result, we investigate several methods for combining the two log-rank statistics.
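The first method is inspired by the method for combining independent $2 \times 2$ tables in Mantel and Haenszel (1959). Under $H_0$, $U_{\mathrm{LR}} + U_{\mathrm{F}}$ is asymptotically normal with mean zero and variance $V_{\mathrm{LR}} + V_{\mathrm{F}}$. A Mantel-Haenszel (MH) statistic is defined as

$$X_{\mathrm{MH}} = (U_{\mathrm{LR}} + U_{\mathrm{F}})^\top (\hat V_{\mathrm{LR}} + \hat V_{\mathrm{F}})^{-1} (U_{\mathrm{LR}} + U_{\mathrm{F}}),$$

which is asymptotically $\chi^2_p$ distributed.

The second method is a modification of $X_{\mathrm{MH}}$. The covariance matrices can be written as $\hat V_{\mathrm{LR}} = \hat V_{\mathrm{LR}}^{1/2} \hat V_{\mathrm{LR}}^{1/2}$ and $\hat V_{\mathrm{F}} = \hat V_{\mathrm{F}}^{1/2} \hat V_{\mathrm{F}}^{1/2}$. Then $\hat V_{\mathrm{LR}}^{-1/2} U_{\mathrm{LR}}$ and $\hat V_{\mathrm{F}}^{-1/2} U_{\mathrm{F}}$ are asymptotically normal with mean zero and variance $I_p$, the identity matrix. A modified Mantel-Haenszel (MMH) statistic is defined as

$$X_{\mathrm{MMH}} = \tfrac{1}{2} \big(\hat V_{\mathrm{LR}}^{-1/2} U_{\mathrm{LR}} + \hat V_{\mathrm{F}}^{-1/2} U_{\mathrm{F}}\big)^\top \big(\hat V_{\mathrm{LR}}^{-1/2} U_{\mathrm{LR}} + \hat V_{\mathrm{F}}^{-1/2} U_{\mathrm{F}}\big),$$

which is asymptotically $\chi^2_p$ distributed.

The third method is a sum of chi-square test (Bhattacharya, 1961), defined as $X_{\mathrm{SUM}} = X_{\mathrm{LR}} + X_{\mathrm{F}}$, which is asymptotically $\chi^2_{2p}$ distributed.

The fourth method is Fisher's inverse chi-square test (Fisher, 1932, pp. 99–101), defined as $X_{\mathrm{Fisher}} = -2 \log P_{\mathrm{LR}} - 2 \log P_{\mathrm{F}}$, where $P_{\mathrm{LR}}$ and $P_{\mathrm{F}}$ are the $p$-values based on $X_{\mathrm{LR}}$ and $X_{\mathrm{F}}$, respectively. Fisher's statistic is asymptotically $\chi^2_4$ distributed, regardless of the dimension of $Z$. This is because the $p$-values are asymptotically uniformly distributed under the null hypothesis, so each of the two terms in $X_{\mathrm{Fisher}}$ is $\chi^2_2$ distributed.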
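The four combinations can be sketched for a scalar covariate. In that case the matrix square roots reduce to ordinary square roots, and the chi-square tails with 1, 2 and 4 degrees of freedom have closed forms. The sketch assumes the backward and follow-up scores and their variance estimates have already been computed. The inputs below are hypothetical numbers. For $p > 1$, a linear algebra library would be needed for the matrix inverse and square roots.

```python
import math


def chi2_sf(x, df):
    """Upper tail of chi-square for df = 1, 2 or 4 (closed forms, enough for p = 1)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("only df in {1, 2, 4} is implemented in this sketch")


def combine(u_lr, v_lr, u_f, v_f):
    """Combine independent backward (LR) and follow-up (F) scores, scalar covariate."""
    x_lr, x_f = u_lr ** 2 / v_lr, u_f ** 2 / v_f
    x_mh = (u_lr + u_f) ** 2 / (v_lr + v_f)
    x_mmh = (u_lr / math.sqrt(v_lr) + u_f / math.sqrt(v_f)) ** 2 / 2.0
    x_sum = x_lr + x_f
    x_fisher = -2.0 * math.log(chi2_sf(x_lr, 1)) - 2.0 * math.log(chi2_sf(x_f, 1))
    return {
        "MH": (x_mh, chi2_sf(x_mh, 1)),
        "MMH": (x_mmh, chi2_sf(x_mmh, 1)),
        "SUM": (x_sum, chi2_sf(x_sum, 2)),
        "Fisher": (x_fisher, chi2_sf(x_fisher, 4)),
    }


for name, (stat, p) in combine(2.0, 1.0, 1.0, 1.0).items():
    print(f"{name:6s} X = {stat:.3f}  p = {p:.4f}")
```

Fisher's statistic takes logarithms of $p$-values. In practice, a $p$-value that underflows to zero would therefore need to be guarded against.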
Other procedures have been proposed for combining independent chi-square tests; see, for example, Marden (1982). We considered the above four procedures for the following reasons. The Mantel-Haenszel test has been shown to outperform other procedures for combining $2 \times 2$ tables when all tables show similar departures from the null hypothesis (Louv and Littell, 1986). In our case, the association parameter $\beta$ is the same for the tests based on backward recurrence time and on follow-up time, suggesting that Mantel-Haenszel type procedures should have decent power. On the other hand, the sum of chi-square and Fisher's tests have been shown to be admissible (Marden, 1982) and have reasonable power against a wide range of alternatives (Louv and Littell, 1986).
It is well known that the log-rank statistic has optimal power under proportional hazards alternatives. For decreasing hazard ratios, the Wilcoxon test or the G-rho family is more powerful. For right-censored data, Kosorok (1999) derived a versatile maximum test of weighted log-rank statistics, which is powerful against a wide range of alternatives. We stress that our main contribution differs from the broad class of tests proposed for right-censored data in the following ways. First, we combine asymptotically independent test statistics from two sources, $L_M$ and $L_C$, using the same set of data. Maximum tests, in contrast, combine dependent test statistics from one source ($L_C$ only). When dependent test statistics are combined, the gain in power is usually limited. We show by simulation that the gain in power can be substantial here because we combine independent sources of information. Second, the Mantel-Haenszel test, sum test and Fisher's inverse test considered in this section are designed for combining asymptotically independent test statistics. The null distribution would be much more complex if the statistics were correlated. In fact, the maximum test requires extensive simulations to compute the null distribution (Kosorok, 1999). On the other hand, our proposed Fisher's inverse test can be readily extended to combine two maximum tests, one for the backward time and the other for follow-up data, since the information from the two sources is asymptotically independent.
4. Simulation studies
We performed simulation studies to evaluate the performance of the proposed test statistics. In each case, independent data sets were generated 10 000 times under the null hypothesis and 1000 times under the alternative hypotheses. First, we considered a two-sample testing scenario in which $Z$ was a Bernoulli variable. The survival time $T$ was generated by model (1.1) with $\epsilon = \log E$, where $E$ was standard exponential distributed; that is, $\epsilon$ is extreme-value distributed. To show that the tests are robust against misspecification of the parametric error distribution, we also considered a logistic or a normal error $\epsilon$ with the same mean and variance as the above extreme-value distribution. To obtain a length-biased sample, we generated random truncation times $W$, and an observation was included in the cross-sectional sample if $T > W$. Data were generated until the cross-sectional sample reached the target sample size; two sample sizes were considered. The survival endpoint was censored if an individual in the prevalent cohort survived past a fixed number of time units after recruitment. We compared the proposed procedures with two existing methods. The proposed procedures are the test based only on backward recurrence times without follow-up and the four combined test statistics. The existing methods are the linear rank statistic of Ying (1990) for left truncated right censored data (LTRC) and the modified log-rank statistic of Ning and others (2010), denoted by NQS. A 5% significance level was used. Table 1 summarizes the results. All tests had an empirical type I error close to the nominal value, even for a small sample size and misspecified error distributions. Under alternative hypotheses, incorporating information from backward recurrence times substantially improved power. In fact, the test based on backward recurrence times alone had higher power than the conventional log-rank test for left truncated and right censored data. All combined tests had higher power than the conventional log-rank test or the test based only on backward recurrence times. The combined tests also generally had higher power than the test of Ning and others (2010), which also accounts for the length-biased data structure but uses the information in a different way. Their test requires nonparametric estimation of a pooled survival distribution, which may compromise power under alternative hypotheses; the proposed tests do not require such estimation. Among the proposed tests, the Mantel-Haenszel test and its modification generally had higher power than the sum of chi-square and Fisher's tests. The modified Mantel-Haenszel test performed slightly better than the Mantel-Haenszel test.
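A data-generating sketch in the spirit of the first scenario is given below. All numerical settings are hypothetical choices for illustration and are not the values used to produce Table 1. These settings are a success probability of 0.5 for $Z$, uniform truncation times on $[0, 20]$, one time unit of follow-up, $\beta = \log 1.5$, and a sample size of 200. The function name is also illustrative.

```python
import math
import random


def simulate_cross_section(n, beta, window, follow_up, rng):
    """Two-sample length-biased sample with prospective follow-up (hypothetical settings).

    Z ~ Bernoulli(0.5) and T = exp(beta * Z + eps) with eps = log(E), E ~ Exp(1),
    i.e. model (1.1) with an extreme-value error. Truncation times W ~ Uniform(0, window)
    play the role of backward times; a subject enters the sample only if T > W, and
    its residual life V = T - W is censored after `follow_up` time units.
    Returns (A, V_obs, delta, Z) lists.
    """
    a, v_obs, delta, z = [], [], [], []
    while len(a) < n:
        zi = 1 if rng.random() < 0.5 else 0
        t = math.exp(beta * zi) * rng.expovariate(1.0)
        w = rng.uniform(0.0, window)
        if t <= w:  # died before the survey: never observed
            continue
        v = t - w
        a.append(w)
        v_obs.append(min(v, follow_up))
        delta.append(1 if v <= follow_up else 0)
        z.append(zi)
    return a, v_obs, delta, z


rng = random.Random(7)
a, v_obs, delta, z = simulate_cross_section(200, math.log(1.5), 20.0, 1.0, rng)
print(len(a), sum(delta) / len(delta))  # sample size and proportion of observed deaths
```

Because the truncation window is long relative to typical survival times, the sampled backward times follow the stationary (length-biased) distribution to a close approximation.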
Table 1.
Percentage of hypotheses rejected by the existing and proposed tests: Two-sample testing. LTRC: conventional log-rank statistic for left truncated right censored data, NQS: modified log-rank statistic of Ning and others (2010), backward: proposed log-rank statistic based on backward recurrence time, MH: proposed Mantel-Haenszel statistic, MMH: proposed modified Mantel-Haenszel statistic, SUM: proposed sum of Chi-square statistic, Fisher: proposed Fisher's inverse Chi-square statistic
| Test | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| (a) Extreme value distribution | | | | | | | | | | |
| LTRC | 5 | 11 | 29 | 55 | 71 | 5 | 21 | 61 | 85 | 96 |
| NQS | 5 | 16 | 53 | 86 | 98 | 5 | 30 | 86 | 100 | 100 |
| Backward | 5 | 15 | 42 | 73 | 94 | 5 | 26 | 72 | 96 | 100 |
| MH | 5 | 21 | 59 | 89 | 98 | 6 | 39 | 90 | 100 | 100 |
| MMH | 5 | 23 | 62 | 91 | 99 | 6 | 41 | 93 | 100 | 100 |
| SUM | 5 | 18 | 55 | 86 | 97 | 5 | 32 | 88 | 100 | 100 |
| Fisher | 5 | 18 | 54 | 87 | 98 | 5 | 32 | 88 | 100 | 100 |
| (b) Logistic distribution | | | | | | | | | | |
| LTRC | 5 | 11 | 31 | 54 | 78 | 5 | 17 | 53 | 85 | 97 |
| NQS | 5 | 14 | 42 | 77 | 95 | 5 | 24 | 77 | 97 | 100 |
| Backward | 5 | 11 | 33 | 58 | 80 | 5 | 20 | 58 | 86 | 99 |
| MH | 5 | 19 | 48 | 82 | 96 | 6 | 30 | 81 | 98 | 100 |
| MMH | 5 | 15 | 53 | 87 | 98 | 6 | 33 | 85 | 99 | 100 |
| SUM | 5 | 15 | 44 | 80 | 95 | 5 | 25 | 78 | 97 | 100 |
| Fisher | 5 | 15 | 44 | 79 | 96 | 5 | 25 | 79 | 97 | 100 |
| (c) Normal distribution | | | | | | | | | | |
| LTRC | 5 | 11 | 30 | 59 | 77 | 5 | 20 | 57 | 85 | 96 |
| NQS | 5 | 15 | 45 | 79 | 94 | 5 | 25 | 79 | 98 | 100 |
| Backward | 6 | 13 | 33 | 60 | 81 | 5 | 21 | 60 | 91 | 99 |
| MH | 5 | 18 | 50 | 84 | 96 | 6 | 33 | 84 | 99 | 100 |
| MMH | 5 | 18 | 54 | 87 | 98 | 6 | 34 | 87 | 99 | 100 |
| SUM | 5 | 14 | 45 | 80 | 95 | 5 | 27 | 80 | 98 | 100 |
| Fisher | 5 | 15 | 46 | 80 | 95 | 5 | 27 | 80 | 99 | 100 |
Next, we considered a scenario in which $Z$ was bivariate and contained a continuous variable. Suppose $Z = (Z_1, Z_2)^\top$, where $Z_1$ is a Bernoulli variable and $Z_2$ is standard uniform distributed. The modified log-rank test of Ning and others (2010) is applicable only to a univariate categorical $Z$ and was not considered in this scenario. We considered alternative hypotheses that do not follow model (1.1), as well as two scenarios of non-stationary incidence. For stationary incidence, we generated random truncation times $W$, and an observation was included in the cross-sectional sample if $T > W$. For covariate-independent non-stationary disease incidence, $W$ was generated from a distribution that gives an increasing disease incidence over calendar time. For covariate-dependent non-stationary disease incidence, the distribution of $W$ depended on $Z$. The survival time $T$ was generated from a model that does not belong to the accelerated failure time class (1.1). The sampling of cross-sectional data and the censoring mechanism were the same as in the previous example. Results are shown in Table 2. The major conclusions were similar to those of the previous example. The proposed methods greatly improved power compared with the existing log-rank test, even when the alternative distributions do not follow an accelerated failure time model. The Mantel-Haenszel test and its modification continued to outperform the other tests, and the difference in power was greater in this scenario. The simulations also showed that the type I error remains close to the nominal value under mild departures from the stationary disease incidence assumption.
Table 2.
Percentage of hypotheses rejected by the existing and proposed tests: Bivariate covariates and a misspecified alternative model. Test procedures are the same as in Table 1
| Test | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| (a) Stationary incidence | | | | | | | | |
| LTRC | 4 | 17 | 8 | 18 | 4 | 38 | 18 | 43 |
| Backward | 5 | 45 | 19 | 61 | 5 | 82 | 39 | 92 |
| MH | 5 | 64 | 25 | 76 | 5 | 93 | 54 | 97 |
| MMH | 5 | 62 | 23 | 70 | 5 | 91 | 53 | 95 |
| SUM | 4 | 45 | 18 | 61 | 4 | 86 | 41 | 93 |
| Fisher | 4 | 45 | 18 | 61 | 4 | 86 | 41 | 93 |
| (b) Non-stationary, covariate-independent incidence | | | | | | | | |
| LTRC | 4 | 21 | 10 | 20 | 4 | 41 | 16 | 46 |
| Backward | 5 | 42 | 17 | 53 | 5 | 79 | 33 | 89 |
| MH | 4 | 64 | 25 | 71 | 5 | 93 | 47 | 97 |
| MMH | 4 | 61 | 23 | 66 | 4 | 92 | 46 | 96 |
| SUM | 4 | 48 | 16 | 54 | 5 | 85 | 36 | 92 |
| Fisher | 4 | 48 | 16 | 54 | 5 | 85 | 36 | 92 |
| (c) Non-stationary, covariate-dependent incidence | | | | | | | | |
| LTRC | 5 | 23 | 11 | 26 | 5 | 48 | 21 | 50 |
| Backward | 5 | 59 | 23 | 74 | 5 | 90 | 44 | 98 |
| MH | 6 | 72 | 32 | 85 | 6 | 96 | 56 | 100 |
| MMH | 5 | 69 | 29 | 81 | 5 | 96 | 54 | 100 |
| SUM | 5 | 60 | 23 | 77 | 5 | 94 | 48 | 99 |
| Fisher | 5 | 60 | 23 | 77 | 5 | 94 | 48 | 99 |
5. Application to a health survey without follow-up
Partial survival information in the form of backward recurrence times is frequently collected in health surveys. Most surveys are administered cross-sectionally without follow-up, for cost and logistical reasons. Without any prospective follow-up, all collected survival times are censored, and conventional statistical methods are not applicable. As a result, this partial survival information is seldom analyzed. An exception is McLaughlin and others (2010), who examined the associations between childhood adversities and the durations of adult mental disorders. They used backward recurrence times collected from a nationally representative sample in the National Comorbidity Survey Replication.
To illustrate the proposed test statistics, we analyzed a different survival outcome collected in the same survey: the duration between suicidal thoughts. Although suicidal thoughts are recurrent events, cross-sectional surveys like this one usually record only the most recent occurrence. We therefore only have the time since the last event, which is exactly the setup of Section 2. The survey was administered in 2001–2002 and collected the time of the last suicidal thoughts from 1010 respondents aged 18 to 91. We examined whether certain childhood experiences are associated with the duration between suicidal thoughts among adults. We first examined whether this duration was associated with living with both biological parents until age 16. The proposed log-rank statistic was 9.40 with a $\chi^2_1$ null distribution, which gives a $p$-value of 0.002. The proposed Wilcoxon statistic was 13.10, with a $p$-value of 0.0003. As mentioned in Section 2, the log-rank statistic is conservative under departures from the proportional hazards alternative, but here its result agrees with the Wilcoxon statistic. A direct comparison of backward recurrence times revealed the same pattern. The mean backward recurrence time among individuals who lived with both parents until age 16 was 2.6 years (95% CI: 1.0–4.1) longer than among individuals who did not. We also examined the associations between the duration between suicidal thoughts and three childhood adversities: parental substance abuse, parental criminality, and family violence. The proposed log-rank statistics were 5.0, 12.1, and 0.6 for parental substance abuse, parental criminality, and family violence, respectively, yielding $p$-values of 0.025, 0.001, and 0.431. At the 5% significance level, the data therefore suggested that parental substance abuse and parental criminality were associated with the duration between suicidal thoughts. The data did not suggest any association between family violence and this duration. We reiterate that the tests of Ying (1990) and Ning and others (2010) are not applicable to these data because there was no prospective follow-up.
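The reported $p$-values follow from the $\chi^2_1$ tail $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. The small sketch below recomputes them from the published statistics. The statistics are reported rounded (for example, 0.6 for family violence), so recomputed values can differ slightly from the reported $p$-values.

```python
import math


def chi2_1_pvalue(x):
    """P(chi-square_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


reported = {
    "log-rank, both parents": 9.40,
    "Wilcoxon, both parents": 13.10,
    "log-rank, parental substance abuse": 5.0,
    "log-rank, parental criminality": 12.1,
    "log-rank, family violence": 0.6,
}
for label, stat in reported.items():
    print(f"{label:36s} X = {stat:5.2f}  p = {chi2_1_pvalue(stat):.4f}")
```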
6. Application to a prospective study of Alzheimer's disease
Next, we applied the proposed tests to the Canadian Study of Health and Aging (Wolfson and others, 2001). The same example was used to study the performance of the modified log-rank test of Ning and others (2010). A prevalent sample of individuals suffering from dementia was collected in 1991. The date of dementia onset was ascertained from medical history, and a prospective follow-up was conducted between 1991 and 1996. It is well established that the sample is subject to length-biased sampling; see, for example, Shen and others (2009) and Ning and others (2010). Subjects were classified into one of three diagnostic categories: probable Alzheimer's disease, possible Alzheimer's disease, and vascular dementia. We investigated whether the time from onset to death was associated with diagnostic category. The available data included 818 subjects with dementia, of whom 393 had probable Alzheimer's disease, 252 had possible Alzheimer's disease, and 173 had vascular dementia. We considered pairwise comparisons and an overall comparison among the three groups. The p-values from the different tests are summarized in Table 3. As noted in Ning and others (2010), the conventional log-rank test for left-truncated right-censored (LTRC) data did not reject the null hypothesis of equal survival distributions for any pair of the three groups. However, using the proposed Mantel-Haenszel tests, statistically significant differences at the 5% level were found between vascular dementia and possible Alzheimer's disease, between probable and possible Alzheimer's disease, and in the three-sample test. The p-values were slightly higher for the sum test and Fisher's test. For the three-sample test, the test of Ning and others (2010) gave a marginally significant result at the 5% level, whereas the proposed Mantel-Haenszel tests gave a lower p-value of 0.02. Interestingly, the same conclusion could already be reached at baseline recruitment, before any follow-up began, by using the log-rank test based on backward recurrence times.
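The sum test and Fisher's test compared above are standard rules for combining independent tests, which is what makes them usable here: the backward-time statistic and the follow-up statistic are asymptotically independent. A minimal sketch, assuming each input is a chi-square statistic (or its p-value) computed separately from the two sources: the sum test refers the total of m statistics with k degrees of freedom each to a chi-square distribution with mk degrees of freedom, and Fisher's method refers -2 Σ log p_j to a chi-square distribution with 2m degrees of freedom. The chi-square tail is evaluated in closed form, valid only for even degrees of freedom (which covers two 1-df or two 2-df statistics). The function `mh_pooled` is a hypothetical pooling of two score statistics U_j with variances V_j, in the style of a stratified log-rank test, included only to illustrate the idea of a Mantel-Haenszel combination; its form is an assumption and not necessarily the (modified) Mantel-Haenszel statistic used in the paper.

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # term is now half**i / i!
        total += term
    return math.exp(-half) * total


def sum_test(stats, df_each):
    """Sum test: add independent chi-square statistics and their df."""
    total = sum(stats)
    return total, chi2_sf_even(total, df_each * len(stats))


def fisher_test(pvalues):
    """Fisher's method: -2 * sum(log p) ~ chi-square(2m) under the null."""
    total = -2.0 * sum(math.log(p) for p in pvalues)
    return total, chi2_sf_even(total, 2 * len(pvalues))


def mh_pooled(scores, variances):
    """HYPOTHETICAL pooled score test (assumed form, two-sample case):
    (sum of scores)**2 / (sum of variances), referred to chi-square(1)."""
    stat = sum(scores) ** 2 / sum(variances)
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For example, `sum_test([2.0, 2.0], df_each=1)` returns a total of 4.0 with p-value exp(-2), about 0.135, and `fisher_test([0.05, 0.05])` returns a p-value of about 0.0175.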
Table 3. p-values (multiplied by 100) for the Canadian Study of Health and Aging data. Test procedures are the same as in Table 1. (a) Vascular vs probable, (b) vascular vs possible, (c) probable vs possible, and (d) three-sample test.

| Test | (a) | (b) | (c) | (d) |
|---|---|---|---|---|
| LTRC | 42 | 33 | 71 | 54 |
| NQS | 35 | 2 | 4 | 5 |
| Backward | 45 | 1 | 2 | 1 |
| MH | 28 | 1 | 4 | 2 |
| MMH | 28 | 2 | 5 | 2 |
| SUM | 56 | 3 | 5 | 5 |
| Fisher | 52 | 1 | 6 | 5 |
To further understand the power to detect a difference in the three-sample test, we ran a simulation study with parameter values estimated from the data. We fit the parametric survival model (1.1) with an extreme-value distributed error. We estimated that
where
is the relative survival time comparing subjects with possible Alzheimer's disease with subjects with probable Alzheimer's disease, and
is the relative survival time comparing subjects with vascular dementia with subjects with probable Alzheimer's disease. The error distribution is
where
is an exponential distribution with an estimated mean of 3.82 years. The result is comparable to that of a semiparametric analysis based on log-rank estimating equations, which estimates
. It is estimated that
and
of cases had probable Alzheimer's disease, possible Alzheimer's disease, and vascular dementia, respectively, using the bias-corrected estimator of Chan and Wang (2012). We simulated 1000 independent data sets using these parameters, each containing a cross-sectional sample of 818 subjects. The generation of cross-sectional samples is described in Section 4. The residual censoring time from recruitment was generated from a uniform
distribution, consistent with the observed data. We found that the powers of the sum-of-chi-squares test and Fisher's test were both 76%, whereas the powers of the Mantel-Haenszel test and the modified Mantel-Haenszel test were both 86%. Therefore, the Mantel-Haenszel tests have a lower Type II error rate and are more likely to reject the null hypothesis when the alternative is true.
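The generation scheme of Section 4 is not reproduced in this section. As a rough sketch only, under two assumptions: (i) the stationarity condition usual for length-biased data, under which the backward recurrence time is a uniform fraction of the length-biased survival time, and (ii) exponential survival within each group, consistent with the exponential error described above. Two standard facts then give a direct sampler: the length-biased version of an exponential distribution with mean μ is a gamma distribution with shape 2 and scale μ, and a group with incidence proportion π_k and mean μ_k appears in the prevalent sample with probability proportional to π_k μ_k. Treating 3.82 years as the reference-group mean is itself an assumption about how model (1.1) is parametrized. The function name and every other value in the usage example (relative survival times, incidence proportions, and the censoring upper limit `c_max`) are hypothetical placeholders, not the estimates from the data.

```python
import random
from dataclasses import dataclass


@dataclass
class Subject:
    group: int        # group label k
    backward: float   # backward recurrence time A (onset to recruitment)
    forward: float    # observed forward time min(V, C) after recruitment
    died: bool        # True if death occurred before residual censoring


def simulate_length_biased(n, base_mean, rel_times, incidence_props, c_max, rng):
    """Generate n length-biased cross-sectional subjects (sketch).

    ASSUMPTION: survival in group k is exponential with mean
    base_mean * rel_times[k], and the stationarity condition holds.
    """
    means = [base_mean * r for r in rel_times]
    # Prevalent-sample group weights are proportional to pi_k * mu_k.
    weights = [p * m for p, m in zip(incidence_props, means)]
    groups = list(range(len(means)))
    sample = []
    for _ in range(n):
        k = rng.choices(groups, weights=weights, k=1)[0]
        # Length-biased exponential(mean mu) is Gamma(shape 2, scale mu).
        t = rng.gammavariate(2.0, means[k])
        a = rng.random() * t         # backward time: uniform fraction of T
        v = t - a                    # residual (forward) survival time
        c = rng.uniform(0.0, c_max)  # residual censoring time from recruitment
        sample.append(Subject(k, a, min(v, c), v <= c))
    return sample


# Hypothetical parameters for illustration only.
rng = random.Random(2024)
data = simulate_length_biased(
    n=818, base_mean=3.82, rel_times=[1.0, 0.8, 0.7],
    incidence_props=[0.5, 0.3, 0.2], c_max=4.0, rng=rng,
)
```

Sampling the group and the length-biased time directly avoids the rejection step of drawing incident cases and keeping those still alive at the sampling date; both routes target the same distribution under the stated assumptions.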
7. Concluding remarks
A linear rank statistic is proposed for length-biased cross-sectional survival data without follow-up, based on backward recurrence times. Existing test statistics cannot be applied in this case because all subjects are essentially censored. Interestingly, the log-rank and Wilcoxon special cases treat the backward recurrence time as if it were a completely observed survival time. When prospective follow-up is present, the proposed rank statistic without follow-up can be combined with a conventional rank statistic for cross-sectional data with follow-up. Another interesting observation is that the two rank statistics based on the same sample are asymptotically independent. This property facilitates combining the two statistics, and we explored four methods for doing so. Combining two independent statistics can substantially improve power. Based on the simulation studies and the data analysis, the proposed Mantel-Haenszel tests have larger power, and we recommend their use in practice.
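The remark that the log-rank special case treats each backward recurrence time as a completely observed survival time can be illustrated with the ordinary two-sample log-rank statistic in which every time is an event and nothing is censored. A minimal sketch, assuming the standard hypergeometric variance with ties handled at each distinct time; the variance derived in the paper under length-biased sampling is not reproduced here and may differ. The function name `logrank_uncensored` is hypothetical.

```python
from collections import Counter


def logrank_uncensored(times_1, times_2):
    """Two-sample log-rank chi-square statistic with no censoring.

    Every value (here, a backward recurrence time) is treated as an observed
    event. Returns (O1 - E1, variance, chi-square statistic).
    """
    d1 = Counter(times_1)                        # events in group 1 per time
    d_all = Counter(times_1) + Counter(times_2)  # events in both groups
    n1, n = len(times_1), len(times_1) + len(times_2)  # numbers at risk
    o_minus_e, var = 0.0, 0.0
    for t in sorted(d_all):
        d, d_1 = d_all[t], d1[t]
        o_minus_e += d_1 - d * n1 / n
        if n > 1:  # hypergeometric variance contribution at time t
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        n1 -= d_1  # subjects leave the risk set
        n -= d     # once their event occurs
    stat = o_minus_e ** 2 / var if var > 0 else 0.0
    return o_minus_e, var, stat
```

With no censoring the risk set at each distinct time is simply everyone whose backward time is at least that large, which is why the statistic reduces to a rank statistic on the pooled backward times.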
The test of Ning and others (2010) was designed for cross-sectional data with prospective follow-up and is not asymptotically equivalent to our proposed method. In a cross-sectional sample without prospective follow-up, the test of Ning and others (2010) has infinite variability, whereas the proposed method still works. When the forward follow-up is much shorter than the backward recurrence time, which happens in practice because follow-up data are typically costly to collect, the proposed test is expected to outperform its existing competitors. Our test statistics are also applicable when follow-up time is observed only in a subsample. Existing methods would generally discard the backward information from subjects without forward follow-up, whereas the proposed method can use the backward information from the full sample.
Funding
K.C.G.C. is partially supported by grant R01 HL-122212 from the National Institutes of Health.
Supplementary Material
Acknowledgement
The authors thank Prof. Anastasios Tsiatis, an associate editor, and two reviewers for their helpful comments and suggestions, which greatly improved this paper. The authors thank Prof. Masoud Asgharian for sharing the Canadian Study of Health and Aging data. Conflict of Interest: None declared.
References
- Allison P. D. (1985). Survival analysis of backward recurrence times. Journal of the American Statistical Association 80(390), 315–322.
- Bhattacharya N. (1961). Sampling experiments on the combination of independent χ² tests. Sankhya 11, 191–196.
- Chan K. C. G., Chen Y. Q., Di C.-Z. (2012). Proportional mean residual life model for right-censored length-biased data. Biometrika 99(4), 995–1000.
- Chan K. C. G., Wang M.-C. (2012). Estimating incident population distribution from prevalent data. Biometrics 68(2), 521–531.
- Cox D. R. (1962). Renewal Theory. London: Methuen.
- Fisher R. A. (1932). Statistical Methods for Research Workers. London: Oliver and Boyd.
- Hájek J. (1967). Theory of Rank Tests. New York: Academic Press.
- Harrington D. P., Fleming T. R. (1982). A class of rank test procedures for censored survival data. Biometrika 69(3), 553–566.
- Hess K. R. (1995). Graphical methods for assessing violations of the proportional hazards assumption in Cox regression. Statistics in Medicine 14(15), 1707–1723.
- Kalbfleisch J. D., Prentice R. L. (1973). Marginal likelihoods based on Cox's regression and life model. Biometrika 60(2), 267–278.
- Kalbfleisch J. D., Prentice R. L. (2002). The Statistical Analysis of Failure Time Data. New York, NY: John Wiley & Sons.
- Keiding N., Fine J. P., Hansen O. H., Slama R. (2011). Accelerated failure time regression for backward recurrence times and current durations. Statistics & Probability Letters 81(7), 724–729.
- Kosorok M. R., Lin C.-Y. (1999). The versatility of function-indexed weighted log-rank statistics. Journal of the American Statistical Association 94(445), 320–332.
- Louv W. C., Littell R. C. (1986). Combining one-sided binomial tests. Journal of the American Statistical Association 81(394), 550–554.
- Mantel N., Haenszel W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22(4), 719–748.
- Marden J. I. (1982). Combining independent noncentral chi squared or F tests. The Annals of Statistics 10(1), 266–277.
- Martinez R. L. M. C., Naranjo J. D. (2010). A pretest for choosing between logrank and Wilcoxon tests in the two-sample problem. Metron 68(2), 111–125.
- McLaughlin K., Green J. G., Gruber M. J., Sampson N. A., Zaslavsky A. M., Kessler R. C. (2010). Childhood adversities and adult psychiatric disorders in the national comorbidity survey replication II: associations with persistence of DSM-IV disorders. Archives of General Psychiatry 67(2), 124–132.
- Ning J., Qin J., Shen Y. (2010). Non-parametric tests for right-censored data with biased sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72(5), 609–630.
- Ning J., Qin J., Shen Y. (2011). Buckley–James-type estimator with right-censored and length-biased data. Biometrics 67(4), 1369–1378.
- Prentice R. L. (1978). Linear rank tests with right censored data. Biometrika 65(1), 167–179.
- Shen Y., Ning J., Qin J. (2009). Analyzing length-biased data with semiparametric transformation and accelerated failure time models. Journal of the American Statistical Association 104(487), 1192–1202.
- Tsai W.-Y., Jewell N. P., Wang M.-C. (1987). A note on the product-limit estimator under right censoring and left truncation. Biometrika 74(4), 883–886.
- Vardi Y. (1989). Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika 76(4), 751–761.
- Wang M.-C. (1991). Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association 86, 130–143.
- Wolfson C., Wolfson D. B., Asgharian M., M’Lan C. E. (2001). A reevaluation of the duration of survival after the onset of dementia. New England Journal of Medicine 344(15), 1111–1116.
- Yamaguchi K. (2003). Accelerated failure-time mover-stayer regression models for the analysis of last-episode data. Sociological Methodology 33(1), 81–110.
- Ying Z. (1990). Linear rank statistics for truncated data. Biometrika 77(4), 909–914.