SUMMARY
Double censoring often occurs in registry studies when left censoring is present in addition to right censoring. In this work, we propose a new analysis strategy for such doubly censored data by adopting a quantile regression model. We develop computationally simple estimation and inference procedures by appropriately using the embedded martingale structure. Asymptotic properties, including uniform consistency and weak convergence, are established for the resulting estimators. Moreover, we propose conditional inference to address the special identifiability issues attached to the double censoring setting. We further show that the proposed method can be readily adapted to handle left truncation. Simulation studies demonstrate good finite-sample performance of the new inferential procedures. The practical utility of our method is illustrated by an analysis of the onset of the most commonly investigated respiratory infection, Pseudomonas aeruginosa, in children with cystic fibrosis through the use of the US Cystic Fibrosis Registry.
Keywords: Conditional inference, Double censoring, Empirical process, Martingale, Regression quantile, Truncation
1. Introduction
Double censoring often occurs in biomedical research when an outcome of interest is subject to both left censoring and right censoring. For example, in patients with cystic fibrosis (CF), the onset of Pseudomonas aeruginosa (PA) is monitored as an important landmark event for lung disease progression and was shown to be associated with reduced survival (Kerem et al., 1992, Emerson et al., 2002). However, in the US Cystic Fibrosis Foundation Patient Registry (CFFPR), which documents the diagnosis and annual follow-ups of all known CF patients and is described in detail elsewhere (FitzSimmons, 1993), ages at the first PA infection were not known for patients who had already been infected with PA and thus were positive at the first documented CFFPR visit, nor for those who had no detected or reported PA infection by the end of follow-up. This poses a double censoring scenario, where, with the time origin set as birth, time to first PA infection was subject to left censoring by time to registry entry, which was always recorded, and also right censoring by time to last follow-up visit, which was not always known in advance due to the occurrence of random dropout. Such a double censoring setting frequently arises in observational studies and is the focus of this paper. Note that this setting differs from the double censoring problem studied by Sun (2006), which concerned the gap time between two related events subject to interval censoring.
The simultaneous presence of left censoring and right censoring can bring many complexities to the analysis of doubly censored data. For example, the estimator of the survival distribution function in the one-sample case is generally presented as the solution of self-consistency equations (Turnbull, 1974, Chang and Yang, 1987), and cannot be expressed in closed form. Gehan (1965) studied the two-sample problem by an extension of the Wilcoxon test. Ren (2008) proposed a weighted empirical likelihood-based semiparametric maximum likelihood estimator as a unified approach for the two-sample problem with various censoring schemes including double censoring. Overall, these approaches are more complicated than their counterparts for randomly right censored data, for example, Kaplan-Meier estimator (Kaplan and Meier, 1958) or log-rank test (Mantel, 1966).
This paper is concerned with the general regression setting. Among existing work, Zhang and Li (1996) proposed a Buckley-James-Ritov-type M-estimator. Their estimating equation is neither monotone nor continuous and thus may require special effort to address computational issues. Ren and Gu (1997) and Ren (2003) proposed a parallel regression M-estimator. This approach requires independence between the censoring variables and the covariates, and thus imposes a stronger random censoring assumption than the usual one. For scenarios where both left and right censoring times are always observed, Cai and Cheng (2004) studied semiparametric transformation models (Cheng et al., 1995), and Yan et al. (2009) adapted temporal process regression (Fine et al., 2004) to doubly censored data.
The primary goal of this paper is to develop a computationally simple regression approach that can accommodate the double censoring scenario as it occurs in the CFFPR while only requiring the standard random censoring assumption (i.e. the left and right censoring times are independent of the event time of interest given covariates). To this end, we adopt quantile regression (Koenker and Bassett, 1978), a regression strategy that has received increased attention in survival analysis. With right censored data, among the earliest breakthroughs, Powell (1984, 1986) extended the least absolute deviation (LAD) approach from traditional quantile regression to censored quantile regression, assuming the censoring variables are fixed or always observable. Later efforts accommodated random censoring times that are not always observed by imposing additional restrictions such as unconditional independent censoring (Ying et al., 1995, Honore et al., 2002, among others), or nearly independently and identically distributed (i.i.d.) errors (Yang, 1999). More recently, without imposing these constraints, Portnoy (2003) proposed a recursively reweighted estimator as a generalization of the Kaplan-Meier estimator, with subsequent work by Neocleous et al. (2006) and Portnoy and Lin (2010) devoted to further polishing the algorithm and the asymptotic theory. Peng and Huang (2008) proposed an alternative estimation approach by utilizing the martingale structure of randomly right censored data. Their approach is well justified in theory and also has a convenient implementation.
As elaborated in Portnoy (2003) and Peng and Huang (2008), the use of quantile regression in survival analysis offers straightforward interpretation on event times as well as extra model flexibility to accommodate varying covariate effects, thereby providing a valuable alternative to the popular Cox proportional hazards model (Cox, 1972). For example, by analyzing the CFFPR dataset with quantile regression, one would be able to assess how covariates impact various quantiles of time to first PA infection without restricting their effects to be constant, in contrast to traditional Cox regression. Such an analysis may also help detect population inhomogeneous risk patterns of the onset of PA infection, for example, covariate effects that differ between patients who have high susceptibility to PA infection and those who are less prone to this complication.
To the best of our knowledge, there has been little work on developing quantile regression methods tailored to the double censoring setting considered in this paper. To tackle the problem, our basic strategy is to identify an appropriate martingale process embedded in the present data structure and construct stochastic integral estimating equations by employing the technique of Peng and Huang (2008). Such an effort yields estimation and inference procedures which can be readily implemented by existing statistical software. The details are presented in Section 2 along with asymptotic justifications. In Section 3, we address a unique issue with double censoring, namely that both the lower and upper tails of the event time quantile function may suffer from non-identifiability. Unlike in the right censoring case, one may not simply deal with this identifiability problem by restricting the quantile inference range. For the case where identifiability fails in the lower tail, we propose a novel solution based on a conditional version of quantile regression and present the corresponding estimation and inference.
In the rest of the paper, we first note that the new quantile regression approach for doubly censored data can be slightly modified to handle truncated data. The adaptation in the presence of left truncation is briefly described in Section 4. We then report in Section 5 the results from simulation studies conducted to evaluate the finite-sample performance of our proposals. An application to the CFFPR data is presented in Section 6 to illustrate the practical utility of the proposed method. In Section 7, we conclude this paper with a few remarks.
2. Quantile Regression Procedure with Doubly Censored Data
2.1 Data and Model
Let T denote the event time of interest, L be the left censoring time, U be the right censoring time, and Z̃ be the p × 1 vector of recorded covariates. Define Z = (1, Z̃Τ)Τ, X = max{L, min(T, U)}, R(t) = I(L < t ≤ X), and N(t) = I(X ≤ t, δ = 1). Here δ is the censoring indicator, which equals 1 when T is observed exactly (L < T ≤ U) and otherwise records whether T is left censored (T ≤ L) or right censored (T > U). N(t) and R(t) may be viewed as the counting process and the at-risk process associated with T. We assume that (L, U) ⊥ T given Z. The observed data consist of n i.i.d. replicates of (X, δ, Z, L), denoted by {(Xi, δi, Zi, Li), i = 1, …, n}.
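As a minimal sketch of these definitions (not the authors' code), the following Python functions map latent times (T, L, U) to the observed pair (X, δ) and evaluate N(t) and R(t). The numeric coding δ = 1 (observed), 2 (right censored), 3 (left censored), the function names, and the convention L ≤ U are assumptions made purely for illustration.

```python
def observe(T, L, U):
    """Return (X, delta) for one subject, assuming L <= U.

    Assumed (hypothetical) coding: 1 = T observed exactly,
    2 = right censored at U, 3 = left censored at L.
    """
    X = max(L, min(T, U))
    if T <= L:
        delta = 3
    elif T > U:
        delta = 2
    else:
        delta = 1
    return X, delta


def N(t, X, delta):
    """Counting process N(t) = I(X <= t, delta = 1)."""
    return int(X <= t and delta == 1)


def R(t, X, L):
    """At-risk process R(t) = I(L < t <= X)."""
    return int(L < t <= X)


# Three subjects (T, L, U): observed event, right censored, left censored.
subjects = [(2.0, 0.5, 3.0), (4.0, 0.5, 3.0), (0.2, 0.5, 3.0)]
observed = [observe(T, L, U) for T, L, U in subjects]
```

Note that a left censored subject has X = L, so R(t) = 0 for every t and the subject never enters the risk set.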
Define the conditional τ–th quantile of T given Z by QT (τ |Z) = inf{t : FT (t|Z) ≥ τ}, where FT (t|Z) = Pr(T ≤ t|Z). We consider the quantile regression model taking the form,
$$Q_T(\tau \mid Z) = g\{Z^{\mathrm T}\beta_0(\tau)\}, \qquad \tau \in (0, 1), \tag{1}$$
where g(·) is a known monotone link function, and β0(τ) is a vector of unknown coefficients representing covariate effects on QT(τ|Z). As noted in Peng and Huang (2008), model (1) is a strict extension of the accelerated failure time (AFT) model (Buckley and James, 1979, Prentice, 1978, Louis, 1981, Wei and Gail, 1983, Tsiatis, 1990, Ritov, 1990, Wei et al., 1990, among others). The formulation of τ–varying coefficients enables the accommodation of population inhomogeneous covariate effects.
2.2 Estimation Procedure
To estimate β0(τ) in model (1), our basic idea is to determine an appropriate martingale process which allows us to construct an unbiased stochastic integral estimating equation by using Peng and Huang (2008)’s technique.
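Following this line, we consider M(t) = N(t) − ∫0t R(u) dΛT(u|Z), where N(t) = I(X ≤ t, δ = 1), R(t) = I(L < t ≤ X), and ΛT(t|Z) denotes the cumulative hazard function of T given Z. Let Ni(t), Ri(t), ΛT(t|Zi), and Mi(t) be sample analogues of N(t), R(t), ΛT(t|Z), and M(t). Denote the filtration σ{Ni(u), Ri(u+), Zi : i = 1, …, n; 0 ≤ u ≤ t} by ℱt. Provided (Li, Ui) ⊥ Ti given Zi,
$$E\{dN_i(t) \mid \mathcal{F}_{t-}\} = R_i(t)\, d\Lambda_T(t \mid Z_i).$$
This shows that Mi(t) is a martingale (Fleming and Harrington, 1991) and
$$E\{M_i(t) \mid Z_i\} = 0, \qquad t \ge 0. \tag{2}$$
Under model (1), which implies ΛT[g{ZΤβ0(τ)}|Z] = H(τ), it follows that
$$\int_0^{g\{Z_i^{\mathrm T}\beta_0(\tau)\}} R_i(u)\, d\Lambda_T(u \mid Z_i) = \int_0^{\tau} R_i[g\{Z_i^{\mathrm T}\beta_0(u)\}]\, dH(u)$$
from a change of variable within the integral, where H(x) = −log(1 − x). This equality and (2) naturally lead to an estimating equation for β0(·) given by
$$n^{1/2} S_n(\beta, \tau) = 0, \tag{3}$$
where
$$S_n(\beta, \tau) = n^{-1}\sum_{i=1}^{n} Z_i \left( N_i[g\{Z_i^{\mathrm T}\beta(\tau)\}] - \int_0^{\tau} R_i[g\{Z_i^{\mathrm T}\beta(u)\}]\, dH(u) \right).$$
It is easy to see that E{Sn(β0, τ)} = 0. With all Li = 0, equation (3) reduces to Peng and Huang (2008)’s estimating equation for randomly right censored data.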
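The mean-zero property that underlies the estimating equation can be checked numerically. The sketch below is an illustrative Monte Carlo check in the one-sample case, under assumptions that are not from the paper: T ~ Exp(1), so that ΛT(t) = t, and a hypothetical censoring mechanism with L ~ Unif(0, 0.5) and U = L + Unif(0, 3), both independent of T. It evaluates M(t) = N(t) − ∫ R(u) dΛT(u) for each simulated subject and averages.

```python
import random


def martingale_value(t, T, L, U):
    """M(t) = N(t) - int_0^t R(u) du when T ~ Exp(1), i.e. Lambda_T(u) = u."""
    X = max(L, min(T, U))
    observed = L < T <= U                    # event time seen exactly
    n_t = 1.0 if (observed and X <= t) else 0.0
    at_risk_time = max(0.0, min(X, t) - L)   # length of (L, min(X, t)]
    return n_t - at_risk_time


def mean_martingale(t, n=20000, seed=1):
    """Average M(t) over n simulated subjects (hypothetical censoring design)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        T = rng.expovariate(1.0)
        L = rng.uniform(0.0, 0.5)
        U = L + rng.uniform(0.0, 3.0)        # guarantees L <= U, independent of T
        total += martingale_value(t, T, L, U)
    return total / n


avg = mean_martingale(t=1.0)
```

Under these assumptions the average should be close to zero up to Monte Carlo error, which is the property that makes the estimating function unbiased at the true coefficients.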
The stochastic integral representation of Sn(β, τ) suggests a grid-based procedure to obtain an estimator of β0(·) based on equation (3). Specifically, define the estimator β̂(τ) as a right-continuous step function that jumps only on a prespecified grid, 𝒢Ln = {0 = τ0 < τ1 < … < τLn = τU < 1}, where τU is a prespecified constant subject to certain theoretical constraints. Because the definition of QT(·|Z) and model (1) imply g{ZΤβ0(0)} = 0, we always set g{ZΤβ̂(0)} = 0. We propose to obtain β̂(τj) (j = 1, …, Ln) by sequentially solving the following equation for β(τj):
$$n^{-1/2}\sum_{i=1}^{n} Z_i \left( N_i[g\{Z_i^{\mathrm T}\beta(\tau_j)\}] - \sum_{k=0}^{j-1} R_i[g\{Z_i^{\mathrm T}\hat\beta(\tau_k)\}]\,\{H(\tau_{k+1}) - H(\tau_k)\} \right) = 0. \tag{4}$$
Note that equation (4) is a monotone estimating equation (Fygenson and Ritov, 1994), and the left-hand side of (4) is a constant multiple of the gradient of the following L1-type convex function,
| (5) |
where R* is a sufficiently large positive number, expected to bound the corresponding h-dependent terms from above for all h in the compact parameter space for β0(τ). In our numerical studies, we set R* = 10^8. As a result, β̂(τj) can be alternatively obtained as the minimizer of lj(h). This L1-minimization problem can be readily solved, for example, by using the Barrodale-Roberts algorithm (Barrodale and Roberts, 1974) implemented in standard statistical software, such as the l1fit() function in S-PLUS and the rq() function in R. The same computational technique has been used in non-quantile regression settings, for example, the rank inference for the accelerated failure time model (Jin et al., 2003).
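Because β̂(·) is a right-continuous step function on the grid, the stochastic integral of the at-risk indicator against H becomes a left-endpoint sum over grid cells. The sketch below illustrates only that approximation for a single subject in the one-sample case. The choices T ~ Exp(1) (so QT(τ) = H(τ) and ΛT(t) = t), the grid step, the numeric values and the helper name grid_integral are assumptions for illustration, and the true quantile function stands in for the sequential estimates.

```python
import math


def H(x):
    """H(x) = -log(1 - x)."""
    return -math.log(1.0 - x)


def grid_integral(L, X, quantile, tau, step=0.001):
    """Left-endpoint sum of R(Q(tau_k)) * {H(tau_{k+1}) - H(tau_k)} up to tau,
    where R(t) = I(L < t <= X) and Q is the quantile function plugged in."""
    n_steps = int(round(tau / step))
    total = 0.0
    for k in range(n_steps):
        tau_k, tau_next = k * step, (k + 1) * step
        if L < quantile(tau_k) <= X:
            total += H(tau_next) - H(tau_k)
    return total


# One-sample illustration with T ~ Exp(1): Q_T(tau) = H(tau), Lambda_T(t) = t.
L_i, X_i, tau = 0.3, 1.2, 0.8
approx = grid_integral(L_i, X_i, H, tau)
# Exact value of int_0^{Q(tau)} R(u) dLambda_T(u): length of (L, min(X, Q(tau))].
exact = max(0.0, min(X_i, H(tau)) - L_i)
```

The two quantities agree up to the grid resolution, which is consistent with the requirement in the asymptotic theory that the mesh of the grid shrink with n.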
2.3 Asymptotic Results
Asymptotic studies of the proposed estimator are facilitated by the stochastic integral representation of our estimating function. We introduce necessary notation before stating the regularity conditions and theorems.
Let FX(·|Z) and F̄X(·|Z) be the distribution function and survival function of X given Z, respectively. Define F̃X,δ(t|Z) = Pr(X ≤ t, δ = 1|Z) and F̄X,L(t|Z) = Pr(X ≥ t, L ≥ t|Z). Let fT(·|Z), fX(·|Z), f̄X (·|Z), f̃X,δ(·|Z), f̄X,L(·|Z) and g′(·) denote the first order derivatives of FT (·|Z), FX(·|Z), F̄X(·|Z), F̃X,δ(·|Z), F̄X,L(·|Z) and g(·), respectively. Define
and ‖𝒢Ln‖ = max{|τj − τj−1|, j = 1, …, Ln}.
The regularity conditions include:
C1. The covariate space Z is bounded, i.e., supi‖Zi‖ < ∞.
C2. (a) Each component of E(ZN[g{ZΤβ0(τ)}]) is a Lipschitz function of τ; (b) f̃X,δ(t|z) and fX(t|z) are bounded above uniformly in t and z.
C3. (a) f̃X,δ{g(ZΤb)|Z} > 0 for all b ∈ ℬ(d0); (b) E(Z⊗2) > 0; (c) each component of J(b)B(b)−1 is uniformly bounded in b ∈ ℬ(d0), where ℬ(d0) is a neighborhood containing {β0(τ), τ ∈ (0, τU)}, defined as ℬ(d0) = {b ∈ ℝp : infτ∈(0, τU] ‖μ(b) − μ(β0(τ))‖ ≤ d0}.
C4. infτ∈[ν,τU] eigmin B{β0(τ)} > 0 for any ν ∈ (0, τU), where eigmin(·) denotes the minimal eigenvalue of a matrix.
We establish the uniform consistency and weak convergence of β̂(τ) stated in the following theorems.
Theorem 1: Assuming conditions C1–C4 hold and limn→∞ ‖𝒢Ln‖ = 0, then supτ∈[ν,τU]‖β̂(τ) − β0(τ) ‖ →p 0, where 0 < ν < τU.
Theorem 2: Assuming conditions C1–C4 hold and limn→∞ n1/2‖𝒢Ln‖ = 0, then n1/2{β̂(τ) − β0(τ)} converges weakly to a Gaussian process for τ ∈ [ν, τU], where 0 < ν < τU.
The regularity conditions and the proofs of Theorems 1–2 bear similarity with those in Peng and Huang (2008). Specifically, condition C1 requires covariates to be bounded and this is often met in practice. The smoothness of β0(·) and the boundedness of f̃X,δ(t|z) and fX(t|z) assumed in condition C2 are also reasonable in practical settings. It is rather common in quantile regression literature to require positive densities and the positive definiteness of E(Z⊗2) as in C3(a) and C3(b). Given bounded density functions and covariates, coupled with the condition C4 commented later, C3(c) is satisfied with b = β0(τ). By the continuity of J(b)B(b)−1, it is thus natural to expect C3(c) holds in a neighborhood of β0, ℬ(d0).
We would like to point out that the regularity condition C4 is the crucial constraint that ensures the identifiability of {β0(τ), τ ∈ (0, τU]}. In the simple one-sample case, this condition is equivalent to
for any ν ∈ (0, τU). Assuming fT(·) is bounded away from 0, this condition further reduces to Pr{L < QT(τ) ≤ U} > 0, ∀τ ∈ (ν, τU], implying τU ≤ FT(U+) and ν ≥ FT(L−). Here and hereafter, for a random variable Y, we use Y+ and Y− to denote the upper bound and the lower bound of its support respectively. This requirement concurs with Chang and Yang (1987)’s identifiability condition proposed for estimating the distribution function of doubly censored data in the one-sample case. Since ν > 0 can be chosen arbitrarily, it implies L− ≤ T−. In the general regression setting, however, C4 only renders implicit conditions on L and τU to guarantee the identifiability of β0(·). In practice, an infeasible minimization of lj(h) for some j < Ln may suggest a violation of the identifiability condition C4. In this case, one would need to reduce the τ-range for inference by resetting τU to a value less than τj.
Under regularity conditions C1–C4, given that E{Sn(β0, τ)} = 0 and μ(β̂(0)) = 0 as shown in Section 2.2, we can establish the uniform consistency of the proposed estimator by straightforwardly adapting the arguments in Peng and Huang (2008, Appendices). The proof of Theorem 1 is thus omitted. We sketch the proof of Theorem 2 in Web Appendix A, which provides more concrete information on the asymptotic distribution of n1/2{β̂(·) − β0(·)}.
2.4 Inference
For inference on β0(·), we propose resampling-based approaches, given the complexity of the asymptotic distribution of β̂(·) shown in the proof of Theorem 2. The resampling-based inference procedures described below can be justified by following the lines of Peng and Huang (2008), and the justification is thus omitted here.
More specifically, we perturb the objective function (5) by ξ1, …, ξn, a set of i.i.d. variates from a nonnegative known distribution with mean 1 and variance 1, for example, Exp(1). The resulting objective function is
| (6) |
for j = 1, …, Ln, where β*(τj) is defined as the minimizer of l̃j(h) and can be obtained sequentially using the same procedure as that taken to compute β̂(τj). For a fixed τ*, we can approximate the variance of β̂(τ*) by repeatedly generating the variate set {ξ1, …, ξn} B times and calculating the variance of the resulting β*(τ*). A 95% confidence interval for β0(τ*) can be constructed based on a normal approximation. Along the lines of Peng and Huang (2007) and Huang and Peng (2009), a 95% confidence band for {β0(τ) : τ ∈ [l, u]} may be given by {β̂(τ) ± ρ0.95σ̂(τ), τ ∈ [l, u]}, where ρ0.95 is the 95% empirical percentile of supτ∈[l,u] |β*(τ) − β̂(τ)|/σ̂(τ) with σ̂(τ) being the empirical standard deviation of β*(τ).
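Once the B resampled estimates are available, the interval and band computations are simple summaries of those replicates. The sketch below assumes the replicates of a single coefficient have already been computed elsewhere and are supplied as plain lists. The helper names, the 1.96 normal quantile, the percentile convention and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import math
import statistics


def wald_interval(estimate, replicates, z=1.96):
    """Normal-approximation 95% interval using the SD of resampled estimates."""
    sd = statistics.stdev(replicates)
    return estimate - z * sd, estimate + z * sd


def band_multiplier(est_path, rep_paths, level=0.95):
    """Empirical `level` percentile of sup_tau |beta*(tau) - beta_hat(tau)| / sd(tau).

    est_path: estimates on a tau grid; rep_paths: one resampled path per replicate.
    Returns the multiplier and the pointwise resampling SDs.
    """
    n_tau = len(est_path)
    sds = [statistics.stdev([rep[k] for rep in rep_paths]) for k in range(n_tau)]
    sups = sorted(
        max(abs(rep[k] - est_path[k]) / sds[k] for k in range(n_tau))
        for rep in rep_paths
    )
    index = min(len(sups) - 1, math.ceil(level * len(sups)) - 1)
    return sups[index], sds


# Toy replicates (hypothetical numbers, for illustration only).
lo, hi = wald_interval(1.0, [0.8, 1.0, 1.2])
rho, sds = band_multiplier([0.0, 0.0], [[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]])
band = [(est - rho * sd, est + rho * sd) for est, sd in zip([0.0, 0.0], sds)]
```

The band multiplier is at least as large as a pointwise critical value would be, since it controls the supremum over the whole τ-range rather than a single τ.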
Second-stage inference can also be conducted in a similar fashion to that of Peng and Huang (2008). First, we consider the general hypothesis H0 : ψ{β0(τ)} = r0(τ), τ ∈ [l, u], where ψ(·) is a known function and r0(τ) is a hypothesized value for ψ{β0(τ)}. Let ψ(x) = x(q) and r0(τ) = 0, where u(l) denotes the l–th component of vector u and 2 ≤ q ≤ p + 1. Testing H0 is equivalent to assess whether the effect of Z(q) is significant for τ ∈ [l, u]. One natural test may take the form, , where Θ(·) is a nonnegative weight function. The distribution of Γ may be approximated by the empirical distribution of given the observed data.
Another second-stage hypothesis of interest is given by H̃0 : ψ̃{β0(τ)} = η0, τ ∈ [l, u], where ψ̃(·) is a known function and η0 is an unspecified constant. With ψ̃(x) = x(q), H̃0 depicts the scenario where the effect of Z(q) is constant over τ ∈ [l, u]. To test H̃0, one may adopt the test statistic , where Θ̃(·) is a nonconstant weight function and . One may reject H̃0 when Γ̃ is greater than the (1 − α/2)th quantile or less than the (α/2)th empirical quantile of . As a useful by-product of the hypothesis testing for H̃0, ρ̂ in the special case with ψ̃(x) = x(q) and Θ̃(υ) = 1, denoted by η̂, provides a consistent estimate for , the average quantile effect of Z(q) across τ ∈ [l, u]. The inference on η0 can be made similarly upon the resampling quantity .
3. A Conditional Version of Quantile Regression
As pointed out in Section 2.3, certain conditions are required for the identifiability of β0(τ) and may not be satisfied in some real datasets, for example, when L− > T−. It is of practical interest to propose remedies when the data fail to identify some part of β0(·), most likely β0(τ) with small or large τ. In the presence of only random right censoring, one practical solution is to adaptively impose an upper bound, τU, on the τ-range in which β0(τ) is estimated (Peng and Huang, 2008). This action has little impact on the estimation because the estimating equations for {β0(τ), 0 < τ < τU} stand alone without involving estimates for β0(τ) with τ > τU.
Dealing with the identifiability issue in the double censoring case is more challenging because non-identifiability can occur on both tails of regression quantiles. One notable difficulty in view of equation (3) is that the sequential procedure presented in Section 2.2 would always require estimates at the lower tail, which however may not be identifiable due to left censoring.
When the identifiability of the lower tail of β0(·) is precluded by doubly censored data, we propose a conditional version of model (1), which takes the form,
$$Q_T(\tau \mid Z, T > t_0) = g\{Z^{\mathrm T}\alpha_0(\tau)\} + t_0, \qquad \tau \in (0, 1), \tag{7}$$
where QT (τ |Z, T > t0) = inf{t ≥ t0 : Pr(T ≤ t|Z, T > t0) ≥ τ}, and t0 > 0 is a prespecified constant subject to certain theoretical and practical constraints. A more detailed discussion on how to choose t0 is relegated to the end of this section. In model (7), the unknown coefficients in α0(τ) represent the effects of covariates on the τ-th conditional quantile of T provided T > t0.
The reasoning for adopting this conditional version of quantile regression is similar to that for estimating a conditional survival function of left truncated and right censored data when the unconditional one is not identifiable (Tsai et al., 1987). Model (7) essentially imposes a lower bound of t0 for regression quantiles. As elaborated later, doing so helps circumvent the difficulty associated with non-identifiable lower tail of β0(τ) at the cost of estimating a quantity which may slightly deviate from the primary interest.
Model (7) necessitates a different estimation procedure from that of the unconditional model (1). Estimating equation (3) cannot be directly borrowed without modification. The critical step in the adaptation of equation (3) is to identify an analogue of M(t) when model (7) is assumed instead of model (1). Along this line, we propose a natural substitute of M(t), given by M̌(t) = Ň(t) − ∫t0t Ř(u) dΛT(u|Z, T > t0), where Ň(t) = I(t0 < X ≤ t, δ = 1), Ř(t) = I(t0 ∨ L < t ≤ X), and ΛT(u|Z, T > t0) is the cumulative hazard function of T conditional on Z and T > t0, namely, −log Pr(T > u|Z, T > t0). Here ∨ is the maximum operator. Note that M̌(t) resembles the standard martingale for T truncated by t0 ∨ L except for the conditional hazard function involved. The use of the conditional hazard dΛT(u|Z, T > t0) in place of dΛT(u|Z) in M̌(t) is in tune with the assumed conditional model (7).
There are two key facts that need to be verified before we adapt the estimation procedure in Section 2. First, we need to show E{M̌(t)|Z} = 0 for all t ≥ t0. Standard martingale arguments may not be directly applicable here, and we instead prove this by examining the connection between M(t) and M̌(t). A detailed proof is provided in Web Appendix B. Second, we need to have g{ZΤα0(0)} = 0 as the boundary condition. As in the unconditional case, this easily follows from the definition of QT(τ|Z, T > t0) and model assumption (7).
The estimating equation, motivated by E{M̌ (g{ZΤα (τ)} + t0)|Z} = 0, is then given by
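$$n^{1/2} \check{S}_n(\alpha, \tau) = 0, \tag{8}$$
where
$$\check{S}_n(\alpha, \tau) = n^{-1}\sum_{i=1}^{n} Z_i \left( \check{N}_i[g\{Z_i^{\mathrm T}\alpha(\tau)\} + t_0] - \int_0^{\tau} \check{R}_i[g\{Z_i^{\mathrm T}\alpha(u)\} + t_0]\, dH(u) \right),$$
and Ňi(·) and Ři(·) are sample analogues of Ň(·) and Ř(·).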
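An estimator of α0(τ), denoted by α̂(·), can be easily obtained based on equation (8) by slightly modifying the algorithm presented in Section 2.2. The key is to note that Šn(α, τ) can be rewritten as
$$\check{S}_n(\alpha, \tau) = n^{-1}\sum_{i=1}^{n} Z_i^{*} \left( I[X_i^{*} \le g\{Z_i^{*\mathrm T}\alpha(\tau)\}, \delta_i = 1] - \int_0^{\tau} I[L_i^{*} < g\{Z_i^{*\mathrm T}\alpha(u)\} \le X_i^{*}]\, dH(u) \right),$$
where Xi* = Xi − t0, Li* = (Li ∨ t0) − t0, and Zi* = Zi I(Xi > t0). This shows that we can simply replace Xi, Li, Zi in the proposed estimation procedure for β0(τ) by Xi*, Li*, Zi* respectively to compute α̂(τ). The same strategy can be used to adapt the resampling-based procedures in Section 2.4 to make inference on α0(τ).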
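The substitution of transformed data can be illustrated with a short sketch. The transformation used below, X* = X − t0, L* = max(L, t0) − t0, and Z* set to a zero vector when X ≤ t0, is one form that follows from the definitions of Ň and Ř; it is stated here as an assumption for illustration, as are the helper names and the numeric values. The code checks for one subject that the processes Ň and Ř evaluated at s + t0 coincide with the unconditional-style indicators built from the shifted data.

```python
def to_conditional(X, L, Z, t0):
    """Shift one subject's data to the time origin t0 (hypothetical helper).

    Returns (X*, L*, Z*) with X* = X - t0, L* = max(L, t0) - t0, and
    Z* = Z if X > t0, otherwise a zero vector so the subject contributes nothing.
    """
    X_star = X - t0
    L_star = max(L, t0) - t0
    Z_star = list(Z) if X > t0 else [0.0] * len(Z)
    return X_star, L_star, Z_star


def N_check(t, X, delta, t0):
    """N-check(t) = I(t0 < X <= t, delta = 1)."""
    return int(t0 < X <= t and delta == 1)


def R_check(t, X, L, t0):
    """R-check(t) = I(max(t0, L) < t <= X)."""
    return int(max(t0, L) < t <= X)


# One subject with an observed event after t0 (illustrative values).
X, L, Z, delta, t0 = 1.5, 0.125, [1.0, 0.5], 1, 0.25
X_star, L_star, Z_star = to_conditional(X, L, Z, t0)
agree = all(
    N_check(s + t0, X, delta, t0) == int(X_star <= s and delta == 1)
    and R_check(s + t0, X, L, t0) == int(L_star < s <= X_star)
    for s in (0.5, 1.25, 2.0)
)
```

Zeroing the covariate vector for subjects with X ≤ t0 is a convenient way to drop them without changing the sample size used by the fitting routine.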
The analogy in estimating equation also suggests the similarity in asymptotic studies between the unconditional and conditional cases. Specifically, under regularity conditions C1′ – C4′, which are C1–C4 defined based on transformed data (X*, δ, Z*, L*), we have the following theorems:
Theorem 3: Assuming conditions C1′–C4′ hold and limn→∞ ‖𝒢Ln‖ = 0, then supτ∈[ν,τU] ‖α̂(τ) − α0(τ)‖ →p 0, where 0 < ν < τU.
Theorem 4: Assuming conditions C1′–C4′ hold and limn→∞ n1/2‖𝒢Ln‖ = 0, then n1/2{α̂(τ) − α0(τ)} converges weakly to a Gaussian process for τ ∈ [ν, τU], where 0 < ν < τU.
Similar to the unconditional case, the regularity condition C4′ is concerned with the identifiability of {α0(τ) : τ ∈ (0, τU)} and implicitly imposes theoretical requirements on t0 and τU. In the one-sample case, it becomes
for ν ∈ (0, τU). It is easy to see that, with t0 chosen to be greater than L−, L < g{β0(τ)} + t0 would hold with a positive probability and thus the above condition would only impose constraints on τU. This finding in the one-sample case is suggestive of the diminished identifiability issue with the lower tail of α0(·) when the conditional version of quantile regression model is adopted.
In practice, to perform the proposed conditional quantile regression, one may choose t0 as a constant that is greater than the observed lower bound of L and also produces converged estimates for model (7) with τ close to 0. The final selection of t0 may be further adjusted according to the scientific interpretation of QT (τ |Z, T > t0), thereby yielding more meaningful conditional inference.
4. Extension to Handle Left Truncation
In this section, we present an extension to scenarios where left truncation is present, as frequently occurs in observational studies. Following the notation in Section 2, when X is subject to left truncation by a truncation time A, the observed data include n i.i.d. replicates of (X′, L′, A′, δ′, Z′), denoted by {(X′i, L′i, A′i, δ′i, Z′i), i = 1, …, n}, where {X′, L′, A′, δ′, Z′} follows the conditional distribution of {X, L, A, δ, Z} given X ≥ A. It is assumed that (L, U, A) is independent of T given Z. Such data can be referred to as doubly censored data with left truncation. With all Li = 0, the data reduce to the usual left truncated right censored data.
Adopting the same idea for doubly censored data studied in Section 2, we construct an estimating equation for model (1) by utilizing the martingale structure associated with the observed truncated data described above. In the current setting, we define N′(t) = I(X′ ≤ t, δ′ = 1), and the at-risk process as R′(t) = I(L′ ∨ A′ < t ≤ X′). We show in Web Appendix C that M′(t) = N′(t) − ∫0t R′(u) dΛT(u|Z′) is a martingale process. By this fact, only minor changes to equation (3) are needed in order to accommodate the presence of left truncation. The proposed estimating equation for β0(·) is given by
$$n^{1/2} K_n(\beta, \tau) = 0, \tag{9}$$
where
$$K_n(\beta, \tau) = n^{-1}\sum_{i=1}^{n} Z_i' \left( N_i'[g\{Z_i'^{\mathrm T}\beta(\tau)\}] - \int_0^{\tau} R_i'[g\{Z_i'^{\mathrm T}\beta(u)\}]\, dH(u) \right).$$
The estimation and inference procedures can be developed based on equation (9) similarly to those described in Section 2. In view of the analogy between equation (9) and equation (3), we can establish Theorems 1–2 with (X, δ, Z, L) in C1–C4 replaced by (X′, δ′, Z′, L′ ∨ A′) by adopting essentially the same arguments.
Following the same reasoning, one may estimate α0(·) in model (7) by solving the equation, n1/2Ǩn (α, τ) = 0, with , where , and . Theorem 3–4 can be generalized to the resulting estimator of α0(τ).
5. Simulation Studies
We studied the finite-sample performance of the proposed methods through Monte-Carlo simulations. We first generated event times from an AFT model with i.i.d. errors:
$$\log T = b_1 Z_1 + b_2 Z_2 + \varepsilon,$$
where ε followed the extreme value distribution. The covariates Z1 and Z2 were generated from Unif(0, 1) and Bernoulli(0.5) respectively. We obtained the right censoring time U and left censoring time L by generating U and L respectively from Unif(0.1 · I(Z2 = 1), cu) and Unif(0, cl) · W until L ≤ U with Z2 fixed, where W was a Bernoulli(1 − p0) variate. It is easy to show that under this set-up, both model (1) and model (7) hold with g(·) = exp(·), β0(τ) = α0(τ) = {Qε(τ), b1, b2}Τ. By setting p0 = 0.2, L had a probability mass of 0.2 at zero, which rendered a scenario in which the lower tail of β0(·) was identifiable. In this case, model (1) was considered. Choosing b1 = 0, b2 = −0.5, cl = 0.5, cu = 3.8 resulted in 20% right censoring and 20% left censoring. With p0 = 0, we studied model (7) with t0 = 0.16. We set p0 = 0, b1 = 0, b2 = −1.0, cl = 0.3, cu = 4.5. The resulting proportions of right censoring and left censoring were 15% and 20% respectively. Under each configuration, we generated 1000 data sets of sample size n = 200. We set B = 200 in the resampling procedures with the perturbation variates ξ1, …, ξn generated from Exp(1). An equally spaced grid on τ ∈ (0, 1) with step size 0.01 was adopted when estimating β0(·) or α0(·). We also carried out tests on the overall significance and the constant effect hypotheses for each covariate. In the latter test we adopted the weight function Θ(υ) = I{υ ≥ (l + u)/2}. We set l = 0.1 and u = 0.7.
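The data-generating scheme for the unconditional AFT configuration can be sketched as follows. This is an illustrative reading of the description above, not the authors' simulation code: the extreme value error is taken to be the logarithm of an Exp(1) variate, which matches Qε(τ) = log{−log(1 − τ)}, and the coding δ = 1 (observed), 2 (right censored), 3 (left censored), the function name, the seed and the large sample size used for the rate check are assumptions.

```python
import math
import random


def simulate_aft(n, b1, b2, cl, cu, p0, seed=0):
    """Generate doubly censored data from log T = b1*Z1 + b2*Z2 + eps.

    eps = log(E) with E ~ Exp(1), i.e. an extreme value error.
    Assumed delta coding: 1 = observed, 2 = right censored, 3 = left censored.
    Returns a list of tuples (X, delta, Z1, Z2, L).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1 = rng.random()
        z2 = 1 if rng.random() < 0.5 else 0
        eps = math.log(rng.expovariate(1.0))
        T = math.exp(b1 * z1 + b2 * z2 + eps)
        while True:  # redraw (L, U) until L <= U, holding Z2 fixed
            U = rng.uniform(0.1 * z2, cu)
            W = 1 if rng.random() < 1.0 - p0 else 0
            L = rng.uniform(0.0, cl) * W
            if L <= U:
                break
        X = max(L, min(T, U))
        delta = 3 if T <= L else (2 if T > U else 1)
        data.append((X, delta, z1, z2, L))
    return data


# Large sample to inspect censoring rates under b1 = 0, b2 = -0.5, cl = 0.5, cu = 3.8.
sample = simulate_aft(n=20000, b1=0.0, b2=-0.5, cl=0.5, cu=3.8, p0=0.2, seed=1)
left_rate = sum(d[1] == 3 for d in sample) / len(sample)
right_rate = sum(d[1] == 2 for d in sample) / len(sample)
```

With these parameter values both censoring rates should come out near the 20% reported for this configuration, which gives a quick sanity check on the reading of the design.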
Table 1 presents the results from estimating model (1) and model (7) in the AFT model setting. We report absolute values of biases (Bias), empirical standard deviations (EmpSD), and average estimated resampling-based standard deviations (AvgSD) of β̂ (τ) and α̂(τ), and coverage rates of 95% Wald confidence intervals of β0(τ) and α0(τ) with τ = 0.1, 0.3, 0.5 and 0.7. It is observed from Table 1 that in either unconditional or conditional case, biases are small, the resampling-based standard deviation estimates agree well with the empirical ones, and the coverage rates are in general close to the nominal level.
Table 1.
Simulation Results under AFT Models. Bias: absolute biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
| τ | Coef. (unconditional case; b1=0, b2=−0.5) | Bias | AvgSD | EmpSD | Cov95 | Coef. (conditional case; b1=0, b2=−1) | Bias | AvgSD | EmpSD | Cov95 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | β̂0 | 0.07 | 0.93 | 0.96 | 0.87 | α̂0 | 0.02 | 0.77 | 0.72 | 0.90 |
| | β̂1 | 0.01 | 1.36 | 1.49 | 0.89 | α̂1 | 0.06 | 1.18 | 1.15 | 0.92 |
| | β̂2 | 0.00 | 0.92 | 0.86 | 0.92 | α̂2 | 0.01 | 0.75 | 0.66 | 0.95 |
| 0.3 | β̂0 | 0.04 | 0.55 | 0.49 | 0.94 | α̂0 | 0.02 | 0.41 | 0.38 | 0.94 |
| | β̂1 | 0.03 | 0.82 | 0.77 | 0.95 | α̂1 | 0.04 | 0.66 | 0.61 | 0.95 |
| | β̂2 | 0.02 | 0.51 | 0.43 | 0.97 | α̂2 | 0.02 | 0.39 | 0.36 | 0.96 |
| 0.5 | β̂0 | 0.01 | 0.35 | 0.31 | 0.96 | α̂0 | 0.01 | 0.31 | 0.28 | 0.95 |
| | β̂1 | 0.03 | 0.55 | 0.50 | 0.95 | α̂1 | 0.02 | 0.49 | 0.45 | 0.96 |
| | β̂2 | 0.01 | 0.32 | 0.29 | 0.95 | α̂2 | 0.01 | 0.29 | 0.27 | 0.95 |
| 0.7 | β̂0 | 0.01 | 0.28 | 0.24 | 0.96 | α̂0 | 0.03 | 0.28 | 0.26 | 0.95 |
| | β̂1 | 0.02 | 0.43 | 0.39 | 0.96 | α̂1 | 0.02 | 0.43 | 0.39 | 0.96 |
| | β̂2 | 0.00 | 0.25 | 0.23 | 0.95 | α̂2 | 0.01 | 0.25 | 0.23 | 0.96 |
Table 2 presents the hypothesis testing results. The empirical rejection rates (ERR) for both tests at level 0.05 are reported, together with the estimated average effects (AvgEst), empirical standard deviations of the average effects, and average resampling-based standard deviation estimates of the average effects. We see that the type I errors are close to the nominal level 0.05. The estimated average covariate effects of Z1 and Z2 are close to the true values. The resampling-based standard deviation estimates for the average covariate effect estimates agree well with the empirical standard deviations.
Table 2.
Simulation Results on Hypothesis Testing and Second-Stage Inference under AFT Models. ERR: empirical rejection rates; AvgEst: estimated average effects; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations.
| Case | Coef. | ERR (H0: coefficient = 0, l ≤ τ ≤ u) | AvgEst | AvgSD | EmpSD | ERR (H0: coefficient = η0, l ≤ τ ≤ u) |
|---|---|---|---|---|---|---|
| Unconditional (b1=0, b2=−0.5) | β̂0 | 0.60 | −0.82 | 0.40 | 0.39 | 0.89 |
| | β̂1 | 0.07 | −0.03 | 0.60 | 0.63 | 0.05 |
| | β̂2 | 0.29 | −0.49 | 0.37 | 0.34 | 0.04 |
| Conditional (b1=0, b2=−1) | α̂0 | 0.76 | −0.79 | 0.31 | 0.30 | 0.96 |
| | α̂1 | 0.06 | −0.03 | 0.49 | 0.49 | 0.05 |
| | α̂2 | 0.93 | −1.01 | 0.30 | 0.29 | 0.03 |
In addition to the AFT setting, we considered a log linear model with heteroscedastic errors. That is, event times were generated from the model:
$$\log T = b_1 Z_1 + b_2\, \xi\, Z_2 + \varepsilon,$$
where Z1 followed Unif(0, 1), Z2 followed Bernoulli(0.5), and ξ followed Exp(1). The right censoring time U and left censoring time L were generated in the same way as in the AFT setting. We considered two different configurations: (a) ε was a N(0, 1) variate, p0 = 0.2, b1 = 0, b2 = −1.5, cl = 0.7, cu = 4.8; (b) ε was an extreme value variate, p0 = 0, t0 = 0.03, b1 = 0, b2 = −4.5, cl = 0.05, cu = 4.0. One can verify that model (1) holds under configuration (a) with g(·) = exp(·) and β0(τ) = {β0(τ), β1(τ), β2(τ)}Τ, where β0(τ) = Qε(τ), β1(τ) = 0, and β2(τ) = Qb2ξ+ε(τ) − β0(τ). Under configuration (b), model (7) was satisfied with g(·) = exp(·), α0(τ) = Qε(τ) = log{−log(1 − τ)}, α1(τ) = 0, and α2(τ) = log[Qexp(b2ξ+ε)(1 − Pr{exp(b2ξ + ε) > t0}(1 − τ)) − t0] − α0(τ). Here, for a random variable Y, QY(τ) denotes its τ-th quantile. Note that unlike in the AFT settings, the effects of Z2 were not constant and took a complicated analytic form. We approximated the true coefficients for Z2 by using bootstrapping. In (a), there was 25% right censoring and 20% left censoring. In (b), the rates of right censoring and left censoring were 15% and 25% respectively.
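The varying coefficient β2(τ) = Qb2ξ+ε(τ) − Qε(τ) under configuration (a) has no simple closed form, but it can be approximated numerically. The source reports using bootstrapping without giving details; the sketch below instead uses plain Monte Carlo draws, which is a swapped-in technique chosen for simplicity. The function names, the number of draws and the seed are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(sorted_values, tau):
    """Smallest order statistic whose empirical CDF is at least tau."""
    index = max(0, math.ceil(tau * len(sorted_values)) - 1)
    return sorted_values[index]


def approx_beta2(taus, b2=-1.5, draws=100000, seed=0):
    """Monte Carlo approximation of beta2(tau) = Q_{b2*xi+eps}(tau) - Q_eps(tau)
    under configuration (a): eps ~ N(0, 1), xi ~ Exp(1)."""
    rng = random.Random(seed)
    mixed = sorted(
        b2 * rng.expovariate(1.0) + rng.gauss(0.0, 1.0) for _ in range(draws)
    )
    std_normal = NormalDist()
    return {
        tau: empirical_quantile(mixed, tau) - std_normal.inv_cdf(tau)
        for tau in taus
    }


beta2 = approx_beta2([0.1, 0.3, 0.5, 0.7])
```

Because b2ξ is nonpositive when b2 < 0, every approximated value should be negative, and the values change with τ, which is the non-constant covariate effect this setting is designed to produce.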
Tables 3–4 present the estimation and hypothesis testing results in the same format as Tables 1–2. In the presence of varying covariate effects, the proposed method also performs well. In both the unconditional and conditional cases, the regression quantile estimates are virtually unbiased, with standard deviations accurately estimated by the proposed resampling method. The estimates of the average effects also have small biases, with estimated standard deviations agreeing well with the empirical ones. The proposed test for overall significance and the test for constancy of covariate effects appear to have the right sizes and reasonable power.
Table 3.
Simulation Results under Log-Linear Models with Heteroscedastic Errors. Bias: absolute biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
| τ | Unconditional case (b1=0, b2=−1.5) | Bias | AvgSD | EmpSD | Cov95 | Conditional case (b1=0, b2=−4.5) | Bias | AvgSD | EmpSD | Cov95 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | β̂0 | 0.06 | 0.64 | 0.60 | 0.91 | α̂0 | 0.05 | 0.71 | 0.62 | 0.94 |
| 0.1 | β̂1 | 0.05 | 1.11 | 1.09 | 0.91 | α̂1 | 0.01 | 1.15 | 1.04 | 0.94 |
| 0.1 | β̂2 | 0.09 | 1.09 | 1.02 | 0.85 | α̂2 | 0.02 | 0.82 | 0.75 | 0.94 |
| 0.3 | β̂0 | 0.04 | 0.41 | 0.37 | 0.94 | α̂0 | 0.01 | 0.40 | 0.36 | 0.95 |
| 0.3 | β̂1 | 0.02 | 0.70 | 0.63 | 0.95 | α̂1 | 0.01 | 0.67 | 0.62 | 0.95 |
| 0.3 | β̂2 | 0.00 | 0.65 | 0.53 | 0.95 | α̂2 | 0.01 | 0.49 | 0.44 | 0.95 |
| 0.5 | β̂0 | 0.01 | 0.33 | 0.31 | 0.94 | α̂0 | 0.01 | 0.32 | 0.28 | 0.96 |
| 0.5 | β̂1 | 0.00 | 0.56 | 0.52 | 0.95 | α̂1 | 0.03 | 0.54 | 0.49 | 0.95 |
| 0.5 | β̂2 | 0.01 | 0.42 | 0.37 | 0.95 | α̂2 | 0.00 | 0.39 | 0.36 | 0.95 |
| 0.7 | β̂0 | 0.02 | 0.32 | 0.29 | 0.96 | α̂0 | 0.03 | 0.30 | 0.27 | 0.95 |
| 0.7 | β̂1 | 0.00 | 0.53 | 0.49 | 0.95 | α̂1 | 0.02 | 0.51 | 0.46 | 0.96 |
| 0.7 | β̂2 | 0.02 | 0.35 | 0.31 | 0.96 | α̂2 | 0.01 | 0.36 | 0.33 | 0.95 |
Table 4.
Simulation Results on Hypothesis Testing and Second-Stage Inference under Log-Linear Models with Heteroscedastic Errors. ERR: empirical rejection rates; AvgEst: estimated average effects; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations.
| Unconditional case (b1=0, b2=−1.5) | ERR (H0: β(τ) = 0, l ≤ τ ≤ u) | AvgEst | AvgSD | EmpSD | ERR (H0: β(τ) = η0, l ≤ τ ≤ u) |
|---|---|---|---|---|---|
| β̂0 | 0.17 | −0.32 | 0.32 | 0.31 | 0.90 |
| β̂1 | 0.06 | 0.00 | 0.54 | 0.53 | 0.03 |
| β̂2 | 0.96 | −1.50 | 0.45 | 0.42 | 0.23 |
| Conditional case (b1=0, b2=−4.5) | ERR (H0: α(τ) = 0, l ≤ τ ≤ u) | AvgEst | AvgSD | EmpSD | ERR (H0: α(τ) = η0, l ≤ τ ≤ u) |
| α̂0 | 0.79 | −0.79 | 0.30 | 0.28 | 0.98 |
| α̂1 | 0.05 | −0.02 | 0.50 | 0.48 | 0.04 |
| α̂2 | 0.99 | −1.66 | 0.37 | 0.36 | 0.14 |
We conducted simulations with a larger sample size, n = 400. A comparison of the results corresponding to n = 200 and n = 400 (Web Appendix C) suggests that the performance of the proposed estimators and tests further improves as sample size increases.
Simulations were also performed to compare our approach with a naive approach that simply discards all left-censored subjects. Data were generated from the same configurations as the unconditional cases in Table 1 and Table 2. We evaluated the estimation of model (1). Figure 1 displays the mean estimated coefficients from the proposed approach and those from the naive approach along with the true coefficients. This figure shows that the proposed estimators are virtually unbiased while the naive approach can produce substantial biases particularly in the estimation of non-zero coefficients. Different bias patterns with the naive method are observed in some other set-ups (with results not shown), for example, a large deviation from zero of the naive estimates for the zero-coefficients. These findings show that it is important to account for double censoring arising from practical situations.
Figure 1.
Comparison among True coefficients (Bold Solid Lines), Mean Estimated Coefficients from the Proposed Method (Solid Lines), and Mean Estimated Coefficients from the Naive Approach (Dotted Lines).
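The direction of the bias from discarding left-censored subjects can be illustrated with a deliberately simplified sketch. It is not the simulation design used above: there are no covariates and no right censoring, log T is standard normal, and the left censoring time L is drawn from Unif(0, 1). All of these choices, and the function name `naive_vs_full_median`, are assumptions made only for illustration.

```python
import math
import random
from statistics import median


def naive_vs_full_median(n=100_000, seed=1):
    """Compare the median of log T in the full sample with the median
    among subjects retained by the naive approach (those with T >= L)."""
    rng = random.Random(seed)
    log_t_all, log_t_kept = [], []
    for _ in range(n):
        log_t = rng.gauss(0.0, 1.0)   # log event time; true median is 0
        left = rng.uniform(0.0, 1.0)  # hypothetical left censoring time L
        log_t_all.append(log_t)
        if math.exp(log_t) >= left:   # subject is not left censored
            log_t_kept.append(log_t)
    left_censoring_rate = 1.0 - len(log_t_kept) / n
    return median(log_t_all), median(log_t_kept), left_censoring_rate


full_median, naive_median, left_rate = naive_vs_full_median()
print(f"left censoring rate: {left_rate:.3f}")
print(f"median of log T, full sample: {full_median:.3f}")
print(f"median of log T, naive subset: {naive_median:.3f}")
```

Because the discarded subjects are exactly those with early events, the retained subset over-represents late events and its median is shifted upward, even in this covariate-free setting.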
We also conducted simulations for cases with left truncation. Results reported in Web Appendix E show satisfactory performance of our proposals in Section 4.
6. CFFPR Data Example
We apply the proposed quantile regression method to the CFFPR data discussed in Section 1. Cystic Fibrosis (CF) is one of the most common and life-shortening genetic disorders affecting the lungs and digestive systems of about 30,000 children and adults in the United States and 70,000 worldwide (Cystic Fibrosis Foundation 2010). Pseudomonas aeruginosa (PA), the predominant bacterial pathogen infecting 80% of CF patients under age 18, accelerates decline in lung function (Kosorok et al. 2001) and serves as an important predictor of mortality in CF (Retsch-Bogart et al. 2008). In our analysis, we used the CFFPR data collected during 1986–2005 to investigate the association between onset ages of the first detected PA infection and several risk factors in CF patients diagnosed by age 10. Similar data were analyzed by Lai et al. (2004) under the Cox model and Yan et al. (2009) based on temporal process regression.
In the CFFPR, a patient’s age at the first PA infection, which is the event time of interest T, is subject to both left and right censoring. Among 12,818 CF patients diagnosed between 1986 and 2000, 3,343 (26.1%) patients had PA infection at study entry (i.e., T is left censored by the patient’s age at the first CFFPR record, L) and 2,213 (17.3%) patients had no PA infection documented by December 2005 (i.e., T is right censored by the age at the last follow-up before the cut-off date, U). To avoid the complication with delayed entry, we restricted the study population to subjects who were diagnosed before age 10 years during 1986–2000 and alive at age 10 years. The first restriction was imposed because the first 10 years were known to have the greatest potential to take advantage of early diagnosis (Campbell and White, 2005, Grosse et al., 2006). The second restriction was imposed to avoid left truncation due to mortality prior to CF diagnosis. Since the mortality rate before age 10 years was very low, about 1.5% (Grosse et al., 2006), we expect that excluding patients who died before age 10 years would result in only a small deviation from the general young CF population. The restricted sample contains 11,179 patients with 23.7% left censoring and 16.2% right censoring.
We applied the proposed quantile regression method to this doubly censored restricted CFFPR sample. Since the support of the left censoring time L in this dataset appears to have a lower bound approaching 0, suggesting lower tail identifiability, we fit the data with the unconditional model (1), choosing g(·) as the identity function. The same set of covariates examined in Yan et al. (2009) was considered, including gender (1 for females and 0 for males), diagnosis mode (denoted by “factor”) and diagnosis year (denoted by “dx”). Diagnosis mode was defined according to the common clinical practices that identified CF prior to 2005, and includes four categories: diagnosis at birth due to meconium ileus (MI), diagnosis shortly after birth by neonatal/prenatal screening (SCR), diagnosis at variable ages because of family history (FH), and diagnosis at variable ages from various symptoms (SYMP) other than MI (Lai et al., 2004, Yan et al., 2009). Diagnosis year was classified into three periods, i.e., 1986–1989 (dx86), 1990–1993 (dx90), and 1994–2000 (dx94), that coincided with the major therapeutic breakthroughs in CF, namely Pulmozyme in 1994 (Fuchs et al. 1994, Ramsey and Dorkin 1994) and TOBI in 1999 (Ramsey et al. 1999). Male patients who were diagnosed between 1986 and 1989 (dx86) by symptoms other than MI (SYMP) were chosen as the reference group. We adopted the same τ-grid as that used in the simulations.
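The covariate coding described above can be sketched as a small helper that maps one patient record to a row of the design matrix. The function names, the input encodings ("F"/"M", category strings), and the ordering of the indicator columns are illustrative assumptions; only the categories, the period boundaries, and the reference group come from the text.

```python
NON_REFERENCE_MODES = ("MI", "FH", "SCR")  # SYMP is the reference category
NON_REFERENCE_PERIODS = ("dx90", "dx94")   # dx86 is the reference period


def dx_period(year):
    """Map a diagnosis year to the three periods used in the analysis."""
    if 1986 <= year <= 1989:
        return "dx86"
    if 1990 <= year <= 1993:
        return "dx90"
    if 1994 <= year <= 2000:
        return "dx94"
    raise ValueError(f"diagnosis year {year} is outside 1986-2000")


def design_row(sex, mode, year):
    """Return [intercept, female, MI, FH, SCR, dx90, dx94] for one patient."""
    if sex not in ("F", "M"):
        raise ValueError(f"unknown sex code: {sex}")
    if mode != "SYMP" and mode not in NON_REFERENCE_MODES:
        raise ValueError(f"unknown diagnosis mode: {mode}")
    period = dx_period(year)
    row = [1, 1 if sex == "F" else 0]
    row += [1 if mode == m else 0 for m in NON_REFERENCE_MODES]
    row += [1 if period == p else 0 for p in NON_REFERENCE_PERIODS]
    return row


print(design_row("M", "SYMP", 1987))  # reference group: intercept only
print(design_row("F", "FH", 1995))
```

With this coding, the intercept coefficient describes the reference group directly, and every other coefficient is a contrast against it.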
Figure 2 displays the proposed coefficient estimates for β0(τ) in bold solid lines along with their 95% pointwise Wald confidence intervals in bold dashed lines for τ ∈ [l, u] with l=0.10 and u=0.65. The naive estimates obtained from the subset excluding left-censored observations and the corresponding 95% confidence intervals are plotted in dot-dash lines and dotted lines respectively. The estimates for β0(τ) with τ close to 0 (not shown), though exhibiting rather large variability, are all converged solutions to the L1–minimization problem (5). This further suggests that the lower tail identifiability of regression quantiles may be of little concern in this example. As shown in Figure 2 (panel A), with all observations included in the analysis, the estimated intercept indicates that about 10% of male patients with CF diagnosed during 1986–1989 by SYMP acquired their first PA infection by age 2 years and approximately 65% of them had their first PA infection by age 9 years.
Figure 2.
Coefficient Estimates (Bold Solid Lines) and 95% Pointwise Confidence Intervals (Bold Dashed Lines) from the Proposed Method, in Contrast with Coefficient Estimates (Dot-Dash Lines) and 95% Pointwise Confidence Intervals (Dotted Lines) from the Naive Method.
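As a reading aid for Figure 2, and assuming model (1) has the conditional-quantile form Q_T(τ | Z) = g{Z^T β0(τ)} that its use in Section 5 suggests, the identity link gives the following interpretation. The superscript labels on the components of β0(τ) are notation introduced here for illustration only.

```latex
% Reference group (male, SYMP diagnosis, dx86): Z_ref = (1, 0, \ldots, 0)^{T}.
Q_T(\tau \mid Z_{\mathrm{ref}})
  = Z_{\mathrm{ref}}^{T}\,\beta_0(\tau)
  = \beta_0^{(\mathrm{int})}(\tau),
\qquad
\Pr\{T \le \beta_0^{(\mathrm{int})}(\tau) \mid Z_{\mathrm{ref}}\} = \tau .

% Gender coefficient: difference in the tau-th quantile of onset age,
% females versus males, with the other covariates held fixed.
Q_T(\tau \mid \text{female}, \cdot) - Q_T(\tau \mid \text{male}, \cdot)
  = \beta_0^{(\mathrm{gender})}(\tau).
```

Under this reading, an estimated intercept of about 2 years at τ = 0.10 corresponds to roughly 10% of the reference group having acquired their first PA infection by age 2, and a negative gender coefficient corresponds to earlier onset in girls at that quantile.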
With regard to the gender effect (Figure 2, panel B), the regression estimates for most of the quantile range, τ ∈ (0.2, 0.65), are negative, indicating that girls acquired their first PA infection earlier than boys. More importantly, the gender difference is more pronounced at larger τ’s, which correspond to patients who acquired their first PA infection at older ages. Similarly, many other covariates show non-constant effects across the quantile range of 0.1 to 0.65. For example, the effect of “FH diagnosis” (Figure 2, panel D) increases with τ, while the coefficient estimates for both “diagnosis 1990–1993” and “diagnosis 1994–2000” (Figure 2, panels F and G) decrease with τ. Such varying effects would not have been identified by traditional Cox regression or by classic linear regression, which models only the mean.
Our observation of the earlier onset of PA infection in girls compared to boys is consistent with previous literature (Lai et al., 2004, Yan et al., 2009). In contrast, our result on the beneficial effect of “FH diagnosis” (i.e., later first PA infection compared to “SYMP diagnosis”) appears to be new. This effect would have been masked by an opposite effect if left-censored cases were excluded from the analysis. Panel D of Figure 2 shows that the pointwise 95% confidence intervals from the two methods do not even overlap across the entire range of τ, so the naive analysis would lead to an erroneous conclusion on the effect of “FH diagnosis”. Substantial discrepancies between the proposed method and the naive method are also noted in the magnitude of the regression estimates for the covariates “MI diagnosis” (panel C), “diagnosis 1990–1993” (panel F) and “diagnosis 1994–2000” (panel G). Such systematic differences are also observed in our simulation studies, and provide strong evidence for the importance of appropriately accommodating left-censored observations when investigating PA infections in CF.
Formal tests on the significance of covariate effects were performed based on the average quantile effects across τ ranging from 0.10 to 0.65. The delaying effect of FH diagnosis and the accelerating effects of more recent diagnosis cohorts (1990–1993 and 1994–2000) on first PA infections observed in Figure 2 (panels D, F, and G) are confirmed by highly significant p-values, all < 0.001. The effect of gender is marginally significant, with a p-value of 0.06. In view of the rather monotone patterns of the coefficients for these covariates, we conducted the proposed constancy tests using the weight function Θ̃(t) = I[t < (l + u)/2]. These analyses show that the effects of gender, FH, dx90 and dx94 may vary across τ; the corresponding p-values are 0.002, 0.006, < 0.001, and < 0.001, respectively, confirming the visual trends illustrated in Figure 2. These results have some interesting clinical implications. The earlier acquisition of first PA infections in females with CF reported repeatedly in the literature (Lai et al., 2004, Yan et al., 2009) is not uniform across the entire female population; this gender effect is smaller among CF patients who are subject to high risk versus low risk of PA infection. The association between more recent diagnosis cohorts and shorter times to first PA infection may be explained by the increased culture frequency in these patients, which may shorten the time to detecting PA infection to a greater extent in patients with late onsets of PA infection than in those who experienced first PA infections at young ages. This may be because patients in the latter group tended to have frequent sick visits in early life, which may offset the benefit of frequent cultures in the detection of PA infection.
7. Remarks
In this paper we propose a quantile regression method for doubly censored data. The stochastic integral representation of the proposed estimating equation facilitates asymptotic studies and entails computationally simple implementations. A useful solution to the unique identifiability issue with doubly censored regression quantiles is proposed based on conditional inference. We also present an adaptation of our method to settings where left truncation is present.
As the key methodological idea of this work, we utilize the martingale structure embedded in the observed doubly censored data to construct an estimating function that takes the form of a stochastic integral of β(·). The advantages of using such an estimating function include: (1) it has expectation zero for all τ ∈ (0, 1) under the standard random censoring mechanism, i.e., (L, U) ⊥ T | Z; and (2) it enables a simple sequential algorithm to approximate the solution of the resulting estimating equation. In principle, other estimating equations that possess similar features may be used as alternative approaches.
Note that the proposed estimation procedure does not guarantee the monotonicity of Z^T β̂(τ) in τ. Nevertheless, we do not expect this to cause serious practical issues. Given the uniform consistency of β̂(τ) and bounded covariates, one may use sup_{u∈(0,τ]}{Z^T β̂(u)} instead of Z^T β̂(τ) to produce a monotone yet consistent prediction of quantile functions.
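The running-supremum adjustment described above can be sketched as follows, assuming the coefficient estimates are available as a list of vectors on an increasing τ-grid. For an estimator that is a step function on that grid, the supremum over u ∈ (0, τ] reduces to a running maximum over the grid points; the function name and the toy numbers are hypothetical.

```python
from itertools import accumulate


def monotone_quantile_prediction(z, beta_path):
    """Return raw and monotonized predictions of the linear predictor.

    z         : covariate vector (including the intercept term)
    beta_path : list of coefficient vectors, one per point of an
                increasing tau-grid
    """
    # Raw linear predictor Z^T beta(tau_k) at each grid point.
    raw = [sum(zj * bj for zj, bj in zip(z, beta)) for beta in beta_path]
    # Running maximum: sup over grid points u <= tau_k.
    monotone = list(accumulate(raw, max))
    return raw, monotone


# Toy coefficient path with one quantile crossing at the third grid point.
z = [1.0, 1.0]
beta_path = [[2.0, 0.5], [3.0, 0.0], [2.5, 0.25], [4.0, 0.5]]
raw, monotone = monotone_quantile_prediction(z, beta_path)
print(raw)
print(monotone)
```

The adjustment leaves the prediction unchanged wherever the raw path is already non-decreasing and only lifts the values at crossings.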
The double censoring mechanism, (L, U) ⊥ T given Z, is adopted in this work, and thus both L and U are allowed to depend on Z. In addition, we require that L always be observed, which is often true in registry study settings, as exemplified by the CFFPR data. Such additional information on the left censoring time contributes to identifying an appropriate martingale for constructing estimating equations. How to relax the assumption of known L nevertheless merits future research.
Based on our numerical experience, the proposed estimator is very robust to the choice of grid points. A grid-free estimation procedure may be developed by adapting the technique in Huang (2010). Given the uniform consistency of β̂(τ), one may apply linear or polynomial interpolation to the estimates at the grid points to obtain a consistent estimator of β0(τ) that is smooth in τ.
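A minimal sketch of the linear-interpolation option is given below for a single coefficient, assuming its estimates are stored at the points of an increasing τ-grid. The function name and the example values are hypothetical; a polynomial or spline interpolant could be substituted in the same place.

```python
from bisect import bisect_right


def interpolate_coefficient(tau, grid, values):
    """Linearly interpolate grid-point estimates of one coefficient at tau."""
    if not grid[0] <= tau <= grid[-1]:
        raise ValueError("tau lies outside the range of the grid")
    k = bisect_right(grid, tau) - 1  # index of the last grid point <= tau
    if k == len(grid) - 1:           # tau equals the largest grid point
        return values[-1]
    weight = (tau - grid[k]) / (grid[k + 1] - grid[k])
    return (1.0 - weight) * values[k] + weight * values[k + 1]


grid = [0.1, 0.2, 0.3]
values = [1.0, 2.0, 4.0]
print(interpolate_coefficient(0.15, grid, values))
print(interpolate_coefficient(0.3, grid, values))
```

Linear interpolation yields a continuous, piecewise-linear path in τ; a higher-order interpolant would be needed for a differentiable one.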
As suggested by one referee, a smooth B-spline estimator of β0(τ) may be derived directly based on equation (3). The computational procedure required to obtain the smooth B-spline estimator is more complex than for our step function estimator, because the representation (4) is no longer available. In addition, while we anticipate the large sample behavior of the smooth B-spline estimator will be the same as that of our β̂(τ), the asymptotic arguments we used for our β̂(τ) cannot be readily adapted, due to the more complex structure of the B-spline estimator. Development of computational techniques and rigorous asymptotic theory for the smooth B-spline estimator merits future research.
Acknowledgements
The authors thank Dr. Zhumin Zhang for her useful comments on the manuscript. This work was partially supported by National Science Foundation grants DMS-0706985 and DMS-1007660 (to Peng) and National Institutes of Health grant DK072126 (to Lai).
Footnotes
Supplementary Materials
Web Appendices A, B and C referenced in Sections 2–4 are available under the Paper Information link at the Biometrics website http://www.tibs.org/biometrics.
References
- Barrodale I, Roberts F. Solution of an overdetermined system of equations in the l1 norm. Communications of the ACM. 1974;17:319–320.
- Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66:429–436.
- Cai T, Cheng S. Semiparametric regression analysis for doubly censored data. Biometrika. 2004;91:277–290.
- Campbell Pr, White T. Newborn screening for cystic fibrosis: an opportunity to improve care and outcomes. The Journal of Pediatrics. 2005;147:S2–S5. doi: 10.1016/j.jpeds.2005.08.016.
- Chang MN, Yang GL. Strong consistency of a nonparametric estimator of the survival function with doubly censored data. The Annals of Statistics. 1987;15:1536–1547.
- Cheng S, Wei L, Ying Z. Analysis of transformation models with censored data. Biometrika. 1995;82:835–845.
- Cox D. Regression models and life-tables. Journal of the Royal Statistical Society. Series B. 1972;34:187–220.
- Emerson J, Rosenfeld M, McNamara S, Ramsey B, Gibson R. Pseudomonas aeruginosa and other predictors of mortality and morbidity in young children with cystic fibrosis. Pediatric Pulmonology. 2002;34:91–100. doi: 10.1002/ppul.10127.
- Fine JP, Yan J, Kosorok MR. Temporal process regression. Biometrika. 2004;91:683–703.
- FitzSimmons S. The changing epidemiology of cystic fibrosis. The Journal of Pediatrics. 1993;122:1–9. doi: 10.1016/s0022-3476(05)83478-x.
- Fygenson M, Ritov Y. Monotone estimating equations for censored data. The Annals of Statistics. 1994;22:732–746.
- Gehan EA. A generalized two-sample Wilcoxon test for doubly censored data. Biometrika. 1965;52:650–653.
- Grosse S, Rosenfeld M, Devine O, Lai H, Farrell PM. Potential impact of newborn screening for cystic fibrosis on child survival: a systematic review and analysis. The Journal of Pediatrics. 2006;149:362–366. doi: 10.1016/j.jpeds.2006.04.059.
- Honore B, Khan S, Powell J. Quantile regression under random censoring. Journal of Econometrics. 2002;109:67–105.
- Huang Y. Quantile calculus and censored regression. The Annals of Statistics. 2010;38:1607–1637. doi: 10.1214/09-aos771.
- Huang Y, Peng L. Accelerated recurrent time models. Scandinavian Journal of Statistics. 2009;36:636–648. doi: 10.1111/j.1467-9469.2009.00645.x.
- Jin Z, Lin D, Wei L, Ying Z. Rank-based inference for the accelerated failure time model. Biometrika. 2003;90:341–353.
- Kaplan E, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53:457–481.
- Kerem E, Reisman J, Corey M, Canny G, Levison H. Prediction of mortality in patients with cystic fibrosis. New England Journal of Medicine. 1992;326:1187–1191. doi: 10.1056/NEJM199204303261804.
- Koenker R, Bassett G. Regression quantiles. Econometrica. 1978;46:33–50.
- Lai H, Cheng Y, Cho H, Kosorok M, Farrell P. Association between initial disease presentation, lung disease outcomes, and survival in patients with cystic fibrosis. American Journal of Epidemiology. 2004;159:537–546. doi: 10.1093/aje/kwh083.
- Louis T. Nonparametric analysis of an accelerated failure time model. Biometrika. 1981;68:381–390.
- Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep. 1966;50:163–170.
- Neocleous T, Branden KV, Portnoy S. Correction to censored regression quantiles by S. Portnoy. Journal of the American Statistical Association. 2006;101:860–861.
- Peng L, Huang Y. Survival analysis with temporal covariate effects. Biometrika. 2007;94:719–733.
- Peng L, Huang Y. Survival analysis with quantile regression models. Journal of the American Statistical Association. 2008;103:637–649.
- Portnoy S. Censored regression quantiles. Journal of the American Statistical Association. 2003;98:1001–1012.
- Portnoy S, Lin G. Asymptotics for censored regression quantiles. Journal of Nonparametric Statistics. 2010;22:115–130.
- Powell J. Least absolute deviations estimation for the censored regression model. Journal of Econometrics. 1984;25:303–325.
- Prentice R. Linear rank tests with right-censored data. Biometrika. 1978;65:167–179.
- Ren J-J. Regression m-estimators with non-i.i.d doubly censored data. The Annals of Statistics. 2003;31:1186–1219.
- Ren J-J. Weighted empirical likelihood in some two-sample semiparametric models with various types of censored data. The Annals of Statistics. 2008;36:147–166.
- Ren J-J, Gu M. Regression m-estimators with doubly censored data. The Annals of Statistics. 1997;25:2638–2664.
- Ritov Y. Estimation in a linear regression model with censored data. The Annals of Statistics. 1990;18:303–328.
- Sun J. The Statistical Analysis of Interval-Censored Failure Time Data. Springer; 2006.
- Tsai W, Jewell N, Wang M. A note on the product-limit estimator under right censoring and left truncation. Biometrika. 1987;74:883–886.
- Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics. 1990;18:354–372.
- Turnbull BW. Nonparametric estimation of a survivorship function with doubly censored data. Journal of the American Statistical Association. 1974;69:169–173.
- Wei L, Gail M. Nonparametric estimation for a scale-change with censored observations. Journal of the American Statistical Association. 1983;78:382–388.
- Wei L, Ying Z, Lin D. Linear regression analysis of censored survival data based on rank tests. Biometrika. 1990;77:845–851.
- Yan J, Cheng Y, Fine JP, Lai HJ. Uncovering symptom progression history from disease registry data with application to young cystic fibrosis patients. Biometrics. 2009;66:594–602. doi: 10.1111/j.1541-0420.2009.01288.x.
- Yang S. Censored median regression using weighted empirical survival and hazard functions. Journal of the American Statistical Association. 1999;94:137–145.
- Ying Z, Jung S, Wei L. Survival analysis with median regression models. Journal of the American Statistical Association. 1995;90:178–184.
- Zhang C-H, Li X. Linear regression with doubly censored data. The Annals of Statistics. 1996;24:2720–2743.