Published in final edited form as: Biometrika. 2018 Oct 17;105(4):891–903. doi: 10.1093/biomet/asy050

The Change-Plane Cox Model

Susan Wei 1, Michael R Kosorok 2

Summary

We propose a projection pursuit technique in survival analysis for finding lower-dimensional projections that exhibit differentiated survival outcome. This idea is formally introduced as the change-plane Cox model, a non-regular Cox model with a change-plane in the covariate space dividing the population into two subgroups whose hazards are proportional. The proposed technique offers a potential framework for principled subgroup discovery. Estimation of the change-plane is accomplished via likelihood maximization over a data-driven sieve constructed using sliced inverse regression. Consistency of the sieve procedure for the change-plane parameters is established. In simulations the sieve estimator demonstrates better classification performance for subgroup identification than alternatives.

Keywords: Latent Supervised Learning, Projection Pursuit, Random Projection, Sieve Estimation, Sliced Inverse Regression, Subgroup Discovery

1. Introduction

Projection pursuit, the analysis of high-dimensional data via its lower-dimensional projections, is a common tool in exploratory data analysis. The idea is to search for projections that reveal interesting structure in the data. In this work, we present a projection pursuit technique in survival analysis where a projection is considered interesting if it leads to a separation of survival outcomes. The proposed technique is based on the change-plane Cox model, set forth below.

Let (X, Z, U) be a random vector of covariates, where $X \in \mathbb{R}^p$, $Z \in \mathbb{R}^{q_1}$, and $U \in \mathbb{R}^{q_2}$. Let $\mathbb{S}^p$ be the collection of unit vectors in $\mathbb{R}^p$. The following assumptions constitute what shall be called the change-plane Cox model:

Assumption 1. The hazard function of the true survival time T has the form

\[ \lambda(t \mid X, Z, U) = \exp\{\beta_1^T Z + \beta_2\, 1(\omega^T X \ge \gamma) + \beta_3^T Z\, 1(\omega^T X \ge \gamma) + \beta_4^T U\}\, \lambda(t), \qquad (1) \]

where $\omega$ is an element of $\mathbb{S}^p$, $\gamma$ is in some known interval $[a, b]$, $\beta = (\beta_1, \ldots, \beta_4)$ is the vector of regression parameters, with at least one of $\beta_2$ or $\beta_3$ nonzero for model identifiability, and $\lambda(t)$ is an unknown baseline hazard function;

Assumption 2. The survival time T with hazard function (1) may be subject to right-censoring at a censoring time C which, conditional on (X, Z, U), is independent of T;

Assumption 3. X and (Z, U) are independent.

We observe the covariate vector (X, Z, U), the censored time $\tilde T = \min(T, C)$, and the censoring indicator $\delta$, where $\delta = 1$ if $T \le C$ and $\delta = 0$ otherwise. By seeking the change-plane, given by $\omega^T X = \gamma$, we accomplish our goal of finding a lower-dimensional projection of X that reveals two subgroups with differentiated survival.

To fix ideas, imagine X to be a set of biomarkers potentially predictive of survival, Z a categorical treatment variable, and U a set of baseline covariates such as age or gender. In this case, the regression coefficient $\beta_3$ represents the interaction effect between treatment and the subgroup indicator $1(\omega^T X \ge \gamma)$. A significant $\beta_3$ is of practical interest since it would suggest the presence of treatment heterogeneity.

Rigorous assessment of $\beta$'s significance is likely to be challenging considering results in Pons (2003). There, it is shown for a certain change-point Cox model, which may be viewed as a special case of (1), that the maximum partial likelihood estimator is $n$-consistent for the change-point but only root-$n$ consistent for the regression coefficients. Such non-regularity can be expected in the change-plane Cox model as well. Leaving distributional theory to future work, we propose in the meantime a resampling procedure in the Supplementary Material that serves as a heuristic proxy for assessing the significance of $\beta$.

2. Methodology

2·1. Overview

Our aim in this section is to propose an estimation scheme for the change-plane parameters in (1) based on a sample of n independent and identically distributed replicates of (R, T, δ) where R = (X, Z, U) denotes the full covariate set. The maximum partial likelihood estimator of the change-plane parameters can incur overfitting even when the dimension of X is moderately high, e.g., p = 25. This consideration leads us to employ a regularization technique known as Grenander’s method of sieves (Grenander, 1981), in which maximization takes place over an approximating subset of the parameter space called a sieve. It is desired that the sieve be dense, in a sense that will be later made rigorous in Definition 2. Interestingly, as demonstrated by Geman & Hwang (1982) in the context of nonparametric density estimation, regularization of the likelihood via the method of sieves may produce consistent estimators even when the full maximum likelihood estimator is not.

A sieve maximization scheme for fitting (1) is as follows. Collect the parameters into θ = (β, ω, γ). The sample log partial likelihood under (1) is

\[ L_n(\theta) = n^{-1} \sum_{i=1}^n \biggl( \delta_i\, \eta(R_i, \theta) - \delta_i \log\Bigl[ \sum_{j:\, \tilde T_j \ge \tilde T_i} n^{-1} \exp\{\eta(R_j, \theta)\} \Bigr] \biggr), \qquad (2) \]

where $\eta(R, \theta) = \beta_1^T Z + \beta_2\, 1(\omega^T X \ge \gamma) + \beta_3^T Z\, 1(\omega^T X \ge \gamma) + \beta_4^T U$. The factor $n^{-1}$ is added for consistency with the empirical process notation in Section 3. Now, let

\[ M_n(\omega, \gamma) = L_n\{\hat\beta_n(\omega, \gamma), \omega, \gamma\}, \qquad (3) \]

where the quantity $\hat\beta_n(\omega, \gamma) = \arg\max_\beta L_n(\beta, \omega, \gamma)$ is uniquely defined and can be found via Newton's method. We shall focus on the estimation of $\omega$ since, once it is determined, the other parameters in (1) can be estimated by profiling.

Definition 1. For a sieve $\Omega_n \subseteq \mathbb{S}^p$, the corresponding sieve estimator for $\omega$ in (1) is

\[ \hat\omega(\Omega_n) = \arg\max_{\omega \in \Omega_n} M_n\{\omega, \tilde\gamma(\omega)\}, \]

where

\[ \tilde\gamma(\omega) = \arg\max_{\gamma \in [a, b]} M_n(\omega, \gamma). \qquad (4) \]

The success of the sieve estimator hinges on the specification of the sieve. The remainder of Section 2 describes the construction of a data-driven sieve.
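To make the estimation scheme concrete, the following minimal Python sketch implements Definition 1 for the reduced model of Section 4, in which Z = 1 and U = 0 so that $\eta$ contains only the subgroup indicator. The sketch is ours and not part of the original implementation: it relies on the lifelines package for the Cox fits, and the helper names profile_loglik and sieve_estimator are hypothetical.

```python
# Sketch of the sieve estimator of Definition 1 for the reduced model
# (Z = 1, U = 0): profile the partial likelihood over beta with an ordinary
# Cox fit, maximize over gamma on a grid and over omega in the sieve.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def profile_loglik(omega, gamma, X, time, event):
    """M_n(omega, gamma): log partial likelihood maximized over beta."""
    df = pd.DataFrame({
        "subgroup": (X @ omega >= gamma).astype(float),  # 1(omega^T X >= gamma)
        "time": time,
        "event": event,
    })
    if df["subgroup"].nunique() < 2:       # degenerate split: no information
        return -np.inf
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    return cph.log_likelihood_

def sieve_estimator(sieve, X, time, event, gamma_grid):
    """argmax over omega in the sieve of M_n{omega, gamma_tilde(omega)}."""
    best, best_val = None, -np.inf
    for omega in sieve:                    # each omega is a unit vector in R^p
        for gamma in gamma_grid:           # gamma ranges over [a, b]
            val = profile_loglik(omega, gamma, X, time, event)
            if val > best_val:
                best, best_val = (omega, gamma), val
    return best                            # (omega_hat, gamma_tilde(omega_hat))
```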

2·2. Initialization of the sieve

Algorithm 1 details the construction of an initial sieve consisting of vectors that represent possible change-planes in the X covariate space. Consideration of computation time leads to the particular choices in Algorithm 1, such as the number of clusters K, chosen deliberately so that $|\Omega_0|$ is linear in n. Similarly, the discarding of clusters with fewer than four elements and the downsampling of clusters with more than ten elements are adopted merely for computational gain. To get a sense of the size of $\Omega_0$, consider that Algorithm 1 applied to the simulations in Section 4 results in $|\Omega_0| \approx 3000$ for sample size n = 100. If computation time is not a concern, a larger $\Omega_0$ in Algorithm 1 has been observed to improve the empirical performance of the overall sieve procedure (Algorithm 2 in the next section).

Algorithm 1. Initial sieve $\Omega_0$.
Input: $\{X_1, \ldots, X_n\}$.
Step 1. Initialize $\Omega_0$ to the empty set; set K to n/10.
Step 2. Partition the data $\{X_1, \ldots, X_n\}$ into K clusters using K-means clustering.
Step 3. Discard clusters with fewer than four elements; retain ten elements at random for clusters with more than ten elements.
Step 4. For each remaining cluster, and for each non-overlapping partition of the cluster into two parts $P_1$ and $P_2$, add to $\Omega_0$ the unit-length vector that connects the centroids of $P_1$ and $P_2$.
Output: $\Omega_0$.
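A minimal Python sketch of Algorithm 1 is given below, assuming scikit-learn for the K-means step; the function name initial_sieve is ours, and directions arising from complementary partitions (which differ only in sign) are retained for simplicity.

```python
# Sketch of Algorithm 1: cluster the X_i, then add to Omega_0 the unit vector
# joining the centroids of every two-part split of each retained cluster.
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans

def initial_sieve(X, rng=None):
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    K = max(n // 10, 1)                               # K = n/10 clusters
    labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X)
    sieve = []
    for k in range(K):
        cluster = X[labels == k]
        if len(cluster) < 4:                          # discard small clusters
            continue
        if len(cluster) > 10:                         # downsample large clusters
            cluster = cluster[rng.choice(len(cluster), 10, replace=False)]
        idx = list(range(len(cluster)))
        for r in range(1, len(cluster)):              # splits into parts P1, P2
            for P1 in combinations(idx, r):
                P2 = [i for i in idx if i not in P1]
                v = cluster[list(P1)].mean(axis=0) - cluster[P2].mean(axis=0)
                norm = np.linalg.norm(v)
                if norm > 0:
                    sieve.append(v / norm)            # unit-length direction
    return sieve
```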

2·3. Updating the sieve using sliced inverse regression

We next update $\Omega_0$ by incorporating survival information using sliced inverse regression (Li, 1991). We first briefly review the technique. Sliced inverse regression is based on a model in which a response variable S and a covariate vector X in $\mathbb{R}^p$ satisfy

\[ S = f(\kappa_1^T X, \ldots, \kappa_k^T X, \epsilon) \qquad (5) \]

for unknown constant vectors κj’s of the same dimension as X, unknown function f, and noise term ϵ that is independent of X. Below is the linearity condition, satisfied by X with elliptically symmetric distributions, used to justify sliced inverse regression.

Condition 1. For any $b \in \mathbb{R}^p$, $E(b^T X \mid \omega_0^T X)$ is linear in $\omega_0^T X$.

If X satisfies Condition 1, then for every s the centered inverse regression curve, $E(X \mid S = s) - E(X)$, lies in the span of $\{\Sigma\kappa_1, \ldots, \Sigma\kappa_k\}$, where $\Sigma = \mathrm{cov}(X)$. Thus, the space spanned by the k eigenvectors of the covariance matrix of $E(X \mid S)$ associated with the k largest eigenvalues coincides with the span of $\{\Sigma\kappa_1, \ldots, \Sigma\kappa_k\}$. Then clearly, the span of $\{\kappa_1, \ldots, \kappa_k\}$ itself can be obtained through standardization by $\Sigma^{-1}$. The inverse regression curve is estimated empirically by slicing the range of S into H nonoverlapping intervals $I_h$, $h = 1, \ldots, H$, and computing the sample version of $E(X \mid S \in I_h)$.
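To fix notation, a minimal Python sketch of basic sliced inverse regression for a fully observed response S is given below; the function name sir_directions is ours.

```python
# Sketch of basic sliced inverse regression (Li, 1991): slice the range of S,
# form the weighted covariance of the slice means of X, take its leading
# eigenvectors and standardize by the inverse covariance of X.
import numpy as np

def sir_directions(X, S, H=10, k=1):
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # centered covariates
    Sigma = np.cov(X, rowvar=False)
    edges = np.quantile(S, np.linspace(0, 1, H + 1))
    M = np.zeros((p, p))
    for h in range(H):
        if h < H - 1:
            in_slice = (S >= edges[h]) & (S < edges[h + 1])
        else:
            in_slice = (S >= edges[h])
        if not in_slice.any():
            continue
        p_h = in_slice.mean()                    # slice probability
        m_h = Xc[in_slice].mean(axis=0)          # E(X | S in I_h) - E(X)
        M += p_h * np.outer(m_h, m_h)            # covariance of the inverse regression curve
    vals, vecs = np.linalg.eigh(M)
    top = vecs[:, np.argsort(vals)[::-1][:k]]    # k largest-eigenvalue eigenvectors
    return np.linalg.solve(Sigma, top)           # standardize by Sigma^{-1}
```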

The subscript zero will be used to denote the true parameter value under (1). Since T with hazard function (1) satisfies (5) with k = 1, the recovery of $\omega_0$ in the change-plane Cox model can be accomplished via an eigendecomposition of the covariance matrix of $E(X \mid T)$, followed by standardization using $\Sigma^{-1}$. To avoid issues in estimating $\Sigma$ and $\Sigma^{-1}$ using their sample versions, we assume throughout the paper that n > p. However, rather than slicing on T alone, we slice simultaneously on T and $1\{\omega^T X \ge \tilde\gamma(\omega)\}$, where $\omega \in \Omega_0$. Specifically, let $0 = t_1 < \cdots < t_H < t_{H+1} = \infty$ be a partition of the positive real line into non-overlapping intervals $I_h = [t_h, t_{h+1})$, $h = 1, \ldots, H$. Let $\nu(\omega)$ denote the largest-eigenvalue eigenvector of the weighted covariance matrix

\[ V(\omega) = \sum_{l=0}^{1} \sum_{h=1}^{H} p_{hl}(\omega)\, \{m_{hl}(\omega) - E(X)\}\{m_{hl}(\omega) - E(X)\}^T, \qquad (6) \]

where

\[ m_{hl}(\omega) = E[X \mid T \in I_h,\ 1\{\omega^T X \ge \tilde\gamma(\omega)\} = l], \qquad p_{hl}(\omega) = \mathrm{pr}[T \in I_h,\ 1\{\omega^T X \ge \tilde\gamma(\omega)\} = l]. \]

Assuming Condition 1 holds, the rescaled eigenvector $\Sigma^{-1}\nu(\omega)$ is proportional to the desired $\omega_0$.

We now describe an estimate of V (ω) that accounts for censoring by employing the conditioning argument in Li et al. (1999). First, we have

\[ m_{h1}(\omega) = \frac{E[X\,1\{T \ge t_h,\ \omega^T X \ge \tilde\gamma(\omega)\}] - E[X\,1\{T \ge t_{h+1},\ \omega^T X \ge \tilde\gamma(\omega)\}]}{E[1\{T \ge t_h,\ \omega^T X \ge \tilde\gamma(\omega)\}] - E[1\{T \ge t_{h+1},\ \omega^T X \ge \tilde\gamma(\omega)\}]}, \]

and each expectation of the form $E[X\,1\{T \ge t,\ \omega^T X \ge \tilde\gamma(\omega)\}]$ can be further decomposed as

\[ E[X\,1\{T \ge t,\ \omega^T X \ge \tilde\gamma(\omega)\}] = E[X\,1\{\tilde T \ge t,\ \omega^T X \ge \tilde\gamma(\omega)\}] + E[X\,1\{\tilde T < t,\ \delta = 0,\ \omega^T X \ge \tilde\gamma(\omega)\}\, \alpha(\tilde T, t, X)], \]

where

\[ \alpha(t, t', X) = \frac{\mathrm{pr}(T \ge t' \mid X)}{\mathrm{pr}(T \ge t \mid X)}, \qquad t < t', \qquad (7) \]

can be interpreted as a weight adjusting for the presence of censoring. This decomposition allows us to rewrite the numerator of $m_{h1}(\omega)$ as

\[
\begin{aligned}
E[X\,1\{T \ge t_h,\ \omega^T X \ge \tilde\gamma(\omega)\}] - E[X\,1\{T \ge t_{h+1},\ \omega^T X \ge \tilde\gamma(\omega)\}]
&= E[X\,1\{t_h \le \tilde T < t_{h+1},\ \omega^T X \ge \tilde\gamma(\omega)\}] \\
&\quad + E[X\,1\{\tilde T < t_h,\ \delta = 0,\ \omega^T X \ge \tilde\gamma(\omega)\}\,\alpha(\tilde T, t_h, X)] \\
&\quad - E[X\,1\{\tilde T < t_{h+1},\ \delta = 0,\ \omega^T X \ge \tilde\gamma(\omega)\}\,\alpha(\tilde T, t_{h+1}, X)].
\end{aligned}
\]
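The decomposition follows from the conditioning argument: suppressing the indicator $1\{\omega^T X \ge \tilde\gamma(\omega)\}$, we have $1\{T \ge t\} = 1\{\tilde T \ge t\} + 1\{T \ge t,\ \tilde T < t\}$, and the second event forces $\delta = 0$, so that

\[ E[X\,1\{T \ge t\}] = E[X\,1\{\tilde T \ge t\}] + E\bigl[X\,1\{\tilde T < t,\ \delta = 0\}\, \mathrm{pr}(T \ge t \mid T > \tilde T, X)\bigr], \]

where, by the conditional independence of C and T given the covariates, $\mathrm{pr}(T \ge t \mid T > \tilde T, X) = \mathrm{pr}(T \ge t \mid X)/\mathrm{pr}(T \ge \tilde T \mid X) = \alpha(\tilde T, t, X)$ whenever $\tilde T < t$.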

Thus we can slice on the observed survival time $\tilde T$ rather than T. Let

\[ \hat c_{i,h1}(\omega) = 1\{t_h \le \tilde T_i < t_{h+1},\ \omega^T X_i \ge \tilde\gamma(\omega)\} + 1\{\tilde T_i < t_h,\ \delta_i = 0,\ \omega^T X_i \ge \tilde\gamma(\omega)\}\, \hat\alpha(\tilde T_i, t_h, X_i) - 1\{\tilde T_i < t_{h+1},\ \delta_i = 0,\ \omega^T X_i \ge \tilde\gamma(\omega)\}\, \hat\alpha(\tilde T_i, t_{h+1}, X_i), \]

where $\hat\alpha(\cdot, \cdot, \cdot)$ denotes a nonparametric estimate of (7) to be discussed in Section 2·4. To estimate $m_{h1}$ and $p_{h1}$, we use the sample moments

\[ \hat m_{h1}(\omega) = \sum_{i=1}^n X_i\, \hat c_{i,h1}(\omega) \Big/ \sum_{i=1}^n \hat c_{i,h1}(\omega) \]

and $\hat p_{h1}(\omega) = n^{-1} \sum_{i=1}^n \hat c_{i,h1}(\omega)$, respectively. The estimation of $m_{h0}$ and $p_{h0}$ is analogous. These components are incorporated into the data-driven sieve detailed in Algorithm 2. Let the resulting sieve be denoted $\hat\Omega_n$. The sieve estimator associated with it will be written $\hat\omega(\hat\Omega_n)$, following the notation introduced in Definition 1.

Algorithm 2. Data-driven sieve $\hat\Omega_n$ based on sliced inverse regression.
Input: $(X_i, \tilde T_i, \delta_i)$, $i = 1, \ldots, n$; H, the number of slices; $\Omega_0$, the initial sieve; $\hat\alpha(\cdot, \cdot, \cdot)$, the censoring weight estimate.
Step 1. Initialize $\hat\Omega_n \subseteq \mathbb{S}^p$ to the empty set.
Step 2. Find $\hat\Sigma$, the empirical covariance matrix based on $X_1, \ldots, X_n$.
Step 3. Set $\{t_h\}$ by dividing the observed range of the $\tilde T_i$ into H equal intervals, with $t_1 = 0$ and $t_{H+1} = \infty$.
Step 4. Find $\hat\alpha(\tilde T_i, t_{h+1}, X_i)$, $i = 1, \ldots, n$; $h = 1, \ldots, H$.
Step 5. For each $\omega \in \Omega_0$: find $\hat V_n(\omega) = \sum_{l=0}^{1}\sum_{h=1}^{H} \hat p_{hl}(\omega)\{\hat m_{hl}(\omega) - \bar X\}\{\hat m_{hl}(\omega) - \bar X\}^T$; find the largest-eigenvalue eigenvector of $\hat V_n(\omega)$, denoted $\hat\nu_n(\omega)$; add $\hat\Sigma^{-1}\hat\nu_n(\omega)$, normalized to unit length, to $\hat\Omega_n$.
Output: $\hat\Omega_n$.
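For illustration, the inner loop of Algorithm 2 for a single candidate $\omega$ can be sketched as follows; the sketch is ours, gamma stands for $\tilde\gamma(\omega)$ from (4), edges contains $t_1 = 0, \ldots, t_{H+1} = \infty$, and alpha_hat is any estimate of (7), such as the stand-in sketched in Section 2·4.

```python
# Sketch of one pass of Algorithm 2: censoring-adjusted slice weights c_{i,hl},
# the weighted covariance V_n(omega) of (6), and the rescaled leading eigenvector.
import numpy as np

def update_direction(omega, gamma, X, time, event, edges, alpha_hat):
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False)                   # empirical covariance of X
    Xbar = X.mean(axis=0)
    side = (X @ omega >= gamma).astype(int)           # 1{omega^T X >= gamma_tilde(omega)}
    cens = (event == 0)
    V = np.zeros((p, p))
    for l in (0, 1):
        for h in range(len(edges) - 1):
            t_lo, t_hi = edges[h], edges[h + 1]
            # c_{i,hl}: interval indicator plus censoring corrections
            c = ((time >= t_lo) & (time < t_hi)).astype(float)
            a_lo = np.array([alpha_hat(ti, t_lo, xi) for ti, xi in zip(time, X)])
            a_hi = np.array([alpha_hat(ti, t_hi, xi) for ti, xi in zip(time, X)])
            c += np.where(cens & (time < t_lo), a_lo, 0.0)
            c -= np.where(cens & (time < t_hi), a_hi, 0.0)
            c = c * (side == l)
            if c.sum() <= 0:
                continue
            m_hl = (X * c[:, None]).sum(axis=0) / c.sum()   # hat m_{hl}(omega)
            p_hl = c.sum() / n                              # hat p_{hl}(omega)
            V += p_hl * np.outer(m_hl - Xbar, m_hl - Xbar)
    vals, vecs = np.linalg.eigh(V)
    nu = vecs[:, np.argmax(vals)]                     # largest-eigenvalue eigenvector
    new_omega = np.linalg.solve(Sigma, nu)            # standardize by Sigma^{-1}
    return new_omega / np.linalg.norm(new_omega)
```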

Algorithm 2 is rather insensitive to H and we recommend setting it to 10. Far more critical to Algorithm 2 is the estimation of the censoring weight, the focus of the next section.

2·4. Estimation of censoring weights

The estimation of the censoring weight α in (7) reduces to that of pr(TtX), the conditional survival function of T. We shall consider two nonparametric estimates of the latter, and hence of (7) itself. The first is the classic nonparametric kernel estimator in Beran (1981), which is described in equations (3.11) to (3.13) of Li et al. (1999) in notation similar to our setting. The corresponding censoring weight estimate shall also be referred to as Beran’s kernel estimate.

Despite its simplicity, the performance of Beran’s kernel estimate quickly deteriorates as the dimension of X increases. This limitation may be overcome by modern machine learning techniques. We shall employ the recursively imputed survival tree method proposed by Zhu & Kosorok (2012), a powerful, albeit complex, method for estimating the conditional survival function for censored data.

The recursively imputed survival tree combines imputation of censored observations with the idea of extremely randomized trees. Like the random forest, the extremely randomized tree selects a subset of candidate features at random. Unlike the random forest, however, it does not search for the most discriminative cutpoints; instead, a cutpoint is drawn at random for each candidate covariate. The imputation of censored observations enables more terminal nodes, and thus more complex trees, to be constructed. Full details of the recursively imputed survival tree algorithm are given in the Supplementary Material. We have found that the recursively imputed survival tree estimate of $\alpha$ leads to better performance of Algorithm 2 than Beran's kernel estimate as soon as the dimension of X increases beyond a few dimensions, e.g., p > 5.
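As an illustration of how the censoring weight might be computed in practice, the sketch below uses a random survival forest from the scikit-survival package as a stand-in for the recursively imputed survival tree; the substitution and the function name fit_censoring_weight are ours.

```python
# Sketch of the censoring weight (7): estimate the conditional survival function
# of T with a random survival forest (a stand-in for the recursively imputed
# survival tree) and form alpha(t, t', X) = pr(T >= t' | X) / pr(T >= t | X).
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

def fit_censoring_weight(X, time, event):
    y = Surv.from_arrays(event=event.astype(bool), time=time)
    rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10,
                               random_state=0).fit(X, y)

    def alpha_hat(t, t_prime, x):
        if np.isinf(t_prime):                 # pr(T >= infinity | X) = 0
            return 0.0
        surv = rsf.predict_survival_function(np.asarray(x).reshape(1, -1))[0]
        lo, hi = surv.x[0], surv.x[-1]        # step function is defined on [lo, hi]
        denom = float(surv(np.clip(t, lo, hi)))
        numer = float(surv(np.clip(t_prime, lo, hi)))
        return numer / denom if denom > 0 else 0.0

    return alpha_hat
```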

3. Consistency

Theorem 1 establishes the consistency of the sieve estimator corresponding to a general sieve Ωn under the following conditions:

Condition 2. The parameter $\theta_0 = (\beta_0, \omega_0, \gamma_0)$ lies in a compact subset $\Theta = \Theta_1 \times \Theta_2$ of $\mathbb{R}^{2q_1+q_2+1} \times \mathbb{S}^p \times [a, b]$, where $\Theta_1$ and $\Theta_2$ are compact subsets of $\mathbb{R}^{2q_1+q_2+1}$ and $\mathbb{S}^p \times [a, b]$, respectively.

Condition 3. The covariate X has a continuous distribution and the projection $\omega_0^T X$ has a strictly bounded and positive density f over [a, b].

Condition 4. pr(C = 0) = 0 and, for some $0 < \tau < \infty$, $\mathrm{pr}(C \ge \tau \mid X) = \mathrm{pr}(C = \tau \mid X) > 0$ almost surely.

Condition 5. The variables Z and U lie in bounded sets.

Conditions 2 and 3 are rather technical and simplify the proof. Condition 4 is common in survival analysis, though it is not precisely true in practice, e.g. in a clinical trial with staggered entry. Condition 5 is needed for an application of the dominated convergence theorem. The statement of Theorem 1 requires a definition first.

Definition 2. A sieve $\Omega_n \subseteq \mathbb{S}^p$ is called dense for (1) if there exists a sequence $\omega_n \in \Omega_n$ such that $\{\omega_n, \tilde\gamma(\omega_n)\}$ converges to $(\omega_0, \gamma_0)$ as $n \to \infty$.

Theorem 1 (Consistency of the general sieve estimator). Suppose Conditions 2–5 hold and $\Omega_n \subseteq \mathbb{S}^p$ is a dense sieve for (1). If $\hat\omega_n = \hat\omega(\Omega_n)$ denotes the sieve estimator, then $\{\hat\omega_n, \tilde\gamma(\hat\omega_n)\}$ is consistent for $(\omega_0, \gamma_0)$ as $n \to \infty$.

The proof of Theorem 1 can be found in the Appendix. Next, Corollary 1 establishes the consistency of the sieve estimator corresponding to Algorithm 2 under Condition 1 and the following meta-condition.

Condition 6. The censoring weight estimate $\hat\alpha$ is such that, for every $\omega \in \Omega_0$, $\hat m_{hl}(\omega)$ is consistent for $m_{hl}(\omega)$ as $n \to \infty$ for $h = 1, \ldots, H$ and $l = 0, 1$.

Though we will limit our discussion of Condition 6 to the two estimators considered in Section 2·4, its specification is left deliberately broad so as to allow for other possible censoring weight estimators.

For Beran’s kernel estimate, the arguments in the proof of Lemma 3.1 in Li et al. (1999) can be used to verify Condition 6. The application of Lemma 3.1 requires regularity conditions labeled therein as (B.1), (B.3), (B.5) and (B.8), which mostly pertain to the relationship between the bandwidth rate and the bias and variance terms of the kernel estimate.

As for the recursively imputed survival tree estimate of $\alpha$, Theorem 1 of Cui et al. (2017) addresses the consistency of estimating the underlying hazard function using a similar survival tree-based method. In both cases, a single tree is partitioned enough so that the failure and censoring observations in the terminal nodes are approximately independent while a sufficient number of observations is maintained. In Theorem 1 of Cui et al. (2017), this is used to establish consistency of the resulting local Nelson–Aalen estimators of the conditional hazard. For the recursively imputed survival tree, the Kaplan–Meier estimator (approximately, via the Monte Carlo EM algorithm) is used instead of the Nelson–Aalen estimator.

For both Lemma 3.1 in Li et al. (1999) and Theorem 1 in Cui et al. (2017), suitable smoothness on the conditional survival function is most convenient in ascertaining the key conditions. Under Condition 3, the region where the smoothness is not met by the change-plane Cox model, i.e. the change-plane, can be bounded by a region with arbitrarily small probability.

Corollary 1 (Consistency of the sieve estimator corresponding to Algorithm 2). Let $\hat\Omega_n$ denote the sieve produced by Algorithm 2 for some nonempty initial sieve $\Omega_0$. Suppose Conditions 1–6 hold. If $\hat\omega_n = \hat\omega(\hat\Omega_n)$ denotes the sieve estimator, then $\{\hat\omega_n, \tilde\gamma(\hat\omega_n)\}$ is consistent for $(\omega_0, \gamma_0)$ as $n \to \infty$.

Proof. Let ω ∈ Ω0. Through conditioning, we have the identity

\[ m_{h1}(\omega) = E\{X \mid T \in [t_h, t_{h+1}),\ \omega^T X \ge \tilde\gamma(\omega)\} = E\{E(X \mid T) \mid T \in [t_h, t_{h+1}),\ \omega^T X \ge \tilde\gamma(\omega)\}. \]

A similar identity holds for $m_{h0}$. By Condition 1, $\nu(\omega)$, the largest-eigenvalue eigenvector of (6), is a scalar multiple of $\Sigma\omega_0$. By Condition 6, the individual components of $\hat V_n(\omega)$ are consistent for their theoretical counterparts. Thus $\hat V_n(\omega)$ is consistent for $V(\omega)$ and hence the eigenvector $\hat\nu_n(\omega)$ is consistent for $\nu(\omega)$ as $n \to \infty$. Thus, the sieve $\hat\Omega_n$ is dense and Theorem 1 yields the desired result. □

4. Simulation study

In this section, we use simulation to compare the sieve estimator to two alternatives. To focus on subgroup identification in the change-plane Cox model, we set Z = 1 and U = 0 in (1). This yields the reduced change-plane Cox model, with hazard function

\[ \lambda(t \mid X) = \exp\{\beta\, 1(\omega^T X \ge \gamma)\}\, \lambda(t). \]

Subgroup identification in this model can be viewed as a type of latent supervised learning (Wei & Kosorok, 2013) where the right-censored survival time plays the role of a surrogate training label.

The first alternative we consider is the double-slicing procedure proposed in Li et al. (1999), which simultaneously slices on the censored survival time and the censoring indicator. A critical assumption is that the censoring time also satisfies a sliced inverse regression representation, i.e.

\[ C = g(\kappa_1^T X, \ldots, \kappa_c^T X, \epsilon') \qquad (8) \]

where g and $\epsilon'$ are unspecified, and $\epsilon'$ is independent of X. As Li's double-slicing method does not automatically produce an estimate of $\gamma$, we obtain one by applying $\tilde\gamma$ in (4) to the estimated $\omega$. A complete description of Li's double-slicing method can be found in the Supplementary Material.

The second alternative we consider is the standard survival tree implemented using the rpart package in R (Therneau & Atkinson, 2018). We use the rpart tree to produce a direct estimate of subgroup membership since one cannot be obtained for the change-plane itself. This is done by thresholding the hazard rate at unity to divide the terminal nodes of the rpart tree into two subgroups. The rpart survival tree should not be confused with the recursively imputed survival tree; the latter is used in this paper solely for the estimation of $\alpha$. It must also be said that rpart was implemented using default rather than carefully tuned parameters.

The sieve estimator corresponding to Algorithm 2 is implemented as follows. The initial sieve Ω0 is produced according to Algorithm 1 with K = n/10. The recursively imputed survival tree is used to estimate the conditional survival function of T and, in turn, the censoring weight α.

The simulation setup is as follows. We draw n = 100 independent and identically distributed observations (X, T, δ) from the reduced change-plane Cox model with parameters

\[ \beta = \log 10, \quad \lambda(t) = 1, \quad X \sim N(0, I_p), \quad \omega = \bigl(\underbrace{p^{-1/2}, \ldots, p^{-1/2}}_{\lceil p/2 \rceil}, \underbrace{-p^{-1/2}, \ldots, -p^{-1/2}}_{p - \lceil p/2 \rceil}\bigr), \quad \gamma = 1/4, \]

and one of three censoring mechanisms in Table 1. As this setup results in exponential survival times on either side of the change-plane with all components of ω nonzero, we call it the abundant exponential simulation.

Table 1:

Censoring mechanisms

Name: Distribution
independent: $C \sim \mathrm{uniform}(0, 10)$
linear: $C \sim \min\{\mathrm{uniform}(0, 31.97), 20\}\, 1(\omega^T X \ge \gamma) + \min\{\mathrm{uniform}(0, 3.2), 2\}\, 1(\omega^T X < \gamma)$
nonlinear: $C \sim \mathrm{exponential}\{10^{-1} \exp(X_1 + X_2^2 + \log|X_3|)\}$

We write uniform(a, b) to denote the uniform distribution with parameters a and b and exponential(μ) to denote the exponential distribution with mean μ. The independent setting is so-called because censoring is independent of X. In the linear setting, censoring is dependent on X only through the change-plane while in the nonlinear setting censoring depends nonlinearly on X.
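For reference, the abundant exponential simulation under the independent censoring mechanism can be generated as in the sketch below; the function name simulate_abundant_exponential is ours.

```python
# Sketch of the abundant exponential simulation with independent censoring:
# exponential survival on either side of the change-plane omega^T X = gamma.
import numpy as np

def simulate_abundant_exponential(n=100, p=10, seed=0):
    rng = np.random.default_rng(seed)
    half = int(np.ceil(p / 2))
    omega = np.concatenate([np.ones(half), -np.ones(p - half)]) / np.sqrt(p)
    gamma, beta = 0.25, np.log(10)
    X = rng.standard_normal((n, p))               # X ~ N(0, I_p)
    subgroup = (X @ omega >= gamma).astype(int)   # 1(omega^T X >= gamma)
    hazard = np.exp(beta * subgroup)              # baseline hazard lambda(t) = 1
    T = rng.exponential(1.0 / hazard)             # true survival times
    C = rng.uniform(0.0, 10.0, size=n)            # independent censoring
    time = np.minimum(T, C)                       # observed censored time
    event = (T <= C).astype(int)                  # censoring indicator delta
    return X, time, event, subgroup
```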

The average misclassification rate over 100 Monte Carlo simulations on a large independent test set of the covariate X (sample size 10,000) will serve as the measure of performance. Figure 1 summarizes the classification performance of the three methods as a function of dimension p for each of the three censoring mechanisms in Table 1.

Fig. 1: Results for abundant exponential simulation. Misclassification rate over 100 Monte Carlo simulations for the sieve (solid), Li's double-slicing (dotted), and rpart tree (dashed), as a function of dimension p. Vertical bars indicate Monte Carlo simulation error.

The sieve estimator performs better than Li’s double-slicing procedure under the independent censoring mechanism, since there is no benefit to slicing on the censoring variable. In the linear censoring case, the two methods have similar performance since the sieve estimator is unlikely to provide a substantial improvement when C satisfies (8). In contrast, under the nonlinear censoring mechanism, C cannot be written as a function of a linear combination of the covariates which leads to the violation of (8) in Li’s double-slicing model. The sieve estimator slightly outperforms it in this case.

Figure 1 reveals the rpart tree has difficulty across all censoring mechanisms and dimensions, probably because the geometry of the change-plane is far from that assumed by it. When the geometry is favorable to the rpart survival tree, it can be expected to perform substantially better. An example of this can be found in the sparse exponential simulation presented in the Supplementary Material. The rpart approach is still outperformed by both the sieve estimator and Li’s double-slicing for dimensions p = 5, 10, 25. It is not until p = 50 that it shows its advantages. Nonetheless, survival tree methods for subgroup identification cannot produce subgroups that are contiguous in the covariate space, which may hamper interpretability in certain settings.

The abundant exponential simulation in this section and the sparse exponential simulation in the Supplementary Material both consider an idealized setting where the data are generated according to the reduced change-plane Cox model. The sieve estimator is seen to offer generally better classification performance than both Li's double-slicing and the rpart tree across a range of dimensions p and censoring mechanisms.

5. Future work

We originally envisioned the change-plane Cox model as a tool for performing subgroup discovery, which aims to identify subgroups with heterogeneous treatment responses from a very large pool of candidate subgroups (Lipkovich et al., 2017). Given its post-hoc nature, subgroup discovery, and more generally subgroup analysis, is notoriously controversial (Wang et al., 2007). The change-plane Cox model may provide a principled, data-driven framework for subgroup discovery when the outcome of interest is survival. However, as the data examples in the Supplementary Material highlight, several issues must be addressed before this potential can be realized.

In the Supplementary Material, we apply the full change-plane Cox model to two datasets. The significance of $\beta$ is assessed by repeatedly partitioning the data into training and test sets. Each time, only the training data are used to obtain an estimate of the change-plane parameters $\omega$ and $\gamma$. The significance of the regression coefficient $\beta$ is then assessed in the test set, ignoring the fact that the change-plane was learned from the data. For both datasets, the resampling strategy reveals that significant $\beta$ coefficients in the training data may not remain so in the test set.

Distributional theory for the parameters in the change-plane Cox model, which is currently lacking, could help identify these instances of overoptimism. For now, we recommend any application of the proposed technique always be accompanied by the resampling strategy, which seems adequate for detecting whether the subgroups discovered are real or not. A deeper issue is the challenge that data-driven approaches pose to the standard paradigm of the scientific method. When hypotheses are generated from the data, care is needed to avoid confirmation bias.


Acknowledgement

The authors are grateful to all the reviewers, especially the Associate Editor, for their meticulous readings and invaluable input. The authors would also like to thank Ruoqing Zhu for helpful comments on an early version. The second author was supported in part by grant P01 CA142538 from the US National Cancer Institute and by grant DMS-1407732 from the US National Science Foundation.

Appendix

Proof of Theorem 1. Let P denote the probability measure of $W = (R, T, \delta)$ under (1). Define the empirical measure to be $\mathbb{P}_n = n^{-1} \sum_{i=1}^n \delta_{W_i}$, where $\delta_w$ is the measure that assigns mass 1 at w and zero elsewhere. For a measurable function f, we denote $\mathbb{P}_n f = n^{-1} \sum_{i=1}^n f(W_i)$ and $Pf = \int f\, dP$. Let $\tilde W = (\tilde R, \tilde T, \tilde\delta)$ be a realization from P, independent of W. Let $\tilde P$ and $\tilde{\mathbb{P}}_n$ be defined analogously for $\tilde W$. Next let $Y(t) = 1(T \ge t)$ be the at-risk process. Using empirical process notation we can write (2) and (3) as $L_n(\theta) = \tilde{\mathbb{P}}_n \tilde\delta\{\eta(\tilde R, \theta) - \log F_n(\tilde T, \theta)\}$, where $F_n(t, \theta) = \mathbb{P}_n Y(t) \exp\{\eta(R, \theta)\}$, and $M_n(\omega, \gamma) = \tilde{\mathbb{P}}_n \tilde\delta[\eta\{\tilde R, \hat\beta_n(\omega, \gamma), \omega, \gamma\} - \log F_n\{\tilde T, \hat\beta_n(\omega, \gamma), \omega, \gamma\}]$. In the expressions for $L_n$ and $M_n$, the random variables $(\tilde R, \tilde T, \tilde\delta)$ in the first term on the right-hand side have their expectations taken with respect to $\tilde{\mathbb{P}}_n$. In the second term on the right-hand side, two successive integrations take place: first the expectation of $(R, T, \delta)$ in $F_n$ is taken with respect to $\mathbb{P}_n$, and then the expectation of $\tilde T$ is taken with respect to $\tilde{\mathbb{P}}_n$. Let $F_0(t, \theta) = P Y(t) \exp\{\eta(R, \theta)\}$. The corresponding population versions of $L_n$ and $M_n$ are

\[ L_p(\theta) = \tilde P \tilde\delta\{\eta(\tilde R, \theta) - \log F_0(\tilde T, \theta)\}, \qquad (A1) \]

and

\[ M(\omega, \gamma) = \tilde P \tilde\delta\bigl[\eta\{\tilde R, \beta(\omega, \gamma), \omega, \gamma\} - \log F_0\{\tilde T, \beta(\omega, \gamma), \omega, \gamma\}\bigr], \]

where $\beta(\omega, \gamma) = \arg\max_\beta L_p(\beta, \omega, \gamma)$. The subscript in $L_p$ refers to the fact that this is a partial likelihood. Later we will use L to denote the full likelihood.

Following the argmax theorem in M-estimation theory (Kosorok, 2008, Theorem 14.1), the following conditions are sufficient to obtain consistency: 1) the sequence $\{\hat\omega_n, \tilde\gamma(\hat\omega_n)\}$ is uniformly tight; 2a) the map $(\omega, \gamma) \mapsto M(\omega, \gamma)$ is upper semi-continuous with 2b) a unique maximum at $(\omega_0, \gamma_0)$; 3) $M_n$ converges to M uniformly over every compact set K in $\Theta_2$; and 4) the sieve estimator nearly maximizes the objective function, i.e., $M_n\{\hat\omega_n, \tilde\gamma(\hat\omega_n)\} \ge M_n(\omega_0, \gamma_0) - o_P(1)$. We now check each of these conditions in turn.

The first condition of the argmax theorem holds since $\|\hat\omega_n\| = 1$ and $\tilde\gamma(\hat\omega_n)$ must lie in the interval [a, b]. For condition (2a), we will show that $M(\omega, \gamma)$ is continuous. Let $(\omega_n, \gamma_n)$ be a sequence converging to $(\omega, \gamma)$ and $\beta_n$ be a sequence converging to $\beta$. Then $\theta_n = (\beta_n, \omega_n, \gamma_n)$ is a sequence converging to $\theta = (\beta, \omega, \gamma)$. We first show that $\tilde P \tilde\delta\, \eta(\tilde R, \theta_n) \to \tilde P \tilde\delta\, \eta(\tilde R, \theta)$ if $\theta_n \to \theta$. This can be seen to hold component-wise for $\eta$ in light of Conditions 3 and 5. We show it explicitly for one of the components. Since X is continuous by Condition 3, we have

\[ P\bigl|\delta\, 1(\omega_n^T X \ge \gamma_n) - \delta\, 1(\omega^T X \ge \gamma)\bigr| \le P\, |\delta|\, \bigl|1(\omega_n^T X \ge \gamma_n) - 1(\omega^T X \ge \gamma)\bigr|\, 1\bigl(|\omega_n^T X - \gamma_n - \omega^T X + \gamma| \le \epsilon\bigr) + P\, |\delta|\, \bigl|1(\omega_n^T X \ge \gamma_n) - 1(\omega^T X \ge \gamma)\bigr|\, 1\bigl(|\omega_n^T X - \gamma_n - \omega^T X + \gamma| > \epsilon\bigr) \to 0. \]

If $\beta(\omega_n, \gamma_n) \to \beta(\omega, \gamma)$ then $F_0\{\tilde T, \beta(\omega_n, \gamma_n), \omega_n, \gamma_n\} \to F_0\{\tilde T, \beta(\omega, \gamma), \omega, \gamma\}$ almost surely. Note that $F_0\{\tilde T, \beta(\omega_n, \gamma_n), \omega_n, \gamma_n\}$ is bounded by an integrable function under Conditions 4 and 5. This gives $\tilde P \tilde\delta \log F_0\{\tilde T, \beta(\omega_n, \gamma_n), \omega_n, \gamma_n\} \to \tilde P \tilde\delta \log F_0\{\tilde T, \beta(\omega, \gamma), \omega, \gamma\}$. Thus, to show that $M(\omega, \gamma)$ is continuous, it suffices to establish continuity of $\beta(\omega, \gamma)$. To see this, first note that $L_p(\theta)$ is continuous using the arguments above. Next we establish that $L_p(\theta)$ has a unique maximum in $\beta$ for every pair $(\omega, \gamma)$. Consider

\[ \frac{\partial}{\partial\beta} L_p(\theta) = \tilde P \tilde\delta\biggl[ \frac{\partial}{\partial\beta}\eta(\tilde R, \theta) - \frac{P\, Y(\tilde T) \exp\{\eta(R, \theta)\}\, \frac{\partial}{\partial\beta}\eta(R, \theta)}{P\, Y(\tilde T) \exp\{\eta(R, \theta)\}} \biggr], \]

where

\[ \frac{\partial}{\partial\beta}\eta(R, \theta) = \{Z,\ 1(\omega^T X \ge \gamma),\ Z\, 1(\omega^T X \ge \gamma),\ U\}. \]

A straightforward calculation shows the second partial derivative with respect to β is strictly negative definite. Thus β(ωn, γn) → β(ω, γ).

We now verify condition (2b). Under (1), write the integrated hazard function of T, given X, as $\exp\{\eta(R, \theta)\}\Lambda(t)$, where $\Lambda$ is continuous and monotone increasing with $\Lambda(0) = 0$. The joint likelihood, in $\theta$ and the nuisance parameter $\Lambda$, for a single observation $(R, T, \delta)$ is proportional to $L(\theta, \Lambda) \equiv \{b(R, \theta)\lambda(T)\}^\delta \exp\{-b(R, \theta)\Lambda(T)\}$, where $b(R, \theta) = \exp\{\eta(R, \theta)\}$. Next we check (2b) by showing that the profile of L over $\Lambda$ equals $L_p(\theta)$ in equation (A1) up to a constant, which will then enable us to use the standard Kullback–Leibler argument for identifiability to show that $\theta_0$ is a unique maximizer of (A4) and hence $(\omega_0, \gamma_0)$ is a unique maximizer of $M(\omega, \gamma)$.

In L, replace $\lambda(t)$ with $\lambda_s(t) = \{1 + s f(t)\}\lambda(t)$, where f is for now an unspecified bounded function, and take the Gateaux derivative of L with respect to s at s = 0. Letting $N(t) = 1(T \le t,\ \delta = 1)$ be the counting process and using the fact that $P\, dN(t) = P\, Y(t) b(R, \theta_0)\, d\Lambda_0(t)$, we obtain that the expectation of the resulting derivative is

\[ \int_0^\tau f(t)\, P\{Y(t) b(R, \theta_0)\}\, d\Lambda_0(t) - \int_0^\tau f(t)\, P\{Y(t) b(R, \theta)\}\, d\Lambda(t). \qquad (A2) \]

Now if we replace $\Lambda$ in (A2) with $\Lambda_s(t) = \int_0^t \{1 + s g(u)\}\, d\Lambda(u)$, for some other function g, and differentiate again with respect to s at s = 0, we obtain that the second Gateaux derivative is $-\int_0^\tau f(t) g(t)\, P\{Y(t) b(R, \theta)\}\, d\Lambda(t)$, which is strictly negative when f = g, implying that, for fixed $\theta$, any $\Lambda$ which is a zero of (A2) for a rich enough collection of functions f is a maximizer over all $\Lambda$ for fixed $\theta$. Plug $f(t) = 1(t \le u)$ into (A2), and allow u to range over $[0, \tau]$, and we obtain that the profile maximizer of L over $\Lambda$ satisfies $\int_0^u P\{Y(t) b(R, \theta_0)\}\, d\Lambda_0(t) - \int_0^u P\{Y(t) b(R, \theta)\}\, d\Lambda(t) = 0$ for all $u \in [0, \tau]$. Hence

\[ \frac{d\Lambda(t)}{d\Lambda_0(t)} = \frac{P\{Y(t) b(R, \theta_0)\}}{P\{Y(t) b(R, \theta)\}}. \qquad (A3) \]

Plugging (A3) back into L, and removing additive terms which are constants with respect to θ, we obtain that the profile of L over the parameter Λ is

\[ P\biggl( \int_0^\tau \log b(R, \theta)\, dN(t) - \int_0^\tau \log\bigl[ P\{Y(t) b(R, \theta)\} \bigr]\, dN(t) \biggr), \qquad (A4) \]

which equals $L_p(\theta)$ in equation (A1). Now let $\theta_1$ maximize (A4). Then, by the fact that (A4) is the profile of L over the parameter $\Lambda$, there exists a $\Lambda_1$ such that the joint parameter $(\theta_1, \Lambda_1)$ maximizes L. By the property of the Kullback–Leibler discrepancy and model identifiability, this implies that $\theta_1 = \theta_0$. Hence (A4) has a unique maximizer at $\theta_0$ and we have shown that $M(\omega, \gamma)$ is uniquely maximized at $(\omega_0, \gamma_0)$.

Proceeding on to condition (3) of the argmax theorem, fix a compact $K = K_1 \times K_2 \subset \Theta$, where $K_1$ is compact in $\Theta_1$ and $K_2$ is compact in $\Theta_2$. Let $m_\theta(v, t, \delta) = \delta\{\eta(v, \theta) - \log F_n(t, \theta)\}$ and consider the class of functions $\{m_\theta(v, t, \delta) : \theta \in K\}$. First we consider the component $\{\eta(v, \theta) : \theta \in K\}$. Trivially, the classes $\{\beta_i\}$ for $i = 1, \ldots, 4$ are each Donsker, as are the classes $\{Z\}$ and $\{U\}$. The class $\{1(\omega^T x \ge \gamma) : (\omega, \gamma) \in K_2\}$ is also Donsker by the example in Section 4.1.1 of Kosorok (2008). Since products of bounded Donsker classes are Donsker, $\{\eta(v, \theta) : \theta \in K\}$ is Donsker. Next, we examine the component $\{\log F_n(t, \theta) : t \in [0, \tau], \theta \in K\}$. The class $\{\exp\{\beta_2 1(\omega^T x \ge \gamma)\}\}$ is Donsker, since exponentiation is Lipschitz continuous on compact sets. The at-risk process $Y(t)$ is Donsker by Lemma 4.1 in Kosorok (2008). Thus $\{\log F_n(t, \theta)\}$ is Donsker. Repeating arguments for sums of Donsker classes and products of bounded Donsker classes shows that $\{m_\theta(v, t, \delta) : \theta \in K\}$ is a Donsker class of functions, and therefore also a Glivenko–Cantelli class of functions.

Now, let $m_{\omega,\gamma}(v, t, \delta) = \delta[\eta\{v, \hat\beta_n(\omega, \gamma), \omega, \gamma\} - \log F_n\{t, \hat\beta_n(\omega, \gamma), \omega, \gamma\}]$. Then we can write $M_n(\omega, \gamma) = \tilde{\mathbb{P}}_n m_{\omega,\gamma}(\tilde R, \tilde T, \tilde\delta)$. Since the estimated log hazard ratio $\hat\beta_n(\omega, \gamma)$ lies in a compact set in $\Theta_1$ for all $(\omega, \gamma) \in K_2$, the class $\{m_{\omega,\gamma}(v, t, \delta) : (\omega, \gamma) \in K_2\}$ is contained in a Donsker class, which implies it is a Glivenko–Cantelli class. Thus

\[ \sup_{(\omega, \gamma) \in K_2} \bigl| M_n(\omega, \gamma) - \tilde P m_{\omega,\gamma}(\tilde R, \tilde T, \tilde\delta) \bigr| \to 0 \]

in probability as $n \to \infty$. Next we show that $\tilde P m_{\omega,\gamma}(\tilde R, \tilde T, \tilde\delta)$ converges uniformly to $M(\omega, \gamma)$. The uniform convergence of $\hat\beta_n(\omega, \gamma)$ to $\beta(\omega, \gamma)$ can be shown by adapting the arguments of Theorem 1 in Pons (2003). Next we show that $F_n\{t, \hat\beta_n(\omega, \gamma), \omega, \gamma\}$ converges to $F_0\{t, \beta(\omega, \gamma), \omega, \gamma\}$ uniformly over $(\omega, \gamma) \in K_2$. We may write $F_n\{t, \hat\beta_n(\omega, \gamma), \omega, \gamma\} = \mathbb{P}_n Y(t) \exp[\eta\{R, \hat\beta_n(\omega, \gamma), \omega, \gamma\}]$ and $F_0\{t, \beta(\omega, \gamma), \omega, \gamma\} = P Y(t) \exp[\eta\{R, \beta(\omega, \gamma), \omega, \gamma\}]$. We have already argued the Donsker property of the classes $\{1(t \ge r) : r \in [0, \tau]\}$ and $\{\exp\{\eta(v, \theta)\} : \theta \in K\}$. Thus we conclude that $\{1(t \ge r) \exp\{\eta(v, \theta)\} : r \in [0, \tau], \theta \in K\}$ is Donsker and hence Glivenko–Cantelli. Hence, $M_n(\omega, \gamma)$ converges uniformly to $M(\omega, \gamma)$ over compact $K_2 \subset \Theta_2$.

Finally, we look at condition (4) of the argmax theorem. If the sieve $\Omega_n$ is dense, there is a sequence $\{\omega_n, \tilde\gamma(\omega_n)\} \in \Omega_n \times [a, b]$ that converges to $(\omega_0, \gamma_0)$. By definition $M_n\{\hat\omega_n, \tilde\gamma(\hat\omega_n)\} \ge M_n\{\omega_n, \tilde\gamma(\omega_n)\}$. By the continuity of $M(\omega, \gamma)$, $M_n(\omega_0, \gamma_0) - M_n\{\omega_n, \tilde\gamma(\omega_n)\} = o_P(1)$ and thus $M_n\{\hat\omega_n, \tilde\gamma(\hat\omega_n)\} \ge M_n(\omega_0, \gamma_0) - o_P(1)$. The conditions of the argmax theorem are met, and consistency follows. □

Footnotes

Supplementary material

Supplementary material available at Biometrika online includes 1) descriptions of the recursively imputed survival tree and Li’s double-slicing method, 2) implementation details of all methods used in the simulations and data analyses, 3) results of the sparse exponential simulation, and 4) analysis of two survival datasets.

Contributor Information

Susan Wei, Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A., susanwei@umn.edu.

Michael R. Kosorok, Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A., kosorok@unc.edu

References

  1. Beran R (1981). Nonparametric regression with randomly censored survival data. Technical report, University of California, Berkeley.
  2. Cui Y, Zhu R & Kosorok M (2017). Tree based weighted learning for estimating individualized treatment rules with censored data. Electronic Journal of Statistics 11, 3927–3953.
  3. Geman S & Hwang C-R (1982). Nonparametric maximum likelihood estimation by the method of sieves. The Annals of Statistics 10, 401–414.
  4. Grenander U (1981). Abstract Inference. New York: Wiley.
  5. Kosorok MR (2008). Introduction to Empirical Processes and Semiparametric Inference. New York: Springer.
  6. Li K-C (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association 86, 316–327.
  7. Li K-C, Wang J-L & Chen C-H (1999). Dimension reduction for censored regression data. Annals of Statistics 27, 1–23.
  8. Lipkovich I, Dmitrienko A & D'Agostino RB (2017). Tutorial in biostatistics: data-driven subgroup identification and analysis in clinical trials. Statistics in Medicine 36, 136–196.
  9. Pons O (2003). Estimation in a Cox regression model with a change-point according to a threshold in a covariate. The Annals of Statistics 31, 442–463.
  10. Therneau TM & Atkinson EJ (2018). rpart: Recursive Partitioning and Regression Trees. R package version 4.1-13.
  11. Wang R, Lagakos SW, Ware JH, Hunter DJ & Drazen JM (2007). Statistics in medicine: reporting of subgroup analyses in clinical trials. New England Journal of Medicine 357, 2189–2194.
  12. Wei S & Kosorok MR (2013). Latent supervised learning. Journal of the American Statistical Association 108, 957–970.
  13. Zhu R & Kosorok MR (2012). Recursively imputed survival trees. Journal of the American Statistical Association 107, 331–340.
