A SIEVE M-THEOREM FOR BUNDLED PARAMETERS IN SEMIPARAMETRIC MODELS, WITH APPLICATION TO THE EFFICIENT ESTIMATION IN A LINEAR MODEL FOR CENSORED DATA

Ying Ding; Bin Nan

. Author manuscript; available in PMC: 2014 Jan 14.

Published in final edited form as: Ann Stat. 2011;39(6):2795–3443.

A SIEVE M-THEOREM FOR BUNDLED PARAMETERS IN SEMIPARAMETRIC MODELS, WITH APPLICATION TO THE EFFICIENT ESTIMATION IN A LINEAR MODEL FOR CENSORED DATA^*

Ying Ding ¹, Bin Nan ¹

PMCID: PMC3890689 NIHMSID: NIHMS359465 PMID: 24436500

Abstract

In many semiparametric models that are parameterized by two types of parameters – a Euclidean parameter of interest and an infinite-dimensional nuisance parameter, the two parameters are bundled together, i.e., the nuisance parameter is an unknown function that contains the parameter of interest as part of its argument. For example, in a linear regression model for censored survival data, the unspecified error distribution function involves the regression coefficients. Motivated by developing an efficient estimating method for the regression parameters, we propose a general sieve M-theorem for bundled parameters and apply the theorem to deriving the asymptotic theory for the sieve maximum likelihood estimation in the linear regression model for censored survival data. The numerical implementation of the proposed estimating method can be achieved through the conventional gradient-based search algorithms such as the Newton-Raphson algorithm. We show that the proposed estimator is consistent and asymptotically normal and achieves the semiparametric efficiency bound. Simulation studies demonstrate that the proposed method performs well in practical settings and yields more efficient estimates than existing estimating equation based methods. Illustration with a real data example is also provided.

Keywords and phrases: Accelerated failure time model, B-spline, bundled parameters, efficient score function, semiparametric efficiency, sieve maximum likelihood estimation

1. Introduction

In a semiparametric model that is parameterized by two types of parameters – a finite-dimensional Euclidean parameter and an infinite-dimensional parameter, oftentimes the infinite-dimensional parameter is considered as a nuisance parameter, and the two parameters are separated. In many interesting statistical models, however, the parameter of interest and the nuisance parameter are bundled together, a terminology used by [12] when they reviewed the linear models under interval censoring, which means that the infinite-dimensional parameter is an unknown function of the parameter of interest. For example, in a linear regression model for censored survival data, the unspecified error distribution function, often treated as a nuisance parameter, is a function of the regression coefficients. Other examples include the single index model and the Cox regression model with an unspecified link function.

There is a rich literature of asymptotic distributional theories for M-estimation in a variety of semiparametric models with well separated parameters, see e.g. [9, 10, 11, 23, 29, 32], among many others. Though many methodologies of M-estimation for bundled parameters have been proposed in the literature, general asymptotic distributional theories for such problems are still lacking. The only estimation theories for bundled parameters we are aware of are the sieve generalized method of moment of [1] and the estimating equation approach of [5, 18].

In this article, we consider an extension of existing asymptotic distributional theories to accommodate situations where the estimation criteria are parameterized with bundled parameters. The proposed theory has similar flavor of Theorem 2 in [5], but they are different because the latter requires an existing uniform consistent estimator of the infinite dimensional nuisance parameter with a convergence rate faster than n^−1/4, which is then treated as a fixed function of the parameter of interest in their estimating procedure, while we need to simultaneously estimate both parameters through a sieve parameter space; furthermore, their existing nuisance parameter estimator needs to satisfy their condition (2.6), which is usually hard to verify when its convergence rate is slower than n^−1/2. Our proposed theory is general enough to cover a wide range of problems for bundled parameters including afore-mentioned single index model, the Cox model with unknown link function, and linear model under different censoring mechanisms. Rigorous proofs for each of the models, however, will take lengthy derivations. We only use the efficient estimation in the semiparametric linear regression model with right censored data as an illustrative example that motivates such a theoretical development, and will present results for other models elsewhere. Note that the considered example can not be directly put into the framework of restricted moments due to right censoring, thus can not be handled by the method of [1].

Suppose that the failure time transformed by a known monotone transformation is linearly related to a set of covariates, where the failure time is subject to right censoring. Let T_i denote the transformed failure time and C_i denote the transformed censoring time by the same transformation for subject i, i = 1, ⋯, n. Let Y_i = min(T_i, C_i) and Δ_i = I(T_i ≤ C_i). Then the semiparametric linear model we consider here can be written as

T_{i} = X_{i}^{'} β_{0} + e_{0, i}, i = 1, \dots, n,

(1.1)

where the errors e_0,i are independent and identically distributed (i.i.d.) with an unspecified distribution. When the failure time is log-transformed, this model corresponds to the well-known accelerated failure time model [15]. Here we assume that (X_i, C_i), i = 1, …, n, are i.i.d. and independent of e_0,i. This is a common assumption for linear models with censored survival data, which is particularly needed in [21] to derive the efficient score function for β₀. Such an assumption, however, is stronger than necessary in the usual linear regression without censoring, for which the error is only required to be uncorrelated with covariates (see e.g. [3]). We also avoid trivial transformations such as log(0) so that we always have Y_i’s bounded from below.

The semiparametric linear regression model relates the failure time to the covariates directly. It provides a straightforward interpretation of the data and serves as an attractive alternative to the Cox model [6] in many applications. Several estimators of the regression parameters have been proposed in the literature since late 70’s, including the rank-based estimators (see e.g. [19], [28], [25], [30], [13], [14]) and the Buckley-James estimator (see e.g. [2], [20], [16]). There are two major challenges in the estimation for such a linear model: (1) the estimating functions in the aforementioned methods are discrete, leading to potential multiple solutions as well as numerical difficulties; (2) none of the aforementioned methods is efficient. Recently, [31] developed a kernel-smoothed profile likelihood estimating procedure for the accelerated failure time model. In this article, we consider a sieve maximum likelihood approach for model (1.1) for censored data. The proposed approach is much intuitive, easy to implement numerically, and asymptotically efficient.

It is easy to see that T and C are independent conditional on X under the assumption e₀ ⊥ (C, X). Hence the joint density function of Z = (Y, Δ, X) can be written as

f_{Y, Δ, X} (y, δ, x) = λ_{0} {(y - x' β_{0})}^{δ} exp {- Λ_{0} (y - x' β_{0})} H (y, δ, x),

(1.2)

where Λ₀(·) is the true cumulative hazard function for the error term e₀ and λ₀(·) is its derivative. H (y, δ, x) only depends on the conditional distribution of C given X and the marginal distribution of X, and is free of β₀ and λ₀. To simplify the notation, we will ignore the factor H from the likelihood function. Then for i.i.d. observations (Y_i, Δ_i, X_i), i = 1, ⋯, n, from equation (1.2) we obtain the log likelihood function for β and λ as

l_{n} (β, λ) = n^{- 1} \sum_{i = 1}^{n} {Δ_{i} log {λ (Y_{i} + X_{i}^{'} β)} - \int I (Y_{i} \geq t) λ (t - X_{i}^{'} β) d t} .

(1.3)

The log likelihood given in (1.3) apparently is a semiparametric model, where the argument of the nuisance parameter λ involves β, thus β and λ are bundled parameters. To keep the positivity of λ, let g(·) = log λ(·). Then the log likelihood function for β and g, using the counting process notation, can be written as

l_{n} (β, g) = n^{- 1} \sum_{i = 1}^{n} {\int g (t - X_{i}^{'} β) d N_{i} (t) - \int I (Y_{i} \geq t) e^{g (t - X_{i}^{'} β)} d t},

(1.4)

where N_i(t) = Δ_iI(Y_i ≤ t) is the counting process for subject i.

We propose a new approach by directly maximizing the log likelihood function in a sieve space in which function g(·) is approximated by B-splines. Numerically, the estimator can be easily obtained by the Newton-Raphson algorithm or any gradient-based search algorithms. We show that the proposed estimator is consistent and asymptotically normal, and the limiting covariance matrix reaches the semiparametric efficiency bound, which can be estimated either by inverting the information matrix based on the efficient score function of the regression parameters derived by [21], or by inverting the observed information matrix of all parameters, taking into account that we are also estimating the nuisance parameters in the sieve space for the log hazard function.

2. The sieve M-theorem on the asymptotic normality of semiparametric estimation for bundled parameters

In this section, we extend the general theorem introduced by [29], which deals with the asymptotic normality of semiparametric M-estimators of regression parameters when convergence rate of the estimator for nuisance parameters can be slower than n^−1/2. In their theorem, the parameters of interest and the nuisance parameters are assumed to be separated. We consider a more general setting where the nuisance parameter can be a function of the parameters of interest. The theorem is crucial in the proof of asymptotic normality given in Theorem 4.2 for our proposed estimators.

Some empirical process notation will be used from now on. We denote Pf = ∫ f(z) dP(z) and $ℙ_{n} f = n^{- 1} \sum_{i = 1}^{n} f (Z_{i})$ , where P is a probability measure and ℙ_n is an empirical probability measure, and denote 𝔾_nf = n^1/2(ℙ_n − P)f. Given i.i.d. observations Z₁, Z₂, ⋯, Z_n ∈ 𝒵, we estimate the unknown parameters (β, ζ(·, β)) by maximizing an objective function for (β, ζ(·, β)), $n^{- 1} \sum_{i = 1}^{n} m (β, ζ (\cdot, β); Z_{i}) = ℙ_{n} m (β, ζ (\cdot, β); Z)$ , where β is the parameter of interest and ζ(·, β) is the nuisance parameter that can be a function of β. Here “ · ” denotes the other arguments of ζ besides β, which can be some components of Z ∈ 𝒵. If the objective function m is the log-likelihood function of a single observation, then the estimator becomes the semiparametric maximum likelihood estimator. Here we adopt similar notation in [29].

Let θ = (β, ζ(·, β)), β ∈ ℬ ⊂ ℝ^d and ζ ∈ ℋ, where ℬ is the parameter space of β and ℋ is a class of functions mapping from 𝒵 × ℬ to ℝ. Let Θ = ℬ×ℋ be the parameter space of θ. Define a distance between θ₁, θ₂ ∈ Θ by

d (θ_{1}, θ_{2}) = {{| β_{2} - β_{1} |}^{2} + {‖ ζ_{2} (\cdot, β_{2}) - ζ_{1} (\cdot, β_{1}) ‖}^{2}}^{1 / 2},

where | · | is the Euclidean distance and ‖ · ‖ is some norm. Let Θ_n be the sieve parameter space, a sequence of increasing subsets of the parameter space Θ growing dense in Θ as n → ∞. We aim to find θ̂_n ∈ Θ_n such that d(θ̂_n, θ₀) = o_p(1) and β̂_n is asymptotically normal.

For any fixed ζ(·, β) ∈ ℋ, let {ζ_η(·, β) : η in a neighborhood of 0 ∈ ℝ} be a smooth curve in ℋ running through ζ(·, β) at η = 0, i.e., ζ_η(·, β)|_η=0 = ζ(·, β). Assume all ζ(·, β) ∈ ℋ are at least twice-differentiable with respect to β, and denote

ℍ = {h : h (\cdot, β) = \frac{\partial ζ_{η} (\cdot, β)}{\partial η} |_{η = 0}, ζ_{η} \in ℋ, β \in ℬ} .

Assume the objective function m is twice Frechet differentiable. Since for a small δ, we have ζ(·, β + δ) − ζ(·, β) = ζ̇_β(·, β)δ + o(δ), here ζ̇_β(·, β) = ∂ζ(·, β)/∂β, then by the definition of functional derivatives it follows that

lim_{δ \to 0} \frac{1}{δ} {m (β, ζ (\cdot, β + δ); z) - m (β, ζ (\cdot, β); z)} = lim_{δ \to 0} \frac{1}{δ} {m (β, ζ (\cdot, β) + {ζ̇}_{β} (\cdot, β) δ + o (δ); z) - m (β, ζ (\cdot, β) + {ζ̇}_{β} (\cdot, β) δ; z)} + lim_{δ \to 0} \frac{1}{δ} {m (β, ζ (\cdot, β) + {ζ̇}_{β} (\cdot, β) δ; z) - m (β, ζ (\cdot, β); z)} = lim_{δ \to 0} ṁ_{2} (β, ζ (\cdot, β) + {ζ̇}_{β} (\cdot, β) δ; z) [o (δ) / δ] + ṁ_{2} (β, ζ (\cdot, β); z) [{ζ̇}_{β} (\cdot, β)] = ṁ_{2} (β, ζ (\cdot, β); z) [{ζ̇}_{β} (\cdot, β)],

where the subscript 2 indicates that the derivatives are taken with respect to the second argument of the function. The last equality holds because

lim_{δ \to 0} ṁ_{2} (β, ζ (\cdot, β) + {ζ̇}_{β} (\cdot, β) δ; z) [o (δ) / δ] = 0 .

Similarly we have

lim_{δ \to 0} \frac{1}{δ} {ṁ_{2} (β, ζ (\cdot, β + δ); z) [h (\cdot, β)] - ṁ_{2} (β, ζ (\cdot, β); z) [h (\cdot, β)]} = {m̈}_{22} (β, ζ (\cdot, β); z) [h (\cdot, β), {ζ̇}_{β} (\cdot, β)]

and

lim_{δ \to 0} \frac{1}{δ} {ṁ_{2} (β, ζ (\cdot, β); z) [h (\cdot, β + δ)] - ṁ_{2} (β, ζ (\cdot, β); z) [h (\cdot, β)]} = ṁ_{2} (β, ζ (\cdot, β); z) [ḣ_{β} (\cdot, β)] .

Thus according to the chain rule of the functional derivatives, we have

ṁ_{β} (β, ζ (\cdot, β); z) = \frac{\partial m (β, ζ (\cdot, β); z)}{\partial β} = ṁ_{1} (β, ζ (\cdot, β); z) + ṁ_{2} (β, ζ (\cdot, β); z) [{ζ̇}_{β} (\cdot, β)], ṁ_{ζ} (β, ζ (\cdot, β); z) [h] = \frac{\partial m (β, (ζ + η h) (\cdot, β); z)}{\partial η} |_{η = 0} = ṁ_{2} (β, ζ (\cdot, β); z) [h (\cdot, β)], {m̈}_{β β} (β, ζ (\cdot, β); z) = \frac{\partial^{2} m (β, ζ (\cdot, β); z)}{\partial β \partial β'} = \frac{\partial ṁ_{β} (β, ζ (\cdot, β); z)}{\partial β'} = {m̈}_{11} (β, ζ (\cdot, β); z) + {m̈}_{12} (β; ζ (\cdot, β); z) [{ζ̇}_{β} (\cdot, β)] + {m̈}_{21} (β, ζ (\cdot, β); z) [{ζ̇}_{β} (\cdot, β)] + {m̈}_{22} (β, ζ (\cdot, β); z) [{ζ̇}_{β} (\cdot, β), {ζ̇}_{β} (\cdot, β)] + ṁ_{2} (β, ζ (\cdot, β); z) [{ζ̈}_{β β} (\cdot, β)], {m̈}_{β ζ} (β, ζ (\cdot, β); z) [h] = \frac{\partial ṁ_{β} (β, (ζ + η h) (\cdot, β); z)}{\partial η} |_{η = 0} = {m̈}_{12} (β, ζ (\cdot, β); z) [h (\cdot, β)] + {m̈}_{22} (β, ζ (\cdot, β); z) [{ζ̇}_{β} (\cdot, β), h (\cdot, β)] + ṁ_{2} (β, ζ (\cdot, β); z) [ḣ_{β} (\cdot, β)], {m̈}_{ζ β} (β, ζ (\cdot, β); z) [h] = \frac{\partial ṁ_{2} (β, ζ (\cdot, β); z) [h (\cdot, β)]}{\partial β} = {m̈}_{21} (β, ζ (\cdot, β); z) [h (\cdot, β)] + {m̈}_{22} (β, ζ (\cdot, β); z) [h (\cdot, β), {ζ̇}_{β} (\cdot, β)] + ṁ_{2} (β, ζ (\cdot, β); z) [ḣ_{β} (\cdot, β)], {m̈}_{ζ ζ} (β, ζ (\cdot, β); z) [h_{1}, h_{2}] = {m̈}_{22} (β, ζ (\cdot, β); z) [h_{1} (\cdot, β), h_{2} (\cdot, β)] .

As noted before, the subscript 1 or 2 in the derivatives indicates that the derivatives are taken with respect to the first or the second argument of the function, and h inside the square brackets is a function denoting the direction of the functional derivative with respect to ζ. Note that for the second derivatives m̈_βζ and m̈_ζβ, we implicitly require the direction h to be a differentiable function with respect to β. It is easily seen that when ζ is free of β, all the above derivatives reduce to that in [29]. Following [29], we also define

Ṡ_{β} (β, ζ (\cdot, β)) = P ṁ_{β} (β, ζ (\cdot, β); Z), Ṡ_{ζ} (β, ζ (\cdot, β)) [h] = P ṁ_{ζ} (β, ζ (\cdot, β); Z) [h], Ṡ_{β, n} (β, ζ (\cdot, β)) = ℙ_{n} ṁ_{β} (β, ζ (\cdot, β); Z), Ṡ_{ζ, n} (β, ζ (\cdot, β)) [h] = ℙ_{n} ṁ_{ζ} (β, ζ (\cdot, β); Z) [h], {S̈}_{β β} (β, ζ (\cdot, β)) = P {m̈}_{β β} (β, ζ (\cdot, β); Z), {S̈}_{ζ ζ} (β, ζ (\cdot, β)) [h, h] = P {m̈}_{ζ ζ} (β, ζ (\cdot, β); Z) [h, h],

and

{S̈}_{β ζ} (β, ζ (\cdot, β)) [h] = {S̈}_{ζ β}^{'} (β, ζ (\cdot, β)) [h] = P {m̈}_{β ζ} (β, ζ (\cdot, β); Z) [h] .

Furthermore, for h = (h₁, h₂, ⋯, h_d)′ ∈ ℍ^d, we denote

ṁ_{ζ} (β, ζ (\cdot, β); z) [h] = (ṁ_{ζ} (β, ζ (\cdot, β); z) [h_{1}], \dots, ṁ_{ζ} (β, ζ (\cdot, β); z) [h_{d}])', {m̈}_{β ζ} (β, ζ (\cdot, β); z) [h] = ({m̈}_{β ζ} (β, ζ (\cdot, β); z) [h_{1}], \dots, {m̈}_{β ζ} (β, ζ (\cdot, β); z) [h_{d}]), {m̈}_{ζ β} (β, ζ (\cdot, β); z) [h] = ({m̈}_{ζ β} (β, ζ (\cdot, β); z) [h_{1}], \dots, {m̈}_{ζ β} (β, ζ (\cdot, β); z) [h_{d}])', {m̈}_{ζ ζ} (β, ζ (\cdot, β); z) [h, h] = ({m̈}_{ζ ζ} (β, ζ (\cdot, β); z) [h_{1}, h], \dots, {m̈}_{ζ ζ} (β, ζ (\cdot, β); z) [h_{d}, h])',

and define correspondingly

Ṡ_{ζ} (β, ζ (\cdot, β)) [h] = P ṁ_{ζ} (β, ζ (\cdot, β); Z) [h], Ṡ_{ζ, n} (β, ζ (\cdot, β)) [h] = ℙ_{n} ṁ_{ζ} (β, ζ (\cdot, β); Z) [h], {S̈}_{β ζ} (β, ζ (\cdot, β)) [h] = P {m̈}_{β ζ} (β, ζ (\cdot, β); Z) [h], {S̈}_{ζ β} (β, ζ (\cdot, β)) [h] = P {m̈}_{ζ β} (β, ζ (\cdot, β); Z) [h], {S̈}_{ζ ζ} (β, ζ (\cdot, β)) [h, h] = P {m̈}_{ζ ζ} (β, ζ (\cdot, β); Z) [h, h] .

To obtain the asymptotic normality result for the sieve M-estimator β̂_n, the assumptions we will make in the following look similar to those in [29], but all the derivatives with respect to β involve the chain rule and hence are more complicated, which is the key difference to [29]. Additionally, we focus on sieve estimators in the sieve parameter space. We list the following assumptions:

A1
(Rate of convergence) For an estimator θ̂_n = (β̂_n, ζ̂_n(·, β̂_n)) ∈ Θ_n and the true parameter θ₀ = (β₀, ζ₀(·, β₀)) ∈ Θ, d(θ̂_n, θ₀) = O_p(n^−ξ) for some ξ > 0.
A2
Ṡ_β(β₀, ζ₀(·, β₀)) = 0 and Ṡ_ζ(β₀, ζ₀(·, β₀))[h] = 0 for all h ∈ ℍ.
A3
(Positive information) There exists an $h^{*} = (h_{1}^{*}, \dots, h_{d}^{*})'$ , where $h_{j}^{*} \in ℍ$ for j = 1, ⋯, d, such that
${S̈}_{β ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [h] - {S̈}_{ζ ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*}, h] = 0$
for all h ∈ ℍ. Furthermore, the matrix
$A = - {S̈}_{β β} (β_{0}, ζ_{0} (\cdot, β_{0})) + {S̈}_{ζ β} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*}] = - P {{m̈}_{β β} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) - {m̈}_{ζ β} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}]}$
is nonsingular.
A4
The estimator (β̂_n, ζ̂_n(·, β̂_n)) satisfies
$Ṡ_{β, n} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n})) = o_{p} (n^{- 1 / 2}) and Ṡ_{ζ, n} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n})) [h^{*}] = o_{p} (n^{- 1 / 2}) .$
A5
(Stochastic equicontinuity) For some C > 0,
$sup_{d (θ, θ_{0}) \leq C n^{- ξ}, θ \in Θ_{n}} | \sqrt{n} (Ṡ_{β, n} - Ṡ_{β}) (β, ζ (\cdot, β)) - \sqrt{n} (Ṡ_{β, n} - Ṡ_{β}) (β_{0}, ζ_{0} (\cdot, β_{0})) | = o_{p} (1)$
and
$sup_{d (θ, θ_{0}) \leq C n^{- ξ}, θ \in Θ_{n}} | \sqrt{n} (Ṡ_{ζ, n} - Ṡ_{ζ}) (β, ζ (\cdot, β)) [h^{*} (\cdot, β)] - \sqrt{n} (Ṡ_{ζ, n} - Ṡ_{ζ}) (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β)] | = o_{p} (1) .$
A6
(Smoothness of the model) For some α > 1 satisfying αξ > 1/2, and for θ in a neighborhood of θ₀ : {θ : d(θ, θ₀) ≤ Cn^−ξ, θ ∈ Θ_n},
$| Ṡ_{β} (β, ζ (\cdot, β)) - Ṡ_{β} (β_{0}, ζ_{0} (\cdot, β_{0})) - {S̈}_{β β} (β_{0}, ζ_{0} (\cdot, β_{0})) (β - β_{0}) - {S̈}_{β ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})] | = O (d^{α} (θ, θ_{0}))$
and
$| Ṡ_{ζ} (β, ζ (\cdot, β)) [h^{*} (\cdot, β)] - Ṡ_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0})] - {S̈}_{ζ β} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0})] (β - β_{0}) - {S̈}_{ζ ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0}), ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})] | = O (d^{α} (θ, θ_{0})) .$

Note that ξ in A1 depends on the entropy of the sieve parameter space for ζ and can not be arbitrarily small – it is controlled by the smoothness of the model in A6. The convergence rate in A1 needs to be achieved prior to obtaining asymptotic normality. A2 is a common assumption for the maximum likelihood estimation and usually holds. The direction h* in A3 may be found through the equation in A3. It is the least favorable direction when m is the likelihood function. A4 and A5 are usually verified either by the Donsker property or the maximal inequality of [27]. A6 can be obtained by the Taylor expansion. The following theorem is an extension to Theorem 6.1 in [29] when the infinite-dimensional parameter ζ is a function of the finite-dimensional parameter β.

Theorem 2.1. Suppose that assumptions A1–A6 hold. Then

\sqrt{n} ({β̂}_{n} - β_{0}) = A^{- 1} \sqrt{n} ℙ_{n} m^{*} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) + o_{p} (1) \to_{d} N (0, A^{- 1} B (A^{- 1})'),

where

m^{*} (β_{0}, ζ_{0} (\cdot, β_{0}); z) = ṁ_{β} (β_{0}, ζ_{0} (\cdot, β_{0}); z) - ṁ_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); z) [h^{*}], B = P {m^{*} {(β_{0}, ζ_{0} (\cdot, β_{0}); Z)}^{\otimes 2}},

and A is given in assumption A3. Here a^⊗2 = aa′.

Proof. The proof follows similarly along the proof of Theorem 6.1 in [29]. Assumptions A1 and A5 yield

\sqrt{n} (Ṡ_{β, n} - Ṡ_{β}) ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n})) - \sqrt{n} (Ṡ_{β, n} - Ṡ_{β}) (β_{0}, ζ_{0} (\cdot, β_{0})) = o_{p} (1) .

Since Ṡ_β,n(β̂_n, ζ̂_n(·, β̂_n)) = o_p(n^−1/2) by A4 and Ṡ_β(β₀, ζ₀(·, β₀)) = 0 by A2, we have

\sqrt{n} Ṡ_{β} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n})) + \sqrt{n} Ṡ_{β, n} (β_{0}, ζ_{0} (\cdot, β_{0})) = o_{p} (1) .

Similarly,

\sqrt{n} Ṡ_{ζ} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n})) [h^{*} (\cdot, {β̂}_{n})] + \sqrt{n} Ṡ_{ζ, n} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0})] = o_{p} (1) .

Combining these equalities and assumption A6 yields

{S̈}_{β β} (β_{0}, ζ_{0} (\cdot, β_{0})) ({β̂}_{n} - β_{0}) + {S̈}_{β ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [{ζ̂}_{n} (\cdot, {β̂}_{n}) - ζ_{0} (\cdot, β_{0})] + Ṡ_{β, n} (β_{0}, ζ_{0} (\cdot, β_{0})) + O (d^{α} ({θ̂}_{n}, θ_{0})) = o_{p} (n^{- 1 / 2})

(2.1)

and

{S̈}_{ζ β} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0})] ({β̂}_{n} - β_{0}) + {S̈}_{ζ ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0}), {ζ̂}_{n} (\cdot, {β̂}_{n}) - ζ_{0} (\cdot, β_{0})] + Ṡ_{ζ, n} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0})] + O (d^{α} ({θ̂}_{n}, θ_{0})) = o_{p} (n^{- 1 / 2}) .

(2.2)

Since α > 1 with αξ > 1/2, the rate of convergence assumption A1 implies $\sqrt{n} O (d^{α} ({θ̂}_{n}, θ_{0})) = o_{p} (1)$ , then (2.1) – (2.2) together with A3 yields

({S̈}_{β β} (β_{0}, ζ_{0} (\cdot, β_{0})) - {S̈}_{ζ β} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0})]) ({β̂}_{n} - β_{0}) = - (Ṡ_{β, n} (β_{0}, ζ_{0} (\cdot, β_{0})) - Ṡ_{ζ, n} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*} (\cdot, β_{0})]) + o_{p} (n^{- 1 / 2}),

that is,

- A ({β̂}_{n} - β_{0}) = - ℙ_{n} m^{*} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) + o_{p} (n^{- 1 / 2}) .

This yields

\sqrt{n} ({β̂}_{n} - β_{0}) = A^{- 1} \sqrt{n} ℙ_{n} m^{*} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) + o_{p} (1) \to_{d} N (0, A^{- 1} B (A^{- 1})') .

3. Back to the linear model: the sieve maximum likelihood estimation

By taking logarithm to the positive function λ(·) in (1.3), the function g(·) in (1.4) is no longer restricted to be positive, which eases the estimation. We now describe the spline-based sieve maximum likelihood estimation for model (1.1). Under the regularity conditions C.1–C.3 stated in Section 4, we know that the observed residual times ${Y_{i} - X_{i}^{'} β : β \in ℬ, i = 1, \dots, n}$ are confined in some finite interval. Let [a, b] be an interval of interest, where −∞ < a < b < ∞. Let T_{K_n} = {t₁, ⋯, t_{K_n} } be a set of partition points of [a, b] with K_n = O(n^ν) and max_{1≤j≤K_n+1} |t_j − t_j−1| = O(n^−ν) for some constant ν ∈ (0, 1/2). Let 𝒮_n(T_{K_n}, K_n, p) be the space of polynomial splines of order p ≥ 1 defined in [22][Definition 4.1]. According to [22][Corollary 4.10], there exist a set of B-spline basis functions {B_j, 1 ≤ j ≤ q_n} with q_n = K_n + p such that for any s ∈ 𝒮_n(T_{K_n}, K_n, p), we can write

s (t) = \sum_{j = 1}^{q_{n}} γ_{j} B_{j} (t),

(3.1)

where we follow [24] by requiring max_{j=1,…,q_n} |γ_j | ≤ c_n that is allowed to grow with n slowly enough.

Let γ = (γ₁, …, γ_{q_n})′. Under suitable smoothness assumptions, g₀(·) = log λ₀(·) can be well approximated by some function in 𝒮_n(T_{K_n}, K_n, p). Therefore, we seek a member of 𝒮_n(T_{K_n}, K_n, p) together with a value of β ∈ ℬ that maximizes the log likelihood function. Specifically, let θ̂_n = (β̂_n, γ̂_n) be the value that maximizes

l_{n} (β, γ) = n^{- 1} \sum_{i = 1}^{n} [\int \sum_{j = 1}^{q_{n}} γ_{j} B_{j} (t - X_{i}^{'} β) d N_{i} (t) - \int I (Y_{i} \geq t) exp {\sum_{j = 1}^{q_{n}} γ_{j} B_{j} (t - X_{i}^{'} β)} d t] .

(3.2)

Taking the first order derivatives of l_n(β, γ) with respect to β and γ and setting them to zero, we can obtain the score equations. Since the integrals here are univariate integrals, their numerical implementation can be easily done by the one-dimensional Gaussian-quadrature method. Newton-Raphson algorithm or any other gradient-based search algorithms can be applied to solve the score equations for all parameters θ = (β, γ), e.g.,

θ^{(m + 1)} = θ^{(m)} - H {(θ^{(m)})}^{- 1} \cdot S (θ^{(m)})

where θ^(m) = (β^(m), γ^(m)) is the parameter estimate from the mth iteration, and

S (θ) = (\begin{matrix} \frac{\partial l_{n} (β, γ)}{\partial β} \\ \frac{\partial l_{n} (β, γ)}{\partial γ} \end{matrix}), H (θ) = (\begin{matrix} \frac{\partial^{2} l_{n} (β, γ)}{\partial β \partial β'} & \frac{\partial^{2} l_{n} (β, γ)}{\partial β \partial γ'} \\ \frac{\partial^{2} l_{n} (β, γ)}{\partial γ \partial β'} & \frac{\partial^{2} l_{n} (β, γ)}{\partial γ \partial γ'} \end{matrix})

are the score function and Hessian matrix of parameter θ. For any fixed β and n, it is clearly seen that l_n(β, γ) in (3.2) is concave with respect to γ and goes to −∞ if any γ_j approaches either ∞ or −∞, hence γ̂_n must be bounded which yields an estimator of s in 𝒮_n(T_{K_n}, K_n, p).

As stated in the next section, the distribution of β̂_n can be approximated by a normal distribution. One way to estimate the variance matrix of β̂_n is to approximate the (inverse of the) information matrix based on the efficient score function for β₀ by plugging in the estimated parameters (β̂_n, λ̂_n(·)). The consistency of such a variance estimator is given in Theorem 4.3. Another way is to invert the observed information matrix from the last Newton-Raphson iteration, taking into account that we are also estimating the nuisance parameter γ. The consistency of the latter approach may be proved in a similar way as Example 4 in [23] or via Theorem 2.2 in [8], and we leave detailed derivation to interested readers. Simulations indicate that both estimators work reasonably well.

4. Asymptotic results

Denote ε_β = Y − X′β and ε₀ = Y − X′β₀. We assume the following regularity conditions:

(C.1)
The true parameter β₀ belongs to the interior of a compact set ℬ ⊆ ℝ^d.
(C.2)
(a) The covariate X takes values in a bounded subset 𝒳 ⊆ ℝ^d; (b) E(XX′) is nonsingular.
(C.3)
There is a truncation time τ < ∞ such that, for some constant δ, P(ε₀ > τ|X) ≥ δ > 0 almost surely with respect to the probability measure of X. This implies that Λ₀(τ) ≤ −log δ < ∞.
(C.4)
The error e₀’s density f and its derivative ḟ are bounded and
$\int {(ḟ (t) / f (t))}^{2} f (t) d t < \infty .$
(C.5)
The conditional density of C given X and its derivative ġ_C|X are uniformly bounded for all possible values of X. That is,
$sup_{x \in 𝒳} g_{C | X} (t | X = x) \leq K_{1}, sup_{x \in 𝒳} | ġ_{C | X} (t | X = x) | \leq K_{2}$
for all t ≤ τ with some constants K₁, K₂ > 0, where τ is the truncation time defined in Condition C.3.
(C.6)
Let 𝒢^p denote the collection of bounded functions g on [a, b] with bounded derivatives g^(j), j = 1, …, k, and the kth derivative g^(k) satisfies the following Lipschitz continuity condition:
$| g^{(k)} (s) - g^{(k)} (t) | \leq L {| s - t |}^{m} for s, t \in [a, b],$
where k is a positive integer and m ∈ (0, 1] such that p = k + m ≥ 3, and L < ∞ is an unknown constant. The true log hazard function g₀(·) = log λ₀(·) belongs to 𝒢^p, where [a, b] is a bounded interval.
(C.7)
For some η ∈ (0, 1), u′V ar(X|ε₀)u ≥ ηu′E(XX′|ε₀)u almost surely for all u ∈ ℝ^d.

Condition C.1 is a common regularity assumption that has been imposed in the literature, see e.g. [16]. Conditions C.2(a) and C.3–C.4 were also assumed in [25]. Condition C.5 implies Condition B in [25]. In Condition C.6, we require p ≥ 3 to provide desirable controls of the spline approximation error rates of the first and second derivatives of g₀ (see Corollary 6.21 of [22]), which are needed in verifying Assumptions A4–A6. Condition C.7 was also proposed for the panel count data model in [29]. As noted in their Remark 3.4, this Condition C.7 can be justified in many applications when Condition C.2(b) is satisfied. The bounded interval [a, b] in C.6 may be chosen as a = inf_y,x(y − x′β₀) > −∞ and b = τ < ∞ under C.1–C.3, which is what we use in the following.

Now define the collection of functions ℋ^p as follows:

ℋ^{p} = {ζ (\cdot, β) : ζ (t, x, β) = g (ψ (t, x, β)), g \in 𝒢^{p}, t \in [a, b], x \in 𝒳, β \in ℬ},

where

ψ (t, x, β) = t - x' (β - β_{0})

and 𝒢^p is defined in C.6. Here ζ is a composite function of g composed with ψ. Note that ζ(t, x, β₀) = g(t). Then for ζ(·, β) ∈ ℋ^p we define the following norm

{‖ ζ (\cdot, β) ‖}_{2} = {\int_{𝒳} \int_{a}^{b} {g (t - x' (β - β_{0}))}^{2} d Λ_{0} (t) d F_{X} (x)}^{1 / 2} .

(4.1)

We also have the following collection of scores

ℍ = {h : h (\cdot, β) = {\frac{\partial ζ_{η} (\cdot, β)}{\partial η} |}_{η = 0} = w (ψ (\cdot, β)), ζ_{η} \in ℋ^{p}},

in which h(t, x, β) = w(ψ(t, x, β)) = w(t − x′(β − β₀)).

For any θ₁ = (β₁, ζ₁(·, β₁)) and θ₂ = (β₂, ζ₂(·, β₂)) in the space of Θ^p = ℬ × ℋ^p, define the following distance

d (θ_{1}, θ_{2}) = {{| β_{1} - β_{2} |}^{2} + {‖ ζ_{1} (\cdot, β_{1}) - ζ_{2} (\cdot, β_{2}) ‖}_{2}^{2}}^{1 / 2} .

(4.2)

Let $𝒢_{n}^{p} = 𝒮_{n} (T_{K_{n}}, K_{n}, p)$ . Denote

ℋ_{n}^{p} = {ζ (\cdot, β) : ζ (t, x, β) = g (ψ (t, x, β)), g \in 𝒢_{n}^{p}, t \in [a, b], x \in 𝒳, β \in ℬ}

and $Θ_{n}^{p} = ℬ \times ℋ_{n}^{p}$ . Clearly $ℋ_{n}^{p} \subseteq ℋ_{n + 1}^{p} \subseteq \dots \subseteq ℋ^{p}$ for all n ≥ 1. The sieve estimator θ̂_n = (β̂_n, ζ̂_n(·, β̂_n)), where ζ̂_n (t, x, β_n) = ĝ_n(t − x′(β̂_n − β₀)), is the maximizer of the empirical log-likelihood n⁻¹l_n(θ; Z) over the sieve space $Θ_{n}^{p}$ . The following theorem gives the convergence rate of the proposed estimator θ̂_n to the true parameter θ₀ = (β₀, ζ₀(·, β₀)) = (β₀, g₀).

Theorem 4.1. Let K_n = O(n^ν), where ν satisfies the restriction $\frac{1}{2 (1 + p)} < ν < \frac{1}{2 p}$ with p being the smoothness parameter defined in Condition C.6. Suppose Conditions C.1–C.7 hold and the failure time T follows model (1.1), then

d ({θ̂}_{n}, θ_{0}) = O_{p} {n^{- min (p ν, (1 - ν) / 2)}},

where d(·, ·) is defined in (4.2).

Remark. It is worth pointing out that the sieve space $𝒢_{n}^{p}$ does not have to be restricted to the B-spline space – it can be any sieve space as long as the estimator ${θ̂}_{n} \in ℬ \times ℋ_{n}^{p}$ satisfies the conditions of Theorem 1 in [24]. We refer to [4] for a comprehensive discussion of the sieve estimation for semiparametric models in general sieve spaces. Our choice of the B-spline space is primarily motivated by its simplicity of numerical implementation, which is a tremendous advantage of the proposed approach over exiting numerical methods for the accelerated failure time models, in particular, the linear programming approach.

We provide a proof of Theorem 4.1 in the online Supplementary Material by checking the conditions of Theorem 1 in [24]. Theorem 4.1 implies that if ν = 1/(1 + 2p), d(θ̂_n, θ₀) = O_p(n^−p/(1+2p)) which is the optimal convergence rate in the nonparametric regression setting. Although the overall convergence rate is slower than n^−1/2, the next theorem states that the proposed estimator of the regression parameter is still asymptotically normal and semiparametrically efficient.

Theorem 4.2. Given the following efficient score function for the censored linear model derived by [21]:

l_{β_{0}}^{*} (Y, Δ, X) = \int {X - P (X | Y - X' β_{0} \geq t)} {- \frac{{λ̇}_{0}}{λ_{0}} (t)} d M (t),

where

M (t) = Δ I (Y - X' β_{0} \leq t) - \int_{- \infty}^{t} I (Y - X' β_{0} \geq s) λ_{0} (s) d s

is the failure counting process martingale and

P (X | Y - X' β_{0} \geq t) = \frac{P {X I (Y - X' β_{0} \geq t)}}{P {I (Y - X' β_{0} \geq t)}}

was shown by [20]. Suppose that the conditions in Theorem 4.1 hold and $I (β_{0}) = P {l_{β_{0}}^{*} {(Y, Δ, X)}^{\otimes 2}}$ is nonsingular, then

n^{1 / 2} ({β̂}_{n} - β_{0}) = n^{- 1 / 2} I^{- 1} (β_{0}) \sum_{i = 1}^{n} l_{β_{0}}^{*} (Y_{i}, Δ_{i}, X_{i}) + o_{p} (1) \to N (0, I^{- 1} (β_{0}))

in distribution.

The proof of Theorem 4.2 is where we need to apply our general sieve M-theorem proposed in Section 2. We prove by checking the assumptions A1–A6. Details are provided in Section 7. The following theorem gives consistency of the variance estimator based on the above efficient score.

Theorem 4.3. Suppose the conditions in Theorem 4.2 hold. Denote

l_{{β̂}_{n}}^{*} (Y, Δ, X) = \int {X - X̄ (t; {β̂}_{n})} {- {\dot{\hat{g}}}_{n} (t)} d M̂ (t), where X̄ (t; {β̂}_{n}) = \frac{ℙ_{n} {X I (Y - X' {β̂}_{n} \geq t)}}{ℙ_{n} {I (Y - X' {β̂}_{n} \geq t)}} and M̂ (t) = Δ I (Y - X' {β̂}_{n} \leq t) - \int_{- \infty}^{t} I (Y - X' {β̂}_{n} \geq s) exp {ĝ_{n} (s)} d s .

Then $ℙ_{n} {l_{{β̂}_{n}}^{*} {(Y, Δ, X)}^{\otimes 2}} \to P {l_{β_{0}}^{*} {(Y, Δ, X)}^{\otimes 2}} = I (β_{0})$ in probability.

It is clearly seen that X̄ (t, β̂_n) in Theorem 4.3 estimates P(X|Y − X′β₀ ≥ t) in Theorem 4.2. The proof of Theorem 4.3 is provided in the Supplementary Material.

5. Numerical examples

5.1. Simulations

Extensive simulations are carried out to evaluate the finite sample performance of the proposed method. In the simulation studies, failure times are generated from the model

log T = 2 + X_{1} + X_{2} + e_{0},

where X₁ is Bernoulli with success probability 0.5, X₂ is independent normal with mean 0 and standard deviation 0.5 truncated at ±2. This is the same model used by [14] and [31]. We consider six error distributions: standard normal; standard extreme-value; mixtures of N(0, 1) and N(0, 3²) with mixing probabilities (0.5,0.5) and (0.95,0.05), denoted by 0.5N(0, 1) + 0.5N(0, 3²) and 0.95N(0, 1)+0.05N(0, 3²), respectively; Gumbel(−0.5μ, 0.5) with μ being the Euler constant and 0.5N(0, 1) + 0.5N(−1, 0.5²). The first four distributions were also considered by [31]. Similarly to [31], the censoring times are generated from Uniform [0, c] distribution, where c is chosen to produce a 25% censoring rate. We set the sample size n to 200, 400 and 600.

We choose cubic B-splines with one interior knot for n = 200 and 400, and two interior knots for n = 600. We perform the sieve maximum likelihood analysis and obtain the estimates of the slope parameters using the Newton-Raphson algorithm that updates (β, γ) iteratively. We stop iteration when the change of parameter estimates or the gradient value is less than a pre-specified tolerance value that is set to be 10⁻⁵ in our simulations. Log-rank and Gehan-weighted estimators are included for efficiency comparisons. We calculate the theoretical semiparametric efficiency bound I⁻¹(β₀), and scale it by the sample size, i.e., $σ^{*} = \sqrt{I^{- 1} (β_{0}) / n}$ , which serves as the reference standard error under the fully efficient situation. Table 1 summarizes the results of these studies based on 1000 simulated datasets. The bias of the proposed estimators of β₁ and β₂ are negligible. Both variance estimation procedures, denoted as ¹SEE (the standard error estimates by inverting the information matrix based on the efficient score function) and ²SEE (the standard error estimates by inverting the observed information matrix of all parameters including nuisance parameters), yield nice standard error estimates for the parameter estimators comparing to the empirical standard error SE, and the 95% confidence intervals have proper coverage probabilities, especially when the sample size is large. For the N(0, 1) error and the two mixtures of normal errors that are also considered in [31], the proposed estimators are more efficient than the log-rank estimators and have similar variances to the Gehan-weighted estimators. For the standard extreme-value error, the proposed estimators are more efficient than the Gehan-weighted estimator and similar to the log-rank estimator that is known to be the most efficient estimator under this particular error distribution. For the Gumbel(−0.5μ, 0.5) and 0.5N(0, 1) + 0.5N(−1, 0.5²) errors, the proposed estimators are more efficient than the other two estimators. Under all six error distributions, the standard errors of the proposed estimators are close to the efficient theoretical standard errors. The sample averages of the estimates for λ₀ under different simulation settings are reasonably close to corresponding true curves (results not shown here, see [7] for details).

Table 1.

Summary statistics for the simulation studies. The true slope parameters are β₁ = 1 and β₂ = 1. (a): N (0, 1); (b): standard extreme-value; (c): 0.5N(0, 1) + 0.5N(0, 3²); (d): 0.95N(0, 1) + 0.05N(0, 3²); (e): Gumbel (−0.5μ, 0.5); (f): 0.5N(0, 1) + 0.5N(−1, 0.5²).

Err. dist			B-spline MLE				Log-rank		Gehan

	n		Bias	SE	¹SEE (CP)	²SEE (CP)	Bias	SE	Bias	SE	σ*
(a)	200	β₁	.003	.168	.149 (.912)	.155 (.924)	.000	.170	.002	.159	.155
		β₂	.003	.167	.153 (.928)	.156 (.928)	.004	.171	.002	.160	.156
	400	β₁	.006	.110	.108 (.948)	.110 (.950)	.005	.115	.008	.108	.110
		β₂	.001	.110	.109 (.944)	.110 (.945)	.002	.116	.001	.109	.110
	600	β₁	.001	.092	.088 (.939)	.090 (.943)	.001	.096	.002	.093	.090
		β₂	.005	.091	.089 (.945)	.090 (.944)	.005	.097	.003	.092	.090
(b)	200	β₁	−.009	.180	.154 (.894)	.161 (.903)	−.008	.168	−.007	.190	.165
		β₂	.004	.182	.162 (.903)	.163 (.915)	.005	.170	.005	.195	.169
	400	β₁	.000	.126	.113 (.914)	.115 (.923)	−.001	.124	.000	.143	.117
		β₂	.008	.118	.116 (.934)	.116 (.938)	.010	.116	.012	.135	.120
	600	β₁	.001	.102	.093 (.919)	.094 (.923)	.001	.100	.000	.114	.095
		β₂	.011	.098	.095 (.944)	.095 (.945)	.011	.097	.007	.114	.098
(c)	200	β₁	.014	.300	.281 (.930)	.279 (.924)	−.020	.315	−.019	.292	.259
		β₂	.000	.306	.285 (.916)	.282 (.918)	.002	.317	.002	.288	.260
	400	β₁	.034	.199	.206 (.955)	.200 (.949)	.002	.218	.002	.197	.183
		β₂	−.003	.207	.208 (.949)	.202 (.942)	−.001	.222	−.002	.200	.184
	600	β₁	.035	.168	.171 (.957)	.165 (.949)	.003	.185	.001	.163	.150
		β₂	−.007	.169	.172 (.956)	.166 (.956)	−.004	.190	−.002	.168	.150
(d)	200	β₁	−.013	.172	.157 (.926)	.164 (.927)	−.010	.181	−.007	.166	.167
		β₂	−.004	.180	.160 (.908)	.164 (.913)	−.005	.184	−.005	.173	.166
	400	β₁	.003	.119	.113 (.944)	.116 (.948)	.004	.126	.006	.117	.118
		β₂	.003	.117	.114 (.942)	.116 (.953)	.004	.126	.003	.115	.118
	600	β₁	−.003	.097	.093 (.948)	.095 (.952)	−.002	.105	.002	.097	.096
		β₂	.001	.096	.094 (.942)	.095 (.944)	.002	.105	.003	.094	.096
(e)	200	β₁	.004	.080	.077 (.944)	.078 (.946)	−.001	.111	.004	.088	.079
		β₂	−.001	.083	.080 (.929)	.078 (.934)	.000	.114	.000	.091	.080
	400	β₁	−.005	.055	.055 (.946)	.055 (.951)	−.003	.079	−.004	.061	.056
		β₂	.003	.055	.056 (.954)	.056 (.950)	.003	.081	.003	.063	.056
	600	β₁	−.003	.047	.045 (.940)	.045 (.938)	.000	.067	−.001	.052	.045
		β₂	−.001	.047	.046 (.944)	.045 (.943)	−.002	.066	−.001	.051	.046
(f)	200	β₁	−.002	.126	.117 (.918)	.120 (.929)	−.002	.159	−.001	.128	.119
		β₂	.000	.133	.120 (.917)	.121 (.926)	.002	.164	.001	.134	.116
	400	β₁	−.002	.087	.084 (.949)	.085 (.950)	.003	.114	.000	.091	.084
		β₂	.004	.086	.086 (.951)	.086 (.953)	.003	.111	.004	.090	.082
	600	β₁	.003	.074	.070 (.929)	.070 (.931)	.005	.101	.001	.074	.069
		β₂	.003	.074	.070 (.936)	.070 (.936)	.009	.104	.004	.075	.067

Open in a new tab

5.2. A real data example

We use the Stanford heart transplant data [17] as an illustrative example. This dataset was also analyzed by [14] using their proposed least squares estimators. Following their analysis, we consider the same two models: the first one regresses the base-10 logarithm of the survival time on age at transplant and T5 mismatch score for the 157 patients with complete records on T5 measure, and the second one regresses the base-10 logarithm of the survival time on age and age². There were 55 censored patients. We fit these two models using the proposed method with five cubic B-spline basis functions.

We report the parameter estimates and the standard error estimates in Table 2 and compare them with the Gehan-weighted estimators reported by [14] and the Buckley-James estimators reported by [17]. For the first model, the parameter estimates for the age effect are fairly similar among all estimators and the standard error estimate from the proposed method tends to be smaller, while the parameter estimates for the T5 mismatch score vary across different estimators with none of them being significant at the 0.05 level. The disparity of the T5 effect may be due to what was pointed out by [17]: the accelerated failure time model with age and T5 as covariates does not fit the data ideally. For the second model with age and age² being the covariates, the point estimates are very similar across all methods and the standard error estimates from the proposed method are the smallest.

Table 2.

Regression parameter estimates and standard error estimates for the Stanford heart transplant data. The proposed estimators are compared with Gehan-weighted estimators reported in [13] and Buckley-James estimators reported in [17].

		B-spline MLE		Gehan-weighted		Buckley-James

	Covariate	Est.	SE	Est.	SE	Est.	SE
M. 1	Age	−0.0237	0.0068	−0.0211	0.0106	−0.015	0.008
	T5	−0.2118	0.1271	−0.0265	0.1507	−0.003	0.134
M. 2	Age	0.1022	0.0245	0.1046	0.0474	0.107	0.037
	Age²	−0.0016	0.0004	−0.0017	0.0006	−0.0017	0.0005

Open in a new tab

6. Discussion

By applying the proposed general sieve M-estimation theory for semiparametric models with bundled parameters, we are able to derive the asymptotic distribution for the sieve maximum likelihood estimator in a linear regression model where the response variable is subject to right censoring. By providing a both statistically and computationally efficient estimating procedure, this work makes the linear model a more viable alternative to the Cox proportional hazards model. Comparing to the existing methods for estimating β in a linear model, the proposed method has three advantages. Firstly, the estimating functions are smooth functions in contrast to the discrete estimating functions in the existing estimation methods, thus the root search is easier and can be done fast by conventional iterative methods such as the Newton-Raphson algorithm. Secondly, the standard error estimates are obtained directly by inverting either the efficient information matrix for the regression parameters or the observed information matrix of all parameters, either method is more computationally tractable compared to the re-sampling techniques. Thirdly, the proposed estimator achieves the semiparametric efficiency bound.

The proposed general sieve M-estimation theory can also be applied to other statistical models, for example, the single index model, the Cox model with an unknown link function, and the linear model under different censoring mechanisms. Such research is undergoing and will be presented elsewhere.

7. Proof of Theorem 4.2

Empirical process theory developed in [26, 27] will be heavily involved in the proof. We use the symbol ≲ to denote that the left hand side is bounded above by a constant times the right hand side and ≳ to denote that the left hand side is bounded below by a constant times the right hand side. For notational simplicity, we drop the superscript * in the outer probability measure P* whenever an outer probability applies.

7.1. Technical lemmas

We first introduce several lemmas that will be used for the proofs of Theorems 4.1, 4.2 and 4.3. Proofs of these lemmas are provided in the online Supplementary Material.

Lemma 7.1. Under Conditions C.1–C.3 and C.6, the log-likelihood

l (β, ζ (\cdot, β); Z) = Δ g (ε_{0} - X' (β - β_{0})) - \int_{a}^{b} 1 (ε_{0} \geq t) exp {g (t - X' (β - β_{0}))} d t,

where ε₀ = Y − X′β₀, has bounded and continuous first and second derivatives with respect to β ∈ ℬ and ζ(·, β) ∈ ℋ^p.

Lemma 7.2. For g₀ ∈ 𝒢^p, there exists a function $g_{0, n} \in 𝒢_{n}^{p}$ such that

{‖ g_{0, n} - g_{0} ‖}_{\infty} = O (n^{- p ν}) .

Lemma 7.3. Let θ_0,n = (β₀, ζ_0,n(·, β₀)) with ζ_0,n(·, β₀) ≡ g_0,n defined in Lemma 7.2. Denote $ℱ_{n} = {l (θ; z) - l (θ_{0, n}; z) : θ \in Θ_{n}^{p}}$ . Assume that Conditions C.1–C.3 and C.6 hold, then the ε-bracketing number associated with ‖ · ‖_∞ norm for ℱ_n is bounded by (1/ε)^cq_n+d, i.e., N_{[ ]}(ε, ℱ_n, ‖ · ‖_∞) ≲ (1/ε)^cq_n+d for some constant c > 0.

Lemma 7.4. Let $h_{j}^{*} (t, x, β) = w_{j}^{*} (ψ (t, x, β))$ , where $h_{j}^{*} (t, x, β_{0}) = w_{j}^{*} (t) = - ġ_{0} (t) P (X_{j} | ε_{0} \geq t)$ , j = 1, …, d. Assume Conditions C.1–C.6 hold, then there exists $h_{j, n}^{*} (t, x, β) = w_{j, n}^{*} (ψ (t, x, β)) \in ℋ_{n}^{2}$ such that ${‖ h_{j, n}^{*} - h_{j}^{*} ‖}_{\infty} = O (n^{- 2 ν})$ , or equivalently, ${‖ w_{j, n}^{*} - w_{j}^{*} ‖}_{\infty} = O (n^{- 2 ν})$ where $w_{j, n}^{*} \in 𝒢_{n}^{2}$ .

Lemma 7.5. For $h_{j}^{*}$ defined in Lemma 7.4, denote the class of functions

ℱ_{n}^{j} (η) = {{l̇}_{ζ} (θ; z) [h_{j}^{*} - h_{j}] : θ \in Θ_{n}^{p}, h_{j} \in ℋ_{n}^{2}, d (θ, θ_{0}) \leq η, {‖ h_{j} - h_{j}^{*} ‖}_{\infty} \leq η} .

Assume Conditions C.1–C.6 hold, then $N_{[]} (ε, ℱ_{n}^{j} (η), {‖ \cdot ‖}_{\infty}) ≲ {(η / ε)}^{c q_{n} + d}$ for some constant c > 0.

Lemma 7.6. For j = 1, ⋯, d, define the following two classes of functions

ℱ_{n, j}^{β} (η) = {{l̇}_{β_{j}} (θ; z) - {l̇}_{β_{j}} (θ_{0}; z) : θ \in Θ_{n}^{p}, d (θ, θ_{0}) \leq η, {‖ ġ (ψ (\cdot, β)) - ġ_{0} (ψ (\cdot, β_{0})) ‖}_{2} \leq η},

and

ℱ_{n, j}^{ζ} (η) = {{l̇}_{ζ} (θ; z) [h_{j}^{*} (\cdot, β)] - {l̇}_{ζ} (θ_{0}; z) [h_{j}^{*} (\cdot, β_{0})] : θ \in Θ_{n}^{p}, d (θ, θ_{0}) \leq η},

where l̇_{β_j} (θ; Z) is the jth element of l̇_β(θ; Z), ġ(·) denotes the derivative of g(·), and $h_{j}^{*}$ is defined in Lemma 7.5. Assume Conditions C.1–C.6 hold, then $N_{[]} (ε, ℱ_{n, j}^{β} (η), {‖ \cdot ‖}_{\infty}) ≲ {(η / ε)}^{c_{1} q_{n} + d}$ and $N_{[]} (ε, ℱ_{n, j}^{ζ} (η), {‖ \cdot ‖}_{\infty}) ≲ {(η / ε)}^{c_{2} q_{n} + d}$ for some constants c₁, c₂ > 0.

7.2. Proof of Theorem 4.2

We prove the theorem by checking Assumptions A1–A6 in Section 2. Here the criterion function of a single observation is the log-likelihood function l(β, ζ(·, β); Z). So instead of m, we use l to denote the criterion function. By Theorem 4.1 we know that Assumption A1 holds with ξ = min(pν, (1 − ν)/2) and the norm ‖ · ‖₂ defined in (4.1). A2 automatically holds for the scores. For A3, we need to find an $h^{*} = (h_{1}^{*}, \dots, h_{d}^{*})'$ with h* (t, x, β₀) = w* (t) such that

{S̈}_{β ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [h] - {S̈}_{ζ ζ} (β_{0}, ζ_{0} (\cdot, β_{0})) [h^{*}, h] = P {{l̈}_{β ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h] - {l̈}_{ζ ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}, h]} = 0

for all h ∈ ℍ with h(t, x, β) = w(t − x′(β − β₀)). Note that

P {{l̈}_{β ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h] - {l̈}_{ζ ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}, h]} = P {- X [Δ ẇ (ε_{0}) - \int_{a}^{b} 1 (ε_{0} \geq t) exp {g_{0} (t)} ẇ (t) d t] + \int_{a}^{b} 1 (ε_{0} \geq t) exp {g_{0} (t)} w (t) [X ġ_{0} (t) + w^{*} (t)] d t} .

Since P{l̇_ζ (β₀, ζ₀(·, β₀); Z)[h]|X} = 0 for all h ∈ ℍ, replacing h(·, β₀) by ẇ we have

P {- X [Δ ẇ (ε_{0}) - \int_{a}^{b} 1 (ε_{0} \geq t) exp {g_{0} (t)} ẇ (t) d t]} = P {- X \cdot P [Δ ẇ (ε_{0}) - \int_{a}^{b} 1 (ε_{0} \geq t) exp {g_{0} (t)} ẇ (t) d t | X]} = P {- X \cdot 0} = 0 .

Hence we only need to find a w* such that

P {\int_{a}^{b} 1 (ε_{0} \geq t) exp {g_{0} (t)} w (t) [X ġ_{0} (t) + w^{*} (t)] d t} = \int_{a}^{b} exp {g_{0} (t)} w (t) {ġ_{0} (t) P [1 (ε_{0} \geq t) X] + w^{*} (t) P [1 (ε_{0} \geq t)]} d t = 0 .

One obvious choice for w* (or h*) is

h^{*} (t, x, β_{0}) = w^{*} (t) = - ġ_{0} (t) \frac{P [1 (ε_{0} \geq t) X]}{P [1 (ε_{0} \geq t)]} = - ġ_{0} (t) P (X | ε_{0} \geq t) .

(7.1)

Then it follows

{l̇}_{β} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) - {l̇}_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}] = Δ {- ġ_{0} (Y - X' β_{0})} {X - P (X | ε_{0} \geq Y - X' β_{0})} - \int 1 (Y - X' β_{0} \geq t) {X - P (X | ε_{0} \geq t)} {- ġ_{0} (t)} exp {g_{0} (t)} d t = \int {X - P (X | ε_{0} \geq t)} {- ġ_{0} (t)} d M (t) = l_{β_{0}}^{*} (Y, Δ, X),

which is the efficient score function for β₀ originally derived by [21], where

M (t) = Δ I (Y - X' β_{0} \leq t) = \int_{- \infty}^{t} I (Y - X' β_{0} \geq s) exp {g_{0} (s)} d s .

By the fact of zero-mean for a score function, it is straightforward to verify the following equalities:

P {l̈}_{β ζ} (β, ζ (\cdot, β); Z) [h] = - P {{l̇}_{β} (β, ζ (\cdot, β); Z) {l̇}_{ζ}^{'} (β, ζ (\cdot, β); Z) [h]}, P {l̈}_{ζ β} (β, ζ (\cdot, β); Z) [h] = - P {{l̇}_{ζ} (β, ζ (\cdot, β); Z) [h] {l̇}_{β}^{'} (β, ζ (\cdot, β); Z)}, P {l̈}_{β β} (β, ζ (\cdot, β); Z) = - P {{l̇}_{β} (β, ζ (\cdot, β); Z) {l̇}_{β}^{'} (β, ζ (\cdot, β); Z)}, P {l̈}_{ζ ζ} (β, ζ (\cdot, β); Z) [h_{1}, h_{2}] = - P {{l̇}_{ζ} (β, ζ (\cdot, β); Z) [h_{1}] {l̇}_{ζ}^{'} (β, ζ (\cdot, β); Z) [h_{2}]} .

Then together with the fact that

P {{l̈}_{β ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}] - {l̈}_{ζ ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}, h^{*}]} = 0,

the matrix A in Assumption A3 of Theorem 2.1 is given by

A = P {- {l̈}_{β β} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) + {l̈}_{ζ β} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}] + {l̈}_{β ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}] - {l̈}_{ζ ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}, h^{*}]} = P {{l̇}_{β} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) {l̇}_{β}^{'} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) - {l̇}_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}] {l̇}_{β}^{'} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) - {l̇}_{β} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) {l̇}_{ζ}^{'} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}] + {l̇}_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}] {l̇}_{ζ}^{'} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}]} = P {{l̇}_{β} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) - {l̇}_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h^{*}]}^{\otimes 2} = P l_{β_{0}}^{*} {(Y, Δ, X)}^{\otimes 2},

which is the information matrix for β₀.

To verify A4, we note that the first part automatically holds since β̂_n satisfies the score equation Ṡ_β,n(β̂_n, ζ̂_n(·, β̂_n)) = ℙ_nl̇_β (β̂_n, ζ̂_n(·, β̂_n); Z) = 0. Next we shall show that

Ṡ_{ζ, n} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n})) [h_{j}^{*}] = ℙ_{n} {Δ w_{j}^{*} (Y - X' {β̂}_{n}) - \int 1 (Y \geq t) exp {{ζ̂}_{n} (t, X, {β̂}_{n})} w_{j}^{*} (t - X' {β̂}_{n}) d t} = o_{p} (n^{- 1 / 2}),

where $w_{j}^{*} (t) = - ġ_{0} (t) P (X_{j} | ε_{0} \geq t)$ , j = 1, ⋯, d, is the jth component of w* (t) given in (7.1). According to Lemma 7.4, there exists $h_{j, n}^{*} \in ℋ_{n}^{2}$ such that ${‖ h_{j}^{*} - h_{j, n}^{*} ‖}_{\infty} = O (n^{- 2 ν})$ . Then by the score equation for γ: Ṡ_γ,n (β̂_n, γ̂_n) = ℙ_nl̇_γ(β̂_n, γ̂_n; Z) = 0 and the fact that $w_{j, n}^{*} (t)$ can be written as $w_{j, n}^{*} (t) = \sum_{k = 1}^{q_{n}} γ_{j, k}^{*} B_{k} (t)$ for some coefficients ${γ_{j, 1}^{*}, \dots, γ_{j, q_{n}}^{*}}$ and the basis functions B_k(t) of the spline space, it follows that

ℙ_{n} {Δ w_{j, n}^{*} (Y - X' {β̂}_{n}) - \int 1 (Y \geq t) exp {{ζ̂}_{n} (t, X, {β̂}_{n})} w_{j, n}^{*} (t - X' {β̂}_{n}) d t} = 0 .

So it suffices to show that for each 1 ≤ j ≤ d,

I_{n} = ℙ_{n} {l̇}_{ζ} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}] = o_{p} (n^{- 1 / 2}) .

Since $P {{l̇}_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h_{j}^{*} - h_{j, n}^{*}]} = 0$ , we decompose I_n into I_n = I_1n + I_2n, where

I_{1 n} = (ℙ_{n} - P) {l̇}_{ζ} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}]

and

I_{2 n} = P {{l̇}_{ζ} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}] - {l̇}_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h_{j}^{*} - h_{j, n}^{*}]} .

We will show that I_1n and I_2n are both o_p(n^−1/2).

First consider I_1n. According to Lemma 7.5, the ε-bracketing number associated with ‖ · ‖_∞ norm for the class $ℱ_{n}^{j} (η)$ defined in Lemma 7.5 is bounded by (η/ε)^cq_n+d. This implies that

log N_{[]} (ε, ℱ_{n}^{j} (η), L_{2} (P)) \leq log N_{[]} (ε, ℱ_{n}^{j} (η), {‖ \cdot ‖}_{\infty}) ≲ q_{n} log (η / ε),

which leads to the bracketing integral

J_{[]} (η, ℱ_{n}^{j} (η), L_{2} (P)) = \int_{0}^{η} \sqrt{1 + log N_{[]} (ε, ℱ_{n}^{j} (η), L_{2} (P))} d ε ≲ q_{n}^{1 / 2} η .

Now we pick η to be η_n = O{n^{−min(2ν,(1−ν)/2})}, then

{‖ h_{j}^{*} - h_{j, n}^{*} ‖}_{\infty} = O (n^{- 2 ν}) \leq O {n^{- min (2 ν, (1 - ν) / 2)}} = η_{n},

and since p ≥ 3,

d ({θ̂}_{n}, θ_{0}) = O_{p} {n^{- min (p ν, (1 - ν) / 2)}} \leq O_{p} {n^{- min (2 ν, (1 - ν) / 2)}} = η_{n} .

Therefore, ${l̇}_{ζ} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n}); z) [h_{j}^{*} - h_{j, n}^{*}] \in ℱ_{n}^{j} (η_{n})$ . Denote t_β = t − X′(β−β₀) for notational simplicity, for any ${l̇}_{ζ} (θ; Z) [h_{j}^{*} - h] \in ℱ_{n}^{j} (η_{n})$ , it follows that

P {{l̇}_{ζ} (θ; Z) [h_{j}^{*} - h]}^{2} = P {Δ (w_{j}^{*} - w) (ε_{β}) + \int_{a}^{b} 1 (ε_{0} \geq t) exp {g (t_{β})} (w_{j}^{*} - w) (t_{β}) d t}^{2} ≲ {‖ w_{j}^{*} - w ‖}_{\infty}^{2} + P {\int_{a}^{b} exp {2 g (t_{β})} {(w_{j}^{*} - w)}^{2} (t_{β}) d t} ≲ {‖ w_{j}^{*} - w ‖}_{\infty}^{2} + {‖ w_{j}^{*} - w ‖}_{\infty}^{2} \int_{a}^{b} P [exp {2 g (t_{β})}] d t,

where the first inequality holds because of the Cauchy-Schwartz inequality. Since ${‖ w_{j}^{*} - w ‖}_{\infty} \leq η_{n}$ , by the same argument of [24] on page 591 for slowly growing c_n (their l_n), e.g. $c_{n} = o (log (η_{n}^{- 1}))$ , we know that ${‖ {l̇}_{ζ} (θ; Z) [h_{j}^{*} - h] ‖}_{\infty}$ is bounded by some constant 0 < M < ∞ and $P {{l̇}_{ζ} (θ; Z) [h_{j}^{*} - h]}^{2} ≲ η_{n}$ for a slightly enlarged η_n obtained by a fine adjustment of ν. Then by the maximal inequality in Lemma 3.4.2 of [27], it follows that

E_{P} {‖ 𝔾_{n} ‖}_{ℱ_{n}^{j} (η_{n})} ≲ J_{[]} (η_{n}, ℱ_{n}^{j} (η_{n}), L_{2} (P)) (1 + \frac{J_{[]} (η_{n}, ℱ_{n}^{j} (η_{n}), L_{2} (P))}{η_{n}^{2} \sqrt{n}} M) ≲ q_{n}^{1 / 2} η_{n} + q_{n} n^{- 1 / 2} = O {n^{ν / 2 - min (2 ν, (1 - ν) / 2)}} + O (n^{ν - 1 / 2}) = O {n^{- min (3 ν / 2, 1 / 2 - ν)}} + O (n^{ν - 1 / 2}) = o (1),

where the last equality holds because 0 < ν < 1/2. Thus by the Markov’s inequality, $I_{1 n} = n^{- 1 / 2} 𝔾_{n} {l̇}_{ζ} ({θ̂}_{n}; Z) [h_{j}^{*} - h_{j, n}^{*}] = o_{p} (n^{- 1 / 2})$ .

Next for I_2n, the Taylor expansion for ${l̇}_{ζ} ({θ̂}_{n}; Z) [h_{j}^{*} - h_{j, n}^{*}]$ at θ₀ yields

{l̇}_{ζ} ({β̂}_{n}, {ζ̂}_{n} (\cdot, {β̂}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}] - {l̇}_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h_{j}^{*} - h_{j, n}^{*}] = ({β̂}_{n} - β_{0})' {l̈}_{β ζ} ({β̃}_{n}, {ζ̃}_{n} (\cdot, {β̃}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}] + {l̈}_{ζ ζ} ({β̃}_{n}, {ζ̃}_{n} (\cdot, {β̃}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}, {ζ̂}_{n} - ζ_{0}],

where (β̃_n, ζ̃_n(·, β̃_n)) is between (β₀, ζ₀(·, β₀)) and (β̂_n, ζ̂_n(·, β̂_n)). Then it follows that

| {l̈}_{β ζ} ({β̃}_{n}, {ζ̃}_{n} (\cdot, {β̃}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}] | = | X {Δ (ẇ_{j}^{*} - ẇ_{j, n}^{*}) (ε_{{β̃}_{n}}) - \int_{a}^{b} 1 (ε_{0} \geq t) exp {{g̃}_{n} (t_{{β̃}_{n}})} [(ẇ_{j}^{*} - ẇ_{j, n}^{*}) (t_{{β̃}_{n}}) + {\dot{\tilde{g}}}_{n} (t_{{β̃}_{n}}) (w_{j}^{*} - w_{j, n}^{*}) (t_{{β̃}_{n}})] d t} | ≲ {‖ ẇ_{j}^{*} - ẇ_{j, n}^{*} ‖}_{\infty} + {‖ ẇ_{j}^{*} - ẇ_{j, n}^{*} ‖}_{\infty} {\int_{a}^{b} exp {{g̃}_{n} (t_{{β̃}_{n}})} d t} + {‖ w_{j}^{*} - w_{j, n}^{*} ‖}_{\infty} {\int_{a}^{b} exp {{g̃}_{n} (t_{{β̃}_{n}})} {\dot{\tilde{g}}}_{n} (t_{{β̃}_{n}}) d t} ≲ {‖ ẇ_{j}^{*} - ẇ_{j, n}^{*} ‖}_{\infty} + {‖ w_{j}^{*} - w_{j, n}^{*} ‖}_{\infty} = O (n^{- ν}) + O (n^{- 2 ν}) = O (n^{- ν}),

where the second inequality holds because g̃_n and its first derivative ${\dot{\tilde{g}}}_{n}$ are bounded (or growing with n slowly enough so it can be effectively treated as bounded based on the same argument of [24] on page 591), and the last equality holds due to the Corollary 6.21 of [22] that ${‖ ẇ_{j}^{*} - ẇ_{j, n}^{*} ‖}_{\infty} = O (n^{- (2 - 1) ν}) = O (n^{- ν})$ . Thus,

| ({β̂}_{n} - β_{0})' {l̈}_{β ζ} ({β̃}_{n}, {ζ̃}_{n} (\cdot, {β̃}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}] | = | {β̂}_{n} - β_{0} | \cdot O (n^{- ν}) = O_{p} {n^{- min (p ν, (1 - ν) / 2)}} \cdot O (n^{- ν}) = O_{p} {n^{- min ((p + 1) ν, (1 + 3 ν) / 2)}} .

Also,

| {l̈}_{ζ ζ} ({β̃}_{n}, {ζ̃}_{n} (\cdot, {β̃}_{n}); Z) [h_{j}^{*} - h_{j, n}^{*}, {ζ̂}_{n} - ζ_{0}] | = | \int_{a}^{b} 1 (ε_{0} \geq t) exp {{g̃}_{n} (t_{{β̃}_{n}})} (w_{j}^{*} - w_{j, n}^{*}) (t_{{β̃}_{n}}) (ĝ_{n} - g_{0}) (t_{{β̃}_{n}}) d t | \leq {‖ w_{j}^{*} - w_{j, n}^{*} ‖}_{\infty} \cdot {\int_{a}^{b} exp {{g̃}_{n} (t_{{β̃}_{n}})} (ĝ_{n} - g_{0}) (t_{{β̃}_{n}}) d t} = {‖ w_{j}^{*} - w_{j, n}^{*} ‖}_{\infty} \cdot I_{3 n} .

By the Cauchy-Schwartz inequality and the boundedness of g̃_n, we have

P {I_{3 n}}^{2} = P {\int_{a}^{b} exp {{g̃}_{n} (t_{{β̃}_{n}})} (ĝ_{n} - g_{0}) (t_{{β̃}_{n}}) d t}^{2} ≲ \int_{𝒳} \int_{a}^{b} {(ĝ_{n} - g_{0})}^{2} (t_{β̃}) d Λ_{0} (t) d F_{X} (x) = {‖ {ζ̂}_{n} (\cdot, {β̃}_{n}) - ζ_{0} (\cdot, {β̃}_{n}) ‖}_{2}^{2} ≲ | {β̃}_{n} - {β̂}_{n} |^{2} + {‖ {ζ̂}_{n} (\cdot, {β̂}_{n}) - ζ_{0} (\cdot, β_{0}) ‖}_{2}^{2} + | β_{0} - {β̃}_{n} |^{2} ≲ | {β̂}_{n} - β_{0} |^{2} + {‖ {ζ̂}_{n} (\cdot, {β̂}_{n}) - ζ_{0} (\cdot, β_{0}) ‖}_{2}^{2} = d {({θ̂}_{n}, θ_{0})}^{2}

Hence |I_3n| ≲ d(θ̂_n, θ₀) and

| {l̈}_{ζ ζ} ({β̃}_{n}, {g̃}_{n}; Z) [h_{j}^{*} - h_{j, n}^{*}, {ζ̂}_{n} - ζ_{0}] | ≲ {‖ w_{j}^{*} - w_{j, n}^{*} ‖}_{\infty} \cdot d ({θ̂}_{n}, θ_{0}) = O (n^{- 2 ν}) \cdot O_{p} {n^{- min (p ν, (1 - ν) / 2)}} = O_{p} {n^{- min ((p + 2) ν, (1 + 3 ν) / 2)}} .

Since $\frac{1}{2 (1 + p)} < ν < \frac{1}{1 + 2 p}$ , it follows that I_2n = O{n^{−min((p+1)ν,(1+3ν)/2)}} = o(n^−1/2). Thus I_n = I_1n + I_2n = o_p(n^−1/2) and Condition A4 holds.

Now we verify Assumption A5. First by Lemma 7.6, the ε-bracketing numbers for the classes of functions $ℱ_{n, j}^{β} (η)$ and $ℱ_{n, j}^{ζ} (η)$ are both bounded by (η/ε)^cq_n+d, which implies that the corresponding ε-bracketing integrals are both bounded by $q_{n}^{1 / 2} η$ , i.e.,

J_{[]} (η, ℱ_{n, j}^{β} (η), L_{2} (P)) ≲ q_{n}^{1 / 2} η and J_{[]} (η, ℱ_{n, j}^{ζ} (η), L_{2} (P)) ≲ q_{n}^{1 / 2} η .

Then for l̇_{β_j} (θ; z) − l̇_{β_j} (θ₀; z), by applying the Cauchy-Schwartz inequality, together with subtracting and adding the terms ġ(ε₀), e^g₀(t_β) ġ(t_β), e^g₀(t) ġ(t_β) and e^g₀(t) ġ₀(t_β), we have

{{l̇}_{β_{j}} (θ; Z) - {l̇}_{β_{j}} (θ_{0}; Z)}^{2} = {- Δ X_{j} [ġ (ε_{β}) - ġ_{0} (ε_{0})] + X_{j} \int_{a}^{b} 1 (ε_{0} \geq t) [e^{g (t_{β})} ġ (t_{β}) - e^{g_{0} (t)} ġ_{0} (t)] d t}^{2} ≲ {Δ {[ġ (ε_{β}) - ġ_{0} (ε_{0})]}^{2}} + {\int_{a}^{b} {[e^{g (t_{β})} ġ (t_{β}) - e^{g_{0} (t)} ġ_{0} (t)]}^{2} d t} ≲ {Δ {[ġ (ε_{β}) - ġ (ε_{0})]}^{2}} + {Δ {[ġ (ε_{0}) - ġ_{0} (ε_{0})]}^{2}} + \int_{a}^{b} {{[e^{g (t_{β}))} - e^{g_{0} (t_{β})}]}^{2} + {[e^{g_{0} (t_{β})} - e^{g_{0} (t)}]}^{2}} ġ^{2} (t_{β}) d t + \int_{a}^{b} e^{2 g_{0} (t)} {{[ġ (t_{β}) - ġ_{0} (t_{β})]}^{2} + e^{2 g_{0} (t)} {[ġ_{0} (t_{β}) - ġ_{0} (t)]}^{2}} d t = B_{1} + B_{2} + B_{3} + B_{4} .

For B₁, since g̈ is bounded and the largest eigenvalue of P(XX′) satisfies 0 < λ_d < ∞ by Condition C.2(b), it follows that

P B_{1} \leq P {[g̈ (Y - X' β̃) X' (β - β_{0})]}^{2} ≲ P {[X' (β - β_{0})]}^{2} \leq λ_{d} | β - β_{0} |^{2} ≲ | β - β_{0} |^{2} \leq η^{2} .

For B₂, we have

P B_{2} \leq \int_{𝒳} {\int_{a}^{b} {(ġ (t) - ġ_{0} (t))}^{2} d Λ_{0} (t)} d F_{X} (x) = {‖ ġ ((ψ (\cdot, β_{0})) - ġ_{0} (ψ (\cdot, β_{0})) ‖}_{2}^{2} ≲ | β - β_{0} |^{2} + {‖ ġ (ψ (\cdot, β)) - ġ_{0} (ψ (\cdot, β_{0})) ‖}_{2}^{2} ≲ η^{2} .

For B₃, by using the mean value theorem, it follows that

P B_{3} = P {\int_{a}^{b} {{[e^{g̃ (t_{β})} (g - g_{0}) (t_{β})]}^{2} + {[e^{g_{0} (t_{β̃})} X' (β - β_{0})]}^{2}} ġ^{2} (t_{β}) d t} ≲ \int_{𝒳} \int_{a}^{b} {(g - g_{0})}^{2} (t_{β}) d Λ_{0} (t) d F_{X} (x) + P {[X' (β - β_{0})]}^{2} ≲ {‖ ζ (\cdot, β) - ζ_{0} (\cdot, β_{0}) ‖}_{2}^{2} + | β - β_{0} |^{2} \leq η^{2},

where g̃ = g₀ + ξ(g − g₀) for some 0 < ξ < 1 and thus is bounded. Finally for B₄, by the mean value theorem, it follows that

P B_{4} = P {\int_{a}^{b} e^{2 g_{0} (t)} {{[ġ (t_{β}) - ġ_{0} (t_{β})]}^{2} + e^{2 g_{0} (t)} [ġ_{0} (t_{β}) - ġ_{0} {(t)]}^{2}} d t} ≲ \int_{𝒳} \int_{a}^{b} {(ġ - ġ_{0})}^{2} (t_{β}) d Λ_{0} (t) d F_{X} (x) + P \int_{a}^{b} {[{g̈}_{0} (t_{β̃}) X' (β - β_{0})]}^{2} d t ≲ {‖ ġ (ψ (\cdot, β)) - ġ_{0} (ψ (\cdot, β)) ‖}_{2}^{2} + P {[X' (β - β_{0})]}^{2} ≲ {‖ ġ (ψ (\cdot, β)) - ġ_{0} (ψ (\cdot, β_{0})) ‖}_{2}^{2} + | β - β_{0} |^{2} ≲ η^{2} .

Therefore we have P{l̇_{β_j} (θ; Z) − l̇_{β_j} (θ₀; Z)}² ≲ η². Using the similar argument, we can show that $P {{l̇}_{ζ} (θ; Z) [h_{j}^{*}] - {l̇}_{ζ} (θ_{0}; Z) [h_{j}^{*}]}^{2} ≲ η^{2}$ . By Lemma 7.1, we also have ‖l̇_{β_j} (θ; Z)− l̇_{β_j} (θ₀; Z)‖_∞ and ${‖ {l̇}_{ζ} (θ; Z) [h_{j}^{*}] - {l̇}_{ζ} (θ_{0}; Z) [h_{j}^{*}] ‖}_{\infty}$ are both bounded. Now we pick η as η_n = O{n^{−min((p−1)ν, (1−ν)/2)}}, then by the maximal inequality in Lemma 3.4.2 of [27], it follows that

E_{P} {‖ 𝔾_{n} ‖}_{ℱ_{n, j}^{β} (η_{n})} ≲ q_{n}^{1 / 2} η_{n} + q_{n} n^{- 1 / 2} = O {n^{max ((\frac{3}{2} - p) ν, ν - \frac{1}{2})}} + O (n^{ν - \frac{1}{2}}) = o (1),

where the last equality holds since p ≥ 3 and $ν < \frac{1}{2}$ . Similarly, we have $E_{P} {‖ 𝔾_{n} ‖}_{ℱ_{n, j}^{ζ} (η_{n})} = o (1)$ . Thus for ξ = min(pν, (1 − ν)/2) and Cn^−ξ = O{n^{−min(pν,(1−ν)/2)}}, by the Markov’s inequality,

sup_{d (θ, θ_{0}) \leq C n^{- ξ}} 𝔾_{n} {{l̇}_{β_{j}} (β, ζ (\cdot, β); Z) - {l̇}_{β_{j}} (β_{0}, ζ_{0} (\cdot, β_{0}); Z)} = o_{p} (1), sup_{d (θ, θ_{0}) \leq C n^{- ξ}} 𝔾_{n} {{l̇}_{ζ} (β, ζ (\cdot, β); Z) [h_{j}^{*}] - {l̇}_{ζ} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) [h_{j}^{*}]} = o_{p} (1) .

This completes the verification of Assumption A5.

Finally, Assumption A6 can be verified by using the Taylor expansion. Since the proofs for the two equations in A6 are essentially identical, we just prove the first equation. In a neighborhood of $θ_{0} : {θ : d (θ, θ_{0}) \leq C n^{- ξ}, θ \in Θ_{n}^{p}}$ with ξ = min(pν, (1 − ν)/2), the Taylor expansion for l̇_β(θ; Z) yields

{l̇}_{β} (θ; Z) = {l̇}_{β} (θ_{0}; Z) + {l̈}_{β β} (θ̃; Z) (β - β_{0}) + {l̈}_{β g} (θ̃; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})] = {l̇}_{β} (θ_{0}; Z) + {l̈}_{β β} (θ_{0}; Z) (β - β_{0}) + {l̈}_{β ζ} (θ_{0}; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})] + {{l̈}_{β β} (θ̃; Z) (β - β_{0}) - {l̈}_{β β} (θ_{0}; Z) (β - β_{0})} + {{l̈}_{β ζ} (θ̃; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})] - {l̈}_{β ζ} (θ_{0}; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})]},

where θ̃ = (β̃, ζ̃(·, β̃)) is a midpoint between θ₀ and θ. So

P {{l̇}_{β} (θ; Z) - {l̇}_{β} (θ_{0}; Z) - {l̈}_{β β} (θ_{0}; Z) (β - β_{0}) - {l̈}_{β ζ} (θ_{0}; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})]} = P {{l̈}_{β β} (θ̃; Z) - {l̈}_{β β} (θ_{0}; Z)} (β - β_{0}) + P {{l̈}_{β ζ} (θ̃; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})] - {l̈}_{β ζ} (θ_{0}; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})]} .

Then by direct calculation we have

P | {l̈}_{β β} (θ̃; Z) - {l̈}_{β β} (θ_{0}; Z) | \leq P | X X' Δ {\ddot{\tilde{g}} (ε_{β̃}) - {g̈}_{0} (ε_{0})} | + P {X X' | \int_{a}^{b} 1 (ε_{0} \geq t) {exp {g̃ (t_{β̃})} \ddot{\tilde{g}} (t_{β̃}) - exp {g_{0} (t)} {g̈}_{0} (t)} d t + \int_{a}^{b} 1 (ε_{0} \geq t) {exp {g̃ (t_{β̃})} {\dot{\tilde{g}}}^{2} (t_{β̃}) - exp {g_{0} (t)} ġ_{0}^{2} (t)} d t |} ≲ P | Δ {\ddot{\tilde{g}} (ε_{β̃}) - {g̈}_{0} (ε_{0})} | + P {\int_{a}^{b} | exp {g̃ (t_{β̃})} \ddot{\tilde{g}} (t_{β̃}) - exp {g_{0} (t)} {g̈}_{0} (t) | d t} + P {\int_{a}^{b} | exp {g̃ (t_{β̃})} {\dot{\tilde{g}}}^{2} (t_{β̃}) - exp {g_{0} (t)} ġ_{0}^{2} (t) | d t} = C_{1} + C_{2} + C_{3} .

By applying the similar argument that we used before for verifying A5 and Condition C.6, we can show

C_{1} ≲ | β - β_{0} | + {‖ g̈ (ψ (\cdot, β)) - {g̈}_{0} (ψ (\cdot, β_{0})) ‖}_{2} = O (n^{- ξ}) + O {n^{- min ((p - 2) ν, (1 - ν) / 2)}} .

Similarly, we can show

C_{2} ≲ | β - β_{0} | + {‖ g̈ (ψ (\cdot, β)) - {g̈}_{0} (ψ (\cdot, β_{0})) ‖}_{2} = O (n^{- ξ}) + O {n^{- min ((p - 2) ν, (1 - ν) / 2)}}

and

C_{3} ≲ | β - β_{0} | + {‖ ġ (ψ (\cdot, β)) - ġ_{0} (ψ (\cdot, β_{0})) ‖}_{2} = O (n^{- ξ}) + O {n^{- min ((p - 1) ν, (1 - ν) / 2)}},

where ξ = min(pν, (1 − ν)/2). Therefore,

P | {l̈}_{β β} (θ̃; Z) - {l̈}_{β β} (θ_{0}; Z) | = O {n^{- min ((p - 2) ν, (1 - ν) / 2)}}

and thus

P | {l̈}_{β β} (θ̃; Z) - {l̈}_{β β} (θ_{0}; Z) | (β - β_{0}) = O {n^{- min ((p - 2) ν, (1 - ν) / 2)}} \cdot O {n^{- min (p ν, (1 - ν) / 2)}} = O {n^{- min (2 (p - 1) ν, \frac{1}{2} + (p - \frac{5}{2}) ν, 1 - ν)}} = o (n^{- 1 / 2}),

where the last equality holds since p ≥ 3, so $2 (p - 1) ν > \frac{p - 1}{p + 1} \geq \frac{1}{2}, \frac{1}{2} + (p - \frac{5}{2}) ν > \frac{1}{2}$ and $1 - ν > \frac{1}{2}$ . Similarly we can show

P | {l̈}_{β ζ} (θ̃; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})] - {l̈}_{β ζ} (θ_{0}; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})] | = O {n^{- min (2 (p - 1) ν, \frac{1}{2} + (p - \frac{5}{2}) ν, 1 - ν)}} = o (n^{- 1 / 2}) .

Therefore, we have

| P {{l̇}_{β} (θ; Z) - {l̇}_{β} (θ_{0}; Z) - {l̈}_{β β} (θ_{0}; Z) (β - β_{0}) - {l̈}_{β ζ} (θ̃; Z) [ζ (\cdot, β) - ζ_{0} (\cdot, β_{0})]} | = O {n^{- min (2 (p - 1) ν, \frac{1}{2} + (p - \frac{5}{2}) ν, 1 - ν)}} = O (n^{- α ξ}),

where $α = min (2 (p - 1) ν, \frac{1}{2} + (p - \frac{5}{2}) ν, 1 - ν) / min (p ν, \frac{1 - ν}{2}) > 1$ and αξ > 1/2.

Therefore, we have verified all six assumptions and thus we have

\sqrt{n} ({β̂}_{n} - β_{0}) = A^{- 1} \sqrt{n} ℙ_{n} l_{β_{0}}^{*} (β_{0}, ζ_{0} (\cdot, β_{0}); Z) + o_{p} (1) \to N (0, A^{- 1} B (A^{- 1})'),

where $l_{β_{0}}^{*} (θ_{0}; Z) = {l̇}_{β} (θ_{0}; Z) - {l̇}_{ζ} (θ_{0}; Z) [h^{*}]$ is the efficient score function for β₀ and $A = P {l_{β_{0}}^{*} (Y, Δ, X)}^{\otimes 2} = I (β_{0})$ , which is shown when verifying A3. Hence A = B and A⁻¹B(A⁻¹)′ = A⁻¹ = I⁻¹(β₀), and

\sqrt{n} ℙ_{n} l_{β_{0}}^{*} (θ_{0}; Z) = n^{- \frac{1}{2}} \sum_{i = 1}^{n} l_{β_{0}}^{*} (Y_{i}, Δ_{i}, X_{i}) .

Thus we complete the proof of Theorem 4.2.

Supplementary Material

Supplement Material

NIHMS359465-supplement-Supplement_Material.pdf^{(117.4KB, pdf)}

Acknowledgements

The authors would like to thank two referees and an associate editor for their very helpful comments.

Footnotes

Supported in part by NSF Grant DMS-0706700. Nan’s research is also supported in part by NSF grant DMS-1007590 and NIH grant R01-AG036802.

SUPPLEMENTARY MATERIAL

Additional proofs. The supplementary document contains proofs of technical lemmas and Theorems 4.1 and 4.3.

Contributor Information

Ying Ding, Email: yingding@umich.edu.

Bin Nan, Email: bnan@umich.edu.

References

1.Ai C, Chen X. Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica. 2003;71:1795–1843. [Google Scholar]
2.Buckley J, James I. Linear Regression with Censored Data. Biometrika. 1979;66:429–436. [Google Scholar]
3.Chamberlain G. Asymptotic Efficiency in Estimation with Conditional Moment Restrictions. Journal of Econometrics. 1987;34:305–334. [Google Scholar]
4.Chen X. Large Sample Sieve Estimation of Semi-nonparametric Models. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics, Volumn 6B. Elsevier; 2007. pp. 5549–5632. [Google Scholar]
5.Chen X, Linton O, Van Keilegom I. Estimation of semiparametric models when the criterion function is not smooth. Econometrica. 2003;71:1591–1608. [Google Scholar]
6.Cox DR. Regression Models and Lifetables. Journal of the Royal Statistical Society, Series B. 1972;34:187–220. [Google Scholar]
7.Ding Y. Ph.D. Thesis, Biostatistics. University of Michigan; 2010. Some New Insights about the Accelerated Failure Time Model. [Google Scholar]
8.He X, Shao Q-M. On parameters of increasing dimensions. Journal of Multivariate Analysis. 2000;73:120–135. [Google Scholar]
9.He X, Xue H, N-Z S. Sieve maximum likelihood estimation for doubly semiparametric zero-inflated Poisson models. Journal of Multivariate Analysis. 2010;101:2026–2038. doi: 10.1016/j.jmva.2010.05.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Huang J. Efficient Estimation for the Proportional Hazards Model with Interval Censoring. The Annals of Statistics. 1996;24:540–568. [Google Scholar]
11.Huang J. Efficient Estimation of the Partly Linear Additive Cox Model. The Annals of Statistics. 1999;27:1536–1563. [Google Scholar]
12.Huang J, Wellner JA. Interval censored survival data: a review of recent progress. Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis, Lecture Notes in Statistics. 1997;123:123–169. [Google Scholar]
13.Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based Inference for the Accelerated Failure Time Model. Biometrika. 2003;90:341–353. [Google Scholar]
14.Jin Z, Lin DY, Ying Z. On Least-Squares Regression with Censored Data. Biometrika. 2006;93:147–161. [Google Scholar]
15.Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd edition. Hoboken, NJ: Wiley; 2002. [Google Scholar]
16.Lai TL, Ying Z. Large Sample Theory of a Modified Buckley-James Estimator for Regression Analysis with Censored Data. The Annals of Statistics. 1991;10:1370–1402. [Google Scholar]
17.Miller RG, Halpern J. Regression with censored data. Biometrika. 1982;69:521–531. [Google Scholar]
18.Nan B, Kalbfleisch JD, Yu M. Asymptotic theory for the semiparametric accelerated failure time model with missing data. The Annals of Statistics. 2009;37:2351–2376. [Google Scholar]
19.Prentice RL. Linear Rank Tests with Right Censored Data. Biometrika. 1978;65:167–179. [Google Scholar]
20.Ritov Y. Estimation in a Linear Regression Model with Censored Data. The Annals of Statistics. 1990;18:303–328. [Google Scholar]
21.Ritov Y, Wellner JA. Censoring, Martingales and the Cox model. In: Prabhu NU, editor. Statistical Inference from Stochastic Processes. Providence, RI: American Mathematical Society; 1988. pp. 191–219. [Google Scholar]
22.Schumaker L. Spline Functions: Basic Theory. New York: Wiley; 1981. [Google Scholar]
23.Shen X. On methods of sieves and penalization. The Annals of Statistics. 1997;25:2555–2591. [Google Scholar]
24.Shen X, Wong WH. Convergence Rate of Sieve Estimates. The Annals of Statistics. 1994;22:580–615. [Google Scholar]
25.Tsiatis AA. Estimating Regresion Parameters Using Linear Rank Tests for Censored Data. The Annals of Statistics. 1990;18:354–372. [Google Scholar]
26.van der Vaart AW. Asymptotic Statistics. Cambridge University Press; 1998. [Google Scholar]
27.van der Vaart AW, Wellner JA. Weak Convergence and Em- pirical Processes. New York: Springer; 1996. [Google Scholar]
28.Wei LJ, Ying Z, Lin DY. Linear Regression Analysis of Censored Survival Data Based on Rank Tests. Biometrika. 1990;77:845–851. [Google Scholar]
29.Wellner JA, Zhang Y. Two Likelihood-based Semiparametric Estimation Methods for Panel Count Data with Covariates. The Annals of Statistics. 2007;35:2106–2142. [Google Scholar]
30.Ying Z. A large Sample Study of Rank Estimation for Censored Regression Data. The Annals of Statistics. 1993;21:76–99. [Google Scholar]
31.Zeng D, Lin DY. Efficient Estimation for the Accelerated Failure Time Model. Journal of the American Statistical Association. 2007;102:1387–1396. [Google Scholar]
32.Zhang Y, Hua L, Huang J. A Spline-Based Semiparametric Maximum Likelihood Estimation Method for the Cox Model with Interval-Censored Data. Scandinavian Journal of Statistics. 2010;37:338–354. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement Material

NIHMS359465-supplement-Supplement_Material.pdf^{(117.4KB, pdf)}

[R1] 1.Ai C, Chen X. Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica. 2003;71:1795–1843. [Google Scholar]

[R2] 2.Buckley J, James I. Linear Regression with Censored Data. Biometrika. 1979;66:429–436. [Google Scholar]

[R3] 3.Chamberlain G. Asymptotic Efficiency in Estimation with Conditional Moment Restrictions. Journal of Econometrics. 1987;34:305–334. [Google Scholar]

[R4] 4.Chen X. Large Sample Sieve Estimation of Semi-nonparametric Models. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics, Volumn 6B. Elsevier; 2007. pp. 5549–5632. [Google Scholar]

[R5] 5.Chen X, Linton O, Van Keilegom I. Estimation of semiparametric models when the criterion function is not smooth. Econometrica. 2003;71:1591–1608. [Google Scholar]

[R6] 6.Cox DR. Regression Models and Lifetables. Journal of the Royal Statistical Society, Series B. 1972;34:187–220. [Google Scholar]

[R7] 7.Ding Y. Ph.D. Thesis, Biostatistics. University of Michigan; 2010. Some New Insights about the Accelerated Failure Time Model. [Google Scholar]

[R8] 8.He X, Shao Q-M. On parameters of increasing dimensions. Journal of Multivariate Analysis. 2000;73:120–135. [Google Scholar]

[R9] 9.He X, Xue H, N-Z S. Sieve maximum likelihood estimation for doubly semiparametric zero-inflated Poisson models. Journal of Multivariate Analysis. 2010;101:2026–2038. doi: 10.1016/j.jmva.2010.05.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Huang J. Efficient Estimation for the Proportional Hazards Model with Interval Censoring. The Annals of Statistics. 1996;24:540–568. [Google Scholar]

[R11] 11.Huang J. Efficient Estimation of the Partly Linear Additive Cox Model. The Annals of Statistics. 1999;27:1536–1563. [Google Scholar]

[R12] 12.Huang J, Wellner JA. Interval censored survival data: a review of recent progress. Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis, Lecture Notes in Statistics. 1997;123:123–169. [Google Scholar]

[R13] 13.Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based Inference for the Accelerated Failure Time Model. Biometrika. 2003;90:341–353. [Google Scholar]

[R14] 14.Jin Z, Lin DY, Ying Z. On Least-Squares Regression with Censored Data. Biometrika. 2006;93:147–161. [Google Scholar]

[R15] 15.Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd edition. Hoboken, NJ: Wiley; 2002. [Google Scholar]

[R16] 16.Lai TL, Ying Z. Large Sample Theory of a Modified Buckley-James Estimator for Regression Analysis with Censored Data. The Annals of Statistics. 1991;10:1370–1402. [Google Scholar]

[R17] 17.Miller RG, Halpern J. Regression with censored data. Biometrika. 1982;69:521–531. [Google Scholar]

[R18] 18.Nan B, Kalbfleisch JD, Yu M. Asymptotic theory for the semiparametric accelerated failure time model with missing data. The Annals of Statistics. 2009;37:2351–2376. [Google Scholar]

[R19] 19.Prentice RL. Linear Rank Tests with Right Censored Data. Biometrika. 1978;65:167–179. [Google Scholar]

[R20] 20.Ritov Y. Estimation in a Linear Regression Model with Censored Data. The Annals of Statistics. 1990;18:303–328. [Google Scholar]

[R21] 21.Ritov Y, Wellner JA. Censoring, Martingales and the Cox model. In: Prabhu NU, editor. Statistical Inference from Stochastic Processes. Providence, RI: American Mathematical Society; 1988. pp. 191–219. [Google Scholar]

[R22] 22.Schumaker L. Spline Functions: Basic Theory. New York: Wiley; 1981. [Google Scholar]

[R23] 23.Shen X. On methods of sieves and penalization. The Annals of Statistics. 1997;25:2555–2591. [Google Scholar]

[R24] 24.Shen X, Wong WH. Convergence Rate of Sieve Estimates. The Annals of Statistics. 1994;22:580–615. [Google Scholar]

[R25] 25.Tsiatis AA. Estimating Regresion Parameters Using Linear Rank Tests for Censored Data. The Annals of Statistics. 1990;18:354–372. [Google Scholar]

[R26] 26.van der Vaart AW. Asymptotic Statistics. Cambridge University Press; 1998. [Google Scholar]

[R27] 27.van der Vaart AW, Wellner JA. Weak Convergence and Em- pirical Processes. New York: Springer; 1996. [Google Scholar]

[R28] 28.Wei LJ, Ying Z, Lin DY. Linear Regression Analysis of Censored Survival Data Based on Rank Tests. Biometrika. 1990;77:845–851. [Google Scholar]

[R29] 29.Wellner JA, Zhang Y. Two Likelihood-based Semiparametric Estimation Methods for Panel Count Data with Covariates. The Annals of Statistics. 2007;35:2106–2142. [Google Scholar]

[R30] 30.Ying Z. A large Sample Study of Rank Estimation for Censored Regression Data. The Annals of Statistics. 1993;21:76–99. [Google Scholar]

[R31] 31.Zeng D, Lin DY. Efficient Estimation for the Accelerated Failure Time Model. Journal of the American Statistical Association. 2007;102:1387–1396. [Google Scholar]

[R32] 32.Zhang Y, Hua L, Huang J. A Spline-Based Semiparametric Maximum Likelihood Estimation Method for the Cox Model with Interval-Censored Data. Scandinavian Journal of Statistics. 2010;37:338–354. [Google Scholar]

PERMALINK

A SIEVE M-THEOREM FOR BUNDLED PARAMETERS IN SEMIPARAMETRIC MODELS, WITH APPLICATION TO THE EFFICIENT ESTIMATION IN A LINEAR MODEL FOR CENSORED DATA^*

Ying Ding

Bin Nan

Abstract

1. Introduction

2. The sieve M-theorem on the asymptotic normality of semiparametric estimation for bundled parameters

3. Back to the linear model: the sieve maximum likelihood estimation

4. Asymptotic results

5. Numerical examples

5.1. Simulations

Table 1.

5.2. A real data example

Table 2.

6. Discussion

7. Proof of Theorem 4.2

7.1. Technical lemmas

7.2. Proof of Theorem 4.2

Supplementary Material

Acknowledgements

Footnotes

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

A SIEVE M-THEOREM FOR BUNDLED PARAMETERS IN SEMIPARAMETRIC MODELS, WITH APPLICATION TO THE EFFICIENT ESTIMATION IN A LINEAR MODEL FOR CENSORED DATA*

Ying Ding

Bin Nan

Abstract

1. Introduction

2. The sieve M-theorem on the asymptotic normality of semiparametric estimation for bundled parameters

3. Back to the linear model: the sieve maximum likelihood estimation

4. Asymptotic results

5. Numerical examples

5.1. Simulations

Table 1.

5.2. A real data example

Table 2.

6. Discussion

7. Proof of Theorem 4.2

7.1. Technical lemmas

7.2. Proof of Theorem 4.2

Supplementary Material

Acknowledgements

Footnotes

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

A SIEVE M-THEOREM FOR BUNDLED PARAMETERS IN SEMIPARAMETRIC MODELS, WITH APPLICATION TO THE EFFICIENT ESTIMATION IN A LINEAR MODEL FOR CENSORED DATA^*