Abstract
We consider tests of hypotheses when the parameters are not identifiable under the null in semiparametric models, where regularity conditions for profile likelihood theory fail. Exponential average tests based on integrated profile likelihood are constructed and shown to be asymptotically optimal under a weighted average power criterion with respect to a prior on the nonidentifiable aspect of the model. These results extend existing results for parametric models, which involve more restrictive assumptions on the form of the alternative than do our results. Moreover, the proposed tests accommodate models with infinite dimensional nuisance parameters which either may not be identifiable or may not be estimable at the usual parametric rate. Examples include tests of the presence of a change-point in the Cox model with current status data and tests of regression parameters in odds-rate models with right censored data. Optimal tests have not previously been studied for these scenarios. We study the asymptotic distribution of the proposed tests under the null, fixed contiguous alternatives and random contiguous alternatives. We also propose a weighted bootstrap procedure for computing the critical values of the test statistics. The optimal tests perform well in simulation studies, where they may exhibit improved power over alternative tests.
Keywords and phrases: Change-point models, contiguous alternative, empirical processes, exponential average test, nonstandard testing problem, odds-rate models, optimal test, power, profile likelihood
1. Introduction
In this paper we investigate nonstandard testing problems involving a family of probability distributions {Pθ, θ ∈ Θ}, known up to a parameter θ, in a parameter space Θ. The parameter space Θ is assumed to be a subset of an infinite-dimensional metric space. The null and alternative hypotheses are:

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ \ Θ0,
where Θ0 is a subset of Θ and contains at least two elements. In the usual testing framework, the parameters are unique under the null, so that identifiability is not an issue. While we allow multiple values of θ satisfying the null, we assume that the null distribution, denoted by P0, is unique, where Θ0 = {θ ∈ Θ : Pθ = P0}. Under this set-up, the true value of θ is not identifiable under the null, since for any θ ≠ θ′ in Θ0, Pθ = Pθ′ = P0. Such loss of identifiability occurs in diverse applications in the social, biological, physical and medical sciences. We next present two such examples followed by a description of the main contributions of this paper. The Introduction concludes with a brief outline of the remainder of the paper.
1.1. Example 1: univariate frailty regression under right censoring
Let T be a non-negative random variable representing the failure time, C be the independent censoring time, V ≡ min(T, C) and Z ≡ Z(·) be a corresponding p-dimensional covariate process. The observed data {Xi = (Vi, Δi, Zi), i = 1, …, n} consists of n i.i.d. realizations of X = (V, Δ, Z), where Δ ≡ 1{T ≤ V}, 1{·} is the indicator function. In this model, the hazard function of the survival time T given covariates Z is
λ(t | Z, W) = W η(t) exp{β′Z(t)},    (1)
where t is the time index, W is an unobserved gamma frailty with mean 1 and variance ζ, β is a p–dimensional regression parameter and η(·) is a completely unspecified baseline hazard function.
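As a concrete illustration, the data-generating mechanism of model (1) can be sketched in a few lines of code. This is our own illustrative simulation, not from the paper: it assumes a scalar standard normal covariate, a constant baseline hazard η(t) = η0 and exponential censoring, and all function and parameter names are hypothetical.

```python
import numpy as np

def simulate_frailty_data(n, beta, zeta, eta0=1.0, cens_rate=0.5, seed=0):
    """Simulate right-censored data (V, Delta, Z) from the gamma-frailty model (1).

    Illustrative assumptions: a scalar time-independent covariate Z ~ N(0, 1)
    and a constant baseline hazard eta(t) = eta0, so that given (W, Z) the
    failure time T is exponential with rate W * eta0 * exp(beta * Z).
    """
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=n)
    # Gamma frailty with mean 1 and variance zeta, as in the model statement.
    W = rng.gamma(shape=1.0 / zeta, scale=zeta, size=n)
    T = rng.exponential(1.0 / (W * eta0 * np.exp(beta * Z)))
    C = rng.exponential(1.0 / cens_rate, size=n)  # independent censoring time
    V = np.minimum(T, C)
    Delta = (T <= C).astype(int)  # equals 1{T <= V} since V = min(T, C)
    return V, Delta, Z
```

When β = 0 the covariate drops out of the hazard and the frailty W is confounded with the baseline hazard, which is exactly the nonidentifiability discussed below.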
When β is not zero, the odds-rate model has been treated extensively; see Kosorok et al. (2004); Murphy et al. (1997); Murphy and van der Vaart (1997, 2000); Parner (1998); Slud and Vonta (2004), among others. Scharfstein et al. (1998) considered semiparametric efficient estimation in the setting where the covariates are time-independent, ζ is assumed known and η(·) is assumed to be absolutely continuous. Bagdonavičius and Nikulin (1999) considered estimation for a class of proportional hazards models, which includes the odds-rate model with ζ unspecified, based on a modified partial likelihood. Kosorok et al. (2004) considered robust inference for odds-rate models when the frailty distribution and regression covariates may be mis-specified. To our knowledge, problems associated with testing the null β = 0 when the frailty parameter is unknown have not been previously considered in the statistical literature.
It has been shown that ζ and η(·) are not identifiable under the null (Kosorok et al., 2004). Intuitively, when β = 0, the covariate process Z provides no information for the failure time process. The frailty W and the baseline hazard η(·) are not distinguishable from each other, hence ζ and η(·) are not identifiable. Thus, the testing problem described above is nonregular and standard asymptotic results are not applicable.
1.2. Example 2: change-point regression for current status data
Change-point models have been studied extensively and have proven to be popular in clinical research. In many settings, a change-point effect is realistic and can be much easier to interpret than a quadratic or more complex nonlinear effect (Chappell, 1989). Change-point Cox models have been widely used in survival applications, as in Kosorok and Song (2007); Luo et al. (1997); Pons (2003), where likelihood ratio tests were investigated. However, to our knowledge, optimal testing has not been explored for such models.
Under current status censoring, a subject is examined once at a random observation time V and at that time it is observed whether the event time T ≤ V or not. The observed data {Xi = (Vi, Δi, Zi), i = 1, …, n} consists of n i.i.d. realizations of X = (V, Δ, Z), where Δ ≡ 1{T ≤ V} and Z is a d–dimensional covariate. Here we let d = 1 for simplicity. In this example, we assume that the time to event T satisfies a change-point Cox model conditionally on the covariate Z. That is, the density of X is given by:
p(β,η,ζ)(x) = [1 − exp{−Λ(v) exp{rγ(z)}}]^δ [exp{−Λ(v) exp{rγ(z)}}]^{1−δ} f(v, z),    (2)

with rγ(z) = αz + (β1 + β2z)1{z > ζ}, where α, β1 and β2 are scalar regression parameters, ζ is the change-point parameter, Λ(·) is the cumulative baseline hazard function and f(v, z) is the joint density of (V, Z). We also define the collected parameters β ≡ (β1, β2), ξ ≡ (β, α), γ ≡ (ξ, ζ) and η ≡ (α, Λ). We are particularly interested in testing for the existence of a change-point in the regression parameters in the above model, that is, H0 : β = 0.
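The change-point structure of model (2) is easy to simulate. The sketch below is our own illustration (names and distributional choices are assumptions, not from the paper): it takes Z uniform on (0, 1), a standard exponential baseline Λ(t) = t and a uniform observation time. Note that when β1 = β2 = 0 the output does not depend on ζ, which exhibits the loss of identifiability under the null.

```python
import numpy as np

def simulate_current_status(n, alpha, beta1, beta2, zeta, tau=3.0, seed=0):
    """Simulate current status data (V, Delta, Z) from model (2).

    Illustrative assumptions: Z ~ Uniform(0, 1), baseline Lambda(t) = t,
    and observation time V ~ Uniform(0, tau).
    """
    rng = np.random.default_rng(seed)
    Z = rng.uniform(size=n)
    # Change-point regression effect r_gamma(z).
    r = alpha * Z + (beta1 + beta2 * Z) * (Z > zeta)
    # With Lambda(t) = t, T is exponential with rate exp(r), i.e. scale exp(-r).
    T = rng.exponential(np.exp(-r))
    V = rng.uniform(0.0, tau, size=n)
    Delta = (T <= V).astype(int)  # only the current status of T is observed
    return V, Delta, Z
```

Running the sketch with β1 = β2 = 0 and two different values of ζ (same seed) produces identical data, since ζ then drops out of rγ(z).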
Although Cox regression with current status data was discussed by Huang (1996) and others, change-point Cox regression has not been studied with current status data. The development of optimal tests in the current status setting is further complicated by the fact that the nuisance parameter Λ cannot be estimated at the parametric rate, unlike with right censored data.
In model (2), the change-point parameter is present only under the alternative. This is different from Example 1, where the odds rate parameter ζ and the baseline hazard function η(·) are both present, but indistinguishable, under the null.
1.3. Description of main contribution
The statistical literature contains numerous precedents on the nonidentifiability problem in parametric models, see Chernoff (1954), Chernoff and Lander (1995), Dacunha-Castelle and Gassiat (1999) and Liu and Shao (2003). Among others, Dacunha-Castelle and Gassiat (1999) proposed a locally conic parametrization approach to enable asymptotic expansions of the likelihood ratio test under loss of identifiability under the null. Liu and Shao (2003) derived a quadratic approximation of the loglikelihood ratio function by using Hellinger distance. Most authors directly study the approximation of the log-likelihood ratio function in some neighborhood and obtain its asymptotic null distribution. However, the asymptotic optimality properties of the classical likelihood ratio tests (LRT) do not hold anymore (Lindsay, 1995) and Wald and score tests are not even well defined in these nonstandard problems. To our knowledge, all results for testing nonidentifiable P0 using likelihood based tests are for parametric models. The main aim of this paper is to investigate the construction of optimal likelihood based tests for semiparametric models.
A key question which arises, as noted by Dacunha-Castelle and Gassiat (1999), is: since the parameter is not identifiable, around which point can an expansion be made? To address this question, we assume the existence of a “full rank” reparameterization which contains all the information of the null model and in which all parameters are identifiable. To be specific, we partition θ ≡ (ψ, ζ) and ψ ≡ (β, η), where β ∈ ℝp is a parameter of interest, ζ ∈ ℝq and η is a parameter lying in an arbitrary space ℋη. We assume that the information in the null model can be absorbed into the parameter space of η through this full rank reparameterization. This is made precise in Section 2. Note that Example 1 requires such a reparameterization since both ζ and η are present under the null. In contrast, such a reparameterization is not required for Example 2 since ζ is not present under the null.
When the models involved are parametric, the special case in which η does not depend on ζ under the null, that is, ζ is present only under the alternative, has been studied extensively by Andrews and Ploberger (1994); Davies (1977, 1987); King and Shively (1993), and others. Davies (1977) showed that the likelihood ratio test is optimal in the sense that, as the significance level of the test tends to zero, its power function approaches that of the optimal test for known ζ. Such optimality results are very weak and do not provide guidance regarding the performance of the test in practical applications, where the significance level is fixed, e.g., at level 0.05 (Andrews, 1999). Andrews and Ploberger (1994) studied optimal tests for parametric models using the weighted average power criterion originally introduced by Wald (1943) in studying the likelihood ratio test under regularity conditions, where the model is identifiable under the null. Under loss of identifiability, the likelihood ratio test is generally less powerful than the optimal test of Andrews and Ploberger (1994). These optimal tests possess a Bayesian interpretation, where the weight corresponds to a prior on the nonidentifiable parameter, and are asymptotically equivalent to a Bayesian posterior odds ratio.
In this paper, we adapt the weighted average power criterion (Andrews and Ploberger, 1994; Wald 1943) to construct optimal tests in semiparametric models under loss of identifiability. Our main contribution is to extend the results of Andrews and Ploberger (1994) in at least four directions.
First, Andrews and Ploberger (1994) address only parametric models, as is the case for most of the literature on testing problems with nonidentifiability under the null. Our optimality results are available for semiparametric models, where η may be infinite dimensional and ζ may not be estimable at the usual parametric rate under either the null or the alternative. A semiparametric profile likelihood approach is adopted to reduce the infinite-dimensional model to a finite-dimensional uniformly least-favorable submodel; see Murphy and van der Vaart (2000) for a discussion of profile likelihood in regular settings. We note however, that the idea of uniformly least favorable submodels is a new concept in semiparametric settings, which is not discussed in Murphy and van der Vaart (2000). The development of this concept is both nontrivial and critical to establishing an appropriate optimality criterion for semiparametric models under loss of identifiability.
Second, the results of Andrews and Ploberger (1994) are applicable only for tests where a nuisance parameter (namely ζ) is present only under the alternative. This may not be true in our situation, where a nondegenerate reparameterization may be needed to make ζ vanish under the null. Furthermore, our tests and the optimality results do not depend on the reparameterization.
Third, Andrews and Ploberger (1994) establish that their test is optimal with respect to local alternatives for ψ involving a multivariate normal prior with singular covariance matrix. In our approach, it is only necessary to specify the prior in the direction of β, the parameter of interest, and no prior is needed on the remaining parameter η. This enables us to avoid the singular covariance issue in Andrews and Ploberger (1994).
Fourth, we develop a simple and effective Monte Carlo method of inference for the proposed test statistics.
Adopting a profile likelihood approach has several advantages. First, under the identifiable submodel, the MLE for η may converge at a rate slower than the usual √n rate, as in the change-point Cox model with current status data. This causes theoretical justifications based on a Taylor expansion of the full likelihood to fail. Second, even if the MLE of the nonparametric component converges at the √n rate, semiparametric likelihoods may not be suitably “differentiable,” in particular when the likelihood contains certain empirical terms, as with, for example, the odds-rate model. Third, handling the remainder terms in a Taylor type expansion is challenging, owing to the presence of the infinite dimensional parameters, and a delicate Banach space analysis is required. Employing the profile likelihood enables us to address these issues rigorously.
1.4. Organization of paper
The remainder of the paper is organized as follows. In Section 2, we present the generic testing problem and the model and data assumptions. The optimality results are given in Section 3. We verify that the results hold for the examples in Section 4. In Section 5, we describe a simulation study to evaluate the finite-sample behavior of the proposed tests and to compare their efficiency with some alternative tests for the current status example. In Section 6, we discuss some additional examples without identifiability under the null which are not covered by our current settings and which require further extensions. Proofs are given in Section 7.
2. The hypothesis tests and assumptions
2.1. The optimal tests
In this subsection we formulate the tests of hypotheses when the parameters are not identifiable under the null. Let Pθ denote the probability measure, based on observed data X̃n ≡ (X1, X2, …, Xn), where θ ∈ Θ and the subscript n is the sample size. As mentioned previously, the parameters θ ∈ Θ0 under the null hypothesis are not identifiable. We assume, as in the examples, that θ can be partitioned as (ψ, ζ), with ζ q-dimensional and ψ of arbitrary dimension. We further assume that ψ can be partitioned as (β, η) so that the null hypothesis can be stated in terms of β, with the nuisance parameter η having arbitrary dimension. The likelihood function of the data is given by ln(θ) and the profile likelihood for β and ζ is defined as pln(β, ζ) = supη ln(β, η, ζ). For the semiparametric model {P(β, η, ζ)} on a sample space 𝒳, we assume β ∈ ℝp, ζ ∈ Ξ, a compact subset of ℝq, and η ∈ ℋη, a subset of a Banach space.
The hypotheses to be tested are:

H0 : β = β0 versus H1 : β ≠ β0.    (3)
When β = β0, the null distribution P0 is unique and the likelihood for a single observation under the null is abbreviated as l0. Let π ≡ (η, ζ). The null set of π is Π0 and its cardinality is the same as that of Ξ, which is at least two. Θ0 = {β0} × Π0. For each ζ ∈ Ξ, η0(ζ) ≡ {t ∈ ℋη : (t, ζ) ∈ Π0} is an interior point of ℋη. Let ψ0(ζ) ≡ (β0, η0(ζ)), and θ0(ζ) ≡ (ψ0(ζ), ζ). Thus, Θ0 can be represented as Θ0 = {θ0(ζ) : ζ ∈ Ξ}.
Before introducing the optimal tests, we need some additional notation for the parameter space and for the score and information operators in the semiparametric setting. We denote l̇β as the derivative of log l1(θ) with respect to β and l̈β as the second derivative of log l1(θ) with respect to β. L2⁰(Pθ) refers to the class of square integrable functions under the measure Pθ with mean 0. The score operator for η is defined as l̇η, which is a bounded linear map from lin ℋη to L2⁰(Pθ), with adjoint operator l̇η*, where lin ℋη is the closed linear span of ℋη. The information operator is l̇η*l̇η. The efficient score for β, l̃β, is the ordinary score function l̇β minus its orthogonal projection onto the closed linear span of the range of the score operator l̇η. The efficient information for β is Ĩβ ≡ P0[l̃β l̃β′], which is the asymptotic variance of the efficient score function.

We use the notations ℙn and 𝔾n for the empirical distribution and the empirical process of the observations. That is, for every measurable function f and probability measure P,

ℙn f = n⁻¹ ∑ᵢ₌₁ⁿ f(Xᵢ) and 𝔾n f = √n (ℙn − P) f.
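In code, these two empirical quantities are simple averages; a toy illustration (function names ours):

```python
import numpy as np

def empirical_measure(f, x):
    """P_n f = n^{-1} sum_i f(X_i): the empirical distribution applied to f."""
    return np.mean(f(x))

def empirical_process(f, x, Pf):
    """G_n f = sqrt(n) (P_n f - P f), given the population mean P f."""
    return np.sqrt(len(x)) * (empirical_measure(f, x) - Pf)
```

For example, with f the identity, ℙn f is the sample mean and 𝔾n f is the centered and √n-scaled sample mean.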
We note that although simultaneous estimation of β and ζ fails under the null due to nonidentifiability, estimation results for β̂n(ζ), the MLE of β at a fixed value of ζ, are often valid under the null. This suggests making inference about β using β̂n(ζ). For fixed ζ ∈ Ξ, the score, Wald and likelihood ratio test statistics for testing H0 against H1 are given by

Rn(ζ) = n[ℙnl̇β(θ̂0(ζ))]′ Ĩβ⁻¹(θ̂0(ζ)) [ℙnl̇β(θ̂0(ζ))],
Wn(ζ) = n(β̂n(ζ) − β0)′ Ĩβ(θ̂n(ζ)) (β̂n(ζ) − β0),
LRn(ζ) = 2 log{pln(β̂n(ζ), ζ)/pln(β0, ζ)},

where θ̂n(ζ) ≡ (β̂n(ζ), η̂n(ζ), ζ) is the unrestricted MLE of θ at a fixed value of ζ and θ̂0(ζ) ≡ (β0, η̂0(ζ), ζ) is the restricted MLE of θ for a fixed value of ζ under the null. ℙnl̇β(θ̂0(ζ)) = ℙnl̇β(β0, η̂0(ζ), ζ) is the empirical score function of β evaluated at the restricted MLE θ̂0(ζ), and ℙnl̇β(θ̂n(ζ)) = ℙnl̇β(β̂n(ζ), η̂n(ζ), ζ) is the empirical score function of β evaluated at the unrestricted MLE θ̂n(ζ). The inverse of Ĩβ(θ̂0(ζ)), a consistent estimator of the efficient information Ĩβ under the null, estimates the covariance matrix of β̂n(ζ).
The optimal tests we propose take the form

ERn = ∫Ξ (1 + c)^{−p/2} exp{cRn(ζ)/[2(1 + c)]} dJ(ζ),
EWn = ∫Ξ (1 + c)^{−p/2} exp{cWn(ζ)/[2(1 + c)]} dJ(ζ),
ELRn = ∫Ξ (1 + c)^{−p/2} exp{cLRn(ζ)/[2(1 + c)]} dJ(ζ),

where c > 0 is a known constant and J(·) is a pre-selected integrable prior on ζ. Their optimality will be discussed in Section 3. We note that, in semiparametric settings, the computation of the efficient information may involve high dimensional maximization and nonparametric smoothing, so the tests ERn and EWn may be computationally harder than ELRn. Hence the likelihood ratio based test ELRn is more attractive in these settings.
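On a finite grid of ζ values with grid weights approximating the prior J, the exponential average statistics reduce to weighted sums; a minimal sketch, with our own function name, assuming LRn(ζ) has already been computed on the grid:

```python
import numpy as np

def exp_average_stat(lr_values, prior_weights, c, p):
    """Exponential average statistic on a grid of zeta values:
    ELR_n ~= sum_j w_j (1 + c)^{-p/2} exp{ c LR_n(zeta_j) / (2 (1 + c)) },
    approximating the integral over zeta against the prior J by a weighted sum.
    """
    lr_values = np.asarray(lr_values, dtype=float)
    w = np.asarray(prior_weights, dtype=float)
    w = w / w.sum()  # normalize the grid weights to a probability measure
    return float(np.sum(w * (1 + c) ** (-p / 2)
                        * np.exp(c * lr_values / (2 * (1 + c)))))
```

With a pointmass prior (a single grid point) the statistic is a strictly increasing transform of LRn(ζ0), so it is then equivalent to the fixed-ζ test, consistent with the remark after Theorem 1 below.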
In constructing the optimal tests, understanding and computing θ̂0(ζ) may be complicated due to the dependence of the parameter θ0(ζ) on ζ. Assuming the existence of the following full rank reparameterization, we can eliminate the dependence between η and ζ, thereby easing both the theoretical developments and the computations for the proposed tests.
2.2. Full rank reparameterization: breaking the dependence between η and ζ
We assume there exists a map φζ : ℋη ↦ ℋη which is one-to-one and uniformly Hadamard-differentiable at η tangentially to ℋη over ζ ∈ Ξ, i.e.,

supζ∈Ξ ‖[φζ(η + tnhn(ζ)) − φζ(η)]/tn − φ̇ζh(ζ)‖ → 0

as supζ∈Ξ ‖hn(ζ) − h(ζ)‖ → 0 and tn → 0, where h(ζ) is in the tangent space of ℋη for all ζ ∈ Ξ and ‖·‖ denotes the norm of ℋη. Its derivative φ̇ζ is one-to-one and continuously invertible uniformly over ζ ∈ Ξ. That is, there exists a positive constant c such that ‖φ̇ζ(η1(ζ) − η2(ζ))‖ ≥ c‖η1(ζ) − η2(ζ)‖ for every η1(ζ) and η2(ζ) in ℋη and all ζ ∈ Ξ. Let η̄ ≡ φζ(η) and define the reparameterized likelihood by ℓ1(β, η̄, ζ)(x) ≡ l1(β, φζ⁻¹(η̄), ζ)(x) for all x in 𝒳; under the null, ζ vanishes from this likelihood.
This reparameterization does not change the likelihood; that is, the equality l1(β, η(ζ), ζ)(x) = ℓ1(β, η̄, ζ)(x) holds both under the null and the alternative. Under the null, the likelihood l1(β0, η0(ζ), ζ) = ℓ1(β0, η̄0, ζ) for a specific η̄0 which does not depend on ζ, and ζ disappears from the null likelihood. We thus reduce the parameter dimension of the null space from Π0 to ℋη. For Example 2 in the Introduction, φ can be taken to be the identity and thus the reparameterization is not needed. In contrast, a reparameterization is needed for Example 1. We will give the details later in Section 4.
The reason we assume the existence of such a full rank reparameterization is to eliminate the dependence between η and ζ. The issue is that the optimality results are with respect to a perturbation of the parameter η, which is not well defined in the original space, due to the dependence between parameters η and ζ. Subsequent assumptions are built on the new parameterization θ̄ ≡ (β, η̄, ζ). However, the results still hold for the original parameterization, since the efficient score and efficient information of β are invariant under such reparameterization of η, as given in the following lemma:
Lemma 1. Under the full rank reparameterization, l̃β(θ) = ℓ̃β(θ̄), where ℓ̃β(θ̄) is the efficient score of β under the new reparameterization. The efficient information matrix is also invariant to these reparameterizations.
Remark 1. The full rank reparameterization defined above may not be unique. We will show later in the proof of Theorem 2 that the optimal tests proposed in this paper are invariant to the choice of the full rank reparameterization.
Next we discuss how to construct the optimal tests with the new parameterization, where ζ vanishes, and ψ̄ does not depend on ζ under the null.
2.3. Constructing optimal tests under the full rank reparameterization
Though ζ disappears from the likelihood under the null hypothesis, the score and information are still processes indexed by ζ. For fixed ζ ∈ Ξ, the score, Wald and likelihood ratio test statistics for testing H0 against H1 with the new parameterization can be represented as

Rn(ζ) = n[ℙnℓ̇β(β0, η̄̂0, ζ)]′ Ĩβ⁻¹(β0, η̄̂0, ζ) [ℙnℓ̇β(β0, η̄̂0, ζ)],
Wn(ζ) = n(β̂n(ζ) − β0)′ Ĩβ(ψ̄̂n(ζ), ζ) (β̂n(ζ) − β0),
LRn(ζ) = 2 log{pln(β̂n(ζ), ζ)/pln(β0, ζ)},

where ψ̄̂n(ζ) ≡ (β̂n(ζ), η̄̂n(ζ)) is the unrestricted MLE of ψ̄ at fixed ζ and ψ̄̂0 ≡ (β0, η̄̂0) is the restricted MLE of ψ̄ under the null. ℙnℓ̇β(β0, η̄̂0, ζ) is the empirical score function of β evaluated at the restricted MLE ψ̄̂0, and ℙnℓ̇β(ψ̄̂n(ζ), ζ) is the empirical score function of β evaluated at the unrestricted MLE ψ̄̂n(ζ). The inverse of Ĩβ(β0, η̄̂0, ζ), a consistent estimator of the efficient information of β under the null, estimates the covariance matrix of β̂n(ζ). It is thus obvious that the optimal tests are invariant with respect to the choice of full rank reparameterization.
To further study the asymptotic distribution and the optimality of the proposed tests, we need the following assumptions, based on the full rank reparameterization. We note that except Assumption C, all other assumptions can also be stated with the original parameterization.
2.4. The assumptions based on the reparameterization
To derive asymptotically optimal tests of H0, we consider local alternatives to H0 of the form βn = β0 + h/√n, with ζ and η̄ unspecified. The optimality criterion will involve a weighted average power criterion, where the averaging is with respect to an integrable prior Qζ(h) on the values of h in ℝp defining the local alternatives and an integrable prior J(ζ) on ζ. Before formally stating the optimality criterion, we give assumptions on the data and the parameter spaces. The first two assumptions postulate the existence of the prior on local alternatives, Qζ(h).
A1. The efficient information function of β evaluated at (ψ̄0, ζ), Ĩβ(ψ̄0, ζ), is uniformly continuous in β and ζ over B0 × Ξ, where B0 is some neighborhood of β0. Furthermore, Ĩβ(ψ̄0, ζ) is uniformly positive definite over ζ ∈ Ξ, that is, infζ∈Ξ λmin{Ĩβ(ψ̄0, ζ)} > 0, where λmin(C) is the smallest eigenvalue of the matrix C.

A2. Qζ is a normal measure with mean β0 and variance (c/n)Ĩβ⁻¹(ψ̄0, ζ) for ζ ∈ Ξ, where c > 0 is a scalar constant.
Assumptions A1 and A2 are analogous to Assumptions 1(e), 1(f) and 4 of Andrews and Ploberger (1994), although there are fundamental differences. Andrews and Ploberger (1994) work directly by building on the full parametric likelihood and their assumptions refer to the information matrix for all parameters. Furthermore, their optimality results are defined in terms of local alternatives for ψ, where the prior is a multivariate normal with singular covariance matrix. Our Assumptions A1 and A2 are only for the parameter of interest, β, with no prior assumptions needed for η under either the null or the alternative.
The next set of conditions assumes the existence of a uniformly least-favorable submodel. This submodel can be viewed as a “uniform” version of the least favorable submodel discussed in Murphy and van der Vaart (2000): the convergence rate of the nuisance parameter now is in the “uniform” sense, and the efficient score and the efficient information possess Donsker and Glivenko-Cantelli properties with “larger” index sets, respectively. When the set of ζ, Ξ, is a singleton, this new submodel concept reduces to the ordinary least favorable submodel. The development of this concept is critical to establishing an appropriate optimality criterion for general semiparametric models under loss of identifiability. Here are the needed assumptions:
B1. There exists a map t ↦ ft from a fixed neighborhood of β0 into ℋη such that the map t ↦ ℓ(t, θ̄) defined by ℓ(t, θ̄) ≡ ℓ1(t, ft, ζ) is twice continuously differentiable. Let ℓ̇(t, θ̄) and ℓ̈(t, θ̄) denote the derivatives with respect to t. The submodel with parameters (t, ft, ζ) passes through η̄ at t = β, that is, fβ(β, η̄, ζ) = η̄ for all ζ ∈ Ξ.

B2. The submodel is uniformly least-favorable at ψ̄0 = (β0, η̄0) and ζ for estimating β0, in the sense that ℓ̇(β0, ψ̄0, ζ) = ℓ̃β(ψ̄0, ζ). As (t, β, η̄) → (β0, β0, η̄0), we assume that supζ∈Ξ ‖ℓ̇(t, ψ̄, ζ) − ℓ̃β(ψ̄0, ζ)‖ = oP0(1) and supζ∈Ξ ‖ℓ̈(t, ψ̄, ζ) − ℓ̈(β0, ψ̄0, ζ)‖ = oP0(1). In the sequel, we let oPΞ(1) denote a quantity going to zero in probability, under P, uniformly over the set Ξ.
B3. We assume that ψ̄̂0 ≡ (β0, η̄̂0), the restricted MLE of ψ̄ under the null, satisfies ‖η̄̂0 − η̄0‖ = oP0(1). The unrestricted MLE is ψ̄̂n(ζ) ≡ (β̂n(ζ), η̄̂n(ζ)). Moreover, let η̄̂n(β, ζ) denote the MLE of η̄ for fixed (β, ζ), that is, η̄̂n(β, ζ) ≡ argmaxη̄ ℓn(β, η̄, ζ), and write ψ̄̃n(ζ) ≡ (β̃n, η̄̂n(β̃n, ζ)). Assume that for any random sequence β̃n →P0 β0 we have supζ∈Ξ ‖η̄̂n(β̃n, ζ) − η̄0‖ = oP0(1) and the following uniform “no-bias” condition holds:

supζ∈Ξ ‖P0 ℓ̇(β0, ψ̄̃n(ζ), ζ)‖ = oP0(‖β̃n − β0‖ + 1/√n).    (4)

B4. There exist neighborhoods U of β0 and V of ψ̄0 such that the class of functions {ℓ̇(t, ψ̄, ζ) : t ∈ U, ψ̄ ∈ V, ζ ∈ Ξ} is P0-Donsker with square integrable envelope function and the class of functions {ℓ̈(t, ψ̄, ζ) : t ∈ U, ψ̄ ∈ V, ζ ∈ Ξ} is P0-Glivenko-Cantelli and bounded in L1(P0), where L1(Pθ) refers to the class of integrable functions under Pθ.
Assumptions B1-B4 set the stage for the quadratic expansion of the profile likelihood and the derivation of the optimality properties of the proposed tests. Note that these assumptions can also be built on the original parameterization, but we use the new parameterization for ease of presentation. Since our formulation includes parametric models as special cases, the existence of a uniformly least-favorable submodel in our set-up covers all situations considered by Andrews and Ploberger (1994).
Compared with Andrews and Ploberger (1994), we have a stronger form of the unbiasedness condition and stronger requirements on the consistency of the estimators for the expansion of the profile likelihood. This is partly due to the more general structure of the semiparametric model. As in Assumption B3, we require that if β̃n is any sequence of estimators consistent for β0, then the corresponding MLE of the nuisance parameter η̄ must be consistent for its true value η̄0, uniformly over Ξ. In Andrews and Ploberger (1994), consistency is only needed for the unconstrained MLE (their Assumption 2) and the constrained MLE under the null hypothesis (their Assumption 3).
To evaluate the local asymptotic distribution of the proposed tests, we require differentiability in quadratic mean (DQM) of the parameters ψ̄, as stated in the following Assumption C, which is commonly used to evaluate the local power. It will be verified for the two examples presented in the introduction. Unlike Assumptions B1–B4, the full rank reparameterization is indispensable in Assumption C:
C. Differentiability in quadratic mean (DQM) of the parameter ψ̄. A perturbation of ψ̄ in its domain is ψ̄t = ψ̄0 + th + o(t), where h ≡ (hβ, hη̄), hβ ∈ ℝp and hη̄ ∈ ℋη. The DQM condition for ψ̄0 with respect to the collection of paths {ψ̄t} is

∫ [t⁻¹{√ℓ1(ψ̄t, ζ) − √ℓ1(ψ̄0, ζ)} − (1/2)(Aζh)√ℓ1(ψ̄0, ζ)]² dμ → 0 as t → 0,

for all ζ ∈ Ξ, where Aζ is a bounded linear operator defined on ℝp × ℋη and takes values in L2⁰(P0).

Differentiability in quadratic mean implies that the range of Aζ is contained in L2⁰(P0). Note that Aζh = ∂/∂t ℓ1(ψ̄t, ζ)|t=0, following similar arguments as in Kosorok and Song (2007), where h = (hβ, hη̄). We define Aζ to be given by Aζh = ℓ̇β′hβ + ℓ̇η̄hη̄, where ℓ̇β and ℓ̇η̄ are the score operators for β and η̄, respectively. Moreover, ℝp × ℋη is a Hilbert space with ‖·‖ denoting its norm and 〈·, ·〉 denoting its inner product. Since in parametric settings twice continuous differentiability implies DQM (Pollard, 1995), this assumption is weaker than Assumption 1(c) in Andrews and Ploberger (1994).
3. Main results
This section includes several main results. The first one gives the asymptotic null distribution of the proposed tests.
3.1. The distributions of the test statistics under the null
To establish the asymptotic null distribution of the test statistics, a key result about the uniform profile likelihood expansion is summarized in the following lemma.
Lemma 2. Under Assumptions A–C, for any random sequence β̃n →P0 β0,

log pln(β̃n, ζ) = log pln(β0, ζ) + n(β̃n − β0)′ℙnl̃β(θ0(ζ)) − (1/2)n(β̃n − β0)′Ĩβ(θ0(ζ))(β̃n − β0) + oP0Ξ((√n‖β̃n − β0‖ + 1)²).    (5)
Lemma 2 enables us to establish the asymptotic equivalence of these test statistics and their asymptotic distributions:
Theorem 1. Under Assumptions A–C, ELRn = EWn + oP0(1) = ERn + oP0(1) →d eχ(c), where

eχ(c) ≡ ∫Ξ (1 + c)^{−p/2} exp{c𝔾(θ0(ζ))′Ĩβ⁻¹(θ0(ζ))𝔾(θ0(ζ))/[2(1 + c)]} dJ(ζ),

and 𝔾(θ0(ζ)) is the limiting process of 𝔾nl̃β(θ0(ζ)), which is a mean zero Gaussian process with variance function σ²(ζ) = Ĩβ(θ0(ζ)) indexed by ζ and with covariance function σ²(ζ1, ζ2) = P0{l̃β(θ0(ζ1)) l̃β(θ0(ζ2))′}, indexed by ζ1 and ζ2, for ζ, ζ1 and ζ2 ∈ Ξ.
Remark 2. When J(·) does not correspond to a prior on ζ, corresponding rather to a weight function lacking a probabilistic interpretation, then the results in Theorem 1 will generally hold, although the test may no longer possess the optimality discussed in the sequel. Theorem 1 should also hold if Qζ(h) is not a prior distribution, corresponding rather to a weight function on local alternatives for β. This robustness indicates that the tests are generally valid under loss of identifiability, yielding a large class of test statistics, with the optimal test being a member of this class.
We note that Theorem 1 only holds for the normal weight Qζ(·) of Assumption A2, which corresponds to the uniformly least favorable direction. As indicated in the proof of Theorem 1, the normal weight function Qζ(·) is integrated out and hence does not appear explicitly in the test statistics; consequently, the optimal tests depend on the weight function Qζ(·) only through the scalar c. In the special case where J(ζ) is a pointmass at a single value ζ0, the optimal test rejects if and only if LRn(ζ0) exceeds some constant (i.e., the optimal test equals the standard test for fixed ζ0) and is independent of c. When J(ζ) is not a pointmass distribution, however, the optimal test ELRn depends on c: the larger c is, the more weight is given to alternatives for which β is large. For example, for a test of the change-point model, larger values of c correspond to greater weight being given to larger changes.
The limit as c → 0 of the 2(ELRn − 1)/c statistic is equal to the “average score” statistic ∫ LRn(ζ)dJ(ζ), which is the limit of the ELR statistics that are designed for alternatives that are very close to the null hypothesis. At the other extreme, the limit as c → ∞ is log ∫ exp(LRn(ζ)/2)dJ(ζ). Thus for testing against more distant alternatives, the optimal test statistic is still of an average exponential form.
If the constant c/(1 + c) which appears in the definition of ELRn is replaced by a constant r > 0, then the limit as r → ∞ of ELRn is the likelihood ratio test, equivalently, the “sup score” statistic studied in Kosorok and Song (2007). Hence, the sup score test is designed for distant alternatives, but is of a more extreme form than the optimal exponential test, since the latter requires r < 1. It can be easily shown as a corollary to Theorem 1 that the usual likelihood ratio, Wald and score tests have the following distribution:
Corollary 1. Under the null hypothesis and Assumptions A–C, supζ∈Ξ LRn(ζ) = supζ∈Ξ Wn(ζ) + oP0(1) = supζ∈Ξ Rn(ζ) + oP0(1) →d χ, with χ ≡ supζ∈Ξ 𝔾(θ0(ζ))′Ĩβ⁻¹(θ0(ζ))𝔾(θ0(ζ)).
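Critical values for the limiting statistics in Theorem 1 and Corollary 1 can be approximated by Monte Carlo once the covariance function σ²(ζ1, ζ2) has been estimated on a grid of ζ values. The sketch below (p = 1, uniform grid weights for J, and an arbitrary illustrative covariance, none of which come from the paper) draws replicates of both limits:

```python
import numpy as np

def limiting_null_draws(cov, info_diag, c, n_rep=10000, seed=0):
    """Monte Carlo draws of the limiting statistics (p = 1 case).

    cov:       covariance matrix of the Gaussian process G(theta_0(zeta))
               on a grid of zeta values.
    info_diag: the efficient information I(theta_0(zeta)) on the grid
               (the variance function, i.e. the diagonal of cov).
    Returns draws of e_chi(c) (uniform weights over the grid as J)
    and of chi = sup_zeta G' I^{-1} G.
    """
    rng = np.random.default_rng(seed)
    m = cov.shape[0]
    G = rng.multivariate_normal(np.zeros(m), cov, size=n_rep)
    lr = G ** 2 / info_diag  # limiting LR_n(zeta) values on the grid
    e_chi = np.mean((1 + c) ** -0.5 * np.exp(c * lr / (2 * (1 + c))), axis=1)
    chi = lr.max(axis=1)     # limit of the sup tests in Corollary 1
    return e_chi, chi

# Example: exponential covariance on a grid of zeta values, 95% critical values.
grid = np.linspace(0.0, 1.0, 5)
cov = np.exp(-np.abs(grid[:, None] - grid[None, :]))
e_chi, chi = limiting_null_draws(cov, np.diag(cov), c=1.0)
crit_e, crit_sup = np.quantile(e_chi, 0.95), np.quantile(chi, 0.95)
```

Since the sup statistic dominates any single pointwise chi-square statistic, its simulated critical value exceeds the usual chi-square(1) cutoff; the authors' weighted bootstrap mentioned in the abstract plays an analogous role with estimated, model-based covariances.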
3.2. Optimality of the proposed tests
The second main result of this paper is the optimality property of the proposed tests. Following assumptions in Section 2, we consider local alternatives for hβ ∈ ℝp with prior distribution Qζ(hβ) on the local alternative direction hβ and prior distribution J(ζ) on the nonidentifiable parameter ζ. The optimality result is as follows:
Theorem 2. Under Assumptions A–C, the test statistics in Theorem 1 are asymptotically uniformly most powerful for testing H0 : β = β0 against the contiguous alternatives

ψ̄n = ψ̄0 + h/√n, ζ ∈ Ξ,

where h ≡ (hβ, hη(ζ)) with hη(ζ) = q̃ζ(hβ), and where q̃ζ is the uniformly least-favorable direction indexed by ζ. Moreover, this optimality result is invariant under the choice of reparameterization.
Theorem 2 also implies that the proposed tests have the greatest weighted average power asymptotically in the class of all tests of asymptotic significance level α, against the alternative . That is, they maximize
over all tests φn of asymptotic level α.
Our optimality results are under alternatives with nonsingular normal weights on hβ. Our weights on hβ are precisely the weights of Andrews and Ploberger (1994) projected onto the parameter space of interest, so our results and theirs are consistent.
We now discuss the choice of the direction qζ, the priors Qζ(·) and J(·). By the Neyman-Pearson lemma, for any appropriate prior distributions Qζ(·) and J(·) and any known directions qζ, a UMP test for testing H0 : β = β0 against the contiguous alternative , where h ≡ (hβ, hη(ζ)), is defined by
where kαn > 0, λn ∈ [0, 1] are constants such that the rejection probability is α under the null and
We have the following result:
Corollary 2. Under Assumptions A–C, the null hypothesis and the contiguous alternatives,
where W(qζ, ζ) ≤ 1 is defined in equation (17) in section 7 below. When qζ = q̃ζ, W(q̃ζ, ζ) = 1 and QLRn = ELRn + oP0 (1).
As the alternatives we consider are contiguous to the null, in each direction qζ indexing QLRn there exists, by the convolution theorem, a consistent estimator η̃n(qζ) of η̄0, provided certain conditions hold. The optimal tests can thus be built on η̃n(qζ).
In applications with composite hypotheses where qζ is unknown, there may not exist a direction which maximizes the power over all directions (Bickel et al., 2006). In a regular testing problem where all parameters are identifiable, it can be shown that the likelihood ratio test, which is built on the uniformly least-favorable direction, maximizes the minimum power over all directions of the alternatives, among all direction-based tests. In our nonregular testing problem, the situation is further complicated, since the power depends on the covariance structure of l̃β(θ0(ζ)). It is not clear whether the maximin property still holds in our problem. We note, however, that our tests can be interpreted as the "maximum direction" test. Moreover, since the power of the test is not affected by multiplying QLRn by a constant, we can standardize W(qζ, ζ)dJ(ζ) to obtain dJ̃(ζ), which is a probability measure on ζ. The question of the optimal choice of both qζ and J(ζ) then reduces to the optimal choice of J̃(ζ). Hence, without loss of generality, we can replace qζ with q̃ζ and focus on the choice of Qζ(·) and J̃(·) for optimization.
One reason we use the normal weight for Qζ in this paper is to facilitate a comparison with Andrews and Ploberger (1994). Using the normal prior with covariance matrix proportional to the efficient information matrix also leads to a significant simplification of the representation of the test statistics, since many terms cancel in the proof of Theorem 1. However we note that the choice of Qζ(·) is not limited to the normal weight studied in this paper, as indicated in the proof of Theorem 2. More general choices of the priors Qζ(·) and J(·) merit future consideration, but this is beyond the scope of the current paper.
The optimality of the likelihood ratio statistics with loss of identifiability under the null for semiparametric models is of potential interest. Similar to the likelihood ratio test under loss of identifiability with parametric models (Andrews and Ploberger, 1994), in the semiparametric setting, the profile likelihood ratio statistic is not of the optimal average exponential form. It can be shown to be a limit of an average exponential test, but only if a parameter is pushed beyond an admissible boundary, as noted by Andrews and Ploberger (1995) in the parametric case.
3.3. The distributions of the test statistics under local alternatives
To gain insight into the power of the optimal tests in practice, it is worthwhile to study their asymptotic distributions under local alternatives. In the following two theorems, Theorem 3 gives the asymptotic distribution under fixed local alternatives, while Theorem 4 gives the asymptotic distribution under random local alternatives. As shown in the theorems, the distributions depend on the form of the alternative, which will depend in part on the specifics of the application. These results also typically depend on the prior distributions J(·) and Qζ(·), for both fixed and random alternatives, although in different ways.
Theorem 3. Under local alternatives and Assumptions A–C, ELRn = EWn + op(1) = ERn + op(1) →d fχ(c),
where v⋆(hβ, ζ, ζ1) ≡ P0l̃β(θ0(ζ))l̃β(θ0(ζ1))′hβ.
Now we establish the asymptotic distribution of the test statistics under the alternative :
Theorem 4. Under Assumptions A–C and the random local alternative, ELRn →d rχ(c), where rχ(c) is a real random variable whose cumulative distribution function satisfies Pr(rχ(c) ≤ t) = P0 [1{eχ(c) ≤ t}eχ(c)].
3.4. Monte Carlo computation and inference
Although we have obtained the asymptotic distributions of the test statistics, these distributions generally have complicated analytic forms which depend on the values of unknown nuisance parameters. We now introduce a weighted bootstrap method to obtain the asymptotically valid critical values of eχ(c). This method does not require explicit evaluation of the limiting distribution, thereby avoiding the numerical difficulties inherent in such an evaluation.
We first generate n i.i.d. positive random variables κ1, …, κn, with mean 0 < μκ < ∞ and finite variance. Next, we divide each weight by the sample average κ̄ of the weights, to obtain "standardized weights" which sum to n. For a real, measurable function f, the weighted empirical measure is defined by replacing the equal weights 1/n in ℙn with the standardized weights. The weighted unrestricted and restricted maximizers over ψ ∈ Ψ at fixed ζ ∈ Ξ are then defined by replacing ℙn with the weighted empirical measure in the definitions of ln and of the log likelihood under the null, respectively. Now repeat the bootstrap procedure a large number of times M̃n and compute the differences of the bootstrapped unrestricted and restricted maximized log likelihoods, as processes in ζ. Note that we are allowing the number of bootstrap replications to depend on n.
To estimate critical values, we compute the standardized bootstrap test statistics, for 1 ≤ k ≤ M̃n. For a test of size α, we compare the observed test statistic with the (1 − α)th quantile of the corresponding M̃n standardized bootstrap statistics. We subtract off the mean to ensure a valid approximation to the null distribution even when the null hypothesis is not true; without centering there may be a loss of power, although the type I error rate is still controlled when the null is true. The proof of bootstrap validity can be built upon the proofs of Theorems 7 and 8 in Kosorok and Song (2007); we omit the details.
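The bootstrap procedure above can be sketched as follows. The function `stat_fn`, which recomputes the test statistic under a given set of observation weights, and the use of standard exponential multipliers truncated at 5 (as in the simulations of Section 5) are illustrative assumptions; the real statistic would involve the weighted profile likelihood maximizers.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_bootstrap_crit(stat_fn, data, n_boot=250, alpha=0.05):
    """Multiplier (weighted) bootstrap sketch: draw positive i.i.d. weights,
    standardize them to sum to n, recompute the statistic, center the
    bootstrap replicates, and return the (1 - alpha)th quantile."""
    n = len(data)
    boot = np.empty(n_boot)
    for k in range(n_boot):
        kappa = np.minimum(rng.exponential(1.0, size=n), 5.0)  # truncated Exp(1)
        w = kappa / kappa.mean()          # standardized weights summing to n
        boot[k] = stat_fn(data, w)
    boot -= boot.mean()                   # centering, as described in the text
    return np.quantile(boot, 1 - alpha)

# toy statistic: sqrt(n) times a weighted mean, a stand-in for the profile statistic
data = rng.normal(size=100)
crit = weighted_bootstrap_crit(lambda d, w: np.sqrt(len(d)) * np.mean(w * d), data)
```

In practice `stat_fn` would re-maximize the weighted log likelihood with and without the null restriction at each ζ on a grid, exactly as in the text.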
4. Examples
In this section, we study the two examples in the introduction to illustrate the two types of nonidentifiability settings, one where a nuisance parameter is present under the null and one where it is not. These examples demonstrate important differences in how the full rank reparameterizations and uniformly least favorable submodels are defined in the two settings. We present Example 2 first because a reparameterization is not required, simplifying the presentation.
4.1. Example 2 revisited: change-point regression for current status data
In the change-point Cox model with current status data, a test of the existence of a threshold effect corresponds to a test of the null H0 : β = 0. The change-point parameter ζ is present only under the alternative. Hence it suffices to take φζ as the identity map.
We make the following assumptions, under which the conditions in Section 2 can be checked. Given Z, the survival time T and the observation time V are independent, and Z belongs to a compact subset of ℝ. The change-point parameter ζ ∈ [a, b], for some known −∞ < a < b < ∞ with Pr(Z < a) > 0 and Pr(Z > b) > 0. Assume P(Var(Z|V)) > 0, which guarantees, as we show later, that the efficient information Ĩβ(θ0(ζ)) is positive definite uniformly over ζ ∈ Ξ. The Lebesgue density of V is positive and continuous on its support [σ, τ], with 0 < σ < τ < ∞. The baseline cumulative hazard function Λ is continuously differentiable on [σ, τ], with derivative bounded away from 0, and satisfies Λ0(σ) > 0 and Λ0(τ) < M for some known M. We let 𝓛 denote the set of non-decreasing cadlag functions Λ on [σ, τ] with Λ(τ) ≤ M.
The likelihood function equals (2) with fV,Z(v, z) removed, because that term can be absorbed into the underlying measure on the sample space. The log-likelihood for a single observation, log l1(θ), takes the form log l1(θ) = δ log[1 − exp{−Λ(v) exp(rγ(z))}] − (1 − δ) exp(rγ(z))Λ(v). Define Z(ζ) ≡ (1{Z > ζ}, Z1{Z > ζ}, Z); this data representation allows us to adopt existing results from the literature and hence simplify our arguments.
To define a uniformly least-favorable submodel in β, we take two steps. For Step 1, we calculate the scores for ξ and Λ. The score function for ξ is l̇ξ(x) = z(ζ)Λ(v)Q(x; θ). The score operator for Λ, along submodels Λt = Λ + th with t ≥ 0 and h a non-decreasing, non-negative, right-continuous function, is denoted l̇Λh. We project l̇ξ(X) onto the space generated by l̇Λ; that is, we need to find a function h⋆ such that Pθ[(l̇ξ − l̇Λh⋆)l̇Λh] = 0 for all h ∈ 𝓛, which is equivalent to minimizing the least squares criterion Pθ‖l̇ξ − l̇Λh‖2 over h. The solution h⋆ under the null is assumed to possess a version that is differentiable componentwise, with derivatives bounded on [σ, τ] uniformly over ζ ∈ Ξ. It can be shown that Λt(θ) is indeed a hazard function when t is sufficiently close to ξ.
The uniformly least-favorable direction for ξ involves a function ϕ mapping [0, M] into [0, ∞) such that ϕ(y) = y on [Λ0(σ), Λ0(τ)], the map y ↦ ϕ(y)/y is Lipschitz, and ϕ(y) ≤ c(min(y, M − y)) for a sufficiently large constant c. The efficient score for ξ under this uniformly least-favorable submodel may be extended to [0, ∞) by fixing its values for u ≤ Λ0(σ) and for u > Λ0(τ).
For Step 2, we next project l̇β(x) onto the space generated by l̃ξ. The efficient score function for β, l̃β, is the first two coordinates of l̃ξ minus their projection on the remaining coordinates of l̃ξ. Since l̃ξ lies in a finite-dimensional space, the projection has a matrix representation. The efficient information for ξ, Ĩξ, can be partitioned as a two-by-two block matrix, with its leading block the first two-by-two principal submatrix, and so on. We define the direction vθ accordingly and set ξt(θ) = ξ − (β − t)vθ. Now we use the uniformly least-favorable path t ↦ (ξt(θ), Λt(θ)) in the parameter space for the nuisance parameter η ≡ (α, Λ), which leads to l(t, β, α, Λ) = log l(ξt(θ), Λt(θ)). This submodel is least favorable at (ξ0, Λ0) uniformly over ζ ∈ Ξ and yields the efficient information matrix for β. The remainder of Assumption B4 can be verified by standard empirical process arguments.
To verify Assumption A1 in Section 2, it suffices to show that Ĩξ is uniformly positive definite over ζ ∈ Ξ, which can be achieved by checking that infζ∈Ξ λmin{P0(Cov(Z(ζ)|V))} > 0. We first show that the components of the random vector (Z, 1{Z > ζ}, Z1{Z > ζ}) are linearly independent given V, pointwise in ζ ∈ Ξ. Suppose that, given V,

aZ + b1{Z > ζ} + cZ1{Z > ζ} = 0  (6)

a.s., for some constants a, b and c. Our aim is to show a = b = c = 0. When Z ≤ ζ, (6) becomes aZ1{Z ≤ ζ} = 0. Since Var(Z|V) > 0 and P(Z ≤ ζ|V) > 0 for every ζ ∈ Ξ, Var(Z|Z ≤ ζ, V) > 0, and therefore a = 0. When Z > ζ, (6) then becomes (b + cZ)1{Z > ζ} = 0. If c ≠ 0, then Z = −b/c, which contradicts the fact that Var(Z|Z > ζ, V) > 0. Thus c = 0, and b = 0 as a consequence. That P(Cov(Z(ζ)|V)) is uniformly positive definite over ζ ∈ Ξ follows since P(Cov(Z(ζ)|V)) is a continuous function of ζ and Ξ is compact.
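The uniform positive-definiteness just argued can be sanity-checked by Monte Carlo. This is illustrative only and not part of the proof; it assumes Z ~ Uniform(0, 1), as in the simulations of Section 5, with the conditioning on V dropped for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.uniform(size=200_000)

def min_eig_cov(zeta):
    # covariance matrix of Z(zeta) = (1{Z > zeta}, Z 1{Z > zeta}, Z)
    ind = (Z > zeta).astype(float)
    X = np.column_stack([ind, Z * ind, Z])
    return np.linalg.eigvalsh(np.cov(X.T)).min()

# the minimum eigenvalue stays strictly positive over a grid of zeta values,
# consistent with the linear-independence argument above
min_over_grid = min(min_eig_cov(z) for z in np.linspace(0.1, 0.9, 9))
```

Near the endpoints of the grid the smallest eigenvalue becomes small, reflecting the near-collinearity of 1{Z > ζ} and Z1{Z > ζ} when Pr(Z > ζ) is small; this is why the interval [a, b] is bounded away from the support endpoints.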
The profile likelihood estimator ψ̂n(ζ) can be shown to be consistent for (β0, Λ0) by a proof similar to that used for the full maximum likelihood estimator in Huang (1996). The following lemma states the uniform consistency of ψ̂n(ζ) under the null.
Lemma 3. .
To verify the uniform no-bias condition (4), we need the following result about the uniform rate of convergence:
Lemma 4. Suppose that d(η, η1), for η, η1 ∈ ℋη, is the metric defined on ℋη, and that C1, C2 and C3 are positive constants satisfying

| (7) |
| (8) |

for functions φn such that δ ↦ φn(δ)/δα is decreasing for some α < 2, and for sets B × ℋη × Ξ such that, under the null, Pr(β̃n ∈ B, η̂β̃n(ζ) ∈ ℋη, ζ ∈ Ξ) → 1. Then the stated uniform rate of convergence holds for any sequence of positive numbers rn satisfying the associated growth condition for every n.
We apply Lemma 4 with η = (α, Λ) and ℋη = ℝ × lin 𝓛, where lin 𝓛 is the closed linear span of 𝓛, and with the metric d(η, η1) = ‖α − α1‖ + ‖Λ − Λ1‖2. Condition (7) can be established by a Taylor expansion and the uniform boundedness of the derivatives of the loglikelihood. Condition (8) can be verified using Lemma 3.3 of Murphy and van der Vaart (1999), with a choice of φn involving a constant M ≥ ‖mβ,η,ζ‖∞. These conditions imply the desired uniform rate of convergence for any sequence β̃n → 0. Now we only need to verify
| (9) |
which is equivalent to (4) under regularity conditions. We further decompose (9) as in (17) of Murphy and van der Vaart (2000), which can be verified by a Taylor expansion and the uniform boundedness of the first and second derivatives of the loglikelihood.
It is not difficult to see that {pξ,Λ(ζ)} is differentiable in quadratic mean at (ψ0, ζ) with respect to the set of directions {ξ0 + th1, Λ0 + th2}, where h1 ∈ ℝ3, and h2 is a non-decreasing non-negative right continuous function. Thus all conditions in Section 2 are satisfied.
4.2. Example 1 revisited: univariate frailty regression under right censoring
The odds-rate model we consider in this paper posits that the hazard function has the form (1). We define gζ(s) ≡ (1 + ζs)−1/ζ, for ζ > 0, and g0(s) ≡ limζ↓0 gζ(s) = exp(−s). Let SZ(·) denote the survival function of T given Z; after integrating over W, SZ(t) becomes SZ(t) = gζ(∫0t eβ′Z(s) dη(s)), where the cumulative baseline hazard function η(·) is a non-negative, monotone increasing cadlag (right-continuous with left-hand limits) function. We will argue later that Assumptions A–C can be checked under the following conditions. The true null survival function is unique and denoted S0. The censoring time C is independent of T given Z and uninformative about ζ and β. Moreover, for a finite time point τ, P01{C ≥ τ} = P01{C = τ} > 0. ζ ∈ Ξ ≡ [0, K0] for some known K0 < ∞. The null value β0 = 0 is an interior point of a known compact set B0 ⊂ ℝp. The parameter space for η, denoted ℋη, is a Banach space consisting of continuous and monotone increasing functions on the interval [0, τ], equipped with the total variation norm ‖ · ‖v; its closed linear span is denoted lin ℋη. Each η ∈ ℋη satisfies η(0) = 0 and η(τ) < ∞. The covariate process Z(·) is uniformly bounded in total variation on [0, τ], and var[Z(0+)] is positive definite.

The true values of π ≡ (η, ζ) are not unique under the null, since the null set Π0 contains all pairs (η, ζ) satisfying, for t ∈ [0, τ], (1 + ζη(t))−1/ζ = S0(t) when ζ ∈ (0, K0], and exp(−η(t)) = S0(t) when ζ = 0. In this example, ζ appears both under the null and under the alternative. Equivalently, for any fixed ζ ∈ (0, K0], η0(t)(ζ) = (S0(t)−ζ − 1)/ζ, and for ζ = 0, η0(t)(ζ) = −log(S0(t)), t ∈ [0, τ]. Hence Π0 = {(ζ, η0(ζ)) : ζ ∈ Ξ}, and we need a suitable parameter transformation. Let η̄ = φζ(η) ≡ (1 + ζη)1/ζ − 1, for ζ > 0, and η̄ = limζ↓0 φζ(η) = exp(η) − 1. It is easily checked that η̄ ∈ ℋη. The following arguments reveal that the map φζ : ℋη ↦ ℋη is a full-rank reparameterization.
The log likelihood function with the new parameter θ̄ = (β, η̄, ζ) is
| (10) |
where a1(·) is the derivative of η̄(·). We will replace a1(·) with nΔη̄(·) in the sequel, where Δη̄(v) denotes the jump of η̄ at v, since this form of the empirical log-likelihood function is asymptotically equivalent to the true log-likelihood function. When β = 0, ζ clearly drops out, since (10) reduces to ℙn{δ log Δη̄(v) − (δ + 1) log(1 + η̄(v))} and η̄(0) = 0. The odds-rate model with the new parameterization ψ̄ ≡ (β, η̄) is identifiable under the null, since the null survival function S0(t|z) = (1 + η̄(t))−1 is a strictly monotone function of η̄ and is unique.
The Gâteaux derivative of φζ(η) at η ∈ ℋη exists and is obtained by differentiating φζ(η) along the submodels t ↦ η + th; it equals φ̇ζ(η)(h) ≡ ∂/∂t φζ(η + th)|t=0 = (1 + ζη)1/ζ−1h for ζ > 0 and exp(η)h for ζ = 0. The Gâteaux differentiability of φζ(η), pointwise in ζ, can be strengthened: the remainder of the linear approximation is o(‖h(ζ)‖v) as ‖h(ζ)‖v → 0, uniformly over ζ ∈ Ξ, which we will hereafter refer to as "uniform Fréchet differentiability." Since φ̇ζ(η)(h) is uniformly bounded and Lipschitz in h, by checking the definition we can show that φ̇ζ is one-to-one and continuously invertible uniformly over ζ ∈ Ξ.
To define a uniformly least-favorable submodel, we calculate scores for β and η̄. Let 𝓗 denote the space of elements h = (h1, h2) such that h1 ∈ ℝp and h2 ∈ ℋη. Consider the one-dimensional submodel indexed by t and h ∈ 𝓗. The derivative of log ℓn(ψ̄, ζ) with respect to t, evaluated at t = 0, yields the score operators ℓ̇n(ψ̄, ζ)(h) ≡ (ℓ̇nβ(h1), ℓ̇nη̄(h2)), whose expressions involve Y(u) ≡ 1{V ≥ u}.
To obtain the information operator, we consider a two-dimensional submodel indexed by (s, t) and h, h̃ ∈ 𝓗. Define 𝓗∞ = {h ∈ 𝓗 : ‖h‖ < ∞}, for an appropriate norm ‖ · ‖. The information operator σ̄θ̄(h) : 𝓗∞ ↦ 𝓗∞ is obtained from −P0 ∂2/(∂s∂t) ℓ1(ψ̄st)|s,t=0. We will show that σ̄θ̄ is one-to-one, continuously invertible and onto, uniformly over ζ ∈ Ξ, via Part (1) of Lemma 7 in the appendix; for this it suffices to show that the information operator σθ for the original parameterization is one-to-one, continuously invertible and onto uniformly over ζ ∈ Ξ.
By the same derivation as for σ̄θ̄, the operator σθ : 𝓗∞ ↦ 𝓗∞ takes a two-by-two operator matrix form, in which the term S(θ) = −(1 + δζ)/(1 + ζη̄(τ))2 appears. All of the component operators, 1 ≤ i, j ≤ 2, are uniformly compact and bounded over ζ ∈ Ξ. By an argument similar to that in Kosorok et al. (2004), the linear operator σθ : 𝓗∞ ↦ 𝓗∞ is one-to-one, continuously invertible and onto uniformly over ζ ∈ Ξ, upon verifying the conditions of Lemma 8 in the appendix. Thus a uniformly least-favorable submodel for estimating β in the presence of η̄ and ζ is given by dη̄t(β, η̄, ζ) = (1 + (β − t)′vθ̄) dη̄, where vθ̄ : ℝ ↦ ℝp is the uniformly least-favorable direction at (β0, η̄, ζ). This leads to ℓ(t, β, η̄, ζ) = ℓ1(β, η̄t(θ), ζ). Because η̄β(β, η̄, ζ) = η̄, B1 is satisfied. Since ∂/∂t|t=β0 ℓ(t, β0, η̄0, ζ) = ℓ̇β(β0, ψ̄0, ζ) = ℓ̃β(ψ̄0, ζ), where ℓ̃β = ℓ̇β − ℓ̇η̄vθ̄ is the efficient score for β, B2 is satisfied, by the continuity of the involved functions with respect to ψ̄ and the compactness of Ξ. The efficient information for β is Ĩβ = P0ℓ̃βℓ̃′β. That {ℓ̇(t, ψ̄, ζ) : t ∈ U, ψ̄ ∈ V, ζ ∈ Ξ} is P0-Donsker and {ℓ̈(t, ψ̄, ζ) : t ∈ U, ψ̄ ∈ V, ζ ∈ Ξ} is P0-Glivenko-Cantelli for some neighborhoods U and V follows from standard empirical process arguments.
It follows from Corollary 8.1.3 in Golub and Van Loan (1983) that the set of eigenvalues of Ĩβ(θ̄) is a continuous function of the elements of Ĩβ(θ̄), which are continuous functions of ζ. The set of eigenvalues is therefore a continuous function of ζ. Thus infζ λmin{Ĩβ(θ0(ζ))} > 0 by the compactness of Ξ, and Assumption A1 is satisfied.
The consistency of the restricted MLE and the uniform consistency of the unrestricted MLE can be established via the self-consistency equation approach, with arguments similar to the proof of Theorem 3 in Kosorok et al. (2004); we omit the details. To verify the uniform no-bias condition (4), it suffices to establish the corresponding bound for any sequence β̃n → β0, where "⋆" denotes the outer expectation. This follows by verifying the conditions of Lemma 9 in the appendix and combining the resulting bound with the rate of convergence obtained above, uniformly over ζ ∈ Ξ.
Let l̇ψ(h) ≡ (l̇β(h1), l̇η(h2)) denote the score operator of ψ under the original parameterization. It was shown in Kosorok et al. (2004) that the operator ψ ↦ l̇ψ is Fréchet differentiable with derivative l̇ψ(σθ(h)), and this can be strengthened to uniform Fréchet differentiability owing to the smoothness of the involved functions. Since φζ is uniformly Fréchet differentiable, by Part (2) of Lemma 7, the chain rule for uniform Fréchet differentiability, ℓ̇ψ̄ ≡ (ℓ̇β, ℓ̇η̄) is uniformly Fréchet differentiable.
By the uniform Fréchet differentiability of ℓ̇ψ̄ and the linearity of σ̄θ̄, the first term on the right-hand side is of the order OP0(n−1/2). The desired rate then follows, since σ̄θ̄ is uniformly continuously invertible over ζ ∈ Ξ.
5. Simulation results
This section presents simulation results regarding the finite-sample properties of the proposed optimal test statistics for Example 2, the change-point Cox model with current status data. The simulation study was designed with several objectives. First, we demonstrate how to compute the asymptotic critical values with the proposed weighted bootstrap procedure. Second, we analyze the empirical type I error of the proposed tests and compare with the nominal size of the tests. Third, we compare the power of the optimal tests with that of other tests such as the sup score statistics (equivalent to the likelihood ratio statistic) and some naive (pointwise) tests under several different alternatives. Fourth, we evaluate the sensitivity of the power of the optimal test to the choice of c under several different alternatives.
A single time-independent covariate Z with a uniform [0, 1] distribution was used, and the threshold covariate was Y = Z. The parameter α was set at α0 = 0, with cumulative baseline hazard Λ0(t) = 3t2. The censoring time was uniformly distributed on the interval [0, 5], resulting in a censoring rate of about 25% under the null hypothesis. Under the alternative, we set β10 = −0.5 and β20 ∈ {−0.3, −0.5, −0.8}; the range of β20 values reflects the distance from the null. We consider the following alternative distributions of ζ:
The weight J(ζ) degenerates to one point at 0.5, that is ζ = 0.5.
A uniform weight J(ζ) with support on [0.05, 0.95].
For all scenarios, we compute the optimal tests with a uniform weight on [0.05, 0.95]. The sample size for each simulated data set was 300. For each simulated data set, 250 bootstrap samples were generated with standard exponential weights truncated at 5, to compute the critical values for Rn(ζ) (the naive score statistic at several ζ values), supζ Rn(ζ) (the sup score statistic) and ERn (the weighted exponential score statistic). We take c = 0, 0.5, 1, 3 and ∞. Each scenario was replicated 1000 times. To compute the restricted MLE under the null, we use the iterative convex minorant algorithm. Empirical type I error and power results for selected subsets of the test statistics described above are provided in Table 1.
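Data generation under this design can be sketched as follows. The regression form αz + (β1 + β2z)1{z > ζ} is our reading of the change-point model (matching the representation Z(ζ) from Section 4.1) and should be treated as an assumption about the exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(n=300, beta1=-0.5, beta2=-0.5, zeta=0.5, alpha=0.0):
    """Current-status data from the change-point Cox design of the simulations:
    Lambda0(t) = 3 t^2, Z ~ Uniform(0,1), examination time V ~ Uniform(0,5);
    we only observe (V, Z, delta) with delta = 1{T <= V}."""
    Z = rng.uniform(size=n)
    r = alpha * Z + (beta1 + beta2 * Z) * (Z > zeta)   # assumed regression form
    # S(t|z) = exp(-3 t^2 e^r)  =>  T = sqrt(E / (3 e^r)) with E ~ Exp(1)
    T = np.sqrt(rng.exponential(1.0, size=n) / (3.0 * np.exp(r)))
    V = rng.uniform(0.0, 5.0, size=n)
    delta = (T <= V).astype(int)
    return Z, V, delta

Z, V, delta = simulate()
```

Note that the event time T itself is never recorded; only the current status indicator delta at the examination time V enters the likelihood, which is what makes Λ estimable only at the slower nonparametric rate.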
Table 1.
The empirical type I error and power of the proposed tests, sample size n = 300, 1000 simulations, with bootstrap size 250. The worst case Monte Carlo error for table entries is 0.016. The Monte Carlo error is 0.007 and 0.009 for empirical type I error with nominal size 0.05 and 0.1 respectively. The empirical power results are based on size 0.05 tests.
| Empirical type I error | |||||||||
| Nominal size | Weighted exponential tests, c= | Sup score | Naive tests Rn(ζ), ζ = | ||||||
| 0 | 0.5 | 1 | 3 | ∞ | 0.3 | 0.9 | 0.5 | ||
| J(ζ) ∼ Uniform[0.05, 0.95] | |||||||||
| 0.05 | 0.056 | 0.057 | 0.045 | 0.063 | 0.046 | 0.058 | 0.043 | 0.044 | 0.039 |
| 0.10 | 0.098 | 0.103 | 0.109 | 0.095 | 0.099 | 0.085 | 0.112 | 0.103 | 0.100 |
| Empirical power | |||||||||
| True alternative | Weighted exponential tests, c= | Sup score | Naive tests Rn(ζ), ζ = | ||||||
| 0 | 0.5 | 1 | 3 | ∞ | 0.3 | 0.9 | 0.5 | ||
| J(ζ) ∼ Uniform[0.05, 0.95] | |||||||||
| ζ=0.5 | |||||||||
| β20 = −0.3 | 0.646 | 0.647 | 0.653 | 0.653 | 0.656 | 0.688 | 0.243 | 0.044 | 0.692 |
| β20 = −0.5 | 0.835 | 0.833 | 0.839 | 0.845 | 0.847 | 0.865 | 0.616 | 0.076 | 0.840 |
| β20 = −0.8 | 0.922 | 0.925 | 0.928 | 0.928 | 0.928 | 0.968 | 0.957 | 0.174 | 0.942 |
| ζ ∼ Uniform[0.05, 0.95] |
| β20 = −0.3 | 0.320 | 0.320 | 0.320 | 0.320 | 0.312 | 0.211 | 0.133 | 0.055 | 0.142 |
| β20 = −0.5 | 0.485 | 0.488 | 0.492 | 0.494 | 0.500 | 0.405 | 0.258 | 0.083 | 0.272 |
| β20 = −0.8 | 0.748 | 0.757 | 0.763 | 0.768 | 0.769 | 0.605 | 0.494 | 0.183 | 0.413 |
We now make several general comments on the simulation results. The empirical type I error of all the tests is quite close to the nominal level. When the alternative distribution of ζ is correctly specified, the optimal test is notably more powerful than the sup score statistic and the naive tests. When the true alternative distribution of ζ degenerates to a point mass, the weighted exponential tests are no longer optimal, yet their empirical power remains superior to that of the naive tests with misspecified ζ. We also observe that the empirical power of the sup score statistic is comparable to that of the naive test at the true ζ, which may be due to the fast convergence rate of the change-point estimator. For all the alternatives considered, the empirical power of the weighted exponential tests appears to increase with c, but the trend is weak: in many cases the difference in power is less than 0.01. This suggests that the direction of the test (specifically, the least favorable curve in this paper), rather than the scale of the curve, is most critical for the power of the weighted exponential test.
6. Discussion
In this paper, we consider tests of hypotheses when the parameters are not identifiable under the null in semiparametric models. Our optimality results apply to a large class of semiparametric testing problems under loss of identifiability, where nuisance parameters may not be root-n estimable either under the null or the alternative. We note that our current regularity conditions are not directly applicable to testing under loss of identifiability when the parameter of interest is not root-n estimable. One example is testing for homogeneity in mixture models, where the usual first order Taylor approximation may not be possible (Chen et al., 2004; Chernoff and Lander, 1995; Dacunha-Castelle and Gassiat, 1999; Lindsay, 1995; Liu and Shao, 2003), and a higher order expansion is required. Although not directly covered by our framework, the homogeneity tests may possess a uniform quadratic expansion (Zhu and Zhang, 2006), thus permitting a generalization of our results to general quadratic expansions. In the following, we conclude the paper with a brief discussion of this generalization.
To be concrete, let us consider a two-component mixture with density g(ρ, μ1, μ2, η) = ρf(μ1, η) + (1 − ρ)f(μ2, η), where f(μ, η) is a parametric p.d.f. with parameters μ ∈ ℝp and η ∈ ℝq, such as a location-scale family. Let β = μ2 − μ1 and θ = (ρ, β′, μ′1, η′)′; the hypothesis of interest is β = 0, that is, there is only a single component in the mixture. For convenience, we assume the mixing proportion ρ ∈ (0, 1] and μ1 = μ2 = μ0 under the null.
In this example, ρ is not identifiable and μ1 and μ2 are mutually indistinguishable under the null. Simple algebra shows that the information matrix for ψ ≡ (β, μ1) is singular under the null, for arbitrary values of ρ, which corresponds to the fact that μ1 and μ2 are not root-n estimable (Chen and Chen, 2003; Zhu and Zhang, 2004). We consider the following reparameterization: μ̄ = (1 − ρ)(μ1 − μ0) + ρ(μ2 − μ0) and v̄ = (1 − ρ)(μ1 − μ0)2 + ρ(μ2 − μ0)2, which can be viewed as a "mixed mean" and a "mixed variance". Let β̄ ≡ (μ̄, v̄) and ψ̄ ≡ (μ̄, v̄, η). We can establish the identifiability of ψ̄ and the consistency and root-n rate of the MLE of ψ̄ under the null. Furthermore, under a set of assumptions on the parameter space (e.g., the cone condition in Andrews, 1999 and 2001) and the stochastic differentiability and equicontinuity of the involved functions, we can establish a quadratic expansion of the loglikelihood with respect to ψ̄, where Snζ(·) and Bζ(·) are different from, but similar in structure to, the score and information processes for ψ̄ indexed by ζ.
When the nuisance parameter η is not present, a similar weight as in the current paper for ψ̄ can be chosen as Qζ(·) = Bζ(·). The corresponding weighted exponential tests are still optimal in the Neyman-Pearson sense. If η is present, a uniformly least favorable curve for this quadratic expansion with respect to β would need to be characterized. This is beyond the scope of the current paper but is an interesting topic for future research.
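The information singularity described above is easy to exhibit numerically in the simplest instance, a normal location mixture ρN(μ1, 1) + (1 − ρ)N(μ2, 1); this concrete family is chosen here for illustration and is not the paper's general setting.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50_000)   # a sample from the null: one component, mu0 = 0
rho, mu0 = 0.4, 0.0

# At mu1 = mu2 = mu0, the scores are d/dmu_j log g = weight_j * (x - mu0)
# (the standard normal location score), so the two score components are
# exactly proportional and the information matrix for (mu1, mu2) is singular.
s1 = rho * (x - mu0)
s2 = (1 - rho) * (x - mu0)
info = np.cov(np.vstack([s1, s2]))   # empirical information for (mu1, mu2)
det_info = np.linalg.det(info)
```

The determinant vanishes for every ρ ∈ (0, 1], which is the singularity that forces the "mixed mean"/"mixed variance" reparameterization and the higher order expansion discussed above.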
7. Proofs
Proof of Lemma 1. Since φ̇ζ is linear, continuously invertible and one-to-one, the tangent sets for η and η̄ are identical, and the chain rule applies for any γ in the tangent set of η̄. The efficient scores for β under the parameterizations (β, η, ζ) and (β, η̄, ζ) therefore coincide, so the efficient score function is invariant under such reparameterizations. That the efficient information matrix is invariant under reparameterizations then follows from its definition.
Proof of Lemma 2. It suffices to show that, under the full rank reparameterization, for any random sequence β̃n →P0 β0,

| (11) |

By Assumptions B2 and B4 and the dominated convergence theorem, the required derivative limits hold, and the derivative of the function t ↦ log ℓ(t, ψ̄0, ζ) satisfies P0ℓ̈(β0, ψ̄0, ζ) = −Ĩβ(ψ̄0, ζ). These facts, together with the empirical process conditions, yield the needed expansions for every random sequence β̃n →P0 β0. The subsequent steps of the proof are similar to those used in the proof of Theorem 1 in Murphy and van der Vaart (2000), and we omit the details.
Proof of Theorem 1. The proof takes several steps. We first show the asymptotic equivalence of these statistics, which is summarized in Lemma 5 below. With a small abuse of notation, consider the profile likelihood ratio of the alternative over the null; it can be approximated by
with the linear statistic . An approximate exponential Wald statistic is defined as
where .
Now we derive the asymptotic distribution of these tests under the null hypothesis. Assume without loss of generality that β̂n and the profiled nuisance estimators take their values in U and V, respectively, as defined in Assumption B4. Following Lemma 3.2 in Murphy and van der Vaart (1997), ℓ̃β(ψ0, ζ) = ℓ̃β(θ0(ζ)) is P0-Donsker as a class indexed by ζ ∈ Ξ, and the limiting null distribution follows by the continuous mapping theorem. Lemma 5 below then gives the desired results of Theorem 1.
Lemma 5. Under the null hypothesis and Assumptions A–C, (1) , (2) , (3) , (4) and (5) ERn − ELRn →P0 0.
Proof. For notational simplicity, let β̄n = β̄n(θ0(ζ)) and Ĩ0 = Ĩβ(θ0(ζ)). We first show (1). For 0 < M < ∞, define
and
Note that for any M > 0,
Hence it suffices to show that (i) |PLRn − PLRn(M)| →P0 0, (ii) and (iii) , as n → ∞ and ∀M : 0 < M < ∞. To show (i), for any ε > 0,
| (12) |
| (13) |
| (14) |
| (15) |
where (12) uses Assumption C and (13) holds by definition of the profile likelihood. (14) holds by Assumption B3 and Lemma 2. (15) holds by Fubini's theorem. The right hand side of (15) can be made arbitrarily small for all n by taking M large enough, since Qζ is a uniformly tight measure.
For (ii), we have
| (16) |
In the inequality, the first bound follows from Assumption B4, and the second follows from Assumption A. The last term ∫ζ∈Ξ ∫‖h‖>M dQζ(h)dJ(ζ) → 0 as M → ∞. Hence (16) = op(1), as M → ∞.
Now we show (iii). For contiguous sequences and ‖h‖ ≤ M, Lemma 2 yields the following expansion of the profile likelihood under the null:
therefore,
where the last equality follows by arguments analogous to those used in (16) above. The proof of Part (1) is now complete.
For Part (2), since ,
with
where the last equality holds by integrating out a normal density.
For Part (3), it follows from Lemma 2 and Assumption B3 that , and reapplication of Lemma 2 and the argmax theorem yields . Part (3) now follows.
For the proof of Part (4) and Part (5), it suffices to show that and . These results follow from Donsker properties and standard arguments. We omit the details. The proof of Lemma 5 is thus complete.
Proof of Corollary 1. The proof is similar to the proof of Theorem 1. We omit the details.
Proof of Corollary 2. The proof follows the same lines as the proof of Part (2) (iii) of Lemma 5, with
(17)
where det is the determinant of a matrix, 〈·, ·〉η is the inner product defined on ℋη, and W(qζ, ζ) ≤ 1 since the operator in the display is nonnegative definite.
Lemma 6. Under Assumptions A–C, the densities and are contiguous to the densities . As a consequence, the results of Lemma 5 still hold under local alternatives and .
Proof. Assumption C implies that a LAN (local asymptotic normal) expansion for the log-likelihood ratio holds immediately by Lemma 3.10.11 of van der Vaart and Wellner (1996):
It follows from LAN that Λnζ →d Wζ, where Wζ ∼ N(−1/2‖Aζh‖2, ‖Aζh‖2), under P0. Therefore, under P0,
P0(exp(Wζ)) = 1, using the formula for the moment generating function of the normal distribution. By Le Cam's first lemma (van der Vaart, 1996, page 88), we conclude that the sequences of probability measures and {P0} are mutually contiguous, for every ζ ∈ Ξ. Consequently, statements of convergence in probability that hold under P0 also hold under and vice versa. Similarly, since P(eχ) = 1 by the formula for the moment generating function of the χ2 distribution, we conclude that the sequences and are contiguous.
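The moment-generating-function computation behind Le Cam's first lemma can be checked directly; a small sketch, with σ² an arbitrary stand-in for ‖Aζh‖²:

```python
import math
import random

# If W ~ N(-sigma2/2, sigma2), then E[exp(W)] = exp(mu + sigma2/2) = 1,
# the condition in Le Cam's first lemma for contiguity.
def mgf_normal(mu, sigma2, t=1.0):
    """E[exp(t W)] for W ~ N(mu, sigma2): exp(t*mu + t^2*sigma2/2)."""
    return math.exp(t * mu + 0.5 * t * t * sigma2)

sigma2 = 2.7  # arbitrary positive variance standing in for ||A_zeta h||^2
analytic = mgf_normal(-0.5 * sigma2, sigma2)  # exactly 1

# Monte Carlo confirmation with a fixed seed.
random.seed(0)
n = 200_000
mc = sum(
    math.exp(random.gauss(-0.5 * sigma2, math.sqrt(sigma2))) for _ in range(n)
) / n
```

The analytic value is exactly 1 for any σ² > 0; the Monte Carlo average agrees up to simulation error.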
Proof of Theorem 2. We define a √n-neighborhood of β0 as the collection of sequences βn = β0 + hβ/√n, for hβ ∈ ℝp. A √n-neighborhood of η is similarly defined, for hη ∈ ℋη. With a minor abuse of notation, a local form of the hypotheses can be written as:
(18)
where h1 ∈ ℝp × ℋη takes the value (hβ1, hη1), with . We note that the least favorable direction q̃ζ is invariant under the choice of φζ; as a consequence, the contiguous alternative H1 is also invariant under the choice of φζ.
Define
(19)
A test defined by LRn is
where k̃αn > 0, λ̃n ∈ [0, 1] are constants such that the rejection probability is α under the null. For notational simplicity, let . By the Neyman-Pearson lemma, for all n ≥ 1 and any test φn with level α, with a minor abuse of notation,
(20)
(21)
(22)
(23)
where (21) follows since LRn has an absolutely continuous asymptotic distribution under the contiguous alternative H1 and by Fubini's theorem. (22) follows since PLRn − LRn = oP (1) under H1, which will be established at the end of the proof. (23) follows from Lemma 6. The results for ERn and ELRn also follow from Lemma 6. By Fubini's theorem, we obtain , which implies that the proposed tests have the greatest weighted average power asymptotically in the class of all tests of asymptotic significance level α, against the alternative .
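The randomized test and its constants (k̃αn, λ̃n) come from the classical Neyman–Pearson construction: reject when the statistic exceeds the critical value, and randomize on the boundary so the size is exactly α. A sketch of how such constants are computed for a discrete null distribution of the statistic (the distribution below is hypothetical):

```python
def np_constants(values, probs, alpha):
    """Given the null distribution of a statistic T (values with probs),
    return (k, lam) such that P(T > k) + lam * P(T = k) = alpha."""
    pairs = sorted(zip(values, probs), reverse=True)  # scan from the largest value down
    tail = 0.0  # accumulates P(T > current value)
    for v, p in pairs:
        if tail + p >= alpha:
            lam = (alpha - tail) / p  # randomization probability on {T = v}
            return v, lam
        tail += p
    return min(values), 1.0  # degenerate case alpha >= 1

# Hypothetical null distribution of a likelihood ratio statistic.
vals = [0.5, 1.0, 2.0, 4.0]
prbs = [0.4, 0.3, 0.2, 0.1]
k, lam = np_constants(vals, prbs, alpha=0.15)
# Size check: P(T > k) + lam * P(T = k) should equal alpha.
size = sum(p for v, p in zip(vals, prbs) if v > k) + lam * 0.2
```

Here k = 2.0 with randomization probability 0.25 on {T = 2.0}, giving exact size 0.15.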
To show PLRn − LRn = oP (1) under H1, it suffices to show PLRn − LRn = oP (1) under the null by Lemma 6. Define , and note that ∀M : 0 < M < ∞, |PLRn − LRn| ≤ |PLRn − PLRn(M)| + |PLRn(M) − LRn(M)| + |LRn − LRn(M)|. Hence it suffices to show that (i) |PLRn − PLRn(M)| →P0 0, (ii) and (iii) , as n → ∞. Part (i) was shown in Lemma 5. Part (ii) can be similarly established by taking M large enough and using Assumption A.
To show Part (iii), we take a Taylor expansion of at (ψ̄0, ζ) with respect to hβ along the direction q̃ζ, which leads to the following expansion in the least favorable submodel:
On the right-hand side, we can replace ℙnℓ̇(β0, ψ̄0, ζ) by , and ℙnℓ̈(β̃, ψ̃, ζ) by , according to Assumption B2. Comparing the above display and Lemma 2 with , we obtain Part (iii).
Proof of Theorem 3. The equivalence of the three tests under local alternatives is shown in Lemma 6. To show their asymptotic distribution, a key step is to establish that β̄n converges under in distribution to the process ζ ↦ (θ0(ζ)) + v⋆(hβ, ζ, ζ1), where v⋆(hβ, ζ, ζ1) ≡ P0ℓ̃β(θ0(ζ))ℓ̃β(θ0(ζ1))′hβ, by Theorem 3.10.12 in van der Vaart and Wellner (1996). The result follows by Lemma 6 and the continuous mapping theorem.
Proof of Theorem 4. The equivalence of the three tests under local alternatives is shown in Lemma 6. Since the sequences of densities are contiguous to the density , we have
under P0. Then ELRn →d rχ(c) under , by Le Cam's third lemma.
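Le Cam's third lemma, used here, says that a statistic jointly asymptotically normal with the log likelihood ratio picks up a mean shift equal to their asymptotic covariance under contiguous alternatives. A toy Monte Carlo check in the one-dimensional Gaussian shift model (h and the replication count are arbitrary choices, not quantities from the paper):

```python
import math
import random

# Gaussian shift model: under the null T ~ N(0, 1) and the log likelihood
# ratio is Lambda = h*T - h^2/2, so cov(T, Lambda) = h.  Reweighting null
# draws by exp(Lambda) recovers the contiguous-alternative mean h, which is
# the mean shift predicted by Le Cam's third lemma.
random.seed(1)
h, reps = 1.0, 200_000
acc = 0.0
for _ in range(reps):
    t = random.gauss(0.0, 1.0)    # draw T under the null
    lam = h * t - 0.5 * h * h     # log likelihood ratio at this draw
    acc += t * math.exp(lam)      # E_alt[T] = E_null[T * exp(Lambda)]
mean_under_alt = acc / reps       # approximately h
```

The reweighted mean is close to h = 1, matching the covariance cov(T, Λ) = h.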
Proof of Lemma 3. The proof mainly involves an argument that for an arbitrary, possibly random sequence {ζn}, the distance between the minimizer of the Kullback-Leibler information and θ̂n(ζn) goes to zero. Consequently, the assertion of Lemma 3 follows from the arbitrariness of the sequence ζn and Slutsky's theorem. We omit the details.
Proof of Lemma 4. The proof mainly involves a uniform “peeling device,” adapting the proof of Theorem 3.2 in Murphy and van der Vaart (1999); we omit the details.
Lemma 7. (1) Assume φζ : Θφ ⊂ ℍ ↦ Θψ ⊂ ℍ is one-to-one, continuously invertible and onto, and ψζ : Θψ ⊂ ℍ ↦ ℍ is one-to-one, continuously invertible and onto; then ψζ ∘ φζ : Θφ ↦ ℍ is one-to-one, continuously invertible and onto. (2) Assume φζ : Θφ ⊂ ℍ ↦ Θψ ⊂ ℍ is uniformly Fréchet differentiable at θ ∈ Θφ and ψζ : Θψ ⊂ ℍ ↦ ℍ is uniformly Fréchet differentiable at φζ(θ) over ζ ∈ Ξ. Then ψζ ∘ φζ : Θφ ↦ ℍ is uniformly Fréchet differentiable at θ with derivative ψ̇ζ(φζ(θ)) ∘ φ̇ζ(θ).
Proof. For Part (1), it suffices to note that ‖ψ̇ζ(φζ(θ))(φ̇ζ(θ)(h))‖ ≥ c1‖φ̇ζ(θ)(h)‖ ≥ c1c2‖h‖. For Part (2), we note that ψζ ∘ φζ(θ + th) − ψζ ∘ φζ(θ) = ψζ(φζ(θ) + tkt) − ψζ(φζ(θ)), where kt = {φζ(θ + th) − φζ(θ)}/t. So we can rewrite the uniform Fréchet difference as ψζ(φζ(θ + h)) − ψζ(φζ(θ)) = ψ̇ζ(φζ(θ))(φζ(θ + h) − φζ(θ)) + oΞ(‖φζ(θ + h) − φζ(θ)‖) = ψ̇ζ(φζ(θ))(φ̇ζ(θ)(h)) + oΞ(‖h‖).
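The chain rule for Fréchet derivatives used in Lemma 7(2) can be sanity-checked by finite differences; a sketch on ℝ² with two hypothetical smooth maps standing in for φζ and ψζ:

```python
import math

# Finite-difference check that the derivative of a composition equals the
# composition of the derivatives (the identity in Lemma 7(2)), on R^2.
def phi(x):  # hypothetical smooth map phi: R^2 -> R^2
    return (math.sin(x[0]) + x[1], x[0] * x[1])

def psi(y):  # hypothetical smooth map psi: R^2 -> R^2
    return (math.exp(y[0]), y[0] - y[1] ** 2)

def jac(f, x, eps=1e-6):
    """Numerical Jacobian of f at x via central differences."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        cols.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]  # J[i][j]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

x0 = (0.3, -0.7)
direct = jac(lambda x: psi(phi(x)), x0)            # derivative of psi o phi
chained = matmul(jac(psi, phi(x0)), jac(phi, x0))  # psi'(phi(x0)) o phi'(x0)
err = max(abs(direct[i][j] - chained[i][j]) for i in range(2) for j in range(2))
```

The two Jacobians agree up to finite-difference error.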
Lemma 8. Let Aζ = Tζ + Kζ : ℍ ↦ ℍ be a linear operator between Banach spaces, where Tζ is onto and there exists c1 > 0 such that ‖Tζh‖ ≥ c1‖h‖ for all h ∈ ℍ and ζ ∈ Ξ, and Kζ is uniformly compact, i.e., the closure of ∪ζ∈Ξ{Kζh : ‖h‖ ≤ 1} is compact. If N(Aζ) = {0} for all ζ ∈ Ξ, then Aζ is onto and there exists c2 > 0 such that ‖Aζh‖ ≥ c2‖h‖ for all ζ ∈ Ξ and h ∈ ℍ.
Proof. Since, for an arbitrary random sequence ζn, is continuous, the operator is compact. Hence is one-to-one and therefore also onto, by a result of Riesz for compact operators. Thus Tζn + Kζn is also onto. We will be done if we can show that is continuously invertible, since that would imply that is bounded. The remainder of the proof follows that of Lemma 6.17 in Kosorok (2008).
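A finite-dimensional analogue of Lemma 8: if T is bounded below and K is a compact (here, rank-one) perturbation with T + K injective, then T + K is invertible and satisfies a lower bound ‖(T + K)h‖ ≥ c2‖h‖. The matrices below are hypothetical illustrations:

```python
import random

# Finite-dimensional sketch of Lemma 8: T is diagonal with ||Th|| >= 2||h||,
# K is a rank-one ("compact") perturbation, and A = T + K is injective,
# hence invertible; c2 below estimates the lower bound min ||Ah||/||h||.
def matvec(a, h):
    return [sum(a[i][j] * h[j] for j in range(len(h))) for i in range(len(a))]

def norm(v):
    return sum(x * x for x in v) ** 0.5

T = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 2.5]]
K = [[0.1 * i * j for j in range(1, 4)] for i in range(1, 4)]  # rank one
A = [[T[i][j] + K[i][j] for j in range(3)] for i in range(3)]

# Estimate c2 = min ||Ah|| over unit vectors h by random search.
random.seed(2)
c2 = float("inf")
for _ in range(5000):
    h = [random.gauss(0.0, 1.0) for _ in range(3)]
    nh = norm(h)
    c2 = min(c2, norm(matvec(A, [x / nh for x in h])))
```

The estimated constant stays near the smallest singular value of A (about 2 here), strictly above zero, which is the content of the lemma's conclusion in this toy setting.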
Lemma 9. Suppose that Un(ψ, ζ)(h) = ℙnv(ψ, ζ)(h) and U(ψ, ζ)(h) = Pv(ψ, ζ)(h) for given P-measurable functions v(ψ, ζ)(h) indexed by Ψ × Ξ and an arbitrary index set ℋη, and that at ψ = ψ0, v(ψ0, ζ)(h) = v(ψ0)(h) does not depend on ζ. If , the class of functions {v(ψ, ζ)(h) − v(ψ0)(h) : ‖ψ − ψ0‖ < δ, h ∈ ℋη, ζ ∈ Ξ} is P-Donsker for some δ > 0 and supζ∈Ξ,h∈ℋη P0(v(ψ, ζ)(h) − v(ψ0)(h))2 → 0 as ψ → ψ0, then .
Proof. This is a “uniform” version of Lemma 3.3.5 in van der Vaart and Wellner (1996). Let Ψδ ≡ {ψ : ‖ψ − ψ0‖ < δ} and define an extraction function f : ℓ∞(Ψδ × Ξ × ℋη) × Ψδ ↦ ℓ∞(ℋη × Ξ) by f(z, ψ, ζ)(h) ≡ z(ψ, ζ, h), where z ∈ ℓ∞(Ψδ × Ξ × ℋη). The map f is continuous at every point (z, ψ1) such that suph∈ℋη,ζ∈Ξ |z(ψ, ζ, h) − z(ψ1, ζ, h)| → 0 as ψ → ψ1. Define the stochastic process Zn(ψ, ζ, h) = 𝔾n(v(ψ, ζ)(h) − v(ψ0, ζ)(h)) indexed by Ψδ × Ξ × ℋη. By assumption, Zn converges weakly in ℓ∞(Ψδ × Ξ × ℋη) to a tight Gaussian process Z0 with continuous sample paths with respect to the metric ρζ defined by ρζ((ψ1, ζ, h1), (ψ2, ζ, h2)) = P(v(ψ1, ζ)(h1) − v(ψ0, ζ)(h1) − v(ψ2, ζ)(h2) + v(ψ0, ζ)(h2))2, at fixed ζ. Since, as assumed, suph∈ℋη,ζ∈Ξ ρζ((ψ, h), (ψ0, h)) → 0 as ψ → ψ0, f is continuous at almost all sample paths of Z0, uniformly over ζ ∈ Ξ. The result now follows by Slutsky's theorem and the continuous mapping theorem.
Contributor Information
Rui Song, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7420, U.S.A. E-mail: rsong@bios.unc.edu.
Michael R. Kosorok, Departments of Biostatistics and Statistics & Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7420, U.S.A. E-mail: kosorok@unc.edu.
Jason P. Fine, Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7420, U.S.A. E-mail: jfine@bios.unc.edu.
References
- Andrews DWK. Estimation when a parameter is on a boundary. Econometrica. 1999;67:1341–1383.
- Andrews DWK, Ploberger W. Optimal tests when a nuisance parameter is present only under the alternative. Econometrica. 1994;62:1383–1414.
- Andrews DWK, Ploberger W. Admissibility of the likelihood ratio test when a nuisance parameter is present only under the alternative. The Annals of Statistics. 1995;23:1609–1629.
- Bagdonavičius VB, Nikulin MS. Generalized proportional hazards model based on modified partial likelihood. Lifetime Data Analysis. 1999;5:329–350. doi: 10.1023/a:1009688109364.
- Bickel PJ, Ritov Y, Stoker TM. Tailor-made tests for goodness of fit to semiparametric hypotheses. The Annals of Statistics. 2006;34:721–741.
- Chappell R. Fitting bent lines to data, with applications to allometry. Journal of Theoretical Biology. 1989;138:235–256. doi: 10.1016/s0022-5193(89)80141-9.
- Chen H, Chen J. Tests for homogeneity in normal mixtures in the presence of a structural parameter. Statistica Sinica. 2003;13:351–365.
- Chen H, Chen J, Kalbfleisch JD. Testing for a finite mixture model with two components. Journal of the Royal Statistical Society, Series B: Statistical Methodology. 2004;66:95–115.
- Chernoff H. On the distribution of the likelihood ratio. Annals of Mathematical Statistics. 1954;25:573–578.
- Chernoff H, Lander E. Asymptotic distribution of the likelihood ratio test that a mixture of two binomials is a single binomial. Journal of Statistical Planning and Inference. 1995;43:19–40.
- Dacunha-Castelle D, Gassiat E. Testing the order of a model using locally conic parametrization: Population mixtures and stationary ARMA processes. The Annals of Statistics. 1999;27:1178–1209.
- Davies RB. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1977;64:247–254.
- Davies RB. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1987;74:33–43.
- Golub GH, Van Loan CF. Matrix Computations. Johns Hopkins University Press; 1983.
- Huang J. Efficient estimation for the proportional hazards model with interval censoring. The Annals of Statistics. 1996;24:540–568.
- King ML, Shively TS. Locally optimal testing when a nuisance parameter is present only under the alternative. The Review of Economics and Statistics. 1993;75:1–7.
- Kosorok MR. Introduction to Empirical Processes and Semiparametric Inference. Springer; New York: 2008.
- Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazards frailty regression models. The Annals of Statistics. 2004;32:1448–1491.
- Kosorok MR, Song R. Inference under right censoring for transformation models with a change-point based on a covariate threshold. The Annals of Statistics. 2007;35:957–989.
- Lindsay BG. Mixture Models: Theory, Geometry, and Applications. Institute of Mathematical Statistics; 1995.
- Liu X, Shao Y. Asymptotics for likelihood ratio tests under loss of identifiability. The Annals of Statistics. 2003;31:807–832.
- Luo X, Turnbull BW, Clark LC. Likelihood ratio tests for a changepoint with survival data. Biometrika. 1997;84:555–565.
- Murphy SA, Rossini AJ, van der Vaart AW. Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association. 1997;92:968–976.
- Murphy SA, van der Vaart AW. Semiparametric likelihood ratio inference. The Annals of Statistics. 1997;25:1471–1509.
- Murphy SA, van der Vaart AW. Observed information in semiparametric models. Bernoulli. 1999;5:381–412.
- Murphy SA, van der Vaart AW. On profile likelihood. Journal of the American Statistical Association. 2000;95:449–465.
- Parner E. Asymptotic theory for the correlated gamma-frailty model. The Annals of Statistics. 1998;26:183–214.
- Pollard D. Another look at differentiability in quadratic mean. 1995. URL http://www.stat.yale.edu/∼pollard/LeCamFest/Pollard.pdf.
- Pons O. Estimation in a Cox regression model with a change-point according to a threshold in a covariate. The Annals of Statistics. 2003;31:442–463.
- Scharfstein DO, Tsiatis AA, Gilbert PB. Semiparametric efficient estimation in the generalized odds-rate class of regression models for right-censored time-to-event data. Lifetime Data Analysis. 1998;4:355–391. doi: 10.1023/a:1009634103154.
- Slud EV, Vonta F. Consistency of the NPML estimator in the right-censored transformation model. Scandinavian Journal of Statistics. 2004;31:21–41.
- van der Vaart AW. Asymptotic Statistics. Cambridge University Press; Cambridge: 1996.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer; New York: 1996.
- Wald A. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society. 1943;54:426–482.
- Zhu HT, Zhang HP. Hypothesis testing in mixture regression models. Journal of the Royal Statistical Society, Series B: Statistical Methodology. 2004;66:3–16.
- Zhu HT, Zhang HP. Asymptotics for estimation and testing procedures under loss of identifiability. Journal of Multivariate Analysis. 2006;97:19–45.
