Summary
The infinite-dimensional information operator for the nuisance parameter plays a key role in semiparametric inference, as it is closely related to the regular estimability of the target parameter. Calculation of information operators has traditionally proceeded in a case-by-case manner and has often entailed lengthy derivations with complicated arguments. We develop a unified framework for this task by exploiting commonality in the form of semiparametric likelihoods. The general formula developed allows one to derive information operators with simple calculus and, if necessary at all, a minimal amount of probabilistic evaluation. This streamlined approach shows its simplicity and versatility in application to a number of existing models as well as a new model of practical interest.
Keywords: Efficient score, Infinite-dimensional nuisance parameter, Missing data, Survival analysis, Tangent space
1. Introduction
Consider a smooth parametric model with density
, where
is the parameter of interest and
is an unknown nuisance parameter. Suppose that the information matrix for
can be written in the partitioned form
![]() |
(1) |
where
can be regarded as the information for
under a known nuisance parameter
. When
is unknown, the efficient information for
is
![]() |
(2) |
The second term in (2) represents the information lost due to the unknown
. If
is positive definite, then a regular and asymptotically linear estimator for
exists (Bickel et al., 1993, Ch. 2). Among such estimators, the maximum likelihood estimator is usually the most efficient, with an asymptotic variance matrix that attains the Cramér–Rao bound, i.e.,
.
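As a numerical sketch of the partitioned information in (1) and the efficient information in (2), the latter can be computed as a Schur complement. The matrix below is an arbitrary positive definite example, and the function name `efficient_information` and the block labels are illustrative assumptions rather than notation from the text.

```python
import numpy as np

def efficient_information(info, p):
    """Schur complement of the nuisance block.

    info : (p+q) x (p+q) information matrix, parameter of interest first.
    p    : dimension of the parameter of interest.
    """
    i_tt = info[:p, :p]          # information for the target, nuisance known
    i_tn = info[:p, p:]          # cross-information
    i_nn = info[p:, p:]          # information for the nuisance
    # Information lost to the unknown nuisance parameter.
    loss = i_tn @ np.linalg.solve(i_nn, i_tn.T)
    return i_tt - loss

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
info = a @ a.T + 5 * np.eye(5)   # hypothetical positive definite information
eff = efficient_information(info, p=2)

# The inverse efficient information equals the target block of the inverse
# of the full information matrix, i.e., the Cramer-Rao bound for the target.
bound = np.linalg.inv(info)[:2, :2]
```

In this sketch `np.linalg.inv(eff)` and `bound` agree up to rounding error, which is the finite-dimensional fact that the semiparametric formula (5) generalizes.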
Now, consider a semiparametric version of the above model where the Euclidean nuisance parameter
is replaced by an infinite-dimensional parameter
. Specifically, denote the class of probability measures by
![]() |
(3) |
where
is a nonparametric space of probability measures or positive finite measures. Use
to denote the density of
with respect to some dominating measure. Let
denote the score function for
and
the score operator for
, where
is the original tangent space for
(see, e.g., Bickel et al., 1993) and
denotes the space of all
-square-integrable functions for any measure
. Because there is no parametric constraint on
, if it is a probability measure, then
, the space of all
-mean zero square-integrable functions; if it is a positive finite measure, then
. In practice one can usually, without loss of generality, work with a smaller set than
, e.g., the subset of all bounded functions with bounded variation (see, e.g., van der Vaart, 1998, Ch. 25). In such cases, the score functions for
can typically be generated by taking
with
.
Let
denote the information matrix for
under a known
, where
for any vector
. Let
denote the adjoint of
. The information operator for
can be expressed in a form analogous to (1):
![]() |
(4) |
which acts upon
. Here and hereafter, operations on a vector with components in a Hilbert space are understood to be applied componentwise. In parallel with (2), the efficient information for
in the presence of unknown
is
![]() |
(5) |
The efficient information
is the variance matrix of the efficient score function
, i.e., the
-projection of
onto the orthogonal complement of the nuisance tangent space
.
To illustrate with a concrete example, consider the Cox model with right-censored data (Cox, 1975), the reputed archetype of semiparametric models. Let
denote the event time of interest, and
a vector of covariates. The Cox proportional hazards model specifies that
![]() |
(6) |
where
is the conditional cumulative hazard function of
given
,
is the regression parameter,
is the baseline cumulative hazard function, and
is the maximum length of follow-up. Here,
is the parameter of interest and
is the nuisance parameter. Let
denote the censoring time which satisfies
. Then, the observed data consist of
, where
is the indicator function and
. With
, the elements on the right-hand side of (4) can be derived explicitly as follows (see, e.g., § 25.12.1 of van der Vaart, 1998):
,
,
, and
, where
,
, and
. Using these identities, the efficient information
can be calculated explicitly under (5).
As in the parametric model, positive definiteness of
is necessary for the regular estimability of
, and is usually the key condition governing the asymptotic efficiency of the maximum likelihood estimator (see, e.g., van der Vaart, 1998, Ch. 25). Unfortunately, direct calculation of
is rarely feasible beyond the archetypal example of the Cox model. This is because the semiparametric information operator
for the nuisance parameter, i.e., the analogue of
from the parametric case, is a map between infinite-dimensional spaces. Consequently, its properties such as invertibility are often more elusive than those of information matrices in parametric models. Even if the operator is invertible, an analytic expression for its inverse is generally unavailable, making it impossible to evaluate
through (5).
Due to these challenges, proof of regular estimability for the Euclidean parameter in a semiparametric model often involves lengthy and complicated arguments on a case-by-case basis. A common pivotal component of such arguments, however, is the derivation of the information operator
and its mathematical properties. There are two broad-based approaches to deriving
in the existing literature. The first is a conditional expectation approach suitable for the so-called information loss models (§ 25.5.2 of van der Vaart, 1998) which include many models for missing/censored data. The approach assumes that the observed data
is a randomly coarsened version of the full data
, the latter following a nonparametric distribution
. It then follows that
with
, and that
with
. This approach is not completely general as it is restricted to a particular type of data-generating mechanism. In addition, the evaluation of the conditional expectations can be mathematically formidable unless the model is sufficiently simple. The other approach involves constructing a least favourable submodel such that its score function coincides with the efficient score (see, e.g., Huang & Wellner, 1997; Murphy et al., 1997; Zeng & Lin, 2006). This approach is generally applicable, but draws heavily on functional analytic arguments regarding Hilbert-space operators. Unfortunately, to the average statistician, the techniques needed to carry out the calculation in both approaches are mostly nonstandard and unfamiliar.
In this paper we propose a simple and general approach to the calculation of information operators in semiparametric models. Theoretical results are derived based on a common form of semiparametric likelihoods. Consequently, when studying a particular model, the user need only plug in the specific expressions from the model to replace the generic terms in the general likelihood. This allows one to bypass complicated functional analytic and probabilistic arguments characteristic of individual case-by-case treatments. It also offers new insights into results obtained earlier in the literature on a seemingly ad hoc basis.
2. Theory and methods
2.1. Two types of semiparametric problems
As mentioned in § 1, the information operator
is the semiparametric counterpart of
in (1). Unlike the parametric case, however,
is sometimes noninvertible so that the formula (5) for the efficient score does not apply. In such cases, the infinite-dimensional parameter
is not
-estimable (see, e.g., Huang & Wellner, 1997). Depending on whether
is continuously invertible, i.e., its inverse operator exists and is continuous, most of the semiparametric models in the literature can be classified into two categories.
In Category 1,
can be expressed as the sum of a continuously invertible operator
, often in the form of a multiplication operator, and a compact operator
, often in the form of an integral operator. A compact operator maps the unit ball in
into a totally bounded set. Because it is generally impossible to find an analytic expression for
, the positive definiteness of
is usually assessed, not by (5), but by indirect means. The idea is best illustrated using the parametric example. If the efficient information for
in the parametric model is invertible, by rules of matrix inverse, one has that
![]() |
(7) |
where
. The matrix
is the efficient information for
in the presence of an unknown
. By (7), provided that
is invertible, the invertibility of
is equivalent to that of
. Similarly, in the semiparametric case define
by
, where
is a compact operator. As the semiparametric analogue of
, the operator
is the efficient information operator for
in the presence of an unknown
. Also similarly to the parametric case, under a nonsingular
, the efficient information
is nonsingular if
is continuously invertible, which is a somewhat stronger condition than
being continuously invertible. Intuitively, continuous invertibility of
means not only that
is regularly estimable, but also that
and
are not locally confounded. Because
is the sum of a continuously invertible operator
and a compact operator
, by the Fredholm theory (Rudin, 1973), it is continuously invertible if it is one-to-one. The latter condition can usually be proved through local identifiability arguments. If the estimator is obtained by maximum likelihood, its asymptotic properties are best handled by the likelihood equations approach (see § 25.12 of van der Vaart, 1998). Examples of Category 1 problems include Murphy (1995), Murphy et al. (1997), Parner (1998), Kosorok et al. (2004), Zeng & Lin (2006) and Mao & Lin (2017), among others.
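The block-inverse rule invoked in the parametric illustration above can be stated in generic notation. In the display below, the symbols $I_{\theta\theta}$, $I_{\theta\gamma}$, $I_{\gamma\theta}$, $I_{\gamma\gamma}$, $\tilde I_\theta$ and $S_\gamma$ are labels assumed for this sketch (the blocks of a partitioned information matrix and its two Schur complements) and may differ from the notation of the original displays.

```latex
\begin{align*}
\tilde I_\theta &= I_{\theta\theta} - I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\theta},
\qquad
S_\gamma = I_{\gamma\gamma} - I_{\gamma\theta} I_{\theta\theta}^{-1} I_{\theta\gamma},
\\
\tilde I_\theta^{-1} &= I_{\theta\theta}^{-1}
  + I_{\theta\theta}^{-1} I_{\theta\gamma} S_\gamma^{-1} I_{\gamma\theta} I_{\theta\theta}^{-1},
\\
\det\begin{pmatrix} I_{\theta\theta} & I_{\theta\gamma} \\ I_{\gamma\theta} & I_{\gamma\gamma} \end{pmatrix}
 &= \det(I_{\theta\theta}) \det(S_\gamma)
  = \det(I_{\gamma\gamma}) \det(\tilde I_\theta).
\end{align*}
```

When $I_{\theta\theta}$ and $I_{\gamma\gamma}$ are both invertible, the determinant identity shows that $\tilde I_\theta$ is invertible exactly when $S_\gamma$ is, which is the finite-dimensional version of the equivalence used for Category 1 problems.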
In the second category,
is itself a compact integral operator and is therefore non-invertible. The nuisance parameter
in such cases is generally estimable only at rates slower than
, but the same need not be true for
. To check whether
is regularly estimable, one seeks to derive, or at least show the existence of, a least favourable direction
satisfying the normal equation
![]() |
(8) |
Then, the efficient score for
, defined as the projection of
onto the orthogonal complement of
, is
. This is because
for all
, where
denotes the inner product in
. The nonsingularity of
may be proved through local identifiability arguments. If the estimator is obtained by maximum likelihood, its asymptotic properties are best handled by the approximately least-favourable submodels approach (see § 25.11 of van der Vaart, 1998). Examples of Category 2 problems include Huang (1995, 1996), Huang & Wellner (1997) and Zeng et al. (2016), among others.
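A finite-dimensional analogue may help fix ideas about the normal equation (8) and the resulting projection. In the sketch below, the nuisance tangent space is replaced by the span of a few simulated directions and expectations by sample averages; the variable names and the simulated scores are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 2000, 4

# Hypothetical nuisance scores: q directions spanning a finite-dimensional
# stand-in for the nuisance tangent space, evaluated at n sample points.
nuis = rng.normal(size=(n, q))
# Hypothetical score for the target, correlated with the nuisance scores.
score = nuis @ np.array([0.5, -1.0, 0.0, 2.0]) + rng.normal(size=n)

def inner(f, g):
    """Empirical L2 inner product, the sample analogue of E(f g)."""
    return f.T @ g / n

# Normal equation: <nuis, nuis> h = <nuis, score>.
h = np.linalg.solve(inner(nuis, nuis), inner(nuis, score))

# Efficient score: residual of the projection onto the nuisance span.
eff_score = score - nuis @ h
eff_info = inner(eff_score, eff_score)
```

By construction `eff_score` is orthogonal to every column of `nuis`, and `eff_info` cannot exceed the information `inner(score, score)` available when the nuisance is known.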
For both categories, one needs to derive the specific forms of the information operators
and
, and check the corresponding requirements such as continuous invertibility or existence of a root. Such analyses usually constitute the main steps in deriving the asymptotic properties of the maximum likelihood estimators, and should not be taken lightly since semiparametric likelihoods may sometimes be ill-behaved (van der Vaart, 2002, § 5.2).
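The contrast between the two categories can be illustrated numerically by discretizing an operator on a grid. The sketch below is not a proof: the multiplier `1 + t` and the kernel `exp(-|s - t|)` are hypothetical choices, and the midpoint rule is only one possible discretization. The point is the behaviour of the smallest eigenvalue as the grid is refined.

```python
import numpy as np

def discretized_operators(n):
    """Midpoint-rule discretization on [0, 1] with n grid points."""
    t = (np.arange(n) + 0.5) / n
    weight = 1.0 / n
    multiplier = 1.0 + t                               # bounded away from zero
    kernel = np.exp(-np.abs(t[:, None] - t[None, :]))  # square-integrable kernel
    integral_op = kernel * weight                      # compact part
    category1 = np.diag(multiplier) + integral_op      # multiplication + compact
    category2 = integral_op                            # compact only
    return category1, category2

def smallest_eigenvalue(op):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(op)[0]   # eigvalsh returns ascending order

for n in (50, 200):
    c1, c2 = discretized_operators(n)
    print(n, smallest_eigenvalue(c1), smallest_eigenvalue(c2))
```

For the Category 1 stand-in the smallest eigenvalue stays above the lower bound of the multiplier however fine the grid, whereas for the Category 2 stand-in it shrinks towards zero, mirroring the lack of a continuous inverse for a compact operator.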
2.2. A general formula for the information operators
Our approach to establishing a general formula for the information operators consists in taking second derivatives of a general form of semiparametric likelihoods. In the parametric setting, it is well known that the information matrix can be equivalently expressed as the expectation of the negative Hessian of the loglikelihood. This equivalence can be exploited in the semiparametric case to simplify calculation as well. The following lemma lays the foundation for the subsequent derivation of information operators. Throughout, we assume that model (3) is sufficiently smooth to justify pointwise differentiation as a means of score generation and interchange of expectation and differentiation whenever appropriate. For a more general set-up for smooth models based on differentiability in quadratic mean, see Bickel et al. (1993).
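The parametric identity referred to above, that the variance of the score equals the expected negative second derivative of the loglikelihood, can be checked exactly in a small example. The sketch below uses a binomial model with a logit parameter, chosen here purely for illustration, and evaluates the derivatives by finite differences; the function names are hypothetical.

```python
import numpy as np
from math import comb

def log_pmf(x, theta, m=10):
    """Binomial(m, p) loglikelihood with logit parameter theta."""
    p = 1.0 / (1.0 + np.exp(-theta))
    return np.log(comb(m, x)) + x * np.log(p) + (m - x) * np.log(1.0 - p)

def information_two_ways(theta, m=10, eps=1e-4):
    """Return (E[score^2], -E[second derivative]) summed over the support."""
    var_score = 0.0
    neg_hessian = 0.0
    for x in range(m + 1):
        prob = np.exp(log_pmf(x, theta, m))
        up = log_pmf(x, theta + eps, m)
        mid = log_pmf(x, theta, m)
        down = log_pmf(x, theta - eps, m)
        # Central finite differences for the first and second derivatives.
        score = (up - down) / (2 * eps)
        hessian = (up - 2 * mid + down) / eps ** 2
        var_score += prob * score ** 2
        neg_hessian -= prob * hessian
    return var_score, neg_hessian

var_score, neg_hessian = information_two_ways(theta=0.3)
```

Both quantities agree with the closed form m p (1 - p) for this model up to finite-difference error.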
Lemma 1.
Let
be a score function for model (3). Write
,
. Then
The following theorem presents the formulas for the score functions, score operators and information operators based on a general form of semiparametric likelihoods. The proof involves straightforward application of Lemma 1 with
or
. Unless otherwise specified, we use
and
to denote the first and second derivatives of a generic smooth function
.
Theorem 1.
Suppose that the loglikelihood for model (3) takes the form
(9) where
,
, and
are real-valued data-dependent functions and
is a data-dependent linear functional on the closed linear span of the space for
, the log-density of
with respect to a certain dominating measure. Write
,
and
, where
. Then, if
, we have that
where
(10) If
, the results are the same except that
and
are centred to have
-mean zero, and
.
Remark 1.
For notational simplicity, we have assumed that the function
in Theorem 1 is real-valued. It is straightforward to extend the results to vector-valued
. Furthermore, instead of a single nuisance parameter
, one can extend the framework to accommodate multiple nuisance parameters
. In such cases, the original tangent space for
will be
, where
is the original tangent space for
![]()
. These extensions are considered and illustrated in the Supplementary Material.
As mentioned in § 2.1, the efficient score for
is the score function minus its projection onto the tangent space for
, i.e.,
, where
solves the normal equation (8). Under the conditions of Theorem 1, both sides of (8) have explicit expressions. The problem is then treated as either Category 1 or Category 2 depending on whether
is continuously invertible.
2.3. Positive definiteness of the efficient information
Under the conditions of Theorem 1, the information operator
can be written as the sum of a multiplication operator with multiplier
and a compact Hilbert–Schmidt integral operator with kernel
, insofar as
is square-integrable by
, which we shall always assume to be true. The multiplication operator is continuously invertible if
is bounded above and away from zero. If so, the model is of Category 1. Likewise, if
, it is of Category 2. The local identifiability condition needed for both categories to ensure nonsingularity of
can be stated formally as follows.
Condition 1 (Local identifiability).
If
(11)
-almost surely for some
and
, then
and
.
Since the left-hand side of (11) is a score function in the general form
, Condition 1 simply says that the joint score operator is one-to-one so that local alternatives to
in all possible directions can be identified. In particular, it implies that
is positive definite. To use it to show that
is one-to-one for Category 1 problems, take
and
. Then, one can show that
implies
, which in turn implies
. For Category 2 problems, provided that
as a solution to (8) exists, one may take
to show that
is positive definite.
Corollary 1.
Suppose that Condition 1 is satisfied. Then,
is positive definite if either of the following is true:
(i) Category 1: There exists
such that
.
(ii) Category 2:
and the solution
to the following integral equation exists:
(12)
In the second case, solution of
usually starts with taking derivatives on both sides of (12). For example, Huang & Wellner (1997) took this route to show that the solution exists for the Cox model with case-2 interval-censored data. In general, this approach requires that
is a smooth function and lies in the range of
.
The following two propositions can usually simplify calculations of the quantities in (10) for Category 2 problems. The first one is fairly intuitive: if the density of
does not appear in the likelihood, then information on some aspects thereof cannot be recovered to the first order and thus
will not be continuously invertible.
Proposition 1.
Under the conditions of Theorem 1, if
then
![]()
-almost everywhere.
Proof.
With
, use
to find that
for all
. If
, this means that
; if
, this means that
is a constant, which also leads to the desired result by the form of
in this case. ☐
For Category 2 problems, derivation of the normal equation (12) can be further simplified if
is a conditional density in a certain form.
Proposition 2.
Suppose that the density for model (3) can be written in the form
where
is the conditional density of
given
and
is the marginal density of
. If the loglikelihood
can be written in the form of (9) with
,
,
and
for some deterministic functions
and
, then
and
. Then, the normal equation (12) becomes
(13)
Proof.
In light of Proposition 1, we only need to show that
. Because
is now a score function for the conditional density of
given
, we have that
. The result follows from the fact that
depends on
only. ☐
Proposition 2 applies to all standard regression models for interval-censored data, where the examination times are conditionally independent of the event times given covariates (see, e.g., Sun, 2007). Indeed, let
be the event time of interest,
be a sequence of examination times,
be the observed indicators for the affiliation of
to the intervals partitioned by
, and
be the covariates. If
and
parametrizes only the conditional distribution of
given
, the conditions of Proposition 2 are satisfied with
and
. In these models, the nuisance parameter
, usually played by a nonparametric baseline function, is estimable only at
, whereas the regression parameter
is often regularly estimable.
3. Applications
3.1. The Cox model under right and interval censorships
We first consider the simple example of the Cox model for right-censored data described in § 1. The loglikelihood for the observed data is
![]() |
(14) |
with
. Comparing (14) with (9), one readily recognizes that
,
,
and
. The last identity means that
operates on
by evaluating it at
and then multiplying it by
. Hence,
,
,
and
. By Theorem 1, we have that
,
,
and
. If
has bounded support and
, we have that the multiplier
is bounded above and away from zero. It is thus a Category 1 problem, but is special in that the efficient score can be constructed explicitly. Indeed, the normal equation (8) can be solved by
. Then, an approximation to the efficient score
can be constructed by replacing
with an empirical version, leading to the familiar partial likelihood score function for
(Cox, 1975). Furthermore, under linear independence of
, it is easy to show that Condition 1 is satisfied so that the efficient information is positive definite. Related models for recurrent event and competing risks are considered in the Supplementary Material.
The Cox model under case-1 interval censoring, studied in detail by Huang (1996), offers an example of a Category 2 problem. The conditional hazard of
given
is specified by the same model (6), but the observed data now consist of
, where
is the examination time satisfying
. Clearly, the likelihood for the observed data satisfies the conditions of Proposition 2 with
and
. The loglikelihood is
![]() |
So, we may set
and
, so that
,
, and
. By Proposition 2, the normal equation is in the form of (13), which, by straightforward iterated conditional expectation, can be simplified to
![]() |
(15) |
where
and
are redefined by
respectively, and 
Assuming that the support of
contains
, take the derivatives on both sides of (15) to find that
![]() |
(16) |
where
. So,
. Using (16), one easily obtains the efficient score
![]() |
Remark 2.
Huang (1996) and van der Vaart (1998, 2002) derived the same result for the efficient score by orthogonal projections. However, their approach seems to be model specific and, particularly in the construction of an approximately least favourable submodel, to require a fair amount of ingenuity. On the other hand, the approach outlined here involves only routine calculus and a minimal amount of probabilistic evaluation that yields (15).
As in other interval-censoring problems (Huang & Wellner, 1997), the maximum likelihood estimator for the infinite-dimensional baseline function
converges at
. However, provided that the efficient information for
is positive definite, this nonstandard rate does not interfere with the asymptotic normality and efficiency of the maximum likelihood estimator for
. In fact, it appears that, in general, the nuisance parameter estimator need only converge at a rate faster than
(see, e.g., § 6 of Huang, 1996).
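For concreteness, the part of the loglikelihood under case-1 interval censoring that involves the regression parameter and the baseline cumulative hazard can be evaluated as in the sketch below. It relies on the standard current status form, in which an observation contributes the conditional probability that the event has or has not occurred by the examination time. The function name, its arguments and the choice to pass the baseline cumulative hazard as a callable are assumptions of this illustration.

```python
import numpy as np

def current_status_loglik(beta, cum_hazard, c, delta, z):
    """Loglikelihood for the Cox model with current status data.

    beta       : regression parameter, shape (p,)
    cum_hazard : callable returning the baseline cumulative hazard at c
    c          : examination times (positive), shape (n,)
    delta      : 1 if the event occurred by the examination time, else 0
    z          : covariates, shape (n, p)
    """
    lam = cum_hazard(c) * np.exp(z @ beta)   # conditional cumulative hazard
    log_surv = -lam                          # log P(T > C | Z)
    log_fail = np.log(-np.expm1(-lam))       # log P(T <= C | Z)
    return np.sum(np.where(np.asarray(delta) == 1, log_fail, log_surv))

# Small hypothetical data set with a unit-rate exponential baseline.
c = np.array([0.5, 1.0, 2.0])
delta = np.array([1, 0, 1])
z = np.zeros((3, 1))
value = current_status_loglik(np.array([0.7]), lambda t: t, c, delta, z)
```

Terms involving the distributions of the examination time and the covariates are omitted, since they do not involve either parameter under the conditional independence assumption.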
3.2. Regression models with missing covariates
Suppose that the conditional density of outcome
with respect to a dominating measure
given regressor
is specified through a parametric model
, where
. Estimation of
would be standard if
is fully observed, because the likelihood factorizes into two parts containing
and
, respectively. In the case of missing data in the regressor, however, the nonparametric component
will get entangled with the regression parameter and this will complicate inference. Problems of this type have been studied in general settings by Lawless et al. (1999). Here we consider a simple case with a single level of missingness in
. Using the notation of Tsiatis (2006), we denote the coarsened regressor by
, where
is a known many-to-one function. Let
if the full data
are observed and
if only the coarsened version
is available. We assume that the data are coarsened at random, that is,
, where
is some arbitrary function for the selection probability. Assuming that
involves no aspect of
, the loglikelihood for the observed data
is
![]() |
Thus, we may set
,
,
and
. Then, we have that
,
and
, where
is the full-data score function for
. Using straightforward calculus, it is not hard to obtain that
![]() |
where
. Some details on the derivation of
are given in the Supplementary Material.
Proposition 3.
Suppose that the following two conditions hold:
(a)
for some
;
(b)
is positive definite almost surely.
Then, the efficient information
is positive definite.
Proof.
The multiplier
is clearly bounded above, and is bounded away from zero by (a). One can easily use (a) and (b) to verify Condition 1. The result follows by Corollary 1. ☐
Remark 3.
The information operator
may be derived using the traditional information-loss approach. One first evaluates
and then derives
. To arrive at a simple expression such as obtained here, substantial work is needed to evaluate the conditional expectations through repeated use of the Bayes rule and interchange of integrals.
3.3. A novel semiparametric survival-sacrifice model
Finally, we apply our methods to a novel semiparametric model for so-called survival-sacrifice data, which are routinely collected in animal carcinogenicity experiments. In such experiments, animals are randomized into different treatment arms to receive different doses of a carcinogen. The goal is to compare the incidence of tumor between arms over the course of follow-up. Let
denote time to tumor formation and let
denote time to tumor-caused death, so we have that
almost surely. Because the tumor under study is usually impalpable, necropsy is needed to determine whether a tumor is present. Such an examination is performed either at death time
or at a random censoring time
that is unrelated to the tumor. Hence, the observed data consist of
. In scientific terminology, the variables
and
indicate fatal and incidental tumors, respectively. Write
and
. Assuming
, we can write the likelihood of the observed data
as a multiple of
![]() |
(17) |
Nonparametric estimation of
has been well studied in the literature. Recently, Mao (2019) proposed ad hoc numerical procedures for the regression of
and
against a set of covariates
, but without a formal study of the semiparametric theory. Here we consider the theoretical aspects of a simple regression model for the conditional cumulative hazards of
and
given
, denoted by
and
, respectively. Let
![]() |
(18) |
The baseline hazard functions of the two marginal models differ only by a multiplicative factor
, with
to account for the ordering
. This parsimonious model allows for joint assessment of covariate effects on both fatal and incidental tumors through
.
Write
,
and
. Assume that
. By replacing the cumulative distribution functions in (17) with their conditional counterparts specified in model (18), we obtain the loglikelihood
![]() |
where
. Now, it is easy to recognize that
,
,
, and
. So, in this case we have a vector-valued
function as discussed in Remark 1. By straightforward calculus, we have that
,
, and
, where
and
![]() |
Based on these quantities, we show in the Supplementary Material that, under suitable regularity conditions, Condition 1 for local identifiability is satisfied. Furthermore, using the fact that
, one can easily show that
. If
has bounded support and
, we have that
is bounded above and away from zero. Thus, by Corollary 1, the efficient information for
is positive definite under these conditions.
4. Remarks
We have focused on semiparametric models indexed jointly by a Euclidean parameter of interest and an infinite-dimensional nuisance parameter. The proposed approach, however, can easily be adapted for a nonparametric model
, where the interest centres on certain aspects of the infinite-dimensional parameter
; see the Supplementary Material for details.
The specified form of loglikelihood (9) appears general enough to encompass a surprisingly large pool of existing semiparametric models. It is thus reasonable to expect our framework to be applicable and useful to many new models to come. Straightforward extensions of Theorem 1 exist to further widen its applicability. For example, the function
can be made dependent on
, which will accommodate the loglikelihoods for some frailty models in survival analysis (see, e.g., Kosorok et al., 2004). Furthermore, the conventional derivative of
with respect to
can be replaced by a generalized derivative, e.g., one such that
. This generalization is useful when applied to the accelerated failure time model (Buckley & James, 1979), where
may be in the form of
.
Our framework is most useful when the likelihood is explicitly indexed by a Euclidean parameter and an infinite-dimensional parameter. Other semiparametric models are more naturally formulated through moment or conditional moment constraints (see, e.g., Bickel et al., 1993, § 6.2); for such models it is usually easier to derive information operators via direct projection methods.
We have been concerned exclusively with the theoretical conditions that imply the positive definiteness of the efficient information. Even when the conditions are satisfied, however, numerical implementation of maximum likelihood estimation may still be a challenge due to the presence of an infinite-dimensional nuisance parameter. Finite-dimensional approximation may produce bias if the approximation is too coarse and may render computation infeasible if it is too fine. The profile likelihood approach (Murphy & van der Vaart, 2000) circumvents the infinite-dimensionality issue by profiling out the nuisance parameter in the likelihood, and is thus a numerically reliable procedure for fitting semiparametric models (see, e.g., Yin & Zeng, 2006; Zeng & Lin, 2006; Mao & Lin, 2017).
Acknowledgement
This research was supported by the U.S. National Institutes of Health. I appreciate the helpful comments by the editor, associate editor and two referees.
Appendix
Proof of Theorem 1.
Calculations for
and
are straightforward. We thus focus on exhibiting the forms of
and
. By Lemma 1, if
, where
is a bounded function with bounded total variation in
, then
(A1) The results for
then follow by equating it to
if
and to
if
.
Now, consider
, where
. Here we used
instead of
to stress the possible local dependence of the direction
on
. If
and
, since
is bounded, we have that
for all
. So,
is a score function under
. Thus,
can be derived from
similarly to (A1). For
, however, a fixed score
does not generally have
-mean zero so that
. To circumvent this problem, set
and apply the previous calculations to the score function
to obtain the desired result. More details can be found in the Supplementary Material. ☐
Supplementary material
Supplementary Material available at Biometrika online includes technical results and additional examples.
References
- Bickel, P. J., Klaassen, C. A., Ritov, Y. A. & Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.
- Buckley, J. & James, I. (1979). Linear regression with censored data. Biometrika 66, 429–36.
- Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269–76.
- Huang, J. (1995). Maximum likelihood estimation for proportional odds regression model with current status data. In Analysis of Censored Data, IMS Lecture Notes - Monograph Series 27, Koul H. L. & Deshpande J. V., eds. Hayward: IMS, pp. 129–46.
- Huang, J. (1996). Efficient estimation for the proportional hazards model with interval censoring. Ann. Statist. 24, 540–68.
- Huang, J. & Wellner, J. A. (1997). Interval censored survival data: a review of recent progress. In Proc. 1st Seattle Symp. Biostatistics: Survival Analysis, Lin D. Y. & Fleming T. R., eds. New York: Springer, pp. 123–69.
- Kosorok, M. R., Lee, B. L. & Fine, J. P. (2004). Robust inference for univariate proportional hazards frailty regression models. Ann. Statist. 32, 1448–91.
- Lawless, J. F., Kalbfleisch, J. D. & Wild, C. J. (1999). Semiparametric methods for response-selective and missing data problems in regression. J. R. Statist. Soc. B 61, 413–38.
- Mao, L. (2019). Proportional hazards regression of survival-sacrifice data with cause-of-death information in animal carcinogenicity studies. Statist. Med. 38, 3628–41.
- Mao, L. & Lin, D. Y. (2017). Efficient estimation of semiparametric transformation models for the cumulative incidence of competing risks. J. R. Statist. Soc. B 79, 573–87.
- Murphy, S. A. (1995). Asymptotic theory for the frailty model. Ann. Statist. 23, 182–98.
- Murphy, S. A., Rossini, A. J. & van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. J. Am. Statist. Assoc. 92, 968–76.
- Murphy, S. A. & van der Vaart, A. W. (2000). On profile likelihood. J. Am. Statist. Assoc. 95, 449–65.
- Parner, E. (1998). Asymptotic theory for the correlated gamma-frailty model. Ann. Statist. 26, 183–214.
- Rudin, W. (1973). Functional Analysis. New York: McGraw-Hill.
- Sun, J. (2007). The Statistical Analysis of Interval-Censored Failure Time Data. New York: Springer.
- Tsiatis, A. (2006). Semiparametric Theory and Missing Data. New York: Springer.
- van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press.
- van der Vaart, A. (2002). Semiparametric statistics. In Lectures on Probability Theory and Statistics, Bernard P., ed. New York: Springer, pp. 331–457.
- Yin, G. & Zeng, D. (2006). Efficient algorithm for computing maximum likelihood estimates in linear transformation models. J. Comp. Graph. Statist. 15, 228–45.
- Zeng, D. & Lin, D. Y. (2006). Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93, 627–40.
- Zeng, D., Mao, L. & Lin, D. Y. (2016). Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103, 253–71.