Biometrika. 2020 Jun 14;107(4):983–995. doi: 10.1093/biomet/asaa037

A unified approach to the calculation of information operators in semiparametric models

Lu Mao
PMCID: PMC7745773  NIHMSID: NIHMS1632140  PMID: 33343007

Summary

The infinite-dimensional information operator for the nuisance parameter plays a key role in semiparametric inference, as it is closely related to the regular estimability of the target parameter. Calculation of information operators has traditionally proceeded in a case-by-case manner and has often entailed lengthy derivations with complicated arguments. We develop a unified framework for this task by exploiting commonality in the form of semiparametric likelihoods. The general formula developed allows one to derive information operators with simple calculus and, if necessary at all, a minimal amount of probabilistic evaluation. This streamlined approach shows its simplicity and versatility in application to a number of existing models as well as a new model of practical interest.

Keywords: Efficient score, Infinite-dimensional nuisance parameter, Missing data, Survival analysis, Tangent space

1. Introduction

Consider a smooth parametric model with density Inline graphic, where Inline graphic is the parameter of interest and Inline graphic is an unknown nuisance parameter. Suppose that the information matrix for Inline graphic can be written in the partitioned form

graphic file with name M5.gif (1)

where Inline graphic can be regarded as the information for Inline graphic under a known nuisance parameter Inline graphic. When Inline graphic is unknown, the efficient information for Inline graphic is

graphic file with name M11.gif (2)

The second term in (2) represents the information lost due to the unknown Inline graphic. If Inline graphic is positive definite, then a regular and asymptotically linear estimator for Inline graphic exists (Bickel et al., 1993, Ch. 2). Among such estimators, the maximum likelihood estimator is usually the most efficient, with an asymptotic variance matrix that attains the Cramér–Rao bound, i.e., Inline graphic.
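As a small numerical illustration of (2), the following sketch computes the efficient information as the Schur complement of the nuisance block in a partitioned information matrix, and checks that its inverse coincides with the corresponding block of the inverse of the full matrix, i.e., the Cramér–Rao bound. The block names `I_tt`, `I_tg` and `I_gg`, the function name and the numerical values are assumptions made for illustration only.

```python
import numpy as np

def efficient_information(info, p):
    """Schur complement of the nuisance block of a partitioned information matrix.

    `info` is the full (p + q) x (p + q) information matrix; the first p
    coordinates are assumed to belong to the parameter of interest.
    """
    I_tt = info[:p, :p]   # information for the target under a known nuisance
    I_tg = info[:p, p:]   # cross-information between target and nuisance
    I_gg = info[p:, p:]   # information for the nuisance parameter
    # Target information minus the information lost to the unknown nuisance.
    return I_tt - I_tg @ np.linalg.solve(I_gg, I_tg.T)

# An arbitrary symmetric positive definite matrix standing in for the information.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
info = A @ A.T + 5.0 * np.eye(5)

p = 2
I_eff = efficient_information(info, p)
cr_bound = np.linalg.inv(info)[:p, :p]   # target block of the full inverse
loss = info[:p, :p] - I_eff              # information lost to the nuisance

print(np.allclose(np.linalg.inv(I_eff), cr_bound))
```

The information loss `loss` is positive semidefinite, so the efficient information never exceeds the information available under a known nuisance parameter.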

Now, consider a semiparametric version of the above model where the Euclidean nuisance parameter Inline graphic is replaced by an infinite-dimensional parameter Inline graphic. Specifically, denote the class of probability measures by

graphic file with name M18.gif (3)

where Inline graphic is a nonparametric space of probability measures or positive finite measures. Use Inline graphic to denote the density of Inline graphic with respect to some dominating measure. Let Inline graphic denote the score function for Inline graphic and Inline graphic the score operator for Inline graphic, where Inline graphic is the original tangent space for Inline graphic (see, e.g., Bickel et al., 1993) and Inline graphic denotes the space of all Inline graphic-square-integrable functions for any measure Inline graphic. Because there is no parametric constraint on Inline graphic, if it is a probability measure, then Inline graphic, the space of all Inline graphic-mean zero square-integrable functions; if it is a positive finite measure, then Inline graphic. In practice one can usually, without loss of generality, work with a smaller set than Inline graphic, e.g., the subset of all bounded functions with bounded variation (see, e.g., van der Vaart, 1998, Ch. 25). In such cases, the score functions for Inline graphic can typically be generated by taking Inline graphic with Inline graphic.

Let Inline graphic denote the information matrix for Inline graphic under a known Inline graphic, where Inline graphic for any vector Inline graphic. Let Inline graphic denote the adjoint of Inline graphic. The information operator for Inline graphic can be expressed in a form analogous to (1):

graphic file with name M47.gif (4)

which acts upon Inline graphic. Here and hereafter, operations on a vector with components in a Hilbert space are understood to act componentwise. In parallel with (2), the efficient information for Inline graphic in the presence of unknown Inline graphic is

graphic file with name M51.gif (5)

The efficient information Inline graphic is the variance matrix of the efficient score function Inline graphic, i.e., the Inline graphic-projection of Inline graphic onto the orthogonal complement of the nuisance tangent space Inline graphic.

To illustrate with a concrete example, consider the Cox model with right-censored data (Cox, 1975), the reputed archetype of semiparametric models. Let Inline graphic denote the event time of interest, and Inline graphic a vector of covariates. The Cox proportional hazards model specifies that

graphic file with name M59.gif (6)

where Inline graphic is the conditional cumulative hazard function of Inline graphic given Inline graphic, Inline graphic is the regression parameter, Inline graphic is the baseline cumulative hazard function, and Inline graphic is the maximum length of follow-up. Here, Inline graphic is the parameter of interest and Inline graphic is the nuisance parameter. Let Inline graphic denote the censoring time which satisfies Inline graphic. Then, the observed data consist of Inline graphic, where Inline graphic is the indicator function and Inline graphic. With Inline graphic, the elements on the right-hand side of (4) can be derived explicitly as follows (see, e.g., § 25.12.1 of van der Vaart, 1998): Inline graphic, Inline graphic, Inline graphic, and Inline graphic, where Inline graphic, Inline graphic, and Inline graphic. Using these identities, the efficient information Inline graphic can be calculated explicitly under (5).

As in the parametric model, positive definiteness of Inline graphic is necessary for the regular estimability of Inline graphic, and is usually the key condition governing the asymptotic efficiency of the maximum likelihood estimator (see, e.g., van der Vaart, 1998, Ch. 25). Unfortunately, direct calculation of Inline graphic is rarely feasible beyond the archetypal example of the Cox model. This is because the semiparametric information operator Inline graphic for the nuisance parameter, i.e., the analogue of Inline graphic from the parametric case, is a map between infinite-dimensional spaces. Consequently, its properties such as invertibility are often more elusive than those of information matrices in parametric models. Even if the operator is invertible, an analytic expression for its inverse is generally unavailable, making it impossible to evaluate Inline graphic through (5).

Due to these challenges, proof of regular estimability for the Euclidean parameter in a semiparametric model often involves lengthy and complicated arguments on a case-by-case basis. A common pivotal component of such arguments, however, is the derivation of the information operator Inline graphic and its mathematical properties. There are two broad-based approaches to deriving Inline graphic in the existing literature. The first is a conditional expectation approach suitable for the so-called information loss models (§ 25.5.2 of van der Vaart, 1998), which include many models for missing/censored data. The approach assumes that the observed data Inline graphic is a randomly coarsened version of the full data Inline graphic, the latter following a nonparametric distribution Inline graphic. It then follows that Inline graphic with Inline graphic, and that Inline graphic with Inline graphic. This approach is not completely general as it is restricted to a particular type of data-generating mechanism. In addition, the evaluation of the conditional expectations can be mathematically formidable unless the model is sufficiently simple. The second approach involves constructing a least favourable submodel such that its score function coincides with the efficient score (see, e.g., Huang & Wellner, 1997; Murphy et al., 1997; Zeng & Lin, 2006). This approach is generally applicable, but draws heavily on functional analytic arguments regarding Hilbert-space operators. Unfortunately, to the average statistician, the techniques needed to carry out the calculation in both approaches are mostly nonstandard and unfamiliar.

In this paper we propose a simple and general approach to the calculation of information operators in semiparametric models. Theoretical results are derived based on a common form of semiparametric likelihoods. Consequently, when studying a particular model, the user need only plug in the specific expressions from the model to replace the generic terms in the general likelihood. This allows one to bypass complicated functional analytic and probabilistic arguments characteristic of individual case-by-case treatments. It also offers new insights into results obtained earlier in the literature on a seemingly ad hoc basis.

2. Theory and methods

2.1. Two types of semiparametric problems

As mentioned in § 1, the information operator Inline graphic is the semiparametric counterpart of Inline graphic in (1). Unlike in the parametric case, however, Inline graphic is sometimes noninvertible so that the formula (5) for the efficient information does not apply. In such cases, the infinite-dimensional parameter Inline graphic is not Inline graphic-estimable (see, e.g., Huang & Wellner, 1997). Depending on whether Inline graphic is continuously invertible, i.e., its inverse operator exists and is continuous, most of the semiparametric models in the literature can be classified into two categories.

In Category 1, Inline graphic can be expressed as the sum of a continuously invertible operator Inline graphic, often in the form of a multiplication operator, and a compact operator Inline graphic, often in the form of an integral operator. A compact operator maps the unit ball in Inline graphic into a totally bounded set. Because it is generally impossible to find an analytic expression for Inline graphic, the positive definiteness of Inline graphic is usually assessed, not by (5), but by indirect means. The idea is best illustrated using the parametric example. If the efficient information for Inline graphic in the parametric model is invertible, by rules of matrix inverse, one has that

graphic file with name M110.gif (7)

where Inline graphic. The matrix Inline graphic is the efficient information for Inline graphic in the presence of an unknown Inline graphic. By (7), provided that Inline graphic is invertible, the invertibility of Inline graphic is equivalent to that of Inline graphic. Similarly, in the semiparametric case define Inline graphic by Inline graphic, where Inline graphic is a compact operator. As the semiparametric analogue of Inline graphic, the operator Inline graphic is the efficient information operator for Inline graphic in the presence of an unknown Inline graphic. Also similarly to the parametric case, under a nonsingular Inline graphic, the efficient information Inline graphic is nonsingular if Inline graphic is continuously invertible, which is a somewhat stronger condition than Inline graphic being continuously invertible. Intuitively, continuous invertibility of Inline graphic means not only that Inline graphic is regularly estimable, but also that Inline graphic and Inline graphic are not locally confounded. Because Inline graphic is the sum of a continuously invertible operator Inline graphic and a compact operator Inline graphic, by the Fredholm theory (Rudin, 1973), it is continuously invertible if it is one-to-one. The latter condition can usually be proved through local identifiability arguments. If the estimator is obtained by maximum likelihood, its asymptotic properties are best handled by the likelihood equations approach (see § 25.12 of van der Vaart, 1998). Examples of Category 1 problems include Murphy (1995), Murphy et al. (1997), Parner (1998), Kosorok et al. (2004), Zeng & Lin (2006) and Mao & Lin (2017), among others.
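The parametric argument can be checked numerically. The sketch below assumes that (7) takes the standard block-inverse form, in which the inverse of the efficient information for the parameter of interest is expressed through the inverse of the efficient information for the nuisance parameter; the variable names and the test matrix are hypothetical. It also verifies the determinant factorization that underlies the stated equivalence of the two invertibility conditions.

```python
import numpy as np

# An arbitrary symmetric positive definite matrix standing in for the information.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
info = A @ A.T + 5.0 * np.eye(5)

p = 2  # assumed dimension of the parameter of interest
I_tt, I_tg = info[:p, :p], info[:p, p:]
I_gt, I_gg = info[p:, :p], info[p:, p:]

# Efficient information for the target (nuisance unknown) and for the
# nuisance (target unknown), each a Schur complement of the other block.
eff_t = I_tt - I_tg @ np.linalg.inv(I_gg) @ I_gt
eff_g = I_gg - I_gt @ np.linalg.inv(I_tt) @ I_tg

# Block-inverse identity: inverse of eff_t written in terms of inverse of eff_g.
I_tt_inv = np.linalg.inv(I_tt)
lhs = np.linalg.inv(eff_t)
rhs = I_tt_inv + I_tt_inv @ I_tg @ np.linalg.inv(eff_g) @ I_gt @ I_tt_inv

# Two factorizations of the determinant of the full information matrix.
det_full = np.linalg.det(info)
det_via_g = np.linalg.det(I_tt) * np.linalg.det(eff_g)
det_via_t = np.linalg.det(I_gg) * np.linalg.det(eff_t)

print(np.allclose(lhs, rhs), np.isclose(det_full, det_via_g))
```

Since both factorizations equal the same determinant, when the two diagonal blocks are nonsingular either Schur complement is singular exactly when the other is, which mirrors the argument made for the operator case.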

In Category 2, Inline graphic is itself a compact integral operator and is therefore not continuously invertible. The nuisance parameter Inline graphic in such cases is generally estimable only at rates slower than Inline graphic, but the same need not be true for Inline graphic. To check whether Inline graphic is regularly estimable, one seeks to derive, or at least show the existence of, a least favourable direction Inline graphic satisfying the normal equation

graphic file with name M142.gif (8)

Then, the efficient score for Inline graphic, defined as the projection of Inline graphic onto the orthogonal complement of Inline graphic, is Inline graphic. This is because Inline graphic for all Inline graphic, where Inline graphic denotes the inner product in Inline graphic. The nonsingularity of Inline graphic may be proved through local identifiability arguments. If the estimator is obtained by maximum likelihood, its asymptotic properties are best handled by the approximately least-favourable submodels approach (see § 25.11 of van der Vaart, 1998). Examples of Category 2 problems include Huang (1995, 1996), Huang & Wellner (1997) and Zeng et al. (2016), among others.

For both categories, one needs to derive the specific forms of the information operators Inline graphic and Inline graphic, and check the corresponding requirements such as continuous invertibility or existence of a root. Such analyses usually constitute the main steps in deriving the asymptotic properties of the maximum likelihood estimators, and should not be taken lightly since semiparametric likelihoods may sometimes be ill-behaved (van der Vaart, 2002, § 5.2).
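The contrast between the two categories can be seen in a crude discretization. The sketch below is a toy illustration under assumed ingredients: the kernel min(s, t) on the unit interval, the multiplier 0.5 + t and the grid sizes are arbitrary choices, not taken from any model in this paper. A multiplication operator plus a compact integral operator keeps its smallest eigenvalue away from zero as the grid is refined, whereas the integral operator alone has eigenvalues accumulating at zero, so that its inverse cannot be continuous.

```python
import numpy as np

def smallest_eigenvalues(n):
    """Smallest eigenvalues of two discretized operators on an n-point grid."""
    t = np.arange(1, n + 1) / n               # grid on (0, 1]
    h = 1.0 / n                               # quadrature weight
    K = np.minimum.outer(t, t) * h            # integral operator with kernel min(s, t)
    a = 0.5 + t                               # multiplier bounded away from zero
    category1 = np.diag(a) + K                # multiplication plus compact operator
    category2 = K                             # compact operator alone
    # eigvalsh returns eigenvalues in ascending order for symmetric matrices.
    return np.linalg.eigvalsh(category1)[0], np.linalg.eigvalsh(category2)[0]

cat1_coarse, cat2_coarse = smallest_eigenvalues(50)
cat1_fine, cat2_fine = smallest_eigenvalues(400)

print(cat1_coarse, cat1_fine)   # both stay above the lower bound of the multiplier
print(cat2_coarse, cat2_fine)   # shrinks towards zero as the grid is refined
```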

2.2. A general formula for the information operators

Our approach to establishing a general formula for the information operators consists of taking second derivatives of a general form of semiparametric likelihoods. In the parametric setting, it is well known that the information matrix can be equivalently expressed as the expectation of the negative Hessian of the loglikelihood. This equivalence can be exploited in the semiparametric case to simplify calculation as well. The following lemma lays the foundation for the subsequent derivation of information operators. Throughout, we assume that model (3) is sufficiently smooth to justify pointwise differentiation as a means of score generation and interchange of expectation and differentiation whenever appropriate. For a more general set-up for smooth models based on differentiability in the quadratic mean, see Bickel et al. (1993).
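The parametric equivalence invoked here, that the variance of the score equals the expected negative second derivative of the loglikelihood, can be verified exactly in a one-parameter toy model. The Bernoulli model and all function names below are chosen purely for illustration.

```python
import math

def score(x, p):
    """First derivative in p of the Bernoulli loglikelihood x*log(p) + (1-x)*log(1-p)."""
    return x / p - (1 - x) / (1 - p)

def neg_second_derivative(x, p):
    """Negative second derivative in p of the same loglikelihood."""
    return x / p**2 + (1 - x) / (1 - p)**2

def expect(f, p):
    """Exact expectation of f(X, p) when X is Bernoulli with success probability p."""
    return (1 - p) * f(0, p) + p * f(1, p)

p = 0.3
info_from_score = expect(lambda x, q: score(x, q) ** 2, p)
info_from_curvature = expect(neg_second_derivative, p)
closed_form = 1.0 / (p * (1 - p))   # textbook Fisher information for Bernoulli

print(info_from_score, info_from_curvature, closed_form)
```

Both routes give the same number; the approach of this section relies on the curvature route because second derivatives of a loglikelihood are often easier to compute than second moments of scores.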

Lemma 1.

Let Inline graphic be a score function for model (3). Write Inline graphic, Inline graphic. Then  

Lemma 1.

The following theorem presents the formulas for the score functions, score operators and information operators based on a general form of semiparametric likelihoods. The proof involves straightforward application of Lemma 1 with Inline graphic or Inline graphic. Unless otherwise specified, we use Inline graphic and Inline graphic to denote the first and second derivatives of a generic smooth function Inline graphic.

Theorem 1.

Suppose that the loglikelihood for model (3) takes the form  

Theorem 1. (9)

 where Inline graphic, Inline graphic, and Inline graphic are real-valued data-dependent functions and Inline graphic is a data-dependent linear functional on the closed linear span of the space for Inline graphic, the log-density of Inline graphic with respect to a certain dominating measure. Write Inline graphic, Inline graphic and Inline graphic, where Inline graphic. Then, if Inline graphic, we have that  

Theorem 1.

 where  

Theorem 1. (10)

If Inline graphic, the results are the same except that Inline graphic and Inline graphic are centred to have Inline graphic-mean zero, and Inline graphic.

Remark 1.

For notational simplicity, we have assumed that the function Inline graphic in Theorem 1 is real-valued. It is straightforward to extend the results to vector-valued Inline graphic. Furthermore, instead of a single nuisance parameter Inline graphic, one can extend the framework to accommodate multiple nuisance parameters Inline graphic. In such cases, the original tangent space for Inline graphic will be Inline graphic, where Inline graphic is the original tangent space for Inline graphic  Inline graphic. These extensions are considered and illustrated in the Supplementary Material.

As mentioned in § 2.1, the efficient score for Inline graphic is the score function minus its projection onto the tangent space for Inline graphic, i.e., Inline graphic, where Inline graphic solves the normal equation (8). Under the conditions of Theorem 1, both sides of (8) have explicit expressions. The problem is then treated as either Category 1 or Category 2 depending on whether Inline graphic is continuously invertible.

2.3. Positive definiteness of the efficient information

Under the conditions of Theorem 1, the information operator Inline graphic can be written as the sum of a multiplication operator with multiplier Inline graphic and a compact Hilbert–Schmidt integral operator with kernel Inline graphic, insofar as Inline graphic is square-integrable by Inline graphic, which we shall always assume to be true. The multiplication operator is continuously invertible if Inline graphic is bounded above and away from zero. If so, the model is of Category 1. Likewise, if Inline graphic, it is of Category 2. The local identifiability condition needed for both categories to ensure nonsingularity of Inline graphic can be stated formally as follows.

Condition 1 (Local identifiability).

If

Condition 1 (Local identifiability). (11)

Inline graphic-almost surely for some Inline graphic and Inline graphic, then Inline graphic and Inline graphic.

Since the left-hand side of (11) is a score function in the general form Inline graphic, Condition 1 simply says that the joint score operator is one-to-one so that local alternatives to Inline graphic in all possible directions can be identified. In particular, it implies that Inline graphic is positive definite. To use it to show that Inline graphic is one-to-one for Category 1 problems, take Inline graphic and Inline graphic. Then, one can show that Inline graphic implies Inline graphic, which in turn implies Inline graphic. For Category 2 problems, provided that Inline graphic as a solution to (8) exists, one may take Inline graphic to show that Inline graphic is positive definite.

Corollary 1.

Suppose that Condition 1 is satisfied. Then, Inline graphic is positive definite if either of the following is true:  

  • (i) Category Inline graphic: There exists Inline graphic such that Inline graphic.

  • (ii) Category Inline graphic: Inline graphic and the solution Inline graphic to the following integral equation exists:  
    graphic file with name M229.gif (12)

In the second case, solving for Inline graphic usually starts with differentiating both sides of (12). For example, Huang & Wellner (1997) took this route to show that the solution exists for the Cox model with case-2 interval-censored data. In general, this approach requires that Inline graphic be a smooth function and lie in the range of Inline graphic.

The following two propositions can usually simplify calculations of the quantities in (10) for Category 2 problems. The first one is fairly intuitive: if the density of Inline graphic does not appear in the likelihood, then information on some aspects thereof cannot be recovered to the first order and thus Inline graphic will not be continuously invertible.

Proposition 1.

Under the conditions of Theorem 1, if Inline graphic then Inline graphic  Inline graphic-almost everywhere.

Proof.

With Inline graphic, use Inline graphic to find that Inline graphic for all Inline graphic. If Inline graphic, this means that Inline graphic; if Inline graphic, this means that Inline graphic is a constant, which also leads to the desired result by the form of Inline graphic in this case. ☐

For Category 2 problems, derivation of the normal equation (12) can be further simplified if Inline graphic is a conditional density in a certain form.

Proposition 2.

Suppose that the density for model (3) can be written in the form  

Proposition 2.

 where Inline graphic is the conditional density of Inline graphic given Inline graphic and Inline graphic is the marginal density of Inline graphic. If the loglikelihood Inline graphic can be written in the form of (9) with Inline graphic, Inline graphic, Inline graphic and Inline graphic for some deterministic functions Inline graphic and Inline graphic, then Inline graphic and Inline graphic. Then, the normal equation (12) becomes  

Proposition 2. (13)

Proof.

In light of Proposition 1, we only need to show that Inline graphic. Because Inline graphic is now a score function for the conditional density of Inline graphic given Inline graphic, we have that Inline graphic. The result follows from the fact that Inline graphic depends on Inline graphic only. ☐

Proposition 2 applies to all standard regression models for interval-censored data, where the examination times are conditionally independent of the event times given covariates (see, e.g., Sun, 2007). Indeed, let Inline graphic be the event time of interest, Inline graphic be a sequence of examination times, Inline graphic be the observed indicators for the affiliation of Inline graphic to the intervals partitioned by Inline graphic, and Inline graphic be the covariates. If Inline graphic and Inline graphic parametrizes only the conditional distribution of Inline graphic given Inline graphic, the conditions of Proposition 2 are satisfied with Inline graphic and Inline graphic. In these models, the nuisance parameter Inline graphic, usually played by a nonparametric baseline function, is estimable only at Inline graphic, whereas the regression parameter Inline graphic is often regularly estimable.

3. Applications

3.1. The Cox model under right and interval censorships

We first consider the simple example of the Cox model for right-censored data described in § 1. The loglikelihood for the observed data is

graphic file with name M286.gif (14)

with Inline graphic. Comparing (14) with (9), one readily recognizes that Inline graphic, Inline graphic, Inline graphic and Inline graphic. The last identity means that Inline graphic operates on Inline graphic by evaluating it at Inline graphic and then multiplying it by Inline graphic. Hence, Inline graphic, Inline graphic, Inline graphic and Inline graphic. By Theorem 1, we have that Inline graphic, Inline graphic, Inline graphic and Inline graphic. If Inline graphic has bounded support and Inline graphic, the multiplier Inline graphic is bounded above and away from zero. It is thus a Category 1 problem, but is special in that the efficient score can be constructed explicitly. Indeed, the normal equation (8) can be solved by Inline graphic. Then, an approximation to the efficient score Inline graphic can be constructed by replacing Inline graphic with an empirical version, leading to the familiar partial likelihood score function for Inline graphic (Cox, 1975). Furthermore, under linear independence of Inline graphic, it is easy to show that Condition 1 is satisfied so that the efficient information is positive definite. Related models for recurrent events and competing risks are considered in the Supplementary Material.
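The partial likelihood score referred to here can be sketched in a few lines. In this score each observed event contributes its covariate minus a risk-set weighted average of covariates, the latter being the empirical version of the least favourable direction. The code below is a minimal sketch for a single scalar covariate without tied event times; the function names, the simulation design and the evaluation point are assumptions for illustration. It checks the score against a numerical derivative of the log partial likelihood.

```python
import numpy as np

def partial_loglik(beta, time, status, z):
    """Log partial likelihood for a scalar covariate, assuming no tied times."""
    eta = beta * z
    # at_risk[i, j] is True when subject j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    s0 = (at_risk * np.exp(eta)[None, :]).sum(axis=1)
    return np.sum(status * (eta - np.log(s0)))

def partial_score(beta, time, status, z):
    """Derivative of the log partial likelihood with respect to beta."""
    w = np.exp(beta * z)
    at_risk = time[None, :] >= time[:, None]
    s0 = (at_risk * w[None, :]).sum(axis=1)
    s1 = (at_risk * (w * z)[None, :]).sum(axis=1)
    # Each event contributes its covariate minus the risk-set weighted mean.
    return np.sum(status * (z - s1 / s0))

# Simulated right-censored data under a proportional hazards model.
rng = np.random.default_rng(2)
n = 200
z = rng.binomial(1, 0.5, size=n).astype(float)
event_time = rng.exponential(scale=np.exp(-0.5 * z))   # hazard exp(0.5 * z)
censor_time = rng.exponential(scale=2.0, size=n)
time = np.minimum(event_time, censor_time)
status = (event_time <= censor_time).astype(float)

beta0, eps = 0.3, 1e-5
numeric = (partial_loglik(beta0 + eps, time, status, z)
           - partial_loglik(beta0 - eps, time, status, z)) / (2 * eps)
analytic = partial_score(beta0, time, status, z)
print(numeric, analytic)
```

Setting `partial_score` to zero and solving for `beta` would give the maximum partial likelihood estimator; any scalar root finder could be used for that step.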

The Cox model under case-1 interval censoring, studied in detail by Huang (1996), offers an example of a Category 2 problem. The conditional hazard of Inline graphic given Inline graphic is specified by the same model (6), but the observed data now consist of Inline graphic, where Inline graphic is the examination time satisfying Inline graphic. Clearly, the likelihood for the observed data satisfies the conditions of Proposition 2 with Inline graphic and Inline graphic. The loglikelihood is

graphic file with name M319.gif

So, we may set Inline graphic and Inline graphic, so that Inline graphic, Inline graphic, and Inline graphic. By Proposition 2, the normal equation is in the form of (13), which, by straightforward iterated conditional expectation, can be simplified to

graphic file with name M325.gif (15)

where Inline graphic and Inline graphic are redefined by Inline graphic respectively, and Inline graphic

Assuming that the support of Inline graphic contains Inline graphic, differentiate both sides of (15) to find that

graphic file with name M332.gif (16)

where Inline graphic. So, Inline graphic. Using (16), one easily obtains the efficient score

graphic file with name M335.gif

Remark 2.

Huang (1996) and van der Vaart (1998, 2002) derived the same result for the efficient score by orthogonal projections. However, their approach seems to be model specific and, particularly in the construction of an approximately least favourable submodel, to require a fair amount of ingenuity. On the other hand, the approach outlined here involves only routine calculus and a minimal amount of probabilistic evaluation that yields (15).

As in other interval-censoring problems (Huang & Wellner, 1997), the maximum likelihood estimator for the infinite-dimensional baseline function Inline graphic converges at Inline graphic. However, provided that the efficient information for Inline graphic is positive definite, this nonstandard rate does not interfere with the asymptotic normality and efficiency of the maximum likelihood estimator for Inline graphic. In fact, it appears that, in general, the nuisance parameter estimator need only converge at a rate faster than Inline graphic (see, e.g., § 6 of Huang, 1996).

3.2. Regression models with missing covariates

Suppose that the conditional density of outcome Inline graphic with respect to a dominating measure Inline graphic given regressor Inline graphic is specified through a parametric model Inline graphic, where Inline graphic. Estimation of Inline graphic would be standard if Inline graphic is fully observed, because the likelihood factorizes into two parts containing Inline graphic and Inline graphic, respectively. In the case of missing data in the regressor, however, the nonparametric component Inline graphic will get entangled with the regression parameter and this will complicate inference. Problems of this type have been studied in general settings by Lawless et al. (1999). Here we consider a simple case with a single level of missingness in Inline graphic. Using the notation of Tsiatis (2006), we denote the coarsened regressor by Inline graphic, where Inline graphic is a known many-to-one function. Let Inline graphic if the full data Inline graphic are observed and Inline graphic if only the coarsened version Inline graphic is available. We assume that the data are coarsened at random, that is, Inline graphic, where Inline graphic is some arbitrary function for the selection probability. Assuming that Inline graphic involves no aspect of Inline graphic, the loglikelihood for the observed data Inline graphic is

graphic file with name M363.gif

Thus, we may set Inline graphic, Inline graphic, Inline graphic and Inline graphic. Then, we have that Inline graphic, Inline graphic and Inline graphic, where Inline graphic is the full-data score function for Inline graphic. Using straightforward calculus, it is not hard to obtain that

graphic file with name M373.gif

where Inline graphic. Some details on the derivation of Inline graphic are given in the Supplementary Material.

Proposition 3.

Suppose that the following two conditions hold:  

  • (a) Inline graphic for some Inline graphic;

  • (b) Inline graphic is positive definite almost surely.

Then, the efficient information Inline graphic is positive definite.

Proof.

The multiplier Inline graphic is clearly bounded above, and is bounded away from zero by (a). One can easily use (a) and (b) to verify Condition 1. The result follows by Corollary 1. ☐

Remark 3.

The information operator Inline graphic may be derived using the traditional information-loss approach. One first evaluates Inline graphic and then derives Inline graphic. To arrive at a simple expression such as obtained here, substantial work is needed to evaluate the conditional expectations through repeated use of the Bayes rule and interchange of integrals.

3.3. A novel semiparametric survival-sacrifice model

Finally, we apply our methods to a novel semiparametric model for so-called survival-sacrifice data, which are routinely collected in animal carcinogenicity experiments. In such experiments, animals are randomized into different treatment arms to receive different doses of a carcinogen. The goal is to compare the incidence of tumor between arms over the course of follow-up. Let Inline graphic denote time to tumor formation and let Inline graphic denote time to tumor-caused death, so we have that Inline graphic almost surely. Because the tumor under study is usually impalpable, necropsy is needed to determine whether a tumor is present. Such an examination is performed either at death time Inline graphic or at a random censoring time Inline graphic that is unrelated to the tumor. Hence, the observed data consist of Inline graphic. In scientific terminology, the variables Inline graphic and Inline graphic indicate fatal and incidental tumors, respectively. Write Inline graphic and Inline graphic. Assuming Inline graphic, we can write the likelihood of the observed data Inline graphic as a multiple of

graphic file with name M396.gif (17)

Nonparametric estimation of Inline graphic has been well studied in the literature. Recently, Mao (2019) proposed ad hoc numerical procedures for the regression of Inline graphic and Inline graphic against a set of covariates Inline graphic, but without a formal study of the semiparametric theory. Here we consider the theoretical aspects of a simple regression model for the conditional cumulative hazards of Inline graphic and Inline graphic given Inline graphic, denoted by Inline graphic and Inline graphic, respectively. Let

graphic file with name M406.gif (18)

The baseline hazard functions of the two marginal models differ only by a multiplicative factor Inline graphic, with Inline graphic to account for the ordering Inline graphic. This parsimonious model allows for joint assessment of covariate effects on both fatal and incidental tumors through Inline graphic.

Write Inline graphic, Inline graphic and Inline graphic. Assume that Inline graphic. By replacing the cumulative distribution functions in (17) with their conditional counterparts specified in model (18), we obtain the loglikelihood

graphic file with name M415.gif

where Inline graphic. Now, it is easy to recognize that Inline graphic, Inline graphic, Inline graphic, and Inline graphic. So, in this case we have a vector-valued Inline graphic function as discussed in Remark 1. By straightforward calculus, we have that Inline graphic, Inline graphic, and Inline graphic, where Inline graphic and

graphic file with name M426.gif

Based on these quantities, we show in the Supplementary Material that, under suitable regularity conditions, Condition 1 for local identifiability is satisfied. Furthermore, using the fact that Inline graphic, one can easily show that Inline graphic. If Inline graphic has bounded support and Inline graphic, we have that Inline graphic is bounded above and away from zero. Thus, by Corollary 1, the efficient information for Inline graphic is positive definite under these conditions.

4. Remarks

We have focused on semiparametric models indexed jointly by a Euclidean parameter of interest and an infinite-dimensional nuisance parameter. The proposed approach, however, can easily be adapted for a nonparametric model Inline graphic, where the interest centres on certain aspects of the infinite-dimensional parameter Inline graphic; see the Supplementary Material for details.

The specified form of loglikelihood (9) appears general enough to encompass a surprisingly large pool of existing semiparametric models. It is thus reasonable to expect our framework to be applicable and useful to many new models to come. Straightforward extensions of Theorem 1 exist to further widen its applicability. For example, the function Inline graphic can be made dependent on Inline graphic, which will accommodate the loglikelihoods for some frailty models in survival analysis (see, e.g., Kosorok et al., 2004). Furthermore, the conventional derivative of Inline graphic with respect to Inline graphic can be replaced by a generalized derivative, e.g., one such that Inline graphic. This generalization is useful when applied to the accelerated failure time model (Buckley & James, 1979), where Inline graphic may be in the form of Inline graphic.

Our framework is most useful when the likelihood is explicitly indexed by a Euclidean parameter and an infinite-dimensional parameter. Other semiparametric models are more naturally formulated through moment or conditional moment constraints (see, e.g., Bickel et al., 1993, § 6.2); for such models it is usually easier to derive information operators via direct projection methods.

We have been concerned exclusively with the theoretical conditions that imply positive definiteness of the efficient information. Even when these conditions are satisfied, however, numerical implementation of maximum likelihood estimation may still be challenging owing to the infinite-dimensional nuisance parameter. A finite-dimensional approximation may produce bias if it is too coarse and may render computation infeasible if it is too fine. The profile likelihood approach (Murphy & van der Vaart, 2000) circumvents the infinite dimensionality by profiling out the nuisance parameter in the likelihood, and is thus a numerically reliable procedure for fitting semiparametric models (see, e.g., Yin & Zeng, 2006; Zeng & Lin, 2006; Mao & Lin, 2017).
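To make the idea of profiling out the nuisance parameter concrete, the following is a minimal sketch under stated assumptions, not a procedure taken from this paper. It assumes a Cox proportional hazards model with a scalar covariate, right-censored data and no tied event times, and treats the jumps of the cumulative baseline hazard at the observed event times as the nuisance parameter. For fixed regression coefficient, the maximizing jumps have the closed Breslow form, and plugging them back in gives a profile loglikelihood that equals the partial loglikelihood (Cox, 1975) minus the number of events. The function names, the toy data and the grid search are all hypothetical choices made for illustration.

```python
import numpy as np

def full_loglik(beta, lam, t, delta, x):
    """Nonparametric loglikelihood of a Cox model with a scalar covariate.

    lam[i] is the jump of the cumulative baseline hazard at t[i]; only jumps
    at observed event times (delta[i] == 1) enter the likelihood.
    """
    jumps = np.where(delta == 1, lam, 0.0)
    # Cumulative baseline hazard at each t[i]: sum of jumps at times <= t[i].
    cumhaz = np.array([jumps[t <= ti].sum() for ti in t])
    # Use log(1) = 0 for censored subjects so they add nothing to the event term.
    log_jumps = np.log(np.where(delta == 1, lam, 1.0))
    return np.sum(delta * (log_jumps + beta * x)) - np.sum(np.exp(beta * x) * cumhaz)

def breslow_jumps(beta, t, delta, x):
    """Maximizer of full_loglik over lam for fixed beta (assumes no ties)."""
    risk = np.exp(beta * x)
    # Sum of exp(beta * x) over subjects still at risk at each t[i].
    risk_set_sums = np.array([risk[t >= ti].sum() for ti in t])
    return delta / risk_set_sums

def profile_loglik(beta, t, delta, x):
    """Loglikelihood with the baseline hazard jumps profiled out."""
    return full_loglik(beta, breslow_jumps(beta, t, delta, x), t, delta, x)

# Hypothetical toy data: observed times, event indicators, binary covariate.
t = np.array([0.7, 1.2, 2.0, 3.5, 4.1, 5.3])
delta = np.array([1, 1, 1, 0, 1, 0])
x = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])

# Crude grid search over beta; the profile loglikelihood is concave here.
betas = np.linspace(-3.0, 3.0, 601)
values = np.array([profile_loglik(b, t, delta, x) for b in betas])
beta_hat = betas[np.argmax(values)]
print(beta_hat)
```

The optimization is over a scalar only, whatever the sample size, which is the sense in which profiling sidesteps the infinite-dimensional parameter. In models where the inner maximization has no closed form, it would have to be carried out numerically for each candidate value of the Euclidean parameter.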

Acknowledgement

This research was supported by the U.S. National Institutes of Health. I appreciate the helpful comments by the editor, associate editor and two referees.

Appendix

Proof of Theorem 1.

Calculations for Inline graphic and Inline graphic are straightforward. We thus focus on exhibiting the forms of Inline graphic and Inline graphic. By Lemma 1, if Inline graphic, where Inline graphic is a bounded function with bounded total variation in Inline graphic, then

Proof of Theorem 1. (A1)

The results for Inline graphic then follow by equating it to Inline graphic if Inline graphic and to Inline graphic if Inline graphic.

Now, consider Inline graphic, where Inline graphic. Here we used Inline graphic instead of Inline graphic to stress the possible local dependence of the direction Inline graphic on Inline graphic. If Inline graphic and Inline graphic, since Inline graphic is bounded, we have that Inline graphic for all Inline graphic. So, Inline graphic is a score function under Inline graphic. Thus, Inline graphic can be derived from Inline graphic similarly to (A1). For Inline graphic, however, a fixed score Inline graphic does not generally have Inline graphic-mean zero so that Inline graphic. To circumvent this problem, set Inline graphic and apply the previous calculations to the score function Inline graphic to obtain the desired result. More details can be found in the Supplementary Material. ☐

Supplementary material

Supplementary Material available at Biometrika online includes technical results and additional examples.

References

  1. Bickel, P. J., Klaassen, C. A., Ritov, Y. A. & Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.
  2. Buckley, J. & James, I. (1979). Linear regression with censored data. Biometrika 66, 429–36.
  3. Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269–76.
  4. Huang, J. (1995). Maximum likelihood estimation for proportional odds regression model with current status data. In Analysis of Censored Data, IMS Lecture Notes - Monograph Series 27, Koul H. L. & Deshpande J. V., eds. Hayward: IMS, pp. 129–46.
  5. Huang, J. (1996). Efficient estimation for the proportional hazards model with interval censoring. Ann. Statist. 24, 540–68.
  6. Huang, J. & Wellner, J. A. (1997). Interval censored survival data: a review of recent progress. In Proc. 1st Seattle Symp. Biostatistics: Survival Analysis, Lin D. Y. & Fleming T. R., eds. New York: Springer, pp. 123–69.
  7. Kosorok, M. R., Lee, B. L. & Fine, J. P. (2004). Robust inference for univariate proportional hazards frailty regression models. Ann. Statist. 32, 1448–91.
  8. Lawless, J. F., Kalbfleisch, J. D. & Wild, C. J. (1999). Semiparametric methods for response-selective and missing data problems in regression. J. R. Statist. Soc. B 61, 413–38.
  9. Mao, L. (2019). Proportional hazards regression of survival-sacrifice data with cause-of-death information in animal carcinogenicity studies. Statist. Med. 38, 3628–41.
  10. Mao, L. & Lin, D. Y. (2017). Efficient estimation of semiparametric transformation models for the cumulative incidence of competing risks. J. R. Statist. Soc. B 79, 573–87.
  11. Murphy, S. A. (1995). Asymptotic theory for the frailty model. Ann. Statist. 23, 182–98.
  12. Murphy, S. A., Rossini, A. J. & van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. J. Am. Statist. Assoc. 92, 968–76.
  13. Murphy, S. A. & van der Vaart, A. W. (2000). On profile likelihood. J. Am. Statist. Assoc. 95, 449–65.
  14. Parner, E. (1998). Asymptotic theory for the correlated gamma-frailty model. Ann. Statist. 26, 183–214.
  15. Rudin, W. (1973). Functional Analysis. New York: McGraw-Hill.
  16. Sun, J. (2007). The Statistical Analysis of Interval-Censored Failure Time Data. New York: Springer.
  17. Tsiatis, A. (2006). Semiparametric Theory and Missing Data. New York: Springer.
  18. van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge: Cambridge University Press.
  19. van der Vaart, A. (2002). Semiparametric statistics. In Lectures on Probability Theory and Statistics, Bernard P., ed. New York: Springer, pp. 331–457.
  20. Yin, G. & Zeng, D. (2006). Efficient algorithm for computing maximum likelihood estimates in linear transformation models. J. Comp. Graph. Statist. 15, 228–45.
  21. Zeng, D. & Lin, D. Y. (2006). Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93, 627–40.
  22. Zeng, D., Mao, L. & Lin, D. Y. (2016). Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103, 253–71.

Articles from Biometrika are provided here courtesy of Oxford University Press
