Maximum Likelihood Inference for the Cox Regression Model with Applications to Missing Covariates

Ming-Hui Chen; Joseph G Ibrahim; Qi-Man Shao

doi:10.1016/j.jmva.2009.03.013

Author manuscript; available in PMC: 2009 Oct 1.

Published in final edited form as: J Multivar Anal. 2009 Oct 1;100(9):2018–2030. doi: 10.1016/j.jmva.2009.03.013

PMCID: PMC2744117 NIHMSID: NIHMS94916 PMID: 19802375

Abstract

In this paper, we carry out an in-depth theoretical investigation for existence of maximum likelihood estimates for the Cox model (Cox, 1972, 1975) both in the full data setting as well as in the presence of missing covariate data. The main motivation for this work arises from missing data problems, where models can easily become difficult to estimate with certain missing data configurations or large missing data fractions. We establish necessary and sufficient conditions for existence of the maximum partial likelihood estimate (MPLE) for completely observed data (i.e., no missing data) settings as well as sufficient conditions for existence of the maximum likelihood estimate (MLE) for survival data with missing covariates via a profile likelihood method. Several theorems are given to establish these conditions. A real dataset from a cancer clinical trial is presented to further illustrate the proposed methodology.

Keywords: Missing at random (MAR), Monte Carlo EM algorithm, Existence of partial maximum likelihood estimate, Necessary and sufficient conditions, Partial likelihood, Proportional hazards model

1 Introduction

There is a vast literature on parameter estimation in the Cox model in the presence of missing covariates, including Schluchter and Jackson (1989), Lin and Ying (1993), Lipsitz and Ibrahim (1996, 1998, 2000), Paik (1997), Paik and Tsai (1997), Chen and Little (1999), Herring and Ibrahim (2001), Leong, Lipsitz, and Ibrahim (2001), Chen (2002), Pons (2002), Herring, Ibrahim, and Lipsitz (2002, 2004), and Chen, Ibrahim, and Shao (2006). However, there is very little literature addressing specific theoretical conditions for the existence of MLE’s of the Cox model in either the full data case or in the presence of missing covariate data. We are not aware of specific literature that establishes specific theoretical results for existence of such estimates. This is what we set out to do in this paper. Specifically, we provide necessary and sufficient conditions for existence of the Maximum Partial Likelihood Estimate (MPLE) with no missing data as well as sufficient conditions for existence of the Maximum Likelihood Estimate (MLE) with Missing at Random (MAR) covariate data via the profile likelihood method. The methodology proposed here is quite new and will shed light on the characterizations of existence of the MPLE or MLE for the Cox model with complete data as well as with missing covariate data. The profile likelihood method for obtaining the MLE in the presence of MAR covariates is quite different from the other parametric and semiparametric approaches seen in the literature. The profile likelihood method is genuinely non-parametric in estimating the cumulative baseline hazard and does not require a semi-parametric estimate of the baseline hazard as is required in Lipsitz and Ibrahim (1998) and Herring and Ibrahim (2001).

We mention that Jacobsen (1989) establishes a necessary and sufficient condition for existence of the MPLE for the Cox model without missing covariate data, Chen, Ibrahim, and Shao (2004) consider issues in posterior propriety and characterize conditions for existence of the MLE in generalized linear models with MAR covariate data, and Huang, Chen, and Ibrahim (2005) carry out a detailed investigation of posterior propriety in generalized linear models with nonignorably missing covariate data. The methods and models considered in those papers are quite different from the Cox model setting. In the Cox model, i) we no longer have independence between the observations in the construction of the partial likelihood, that is, the complete data log-likelihood is not a sum of n independent observations, ii) the Cox regression model, and in particular, Cox’s partial likelihood, is an inherently semiparametric model, and thus a profile likelihood method considered here is quite different than the fully parametric models considered in Chen, Ibrahim, and Shao (2004) and Huang, Chen, and Ibrahim (2005), and iii) right censoring and tied observations require new theory not developed in Chen, Ibrahim, and Shao (2004) and Huang, Chen, and Ibrahim (2005). Thus, i) – iii) will require new theory for characterizing conditions for existence of the MPLE and MLE of the regression coefficients in the Cox model allowing for tied observations.

The significance of this work thus has two aspects. First, the proposed methodology will allow the data analyst to determine, for a given dataset, whether the MPLE or MLE exists before carrying out the analysis. Such a methodology is critical since it is not always clear from the computer output in an analysis whether the MPLE or MLE exists or not. Second, such conditions will be useful for determining suitable starting values for EM-type algorithms when fitting these models. Thus, the practical consequence of the proposed methodology is that we provide valuable tools for checking existence of the MPLE or MLE as well as inferential and computational tools for maximum likelihood based inference for the Cox model with or without MAR covariates.

The rest of this article is organized as follows. Section 2 presents several motivating examples. We give necessary and sufficient conditions for the existence of the MPLE with no missing data in Section 3 and give sufficient conditions for existence of the MLE in the presence of MAR covariate data in Section 4. The computational development involving the Monte Carlo EM (MCEM) algorithm is given in Section 5. Section 6 presents a detailed analysis of a lung cancer dataset to further illustrate the proposed methodology. Proofs of all theorems are given in the Appendix.

2 Motivating Examples

To fix ideas, let y_i denote the minimum of the censoring time C_i and the survival time T_i, and let x_i = (x_i₁,…,x_ip)′ be the p × 1 vector of covariates associated with y_i for the i^th subject. Denote by β = (β_1,…, β_p)′ the p × 1 vector of regression coefficients. Also, δ_i = 1{T_i = y_i} is the indicator for the event for i = 1, 2, …, n, where n is the total number of observations and ℛ(t) = {i : y_i ≥ t} is the set of subjects at risk at time t. Then, the partial likelihood of Cox (1975) is given by

L_{p} (β ∣ D_{obs}) = \prod_{i = 1}^{n} {[\frac{exp (x_{i}^{'} β)}{\sum_{j \in R (y_{i})} exp (x_{j}^{'} β)}]}^{δ_{i}},

(2.1)

where D_obs = {(y_i, δ_i, x_i) : i = 1, 2, …, n} is the observed univariate right censored survival data. As usual, we assume throughout that x_i does not include an intercept, since the intercept is not estimable in the Cox partial likelihood, and that given x_i, T_i and C_i are independent. For the completely observed data D_obs, the maximum partial likelihood estimate (MPLE) is defined as β̂ = arg max L_p(β|D_obs). The asymptotic properties of β̂ have been well studied in the literature, and in fact, the MPLE can be computed via standard statistical software, such as the SAS procedure, PROC PHREG. However, it remains unclear when the MPLE exists and when it does not for a given dataset. To motivate the proposed methodology, we consider the following two examples.

Example 1: A Simple Illustration

Suppose n = 3, y₁ and y₂ are two failure times, y₃ is a right censored survival time, and we have one binary covariate x. Let x₁, x₂, and x₃ denote the three observed values of x. Assuming y₁ < y₂ < y₃, the partial likelihood of Cox (1975) is then given by

L_{p} (β_{1} ∣ D_{obs}) = \frac{exp (x_{1} β_{1})}{exp (x_{1} β_{1}) + exp (x_{2} β_{1}) + exp (x_{3} β_{1})} \times \frac{exp (x_{2} β_{1})}{exp (x_{2} β_{1}) + exp (x_{3} β_{1})},

where D_obs = {(y_i, x_i), i = 1, 2, 3}. We consider two special cases.

Case 1

x₁ = x₂ = 0 and x₃ = 1. In this case, we have $L_{p} (β_{1} ∣ D_{obs}) = \frac{1}{2 + exp (β_{1})} \times \frac{1}{1 + exp (β_{1})}$ . Then, we can see that L_p(β₁|D_obs) is strictly decreasing in β₁, so its supremum, 1/2, is approached only as β₁ → −∞. Thus, the MPLE does not exist.

Case 2

x₁ = 0, x₂ = 1, and x₃ = 0. In this case, we have $L_{p} (β_{1} ∣ D_{obs}) = \frac{1}{2 + exp (β_{1})} \times \frac{exp (β_{1})}{exp (β_{1}) + 1}$ . Then, the MPLE does exist. In fact, the MPLE of β₁ is $\frac{1}{2} log (2)$ .

In Example 1, the partial likelihood function behaves quite differently by simply switching two observed values of the covariate: one leads to the existence of the MPLE and the other does not. Thus, a natural question is what are general if and only if conditions for the existence of the MPLE in the Cox model? From this illustrative example, we can see that this is not an easy problem to solve, as it requires an in-depth theoretical investigation to find such conditions.
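As a numerical check on Example 1, the following Python sketch evaluates the log of the partial likelihood (2.1) for the two cases and maximizes it over β₁. It is an illustration only: the function name `log_partial_likelihood`, the placeholder times y = (1, 2, 3), and the search interval [−10, 10] are assumptions made for the example, and `scipy.optimize.minimize_scalar` stands in for any one-dimensional optimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def log_partial_likelihood(beta, y, delta, X):
    """Log of the Cox partial likelihood (2.1) for an n x p covariate matrix X."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    eta = X @ np.atleast_1d(beta)          # linear predictors x_i' beta
    ll = 0.0
    for i in np.flatnonzero(delta == 1):   # only events contribute a factor
        at_risk = y >= y[i]                # risk set R(y_i)
        ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return ll


# Placeholder times with y1 < y2 < y3; the third observation is censored.
y, delta = [1.0, 2.0, 3.0], [1, 1, 0]

# Case 2: x = (0, 1, 0). The maximizer should be close to log(2) / 2.
fit = minimize_scalar(
    lambda b: -log_partial_likelihood(b, y, delta, [0, 1, 0]),
    bounds=(-10.0, 10.0), method="bounded", options={"xatol": 1e-10},
)
beta_hat_case2 = fit.x

# Case 1: x = (0, 0, 1). The log-likelihood keeps increasing as beta decreases.
case1_values = [log_partial_likelihood(b, y, delta, [0, 0, 1])
                for b in (-5.0, -10.0, -20.0)]
```

In Case 1 the values in `case1_values` increase toward log(1/2) without reaching it, which is the numerical signature of a non-existent MPLE; a bounded optimizer would simply stop at the lower end of whatever interval it is given.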

Example 2: Prostate Cancer Data

We consider data, which consist of n = 550 men who were treated with radiation therapy followed by six months of short-course androgen suppression therapy for localized prostate cancer with at least one adverse risk factor (prostate-specific antigen [PSA] > 10 ng/mL, biopsy Gleason score 7 to 10, or 2002 American Joint Commission on Cancer (AJCC) clinical tumor category T2b or T2c) between 1989 and 2002. The outcome variable (y_i) in years was time to prostate cancer death, which is continuous and subject to right censoring, and δ_i = pfail denotes the censoring indicator which equals 1 if the i^th subject died due to prostate cancer, and 0 otherwise. The goal of this study was to determine whether the number of risk factors present was associated with time to prostate cancer death (Tsai et al., 2006).

Define A = I {PSA > 10}, B = I {Gleason ≥ 7}, and C = I {T2b or T2c}. We consider five covariates: AB, AC, BC, ABC, and age. There are no missing values in this data set. A Cox proportional hazards model was fitted to this data set. The following outputs were produced by SAS Procedure PHREG:

Variable	DF	Parameter Estimate	Standard Error	Chi-Square	Pr > ChiSq
AB	1	0.39759	1.23355	0.1039	0.7472
AC	1	−14.30314	2107	0.0000	0.9946
BC	1	0.59060	1.22714	0.2316	0.6303
ABC	1	2.22155	0.80450	7.6253	0.0058
age	1	0.02262	0.04821	0.2201	0.6390

From the above results, we see that although SAS Procedure PHREG does produce estimates for all five covariates, there is clearly some identifiability problem with the covariate AC, as its estimate is large in absolute value and its standard error is huge compared to those of all other covariates. Now, the question is: do the MPLEs really exist in this Cox model?

Example 3: Small Cell Lung Cancer Data

We consider data from a phase III advanced non-small-cell lung cancer (SCLC) clinical trial conducted by the University of North Carolina at Chapel Hill (LCCC 9719). The results of this study have been published in Socinski et al. (2002). The goal of this trial was to compare a defined duration of therapy (A) to continuous therapy followed by second line therapy (B) in order to determine optimal duration of therapy in SCLC patients. LCCC 9719 had n = 230 patients. We consider here five prognostic factors: x₁ = treatment (2 arms: A and B, coded as 1 and 0), x₂ = gender (female and male, coded as 0 and 1), x₃ = age in years, x₄ = highest grade toxicity (recorded by cycle) (2 levels: 0 versus > 0, coded as 0 and 1), and x₅ = quality of life (QOL) FACTG score. For these five prognostic factors, x₄ and x₅ had missing information and x₁, x₂, and x₃ were completely observed for all cases. In this dataset, there is a total missing covariate data fraction of 52.74% on these two covariates. The outcome variable (y_i in months) is time to progression, which is continuous and subject to right censoring, and δ_i denotes the censoring indicator which equals 1 if the i^th subject had disease progression, and 0 otherwise. The median follow up time is 3.94 months and the range of the follow up time is (0.10, 12.26) months. There are d = 102 distinct progression times and ties are present in the dataset. A summary of the dataset is given in Table 1. In the presence of missing covariates, a joint probability distribution must be specified for the progression time and the missing covariates, and a profile likelihood method is hence proposed for obtaining the MLE in Section 4, as a partial likelihood approach in this context may not be as desirable.

Table 1.

Summary of LCCC 9719 Data

completely observed variables

x₁	A	114
(frequency)	B	116

x₂	Male	144
(frequency)	Female	86

x₃	mean	62.24
(years)	std dev	10.17

y	censored	83
(frequency)	relapsed	147

missing covariates

x₄	0	155
(frequency)	1	10
	missing	65

x₅	mean	78.14
(QOL score)	std dev	15.31
	missing	81

both x₄ and x₅	missing	27
one of x₄ or x₅	missing	119

3 Existence of the MPLE With No Missing Data

In this section, we characterize very general conditions for the existence of the MPLE of β for a given dataset D_obs under the Cox model with no missing covariate data. Define X^* to be

X^{*} = (δ_{i} (x_{j} - x_{i}), j \in R (y_{i}), 1 \leq i \leq n)^{'} .

(3.1)

Let k_i denote the number of subjects in ℛ(y_i) for i = 1, 2,…, n. Also let $K = \sum_{i = 1}^{n} k_{i}$ . Then, X^* is a K × p matrix. Using X^*, we are led to the following theorem.

Theorem 3.1

The MPLE of β in (2.1) exists if the following conditions are satisfied:

(C1) X^* is of full rank p; and
(C2) There exists a positive vector v, i.e., each component of v is positive, such that

X^{*^{'}} v = 0.

(3.2)

In addition, if (C1) is satisfied, then (C2) is a necessary condition for the existence of MPLE for β.

The proof of Theorem 3.1 is given in the Appendix.

Remark 3.1

In X^* defined by (3.1), the rows corresponding to δ_i = 0 or x_j = x_i can be excluded. Thus, the effective number of rows in X^* can be reduced substantially. Specifically, let $k_{i}^{*} = \sum_{j \in R (y_{i})} 1 \{x_{j} \neq x_{i}\}$ , where the indicator function 1{x_j ≠ x_i} = 1 if x_j ≠ x_i and 0 otherwise. Then, the effective number of rows in X^* is given by $K^{*} = \sum_{i = 1}^{n} δ_{i} k_{i}^{*}$ .

Remark 3.2

When ties are present, as discussed in Klein and Moeschberger (2003, Chapter 8), the partial likelihood may be defined as

L_{p t} (β ∣ D_{obs}) = \prod_{i = 1}^{d} \frac{exp (z_{i}^{'} β)}{{[\sum_{j \in R (y_{i})} exp (x_{j}^{'} β)]}^{d_{i}}},

(3.3)

where $d = \sum_{i = 1}^{n} δ_{i}$ , $z_{i} = \sum_{j \in D_{i}} x_{j}$ , d_i = the number of events at y_i, and D_i is the set of all individuals who have the event at time y_i. We can thus rewrite (3.3) as

L_{p t} (β ∣ D_{obs}) = \prod_{i = 1}^{n} \frac{exp (δ_{i} x_{i}^{'} β)}{{[\sum_{j \in R (y_{i})} exp (x_{j}^{'} β)]}^{δ_{i}}},

and Theorem 3.1 can be easily extended to the cases when ties are present. Note that the partial likelihood given by (3.3) is the likelihood of Breslow (1974), and the Breslow likelihood is the default choice in SAS to handle ties in the failure times.
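The equivalence between the grouped form (3.3) and its per-subject rewriting can be checked numerically. The sketch below is illustrative only: the function names and the small tied dataset are made up for the example, and ties are identified by exact equality of the observed times.

```python
import numpy as np


def breslow_grouped(beta, y, delta, X):
    """Log of (3.3): one factor per distinct event time."""
    ll = 0.0
    for t in np.unique(y[delta == 1]):
        events = (y == t) & (delta == 1)     # D_i: subjects failing at time t
        z = X[events].sum(axis=0)            # z_i = sum of their covariates
        d_t = events.sum()                   # d_i = number of events at t
        at_risk = y >= t                     # risk set at time t
        ll += z @ beta - d_t * np.log(np.exp(X[at_risk] @ beta).sum())
    return ll


def breslow_per_subject(beta, y, delta, X):
    """Log of the rewritten form: one factor per subject with delta_i = 1."""
    ll = 0.0
    for i in np.flatnonzero(delta == 1):
        at_risk = y >= y[i]
        ll += X[i] @ beta - np.log(np.exp(X[at_risk] @ beta).sum())
    return ll


# Hypothetical data with tied event times at y = 1 and y = 3.
y = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 4.0])
delta = np.array([1, 1, 0, 1, 1, 0])
X = np.array([[0.0, 1.2], [1.0, -0.3], [0.0, 0.5],
              [1.0, 0.8], [0.0, -1.1], [1.0, 0.1]])
beta = np.array([0.4, -0.7])

grouped = breslow_grouped(beta, y, delta, X)
per_subject = breslow_per_subject(beta, y, delta, X)
```

The two values agree because subjects tied at the same event time share the same risk set, so their d_i identical denominators collapse into a single power.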

Remark 3.3

Suppose y₁ ≤ y₂ ≤ ··· ≤ y_n. Then, from condition (C2), it is easy to observe that if there exists a j such that x_1j ≤ x_2j ≤ ··· ≤ x_nj, the MPLE of β does not exist. Also, when one of the components of x_i, say, x_ij, is binary and the x_ij’s take the same value for δ_i = 1 or the x_ij’s take the same value for δ_i = 0, then the MPLE of β does not exist.

Remark 3.4

When conditions (C1) and (C2) are satisfied for a subset of the data, the MPLE still does exist. To see this, we assume that the subset consists of the first n^* observations. Then we have

L_{p} (β ∣ D_{obs}) \leq \prod_{i = 1}^{n^{*}} {[\frac{exp (x_{i}^{'} β)}{\sum_{j \in R (y_{i}), j \leq n^{*}} exp (x_{j}^{'} β)}]}^{δ_{i}} .

The existence of the MPLE can be obtained by simply applying Theorem 3.1 to the above upper bound. These subset conditions are sufficient but not necessary. However, this result is particularly useful for large datasets, for which checking conditions (C1) and (C2) on the full data may not be computationally feasible.

Remark 3.5

Jacobsen (1989) also characterizes a necessary and sufficient condition for the existence of the MPLE. His condition can be stated as follows: there is no a ∈ R^p such that a′δ_i(x_j − x_i) ≥ 0 for j ∈ ℛ(y_i) and 1 ≤ i ≤ n. According to Lemma A.1, we can see that Jacobsen’s condition implies (C2). Thus, (C2) is necessary for existence and for uniqueness. We note that the conditions stated in Theorem 3.1 are sufficient. However, compared to Jacobsen’s condition, the conditions (C1) and (C2) given in Theorem 3.1 are easier to check. First, it is straightforward to check condition (C1) that X^* has full column rank. Second, as discussed in Appendix A of Roy and Hobert (2007), condition (C2) can be checked with a simple linear program using the ‘simplex’ function from the ‘boot’ library in the R programming language.
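Conditions (C1) and (C2) can be checked along the following lines in Python. This is a sketch under stated assumptions, not the R code referred to above: it uses `scipy.optimize.linprog` in place of the ‘simplex’ function, the helper names `build_x_star` and `check_c1_c2` are hypothetical, and the tolerance `tol` is an arbitrary choice. The linear program maximizes t subject to X^*′v = 0 and t ≤ v_k ≤ 1 for every k, so (C2) holds exactly when the optimal t is positive. Rows with δ_i = 0 or x_j = x_i are dropped as in Remark 3.1, which affects neither condition.

```python
import numpy as np
from scipy.optimize import linprog


def build_x_star(y, delta, X):
    """Effective rows delta_i * (x_j - x_i), j in R(y_i), of X* in (3.1)."""
    rows = []
    for i in np.flatnonzero(delta == 1):
        for j in np.flatnonzero(y >= y[i]):
            diff = X[j] - X[i]
            if np.any(diff != 0):            # Remark 3.1: skip zero rows
                rows.append(diff)
    return np.array(rows, dtype=float).reshape(len(rows), X.shape[1])


def check_c1_c2(y, delta, X, tol=1e-9):
    """Return (C1 holds, C2 holds) for the data (y, delta, X)."""
    x_star = build_x_star(y, delta, X)
    k, p = x_star.shape
    if k == 0:                               # no informative rows: rank 0 < p
        return False, True
    c1 = np.linalg.matrix_rank(x_star) == p
    # Variables (v_1, ..., v_k, t); maximize t, i.e. minimize -t.
    cost = np.r_[np.zeros(k), -1.0]
    a_eq = np.hstack([x_star.T, np.zeros((p, 1))])    # X*' v = 0
    a_ub = np.hstack([-np.eye(k), np.ones((k, 1))])   # t - v_k <= 0
    res = linprog(cost, A_ub=a_ub, b_ub=np.zeros(k), A_eq=a_eq,
                  b_eq=np.zeros(p), bounds=[(0.0, 1.0)] * (k + 1),
                  method="highs")
    c2 = bool(res.status == 0 and res.x[-1] > tol)
    return bool(c1), c2


# The two cases of Example 1, with placeholder times y1 < y2 < y3.
y = np.array([1.0, 2.0, 3.0])
delta = np.array([1, 1, 0])
case1 = check_c1_c2(y, delta, np.array([[0.0], [0.0], [1.0]]))
case2 = check_c1_c2(y, delta, np.array([[0.0], [1.0], [0.0]]))
```

Bounding v above by 1 only fixes the scale of v, since any positive solution of X^*′v = 0 can be rescaled; it keeps the linear program bounded without changing the answer.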

Example 1: A Simple Illustration (revisited)

Recall that in Example 1, we have n = 3, y₁ < y₂ < y₃, δ₁ = δ₂ = 1, and δ₃ = 0. For Case 1 in which x₁ = x₂ = 0 and x₃ = 1, we have k₁ = 3, k₂ = 2, k₃ = 1, and K = 6. Thus, using (3.1), X^* = (0, 0, 1, 0, 1, 0)′, which is a 6 × 1 matrix. After excluding the rows corresponding to δ_i = 0 or x_j = x_i, the effective number of rows in X^* is K^* = 2. It is easy to see that X^* is of full rank, which is 1. Also, for any v = (v₁, v₂, v₃, v₄, v₅, v₆)′ such that v_i > 0, (X^*)′v = v₃ + v₅ > 0. Thus, by Theorem 3.1, the MPLE does not exist.

For Case 2, where x₁ = 0, x₂ = 1, and x₃ = 0, using (3.1), we have (X^*)′ = (0, 1, 0, 0, −1, 0). Taking v = (v₁, v₂, v₃, v₄, v₂, v₅)′ with v_j > 0 for j = 1, 2, …, 5, we have (X^*)′v = v₂ − v₂ = 0. Obviously, X^* is of full rank. Thus, the MPLE does exist by Theorem 3.1.

In general, we have X^* = (0, x₂ − x₁, x₃ − x₁, 0, x₃ − x₂, 0)′ and (3.2) reduces to

\begin{array}{l} (0, x_{2} - x_{1}, x_{3} - x_{1}, 0, x_{3} - x_{2}, 0) (v_{1}, v_{2}, v_{3}, v_{4}, v_{5}, v_{6})^{'} \\ = v_{2} (x_{2} - x_{1}) + v_{3} (x_{3} - x_{1}) + v_{5} (x_{3} - x_{2}) = 0. \end{array}

(3.4)

Condition (C1) requires that at least two of x₁, x₂, and x₃ are different. If x₁ = x₂ and condition (C1) holds, then there is no positive solution v to (3.4) regardless of the value of x₃. Thus, the MPLE never exists when x₁ = x₂. However, if x₁ < x₂, then the MPLE exists if x₃ < x₂ and does not exist if x₃ ≥ x₂. Similarly, if x₂ < x₁, the MPLE exists if x₃ > x₂ and does not exist if x₃ ≤ x₂. One interesting observation is that even if x₃ < x₁ < x₂, the MPLE still exists although x₃, for which δ₃ = 0, is distinct from {x₁, x₂} in the sense that δ₁ = δ₂ = 1. Thus, the condition for existence of the MPLE cannot be characterized by the value of δ_i alone by fitting, for example, a binary regression model to δ_i while treating (1, x_i)′ as a vector of covariates.

Example 2: Prostate Cancer Data (revisited)

After we further examined the data, we found that

Variable	δ= 0	δ= 1	Total
Only one of A, B, C	253	2	255
Only AB not C	116	1	117
Only AC not B	35	0	35
Only BC not A	64	1	65
ABC	71	7	78

Total	539	11	550

From the above table, “only AC not B” is the only group with no events. This explains why we obtained the unusual estimate and standard error for the regression coefficient corresponding to AC. From Remark 3.3, it becomes apparent that the MPLEs do not exist for this dataset if we fit the five covariates in the Cox model. One way to fix this problem is to combine AB, AC, and BC as one variable, which was called the two-factors only variable in Tsai et al. (2006).

4 Profile Maximum Likelihood Estimation in the Presence of Missing Covariates

When there are missing covariates, we assume that the distribution of the censoring time C_i does not depend on the missing covariates and the missingness is MAR. In this case, we cannot directly use the Cox partial likelihood since we need to model the failure time and the covariates jointly. Thus, instead of the partial likelihood approach, we use a profile likelihood approach when we have MAR covariates.

For notational simplicity, we assume that all failure times are distinct and let y₁, y₂, …, y_d be d distinct failure times. Let h₀(y) ≥ 0 denote the baseline hazard function and also let $H_{0} (y) = \int_{0}^{y} h_{0} (u) d u$ denote the baseline cumulative hazard function. Let $x_{i} = (x_{i, mis}^{'}, x_{i, obs}^{'})^{'}$ and D_obs = (y_i, δ_i, x_i,obs, i = 1, 2, …, n). Also let D = (y_i, δ_i, x_i,obs, x_i,mis, i = 1, 2, …, n) denote the complete data. In addition, let r_i = (r_i₁, r_i₂, …, r_ip)′ be the vector of the p missing covariate indicators such that r_il = 0 when x_il is missing and r_il = 1 when x_il is observed for i = 1, 2, …, n and l = 1, 2, …, p. Since we assume ignorable missingness in the covariates (i.e., MAR covariates and the parameters of the missing data mechanism are distinct from the sampling model), we do not need to model r_i. Also, we assume that the parameters of the distributions for the censoring times C_i’s are distinct from the sampling model. Thus, for ignorably missing covariates, ignoring the parts adhering to censoring and the missing data mechanism, the observed data likelihood function based on the Cox model (Cox, 1972) is given by

\begin{array}{l} L (β, h_{0}, α ∣ D_{obs}) \\ = \int [\prod_{i = 1}^{d} h_{0} (y_{i}) exp (x_{i}^{'} β)] exp {- \sum_{j = 1}^{n} H_{0} (y_{j}) exp (x_{j}^{'} β)} [\prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α)] d x_{mis}, \end{array}

(4.1)

where x_mis = (x_i,mis, i = 1, 2, …, n), f(x_i,mis, x_i,obs|α) denotes the joint distribution of x_i, and α is the vector of parameters for the covariate distribution.

It is well known that the partial likelihood can be expressed as a profile likelihood (Johansen, 1983) by substituting a nonparametric maximum likelihood estimator for the cumulative baseline hazard function H₀(y), which is a function of the fixed coefficients β, and that this nonparametric maximum likelihood estimator is necessarily a pure-jump estimator with jumps precisely at the observed event times. Following the profile likelihood approach (see, for example, Klein and Moeschberger (2003, Chapter 8)), we have

\begin{array}{l} sup_{h_{0}} L (β, h_{0}, α ∣ D_{obs}) \\ \leq \int sup_{h_{0}} [\prod_{i = 1}^{d} h_{0} (y_{i}) exp (x_{i}^{'} β)] exp {- \sum_{j = 1}^{n} H_{0} (y_{j}) exp (x_{j}^{'} β)} [\prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α)] d x_{mis} \\ = \int sup_{h_{0}} [\prod_{i = 1}^{d} h_{0} (y_{i}) exp (x_{i}^{'} β)] exp {- \sum_{j = 1}^{n} (\sum_{y_{k} \leq y_{j}} h_{0} (y_{k})) exp (x_{j}^{'} β)} [\prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α)] d x_{mis} \\ = \int sup_{h_{0}} \prod_{i = 1}^{d} [h_{0} (y_{i}) exp (x_{i}^{'} β) exp {- h_{0} (y_{i}) \sum_{j \in R (y_{i})} exp (x_{j}^{'} β)}] [\prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α)] d x_{mis} \\ = \int {\prod_{i = 1}^{d} exp (x_{i}^{'} β) {[\sum_{j \in R (y_{i})} exp (x_{j}^{'} β)]}^{- 1}} [\prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α)] d x_{mis} . \end{array}

(4.2)

We note that in (4.2), the function

[\prod_{i = 1}^{d} h_{0} (y_{i}) exp (x_{i}^{'} β)] exp {- \sum_{j = 1}^{n} H_{0} (y_{j}) exp (x_{j}^{'} β)}

is maximized when h₀(y_j) = 0 except for the times at which events occur. Thus, the MLE of (β, α) exists if the upper bound in the right-hand side of (4.2) goes to zero when $∥ β ∥ + ∥ α ∥ = \sqrt{β^{'} β} + \sqrt{α^{'} α} \to \infty$ . Write

L (α ∣ D_{obs}) = \int \prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α) d x_{mis} .

(4.3)

The following theorem characterizes the conditions for existence of the MLE of (β, h₀, α) when the x_ij’s are bounded.

Theorem 4.1

If the x_ij’s are bounded, i.e., a_i ≤ x_ij ≤ b_i, define X^** to be $X^{* *} = (δ_{i} (x_{j}^{*} - x_{i}^{*})$ , j ∈ ℛ(y_i), δ_j = 0, 1 ≤ i ≤ n)′, where $x_{i}^{*} = ((x_{i, mis}^{R})^{'}, x_{i, obs}^{'})^{'}$ and each component of $x_{i, mis}^{R}$ is equal to either $a_{i}^{*} = δ_{i} a_{i} + (1 - δ_{i}) b_{i}$ or $b_{i}^{*} = (1 - δ_{i}) a_{i} + δ_{i} b_{i}$ for all i. Then, the MLE of (β, h₀, α) in (4.1) exists if the following conditions are satisfied: (C1^*) lim_{‖α‖→∞} L(α|D_obs) = 0; (C2^*) X^** is of full rank; and (C3^*) there exists a positive vector v such that X^**′v = 0.

The proof of Theorem 4.1 is given in the Appendix. The main intuition behind Theorem 4.1 is that when the MLE exists under conditions (C2^*) and (C3^*) for the most extreme possible values of the missing covariates, then the MLE also exists for any intermediate values of the missing covariates, and averaging over the missing values will not affect the existence of the MLE. In Theorem 4.1, the elements of the matrix X^** corresponding to the missing covariates are “filled-in” by either $a_{i}^{*} = δ_{i} a_{i} + (1 - δ_{i}) b_{i}$ or $b_{i}^{*} = (1 - δ_{i}) a_{i} + δ_{i} b_{i}$ , where $a_{i}^{*}$ and $b_{i}^{*}$ are in fact the two possible extreme values of the missing covariates when the x_ij’s are bounded.

The next theorem gives sufficient conditions for existence of the MLE of (β, h₀, α) when the x_ij’s are unbounded.

Theorem 4.2

If the x_ij’s are unbounded, the MLE of (β, h₀, α) in (4.1) exists if condition (C1^*) in Theorem 4.1 and conditions (C1) and (C2) in Theorem 3.1 are satisfied for the completely observed cases.

The proof of Theorem 4.2 is given in the Appendix. Theorem 4.2 is practically useful as the conditions stated in this theorem are easier to check than those given in Theorem 4.1. We note that in Theorem 4.2, we are not doing a complete case analysis. Instead, we use a subset of the data with the completely observed cases to establish the sufficient conditions for the existence of the MLE when the missing covariates are unbounded.

Remark 4.1

Assume that the maximum number of missing components of x_i, i = 1, …, n, is p_i. Then, to verify the conditions given in Theorem 4.1, we need to check only the conditions (C2^*) and (C3^*) for at most 2^{p_i} possible X^**’s.

Remark 4.2

When there are no missing covariates, it is easy to observe that the profile maximum likelihood estimate of β reduces to the MPLE, while the profile maximum likelihood estimate of α is the MLE.

Remark 4.3

Ibrahim, Lipsitz and Chen (1999) and Chen and Ibrahim (2001) provide a comprehensive set of guidelines for specifying the joint distribution of the covariate vector x_i through a series of one dimensional conditional distributions. Condition (C1^*) stated in Theorem 4.1 holds for many covariate distributions considered in Ibrahim, Lipsitz and Chen (1999) and Chen and Ibrahim (2001).

Remark 4.4

When there are ties in the event times, similar to Remark 3.2, the upper bound given in (4.2) can be modified as

K \int {\prod_{i = 1}^{d} exp (z_{i}^{'} β) {[\sum_{j \in R (y_{i})} exp (x_{j}^{'} β)]}^{- d_{i}}} [\prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α)] d x_{mis},

where K > 0 is independent of β, α, and x_i, and z_i and d_i are defined in (3.3). Thus, all the theory developed in this subsection is still valid in the presence of ties.

Next, we consider an interesting special case where each missing component of x_i is discrete and bounded.

Corollary 4.1

Assume that each missing component of x_i is discrete and bounded. Then condition (C3^*) given in Theorem 4.1 is also necessary for the existence of the MLE for (β, h₀) if condition (C2^*) is satisfied.

The proof of Corollary 4.1 directly follows from the fact that when each missing component of x_i is discrete and bounded, we have

\begin{array}{l} sup_{h_{0}} L (β, h_{0}, α ∣ D_{obs}) = \sum_{x_{mis}} sup_{h_{0}} [\prod_{i = 1}^{d} h_{0} (y_{i}) exp (x_{i}^{'} β)] exp {- \sum_{j = 1}^{n} H_{0} (y_{j}) exp (x_{j}^{'} β)} \\ \times [\prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α)] . \end{array}

Thus, details of the proof are omitted for brevity.

5 Computational Development

When there are no missing covariates, computing the MPLE of β is straightforward and, in fact, the MPLE can be computed via standard statistical software, such as the SAS procedure, PROC PHREG. In the presence of missing covariates, the EM algorithm is required. Martinussen (1999) proposes an efficient EM algorithm for computing the MLE and its standard error in the presence of discrete missing covariates. When x_i,mis is continuous or mixed continuous and categorical, we need to develop a Monte Carlo EM (MCEM) algorithm, which is an extension of Martinussen’s algorithm for computing the MLE’s of β, h₀, and α as well as their standard errors.

To implement the MCEM algorithm, let γ = (β, h₀, α). Let γ^(t) denote the parameter estimate of γ at the t^th EM iteration. In the E-step, we take an MCMC sample of size $m_{i}^{(t)}$ , namely $x_{i, mis}^{(t 1)}, x_{i, mis}^{(t 2)}, \dots, x_{i, mis}^{(t m_{i}^{(t)})}$ , from

f (x_{i, mis} ∣ x_{i, obs}, y_{i}, γ^{(t)}) \propto exp (δ_{i} x_{i}^{'} β^{(t)}) exp {- H_{0}^{(t)} (y_{i}) exp (x_{i}^{'} β^{(t)})} f (x_{i, mis}, x_{i, obs} ∣ α^{(t)})

for i = 1, 2, …, n. Note that this conditional distribution is log-concave as long as f(x_i,mis, x_i,obs|α^(t)) is log-concave in each component of x_i,mis. We then compute

\begin{array}{l} Q (γ ∣ γ^{(t)}) = \sum_{i = 1}^{d} [log h_{0} (y_{i}) + \frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} x_{i}^{(t k)^{'}} β] - \sum_{j = 1}^{n} H_{0} (y_{j}) [\frac{1}{m_{j}^{(t)}} \sum_{k = 1}^{m_{j}^{(t)}} exp (x_{j}^{(t k)^{'}} β)] \\ + \sum_{i = 1}^{n} \frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} log f (x_{i, mis}^{(t k)}, x_{i, obs} ∣ α), \end{array}

(5.1)

where $x_{i}^{(t k)} = (x_{i, mis}^{(t k)^{'}}, x_{i, obs}^{'})^{'}$ and $H_{0} (y_{j}) = \sum_{l: y_{l} \leq y_{j}, δ_{l} = 1} h_{0} (y_{l})$ . In the M-step, we compute

β^{(t + 1)} = \underset{β}{arg max} \sum_{i = 1}^{d} {[\frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} x_{i}^{(t k)^{'}} β] - log [\sum_{j \in R (y_{i})} \frac{1}{m_{j}^{(t)}} \sum_{k = 1}^{m_{j}^{(t)}} exp (x_{j}^{(t k)^{'}} β)]},

(5.2)

h_{0}^{(t + 1)} (y_{i}) = {[\sum_{j \in R (y_{i})} \frac{1}{m_{j}^{(t)}} \sum_{k = 1}^{m_{j}^{(t)}} exp (x_{j}^{(t k)^{'}} β^{(t + 1)})]}^{- 1}, H_{0}^{(t + 1)} (y_{i}) = \sum_{y_{j} \leq y_{i}, δ_{j} = 1} h_{0}^{(t + 1)} (y_{j}),

(5.3)

and

α^{(t + 1)} = \underset{α}{arg max} \sum_{i = 1}^{n} \frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} log f (x_{i, mis}^{(t k)}, x_{i, obs} ∣ α) .

Following Booth and Hobert (1999), in the MCEM algorithm, we take m^(t+1) = m^(t) + Δm, where Δm > 0. With this dynamic MCMC sample size m^(t), the MCEM algorithm requires much less computational time. Also, a large m^(t) is not needed in early iterations of the algorithm since γ^(t) is still far from the MLE γ̂ and the algorithm is not near convergence. As t increases, m^(t) increases, and a more computationally accurate estimate of Q(γ|γ^(t)) is obtained in the E-step.
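One way to draw from the E-step conditional density is a random-walk Metropolis sampler, sketched below for a single missing continuous covariate. Everything specific here is an assumption made for illustration: the normal covariate model N(mu, sigma²), the proposal scale `step`, the burn-in length, and the function names are not taken from the development above, and any valid MCMC scheme could be substituted.

```python
import numpy as np


def log_target(x_mis, delta_i, h0_cum_i, beta_mis, offset, mu, sigma):
    """Unnormalized log density of x_mis given (x_obs, y_i, delta_i, gamma).

    offset is x_obs' beta_obs, h0_cum_i is H0(y_i), and N(mu, sigma^2) is an
    assumed covariate model f(x | alpha) used only for this illustration.
    """
    eta = x_mis * beta_mis + offset
    return (delta_i * eta - h0_cum_i * np.exp(eta)
            - 0.5 * ((x_mis - mu) / sigma) ** 2)


def metropolis_draws(m, rng, step=1.0, burn_in=200, **kw):
    """Return m random-walk Metropolis draws from the target above."""
    x = kw["mu"]                               # start at the covariate mean
    cur = log_target(x, **kw)
    draws = np.empty(m)
    for it in range(burn_in + m):
        prop = x + step * rng.normal()         # symmetric proposal
        new = log_target(prop, **kw)
        if np.log(rng.uniform()) < new - cur:  # Metropolis accept/reject
            x, cur = prop, new
        if it >= burn_in:
            draws[it - burn_in] = x
    return draws


rng = np.random.default_rng(0)
params = dict(delta_i=1, h0_cum_i=0.5, beta_mis=1.0, offset=0.0,
              mu=0.0, sigma=1.0)
sample = metropolis_draws(500, rng, **params)
```

Within the MCEM iteration, the first argument would play the role of the Monte Carlo sample size and would be increased by Δm from one iteration to the next.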

When x_i,mis is categorical, the E-step at the (t + 1)^st iteration reduces to the EM by the Method of Weights (Ibrahim, 1990). With the EM by the Method of Weights, a similar M-step can be developed. We refer to Ibrahim (1990) and Martinussen (1999) for the detailed development of the EM algorithm in this case. It is easy to see from (5.2) that when there are no missing covariates, β^(t+1) is the MPLE of β, which is consistent with Remark 4.2.
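The baseline hazard update (5.3) can be sketched as follows for distinct failure times. The array layout is an assumption: `x_draws` has shape (n, m, p) and holds the m imputed covariate vectors for each subject, with the same m for every subject for simplicity, and the function name `update_baseline_hazard` is hypothetical. With no missing covariates one can set m = 1, as in the toy call at the end.

```python
import numpy as np


def update_baseline_hazard(beta, y, delta, x_draws):
    """Jumps h0(y_i) and cumulative hazard H0(y_i) as in (5.3)."""
    # Monte Carlo average of exp(x_j' beta) over the m draws for each subject.
    risk_score = np.exp(x_draws @ beta).mean(axis=1)     # shape (n,)
    h0 = np.zeros(len(y))
    for i in np.flatnonzero(delta == 1):
        h0[i] = 1.0 / risk_score[y >= y[i]].sum()        # sum over R(y_i)
    # H0(y_i) adds up the jumps at event times no later than y_i.
    H0 = np.array([h0[(y <= t) & (delta == 1)].sum() for t in y])
    return h0, H0


# Fully observed toy data (m = 1) evaluated at beta = 0.
y = np.array([1.0, 2.0, 3.0])
delta = np.array([1, 1, 0])
x_draws = np.array([[[0.0]], [[1.0]], [[0.0]]])          # shape (3, 1, 1)
h0, H0 = update_baseline_hazard(np.array([0.0]), y, delta, x_draws)
```

At β = 0 every subject has risk score 1, so each jump is one over the size of the risk set, which gives a quick sanity check on the indexing.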

Let γ̂ denote the estimate of γ at EM convergence. Using Louis’s method (Louis, 1982), the estimated observed information matrix of γ based on the observed data is not difficult to compute. Note that the complete-data likelihood function can be written as

L (β, h_{0}, α ∣ D) = \prod_{i = 1}^{n} [{(h_{0} (y_{i}) exp (x_{i}^{'} β))}^{δ_{i}} exp {- H_{0} (y_{i}) exp (x_{i}^{'} β)} f (x_{i} ∣ α)] .

(5.4)

Thus, the log-likelihood function for the i^th observation is given by

l (γ ∣ x_{i}, y_{i}, δ_{i}) = δ_{i} [log h_{0} (y_{i}) + x_{i}^{'} β] - H_{0} (y_{i}) exp (x_{i}^{'} β) + log (f (x_{i} ∣ α)) .

(5.5)

Write the gradient vector of $Q (γ ∣ γ^{(t)})$ as

\dot{Q} (γ ∣ γ^{(t)}) = \sum_{i = 1}^{n} {\dot{Q}}_{i} (γ ∣ γ^{(t)}) = \sum_{i = 1}^{n} \frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} \frac{\partial l (γ ∣ x_{i}^{(t k)}, y_{i}, δ_{i})}{\partial γ},

and write the matrix of second derivatives of $Q (γ ∣ γ^{(t)})$ as

\ddot{Q} (γ ∣ γ^{(t)}) = \sum_{i = 1}^{n} \frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} \frac{\partial^{2} l (γ ∣ x_{i}^{(t k)}, y_{i}, δ_{i})}{\partial γ \partial γ^{'}} .

In addition, write the complete data score vector as

S (γ ∣ D) = \sum_{i = 1}^{n} S_{i} (γ ∣ x_{i}, y_{i}, δ_{i}) = \sum_{i = 1}^{n} \frac{\partial l (γ ∣ x_{i}, y_{i}, δ_{i})}{\partial γ} .

Then, the estimated observed information matrix of γ̂ is given by

I (\hat{γ}) = - \ddot{Q} (\hat{γ} ∣ \hat{γ}) - {[\sum_{i = 1}^{n} \frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} S_{i} (\hat{γ} ∣ x_{i}^{(k)}, y_{i}) S_{i} (\hat{γ} ∣ x_{i}^{(k)}, y_{i})^{'}] - \sum_{i = 1}^{n} {\dot{Q}}_{i} (\hat{γ} ∣ \hat{γ}) {\dot{Q}}_{i} (\hat{γ} ∣ \hat{γ})^{'}},

(5.6)

where $x_{i, mis}^{(1)}, x_{i, mis}^{(2)}, \dots, x_{i, mis}^{(m_{i}^{(t)})}$ is an MCMC sample of size $m_{i}^{(t)}$ from $f (x_{i, mis} ∣ x_{i, obs}, \hat{γ})$, and $x_{i}^{(k)} = (x_{i, mis}^{(k)^{'}}, x_{i, obs}^{'})^{'}$. Thus, the estimate of the asymptotic covariance matrix of γ̂ is [ℐ(γ̂)]⁻¹.
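The braced term in (5.6) can be read as a sum of Monte Carlo covariance matrices of the complete-data scores. As a sketch, write $S_{i}^{(k)}$ for $S_{i} (\hat{γ} ∣ x_{i}^{(k)}, y_{i})$, $m_{i}$ for $m_{i}^{(t)}$, and ${\dot{Q}}_{i}$ for ${\dot{Q}}_{i} (\hat{γ} ∣ \hat{γ})$; this shorthand is introduced here only for the display below. Since ${\dot{Q}}_{i}$ is the Monte Carlo average of the $S_{i}^{(k)}$ by definition, the ith summand satisfies

```latex
\frac{1}{m_{i}} \sum_{k = 1}^{m_{i}} S_{i}^{(k)} S_{i}^{(k)'} - \dot{Q}_{i} \dot{Q}_{i}^{'}
  = \frac{1}{m_{i}} \sum_{k = 1}^{m_{i}}
    \bigl( S_{i}^{(k)} - \dot{Q}_{i} \bigr) \bigl( S_{i}^{(k)} - \dot{Q}_{i} \bigr)^{'} .
```

Each summand is therefore positive semidefinite, so (5.6) has the usual form of Louis's identity: the complete-data information $- \ddot{Q}$ minus an estimate of the information lost because part of $x_{i}$ is missing.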

Finally, we note that when there are ties in the failure times, (5.1), (5.2), and (5.3) can be modified as

\begin{array}{l} Q (γ ∣ γ^{(t)}) = \sum_{i = 1}^{d} [d_{i} log h_{0} (y_{i}) + \frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} z_{i}^{(t k)^{'}} β] - \sum_{i = 1}^{d} h_{0} (y_{i}) \sum_{j \in R (y_{i})} [\frac{1}{m_{j}^{(t)}} \sum_{k = 1}^{m_{j}^{(t)}} exp (x_{j}^{(t k)^{'}} β)] \\ + \sum_{i = 1}^{n} \frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} log f (x_{i, mis}^{(t k)}, x_{i, obs} ∣ α), \end{array}

(5.7)

where $z_{i}^{(t k)} = \sum_{j \in D_{i}} (x_{j, mis}^{(t k)^{'}}, x_{j, obs}^{'})^{'}$ ,

β^{(t + 1)} = \underset{β}{arg max} \sum_{i = 1}^{d} {[\frac{1}{m_{i}^{(t)}} \sum_{k = 1}^{m_{i}^{(t)}} z_{i}^{(t k)^{'}} β] - d_{i} log [\sum_{j \in R (y_{i})} \frac{1}{m_{j}^{(t)}} \sum_{k = 1}^{m_{j}^{(t)}} exp (x_{j}^{(t k)^{'}} β)]},

(5.8)

and

h_{0}^{(t + 1)} (y_{i}) = d_{i} {[\sum_{j \in R (y_{i})} \frac{1}{m_{j}^{(t)}} \sum_{k = 1}^{m_{j}^{(t)}} exp (x_{j}^{(t k)^{'}} β^{(t + 1)})]}^{- 1}, H_{0}^{(t + 1)} (y_{i}) = \sum_{y_{j} \leq y_{i}, δ_{j} = 1} h_{0}^{(t + 1)} (y_{j}) .

(5.9)

The calculation of ℐ(γ̂) needs to be modified accordingly in the presence of ties. Again, the above formulation can be easily extended to the case where x_i,mis is categorical.

6 Analysis of Small Cell Lung Cancer Data

For the LCCC 9719 data discussed in Section 2, we use the proposed methods to estimate the regression coefficients assuming the missing covariates are MAR. We consider a Cox regression model for [y_i | x_i, β, h₀] allowing for right censoring. Thus, we have

f (y_{i} ∣ δ_{i}, x_{i}, β, h_{0}) = {[h_{0} (y_{i}) exp (x_{i}^{'} β)]}^{δ_{i}} exp {- H_{0} (y_{i}) exp (x_{i}^{'} β)},

where x_i = (x_i1, …, x_i5)′ is a 5 × 1 vector of covariates, i = 1, 2, …, n, β = (β₁, …, β₅)′ is the vector of the corresponding regression coefficients, and h₀(y_i) and H₀(y_i) denote the baseline hazard function and the cumulative baseline hazard function, respectively. Since x_i1, x_i2, and x_i3 are always observed, they do not need to be modeled. Thus, we only need to model the two covariates with missing values, (x₄, x₅), conditioning on the completely observed covariates throughout. We consider two models for (x₄, x₅): [x₄|x₁, x₂, x₃][x₅|x₁, x₂, x₃, x₄] and [x₄|x₁, x₂, x₃, x₅][x₅|x₁, x₂, x₃]. We use a logistic regression model for x_i4 and a normal linear regression model for x_i5. For example, for [x₄|x₁, x₂, x₃][x₅|x₁, x₂, x₃, x₄], we have

f (x_{i 4} ∣ x_{i 1}, x_{i 2}, x_{i 3}, α_{4}) = \frac{exp {x_{i 4} (α_{40} + α_{41} x_{i 1} + α_{42} x_{i 2} + α_{43} x_{i 3})}}{1 + exp (α_{40} + α_{41} x_{i 1} + α_{42} x_{i 2} + α_{43} x_{i 3})},

where α₄ = (α₄₀, α₄₁, α₄₂, α₄₃)′, and

f (x_{i 5} ∣ x_{i 1}, x_{i 2}, x_{i 3}, x_{i 4}, α_{5}) = \frac{1}{\sqrt{2 π α_{55}}} exp {- {[x_{i 5} - (α_{50} + α_{51} x_{i 1} + \dots + α_{54} x_{i 4})]}^{2} / (2 α_{55})},

where α₅ = (α₅₀, α₅₁, …, α₅₅)′.

To illustrate how to apply the theorems presented in Sections 3 and 4, we consider a subset of the LCCC 9719 data, which is given in Table 2. Since all of the covariates are observed in this subset, using (3.1) after excluding the rows corresponding to δ_i = 0 or x_j = x_i, X^* is a 35 × 5 matrix. The first 8 rows are given by $x_{i}^{'} - x_{1}^{'}$ for i = 2, 3, …, 9, and the last row is given by $x_{9}^{'} - x_{7}^{'}$. Using the Maple (Version 8) linsolve function, with Maple code “linsolve(X^*, v);” after loading the linalg package, we obtain a closed form solution for X^*′v = 0 and find that there indeed exists a positive vector v > 0 satisfying X^*′v = 0. Also, |X^*′X^*| = 9.2344 × 10¹⁰ > 0. Thus, conditions (C1) and (C2) given in Theorem 3.1 are met for this subset. As discussed in Remark 3.4, when conditions (C1) and (C2) are satisfied for a subset of the data, these two conditions hold for the entire set of completely observed cases. In addition, using the results established in Chen and Shao (2001), we can show that $lim_{∥ α ∥ \to \infty} L (α ∣ D_{obs}) = 0$, where L(α|D_obs) is defined by (4.3); the details are omitted here for brevity. Thus, based on Theorem 4.1, the MLE does exist for the entire dataset.
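The construction of X^* from Table 2 can be sketched as follows. This is an illustration under stated assumptions, not the Maple computation used in the paper: it assumes that the rows of X^* are the differences x_j − x_i over failures i and later subjects j with x_j ≠ x_i, it uses the raw (unstandardized) covariates from Table 2, and the function names are hypothetical. It only checks the dimension of X^* and the sign of |X^*′X^*|; it does not search for a positive vector v, which would need a linear-programming feasibility solve.

```python
from fractions import Fraction

# Table 2: (y, delta, x1, x2, x3, x4, x5), already sorted by y.
TABLE2 = [
    (0.394, 1, 1, 0, 68, 0, 54),
    (1.083, 1, 0, 0, 81, 0, 79),
    (1.116, 1, 1, 1, 82, 0, 64),
    (1.149, 1, 0, 1, 58, 1, 86),
    (1.313, 1, 1, 1, 52, 1, 54),
    (3.973, 1, 1, 0, 69, 1, 92),
    (6.665, 1, 0, 0, 54, 1, 83),
    (9.521, 0, 1, 0, 62, 0, 67),
    (14.380, 0, 0, 1, 81, 0, 80),
]

def build_x_star(rows):
    """Rows x_j - x_i for each failure i and each later subject j with x_j != x_i."""
    out = []
    for i, (_, d_i, *x_i) in enumerate(rows):
        if d_i == 0:
            continue  # censored subjects contribute no rows
        for (_, _, *x_j) in rows[i + 1:]:
            diff = [a - b for a, b in zip(x_j, x_i)]
            if any(diff):  # drop rows with x_j == x_i
                out.append(diff)
    return out

def gram_determinant(x_star):
    """Exact determinant of X*'X* by fraction-based Gaussian elimination."""
    p = len(x_star[0])
    g = [[Fraction(sum(r[a] * r[b] for r in x_star)) for b in range(p)]
         for a in range(p)]
    det = Fraction(1)
    for c in range(p):
        pivot = next((r for r in range(c, p) if g[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != c:
            g[c], g[pivot] = g[pivot], g[c]
            det = -det  # a row swap flips the sign
        det *= g[c][c]
        for r in range(c + 1, p):
            factor = g[r][c] / g[c][c]
            for k in range(c, p):
                g[r][k] -= factor * g[c][k]
    return det

X_STAR = build_x_star(TABLE2)
print(len(X_STAR), len(X_STAR[0]))   # dimension of X*
print(gram_determinant(X_STAR) > 0)  # True when X* has full column rank
```

Exact rational arithmetic is used so that the sign of the determinant does not depend on floating-point rounding.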

Table 2.

A Subset of LCCC 9719 Data

Obs (i)	y_i	δ_i	x_i₁	x_i₂	x_i₃	x_i₄	x_i₅
1	0.394	1	1	0	68	0	54
2	1.083	1	0	0	81	0	79
3	1.116	1	1	1	82	0	64
4	1.149	1	0	1	58	1	86
5	1.313	1	1	1	52	1	54
6	3.973	1	1	0	69	1	92
7	6.665	1	0	0	54	1	83
8	9.521	0	1	0	62	0	67
9	14.380	0	0	1	81	0	80

Since the MPLE of β and the MLE of (β, h₀, α) exist for this dataset, we can compute various estimates of β, α₄, and α₅. We standardized age and QOL score in order to make the numerical computations more stable. We used the SAS procedure PHREG to obtain the MPLE of β for the complete case (CC) analysis (i.e., an analysis deleting all of the cases with missing values). The MCEM algorithm discussed in Section 5 was implemented using FORTRAN 77 with IMSL. The estimated observed information matrix ℐ(γ̂) given by (5.6) is of dimension (102+15)×(102+15), and its inverse was computed via the IMSL subroutine DLINDS. The Gibbs sampling algorithm was used to generate the Monte Carlo sample with 500 “burn-in” iterations at each MCEM iteration. In the MCEM, we took $m^{(0)} = 500$ and Δm = 50. The convergence criterion for the MCEM algorithm for obtaining the MLE was that the squared distance between the tth and (t + 20)th iterations was less than 10⁻³. The MCEM algorithm for obtaining the MLE of (β, h₀, α) required only 25 iterations, using $m^{(t)} = 1750$ at convergence.
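The reported sample sizes are consistent with the linear schedule of Section 5. As a small check, assuming the schedule is indexed so that 25 increments are applied to the initial size (the function name is hypothetical):

```python
def mcem_sample_sizes(m0, delta_m, n_iter):
    """Return [m^(0), m^(1), ..., m^(n_iter)] under m^(t+1) = m^(t) + delta_m."""
    return [m0 + t * delta_m for t in range(n_iter + 1)]

sizes = mcem_sample_sizes(500, 50, 25)
print(sizes[0], sizes[-1])  # 500 1750
```

Starting from 500 draws and adding 50 per iteration gives 500 + 25 × 50 = 1750 draws after 25 iterations, which matches the value quoted at convergence.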

The resulting MPLEs and MLEs are shown in Tables 3, 4, and 5 for the complete case (CC) analysis as well as analyses incorporating all of the cases with two different models for (x₄, x₅). In the tables, standard errors (SEs), Z-statistics, p-values, and 95% confidence intervals for β are also reported. We can see some differences between the estimates in Tables 3 and 4. In the CC analysis, the 95% confidence interval for β₁ is (−0.024, 0.967) while the 95% confidence interval is (0.133, 0.820) in the analysis incorporating all of the cases, which indicates that the regression coefficient for treatment is not significant at the 0.05 level in the CC analysis, but significant in the analysis incorporating all of the cases. This indicates that continuous therapy followed by second line therapy may have a strong effect (i.e., more beneficial) compared to defined duration of therapy with respect to time to progression. Also, the SEs from the analysis incorporating all of the cases are consistently smaller than those from the CC analysis for all of the β_j’s. This is expected since more information is used in the all case analysis.
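The Z-statistics, p-values, and confidence intervals in the tables follow from each estimate and its standard error. A minimal sketch, assuming standard two-sided Wald inference based on the normal distribution (the function name is hypothetical, and small discrepancies from the tabulated values are expected because the tabulated estimates and SEs are rounded to three decimals):

```python
import math

def wald_summary(estimate, se, z_crit=1.959964):
    """Z-statistic, two-sided normal p-value, and 95% Wald interval."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return z, p, ci

# Table 4 entry for beta_1: estimate 0.477, SE 0.175.
z, p, ci = wald_summary(0.477, 0.175)
print(round(z, 3), round(p, 3), tuple(round(c, 3) for c in ci))
```

Applied to the same coefficient in the CC analysis (estimate 0.471, SE 0.253), the interval includes zero, which is the contrast discussed above.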

Table 3.

Maximum Partial Likelihood Estimates of β for Complete Case Analysis

Parameter	MPLE	SE	Z-statistic	p-value	95% CI
β₁	0.471	0.253	1.864	0.062	(−0.024, 0.967)
β₂	0.068	0.243	0.280	0.780	(−0.409, 0.545)
β₃	−0.020	0.130	−0.154	0.878	(−0.275, 0.235)
β₄	0.878	0.411	2.140	0.032	(0.074, 1.684)
β₅	−0.138	0.119	−1.158	0.247	(−0.372, 0.096)

Table 4.

Maximum Likelihood Estimates of β Based On All Observed Data and Model [x₄|x₁, x₂, x₃][x₅|x₁, x₂, x₃, x₄] for Missing Covariates

Parameter	MLE	SE	Z-statistic	p-value	95% CI
β₁	0.477	0.175	2.723	0.006	(0.133, 0.820)
β₂	0.174	0.180	0.966	0.334	(−0.179, 0.528)
β₃	−0.021	0.090	−0.238	0.812	(−0.198, 0.155)
β₄	0.914	0.381	2.400	0.016	(0.168, 1.661)
β₅	−0.052	0.105	−0.490	0.624	(−0.258, 0.155)

Table 5.

Maximum Likelihood Estimates of β Based On All Observed Data and Model [x₄|x₁, x₂, x₃, x₅][x₅|x₁, x₂, x₃] for Missing Covariates

Parameter	MLE	SE	Z-statistic	p-value	95% CI
β₁	0.477	0.175	2.722	0.006	(0.133, 0.820)
β₂	0.173	0.180	0.959	0.338	(−0.181, 0.527)
β₃	−0.021	0.090	−0.233	0.816	(−0.197, 0.155)
β₄	0.914	0.388	2.356	0.018	(0.154, 1.674)
β₅	−0.053	0.106	−0.501	0.616	(−0.261, 0.155)

The reason why we considered two models for (x₄, x₅) is that there are two possibilities in modeling the joint covariate distribution as a sequence of one dimensional conditional distributions. As Ibrahim, Lipsitz, and Chen (1999) point out, it is important to conduct a sensitivity analysis to examine whether inference about the parameters of primary interest, which are the β_j’s in this case, is robust with respect to the order of conditioning in the covariate distributions. From Tables 4 and 5, both estimates and SEs for all the β_j’s are very close for these two joint covariate distributions. Thus, inference about β is quite robust with respect to these two different orders of conditioning.

Finally, the estimated baseline hazard functions h₀(y) are plotted in Figure 1 for the complete case analysis as well as the analysis incorporating all of the cases, labeled Complete Cases and All Cases, respectively. In the all case analysis, the model [x₄|x₁, x₂, x₃][x₅|x₁, x₂, x₃, x₄] for (x₄, x₅) was used since an almost identical estimated baseline hazard function was obtained under the model [x₄|x₁, x₂, x₃, x₅][x₅|x₁, x₂, x₃]. Strikingly, the CC analysis resulted in a much different (larger) estimate of the baseline hazard than the all case analysis, which further demonstrates the importance of incorporating all of the cases into the analysis.

Figure 1. Estimated baseline hazard function h₀(y) for the CC and all case analyses.

Acknowledgments

The authors wish to thank the Editor, the Associate Editor, and two referees for helpful comments and suggestions which have improved the paper. Dr. Chen and Dr. Ibrahim’s research was partially supported by NIH grants #GM 70335 and #CA 74015. Dr. Shao’s research was partially supported by HKUST DAG05/06.SC27 and RGC 602206.

Appendix: Proofs of Theorems

We first establish a useful result, which is formally stated in the following lemma.

Lemma A.1

Let X^* be an n^* × p matrix (p < n^*). Also let R^{n^*} denote the n^*-dimensional Euclidean space. If there is no positive vector v = (v₁, v₂, …, v_n^*)′ ∈ R^{n^*} (denoted by v > 0, i.e., v_i > 0 for i = 1, 2, …, n^*) such that

X^{*^{'}} v = 0,

(A.1)

then there exists a non-zero vector b ∈R^p such that

b^{'} x_{i}^{*} \leq 0,

(A.2)

where $x_{i}^{*}$ is the ith row of X^*.

Proof

Let 𝒞 = {X^*′v : v > 0, v ∈ R^{n^*}}. Then 𝒞 is a convex cone in R^p (see Theorem 2.6 in Rockafellar (1970)). Since (A.1) does not hold for any v > 0, we have 0 ∉ 𝒞, and by Corollary 11.7.3 of Rockafellar (1970), there exists some non-zero vector b such that ∀ v > 0, b′X^*′v ≤ 0 and hence ∀ v ≥ 0, b′X^*′v ≤ 0. In particular, (A.2) holds.

Proof of Theorem 3.1

Observe that for δ = 0 or 1 and x > −1

{(\frac{1}{1 + x})}^{δ} = \int_{0}^{\infty} e^{- t (1 + δ x)} d t .

(A.3)
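Identity (A.3) is the elementary exponential integral evaluated at a = 1 + δx, which is positive because x > −1. As a short worked step:

```latex
\int_{0}^{\infty} e^{- t a} \, d t = \frac{1}{a} \quad (a > 0),
\qquad
a = 1 + δ x =
\begin{cases}
1, & δ = 0, \\
1 + x, & δ = 1,
\end{cases}
\qquad \text{so} \qquad
\int_{0}^{\infty} e^{- t (1 + δ x)} \, d t = {\Bigl( \frac{1}{1 + x} \Bigr)}^{δ} .
```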

Without loss of generality (WLOG), assume y₁ ≤ y₂ ≤ … ≤ y_n. Then

\begin{array}{l} L_{p} (β ∣ D_{obs}) = \prod_{i = 1}^{n} {(\frac{exp (x_{i}^{'} β)}{\sum_{j \in R (y_{i})} exp (x_{j}^{'} β)})}^{δ_{i}} = \prod_{i = 1}^{n} {(\frac{1}{1 + \sum_{j > i} exp ((x_{j} - x_{i})^{'} β)})}^{δ_{i}} \\ = \prod_{i = 1}^{n} \int_{0}^{\infty} exp (- t_{i} (1 + δ_{i} \sum_{j > i} exp ((x_{j} - x_{i})^{'} β))) d t_{i} \\ = \int_{R^{+ n}} exp (- \sum_{i = 1}^{n} t_{i}) \prod_{1 \leq i \leq n, j > i} {(exp (- exp (- (x_{i} - x_{j})^{'} β)))}^{t_{i} δ_{i}} d t \\ = \int_{R^{+ n}} exp (- \sum_{i = 1}^{n} t_{i}) \prod_{1 \leq i \leq n, j > i} F {((x_{i} - x_{j})^{'} β)}^{t_{i} δ_{i}} d t, \end{array}

(A.4)

where t = (t₁, t₂, …, t_n)′, R⁺ⁿ = R⁺ ×…×R⁺ with R⁺ = (0, ∞), and F (u) = exp(−exp(−u)).
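The first equality in (A.4), which rewrites the partial likelihood in terms of the differences x_j − x_i, can be checked numerically. The sketch below assumes subjects are sorted by time with no ties, so that R(y_i) = {j ≥ i}; the function names and the toy data in the usage lines are hypothetical.

```python
import math

def partial_likelihood(x, delta, beta):
    """Cox partial likelihood in its usual risk-set form."""
    eta = [sum(a * b for a, b in zip(xi, beta)) for xi in x]
    lik = 1.0
    for i, d in enumerate(delta):
        if d == 1:
            lik *= math.exp(eta[i]) / sum(math.exp(e) for e in eta[i:])
    return lik

def partial_likelihood_differences(x, delta, beta):
    """Same quantity written with x_j - x_i for j > i, as in (A.4)."""
    lik = 1.0
    for i, d in enumerate(delta):
        if d == 1:
            s = sum(
                math.exp(sum((a - c) * b for a, c, b in zip(x[j], x[i], beta)))
                for j in range(i + 1, len(x))
            )
            lik *= 1.0 / (1.0 + s)
    return lik

# Toy data (hypothetical): four subjects sorted by time, two covariates.
x_toy = [[1.0, 0.2], [0.0, -0.5], [1.0, 1.5], [0.0, 0.3]]
delta_toy = [1, 0, 1, 1]
beta_toy = [0.4, -0.7]
print(math.isclose(
    partial_likelihood(x_toy, delta_toy, beta_toy),
    partial_likelihood_differences(x_toy, delta_toy, beta_toy),
))
```

The difference form makes it visible that the partial likelihood depends on the covariates only through the rows of X^*.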

Sufficiency

WLOG, we assume that L_p(β|D_obs) ≢ 0. Then, there exists a β₀ such that L_p(β₀|D_obs) > 0. Let M > 1 such that

\frac{1}{1 - log F (- M)} < L_{p} (β_{0} ∣ D_{obs}) .

For β satisfying $max_{1 \leq i \leq n, j > i} δ_{i} (x_{j} - x_{i})^{'} β > M$, there exist i₀ and j₀ achieving the maximum such that $δ_{i_{0}} (x_{i_{0}} - x_{j_{0}})^{'} β < - M$. Since F is a nondecreasing distribution function, we have

\begin{array}{l} L_{p} (β ∣ D_{obs}) \leq \int_{R^{+ n}} exp (- \sum_{i = 1}^{n} t_{i}) F {(δ_{i_{0}} (x_{i_{0}} - x_{j_{0}})^{'} β)}^{t_{i_{0}}} d t \leq \int_{R^{+ n}} exp (- \sum_{i = 1}^{n} t_{i}) F {(- M)}^{t_{i_{0}}} d t \\ = \int_{0}^{\infty} exp {- [1 - log F (- M)] t_{i_{0}}} d t_{i_{0}} = \frac{1}{1 - log F (- M)} \leq L_{p} (β_{0} ∣ D_{obs}) . \end{array}

(A.5)

When $max_{1 \leq i \leq n, j > i} δ_{i} (x_{j} - x_{i})^{'} β \leq M$, following Lemma 4.1 in Chen and Shao (2001), conditions (C1) and (C2) imply that

∥ β ∥ = \sqrt{β^{'} β} \leq D for some 0 < D < \infty .

(A.6)

Combining (A.5) and (A.6) leads to $sup_{β} L_{p} (β ∣ D_{obs}) = sup_{β : ∥ β ∥ \leq D} L_{p} (β ∣ D_{obs})$. Since L_p(β|D_obs) is a continuous and bounded function, there exists a β̂ such that

L_{p} (\hat{β} ∣ D_{obs}) = sup_{β : ∥ β ∥ \leq D} L_{p} (β ∣ D_{obs})

and hence the MPLE exists.

Necessity

Assume that the MPLE of β exists. Then, there is a β^* such that

L_{p} (β^{*} ∣ D_{obs}) = sup_{β} L_{p} (β ∣ D_{obs}) and ∥ β^{*} ∥ < \infty .

Assume that condition (C2) does not hold. Then, by Lemma A.1, there exists a non-zero vector b such that δ_i(x_j − x_i)′b ≤ 0 for all 1 ≤ i ≤ n and j > i. Thus,

\begin{array}{l} L_{p} (β^{*} + s b ∣ D_{obs}) = \int_{R^{+ n}} exp (- \sum_{i = 1}^{n} t_{i}) \prod_{1 \leq i \leq n, j > i} F {(δ_{i} (x_{i} - x_{j})^{'} β^{*} + δ_{i} s (x_{i} - x_{j})^{'} b)}^{t_{i} δ_{i}} d t \\ = \int_{R^{+ n}} exp (- \sum_{i = 1}^{n} t_{i}) \prod_{1 \leq i \leq n, j > i} F {(δ_{i} (x_{i} - x_{j})^{'} β^{*} - s δ_{i} (x_{j} - x_{i})^{'} b)}^{t_{i} δ_{i}} d t \end{array}

which is an increasing function of s when condition (C1) holds. This is a contradiction. This shows that condition (C2) is necessary for the existence of the MPLE for β if condition (C1) is satisfied.
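The necessity argument can be illustrated with a toy dataset in which the direction b of Lemma A.1 exists. In the sketch below (hypothetical data, a single covariate, no ties), every failure has a larger covariate value than all subjects still at risk, so (x_j − x_i)b ≤ 0 for b = 1, and the partial likelihood keeps increasing along s without attaining its supremum.

```python
import math

def partial_likelihood_1d(x, delta, beta):
    """Partial likelihood for one covariate; subjects sorted by time, no ties."""
    lik = 1.0
    for i, d in enumerate(delta):
        if d == 1:
            lik *= 1.0 / (1.0 + sum(math.exp((x[j] - x[i]) * beta)
                                    for j in range(i + 1, len(x))))
    return lik

# Hypothetical data: covariate decreases with failure order.
x_sep = [3.0, 2.0, 1.0]
delta_sep = [1, 1, 1]

# Evaluate along beta* + s * b with beta* = 0 and b = 1.
values = [partial_likelihood_1d(x_sep, delta_sep, s) for s in (0.0, 1.0, 5.0, 20.0)]
print(values)
```

The values increase toward 1 as s grows, so no finite maximizer exists for this configuration.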

Proof of Theorem 4.1

Write

L^{*} (β, α ∣ D_{obs}) = \int {\prod_{i = 1}^{d} exp (x_{i}^{'} β) {[\sum_{j \in R (y_{i})} exp (x_{j}^{'} β)]}^{- 1}} [\prod_{i = 1}^{n} f (x_{i, mis}, x_{i, obs} ∣ α)] d x_{mis}

(A.7)

It is sufficient to prove that

lim_{∥ β ∥ + ∥ α ∥ \to \infty} L^{*} (β, α ∣ D_{obs}) = 0.

(A.8)

Observe that

L_{p} (β ∣ D) = \prod_{i = 1}^{d} exp (x_{i}^{'} β) {[\sum_{j \in R (y_{i})} exp (x_{j}^{'} β)]}^{- 1}

is an increasing function in $x_{j}^{'} β$ for δ_j = 1 and a decreasing function in $x_{j}^{'} β$ for δ_j = 0. For 1 ≤ l ≤ p, let $x_{i l}^{R} = b_{i}^{*} = δ_{i} a_{i} + (1 - δ_{i}) b_{i}$ if β_l ≥ 0 and $x_{i l}^{R} = a_{i}^{*} = (1 - δ_{i}) a_{i} + δ_{i} b_{i}$ if β_l < 0. Write $x_{i}^{*} = ((x_{i, mis}^{R})^{'}, x_{i, obs}^{'})^{'}$ and $x_{i, mis}^{R} = (x_{i l}^{R}, r_{i l} = 0, 1 \leq l \leq p)^{'}$ . Let ℛ_i = ℛ(y_i) − {i}. Then we have

L_{p} (β ∣ D) \leq L_{p}^{*} (β ∣ D) \equiv \prod_{i = 1}^{d} exp ((x_{i}^{*})^{'} β) {[exp ((x_{i}^{*})^{'} β) + \sum_{δ_{j} = 0, j \in R_{i}} exp ((x_{j}^{*})^{'} β)]}^{- 1} .

(A.9)

It directly follows from (A.7), (A.9) and (4.3) that

L^{*} (β, α ∣ D_{obs}) \leq L_{p}^{*} (β ∣ D) L (α ∣ D_{obs}) .

Following the proof of Theorem 3.1, ${lim}_{∥ β ∥ \to \infty} L_{p}^{*} (β ∣ D) = 0$ if conditions (C2^*) and (C3^*) are satisfied. Consequently, we obtain (A.8) under condition (C1^*).

Proof of Theorem 4.2

Let 1 = (1, 1, …, 1)′. Then we have

L_{p} (β ∣ D) \leq \prod_{r_{i} = 1, 1 \leq i \leq d} exp (x_{i}^{'} β) {[exp (x_{i}^{'} β) + \sum_{δ_{j} = 0, r_{j} = 1, j \in R_{i}} exp (x_{j}^{'} β)]}^{- 1} .

Therefore, the above inequality, condition (C1^*) in Theorem 4.1, and conditions (C1) and (C2) stated in Theorem 3.1 directly yield the existence of the MLE of (β, h₀, α).

Footnotes

AMS 2000 subject classifications. Primary 62N02, 62F15; secondary 62N99, 65C05.

References

Booth JG, Hobert JP. Maximizing Generalized Linear Mixed Model likelihoods with an Automated Monte Carlo EM Algorithm. Journal of the Royal Statistical Society, Series B. 1999;61:265–285.
Breslow NE. Covariance Analysis of Censored Survival Data. Biometrics. 1974;30:89–99.
Chen HY. Double Nonparametric Likelihood Method for the Cox regression model with missing covariates. Journal of the American Statistical Association. 2002;97:565–576.
Chen HY, Little RJA. Proportional Hazards Regression with Missing Covariates. Journal of the American Statistical Association. 1999;94:896–908.
Chen MH, Ibrahim JG. Maximum Likelihood Methods for Cure Rate Models with Missing Covariates. Biometrics. 2001;57:43–52. doi: 10.1111/j.0006-341x.2001.00043.x.
Chen MH, Ibrahim JG, Shao QM. On Propriety of the Posterior Distribution and Existence of the Maximum Likelihood Estimator for Regression models with Covariates Missing at Random. Journal of the American Statistical Association. 2004;99:421–438.
Chen MH, Ibrahim JG, Shao QM. Posterior Propriety and Computation for the Cox Regression Model with Applications to Missing Covariates. Biometrika. 2006;93:791–807.
Chen MH, Shao QM. Propriety of Posterior Distribution for Dichotomous Quantal Response Models With General Link Functions. Proceedings of the American Mathematical Society. 2001;129:293–302.
Cox DR. Regression Models and Life Tables (with Discussion). Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
Cox DR. Partial Likelihood. Biometrika. 1975;62:269–276.
Herring AH, Ibrahim JG. Likelihood-based Methods for Missing Covariates in the Cox Proportional Hazards Model. Journal of the American Statistical Association. 2001;96:292–302.
Herring AH, Ibrahim JG, Lipsitz SR. Frailty Models With Missing Covariates. Biometrics. 2002;58:98–109. doi: 10.1111/j.0006-341x.2002.00098.x.
Herring AH, Ibrahim JG, Lipsitz SR. Nonignorably Missing Covariate Data in Survival Analysis: A Case Study of an International Breast Cancer Study Group Trial. Applied Statistics. 2004;53:293–310.
Huang L, Chen M-H, Ibrahim JG. Bayesian Analysis for Generalized Linear Models with Nonignorably Missing Covariates. Biometrics. 2005;61:729–737. doi: 10.1111/j.1541-0420.2005.00338.x.
Ibrahim JG. Incomplete Data in Generalized Linear Models. Journal of the American Statistical Association. 1990;85:765–769.
Ibrahim JG, Lipsitz SR, Chen MH. Missing Covariates in Generalized Linear Models When the Missing Data Mechanism is Nonignorable. Journal of the Royal Statistical Society, Series B. 1999;61:173–190.
Jacobsen M. Existence and Unicity of MLEs in Discrete Exponential Family Distributions. Scandinavian Journal of Statistics. 1989;16:335–349.
Johansen S. An Extension of Cox’s Regression Model. International Statistical Review. 1983;51:258–262.
Klein JP, Moeschberger ML. Survival Analysis. 2nd ed. New York: Springer-Verlag; 2003.
Lin DY, Ying Z. Cox Regression With Incomplete Covariate Measurements. Journal of the American Statistical Association. 1993;88:1341–1349.
Lipsitz SR, Ibrahim JG. Using the EM Algorithm for Survival Data with Incomplete Categorical Covariates. Lifetime Data Analysis. 1996;2:5–14. doi: 10.1007/BF00128467.
Lipsitz SR, Ibrahim JG. Estimating Equations with Incomplete Categorical Covariates in the Cox Model. Biometrics. 1998;54:1002–1013.
Lipsitz SR, Ibrahim JG. Estimation with Correlated Censored Survival Data with Missing Covariates. Biostatistics. 2000;1:315–327. doi: 10.1093/biostatistics/1.3.315.
Louis T. Finding the Observed Information Matrix When Using the EM Algorithm. Journal of the Royal Statistical Society, Series B. 1982;44:226–233.
Martinussen T. Cox Regression with Incomplete Covariate Measurements Using the EM-Algorithm. Scandinavian Journal of Statistics. 1999;26:479–491.
Paik MC. Multiple Imputation for the Cox Proportional Hazards Model with Missing Covariates. Lifetime Data Analysis. 1997;3:289–298. doi: 10.1023/a:1009657116403.
Paik MC, Tsai WY. On Using the Cox Proportional Hazards Model with Missing Covariates. Biometrika. 1997;84:579–593.
Pons O. Estimation in the Cox Model with Missing Covariate Data. Journal of Nonparametric Statistics. 2002;14:223–247.
Rockafellar RT. Convex Analysis. Princeton, N.J: Princeton University Press; 1970.
Roy V, Hobert JP. Convergence Rates and Asymptotic Standard Errors for Markov Chain Monte Carlo Algorithms for Bayesian Probit Regression. Journal of the Royal Statistical Society, Series B. 2007;69:607–623.
Schluchter M, Jackson K. Log-linear Analysis of Censored Survival Data with Partially Observed Covariates. Journal of the American Statistical Association. 1989;84:42–52.
Socinski MA, Schell MJ, Peterman A, Bakri K, Yates S, Gitten R, Unger P, Lee J, Lee Ji, Tynan M, Moore M, Kies M. Phase III Trial Comparing Defined Duration of Therapy Versus Continuous Therapy Followed by Second-Line Therapy in Advanced-Stage IIIB/IV Non-Small-Cell Lung Cancer. Journal of Clinical Oncology. 2002;20:1335–1343. doi: 10.1200/JCO.2002.20.5.1335.
Tsai HK, Chen M-H, McLeod DG, Carroll PR, Richie JP, D’Amico AV. Cancer-Specific Mortality Following Radical Prostatectomy or Radiation Therapy with Short-Course Hormonal Therapy in Men with Localized, Unfavorable-Risk Prostate Cancer. 2006. doi: 10.1002/cncr.22279. Submitted.
