Efficient Estimation for Semiparametric Structural Equation Models With Censored Data

Kin Yau Wong; Donglin Zeng; D Y Lin

doi:10.1080/01621459.2017.1299626

. Author manuscript; available in PMC: 2019 Jun 6.

Published in final edited form as: J Am Stat Assoc. 2018 Jun 6;113(522):893–905. doi: 10.1080/01621459.2017.1299626

Efficient Estimation for Semiparametric Structural Equation Models With Censored Data

Kin Yau Wong ¹, Donglin Zeng ¹, D Y Lin ¹

PMCID: PMC6075718 NIHMSID: NIHMS844851 PMID: 30083023

Abstract

Structural equation modeling is commonly used to capture complex structures of relationships among multiple variables, both latent and observed. We propose a general class of structural equation models with a semiparametric component for potentially censored survival times. We consider nonparametric maximum likelihood estimation and devise a combined Expectation-Maximization and Newton-Raphson algorithm for its implementation. We establish conditions for model identifiability and prove the consistency, asymptotic normality, and semiparametric efficiency of the estimators. Finally, we demonstrate the satisfactory performance of the proposed methods through simulation studies and provide an application to a motivating cancer study that contains a variety of genomic variables. Supplementary materials for this article are available online.

Keywords: Integrative analysis, Joint modeling, Latent variables, Model identifiability, Nonparametric maximum likelihood estimation, Survival analysis

1. INTRODUCTION

Structural equation modeling (SEM) is a very general and powerful approach to capture complex relationships among multiple factors, both observed and latent (Bollen 1989). A typical SEM framework consists of a structural model that connects latent variables and a measurement model that relates latent variables to observed variables. SEM is extremely popular in the social sciences and psychology, where unmeasured quantities and psychological constructs, such as human intelligence and creativity, can be related to and investigated through observed data. The text of Bollen (1989) has been cited more than 20,000 times. Recently, SEM has gained popularity in medical and public health research (Dahly et al. 2009; Naliboff et al. 2012).

Our interest in SEM was motivated by its potential application to integrative analysis in genomic studies. Recent technological advances have made it possible to collect different types of genomic data, including DNA copy number, SNP genotype, DNA methylation level, and expression levels of mRNA, microRNA, and protein, on a large number of subjects. There is a growing interest in integrating these genomic platforms so as to understand their biological relationships and predict disease progression and death, which are considered potentially censored survival times (The Cancer Genome Atlas (TCGA); https://tcga-data.nci.nih.gov/tcga/).

SEM with discrete survival times has been studied by Rabe-Hesketh et al. (2001; 2004), Muthén and Masyn (2005), and Moustaki and Steele (2005). For continuous survival time, Larsen (2004; 2005) adopted the proportional hazards model (Cox 1972) with a single latent variable to capture the association between the survival time and other observed variables; Asparouhov et al. (2006) considered a more general formulation of the association among the latent and observed variables. SEM with the Cox proportional hazards model for the survival component has been adopted for more complex settings, such as multivariate survival times (Stoolmiller and Snyder 2006) and competing risks (Stoolmiller and Snyder 2013). A popular software program, Mplus (Muthén and Muthén 1998–2015), has implemented SEM with survival data under the proportional hazards model. The estimation of the nonparametric baseline hazard function is based on piecewise-constant splines, and no theoretical justification is available. In fact, the standard error estimator for the baseline hazard function is incorrect.

In this article, we propose a general SEM framework that includes a semiparametric component of the measurement model for potentially censored survival times. Specifically, we formulate the effects of latent and observed covariates on survival times through a broad class of semiparametric transformation models that includes the proportional hazards model as a special case. The observed covariates may include manifest variables that depend on latent variables. We study nonparametric maximum likelihood estimation (NPMLE), under which the cumulative hazard functions are estimated by step functions with jumps at observed survival times.

The proposed SEM is reminiscent of joint modeling for survival and longitudinal data (Henderson et al. 2000; Tsiatis and Davidian 2004). With the latter, the observed longitudinal variables are considered error-prone measurements of some underlying latent variables, but the measurements themselves are not causal determinants of the survival time. By contrast, our SEM framework allows latent variables to have direct effects on survival times, as well as indirect effects through other manifest variables. In addition, our framework accommodates much more complex relationships among latent variables.

A major challenge in our theoretical development is model identifiability. Even for an SEM with normally distributed variables, no single set of conditions exists that is both necessary and sufficient for model identifiability. Methods that deal with special cases of the normal SEM were proposed by Bollen (1989), Reilly and O’Brien (1996), Vicard (2000), and Bollen and Davis (2009), among others. Most of the methods are based on the fact that identifiability can be established by solving the equations relating the first two model-implied moments to the sample moments. This approach is not directly applicable to models with nonparametric components, as infinite-dimensional parameters cannot be identified through a finite number of equations. Because the proportional hazards structure results in a likelihood function that takes the form of a Laplace transform, however, we are able to develop sufficient conditions under which the identifiability of a semiparametric SEM can be established by inspecting simpler parametric models.

Another theoretical challenge is the invertibility of the information operator. For the information operator to be invertible, we require that the score statistic along any nontrivial submodel is non-zero. As in the case of model identifiability, general conditions for the invertibility of the information operator for semiparametric models do not exist. In the existing work involving latent variables for survival times (Kosorok et al. 2004; Zeng and Lin 2010), verifying the invertibility of the information operator involves inspecting the local behavior of the score statistic around the zero survival time. This approach does not make full use of the variability of the score statistic contributed by the survival times and cannot deal with the proposed general modeling framework. We show that the invertibility of the information operator can be verified by inspecting the parametric components of the SEM under some mild conditions in the survival model.

The rest of this article is structured as follows. In Section 2, we formulate the model and describe our approach to establish model identifiability. In Section 3, we discuss the numerical implementation of the NPMLE. In Section 4, we present theoretical results for model identifiability and describe the asymptotic properties of the estimators. In Section 5, we report the results from simulation studies. In Section 6, we provide an application to the TCGA data, which motivated this work. We make some concluding remarks in Section 7 and relegate theoretical proofs to the Appendix.

2. BASIC FRAMEWORK

2.1 Model and Likelihood

Let η denote a q-vector of latent variables, Y denote an r-vector of uncensored manifest variables, (T₁, …, T_K) denote K potentially censored survival times, and W and Z denote two vectors of observed covariates. Without loss of generality, assume that the support of the covariates includes zero. We specify the conditional distributions of η given Z, Y given Z and η, and T_k given W, Z, Y, and η as follows:

η ∣ Z ~ F_{η} (\cdot ∣ Z; ν),

(1)

Y ∣ (Z, η) ~ F_{Y} (\cdot ∣ Z, η; ψ),

(2)

Λ_{T_{k}} (t ∣ W, Z, Y, η) = G_{k} {Λ_{k} (t) e^{W^{T} ϑ_{k} + Z^{T} β_{k} + Y^{T} α_{k} + η^{T} ϕ_{k}}}, k = 1, \dots, K,

(3)

where F_η (· | Z, ν) denotes a q-variate normal distribution function indexed by a parameter vector ν, F_Y (· | Z, η; ψ) denotes an r-variate parametric distribution function indexed by a parameter vector ψ, Λ_{T_k} is the cumulative hazard function of T_k given (W,Z,Y, η), G_k is a known increasing function, Λ_k is an unspecified positive increasing function with Λ_k(0) = 0, and (ϑ_k, β_k, α_k,ϕ_k) are unknown regression parameters.

Model (1) is the structural model of the latent variables. Model (2) is the measurement model of Y. We assume that Y and η are independent of W given Z. Models (1) and (2) represent the existing SEM framework with Y not restricted to be normally distributed. Equation (3) includes the proportional hazards and proportional odds models as special cases with the choices of G_k (x) = x and G_k (x) = log (1 + x), respectively. The proportional hazards model has been considered in the literature.

The survival time T_k is subject to right censoring by C_k. It is assumed that (C₁, …, C_K) are independent of (T₁, …, T_K) and η conditional on Y, Z, and W. Define T̃_k = min(T_k,C_k) and Δ_k = I(T_k ≤ C_k), where I(·) is the indicator function. For a sample of size n, the observed data consist of 𝒪_i ≡ (T̃₁_i, … T̃_Ki,Δ₁_i, …, Δ_Ki,Y_i,Z_i,W_i) (i = 1, …, n).

Let θ denote the collection of all Euclidean parameters, and write 𝒜 = (Λ₁, …, Λ_K). The likelihood function for θ and 𝒜 is proportional to

\begin{array}{l} L_{n} (θ, A) = \prod_{i = 1}^{n} \int \prod_{k = 1}^{K} {[λ_{k} ({\tilde{T}}_{k i}) e^{W_{i}^{T} ϑ_{k} + Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}} \times G_{k}^{'} {Λ_{k} ({\tilde{T}}_{k i}) e^{W_{i}^{T} ϑ_{k} + Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}}}]}^{Δ_{k i}} \\ \times exp [- G_{k} {Λ_{k} ({\tilde{T}}_{k i}) e^{W_{i}^{T} ϑ_{k} + Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}}}] \\ \times f_{Y} (Y_{i} ∣ Z_{i}, η; ψ) f_{η} (η ∣ Z_{i}; ν) d η, \end{array}

(4)

where f′(x) = df(x)/dx for any function f, $λ_{k} = Λ_{k}^{'}, f_{Y} = F_{Y}^{'}$ , and $f_{η} = F_{η}^{'}$ . The NPMLE is defined to be the maximizer of L_n(θ,𝒜), in which Λ_k is treated as a step function with jumps at T̃_ki with Δ_ki = 1 (i = 1, …, n).

2.2 Model Identifiability

We describe our approach to establish model identifiability in this section and defer the technical details to Section 4. The identifiability results can be summarized by two simple rules. Suppose that we have arranged the survival times such that for some 0 ≤ K₁ ≤ min(q,K), each of (T₁, …, T_K_₁) regresses on and only on one latent variable and a set of covariates that are independent of the latent variables. (We allow K₁ = 0 if no survival time satisfies the given conditions, in which case Rule 1 below is vacuous.) We call an observed variable X an indicator of a latent variable η if X follows a generalized linear model with η as a covariate and is independent of all other manifest variables and survival times conditional on η. We have the following rules:

Rule 1. The latent variables attached to (T₁, …, T_K_₁) can be treated as observed if each of (T₁, …, T_K_₁) depends on at least one observed covariate.
Rule 2. If each latent variable has a separate continuous indicator and the distributions of the latent variables and the indicators are identifiable, then the whole model is identifiable.

To illustrate the usefulness of the two identifiability rules, we present two examples.

Example 1

Consider the model depicted in Figure 1. In the model, Y₁, Y₂, and Y₃ are conditionally independent normal manifest variables of η, and T is a survival time that follows the proportional hazards model with covariate η. Assume that the regression parameter of Y₁ on η is fixed to be one, E(η) = 0, and the regression parameters of η in the models of Y₂ and Y₃ are non-zero.

The First Example of SEM to Illustrate the Identifiability Rules. The SEM con- sists of one latent variable, one survival time, and three conditionally independent normal manifest variables.

The above model is similar to the joint model for survival and longitudinal variables. By Bollen (1989)’s three-indicator rule, the model of (Y₁, Y₂, Y₃, η) is identifiable. With Y₁ serving as an indicator of η, Rule 2 implies that the remaining parameters are identifiable. Note that Rule 1 is not applicable in this case because T does not depend on an independent covariate. In fact, the model is not identifiable without (Y₁, Y₂, Y₃) because the scale of the baseline hazard function and the variance of η cannot be separated.

Example 2

Consider the model depicted on the left-hand side of Figure 2. In the model, Y₁, Y₂, and Y₃ are conditionally independent normal manifest variables of η₂, T₁ is a survival time that follows the proportional hazards model with covariates W and η₁, and T₂ is a survival time that follows the proportional hazards model with covariates η₁ and η₂. Assume that W and Z are non-constant and linearly independent, the regression parameters for the latent variables in the models of T₁ and Y₁ are fixed to be one, E(η₁) = E(η₂) = 0, and the regression parameters of W in the model of T₁ and η₂ in the models of Y₂ and Y₃ are non-zero.

The Second Example of SEM to Illustrate the Identifiability Rules. The left panel is an SEM that consists of two latent variables, two observed covariates, two survival times, and three conditionally independent normal manifest variables. The right panel is an intermediate step in identifying the SEM on the left.

First, we use T₁ to help identify the latent variable distributions. By Rule 1, η₁ can be treated as observed when identifying the model. The problem thus reduces to identifying the model shown on the right-hand side of Figure 2. The model can then be shown identifiable by the arguments used in Example 1.

3. COMPUTATION OF THE NPMLE

In this section, we use Z to denote both W and Z with β_k (k = 1, …,K) as the corresponding vector of regression parameters. Application of a transformation G_k can be viewed as inclusion of an extra latent variable log s_k in the regression equation, where s_k is a random variable with density g_k such that $G_{k} (x) = - log \int_{0}^{\infty} e^{- x t} g_{k} (t) d t$ . We adopt the Expectation-Maximization (EM) algorithm (Dempster et al. 1977) by treating the latent variables, including those introduced by the transformations, as missing data. We perform occasional Newton-Raphson steps to speed up the convergence.

In the combined algorithm, either an EM step or a Newton-Raphson step is performed at each iteration. To avoid confusion, we call the latter an outer Newton-Raphson step. For an EM step, note that the conditional expectation for any function φ of (η_i, s_i) ≡ (η_i, s₁_i, …, s_Ki) given the observed data is

\begin{array}{l} E {φ (η_{i}, s_{i}) ∣ O_{i}} = C^{- 1} \int \int φ (η, s) \prod_{k = 1}^{K} ({[Λ_{k} {{\tilde{T}}_{k i}} s_{k} e^{Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}}]}^{Δ_{k i}} \\ \times exp (- \int_{0}^{{\tilde{T}}_{k i}} s_{k} e^{Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}} d Λ_{k} (t))) \\ \times f_{Y} (Y_{i} ∣ Z_{i}, η; ψ) f_{η} (η ∣ Z_{i}; ν) g (s) d η d s_{1} \dots d s_{K}, \end{array}

where Λ_k {t} is the jump size of the step function Λ_k at t, s = (s₁, …, s_K), $g (s) = \prod_{k = 1}^{K} g_{k} (s_{k})$ , and 𝒞 equals the above integral evaluated at φ(·, ·) = 1. We use the Gauss-Hermite quadrature to approximate the integrals. To reduce the number of abscissas, we adopt an adaptive quadrature approach (Liu and Pierce 1994). Denote the approximation of the conditional expectation as Ê(·). After taking expectation on the functions involved, we update (β_k, α_k,ϕ_k) by the one-step Newton-Raphson algorithm on

\sum_{i = 1}^{n} log {[\frac{\hat{E} {s_{k i} exp (Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η_{i}^{T} ϕ_{k})}}{\sum_{j = 1}^{n} I ({\tilde{T}}_{k j} \geq {\tilde{T}}_{k i}) \hat{E} {s_{k j} exp (Z_{j}^{T} β_{k} + Y_{j}^{T} α_{k} + η_{j}^{T} ϕ_{k})}}]}^{Δ_{k i}} .

Then, we update the cumulative baseline hazard function by

{\hat{Λ}}_{k} {T_{k i}} = \frac{Δ_{k i}}{\sum_{j = 1}^{n} I ({\tilde{T}}_{k j} \geq {\tilde{T}}_{k i}) \hat{E} {s_{k j} exp (Z_{j}^{T} β_{k} + Y_{j}^{T} α_{k} + η_{j}^{T} ϕ_{k})}},

where (β_k, α_k,ϕ_k) are evaluated at the current estimates. In addition, we update the remaining parameters at the maximum of

\sum_{i = 1}^{n} \hat{E} [log {f_{Y} (Y_{i} ∣ Z_{i}, η_{i}; ψ) f_{η} (η_{i} ∣ Z_{i}; ν)}] .

If a closed-form solution is not available, then we apply the one-step Newton-Raphson algorithm to the above expression instead. The above algorithm can be generalized to the case where components of Y_i that do not appear in the model of the survival times are missing at random for some subjects. In this case, we simply drop the corresponding f_Y terms in the evaluation of Ê and the complete-data log-likelihood.

For an outer Newton-Raphson step, we apply the one-step Newton-Raphson algorithm directly to the logarithm of L_n(θ,𝒜) given in (4) using a similar adaptive quadrature approximation. At the current estimates, the first derivative of log L_n(θ,𝒜), i.e., the score statistic, is the same as the first derivative of the expected complete-data log-likelihood. The Hessian matrix used in the Newton-Raphson algorithm can be obtained by Louis (1982)’s formula.

To determine whether an EM step or an outer Newton-Raphson step is to be performed, we keep track of the difference in the log-likelihood at the previous iteration, either an EM or an outer Newton-Raphson step, and the difference at the previous outer Newton-Raphson step. For each iteration, if the log-likelihood difference at the previous step is too small relative to that at the previous outer Newton-Raphson step, then an outer Newton-Raphson step is performed; otherwise, an EM step is performed. Upon convergence, Louis (1982)’s formula is used to obtain the information matrix for the estimation of the standard errors.

The reason that we use a combination of the EM and Newton-Raphson algorithms instead of the Newton-Raphson algorithm alone is two-fold. First, EM steps are more stable, which is important, especially in early iterations. Second, in the estimation of the survival model under the EM algorithm, the regression parameters can be obtained by maximizing the partial-likelihood-type function, and the estimators of the baseline hazard functions take the form of the Breslow estimator. Unlike the Newton-Raphson algorithm, the EM algorithm does not involve the inversion of a high-dimensional matrix.

4. THEORETICAL PROPERTIES

4.1 Identifiability Conditions

As discussed in Section 2, we set aside K₁ survival times, T₁, …, T_K_₁, that are used to identify the distribution of the underlying latent variables. We assume that span(ϕ₁, …, ϕ_K_₁) = ℝ^K^₁. We can choose the K₁ survival times such that each is associated with a few, preferably only one, latent variables. (K₁ is allowed to be 0, in which case we rely solely on the manifest variable Y to identity the latent variable distribution.) Without loss of generality, we assume that ϕ_k = e_k (k = 1, …, K₁), where e_k is a q-vector with 1 at the kth position and 0 elsewhere. This assumption can be satisfied by applying a linear transformation to the latent variables. Effectively, we fix the scale of the first K₁ latent variables, as is common when establishing model identifiability for SEM. We partition η into (η₁, η₂), where η₁ ≡ (η₁₁, …, η₁_K₁) consists of the first K₁ components of η.

We consider the following identifiability conditions. For the “baseline” hazard functions (Λ₁, …, Λ_K), we only require identifiability on [0, τ], where τ denotes the study duration.

(C1)
If (1,W^T,Z^T,Y^T)^Tc = 0 almost surely for some vector c of appropriate dimension, then c = 0. For k = 1, …,K, λ_k is continuous and strictly positive on [0, τ], and there exists a positive and measurable function g_k such that $exp {- G_{k} (t)} = \int_{0}^{\infty} e^{- t s} g_{k} (s) d m_{k} (s)$ , where m_k is the Lebesgue measure or the counting measure at 1.
(C2)
For k = 1, …,K₁, E(Y^Tα_k | Z = 0) = 0, and E(η | Z = 0) = 0. Also, for any vectors c₁ and c₂ of appropriate dimensions, E(e^Y^{^T}^c^₁+^η^{^T}^c^₂ | Z) is finite almost surely.
(C3)
For k = 1, …,K₁, ϑ_k is non-zero, and α_k and β_k are zero.
(C4)
Consider two sets of parameters (ψ, ν) and (ψ̃, ν̃). Let f_Y_,_η_₁ be the density of (Y, η₁) given Z. Then, f_Y_,_η_₁ (Y, η₁ | Z; ψ, ν) = f_Y_,_η_₁ (Y, η₁ | Z; ψ̃, ν̃) for all Z, Y, and η₁ implies that ψ = ψ̃ and ν = ν̃.
(C5)
Let (Y⁺, $η_{1}^{+}$ ) be the components of (Y, η₁) that appear in the regression of T_k for some k = K₁ + 1, …,K, and let (Y⁻, $η_{1}^{-}$ ) be the remaining components. Let F_η_₂|_Y_,_η_₁ be the distribution function of η₂ given (Z,Y, η₁) with (Y⁻, $η_{1}^{-}$ ) treated as a parameter vector. Then, η₂ is complete sufficient in {F_η_₂|_Y_,_η_₁ (· | Z,Y, η₁): Z = z₀,Y⁺ = y₀, $η_{1}^{+} = η_{10}$ for any fixed z₀, y₀, and η₁₀.

Remark 1

Condition (C1) pertains to basic requirements on the covariates, the baseline hazard functions, and the transformation functions such that the survival model with observed covariates is identifiable. If m_k is a point mass at 1, then G_k is simply the identity function. Condition (C2) fixes the location parameters of the latent variables and the manifest variables that appear in the regression models of the first K₁ survival times. Condition (C3) requires that the first K₁ survival times depend only on their corresponding latent variable and W. The presence of a covariate besides the latent variable is necessary for distinguishing the contributions of the baseline hazard function and the latent variable to the distribution of a survival time that follows a mixture distribution. Condition (C4) requires that the model with observed (Y, η₁) is identifiable. Condition (C5) requires that η₂ is complete sufficient conditional on (Y, η₁), where components of (Y, η₁) that do not appear in the regression of T_k (k = K₁ + 1, …, K) are treated as parameters, and the rest are held fixed. Conditions (C2) and (C3) are vacuous if K₁ = 0, and condition (C5) is vacuous if K₁ = K.

We have the following identifiability result.

Theorem 1

Under conditions (C1)–(C5), the model specified by (1)–(3) is identifiable.

Remark 2

The condition that α_k and β_k are zero for k = 1, …, K₁ separates the first K₁ survival times from the remaining observed variables that are associated with the latent variables. This condition is used to simplify the presentation of the identifiability conditions. In the proof of Theorem 1, we consider generalized versions of conditions (C3)–(C5), where α_k and β_k are allowed to be non-zero.

Remark 3

Theorem 1 implies that the distribution of the latent variable underlying a given survival time can be completely identified if the survival time only regresses on the latent variable and a set of independent covariates. Thus, the survival times make it easy to identify the model, as only a single survival time is enough to identify an underlying latent variable. By contrast, this property does not hold for normal random variables.

Remark 4

The derivation of model identifiability from condition (C5) utilizes the property of complete sufficient statistics. The derivation is applicable to general latent variable models; the general result is given by Lemma 1 in the Appendix. Lemma 1 allows for the establishment of model identifiability by inspecting just a part of the model. It includes Reilly and O’Brien (1996)’s side-by-side rule, which states that the loadings of an observed variable on any number of latent variables are identifiable if each of the latent variables is attached to a separate independent observed variable whose distribution is identifiable, as a special case.

4.2 Asymptotic Properties

Let d be the dimension of θ, θ₀ denote the true value of θ, and Λ₀_k denote the true value of Λ_k (k = 1, …, K). We impose the following conditions.

(D1)
The parameter θ₀ lies in the interior of a compact set Θ ⊂ ℝ^d, and the function Λ₀_k is continuously differentiable with $λ_{0 k} (t) \equiv Λ_{0 k}^{'} (t) > 0$ on [0, τ] for each k = 1, …,K.
(D2)
With probability one, P(T̃_ki = τ | W,Z) > δ₀ (k = 1, …,K) for some fixed δ₀ > 0.
(D3)
Consider any fixed Z and (ψ, ν) ∈ Θ_ψν, where Θ_ψν consists of the (ψ, ν)-component of every θ ∈ Θ. For any constant a₁ > 0 and δ = 0, 1,
$E {\int e^{a_{1} (1 + ∣ Y ∣ + ∣ η ∣)} f_{Y} {(Y ∣ Z, η; ψ)}^{δ} f_{η} (η ∣ Z; ν) d η} < \infty .$

Also, for j = 1, 2, 3, there exists a constant a₂ > 0 such that
$| \frac{\frac{\partial^{j}}{\partial ψ^{j}} f_{Y} (Y ∣ Z, η; ψ)}{f_{Y} (Y ∣ Z, η; ψ)} | + | \frac{\frac{\partial^{j}}{\partial ν^{j}} f_{η} (η ∣ Z; ν)}{f_{η} (η ∣ Z; ν)} | \leq e^{a_{2} (1 + ∣ Y ∣ + ∣ η ∣)} .$

In addition, for some positive constants M_j and c_j, N_j ∈ ℝ^r, and φ₁ ∈ ℓ^∞(ℝ^r),
$e^{\sum_{j = 1}^{q} - M_{j} ∣ b_{j} ∣} \leq e^{\sum_{j = 1}^{q} (N_{j}^{T} Y b_{j} + c_{j} b_{j}^{2})} φ_{1} (Y) f_{Y} (Y ∣ Z, η; ψ) f_{η} (η ∣ Z; ν) \leq e^{\sum_{j = 1}^{q} M_{j} ∣ b_{j} ∣},$

where b = S (η) for some one-to-one linear transformation S.
(D4)
The function G_k is four-times differentiable, G_k (0) = 0, and $K_{1 k} {(1 + x)}^{- κ_{1 k}} \leq G_{k}^{'} (x) \leq K_{2 k} {(1 + x)}^{κ_{2 k}}$ for some positive constants κ₁_k, κ₂_k, K₁_k, and K₂_k. Also, G_k(x)/x^ρk → M_k or G_k(x)/log(x) → M_k as x → ∞ for some positive constants M_k and ρ_k. In addition, exp {−G_k(x)} ≤ μ_k (1 + x)^−κ_3k for some μ_k and κ₃_k > κ₂_k + 1. Furthermore, for some r_k,
$sup_{x \geq 0} \frac{| G_{k}^{″} (x) | + | G_{k}^{(3)} (x) | + | G_{k}^{(4)} (x) |}{G_{k}^{'} (x) {(1 + x)}^{r_{k}}} < \infty,$

where $G_{k}^{″}$ and $G_{k}^{(j)}$ denote the second and jth derivatives of G_k, respectively.
(D5)
Let (Z⁽^k⁾,Y⁽^k⁾) be the components of (Z,Y) that appear in the regression of T_k (k = 1, …,K₁). For any vectors h₁, h₂_k, and h₃_k of appropriate dimensions, if
$\frac{\partial}{\partial (ψ, ν)} f_{Y, η_{1}} {(Y, η_{1} ∣ Z; ψ, ν)}^{T} h_{1} - \sum_{k = 1}^{K_{1}} (Z^{(k) T} h_{2 k} + Y^{(k) T} h_{3 k}) \frac{\partial}{\partial η_{1 k}} f_{Y, η_{1}} (Y, η_{1} ∣ Z; ψ, ν)$

is equal to 0 for all Z, Y, and η₁, then h₁ = 0, h₂_k = 0, and h₃_k = 0 for k = 1, …,K₁, where f_Y_,_η_₁ is defined in condition (C4).

Remark 5

Conditions (D1)–(D4) are similar to the conditions of Zeng and Lin (2010) for joint modeling of longitudinal and survival data. Extra conditions are imposed on the transformations and the distributions of Y and η to accommodate the presence of unbounded covariate Y in the survival model. Condition (D5) is for the invertibility of the information operator. If α_k = 0 and β_k = 0 for k = 1, …, K₁, then condition (D5) simply requires that the information matrix of the model for (Y, η₁) is invertible. This is parallel to condition (C4) for identifiability.

Remark 6

The conditions for identifiability and the invertibility of the information operator (C1)–(C5), and (D5) differ significantly from the corresponding conditions (C5) and (C7) of Zeng and Lin (2010). The latter are stated under very general settings, but they are hard to verify for specific models, especially under our SEM framework. By contrast, our conditions are easier to verify and have intuitive interpretations. For the model in Example 1, (D5) simply requires that the model of (Y₁, Y₂, Y₃, η₁) given Z has a non-zero score statistic, which clearly holds.

Let 𝒜₀ = (Λ₀₁, …,Λ₀_K) and (θ̂, 𝒜̂) be the NPMLE of (θ, 𝒜). Also, let 𝒱 = {v ∈ ℝ^d, |v| ≤ 1} and 𝒬 = {h (t): ||h (t)||_V _[0,_τ_] ≤ 1} with ||·||_V _[0,_τ_] being the total variation norm on [0, τ]. We consider (θ̂ − θ₀, 𝒜̂ − 𝒜₀) as a random element in l^∞(𝒱 × 𝒬^K) with

(\hat{θ} - θ_{0}, \hat{A} - A_{0}) (v, h_{1}, \dots, h_{K}) = {(\hat{θ} - θ_{0})}^{T} v + \sum_{k = 1}^{K} \int_{0}^{τ} h_{k} (s) d ({\hat{Λ}}_{k} - Λ_{0 k}) (s) .

We have the following results.

Theorem 2

Under conditions (C1)–(C5) and (D1)–(D5),

$∣ \hat{θ} - θ_{0} ∣ + \sum_{k = 1}^{K} {sup}_{t \in [0, τ]} ∣ {\hat{Λ}}_{k} (t) - Λ_{0 k} (t) ∣ \to_{a . s .} 0$ ; and
n¹^/²(θ̂ − θ₀, 𝒜̂ − 𝒜₀) →_d 𝒢 in l^∞(𝒱 × 𝒬^K), where 𝒢 is a continuous zero-mean Gaussian process. Furthermore, the limiting covariance matrix of n¹^/²(θ̂ − θ₀) attains the semiparametric efficiency bound.

Remark 7

The proof of Theorem 2 relies on the Donsker properties of certain classes of functions. It is more challenging to establish the Donsker results in our setting than in previous settings (e.g., Kosorok et al. (2004) and Zeng and Lin (2010)) because the likelihood function of the proposed model may contain the unbounded variable Y.

Remark 8

A key step in proving the asymptotic normality of the NPMLE is to show that the information operator is invertible. The result is given by Lemma 2 in the Appendix, which states that condition (D5), together with conditions (C1)–(C3), and (C5), implies that the information operator of the model is invertible. With this result, we can verify the invertibility of the information operator of the semiparametric model by inspecting the parametric part of the model that contains the observed and latent variables. For the frailty models in Kosorok et al. (2004), verification of the invertibility of the information operator involves inspection of the local behavior of the score around T = 0. However, that approach is limited to frailty distributions that are indexed by a one-dimensional parameter and is not directly applicable to cases with more complex latent variable distributions such as those in our setting.

5. SIMULATION STUDIES

We considered a model with covariates Z = (Z₁,Z₂)^T, two latent variables (η₁, η₂), observed continuous variables (Y₁, …, Y₅), observed binary variables (Y₆, Y₇), and a survival time T. Their distributions are given by

\begin{array}{l} Λ_{T} (t ∣ Z, Y_{6}, Y_{7}, η_{2}) & = & G {Λ_{0} (t) exp (X_{T}^{T} β_{T} + ϕ_{T} η_{2})}, X_{T} = {(Z_{1}, Z_{2}, Y_{6}, Y_{7})}^{T}, \\ logit {P (Y_{6} = 1 ∣ Z, η_{2})} & = & X_{Y_{6}}^{T} β_{Y_{6}} + ϕ_{Y_{6}} η_{2}, X_{Y_{6}} = {(1, Z_{1}, Z_{2})}^{T}, \\ logit {P (Y_{7} = 1 ∣ Z, Y_{6}, η_{2})} & = & X_{Y_{7}}^{T} β_{Y_{7}} + ϕ_{Y_{7}} η_{2}, X_{Y_{7}} = {(1, Z_{1}, Z_{2}, Y_{6})}^{T}, \\ Y_{j} ∣ η_{1} & ~ & N (β_{Y_{j}} + ϕ_{Y_{j}} η_{1}, σ_{Y_{j}}^{2}), j = 1, 2, 3, \\ Y_{j} ∣ η_{2} & ~ & N (β_{Y_{j}} + ϕ_{Y_{j}} η_{2}, σ_{Y_{j}}^{2}), j = 4, 5, \\ η_{2} & ~ & N (β_{η} η_{1}, σ_{η_{2}}^{2}), \\ η_{1} & ~ & N (0, σ_{η_{1}}^{2}) . \end{array}

The parameters ϕ_Y_₁ and ϕ_Y_₄ are fixed to be one. The model is depicted in Figure 3.

Model Used in Simulation Studies. The SEM consists of two latent variables, an observed covariate, seven binary or normal manifest variables, and a survival time that regresses on the latent variable, some manifest variables, and the observed covariates.

We set Z₁ and Z₂ to independent standard normal and Bernoulli(0.5), respectively, and Λ₀(t) = t². We considered the class of logarithmic transformations G(x) = r⁻¹ log(1 + rx) with r = 0 or 1, which correspond to the proportional hazards and proportional odds models, respectively. We generated the censoring times from Exp(c), where c was chosen to yield approximately 30% censored observations. We set (Y₁, …, Y₅) to be missing completely at random for 30% of the subjects. We set the sample size to 400 and set the number of abscissa points to 20 for each Gauss-Hermite quadrature. We simulated 1,000 datasets for each setting. The results are summarized in Table 1.

Table 1.

Simulation Results for the SEM with Two Latent Variables.

Dep	Ind	Proportional Hazards Model					Proportional Odds Model
Dep	Ind	Param	Bias	SE	SEE	CP	Param	Bias	SE	SEE	CP
T	Z₁	0.100	0.004	0.075	0.073	0.95	0.100	0.004	0.109	0.107	0.94
	Z₂	−0.200	−0.004	0.180	0.182	0.95	−0.200	0.008	0.264	0.270	0.95
	Y₆	0.100	0.001	0.145	0.143	0.95	0.100	0.000	0.214	0.208	0.95
	Y₇	0.200	−0.007	0.167	0.163	0.95	0.200	−0.007	0.233	0.236	0.95
	η₂	0.500	0.020	0.167	0.166	0.96	0.500	0.024	0.225	0.223	0.96
	Λ₀(t₁)	0.202	0.003	0.050	0.049	0.95	0.212	0.004	0.073	0.071	0.95
	Λ₀(t₂)	0.518	0.010	0.121	0.118	0.95	0.608	0.019	0.204	0.201	0.94
	Λ₀(t₃)	1.103	0.042	0.258	0.251	0.95	1.588	0.088	0.550	0.542	0.95
Y₁	Int	0.000	0.000	0.074	0.073	0.94	0.000	0.001	0.074	0.073	0.94
Y₁	Var	1.000	−0.015	0.132	0.128	0.96	1.000	−0.016	0.132	0.128	0.97
Y₂	Int	0.000	0.002	0.076	0.073	0.94	0.000	0.002	0.076	0.073	0.94
	η₁	1.000	0.022	0.220	0.204	0.94	1.000	0.021	0.218	0.204	0.94
	Var	1.000	−0.013	0.130	0.129	0.96	1.000	−0.012	0.129	0.129	0.96
Y₃	Int	0.000	0.001	0.076	0.073	0.95	0.000	0.002	0.076	0.073	0.95
	η₁	1.000	0.021	0.214	0.203	0.95	1.000	0.021	0.215	0.204	0.96
	Var	1.000	−0.010	0.132	0.128	0.96	1.000	−0.010	0.133	0.129	0.96
Y₄	Int	0.000	−0.001	0.077	0.075	0.95	0.000	−0.001	0.077	0.076	0.95
Y₄	Var	1.000	−0.026	0.193	0.184	0.96	1.000	−0.032	0.215	0.210	0.96
Y₅	Int	0.000	−0.002	0.078	0.076	0.94	0.000	−0.001	0.079	0.076	0.94
	η₂	1.000	0.048	0.296	0.276	0.94	1.000	0.057	0.339	0.323	0.95
	Var	1.000	−0.031	0.195	0.187	0.96	1.000	−0.036	0.219	0.214	0.96
Y₆	Int	0.000	−0.005	0.286	0.273	0.94	0.000	−0.006	0.286	0.273	0.94
	Z₁	−0.500	0.001	0.112	0.113	0.95	−0.500	0.001	0.112	0.113	0.95
	Z₂	0.500	0.021	0.307	0.297	0.95	0.500	0.021	0.307	0.297	0.95
	η₂	0.000	0.008	0.226	0.220	0.96	0.000	0.007	0.228	0.221	0.97
Y₇	Int	1.000	0.032	0.338	0.345	0.96	1.000	0.030	0.337	0.345	0.96
	Z₁	1.000	0.028	0.148	0.148	0.96	1.000	0.028	0.148	0.148	0.96
	Z₂	0.200	0.002	0.333	0.342	0.96	0.200	0.002	0.333	0.342	0.96
	Y₆	−0.200	−0.011	0.264	0.264	0.95	−0.200	−0.009	0.264	0.264	0.95
	η₂	0.000	0.006	0.262	0.253	0.97	0.000	0.008	0.266	0.255	0.97
η₁	Var	0.500	0.009	0.139	0.134	0.95	0.500	0.009	0.138	0.134	0.95
η₂	η₁	0.500	0.008	0.156	0.147	0.93	0.500	0.008	0.160	0.151	0.94
η₂	Var	0.500	0.007	0.175	0.168	0.96	0.500	0.013	0.194	0.191	0.97

Open in a new tab

NOTE: Each row corresponds to the regression parameter of the dependent variable “Dep” on the independent variable “Ind” or some other parameter in the model of “Dep”. “Int” and “Var” stand for the intercept and error variance, respectively. The parameters Λ₀(t₁), Λ₀(t₂), and Λ₀(t₃) correspond to the cumulative baseline hazard function values at the 25%, 50%, and 75% quantiles of the survival time. The true value of a parameter is given under “Param”. “Bias” is the empirical bias; “SE” is the empirical standard error; “SEE” is the empirical mean of the standard error estimator; and “CP” is the empirical coverage probability of the 95% confidence interval.

The estimators of all parameters are virtually unbiased for both the proportional hazards and proportional odds models. The standard error estimators accurately reflect the true variations, and the coverage probabilities of the confidence intervals are close to the nominal level. Standard error estimators for the parameters in the survival model are larger under the proportional odds model than under the proportional hazards model. As a result, the standard error estimators for the parameters associated with η₂ are larger. The standard error estimators for the remaining parameters are very similar between the two models.

We also evaluated Mplus (Muthén and Muthén 1998–2015) under the proportional hazards model, and the results are presented in Table S.1 of the Supplementary Materials. The results for the Euclidean parameters are similar to those presented in Table 1. Mplus provides estimator for the baseline hazard function instead of the cumulative baseline hazard function. Its standard error estimator does not reflect the true variation, and the coverages of the confidence intervals are far below the nominal level.

6. REAL DATA ANALYSIS

We analyzed a dataset on patients with serous ovarian cancer from the TCGA project (The Cancer Genome Atlas Research Network 2011). Genomic variables include DNA copy number, SNP genotype, DNA methylation level, and levels of expression of mRNA, microRNA, total protein, and phosphorylated protein. Demographic and clinical variables include age at diagnosis, race, tumor stage, tumor grade, time to tumor progression, and time to death. There are a total of 586 patients. The median follow-up time was about 2.5 years, and roughly 30% of the patients were lost to follow-up before tumor progression or death. The data are available from http://gdac.broadinstitute.org/.

We focused on the integrative analysis of clinical outcomes and expression levels of mRNA, total protein, and phosphorylated protein. We considered mRNA expression as a latent variable that can only be observed with error through three microarray platforms, namely Agilent 244K Whole Genome Expression Array, Affymetrix HT-HG-U133A, and Affymetrix Exon 1.0. We assumed that the effects of a gene on clinical outcomes are mediated through unobserved protein activity. The latent protein activity is modified by mRNA expression and is manifest through the observed protein expression measurements, which were obtained from the reverse-phase protein arrays platform. Figure 4 depicts the SEM fit for each gene. We assumed that the observed variables follow the distributions described in Section 5, with (Y₁, Y₂, Y₃) being the three microarray measurements, (Y₄, Y₅) = (Total protein expression, Phosphorylated protein expression), (Y₆, Y₇) = (Tumor stage, Tumor grade), (Z₁, Z₂) = (Age, Race), and T being progression-free survival time.

Results from the SEM Analysis of the Gene ACACA. Analysis results are from 542 patients with ovarian cancer in the TCGA project. The numbers besides an arrow correspond to the point estimate and standard error estimate (in parentheses) of the regression parameter. The numbers below the latent variables correspond to the point estimate and standard error estimate (in parentheses) of the error variance.

We dichotomized tumor stage into stage II/III versus stage IV and tumor grade into grade 2 versus grade 3/4. Race was dichotomized into white and non-white. We allowed mRNA expression and protein expression data to be missing for some subjects. We excluded patients with tumor stage I or grade 1, as those patients may have a disease that is biologically different from that of patients with tumors of other stages or grades. For each gene, we fit the class of transformation models with G(x) = r⁻¹ log(1 + rx) over a grid of r = (0, 0.1, …, 2). We selected the model with the smallest AIC or, equivalently, the largest log-likelihood value.

We present the results for the gene ACACA. The sample size is 542. About 30% of the subjects do not have protein expression data, and over 10% of the subjects miss at least one mRNA expression measurement. The best-fitting model is obtained at r = 1, which corresponds to the proportional odds model. The point estimates and standard error estimates of the parameters associated with the latent variables are shown in Figure 4. The remaining results are shown in Table S.2 of the Supplementary Materials. The latent variables have strong positive association with the measurement platforms. As expected, latent protein activity and latent mRNA expression are highly correlated. Latent protein activity is positively associated with progression-free survival time, with a p-value of 0.100. Specifically, higher latent protein activity is associated with shorter progression-free survival time, which agrees with the findings of the literature (Menendez and Lupu 2007). The association of ACACA with tumor stage or tumor grade is weak.

The results for the parameters in the non-survival models are similar between r = 0 and 1. The parameters in the survival model have different interpretations between r = 0 and 1. With r = 0, a unit increase in latent protein activity would have a multiplicative effect of exp(0.068) on the hazard function. With r = 1, a unit increase in the latent protein activity would have a multiplicative effect of exp(−0.192) on the survival odds. For this dataset, the proportional odds model provides much stronger evidence for the effect of protein activity on progression-free survival than the proportional hazards model.

For the Cox proportional hazards model, we also present the results from Mplus in Table S.2. The results from NPMLE and Mplus are similar for most parameters. There are considerable differences between the cumulative baseline hazard function estimates. The standard error estimates for the cumulative baseline hazard function are not available from Mplus.

For comparisons, we also fit a proportional odds model without latent variables for progression-free survival on the covariates and the two protein expression variables, where the subjects with missing protein expression data were discarded. The p-value of the Wald test for the joint effect of protein expression is 0.157. With r = 0, the Wald test p-value is 0.578. Therefore, analyses based on standard models fail to conclude a strong association between the protein expression and progression-free survival. The power of the proposed SEM framework stems from the appropriate handling of missing data, the dimension reduction of the observed covariates, and the flexibility of the survival model.

7. DISCUSSION

In this article, we consider semiparametric SEM for potentially right-censored survival time data. We prove the consistency, asymptotic normality, and semiparametric efficiency of the NPMLE. We propose new rules for establishing model identifiability and invertibility of the information operator. We construct an EM algorithm to compute the NPMLE and introduce occasional Newton-Raphson steps to accelerate the convergence.

One contribution of Theorem 1 is that it reduces a semiparametric identifiability problem to a parametric one; it shows that the inclusion of the semiparametric component does not make the model less identifiable but, in some sense, makes the model more easily identifiable. With that being said, the result hinges on correct specification of the model and does not guarantee empirical identifiability in a finite sample. Therefore, care should be taken when fitting a model that is nearly non-identifiable. Another main result of ours is given by Lemma 1. This lemma is applicable to a wide range of latent variable models and allows one to deduce the identifiability of a model by inspecting just part of it.

Invertibility of the information operator has received much less attention in the literature than model identifiability. In this article, we prove a general result for invertibility of the information operator. It is evident from the proof that the invertibility of the information operator can be established using techniques similar to those used to establish model identifiability. Specifically, the key to the proof of the identifiability of the mixture Cox model is that with the presence of a covariate that is independent of the latent variable, the contributions to the likelihood from the latent variable and the baseline hazard function can be separated by considering different values of the covariate. (In a normal mixture model, however, we lack such identifiability results precisely because the random effect and error term are combined linearly and their distributions cannot be distinguished.) As a result, if two sets of parameters give rise to the same marginal survival function, then they must do so by giving rise to the same random-effect distribution. Based on the proportional hazards structure, we prove a parallel result for the invertibility of the information operator: the existence of a submodel with zero score implies that the random-effect distribution has zero score along that submodel as well. Therefore, to ensure the invertibility of the information operator of the mixture Cox model, one only has to ensure that the information matrix of the random-effect distribution is invertible.

Our work can be extended in several directions. First, one may be interested in expanding the model by inclusion of more latent and observed variables. As the number of variables increases, the number of parameters to be estimated increases as well. Then, it may be desirable to perform variable selection. Because a single variable may be associated with multiple parameters, one may prefer not to treat parameters as the basic unit of selection, as in traditional lasso methods (Tibshirani 1996). Instead, methods like group lasso (Yuan and Lin 2006) that penalize parameters associated with a variable as a group may be considered.

In our model, the distribution of the manifest variable Y is fully parametric. One can allow a nonparametric transformation on Y. A major challenge arises in extending the asymptotic results to unbounded nonparametric transformation, as the estimator of the transformation function can be unbounded (Zeng and Lin 2010).

Finally, it would be of interest to consider interval-censored data. Interval censoring results in a different likelihood function, which makes the computation of the NPMLE and the derivation of its asymptotic properties challenging, even for univariate survival time data. The asymptotic theory for interval-censored data is only available in a few simple cases; see Huang and Wellner (1997) for a review.

Supplementary Material

SupMaterials

NIHMS844851-supplement-SupMaterials.pdf^{(317.2KB, pdf)}

Acknowledgments

This research was supported by the National Institutes of Health grants R01GM047845, R01CA082659, and P01CA142538.

APPENDIX: TECHNICAL DETAILS

We present the following conditions, which are clearly implied by conditions (C3)–(C5):

(C3′)
For k = 1, …, K₁, ϑ_k is a non-zero vector.
(C4′)
Consider two sets of parameters (α_k, β_k, ψ, ν) and (α̃_k, β̃_k, ψ̃, ν̃) for k = 1, …, K₁, and let η̃₁ = (η̃₁₁, …, η̃₁_K_₁), where η̃₁_k = Y^T(α_k−α̃_k)+Z^T(β_k−β̃_k)+η₁_k. Let f_Y_, _η_₁ be the density of (Y, η₁) given Z. Then, f_Y_, _η_₁ (Y, η₁ | Z; ψ, ν) = f_Y_, _η_₁ (Y, η̃₁ | Z; ψ̃, ν̃) for all Z, Y, and η₁ implies that α_k = α̃_k, β_k = β̃_k, ψ = ψ̃, and ν = ν̃.
(C5′)
For k = K₁ + 1, …, K, let (Y⁽^k⁾, $η_{1}^{(k)}, η_{2}^{(k)}$ ) be the components of (Y, η₁, η₂) that appear in the regression of T_k, and let (Y⁻⁽^k⁾, $η_{1}^{- (k)}, η_{2}^{- (k)}$ ) be the remaining components. If $η_{2}^{(k)}$ is non-empty, then (Y⁻⁽^k⁾, $η_{1}^{- (k)}$ ) is non-empty, and $η_{2}^{(k)}$ is complete sufficient in { $F_{η_{2}^{(k)} ∣ Y, η_{1}} (\cdot ∣ Z, Y, η_{1}) : Z = z_{0}, Y^{(k)} = y_{0}, η_{1}^{(k)} = η_{10}$ } for any fixed z₀, y₀, and η₁₀, where $F_{η_{2}^{(k)} ∣ Y, η_{1}}$ is the distribution function of $η_{2}^{(k)}$ given (Z,Y, η₁) with (Y⁻⁽^k⁾, $η_{1}^{- (k)}$ ) treated as a parameter vector.

We prove Theorem 1 under the generalized conditions (C1), (C2), and (C3′)-(C5′). The proof makes use of two lemmas given at the end of this appendix. We first provide an overview of the proof. For any two sets of parameters (ϑ_k, α_k, β_k, ϕ_k, Λ_k, ψ, ν) and (ϑ̃_k, α̃_k, β̃_k, ϕ̃_k, Λ̃_k, ψ̃, ν̃), assume that the likelihood values at the two sets of parameters are identical almost surely. By definition, the model is identifiable if the equality of the likelihood values implies the equality of the two sets of parameters. We derive the equality of the two sets of parameters in the following steps:

By conditions (C1), (C2), and (C3′) and the identifiability of the mixture Cox model (Kortram et al. 1995), ϑ_k = ϑ̃_k and Λ_k = Λ̃_k for k = 1, …, K₁.
With some algebraic manipulation, the likelihood function can be expressed in the form of the Laplace transform of the distribution of a function of (Y, η₁). The uniqueness of the Laplace transform, together with condition (C4′), implies that (α_k, β_k, ψ, ν) = (α̃_k, β̃_k, ψ̃, ν̃) for k = 1, …, K₁.
By the uniqueness of the Laplace transform and the complete sufficiency of η₂ imposed by condition (C5′), the equality of the likelihood functions of (T_K_₁+1, …, T_K, Y) implies the equality of the likelihood functions of (T_K_₁+1, …, T_K, Y, η). By the identifiability of the Cox model, we conclude that (ϑ_k, α_k, β_k, ϕ_k) = (ϑ̃_k, α̃_k, β̃_k, ϕ̃_k) for k = K₁ + 1, …, K.

Proof of Theorem 1

The likelihood is given in (4). Here, we consider a single observation and drop the subscript i. Using the arguments in Section 10.1 of Zeng and Lin (2010), we can set each survival time to be right censored at any time point within [0, τ] when establishing identifiability. Consider two sets of parameters (ϑ_k, α_k, β_k, ϕ_k, Λ_k, ψ, ν) and (ϑ̃_k, α̃_k, β̃_k, ϕ̃_k, Λ̃_k, ψ̃, ν̃) such that the likelihood values for an observation with the K survival times being right censored are equal almost surely, i.e.,

\begin{array}{l} \int \prod_{k = 1}^{K} [\int exp {- Λ_{k} (t_{k}) s_{k} e^{W^{T} ϑ_{k} + Z^{T} β_{k} + Y^{T} α_{k} + η^{T} ϕ_{k}}} g_{k} (s_{k}) d m_{k} (s_{k})] \times f_{Y, η} (Y, η ∣ Z; ψ, ν) d η \\ = \int \prod_{k = 1}^{K} [\int exp {- {\tilde{Λ}}_{k} (t_{k}) s_{k} e^{W^{T} {\tilde{ϑ}}_{k} + Z^{T} {\tilde{β}}_{k} + Y^{T} {\tilde{α}}_{k} + η^{T} {\tilde{ϕ}}_{k}}} g_{k} (s_{k}) d m_{k} (s_{k})] \times f_{Y, η} (Y, η ∣ Z; \tilde{ψ}, \tilde{ν}) d η \end{array}

(A.1)

for all t₁, …, t_K ∈ [0, τ], W, Z, and Y, where f_Y,η is the density of (Y, η) given Z. If m_k is a point mass at one, then s_k is fixed at one, g_k = 1, and the integration with respect to m_k(s_k) can be omitted. For simplicity of description, assume that m_k is the Lebesgue measure. Note that

\int s g_{k} (s) d s = - lim_{t \to 0^{+}} \frac{d}{d t} \int_{0}^{\infty} e^{- t s} g_{k} (s) d s = - lim_{t \to 0^{+}} \frac{d}{d t} exp {- G_{k} (t)} = G_{k}^{'} (0) < \infty .

Thus, a transformation model can be written as a random-effect proportional hazards model with known distributions (g₁, …, g_K) for random effects (s₁, …, s_K) with finite means.

First, we show that the baseline hazard functions of the first K₁ survival times are identifiable. For each k = 1, …, K₁, set t_l → 0 for l ≠ k on both sides of (A.1). On each side of the resulting equation, integration with respect to Y results in the likelihood of a mixture Cox model with s_ke^{Y^Tα_k+η_1k} or s_ke^{Y^Tα̃_k+η_1k} as a latent variable. Let E(· | Z) and Ẽ(· | Z) be the expectations under f_Y_,_η(· | Z; ψ, ν) and f_Y_,_η(· | Z; ψ̃, ν̃), respectively. Theorem 3 of Kortram et al. (1995) implies that E(s_ke^{Y^Tα_k+η_1k} | Z = 0)Λ_k = Ẽ(s_ke^{Y^Tα̃_k+η_1k} | Z = 0)Λ̃_k on [0, τ], ϑ_k = ϑ̃_k, and the distribution of E(s_ke^Y^{^T_α_k+η_1k} | Z = 0)⁻¹s_ke^{Y^Tα_k+η_1k} under f_Y_,_η(· | Z;ψ, ν) is equal to that of Ẽ(s_ke^{Y^Tα̃_k+η_1k} | Z = 0)⁻¹s_ke^{Y^Tα̃_k+η_1k} under f_Y_,_η(· |Z; ψ̃, ν̃). Because E(Y^Tα_k + η₁_k | Z = 0) = Ẽ(Y^T α̃_k + η₁_k| Z = 0) = 0 by condition (C2), we see that Λ_k = Λ̃_k on [0, τ].

Second, we show that the likelihood function takes the form of a Laplace transform and use the uniqueness of the Laplace transform to prove the identifiability of (α_k, β_k, ψ, ν) (k = 1, …, K₁). Setting t_k → 0 for k = K₁ + 1, …, K and W = 0 on both sides of (A.1), we have

\begin{array}{l} \int \prod_{k = 1}^{K_{1}} [\int exp {- Λ_{k} (t_{k}) s_{k} e^{Z^{T} β_{k} + Y^{T} α_{k} + η_{1 k}}} g_{k} (s_{k}) d s_{k}] f_{Y, η} (Y, η ∣ Z; ψ, ν) d η \\ = \int \prod_{k = 1}^{K_{1}} [\int exp {- Λ_{k} (t_{k}) s_{k} e^{Z^{T} {\tilde{β}}_{k} + Y^{T} {\tilde{α}}_{k} + η_{1 k}}} g_{k} (s_{k}) d s_{k}] f_{Y, η} (Y, η ∣ Z; \tilde{ψ}, \tilde{ν}) d η . \end{array}

(A.2)

Let U = (U₁, …, U_K_₁), U_k = s_ke^η_1k, and f_U_|_Y be the density function of U given Z and Y. By the uniqueness of the Laplace transform, for any continuous functions f and f̃, any open set 𝒮, and any positive real numbers c and c̃,

\int_{0}^{\infty} e^{- cst} f (t) d t = \int_{0}^{\infty} e^{- \tilde{c} s t} \tilde{f} (t) d t \forall s \in S

implies that f(t) = (c/c̃)f̃(ct/c̃) for all t > 0. Therefore, the equality of (A.2) for all t₁, …, t_K_₁, Z, and Y implies that

f_{U ∣ Y} (U ∣ Z, Y; ψ, ν) = e^{\sum_{k = 1}^{K_{1}} Z^{T} (β_{k} - {\tilde{β}}_{k}) + Y^{T} (α_{k} - {\tilde{α}}_{k})} f_{U ∣ Y} (\tilde{U} ∣ Z, Y; \tilde{ψ}, \tilde{ν})

(A.3)

for all U, Z, and Y, where Ũ = (Ũ₁, …, Ũ_K_₁), and Ũ_k = e^{Z^T(β_k−β̃_k)+Y^T(α_k−α̃_k)}U_k. Let f_η_₁|_Y be the density of η₁ given Z and Y. By the definition of U,

\begin{array}{l} f_{U ∣ Y} (U ∣ Z, Y; ψ, ν) \int_{s_{k} > 0} f_{η_{1} ∣ Y} (log U_{1} - log s_{1}, \dots, log U_{K_{1}} - log s_{K_{1}} ∣ Z, Y; ψ, ν) \prod_{k = 1}^{K_{1}} U_{k}^{- 1} g_{k} (s_{k}) d (s_{1}, \dots, s_{K_{1}}) \\ \int_{ℝ^{K_{1}}} f_{η_{1} ∣ Y} (log U_{1} - v_{1}, \dots, log U_{K_{1}} - v_{K_{1}} ∣ Z, Y; ψ, ν) \prod_{k = 1}^{K_{1}} U_{k}^{- 1} {\bar{g}}_{k} (v_{k}) e^{v_{k}} d (v_{1}, \dots, v_{K_{1}}), \end{array}

where ḡ_k(v) = g_k(e^v). Thus, (A.3) implies that

\int_{ℝ^{K_{1}}} {f_{η_{1} ∣ Y} (log U_{1} - v_{1}, \dots, log U_{K_{1}} - v_{K_{1}} ∣ Z, Y; ψ, ν) - f_{η_{1} ∣ Y} (log {\tilde{U}}_{1} - v_{1}, \dots, log {\tilde{U}}_{K_{1}} - v_{K_{1}} ∣ Z, Y; \tilde{ψ}, \tilde{ν})} \prod_{k = 1}^{K_{1}} {\bar{g}}_{k} (v_{k}) e^{v_{k}} d (v_{1}, \dots, v_{K_{1}}) = 0.

(A.4)

Consider two arbitrary continuous functions f, g: ℝ → ℝ. Note that

\int_{- \infty}^{\infty} e^{- s t} (f * g) (t) d t = \int_{- \infty}^{\infty} e^{- s t} f (t) d t \int_{- \infty}^{\infty} e^{- s t} g (t) d t

for any s such that the integrals are defined, where $(f * g) (t) \equiv \int_{- \infty}^{\infty} f (t - s) g (s) d s$ is the convolution of f and g. Therefore, (f * g)(·) = 0 implies that

\int_{- \infty}^{\infty} e^{- s t} f (t) d t \int_{- \infty}^{\infty} e^{- s t} g (t) d t = 0,

which, if g is positive, implies that f(·) = 0 by the uniqueness of the bilateral Laplace 27 transform (Chareka 2007). Because ḡ_k(·)e^(·) is positive, (A.4) implies that f_η_₁|_Y (η₁ | Z,Y; ψ, ν) = f_η_₁|_Y (η̃₁ | Z,Y; ψ̃, ν̃), where η̃₁ is defined in condition (C4′). By condition (C4′), (α_k, β_k, ψ, ν) = (α̃_k, β̃_k, ψ̃, ν̃) for k = 1, …, K₁.

It remains to identify the parameters associated with (T_K_₁+1, …, T_K). By the uniqueness of the Laplace transform, (A.1) implies that

\begin{array}{l} \int \prod_{k = K_{1} + 1}^{K} {\int e^{- Λ_{k} (t_{k}) s_{k} e^{W^{T} ϑ_{k} + Z^{T} β_{k} + Y^{T} α_{k} + η^{T} ϕ_{k}}} g_{k} (s_{k}) d s_{k}} f_{Y, η} (Y, η ∣ Z; ψ, ν) d η_{2} \\ = \int \prod_{k = K_{1} + 1}^{K} {\int e^{- {\tilde{Λ}}_{k} (t_{k}) s_{k} e^{W^{T} {\tilde{ϑ}}_{k} + Z^{T} {\tilde{β}}_{k} + Y^{T} {\tilde{α}}_{k} + η^{T} {\tilde{ϕ}}_{k}}} g_{k} (s_{k}) d s_{k}} f_{Y, η} (Y, η ∣ Z; ψ, ν) d η_{2} \end{array}

for all t_K_₁+1, …, t_K, W, Z, Y, and η₁, i.e., η₁ can be treated as observed for identifying the remaining parameters. Under condition (C5′), we can use the arguments in the proof of Lemma 1 to show that the integrands in the above equality are equal at each value of η₂. We conclude that (ϑ_k, α_k, β_k, ϕ_k, Λ_k) = (ϑ̃_k, α̃_k, β̃_k, ϕ̃_k, Λ̃_k) for k = K₁+1, …, K.

We provide an overview for the proof of Theorem 2. The consistency of the NPMLE is proved in the following steps:

By conditions (D2)–(D4), the NPMLE exists, i.e., Λ̂_k(τ) < ∞.
By conditions (D3) and (D4), Λ̂_k(τ) is uniformly bounded. Helly’s selection theorem then implies that every subsequence of Λ̂_k has a further converging subsequence.
By the Glivenko-Cantelli properties of the log-likelihood and related functions given by Lemma S2 in the Supplementary Materials, the identifiability of the model, and the non-negativity of the Kullback-Leibler divergence, we conclude the consistency of the NPMLE.

The asymptotic normality of the NPMLE follows mainly from the arguments of van der Vaart (1998, pp. 419–424). Donsker properties of the score and related functions are given by Lemma S2 in the Supplementary Materials, and the invertibility of the information operator is given by Lemma 2.

Proof of Theorem 2

We use Z to denote both W and Z with β_k (k = 1, …, K) being the corresponding vector of regression parameters. Let

\begin{array}{l} Ψ (O_{i}; θ, A) = \prod_{k = 1}^{K} \int {[e^{Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}} G_{k}^{'} {e^{Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}} Λ_{k} ({\tilde{T}}_{k i})}]}^{Δ_{k i}} \\ \times exp [- G_{k} {e^{Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}} Λ_{k} ({\tilde{T}}_{k i})}] f_{Y} (Y_{i} ∣ Z_{i}, η; ψ) f_{η} (η ∣ Z_{i}; ν) d η, \end{array}

Ψ̇_θ(𝒪_i; θ, 𝒜) be the derivative of Ψ(𝒪_i; θ, 𝒜) with respect to θ, and Ψ̇_k(𝒪_i; θ, 𝒜)[H_k] be the derivative of Ψ(𝒪_i; θ, 𝒜) along the path (Λ_k + εH_k).

First, we prove the consistency. By condition (D4),

\begin{array}{l} {[e^{Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}} G_{k}^{'} {e^{Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}} Λ_{k} ({\tilde{T}}_{k i})}]}^{Δ_{k i}} e^{- G_{k} {e^{Z_{i}^{T} β_{k} + Y_{i}^{T} α_{k} + η^{T} ϕ_{k}} Λ_{k} ({\tilde{T}}_{k i})}} \\ \leq e^{O (1 + ∣ Y ∣ + ∣ η ∣)} {1 + Λ_{k} ({\tilde{T}}_{k i})}^{- Δ_{k i} - κ_{3 k} + κ_{2 k} + 1} . \end{array}

Thus, condition (D3) implies that

Ψ (O_{i}; θ, A) \leq \prod_{i = 1}^{n} F (O_{i}; θ) \prod_{k = 1}^{K} {1 + Λ_{k} ({\tilde{T}}_{k i})}^{- Δ_{k i} - κ_{3 k} + κ_{2 k} + 1},

(A.5)

where ℱ(𝒪_i; θ) is a random variable with |E{log ℱ(𝒪_i; θ)}| < ∞ for any θ. By condition (D2), P(T̃_ki = τ) is positive. Therefore, if Λ_k (τ) = ∞, then the right-hand side of (A.5) is zero for large n. We conclude that Λ̂_k (τ) < ∞, such that the NPMLE exists.

We then show that lim sup_nΛ̂_k (τ) < ∞ almost surely. From (A.5),

\begin{array}{l} \frac{1}{n} log L_{n} (\hat{θ}, \hat{A}) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} Δ_{k i} log {\hat{Λ}}_{k} {{\tilde{T}}_{k i}} + \frac{1}{n} \sum_{i = 1}^{n} log Ψ (O_{i}; \hat{θ}, \hat{A}) \\ \leq \frac{1}{n} \sum_{i = 1}^{n} log F (O_{i}; \hat{θ}) + \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} Δ_{k i} log {\hat{Λ}}_{k} {{\tilde{T}}_{k i}} - \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} (Δ_{k i} + κ_{3 k} - κ_{2 k} - 1) log {1 + {\hat{Λ}}_{k} ({\tilde{T}}_{k i})} . \end{array}

Let $\tilde{N} = n^{- 1} \sum_{i = 1}^{n} (Δ_{1 i} I ({\tilde{T}}_{1 i} \leq \cdot), \dots, Δ_{K i} I ({\tilde{T}}_{K i} \leq \cdot))$ . Clearly,

\frac{1}{n} log L_{n} (θ_{0}, \tilde{N}) = - \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} Δ_{k i} log n + \frac{1}{n} \sum_{i = 1}^{n} log Ψ (O_{i}; θ_{0}, \tilde{N}) .

The second term on the right-hand side of the above equation is O_p (1). Thus,

\begin{array}{l} \frac{1}{n} log L_{n} (\hat{θ}, \hat{A}) - \frac{1}{n} log L_{n} (θ_{0}, \tilde{N}) + O_{p} (1) \\ \leq \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} Δ_{k i} log [n {\hat{Λ}}_{k} {{\tilde{T}}_{k i}}] - \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} (Δ_{k i} + κ_{3 k} - κ_{2 k} - 1) log {1 + {\hat{Λ}}_{k} ({\tilde{T}}_{k i})} . \end{array}

Note that (κ₃_k − κ₂_k − 1) is positive by condition (D4). Using the partitioning argument similar to those of Murphy (1994) and Parner (1998), we can show that the right-hand side of the above inequality tends to −∞ if lim sup_n Λ̂_k (τ) = ∞. By definition of (θ̂, 𝒜̂), the left-hand side of the inequality is bounded below by an O_p(1) term. Therefore, Λ̂_k (τ) is uniformly bounded.

Given the boundedness of Λ̂_k (τ), Helly’s selection theorem implies that, for any subsequence of n, we can always choose a further subsequence such that Λ̂_k converges pointwise to some monotone function $Λ_{k}^{*}$ and θ̂ converges to θ^*. The desired consistency result follows if we can show that $Λ_{k}^{*} = Λ_{0 k}$ and θ^* = θ₀ almost surely. With an abuse of notation, let {n}_{1,2, …} be the subsequence. Define

{\tilde{Λ}}_{k} (t) = - \sum_{i = 1}^{n} Δ_{k i} I ({\tilde{T}}_{k i} \leq t) {\sum_{j = 1}^{n} \frac{{\dot{Ψ}}_{k} (O_{j}; θ_{0}, A_{0}) [I ({\tilde{T}}_{k i} \leq \cdot)]}{Ψ (O_{j}; θ_{0}, A_{0})}}^{- 1} .

By Lemma S2 in the Supplementary Materials and the properties of Donsker (and therefore, Glivenko-Cantelli) classes,

\frac{1}{n} \sum_{j = 1}^{n} \frac{{\dot{Ψ}}_{k} (O_{j}; θ_{0}, A_{0}) [I (s \leq \cdot)]}{Ψ (O_{j}; θ_{0}, A_{0})} \to E (\frac{{\dot{Ψ}}_{k} (O_{i}; θ_{0}, A_{0}) [I (s \leq \cdot)]}{Ψ (O_{i}; θ_{0}, A_{0})})

uniformly on [0, τ]. Because the score function along the path Λ_k = Λ₀_k + εI (· ≥ s) with other parameters fixed at their true values has zero expectation,

- E {\frac{{\dot{Ψ}}_{k} (O_{i}; θ_{0}, A_{0}) [I (s \leq \cdot)]}{Ψ (O_{i}; θ_{0}, A_{0})}} = \frac{d P ({\tilde{T}}_{k i} Δ_{k i} < s) / d s}{λ_{0 k} (s)} .

Algebraic manipulation yields that the uniform limit of Λ̃_k on [0, τ] is Λ₀_k. Note that

{\hat{Λ}}_{k} (t) = \int_{0}^{t} \frac{| n^{- 1} \sum_{j = 1}^{n} {\dot{Ψ}}_{k} (O_{j}; θ_{0}, A_{0}) [I (s \leq \cdot)] / Ψ (O_{j}; θ_{0}, A_{0}) |}{| n^{- 1} \sum_{j = 1}^{n} {\dot{Ψ}}_{k} (O_{j}; \hat{θ}, \hat{A}) [I (s \leq \cdot)] / Ψ (O_{j}; \hat{θ}, \hat{A}) |} d {\tilde{Λ}}_{k} (s) .

We have shown that the numerator of the integrand in the above equation converges uniformly. Similarly, we can show that the denominator of the integrand in the above equation converges uniformly to |E{Ψ̇_k(𝒪_i; θ^*,𝒜^*)[I (s ≤ ·)]/Ψ(𝒪_i; θ^*,𝒜^*)}| and that the limit is bounded away from 0. Because Λ̃_k converges uniformly to Λ₀_k, which is differentiable with respect to t, $Λ_{k}^{*}$ is also differentiable with respect to t. It follows that dΛ̂_k/dΛ̃_k converges uniformly to $λ_{k}^{*} / λ_{0 k}$ on [0, τ], where $λ_{k}^{*} = {(Λ_{k}^{*})}^{'}$ . As n⁻¹ log L_n(θ̂, 𝒜̂) − n⁻¹ log L_n(θ₀, 𝒜̃) is non-negative,

\frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} Δ_{k i} log \frac{d {\hat{Λ}}_{k} ({\tilde{T}}_{k i})}{d {\hat{Λ}}_{k} ({\tilde{T}}_{k i})} + \frac{1}{n} \sum_{i = 1}^{n} log \frac{Ψ (O_{i}; \hat{θ}, \hat{A})}{Ψ (O_{i}; θ_{0}, \hat{A})} \geq 0.

By the Glivenko-Cantelli properties of the class of functions of log Ψ(𝒪_i; θ, 𝒜) given by Lemma S2 and the uniform convergence of dΛ̂_k/dΛ̃_k, setting n → ∞ on both sides of the above inequality yields

E {log \frac{\prod_{k = 1}^{K} λ_{k}^{*} {({\tilde{T}}_{k i})}^{Δ_{k i}} Ψ (O_{i}; θ^{*}, A^{*})}{\prod_{k = 1}^{K} λ_{0 k} {({\tilde{T}}_{k i})}^{Δ_{k i}} Ψ (O_{i}; θ_{0}, A_{0})}} \geq 0.

The left-hand side of the above inequality is the negative Kullback-Leibler distance of the density indexed by (θ^*,𝒜^*). From the identifiability of the model implied by Theorem 1, we conclude that θ^* = θ₀ and $Λ_{k}^{*} = Λ_{k 0}$ . The desired consistency result follows.

To prove the asymptotic normality of the NPMLE, we adopt the arguments of van der Vaart (1998, pp. 419–424). Let ℘_n be the empirical measure determined by n i.i.d. observations, and let ℘ be the true probability measure. Let ℓ̇_θ(θ, 𝒜) be the derivative of log L_n(θ, 𝒜) with respect to θ, and let ℓ̇_k(θ, 𝒜)[H_k] be the derivative of log L_n(θ, 𝒜) along the path (Λ_k + εH_k). For any v ∈ ℝ^d and 𝒲 = (h₁, …, h_K) with h_k ∈ BV [0, τ], where BV [0, τ] is the space of functions of bounded variation on [0, τ], we have

P_{n} (v^{T} {\dot{ℓ}}_{θ} (\hat{θ}, \hat{A}) + \sum_{k = 1}^{K} {\hat{ℓ}}_{k} (\hat{θ}, \hat{A}) [\int h_{k} d {\hat{Λ}}_{k}]) = 0.

In addition,

P (v^{T} {\dot{ℓ}}_{θ} (θ_{0}, A_{0}) + \sum_{k = 1}^{K} {\dot{ℓ}}_{k} (θ_{0}, A_{0}) [\int h_{k} d Λ_{0 k}]) = 0.

Therefore,

\begin{array}{l} \sqrt{n} (P_{n} - P) (v^{T} {\dot{ℓ}}_{θ} (\hat{θ}, \hat{A}) + \sum_{k = 1}^{K} {\dot{ℓ}}_{k} (\hat{θ}, \hat{A}) [\int h_{k} d {\hat{Λ}}_{k}]) \\ = - \sqrt{n} P {(v^{T} {\dot{ℓ}}_{θ} (\hat{θ}, \hat{A}) + \sum_{k = 1}^{K} {\dot{ℓ}}_{k} (\hat{θ}, \hat{A}) [\int h_{k} d {\hat{Λ}}_{k}]) \\ - (v^{T} {\dot{ℓ}}_{θ} (θ_{0}, A_{0}) + \sum_{k = 1}^{K} {\dot{ℓ}}_{k} (θ_{0}, A_{0}) [\int h_{k} d Λ_{0 k}])} . \end{array}

(A.6)

From the Donsker properties of the classes of functions of ℓ̇_θ and ℓ̇_k implied by Lemma S2 and the consistency of θ̂ and 𝒜̂, we conclude that the left-hand side of (A.6) equals

\sqrt{n} (P_{n} - P) (v^{T} {\dot{ℓ}}_{θ} (θ_{0}, A_{0}) + \sum_{k = 1}^{K} {\dot{ℓ}}_{k} (θ_{0}, A_{0}) [\int h_{k} d Λ_{0 k}]) + o_{p} (1) .

This term converges to a Gaussian process in l^∞(𝒱 ×𝒬^K). By the Taylor series expansion, the right-hand side of (A.6) is of the form

\begin{array}{l} - \sqrt{n} {B_{1} {[v, W]}^{T} (\hat{θ} - θ_{0}) + \sum_{k = 1}^{K} \int B_{2 k} [v, W] d ({\hat{Λ}}_{k} - Λ_{0 k})} \\ + o_{p} (\sqrt{n} | \hat{θ} - θ_{0} | + \sqrt{n} \sum_{k = 1}^{K} {‖ {\hat{Λ}}_{k} - Λ_{0 k} ‖}_{V [0, τ]}), \end{array}

where ℬ ≡ (B₁,B₂₁, …,B₂_K) is the information operator and is linear in ℝ^d ×BV [0, τ]^K. By Lemma 2, ℬ is invertible. The rest of the proof then follows the arguments of van der Vaart (1998, pp. 419–424). Finally, because v^Tθ̂ is an asymptotically linear estimator of v^Tθ₀ with the influence function lying in the space spanned by the score functions, θ̂ is an efficient estimator for θ₀.

The following two lemmas are used in the proofs of Theorem 1 and Theorem 2 and are proved in Section S2 of the Supplementary Materials.

Lemma 1

Let Model A be

\begin{array}{l} X ∣ (η_{1}, η_{2}) \overset{d}{=} X ∣ η_{1} & ~ & F_{X ∣ η_{1}} (\cdot ∣ η_{1}), \\ Y ∣ (η_{1}, η_{2}) & ~ & F_{Y ∣ η} (\cdot ∣ η_{1}, η_{2}), \\ η_{2} ∣ η_{1} & ~ & F_{η_{2} ∣ η_{1}} (\cdot ∣ η_{1}), \\ η_{1} & ~ & F_{η_{1}}, \end{array}

where (X,Y) are observed, and (η₁, η₂) are latent. Model A is depicted in Figure A.1. Let $f_{X ∣ η_{1}} = F_{X ∣ η_{1}}^{'}, f_{Y ∣ η} = F_{Y ∣ η}^{'}, f_{η_{2} ∣ η_{1}} = F_{η_{2} ∣ η_{1}}^{'}$ , and $f_{η_{1}} = F_{η_{1}}^{'}$ . Assume that: (a) for any density functions f̃_{Y |η} and f̃_η₂|η₁,

\int f_{Y ∣ η} (Y ∣ η_{1}, η_{2}) f_{η_{2} ∣ η_{1}} (η_{2} ∣ η_{1}) d η_{2} = \int {\tilde{f}}_{Y ∣ η} (Y ∣ η_{1}, η_{2}) {\tilde{f}}_{η_{2} ∣ η_{1}} (η_{2} ∣ η_{1}) d η_{2} \forall Y, η_{1}

implies that (f_Y _|_η, f_η_₂|_η_₁) = (f̃_Y _|_η, f̃_η_₂|_η_₁), i.e., the model for Y is identifiable if η₁ is observed, (b) F_X_|_η_₁ and F_η_₁ are identifiable based on (X,Y), and (c) η₁ is a complete sufficient statistic in {F_η_₁|_X(· | X) : X ∈ 𝒳}, where F_η_₁|_X is the conditional distribution function of η₁ given X, and 𝒳 is the range of X. Then, Model A is identifiable. A sufficient condition for η₁ to be complete sufficient is that the density of X is of the form

f_{X ∣ η_{1}} (X ∣ η_{1}) \propto \prod_{j = 1}^{q} exp {X_{j} s_{j} (η_{1}) - a_{j} (η_{1})} b_{j} (X_{j}),

where X = (X₁, …,X_q), η₁ ↦ (s₁(η₁), …, s_q(η₁)) is one-to-one, and b_j is non-zero on some open set.

Figure A.1 — SEM Considered in Lemma 1. The SEM consists of two sets of latent variables and two sets of observed variables that may all be multivariate. The observed variable X depends only on the latent variable η₁, but the observed variable Y depends on both sets of latent variables.

Lemma 2

Under conditions (C1), (C2), (C3′), (C5′), and (D5), the model given by (1)–(3) has an invertible information operator.

Footnotes

SUPPLEMENTARY MATERIALS

The supplementary materials contain proofs of Lemmas 1 and 2, additional theoretical results, simulation results for Mplus, and additional data analysis results.

References

Asparouhov T, Masyn K, Muthen B. Continuous Time Survival in Latent Variable Models. ASA Section on Biometrics; Proceedings of the Joint Statistical Meeting; Seattle. August 2006; 2006. pp. 180–187. [Google Scholar]
Bollen KA. Structural Equations With Latent Variables. New York: Wiley; 1989. [Google Scholar]
Bollen KA, Davis WR. Two Rules of Identification for Structural Equation Models. Structural Equation Modeling. 2009;16:523–536. [Google Scholar]
Chareka P. A Finite-Interval Uniqueness Theorem for Bilateral Laplace Transforms. International Journal of Mathematics and Mathematical Sciences. 2007;2007 [Google Scholar]
Cox DR. Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B. 1972;34:187–220. [Google Scholar]
Dahly D, Adair L, Bollen K. A Structural Equation Model of the Developmental Origins of Blood Pressure. International Journal of Epidemiology. 2009;38:538–548. doi: 10.1093/ije/dyn242. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dempster AP, Laird NM, Rubin DB. Maximum Likelihood From Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society: Series B. 1977;39:1–38. [Google Scholar]
Henderson R, Diggle P, Dobson A. Joint Modelling of Longitudinal Measurements and Event Time Data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465. [DOI] [PubMed] [Google Scholar]
Huang J, Wellner JA. Interval Censored Survival Data: a Review of Recent Progress. In: Lin DY, Fleming TR, editors. Proceedings of the First Seattle Symposium in Biostatistics. New York: Springer; 1997. pp. 123–169. [Google Scholar]
Kortram RA, van Rooij ACM, Lenstra AJ, Ridder G. Constructive Identification of the Mixed Proportional Hazards Model. Statistica Neerlandica. 1995;49:269–281. [Google Scholar]
Kosorok MR, Lee BL, Fine JP. Robust Inference for Univariate Proportional Hazards Frailty Regression Models. The Annals of Statistics. 2004;32:1448–1491. [Google Scholar]
Larsen K. Joint Analysis of Time-to-Event and Multiple Binary Indicators of Latent Classes. Biometrics. 2004;60:85–92. doi: 10.1111/j.0006-341X.2004.00141.x. [DOI] [PubMed] [Google Scholar]
Larsen K. The Cox Proportional Hazards Model With a Continuous Latent Variable Measured by Multiple Binary Indicators. Biometrics. 2005;61:1049–1055. doi: 10.1111/j.1541-0420.2005.00374.x. [DOI] [PubMed] [Google Scholar]
Liu Q, Pierce DA. A Note on Gauss-Hermite Quadrature. Biometrika. 1994;81:624–629. [Google Scholar]
Louis TA. Finding the Observed Information Matrix When Using the EM Algorithm. Journal of the Royal Statistical Society: Series B. 1982;44:226–233. [Google Scholar]
Menendez JA, Lupu R. Fatty Acid Synthase and the Lipogenic Phenotype in Cancer Pathogenesis. Nature Reviews Cancer. 2007;7:763–777. doi: 10.1038/nrc2222. [DOI] [PubMed] [Google Scholar]
Moustaki I, Steele F. Latent Variable Models for Mixed Categorical and Survival Responses, With an Application to Fertility Preferences and Family Planning in Bangladesh. Statistical Modelling. 2005;5:327–342. [Google Scholar]
Murphy SA. Consistency in a Proportional Hazards Model Incorporating a Random Effect. The Annals of Statistics. 1994;22:712–731. [Google Scholar]
Muthén B, Masyn K. Discrete-Time Survival Mixture Analysis. Journal of Educational and Behavioral Statistics. 2005;30:27–58. [Google Scholar]
Muthén L, Muthén B. Mplus User’s Guide. 7. Los Angeles, CA: Muthén & Muthén; 1998–2015. [Google Scholar]
Naliboff BD, Kim SE, Bolus R, Bernstein CN, Mayer EA, Chang L. Gastrointestinal and Psychological Mediators of Health-Related Quality of Life in IBS and IBD: a Structural Equation Modeling Analysis. The American Journal of Gastroenterology. 2012;107:451–459. doi: 10.1038/ajg.2011.377. [DOI] [PMC free article] [PubMed] [Google Scholar]
Parner E. Asymptotic Theory for the Correlated Gamma-Frailty Model. The Annals of Statistics. 1998;26:183–214. [Google Scholar]
Rabe-Hesketh S, Skrondal A, Pickles A. Generalized Multilevel Structural Equation Modeling. Psychometrika. 2004;69:167–190. [Google Scholar]
Rabe-Hesketh S, Yang S, Pickles A. Multilevel Models for Censored and Latent Responses. Statistical Methods in Medical Research. 2001;10:409–427. doi: 10.1177/096228020101000604. [DOI] [PubMed] [Google Scholar]
Reilly T, O’Brien RM. Identification of Confirmatory Factor Analysis Models of Arbitrary Complexity: the Side-By-Side Rule. Sociological Methods & Research. 1996;24:473–491. [Google Scholar]
Stoolmiller M, Snyder J. Modeling Heterogeneity in Social Interaction Processes Using Multilevel Survival Analysis. Psychological Methods. 2006;11:164–177. doi: 10.1037/1082-989X.11.2.164. [DOI] [PubMed] [Google Scholar]
Stoolmiller M, Snyder J. Embedding Multilevel Survival Analysis of Dyadic Social Interaction in Structural Equation Models: Hazard Rates as Both Outcomes and Predictors. Journal of Pediatric Psychology. 2013:1–11. doi: 10.1093/jpepsy/jst076. [DOI] [PMC free article] [PubMed] [Google Scholar]
The Cancer Genome Atlas Research Network. Integrated Genomic Analyses of Ovarian Carcinoma. Nature. 2011;474:609–615. doi: 10.1038/nature10166. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tibshirani R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society: Series B. 1996;58:267–288. [Google Scholar]
Tsiatis AA, Davidian M. Joint Modeling of Longitudinal and Time-To- Event Data: an Overview. Statistica Sinica. 2004;14:809–834. [Google Scholar]
van der Vaart AW. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998. [Google Scholar]
Vicard P. On the Identification of a Single-Factor Model With Correlated Residuals. Biometrika. 2000;87:199–205. [Google Scholar]
Yuan M, Lin Y. Model Selection and Estimation in Regression With Grouped Variables. Journal of the Royal Statistical Society: Series B. 2006;68:49–67. [Google Scholar]
Zeng D, Lin DY. A General Asymptotic Theory for Maximum Likelihood Estimation in Semiparametric Regression Models With Censored Data. Statistica Sinica. 2010;20:871–910. [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

SupMaterials

NIHMS844851-supplement-SupMaterials.pdf^{(317.2KB, pdf)}

[R1] Asparouhov T, Masyn K, Muthen B. Continuous Time Survival in Latent Variable Models. ASA Section on Biometrics; Proceedings of the Joint Statistical Meeting; Seattle. August 2006; 2006. pp. 180–187. [Google Scholar]

[R2] Bollen KA. Structural Equations With Latent Variables. New York: Wiley; 1989. [Google Scholar]

[R3] Bollen KA, Davis WR. Two Rules of Identification for Structural Equation Models. Structural Equation Modeling. 2009;16:523–536. [Google Scholar]

[R4] Chareka P. A Finite-Interval Uniqueness Theorem for Bilateral Laplace Transforms. International Journal of Mathematics and Mathematical Sciences. 2007;2007 [Google Scholar]

[R5] Cox DR. Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B. 1972;34:187–220. [Google Scholar]

[R6] Dahly D, Adair L, Bollen K. A Structural Equation Model of the Developmental Origins of Blood Pressure. International Journal of Epidemiology. 2009;38:538–548. doi: 10.1093/ije/dyn242. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] Dempster AP, Laird NM, Rubin DB. Maximum Likelihood From Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society: Series B. 1977;39:1–38. [Google Scholar]

[R8] Henderson R, Diggle P, Dobson A. Joint Modelling of Longitudinal Measurements and Event Time Data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465. [DOI] [PubMed] [Google Scholar]

[R9] Huang J, Wellner JA. Interval Censored Survival Data: a Review of Recent Progress. In: Lin DY, Fleming TR, editors. Proceedings of the First Seattle Symposium in Biostatistics. New York: Springer; 1997. pp. 123–169. [Google Scholar]

[R10] Kortram RA, van Rooij ACM, Lenstra AJ, Ridder G. Constructive Identification of the Mixed Proportional Hazards Model. Statistica Neerlandica. 1995;49:269–281. [Google Scholar]

[R11] Kosorok MR, Lee BL, Fine JP. Robust Inference for Univariate Proportional Hazards Frailty Regression Models. The Annals of Statistics. 2004;32:1448–1491. [Google Scholar]

[R12] Larsen K. Joint Analysis of Time-to-Event and Multiple Binary Indicators of Latent Classes. Biometrics. 2004;60:85–92. doi: 10.1111/j.0006-341X.2004.00141.x. [DOI] [PubMed] [Google Scholar]

[R13] Larsen K. The Cox Proportional Hazards Model With a Continuous Latent Variable Measured by Multiple Binary Indicators. Biometrics. 2005;61:1049–1055. doi: 10.1111/j.1541-0420.2005.00374.x. [DOI] [PubMed] [Google Scholar]

[R14] Liu Q, Pierce DA. A Note on Gauss-Hermite Quadrature. Biometrika. 1994;81:624–629. [Google Scholar]

[R15] Louis TA. Finding the Observed Information Matrix When Using the EM Algorithm. Journal of the Royal Statistical Society: Series B. 1982;44:226–233. [Google Scholar]

[R16] Menendez JA, Lupu R. Fatty Acid Synthase and the Lipogenic Phenotype in Cancer Pathogenesis. Nature Reviews Cancer. 2007;7:763–777. doi: 10.1038/nrc2222. [DOI] [PubMed] [Google Scholar]

[R17] Moustaki I, Steele F. Latent Variable Models for Mixed Categorical and Survival Responses, With an Application to Fertility Preferences and Family Planning in Bangladesh. Statistical Modelling. 2005;5:327–342. [Google Scholar]

[R18] Murphy SA. Consistency in a Proportional Hazards Model Incorporating a Random Effect. The Annals of Statistics. 1994;22:712–731. [Google Scholar]

[R19] Muthén B, Masyn K. Discrete-Time Survival Mixture Analysis. Journal of Educational and Behavioral Statistics. 2005;30:27–58. [Google Scholar]

[R20] Muthén L, Muthén B. Mplus User’s Guide. 7. Los Angeles, CA: Muthén & Muthén; 1998–2015. [Google Scholar]

[R21] Naliboff BD, Kim SE, Bolus R, Bernstein CN, Mayer EA, Chang L. Gastrointestinal and Psychological Mediators of Health-Related Quality of Life in IBS and IBD: a Structural Equation Modeling Analysis. The American Journal of Gastroenterology. 2012;107:451–459. doi: 10.1038/ajg.2011.377. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] Parner E. Asymptotic Theory for the Correlated Gamma-Frailty Model. The Annals of Statistics. 1998;26:183–214. [Google Scholar]

[R23] Rabe-Hesketh S, Skrondal A, Pickles A. Generalized Multilevel Structural Equation Modeling. Psychometrika. 2004;69:167–190. [Google Scholar]

[R24] Rabe-Hesketh S, Yang S, Pickles A. Multilevel Models for Censored and Latent Responses. Statistical Methods in Medical Research. 2001;10:409–427. doi: 10.1177/096228020101000604. [DOI] [PubMed] [Google Scholar]

[R25] Reilly T, O’Brien RM. Identification of Confirmatory Factor Analysis Models of Arbitrary Complexity: the Side-By-Side Rule. Sociological Methods & Research. 1996;24:473–491. [Google Scholar]

[R26] Stoolmiller M, Snyder J. Modeling Heterogeneity in Social Interaction Processes Using Multilevel Survival Analysis. Psychological Methods. 2006;11:164–177. doi: 10.1037/1082-989X.11.2.164. [DOI] [PubMed] [Google Scholar]

[R27] Stoolmiller M, Snyder J. Embedding Multilevel Survival Analysis of Dyadic Social Interaction in Structural Equation Models: Hazard Rates as Both Outcomes and Predictors. Journal of Pediatric Psychology. 2013:1–11. doi: 10.1093/jpepsy/jst076. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] The Cancer Genome Atlas Research Network. Integrated Genomic Analyses of Ovarian Carcinoma. Nature. 2011;474:609–615. doi: 10.1038/nature10166. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] Tibshirani R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society: Series B. 1996;58:267–288. [Google Scholar]

[R30] Tsiatis AA, Davidian M. Joint Modeling of Longitudinal and Time-To- Event Data: an Overview. Statistica Sinica. 2004;14:809–834. [Google Scholar]

[R31] van der Vaart AW. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998. [Google Scholar]

[R32] Vicard P. On the Identification of a Single-Factor Model With Correlated Residuals. Biometrika. 2000;87:199–205. [Google Scholar]

[R33] Yuan M, Lin Y. Model Selection and Estimation in Regression With Grouped Variables. Journal of the Royal Statistical Society: Series B. 2006;68:49–67. [Google Scholar]

[R34] Zeng D, Lin DY. A General Asymptotic Theory for Maximum Likelihood Estimation in Semiparametric Regression Models With Censored Data. Statistica Sinica. 2010;20:871–910. [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Efficient Estimation for Semiparametric Structural Equation Models With Censored Data

Kin Yau Wong

Donglin Zeng

D Y Lin

Roles

Abstract

1. INTRODUCTION

2. BASIC FRAMEWORK

2.1 Model and Likelihood

2.2 Model Identifiability

Example 1

Figure 1.

Example 2

Figure 2.

3. COMPUTATION OF THE NPMLE

4. THEORETICAL PROPERTIES

4.1 Identifiability Conditions

Remark 1

Theorem 1

Remark 2

Remark 3

Remark 4

4.2 Asymptotic Properties

Remark 5

Remark 6

Theorem 2

Remark 7

Remark 8

5. SIMULATION STUDIES

Figure 3.

Table 1.

6. REAL DATA ANALYSIS

Figure 4.

7. DISCUSSION

Supplementary Material

Acknowledgments

APPENDIX: TECHNICAL DETAILS

Proof of Theorem 1

Proof of Theorem 2

Lemma 1

Figure A.1.

Lemma 2

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases