Semiparametric Maximum Likelihood Estimation in Normal Transformation Models for Bivariate Survival Data

Yi Li; Ross L Prentice; Xihong Lin

doi:10.1093/biomet/asn049

Author manuscript; available in PMC: 2008 Dec 11.

Published in final edited form as: Biometrika. 2008 Dec;95(4):947–960. doi: 10.1093/biomet/asn049


PMCID: PMC2600666 NIHMSID: NIHMS50270 PMID: 19079778

SUMMARY

We consider a class of semiparametric normal transformation models for right censored bivariate failure times. Nonparametric hazard rate models are transformed to a standard normal model and a joint normal distribution is assumed for the bivariate vector of transformed variates. A semiparametric maximum likelihood estimation procedure is developed for estimating the marginal survival distribution and the pairwise correlation parameters. This produces an efficient estimator of the correlation parameter of the semiparametric normal transformation model, which characterizes the bivariate dependence of bivariate survival outcomes. In addition, a simple positive-mass-redistribution algorithm can be used to implement the estimation procedures. Since the likelihood function involves infinite-dimensional parameters, the empirical process theory is utilized to study the asymptotic properties of the proposed estimators, which are shown to be consistent, asymptotically normal and semiparametric efficient. A simple estimator for the variance of the estimates is also derived. The finite sample performance is evaluated via extensive simulations.

Keywords: Asymptotic normality, Bivariate failure time, Consistency, Semiparametric efficiency, Semiparametric maximum likelihood estimate, Semiparametric normal transformation

1 Introduction

Examples of censored bivariate data include the Danish Twin Study (Wienke et al., 2002), the diabetic retinopathy study (Hougaard, 2000), the dual infection kidney dialysis study (Van Keilegom & Hettmansperger, 2002) and the reproductive health study of the association of age at a marker event and age at menopause (Nan et al., 2006). In all these studies, both the marginal distributions and the dependence between related individuals, such as twins, are of major interest, the latter because it conveys genetic information.

Few existing bivariate distributions for nonnegative random variables accommodate semiparametric specifications of the marginal distributions and unrestricted pairwise dependence. Consider Clayton's (1978) model for a pair of survival times $(\tilde{T}_1, \tilde{T}_2)$, defined by

$$S(\tilde{t}_1, \tilde{t}_2) = \left[\max\left\{S_1(\tilde{t}_1)^{-\theta} + S_2(\tilde{t}_2)^{-\theta} - 1,\ 0\right\}\right]^{-\theta^{-1}}, \tag{1}$$

where $S(\tilde{t}_1, \tilde{t}_2) = \mathrm{pr}(\tilde{T}_1 > \tilde{t}_1, \tilde{T}_2 > \tilde{t}_2)$ is the bivariate survival function, $S_1(\tilde{t}_1) = S(\tilde{t}_1, 0^-)$ and $S_2(\tilde{t}_2) = S(0^-, \tilde{t}_2)$ are the marginal survival functions, and θ has an interpretation as a cross-ratio (Oakes, 1989) as well as corresponding to other dependence measures such as Kendall's τ. This model allows for negative dependence when −1 < θ < 0. However, for random variables $\tilde{T}_1$ and $\tilde{T}_2$ which are marginally absolutely continuous with respect to the Lebesgue measure µ, the joint distribution of $(\tilde{T}_1, \tilde{T}_2)$ is absolutely continuous with respect to the product Lebesgue measure µ × µ only when θ > −0.5. When θ ≤ −0.5, Oakes (1989) noted that the distribution is no longer absolutely continuous, but has mass along the curve $\{(\tilde{t}_1, \tilde{t}_2) : S_1(\tilde{t}_1)^{-\theta} + S_2(\tilde{t}_2)^{-\theta} - 1 = 0\}$. Hougaard (2000) further noted that frailty models cannot yield unrestricted marginal distributions with unrestricted pairwise parameters.
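As a minimal illustrative sketch of model (1), the following Python function evaluates the Clayton joint survival probability from two marginal survival values. The function name `clayton_survival` and the restriction of the marginal values to (0, 1] are assumptions made here for illustration only.

```python
def clayton_survival(s1: float, s2: float, theta: float) -> float:
    """Joint survival probability from model (1), given marginal survival values in (0, 1]."""
    if theta == 0.0:
        raise ValueError("theta = 0 is the independence limit; use s1 * s2 instead")
    inner = max(s1 ** (-theta) + s2 ** (-theta) - 1.0, 0.0)
    if inner == 0.0:
        # Only reachable for theta < 0: the point lies on or beyond the zero curve.
        return 0.0
    return inner ** (-1.0 / theta)


# Positive dependence (theta > 0): the joint survival exceeds the product 0.25.
print(clayton_survival(0.5, 0.5, 1.0))
# Negative dependence with theta = -0.5: this point lies on the zero curve.
print(clayton_survival(0.25, 0.25, -0.5))
```

The second call illustrates the boundary behaviour discussed above: for θ ≤ −0.5 the survival function reaches zero along a curve inside the support of the margins.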

Hence it is of substantial interest to specify a semiparametric likelihood model that allows for arbitrary modelling of the marginal survival functions, that allows for a flexible and interpretable correlation structure, and that retains a likelihood so that an efficient and simple estimating procedure is possible. For this purpose, we study a class of semiparametric normal transformation models for right censored bivariate failure times. Specifically, nonparametric marginal hazard rate models are transformed to a standard normal model and a joint normal distribution is imposed on the bivariate vector of transformed variates. The induced joint distribution is closely related to the normal copula model developed by, for example, Klaassen & Wellner (1997) and Pitt et al. (2006). However, all the previous efforts in normal copula focused only on non-censoring situations, and it is unclear whether these existing results can be generalized to censoring situations.

This paper is motivated by a recent work of Li & Lin (2006) on spatial survival data. Li & Lin only considered estimating equation approaches in spatial settings and their estimators are not efficient under the bivariate normal transformation model. In contrast, we focus this paper on semiparametric likelihood based inference for bivariate survival data.

2 Semiparametric Normal Transformation Models

Consider a pair of survival times $({\tilde{T}}_{1}, {\tilde{T}}_{2})$ , where each ${\tilde{T}}_{j}$ marginally has a cumulative hazard Λ_j(t). Then $Λ_{j} ({\tilde{T}}_{j})$ marginally follows a unit exponential distribution, and its probit transformation

$$T_j = \Phi^{-1}\left\{1 - e^{-\Lambda_j(\tilde{T}_j)}\right\} \tag{2}$$

has a standard normal distribution, where Φ(·) is the cumulative distribution function for N(0, 1).
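The transformation (2) can be sketched in a few lines of Python using only the standard library. The function name `probit_transform` is hypothetical, and the cumulative hazard value is assumed to be strictly positive and finite so that the argument of Φ⁻¹ lies strictly inside (0, 1).

```python
import math
from statistics import NormalDist


def probit_transform(cum_hazard: float) -> float:
    """Return Phi^{-1}{1 - exp(-Lambda)} for a cumulative hazard value Lambda > 0."""
    p = -math.expm1(-cum_hazard)  # 1 - exp(-Lambda), computed stably for small Lambda
    return NormalDist().inv_cdf(p)


# The median of the failure time (Lambda = log 2) maps to the median of N(0, 1).
print(probit_transform(math.log(2.0)))
```

Because the map is strictly increasing in the cumulative hazard, the ordering of failure times is preserved on the normal scale.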

To specify the correlation structure within the survival time pair $(\tilde{T}_1, \tilde{T}_2)$, we assume that the normally transformed survival time pair (T₁, T₂) is jointly normally distributed with correlation coefficient ρ and with a joint tail probability function

$$\Psi(z_1, z_2; \rho) = \int_{z_1}^{\infty} \int_{z_2}^{\infty} \phi(x_1, x_2; \rho)\, dx_1\, dx_2, \tag{3}$$

where ϕ(x₁, x₂; ρ) is the joint probability density function for a bivariate normal vector with mean (0, 0) and covariance matrix $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. It follows that the bivariate survival function for the original survival time pair $(\tilde{T}_1, \tilde{T}_2)$ is

$$S(\tilde{t}_1, \tilde{t}_2; \rho) = \mathrm{pr}(\tilde{T}_1 > \tilde{t}_1, \tilde{T}_2 > \tilde{t}_2; \rho) = \Psi\left[\Phi^{-1}\{F_1(\tilde{t}_1)\}, \Phi^{-1}\{F_2(\tilde{t}_2)\}; \rho\right], \tag{4}$$

where F_j(·) is the marginal cumulative distribution function of ${\tilde{T}}_{j}$ for j = 1, 2. In addition, the density for the original survival time pair $({\tilde{T}}_{1}, {\tilde{T}}_{2})$ is

$$f(\tilde{t}_1, \tilde{t}_2; \rho) = f_1(\tilde{t}_1)\, f_2(\tilde{t}_2)\, e^{g(t_1, t_2; \rho)}, \tag{5}$$

where $t_i = \Phi^{-1}\{1 - e^{-\Lambda_i(\tilde{t}_i)}\}$, $f_i(\tilde{t}) = \lambda_i(\tilde{t}) \exp\{-\Lambda_i(\tilde{t})\}$ is the marginal density for $\tilde{T}_i$, $i = 1, 2$, and

$$g(t_1, t_2; \rho) = -0.5 \log(1 - \rho^2) - 0.5\,(1 - \rho^2)^{-1}\left(\rho^2 t_1^2 + \rho^2 t_2^2 - 2\rho t_1 t_2\right). \tag{6}$$
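The function g in (6) is the logarithm of the ratio of the bivariate normal density to the product of its standard normal margins. The sketch below checks this identity numerically; the helper name `log_density_ratio` is introduced here purely for illustration.

```python
import math


def g(t1: float, t2: float, rho: float) -> float:
    """The function g(t1, t2; rho) of equation (6)."""
    one_minus = 1.0 - rho * rho
    quad = rho * rho * (t1 * t1 + t2 * t2) - 2.0 * rho * t1 * t2
    return -0.5 * math.log(one_minus) - 0.5 * quad / one_minus


def log_density_ratio(t1: float, t2: float, rho: float) -> float:
    """log of phi(t1, t2; rho) / {phi(t1) phi(t2)}, written out from the densities."""
    one_minus = 1.0 - rho * rho
    log_joint = (-math.log(2.0 * math.pi) - 0.5 * math.log(one_minus)
                 - (t1 * t1 - 2.0 * rho * t1 * t2 + t2 * t2) / (2.0 * one_minus))
    log_margins = -math.log(2.0 * math.pi) - 0.5 * (t1 * t1 + t2 * t2)
    return log_joint - log_margins


# The two expressions agree up to rounding error.
print(g(0.3, -1.2, 0.5), log_density_ratio(0.3, -1.2, 0.5))
```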

It is obvious that ρ = 0 results in $f(\tilde{t}_1, \tilde{t}_2; \rho = 0) = f_1(\tilde{t}_1) f_2(\tilde{t}_2)$, corresponding to the independent case. One can easily show that the bivariate survival function approaches the upper Fréchet bound $\min\{S_1(\tilde{t}_1), S_2(\tilde{t}_2)\}$ as $\rho \to 1^-$, and the lower Fréchet bound $\max\{S_1(\tilde{t}_1) + S_2(\tilde{t}_2) - 1, 0\}$ as $\rho \to -1^+$. Indeed, the correlation parameter ρ provides a summary measure for the pairwise dependence, whose connection with the other commonly used dependence measures, including Kendall's tau, Spearman's rho and the cross ratio, can be found in Li & Lin (2006). It is also of interest to note that (5) can be rewritten as

$$\frac{f(\tilde{t}_1 \mid \tilde{t}_2)}{f(\tilde{t}_1 \mid \emptyset)} = \frac{f(\tilde{t}_2 \mid \tilde{t}_1)}{f(\tilde{t}_2 \mid \emptyset)} = e^{g(t_1, t_2; \rho)},$$

where f(·|·) denotes a conditional density function and ∅ is the empty set. Hence the function g, or the parameter ρ, can also be interpreted as a Bayes factor for a dependence model against an independence model.
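The survival function (4) requires the bivariate normal tail probability (3). One way to evaluate it without specialised libraries, sketched below, is to integrate the standard normal density of T₁ against the conditional tail probability of T₂ using Simpson's rule. The integration scheme, the truncation point `upper = 9.0`, and the names `joint_tail` and `bivariate_survival` are choices made here for illustration; they are not taken from the paper.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def joint_tail(z1: float, z2: float, rho: float, n: int = 2000, upper: float = 9.0) -> float:
    """Approximate Psi(z1, z2; rho) = pr(T1 > z1, T2 > z2) by Simpson's rule (n even)."""
    if z1 >= upper:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x: float) -> float:
        # density of T1 at x, times pr(T2 > z2 | T1 = x)
        return _STD.pdf(x) * (1.0 - _STD.cdf((z2 - rho * x) / scale))

    h = (upper - z1) / n
    total = integrand(z1) + integrand(upper)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(z1 + k * h)
    return total * h / 3.0


def bivariate_survival(t1, t2, cum_haz1, cum_haz2, rho):
    """Equation (4), with the marginal cumulative hazards passed in as callables."""
    z1 = _STD.inv_cdf(-math.expm1(-cum_haz1(t1)))
    z2 = _STD.inv_cdf(-math.expm1(-cum_haz2(t2)))
    return joint_tail(z1, z2, rho)


# Unit exponential margins with rho = 0 reduce to the product exp(-t1) * exp(-t2).
print(bivariate_survival(0.3, 0.7, lambda t: t, lambda t: t, 0.0))
```

Setting ρ close to 1 or −1 in this sketch gives values close to the Fréchet bounds mentioned above, although the integrand then becomes sharply peaked and a finer grid may be needed.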

We are now in a position to consider estimation based on a censored sample of m pairs; that is, we estimate the marginal hazard rate and the correlation parameter on the basis of observed pairs $(\tilde{X}_{i1}, \delta_{i1}, \tilde{X}_{i2}, \delta_{i2})$, where $\tilde{X}_{ij} = \tilde{T}_{ij} \wedge \tilde{U}_{ij} := \min(\tilde{T}_{ij}, \tilde{U}_{ij})$ and $\delta_{ij} = I(\tilde{T}_{ij} \leq \tilde{U}_{ij})$, for $j = 1, 2$. For simplicity, we assume that censoring is random in that the censoring pair $(\tilde{U}_{i1}, \tilde{U}_{i2})$ is independent of the survival pair $(\tilde{T}_{i1}, \tilde{T}_{i2})$. The likelihood function can then be factorized into the product of contributions from the survival and censoring times, facilitating likelihood-based inference.

In some applications involving bivariate survival data, including studies of disease occurrence patterns of twins or siblings, it is natural to restrict the marginal cumulative hazard to be common for members of the same pair. Hence, we first consider drawing inference with the constraint of Λ₁ ≡ Λ₂ in §3, followed by the case of separate marginal cumulative hazards Λ_j, j = 1, 2, which do not have such a constraint, in §4.

3 Semiparametric Maximum Likelihood Estimation With A Common Marginal Cumulative Hazard

3.1 The Likelihood Function

This section proposes a semiparametric maximum likelihood estimation procedure for the semiparametric normal transformation model with a common marginal cumulative hazard, Λ, say. We define the normally transformed observed time $X_{ij} = \Phi^{-1}[1 - \exp\{-\Lambda(\tilde{X}_{ij})\}]$, for $j = 1, 2$. As this transformation is monotone, it can easily accommodate right censored data, since the transformed outcome (X_ij, δ_ij) contains the same information as the original $(\tilde{X}_{ij}, \delta_{ij})$, facilitating the derivation of a likelihood function that can be factorized into the product of contributions from the survival and censoring times. It follows that the likelihood function for the unknown parameters (Λ, ρ) can be written, up to a constant, as the product of factors

$$\begin{aligned}
\tilde{L}_i(\rho, \Lambda) ={}& \left\{ e^{g(X_{i1}, X_{i2}; \rho)}\, \Lambda'(\tilde{X}_{i1})\, \Lambda'(\tilde{X}_{i2})\, e^{-\Lambda(\tilde{X}_{i1}) - \Lambda(\tilde{X}_{i2})} \right\}^{\delta_{i1}\delta_{i2}} \left\{ \Psi_1(X_{i1}, X_{i2}; \rho)\, \Lambda'(\tilde{X}_{i1})\, e^{-\Lambda(\tilde{X}_{i1})} \right\}^{\delta_{i1}(1-\delta_{i2})} \\
&\times \left\{ \Psi_2(X_{i1}, X_{i2}; \rho)\, \Lambda'(\tilde{X}_{i2})\, e^{-\Lambda(\tilde{X}_{i2})} \right\}^{(1-\delta_{i1})\delta_{i2}} \times \left\{ \Psi(X_{i1}, X_{i2}; \rho) \right\}^{(1-\delta_{i1})(1-\delta_{i2})},
\end{aligned} \tag{7}$$

for $i = 1, \dots, m$, where $\Psi_j(x_1, x_2; \rho) = -(\partial/\partial x_j) \Psi(x_1, x_2; \rho) / \phi(x_j)$, for $j = 1, 2$. Indeed, $\Psi_j(x_1, x_2; \rho) = \mathrm{pr}(T_{3-j} \geq x_{3-j} \mid T_j = x_j)$, for $j = 1, 2$.
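To make the four cases in (7) concrete, the sketch below evaluates the log contribution of a single pair, with the derivative of Λ replaced by a jump size. It relies on the standard fact that, for a standard bivariate normal pair, the conditional tail probability is pr(T₂ ≥ x₂ | T₁ = x₁) = 1 − Φ{(x₂ − ρx₁)/(1 − ρ²)^{1/2}}. All function names, the argument layout, and the Simpson-rule evaluation of Ψ are assumptions made for illustration, and the cumulative hazard values are assumed to be strictly positive.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def g(t1, t2, rho):
    one_minus = 1.0 - rho * rho
    return -0.5 * math.log(one_minus) - 0.5 * (
        rho * rho * (t1 * t1 + t2 * t2) - 2.0 * rho * t1 * t2) / one_minus


def cond_tail(x_obs, x_cens, rho):
    """pr(T_cens >= x_cens | T_obs = x_obs) under the standard bivariate normal."""
    return 1.0 - STD.cdf((x_cens - rho * x_obs) / math.sqrt(1.0 - rho * rho))


def joint_tail(z1, z2, rho, n=2000, upper=9.0):
    """Psi(z1, z2; rho) by Simpson's rule over the T1 axis (illustrative choice)."""
    if z1 >= upper:
        return 0.0
    h = (upper - z1) / n
    f = lambda x: STD.pdf(x) * cond_tail(x, z2, rho)
    total = f(z1) + f(upper)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(z1 + k * h)
    return total * h / 3.0


def log_contribution(Lam1, Lam2, jump1, jump2, d1, d2, rho):
    """log of (7) for one pair; Lam_j = Lambda(X_j) > 0, jump_j is used only if d_j = 1."""
    x1 = STD.inv_cdf(-math.expm1(-Lam1))
    x2 = STD.inv_cdf(-math.expm1(-Lam2))
    if d1 and d2:      # both failures observed
        return g(x1, x2, rho) + math.log(jump1) + math.log(jump2) - Lam1 - Lam2
    if d1:             # member 1 fails, member 2 censored: Psi_1 term
        return math.log(cond_tail(x1, x2, rho)) + math.log(jump1) - Lam1
    if d2:             # member 2 fails, member 1 censored: Psi_2 term
        return math.log(cond_tail(x2, x1, rho)) + math.log(jump2) - Lam2
    return math.log(joint_tail(x1, x2, rho))  # both censored


print(log_contribution(0.4, 0.9, 0.05, 0.10, 1, 0, 0.5))
```

With ρ = 0 every case collapses to the sum of the two univariate censored-data log-likelihood terms, which gives a simple sanity check.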

Direct maximization of the above likelihood in a space containing continuous hazard Λ(·) is not feasible, as one can always let the likelihood go to ∞ by choosing some continuous function Λ(·) with fixed values at each ${\tilde{X}}_{i j}$ while letting Λ′(·) go to ∞ at an observed failure time, for example, at some ${\tilde{X}}_{i j}$ with δ_ij = 1. Thus we need to assume that Λ is cadlag and piecewise constant, where by cadlag we mean right continuous with left hand limit. It follows that the maximum likelihood estimator of Λ(·) will be the one which jumps only at distinct observed failure times. We denote the jump size of Λ(·) at t by ΔΛ(t) = Λ(t) − Λ(t−). The semiparametric maximum likelihood estimator is the maximizer of the empirical likelihood function L(ρ, Λ), which is the product of terms (7) with Λ′(·) replaced by ΔΛ(·). We denote the log empirical likelihood function by ℓ(ρ, Λ) = log L(ρ, Λ).

3.2 Theoretical Properties of the Semiparametric Maximum Likelihood Estimator

The main results of the paper are proved under the following set of regularity conditions.

Condition 1. (Boundedness.) The parameter ρ lies in a compact set within (−1, 1).

Condition 2. (Finite Interval.) There exist a t₀ > 0 and a constant c₀ > 0 such that $\mathrm{pr}(\tilde{U}_{ij} \geq t_0) = \mathrm{pr}(\tilde{U}_{ij} = t_0) > c_0$. In practice, t₀ is usually the duration of the study.

Condition 3. (Differentiability.) The marginal cumulative hazard Λ(t) is differentiable and Λ′(t) > 0 over [0, t₀]. Moreover, Λ(t₀) < ∞.

Condition 1 ensures the existence and consistency of the estimators. A similar boundedness condition on the frailty parameter was assumed by Murphy (1994) in the context of frailty models for the same purpose. Condition 2 ensures that the failures for both pair members can be observed over a finite interval [0, t₀], which makes it possible to estimate the hazard over [0, t₀]. Condition 3 implies absolute continuity of the cumulative hazard, which is useful in the consistency proof, and that we can work with the supremum norm on the space of cumulative hazard functions. Also, Condition 3 guarantees the identifiability of the semiparametric normal transformation model specified in (2) and (3).

Under Conditions 1–3, we show in a technical report, obtainable at http://biowww.dfci.harvard.edu/~yili/bikaproof.pdf, that the semiparametric maximum likelihood estimates do exist and are finite. Furthermore, the next two Propositions indicate that the semiparametric maximum likelihood estimator of Λ remains bounded, and that the estimators of {ρ,Λ(·)} are consistent and asymptotically normal estimators of the true parameters. The proofs can be found in the aforementioned technical report.

Proposition 1

(Consistency) Denote the true parameters by (ρ₀, Λ₀). Then |ρ̂ − ρ₀| → 0 and $\sup_{t \in [0, t_0]} |\hat{\Lambda}(t) - \Lambda_0(t)| \to 0$ almost surely.

Proposition 2

(Asymptotic Normality) The scaled process $m^{1/2}(\hat{\rho} - \rho_0, \hat{\Lambda} - \Lambda_0)$ converges weakly to a zero-mean Gaussian process in the metric space $\mathbb{R} \times l^{\infty}[0, t_0]$, where $l^{\infty}[0, t_0]$ is the linear space containing all the bounded functions on [0, t₀] equipped with the supremum norm. Furthermore, ρ̂ and $\int_0^{t_0} \eta(s)\, d\hat{\Lambda}(s)$ are asymptotically efficient, where η(s) is any function of bounded variation over [0, t₀].

Proposition 2 implies that both ρ̂ and Λ̂(t), and hence the estimator of the marginal survival are asymptotically efficient by taking η(s) = I(s ≤ t) for any t ∈ [0,t₀]. It further implies that the infinite-dimensional parameter, Λ(·), can be treated in the same fashion as the finite-dimensional correlation parameter ρ. Hence the asymptotic covariance matrix can be estimated by inverting the observed information matrix. To be specific, for any constant h₁ and any function h₂ of bounded variation, the asymptotic variance of

$$h_1 \hat{\rho} + \int_0^{t_0} h_2(s)\, d\hat{\Lambda}(s) = h_1 \hat{\rho} + \sum_{\{(i, j)\,:\,\delta_{ij} = 1\}} h_2(\tilde{X}_{ij})\, \Delta\hat{\Lambda}(\tilde{X}_{ij}) \tag{8}$$

can be estimated by $\hat{h}' \hat{J}^{-1} \hat{h}$, where ĥ is a column vector comprising h₁ and the $h_2(\tilde{X}_{ij})$ for which δ_ij = 1, and Ĵ is the negative Hessian matrix of ℓ(ρ, Λ) with respect to ρ and the jump sizes of Λ at the $\tilde{X}_{ij}$ for which δ_ij = 1. More formally, it can be shown that $m\, \hat{h}' \hat{J}^{-1} \hat{h} \to V(h_1, h_2)$ in probability as m → ∞, where V(h₁, h₂) is the asymptotic variance of $m^{1/2} \{h_1 \hat{\rho} + \int_0^{t_0} h_2(s)\, d\hat{\Lambda}(s)\}$. The justification follows the proof of Theorem 3 in Parner (1998), who argued that the empirical information operator based on Ĵ approximates the true invertible information operator. We will evaluate the finite-sample performance of this variance estimator in §5.
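The variance estimate above is a quadratic form in the inverse of the negative Hessian. A minimal sketch, assuming Ĵ is available as a dense list of lists, is to solve the linear system Ĵy = ĥ by Gaussian elimination and return ĥ′y rather than forming the inverse explicitly. The names `solve_linear` and `functional_variance` are hypothetical, and the small matrix in the usage line is made up purely to exercise the code.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def functional_variance(h, neg_hessian):
    """Return h' J^{-1} h, where J is the negative Hessian of the log likelihood."""
    y = solve_linear(neg_hessian, h)
    return sum(hi * yi for hi, yi in zip(h, y))


# Toy 2 x 2 example: first entry for rho, second for a single hazard jump.
print(functional_variance([1.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]))
```

Choosing ĥ with h₁ = 1 and all other entries zero returns the estimated variance of ρ̂ alone.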

3.3 A Positive-Mass-Redistribution Algorithm

Consider the following computationally efficient procedure for obtaining the semiparametric maximum likelihood estimates and their variance-covariance matrix. Since $\tilde{T}_1$ and $\tilde{T}_2$ have the same distribution function, whose estimator has masses at the distinct failure times of $\tilde{T}_1$ and $\tilde{T}_2$, we denote by t₁ < … < t_K the K distinct, ordered and pooled $\tilde{T}_1$ and $\tilde{T}_2$ failure times. Define $r(t_1, t_2) = \#\{l \mid \tilde{X}_{1l} \geq t_1, \tilde{X}_{2l} \geq t_2\}$ to be the size of the risk set at (t₁, t₂) and let $\tilde{R} = \{(t_1, t_2) \mid r(t_1, t_2) > 0\}$ denote the risk region. We focus on the square grid $\{(t_1, t_2) \mid t_1 = t_i, t_2 = t_j, 1 \leq i \leq K, 1 \leq j \leq K\}$ formed by the observed pooled $\tilde{T}_1$ and $\tilde{T}_2$ failure times. This is because the censored values of $\tilde{T}_1$, or $\tilde{T}_2$, in the sample can be replaced by censored values at the immediately smaller pooled uncensored failure time, or by zero if there is no smaller uncensored time, without affecting the log empirical likelihood ℓ(ρ, Λ). We refer to such replacement as positive mass redistribution. Then let $n_{ij}^{\delta_1 \delta_2} = \#\{l \mid \tilde{X}_{1l} = t_i, \tilde{X}_{2l} = t_j, \delta_{1l} = \delta_1, \delta_{2l} = \delta_2\}$ for $\delta_1, \delta_2 \in \{0, 1\}$, 0 ≤ i ≤ K and 0 ≤ j ≤ K, with t₀ = 0. Also let f_ij = f(t_i, t_j), $F_i = \prod_{l=1}^{i} (1 - \lambda_l)$ and $F_i^- = F_{i-1}$, for i = 1,…,K, where λ_l = ΔΛ(t_l). The log empirical likelihood ℓ(ρ, Λ) defined in §3.1 can now be written as

$$\begin{aligned}
\ell = \sum_{i=1}^{K} \sum_{j=1}^{K} \Big\{ & n_{ij}^{11} \log f_{ij} + n_{ij}^{10} \log\Big(\lambda_i F_i^- - \sum_{\upsilon=1}^{j} f_{i\upsilon}\Big) + n_{ij}^{01} \log\Big(\lambda_j F_j^- - \sum_{u=1}^{i} f_{uj}\Big) \\
& + n_{ij}^{00} \log\Big(F_i + F_j + \sum_{u=1}^{i} \sum_{\upsilon=1}^{j} f_{u\upsilon} - 1\Big) \Big\},
\end{aligned} \tag{9}$$

which involves only the marginal hazard rates at uncensored ${\tilde{T}}_{1}$ and ${\tilde{T}}_{2}$ times, and the joint density at grid points in the risk region. The latter can be rewritten as

$$f_{ij} = \lambda_i F_i^-\, \lambda_j F_j^-\, e^{g(s_i, s_j; \rho)},$$

with g(·,·) defined in (6) and s_i = Φ⁻¹(1 − F_i). To ensure numerical stability and avoid arguments of 0 for Φ⁻¹ in computation in finite samples, we use an asymptotically equivalent transformation s_i = Φ⁻¹{1 − (1 − m⁻¹)F_i}. A simple Newton-Raphson procedure, starting with ρ = 0 and the Kaplan-Meier marginal hazard rates derived by treating $({\tilde{X}}_{i j}, δ_{i j})$ , j = 1, 2, i = 1,…, m, as 2m independent observations, can be used to compute the semiparametric maximum likelihood estimates ${\hat{λ}}_{i}$ , ρ̂. These calculations are less computationally demanding as they do not require the evaluation of bivariate incomplete normal integrals, only the evaluation of the univariate Φ⁻¹.
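The starting values and the grid density described above can be sketched as follows. The code pools the 2m observations to obtain Kaplan-Meier hazard jumps, then evaluates f_ij with the stabilised transformation s_i = Φ⁻¹{1 − (1 − m⁻¹)F_i}. The function names and the tuple layout `(x1, d1, x2, d2)` are assumptions for illustration, and every jump is assumed to be strictly less than 1 (for example, because the largest pooled observation is censored), since otherwise F_i = 0 and Φ⁻¹ would be evaluated at 1.

```python
import math
from statistics import NormalDist


def pooled_km_hazards(pairs):
    """Kaplan-Meier hazard jumps from the 2m pooled observations (x, delta)."""
    obs = [(x1, d1) for x1, d1, _, _ in pairs] + [(x2, d2) for _, _, x2, d2 in pairs]
    grid = sorted({x for x, d in obs if d == 1})
    jumps = []
    for t in grid:
        events = sum(1 for x, d in obs if d == 1 and x == t)
        at_risk = sum(1 for x, _ in obs if x >= t)
        jumps.append(events / at_risk)
    return grid, jumps


def g(t1, t2, rho):
    one_minus = 1.0 - rho * rho
    return -0.5 * math.log(one_minus) - 0.5 * (
        rho * rho * (t1 * t1 + t2 * t2) - 2.0 * rho * t1 * t2) / one_minus


def grid_density(jumps, rho, m):
    """K x K matrix of f_ij = lambda_i F_i^- lambda_j F_j^- exp{g(s_i, s_j; rho)}."""
    std = NormalDist()
    mass, s = [], []
    surv = 1.0
    for lam in jumps:
        mass.append(lam * surv)      # lambda_i * F_i^-
        surv *= 1.0 - lam            # F_i
        s.append(std.inv_cdf(1.0 - (1.0 - 1.0 / m) * surv))
    return [[mi * mj * math.exp(g(si, sj, rho)) for mj, sj in zip(mass, s)]
            for mi, si in zip(mass, s)]


pairs = [(1.0, 1, 2.0, 1), (1.5, 0, 3.0, 0), (2.0, 1, 4.0, 0)]  # toy data, m = 3
grid, jumps = pooled_km_hazards(pairs)
print(grid, jumps)
print(grid_density(jumps, 0.5, m=len(pairs)))
```

At ρ = 0 the matrix reduces to the outer product of the marginal masses λ_i F_i⁻, which is the natural starting point for the Newton-Raphson iteration.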

If we follow the arguments in §3.2, the variability of ${\hat{λ}}_{i}$ , ρ̂ can be assessed by inverting the negative Hessian matrix of (9), denoted by Ĵ, a (K + 1) × (K + 1) matrix. Furthermore, the functional (8) can be rewritten as a linear combination of ρ̂ and the ${\hat{λ}}_{i}$ , namely,

$$h_1 \hat{\rho} + \sum_{i=1}^{K} h_2(t_i)\, \hat{\lambda}_i,$$

whose variance can be easily computed as $\hat{h}' \hat{J}^{-1} \hat{h}$, where ĥ = {h₁, h₂(t₁),…,h₂(t_K)}′. We can easily apply this result to estimate the variance of the estimator of a survival probability. For example, the common marginal survival $S(u_0) = \mathrm{pr}(\tilde{T}_1 > u_0)$ at any given time u₀ ∈ [0, t₀] can be estimated by $\hat{S}(u_0) = e^{-\hat{\Lambda}(u_0)}$. With a first-order Taylor expansion,

$$\hat{S}(u_0) - S(u_0) \approx -S(u_0) \{\hat{\Lambda}(u_0) - \Lambda(u_0)\} = \int_0^{t_0} -S(u_0)\, I(s \leq u_0)\, d\hat{\Lambda}(s) + \text{const}.$$

Hence, Ŝ(u₀) can be approximated by the functional form (8) with h₁ = 0 and h₂(s) = −S(u₀)I(s ≤ u₀), and application of the above results renders $\hat{S}^2(u_0)\, \hat{e}' \hat{J}^{-1} \hat{e}$ as a consistent estimator of the variance of Ŝ(u₀), where ê = {0, I(t₁ ≤ u₀),…,I(t_K ≤ u₀)}′.

4 Semiparametric Maximum Likelihood Estimation for the Stratified-Hazard Model

4.1 The Estimator and Its Theoretical Properties

In this section, we relax the condition of a common marginal hazard and allow each ${\tilde{T}}_{i j}$ to have a separate cumulative hazard function Λ_j(·), j = 1, 2. This is often appropriate in practice.

We consider joint maximum likelihood estimation, following a development parallel to that for the common-hazard model. Our inference stems from the likelihood function of the unknown parameters (Λ₁, Λ₂, ρ) based on the observed data $(\tilde{X}_{ij}, \delta_{ij})$, j = 1, 2, i = 1,…, m, which can be written, up to a constant, as the product over i = 1,…,m of terms

$$\begin{aligned}
\tilde{L}_i(\rho, \Lambda_1, \Lambda_2) ={}& \left\{ e^{g(X_{i1}, X_{i2}; \rho)}\, \Lambda_1'(\tilde{X}_{i1})\, \Lambda_2'(\tilde{X}_{i2})\, e^{-\Lambda_1(\tilde{X}_{i1}) - \Lambda_2(\tilde{X}_{i2})} \right\}^{\delta_{i1}\delta_{i2}} \left\{ \Psi_1(X_{i1}, X_{i2}; \rho)\, \Lambda_1'(\tilde{X}_{i1})\, e^{-\Lambda_1(\tilde{X}_{i1})} \right\}^{\delta_{i1}(1-\delta_{i2})} \\
&\times \left\{ \Psi_2(X_{i1}, X_{i2}; \rho)\, \Lambda_2'(\tilde{X}_{i2})\, e^{-\Lambda_2(\tilde{X}_{i2})} \right\}^{(1-\delta_{i1})\delta_{i2}} \times \left\{ \Psi(X_{i1}, X_{i2}; \rho) \right\}^{(1-\delta_{i1})(1-\delta_{i2})}.
\end{aligned} \tag{10}$$

Here $X_{ij} = \Phi^{-1}[1 - \exp\{-\Lambda_j(\tilde{X}_{ij})\}]$ for j = 1, 2. Again, direct maximization of the likelihood function in a space containing continuous hazards Λ₁(·) or Λ₂(·) is infeasible, as one can always make the likelihood arbitrarily large by constructing some continuous functions Λ₁(·) and Λ₂(·) with fixed values at each $\tilde{X}_{ij}$ while letting $\Lambda_1'(\cdot)$ or $\Lambda_2'(\cdot)$ go to ∞ at an observed failure time. Hence, when performing the maximum likelihood estimation, we assume that (Λ₁, Λ₂) are cadlag and piecewise constant. It follows that the semiparametric maximum likelihood estimator, $(\hat{\rho}, \hat{\Lambda}_1, \hat{\Lambda}_2)$, is the maximizer of the log empirical likelihood function ℓ(ρ, Λ₁, Λ₂), which is obtained from (10) with the derivatives $\Lambda_1'(\cdot)$ and $\Lambda_2'(\cdot)$ at the observed failure times respectively replaced by their jumps ΔΛ₁(·) and ΔΛ₂(·) at the corresponding time-points. We can show that $(\hat{\rho}, \hat{\Lambda}_1, \hat{\Lambda}_2)$ exist and are finite. Furthermore, under Conditions 1–3, with both Λ₁ and Λ₂ satisfying Condition 3, the asymptotic properties of the estimators are summarized in the following two propositions, the proofs of which can be found in our technical report.

Proposition 3

(Consistency) Denote the true parameters by (ρ₀, Λ₀₁, Λ₀₂). Then |ρ̂ − ρ₀| → 0, $\sup_{t \in [0, t_0]} |\hat{\Lambda}_1(t) - \Lambda_{01}(t)| \to 0$ and $\sup_{t \in [0, t_0]} |\hat{\Lambda}_2(t) - \Lambda_{02}(t)| \to 0$ almost surely.

Proposition 4

(Asymptotic Normality) The empirical process $m^{1/2}(\hat{\rho} - \rho_0, \hat{\Lambda}_1 - \Lambda_{01}, \hat{\Lambda}_2 - \Lambda_{02})$ converges weakly to a zero-mean Gaussian process in the metric space $\mathbb{R} \times l^{\infty}[0, t_0] \times l^{\infty}[0, t_0]$, where $l^{\infty}[0, t_0]$ is the linear space containing all the bounded functions on [0, t₀] equipped with the supremum norm. Furthermore, ρ̂, $\int_0^{t_0} \eta_1(s)\, d\hat{\Lambda}_1(s)$ and $\int_0^{t_0} \eta_2(s)\, d\hat{\Lambda}_2(s)$ are asymptotically efficient, where η₁(s) and η₂(s) are any functions of bounded variation over [0, t₀].

As in the case of a common-hazard model, the asymptotic covariance matrix of the estimators of the unknown finite-dimensional and infinite-dimensional parameters can be estimated by inverting the observed information matrix. For any constant h₁ and any functions h₂ and h₃ of bounded variation, the asymptotic variance of

$$h_1 \hat{\rho} + \int_0^{t_0} h_2(s)\, d\hat{\Lambda}_1(s) + \int_0^{t_0} h_3(s)\, d\hat{\Lambda}_2(s) = h_1 \hat{\rho} + \sum_{\{i\,:\,\delta_{i1} = 1\}} h_2(\tilde{X}_{i1})\, \Delta\hat{\Lambda}_1(\tilde{X}_{i1}) + \sum_{\{i\,:\,\delta_{i2} = 1\}} h_3(\tilde{X}_{i2})\, \Delta\hat{\Lambda}_2(\tilde{X}_{i2}) \tag{11}$$

can be estimated by $\hat{h}' \hat{J}^{-1} \hat{h}$, where ĥ is a column vector comprising h₁, the $h_2(\tilde{X}_{i1})$ for which δ_i1 = 1 and the $h_3(\tilde{X}_{i2})$ for which δ_i2 = 1, and Ĵ is the negative Hessian matrix of ℓ(ρ, Λ₁, Λ₂) with respect to ρ and the jump sizes of Λ_j at the $\tilde{X}_{ij}$ for which δ_ij = 1. Indeed, following the proof of Theorem 3 in Parner (1998), one can show $m\, \hat{h}' \hat{J}^{-1} \hat{h} \to V(h_1, h_2, h_3)$ in probability as m → ∞, where V(h₁, h₂, h₃) is the asymptotic variance of $m^{1/2} \{h_1 \hat{\rho} + \int_0^{t_0} h_2(s)\, d\hat{\Lambda}_1(s) + \int_0^{t_0} h_3(s)\, d\hat{\Lambda}_2(s)\}$.

4.2 Practical Implementation for the Stratified-Hazard Model

Denote by t₁₁ < … < t_1I the I distinct ordered $\tilde{T}_1$-failure times and by t₂₁ < … < t_2J the J distinct ordered $\tilde{T}_2$-failure times in the observed sample. As defined in §3.3, let r(t₁, t₂) be the size of the risk set at (t₁, t₂) and let $\tilde{R}$ be the risk region. We only consider the rectangular grid $\{(t_1, t_2) \mid t_1 = t_{1i}, t_2 = t_{2j}, 1 \leq i \leq I, 1 \leq j \leq J\}$ formed by the observed $\tilde{T}_1$ and $\tilde{T}_2$ failure times. This is because the censored values of $\tilde{T}_1$, or $\tilde{T}_2$, in the sample can be replaced by censored values at the immediately smaller $\tilde{T}_1$, or $\tilde{T}_2$, uncensored failure times, or by zero if there is no corresponding smaller uncensored time, without affecting the log empirical likelihood ℓ(ρ, Λ₁, Λ₂). With these replacements, or the so-called positive-mass-redistributions, let $n_{ij}^{\delta_1 \delta_2} = \#\{l \mid \tilde{X}_{1l} = t_{1i}, \tilde{X}_{2l} = t_{2j}, \delta_{1l} = \delta_1, \delta_{2l} = \delta_2\}$ for δ₁, δ₂ ∈ {0, 1} and for 0 ≤ i ≤ I and 0 ≤ j ≤ J, with t₁₀ = t₂₀ = 0. Also let $f_{ij} = f(t_{1i}, t_{2j})$, $F_{1i} = \prod_{l=1}^{i} (1 - \lambda_{1l})$, $F_{1i}^- = F_{1,i-1}$, $F_{2j} = \prod_{k=1}^{j} (1 - \lambda_{2k})$, and $F_{2j}^- = F_{2,j-1}$, where λ_1l = ΔΛ₁(t_1l) and λ_2k = ΔΛ₂(t_2k). The log empirical likelihood function can now be written as

$$\begin{aligned}
\ell = \sum_{i=1}^{I} \sum_{j=1}^{J} \Big\{ & n_{ij}^{11} \log f_{ij} + n_{ij}^{10} \log\Big(\lambda_{1i} F_{1i}^- - \sum_{\upsilon=1}^{j} f_{i\upsilon}\Big) + n_{ij}^{01} \log\Big(\lambda_{2j} F_{2j}^- - \sum_{u=1}^{i} f_{uj}\Big) \\
& + n_{ij}^{00} \log\Big(F_{1i} + F_{2j} + \sum_{u=1}^{i} \sum_{\upsilon=1}^{j} f_{u\upsilon} - 1\Big) \Big\},
\end{aligned} \tag{12}$$

which involves only the marginal hazard rates at uncensored $\tilde{T}_1$ and $\tilde{T}_2$ times, and the joint density at grid-points in the risk region, namely,

$$f_{ij} = \lambda_{1i} F_{1i}^-\, \lambda_{2j} F_{2j}^-\, e^{g(s_{1i}, s_{2j}; \rho)},$$

with s_1i = Φ⁻¹(1 − F_1i) and s_2j = Φ⁻¹(1 − F_2j). In practice, we use the asymptotically equivalent transformations s_1i = Φ⁻¹{1 − (1 − m⁻¹)F_1i} and s_2j = Φ⁻¹{1 − (1 − m⁻¹)F_2j}. A simple Newton-Raphson procedure, starting with ρ = 0 and the Kaplan-Meier marginal hazard rates $\lambda_{1i} = \sum_{j=1}^{J} (n_{ij}^{11} + n_{ij}^{10}) / r(t_{1i}, 0)$ and $\lambda_{2j} = \sum_{i=1}^{I} (n_{ij}^{11} + n_{ij}^{01}) / r(0, t_{2j})$, can be used to compute the semiparametric maximum likelihood estimates (SPMLEs) $\hat{\lambda}_{1i}$, $\hat{\lambda}_{2j}$ and ρ̂. We again note that the likelihood evaluations are not computationally demanding, requiring only the computation of the univariate Φ⁻¹.
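The positive-mass-redistribution step and the resulting cell counts can be sketched as below for the stratified case. Each observation is mapped to the index of the largest failure time in its own margin that does not exceed it, with index 0 standing for time zero. Treating a censored value tied with a failure time as remaining at that failure time is an assumption made here, as are the function name `redistribute` and the tuple layout `(x1, d1, x2, d2)`.

```python
from bisect import bisect_right
from collections import Counter


def redistribute(pairs):
    """Return (grid1, grid2, counts) with counts[(i, j, d1, d2)] playing the role of n_ij^{d1 d2}.

    pairs must be a list of tuples (x1, d1, x2, d2); indices i, j are 1-based
    positions on the margin-specific failure-time grids, and 0 means time zero.
    """
    grid1 = sorted({x1 for x1, d1, _, _ in pairs if d1 == 1})
    grid2 = sorted({x2 for _, _, x2, d2 in pairs if d2 == 1})

    def index(grid, x):
        # Number of failure times <= x. For an uncensored x this is its own
        # 1-based grid position; for a censored x it is the position of the
        # immediately smaller failure time, or 0 if there is none.
        return bisect_right(grid, x)

    counts = Counter()
    for x1, d1, x2, d2 in pairs:
        counts[(index(grid1, x1), index(grid2, x2), d1, d2)] += 1
    return grid1, grid2, counts


pairs = [(1.0, 1, 2.5, 0), (0.5, 0, 3.0, 1), (2.0, 1, 0.2, 0)]  # toy data
grid1, grid2, counts = redistribute(pairs)
print(grid1, grid2, dict(counts))
```

For the common-hazard model of §3.3 the same idea applies with a single pooled grid in place of `grid1` and `grid2`.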

Similarly, the variability of ${\hat{λ}}_{1 i}$ , ${\hat{λ}}_{2 j}$ and ρ̂ can be assessed by inverting the negative Hessian matrix of (12), denoted by the (I + J + 1) × (I + J + 1) matrix Ĵ. Moreover, the functional (11) can be rewritten as a linear combination of ρ̂ and the ${\hat{λ}}_{1 i}$ , ${\hat{λ}}_{2 j}$ , namely,

$$h_1 \hat{\rho} + \sum_{i=1}^{I} h_2(t_{1i})\, \hat{\lambda}_{1i} + \sum_{j=1}^{J} h_3(t_{2j})\, \hat{\lambda}_{2j},$$

whose variance can be easily computed by $\hat{h}' \hat{J}^{-1} \hat{h}$, where ĥ = {h₁, h₂(t₁₁),…, h₂(t_1I), h₃(t₂₁),…, h₃(t_2J)}′.

As an illustration of this variance formula, consider the estimator Ŝ(u₀, υ₀) of the bivariate survival function S(u₀, υ₀) at any given time pair (u₀, υ₀) ∈ [0, t₀]². Based on the semiparametric normal transformation model, this is

$$\hat{S}(u_0, \upsilon_0) = \Psi\left[\Phi^{-1}\{1 - e^{-\hat{\Lambda}_1(u_0)}\}, \Phi^{-1}\{1 - e^{-\hat{\Lambda}_2(\upsilon_0)}\}; \hat{\rho}\right].$$

To evaluate the variance of Ŝ(u₀, υ₀), we perform a first-order Taylor expansion, yielding

$$\begin{aligned}
\hat{S}(u_0, \upsilon_0) - S(u_0, \upsilon_0) \approx{}& \gamma_1(u_0, \upsilon_0)(\hat{\rho} - \rho_0) + \gamma_2(u_0, \upsilon_0) \{\hat{\Lambda}_1(u_0) - \Lambda_1(u_0)\} + \gamma_3(u_0, \upsilon_0) \{\hat{\Lambda}_2(\upsilon_0) - \Lambda_2(\upsilon_0)\} \\
={}& \gamma_1(u_0, \upsilon_0)\, \hat{\rho} + \int_0^{t_0} \gamma_2(u_0, \upsilon_0)\, I(s \leq u_0)\, d\hat{\Lambda}_1(s) + \int_0^{t_0} \gamma_3(u_0, \upsilon_0)\, I(s \leq \upsilon_0)\, d\hat{\Lambda}_2(s) + \text{const},
\end{aligned}$$

where $\gamma_1(t_1, t_2) = \partial \Psi(x_1, x_2; \rho) / \partial \rho$, $\gamma_2(t_1, t_2) = -\Psi_1(x_1, x_2; \rho_0) \exp\{-\Lambda_1(t_1)\}$, $\gamma_3(t_1, t_2) = -\Psi_2(x_1, x_2; \rho_0) \exp\{-\Lambda_2(t_2)\}$ and $x_j = \Phi^{-1}\{1 - e^{-\Lambda_j(t_j)}\}$. Hence, Ŝ(u₀, υ₀) can be approximated by the functional form (11) with h₁ = γ₁(u₀, υ₀), h₂(s) = γ₂(u₀, υ₀)I(s ≤ u₀) and h₃(s) = γ₃(u₀, υ₀)I(s ≤ υ₀), and application of the above variance formula renders a consistent estimator of the variance of Ŝ(u₀, υ₀), namely, $\hat{h}' \hat{J}^{-1} \hat{h}$, where $\hat{h} = \{\hat{\gamma}_1(u_0, \upsilon_0), \hat{\gamma}_2(u_0, \upsilon_0) I(t_{11} \leq u_0), \dots, \hat{\gamma}_2(u_0, \upsilon_0) I(t_{1I} \leq u_0), \hat{\gamma}_3(u_0, \upsilon_0) I(t_{21} \leq \upsilon_0), \dots, \hat{\gamma}_3(u_0, \upsilon_0) I(t_{2J} \leq \upsilon_0)\}'$ and $\hat{\gamma}_j(\cdot, \cdot)$ is obtained from γ_j(·,·), for j = 1, 2, 3, with all the unknown parameters replaced by their estimators. We evaluate the finite-sample performance of this variance estimator in §5.
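The coefficient of the Λ̂₁ term in the expansion is the derivative of the bivariate survival function with respect to Λ₁(u₀). By the chain rule this equals −pr(T₂ ≥ x₂ | T₁ = x₁) exp{−Λ₁(u₀)}, and the sketch below checks this against a central finite difference. The Simpson-rule evaluation of Ψ, the step size, and all function names are illustrative assumptions rather than part of the paper's procedure.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def cond_tail(x_given, x_other, rho):
    """pr(T_other >= x_other | T_given = x_given) for a standard bivariate normal pair."""
    return 1.0 - STD.cdf((x_other - rho * x_given) / math.sqrt(1.0 - rho * rho))


def joint_tail(z1, z2, rho, n=2000, upper=9.0):
    """Psi(z1, z2; rho) by Simpson's rule over the T1 axis."""
    if z1 >= upper:
        return 0.0
    h = (upper - z1) / n
    f = lambda x: STD.pdf(x) * cond_tail(x, z2, rho)
    total = f(z1) + f(upper)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(z1 + k * h)
    return total * h / 3.0


def to_normal(cum_hazard):
    return STD.inv_cdf(-math.expm1(-cum_hazard))


def survival(Lam1, Lam2, rho):
    """S as a function of the two cumulative hazard values and rho."""
    return joint_tail(to_normal(Lam1), to_normal(Lam2), rho)


def gamma2(Lam1, Lam2, rho):
    """Analytic derivative of S with respect to Lam1."""
    return -cond_tail(to_normal(Lam1), to_normal(Lam2), rho) * math.exp(-Lam1)


step = 1e-4
numeric = (survival(0.5 + step, 0.8, 0.5) - survival(0.5 - step, 0.8, 0.5)) / (2.0 * step)
print(numeric, gamma2(0.5, 0.8, 0.5))
```

The coefficient of the Λ̂₂ term follows by symmetry, exchanging the roles of the two margins.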

5 Numerical Studies

5.1 Preamble

A series of simulation studies was performed to examine the properties of the proposed estimator and to compare it with existing bivariate survivor function estimators, including the Prentice-Cai (Prentice & Cai, 1992), Dabrowska (1988) and repaired nonparametric maximum likelihood (van der Laan, 1996; Moodie et al., 2005) estimators. The simulation set-up mimics that of Prentice et al. (2004). The marginal distributions of $\tilde{T}_1$ and $\tilde{T}_2$ were specified as unit exponential, and the censoring time $\tilde{U}_1$ was taken to be an exponential variate with mean 0.5. Three special cases for $\tilde{U}_2$ were considered: $\tilde{U}_2 = \infty$, corresponding to no $\tilde{T}_2$ censoring; $\tilde{U}_2 = \tilde{U}_1$, corresponding to univariate censoring; and $\tilde{U}_2$ independent of $\tilde{U}_1$ and exponentially distributed with mean 0.5. A sample size of 120 pairs was considered, with 1000 repetitions at each configuration.
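One replicate of such a data set can be generated as sketched below. The sketch assumes that dependence between the failure times follows the normal transformation model (4), as in §5.2, and inverts (2) with Λ(t) = t to obtain unit exponential margins. The function name, the scenario labels and the seeding scheme are hypothetical.

```python
import math
import random


def simulate_pairs(m, rho, censoring, seed=0):
    """Return m tuples (x1, d1, x2, d2) with unit exponential margins and normal-copula dependence."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Invert (2) with Lambda(t) = t: t = -log{1 - Phi(z)}, and 1 - Phi(z) = erfc(z / sqrt 2) / 2.
        t1 = -math.log(0.5 * math.erfc(z1 / math.sqrt(2.0)))
        t2 = -math.log(0.5 * math.erfc(z2 / math.sqrt(2.0)))
        u1 = rng.expovariate(2.0)  # exponential censoring time with mean 0.5
        if censoring == "t1_only":
            u2 = math.inf
        elif censoring == "univariate":
            u2 = u1
        elif censoring == "bivariate":
            u2 = rng.expovariate(2.0)
        else:
            raise ValueError("unknown censoring scenario")
        data.append((min(t1, u1), int(t1 <= u1), min(t2, u2), int(t2 <= u2)))
    return data


sample = simulate_pairs(120, 0.5, "bivariate", seed=1)
print(sum(d1 for _, d1, _, _ in sample), "uncensored first-member failures out of", len(sample))
```

With unit exponential failure times and exponential censoring of mean 0.5, each censored margin is uncensored with probability 1/3, so censoring is heavy in these configurations.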

5.2 Finite sample performance under the correct model

We began by evaluating the finite sample performance of the semiparametric maximum likelihood estimator when the true model follows the semiparametric normal transformation model (4) with ρ = 0.5. We considered both the common-hazard model and the stratified-hazard model, but, as both models yielded similar results, we only report in Table 1 the simulation results for the common-hazard model. As efficient estimation of the common hazard function or the common marginal distribution function is of major interest under the common-hazard model, we report only the estimates of the marginal survival function at various time-points in Table 1. The sample averages of the estimates are very close to the true values, and the model-based standard errors, which were computed by applying the results of §3.3, match very well with the empirical standard errors up to the 3rd decimal point.

Table 1.

Averages and model based and empirical SEs of the SPMLEs under the semiparametric normal transformation model (4) with a common hazard function. The true values are: ρ = 0.5; S(0.1625) = 0.85; S(0.3566) = 0.70; S(0.5978) = 0.55.

				t = 0.1625		t = 0.3566		t = 0.5978
Censoring	ρ	SE_e	SE_m	Ŝ(t)	SE_e	Ŝ(t)	SE_e	Ŝ(t)	SE_e
Censoring on T₁ only	0.502	0.109	0.104	0.842	0.023	0.692	0.035	0.546	0.044
Univariate censoring	0.503	0.122	0.114	0.842	0.025	0.694	0.035	0.546	0.051
Bivariate censoring	0.493	0.141	0.138	0.846	0.027	0.697	0.042	0.555	0.055


SE_e, empirical standard error; SE_m, model-based standard error.

5.3 Finite sample performance under the misspecified model

We next considered the robustness of the SPMLE when the semiparametric normal transformation model was misspecified. The failure times were generated under the following bivariate Clayton model:

S({\tilde{t}}_{1}, {\tilde{t}}_{2}) = \{S_{1}({\tilde{t}}_{1})^{-θ} + S_{2}({\tilde{t}}_{2})^{-θ} - 1\}^{-1/θ},

(13)

with θ = 4, implying a strong positive dependence between ${\tilde{T}}_{1}$ and ${\tilde{T}}_{2}$. We compared the performance of the semiparametric maximum likelihood estimator based on the semiparametric normal transformation model, ${\hat{S}}_{NT}$, with that of existing nonparametric estimators, including the Prentice-Cai estimator, ${\hat{S}}_{PC}$, the empirical hazard rate estimator, ${\hat{S}}_{E}$, and the redistributed empirical estimator, ${\hat{S}}_{RE}$, results for which were taken from Table 1 and Table 2 of Prentice et al. (2004). As Prentice et al. (2004) only considered stratified-hazard models, we focused on the stratified-hazard model to make the resulting estimates comparable.
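To make model (13) concrete, the short sketch below evaluates the Clayton joint survival function at the three time pairs used in the simulations, assuming the unit exponential margins of §5.1, so that S_j(t) = exp(−t). The function name `clayton_survival` is a hypothetical helper introduced for illustration.

```python
import math


def clayton_survival(t1, t2, theta=4.0):
    """Joint survival under the Clayton model (13) with unit exponential margins."""
    s1 = math.exp(-t1)  # marginal survival S_1(t1)
    s2 = math.exp(-t2)  # marginal survival S_2(t2)
    return (s1 ** -theta + s2 ** -theta - 1.0) ** (-1.0 / theta)


time_pairs = [(0.1625, 0.1625), (0.1625, 0.3566), (0.3566, 0.3566)]
true_values = [clayton_survival(t1, t2) for t1, t2 in time_pairs]
```

Rounded to three decimals, these values agree with the true bivariate survival probabilities 0.771, 0.666 and 0.608 used as reference values in the comparison.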

Table 2.

Averages, SEs and mean squared errors (MSE) for various bivariate survival function estimators at various time pairs (t₁, t₂) when the correct model (13) with θ = 4 is misspecified to the semiparametric normal model (4). The true bivariate survival probabilities at these pairs are 0.771, 0.666 and 0.608, respectively.

		(t₁, t₂) = (0.1625, 0.1625)			(t₁, t₂) = (0.1625, 0.3566)			(t₁, t₂) = (0.3566, 0.3566)

Censoring	Estimator	bias	SE	MSE	bias	SE	MSE	bias	SE	MSE
				(×10⁻³)			(×10⁻³)			(×10⁻³)
Censoring on T₁ only	${\hat{S}}_{E}$	0.0%	0.046	2.1	0.3%	0.055	3.0	0.0%	0.057	3.2
	${\hat{S}}_{PC}$	0.1%	0.040	1.6	0.1%	0.043	1.9	−0.1%	0.047	2.2
	${\hat{S}}_{RE}$	0.1%	0.043	1.8	0.3%	0.046	2.1	0.0%	0.049	2.4
	${\hat{S}}_{NT}$	3.2%	0.035	1.7	4.4%	0.035	2.0	3.3%	0.043	2.2
			(0.033)			(0.030)			(0.038)
Univariate Censoring	${\hat{S}}_{E}$	0.0%	0.051	2.6	0.3%	0.059	3.5	0.1%	0.063	4.0
	${\hat{S}}_{PC}$	0.1%	0.041	1.7	0.3%	0.049	2.4	0.0%	0.051	2.6
	${\hat{S}}_{RE}$	0.1%	0.048	2.3	0.3%	0.057	3.2	0.0%	0.058	3.4
	${\hat{S}}_{NT}$	2.6%	0.032	1.4	3.3%	0.039	2.0	2.1%	0.042	1.9
			(0.028)			(0.037)			(0.042)
Bivariate Censoring	${\hat{S}}_{E}$	0.0%	0.058	3.4	0.3%	0.073	5.3	−0.1%	0.077	5.9
	${\hat{S}}_{PC}$	−0.1%	0.041	1.7	−0.1%	0.049	2.4	−0.3%	0.053	2.8
	${\hat{S}}_{RE}$	−0.2%	0.056	3.1	0.3%	0.066	4.4	−0.3%	0.068	4.6
	${\hat{S}}_{NT}$	2.2%	0.035	1.5	1.9%	0.042	1.9	0.1%	0.047	2.2
			(0.033)			(0.046)			(0.048)


${\hat{S}}_{E}$ , the empirical hazard rate estimator; ${\hat{S}}_{PC}$ , the Prentice-Cai estimator; ${\hat{S}}_{RE}$ , the redistributed empirical estimator; ${\hat{S}}_{NT}$ , the semiparametric maximum likelihood estimator; SEs, empirical standard errors; for ${\hat{S}}_{NT}$ the model based standard errors are displayed inside the brackets.

Table 2 and Table 3 contain the sample averages of the relative biases of the bivariate survival estimates and marginal survival estimates at selected time-points and the average model-based standard errors, calculated by applying the results of §4.2, for the point estimates, along with the empirical standard errors. We also list the summary statistics for the empirical hazard rate, ${\hat{S}}_{E}$ , Prentice-Cai, ${\hat{S}}_{PC}$ , and redistributed empirical, ${\hat{S}}_{RE}$ estimators (Prentice et al., 2004). Finally, we computed the mean squared errors for all the estimators.

Table 3.

Averages and SEs for various estimators of the marginal survival functions S₁ and S₂ when the correct model (13) with θ = 4 is misspecified to the semiparametric normal model (4). The true marginal survival probabilities for both T₁ and T₂ are 0.850, 0.700 and 0.550, respectively.

		t₁ = 0.1625			t₁ = 0.3566			t₁ = 0.5978

Censoring	Estimator	bias	SE	MSE	bias	SE	MSE	bias	SE	MSE
				(×10⁻³)			(×10⁻³)			(×10⁻³)
Censoring on T₁ only	${\hat{S}}_{E}$	0.0%	0.036	1.3	0.0%	0.049	2.4	0.2%	0.063	3.9
	${\hat{S}}_{KM}$	0.0%	0.036	1.3	0.1%	0.049	2.4	0.4%	0.064	4.1
	${\hat{S}}_{RE}$	0.1%	0.036	1.3	0.2%	0.048	2.3	1.6%	0.060	3.7
	${\hat{S}}_{NT}$	0.8%	0.031	1.0	2.7%	0.047	2.5	3.4%	0.065	4.6
			(0.036)			(0.050)			(0.062)
Univariate censoring	${\hat{S}}_{E}$	0.1%	0.043	1.8	0.1%	0.057	3.2	0.4%	0.071	5.0
	${\hat{S}}_{KM}$	0.0%	0.036	1.3	0.1%	0.049	2.4	0.4%	0.064	4.1
	${\hat{S}}_{RE}$	0.1%	0.041	1.7	0.3%	0.054	2.9	1.1%	0.069	4.8
	${\hat{S}}_{NT}$	0.6%	0.030	0.9	2.4%	0.046	2.4	3.4%	0.061	4.1
			(0.035)			(0.050)			(0.067)
Bivariate censoring	${\hat{S}}_{E}$	−0.1%	0.048	2.3	0.1%	0.072	5.1	−0.4%	0.100	10.0
	${\hat{S}}_{KM}$	0.0%	0.035	1.2	0.3%	0.051	2.6	0.4%	0.064	4.1
	${\hat{S}}_{RE}$	0.0%	0.047	2.2	0.6%	0.065	4.2	2.3%	0.069	4.9
	${\hat{S}}_{NT}$	0.9%	0.031	1.0	1.9%	0.047	2.4	1.0%	0.063	4.0
			(0.035)			(0.051)			(0.065)


${\hat{S}}_{E}$, the empirical hazard rate estimator; ${\hat{S}}_{KM}$, the Kaplan-Meier estimator; ${\hat{S}}_{RE}$, the redistributed empirical estimator; ${\hat{S}}_{NT}$, the semiparametric maximum likelihood estimator; SEs, empirical standard errors; for ${\hat{S}}_{NT}$ the model-based standard errors are displayed inside the brackets. Only the estimates of the T₁-survival are reported, as similar patterns were observed for the estimates of the T₂-survival.

Our results show that, even when the underlying model is misspecified, the semiparametric maximum likelihood estimator based on the semiparametric normal transformation model incurred only small biases. Among all the scenarios examined, the relative biases, i.e. (point estimate − true value)/true value, of the semiparametric maximum likelihood estimates ranged from −5.7% to 4%. Compared with the competing nonparametric estimators, the new estimator also achieved high efficiency and had the smallest standard errors in almost all the scenarios examined. It had a smaller MSE than the other estimators in most cases considered. In addition, the model-based standard errors were in good agreement with their empirical counterparts.
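The trade-off between bias and variance in these comparisons can be checked by hand. The sketch below assumes the usual decomposition MSE = bias² + SE², applied to the rounded relative biases and empirical standard errors of Table 2. The tabulated MSEs were presumably computed directly from the simulation replicates, so agreement is only expected up to rounding. The helper name `mse_from_table` is hypothetical.

```python
def mse_from_table(relative_bias, true_value, se):
    """Approximate MSE as squared absolute bias plus squared standard error."""
    absolute_bias = relative_bias * true_value  # undo the relative scaling
    return absolute_bias ** 2 + se ** 2


# Table 2, (t1, t2) = (0.1625, 0.1625), true survival probability 0.771.
# Empirical hazard rate estimator, censoring on T1 only: bias 0.0%, SE 0.046.
mse_e = mse_from_table(0.000, 0.771, 0.046)
# Normal transformation estimator, univariate censoring: bias 2.6%, SE 0.032.
mse_nt = mse_from_table(0.026, 0.771, 0.032)
```

Here `mse_e` is about 2.1 × 10⁻³ and `mse_nt` about 1.4 × 10⁻³, in line with the corresponding Table 2 entries: the small bias of the model-based estimator is more than offset by its smaller standard error.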

5.4 Further comparison with inverse-probability-of-censoring-weighted estimator in efficiency and robustness

We also compared the new estimator with a doubly-robust inverse-probability-of-censoring-weighted estimator, derived under univariate censoring, which stipulates that the censoring time is common to both pair members (Lin & Ying, 1993; Tsai & Crowley, 1998; Wang & Wells, 1998; Nan et al., 2006). The detailed derivation can be found in our technical report. We first compared the efficiency of the inverse-probability-of-censoring-weighted estimator with that of the semiparametric maximum likelihood estimator when the true underlying model indeed followed the semiparametric normal transformation model (4) with ρ = 0.5. The results, documented in Table 4, demonstrate that the new estimator has noticeably smaller variances than its competitor. We next considered robustness when the working model was misspecified as the semiparametric normal transformation model, while the true model followed the bivariate Clayton model (13) with θ = 4. The results in Table 4 indicate that the inverse-probability-of-censoring-weighted estimator eliminates the bias caused by the misspecification of the semiparametric normal transformation model, while the new estimator incurs small relative biases, between 2.1% and 3.3%, and retains smaller variances. In terms of mean squared error (MSE), the semiparametric maximum likelihood estimator was superior in the scenarios considered.

Table 4.

Comparison of the SPMLE based on the semiparametric normal transformation model (4) and the IPCW estimator at various time pairs (t₁, t₂) under univariate censoring. The true underlying models are the semiparametric normal transformation model (4) with ρ = 0.5, i.e. the working model is correctly specified, and Clayton’s model (13) with θ = 4, i.e. the working model is misspecified. The true bivariate survival probabilities at the specified points are 0.7577, 0.6415 and 0.5568, respectively, and the averages of the relative biases are listed in the table.

True Underlying Model		(t₁, t₂) = (0.1625, 0.1625)			(t₁, t₂) = (0.1625, 0.3566)			(t₁, t₂) = (0.3566, 0.3566)

		bias	SE_e	MSE	bias	SE_e	MSE	bias	SE_e	MSE
				(×10⁻³)			(×10⁻³)			(×10⁻³)
Semiparametric Normal	${\hat{S}}_{NT}$	0.1%	0.040	1.60	0.0%	0.049	2.40	0.1%	0.050	2.50
Transformation Model	${\hat{S}}_{IP}$	−0.1%	0.043	1.85	−0.1%	0.052	2.70	0.0%	0.055	3.03

Clayton Model	${\hat{S}}_{NT}$	2.6%	0.031	1.41	3.3%	0.039	1.98	2.1%	0.042	1.90
	${\hat{S}}_{IP}$	0%	0.041	1.68	0.3%	0.051	2.61	0%	0.053	2.81


${\hat{S}}_{NT}$ , the semiparametric maximum likelihood estimator based on the semiparametric normal transformation model; ${\hat{S}}_{IP}$ , the inverse-probability-of-censoring-weighted estimator; SE_e, the empirical standard error; MSE, the mean squared error.

5.5 Estimation of ρ

Finally, we discuss the interpretation and estimation of ρ, which is in one-to-one correspondence with common dependence measures for bivariate survival, such as Kendall’s τ. As indicated in Li & Lin (2006),

τ = 4 \int_{0}^{\infty} \int_{0}^{\infty} f ({\tilde{t}}_{1}, {\tilde{t}}_{2}; ρ) S ({\tilde{t}}_{1}, {\tilde{t}}_{2}; ρ) d {\tilde{t}}_{1} d {\tilde{t}}_{2} - 1,

where $S ({\tilde{t}}_{1}, {\tilde{t}}_{2}; ρ)$ and $f ({\tilde{t}}_{1}, {\tilde{t}}_{2}; ρ)$ are respectively the joint bivariate survival and density functions defined in (4) and (5). After a change of variables, we have that

τ ≔ τ (ρ) = 4 \int_{0}^{\infty} \int_{0}^{\infty} Ψ (t_{1}, t_{2}; ρ) e^{g (t_{1}, t_{2}; ρ)} ϕ (t_{1}) ϕ (t_{2}) d t_{1} d t_{2} - 1,

where Ψ(·) is the joint tail function for the bivariate normal distribution defined in (3), g(t₁, t₂, ρ) is the ‘cross’ term defined in (6), and ϕ is the standard normal density function, none of which depends on any specific forms of hazard functions. As shown in Li & Lin (2006), ρ uniquely determines τ, and thus provides a standardized dependence measure for bivariate survival. Indeed, τ(ρ̂) yields the model-based estimator of Kendall's τ , with a model-based standard error that can be conveniently obtained using the delta method.
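As a numerical illustration, the sketch below maps an estimate of ρ and its standard error to Kendall's τ and a delta-method standard error. Instead of evaluating the double integral, it uses the classical closed form τ = (2/π) arcsin ρ for a bivariate normal pair. Applying it here is an assumption not stated in the text, relying on the fact that Kendall's τ is unchanged by strictly increasing transformations of each margin. The function names are hypothetical.

```python
import math


def kendall_tau(rho):
    """Kendall's tau of a bivariate normal pair with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)


def kendall_tau_se(rho_hat, se_rho):
    """Delta-method standard error of tau(rho_hat), given the SE of rho_hat."""
    # d tau / d rho = 2 / (pi * sqrt(1 - rho^2))
    derivative = 2.0 / (math.pi * math.sqrt(1.0 - rho_hat ** 2))
    return derivative * se_rho
```

For instance, `kendall_tau(0.5)` equals 1/3, and `kendall_tau_se(0.84, 0.014)` is roughly 0.016.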

We considered the estimation and interpretation of ρ when the semiparametric normal transformation model was misspecified and the failure times were generated under the bivariate Clayton model (13). We chose θ to be 0.5, 1, 2 and 4, which correspond to Kendall’s τ of 0.199, 0.333, 0.5 and 0.613, respectively, using formula (4.4) of Hougaard (2000).

The sample averages of the estimates of ρ and of the model-based Kendall’s τ are displayed in Table 5, along with the empirical and model-based standard errors. It appears that, when the underlying model is misspecified, the estimate of ρ itself might not be of interest, as it does not recover the specific dependence structure of the true model. However, the model-based estimates of Kendall’s τ computed from the estimates of ρ are close to the true values. We envisage that, at least for the scenarios we considered, the estimate of ρ would lead to a reasonable approximation for Kendall's τ even if the model were misspecified.

Table 5.

Averages and SEs of the estimates of ρ and of the model-based Kendall’s τ when the correct model (13) with various true θ is misspecified as the semiparametric normal model (4), with parameter ρ to be estimated.

θ	True Kendall’s τ	Censoring	ρ			Model-based τ
			ρ̂	SE_e	SE_m	τ̂	SE_e	SE_m
4	0.614	Censoring on T₁ only	0.840	0.020	0.014	0.620	0.024	0.016
		Univariate censoring	0.839	0.026	0.021	0.619	0.028	0.022
		Bivariate censoring	0.820	0.033	0.030	0.608	0.033	0.026

2	0.500	Censoring on T₁ only	0.696	0.042	0.038	0.490	0.035	0.030
		Univariate censoring	0.685	0.063	0.057	0.487	0.050	0.041
		Bivariate censoring	0.693	0.064	0.057	0.492	0.052	0.043

1	0.333	Censoring on T₁ only	0.500	0.060	0.059	0.339	0.044	0.041
		Univariate censoring	0.483	0.076	0.077	0.320	0.056	0.053
		Bivariate censoring	0.493	0.077	0.072	0.327	0.058	0.051

0.5	0.199	Censoring on T₁ only	0.313	0.067	0.072	0.209	0.046	0.048
		Univariate censoring	0.291	0.084	0.078	0.194	0.054	0.062
		Bivariate censoring	0.297	0.087	0.079	0.196	0.057	0.064


SE_e, empirical standard errors; SE_m, model-based standard errors.

6 Discussion

With the analytical framework established in this article, we plan to extend the results to multivariate data, where clusters are allowed to have varying cluster sizes and where each pair of failure times may have a distinct correlation parameter. A key feature of this transformation model is that it can easily accommodate covariates in such a way that survival outcomes marginally follow a common Cox proportional hazards model, and their joint distribution is specified by a joint normal distribution. Hence, the regression coefficients have population-level interpretations, a feature not shared by conditional frailty models.

Acknowledgement

This work was supported in part by grants to each author from the U.S. National Institutes of Health. The authors thank Professor D. M. Titterington and the associate editor for their insightful suggestions.

Contributor Information

Yi Li, Department of Biostatistics, Harvard School of Public Health and Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA, yili@jimmy.harvard.edu.

Ross L. Prentice, Fred Hutchinson Cancer Research Center and University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, rprentic@WHI.org

Xihong Lin, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, xlin@hsph.harvard.edu.

References

Hougaard P. Analysis of Multivariate Survival Data. New York: Springer-Verlag; 2000.
van Keilegom I, Hettmansperger TP. Inference on multivariate M-estimators based on bivariate censored data. J. Am. Statist. Assoc. 2002;97:328–336.
Klaassen CAJ, Wellner JA. Efficient estimation in the bivariate normal copula model: Normal margins are least favourable. Bernoulli. 1997;3:55–77.
Li Y, Lin X. Semiparametric Normal Transformation Models for Spatially Correlated Survival Data. J. Am. Statist. Assoc. 2006;101:591–603.
Lin D, Ying Z. A simple nonparametric estimator of the bivariate survival function under univariate censoring. Biometrika. 1993;80:573–581.
Moodie FZ, Prentice RL. An adjustment to improve the bivariate survivor function repaired NPMLE. Lifetime Data Anal. 2005;11:291–307. doi: 10.1007/s10985-005-2964-9.
Murphy SA. Consistency in a proportional hazards model incorporating a random effect. Ann. Statist. 1994;22:712–731.
Nan B, Lin X, Lisabeth LD, Harlow SD. Piecewise constant cross-ratio estimation for association of age at a marker event and age at menopause. J. Am. Statist. Assoc. 2006;101:65–77.
Oakes D. Bivariate survival models induced by frailties. J. Am. Statist. Assoc. 1989;84:487–493.
Parner E. Asymptotic theory for the correlated gamma-frailty model. Ann. Statist. 1998;26:183–214.
Pitt M, Chan D, Kohn R. Efficient Bayesian inference for Gaussian copula regression models. Biometrika. 2006;93:537–554.
Prentice RL, Cai J. Covariance and survivor function estimation using censored multivariate failure time data. Biometrika. 1992;79:495–512. Correction (1993) 80, 711–2.
Prentice RL, Hsu L. Regression on hazard ratios and cross ratios in multivariate failure time analysis. Biometrika. 1997;84:349–363.
Prentice RL, Moodie ZF, Wu J. Hazard-based nonparametric survivor function estimation. J. R. Statist. Soc. B. 2004;66:305–319.
Tsai W, Crowley J. A note on nonparametric estimators of the bivariate survival function under univariate censoring. Biometrika. 1998;85:573–580.
van der Laan M. Efficient estimation in the bivariate censoring model and repairing NPMLE. Ann. Statist. 1996;24:596–627.
Wang W, Wells M. Nonparametric estimators of the bivariate survival function under simplified censoring conditions. Biometrika. 1998;84:863–880.
Wienke A, Lichtenstein P, Yashin A. A bivariate frailty model with a cure fraction for modeling familial correlations in diseases. Biometrics. 2003;59:1178–1183. doi: 10.1111/j.0006-341x.2003.00135.x.
