Score and Deviance Residuals Based on the Full Likelihood Approach in Survival Analysis

Susan Halabi; Sandipan Dutta; Yuan Wu; Aiyi Liu

doi:10.1002/pst.2047

Author manuscript; available in PMC: 2020 Dec 31.

Published in final edited form as: Pharm Stat. 2020 Aug 9;19(6):940–954. doi: 10.1002/pst.2047

PMCID: PMC7774642 NIHMSID: NIHMS1655873 PMID: 32776412

Abstract

Assuming the proportional hazards model and non-informative censoring, the full likelihood approach is used to obtain two new residuals. The first residual is a score-type residual, obtained in the same way as the score residual of the partial likelihood approach. The second residual is based on the concept of deviance residuals. Extensive simulations are conducted to compare the performance of the residuals from the full likelihood based approach with those of the partial likelihood method. We demonstrate through simulation studies that the full likelihood based residuals are more efficient than their partial likelihood counterparts in identifying potential outliers when the censoring proportion is high. Graphical techniques are used to illustrate the application of these residuals to real examples.

Keywords: Deviance residuals, Full likelihood, Partial Likelihood, Non-informative censoring, Proportional hazards, Score-type residuals

1. Introduction

Residuals are important tools for checking the goodness-of-fit of a model and for identifying potential outliers and influential observations in a fitted model. A number of residuals for Cox’s model¹ have been proposed for model diagnostics and for identifying influential points and outliers ^2–4. These include the martingale residual, the deviance residual, and the score residual ^4–6. Schoenfeld’s residuals (1982) are estimated through the partial likelihood function and are used to check the assumption of the proportional hazards model ². Martingale residuals were originally proposed to check the overall goodness-of-fit of a proportional hazards model with respect to a data set⁴. Although martingale residuals are a good measure of the overall goodness-of-fit, they suffer from a lack of symmetry, which makes them difficult to use for outlier detection. Therneau and Grambsch (2000) proposed the deviance residual for Cox’s proportional hazards model in order to address this lack of symmetry⁴. Other methods for identifying influential observations are based on the jackknife approach ⁷.

The score residuals help in identifying influential or extreme observations with respect to every covariate in the fitted model, and in determining which of the covariates do not fit well in the proportional hazards model. A large magnitude of the deviance residual for an observation indicates that it is a potential outlier to the model. Similarly, a large magnitude of the score residual of an individual with respect to a particular covariate indicates that the individual has a heavy influence on the estimated regression effect of that covariate. The existing deviance and score residuals for the proportional hazards model are based on the partial likelihood approach.

We are interested in detecting outliers in a phase III clinical trial where the censoring proportion was high. We develop innovative score-type and deviance residuals using the full likelihood function in the proportional hazards model. We further demonstrate through simulation studies that the full likelihood based residuals have higher area under the curve (AUC) than their partial likelihood counterparts in identifying potential outliers. The rest of this article is organized as follows. We derive the score-type and the deviance residuals based on the full likelihood function in Section 2. We conduct simulation studies in Section 3, where we generate survival data under the proportional hazards assumption and compare the performance of the different types of residuals in identifying potential outliers. We discuss the results of the simulations, and in Section 4 we apply the full likelihood based score-type and deviance residuals to two real datasets from the primary biliary cirrhosis and breast cancer studies. Finally, we discuss the results and implications in Section 5.

2. Methods

We first introduce notations and then we describe the full and the partial likelihood functions. The full likelihood function is simplified for the case of the proportional hazards model in order to obtain the score-type and deviance residuals.

2.1. Full Likelihood Function

Assume that there are n individuals in the study. For the $i$th individual, let $T_{i}$ and $C_{i}$ be the failure time and censoring time, respectively. Assume further that: a) the failure time $T_{i}$ has the probability density function (p.d.f.) $f (t)$ and survival function $S (t)$ ; b) the censoring time has p.d.f. $g (t)$ and survival function $G (t)$ ; and c) the two random variables $T_{i}$ and $C_{i}$ are independent. Let $δ_{i}$ be the indicator variable taking the value 1 (or 0) if the $i$th individual has failed (or not), and let $t_{i} = \min (T_{i}, C_{i})$ be the observed time.

Assuming that the censoring is non-informative and using Lawless formula of the full likelihood function⁸, the full likelihood function is given by

L = \prod_{i = 1}^{n} {(f (t_{i}))}^{δ_{i}} {(S (t_{i}))}^{1 - δ_{i}}

(2.1)

For the proportional hazards model, the p.d.f. and survival function are given by

f (t) = λ_{0} (t) e^{x^{'} β} S (t)

(2.2)

and

S (t) = e^{- Λ_{0} (t) e^{x^{'} β}}

(2.3)

where

$λ_{0} (t)$ is the baseline hazard function at time t with x=0,

$Λ_{0} (t)$ is the baseline cumulative hazard function at time t, where $Λ_{0} (t) = \int_{0}^{t} λ_{0} (u) d u$ ,

$x' = (x_{1}, x_{2}, \dots, x_{p})$ is the vector of p covariates associated with the i^th individual,

$β^{'} = (β_{1}, β_{2}, \dots, β_{p})$ is the vector of p regression coefficients common to all the individuals.

Then the likelihood function (2.1), L, simplifies to

L = \prod_{i = 1}^{n} {(λ_{0} (t_{i}) e^{{x_{i}}^{'} β})}^{δ_{i}} (e^{- Λ_{0} (t_{i}) e^{{x_{i}}^{'} β}})

(2.4)

and the log-likelihood is

\log L = \sum_{i = 1}^{n} δ_{i} (\log (λ_{0} (t_{i})) + {x_{i}}^{'} β) - \sum_{i = 1}^{n} Λ_{0} (t_{i}) e^{{x_{i}}^{'} β} .

(2.5)
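As an illustration, taking the logarithm of (2.4) term by term can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function name and the two callables `baseline_hazard` and `baseline_cumhaz` (standing in for $λ_{0}$ and $Λ_{0}$) are hypothetical, and the example assumes a unit exponential baseline.

```python
import math

def full_log_likelihood(times, events, covariates, beta,
                        baseline_hazard, baseline_cumhaz):
    """Log of the full likelihood (2.4) under proportional hazards.

    baseline_hazard and baseline_cumhaz are hypothetical callables for
    lambda_0(t) and Lambda_0(t); they are assumptions of this sketch.
    """
    total = 0.0
    for t, d, x in zip(times, events, covariates):
        eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x_i' beta
        if d == 1:
            # failures contribute log(lambda_0(t_i) * exp(x_i' beta))
            total += math.log(baseline_hazard(t)) + eta
        # every subject contributes the log-survival term
        total -= baseline_cumhaz(t) * math.exp(eta)
    return total

# Toy data with an assumed unit exponential baseline:
# lambda_0(t) = 1 and Lambda_0(t) = t.
times = [0.5, 1.2, 2.0]
events = [1, 0, 1]
covariates = [[0.0], [1.0], [-1.0]]
value = full_log_likelihood(times, events, covariates, [0.0],
                            lambda t: 1.0, lambda t: t)
```

With $β = 0$ and this baseline, the failure terms vanish and the value reduces to minus the sum of the observed times.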

2.2. Score-Type Residuals

In this section, the score-type residuals are obtained. On differentiating log L (2.5) with respect to $β_{j}$ one obtains the j^th score function (j = 1,…, p) which is given by

U_{j} (β) = \frac{\partial \log L}{\partial β_{j}} = \sum_{i = 1}^{n} x_{i j} (δ_{i} - Λ_{0} (t_{i}) e^{{x_{i}}^{'} β}) .

(2.6)

Proposition 1: $E (U_{j} (β)) = 0$

Proof: We provide a sketch of the proof in Appendix A.1. The unknown parameters are estimated by solving the score equation, $U (β) = 0$ , where the score vector is $U (β) = {(U_{1} (β), \dots, U_{p} (β))}^{'}$ . The score equation involves two different parameters, $Λ_{0}$ and β, which are estimated using a two-step procedure:

First, one needs to estimate $Λ_{0} (t)$ , say by ${\hat{Λ}}_{0} (t)$ ; we used both the Kaplan-Meier⁹ (KM) estimator ${\hat{Λ}}_{0, K M} (t)$ and the Nelson-Aalen¹⁰ (NA) estimator ${\hat{Λ}}_{0, N A} (t)$ of the baseline cumulative hazard function.

Second, given the estimates ${\hat{Λ}}_{0} (t_{i})$ , we estimate β by solving the p non-linear equations in (2.7), with the help of the second derivatives of the log-likelihood in (2.8):

\sum_{i = 1}^{n} x_{i 1} (δ_{i} - {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} β}) = 0, \sum_{i = 1}^{n} x_{i 2} (δ_{i} - {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} β}) = 0, \dots, \sum_{i = 1}^{n} x_{i p} (δ_{i} - {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} β}) = 0 .

(2.7)

\frac{\partial^{2} \log L}{\partial β_{j} \partial β_{k}} = \sum_{i = 1}^{n} x_{i j} (- Λ_{0} (t_{i}) e^{{x_{i}}^{'} β} x_{i k})

(2.8)

Therefore, replacing β with $\hat{β}$ and $Λ_{0}$ with ${\hat{Λ}}_{0}$ in equation 2.8, we obtain

- \frac{\partial^{2} \log L}{\partial {\hat{β}}_{j} \partial {\hat{β}}_{k}} = \sum_{i = 1}^{n} x_{i j} x_{i k} ({\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} \hat{β}})

(2.9)

for j =1, 2,…, p and k = 1, 2,…p.
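The two-step procedure can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: a single covariate ($p = 1$), a Nelson-Aalen estimate computed without covariates, and a plain Newton-Raphson iteration as the root finder (the source does not name the algorithm). All function names and the toy data are hypothetical.

```python
import math

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard evaluated at each subject's observed time."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    increments = {}
    for s in event_times:
        deaths = sum(1 for t, d in zip(times, events) if t == s and d == 1)
        at_risk = sum(1 for t in times if t >= s)
        increments[s] = deaths / at_risk
    return [sum(inc for s, inc in increments.items() if s <= t) for t in times]

def score(beta, x, events, cumhaz):
    """Score function (2.6) for one covariate, with Lambda_0 replaced by its estimate."""
    return sum(xi * (d - h * math.exp(xi * beta))
               for xi, d, h in zip(x, events, cumhaz))

def information(beta, x, cumhaz):
    """Negative second derivative of the log-likelihood, as in (2.9), for p = 1."""
    return sum(xi * xi * h * math.exp(xi * beta) for xi, h in zip(x, cumhaz))

def solve_beta(x, events, cumhaz, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration (an assumed choice of solver) for score(beta) = 0."""
    for _ in range(max_iter):
        step = score(beta, x, events, cumhaz) / information(beta, x, cumhaz)
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical toy data: six subjects, two of them censored.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 0]
x = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0]

cumhaz = nelson_aalen(times, events)      # step 1: estimate Lambda_0(t_i)
beta_hat = solve_beta(x, events, cumhaz)  # step 2: solve the score equation
```

For $p > 1$ the same iteration would use the $p × p$ matrix in (2.9) in place of the scalar `information`.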

If $\hat{β}$ is the solution to the p non-linear equations, then we obtain an n × p matrix of score residuals E, defined by

E = \begin{pmatrix} {\hat{e}}_{11} & \dots & {\hat{e}}_{1 p} \\ ⋮ & ⋱ & ⋮ \\ {\hat{e}}_{n 1} & \dots & {\hat{e}}_{n p} \end{pmatrix}

(2.10)

where ${\hat{e}}_{i j} = x_{i j} (δ_{i} - {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} \hat{β}})$ . It should be noted that every column of the residual matrix E in 2.10 adds up to 0 because $\hat{β}$ solves the score equations, that is,

\sum_{i = 1}^{n} {\hat{e}}_{i 1} = 0, \sum_{i = 1}^{n} {\hat{e}}_{i 2} = 0, \dots, \sum_{i = 1}^{n} {\hat{e}}_{i p} = 0
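The zero column sums can be checked numerically. The sketch below is only an illustration: the function name is hypothetical, the cumulative hazard values are made up, and a single binary covariate is used because the score equation then has the closed-form solution $e^{\hat{β}} = \sum_{x_{i} = 1} δ_{i} / \sum_{x_{i} = 1} {\hat{Λ}}_{0} (t_{i})$ .

```python
import math

def score_residuals(covariates, events, cumhaz, beta_hat):
    """n x p matrix with entries x_ij * (delta_i - Lambda_hat_0(t_i) * exp(x_i' beta_hat))."""
    rows = []
    for x, d, h in zip(covariates, events, cumhaz):
        eta = sum(xj * bj for xj, bj in zip(x, beta_hat))
        common = d - h * math.exp(eta)  # shared factor across the p columns
        rows.append([xj * common for xj in x])
    return rows

# Hypothetical inputs: one binary covariate and made-up cumulative hazards.
covariates = [[1.0], [1.0], [0.0], [0.0]]
events = [1, 0, 1, 1]
cumhaz = [0.2, 0.3, 0.5, 1.0]

# Closed-form root of the score equation for a binary covariate:
# one failure in the x = 1 group, total cumulative hazard 0.2 + 0.3 there.
beta_hat = [math.log(1.0 / (0.2 + 0.3))]

E = score_residuals(covariates, events, cumhaz, beta_hat)
column_sums = [sum(row[j] for row in E) for j in range(len(beta_hat))]
```

Here the two subjects with $x_{i} = 1$ receive residuals of equal magnitude and opposite sign, and the subjects with $x_{i} = 0$ receive zero, so the column sums to zero up to rounding.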

2.3. Deviance Residual

If L is the likelihood function, then by definition ^11–14 the deviance function D, is given by

D = 2 \sup \log L_{\text{saturated model}} - 2 \sup \log L_{\text{usual model}},

(2.11)

where a “saturated model” is a model that perfectly reproduces the data. We show in Appendix A.2 that the deviance function D, for the proportional hazards model is expressed by

D = 2 \sum_{i = 1}^{n} [(δ_{i} \log δ_{i} - δ_{i}) - δ_{i} ({x_{i}}^{'} \hat{β} + \log {\hat{Λ}}_{0} (t_{i})) + {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} \hat{β}}],

(2.12)

where $\hat{β}$ and ${\hat{Λ}}_{0}$ are the estimates for β and $Λ_{0}$ , respectively. Then, the deviance residual for the i^th individual is

D_{i} = \operatorname{sign} \sqrt{2} {[(δ_{i} \log δ_{i} - δ_{i}) - δ_{i} ({x_{i}}^{'} \hat{β} + \log {\hat{Λ}}_{0} (t_{i})) + {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} \hat{β}}]}^{1 / 2}

(2.13)

If the $i$th individual is censored ( $δ_{i} = 0$ ), then $δ_{i} \log δ_{i} = 0$ (using the convention $0 \log 0 = 0$ ) and the expression in 2.13 becomes

D_{i} = \operatorname{sign} \sqrt{2} {[{\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} \hat{β}}]}^{1 / 2}

(2.14)

Whereas if the $i$th individual is a failure ( $δ_{i} = 1$ ), then the expression in 2.13 is

D_{i} = \operatorname{sign} \sqrt{2} {[- ({x_{i}}^{'} \hat{β} + \log {\hat{Λ}}_{0} (t_{i})) + {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} \hat{β}} - 1]}^{1 / 2} .

(2.15)
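The two cases (2.14) and (2.15) can be sketched as a single function. This is a hedged illustration: the function name is hypothetical, and because the text does not state the argument of the sign function, the sketch assumes the sign of $δ_{i} - {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} \hat{β}}$ , by analogy with the martingale-based deviance residual of the partial likelihood approach.

```python
import math

def full_likelihood_deviance_residual(delta, eta, cumhaz):
    """Deviance residual for one subject.

    delta  : event indicator (1 = failure, 0 = censored)
    eta    : fitted linear predictor x_i' beta_hat
    cumhaz : estimated baseline cumulative hazard at t_i (must be > 0 for failures)
    """
    mu = cumhaz * math.exp(eta)  # Lambda_hat_0(t_i) * exp(x_i' beta_hat)
    if delta == 0:
        inner = mu                                      # censored case, (2.14)
    else:
        inner = -(eta + math.log(cumhaz)) + mu - 1.0    # failure case, (2.15)
    inner = max(inner, 0.0)  # guard against tiny negative values from rounding
    # ASSUMPTION: the sign is taken from delta - mu; the source leaves it unspecified.
    sign = 1.0 if delta - mu >= 0 else -1.0
    return sign * math.sqrt(2.0 * inner)

censored = full_likelihood_deviance_residual(delta=0, eta=0.0, cumhaz=0.5)
perfect_fit = full_likelihood_deviance_residual(delta=1, eta=0.0, cumhaz=1.0)
early_failure = full_likelihood_deviance_residual(delta=1, eta=0.0, cumhaz=0.1)
```

For a failure the bracketed term equals $μ - 1 - \log μ$ with $μ = {\hat{Λ}}_{0} (t_{i}) e^{{x_{i}}^{'} \hat{β}}$ , which is non-negative and vanishes only at $μ = 1$ , so the square root is always defined.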

2.4. The Partial Likelihood Approach

Cox^1,15 proposed a partial likelihood approach for the proportional hazards model to estimate β without involving $λ_{0} (t)$ . The partial likelihood function is the product, over the observed failure times, of the conditional probabilities that the individual who failed is the one chosen from the risk set to experience the event. Let $R (t) = \{i : t_{i} \geq t\}$ denote the risk set, that is, the set of individuals who are “at risk” of failure at time $t$ . Suppose individual $(m)$ fails at $t_{m}$ ; then the conditional probability is

{P L}_{m} = P (\text{individual } (m) \text{ fails} \mid \text{one individual from } R (t_{m}) \text{ fails at } t_{m}) \approx \frac{P (\text{individual } (m) \text{ fails at } t_{m} \mid (m) \text{ at risk})}{\sum_{l \in R (t_{m})} P (\text{individual } l \text{ fails at } t_{m} \mid l \text{ at risk})} \approx \frac{λ (t_{m} | x_{(m)})}{\sum_{l \in R (t_{m})} λ (t_{m} | x_{l})} .

Under the proportional hazards model, the partial likelihood is

P L = \prod_{m = 1}^{k} \frac{e^{x_{(m)}^{'} β}}{\sum_{x_{l} \in R (t_{m})} e^{x_{l}^{'} β}} = \prod_{i = 1}^{n} {(\frac{e^{x_{i}^{'} β}}{\sum_{x_{l} \in R (t_{i})} e^{x_{l}^{'} β}})}^{δ_{i}} .
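A direct evaluation of the logarithm of this product is sketched below. The function name is hypothetical, the risk set at a failure time is taken to be every subject whose observed time is at least that failure time, and ties are handled naively (each tied failure uses the same risk set), which is an assumption of this sketch rather than a statement about the paper.

```python
import math

def log_partial_likelihood(times, events, covariates, beta):
    """Sum over failures of x_i' beta - log(sum over the risk set of exp(x_l' beta))."""
    def linear_predictor(x):
        return sum(xj * bj for xj, bj in zip(x, beta))

    total = 0.0
    for t_i, d_i, x_i in zip(times, events, covariates):
        if d_i != 1:
            continue  # censored subjects enter only through the risk sets
        risk_sum = sum(math.exp(linear_predictor(x_l))
                       for t_l, x_l in zip(times, covariates) if t_l >= t_i)
        total += linear_predictor(x_i) - math.log(risk_sum)
    return total

# Hypothetical toy data: failures at t = 1 and t = 2, one censored subject at t = 3.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
covariates = [[0.3], [-1.2], [0.8]]
value_at_zero = log_partial_likelihood(times, events, covariates, [0.0])
```

At $β = 0$ every subject has the same hazard, so each factor is one over the size of the risk set; here the risk sets have sizes 3 and 2, giving $- \log 6$ . The baseline hazard never appears, which is the point of the partial likelihood.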

The relationship between the full likelihood and the partial likelihood function is then

L = \prod_{i = 1}^{n} {(f (t_{i} | x_{i}))}^{δ_{i}} {(S (t_{i} | x_{i}))}^{1 - δ_{i}} = \prod_{i = 1}^{n} {(λ (t_{i} | x_{i}))}^{δ_{i}} S (t_{i} | x_{i}) = P L \prod_{i = 1}^{n} {(\sum_{l \in R (t_{i})} λ (t_{i} | x_{l}))}^{δ_{i}} S (t_{i} | x_{i})

Cox¹ argued that the partial likelihood term in the full likelihood contains almost all the information about β, which validates the partial likelihood approach.

Residuals based on the Partial Likelihood Approach

In what follows, we briefly review several widely used residuals based on the partial likelihood approach, with which we compare our proposed methods in the simulations and the real examples. The maximum partial likelihood estimate of β is the solution of

\sum_{δ_{i} = 1} (x_{i} - E (x_{i} | R (t_{i}))) = 0,

where $E (x_{i} | R (t_{i})) = \frac{\sum_{x_{l} \in R (t_{i})} x_{l} e^{x_{l}^{'} β}}{\sum_{x_{l} \in R (t_{i})} e^{x_{l}^{'} β}}$ and the left-hand side is the score function for the partial likelihood. For each i with $δ_{i} = 1$ , the Schoenfeld residual ² is defined as

r_{i} = x_{i} - E (x_{i} | R (t_{i})),

where for each component j with $1 \leq j \leq p$ , $r_{i j} = x_{i j} - E (x_{i j} | R (t_{i}))$ . Grambsch and Therneau¹⁶ proposed the scaled Schoenfeld residual to assess the proportional hazards assumption. The martingale residual can assist in assessing the proportional hazards assumption as well, which was discussed by Lagakos¹⁷ and is based on the following martingale process:

{\hat{M}}_{i} (t) = N_{i} (t) - \int_{0}^{t} I_{[t_{i} \geq s]} e^{x_{i}^{'} \hat{β}} d {\hat{Λ}}_{0} (s), i = 1, \dots, n

where $N_{i} (t) = I_{[t_{i} \leq t, δ_{i} = 1]}$ . The martingale residual for individual i is defined as:

{\hat{M}}_{i} = {\hat{M}}_{i} (\infty) = δ_{i} - \int_{0}^{\infty} I_{[t_{i} \geq s]} e^{x_{i}^{'} \hat{β}} d {\hat{Λ}}_{0} (s) .

As indicated above, one concern in using the martingale residual is that it tends to be asymmetric. To overcome this problem, Therneau et al. ¹⁸ introduced the deviance residual as a transformation of the martingale residual and it is defined as

d_{i} = \operatorname{sign} ({\hat{M}}_{i}) {[- 2 \{{\hat{M}}_{i} + δ_{i} \log (δ_{i} - {\hat{M}}_{i})\}]}^{\frac{1}{2}} .
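The effect of this transformation on the asymmetry can be seen in a short sketch. The function name and the input values are hypothetical; the martingale residuals are supplied directly rather than computed from a fitted model.

```python
import math

def martingale_to_deviance(m_hat, delta):
    """Deviance residual sign(M) * sqrt(-2 * (M + delta * log(delta - M)))."""
    inner = m_hat
    if delta == 1:
        inner += math.log(delta - m_hat)  # requires m_hat < 1, true for failures
    magnitude = math.sqrt(max(0.0, -2.0 * inner))  # clamp tiny rounding errors
    sign = (m_hat > 0) - (m_hat < 0)
    return sign * magnitude

# Martingale residuals lie in (-infinity, 1]; compare two extreme failures.
large_negative = martingale_to_deviance(-3.0, 1)  # failed much later than expected
near_upper_bound = martingale_to_deviance(0.9, 1)  # failed much earlier than expected
censored_case = martingale_to_deviance(-0.5, 0)
```

In this example a martingale residual of −3.0 is pulled in to roughly −1.8 while a residual of 0.9 is stretched to roughly 1.7, so the two tails end up on a comparable scale.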

All the aforementioned residuals are helpful for assessing the proportional hazards assumption. The score residual⁴ is useful for studying the influence of a single observation. Similar to the Schoenfeld residual, the score residual is based on the score function. The score function $\sum_{δ_{i} = 1} (x_{i} - E (x_{i} | R (t_{i})))$ can be rewritten as

$\sum_{i = 1}^{n} \int_{0}^{\infty} (x_{i} - E (x_{i} | R (s))) d {\hat{M}}_{i} (s)$ , because

$\int_{0}^{\infty} (x_{i} - E (x_{i} | R (s))) d N_{i} (s) = x_{i} - E (x_{i} | R (t_{i}))$ for $δ_{i} = 1$ and

$\int_{0}^{\infty} (x_{i} - E (x_{i} | R (s))) d N_{i} (s) = 0$ for $δ_{i} = 0$ .

Accordingly, the score residual for individual i is defined as the $i$th term of this sum,

\int_{0}^{\infty} (x_{i} - E (x_{i} | R (s))) d {\hat{M}}_{i} (s) .

3. Simulation Studies

3.1. Simulation Design

We describe the simulation scenarios in this section. The survival data are generated under the proportional hazards assumption in order to compare the performance of residuals estimated from the full likelihood and the partial likelihood functions. We consider a sample of size n with p covariates. Under proportional hazards, the true survival time for individual $i$ , $T_{i}$ , is generated as $T_{i} = {Λ_{0}}^{- 1} [- \log (V_{i}) e^{- {x_{i}}^{'} β}]$ ¹⁹, where $V_{i}$ is a uniform random variable on (0, 1) and $Λ_{0}$ is the baseline cumulative hazard function such that $Λ_{0} (t) = \int_{0}^{t} λ_{0} (y) d y$ . If $C_{i}$ represents the censoring time for individual i, then the observed time for individual i is given by $t_{i} = \min (T_{i}, C_{i})$ . We consider the exponential and the Weibull hazard functions as choices for $λ_{0}$ , with scale parameter 2 and, for the Weibull distribution, shape parameter 0.5. The censoring times $C_{1}, \dots, C_{n}$ are generated from uniform distributions independent of the true survival times, with four different censoring proportions (0, 0.10, 0.20, and 0.50) as described in Halabi and Singh²⁰. Three thousand simulations were generated for each scenario.
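The inverse-transform step can be sketched as follows. Setting the survival function in (2.3) equal to a uniform draw $V$ and solving for $t$ gives $Λ_{0} (t) = - \log (V) e^{- x^{'} β}$ . Everything else in the sketch is an assumption made for illustration: the Weibull parametrization $Λ_{0} (t) = (t / \text{scale})^{\text{shape}}$ , the function name, the seed, and the upper bound of the uniform censoring distribution (the source does not report the bounds used to reach each censoring proportion).

```python
import math
import random

def simulate_times(covariates, beta, scale=2.0, shape=1.0,
                   censor_upper=None, rng=None):
    """Return (observed time, event indicator) pairs under proportional hazards.

    Assumed baseline: Lambda_0(t) = (t / scale) ** shape, so shape = 1 is the
    exponential case. censor_upper is a hypothetical upper bound for
    uniform(0, censor_upper) censoring; None means no censoring.
    """
    if rng is None:
        rng = random.Random(0)
    out = []
    for x in covariates:
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        v = 1.0 - rng.random()                  # uniform on (0, 1], avoids log(0)
        target = -math.log(v) * math.exp(-eta)  # required value of Lambda_0(T)
        t_true = scale * target ** (1.0 / shape)  # invert the assumed Lambda_0
        c = math.inf if censor_upper is None else rng.uniform(0.0, censor_upper)
        out.append((min(t_true, c), 1 if t_true <= c else 0))
    return out

covariates = [[0.0, 0.0, 0.0], [-2.0, -3.0, -4.0], [1.0, 0.5, -0.5]]
beta = [0.2, 0.4, -0.2]
uncensored = simulate_times(covariates, beta, shape=0.5)
censored = simulate_times(covariates, beta, shape=0.5, censor_upper=1.0)
```

In practice the censoring bound would be tuned until the empirical censoring proportion matches the target (0.10, 0.20, or 0.50).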

Due to the lack of guidelines in the literature for simulating outliers in a regression study, we assume that the outliers are generated through the extreme observations in the covariate space, also known as high leverage points. In general, high leverage points may or may not lead to outliers and influential observations. However, for simplicity, we assume that the high leverage points are related to the outliers in our simulation settings. We refer the reader to the following references ^6,11–13 for detailed discussions on outliers, leverages, and influential observations.

To assess the performance of the two methods in detecting outliers, covariate values are generated from a mixture of multivariate normal distributions such that 95% of the sample values are from a multivariate normal distribution with mean $μ_{1}$ and covariance matrix Σ, while the remaining 5% are generated from a multivariate normal distribution with mean $μ_{2}$ and the same covariance matrix $Σ$ . We have assumed $n = 300$ , $p = 3$ , $μ_{1} = {(0, 0, 0)}^{'}$ , $μ_{2} = {(- 2, - 3, - 4)}^{'}$ , and

Σ = \begin{pmatrix} 1 & 0.1 & 0.1 \\ 0.1 & 1 & 0.1 \\ 0.1 & 0.1 & 1 \end{pmatrix} .
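One way to draw such a covariate matrix is sketched below, using only the Python standard library. The function names and the seed are hypothetical, and the sketch assumes that exactly 5% of the rows (15 of 300) are outliers; whether the original code fixed this count or drew the membership at random is not stated in the text.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor l with l * l' = a, for a symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def draw_covariates(n, mu_main, mu_outlier, sigma, outlier_fraction=0.05, rng=None):
    """Rows of x plus a flag marking which rows came from the outlier component."""
    if rng is None:
        rng = random.Random(1)
    l = cholesky(sigma)
    n_outliers = round(n * outlier_fraction)  # ASSUMPTION: fixed outlier count
    rows, is_outlier = [], []
    for i in range(n):
        outlier = i < n_outliers
        mu = mu_outlier if outlier else mu_main
        z = [rng.gauss(0.0, 1.0) for _ in mu]  # independent standard normals
        rows.append([mu[r] + sum(l[r][c] * z[c] for c in range(r + 1))
                     for r in range(len(mu))])
        is_outlier.append(outlier)
    return rows, is_outlier

sigma = [[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]]
x, is_outlier = draw_covariates(300, [0.0, 0.0, 0.0], [-2.0, -3.0, -4.0], sigma)
```

The `is_outlier` flags are the ground truth against which residual-based classifications are later scored.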

We consider two different sets for $β : β = {(1, 2, - 1)}^{'}$ and $β = {(0.2, 0.4, - 0.2)}^{'}$ .

We follow two approaches for classifying outliers based on the residuals. Suppose $s_{i j}$ is the full-likelihood based score residual obtained for the $j^{t h}$ covariate component of the $i^{t h}$ individual $(j = 1, 2, 3; i = 1, \dots, 300)$ . If $Q_{1 j}$ and $Q_{3 j}$ are the first and the third quartiles based on the residuals $\{s_{1 j}, s_{2 j}, \dots, s_{n j}\}$ of the $j^{t h}$ covariate component, then the inter-quartile range (IQR) is obtained as $Q_{3 j} - Q_{1 j}$ . Following Tukey’s approach²¹ to detecting outliers based on quartile measures, we construct the interval $[Q_{1 j} - 1.5 (Q_{3 j} - Q_{1 j}), Q_{3 j} + 1.5 (Q_{3 j} - Q_{1 j})]$ , and the $k^{t h}$ observation is classified as a potential outlier for the $j^{t h}$ covariate component if the corresponding residual value $s_{k j}$ is not contained in this interval. We have also considered the narrower interval $[Q_{1 j} - 0.5 (Q_{3 j} - Q_{1 j}), Q_{3 j} + 0.5 (Q_{3 j} - Q_{1 j})]$ . We refer to these intervals as quartile based threshold intervals. We construct similar threshold intervals for the partial likelihood score residuals. We also consider thresholds based on the median absolute deviation (MAD) ^22–23. If $M_{j}$ is the median of the $j^{t h}$ covariate component residuals $\{s_{1 j}, s_{2 j}, \dots, s_{n j}\}$ , then the corresponding MAD is defined as the median of $\{| s_{1 j} - M_{j} |, | s_{2 j} - M_{j} |, \dots, | s_{n j} - M_{j} |\}$ . The threshold intervals for detecting potential outliers for the $j^{t h}$ covariate component are $[M_{j} - 3 D_{j}, M_{j} + 3 D_{j}]$ and the narrower $[M_{j} - D_{j}, M_{j} + D_{j}]$ , where $D_{j}$ is the MAD for the $j^{t h}$ covariate component. The $k^{t h}$ observation is classified as a potential outlier for the $j^{t h}$ covariate component if the corresponding residual value $s_{k j}$ is not contained in the interval. We also construct similar MAD-based threshold intervals for detecting potential outliers based on the partial likelihood score residuals.
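Both kinds of threshold interval are easy to state in code. The sketch below is illustrative only: the function names and the residual values are made up, and the quartiles use the "inclusive" interpolation rule of Python's `statistics.quantiles`, which may differ slightly from the quartile definition used by the authors.

```python
import statistics

def quartile_interval(residuals, k=1.5):
    """[Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is Tukey's rule, k = 0.5 the narrower one."""
    q1, _, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def mad_interval(residuals, k=3.0):
    """[M - k * MAD, M + k * MAD]; k = 3 and k = 1 are the two widths in the text."""
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    return med - k * mad, med + k * mad

def flag_outliers(residuals, interval):
    """True for every residual that falls outside the threshold interval."""
    lower, upper = interval
    return [not (lower <= r <= upper) for r in residuals]

# Hypothetical residuals for one covariate component, with one extreme value.
residuals = [-0.2, -0.1, 0.0, 0.1, 0.2, 5.0]
by_quartiles = flag_outliers(residuals, quartile_interval(residuals, k=1.5))
by_mad = flag_outliers(residuals, mad_interval(residuals, k=3.0))
```

Both rules flag only the last observation here. The same functions apply to a vector of deviance residuals, which has one value per individual.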

We compute the deviance residuals based on the full likelihood as well as the partial likelihood approach for detecting potential outliers. Unlike the score residuals, which are calculated for every covariate for each individual, the deviance residuals have only one value per individual. Suppose that $\{v_{1}, v_{2}, \dots, v_{n}\}$ is the set of deviance residuals obtained from the whole data set, and $M_{V}$ and $D_{V}$ are the median and the MAD of this set; then a MAD-based threshold interval for detecting potential outliers is obtained as $[M_{V} - D_{V}, M_{V} + D_{V}]$ . The $k^{t h}$ observation is considered a potential outlier if the corresponding deviance residual $v_{k}$ is not contained in this interval. These threshold intervals are constructed using both the partial and the full likelihood based deviance residuals.

We apply equations 2.7 and 2.11 to compute the score and the deviance residuals using the full likelihood approach. On the other hand, we fit Cox’s proportional hazards model and compute the score and deviance residuals using the partial likelihood function. The function “residuals” of the R package “survival” ²⁴ computes the aforementioned residuals for the partial likelihood estimation; one may specify the residual type as “martingale”, “deviance”, “score”, “schoenfeld” or “scaledsch”, where “scaledsch” denotes the scaled Schoenfeld residual and the other types are self-explanatory. We compare our proposed score and deviance residuals with their counterparts based on the partial likelihood approach using this R package.

We investigate the performance of the score residuals for each of the three covariates, as well as of the deviance residuals, for the two different likelihoods by computing the area under the receiver operating characteristic curve (AUC) using the library pROC in R.²⁵ There is only one decision interval for a data sample (or for each coefficient of a data set), that is, we have only one estimated pair of sensitivity and specificity. Therefore, the AUC estimate is based on the ROC curve through this single estimated pair and the two end points (0, 0) and (1, 1) on the sensitivity versus 1-specificity plane. The AUC values for the full likelihood and the corresponding partial likelihood residuals are presented. Empirical standard errors of the AUC are also provided based on Monte Carlo simulations, and the results are averaged over 3,000 simulations. The R code was written by the second author and is available at https://duke.box.com/s/kjnruu2e9corg1rrv0g0uncikucu41u7.
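With a single operating point, the area under the piecewise-linear ROC curve has a simple closed form. The sketch below (hypothetical function name and made-up labels, written in Python rather than with pROC) adds the triangle under the segment from (0, 0) to (1 − specificity, sensitivity) to the trapezoid under the segment from that point to (1, 1).

```python
def single_point_auc(truth, flagged):
    """AUC of the ROC curve through (0, 0), (1 - specificity, sensitivity), (1, 1)."""
    tp = sum(1 for t, f in zip(truth, flagged) if t and f)
    fn = sum(1 for t, f in zip(truth, flagged) if t and not f)
    tn = sum(1 for t, f in zip(truth, flagged) if not t and not f)
    fp = sum(1 for t, f in zip(truth, flagged) if not t and f)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    fpr = 1.0 - specificity
    triangle = 0.5 * fpr * sensitivity                   # from (0, 0) to the point
    trapezoid = 0.5 * (1.0 - fpr) * (sensitivity + 1.0)  # from the point to (1, 1)
    return triangle + trapezoid

# Hypothetical example: two true outliers, one of them detected, one false alarm.
truth = [True, True, False, False, False, False]
flagged = [True, False, True, False, False, False]
auc = single_point_auc(truth, flagged)
```

The two areas simplify to (sensitivity + specificity) / 2, so in this example the AUC is (0.5 + 0.75) / 2 = 0.625. It also follows that the AUC drops below 0.5 whenever sensitivity + specificity is less than 1.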

3.2. Simulation Results

3.2.1. Small effect size

We present the simulation results for the small effect size where $β = {(0.2, 0.4, - 0.2)}^{'}$ . In computing the full likelihood score-type and deviance-type residuals, we have utilized the Kaplan-Meier estimator of the baseline cumulative hazard function, denoted as KM in the tables. We observe that as the censoring proportion increases, the performance of the score residuals estimated from the full likelihood becomes better than that of the score residuals from the partial likelihood. This is evident when the censoring proportion is 0.50 for the Weibull distribution (supplemental Table 1, Figure 1) and for censoring proportions of 0.20 and 0.50 when the failure times were exponentially distributed (supplemental Table 2, Figure 2). On the other hand, the partial likelihood score residuals have higher AUC than the full likelihood residuals under lower censoring proportions (namely, 0 and 0.10). When we narrow the thresholds, the AUC values of both the full likelihood and the partial likelihood residuals increase. However, the full likelihood residuals outperform the partial likelihood residuals at high censoring proportions, where the AUC values of the full likelihood residuals reach as high as 0.81 (supplemental Table 2, Figure 2).

Figure 1. — Plot summarizing the AUC levels for the score residuals computed using the full and partial likelihood functions involving the covariate $X_{. 3}$

Figure 2. — Plot summarizing the AUC levels for the score residuals computed using the full and partial likelihood functions involving the covariate $X_{. 2}$

We compare the performance of the full likelihood based deviance residuals with that of the partial likelihood based deviance residuals, assuming the failure times follow exponential and Weibull distributions, respectively (Table 1). We observe that the full likelihood based deviance residuals have higher AUC levels than the partial likelihood based deviance residuals in identifying outliers, irrespective of the censoring proportion and the failure time distribution (Table 1).

Table 1.

AUC and its standard error (SE) based on the partial and full likelihood methods of the deviance residuals where $β = {(0.2, 0.4, - 0.2)}^{'}$ , mean of the covariate distribution of the true outliers as $μ_{2} = {(- 2, - 3, - 4)}^{'}$ and based on 3000 Monte-Carlo simulations.

	AUC (SE)
Method	0 censoring	0.10 censoring	0.20 censoring	0.50 censoring
Exponential Distribution
KM full likelihood deviance	0.621 (0.066)	0.611 (0.065)	0.598 (0.065)	0.517 (0.114)
Partial likelihood deviance	0.502 (0.059)	0.519 (0.061)	0.521 (0.061)	0.458 (0.051)
Weibull Distribution
KM full likelihood deviance	0.621 (0.066)	0.605 (0.065)	0.579 (0.066)	0.548 (0.094)
Partial likelihood deviance	0.501 (0.058)	0.503 (0.060)	0.503 (0.060)	0.470 (0.061)


KM=Kaplan-Meier Estimator for the Cumulative Baseline Hazard Function

We have observed similar results for the AUC levels of the score (and deviance) residuals based on the full likelihood function when utilizing the Kaplan-Meier (KM) and the Nelson-Aalen (NA) estimators of the baseline cumulative hazard function (supplemental Table 2). Therefore, to avoid redundancy, we report only the results from the KM estimator for the full likelihood residuals in the simulation studies.

3.2.2. Large effect size

We next consider the scenario where $β = {(1, 2, - 1)}^{'}$ and present the AUC values for the different score residuals assuming Weibull failure times in Table 2. We observe that the AUC of the full likelihood score residual is greater than the partial likelihood score residuals irrespective of the censoring proportion. The AUC values of the score residuals in this setting of large effect size appear to be smaller than the corresponding AUC values under similar censoring proportion in the setting of small effect size, i.e., $β = {(0.2, 0.4, - 0.2)}^{'}$ . This is more evident in the AUC values of the partial likelihood score residuals. Particularly, at the high censoring proportion of 0.50, the AUC values of the partial likelihood residuals remain below 0.5 making them practically ineffective. On the other hand, the full likelihood score residual maintains high AUC values with increased censoring proportion.

Table 2.

AUC and its standard error (SE) for different methods of score residuals assuming Weibull failure times, $β = {(1, 2, - 1)}^{'}$ , and mean of the covariate distribution of the true outliers $μ_{2} = {(- 2, - 3, - 4)}^{'}$ , based on 3000 Monte-Carlo simulations

Method	AUC (SE)
	0 censoring			0.10 censoring			0.20 censoring			0.50 censoring
	$X_{. 1}$	$X_{. 2}$	$X_{. 3}$	$X_{. 1}$	$X_{. 2}$	$X_{. 3}$	$X_{. 1}$	$X_{. 2}$	$X_{. 3}$	$X_{. 1}$	$X_{. 2}$	$X_{. 3}$
KM Score Residual¹	0.617 (0.061)	0.698 (0.066)	0.779 (0.065)	0.592 (0.058)	0.678 (0.066)	0.760 (0.066)	0.568 (0.053)	0.648 (0.065)	0.745 (0.065)	0.522 (0.067)	0.578 (0.087)	0.704 (0.095)
KM Score Residual ²	0.604 (0.059)	0.684 (0.065)	0.764 (0.067)	0.578 (0.055)	0.662 (0.066)	0.744 (0.067)	0.558 (0.050)	0.634 (0.064)	0.729 (0.065)	0.529 (0.069)	0.603 (0.099)	0.693 (0.096)
KM Score Residual³	0.657 (0.065)	0.725 (0.058)	0.783 (0.051)	0.648 (0.066)	0.736 (0.060)	0.782 (0.053)	0.627 (0.067)	0.729 (0.061)	0.777 (0.056)	0.548 (0.097)	0.634 (0.112)	0.731 (0.079)
KM Score Residual⁴	0.652 (0.062)	0.714 (0.054)	0.759 (0.044)	0.652 (0.063)	0.725 (0.053)	0.761 (0.046)	0.636 (0.066)	0.726 (0.054)	0.758 (0.048)	0.588 (0.098)	0.675 (0.092)	0.723 (0.069)
Partial Likelihood Score Residual¹	0.573 (0.053)	0.670 (0.063)	0.762 (0.062)	0.554 (0.051)	0.639 (0.062)	0.714 (0.063)	0.536 (0.050)	0.606 (0.059)	0.662 (0.064)	0.442 (0.044)	0.472 (0.050)	0.488 (0.055)
Partial Likelihood Score Residual²	0.566 (0.052)	0.668 (0.064)	0.751 (0.062)	0.551 (0.049)	0.635 (0.062)	0.708 (0.062)	0.535 (0.048)	0.602 (0.059)	0.661 (0.062)	0.455 (0.042)	0.490 (0.048)	0.495 (0.054)
Partial Likelihood Score Residual³	0.606 (0.064)	0.719 (0.062)	0.761 (0.054)	0.562 (0.064)	0.660 (0.066)	0.696 (0.062)	0.520 (0.062)	0.600 (0.066)	0.631 (0.066)	0.388 (0.051)	0.417 (0.057)	0.445 (0.062)
Partial Likelihood Score Residual⁴	0.604 (0.065)	0.701 (0.058)	0.736 (0.050)	0.550 (0.066)	0.636 (0.065)	0.672 (0.061)	0.502 (0.064)	0.574 (0.067)	0.609 (0.066)	0.372 (0.054)	0.402 (0.060)	0.425 (0.064)


KM=Kaplan-Meier Estimator for the Cumulative Baseline hazard Function.

¹ Quartile based threshold $[Q_{1 j} - 1.5 (Q_{3 j} - Q_{1 j}), Q_{3 j} + 1.5 (Q_{3 j} - Q_{1 j})]$

² MAD based threshold $[M_{j} - 3 D_{j}, M_{j} + 3 D_{j}]$

³ Quartile based narrower threshold $[Q_{1 j} - 0.5 (Q_{3 j} - Q_{1 j}), Q_{3 j} + 0.5 (Q_{3 j} - Q_{1 j})]$

⁴ MAD based narrower threshold $[M_{j} - D_{j}, M_{j} + D_{j}]$

4. Data Analysis

4.1. Primary Biliary Cirrhosis data analysis

We computed the score and deviance residuals to identify outliers using the primary biliary cirrhosis (PBC) data set, a well-studied example in the literature ⁴. The original data set includes information on survival time, survival status (dead or censored), disease type, treatment type, age, gender, and some baseline laboratory measurements of 424 patients. The censoring proportion in this data set was 0.61. We are interested in examining the effects of a patient’s age, albumin level, and bilirubin level on overall survival. These three continuous variables have observed values for all the patients. We fit a proportional hazards model with the three covariates and evaluate whether there are any outliers to the fitted model. We apply our proposed full likelihood based residuals, as described in Section 2. In addition, we compare the full likelihood residuals with the partial likelihood score residuals.

We present the plots of the full likelihood based score residuals using the KM cumulative hazard estimators for the variables age, albumin, and bilirubin, respectively (Figure 3). The left panel shows the full likelihood score residuals whereas the right panel presents the score residuals estimated using the partial likelihood function. In each of the figures, the middle horizontal line refers to the residual value of zero, while the uppermost and the lowermost horizontal lines refer to the upper and lower limits of the quartile based threshold interval of the full likelihood score residuals as described in Section 3. We observe that the score residuals for all the three variables are symmetrically distributed around zero, which is expected. However, the dispersion of the full likelihood based score residuals is large compared with the partial likelihood based residuals. The score residuals for age and albumin covariates are randomly distributed around zero with wide dispersion (Figures 3A–3B), with few residual values falling below the lower limit of the threshold interval. This suggests that despite the presence of a few outliers the fitted proportional hazards model with age and albumin may be a good fit.

Figures 3A-3C. — Plots of the full likelihood and partial likelihood score residuals against age, albumin, and bilirubin values in PBC data

On the other hand, we observe that the score residuals of bilirubin are close to zero at the lower values of bilirubin, but gradually deviate from zero as the bilirubin levels increase (Figure 3C). This leads to a high number of outliers for larger values of bilirubin that are outside the limits of the threshold interval. This suggests that bilirubin should be transformed and not modeled in its original scale.

Furthermore, we present the plot of the deviance residuals against the risk score. The risk score for every individual is constructed from the three given covariates (age, bilirubin, albumin). This is done by multiplying each covariate value by the corresponding regression coefficient estimated from the full likelihood approach and then summing the products (Figure 4). The uppermost and the lowermost horizontal lines represent the upper and lower threshold limits at y-axis values of 3 and −3, respectively, while the horizontal line in the middle represents the y-axis value of zero. We note that the deviance residuals plotted against the linear predictor have a distinct pattern in that they tend to be more dispersed as the risk score increases.

Figure 4. — Plot of full likelihood deviance residual against risk score (linear predictor) of age, albumin, and bilirubin values in PBC data

4.2. German Breast Cancer data analysis

The German Breast Cancer study ²⁶ data comprise 686 patients with breast cancer. For each patient, information is available on the survival time, (right) censoring status, age, menopausal status, tumor grade, tumor size, hormone therapy status (whether or not hormone therapy was received), progesterone receptor level, and estrogen receptor level. The censoring proportion in this data set is 0.56.

Figure 5 shows the score residuals for age, progesterone receptor level, and estrogen receptor level. The left panel presents the full likelihood residuals, whereas the right panel presents the partial likelihood residuals. For the age covariate, the overall patterns of the score residuals based on the full likelihood and the partial likelihood are similar (Figure 5A). While both sets of score residuals are randomly and symmetrically scattered around the zero line, the full likelihood residuals are more widely spread than the partial likelihood residuals. Moreover, we note that the full likelihood score residuals for age lie within the quartile-based thresholds marked by the uppermost and lowermost horizontal lines (Figure 5A). This implies that there are no significant outliers for age with respect to the fitted model.

Figures 5A-5C — Plots of full likelihood and partial likelihood score residuals for age, progesterone receptor, and estrogen receptor levels in the German breast cancer study

We present the score residual plots for progesterone and estrogen receptor levels in Figures 5B–5C. We observe that the full likelihood score residuals and the partial likelihood score residuals follow similar patterns. These residuals, however, are not randomly distributed and tend to deviate further away from the zero line as the values of the covariates increase. This also leads to an increased number of outliers from the plots as many observations fall outside the quartile based thresholds. From the above analysis, we conclude that modelling progesterone and estrogen receptors in their original scales is not appropriate in the proportional hazards model.

5. Discussion

We have used the full likelihood approach to develop new residuals for the proportional hazards model. We compute the score-type residuals by solving the score equations from the full likelihood function. In addition, we calculate the deviance residuals from the deviance function based on the full likelihood function. We compare the performance of the score and deviance residuals computed from the full likelihood function with that of the score and deviance residuals computed from the partial likelihood function in identifying outliers. Overall, we found that the score residuals derived from the full likelihood function have higher area under the curve (AUC) values in detecting outliers than those from the partial likelihood approach. Specifically, the full likelihood score residuals outperform the competing methods when the censoring proportion is high. In addition, the deviance residuals based on the full likelihood approach had higher AUC values than the deviance residuals derived from the partial likelihood function. It is worth noting that the residuals from the two methods have different formulas, and it is likely that the residuals from the full likelihood function have larger variability than those from the partial likelihood function due to the baseline cumulative hazard.
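To make the AUC comparison concrete, the following sketch computes an AUC for outlier detection from its rank (Mann-Whitney) definition: the probability that a randomly chosen true outlier receives a larger score than a randomly chosen non-outlier, with ties counted as one half. Using the absolute residual as the score is an assumption made for this illustration; the simulation studies obtained AUC values with the pROC package, and the data below are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(score of an outlier > score of a non-outlier), ties count 1/2.

    labels: 1 for a true outlier, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one non-outlier")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical absolute residuals and known outlier labels from a simulation.
abs_residuals = [0.1, 0.4, 0.2, 2.5, 0.3, 1.8]
is_outlier = [0, 0, 0, 1, 0, 1]
perfect = auc(abs_residuals, is_outlier)

# A case where the score carries no information about outlier status.
chance = auc([0.1, 0.9, 0.5, 0.5], [0, 0, 1, 1])
print(perfect, chance)
```

An AUC near 1 means the residual ranks the simulated outliers above the remaining observations, while a value near 0.5 means it does no better than chance.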

To the best of our knowledge, very few articles have used simulated data to study residuals. One of the main challenges in detecting outliers is determining an optimal threshold limit. In our simulation settings, we generated extreme observations in the covariate space to represent outliers. We used Tukey's quantile-based threshold and the median-based threshold limits to compare the AUC of the full likelihood and the partial likelihood residuals. Such thresholds have been utilized in previous studies^16–17 and are useful for comparing the detection ability of different methods. It is worth noting that the AUC values obtained from the simulation studies, for both the full and partial likelihood methods, depend on the choice of threshold used for detecting outliers. In other words, one can attain higher overall detection accuracy by using variations of these thresholds. For example, when we restrict the distance between the lower and upper thresholds, the AUC values from both methods increase. We note that, in this case, the AUC values for the score residuals based on the full likelihood approach are also higher than those based on the partial likelihood approach. Most of the existing residual methods rely on graphical representations to identify outliers. While visual methods are useful, they tend to be subjective. In contrast, we have provided an objective framework to study outliers. More research, however, is needed to establish a standard definition of thresholds that can help in identifying outliers efficiently.
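The effect of narrowing the threshold interval can be sketched with a median-based rule. The example assumes limits of the form median ± k·MAD, with the MAD scaled by 1.4826 and k = 2.5 as a commonly suggested default; these constants are assumptions for illustration and are not necessarily those used in the simulations. The residual values are hypothetical.

```python
from statistics import median


def mad_limits(residuals, k=2.5, scale=1.4826):
    """Return (lower, upper) = median -/+ k * scaled MAD.

    k and scale are assumed defaults, not values taken from the paper.
    """
    med = median(residuals)
    mad = scale * median([abs(r - med) for r in residuals])
    return med - k * mad, med + k * mad


def count_flagged(residuals, k):
    """Number of residuals outside the median-based interval for a given k."""
    lower, upper = mad_limits(residuals, k=k)
    return sum(r < lower or r > upper for r in residuals)


# Hypothetical residuals, for illustration only.
residuals = [-0.4, -0.1, 0.0, 0.2, 0.3, -0.2, 0.1, -3.5]
wide = count_flagged(residuals, k=2.5)    # wider interval
narrow = count_flagged(residuals, k=1.0)  # restricted interval
print(wide, narrow)
```

Shrinking k from 2.5 to 1.0 moves the two limits closer together, so more observations are flagged. This is why any detection accuracy measure has to be reported together with the threshold that produced it.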

In summary, we have employed a two-step procedure for estimating the score and deviance residuals based on the full likelihood function. In the first step, the baseline cumulative hazard function is estimated; in the second step, this estimate is used to construct the residuals. We considered both the Kaplan-Meier and the Nelson-Aalen estimators for the first step. The results showed no difference in the performance of the new residuals between the two estimators of the baseline cumulative hazard function. Residuals derived from the full likelihood function have higher AUC values in detecting outliers than the partial likelihood approaches when the censoring proportion is high. They are robust and can be easily computed. Investigators are encouraged to use them to identify outliers when the censoring proportion is high.
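The two estimators mentioned for the first step can be compared in a short sketch. It computes the Nelson-Aalen estimate, the running sum of d/n over event times, and the Kaplan-Meier based estimate, −log of the product of (1 − d/n), for a single sample without covariates. How the estimate enters the full likelihood residuals is described in the earlier sections and is not reproduced here. The function name and the data are hypothetical.

```python
import math


def cumulative_hazards(times, events):
    """Return [(t, H_NA(t), H_KM(t))] at each distinct event time.

    events: 1 if the time is an observed event, 0 if right censored.
    H_NA is the Nelson-Aalen estimate; H_KM = -log(Kaplan-Meier survival).
    """
    data = sorted(zip(times, events))
    n = len(data)
    out = []
    h_na, surv = 0.0, 1.0
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i          # subjects with time >= t
        d = 0                    # events observed exactly at t
        j = i
        while j < n and data[j][0] == t:
            d += data[j][1]
            j += 1
        if d > 0:
            h_na += d / at_risk
            surv *= 1.0 - d / at_risk
            h_km = -math.log(surv) if surv > 0 else math.inf
            out.append((t, h_na, h_km))
        i = j
    return out


# Hypothetical follow-up times with censoring indicators.
est = cumulative_hazards([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, h_na, h_km in est:
    print(t, round(h_na, 4), round(h_km, 4))
```

The Kaplan-Meier based value is never smaller than the Nelson-Aalen value, and the two are close when the number at risk is large relative to the number of events. This is consistent with the finding that the choice of estimator made no difference to the performance of the residuals.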

Supplementary Material

NIHMS1655873-supplement-Supplementary_Materials.docx (39.7 KB, docx)

Acknowledgement

This research was supported in part by National Institutes of Health Grants R21 CA195424, U01CA157703, United States Army Medical Research W81XWH-15-1-0467 and W81XWH-18-1-0278, and the Prostate Cancer Foundation Challenge Award. The authors would like to thank Dr. Bahadur Singh for his insightful comments and helpful suggestions on this manuscript prior to his death.

References

1. Cox DR. Regression models and life tables (with discussion). J Roy Stat Soc Series B 1972;34:187–220.
2. Schoenfeld D. Partial residuals for the proportional hazards regression model. Biometrika 1982;69:239–241.
3. Fleming TR, Harrington DP. Counting Processes and Survival Analysis. Wiley, New York, 1991.
4. Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. Springer, New York, 2000.
5. Barlow WE, Prentice RL. Residuals for relative risk regression. Biometrika 1988;75:65–74.
6. Collett D. Modelling Survival Data in Medical Research. 3rd edition, Chapman and Hall/CRC, New York, 2014.
7. Cain KC, Lange NT. Approximate case influence for the proportional hazards regression model with censored data. Biometrics 1984;40:493–499.
8. Lawless JF. Statistical Models and Methods for Lifetime Data. Wiley, New York, 1982.
9. Kaplan EL, Meier P. Non-parametric estimation from incomplete observations. J Amer Statist Assoc 1958;53:457–481.
10. Aalen O, Borgan O, Gjessing H. Survival and Event History Analysis: A Process Point of View (Statistics for Biology and Health). Springer, New York, 2008.
11. Belsley DA, Kuh E, Welsch RE. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York, 1980.
12. Draper NR, Smith H. Applied Regression Analysis. 2nd edition, Wiley, New York, 1981.
13. Cook RD, Weisberg S. Residuals and Influence in Regression. Chapman and Hall, New York, 1982.
14. Agresti A. An Introduction to Categorical Data Analysis. 3rd edition, Wiley, New York, 2019.
15. Cox DR. Partial likelihood. Biometrika 1975;62:269–276.
16. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994;81:515–526.
17. Lagakos SW. The graphical evaluation of explanatory variables in proportional hazard regression models. Biometrika 1981;68:93–98.
18. Therneau TM, Grambsch PM, Fleming TR. Martingale-based residuals for survival models. Biometrika 1990;77:147–160.
19. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med 2005;24:1713–1723.
20. Halabi S, Singh B. Sample size determination for comparing several survival curves with unequal allocations. Stat Med 2004;23:1793–1815.
21. Frigge M, Hoaglin DC, Iglewicz B. Some implementations of the boxplot. Am Stat 1989;43:50–54.
22. Rousseeuw PJ, Croux C. Alternatives to the median absolute deviation. J Am Stat Ass 1993;88:1273–1283.
23. Leys C, Ley C, Klein O, Bernard P, Licata L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J Experimental Social Psychol 2013;49:764–766.
24. Therneau TM, Lumley T, Elizabeth A, Cynthia C. survival: Survival Analysis. R package version 3.1-12, 2020.
25. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JF, Müller M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12:77. doi:10.1186/1471-2105-12-77
26. Schumacher M, Bastert G, Bojar H. Randomized 2 × 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. German Breast Cancer Study Group. J Clin Oncol 1994;12:2086–2093.


