SemiCompRisks: An R Package for the Analysis of Independent and Cluster-correlated Semi-competing Risks Data

Danilo Alvares; Sebastien Haneuse; Catherine Lee; Kyu Ha Lee

doi:10.32614/rj-2019-038

R J. Author manuscript; available in PMC: 2021 Feb 17.

Published in final edited form as: R J. 2019 Aug 20;11(1):376–400. doi: 10.32614/rj-2019-038


PMCID: PMC7889044 NIHMSID: NIHMS1668679 PMID: 33604061

Abstract

Semi-competing risks refer to the setting where primary scientific interest lies in estimation and inference with respect to a non-terminal event, the occurrence of which is subject to a terminal event. In this paper, we present the R package SemiCompRisks, which provides functions to perform the analysis of independent/clustered semi-competing risks data under the illness-death multi-state model. The package allows the user to choose the specification of each model component from a range of options, giving users substantial flexibility, including: accelerated failure time or proportional hazards regression models; parametric or non-parametric specifications for baseline survival functions; parametric or non-parametric specifications for random effects distributions when the data are cluster-correlated; and a Markov or semi-Markov specification for the terminal event following the non-terminal event. While estimation is mainly performed within the Bayesian paradigm, the package also provides maximum likelihood estimation for select parametric models. The package also includes functions for univariate survival analysis as complementary analysis tools.

Introduction

Semi-competing risks refer to the general setting where primary scientific interest lies in estimation and inference with respect to a non-terminal event (e.g., disease diagnosis), the occurrence of which is subject to a terminal event (e.g., death) (Fine et al., 2001; Jazić et al., 2016). When there is a strong association between the two event times, naïve application of a univariate survival model to the non-terminal event time will result in overestimation of outcome rates, because the analysis treats the terminal event as an independent censoring mechanism (Haneuse and Lee, 2016). The semi-competing risks framework instead appropriately treats the terminal event as a competing event and incorporates the dependence between the non-terminal and terminal events into the model specification.
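The overestimation described above can be illustrated with a small base-R simulation. This is a hypothetical sketch, not an analysis from the paper: the exponential event times, the gamma frailty used to induce association between the two times, the horizon `t_star`, and the hand-written Kaplan-Meier helper `naive_km_cdf` (used instead of a survival package so the block has no dependencies) are all illustrative assumptions.

```r
# Sketch: compare the naive Kaplan-Meier estimate of P(T1 <= t), which treats
# death as independent censoring, with the cumulative incidence of the
# non-terminal event actually observed before death.
set.seed(2019)
n <- 5000

# Hypothetical data-generating mechanism: a shared gamma frailty
# (mean 1, variance 0.5) makes the two event times positively associated.
frailty <- rgamma(n, shape = 2, rate = 2)
t1_latent <- rexp(n, rate = 0.5 * frailty)   # non-terminal time if death never intervened
t2 <- rexp(n, rate = 0.8 * frailty)          # terminal event time

# The non-terminal event is only observed if it occurs before death.
y <- pmin(t1_latent, t2)
status <- as.integer(t1_latent < t2)         # 1 = non-terminal event, 0 = death first

# 1 - Kaplan-Meier at time t, censoring at death (assumes no tied times).
naive_km_cdf <- function(time, status, t) {
  ord <- order(time)
  time <- time[ord]
  status <- status[ord]
  n_at_risk <- rev(seq_along(time))          # n, n - 1, ..., 1 at the sorted times
  is_event <- status == 1 & time <= t
  1 - prod(1 - 1 / n_at_risk[is_event])
}

t_star <- 1
naive <- naive_km_cdf(y, status, t_star)
# With no censoring other than death, the cumulative incidence is a simple proportion.
cif <- mean(status == 1 & y <= t_star)
cat(sprintf("naive 1 - KM: %.3f, cumulative incidence: %.3f\n", naive, cif))
```

In this setting the naive 1 - KM value can never fall below the cumulative incidence, because it implicitly lets patients who have died go on to experience the non-terminal event.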

Toward formally describing the structure of semi-competing risks data, let T₁ and T₂ denote the times to the non-terminal and terminal events, respectively. From a modeling perspective, the focus in the semi-competing risks setting is to characterize the distribution of T₁ and its potential relationship with the distribution of T₂, i.e., the joint distribution of (T₁, T₂). For example, starting from an initial state (e.g., transplantation), a subject may, as time progresses, transition into either the non-terminal or the terminal state (see Figure 1.a). A subject who transitions into the non-terminal state may subsequently transition into the terminal state, whereas transitions cannot occur in the reverse order. The main disadvantage of using the competing risks framework (see Figure 1.b) to study the non-terminal event is that it does not use information on the occurrence and timing of the terminal event following the non-terminal event, information that could be used to understand the dependence between the two events.
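The three transitions in Figure 1.a are commonly formalized through transition-specific hazard functions. The block below follows the standard illness-death formulation (e.g., Xu et al., 2010); it is offered as a reading aid, and the notation in the paper's own model section may differ.

```latex
\begin{aligned}
h_1(t_1) &= \lim_{\Delta \downarrow 0} \frac{P\left[T_1 \in [t_1, t_1 + \Delta) \mid T_1 \ge t_1,\, T_2 \ge t_1\right]}{\Delta}, && t_1 > 0,\\
h_2(t_2) &= \lim_{\Delta \downarrow 0} \frac{P\left[T_2 \in [t_2, t_2 + \Delta) \mid T_1 \ge t_2,\, T_2 \ge t_2\right]}{\Delta}, && t_2 > 0,\\
h_3(t_2 \mid t_1) &= \lim_{\Delta \downarrow 0} \frac{P\left[T_2 \in [t_2, t_2 + \Delta) \mid T_1 = t_1,\, T_2 \ge t_2\right]}{\Delta}, && 0 < t_1 < t_2.
\end{aligned}
```

Here h₁ and h₂ govern the transitions out of the initial state, and h₃ governs death after the non-terminal event. Under a Markov specification, h₃ depends on the time t₂ since the initial state; under a semi-Markov specification, it depends on the sojourn time t₂ − t₁ spent in the non-terminal state.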

Figure 1: Graphical representation of **(a)** semi-competing risks and **(b)** competing risks.

The current literature for the analysis of semi-competing risks data is composed of three approaches: methods that specify the dependence between non-terminal and terminal events via a copula (Fine et al., 2001; Wang, 2003; Jiang et al., 2005; Ghosh, 2006; Peng and Fine, 2007; Lakhal et al., 2008; Hsieh et al., 2008; Fu et al., 2013); methods based on multi-state models, specifically the so-called illness-death model (Liu et al., 2004; Putter et al., 2007; Ye et al., 2007; Kneib and Hennerfeind, 2008; Zeng and Lin, 2009; Xu et al., 2010; Zeng et al., 2012; Han et al., 2014; Zhang et al., 2014; Lee et al., 2015, 2016); and methods built upon the principles of causal inference (Zhang and Rubin, 2003; Egleston et al., 2007; Tchetgen Tchetgen, 2014; Varadhan et al., 2014).

The SemiCompRisks package is designed to provide a comprehensive suite of functions for the analysis of semi-competing risks data based on the illness-death model, together with a complementary suite of functions for the analysis of univariate time-to-event data. While Bayesian methods are available for estimation and inference for all models, maximum likelihood estimation is also provided for select parametric models. Furthermore, SemiCompRisks offers flexible parametric and non-parametric specifications for baseline survival functions and cluster-specific random effects distributions under accelerated failure time and proportional hazards models. The functionality of the package covers methods proposed in a series of recent papers on the analysis of semi-competing risks data (Lee et al., 2015, 2016, 2017c).
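To make the model components listed above concrete, the following base-R sketch simulates data from one configuration: a semi-Markov proportional hazards illness-death model with Weibull baseline hazards and a patient-level gamma frailty. The function name `sim_id_weibull`, its arguments, the Weibull parametrization h₀(t) = ακt^(α−1), the frailty, and the uniform censoring are illustrative assumptions; this is not the package's own simulation routine.

```r
# Hypothetical helper (not part of SemiCompRisks): draw semi-competing risks
# data from a semi-Markov illness-death model with Weibull baseline hazards
# h0g(t) = alpha_g * kappa_g * t^(alpha_g - 1), proportional-hazards
# covariate effects, and a patient-level gamma frailty (mean 1, variance theta).
sim_id_weibull <- function(x, beta1, beta2, beta3, alpha, kappa, theta, cens) {
  n <- nrow(x)
  frailty <- rgamma(n, shape = 1 / theta, rate = 1 / theta)

  # Inverse-CDF draw from S(t) = exp(-kappa * t^alpha * rate_mult).
  rweib_ph <- function(alpha, kappa, rate_mult) {
    (-log(runif(length(rate_mult))) / (kappa * rate_mult))^(1 / alpha)
  }

  # Latent times out of the initial state (transitions 1 and 2).
  r1 <- rweib_ph(alpha[1], kappa[1], frailty * exp(drop(x %*% beta1)))
  r2 <- rweib_ph(alpha[2], kappa[2], frailty * exp(drop(x %*% beta2)))
  # Sojourn time in the non-terminal state (transition 3, semi-Markov clock).
  s3 <- rweib_ph(alpha[3], kappa[3], frailty * exp(drop(x %*% beta3)))

  ill_first <- r1 < r2
  t1 <- ifelse(ill_first, r1, Inf)   # non-terminal event never occurs if death comes first
  t2 <- ifelse(ill_first, r1 + s3, r2)

  # Independent uniform censoring on (cens[1], cens[2]).
  cens_time <- runif(n, cens[1], cens[2])
  data.frame(
    y1 = pmin(t1, t2, cens_time), delta1 = as.integer(t1 <= pmin(t2, cens_time)),
    y2 = pmin(t2, cens_time),     delta2 = as.integer(t2 <= cens_time)
  )
}

# Usage with made-up parameter values and a single binary covariate.
set.seed(42)
x <- cbind(sex = rbinom(500, 1, 0.5))
dat <- sim_id_weibull(x, beta1 = 0.5, beta2 = 0.3, beta3 = 0.2,
                      alpha = c(1.2, 1.1, 1.0), kappa = c(0.1, 0.05, 0.2),
                      theta = 0.5, cens = c(5, 10))
head(dat)
```

Drawing two latent times out of the initial state and keeping the earlier one reproduces the transition hazards h₁ and h₂; the semi-Markov assumption enters only through the sojourn time `s3`, which restarts the clock at the non-terminal event.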

The remainder of the paper is organized as follows. Section Other packages and their features summarizes existing R packages that provide methods for multi-state modeling and explains the key contributions of the SemiCompRisks package. Section CIBMTR data introduces an ongoing study of stem cell transplantation and describes the data available in the package. Section The illness-death models for semi-competing risks data presents different model specifications and the estimation methods implemented in our package. Section Package description summarizes the core components of the SemiCompRisks package, including datasets, functions for fitting models and simulating data, and the structure of the output provided to analysts. Section Illustration: Stem cell transplantation data illustrates the usage of the main functions in the package through three semi-competing risks analyses of the stem cell transplantation data. Finally, Section Discussion concludes with a discussion and an overview of the extensions we are working on.

Other packages and their features

As we elaborate upon below, the illness-death model for semi-competing risks, which is the focus of the SemiCompRisks package, is a special case of the broader class of multi-state models. Currently, numerous R packages permit estimation and inference for multi-state models and could conceivably be used to analyze semi-competing risks data.

The mvna package computes the Nelson-Aalen estimator of the cumulative transition hazard for arbitrary Markov multi-state models with right-censored and left-truncated data, but it does not compute transition probability matrices (Allignol et al., 2008). The TPmsm package implements non-parametric and semi-parametric estimators for the transition probabilities in 3-state models, including the Aalen-Johansen estimator and estimators that are consistent even without the Markov assumption or under dependent censoring (Araújo et al., 2014). The p3state.msm package performs inference in an illness-death model (Meira-Machado and Roca-Pardiñas, 2011); its main feature is the ability to obtain non-Markov estimates of the transition probabilities. The etm package calculates the empirical transition probability matrices and corresponding variance estimates for any time-inhomogeneous multi-state model with a finite state space and data subject to right-censoring and left-truncation, but it does not account for the influence of covariates (Allignol et al., 2011). The msm package can fit time-homogeneous Markov models to panel count data, as well as hidden Markov models in continuous time (Jackson, 2011); a time-homogeneous Markov illness-death model, which can accommodate interval-censored data, is a particular case. The tdc.msm package may be used to fit the time-dependent proportional hazards model and multi-state regression models in continuous time, such as the Cox Markov model, Cox semi-Markov model, homogeneous Markov model, non-homogeneous piecewise model, and non-parametric Markov model (Meira-Machado et al., 2007). The SemiMarkov package performs parametric estimation (with a Weibull or exponentiated Weibull specification) in a homogeneous semi-Markov model (Król and Saint-Pierre, 2015). Moreover, the effects of covariates on the evolution of the process can be studied using a semi-parametric Cox model for the distributions of sojourn times.
The flexsurv package provides functions for fitting and predicting from fully parametric multi-state models with Markov or semi-Markov specifications (Jackson, 2016). In addition, the multi-state models implemented in flexsurv can include interval-censoring, and some of them also left-truncation. The msSurv package computes non-parametric estimates for general multi-state models subject to independent right-censoring and possibly left-truncation (Ferguson et al., 2012); it also computes the marginal state occupation probabilities along with the corresponding variance estimates and lower and upper confidence limits. The mstate package can be applied to right-censored and left-truncated data in semi-parametric or non-parametric multi-state models with or without covariates, and it may also be used for competing risks models (Wreede et al., 2011). For Cox-type illness-death models with interval-censored data specifically, we highlight the packages coxinterval (Boruvka and Cook, 2015) and SmoothHazard (Touraine et al., 2017), the latter of which also allows the event times to be left-truncated. Finally, the frailtypack package permits the analysis of correlated data under select clustering structures, as well as the analysis of left-truncated data, through a focus on frailty models using penalized likelihood estimation or parametric estimation (Rondeau et al., 2012).

While these packages collectively provide broad functionality, each of them is either non-specific to semi-competing risks or only permits consideration of a narrow range of model specifications. In developing the SemiCompRisks package, the goal was to provide a single package within which a broad range of models and model specifications could be entertained. The frailtypack package, for example, can also be used to analyze cluster-correlated semi-competing risks data, but it is restricted to the proportional hazards model with either patient-specific or cluster-specific random effects, not both (Liquet et al., 2012). Furthermore, estimation and inference are within the frequentist framework, so estimating hospital-specific random effects, which are of particular interest in health policy applications (Lee et al., 2016), together with quantifying their uncertainty is very challenging. This, however, is (relatively) easily achieved with the SemiCompRisks package. Given the breadth of the package's functionality, in addition to the usual help files, we have developed a series of model-specific vignettes, covering a total of 12 distinct model specifications, which can be accessed through CRAN (Lee et al., 2017b) or with the R command vignette("SemiCompRisks").

CIBMTR data

The example dataset used throughout this paper was obtained from the Center for International Blood and Marrow Transplant Research (CIBMTR), a collaboration between the National Marrow Donor Program and the Medical College of Wisconsin representing a worldwide network of transplant centers (Lee et al., 2017a). For illustrative purposes, we consider a hypothetical study in which the goal is to investigate risk factors for grade III or IV acute graft-versus-host disease (GVHD) among 9,651 patients who underwent their first allogeneic hematopoietic cell transplant (HCT) between January 1999 and December 2011.

As summarized in Table 1, after follow-up is administratively censored at 365 days post-transplant, each patient can be categorized according to their observed outcome information into four groups: (i) acute GVHD and death; (ii) acute GVHD and censored for death; (iii) death without acute GVHD; and (iv) censored for both. Furthermore, for each patient, the following covariates are available: gender (Male, Female); age (<10, 10–19, 20–29, 30–39, 40–49, 50–59, 60+); disease type (AML, ALL, CML, MDS); disease stage (Early, Intermediate, Advanced); and HLA compatibility (Identical sibling, 8/8, 7/8).
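The four outcome groups in Table 1 follow directly from the two event indicators once follow-up is truncated at 365 days. The sketch below assumes the outcomes are stored as (y1, delta1, y2, delta2), a common layout for semi-competing risks data; the helper names `admin_censor` and `outcome_category` and the five toy records are hypothetical and are not CIBMTR data.

```r
# Hypothetical helper: administratively censor semi-competing risks data
# (y1, delta1, y2, delta2) at `tau` days. Events after tau become censored.
admin_censor <- function(dat, tau) {
  data.frame(
    y1 = pmin(dat$y1, tau), delta1 = as.integer(dat$delta1 == 1 & dat$y1 <= tau),
    y2 = pmin(dat$y2, tau), delta2 = as.integer(dat$delta2 == 1 & dat$y2 <= tau)
  )
}

# Hypothetical helper: map the event indicators to the four Table 1 groups.
outcome_category <- function(delta1, delta2) {
  labels <- c("Both acute GVHD & death", "Acute GVHD & censored for death",
              "Death without acute GVHD", "Censored for both")
  cat_id <- ifelse(delta1 == 1 & delta2 == 1, 1,
            ifelse(delta1 == 1, 2,
            ifelse(delta2 == 1, 3, 4)))
  factor(labels[cat_id], levels = labels)
}

# Toy records (days). When delta1 = 0, y1 equals y2 by construction.
toy <- data.frame(y1 = c(30, 200, 150, 700, 400), delta1 = c(1, 1, 0, 0, 1),
                  y2 = c(100, 600, 150, 700, 500), delta2 = c(1, 1, 1, 0, 1))
toy365 <- admin_censor(toy, tau = 365)
cats <- outcome_category(toy365$delta1, toy365$delta2)
round(100 * prop.table(table(cats)), 1)   # percentages, as reported in Table 1
```

The last toy record shows why administrative censoring matters: a non-terminal event on day 400 falls outside the 365-day window, so that patient is counted as censored for both events.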

Table 1:

Covariate and simulated outcome information for 9,651 patients who underwent the first HCT between 1999–2011 with administrative censoring at 365 days.

	N	%	Both acute GVHD & death (%)	Acute GVHD & censored for death (%)	Death without acute GVHD (%)	Censored for both (%)
Total subjects	9,651	100.0	9.5	8.9	28.8	52.8
Gender
Male	5,366	55.6	9.7	9.5	28.1	52.7
Female	4,285	44.4	9.1	8.3	29.7	52.9
Age, years
<10	653	6.8	5.0	11.9	23.4	59.7
10–19	1,162	12.0	8.0	11.4	24.0	56.6
20–29	1,572	16.3	9.7	9.9	27.4	53.0
30–39	1,581	16.4	9.8	10.7	28.5	51.0
40–49	2,095	21.7	11.0	9.6	29.7	49.7
50–59	2,008	20.8	9.8	5.1	32.3	52.8
60+	580	6.0	9.9	4.8	33.1	52.2
Disease type
AML	4,919	51.0	8.2	8.0	30.3	53.5
ALL	2,071	21.5	9.9	9.0	29.3	51.8
CML	1,525	15.8	12.1	11.3	22.2	54.4
MDS	1,136	11.8	11.0	10.0	30.0	49.0
Disease status
Early	4,873	50.5	8.4	11.0	23.6	57.0
Intermediate	2,316	24.0	9.7	8.5	30.1	51.7
Advanced	2,462	25.5	11.5	5.4	37.7	45.4
HLA compatibility
Identical sibling	3,941	40.8	7.4	8.5	26.3	57.8
8/8	4,100	42.5	10.5	9.7	30.3	49.5
7/8	1,610	16.7	12.2	8.1	30.9	48.8

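Table 2:

Summary of functionality. AFT = accelerated failure time; PHR = proportional hazards regression; L-T = left-truncated data; I-C = interval-censored data; B = Bayesian; F = frequentist; x = not available.

Analysis	Model	Data type	L-T and/or I-C	Statistical paradigm
Semi-competing risks	AFT	Independent	No	B
Semi-competing risks	AFT	Independent	Yes	B
Semi-competing risks	AFT	Clustered	No	x
Semi-competing risks	AFT	Clustered	Yes	x
Semi-competing risks	PHR	Independent	No	B & F
Semi-competing risks	PHR	Independent	Yes	x
Semi-competing risks	PHR	Clustered	No	B
Semi-competing risks	PHR	Clustered	Yes	x
Univariate	AFT	Independent	No	B
Univariate	AFT	Independent	Yes	B
Univariate	AFT	Clustered	No	x
Univariate	AFT	Clustered	Yes	x
Univariate	PHR	Independent	No	B & F
Univariate	PHR	Independent	Yes	x
Univariate	PHR	Clustered	No	B
Univariate	PHR	Clustered	Yes	x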

	beta1	LL	UL	beta2	LL	UL	beta3	LL	UL
dTypeALL	1.49	1.20	1.8	1.37	1.09	1.7	0.99	0.78	1.3
dTypeCML	1.78	1.41	2.3	0.83	0.64	1.1	1.30	0.99	1.7
dTypeMDS	1.64	1.26	2.1	1.39	1.04	1.9	1.49	1.09	2.0
sexP	0.89	0.79	1.0	NA	NA	NA	NA	NA	NA

	h1-PM	LL	UL	h2-PM	LL	UL	h3-PM	LL	UL
Weibull: log-kappa	−6.14	−6.4	−5.90	−11.33	−11.74	−10.93	−6.873	−7.189	−6.557
Weibull: log-alpha	0.15	0.1	0.21	0.86	0.82	0.91	0.022	−0.033	0.077

Number of chains:	3
Number of scans:	5e+06
Thinning:	1000
Percentage of burnin:	50%

	exp(beta1)	LL	UL	exp(beta2)	LL	UL	exp(beta3)	LL	UL
dTypeALL	1.44	1.2	1.8	1.3	1.06	1.6	0.98	0.77	1.2
dTypeCML	1.71	1.4	2.1	0.8	0.63	1.0	1.25	0.96	1.6
dTypeMDS	1.61	1.3	2.1	1.4	1.04	1.8	1.44	1.07	2.0
sexP	0.89	0.8	1.0	NA	NA	NA	NA	NA	NA

	h1-PM	LL	UL	h2-PM	LL	UL	h3-PM	LL	UL
mu	−5.60	−6.006	−5.0	−5.0	−9.5	−2.3	−7.030	0.77	−6.5
sigmaSq	0.22	0.027	2.3	7.6	2.7	24.5	0.13	0.018	2.7
K	10.00	5.000	17.0	15.0	11.0	20.0	10.00	4.000	17.0

	exp(beta1)	LL	UL	exp(beta2)	LL	UL	exp(beta3)	LL	UL
dTypeALL	0.68	0.54	0.84	0.94	0.86	1.0	1.08	0.85	1.4
dTypeCML	0.53	0.42	0.67	1.27	1.12	1.4	0.92	0.71	1.2
dTypeMDS	0.58	0.44	0.75	0.88	0.78	1.0	0.78	0.58	1.0
sexP	1.16	0.99	1.36	NA	NA	NA	NA	NA	NA

	g=1: PM	LL	UL	g=2: PM	LL	UL	g=3: PM	LL	UL
log-Normal: mu	8.2	8.0	8.4	6.293	6.244	6.335	6.5	6.4	6.7
log-Normal: sigmaSq	7.2	6.4	8.0	0.013	0.005	0.033	1.7	1.5	2.0

Number of chains:	3
Number of scans:	50000
Thinning:	50
Percentage of burnin:	50%

	Estimate	SD	LL	UL
sexP	−0.19	0.09	0.68	0.99
sexP	−0.04	0.10	0.78	1.16
sexP	−0.08	0.11	0.74	1.14

	g=1	g=2	g=3
mu	1	1.2	1
sigmaSq	1	1.1	1

	g=1: PM	LL	UL	g=2: PM	LL	UL	g=3: PM	LL	UL
log-Normal: mu	8.3	8.2	8.6	6.4	6.38	6.5	6.1	5.9	6.2
log-Normal: sigmaSq	10.1	9.2	11.8	1.1	0.82	1.7	1.9	1.6	2.5
