Published in final edited form as: Commun Stat Simul Comput. 2020 Feb 7;51(7):3851–3867. doi: 10.1080/03610918.2020.1722838

Simulating survival data with predefined censoring rates under a mixture of non-informative right censoring schemes

Fei Wan 1
PMCID: PMC12922656  NIHMSID: NIHMS2136681  PMID: 41727356

Abstract

Simulation studies have been routinely used to validate the performance of statistical methods for censored survival data under various scenarios. Our previous work proposed an integrated approach for simulating right-censored survival data for proportional hazards models given a set of arbitrarily distributed baseline covariates and predefined censoring rates. However, that work assumed that all study subjects are enrolled at the same time and that there is no study ending time. We extend the previous work to accommodate the more realistic scenario in which study subjects are enrolled at a constant rate during an enrollment period and are then followed until one of the following events occurs: (a) the event of interest (e.g., death or occurrence of disease); (b) the end of the study period; (c) early withdrawal due to a random censoring event, whichever comes first. To demonstrate the application of the proposed approach in practice, we generated censored survival data and assessed the impact of several factors (the magnitude of confounding, the size of the treatment effect, the sine distance between coefficient vectors of confounders in the treatment and outcome models, and the censoring rate) on the potential bias of propensity score matching estimators in estimating conditional and marginal hazards ratios.

Keywords: Survival analysis, administrative censoring, non-informative right censoring, proportional hazards model, propensity score matching, bias

1. Introduction

Clinical researchers often encounter time to event outcomes in medical research. In a typical cohort study, patients enter the study during a fixed length enrollment period and are then followed to record their time until an event of interest occurs (i.e., death, occurrence of cancer, etc.), or until a censoring event occurs. Right censoring is the most common censoring mechanism in clinical studies, in which the individual’s time to event is greater than his or her censoring time.

There are two common types of right censoring in clinical research. Consider, for example, a randomized trial designed to compare the effects of two competing therapies on cancer patients' survival. Patients are accrued over a fixed enrollment period (e.g., 2 years) and then followed for a fixed period of time (e.g., 3 years), so the whole study is completed in 5 years. The first type is administrative censoring: the event is observed only if it occurs before the pre-specified study ending time. Because of time or cost considerations, the investigator terminates the study or reports the results before all subjects realize their events. The second type is censoring due to competing events or loss to follow-up before the study ends. Some study subjects may experience competing events that cause them to be removed from the study, or they may move away from the study location for reasons unrelated to the event of interest. In either case, their events of interest are not observable.

Simulation studies have been routinely used to validate the performance of proportional hazards (PH) models and other statistical methods for survival outcomes in the presence of censoring (Aalen, Cook, and Roysland 2015; Wu 2017). Our previous work (Wan 2017) presented an integrated approach for simulating survival data for PH models by simultaneously incorporating a baseline hazard function with a known distribution, a known censoring distribution, and a set of baseline covariates of arbitrary distributions. That approach numerically determines the value of the censoring parameter in a specified censoring distribution to achieve a predefined censoring proportion in the simulated survival data. Its limitations are that all subjects enter the study at the same time and there is no study ending time point. In this study, we improve the previous framework to accommodate the more realistic scenario in which study subjects enter the study at varying time points during a fixed enrollment period and are subject to both random right censoring during the study period and administrative censoring when the study ends.

This article is organized as follows. In Sec. 2, we lay out the notation, assumptions, and models for a typical cohort study with a survival outcome of interest. In Sec. 3, we present the general approach for simulating censored survival data with baseline covariates at a predetermined censoring rate, allowing for study subjects to enter the study at a constant rate during the enrollment period and to be subject to random censoring events during follow-up and administrative censoring when the study ends. In Sec. 4, we design cohort studies to verify that the censoring rates in the simulated survival data are close to their nominal levels. To demonstrate the usefulness of the proposed method in practice, in Sec. 5 we design a complex simulation study to assess the bias of propensity score (PS) matching estimators for survival outcomes at different censoring rates. We conclude in Sec. 6.

2. Notation, assumptions, and models

Suppose that in a cohort study (Figure 1) we enroll a total of n patients. The calendar starting and ending times of the study are predetermined by the investigator. We define l to be the total calendar time of the study from start to end, and a to be the length of the accrual period. Thus, every patient has a minimal follow-up period of length l − a. Each subject enters the study randomly, at a constant rate, during the enrollment period [0, a], and is then followed until one of the following occurs: (i) the event of interest; (ii) the end of the study; (iii) early withdrawal due to a random censoring event, whichever happens first.

Figure 1.

The study subjects are enrolled in the period [0, a]. The follow-up period runs from the entry time point E_i to the study ending time point l.

We let X_{i,j} denote the jth baseline covariate of the ith subject and X_i = (X_{i,1}, X_{i,2}, …, X_{i,p}) denote the 1×p vector of baseline covariates for the ith subject, where i = 1, 2, …, n and j = 1, 2, …, p. Next, let T_i represent the time to the event of interest for the ith subject and t denote a realized value of the event time. The event times of the sample units are assumed to be independently and identically distributed (i.i.d.). The hazard function for subject i is given by the following multiplicative risk model

h(t | X_i) = h_0(t) exp(X_i β),  t > 0, (1)

where h_0(t) is a nonnegative baseline hazard function and β = (β_1, β_2, …, β_p)^T is the corresponding p×1 vector of regression coefficients. The covariate component in model (1), exp(X_i β), characterizes how the covariates influence the hazard function.

We assume h_0(t) is from a Weibull(α, ν) distribution with density function

f(t) = (α t^{α−1} / ν^α) exp(−(t/ν)^α),  t > 0,

where α > 0 is the shape parameter and ν > 0 is the scale parameter of the distribution. Thus, we have

h_0(t) = (α / ν^α) t^{α−1}.

It follows that the hazard function h(t | X_i) can be specified as

h(t | X_i) = (α / ν^α) t^{α−1} exp(X_i β) = (α / (ν exp(−X_i β/α))^α) t^{α−1} = (α / λ_i^α) t^{α−1},  t > 0,

where λ_i = exp(X̃_i β̃) > 0, X̃_i = (1, X_i), and β̃ = ((log ν^α)/α, −β^T/α)^T. Thus, the event time for individual i given a set of baseline covariates follows a Weibull(α, λ_i) distribution. The survival function of T_i given X_i is

S(t | X_i) = exp(−(t/λ_i)^α). (2)

In realistic settings, baseline covariates usually come from different distributions (normal, Bernoulli, Poisson, etc.). It is difficult to derive the joint probability density function f(x) of the high-dimensional covariate vector X_i, either analytically or numerically. Instead of working with X_i and f(x) directly, it is much simpler to work with the single variable λ_i and its density function f_λ(λ).
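Because T_i | X_i ~ Weibull(α, λ_i), data generation ultimately only requires λ_i. The following Python sketch (illustrative parameter values; the paper's supplemental code is in R) computes λ_i from mixed covariates and draws event times by inverse-transform sampling from S(t | X_i) = exp(−(t/λ_i)^α):

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(0)
n = 100_000
alpha, nu = 1.5, 2.0                               # Weibull shape and scale (illustrative)
beta = np.array([0.2, -0.2, 0.1, -0.1]) * alpha    # illustrative covariate coefficients

# Baseline covariates from arbitrary distributions
X = np.column_stack([rng.normal(0, 1, n),
                     rng.uniform(0, 1, n),
                     rng.binomial(1, 0.5, n),
                     rng.poisson(5, n)])

# lambda_i = nu * exp(-X_i beta / alpha), so that T_i | X_i ~ Weibull(alpha, lambda_i)
lam = nu * np.exp(-(X @ beta) / alpha)

# Inverse-transform sampling: if U ~ Uniform(0, 1), then T = lam * (-log U)^(1/alpha)
U = rng.uniform(size=n)
T = lam * (-np.log(U)) ** (1 / alpha)

# Sanity check: E[T | lambda] = lambda * Gamma(1 + 1/alpha)
expected_mean = lam.mean() * gamma(1 + 1 / alpha)
```

A quick check that the empirical survival fraction at any t matches the average of S(t | X_i) over subjects confirms the conditional Weibull structure.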

Because of the constant enrollment rate assumption, the time to entry for patient i, denoted by Ei, is

E_i ~ Uniform(0, a),

and the time from the entry to the study’s ending time point for individual i is defined as follows:

C_{1i} = l − E_i.

We can interpret C_{1i} as the time to the administrative censoring event. It follows that

C_{1i} ~ Uniform(l − a, l).

Let f_{c1}(c1) denote the probability density function of C_{1i}. That is,

f_{c1}(c1) = 1/a for l − a ≤ c1 ≤ l, and 0 otherwise.

Let C_{ri} denote the time to the occurrence of a random right censoring event (e.g., loss to follow-up, or censoring due to death from causes unrelated to the event of interest). We assume that the censoring times of all sample units are i.i.d. Let f_{cr}(cr | θ) denote the density function of C_{ri}, where θ is a censoring parameter. Common choices of censoring distribution include the uniform distribution Uniform(0, θ) and the Weibull distribution Weibull(k, θ).

We let Y_i = min(T_i, C_{ri}, C_{1i}) be the observed follow-up time and δ_i = I(T_i < min(C_{ri}, C_{1i})) be the censoring indicator, with δ_i = 1 if individual i experiences the event of interest and δ_i = 0 if this individual is censored, either by a random right censoring event during the study period or by administrative censoring. The administrative censoring time C_{1i}, the random right censoring time C_{ri}, and the event time T_i are assumed to be mutually independent; that is, the censoring is non-informative. The important notation is summarized in the nomenclature at the end of this paper.
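These definitions map directly to code. A minimal numpy sketch (all parameter values illustrative) of assembling the observed data (Y_i, δ_i) from simulated T_i, C_{ri}, and C_{1i}:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
l, a = 12.0, 2.0                  # total study length and accrual period (illustrative)
alpha_t, alpha_c = 1.5, 1.2       # Weibull shapes for event and censoring times
lam, theta = 3.0, 5.0             # illustrative scale parameters

E = rng.uniform(0, a, n)                 # entry times, E_i ~ Uniform(0, a)
C1 = l - E                               # administrative censoring, C_1i ~ Uniform(l - a, l)
Cr = theta * rng.weibull(alpha_c, n)     # random right-censoring times
T = lam * rng.weibull(alpha_t, n)        # event times

Y = np.minimum.reduce([T, Cr, C1])             # observed follow-up time
delta = (T < np.minimum(Cr, C1)).astype(int)   # 1 if the event is observed
```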

3. The general framework

We aim to generate censored survival data in which the relationship between baseline covariates and the time to event outcome can be described by a proportional hazards (PH) model. In particular, the proportion of censored subjects in this data needs to be equal to the pre-determined nominal level. For this purpose, we follow a general procedure consisting of the following three steps: (1) derive the conditional censoring probability for each individual given a set of baseline covariates and censoring parameter; (2) derive the censoring rate function for the study population by marginalizing baseline covariates out from the conditional censoring probability function. This censoring rate function is a function of censoring parameter only; and (3) set the censoring rate function equal to a given value of censoring proportion and solve the equation for the corresponding value of censoring parameter.

3.1. Derivation of the individual censoring probability P(δ_i = 0 | λ_i, θ)

An individual's event of interest is observed if his or her event time is less than both censoring times. This comprises two scenarios: C_{ri} can occur before or after C_{1i}. Thus, the event of interest is observed when T_i < C_{ri} < C_{1i} or T_i < C_{1i} < C_{ri}, where T_i > 0, C_{ri} > 0, and l − a ≤ C_{1i} ≤ l. We let P(δ_i = 1 | X_i, θ) denote the conditional probability of having an event for the ith individual given the baseline covariates X_i and the censoring parameter θ, during a study of length l after this individual enters the study within the enrollment period a. For simplicity, we work with λ_i instead of X_i. Equivalently, we have

P(δ_i = 1 | λ_i, θ) = P(0 < T_i < C_{ri}, 0 < C_{ri} < C_{1i}, l − a < C_{1i} < l) + P(0 < T_i < C_{1i}, C_{1i} < C_{ri}, l − a < C_{1i} < l)
= ∫_{l−a}^{l} ∫_0^{c1} ∫_0^{cr} f_{c1}(c1 | a) f_{cr}(cr | θ) f(t | X_i) dt dcr dc1 + ∫_{l−a}^{l} ∫_{c1}^{∞} ∫_0^{c1} f_{cr}(cr | θ) f_{c1}(c1 | a) f(t | X_i) dt dcr dc1
= ∫_{l−a}^{l} ∫_0^{c1} (1/a) f_{cr}(cr | θ) [1 − exp(−(cr/λ_i)^α)] dcr dc1 + ∫_{l−a}^{l} ∫_{c1}^{∞} [1 − exp(−(c1/λ_i)^α)] (1/a) f_{cr}(cr | θ) dcr dc1. (3)

We need to specify f_{cr}(cr | θ) in Eq. (3). Common distributions for C_{ri} include the Weibull and uniform distributions.

  • Scenario 1: Weibull censoring time. When the censoring time C_{ri} ~ Weibull(k, θ), we have

f_{cr}(cr | k, θ) = (k cr^{k−1} / θ^k) exp(−(cr/θ)^k),

where k > 0 is the shape parameter and θ > 0 is the scale parameter of the distribution. Study subjects are censored at varying rates during the study period after they enter the study. The conditional probability of experiencing an event of interest for individual i is:

P(δ_i = 1 | λ_i, θ) = ∫_{l−a}^{l} ∫_0^{c1} (1/a) (k cr^{k−1}/θ^k) exp(−(cr/θ)^k) [1 − exp(−(cr/λ_i)^α)] dcr dc1 + ∫_{l−a}^{l} ∫_{c1}^{∞} [1 − exp(−(c1/λ_i)^α)] (1/a) (k cr^{k−1}/θ^k) exp(−(cr/θ)^k) dcr dc1
= ∫_{l−a}^{l} ∫_0^{c1} (1/a) (k cr^{k−1}/θ^k) exp(−(cr/θ)^k) [1 − exp(−(cr/λ_i)^α)] dcr dc1 + ∫_{l−a}^{l} [1 − exp(−(c1/λ_i)^α)] (1/a) exp(−(c1/θ)^k) dc1.
  • Scenario 2: Uniform censoring time. When C_{ri} ~ Uniform(0, θ), the density function of C_{ri} is

f_{cr}(cr | θ) = 1/θ,  0 < cr < θ,

and study subjects are censored at a constant rate during the time interval (0, θ). However, we have to determine θ in two different ways for a given censoring rate, depending on whether θ is larger than l − a. When θ is less than l − a, C_{ri} is always less than C_{1i}, so censoring by administrative censoring is impossible. Then, the conditional probability of having an event for individual i is simply (Wan 2017)

P(δ_i = 1 | λ_i, θ) = 1 − (λ_i / (αθ)) γ(1/α, (θ/λ_i)^α), (4)

where the lower incomplete gamma function γ(k, x) is

γ(k, x) = ∫_0^x t^{k−1} e^{−t} dt.

When θ is greater than l − a, the conditional probability of having an event for individual i based on Eq. (3) is

P(δ_i = 1 | λ_i, θ) = ∫_{l−a}^{l} ∫_0^{c1} (1/(aθ)) [1 − exp(−(cr/λ_i)^α)] dcr dc1 + ∫_{l−a}^{l} ∫_{c1}^{θ} [1 − exp(−(c1/λ_i)^α)] (1/(aθ)) dcr dc1
= (1/(2aθ)) [l² − (l − a)²] − (1/(aθ)) ∫_{l−a}^{l} (λ_i/α) γ(1/α, (c1/λ_i)^α) dc1 + ∫_{l−a}^{l} (1/(aθ)) [1 − exp(−(c1/λ_i)^α)] (θ − c1) dc1.

The conditional probability of being censored for individual i given λ_i is

P(δ_i = 0 | λ_i, θ) = 1 − P(δ_i = 1 | λ_i, θ). (5)

To derive the censoring rate in the entire study population, the individual-specific component λ_i needs to be integrated out of P(δ_i = 0 | λ_i, θ).
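The probability in Eq. (3) is straightforward to evaluate numerically. The sketch below (Python with scipy rather than the paper's R; parameter values are illustrative) computes P(δ_i = 1 | λ_i, θ) under Weibull censoring, using the survival function P(C_{ri} > c1) = exp(−(c1/θ)^k) for the second term, and checks the quadrature result against a Monte Carlo estimate:

```python
import numpy as np
from scipy.integrate import dblquad, quad

# Illustrative parameter values
l, a = 12.0, 2.0              # study length and accrual period
alpha_t, alpha_c = 1.5, 1.2   # event-time and censoring Weibull shapes
lam, theta = 2.0, 3.0         # event-time and censoring scales

def f_cr(cr):
    """Weibull(alpha_c, theta) censoring density."""
    return (alpha_c / theta) * (cr / theta) ** (alpha_c - 1) * np.exp(-(cr / theta) ** alpha_c)

def F_T(t):
    """Event-time CDF, 1 - S(t | lambda_i)."""
    return 1 - np.exp(-(t / lam) ** alpha_t)

# First term of Eq. (3): T < Cr < C1 (dblquad integrates f(cr, c1) over cr in [0, c1])
term1 = dblquad(lambda cr, c1: (1 / a) * f_cr(cr) * F_T(cr),
                l - a, l, 0, lambda c1: c1)[0]
# Second term: T < C1 < Cr, with the inner Cr-integral done analytically
term2 = quad(lambda c1: (1 / a) * np.exp(-(c1 / theta) ** alpha_c) * F_T(c1),
             l - a, l)[0]
p_event = term1 + term2

# Monte Carlo check of the same probability
rng = np.random.default_rng(2)
n = 400_000
C1 = l - rng.uniform(0, a, n)
Cr = theta * rng.weibull(alpha_c, n)
T = lam * rng.weibull(alpha_t, n)
p_mc = (T < np.minimum(Cr, C1)).mean()
```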

3.2. Derivation of the censoring rate P(δ_i = 0 | θ)

We obtain the censoring rate in the study population by taking the expectation of P(δ_i = 0 | λ_i, θ) with respect to λ_i:

P(δ_i = 0 | θ) = E_{λ_i}[P(δ_i = 0 | λ_i, θ)] = ∫_D P(δ_i = 0 | u, θ) f_λ(u) du, (6)

where D denotes the domain of λi. It is difficult to derive the exact probability density function of λi analytically. However, we can estimate the density function fλλ using the kernel density estimation (KDE) method. KDE is a non-parametric method to estimate the probability density function of a random variable for a given dataset.

Suppose we have a univariate i.i.d. random sample λ_1, λ_2, …, λ_n drawn from some unknown probability density function f_λ(λ), and we are interested in the shape of this density function. A kernel density estimator of f_λ(λ) is

f̂_λ(λ) = (1/(nh)) Σ_{i=1}^{n} K((λ − λ_i)/h),

where h > 0 is a smoothing parameter called the bandwidth and K is the kernel function; (1/h) K((λ − λ_i)/h) is called the scaled kernel.

The kernel function is a symmetric, non-negative, real-valued function that integrates to 1. In this context the kernel acts as a weighting function, weighting each data point λ_i according to its distance from λ. A common choice is the Gaussian kernel,

K(λ) = (1/√(2π)) e^{−λ²/2}.

The kernel estimator f̂_λ(λ) is a biased estimator of the density f_λ(λ). A large value of h results in larger bias and smaller variance; a smaller h results in smaller bias and larger variance. There is always a tradeoff between the bias of the kernel density estimator and its variance when choosing the bandwidth. A recommended rule of thumb for bandwidth selection (Venables and Ripley 2002) is

ĥ = 1.06 min(σ̂, IQR/1.34) n^{−1/5},

where IQR is the interquartile range, computed as the difference between the 75th and 25th percentiles, and σ̂ is the sample standard deviation.
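A direct implementation of this estimator with the rule-of-thumb bandwidth takes only a few lines; the Python sketch below is comparable to what R's density() produces with the bw.nrd bandwidth rule (the sample and grid are illustrative):

```python
import numpy as np

def kde_gaussian(data, grid):
    """Gaussian-kernel density estimate with h = 1.06 * min(sd, IQR/1.34) * n^(-1/5)."""
    n = data.size
    q75, q25 = np.percentile(data, [75, 25])
    h = 1.06 * min(data.std(ddof=1), (q75 - q25) / 1.34) * n ** (-0.2)
    # f_hat(x) = (1/(n h)) * sum_i K((x - x_i) / h), with K the standard normal density
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
sample = rng.normal(0, 1, 5000)
grid = np.linspace(-5, 5, 401)
f_hat = kde_gaussian(sample, grid)
area = f_hat.sum() * (grid[1] - grid[0])   # estimated density should integrate to about 1
```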

3.3. Numerical solution of the censoring parameter θ for a predefined proportion

We set up a function γ(θ | p) based on Eq. (6):

γ(θ | p) = P(δ_i = 0 | θ) − p = ∫_0^{+∞} P(δ_i = 0 | u, θ) f_λ(u) du − p. (7)

For each combination of individual censoring probability P(δ_i = 0 | λ_i, θ) and density function f_λ(u), we solve γ(θ | p) = 0 for the θ that yields the desired censoring proportion p. This equation cannot be solved explicitly. Instead, we use numerical integration to compute the integral in Eq. (7) and then use the Brent-Dekker root-finding algorithm, implemented in the R function uniroot, to find the solution for θ.
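The three steps can be illustrated end to end in the simplest setting, uniform censoring with θ < l − a, where Eq. (4) applies and administrative censoring never binds. In this Python sketch (the λ distribution is illustrative, and the marginalization in Eq. (6) is done by averaging over a simulated λ sample rather than by KDE plus quadrature), a Brent-type root finder recovers the θ that achieves a 30% censoring rate, which is then verified by simulation:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gammainc, gamma

alpha = 1.5                                           # Weibull shape of the event time
rng = np.random.default_rng(4)
lam = 2.0 * np.exp(-0.3 * rng.normal(size=50_000))    # simulated lambda_i sample (illustrative)

def censoring_rate(theta):
    # P(delta=0 | lambda, theta) = (lambda/(alpha*theta)) * incgamma(1/alpha, (theta/lambda)^alpha)
    # for Uniform(0, theta) censoring; scipy's gammainc is regularized, hence the Gamma factor.
    g = gammainc(1 / alpha, (theta / lam) ** alpha) * gamma(1 / alpha)
    return np.mean(lam / (alpha * theta) * g)

p = 0.30                                              # target censoring proportion
theta_hat = brentq(lambda t: censoring_rate(t) - p, 0.01, 100.0)

# Verify the achieved censoring rate by simulation
T = lam * rng.weibull(alpha, lam.size)
C = rng.uniform(0, theta_hat, lam.size)
observed_rate = (C < T).mean()
```

The censoring rate is monotone in θ (near 1 for tiny θ, near 0 for huge θ), so the bracket [0.01, 100] is safe for this λ distribution.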

4. Case study

In this section, we design a hypothetical cohort study with survival outcomes to validate the proposed algorithm. Suppose a 10-year cohort study enrolls patients in the first two years, and each patient is followed until one of the following events occurs: (i) the event of interest; (ii) a competing censoring event or loss to follow-up; or (iii) the end of the study period, whichever comes first. The minimal and maximal follow-up times are thus 8 and 10 years, respectively. Details of the simulation include:

  1. The entry time for each subject follows a Uniform(0, 2) distribution. Four independent baseline variables were generated for each subject: X1 ~ Normal(0, 1), X2 ~ Uniform(0, 1), X3 ~ Bernoulli(0.5), and X4 ~ Poisson(5).

  2. We assume the baseline hazard function of the event time is from Weibull(α, ν), i.e., h_0(t) = (α/ν^α) t^{α−1}. The shape parameter α was set at 0.5, 1, 1.5 to represent decreasing, constant, and increasing hazards, respectively. The scale parameter ν was set at 2, so that β_0 = log ν^α = 0.347, 0.693, 1.040, respectively. The regression coefficients for X_i were set as (β_1, β_2, β_3, β_4) = (0.2, −0.2, 0.1, −0.1) × α. The event time for individual i was generated from Weibull(α, λ_i) with λ_i = exp((β_0 − X_i β)/α), so that the underlying adjusted hazard function follows the form of the PH model defined by Eq. (1).

  3. The administrative censoring time C_{1i} ~ Uniform(8, 12). The time to random censoring events C_{ri} was generated under two scenarios: (i) study subjects were censored at higher rates at later time points, i.e., C_{ri} ~ Weibull(1.2, θ); (ii) subjects were censored at a constant rate, i.e., C_{ri} ~ Uniform(0, θ). We searched for the θ that yields total censoring rates of 30%, 50%, and 70% in the simulated data.

For each scenario, we solved for θ at the given censoring rate and used it to simulate 1000 datasets of sample size n = 10000 each. We then averaged the 1000 sample censoring rates to estimate the true censoring proportion.

It is worth noting that we need to check the range of λ in the simulated data in order to compute the numerical integration in Eq. (7) precisely; e.g., when α = 0.5, λ ∈ (0, 10), and when α = 1.5, λ ∈ (0, 16) (sample R code is in the appendix). The results are reported in Table 1. When the censoring distribution is uniform, the censoring parameter θ was determined using the two different approaches laid out in Sec. 3, depending on whether the value of θ is larger or smaller than the length of follow-up for the specified censoring rate. The sample censoring proportions based on the computed censoring parameters are very close to their nominal levels.

Table 1.

A comparison of sample censoring rates and nominal censoring rates.

Survival time	Censoring time	θ	Nominal censoring proportion	Sample censoring proportion
Weibull(0.5, λ)	Uniform(0, θ)	15.329	30%	30.04%
		4.094	50%	50.05%
		1.021	70%	70.04%
	Weibull(1.2, θ)	9.333	30%	29.98%
		2.444	50%	49.99%
		0.594	70%	70.04%
Weibull(1, λ)	Uniform(0, θ)	31.343	30%	30.05%
		9.126	50%	50.09%
		4.256	70%	70.08%
	Weibull(1.2, θ)	7.465	30%	29.90%
		3.307	50%	49.98%
		1.464	70%	69.98%
Weibull(1.5, λ)	Uniform(0, θ)	10.881	30%	30.03%
		5.937	50%	49.97%
		3.363	70%	70.01%
	Weibull(1.2, θ)	7.220	30%	29.94%
		3.636	50%	50.01%
		1.896	70%	70.03%

5. Application: bias of propensity score matching estimator

In this section, we design a complex simulation study to assess the impact of various factors (the size of the treatment effect, the similarity between the coefficients of the confounding variables in the treatment and outcome models, the magnitude of confounding, the censoring rate, etc.) on the potential bias of propensity score matching estimators for estimating conditional and marginal hazard ratios. We need to generate censored survival data with pre-specified censoring rates under hundreds of different scenarios, and it is impossible to do so by tuning the censoring parameter manually; with our approach, however, this can be done easily. Through this demonstration, we show the usefulness of our approach in real simulation studies. Propensity score matching is a very common tool that researchers use to design observational studies that mimic randomized trials. However, some statistical methods commonly used in randomized trials or conventional matched designs may not work in a PS matched design as we would expect. Specifically, whether these methods estimate marginal or conditional treatment effects remains unclear among applied researchers. A conditional treatment effect is the average effect of treatment on the individual; a marginal treatment effect is the average effect of treatment on the population.

Austin (2013) and Austin et al. (2007) examined the following three methods: (1) an unadjusted Cox PH model including the binary treatment indicator only (the "naive" Cox model); (2) an unadjusted Cox PH model with a robust sandwich variance estimator (the "robust" Cox model); (3) an unadjusted Cox PH model stratified on the matched pairs. These studies showed that the naive Cox model is biased in estimating the conditional hazards ratio in non-censored survival data. Both the naive and robust methods appear to be consistent in estimating the marginal hazards ratio, but the stratified approach is biased. We design an extensive simulation study to re-assess the consistency of these three methods in estimating conditional and marginal hazards ratios in censored survival data. We also compare three variance estimators for the PS matching estimator of the marginal hazards ratio (the empirical variance estimator, the model-based variance estimator, and the robust sandwich estimator) to assess whether the robust sandwich estimator adequately captures the true variability of the PS matching estimator.

Suppose an 8-year-long prospective observational cohort study enrolls patients during a two-year period at a constant enrollment rate. Once entered into the study, each patient is followed until the event of interest occurs or until the end of the study. Each patient receives either treatment (D = 1) or control (D = 0) upon enrollment. The likelihood of receiving treatment depends on the four baseline characteristics X = (X1, X2, X3, X4) as follows:

logit P(D = 1 | X) = α_0 + Σ_{j=1}^{4} α_{1,j} X_j, (8)

where α_1 = (α_{1,1}, α_{1,2}, α_{1,3}, α_{1,4}) are the regression coefficients for X. The time to event outcome for each patient is also determined by X via the following PH model:

h(t | D, X) = h_0(t) exp(β_0 + β_1 D + Σ_{j=1}^{4} β_{2,j} X_j), (9)

where e^{β_1} is interpreted as the conditional hazards ratio, and β_2 = (β_{2,1}, β_{2,2}, β_{2,3}, β_{2,4}) is the vector of regression coefficients for X. The marginal hazards ratio is defined by

δ = log S_1(t) / log S_0(t),

where S_1(t) = P(T_1 > t) and T_1 is the potential event time when a patient receives treatment (D = 1), and S_0(t) = P(T_0 > t) where T_0 is the potential event time when a patient receives control (D = 0).

We used the same algorithm as in our previous studies (Wan and Mitra 2018; Wan, Small, and Mitra 2018; Wan 2019) to generate the simulation data:

  1. To generate β_2 = (β_{2,1}, β_{2,2}, β_{2,3}, β_{2,4}), the elements of the coefficient vector were first sampled randomly from {1, 2, 3, …, 9} and the vector was then normalized to unit length; the sign of each element was assigned by a Bernoulli(p = 0.5) draw. β_2 equals k times this normalized vector, where k determines the magnitude of confounding and was set to 0.3 and 1.2, representing low and high levels of confounding. We repeated the same procedure to generate α_1 = (α_{1,1}, α_{1,2}, α_{1,3}, α_{1,4}), but α_1 was set to 1 times its normalized vector.

  2. For each pair of β_2 and α_1, confounding variables X1, X2 ~ Bernoulli(p = 0.5) and X3, X4 ~ N(0, 1) were generated independently with sample size n = 2000. We standardized X1 and X2 to zero mean and unit standard deviation. The treatment variable D was generated from the treatment model (8), with the intercept α_0 set to −1.5 so that roughly 20% of simulated subjects received treatment. Next, the time to event T was generated using Eq. (9) with the baseline hazard from Weibull(1.5, 2). β_1 was set at {0, 0.3}, representing null and non-null treatment effects. The marginal hazards ratio δ was computed using the approach of Austin (2013). To generate censored scenarios, the administrative censoring time C_{1i} ~ Uniform(8, 12) and the random censoring time C_{ri} ~ Weibull(1.2, θ). We searched for the θ that yields total censoring rates of 30% and 50% in the simulated data.

  3. In each simulated dataset, we estimated the PS for every subject using a logistic regression model. A nearest-neighbor matching algorithm was used to match each treated subject with a control subject on the logit of the PS, without replacement, using a caliper of width 0.005. In the matched samples, we performed the following analyses: (1) the "naive" Cox model; (2) the "robust" Cox model; and (3) the stratified Cox model. This simulation process was repeated 1000 times. The treatment-effect estimates and the model-based standard errors for each method were computed in each simulated dataset.

  4. We sampled 10 coefficient pairs from each of five distance intervals {[0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], (0.8, 1]}. In summary, we examine the magnitude of confounding (k = 0.3, 1.2), the size of the conditional treatment effect (β_1 = 0, 0.3), the censoring level (0%, 30%, 50%), the distance intervals (5 levels), and 10 coefficient pairs in each interval. We repeated steps 1–3 for each of the 300 scenarios. It is worth noting that the censoring parameter θ needs to be computed for each of the 300 scenarios, which makes manual selection very difficult.
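For reference, the coefficient-vector generation in step 1 and a sine dissimilarity between the normalized treatment- and outcome-model coefficient vectors can be sketched as follows (Python; defining the metric as sin = sqrt(1 − cos²) of the angle between the two vectors is our reading of the paper's distance, so treat it as an assumption):

```python
import numpy as np

rng = np.random.default_rng(5)

def random_unit_coefs(p=4):
    """Step 1: magnitudes sampled from {1, ..., 9}, random signs, then normalized to unit length."""
    v = rng.integers(1, 10, size=p) * rng.choice([-1.0, 1.0], size=p)
    return v / np.linalg.norm(v)

def sine_distance(u, v):
    """Sine dissimilarity between coefficient directions: sqrt(1 - cos^2)."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.sqrt(max(0.0, 1.0 - cos ** 2))

beta2 = 0.3 * random_unit_coefs()    # outcome-model confounder coefficients, k = 0.3
alpha1 = 1.0 * random_unit_coefs()   # treatment-model confounder coefficients
d = sine_distance(beta2, alpha1)
```

The distance is 0 for proportional coefficient vectors and 1 for orthogonal ones, matching the five binning intervals above.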

As shown in Figure 2, the naive, robust, and stratified Cox models are all unbiased when the null hypothesis is true; the estimates center around zero in all scenarios. However, when there is a true treatment effect (Figure 3), all three models are biased in estimating the conditional hazards ratios. These biases increased with larger distance metrics and larger confounding effects. It is also worth noting that censored data were associated with smaller biases. Figure 4 reveals the pattern of bias in estimating marginal hazards ratios. In contrast, larger dissimilarity between the αs and βs was associated with smaller bias. The naive and robust Cox models were less biased than the stratified Cox model, larger confounding was associated with larger bias, and censored data tended to be associated with larger bias. We observe from Figure 5 that the robust variance estimates were closer to the empirical variance estimates than the model-based variance estimates were. This confirms the necessity of using robust variance methods in PS analyses.

Figure 2.

The simulation results under H0: β_1 = 0. The x-axis denotes the sine dissimilarity metric intervals; 5 denotes the most dissimilar interval (0.8, 1] and 1 denotes the interval [0, 0.2]. The y-axis denotes the estimates from the naive/robust Cox and stratified models. Red denotes a confounding effect of 0.3 and green a confounding effect of 1.2.

Figure 3.

The simulation results under Ha: β_1 = 0.3. The x-axis denotes the sine dissimilarity metric intervals; 5 denotes the most dissimilar interval (0.8, 1] and 1 denotes the interval [0, 0.2]. The y-axis denotes the estimates from the naive/robust Cox and stratified models. Red denotes a confounding effect of 0.3 and green a confounding effect of 1.2.

Figure 4.

The x-axis denotes the sine dissimilarity metric intervals; 5 denotes the most dissimilar interval (0.8, 1] and 1 denotes the interval [0, 0.2]. The y-axis denotes the difference between the averaged estimates and the log marginal hazards ratios. Red denotes a confounding effect of 0.3 and green a confounding effect of 1.2.

Figure 5.

X-axis denotes the absolute difference between the averaged robust variance estimates and the averaged empirical variance estimates. Y-axis denotes the absolute difference between the averaged model-based variance estimates and the averaged empirical variance estimates.

6. Conclusion

Simulation studies have long been used to assess the properties of statistical methods for time-to-event outcomes, and the censoring rate is often one factor under investigation. This article extends our general framework for simulating censored survival data to the more realistic setting in which study subjects are enrolled randomly during an accrual period and are then followed until one of the following occurs: the event of interest, early withdrawal due to a random censoring event, or the end of the study, whichever comes first. The approach relies on numerical integration and a root-finding algorithm to compute the value of the censoring parameter, so we avoid imposing specific distributional forms on the covariates. As demonstrated in our simulation study of PS matching estimators, the approach is particularly useful when censored survival data must be simulated for many different scenarios, because manual selection of the censoring parameter is then difficult. For simplicity and practicality, we made a constant-enrollment assumption, which could be violated in practice; if a reasonable distribution for the enrollment rate can be specified, the algorithm can be modified accordingly to accommodate non-constant enrollment settings. Our improved approach could be an important tool for the design of simulation studies with survival outcomes.

Supplementary Material

supplementalRcode

Nomenclature

X_i

the 1×p vector of baseline covariates for the ith subject

T_i

the time to the event of interest for the ith subject

E_i

the time to entry for patient i

C_{1i}

the time to the administrative censoring event

C_{ri}

the time to the occurrence of a random right censoring event

Y_i

the observed follow-up time

δ_i

the censoring indicator

α, ν

the shape and scale parameters of the Weibull distribution

β

the p×1 vector of regression coefficients in the proportional hazards model

λ

the exponential of the linear predictor in the proportional hazards model

A Sample R code


library("pracma")

## Draw a large sample of lambda_i = exp((beta0 - X_i beta)/alpha) (Sec. 4, alpha = 1.5)
n <- 1000000
u1 <- rnorm(n, 0, 1)
u2 <- runif(n, 0, 1)
u3 <- rbinom(n, 1, 0.5)
u4 <- rpois(n, 5)

alpha.t <- 1.5   # shape of the event-time (Weibull) distribution
alpha.c <- 1.2   # shape of the Weibull censoring distribution

a1 <- 0.2 * alpha.t
a2 <- -0.2 * alpha.t
a3 <- 0.1 * alpha.t
a4 <- -0.1 * alpha.t
a0 <- -1.040     # -log(nu^alpha) with nu = 2 and alpha = 1.5

k <- -(a0 + a1 * u1 + a2 * u2 + a3 * u3 + a4 * u4) / alpha.t
lambda <- exp(k)

### kernel smoothing estimate of f_lambda ###
dens <- density(lambda, n = 1500, bw = 0.01, from = 0, to = 16, na.rm = TRUE)
y.loess <- loess(y ~ x, data = data.frame(x = dens$x, y = dens$y), span = 0.1)

### nonparametric density estimate for f_lambda(lambda) ###
density.fun.lambda <- function(x) {
  predict(y.loess, newdata = data.frame(x = x))
}

### subject to a mixture of administrative and random right censoring ###

## integrand of the first term in Eq. (3): T < Cr < C1
f.fun1 <- function(c1, cr, arg2) {
  theta <- arg2[1]; lambda.i <- arg2[2]
  alpha.c <- arg2[3]; alpha.t <- arg2[4]; a <- arg2[5]
  (1 / a) * dweibull(cr, alpha.c, theta) * (1 - exp(-(cr / lambda.i)^alpha.t))
}

## integrand of the second term in Eq. (3): T < C1 < Cr,
## with the inner Cr-integral P(Cr > c1) = exp(-(c1/theta)^alpha.c) done analytically
f.fun2 <- function(c1, arg2) {
  theta <- arg2[1]; lambda.i <- arg2[2]
  alpha.c <- arg2[3]; alpha.t <- arg2[4]; a <- arg2[5]
  (1 / a) * exp(-(c1 / theta)^alpha.c) * (1 - exp(-(c1 / lambda.i)^alpha.t))
}

## censoring rate at theta minus the target proportion p, Eq. (7)
censor.prop <- function(theta, arg1) {
  p <- arg1[1]
  cen.P <- integrate(function(u) {
    sapply(u, function(u) {
      arg2 <- c(theta, u, arg1[-1])
      prob.i <- 1 - (integral2(f.fun1, c1Min1, c1Max1, crMin1, crMax1,
                               arg2 = arg2)$Q +
                     integral(f.fun2, c1Min2, c1Max2, arg2 = arg2,
                              reltol = 1e-10))
      prob.i * density.fun.lambda(u)
    })
  }, 0, 16)$value
  cen.P - p
}

p <- 0.3   # target censoring proportion
l <- 12    # total study length
a <- 2     # accrual period
crMin1 <- 0; crMax1 <- function(c1) c1
c1Min1 <- l - a; c1Max1 <- l
c1Min2 <- l - a; c1Max2 <- l

arg1 <- c(p, alpha.c, alpha.t, a)

### censoring parameter ###
theta <- uniroot(censor.prop, arg1 = arg1, c(0.01, 100), tol = 1e-8)$root

References

  1. Aalen OO, Cook RJ, and Roysland K. 2015. Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Analysis 21 (4):579–93. doi: 10.1007/s10985-015-9335-y.
  2. Austin PC. 2013. The performance of different propensity score methods for estimating marginal hazards ratios. Statistics in Medicine 32 (16):2837–49. doi: 10.1002/sim.5705.
  3. Austin PC, Grootendorst P, Normand SL, and Anderson GM. 2007. Conditioning on the propensity score can result in biased estimation of common measures of treatment effect: A Monte Carlo study. Statistics in Medicine 26 (4):754–68. doi: 10.1002/sim.2618.
  4. Venables WN, and Ripley BD. 2002. Modern applied statistics with S. 4th ed. New York, NY: Springer.
  5. Wan F. 2017. Simulating survival data with predefined censoring rates for proportional hazards models. Statistics in Medicine 36 (5):838–54. doi: 10.1002/sim.7178.
  6. Wan F. 2019. Matched or unmatched analyses with propensity-score-matched data? Statistics in Medicine 38 (2):289–300. doi: 10.1002/sim.7976.
  7. Wan F, and Mitra N. 2018. An evaluation of bias in propensity score adjusted non-linear regression models. Statistical Methods in Medical Research 27 (3):846–62. doi: 10.1177/0962280216643739.
  8. Wan F, Small D, and Mitra N. 2018. A general approach to evaluating the bias of 2-stage instrumental variable estimators for proportional hazards models. Statistics in Medicine 37 (12):1997–2015. doi: 10.1002/sim.7636.
  9. Wu JR. 2017. Single-arm phase II survival trial design under the proportional hazards model. Statistics in Biopharmaceutical Research 9 (1):25–34. doi: 10.1080/19466315.2016.1174147.
