SUMMARY
Patients undergoing renal transplantation are prone to graft failure, which causes loss of follow-up measurements of their blood urea nitrogen and serum creatinine levels. These two outcomes are measured repeatedly over time to assess renal function following transplantation. Loss of follow-up on these bivariate measures results in informative right censoring, a common problem in longitudinal data that should be adjusted for so that valid estimates are obtained. In this study, we propose a bivariate model that jointly models these two correlated longitudinal outcomes and generates population and individual slopes adjusted for informative right censoring using a discrete survival approach. The proposed approach is applied to a clinical dataset of patients who had undergone renal transplantation. A simulation study validates the effectiveness of the approach.
Keywords: Bivariate correlated outcomes, Discrete Survival distributions, Informative right censoring, Longitudinal data, Slope estimation
1. Introduction
Kidney transplantation is an increasingly used and desirable therapeutic option for patients with end-stage renal disease (Jahromi HA et al., 2006). Patients who have undergone renal transplantation are usually monitored over time to assess how well their kidneys are functioning. Typically, patients’ blood urea nitrogen (BUN) and serum creatinine (Cr) are measured repeatedly over time to evaluate kidney function. A common complication of kidney transplantation is graft failure due to renal rejection, which causes patients to go back on dialysis (Meier-Kriesche HU et al., 2000). Once a patient is back on dialysis, the kidneys do not function, levels of BUN and Cr are no longer meaningful, and they are therefore no longer recorded. Thus, patients with graft failure will not have a complete set of measurements.
Because dropout depends on graft failure, informative right censoring is introduced into these data. This type of censoring is a common problem in longitudinal data involving repeated measures over time. Informative censoring, or data missing not at random, occurs when the missing data mechanism depends on the unobserved values of the outcome even after accounting for the fully observed values (Imai, 2009). It also arises when the censoring probability for each individual depends on the underlying slope of the primary variable for that individual (Wu and Bailey, 1988). Ignoring informative right censoring in the analysis of such data and treating the missing values as missing at random will result in invalid slope estimates. This is a major issue that needs to be considered in the analysis of renal transplant data. Another important aspect is that the slopes for both outcomes, BUN and Cr, should be estimated concurrently. This is crucial since these two outcomes are correlated and both are used to assess kidney function. Relying on only one of them would not provide an accurate assessment of renal function. For instance, BUN levels are affected by diet and liver function, whereas blood creatinine levels are influenced by age and gender (James GD et al., 1988). Thus, to obtain valid slope estimates for this type of data, a statistical model that adjusts for informative right censoring and jointly models the slopes for the bivariate outcomes is required. Moreover, in the renal transplant context, interest lies not only in population slopes for the bivariate outcomes but also in slopes for each individual subject, so that kidney function for each patient can be assessed and followed over time. Accordingly, a proper statistical model for the renal transplant data should adjust for informative right censoring and generate population and individual slopes for the bivariate longitudinal outcomes, which in this context are BUN and Cr.
In longitudinal study designs, as in the case of the renal transplant dataset, it is common for more than one outcome to be measured for each subject over time, and informative right censoring is very likely to arise in this context as well. Moreover, in most cases these bivariate longitudinal outcomes tend to be correlated, and this correlation should be considered to help understand how the two outcomes are related and how this relationship changes over time. Thus, a model that adjusts for informative right censoring, jointly models the two outcomes, and accounts for the correlation between them is highly desirable so that valid estimates of the bivariate slopes can be obtained. For these reasons we were motivated to develop such an approach, so that it can be applied not only to the renal transplant data but also to any dataset where bivariate correlated outcomes are measured longitudinally with informative right censoring.
Few studies have attempted to model bivariate outcomes measured repeatedly over time in the context of informative right censoring. The reason may be the underlying complexity of the problem, since both the correlations between the repeated measures on the same outcome and the correlations between the bivariate outcomes must be accounted for (Gueorguieva, 2005). In addition, adjusting for informative right censoring adds another level of complexity. A univariate approach to estimating population and individual slopes was presented by Mori et al. (1994). In this model the number of observations for each subject was modeled and was assumed to follow a geometric distribution with mean dependent on the unknown individual slope. Fieuws and Verbeke (2006) proposed a method for the analysis of multivariate repeated measures that is based on maximum likelihood estimation for each pair of outcomes.
A bivariate approach that adjusts for left censoring and informative dropout was developed by Thiébaut et al. (2005), who proposed a joint multivariate normal model for the bivariate linear mixed model and a lognormal survival model for time to dropout. In our study we also propose a likelihood approach to address the problem of informative censoring in slope estimation for bivariate correlated longitudinal outcomes. The likelihood approach is based on a multivariate normal distribution for the longitudinal outcomes; it jointly models the slopes of the two correlated longitudinal outcomes and incorporates the correlation between them into a likelihood function. To account for informative right censoring, the number of observations for each individual subject is modeled and is assumed to follow a discrete survival distribution. The novelty of our approach is that we show the likelihood simplifies into a function of the ordinary least squares (OLS) slopes for each individual. Because the OLS slope is a linear combination of the longitudinal outcomes for a subject, we propose that when the number of longitudinal measurements on a subject is moderately large, multivariate normality of the underlying outcomes is not necessary. This follows from the central limit theorem: a linear combination of a large number of random variables is approximately normal (although the quality of the normal approximation depends on the distribution of the underlying random variables in the sum). In addition to showing that the likelihood reduces to a function of the OLS slopes, we show that even though, conditional on the random intercept and random slope, the drop-out model could depend on both the random intercept and the random slope, one only needs to specify the correct 'marginal' drop-out model (integrated over the random intercept), which is a function of the underlying unobserved random slope alone.
The proposed model is suitable for cases where interest is primarily in slope estimation and informative right censoring is the main problem that needs to be accounted for, along with the correlation between the outcomes. Here, we propose a discrete survival model in which different specifications of the discrete hazard can be incorporated. These include a logistic model, a discrete proportional hazards model, and the truncated geometric distribution. Thus the model has some flexibility to accommodate different types of discrete hazards. We applied our bivariate model to the renal transplant registry (Besenski et al., 2005) at the Medical University of South Carolina. Estimates of the population and individual slopes, adjusted for the correlation between the outcomes and for informative right censoring, are generated. Each individual patient is assumed to have at least two measurements so that the slopes can be estimated.
2. Bivariate Model
Consider a set of $i = 1, \ldots, n$ independent subjects with bivariate outcome data $(y_{i1k}, y_{i2k})$ to be recorded at $k = 1, \ldots, p$ predetermined times $(t_1, t_2, \ldots, t_p)$, which are the actual times of measurement and are assumed to be prespecified rather than random. The $p$ predetermined time points are the same for all individuals but are not necessarily equally spaced. Because of censoring (in our example, due to graft failure), the $i$th individual has $m_i$ observations, where $m_i \le p$, corresponding to measurement times $(t_1, t_2, \ldots, t_{m_i})$; we write $t_{ik}$ for the $k$th measurement time of subject $i$. We assume that $y_{i1k}$ and $y_{i2k}$ are both observed at all time points before dropout, i.e., $y_{i1k}$ and $y_{i2k}$ are either both measured or both missing. First, we describe the model for $(y_{i1k}, y_{i2k})$. The underlying model for $y_{i1k}$ and $y_{i2k}$ at time point $t_{ik}$ is assumed to be
$$y_{ijk} = \alpha_{ij} + \beta_{ij} t_{ik} + e_{ijk}, \tag{1}$$
for $j = 1, 2$, with random effects $\alpha_{ij}$ and $\beta_{ij}$, and random error $e_{ijk}$, which is assumed to be normal with mean zero and variance $\sigma_{ej}^2$. The random effects $\alpha_{ij}$, $\beta_{ij}$, and the errors $e_{ijk}$ are all assumed to be mutually independent. We note here that, in a model with informative censoring, the probability of being censored depends on unobserved data. In our informative censoring model, the probability of being censored depends on the unobserved random slopes $(\beta_{i1}, \beta_{i2})$, which we incorporate in the likelihood described later in this section.
In model (1), conditional on the random effects $\alpha_{ij}$ and $\beta_{ij}$, $y_{ijk}$ is normal with mean $\alpha_{ij} + \beta_{ij} t_{ik}$ and variance $\sigma_{ej}^2$. Further, conditional on the random effects $\alpha_{ij}$ and $\beta_{ij}$, $y_{i1k}$ and $y_{i2k}$ are assumed independent. Next, suppose we rewrite (1) as
$$y_{ijk} = \alpha_{ij}^{*} + \beta_{ij}\,(t_{ik} - \bar{t}_i) + e_{ijk}, \qquad \alpha_{ij}^{*} = \alpha_{ij} + \beta_{ij}\,\bar{t}_i, \tag{2}$$
where $\bar{t}_i$ is the sample mean of the observation times for subject $i$. Using this formulation of the model, the sample mean $\bar{y}_{ij}$ is the OLS estimate of $\alpha_{ij}^{*}$ and
$$b_{ij} = \frac{\sum_{k=1}^{m_i} (t_{ik} - \bar{t}_i)\, y_{ijk}}{\sum_{k=1}^{m_i} (t_{ik} - \bar{t}_i)^2} \tag{3}$$
is the OLS estimate of the slope $\beta_{ij}$. Further, $\bar{y}_{ij}$ and $b_{ij}$ are independent, $\bar{y}_{ij}$ is a sufficient statistic for $\alpha_{ij}^{*}$, and $b_{ij}$ is a sufficient statistic for $\beta_{ij}$. Since $\beta_{ij}$ is of main interest, and $\bar{y}_{ij}$ and $b_{ij}$ are independent and sufficient for $\alpha_{ij}^{*}$ and $\beta_{ij}$, respectively, all of the information about $\beta_{ij}$ is contained in $b_{ij}$ and the distribution of $\bar{y}_{ij}$ can be ignored.
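As a minimal sketch of the OLS quantities in (3), the snippet below computes the sample mean and the slope on centered time for a single hypothetical subject. The function name, the visit times, and the outcome values are illustrative assumptions, not registry data.

```python
from statistics import mean


def ols_slope_and_mean(times, y):
    """Return (ybar, b): the sample mean of y and the OLS slope on centered time."""
    t_bar = mean(times)
    centered = [t - t_bar for t in times]
    sxx = sum(c * c for c in centered)  # sum of squared centered times
    b = sum(c * yk for c, yk in zip(centered, y)) / sxx
    return mean(y), b


# Hypothetical subject observed at 4 visits before dropout (assumed values).
times = [0, 1, 3, 6]
y = [2.0, 1.9, 1.7, 1.4]  # lies exactly on the line y = 2.0 - 0.1 * t
ybar, b = ols_slope_and_mean(times, y)
```

Because the hypothetical outcomes lie exactly on a line, `b` recovers the slope of that line, and `ybar` equals the line evaluated at the mean observation time, which is the intercept of the centered model.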
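Then, conditional on the individual random slopes for the first and second outcomes, denoted $\beta_i = (\beta_{i1}, \beta_{i2})'$, their corresponding OLS estimates, denoted $b_{i,ols} = (b_{i1}, b_{i2})'$, are assumed to be independent and to follow a bivariate normal distribution:
$$\begin{pmatrix} b_{i1} \\ b_{i2} \end{pmatrix} \Bigm| \beta_i \;\sim\; N\!\left( \begin{pmatrix} \beta_{i1} \\ \beta_{i2} \end{pmatrix},\; \begin{pmatrix} \sigma_{b_{i1}}^2 & 0 \\ 0 & \sigma_{b_{i2}}^2 \end{pmatrix} \right), \tag{4}$$
where
$$\sigma_{b_{ij}}^2 = \frac{\sigma_{ej}^2}{\sum_{k=1}^{m_i} (t_{ik} - \bar{t}_i)^2}, \tag{5}$$
with $\sigma_{ej}^2$ the error variance of outcome $j$. Since the distribution of $b_{ij}$ is a function of $\sigma_{ej}^2$, we need to extract information about $\sigma_{ej}^2$ from the $y_{ijk}$'s. The OLS estimate of $\sigma_{ej}^2$ is
$$\hat{\sigma}_{ej}^2 = \frac{\sum_{i=1}^{n} \sum_{k=1}^{m_i} \left\{ y_{ijk} - \bar{y}_{ij} - b_{ij}\,(t_{ik} - \bar{t}_i) \right\}^2}{\sum_{i=1}^{n} (m_i - 2)}. \tag{6}$$
Here, $\hat{\sigma}_{ej}^2$ is a sufficient statistic for $\sigma_{ej}^2$, and is also independent of $\bar{y}_{ij}$ and $b_{ij}$. Thus, all of the information about $\beta_{ij}$ is contained in the sufficient statistics $b_{ij}$ and $\hat{\sigma}_{ej}^2$, and the distribution of $\bar{y}_{ij}$ can be ignored. Further, $\sum_{i}(m_i - 2)\,\hat{\sigma}_{ej}^2 / \sigma_{ej}^2$ is distributed as a chi-square with $\sum_{i}(m_i - 2)$ degrees of freedom. We note that these estimates will be the estimates of $\sigma_{ej}^2$ that result from the likelihood discussed below. As for the normality assumption, even if the underlying $y_{ijk}$'s are not normal, the above properties will hold approximately as long as $m_i$ and $n$ are moderately large.
Our interest is in estimating the bivariate population slopes $(\beta_1, \beta_2)$ and predicting the bivariate individual random effects $(\beta_{i1}, \beta_{i2})$, where $(\beta_{i1}, \beta_{i2})$ are assumed to be correlated with covariance $\sigma_{\beta_{12}}$ and to follow a bivariate normal distribution:
$$\begin{pmatrix} \beta_{i1} \\ \beta_{i2} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},\; \begin{pmatrix} \sigma_{\beta_1}^2 & \sigma_{\beta_{12}} \\ \sigma_{\beta_{12}} & \sigma_{\beta_2}^2 \end{pmatrix} \right). \tag{7}$$
Because all of the information in the data about $\beta_i$ and $\sigma_{ej}^2$ is contained in the sufficient statistics $b_{i,ols}$ and $\hat{\sigma}_{ej}^2$, we use these in the informative censoring likelihood.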
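With informative censoring, conditional on $(\beta_{i1}, \beta_{i2})$, the likelihood is made up of three independent pieces. In particular, the likelihood is a function of the following:
1) the number of observations $m_i$, having discrete censoring distribution
$$f(m_i \mid \beta_{i1}, \beta_{i2}) = h_{i m_i}^{R_i} \prod_{k=1}^{m_i - 1} (1 - h_{ik}), \tag{8}$$
where $h_{ik}$ is the discrete hazard (the probability of dropping out in interval $k$ given still being in the study in the previous interval), $m_i = 1, \ldots, p+1$, and $R_i = 0$ if the patient was seen at the last follow-up time ($m_i = p+1$) and $R_i = 1$ if the patient dropped out at or before the last follow-up time;
2) the OLS slopes $b_{i,ols} = (b_{i1}, b_{i2})'$, which are independent and normal given the random slopes,
$$b_{ij} \mid \beta_{ij} \sim N\!\left(\beta_{ij},\, \sigma_{b_{ij}}^2\right), \quad j = 1, 2; \tag{9}$$
3) the pooled error variance estimates $\hat{\sigma}_{ej}^2$, for which
$$\frac{\nu\, \hat{\sigma}_{ej}^2}{\sigma_{ej}^2} \sim \chi_{\nu}^2, \qquad \nu = \sum_{i=1}^{n} (m_i - 2). \tag{10}$$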
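Assumption 1) states that the number of observations for each individual follows a discrete survival (censoring) distribution with discrete hazard dependent on the individual slopes $\beta_{i1}$ and $\beta_{i2}$. The intercept $\gamma_0$ (or interval-specific intercepts $\gamma_k$) and the coefficients $\gamma_{11}$ and $\gamma_{12}$ are the parameters of the censoring distribution. This discrete censoring distribution can be right-truncated, since the study may terminate before the withdrawal of all the subjects has been observed. To account for right truncation, the discrete survival distribution was modified by introducing the indicator variable $R_i$ (Mori et al., 1994), which takes the value 0 if censoring did not occur and 1 otherwise. Possible specifications of the discrete hazard include a logistic model,
$$h_{ik} = \frac{\exp(\gamma_k + \gamma_{11}\beta_{i1} + \gamma_{12}\beta_{i2})}{1 + \exp(\gamma_k + \gamma_{11}\beta_{i1} + \gamma_{12}\beta_{i2})}, \tag{11}$$
or the discrete proportional hazards model,
$$h_{ik} = 1 - \exp\{-\exp(\gamma_k + \gamma_{11}\beta_{i1} + \gamma_{12}\beta_{i2})\}. \tag{12}$$
A special case of the discrete hazard is the truncated geometric distribution, with
$$h_{ik} = h_i \ \text{ for all } k \quad (\text{that is, } \gamma_k = \gamma_0), \tag{13}$$
so that the hazard is constant across intervals.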
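The discrete survival distribution described above can be sketched numerically as follows. This is an illustrative sketch only: it assumes the standard logistic and complementary log-log forms for the two hazard models, with linear predictor intercept + γ11·βi1 + γ12·βi2, and the function names and parameter values are hypothetical choices rather than fitted quantities.

```python
import math


def logistic_hazard(intercept, g11, g12, b1, b2):
    """Discrete hazard under an (assumed) logistic link."""
    eta = intercept + g11 * b1 + g12 * b2
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog_hazard(intercept, g11, g12, b1, b2):
    """Discrete proportional hazards form (complementary log-log link)."""
    eta = intercept + g11 * b1 + g12 * b2
    return 1.0 - math.exp(-math.exp(eta))


def dropout_pmf(hazards):
    """P(dropout in interval k) for each k, followed by P(no dropout at all)."""
    pmf, surviving = [], 1.0
    for h in hazards:
        pmf.append(surviving * h)  # still in the study, then drops out
        surviving *= 1.0 - h
    pmf.append(surviving)  # completed every interval (right-truncated case)
    return pmf


# Constant hazard across 5 intervals gives a truncated geometric distribution.
# All numeric values below are illustrative assumptions.
h = logistic_hazard(-3.29, 6.9, 4.6, -0.074, -0.103)
pmf = dropout_pmf([h] * 5)
```

Supplying a different intercept for each interval to `logistic_hazard` or `cloglog_hazard` gives a non-constant hazard while leaving `dropout_pmf` unchanged.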
Note that if both censoring parameters γ11 and γ12 are zero, then the drop-out process does not depend on either outcome and is therefore non-informative. If only γ11 is zero, the censoring process depends on the second outcome only; similarly, if only γ12 is zero, it depends on the first outcome only. When dropout occurs, it is assumed that no further measurements are recorded on either outcome. Thus each subject has the same number of observations for both outcomes. As in the non-informative case, conditional on the random slopes $\beta_i = (\beta_{i1}, \beta_{i2})'$, the observed OLS slopes $b_{i,ols} = (b_{i1}, b_{i2})'$ are assumed in 2) to be independent and normally distributed, and the estimated pooled variance in 3) is chi-square.
This is viewed as a non-ignorable missing data problem in the sense that the missing value mechanism depends on the unobserved random vector $\beta_i$. In this context, $\beta_i$ is both a parameter in the distribution of $m_i$ and is itself unobserved (Mori et al., 1994).
We maximize the marginal likelihood, integrated over the unobserved random $\beta_i$, to obtain the maximum likelihood estimates. We also use an empirical Bayes method to predict $\beta_i$ for each individual. The joint distribution of $m_i$, $b_{i,ols}$, and the pooled variance estimates $\hat{\sigma}_{ej}^2$ is used in the likelihood function as follows:
$$L = \prod_{j=1}^{2} f(\hat{\sigma}_{ej}^2) \prod_{i=1}^{n} \int\!\!\int f(m_i \mid \beta_i)\, f(b_{i,ols} \mid \beta_i)\, f(\beta_i)\, d\beta_{i1}\, d\beta_{i2}. \tag{14}$$
The likelihood function is described in detail in the Appendix. The log of this likelihood is maximized to obtain estimates of the population slopes (β1, β2). In addition, the parameters of the dropout model (γk, γ11, γ12) are estimated as well. The empirical Bayes estimates of the individual random slopes (βi1, βi2) are obtained via an approach similar to that of Ten Have and Localio (1999).
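To illustrate how informative dropout enters an empirical Bayes prediction, the following sketch treats a simplified univariate analogue: a single outcome, a geometric dropout model with an assumed logistic hazard, and known variance components. It is not the authors' NLMIXED implementation. Every name and numeric value is a hypothetical assumption, and the integral over the random slope is approximated by a plain Riemann sum.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def geometric_pmf(m, h, dropped):
    """m - 1 intervals survived, then a dropout (only if dropped is True)."""
    return (h if dropped else 1.0) * (1.0 - h) ** (m - 1)


def eb_slope(b, v, m, dropped, mu, tau2, g0, g1, n_grid=4001, width=8.0):
    """Posterior mean of the random slope given the OLS slope b and m.

    b, v   : OLS slope and its (assumed known) sampling variance
    mu,tau2: population slope and between-subject slope variance
    g0, g1 : intercept and slope coefficient of the assumed logistic hazard
    """
    sd = math.sqrt(tau2)
    lo = mu - width * sd
    step = 2 * width * sd / (n_grid - 1)
    num = den = 0.0
    for k in range(n_grid):
        beta = lo + k * step
        h = 1.0 / (1.0 + math.exp(-(g0 + g1 * beta)))
        w = (geometric_pmf(m, h, dropped)
             * normal_pdf(b, beta, v)
             * normal_pdf(beta, mu, tau2))
        num += beta * w
        den += w
    return num / den


# Illustrative values (assumptions): a subject who dropped out after m = 2.
noninformative = eb_slope(-0.05, 0.0025, 2, True, -0.1, 0.0025, -3.0, 0.0)
informative = eb_slope(-0.05, 0.0025, 2, True, -0.1, 0.0025, -3.0, 20.0)
```

When the hazard does not depend on the slope (`g1 = 0`), the dropout term cancels and the prediction reduces to the usual precision-weighted shrinkage of the OLS slope toward the population slope. With `g1 > 0`, an early dropout pulls the predicted slope upward relative to that shrinkage value.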
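The two longitudinal outcomes are assumed to be correlated, and their corresponding slopes follow a bivariate normal distribution, denoted $f(\beta_i)$ in equation (14). Precisely,
$$\begin{pmatrix} \beta_{i1} \\ \beta_{i2} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},\; \begin{pmatrix} \sigma_{\beta_1}^2 & \rho_{\beta_{i12}}\,\sigma_{\beta_1}\sigma_{\beta_2} \\ \rho_{\beta_{i12}}\,\sigma_{\beta_1}\sigma_{\beta_2} & \sigma_{\beta_2}^2 \end{pmatrix} \right), \tag{15}$$
where $\rho_{\beta_{i12}}$ denotes the correlation between the slopes of the bivariate outcomes and $\sigma_{\beta_1}^2$, $\sigma_{\beta_2}^2$ are the variances of the random slopes.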
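A bivariate normal distribution for the random slopes can be sketched as below. The means are the population slopes reported for the renal data; the standard deviations, the correlation, and the function names are illustrative assumptions. Correlated draws are generated with the standard two-variable Cholesky construction.

```python
import math
import random


def slope_covariance(sd1, sd2, rho):
    """2 x 2 covariance matrix of the random slopes implied by rho."""
    c = rho * sd1 * sd2
    return [[sd1 ** 2, c], [c, sd2 ** 2]]


def draw_slopes(mu1, mu2, sd1, sd2, rho, rng):
    """One draw of (beta_i1, beta_i2) from the bivariate normal."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    b1 = mu1 + sd1 * z1
    b2 = mu2 + sd2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return b1, b2


def sample_correlation(pairs):
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    s11 = sum((p[0] - m1) ** 2 for p in pairs)
    s22 = sum((p[1] - m2) ** 2 for p in pairs)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs)
    return s12 / math.sqrt(s11 * s22)


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
# sd1, sd2, and rho are assumed values chosen only for illustration.
draws = [draw_slopes(-0.074, -0.103, 0.05, 0.06, 0.7, rng) for _ in range(20000)]
r_hat = sample_correlation(draws)
cov = slope_covariance(0.05, 0.06, 0.7)
```

With a large number of draws, `r_hat` should be close to the correlation supplied to `draw_slopes`, and the off-diagonal entry of `cov` is the slope covariance implied by that correlation.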
Here, we emphasize that our approach does not restrict the dropout probability (the distribution of $m_i$) to depend only on $\beta_i$. If the distribution of $m_i$ depends on $\beta_i$ and on $\alpha_i^{*} = (\alpha_{i1}^{*}, \alpha_{i2}^{*})'$, the random intercepts of the time-centered model (2), we can write the integral in (14) as
$$\int\!\!\int f(m_i \mid \beta_i, \alpha_i^{*})\, f(b_{i,ols} \mid \beta_i)\, f(\beta_i)\, f(\alpha_i^{*})\, d\alpha_i^{*}\, d\beta_i = \int f(b_{i,ols} \mid \beta_i)\, f(\beta_i) \left[ \int f(m_i \mid \beta_i, \alpha_i^{*})\, f(\alpha_i^{*})\, d\alpha_i^{*} \right] d\beta_i, \tag{16}$$
where we assume that $\alpha_{ij}^{*}$ and $\beta_{ij}$ are independent, since $\alpha_{ij}^{*}$ corresponds to the mean of $y_{ijk}$ at the mean of the $t_{ik}$ for subject $i$ and $\beta_{ij}$ corresponds to the slope, and these two are typically assumed independent; note, however, that this implies that $\alpha_{ij}$, which corresponds to the mean at baseline, and $\beta_{ij}$ will not be independent.
In (16),
$$\int f(m_i \mid \beta_i, \alpha_i^{*})\, f(\alpha_i^{*})\, d\alpha_i^{*} = f(m_i \mid \beta_i), \tag{17}$$
so that the likelihood of the data is actually identical to (14), even if the distribution of $m_i$ depends on both $\beta_i$ and $\alpha_i^{*}$. Thus, even though the drop-out model could depend on both $\beta_i$ and $\alpha_i^{*}$, one needs to specify the correct 'marginal' drop-out model in (17) only as a function of $\beta_i$.
The main issue is that, if the conditional distribution of $m_i$ depends on both $\beta_i$ and $\alpha_i^{*}$, with the hazard equal to a discrete logistic model similar to (11) or a discrete proportional hazards model such as (12), then integrating over the distribution of $\alpha_i^{*}$ (which is typically normal) will not give a simple logistic or proportional hazards model for the discrete hazard in $f(m_i \mid \beta_i)$, and we could have a misspecified drop-out model. Some recent work with bridge distributions (Wang and Louis, 2003) gives distributions for $\alpha_i^{*}$ under which the marginal distribution (integrating over $\alpha_i^{*}$) and the conditional distribution (conditional on $\alpha_i^{*}$) have the same form (say, both discrete proportional hazards), so it is still possible for the conditional and marginal distributions to share the same form. Thus, the bias in our model will depend on how close our proposed distribution for $f(m_i \mid \beta_i)$ is to the true distribution.
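The point that integrating a logistic hazard over a normal random intercept does not return a logistic hazard with the same coefficients can be checked numerically. The sketch below is a one-outcome illustration under assumed values: the hazard form, the coefficient names (`g0`, `g1`, `g2`), and the intercept standard deviation are all hypothetical, and the integral is approximated by a Riemann sum over a zero-mean normal intercept.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_hazard(beta, g0, g1, g2, alpha_sd, n_grid=2001, width=8.0):
    """Average of expit(g0 + g1*beta + g2*alpha) over alpha ~ N(0, alpha_sd^2)."""
    lo = -width * alpha_sd
    step = 2 * width * alpha_sd / (n_grid - 1)
    num = den = 0.0
    for k in range(n_grid):
        a = lo + k * step
        w = math.exp(-0.5 * (a / alpha_sd) ** 2)  # unnormalized normal weight
        num += w * expit(g0 + g1 * beta + g2 * a)
        den += w
    return num / den


# Assumed conditional model: logit(h) = -3 + 1 * beta + 1 * alpha, alpha_sd = 2.
g0, g1, g2, alpha_sd = -3.0, 1.0, 1.0, 2.0
marginal_slope = (logit(marginal_hazard(1.0, g0, g1, g2, alpha_sd))
                  - logit(marginal_hazard(0.0, g0, g1, g2, alpha_sd)))
```

Under these assumed values the change in the marginal log-odds per unit of the slope is smaller than the conditional coefficient `g1`, which is the familiar attenuation of logistic coefficients after integrating out a normal random effect. When `g2 = 0`, the marginal and conditional hazards coincide.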
The likelihood function described in equation (14) and in the Appendix is then maximized using the SAS procedure NLMIXED (SAS Institute Inc., 2002). The proposed bivariate model can be applied to any dataset in which bivariate correlated outcomes are measured longitudinally and informative right censoring occurs. In this study we applied the proposed bivariate model to the renal transplant dataset described in the following section.
3. Renal Transplant dataset
The proposed bivariate model was applied to a clinical registry dataset. Patients who had undergone renal transplant in the year 2000 were identified from the hospital-based registry for renal transplantation at the Medical University of South Carolina (Besenski et al., 2005). The registry keeps follow-up records on kidney function as well as demographic information for all the patients who had undergone kidney transplant. BUN and Cr for each patient are recorded repeatedly at baseline and post-transplant to assess kidney function. These two outcomes were collected on 110 patients, who had undergone kidney transplant in the calendar year 2000. Informative right censoring occurred in this dataset due to graft failure. When graft failure occurs, kidneys will no longer function resulting in renal failure. In this situation, patients with renal failure go back to dialysis and hence no more measurements for BUN and Cr levels can be obtained. Thus, following graft failure informative right censoring occurs.
Approximately 25% of the observations were informatively missing over the three-year follow-up period because of graft failure. The normal BUN level is in the range of 8 mg to 25 mg per 100 ml of blood, and that for Cr is in the range of 0.7 mg to 1.3 mg per 100 ml of blood. Seven repeated measurements were recorded at 0 (baseline), 1, 3, 6, 12, 24, and 36 months following the transplant. These follow-up time points are predetermined, and patients are followed at scheduled visits. All patients share a common schedule of follow-up visits at which BUN and Cr are recorded, so the predetermined measurement times are the same for every patient. In other words, BUN and Cr were to be collected from 2000 to 2003 (three years of follow-up) for all patients who underwent renal transplant in 2000. A histogram of the observed percentages of dropout is shown in Figure 1 (a). The discrete hazard for each interval, which is the probability of dropping out in an interval given that the patient is still in the study at the beginning of the interval, is presented in Figure 1 (b).
Figure 1.
(a) Observed percentage dropout at each follow-up time point; (b) Hazard rate at follow-up time point.
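The observed discrete hazard plotted in panel (b) is the number of dropouts in an interval divided by the number of patients still at risk at the start of that interval. A minimal sketch of that calculation follows; the counts are hypothetical and are not the registry's values.

```python
def empirical_hazard(dropouts, n_start):
    """Discrete hazard per interval: dropouts / number still at risk."""
    at_risk = n_start
    hazards = []
    for d in dropouts:
        hazards.append(d / at_risk if at_risk > 0 else 0.0)
        at_risk -= d  # those who dropped out leave the risk set
    return hazards


# Hypothetical cohort of 100 patients with dropouts in five successive intervals.
hazards = empirical_hazard([2, 5, 12, 4, 5], n_start=100)
```

Unlike the raw dropout percentage in panel (a), each hazard uses a shrinking denominator, so the two summaries can rank the intervals differently.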
After any organ transplant the patient’s body will be at high risk of rejecting the new organ and hence graft failure. However, this risk will be gradually reduced with time. With the renal dataset, as can be deduced from Figure 1 (a), the dropout percentage was highest in the 12 months period following transplantation. Subsequently, this percentage was significantly reduced at 24 months follow-up period. This is an indication that patients were at a higher risk of graft failure in the first year following transplant and this risk was reduced afterwards. Since the hazards do not appear constant over the follow-up times, particularly at 12 months (Figure 1(b)), the discrete logistic censoring model, formula (11), with a different intercept for each interval was employed.
The estimated slopes under the bivariate model for Cr and BUN were, respectively, β1 = −0.074 (P-value < 0.0001) and β2 = −0.103 (P-value < 0.0001). The covariance between the slopes β1 and β2 was estimated to be 0.002. The censoring parameters for Cr and BUN were estimated as γ11 = −9.971 (P-value = 0.001) and γ12 = 6.6537 (P-value < 0.0001), respectively. The estimated intercepts γk in the logistic dropout model for follow-up months 3, 6, 12, 24, and 36 were, respectively, −6.152, −4.378, −4.262, −3.334, and −3.700. The censoring parameters γ11 and γ12 are both significant, indicating that the censoring process depends on both outcomes. The slope estimates showed that the levels of both outcomes (Cr and BUN) decrease with time. Furthermore, we observed that patients who had smaller decreases in their BUN and Cr levels, as determined by their individual slopes, had fewer observations. For instance, the mean individual slopes for BUN for patients with mi = 2 and mi = 3 observations were −0.074 ± 0.0143 and −0.079 ± 0.0055, respectively, whereas that for patients with mi = 7 observations was −0.11 ± 0.0073. The same applies to Cr, which indicates that the higher the individual slopes, the slower the reduction in BUN and Cr levels and hence the greater the possibility that patients experience graft failure. This underscores the importance of obtaining estimates of the individual subjects’ slopes. Such estimates allow investigators to monitor kidney disease status for each patient through assessment of their estimated slope over time. The individual subjects’ slopes for Cr and BUN generated under the bivariate model are presented in Figure 2.
When the univariate model was applied to each outcome separately, i.e., treating the two outcomes as independent and estimating the corresponding slopes separately, the following population slope estimates and censoring parameters for Cr and BUN were obtained: β1 = −0.03 (P-value < 0.0001); β2 = −0.06 (P-value < 0.0001); γ1 = −16.4 (P-value < 0.0001); γ2 = 8.76 (P-value < 0.0001). Despite the difference in the estimated population slopes under the bivariate and univariate models, the slopes under both models suggested that the levels of Cr and BUN decrease with time.
Figure 2.
Estimates of individual slopes corresponding to: (a) creatinine and (b) BUN for each patient.
Whether joint modeling of the bivariate longitudinal outcomes provides better estimates than the univariate model was investigated in our simulation study, as discussed in the following section.
4. Simulation Study
A simulation study was conducted to assess and compare the performance of the proposed bivariate model to that of the univariate model. The univariate model adjusts only for informative right censoring and assumes that the two longitudinal outcomes are independent and therefore, each of their corresponding slopes follows a univariate normal distribution. The proposed bivariate model assumes that the two longitudinal outcomes are correlated and that their slopes follow a bivariate normal distribution. SAS procedure NLMIXED (SAS Institute Inc., 2002) was used to obtain the slope estimates.
The performance of the bivariate model as well as that of the univariate model was evaluated by means of bias, mean squared error for the population slopes, denoted MSE(a), and mean squared error for the individual slopes, denoted MSE(b), where, for outcome $j$,
$$\mathrm{MSE(a)}_j = \frac{1}{r} \sum_{s=1}^{r} \left( \hat{\beta}_j^{(s)} - \beta_j \right)^2, \qquad \mathrm{MSE(b)}_j = \frac{1}{r} \sum_{s=1}^{r} \frac{1}{n} \sum_{i=1}^{n} \left( \hat{\beta}_{ij}^{(s)} - \beta_{ij}^{(s)} \right)^2, \tag{18}$$
with $n$ being the number of subjects in each data set and $r$ being the total number of replications; here $\hat{\beta}_j^{(s)}$ is the estimated population slope in replication $s$, and $\hat{\beta}_{ij}^{(s)}$ and $\beta_{ij}^{(s)}$ are the predicted and true individual slopes of subject $i$ in replication $s$. MSE(a) evaluates the accuracy of the estimates of the population slope, and MSE(b) evaluates the accuracy of the predicted individual slopes. This allowed us to assess whether jointly modeling the bivariate slopes in the likelihood function, and hence accounting for the correlation between the outcomes, has any effect on bias and MSEs. The number of observations was randomly generated from the truncated geometric distribution, which depended on both censoring parameters γ1 and γ2. The number of observations ranged from 2 to 7, recorded at the prespecified time points $t_{ik}$ of 0, 1, 3, 6, 12, 24, and 36 months. A linear relationship was assumed between each outcome and $\log(t_{ik} + 1)$, which equals 0, 0.69, 1.39, 1.95, 2.56, 3.22, and 3.61. Thus, the model for the outcomes is $y_{ijk} = \alpha_{ij} + \beta_{ij} s_{ik} + e_{ijk}$, for $s_{ik} = \log(t_{ik} + 1)$ and $j = 1, 2$.
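The time scale and the dropout mechanism of this simulation design can be sketched as follows. The log-transformed times follow directly from the stated schedule. The function that draws the number of observations is a hypothetical illustration of a truncated geometric draw with a constant hazard `h`, assuming every subject contributes at least two observations.

```python
import math
import random

MONTHS = [0, 1, 3, 6, 12, 24, 36]
LOG_TIMES = [math.log(t + 1) for t in MONTHS]  # natural log of (months + 1)


def draw_num_obs(h, rng, min_obs=2, max_obs=len(MONTHS)):
    """Truncated geometric draw: after min_obs visits, drop out before each
    further visit with probability h; stop at max_obs if no dropout occurs."""
    m = min_obs
    while m < max_obs and rng.random() >= h:
        m += 1
    return m


rng = random.Random(1)  # fixed seed for reproducibility
# h = 0.05 is an assumed constant hazard, chosen only for illustration.
sample = [draw_num_obs(0.05, rng) for _ in range(1000)]
```

Rounding `LOG_TIMES` to two decimals reproduces the values 0, 0.69, 1.39, 1.95, 2.56, 3.22, and 3.61 quoted in the text.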
The performance of the bivariate model was assessed under two conditions: when the outcomes are normally distributed and when they are non-normal. These two types of simulation enabled us to assess the robustness of the proposed bivariate model to the distribution of the outcomes. In the first type of simulation, the errors eijk for the two outcomes were assumed to be normally distributed with mean zero and variance $\sigma_{ej}^2$. The slopes for the first and second outcomes were normally distributed with mean βj, for j = 1, 2. Conditional on the random effects αij and βij (as well as unconditionally, since αij and βij are specified as normal), each outcome yijk is also normal. In the second type of simulation, the random effects αij and βij as well as the errors were simulated as non-normal. In particular, the random errors eijk were simulated from a triangular distribution with mean 0 that is not symmetric about 0, with a mode of 1, a minimum of −3, and a maximum of 2. A distribution with these particular values of the mode, mean, maximum, and minimum is known to have a moderate to large departure from normality (Lipsitz et al., 2004). The slopes for the two outcomes also followed a triangular distribution. Conditional on αij and βij, the outcome yijk follows a triangular distribution. The results of these simulations are presented in Tables 1 and 2. For the univariate model, the errors were assumed to be normally distributed.
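The triangular error distribution described above (minimum −3, maximum 2, mode 1) can be sketched with Python's standard library. The mean of a triangular distribution is the average of its minimum, maximum, and mode, which is 0 here. The sample size and seed below are arbitrary assumptions.

```python
import random
from statistics import mean, median

LOW, HIGH, MODE = -3.0, 2.0, 1.0
THEORETICAL_MEAN = (LOW + HIGH + MODE) / 3.0  # mean of a triangular distribution


def draw_errors(n, rng):
    """Draw n errors from the asymmetric triangular distribution."""
    return [rng.triangular(LOW, HIGH, MODE) for _ in range(n)]


rng = random.Random(7)  # fixed seed for reproducibility
errors = draw_errors(50000, rng)
sample_mean = mean(errors)
sample_median = median(errors)
```

The sample mean should be close to zero while the sample median is positive, reflecting the left skew that makes this distribution a useful departure from normality.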
Table 1.
Comparisons of the performance of Bivariate (normal and triangular errors) and Univariate: ρ =0.2
γ0 = −3.29. Subscripts 1 and 2 refer to the first and second outcome, respectively.
| Censoring parameters | Measure | Bivariate normal estimation (accounting for ρ), true model with normal errors | Univariate (ignoring ρ), true model with normal errors | Bivariate normal estimation (accounting for ρ), true model with triangular errors |
|---|---|---|---|---|
| γ1 = 0, γ2 = 0 | Bias_1 × 100 | 0.000 | 0.124 | −0.068 |
| | Bias_2 × 100 | 0.000 | 0.455 | −0.080 |
| | MSE(a)_1 × 10^3 | 0.179 | 0.463 | 0.207 |
| | MSE(a)_2 × 100 | 0.103 | 0.222 | 0.114 |
| | MSE(b)_1 × 100 | 0.126 | 0.312 | 0.146 |
| | MSE(b)_2 × 10 | 0.105 | 0.201 | 0.127 |
| γ1 = 5.7, γ2 = 0 | Bias_1 × 100 | 0.000 | 0.033 | −0.110 |
| | Bias_2 × 100 | 0.000 | 0.315 | 0.022 |
| | MSE(a)_1 × 10^3 | 0.181 | 0.334 | 0.216 |
| | MSE(a)_2 × 100 | 0.101 | 0.184 | 0.102 |
| | MSE(b)_1 × 100 | 0.124 | 0.254 | 0.147 |
| | MSE(b)_2 × 10 | 0.102 | 0.188 | 0.125 |
| γ1 = 3.5, γ2 = 2.3 | Bias_1 × 100 | −0.037 | 0.141 | −0.083 |
| | Bias_2 × 100 | −0.027 | 0.266 | −0.033 |
| | MSE(a)_1 × 10^3 | 0.186 | 0.309 | 0.215 |
| | MSE(a)_2 × 100 | 0.098 | 0.149 | 0.109 |
| | MSE(b)_1 × 100 | 0.123 | 0.240 | 0.145 |
| | MSE(b)_2 × 10 | 0.090 | 0.165 | 0.124 |
| γ1 = 6.9, γ2 = 4.6 | Bias_1 × 100 | −0.087 | 0.120 | −0.112 |
| | Bias_2 × 100 | −0.169 | 0.226 | −0.180 |
| | MSE(a)_1 × 10^3 | 0.174 | 0.282 | 0.209 |
| | MSE(a)_2 × 100 | 0.112 | 0.144 | 0.118 |
| | MSE(b)_1 × 100 | 0.122 | 0.214 | 0.143 |
| | MSE(b)_2 × 10 | 0.089 | 0.154 | 0.124 |
Table 2.
Comparisons of the performance of Bivariate and Univariate: ρ =0.7
γ0 = −3.29. Subscripts 1 and 2 refer to the first and second outcome, respectively.
| Censoring parameters | Measure | Bivariate normal estimation (accounting for ρ), true model with normal errors | Univariate (ignoring ρ), true model with normal errors | Bivariate normal estimation (accounting for ρ), true model with triangular errors |
|---|---|---|---|---|
| γ1 = 0, γ2 = 0 | Bias_1 × 100 | −0.048 | −0.117 | −0.059 |
| | Bias_2 × 100 | 0.082 | 0.411 | −0.207 |
| | MSE(a)_1 × 10^3 | 0.173 | 0.425 | 0.197 |
| | MSE(a)_2 × 100 | 0.106 | 0.187 | 0.114 |
| | MSE(b)_1 × 100 | 0.132 | 0.306 | 0.156 |
| | MSE(b)_2 × 10 | 0.112 | 0.197 | 0.139 |
| γ1 = 5.7, γ2 = 0 | Bias_1 × 100 | 0.011 | 0.034 | −0.148 |
| | Bias_2 × 100 | −0.079 | 0.553 | −0.170 |
| | MSE(a)_1 × 10^3 | 0.182 | 0.369 | 0.206 |
| | MSE(a)_2 × 100 | 0.104 | 0.175 | 0.110 |
| | MSE(b)_1 × 100 | 0.130 | 0.263 | 0.159 |
| | MSE(b)_2 × 10 | 0.109 | 0.184 | 0.138 |
| γ1 = 3.5, γ2 = 2.3 | Bias_1 × 100 | −0.016 | 0.160 | −0.091 |
| | Bias_2 × 100 | −0.146 | 0.431 | −0.199 |
| | MSE(a)_1 × 10^3 | 0.179 | 0.314 | 0.205 |
| | MSE(a)_2 × 100 | 0.101 | 0.172 | 0.111 |
| | MSE(b)_1 × 100 | 0.128 | 0.258 | 0.157 |
| | MSE(b)_2 × 10 | 0.089 | 0.171 | 0.139 |
| γ1 = 6.9, γ2 = 4.6 | Bias_1 × 100 | −0.070 | 0.230 | −0.089 |
| | Bias_2 × 100 | −0.199 | 0.347 | −0.298 |
| | MSE(a)_1 × 10^3 | 0.183 | 0.303 | 0.195 |
| | MSE(a)_2 × 100 | 0.109 | 0.161 | 0.113 |
| | MSE(b)_1 × 100 | 0.127 | 0.227 | 0.154 |
| | MSE(b)_2 × 10 | 0.087 | 0.160 | 0.139 |
The parameters were obtained from the estimates generated when the bivariate model with geometric censoring distribution was applied to the renal transplant data. Cr and BUN were the two outcomes of interest. The slopes used in the simulation were β1=−0.074 corresponding to Cr and β2 = −0.103 to BUN estimated under the bivariate model. For the case when normality was assumed, the variances for β1 and β2 were estimated from the renal transplant dataset as well with respective values . Moreover, estimates of the error variance for the first and second outcomes were generated from the renal transplant dataset and these estimates were as follows: .
The datasets (2000 replications each with size 200) were simulated using different values for the correlation coefficient (ρ = 0, ρ = 0.2, and ρ = 0.7). Considering different values of correlations will enable us to assess the effect of the level of dependence between outcomes on the validity of the estimated slopes under the bivariate and univariate models. In addition different values for the censoring parameters γ1 and γ2 were considered to study the effect of censoring levels on the accuracy of the results.
Tables 1, 2, and 3 present the bias and mean squared errors of the slope estimates for outcomes one and two under the bivariate and univariate models at different correlation and censoring levels. We first discuss the results of the simulation study when the errors were normally distributed. In this setting, when the outcomes are correlated and the censoring process depends on both outcomes (i.e., γ1 ≠ 0 and γ2 ≠ 0), the bias, MSE(a), and MSE(b) for the first and second outcomes were reduced by an average of 57%, 35%, and 45%, respectively, compared to the univariate model (Table 1).
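As a worked check of how such average reductions can be computed, the sketch below takes the MSE(a) and MSE(b) entries from Table 1 (ρ = 0.2, normal errors) for the two scenarios in which both censoring parameters are non-zero and averages the percent reduction of the bivariate model relative to the univariate model. The helper name is hypothetical, and the averaging rule (a simple mean over outcomes and scenarios) is an assumption about how the reported averages were formed.

```python
def pct_reduction(bivariate, univariate):
    """Percent reduction of the bivariate value relative to the univariate one."""
    return 100.0 * (1.0 - bivariate / univariate)


# (bivariate, univariate) pairs copied from Table 1, rows with
# gamma1 = 3.5, gamma2 = 2.3 and gamma1 = 6.9, gamma2 = 4.6.
mse_a_pairs = [(0.186, 0.309), (0.098, 0.149), (0.174, 0.282), (0.112, 0.144)]
mse_b_pairs = [(0.123, 0.240), (0.090, 0.165), (0.122, 0.214), (0.089, 0.154)]

avg_mse_a = sum(pct_reduction(b, u) for b, u in mse_a_pairs) / len(mse_a_pairs)
avg_mse_b = sum(pct_reduction(b, u) for b, u in mse_b_pairs) / len(mse_b_pairs)
```

Under this averaging rule the reductions come out at roughly 34% for MSE(a) and 45% for MSE(b), in line with the figures quoted in the text. The scaling factors in the row labels cancel within each ratio, so they do not affect the result.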
Table 3.
Comparisons of the performance of Bivariate and Univariate: ρ = 0
γ0 = −3.29. Subscripts 1 and 2 refer to the first and second outcome, respectively.
| Censoring parameters | Measure | Bivariate (accounting for ρ) | Univariate (ignoring ρ) |
|---|---|---|---|
| γ1 = 6.9, γ2 = 4.6 | Bias_1 × 100 | −0.041 | 0.161 |
| | Bias_2 × 100 | −0.116 | 0.236 |
| | MSE(a)_1 × 10^3 | 0.173 | 0.289 |
| | MSE(a)_2 × 100 | 0.107 | 0.149 |
| | MSE(b)_1 × 100 | 0.117 | 0.218 |
| | MSE(b)_2 × 10 | 0.101 | 0.152 |
| γ1 = 0, γ2 = 7.6 | Bias_1 × 100 | −0.039 | 0.126 |
| | Bias_2 × 100 | −0.131 | −0.129 |
| | MSE(a)_1 × 10^3 | 0.178 | 0.323 |
| | MSE(a)_2 × 100 | 0.105 | 0.103 |
| | MSE(b)_1 × 100 | 0.129 | 0.260 |
| | MSE(b)_2 × 10 | 0.115 | 0.112 |
| γ1 = 4.6, γ2 = 0 | Bias_1 × 100 | −0.007 | −0.001 |
| | Bias_2 × 100 | 0.065 | 0.565 |
| | MSE(a)_1 × 10^3 | 0.174 | 0.171 |
| | MSE(a)_2 × 100 | 0.109 | 0.190 |
| | MSE(b)_1 × 100 | 0.122 | 0.116 |
| | MSE(b)_2 × 10 | 0.118 | 0.189 |
When the correlation between the outcomes was increased to 0.7 (Table 2), the bivariate model again showed an average reduction of 68% in bias and 45% in MSEs. When censoring was noninformative (γ1 = 0 and γ2 = 0) with ρ = 0.2 (Table 1), the bivariate model generated unbiased estimates for both outcomes, along with an average reduction in MSEs of about 56%. However, some bias was introduced when the correlation increased from 0.2 to 0.7 (Table 2). When the censoring process depended on only one outcome (outcome 1 in our simulation) with γ1 = 5.7 and γ2 = 0, MSEs were reduced by an average of 45% and 47% for correlations of 0.2 and 0.7, respectively (Tables 1–2).
All the results and conclusions discussed thus far were in the context of correlated outcomes. Table 3 presents the simulation results for the case when the outcomes were uncorrelated, under various scenarios for the censoring process. The bivariate model was superior to the univariate approach when censoring depended informatively on both outcomes (ρ = 0, γ1 = 6.9 and γ2 = 4.6). However, when censoring depended informatively on one outcome only, the correctly specified univariate model (the model for the outcome on which censoring truly depends) performed equivalently to the bivariate model. Meanwhile, with the misspecified univariate model, MSEs were increased by about 44% compared to both the bivariate and the correctly specified univariate model.
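As a rough check, the figure of about 44% can be recovered from Table 3 by averaging the relative MSE differences in the rows where the univariate model is fit to the outcome on which censoring does not depend. The sketch below assumes the difference is expressed relative to the univariate MSE; that choice of denominator is our reading, not something the text states. The row-specific scaling factors in the table cancel within each ratio.

```python
# (bivariate MSE, misspecified univariate MSE) pairs read from Table 3 (rho = 0).
pairs = {
    "MSE(a)_1, gamma1=0,   gamma2=7.6": (0.178, 0.323),
    "MSE(b)_1, gamma1=0,   gamma2=7.6": (0.129, 0.260),
    "MSE(a)_2, gamma1=4.6, gamma2=0":   (0.109, 0.190),
    "MSE(b)_2, gamma1=4.6, gamma2=0":   (0.118, 0.189),
}

def relative_difference(bivariate, univariate):
    """Difference in MSE as a fraction of the univariate MSE (assumed denominator)."""
    return (univariate - bivariate) / univariate

differences = {row: relative_difference(*values) for row, values in pairs.items()}
average = sum(differences.values()) / len(differences)

for row, value in differences.items():
    print(f"{row}: {value:.1%}")
print(f"average: {average:.1%}")   # prints "average: 43.9%"
```

With the bivariate MSE as the denominator instead, the same rows would give a considerably larger percentage, so the choice of reference model matters when quoting such figures.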
All the discussion thus far was in the context of normal errors and outcomes. The simulation results for the case when the normality assumption was violated and the errors followed a triangular distribution (described above) are presented in Tables 1 and 2. Here, our simulation results showed that when the outcomes deviate from normality, the bivariate model still generated accurate results with minimal bias and MSEs. There was only a small increase in MSEs compared to the bivariate model with normally distributed outcomes. In particular, the average increase in MSEs was about 15% for ρ = 0.2 (Table 1) and about 16% for ρ = 0.7 (Table 2). The bias associated with the bivariate model under triangular errors remained minute. These results support the robustness of the model to non-normal outcomes.
5. Discussion
In this article, we developed a bivariate model that jointly models the bivariate outcomes and adjusts for informative right censoring. The bivariate model that incorporates discrete logistic censoring with different intercepts for the follow-up time intervals (formula 11) was applied to the renal transplant study cohort. The population slope estimates for BUN and Cr under the bivariate model differed from those generated under the univariate models. This discrepancy is in line with what was previously observed by Thiébaut et al. (2005), whose bivariate model changed the fixed-effect estimates compared to univariate models even after adjusting for informative dropout.
Our simulation study was based on the geometric distribution not only for simplicity and because it is a special case of the discrete proportional odds model, but also because of the established relationship between the proportional hazards model and the geometric distribution. In particular, Chen and Manatunga (2007) demonstrated that the proportional hazards model is related solely to the geometric distribution. Accordingly, the only link between the proportional hazards model and the proportional odds model is through the geometric distribution. Thus the results of this simulation study could provide a general idea of the performance of the bivariate proportional odds and proportional hazards models.
As indicated by our simulation results, when the censoring process depends on both outcomes, estimates generated under the bivariate model show about a 2-fold decrease in bias and MSEs compared to the univariate model. The same decrease was observed when the outcomes are correlated and the censoring process depends on only one of the outcomes, or when the censoring is non-informative. In this context, the superior performance of the bivariate model could be attributed to the incorporation of the correlation in the joint modeling, in addition to the over-parameterization in the model specification. Some increase in bias was observed when the correlation increased from 0.2 (Table 1) to 0.7 (Table 2) under non-informative censoring. This increase could imply that the magnitude of the correlation between the outcomes affects the model's performance, especially when the missingness mechanism is random. However, when the outcomes are uncorrelated and the censoring process depends on one of the outcomes, the correct univariate model performs the same as the bivariate model. Meanwhile, the misspecified univariate model had the worst performance, which suggests that when the missingness occurs at random, a model that adjusts for informative right censoring does not perform well. This agrees with what was already established by other studies (Wu and Bailey, 1988; Wu and Carroll, 1988; Mori et al., 1994).
A simulation study was also conducted to examine the asymptotic performance of the bivariate model with non-normal outcomes. Here, the number of repeated measures (mi) ranged from 2 to 30, and the errors were simulated from the triangular distribution described earlier. Our results showed a marked decrease in bias and MSEs when mi ranged from 2 to 30 as compared to the case when it ranged between 2 and 7. In particular, a decrease of about 40% in bias and 65% in MSEs was observed for 2 ≤ mi ≤ 30 as compared to 2 ≤ mi ≤ 7. This gain in accuracy is attributed to the fact that the subject-specific OLS slopes are closer to normal when mi is larger. A simulation study previously conducted by Lumley et al. (2002) showed that in cases of extreme skewness and departure from normality assumptions, estimated means were close to normally distributed for sample sizes (n) as low as 65, and that a sufficiently large sample size is often under 100. Previous studies have highlighted the robustness of the maximum likelihood estimator of fixed effects from the linear mixed model to non-Gaussian distributions of the errors and the random effects (Jacqmin-Gadda et al., 2006; Zhang and Davidian, 2001).
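The remark that subject-specific OLS slopes are closer to normal for larger mi can be made concrete with the standard OLS algebra. The notation below is illustrative and not taken from the paper: we assume a single outcome for subject i follows a linear trend y_ij = a_i + b_i t_ij + e_ij with independent, mean-zero errors of variance σ², and write \hat{b}_i for the OLS slope.

```latex
\hat{b}_i
  = \frac{\sum_{j=1}^{m_i} (t_{ij} - \bar{t}_i)\, y_{ij}}
         {\sum_{j=1}^{m_i} (t_{ij} - \bar{t}_i)^2}
  = b_i + \sum_{j=1}^{m_i} w_{ij}\, e_{ij},
\qquad
w_{ij} = \frac{t_{ij} - \bar{t}_i}{\sum_{k=1}^{m_i} (t_{ik} - \bar{t}_i)^2},
\qquad
\operatorname{Var}\!\left(\hat{b}_i \mid b_i\right)
  = \frac{\sigma^2}{\sum_{j=1}^{m_i} (t_{ij} - \bar{t}_i)^2}.
```

Under these assumptions the estimation error in \hat{b}_i is a weighted sum of mi independent errors. When mi is small, its distribution inherits the non-normal (for example triangular) shape of the errors. As mi grows and no single weight dominates, a central limit argument suggests the sum approaches normality while its variance shrinks.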
In summary, we present an analytical model whose novelty lies in the simplification of the likelihood function into a function of the OLS slopes for each individual, wherein the drop-out process depends on the underlying unobserved slopes, and in its robustness when the normality assumption of the outcomes is violated, whether asymptotically or with a small number of repeated measures. This extends its applicability to a wider spectrum of datasets. The censoring model to be used depends on whether dropout hazards are constant or vary over the follow-up times. Finally, the model can be further modified to accommodate the analysis of more than two outcomes.
ACKNOWLEDGMENTS
The project described was supported by Award Number UL1RR029882 from the National Center for Research Resources. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Center for Research Resources or the National Institutes of Health. We are also grateful for the support from the United States National Institutes of Health grants AI 60373, MH 054693, CA 74015, and CA 69222.
Appendix: The likelihood function
which equals,
Footnotes
Conflict of interest: No authors have conflicts of interest. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.
REFERENCES
- Besenski N, Rumboldt Z, Emovon C, Nicholas J, Kini S, Milutinovic J, Budisavljevic MN. Brain MR imaging abnormalities in kidney transplant recipients. AJNR. 2005;26:2282–2289.
- Chen A, Manatunga AK. A note on proportional hazards and proportional odds models. Statistics and Probability Letters. 2007;77:981–988.
- Fieuws S, Verbeke G. Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics. 2006;62:424–431. doi: 10.1111/j.1541-0420.2006.00507.x
- Gueorguieva RV. Comments about modeling of cluster size and binary and continuous subunit-specific outcomes. Biometrics. 2005;61:862–867. doi: 10.1111/j.1541-020X.2005.00409_1.x
- Imai K. Statistical analysis of randomized experiments with non-ignorable missing binary outcomes: an application to a voting experiment. Journal of the Royal Statistical Society, Series C. 2009;58:83–104.
- Jacqmin-Gadda H, Sibillot S, Proust S, Molina J, Thiébaut R. Robustness of the linear mixed model to misspecified error distribution. Computational Statistics and Data Analysis. 2006;51:5142–5154.
- Jahromi HA, Raiss-Jalali GA, Roozbeh J. Impact of adequate dialysis before transplantation on development of chronic renal allograft dysfunction in 3-year posttransplant period. Transplantation Proceedings. 2006;38:2003–2005. doi: 10.1016/j.transproceed.2006.07.013
- James GD, Sealey JE, Alderman M, Ljungman S, Mueller FB, Pecker MS, Laragh JH. A longitudinal study of urinary creatinine and creatinine clearance in normal subjects. Race, sex and age differences. Am J Hypertension. 1988;1:124–131. doi: 10.1093/ajh/1.2.124
- Lipsitz SR, Molenberghs G, Fitzmaurice GM, Ibrahim JG. Protective estimator for linear regression with nonignorably missing Gaussian outcomes. Statistical Modelling. 2004;4:3–17.
- Lumley T, Diehr P, Emerson S, Chen L. The importance of the normality assumption in large public health data sets. Annual Review of Public Health. 2002;23:151–169. doi: 10.1146/annurev.publhealth.23.100901.140546
- Mori M, Woolson RF, Woodworth GG. Slope estimation in the presence of informative right censoring: modeling the number of observations as a geometric random variable. Biometrics. 1994;50:39–50.
- Meier-Kriesche HU, Ojo AO, Cibrik DM, Punch JD, Leichtman AB, Kaplan B. Increased impact of acute rejection on chronic allograft failure in recent era. Transplantation. 2000;7:1098–1100. doi: 10.1097/00007890-200010150-00018
- SAS Institute Inc. USA: Cary, N.C.; 2002.
- Schluchter MD. Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine. 1992;11:1861–1870. doi: 10.1002/sim.4780111408
- Ten Have TR, Localio R. Empirical Bayes estimation of random effects parameters in mixed effects logistic regression models. Biometrics. 1999;55:1022–1029. doi: 10.1111/j.0006-341x.1999.01022.x
- Thiébaut R, Jacqmin-Gadda H, Babiker A, Commenges D, The CASCADE Collaboration. Joint modeling of bivariate longitudinal data with informative dropout and left-censoring, with application to the evolution of CD4+ cell count and HIV RNA viral load in response to treatment of HIV infection. Statistics in Medicine. 2005;24:65–82. doi: 10.1002/sim.1923
- Wang Z, Louis TA. Matching conditional and marginal shapes in binary mixed-effects models using a bridge distribution function. Biometrika. 2003;90:765–775.
- Wu MC, Bailey K. Analyzing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine. 1988;7:337–346. doi: 10.1002/sim.4780070134
- Wu MC, Bailey K. Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics. 1989;45:939–955.
- Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.
- Zhang D, Davidian M. Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics. 2001;57:795–802. doi: 10.1111/j.0006-341x.2001.00795.x


