Author manuscript; available in PMC: 2018 Jan 15.
Published in final edited form as: Stat Med. 2016 Aug 18;36(1):81–91. doi: 10.1002/sim.7080

Pseudo Maximum Likelihood Approach for the Analysis of Multivariate Left-Censored Longitudinal Data

Ghideon Solomon 1, Lisa Weissfeld 2
PMCID: PMC5138145  NIHMSID: NIHMS808997  PMID: 27538729

Abstract

The linear mixed effects model based on a full likelihood is one of the few methods available to model longitudinal data subject to left-censoring. However, a full likelihood approach is algebraically complicated because of the large dimension of the numeric computations, and maximum likelihood estimation can be computationally prohibitive when the data are heavily censored. Moreover, for mixed models, the complexity of the computation increases as the dimension of the random effects in the model increases. We propose a method based on pseudo likelihood that simplifies the computational complexities, allows a wide class of multivariate models, and can be used for many different data structures, including settings where the level of censoring is high. The motivation for this work comes from the need for a joint model to assess the joint effect of pro-inflammatory and anti-inflammatory biomarker data on 30-day mortality status while simultaneously accounting for longitudinal left-censoring and correlation between markers in the analysis of the Genetic and Inflammatory Markers for Sepsis (GenIMS) study conducted at the University of Pittsburgh. Two markers, interleukin-6 (IL-6) and interleukin-10 (IL-10), which are naturally correlated because they share similar biological pathways and are left-censored because of the limited sensitivity of the assays, are considered to determine whether higher levels of these markers are associated with an increased risk of death after accounting for the left-censoring and their assumed correlation.

Keywords: Left-censored data, Longitudinal biomarker data, Mixed effects model, Pseudo maximum likelihood

1. INTRODUCTION

Medical studies collect biomarker data to gain insight into the biological mechanisms underlying both acute and chronic diseases. These markers may be obtained at a single point in time to aid in the diagnosis of an illness or may be collected longitudinally to provide information on the relationship between changes in a given biomarker as it relates to the course of the illness. While there are many different biomarkers presented in the medical literature there are not as many studies that examine the relationship between multiple biomarkers, measured longitudinally, and predictors of interest. One example of such studies is the HIV literature where CD4 counts and viral loads are jointly modeled over time (Thiebaut, et al [1]).

Analysis of biomarker data has been important in understanding the relationship between markers of inflammation and the development of sepsis in the Genetic and Inflammatory Markers of Sepsis (GenIMS) study [2] conducted at the University of Pittsburgh. The study enrolled 2320 subjects with community acquired pneumonia through the emergency department of 28 hospitals in southwestern Pennsylvania, Connecticut, Michigan and Tennessee between 2001 and 2003. A battery of inflammatory markers were measured throughout the course of hospitalization in the cohort and subjects were followed for a period of one year to study the relationship between the trajectories of the pro-inflammatory and anti-inflammatory markers and the risk of death and development of sepsis. There was a need for statistical methods that can accommodate multiple longitudinal biomarkers accounting simultaneously for left-censoring and correlation between the markers, rather than relying on a series of separate longitudinal models for each biomarker.

We are interested in studying the association between levels of IL-6 (a pro-inflammatory marker) and IL-10 (an anti-inflammatory marker) and mortality while accounting for left-censoring and correlation between the markers. A large percentage of both IL-6 and IL-10 measurements are left-censored because of the limited sensitivity of the assays. The assays used to measure the concentration of the biomarkers were not sensitive enough to detect levels of the molecule at the low end of normal, resulting in moderate to heavy left-censoring of the biomarkers. These two markers are also naturally correlated because they share similar biological pathways.

While there are methods available for the analysis of left censored outcome data in the statistical literature, there are limited methods that can handle multivariate truncated longitudinal data when multiple outcomes need to be studied simultaneously. To address the issue of truncation when modeling data, researchers have proposed either the use of imputed values or the development of methods to handle the censoring directly. Imputing the lower quantification limit (Keet et al, [3]) or half of this limit (O’Brien et al,[4]) to substitute for the censored value and use of random imputation procedures (Paxton et al, [5]) are the most frequently used approaches. All of these naive approaches produce estimates with a substantial bias and they do not adjust the standard errors of the estimates for the loss of information due to censoring (Ghebregiorgis and Weissfeld [6]). Hughes [7], Jacqmin-Gadda et al, [8], and Lyles et al [9] proposed methods that handle left-censored measures, however all of these methods are restricted to a longitudinal model with a single outcome and it is difficult to extend them to handle multiple outcomes. In addition, since all of these methods are based on a full likelihood, they involve numeric and algebraic complexities that require the evaluation of a series of multiple integrals and become computationally prohibitive for data with a high rate of censoring. The computations become even more unstable for models with more than two random effects, leading to convergence issues when the current methodology is applied.

We propose a method that addresses the weaknesses of the current methodology for multivariate longitudinal models with left censored outcome data. The two major weaknesses, computational complexity and model instability, are addressed by applying the method of pseudo-likelihood. Using the pseudo-likelihood the estimation problem is broken into two separate steps with estimation of the parameters associated with the covariance taking place in step 1 and then estimating the remaining parameters, based on the modified likelihood, occurring in step 2. The proposed pseudo-likelihood method considerably reduces the computational burden associated with the current methods and is much more stable while preserving the properties of the original estimators.

We present the proposed methodology in section 2: the multivariate linear mixed model is discussed in section 2.1, the standard likelihood for left-censored data is given in section 2.2, and the pseudo-likelihood is developed in section 2.3. Computational details are given in section 2.4, and a simulation study, conducted to assess the performance of the proposed model, is summarized in section 3. In section 4 the proposed method is applied to the GenIMS data and its results are compared with those obtained using existing methods. Finally, we give a brief discussion and concluding remarks in section 5.

2. PROPOSED METHODOLOGY

2.1. Multivariate Linear Mixed Effects Model

Let $K$ be the number of outcomes in the model; it will be assumed that each of the $K$ longitudinally measured outcomes can be modeled using the mixed model. Let $Y_i = [Y_{i1}, Y_{i2}, \ldots, Y_{iK}]^T$ be the response vector for subject $i$ ($i = 1, \ldots, N$), where $Y_{ik}$ is the $n_{ik} \times 1$ vector of measurements for marker $k$ ($k = 1, \ldots, K$).

For simplicity we present the bivariate case ($K = 2$) in this section.

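Let $Y_i = \begin{bmatrix} Y_{i1} \\ Y_{i2} \end{bmatrix}$ be the response vector for subject $i$, for $i = 1, \ldots, N$, and let $Y_{ik}$ be the $n_{ik}$-vector of measurements of marker $k$ ($k = 1, 2$), where $n_i = n_{i1} + n_{i2}$. Let $\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}$ be a $p \times 1$ vector of population parameters, known as fixed effects, and $X_i$ be a known $n_i \times p$ design matrix of covariates linking $\beta$ to $Y_i$. Let $\gamma_i = \begin{bmatrix} \gamma_{i1} \\ \gamma_{i2} \end{bmatrix}$ be a $q \times 1$ vector of subject-specific parameters, known as random effects, and $Z_i$ a known $n_i \times q$ design matrix of covariates linking $\gamma_i$ to $Y_i$.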

The fixed effect design matrix is not restricted to assume the use of the same set of predictors for both outcomes. The random effect design matrix can also use different models for the two outcomes.

For multivariate normal data we extend the linear mixed model proposed by Laird and Ware [10]:

$$Y_i = X_i\beta + Z_i\gamma_i + \epsilon_i, \tag{1}$$

as in Schafer [11] and Bandyopadhyay et al. [12], with the assumption that the $n_i$-dimensional vector $Y_i$ satisfies

$$Y_i \mid \gamma_i \sim N(X_i\beta + Z_i\gamma_i, \Sigma_i),$$

where γi and ϵi are assumed to be mutually independent with ϵi ~ N(0,Σi), γi ~ N (0,G) and Σi is the ni × ni covariance matrix of measurement errors, which is a diagonal matrix containing the two elements of the measurement error of each marker, that depends on i only through its dimension ni, and G is the covariance matrix of the random effects given by:

$$G = \begin{bmatrix} G_1 & G_{12} \\ G_{12}^T & G_2 \end{bmatrix},$$

which is partitioned into four sub-matrices: $G_1$ is the covariance matrix containing the variances and covariances of the random effects for the first marker, $G_2$ is the corresponding covariance matrix for the second marker, and $G_{12} = G_{21}^T$ is the matrix of covariances between the random effects of the two markers. The correlation between the two markers can be studied through the matrix $G_{12}$.

Thus the set of unknown parameters in $\Sigma_i$ will not depend on $i$. Marginally, the $Y_i$ are independent normals with mean $\mu_i = X_i\beta$ and covariance matrix $V_i = \mathrm{Var}(Y_i) = Z_i G Z_i^T + \Sigma_i$.
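As a concrete illustration of the marginal covariance $V_i = Z_i G Z_i^T + \Sigma_i$, the following minimal sketch builds $V_i$ for one subject in a bivariate model with a single random slope per marker. The function names, the measurement times, and all numeric values of $G$ and the error variances are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def marginal_covariance(times1, times2, G, sigma2_1, sigma2_2):
    """Return V_i = Z_i G Z_i^T + Sigma_i for one random slope per marker."""
    n1, n2 = len(times1), len(times2)
    # Rows for marker 1 load on the first random slope, rows for marker 2 on the second.
    Z = [[t, 0.0] for t in times1] + [[0.0, t] for t in times2]
    ZGZt = matmul(matmul(Z, G), transpose(Z))
    # Sigma_i is diagonal with one error variance per marker.
    diag = [sigma2_1] * n1 + [sigma2_2] * n2
    n = n1 + n2
    return [[ZGZt[i][j] + (diag[i] if i == j else 0.0) for j in range(n)] for i in range(n)]

# Hypothetical values: 2x2 random-slope covariance G and error variances.
G = [[0.04, 0.01], [0.01, 0.09]]
V = marginal_covariance([1.0, 2.0], [1.0, 2.0], G, 0.25, 0.36)
```

The off-diagonal block of `V` (rows for marker 1, columns for marker 2) is nonzero only because the off-diagonal entry of `G` is nonzero, which is how the between-marker correlation enters the joint model.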

2.2. Standard Likelihood

Let Θ = (β, η) denote the parameter space, where β is the vector of fixed effect parameters and η is the vector of covariance matrix parameters. Then the model in (1) can be rewritten as

$$Y_i = f(X, \Theta) + \epsilon_i, \tag{2}$$

where X is the design matrix of the model.

For the $i$th subject, each variable's value is either observed, censored, or missing (missing values are assumed to be missing completely at random and hence do not contribute to the likelihood function). Using the notation from section 2.1, and letting $Y_i^o$ denote the $n_i^o$-vector of observed outcomes, $Y_i^c$ the $n_i^c$-vector of censored outcomes, and $c_i$ the $n_i^c$-vector of censoring thresholds for subject $i$, the likelihood function is given by

$$L(\beta, \eta) = L(\Theta) = \prod_{i=1}^{N} f_{Y_i^o \mid \Theta}(Y_i^o \mid \Theta)\, \Pr(Y_i^c < c_i \mid Y_i^o, \Theta).$$

The matrix Xi, vector Yi, and the covariance matrix Vi in section 2.1 can be partitioned into observed and censored components as

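$$X_i = \begin{bmatrix} X_i^o \\ X_i^c \end{bmatrix}, \quad Y_i = \begin{bmatrix} Y_i^o \\ Y_i^c \end{bmatrix}, \quad V_i = \begin{bmatrix} V_i^o & (V_i^{co})^T \\ V_i^{co} & V_i^c \end{bmatrix}.$$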

From model (1), $Y_i^o$ has a multivariate normal probability density $f_{Y_i^o}$, and using properties of the multivariate normal distribution it follows that the conditional distribution of $Y_i^c$ given $Y_i^o$ is normal (Billingsley [26], Casella and Berger [28]) with the following mean and variance, respectively:

$$\mu_i^{c|o} = X_i^c\beta + \eta_i^{co}(\eta_i^o)^{-1}[Y_i^o - \mu_i^o],$$
$$V_i^{c|o} = \eta_i^c - \eta_i^{co}(\eta_i^o)^{-1}(\eta_i^{co})^T,$$

where $\eta_i = V_i(\eta)$ is partitioned in the same way as $V_i$, and $\mu_i^o = X_i^o\beta$.

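Let the multivariate normal distribution function of the conditional distribution of $Y_i^c$ given $Y_i^o$ be denoted by $\Phi_i^{c|o}$. Then the likelihood function can be rewritten as

$$L(\Theta) = \prod_{i=1}^{N} f_{Y_i^o \mid \Theta}(Y_i^o \mid \Theta)\, \Phi_i^{c|o}(c_i \mid \Theta) = \prod_{i=1}^{N} (2\pi)^{-n_i^o/2}\, |\eta_i^o|^{-1/2} \exp\left\{-\tfrac{1}{2}(Y_i^o - X_i^o\beta)^T (\eta_i^o)^{-1} (Y_i^o - X_i^o\beta)\right\} \int_{-\infty}^{c_{i1}} \int_{-\infty}^{c_{i2}} \cdots \int_{-\infty}^{c_{i n_i^c}} (2\pi)^{-n_i^c/2}\, |V_i^{c|o}|^{-1/2} \exp\left\{-\tfrac{1}{2}(u - \mu_i^{c|o})^T (V_i^{c|o})^{-1} (u - \mu_i^{c|o})\right\} du, \tag{3}$$

where $u$ is an $n_i^c$-vector of integration variables.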
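To make the structure of this likelihood concrete, the sketch below evaluates one subject's contribution in the simplest possible case: one observed value and one left-censored value that are jointly normal. The contribution is the normal density of the observed value times the conditional normal distribution function of the censored value evaluated at its threshold. The function names and all numeric inputs are hypothetical, and the scalar setup is an assumption made for illustration; the general case requires a multivariate normal distribution function.

```python
import math

def norm_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def norm_cdf(x, mean, var):
    return 0.5 * math.erfc(-(x - mean) / math.sqrt(2.0 * var))

def conditional_moments(y_o, mu_o, mu_c, v_o, v_co, v_c):
    """Mean and variance of the censored value given the observed one."""
    mean = mu_c + v_co / v_o * (y_o - mu_o)
    var = v_c - v_co ** 2 / v_o
    return mean, var

def likelihood_contribution(y_o, c, mu_o, mu_c, v_o, v_co, v_c):
    """Density of the observed value times Pr(censored value < c | observed value)."""
    mean, var = conditional_moments(y_o, mu_o, mu_c, v_o, v_co, v_c)
    return norm_pdf(y_o, mu_o, v_o) * norm_cdf(c, mean, var)

# Hypothetical inputs: observed y_o = 1.0, detection limit c = 0.0.
cond_mean, cond_var = conditional_moments(1.0, 0.0, 0.0, 1.0, 0.5, 1.0)
contribution = likelihood_contribution(1.0, 0.0, 0.0, 0.0, 1.0, 0.5, 1.0)
```

With more than one censored value per subject the final factor becomes a multivariate normal probability, which is the source of the computational burden discussed next.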

Full likelihood based methods involve high dimensional multiple integration and require all parameters in the parameter space to be estimated simultaneously making the methods computationally complex and potentially unstable. To avoid the arduous process of estimating the multivariate normal cumulative distribution function for a large number of variables, we propose an alternate pseudo-likelihood method that improves efficiency with regards to computational time.

2.3. Pseudo Likelihood

The idea behind the pseudo maximum likelihood estimation (PMLE) method is to apply the maximum likelihood estimation (MLE) algorithm to a reduced system of likelihood equations instead of applying the MLE approach to all variables simultaneously. This methodology was used by Fieuws and Verbeke [13] to fit mixed models for multivariate longitudinal profiles by applying the MLE algorithm to pairs of variables and taking averages.

The general concept of the approach is to resolve the computational complexity of high-dimensional joint random-effects models by reducing the dimensionality of the problem. This is achieved by carrying out the estimation in two steps. In the first step, instead of maximizing the likelihood of the full joint model, all pairwise bivariate models are fitted separately. Each of these pairwise models yields estimates with the classical optimal asymptotic properties, including consistency and asymptotic normality.

In a second step, the parameters obtained by fitting the pairwise models will be combined and averaged to obtain one single estimate for each parameter in the parameter space of the full joint model. Clearly, these averages are still asymptotically normally distributed with the correct parameter value as mean. Standard errors, however, do not directly follow from the mere combination of the individual results. Therefore, an additional step is needed to correctly calculate the sampling variability of the estimates resulting from the pairwise approach.

In this section we present the pseudo-likelihood approach using the notation of the previous sections and that of Fieuws and Verbeke [13]. Suppose we have data consisting of $k$ variables measured on each of $N$ independent subjects as defined in section 2.1. The number of pairs of variables is then $K = k(k-1)/2$ (in this section $K$ denotes the number of pairs). We denote these $K$ pairs of variables by $(Y_{\cdot r}, Y_{\cdot s})$ for $r = 1, \ldots, k$ and $s = 1, \ldots, k$ with $r < s$. Consider the log-likelihood involving the pair of variables $(Y_{\cdot r}, Y_{\cdot s})$, written as

$$\ell_{rs}(\theta_{rs}) = \sum_{i=1}^{N} \ell_{irs}(\theta_{rs} \mid Y_{ir}, Y_{is}), \tag{4}$$

where $\theta_{rs}$ represents the vector of all parameters in the bivariate joint mixed model corresponding to the specific pair $(r, s)$. To simplify the notation further, let $m = r + (s-1)(s-2)/2$, so that $\ell_{rs}(\theta_{rs}) = \ell_m(\theta_m)$, where $m = 1, \ldots, K$.
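The single index $m = r + (s-1)(s-2)/2$ simply enumerates the pairs $(r, s)$ with $r < s$ in order. The short sketch below, with hypothetical function names, lists the pairs for a given number of variables and checks that the formula maps them one-to-one onto $1, \ldots, k(k-1)/2$.

```python
def pair_index(r, s):
    """Map a pair (r, s) with 1 <= r < s to the single index m."""
    if not 1 <= r < s:
        raise ValueError("pairs must satisfy 1 <= r < s")
    return r + (s - 1) * (s - 2) // 2

def all_pairs(k):
    """All pairs (r, s) with r < s among k variables, ordered by their index m."""
    return [(r, s) for s in range(2, k + 1) for r in range(1, s)]

pairs = all_pairs(4)
indices = [pair_index(r, s) for r, s in pairs]
```

For four variables this gives the six pairs (1,2), (1,3), (2,3), (1,4), (2,4), (3,4) with indices 1 through 6.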

Let $\theta$ be the stacked vector combining all pair-specific parameter vectors of the individual MLEs, i.e., $\theta = [\theta_1, \theta_2, \ldots, \theta_K]^T$. Maximization in the PMLE case then requires maximizing the pseudo-likelihood function

$$\ell(\theta) = \sum_{m=1}^{K} \ell_m(\theta_m), \tag{5}$$

which is the sum of $K$ log-likelihood functions. To maximize it, we simply need to maximize each of the $K$ summands of equation (5) separately.

The PMLE $\hat\theta$ in the vector-valued parameter case has been shown to be asymptotically normal (Geys et al. [14]). However, to obtain consistent variance estimators and to account for the correlation that may exist between the biomarkers, an appropriate adjustment is made by replacing the asymptotic covariance matrix with a robust estimator (the sandwich estimator).

The asymptotic multivariate normal distribution of the estimator for θ is given by

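$$\sqrt{N}(\hat\theta - \theta) \xrightarrow{d} MVN\left(0, A(\theta)^{-1}B(\theta)A(\theta)^{-1}\right), \tag{6}$$

where the matrix $A(\theta)^{-1}B(\theta)A(\theta)^{-1}$ consists of the variances and covariances of the estimated parameters. In the PMLE method only the block-diagonal portions of this matrix need to be estimated, i.e., the variances and covariances of MLEs computed within pairs of variables, which are by-products of the full likelihood (MLE) method. The corresponding blocks of $A(\theta)$ and $B(\theta)$ are given by

$$A_{mm} = -E\left\{\frac{\partial^2 \ell_m}{\partial\theta_m\,\partial\theta_m^T}\right\}, \qquad B_{mn} = E\left\{\frac{\partial \ell_m}{\partial\theta_m}\,\frac{\partial \ell_n}{\partial\theta_n^T}\right\},$$

for $m, n = 1, \ldots, K$, and estimates are obtained by dropping the expectations and replacing the unknown parameters by their estimates.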
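The sandwich form $A^{-1}BA^{-1}$ is easiest to see with a single scalar parameter. The sketch below is an illustration under simplifying assumptions, not the authors' SAS macro: it takes per-subject scores and second derivatives evaluated at the estimate, forms the empirical $A$ and $B$, and compares the robust variance with the naive model-based variance. The working model (a normal mean with variance fixed at 1) and the data values are hypothetical.

```python
def sandwich_variance(scores, hessians):
    """Robust variance of a scalar estimate from per-subject scores and Hessians."""
    n = len(scores)
    a = -sum(hessians) / n                 # empirical A: minus the mean second derivative
    b = sum(u * u for u in scores) / n     # empirical B: mean squared score
    return (b / (a * a)) / n               # A^{-1} B A^{-1}, scaled by 1/N

def model_based_variance(hessians):
    """Naive variance that trusts the working model: inverse of the summed information."""
    return 1.0 / (-sum(hessians))

# Working model: y_i ~ N(theta, 1). Score is (y_i - theta) and the Hessian is -1.
y = [1.0, 3.0, 5.0, 7.0]
theta_hat = sum(y) / len(y)
scores = [yi - theta_hat for yi in y]
hessians = [-1.0] * len(y)

robust = sandwich_variance(scores, hessians)
naive = model_based_variance(hessians)
```

Here the data are far more variable than the working model assumes, so the robust variance (1.25) is five times the naive one (0.25). The same logic is what corrects the standard errors when pairwise fits are combined.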

Our interest in the proposed method is to estimate $\Theta = (\beta, \eta)$, the vector of fixed effects parameters and covariance matrix parameters, which can be calculated by taking averages over all pairs: $\Theta = A\theta$, where $A$ is the matrix that calculates the desired averages (an averaging matrix, distinct from $A(\theta)$ in (6)). Each row of $A$ selects and averages all elements of $\theta$ corresponding to a specific parameter. Then $\hat\Theta = A\hat\theta$ will produce the unique parameter estimates of interest with

$$\sqrt{N}(\hat\Theta - \Theta) \sim MVN\left(0, A\,\Sigma(\hat\theta)\,A^T\right),$$

where $\Sigma(\hat\theta)$ is the covariance matrix for $\hat\theta$ obtained by using equation (6).
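The averaging step can be sketched as follows for three markers, where each marker's fixed effect is estimated twice (once in each pair that contains it). Everything here is a hypothetical illustration: the labels, the pair-specific estimates, and the covariance matrix are invented, and the function names are not from the paper. Each row of the averaging matrix places equal weights on the copies of one parameter; the combined covariance is the averaging matrix times the stacked covariance times its transpose.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def averaging_matrix(labels):
    """One row per distinct parameter, with weight 1/count on each of its copies."""
    names = sorted(set(labels))
    rows = [[1.0 / labels.count(name) if lab == name else 0.0 for lab in labels]
            for name in names]
    return names, rows

# Stacked pair-specific estimates for pairs (1,2), (1,3), (2,3); values are hypothetical.
labels = ["beta1", "beta2", "beta1", "beta3", "beta2", "beta3"]
theta_hat = [0.70, 0.50, 0.74, 0.30, 0.46, 0.34]

# Hypothetical covariance of the stacked estimates: the two copies of beta1 are correlated.
sigma = [[0.04 if i == j else 0.0 for j in range(6)] for i in range(6)]
sigma[0][2] = sigma[2][0] = 0.02

names, A = averaging_matrix(labels)
Theta_hat = [row[0] for row in matmul(A, [[t] for t in theta_hat])]
cov_Theta = matmul(matmul(A, sigma), transpose(A))
```

Because the two copies of `beta1` are positively correlated in this example, the variance of their average (0.03) is larger than it would be if the pairwise fits were treated as independent (0.02), which is why the standard errors cannot be obtained by simply combining the individual results.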

While the full multivariate method requires numerical evaluation of the multivariate normal distribution function, the pseudo-likelihood method only requires the bivariate normal distribution function, for which many software packages provide built-in functions.

2.4. Computation

We developed a SAS macro to obtain the pseudo-likelihood estimates of the model parameters by maximizing the log pseudo likelihood in (4) or (5) using the dual quasi-Newton optimization method in SAS PROC NLMIXED. The NLMIXED procedure provides improved ML estimates and, unlike other procedures, allows explicit modeling of random effects by letting users write their own likelihood function. However, NLMIXED does not have an option for adjusting standard errors, so we developed a macro that runs NLMIXED and calculates the corrected standard errors using the robust (sandwich) estimator given in (6) (the SAS macro and other code developed are available upon request from the corresponding author). Another limitation of the NLMIXED procedure is that it lacks a REPEATED statement and so has limited capacity for modeling the covariance structure of correlated data; however, when modeling longitudinal data without a high degree of serial correlation, this limitation may not be serious.

3. SIMULATION STUDY

A simulation study is conducted to explore the performance of the PMLE and compare it with the full likelihood methods. It is also used to verify that the proposed method produces unbiased estimators, to assess the accuracy of the standard errors obtained and to determine if there is any loss in efficiency compared to the full likelihood methods.

Data were generated as follows; for subject i = 1, … , N a binary variable is randomly generated using Bernoulli(0.5) and is assigned as a covariate value for each subject. For times of measurements, random numbers (similar to the GenIMS data measurement time) uniformly distributed between 1 and 5, were selected. Censored values were selected independent of time and subject after a detection limit is set to achieve a desirable proportion of left-censored data. The parameter values were selected to be close to those obtained from the GenIMS data:

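$$\beta_1 = (0.70, -0.15, -0.20, 0.03)^T \quad \text{and} \quad \beta_2 = (0.50, -0.15, 0.15, 0.05)^T.$$

Using these specifications, 1000 measurements on 200 subjects were simulated according to two different models: a model with two random effects, one random slope for each response (marker) (model 7), and a model with four random effects (model 8):

$$Y_{ijk} = \beta_{0k} + \beta_{1k}t_{ij} + \beta_{2k}X_i + \beta_{3k}t_{ij}X_i + \gamma_{1ik}t_{ij} + \epsilon_{ij}, \tag{7}$$
$$Y_{ijk} = \beta_{0k} + \beta_{1k}t_{ij} + \beta_{2k}X_i + \beta_{3k}t_{ij}X_i + \gamma_{0ik} + \gamma_{1ik}t_{ij} + \epsilon_{ij}, \tag{8}$$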

where β and γ are vectors, ε is a matrix as defined in section 2, and Xi is a binary covariate variable.

A binary variable Xi is used as a covariate and its effect over time is studied by including an interaction term in the model.
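The data-generating steps described above can be sketched in a few lines. This is a rough illustration of model (7) under stated assumptions, not the authors' simulation code: the random-slope standard deviation, the correlation between the two slopes, and the error standard deviation are hypothetical because the text does not report them, the same measurement times are used for both markers, and the detection limit is set at the empirical quantile that gives the target censoring proportion.

```python
import math
import random

# Fixed effects (intercept, time, covariate, interaction) for markers 1 and 2;
# slope signs follow the "True Value" column of the simulation tables.
BETA = {1: (0.70, -0.15, -0.20, 0.03), 2: (0.50, -0.15, 0.15, 0.05)}

def simulate_model7(n_subjects=200, n_times=5, censor_rate=0.31,
                    slope_sd=0.2, slope_corr=0.5, error_sd=0.5, seed=2016):
    """Simulate bivariate left-censored data with one random slope per marker."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        x = 1 if rng.random() < 0.5 else 0                  # Bernoulli(0.5) covariate
        times = [rng.uniform(1.0, 5.0) for _ in range(n_times)]
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated random slopes (hypothetical SD and correlation).
        slopes = {1: slope_sd * z1,
                  2: slope_sd * (slope_corr * z1 + math.sqrt(1.0 - slope_corr ** 2) * z2)}
        for k, (b0, b1, b2, b3) in BETA.items():
            for t in times:
                y = b0 + b1 * t + b2 * x + b3 * t * x + slopes[k] * t + rng.gauss(0.0, error_sd)
                rows.append({"id": i, "marker": k, "time": t, "x": x, "y": y})
    for k in BETA:
        values = sorted(r["y"] for r in rows if r["marker"] == k)
        limit = values[round(censor_rate * len(values))]    # detection limit for marker k
        for r in rows:
            if r["marker"] == k:
                r["censored"] = r["y"] < limit
                r["y_obs"] = max(r["y"], limit)             # censored values recorded at the limit
    return rows

rows = simulate_model7()
```

Setting the limit from the sorted simulated values guarantees the requested censoring proportion within each marker, independent of time and subject, which mirrors the description above.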

The proposed method is compared with two existing full likelihood methods for efficiency and accuracy. The first is the method proposed by Jacqmin-Gadda et al. [8], in which parameter estimation is carried out by maximizing the full likelihood using a Marquardt algorithm and other iterative procedures. They used a FORTRAN program called CENSAD (hence we label estimates obtained by this method ML-CENSAD in the tables of results). The second is that of Thiebaut and Jacqmin-Gadda [23], which is also a full likelihood method; the authors used the SAS NLMIXED procedure for maximization, and hence estimates obtained by this method are labeled ML-NLMIXED in the tables. Both of these methods are compared with the proposed method for computational time, efficiency, and bias.

In Table I we present an efficiency comparison of the methods by reporting a summary of computation time for each method under both models (7) and (8). Estimation using the proposed PMLE takes considerably less time than the two full likelihood based methods for both models (415 cpu seconds versus 720 and 780 for model 7, and 715 versus 1856 for model 8). The two full likelihood methods took comparable time for the model with two random effects (model 7), with ML-CENSAD converging a little faster than ML-NLMIXED. For the model with four random effects (model 8), however, ML-NLMIXED did not converge and was stopped after 3600 cpu seconds.

Table I.

Computational time comparisons by method: the two full maximum likelihood based approaches (ML-CENSAD and ML-NLMIXED) and the pseudo likelihood approach (PMLE) proposed in this study. Time is given in cpu seconds*.

| Method | Model with 2 Random Effects | Model with 4 Random Effects |
|---|---|---|
| ML-CENSAD | 720 | 1856 |
| ML-NLMIXED | 780 | Didn’t converge |
| PMLE | 415 | 715 |

* Time given is for a single run.

Generally speaking, the computation time significantly increased based on the data structure, rate of censoring, and model used for both full likelihood based methods, but the PMLE is not significantly affected by these changes. For instance, the ML-CENSAD is very slow to converge when the number of measures (ni) for each subject increases (the results reported in table I are for ni =5 ∀i) and the ML-NLMIXED has convergence difficulties when the number of random effects in the model increases (for the model with 4 random effects the method fails to converge).

In Table II we present the bias and the mean of the estimated SEs across all simulations from the full likelihood method (ML-NLMIXED) and the PMLE for the model with two random effects (model 7) at three different levels of censoring (31%, 40%, and 75%). As can be seen from this table, the proposed PMLE method produces estimates comparable to those of the full likelihood method (no significant difference) in a substantially shorter computation time.

Table II.

Simulation results comparing the performance of the PMLE approach with the ML approach at 31%, 40% and 75% censoring rates. Bias and mean SEs for the fixed effect parameters for a model with 2 random effects(model (7)) are reported. Reported Values are the mean of 500 replications.

| % Censored | Parameter | True Value | ML-NLMIXED Bias | ML-NLMIXED SE | PMLE Bias | PMLE SE |
|---|---|---|---|---|---|---|
| 31 | Time1 (β11) | −0.15 | −0.006 | 0.034 | 0.004 | 0.033 |
| 31 | Covar1 (β21) | −0.20 | 0.001 | 0.043 | 0.001 | 0.043 |
| 31 | Inter1 (β31) | 0.03 | 0.009 | 0.047 | −0.009 | 0.047 |
| 31 | Time2 (β12) | −0.15 | −0.024 | 0.032 | 0.028 | 0.035 |
| 31 | Covar2 (β22) | 0.15 | −0.018 | 0.043 | 0.018 | 0.042 |
| 31 | Inter2 (β32) | 0.05 | −0.006 | 0.049 | −0.008 | 0.049 |
| 40 | Time1 (β11) | −0.15 | −0.008 | 0.064 | 0.004 | 0.035 |
| 40 | Covar1 (β21) | −0.20 | 0.006 | 0.085 | 0.007 | 0.063 |
| 40 | Inter1 (β31) | 0.03 | 0.086 | 0.065 | −0.054 | 0.077 |
| 40 | Time2 (β12) | −0.15 | −0.075 | 0.072 | 0.048 | 0.045 |
| 40 | Covar2 (β22) | 0.15 | −0.069 | 0.075 | 0.038 | 0.053 |
| 40 | Inter2 (β32) | 0.05 | −0.057 | 0.081 | −0.018 | 0.052 |
| 75 | Time1 (β11) | −0.15 | 0.009 | 0.049 | −0.020 | 0.034 |
| 75 | Covar1 (β21) | −0.20 | 0.091 | 0.064 | 0.0033 | 0.053 |
| 75 | Inter1 (β31) | 0.03 | 0.114 | 0.056 | 0.048 | 0.062 |
| 75 | Time2 (β12) | −0.15 | −0.021 | 0.052 | −0.050 | 0.040 |
| 75 | Covar2 (β22) | 0.15 | −0.014 | 0.059 | −0.043 | 0.048 |
| 75 | Inter2 (β32) | 0.05 | −0.027 | 0.065 | −0.031 | 0.050 |

To obtain a better understanding of the strengths and weaknesses of the proposed methodology, a comparison of estimates and mean SEs across all simulations using both univariate and bivariate modeling, is carried out in a simulation study. Table III presents the results of this study by displaying the parameter estimate, and the mean SEs for two different censoring rates (25% and 75%) for the model with four random effects (model 8). The estimated mean SE and the bias based on the bivariate modeling generally tends to be smaller compared to results of the univariate model.

Table III.

Estimates and mean SEs comparison using the fixed effect parameters for the univariate and bivariate models for a model with 4 random effects. PMLE is used for the bivariate model. Values reported are for the mean of 500 replications.

| % Censored | Parameter | True Value | Univariate Estimate | Univariate SE | Univariate 95% Cov* | Bivariate PMLE Estimate | Bivariate PMLE SE | Bivariate PMLE 95% Cov* |
|---|---|---|---|---|---|---|---|---|
| 25 | Time1 (β11) | −0.15 | −0.135 | 0.001 | 94.5 | −0.120 | 0.045 | 93.7 |
| 25 | Covariate1 (β21) | −0.20 | −0.196 | 0.040 | 94.8 | −0.207 | 0.032 | 94.9 |
| 25 | Interaction1 (β31) | 0.03 | 0.019 | 0.013 | 95.1 | 0.043 | 0.013 | 97.2 |
| 25 | Time2 (β12) | −0.15 | −0.148 | 0.176 | 95.3 | −0.157 | 0.104 | 95.1 |
| 25 | Covariate2 (β22) | 0.15 | 0.176 | 0.026 | 95.3 | 0.163 | 0.019 | 95.8 |
| 25 | Interaction2 (β32) | 0.05 | 0.019 | 0.108 | 96.0 | 0.036 | 0.102 | 95.6 |
| 75 | Time1 (β11) | −0.15 | −0.142 | 0.124 | 95.2 | −0.160 | 0.115 | 95.7 |
| 75 | Covariate1 (β21) | −0.20 | −0.359 | 0.241 | 94.6 | −0.293 | 0.220 | 95.2 |
| 75 | Interaction1 (β31) | 0.03 | 0.118 | 0.160 | 93.9 | 0.031 | 0.148 | 94.7 |
| 75 | Time2 (β12) | −0.15 | −0.192 | 0.121 | 96.3 | −0.184 | 0.117 | 96.5 |
| 75 | Covariate2 (β22) | 0.15 | 0.128 | 0.187 | 95.3 | 0.156 | 0.161 | 95.8 |
| 75 | Interaction2 (β32) | 0.05 | 0.047 | 0.128 | 96.1 | 0.045 | 0.118 | 95.6 |

* 95% Coverage
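The summary quantities reported in the simulation tables (bias, mean SE, and 95% coverage) can be computed from per-replication results as in the sketch below. The function name and the four example replications are hypothetical, and the coverage definition (a Wald interval of estimate ± 1.96 SE containing the true value) is an assumption, since the text does not spell out how coverage was computed.

```python
def summarize(estimates, std_errors, true_value, z=1.96):
    """Return (bias, mean SE, percent coverage) across simulation replications."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mean_se = sum(std_errors) / n
    # A replication "covers" if the Wald interval estimate +/- z*SE contains the truth.
    covered = sum(1 for est, se in zip(estimates, std_errors)
                  if abs(est - true_value) <= z * se)
    return bias, mean_se, 100.0 * covered / n

# Four hypothetical replications for a parameter whose true value is -0.15.
bias, mean_se, coverage = summarize([-0.14, -0.16, -0.15, -0.05],
                                    [0.03, 0.03, 0.03, 0.03], -0.15)
```

In this toy example three of the four intervals contain the true value, so the coverage is 75%; with 500 replications, as in the tables, values near 95% indicate well-calibrated standard errors.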

4. REAL DATA EXAMPLE

One major aim of the Genetic and Inflammatory Markers for Sepsis (GenIMS) study [2] was to examine the relationship between a set of inflammatory markers and the development of severe sepsis and mortality, that is, to determine whether changes in these markers over time were related to mortality and/or the development of sepsis. Because sepsis can result from multiple illnesses, the study focused on recruiting patients with community-acquired pneumonia (CAP) in order to ensure a relatively homogeneous group with respect to susceptibility to sepsis. A total of 2320 patients were enrolled into the study through the emergency departments of 28 hospitals (2001–2003). Blood samples for cytokine assays were drawn daily for the first seven days and weekly thereafter while patients remained in the hospital. The biomarkers of greatest interest are IL-6 (interleukin-6), a pro-inflammatory marker, and IL-10 (interleukin-10), an anti-inflammatory marker. The assay for each of these markers had a lower limit of detection, resulting in censoring of the measurements: IL-6 was censored at either 2 pg/ml or 5 pg/ml depending on the assay used, and IL-10 was censored at 5 pg/ml.

Besides left censored values the GenIMS data include missing data as well as right censored values. The data cohort distinguishes missing data according to the reason of missingness (due to limit of detection and other types).

The censoring rate for both IL-6 and IL-10 increases over time. For IL-6 these rates for days 1 through 7 were as follows: 384/1797 or 21.4% for day1, 401/1738 or 23.1% for day 2, 464/1754 or 26.5% for day 3, 474/1463 or 32.4% for day 4, 364/1127 or 32.3% for day 5, 288/869 or 33.1% for day 6 and 229/696 or 32.9% for day 7. The censoring rates for IL-10 were substantially higher with the results for days 1 through 7 as follows: 1086/1797 or 60.4% for day 1, 1138/1738 or 65.5% for day 2, 1281/1754 or 73.0% for day 3, 1128/1463 or 77.1% for day 4, 844/1127 or 74.9% for day 5, 670/869 or 77.1% for day 6 and 532/696 or 76.4% for day 7. Overall 9283 (49.1%) measures of the combined bivariate cases were left-censored.
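The daily and overall censoring rates quoted above follow directly from the reported counts, as the short check below shows. The variable and function names are ours; the counts are copied from the paragraph, and the overall denominator assumes that each sample contributes one IL-6 and one IL-10 measurement.

```python
# Samples available on days 1 through 7, and left-censored counts per marker.
N_SAMPLES = [1797, 1738, 1754, 1463, 1127, 869, 696]
CENSORED = {
    "IL-6": [384, 401, 464, 474, 364, 288, 229],
    "IL-10": [1086, 1138, 1281, 1128, 844, 670, 532],
}

def daily_rates(censored, totals):
    """Percent censored on each day, rounded to one decimal place."""
    return [round(100.0 * c / n, 1) for c, n in zip(censored, totals)]

rates = {marker: daily_rates(counts, N_SAMPLES) for marker, counts in CENSORED.items()}

# Overall rate across both markers (two measurements per sample).
total_censored = sum(sum(counts) for counts in CENSORED.values())
total_measures = 2 * sum(N_SAMPLES)
overall_rate = round(100.0 * total_censored / total_measures, 1)
```

The totals reproduce the 9283 censored measures and the 49.1% overall rate reported in the text.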

To examine the relationship between these markers and mortality over time, we fit a linear mixed model with random intercept and random slope for each biomarker. Before applying the methods a normalizing transformation is considered to assure normality, and measurements are transformed using the log transformation function. The models used are:

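$$Y_{ij}^{1} = \beta_{0}^{1} + \beta_{1}^{1}t_{ij} + \beta_{2}^{1}\,\mathrm{Mortality}_i + \beta_{3}^{1}(t_{ij} \times \mathrm{Mortality}_i) + \gamma_{0i}^{1} + \gamma_{1i}^{1}t_{ij} + \epsilon_{ij}, \text{ and}$$
$$Y_{ij}^{2} = \beta_{0}^{2} + \beta_{1}^{2}t_{ij} + \beta_{2}^{2}\,\mathrm{Mortality}_i + \beta_{3}^{2}(t_{ij} \times \mathrm{Mortality}_i) + \gamma_{0i}^{2} + \gamma_{1i}^{2}t_{ij} + \epsilon_{ij},$$

where the superscripts 1 and 2 represent the two biomarkers IL-6 and IL-10, respectively, and Mortality is the day-30 mortality status, a 0/1 variable indicating whether a patient was dead or alive within 30 days of enrollment. The proposed PMLE and the ML-CENSAD methods are used to estimate the model parameters. The ML-NLMIXED method is not used here, as it does not converge for a model with more than two random effects.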

Additionally the above models were also fit for each outcome separately providing a univariate analysis and results were compared with those obtained using the multivariate analysis.

The results from these analyses looking at the relationship between 30-day mortality and changes in IL-6 and IL-10 are presented in Table IV. When comparing the estimates obtained using the full likelihood method (ML-CENSAD) and the PMLE, the PMLE produced similar estimates (no significant difference) simplifying the computational complexities of the full likelihood methods. The standard errors of the PMLE estimates are slightly larger, but not significantly different than the values in ML-CENSAD.

Table IV.

Parameter estimates and S.E. of fixed and Random effects according to method (ML-CENSAD and PMLE) and model (Univariate Vs. Bivariate) used. Responses are log(IL-6) and log(IL-10) and time is measured in days

Fixed effects: estimate (SE)

| Parameter | Univariate Models | Bivariate Model: ML-CENSAD | Bivariate Model: PMLE |
|---|---|---|---|
| Slope time-IL-6 | −0.329 (0.160) | −0.234 (0.009) | −0.274 (0.012) |
| Slope time-IL-10 | −3.279 (0.616) | −3.096 (0.655) | −3.094 (0.665) |
| Mortality-IL-6 | 1.277 (0.228) | 1.194 (0.176) | 1.194 (0.181) |
| Mortality-IL-10 | 0.496 (0.267) | 0.532 (0.127) | 0.532 (0.133) |
| Mortality*time-IL-6 | 0.160 (0.349) | 0.660 (0.250) | 0.664 (0.258) |
| Mortality*time-IL-10 | 0.036 (0.125) | 0.704 (0.100) | 0.695 (0.107) |

Random effects covariance estimates

| Parameter | Estimate | SE | P-value |
|---|---|---|---|
| G11 | 3.425 | 0.175 | < 0.001 |
| G21 | −0.433 | 0.035 | < 0.001 |
| G22 | 0.221 | 0.014 | < 0.001 |
| G31 | 0.742 | 0.063 | < 0.001 |
| G32 | −0.148 | 0.015 | < 0.001 |
| G33 | 0.693 | 0.031 | 0.017 |
| G41 | −0.148 | 0.012 | < 0.001 |
| G42 | 0.042 | 0.004 | < 0.001 |
| G43 | −0.124 | 0.005 | 0.001 |
| G44 | 0.025 | 0.001 | 0.021 |

The joint modeling of the two biomarkers using a bivariate model allows us to study the correlation between the markers over time, which can be of importance when understanding the role of biomarkers in the development of sepsis. Table IV presents the parameter estimates of the random effects covariance matrix. The elements of the sub-matrix of covariances between the random effects of the two markers are all significantly different from zero, justifying the use of the bivariate model and showing the gain in efficiency from accounting for the correlation between the markers. Moreover, the use of the bivariate model resulted in a different relationship between mortality and IL-6 values when compared to modeling IL-6 by a univariate model (Fig. 1). In addition, the statistical significance of the interaction term for mortality and time for IL-10 differs depending on the model used: it is non-significant for the univariate model but highly significant for the bivariate model.
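The contrast in significance for the IL-10 mortality-by-time interaction can be checked roughly from the estimates and SEs in Table IV. The sketch below uses a two-sided Wald z-test, which is an assumption on our part since the text does not state which test the reported significance is based on; the function name is hypothetical.

```python
import math

def wald_test(estimate, std_error):
    """Two-sided Wald z statistic and normal-approximation p-value."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Mortality*time for IL-10 from Table IV: univariate model and bivariate PMLE.
z_uni, p_uni = wald_test(0.036, 0.125)
z_biv, p_biv = wald_test(0.695, 0.107)
```

Under this approximation the univariate z statistic is about 0.29 and the bivariate one about 6.5, which is consistent with the statement that the interaction is non-significant in the univariate model but highly significant in the bivariate model.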

Figure 1.

Estimated model means of log(IL-6) for the GenIMS data, using the PMLE parameter estimation by mortality status (dead or alive) and model used (univariate or bivariate).

Similar to the simulation results, estimated coefficients and SEs are generally lower based on the bivariate modeling compared to separate univariate models; this may indicate the contribution of information provided by the second marker in the bivariate model to the estimation for the first marker. However, some of the contrasts between the estimated coefficients and SEs for the univariate and bivariate models in Table IV are larger than what is observed in the simulation study. This could be due to the large difference in the censoring rates of the two outcome variables (27.4% and 70.5% for IL-6 and IL-10, respectively) when they are analyzed separately. In general, the degree and direction of the estimate (or bias) in the fixed effects depend on the relative rate of censoring.

Plots of the estimated means obtained from the models using PMLE are presented in Fig. 1 and Fig. 2. Fig. 1 presents the estimated mean IL-6 for both dead and alive patients using a bivariate model (considering IL-6 and IL-10) and a univariate model considering only IL-6 as an outcome. The estimated mean IL-6 increases from day 1 over the subsequent days using the bivariate model, while the level decreases using the univariate model. Higher estimated mean levels were observed for patients who died than for patients who survived under both models.

Figure 2.

Estimated model means of log(IL-10) for the GenIMS data, using the PMLE parameter estimation by mortality status (dead or alive) and model used (univariate or bivariate).

The estimated mean IL-10 levels (Fig. 2) were high on day 1 under both models, but a rapid decrease was observed on the subsequent days. There is no significant difference in the estimated mean IL-10 levels between patients who died and those who survived in the single outcome analysis, but the multiple outcome analysis (using both IL-10 and IL-6 simultaneously) indicated that the estimated mean IL-10 level is significantly higher for those who died than for those who survived. In all cases, the computation time for the PMLE method is substantially less than the time needed for ML-CENSAD.

5. DISCUSSION

In this paper, we proposed and evaluated a pseudo maximum likelihood estimator for the analysis of multivariate longitudinal left-censored data that simplifies computational complexities and can be applied to different models and different data structures. The major advantage of the pseudo likelihood estimator is its computational efficiency and simplicity when compared with the full likelihood method currently used for modeling multivariate longitudinal data. The proposed method considerably eases the numerical complexities of the full likelihood approach. Further, it alleviates the need to specify and estimate many nuisance parameters that are required in a full likelihood approach. As is demonstrated by the simulation and the real life data studies, the pseudo likelihood approach yields estimates with small bias and robust standard errors.

For longitudinal data with a high rate of censoring, such as the GenIMS data, the pseudo likelihood method dramatically decreases the computation time and provides estimates similar to those obtained using the full likelihood methods. The full likelihood methods are limited by the rate of censoring because they require numerical evaluation of multiple integrals of a multivariate normal density whose dimension equals the number of censored measures. In contrast, the pseudo likelihood approach avoids numerical evaluation of multivariate integrals, since filling in censored observations requires computation of the univariate normal distribution function, for which efficient numerical algorithms are available.
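To illustrate why a univariate fill-in step is computationally light, the following minimal sketch computes the conditional mean of a normally distributed measurement given that it falls below a detection limit, E[Y | Y < c] = mu - sigma * phi(z) / Phi(z) with z = (c - mu) / sigma. This is the standard truncated-normal result and needs only the univariate normal density and distribution function. Whether it matches the exact fill-in rule of the proposed estimator is an assumption, and the function and argument names (censored_mean, mu, sigma, limit) are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function Phi(z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def censored_mean(mu, sigma, limit):
    """Return E[Y | Y < limit] for Y ~ N(mu, sigma^2).

    Illustrative fill-in value for a left-censored observation; an assumed
    rule for this sketch, not necessarily the one used in the paper.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (limit - mu) / sigma
    p = normal_cdf(z)  # probability of falling below the detection limit
    if p == 0.0:
        raise ValueError("detection limit is too far below the mean to evaluate")
    return mu - sigma * normal_pdf(z) / p


# Example: a standard normal marker censored at its own mean.
fill_in = censored_mean(0.0, 1.0, 0.0)
```

In this example the fill-in value equals -sqrt(2/pi), approximately -0.798. Each censored measurement costs one density and one distribution function evaluation, regardless of how many measurements on the same subject are censored.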

Unlike the full likelihood, the pseudo-likelihood only requires specification of the correlation structure of the repeated measures on a single individual. Further, whereas maximum likelihood requires the full likelihood to be correctly specified in order to obtain consistent estimates, the pseudo-likelihood estimates are consistent as long as the marginal distributions are correctly specified. Note that a Bayesian approach using a Bayesian generalized method of moments could also be used to address the underlying problem.

ACKNOWLEDGEMENTS

We thank Dr. Derek Angus and the CRISMA laboratory for access to the GenIMS data. The GenIMS study was funded by grant R01 GM61992 from the National Institute of General Medical Sciences.

The authors would also like to thank the associate editor and two anonymous reviewers for their careful reading of the manuscript and their insightful comments that greatly contributed to improving the paper to its current version.

Footnotes

This article reflects the views of the authors and should not be construed to represent FDA’s views or policies.

REFERENCES

  • 1. Thiebaut R, Jacqmin-Gadda H, Leport C, et al. Bivariate longitudinal model for the analysis of the evolution of HIV RNA and CD4 cell count in HIV infection taking into account left censoring of HIV RNA measures. J. Biopharm. Stat. 2003;13:271–282. doi: 10.1081/BIP-120019271.
  • 2. Kellum J, Kong L, Fink M, Weissfeld L, Yealy D, Pinsky M, Fine J, Krichevsky A, Delude R, Angus D. Understanding the inflammatory cytokine response in pneumonia and sepsis: results of the Genetic and Inflammatory Markers of Sepsis (GenIMS) Study. Archives of Internal Medicine. 2007;167:1655–1663. doi: 10.1001/archinte.167.15.1655.
  • 3. Keet I, Janssen M, Veugelers P, Miedema F, Klein M, Goudsmit J, Coutinho R, de Wolf F. Longitudinal analysis of CD4 T cell counts, T cell reactivity, and human immunodeficiency virus type 1 RNA levels in persons remaining AIDS-free despite CD4 cell counts less than 200 for more than 5 years. J. Infect. Dis. 1997;176:665–671. doi: 10.1086/514088.
  • 4. O’Brien TR, Rosenberg PS, Yellin F, Goedert J. Longitudinal HIV-1 RNA levels in a cohort of homosexual men. J. AIDS. 1998;18:155–161. doi: 10.1097/00042560-199806010-00007.
  • 5. Paxton W, Coombs R, McElrath J, Keefers M, Sinangil F, Williams B, Chernoff D, Hughes J, Corey L. Longitudinal analysis of quantitative virologic measures in HIV-1 infected individuals with greater than 400 CD4+ cells/microliter. J. Infect. Dis. 1997;175:247–254. doi: 10.1093/infdis/175.2.247.
  • 6. Ghebregiorgis G, Weissfeld L. Analysis of Longitudinally measured left-censored incomplete biomarker of severe sepsis with dropout and death. Joint Statistical Meeting Proceedings. 2007.
  • 7. Hughes JP. Mixed effects models with censored data with application to HIV RNA levels. Biometrics. 1999;55:625–629. doi: 10.1111/j.0006-341x.1999.00625.x.
  • 8. Jacqmin-Gadda H, Thiebaut R, Chene G, Commenges D. Analysis of left-censored longitudinal data with application to viral load in HIV infection. Biostatistics. 2000;1:355–368. doi: 10.1093/biostatistics/1.4.355.
  • 9. Lyles RH, Lyles CM, Taylor DJ. Random regression models for human immunodeficiency virus ribonucleic acid data subject to left censoring and informative drop-outs. J. R. Stat. Soc. C. 2000;49:485–497.
  • 10. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  • 11. Schafer JL. Imputation of missing covariates under a multivariate linear mixed model. Technical report. 1997:97–104.
  • 12. Bandyopadhyay S, Ganguli B, Chatterjee A. A review of multivariate longitudinal data analysis. Statistical Methods in Medical Research. 2011;20:299–330. doi: 10.1177/0962280209340191.
  • 13. Fieuws S, Verbeke G. Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics. 2006;62:424–431. doi: 10.1111/j.1541-0420.2006.00507.x.
  • 14. Geys H, Molenberghs G, Ryan L. Pseudo-likelihood inference for clustered binary data. Communications in Statistics: Theory and Methods. 1997;26:2743–2767.
  • 15. Zhongxin N, Heliang F. Moment method estimation based on censored sample. Journal of Systems Science and Complexity. 2005;18:254–264.
  • 16. Fahs F, Mittelhammer R, Yoder J. Generalized method of truncated moments estimation of censored equation systems. Washington State University Working Paper Series. 2013.
  • 17. Lee M, Kong L, Weissfeld L. Multiple imputation for left-censored biomarker data based on Gibbs sampling method. Statistics in Medicine. 2012;31:1838–1848. doi: 10.1002/sim.4503.
  • 18. Lynn HS. Maximum likelihood inference for left-censored HIV RNA data. Statistics in Medicine. 2001;20:33–45. doi: 10.1002/1097-0258(20010115)20:1<33::aid-sim640>3.0.co;2-o.
  • 19. Paxton W, Coombs R, McElrath M, et al. Longitudinal analysis of quantitative virologic measures in human immunodeficiency virus infected subjects with ≥ 400 CD4 lymphocytes: implications for applying measurements to individual patients. National Institute of Allergy and Infectious Diseases AIDS Vaccine Evaluation Group. Journal of Infectious Diseases. 1997;175:247–254. doi: 10.1093/infdis/175.2.247.
  • 20. Lotz A, Kendzia B, Gawrych K, Lehnert M, Bruning T, Pesch B. Statistical methods for the analysis of left-censored variables. GMS Medizinische Informatik, Biometrie und Epidemiologie. 2013;9. ISSN 1860-9171.
  • 21. Gong G, Samaniego F. Pseudo Maximum Likelihood Estimation: Theory and Application. The Annals of Statistics. 1981;9:861–869.
  • 22. White H. Maximum Likelihood Estimation of Misspecified Models. Econometrica. 1982;50:1–25.
  • 23. Thiebaut R, Jacqmin-Gadda H, Chene G, Leport C, Commenges D. Bivariate linear mixed models using SAS Proc MIXED. Comput. Methods Programs Biomed. 2002;69:249–256. doi: 10.1016/s0169-2607(02)00017-2.
  • 24. Thiebaut R, Jacqmin-Gadda H. Mixed models for longitudinal left-censored repeated measures. Comput. Methods Programs Biomed. 2004;74:255–260. doi: 10.1016/j.cmpb.2003.08.004.
  • 25. Jennrich RI. Asymptotic properties of non-linear least squares estimators. The Annals of Mathematical Statistics. 1969;40(2):633–643.
  • 26. Billingsley P. Probability and Measure. 3rd ed. Wiley; NY: 1995.
  • 27. Lubin JH, Colt JS, Camann D, Davis S, Cerhan J, Severson RK. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ Health Perspect. 2004;112:1691–1696. doi: 10.1289/ehp.7199.
  • 28. Casella G, Berger R. Statistical Inference. 2nd ed. Thomson Learning; Pacific Grove, CA: 2002.
  • 29. Jaffa MA, Gebregziabher M, Jaffa AA. A Joint Modeling Approach for Right Censored High Dimensional Multivariate Longitudinal Data. Journal of Biometrics and Biostatistics. 2014;5(4):1000203. doi: 10.4172/2155-6180.1000203.
  • 30. Blanche P, Proust-Lima C, Loubre L, Berr C, Dartigues JF, Jacqmin-Gadda H. Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics. 2015;71(1):102–113. doi: 10.1111/biom.12232.
