Author manuscript; available in PMC: 2017 Oct 1.
Published in final edited form as: Stat Biosci. 2016 Jan 22;8(2):220–233. doi: 10.1007/s12561-015-9140-x

Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes

Daowen Zhang 1,*, Jie Lena Sun 2, Karen Pieper 2
PMCID: PMC5061463  NIHMSID: NIHMS754386  PMID: 27746847

Abstract

Linear mixed effects models are widely used to analyze a clustered response variable. Motivated by a recent study to examine and compare the hospital length of stay (LOS) between patients undergoing percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) from several international clinical trials, we propose a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOS’s, where each clinical trial is considered a cluster. Due to the large number of patients in some trials, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run since it could not allocate enough memory to invert large dimensional matrices during the optimization process. We consider ways to circumvent the computational problem in the maximum likelihood (ML) inference and restricted maximum likelihood (REML) inference. In particular, we develop an expectation-maximization (EM) algorithm for the REML inference and present an ML implementation using existing software. The new REML EM algorithm is easy to implement and computationally stable and efficient. With this REML EM algorithm, we were able to analyze the LOS data and obtain meaningful results.

Keywords: Meta Analysis, Missing Data, Multi-center Studies

1 Introduction

For patients with coronary artery diseases, percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) are two common procedures recommended by physicians. Among many other factors, the hospital length of stay (LOS) is an important one for evaluating the cost effectiveness of each procedure. While some patients receiving a PCI are discharged from the hospital within 24 hours after the procedure, many PCI patients require a longer observation period. Compared to a PCI, the CABG is a much more invasive procedure, and patients with a CABG usually require a longer hospital stay. Because of the difference in patient care and management across the world, there may be a regional difference in hospital LOS’s of these two procedures.

Recently, we had a unique opportunity to examine the hospital LOS’s of PCI and CABG procedures from 10 international clinical trials on patients with coronary artery diseases (see Section 7 for more details). One of the research objectives is to quantify the expected LOS’s of the CABG and PCI procedures, and their differences, after adjusting for patients’ age, smoking and diabetes statuses, for seven different representative regions across the world, while taking into account the trial-to-trial variation. A common approach used to analyze clustered data such as the LOS data is a linear mixed effects model, where random effects are used to model the cluster effects of trials or trial-to-trial variation. However, empirical evidence indicated that CABG and PCI LOS’s exhibit very different within-trial (residual) variation, which has to be taken into account in the analysis for optimal inference. We hence propose a bivariate linear mixed model for jointly modeling CABG and PCI LOS’s. Due to the large number of patients in some trials, the commonly used statistical software for fitting linear mixed models, the MIXED procedure of SAS (SAS/STAT 9.3, 2013), failed to run since it could not allocate enough memory to invert large dimensional matrices. This computational problem motivates the research of this paper.

In this paper, we consider ways to circumvent the computational problem in the maximum likelihood (ML) inference and the restricted maximum likelihood (REML) inference for the proposed bivariate linear mixed model. For the ML inference, we present an implementation using existing software. For the REML inference, we propose a novel expectation-maximization (EM) algorithm (Dempster, Laird, and Rubin, 1977) and prove its theoretical statistical property. The new REML EM algorithm is easy to implement and computationally stable and efficient. With this REML EM algorithm, we were able to analyze the LOS data and obtain meaningful results.

The paper is organized as follows. In Section 2, we describe the proposed bivariate linear mixed model. In Section 3, we discuss the computational issues and solution in the inference on fixed effects and random effects with large cluster sizes. In Section 4, we discuss the ML inference on variance/covariance components and in Section 5, we discuss their REML inference and present an EM algorithm for the REML inference. We present simulation results in Section 6. The proposed REML EM algorithm was used to fit a bivariate mixed model for the LOS data in Section 7. We conclude the paper in Section 8 with some discussion.

2 Models

Suppose our sample consists of m clusters with 2 response variables. In cluster i, we have observations for response variable $Y_1$ from $n_{1i}$ subjects and response variable $Y_2$ from $n_{2i}$ subjects, so that the size of cluster i is $n_i = n_{1i} + n_{2i}$. At the same time, some important covariates were also measured for each subject. We consider the following Laird-Ware (Laird and Ware, 1982) linear mixed model for each response variable:

$$
\begin{aligned}
Y_{1ij} &= X_{ij}^T \beta_1 + Z_{ij}^T b_{1i} + e_{1ij}, \quad i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n_{1i}, \\
Y_{2ij} &= X_{i,n_{1i}+j}^T \beta_2 + Z_{i,n_{1i}+j}^T b_{2i} + e_{2ij}, \quad i = 1, 2, \ldots, m,\ j = 1, \ldots, n_{2i},
\end{aligned}
\tag{1}
$$

where $X_{ij}$ (usually including the intercept) is the $\tilde{p}$-dimensional vector of covariates for fixed effects $\beta_1$ and $\beta_2$, and $Z_{ij}$, usually a subset of $X_{ij}$, is the $\tilde{q}$-dimensional vector of covariates for the cluster-specific random effects $b_{1i}$ and $b_{2i}$. It is assumed that the random effects vectors $b_{1i}$ and $b_{2i}$ are independent across clusters and have normal distributions with mean zero, variance matrices $D_{11}$ and $D_{22}$, and covariance matrix $D_{12}$. We assume in this paper that the variance matrix $D$ formed by $D_{11}$, $D_{12}$ and $D_{22}$ is a positive definite and unstructured matrix, an assumption commonly used in the statistical literature. It is further assumed that the $e_{1ij}$’s and $e_{2ij}$’s are independent residual errors, independent of $b_{1i}$ and $b_{2i}$, and distributed as $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$ respectively.

Model (1) is particularly applicable to data from multi-center studies where each center is considered a cluster, or to data analyzed in a meta-analysis when the data from each individual study are available, in which case each study is considered a cluster. By modeling response variables $Y_1$ and $Y_2$ jointly as in (1), we can answer many important scientific questions. For the hospital LOS data example, if we take $X_{ij} = Z_{ij} = 1$, we can then compare the overall LOS’s from CABG and PCI procedures by comparing $\beta_1$ and $\beta_2$. When we include dummy variables for different representative regions in the world as well as adjusting covariates, we can then estimate the covariate-adjusted CABG LOS’s and PCI LOS’s, and hence their differences for those regions.

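Model (1) can be re-written as a linear mixed effects model for a single response variable. For cluster i, denote by $Y_{1i} = (Y_{1i1}, Y_{1i2}, \ldots, Y_{1in_{1i}})^T$ the data vector of response variable 1 and by $Y_{2i} = (Y_{2i1}, Y_{2i2}, \ldots, Y_{2in_{2i}})^T$ that of response variable 2. Similarly we can define the residual error vectors $e_{1i}$ and $e_{2i}$. Stack $X_{ij}^T$ ($j = 1, 2, \ldots, n_{1i}$) to form $X_{1i}$ and $X_{i,n_{1i}+j}^T$ ($j = 1, 2, \ldots, n_{2i}$) to form $X_{2i}$. Similarly we can form $Z_{1i}$ and $Z_{2i}$. We then define the new response vector $Y_i = (Y_{1i}^T, Y_{2i}^T)^T$, covariate matrices $X_i = \mathrm{diag}\{X_{1i}, X_{2i}\}$ and $Z_i = \mathrm{diag}\{Z_{1i}, Z_{2i}\}$, and residual vector $e_i = (e_{1i}^T, e_{2i}^T)^T$. In matrix notation, model (1) can be re-written as a regular linear mixed effects model for the new response vector $Y_i$:

$$
Y_i = X_i \beta + Z_i b_i + e_i, \quad i = 1, 2, \ldots, m,
\tag{2}
$$

where $\beta = (\beta_1^T, \beta_2^T)^T$ is the $p \times 1$ ($p = 2\tilde{p}$) new fixed effects vector, $b_i = (b_{1i}^T, b_{2i}^T)^T$ is the $q \times 1$ ($q = 2\tilde{q}$) new random effects vector distributed as $N(0, D)$, and $e_i \sim N(0, R_i)$ is the new residual error vector independent of $b_i$ with $R_i = \mathrm{diag}\{\sigma_1^2 I_{n_{1i} \times n_{1i}}, \sigma_2^2 I_{n_{2i} \times n_{2i}}\}$. Here $\tilde{p}$ and $\tilde{q}$ denote the dimensions of $X_{ij}$ and $Z_{ij}$.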
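The stacking that turns model (1) into model (2) can be sketched in a few lines of NumPy. This is only an illustration: the function name `stack_cluster`, the toy sizes and the variance values are assumptions made for the example, and only the diagonal of $R_i$ is stored since $R_i$ is diagonal.

```python
import numpy as np

def stack_cluster(X1, X2, Z1, Z2, sigma1_sq, sigma2_sq):
    """Build X_i = diag{X_1i, X_2i}, Z_i = diag{Z_1i, Z_2i} and the diagonal of R_i."""
    def block_diag(A, B):
        # Place A in the upper-left block and B in the lower-right block.
        out = np.zeros((A.shape[0] + B.shape[0], A.shape[1] + B.shape[1]))
        out[:A.shape[0], :A.shape[1]] = A
        out[A.shape[0]:, A.shape[1]:] = B
        return out

    n1, n2 = X1.shape[0], X2.shape[0]
    Xi = block_diag(X1, X2)
    Zi = block_diag(Z1, Z2)
    # R_i is diagonal, so only its diagonal is kept (n_i numbers, not n_i x n_i).
    r_diag = np.concatenate([np.full(n1, sigma1_sq), np.full(n2, sigma2_sq)])
    return Xi, Zi, r_diag

# Toy cluster (hypothetical sizes): 3 subjects for response 1, 2 for response 2,
# an intercept plus one covariate as fixed effects, random intercepts only.
rng = np.random.default_rng(0)
n1, n2 = 3, 2
X1 = np.column_stack([np.ones(n1), rng.normal(size=n1)])
X2 = np.column_stack([np.ones(n2), rng.normal(size=n2)])
Z1, Z2 = np.ones((n1, 1)), np.ones((n2, 1))
Xi, Zi, r_diag = stack_cluster(X1, X2, Z1, Z2, sigma1_sq=1.0, sigma2_sq=4.0)
print(Xi.shape, Zi.shape)
```

Keeping $R_i$ as a vector rather than a matrix is the design choice that matters for large clusters, because every later formula only needs $R_i^{-1}$ applied row by row.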

The estimation and inference for linear mixed effects models have been thoroughly studied in the statistical literature for the case where cluster sizes are small to moderate, and have been implemented in many statistical software packages, such as the MIXED procedure of SAS, for routine analyses of correlated data, including clustered data. However, the cluster sizes $n_i$ in the LOS data example are so large for some clusters that general purpose software cannot allocate enough memory during the computation. In this paper, we develop estimation and inference procedures for model (2) by taking advantage of the special features of the random effects and residual errors in the model.

3 Inference on Fixed Effects β and Random Cluster Effects bi

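Stack $Y_i$ to form the response vector $Y$. Stack $X_i$ to form the design matrix $X$ for the fixed effects vector $\beta$, and define $Z = \mathrm{diag}\{Z_1, Z_2, \ldots, Z_m\}$ as the design matrix of the random effects vector $b = (b_1^T, b_2^T, \ldots, b_m^T)^T$. Given $D$, $\sigma_1^2$ and $\sigma_2^2$, the maximum likelihood estimate (MLE) of $\beta$ is given by

$$
\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y = \Big(\sum_{i=1}^m X_i^T V_i^{-1} X_i\Big)^{-1} \sum_{i=1}^m X_i^T V_i^{-1} Y_i,
\tag{3}
$$

where $V = \mathrm{var}(Y) = \mathrm{diag}\{V_1, V_2, \ldots, V_m\}$ and $V_i = \mathrm{var}(Y_i) = Z_i D Z_i^T + R_i$. For applications where $n_i \gg q$, the calculation of $\hat\beta$ can be facilitated by the following expressions

$$
V_i^{-1} = R_i^{-1} - R_i^{-1} Z_i (D^{-1} + Z_i^T R_i^{-1} Z_i)^{-1} Z_i^T R_i^{-1},
\tag{4}
$$

$$
X^T V^{-1} X = \sum_{i=1}^m X_i^T R_i^{-1} X_i - \sum_{i=1}^m X_i^T R_i^{-1} Z_i (D^{-1} + Z_i^T R_i^{-1} Z_i)^{-1} Z_i^T R_i^{-1} X_i,
\tag{5}
$$

$$
X^T V^{-1} Y = \sum_{i=1}^m X_i^T R_i^{-1} Y_i - \sum_{i=1}^m X_i^T R_i^{-1} Z_i (D^{-1} + Z_i^T R_i^{-1} Z_i)^{-1} Z_i^T R_i^{-1} Y_i.
\tag{6}
$$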
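As a hedged illustration of identity (4), the sketch below applies $V_i^{-1}$ to a matrix using only a $q \times q$ solve and checks the result against a direct solve on a deliberately tiny cluster. The helper name `vinv_times` and all numerical values are assumptions made for the example.

```python
import numpy as np

def vinv_times(M, Z, r_diag, D):
    """Return V^{-1} M for V = Z D Z^T + diag(r_diag), using identity (4).

    Only a q x q system is solved; the n x n matrix V is never formed.
    """
    RinvM = M / r_diag[:, None]                 # R^{-1} M
    RinvZ = Z / r_diag[:, None]                 # R^{-1} Z
    core = np.linalg.inv(D) + Z.T @ RinvZ       # D^{-1} + Z^T R^{-1} Z  (q x q)
    return RinvM - RinvZ @ np.linalg.solve(core, Z.T @ RinvM)

# Tiny cluster so that the direct n x n computation is available as a check.
rng = np.random.default_rng(0)
n1, n2 = 4, 3
Z = np.zeros((n1 + n2, 2))
Z[:n1, 0] = 1.0                                 # random intercept for response 1
Z[n1:, 1] = 1.0                                 # random intercept for response 2
r_diag = np.concatenate([np.full(n1, 1.0), np.full(n2, 4.0)])
D = np.array([[2.0, 1.0], [1.0, 5.0]])
X = rng.normal(size=(n1 + n2, 3))

V = Z @ D @ Z.T + np.diag(r_diag)               # formed here for checking only
direct = np.linalg.solve(V, X)
fast = vinv_times(X, Z, r_diag, D)
print(np.allclose(direct, fast))
```

Summing `X.T @ vinv_times(X, Z, r_diag, D)` over clusters gives the quantity in (5), and the same call with the response vector in place of `X` gives (6).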

Inference on the cluster-specific random effects $b_i$ can be based on the best linear unbiased predictors (BLUPs) $\hat b_i = E(b_i \mid Y_i; \beta)\big|_{\beta = \hat\beta} = D Z_i^T V_i^{-1} (Y_i - X_i \hat\beta)$, which can also be obtained by solving the so-called “mixed model equation” (Henderson 1984) jointly for $\beta$ and $b$:

$$
\begin{pmatrix} X^T R^{-1} X & X^T R^{-1} Z \\ Z^T R^{-1} X & Z^T R^{-1} Z + G^{-1} \end{pmatrix}
\begin{pmatrix} \beta \\ b \end{pmatrix}
=
\begin{pmatrix} X^T R^{-1} Y \\ Z^T R^{-1} Y \end{pmatrix},
\tag{7}
$$

where $G = \mathrm{diag}\{D, D, \ldots, D\}$ and $R = \mathrm{diag}\{R_1, R_2, \ldots, R_m\}$ are the variance-covariance matrices of $b$ and the residual vector. Note that equation (7) can also be derived as the “score equation” by treating $b$ as a parameter vector and maximizing $f(Y, b; \beta, D, \sigma_1^2, \sigma_2^2) = f(Y \mid b; \beta, \sigma_1^2, \sigma_2^2) f(b; D)$ jointly with respect to $\beta$ and $b$. There are several advantages of using equation system (7). First, the MLE $\hat\beta$ and the BLUPs can be obtained without inverting any matrix whose dimension is of the magnitude of the cluster sizes $n_i$. When the total number of clusters $m$ is small to moderate such that $p + mq$, the dimension of the equation system (7), is not too large, we can obtain the MLE $\hat\beta$ and the BLUPs by directly inverting the coefficient matrix of this system. Otherwise, the coefficient matrix can be inverted efficiently by recognizing that $Z^T R^{-1} Z + G^{-1}$ is a block diagonal matrix with the $i$th block being $Z_i^T R_i^{-1} Z_i + D^{-1}$. Second, once $\hat\beta$ and $\hat b$ are obtained, the joint inference on $\beta$ and $b$ can be made easily based on the following variance-covariance expression

$$
\mathrm{var}\begin{pmatrix} \hat\beta - \beta \\ \hat b - b \end{pmatrix}
=
\begin{pmatrix} X^T R^{-1} X & X^T R^{-1} Z \\ Z^T R^{-1} X & Z^T R^{-1} Z + G^{-1} \end{pmatrix}^{-1}.
\tag{8}
$$
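A small numerical check may help make the equivalence concrete. The sketch below solves the mixed model equation (7) for a toy random-intercept model with a single response and compares the solution with the generalized least squares form (3) and the BLUP formula. The sizes, seed and parameter values are illustrative assumptions, and the full matrix $V$ is formed only because the example is tiny.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 3, 5, 2                                # toy sizes (hypothetical)
D = np.array([[2.0]])                            # variance of the random intercept
sigma_sq = 1.5                                   # residual variance

X = np.column_stack([np.ones(m * n), rng.normal(size=m * n)])
Z = np.kron(np.eye(m), np.ones((n, 1)))          # one random intercept per cluster
Y = rng.normal(size=m * n)

# Coefficient matrix and right-hand side of the mixed model equation (7).
Rinv = np.eye(m * n) / sigma_sq
Ginv = np.kron(np.eye(m), np.linalg.inv(D))
C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ Y, Z.T @ Rinv @ Y])
sol = np.linalg.solve(C, rhs)
beta_hat, b_hat = sol[:p], sol[p:]
cov_joint = np.linalg.inv(C)                     # variance-covariance matrix (8)

# Reference values from (3) and the BLUP formula, using the full V.
G = np.kron(np.eye(m), D)
V = Z @ G @ Z.T + sigma_sq * np.eye(m * n)
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
b_blup = G @ Z.T @ Vinv @ (Y - X @ beta_gls)
print(np.allclose(beta_hat, beta_gls), np.allclose(b_hat, b_blup))
```

The system solved here has dimension $p + mq = 5$ regardless of how many observations each cluster contains, which is the point made in the text.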

4 Maximum Likelihood Estimation and Inference of $D$ and $(\sigma_1^2, \sigma_2^2)$

In the linear mixed model literature, the inference on the variance-covariance matrix $D$ of the cluster-specific random effects $b_i$ and the residual variances $(\sigma_1^2, \sigma_2^2)$ can be carried out using the maximum likelihood (ML) approach or the restricted maximum likelihood (REML) approach. We will discuss the ML implementation in this section and the REML in Section 5.

The ML approach estimates $\beta$, $D$ and $(\sigma_1^2, \sigma_2^2)$ by jointly maximizing the following log-likelihood function with respect to $\beta$, $D$ and $(\sigma_1^2, \sigma_2^2)$

$$
\ell(\beta, D, \sigma_1^2, \sigma_2^2; Y) = -\frac{1}{2} \sum_{i=1}^m \log |V_i| - \frac{1}{2} \sum_{i=1}^m (Y_i - X_i \beta)^T V_i^{-1} (Y_i - X_i \beta),
\tag{9}
$$

where $|V_i|$ can be calculated using $|V_i| = \sigma_1^{2 n_{1i}} \sigma_2^{2 n_{2i}} \, |I_{q \times q} + Z_i^T R_i^{-1} Z_i D|$. The inference on $D$ and $(\sigma_1^2, \sigma_2^2)$ can be based on the Fisher information matrix or the observed information matrix from (9).
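The determinant identity above can be checked numerically. The following sketch, with an assumed helper name `logdet_V` and toy values, computes $\log|V_i|$ from a $q \times q$ determinant and compares it with the determinant of the full matrix on a small cluster.

```python
import numpy as np

def logdet_V(Z, n1, n2, s1_sq, s2_sq, D):
    """log|V_i| from the q x q determinant |I + Z^T R^{-1} Z D|."""
    r_diag = np.concatenate([np.full(n1, s1_sq), np.full(n2, s2_sq)])
    q = D.shape[0]
    small = np.eye(q) + (Z.T @ (Z / r_diag[:, None])) @ D
    _, logdet_small = np.linalg.slogdet(small)
    # log|R_i| = n1 * log(sigma_1^2) + n2 * log(sigma_2^2)
    return n1 * np.log(s1_sq) + n2 * np.log(s2_sq) + logdet_small

# Toy cluster with bivariate random intercepts (values are illustrative).
n1, n2 = 4, 3
Z = np.zeros((n1 + n2, 2))
Z[:n1, 0] = 1.0
Z[n1:, 1] = 1.0
D = np.array([[2.0, 1.0], [1.0, 5.0]])
s1_sq, s2_sq = 1.0, 4.0

fast = logdet_V(Z, n1, n2, s1_sq, s2_sq, D)
V = Z @ D @ Z.T + np.diag(np.concatenate([np.full(n1, s1_sq), np.full(n2, s2_sq)]))
_, direct = np.linalg.slogdet(V)                 # full n x n determinant, check only
print(np.isclose(fast, direct))
```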

It is interesting to note that although software such as the MIXED procedure of SAS, routinely used to fit linear mixed model (2), may suffer from computational difficulty, software such as the NLMIXED procedure of SAS, which uses a numerical integration method for likelihood evaluation, can easily implement the likelihood inference on $\beta$, $D$ and $(\sigma_1^2, \sigma_2^2)$ without inverting large matrices. Since the $Y_{ij} \mid b_i$ are independent under the model specification, by definition, the log-likelihood function (9) can be equivalently re-written as

$$
\ell(\beta, D, \sigma_1^2, \sigma_2^2; Y) = \sum_{i=1}^m \log \int \exp\Big\{ \sum_{j=1}^{n_i} \log f(Y_{ij} \mid b_i; \beta, \sigma_1^2, \sigma_2^2) + \log f(b_i; D) \Big\} \, db_i.
\tag{10}
$$

Again by the given model specification, the exponent $\sum_{j=1}^{n_i} \log f(Y_{ij} \mid b_i; \beta, \sigma_1^2, \sigma_2^2) + \log f(b_i; D)$ inside the integral in (10) does not involve inverting any large matrix, and is a quadratic function of $b_i$ with a negative definite second derivative matrix. Therefore, the integral can be calculated exactly by the adaptive Gauss-Hermite quadrature method with only one quadrature point, which is the numerical integration method implemented in the NLMIXED procedure of SAS. Hence, this SAS procedure can be used to conduct the ML inference on $\beta$, $D$ and $(\sigma_1^2, \sigma_2^2)$ for data from large clusters.
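To spell out why a single quadrature point suffices, the standard Gaussian-integral identity can be written as follows. The symbols $h_i$, $\tilde b_i$ and $H_i$ are notation introduced here for illustration only: $h_i$ is the exponent in (10), $\tilde b_i$ is its maximizer, and $H_i$ is the negative of its second derivative matrix.

```latex
% h_i is quadratic in b_i, so its second-order expansion around the maximizer is exact:
h_i(b_i) = h_i(\tilde b_i) - \tfrac{1}{2} (b_i - \tilde b_i)^T H_i (b_i - \tilde b_i),
\qquad
H_i = Z_i^T R_i^{-1} Z_i + D^{-1}.

% Integrating the resulting Gaussian kernel over b_i in R^q gives a closed form:
\int \exp\{ h_i(b_i) \} \, db_i
  = (2\pi)^{q/2} \, |H_i|^{-1/2} \exp\{ h_i(\tilde b_i) \}.
```

The right-hand side only involves the $q \times q$ matrix $H_i$ and the value of the exponent at its mode, so no matrix of dimension $n_i$ enters the likelihood evaluation.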

Even though we can use existing software such as the NLMIXED procedure of SAS to calculate the likelihood function $\ell(\beta, D, \sigma_1^2, \sigma_2^2; Y)$, we may still encounter numerical problems since this procedure relies on numerical differentiation to calculate the required derivatives in the optimization process. When the total number of parameters in the model is small, the numerical differentiation method may work well. Otherwise, non-negligible numerical errors accumulated over the optimization process will yield poor parameter estimates or cause the optimization process not to converge. To overcome these problems, we can modify the EM algorithm of Laird and Ware (1982) for the ML estimation of a linear mixed model by treating the cluster-specific random effects $b_i$ as missing data. It is well-known that an EM algorithm is numerically stable and will always increase the observed likelihood function during the parameter update process, and that, under the given model specification, parameter updates for the MLE’s have closed form expressions without the need to invert any large dimensional matrices. Readers are referred to Laird and Ware (1982) for more details.

5 Restricted Maximum Likelihood Estimation and Inference of $D$ and $(\sigma_1^2, \sigma_2^2)$

It is well-known that the ML approach for estimating $D$ and $(\sigma_1^2, \sigma_2^2)$ presented in the previous section does not account for the estimation of the fixed effects $\beta$ and hence will produce biased estimates of $D$ and $(\sigma_1^2, \sigma_2^2)$ for small to moderate sample size $m$. This is because when $\beta$ is profiled out from the log-likelihood (9) during the maximization process, the resulting function of $D$ and $(\sigma_1^2, \sigma_2^2)$ alone is not a log-likelihood function of any (transformed) data. Therefore, the estimating equations for $D$ and $(\sigma_1^2, \sigma_2^2)$ are biased. Even though the biases in the ML estimates of $D$ and $(\sigma_1^2, \sigma_2^2)$ will disappear asymptotically, they may not be negligible for small to moderate sample size $m$, especially for the estimate of $D$. Since the MLE’s of the fixed effects $\beta$ depend on the estimates of $D$ and $(\sigma_1^2, \sigma_2^2)$, the biases in these estimates usually will in turn yield more biased estimates of $\beta$. On the other hand, the restricted maximum likelihood (REML) approach will usually yield less biased estimates of $D$ and $(\sigma_1^2, \sigma_2^2)$. For example, in a regular linear regression model, the REML estimate of the residual variance is unbiased.
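The linear regression example can be written out explicitly. In the sketch below, $n$ and $k$ denote the number of observations and of regression coefficients in an ordinary regression $Y = X\beta + e$ with $e \sim N(0, \sigma^2 I_n)$ and full-rank $X$; these symbols are used only for this illustration and are not the $n_i$ and $p$ of model (2).

```latex
% Residual sum of squares and its expectation
\mathrm{RSS} = (Y - X\hat\beta)^T (Y - X\hat\beta),
\qquad
E(\mathrm{RSS}) = (n - k)\,\sigma^2 .

% ML divides by n and is biased downward; REML divides by n - k and is unbiased
\hat\sigma^2_{\mathrm{ML}} = \frac{\mathrm{RSS}}{n},
\quad
E(\hat\sigma^2_{\mathrm{ML}}) = \frac{n - k}{n}\,\sigma^2 ;
\qquad
\hat\sigma^2_{\mathrm{REML}} = \frac{\mathrm{RSS}}{n - k},
\quad
E(\hat\sigma^2_{\mathrm{REML}}) = \sigma^2 .
```

The ML bias is of relative size $k/n$, which is the regression analogue of the bias in the estimate of $D$ when the number of clusters $m$ is small relative to the number of fixed effects.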

For the linear mixed model (2), it is well-known that the REML log-likelihood function of $D$ and $(\sigma_1^2, \sigma_2^2)$ is given by (Diggle et al. 2002)

$$
\ell_R(D, \sigma_1^2, \sigma_2^2; Y) = -\frac{1}{2} \log |X^T V^{-1} X| - \frac{1}{2} \log |V| - \frac{1}{2} (Y - X\hat\beta)^T V^{-1} (Y - X\hat\beta),
\tag{11}
$$

where $\hat\beta$ is the MLE of $\beta$ given in (3). Compared to the log-likelihood function (9), the REML log-likelihood function involves one extra term, $\log |X^T V^{-1} X|$. The expression in (5) for $X^T V^{-1} X$ can be used to calculate this extra term without directly inverting the individual variance-covariance matrices $V_i$.

Although it is computationally feasible to calculate the REML function $\ell_R(D, \sigma_1^2, \sigma_2^2; Y)$ for data with large cluster sizes, direct maximization of this function may sometimes be numerically unstable, even if the random effects dimension $q$ is moderate. Unlike the ML estimation, we cannot adapt existing software such as the NLMIXED procedure of SAS to implement the REML estimation. Because of the attractive properties of an EM algorithm, we present in the following sub-section an EM algorithm for maximizing the REML log-likelihood function $\ell_R(D, \sigma_1^2, \sigma_2^2; Y)$.

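5.1 EM Algorithm for the REML Estimation of $D$ and $(\sigma_1^2, \sigma_2^2)$

It is well-known that the REML log-likelihood function $\ell_R(D, \sigma_1^2, \sigma_2^2; Y)$ can be alternatively derived using the following formula (Harville, 1974)

$$
L_R(D, \sigma_1^2, \sigma_2^2; Y) = e^{\ell_R(D, \sigma_1^2, \sigma_2^2; Y)} = \int f(Y \mid b; \beta, \sigma_1^2, \sigma_2^2) \, f(b; D) \, d\beta \, db,
$$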

where both fixed effects β and random effects b are integrated out from the joint distribution of response Y and random effects b.

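Denote by $D^{(t)}$, $\sigma_1^{2(t)}$ and $\sigma_2^{2(t)}$ the estimates of $D$, $\sigma_1^2$ and $\sigma_2^2$ at the $t$-th iteration of the REML EM algorithm. Theorem 1 in Appendix A, the general EM algorithm for the REML estimation, indicates that the update $D^{(t+1)}$, $\sigma_1^{2(t+1)}$, $\sigma_2^{2(t+1)}$ can be obtained by maximizing the following REML Q-function with respect to $D$ and $(\sigma_1^2, \sigma_2^2)$ (written up to an additive constant that does not involve the parameters)

$$
\begin{aligned}
Q_R(D, \sigma_1^2, \sigma_2^2 \mid D^{(t)}, \sigma_1^{2(t)}, \sigma_2^{2(t)})
&= E_g \log f(Y \mid b; \beta, \sigma_1^2, \sigma_2^2) + E_g \log f(b; D) \\
&= -\frac{1}{2} \log |R| - \frac{1}{2} E_g (Y - X\beta - Zb)^T R^{-1} (Y - X\beta - Zb) \\
&\quad - \frac{m}{2} \log |D| - \frac{1}{2} \sum_{i=1}^m E_g (b_i^T D^{-1} b_i),
\end{aligned}
$$

where $E_g$ stands for the expectation taken with respect to the distribution $g(\beta, b \mid Y; D^{(t)}, \sigma_1^{2(t)}, \sigma_2^{2(t)})$ defined by

$$
g(\beta, b \mid Y; D^{(t)}, \sigma_1^{2(t)}, \sigma_2^{2(t)}) = f(Y \mid b; \beta, \sigma_1^{2(t)}, \sigma_2^{2(t)}) \, f(b; D^{(t)}) \, / \, L_R(D^{(t)}, \sigma_1^{2(t)}, \sigma_2^{2(t)}; Y).
$$

It should be noted that $\beta$ and $b$ are both treated as random variables in the distribution $g(\beta, b \mid Y; D^{(t)}, \sigma_1^{2(t)}, \sigma_2^{2(t)})$. By the given model specification, it is easy to see that this is a normal distribution with the mean given by the solution of (7) and the variance matrix given by (8), both of which are evaluated at the current estimates $(D^{(t)}, \sigma_1^{2(t)}, \sigma_2^{2(t)})$. Denote the mean vector of this distribution by $(\hat\beta^T, \hat b^T)^T$ and the variance matrix by $\Sigma$ (we suppress the obvious dependence of these quantities on $t$ for a cleaner presentation). Then maximizing $Q_R(D, \sigma_1^2, \sigma_2^2 \mid D^{(t)}, \sigma_1^{2(t)}, \sigma_2^{2(t)})$ with respect to $D$ and $(\sigma_1^2, \sigma_2^2)$ leads to the following updates:

$$
D^{(t+1)} = \frac{1}{m} \sum_{i=1}^m \{ \hat b_i \hat b_i^T + \Sigma_i \},
$$

$$
\sigma_1^{2(t+1)} = \frac{1}{n_1} \Big[ \sum_{i=1}^m (Y_{1i} - X_{1i} \hat\beta_1 - Z_{1i} \hat b_{1i})^T (Y_{1i} - X_{1i} \hat\beta_1 - Z_{1i} \hat b_{1i}) + \mathrm{tr}\{ \Sigma W_1^T W_1 \} \Big],
$$

$$
\sigma_2^{2(t+1)} = \frac{1}{n_2} \Big[ \sum_{i=1}^m (Y_{2i} - X_{2i} \hat\beta_2 - Z_{2i} \hat b_{2i})^T (Y_{2i} - X_{2i} \hat\beta_2 - Z_{2i} \hat b_{2i}) + \mathrm{tr}\{ \Sigma W_2^T W_2 \} \Big],
$$

where $\Sigma_i$ is the block of $\Sigma$ corresponding to the random effects $b_i$, $W_1$ and $W_2$ are the sub-matrices of $(X, Z)$ corresponding to responses $Y_1$ and $Y_2$ respectively, and $n_1 = \sum_{i=1}^m n_{1i}$ and $n_2 = \sum_{i=1}^m n_{2i}$ are the total numbers of patients for responses $Y_1$ and $Y_2$. We can take advantage of the special structure of $\Sigma$ described in Section 3 to facilitate the calculation of the updates $D^{(t+1)}$, $\sigma_1^{2(t+1)}$ and $\sigma_2^{2(t+1)}$.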

It is clear that this EM algorithm for the REML estimation is easy to implement since it yields explicit closed-form parameter updates at each iteration. It is also computationally efficient since it does not involve inverting any matrix whose dimension is of the magnitude of the cluster sizes. Another advantage of this EM algorithm is that the variance-covariance matrix (8) is readily available at the convergence of the algorithm for the joint inference on fixed effects $\beta$ and random effects $b$.
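The updates above translate almost line by line into code. The following NumPy sketch is one possible reading of the algorithm and not the authors' implementation: the function name `reml_em`, the starting values, the stopping rule and the toy data are all assumptions made for illustration. The dense matrix $(X, Z)$ is formed once for simplicity, and each iteration then inverts only a matrix of dimension $p + mq$.

```python
import numpy as np

def reml_em(Y, X, Z, resp, m, q, n_iter=500, tol=1e-8):
    """Sketch of the REML EM updates for model (2).

    resp[k] is 0 or 1 and marks which response row k belongs to;
    Z has m*q columns, ordered cluster by cluster.
    """
    p = X.shape[1]
    W = np.hstack([X, Z])                        # (X, Z), with p + m*q columns
    D = np.eye(q)                                # arbitrary starting values
    s = np.ones(2)                               # (sigma_1^2, sigma_2^2)
    n = np.array([(resp == 0).sum(), (resp == 1).sum()])
    # Cross-products per response are computed once; W_k holds the rows of response k.
    WtW = [W[resp == k].T @ W[resp == k] for k in (0, 1)]
    WtY = [W[resp == k].T @ Y[resp == k] for k in (0, 1)]
    for _ in range(n_iter):
        # E-step: mean solves (7), Sigma is (8), both at the current estimates.
        C = WtW[0] / s[0] + WtW[1] / s[1]
        C[p:, p:] += np.kron(np.eye(m), np.linalg.inv(D))
        Sigma = np.linalg.inv(C)
        mean = Sigma @ (WtY[0] / s[0] + WtY[1] / s[1])
        # M-step: closed-form updates of D and the residual variances.
        b = mean[p:].reshape(m, q)
        blocks = sum(Sigma[p + i * q: p + (i + 1) * q, p + i * q: p + (i + 1) * q]
                     for i in range(m))
        D_new = (b.T @ b + blocks) / m
        resid = Y - W @ mean
        s_new = np.array([
            (resid[resp == k] @ resid[resp == k] + np.trace(Sigma @ WtW[k])) / n[k]
            for k in (0, 1)])
        change = max(np.abs(D_new - D).max(), np.abs(s_new - s).max())
        D, s = D_new, s_new
        if change < tol:
            break
    return mean[:p], D, s, Sigma

# Toy data (hypothetical sizes): 10 clusters, 200 subjects per response and cluster.
rng = np.random.default_rng(0)
m, q, n_per = 10, 2, 200
cluster = np.repeat(np.arange(m), 2 * n_per)
resp = np.tile(np.repeat([0, 1], n_per), m)
N = cluster.size
x = rng.normal(size=N)
X = np.zeros((N, 4))                             # columns: (beta10, beta11, beta20, beta21)
X[np.arange(N), 2 * resp] = 1.0
X[np.arange(N), 2 * resp + 1] = x
Z = np.zeros((N, m * q))                         # response-specific random intercepts
Z[np.arange(N), q * cluster + resp] = 1.0
b_true = rng.multivariate_normal(np.zeros(2), [[2.0, 1.0], [1.0, 5.0]], size=m)
noise_sd = np.where(resp == 0, 1.0, 2.0)         # residual variances 1 and 4
Y = X @ np.array([2.0, 3.0, 3.0, 2.0]) + b_true[cluster, resp] + noise_sd * rng.normal(size=N)

beta_hat, D_hat, s_hat, Sigma = reml_em(Y, X, Z, resp, m, q)
print(beta_hat, s_hat)
```

For very large data sets, the cross-products `WtW` and `WtY` could instead be accumulated cluster by cluster so that the full $(X, Z)$ matrix is never held in memory; that refinement is left out to keep the sketch short.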

6 Simulation

In this section, we conducted a simulation study to demonstrate the superior performance of the EM algorithm for the REML estimation presented in the previous section over the ML inference using the NLMIXED procedure of SAS, especially in terms of computational time. In the simulation, we set $m = 20$ and conducted 100 simulation runs. In each simulation run, we generated $n_{1i}$ and $n_{2i}$ from Bin(2000, 0.5) for $i = 1, 2, \ldots, m = 20$. Here the expected cluster size is 2000, so large that existing software such as the MIXED procedure of SAS cannot allocate enough memory during the computation. We consider the situation of one covariate $x$, which was generated from $N(0, 1)$. Then clustered responses $Y_{1ij}$ and $Y_{2ij}$ were generated according to the following bivariate linear mixed model:

$$
\begin{aligned}
Y_{1ij} &= \beta_{10} + x_{ij} \beta_{11} + b_{1i} + e_{1ij}, \quad i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n_{1i}, \\
Y_{2ij} &= \beta_{20} + x_{i,n_{1i}+j} \beta_{21} + b_{2i} + e_{2ij}, \quad i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n_{2i},
\end{aligned}
\tag{12}
$$

where $\beta_{10} = 2$, $\beta_{11} = 3$, $\beta_{20} = 3$, $\beta_{21} = 2$, $(b_{1i}, b_{2i})$ were generated from a bivariate normal distribution with mean zero and variance matrix $\{\sigma_{ij}\}$ with $\sigma_{11} = 2$, $\sigma_{12} = \sigma_{21} = 1$ and $\sigma_{22} = 5$, and the variances of the residual errors $e_{1ij}$ and $e_{2ij}$ were set as $\sigma_1^2 = 1$ and $\sigma_2^2 = 4$. The generated data set was analyzed by the proposed REML EM algorithm and the ML method implemented using the NLMIXED procedure of SAS. The simulation results summarized over 100 simulation runs are presented in Table 1.
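One simulated data set under model (12) could be generated along the following lines. The function name `simulate_dataset`, the seed and the returned data layout are choices made for this sketch; the parameter values are those stated above.

```python
import numpy as np

def simulate_dataset(rng, m=20, trials=2000, prob=0.5):
    """Generate one data set from model (12) with the stated parameter values."""
    beta1 = np.array([2.0, 3.0])                 # (beta_10, beta_11)
    beta2 = np.array([3.0, 2.0])                 # (beta_20, beta_21)
    D = np.array([[2.0, 1.0], [1.0, 5.0]])       # variance matrix of (b_1i, b_2i)
    sd1, sd2 = 1.0, 2.0                          # square roots of sigma_1^2 = 1, sigma_2^2 = 4
    n1 = rng.binomial(trials, prob, size=m)      # n_1i ~ Bin(2000, 0.5)
    n2 = rng.binomial(trials, prob, size=m)      # n_2i ~ Bin(2000, 0.5)
    b = rng.multivariate_normal(np.zeros(2), D, size=m)
    records = []
    for i in range(m):
        x1 = rng.normal(size=n1[i])
        x2 = rng.normal(size=n2[i])
        y1 = beta1[0] + beta1[1] * x1 + b[i, 0] + sd1 * rng.normal(size=n1[i])
        y2 = beta2[0] + beta2[1] * x2 + b[i, 1] + sd2 * rng.normal(size=n2[i])
        records.append((x1, y1, x2, y2))
    return n1, n2, b, records

n1, n2, b, records = simulate_dataset(np.random.default_rng(0))
print(n1.sum(), n2.sum())
```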

Table 1.

Simulation results comparing REML and MLE. “Bias”, “SD”, “SE” and “CP” are the bias, empirical standard deviation, estimated standard error and empirical coverage probability of a 95% CI of a parameter estimate based on 100 Monte Carlo runs.

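| Parameter | REML EM Bias | REML EM SD | REML EM SE | REML EM CP | ML Bias | ML SD | ML SE | ML CP |
|---|---|---|---|---|---|---|---|---|
| $\beta_{10}$ | −0.015 | 0.316 | 0.318 | 0.950 | −0.015 | 0.316 | 0.310 | 0.950 |
| $\beta_{11}$ | 0.000 | 0.007 | 0.007 | 0.930 | 0.000 | 0.007 | 0.007 | 0.930 |
| $\beta_{20}$ | −0.054 | 0.510 | 0.496 | 0.910 | −0.054 | 0.510 | 0.483 | 0.890 |
| $\beta_{21}$ | 0.001 | 0.015 | 0.014 | 0.910 | 0.001 | 0.015 | 0.014 | 0.910 |
| $\sigma_{11}$ | 0.071 | 0.614 | - | - | −0.032 | 0.584 | - | - |
| $\sigma_{12}$ | 0.052 | 0.754 | - | - | 0.000 | 0.717 | - | - |
| $\sigma_{22}$ | 0.048 | 1.670 | - | - | −0.205 | 1.588 | - | - |
| $\sigma_1^2$ | −0.001 | 0.001 | - | - | −0.001 | 0.010 | - | - |
| $\sigma_2^2$ | 0.002 | 0.037 | - | - | 0.002 | 0.037 | - | - |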
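The four summary columns of Table 1 can be computed from the Monte Carlo replicates as sketched below. The function name `mc_summary` and the toy numbers are hypothetical, and two conventions are assumptions of this sketch rather than statements from the paper: “SE” is taken to be the average of the estimated standard errors over runs, and the 95% CI is taken to be the Wald interval.

```python
import numpy as np

def mc_summary(estimates, std_errors, truth, z=1.959964):
    """Bias, empirical SD, average estimated SE and coverage of Wald 95% CIs."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - truth
    sd = est.std(ddof=1)                         # empirical standard deviation
    mean_se = se.mean()                          # average estimated standard error
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    return bias, sd, mean_se, covered.mean()

# Four made-up replicates of an estimate whose true value is 2.
bias, sd, mean_se, cp = mc_summary([1.9, 2.1, 2.0, 2.4], [0.1, 0.1, 0.1, 0.1], truth=2.0)
print(bias, sd, mean_se, cp)
```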

From Table 1, we observe that both the REML and ML estimates of the fixed effects are virtually unbiased, with the same empirical standard deviations to three decimal places. However, the ML approach produced slightly smaller estimated standard errors for the two intercept estimates $\hat\beta_{10}$ and $\hat\beta_{20}$, resulting in a lower than nominal coverage probability of the 95% confidence interval (CI) of $\beta_{20}$. We also observe that the REML estimates of all variance/covariance components are virtually unbiased. However, there is a sizable bias in the ML estimate of $\sigma_{22}$. Although it is well-known that ML estimates of variance/covariance components are somewhat biased since the ML approach does not take into account the estimation of the fixed effects, it is possible that the bias in the ML estimate of $\sigma_{22}$ and the low empirical coverage of the 95% CI of $\beta_{20}$ are more a consequence of computational instability. This is because the numerical differentiation used by the NLMIXED procedure of SAS to calculate the required derivatives may not be stable. A major advantage of the REML EM algorithm over the ML approach is the savings in computational time. The simulation study was performed on a Linux platform built on Intel(R) (8)Core(TM) i7 CPU at 2.93GHz with 8 GB RAM. On average the REML EM took only 6 seconds to analyze a data set, while the ML approach took 210 seconds to analyze the same data set.

7 Application to the LOS Data

We applied the bivariate linear mixed model (1) and the REML EM algorithm developed in this paper to analyze the LOS data. The data are from ten international clinical trials for patients with acute coronary syndromes (ACS): EARLY ACS, GUSTO IIb, GUSTO IV, PARAGON A, PARAGON B, PRISM, PRISM PLUS, PURSUIT, SYNERGY and TRACER. Each trial was conducted in several of the following representative regions in the world: Asia, Australia/New Zealand, Europe, Latin America, Middle East, North America and South Africa. For more description of these trials, please see Chan et al. (2012). In this paper, we only included patients receiving a CABG or a PCI, resulting in the following cluster sizes (numbers of patients) for the ten clinical trials: 6469 for EARLY ACS, 1651 for GUSTO IIb, 1952 for GUSTO IV, 402 for PARAGON A, 1525 for PARAGON B, 1072 for PRISM, 1027 for PRISM PLUS, 4013 for PURSUIT, 6385 for SYNERGY and 8689 for TRACER. One of the research objectives is to estimate the expected LOS’s of the CABG and PCI procedures, and their differences in different regions, adjusting for patients’ age, smoking and diabetes statuses, while taking into account the study-to-study variation. To address this objective, we considered the following model

$$
\begin{aligned}
Y_{1ij} &= D_{1ij}^T \beta_1 + C_{1ij}^T \gamma_1 + b_{1i} + e_{1ij}, \quad i = 1, 2, \ldots, 10,\ j = 1, 2, \ldots, n_{1i}, \\
Y_{2ij} &= D_{2i,n_{1i}+j}^T \beta_2 + C_{2i,n_{1i}+j}^T \gamma_2 + b_{2i} + e_{2ij}, \quad i = 1, 2, \ldots, 10,\ j = 1, \ldots, n_{2i},
\end{aligned}
\tag{13}
$$

where $Y_{1ij}$ is the CABG LOS of patient $j$ in trial $i$, and $Y_{2ij}$ is the PCI LOS of the $(n_{1i} + j)$th patient in the same trial; $D_{1ij}$ and $D_{2i,n_{1i}+j}$ are $7 \times 1$ vectors of dummy variables for regions with corresponding effects $\beta_1$ and $\beta_2$; and $C_{1ij}$ and $C_{2i,n_{1i}+j}$ are $3 \times 1$ vectors of covariates representing patients’ age in years (centered at 60 years, the approximate sample mean), smoking status (1 for ever/current smoking, 0 for never-smoking) and diabetes status (1 for diabetes and 0 for non-diabetes). Therefore, $\beta_1$ and $\beta_2$ are vectors representing the expected LOS’s of the CABG and PCI procedures respectively for the patient populations (called target populations) who are 60 years old, non-smoking and free of diabetes in the seven regions in the world, and $\gamma_1$ and $\gamma_2$ are the effects of those covariates on the two LOS’s. We used normally distributed random effects $(b_{1i}, b_{2i})^T$ with mean zero and variance matrix $\{\sigma_{ij}\}$ to account for the study-to-study variation in the CABG LOS’s and PCI LOS’s, as well as their correlation. The residual errors $e_{1ij}$ and $e_{2ij}$ are assumed to be independent with residual variances $\sigma_1^2$ and $\sigma_2^2$ respectively.

Because of the large cluster sizes in this application, commercial software such as the MIXED procedure of SAS could not fit the bivariate linear mixed model (13) since it failed to allocate enough memory during the optimization process. Using the REML EM algorithm we developed, the fitting of model (13) was completed in seconds. The estimated expected LOS’s of the CABG and PCI procedures and their differences for the target populations in the seven regions are presented in Table 2. Also presented in this table are contrasts of these parameters between each region and the reference region, South Africa. From this table, we observe that patients’ CABG and PCI LOS’s in Asia and Europe are much longer than those in other regions, and that the expected CABG LOS is about 4 to 6 days longer than the expected PCI LOS in any region, with Asia having the biggest difference. This result may reflect the difference in patient management and quality of care in different regions.

Table 2.

Estimated expected CABG LOS ($\hat\mu_C$), PCI LOS ($\hat\mu_P$) and their difference $\hat\mu_C - \hat\mu_P$ for the target populations in the seven regions of the world. Also presented are contrasts (denoted by $\Delta$) of these parameters between each region and the reference region, South Africa. Numbers inside parentheses are the estimated standard errors (SE) of the estimated parameters. AU/NZ stands for Australia/New Zealand.

| Region | CABG: $\hat\mu_C$ | CABG: $\Delta\hat\mu_C$ | PCI: $\hat\mu_P$ | PCI: $\Delta\hat\mu_P$ | CABG − PCI: $\hat\mu_C - \hat\mu_P$ | CABG − PCI: $\Delta(\hat\mu_C - \hat\mu_P)$ |
|---|---|---|---|---|---|---|
| Asia | 12.77 (0.58) | 3.08 (0.70) | 6.29 (0.24) | 2.23 (0.23) | 6.48 (0.60) | 0.85 (0.74) |
| AU/NZ | 9.30 (0.42) | −0.39 (0.58) | 3.83 (0.25) | −0.23 (0.24) | 5.46 (0.45) | −0.17 (0.63) |
| Europe | 11.95 (0.24) | 2.26 (0.46) | 6.07 (0.21) | 2.00 (0.20) | 5.88 (0.25) | 0.23 (0.50) |
| Latin America | 10.46 (0.35) | 0.78 (0.53) | 5.23 (0.24) | 1.17 (0.23) | 5.24 (0.37) | −0.39 (0.57) |
| Middle East | 8.70 (0.42) | −0.98 (0.58) | 4.80 (0.23) | 0.73 (0.22) | 3.91 (0.44) | −1.71 (0.62) |
| North America | 8.10 (0.23) | −1.58 (0.46) | 3.77 (0.22) | −0.29 (0.20) | 4.33 (0.25) | −1.29 (0.50) |
| South Africa | 9.68 (0.49) | - | 4.06 (0.29) | - | 5.62 (0.54) | - |

Even though the estimated expected CABG LOS is much greater than the expected PCI LOS in any region, the estimated variances of the random trial effects for the CABG LOS and PCI LOS are very close ($\hat\sigma_{11} = 0.36$, $\hat\sigma_{22} = 0.43$). The covariance of the random trial effects for CABG LOS and PCI LOS is estimated to be $\hat\sigma_{12} = 0.20$, indicating that studies with longer CABG LOS’s tend to have longer PCI LOS’s. The estimated residual variances of patient CABG LOS’s and PCI LOS’s are $\hat\sigma_1^2 = 31$ and $\hat\sigma_2^2 = 11$ respectively, indicating that there is much greater within-trial variation in patients’ CABG LOS’s than in patients’ PCI LOS’s in any given study and region.
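To put the estimated covariance on a more interpretable scale, it can be converted to a correlation between the two random trial effects. The short calculation below is derived only from the three estimates reported above and is not a figure given in the paper; the variable names are hypothetical.

```python
import math

var_cabg, var_pci, cov = 0.36, 0.43, 0.20      # reported estimates of sigma_11, sigma_22, sigma_12
corr = cov / math.sqrt(var_cabg * var_pci)     # implied correlation of the random trial effects
print(round(corr, 2))                          # prints 0.51
```

An implied correlation of about 0.51 is consistent with the statement that trials with longer CABG stays tend to have longer PCI stays.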

We also implemented the ML approach using the NLMIXED procedure of SAS. Unfortunately, the program produced a Hessian matrix with some negative eigenvalues during the optimization process and failed to provide valid estimates, especially for the variance/covariance parameters. This further demonstrates the advantage of the proposed REML EM algorithm for fitting a linear mixed model for clustered data with large cluster sizes.

8 Discussion

In this paper, we discussed joint modeling of two clustered response variables, the hospital lengths of stay (LOS) for patients with acute coronary syndromes (ACS) who received CABG and PCI surgeries in several international clinical trials. A bivariate linear mixed model with separate fixed effects and random effects was proposed for the joint modeling, where random effects were used to model the study-to-study variation. Due to the large cluster sizes of these clinical trials, commercial software such as SAS could not fit the proposed model. We proposed a computational solution for the ML inference and the REML inference. Specifically, we proposed an EM algorithm for the REML inference and provided an implementation of the ML inference using an existing procedure of SAS. Simulation studies indicated that, compared to the ML approach implemented with SAS, the proposed REML EM algorithm is computationally stable and efficient. By applying the proposed REML EM algorithm to the LOS data, we were able to make inference on the expected LOS’s of CABG and PCI surgeries and their differences for different regions in the world, after adjusting for important covariates.

With the current random effects structure, when a large number of candidate covariates are available, we may use traditional model selection methods such as forward, backward and step-wise selection, or more advanced penalized likelihood methods with sparsity penalties on potential fixed effects. In this case, we need to calculate the likelihood function of the potential fixed effects. For this purpose we can develop an ML EM algorithm similar to that of Laird and Ware (1982) for our bivariate linear mixed model. The current random effects structure uses two correlated response-specific random intercepts to model the cluster-to-cluster variation of the responses. If prior information indicates that some “covariate effects” on each response variable vary from cluster to cluster, we can expand the current random effects structure to include these “covariate effects” as additional (correlated) random effects for both responses. An information criterion such as the Bayesian information criterion, together with the penalized likelihood approach, may be used to select a final random effects structure and fixed effects.

The proposed bivariate linear mixed model is most appropriate for jointly modeling two continuous clustered responses. For discrete clustered responses such as binary responses, a more attractive model would be the bivariate generalized linear mixed model. It is straightforward to extend the idea to this model for the case of large cluster sizes. For example, the ML inference can be easily implemented with the NLMIXED procedure of SAS. However, the EM algorithm for the REML-like inference in the bivariate generalized linear mixed model does not have all the attractive features it has in the bivariate linear mixed model. For example, there is no closed form expression for the updated parameters during the EM iteration, and a numerical method has to be used to obtain the parameter update. Despite this problem, we think that the EM algorithm will still be computationally more stable and efficient than the ML inference implemented with the NLMIXED procedure of SAS. It will be an interesting future project to evaluate their performance for the case of large cluster sizes.

Acknowledgments

The work of D. Zhang was supported by NIH grant R01 CA85848-12. The work of J. L. Sun and K. Pieper was supported through a grant from the Duke Clinical Research Institute. The authors are grateful to Dr. Eric Peterson for his institutional financial support as well as for providing the LOS data, without which this research would not have been possible.

Appendix A

Properties of the REML EM Algorithm

The following theorem states the general EM algorithm for the REML estimation.

Theorem 1

Suppose $f(y \mid b; \beta, \theta)$ is the conditional probability density function of $y$ given random effects $b$ and $f(b; \theta)$ is the probability density function of the random effects $b$. Assume the following “REML” likelihood

$$
L_R(\theta; y) = \int f(y \mid b; \beta, \theta) \, f(b; \theta) \, d\beta \, db
$$

exists, so that $g(\beta, b \mid y; \theta) = f(y \mid b; \beta, \theta) f(b; \theta) / L_R(\theta; y)$ is a probability density function. Given the estimate $\theta^{(t)}$ at the $t$th iteration, define the Q-function for the REML algorithm as follows:

$$
Q_R(\theta \mid \theta^{(t)}) = E_g [\log \{ f(y \mid b; \beta, \theta) f(b; \theta) \}],
$$

where $E_g$ stands for the expectation taken with respect to $g(\beta, b \mid y; \theta^{(t)})$. Denote by $\theta^{(t+1)}$ the update of $\theta$ obtained by maximizing $Q_R(\theta \mid \theta^{(t)})$ with respect to $\theta$. Then

$$
\ell_R(\theta^{(t+1)}; y) \ge \ell_R(\theta^{(t)}; y), \quad \text{for all } t,
$$

where $\ell_R(\theta; y) = \log L_R(\theta; y)$.

Proof

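By the definition of $L_R(\theta; y)$, we have

$$
\log \{ f(y \mid b; \beta, \theta) f(b; \theta) \} = \log g(\beta, b \mid y; \theta) + \ell_R(\theta; y).
$$

Taking the expectation with respect to $g(\beta, b \mid y; \theta^{(t)})$ on both sides of the above equation leads to

$$
Q_R(\theta \mid \theta^{(t)}) = E_g \log g(\beta, b \mid y; \theta) + \ell_R(\theta; y).
$$

Denote $H(\theta) = E_g \log g(\beta, b \mid y; \theta)$. Then

$$
\ell_R(\theta^{(t+1)}; y) - \ell_R(\theta^{(t)}; y) = Q_R(\theta^{(t+1)} \mid \theta^{(t)}) - Q_R(\theta^{(t)} \mid \theta^{(t)}) - \{ H(\theta^{(t+1)}) - H(\theta^{(t)}) \}.
$$

By the definition of $H(\theta)$ and $E_g$, and by Jensen’s inequality, we have

$$
H(\theta^{(t+1)}) - H(\theta^{(t)}) = E_g \log \left\{ \frac{g(\beta, b \mid y; \theta^{(t+1)})}{g(\beta, b \mid y; \theta^{(t)})} \right\} \le \log E_g \left\{ \frac{g(\beta, b \mid y; \theta^{(t+1)})}{g(\beta, b \mid y; \theta^{(t)})} \right\} = \log(1) = 0.
$$

Since $\theta^{(t+1)}$ maximizes $Q_R(\theta \mid \theta^{(t)})$, it follows that $Q_R(\theta^{(t+1)} \mid \theta^{(t)}) \ge Q_R(\theta^{(t)} \mid \theta^{(t)})$. Therefore

$$
\ell_R(\theta^{(t+1)}; y) \ge \ell_R(\theta^{(t)}; y), \quad \text{for all } t.
$$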

Footnotes

The authors declare no conflict of interest.

References

  1. Chan M, Sun J, Newby L, Lokhnygina Y, White HD, Moliterno DJ, Théroux P, Ohman EM, Simoons ML, Mahaffey KW, Pieper KS, Giugliano RG, Armstrong PW, Califf RM, Van de Werf F, Harrington RA. Trends in clinical trials of non-ST-segment elevation acute coronary syndromes over 15 years. International Journal of Cardiology. 2012. doi: 10.1016/j.ijcard.2012.01.065.
  2. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39:1–38.
  3. Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford University Press; 2002.
  4. Harville DA. Bayesian inference for variance components using only error contrasts. Biometrika. 1974;61:383–385.
  5. Henderson CR. Applications of Linear Models in Animal Breeding. University of Guelph; 1984.
  6. Laird NM, Ware JH. Random effects models for longitudinal data. Biometrics. 1982;38:963–974.
  7. SAS/STAT 9.3 User’s Guide. Cary, NC: SAS Institute Inc.; 2013.
