Author manuscript; available in PMC: 2011 Jul 14.
Published in final edited form as: Biometrics. 2009 Jun;65(2):369–376. doi: 10.1111/j.1541-0420.2008.01107.x

Testing Random Effects in the Linear Mixed Model Using Approximate Bayes Factors

Benjamin R Saville 1, Amy H Herring 2
PMCID: PMC3136354  NIHMSID: NIHMS256347  PMID: 18759835

SUMMARY

Deciding which predictor effects may vary across subjects is a difficult issue. Standard model selection criteria and test procedures are often inappropriate for comparing models with different numbers of random effects due to constraints on the parameter space of the variance components. Testing on the boundary of the parameter space changes the asymptotic distribution of some classical test statistics and causes problems in approximating Bayes factors. We propose a simple approach for testing random effects in the linear mixed model using Bayes factors. We scale each random effect to the residual variance and introduce a parameter that controls the relative contribution of each random effect free of the scale of the data. We integrate out the random effects and the variance components using closed form solutions. The resulting integrals needed to calculate the Bayes factor are low-dimensional integrals lacking variance components and can be efficiently approximated with Laplace’s method. We propose a default prior distribution on the parameter controlling the contribution of each random effect and conduct simulations to show that our method has good properties for model selection problems. Finally, we illustrate our methods on data from a clinical trial of patients with bipolar disorder and on data from an environmental study of water disinfection by-products and male reproductive outcomes.

Keywords: Bayes factors, Laplace approximation, mixed model, random effects, variance components

1. Introduction

The linear mixed model (Laird and Ware, 1982) is a popular method for longitudinal data. It is often of interest to test whether certain random effects should be included, which is equivalent to setting the random effect variance equal to 0. Because this test lies on the boundary of the parameter space, classical procedures such as the likelihood ratio test can break down (Pauler, Wakefield and Kass, 1999; Lin, 1997; Self and Liang, 1987; Stram and Lee, 1994). Tests for a single variance component can be carried out using mixtures of chi-square distributions (Self and Liang, 1987; Stram and Lee, 1994). For multivariate tests, distributions of test statistics are complex and not easily applied (Pauler et al., 1999; Feng and McCulloch, 1992; Shapiro, 1988). Some alternative frequentist methods include score tests (Lin, 1997; Commenges and Jacqmin-Gadda, 1997; Verbeke and Molenberghs, 2003; Molenberghs and Verbeke, 2007; Zhang and Lin, 2008), Wald tests (Molenberghs and Verbeke, 2007; Silvapulle, 1992), and generalized likelihood ratio tests (Crainiceanu and Ruppert, 2004), but these methods are not easily extended for testing multiple variance components.

Some MCMC methods have been suggested to test variance components (Sinharay and Stern, 2001; Chen and Dunson, 2003; Cai and Dunson, 2006; Kinney and Dunson, 2008), but these methods are generally time consuming to implement, require special software, and rely on subjective choice of hyperparameters. The most widely used approximation to the Bayes factor is based on the Laplace approximation (Tierney and Kadane, 1986), resulting in the Bayesian information criterion (BIC) (Schwarz, 1978) under certain assumptions. However, the Laplace approximation can fail when the parameter lies on the boundary (Pauler et al., 1999; Hsiao, 1997; Erkanli, 1994). Pauler et al. (1999) proposed estimating Bayes factors using an importance sampling approach and a boundary Laplace approximation. Their methods are complex and are only applied in the context of simple variance component models.

Because random effects models involve a distinct parameter for every individual, linear mixed models can have a very large number of dimensions. This is problematic in calculating Bayes factors because high dimensional integrals are needed to calculate marginal likelihoods. Generally these integrals are not available in closed form and one must consider approximations. Numerical integration is not useful in high dimensions (Kuonen, 2003). Monte Carlo integration and importance sampling provide alternatives, but these methods lack accuracy and are computationally demanding. The Laplace and BIC approximations also suffer in performance from high-dimensionality (Kass and Raftery, 1995). In addition, it is not clear how to define the penalty for dimensionality in the BIC (Spiegelhalter et al., 2002).

It is well known that Bayes factors can be sensitive to the choice of prior distributions (Kass and Raftery, 1995). This is problematic in situations in which one has no prior information on the parameters and the goal is to choose the best model based on the data. In these situations it is common to use default priors which can be chosen based on the data without subjective inputs and that result in good frequentist and Bayesian operating characteristics. However, one must choose these default priors with care, because as the prior variance increases the Bayes factor will increasingly favor the null model (Bartlett, 1957).
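The effect described by Bartlett (1957) can be seen in a toy problem that is not taken from this paper: a single normal mean with known unit variance, testing θ = 0 against θ ~ N(0, prior_var). The function name and the numbers below are illustrative assumptions; the sketch only shows that, for fixed data, the Bayes factor drifts toward the null as the prior variance grows.

```python
import math

def log_bf10_normal_mean(ybar, n, prior_var):
    """log Bayes factor for H1: theta ~ N(0, prior_var) versus H0: theta = 0,
    when ybar ~ N(theta, 1/n) (toy setting, known unit residual variance)."""
    var0 = 1.0 / n               # marginal variance of ybar under H0
    var1 = prior_var + 1.0 / n   # marginal variance of ybar under H1
    log_m1 = -0.5 * math.log(2 * math.pi * var1) - 0.5 * ybar ** 2 / var1
    log_m0 = -0.5 * math.log(2 * math.pi * var0) - 0.5 * ybar ** 2 / var0
    return log_m1 - log_m0

# Same data (a sample mean three standard errors from zero), increasingly diffuse priors.
ybar, n = 0.3, 100
prior_vars = (1.0, 1e2, 1e4, 1e6)
log_bfs = [log_bf10_normal_mean(ybar, n, v) for v in prior_vars]
for v, lb in zip(prior_vars, log_bfs):
    print(f"prior variance {v:>9.0f}: log B10 = {lb:.2f}")
```

With a moderate prior variance the data favor the alternative, while very diffuse priors reverse the conclusion even though the data are unchanged.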

We propose a simple approach for conducting approximate Bayesian inferences on whether to include random effects in the linear mixed model. Our approach involves a re-parameterization of the linear mixed model allowing an accurate Laplace approximation to the Bayes factor. In Section 2 we introduce two motivating examples. In Section 3 we introduce our method in the context of a repeated measures ANOVA model and conduct a simulation study. In Section 4 we generalize our approach to the linear mixed model and in Section 5 we apply the method to the motivating examples. We conclude with a discussion in Section 6.

2. Motivating Examples

2.1 Hamilton Rating Scale for Depression

We first consider a clinical trial of patients with bipolar I disorder (Calabrese et al., 2003), GlaxoSmithKline study SCAB2003. The investigators concluded that lamotrigine treatment significantly delays time to intervention for a depressive episode compared to placebo. Repeated measurements were collected on the Hamilton Rating Scale for Depression (HAMD), a 17-item scale measuring the severity of depression. We wish to determine if lamotrigine is effective in reducing depressive symptoms during the first year after randomization, as measured by the HAMD summary score, using a linear mixed model.

In assessing the impact of lamotrigine on HAMD scores, it is important to assess the heterogeneity among patients with respect to the overall mean and slope over time. One might expect patients to have different patterns of depressive episodes, perhaps resulting from biological mechanisms or unmeasured covariates that cause different individual profiles over time. This leads to the task of testing whether to include random coefficients for the intercept and slope over time in the linear mixed model.

2.2 Exposure of disinfection by-products in drinking water and male fertility

A multi-center study of 229 male patients from 3 sites (Raleigh, NC; Memphis, TN; and Galveston, TX) was conducted to evaluate the effect of disinfection by-products (DBPs) in drinking water on male reproductive outcomes in presumed fertile men. DBP exposure was measured using water system samples and data collected on individual water usage. Three exposure variables of interest for the outcome percent normal sperm are brominated haloacetic acids (HAA-Br), brominated trihalomethanes (THM-Br), and total organic halides (TOX). Our focus is to evaluate the DBP exposure effects on the response (% normal sperm) using a linear mixed model.

In assessing the impact of DBPs on sperm quality, it is of interest to assess the heterogeneity among study sites with respect to the overall mean of percent normal sperm (i.e. intercept) and each DBP effect (i.e. slope). It may be the case that study site is a surrogate for unmeasured aspects of water quality or other unmeasured factors of interest.

3. Testing a random intercept

3.1 ANOVA model

We start by considering a simple ANOVA model with a random subject effect

$$M_1^{(1)}: \; Y_{ij} = \mu + \lambda b_i + \varepsilon_{ij}, \tag{1}$$

in which $Y_{ij}$ is the $j$th response for subject $i$, $\mu$ is an intercept, $b_i \sim N(0, \sigma^2)$ is a scaled random effect multiplied by a parameter $\lambda > 0$, and $\varepsilon_{ij} \sim N(0, \sigma^2)$ for $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$. This is an ANOVA model with a random effect variance equal to $\lambda^2\sigma^2$. The utility of this decomposition will become clear later. The notation $M_k^{(a)}$ represents parameterization $(a)$ for model $k$. We distinguish models parameterized in different ways in order to consider the impact of parameterization on the accuracy of the Laplace approximation to the marginal likelihood. Our initial focus is to compare the ANOVA model to a model with no random subject effect,

$$M_0: \; Y_{ij} = \mu + \varepsilon_{ij}, \tag{2}$$

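in which $\mu$ is an overall mean and $\varepsilon_{ij} \sim N(0, \sigma^2)$. We are interested in estimating Bayes factors to determine the posterior odds of $M_1^{(a)}$ versus $M_0$ given equal prior odds, or

$$B_{10}^{(a)} = \frac{p(Y \mid M_1^{(a)})}{p(Y \mid M_0)}, \tag{3}$$

in which $Y = (y_1', \ldots, y_n')'$. Estimating the Bayes factor relies on estimates of

$$p(Y \mid M_k^{(a)}) = \int p(Y \mid \theta_k^{(a)}, M_k^{(a)}) \, \pi(\theta_k^{(a)} \mid M_k^{(a)}) \, d\theta_k^{(a)}, \tag{4}$$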

in which $p(Y \mid \theta_k^{(a)}, M_k^{(a)})$ is the data likelihood, $\theta_k^{(a)}$ is the vector of model parameters, and $\pi(\theta_k^{(a)} \mid M_k^{(a)})$ is the prior distribution of $\theta_k^{(a)}$. Let $M_0^{(a)} = M_0$, as only one parameterization of $M_0$ will be considered. For $M_1^{(a)}$ and $M_0$, the marginal likelihoods are generally not available in closed form. Let $\theta_1^{(a)} = (\zeta_1^{(a)}, b, \sigma^2)$ and $\theta_0^{(a)} = (\zeta_0^{(a)}, \sigma^2)$, such that $\zeta_k^{(a)}$ includes all parameters other than the random effects $b$ and residual variance $\sigma^2$. We specify an inverse gamma prior on $\sigma^2$ with parameters $v$, $w$, in which the mean of $\sigma^2$ is $w/(v - 1)$ for $v > 1$. By marginalizing out $b$ and $\sigma^2$ in $M_1^{(1)}$ and $\sigma^2$ in $M_0$, it can be shown that $(Y \mid \mu, \lambda, M_1^{(1)})$ and $(Y \mid \mu, M_0)$ follow multivariate t-distributions with

$$p(Y \mid \zeta_k^{(a)}, M_k^{(a)}) = \frac{\Gamma\!\left(\frac{2v+m}{2}\right) \prod_{i=1}^{n} \left|\frac{w}{v}\Sigma_i\right|^{-1/2}}{(\pi 2v)^{m/2}\, \Gamma(2v/2)} \times \left\{1 + \frac{1}{2v} \sum_{i=1}^{n} (y_i - \mu_i)' \left(\frac{w}{v}\Sigma_i\right)^{-1} (y_i - \mu_i)\right\}^{-\frac{2v+m}{2}}, \tag{5}$$

in which $m = \sum_{i=1}^{n} n_i$ is the total number of observations. In our ANOVA setup, $\mu_i = \mu 1_{n_i}$ in $M_0$ and $M_1^{(1)}$, $\Sigma_i = I_{n_i}$ in $M_0$, and $\Sigma_i = (I_{n_i} + \lambda^2 1_{n_i} 1_{n_i}')$ in $M_1^{(1)}$. After specifying a prior on $\mu$, the Laplace method can be used to integrate over $(\mu, \lambda)$ in $M_1^{(1)}$ and $\mu$ in $M_0$. We use the resulting marginal likelihood estimates to estimate the Bayes factor $B_{10}^{(1)}$. For additional details regarding these multivariate t-distributions, see the Web Appendices.
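As a rough illustration of density (5) in the ANOVA setting, the sketch below evaluates its logarithm directly from the definitions of the mean vectors and the matrices Σi. The function name, the list-of-arrays data layout, and the default hyperparameters v = w = 1 are assumptions made for the example; it is not the authors' implementation.

```python
import math
import numpy as np

def log_marginal_t(y_groups, mu, lam, v=1.0, w=1.0):
    """Log of density (5) for the one-way ANOVA model, with sigma^2 ~ InvGamma(v, w)
    and the scaled random effects integrated out. Setting lam = 0 gives the null
    model M0 (Sigma_i = I). y_groups is a list of per-subject response arrays."""
    m = sum(len(y) for y in y_groups)   # total number of observations
    log_det = 0.0                       # sum_i log |(w/v) Sigma_i|
    quad = 0.0                          # sum_i (y_i - mu_i)' ((w/v) Sigma_i)^{-1} (y_i - mu_i)
    for y in y_groups:
        y = np.asarray(y, dtype=float)
        n_i = len(y)
        scale = (w / v) * (np.eye(n_i) + lam ** 2 * np.ones((n_i, n_i)))
        resid = y - mu
        _, ld = np.linalg.slogdet(scale)
        log_det += ld
        quad += resid @ np.linalg.solve(scale, resid)
    return (math.lgamma((2 * v + m) / 2) - math.lgamma(v)
            - 0.5 * m * math.log(2 * v * math.pi)
            - 0.5 * log_det
            - 0.5 * (2 * v + m) * math.log1p(quad / (2 * v)))

# Hypothetical data: three subjects with unequal numbers of repeated measures.
y_groups = [[0.4, 0.9, 0.1], [-1.2, -0.7], [0.3, 0.5, 0.8, 0.2]]
print(log_marginal_t(y_groups, mu=0.0, lam=0.0))   # M0
print(log_marginal_t(y_groups, mu=0.0, lam=0.5))   # M1 with lambda = 0.5
```

Because the blocks are handled one subject at a time, the same loop works for unbalanced designs; only the prior on μ (and on λ) remains to be added before applying the Laplace method.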

The Laplace approximation is based on a second-order Taylor series approximation of $\tilde{l}(\zeta_k^{(a)}) = \log\{p(Y \mid \zeta_k^{(a)}, M_k^{(a)}) \, \pi(\zeta_k^{(a)} \mid M_k^{(a)})\}$ about its mode. The marginal likelihood $p(Y \mid M_k^{(a)})$ for model $k$ and parameterization $(a)$ is estimated by

$$\hat{p}(Y \mid M_k^{(a)}) = (2\pi)^{d/2} \, |\tilde{\Sigma}_k^{(a)}|^{1/2} \, p(Y \mid \tilde{\zeta}_k^{(a)}, M_k^{(a)}) \, \pi(\tilde{\zeta}_k^{(a)} \mid M_k^{(a)}), \tag{6}$$

in which $d$ is the dimension of $\zeta_k^{(a)}$ and $\tilde{\Sigma}_k^{(a)}$ is the inverse of the negative Hessian matrix of $\tilde{l}(\zeta_k^{(a)})$ evaluated at the posterior mode $\tilde{\zeta}_k^{(a)}$. Because the approximation relies on a Taylor expansion about an interior mode, it requires certain regularity conditions. When the posterior mode lies on the boundary of the parameter space, these regularity conditions fail. The Laplace method can perform poorly even if the mode is merely close to the boundary. Estimating the marginal likelihood in $M_1^{(1)}$ via Laplace can be problematic because of the restriction $\lambda > 0$, motivating the parameterization

$$M_1^{(2)}: \; Y_{ij} = \mu + e^{\phi} b_i + \varepsilon_{ij}, \tag{7}$$

in which $\phi = \log(\lambda)$. Note that the parameter space of $\phi$ is unrestricted, ensuring that the posterior mode is not on the boundary. Hence the estimated marginal likelihoods based on $M_1^{(2)}$ may be more accurate than those based on $M_1^{(1)}$. Following the steps outlined previously, it can be shown that $(Y \mid \mu, \phi, M_1^{(2)})$ follows a multivariate t-distribution with density (5), with $\mu_i = \mu 1_{n_i}$ and $\Sigma_i = (I_{n_i} + e^{2\phi} 1_{n_i} 1_{n_i}')$. We use the Laplace approximation to integrate over $(\mu, \phi)$ and use the resulting estimate of the marginal likelihood to estimate the Bayes factor $B_{10}^{(2)}$.
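The pieces of this section can be combined into a minimal sketch of the Laplace estimate of the log Bayes factor under parameterization (2). Several choices here are assumptions rather than details given in the text: a balanced design stored as an (n, n_i) array, priors μ ~ N(0, 1) and ϕ ~ N(log 0.3, 2) with 2 read as a variance, a finite-difference Hessian, and all function names.

```python
import math
import numpy as np
from scipy.optimize import minimize

def log_t_density(Y, mu, lam, v=1.0, w=1.0):
    """Log of density (5) for a balanced (n, n_i) array Y, using the closed-form
    determinant and inverse of Sigma_i = I + lam^2 * 11'."""
    n, ni = Y.shape
    m = n * ni
    R = Y - mu
    c = lam ** 2 / (1.0 + ni * lam ** 2)   # Sherman-Morrison coefficient
    quad = (v / w) * (np.sum(R ** 2) - c * np.sum(R.sum(axis=1) ** 2))
    log_det = n * (ni * math.log(w / v) + math.log1p(ni * lam ** 2))
    return (math.lgamma(v + m / 2) - math.lgamma(v)
            - 0.5 * m * math.log(2 * v * math.pi)
            - 0.5 * log_det - (v + m / 2) * math.log1p(quad / (2 * v)))

def log_norm(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def num_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at x."""
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

def laplace_log_marginal(log_post, start):
    """Equation (6) on the log scale: mode by Nelder-Mead, then the Hessian term."""
    res = minimize(lambda z: -log_post(z), start, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 5000})
    mode = res.x
    _, log_det_neg_hess = np.linalg.slogdet(-num_hessian(log_post, mode))
    d = len(mode)
    return log_post(mode) + 0.5 * d * math.log(2 * math.pi) - 0.5 * log_det_neg_hess

def log_bf10(Y, kappa=math.log(0.3), tau=2.0):
    """Laplace estimate of log B10 under parameterization (2), phi = log(lambda)."""
    def log_post_m1(z):   # z = (mu, phi)
        return (log_t_density(Y, z[0], math.exp(z[1]))
                + log_norm(z[0], 0.0, 1.0) + log_norm(z[1], kappa, tau))
    def log_post_m0(z):   # z = (mu,)
        return log_t_density(Y, z[0], 0.0) + log_norm(z[0], 0.0, 1.0)
    ybar = float(Y.mean())
    return (laplace_log_marginal(log_post_m1, np.array([ybar, kappa]))
            - laplace_log_marginal(log_post_m0, np.array([ybar])))

# Hypothetical data with a clear subject effect (lambda = 0.6, sigma^2 = 1).
rng = np.random.default_rng(0)
Y = 0.6 * rng.normal(size=(100, 1)) + rng.normal(size=(100, 3))
print("estimated log B10:", log_bf10(Y))
```

Working with ϕ rather than λ keeps the optimizer and the Hessian away from the boundary at zero, which is the point of parameterization (2); a positive value of the printed quantity favors the random-effect model.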

3.2 Prior choice

It is important to identify default priors that yield robust tests. We have introduced a parameter $\lambda$ (or $\phi$) that controls the contribution of the random effect free of the scale of the data. We propose priors $\lambda \sim \log N(\kappa_\lambda, \tau_\lambda)$ and $\phi \sim N(\kappa_\phi, \tau_\phi)$ with $\kappa_\phi = \kappa_\lambda$ and $\tau_\phi = \tau_\lambda$ set so that the priors for $\lambda$ and $\phi$ are “equivalent”, meaning they lead to the same marginal likelihood. Differences in the estimated marginal likelihoods between $M_1^{(1)}$ and $M_1^{(2)}$ result from differences in the accuracy of the two Laplace approximations. Given that the random effects are scaled to the residual error, we suggest $\kappa_\lambda = \log(0.3)$ and $\tau_\lambda = 2$ as reasonable default values. This centers the parameter $\lambda$ between 0 and 1. Even if the true value of $\lambda$ is not close to 0.3, the variance ensures that the prior covers most reasonable values of $\lambda$. The choice of priors will be discussed further in our simulation studies.
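To get a feel for what the suggested default implies, the sketch below summarizes the prior on λ induced by ϕ ~ N(log 0.3, 2). It assumes that the second argument, 2, is a variance on the log scale (so the standard deviation is √2); the variable names are illustrative.

```python
import math
from statistics import NormalDist

kappa, tau = math.log(0.3), 2.0   # location and (assumed) variance on the log scale
phi_prior = NormalDist(mu=kappa, sigma=math.sqrt(tau))

# Quantiles of lambda = exp(phi) follow from quantiles of phi.
median_lambda = math.exp(phi_prior.inv_cdf(0.5))
lower, upper = (math.exp(phi_prior.inv_cdf(p)) for p in (0.025, 0.975))
prob_below_one = phi_prior.cdf(0.0)   # P(lambda < 1) = P(phi < 0)

print(f"median of lambda:       {median_lambda:.3f}")
print(f"central 95% interval:   ({lower:.3f}, {upper:.3f})")
print(f"P(lambda < 1):          {prob_below_one:.3f}")
```

Under this reading the prior median of λ is 0.3 and roughly 80% of the prior mass lies below 1, while the heavy right tail still allows random effect standard deviations several times larger than the residual standard deviation.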

3.3 Simulation study

We conducted a simulation study to evaluate the performance of our method in correctly identifying models with or without random effects. We simulated 250 data sets based on (1) with n = 50, 100, 500, 1000, ni = 3, σ2 = 1, μ = 0, and λ = 0, 0.15, 0.30, 0.45, 0.60. In order to implement the Laplace approximation, we estimated the posterior mode using an algorithm by Nelder and Mead (1965). We used prior distributions μ ~ N(0, 1) and σ2 ~ InvGam(1, 1), which are non-informative given the simulation settings. Estimates of the Bayes factors B10(1) and B10(2) were calculated for each data set and were interpreted according to the scale given by Wasserman (2000) and Jeffreys (1961).
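The data-generating step of this design can be sketched as follows. The grid of n and λ values, n_i = 3, μ = 0 and σ² = 1 come from the text; the function name, the seed, and the dictionary layout are assumptions, and only one replicate per setting is drawn here rather than 250.

```python
import numpy as np

def simulate_anova(n, n_i, lam, mu=0.0, sigma2=1.0, rng=None):
    """Draw one data set from model (1): Y_ij = mu + lam * b_i + eps_ij,
    with b_i ~ N(0, sigma2) and eps_ij ~ N(0, sigma2). Returns an (n, n_i) array."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(sigma2)
    b = rng.normal(0.0, sigma, size=(n, 1))      # one scaled random effect per subject
    eps = rng.normal(0.0, sigma, size=(n, n_i))
    return mu + lam * b + eps

rng = np.random.default_rng(2009)
settings = [(n, lam) for n in (50, 100, 500, 1000)
            for lam in (0.0, 0.15, 0.30, 0.45, 0.60)]
datasets = {(n, lam): simulate_anova(n, 3, lam, rng=rng) for n, lam in settings}
print(len(datasets), datasets[(50, 0.30)].shape)
```

Repeating the draw 250 times per setting and applying a Bayes factor estimate to each replicate would reproduce the structure of the study, though not necessarily its exact numbers.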

Both parameterizations performed well in favoring the correct model, but accuracy depended on both the sample size and the true value of $\lambda$. In general, as $\lambda$ increased our method increasingly favored $M_1^{(a)}$ over the null model. Figure 1 shows box plots of $\log \hat{B}_{10}^{(1)}$ for $\lambda = 0, 0.30$. The dotted black line represents a log Bayes factor of 0. As the sample size increased, our method more accurately detected the absence of a random subject effect for $\lambda = 0$ and more accurately detected its presence for $\lambda > 0$. Additional tables are available in the Web Appendices.

Figure 1. Box plot of $\log \hat{B}_{10}^{(2)}$, by $\lambda$

We compared our method to the restricted likelihood ratio test (RLRT) and ANOVA F-test with α = 0.05. The asymptotic distribution of the RLRT follows a 50:50 mixture of a point mass at 0 and a chi-square distribution with 1 degree of freedom. For the ANOVA model, F =MSA/MSE, with MSA the between-group mean square and MSE the within-group mean square, follows an F distribution with (n−1) and (m−n) degrees of freedom, in which n is the number of subjects and m is the total number of observations. For our Bayesian approach, we chose to reject H0 if B10(a)>1. As illustrated in Table 1 (columns 2–6), the power of our approach was competitive with both the ANOVA F-test and restricted likelihood ratio test. Parameterization (2) was somewhat conservative, while parameterization (1) led to an inflated Type I error for n ≥ 100.
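The two frequentist comparators can be sketched as below. The RLRT statistic itself is assumed to come from a mixed-model fit that is not shown; the function names and the balanced (n, n_i) data layout are assumptions for the example.

```python
import numpy as np
from scipy import stats

def rlrt_pvalue_intercept(stat):
    """p-value for an RLRT statistic under the 50:50 mixture of a point mass
    at 0 and a chi-square distribution with 1 degree of freedom."""
    if stat <= 0:
        return 1.0
    return 0.5 * stats.chi2.sf(stat, df=1)

def anova_f_test(Y):
    """One-way ANOVA F-test for a balanced (n, n_i) response array:
    F = MSA / MSE with (n - 1) and (m - n) degrees of freedom."""
    n, n_i = Y.shape
    m = n * n_i
    row_means = Y.mean(axis=1)
    msa = n_i * np.sum((row_means - Y.mean()) ** 2) / (n - 1)   # between-subject mean square
    mse = np.sum((Y - row_means[:, None]) ** 2) / (m - n)       # within-subject mean square
    f_stat = msa / mse
    return f_stat, stats.f.sf(f_stat, n - 1, m - n)

# Hypothetical data with lambda = 0.3.
rng = np.random.default_rng(3)
Y = 0.3 * rng.normal(size=(100, 1)) + rng.normal(size=(100, 3))
f_stat, p_value = anova_f_test(Y)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
print("mixture p-value for RLRT = 2.0:", rlrt_pvalue_intercept(2.0))
```

Halving the chi-square tail probability is what distinguishes the boundary-corrected test from a naive likelihood ratio test: at α = 0.05 the mixture critical value is about 2.71 rather than 3.84.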

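Table 1. Testing a random intercept or random slope, power and Type I error

Columns 2–6: testing a random intercept. Columns 7–10: testing a random slope.

| n | $\lambda$ | $\hat{B}_{10}^{(1)} > 1$ | $\hat{B}_{10}^{(2)} > 1$ | ANOVA | RLRT | $\sqrt{\psi_{22}}$ | $\hat{B}_{21}^{(1)} > 1$ | $\hat{B}_{21}^{(2)} > 1$ | RLRT |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0 | 3 | 4 | 4 | 3 | 0 | 8 | 3 | 4 |
| | 0.15 | 12 | 12 | 12 | 12 | 0.15 | 14 | 7 | 9 |
| | 0.3 | 22 | 22 | 22 | 22 | 0.3 | 30 | 17 | 20 |
| | 0.45 | 59 | 60 | 60 | 59 | 0.45 | 56 | 57 | 59 |
| | 0.6 | 92 | 93 | 92 | 92 | 0.6 | 75 | 90 | 89 |
| 100 | 0 | 7 | 4 | 4 | 4 | 0 | 4 | 4 | 6 |
| | 0.15 | 14 | 7 | 8 | 8 | 0.15 | 12 | 8 | 13 |
| | 0.3 | 47 | 42 | 43 | 42 | 0.3 | 38 | 38 | 44 |
| | 0.45 | 93 | 86 | 87 | 87 | 0.45 | 69 | 87 | 88 |
| | 0.6 | 99 | 98 | 98 | 98 | 0.6 | 72 | 99 | 100 |
| 500 | 0 | 12 | 4 | 5 | 5 | 0 | 3 | 2 | 4 |
| | 0.15 | 26 | 14 | 17 | 17 | 0.15 | 26 | 21 | 30 |
| | 0.3 | 94 | 90 | 92 | 92 | 0.3 | 80 | 85 | 95 |
| | 0.45 | 100 | 100 | 100 | 100 | 0.45 | 97 | 99 | 100 |
| | 0.6 | 100 | 100 | 100 | 100 | 0.6 | 98 | 100 | 100 |
| 1000 | 0 | 12 | 6 | 8 | 8 | 0 | 1 | 0 | 3 |
| | 0.15 | 36 | 28 | 35 | 34 | 0.15 | 33 | 32 | 45 |
| | 0.3 | 100 | 100 | 100 | 100 | 0.3 | 93 | 99 | 100 |
| | 0.45 | 100 | 100 | 100 | 100 | 0.45 | 85 | 100 | 100 |
| | 0.6 | 100 | 100 | 100 | 100 | 0.6 | 96 | 100 | 100 |

Table gives percent of times the null hypothesis was rejected out of 250 simulations. Type I error is given by $\lambda = 0$ or $\sqrt{\psi_{22}} = 0$.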

To assess the impact of our prior choice, we conducted additional simulations with priors of the form λ ~ logN(h, ζ), with various combinations of h = log(1), log(0.3), log(0.15) and ζ = 1, 2, 3. Additionally, we considered a log t-distribution for λ with 2 and 10 degrees of freedom. Equivalent priors were also assessed for ϕ using parameterization (2). We also considered prior distributions σ2 ∝ σ−2, σ2 ~InvGamma(0.1,0.1), σ2 ~InvGamma(0.01,0.01), and μ ∝ c in which c is a constant. We found that alternative priors on σ2 and μ did not have notable influence on the estimated Bayes factors, but the priors for λ and ϕ did have some influence. More specifically, values of h = log(0.30) and ζ = 2 resulted in power and Type I error rates that closely aligned with the ANOVA F-test and RLRT. Smaller values of h or ζ led to increased Type I error rates and larger values of h or ζ led to more conservative Type I error rates. See the Web Appendices for more details.

4. Testing a random slope

4.1 Linear mixed model

We generalize our approach for testing random effects by considering a linear mixed model

$$y_i = X_i\beta + Z_i b_i + \varepsilon_i, \tag{8}$$

in which yi = (Yi1,…, Yini)′ is a ni × 1 vector of responses, Xi = (xi1,…, xip) is a ni × p design matrix, Zi = (zi1,…, ziq) is a ni × q design matrix, β = (β1,…, βp)′ is a p × 1 vector of parameters, and bi = (bi1,…, biq)′ is a q × 1 vector of random effects. It is assumed that εi ~ N(0,R) is independent of bi ~ N(0, ψ), in which ψ is the q × q covariance matrix of random effects. A popular choice for R is σ2I, which assumes conditional independence given the random effects.

We choose bih ~ N(0, σ2) and introduce a parameter λh that controls the contribution of the hth random effect. Let Mk(a) refer to model k and parameterization a. Similar to Chen and Dunson (2003), our reparameterized model takes the form

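$$M_0^{(1)}: \; y_i = X_i\beta + Z_{0,i}\Lambda_0^{(1)}\Gamma_0 b_{0,i} + \varepsilon_i, \tag{9}$$

in which $Z_{0,i} = (z_{i1}, \ldots, z_{iq})$, $b_{0,i} = (b_{i1}, \ldots, b_{iq})'$, $\Lambda_0^{(1)} = \mathrm{diag}(\boldsymbol{\lambda}_0^{(1)}) = \mathrm{diag}(\lambda_1^{(1)}, \ldots, \lambda_q^{(1)})$, and $\lambda_l^{(1)} \sim \log N(\log(0.3), 2)$ for $l = 1, \ldots, q$. Let $\Gamma_0$ be a lower triangular matrix with $1_q$ along the diagonal and lower off-diagonal elements $\gamma_0$, which induce correlation between the random effects.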

Our focus is to test whether to include an additional random effect bi(q+1). Let Z1,i, Λ1(1), Γ1 and b1,i be equal to their counterparts from (9), but including the elements corresponding to the additional random effect bi(q+1). The model including the additional random effect is

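$$M_1^{(1)}: \; y_i = X_i\beta + Z_{1,i}\Lambda_1^{(1)}\Gamma_1 b_{1,i} + \varepsilon_i, \tag{10}$$

in which $Z_{1,i} = (z_{i1}, \ldots, z_{i(q+1)})$, $b_{1,i} = (b_{i1}, \ldots, b_{i(q+1)})'$, $\Lambda_1^{(1)} = \mathrm{diag}(\boldsymbol{\lambda}_1^{(1)}) = \mathrm{diag}(\lambda_1^{(1)}, \ldots, \lambda_{q+1}^{(1)})$, $\lambda_l^{(1)} \sim \log N(\log(0.3), 2)$ for $l = 1, \ldots, (q + 1)$, and $\Gamma_1$ is a lower triangular matrix with $1_{q+1}$ along the diagonal and lower off-diagonal elements $\gamma_1$.

As demonstrated with the ANOVA model, we also consider an alternate parameterization of (9) and (10) by setting $\boldsymbol{\lambda}_k^{(2)} = \log \boldsymbol{\lambda}_k^{(1)}$, $\Lambda_0^{(2)} = \mathrm{diag}(e^{\boldsymbol{\lambda}_0^{(2)}}) = \mathrm{diag}(e^{\lambda_1^{(2)}}, \ldots, e^{\lambda_q^{(2)}})$, $\Lambda_1^{(2)} = \mathrm{diag}(e^{\boldsymbol{\lambda}_1^{(2)}}) = \mathrm{diag}(e^{\lambda_1^{(2)}}, \ldots, e^{\lambda_{q+1}^{(2)}})$, and $\lambda_l^{(2)} \sim N(\log(0.3), 2)$ for $l = 1, \ldots, (q + 1)$. Let $M_0^{(2)}$ denote the reduced model and $M_1^{(2)}$ denote the full model under this parameterization.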

4.2 Approximating the marginal likelihoods

In order to implement the Laplace approximation, we first marginalize out $b$ and $\sigma^2$. Let $\sigma^2 \sim \mathrm{InvGam}(v, w)$. It can be shown that the marginal distribution $p(Y \mid \beta, \boldsymbol{\lambda}_k^{(a)}, \gamma_k, M_k)$ follows a multivariate t-distribution with density (5), in which $\mu_i = X_i\beta$ and $\Sigma_i = (I_{n_i} + Z_{k,i}\Lambda_k^{(a)}\Gamma_k\Gamma_k'\Lambda_k^{(a)}Z_{k,i}')$. After specifying priors for $\beta$ and $\gamma_k$, we use the Laplace method to integrate over $(\beta, \boldsymbol{\lambda}_k^{(a)}, \gamma_k)$ to approximate the marginal likelihoods $p(Y \mid M_k^{(a)})$ used to evaluate the Bayes factor $B_{10}^{(a)}$. See the Web Appendices for additional details.
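The scale matrix that enters density (5) for the general model can be assembled as in the sketch below. The function name and the convention of filling the off-diagonal elements of Γ row by row are assumptions; for parameterization (2) one would pass the exponentiated parameters as `lam`.

```python
import numpy as np

def marginal_scale(Z, lam, gamma):
    """Sigma_i = I + Z Lambda Gamma Gamma' Lambda Z' for one subject.
    Z: (n_i, q) random-effects design; lam: the q scale parameters;
    gamma: the q(q-1)/2 lower off-diagonal entries of Gamma, filled row by row."""
    Z = np.asarray(Z, dtype=float)
    q = Z.shape[1]
    Lam = np.diag(np.asarray(lam, dtype=float))
    Gam = np.eye(q)                              # ones on the diagonal
    Gam[np.tril_indices(q, k=-1)] = gamma        # correlation-inducing elements
    A = Z @ Lam @ Gam
    return np.eye(Z.shape[0]) + A @ A.T

# Hypothetical subject with a random intercept and slope at four time points.
Z = np.column_stack([np.ones(4), np.array([-0.6, -0.2, 0.2, 0.6])])
Sigma = marginal_scale(Z, lam=[0.5, 0.4], gamma=[0.2])
print(np.round(Sigma, 3))
```

Testing several additional random effects at once only changes the number of columns of Z and the lengths of `lam` and `gamma`; the dimension of the Laplace integral grows with those lengths, not with the number of subjects.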

As previously discussed, many of the existing methods for testing variance components are only applicable in simple settings. One major advantage of our approach is we can test multiple random effects simultaneously by modifying equation (10) such that the Z1,i, Λ1(a), Γ1, and b1,i correspond to a model with several additional random effects.

4.3 Simulation study

We conducted a simulation study to test for the presence of a random slope. We defined one predictor based on time, such that $x_i = (1, 2, \ldots, J)'$ and $x_i^*$ is $x_i$ centered and scaled by two times the standard deviation of $x_i$. This standardization puts the regression coefficients on the same scale as binary indicators (Gelman, 2008). In the context of our method, it puts the $\lambda$ parameter for the random slope on the same scale as the $\lambda$ corresponding to the random intercept. Let $X_i = (1_J, x_i^*)$, $\beta = (\beta_0, \beta_1)'$, $M_1$ refer to the random intercept model, and $M_2$ refer to the random intercept and slope model. Letting $Z_{1,i} = 1_J$, $\boldsymbol{\lambda}_1^{(a)} = \lambda_0^{(a)}$, and $b_{1,i} = b_{i0}$, we have

$$M_1^{(a)}: \; y_{ij} = \beta_0 + \lambda_0^{(a)} b_{i0} + \beta_1 x_{ij}^* + \varepsilon_{ij} \tag{11}$$

for the random intercept model. Letting $Z_{2,i} = (1_J, x_i^*)$, $\boldsymbol{\lambda}_2^{(a)} = (\lambda_0^{(a)}, \lambda_1^{(a)})'$, and $b_{2,i} = (b_{i0}, b_{i1})'$, we have

$$M_2^{(a)}: \; y_{ij} = \beta_0 + \lambda_0^{(a)} b_{i0} + \beta_1 x_{ij}^* + \lambda_1^{(a)} (\gamma_{12} b_{i0} + b_{i1}) x_{ij}^* + \varepsilon_{ij} \tag{12}$$

for the random intercept and slope model. Our focus is to compare model $M_2^{(a)}$ to $M_1^{(a)}$. After integrating out $b$ and $\sigma^2$ to produce marginal multivariate t-distributions, the integrals needed to calculate the marginal distributions $p(Y \mid M_1^{(a)})$ and $p(Y \mid M_2^{(a)})$ have only 3 and 5 dimensions, respectively. Hence the Laplace method can effectively be used to integrate over $(\beta_0, \beta_1, \lambda_0^{(a)})$ in $M_1^{(a)}$ and $(\beta_0, \beta_1, \lambda_0^{(a)}, \lambda_1^{(a)}, \gamma_{12})$ in $M_2^{(a)}$.

We simulated 250 data sets based on a random intercept and slope model, $Y_{ij} = \beta_0 + b_{i0} + (\beta_1 + b_{i1}) x_{ij}^* + \varepsilon_{ij}$. We set $\beta_0 = 2.75$, $\beta_1 = 3$, $J = 10$, $\sigma^2 = 1$, and $b_i \sim N_2(0, \psi)$ with $\psi_{12} = \rho(b_{i0}, b_{i1}) \sqrt{\psi_{11}\psi_{22}}$, $\psi_{11} = 1$, and $\rho(b_{i0}, b_{i1}) = -0.3$. We varied the random slope standard deviation and sample size over $\sqrt{\psi_{22}} = 0, 0.15, 0.30, 0.45, 0.60$ and $n = 50, 100, 500, 1000$, respectively. We used prior distributions $\beta \sim N(0, 10I)$, $\sigma^2 \sim \mathrm{InvGam}(1, 1)$ and $\gamma_{12} \sim N(0, 1)$, which are non-informative given the simulation settings.
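A minimal sketch of this data-generating process, including the two-standard-deviation scaling of time, is given below. The parameter values follow the text; the function names and the use of the sample standard deviation (ddof = 1) when scaling are assumptions.

```python
import numpy as np

def standardized_time(J):
    """x* = time points 1..J, centered and divided by two standard deviations
    (sample standard deviation assumed)."""
    x = np.arange(1, J + 1, dtype=float)
    return (x - x.mean()) / (2.0 * x.std(ddof=1))

def simulate_slopes(n, sd_slope, J=10, beta=(2.75, 3.0), psi11=1.0,
                    rho=-0.3, sigma2=1.0, rng=None):
    """Draw Y_ij = beta0 + b_i0 + (beta1 + b_i1) x*_ij + eps_ij with
    (b_i0, b_i1) ~ N_2(0, psi). Returns the (n, J) responses and x*."""
    rng = np.random.default_rng() if rng is None else rng
    x_star = standardized_time(J)
    psi12 = rho * np.sqrt(psi11) * sd_slope      # covariance implied by the correlation
    psi = np.array([[psi11, psi12], [psi12, sd_slope ** 2]])
    b = rng.multivariate_normal(np.zeros(2), psi, size=n)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=(n, J))
    Y = beta[0] + b[:, [0]] + (beta[1] + b[:, [1]]) * x_star + eps
    return Y, x_star

rng = np.random.default_rng(12)
Y, x_star = simulate_slopes(n=100, sd_slope=0.30, rng=rng)
print(Y.shape, np.round(x_star[:3], 3))
```

Setting `sd_slope=0` gives data from the random intercept model, which is the null case used to assess Type I error.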

In general, as the standard deviation of $b_{i1}$ increased, our method increasingly favored $M_2^{(a)}$ over $M_1^{(a)}$. Figure 2 shows box plots of $\log \hat{B}_{21}^{(1)}$ for $\sqrt{\psi_{22}} = 0$ and $\sqrt{\psi_{22}} = 0.30$. As the sample size increased, our method more accurately detected the absence of a random slope for $\sqrt{\psi_{22}} = 0$ and more accurately detected the presence of a random slope for $\sqrt{\psi_{22}} > 0$. Additional tables of the estimated Bayes factors are available in the Web Appendices.

Figure 2. Box plot of $\log \hat{B}_{21}^{(2)}$, by standard deviation of random slope

We compared our method to the RLRT with α = 0.05. The asymptotic distribution of the RLRT follows a 50:50 mixture of chi-square distributions with 1 and 2 degrees of freedom. Consistent with the ANOVA simulation, we reject H0 if B21(a)>1. As illustrated in Table 1 (columns 7–10), the power of our approach is competitive with the restricted likelihood ratio test. Parameterization (2) is somewhat conservative, while parameterization (1) leads to an inflated Type I error for n = 50. Also, we occasionally ran into numerical problems using parameterization (1).
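For the random slope comparison, the reference distribution stated above changes the p-value calculation as sketched here. The function name is an assumption, and the test statistic is again assumed to come from a separate mixed-model fit; the two tail probabilities are written in closed form so that only the standard library is needed.

```python
import math

def rlrt_pvalue_slope(stat):
    """p-value under the 50:50 mixture of chi-square distributions with
    1 and 2 degrees of freedom."""
    if stat <= 0:
        return 1.0
    sf_chi2_1 = math.erfc(math.sqrt(stat / 2.0))   # P(chi-square_1 > stat)
    sf_chi2_2 = math.exp(-stat / 2.0)              # P(chi-square_2 > stat)
    return 0.5 * sf_chi2_1 + 0.5 * sf_chi2_2

for stat in (2.0, 4.0, 6.0):
    naive = math.exp(-stat / 2.0)                  # unadjusted chi-square_2 p-value
    print(f"stat = {stat}: mixture p = {rlrt_pvalue_slope(stat):.4f}, naive p = {naive:.4f}")
```

The mixture p-value is always smaller than the unadjusted chi-square p-value with 2 degrees of freedom, so ignoring the boundary makes the test conservative.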

We conducted additional simulations with the same set of alternative priors given in section 3.3 in order to evaluate the sensitivity of the estimated Bayes factors to the prior distributions. We found similar results and conclude that a prior distribution λ ~ logN(log(0.3), 2) has good frequentist properties for model selection. Additional details are given in the Web Appendices. We also conducted simulations for testing a random intercept and slope simultaneously and found similar performance with respect to power and Type I error (see Web Appendices).

5. Applications

5.1 Hamilton Rating Scale for Depression

We consider 275 patients (160 lamotrigine 200/400 mg/day, 115 placebo) with at least one outcome measurement and complete covariate data. The number of repeated measurements per subject ranges from 1 to 17, and HAMD scores range from 0 to 35 with a mean value of 7. To better approximate normality, we used a square root transformation of HAMD (sqrt-HAMD). We fit a linear mixed model with sqrt-HAMD as the response, predicted by sqrt-HAMD at screening and baseline, time (in years), treatment, gender, age (< 30, 30–40, 40–50, ≥ 50), and the number of depressive or mixed episodes in the last year (1–2 vs. ≥ 3). Screening refers to the time at enrollment and baseline refers to the time of randomization (after stabilization). Investigators wished to learn whether there was individual-specific variation in the time course of depression symptoms, so that models with random intercepts and slopes (M2), random intercepts only (M1), and no random effects (M0) were fit to the data. We center and scale the variable year by two standard deviations. Based on the scale of both the response and the explanatory variables, we use vague priors on the fixed effects and residual variance as β ~ N9(0, 10I) and σ2 ~InvGamma(0.01, 0.01).

The estimated log Bayes factors are $\log \hat{B}_{21} = 61.6$, $\log \hat{B}_{20} = 348.6$, and $\log \hat{B}_{10} = 287.9$. These estimates show strong evidence for M2 versus the other models, indicating the intercepts and slopes vary significantly by individual. Fitting M2 using MCMC methods, basing inference on 15,000 samples after discarding a burn-in of 10,000, we plot the predicted mean for a typical subject (45 year old female with 1 depressive episode in past year and average values of HAMD at screening and baseline) in Figure 3, along with predicted individual HAMD scores for 30 random subjects. These predicted individual lines highlight the considerable heterogeneity across subjects. A step-wise RLRT approach rejects H0 (M0) versus the alternative H1 (M1) and rejects H1 versus the alternative H2 (M2), which agrees with our Bayesian approach in favoring the random slopes model.

Figure 3. Predicted mean & individual HAMD, with random slope and intercept

Lamotrigine use, age, and sqrt-HAMD at screening and baseline all appear to be significant predictors of the outcome. A one unit increase in sqrt-HAMD at baseline is associated with a 0.63 (95% CI = 0.51, 0.74) increase in mean sqrt-HAMD, and a one unit increase in sqrt-HAMD at screening is associated with a 0.22 (95% CI = 0.02, 0.48) increase in mean sqrt-HAMD. Older patients generally had greater values of sqrt-HAMD than younger patients. As the main association of interest, sqrt-HAMD values for subjects on lamotrigine are on average 0.33 units lower (95% CI = −0.54, −0.10) than sqrt-HAMD values for subjects on placebo. The 95% credible interval does not contain 0, indicating that lamotrigine may be effective at reducing depressive symptoms. These conclusions reinforce the time-to-event analysis of Calabrese et al. (2003).

5.2 Exposure of disinfection by-products in drinking water and male fertility

The three study sites for the disinfection by-product study were chosen due to their different levels of disinfection by-product measures of interest, including THM-Br, HAA-Br, and TOX. However, because the study populations in the three sites differed dramatically by factors that were not well characterized by measured fixed effects, investigators wanted to base inferences on a model that allowed site-specific heterogeneity not only in overall sperm quality but also in the relationships between DBP concentrations and sperm quality. For each DBP, we fit a random intercepts and slopes model (M2), random intercepts model (M1), and model with no random effects (M0), adjusting in all models for age categories, education, and abstinence interval before providing the sample. Each DBP is transformed by subtracting the mean and dividing by two standard deviations. For percent normal sperm, we use a probit transformation multiplied by five so that the transformed response has a range of −10.5 to −1.8, a mean of −5.6, and a variance of 1.8.
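The two transformations described above can be sketched as follows. The input values are hypothetical and are not study data; the function names and the use of the sample standard deviation are assumptions.

```python
from statistics import NormalDist, mean, stdev

def probit_times_five(percent_normal):
    """Probit transformation of a percentage (0-100), multiplied by five."""
    return 5.0 * NormalDist().inv_cdf(percent_normal / 100.0)

def center_scale_two_sd(values):
    """Subtract the mean and divide by two (sample) standard deviations."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / (2.0 * spread) for v in values]

# Hypothetical percent normal sperm values and DBP concentrations.
example_percent = [2.0, 8.5, 13.0, 20.0, 35.0]
example_dbp = [4.1, 10.3, 22.8, 35.0, 51.6]

transformed_response = [probit_times_five(p) for p in example_percent]
scaled_dbp = center_scale_two_sd(example_dbp)
print([round(t, 2) for t in transformed_response])
print([round(s, 2) for s in scaled_dbp])
```

Percentages below 50 map to negative values on the probit scale, which is consistent with the entirely negative range reported for the transformed response.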

Based on the scale of both the response and the explanatory variables, we use vague priors on the fixed effects $\beta$ and residual variance $\sigma^2$ that accommodate a wide range of reasonable mean values. We define these priors as $\beta \sim N_9(\mu, \Sigma)$ and $\sigma^2 \sim \mathrm{InvGamma}(0.01, 0.01)$, with $\mu = (-5.5, 0_8')'$ and $\Sigma$ a diagonal matrix with diagonal elements $(100, 10 \times 1_8')$. For HAA-Br and THM-Br, we observe weak evidence for M1 versus M2 ($\hat{B}_{21} = 0.66$ and $\hat{B}_{21} = 0.31$, respectively) and strong evidence for M1 versus M0 ($\hat{B}_{10} > 100$ for both HAA-Br and THM-Br). For TOX, we observe weak evidence for M2 versus M1 ($\hat{B}_{21} = 1.5$) and strong evidence for M2 versus M0 ($\hat{B}_{20} > 100$). We fit M2 for each DBP using MCMC methods and conduct inference on 40,000 samples after discarding an equal number as a burn-in. We plot the predicted mean response for a 30–35 year old male who has graduated college and has abstained for 2–3 days (Figure 4). One can see that the intercepts appear to vary for all 3 DBPs but the slopes only appear to differ for the TOX exposure. A step-wise RLRT approach rejects H0 (M0) versus the alternative H1 (M1) and fails to reject H1 versus the alternative H2 (M2) for all three DBP exposures. With the exception of the TOX exposure, this assessment agrees with our Bayesian approach.
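One way to read the reported Bayes factors is to convert them to posterior model probabilities. The sketch below does this for the comparison of M2 against M1 only, assuming equal prior odds and ignoring M0; the function name and dictionary are illustrative.

```python
def posterior_prob(bayes_factor, prior_odds=1.0):
    """Posterior probability of the numerator model in a two-model comparison."""
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Estimated Bayes factors for M2 versus M1, as reported in the text.
reported_b21 = {"HAA-Br": 0.66, "THM-Br": 0.31, "TOX": 1.5}
prob_m2 = {dbp: posterior_prob(bf) for dbp, bf in reported_b21.items()}
for dbp, p in prob_m2.items():
    print(f"{dbp}: P(M2 | Y) = {p:.2f}")
```

On this scale none of the three comparisons is decisive, which matches the description of the evidence between M1 and M2 as weak in either direction.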

Figure 4. Predicted means of transformed % normal sperm, M2

Based on M1, both HAA-Br and THM-Br have posterior distributions centered near 0, indicating little association between these DBPs and percent normal sperm. Based on M2, the posterior distribution of TOX tends to be centered below 0 for Galveston but near 0 for Raleigh and Memphis. Hence, increasing TOX exposure may be associated with decreasing values of percent normal sperm among patients in Galveston.

6. Discussion

We recommend our approach as a simple and efficient method for testing random effects in the linear mixed model. Our approach avoids issues with testing on the boundary of the parameter space, uses low-dimensional approximations to the Bayes factor, and incorporates default priors on the random effects. The scaling of the random effects to the residual variance makes the log N(log(0.3) × 1, 2 × I) and N(log(0.3) × 1, 2 × I) distributions reasonable default priors for λk(1) and λk(2), respectively. Simulations suggest that these priors have good small sample properties and consistency in large samples. They also have good frequentist properties with respect to Type I error and power. Incorporating reasonable default priors on the fixed effects, our method can be used for comparing a large class of random effects models with varying fixed and random effects.

Alternative procedures for allowing default priors for model selection via Bayes factors are discussed by Berger and Pericchi (1996). These include the authors’ proposed intrinsic Bayes factors, the Schwarz approximation (Schwarz, 1978), and the methods of Jeffreys (1961) and Smith and Spiegelhalter (1980). Gelman (2006) discusses various approaches to default priors specifically for variance components. Common approaches include the uniform prior (e.g. Gelman, 2007), the half-t family of prior distributions, and the inverse-gamma distribution (Spiegelhalter et al., 2003). These prior distributions can encounter difficulties when the variance components are close to 0. Other discussions of selecting default priors on variance components include Natarajan and Kass (2000), Browne and Draper (2006), and Kass and Natarajan (2006).


Acknowledgments

This work was completed while the first author was a graduate student at the University of North Carolina at Chapel Hill. We would like to acknowledge GlaxoSmithKline for generously providing data from the clinical trial involving bipolar disorder. We would also like to acknowledge Dr. Andrew F. Olshan for providing data from the Healthy Men Study, which was funded by the U.S. Environmental Protection Agency (R-82932701) and the American Water Works Association Research Foundation (CR825625-01, CR827268-01, CR828216-01). This research was also supported by the U.S. Environmental Protection Agency (R-83184301-0) and the National Institute of Environmental Health Sciences (T32ES007018, P30ES10126).

Footnotes

Supplementary Materials

Web Appendices referenced in the text are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

References

  1. Bartlett MS. Comment on “A Statistical Paradox” by D. V. Lindley. Biometrika. 1957;44:533–534.
  2. Berger JO, Pericchi LR. The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association. 1996;91:109–122.
  3. Browne WJ, Draper D. A comparison of Bayesian and likelihood-based methods for fitting multilevel models (with discussion). Bayesian Analysis. 2006;1:473–514.
  4. Cai B, Dunson DB. Bayesian covariance selection in generalized mixed models. Biometrics. 2006;62:446–457. doi: 10.1111/j.1541-0420.2005.00499.x.
  5. Calabrese JR, Bowden CL, Sachs G, Yatham LN, Behnke K. A placebo-controlled 18-month trial of lamotrigine and lithium maintenance treatment in recently depressed patients with bipolar I disorder. Journal of Clinical Psychiatry. 2003;64:1013–1024. doi: 10.4088/jcp.v64n0906.
  6. Chen Z, Dunson DB. Random effects selection in linear mixed models. Biometrics. 2003;59:762–769. doi: 10.1111/j.0006-341x.2003.00089.x.
  7. Commenges D, Jacqmin-Gadda H. Generalized score test of homogeneity based on correlated random effects models. Journal of the Royal Statistical Society, Series B. 1997;59:157–171.
  8. Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B. 2004;66:165–185.
  9. Erkanli A. Laplace approximations for posterior expectations when the mode occurs at the boundary of the parameter space. Journal of the American Statistical Association. 1994;89:250–258.
  10. Feng Z, McCulloch CE. Statistical inference using maximum likelihood estimation and the generalized likelihood ratio when the true parameter lies on the boundary of the parameter space. Statistics & Probability Letters. 1992;11:325–332.
  11. Gelman A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis. 2006;1:515–533.
  12. Gelman A. Running WinBugs and OpenBugs from R. 2007. Available at www.stat.columbia.edu/~gelman/bugsR/.
  13. Gelman A. Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine. 2008;27:2865–2873. doi: 10.1002/sim.3107.
  14. Hsiao CK. Approximate Bayes factors when a mode occurs on the boundary. Journal of the American Statistical Association. 1997;92:656–663.
  15. Jeffreys H. Theory of Probability. 3rd edition. Oxford, U.K.: Oxford University Press; 1961.
  16. Kass RE, Natarajan R. A default conjugate prior for variance components in generalized linear mixed models (comment on article by Browne and Draper). Bayesian Analysis. 2006;1:535–542.
  17. Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.
  18. Kinney SK, Dunson DB. Fixed and random effects selection in linear and logistic models. Biometrics. 2008;63:690–698. doi: 10.1111/j.1541-0420.2007.00771.x.
  19. Kuonen D. Numerical integration in S-plus or R: A survey. Journal of Statistical Software. 2003;8:1–14.
  20. Laird N, Ware J. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  21. Lin X. Variance components testing in generalised linear models with random effects. Biometrika. 1997;84:309–326.
  22. Molenberghs G, Verbeke G. Likelihood ratio, score, and Wald tests in a constrained parameter space. The American Statistician. 2007;61:22–27.
  23. Natarajan R, Kass RE. Reference Bayesian methods for generalized linear mixed models. Journal of the American Statistical Association. 2000;95:227–237.
  24. Nelder JA, Mead R. A simplex algorithm for function minimization. Computer Journal. 1965;7:308–313.
  25. Pauler DK, Wakefield JC, Kass RE. Bayes factors and approximations for variance component models. Journal of the American Statistical Association. 1999;94:1242–1253.
  26. Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–464.
  27. Self SG, Liang KY. Asymptotic properties of maximum likelihood estimators and the likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association. 1987;82:605–610.
  28. Shapiro A. Towards a unified theory of inequality constrained testing in multivariate analysis. International Statistical Review. 1988;56:49–62.
  29. Silvapulle MJ. Robust Wald-type tests of one-sided hypotheses in the linear model. Journal of the American Statistical Association. 1992;87:156–161.
  30. Sinharay S, Stern HS. Bayes factors for variance component models in generalized mixed models. In: Bayesian Methods with Applications to Science, Policy and Official Statistics. ISBA 2000 Proceedings; 2001. pp. 507–516.
  31. Smith AFM, Spiegelhalter DJ. Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society, Series B. 1980;42:213–220.
  32. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B. 2002;64:583–640.
  33. Spiegelhalter DJ, Thomas A, Best NG, Gilks WR, Lunn D. WinBUGS User Manual, Version 1.4. 2003. Available at www.mrc-bsu.cam.ac.uk/bugs.
  34. Stram DO, Lee JW. Variance components testing in the longitudinal mixed effects model. Biometrics. 1994;50:1171–1177.
  35. Tierney L, Kadane JB. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association. 1986;81:82–86.
  36. Verbeke G, Molenberghs G. The use of score tests for inference on variance components. Biometrics. 2003;59:254–262. doi: 10.1111/1541-0420.00032.
  37. Wasserman L. Bayesian model selection and model averaging. Journal of Mathematical Psychology. 2000;44:92–107. doi: 10.1006/jmps.1999.1278.
  38. Zhang D, Lin X. Variance component testing in generalized linear mixed models for longitudinal/clustered data and other related topics. Model Uncertainty in Latent Variable and Random Effects Models. 2008 (to appear).
