Author manuscript; available in PMC: 2013 Apr 19.
Published in final edited form as: Am Stat. 2011;65(4):274–282. doi: 10.1198/tas.2011.11077

An Overview of Current Software Procedures for Fitting Linear Mixed Models

Brady T West 1, Andrzej T Galecki 2
PMCID: PMC3630376  NIHMSID: NIHMS451201  PMID: 23606752

Abstract

At present, there are many software procedures available enabling statisticians to fit linear mixed models (LMMs) to continuous dependent variables in clustered or longitudinal data sets. LMMs are flexible tools for analyzing relationships among variables in these types of data sets, in that a variety of covariance structures can be used depending on the subject matter under study. The explicit random effects in LMMs allow analysts to make inferences about the variability between clusters or subjects in larger hypothetical populations, and examine cluster- or subject-level variables that explain portions of this variability. These models can also be used to analyze longitudinal or clustered data sets with data that are missing at random (MAR), and can accommodate time-varying covariates in longitudinal data sets. While the software procedures currently available have many features in common, more specific analytic aspects of fitting LMMs (e.g., crossed random effects, appropriate hypothesis testing for variance components, diagnostics, incorporating sampling weights) may only be available in selected software procedures. With this article, we aim to perform a comprehensive and up-to-date comparison of the current capabilities of software procedures for fitting LMMs, and provide statisticians with a guide for selecting a software procedure appropriate for their analytic goals.

Keywords: Models for Clustered Data, Longitudinal Data Analysis, Covariance Structures, Statistical Software

1. INTRODUCTION

Linear mixed models (LMMs) are flexible analytic tools for modeling correlated continuous data. Correlations among values on continuous dependent variables may arise from repeated measurements collected on study subjects, or from subjects being clustered in a way that would introduce similar values on the measures of interest. These types of dependent variables arise in many different contexts from many different fields, where study designs may not result in balanced sets of continuous measures (either due to missing data or unbalanced designs). Due to their ability to accommodate these important features of continuous data collected in many studies, these models have become extremely popular tools among data analysts in a variety of fields.

Because of this increased popularity, recent years have seen a proliferation of software procedures capable of fitting these types of models. West, Welch and Galecki (2006) provide a brief history of some of these developments. In the past year alone, new versions of software procedures for fitting LMMs have been released in the SAS, SPSS, R, S+, Mplus, Stata, MLwiN, and HLM software packages, incorporating faster algorithms for fitting the models and options for accommodating some of the many components that these models may have (e.g., subgroup-specific variance-covariance structures for errors). With this article, we aim to provide readers with a current overview of the software procedures that are available for fitting these models, including comparisons of their features and their abilities to accommodate the many analytic aspects that accompany such a broad class of models.

We do not aim to provide a detailed overview of the theory or many possible applications of LMMs in this article. Readers interested in the theory underlying these models can turn to comprehensive texts by Demidenko (2004), Diggle et al. (2002), Fitzmaurice et al. (2004), Goldstein (2010), Jiang (2010), McCulloch, Searle, and Neuhaus (2008), Searle, Casella, and McCulloch (1992), or Verbeke and Molenberghs (2000). Readers more interested in applications of these models in practice using statistical software can turn to Brown and Prescott (2006), Faraway (2005), Gelman and Hill (2006), Hox (2010), Littell et al. (2006), Pinheiro and Bates (2009), Rabe-Hesketh and Skrondal (2008), Raudenbush and Bryk (2002), Singer and Willett (2003), Twisk (2006), Verbeke and Molenberghs (1997), West (2009), or West, Welch, and Galecki (2006). We do assume that readers have basic familiarity with essential LMM concepts, including alternative model specification, distributional assumptions, alternative estimation techniques, conceptual differences between fixed and random effects, and alternative covariance structures.

With this article, we aim to identify which capabilities of currently available software procedures for fitting LMMs are widely shared and which are less commonly implemented across the procedures. We do not aim to declare certain software procedures as “better” than others, but only to present up-to-date, user-friendly comparisons of the procedures based on features that we feel are important for fitting these models. Section 2 compares currently available software procedures in terms of these features, focusing on various analytic aspects that will be of interest to statisticians working with these models. Section 3 summarizes our comparisons with a concluding discussion.

2. COMPARISONS OF SOFTWARE PROCEDURES

In this section, we introduce important analytic aspects of fitting LMMs that statisticians will need to consider carefully when selecting a software procedure for their analysis. In general, modern software procedures currently available for fitting LMMs share many common analytic capabilities, and we discuss these briefly below. Our focus in this review is on key analytic aspects of fitting LMMs that are not yet widely implemented across the available software procedures, and we present comparisons of the available software in terms of current implementation of these aspects in Tables 1 through 5. More details on these different aspects can be found in West, Welch, and Galecki (2006).

Table 1.

Comparisons of Model Specification Aspects

Software Procedure (Version) | Graphical Multilevel Specification | Crossed Random Effects | Multiple Outcomes | Choices for Covariance Matrix of Random Effects (D) | Group-Specific D Matrices | Group-Specific Covariance Matrices for Model Errors (R)
HLM (7)
MIXREG* (1.2)
MLwiN (2.22)
Mplus (6.1) ** 1
R (2.12.2): lme()
R (2.12.2): lmer()
R (2.12.2): hglm() 2
SAS (9.2): PROC MIXED 3
SAS (9.2): PROC GLIMMIX
SAS (9.2): PROC HPMIXED
SPSS (20): MIXED / GENLINMIXED
Stata (12): gllamm 4
Stata (12): xtmixed 5
Statistica (10)
SYSTAT (13)
WinBUGS (1.4.3)
1. Mplus provides group-specific covariance matrices by multiple group analysis via the KNOWNCLASS option and TYPE=MIXTURE. When all classes are known, this is identical to multiple group analysis. Heterogeneity in Level-1 variances can also be accommodated using multilevel regression mixture models; see Muthen and Asparouhov (2009a, 2009b) for additional details.

2. The HGLMfit() function in the R package HGLMMM can be used to fit models with crossed random effects using the same estimation methods as hglm(). See Ronnegard et al. (2011) for more details on the hglm package in R, which offers the flexibility of fitting generalized linear mixed models with different families of distributions for the random effects (aside from Gaussian).

3. See Galecki (1994) for computational details.

4. In general, software like the gllamm procedure that has been designed for fitting models with nested random effects only can be “tricked” into fitting models with crossed random effects: see http://www.stata.com/statalist/archive/2004-04/msg00762.html for details. We advise analysts to use procedures like xtmixed that have been programmed to handle crossed random effects appropriately so as to minimize possible programming errors.

5. From Stata: The -xtmixed- command can accommodate group-specific D (random-effects) matrices through repetitive independent-panel specifications. As a simpler example, consider a two-level random-coefficient model for outcome variable -y- with a random coefficient for -z-. Suppose we want the random-effects covariance structure, the covariance matrix of random intercept and random coefficient for -z-, to be different between various groups, say boys and girls. This can be done as follows:

. xtmixed y z <…> || id: boy boyXz, nocons cov(un) || id: girl girlXz, nocons cov(un)

where the -id- variable is the second-level identifier; -boy- and -girl- are dummy variables for boys and girls, respectively; and -boyXz- and -girlXz- contain gender-specific interactions with covariate -z-.

** Crossed random effects will be available in Version 7 of Mplus.

Aspect 1: Data Organization

Most current software procedures (e.g., PROC MIXED in SAS/STAT) require a clustered or longitudinal data set to be in a “long” or “vertical” format, with multiple rows per cluster (or subject in a longitudinal study). Nearly all of the statistical software packages mentioned in this review will accept as input raw text files in a delimited format (where rows represent cases, columns represent variables, and data for different variables are separated by a character, such as a comma or a blank space), or data files in a format native to a specific package (e.g., .dta files in Stata). However, input data files in a “wide” format (which is common in longitudinal studies, where different variables contain repeated measures) will need to be restructured to the vertical format required by procedures that fit LMMs. Most of the “general-purpose” packages (e.g., SPSS, Stata, SYSTAT) will provide users with data management tools allowing for easy restructuring of “wide” or “horizontal” data sets into the necessary vertical structure for an LMM analysis (e.g., the VARSTOCASES command in SPSS, the reshape command in Stata, or the WRAP command in SYSTAT). Users of software procedures specifically dedicated to LMM analyses (e.g., HLM) will need to perform this data management in another package prior to analysis. Some software procedures (e.g., HLM) require the user to input multiple data sets in the “long” form, with each data set corresponding to a specific level of a hierarchical study design (e.g., a student-level data set with both a school ID and student ID, and a school-level data set).
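The restructuring from a “wide” to a “long” (vertical) format described above can be sketched in a few lines. The following is only a minimal illustration in plain Python, not the behavior of any particular package's restructuring command; the function name wide_to_long and the variable names (id, group, y1 to y3, time, y) are hypothetical and chosen for the example.

```python
def wide_to_long(rows, measure_cols, time_name="time", value_name="y"):
    """Turn 'wide' records (one row per subject) into 'long' records
    (one row per subject per measurement occasion)."""
    long_rows = []
    for row in rows:
        # Everything that is not a repeated measure is carried along
        # unchanged (subject ID and subject-level covariates).
        fixed = {k: v for k, v in row.items() if k not in measure_cols}
        for occasion, col in enumerate(measure_cols, start=1):
            if row[col] is None:
                # A missing occasion simply contributes no row, so the
                # long data set may be unbalanced.
                continue
            long_rows.append({**fixed, time_name: occasion, value_name: row[col]})
    return long_rows


# Hypothetical wide data: three repeated measures per subject.
wide = [
    {"id": 1, "group": "A", "y1": 10.2, "y2": 11.0, "y3": 12.4},
    {"id": 2, "group": "B", "y1": 9.8, "y2": None, "y3": 10.9},
]
long_data = wide_to_long(wide, ["y1", "y2", "y3"])
for record in long_data:
    print(record)
```

Subject 2 contributes only two rows here because the second occasion is missing, which is the kind of unbalanced structure that LMM procedures are designed to accept.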

Aspect 2: Model Specification

Current software procedures for fitting LMMs offer a variety of ways for users to specify the different components in a model of interest. Some software procedures (e.g., HLM, MLwiN) offer users a graphical interface for specifying models in a multilevel (or hierarchical) manner, while others rely on syntax (e.g., SAS PROC MIXED) or possibly selection of choices from a menu (e.g., SPSS MIXED) for specifications more in the spirit of a “single equation” for the desired model. In general, in a given software package, the specification of design matrices for fixed and random effects is similar to specifications used for classical linear models. However, different software procedures have different syntax or different options allowing users to specify alternative covariance structures in these models, and we aim to provide some clarity on these differences below. Table 1 compares currently available software procedures in terms of different model specification aspects.

Statisticians working with subject matter where heterogeneous error covariance structures are expected between different groups of analytic units will be attracted to procedures that are capable of allowing heterogeneity in the error covariance matrix across different groups (e.g., PROC MIXED in SAS, GENLINMIXED in SPSS Statistics, xtmixed in Stata, gllamm in Stata, and hglm() in R). Table 1 shows that most software procedures are capable of fitting models with crossed random effects, but important implementation holes still exist with regard to models allowing for different variance-covariance structures for different groups in a given data set, which is a very important aspect of fitting LMMs. We also see in Table 1 that the vast majority of the available software procedures require users to specify models using syntax, where the syntax defines a “single equation” form of the model. Only the HLM and MLwiN packages offer users the convenience of specifying models graphically in a multilevel structure, using point-and-click interfaces. Users of syntax-driven procedures need to take extra care in making sure that the model of interest has been correctly specified.

Aspect 3: Estimation Methods and Likelihood Optimization

Most current software procedures capable of fitting LMMs provide users with a choice of residual maximum likelihood (REML) or maximum likelihood (ML) estimation, depending on the hypothesis test of interest. Software procedures may vary in terms of methods used to derive starting values for estimates of covariance parameters, and some software procedures (e.g., SAS PROC MIXED) allow the user to specify starting values for the covariance parameter estimates. Most current software procedures also have default methods for optimizing likelihood functions, and various combinations of these methods may be used to arrive at the final estimates. Current computing power generally makes differences between these methods trivial for models with nested random effects. Models with crossed random effects require more advanced methods that are not yet widely programmed into LMM software procedures. Table 2 compares the currently available software procedures in terms of these estimation aspects.

Table 2.

Comparisons of Estimation and Likelihood Optimization Aspects

Software Procedure (Version) | Alternative Algorithms to Initiate Parameter Estimation | Correct Handling of Survey Weights | Constraints on Variance-Covariance Parameters | Sparse Matrices for Computational Speed (see note 1)
HLM (7)
MIXREG (1.2)
MLwiN (2.22)
Mplus (6.1)
R (2.12.2): lme() 2
R (2.12.2): lmer() 2
R (2.12.2): hglm()
SAS (9.2): PROC MIXED
SAS (9.2): PROC GLIMMIX 3
SAS (9.2): PROC HPMIXED
SPSS (20): MIXED / GENLINMIXED
Stata (12): gllamm
Stata (12): xtmixed
Statistica (10)
SYSTAT (13)
WinBUGS (1.4.3) 4
1. See Section 5 in the paper http://cran.r-project.org/web/packages/lme4/vignettes/Theory.pdf for a primer.

2. Log-Cholesky decomposition to assure variance-covariance matrices remain positive definite.

3. Modeling the variance with structures based on the Cholesky root, as well as constructing and solving the mixed model equation using the Cholesky root of the covariance matrix of random effects.

4. In purely model-based methods, as implemented via the MCMC-based Gibbs Sampler in the WinBUGS software, inclusion of auxiliary variables used to define survey weights in LMMs is generally recommended; see Gelman (2007) for a discussion of the issues with this approach.

Occasionally, analysts wish to fit LMMs to survey data sets collected from samples with complex designs (see Heeringa et al., 2010, Chapter 12). Complex sample designs are generally characterized by division of the population into strata, multi-stage selection of clusters of individuals from within the strata, and unequal probabilities of selection for both clusters and the ultimate individuals sampled. These unequal probabilities of selection generally lead to the construction of sampling weights for individuals, which ensure unbiased estimation of descriptive parameters when incorporated into an analysis. These weights might be further adjusted for survey nonresponse and calibrated to known population totals. Traditionally, analysts might consider a design-based approach to incorporating these complex sampling features when estimating regression models (Heeringa et al., 2010). More recently, statisticians have started to explore model-based approaches to analyzing these data, using LMMs to incorporate fixed effects of sampling strata and random effects of sampled clusters.

The primary difficulty with the development of model-based approaches to analyzing these data has been choosing appropriate methods for incorporating the sampling weights (see Gelman, 2007 for a summary of the issues). Pfeffermann et al. (1998), Asparouhov and Muthen (2006), and Rabe-Hesketh and Skrondal (2006) have developed theory for estimating multilevel models in a way that incorporates the survey weights, and Rabe-Hesketh and Skrondal (2006), Carle (2009) and Heeringa et al. (2010, Chapter 12) have presented applications using current software procedures, but this continues to be an active area of statistical research. Software procedures capable of fitting LMMs are at various stages of implementing the approaches that have been proposed in the literature thus far for incorporating complex design features, and analysts need to consider this when fitting LMMs to complex sample survey data. Analysts interested in fitting LMMs to data collected from complex sample surveys will be attracted to procedures that are capable of correctly incorporating the survey weights into the estimation procedures (HLM, MLwiN, Mplus, xtmixed, and gllamm), consistent with the present literature in this area.

Presently, only selected software procedures are relying on estimation methods using sparse matrices, including ASREML, HLM, MLwiN, the lmer() function in R, GENLINMIXED in SPSS Statistics, and PROC HPMIXED in SAS. These methods greatly improve computational speed, especially for larger data sets. Slightly more procedures are using alternative methods to invoke parameter estimation, which can result in more stability of parameter estimates (see West et al., 2006, Section 2.5). Given modern computing power, these alternative aspects will only truly impact models being fitted to very large data sets. Finally, most software procedures have built-in constraints on covariance parameters during estimation routines, preventing negative estimates of variance components. Upon encountering negative variance estimates, some estimation routines working under specific constraints will simply display estimates of 0 for the variance components (e.g., PROC MIXED in SAS; this constraint can be relaxed by using the NOBOUND option on the PROC MIXED line). The final estimates of variance components displayed by the software procedures (and corresponding “notes” or warning messages generated by executing the procedures) may vary depending on the constraints being used.
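One way to keep a covariance matrix of random effects positive definite during optimization, noted for some procedures in Table 2, is a log-Cholesky parameterization. The sketch below shows the general idea for a 2 x 2 matrix D only; it is not the internal code of any package, and the function name and the ordering of the unconstrained parameters are assumptions made for the illustration.

```python
import math


def log_cholesky_to_cov(theta):
    """Map three unconstrained real numbers to a 2 x 2 positive-definite
    matrix D = L L^T, where L is lower triangular.

    theta[0] and theta[2] are the LOGS of the diagonal elements of L, so the
    diagonal of L is always strictly positive; theta[1] is the off-diagonal
    element of L and is left unconstrained.
    """
    l11 = math.exp(theta[0])
    l21 = theta[1]
    l22 = math.exp(theta[2])
    # Multiply out L L^T by hand for the 2 x 2 case.
    return [
        [l11 * l11, l11 * l21],
        [l11 * l21, l21 * l21 + l22 * l22],
    ]


# Any real-valued theta yields a valid covariance matrix, so an optimizer can
# search over theta without explicit bounds on variances.
D = log_cholesky_to_cov([0.3, -1.5, -2.0])
determinant = D[0][0] * D[1][1] - D[0][1] * D[1][0]
print(D, determinant)
```

Because the determinant equals the squared product of the diagonal elements of L, it is strictly positive for every choice of theta. One consequence is that a variance of exactly zero lies outside this parameterization, which is one reason displayed estimates and warnings can differ between procedures using different constraints.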

Aspect 4: Hypothesis Testing

Most software procedures will automatically compute test statistics, approximate degrees of freedom for the test statistics, and corresponding p-values for fixed effects (to test null hypotheses that the fixed effects are equal to zero). Procedures will vary in terms of the approximations used (or available for use) for the degrees of freedom. For instance, Satterthwaite and Kenward-Roger DF approximations (Kenward and Roger, 1997) are computed by PROC MIXED and PROC GLIMMIX in SAS. Some software procedures (e.g., the mcmcsamp() function in the contributed R package lme4) enable the user to apply Bayesian methods (i.e., generating draws from posterior distributions based on a fitted model) post-estimation for making inferences. Bayesian approaches to making inferences for fixed effects avoid the complication of selecting a degrees of freedom approximation for the test statistics.

When null hypotheses for covariance parameters do not define the parameters to be on the boundary of a parameter space (e.g., a null hypothesis that the covariance between two errors is equal to zero), performing asymptotic likelihood ratio tests for the parameters is straightforward using modern statistical software. Computational difficulties arise when testing whether variance components are equal to 0 (which places the variance on the boundary of its parameter space). Some software procedures will automatically present some form of an asymptotic Wald test (e.g., the estimated variance component divided by its estimated standard error) for covariance parameters, which has several shortcomings (see Berkhof and Snijders, 2001). Other procedures might report confidence intervals for covariance parameters (see Bottai and Orsini, 2004, for a Stata procedure). The HLM software uses a chi-square test for variance components explained by Raudenbush and Bryk (2002, p. 63–64). From a frequentist perspective, the current literature advocates the reference of likelihood ratio test statistics for variance components to distributions defined (asymptotically) by mixtures of chi-square distributions [see Verbeke and Molenberghs (2000) and Zhang and Lin (2008) for nice discussions of this issue].
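To make the mixture idea concrete, the following sketch computes a p-value for a likelihood ratio statistic referred to an equal-weight mixture of two chi-square distributions. It assumes the two commonly cited special cases: a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom for a single variance component, and a 50:50 mixture with 1 and 2 degrees of freedom when a second random effect (its variance and one covariance) is added to a random-intercept model. The function names are hypothetical, and other hypotheses require different mixtures that are not covered here.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variate with df in {0, 1, 2} only.

    Closed forms are used so that no external library is needed.
    """
    x = max(x, 0.0)  # a likelihood ratio statistic cannot be negative
    if df == 0:
        # Chi-square with 0 df is a point mass at zero.
        return 1.0 if x == 0.0 else 0.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 0, 1, or 2")


def mixture_pvalue(lrt_stat, df_low, df_high):
    """p-value under a 50:50 mixture of chi-square(df_low) and chi-square(df_high)."""
    return 0.5 * chi2_upper_tail(lrt_stat, df_low) + 0.5 * chi2_upper_tail(lrt_stat, df_high)


# Hypothetical statistic: -2 * (reduced log-likelihood - full log-likelihood).
lrt = 3.84
p_single_variance = mixture_pvalue(lrt, 0, 1)   # test of one variance component
p_naive = chi2_upper_tail(lrt, 1)               # classical chi-square(1) reference
print(p_single_variance, p_naive)
```

For a single variance component the mixture p-value is half of the classical chi-square(1) p-value whenever the statistic is positive, which illustrates why the classical test is conservative in this setting.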

The exact null distribution of the likelihood ratio test statistic for a single variance component under more general conditions (including small samples) has been defined (Crainiceanu and Ruppert, 2004), and Fabian Scheipl has written software in R implementing exact likelihood ratio tests based on simulations from this distribution (the exactLRT() function in the RLRsim package). Appropriate null distributions of likelihood ratio test statistics for multiple covariance parameters have not been derived to date; classical likelihood ratio tests comparing nested models with multiple variance components constrained to be 0 in the reduced model should be considered conservative. The xtmixed command in Stata, for example, makes explicit note of this when users fit models with multiple random effects. Bayesian methods based on draws from posterior distributions defined by a given model are generally considered more appropriate for making inferences about covariance parameters; however, readily available software procedures for this type of inference (e.g., the pvals.func() function from the languageR package in R) are not as common. Table 3 compares the currently available software procedures in terms of hypothesis testing capabilities.

Table 3.

Comparisons of Hypothesis Testing Aspects

Software Procedure (Version) | Alternative Methods for Computing Approximate DF | Bayesian Inference Methods | Appropriate Hypothesis Tests for Variance Components
HLM (7)
MIXREG (1.2)
MLwiN (2.22)
Mplus (6.1)
R (2.12.2): lme() 1 2
R (2.12.2): lmer() 1 2
R (2.12.2): hglm()
SAS (9.2): PROC MIXED
SAS (9.2): PROC GLIMMIX 3
SAS (9.2): PROC HPMIXED
SPSS (20): MIXED / GENLINMIXED
Stata (12): gllamm
Stata (12): xtmixed 4
Statistica (10)
SYSTAT (13)
WinBUGS (1.4.3)
1. With mcmcsamp() in the lme4 package, or pvals.func() in the languageR package.

2. With exactLRT() in the RLRsim package.

3. A mixture of central Chi-square distributions is used to compute the p-values for the likelihood ratio test for several recognized special cases.

4. For tests of single variance components.

Table 3 indicates that appropriate inferential methods for the variance components associated with random effects have (to date) not been widely implemented. While likelihood ratio tests for variance components are fairly easy to program post-estimation (see West et al., 2006), the software procedures currently available rarely automate this kind of test. SAS PROC GLIMMIX recognizes several special cases and uses a mixture of central Chi-square distributions to compute p-values for likelihood ratio tests. The xtmixed command in the Stata software currently automates this test correctly for null hypotheses that single variance components are equal to zero; when multiple random effects are included in a model, Stata uses a classical likelihood ratio test, and reminds users that results of the test should be considered conservative.

Bayesian methods for making inferences about variance components are also becoming more widely advocated, but these methods have yet to be widely implemented outside of specialized post-estimation routines available in selected software packages [e.g., the mcmcsamp() function in the R software, or available Markov Chain Monte Carlo (MCMC) procedures for inference in MLwiN]. We also note that the HLM software implements a relatively unique but fairly straightforward automated test of null hypotheses for single variance components, which only relies on the ability to compute ordinary least squares (OLS) estimates of the random coefficients associated with each cluster or subject (in a longitudinal study). For more information on this chi-square test, see Raudenbush and Bryk (2002, p. 63–64).

Aspect 5: Model Diagnostics

Assessment of model diagnostics is more complicated in the case of the LMM as opposed to a simple linear regression model, given that there are more underlying assumptions (see Fung et al., 2002, and Zewotir and Galpin, 2005). Most LMM software can readily compute empirical best linear unbiased predictors (EBLUPs) for random effects and model-based residuals for assessing simple model diagnostics. Recent work by Schabenberger (2004) has allowed SAS users to examine influence diagnostics when using PROC MIXED, but diagnostic procedures still tend to be fairly limited in most software procedures. See the case studies in West et al. (2006) for examples of simpler model diagnostics that can be implemented in most current software packages, including plots of residuals and plots of EBLUPs. The ability of a given software procedure to generate important diagnostic statistics for LMMs is an important consideration for model assessment. Table 4 compares the currently available software procedures in terms of model diagnostic capabilities.

Table 4.

Comparisons in terms of model diagnostic aspects

Software Procedure (Version) | Influence Statistics | Diagnostic Plots
HLM (7)
MIXREG (1.2)
MLwiN (2.22)
Mplus (6.1)
R (2.12.2): lme()
R (2.12.2): lmer()
R (2.12.2): hglm()
SAS (9.2): PROC MIXED
SAS (9.2): PROC GLIMMIX
SAS (9.2): PROC HPMIXED
SPSS (20): MIXED / GENLINMIXED
Stata (12): gllamm
Stata (12): xtmixed
Statistica (10)
SYSTAT (13)
WinBUGS (1.4.3)

Table 4 indicates that tools for model diagnostics are also currently lacking in the available LMM software. Thorough assessment of model diagnostics for LMMs is essential for making sure that inferences are not being heavily influenced by unusual observations or unusual clusters (or subjects, in the case of a longitudinal study), and there is clearly room for software development in this area. See West et al. (2006, Chapter 3) for examples of the diagnostic capabilities within SAS PROC MIXED.

Aspect 6: Statistical Output

LMM analyses can produce a great deal of output, and alternative software procedures vary in terms of the amount of output produced by default. Basic output shared in common by the available software procedures includes parameter estimates, estimates of asymptotic standard errors for the parameter estimates, simple test statistics, and information criteria based on the maximized likelihood function. However, many alternative tests and model diagnostics are possible, and software procedures vary in terms of this default output. An important consideration for statisticians is what output is produced by default, and what output needs to be specially requested. Table 5 compares the currently available software procedures in terms of selected aspects of statistical output.
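As a reminder of how the information criteria mentioned above relate to the maximized log-likelihood, the sketch below uses the standard textbook definitions of AIC and BIC. The function names and the example numbers are hypothetical. Procedures can differ in which parameters they count and which sample size they use in BIC (particularly under REML estimation), so values reported by different procedures are not always directly comparable.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion: -2 log L + k log(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Hypothetical fitted model: maximized log-likelihood of -100 with
# 5 estimated parameters and 100 observations.
example_aic = aic(-100.0, 5)
example_bic = bic(-100.0, 5, 100)
print(example_aic, example_bic)
```

Smaller values indicate a better trade-off between fit and complexity under either criterion, provided the competing models were fitted to the same data with the same estimation method.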

Table 5.

Comparisons in terms of selected statistical output

Software Procedure (Version) | Estimates of Marginal Variance-Covariance Matrices | Adjustments for Multiple Comparisons | Graphs of Marginal Fitted Values
HLM (7)
MIXREG (1.2)
MLwiN (2.22)
Mplus (6.1)
R (2.12.2): lme() 1
R (2.12.2): lmer() 1
R (2.12.2): hglm()
SAS (9.2): PROC MIXED
SAS (9.2): PROC GLIMMIX
SAS (9.2): PROC HPMIXED
SPSS (20): MIXED / GENLINMIXED
Stata (12): gllamm
Stata (12): xtmixed 2 3
Statistica (10)
SYSTAT (13)
WinBUGS (1.4.3)
1. Possible when using the multcomp package for lme or lmer objects.

2. Estimates of marginal variance-covariance matrices are available automatically when using the unofficial user-written postestimation command -xtmixed_corr- after -xtmixed-. Although this is not an official part of Stata 12, the -xtmixed_corr- command is written by one of StataCorp's developers and is available for free download from within Stata.

3. When using the post-estimation commands pwcompare or contrast.

Table 5 suggests that additional software developments enabling more advanced visualization of model fit (through automated plotting of estimated marginal means and measures of uncertainty associated with these estimates) and adjustments for post-hoc multiple comparisons of means between different levels of categorical fixed factors are needed in general. Computing marginal fitted values and their standard errors is fairly straightforward in most software procedures (e.g., using the /EMMEANS subcommand in SPSS MIXED), enabling users to make their own custom plots if automated procedures are not available.

The display of estimated marginal variance-covariance matrices for blocks of observations on the dependent variable in a clustered or longitudinal data set can be a useful diagnostic tool (West et al., 2006, Section 6.11), but this option is not widely available in current LMM software. Users can also compute the marginal variance-covariance matrix implied by a fitted model using any matrix computing language (see West et al., 2006, Chapter 6), but automating this computation would be a nice feature to include in the available software.
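The marginal variance-covariance matrix implied by a fitted model can be computed by hand as V = Z D Z' + R for one subject or cluster, where D and R are the covariance matrices of the random effects and the errors, as in Table 1. The sketch below does this in plain Python for a hypothetical random intercept and slope model with three measurement occasions. The symbol Z (the random-effects design matrix for one subject), the helper names, and all numeric values are assumptions made for the illustration.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def marginal_cov(Z, D, R):
    """Marginal covariance V = Z D Z' + R for one subject or cluster."""
    zdzt = matmul(matmul(Z, D), transpose(Z))
    return [[v + r for v, r in zip(v_row, r_row)] for v_row, r_row in zip(zdzt, R)]


# Random intercept and random slope on time, observed at times 0, 1, 2.
Z = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]
# Hypothetical estimates: intercept variance 4, slope variance 1, covariance 0.5.
D = [[4.0, 0.5],
     [0.5, 1.0]]
# Independent errors with constant variance 2 (a diagonal R matrix).
sigma2 = 2.0
R = [[sigma2 if i == j else 0.0 for j in range(3)] for i in range(3)]

V = marginal_cov(Z, D, R)
for row in V:
    print(row)
```

In this example the implied marginal variances are 6, 8, and 12 at the three occasions, showing how a random slope induces variances that change over time even when the error variance is constant.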

3. DISCUSSION

The objective of this review article was to provide an overview of the more advanced features of current software procedures capable of fitting LMMs, and to compare their relative strengths when considering all of the analytic complexities that can arise when fitting these models. We did not aim to highlight any one procedure as being the “best” procedure, and we hope that the information in this paper will provide practicing statisticians with a useful guide for selecting the best procedure for a given analysis. We have attempted to provide a concise summary of current similarities and differences between the many different software procedures capable of fitting LMMs in one article; the Centre for Multilevel Modeling at the University of Bristol provides a well-designed web site (http://www.bristol.ac.uk/cmm/) with detailed reviews of many different software packages including procedures for fitting LMMs.

In our comparisons, we emphasize the less common components that have already been programmed for the use of analysts and are essentially “automatic,” rather than less common components that would require additional programming on the part of the user. For example, generation of standard diagnostic plots for LMMs is fairly straightforward when using saved predicted values and residuals from the various procedures. Not all procedures have programmed options for automatically generating diagnostic plots, and we highlight procedures with this capability in Table 4. Evidence of a missing feature for a particular software procedure is not entirely detrimental, as our focus here is on more specialized features that we believe to be important for analysts and reflect state-of-the-art developments in the literature. As indicated in Section 2, all of the analysis aspects of fitting LMMs discussed in this paper can be programmed with enough effort; we focus on components that have already been implemented and are readily available for analysts.

We believe that the identified strengths of the currently available software procedures should be used by statisticians to select procedures that provide the best fit for a given line of applied work. Of course, “blank” cells in our comparison tables do not indicate that certain analytical procedures are impossible in a given software package; ultimately, all of these procedures can be programmed (with enough effort) in any statistical software environment. Although some of our “availability” indicators in Tables 1 through 5 may soon be incorrect given rapid changes in the development and capabilities of modern statistical software procedures, we feel that our review provides a fairly clear indication of where user-friendly developments are needed in the software procedures that are currently available for fitting LMMs. We will attempt to provide updates on availability of these features at the web site http://www.umich.edu/~bwest/almmussp.html.

One analysis dimension that we have not focused on from the user point-of-view is whether these models can be specified using point-and-click graphical user interfaces (GUIs), or whether package-specific syntax is needed to fit the models. For example, fitting these models in SAS, R, gllamm (in Stata), and Mplus requires users to write syntax for specifying the models and selecting desired options. Users of SPSS, xtmixed (in Stata), HLM, MLwiN, SYSTAT, Statistica, MIXREG, and GENSTAT may prefer the convenience of point-and-click interfaces for setting up the models, although the tradeoffs between available features and ease of use should be carefully considered depending on the analysis problem.

Another important consideration in the selection of a software procedure is cost. Some of the software that has been reviewed in this article is available online and can be downloaded free-of-charge (R, WinBUGS, and MIXREG). We recommend that statisticians working on tight budgets first fit LMMs using whatever commercial software is available for a given project, and then apply more specialized procedures in freely available software (e.g., exactLRT() in R) for more specific aspects of the analysis, such as hypothesis testing. In general, we hope that this article will spur developers of statistical software to focus on continuous implementation of new published methodologies for LMM analysis, so that disparities between software procedures in terms of available capabilities will slowly cease to exist. Many user communities are forced to use a particular statistical software package (due to budgetary constraints, historical usage, etc.), and the need to use other statistical software for specific aspects of LMM fitting can be an inconvenience for inexperienced and experienced users alike.

Finally, this review article does not address software tools providing power analysis capabilities for longitudinal or clustered study designs, software procedures for fitting generalized linear mixed models (GLMMs) to non-normal outcomes in longitudinal or clustered data sets, or non-linear mixed models (NLMMs). We find the Optimal Design software (http://sitemaker.umich.edu/group-based/optimal_design_software) and the MLPowSim software (http://www.bristol.ac.uk/cmm/learning/multilevel-models/samples.html#mlpowsim; see Browne et al., 2009) to be flexible (and free) software tools for power analysis in clustered or longitudinal study designs. Gelman and Hill (2006) also provide an extremely useful reference on simulation-based methods for examining power in multilevel studies. We hope to publish a similar article addressing these topics for GLMMs and NLMMs in the near future. We note that other researchers have already begun similar comparisons of software for fitting GLMMs (see http://glmm.wikidot.com/pkg-comparison), and we hope to contribute further to this work.

Acknowledgments

The authors wish to gratefully acknowledge the helpful comments from the Editor, an Associate Editor, and two anonymous referees, along with the detailed fact-checking provided by representatives from SAS, SPSS, Stata, SYSTAT, Mplus, and MLwiN. Any inaccuracies with regard to software procedures from the other packages are not intentional and should be attributed to the authors.

Footnotes

1

We were unable to fully review the stand-alone ASREML software (Version 3; Gilmour et al., 2009) and the more general-purpose GENSTAT software (Version 14) in this article, due to cost restrictions. Both procedures use the same computational engine for fitting mixed-effects models. GENSTAT is a more general-purpose statistical computing package that enables users to fit mixed-effects models (and perform other analyses) while also managing data. ASREML is used strictly for fitting mixed-effects models to large data sets using REML and efficient computational methods. Visit http://www.vsni.co.uk/software/asreml/ for more details.

Contributor Information

Brady T. West, Email: bwest@umich.edu, Institute for Social Research, Center for Statistical Consultation and Research, University of Michigan-Ann Arbor, Ann Arbor, MI, 48109

Andrzej T. Galecki, Email: agalecki@umich.edu, Institute of Gerontology, Medical School, Department of Biostatistics, School of Public Health, University of Michigan-Ann Arbor, Ann Arbor, MI, 48109

References

  1. Asparouhov T, Muthen B. Multilevel modeling of complex survey data. ASA section on Survey Research Methods; Proceedings of the Joint Statistical Meetings; Seattle, WA. August 2006; 2006. pp. 2718–2726.
  2. Berkhof J, Snijders TAB. Variance component testing in multilevel models. Journal of Educational and Behavioral Statistics. 2001;26(2):133–152.
  3. Bottai M, Orsini N. A new Stata command for estimating confidence intervals for the variance components of random-effects linear models. Presented at the United Kingdom Stata Users’ Group Meetings. Stata Users Group; 2004.
  4. Brown H, Prescott R. Applied Mixed Models in Medicine. 2. John Wiley and Sons; New York: 2006.
  5. Browne WJ, Lahi MG, Parker RMA. A Guide to Sample Size Calculations for Random Effect Models via Simulation and the MLPowSim Software Package. 2009 http://seis.bris.ac.uk/~frwjb/esrc/MLPOWSIMmanual.pdf.
  6. Carle AC. Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology. 2009. doi: 10.1186/1471-2288-9-49.
  7. Crainiceanu CM. Likelihood ratio testing for zero variance components in linear mixed models. In: Dunson DB, editor. Random effect and latent variable model selection. Chapter 1. 2008. p. 192. Springer Lecture Notes in Statistics.
  8. Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society: Series B. 2004;66:165–185.
  9. Demidenko E. Mixed Models: Theory and Applications. Wiley-Interscience; 2004.
  10. Diggle P, Heagerty P, Liang K, Zeger S. Analysis of Longitudinal Data. 2. Oxford University Press; New York: 2002.
  11. Faraway JJ. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman and Hall / CRC Press; London, New York: 2005.
  12. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Data Analysis. John Wiley and Sons; Hoboken, NJ: 2004.
  13. Fung WK, Zhu ZY, Wei BC, He X. Influence diagnostics and outlier tests for semiparametric mixed models. Journal of the Royal Statistical Society: Series B. 2002;64(3):565–579.
  14. Galecki AT. General class of covariance structures for two or more repeated factors in longitudinal data analysis. Communications in Statistics: Theory and Methods. 1994;23(11):3105.
  15. Gelman A. Struggles with Survey Weighting and Regression Modeling. Statistical Science. 2007;22(2):153–164.
  16. Gelman A, Hill J. Data Analysis using Regression and Multilevel / Hierarchical Models. Cambridge University Press; New York: 2006.
  17. Gilmour AR, Gogel BJ, Cullis BR, Thompson R. ASReml User Guide, Release 3.0. VSN International Ltd; UK: 2009. http://www.vsni.co.uk.
  18. Goldstein H. Multilevel Statistical Models. 4. Wiley-Interscience; 2010.
  19. Greven S, Crainiceanu CM, Kuchenhoff H, Peters A. Restricted likelihood ratio testing for zero variance components in linear mixed models. Journal of Computational and Graphical Statistics. 2008;17(4):870–891.
  20. Hedeker D, Gibbons RD. MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors. Computer Methods and Programs in Biomedicine. 1996;49(3):229–252. doi: 10.1016/0169-2607(96)01723-3.
  21. Heeringa SG, West BT, Berglund PA. Applied Survey Data Analysis. Chapman and Hall / CRC Press; Boca Raton, FL: 2010.
  22. Hox J. Multilevel Analysis: Techniques and Applications. 2. Routledge Academic; NY: 2010.
  23. Jiang J. Linear and Generalized Linear Mixed Models and Their Applications. Springer; NY: 2010.
  24. Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 1997;53(3):983–997.
  25. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963.
  26. Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O. SAS for Mixed Models. 2. SAS Publishing; Cary, NC: 2006.
  27. McCulloch CE, Searle SR, Neuhaus JM. Generalized, Linear, and Mixed Models. 2. Wiley-Interscience; 2008.
  28. Morrell CH. Likelihood ratio testing of variance components in the linear mixed-effects model using restricted maximum likelihood. Biometrics. 1998;54:1560.
  29. Muthén B, Asparouhov T. Growth mixture modeling: Analysis with non-Gaussian random effects. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Longitudinal Data Analysis. Boca Raton: Chapman & Hall/CRC Press; 2009a. pp. 143–165.
  30. Muthén B, Asparouhov T. Multilevel Regression Mixture Analysis. Journal of the Royal Statistical Society: Series A. 2009b;172(3):639–657.
  31. Pfeffermann D, Skinner CJ, Holmes DJ, Goldstein H, Rasbash J. Weighting for Unequal Selection Probabilities in Multilevel Models. Journal of the Royal Statistical Society: Series B. 1998;60(1):23–40.
  32. Pinheiro J, Bates D. Mixed-Effects Models in S and S-PLUS. 2. Springer; NY: 2009.
  33. Rabe-Hesketh S, Skrondal A. Multilevel modeling of complex survey data. Journal of the Royal Statistical Society: Series A. 2006;169:805–827.
  34. Rabe-Hesketh S, Skrondal A. Multilevel and longitudinal modeling using Stata. 2. Stata Press; College Station, TX: 2008.
  35. Raudenbush SW, Bryk AS. Hierarchical Linear Models: Applications and Data Analysis Methods. 2. Sage; Newbury Park, CA: 2002.
  36. Ronnegard L, Shen X, Alam M. The hglm Package. 2011 A Vignette downloaded from http://cran.r-project.org/web/packages/hglm/vignettes/hglm.pdf.
  37. Schabenberger O. Mixed Model Influence Diagnostics. Proceedings of the Twenty-Ninth Annual SAS Users Group International Conference; Cary, NC: SAS Institute; 2004. Paper 189-29.
  38. Searle SR, Casella G, McCulloch CE. Variance Components. John Wiley and Sons; NY: 1992.
  39. Self SG, Liang K. Asymptotical properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association. 1987;82:605.
  40. Singer JD, Willett JB. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford University Press; 2003.
  41. Snijders TAB, Bosker RJ. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Sage Publications; Newbury Park, CA: 1999.
  42. Stram DO, Lee JW. Variance components testing in the longitudinal mixed effects model (Corr: 95V51, p. 1196) Biometrics. 1994;50:1171.
  43. Twisk JWR. Applied Multilevel Analysis: A Practical Guide for Medical Researchers. Cambridge University Press; 2006.
  44. Verbeke G, Molenberghs G. Linear Mixed Models in Practice: A SAS-Oriented Approach. Springer; NY: 1997.
  45. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer-Verlag; Berlin: 2000.
  46. West BT. Analyzing Longitudinal Data with the Linear Mixed Models Procedure in SPSS. Evaluation and the Health Professions. 2009;32(3):207–228. doi: 10.1177/0163278709338554.
  47. West BT, Welch KB, Galecki AT, Gillespie BW. Linear Mixed Models: A Practical Guide using Statistical Software. Chapman and Hall / CRC Press; Boca Raton, FL: 2006.
  48. Zewotir T, Galpin JS. Influence diagnostics for linear mixed models. Journal of Data Science. 2005;3:153–177.
  49. Zhang D, Lin X. Variance component testing in generalized linear mixed models for longitudinal / clustered data and other related topics. In: Dunson DB, editor. Random effect and latent variable model selection. Chapter 2. 2008. p. 192. Springer Lecture Notes in Statistics.
