Journal of Applied Statistics. 2023 Sep 24;51(11):2116–2138. doi: 10.1080/02664763.2023.2260576

Estimating linear mixed effect models with non-normal random effects through saddlepoint approximation and its application in retail pricing analytics

Hao Chen, Lanshan Han, Alvin Lim
PMCID: PMC11328820  PMID: 39157268

ABSTRACT

Linear Mixed Effects (LME) models are powerful statistical tools that have been employed in many different real-world applications such as retail data analytics, marketing measurement, and medical research. Statistical inference is often conducted via maximum likelihood estimation with Normality assumptions on the random effects. Nevertheless, for many applications in the retail industry, it is often necessary to consider non-Normal distributions on the random effects when considering the unknown parameters' business interpretations. Motivated by this need, a linear mixed effects model with possibly non-Normal distribution is studied in this research. We propose a general estimating framework based on a saddlepoint approximation (SA) of the probability density function of the dependent variable, which leads to constrained nonlinear optimization problems. The classical LME model with Normality assumption can then be viewed as a special case under the proposed general SA framework. Compared with the existing approach, the proposed method enhances the real-world interpretability of the estimates with satisfactory model fits.

KEYWORDS: Mixed effects model, linear regression, constrained optimization, statistical inference, saddlepoint approximation

MATHEMATICAL SUBJECT CLASSIFICATION: 62J05

1. Introduction

In retail analytics, linear mixed effects models are prevalent statistical models to quantify the association between the dependent variable and the independent variables, especially when it is necessary to account for the possible heterogeneity among different subgroups. Consider an application in retail analytics where the goal is to establish the relationship between a product's sales volume and price using regression. The established relationship is known as the price elasticity of demand (PED), which characterizes how sales volume of a product changes with varying price [23]. It is usually assumed that PED is negative for consumer packaged goods (CPG) [6] that are sold in a grocery store, representing a common belief in the retail industry that sales volume is negatively associated with the price of a product. In this application, since the same product is sold in different stores, store becomes a natural grouping factor, for which the possible variation needs to be considered. In what follows, we present a simplified real-world case to illustrate the motivation of this research.

A sales dataset on 1 liter regular whole milk from a retail chain was available to us. The dataset consists of weekly sales price and corresponding weekly sales volume of the product from 6 different stores, and each store has data for the whole year of 2021 (53 weeks in total). Given the available data, we are interested in finding the association between volume and price, while taking the variation across the 6 different stores into account. To best illustrate the motivation, all other potential factors affecting sales such as holidays, events, seasonality and the possible temporal effect are ignored for now to focus on a simplified case. A linear mixed effects model with implementation from the R package lme4 [3] was fitted on the data with the following model specification.

$$y_{i,t} = (\alpha + \alpha_i) + (\beta + \beta_i)\,x_{i,t} + \epsilon_{i,t}, \tag{1}$$

where $i$ is the store index, i.e. $i = 1, \ldots, 6$, and is the grouping factor for both the random intercepts and the random slopes, $t$ is the week index, i.e. $t = 1, \ldots, 53$, $y$ is the natural logarithm of the sales volume, and $x$ is the natural logarithm of the sales price. Both the random intercepts ($\alpha_i$) and random slopes ($\beta_i$) are assumed to follow independent Normal distributions, and the error $\epsilon$ follows a Normal distribution as well under the canonical specification of an LME model. After the model was fitted, the two estimated fixed effects are $\hat{\alpha} = 2.671$ and $\hat{\beta} = -0.055$. The 6 estimated random slopes are $\hat{\beta}_1 = 0.080$, $\hat{\beta}_2 = -0.034$, $\hat{\beta}_3 = -0.059$, $\hat{\beta}_4 = 0.103$, $\hat{\beta}_5 = -0.053$, $\hat{\beta}_6 = -0.036$. If we further define the overall slope of store $i$ as $\beta + \beta_i$, we observe that for store 1 and store 4 the overall slopes are $0.025$ and $0.048$, respectively, i.e. both are above 0. However, since the real-world interpretation of the overall slope of store $i$ is the product's PED for that store, the values for store 1 and store 4 are not meaningful for direct use in practice, and some ad hoc procedure is needed to 'correct' the sign of the overall slopes before the modeling results can be treated as reasonable.

In retail analytics, fitting hundreds or even thousands of independent models within a tight delivery timeline is common. While it's possible to investigate the causes of wrong signs manually when working with a few models, this approach is not feasible at scale. However, relying solely on classical linear mixed-effects (LME) models with a Normal assumption on the random effects can lead to unbounded overall coefficients [8]. This violates reasonable business interpretations that require coefficients to be bounded. One solution is to assume a bounded non-Normal distribution on the random effects, but this can pose significant technical challenges, as the likelihood function of an LME model cannot be derived analytically without the Normality assumption. This represents a major hurdle for maximum likelihood (ML) approaches, which typically require an analytical form of the likelihood function. To overcome this challenge, an ML approach for LME models with non-Normally distributed random effects must be developed.

To overcome this hurdle, we propose to approximate the likelihood function using the saddlepoint approximation (SA) [10], which provides an accurate point-wise approximation of a probability density function (PDF) using its moment generating function (MGF). The SA approach has been successfully used in many circumstances where an analytical form of the PDF of the random variables of interest is not readily available. However, to the best of our knowledge, it has not been used under the ML paradigm for statistical inference. In fact, the SA approach does not directly provide an analytical approximation of the likelihood function. Instead, the SA approach defines an implicit function as an approximation of the PDF of interest. To use this implicit function in the maximum likelihood paradigm, we propose a novel approach to include the defining nonlinear equations as constraints, leading to constrained nonlinear optimization problems. With the recent advances in optimization theory and algorithms, we demonstrate in this paper that the resulting optimization problems can be solved efficiently and therefore produce high quality estimations on both simulated data sets and real-world data sets under varying assumptions on the distribution of the random effects. In particular, we examine the proposed approach under assumptions of four different non-Normal distributions: (1) Uniform, (2) Laplace, (3) Gamma, and (4) Triangular.

The classical LME models have been successfully applied in many areas. There is a vast body of statistical literature covering its theoretical properties and computational implementations. The mathematical properties were carefully studied and presented by Jiang [13]. Details about the computational aspects were reported by Lindstrom and Bates [16]. In a more recent paper, Bates et al. [2] implemented lme4 – a widely used R package for fitting a linear mixed effects model. On the other hand, there have been studies on the LME model with non-Normal random effects in the literature as well. Instead of the Normal distribution, the multivariate t-distribution was studied by Pinheiro et al. [22]. Working with synthetic data, Yucel and Demirtas [29] researched the influence of non-Normal random effects on the model parameter estimation. However, the Normality assumption on random effects was still employed when estimating the unknown parameters. Moreover, Lin and Lee [15] extended the classical LME model by employing a multivariate skew-Normal assumption on the random effects. Matos et al. [17] considered multivariate t distribution with censored response, and likelihood-based approach was proposed to conduct statistical inference. Among all the pertinent literature, the following three more relevant points are worthy of further discussions.

First, Verbeke and Lesaffre [26] showed that the maximum likelihood estimators are consistent for estimating the fixed effects and variance component even when the distribution of random effects is misspecified. McCulloch and Neuhaus [19] considered the robustness to the assumed distribution on a random intercept, and argued that the misspecification of the distribution has only minor impact, where the inferences for within-group covariates are robust to the misspecification. However, it is our argument that the business interpretation on the variables of interest must be taken into account as well. In the motivating example, if, for example, a Uniform distribution was assumed on the random effects instead of the (unconstrained) Normal distribution, we then would ensure the needed negativity of the overall coefficients of the PED for store 1 and store 4, which is of great practical importance in retail analytics.

Second, a LME model can also be viewed as a special case of the latent variables model proposed by Skrondal and Rabe-Hesketh [25], where random effects are treated as latent variables. Statistical inference is then conducted by maximizing the marginal likelihood function after integrating out the latent variables. In most of the non-toy examples, methods for numerical integration such as Monte Carlo integration [12] and Gaussian-Hermite quadrature [11] are entailed to approximate the marginal likelihood. The EM algorithm is another option to iteratively maximize the likelihood. For example, it is utilized by Mattos et al. [18] to estimate nonparametric functions using smoothing splines in the context of linear mixed models for longitudinal censored data. Considering a LME model as a special case of the latent variable model conceptually can handle non-Normal random effects, since random effects are viewed as latent variables that are integrated out.

Moving forward with the idea of latent variables, Nelson et al. [20] proposed the probability integral transformation (PIT) method, which utilizes the fact that a non-Normal realization can be converted from a Normal realization using the inverse of its cumulative distribution function (CDF). Therefore, one can presumably assume any continuous distribution on the random effects as long as the inverse of its CDF exists, and inference is then conducted against the marginal likelihood by numerically integrating out the random effects with Gaussian-Hermite quadrature. Compared to the PIT approach, the proposed SA method is expected to be numerically more stable especially when sample size becomes large. A direct comparison between the proposed SA and PIT is discussed further in Section 4.4 and in Section 5, in which the motivating example was revisited and comments were made.
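The transformation step behind the PIT idea can be sketched in a few lines. This is only an illustration of the probability integral transform itself, not the implementation of Nelson et al. [20]; the function name `normal_to_uniform` and the half-width value are hypothetical, and the Uniform target is chosen purely as an example.

```python
from statistics import NormalDist

def normal_to_uniform(z, half_width):
    """Map a standard Normal draw z to a Uniform(-half_width, half_width) draw.

    Step 1: u = Phi(z) is Uniform(0, 1) by the probability integral transform.
    Step 2: the inverse CDF of the target Uniform distribution is applied to u.
    """
    u = NormalDist().cdf(z)
    return -half_width + 2.0 * half_width * u

# Hypothetical half-width; every transformed value stays inside (-0.5, 0.5).
half_width = 0.5
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(z, round(normal_to_uniform(z, half_width), 4))
```

Because the random effect is written as a function of a Normal variable, the marginal likelihood can then be integrated over that Normal variable, which is where quadrature enters in the PIT approach.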

Third, in a more recent paper [8], the authors proposed to estimate the model parameters when random effects follow a truncated Normal distribution. Although the motivation bears similarity at first glance, the main approach and the estimating framework are very different from the SA approach. Since the exact probability density function of the responses is intractable, the theory proposed by Chen et al. [8] was established primarily based on the idea that as the number of truncated Normally distributed random effects becomes large, the distribution of the responses can be approximated by a Normal distribution. However, the distribution of the responses is approximated by the SA approach in this paper. Moreover, the approach based on the SA approximation is a general framework as mentioned before, i.e. it is applicable as long as the MGF of the distribution assumed on the random effects exists. The objective here is not to substitute for either the PIT approach [20] or the approach of Chen et al. [8], but rather to provide practitioners with more flexibility and choices when facing the need for dealing with non-Normal random effects.

The rest of the paper is organized as follows. In Section 2, we provide preliminaries regarding LME models and SA approaches. The proposed estimation method is detailed in Section 3. Some simulation results are presented in Section 4. We then revisit the real world example in Section 5. Some concluding remarks are made in Section 6.

2. Preliminaries

We first introduce the notation used in this paper. We use lower and upper case letters to represent scalars or (one dimensional) random variables, bold lower case letters to represent vectors or (multiple dimensional) random variables, and bold upper case letters to represent matrices. All vectors are assumed to be column vectors. We index vectors and matrices by superscripts and scalars by subscripts. We denote the $n$-dimensional real vector space as $\mathbb{R}^n$, with its nonnegative orthant denoted by $\mathbb{R}^n_+$. For a vector $\mathbf{v} \in \mathbb{R}^n$, we denote its $i$th element by $v_i$ and its transpose by $\mathbf{v}^T$. For a matrix $\mathbf{A} \in \mathbb{R}^{n \times m}$, we denote its $(i,j)$ element by $a_{ij}$, its transpose by $\mathbf{A}^T$, and its determinant by $|\mathbf{A}|$. The all zero vector and all one vector of length $n$ are denoted by $\mathbf{0}_n$ and $\mathbf{1}_n$, respectively. The $n \times n$ identity matrix is denoted by $\mathbf{I}_n$. We use $\|\mathbf{v}\|_2$ to denote the two-norm of a vector $\mathbf{v} \in \mathbb{R}^n$, i.e. $\|\mathbf{v}\|_2 = \sqrt{\mathbf{v}^T\mathbf{v}}$. Given a function $g: \mathbb{R}^n \to \mathbb{R}$, we let $\nabla g(\mathbf{x})$ be its gradient evaluated at $\mathbf{x}$. We use $N(\mu, \sigma^2)$ to represent a Normal distribution with mean $\mu$ and variance $\sigma^2$ and $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ to represent a multivariate Normal distribution with mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$. We use the relation $\sim$ to indicate that a random variable follows a certain distribution.

2.1. Linear mixed effects models

Working under the classical LME model, we use $\ell = 1, \ldots, g$ to denote the grouping factor. For each group $\ell$, the dependent variable $y$ is assumed to depend linearly on the independent variables, and the error term follows a Normal distribution with mean 0 and unknown variance $\sigma^2$. Let $n$ denote the total sample size and $n_\ell$ the size of group $\ell$, such that $\sum_{\ell=1}^{g} n_\ell = n$. Let $p$ denote the dimension of the design matrix, and $k$ the number of variables for which random effects are considered. When $k = 0$, the LME model reduces to the ordinary linear regression model. Mathematically, the LME model is presented as follows:

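$$\mathbf{y}^{\ell} = \mathbf{X}^{\ell}\boldsymbol{\beta} + \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell} + \boldsymbol{\varepsilon}^{\ell}, \tag{2}$$

where

$$\mathbf{y}^{\ell} \triangleq \begin{bmatrix} y_{\ell,1} \\ \vdots \\ y_{\ell,n_\ell} \end{bmatrix} \in \mathbb{R}^{n_\ell}, \quad \boldsymbol{\beta} \triangleq \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} \in \mathbb{R}^{p}, \quad \boldsymbol{\gamma}^{\ell} \triangleq \begin{bmatrix} \gamma_{\ell,1} \\ \vdots \\ \gamma_{\ell,k} \end{bmatrix} \in \mathbb{R}^{k},$$

$$\mathbf{X}^{\ell} \triangleq \begin{bmatrix} x_{\ell,1,1} & x_{\ell,1,2} & \cdots & x_{\ell,1,p} \\ \vdots & \vdots & & \vdots \\ x_{\ell,n_\ell,1} & x_{\ell,n_\ell,2} & \cdots & x_{\ell,n_\ell,p} \end{bmatrix} \in \mathbb{R}^{n_\ell \times p}, \quad \mathbf{Z}^{\ell} \triangleq \begin{bmatrix} z_{\ell,1,1} & z_{\ell,1,2} & \cdots & z_{\ell,1,k} \\ \vdots & \vdots & & \vdots \\ z_{\ell,n_\ell,1} & z_{\ell,n_\ell,2} & \cdots & z_{\ell,n_\ell,k} \end{bmatrix} \in \mathbb{R}^{n_\ell \times k}, \quad \text{and} \quad \boldsymbol{\varepsilon}^{\ell} \triangleq \begin{bmatrix} \varepsilon_{\ell,1} \\ \vdots \\ \varepsilon_{\ell,n_\ell} \end{bmatrix} \in \mathbb{R}^{n_\ell},$$

where the symbol $\triangleq$ means "defined to be equal to". We often refer to $\beta_j$, $j = 1, \ldots, p$, as the fixed effect coefficients, $\gamma_{\ell,j}$, $\ell = 1, \ldots, g$; $j = 1, \ldots, k$, as the random effect coefficients, and $\boldsymbol{\varepsilon}^{\ell} \sim N(\mathbf{0}_{n_\ell}, \sigma^2 \mathbf{I}_{n_\ell})$ as the error vector, with $\sigma^2$ unknown. Under the classical model specification, a multivariate Normal distribution is assumed on the random effects as follows.

$$\boldsymbol{\gamma}^{\ell} \overset{\text{iid}}{\sim} N(\mathbf{0}_k, \boldsymbol{\Sigma}), \quad \ell = 1, \ldots, g, \tag{3}$$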

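where $\boldsymbol{\Sigma}$ is typically a structured $k \times k$ covariance matrix that is parameterized by some unknown parameters. Many different structures of $\boldsymbol{\Sigma}$ have been considered in the literature [28]. The independent structure,

$$\boldsymbol{\Sigma} = \begin{bmatrix} \varsigma_1^2 & & \\ & \ddots & \\ & & \varsigma_k^2 \end{bmatrix}$$

is assumed in this paper, where $\varsigma_1, \ldots, \varsigma_k$ require estimation in practice. In other words, for each $\ell = 1, \ldots, g$:

$$\gamma_{\ell,j} \overset{\text{iid}}{\sim} N(0, \varsigma_j^2), \quad j = 1, \ldots, k. \tag{4}$$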

Stacking up the data from the g groups, we then have

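$$\mathbf{y} \triangleq \begin{bmatrix} \mathbf{y}^{1} \\ \vdots \\ \mathbf{y}^{g} \end{bmatrix} \in \mathbb{R}^{n}, \quad \boldsymbol{\gamma} \triangleq \begin{bmatrix} \boldsymbol{\gamma}^{1} \\ \vdots \\ \boldsymbol{\gamma}^{g} \end{bmatrix} \in \mathbb{R}^{kg}, \quad \mathbf{X} \triangleq \begin{bmatrix} \mathbf{X}^{1} \\ \vdots \\ \mathbf{X}^{g} \end{bmatrix} \in \mathbb{R}^{n \times p}, \quad \mathbf{Z} \triangleq \begin{bmatrix} \mathbf{Z}^{1} & & \\ & \ddots & \\ & & \mathbf{Z}^{g} \end{bmatrix} \in \mathbb{R}^{n \times kg}, \quad \text{and} \quad \boldsymbol{\varepsilon} \triangleq \begin{bmatrix} \boldsymbol{\varepsilon}^{1} \\ \vdots \\ \boldsymbol{\varepsilon}^{g} \end{bmatrix} \in \mathbb{R}^{n}.$$

With the above definitions, we rewrite the model into a succinct expression

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}, \tag{5}$$

where

$$\boldsymbol{\gamma} \sim N(\mathbf{0}_{kg}, \mathbf{G}), \quad \mathbf{G} = \begin{bmatrix} \boldsymbol{\Sigma} & & \\ & \ddots & \\ & & \boldsymbol{\Sigma} \end{bmatrix}$$

and $\boldsymbol{\varepsilon} \sim N(\mathbf{0}_n, \mathbf{R})$ with $\mathbf{R} = \sigma^2 \mathbf{I}_n$. In addition, as $\boldsymbol{\gamma}$ and $\boldsymbol{\varepsilon}$ are independent, we have

$$\begin{pmatrix} \boldsymbol{\gamma} \\ \boldsymbol{\varepsilon} \end{pmatrix} \sim N\left(\mathbf{0}, \begin{bmatrix} \mathbf{G} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{bmatrix}\right).$$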

Thanks to the Normality assumption on the random effects, y is multivariate Normally distributed, that is, the exact distribution of y is tractable. Therefore, either the maximum likelihood estimation or restricted maximum likelihood estimation can be utilized to conduct statistical inference, see e.g. [8,30] for a discussion on these technical details. The above classical approach has been implemented in popular statistical packages such as the statsmodels module [24] in Python and the lme4 library [3] in R.
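To make the tractability concrete: under model (5) with independent Normal random effects and errors, the responses of one group have covariance $\mathbf{Z}^{\ell}\boldsymbol{\Sigma}(\mathbf{Z}^{\ell})^T + \sigma^2\mathbf{I}_{n_\ell}$, a standard property of Normal LME models. The sketch below computes this matrix for the independent structure of $\boldsymbol{\Sigma}$; the function name `marginal_cov` and all numerical values are hypothetical and chosen only for illustration.

```python
def marginal_cov(Z, var_re, sigma2):
    """Return V = Z diag(var_re) Z^T + sigma2 * I for one group.

    Z      : list of rows, each row holding the k random-effect covariates
    var_re : the k random-effect variances (diagonal of Sigma)
    sigma2 : error variance
    """
    n, k = len(Z), len(var_re)
    V = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            V[a][b] = sum(Z[a][j] * var_re[j] * Z[b][j] for j in range(k))
        V[a][a] += sigma2  # the error term only contributes to the diagonal
    return V

# Hypothetical group with a random intercept and one random slope (k = 2).
Z = [[1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
V = marginal_cov(Z, var_re=[0.4, 0.1], sigma2=0.25)
for row in V:
    print([round(v, 3) for v in row])
```

The off-diagonal entries are nonzero because observations in the same group share the same random effects, which is why the group-level density is multivariate Normal rather than a product of univariate densities.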

2.2. Saddlepoint approximation

The saddlepoint approximation method, proposed by Daniels [10], provides an accurate pointwise approximation formula for the probability density function (PDF) of a distribution based on its MGF. In general, given a random variable $w$ with PDF $f_w(w)$, the MGF $M_w(t)$ and cumulant generating function (CGF) $K_w(t)$ are defined as

$$M_w(t) \triangleq E\left[e^{tw}\right], \quad \text{and} \quad K_w(t) \triangleq \ln\left(M_w(t)\right),$$

respectively. Then, the saddlepoint approximation of $f_w(w)$ at any given $w$ is given by:

$$\hat{f}_w(w) = \frac{1}{\sqrt{2\pi K_w''(t)}} \exp\left(K_w(t) - tw\right) \tag{6}$$

and $t$ satisfies:

$$K_w'(t) = w, \tag{7}$$

where $K_w'(t)$ and $K_w''(t)$ are the first and second derivatives of $K_w(t)$, respectively. The saddlepoint approximation in Equation (6) provides a convenient analytical formula for pointwise approximation to the density function of the random variable $w$. This is particularly relevant when we examine the distribution of the (weighted) sum of a finite set of random variables. In fact, let $s = \sum_{i=1}^{d} o_i w_i$, where the $w_i$ are independent of each other, but not necessarily identically distributed, and the $o_i$ are known constants (weights). Essentially, $s$ is the weighted sum of $d$ independent random variables $w_i$. Assume the MGF of each $w_i$ is $M_{w_i}(t)$. The MGF and CGF of $s$ are then given by

$$M_s(t) = \prod_{i=1}^{d} M_{w_i}(o_i t) \quad \text{and} \quad K_s(t) = \sum_{i=1}^{d} K_{w_i}(o_i t).$$

To obtain the exact PDF of $s$, one needs to use convolution, which is usually intractable unless the $w_i$ conveniently follow a Normal distribution. On the other hand, the saddlepoint approximation for the density of $s$ is given by

$$\hat{f}_s(s) = \frac{1}{\sqrt{2\pi K_s''(t)}} \exp\left(K_s(t) - ts\right), \tag{8}$$

with $t$ being the solution to

$$K_s'(t) = \sum_{i=1}^{d} o_i K_{w_i}'(o_i t) = s. \tag{9}$$

Equations (8) and (9) together provide a valuable approximation formula for the density of $s = \sum_{i=1}^{d} o_i w_i$ at each point. Note that under proper regularity conditions, the saddlepoint Equation (7) defines an implicit function $t(w)$, based on which we can define the saddlepoint density:

$$\hat{f}_w(w) = \frac{1}{\sqrt{2\pi K_w''(t(w))}} \exp\left(K_w(t(w)) - t(w)\,w\right). \tag{10}$$

Note that the saddlepoint density function may need to be normalized to become a proper density function. Due to the involvement of an implicit function, which often does not have a closed form, it is not straightforward to use (10) directly in a maximum likelihood paradigm. We propose the inclusion of the saddlepoint Equation (7) as an equality constraint in the maximization of the likelihood function. This proposed approach is detailed in the next section.

In addition, Equation (10) reduces to the Normal density when $w$ follows a Normal distribution. In other words, the density of a Normal distribution can be viewed as a special case of the saddlepoint approximation, in which case it is exact rather than an approximation.
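The two remarks above, exactness for the Normal distribution and the possible need for normalization, can be checked numerically. The sketch below is a minimal illustration of Equations (6) and (7), assuming the saddlepoint equation is solved by a damped Newton iteration; the function name `saddlepoint_density`, the `t_upper` argument, and the parameter values are hypothetical. The Gamma CGF used here, $K(t) = -a\ln(1-t)$ for $t < 1$ (unit scale), is a standard result.

```python
import math

def saddlepoint_density(w, K, dK, d2K, t_upper=math.inf, tol=1e-12):
    """Evaluate Equation (6) at w, with t solving K'(t) = w (Equation (7)).

    K, dK, d2K are the CGF and its first two derivatives; t_upper is the
    right end of the CGF's domain (Newton steps are halved to stay inside).
    """
    t = 0.0
    for _ in range(100):
        step = (dK(t) - w) / d2K(t)
        while t - step >= t_upper:
            step *= 0.5
        t -= step
        if abs(step) < tol:
            break
    return math.exp(K(t) - t * w) / math.sqrt(2.0 * math.pi * d2K(t))

# Normal(mu, sig^2): the approximation coincides with the exact density.
mu, sig = 1.0, 2.0
normal_sa = saddlepoint_density(
    2.5,
    lambda t: mu * t + 0.5 * sig**2 * t**2,
    lambda t: mu + sig**2 * t,
    lambda t: sig**2,
)
normal_exact = math.exp(-(2.5 - mu) ** 2 / (2 * sig**2)) / math.sqrt(2 * math.pi * sig**2)

# Gamma(shape a, unit scale): the ratio SA / exact is the same at every w,
# so a single normalizing constant would make the approximation exact.
def gamma_ratio(w, a):
    sa = saddlepoint_density(
        w,
        lambda t: -a * math.log(1.0 - t),
        lambda t: a / (1.0 - t),
        lambda t: a / (1.0 - t) ** 2,
        t_upper=1.0,
    )
    exact = w ** (a - 1) * math.exp(-w) / math.gamma(a)
    return sa / exact

print(normal_sa, normal_exact)
print([gamma_ratio(w, 2.0) for w in (0.5, 2.0, 5.0)])
```

For the Gamma case the constant ratio equals $\Gamma(a)\,e^{a} / (\sqrt{2\pi}\,a^{a-1/2})$, i.e. the saddlepoint formula replaces $\Gamma(a)$ by its Stirling approximation.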

3. The proposed estimation method

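We consider a mixed effects model involving $k$ random effect coefficients, each of which follows a distribution $F_j(\theta_j)$ parameterized by unknown parameters $\theta_j$, with its PDF, MGF, and CGF given by $\phi_j(\gamma; \theta_j)$, $M_j(t; \theta_j)$, and $K_j(t; \theta_j)$, respectively. Specifically, for the $\ell$th cluster, with the cluster index $\ell$ being dropped for conciseness, we have

$$y_i = \sum_{j=1}^{p} x_{i,j}\beta_j + \sum_{j=1}^{k} z_{i,j}\gamma_j + \varepsilon_i, \quad i = 1, \ldots, n_\ell, \tag{11}$$

where $\varepsilon_i \sim N(0, \sigma^2)$ and $\gamma_j \sim F_j(\theta_j)$. Let $f(y_{\ell,i}; \boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \sigma^2)$ be the PDF of the $i$th observation in cluster $\ell$. Assuming the $\varepsilon_{\ell,i}$ are independent of each other, and the $\gamma_{\ell,j}$ are independent of each other as well as of the $\varepsilon_{\ell,i}$, the joint density function of $\mathbf{y} = (y_{\ell,i})_{\ell = 1, \ldots, g;\, i = 1, \ldots, n_\ell}$ is given by

$$\prod_{\ell=1}^{g} \prod_{i=1}^{n_\ell} f(y_{\ell,i}; \boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \sigma^2). \tag{12}$$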

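When $F_j(\theta_j)$ does not represent a Normal distribution, it is difficult to derive the analytical form of $f(y_{\ell,i}; \boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \sigma^2)$. We therefore resort to the SA method to find a good approximation of $f(y_{\ell,i}; \boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \sigma^2)$ at each observation of the dependent variable $y_{\ell,i}$, assuming the existence of the MGF and CGF. According to (8), we have

$$f(y_{\ell,i}; \boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \sigma^2) \approx \frac{1}{\sqrt{2\pi K_{\ell,i}''(t_{\ell,i})}} \exp\left(K_{\ell,i}(t_{\ell,i}) - t_{\ell,i}\,y_{\ell,i}\right), \tag{13}$$

where $t_{\ell,i}$ is the solution of $K_{\ell,i}'(t_{\ell,i}) = y_{\ell,i}$ and

$$K_{\ell,i}(t_{\ell,i}) = \ln\left(\exp\left(t_{\ell,i}\sum_{j=1}^{p} x_{\ell,i,j}\beta_j\right) \times \left(\prod_{j=1}^{k} M_j(z_{\ell,i,j}t_{\ell,i}; \theta_j)\right) \times M_{\varepsilon_{\ell,i}}(t_{\ell,i}; \sigma^2)\right) = t_{\ell,i}\left(\sum_{j=1}^{p} x_{\ell,i,j}\beta_j\right) + \sum_{j=1}^{k} K_j(z_{\ell,i,j}t_{\ell,i}; \theta_j) + \frac{1}{2}\sigma^2 t_{\ell,i}^2. \tag{14}$$

The last term in the above equation is from the CGF of the Normal distribution $N(0, \sigma^2)$. With Equation (13), the log-likelihood can be approximated by

$$L(\boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \sigma^2 \mid y_{\ell,i}) = \sum_{\ell=1}^{g} \sum_{i=1}^{n_\ell} \ln\left(\frac{1}{\sqrt{2\pi K_{\ell,i}''(t_{\ell,i})}} \exp\left(K_{\ell,i}(t_{\ell,i}) - t_{\ell,i}\,y_{\ell,i}\right)\right). \tag{15}$$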

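When the random effects follow a Normal distribution, Equation (15) is no longer an approximation, i.e. it is the same as the log-likelihood derived with Normally distributed random effects. In other words, the log-likelihood of a regular LME model with Normally distributed random effects can be viewed as a special case of Equation (15). With Equation (15), we propose to estimate the fixed effects $\boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \sigma^2$ by solving the following optimization problem.

$$\max_{\boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \mathbf{t}, \sigma^2} \; \sum_{\ell=1}^{g} \sum_{i=1}^{n_\ell} \left[-\frac{1}{2}\ln\left(K_{\ell,i}''(t_{\ell,i})\right) + K_{\ell,i}(t_{\ell,i}) - t_{\ell,i}\,y_{\ell,i}\right] \quad \text{s.t.} \quad K_{\ell,i}'(t_{\ell,i}) = y_{\ell,i}, \quad \ell = 1, \ldots, g;\; i = 1, \ldots, n_\ell. \tag{16}$$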

Problem (16) is typically nonlinear and nonconvex, which is, in general, challenging to solve to global optimality. However, there are efficient modern optimization algorithms, such as sequential quadratic programming (SQP) [4] and the alternating direction method of multipliers (ADMM) [5], to find local optimal solutions or stationary points. For practitioners, if computational resources permit, it is highly recommended to try both and check whether they lead to similar estimated coefficients and similar log-likelihood values. In addition, we can run an algorithm multiple times with different initial solutions and then choose the run with the best objective value. For all the simulated examples studied in Section 4, satisfactory results were obtained with 5 different initial values. Empirically, we recommend using at least that many starting points so that the optimization algorithm ends with an acceptable solution for both the simulated and the real-world examples.
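The structure of problem (16), with the saddlepoint variables $t_{\ell,i}$ as extra decision variables tied down by equality constraints, can be sketched on a deliberately tiny case. The example below assumes a model with a single fixed intercept and no random effects ($k = 0$), so that $K_i(t) = t\beta + \frac{1}{2}\sigma^2 t^2$ and the answer is known in closed form. It uses SciPy's SLSQP solver, an SQP-type method swapped in for brevity, not the interior point algorithm used later in this paper. The data and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data for y_i = beta + eps_i (one fixed intercept, k = 0).
y = np.array([-1.0, 0.0, 0.5, 1.5, 2.0])

def unpack(v):
    # v = (beta, log sigma^2, t_1, ..., t_n); the log keeps sigma^2 positive.
    return v[0], np.exp(v[1]), v[2:]

def neg_objective(v):
    beta, sigma2, t = unpack(v)
    K = t * beta + 0.5 * sigma2 * t**2   # CGF of y_i evaluated at t_i
    d2K = sigma2                         # K'' is constant in this special case
    return -np.sum(-0.5 * np.log(d2K) + K - t * y)

def saddlepoint_residual(v):
    beta, sigma2, t = unpack(v)
    return beta + sigma2 * t - y         # K'(t_i) - y_i, constrained to zero

# Feasible starting point: beta = 0, sigma^2 = 1, t_i = y_i.
v0 = np.concatenate(([0.0, 0.0], y))
res = minimize(
    neg_objective,
    v0,
    method="SLSQP",
    constraints=[{"type": "eq", "fun": saddlepoint_residual}],
    options={"maxiter": 500, "ftol": 1e-10},
)
beta_hat, sigma2_hat, t_hat = unpack(res.x)
print(beta_hat, sigma2_hat)
```

In this special case the constraints give $t_i = (y_i - \beta)/\sigma^2$, and substituting them into the objective recovers the Normal log-likelihood up to a constant, so the solver should return values close to the sample mean and the maximum likelihood variance of `y`. With non-Normal random effects the constraints no longer have a closed-form solution, which is the situation problem (16) is designed for.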

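Once a solution of (16), denoted by $\hat{\boldsymbol{\beta}}, \hat{\theta}_1, \ldots, \hat{\theta}_k, \hat{\sigma}^2$ and collectively written as $\hat{\Phi}$, is obtained, we can further estimate the random effect coefficients $\boldsymbol{\gamma}^{\ell}$. The joint likelihood function of $\mathbf{y}$ and $\boldsymbol{\gamma}$ conditional on $\hat{\Phi}$ is:

$$f(\mathbf{y}, \boldsymbol{\gamma} \mid \hat{\Phi}) = f_{\mathbf{y}}(\mathbf{y} \mid \boldsymbol{\gamma}, \hat{\Phi})\, f_{\boldsymbol{\gamma}}(\boldsymbol{\gamma} \mid \hat{\Phi}) \propto \prod_{\ell=1}^{g} \exp\left(-\frac{1}{2\hat{\sigma}^2}\left(\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}} - \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell}\right)^T\left(\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}} - \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell}\right)\right) \times \prod_{\ell=1}^{g} \prod_{i=1}^{n_\ell} \prod_{j=1}^{k} \phi_j(\gamma_{\ell,j}; \hat{\theta}_j). \tag{17}$$

Conditional on the estimates of the fixed effects coefficients, we therefore propose to estimate the random effects coefficients $\gamma_{\ell,j}$, $\ell = 1, \ldots, g$, $j = 1, \ldots, k$, by solving the following optimization problem.

$$(\hat{\boldsymbol{\gamma}}^{1}, \ldots, \hat{\boldsymbol{\gamma}}^{g}) = \arg\min_{\boldsymbol{\gamma}^{1}, \ldots, \boldsymbol{\gamma}^{g}} \sum_{\ell=1}^{g} \left[\frac{1}{\hat{\sigma}^2}\left(\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}} - \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell}\right)^T\left(\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}} - \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell}\right) - 2n_\ell \sum_{j=1}^{k} \ln\left(\phi_j(\gamma_{\ell,j}; \hat{\theta}_j)\right)\right]. \tag{18}$$

Note that (18) is decomposable and can be solved for each $\boldsymbol{\gamma}^{\ell}$ individually as follows.

$$\hat{\boldsymbol{\gamma}}^{\ell} = \arg\min_{\boldsymbol{\gamma}^{\ell}} \left[\frac{1}{\hat{\sigma}^2}\left(\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}} - \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell}\right)^T\left(\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}} - \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell}\right) - 2n_\ell \sum_{j=1}^{k} \ln\left(\phi_j(\gamma_{\ell,j}; \hat{\theta}_j)\right)\right]. \tag{19}$$

Now, we have completed the estimation process for both the fixed effects ($\boldsymbol{\beta}, \theta_1, \ldots, \theta_k, \sigma^2$) and the random effects ($\boldsymbol{\gamma}^{1}, \ldots, \boldsymbol{\gamma}^{g}$).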

Remark 3.1

Since we apply the saddlepoint approximation of the joint density function (12), the Normality assumption on the error term εi in (11) is in fact not essential. The proposed framework can handle non-Normal errors as well, as long as the distribution of the error term possesses an MGF and a CGF. We only consider the traditional Normal error assumption in this paper for simplicity. Moreover, using the proposed approach, the random effects do not have to follow the same distribution, providing great flexibility in modeling different factors contributing to the dependent variable. For simplicity, we only demonstrate the cases where the random effects follow the same distribution in this paper.

To demonstrate the proposed approach, in the rest of this section, we study a special case where all the random effects follow the same distribution, but possibly with different (unknown) parameters. More specifically, we assume that for each $j = 1, \ldots, k$

$$\gamma_{\ell,j} \overset{\text{iid}}{\sim} F(\theta_j), \quad \ell = 1, \ldots, g.$$

We specifically consider four different distributions in this paper: (1) Uniform, (2) Laplace [14], (3) Gamma, and (4) Triangular. The equations representing the Uniform distribution are given in the next paragraph. The equations for the other three distributions are derived in detail in Section A of the Supplemental Document.

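Assume for each $j = 1, \ldots, k$

$$\gamma_{\ell,j} \overset{\text{iid}}{\sim} U(-|\beta_j|, |\beta_j|), \quad \ell = 1, \ldots, g,$$

where $U(\cdot)$ stands for a Uniform distribution and $\beta_j$ is the corresponding fixed effect coefficient. In this way, the random effect coefficient is bounded by the absolute value of $\beta_j$. The overall coefficient ($\beta_j + \gamma_{\ell,j}$) is restricted to within $(0, 2\beta_j)$ if $\beta_j > 0$ or $(2\beta_j, 0)$ if $\beta_j < 0$, for all $\ell = 1, \ldots, g$ and $j = 1, \ldots, k$. We can derive the following:

$$K_{\ell,i}(t_{\ell,i}) = t_{\ell,i}\left(\sum_{j=1}^{p} x_{\ell,i,j}\beta_j\right) + \left(\sum_{j=1}^{k} \ln\left(\frac{e^{\beta_j z_{\ell,i,j} t_{\ell,i}} - e^{-\beta_j z_{\ell,i,j} t_{\ell,i}}}{2\beta_j z_{\ell,i,j} t_{\ell,i}}\right)\right) + \left(\frac{1}{2}\sigma^2 t_{\ell,i}^2\right),$$

$$K_{\ell,i}'(t_{\ell,i}) = \left(\sum_{j=1}^{p} x_{\ell,i,j}\beta_j\right) + \left(\sum_{j=1}^{k} \frac{1 + \beta_j z_{\ell,i,j} t_{\ell,i} + e^{2\beta_j z_{\ell,i,j} t_{\ell,i}}\left(\beta_j z_{\ell,i,j} t_{\ell,i} - 1\right)}{\left(e^{2\beta_j z_{\ell,i,j} t_{\ell,i}} - 1\right) t_{\ell,i}}\right) + \sigma^2 t_{\ell,i},$$

$$K_{\ell,i}''(t_{\ell,i}) = \sum_{j=1}^{k} \frac{1 + e^{4\beta_j z_{\ell,i,j} t_{\ell,i}} - 2e^{2\beta_j z_{\ell,i,j} t_{\ell,i}}\left(2\beta_j^2 z_{\ell,i,j}^2 t_{\ell,i}^2 + 1\right)}{\left(e^{2\beta_j z_{\ell,i,j} t_{\ell,i}} - 1\right)^2 t_{\ell,i}^2} + \sigma^2$$

and $\beta_j z_{\ell,i,j} t_{\ell,i} \neq 0$ for $\ell = 1, \ldots, g$; $i = 1, \ldots, n_\ell$; $j = 1, \ldots, k$. To estimate the fixed effects, we solve the following optimization problem

$$\max_{\boldsymbol{\beta}, \mathbf{t}, \sigma^2} \; \sum_{\ell=1}^{g} \sum_{i=1}^{n_\ell} \left[-\frac{1}{2}\ln\left(K_{\ell,i}''(t_{\ell,i})\right) + K_{\ell,i}(t_{\ell,i}) - t_{\ell,i}\,y_{\ell,i}\right] \quad \text{subject to} \quad K_{\ell,i}'(t_{\ell,i}) = y_{\ell,i}, \quad \ell = 1, \ldots, g;\; i = 1, \ldots, n_\ell.$$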

Remark 3.2

Technically, we also need to include the constraint $\beta_j z_{\ell,i,j} t_{\ell,i} \neq 0$, for all $\ell = 1, \ldots, g$; $i = 1, \ldots, n_\ell$; $j = 1, \ldots, k$, to make sure that the denominators are not 0. However, this kind of constraint is typically not easy to include in an optimization problem. Moreover, the constraint only removes a measure zero set in the $(\boldsymbol{\beta}, \mathbf{t})$ space. Therefore, for practical reasons, we can ignore this constraint. Our numerical experiments also show that ignoring this constraint does not prevent the optimization algorithms from completing successfully.
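A single-observation check can illustrate both the Uniform CGF and the zero-denominator issue in Remark 3.2. The sketch below assumes one observation $y = m + \gamma + \varepsilon$ with $\gamma \sim U(-c, c)$ and $\varepsilon \sim N(0, \sigma^2)$, where $m$ stands for the fixed part and $c$ for $|\beta_j z_{\ell,i,j}|$. It writes the Uniform terms in the algebraically equivalent hyperbolic form $\ln(\sinh(u)/u)$ with $u = ct$, and switches to a short Taylor series near $u = 0$, which is one possible way (an assumption of this sketch, not the paper's procedure) to evaluate the formulas at the removable singularity. All function names and numbers are hypothetical.

```python
import math

def _log_sinhc(u):
    """ln(sinh(u)/u), using a series near the removable singularity u = 0."""
    if abs(u) < 1e-3:
        return u**2 / 6.0 - u**4 / 180.0
    return math.log(math.sinh(u) / u)

def _d_log_sinhc(u):
    """Derivative of ln(sinh(u)/u) in u: coth(u) - 1/u."""
    if abs(u) < 1e-3:
        return u / 3.0 - u**3 / 45.0
    return 1.0 / math.tanh(u) - 1.0 / u

def _d2_log_sinhc(u):
    """Second derivative in u: 1/u^2 - 1/sinh(u)^2."""
    if abs(u) < 1e-3:
        return 1.0 / 3.0 - u**2 / 15.0
    return 1.0 / u**2 - 1.0 / math.sinh(u) ** 2

def sa_density(y, m, c, sigma):
    """Saddlepoint density of y = m + Uniform(-c, c) + Normal(0, sigma^2)."""
    K = lambda t: m * t + _log_sinhc(c * t) + 0.5 * sigma**2 * t**2
    dK = lambda t: m + c * _d_log_sinhc(c * t) + sigma**2 * t
    d2K = lambda t: c**2 * _d2_log_sinhc(c * t) + sigma**2
    t = 0.0
    for _ in range(100):               # Newton iteration on K'(t) = y
        step = (dK(t) - y) / d2K(t)
        t -= step
        if abs(step) < 1e-12:
            break
    return math.exp(K(t) - t * y) / math.sqrt(2.0 * math.pi * d2K(t))

def exact_density(y, m, c, sigma):
    """Exact convolution density, available in closed form for this case."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return (Phi((y - m + c) / sigma) - Phi((y - m - c) / sigma)) / (2.0 * c)

m, c, sigma = 1.0, 1.0, 1.0            # hypothetical values
for d in (-2.0, 0.0, 0.5, 1.5):
    y = m + d
    print(y, sa_density(y, m, c, sigma), exact_density(y, m, c, sigma))
```

At $y = m$ the saddlepoint is exactly $t = 0$, so the series branch is what makes the evaluation possible there; elsewhere the two densities should agree to within a few percent for these parameter values.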

For the random effects estimation, we have

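$$\ln\left(\phi(\gamma_{\ell,j} \mid \hat{\theta}_j)\right) = \ln\left(\frac{1}{2|\hat{\beta}_j|}\right),$$

which is independent of the random effects coefficients $\boldsymbol{\gamma}^{\ell}$. Therefore the estimate of each individual $\boldsymbol{\gamma}^{\ell}$ can be obtained by solving the following minimization problem

$$\hat{\boldsymbol{\gamma}}^{\ell} = \arg\min_{\boldsymbol{\gamma}^{\ell}} \left(\frac{1}{\hat{\sigma}^2}\left(\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}} - \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell}\right)^T\left(\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}} - \mathbf{Z}^{\ell}\boldsymbol{\gamma}^{\ell}\right)\right),$$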

which is a linear regression and has a closed form solution.
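As an illustration of this closed form, the sketch below treats one group with a random intercept and one random slope, so that the problem is an ordinary least squares regression of the residuals $\mathbf{y}^{\ell} - \mathbf{X}^{\ell}\hat{\boldsymbol{\beta}}$ on $\mathbf{Z}^{\ell} = [\mathbf{1}, \mathbf{z}]$. The function name and the numbers are hypothetical, and the sketch does not check whether the resulting estimates fall inside the Uniform support.

```python
def estimate_group_random_effects(resid, z):
    """Least squares fit of resid on [1, z] for one group.

    resid : residuals y - X beta_hat of the group
    z     : covariate carrying the random slope
    Returns (random intercept estimate, random slope estimate).
    """
    n = len(resid)
    z_bar = sum(z) / n
    r_bar = sum(resid) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (ri - r_bar) for zi, ri in zip(z, resid))
    slope = sxy / sxx
    return r_bar - slope * z_bar, slope

# Hypothetical residuals generated exactly as 0.2 - 0.1 * z.
z = [1.0, 2.0, 3.0, 4.0]
resid = [0.1, 0.0, -0.1, -0.2]
g_intercept, g_slope = estimate_group_random_effects(resid, z)
print(g_intercept, g_slope)
```

For a random intercept only model the same calculation reduces to the group mean of the residuals.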

We visualize the PDFs of the four distributions in Figure 1. From (a) and (d) of Figure 1, we note that both the Uniform distribution and the Triangular distribution have a lower bound and an upper bound. Hence, it is feasible to assume either on the random effects when sign constraints are needed for the overall regression coefficients. The difference lies in the underlying belief: the Uniform distribution corresponds to no preference on the magnitude of the random effect coefficients, while the Triangular distribution corresponds to a preference that the magnitude is closer to 0. In fact, the Triangular distribution is based on some knowledge of the minimum, the maximum, and the most likely value. From (c) of Figure 1, we see that samples from the Gamma distribution are strictly positive, ranging from 0 to infinity. Hence, if domain knowledge suggests that all random effects should be greater than 0, then the Gamma distribution is a natural choice. Similar to the Normal distribution, the Centered Laplace distribution in (b) of Figure 1 is unconstrained and ranges from negative infinity to positive infinity, so it can be considered as an alternative to the Normal distribution. The above four distributions cover all the scenarios for the sign of the overall effects.

Figure 1. (a) The PDF of a Uniform distribution bounded by $-|\beta|$ and $|\beta|$; (b) the PDF of a Centered Laplace distribution with scale 1, for illustrative purposes; (c) a Gamma PDF with a = 1, b = 2, for illustrative purposes; (d) the PDF of a Triangular distribution bounded by $-|\beta|$ and $|\beta|$.

4. Simulation with synthetic data

We conduct simulation studies in this section. Results for an LME model with random intercept only are presented in Section 4.1. We then detail results for the same model with both a random intercept and one random slope in Section 4.2. All the optimization problems involved were solved by an interior point algorithm described in [7,21] and implemented within the optimization module of the SciPy package [27]. This algorithm is a rather sophisticated general purpose nonlinear optimization algorithm, featuring a logarithmic barrier function with adaptive barrier parameter, a linearly constrained convex quadratic approximation at each iteration, and a trust region with adaptive radius. To solve the optimization problems for fixed effects estimation, we started from 5 different initial solutions and retained the one with the largest log-likelihood value. This is a typical strategy for solving nonconvex optimization problems. As we have shown earlier, the random effects estimation problems are often convex and hence were solved with only 1 initial solution.

4.1. Models with random intercept

Datasets are simulated according to the statistical assumptions of an LME model with random intercept only. The data-generating procedure, the true parameters used to simulate the datasets, and further simulation details are reported in Section A of the Supplemental Document. The dimensions considered are $p = 5$ and $p = 8$, and the sample sizes are $n = 100$, $200$, and $500$. Each sample is divided equally into $g = 20$ clusters. Therefore, we have $2 \times 3 = 6$ combinations of dimensions and sample sizes. For each combination, we simulate 10 different datasets to check the variation in the data-generating process. The root mean square error (RMSE) of the estimated fixed effect coefficients ($\hat{\beta}_1, \ldots, \hat{\beta}_p$) is computed against the true regression parameters used to simulate the data, and is used as the criterion for performance assessment. Note that the first column of the design matrix $\mathbf{X}$ is the intercept column of ones. For instance, when $p = 5$, it means

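$$\mathbf{X} = \begin{bmatrix} 1 & x_{1,2} & \cdots & x_{1,4} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n,2} & \cdots & x_{n,4} \end{bmatrix}.$$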

The results for Uniform and Centered Laplace distributions are reported in Figures 2 and 3, respectively.

Figure 2. Uniform: (a) p = 5; (b) p = 8. Each box summarizes the 10 RMSEs from 10 different datasets for that combination. LME models with random intercept only.

Figure 3. Centered Laplace: (a) p = 5; (b) p = 8. Each box summarizes the 10 RMSEs from 10 different datasets for that combination. LME models with random intercept only.

From Figure 2, the RMSEs generally decrease as the sample size increases, for both p = 5 and p = 8. Also, the variation across different datasets of the same model, shown by the spread of each box in Figure 2, is slightly smaller with larger sample size. The dimensionality also plays a role: the performances are similar when n = 100 and n = 200, but the RMSEs for p = 8 become smaller than those for p = 5 when n increases to 500. For each combination, the median of its 10 RMSEs is reported in Table 1. The median is used instead of the mean since it is robust in the presence of extreme values. The results in Table 1 agree with the observations from Figure 2. It is also observed that all of the RMSEs for Uniform are less than 0.2, indicating satisfactory performance.
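The assessment criterion can be written down in a few lines. The sketch below computes the RMSE of a coefficient vector against the true values and contrasts the median and the mean over a set of RMSEs; every number in it is made up for illustration and is not taken from the simulations reported here.

```python
import math
from statistics import mean, median

def rmse(estimated, true):
    """Root mean square error between estimated and true coefficients."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true))

# Hypothetical true and estimated fixed effect coefficients.
beta_true = [1.0, -0.5, 0.25]
beta_hat = [1.1, -0.4, 0.05]
one_rmse = rmse(beta_hat, beta_true)

# Hypothetical RMSEs from repeated datasets, one of them an extreme value.
rmses = [0.09, 0.10, 0.11, 0.10, 0.50]
print(one_rmse, median(rmses), mean(rmses))
```

With one extreme value among the five RMSEs, the median stays at the typical level while the mean is pulled upward, which is the reason for reporting medians.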

Table 1. Each cell is the median of 10 RMSEs of the estimated fixed effect coefficients against the true parameters, over 10 different datasets for that combination.

| Dimension | Sample size | Uniform: random intercept | Uniform: random intercept and slope | Centered Laplace: random intercept | Centered Laplace: random intercept and slope |
|---|---|---|---|---|---|
| p = 5 | n = 100 | 0.120 | 0.112 | 0.040 | 0.053 |
| p = 5 | n = 200 | 0.099 | 0.083 | 0.035 | 0.035 |
| p = 5 | n = 500 | 0.089 | 0.065 | 0.031 | 0.032 |
| p = 8 | n = 100 | 0.099 | 0.095 | 0.052 | 0.058 |
| p = 8 | n = 200 | 0.097 | 0.087 | 0.041 | 0.046 |
| p = 8 | n = 500 | 0.052 | 0.068 | 0.031 | 0.041 |

| Dimension | Sample size | Gamma with fixed shape parameter: random intercept | Gamma with fixed shape parameter: random intercept and slope | Triangular: random intercept | Triangular: random intercept and slope |
|---|---|---|---|---|---|
| p = 5 | n = 100 | 0.096 | 0.143 | 0.230 | 0.280 |
| p = 5 | n = 200 | 0.041 | 0.121 | 0.153 | 0.279 |
| p = 5 | n = 500 | 0.037 | 0.115 | 0.133 | 0.246 |
| p = 8 | n = 100 | 0.078 | 0.167 | 0.229 | 0.240 |
| p = 8 | n = 200 | 0.057 | 0.133 | 0.140 | 0.237 |
| p = 8 | n = 500 | 0.036 | 0.091 | 0.136 | 0.199 |

Observations from Figure 3 are consistent with those made for Figure 2, i.e. as the sample size increases, the RMSEs decrease for the same dimension. In addition, the Centered Laplace distribution performs better than Uniform, as none of the RMSEs for the Centered Laplace distribution is above 0.1. In particular, for p = 5 the medians of the RMSEs for n = 100, 200 and 500 are 0.040, 0.035 and 0.031, respectively (Table 1). This shows that the estimation accuracy is satisfactory.

In addition, we present the boxplots for Gamma and Triangular in Section C of the Supplemental Document to save space. For Gamma distribution, we actually consider a special case by fixing a=1 to make its shape less flexible for model identifiability purposes. These observations agree with what we have discussed above.

4.2. Models with random intercept and one random slope

In this section, we keep the same setup as in Section 4.1. However, instead of using the LME models with random intercept only, we simulate the same model with both a random intercept ($x_1$) and one random slope ($x_2$). The true parameters used are reported in Section B of the Supplemental Document. The root mean squared error (RMSE) of the estimated fixed effect coefficients $(\hat{\beta}_1,\ldots,\hat{\beta}_p)$ is computed against the true parameters used to simulate data, and is used as the criterion for performance assessment. As before, the boxplots for Uniform and Centered Laplace distributions are reported in Figures 4 and 5, respectively. The boxplots for the other two distributions are in Section C of the Supplemental Document.

Figure 4.

Uniform (a) p = 5; (b) p = 8. Each box is 10 RMSEs of 10 different datasets for that combination. LME models with random intercept and one random slope.

Figure 5.

Centered Laplace (a) p = 5; (b) p = 8. Each box is 10 RMSEs of 10 different datasets for that combination. LME models with random intercept and one random slope.

Figures 4 and 5 lead to observations similar to those for models with random intercept only. Generally speaking, as the sample size increases, the RMSEs decrease. Taking the Centered Laplace distribution as an example, the median of the RMSEs drops from 0.053 for n = 100 to 0.032 for n = 500 with p = 5. We observe the same pattern for the other distributions. Moreover, comparing models with both random intercept and random slope against models with random intercept only, we observe that making the model more complicated can be unnecessary, especially when the sample size is not large enough. For example, for the Centered Laplace distribution with n = 200 and p = 8, the median RMSE is 0.046 for the model with random intercept and slope, compared to 0.041 for the random-intercept-only model. The medians of the RMSEs are reported in Table 1.

4.3. Validation of random effects

Since the random effects are simulated as well, RMSEs can be calculated against the true random effects. Sticking to the LME model with random intercept only, we report the median of RMSEs for random effects in Table 2. The observations are similar to those for the fixed effects in the previous two subsections: as sample size increases, the median of the RMSEs decreases. In addition, none of the medians is above 0.3 suggesting that the estimation performance is satisfactory.

Table 2.

Each cell is the median of 10 RMSEs of estimated random effects against true random effects of the 10 different datasets for that combination.

Dimension Sample Size Uniform Centered Laplace Gamma Triangular
p = 5 n = 100 0.168 0.140 0.089 0.255
p = 5 n = 200 0.129 0.092 0.055 0.143
p = 5 n = 500 0.107 0.060 0.049 0.116
p = 8 n = 100 0.156 0.138 0.111 0.274
p = 8 n = 200 0.148 0.090 0.059 0.153
p = 8 n = 500 0.108 0.067 0.044 0.123

Moreover, to further justify the proposed method, we focus on one dataset for each distribution and test the g = 20 estimated random effects against the assumed distribution using a two-sided Kolmogorov–Smirnov test [9]; the p-values are reported in Table 3. None of the p-values is smaller than 0.05, so there is no statistically significant evidence to reject the null hypothesis that the g = 20 random effects are from the assumed distribution.
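A check of this kind can be sketched with `scipy.stats.kstest`. The example below is a hedged illustration rather than the code used in the study: the 20 values are simulated stand-ins for estimated random effects, and the assumed distribution is taken to be Uniform on (-1, 1) purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-in for g = 20 estimated random effects.
estimated_effects = rng.uniform(-1.0, 1.0, size=20)

# Assumed distribution: Uniform on (-1, 1), i.e. loc = -1 and scale = 2 in SciPy.
assumed = stats.uniform(loc=-1.0, scale=2.0)

# Two-sided Kolmogorov-Smirnov test of the sample against the assumed CDF.
result = stats.kstest(estimated_effects, assumed.cdf, alternative="two-sided")
consistent = result.pvalue >= 0.05  # True means the null is not rejected at 5%

# A sample shifted outside the assumed support is rejected decisively.
shifted = stats.kstest(estimated_effects + 5.0, assumed.cdf, alternative="two-sided")
```

A p-value at or above 0.05 means the test does not reject the assumed distribution; it does not prove that the random effects follow it, which matters with only 20 values.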

Table 3.

The p-values of a two-sided Kolmogorov–Smirnov test on g=20 random effect coefficients with each distribution assumed. One dataset simulated from the LME model with random intercept only.

Dimension Sample Size Uniform Centered Laplace Gamma Triangular
p = 5 n = 100 0.586 0.591 0.328 0.548
p = 5 n = 200 0.430 0.569 0.256 0.310
p = 5 n = 500 0.667 0.649 0.209 0.778
p = 8 n = 100 0.376 0.604 0.365 0.398
p = 8 n = 200 0.452 0.623 0.312 0.771
p = 8 n = 500 0.541 0.596 0.243 0.779

Furthermore, taking the Gamma distribution as an example, we further validate the results in Figure 6, where the left-hand plot is the histogram of the g = 20 true random effects generated from Γ(a=1,b=0.8) with the true density superimposed, while the right-hand plot is the histogram of the 20 estimated random effects (p = 5, n = 500) with the true density of Γ(a=1,b=0.8). The p-values of the two-sided Kolmogorov–Smirnov test are 0.204 and 0.209, respectively. Both plots have similar patterns, and both p-values are above 0.05, meaning that the test fails to reject the null hypothesis that the random effects are from Γ(a=1,b=0.8). The RMSE of the 20 estimated random effects against the true random effects is 0.023, which indicates that the estimates are very close to the true random effects.

Figure 6.

(a): Histogram of the g = 20 true random effects generated from Γ(a=1,b=0.8) with the true density; (b): Histogram of the g = 20 estimated random effects with density of Γ(a=1,b=0.8).

Last but not least, we validate the approximated density against the true density for simple cases where an analytical expression of the exact density of $y_i$ is available. Using the Uniform distribution as an example and working under the LME model with random intercept, the PDF of $y_i=\sum_{j=1}^{p}x_{i,j}\beta_j+z_{i,1}\gamma_1+\varepsilon_i$, where $\varepsilon_i\sim N(0,\sigma^2)$ and $\gamma_1\sim U(-1,1)$, can be explicitly written as follows by the convolution theorem.

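$$f(y_i)=\frac{1}{2|z_{i,1}|}\left(\Phi\left(\frac{y_i-\sum_{j=1}^{p}x_{i,j}\beta_j+|z_{i,1}|}{\sigma}\right)-\Phi\left(\frac{y_i-\sum_{j=1}^{p}x_{i,j}\beta_j-|z_{i,1}|}{\sigma}\right)\right), \tag{20}$$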

where $\Phi(\cdot)$ is the CDF of a standard Normal distribution. Densities from saddlepoint approximation in (13) can be evaluated against Equation (20) in a pointwise manner. The RMSEs are reported in Table 4 and all of the 6 RMSEs are less than 0.1, suggesting that the saddlepoint densities are indeed close to the true densities.
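As a sanity check on Equation (20), the closed form can be compared with direct numerical integration of the convolution. The sketch below assumes a Uniform(-1, 1) random intercept and Normal noise as in the text; the function names and the parameter values (`mu` for the fixed-effect part, `z`, `sigma`) are hypothetical choices for illustration, and the saddlepoint density itself is not reproduced here.

```python
import numpy as np
from scipy import integrate, stats

def exact_density(y, mu, z, sigma):
    """Closed form of Eq. (20) for y = mu + z*gamma + eps,
    with gamma ~ U(-1, 1) and eps ~ N(0, sigma^2)."""
    a = abs(z)
    upper = stats.norm.cdf((y - mu + a) / sigma)
    lower = stats.norm.cdf((y - mu - a) / sigma)
    return (upper - lower) / (2.0 * a)

def numeric_density(y, mu, z, sigma):
    """Same density by integrating the Normal pdf over gamma in (-1, 1)."""
    integrand = lambda g: 0.5 * stats.norm.pdf(y - mu - z * g, scale=sigma)
    value, _ = integrate.quad(integrand, -1.0, 1.0)
    return value

# Hypothetical parameter values used only for this check.
mu, z, sigma = 0.2, 1.3, 0.5
ys = np.linspace(-3.0, 3.0, 7)

exact = np.array([exact_density(y, mu, z, sigma) for y in ys])
numeric = np.array([numeric_density(y, mu, z, sigma) for y in ys])

# Pointwise RMSE between the two evaluations of the same density.
pointwise_rmse = float(np.sqrt(np.mean((exact - numeric) ** 2)))
```

The same pointwise RMSE could be computed with an approximate density in place of `numeric_density`, which is the kind of comparison summarized in Table 4.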

Table 4.

RMSE of saddlepoint approximation density against the true density; Uniform distribution on random effects; One dataset simulated from the LME model with random intercept only.

  p = 5 p = 8
n = 500 0.058 0.091
n = 1000 0.051 0.073
n = 2000 0.035 0.037

4.4. Comparisons with the PIT method

In this section, we compare the proposed SA method with the PIT method [20] that was reviewed in Section 1. Although both methods are able to handle non-Normal random effects, the proposed SA method is numerically more stable: following the notation of [20], the formulation of the PIT method sums up Q products of $n_i$ probability densities inside the logarithm function, so taking the logarithm cannot convert these products into sums. We implemented the PIT method according to its original formulation [20]. Similar to Section 5.3 of [8], the only modification we made when implementing the PIT method was to take the natural logarithm of the last equation in Section 3 of [20]. Without such a modification, numerical issues arise during the optimization procedure.
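The numerical difficulty with a logarithm of a sum of products can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the log densities and weights are made up, and the log-sum-exp device shown is a generic remedy, not part of the formulation in [20] or of the implementation described above.

```python
import numpy as np
from scipy.special import logsumexp

def naive_log_sum_of_products(log_dens, weights):
    """log( sum_q w_q * prod_j f_qj ), forming the products directly."""
    products = np.exp(log_dens).prod(axis=1)
    with np.errstate(divide="ignore"):
        return float(np.log(np.sum(weights * products)))

def stable_log_sum_of_products(log_dens, weights):
    """Same quantity, using log-sum-exp on the summed log densities."""
    return float(logsumexp(log_dens.sum(axis=1), b=weights))

# Hypothetical log densities for Q = 2 nodes and n_i = 5 observations.
log_dens = np.array([[-400.0] * 5, [-401.0] * 5])
weights = np.array([0.5, 0.5])

naive = naive_log_sum_of_products(log_dens, weights)    # underflows to -inf
stable = stable_log_sum_of_products(log_dens, weights)  # finite, about -2000.69
```

Each product of five small densities underflows to zero in double precision, so the naive evaluation returns minus infinity, whereas working with summed log densities keeps the value finite.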

Following the specification in Section 5.3 of [8], the number of points used to approximate the integrals is set to 2 and 4, i.e. Q = 2, 4, to balance approximation accuracy against computational resources. The values of the other parameters, such as $z_q,\eta_q$, $q=1,\ldots,Q$, were extracted from Table 25.10 of [1]. Taking Uniform and the LME model with random intercept only as an example, we follow the same procedures as in Section 4.1 and report the median of 10 RMSEs of the fixed effect coefficients estimated by the PIT method against the true parameters in Table 5. The results are comparable to those for the same combinations reported in Table 1, and the true parameters used to simulate data are reported in Section B of the Supplemental Document.

Table 5.

Each cell is median of 10 RMSEs of estimated fixed effect coefficients against true coefficients of 10 different datasets for that combination for the PIT method.

Sample Size | Q = 2, p = 5 | Q = 2, p = 8 | Q = 4, p = 5 | Q = 4, p = 8
n = 100 | 0.204 | 0.174 | 0.207 | 0.219
n = 200 | 0.112 | 0.115 | 0.126 | 0.133
n = 500 | 0.090 | 0.106 | 0.077 | 0.084

From Table 5, the PIT method produces good results, although its RMSEs are mostly larger than those of the proposed SA method shown in Table 1. For example, the median of RMSEs for n = 100, p = 5, Q = 2 is 0.204, compared to 0.120 for the proposed SA method for the same combination. The only exception is n = 500, p = 5, Q = 4, where the median of RMSEs is 0.077, while that of the SA method is 0.089. Despite the slight difference in the performance of the PIT method and the proposed SA approach, both yield acceptable results in general.

Finally, the proposed SA method is admittedly not perfect, as one of its limitations is the computational time. Consider, for instance, a simulated dataset with p = 5 and n = 100. It takes the SA approach 251 seconds to finish the whole estimation process on an Amazon Linux cloud computing machine with an Intel Xeon Platinum 8259CL processor. For comparison, the PIT method completes the same process using 10% of the computing time in the same environment. We believe part of the performance difference is due to the constrained optimization in Equation (16), which is the most time-consuming part of the whole process.

4.5. More considerations

We discuss in this section some additional important topics regarding the proposed SA approach and empirically evaluate them.

4.5.1. Model misspecification

We investigate how the proposed SA method performs when the distribution of the random effects is misspecified. In this regard, we work with a random intercept only model with p = 5 and n = 100, 200. Identical to the settings in Section 4.1, each sample is equally divided into g = 20 clusters. For each combination, we simulate 10 different datasets to check the variation in the data-generating process. The random effects are generated from a truncated Normal distribution, but are fitted separately with the four distributions considered in this paper: (1) Uniform, (2) Centered Laplace, (3) Gamma and (4) Triangular. The RMSE of the estimated fixed effect coefficients is computed against the true regression parameters used to simulate data. The medians are reported in Table 6.

Table 6.

Each cell represents median of 10 RMSEs of estimated fixed effect coefficients against true parameters of 10 different datasets for that combination for model misspecification.

Sample Size | Uniform | Centered Laplace | Gamma with fixed shape parameter | Triangular
n = 100 | 0.128 | 0.049 | 0.104 | 0.341
n = 200 | 0.096 | 0.048 | 0.063 | 0.272

From Table 6, the median of the RMSEs is generally slightly higher than that reported in Table 1, indicating that the estimation accuracy is slightly affected by the model misspecification. However, for the Uniform distribution, the medians are 0.120 and 0.099 for n = 100 and 200, respectively, when the model is correctly specified (Table 1), and they remain at a similar level, i.e. 0.128 and 0.096, when the model is misspecified (Table 6), which may be due to the non-informative nature of the Uniform distribution. Therefore, when random effects need to be modeled with a non-Normal distribution, Uniform is empirically recommended as the default distribution if no prior information clearly argues against such a choice. In addition, given the numbers in Table 6, we believe the estimation accuracy is still acceptable and the proposed SA approach remains practically viable even if the model is misspecified.

4.5.2. Test of hypothesis

Conducting a test of hypothesis is not an easy task when the exact likelihood function is intractable. Following the same approximation method detailed in Section 4.2 of [8], a likelihood ratio test (LRT) is applied to serve the purpose. In fact, we consider testing both all β and individual β. Since the exact expression of the likelihood function is not available, the SA approximation is then utilized in lieu of the exact analytical expression. Taking the test of all β as an illustrative example, the hypotheses are as follows, assuming the fixed effects are positively constrained.

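$$H_0:\beta=0 \quad\text{versus}\quad H_1:\text{at least one }\beta_i>0, \tag{21}$$

where $i=1,\ldots,p$. The proposed test statistic is

$$T_{LR}=-2\left[L_{\mathrm{approx}}(\hat{\beta}_0,\hat{\theta}_0,\hat{\sigma}_0)-L_{\mathrm{approx}}(\hat{\beta},\hat{\theta},\hat{\sigma})\right], \tag{22}$$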

where $\hat{\beta}_0,\hat{\theta}_0,\hat{\sigma}_0$ denote the estimates under the null hypothesis, and $\hat{\beta},\hat{\theta},\hat{\sigma}$ denote the estimates under the alternative hypothesis. The exact distribution of $T_{LR}$ is not available, and we adopt the upper bound given in Equation (22) of [8], i.e. $\frac{1}{2}\left(P(\chi^2_{p-1}\ge c_\alpha)+P(\chi^2_{p}\ge c_\alpha)\right)$, to compute the corresponding critical value, $c_\alpha$, where $\alpha$ is the significance level. Therefore, the LRT used here is conservative with respect to committing a false positive error.
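One way to obtain a critical value from this bound is to solve for the point at which the averaged chi-square tail probability equals the significance level. The sketch below is an assumption about how this could be done with SciPy, not the authors' code; the function names are hypothetical and it requires p of at least 2 so that both degrees of freedom are positive.

```python
from scipy import optimize, stats

def mixture_tail(c, p):
    """Upper bound 0.5 * (P(chi2_{p-1} >= c) + P(chi2_p >= c))."""
    return 0.5 * (stats.chi2.sf(c, p - 1) + stats.chi2.sf(c, p))

def critical_value(alpha, p):
    """Solve mixture_tail(c, p) = alpha for c by root finding.

    The root is bracketed by 0, where the tail equals 1, and the
    chi-square(p) critical value, where the tail is below alpha.
    """
    upper = stats.chi2.isf(alpha, p)
    return optimize.brentq(lambda c: mixture_tail(c, p) - alpha, 0.0, upper)

# Setting used in the simulation study: p = 5 and alpha = 0.05.
c_alpha = critical_value(0.05, 5)
```

The resulting value lies between the 5% critical values of the chi-square distributions with p - 1 and p degrees of freedom, and the null hypothesis would be rejected when the test statistic exceeds it.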

The finite-sample performance of these tests is investigated using a random intercept only model with p = 5 and n = 100 or n = 200. As in Section 4.1, g is set to 20. The empirical true negative (TN) rate is used as the criterion, defined as 1 minus the empirical false positive (type I error) rate based on 500 replications. With $\alpha=0.05$, the theoretical TN rate is 0.95. The hypotheses for testing all β are given in Equation (21). For testing an individual β, we specifically test $\beta_2$ (the first non-intercept parameter for fixed effects), i.e. $H_0:\beta_2=0$ versus $H_1:\beta_2>0$. The performance is presented in Table 7 for testing all β and Table 8 for testing individual β.

Table 7.

Test of all β. The empirical true negative rate is computed based on M=500 replications.

Sample Size | Uniform | Centered Laplace | Gamma with fixed shape parameter | Triangular
n = 100 | 0.944 | 0.947 | 0.972 | 0.955
n = 200 | 0.958 | 0.969 | 0.981 | 0.970

Notes: The significance level is set as 0.05, p = 5, g = 20.

Table 8.

Test of individual β. The empirical true negative rate is computed based on M=500 replications.

Sample Size | Uniform | Centered Laplace | Gamma with fixed shape parameter | Triangular
n = 100 | 0.922 | 0.939 | 0.968 | 0.974
n = 200 | 0.939 | 0.940 | 0.971 | 0.990

Notes: The significance level is set as 0.05, p = 5, g = 20.

From Tables 7 and 8, the Uniform and Centered Laplace distributions fall slightly below the nominal 0.95 (at n = 100 in Table 7 and at both sample sizes in Table 8), whereas all remaining combinations have an empirical TN rate higher than 0.95. The results are empirically acceptable, indicating satisfactory performance of the proposed SA approach.

5. An application in retail data analytics

We now apply the proposed SA method to analyze the sales data of the 1 liter regular whole milk that was introduced in Section 1. The Uniform distribution is assumed on the random effects as there is no preference on the magnitude of the random effects from a business point of view. The classical LME model and the PIT method are included as comparisons. The estimated fixed effects of all models are reported in Table 10, from which we can clearly observe that the magnitude of the slope from the SA approach is about 4 times larger than that of lme4 and about 2 times larger than that of the PIT method. The 6 overall slopes for the 6 stores are reported in Table 9.

Table 10.

Estimated fixed effect coefficients of sales data of the 1 liter regular whole milk.

Model Intercept Slope Predictive RMSE
lme4 2.671 0.055 0.080
PIT 1.524 0.277 0.081
SA 1.557 0.118 0.079

Notes: The proposed SA is fitted with a Uniform distribution on the random effects.

Table 9.

Estimated overall slopes of sales data of the 1 liter regular whole milk.

Store No. lme4 SA PIT
1 0.025 0.018 0.000
2 −0.089 0.018 0.000
3 −0.114 0.185 0.185
4 0.048 0.218 0.231
5 −0.108 0.171 0.172
6 −0.091 0.098 0.100

Notes: The proposed SA is fitted with a Uniform distribution on the random effects.

Since the random effects are bounded by a Uniform distribution, none of the 6 overall slopes is positive, unlike the model with the Normality assumption, for which store 1 and store 4 have positive overall slopes as reported in Section 1 and Table 9. Compared with the traditional LME method, the proposed SA and the PIT method are observed to preserve model interpretability and sign correctness.

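In order to measure the predictive performance, we define the predictive RMSE as the RMSE of the estimated response variable value $\hat{y}$ against the observed response variable value $y$, i.e.

$$\text{predictive RMSE}=\sqrt{\frac{\sum_{\ell=1}^{g}\sum_{i=1}^{n_\ell}\left(\hat{y}_{\ell,i}-y_{\ell,i}\right)^2}{\sum_{\ell=1}^{g}n_\ell}}.$$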

Clearly, the predictive RMSE is a metric of how well the estimated model fits the observed data. As we can see from Table 10, the predictive RMSE of the proposed model (0.079) is slightly lower than that of the traditional LME (0.080) and the PIT method (0.081), suggesting that the proposed SA method not only maintains model interpretability and sign correctness but also offers slightly better predictability.

6. Discussion and concluding comments

In this paper, we study linear mixed effects models. Instead of assuming a Normal distribution on the random effects, we explore the possibility of non-Normal distributions to provide flexibility for modeling purposes. We propose to use a saddlepoint approximation method for parameter estimation, overcoming a major challenge in the lack of a closed-form likelihood function. To demonstrate the proposed approach, we specifically study four special cases of non-Normal distributions: (1) Uniform, (2) Centered Laplace, (3) Gamma and (4) Triangular. Both the simulation studies and the real-world application example in retail data analytics demonstrate the satisfactory performance of the proposed approach, which works best when the use of an appropriate distribution is supported by practical or domain knowledge of the range of the overall regression parameters.

There are two interesting areas of future research directly motivated by this paper. First, if a non-Normal distribution is assumed on the random effects, how much worse will the SA approximation quality be? How will the shape of the distribution affect the approximation quality? We are currently not aware of any theoretical results on these questions in the literature. If rigorous theoretical results are not available, a paper reporting some empirical findings on these questions should be quite attainable and would be very informative. Second, this paper considers a linear mixed effects model; there is a need to extend it to a generalized linear mixed effects model (GLMM) when the response variable is non-continuous, for example, binary or count data. We expect the estimation process to be more challenging for a GLMM as a link function will be involved. For example, if the response variable is a count and the logarithm of its expected value is modeled by a linear combination of unknown parameters, then the existence of a non-linear link function will greatly complicate the mathematical derivations. Therefore, the extension to GLMM is an interesting topic for future research.

Supplementary Material

Supplemental Material
CJAS_A_2260576_SM6184.pdf (183.1KB, pdf)

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Abramowitz M., Stegun I.A., and Romer R.H., Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, National Bureau of Standards, Washington DC, 1964.
  • 2. Bates D., Mächler M., Bolker B., and Walker S., Fitting linear mixed-effects models using lme4, preprint (2014). Available at arXiv, arXiv:1406.5823.
  • 3. Bates D., Mächler M., Bolker B., and Walker S., Fitting linear mixed-effects models using lme4, J. Stat. Softw. 67 (2015), pp. 1–48. doi:10.18637/jss.v067.i01
  • 4. Boggs P.T. and Tolle J.W., Sequential quadratic programming, Acta Numerica 4 (1995), pp. 1–51.
  • 5. Boyd S., Parikh N., Chu E., Peleato B., and Eckstein J., Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends. Mach. Learn. 3 (2011), pp. 1–122.
  • 6. Bronnenberg B.J., Dhar S.K., and Dubé J.-P., Consumer packaged goods in the United States: National brands, local branding, J. Mark. Res. 44 (2007), pp. 4–13.
  • 7. Byrd R.H., Hribar M.E., and Nocedal J., An interior point algorithm for large-scale nonlinear programming, SIAM. J. Optim. 9 (1999), pp. 877–900.
  • 8. Chen H., Han L., and Lim A., Estimating linear mixed effects models with truncated normally distributed random effects, Commun. Stat. Simul. Comput. (2022). doi:10.1080/03610918.2022.2066696
  • 9. Daniel W.W., Applied Nonparametric Statistics, PWS-Kent Pub., Boston, 1990.
  • 10. Daniels H.E., Saddlepoint approximations in statistics, Ann. Math. Stat. 25 (1954), pp. 631–650.
  • 11. Fröberg C.-E., Introduction to Numerical Analysis, Addison-Wesley, Reading, MA, 1969.
  • 12. Hammersley J.M. and Handscomb D.C., Monte Carlo Methods, Springer, Methuen, 1964.
  • 13. Jiang J., Linear and Generalized Linear Mixed Models and Their Applications, Springer Science & Business Media, New York, NY, 2007.
  • 14. Kotz S., Kozubowski T., and Podgorski K., The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance, Springer Science & Business Media, Boston, MA, 2012.
  • 15. Lin T.I. and Lee J.C., Estimation and prediction in linear mixed models with skew-normal random effects for longitudinal data, Stat. Med. 27 (2008), pp. 1490–1507.
  • 16. Lindstrom M.J. and Bates D.M., Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data, J. Am. Stat. Assoc. 83 (1988), pp. 1014–1022.
  • 17. Matos L.A., Prates M.O., Chen M.-H., and Lachos V.H., Likelihood-based inference for mixed-effects models with censored response using the multivariate-t distribution, Stat. Sin. 23 (2013), pp. 1323–1345.
  • 18. Mattos T.B., Lachos V.H., Castro L.M., and Matos L.A., Extending multivariate Student's-t semiparametric mixed models for longitudinal data with censored responses and heavy tails, Stat. Med. 41 (2022), pp. 3696–3719.
  • 19. McCulloch C.E. and Neuhaus J.M., Misspecifying the shape of a random effects distribution: Why getting it wrong may not matter, Stat. Sci. 26 (2011), pp. 388–402.
  • 20. Nelson K.P., Lipsitz S.R., Fitzmaurice G.M., Ibrahim J., Parzen M., and Strawderman R., Use of the probability integral transformation to fit nonlinear mixed-effects models with nonnormal random effects, J. Comput. Graph. Stat. 15 (2006), pp. 39–57.
  • 21. Nocedal J. and Wright S., Numerical Optimization, Springer Science & Business Media, New York, NY, 2006.
  • 22. Pinheiro J.C., Liu C., and Nian Wu Y., Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution, J. Comput. Graph. Stat. 10 (2001), pp. 249–276.
  • 23. Rao V.R., Pricing research in marketing: The state of the art, J. Bus. 57 (1984), pp. S39–S60.
  • 24. Seabold S. and Perktold J., statsmodels: Econometric and statistical modeling with Python, in 9th Python in Science Conference, Austin, TX, 2010.
  • 25. Skrondal A. and Rabe-Hesketh S., Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models, CRC Press, Boca Raton, FL, 2004.
  • 26. Verbeke G. and Lesaffre E., The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data, Comput. Stat. Data. Anal. 23 (1997), pp. 541–556.
  • 27. Virtanen P., Gommers R., Oliphant T.E., Haberland M., Reddy T., Cournapeau D., Burovski E., Peterson P., Weckesser W., Bright J., van der Walt S.J., Brett M., Wilson J., Jarrod Millman K., Mayorov N., Nelson A.R.J., Jones E., Kern R., Larson E., Carey C.J., Polat İ., Feng Y., Moore E.W., VanderPlas J., Laxalde D., Perktold J., Cimrman R., Henriksen I., Quintero E.A., Harris C.R., Archibald A.M., Ribeiro A.H., Pedregosa F., and van Mulbregt P., SciPy 1.0 Contributors, SciPy 1.0: Fundamental algorithms for scientific computing in Python, Nat. Methods. 17 (2020), pp. 261–272.
  • 28. Wu L., Mixed Effects Models for Complex Data, Chapman and Hall/CRC, Boca Raton, FL, 2009.
  • 29. Yucel R.M. and Demirtas H., Impact of non-normal random effects on inference by multiple imputation: A simulation assessment, Comput. Stat. Data. Anal. 54 (2010), pp. 790–801.
  • 30. Zhang X., A tutorial on restricted maximum likelihood estimation in linear regression and linear mixed-effects model, 2015. http://statdb1.uos.ac.kr/teaching/multi-grad/ReML.pdf.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
