Author manuscript; available in PMC: 2009 Oct 7.
Published in final edited form as: Stat Surv. 2008 Jan 1;2:154–169. doi: 10.1214/08-ss036

Testing polynomial covariate effects in linear and generalized linear mixed models*

Mingyan Huang 1, Daowen Zhang 1
PMCID: PMC2758794  NIHMSID: NIHMS88797  PMID: 19816591

Abstract

An important feature of linear mixed models and generalized linear mixed models is that the conditional mean of the response given the random effects, after transformation by a link function, is linearly related to the fixed covariate effects and random effects. It is therefore of practical importance to test the adequacy of this assumption, particularly the assumption of linear covariate effects. In this paper, we review procedures that can be used for testing polynomial covariate effects in these popular models. Specifically, four types of hypothesis testing approaches are reviewed: R tests, likelihood ratio tests, score tests and residual-based tests. The derivation and performance of each testing procedure are discussed, and a small simulation study compares the likelihood ratio tests with the score tests.

Keywords: Likelihood Ratio Test, Restricted Maximum Likelihood (REML), Score Test

1. Introduction

Linear mixed models (LMMs) [16] and their extension, generalized linear mixed models (GLMMs) [2; 28], are popular statistical models for analyzing correlated data, including the longitudinal and clustered data that often arise in biomedical research. An important feature of these models is that the conditional mean of the response given covariates and random effects, after transformation by a link function, is linearly related to the fixed covariate effects and random effects. The correctness of this model specification, especially the specification of parametric linear covariate effects, has a significant impact on the validity of subsequent statistical inference on the covariate effects. It is therefore of practical importance to check the adequacy of the assumption of parametric linear covariate effects.

In order to evaluate the adequacy of a parametric covariate effect in a regression model, one common approach is to cast the problem in the hypothesis testing framework, where a broader class of models is selected as the alternatives. Nonparametric regression models, due to their flexibility and robustness in modeling the relationship between a response variable and explanatory variables, are often chosen as such alternatives. In practice, however, one rarely directly uses pure nonparametric regression models as alternatives because of the intrinsic infinite dimensional problem of nonparametric functions. To overcome such difficulties, various smoothing techniques, such as kernel smoothing and (penalized) spline smoothing, are often applied to estimate nonparametric functions, and the resulting estimates are then used as the alternatives for testing the adequacy of the parametric covariate effects. In doing so, the infinite dimensional alternatives are reduced to the ones with finite dimensions (or even one dimension in some special cases), which significantly simplifies the testing problems. For example, it is well-known that a nonparametric function estimated via penalized splines or smoothing splines has a mixed effects representation [3; 29; 30]. An appealing feature of using the mixed effects representation is that one can cast the hypothesis test of parametric against nonparametric covariate effects as a variance component test, which in most cases is a simple one-dimensional testing problem [30; 8]. The likelihood ratio and the score testing approaches reviewed here are mainly based on this mixed effects representation.

Alternatively, testing the adequacy of parametric covariate effects in LMMs and GLMMs can also be viewed as a goodness-of-fit problem. The residual based tests proposed by Pan and Lin [22] take this view. Specifically, these tests are “based on the cumulative sums of residuals over covariates or predicted values of the response variable” [22]. The major advantage of this approach is that it is valid against any alternatives that deviate from an assumed model.

For checking the adequacy of parametric covariate effects, we present here an overview on four types of hypothesis testing approaches that receive significant attention in the literature: R tests, likelihood ratio tests, score tests and residual-based tests. For each test, the derivation and performance are described first in the linear or generalized linear model framework, and then we mainly focus on their extensions to mixed models. The paper is organized as follows. Section 2 briefly introduces the models to be considered in this review. In Section 3, we review the four testing procedures. In Section 4, we present the results from a small simulation study to compare the performance of two popular testing procedures, the exact likelihood ratio test and the score test, based on mixed effects representation of (penalized) smoothing spline estimates of a nonparametric function. The paper is concluded in Section 5 with some discussion.

2. Generalized linear mixed models

In this section, we briefly introduce the models to be considered and the notation to be used in this review. Since LMMs are special cases of GLMMs, we will only introduce GLMMs for longitudinal/clustered data. Suppose there are $m$ subjects (or clusters) in a data set. For the $i$th subject ($i = 1, 2, \ldots, m$), denote by $y_{ij}$ the $j$th measurement of the response variable ($j = 1, 2, \ldots, n_i$), and by $z_{ij}$, $s_{ij}$ and $t_{ij}$ the $j$th measurements of the $q$-dimensional covariates $z$, the $p$-dimensional covariates $s$ (not including the intercept) and a scalar covariate $t$. Given the subject-specific random effects $b_i$ and these covariate values, the $y_{ij}$ are assumed to be independent with a conditional density in an exponential family, with conditional mean $\mu_{ij} = E(y_{ij} \mid b_i)$ and conditional variance $\mathrm{var}(y_{ij} \mid b_i) = \omega_{ij}^{-1}\phi\, v(\mu_{ij})$, where $\omega_{ij}$ is a prior weight, $\phi$ is the dispersion parameter and $v(\cdot)$ is the variance function. The conditional mean $\mu_{ij}$ is assumed to be related to the covariates through the following GLMM [2]

$$g(\mu_{ij}) = s_{ij}^T\delta + m(t_{ij}, \gamma) + z_{ij}^T b_i, \tag{2.1}$$

where $g(\cdot)$ is a known monotone link function, $\delta$ are the fixed effects of $s$, $m(t, \gamma) = \gamma_0 + \gamma_1 t + \cdots + \gamma_d t^d$ is the $d$-order ($d$ is a non-negative integer) polynomial covariate effect of $t$ with coefficients $\gamma_k$, and the random effects $b_i$ are usually assumed to have a multivariate normal distribution $N\{0, D(\theta)\}$, with $\theta$ being the vector of unique parameters in the variance matrix of the random effects $b_i$.

Model (2.1) includes many popular models as special cases. When $g(\mu) = \mu$ and $y_{ij}$ is assumed to have a conditional normal distribution given the random effects $b_i$, model (2.1) reduces to the LMM considered by Laird and Ware [16]. Suppose we are confident about the parametric linear form $s_{ij}^T\delta$ in model (2.1) and are mainly concerned with the adequacy of $m(t, \gamma)$, the polynomial covariate effect of $t$. For this purpose, we consider the following semiparametric additive mixed models (SAMMs) proposed by Zhang and Lin [30] as alternatives to model (2.1)

$$g(\mu_{ij}) = s_{ij}^T\delta + f(t_{ij}) + z_{ij}^T b_i, \tag{2.2}$$

where f(t) is a smooth but arbitrary function.

Denote $y = (y_{11}, \ldots, y_{1n_1}, \ldots, y_{m1}, \ldots, y_{mn_m})^T$, $S = (s_{11}, \ldots, s_{1n_1}, \ldots, s_{m1}, \ldots, s_{mn_m})^T$, $b = (b_1, \ldots, b_m)^T$, $Z_i = (z_{i1}^T, \ldots, z_{in_i}^T)^T$, $Z = \mathrm{diag}\{Z_1, \ldots, Z_m\}$, and $\mu = E(y \mid b)$. In the next section, we discuss four procedures for checking the assumption that $f(t)$ is adequately represented by a polynomial function $m(t, \gamma)$.

3. Four testing procedures

3.1. R tests

The R tests, discussed by Hastie and Tibshirani [15], were originally developed for testing smoothing parameters during the estimation of nonparametric functions through smoothing techniques for independent data. The idea of the R tests is analogous to the F statistic frequently used in linear regression models. One of the advantages of the R tests is their easy implementation, as under the null hypothesis the asymptotic distribution of the R statistic can be approximated by the chi-square distribution. However, the estimates of the degrees of freedom of chi-square distributions can be biased, and the resulting approximated critical values might be inaccurate. Moreover, the finite-sample distribution of the R statistic has not been studied [8].

A number of modifications of the original R tests have been made, including correction of the bias of the nonparametric estimates and reconstruction of the original test statistics and the corresponding distributions [1; 4; 8]. Here we briefly describe a version of the R statistic proposed by Hardle et al. [13] under the generalized linear model (GLM) framework. They considered the following generalized partially linear model, a special case of SAMMs (2.2) for independent data ($n_i = 1$):

$$g(\mu_i) = s_i^T\delta + f(t_i). \tag{3.1}$$

Here, no random effect is required as the $y_i$ are independent, so the second subscript $j$ ($j = 1$) can be dropped to simplify the notation.

Denote by $\tilde\delta$ and $\tilde f$ the estimates of $\delta$ and $f(t)$ under the null parametric model $H_0: f(t) = m(t, \gamma)$, and by $\hat\delta$ and $\hat f$ the estimates under the alternative model $H_a: f(t) \neq m(t, \gamma)$. Let $\tilde\mu_i = g^{-1}\{s_i^T\tilde\delta + \tilde f(t_i)\}$ and $\hat\mu_i = g^{-1}\{s_i^T\hat\delta + \hat f(t_i)\}$. The proposed R statistic for testing $H_0: f(t) = m(t; \gamma)$ versus $H_a: f(t) \neq m(t; \gamma)$ is defined as

$$R = 2\sum_{i=1}^{m} Q(\tilde\mu_i; \hat\mu_i), \tag{3.2}$$

where $Q$ is the log quasi-likelihood function defined as $Q(\mu_i; y_i) = \int_{y_i}^{\mu_i} \omega_i (y_i - u)/v(u)\, du$. Note that here the nonparametric estimates are based on kernel smoothing methods instead of the spline methods discussed below. As Hardle et al. [13] pointed out, the usual likelihood ratio statistic $\ell(\hat f, \hat\delta) - \ell(\tilde f, \tilde\delta)$, where $\ell(f, \delta) = \sum_{i=1}^{m} Q(\mu_i; y_i)$, is not appropriate in this case as $\delta$ and $f(t)$ are estimated from two different likelihood functions. Under the null hypothesis, Hardle et al. [13] showed that the new R statistic has an asymptotic normal distribution, although this approximation typically does not work well in practice. Hence Hardle et al. [13] proposed several sophisticated bootstrap-based approaches to obtain more accurate critical values for the R tests.

Sperlich and Lombardia [21] extended the above R statistic to test $H_0: f(t) = m(t; \gamma)$ for a special SAMM with a random intercept only (i.e., $z_{ij} = 1$). The test statistic they proposed takes the following form:

$$R_{1w} = \sum_{i=1}^{m}\sum_{j=1}^{n_i} H\{\hat f(t_{ij}), \hat\delta\}\,\{\hat f(t_{ij}) - \tilde f(t_{ij}) + s_{ij}^T(\hat\delta - \tilde\delta)\}^2\, \pi(t_{ij}), \tag{3.3}$$

where π(.) is a weight function which could be chosen empirically and

H{f(tij),δ}=fl(yij;f,δ)2,

with $l(y_{ij}; f, \delta) = \log f(y_{ij} \mid t, s, f, \delta)$, the log density of $y_{ij}$. The $R_{1w}$ statistic is based on a "direct comparison" between estimates from the nonparametric alternative and estimates from the null parametric model. Furthermore, Sperlich and Lombardia [21] showed that the asymptotic normality theory of Hardle et al. [13] carries over to the test statistic $R_{1w}$. However, the asymptotic approximations often depart from the actual finite-sample distributions of the test statistics, which can lead to poor estimates of the critical values. Therefore, a number of bootstrap procedures were suggested to approximate the null distribution of the test statistic $R_{1w}$.

Constructing the R test statistic and its extension $R_{1w}$ for SAMMs involves estimating both the null and the alternative models. Estimation of the null model may be relatively straightforward; estimation under the alternative, however, can be computationally intensive and sometimes challenging. The bootstrap procedure used to calculate the null distribution of the test statistics also requires significant computation time, which may limit the application scope of this testing approach.

3.2. Likelihood ratio tests

For testing a parametric versus nonparametric covariate effect, the likelihood ratio test (LRT) is a natural choice. The LRT has been popular in situations where we need to compare two nested models. However, extending the LRT to testing the adequacy of a parametric covariate effect is not straightforward. A considerable amount of work has been done in constructing likelihood ratio based test statistics for comparing parametric versus nonparametric covariate effects. Depending on how the nonparametric alternatives were specified and what types of smoothing techniques were used, a number of versions of likelihood ratio based testing procedures have been proposed. In this section, we review the LRTs based on the mixed model representation of a nonparametric function estimated using a (penalized) smoothing spline.

Crainiceanu and Ruppert [7] considered the exact LRT and the restricted likelihood ratio test (RLRT) for testing whether the nonparametric function is a polynomial of a certain degree in the following partially linear model, which is a special case of SAMMs (2.2) and generalized partially linear models (3.1),

$$y_i = s_i^T\delta + f(t_i) + \epsilon_i, \tag{3.4}$$

where $\delta$ and $f(t)$ have the same definitions as before, and the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ and are assumed to be independent of $s_i$ and $t_i$. The nonparametric function $f(t)$ can be approximated through a penalized smoothing spline by the following spline function

$$f(t) = \gamma_0 + \gamma_1 t + \cdots + \gamma_d t^d + \sum_{k=1}^{K} a_k (t - \xi_k)_+^d, \tag{3.5}$$

where $K$ is a non-negative integer, $\gamma = (\gamma_0, \ldots, \gamma_d)^T$ and $a = (a_1, \ldots, a_K)^T$ are two sets of parameters, $(t)_+^d = t^d$ for $t > 0$ and zero otherwise, $\xi_1 < \cdots < \xi_K$ are fixed knots, and $\xi_k$ could be defined as the $k/(K+1)$th sample quantile of the $t_i$. In order for (3.5) to be a good approximation, $K$ is usually chosen to be large (such as 20), in which case it is not desirable to estimate $\gamma$ and $a$ directly. A penalized spline estimate of $f(t)$ is obtained by minimizing the following penalized least squares criterion

$$\sum_{i=1}^{m} \{y_i - f(t_i) - s_i^T\delta\}^2 + \frac{1}{\lambda}\, a^T \Sigma^{-1} a, \tag{3.6}$$

where $\lambda$ is the smoothing parameter and $\Sigma$ is a pre-specified roughness penalty matrix, usually taken to be the identity matrix $\Sigma = I_{K \times K}$.

Let $A$ be the $m \times (d+1)$ matrix with $i$th row $A_i = (1, t_i, \ldots, t_i^d)$ and $B$ be the $m \times K$ matrix with $i$th row $B_i = [(t_i - \xi_1)_+^d, \ldots, (t_i - \xi_K)_+^d]$. The penalized least squares criterion (3.6) suggests that $f(t)$ has a mixed effects representation $f = A\gamma + Ba$, where $f = \{f(t_1), f(t_2), \ldots, f(t_m)\}^T$, $\gamma$ is considered as fixed effects and $a$ is regarded as random effects having the distribution $a \sim N(0, \sigma_a^2 I_{K \times K})$ (taking $\Sigma = I_{K \times K}$) with $\sigma_a^2 = \lambda\sigma^2$. Denote $\beta = (\delta^T, \gamma^T)^T$ and $X = [S \mid A]$, where $S$ is the $m \times p$ matrix with $i$th row $s_i^T$. Then the original partially linear model has the equivalent linear mixed model representation

$$Y = X\beta + Ba + \epsilon. \tag{3.7}$$
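As an illustration of the construction above, the following is a minimal sketch (not the authors' code) that builds the matrices $A$ and $B$ of the truncated power basis and minimizes (3.6) for a fixed smoothing parameter with $\Sigma = I$. The function names `truncated_power_design` and `penalized_fit` are hypothetical, and the knots are placed at the $k/(K+1)$ sample quantiles as suggested in the text.

```python
import numpy as np

def truncated_power_design(t, d, K):
    """Return (A, B, knots) for the degree-d truncated power basis in (3.5)."""
    t = np.asarray(t, dtype=float)
    # Knots at the k/(K+1) sample quantiles of t, k = 1, ..., K.
    knots = np.quantile(t, np.arange(1, K + 1) / (K + 1))
    # Columns 1, t, ..., t^d.
    A = np.vander(t, d + 1, increasing=True)
    # (t - xi_k)_+^d: equals (t - xi_k)^d when t > xi_k and zero otherwise.
    diff = t[:, None] - knots[None, :]
    B = np.where(diff > 0, np.abs(diff) ** d, 0.0)
    return A, B, knots

def penalized_fit(y, S, A, B, lam):
    """Minimise criterion (3.6) with Sigma = I for a fixed lam > 0.

    Returns (beta_hat, a_hat) with beta = (delta^T, gamma^T)^T.
    """
    X = np.hstack([S, A])              # fixed-effects design X = [S | A]
    C = np.hstack([X, B])
    n_fixed = X.shape[1]
    # Ridge penalty 1/lam on the spline coefficients a only.
    penalty = np.zeros(C.shape[1])
    penalty[n_fixed:] = 1.0 / lam
    coef = np.linalg.solve(C.T @ C + np.diag(penalty), C.T @ y)
    return coef[:n_fixed], coef[n_fixed:]
```

Because only the coefficients $a$ are penalized, a small $\lambda$ shrinks the fit towards the degree-$d$ polynomial, which is the sense in which $\lambda = 0$ (equivalently $\sigma_a^2 = 0$) corresponds to the null model.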

It can be clearly seen from the penalized spline expression (3.5) that $f(t)$ is a polynomial of degree $d - h$ ($h = 0, 1, \ldots, d$) if $\gamma_{d-h+1} = \cdots = \gamma_d = 0$ and $a_1 = \cdots = a_K = 0$, which is equivalent to $\gamma_{d-h+1} = \cdots = \gamma_d = 0$ and $\sigma_a^2 = 0$ (or $\lambda = 0$) using the linear mixed model representation. Therefore, testing whether the covariate effect of $t$ is a $(d-h)$-degree polynomial is equivalent to testing $H_0: \gamma_{d-h+1} = \cdots = \gamma_d = 0,\ \sigma_a^2 = 0\ (\lambda = 0)$ versus $H_a: \gamma_{d-h+1} \neq 0$ or $\cdots$ or $\gamma_d \neq 0$ or $\sigma_a^2 > 0\ (\lambda > 0)$ if the mixed model representation of a penalized smoothing spline is used. One approach proposed by Crainiceanu and Ruppert [7] for testing this hypothesis is the LRT using the log-likelihood of $\beta$, $\sigma_a^2$ and $\sigma^2$ from the mixed model representation (3.7)

$$\ell(\beta, \sigma_a^2, \sigma^2; Y) = -\frac{1}{2}\log|V| - \frac{1}{2}(Y - X\beta)^T V^{-1}(Y - X\beta),$$

where $V = \sigma_a^2 BB^T + \sigma^2 I_{m \times m}$ is the marginal variance of $Y$ under model (3.7). In the case where $h = 0$, the testing problem becomes a variance component test, i.e. $H_0: \sigma_a^2 = 0$ versus $H_a: \sigma_a^2 > 0$. Besides the LRT, an alternative choice for testing this particular hypothesis is to use the following REML function

$$\ell_R(\sigma_a^2, \sigma^2; Y) = -\frac{1}{2}\log|V| - \frac{1}{2}\log|X^T V^{-1} X| - \frac{1}{2}(Y - X\hat\beta)^T V^{-1}(Y - X\hat\beta),$$

where $\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$. The test based on this function is abbreviated as RLRT.
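The REML function above can be evaluated directly for given variance components. The sketch below is an illustrative assumption rather than code from the reviewed papers: `restricted_loglik` is a hypothetical name, additive constants are dropped exactly as in the displayed formula, and no attempt is made at numerical efficiency for large $m$.

```python
import numpy as np

def restricted_loglik(Y, X, B, sigma2_a, sigma2):
    """Evaluate the REML function for model (3.7) up to an additive constant.

    Returns (value, beta_hat), where beta_hat is the generalised least
    squares estimate (X^T V^-1 X)^-1 X^T V^-1 Y.
    """
    m = Y.shape[0]
    # Marginal variance V = sigma_a^2 B B^T + sigma^2 I.
    V = sigma2_a * (B @ B.T) + sigma2 * np.eye(m)
    Vinv_X = np.linalg.solve(V, X)
    XtVinvX = X.T @ Vinv_X
    # V is symmetric, so (V^-1 X)^T Y equals X^T V^-1 Y.
    beta_hat = np.linalg.solve(XtVinvX, Vinv_X.T @ Y)
    resid = Y - X @ beta_hat
    quad = resid @ np.linalg.solve(V, resid)
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    return -0.5 * (logdet_V + logdet_XVX + quad), beta_hat
```

Maximizing this function over both variance components, maximizing it again with $\sigma_a^2$ fixed at zero, and doubling the difference gives the usual restricted likelihood ratio statistic for $H_0: \sigma_a^2 = 0$.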

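As pointed out by Crainiceanu and Ruppert [7], under $H_0$ the LRT or RLRT statistic does not asymptotically follow the $0.5\chi_0^2 + 0.5\chi_1^2$ mixture chi-square distribution suggested by Self and Liang [23] and Stram and Lee [25]. Instead, the LRT or RLRT statistic asymptotically follows a mixture of $\chi_0^2$ and $\chi_1^2$ with a much heavier mass on $\chi_0^2$. A simple and fast algorithm was also proposed to sample from the exact null distribution of the LRT or RLRT statistic, which is summarized as follows [7]:

  • Step 1: Generate a grid of $\lambda$ values where $0 = \lambda_1 < \lambda_2 < \cdots < \lambda_n$.

  • Step 2: Simulate $K$ independent random variables $w_1^2, \ldots, w_K^2$ from the $\chi_1^2$ distribution. Let $S_K = \sum_{s=1}^{K} w_s^2$.

  • Step 3: Independently simulate $X_{m,K,d} = \sum_{s=K+1}^{m-p-d-1} w_s^2$ with $w_s^2 \sim \chi_1^2$.

  • Step 4: When $h \neq 0$, independently simulate $X_h = \sum_{s=1}^{h} u_s^2$ with $u_s^2 \sim \chi_1^2$.

  • Step 5: For every grid point $\lambda_i$ calculate
    $$N_m(\lambda_i) = \sum_{s=1}^{K} \frac{\lambda_i \mu_{s,m}}{1 + \lambda_i \mu_{s,m}}\, w_s^2,$$
    $$D_m(\lambda_i) = \sum_{s=1}^{K} \frac{w_s^2}{1 + \lambda_i \mu_{s,m}} + X_{m,K,d}.$$
  • Step 6: Obtain $\lambda_{\max}$ that maximizes $f_m(\lambda_i)$ over $\lambda_1, \ldots, \lambda_n$, where
    $$f_m(\lambda) = m \log\left\{1 + \frac{N_m(\lambda)}{D_m(\lambda)}\right\} - \sum_{s=1}^{K} \log(1 + \lambda \zeta_{s,m}).$$
  • Step 7: Compute the LRT statistic $LRT_m = f_m(\lambda_{\max}) + m \log\left(1 + \frac{X_h}{S_K + X_{m,K,d}}\right)$, or $LRT_m = f_m(\lambda_{\max})$ if $h = 0$. For the case of RLRT, compute
    $$RLRT_m = \sup_{\lambda \geq 0}\left[(m - p - d - 1) \log\left\{1 + \frac{N_m(\lambda)}{D_m(\lambda)}\right\} - \sum_{s=1}^{K} \log(1 + \lambda \mu_{s,m})\right].$$
  • Step 8: Repeat steps 2–7.

Here $\mu_{s,m}$ and $\zeta_{s,m}$ are defined to be the $K$ eigenvalues of the $K \times K$ matrices $Z^T P_0 Z$ and $Z^T Z$ respectively, where $P_0 = I_m - X(X^T X)^{-1} X^T$ and $Z$ denotes the random effects design matrix of the spline coefficients (the matrix $B$ in (3.7)).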
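The steps above can be sketched in a few lines of NumPy for the RLRT case. This is an illustration under stated assumptions, not the implementation of Crainiceanu and Ruppert [7]: the function name `simulate_null_rlrt`, the default $\lambda$ grid (zero followed by log-spaced points) and the number of replicates are all hypothetical choices, and the supremum over $\lambda \geq 0$ is approximated by a maximum over the grid. The argument `B` plays the role of the random effects design matrix, and `X` is assumed to have $p + d + 1$ columns.

```python
import numpy as np

def simulate_null_rlrt(X, B, n_sim=10000, lam_grid=None, seed=0):
    """Draw from the null distribution of the RLRT statistic (grid approximation)."""
    rng = np.random.default_rng(seed)
    m, n_fixed = X.shape                      # n_fixed = p + d + 1
    K = B.shape[1]
    if lam_grid is None:
        # Assumed grid: lambda_1 = 0, then log-spaced positive values (Step 1).
        lam_grid = np.concatenate([[0.0], np.exp(np.linspace(-12, 12, 200))])
    # P0 = I - X (X^T X)^-1 X^T and the eigenvalues mu_{s,m} of B^T P0 B.
    P0 = np.eye(m) - X @ np.linalg.solve(X.T @ X, X.T)
    mu = np.clip(np.linalg.eigvalsh(B.T @ P0 @ B), 0.0, None)
    lam_mu = lam_grid[:, None] * mu[None, :]  # shape (grid size, K)
    log_term = np.log1p(lam_mu).sum(axis=1)
    out = np.empty(n_sim)
    for r in range(n_sim):
        w2 = rng.chisquare(1, size=K)                  # Step 2
        x_rest = rng.chisquare(m - n_fixed - K)        # Step 3: sum of chi2_1 draws
        N = (lam_mu / (1.0 + lam_mu) * w2).sum(axis=1)       # Step 5
        D = (w2 / (1.0 + lam_mu)).sum(axis=1) + x_rest
        # Step 7 (RLRT): maximise over the grid instead of all lambda >= 0.
        out[r] = np.max((m - n_fixed) * np.log1p(N / D) - log_term)
    return out
```

An approximate critical value at level $\alpha$ is then the empirical $(1-\alpha)$ quantile of the returned draws, for example `np.quantile(draws, 0.95)`. Because the grid contains $\lambda = 0$, every draw is non-negative, and the proportion of exact zeros estimates the point mass at zero discussed in the text.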

In a recent (unpublished) paper, Claeskens et al. [5] adapted the idea of Crainiceanu and Ruppert [7] and explored the advantages of wavelets for estimating nonparametric smooth functions over the use of penalized splines in partially linear models for independent data. Two asymptotic distribution theorems were developed for the test statistics proposed therein, and simulation results showed that the wavelet-based test has better performance than the penalized spline based test in some situations. They also extended the wavelet based test to the cases of simultaneously testing several polynomial covariate effects.

For testing generalized linear models with a single covariate $t$ for independent discrete data, Liu et al. [20] proposed three methods which are "based on the connection between smoothing spline models and Bayesian models", assuming $f(t)$ in model (3.1) to have the following Bayesian expression

$$f(t) = \gamma_0 + \gamma_1 t + \cdots + \gamma_d t^d + \tau^{1/2} W(t),$$

where $\gamma_0, \gamma_1, \ldots, \gamma_d$ have flat priors, and $W(t)$ is the $d$-order Wiener process. Under this Bayesian model, they extended the generalized maximum likelihood ratio (GML) test of Wahba [27] to test the adequacy of a generalized linear model, which is equivalent to testing $H_0: \tau = 0$. The test statistic of the GML test proposed by Liu et al. [20] is constructed as

$$t_{GML} = \frac{\sup_{\phi} L(0, \phi \mid y)}{\sup_{\tau, \phi} L(\tau, \phi \mid y)}, \tag{3.8}$$

where $L(\tau, \phi \mid y)$ denotes the marginal density of $y$ under this Bayesian model. Under the mixed model representation of a smoothing spline estimate of a nonparametric function, $t_{GML}$ is essentially an LRT statistic.

One difficulty with the GML test is that there is no closed form expression for L(τ, ϕ|y), and the test statistic can only be approximated numerically [20]. Secondly, it is nearly impossible to analytically derive the null distribution of the test statistic as its distribution depends on some unknown parameters. To overcome this difficulty, Liu et al. [20] suggested two approaches to approximating the exact null distribution of the test statistic. One is the usual bootstrap procedure which is computationally intensive. The other approach is the so called empirical approximation method, which was considered superior to the bootstrap-based method.

It should be noted that the testing procedures based on the likelihood ratio are all proposed for models for independent data. Although conceptually they can be extended to SAMMs for longitudinal/clustered data, there are at least two major obstacles. First the calculation of the likelihood is even more complicated under the alternative using the mixed model representation of a (penalized) smoothing spline estimate of a nonparametric function. Secondly, it may not be easy to extend the algorithm of Crainiceanu and Ruppert [7], originally proposed for simulating the exact distribution of the LRT in a partially linear model, to SAMMs or even LMMs for longitudinal/clustered data. More future research is needed in this area.

3.3. Score tests

In generalized linear models, score tests have been used for testing the overdispersion and heterogeneity of outcomes [10; 24]. Lin [19] extended score tests to GLMMs, in which a global score test as well as individual score tests were proposed to test the null hypotheses of all zero random-effect variance components and individual zero random-effect variance components respectively.

Zhang and Lin [30] considered the problem of testing the nonparametric function f(t) in model (2.2) being a d-order polynomial. They first estimated f(t) by a d-order smoothing spline and expressed f with a mixed effects representation, similar to the one in Section 3.2 for a penalized smoothing spline

f=Tγ+Σa, (3.9)

where f = f(t0), t0 is the vector formed by distinct {tij}'s, T is a matrix formed by zero to the dth polynomials of t0 with corresponding coefficients γ, Σ is a smoothing matrix, and a ~ N(0, τI). Note that this mixed effects representation is basically the same as the Bayesian expression presented in Section 3.2.

Denote by N the incidence matrix mapping t0 to {tij}'s, and define X = (NT, S), B = NΣ. Then under the mixed effects representation (3.9), SAMM (2.2) becomes the following GLMM

g(μ)=Xβ+Ba+Zb, (3.10)

where β = (γT, δT)T are the new fixed effects and (a, b) are the new random effects.

As described in the earlier sections, testing f(t) in SAMM (2.2) being a d-order polynomial is equivalent to testing H0 : τ = 0 in the induced GLMM (3.10). Zhang and Lin [30] adapted the idea of Lin's [19] variance component score tests to test H0 : τ = 0. However, they pointed out that the score tests proposed by Lin [19] for testing zero variance components in GLMMs cannot be used directly for testing H0 : τ = 0. They proposed a scaled chi-squared approximation to the test statistic.

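Denote by $\psi = (\theta^T, \phi)$ the nuisance parameter vector, and by $\ell_M(\tau, \psi)$ the marginal log-likelihood function of $\tau$ and $\psi$ (obtained by integrating out the random effects $a$, $b$ and the fixed effects $\beta$). Then under the induced GLMM (3.10), the score $U_\tau$ for testing $H_0: \tau = 0$ takes the following form

$$U_\tau(\hat\psi) = \left.\frac{\partial \ell_M(\tau, \psi; y)}{\partial \tau}\right|_{\tau = 0, \hat\psi} \approx \frac{1}{2}\left\{(Y - X\beta)^T V^{-1} N\Sigma N^T V^{-1} (Y - X\beta) - \mathrm{tr}(P N \Sigma N^T)\right\}\Big|_{\hat\beta, \hat\psi}, \tag{3.11}$$

where $\hat\beta$ is the MLE of $\beta$ and $\hat\psi$ is the REML-type estimate of $\psi$ under the following null GLMM (3.12), and $Y = X\beta + Zb + \Delta(y - \mu)$ is the working vector from the null GLMM

$$g(\mu) = X\beta + Zb, \tag{3.12}$$

where $P = V^{-1} - V^{-1}X(X^T V^{-1} X)^{-1} X^T V^{-1}$, $V = W^{-1} + ZGZ^T$, $G = \mathrm{diag}\{D, \ldots, D\}$, $\Delta = \mathrm{diag}\{g'(\mu_{ij})\}$, $W = \mathrm{diag}\{w_{ij}\}$ and $w_{ij} = \{\phi\,\omega_{ij}^{-1} v(\mu_{ij}) [g'(\mu_{ij})]^2\}^{-1}$. Note that model (3.12) is the matrix representation of the original GLMM (2.1).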

Because of the special structure of Σ, Zhang and Lin [30] found that the score Uτ (ψ̂) does not follow an asymptotic normal distribution. Write Uτ (ψ) as Uτ (ψ) = Uτ (y; ψ) − e(ψ), where Uτ (y; ψ) and e(ψ) denote the first and the second terms of the above score, and define ψ0 as the true value of ψ under H0 : τ = 0. Zhang and Lin [30] showed that the null distribution of Uτ (y; ψ0) is approximately equal to the one of weighted chi-squared random variables and can be well approximated by a scaled chi-squared distribution. Since the expectation of Uτ (ψ) is an increasing function of τ, larger values of Uτ (ψ̂) give more evidence against H0, which indicates that the score test should be one-sided.
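To make the scaled chi-squared approximation concrete, consider the simplest setting: Gaussian responses with identity link and known variance matrix $V$. There the first term of the score equals $\frac{1}{2} Y^T P N\Sigma N^T P Y$, a quadratic form in normal variables with mean $e = \frac{1}{2}\mathrm{tr}(P N\Sigma N^T)$ and variance $\frac{1}{2}\mathrm{tr}\{(P N\Sigma N^T)^2\}$ under $H_0$. Matching these two moments to $\kappa\chi_\nu^2$ gives $\kappa$ and $\nu$. The sketch below implements only this moment matching; it is an assumption-laden illustration that ignores any adjustment for estimating the nuisance parameters $\psi$, and the names `projection_matrix` and `scaled_chisq_approx` are hypothetical.

```python
import numpy as np

def projection_matrix(X, V):
    """P = V^-1 - V^-1 X (X^T V^-1 X)^-1 X^T V^-1."""
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    return Vinv - VinvX @ np.linalg.solve(X.T @ VinvX, VinvX.T)

def scaled_chisq_approx(P, K_mat):
    """Match the first two null moments of 0.5 * Y^T P K P Y to kappa * chi2_nu.

    K_mat stands for N Sigma N^T. Assumes Gaussian Y with known variance V,
    so that the mean is 0.5 tr(PK) and the variance is 0.5 tr(PKPK).
    """
    PK = P @ K_mat
    e = 0.5 * np.trace(PK)              # null mean
    i_tt = 0.5 * np.trace(PK @ PK)      # null variance
    kappa = i_tt / (2.0 * e)            # scale: variance / (2 * mean)
    nu = 2.0 * e ** 2 / i_tt            # degrees of freedom: 2 * mean^2 / variance
    return kappa, nu, e
```

Under this approximation, a one-sided p-value is the upper tail probability of a $\chi_\nu^2$ distribution evaluated at the observed first term of the score divided by $\kappa$.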

Compared with the LRTs, one major advantage of using the score test statistic Uτ (y; ψ̂) is its easy implementation, as it can be calculated directly by fitting a GLMM (under the null hypothesis) rather than a SAMM. In addition, the critical values can be directly approximated from the regular chi-square distribution. Therefore, it is not necessary to derive the distribution of the test statistics under the null hypothesis as often required by the LRTs. Secondly, as SAMMs encompass a broad class of statistical models, the above score test can be applied in many situations, such as independent Gaussian data [6], clustered Gaussian or binary data, etc. For clustered data, the implementation of the LRTs can be very difficult as expensive computation is needed to approximate the null distribution of the test statistics.

The simulation results showed that the score test statistic above performs very well for Gaussian outcomes, less so for binary data due to the poor approximation of the Laplace method in calculating the score statistic, but improves rapidly as the binomial denominator increases [30].

3.4. Residual based tests

Inspired by the idea of residual plots for checking the goodness-of-fit of regression models, recently Pan and Lin [22] introduced a graphical and numerical approach to assess the adequacy of GLMMs. These methods are “based on the cumulative sums of residuals over covariates or predicted values of the response variable” [22] and are the further extensions of the work by Su and Wei [26] and Lin et al. [18].

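Denote by $\mu_{ij}(\beta, \theta, \phi) = E(y_{ij})$ the marginal mean of $y_{ij}$, and define the residual $e_{ij}$ as $e_{ij} = y_{ij} - \hat\mu_{ij}$, where $\hat\mu_{ij} = \mu_{ij}(\hat\beta, \hat\theta, \hat\phi)$, and $\hat\beta$, $\hat\theta$, $\hat\phi$ are the estimates of the corresponding parameters under the original GLMM (2.1), or model (3.12) in matrix notation. Pan and Lin [22] then considered the following two classes of stochastic processes

$$W(x) = m^{-1/2} \sum_{i=1}^{m} \sum_{j=1}^{n_i} I(x_{ij} \leq x)\, e_{ij},$$
$$W_g(r) = m^{-1/2} \sum_{i=1}^{m} \sum_{j=1}^{n_i} I(\hat\mu_{ij} \leq r)\, e_{ij},$$

where $x = (x_1, \ldots, x_p)^T$, $r \in \mathbb{R}$, $I(x_{ij} \leq x) = I(x_{1ij} \leq x_1, \ldots, x_{pij} \leq x_p)$, and $x_{kij}$ is the $k$th component of $x_{ij}$.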

Under the assumed GLMM, these stochastic processes converge in distribution to zero-mean Gaussian processes, which can be simulated through Monte Carlo techniques. Each observed cumulative-sum process W(x) or Wg(r) can then be compared, both visually and analytically, to a certain zero-mean Gaussian process. If the assumed GLMM is a reasonable model for the given data, the cumulative-sum processes would behave like white noise. Therefore, any abnormal departure of W(x) or Wg(r) from the zero-mean Gaussian processes would be an indication of model mis-specification. The main advantage of this testing approach is that there is no need to specify the alternatives, therefore it can be used to test whether or not f(t) in SAMM (2.2) can be adequately represented by a polynomial function. Nevertheless this test may be less powerful compared to the other procedures specifically designed for testing f(t).
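For a single scalar covariate, the observed cumulative-sum process $W(x)$ is simply a scaled running sum of residuals ordered by the covariate. The following minimal sketch computes only this observed process; it does not implement the Monte Carlo reference processes of Pan and Lin [22], and the function name `cumsum_residual_process` is hypothetical.

```python
import numpy as np

def cumsum_residual_process(x, resid, m):
    """Observed W(x) = m^(-1/2) * sum of residuals with covariate value <= x.

    x     : covariate values for all observations (flattened over i and j)
    resid : residuals e_ij in the same order
    m     : number of subjects or clusters
    Returns the distinct sorted covariate values and W at those values.
    """
    x = np.asarray(x, dtype=float)
    resid = np.asarray(resid, dtype=float)
    order = np.argsort(x, kind="stable")
    xs = x[order]
    W = np.cumsum(resid[order]) / np.sqrt(m)
    # With tied covariate values, keep the running sum at the last tie.
    last = np.r_[xs[1:] != xs[:-1], True]
    return xs[last], W[last]
```

A natural numerical summary is the supremum statistic `np.max(np.abs(W))`, which would then be compared with the same quantity computed on simulated realizations of the limiting zero-mean Gaussian process.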

Another residual-based test is the so-called "adaptive Neyman test" introduced by Fan and Huang [11]. Although the test statistic is constructed in a completely different way, the basic idea is similar to the one described above, i.e. if a parametric model fits the data well, the residuals should fluctuate around 0. They focused on the classical nonparametric model $y = f(x) + \epsilon$ with $\epsilon \sim N(0, \sigma^2)$. Under the null hypothesis $f(\cdot) = m(\cdot, \gamma)$ for some $\gamma$, where $m(\cdot, \gamma)$ belongs to a given parametric family, the resulting residuals are $\hat\epsilon_i = y_i - m(x_i, \hat\gamma)$, $i = 1, \ldots, n$, where $\hat\gamma$ is the estimate of $\gamma$ under the assumed model. Denote $\hat\epsilon = (\hat\epsilon_1, \ldots, \hat\epsilon_n)^T$; then the components of $\hat\epsilon$ are approximately independent and normally distributed with mean vector $\eta = (\eta_1, \ldots, \eta_n)^T$, where $\eta_i = f(x_i) - m(x_i, \gamma_0)$ and $\gamma_0$ is the limit to which $\hat\gamma$ converges. Thus, the testing problem can be formulated as $H_0: \eta = 0$ versus $H_a: \eta \neq 0$. Fan and Huang [11] applied the adaptive Neyman test to this testing problem. The adaptive Neyman test statistic is constructed from the Fourier transform of the residuals $\hat\epsilon$, with its exact null distribution generated through simulation.

As mentioned earlier, the adaptive Neyman test has only been studied in partially linear models. So, extending it to LMMs or GLMMs could potentially be a future research direction.

4. Comparison between the exact likelihood ratio and the score tests

In this paper, we have provided an overview of the four types of testing approaches. Among them, likelihood ratio and score tests have been widely used in a variety of hypothesis testing problems. To our knowledge, however, no comparison between these two tests has been made for the current situation, i.e. testing a parametric covariate effect against a nonparametric covariate effect. Here, we conduct a small simulation study to evaluate and compare the performance of these two popular testing procedures. For illustration purposes, we consider testing the linearity of covariate effects under the partially linear model framework, i.e. whether $f(t)$ is a linear function of $t$ in model (3.4). Using the penalized spline, we formulate the exact LRT (named LRT1), the RLRT and the score test as variance component tests based on the mixed model representation (3.7) as discussed above. In addition, for testing the same null hypothesis, we also formulate the exact LRT in a different way (named LRT2) by modeling the alternative through a quadratic spline. In the latter case, we are testing whether $f(t)$ is a $(d-h)$-degree polynomial of $t$ with $d = 2$ and $h = 1$.

Since no exact LRT or RLRT has been developed for mixed models for longitudinal/clustered data, we only consider partially linear models for independent data even though Zhang and Lin's [30] procedure is applicable to more complicated models.

Data in this simulation are generated from the following partially linear model

$$y_i = s_{i1}\beta_1 + s_{i2}\beta_2 + f(t_i) + \epsilon_i, \quad i = 1, 2, \ldots, m,$$

where $s_{i1}$ is generated from $N(0, 0.3)$, $s_{i2}$ is generated from $N(0, 0.4)$, the $t_i$ are equally spaced distinct points in $[0, 1]$, and $\epsilon_i \sim N(0, \sigma^2)$. The true values of $\beta_1$ and $\beta_2$ are set to 1.3 and 0.45 respectively. The values of $\sigma$ are 0.25 and 0.5, and the sample size $m$ is taken to be 50 and 100. A total of five different functions $f(t)$ are considered, i.e., $f_c(t) = (0.25c)\, t \exp(2 - 2t) - t + 0.5$ for $c = 0, 1, 2, 3, 4$ [30]. Note that when $c = 0$, $f_c(t)$ is a linear function of $t$, and $f_c(t)$ deviates further from linearity with increasing $c$. We apply the exact LRT1, LRT2, RLRT and the score testing procedures to each simulated data set. The simulation results are based on 1000 Monte Carlo simulation runs.
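The data-generating mechanism described above can be sketched as follows. The function names `f_c` and `simulate_dataset` are hypothetical, and one detail is an assumption: the second arguments of $N(0, 0.3)$ and $N(0, 0.4)$ are read here as variances, which the text does not state explicitly.

```python
import numpy as np

def f_c(t, c):
    """f_c(t) = (0.25 c) t exp(2 - 2t) - t + 0.5; linear in t when c = 0."""
    t = np.asarray(t, dtype=float)
    return 0.25 * c * t * np.exp(2.0 - 2.0 * t) - t + 0.5

def simulate_dataset(m, sigma, c, rng, beta=(1.3, 0.45)):
    """Generate (y, S, t) from the partially linear simulation model."""
    t = np.linspace(0.0, 1.0, m)              # equally spaced points in [0, 1]
    # Assumption: 0.3 and 0.4 are variances, so the scale is their square root.
    s1 = rng.normal(0.0, np.sqrt(0.3), m)
    s2 = rng.normal(0.0, np.sqrt(0.4), m)
    eps = rng.normal(0.0, sigma, m)
    y = beta[0] * s1 + beta[1] * s2 + f_c(t, c) + eps
    return y, np.column_stack([s1, s2]), t
```

For example, `simulate_dataset(50, 0.25, 2, np.random.default_rng(0))` produces one data set for the setting $m = 50$, $\sigma = 0.25$, $c = 2$; repeating the call 1000 times with the same generator mirrors the number of Monte Carlo runs used in the study.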

For testing the null hypothesis that f(t) is a linear function of t, the size and power of each testing procedure are calculated by setting c = 0 and c ≠ 0 respectively. When a penalized spline is used to estimate f(t) as in the LRT or RLRT, the number of knots for the penalized spline is set to be 20. For the score testing procedure, the smoothing matrix Σ is from a natural smoothing spline.

The simulation results are presented in Table 1 ($m = 50$) and Table 2 ($m = 100$), where the nominal levels are set to 0.05 and 0.1. Regarding the empirical size, our simulation results show that the sizes of the exact LRT2, the RLRT and the score test are all close to the nominal levels. The empirical size of the LRT1, however, stays unchanged when the nominal level increases from 0.05 to 0.1. Overall, the increased sample size brings the empirical sizes of all these tests closer to the nominal levels, whereas the error variance seems to have little influence on them. With respect to power, all tests show decreased power as the error variance increases. As expected, the increased sample size improves the overall power. Note that the powers of the LRT1 are also unchanged as the nominal level increases, which implies that the simulated critical values for the LRT1 may not be accurate with a moderate number of Monte Carlo simulation runs. In general, our simulation indicates that the LRT2, RLRT and score test are more powerful than the LRT1, with the score test slightly outperforming the exact LRT2 and RLRT.

Table 1.

Empirical sizes and powers of the four tests in testing the linearity of covariate effects in model (3.4) where m = 50

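| Nominal level | σ | Test | Size (c = 0) | Power (c = 1) | Power (c = 2) | Power (c = 3) | Power (c = 4) |
|---|---|---|---|---|---|---|---|
| 0.05 | 0.25 | LRT1 | 0.032 | 0.152 | 0.696 | 0.991 | 1.000 |
| | | LRT2 | 0.049 | 0.419 | 0.935 | 0.999 | 1.000 |
| | | RLRT | 0.067 | 0.419 | 0.927 | 1.000 | 1.000 |
| | | Score | 0.066 | 0.443 | 0.948 | 1.000 | 1.000 |
| | 0.5 | LRT1 | 0.066 | 0.094 | 0.224 | 0.473 | 0.782 |
| | | LRT2 | 0.047 | 0.135 | 0.412 | 0.737 | 0.923 |
| | | RLRT | 0.050 | 0.123 | 0.404 | 0.720 | 0.915 |
| | | Score | 0.060 | 0.158 | 0.448 | 0.762 | 0.936 |
| 0.1 | 0.25 | LRT1 | 0.032 | 0.152 | 0.696 | 0.991 | 1.000 |
| | | LRT2 | 0.115 | 0.548 | 0.962 | 0.999 | 1.000 |
| | | RLRT | 0.138 | 0.545 | 0.970 | 0.999 | 1.000 |
| | | Score | 0.124 | 0.560 | 0.972 | 1.000 | 1.000 |
| | 0.5 | LRT1 | 0.066 | 0.094 | 0.224 | 0.473 | 0.782 |
| | | LRT2 | 0.093 | 0.230 | 0.545 | 0.838 | 0.961 |
| | | RLRT | 0.103 | 0.213 | 0.531 | 0.832 | 0.960 |
| | | Score | 0.104 | 0.242 | 0.565 | 0.859 | 0.970 |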

Table 2.

Empirical sizes and powers of the four tests in testing the linearity of covariate effects in model (3.4) where m = 100

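| Nominal level | σ | Test | Size (c = 0) | Power (c = 1) | Power (c = 2) | Power (c = 3) | Power (c = 4) |
|---|---|---|---|---|---|---|---|
| 0.05 | 0.25 | LRT1 | 0.044 | 0.217 | 0.950 | 1.000 | 1.000 |
| 0.05 | 0.25 | LRT2 | 0.053 | 0.675 | 0.994 | 1.000 | 1.000 |
| 0.05 | 0.25 | RLRT | 0.052 | 0.661 | 0.995 | 1.000 | 1.000 |
| 0.05 | 0.25 | Score | 0.052 | 0.691 | 0.997 | 1.000 | 1.000 |
| 0.05 | 0.5 | LRT1 | 0.068 | 0.115 | 0.364 | 0.810 | 0.988 |
| 0.05 | 0.5 | LRT2 | 0.059 | 0.240 | 0.681 | 0.956 | 0.999 |
| 0.05 | 0.5 | RLRT | 0.054 | 0.221 | 0.670 | 0.959 | 0.999 |
| 0.05 | 0.5 | Score | 0.062 | 0.249 | 0.697 | 0.963 | 0.999 |
| 0.1 | 0.25 | LRT1 | 0.044 | 0.217 | 0.950 | 1.000 | 1.000 |
| 0.1 | 0.25 | LRT2 | 0.109 | 0.778 | 0.998 | 1.000 | 1.000 |
| 0.1 | 0.25 | RLRT | 0.102 | 0.762 | 0.999 | 1.000 | 1.000 |
| 0.1 | 0.25 | Score | 0.107 | 0.779 | 1.000 | 1.000 | 1.000 |
| 0.1 | 0.5 | LRT1 | 0.068 | 0.115 | 0.364 | 0.810 | 0.988 |
| 0.1 | 0.5 | LRT2 | 0.103 | 0.353 | 0.781 | 0.975 | 1.000 |
| 0.1 | 0.5 | RLRT | 0.112 | 0.336 | 0.777 | 0.982 | 1.000 |
| 0.1 | 0.5 | Score | 0.111 | 0.363 | 0.798 | 0.983 | 1.000 |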

Compared with the likelihood-ratio-based tests, the score test has at least two main advantages. First, the exact LRT (LRT1 and LRT2) and RLRT are computationally much more intensive than the score test, because deriving the null distributions of the LRT and RLRT statistics requires simulation in each run. In this simulation, the computing time of the exact LRT and RLRT is about 50 times that of the score test. Second, the exact LRT and RLRT have not yet been developed for more complicated models such as LMMs and GLMMs, whereas the score testing procedure is flexible and can be adapted to many modeling situations. For simplicity, only the linearity test is considered in the current simulation; in practice, however, one might be interested in testing higher-order polynomial covariate effects (i.e. d > 1), which can be carried out easily by using a different d. Overall, we consider the score test a better choice than the LRT and RLRT.
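To illustrate why simulation-based critical values are costly, the sketch below computes a critical value as an empirical quantile of simulated null draws. It is an illustration only: the null distribution used here, an equal mixture of a point mass at zero and a chi-square distribution with one degree of freedom, is an assumption chosen for simplicity and is not the exact finite-sample null distribution used by the LRT or RLRT. The names `draw_null_statistic` and `simulated_critical_value` are hypothetical.

```python
import math
import random

def draw_null_statistic(rng):
    """One draw from the ASSUMED null distribution: 0 with probability 1/2,
    otherwise the square of a standard normal (chi-square, 1 df)."""
    if rng.random() < 0.5:
        return 0.0
    z = rng.gauss(0.0, 1.0)
    return z * z

def simulated_critical_value(alpha, n_draws, rng):
    """Empirical (1 - alpha) quantile of n_draws simulated null statistics."""
    draws = sorted(draw_null_statistic(rng) for _ in range(n_draws))
    index = min(n_draws - 1, math.ceil((1.0 - alpha) * n_draws) - 1)
    return draws[index]

rng = random.Random(2008)  # fixed seed so the sketch is reproducible
cv_05 = simulated_critical_value(0.05, 100_000, rng)
cv_10 = simulated_critical_value(0.10, 100_000, rng)
print(round(cv_05, 2), round(cv_10, 2))
```

When the null distribution depends on the design of each data set, a calculation of this kind has to be repeated for every simulated data set, whereas a test with a closed-form reference distribution needs no such inner loop. With too few draws, the empirical quantiles become unstable, which affects both the size and the power of the resulting test.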

5. Summary

We have reviewed the main developments in four types of approaches for testing a parametric covariate effect against a nonparametric covariate effect. A considerable amount of work has been done on the LRTs under linear or generalized linear models. The likelihood-based tests perform very well for independent data in finite-sample situations. However, these test statistics can be difficult to compute in a more complex model, as both the parametric and nonparametric models need to be estimated.

In addition, deriving the null distributions of those test statistics can be challenging. Therefore, it is not straightforward to extend the existing LRTs or RLRTs to LMMs and GLMMs. Compared with the LRTs or RLRTs, the score statistics are easy to compute, usually perform well and are applicable to both LMMs and GLMMs. Further study may be needed to investigate the properties of the score tests for small samples. The R tests are likelihood-ratio-based tests and hence share the advantages and disadvantages of the LRTs. The recently developed residual-based test [22] can be regarded as an omnibus test for detecting model mis-specification and can be used to test the adequacy of a polynomial covariate effect. Since no alternative model needs to be specified, the residual-based test is applicable in many situations, including LMMs and GLMMs. However, it may be less powerful than testing procedures that are specifically designed for testing a particular covariate effect. A comparison of the residual-based tests with the score tests in mixed models could be of future interest.

Acknowledgements

The research of Daowen Zhang is partly supported by an NIH grant R01 CA85848-08. I would like to thank the referee and the managing editor Wendy Martinez for many valuable suggestions that greatly improved the presentation of this paper.

Footnotes

* This paper was accepted by Michael Kosorok, Associate Editor for the IMS.

References

  • 1. Azzalini A, Bowman A. On the use of nonparametric regression for checking linear relationships. Journal of Royal Statistical Society - B. 1993;55:549–557.
  • 2. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
  • 3. Brumback B, Ruppert D, Wand MP. Comment on 'Variable selection and function estimation in additive nonparametric regression using data-based prior' by Shively, Kohn and Wood. Journal of the American Statistical Association. 1999;94:794–797.
  • 4. Cantoni E, Hastie T. Degrees-of-freedom tests for smoothing splines. Biometrika. 2002;89:251–263.
  • 5. Claeskens G, Ding H, Jansen M. Lack-of-fit tests in semiparametric mixed models. 2007. Available on web at www.econ.kuleuven.be/fetew/pdf_publicaties/KBI_0709.pdf.
  • 6. Cox D, Koh E, Wahba G, Yandell B. Testing the (parametric) null model hypothesis in (semiparametric) partial and generalized spline models. Annals of Statistics. 1988;21:903–923.
  • 7. Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of Royal Statistical Society - B. 2004;66:165–185.
  • 8. Crainiceanu CM, Ruppert D. Exact likelihood ratio tests for penalized splines. Biometrika. 2005;92:91–103.
  • 9. Crainiceanu CM, Ruppert D, Vogelsang TJ. Some properties of likelihood ratio tests in linear mixed models (unpublished). 2003.
  • 10. Dean C. Testing for overdispersion in Poisson and binomial regression models. Journal of the American Statistical Association. 1992;87:451–457.
  • 11. Fan JQ, Huang LS. Goodness-of-fit tests for parametric regression models. Journal of the American Statistical Association. 2001;96:640–652.
  • 12. Gu C. Penalized likelihood regression: a Bayesian analysis. Statistica Sinica. 1992;2:255–264.
  • 13. Hardle W, Mammen E, Muller M. Testing parametric versus semiparametric modeling in generalized linear models. Journal of the American Statistical Association. 1998;93:1461–1474.
  • 14. Harville DA. Extension of the Gauss-Markov theorem to include the estimation of random effects. Annals of Statistics. 1976;4:384–395.
  • 15. Hastie T, Tibshirani R. Generalized additive models. Chapman & Hall; New York: 1990.
  • 16. Laird NM, Ware JH. Random effects models for longitudinal data. Biometrics. 1982;38:963–974.
  • 17. Liang H. Checking linearity of non-parametric component in partially linear models with an application in systemic inflammatory response syndrome study. Statistical Methods in Medical Research. 2006;15:273–284. doi: 10.1191/0962280206sm440oa.
  • 18. Lin DY, Wei LJ, Ying Z. Model-checking techniques based on cumulative residuals. Biometrics. 2002;58:1–12. doi: 10.1111/j.0006-341x.2002.00001.x.
  • 19. Lin X. Variance component testing in generalized linear models with random effects. Biometrika. 1997;84:309–326.
  • 20. Liu A, Meiring W, Wang Y. Testing generalized linear models using smoothing spline methods. Statistica Sinica. 2004;15:235–256.
  • 21. Lombardia MJ, Sperlich S. Semiparametric inference in generalized mixed effects models. 2007. http://ssrn.com/abstract=1010928.
  • 22. Pan Z, Lin DY. Goodness-of-fit methods for generalized linear mixed models. Biometrics. 2005;61:1000–1009. doi: 10.1111/j.1541-0420.2005.00365.x.
  • 23. Self SG, Liang KY. Asymptotic properties of maximum likelihood estimates and likelihood ratio tests under non-standard conditions. Journal of the American Statistical Association. 1987;82:605–610.
  • 24. Smith PJ, Heitjan DF. Testing and adjusting for departures from nominal dispersion in generalized linear models. Applied Statistics. 1993;41:31–41.
  • 25. Stram DO, Lee JW. Variance components testing in the longitudinal mixed effects model. Biometrics. 1994;50:1171–1177.
  • 26. Su JQ, Wei LJ. A lack-of-fit test for the mean function in a generalized linear model. Journal of the American Statistical Association. 1991;86:420–426.
  • 27. Wahba G. Spline models for observational data; CBMS-NSF regional conference series in applied mathematics, SIAM; 1990.
  • 28. Zeger SL, Karim MR. Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association. 1991;86:79–86.
  • 29. Zhang D, Lin X. Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association. 1998;93:710–719.
  • 30. Zhang D, Lin X. Hypothesis testing in semiparametric additive mixed models. Biostatistics. 2003;4:57–74. doi: 10.1093/biostatistics/4.1.57.
  • 31. Zhang D. Generalized linear mixed models with varying coefficients for longitudinal data. Biometrics. 2004;60:8–15. doi: 10.1111/j.0006-341X.2004.00165.x.
