Author manuscript; available in PMC: 2021 Aug 1.
Published in final edited form as: Stat Methods Med Res. 2019 Nov 5;29(8):2087–2099. doi: 10.1177/0962280219884980

Modification of the generalized quasi-likelihood model in the analysis of the Add Health study

Katherine E Irimata 1,*, Jeffrey R Wilson 2
PMCID: PMC7233292  NIHMSID: NIHMS1586973  PMID: 31686601

Abstract

The relationship between the mean and variance is an implicit assumption of parametric modeling. While many distributions in the exponential family have a theoretical mean-variance relationship, it is often the case that the data under investigation are correlated, thus varying from the relation. We present a generalized method of moments estimation technique for modeling certain correlated data by adjusting the mean-variance relationship parameters based on a canonical parameterization. The proposed mean-variance form describes overdispersion using two parameters and implements an adjusted canonical parameter which makes this approach feasible for all distributions in the exponential family. Test statistics and confidence intervals are used to measure the deviations from the mean-variance relation parameters. We use the modified relation as a means of fitting generalized quasi-likelihood models to correlated data. The performance of the proposed modified generalized quasi-likelihood model is demonstrated through a simulation study and we highlight the importance of accounting for overdispersion in the evaluation of adolescent obesity data collected from a U.S. longitudinal study.

Keywords: Canonical parameter, correlation, generalized linear models, generalized method of moments, overdispersion

1. Introduction

As a common statistical measure, the variance is often relied on to evaluate the model fit and to understand the differences between the responses through the construction of test statistics and confidence intervals. The form of the variance is often assumed based on the underlying distribution of the responses. In fact, the variance is related to the mean for most distributions in the exponential family. However, while the responses may be on a certain scale or resemble a certain distribution, extraneous variation can impact the mean-variance relationship. Extraneous variation, or so-called overdispersion, is often present in longitudinal or clustered data arising from a hierarchical data structure.

Ignoring overdispersion in the fit of correlated data results in summary statistics, including test statistics, with a larger variance than expected.1 It often leads to a loss of efficiency in using statistics appropriate for the single-parameter family.2 Studies have shown that ignoring overdispersion and thereby misspecifying the model biases the covariate effects and greatly impacts the standard error of the coefficients.3, 4 While underdispersion, the case when the variation is smaller than expected, may occur and also impacts the accuracy of the analysis when it is not appropriately specified, McCullagh and Nelder5 have suggested that overdispersion may be the norm. Various methods have been proposed to identify the underlying variation and provide corrections to improve estimates of the variance.6, 7

Overdispersion or underdispersion is often identified by estimating the parameters in the mean-variance relationship and measuring the deviations from the theoretical values under the assumed distribution. Kukush et al.8 considered a pair of mean and variance functions with a common parameter vector θ estimated using an extended quasi-score function. Tsou9 considered two parameters (ψ,λ) in a parametric robust method of determining the mean-variance relationship through estimation of the power λ with an adjusted robust log likelihood method for fixed values of ψ. In addition, researchers have developed methods to test for overdispersion in proportions10 and score test statistics for overdispersed Poisson and binomial models.11 Xiang et al.12 provided a score test for overdispersion in a zero-inflated Poisson mixed regression model. Yang, Hardin, and Addy13 modified the score statistic to test overdispersion in the zero–inflated generalized Poisson mixed model. While these tests work well for identifying overdispersion, current parameterizations are limited to one parameter or are only applicable to distributions that have a particular form for the variance.

Overdispersed data are analyzed with appropriate statistical models such as generalized estimating equations, generalized linear mixed models, and joint modeling of the mean and dispersion.14 Generalized estimating equations account for correlation through the selection of a covariance structure for the correlated responses.15 Generalized linear mixed models have been used to model overdispersion in non-normal data.16 These models incorporate random effects, through random intercepts and random slopes, to account for correlation due to clustering.17 The joint modeling of the mean and the variance uses an additional dispersion submodel to address the overdispersion in a generalized linear model context.18 Joint modeling allows one to simultaneously model both the mean and the variance through submodels. This technique has been extended to consider joint modeling in hierarchical generalized linear model structures.19, 20

Quasi-likelihood models are useful in cases where the underlying distribution is unspecified.21 This modeling technique relaxes the distributional assumption in the random component and instead relies on the specification of a mean-variance function. The regression parameter estimates and standard errors are obtained from the specified mean-variance relationship and estimates of the covariance matrix in a quadratic form. The quasi-likelihood approach possesses many good properties, including unbiased estimates and small standard errors as compared to alternative methods.22 While the quasi-likelihood method is appropriate for evaluating overdispersed data, the form of the variance has been limited to a single multiplicative overdispersion parameter.

This paper proposes a modified generalized quasi-likelihood (MGQL) model which utilizes a canonical two-parameter mean-variance relation. The proposed canonical parameterization is flexible and can be used to represent the form of the variance for any distribution in the exponential family. The incorporation of this mean-variance relationship in the MGQL extends quasi-likelihood models to describe a larger class of variance functions in the analysis of correlated data.

In Section 2, we review mean-variance relationships in the exponential family and the generalized quasi-likelihood approach to estimate the regression parameters and variance components in the analysis of clustered data.23 In Section 3, we present the canonical parameterization of the mean-variance relationship. We provide a generalized method of moments (GMM) approach to estimate the mean-variance parameters and introduce a test to identify overdispersion or underdispersion. We propose a model that incorporates the mean-variance relationship in the form of a modified generalized quasi-likelihood. In Section 4, a simulation study is used to demonstrate the performance of the MGQL model. In Section 5, the MGQL model is used to analyze data collected through the National Longitudinal Study of Adolescent to Adult Health (Add Health).24 This study collected health information on adolescents over four waves of interviews, and the resulting observations are highly correlated due to the nested structure of the longitudinal study. We demonstrate the use of the MGQL to appropriately account for overdispersion in the evaluation of risk factors associated with obesity.

2. Background

2.1. Mean-Variance Relation Parameters

Let the vector of observations $y=(y_1,\ldots,y_n)^T$ be realizations of a set of independent random variables $Y$ with means $\mu$ related to a set of $k$ covariates $X=(x_1^T,\ldots,x_k^T)$ through a function $g(\cdot)$ such that $E(Y)=\mu=(\mu_1,\ldots,\mu_n)^T$ and $g(\mu)=\eta=X\beta$. The estimates of the regression parameters $\beta=(\beta_0,\ldots,\beta_k)^T$ can be obtained using estimating equations from the likelihood. Let the joint probability function of $Y_i$ with known functions $a$, $b$, and $c$ and known dispersion parameter $\phi$ (ref. 5) be

$$f(y;\theta,\phi)=\exp\left[\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\right].$$

If θ is unknown, we have a two-dimensional exponential family with log likelihood

$$l(\theta,\phi\mid y)=\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi).$$

Using the expectation of the derivative of the likelihood

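$$E\left(\frac{\partial l(\theta,\phi\mid y)}{\partial\theta}\right)=0$$

and the property

$$E\left(\frac{\partial^2 l(\theta,\phi\mid y)}{\partial\theta^2}\right)+E\left[\left(\frac{\partial l(\theta,\phi\mid y)}{\partial\theta}\right)^2\right]=0$$

yields the expected value $E(Y)=b'(\theta)$ and the variance $\mathrm{var}(Y)=b''(\theta)a(\phi)$, where $b'(\cdot)$ denotes the first derivative and $b''(\cdot)$ denotes the second derivative. Thus, both $b'$ and $b''$ are functions of the canonical parameter $\theta$. The mean and the variance are related through the first derivative and second derivative of the function $b(\theta)$. The variance of the observations is a product of a function of the canonical parameter $\theta$ and a function of the dispersion parameter $\phi$.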

The Poisson distribution and the binomial distribution are members of the exponential family and are commonly used to analyze count data and binary data, respectively. The Poisson distribution has probability mass function

$$f(y;\alpha)=\exp(y\log\alpha-\alpha-\log y!)$$

with $a(\phi)=1$, $b(\theta)=\alpha=\exp(\theta)$, $c(y,\phi)=-\log y!$, and canonical parameter $\theta=\log(\alpha)$. Thus, the expected mean and variance under the Poisson distribution are equal to $\alpha$. The binomial distribution has probability mass function

$$f(y;m,p)=\binom{m}{y}p^{y}(1-p)^{m-y},$$

with $a(\phi)=1$, $b(\theta)=m\log(1+\exp(\theta))$, $c(y,\phi)=\log\binom{m}{y}$ and the canonical parameter $\theta=\log\left(\frac{p}{1-p}\right)$. Under the binomial distribution, the expected mean is $mp$ and the expected variance is $mp(1-p)$.5, 25
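The identities $E(Y)=b'(\theta)$ and $\mathrm{var}(Y)=b''(\theta)a(\phi)$ can be checked numerically for the two examples above. The following is a minimal sketch, assuming $a(\phi)=1$ and using central finite differences; the helper name `derivatives` and the chosen values of `alpha`, `m`, and `p` are illustrative and not part of the original paper.

```python
import math


def derivatives(b, theta, step=1e-4):
    """Central finite-difference approximations of b'(theta) and b''(theta)."""
    first = (b(theta + step) - b(theta - step)) / (2 * step)
    second = (b(theta + step) - 2 * b(theta) + b(theta - step)) / step**2
    return first, second


# Poisson: b(theta) = exp(theta) with canonical parameter theta = log(alpha).
alpha = 3.0
pois_mean, pois_var = derivatives(math.exp, math.log(alpha))

# Binomial: b(theta) = m * log(1 + exp(theta)) with theta = log(p / (1 - p)).
m, p = 10, 0.3
binom_mean, binom_var = derivatives(
    lambda t: m * math.log1p(math.exp(t)), math.log(p / (1 - p))
)

print(pois_mean, pois_var)    # both approximately alpha
print(binom_mean, binom_var)  # approximately m*p and m*p*(1-p)
```

Because $a(\phi)=1$ in both cases, the second derivative is the variance itself; with a different $a(\phi)$ it would be multiplied by that factor.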

2.2. Generalized Quasi-likelihood Models

Generalized quasi-likelihood models use the specification of the mean-variance relationship to evaluate correlated data. Consider vectors of correlated observations $y_1,\ldots,y_n$, where $y_i=(y_{i1},\ldots,y_{in_i})$ for $i=1,\ldots,n$. The correlated observations $y_{ij}$ for $j=1,\ldots,n_i$, where $n_i$ is the sample size of the $i$th vector, come from a distribution in the exponential family with link function $g(\cdot)$ such that

$$g(\mu_{ij})=\eta_{ij}=x_{ij}^T\beta+\sigma\xi_i$$

where $x_{ij}$ contains the $k$ covariates, $\alpha_i\sim N(0,\sigma^2)$ is the random effect for cluster $i$, and $\xi_i=\alpha_i/\sigma\sim N(0,1)$. We estimate the parameters $\beta$ and $\sigma$ using GQL. Let the response vector be

$$S_i=(y_i^T,u_i^T)$$

for cluster $i$, where $y_i^T=(y_{i1},\ldots,y_{in_i})$ and $u_i^T=(u_{i1}^T,u_{i2}^T)$ contains the squares and pairwise products, with $u_{i1}=(y_{i1}^2,\ldots,y_{in_i}^2)$ and $u_{i2}=(y_{i1}y_{i2},\ldots,y_{ij}y_{ij^*},\ldots,y_{i(n_i-1)}y_{in_i})$. Let $\theta=(\beta^T,\sigma)^T$ and $M_i(\theta)$ be the mean of the response vector $S_i$. Let $\Omega_i(\theta)$ be the covariance matrix for $S_i$ with elements $\omega_{ij}$. Then, the set of generalized quasi-likelihood estimating equations,

$$\sum_{i=1}^{n}\frac{\partial M_i'(\theta)}{\partial\theta}\,\Omega_i^{-1}(\theta)\,[S_i-M_i(\theta)]=0 \qquad (2.1)$$

provides GQL estimates of β and σ.21, 23 The mean of the response vector, Mi(θ), is evaluated as

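$$M_i(\theta)=E(S_i)=E(Y_{i1},\ldots,Y_{in_i},Y_{i1}^2,\ldots,Y_{in_i}^2,Y_{i1}Y_{i2},\ldots,Y_{i(n_i-1)}Y_{in_i})$$

and

$$E(Y_{ij})=\mu_{ij}(\theta)=E[g^{(1)}(x_{ij}^T\beta+\sigma\xi)]$$
$$E(Y_{ij}^2)=m_{ijj}(\theta)=E[g^{(2)}(x_{ij}^T\beta+\sigma\xi)]$$
$$E(Y_{ij}Y_{ik})=m_{ijk}(\theta)=E[g^{(1)}(x_{ij}^T\beta+\sigma\xi)\,g^{(1)}(x_{ik}^T\beta+\sigma\xi)]$$

where the functions $g^{(r)}(\eta_{ij})$ are the $r$th finite moments of $y_{ij}$. The partial derivative matrix $\partial M_i'(\theta)/\partial\theta$ has dimension $(p+1)\times\{n_i(n_i+1)/2\}$, with partial derivatives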

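$$\frac{\partial\mu_{ij}(\theta)}{\partial\beta}=E[\tilde g^{(1)}(x_{ij}^T\beta+\sigma\xi)]\,x_{ij}^T$$
$$\frac{\partial m_{ijj}(\theta)}{\partial\beta}=E[\tilde g^{(2)}(x_{ij}^T\beta+\sigma\xi)]\,x_{ij}^T$$
$$\frac{\partial m_{ijk}(\theta)}{\partial\beta}=E[\tilde g^{(1)}(x_{ij}^T\beta+\sigma\xi)\,g^{(1)}(x_{ik}^T\beta+\sigma\xi)]\,x_{ij}^T+E[g^{(1)}(x_{ij}^T\beta+\sigma\xi)\,\tilde g^{(1)}(x_{ik}^T\beta+\sigma\xi)]\,x_{ik}^T$$
$$\frac{\partial\mu_{ij}(\theta)}{\partial\sigma}=E[\xi\,\tilde g^{(1)}(x_{ij}^T\beta+\sigma\xi)]\,x_{ij}^T$$
$$\frac{\partial m_{ijj}(\theta)}{\partial\sigma}=E[\xi\,\tilde g^{(2)}(x_{ij}^T\beta+\sigma\xi)]\,x_{ij}^T$$
$$\frac{\partial m_{ijk}(\theta)}{\partial\sigma}=E[\xi\{\tilde g^{(1)}(x_{ij}^T\beta+\sigma\xi)\,g^{(1)}(x_{ik}^T\beta+\sigma\xi)+g^{(1)}(x_{ij}^T\beta+\sigma\xi)\,\tilde g^{(1)}(x_{ik}^T\beta+\sigma\xi)\}]$$

where $\tilde g^{(r)}(\cdot)$ is the first derivative of $g^{(r)}(\cdot)$. Then the covariance matrix is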

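$$\Omega_i=\begin{bmatrix}\Sigma_i & P_i\\ P_i' & Q_i\end{bmatrix}$$

where $\Sigma_i=\mathrm{cov}(Y_i)$, $P_i=\mathrm{cov}(Y_i,U_i^T)$ and $Q_i=\mathrm{cov}(U_i)$. The diagonal elements of $\Sigma_i$ are the variances of $Y_i$ such that $\sigma_{ijj}=\mathrm{Var}(Y_{ij})=m_{ijj}(\theta)-\mu_{ij}^2(\theta)$ with off diagonal elements $\sigma_{ijk}=\mathrm{Cov}(Y_{ij},Y_{ik})=m_{ijk}(\theta)-\mu_{ij}(\theta)\mu_{ik}(\theta)$. The matrix $P_i$ is of dimension $n_i\times\{n_i(n_i+1)/2\}$ and contains $\mathrm{cov}(Y_{ij},Y_{ij}^2)$, $\mathrm{cov}(Y_{ij},Y_{ij}Y_{il})$ and $\mathrm{cov}(Y_{ij},Y_{ik}Y_{il})$. For $j=k=l$,

$$\mathrm{cov}(Y_{ij},Y_{ij}^2)=p_{ijjj}(\theta)=E[g^{(3)}(x_{ij}^T\beta+\sigma\xi)]-\mu_{ij}(\theta)\,m_{ijj}(\theta).$$

For $j=k\neq l$ and $j=l\neq k$, the covariance elements are

$$\mathrm{cov}(Y_{ij},Y_{ij}Y_{il})=E(Y_{ij}^2Y_{il})-\mu_{ij}(\theta)\,m_{ijl}(\theta)$$
$$=E[g^{(2)}(x_{ij}^T\beta+\sigma\xi)\,g^{(1)}(x_{il}^T\beta+\sigma\xi)]-\mu_{ij}(\theta)\,m_{ijl}(\theta).$$

For $j\neq k\neq l$,

$$\mathrm{cov}(Y_{ij},Y_{ik}Y_{il})=p_{ijkl}(\theta)$$
$$=E[g^{(1)}(x_{ij}^T\beta+\sigma\xi)\,g^{(1)}(x_{ik}^T\beta+\sigma\xi)\,g^{(1)}(x_{il}^T\beta+\sigma\xi)]-\mu_{ij}(\theta)\,m_{ikl}(\theta).$$

In the covariance matrix, $Q_i$ contains $\mathrm{cov}(Y_{ij}Y_{ik},Y_{il}Y_{iw})$ with dimension $\{n_i(n_i+1)/2\}\times\{n_i(n_i+1)/2\}$. For $j=k=l=w$,

$$\mathrm{cov}(Y_{ij}^2,Y_{ij}^2)=q_{ijjjj}(\theta)=E[g^{(4)}(x_{ij}^T\beta+\sigma\xi)]-m_{ijj}^2(\theta).$$

For $j=k\neq l\neq w$,

$$\mathrm{cov}(Y_{ij}^2,Y_{il}Y_{iw})=q_{ijjlw}(\theta)$$
$$=E[g^{(2)}(x_{ij}^T\beta+\sigma\xi)\,g^{(1)}(x_{il}^T\beta+\sigma\xi)\,g^{(1)}(x_{iw}^T\beta+\sigma\xi)]-m_{ijj}(\theta)\,m_{ilw}(\theta).$$

For $j\neq k\neq l\neq w$,

$$\mathrm{cov}(Y_{ij}Y_{ik},Y_{il}Y_{iw})=q_{ijklw}$$
$$=E[g^{(1)}(x_{ij}^T\beta+\sigma\xi)\,g^{(1)}(x_{ik}^T\beta+\sigma\xi)\,g^{(1)}(x_{il}^T\beta+\sigma\xi)\,g^{(1)}(x_{iw}^T\beta+\sigma\xi)]-m_{ijk}(\theta)\,m_{ilw}(\theta).$$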

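The quasi-likelihood estimate $\hat\theta_{QL}=(\hat\beta_{QL}^T,\hat\sigma_{QL})^T$ is obtained using Newton-Raphson iteration as

$$\hat\theta_{QL}^{(t+1)}=\hat\theta_{QL}^{(t)}+\left[\sum_{i=1}^{n}\frac{\partial M_i'(\theta)}{\partial\theta}\,\Omega_i^{-1}(\theta)\,\frac{\partial M_i(\theta)}{\partial\theta}\right]_{(t)}^{-1}\left[\sum_{i=1}^{n}\frac{\partial M_i'(\theta)}{\partial\theta}\,\Omega_i^{-1}(\theta)\,[S_i-M_i(\theta)]\right]$$

with covariance $V$, where

$$\hat V(\hat\theta_{QL})=\left[\sum_{i=1}^{n}\frac{\partial M_i'(\theta)}{\partial\theta}\,\Omega_i^{-1}(\theta)\,\frac{\partial M_i(\theta)}{\partial\theta}\right]^{-1}.$$

These GQL estimators are consistent and efficient.23 Correct specification of the GQL model is important, as consistency of the regression parameter estimates depends on correctly specifying the link function and the efficiency depends on a correctly specified variance function.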
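The marginal moments $\mu_{ij}(\theta)$ and $m_{ijk}(\theta)$ that enter $M_i(\theta)$ and $\Sigma_i$ are expectations over the standard normal $\xi$, so they can be approximated by Gauss-Hermite quadrature. The sketch below is one possible way to do this and makes several assumptions that are not stated in the paper: binary responses with a logit link (so that $g^{(1)}=g^{(2)}$ is the inverse logit and $m_{ijj}=\mu_{ij}$), a 30-node rule, and the illustrative names `expit`, `cluster_moments`, and `X_demo`.

```python
import numpy as np


def expit(eta):
    """Inverse logit; plays the role of g^(1) for binary responses (assumed)."""
    return 1.0 / (1.0 + np.exp(-eta))


def cluster_moments(X, beta, sigma, n_nodes=30):
    """Approximate mu_ij and Sigma_i = cov(Y_i) for one cluster of binary data."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    xi = np.sqrt(2.0) * nodes          # rescale nodes to a standard normal
    w = weights / np.sqrt(np.pi)       # rescaled weights sum to one
    eta = (X @ beta)[:, None] + sigma * xi[None, :]
    cond_mean = expit(eta)             # conditional mean given xi, n_i x n_nodes
    mu = cond_mean @ w                 # mu_ij = E[g^(1)(x'beta + sigma*xi)]
    m = (cond_mean * w) @ cond_mean.T  # m_ijk for j != k
    Sigma = m - np.outer(mu, mu)       # sigma_ijk = m_ijk - mu_ij * mu_ik
    np.fill_diagonal(Sigma, mu * (1.0 - mu))  # binary case: m_ijj = mu_ij
    return mu, Sigma


# Hypothetical cluster with three observations: an intercept and one covariate.
X_demo = np.array([[1.0, 0.5], [1.0, -0.2], [1.0, 1.3]])
beta_demo = np.array([-0.5, 1.0])
mu_demo, Sigma_demo = cluster_moments(X_demo, beta_demo, sigma=1.0)
print(mu_demo)
print(Sigma_demo)
```

Setting `sigma=0.0` removes the shared random effect, in which case the off-diagonal covariances vanish and the means reduce to the inverse logit of the fixed-effect predictor. The higher-order blocks $P_i$ and $Q_i$ would follow the same pattern with products of three and four conditional moments.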

3. Modified Generalized Quasi-likelihood Models using the Canonical Mean-Variance Parameterization

3.1. Canonical Parameterization

For a random variable $Y$, we consider describing the variance in terms of two parameters including a dispersion parameter $\psi$ and power parameter $\lambda$. Tsou9 suggested the mean-variance relationship, $\psi\mu^{\lambda}$, which introduced additional flexibility compared to the one parameter form. However, this form is limited to distributions with a variance power relationship such as the Poisson or gamma distributions ($\lambda=1$ or $\lambda=2$, respectively). We generalize this power parameterization through the canonical parameter. Consider the canonical parameter $\theta$ through the derivative $h'$ of the inverse link function $h$, where $h=g^{-1}$. Then, we propose the general mean-variance relationship as

$$\mathrm{var}(Y)=\psi[h'(g(\mu))]^{\lambda}=\psi[h'(\theta)]^{\lambda}$$

where $h'(\theta)$ is the first derivative of the inverse of the canonical link and $\psi$ and $\lambda$ are parameters measuring the overdispersion, $\psi>0$. This form of the mean-variance relationship is distinct as it is applicable to members of the exponential family of distributions and provides general flexibility in describing the mean-variance relationship.

For example, if $Y$ follows a Poisson distribution with natural parameter $\alpha$, then the canonical parameter is $\theta=\log(\alpha)$ and the corresponding mean-variance parameter relation is

$$\mathrm{var}(Y)=\psi[h'(\theta)]^{\lambda}=\psi\alpha^{\lambda}.$$

If $Y$ follows a binomial distribution with natural parameters $m$ and $p$ and canonical parameter $\theta=\mathrm{logit}(p)$, then the canonical mean-variance parameter relation is

$$\mathrm{var}(Y)=\psi[h'(\theta)]^{\lambda}=\psi\left[\frac{\exp(\theta)}{(1+\exp(\theta))^{2}}\right]^{\lambda}=\psi[p(1-p)]^{\lambda},$$

as $p=e^{\theta}(1+e^{\theta})^{-1}$. Thus, the dispersion parameter $\psi$ and power parameter $\lambda$ are a means of adjusting for deviation from the assumed distributional properties. Values of the overdispersion parameters $\psi$ and $\lambda$ different from 1 indicate violations in the variance assumption. While the binomial does not have a natural power relationship between the mean and variance, the relationship is tractable in its power form when based on the canonical parameter. We measure the deviations and consider distributions that are not fully identified but belong to the quasi-exponential family. Such distributions are identified based on the scale of the responses which is robust to misspecification.
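To make the two examples concrete, the canonical relation $\mathrm{var}(Y)=\psi[h'(\theta)]^{\lambda}$ can be written as a small function. This is only an illustrative sketch: the function name `canonical_variance` and its `family` argument are hypothetical, and the binomial branch uses the form $\psi[p(1-p)]^{\lambda}$ given above.

```python
import numpy as np


def canonical_variance(theta, psi=1.0, lam=1.0, family="binomial"):
    """Evaluate psi * [h'(theta)]**lam for a canonical link (illustrative)."""
    theta = np.asarray(theta, dtype=float)
    if family == "poisson":
        h_prime = np.exp(theta)            # h(theta) = exp(theta)
    elif family == "binomial":
        p = 1.0 / (1.0 + np.exp(-theta))   # h(theta) = inverse logit
        h_prime = p * (1.0 - p)
    else:
        raise ValueError("family must be 'poisson' or 'binomial'")
    return psi * h_prime**lam


theta_binom = np.log(0.3 / 0.7)  # canonical parameter for p = 0.3
print(canonical_variance(theta_binom))                    # psi = lam = 1 gives p(1-p)
print(canonical_variance(theta_binom, psi=2.0, lam=1.5))  # an overdispersed form
print(canonical_variance(np.log(4.0), psi=2.0, lam=1.5, family="poisson"))
```

With $\psi=\lambda=1$ the function returns the theoretical variance of the assumed distribution, which is the reference point for the tests in Section 3.3.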

3.2. Estimation of ψ and λ

Let $\hat\gamma_{GMM}$ be an estimator for a vector of parameters $\gamma=(\psi,\lambda)^T$ that minimizes the quadratic objective function $f_n(\gamma)^TW_nf_n(\gamma)$ where $f_n(\gamma)$ is a function of the vector of the sample moment conditions, and $W_n$ is a symmetric, positive definite weight matrix of dimension $n$.26, 27 Then,

$$\hat\gamma_{GMM}=\arg\min_{\gamma}\{f_n(\gamma)^TW_nf_n(\gamma)\} \qquad (3.1)$$

is a generalized method of moments estimator for $\gamma$ which minimizes the objective function. Thus, the GMM estimators of the parameters $\psi$ and $\lambda$ are obtained from the population moment conditions

$$E\left(h'(\theta_i)\left(\mathrm{var}(y_i)-\psi[h'(\theta_i)]^{\lambda}\right)\right)=0 \qquad (3.2a)$$
$$E\left(h'(\theta_i)^{2}\left(\mathrm{var}(y_i)-\psi[h'(\theta_i)]^{\lambda}\right)\right)=0 \qquad (3.2b)$$

where $h'(\theta_i)$ is the first derivative of the inverse link function and $\mathrm{var}(y_i)$ is an empirical estimate of the variance based on the squared residual, $(y_i-\mu_i)^2$. Equating the moment conditions and an empirical estimate of $f_n(\gamma)$ results in

$$\frac{1}{n}\sum_{i=1}^{n}f(y_i,\psi,\lambda)=\begin{bmatrix}\frac{1}{n}\sum_{i=1}^{n}h'(\theta_i)\left(\mathrm{var}(y_i)-\psi[h'(\theta_i)]^{\lambda}\right)\\ \frac{1}{n}\sum_{i=1}^{n}h'(\theta_i)^{2}\left(\mathrm{var}(y_i)-\psi[h'(\theta_i)]^{\lambda}\right)\end{bmatrix}=\begin{bmatrix}0\\ 0\end{bmatrix}$$

We make use of a two-step GMM approach, with an identity weight matrix in the first step. In the second step, the weights are selected as an estimate of the optimal weight matrix for GMM as

$$\hat W_n=\left[\frac{1}{n}\sum_{i=1}^{n}f(y_i,\hat\psi,\hat\lambda)f(y_i,\hat\psi,\hat\lambda)^T\right]^{-1}$$

where $\hat\psi$ and $\hat\lambda$ are mean-variance relationship parameter estimates from the first step.28 Thus, the vector of GMM estimates for the mean-variance relation parameters $\hat\gamma_{GMM}$ minimizes the quadratic objective function $f_n(\gamma)^TW_nf_n(\gamma)$. The generalized method of moments approach is flexible and estimates both parameters $(\psi,\lambda)$ simultaneously.
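The two-step procedure can be sketched in a few lines. The following is an illustrative implementation under several assumptions that go beyond the text: the means $\mu_i$ are treated as known, a log link is used so that $h'(\theta_i)=\mu_i$, the responses are simulated with normal noise purely so that the target values of $\psi$ and $\lambda$ are known, $\psi$ is optimized on the log scale to keep it positive, and `scipy.optimize.least_squares` is used as the minimizer. The names `moment_matrix` and `two_step_gmm` are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def moment_matrix(params, h_prime, sq_resid):
    """Rows are f(y_i, psi, lambda) built from conditions (3.2a) and (3.2b)."""
    log_psi, lam = params
    gap = sq_resid - np.exp(log_psi) * h_prime**lam
    return np.column_stack([h_prime * gap, h_prime**2 * gap])


def two_step_gmm(h_prime, sq_resid, start=(0.0, 1.0)):
    """Two-step GMM for (psi, lambda): identity weights, then estimated weights."""

    def scaled_moments(params, chol):
        # ||chol.T @ fbar||^2 equals fbar' W fbar when W = chol @ chol.T.
        fbar = moment_matrix(params, h_prime, sq_resid).mean(axis=0)
        return chol.T @ fbar

    step1 = least_squares(scaled_moments, start, args=(np.eye(2),))
    f = moment_matrix(step1.x, h_prime, sq_resid)
    weight = np.linalg.inv(f.T @ f / len(f))  # estimate of the optimal weights
    step2 = least_squares(
        scaled_moments, step1.x, args=(np.linalg.cholesky(weight),)
    )
    return np.exp(step2.x[0]), step2.x[1]


# Simulated data (assumption): var(y_i) = psi * mu_i**lam with psi = 2, lam = 1.5.
rng = np.random.default_rng(0)
mu = rng.uniform(1.0, 5.0, size=50_000)
psi_true, lam_true = 2.0, 1.5
y = mu + np.sqrt(psi_true * mu**lam_true) * rng.standard_normal(mu.size)

psi_hat, lam_hat = two_step_gmm(h_prime=mu, sq_resid=(y - mu) ** 2)
print(psi_hat, lam_hat)
```

With two moment conditions and two parameters the system is exactly identified, so the second step mainly matters for the covariance estimate; the weighting affects the point estimates once additional moment conditions are included.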

An alternative approach is to fix one parameter at a time and estimate the second parameter using one moment condition. Conversely, an extension of GMM is to make use of additional moment conditions, such as

$$f_3(y_i,\psi,\lambda)=h'(\theta_i)^{3}\left(\mathrm{var}(y_i)-\psi[h'(\theta_i)]^{\lambda}\right) \qquad (3.3)$$

to estimate the parameters. The additional moment condition improves the asymptotic efficiency, although there is a possibility of small sample bias.29

To identify the mean-variance relation in clustered data, consider $y_{ij}$, the $j$th observation in the $i$th cluster, $j=1,\ldots,n_i$ and $i=1,\ldots,n$, with mean $\mu_{ij}$ which is related to $k$ covariates and random effect $\alpha_i$ through the link function $g$ such that

$$E(Y_{ij})=\mu_{ij}$$

and

$$g(\mu_{ij})=\eta_{ij}=x_{ijk}^T\beta+\alpha_i$$

where the random effect $\alpha_i$ represents the variation between clusters such that $\alpha_i\sim N(0,\sigma^2)$. Further, let $\xi_i=\alpha_i/\sigma$, such that $\xi_i\sim N(0,1)$ so the linear predictor reduces to

$$g(\mu_{ij})=x_{ijk}^T\beta+\sigma\xi_i.$$

The general mean-variance relationship, obtained for the data across all the clusters, is

$$\mathrm{var}(Y)=\psi[h'(\theta)]^{\lambda},$$

where $\theta$ is the canonical parameter, $h'(\theta)$ is the first derivative of the inverse canonical link, and $\psi$ and $\lambda$ are the variance parameters estimated from the data using (3.2a) and (3.2b). The estimates of $\psi$ and $\lambda$ indicate the strength of the clustering (variance of the random effect).

The parameters ψ and λ are essential in defining the variance and identifying deviations from the theoretical values in a known distributional mean-variance relation. We obtain GMM estimators using the first and second moments based on the fact that the distribution is a member of the quasi-exponential family.30 We do not require complete distributional assumptions, as is required with maximum likelihood estimators, and the estimates are obtainable even when likelihood methods are computationally burdensome.26 The GMM estimators for ψ and λ are consistent and asymptotically normal.31

3.3. Inference for ψ and λ

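Assume that the data come from a quasi-exponential family. The sample moments are asymptotically normally distributed, so we have

$$\sqrt{n}\,f_n(\hat\gamma)\xrightarrow{d}N(0,\Delta),$$

with the asymptotic variance $\Delta=E[f(y,\gamma)f(y,\gamma)^T]$ where $\gamma$ is the estimate of the parameters of interest.30 For the mean-variance relationship parameters $\psi$ and $\lambda$, the GMM estimator $\hat\gamma_{GMM}=(\hat\psi_{GMM},\hat\lambda_{GMM})$ has the asymptotic covariance

$$\mathrm{var}(\hat\gamma_{GMM})=V_{GMM}=\frac{1}{n}[\Gamma^TW\Gamma]^{-1}\Gamma^TW\Delta W\Gamma[\Gamma^TW\Gamma]^{-1}$$

where $W$ is a specified weight matrix and $\Gamma$ is the expected value of the Jacobian of population moment conditions found as

$$\Gamma=E\left[\frac{\partial f(y,\gamma)}{\partial\gamma}\right]=E\left[\frac{\partial f(y,\lambda,\psi)}{\partial\psi},\frac{\partial f(y,\lambda,\psi)}{\partial\lambda}\right]^T.$$

In the optimal case, the weight matrix is selected as $W=\Delta^{-1}$, so that

$$V_{GMM}=\frac{1}{n}[\Gamma^TW\Gamma]^{-1}$$

resulting in asymptotically efficient GMM estimators, $\hat\psi_{GMM}$ and $\hat\lambda_{GMM}$.26 In practice, the covariance matrix is evaluated using $\hat\gamma$,

$$\hat\Gamma=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial f(y_i,\hat\gamma)}{\partial\hat\gamma}.$$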
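The sandwich form of $V_{GMM}$ translates directly into matrix operations. The sketch below is a generic illustration rather than the authors' code: the function name `gmm_covariance` is hypothetical, the Jacobian is assumed to be arranged with one row per moment condition and one column per parameter, and $\Delta$ is estimated by the average outer product of the per-observation moments.

```python
import numpy as np


def gmm_covariance(moments, jacobian, weight=None):
    """Sandwich estimate (1/n) [G'WG]^-1 G'W Delta W G [G'WG]^-1.

    moments : (n, q) array whose rows are f(y_i, gamma_hat)
    jacobian: (q, p) array, the averaged derivative of the moments (assumed layout)
    weight  : (q, q) weight matrix; defaults to the inverse of the estimated Delta
    """
    f = np.asarray(moments, dtype=float)
    gamma = np.asarray(jacobian, dtype=float)
    n = f.shape[0]
    delta = f.T @ f / n
    w = np.linalg.inv(delta) if weight is None else np.asarray(weight, dtype=float)
    bread = np.linalg.inv(gamma.T @ w @ gamma)
    meat = gamma.T @ w @ delta @ w @ gamma
    return bread @ meat @ bread / n


# Toy check: estimating a mean with the single moment f(y_i, g) = y_i - g.
y_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f_toy = (y_toy - y_toy.mean()).reshape(-1, 1)
V_toy = gmm_covariance(f_toy, jacobian=[[-1.0]])
print(V_toy)  # equals the (biased) sample variance divided by n
```

When the weight matrix is the inverse of $\Delta$, the general expression collapses to $\frac{1}{n}[\Gamma^TW\Gamma]^{-1}$, which is the optimal case described above.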

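Significant overdispersion is identified through two hypothesis tests of the overdispersion parameters $\psi$ and $\lambda$,

$$H_0:\psi=1,\quad H_a:\psi>1$$

and

$$H_0:\lambda=1,\quad H_a:\lambda>1.$$

Then the z-test statistics

$$Z_{\psi}=\frac{\hat\psi-1}{\sqrt{\mathrm{var}(\hat\psi)}}$$

and

$$Z_{\lambda}=\frac{\hat\lambda-1}{\sqrt{\mathrm{var}(\hat\lambda)}},$$

follow the standard normal distribution under the null hypothesis. Thus, a measure of the overdispersion is given based on the joint $100(1-\alpha)\%$ confidence intervals for $\psi$ and $\lambda$,

$$\left(\hat\psi_{GMM}-z_{1-\alpha/2}\sqrt{V_{GMM,\psi}},\ \hat\psi_{GMM}+z_{1-\alpha/2}\sqrt{V_{GMM,\psi}}\right)$$
$$\left(\hat\lambda_{GMM}-z_{1-\alpha/2}\sqrt{V_{GMM,\lambda}},\ \hat\lambda_{GMM}+z_{1-\alpha/2}\sqrt{V_{GMM,\lambda}}\right)$$

where $z_{\alpha}$ is the $\alpha$th quantile from the standard normal distribution.28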
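As a small worked illustration of these tests, the statistics and intervals can be computed from an estimate and its standard error. The inputs below are the rounded values reported for the Add Health data in Section 5 ($\hat\psi=3.16$ with standard error 0.27, $\hat\lambda=1.80$ with standard error 0.06), so the output need not match the published statistics exactly. The function name `wald_summary` is hypothetical.

```python
from scipy.stats import norm


def wald_summary(estimate, std_error, null_value=1.0, level=0.95):
    """One-sided z-test of H0: parameter = null_value and a two-sided interval."""
    z = (estimate - null_value) / std_error
    p_value = norm.sf(z)  # upper-tail probability for Ha: parameter > null_value
    half_width = norm.ppf(1 - (1 - level) / 2) * std_error
    return z, p_value, (estimate - half_width, estimate + half_width)


z_psi, p_psi, ci_psi = wald_summary(3.16, 0.27)
z_lam, p_lam, ci_lam = wald_summary(1.80, 0.06)
print(z_psi, p_psi, ci_psi)
print(z_lam, p_lam, ci_lam)
```

Both intervals lie entirely above 1 for these inputs, which agrees with the conclusion of significant overdispersion drawn in Section 5.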

3.4. Modified Generalized Quasi-likelihood Models

In this section, we propose a modified generalized quasi-likelihood model for correlated data based on the canonical parameterization. As correlated data necessitate dealing with extravariation, we rely on our two-parameter mean-variance relation. The GQL approach relies on the specification of the mean-variance relationship rather than a distributional assumption. We address the correlation through the empirical mean-variance estimates of ψ and λ.

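The generalized quasi-likelihood estimating equation (2.1), with $M_i$, $\partial M_i'(\theta)/\partial\theta$, and the covariance matrix $\Omega_i(\theta)$, is used to estimate the regression parameters $\beta$ and the variance of the random effect $\sigma$. Though GQL performs well and produces consistent and efficient estimators,23 it relies on the estimate of the covariance $\Omega_i(\theta)$. We update this estimate to incorporate extravariation based on the canonical parameterization $\psi\Omega_i^{\lambda}(\theta)$ such that the modified quasi-likelihood estimate $\hat\theta_{MGQL}$ is obtained iteratively as

$$\hat\theta_{MGQL}^{(t+1)}=\hat\theta_{MGQL}^{(t)}+\left[\sum_{i=1}^{n}\frac{\partial M_i'(\theta)}{\partial\theta}\left(\psi\Omega_i^{\lambda}(\theta)\right)^{-1}\frac{\partial M_i(\theta)}{\partial\theta}\right]_{(t)}^{-1}\left[\sum_{i=1}^{n}\frac{\partial M_i'(\theta)}{\partial\theta}\left(\psi\Omega_i^{\lambda}(\theta)\right)^{-1}[S_i-M_i(\theta)]\right]$$

with covariance

$$\hat V(\hat\theta_{MGQL})=\left[\sum_{i=1}^{n}\frac{\partial M_i'(\theta)}{\partial\theta}\left(\psi\Omega_i^{\lambda}(\theta)\right)^{-1}\frac{\partial M_i(\theta)}{\partial\theta}\right]^{-1}.$$

This modification makes use of the GMM estimates of $\psi$ and $\lambda$. For given $\psi$ and $\lambda$, the MGQL estimates of $\beta$ and $\sigma$ in (2.1) are unbiased. The MGQL estimators are consistent and efficient as $\mu_{ij}$ is the mean of $y_{ij}$.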

4. Simulation Study

We simulate hierarchical binary data and evaluate the estimation of the regression parameters over 1000 iterations using three models: the MGQL model, which incorporates GMM estimates of the mean-variance parameters into the quasi-likelihood model framework, a GQL model, and a generalized linear mixed model (GLMM). The two-level binary data contain 50 clusters with 10 observations in each cluster, with the linear predictor $\mathrm{logit}(\mu_i)=\eta_i=\beta_1X_{i1}+\beta_2X_{i2}+\alpha_i$ where $\beta_1=\beta_2=1$ and $X_1$ and $X_2$ are generated from standard normal distributions. The canonical mean-variance parameter relation under the Bernoulli distribution is $\mathrm{var}(Y)=\psi[p(1-p)]^{\lambda}$. To evaluate the performance of these regression methods under the true mean-variance relation and an overdispersed form, we consider cases where the random effects are generated under the normal distribution and the t-distribution with 4 degrees of freedom. The GLMM is fit using the default optimization techniques in the R statistical software, which utilizes Nelder-Mead for the preliminary optimization of the random effects parameters and the bobyqa optimizer from the minqa package for the final estimation of the random and fixed effects parameters.
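The data-generating design described above can be sketched as follows. This is an illustration only, not the authors' simulation code: the function name `simulate_clusters` and its arguments are hypothetical, and scaling the t-distributed random effects by $\sigma$ is an assumption, since the text does not say how they were scaled.

```python
import numpy as np


def simulate_clusters(sigma, n_clusters=50, cluster_size=10, beta=(1.0, 1.0),
                      random_effect="normal", df=4, seed=None):
    """Simulate two-level binary data with a cluster-level random intercept."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_clusters, cluster_size, 2))  # X1 and X2
    if random_effect == "normal":
        alpha = sigma * rng.standard_normal(n_clusters)
    elif random_effect == "t":
        # Assumption: t-distributed effects are multiplied by sigma.
        alpha = sigma * rng.standard_t(df, size=n_clusters)
    else:
        raise ValueError("random_effect must be 'normal' or 't'")
    eta = X @ np.asarray(beta) + alpha[:, None]  # logit of the success probability
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, prob)
    return X, y, alpha


X_sim, y_sim, alpha_sim = simulate_clusters(sigma=1.0, random_effect="t", seed=1)
print(X_sim.shape, y_sim.shape, y_sim.mean())
```

Repeating the call with different seeds, for example 1000 times per value of `sigma`, would mirror the replication structure of the study, with each replicate passed to the three fitting methods.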

4.1. Normally Distributed Random Effects

We evaluate hierarchical binary data with normally distributed random effects. The random intercept αi associated with each cluster is generated from N(0,σ2) with σ = 0.6, 0.8, 1, 1.2, 1.4. The results are reported in Table 1. The generalized linear mixed model estimates for the standard error of σ^ were obtained using a profile likelihood approach.

Table 1: Model Fit Simulation Results for Normally Distributed Random Effects

| Setting | Model | β1 Est | β1 SE | β2 Est | β2 SE | σ Est | σ SE | Iterations |
|---|---|---|---|---|---|---|---|---|
| σ = 0.6 | MGQL | 1.0083 | 0.1228 | 1.0181 | 0.1231 | 0.5687 | 0.2177 | 5.5 |
| | GQL | 1.0053 | 0.1319 | 1.0120 | 0.1320 | 0.5837 | 0.2019 | 5.2 |
| | GLMM | 1.0040 | 0.1316 | 1.0110 | 0.1318 | 0.5706 | 0.1872 | - |
| σ = 0.8 | MGQL | 1.0093 | 0.1243 | 1.0241 | 0.1247 | 0.7822 | 0.1880 | 4.5 |
| | GQL | 1.0045 | 0.1344 | 1.0159 | 0.1348 | 0.7948 | 0.1814 | 4.0 |
| | GLMM | 1.0035 | 0.1327 | 1.0149 | 0.1331 | 0.7790 | 0.1869 | - |
| σ = 1 | MGQL | 1.0115 | 0.1281 | 1.0237 | 0.1284 | 0.9841 | 0.1924 | 4.5 |
| | GQL | 1.0047 | 0.1372 | 1.0137 | 0.1375 | 1.0014 | 0.1916 | 4.0 |
| | GLMM | 1.0041 | 0.1286 | 1.0131 | 0.1288 | 0.9832 | 0.1943 | - |
| σ = 1.2 | MGQL | 1.0103 | 0.1333 | 1.0225 | 0.1336 | 1.1837 | 0.2067 | 4.6 |
| | GQL | 1.0025 | 0.1401 | 1.0126 | 0.1404 | 1.2044 | 0.2096 | 4.1 |
| | GLMM | 1.0023 | 0.1220 | 1.0125 | 0.1222 | 1.1835 | 0.2108 | - |
| σ = 1.4 | MGQL | 1.0148 | 0.1402 | 1.0253 | 0.1405 | 1.3951 | 0.2280 | 4.7 |
| | GQL | 1.0078 | 0.1437 | 1.0146 | 0.1439 | 1.4120 | 0.2326 | 4.2 |
| | GLMM | 1.0080 | 0.1137 | 1.0147 | 0.1139 | 1.3874 | 0.2335 | - |

The simulation results demonstrate that the MGQL approach performs well and suggest that the MGQL model recovers the true values when relying on the estimated mean-variance relationship in the covariance matrix. While the parameter estimates are similar across the three methods, the standard errors for the MGQL estimates of β1 and β2 are lower than the standard errors of the GQL approach across all values of σ. The MGQL approach requires slightly more iterations, on average, than the GQL approach to achieve convergence, although the two approaches require similar computation time. As expected, the GLMM performs the best among the three methods as the data are generated under this model.

4.2. t-Distributed Random Effects

We evaluate the performance of the MGQL, GQL, and GLMM for non-normally distributed random effects. The random effects αi are generated under the t-distribution with 4 degrees of freedom, which has heavier tails than the normal distribution. The model parameter estimates and standard errors are reported in Table 2.

Table 2: Model Fit Simulation Results for t-Distributed Random Effects

| Setting | Model | β1 Est | β1 SE | β2 Est | β2 SE | σ Est | σ SE | Iterations |
|---|---|---|---|---|---|---|---|---|
| σ = 0.6 | MGQL | 1.0185 | 0.1245 | 1.0141 | 0.1243 | 0.7653 | 0.1922 | 5.0 |
| | GQL | 1.0148 | 0.1347 | 1.0099 | 0.1345 | 0.7831 | 0.1818 | 4.0 |
| | GLMM | 1.0136 | 0.1324 | 1.0089 | 0.1320 | 0.7671 | 0.1883 | - |
| σ = 0.8 | MGQL | 1.0150 | 0.1287 | 1.0071 | 0.1283 | 1.0047 | 0.1953 | 4.7 |
| | GQL | 1.0143 | 0.1379 | 1.0041 | 0.1374 | 1.0214 | 0.1935 | 4.1 |
| | GLMM | 1.0143 | 0.1281 | 1.0041 | 0.1275 | 1.0075 | 0.1977 | - |
| σ = 1 | MGQL | 1.0134 | 0.1354 | 1.0119 | 0.1352 | 1.2435 | 0.2136 | 4.9 |
| | GQL | 1.0118 | 0.1414 | 1.0111 | 0.1413 | 1.2542 | 0.2151 | 4.2 |
| | GLMM | 1.0127 | 0.1190 | 1.0120 | 0.1191 | 1.2417 | 0.2194 | - |
| σ = 1.2 | MGQL | 1.0104 | 0.1427 | 1.0090 | 0.1425 | 1.4770 | 0.2385 | 4.7 |
| | GQL | 1.0102 | 0.1450 | 1.0093 | 0.1449 | 1.4821 | 0.2414 | 4.3 |
| | GLMM | 1.0119 | 0.1207 | 1.0110 | 0.1205 | 1.4698 | 0.2466 | - |
| σ = 1.4 | MGQL | 1.0128 | 0.1507 | 1.0050 | 0.1503 | 1.7140 | 0.2705 | 4.8 |
| | GQL | 1.0135 | 0.1490 | 1.0076 | 0.1487 | 1.7109 | 0.2716 | 4.4 |
| | GLMM | 1.0155 | 0.1328 | 1.0096 | 0.1327 | 1.6969 | 0.2770 | - |

As seen in the previous simulation, MGQL tends to be more efficient than the GQL approach for estimates of β. The simulation results also highlight the advantage of implementing the MGQL model for overdispersed data. For small values of σ, the simulated data reflect the true mean-variance parameterization for the Bernoulli distribution and thus we see similar performance across the three methods. However, for larger values of σ where overdispersion is present (for σ = 1.2 and σ = 1.4, there was significant overdispersion in 60.0% and 62.3% of simulations, respectively), we find that the MGQL model produces improved estimates of β1 and β2 compared to the GQL model and GLMM. In addition, for values of σ > 1, we see that MGQL produces more efficient variance estimates. Thus, when the random effects are not normally distributed, the MGQL approach has many advantages for modeling overdispersed data compared to the GQL model and GLMM.

5. Numerical Example

The Add Health Study is a longitudinal study in the United States of adolescents in 7th through 12th grade, with information collected over four waves of interviews between 1994 and 2008.24 The data are available on the Add Health website (http://www.cpc.unc.edu/addhealth). We fit a modified generalized quasi-likelihood model to evaluate the binary variable adolescent obesity for 2,712 adolescents in the United States. The factors associated with obesity include activity scale and feeling scale, ratings of physical activity and emotional health. The mean-variance parameter estimates are $\hat\psi=3.16$ and $\hat\lambda=1.80$ with standard errors 0.27 and 0.06, respectively. The estimates indicate a significant deviation from the distributional form of the mean-variance relationship (test statistics $Z_{\psi}=7.94$ and $Z_{\lambda}=13.57$). Thus, making use of the true mean-variance form accounts for the clustering. The model parameter estimates and standard errors of the MGQL, GQL, and GLMM are provided (Table 3).

Table 3: Parameter estimates and standard errors for adolescent obesity data

| Model | Statistic | βActivityScale | βFeelingScale | σ |
|---|---|---|---|---|
| MGQL | Estimate | −1.0921 | −0.6758 | 2.6078 |
| | Std. Error | 0.0326 | 0.0650 | 0.0794 |
| GQL | Estimate | −1.3458 | −0.5041 | 2.7022 |
| | Std. Error | 0.0449 | 0.0718 | 0.0974 |
| GLMM | Estimate | −1.3438 | −0.6527 | 2.6900 |
| | Std. Error | 0.0471 | 0.0726 | 0.0959 |

The covariates activity scale and feeling scale as well as the random effect are found to be significant across all three models. The regression parameter estimates for activity scale are negative, indicating that increased physical activity is associated with a lower probability of obesity. Similar estimates are produced for the GQL and GLMM approaches, while the MGQL estimate is slightly smaller in magnitude ($\hat\beta_{ActivityScale}=-1.0921$). Similarly, the parameter estimates for feeling scale vary slightly although all three estimates are negative, indicating that larger values of the computed emotional health measure are associated with a lower probability of obesity. The estimates of the random effect standard deviation $\sigma$ vary slightly among the three models, with $\hat\sigma_{MGQL}=2.6078$, $\hat\sigma_{GQL}=2.7022$, and $\hat\sigma_{GLMM}=2.6900$. The random effect variance for the MGQL model is found to be the lowest of the three estimates.

6. Conclusions

It is common to assume that the variance of a random variable is a function of the mean, although it is often the case that the true variance in the data may be inflated due to underlying correlation or the hierarchical data structure. While the presence of overdispersion impacts the accuracy of statistical evaluations, the MGQL is a modeling approach that appropriately fits correlated data. The MGQL approach is flexible as it accounts for correlation through an extended representation of the covariance. The canonical parameterization is tractable in the power form for any distribution in the exponential family. Moreover, deviations in the variance can be readily identified using the proposed GMM estimators of the mean-variance parameters ψ and λ, which are consistent and asymptotically normal. A simulation study demonstrated that the MGQL addresses correlation through the use of the mean-variance relationship and performs as well as or better than existing methods including GQL models and GLMM, particularly for non-normally distributed random effects. The study confirmed that the MGQL retains good properties of quasi-likelihood approaches including unbiased estimates and small standard errors. In addition, we considered a numerical example to evaluate obesity data from the Add Health study. We verified that the MGQL model produced comparable results to existing models. Factors including activity scale and feeling scale were found to be negatively associated with obesity, and the MGQL model produced a lower variance estimate of the random effect.

Acknowledgements

This research uses data from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). No direct support was received from grant P01-HD31921 for this analysis.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation [grant number DGE-1311230] and the National Institutes of Health Alzheimer’s Consortium Fellowship Grant [grant number NHS0007].

Footnotes

Disclaimer: The work for this paper was conducted while the first author was at Arizona State University. The findings and conclusions in this paper are those of the authors and do not necessarily represent the views of the National Center for Health Statistics, Centers for Disease Control and Prevention.

Declaration of Conflicting Interests

The authors declare that there is no conflict of interest.

References

  • 1. Morel JG and Neerchal NK. Overdispersion Models in SAS. Cary: SAS Institute Inc, 2012.
  • 2. Cox DR. Some remarks on overdispersion. Biometrika 1983; 70: 269–274.
  • 3. Wilson JR and Koehler KJ. Hierarchical Models for Cross-classified Overdispersed Multinomial Data. Journal of Business and Economic Statistics 1991; 9: 103–110.
  • 4. Milanzi E, Alonso A and Molenberghs G. Ignoring overdispersion in hierarchical loglinear models: Possible problems and solutions. Statistics in Medicine 2012; 31: 1475–1482. DOI: 10.1002/sim.4482.
  • 5. McCullagh P and Nelder JA. Generalized Linear Models, 2nd ed. London: Chapman and Hall, 1989.
  • 6. Cox DR and Reid N. Parameter Orthogonality and Approximate Conditional Inference. Journal of the Royal Statistical Society Series B 1987; 49: 1–39.
  • 7. McCullagh P and Tibshirani R. A Simple Method for the Adjustment of Profile Likelihoods. Journal of the Royal Statistical Society Series B 1990; 52: 325–344.
  • 8. Kukush A, Malenko A and Schneeweiss H. Optimality of the quasi-score estimator in a mean-variance model with applications to measurement error models. Journal of Statistical Planning and Inference 2009; 139: 3461–3472.
  • 9. Tsou T-S. Determining the mean-variance relationship in generalized linear models—A parametric robust way. Journal of Statistical Planning and Inference 2011; 141: 197–203.
  • 10. Pack SE. Hypothesis Testing for Proportions with Overdispersion. Biometrics 1986; 42: 967–972.
  • 11. Dean C. Testing for overdispersion in Poisson and Binomial regression models. Journal of the American Statistical Association 1992; 87: 451–457.
  • 12. Xiang L, Lee AH, Yau KKW, et al. A score test for overdispersion in zero-inflated Poisson mixed regression model. Statistics in Medicine 2007; 26: 1608–1622.
  • 13. Yang Z, Hardin JW and Addy CL. A note on Dean’s overdispersion test. Journal of Statistical Planning and Inference 2009; 139: 3675–3678.
  • 14. Wilson JR and Lorenz KA. Modeling Binary Correlated Responses using SAS, SPSS and R. New York: Springer, 2015.
  • 15. Liang K-Y and Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13–22.
  • 16. Lee Y and Nelder JA. Two ways of modelling overdispersion in non-normal data. Journal of the Royal Statistical Society Series C 2000; 49: 591–598.
  • 17. Breslow NE and Clayton DG. Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 1993; 88: 9–25.
  • 18. Smyth GK. Generalized linear models with varying dispersion. Journal of the Royal Statistical Society Series B 1989; 51: 47–60.
  • 19. Smyth GK and Verbyla AP. Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 1999; 10: 695–709.
  • 20. Lee Y and Nelder JA. Double hierarchical generalized linear models. Journal of the Royal Statistical Society Series C 2006; 55: 139–185.
  • 21. Wedderburn R. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 1974; 61: 439–447.
  • 22. Wang B and Wilson JR. Comparative GMM and GQL logistic regression models on hierarchical data. Journal of Applied Statistics 2017; 45: 409–425.
  • 23. Sutradhar BC. On Exact Quasilikelihood Inference in Generalized Linear Mixed Models. Sankhyā: The Indian Journal of Statistics 2004; 66: 263–291.
  • 24. Harris KM, Halpern C, Whitsel E, et al. The National Longitudinal Study of Adolescent to Adult Health: Research Design. http://www.cpc.unc.edu/projects/addhealth/design, 2009.
  • 25. Dobson AJ and Barnett AG. An Introduction to Generalized Linear Models, Third Edition. New York: CRC Press, 2008.
  • 26. Zsohar P. Short Introduction to the Generalized Method of Moments. Hungarian Statistical Review Special Number 16 2012; 90: 150–170.
  • 27. Lalonde TL, Wilson JR and Yin J. GMM logistic regression models for longitudinal data with time-dependent covariates and extended classifications. Statistics in Medicine 2014; 33: 4756–4769.
  • 28. Imbens GW and Spady R. Confidence intervals in generalized method of moments models. Journal of Econometrics 2002; 107: 87–98.
  • 29. Donald SG, Imbens GW and Newey WK. Choosing instrumental variables in conditional moment restriction models. Journal of Econometrics 2009; 152: 28–36.
  • 30. Hansen LP. Large Sample Properties of Generalized Method of Moments Estimators. Econometrica 1982; 50: 1029–1054.
  • 31. Jiang J. Empirical method of moments and its applications. Journal of Statistical Planning and Inference 2003; 115: 69–84.
