Abstract
We propose two model selection criteria relying on the bootstrap approach, denoted by QAICb1 and QAICb2, in the framework of linear mixed models. Similar to the justification of the Akaike Information Criterion (AIC), the proposed QAICb1 and QAICb2 are proved to be asymptotically unbiased estimators of the Kullback–Leibler discrepancy between a candidate model and the true model. Unlike AIC, however, they are defined on the quasi-likelihood function instead of the likelihood, and they are proved to be asymptotically equivalent. The proposed selection criteria are constructed from the quasi-likelihood of a candidate model and a bias estimation term, in which the bootstrap method is adopted to improve the estimation of the bias caused by using the candidate model to estimate the true model. Simulations across a variety of mixed model settings are conducted to demonstrate that the proposed selection criteria outperform some existing model selection criteria in selecting the true model. Generalized estimating equations (GEE) are utilized to calculate QAICb1 and QAICb2 in the simulations. The effectiveness of the proposed selection criteria is also demonstrated in an application to Parkinson's Progression Markers Initiative (PPMI) data.
Keywords: Compound symmetric structure, autoregressive correlation structure, semiparametric bootstrap, nonparametric bootstrap, asymptotically unbiased estimator, Kullback–Leibler discrepancy
1. Introduction
During the process of model selection, selection criteria play a vital role in choosing the most appropriate model. A well-known model selection criterion is AIC [1], which assesses a model through two aspects: goodness of fit and simplicity. Originating from information theory, AIC utilizes the likelihood function of a candidate model to evaluate how well the model fits the data set, along with a bias correction to measure the complexity of the model. Due to the simple form of its bias correction term, AIC tends to choose more complex or overfitted models rather than simpler ones, especially in small-sample scenarios [9,19]. More importantly, the distribution assumption on the data is not always satisfied, and the computational cost of likelihood functions increases significantly when it comes to mixed models for highly correlated data.
The quasi-likelihood function [20] shares similar properties with the traditional likelihood function but is well defined as long as the mean and variance of the distribution of the data are specified. The lack of distribution assumptions makes the quasi-likelihood applicable to various models, including linear and generalized linear mixed models. Furthermore, with the introduction of an over-dispersion parameter, the quasi-likelihood function is capable of reducing the influence of overdispersion. When the quasi-likelihood function is applied to correlated data, the method of GEE [13] is commonly used to estimate model parameters.
To improve the quality of model selection, resampling approaches can be incorporated, of which the most influential is the bootstrap [5–7]. The bootstrap typically takes three forms: parametric, semiparametric, and nonparametric. The nonparametric bootstrap is the most widely used because it is free from the parametric distribution of the data, relying instead on the bootstrap distribution [7]. AIC can be improved by incorporating the bootstrap approach, as shown in Cavanaugh and Shumway [4]. Ishiguro and Morita [10] proposed WIC based on the bootstrap, followed by a successful application to a practical problem. Ishiguro et al. [11] proposed the extended information criterion (EIC) to extend the usage of AIC by estimating the bias correction term from bootstrap resamples. For mixed models with dependent data, Shang and Cavanaugh [19] introduced two bootstrap-based selection criteria, AICb1 and AICb2, which are efficient especially in small-sample scenarios. Unfortunately, these bootstrap approaches rely on the likelihood function of the data, which in turn depends on a distribution assumption.
To extend the justification of AIC, QIC(R) [16] was proposed as a model selection criterion by mimicking the construction of AIC and modifying the Kullback–Leibler (K-L) discrepancy [12] using the quasi-likelihood function and estimators from GEE. However, the performance of QIC(R) is not consistent. In the context of linear mixed models, when the correlation within groups becomes large, QIC(R) is less likely to select the most appropriate model.
Motivated to overcome the above disadvantages concerning overfitted models, distribution assumptions, and the consistency of selection criteria for mixed model selection, we propose two model selection criteria, denoted by QAICb1 and QAICb2, based on the quasi-likelihood function and the bootstrap method for correlated data in linear mixed models, as an extension and modification of QIC(R) in Pan [16] and of AICb1 and AICb2 in Shang and Cavanaugh [19]. We apply the GEE estimator along with the bootstrap approach to compute the quasi-likelihood of the data and to estimate the bias term.
In Section 2, we present linear mixed models and the quasi-likelihood function. In Section 3, we propose the bootstrap-adjusted quasi-likelihood-based model selection criteria, denoted by QAICb1 and QAICb2; they are proved to be asymptotically unbiased estimators of the K-L discrepancy between a candidate model and the true model in the Supplemental Appendix. In Section 4, simulations in various settings are conducted to illustrate the selection performance of QAICb1 and QAICb2 for linear mixed models using both nonparametric and semiparametric bootstrap methods, along with a comparison with two other existing criteria. An application of the proposed selection criteria to the PPMI data is presented in Section 5. Section 6 concludes with a discussion.
2. Linear mixed models and quasi-likelihood
Let $N$ be the total number of observations, with $N=\sum_{i=1}^{n} n_i$; the linear mixed model for $n$ clusters takes the form of
\[ Y_i = X_i\beta + Z_i\alpha_i + \varepsilon_i, \qquad i = 1,\dots,n, \tag{1} \]
where $Y_i$ is the $n_i\times 1$ response vector, $X_i$ ($n_i\times p$) and $Z_i$ ($n_i\times q$) are the design matrices for the fixed and random effects, respectively, with $i$ as the index for the clusters, $\beta$ is a $p\times 1$ vector of fixed coefficients, $\alpha_i$ is a $q\times 1$ vector of random effects with $E(\alpha_i)=0$ and $\operatorname{Var}(\alpha_i)=\Delta$, and $\varepsilon_i$ is an $n_i\times 1$ vector of error terms with $E(\varepsilon_i)=0$ and $\operatorname{Var}(\varepsilon_i)=\sigma^2 I_{n_i}$. The matrix Δ is positive definite, and $I_{n_i}$ is an $n_i\times n_i$ identity matrix. Combining all the responses in one vector, Equation (1) can be expressed as
\[ Y = X\beta + Z\alpha + \varepsilon. \tag{2} \]
In model (2), $Y=(Y_1',\dots,Y_n')'$ is an $N\times 1$ vector of all the responses, $\alpha=(\alpha_1',\dots,\alpha_n')'$ is an $nq\times 1$ vector of random effects, $\varepsilon$ is an $N\times 1$ error vector, $X=(X_1',\dots,X_n')'$ is an $N\times p$ matrix assumed to be of full rank, and Z is an $N\times nq$ block diagonal matrix with diagonal elements $Z_1,\dots,Z_n$.
We note that even though a typical linear mixed model requires the random effects $\alpha_i$ to follow a multivariate normal distribution with mean $0$ and covariance matrix Δ, the distribution of the random effects in model (1) is not specified or is unknown in many situations, and the same holds for the distribution of the error terms $\varepsilon_i$. As quasi-likelihood functions only require the first two moments of the distribution, more flexibility exists regarding the distribution of the random effects. In other words, the quasi-score equation can be constructed once the mean and variance of $Y$ are specified.
To further simplify the notation, it is assumed that $n_i = m$ for all $i=1,\dots,n$ and $\operatorname{Var}(\varepsilon_i)=\sigma^2 I$, where $I$ is an identity matrix. In a linear mixed model, let $\mu = X\beta$ and $V = \operatorname{Var}(Y) = Z(I_n\otimes\Delta)Z' + \sigma^2 I_N$; then the response vector $Y$ has mean $\mu$ and variance-covariance matrix $V$. Thus, the log quasi-likelihood $Q(\mu;Y)$ is defined through the following differential equation:
\[ \frac{\partial Q(\mu;Y)}{\partial \mu} = V^{-1}(Y-\mu). \tag{3} \]
Note that $\mu = X\beta$, and Equation (3) can be written in terms of β by
\[ \frac{\partial Q(\beta;Y)}{\partial \beta} = X'V^{-1}(Y-X\beta). \tag{4} \]
Moreover, using Equation (4), the estimate $\hat\beta$ can be obtained by solving the following quasi-score equation:
\[ X'V^{-1}(Y-X\beta) = 0. \tag{5} \]
Note that we have
\[ E\big\{X'V^{-1}(Y-X\beta_0)\big\} = 0, \]
which means that when the first moment of the distribution is correctly specified, the root of Equation (5), $\hat\beta$, is consistent. Moreover, the robust variance estimate of $\hat\beta$ given by White [21] is
\[ \widehat{\operatorname{Var}}(\hat\beta) = (X'V^{-1}X)^{-1}X'V^{-1}\,\widehat{\operatorname{Cov}}(Y)\,V^{-1}X\,(X'V^{-1}X)^{-1}, \]
which is consistent as well, provided that the mean structure is correctly specified [22]. Note that we do not include an overdispersion parameter when constructing the quasi-likelihood function here: since the variance of $Y$ in the linear mixed model does not depend on its mean, the parameter estimation will not encounter overdispersion.
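To make the estimation steps of this section concrete, the following minimal numpy sketch solves the quasi-score equation (5) through its closed generalized-least-squares form and computes a White-type sandwich variance. The function names, the assumption of equal cluster sizes m, and the block-diagonal working covariance are illustrative choices, not part of the original development.

```python
import numpy as np

def quasi_score_fit(X, Y, V):
    """Solve the quasi-score equation (5): X' V^{-1} (Y - X beta) = 0.
    With a linear mean mu = X beta, the root has the closed form
    beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} Y."""
    Vinv_X = np.linalg.solve(V, X)              # V^{-1} X
    bread = np.linalg.inv(X.T @ Vinv_X)         # (X' V^{-1} X)^{-1}
    beta_hat = bread @ (Vinv_X.T @ Y)
    return beta_hat, bread

def sandwich_variance(X, Y, V, beta_hat, m):
    """White-type robust variance for n clusters of equal size m,
    assuming V is block diagonal (one working block per cluster).
    The 'meat' uses the empirical outer product of the cluster-level
    score contributions in place of Cov(Y)."""
    p = X.shape[1]
    bread = np.linalg.inv(X.T @ np.linalg.solve(V, X))
    r = Y - X @ beta_hat
    meat = np.zeros((p, p))
    for start in range(0, len(Y), m):
        idx = slice(start, start + m)
        Vi_inv = np.linalg.inv(V[idx, idx])     # working block of one cluster
        u = X[idx].T @ Vi_inv @ r[idx]          # cluster score contribution
        meat += np.outer(u, u)
    return bread @ meat @ bread
```

The sandwich form mirrors the robust estimator above: the bread is the model-based covariance and the meat replaces the unknown Cov(Y) with an empirical estimate, so it stays consistent even when the working correlation is misspecified.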
3. Bootstrap-adjusted quasi-likelihood information criteria: QAICb1 and QAICb2
In this section, we propose two model selection criteria, denoted by QAICb1 and QAICb2, based on the quasi-likelihood function and the bootstrap approach. We start by extending the likelihood-based K-L discrepancy [12] to a quasi-likelihood-based version; bootstrapping is then applied to establish the two selection criteria. We prove that QAICb1 and QAICb2 are asymptotically equivalent and are asymptotically unbiased estimators of the quasi-likelihood-based K-L discrepancy between the true model and a candidate model in the Supplemental Appendix. Therefore, QAICb1 and QAICb2 can serve as two criteria for mixed model selection. In fact, these two criteria can also be extended to generalized linear models with random effects in future analysis.
3.1. K-L discrepancy based on quasi-likelihood
Similar to the K-L discrepancy in Shang and Cavanaugh [19] and Cavanaugh and Shumway [4] using the likelihood function, we define the K-L discrepancy using the quasi-likelihood function. Let $\beta_0$ and β denote the parameters for the true model and a candidate model, respectively. The quasi-likelihood function corresponding to the parameters β for a candidate model is denoted by $Q(\beta,\phi;Y)$, where ϕ denotes the nuisance parameters containing all covariance parameters. Following Pan [17] and the definition of the K-L discrepancy, the K-L discrepancy based on the quasi-likelihood function between the true model and a candidate model is defined as
\[ d(\beta) = E_0\big\{-2\,Q(\beta,\phi;Y)\big\}, \]
where the expectation $E_0$ is taken under the true model. Since the quasi-likelihood shares the key properties of the likelihood function, this discrepancy reflects the distance between a fitted model and the true model, so its goal is similar to that of the discrepancy defined through the likelihood function. As discussed in the Supplemental Appendix, the discrepancy is valid for the β of each candidate model in a neighborhood of $\beta_0$, with $\beta_0$ being the local minimizer of $d(\beta)$. More importantly, an unbiased or asymptotically unbiased estimator of this discrepancy can serve as a model selection criterion, and the model minimizing the criterion value is the most appropriate one because it has the closest distance to the true model.
Let $\hat\beta$ be the estimator of β from a candidate model; here, $\hat\beta$ is derived by solving the corresponding quasi-score equation (5). Then, the corresponding discrepancy can be written as
\[ d(\hat\beta) = E_0\big\{-2\,Q(\beta,\phi;Y)\big\}\Big|_{\beta=\hat\beta}. \tag{6} \]
It is not possible to evaluate the quantity in Equation (6) directly because the parameters corresponding to the true model are usually unknown. Let $\Delta(k)$ be the expectation of the discrepancy in Equation (6) under the true model, where $k$ is the number of estimated parameters for the candidate model; we now have
\[ \Delta(k) = E_0\{d(\hat\beta)\} = E_0\big\{-2\,Q(\hat\beta,\phi;Y)\big\} + \Big[E_0\{d(\hat\beta)\} - E_0\big\{-2\,Q(\hat\beta,\phi;Y)\big\}\Big]. \tag{7} \]
According to the construction of AIC in Akaike [1], as shown in Equation (7), the quantity $-2\,Q(\hat\beta,\phi;Y)$ is a biased estimator of $\Delta(k)$, and the bias term is
\[ E_0\{d(\hat\beta)\} - E_0\big\{-2\,Q(\hat\beta,\phi;Y)\big\}. \tag{8} \]
A selection criterion based on the discrepancy in Equation (7) should serve as an unbiased or asymptotically unbiased estimator so that it can measure the distance between the true model and a candidate model. Next, we propose two selection criteria, QAICb1 and QAICb2, which own such properties, as proved in the Supplemental Appendix.
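As a quick numerical illustration of why the bias term in expression (8) behaves like $2k$, consider the ordinary Gaussian linear model with known error variance, a special case in which the quasi-likelihood coincides with the likelihood up to constants. The sketch below is only an assumption-laden toy check (the sample size, seed, design, and coefficient values are arbitrary): it compares the fitted value of $-2Q$ with its value on independent replicate data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma2, reps = 200, 5, 1.0, 2000
X = rng.standard_normal((n, k))
beta0 = np.ones(k)

def neg2q(beta, y):
    # -2 log-likelihood up to an additive constant that cancels below
    r = y - X @ beta
    return r @ r / sigma2

bias = 0.0
for _ in range(reps):
    y = X @ beta0 + rng.standard_normal(n) * np.sqrt(sigma2)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    # one fresh sample gives an unbiased draw of the discrepancy d(beta_hat)
    y_new = X @ beta0 + rng.standard_normal(n) * np.sqrt(sigma2)
    bias += (neg2q(beta_hat, y_new) - neg2q(beta_hat, y)) / reps

print(f"Monte Carlo bias: {bias:.2f}   (theoretical value 2k = {2 * k})")
```

Since $E\|Y_{\text{new}}-X\hat\beta\|^2/\sigma^2 = n+k$ while $E\|Y-X\hat\beta\|^2/\sigma^2 = n-k$, the printed bias settles near $2k=10$, which is exactly the quantity the bootstrap terms $b_1$ and $b_2$ of the next subsection are built to estimate.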
3.2. Selection criteria: QAICb1 and QAICb2
Let $Y^{*(1)},\dots,Y^{*(B)}$ be the $B$ bootstrap samples obtained from $Y$ by resampling at the individual level, and let $\hat\beta^{*(1)},\dots,\hat\beta^{*(B)}$ be the corresponding estimators from the bootstrap samples. As discussed in the Supplemental Appendix, we can replace the terms involving the original sample by the related ones from the bootstrap samples. For brevity of notation, we drop ϕ from the notation of the quasi-likelihood because, within a set of candidate models, the covariance structure is held fixed and the selection is made only over the fixed effects.
Thus, the quantity in expression (8) of the bias correction term can be expressed under the bootstrap distribution as
\[ E_*\big\{-2\,Q(\hat\beta^{*};Y)\big\} - E_*\big\{-2\,Q(\hat\beta^{*};Y^{*})\big\}, \tag{9} \]
where the expectation $E_*$ is taken with respect to the empirical distribution of the bootstrap sample $Y^*$; fortunately, the expectations in expression (9) can be estimated numerically through estimators obtained from the bootstrap samples.
Motivated by the construction of EIC in Ishiguro et al. [11] and of AICb1 and AICb2 in Shang and Cavanaugh [19], the bootstrap estimation of the expectations in expression (9) relies on a crucial assumption, which is expressed as
\[ E_*\big\{-2\,Q(\hat\beta;Y^{*})\big\} = -2\,Q(\hat\beta;Y) + o_p(1) \tag{10} \]
under the parametric, semiparametric or nonparametric bootstrap approaches. The detailed proof of the assumption in Equation (10) is provided in the Supplemental Appendix. Taking advantage of assumption (10), the first expectation in expression (9), namely the bootstrap counterpart of $E_0\{d(\hat\beta)\}$, can be expressed as
\[ E_*\big\{-2\,Q(\hat\beta^{*};Y)\big\}. \tag{11} \]
The bootstrap expectation in Equation (11) can be estimated by
\[ \frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y)\big\}. \tag{12} \]
As $B\to\infty$, we have
\[ \frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y)\big\} \longrightarrow E_*\big\{-2\,Q(\hat\beta^{*};Y)\big\} \]
according to the law of large numbers (LLN).
Similarly, we can employ the bootstrap approach to directly estimate the second expectation in expression (9) by
\[ \frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y^{*(b)})\big\}. \tag{13} \]
As $B\to\infty$, we have
\[ \frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y^{*(b)})\big\} \longrightarrow E_*\big\{-2\,Q(\hat\beta^{*};Y^{*})\big\}. \]
By utilizing the two bootstrap estimates in expressions (12) and (13), the following expression, denoted by $b_1$, is used to estimate the bias term in expression (9):
\[ b_1 = \frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y)\big\} - \frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y^{*(b)})\big\}. \tag{14} \]
In fact, the expression of $b_1$ in Equation (14) is an asymptotically unbiased estimator of the bias term in expression (8). We therefore propose the first bootstrap-adjusted quasi-likelihood information criterion, QAICb1, for the linear mixed model as
\[ \mathrm{QAICb1} = -2\,Q(\hat\beta;Y) + \frac{1}{B}\sum_{b=1}^{B}\big[-2\,Q(\hat\beta^{*(b)};Y) + 2\,Q(\hat\beta^{*(b)};Y^{*(b)})\big]. \tag{15} \]
The criterion in Equation (15) is an asymptotically unbiased estimator of the discrepancy between a candidate model and the true model in Equation (7), as proved in the Supplemental Appendix.
The second bootstrap-adjusted variant is similarly constructed, following the development of the AICb in Cavanaugh and Shumway [4] and of AICb2 in Shang and Cavanaugh [19]. The bias term in expression (8) can be written as the sum of
\[ E_0\{d(\hat\beta)\} - E_0\big\{-2\,Q(\beta_0;Y)\big\} \tag{16} \]
and
\[ E_0\big\{-2\,Q(\beta_0;Y)\big\} - E_0\big\{-2\,Q(\hat\beta;Y)\big\}. \tag{17} \]
By replacing the expectations in expressions (16) and (17) with bootstrap expectations and applying the crucial assumption in Equation (10), we have
\[ E_*\big\{-2\,Q(\hat\beta^{*};Y)\big\} - E_*\big\{-2\,Q(\hat\beta;Y^{*})\big\} \tag{18} \]
and
\[ E_*\big\{-2\,Q(\hat\beta;Y^{*})\big\} - E_*\big\{-2\,Q(\hat\beta^{*};Y^{*})\big\}. \tag{19} \]
Under certain conditions, we can show that the quantities in Equations (18) and (19) can both be approximated as follows when $B\to\infty$:
\[ \frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y)\big\} - \big\{-2\,Q(\hat\beta;Y)\big\} \longrightarrow E_*\big\{-2\,Q(\hat\beta^{*};Y)\big\} - E_*\big\{-2\,Q(\hat\beta;Y^{*})\big\} \tag{20} \]
and
\[ \frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y)\big\} - \big\{-2\,Q(\hat\beta;Y)\big\} \longrightarrow E_*\big\{-2\,Q(\hat\beta;Y^{*})\big\} - E_*\big\{-2\,Q(\hat\beta^{*};Y^{*})\big\}. \tag{21} \]
We now define $b_2$ by summing the left-hand sides of expressions (20) and (21), each of which is a mean over the bootstrap samples; then we have
\[ b_2 = 2\Big[\frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y)\big\} - \big\{-2\,Q(\hat\beta;Y)\big\}\Big]. \tag{22} \]
Because of the convergences in expressions (20) and (21) combined with Equation (22), the quantity $b_2$ in Equation (22) is used to estimate the sum of the converged parts in Equations (18) and (19), and this sum is equal to the bias term of the discrepancy in expression (9). We therefore propose the second bootstrap-adjusted quasi-likelihood information criterion, QAICb2, for the linear mixed model as
\[ \mathrm{QAICb2} = -2\,Q(\hat\beta;Y) + 2\Big[\frac{1}{B}\sum_{b=1}^{B}\big\{-2\,Q(\hat\beta^{*(b)};Y)\big\} - \big\{-2\,Q(\hat\beta;Y)\big\}\Big]. \tag{23} \]
Therefore, two model selection criteria in Equations (15) and (23) are proposed.
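For readers who want to see the whole construction in one place, here is a minimal sketch of how QAICb1 and QAICb2 in Equations (15) and (23) could be computed with a cluster-level nonparametric bootstrap. It assumes equal cluster sizes, a Gaussian-type log quasi-likelihood, and the quasi_score_fit helper from the sketch in Section 2; it illustrates the formulas displayed above and is not the authors' implementation.

```python
import numpy as np

def neg2_quasi_lik(beta, X, Y, V):
    """-2 x Gaussian-type log quasi-likelihood: (Y - X beta)' V^{-1} (Y - X beta)."""
    r = Y - X @ beta
    return r @ np.linalg.solve(V, r)

def qaicb(X, Y, V, clusters, B=250, rng=None):
    """Compute (QAICb1, QAICb2) following Equations (14), (15), (22), (23).
    clusters is a list of row-index arrays, one per subject; resampling is
    done at the subject level to preserve within-subject correlation, and
    equal cluster sizes are assumed so the working V can be reused."""
    if rng is None:
        rng = np.random.default_rng()
    beta_hat, _ = quasi_score_fit(X, Y, V)
    q_orig = neg2_quasi_lik(beta_hat, X, Y, V)

    q_star_on_Y, q_star_on_Ystar = [], []
    n = len(clusters)
    for _ in range(B):
        pick = rng.integers(0, n, size=n)           # subjects with replacement
        idx = np.concatenate([clusters[i] for i in pick])
        Xb, Yb = X[idx], Y[idx]
        beta_b, _ = quasi_score_fit(Xb, Yb, V)      # bootstrap GEE estimate
        q_star_on_Y.append(neg2_quasi_lik(beta_b, X, Y, V))
        q_star_on_Ystar.append(neg2_quasi_lik(beta_b, Xb, Yb, V))

    b1 = np.mean(q_star_on_Y) - np.mean(q_star_on_Ystar)   # Equation (14)
    b2 = 2.0 * (np.mean(q_star_on_Y) - q_orig)             # Equation (22)
    return q_orig + b1, q_orig + b2
```

In a candidate pool, this sketch would be run once per candidate model and the model with the smallest criterion value retained, exactly as described for the simulations in the next section.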
4. Simulations in linear mixed models
In this section, the performance of the selection criteria QIC(R) in Equation (24), QICu in Equation (25), QAICb1 in Equation (15), and QAICb2 in Equation (23) is compared on correlated data simulated from model (1). When nested models are utilized in the simulations, there are in total 10 successively nested candidate models, with the largest model containing all 10 covariates. Let $\beta_0$ denote the true parameters for the fixed effects. The covariates are independently generated from the standard normal distribution. The error term is generated from a normal distribution with mean 0; three standard deviations are considered, including σ = 1.3 and σ = 1.5. The sample sizes are n = 25, n = 50, n = 100, and n = 200, respectively. With each sample size, let m = 3 be the number of repeated measurements. The number of bootstrap samples B is set to 250, the minimum value of B for which stable simulation results can be obtained. The true correlation matrix is chosen as EX(ρ), AR(ρ), or a mixture of different correlation matrices with ρ = 0.2, 0.4, 0.6, 0.8, and the fitting correlation matrix is exchangeable or autoregressive.
There are three covariance structures used in generating the data: exchangeable, first-order autoregressive, and unstructured. Under these three covariance structures, the variance of the random effects is determined by specifying the correlations, and the random effects are generated from normal distributions. For example, when an exchangeable covariance structure is utilized, the random effects and error terms in model (1) are generated from normal distributions with mean zero and variance 0.25. We note that an exchangeable covariance structure is also called a compound symmetric structure or a random intercept model. When a first-order autoregressive model is utilized, the covariance structure is presented as
\[ \sigma^2\begin{pmatrix} 1 & \rho & \rho^{2} & \cdots & \rho^{m-1}\\ \rho & 1 & \rho & \cdots & \rho^{m-2}\\ \vdots & \vdots & \ddots & & \vdots\\ \rho^{m-1} & \rho^{m-2} & \cdots & \rho & 1 \end{pmatrix}. \]
If ρ and $\sigma^2$ are given, the covariance structure is determined. Such a first-order autoregressive structure serves as a linear mixed model in which the random effects and error terms are generated from a normal distribution with mean zero and this covariance structure. We note that there are also autoregressive linear mixed effects models in which the current response is regressed on the previous response, the fixed effects, and the random effects [8]. When an unstructured covariance is utilized, the data are constructed from a combination of data generated with exchangeable, autoregressive, and self-defined covariances, which will be described in the related simulation parts.
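The EX(ρ) and AR(ρ) structures used throughout the simulations are simple to build directly. The sketch below constructs both correlation matrices and draws clustered Gaussian responses under a chosen structure; the function names and the Cholesky-based sampling are illustrative choices.

```python
import numpy as np

def ex_corr(m, rho):
    """Exchangeable (compound symmetric) correlation: 1 on the diagonal,
    rho everywhere else."""
    return (1 - rho) * np.eye(m) + rho * np.ones((m, m))

def ar1_corr(m, rho):
    """First-order autoregressive correlation: entry (j, k) is rho^|j-k|."""
    j = np.arange(m)
    return rho ** np.abs(j[:, None] - j[None, :])

def simulate_clusters(n, m, beta, sigma, corr, rng):
    """Draw n clusters of m correlated Gaussian responses with mean
    X beta and within-cluster covariance sigma^2 * corr."""
    p = len(beta)
    L = np.linalg.cholesky(sigma ** 2 * corr)        # cov = L L'
    X = rng.standard_normal((n * m, p))
    eps = (rng.standard_normal((n, m)) @ L.T).ravel()
    return X, X @ beta + eps
```

A mixture setting such as the one used later can then be produced by generating some clusters with ex_corr(m, 0.5), some with ar1_corr(m, 0.5), and the rest with any other positive definite matrix.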
This section is divided into two subsections. The simulations in the first subsection use the nonparametric bootstrap, and those in the second use the semiparametric bootstrap. We investigate the model selection performance of QAICb1 and QAICb2 using a set of nested candidate models paired with different true correlations. When the nonparametric bootstrap is utilized, the performance of QAICb1 and QAICb2 is also examined using candidate models constructed from different combinations of predictor variables.
We incorporate GEE in the calculation of the proposed selection criteria QAICb1 and QAICb2 in Equations (15) and (23). Two different covariance structures are adopted: exchangeable and first-order autoregressive. They are easy to compute when GEE is utilized to estimate the model parameters under the quasi-likelihood setting. However, when the number of observations within the same individual is large, it is very challenging to estimate the parameters because of the high dimensionality, under which McCullagh and Nelder [15] pointed out that the quasi-likelihood function may not exist unless certain requirements are met. In the process of selecting the most appropriate model from a candidate pool, the model with the smallest value of QAICb1 or QAICb2 is considered the best model. We now present a selection criterion for comparison in the simulations. Pan [16] proposed an information criterion based on the quasi-likelihood function for a working correlation R, QIC(R), as
\[ \mathrm{QIC}(R) = -2\,Q(\hat\beta,\hat\phi;I,Y) + 2\,\mathrm{trace}\big(\widehat{\Sigma}\,\widehat{J}\big), \tag{24} \]
where $\hat\phi$ is the estimate of ϕ based on the largest candidate model, and $Q(\hat\beta,\hat\phi;I,Y)$ denotes the quasi-likelihood evaluated under the independence working correlation. In addition, $J$ is the covariance of $\hat\beta$ and can be estimated by the robust or sandwich covariance estimator $\widehat J$. Additionally, $\widehat\Sigma$ is the estimate of Σ, and to estimate $J$ and Σ, we have the following properties:
\[ \Sigma = -E\!\left[\frac{\partial^{2} Q(\beta;I,Y)}{\partial\beta\,\partial\beta'}\right], \qquad J = \operatorname{Var}(\hat\beta). \]
We note that Σ can be consistently estimated by its empirical estimator $\widehat\Sigma$. QIC(R) in Equation (24) is used to select the most appropriate mixed model by fitting the candidate models and selecting the one with the smallest QIC(R) value. QIC(R) is partially built under the independence assumption: it treats all the within-individual observations as mutually independent when evaluating the quasi-likelihood. The parameters in QIC(R) are estimated by the GEE approach, so that QIC(R) is distribution-free compared to other AIC-type selection criteria.
Before going deeper into the performance of the proposed criteria QAICb1 and QAICb2, we introduce another criterion defined in Pan [16], denoted by QICu. Given a GEE estimator $\hat\beta$ of β, QICu is expressed as
\[ \mathrm{QICu} = -2\,Q(\hat\beta,\hat\phi;Y) + 2k. \tag{25} \]
Note that $k$ is the number of parameters to be estimated. It has been found in the simulations that QICu is more efficient than QIC(R) when the correlation of the data is relatively large. A possible reason would be the use of the bias correction term $2k$ for QICu. Notice that the term $2k$ is only associated with the dimension of the candidate model and is independent of the correlation structure. As there exist similarities between the log-likelihood and log quasi-likelihood functions, especially when normal models are used, QICu shares similar properties with AIC, and it is also the asymptotic version of QIC(R). Unlike QIC(R), which tends to underestimate the corresponding discrepancy when the correlation is large, the bootstrap-adjusted criteria QAICb1 and QAICb2 perform better in model selection across different correlation structures.
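A hedged computational sketch of the two comparison criteria follows. It treats the Gaussian working variance sigma2 as known, evaluates the quasi-likelihood part of QIC(R) under the independence working correlation as described above, and expects a robust covariance estimate J_hat such as the sandwich_variance output from the Section 2 sketch; it mirrors Pan's construction only schematically.

```python
import numpy as np

def qic_r(X, Y, beta_hat, J_hat, sigma2=1.0):
    """QIC(R) following Equation (24): -2Q under the independence working
    correlation plus 2 * trace(Sigma_hat @ J_hat), where Sigma_hat = X'X/sigma2
    is the model-based information under independence and J_hat is a robust
    covariance estimate of beta_hat."""
    r = Y - X @ beta_hat
    neg2q = r @ r / sigma2
    Sigma_hat = X.T @ X / sigma2
    return neg2q + 2.0 * np.trace(Sigma_hat @ J_hat)

def qic_u(X, Y, beta_hat, sigma2=1.0):
    """QICu following Equation (25): -2Q(beta_hat) + 2k."""
    r = Y - X @ beta_hat
    return r @ r / sigma2 + 2 * X.shape[1]
```

When the working correlation is correctly specified, trace(Sigma_hat @ J_hat) is close to k, which is how QICu arises as the asymptotic version of QIC(R) noted above.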
4.1. Simulations via nonparametric bootstrap approach
We first conduct simulations using the nonparametric approach. In real data sets, it is usually unknown whether the correlation matrix is correctly specified. Therefore, the simulations are conducted in settings where the true correlation matrix is not correctly specified, meaning that the true correlation and the fitting correlation matrices are distinct, to investigate the performance of QAICb1 and QAICb2 in model selection. In the first setting, EX(ρ) is the true correlation and autoregressive (AR(ρ)) is the fitting correlation. The second setting is the opposite, with AR(ρ) being the true correlation and exchangeable (EX(ρ)) being the fitting correlation.
4.1.1. True correlation is not correctly specified
Tables 1–3 feature the simulation results corresponding to three different true variances. A clear trend is that, regardless of the correlation coefficient ρ and the standard deviation σ (or variance σ²), as the sample size n increases, the rate of selecting the true model becomes larger; that is, the performance of model selection improves, indicating the consistency of the proposed selection criteria QAICb1 and QAICb2. The criterion QIC(R) performs poorly in selecting the correct model and is less effective than the others. As the correlation ρ increases, its selection performance becomes worse.
Table 1.
True correlation | EX(ρ) | AR(ρ) | |||||||
---|---|---|---|---|---|---|---|---|---|
Fitting correlation | Autoregressive | Exchangeable | |||||||
n | ρ | 0.2 | 0.4 | 0.6 | 0.8 | 0.2 | 0.4 | 0.6 | 0.8 |
25 | QIC(R) | 50.4 | 50.1 | 49.7 | 44.1 | 54.0 | 48.8 | 47.1 | 43.3 |
QICu | 74.0 | 76.8 | 82.7 | 90.5 | 76.7 | 74.5 | 80.8 | 86.6 |
QAICb1 | 61.9 | 66.1 | 72.0 | 80.3 | 65.0 | 63.7 | 70.4 | 76.2 | |
QAICb2 | 67.4 | 71.2 | 75.5 | 83.1 | 70.0 | 68.9 | 74.9 | 79.3 | |
50 | QIC(R) | 60.3 | 60.0 | 56.3 | 49.8 | 63.4 | 62.1 | 57.4 | 49.8 |
QICu | 71.9 | 78.5 | 81.8 | 91.5 | 74.4 | 77.4 | 81.6 | 88.9 |
QAICb1 | 66.0 | 72.5 | 74.4 | 86.5 | 69.6 | 72.0 | 76.0 | 83.3 | |
QAICb2 | 68.1 | 74.9 | 76.8 | 88.1 | 72.0 | 73.9 | 77.4 | 84.7 | |
100 | QIC(R) | 66.1 | 63.5 | 60.8 | 55.3 | 66.1 | 64.8 | 56.9 | 49.9 |
QICu | 75.0 | 77.4 | 84.3 | 91.6 | 73.9 | 76.7 | 80.9 | 90.8 |
QAICb1 | 70.6 | 72.4 | 80.5 | 88.9 | 69.1 | 73.8 | 77.2 | 88.7 | |
QAICb2 | 71.4 | 74.7 | 81.9 | 89.3 | 70.2 | 74.8 | 78.3 | 88.8 | |
200 | QIC(R) | 70.2 | 66.6 | 67.8 | 58.0 | 70.3 | 65.6 | 62.7 | 55.0 |
QICu | 75.6 | 77.1 | 85.7 | 92.1 | 74.9 | 77.0 | 84.0 | 91.0 |
QAICb1 | 72.8 | 75.1 | 83.6 | 90.4 | 72.8 | 74.8 | 82.1 | 88.7 | |
QAICb2 | 73.5 | 75.9 | 84.7 | 90.7 | 73.2 | 75.6 | 82.1 | 89.5 |
Table 3.
True correlation | EX(ρ) | AR(ρ) | |||||||
---|---|---|---|---|---|---|---|---|---|
Fitting correlation | Autoregressive | Exchangeable | |||||||
n | ρ | 0.2 | 0.4 | 0.6 | 0.8 | 0.2 | 0.4 | 0.6 | 0.8 |
25 | QIC(R) | 48.4 | 47.0 | 48.5 | 41.2 | 48.8 | 48.7 | 44.9 | 40.8 |
QICu | 34.7 | 38.9 | 51.3 | 61.5 | 32.8 | 37.7 | 45.5 | 59.7 |
QAICb1 | 59.5 | 60.6 | 64.5 | 69.7 | 59.2 | 60.4 | 61.5 | 70.6 | |
QAICb2 | 63.1 | 63.8 | 67.8 | 71.5 | 62.4 | 64.5 | 65.8 | 73.8 | |
50 | QIC(R) | 59.0 | 61.8 | 57.0 | 51.5 | 60.6 | 59.3 | 55.2 | 46.4 |
QICu | 33.0 | 41.8 | 52.6 | 68.4 | 31.7 | 36.8 | 46.7 | 62.2 |
QAICb1 | 64.8 | 71.3 | 75.8 | 83.4 | 66.9 | 69.6 | 74.2 | 81.4 | |
QAICb2 | 67.3 | 73.8 | 77.0 | 85.9 | 69.7 | 72.1 | 76.3 | 82.7 | |
100 | QIC(R) | 66.0 | 64.3 | 63.3 | 54.7 | 67.0 | 63.6 | 60.4 | 53.3 |
QICu | 30.9 | 40.2 | 54.0 | 68.6 | 31.9 | 38.1 | 48.1 | 63.1 |
QAICb1 | 70.0 | 75.7 | 80.5 | 86.7 | 69.2 | 71.5 | 79.9 | 86.4 | |
QAICb2 | 70.3 | 76.8 | 81.8 | 87.2 | 70.9 | 73.9 | 79.9 | 87.3 | |
200 | QIC(R) | 67.4 | 66.8 | 65.7 | 55.4 | 68.9 | 64.1 | 65.1 | 52.0 |
QICu | 31.7 | 38.2 | 53.4 | 70.1 | 30.8 | 34.7 | 48.5 | 62.9 |
QAICb1 | 70.2 | 76.9 | 83.2 | 91.4 | 71.5 | 71.1 | 82.1 | 87.7 | |
QAICb2 | 71.2 | 76.8 | 84.3 | 91.2 | 72.5 | 71.1 | 82.3 | 88.2 |
The selection criterion QICu is not consistent across different values of ρ, although it performs better as ρ becomes large. Since QICu uses $2k$ to estimate the bias correction term, the value of ρ only affects the quasi-likelihood part, because $2k$ is associated not with the correlation structure but with the dimension of the candidate model. Under the circumstances where ρ is small and σ is large, $2k$ is not able to penalize the quasi-likelihood enough for satisfactory selection. The size of the variance heavily affects the effectiveness of QICu. As the sample size n increases, the correct selection rates tend to increase as the correlation grows. So QICu performs better than the other criteria when ρ is large and σ is small.
We observe that the selection criteria QAICb1 and QAICb2 generally outperform QICu and QIC(R), and QAICb2 behaves best among the criteria in the tables. With the escalation of ρ, the performance of QAICb1 and QAICb2 in model selection becomes markedly better. The performance of QAICb2 is preferred over that of QAICb1.
The proposed selection criteria QAICb1 and QAICb2 are much more effective than the other two across different sample sizes. For example, in Table 2 with an exchangeable fitting correlation, ρ = 0.8, and n = 100, QIC(R) and QICu have selection rates of 54.5% and 76.6%, respectively, while QAICb1 and QAICb2 have selection rates close to 90%. Even with a small sample size of n = 25, their selection rates of 72.9% and 75.8% are better than those of the other two criteria.
Table 2.
True correlation | EX(ρ) | AR(ρ) | |||||||
---|---|---|---|---|---|---|---|---|---|
Fitting correlation | Autoregressive | Exchangeable | |||||||
n | ρ | 0.2 | 0.4 | 0.6 | 0.8 | 0.2 | 0.4 | 0.6 | 0.8 |
25 | QIC(R) | 50.4 | 49.3 | 46.1 | 43.3 | 50.0 | 47.5 | 44.6 | 39.2 |
QICu | 46.7 | 52.5 | 59.3 | 75.5 | 45.9 | 48.6 | 55.8 | 69.5 |
QAICb1 | 59.0 | 62.0 | 66.2 | 77.0 | 62.2 | 63.4 | 63.0 | 72.9 | |
QAICb2 | 64.6 | 62.7 | 70.0 | 79.3 | 66.8 | 67.8 | 67.3 | 75.8 | |
Run-time | 8.76 seconds | 9.74 seconds | |||||||
50 | QIC(R) | 59.3 | 59.3 | 53.0 | 49.2 | 62.5 | 60.8 | 56.9 | 48.0 |
QICu | 42.5 | 53.7 | 59.0 | 77.9 | 48.3 | 54.3 | 60.2 | 72.9 |
QAICb1 | 64.9 | 70.8 | 75.0 | 85.3 | 68.1 | 70.6 | 75.9 | 82.9 | |
QAICb2 | 67.4 | 73.1 | 78.5 | 85.8 | 70.8 | 72.8 | 78.0 | 83.7 | |
Run-time | 10.98 seconds | 12.44 seconds | |||||||
100 | QIC(R) | 70.1 | 64.0 | 62.0 | 54.8 | 67.3 | 64.9 | 58.2 | 54.5 |
QICu | 47.9 | 51.5 | 65.0 | 78.4 | 46.0 | 51.9 | 58.9 | 76.6 |
QAICb1 | 72.5 | 72.8 | 82.0 | 89.4 | 69.5 | 72.5 | 76.7 | 88.3 | |
QAICb2 | 73.6 | 74.5 | 83.5 | 90.0 | 71.5 | 73.1 | 78.7 | 88.9 | |
Run-time | 17.82 seconds | 16.47 seconds | |||||||
200 | QIC(R) | 70.1 | 65.3 | 67.7 | 57.7 | 70.4 | 68.0 | 62.1 | 54.6 |
QICu | 46.5 | 51.5 | 67.4 | 80.0 | 45.5 | 52.7 | 60.8 | 76.1 |
QAICb1 | 72.9 | 74.1 | 83.2 | 91.5 | 72.4 | 76.2 | 78.5 | 89.0 | |
QAICb2 | 73.9 | 74.3 | 83.7 | 91.6 | 72.5 | 76.7 | 79.2 | 89.5 | |
Run-Time | 25.12 seconds | 29.74 seconds |
The selection criteria QAICb1 and QAICb2 not only share with QICu the property of being relatively consistent across different correlation coefficients, but also remain effective across various sample sizes and noise levels σ. Tables 1–3 show that when the sample size is large, QAICb1 and QAICb2 are much more effective than QICu, especially for larger variances.
The overall performance of QAICb1 and QAICb2 significantly outperforms the other two criteria, and generally QAICb2 outperforms QAICb1, which means the two proposed criteria are appropriate for mixed model selection.
We select the setting of Table 2 to record the running times of the programs computing the simulation results. We can see that the running speed is generally fast for the simulations via the nonparametric bootstrap when the true correlation is not correctly specified.
4.1.2. Discussion on selecting overfitted candidate models
This section conducts simulations for selecting nested models, with the true covariance being a mixture of several covariance matrices, to show that QAICb1 and QAICb2 are less likely to select overfitted candidate models. The data are generated using a mixture of true covariance structures, with a portion of the observations from EX(0.5), a portion from AR(0.5), and the rest from a self-defined matrix.
Tables 4–6 feature the results of selecting among the nested models, with the largest model containing 10 predictors; the 10 candidate models are denoted by M1, M2,…, M10. M4 is the true model, M5 to M10 are the overfitted models, and M1 to M3 are the underfitted models. The selection counts are recorded over 1000 repetitions.
Table 4.
True correlation: EX(0.6) | Fitting correlation: exchangeable | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Criteria | M1 | M2 | M3 | TRUE | M5 | M6 | M7 | M8 | M9 | M10 |
QIC(R) | 0 | 0 | 0 | 587 | 119 | 77 | 69 | 43 | 53 | 52 |
QICu | 0 | 0 | 0 | 656 | 114 | 75 | 51 | 37 | 43 | 24 |
QAICb1 | 0 | 0 | 0 | 818 | 74 | 48 | 27 | 9 | 17 | 7 |
QAICb2 | 0 | 0 | 0 | 828 | 70 | 52 | 22 | 7 | 15 | 6 |
Run-time | 15.99 seconds |
Table 6.
True correlation: mixture | Fitting correlation: autoregressive | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Criteria | M1 | M2 | M3 | TRUE | M5 | M6 | M7 | M8 | M9 | M10 |
QIC(R) | 0 | 0 | 0 | 581 | 133 | 81 | 46 | 55 | 46 | 58 |
QICu | 0 | 0 | 0 | 640 | 130 | 78 | 39 | 38 | 35 | 40 |
QAICb1 | 0 | 0 | 0 | 806 | 116 | 42 | 12 | 8 | 6 | 10 |
QAICb2 | 0 | 0 | 0 | 816 | 114 | 38 | 10 | 7 | 7 | 8 |
Run-time | 13.37 seconds |
The results in Tables 4–6 show that all four criteria have similar selection patterns in that they are more likely to choose overfitted candidate models than underfitted ones. However, QAICb1 and QAICb2 significantly reduce the chance of selecting overfitted candidate models. In other words, the true model selection rates of QAICb1 and QAICb2 are much higher than those of QIC(R) and QICu because they avoid selecting overfitted models. Focusing on Table 5, the percentages of choosing M5 as the final model for QIC(R) and QICu are 13.0% and 13.3%, respectively, while the percentages for QAICb1 and QAICb2 are 10.6% and 10.0%, a noticeable decrease of around 3% on just one overfitted model.
Table 5.
True correlation: AR(0.6) | Fitting correlation: exchangeable | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Criteria | M1 | M2 | M3 | TRUE | M5 | M6 | M7 | M8 | M9 | M10 |
QIC(R) | 0 | 0 | 0 | 582 | 130 | 83 | 65 | 55 | 48 | 37 |
QICu | 0 | 0 | 0 | 589 | 133 | 86 | 64 | 49 | 43 | 36 |
QAICb1 | 0 | 0 | 0 | 767 | 106 | 54 | 30 | 16 | 19 | 8 |
QAICb2 | 0 | 0 | 0 | 787 | 100 | 45 | 31 | 15 | 17 | 5 |
Run-time | 16.03 seconds |
The simulation results in Table 6 demonstrate that, when the true covariance is a mixture of different covariance matrices, QAICb1 and QAICb2 are even less likely to select highly overfitted candidate models compared to the results in the previous two tables. Their selection rates for candidate models M8 to M10 are all at most 1%, lower than the corresponding rates in the previous two tables.
The simulation results demonstrate that QAICb1 and QAICb2 are more likely to avoid overfitted candidate models than QIC(R) and QICu, along with delivering more consistent performance. As a result, highly overfitted candidate models will rarely be chosen as the final model via QAICb1 and QAICb2.
We record the running times in Tables 4 – 6, and the results can be obtained within several seconds.
4.1.3. Simulations for candidate models with combinations of predictors
In this section, we use models with different combinations of predictor variables to select the most appropriate model. We have 5 different covariates independently generated from the standard normal distribution, with the true model parameters set for the fixed effects. Always keeping the intercept, the different combinations of models are summarized in Table 7.
Table 7.
Model | Covariates | Model | Covariates |
---|---|---|---|
M1 | , | M5 | , , , |
M2 | , , | M6 | , , , |
M3 | , , | M7 | , , , |
M4 (True) | , , | M8 | , , , , |
We evaluate the performance of the selection criteria with different sample sizes n = 25, n = 50, n = 100, and n = 200 and a fixed standard deviation σ. The true correlation matrix is set as EX(ρ) or AR(ρ) with ρ = 0.2, 0.4, 0.6, 0.8. Again, the number of bootstrap samples B is chosen to be 250 in all settings. For the mixed true covariance structure, a portion of the observations is from EX(0.5), another portion comes from AR(0.5), and the rest are from a self-defined matrix.
Tables 8–10 feature the selection rates for the proposed criteria and the other two criteria used for comparison. We observe that the selection rates of QIC(R) decrease as the correlation coefficient ρ increases, while QICu, QAICb1, and QAICb2 have higher selection rates with relatively larger ρ. Across all the simulation settings, the performance of QAICb1 and QAICb2 is almost the same and is generally much better than that of QICu, whose selection rates are noticeably lower than those of QAICb1 and QAICb2.
Table 8.
True correlation | EX(ρ) | AR(ρ) | |||||||
---|---|---|---|---|---|---|---|---|---|
Fitting correlation | Exchangeable | Autoregressive | |||||||
n | ρ | 0.2 | 0.4 | 0.6 | 0.8 | 0.2 | 0.4 | 0.6 | 0.8 |
25 | QIC(R) | 57.3 | 52.4 | 53.6 | 48.6 | 56.6 | 55.9 | 52.2 | 44.0 |
QICu | 53.8 | 53.2 | 64.4 | 72.5 | 53.3 | 56.5 | 61.3 | 70.1 |
QAICb1 | 63.4 | 60.6 | 68.2 | 72.6 | 62.1 | 64.5 | 66.3 | 71.1 | |
QAICb2 | 66.3 | 62.8 | 69.9 | 74.0 | 64.4 | 67.6 | 68.7 | 72.4 | |
50 | QIC(R) | 66.1 | 60.3 | 59.8 | 53.4 | 64.3 | 61.2 | 60.1 | 51.9 |
QICu | 54.5 | 56.9 | 65.7 | 78.3 | 52.3 | 55.0 | 61.6 | 74.0 |
QAICb1 | 68.7 | 70.0 | 75.0 | 84.6 | 67.1 | 66.8 | 74.2 | 81.5 | |
QAICb2 | 71.0 | 70.6 | 75.5 | 85.6 | 68.2 | 67.7 | 75.8 | 82.2 | |
100 | QIC(R) | 68.5 | 64.3 | 60.7 | 53.6 | 67.8 | 67.2 | 59.5 | 56.8 |
QICu | 56.5 | 58.5 | 65.7 | 77.2 | 54.6 | 59.8 | 62.8 | 77.3 |
QAICb1 | 70.4 | 72.3 | 77.0 | 86.2 | 69.9 | 73.5 | 75.2 | 85.0 | |
QAICb2 | 71.0 | 71.9 | 77.7 | 87.3 | 71.1 | 74.5 | 75.9 | 85.5 | |
200 | QIC(R) | 66.5 | 62.9 | 59.9 | 55.4 | 69.5 | 68.2 | 63.4 | 56.1 |
QICu | 51.9 | 56.4 | 63.8 | 76.2 | 54.5 | 58.3 | 62.4 | 70.8 |
QAICb1 | 70.5 | 71.1 | 77.6 | 87.2 | 70.3 | 73.4 | 76.0 | 83.8 | |
QAICb2 | 70.0 | 71.3 | 78.2 | 87.5 | 71.6 | 73.2 | 75.9 | 83.7 |
Table 10.
True correlation | Mixture | Mixture | ||||
---|---|---|---|---|---|---|
Fitting correlation | Exchangeable | Autoregressive | ||||
n | 50 | 100 | 200 | 50 | 100 | 200 |
QIC(R) | 63.1 | 63.6 | 64.5 | 61.1 | 61.5 | 62.2 |
QICu | 62.7 | 62.6 | 62.2 | 64.6 | 63.6 | 62.7 |
QAICb1 | 74.7 | 74.7 | 75.9 | 73.5 | 78.8 | 79.0 |
QAICb2 | 75.5 | 77.4 | 76.6 | 74.5 | 78.8 | 79.2 |
Run-time (s) | 7.98 | 10.73 | 16.29 | 6.81 | 9.49 | 14.97 |
The simulation results show that the selection effectiveness of QAICb1 and QAICb2 improves as the sample size increases, especially in the setting where the true covariance is a mixture. Furthermore, once the sample size is large enough, starting from around n = 50, the true model selection rates are hardly impacted by the sample size. As a result, QAICb1 and QAICb2 perform consistently with smaller n and are therefore more capable than QIC(R) and QICu in small-sample model selection.
In addition to the consistency of QAICb1 and QAICb2 over different sample sizes, the impact of the true correlation coefficient ρ is not as large as expected. In Table 9, when n = 25 and the fitting correlation is autoregressive, the selection rates of QICu under ρ = 0.2 and ρ = 0.8 are 54.0% and 69.4%, respectively, a difference of 15.4%, while the corresponding rate differences for QAICb1 and QAICb2 are both around 7.5%.
Table 9.
True correlation | EX(ρ) | AR(ρ) | |||||||
---|---|---|---|---|---|---|---|---|---|
Fitting correlation | Autoregressive | Exchangeable | |||||||
n | ρ | 0.2 | 0.4 | 0.6 | 0.8 | 0.2 | 0.4 | 0.6 | 0.8 |
25 | QIC(R) | 57.8 | 55.8 | 55.8 | 48.0 | 58.0 | 56.0 | 50.9 | 50.5 |
QICu | 54.0 | 56.7 | 56.7 | 69.4 | 51.6 | 55.8 | 55.1 | 68.4 |
QAICb1 | 63.0 | 64.8 | 64.8 | 70.4 | 63.3 | 63.4 | 61.2 | 70.8 | |
QAICb2 | 65.3 | 67.3 | 67.3 | 72.8 | 65.7 | 65.5 | 63.5 | 72.9 | |
50 | QIC(R) | 64.7 | 63.4 | 59.3 | 53.2 | 65.1 | 63.6 | 61.2 | 55.8 |
QICu | 54.9 | 60.1 | 66.2 | 76.6 | 53.5 | 56.6 | 63.6 | 71.8 |
QAICb1 | 68.2 | 71.2 | 74.1 | 83.2 | 66.9 | 69.3 | 73.0 | 78.4 | |
QAICb2 | 68.8 | 71.8 | 75.6 | 84.3 | 68.5 | 71.1 | 74.4 | 80.0 | |
100 | QIC(R) | 66.7 | 68.2 | 61.6 | 57.7 | 67.0 | 64.8 | 62.3 | 57.1 |
QICu | 50.8 | 60.2 | 63.3 | 76.5 | 54.3 | 56.4 | 62.1 | 73.6 |
QAICb1 | 68.0 | 74.3 | 75.9 | 84.5 | 68.2 | 70.7 | 75.4 | 83.0 | |
QAICb2 | 69.2 | 74.9 | 76.3 | 85.0 | 69.2 | 71.9 | 76.1 | 83.7 | |
200 | QIC(R) | 67.4 | 66.4 | 63.3 | 60.5 | 69.3 | 65.5 | 66.5 | 56.4 |
QICu | 51.5 | 58.1 | 63.9 | 74.2 | 52.5 | 55.0 | 65.0 | 74.2 |
QAICb1 | 68.6 | 71.5 | 76.8 | 85.6 | 71.8 | 71.2 | 78.5 | 86.2 | |
QAICb2 | 69.1 | 71.7 | 77.2 | 86.4 | 71.8 | 71.5 | 78.2 | 86.6 |
Therefore, the simulation results demonstrate that QAICb1 and QAICb2 have outstanding selection performance with respect to both the sample size and the correlation coefficient. More importantly, the proposed selection criteria behave effectively in small-sample settings.
We select the settings with variable combinations in Table 10 to record the running times. We can see that the running speed is fairly fast, which indicates that the proposed selection criteria are applicable and feasible with respect to computational time.
4.2. Simulations via semiparametric bootstrap approach
The semiparametric bootstrap is utilized to construct the bootstrap samples. It involves sampling with replacement from the residuals obtained, under each candidate model, by subtracting from the responses the mean estimated by a parametric method. In what follows, the simulation results are presented and discussed in settings similar to those for the nonparametric method. The sample size n is chosen to be 15, 25, 35, and 50.
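The following sketch generates one semiparametric bootstrap sample in the way just described: the mean is estimated parametrically under the candidate model, and whole cluster-level residual vectors are then resampled with replacement and added back to the fitted means. Equal cluster sizes and the quasi_score_fit helper from the Section 2 sketch are assumed.

```python
import numpy as np

def semiparametric_sample(X, Y, V, clusters, rng):
    """One semiparametric bootstrap sample: parametric mean fit plus
    resampled cluster-level residual vectors. Equal cluster sizes are
    assumed so that residual vectors are exchangeable across clusters."""
    beta_hat, _ = quasi_score_fit(X, Y, V)      # parametric mean estimate
    resid = Y - X @ beta_hat
    pick = rng.integers(0, len(clusters), size=len(clusters))
    Y_star = np.empty_like(Y)
    for target, j in zip(clusters, pick):
        Y_star[target] = X[target] @ beta_hat + resid[clusters[j]]
    return Y_star
```

Because the resampled residuals come from the candidate model fit, any bias in that candidate model is carried into Y_star, which is the mechanism behind the behavior of QAICb1 discussed in Section 4.2.2.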
4.2.1. Linear mixed models involving different correlation structures
This section presents simulation results when the sample is constructed from observations with a mixture of three different covariance structures. That is, a portion of the observations is from EX(0.5), another portion comes from AR(0.5), and the rest are from a self-defined matrix.
Both the exchangeable and autoregressive fitting correlations are used to fit candidate models and three values of σ are considered.
The simulation results are summarized in Tables 11–13. Firstly, as the sample size n increases, the selection rates of all four selection criteria go up. When both n and σ are small, QICu performs better than the other three selection criteria in selecting the correct model. In Table 11, when n = 25 and the fitting correlation is exchangeable, the true model selection rate of QICu (77.2%) is higher than that of QAICb2 (62.3%). When n is 50, the selection rate of QAICb2 (74.8%) is still lower than that of QICu (79.0%). In contrast, when n = 50 with the exchangeable fitting correlation, Table 13 shows that QICu has a true model selection rate of only 43.8%, while QAICb2 achieves a much higher rate of 75.2%. With the increase of n, QAICb2 becomes more and more effective in selecting the best model; especially in the cases where σ is relatively large (1.3 or 1.5), QAICb2 has noticeably higher true model selection rates than QIC(R) and QICu.
Table 11.
True correlation | Mixture | Mixture | ||||||
---|---|---|---|---|---|---|---|---|
Fitting correlation | Exchangeable | Autoregressive | ||||||
n | 15 | 25 | 35 | 50 | 15 | 25 | 35 | 50 |
QIC(R) | 37.5 | 49.4 | 54.3 | 58.0 | 42.4 | 50.6 | 54.3 | 59.8 |
QICu | 73.8 | 77.2 | 80.8 | 79.0 | 75.2 | 76.7 | 77.7 | 78.8 |
QAICb1 | 43.8 | 53.6 | 57.5 | 58.8 | 47.2 | 53.0 | 55.5 | 55.9 |
QAICb2 | 49.4 | 62.3 | 72.6 | 74.8 | 52.5 | 61.1 | 70.3 | 76.2 |
Run-time (s) | 18.10 | 20.21 | 25.08 | 29.34 | 17.29 | 19.84 | 24.29 | 28.14 |
Table 13.
True correlation | Mixture | Mixture | ||||||
---|---|---|---|---|---|---|---|---|
Fitting correlation | Exchangeable | Autoregressive | ||||||
n | 15 | 25 | 35 | 50 | 15 | 25 | 35 | 50 |
QIC(R) | 31.9 | 45.4 | 53.6 | 58.5 | 35.6 | 44.8 | 56.0 | 59.1 |
QICu | 37.4 | 40.6 | 42.4 | 43.8 | 39.5 | 40.9 | 43.6 | 44.1 |
QAICb1 | 37.5 | 46.8 | 54.3 | 57.2 | 40.7 | 49.4 | 54.5 | 54.7 |
QAICb2 | 40.6 | 56.2 | 66.8 | 75.2 | 42.9 | 58.0 | 66.8 | 72.3 |
Table 12.
True correlation | Mixture | Mixture | ||||||
---|---|---|---|---|---|---|---|---|
Fitting correlation | Exchangeable | Autoregressive | ||||||
n | 15 | 25 | 35 | 50 | 15 | 25 | 35 | 50 |
QIC(R) | 35.6 | 48.3 | 55.2 | 58.0 | 35.5 | 47.4 | 57.3 | 56.9 |
QICu | 49.8 | 56.7 | 59.8 | 55.3 | 49.0 | 54.9 | 56.9 | 55.3 |
QAICb1 | 39.9 | 52.0 | 56.5 | 57.8 | 39.4 | 50.0 | 56.0 | 58.1 |
QAICb2 | 45.1 | 59.1 | 69.7 | 74.2 | 44.0 | 60.6 | 69.6 | 72.7 |
Secondly, in general, QAICb2 has the best overall selection performance, while QAICb1 is not as effective as QAICb2 across the model settings. When the true correlation is a mixture of several correlation structures and the semiparametric bootstrap is utilized, QAICb2 has the most consistent and effective overall selection performance among the criteria. Possibly affected by the bias introduced by overfitted candidate models, the performance of QAICb1 is not optimal compared to the results from the nonparametric bootstrap.
We remark that the program running times under the semiparametric bootstrap are slightly longer than those for the nonparametric bootstrap in the selected simulation settings, as shown in Table 11, but they remain sufficiently short.
4.2.2. Discussion on selecting overfitted candidate models
In this section, we intend to show that QAICb2 is less likely to select overfitted candidate models while QAICb1 is more likely to do so, which explains the nonoptimal performance of QAICb1. The simulation results are presented in Tables 14–16.
Table 14.
True correlation: EX(0.6) | Fitting correlation: exchangeable | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Criteria | M1 | M2 | M3 | TRUE | M5 | M6 | M7 | M8 | M9 | M10 |
QIC(R) | 0 | 0 | 0 | 576 | 132 | 81 | 70 | 42 | 56 | 43 |
QICu | 0 | 0 | 0 | 648 | 129 | 68 | 58 | 33 | 39 | 25 |
QAICb1 | 0 | 0 | 0 | 535 | 247 | 111 | 52 | 27 | 14 | 14 |
QAICb2 | 0 | 0 | 0 | 849 | 103 | 19 | 17 | 4 | 6 | 2 |
Table 16.
True correlation: mixture | Fitting correlation: exchangeable | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Criteria | M1 | M2 | M3 | TRUE | M5 | M6 | M7 | M8 | M9 | M10 |
QIC(R) | 0 | 0 | 0 | 627 | 110 | 66 | 53 | 57 | 48 | 39 |
QICu | 0 | 0 | 0 | 584 | 118 | 75 | 56 | 66 | 50 | 51 |
QAICb1 | 0 | 0 | 0 | 529 | 236 | 94 | 56 | 43 | 23 | 19 |
QAICb2 | 0 | 0 | 0 | 811 | 97 | 40 | 20 | 19 | 8 | 5 |
From the three tables, we can see that none of the four selection criteria select candidate models smaller than the true model, and QAICb2 owns the highest true model selection rates by avoiding relatively large candidate models. In Table 14, the selection rates of M6 for QIC(R) and QICu are 8.1% and 6.8%, respectively, while the corresponding rate for QAICb2 is 1.9%; QAICb2 rarely chooses the highly overfitted candidate models M8 to M10. A similar pattern can be found in Tables 15 and 16, where the selection of candidate model M5 is largely avoided by QAICb2. Tables 14–16 also demonstrate that QAICb1 is not as effective as QAICb2 because QAICb1 tends to choose relatively larger models more often than QAICb2.
Table 15.
True correlation: AR(0.6) | Fitting correlation: autoregressive | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Criteria | M1 | M2 | M3 | TRUE | M5 | M6 | M7 | M8 | M9 | M10 |
QIC(R) | 0 | 0 | 0 | 599 | 113 | 91 | 68 | 51 | 40 | 38 |
QICu | 0 | 0 | 0 | 654 | 112 | 82 | 57 | 42 | 30 | 23 |
QAICb1 | 0 | 0 | 0 | 550 | 237 | 104 | 60 | 25 | 13 | 11 |
QAICb2 | 0 | 0 | 0 | 875 | 68 | 30 | 20 | 5 | 2 | 0 |
In the semiparametric bootstrap process, the resampling accommodates the residuals obtained from the candidate model, so the bootstrap estimate $b_1$ in Equation (14) relies on the candidate model: both averages in Equation (14) are calculated using GEE estimators from the bootstrap samples, and the second average additionally utilizes the bootstrap responses $Y^{*(b)}$ themselves. Because the value of QAICb1 heavily depends on the semiparametric bootstrap samples, if a candidate model is biased, the estimate in Equation (14) tends to be biased.
5. Application
To further investigate the performance of the proposed two criteria in mixed model selection, an application in Parkinson's Progression Markers Initiative (PPMI) data is conducted in this section. Within the application, we construct the candidate models first and then apply QAICb1 and QAICb2 to obtain the most appropriate model.
5.1. Data description
The PPMI [14] is a longitudinal clinical study that aims to identify significant factors contributing to the progression of Parkinson's disease. Taking place at various sites in four different countries, the PPMI comprehensively evaluated the characteristics of the participating subjects from different perspectives, including brain imaging, genetics, and behavioral assessments. The data of the PPMI study were provided by The Michael J. Fox Foundation for Parkinson's Research.
There are 441 subjects in this clinical study, each measured at baseline and at 3 follow-ups, so there are in total 1764 data points with 4 repeated measurements for each individual. For simplicity, we choose 12 predictor variables featuring genetics and individual assessments out of the original variables. A detailed description of all the variables is displayed in Table 17. It is worth noting that the rating scale is one of the fundamental tools used to evaluate the stage of Parkinson's disease in patients. The usual rating scale consists of 5 different segments covering different movement hindrances of Parkinson's disease, while in our data only 3 segments are available: Mentation, Behavior, and Mood in Part I (RS1); Activities of Daily Living in Part II (RS2); and Motor Examination in Part III (RS3). The variables GP1–GP6 measure different gene pieces possibly controlling the motion of an individual.
Table 17.
Name | Description |
---|---|
PD | Status of Parkinson's disease, ‘yes’ = 1, ‘no’ = 0 |
GP1 | Genetic piece: chr12 rs34637584 GT |
GP2 | Genetic piece: chr17 rs11868035 GT |
GP3 | Genetic piece: chr17 rs11012 GT |
GP4 | Genetic piece: chr17 rs393152 GT |
GP5 | Genetic piece: chr17 rs12185268 GT |
GP6 | Genetic piece: chr17 rs199533 GT |
RS1 | Part I of the unified Parkinson's disease rating scale |
RS2 | Part II of the unified Parkinson's disease rating scale |
RS3 | Part III of the unified Parkinson's disease rating scale |
Sex | Index of an individual's sex, ‘female’ = 0, ‘male’ = 1 |
Weight | Numerical value of an individual's weight
Age | Numerical value of an individual's age |
ID | Index of an individual |
5.2. Data processing and candidate models
The PPMI data contain missing values. One way to deal with this issue is to remove the data points with missing inputs, provided the percentage of missing points is low and the data are missing at random. However, since the missing rate is high for this data set, we use multiple imputation and obtain the complete data by the weighted predictive mean matching imputation approach; all of our model selection analysis is based on the imputed data set.
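Weighted predictive mean matching is available in standard multiple imputation software; purely to illustrate the idea, the sketch below implements a simplified, unweighted predictive mean matching step for a single variable. The function name, the donor count, and the linear predictor are hypothetical simplifications of the procedure actually used for the PPMI data.

```python
import numpy as np

def pmm_impute(x, Z, n_donors=5, rng=None):
    """Fill the NaNs of x by predictive mean matching: fit a linear model
    on the observed cases, then replace each missing entry with the observed
    value of a randomly chosen donor among the n_donors cases whose
    predicted means are closest."""
    if rng is None:
        rng = np.random.default_rng()
    obs = ~np.isnan(x)
    Zd = np.column_stack([np.ones(len(x)), Z])   # add an intercept
    coef, *_ = np.linalg.lstsq(Zd[obs], x[obs], rcond=None)
    pred = Zd @ coef
    x_obs, pred_obs = x[obs], pred[obs]
    x_new = x.copy()
    for i in np.where(~obs)[0]:
        donors = np.argsort(np.abs(pred_obs - pred[i]))[:n_donors]
        x_new[i] = x_obs[donors[rng.integers(len(donors))]]
    return x_new
```

Running such a step several times, with the donor draw randomized, yields the multiply imputed data sets on which the model selection analysis can be repeated.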
There are in total 12 predictor variables besides the response variable 'PD' and the indexing variable 'ID'. As described in Table 17, 'PD' is an indicator of Parkinson's disease status, and the response is modeled through the logit of the probability of having Parkinson's disease. The model predicts this probability using the most appropriate fixed effects and random effects. 'ID' denotes the index of patients, and for each 'ID' the measures are repeated 4 times. So the 'ID' variable contributes random effects to the model, and the covariance structure is exchangeable.
If we consider all possible candidate models, there are $2^{12}$ possibilities, which is not realistic. So we first go through a pre-screening procedure in which we select the predictor variables by applying step-wise model selection approaches. In this pre-screening procedure, we use QICu, and no bootstrapping is needed for it. The forward and backward step-wise approaches give different fixed effects, both with 'ID' providing the random effects in the models: both the forward and backward step-wise approaches select 'Sex', 'RS1', 'RS2', and 'RS3', but the forward approach also selects 'GP1', while the backward approach also selects 'Weight' and 'Age'. Therefore, after this pre-screening procedure, we keep 'Sex', 'RS1', 'RS2', and 'RS3' as the fixed effects and select among the predictor variables 'GP1', 'Age', and 'Weight', because the two step-wise approaches make distinct selections among these 3 predictor variables. We therefore have 8 candidate models in the candidate pool, presented in Table 18. From the table, we can see that, in the pre-screening, the forward step-wise method chooses model 'M2' and the backward step-wise method chooses model 'M7'.
Table 18.
Name | GP1 | Age | Weight | Sex | RS1 | RS2 | RS3 |
---|---|---|---|---|---|---|---|
M1 | |||||||
M2 | |||||||
M3 | |||||||
M4 | |||||||
M5 | |||||||
M6 | |||||||
M7 | |||||||
M8 |
We can further remark on why we have these candidate models. Based on the correlations in Figure 1, we notice that there are two sets of variables that are highly correlated within each set; the correlations for the two sets are shown in Figures 2 and 3, respectively. Based on the correlations in Figure 2, the first set contains six variables, 'GP1' to 'GP6', which are measures of genetic pieces. Five of these variables, 'GP2' to 'GP6', are not considered because QICu does not select them in the pre-screening. So only 'GP1' is selected from the first correlated set by the forward step-wise approach, indicating that only 'GP1' is a potential gene that may relate to Parkinson's disease.
The second set consists of 3 variables, 'RS1', 'RS2', and 'RS3', as shown in Figure 3, which are measures of a person's stage of Parkinson's disease. As the two pre-screening approaches both select 'RS1', 'RS2', and 'RS3', and the correlations between them are not very high, we keep them in our candidate models.
We also include the predictor variables 'Weight' and 'Age' in the candidate models to better characterize a subject. Considering only these 2 variables and 'GP1' greatly simplifies the process of model selection: we test all variable combinations of the 3 predictor variables 'GP1', 'Weight', and 'Age', which results in 8 possible candidate models. The summary of all the candidate models is in Table 18.
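The pre-screening step described above can be summarized as a greedy search. The sketch below shows a generic forward step-wise pass, where criterion is a hypothetical callable that fits the mixed model with a random intercept for 'ID' and returns the value of, say, QICu for a given set of fixed effects.

```python
def forward_stepwise(pool, kept, criterion):
    """Greedy forward selection: starting from the pre-screened variables
    in kept, repeatedly add the variable from pool that lowers the
    criterion the most, stopping when no addition helps."""
    current = list(kept)
    best = criterion(current)
    while True:
        trials = {v: criterion(current + [v]) for v in pool if v not in current}
        if not trials:
            break
        v_best = min(trials, key=trials.get)
        if trials[v_best] >= best:
            break
        best = trials[v_best]
        current.append(v_best)
    return current, best
```

For the application here, kept would be ['Sex', 'RS1', 'RS2', 'RS3'] and pool would be ['GP1', 'Age', 'Weight']; the backward pass is the mirror image, deleting the least useful variable at each step.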
5.3. Presentation of selection results
After the four model selection criteria are applied to the candidate models, the selection results are presented in Table 19. The number of bootstrap samples is set to 250. This number is the minimum value that achieves efficient and effective results for the bootstrap method; a larger number of bootstrap samples can be used but does not improve the results much.
Table 19.
Criterion | Final model | Criterion | Final model |
---|---|---|---|
QIC(R) | M2 | QAICb1 | M4 |
QICu | M4 | QAICb2 | M4
From the table, we can see that QIC(R) selects model 'M2', while QAICb1, QAICb2, and QICu all select model 'M4', which adds 'Age' to model 'M2', as the most appropriate model.
The selection results show that model 'M4' is the most likely to be the most appropriate model. Model 'M4' indicates that the status of Parkinson's disease can be effectively predicted by 'Age', 'Sex', the rating scales for movement hindrances 'RS1', 'RS2', and 'RS3', and the gene-piece measure 'GP1', which complies with a common understanding of Parkinson's disease. Although QAICb1, QAICb2, and QICu all select the same model for this particular data set, the previous simulation results show that QAICb2 outperforms the other compared criteria in selecting a correct model.
6. Concluding remarks and discussion
We propose two criteria QAICb1 in Equation (15) and QAICb2 in Equation (23) for mixed model selection by adopting the quasi-likelihood function and bootstrap approaches.
Fully specifying the distribution of the data may not be realistic in some practical problems. To overcome the limitation that traditional model selection criteria impose through distributional assumptions, we propose selection criteria that depend less on the parametric distribution and are easy to compute. We replace the log-likelihood function in the K-L discrepancy with the log quasi-likelihood, an approach that depends less on the parametric distribution and is much easier to calculate in the presence of correlation in the data. QAICb1 and QAICb2 are developed to serve as asymptotically unbiased estimators of the quasi-likelihood-based K-L discrepancy between a candidate model and the true model. We can therefore utilize them as model selection criteria to select the most appropriate model from a candidate pool in mixed models.
The proposed selection criteria QAICb1 and QAICb2 consist of two components: the log quasi-likelihood and an estimate of the bias correction term. To compute the log quasi-likelihood function, we utilize GEE to obtain the model parameters from the original data with a prespecified fitting correlation matrix. The estimation of the bias correction term is based on the bootstrap approach, which underlies the strong selection performance of QAICb1 and QAICb2 for mixed models.
Both QAICb1 and QAICb2 are based on the quasi-likelihood functions without assumptions of distributions, so these two model selection criteria are general methods when the distribution is not specified or unknown in linear mixed models. When the distribution is specified, not only nonparametric bootstrap but also parametric bootstrap can be utilized for calculating QAICb1 and QAICb2. In the parametric bootstrap, the distribution information can be well employed.
To extensively study the model selection performance of QAICb1 and QAICb2, simulations with different model settings are conducted. The simulation results demonstrate that the proposed criteria generally outperform other selection criteria such as QIC(R) and QICu in selecting the best model. With the escalation of the correlation ρ, the performance of QAICb1 and QAICb2 in model selection becomes markedly better. The results also show that the performance of QAICb1 and QAICb2 is almost the same in large samples, except in the situation where the semiparametric bootstrap approach is used to generate bootstrap samples; the selection performance of QAICb1 is not as good as that of QAICb2 for the semiparametric bootstrap. Based on the simulation results, QAICb2 is preferred when the semiparametric bootstrap is utilized.
We note that there exist many model selection criteria, and it is commonly understood that no single selection criterion holds all the advantages in model selection. Although the recent trend towards variable selection in linear mixed models adopts penalized methods for high-dimensional situations, for a small to medium number of predictors involved in model selection, the proposed QAICb1 and QAICb2 perform effectively and focus on the performance improvement in model selection over QIC(R) and QICu using quasi-likelihood functions and bootstrap methods. For high-dimensional cases where the number of predictors is very large, the proposed selection criteria QAICb1 and QAICb2 do not perform as well as in small- to medium-dimensional settings. Therefore, we prefer to adopt QAICb1 and QAICb2 for small- or medium-dimension model selection.
A suitable number of bootstrap samples B should be considered to achieve the best selection performance of QAICb1 and QAICb2 in bootstrapping. In the simulations, B is chosen to be 250, which serves as the minimal number of bootstrap samples needed to effectively estimate the bias terms in QAICb1 and QAICb2. Increasing B beyond 250 does not significantly improve the performance of the proposed selection criteria.
To compare other selection criteria with the proposed QAICb1 and QAICb2, we can remark on the performance of the Information Complexity Criterion (ICOMP) [2,3] in the settings where we conduct the simulations. We computed ICOMP values in these simulation settings, although they are not presented here. When the distribution is normal, ICOMP significantly outperforms the proposed selection criteria QAICb1 and QAICb2, but the ICOMP computation relies on the distribution in a mixed model. Based on ICOMP, Shang [18] developed a diagnostic of influential cases in generalized linear mixed models. However, for unknown distributions in mixed models and for other settings (e.g. logistic regression), the proposed selection criteria outperform ICOMP in selecting the true model. We stress that the proposed QAICb1 and QAICb2 focus on the performance improvement in model selection over QIC(R) and QICu using quasi-likelihood functions and bootstrap methods.
In establishing the theoretical properties and conducting the simulations, we initially propose QAICb1 and QAICb2 for linear mixed models. The utilization of the quasi-likelihood and GEE could also be extended to typical generalized linear mixed effects models with both fixed and random effects, as in Zeger et al. [22]. We will extend QAICb1 and QAICb2 to generalized linear models with random effects by including the estimation of an overdispersion parameter in future research. We also plan to develop the proposed selection criteria into an R function accessible to general users.
Supplementary Material
Acknowledgments
The authors would like to express their appreciation to the referees for providing thoughtful and insightful comments which helped improve the original version of this manuscript.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1.Akaike H, Information theory and an extension of the maximum likelihood principle, in Second International Symposium on Information Theory, B. N. Petrov and F. Csaki, eds., Akademiai Kiado, Budapest, 1973, pp. 267–281.
- 2.Bozdogan H, Icomp: A new model selection criterion, in Classification and Related Methods of Data Analysis, H.H. Bock, Ed., North-Holland, Amsterdam, 1988, pp. 599–608.
- 3.Bozdogan H. and Haughton D.M., Informational complexity criteria for regression models, Comput. Stat. Data Anal. 28 (1998), pp. 51–76.
- 4.Cavanaugh J.E. and Shumway R.H., A bootstrap variant of AIC for state-space selection, Stat. Sin. 7 (1997), pp. 473–496.
- 5.Efron B, The Jackknife, the Bootstrap, and Other Resampling Plans, Vol. 38, SIAM, Philadelphia, 1982.
- 6.Efron B., Estimating the error rate of a prediction rule: Improvement on cross-validation, J. Am. Stat. Assoc. 78 (1983), pp. 316–331.
- 7.Efron B, Bootstrap methods: Another look at the Jackknife, in Breakthroughs in Statistics, Springer, New York, 1992, pp. 569–593.
- 8.Funatogawa I. and Funatogawa T., Longitudinal Data Analysis-Autoregressive Linear Mixed Effects Models, Springer, Singapore, 2018.
- 9.Hurvich C.M. and Tsai C.-L., Regression and time series model selection in small samples, Biometrika 76 (1989), pp. 297–307.
- 10.Ishiguro M. and Morita K, Application of an estimator-free information criterion (WIC) to aperture synthesis imaging, in International Astronomical Union Colloquium, Vol. 131, Cambridge University Press, San Francisco, CA, 1991, pp. 243–248.
- 11.Ishiguro M., Sakamoto Y., and Kitagawa G., Bootstrapping log likelihood and EIC, an extension of AIC, Ann. Inst. Stat. Math. 49 (1997), pp. 411–434.
- 12.Kullback S. and Leibler R.A., On information and sufficiency, Ann. Math. Stat. 22 (1951), pp. 79–86.
- 13.Liang K.-Y. and Zeger S.L., Longitudinal data analysis using generalized linear models, Biometrika 73 (1986), pp. 13–22.
- 14.Marek K., Jennings D., Lasch S., Siderowf A., Tanner C., Simuni T., Coffey C., Kieburtz K., Flagg E., Chowdhury S., Poewe W., Mollenhauer B., Sherer T., Frasier M., Meunier C., Rudolph A., Casaceli C., Seibyl J., Mendick S., Schuff N., Zhang Y., Toga A., Crawford K., Ansbach A., De Blasio P., Piovella M., Trojanowski J., Shaw L., Singleton A., Hawkins K., Eberling J., Brooks D., Russell D., Leary L., Factor S., Sommerfeld B., Hogarth P., Pighetti E., Williams K., Standaert D., Guthrie S., Hauser R., Delgado H., Jankovic J., Hunter C., Stern M., Tran B., Leverenz J., Baca M., Frank S., Thomas C.-A., Richard I., Deeley C., Rees L., Sprenger F., Lang E., Shill H., Obradov S., Fernandez H., Winters A., Berg D., Gauss K., Galasko D., Fontaine D., Mari Z., Gerstenhaber M., Brooks D., Malloy S., Barone P., Longo K., Comery T., Ravina B., Grachev I., Gallagher K., Collins M., Widnell K.L., Ostrowizki S., Fontoura P., Hoffmann La-Roche F., Ho T., Luthman J., van der Brug M., Reith A.D., and Taylor P., The Parkinson progression marker initiative (PPMI), Prog. Neurobiol. 95 (2011), pp. 629–635.
- 15.McCullagh P. and Nelder J.A., Generalized Linear Models, 2nd ed., Chapman and Hall, New York, 1989. [Google Scholar]
- 16.Pan W., Akaike's information criterion in generalized estimating equations, Biometrics 57 (2001), pp. 120–125.
- 17.Pan W., Model selection in estimating equations, Biometrics 57 (2001), pp. 529–534.
- 18.Shang J., A diagnostic of influential cases based on the information complexity criteria in generalized linear mixed models, Commun. Stat. - Theory Methods 45 (2016), pp. 3751–3760.
- 19.Shang J. and Cavanaugh J.E., Bootstrap variants of the Akaike information criterion for mixed model selection, Comput. Stat. Data Anal. 52 (2008), pp. 2004–2021.
- 20.Wedderburn R.W., Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method, Biometrika 61 (1974), pp. 439–447.
- 21.White H., Maximum likelihood estimation of misspecified models, Econometrica 50 (1982), pp. 1–25.
- 22.Zeger S.L., Liang K.-Y., and Albert P.S., Models for longitudinal data: A generalized estimating equation approach, Biometrics 44 (1988), pp. 1049–1060.