Author manuscript; available in PMC: 2019 Dec 3.
Published in final edited form as: Biometrics. 2019 Apr 25;75(3):950–965. doi: 10.1111/biom.13060

Empirical-likelihood-based criteria for model selection on marginal analysis of longitudinal data with dropout missingness

Chixiang Chen 1, Biyi Shen 1, Lijun Zhang 2, Yuan Xue 3, Ming Wang 1
PMCID: PMC6889864  NIHMSID: NIHMS1057969  PMID: 31004449

Abstract

Longitudinal data are common in clinical trials and observational studies, where missing outcomes due to dropouts are always encountered. Under such context with the assumption of missing at random, the weighted generalized estimating equation (WGEE) approach is widely adopted for marginal analysis. Model selection on marginal mean regression is a crucial aspect of data analysis, and identifying an appropriate correlation structure for model fitting may also be of interest and importance. However, the existing information criteria for model selection in WGEE have limitations, such as separate criteria for the selection of marginal mean and correlation structures, unsatisfactory selection performance in small-sample setups, and so forth. In particular, there are few studies to develop joint information criteria for selection of both marginal mean and correlation structures. In this work, by embedding empirical likelihood into the WGEE framework, we propose two innovative information criteria named a joint empirical Akaike information criterion and a joint empirical Bayesian information criterion, which can simultaneously select the variables for marginal mean regression and also correlation structure. Through extensive simulation studies, these empirical-likelihood-based criteria exhibit robustness, flexibility, and outperformance compared to the other criteria including the weighted quasi-likelihood under the independence model criterion, the missing longitudinal information criterion, and the joint longitudinal information criterion. In addition, we provide a theoretical justification of our proposed criteria, and present two real data examples in practice for further illustration.

Keywords: Akaike information criterion, Bayesian information criterion, empirical likelihood, longitudinal data, missing at random, model selection, weighted generalized estimating equation

1 |. INTRODUCTION

Longitudinal data are common in clinical trials and observational studies. Due to the research interest in conducting inference on the population-level parameter estimates, generalized estimating equation (GEE) has been widely employed for marginal regression analysis, where the correlations among the observations within subjects are treated as nuisance parameters (Liang and Zeger, 1986; Wang, 2014). In longitudinal studies, missing data are typically encountered, which poses challenges for model fitting and model selection. There are three types of missing data: missing completely at random, missing at random (MAR), and missing not at random, depending on whether the factors related to missing probability are observed or not (Little and Rubin, 2014). For instance, subjects may drop out of the study or are lost to follow-up due to several reasons such as drug resistance or side effects. Under such context, MAR is commonly and reasonably assumed for statistical inference. Literature has shown that the estimates based on regular GEE are biased for longitudinal data under MAR (Laird, 1988). Robins et al. (1995) first proposed the weighted GEE (WGEE) method for bias correction by incorporating an inverse probability weight matrix. Given the correctly specified model for missing data, the consistency of WGEE estimates still holds even when the “working” correlation structure is misspecified.

Model selection is a crucial aspect of longitudinal data analysis. Identifying the variables for the marginal mean structure is always essential, and an improper correlation structure may lead to a loss of efficiency in the parameter estimates. This problem has been extensively investigated for complete longitudinal data; when missing data exist, the efficiency improvement is still under exploration, but several works have shown that selecting a proper correlation structure for WGEE is promising and important (Preisser et al., 2002; Shardell and Miller, 2008; Gosho et al., 2014; Gosho, 2016). To accomplish these selection goals, the development of model information criteria has gained substantial attention among researchers. Pan (2001) first proposed one of the most widely used information criteria, the quasi-likelihood under the independence model criterion (QIC), but it does not accommodate missing data. For longitudinal data with dropout missingness under MAR, Shen and Chen (2012) proposed two separate measures based on the quadratic loss function, the missing longitudinal information criterion (MLIC) and the MLIC for correlation (MLICC), for selection of the marginal mean regression and the correlation structure in WGEE, respectively. Another option for marginal model selection under this scenario is the weighted quasi-likelihood information criterion (QICWp), which incorporates the weight matrix into QIC (Platt et al., 2013). Later on, Gosho (2016) proposed QICWr by modifying the penalty term of QICWp for selection of both marginal mean and correlation structures. Most recently, Shen and Chen (2018) proposed the joint longitudinal information criterion (JLIC) for the joint selection of marginal mean and correlation structures for longitudinal data with missing outcomes and covariates. However, the aforementioned criteria have the following limitations: (a) ignoring missing data; (b) losing model selection power when different criteria for either marginal mean structure selection or correlation structure selection are implemented; and (c) leading to unsatisfactory results in selection rates, particularly when the sample size is small (Shen and Chen, 2012; Gosho, 2016; Shen and Chen, 2018).

In contrast, the empirical likelihood approach, a purely observation-based technique, has recently gained more attention because it relaxes parametric distributional assumptions, and the literature has already shown its superior performance in regression analysis, especially for confidence interval construction (Owen, 1988; Qin and Lawless, 1994; Qin et al., 2009). However, empirical-likelihood-based model selection criteria have not yet been widely investigated. Kolaczyk (1995) first proposed the empirical information criterion (EIC), but pointed out that convergence to a proper solution was not reached in estimation, particularly when the number of estimating equations is larger than the number of parameters. Later, Variyath et al. (2010) introduced adjusted empirical likelihood criteria, the empirical Akaike information criterion (EAIC) and the empirical Bayesian information criterion (EBIC), to guarantee the existence of a solution. However, the computational issue remains if the estimators have bounded support (eg, a correlation coefficient). Chen and Lazar (2012) applied empirical likelihood only to correlation structure selection in GEE under complete longitudinal data and proposed to use plug-in estimators obtained from GEE; however, no theoretical justification of plug-in estimators was provided in their work. To our knowledge, there is little work on empirical-likelihood-based model selection criteria accommodating missing data under the longitudinal framework.

In this paper, two motivating data applications are provided. One is a large epidemiological study, the Atherosclerosis Risk in Communities (ARIC) study. Systolic blood pressure (SBP), a crucial risk factor for cardiovascular disease (CVD), is of clinical and research interest, and characterizing its longitudinal patterns over time can help for CVD risk prediction and determine relatively more effective treatment or medication (Parati et al., 2013; Muntner et al., 2015). The other one is a study of Schizophrenia disorder. The mean level, as well as visit-to-visit variability on severity measurements, is associated with deficits in emotional processing and functional impairment (Simon et al., 2007; Bilderbeck et al., 2016), which could reflect drug effectiveness and indicate a strategy for prevention of disease progression. To achieve these clinical objectives, we need to identify the best fitting model among different candidates. Here, we propose two information criteria named a joint EAIC (JEAIC) and a joint EBIC (JEBIC), which can simultaneously select marginal mean and correlation structures in WGEE for longitudinal data with dropout missingness under MAR. The basic strategy is that the empirical-likelihood-based criteria are first established by utilizing parameter estimates from WGEE together with the proposed empirical likelihood, and thus JEAIC and JEBIC can be constructed by incorporating extra penalty terms. These criteria are easy to implement in statistical software, and potential computational issues can be avoided because the parameter estimates are obtained directly from WGEE. Also, this work can be extended to accommodate more general missing patterns (ie, intermittent missingness). For simplicity, we mainly focus on monotone dropout missingness here.

The paper is organized as follows. In Section 2, we formulate the problem, introduce WGEE and the existing model selection criteria, and then provide the proposed information criteria of JEAIC and JEBIC based on the empirical likelihood. The theoretical justification for our proposal is granted under certain conditions with detailed proof in the Supporting Information. In Section 3, we conduct extensive simulations under a variety of scenarios with continuous and categorical outcomes to evaluate the performance of the two proposed criteria when compared with the current existing alternatives. Lastly, we illustrate the application of our scheme by utilizing two real data examples in Section 4, and conclude with a discussion in Section 5.

2 |. METHODOLOGY

2.1 |. Notation

Let $Y_i = (Y_{i1},\ldots,Y_{iT})'$ and $X_i = (X_{i1},\ldots,X_{iT})'$ denote the outcomes and covariates collected from subject $i$, $i = 1,\ldots,n$, respectively, where $Y_{ij}$ is the $j$th outcome and the $p \times 1$ covariate vector $X_{ij}$ includes the intercept, $j = 1,\ldots,T$. For simplicity, we assume balanced data with equal numbers of observations for all subjects. Let $\mu_i = E(Y_i \mid X_i)$ and $V_i = \mathrm{Var}(Y_i \mid X_i)$ be the conditional mean and variance of $Y_i$. The mean $\mu_i$ is usually modeled as $\xi(\mu_i) = X_i\beta$, with $\xi$ a known, prespecified link function depending on the type of outcome and $\beta$ a $p \times 1$ vector of regression parameters (McCullagh and Nelder, 1989). In addition, $V_i$ can be written as $A_i^{1/2} C_i(\rho) A_i^{1/2}$, where $A_i$ is a $T \times T$ diagonal matrix with diagonal elements $\mathrm{var}(Y_{it} \mid X_{it}) = \phi\nu(\mu_{it})$, $\nu$ is a known function, and $\phi$ is a dispersion parameter that is either known or must be estimated; $C_i(\rho)$ is a prespecified “working” correlation matrix depending on a set of parameters $\rho$. Here, we consider outcomes subject to missingness under the assumption of MAR, where the indicator $R_{ij} = 1$ if $Y_{ij}$ is observed and $R_{ij} = 0$ otherwise. For simplicity, we focus on dropout missingness, but the approach can be straightforwardly extended to accommodate other general missing patterns (Robins et al., 1995; Shen and Chen, 2018).

2.2 |. WGEE

For longitudinal data with dropouts under MAR, WGEE has been proposed by incorporating a weight matrix based on the inverse probability of observing the outcomes to adjust for the missing mechanism (Robins et al., 1995). Denote the probabilities of observing the outcomes for the $i$th subject by $\omega_i = (\omega_{i1},\ldots,\omega_{iT})'$, where $\omega_{ij} = \Pr(R_{ij} = 1 \mid Y_i, H_i)$ with $H_i$ including potential predictors, which may overlap with $X_i$. Note that $\omega_{ij} = \lambda_{i1} \times \lambda_{i2} \times \cdots \times \lambda_{ij}$, where $\lambda_{i1} = 1$ (the outcomes at baseline are all observed) and $\lambda_{ij} = \Pr(R_{ij} = 1 \mid R_{i,j-1} = 1, Y_i, H_i)$, $j = 2,\ldots,T$. Given the data $(R_{ij}, Y_i, H_i)$, $\lambda_{ij}$ can be estimated based on the partial likelihood from a logistic regression, $\sum_{i=1}^{n}\sum_{j=2}^{T} R_{i,j-1}\log[\lambda_{ij}(\theta)^{R_{ij}}\{1-\lambda_{ij}(\theta)\}^{1-R_{ij}}]$, where $\theta$ is a $q \times 1$ vector of regression parameters with consistent estimates obtained by solving $S_n^{\theta} = 0$, where

$$S_n^{\theta} = \frac{1}{n}\sum_{i=1}^{n} s_i(\theta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=2}^{T} R_{i,j-1}\{R_{ij} - \lambda_{ij}(\theta)\}H_{ij}, \tag{1}$$

with $\mathrm{logit}\{\lambda_{ij}(\theta)\} = H_{ij}'\theta$. Thus, the predicted probability $\hat\lambda_{ij}$ and thereafter $\hat\omega_{ij}$ can be calculated. After plugging $\hat\omega$ into $W_i$, the estimating equations for the parameters $\beta$ are

$$g(\beta) = \sum_{i=1}^{n} g(X_i, Y_i, \beta; \hat\omega) = \sum_{i=1}^{n} D_i' V_i^{-1} W_i (Y_i - \mu_i) = 0, \tag{2}$$

where $D_i = \partial\mu_i/\partial\beta'$ is a $T \times p$ matrix, $V_i = A_i^{1/2} C_i A_i^{1/2}$, and $W_i$ is the weight matrix with diagonal elements $R_{ij}/\hat\omega_{ij}$, $j = 1,\ldots,T$. The estimate $\hat\beta$ is consistent even if the “working” correlation matrix is misspecified, and $\sqrt{n}(\hat\beta - \beta)$ is asymptotically normally distributed under mild regularity conditions, given that the dropout model is correctly specified (ie, $E(W_i) = I_T$, with $W_i$ evaluated at the true value $\omega_0$; Robins et al., 1995).
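To make the weighting step concrete, the following is a minimal sketch of how the cumulative observation probabilities $\omega_{ij}$ and the diagonal of $W_i$ could be computed once the conditional probabilities $\hat\lambda_{ij}$ have been fitted. The function name `dropout_weights` and the toy inputs are illustrative assumptions, not part of the original analysis; the logistic fit itself is taken as given.

```python
import numpy as np

def dropout_weights(lam_hat, R):
    """Inverse-probability weights for monotone dropout (illustrative sketch).

    lam_hat : (n, T) fitted conditional probabilities lambda_ij, with
              lam_hat[:, 0] == 1 because baseline is always observed.
    R       : (n, T) 0/1 observation indicators R_ij.
    Returns (omega, w), where omega_ij = lambda_i1 * ... * lambda_ij and
    w_ij = R_ij / omega_ij is the j-th diagonal element of W_i.
    """
    lam_hat = np.asarray(lam_hat, dtype=float)
    R = np.asarray(R, dtype=float)
    omega = np.cumprod(lam_hat, axis=1)  # running product over visits
    w = R / omega                        # unobserved visits receive weight 0
    return omega, w

# Hypothetical fitted probabilities for two subjects and T = 3 visits.
lam_hat = np.array([[1.0, 0.8, 0.5],
                    [1.0, 0.9, 0.9]])
R = np.array([[1, 1, 0],   # subject 1 drops out before visit 3
              [1, 1, 1]])  # subject 2 completes the study
omega, w = dropout_weights(lam_hat, R)
```

Observed visits that were unlikely to be observed receive weights larger than one, which is how the weighted equations (2) compensate for subjects who dropped out.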

Note that, given any prespecified “working” correlation matrix other than an independence structure, the correlation coefficient $\rho$ needs to be estimated. Usually, the correlation estimates can be obtained through an iterative process utilizing the Pearson residuals (Wedderburn, 1974). However, the resulting estimate can be biased for longitudinal data with missing outcomes, whereas an unbiased estimate of $\rho_{jk}$ is $\hat\rho_{jk}(\hat\beta) = [1/\{(n-p)\phi\}]\sum_{i=1}^{n} e_{ij}(\hat\beta)e_{ik}(\hat\beta)R_{ij}R_{ik}/\hat\omega_{i,jk}$, where $\hat\omega_{i,jk}$ is the estimate of $\omega_{i,jk} = \Pr(R_{ij} = 1, R_{ik} = 1 \mid Y_i, H_{ij}, H_{ik})$ and $e_{ij}(\beta)$ is the residual $(Y_{ij} - \mu_{ij})/\sqrt{\nu(\mu_{ij})}$ $(1 \le j < k \le T)$. Because of dropout missingness, the weights can be simplified as $\omega_{i,jk} = \omega_{ik} = \Pr(R_{ik} = 1 \mid Y_i, H_{ik})$, and then $\hat\rho_{jk}(\hat\beta) = [1/\{(n-p)\phi\}]\sum_{i=1}^{n} e_{ij}(\hat\beta)e_{ik}(\hat\beta)R_{ik}/\hat\omega_{ik}$. For other missing patterns (ie, intermittent), the estimation would become more complicated (Robins et al., 1995; Chen et al., 2010). In addition, $\phi$ is assumed to be known or estimated as $\hat\phi(\hat\beta) = \{1/(nT - p)\}\sum_{i=1}^{n}\sum_{j=1}^{T} e_{ij}^2(\hat\beta)R_{ij}/\hat\omega_{ij}$ (released afterwards for mathematical simplicity). For convenient notation, we stack the estimating equations by subject $i$ for the parameters $\gamma = (\beta', \rho')'$ as follows:

$$g(X_i, Y_i, \gamma; \hat\omega_i) = \begin{pmatrix} D_i' V_i^{-1} W_i \{Y_i - \mu_i(\beta)\} \\ \zeta(X_i, Y_i, \rho; \hat\omega_i) \end{pmatrix}, \tag{3}$$

where $\zeta(X_i, Y_i, \rho; \hat\omega_i)$ is some estimating equation for the correlation coefficients $\rho$ based on weighted Pearson residuals. Taking an unstructured case for example, $\zeta(X_i, Y_i, \rho; \hat\omega_i)$ could be $\kappa_i(\beta) - \rho\phi(1 - p/n)$, where $\kappa_i(\beta) = (\hat\rho_{i12}(\beta),\ldots,\hat\rho_{i1T}(\beta),\ldots,\hat\rho_{i(T-1)T}(\beta))'$ with $\hat\rho_{ijk}(\beta) = e_{ij}(\beta)e_{ik}(\beta)R_{ik}/\hat\omega_{ik}$, $1 \le j < k \le T$, and $\rho = (\rho_{12},\ldots,\rho_{1T},\ldots,\rho_{(T-1)T})'$.
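The weighted moment estimates of the dispersion and of the pairwise correlations under monotone dropout can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the Pearson residuals are taken as already computed, the function name `weighted_phi_rho` is hypothetical, and the estimated dispersion is plugged in for $\phi$.

```python
import numpy as np

def weighted_phi_rho(e, R, omega, p):
    """Weighted dispersion and unstructured correlation estimates (sketch).

    e     : (n, T) Pearson residuals e_ij; entries at unobserved visits are ignored.
    R     : (n, T) 0/1 observation indicators (monotone dropout).
    omega : (n, T) estimated cumulative observation probabilities.
    p     : number of marginal mean parameters.
    """
    e = np.asarray(e, dtype=float)
    R = np.asarray(R, dtype=float)
    omega = np.asarray(omega, dtype=float)
    n, T = e.shape
    e = np.where(R == 1, e, 0.0)   # drop residuals at unobserved visits
    w = R / omega                  # R_ij / omega_ij
    # phi_hat = sum_ij e_ij^2 R_ij / omega_ij / (nT - p)
    phi = np.sum(w * e ** 2) / (n * T - p)
    rho = np.eye(T)
    for j in range(T):
        for k in range(j + 1, T):
            # Under dropout, R_ij R_ik = R_ik and omega_{i,jk} = omega_ik.
            rho[j, k] = rho[k, j] = np.sum(e[:, j] * e[:, k] * w[:, k]) / ((n - p) * phi)
    return phi, rho

# Complete-data toy check with perfectly correlated residuals.
e = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]])
phi, rho = weighted_phi_rho(e, np.ones((4, 2)), np.ones((4, 2)), p=0)
```

With no missingness and all weights equal to one, the estimates reduce to the usual moment estimates, which gives a quick sanity check of the weighting.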

2.3 |. Model selection criteria

2.3.1 |. Overview of existing criteria

Before introducing our proposed information criteria, we first review several key criteria for model selection in WGEE for longitudinal data with dropout missingness under MAR. The MLIC was proposed by Shen and Chen (2012) for selection of the marginal mean regression; it is based on the expected quadratic loss function and modifies Mallows’s Cp statistic (in linear regression). Given the estimates $\hat\gamma = (\hat\beta', \hat\rho')'$ and $\hat\omega$, MLIC is calculated by

$$\mathrm{MLIC} = \sum_{i=1}^{n} (Y_i - \hat\mu_i)' W_i (Y_i - \hat\mu_i) + 2\,\mathrm{Tr}(E_n^{-1} J_n),$$

where $E_n = \sum_{i=1}^{n} D_i' V_i^{-1} W_i D_i$ and $J_n = \sum_{i=1}^{n} (D_i' V_i^{-1} \epsilon_i \epsilon_i' - G_i \epsilon_i') D_i$ with $\epsilon_i = W_i(Y_i - \mu_i^0)$ and $G_i = (\sum_{m=1}^{n} Q_m s_m')(\sum_{m=1}^{n} s_m s_m')^{-1} s_i$, where $Q_i = D_i' V_i^{-1} W_i (Y_i - \hat\mu_i)$ and $s_i$ is the score component of the $i$th individual in the partial likelihood for the dropout model in (1). Note that $\mu_i^0$ is estimated by the largest candidate model based on the collected information, and numerical studies via simulation have shown that misspecification of this model has mild or negligible influence on the performance of MLIC. In addition, Shen and Chen (2012) also provided MLICC for correlation structure selection by modifying the penalty term.

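Another commonly used criterion in this context is QICWr (Gosho, 2016), which extends the regular QIC by incorporating the inverse probability weight matrix. Given the estimates $\hat\gamma = (\hat\beta', \hat\rho')'$ and $\hat\omega$, the QICWr statistic is

$$\mathrm{QICW}_r = -2\sum_{i=1}^{n}\sum_{j=1}^{T} Q_w(\hat\beta, \hat\omega; Y_{ij}, X_i, H_i) + 2\,\mathrm{Tr}(\hat\Phi_I \hat V_w),$$

where $Q_w(\hat\beta, \hat\omega; Y_{ij}, X_i, H_i)$ is the weighted log quasi-likelihood function under an independence correlation structure, and $\hat\Phi_I = -\sum_{i=1}^{n}\sum_{j=1}^{T} (\partial^2 Q_w/\partial\beta\,\partial\beta')|_{\beta = \hat\beta}$.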

2.3.2 |. Proposed criteria of JEAIC and JEBIC

To begin with, we propose the full weighted estimating equation $G_F$, which accommodates a stationary correlation structure for the empirical likelihood:

$$G_F(X_{Fi}, Y_i, \tilde\beta, \rho^c, \theta) = \begin{pmatrix} D_i' V_i^{-1} W_i \{Y_i - \mu_i(\tilde\beta)\} \\ U_i(\tilde\beta) - h(\rho^c)\phi \\ s_i(\theta) \end{pmatrix}, \tag{4}$$

where $s_i(\theta)$ is the estimating equation for $\theta$ in (1). The $L \times 1$ vector $\tilde\beta$ in $G_F$ has the same dimensionality as $\beta_F$, the parameter vector of our proposed full mean structure with $X_{Fi}$ as the covariates for the $i$th subject. Without loss of generality, we can always rearrange the covariate matrix $X_{Fi}$ so that the first $p$ elements of $\tilde\beta$ equal the parameter vector $\beta$ from the candidate model and the remaining elements equal zero, thus $\tilde\beta = (\beta', 0')'$. In addition, a stationary correlation structure is proposed for the full WGEE to estimate the correlation coefficients, that is, $\rho_F^{ST} = (\rho_1^{ST},\ldots,\rho_{T-1}^{ST})'$ and $U_i(\tilde\beta) = (U_{i1}(\tilde\beta), U_{i2}(\tilde\beta),\ldots,U_{i(T-1)}(\tilde\beta))'$ with $U_{im}(\tilde\beta) = \sum_{j=1}^{T-m} (R_{i,j+m}/\omega_{i,j+m})\, e_{ij}(\tilde\beta)\, e_{i,j+m}(\tilde\beta)$. Also, for any prespecified correlation structure denoted by the superscript $c$ (nested within a stationary correlation structure), $h(\rho^c) = (\rho_1^c(T - 1 - p/n),\ldots,\rho_{T-1}^c(1 - p/n))'$ with the $(T-1) \times 1$ vector $\rho^c = (\rho_1^c,\ldots,\rho_{T-1}^c)'$. For instance, $\rho^{EXC} = (\rho^{EXC},\ldots,\rho^{EXC})'$ when an exchangeable (EXC) correlation structure is fitted. Here, we consider a stationary correlation structure for the proposed full model; however, it can be extended to a more general case (ie, unstructured), which may substantially increase the number of parameters needing estimation and is thus likely to lead to convergence issues, particularly for small $n$ and relatively large $T$.
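The correlation block of (4) can be illustrated with a short sketch that computes the lag-$m$ weighted residual products $U_{im}$ and the vector $h(\rho^c)$ for a candidate structure. The function names are hypothetical, and the mapping of AR1 to lag correlations $\rho^m$ is the standard parameterization, assumed here rather than quoted from the text.

```python
import numpy as np

def lag_products(e_i, R_i, omega_i):
    """U_im = sum_{j=1}^{T-m} (R_{i,j+m}/omega_{i,j+m}) e_ij e_{i,j+m}, m = 1..T-1."""
    e_i = np.where(np.asarray(R_i) == 1, np.asarray(e_i, dtype=float), 0.0)
    w = np.asarray(R_i, dtype=float) / np.asarray(omega_i, dtype=float)
    T = len(e_i)
    return np.array([np.sum(w[m:] * e_i[:T - m] * e_i[m:]) for m in range(1, T)])

def lag_correlations(structure, rho, T):
    """Lag-1..(T-1) correlations implied by a candidate structure (assumed forms)."""
    m = np.arange(1, T)
    if structure == "EXC":
        return np.full(T - 1, float(rho))   # same correlation at every lag
    if structure == "AR1":
        return float(rho) ** m              # geometric decay with lag
    if structure == "IND":
        return np.zeros(T - 1)
    raise ValueError("unknown structure: " + structure)

def h(rho_lags, T, p, n):
    """m-th element of h(rho^c): rho_m^c * (T - m - p/n)."""
    m = np.arange(1, T)
    return np.asarray(rho_lags, dtype=float) * (T - m - p / n)

# Toy subject with T = 3 fully observed visits.
U = lag_products([1.0, 2.0, 3.0], [1, 1, 1], [1.0, 1.0, 1.0])
h_exc = h(lag_correlations("EXC", 0.5, 3), T=3, p=0, n=10)
```

The factor $T - m$ in $h$ matches the number of lag-$m$ pairs summed in $U_{im}$, so that the block $U_i - h(\rho^c)\phi$ is centered when the candidate structure is correct.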

Combining all the information above, we thus have the following empirical likelihood ratio, which is the key component to select marginal mean and correlation structures:

$$R_F(\beta, \rho^c, \theta) = \sup_{\beta, \rho^c, \theta}\left\{\prod_{i=1}^{n} n p_i \;;\; p_i > 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, G_F(X_{Fi}, Y_i, \tilde\beta, \rho^c, \theta) = 0\right\}, \tag{5}$$

where $p_i = P(Y = y_i, X = x_i)$. Here, we assume that only the distributions with an atom of probability on each $y_i$ and $x_i$ have nonzero likelihood. Therefore, the $\{p_i\}$ follow the rules of ordinary probabilities and sum to one. Without the constraints defined by the estimating equations, $\prod_{i=1}^{n} p_i$ is maximized at $\prod_{i=1}^{n} (1/n)$. Thus, the empirical likelihood ratio is defined as $\prod_{i=1}^{n} n p_i$. More basic properties of empirical likelihood can be found in Owen (2001). An intuitive rationale for model selection based on the proposed empirical likelihood ratio is as follows: when the estimators $\hat\beta_F$ and $\hat\rho_F^{ST} = (\hat\rho_1^{ST},\ldots,\hat\rho_{T-1}^{ST})'$ are obtained from the WGEE method with $X_{Fi}$ and a stationary correlation structure from (3), and $\hat\theta$ is calculated from (1), we will have $R_F(\hat\beta_F, \hat\rho_F^{ST}, \hat\theta) = 1$, which achieves the upper limit of the empirical likelihood ratio. However, estimators $\hat\beta$ and $\hat\rho^c$ other than $\hat\beta_F$ and $\hat\rho_F^{ST}$ will lead to $R_F(\hat\beta_F, \hat\rho^c, \hat\theta) < 1$. The departure from 1 indicates misspecification of the model, to the degree reflected by the magnitude of the deviation. In other words, the closer the mean and correlation structures are to the underlying truth, the closer $R_F$ will be to 1, which ensures the potential for joint selection of marginal mean and correlation structures.
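The statement that the ratio attains its upper limit of 1 at the full-model estimates can be unpacked in two steps. The sketch below assumes that the full WGEE fit and the dropout score equations are solved exactly, so that the subject-level terms in (4) sum to zero; the bound itself is the arithmetic-geometric mean inequality.

```latex
% Step 1: the uniform weights are feasible at the full-model estimates.
\sum_{i=1}^{n} G_F(X_{Fi}, Y_i, \hat\beta_F, \hat\rho_F^{ST}, \hat\theta) = 0
\;\Longrightarrow\;
\sum_{i=1}^{n} \frac{1}{n}\, G_F(X_{Fi}, Y_i, \hat\beta_F, \hat\rho_F^{ST}, \hat\theta) = 0 .

% Step 2: no admissible weights can exceed the value attained by p_i = 1/n.
\prod_{i=1}^{n} n p_i
\;\le\; \Bigl( \frac{1}{n} \sum_{i=1}^{n} n p_i \Bigr)^{n}
= 1
\quad \text{whenever } p_i > 0,\ \sum_{i=1}^{n} p_i = 1,
\qquad \text{with equality if and only if } p_i = 1/n \text{ for all } i .
```

For any other plug-in values the sample mean of $G_F$ is generally nonzero, the uniform weights are no longer feasible, and the ratio falls strictly below 1.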

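Thereafter, by plugging the parameter estimates $(\hat\beta, \hat\rho^c)$ from a candidate model in WGEE (3) and $\hat\theta_{ML}$ obtained from the estimating equation (1) into $R_F(\hat\beta, \hat\rho^c, \hat\theta_{ML})$, the empirical likelihood ratio is obtained from the following equation by the Lagrange multiplier method (Owen, 2001):

$$-2\log R_F(\hat\beta, \hat\rho^c, \hat\theta_{ML}) = 2\sum_{i=1}^{n} \log\{1 + \lambda' G_F(X_i, Y_i, \hat{\tilde\beta}, \hat\rho^c, \hat\theta_{ML})\}, \tag{6}$$

where the Lagrange multiplier $\lambda$ (a vector, distinct from the dropout probabilities $\lambda_{ij}$) can be solved by applying the Newton-Raphson method based on

$$\sum_{i=1}^{n} \frac{G_F(X_i, Y_i, \hat{\tilde\beta}, \hat\rho^c, \hat\theta_{ML})}{1 + \lambda' G_F(X_i, Y_i, \hat{\tilde\beta}, \hat\rho^c, \hat\theta_{ML})} = 0. \tag{7}$$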
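Equations (6) and (7) can be sketched numerically as below. This is a minimal illustration, not the authors' implementation: the rows of `G` are assumed to hold $G_F$ evaluated at the plug-in estimates for each subject, the function name is hypothetical, and the step-halving safeguard is an added assumption to keep every term $1 + \lambda' G_i$ positive. A solution exists only when zero lies inside the convex hull of the rows of `G`.

```python
import numpy as np

def neg2_log_el_ratio(G, max_iter=50, tol=1e-10):
    """Return (-2 log R_F, lam) for an (n, L) matrix of stacked estimating functions."""
    G = np.asarray(G, dtype=float)
    n, L = G.shape

    def log_el(lam):
        d = 1.0 + G @ lam
        # The dual objective is undefined once any implied weight is non-positive.
        return -np.inf if np.any(d <= 0) else np.sum(np.log(d))

    lam = np.zeros(L)
    for _ in range(max_iter):
        d = 1.0 + G @ lam
        score = (G / d[:, None]).sum(axis=0)      # left-hand side of (7)
        if np.linalg.norm(score) < tol:
            break
        hess = -(G / d[:, None] ** 2).T @ G       # derivative of the score in lam
        step = np.linalg.solve(hess, -score)      # Newton-Raphson direction
        t = 1.0
        while log_el(lam + t * step) < log_el(lam) and t > 1e-10:
            t /= 2.0                              # halve until feasible and non-decreasing
        lam = lam + t * step
    return 2.0 * log_el(lam), lam

# One-dimensional toy case; solving (7) by hand gives lam = 1/4.
val, lam = neg2_log_el_ratio(np.array([[-1.0], [2.0]]))
```

When the columns of `G` already average to zero, as at the full-model estimates, the loop exits immediately with $\lambda = 0$ and the statistic equals zero.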

Thus, for longitudinal data with dropout missingness under MAR, our proposed information criteria are defined by

$$\mathrm{JEAIC} = -2\log R_F(\hat\beta, \hat\rho^c, \hat\theta_{ML}) + 2\tilde p, \qquad \mathrm{JEBIC} = -2\log R_F(\hat\beta, \hat\rho^c, \hat\theta_{ML}) + \tilde p \log n,$$

where $\tilde p$ denotes the total number of parameters. The asymptotic properties of our proposed information criteria can be evaluated based on existing work. In particular, Kolaczyk (1995) proved that EIC is an asymptotically unbiased estimate that is proportional to the expected Kullback-Leibler distance between two discrete empirical distributions. Also, Variyath et al. (2010) evaluated the consistency of EBIC. Both works consider general estimating equations, and it is straightforward to embed our proposed full estimating equation (4) into their theoretical framework when the empirical likelihood estimators are utilized. However, our proposed approach is built upon plug-in estimators; thus, it is important to assess the asymptotic properties of these plug-in estimators and their relationship with the empirical likelihood estimators.
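Once $-2\log R_F$ is available for each candidate, the two criteria are simple penalized sums. The sketch below assumes the usual convention that the candidate with the smallest criterion value is selected; the function names and the numbers in the example are hypothetical and serve only to show the mechanics.

```python
import math

def jeaic(neg2_log_R, p_tilde):
    """JEAIC = -2 log R_F + 2 * p_tilde."""
    return neg2_log_R + 2 * p_tilde

def jebic(neg2_log_R, p_tilde, n):
    """JEBIC = -2 log R_F + p_tilde * log(n)."""
    return neg2_log_R + p_tilde * math.log(n)

def select(candidates, n, criterion="JEBIC"):
    """Pick the candidate with the smallest criterion value.

    candidates : dict mapping a label to (neg2_log_R, p_tilde).
    """
    def score(item):
        neg2_log_R, p_tilde = item[1]
        if criterion == "JEAIC":
            return jeaic(neg2_log_R, p_tilde)
        return jebic(neg2_log_R, p_tilde, n)
    return min(candidates.items(), key=score)[0]

# Hypothetical values: (mean structure, correlation) -> (-2 log R_F, p_tilde).
candidates = {
    ("x1, x2", "EXC"): (3.0, 4),
    ("x1, x2, x3", "EXC"): (2.5, 5),
    ("x1", "IND"): (40.0, 2),
}
best = select(candidates, n=100, criterion="JEBIC")
```

Because the JEBIC penalty grows with $\log n$, an extra parameter must reduce $-2\log R_F$ by more than $\log n$ to be retained, whereas JEAIC only requires a reduction of 2.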

2.3.3 |. Asymptotic properties of plug-in estimators

In this section, we investigate the asymptotic properties of our plug-in estimators under MAR and explain why we advocate such an alternative. First, we investigate the asymptotic behavior of the estimators $\hat\beta_{EL}$, $\hat\rho^c_{EL}$, and $\hat\theta_{EL}$ obtained by maximizing the profile empirical likelihood ratio. Inspired by Qin and Lawless (1994) and Qin et al. (2009), we derive the asymptotic properties of the estimators in Theorem 1, with the proof sketched in the Supporting Information.

Theorem 1.

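Let us denote

$$g_F(X_i, Y_i, \tilde\beta, \rho^c, \theta) = \begin{pmatrix} D_i' V_i^{-1} W_i \{Y_i - \mu_i(\tilde\beta)\} \\ U_i(\tilde\beta) - h(\rho^c)\phi \end{pmatrix}, \quad (\hat\beta_{EL}, \hat\rho^c_{EL}, \hat\theta_{EL}) = \arg\max_{\beta, \rho^c, \theta} R_F(\beta, \rho^c, \theta), \quad \text{and } s = s_i(\theta).$$

Under the conditions specified in the Supporting Information and given $\gamma = (\beta', \rho^{c\prime})'$ and $\theta$ with corresponding true values $\gamma_0$ and $\theta_0$, we have

  • (1)
    $$\begin{pmatrix} \hat\gamma_{EL} - \gamma_0 \\ \hat\theta_{EL} - \theta_0 \end{pmatrix} = \begin{pmatrix} -V^* A^* Q_n^* \\ \Omega S_n^{\theta} \end{pmatrix} + o_p(n^{-1/2}), \tag{8}$$
    where $S_n^{\theta}$ is defined in (1), and
    $$V^* = \left[E\!\left(\frac{\partial g_F}{\partial\gamma}\right)' \left\{E g_F g_F' - E\!\left(\frac{\partial g_F}{\partial\theta}\right)(E ss')^{-1} E\!\left(\frac{\partial g_F}{\partial\theta}\right)'\right\}^{-1} E\!\left(\frac{\partial g_F}{\partial\gamma}\right)\right]^{-1},$$
    $$A^* = E\!\left(\frac{\partial g_F}{\partial\gamma}\right)' \left\{E g_F g_F' - E\!\left(\frac{\partial g_F}{\partial\theta}\right)(E ss')^{-1} E\!\left(\frac{\partial g_F}{\partial\theta}\right)'\right\}^{-1},$$
    $$Q_n^* = \frac{1}{n}\sum_{i=1}^{n} g_F(X_i, Y_i, \tilde\beta, \rho^c, \theta) + E\!\left(\frac{\partial g_F}{\partial\theta}\right) E(ss')^{-1} S_n^{\theta}, \qquad \Omega = (E ss')^{-1}.$$
  • (2)
    Furthermore, the asymptotic normality can be derived from (8):
    $$\sqrt{n}\begin{pmatrix} \hat\gamma_{EL} - \gamma_0 \\ \hat\theta_{EL} - \theta_0 \end{pmatrix} \xrightarrow{d} N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}\right),$$
    with $\Sigma_{11} = V^* A^* \mathrm{Cov}(Q_n^*) A^{*\prime} V^*$ and $\Sigma_{22} = \Omega\, \mathrm{Cov}(S_n^{\theta})\, \Omega$.
  • (3)
    $-2\log R_F(\hat\beta_{EL}, \hat\rho^c_{EL}, \hat\theta_{EL})$ follows a $\chi^2$ distribution with $\tilde L - \tilde P$ degrees of freedom, where $\tilde L$ is the number of estimating equations in (4) and $\tilde P$ is the total number of parameters.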

An interesting finding from Theorem 1 is that the empirical-likelihood-based estimator $\hat\theta_{EL}$ is asymptotically equivalent to the estimator $\hat\theta_{ML}$ from the partial likelihood in (1), since they have the same influence function. Also, the estimator $\hat\theta_{EL}$ is asymptotically independent of the estimator $\hat\gamma_{EL}$ by Theorem 1(2). Thus, we can first substitute $\hat\theta_{ML}$ for $\theta$ in $R_F(\beta, \rho^c, \theta)$ and then estimate $\gamma$ by maximizing $R_F(\gamma; \hat\theta_{ML})$. The resulting estimator is asymptotically equivalent to $\hat\gamma_{EL}$, so we keep this notation in what follows. Such a plug-in method reduces the dimension of the parameters to be estimated by focusing only on $\gamma$, and thus reduces the computational burden, particularly when the dimension of $\theta$ is relatively large.

However, maximizing $R_F(\gamma; \hat\theta_{ML})$ to estimate $\gamma$ still raises computational issues, since the number of estimating equations may exceed the number of parameters, which requires 0 to be inside the convex hull of the data to guarantee the existence of a solution (Variyath et al., 2010; Chen and Lazar, 2012). Furthermore, the bounded support of correlation coefficients adds to the difficulty for existing algorithms. Instead, we advocate replacing the empirical likelihood estimators $\hat\gamma_{EL}$ in $R_F(\hat\gamma_{EL}; \hat\theta_{ML})$ with the estimators from a candidate model fitted by WGEE (3), which avoids these computational issues and ensures convenient application. Here, we investigate the asymptotic relationship between the WGEE and empirical-likelihood-based estimators, which is summarized in the following theorem:

Theorem 2.

Under Theorem 1 and the conditions provided in the Supporting Information, the estimates $\hat\gamma_{EL} = (\hat\beta_{EL}', \hat\rho^{c\prime}_{EL})'$ from empirical likelihood based on (5) and $\hat\gamma = (\hat\beta', \hat\rho^{c\prime})'$ based on WGEE (3) are asymptotically equivalent.

The proofs for the EXC and AR1 scenarios are provided in the Supporting Information. Theorem 2 implies that the WGEE estimator is a reasonable approximation of the empirical likelihood estimator under certain conditions, indicating that the asymptotic properties of the empirical likelihood estimator can reasonably be invoked for the WGEE estimator. Further discussion of the conditions is provided in the Supporting Information.

3 |. SIMULATION STUDIES

In this section, we investigate the numerical performance of our proposed criteria under various settings, and compare with several existing criteria such as MLIC and QICWr as well as the most recent work of JLIC. We expect better performance of the two proposed criteria compared to the existing alternatives. In addition, JEBIC might have better control of false-positive rates than JEAIC under relatively large sample sizes (Variyath et al., 2010).

Our first scenario considers binary outcomes, and the true marginal mean structure is

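$$\log\!\left(\frac{\mu_{ij}}{1 - \mu_{ij}}\right) = \beta_0 + x_{i1}\beta_1 + x_{ij2}\beta_2 \quad \text{for } i = 1,\ldots,n,\ j = 1,\ldots,T, \tag{9}$$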

where xi1 is the subject (cluster)-level covariate generated from the uniform distribution over [0, 1] and xij2 = j − 1 is a time-dependent covariate. The number of observations (ie, cluster size) is T = 3. The true parameter vector β=(β0,β1,β2) in the marginal mean is (−1, 1, 0.4). The true correlation structure is EXC with a correlation coefficient ρ0 = 0.5. The dropout model is

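$$\log\!\left(\frac{\lambda_{ij}}{1 - \lambda_{ij}}\right) = \theta_0 + y_{i(j-1)}\theta_1 + h_{ij}\theta_2 \quad \text{for } i = 1,\ldots,n,\ j = 2,\ldots,T, \tag{10}$$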

where the covariate hij is uniformly distributed over [−0.5, 0.5]. Different choices for the parameters θ=(θ0,θ1,θ2) can ensure the missing probability (denoted by m) around 0.2 and 0.3, that is, θ = (1.74, 0.5, −0.8) is for m = 0.2 and θ = (1.05, 0.5, −0.8) is for m = 0.3.
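The dropout mechanism in (10) can be simulated along the following lines. This is a sketch under stated assumptions: the outcome matrix `y` is a placeholder of independent binary draws rather than the correlated binary outcomes of model (9), the function name is hypothetical, and the realized missing proportion of this toy setup is not claimed to match the values of m quoted above.

```python
import numpy as np

def simulate_dropout(y, h, theta, rng):
    """Monotone dropout indicators R_ij generated from model (10).

    y     : (n, T) outcomes; y[:, j-1] drives dropout at visit j.
    h     : (n, T) dropout covariate h_ij (column 0 is unused).
    theta : (theta0, theta1, theta2).
    """
    n, T = y.shape
    R = np.ones((n, T), dtype=int)               # baseline is always observed
    for j in range(1, T):
        eta = theta[0] + theta[1] * y[:, j - 1] + theta[2] * h[:, j]
        lam = 1.0 / (1.0 + np.exp(-eta))         # Pr(R_ij = 1 | R_i,j-1 = 1, ...)
        stay = rng.random(n) < lam
        R[:, j] = R[:, j - 1] * stay             # once dropped out, always missing
    return R

rng = np.random.default_rng(0)
n, T = 200, 3
y = rng.integers(0, 2, size=(n, T))              # placeholder outcomes (assumption)
h = rng.uniform(-0.5, 0.5, size=(n, T))
R = simulate_dropout(y, h, theta=(1.74, 0.5, -0.8), rng=rng)
```

Multiplying by the previous indicator enforces the monotone pattern, so each row of `R` is a run of ones followed by a run of zeros.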

In the first scenario, we consider a correctly specified dropout model. Then, we also evaluate the robustness of our proposal when the dropout model is misspecified because the variable hij is omitted from the regression (Shen and Chen, 2018).

In addition, we generate one redundant variable xij3 ~ N (0, 1). The full model considered for our proposed criteria as well as MLIC/MLICC includes three variables, xi1, xij2, and xij3. Six potential marginal mean structures are considered with three types of “working” correlation structures (ie, EXC, first order autoregressive [AR1] and independence [IND]) for model fitting. To summarize the simulation results, 500 Monte Carlo data sets with sample size n = 100, 200 are generated for each scenario, and the selection rate for each combination of marginal mean and correlation structures is reported. Moreover, we also consider the scenarios with Gaussian outcomes, the ones where the assumption of MAR is violated, and also the ones with redundant variables. Due to limited space, we cannot show all these results here, but provide them in the Supporting Information.
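The selection rates reported for each combination of mean and correlation structure amount to tabulating, over the Monte Carlo replicates, which candidate a criterion picked. A minimal sketch follows; the function name and the four replicate outcomes are hypothetical and stand in for the 500 replicates described above.

```python
from collections import Counter

def selection_rates(winners):
    """Proportion of replicates in which each (mean, correlation) pair was selected."""
    counts = Counter(winners)
    total = len(winners)
    return {model: count / total for model, count in counts.items()}

# Hypothetical winners from four replicates.
winners = [
    ("x1, x2", "EXC"),
    ("x1, x2", "EXC"),
    ("x1, x2", "AR1"),
    ("x1, x2", "EXC"),
]
rates = selection_rates(winners)
```

Summing the rates over correlation structures for a fixed mean model gives the column totals of the kind shown in the simulation tables, and summing over mean models gives the row totals.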

In addition, to compare our proposal with JLIC, we consider the same setups (with binary and Gaussian outcomes) as in Shen and Chen (2018), utilizing their supporting program functions for the simulations. The detailed parameter setups are not repeated here; see Shen and Chen (2018). All the simulations are conducted in R and MATLAB.

Table 1 shows that both JEAIC and JEBIC outperform the two-stage MLIC/MLICC and QICWr across different settings. In general, all methods exhibit better selection behavior as the sample size increases or the missing probability decreases, but the superiority of our proposal becomes more apparent, with larger improvements in selection rates than the alternatives. Under a relatively small sample size, JEAIC and JEBIC behave similarly on joint model selection, while JEBIC seems more promising under a relatively large sample size because it imposes a heavier penalty that depends on both the number of parameters and the sample size, which agrees with our expectation (Variyath et al., 2010). In contrast, the performance of MLIC/MLICC and QICWr is neither satisfactory nor consistently stable across different setups, despite slightly better performance as the sample size increases. Similar patterns and selection rates can be found in Table 2, which indicates that misspecification of the dropout model does not have much influence on the performance of our proposed criteria when the MAR assumption still holds.

TABLE 1.

Performance of JEAIC and JEBIC compared with MLIC and QICWr: percentage of selecting six candidate logistic models across 500 Monte Carlo datasets; T = 3, ρ = 0.5

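| Setups | Method | C(ρ) | x1 | x3 | x1, x2 | x1, x3 | x2, x3 | x1, x2, x3 | Total |
|---|---|---|---|---|---|---|---|---|---|
| n = 100, m = 0.2 | JEAIC | AR1 | 0.004 | 0 | 0.082 | 0 | 0.016 | 0.006 | 0.108 |
| | | EXC | 0.026 | 0.008 | **0.578** | 0.002 | 0.186 | 0.092 | 0.892 |
| | | IND | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | Total | 0.03 | 0.008 | 0.66 | 0.002 | 0.202 | 0.098 | 1 |
| | JEBIC | AR1 | 0.02 | 0.004 | 0.072 | 0 | 0.014 | 0 | 0.11 |
| | | EXC | 0.09 | 0.028 | **0.566** | 0.002 | 0.2 | 0.004 | 0.89 |
| | | IND | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | Total | 0.11 | 0.032 | 0.638 | 0.002 | 0.214 | 0.004 | 1 |
| | MLIC | AR1 | 0.008 | 0.008 | 0.2 | 0.002 | 0.14 | 0.06 | 0.418 |
| | | EXC | 0.008 | 0.008 | **0.28** | 0.004 | 0.168 | 0.068 | 0.536 |
| | | IND | 0.004 | 0 | 0.018 | 0 | 0.016 | 0.008 | 0.046 |
| | | Total | 0.02 | 0.016 | 0.498 | 0.006 | 0.324 | 0.136 | 1 |
| | QICWr | AR1 | 0 | 0 | 0.062 | 0 | 0.038 | 0.04 | 0.14 |
| | | EXC | 0.006 | 0.004 | **0.436** | 0.002 | 0.236 | 0.112 | 0.796 |
| | | IND | 0 | 0 | 0.03 | 0 | 0.02 | 0.014 | 0.064 |
| | | Total | 0.006 | 0.004 | 0.528 | 0.002 | 0.294 | 0.166 | 1 |
| n = 100, m = 0.3 | JEAIC | AR1 | 0.01 | 0.002 | 0.102 | 0 | 0.03 | 0.016 | 0.16 |
| | | EXC | 0.042 | 0.026 | **0.472** | 0.004 | 0.198 | 0.098 | 0.84 |
| | | IND | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | Total | 0.052 | 0.028 | 0.574 | 0.004 | 0.228 | 0.114 | 1 |
| | JEBIC | AR1 | 0.038 | 0.014 | 0.082 | 0 | 0.028 | 0.002 | 0.164 |
| | | EXC | 0.126 | 0.066 | **0.44** | 0.002 | 0.188 | 0.014 | 0.836 |
| | | IND | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | Total | 0.164 | 0.08 | 0.522 | 0.002 | 0.216 | 0.016 | 1 |
| | MLIC | AR1 | 0.01 | 0.01 | 0.164 | 0 | 0.106 | 0.064 | 0.354 |
| | | EXC | 0.036 | 0.026 | **0.29** | 0.002 | 0.174 | 0.06 | 0.588 |
| | | IND | 0.002 | 0.004 | 0.028 | 0.002 | 0.014 | 0.008 | 0.058 |
| | | Total | 0.048 | 0.04 | 0.482 | 0.004 | 0.294 | 0.132 | 1 |
| | QICWr | AR1 | 0.002 | 0.002 | 0.05 | 0.002 | 0.028 | 0.03 | 0.114 |
| | | EXC | 0.008 | 0.006 | **0.452** | 0.002 | 0.232 | 0.136 | 0.836 |
| | | IND | 0 | 0 | 0.026 | 0 | 0.012 | 0.012 | 0.05 |
| | | Total | 0.01 | 0.008 | 0.528 | 0.004 | 0.272 | 0.178 | 1 |
| n = 200, m = 0.2 | JEAIC | AR1 | 0 | 0 | 0.034 | 0 | 0.008 | 0.008 | 0.05 |
| | | EXC | 0 | 0 | **0.73** | 0 | 0.096 | 0.124 | 0.95 |
| | | IND | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | Total | 0 | 0 | 0.764 | 0 | 0.104 | 0.132 | 1 |
| | JEBIC | AR1 | 0 | 0 | 0.042 | 0 | 0.012 | 0 | 0.054 |
| | | EXC | 0.01 | 0 | **0.806** | 0 | 0.114 | 0.016 | 0.946 |
| | | IND | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | Total | 0.01 | 0 | 0.848 | 0 | 0.126 | 0.016 | 1 |
| | MLIC | AR1 | 0 | 0 | 0.23 | 0 | 0.064 | 0.068 | 0.362 |
| | | EXC | 0.002 | 0 | **0.392** | 0 | 0.082 | 0.098 | 0.574 |
| | | IND | 0.002 | 0 | 0.036 | 0 | 0.006 | 0.02 | 0.064 |
| | | Total | 0.004 | 0 | 0.658 | 0 | 0.152 | 0.186 | 1 |
| | QICWr | AR1 | 0 | 0 | 0.056 | 0 | 0.012 | 0.02 | 0.088 |
| | | EXC | 0 | 0 | **0.56** | 0 | 0.114 | 0.168 | 0.842 |
| | | IND | 0 | 0 | 0.04 | 0 | 0.002 | 0.028 | 0.07 |
| | | Total | 0 | 0 | 0.656 | 0 | 0.128 | 0.216 | 1 |
| n = 200, m = 0.3 | JEAIC | AR1 | 0 | 0 | 0.066 | 0 | 0.014 | 0.008 | 0.088 |
| | | EXC | 0.006 | 0 | **0.646** | 0 | 0.132 | 0.128 | 0.912 |
| | | IND | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | Total | 0.006 | 0 | 0.712 | 0 | 0.146 | 0.136 | 1 |
| | JEBIC | AR1 | 0.002 | 0 | 0.074 | 0 | 0.014 | 0 | 0.09 |
| | | EXC | 0.038 | 0.004 | **0.704** | 0.002 | 0.152 | 0.01 | 0.91 |
| | | IND | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | Total | 0.04 | 0.004 | 0.778 | 0.002 | 0.166 | 0.01 | 1 |
| | MLIC | AR1 | 0.002 | 0 | 0.214 | 0.002 | 0.056 | 0.056 | 0.33 |
| | | EXC | 0.002 | 0.002 | **0.386** | 0 | 0.124 | 0.098 | 0.612 |
| | | IND | 0 | 0 | 0.03 | 0 | 0.01 | 0.018 | 0.058 |
| | | Total | 0.004 | 0.002 | 0.63 | 0.002 | 0.19 | 0.172 | 1 |
| | QICWr | AR1 | 0 | 0 | 0.066 | 0 | 0.006 | 0.03 | 0.102 |
| | | EXC | 0 | 0 | **0.554** | 0 | 0.118 | 0.18 | 0.852 |
| | | IND | 0 | 0 | 0.018 | 0 | 0.004 | 0.024 | 0.046 |
| | | Total | 0 | 0 | 0.638 | 0 | 0.128 | 0.234 | 1 |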

The model with {x1, x2} and an EXC correlation structure is the true model. Notations n and m denote the sample size and the missing probability, respectively.

Abbreviations: EXC, exchangeable; IND, independence; JEAIC, joint empirical Akaike information criterion; JEBIC, joint empirical Bayesian information criterion; MLIC, missing longitudinal information criterion; QICW, weighted quasi-likelihood information criterion.

The bold values denote the true mean and correlation structures.

TABLE 2.

Performance of JEAIC and JEBIC compared with MLIC and QICWr when the dropout model is misspecified: percentage of selecting six candidate logistic models across 500 Monte Carlo datasets; T = 3, ρ = 0.3

Setups Method C(ρ) x1 x3 x1, x2 x1, x3 x2, x3 x1, x2, x3 Total
n = 100 JEAIC AR1 0.002 0.002 0.092 0 0.012 0.008 0.116
m = 0.2 EXC 0.024 0.01 0.566 0.002 0.191 0.09 0.884
IND 0 0 0 0 0 0 0
Total 0.026 0.012 0.659 0.002 0.203 0.098 1
JEBIC AR1 0.022 0.004 0.084 0 0.012 0 0.122
EXC 0.096 0.022 0.557 0.002 0.195 0.006 0.878
IND 0 0 0 0 0 0 0
Total 0.118 0.026 0.641 0.002 0.207 0.006 1
MLIC AR1 0.006 0.012 0.212 0 0.126 0.054 0.41
EXC 0.012 0.006 0.258 0.006 0.184 0.074 0.54
IND 0 0 0.028 0 0.016 0.006 0.05
Total 0.018 0.018 0.498 0.006 0.326 0.134 1
QICWr AR1 0 0 0.056 0 0.042 0.04 0.138
EXC 0.008 0.004 0.436 0.002 0.232 0.114 0.796
IND 0 0 0.034 0 0.02 0.012 0.066
Total 0.008 0.004 0.526 0.002 0.294 0.166 1
n = 100 JEAIC AR1 0.01 0.002 0.094 0.002 0.032 0.02 0.16
m = 0.3 EXC 0.046 0.026 0.484 0.004 0.194 0.086 0.84
IND 0 0 0 0 0 0 0
Total 0.056 0.028 0.578 0.006 0.226 0.106 1
JEBIC AR1 0.042 0.012 0.078 0 0.028 0.002 0.162
EXC 0.142 0.066 0.436 0 0.184 0.01 0.838
IND 0 0 0 0 0 0 0
Total 0.184 0.078 0.514 0 0.212 0.012 1
MLIC AR1 0.01 0.008 0.154 0.002 0.098 0.046 0.318
EXC 0.038 0.032 0.296 0.002 0.184 0.066 0.618
IND 0.002 0.004 0.028 0 0.016 0.014 0.064
Total 0.05 0.044 0.478 0.004 0.298 0.126 1
QICWr AR1 0.002 0 0.048 0 0.03 0.034 0.114
EXC 0.008 0.004 0.456 0.004 0.232 0.132 0.836
IND 0 0 0.022 0 0.012 0.016 0.05
Total 0.01 0.004 0.526 0.004 0.274 0.182 1
n = 200 JEAIC AR1 0 0 0.036 0 0.008 0.008 0.052
m = 0.2 EXC 0 0 0.726 0 0.092 0.13 0.948
IND 0 0 0 0 0 0 0
Total 0 0 0.762 0 0.1 0.138 1
JEBIC AR1 0 0 0.042 0 0.012 0.002 0.056
EXC 0.01 0 0.806 0 0.11 0.018 0.944
IND 0 0 0 0 0 0 0
Total 0.01 0 0.848 0 0.122 0.02 1
MLIC AR1 0 0 0.216 0 0.05 0.072 0.338
EXC 0 0 0.404 0 0.098 0.098 0.6
IND 0.002 0 0.032 0 0.006 0.022 0.062
Total 0.002 0 0.652 0 0.154 0.192 1
QICWr AR1 0 0 0.056 0 0.01 0.028 0.094
EXC 0 0 0.558 0 0.108 0.168 0.834
IND 0 0 0.04 0 0.002 0.03 0.072
Total 0 0 0.654 0 0.12 0.226 1
n = 200 JEAIC AR1 0 0 0.068 0 0.012 0.008 0.088
m = 0.3 EXC 0.008 0 0.662 0 0.132 0.11 0.912
IND 0 0 0 0 0 0 0
Total 0.008 0 0.73 0 0.144 0.118 1
JEBIC AR1 0.004 0.002 0.078 0 0.012 0 0.096
EXC 0.04 0 0.698 0.002 0.154 0.01 0.904
IND 0 0 0 0 0 0 0
Total 0.044 0.002 0.776 0.002 0.166 0.01 1
MLIC AR1 0.002 0 0.206 0.002 0.054 0.072 0.336
EXC 0.004 0.002 0.374 0 0.13 0.094 0.604
IND 0 0 0.032 0 0.012 0.016 0.06
Total 0.006 0.002 0.612 0.002 0.196 0.182 1
QICWr AR1 0 0 0.064 0 0.004 0.028 0.096
EXC 0 0 0.542 0 0.124 0.192 0.858
IND 0 0 0.018 0 0.004 0.024 0.046
Total 0 0 0.624 0 0.132 0.244 1

The model with {x1, x2} and an EXC correlation structure is the true model. Notations n and m denote the sample size and the missing probability, respectively.

Abbreviations: EXC, exchangeable; IND, independence; JEAIC, joint empirical Akaike information criterion; JEBIC, joint empirical Bayesian information criterion; MLIC, missing longitudinal information criterion; QICW, weighted quasi-likelihood information criterion.

The bold values denote the true mean and correlation structures.

Moreover, following the editor’s suggestion, we use the same setups as in the first scenario to investigate marginal mean selection alone, given a prespecified correlation structure. The results, reported in the Supporting Information, imply that a misspecified correlation structure worsens the selection performance. More interestingly, in Table 1, the marginal selection rates for the mean structures (column totals), regardless of which correlation structure is selected, are comparable to or even slightly higher than the oracle rates obtained when the true correlation structure is specified and fixed for marginal mean selection. These findings provide further evidence of the advantages of joint selection: even when the marginal mean structure is the sole interest, joint selection can be expected to yield a satisfactory selection rate. The additional simulations in the Supporting Information further indicate that our proposal is robust when the MAR assumption is violated, and that it generalizes to other types of outcomes and to candidate models with a relatively large number of redundant predictors. Our proposal remains applicable even in scenarios with relatively higher missing proportions (i.e., m = 0.5; results not shown). Overall, the proposed JEAIC and JEBIC outperform the other existing criteria, and JEBIC is highly recommended in real applications when the sample size is relatively large.
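The marginal selection rates discussed above are the "Total" entries of a joint selection table, that is, the joint rates summed over the candidate correlation structures (or over the candidate mean structures). As a minimal sketch, the following recomputes them for the JEAIC block of Table 2 with n = 200 and m = 0.3. The dictionary layout and the function name `marginal_rates` are illustrative assumptions and are not part of the authors' code.

```python
# Joint selection rates for JEAIC copied from Table 2 (n = 200, m = 0.3).
# Columns follow the table header: x1; x3; x1,x2; x1,x3; x2,x3; x1,x2,x3.
MEAN_MODELS = ["x1", "x3", "x1,x2", "x1,x3", "x2,x3", "x1,x2,x3"]
JOINT_RATES = {
    "AR1": [0.0, 0.0, 0.068, 0.0, 0.012, 0.008],
    "EXC": [0.008, 0.0, 0.662, 0.0, 0.132, 0.110],
    "IND": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}


def marginal_rates(joint):
    """Return marginal rates by mean structure and by correlation structure."""
    n_models = len(MEAN_MODELS)
    # Column totals: sum over correlation structures for each mean model.
    by_mean = [round(sum(row[k] for row in joint.values()), 3) for k in range(n_models)]
    # Row totals: sum over mean models for each correlation structure.
    by_corr = {corr: round(sum(row), 3) for corr, row in joint.items()}
    return dict(zip(MEAN_MODELS, by_mean)), by_corr


by_mean, by_corr = marginal_rates(JOINT_RATES)
print(by_mean["x1,x2"])  # marginal rate for the true mean structure {x1, x2}
print(by_corr["EXC"])    # marginal rate for the true correlation structure
```

The two printed values correspond to the "Total" row and "Total" column of that block of the table.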

Tables 3 and 4 summarize the comparison between our proposal and JLIC in terms of joint selection performance when the missing probability is 0.1 or 0.2, under binary and Gaussian scenarios, respectively. All results show that JEAIC and JEBIC outperform JLIC, with higher selection rates for the true underlying model. The improvement becomes more substantial when the outcomes are on a continuous scale. In addition, with a relatively large sample size, JEBIC performs even better, which suggests a possible advantage in controlling false positive rates.

TABLE 3.

Performance of JEAIC and JEBIC compared with JLIC for scenarios with binary outcomes

Setups Method C(ρ) 1 2 3 4 5 6 7 8 9 10 Total
m = 0.1 JLIC AR1 0 0 0.03 0 0 0.007 0.007 0 0 0.003 0.047
EXC 0.006 0 0.645 0 0 0.132 0.147 0 0 0.023 0.953
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0.006 0 0.675 0 0 0.139 0.154 0 0 0.026 1
JEAIC AR1 0 0 0.006 0 0 0.001 0.003 0 0 0.001 0.011
EXC 0.002 0 0.698 0 0 0.138 0.128 0 0 0.023 0.989
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0.002 0 0.704 0 0 0.139 0.131 0 0 0.024 1
JEBIC AR1 0 0 0.011 0 0 0 0 0 0 0 0.011
EXC 0.011 0 0.952 0 0 0.017 0.009 0 0 0 0.989
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0.011 0 0.963 0 0 0.017 0.009 0 0 0 1
m = 0.2 JLIC AR1 0 0 0.057 0 0 0.011 0.009 0 0 0.001 0.078
EXC 0.008 0 0.63 0.002 0 0.12 0.137 0 0.025 0 0.922
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0.008 0 0.687 0.002 0 0.131 0.146 0 0.025 0.001 1
JEAIC AR1 0.001 0 0.016 0 0 0.002 0.004 0 0 0.001 0.024
EXC 0.001 0 0.687 0.001 0 0.146 0.12 0 0 0.021 0.976
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0.002 0 0.703 0.001 0 0.148 0.124 0 0 0.022 1
JEBIC AR1 0.001 0 0.022 0 0 0 0 0 0 0 0.023
EXC 0.026 0 0.922 0 0 0.015 0.014 0 0 0 0.977
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0.027 0 0.944 0 0 0.015 0.014 0 0 0 1

The sample size n = 500, T = 3, ρ = 0.3 across 1,000 Monte Carlo datasets. Ten candidate models are considered: {1} = {x1}, {2} = {x3}, {3} = {x1, x2}, {4} = {x1, x3}, {5} = {x3, x4}, {6} = {x1, x2, x4}, {7} = {x1, x2, x3}, {8} = {x1, x3, x4}, {9} = {x2, x3, x4}, {10} = {x1, x2, x3, x4}. It is noted that Model {3} = {x1, x2} with an EXC correlation structure is the true model. The variables x3 and x4 are redundant.

Abbreviations: EXC, exchangeable; IND, independence; JEAIC, joint empirical Akaike information criterion; JEBIC, joint empirical Bayesian information criterion; JLIC, joint longitudinal information criterion; MLIC, missing longitudinal information criterion; QICW, weighted quasi-likelihood information criterion.

The bold values denote the true mean and correlation structures.

TABLE 4.

Performance of JEAIC and JEBIC compared with JLIC for scenarios with Gaussian outcomes

Setups Method C(ρ) 1 2 3 4 5 6 7 8 9 10 Total
m = 0.1 JLIC AR1 0 0 0 0 0 0 0.082 0 0.027 0.012 0.121
EXC 0 0 0 0 0 0 0.654 0 0.141 0.083 0.878
IND 0 0 0 0 0 0 0 0 0 0.001 0.001
Total 0 0 0 0 0 0 0.736 0 0.168 0.096 1
JEAIC AR1 0 0 0 0 0 0 0.002 0 0 0 0.002
EXC 0 0 0 0 0 0 0.802 0 0.124 0.072 0.998
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0.804 0 0.124 0.072 1
JEBIC AR1 0 0 0 0 0 0 0.002 0 0 0 0.002
EXC 0 0 0 0 0 0 0.991 0 0.005 0.002 0.998
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0.993 0 0.005 0.002 1
m = 0.2 JLIC AR1 0 0 0 0 0 0 0.163 0 0.034 0.027 0.224
EXC 0 0 0 0.001 0 0 0.542 0.002 0.136 0.091 0.772
IND 0 0 0 0 0 0 0 0 0.001 0.003 0.004
Total 0 0 0 0.001 0 0 0.705 0.002 0.171 0.121 1
JEAIC AR1 0 0 0 0 0 0 0.01 0 0.006 0.001 0.017
EXC 0 0 0 0 0 0 0.744 0 0.156 0.083 0.983
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0.754 0 0.162 0.084 1
JEBIC AR1 0 0 0 0 0 0 0.016 0 0.001 0 0.017
EXC 0 0 0 0 0 0 0.975 0 0.007 0.001 0.983
IND 0 0 0 0 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0.991 0 0.008 0.001 1

The sample size n = 500, T = 3, ρ = 0.3 across 1,000 Monte Carlo datasets. Ten candidate models are considered: {1} = {x1}, {2} = {x2}, {3} = {x1, x2}, {4} = {x1, x3}, {5} = {x1, x3, x1,3}, {6} = {x1, x2, x1,2}, {7} = {x1, x2, x3}, {8} = {x2, x3, x2,3}, {9} = {x1, x2, x3, x1,2, x1,3}, {10} = {x1, x2, x3, x1,2, x1,3, x2,3}. It is noted that Model {7} = {x1, x2, x3} with an EXC correlation structure is the true model. The variable x4 is redundant.

Abbreviations: EXC, exchangeable; IND, independence; JEAIC, joint empirical Akaike information criterion; JEBIC, joint empirical Bayesian information criterion; JLIC, joint longitudinal information criterion; MLIC, missing longitudinal information criterion; QICW, weighted quasi-likelihood information criterion.

The bold values denote the true mean and correlation structures.

4 |. REAL DATA APPLICATIONS

4.1 |. Case 1: the ARIC study

The ARIC study was designed to investigate the causes of atherosclerosis and its clinical outcomes, as well as trends in rates of hospitalized myocardial infarction and coronary heart disease, in men and women aged 45 to 64 years from four US communities. For analysis, we select Forsyth County and identify a total of 1036 white patients who were diagnosed with hypertension at the first examination in 1987 to 1989 (Kim et al., 2012). The existing literature has shown that SBP is an important risk factor for CVD risk prediction; however, findings on its longitudinal pattern vary across studies because of factors such as small sample sizes, lack of model diagnosis, and other limiting factors (Muntner et al., 2015). Here, we use the large epidemiological ARIC study for further exploration. During the study period, longitudinal SBP measures were collected at approximately 3-year intervals (1987-1989, 1990-1992, 1993-1995, and 1996-1998). A total of 355 subjects dropped out, leading to a monotone missing pattern. The baseline covariates of interest are age (in years), gender (1 = female; 0 = male), diabetes (1 = fasting glucose ≥ 126 mg/dL; 0 = fasting glucose < 126 mg/dL), and ever smoker (1 = yes; 0 = no); the examination times are coded as 1, 2, 3, and 4 for the four time intervals. Before modeling, the age variable is centered at the mean age of 54 and divided by 10 to represent a decade, and SBP is standardized (Kim et al., 2012). The dropout probability λij is estimated from a logistic model whose independent variables include all of the baseline covariates above together with Yi,j−1, Yi,j−2, and Yi,j−3.
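The preprocessing step described above can be sketched as follows. The function names and the toy records are hypothetical (they are not ARIC data), and the use of the population standard deviation for standardizing SBP is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev


def scale_age(age_years, center=54.0, unit=10.0):
    """Center age at 54 years and express it in decades."""
    return (age_years - center) / unit


def standardize(values):
    """Z-score a list of values using its own mean and standard deviation.

    Assumption: the population standard deviation is used here.
    """
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical records, for illustration only.
ages = [45, 54, 64]
sbp = [120.0, 140.0, 160.0]

age_decades = [scale_age(a) for a in ages]
sbp_z = standardize(sbp)
print(age_decades)
print(sbp_z)
```

After this rescaling, a one-unit change in the age covariate corresponds to ten years, so the age coefficients in Table 5 are read per decade on the standardized SBP scale.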

Table 5 summarizes the results, with boldface values indicating the smallest value of each information criterion among the candidate models. From Table 5, Model 2 with an AR1 correlation structure is selected by JEAIC and JEBIC, whereas Model 2 with an EXC correlation structure is selected by MLIC/MLICC and QICWr. Thus, the marginal mean regression is selected consistently; however, the discrepancy in the selected correlation structures across criteria shows the need for more robust and reliable information criteria. Furthermore, we check the empirical pairwise correlations between time points, which decrease as the time gap grows, indicating that our selection of an AR1 structure is reasonable. The final selected model, Model 2, includes three variables, time, gender, and age, all of which have significant effects on SBP.

TABLE 5.

Analysis of the ARIC study based on eight candidate marginal mean regressions and three potential correlation structures

Predictors C(ρ) Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8
Time 0.05* (0.012) 0.05* (0.012) 0.05* (0.012) 0.05* (0.012) 0.05* (0.012) 0.05* (0.012) 0.05* (0.012) 0.05* (0.012)
Gender −0.15* (0.054) −0.10* (0.050) −0.11* (0.052) −0.14* (0.052) −0.10* (0.050) −0.13* (0.052)
Smoke −0.10 (0.057) −0.07 (0.052) −0.11* (0.055) −0.07 (0.052) −0.11* (0.055)
Age 0.36* (0.045) 0.37* (0.045) 0.37* (0.045) 0.36* (0.045) 0.36* (0.046) 0.36* (0.045)
Diabetes 0.14 (0.076) 0.09 (0.078) 0.10 (0.076) 0.09 (0.077)
JEAIC AR1 123.46 55.24 59.58 117.78 56.26 58.04 62.51 58.95
EXC 129.37 71.02 73.19 121.8 70.37 70.51 72.66 69.68
IND 922.75 789.34 781.46 896.88 798.1 780.41 785.85 791.26
JEBIC AR1 148.18 79.96 84.29 142.5 85.92 87.7 92.17 93.55
EXC 154.09 95.73 97.91 146.52 100.03 100.17 102.32 104.28
IND 942.52 809.11 801.23 916.66 822.81 805.13 810.57 820.91
MLIC EXC 5,118 5,037.9 5,040.7 5,117.2 5,040.1 5,042.5 5,044.9 5,044.8
QICWr EXC 5,114.7 5,035.1 5,037.9 5,113.5 5,036.6 5,038.6 5,041 5,040.1

Summary results include WGEE estimates with standard errors in parentheses under an AR1 “working” correlation structure, and JEAIC, JEBIC, MLIC, and QICWr for model selection. It is noted that for MLIC and QICWr, an EXC correlation structure is selected based on MLICC and QICWr, respectively.

Abbreviations: ARIC, Atherosclerosis Risk in Communities; EXC, exchangeable; IND, independence; JEAIC, joint empirical Akaike information criterion; JEBIC, joint empirical Bayesian information criterion; MLIC, missing longitudinal information criterion; QICW, weighted quasi-likelihood information criterion.

* p < 0.05.

The bold values denote the optimal models selected based on different criteria.
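The joint selection rule behind Table 5 amounts to taking the smallest criterion value over the whole grid of candidate mean models and correlation structures. The following minimal sketch applies that rule to the JEAIC and JEBIC values copied from Table 5; the data layout and the function name `select_joint` are illustrative assumptions.

```python
MODELS = [f"Model {k}" for k in range(1, 9)]

# Criterion values copied from Table 5 (rows: correlation structure).
JEAIC = {
    "AR1": [123.46, 55.24, 59.58, 117.78, 56.26, 58.04, 62.51, 58.95],
    "EXC": [129.37, 71.02, 73.19, 121.80, 70.37, 70.51, 72.66, 69.68],
    "IND": [922.75, 789.34, 781.46, 896.88, 798.10, 780.41, 785.85, 791.26],
}
JEBIC = {
    "AR1": [148.18, 79.96, 84.29, 142.50, 85.92, 87.70, 92.17, 93.55],
    "EXC": [154.09, 95.73, 97.91, 146.52, 100.03, 100.17, 102.32, 104.28],
    "IND": [942.52, 809.11, 801.23, 916.66, 822.81, 805.13, 810.57, 820.91],
}


def select_joint(criterion):
    """Return (value, correlation, model) with the smallest criterion value."""
    return min(
        (value, corr, model)
        for corr, row in criterion.items()
        for value, model in zip(row, MODELS)
    )


print(select_joint(JEAIC))
print(select_joint(JEBIC))
```

Both criteria pick Model 2 with an AR1 structure, in agreement with the selection reported in the text.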

4.2 |. Case 2: the national institute of the mental health schizophrenia (IMPS) study

To further evaluate our proposal for categorical outcomes, we consider data from the IMPS study, which includes 293 patients in the treatment group, who were given the drugs chlorpromazine, fluphenazine, or thioridazine, and 93 patients in the placebo group (Gibbons and Hedeker, 1994). For each patient, the severity of schizophrenia disorder (IMPS79) was measured (range: 0–7) at weeks 0, 1, 3, and 6 (time = week). Here, we define Y = 1 if IMPS ≥ 4 and Y = 0 otherwise. The goal is to investigate the effects of treatment (drug = 1 for treatment; 0 for placebo) and sex (male = 1; female = 0) on Y. The dropout probability λij is estimated from a logistic regression with the predictors drugij, sexij, timeij, Yi,j−1, Yi,j−2, and Yi,j−3.
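To make the role of the dropout model concrete, the sketch below dichotomizes the severity score as described above and turns fitted dropout probabilities into inverse probability weights. It assumes the standard monotone-dropout construction of Robins et al. (1995), in which the probability of still being observed at visit j is the running product of (1 − λik) over visits k ≤ j, with λi1 = 0. The function names and numeric inputs are hypothetical and are not taken from the IMPS data.

```python
def dichotomize(imps_score, cutoff=4.0):
    """Y = 1 if the severity score is at least the cutoff, else 0."""
    return 1 if imps_score >= cutoff else 0


def inverse_probability_weights(dropout_probs, n_observed):
    """Weights for one subject under monotone dropout (assumed construction).

    dropout_probs[j] is the fitted probability of dropping out at visit j,
    given that the subject was observed at the previous visit; the first
    entry is 0 because the baseline visit is assumed always observed.
    n_observed is the number of visits actually observed.
    """
    weights = []
    prob_observed = 1.0
    for j, lam in enumerate(dropout_probs):
        # Cumulative probability of remaining in the study through visit j.
        prob_observed *= 1.0 - lam
        # Observed visits are weighted by 1 / probability; missing ones get 0.
        weights.append(1.0 / prob_observed if j < n_observed else 0.0)
    return weights


# Hypothetical inputs, for illustration only.
y = [dichotomize(s) for s in (2.5, 4.0, 5.5)]
w = inverse_probability_weights([0.0, 0.2, 0.5, 0.5], n_observed=3)
print(y)
print(w)
```

Visits that were unlikely to be observed receive larger weights, which is how the weighted estimating equations compensate for subjects who dropped out.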

Table 6 summarizes the results of model fitting and comparison. Previous work has shown that an AR1 correlation structure is preferred based on MLICC; thus, MLIC and QICWr are calculated given this AR1 selection. Table 6 shows that Model 3 is selected as the best candidate model by JEAIC, JEBIC, and MLIC, because it attains the minimum value of each criterion among all six candidate models. QICWr, however, selects Model 4, although its value is only slightly lower than that of Model 3 (1,529.5 versus 1,529.6). The final selected model, Model 3, includes two variables, time and drug, both of which have significant effects on the risk of severe schizophrenia disorder.

TABLE 6.

Analysis of the IMPS study based on six candidate marginal mean regressions and three correlation structures

Predictors C(ρ) Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
Time −1.339 (0.081)* −1.372 (0.084)* −1.166 (0.208)* −1.372 (0.084)* −1.180 (0.239)*
Drug −0.618 (0.182)* −0.854 (0.236)* −0.357 (0.438) −0.860 (0.237)* −0.524 (0.492)
Sex 0.116 (0.184) −0.188 (0.494)
Time × drug −0.256 (0.271) −0.252 (0.229)
Time × sex 0.023 (0.171)
Sex × drug 0.345 (0.460)
JEAIC AR1 27.55 398.50 16.08 17.55 17.64 22.63
EXC 94.52 491.44 90.70 91.87 94.14 101.78
IND 223.56 496.46 209.77 210.76 209.69 212.86
JEBIC AR1 39.42 410.37 31.91 37.33 37.42 54.28
EXC 106.38 503.31 106.53 111.65 113.92 133.43
IND 231.48 504.37 221.64 226.59 225.51 240.56
MLIC AR1 261.9 321.5 255.8 256 256.5 257.5
QICWr AR1 1,554.8 1,872.2 1,529.6 1,529.5 1,532.7 1,537.1

Summary results include WGEE estimates with standard errors in parentheses under an AR1 “working” correlation structure, and JEAIC, JEBIC, MLIC, and QICWr for model selection. It is noted that for MLIC and QICWr, an AR1 correlation structure is selected based on MLICC and QICWr, respectively.

Abbreviations: EXC, exchangeable; IMPS, institute of the mental health schizophrenia; IND, independence; JEAIC, joint empirical Akaike information criterion; JEBIC, joint empirical Bayesian information criterion; MLIC, missing longitudinal information criterion; QICW, weighted quasi-likelihood information criterion; WGEE, weighted generalized estimating equation.

* p < 0.05.

The bold values denote the optimal models selected based on different criteria.
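As a quick arithmetic check on Tables 5 and 6, the gap between JEBIC and JEAIC for a given model should depend only on the number of parameters p and the number of subjects n if, as their names suggest, the two criteria share the same empirical-likelihood term and differ only in an AIC-type penalty 2p versus a BIC-type penalty p log n. The sketch below works under that assumption. It counts p as the intercept plus the regression coefficients plus one correlation parameter for AR1 or EXC (none for IND); this counting convention is inferred for illustration and is not stated in this section.

```python
import math


def penalty_gap(n_subjects, n_params):
    """JEBIC - JEAIC under the assumed penalties p * log(n) and 2 * p."""
    return n_params * (math.log(n_subjects) - 2.0)


# (label, n, assumed p, JEAIC, JEBIC); criterion values copied from Tables 5 and 6.
# n = 1036 for ARIC and n = 293 + 93 = 386 for IMPS, as reported in the text.
CASES = [
    ("ARIC Model 2, AR1", 1036, 5, 55.24, 79.96),
    ("ARIC Model 2, IND", 1036, 4, 789.34, 809.11),
    ("IMPS Model 3, AR1", 386, 4, 16.08, 31.91),
    ("IMPS Model 6, AR1", 386, 8, 22.63, 54.28),
]

for label, n, p, jeaic, jebic in CASES:
    # Tabulated gap next to the gap implied by the assumed penalties.
    print(label, round(jebic - jeaic, 2), round(penalty_gap(n, p), 2))
```

For these four entries the tabulated gaps agree with the assumed penalty difference to within the rounding of the tables, which is consistent with, though not proof of, the assumed penalty forms.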

5 |. DISCUSSION

In this paper, we heuristically introduce two innovative information criteria, JEAIC and JEBIC, for longitudinal data with dropout missingness under MAR. The proposed criteria are evaluated in both theoretical and numerical studies and perform better than MLIC, QICWr, and JLIC under a variety of scenarios. In particular, the expected quadratic loss on which MLIC and JLIC are based is a model-free criterion: it measures only how well the estimated means approximate the population means, without identifying the true mean structure (Ye, 1998). Thus, it may be difficult to distinguish two mean structures that are both close to the true mean in finite samples. QICWr, for its part, modifies QIC and implements correlation structure selection based on a so-called “more informative” penalty term (Gosho, 2016). However, it is unclear in theory whether and how correctly specifying a “working” correlation structure will intrinsically minimize the penalty term in QICWr. In contrast, our proposed JEAIC and JEBIC are based on empirical likelihood, which is distribution-free and efficiently driven by the observed data and informative estimating equations. This offers an explanation of why our empirical-likelihood-based criteria perform better, assuming that the true underlying model is nested within the full estimating equations. Our approach is easy to implement in software, and the code is available in the Supporting Information. Also, extensive simulations show that our proposed criteria are computationally efficient and can be flexibly extended to more complicated scenarios, indicating the potential for wide application.

Despite the aforementioned advantages of JEAIC and JEBIC, substantial work remains for further evaluation and improvement. For instance, selection stability with respect to sampling variability may need further checking via extensive simulation studies using a bootstrap approach. Two other potential extensions are (a) accommodating more general missing patterns such as intermittent missingness, and (b) handling missingness in time-dependent covariates or high-dimensional predictors (e.g., gene expression data; Chen et al., 2010), which is also commonly encountered in practice. How to generalize our proposal and accurately perform joint model selection under these scenarios therefore still needs to be explored.

Supplementary Material

proof

ACKNOWLEDGMENTS

Wang’s research was partially supported by Grants UL1 TR002014 and KL2 TR002015 from the National Center for Advancing Translational Sciences (NCATS). The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health, the National Science Foundation, or other research sponsors.

Funding information

National Center for Advancing Translational Sciences, Grant/Award Number: UL1 TR002014 and KL2 TR002015

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

REFERENCES

  1. Bilderbeck A, Reed ZE, McMahon H, Atkinson L, Price J, Geddes J, et al. (2016). Associations between mood instability and emotional processing in a large cohort of bipolar patients. Psychological Medicine, 46, 3151–3160.
  2. Chen B, Yi GY and Cook RJ (2010). Weighted generalized estimating functions for longitudinal response and covariate data that are missing at random. Journal of the American Statistical Association, 105, 336–353.
  3. Chen J and Lazar NA (2012). Selection of working correlation structure in generalized estimating equations via empirical likelihood. Journal of Computational and Graphical Statistics, 21(1), 18–41.
  4. Gibbons RD and Hedeker D (1994). Application of random-effects probit regression models. Journal of Consulting and Clinical Psychology, 62(2), 285.
  5. Gosho M (2016). Model selection in the weighted generalized estimating equations for longitudinal data with dropout. Biometrical Journal, 58(3), 570–587.
  6. Gosho M, Hamada C and Yoshimura I (2014). Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data. Communications in Statistics-Simulation and Computation, 43, 62–81.
  7. Kim S, Zeng D, Chambless L and Li Y (2012). Joint models of longitudinal data and recurrent events with informative terminal event. Statistics in Biosciences, 4(2), 262–281.
  8. Kolaczyk ED (1995). An information criterion for empirical likelihood with general estimating equations. University of Chicago Technical Report 417, Dept. of Statistics, The University of Chicago.
  9. Laird NM (1988). Missing data in longitudinal studies. Statistics in Medicine, 7(1-2), 305–315.
  10. Liang KY and Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13–22.
  11. Little RJ and Rubin DB (2014). Statistical Analysis with Missing Data. 2nd Edition, Hoboken: John Wiley & Sons.
  12. McCullagh P and Nelder JA (1989). Generalized Linear Models. London: Chapman & Hall/CRC.
  13. Muntner P, Whittle J, Lynch AI, Colantonio LD, Simpson LM, Einhorn PT, et al. (2015). Visit-to-visit variability of blood pressure and coronary heart disease, stroke, heart failure, and mortality: a cohort study. Annals of Internal Medicine, 163, 329–338.
  14. Owen AB (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237–249.
  15. Owen AB (2001). Empirical Likelihood. New York: Chapman & Hall/CRC.
  16. Pan W (2001). Akaike's information criterion in generalized estimating equations. Biometrics, 57(1), 120–125.
  17. Parati G, Ochoa JE, Lombardi C and Bilo G (2013). Assessment and management of blood-pressure variability. Nature Reviews Cardiology, 10, 143.
  18. Platt RW, Brookhart MA, Cole SR, Westreich D and Schisterman EF (2013). An information criterion for marginal structural models. Statistics in Medicine, 32(8), 1383–1393.
  19. Preisser JS, Lohman KK and Rathouz PJ (2002). Performance of weighted estimating equations for longitudinal binary data with drop-outs missing at random. Statistics in Medicine, 21(20), 3035–3054.
  20. Qin J and Lawless J (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1), 300–325.
  21. Qin J, Zhang B and Leung DHY (2009). Empirical likelihood in missing data problems. Journal of the American Statistical Association, 104(488), 1492–1503.
  22. Robins JM, Rotnitzky A and Zhao LP (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90(429), 106–121.
  23. Shardell M and Miller RR (2008). Weighted estimating equations for longitudinal studies with death and non-monotone missing time-dependent covariates and outcomes. Statistics in Medicine, 27, 1008–1025.
  24. Shen CW and Chen YH (2012). Model selection for generalized estimating equations accommodating dropout missingness. Biometrics, 68(4), 1046–1054.
  25. Shen CW and Chen YH (2018). Joint model selection of marginal mean regression and correlation structure for longitudinal data with missing outcome and covariate. Biometrical Journal, 60(1), 20–33.
  26. Simon GE, Bauer MS, Ludman EJ, Operskalski BH and Unützer J (2007). Mood symptoms, functional impairment, and disability in people with bipolar disorder: specific effects of mania and depression. The Journal of Clinical Psychiatry, 68, 1237–1245.
  27. Variyath AM, Chen J and Abraham B (2010). Empirical likelihood based variable selection. Journal of Statistical Planning and Inference, 140(4), 971–981.
  28. Wang M (2014). Generalized estimating equations in longitudinal data analysis: a review and recent developments. Advances in Statistics, 2014, 11. Available at: 10.1155/2014/303728
  29. Wedderburn RWM (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61(3), 437–447.
  30. Ye J (1998). On measuring and correcting the effects of data mining and model selection. Journal of the American Statistical Association, 93(441), 120–131.
