Abstract
Cluster randomized trials (CRTs) usually randomize groups of individuals to interventions, and outcomes are typically measured at the individual level. Marginal intervention effects are frequently of interest in CRTs due to their population-averaged interpretations. Such effects are commonly estimated using generalized estimating equations (GEE), or a more recent alternative called the quadratic inference function (QIF). However, the performance of QIF relative to GEE has not been extensively evaluated in the CRT context, especially when the marginal mean model includes additional covariates. Motivated by the HALI trial, we conduct simulation studies to compare the finite-sample operating characteristics of QIF and GEE. We demonstrate that QIF and GEE are equivalent under some conditions. When the marginal mean model includes individual-level covariates, QIF shows an efficiency improvement over GEE with overall larger power, but its test size may be more liberal than that of GEE, and GEE achieves better coverage than QIF. The test size inflation may not be fully addressed by finite-sample bias corrections. In the HALI data, the QIF estimates tend to be close to the GEE estimates, although the former have smaller standard errors. Overall, we confirm that the QIF approach has potentially better efficiency than GEE in our simulation studies, but it should be used more cautiously as an approach for the analysis of CRTs. More research is needed, however, to address the finite-sample bias in the variance estimation of the QIF to better control its test size.
Keywords: Generalized estimating equations, Generalized method of moments, Marginal intervention effect estimation, Population-average intervention effect, Quadratic inference function
1. Introduction
A cluster randomized trial (CRT), also referred to as a group randomized trial or a community-randomized trial, is a randomized controlled trial where a cluster (e.g. hospital, village) of individuals is the unit of randomization [1]. The outcomes are usually measured for each individual member within a cluster. As a consequence of the cluster randomization design, the outcomes of individuals within the same cluster are expected to be correlated and this correlation, namely the intraclass correlation coefficient (ICC), must be accounted for in both the design and analysis [2,3].
During the analysis stage, the ICC is commonly accounted for using one of two modeling approaches for individual-level outcome data. One approach is the mixed-effects model, which estimates the cluster-specific (conditional) intervention effect. The other approach is the generalized estimating equations (GEE), which estimate the population-averaged (marginal) intervention effect [4]. The conditional and marginal intervention effects are equal to each other with the identity or log link, while they could differ with the logit link for binary outcomes [5]. The choice of marginal or conditional models depends on the research objectives and each model has its pros and cons. In particular, while the mixed-effects model requires the correct specification of the random-effects distribution to obtain consistent model-based variance [6], the marginal model is more robust with the sandwich variance being consistent even under misspecification of the correlation structure. For this reason, the marginal model coupled with GEE has been commonly used in the analysis of CRTs [7].
Despite the robustness of the marginal model, misspecification of the correlation structure could lead to reduced efficiency in estimating the intervention effect. To improve the efficiency of GEE, an alternative approach, the quadratic inference function (QIF), has been developed [8,9]. Although it has been shown that QIF may have an efficiency advantage over GEE with correlated data that arise in longitudinal settings [8,9], it is not as commonly used in the analysis of CRTs, with a few exceptions [10,11,12]. The purpose of this article is to provide additional empirical evidence by comparing the performance of QIF and GEE in CRT scenarios, and to discuss several analytical insights into these two methods.
2. Motivating study: the HALI trial
The HALI (Health and Literacy Intervention) trial is a 2 × 2 factorial CRT conducted in Kenya to evaluate the impact of a malaria intervention and a literacy intervention on child health and educational outcomes [13,14]. We focus on the literacy intervention and do not address features of the factorial design in this paper. In other words, we treat the trial as a regular parallel-arm CRT. A multilevel data structure is present because children are nested in schools, which are nested in teacher advisory centers; the randomization is carried out at the school level and stratified by teacher advisory center. We focus on the 9-month outcome and only consider school-level clustering, as a previous analysis suggested minimal clustering at the teacher advisory center level [15]. There are 51 schools in the literacy intervention arm and 50 schools in the control arm, with approximately 25 children in each school.
The descriptive statistics in the HALI trial are provided in Table 1. The primary outcome is the spelling score, which ranges from 0 to 20 and is treated as a continuous outcome. Other baseline variables included in the trial are age, sex, baseline spelling score (individual-level variables) and the availability of handwashing facilities (school-level variable), which might have a direct or indirect relationship with the primary outcome. All baseline variables (with age log-transformed) are reasonably balanced between the intervention and the control arms due to randomization [15].
Table 1.
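Level | Variable | Statistic | Overall (101 schools) | Intervention (51 schools) | Control (50 schools) |
---|---|---|---|---|---|
Child-level | log-Age | n | 2230 | 1103 | 1127 |
Child-level | log-Age | Min | 1.6 | 1.6 | 1.6 |
Child-level | log-Age | Mean | 2.0 | 2.0 | 2.0 |
Child-level | log-Age | Median | 2.1 | 2.1 | 2.1 |
Child-level | log-Age | IQR | 1.9–2.2 | 1.9–2.2 | 1.9–2.2 |
Child-level | log-Age | Max | 2.7 | 2.7 | 2.6 |
Child-level | log-Age | SD | 0.22 | 0.21 | 0.22 |
Child-level | Sex | n | | | |
Child-level | Sex | Male | | | |
Child-level | Sex | Female | | | |
Child-level | Baseline spelling score | n | 2193 | 1089 | 1104 |
Child-level | Baseline spelling score | Min | 0 | 0 | 0 |
Child-level | Baseline spelling score | Mean | 8.2 | 8.5 | 7.9 |
Child-level | Baseline spelling score | Median | 7 | 8 | 7 |
Child-level | Baseline spelling score | IQR | | | |
Child-level | Baseline spelling score | Max | 20 | 20 | 19 |
Child-level | Baseline spelling score | SD | 4.50 | 4.66 | 4.33 |
School-level | Handwash facilities in school | N | | | |
School-level | Handwash facilities in school | Yes | | | |
School-level | Handwash facilities in school | No | | | |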
In the context of the HALI study, we consider CRTs with a continuous outcome measured at a single follow-up time point. We also assume a relatively large number of clusters (e.g., ) with equal cluster size. We first describe the GEE and QIF approaches to estimate the marginal intervention effect and analytically study their connections. The finite-sample operating characteristics of GEE and QIF are evaluated using simulations, and both approaches are applied to the HALI trial for empirical illustrations.
3. Statistical methods
3.1. Generalized estimating equations (GEE)
For correlated data arising from N clusters and m individuals in each cluster, we use $x_{ij}$, $y_{ij}$, $\mu_{ij}$ to denote the p-dimensional design vector, individual-level outcome, and marginal mean of the jth individual in the ith cluster. We define the covariate matrix $X_i = (x_{i1}, \ldots, x_{im})^T$, outcome vector $y_i = (y_{i1}, \ldots, y_{im})^T$ and mean vector $\mu_i = (\mu_{i1}, \ldots, \mu_{im})^T$ for the ith cluster. We write $V_i$ as the working covariance matrix of the ith cluster, $\beta$ as the p-dimensional regression parameter vector, and define a generalized linear model $h(\mu_{ij}) = x_{ij}^T\beta$, where $h$ is a monotonic and differentiable link function. The marginal variance is defined as $\mathrm{var}(y_{ij}) = \phi\,\nu(\mu_{ij})$, where $\nu$ is a parametric variance function and φ is the common dispersion. We also define the gradient matrix $D_i = \partial\mu_i/\partial\beta^T$.
The generalized estimating equations (GEE) approach was originally developed for longitudinal data analysis based on quasi-likelihood [4,16] and is formulated as
$$\sum_{i=1}^{N} D_i^T V_i^{-1}(y_i - \mu_i) = 0.$$
To specify the unknown covariance matrix $V_i$, a working correlation matrix $R(\rho)$ is assumed and frequently parameterized by a common parameter $\rho$. Define the variance matrix as $A_i = \mathrm{diag}\{\nu(\mu_{i1}), \ldots, \nu(\mu_{im})\}$, and the covariance matrix can be written as $V_i = \phi A_i^{1/2} R(\rho) A_i^{1/2}$. The estimation of $\beta$ and $\rho$ is carried out via the modified Fisher-scoring algorithm [4]. It is known that the GEE estimator $\hat\beta$ is consistent for any choice of working correlation structure and has a robust "sandwich" covariance matrix given by
$$\mathrm{cov}(\hat\beta) = \Big(\sum_{i=1}^{N} D_i^T V_i^{-1} D_i\Big)^{-1}\Big\{\sum_{i=1}^{N} D_i^T V_i^{-1}\,\mathrm{cov}(y_i)\,V_i^{-1} D_i\Big\}\Big(\sum_{i=1}^{N} D_i^T V_i^{-1} D_i\Big)^{-1}. \tag{1}$$
A consistent estimator for the "sandwich" covariance matrix is obtained by replacing $\mathrm{cov}(y_i)$ in equation (1) with its empirical version $(y_i - \hat\mu_i)(y_i - \hat\mu_i)^T$. In the context of CRTs, the two most frequently used working correlation structures are the independence and exchangeable structures. Interestingly, under some conditions, the GEE estimator provides identical point estimates using either correlation structure [8], as summarized in the following result.
Result 1
Under the independence and exchangeable working correlations, GEE produces identical point estimates and identical robust sandwich covariances if the following two conditions are satisfied: (1) the marginal mean model only includes cluster-level covariates; (2) the cluster sizes are equal.
The proof of Result 1 is provided in Appendix I.
In CRTs with correlated binary outcomes, Pan [17] demonstrated the same result assuming a logistic marginal mean model, and Result 1 can be viewed as a generalization of this earlier result to arbitrary link and variance functions. In practice, this result implies that when only the intervention indicator is included in the marginal analysis of CRTs, the point estimate for the intervention effect is the same regardless of the working correlation specification as long as the cluster sizes are the same. Moreover, Result 1 has important implications for sample size and power calculations in the design stage. Equal cluster sizes are frequently assumed in designing CRTs [18], in which case the sample size requirements become identical under either working correlation specification.
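Result 1 can also be checked numerically. The following is a minimal sketch, assuming an identity link, a constant variance function, and a fixed (rather than estimated) exchangeable correlation value; under these assumptions the GEE has a closed-form solution. The function name `gee_identity`, the simulated data, and all parameter values are illustrative assumptions and are not taken from the HALI trial or from the authors' software.

```python
import numpy as np

def gee_identity(X, y, rho):
    """GEE for an identity link with a fixed exchangeable working correlation.

    X: (N, m, p) design array, y: (N, m) outcomes; rho = 0 gives independence.
    Returns the point estimate and the robust sandwich covariance.
    """
    N, m, p = X.shape
    R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
    W = np.linalg.inv(R)  # the dispersion cancels in both outputs
    bread = sum(X[i].T @ W @ X[i] for i in range(N))
    beta = np.linalg.solve(bread, sum(X[i].T @ W @ y[i] for i in range(N)))
    meat = np.zeros((p, p))
    for i in range(N):
        u = X[i].T @ W @ (y[i] - X[i] @ beta)  # score contribution of cluster i
        meat += np.outer(u, u)
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv

# Hypothetical balanced CRT: 100 clusters of size 25, intervention indicator only
rng = np.random.default_rng(2024)
N, m = 100, 25
z = np.repeat([1.0, 0.0], N // 2)  # cluster-level arm indicator
X = np.stack([np.column_stack([np.ones(m), np.full(m, zi)]) for zi in z])
b = rng.normal(0.0, np.sqrt(0.05), size=(N, 1))  # shared cluster effect
y = 1.0 + 0.3 * z[:, None] + b + rng.normal(0.0, np.sqrt(0.95), size=(N, m))

beta_ind, cov_ind = gee_identity(X, y, rho=0.0)
beta_exc, cov_exc = gee_identity(X, y, rho=0.05)
print(np.allclose(beta_ind, beta_exc), np.allclose(cov_ind, cov_exc))
```

Because the equivalence holds for any value of the exchangeable correlation, fixing it at 0.05 is enough for the illustration. Adding an individual-level column to `X` would violate condition (1), and the two fits would then generally be expected to differ.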
3.2. Quadratic inference function (QIF)
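The quadratic inference function (QIF) approach was introduced to improve the efficiency of GEE in longitudinal data analysis under correlation misspecification [19]. The QIF approach expresses the inverse of the working correlation matrix as a linear combination of K basis matrices: $R^{-1}(\rho) = \sum_{k=1}^{K} a_k M_k$, with $M_k$ as the kth basis matrix and $a_k$ the weight. The first basis matrix is usually specified as the identity matrix $I_m$. A $Kp$-dimensional score vector is then defined as
$$g_i(\beta) = \begin{pmatrix} D_i^T A_i^{-1/2} M_1 A_i^{-1/2} (y_i - \mu_i) \\ \vdots \\ D_i^T A_i^{-1/2} M_K A_i^{-1/2} (y_i - \mu_i) \end{pmatrix}.$$
Let $\bar g_N(\beta) = N^{-1}\sum_{i=1}^{N} g_i(\beta)$ be the vector of extended score equations and $C_N(\beta) = N^{-2}\sum_{i=1}^{N} g_i(\beta) g_i(\beta)^T$ be the empirical covariance matrix of $\bar g_N(\beta)$. The QIF is written as $Q_N(\beta) = \bar g_N(\beta)^T C_N(\beta)^{-1} \bar g_N(\beta)$. Based on the generalized method of moments (GMM) [20], the estimator $\hat\beta = \arg\min_\beta Q_N(\beta)$ can be more efficient than the GEE estimator in large samples when the working correlation is misspecified. From the first derivative of $Q_N(\beta)$, the QIF estimator obtained by minimizing the function is asymptotically equivalent to solving $\dot g_N^T C_N^{-1} \bar g_N = 0$, where $\dot g_N = \partial \bar g_N/\partial\beta^T$ [19], and the Newton-Raphson algorithm can be used to iteratively update the estimator for $\beta$ until convergence [9]. A consistent variance estimator of the QIF estimator then has a sandwich form
$$\widehat{\mathrm{cov}}(\hat\beta) = \big(\dot g_N^T C_N^{-1} \dot g_N\big)^{-1},$$
with all quantities evaluated at $\hat\beta$.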
Due to the theoretical efficiency improvement of QIF over GEE, there have been increasing efforts in developing the theory of QIF for analyzing correlated data, including the following examples concerning longitudinal data analysis. A QIF likelihood-ratio test statistic, with an asymptotic Chi-squared distribution, was proposed and shown to be useful for testing goodness-of-fit in longitudinal studies [19]. A penalized version of QIF can further accommodate variable selection [21]. The Godambe Information (TGI) criterion and the trace of the empirical covariance matrix were developed to select the appropriate correlation structure [22,23]. The QIF approach has also been shown to automatically down-weight outlying observations [8], while GEE has an unbounded influence function and can be sensitive to outliers. In addition, the QIF approach can be utilized for meta-analysis with a flexible joint estimation procedure [24]. In small to moderately-sized samples, it was found that the standard errors of parameters can be severely downward-biased, and two bias-corrected covariance estimators have been shown to provide adequate finite-sample adjustments [9]. Furthermore, it was shown that imbalance of covariate distributions and of cluster sizes can also lead to larger variability of the QIF estimator [25].
In contrast to the longitudinal data setting, in a CRT with a single follow-up time point, there is no natural ordering or structure of individuals in the same cluster. That is, decay-type structures are not appropriate, but the exchangeable working correlation structure with a common ICC parameter $\rho$ is a natural choice. For the exchangeable working correlation structure, we can write $R(\rho) = (1-\rho) I_m + \rho J_m$, where $J_m$ is an $m \times m$ matrix of 1's. Following Li et al. [26], we have $R^{-1}(\rho) = \frac{1}{1-\rho} I_m - \frac{\rho}{(1-\rho)\{1+(m-1)\rho\}} J_m$. As $J_m$ is not a full-rank matrix, a better choice is to specify $M_1 = I_m$ and $M_2 = J_m - I_m$ as two full-rank basis matrices for the exchangeable correlation structure of QIF [9]. We can also use QIF with an independence correlation structure, with the identity matrix $I_m$ as the only basis matrix. Parallel to Result 1, we provide an additional insight that the results obtained from GEE and QIF are equal under the conditions stated below.
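The algebra behind the exchangeable basis matrices can be verified numerically. This short sketch assumes the common choice of an identity matrix and a matrix with zeros on the diagonal and ones elsewhere as the two basis matrices; the cluster size of 25 mirrors the simulation design, while the correlation value 0.05 is purely illustrative.

```python
import numpy as np

m, rho = 25, 0.05  # rho is an illustrative value, not taken from the paper
I, J = np.eye(m), np.ones((m, m))
R = (1 - rho) * I + rho * J  # exchangeable working correlation

# Closed-form inverse as a linear combination of I and J
a1 = 1 / (1 - rho)
a2 = -rho / ((1 - rho) * (1 + (m - 1) * rho))
R_inv_closed = a1 * I + a2 * J
inverse_matches = np.allclose(R_inv_closed, np.linalg.inv(R))

# J has rank 1, whereas J - I has full rank
rank_J = np.linalg.matrix_rank(J)
rank_M2 = np.linalg.matrix_rank(J - I)

# The same inverse rewritten with the full-rank pair M1 = I, M2 = J - I
R_inv_full_rank = (a1 + a2) * I + a2 * (J - I)
same_inverse = np.allclose(R_inv_full_rank, R_inv_closed)
print(inverse_matches, rank_J, rank_M2, same_inverse)
```

Rewriting the inverse in terms of the full-rank pair only changes the weights, which QIF never needs to estimate explicitly.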
Result 2
Under the exchangeable working correlation, QIF and GEE produce identical point estimates and robust covariances if two conditions are satisfied: (1) the marginal mean model only includes cluster-level covariates; (2) equal cluster sizes.
The proof of Result 2 is provided in Appendix I.
Furthermore, it has been pointed out previously that QIF and GEE are identical when the independence working correlation is used [8,25]. Combining Result 1 and Result 2, we further summarize an additional result as follows.
Result 3
The point estimates and robust covariances are identical using either GEE or QIF with either independence or exchangeable working correlation, if the following two conditions are satisfied: (1) the marginal mean model only includes cluster-level covariates; (2) equal cluster sizes.
The proof of Result 3 is also provided in Appendix I.
As previously indicated, these results have implications for sample size procedures assuming equal cluster sizes, in which case the sample size requirements will be equivalent using either GEE or QIF coupled with either the independence or the exchangeable correlation structure.
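The equivalence results can be illustrated with a small numerical experiment. The sketch below is not the authors' implementation: it assumes an identity link, uses a Gauss-Newton update with the weighting matrix held fixed within each iteration, and substitutes a pseudo-inverse because the empirical covariance of the extended scores becomes singular when the mean model holds only cluster-level covariates. The function name `qif_identity` and the simulated data are hypothetical.

```python
import numpy as np

def qif_identity(X, y, n_iter=50, tol=1e-10):
    """QIF for an identity link with basis matrices M1 = I and M2 = J - I.

    X: (N, m, p) design array, y: (N, m) outcomes.
    Returns the estimate and the covariance (G' C^+ G)^{-1}.
    """
    N, m, p = X.shape
    bases = [np.eye(m), np.ones((m, m)) - np.eye(m)]
    # Derivative of the averaged extended score; constant for the identity link
    G = -sum(np.vstack([X[i].T @ M @ X[i] for M in bases]) for i in range(N)) / N

    def weight(beta):
        # Stack the extended scores g_i into an (N, 2p) array
        g = np.stack([
            np.concatenate([X[i].T @ M @ (y[i] - X[i] @ beta) for M in bases])
            for i in range(N)
        ])
        C = g.T @ g / N**2  # empirical covariance of the averaged score
        # Pseudo-inverse: C is singular with cluster-level covariates only
        return g.mean(axis=0), np.linalg.pinv(C, rcond=1e-10)

    beta = np.zeros(p)
    for _ in range(n_iter):
        g_bar, C_inv = weight(beta)
        step = np.linalg.solve(G.T @ C_inv @ G, G.T @ C_inv @ g_bar)
        beta = beta - step  # Gauss-Newton update with C held fixed
        if np.max(np.abs(step)) < tol:
            break
    _, C_inv = weight(beta)
    return beta, np.linalg.inv(G.T @ C_inv @ G)

# Hypothetical balanced CRT with the intervention indicator as the only covariate
rng = np.random.default_rng(7)
N, m = 100, 25
z = np.repeat([1.0, 0.0], N // 2)
X = np.stack([np.column_stack([np.ones(m), np.full(m, zi)]) for zi in z])
y = 1.0 + 0.3 * z[:, None] + rng.normal(0, 0.2, (N, 1)) + rng.normal(0, 1, (N, m))

beta_qif, cov_qif = qif_identity(X, y)

# GEE under independence: least squares with a cluster-robust sandwich
Xf, yf = X.reshape(-1, 2), y.reshape(-1)
bread_inv = np.linalg.inv(Xf.T @ Xf)
beta_gee = bread_inv @ Xf.T @ yf
scores = np.stack([X[i].T @ (y[i] - X[i] @ beta_gee) for i in range(N)])
cov_gee = bread_inv @ scores.T @ scores @ bread_inv
print(np.allclose(beta_qif, beta_gee), np.allclose(cov_qif, cov_gee))
```

With an individual-level covariate in `X`, the weighting matrix is typically nonsingular, the pseudo-inverse coincides with the ordinary inverse, and the QIF and GEE fits are no longer expected to agree exactly.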
4. Simulation studies
To study the empirical performance of QIF in CRTs, we carry out a series of simulation studies. Inspired by the HALI trial in our motivating example, we assume 100 clusters, with 50 clusters randomized to each of the intervention and control arms. For simplicity, we specify the cluster sizes to be 25 for all 100 clusters. We consider two mean models and four types of correlation structures to form different data generating processes (DGPs) in CRTs. We then implement GEE and QIF with the correctly-specified mean model and exchangeable working correlation structure. In each simulation, we use a multivariate normal model to simulate outcomes in each cluster with individual-level variance . The four correlation structures used in the DGP include: fixed-regular exchangeable (CS0), regular exchangeable (CS1), cluster-specific exchangeable (CS2) and cluster-specific exchangeable with sub-clustering (CS3). The two selected mean models are: intervention-only model (MM1) and covariate-adjusted model (MM2).
In the intervention-only mean model (MM1), we use to indicate whether the ith cluster is in the intervention arm () or not (). The MM1 is given by . We fix and allow the marginal intervention effect to adopt a range of different values in the DGP. For MM2, we simulate four covariates whose distributions are informed by the HALI trial. Specifically, we simulate three individual-level covariates , and mimicking the age, sex and baseline spelling score; we assume follows a log-normal distribution with mean of 2 and standard deviation 0.2 (both on the log scale), follows a binomial distribution with probability of 0.5 and follows a normal distribution with mean 8 and standard deviation 5. We also simulate a cluster-level covariate from a binomial distribution with probability of 0.26 based on the school-level prevalence of handwash facilities. We write MM2 as . We specify different values for the intervention effect in the DGP, while keeping all other covariate effects constant as .
For the correlation structures, we first assume the fixed-regular exchangeable correlation structure (CS0), in which all clusters have the same exchangeable correlation structure with ICC . For the regular exchangeable correlation structure (CS1), we assume all the clusters have the same exchangeable correlation matrix within a simulation iteration, but this matrix may vary across iterations. The cluster-specific exchangeable correlation structure (CS2) further allows possibly different exchangeable correlation matrices for each cluster in each simulation iteration. Both CS1 and CS2 assume the ICC ρ in the exchangeable correlation structure is sampled from a uniform distribution from 0.01 to 0.2. For the cluster-specific exchangeable correlation structure with sub-clustering (CS3), we first generate a variable from a discrete uniform distribution in . For the jth and kth individuals in the ith cluster, we specify their pairwise correlation value to be if and are the same, indicating the most closely correlated pairs. If and differ by 1, 2 or 3, we specify the correlation to be or , respectively, in order to model different degrees of pairwise correlations. In particular, except for CS0, we have assumed the true correlation matrix is sampled from a population-level distribution; the purpose of this additional step is to provide a data generating process with the desired marginal mean and a complex correlation structure, but without the multivariate normality assumption. This type of DGP is less restrictive than the usual multivariate normal DGP, and we provide additional technical details on this type of DGP in Appendix II. For CS0, CS1 and CS2, we use five different values for in the DGP, and use six values for under CS3, leading to a total of 42 scenarios.
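The three exchangeable structures can be mimicked with a shared cluster effect plus independent noise, which yields an exchangeable correlation equal to the ICC within each cluster. This is only a sketch of the intervention-only setting: the function name `simulate_crt`, the intercept, the effect size, the unit variance, and the fixed ICC of 0.05 are placeholder assumptions, since those values are not recoverable from the text above, and the sub-clustering structure CS3 is not covered.

```python
import numpy as np

def simulate_crt(rng, icc, n_clusters=100, m=25, beta0=0.0, beta1=0.3, sigma2=1.0):
    """Simulate outcomes with exchangeable within-cluster correlation.

    icc may be a scalar (same ICC in every cluster) or an array holding one
    ICC per cluster. beta0, beta1 and sigma2 are placeholder values.
    """
    icc = np.broadcast_to(np.asarray(icc, dtype=float), (n_clusters,))
    z = rng.permutation(np.repeat([1.0, 0.0], n_clusters // 2))  # 1:1 allocation
    # A shared cluster effect plus independent noise gives correlation icc
    b = rng.normal(0.0, np.sqrt(icc * sigma2))[:, None]
    e = rng.normal(0.0, np.sqrt((1 - icc) * sigma2)[:, None], size=(n_clusters, m))
    y = beta0 + beta1 * z[:, None] + b + e
    return z, y

rng = np.random.default_rng(1)
z0, y0 = simulate_crt(rng, icc=0.05)                         # CS0-like: fixed ICC
z1, y1 = simulate_crt(rng, icc=rng.uniform(0.01, 0.2))       # CS1-like: one ICC per replicate
z2, y2 = simulate_crt(rng, icc=rng.uniform(0.01, 0.2, 100))  # CS2-like: one ICC per cluster
```

Conditional on the sampled ICC the outcomes are multivariate normal, while averaging over the ICC distribution gives a mixture, which is in line with the remark that these DGPs do not rely on marginal multivariate normality.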
We generate 3000 replicates for each scenario under the 8 combinations of the 2 mean models and 4 correlation structures. Two models are fit to each simulated data set, namely GEE with exchangeable working correlation and QIF with exchangeable working correlation. As we explain in Appendix II, the working correlation matrix will be incorrectly specified under DGP with CS2 and CS3. In all scenarios, the correct mean model is specified in the analysis, namely an intervention-only mean model is fit for data generated under MM1 and the covariate-adjusted mean model is fit for data generated under MM2.
The following metrics are used to compare the performance of GEE and QIF: (1) relative bias (RBS), which equals the bias relative to the true effect, (2) empirical standard error (ESE) of the estimates, and (3) the mean robust standard error (MRSE) over all 3000 replicates. In addition, we set the nominal type I error rate to be 0.05 to obtain power for the Wald-type Z-test. We also calculate the power ratio (PR), defined as the power of the QIF analysis relative to that of the GEE analysis (Q/G). We additionally calculate the coverage probability (Coverage) for both GEE and QIF. Finally, the empirical type I error rate is estimated assuming a true null intervention effect () in the DGP.
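These summaries are simple functions of the replicate-level estimates and robust standard errors. The helper below is a sketch under the assumption of a two-sided Wald test at the 0.05 level; the function name `summarize` and the toy inputs are hypothetical.

```python
import numpy as np

def summarize(estimates, std_errors, true_effect, z_crit=1.959964):
    """Summarize replicate-level estimates and robust standard errors."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower, upper = est - z_crit * se, est + z_crit * se
    out = {
        "ESE": est.std(ddof=1),  # empirical standard error
        "MRSE": se.mean(),       # mean robust standard error
        # Rejection rate: power, or type I error rate under a null effect
        "rejection": np.mean(np.abs(est / se) > z_crit),
        "coverage": np.mean((lower <= true_effect) & (true_effect <= upper)),
    }
    # Relative bias is undefined when the true effect is zero
    out["RBS"] = (est.mean() - true_effect) / true_effect if true_effect != 0 else float("nan")
    return out

toy = summarize([0.2, 0.4], [0.1, 0.1], true_effect=0.3)
```

The power ratio is then the `rejection` value from the QIF fits divided by the one from the GEE fits for the same scenario.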
We present results for DGP with MM1 and four different correlation structures in Table 2. The results indicate that GEE and QIF lead to exactly the same estimates, confirming the analytical insights from Result 2. With the exchangeable working correlation structure and balanced cluster sizes, MM1 only includes the cluster-level intervention variable, satisfying the conditions listed in Result 2. In this setting, the relative biases are small for both GEE and QIF, and the coverage probabilities are all close to nominal. The type I error rate, which is the power under a null intervention effect, is also close to 0.05. In addition, in each scenario, the ESE and the MRSE are close to each other, indicating that the robust variance estimators are consistent for both GEE and QIF. Notably, the results in scenarios with CS3 and or have smaller power than those with CS0 and CS1, suggesting that both QIF and GEE could be less efficient with a misspecified correlation structure [19].
Table 2.
DGP: MM1, Analysis: MM1a | | RBS (GEE) | RBS (QIF) | ESE (GEE) | ESE (QIF) | MRSE (GEE) | MRSE (QIF) | Power (GEE) | Power (QIF) | Coverage (GEE) | Coverage (QIF) |
---|---|---|---|---|---|---|---|---|---|---|---|
CS0 | | – | – | – | – | – | – | | | 94.43% | 94.43% |
CS0 | | | | 0.118 | 0.118 | 0.117 | 0.117 | | | 94.67% | 94.67% |
CS0 | | | | 0.118 | 0.118 | 0.117 | 0.117 | | | 94.67% | 94.67% |
CS0 | | | | 0.118 | 0.118 | 0.117 | 0.117 | | | 94.67% | 94.67% |
CS0 | | | | 0.119 | 0.119 | 0.117 | 0.117 | | | 94.27% | 94.27% |
CS1 | | – | – | – | – | – | – | | | 94.13% | 94.13% |
CS1 | | | | 0.147 | 0.147 | 0.143 | 0.143 | | | 95.20% | 95.20% |
CS1 | | | | 0.148 | 0.148 | 0.145 | 0.145 | | | 94.77% | 94.77% |
CS1 | | | | 0.150 | 0.150 | 0.145 | 0.145 | | | 94.60% | 94.60% |
CS1 | | | | 0.148 | 0.148 | 0.146 | 0.146 | | | 94.93% | 94.93% |
CS2 | | – | – | – | – | – | – | | | 94.70% | 94.70% |
CS2 | | | | 0.152 | 0.152 | 0.148 | 0.148 | | | 93.93% | 93.93% |
CS2 | | | | 0.150 | 0.150 | 0.148 | 0.148 | | | 94.10% | 94.10% |
CS2 | | | | 0.149 | 0.149 | 0.148 | 0.148 | | | 94.60% | 94.60% |
CS2 | | | | 0.146 | 0.146 | 0.148 | 0.148 | | | 94.80% | 94.80% |
CS3 | | – | – | – | – | – | – | | | 94.70% | 94.70% |
CS3 | | | | 0.215 | 0.215 | 0.211 | 0.211 | | | 94.20% | 94.20% |
CS3 | | 0.30% | | 0.219 | 0.219 | 0.212 | 0.212 | | | 94.00% | 94.00% |
CS3 | | | | 0.216 | 0.216 | 0.212 | 0.212 | | | 94.53% | 94.53% |
CS3 | | | | 0.212 | 0.212 | 0.212 | 0.212 | | | 94.60% | 94.60% |
CS3 | | | | 0.213 | 0.213 | 0.212 | 0.212 | | | 94.10% | 94.10% |
The analysis utilizes mean model MM1 and exchangeable working correlation matrix with robust SE.
Table 3 presents the results for the DGP with MM2 and four correlation structures. Our analytical finding from Result 2 does not apply to MM2, which now includes multiple individual-level covariates. As a result, the GEE and QIF results differ, although both approaches give small relative bias across all scenarios. Interestingly, while GEE has a smaller ESE than QIF, QIF presents a smaller MRSE than GEE. This finding suggests that the robust variance of QIF tends to be biased towards zero under this complex mean model. The downward bias of the QIF variance estimator further leads to under-coverage of the interval estimator, and to type I error inflation, especially when the correlation structure deviates from CS0. In contrast, the type I error rate and coverage of the GEE estimator are closer to nominal throughout. The results in Table 3 also allow us to compare the efficiency between QIF and GEE, by comparing the ESE and the power. Under CS0 and CS1, GEE and QIF have almost identical power under the alternative, confirming that their results are asymptotically equivalent under correct correlation specification. When the working correlation model is misspecified, the power of both GEE and QIF decreases, and QIF appears to be slightly more powerful, as evidenced by the results under CS3. Throughout, the power ratio of QIF over GEE is at most slightly larger than 1 (the largest power ratio is 1.0446, under CS3), and becomes closer to 1 as the effect size increases. However, one should be cautious in interpreting the efficiency advantage of QIF because (a) QIF carries an inflated type I error rate under the null and (b) the QIF interval estimator frequently leads to under-coverage due to the negative bias in its robust variance estimator.
Table 3.
DGP: MM2, Analysis: MM2a | | RBS (GEE) | RBS (QIF) | ESE (GEE) | ESE (QIF) | MRSE (GEE) | MRSE (QIF) | Power (GEE) | Power (QIF) | PR (Q/G) | Coverage (GEE) | Coverage (QIF) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
CS0 | | – | – | – | – | – | – | | | – | 94.73% | 93.07% |
CS0 | | | | 0.120 | 0.123 | 0.117 | 0.114 | | | 1.0022 | 94.50% | 92.97% |
CS0 | | | | 0.122 | 0.127 | 0.117 | 0.114 | | | 0.9983 | 93.53% | 91.90% |
CS0 | | | | 0.120 | 0.124 | 0.117 | 0.114 | | | 0.9990 | 94.40% | 92.67% |
CS0 | | | | 0.116 | 0.120 | 0.117 | 0.114 | | | 0.9990 | 94.37% | 93.13% |
CS1 | | – | – | – | – | – | – | | | – | 93.93% | 92.17% |
CS1 | | | | 0.150 | 0.154 | 0.146 | 0.142 | | | 1.0178 | 94.87% | 93.47% |
CS1 | | | | 0.154 | 0.159 | 0.146 | 0.142 | | | 1.0001 | 93.27% | 91.80% |
CS1 | | | | 0.151 | 0.156 | 0.145 | 0.141 | | | 1.0003 | 94.17% | 92.93% |
CS1 | | | | 0.157 | 0.163 | 0.146 | 0.142 | | | 1.0003 | 93.77% | 91.73% |
CS2 | | – | – | – | – | – | – | | | – | 94.33% | 92.70% |
CS2 | | | | 0.151 | 0.153 | 0.148 | 0.143 | | | 1.0202 | 94.40% | 93.20% |
CS2 | | | | 0.154 | 0.158 | 0.148 | 0.143 | | | 1.0145 | 94.37% | 92.03% |
CS2 | | | | 0.151 | 0.154 | 0.148 | 0.143 | | | 1.0003 | 94.93% | 93.53% |
CS2 | | | | 0.152 | 0.155 | 0.148 | 0.143 | | | 1.0010 | 94.10% | 92.43% |
CS3 | | – | – | – | – | – | – | | | – | 94.07% | 92.83% |
CS3 | | | | 0.221 | 0.228 | 0.212 | 0.206 | | | 1.0446 | 94.07% | 92.10% |
CS3 | | | | 0.220 | 0.227 | 0.212 | 0.206 | | | 1.0262 | 93.67% | 91.87% |
CS3 | | | | 0.215 | 0.221 | 0.211 | 0.205 | | | 1.0065 | 94.10% | 92.67% |
CS3 | | | | 0.216 | 0.223 | 0.212 | 0.206 | | | 1.0007 | 94.13% | 92.93% |
CS3 | | | | 0.213 | 0.220 | 0.212 | 0.206 | | | 1.0000 | 94.53% | 93.33% |
The analysis utilizes mean model MM2 and exchangeable working correlation matrix with robust SE.
In an effort to reduce the bias in the QIF variance estimator under MM2, we additionally explore two bias-corrections for the variance proposed by Westgate [9]. These two bias-corrections are extensions of the Mancl-DeRouen (MD) and Kauermann-Carroll (KC) methods proposed in the GEE literature [27,28], and we refer to them as the MD-corrected and KC-corrected QIF variances. The details of the two bias-correction methods for QIF are given in Appendix III. Table 4 shows that the type I error rates from the two bias-correction methods are closer to the nominal 0.05 level than that from QIF without correction. However, the type I error rate of bias-corrected QIF remains liberal and consistently larger than that of GEE, which cautions against the use of QIF in CRTs when the marginal mean model is complex with multiple individual-level covariates.
Table 4.
DGP: MM2, Analysis: MM2a | Method | ESE | MRSE | Power (Type I error) |
---|---|---|---|---|
CS0 | GEE | 0.119 | 0.117 | 5.27% |
CS0 | QIF | 0.123 | 0.114 | 6.93% |
CS0 | QIF (MD-corrected) | 0.123 | 0.117 | 5.93% |
CS0 | QIF (KC-corrected) | 0.123 | 0.115 | 6.33% |
CS1 | GEE | 0.152 | 0.145 | 6.07% |
CS1 | QIF | 0.159 | 0.141 | 7.83% |
CS1 | QIF (MD-corrected) | 0.159 | 0.146 | 7.27% |
CS1 | QIF (KC-corrected) | 0.159 | 0.143 | 7.50% |
CS2 | GEE | 0.152 | 0.148 | 5.67% |
CS2 | QIF | 0.155 | 0.143 | 7.30% |
CS2 | QIF (MD-corrected) | 0.155 | 0.148 | 6.43% |
CS2 | QIF (KC-corrected) | 0.155 | 0.145 | 6.90% |
CS3 | GEE | 0.215 | 0.212 | 5.93% |
CS3 | QIF | 0.220 | 0.206 | 7.17% |
CS3 | QIF (MD-corrected) | 0.220 | 0.212 | 6.33% |
CS3 | QIF (KC-corrected) | 0.220 | 0.209 | 6.73% |
The analysis utilizes mean model MM2 and exchangeable working correlation matrix with robust SE and bias-corrected SEs.
As pointed out by a reviewer, a possible explanation for the bias of the variance estimator for QIF and the associated inflated type I error could be the large variability of the empirical weighting or covariance matrix. Previous studies have suggested that including additional covariates in the mean model could lead to increased variability in estimating this matrix, which affects the efficiency of the QIF estimator [23,25,29,30]. It remains to be explored whether improved estimation of this matrix along the lines of Westgate [29] and Westgate [30], coupled with the bias-corrected variances, could reduce the negative bias in the robust variance of QIF and improve the coverage rate of the interval estimator.
5. Analysis of HALI trial
We apply GEE and QIF to the analysis of the HALI trial and focus on the continuous outcome of spelling score at the 9-month follow-up [15]. From the descriptive statistics, we observe the baseline covariates are approximately the same across arms, but the mean value of the spelling score at 9-month follow-up differs between the arms, indicating a potential intervention effect due to the literacy intervention. In the marginal mean model, we sequentially include the cluster-level intervention , age , sex , presence of handwash facilities and baseline spelling score . We utilize both independence and exchangeable correlation structures based on three models on the marginal mean outcome in Table 5 to estimate the unadjusted and covariate-adjusted literacy intervention effects. The intervention parameter is denoted as . Specifically, the first mean model is an unadjusted model with intervention indicator as the only covariate, which corresponds to MM1 in simulation studies. The second and third mean models are covariate-adjusted models with three and four other covariates, with the third model corresponding to MM2 in simulation studies. We denote and as GEE with the independence and exchangeable correlation structures, respectively. The notations of and are similarly used for QIF. In addition, we also consider the bias-correction techniques of QIF [9] for the two correlation structures and denote them as , , and .
Table 5.
Model | Formulation |
---|---|
1 | |
2 | |
3 |
We summarize the results in Table 6. The intervention effect estimates from GEE and QIF are generally close to each other under mean model 1, but may be slightly different under mean models 2 and 3. Specifically, the intervention effect estimate tends to be larger using QIF with the exchangeable working correlation than with the other methods. Although the standard error from the uncorrected variance of QIF appears to be the smallest, it may carry negative bias, as suggested by the simulation studies. The two bias-corrected variances of QIF could slightly reduce the bias and improve the variance estimator. For example, the MD-corrected QIF standard error is close to the GEE standard error with the exchangeable working correlation. Overall, the standard error estimates are similar across methods under each specific mean model, suggesting that the application of QIF may offer limited efficiency improvement over GEE in this data example. However, across the mean models, the standard error estimates for all methods decrease sharply under mean model 3 compared with the other mean models, suggesting a strong predictive effect of the baseline spelling score.
Table 6.
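Model | Method | Estimate | S.E. | 95% confidence interval | p-value |
---|---|---|---|---|---|
1 | GEE, independence | 1.766 | 0.4813 | (0.823, 2.709) | 0.00024 |
1 | GEE, exchangeable | 1.758 | 0.4819 | (0.813, 2.703) | 0.00026 |
1 | QIF, independence | 1.766 | 0.4813 | (0.822, 2.709) | 0.00024 |
1 | QIF, independence (MD-corrected) | 1.766 | 0.4913 | (0.803, 2.728) | 0.00033 |
1 | QIF, independence (KC-corrected) | 1.766 | 0.4863 | (0.812, 2.719) | 0.00028 |
1 | QIF, exchangeable | 1.797 | 0.4748 | (0.866, 2.727) | 0.00015 |
1 | QIF, exchangeable (MD-corrected) | 1.797 | 0.4847 | (0.847, 2.747) | 0.00021 |
1 | QIF, exchangeable (KC-corrected) | 1.797 | 0.4797 | (0.856, 2.737) | 0.00018 |
2 | GEE, independence | 1.811 | 0.4758 | (0.878, 2.744) | 0.00014 |
2 | GEE, exchangeable | 1.842 | 0.4774 | (0.906, 2.778) | 0.00011 |
2 | QIF, independence | 1.811 | 0.4758 | (0.879, 2.744) | 0.00014 |
2 | QIF, independence (MD-corrected) | 1.811 | 0.4943 | (0.842, 2.780) | 0.00025 |
2 | QIF, independence (KC-corrected) | 1.811 | 0.4849 | (0.861, 2.762) | 0.00019 |
2 | QIF, exchangeable | 2.056 | 0.4583 | (1.158, 2.955) | 0.00001 |
2 | QIF, exchangeable (MD-corrected) | 2.056 | 0.4734 | (1.128, 2.984) | 0.00001 |
2 | QIF, exchangeable (KC-corrected) | 2.056 | 0.4659 | (1.143, 2.969) | 0.00001 |
3 | GEE, independence | 1.413 | 0.2868 | (0.851, 1.975) | |
3 | GEE, exchangeable | 1.446 | 0.2934 | (0.871, 2.021) | |
3 | QIF, independence | 1.413 | 0.2868 | (0.851, 1.975) | |
3 | QIF, independence (MD-corrected) | 1.413 | 0.2980 | (0.829, 1.997) | |
3 | QIF, independence (KC-corrected) | 1.413 | 0.2924 | (0.840, 1.986) | |
3 | QIF, exchangeable | 1.601 | 0.2817 | (1.049, 2.153) | |
3 | QIF, exchangeable (MD-corrected) | 1.601 | 0.2922 | (1.028, 2.173) | |
3 | QIF, exchangeable (KC-corrected) | 1.601 | 0.2872 | (1.038, 2.164) | |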
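The interval and p-value columns of Table 6 appear consistent with a two-sided Wald test based on the normal distribution. The sketch below makes that assumption explicit and recomputes the first row from its estimate and standard error; the function name `wald_summary` is hypothetical.

```python
from math import erfc, sqrt

def wald_summary(estimate, se, z_crit=1.959964):
    """Two-sided 95% Wald interval and p-value under a normal approximation."""
    z = estimate / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return estimate - z_crit * se, estimate + z_crit * se, p_value

lower, upper, p = wald_summary(1.766, 0.4813)  # first row of Table 6
print(round(lower, 3), round(upper, 3), round(p, 5))
```

Small discrepancies in the last digit of other rows may reflect rounding of the reported estimates and standard errors.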
6. Discussion
In this paper, we compare the QIF approach with the more commonly-used GEE approach for the estimation of the intervention effect in CRTs. In particular, we focus on CRTs with continuous outcomes at one follow-up time point, a large number of clusters and equal cluster sizes, similar to the motivating data example. We present three analytical results on the equivalence between GEE and QIF under specific conditions to explicitly acknowledge their connections. Our simulation studies also confirm these analytical results. Although our simulations show a potential power advantage of QIF over GEE, the inflated type I error rate of QIF calls for caution when the marginal mean model includes multiple baseline covariates. On the other hand, the GEE approach performs stably under complex mean and correlation models in our setting with a large number of clusters.
Our simulation results suggest two specific limitations of QIF in the application to CRTs, which may be addressed by further research. First, we found an inflated type I error rate of QIF when the DGP involves the more complex marginal mean model MM2. Surprisingly, applying the two bias-correction techniques does not fully address this issue, as the empirical type I error rate is still above 7% across 3000 simulations. The type I error inflation is mostly due to the negative bias of the QIF variance estimator. In CRTs, better control of the type I error rate may be achieved by a permutation test. In future studies, one could consider developing a marginal-model-based permutation analysis for QIF, as in Braun and Feng [31] and Li et al. [32], to achieve better finite-sample properties. Second, despite a potential power advantage of QIF over GEE suggested by Table 3, we additionally saw a larger ESE for QIF compared to GEE. In fact, if we define the relative efficiency based on the ratio of the ESEs, then Table 3 suggests that QIF could be slightly less efficient than GEE, even though the asymptotic theory states otherwise. As we explain in the simulations, the reduced efficiency of QIF may be attributed to the large variability in estimating the empirical weighting or covariance matrix under a complex mean model [25]. Additional studies are required to systematically evaluate whether improved estimation of this weighting matrix [25,29,30] can lead to better efficiency of QIF and eventually address the bias of its variance estimates.
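To make the permutation idea concrete, the following is a minimal sketch of a simple, unadjusted cluster-level permutation test on cluster means. It is not the marginal-model-based permutation analysis of Braun and Feng [31] or Li et al. [32], and it is not a QIF-based procedure. The function name `cluster_permutation_test`, the simulated data and all numerical settings are hypothetical and chosen only for illustration.

```python
import numpy as np

def cluster_permutation_test(cluster_means, arm, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in cluster means.

    Hypothetical helper: treatment labels are re-shuffled across clusters,
    which respects the cluster as the unit of randomization.
    """
    rng = np.random.default_rng(seed)
    cluster_means = np.asarray(cluster_means, dtype=float)
    arm = np.asarray(arm, dtype=int)

    def stat(a):
        return cluster_means[a == 1].mean() - cluster_means[a == 0].mean()

    observed = stat(arm)
    perm_stats = np.array([stat(rng.permutation(arm)) for _ in range(n_perm)])
    # Add one to numerator and denominator so the p-value is never exactly zero.
    p_value = (1 + np.sum(np.abs(perm_stats) >= abs(observed))) / (n_perm + 1)
    return observed, p_value

# Illustrative data: 20 clusters of size 30 with an assumed effect of 1.8.
rng = np.random.default_rng(11)
n_clusters, m, effect = 20, 30, 1.8
arm = np.repeat([0, 1], n_clusters // 2)
cluster_effect = rng.normal(size=n_clusters)
y = effect * arm[:, None] + cluster_effect[:, None] + rng.normal(size=(n_clusters, m))

observed, p_value = cluster_permutation_test(y.mean(axis=1), arm)
print(f"difference in cluster means: {observed:.3f}, permutation p-value: {p_value:.4f}")
```

Because the reference distribution is generated by re-randomizing clusters, the test does not rely on a variance estimator, which is the reason it may offer better type I error control in finite samples.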
To conclude, our empirical evaluation supports the use of GEE over QIF in CRTs with a large number of clusters. We observe that QIF could exhibit an inflated type I error rate and under-coverage when the marginal mean model includes baseline adjustment variables other than the intervention status, while the GEE approach performs consistently well across all scenarios. The bias-corrected variances of QIF also show limited improvement in terms of type I error rate, and more empirical evaluations of QIF are required to clearly demonstrate its theoretical efficiency advantage over GEE before recommending its routine application in CRTs.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Acknowledgements
This paper was completed in part as the master's thesis of the first author. We thank Simon Brooker and Matthew Jukes, Principal Investigators of the HALI trial, for agreeing to the use of the data. We thank all the participants of the study for providing the data. We also thank Huiman X. Barnhart and Yuliya V. Lokhnygina for reviewing the manuscript and for helpful discussions as thesis committee members at various stages of this work.
APPENDIX I. Proofs of Results 1, 2 and 3
Result 1
Suppose that there is only a cluster-level covariate vector for each individual in the ith cluster, with the covariate vector denoted by . We have and the individual-level variance . With only cluster-level covariates, the individual-level means and variances are equal within the ith cluster, with and for . For convenience, we set , and . Thus, we can express the cluster-level mean vector as follows: , and .
For an independence working correlation structure, we can simplify the GEE with as follows: . Furthermore, the ‘‘sandwich’’ covariance matrix for the independence working correlation matrix can be denoted by
which shares the same form with the consistent covariance estimator with only being replaced by .
For an exchangeable working correlation structure, there is some value ρ for which and . We thus let and . As a consequence, we can simplify the GEE as follows:
Since for , we can further drop out the non-zero scalar to give . Therefore, the estimating equations reduce to the same form for GEE with independence and exchangeable correlation structures. Furthermore, the ‘‘sandwich’’ covariance matrix for the exchangeable working correlation is
with replaced by for . Therefore, GEE with independence and exchangeable correlation structures share the same estimating equation form as well as robust ‘‘sandwich’’ covariance estimator, which further implies numerically equivalent estimators and robust covariance estimators.
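Result 1 can be checked numerically. The sketch below assumes an identity link, equal cluster sizes, only cluster-level covariates, and a fixed (not iteratively estimated) exchangeable correlation value; the helper `gee_linear`, the coefficient values and the choice `rho = 0.2` are hypothetical and serve only as an illustration of the argument above.

```python
import numpy as np

def gee_linear(X_list, y_list, R):
    """Identity-link GEE for a fixed working correlation R (hypothetical helper).

    Returns the coefficient estimate and the robust sandwich covariance.
    A constant variance factor cancels in both quantities and is omitted.
    """
    R_inv = np.linalg.inv(R)
    bread = sum(X.T @ R_inv @ X for X in X_list)
    score = sum(X.T @ R_inv @ y for X, y in zip(X_list, y_list))
    beta = np.linalg.solve(bread, score)
    meat = np.zeros_like(bread)
    for X, y in zip(X_list, y_list):
        u = X.T @ R_inv @ (y - X @ beta)
        meat += np.outer(u, u)
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv

rng = np.random.default_rng(2020)
n_clusters, m = 40, 25            # equal cluster sizes
X_list, y_list = [], []
for i in range(n_clusters):
    # Intercept, intervention arm and one cluster-level covariate (all constant within cluster).
    x_cluster = np.array([1.0, float(i % 2), rng.normal()])
    X = np.tile(x_cluster, (m, 1))
    # Assumed coefficients plus a cluster random effect to induce correlation.
    y = X @ np.array([1.0, 1.8, 0.3]) + rng.normal(scale=0.5) + rng.normal(size=m)
    X_list.append(X)
    y_list.append(y)

rho = 0.2                          # any fixed value gives the same conclusion
R_ind = np.eye(m)
R_exch = (1 - rho) * np.eye(m) + rho * np.ones((m, m))

beta_ind, cov_ind = gee_linear(X_list, y_list, R_ind)
beta_exch, cov_exch = gee_linear(X_list, y_list, R_exch)
print(np.max(np.abs(beta_ind - beta_exch)), np.max(np.abs(cov_ind - cov_exch)))
```

The printed differences should be at the level of floating-point error, because the inverse exchangeable matrix maps the vector of ones to a scalar multiple of itself, and that scalar cancels in both the estimating equation and the sandwich covariance.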
Result 2
For convenience, we set . We have and with for a more general choice of basis matrices of QIF with the exchangeable correlation structure. We use specifically in our numerical studies. For general exchangeable correlation structure, we have
and
We set , and . As is known, QIF has the asymptotic estimating equation , which is implemented for the QIF estimator. We have
which further implies that . Both and are positive-definite and non-singular, the estimating equation is equivalent to . Therefore, the estimating equations for GEE and QIF are simplified to the same form with exchangeable working correlation structure. Furthermore, the covariance estimator for QIF with exchangeable working correlation is
which is the same as the ‘‘sandwich’’ covariance estimator of GEE with exchangeable working correlation structure. Therefore, GEE and QIF with exchangeable correlation structures share the same estimating equation form and covariance estimator, which further implies numerically equivalent estimators and covariance estimators.
Result 3
Based on Result 1 and Result 2, we need only to prove that QIF with independence and exchangeable working correlation structures have identical estimating equations and covariance estimators to fulfill Result 3. For QIF with independence correlation structure, we have , , and . Therefore, under independence correlation structure, we have the asymptotic estimating equation of QIF to be , which also implies that and is the same as the result in QIF with exchangeable correlation structure. The covariance estimator for QIF with independence working correlation is
Thus, both the asymptotic estimating equations and covariance estimators for QIF are simplified to the same forms with independence and exchangeable correlation structures. Combining with Result 1 and Result 2, Result 3 is proven.
APPENDIX II. Additional details of the data generating process
A potential difference between our data generating process (DGP) and previously published simulations (in the GEE or QIF literature) is that we have assumed that the correlation matrix is randomly sampled from some population distribution. Here, we show that this DGP is valid as it corresponds to an induced and fixed marginal correlation matrix. In other words, once we marginalize over the population distribution of the correlation matrix, our DGP follows a specific marginal mean and marginal correlation structure, but dispenses with the multivariate normality assumption. We explain the details of the DGP under CS0, CS1, CS2 and CS3 below. Throughout, we let be the individual-level variance, as we are simulating correlated continuous outcomes.
Fixed-regular exchangeable (CS0)
With a fixed off-diagonal value in the correlation matrix, we have for each cluster i
For , we also have . Therefore, under CS0, follows multivariate normal outcomes, and has a zero covariance matrix with for .
Regular exchangeable (CS1)
In this scenario, we assume the common correlation ρ is sampled from a uniform distribution with lower and upper bounds . Denote and observe that conditional on a realized value of ρ, . We can then marginalize over the sampling variability of ρ to get and
For , we also have
Therefore, under CS1, and have a zero covariance matrix, and still follows a distribution with mean and exchangeable correlation , but is no longer multivariate normal (after accounting for the sampling variability of ρ).
Cluster-specific exchangeable (CS2)
Given that the cluster-specific correlation is now sampled from a uniform distribution with lower and upper bounds , we similarly have and . We then marginalize over the variability of to get and
For , we also have
Therefore, under CS2, we still have a zero covariance matrix for and , and follows a distribution with mean and exchangeable correlation , but again is no longer multivariate normal (after accounting for the sampling variability of ρ).
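The marginalization argument for CS2 can be illustrated by simulation. The sketch below draws a cluster-specific correlation from a uniform distribution, generates exchangeable normal residuals conditional on that draw, and compares the empirical marginal correlation with the midpoint of the uniform bounds. The bounds `a` and `b`, the cluster size and the shared-random-effect construction are assumptions made for illustration and are not taken from the simulation settings of the paper.

```python
import numpy as np

rng = np.random.default_rng(7)
n_clusters, m = 50_000, 4
a, b = 0.05, 0.35                 # hypothetical bounds of the uniform distribution
sigma2 = 1.0                      # individual-level variance

# Cluster-specific correlation, sampled independently for each cluster.
rho = rng.uniform(a, b, size=n_clusters)

# Conditional on rho_i, a shared cluster effect gives multivariate normal
# residuals with variance sigma2 and exchangeable correlation rho_i.
z = rng.normal(size=(n_clusters, 1))
e = rng.normal(size=(n_clusters, m))
resid = np.sqrt(sigma2) * (np.sqrt(rho)[:, None] * z + np.sqrt(1 - rho)[:, None] * e)

# Empirical marginal covariance and correlation (the residual mean is zero by construction).
emp_cov = resid.T @ resid / n_clusters
sd = np.sqrt(np.diag(emp_cov))
emp_corr = emp_cov / np.outer(sd, sd)
off_diag = emp_corr[~np.eye(m, dtype=bool)]

print("empirical marginal correlation:", off_diag.mean())
print("midpoint of the uniform bounds:", (a + b) / 2)
```

The two printed values should agree up to Monte Carlo error, consistent with the marginal correlation being the mean of the sampling distribution of the cluster-specific correlation.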
Cluster-specific exchangeable with sub-clustering (CS3)
Let with sampled from a discrete uniform distribution in , we have the correlation matrix , which is uniquely determined by . Then, the outcome has conditional distribution
We now denote the population-level mean of is , which averages over the sampling variability of . We then marginalize over the variability of to get and
For , we also have
Therefore, under CS3, we still have a zero covariance matrix for and , and follows a distribution with mean and marginal common correlation , but again is no longer multivariate normal (after accounting for the sampling variability of ).
APPENDIX III. Bias-corrected covariance estimators
The Mancl-DeRouen and Kauermann-Carroll bias-correction methods for QIF [9] take forms similar to the corresponding bias-correction methods for GEE [27,28]. Let
and
we have the covariance estimator of QIF . The Mancl-DeRouen and Kauermann-Carroll bias-corrected covariance estimators have the general form
where
Define
the Mancl-DeRouen bias-corrected covariance estimator has
while the Kauermann-Carroll bias-corrected covariance estimator has
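For intuition about how the two corrections inflate the residuals, the following sketch implements the GEE analogues [27,28] for a linear model with an independence working correlation. It is not the QIF version of [9]. The helper names `inv_sqrt_sym` and `sandwich_variants`, the symmetric square-root form used for the Kauermann-Carroll correction, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def inv_sqrt_sym(M):
    """Inverse symmetric square root of a symmetric positive-definite matrix."""
    w, U = np.linalg.eigh(M)
    return (U / np.sqrt(w)) @ U.T

def sandwich_variants(X_list, y_list):
    """Uncorrected, Kauermann-Carroll (KC) and Mancl-DeRouen (MD) sandwich covariances
    for an identity-link GEE with independence working correlation (hypothetical helper)."""
    bread_inv = np.linalg.inv(sum(X.T @ X for X in X_list))
    beta = bread_inv @ sum(X.T @ y for X, y in zip(X_list, y_list))
    p = len(beta)
    meats = {"robust": np.zeros((p, p)), "KC": np.zeros((p, p)), "MD": np.zeros((p, p))}
    for X, y in zip(X_list, y_list):
        r = y - X @ beta
        H = X @ bread_inv @ X.T                   # cluster leverage matrix
        I_H = np.eye(len(y)) - H
        adjusted = {
            "robust": r,                          # no correction
            "KC": inv_sqrt_sym(I_H) @ r,          # (I - H)^(-1/2) r
            "MD": np.linalg.solve(I_H, r),        # (I - H)^(-1) r
        }
        for name, r_adj in adjusted.items():
            u = X.T @ r_adj
            meats[name] += np.outer(u, u)
    return beta, {name: bread_inv @ meat @ bread_inv for name, meat in meats.items()}

# Illustrative data: 30 clusters of size 10 with one individual-level covariate.
rng = np.random.default_rng(3)
n_clusters, m = 30, 10
X_list, y_list = [], []
for i in range(n_clusters):
    x_ind = rng.normal(size=m)                    # hypothetical individual-level covariate
    X = np.column_stack([np.ones(m), np.full(m, float(i % 2)), x_ind])
    y = X @ np.array([1.0, 1.8, 0.5]) + rng.normal(scale=0.5) + rng.normal(size=m)
    X_list.append(X)
    y_list.append(y)

beta_hat, covs = sandwich_variants(X_list, y_list)
for name, cov in covs.items():
    print(name, np.sqrt(np.diag(cov)))
```

In the special case of an intercept-only model with N equal-sized clusters, the KC and MD covariances reduce to the uncorrected covariance divided by (1 - 1/N) and (1 - 1/N)^2, respectively, which shows that both corrections vanish as the number of clusters grows.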
APPENDIX IV. Table of abbreviations
We present the acronyms and abbreviations in Table 7.
Table 7.
Acronym | Full name |
---|---|
CRT | cluster randomized trial |
ICC | intracluster correlation coefficient |
GEE | generalized estimating equations |
QIF | quadratic inference function |
GMM | generalized method of moments |
HALI | health and literacy intervention |
IQR | interquartile range |
SD | standard deviation |
TGI | the Godambe information criterion |
DGP | data generating process |
CS | correlation structure in the data generating process |
MM | mean model in the data generating process |
CS0 | fixed-regular exchangeable correlation structure |
CS1 | regular exchangeable correlation structure |
CS2 | cluster-specific exchangeable correlation structure |
CS3 | cluster-specific exchangeable with sub-clustering correlation structure |
MM1 | intervention-arm only model |
MM2 | intervention-arm with 4 covariates model |
RBS | relative bias |
SE | standard error |
ESE | empirical standard error |
MRSE | mean robust standard error |
PR | power ratio |
Q/G | quadratic inference function relative to generalized estimating equations |
quadratic inference function with the Mancl-DeRouen bias-correction method | |
quadratic inference function with the Kauermann-Carroll bias-correction method | |
generalized estimating equations using independence working correlation | |
generalized estimating equations using exchangeable working correlation | |
quadratic inference function with Mancl-DeRouen bias correction and independence | |
quadratic inference function with Kauermann-Carroll bias correction and independence | |
quadratic inference function with Mancl-DeRouen bias correction and exchangeable | |
quadratic inference function with Kauermann-Carroll bias correction and exchangeable |
References
- 1. Hayes Richard J., Moulton Lawrence H. Cluster Randomised Trials. CRC Press; 2009. pp. 3–40. (chapters 1–3)
- 2. Turner Elizabeth L., Li Fan, Gallis John A., Prague Melanie, Murray David M. Review of recent methodological developments in group-randomized trials: part 1–design. Am. J. Publ. Health. 2017;107(6):907–915. doi: 10.2105/AJPH.2017.303706.
- 3. Turner Elizabeth L., Prague Melanie, Gallis John A., Li Fan, Murray David M. Review of recent methodological developments in group-randomized trials: part 2–analysis. Am. J. Publ. Health. 2017;107(7):1078–1086. doi: 10.2105/AJPH.2017.303707.
- 4. Liang Kung-Yee, Zeger Scott L. Longitudinal data analysis using generalized linear models. Biometrika. April 1986;73(1):13–22.
- 5. Ritz John, Spiegelman Donna. Equivalence of conditional and marginal regression models for clustered and longitudinal data. Stat. Methods Med. Res. 2004;13(4):309–323.
- 6. Hubbard Alan E., Ahern Jennifer, Fleischer Nancy L., Van der Laan Mark, Lippman Sheri A., Jewell Nicholas, Bruckner Tim, Satariano William A. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology. July 2010;21(4):467–474. doi: 10.1097/EDE.0b013e3181caeb90.
- 7. Preisser John S., Young Mary L., Zaccaro Daniel J., Wolfson Mark. An integrated population-averaged approach to the design, analysis and sample size determination of cluster-unit trials. Stat. Med. 2003;22(8):1235–1254. doi: 10.1002/sim.1379.
- 8. Qu Annie, Song Peter X.-K. Assessing robustness of generalised estimating equations and quadratic inference functions. Biometrika. June 2004;91(2):447–459.
- 9. Westgate Philip M. A bias-corrected covariance estimate for improved inference with quadratic inference function. Stat. Med. December 2012;31(29):4003–4022. doi: 10.1002/sim.5479.
- 10. Asgari Fereshteh, Biglarian Akbar, Seifi Behjat, Bakhshi Andisheh, Miri Hamid Heidarian, Bakhshi Enayatollah. Using quadratic inference functions to determine the factors associated with obesity: findings from the STEPS survey in Iran. Ann. Epidemiol. 2013;23(9):534–538. doi: 10.1016/j.annepidem.2013.07.006.
- 11. Bakhshi Enayatollah, Etemad Koorosh, Seifi Behjat, Mohammad Kazem, Biglarian Akbar, Koohpayehzadeh Jalil. Changes in obesity odds ratio among Iranian adults, since 2000: quadratic inference functions method. Comput. Math. Method. Med. 2016;2016 doi: 10.1155/2016/7101343.
- 12. Yang Kun, Tao Lixin, Mahara Gehendra, Yan Yan, Cao Kai, Liu Xiangtong, Chen Sipeng, Xu Qin, Liu Long, Wang Chao. An association of platelet indices with blood pressure in Beijing adults: applying quadratic inference function for a longitudinal study. Medicine. 2016;95(39) doi: 10.1097/MD.0000000000004964.
- 13. Brooker Simon, Okello George, Njagi Kiambo, Dubeck Margaret M., Halliday Katherine E., Inyega Hellen, Jukes Matthew CH. Improving educational achievement and anaemia of school children: design of a cluster randomised trial of school-based malaria prevention and enhanced literacy instruction in Kenya. Trials. 2010;11(1):93. doi: 10.1186/1745-6215-11-93.
- 14. Halliday Katherine E., Okello George, Turner Elizabeth L., Njagi Kiambo, Mcharo Carlos, Kengo Juddy, Allen Elizabeth, Dubeck Margaret M., Jukes Matthew CH., Brooker Simon J. Impact of intermittent screening and treatment for malaria among school children in Kenya: a cluster randomised trial. PLoS Med. 2014;11(1) doi: 10.1371/journal.pmed.1001594.
- 15. Jukes Matthew CH., Turner Elizabeth L., Dubeck Margaret M., Halliday Katherine E., Inyega Hellen N., Wolf Sharon, Zuilkowski Stephanie Simmons, Brooker Simon J. Improving literacy instruction in Kenya through teacher professional development and text messages support: a cluster randomized trial. J. Res. Edu. Effect. 2017;10(3):449–481.
- 16. Wedderburn Robert WM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. December 1974;61(3):439–447.
- 17. Pan Wei. Sample size and power calculations with correlated binary data. Contr. Clin. Trials. 2001;22(3):211–227. doi: 10.1016/s0197-2456(01)00131-3.
- 18. Rutterford Clare, Copas Andrew, Eldridge Sandra. Methods for sample size determination in cluster randomized trials. Int. J. Epidemiol. 2015;44(3):1051–1067. doi: 10.1093/ije/dyv113.
- 19. Qu Annie, Lindsay Bruce G., Li Bing. Improving generalised estimating equations using quadratic inference functions. Biometrika. December 2000;87(4):823–836.
- 20. Hansen Lars Peter. Large sample properties of generalized method of moments estimators. Econometrica. July 1982;50(4):1029–1054.
- 21. Wang Lan, Qu Annie. Consistent model selection and data-driven smooth tests for longitudinal data in the estimating equations approach. J. Roy. Stat. Soc. January 2009;71(1):177–190.
- 22. Song Peter X.-K., Jiang Zhichang, Park Eunjoo, Qu Annie. Quadratic inference functions in marginal models for longitudinal data. Stat. Med. December 2009;28(29):3683–3696. doi: 10.1002/sim.3719.
- 23. Westgate Philip M. Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach. Biom. J. May 2014;56(3):461–476. doi: 10.1002/bimj.201300098.
- 24. Wang Fei, Wang Lu, Song Peter X.-K. Quadratic inference function approach to merging longitudinal studies: validation and joint estimation. Biometrika. September 2012;99(3):755–762.
- 25. Westgate Philip M., Braun Thomas M. The effect of cluster size imbalance and covariates on the estimation performance of quadratic inference functions. Stat. Med. September 2012;31(20):2209–2222. doi: 10.1002/sim.5329.
- 26. Li Fan, Forbes Andrew B., Turner Elizabeth L., Preisser John S. Power and sample size requirements for GEE analyses of cluster randomized crossover trials. Stat. Med. 2019;38(4):636–649. doi: 10.1002/sim.7995.
- 27. Mancl Lloyd A., DeRouen T.A. A covariance estimator for GEE with improved small-sample properties. Biometrics. 2001;57(1):126–134. doi: 10.1111/j.0006-341x.2001.00126.x.
- 28. Kauermann Göran, Carroll R.J. A note on the efficiency of sandwich covariance matrix estimation. J. Am. Stat. Assoc. 2001;96(456):1387–1396.
- 29. Westgate Philip M., Braun Thomas M. An improved quadratic inference function for parameter estimation in the analysis of correlated data. Stat. Med. August 2013;32(19):3260–3273. doi: 10.1002/sim.5715.
- 30. Westgate Philip M. A comparison of utilized and theoretical covariance weighting matrices on the estimation performance of quadratic inference functions. Commun. Stat. Simulat. Comput. 2014;43(10):2432–2443.
- 31. Braun Thomas M., Feng Ziding. Optimal permutation tests for the analysis of group randomized trials. J. Am. Stat. Assoc. December 2001;96(456):1424–1432.
- 32. Li Fan, Turner Elizabeth L., Heagerty Patrick J., Murray David M., Vollmer William M., DeLong Elizabeth R. An evaluation of constrained randomization for the design and analysis of group-randomized trials with binary outcomes. Stat. Med. 2017;36(24):3791–3806. doi: 10.1002/sim.7410.