Author manuscript; available in PMC: 2025 Jan 1.
Published in final edited form as: Commun Stat Simul Comput. 2022 Feb 23;53(2):1048–1067. doi: 10.1080/03610918.2022.2039396

A weighted Jackknife approach utilizing linear model based-estimators for clustered data

Ruofei Du a,**, Ye Jin Choi b, Ji-Hyun Lee c, Ounpraseuth Songthip a, Zhuopei Hu a
PMCID: PMC10959512  NIHMSID: NIHMS1798418  PMID: 38523866

Abstract

A small number of clusters combined with cluster-level heterogeneity poses a great challenge for data analysis. We previously published a weighted Jackknife approach to address this issue, using weighted cluster means as the basic estimators. The current study proposes a new version of the weighted delete-one-cluster Jackknife analytic framework, which employs Ordinary Least Squares or Generalized Least Squares estimators as the fundamental components. Algorithms for computing the estimated variances of the study estimators have also been derived. Wald test statistics can then be obtained, and the statistical comparison of the outcome means of two conditions is determined using a cluster permutation procedure. Simulation studies show that the proposed framework produces estimates with higher precision and improved power for statistical hypothesis testing compared to other methods.

Keywords: small number of clusters, heterogeneity, weighted Jackknife

1. Introduction

The most essential characteristic that distinguishes clustered data analysis from general data analyses is the so-called intracluster correlation (Murray, 1998). It describes the commonly seen fact that observations of an outcome variable collected within the same cluster resemble one another, which invalidates the assumption of independent observations made in common statistical analyses. If randomly selected clusters are surveyed from a target population, a widely applied approach for analyzing clustered data with a continuous outcome is linear mixed-effects model (LMM) fitting, in which the intracluster correlation is accounted for by including a random effect for the cluster (Harville, 1977; Murray, 1998). The model fitting for parameter estimation and the hypothesis testing in an LMM are derived assuming that the number of surveyed clusters is sufficiently large and that the random effect follows a Normal distribution (Harville, 1977, 1988; Littell, Stroup, Milliken, Wolfinger, & Schabenberger, 2006). However, real-world datasets often violate these assumptions, or the assumptions cannot be justified, e.g., due to a small number of clusters in a study.

For cluster randomized trials (CRTs), one of the major application fields of clustered data analysis, a minimum of 30–40 clusters is recommended for LMM fitting (Donner & Klar, 1996; Eldridge & Kerry, 2012; Murray, Varnell, & Blitstein, 2004). However, many studies end up with a small number of clusters or even a few clusters (Bogart et al., 2016; Jairath et al., 2015; Roetzheim et al., 2004), due in part to practical hurdles such as a limited study budget and limited available clusters. It has long been recognized that a small number of clusters in CRTs is associated with an inflated Type I error rate (Bland, 2004; Fay & Graubard, 2001; Kahan et al., 2016; Murray, 1998). Small-sample corrections have focused either on approximating the degrees of freedom of the null distribution against which a test statistic is compared (Kenward & Roger, 1997; Satterthwaite, 1946), or on estimating the variance of a test statistic (Mancl & DeRouen, 2001; Pan & Wall, 2002). The analytic issue with a small number of clusters has drawn attention in other disciplines as well, including but not limited to behavioral sciences (McNeish & Stapleton, 2016), educational psychology (Bauer & Sterba, 2011), and economics (Donald & Lang, 2007). In addition to the above-mentioned corrections, others have proposed bootstrap methods for estimating the variance of a statistic (Cameron, Gelbach, & Miller, 2008; Canay, Santos, & Shaikh, 2018; Flynn & Peters, 2004).

Heterogeneity (also termed heteroscedasticity) in clustered data usually characterizes the situation in which the within-cluster random errors do not share the same variance across clusters. Cluster robust variance estimates (CRVE) were promoted to address within-cluster heterogeneity (Cameron & Miller, 2010; Rogers, 1993; White, 1980). When inference with a CRVE is conducted with a small number of clusters or few clusters, simple adjustments or other means have been suggested to correct the downward bias due to the small sample size (Cameron et al., 2008; Cameron & Miller, 2010; Nichols & Schaffer, 2007). Despite the increasing amount of research concerning both the small number of clusters and within-cluster heterogeneity, another important type of heterogeneity has been left unaddressed: cluster-level heterogeneity. In the LMM setting, the concern is whether an assumed Normal distribution for the random effect of the clusters can adequately accommodate cluster-level heterogeneity in addition to the within-cluster heterogeneity, especially in combination with a small number of clusters. As shown in our previous work (Du & Lee, 2019), some widely applied approaches (LMM with the KR correction and CRVE-based methods for few clusters) may be uncompetitive when dealing with few clusters in the presence of cluster-level heterogeneity.

Motivated by the comment "Unequal variances can occur at any set of levels of a random effect" (Chapter 9, Littell et al. (2006)), we relax the assumption and allow the random effects of the clusters to follow Normal distributions with variances that vary from cluster to cluster. A set of Ordinary Least Squares (OLS) or Generalized Least Squares (GLS) estimators is obtained following a delete-one-cluster Jackknife procedure. The outcome mean for a condition, or the contrast of the outcome means between the conditions, is then estimated by a weighted sum of those OLS or GLS estimators. Algorithms for computing the estimated variances of the study estimators have also been derived. Wald-type test statistics are further obtained, and statistical significance is determined following a cluster permutation testing procedure. The rest of the paper is organized as follows: in Section 2 we propose OLS- and GLS-based Weighted Jackknife estimators for a clustered dataset; Section 3 presents our simulation studies; Section 4 illustrates an application of the developed methods with a real dataset; in Section 5 we discuss our findings. Detailed derivations of the methods are presented in the Appendix.

2. Methods

Let Yjk be the outcome corresponding to individual k from cluster j, and it can be expressed as,

$$Y_{jk} = \mu_i + \gamma_j + \epsilon_{jk} \tag{1}$$

where $\mu_i$ is the outcome mean for the $i$th condition with which cluster $j$ is associated, $\gamma_j$ is the random effect due to clusters, and $\epsilon_{jk}$ is the random error within cluster $j$. To clearly present the derivation, the clusters are indexed across the conditions, from 1 to $G$. Considering heterogeneity at the cluster level, $\gamma_j$ is assumed to follow a Normal distribution but possibly with a different variance relative to the other clusters: $\gamma_j \sim N(0, \sigma_{\gamma_j}^2)$. Unequal variances among the clusters are also allowed for the error term: $\epsilon_{jk} \sim N(0, \sigma_{\epsilon_j}^2)$. In addition, all $\gamma_j$'s and $\epsilon_{jk}$'s are assumed independent of one another. The fixed effects in model (1) are the $\mu_i$'s, denoted by the vector $\beta = (\mu_1, \mu_2)^T$. The model equation in matrix form is

$$
Y=\begin{pmatrix}\mu_1\\ \vdots\\ \mu_1\\ \mu_2\\ \vdots\\ \mu_2\end{pmatrix}
+\begin{pmatrix}\gamma_1\\ \vdots\\ \gamma_1\\ \vdots\\ \gamma_G\\ \vdots\\ \gamma_G\end{pmatrix}
+\begin{pmatrix}\epsilon_{11}\\ \vdots\\ \epsilon_{1n_1}\\ \vdots\\ \epsilon_{G1}\\ \vdots\\ \epsilon_{Gn_G}\end{pmatrix}
=\begin{pmatrix}1&0\\ \vdots&\vdots\\ 1&0\\ 0&1\\ \vdots&\vdots\\ 0&1\end{pmatrix}
\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}
+\begin{pmatrix}\mathbf{1}_{n_1}&&\\ &\ddots&\\ &&\mathbf{1}_{n_G}\end{pmatrix}
\begin{pmatrix}\gamma_1\\ \vdots\\ \gamma_G\end{pmatrix}
+\begin{pmatrix}\epsilon_{11}\\ \vdots\\ \epsilon_{Gn_G}\end{pmatrix}
= X\beta + \operatorname{Diag}\{\mathbf{1}_{n_j}\}\,\gamma + \epsilon,
$$

where blank positions inside a matrix contain all zeros; $n_j$ is the cluster size of cluster $j$; and $\operatorname{Diag}\{\mathbf{1}_{n_j}\}$ denotes the block-diagonal matrix having the vectors $\mathbf{1}_{n_j}$ on the diagonal and zeros elsewhere.
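For concreteness, model (1) can be sketched in Python. This is a hypothetical helper (the function name `simulate_model1` and the NumPy-based setup are illustrative assumptions, not the authors' code); it draws one cluster random effect per cluster and independent within-cluster errors, each with its own variance:

```python
import numpy as np

def simulate_model1(mu, cluster_sizes, condition, sigma2_gamma, sigma2_eps, rng):
    """Simulate Y_jk = mu_i + gamma_j + eps_jk for G clusters.

    mu: length-2 array of condition means (mu_1, mu_2)
    cluster_sizes: n_j for each of the G clusters
    condition: condition index (0 or 1) of each cluster
    sigma2_gamma, sigma2_eps: per-cluster variances of gamma_j and eps_jk
    Returns the stacked outcomes and the cluster index of each observation."""
    y, cluster = [], []
    for j, (n, c) in enumerate(zip(cluster_sizes, condition)):
        gamma_j = rng.normal(0.0, np.sqrt(sigma2_gamma[j]))    # cluster random effect
        eps = rng.normal(0.0, np.sqrt(sigma2_eps[j]), size=n)  # within-cluster errors
        y.append(mu[c] + gamma_j + eps)
        cluster.append(np.full(n, j))
    return np.concatenate(y), np.concatenate(cluster)

# Cluster sizes and variances mirror the 4-clusters-per-condition setting of Section 3.2.
rng = np.random.default_rng(1)
y, cl = simulate_model1(np.array([100.0, 95.0]),
                        [80, 50, 120, 25, 60, 75, 50, 5],
                        [0, 0, 0, 0, 1, 1, 1, 1],
                        sigma2_gamma=[1, 1, 25, 1, 1, 1, 1, 1],
                        sigma2_eps=[100, 100, 100, 100, 100, 900, 100, 100],
                        rng=rng)
```

The heterogeneous cluster (size 120, $\sigma_{\gamma_j}^2 = 25$) and the cluster with inflated error variance (size 75, $\sigma_{\epsilon_j}^2 = 900$) are singled out exactly as in the simulation settings described later.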

Our method development is based on two popular linear model-based estimators: OLS and GLS. In the derivations of our method, we restrict attention to two conditions, treatment vs. control, while the methods can be extended naturally to more than two conditions.

2.1. OLS based Weighted Jackknife estimator

Denote θ = l′β, a linear combination of β. With an appropriately valued l we can form a parameter of interest to estimate. For example, we can set l′ = (1, 0) to estimate μ1, the outcome mean for the first condition; we can also set l′ = (1, −1) to estimate the difference of the two condition effects, μ1 − μ2. To illustrate the analytical derivation, the indices of the clusters are ordered starting with the treatment condition and then the control condition. The OLS based weighted Jackknife (WJK-OLS) estimator of θ is expressed as follows,

$$
\hat{\theta} = l'\hat{\beta} = l'(X'X)^{-1}X'y
= l'\begin{pmatrix}\sum_{j=1}^{g_1} n_j \bar{y}_j / N_1\\[4pt] \sum_{j=g_1+1}^{G} n_j \bar{y}_j / N_2\end{pmatrix} \tag{2}
$$

where $y$ denotes the vector of observed outcomes across the two conditions, $X$ is the design matrix of model (1), $N_i,\ i = 1, 2$, is the total number of subjects under condition $i$, $G$ is the total number of clusters across the conditions, $n_j,\ j = 1, \ldots, G$, is again the number of individuals in cluster $j$, $g_1$ is the number of clusters in the treatment condition, and $\bar{y}_j$ denotes the outcome mean for cluster $j$. The proposed WJK-OLS estimator is,

$$\tilde{\theta}_{OLS} = \sum_{c=1}^{G} w_c\, \hat{\theta}_{-c} \tag{3}$$

where the subscript −c denotes the cth cluster being deleted out of the total G clusters. The weight is calculated by

$$w_c = \frac{1/\hat{V}(\hat{\theta}_{-c})}{\sum_{j=1}^{G} 1/\hat{V}(\hat{\theta}_{-j})} \tag{4}$$

where $\hat{V}(\hat{\theta}_{-c})$ is a variance estimate of $\hat{\theta}_{-c}$. Unlike the traditional Jackknife resampling procedure, a cluster rather than an individual unit is excluded at each step of the computation. The inverse-variance weighting technique gives greater weight to an estimate with higher precision, i.e., a smaller variance. In the calculation of $w_c$ above, after a cluster introducing a relatively larger variance is deleted, the estimated variance of $\hat{\theta}_{-c}$ is expected to be smaller and $w_c$ to be greater. In other words, the relative impact of heterogeneous clusters is weighted down in the estimation of the condition means.
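Equations (2)–(4) amount to computing per-condition grand means and then recombining the leave-one-cluster-out estimates with inverse-variance weights. A minimal sketch (hypothetical helper names `ols_theta` and `wjk_combine`; the tiny dataset and the variance values fed to the weights are invented for illustration):

```python
import numpy as np

def ols_theta(l, y, cluster, cond_of_cluster, keep=None):
    """OLS estimate theta_hat = l'beta_hat per equation (2); beta_hat_i is the
    grand mean under condition i. `keep` lists the retained clusters, so passing
    all clusters except c gives the leave-one-out estimate theta_hat_{-c}."""
    keep = range(len(cond_of_cluster)) if keep is None else keep
    beta = []
    for i in (0, 1):
        ids = [j for j in keep if cond_of_cluster[j] == i]
        beta.append(y[np.isin(cluster, ids)].mean())
    return float(np.dot(l, beta))

def wjk_combine(theta_loo, var_hats):
    """WJK estimator per (3)-(4): inverse-variance weights, normalized to sum to 1."""
    w = 1.0 / np.asarray(var_hats, float)
    w /= w.sum()
    return float(np.dot(w, theta_loo)), w

# Tiny illustration: 2 clusters per condition, l' = (1, -1) for the treatment effect.
y = np.array([10.0, 12.0, 11.0, 20.0, 22.0, 21.0])
cluster = np.array([0, 0, 1, 2, 2, 3])
cond = [0, 0, 1, 1]
theta_loo = [ols_theta([1, -1], y, cluster, cond,
                       keep=[j for j in range(4) if j != c])
             for c in range(4)]
theta_wjk, w = wjk_combine(theta_loo, [2.0, 2.0, 1.0, 1.0])  # illustrative variances
```

In this toy dataset every leave-one-out estimate equals −10, so the weighted sum is also −10 regardless of the weights; with real data the weights shift the estimate toward the leave-one-out fits that exclude heterogeneous clusters.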

The variance of $\hat{\theta}$ is

$$
\begin{aligned}
V(\hat{\theta}) &= V(l'\hat{\beta}) = l'\,V\!\left((X'X)^{-1}X'y\right) l = l'(X'X)^{-1}X' V_y X (X'X)^{-1} l\\
&= l'\begin{pmatrix}\sum_{j=1}^{g_1} n_j^2\!\left(\sigma_{\gamma_j}^2 + \sigma_{\epsilon_j}^2/n_j\right)\!/N_1^2 & 0\\ 0 & \sum_{j=g_1+1}^{G} n_j^2\!\left(\sigma_{\gamma_j}^2 + \sigma_{\epsilon_j}^2/n_j\right)\!/N_2^2\end{pmatrix} l\\
&= l'\begin{pmatrix}\sum_{j=1}^{g_1} n_j^2 \sigma_{\delta_j}^2/N_1^2 & 0\\ 0 & \sum_{j=g_1+1}^{G} n_j^2 \sigma_{\delta_j}^2/N_2^2\end{pmatrix} l
\end{aligned} \tag{5}
$$

where $V_y$ is the variance–covariance matrix of $y$, and $\sigma_{\delta_j}^2 = \sigma_{\gamma_j}^2 + \sigma_{\epsilon_j}^2/n_j$. The derivation of $V_y$ is provided in the Appendix. The $\sigma_{\delta_j}^2$ has two additive terms: the cluster-level random-effect variance and the within-cluster error variance of cluster $j$ divided by its size. Denote the diagonal elements in (5) by $V(\hat{\beta}_i),\ i = 1, 2$, each a weighted sum of the $\sigma_{\delta_j}^2$'s over all the clusters under condition $i$. During the delete-one-cluster Jackknife procedure, let $\hat{V}(\hat{\beta}_{i,-c})$ denote the variance estimate with the $c$th cluster excluded from the estimation. We can then construct the linear equations in the $\hat{\sigma}_{\delta_j}^2$'s shown in (6).

$$
\begin{aligned}
\frac{n_2^2}{N_{1,-1}^2}\hat{\sigma}_{\delta_2}^2 + \frac{n_3^2}{N_{1,-1}^2}\hat{\sigma}_{\delta_3}^2 + \cdots + \frac{n_{g_1}^2}{N_{1,-1}^2}\hat{\sigma}_{\delta_{g_1}}^2 &= \hat{V}(\hat{\beta}_{1,-1})\\
\frac{n_1^2}{N_{1,-2}^2}\hat{\sigma}_{\delta_1}^2 + \frac{n_3^2}{N_{1,-2}^2}\hat{\sigma}_{\delta_3}^2 + \cdots + \frac{n_{g_1}^2}{N_{1,-2}^2}\hat{\sigma}_{\delta_{g_1}}^2 &= \hat{V}(\hat{\beta}_{1,-2})\\
&\;\;\vdots\\
\frac{n_{g_1+1}^2}{N_{2,-G}^2}\hat{\sigma}_{\delta_{g_1+1}}^2 + \frac{n_{g_1+2}^2}{N_{2,-G}^2}\hat{\sigma}_{\delta_{g_1+2}}^2 + \cdots + \frac{n_{G-1}^2}{N_{2,-G}^2}\hat{\sigma}_{\delta_{G-1}}^2 &= \hat{V}(\hat{\beta}_{2,-G})
\end{aligned} \tag{6}
$$

In our setting with cluster-level heterogeneity present, we can apply a published approach by Shao and Rao (Shao & Rao, 1993) to obtain an estimate of $V(\hat{\beta}_{i,-c})$. Then in (6) there are $G$ unknown parameters (i.e., the $\hat{\sigma}_{\delta_j}^2$'s) and $G$ independent equations. Solving (6) for the $\hat{\sigma}_{\delta_j}^2$'s leads to an estimate of $V(\hat{\theta})$. At this point we are able to calculate the weights $w_c$ and thus $\tilde{\theta}_{OLS}$. Further details of the derivation are provided in the Appendix.
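Numerically, (6) is a G×G linear system. A sketch of setting it up and solving it (the helper name is hypothetical, leave-one-out totals $N_{i,-c}$ are assumed in the denominators, and the two-clusters-per-condition toy system is chosen so the answer can be checked by hand):

```python
import numpy as np

def solve_delta_variances(n, cond, v_hat):
    """Solve the G linear equations in (6) for the sigma_hat_delta_j^2 values.

    n: cluster sizes; cond: condition label (0/1) of each cluster;
    v_hat: leave-one-out variance estimates V_hat(beta_hat_{i,-c}), one per
    deleted cluster c (e.g. obtained via Shao and Rao's estimator)."""
    G = len(n)
    A = np.zeros((G, G))
    for c in range(G):
        # clusters sharing cluster c's condition, with c itself deleted
        same = [j for j in range(G) if cond[j] == cond[c] and j != c]
        N_minus_c = sum(n[j] for j in same)  # N_{i,-c}
        for j in same:
            A[c, j] = n[j] ** 2 / N_minus_c ** 2
    return np.linalg.solve(A, np.asarray(v_hat, dtype=float))

# Toy system, 2 clusters per condition: true variances (1, 2, 3, 4) produce
# v_hat = (2, 1, 4, 3), and solving the system recovers them.
sig_hat = solve_delta_variances([2, 3, 4, 5], [0, 0, 1, 1], [2.0, 1.0, 4.0, 3.0])
```

With two clusters per condition each equation involves a single unknown, which makes the round trip easy to verify; with more clusters the same `np.linalg.solve` call handles the full coupled system.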

The variance of $\tilde{\theta}_{OLS}$ can be written as

$$V(\tilde{\theta}_{OLS}) = \sum_{c=1}^{G} w_c^2\, V(\hat{\theta}_{-c}) + 2 \sum_{1 \le s < t \le G} w_s w_t\, \mathrm{Cov}(\hat{\theta}_{-s}, \hat{\theta}_{-t}), \tag{7}$$

where the covariance between $\hat{\theta}_{-s}$ and $\hat{\theta}_{-t}$ is given below, conditional on which conditions clusters $s$ and $t$ come from.

$$
\mathrm{Cov}(\hat{\theta}_{-s}, \hat{\theta}_{-t}) = l'\,\mathrm{Cov}(\hat{\beta}_{-s}, \hat{\beta}_{-t})\, l =
\begin{cases}
l'\begin{pmatrix}\sum\limits_{1 \le c \le g_1,\, c \ne s} n_c^2 \sigma_{\delta_c}^2 / N_{1,-s}^2 & 0\\ 0 & \sum\limits_{g_1 < c \le G,\, c \ne t} n_c^2 \sigma_{\delta_c}^2 / N_{2,-t}^2\end{pmatrix} l, & s \le g_1 \text{ and } t > g_1\\[14pt]
l'\begin{pmatrix}\sum\limits_{1 \le c \le g_1,\, c \notin (s,t)} n_c^2 \sigma_{\delta_c}^2 / N_{1,-(s,t)}^2 & 0\\ 0 & \sum\limits_{g_1 < c \le G} n_c^2 \sigma_{\delta_c}^2 / N_2^2\end{pmatrix} l, & s \le g_1 \text{ and } t \le g_1\\[14pt]
l'\begin{pmatrix}\sum\limits_{1 \le c \le g_1} n_c^2 \sigma_{\delta_c}^2 / N_1^2 & 0\\ 0 & \sum\limits_{g_1 < c \le G,\, c \notin (s,t)} n_c^2 \sigma_{\delta_c}^2 / N_{2,-(s,t)}^2\end{pmatrix} l, & s > g_1 \text{ and } t > g_1
\end{cases}
$$

Replacing each $\sigma_{\delta_c}^2$ by its estimate $\hat{\sigma}_{\delta_c}^2$ obtained from solving the linear equations in (6), we can get estimates of $\mathrm{Cov}(\hat{\theta}_{-s}, \hat{\theta}_{-t})$, and further compute an estimate of $V(\tilde{\theta}_{OLS})$, denoted by $\hat{V}(\tilde{\theta}_{OLS})$.

2.2. GLS based Weighted Jackknife estimator

The GLS based Weighted Jackknife (WJK-GLS) estimator is calculated as follows.

$$
\check{\theta} = l'\check{\beta} = l'(X'V_y^{-1}X)^{-1}X'V_y^{-1}y
= l'\begin{pmatrix}\dfrac{\sum_{j=1}^{g_1} (1/\sigma_{\delta_j}^2)\,\bar{y}_j}{\sum_{j=1}^{g_1} 1/\sigma_{\delta_j}^2}\\[14pt] \dfrac{\sum_{j=g_1+1}^{G} (1/\sigma_{\delta_j}^2)\,\bar{y}_j}{\sum_{j=g_1+1}^{G} 1/\sigma_{\delta_j}^2}\end{pmatrix}
$$

As noted in the previous section, $\sigma_{\delta_j}^2\ (= \sigma_{\gamma_j}^2 + \sigma_{\epsilon_j}^2/n_j)$ contains two parts ($\sigma_{\gamma_j}^2$ and $\sigma_{\epsilon_j}^2/n_j$), so its value is determined by the associated cluster variance ($\sigma_{\gamma_j}^2$), the within-cluster error variance ($\sigma_{\epsilon_j}^2$), and the cluster size ($n_j$). The form of the WJK-GLS estimator is the same as that of the WJK-OLS estimator,

$$\tilde{\theta}_{GLS} = \sum_{c=1}^{G} u_c\, \check{\theta}_{-c}$$

where uc is an inverse-variance style weight as in (4),

$$u_c = \frac{1/\hat{V}(\check{\theta}_{-c})}{\sum_{j=1}^{G} 1/\hat{V}(\check{\theta}_{-j})}.$$
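The GLS point estimate above is simply a precision-weighted average of the cluster means within each condition. A minimal sketch (the helper name `gls_theta` and the toy numbers are assumptions for illustration):

```python
import numpy as np

def gls_theta(l, ybar, sigma2_delta, cond, keep=None):
    """GLS estimate theta_check = l'beta_check: each condition mean is a
    precision-weighted (1/sigma2_delta_j) average of the cluster means ybar_j.
    `keep` lists the retained clusters for leave-one-out versions."""
    keep = list(range(len(cond))) if keep is None else keep
    beta = []
    for i in (0, 1):
        ids = [j for j in keep if cond[j] == i]
        w = np.array([1.0 / sigma2_delta[j] for j in ids])
        beta.append(float(np.dot(w, [ybar[j] for j in ids]) / w.sum()))
    return float(np.dot(l, beta))

# With equal precisions GLS reduces to plain condition means: 12 - 21 = -9.
mu_diff = gls_theta([1, -1], [10.0, 14.0, 20.0, 22.0],
                    [1.0, 1.0, 1.0, 1.0], [0, 0, 1, 1])
# A noisier cluster (sigma2_delta = 3) is down-weighted: the condition-1 mean
# is pulled toward the precise cluster, (10 + 14/3) / (1 + 1/3) = 11.
mu1_unequal = gls_theta([1, 0], [10.0, 14.0, 20.0, 22.0],
                        [1.0, 3.0, 1.0, 1.0], [0, 0, 1, 1])
```

The down-weighting of high-$\sigma_{\delta_j}^2$ clusters inside each condition is what distinguishes the GLS fundamentals from the OLS ones, before the delete-one-cluster weighting is even applied.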

The variance of a GLS estimator can be derived as,

$$V(\check{\theta}) = l'\begin{pmatrix}1\Big/\sum_{j=1}^{g_1} \dfrac{1}{\sigma_{\delta_j}^2} & 0\\[8pt] 0 & 1\Big/\sum_{j=g_1+1}^{G} \dfrac{1}{\sigma_{\delta_j}^2}\end{pmatrix} l.$$

We apply the same approach of solving the linear equations in (6), delineated in Section 2.1, to obtain estimates of the $\sigma_{\delta_j}^2$'s. The variance of $\tilde{\theta}_{GLS}$ can be decomposed in the same way as $V(\tilde{\theta}_{OLS})$,

$$V(\tilde{\theta}_{GLS}) = \sum_{c=1}^{G} u_c^2\, V(\check{\theta}_{-c}) + 2 \sum_{1 \le s < t \le G} u_s u_t\, \mathrm{Cov}(\check{\theta}_{-s}, \check{\theta}_{-t}),$$

where the covariance term is calculated as

$$
\mathrm{Cov}(\check{\theta}_{-s}, \check{\theta}_{-t}) = l'\,\mathrm{Cov}(\check{\beta}_{-s}, \check{\beta}_{-t})\, l =
\begin{cases}
l'\begin{pmatrix}\left(\sum\limits_{1 \le c \le g_1,\, c \ne s} \dfrac{1}{\sigma_{\delta_c}^2}\right)^{-1} & 0\\ 0 & \left(\sum\limits_{g_1 < c \le G,\, c \ne t} \dfrac{1}{\sigma_{\delta_c}^2}\right)^{-1}\end{pmatrix} l, & s \le g_1 \text{ and } t > g_1\\[18pt]
l'\begin{pmatrix}\left(\sum\limits_{1 \le c \le g_1,\, c \notin (s,t)} \dfrac{1}{\sigma_{\delta_c}^2}\right)^{-1} & 0\\ 0 & \left(\sum\limits_{g_1 < c \le G} \dfrac{1}{\sigma_{\delta_c}^2}\right)^{-1}\end{pmatrix} l, & s \le g_1 \text{ and } t \le g_1\\[18pt]
l'\begin{pmatrix}\left(\sum\limits_{1 \le c \le g_1} \dfrac{1}{\sigma_{\delta_c}^2}\right)^{-1} & 0\\ 0 & \left(\sum\limits_{g_1 < c \le G,\, c \notin (s,t)} \dfrac{1}{\sigma_{\delta_c}^2}\right)^{-1}\end{pmatrix} l, & s > g_1 \text{ and } t > g_1
\end{cases}
$$

Replacing each $\sigma_{\delta_c}^2$ by its estimate $\hat{\sigma}_{\delta_c}^2$ acquired from solving the equations, we can compute estimates of all the $V(\check{\theta}_{-c})$'s and $\mathrm{Cov}(\check{\theta}_{-s}, \check{\theta}_{-t})$'s, and thus obtain an estimate of $V(\tilde{\theta}_{GLS})$, denoted by $\hat{V}(\tilde{\theta}_{GLS})$.

2.3. Cluster permutation for hypothesis testing

In addition to the estimation of θ, an analysis often further carries out a hypothesis test of θ against a constant value: H0: θ = d vs. Ha: θ ≠ d. When we set l′ = (1, −1) to compute θ and set d = 0, one may be interested in estimating the treatment effect (i.e., μ1 − μ2) and testing whether the treatment effect is significantly different from 0. Here we explain how the cluster permutation procedure is designed for hypothesis testing within our method framework.

The order of the clusters, from 1 to G, is permuted. For example, with 4 clusters in each of the two conditions, {1,2,3,4,5,6,7,8} is the original order of the clusters. One possible permutation is {1,3,5,6,2,4,7,8}, another is {1,2,5,8,3,4,6,7}, and so on. In each permutation step, the reordered clusters are associated with the two conditions according to the association in the raw dataset. In this example, the first 4 clusters are associated with the treatment condition, and the other 4 clusters are under the control condition. After applying the proposed methods, we can obtain a Wald test statistic from each permutation step,

$$t = \frac{\tilde{\theta}_{OLS}}{\sqrt{\hat{V}(\tilde{\theta}_{OLS})}}$$

with the WJK-OLS estimator, or

$$t = \frac{\tilde{\theta}_{GLS}}{\sqrt{\hat{V}(\tilde{\theta}_{GLS})}}$$

with the WJK-GLS estimator.

The Wald test statistics are obtained from all possible permuted datasets. Denote the test statistic calculated from the observed data by tobs. The percentage of t's generated from all the permutations that are equal to or more extreme than tobs serves as the p-value for the hypothesis test, from which a decision is drawn to either reject or not reject the null hypothesis. It is worth noting that, due to the limited number of permutations, the attainable p-values in a cluster permutation test are restricted to a discrete set of values. For example, for a dataset with 4 clusters per condition, the total number of permutations is 70 and all attainable two-sided p-values are multiples of 2/70: 2/70, 4/70, 6/70, …, 66/70, 68/70, 70/70. For a dataset with 5 clusters per condition, the two-sided p-values are 2/252, 4/252, and so on. We then choose the attainable p-values nearest to 0.05 or 0.1 as the nominal significance levels in our simulation study.
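The enumeration of relabelings and the p-value rule can be sketched as follows (a hypothetical helper, not the authors' code; the small tolerance guards against floating-point ties):

```python
import numpy as np
from itertools import combinations
from math import comb

def permutation_pvalue(t_all, t_obs):
    """Two-sided permutation p-value: the fraction of |t| values over all
    cluster relabelings that are at least as extreme as |t_obs|."""
    t_all = np.asarray(t_all, float)
    return float(np.mean(np.abs(t_all) >= abs(t_obs) - 1e-12))

# With 8 clusters and 4 per condition, a relabeling is any 4-subset assigned
# to treatment: C(8,4) = 70, so attainable two-sided p-values are multiples of 2/70.
labelings = list(combinations(range(8), 4))
n_perm = len(labelings)
```

Each element of `labelings` names the clusters assigned to the treatment condition in one permutation step; one Wald statistic per labeling would be fed to `permutation_pvalue` alongside the observed statistic.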

3. Simulation studies

3.1. Methods to compare

The proposed WJK approaches are applied to simulated datasets. Their performance is assessed by comparing the analysis results of the WJK approaches with those of several existing representative methods. As reviewed in the Introduction, those methods were designed either to perform a small-sample correction or to address within-cluster error heterogeneity while incorporating an adjustment for a small number of clusters.

The compared methods were elaborated in our previous work (Du & Lee, 2019) and are briefly described here.

1. LMM-KR. LMM fitting with the degrees of freedom of the test statistic (usually a T or F statistic) corrected by the Kenward and Roger (KR) method (Kenward & Roger, 1997).

2. LMM-Permutation. The estimation is carried out through LMM fitting, but the significance of the test statistic is determined by the cluster permutation procedure given in Section 2.3 instead of comparison against a parametric distribution, e.g., a T or F distribution.

3. OLS-CRVE. The ordinary least squares estimate combined with the cluster-robust variance estimate of the OLS estimator. The corresponding Wald test statistic is computed for the cluster permutation hypothesis test,

$$t = \frac{\hat{\theta}}{\sqrt{\hat{V}_{OLS\text{-}CRVE}(\hat{\theta})}}.$$

4. FGLS-CRVE. Feasible generalized least squares with a cluster-robust variance estimator. In the following simulation study, the variance components are initially estimated through LMM fitting and then the CRVE is computed. The Wald test statistic for the cluster permutation hypothesis test is

$$t = \frac{\check{\theta}}{\sqrt{\hat{V}_{GLS\text{-}CRVE}(\check{\theta})}}.$$

3.2. Simulations with different numbers of clusters

A study outcome is simulated by implementing model (1) in R. The other simulation settings are chosen to reflect data characteristics seen in real-world data analyses. The cluster sizes are unbalanced; for example, for the simulation with 4 clusters per condition, the cluster sizes are 80, 50, 120, 25, 60, 75, 50, and 5 across the two conditions. The within-cluster error variance is 100, except for the cluster of size 75, for which the within-cluster error variance is 900. Cluster-level heterogeneity is simulated for the cluster with the largest size (nj = 120), for which $\sigma_{\gamma_j}^2 = 25$; for the other clusters, the variance is 1. In this set of simulations, we examine how the WJK approaches perform as the number of clusters per condition varies among 4, 5, 6, and 8. One thousand simulated datasets are generated for each setting to test a treatment effect: H0: δ = μ1 − μ2 = 0 vs. Ha: δ = μ1 − μ2 ≠ 0. For reproducibility, the index of a simulation (i.e., the corresponding integer from 1 to 1000) is used as the random seed in the programming for both the true null (μ1 = μ2 = 100) and the true effect (μ1 = 100, μ2 = 95) situations. The analytic results of the WJK approaches and the other compared methods are given in Table 1. The mean squared error (MSE) is used to measure estimation performance. The testing rejection rate is the percentage of the p-values from the 1000 simulations that are less than or equal to a nominal significance level. For a true null situation, the rejection rate is an observed Type I error rate, which is expected not to exceed the nominal significance level; otherwise, the testing approach raises a concern of an inflated Type I error rate (also called the False Positive rate). For a true effect situation, the rejection rate is an estimated statistical power.
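The two performance measures described above can be sketched in a few lines (hypothetical helper names; the sample inputs are invented for illustration):

```python
import numpy as np

def mse(estimates, truth):
    """Mean squared error of the simulation estimates against the true value."""
    e = np.asarray(estimates, float)
    return float(np.mean((e - truth) ** 2))

def rejection_rate(pvalues, alpha):
    """Share of simulations with p-value at or below the nominal level: an
    observed Type I error rate under a true null, estimated power under a
    true effect."""
    return float(np.mean(np.asarray(pvalues, float) <= alpha))

err = mse([99.0, 101.0], 100.0)                       # both estimates off by 1
rate = rejection_rate([0.01, 0.04, 0.2, 0.6], 0.05)   # 2 of 4 at or below 0.05
```

Across the 1000 simulated datasets, `mse` would be applied to each method's estimates of μ1 and μ2, and `rejection_rate` to each method's permutation p-values under the true-null and true-effect settings.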

Table 1:

Performance of WJK-OLS and WJK-GLS approaches compared to the other methods in the estimation of the conditional means, and the hypothesis testing of the treatment effect, when the number of clusters per condition varies.

Number of clusters per conditiona Measureb Method
LMM-KR LMM-Permu. OLS-CRVE FGLS-CRVE WJK-OLS WJK-GLS

4 clusters MSE Est. μ1 1.76 1.76 2.28 1.76 1.34 1.19
Est. μ2 1.56 1.56 1.70 1.56 1.38 1.32

Rejection % (δ = 0) Sig. 1 0.015 0.054 0.054 0.05 0.071 0.063
Sig. 2 0.048 0.103 0.095 0.093 0.13 0.118

Rejection % (δ ≠ 0 ) Sig. 1 0.485 0.562 0.539 0.564 0.573 0.584
Sig. 2 0.662 0.697 0.645 0.686 0.695 0.729

5 clusters MSE Est. μ1 1.53 1.53 2.06 1.53 1.34 1.17
Est. μ2 1.06 1.06 1.21 1.06 1.08 0.996

Rejection % (δ=0) Sig. 1 0.011 0.025 0.026 0.024 0.037 0.038
Sig. 2 0.050 0.083 0.082 0.078 0.084 0.093

Rejection % (δ ≠ 0 ) Sig. 1 0.608 0.651 0.614 0.651 0.650 0.691
Sig. 2 0.767 0.784 0.709 0.760 0.751 0.849

6 clusters MSE Est. μ1 1.42 1.42 1.99 1.42 1.296 1.05
Est. μ2 1.07 1.07 1.196 1.07 1.09 1.12

Rejection % (δ = 0) Sig. 1 0.012 0.036 0.038 0.037 0.044 0.037
Sig. 2 0.042 0.07 0.069 0.073 0.085 0.072

Rejection % (δ ≠ 0) Sig. 1 0.683 0.71 0.652 0.69 0.675 0.769
Sig. 2 0.814 0.814 0.735 0.788 0.766 0.898

8 clusters MSE Est. μ1 1.31 1.31 1.84 1.31 1.35 1.01
Est. μ2 1.06 1.06 1.18 1.06 1.07 0.99

Rejection % (δ = 0) Sig. 1 0.018 0.044 0.044 0.048 0.054 0.049
Sig. 2 0.06 0.093 0.089 0.088 0.1 0.088

Rejection % (δ ≠ 0) Sig. 1 0.813 0.824 0.717 0.787 0.726 0.895
Sig. 2 0.891 0.896 0.781 0.863 0.799 0.951
a

The cluster sizes are:

4 clusters/condition: 80, 50, 120, 25, 60, 75, 50, 5;

5 clusters/condition: 80, 50, 120, 25, 30, 30, 60, 75, 50, 5;

6 clusters/condition: 80, 50, 120, 25, 30, 10, 10, 30, 60, 75, 50, 5;

8 clusters/condition: 80, 50, 120, 25, 30, 10, 20, 10, 10, 20, 10, 30, 60, 75, 50, 5.

b

Since the same 1000 random seeds are used for the simulations with the true null and the true effect situations, the MSEs are the same when the other settings are identical. Thus the MSE is presented only once.

Est. is an abbreviation for Estimate of; Sig. is for nominal significance level. Due to the limited number of permutations, for 4 clusters per condition, Sig. 1=0.057 (4/70) and Sig. 2=0.114 (8/70); for 5 clusters per condition, Sig. 1=0.048 (12/252) and Sig. 2=0.103 (26/252). For 6 or 8 clusters/condition, Sig. 1=0.05 and Sig. 2=0.1.

The analysis results in Table 1 show that the estimates by WJK-GLS have the smallest MSE among the methods in every setting, whether estimating μ1 or μ2. WJK-GLS also displays the best statistical power in testing the treatment effect when a true effect is simulated (δ ≠ 0); its Type I error rates are smaller than the nominal significance levels except for 4 clusters per condition, and even there they are very close (δ = 0: 0.063 vs. 0.057, 0.118 vs. 0.114). For this set of simulated datasets, however, WJK-OLS is not consistently superior to the existing GLS-based methods, i.e., LMM-KR, LMM-Permutation, and FGLS-CRVE. As the number of clusters grows, the statistical power of WJK-OLS becomes relatively less competitive.

3.3. Simulations with varied cluster-level heterogeneity

Keeping the essential data characteristics unaltered from the above simulations, including the unbalanced cluster sizes and an unequal within-cluster error variance for one cluster, we investigate how the performance of the WJK methods is impacted when the cluster-level heterogeneity changes. The number of clusters is set to 5 per condition for all the simulations in this section. The cluster-level unequal variance is set to 36 and 49 in addition to 25; we also specifically examine the situation in which no heterogeneous cluster variance is present.

Table 2 displays the analytic results for the new set of simulated data. As the unequal variance of the cluster random effect increases, all the methods lose capacity in both estimation and testing, although for any individual setting WJK-GLS performs best among the compared methods. In the case of no unequal variances, both WJK-OLS and WJK-GLS have slightly higher MSEs in estimating the outcome mean for condition 1 (i.e., μ1), under which the cluster sizes are 80, 50, 120, 25, and 30 and no within-cluster error heterogeneity is simulated either. For condition 2, under which a very small cluster is present (cluster size 5) along with an unequal within-cluster error variance for the cluster of size 75, both WJK-OLS and WJK-GLS have lower MSEs than the other methods in estimating μ2. They also both outperform the other methods in testing the true treatment effect (last two rows of Table 2). For WJK-GLS, the observed Type I error rates are lower than the nominal significance values; for WJK-OLS, the values are very close (0.05 vs. 0.048, 0.104 vs. 0.103).

Table 2:

Performance of WJK-OLS and WJK-GLS approaches compared to the other methods in the estimation of the conditional means, and the hypothesis testing of the treatment effect when the cluster-level heterogeneity varies.

Cluster-level unequal variancea Measureb Method
LMM-KR LMM-Permu. OLS-CRVE FGLS-CRVE WJK-OLS WJK-GLS

25 MSE Est. μ1 1.53 1.53 2.06 1.53 1.34 1.17
Est. μ2 1.06 1.06 1.21 1.06 1.08 0.996

Rejection % (δ = 0) Sig. 1 0.011 0.025 0.026 0.024 0.037 0.038
Sig. 2 0.050 0.083 0.082 0.078 0.084 0.093

Rejection % (δ ≠ 0) Sig. 1 0.608 0.651 0.614 0.651 0.650 0.691
Sig. 2 0.767 0.784 0.709 0.760 0.751 0.849

36 MSE Est. μ1 1.71 1.71 2.43 1.71 1.38 1.17
Est. μ2 1.06 1.06 1.21 1.06 1.08 0.996

Rejection % (δ = 0) Sig. 1 0.008 0.025 0.024 0.025 0.038 0.036
Sig. 2 0.049 0.077 0.079 0.072 0.087 0.088

Rejection % (δ ≠ 0) Sig. 1 0.567 0.617 0.589 0.621 0.618 0.661
Sig. 2 0.741 0.749 0.674 0.723 0.712 0.822

49 MSE Est. μ1 1.88 1.88 2.81 1.88 1.39 1.197
Est. μ2 1.06 1.06 1.21 1.06 1.08 0.996

Rejection % (δ = 0) Sig. 1 0.008 0.024 0.023 0.025 0.036 0.041
Sig. 2 0.049 0.075 0.076 0.07 0.088 0.087

Rejection % (δ ≠ 0) Sig. 1 0.532 0.593 0.568 0.595 0.595 0.647
Sig. 2 0.706 0.718 0.642 0.689 0.689 0.814

Variances all equal to 1 MSE Est. μ1 0.79 0.79 0.79 0.79 0.84 0.89
Est. μ2 1.13 1.13 1.21 1.13 1.08 0.996

Rejection % (δ = 0) Sig. 1 0.008 0.044 0.046 0.05 0.05 0.034
Sig. 2 0.033 0.112 0.1 0.103 0.104 0.078

Rejection % (δ ≠ 0) Sig. 1 0.779 0.839 0.818 0.832 0.842 0.841
Sig. 2 0.904 0.919 0.889 0.915 0.921 0.938
a

There are 5 clusters/condition, and the cluster sizes are: 80, 50, 120, 25, 30, 30, 60, 75, 50, 5. The variance of the cluster-level random effect is 1 if no unequal variances exist; otherwise, the unequal variance is associated with the cluster of size 120.

b

Since the same 1000 random seeds are used for the simulations with the true null and the true effect situations, the MSEs are the same when the other settings are identical. Thus the MSE is presented only once.

Est. is an abbreviation for Estimate of; Sig. is for nominal significance level. Due to the limited number of permutations, Sig. 1=0.048 (12/252) and Sig. 2=0.103 (26/252).

3.4. Simulations with paired cluster size

In practice, when conducting cluster randomized trials, researchers often pair-match the clusters between the study conditions prior to cluster randomization based on cluster sizes or other characteristics that need to be balanced, a widely applied design. To this end, we also designed simulation settings to investigate the performance of the WJK approaches for data simulated with paired cluster sizes between the two conditions. Except for the cluster sizes, which are set to 30, 60, 75, 50, and 5 for each condition, we follow all the other simulation settings used in Section 3.3. The details of the settings and the analysis results are presented in Table 3. The observed Type I error rates for WJK-GLS are lower than the nominal significance values in all the situations in this set of simulations; for WJK-OLS, the observed values exceed the nominal value by less than 1% in some situations (i.e., 0.107 to 0.109 vs. 0.103). For the settings in which a heterogeneous cluster-level variance is present, the WJK approaches outperform the other compared methods, with lower MSEs in estimating the condition means and higher statistical power in detecting a nonzero δ. For the setting without a heterogeneous cluster-level variance, the observation from the previous section also holds: both WJK approaches have slightly higher MSEs in estimating μ1, the condition for which the unequal within-cluster error variance is not simulated either; otherwise, the WJK approaches yield smaller MSEs in estimation and higher or equal statistical power in detecting the difference of the condition means.

Table 3:

Performance of WJK-OLS and WJK-GLS approaches compared to the other methods in the estimation of the conditional means, and the hypothesis testing of the treatment effect, when the cluster sizes are paired between the two conditions.

Cluster-level unequal variance a Measure b Method
LMM-KR LMM-Permu. OLS-CRVE FGLS-CRVE WJK-OLS WJK-GLS

25 MSE Est. μ1 1.61 1.61 1.87 1.61 1.36 1.35
Est. μ2 1.13 1.13 1.30 1.13 1.02 0.99

Rejection % (δ = 0) Sig. 1 0.006 0.032 0.029 0.030 0.044 0.042
Sig. 2 0.043 0.085 0.084 0.082 0.099 0.100

Rejection % (δ ≠ 0) Sig. 1 0.577 0.648 0.621 0.641 0.644 0.669
Sig. 2 0.730 0.762 0.724 0.754 0.769 0.811

36 MSE Est. μ1 1.81 1.81 2.18 1.81 1.42 1.30
Est. μ2 1.12 1.12 1.30 1.12 1.02 0.99

Rejection % (δ = 0) Sig. 1 0.006 0.029 0.029 0.030 0.044 0.046
Sig. 2 0.045 0.083 0.084 0.086 0.107 0.094

Rejection % (δ ≠ 0) Sig. 1 0.545 0.618 0.602 0.616 0.616 0.652
Sig. 2 0.695 0.724 0.694 0.721 0.741 0.793

49 MSE Est. μ1 2.00 2.00 2.50 2.00 1.47 1.37
Est. μ2 1.11 1.11 1.30 1.11 1.02 0.99

Rejection % (δ = 0) Sig. 1 0.007 0.029 0.030 0.029 0.046 0.046
Sig. 2 0.047 0.084 0.080 0.082 0.109 0.091

Rejection % (δ ≠ 0) Sig. 1 0.518 0.598 0.580 0.601 0.602 0.642
Sig. 2 0.672 0.694 0.664 0.691 0.714 0.787

Variances all equal to 1 MSE Est. μ1 0.87 0.87 0.87 0.87 0.92 1.01
Est. μ2 1.17 1.17 1.30 1.17 1.02 0.99

Rejection % (δ = 0) Sig. 1 0.002 0.036 0.035 0.035 0.048 0.046
Sig. 2 0.027 0.097 0.092 0.092 0.108 0.098

Rejection % (δ ≠ 0) Sig. 1 0.718 0.807 0.791 0.795 0.807 0.801
Sig. 2 0.869 0.900 0.879 0.884 0.901 0.907
a

There are 5 clusters/condition, and the cluster sizes are 30, 60, 75, 50, and 5 for both conditions. The variance of the cluster-level random effect is 1 if no unequal variances exist; otherwise, the unequal variance is associated with the 3rd cluster, which has size 75. The within-cluster error variance is 100 except for the 8th cluster, of size 75, for which the within-cluster error variance is 900.

b

Since the same 1000 random seeds are used for the simulations with the true null and the true effect situations, the MSEs are the same when the other settings are identical. Thus the MSE is presented only once.

Est. is an abbreviation for Estimate of; Sig. is for nominal significance level. Due to the limited number of permutations, Sig. 1=0.048 (12/252) and Sig. 2=0.103 (26/252).

4. Application: a resampled school-based survey dataset

The application of the proposed methods is illustrated using a dataset resampled from a real school-based survey pool. A student's math achievement score is used as the outcome variable. The original dataset contains 160 schools (90 public and 70 private) with 7185 students (Jones, 1983). Six schools were randomly sampled from each school type. Figure 1 shows boxplots of the math scores from the resampled schools.

Figure 1: Boxplots of math scores from the resampled schools of both types.

The estimated condition means (i.e., mean math scores for the two types of schools) are 11.99 vs. 14.15 by LMM-KR, LMM-Permutation, and FGLS-CRVE; 11.96 vs. 14.33 by OLS-CRVE; 11.96 vs. 14.56 by WJK-OLS; and 11.94 vs. 14.6 by WJK-GLS, for the public and private schools, respectively. The disagreement between the WJK methods and the compared methods concerns mainly the estimated condition mean for the private schools. The calculated cluster means for the six private schools are 13.4, 16.96, 16.06, 14.62, 16.45, and 7.33, from left to right in Figure 1. The rightmost one, a heterogeneous private school with a distinctly lower distribution of math scores, created the difficulty for the existing methods in analyzing this resampled dataset. They are unable to claim a difference in mean math scores between the two school types at the significance level of 0.1, while both WJK-OLS and WJK-GLS can (p-values: LMM-KR 0.198; LMM-Permutation 0.203; OLS-CRVE 0.115; FGLS-CRVE 0.195; WJK-OLS 0.074; WJK-GLS 0.076).

An analysis using the LMM approach with the entire 160 schools yields estimated mean math scores of 11.4 for the public schools and 14.2 for the private schools, with a p-value < .001. The number of clusters in this analysis is sufficiently large (90 vs. 70), and the intraclass correlation coefficient (ICC) of the schools is estimated at 0.15 after the LMM fitting. This analytic approach is therefore justified, and its result serves as the benchmark against which we examine the results from the resampled dataset with a small number of clusters.

5. Discussion

The delete-one-cluster procedure provides a way to evaluate the impact of each cluster on the estimation. In our proposed approaches, an estimated variance of the estimator (i.e., $\hat{V}(\hat{\theta}_c)$ or $\hat{V}(\check{\theta}_c)$) is calculated when a cluster is deleted. A relatively smaller value suggests less uncertainty remains in the estimate once that cluster is deleted; in other words, the deleted cluster is more likely a heterogeneous one compared with the others. Using the reciprocal of $\hat{V}(\hat{\theta}_c)$ or $\hat{V}(\check{\theta}_c)$ as the weight to form a weighted sum of the estimates from each delete-one-cluster step, the impact of a heterogeneous cluster is downplayed and a more precise estimate can be achieved. We should also mention that, since the weights sum to one, the WJK estimators are unbiased given that $\hat{\theta}_c$ or $\check{\theta}_c$ is unbiased, which is true in our framework (see the equations in Sections 2.1 and 2.2). This is why we have focused our attention solely on estimation precision.
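The combination step described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' R implementation: `estimates` stands for the delete-one-cluster estimates $\hat{\theta}_c$ (or $\check{\theta}_c$) and `variances` for their estimated variances, both of which would come from the formulas in the Appendix.

```python
def wjk_combine(estimates, variances):
    """Weighted Jackknife combination: weight each delete-one-cluster
    estimate by the reciprocal of its estimated variance, with the
    weights normalized to sum to one."""
    recips = [1.0 / v for v in variances]
    total = sum(recips)
    weights = [r / total for r in recips]
    return sum(w * e for w, e in zip(weights, estimates))

# Deleting the heterogeneous cluster yields the smallest estimated
# variance, so the estimate computed WITHOUT that cluster receives the
# largest weight, downplaying the heterogeneous cluster's influence.
theta_wjk = wjk_combine([10.0, 10.0, 20.0], [2.0, 2.0, 1.0])  # -> 15.0
```

Because the normalized weights sum to one, the combined estimator inherits unbiasedness from the individual delete-one-cluster estimators, matching the argument above.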

Figure 2 shows kernel density estimates of the estimated condition means from the simulated data of 5 clusters per condition with (left panel) or without (right panel) a heterogeneous cluster. The curves produced by WJK-GLS and WJK-OLS are visibly less spread out than those of the other methods in the left panel for condition 1 (μ1 = 100), which was simulated with a cluster-level heterogeneous variance of 25. As intended, the simulation study confirms that the proposed WJK approaches produce more precise estimates (also see the MSEs in Tables 1 and 2). In contrast, the curves overlap one another in the left panel for condition 2, and for both conditions in the right panel, where no heterogeneity was simulated.

Figure 2: Density of estimated condition means by the WJK approaches and the other methods (LMM- in the figure legends stands for LMM-KR and LMM-Permutation) for the simulated data of 5 clusters per condition with a cluster-level heterogeneous variance of 25 (left panel) or without the heterogeneous variance (right panel).

When solving the linear equations (6), we may encounter negative solutions, which are not appropriate for our purpose of obtaining variance estimates, i.e., $\hat{\sigma}^2_{\delta_{gj}}$. This problem is more prevalent when the number of clusters per condition is less than 5, as we experienced in the simulation study. For the steps in WJK-OLS, since no further reciprocal calculations are needed, we applied a straightforward remedy: a negative estimate is changed to 0 (i.e., if $\hat{\sigma}^2_{\delta_{gj}} < 0$ then it is set to 0). For the WJK-GLS approach, each negative estimate is first changed to 0 and then an additional term is added to every estimate, i.e., $\hat{\sigma}^2_{\delta_{gj}} + \hat{\sigma}^2_{\epsilon j}/n_j$, where $\hat{\sigma}^2_{\epsilon j}$ is the sample variance of the errors in cluster $j$, calculated as

$$\hat{\sigma}^2_{\epsilon j} = \frac{1}{n_j - 1} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_j)^2.$$

Since $\sigma^2_{\delta j} = \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}/n_j$ and an estimate of $\sigma^2_{\gamma j}$ is usually biased downward when the number of clusters is small, adding $\hat{\sigma}^2_{\epsilon j}/n_j$ back is our rationale for handling the negative-estimate problem.
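The two remedies can be sketched as follows. This is an illustrative sketch under our reading of the text, not the authors' R code; the function name and argument layout are our own.

```python
def adjust_variance_estimates(sigma2_delta_hat, sigma2_eps_hat, n, method):
    """Handle negative solutions for sigma^2_delta from the linear
    equations: truncate at zero for WJK-OLS; for WJK-GLS, truncate and
    then add sigma^2_eps_j / n_j to every estimate, since sigma^2_delta =
    sigma^2_gamma + sigma^2_eps / n_j and the sigma^2_gamma estimate is
    typically biased downward with few clusters."""
    adjusted = []
    for s2d, s2e, nj in zip(sigma2_delta_hat, sigma2_eps_hat, n):
        s2d = max(s2d, 0.0)        # negative estimate -> 0
        if method == "GLS":
            s2d += s2e / nj        # add back the within-cluster part
        adjusted.append(s2d)
    return adjusted

# With a negative first estimate (-0.3), OLS truncates it to 0, while
# GLS additionally adds sigma^2_eps_j / n_j to each cluster's estimate.
adjust_variance_estimates([-0.3, 2.0], [4.0, 6.0], [4, 3], "OLS")  # [0.0, 2.0]
adjust_variance_estimates([-0.3, 2.0], [4.0, 6.0], [4, 3], "GLS")  # [1.0, 4.0]
```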

For estimating θ, an OLS estimate involves only weights given by the cluster sizes (see the computation of $\hat{\theta}$ in Eq. (2)), while GLS applies weights of $1/\sigma^2_{\delta j}$ within each condition (see the computation of $\check{\theta}$ in Section 2.2). A larger cluster size leads to a more precise estimate of the within-cluster error variance, but it cannot take into account how heterogeneous one cluster is relative to the others. Since $\sigma^2_{\delta j}$ is composed of two additive parts, $\sigma^2_{\gamma j} + \sigma^2_{\epsilon j}/n_j$, using $1/\sigma^2_{\delta j}$ to balance the impact of each cluster means a larger weight is determined jointly by the cluster size and the cluster-level variance. When a heterogeneous cluster is present, WJK-GLS is conceptually a better approach than WJK-OLS, which is also confirmed by our simulation study. However, WJK-GLS involves estimating the reciprocal $1/\sigma^2_{\delta j}$, which is more challenging than estimating $\sigma^2_{\delta j}$ itself. When the cluster sizes are not very unbalanced and/or no severe cluster-level heterogeneity exists, WJK-OLS performs as well as WJK-GLS (see Table 2, 'No unequal variances' setting), and in some situations, as we experienced with real datasets, WJK-OLS can work better than the other methods (not detailed due to page limits). If outlier data points within a cluster are observed, a general treatment of outliers, such as a data transformation, can be considered before the proposed methods are applied. Further study of the WJK approaches is of much interest as our ongoing research. For both the WJK-OLS and WJK-GLS approaches, computation time is not a concern on an up-to-date personal computer. For a simulated dataset with 8 clusters per condition, either WJK approach finishes the analysis in about 1 minute on an author's desktop computer, which has two 1.7 GHz Intel Xeon CPUs with 12 cores in total, running Windows 10.
The R functions implementing the WJK methods have been developed and are available upon request from the corresponding author.
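The contrast between the two weighting schemes can be sketched directly. This is an illustrative sketch, not the authors' R functions; the function names are our own, and `sigma2_delta` stands for the per-cluster variances $\sigma^2_{\delta j}$, which in practice must themselves be estimated.

```python
def condition_mean_ols(cluster_means, n):
    """OLS pooled condition mean: clusters weighted by their sizes n_j,
    as in the computation of theta-hat (Eq. (2) of the paper)."""
    return sum(nj * yb for nj, yb in zip(n, cluster_means)) / sum(n)

def condition_mean_gls(cluster_means, sigma2_delta):
    """GLS pooled condition mean: clusters weighted by 1/sigma^2_delta_j,
    which reflects both the cluster size and the cluster-level variance,
    so a heterogeneous (high-variance) cluster is down-weighted."""
    w = [1.0 / s for s in sigma2_delta]
    return sum(wj * yb for wj, yb in zip(w, cluster_means)) / sum(w)

# OLS favors the larger cluster; GLS favors the lower-variance cluster.
condition_mean_ols([10.0, 20.0], [3, 1])        # -> 12.5
condition_mean_gls([10.0, 20.0], [2.0, 0.5])    # -> 18.0
```

With equal variances the GLS weights reduce to equal weights per cluster, while the OLS weights still track cluster size, which is the distinction drawn in the paragraph above.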

Funding

This research was supported in part by the University of Arkansas for Medical Sciences Translational Research Institute (TRI) Grant TL1 TR003109 through the NCATS of the NIH. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Appendix

Derivations

The major derivation starts from Equation (1), which is rewritten here:

$$Y_{jk} = \mu_i + \gamma_j + \epsilon_{jk}. \qquad (1)$$

The model equation in matrix form is

$$Y = \begin{pmatrix} 1_{N_1} & 0_{N_1} \\ 0_{N_2} & 1_{N_2} \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \mathrm{Diag}\{1_{n_j}\} \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_G \end{pmatrix} + \begin{pmatrix} \epsilon_{11} \\ \vdots \\ \epsilon_{G n_G} \end{pmatrix} = X\beta + \mathrm{Diag}\{1_{n_j}\}\gamma + \epsilon,$$

where unmarked entries inside a matrix are all zeros, and $\mathrm{Diag}\{1_{n_j}\}$ denotes the block-diagonal matrix with the ones vectors $1_{n_j}$ as the diagonal blocks and zeros elsewhere. The other notations are

$$Y_{N \times 1} = (y_{11}, \ldots, y_{1 n_1}, y_{21}, \ldots, y_{2 n_2}, \ldots, y_{G 1}, \ldots, y_{G n_G})', \qquad X_{N \times 2} = \begin{pmatrix} 1_{N_1} & 0_{N_1} \\ 0_{N_2} & 1_{N_2} \end{pmatrix},$$
$$\beta_{2 \times 1} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \gamma_{G \times 1} = (\gamma_1, \ldots, \gamma_G)', \qquad \epsilon_{N \times 1} = (\epsilon_{11}, \ldots, \epsilon_{G n_G})'.$$

Let Vy be the covariance matrix of Y.

$$V_y = \mathrm{Diag}\{1_{n_j}\}\,\mathrm{Var}(\gamma)\,(\mathrm{Diag}\{1_{n_j}\})' + \mathrm{Var}(\epsilon) = \mathrm{Diag}\{1_{n_j}\}\,\mathrm{Diag}\{\sigma^2_{\gamma j}\}\,(\mathrm{Diag}\{1_{n_j}\})' + \mathrm{Diag}\{\sigma^2_{\epsilon j} I_{n_j}\} = \mathrm{Diag}\{\sigma^2_{\gamma j} 1_{n_j} 1'_{n_j} + \sigma^2_{\epsilon j} I_{n_j}\},$$
so each diagonal block is compound symmetric, with $\sigma^2_{\gamma j} + \sigma^2_{\epsilon j}$ on its diagonal and $\sigma^2_{\gamma j}$ off the diagonal,

where unmarked entries inside a matrix are all zeros.

Let $U_{n_j} = \frac{1}{n_j} 1_{n_j} 1'_{n_j}$ and $S_{n_j} = I_{n_j} - U_{n_j}$. Both $U_{n_j}$ and $S_{n_j}$ are orthogonal projection matrices; we have $U_{n_j} U_{n_j} = U_{n_j}$, $S_{n_j} S_{n_j} = S_{n_j}$, and $U_{n_j} S_{n_j} = 0_{n_j}$. Then $V_y^{-1}$ can be calculated:

$$V_y^{-1} = \left[\mathrm{Diag}\{\sigma^2_{\gamma j} 1_{n_j} 1'_{n_j} + \sigma^2_{\epsilon j} I_{n_j}\}\right]^{-1} = \mathrm{Diag}\left\{(n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}) U_{n_j} + \sigma^2_{\epsilon j} S_{n_j}\right\}^{-1} = \mathrm{Diag}\left\{\frac{1}{n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}} U_{n_j} + \frac{1}{\sigma^2_{\epsilon j}} S_{n_j}\right\}.$$
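The blockwise inverse above follows because $U_{n_j}$ and $S_{n_j}$ are orthogonal projections, so each block can be inverted by inverting the scalar coefficients. This can be checked numerically; the sketch below uses illustrative values ($n_j = 3$, $\sigma^2_{\gamma j} = 2$, $\sigma^2_{\epsilon j} = 1.5$) chosen by us, and verifies that the claimed inverse of a single block multiplies back to the identity.

```python
# One covariance block V_j = sigma_g^2 * 1 1' + sigma_e^2 * I and its
# claimed inverse (1/(n*sigma_g^2 + sigma_e^2)) * U + (1/sigma_e^2) * S,
# where U = (1/n) 1 1' and S = I - U. Values are illustrative only.
n, sg2, se2 = 3, 2.0, 1.5

V = [[sg2 + (se2 if i == j else 0.0) for j in range(n)] for i in range(n)]
U = [[1.0 / n] * n for _ in range(n)]
S = [[(1.0 if i == j else 0.0) - U[i][j] for j in range(n)] for i in range(n)]
Vinv = [[U[i][j] / (n * sg2 + se2) + S[i][j] / se2 for j in range(n)]
        for i in range(n)]

# V @ Vinv should be the identity matrix (to floating-point precision).
prod = [[sum(V[i][k] * Vinv[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)]
assert all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(n) for j in range(n))
```

The check relies only on $UU = U$, $SS = S$, and $US = 0$, so the cross terms in the product vanish and $U + S = I$ remains.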

WJK-OLS related derivation

$$X' V_y X = \begin{pmatrix} 1'_{N_1} & 0'_{N_2} \\ 0'_{N_1} & 1'_{N_2} \end{pmatrix} \mathrm{Diag}\left\{(n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}) U_{n_j} + \sigma^2_{\epsilon j} S_{n_j}\right\} \begin{pmatrix} 1_{N_1} & 0_{N_1} \\ 0_{N_2} & 1_{N_2} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{g_1} n_j (n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}) & 0 \\ 0 & \sum_{j=g_1+1}^{G} n_j (n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}) \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{g_1} n_j^2 \sigma^2_{\delta j} & 0 \\ 0 & \sum_{j=g_1+1}^{G} n_j^2 \sigma^2_{\delta j} \end{pmatrix},$$

where $\sigma^2_{\delta j} = \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}/n_j$, as defined in the main text.

$$\hat{\beta} = (X'X)^{-1} X' Y = \begin{pmatrix} 1/N_1 & 0 \\ 0 & 1/N_2 \end{pmatrix} \begin{pmatrix} \sum_{j=1}^{g_1} \sum_{k=1}^{n_j} y_{jk} \\ \sum_{j=g_1+1}^{G} \sum_{k=1}^{n_j} y_{jk} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{g_1} n_j \bar{y}_j / N_1 \\ \sum_{j=g_1+1}^{G} n_j \bar{y}_j / N_2 \end{pmatrix},$$
$$V(\hat{\beta}) = V\!\left((X'X)^{-1} X' Y\right) = (X'X)^{-1} X' V_y X (X'X)^{-1} = \begin{pmatrix} \sum_{j=1}^{g_1} n_j^2 \sigma^2_{\delta j} / N_1^2 & 0 \\ 0 & \sum_{j=g_1+1}^{G} n_j^2 \sigma^2_{\delta j} / N_2^2 \end{pmatrix},$$
$$V(\hat{\theta}_c) = V(l' \hat{\beta}_c) = l'\, V(\hat{\beta}_c)\, l = \begin{cases} l' \begin{pmatrix} \sum_{j=1,\, j \neq c}^{g_1} n_j^2 \sigma^2_{\delta j} / N_1^2 & 0 \\ 0 & \sum_{j=g_1+1}^{G} n_j^2 \sigma^2_{\delta j} / N_2^2 \end{pmatrix} l, & c \le g_1 \\[2ex] l' \begin{pmatrix} \sum_{j=1}^{g_1} n_j^2 \sigma^2_{\delta j} / N_1^2 & 0 \\ 0 & \sum_{j=g_1+1,\, j \neq c}^{G} n_j^2 \sigma^2_{\delta j} / N_2^2 \end{pmatrix} l, & c > g_1 \end{cases}$$

WJK-GLS related derivation

$$X' V_y^{-1} X = \begin{pmatrix} 1'_{N_1} & 0'_{N_2} \\ 0'_{N_1} & 1'_{N_2} \end{pmatrix} \mathrm{Diag}\left\{\frac{1}{n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}} U_{n_j} + \frac{1}{\sigma^2_{\epsilon j}} S_{n_j}\right\} \begin{pmatrix} 1_{N_1} & 0_{N_1} \\ 0_{N_2} & 1_{N_2} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{g_1} \frac{n_j}{n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}} & 0 \\ 0 & \sum_{j=g_1+1}^{G} \frac{n_j}{n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{g_1} \frac{1}{\sigma^2_{\delta j}} & 0 \\ 0 & \sum_{j=g_1+1}^{G} \frac{1}{\sigma^2_{\delta j}} \end{pmatrix},$$
$$X' V_y^{-1} Y = \begin{pmatrix} \sum_{j=1}^{g_1} \sum_{k=1}^{n_j} \frac{y_{jk}}{n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}} \\ \sum_{j=g_1+1}^{G} \sum_{k=1}^{n_j} \frac{y_{jk}}{n_j \sigma^2_{\gamma j} + \sigma^2_{\epsilon j}} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{g_1} \frac{\bar{y}_j}{\sigma^2_{\delta j}} \\ \sum_{j=g_1+1}^{G} \frac{\bar{y}_j}{\sigma^2_{\delta j}} \end{pmatrix},$$
$$\check{\beta} = (X' V_y^{-1} X)^{-1} X' V_y^{-1} Y = \begin{pmatrix} \sum_{j=1}^{g_1} \frac{\bar{y}_j}{\sigma^2_{\delta j}} \Big/ \sum_{j=1}^{g_1} \frac{1}{\sigma^2_{\delta j}} \\[1.5ex] \sum_{j=g_1+1}^{G} \frac{\bar{y}_j}{\sigma^2_{\delta j}} \Big/ \sum_{j=g_1+1}^{G} \frac{1}{\sigma^2_{\delta j}} \end{pmatrix},$$
$$V(\check{\beta}) = (X' V_y^{-1} X)^{-1} X' V_y^{-1} V_y (V_y^{-1})' X \left[(X' V_y^{-1} X)^{-1}\right]' = (X' V_y^{-1} X)^{-1} = \begin{pmatrix} 1 \Big/ \sum_{j=1}^{g_1} \frac{1}{\sigma^2_{\delta j}} & 0 \\ 0 & 1 \Big/ \sum_{j=g_1+1}^{G} \frac{1}{\sigma^2_{\delta j}} \end{pmatrix},$$
$$V(\check{\theta}_c) = V(l' \check{\beta}_c) = l'\, V(\check{\beta}_c)\, l = \begin{cases} l' \begin{pmatrix} 1 \Big/ \sum_{j=1,\, j \neq c}^{g_1} \frac{1}{\sigma^2_{\delta j}} & 0 \\ 0 & 1 \Big/ \sum_{j=g_1+1}^{G} \frac{1}{\sigma^2_{\delta j}} \end{pmatrix} l, & c \le g_1 \\[2ex] l' \begin{pmatrix} 1 \Big/ \sum_{j=1}^{g_1} \frac{1}{\sigma^2_{\delta j}} & 0 \\ 0 & 1 \Big/ \sum_{j=g_1+1,\, j \neq c}^{G} \frac{1}{\sigma^2_{\delta j}} \end{pmatrix} l, & c > g_1 \end{cases}$$

References

  1. Bauer DJ, & Sterba SK (2011). Fitting multilevel models with ordinal outcomes: Performance of alternative specifications and methods of estimation. Psychological Methods, 16(4), 373.
  2. Bland JM (2004). Cluster randomised trials in the medical literature: two bibliometric surveys. BMC Medical Research Methodology, 4(1), 21.
  3. Bogart LM, Elliott MN, Cowgill BO, Klein DJ, Hawes-Dawson J, Uyeda K, & Schuster MA (2016). Two-year BMI outcomes from a school-based intervention for nutrition and exercise: A randomized trial. Pediatrics, 137(5), e20152493.
  4. Cameron AC, Gelbach JB, & Miller DL (2008). Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 90(3), 414–427.
  5. Cameron AC, & Miller DL (2010). Robust inference with clustered data.
  6. Canay IA, Santos A, & Shaikh AM (2018). The wild bootstrap with a "small" number of "large" clusters. Review of Economics and Statistics, 1–45.
  7. Donald SG, & Lang K (2007). Inference with difference-in-differences and other panel data. The Review of Economics and Statistics, 89(2), 221–233.
  8. Donner A, & Klar N (1996). Statistical considerations in the design and analysis of community intervention trials. Journal of Clinical Epidemiology, 49(4), 435–439.
  9. Du R, & Lee J-H (2019). A weighted Jackknife method for clustered data. Communications in Statistics - Theory and Methods, 48(8), 1963–1980.
  10. Eldridge S, & Kerry S (2012). A practical guide to cluster randomised trials in health services research (Vol. 120). John Wiley & Sons.
  11. Fay MP, & Graubard BI (2001). Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics, 57(4), 1198–1206.
  12. Flynn TN, & Peters TJ (2004). Use of the bootstrap in analysing cost data from cluster randomised trials: some simulation results. BMC Health Services Research, 4(1), 1.
  13. Harville DA (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72(358), 320–338.
  14. Harville DA (1988). Mixed-model methodology: theoretical justifications and future directions. Paper presented at the Proc. Statistical Computing Section of the American Statistical Association, New Orleans, USA.
  15. Jairath V, Kahan BC, Gray A, Doré CJ, Mora A, James MW, . . . Dallal H (2015). Restrictive versus liberal blood transfusion for acute upper gastrointestinal bleeding (TRIGGER): a pragmatic, open-label, cluster randomised feasibility trial. The Lancet, 386(9989), 137–144.
  16. Jones C (1983). High School and Beyond Transcripts Survey (1982). Data File User's Manual. Contractor Report.
  17. Kahan BC, Forbes G, Ali Y, Jairath V, Bremner S, Harhay MO, . . . Leyrat C (2016). Increased risk of type I errors in cluster randomised trials with small or medium numbers of clusters: a review, reanalysis, and simulation study. Trials, 17(1), 438.
  18. Kenward MG, & Roger JH (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 983–997.
  19. Littell RC, Stroup WW, Milliken GA, Wolfinger RD, & Schabenberger O (2006). SAS for mixed models. SAS Institute.
  20. Mancl LA, & DeRouen TA (2001). A covariance estimator for GEE with improved small-sample properties. Biometrics, 57(1), 126–134.
  21. McNeish D, & Stapleton LM (2016). Modeling clustered data with very few clusters. Multivariate Behavioral Research, 51(4), 495–518.
  22. Murray DM (1998). Design and analysis of group-randomized trials (Vol. 29). Oxford University Press, USA.
  23. Murray DM, Varnell SP, & Blitstein JL (2004). Design and analysis of group-randomized trials: a review of recent methodological developments. American Journal of Public Health, 94(3), 423–432.
  24. Nichols A, & Schaffer M (2007). Clustered errors in Stata.
  25. Pan W, & Wall MM (2002). Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Statistics in Medicine, 21(10), 1429–1441.
  26. Roetzheim RG, Christman LK, Jacobsen PB, Cantor AB, Schroeder J, Abdulla R, . . . Krischer JP (2004). A randomized controlled trial to increase cancer screening among attendees of community health centers. The Annals of Family Medicine, 2(4), 294–300.
  27. Rogers W (1993). Regression standard errors in clustered samples. Stata Technical Bulletin, 13, 19–23.
  28. Satterthwaite FE (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6), 110–114.
  29. Shao J, & Rao J (1993). Jackknife inference for heteroscedastic linear regression models. Canadian Journal of Statistics, 21(4), 377–395.
  30. White H (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 817–838.
