Estimation of heterogeneity variance based on a generalized Q statistic in meta‐analysis of log‐odds‐ratio

Elena Kulinskaya; David C Hoaglin

doi:10.1002/jrsm.1647

2023 Jun 28;14(5):671–688. doi: 10.1002/jrsm.1647

PMCID: PMC10946484 PMID: 37381621

Abstract

For estimation of heterogeneity variance $τ^{2}$ in meta‐analysis of log‐odds‐ratio, we derive new mean‐ and median‐unbiased point estimators and new interval estimators based on a generalized $Q$ statistic, $Q_{F}$, in which the weights depend on only the studies' effective sample sizes. We compare them with familiar estimators based on the inverse‐variance‐weights version of $Q$, $Q_{IV}$. In an extensive simulation, we studied the bias (including median bias) of the point estimators and the coverage (including left and right coverage error) of the confidence intervals. Most estimators add $0.5$ to each cell of the $2 \times 2$ table when one cell contains a zero count; we include a version that always adds $0.5$. The results show that: two of the new point estimators and two of the familiar point estimators are almost unbiased when the total sample size $n \geq 250$ and the probability in the Control arm ($p_{iC}$) is 0.1, and when $n \geq 100$ and $p_{iC}$ is 0.2 or 0.5; for $0.1 \leq τ^{2} \leq 1$, all estimators have negative bias for small to medium sample sizes, but for larger sample sizes some of the new median‐unbiased estimators are almost median‐unbiased; choices of interval estimators depend on values of parameters, but one of the new estimators is reasonable when $p_{iC} = 0.1$ and another, when $p_{iC} = 0.2$ or $p_{iC} = 0.5$; and lack of balance between left and right coverage errors for small $n$ and/or $p_{iC}$ implies that the available approximations for the distributions of $Q_{IV}$ and $Q_{F}$ are accurate only for larger sample sizes.

Keywords: effective‐sample‐size weights, heterogeneity, inverse‐variance weights, random effects

Highlights.

What is already known

Use of inverse‐variance weights based on estimated variances in the conventional $Q$ statistic makes it very difficult to approximate the null distribution of $Q$ .
Related moment‐based estimators of the heterogeneity variance ( $τ^{2}$ ), such as the DerSimonian‐Laird estimator, have considerable bias.

What is new

We study estimation of $τ^{2}$ based on $Q_{F}$ , a generalized $Q$ statistic with fixed effective‐sample‐size weights.
For point estimation of $τ^{2}$ we consider both mean‐unbiased and novel median‐unbiased estimators. The new estimators are based on the Farebrother approximation to the distribution of $Q_{F}$ .
In an extensive simulation study, we compared four new point estimators of $τ^{2}$ and two new interval estimators with traditional estimators.
We provide practical guidelines for choosing appropriate point and interval estimators for LOR.
For median bias, which was not studied previously, an important finding is that the majority of the standard estimators have negative bias for larger sample sizes. This means that their median values of ${\hat{τ}}^{2}$ are too low. Two new estimators consistently result in almost median‐unbiased estimation for moderate to large $n$ .

Potential impact for RSM readers outside the authors' field

We provide practical guidelines for choosing appropriate point and interval estimators of $τ^{2}$ in meta‐analysis of log‐odds‐ratio.
Some popular estimators of heterogeneity variance $τ^{2}$ such as the DL and REML point estimators and PL intervals have unacceptable bias or coverage. We recommend instead new point and interval estimators of $τ^{2}$ that use constant effective‐sample‐size weights. Some of these estimators are already implemented in metafor.

1. INTRODUCTION

As the measure of effect, meta‐analyses of binary outcomes from randomized trials most often use the odds ratio (OR), preferably analyzed as log‐odds‐ratio (LOR). The customary random‐effects analyses assess heterogeneity by using Cochran's $Q$ statistic ¹ and estimate the between‐study variance, $τ^{2}$ , for use in inverse‐variance weights. The resulting weights, based on estimated variances, underlie various shortcomings of that approach. Thus, for assessing heterogeneity and estimating $τ^{2}$ , we have studied $Q_{F}$ , a version of Cochran's $Q$ statistic in which the weights involve only the studies' arm‐level sample sizes. $Q_{F}$ belongs to a class of generalized $Q$ statistics, introduced by DerSimonian and Kacker, ² in which the $w_{i}$ are arbitrary positive constants.

Our initial motivation came from random‐effects meta‐analyses of mean difference (MD) and standardized mean difference (SMD) ³ and, later, LOR, ⁴ where a weighted mean whose weights involved only those effective sample sizes performed well in estimating the overall effect.

Further developments produced $Q_{F}$ and accurate approximations to its distribution function, for testing heterogeneity of MD, ⁵ SMD, ⁵ and three binary effect measures (LOR, log‐relative‐risk, and risk difference). ⁶ Those studies also examined approximations for the customary $Q$ statistic ( $Q_{IV}$ ) and investigated estimates for $τ^{2}$ in MD and SMD.

In the present paper, we derive from $Q_{F}$ new point and interval estimators of $τ^{2}$ for meta‐analysis of LOR. In particular, we study mean‐ and median‐unbiased estimators. Similar new estimators of $τ^{2}$ are available in the procedure rma in metafor. ⁷ However, the performance of these point and interval estimators of $τ^{2}$ had not been evaluated by simulations. Therefore, we carried out an extensive simulation study of the bias of the point estimators and the coverage of the confidence intervals. For comparison we included familiar point and interval estimators of $τ^{2}$ .

Section 2 briefly reviews study‐level estimation of LOR. Section 3 reviews the generic random‐effects model and describes the $Q$ statistic. Section 4 describes random‐effects models for LOR. Section 5 discusses approximations to the distributions of $Q_{F}$ and $Q_{IV}$ . Section 6 introduces new point and interval estimators of $τ^{2}$ for LOR. Section 7 describes the simulation design, and Section 8 summarizes the results. Section 9 examines an example of meta‐analysis using LOR. Section 10 offers a summary and discussion. The Supporting Information provides further details on the interval estimators, the simulation results, and the example.

2. STUDY‐LEVEL ESTIMATION OF LOG‐ODDS‐RATIO

Analyses of log‐odds‐ratio usually adopt binomial distributions to model the numbers of events. In Study $i$ ( $i = 1, \dots, K$ ), $X_{iT}$ and $X_{iC}$ denote the numbers of events in the $n_{iT}$ subjects in the Treatment arm and the $n_{iC}$ subjects in the Control arm. Thus, we treat $X_{iT}$ and $X_{iC}$ as independent binomial variables:

$$X_{iT} \sim \mathrm{Bin} (n_{iT}, p_{iT}) \quad \text{and} \quad X_{iC} \sim \mathrm{Bin} (n_{iC}, p_{iC}). \tag{1}$$

The log‐odds‐ratio for Study $i$ is

$$θ_{i} = \log_{e} \left(\frac{p_{iT} (1 - p_{iC})}{p_{iC} (1 - p_{iT})}\right) \quad \text{estimated by} \quad {\hat{θ}}_{i} = \log_{e} \left(\frac{{\overset{⌣}{p}}_{iT} (1 - {\overset{⌣}{p}}_{iC})}{{\overset{⌣}{p}}_{iC} (1 - {\overset{⌣}{p}}_{iT})}\right), \tag{2}$$

where ${\overset{⌣}{p}}_{ij}$ is an estimate of $p_{ij}$ .

The customary estimators of $p_{iT}$ and $p_{iC}$ are ${\tilde{p}}_{ij} = X_{ij} / n_{ij}$ (maximum likelihood). A reasonable alternative adds 0.5 to $X_{ij}$ and adds 1 to $n_{ij}$ : ${\hat{p}}_{ij} = (X_{ij} + 0.5) / (n_{ij} + 1)$ eliminates bias of order $1 / n_{ij}$ and provides the least biased estimator of log‐odds. ⁸

As inputs, a two‐stage meta‐analysis uses estimates of the $θ_{i}$ ( ${\hat{θ}}_{i}$ ) and estimates of their variances ( ${\hat{v}}_{i}^{2}$ ). The (conditional, given $p_{ij}$ and $n_{ij}$ ) asymptotic variance of ${\hat{θ}}_{i}$ , derived by the delta method, is

$$v_{i}^{2} = \mathrm{Var} ({\hat{θ}}_{i}) = \frac{1}{n_{iT} p_{iT} (1 - p_{iT})} + \frac{1}{n_{iC} p_{iC} (1 - p_{iC})}, \tag{3}$$

estimated by substituting ${\tilde{p}}_{ij}$ or ${\hat{p}}_{ij}$ for $p_{ij}$ . The estimator of the variance using the ${\hat{p}}_{ij}$ and $n_{ij} + 1$ instead of $n_{ij}$ in (3) is unbiased in large samples, but Gart et al. ⁸ note that it overestimates the variance for small sample sizes. As we explain in Section 4, the unconditional variance depends on the mechanism generating the $p_{iC}$ and the $p_{iT}$ .
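As an illustration of Equations (2) and (3), the following Python sketch (not the R code used for the simulations) computes ${\hat{θ}}_{i}$ and ${\hat{v}}_{i}^{2}$ for a single $2 \times 2$ table. The function name `lor_and_variance` and the flag `always` are hypothetical; the flag switches between adding 0.5 only when some cell is zero and adding it in every study.

```python
import math


def lor_and_variance(x_t, n_t, x_c, n_c, always=False):
    """Return (theta_hat, v_hat_sq) for one study's 2x2 table.

    always=False: add 0.5 to every cell only when some cell is zero.
    always=True:  add 0.5 to every cell regardless.
    """
    cells = (x_t, n_t - x_t, x_c, n_c - x_c)
    add = 0.5 if (always or 0 in cells) else 0.0
    # Adding 0.5 to both cells of an arm adds 1 to that arm's sample size.
    m_t = n_t + 2 * add
    m_c = n_c + 2 * add
    p_t = (x_t + add) / m_t
    p_c = (x_c + add) / m_c
    # Equation (2): log-odds-ratio from the estimated probabilities.
    theta_hat = math.log(p_t * (1 - p_c) / (p_c * (1 - p_t)))
    # Equation (3), with n + 1 in place of n whenever 0.5 was added.
    v_hat_sq = 1 / (m_t * p_t * (1 - p_t)) + 1 / (m_c * p_c * (1 - p_c))
    return theta_hat, v_hat_sq
```

With the correction applied, the variance reduces to the familiar sum of reciprocals of the four adjusted cell counts.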

3. RANDOM‐EFFECTS MODEL AND THE $Q$ STATISTIC

In the most widely used method for two‐stage meta‐analysis, ⁹ Cochran's $Q$ statistic serves as the basis for testing heterogeneity and estimating $τ^{2}$ , for use in the inverse‐variance weights, ${\hat{w}}_{i} = 1 / ({\hat{v}}_{i}^{2} + {\hat{τ}}^{2})$ . $Q$ is a weighted sum of the squared deviations of the estimated effects ${\hat{θ}}_{i}$ from their weighted mean ${\bar{θ}}_{w} = \sum w_{i} {\hat{θ}}_{i} / \sum w_{i}$ :

$$Q = \sum w_{i} {({\hat{θ}}_{i} - {\bar{θ}}_{w})}^{2}. \tag{4}$$

In calculating $Q$ , an estimate of $τ^{2}$ is not yet available, so $w_{i}$ is simply $1 / {\hat{v}}_{i}^{2}$ , the reciprocal of the estimated variance of ${\hat{θ}}_{i}$ (as in Cochran ¹ ). We denote the result by $Q_{IV}$ .

The process underlying the DerSimonian‐Laird ⁹ estimator of $τ^{2}$ (${\hat{τ}}_{DL}^{2}$) derives the expected value of $Q_{IV}$ and rearranges the resulting expression to obtain $τ^{2}$ in terms of $E (Q_{IV})$ and the $v_{i}^{2} = Var ({\hat{θ}}_{i})$. Substituting the observed value of $Q_{IV}$ for $E (Q_{IV})$ and the ${\hat{v}}_{i}^{2}$ for the $v_{i}^{2}$ then yields ${\hat{τ}}_{DL}^{2}$. The convenient step of plugging in the ${\hat{v}}_{i}^{2}$, however, lacks justification; it underlies the documented shortcomings of the DL method.

As an alternative, we base estimates of $τ^{2}$ on $Q_{F}$ , studied by Kulinskaya et al., ⁵ in which $w_{i} = {\tilde{n}}_{i} = n_{iC} n_{iT} / n_{i}$ , the effective sample size in Study $i$ ( $n_{i} = n_{iC} + n_{iT}$ ).
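To make the distinction between $Q_{IV}$ and $Q_{F}$ concrete, the sketch below evaluates Equation (4) with the two sets of weights. It is a minimal Python illustration under assumed inputs: the function names and the three-study data are hypothetical and do not come from the paper.

```python
def generalized_q(theta_hat, weights):
    """Equation (4): weighted sum of squared deviations from the weighted mean."""
    total = sum(weights)
    theta_bar = sum(w * t for w, t in zip(weights, theta_hat)) / total
    return sum(w * (t - theta_bar) ** 2 for w, t in zip(weights, theta_hat))


def effective_sample_sizes(n_t, n_c):
    """Effective sample size n_iC * n_iT / (n_iC + n_iT) for each study."""
    return [a * b / (a + b) for a, b in zip(n_t, n_c)]


# Hypothetical inputs for three studies.
theta_hat = [0.2, 0.5, -0.1]
v_hat_sq = [0.40, 0.25, 0.10]
n_t = [10, 20, 50]
n_c = [10, 20, 50]

q_iv = generalized_q(theta_hat, [1 / v for v in v_hat_sq])        # random weights
q_f = generalized_q(theta_hat, effective_sample_sizes(n_t, n_c))  # fixed weights
```

The weights in `q_f` depend only on the design, so they are constants; the weights in `q_iv` are functions of the observed counts through the estimated variances.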

In studying properties of estimators under the random‐effects model, we start by taking as given the observed study‐level effects ${\hat{θ}}_{i}$ ; that is, we condition on those values. The model includes a distribution for the true $θ_{i}$ , conventionally $θ_{i} \sim N (θ, τ^{2})$ . Taking expectations with respect to that model yields unconditional estimators, for comparison with the conditional ones.

The random‐effects model assumes that the ${\hat{θ}}_{i}$ are unbiased estimators of the $θ_{i}$ and that the $v_{i}^{2} = Var ({\hat{θ}}_{i}| θ_{i})$ are the corresponding variances (i.e., the true conditional variances).

To develop estimators of $τ^{2}$ based on $Q_{F}$ , we define $W = \sum w_{i}$ , $q_{i} = w_{i} / W$ , and $Θ_{i} = {\hat{θ}}_{i} - θ$ . In this notation, and expanding ${\bar{θ}}_{w}$ , Equation (4) can be written as

$$Q = W \left[\sum q_{i} (1 - q_{i}) Θ_{i}^{2} - \sum_{i \neq j} q_{i} q_{j} Θ_{i} Θ_{j}\right]. \tag{5}$$
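For readers who want the intermediate algebra, the expansion can be spelled out as follows. It uses only the definitions of $W$, $q_{i}$, and $Θ_{i}$ given above, together with $\sum q_{i} = 1$, so that ${\hat{θ}}_{i} - {\bar{θ}}_{w} = Θ_{i} - \sum_{j} q_{j} Θ_{j}$.

```latex
\begin{aligned}
Q &= \sum_i w_i \Bigl(\Theta_i - \sum_j q_j \Theta_j\Bigr)^2
   = W \Bigl[\sum_i q_i \Theta_i^2 - \Bigl(\sum_i q_i \Theta_i\Bigr)^2\Bigr] \\
  &= W \Bigl[\sum_i q_i \Theta_i^2 - \sum_i q_i^2 \Theta_i^2
        - \sum_{i \neq j} q_i q_j \Theta_i \Theta_j\Bigr] \\
  &= W \Bigl[\sum_i q_i (1 - q_i) \Theta_i^2
        - \sum_{i \neq j} q_i q_j \Theta_i \Theta_j\Bigr].
\end{aligned}
```

When the studies are independent and the ${\hat{θ}}_{i}$ are unbiased, $E (Θ_{i} Θ_{j}) = 0$ for $i \neq j$, so only the first sum contributes to the expected value of $Q$.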

We distinguish between the conditional distribution of $Q$ (given the $θ_{i}$ ) and the unconditional distribution, and the respective moments of $Θ_{i}$ . For instance, the conditional second moment of $Θ_{i}$ , denoted by $M_{2 i}^{c}$ , is $v_{i}^{2}$ ; and the unconditional second moment, denoted by $M_{2 i}$ , is $E (Θ_{i}^{2}) = Var ({\hat{θ}}_{i}) = E (v_{i}^{2}) + τ^{2}$ .

Under the REM, it is straightforward to obtain the first moment of $Q_{F}$ as

$$E (Q_{F}) = W \left[\sum q_{i} (1 - q_{i}) E (Θ_{i}^{2})\right] = W \left[\sum q_{i} (1 - q_{i}) (E (v_{i}^{2}) + τ^{2})\right]. \tag{6}$$

This expression is similar to Equation (4) in DerSimonian and Kacker ² ; they use $v_{i}^{2} + τ^{2}$ instead of the unconditional variance $E (v_{i}^{2}) + τ^{2}$ .

Our simulations yield an exact calculation of conditional central moments of LOR, following the implementation of Kulinskaya and Dollinger. ¹⁰

4. RANDOM‐EFFECTS META‐ANALYSIS OF LOG‐ODDS‐RATIO

The standard REM for LOR assumes that $logit (p_{iT}) = logit (p_{iC}) + θ_{i}$ for $θ_{i} \sim N (θ, τ^{2})$ . The intercept $α_{i} = logit (p_{iC})$ may also be random. Further, $p_{iC}$ and $p_{iT}$ may be correlated. Equation (3) gives the conditional variance of ${\hat{θ}}_{i}$ . The full (unconditional) variance of ${\hat{θ}}_{i}$ depends on the generation mechanism for the $p_{iC}$ and was derived in Kulinskaya et al. ¹¹

Conventionally, the $p_{iC}$ are assumed to be fixed. Then

$$E (v_{i}^{2}) = \frac{1}{n_{iT} {\overset{⌣}{p}}_{iT} (1 - {\overset{⌣}{p}}_{iT})} + \frac{1}{n_{iC} p_{iC} (1 - p_{iC})} + τ^{2} \left(1 + \frac{1}{2 n_{iT}} \left({[{\overset{⌣}{p}}_{iT} (1 - {\overset{⌣}{p}}_{iT})]}^{- 1} - 2\right)\right), \tag{7}$$

where ${\overset{⌣}{p}}_{iT} = expit (α_{i} + θ)$ . For use in $Q_{F}$ , this unconditional variance can be estimated by substituting ${\hat{p}}_{iC}$ for $p_{iC}$ and ${\bar{p}}_{iT} = expit ({\hat{α}}_{i} + \hat{θ})$ for ${\overset{⌣}{p}}_{iT}$ , where ${\hat{α}}_{i} = logit ({\hat{p}}_{iC})$ and $\hat{θ}$ is the estimated LOR. We refer to the estimate ${\bar{p}}_{iT}$ of $p_{iT}$ as model‐based. Alternatively, a naïve estimate uses ${\hat{p}}_{iT}$ . Such a naïve estimate has the advantage that it maintains the variance inflation of $E (v_{i}^{2})$ in comparison with $v_{i}^{2}$ (using $v_{i}^{2}$ from Equation (3)). Whenever ${\hat{p}}_{ij}$ is used in Equation (7), $n_{ij} + 1$ replaces $n_{ij}$ .
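The following Python sketch evaluates the right-hand side of Equation (7) for one study and shows how the model-based estimate of $p_{iT}$ would be formed. It is an illustration only: the function names are hypothetical, and the caller is assumed to pass $n_{ij} + 1$ in place of $n_{ij}$ whenever ${\hat{p}}_{ij}$ is used, as stated above.

```python
import math


def expit(x):
    return 1 / (1 + math.exp(-x))


def logit(p):
    return math.log(p / (1 - p))


def unconditional_variance(n_t, n_c, p_t, p_c, tau2):
    """Right-hand side of Equation (7) for a single study."""
    b_t = p_t * (1 - p_t)
    conditional = 1 / (n_t * b_t) + 1 / (n_c * p_c * (1 - p_c))
    # Multiplier of tau^2; it is at least 1 because 1 / b_t >= 4.
    inflation = 1 + (1 / b_t - 2) / (2 * n_t)
    return conditional + tau2 * inflation


def p_t_model_based(p_c_hat, theta_hat):
    """Model-based estimate: expit(logit(p_c_hat) + theta_hat)."""
    return expit(logit(p_c_hat) + theta_hat)
```

The naïve version would instead pass the arm-level estimate ${\hat{p}}_{iT}$ directly as `p_t`.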

We use these results in Sections 5 and 6.

5. APPROXIMATIONS TO THE DISTRIBUTIONS OF $Q_{F}$ AND $Q_{IV}$

Our new interval estimators of $τ^{2}$ (Section 6.2) involve the cumulative distribution function of $Q_{F}$ . For LOR, $Q_{F}$ is a quadratic form in asymptotically normal variables. The Farebrother algorithm, ¹² applicable for quadratic forms in normal variables, provides a satisfactory approximation to the cdf for larger sample sizes ( $n \geq 100$ ), though it may not behave well for small $n$ . ⁶ To apply it, we plug in estimated variances. As in Reference [13], we denote the Farebrother approximation for $Q$ with effective‐sample‐size weights by F SSW. We further distinguish between a “model‐based” version and a “naïve” version of F SSW, according to whether we use ${\bar{p}}_{iT}$ or ${\hat{p}}_{iT}$ in Equation (7).

The null distribution of $Q_{IV}$ is usually approximated by the chi‐square distribution with $K - 1$ degrees of freedom. For LOR, as also for both MD and SMD, this approximation is not accurate for small sample sizes. ¹⁴ For LOR, Kulinskaya and Dollinger ¹⁰ provided an improved approximation to the null distribution of $Q_{IV}$ based on fitting two moments of the gamma distribution; we denote this approximation by KD. For comparison, we study point and interval estimators of $τ^{2}$ based on the chi‐square and KD approximations.

6. POINT AND INTERVAL ESTIMATORS OF $τ^{2}$ FOR LOR

6.1. Point estimators

The unconditional variance of $\hat{θ}$ in the customary fixed‐intercept model, Equation (7), can be written as a sum of two terms,

$$M_{2 i} = E (\mathrm{Var} ({\hat{θ}}_{i} \mid p_{ij}, n_{ij})) + τ^{2} \left(1 + \frac{1}{2 n_{iT}} \left({[{\overset{⌣}{p}}_{iT} (1 - {\overset{⌣}{p}}_{iT})]}^{- 1} - 2\right)\right), \tag{8}$$

where $E (\mathrm{Var} ({\hat{θ}}_{i} \mid p_{ij}, n_{ij})) = {[n_{iT} {\overset{⌣}{p}}_{iT} (1 - {\overset{⌣}{p}}_{iT})]}^{- 1} + {[n_{iC} p_{iC} (1 - p_{iC})]}^{- 1}$, $p_{iC} = expit (α_{i})$, and ${\overset{⌣}{p}}_{iT} = expit (α_{i} + θ)$. Rearranging the terms in Equation (6) with $E (Θ_{i}^{2}) = M_{2 i}$ and replacing $p_{iC}$ and ${\overset{⌣}{p}}_{iT}$ with ${\hat{p}}_{iC}$ and ${\hat{p}}_{iT}$ (or ${\bar{p}}_{iT}$) give the naïve (or model‐based) moment estimator of $τ^{2}$:

$${\hat{τ}}_{U}^{2} = \frac{Q / W - \sum q_{i} (1 - q_{i}) {\hat{v}}_{i}^{2}}{\sum q_{i} (1 - q_{i}) C_{i}}, \tag{9}$$

where ${\hat{v}}_{i}^{2} = {[n_{iT} {\hat{p}}_{iT} (1 - {\hat{p}}_{iT})]}^{- 1} + {[n_{iC} {\hat{p}}_{iC} (1 - {\hat{p}}_{iC})]}^{- 1}$ and $C_{i} = 1 + \frac{1}{2 n_{iT}} ({[{\hat{p}}_{iT} (1 - {\hat{p}}_{iT})]}^{- 1} - 2)$ . DerSimonian and Kacker ² obtain a similar result; they use the conditional estimate, ${\hat{v}}_{i}^{2}$ , instead of the unconditional estimate, $\hat{E (v_{i}^{2})}$ , obtaining

$${\hat{τ}}_{M}^{2} = \frac{Q / W - \sum q_{i} (1 - q_{i}) {\hat{v}}_{i}^{2}}{\sum q_{i} (1 - q_{i})}. \tag{10}$$

We study both estimators with effective‐sample‐size weights. With the conditional estimated variances in Equation (10), we denote the estimator by SSC (for “Sample Sizes Conditional”); with the unconditional estimated variances, as in Equation (9), it is SSU (for “Sample Sizes Unconditional”). These estimators differ by a term of order $O (1 / n_{i})$ and will be very similar for large sample sizes.
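Equations (9) and (10) differ only in the denominator, which the Python sketch below makes explicit. The function names and the three-study inputs are hypothetical. The value is returned exactly as the equations give it; setting negative values to zero is left to the caller and is an assumption here, not something the equations state.

```python
def variance_inflation(n_t, p_t):
    """C_i as defined after Equation (9)."""
    return 1 + (1 / (p_t * (1 - p_t)) - 2) / (2 * n_t)


def tau2_moment(theta_hat, v_hat_sq, weights, c=None):
    """Equation (10) when c is None; Equation (9) when c holds the C_i."""
    total = sum(weights)
    q = [w / total for w in weights]
    theta_bar = sum(qi * t for qi, t in zip(q, theta_hat))
    q_stat = sum(w * (t - theta_bar) ** 2 for w, t in zip(weights, theta_hat))
    if c is None:
        c = [1.0] * len(q)
    numerator = q_stat / total - sum(qi * (1 - qi) * v for qi, v in zip(q, v_hat_sq))
    denominator = sum(qi * (1 - qi) * ci for qi, ci in zip(q, c))
    return numerator / denominator


# Hypothetical inputs: effective-sample-size weights for three studies.
theta_hat = [0.2, 0.5, -0.1]
v_hat_sq = [0.04, 0.02, 0.008]
weights = [5.0, 10.0, 25.0]

ssc = tau2_moment(theta_hat, v_hat_sq, weights)                  # Equation (10)
ssu = tau2_moment(theta_hat, v_hat_sq, weights, c=[1.02] * 3)    # Equation (9)
```

Because every $C_{i} \geq 1$, the unconditional version divides the same numerator by a larger quantity, which shrinks the estimate toward zero by a factor that approaches 1 as the sample sizes grow.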

The estimators ${\hat{τ}}_{U}^{2}$ and ${\hat{τ}}_{M}^{2}$ arose from setting the observed value of $Q$ equal to its expected value and solving for $τ^{2}$. Instead of the expected value, one could use the median of the distribution of $Q$ given $τ^{2}$. ¹⁵, ¹⁶, ¹⁷ If the true (or approximate) cumulative distribution function is $F (\cdot \mid τ^{2})$, a point estimator of $τ^{2}$ can be found as

$${\hat{τ}}_{med}^{2} = \max (0, \{τ^{2} : F (Q \mid τ^{2}) = 0.5\}). \tag{11}$$

In the Farebrother approximation to the distribution of $Q_{F}$ (Section 5), one can use either the conditional estimated variances or the unconditional estimated variances. We denote the resulting estimators by SMC and SMU (“Sample sizes Median (Un)Conditional”), respectively.
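The root-finding step in Equation (11) can be sketched in Python as a bisection over $τ^{2}$. The sketch takes the cdf as an argument and assumes that it is continuous and decreasing in $τ^{2}$. The cdf used in the example is a toy stand-in, not the Farebrother approximation: for $K = 3$ studies with a common weight `w` and a common conditional variance `v`, normality gives $Q / [w (v + τ^{2})] \sim χ_{2}^{2}$, whose cdf has a closed form. All names are hypothetical.

```python
import math


def tau2_median_unbiased(q_obs, cdf, upper=100.0, tol=1e-10):
    """Equation (11): max(0, tau2 solving cdf(q_obs, tau2) = 0.5).

    Assumes cdf(q_obs, tau2) is continuous and decreasing in tau2.
    """
    if cdf(q_obs, 0.0) <= 0.5:
        return 0.0  # the root lies at or below zero
    lo, hi = 0.0, upper
    if cdf(q_obs, hi) > 0.5:
        raise ValueError("root lies above `upper`; increase it")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(q_obs, mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def toy_cdf(q, tau2, w=10.0, v=0.2):
    """Toy cdf of Q for K = 3 equal studies: chi-square with 2 df, rescaled."""
    return 1 - math.exp(-q / (2 * w * (v + tau2)))


estimate = tau2_median_unbiased(10.0, toy_cdf)
```

In this toy case the root is available in closed form, $τ^{2} = Q / (2 w \log 2) - v$, which provides a check on the bisection.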

Choosing between the “model‐based” and the “naïve” estimate of $p_{iT}$ in $M_{2 i}$ (8) yields “model‐based” and “naïve” versions of SSU and SMU: SSU model or SSU naïve and SMU model or SMU naïve.

The SSC and SMC estimators can be obtained from the procedure rma in metafor ⁷ by choosing as the method “GENQ” or “GENQM,” respectively, and specifying $nbar$ weights.

For comparison, our simulations (Section 7) include four estimators that use inverse‐variance weights: DerSimonian‐Laird ⁹ (DL), restricted maximum‐likelihood (REML), Mandel‐Paule ¹⁸ (MP), and an estimator (KD) based on the work of Kulinskaya and Dollinger ¹⁰ and discussed by Bakbergenuly et al. ⁴ KD uses an improved non‐null first moment of $Q$ and has better performance than most other estimators of $τ^{2}$ . In their review of methods for estimating the between‐study variance, Veroniki et al. ¹⁹ explain that DL is (by default) the most widely used, and they conclude that both REML and MP are better.

A perennial question involves whether analysts should add $1 / 2$ to each of $X_{iT}, X_{iC}, n_{iT} - X_{iT}, n_{iC} - X_{iC}$ only when one of them is zero, or in all studies. This is equivalent to using $\tilde{p}$ (whenever possible) or $\hat{p}$ (always), respectively, when estimating $θ_{i}$ and $v_{i}^{2}$ . To obtain evidence on this issue, we included the corresponding two versions, “only” and “always,” of DL, REML, MP, SSC, and SMC (we follow the prevalent practice of omitting “double‐zero” studies, in which two of those cell counts are zero).

Table 1 gives the full list of point estimators.

TABLE 1.

Point and interval estimators of $τ^{2}$ in the simulations.

Estimator	Description	Add $1 / 2$ “Only”	Add $1 / 2$ “Always”
Point estimators
DL	DerSimonian‐Laird, ⁹ a moment estimator based on $χ^{2}$ approximation to distribution of $Q_{IV}$	x	x
REML	Restricted Maximum Likelihood	x	x
MP	Mandel‐Paule, ¹⁸ a moment estimator based on $χ^{2}$ approximation to distribution of $Q_{IV}$	x	x
KD	Kulinskaya‐Dollinger, ¹⁰ Bakbergenuly et al. ⁴ a moment estimator based on improved Gamma approximation to $Q_{IV}$		x
SSC	Equation (10), effective‐sample‐size weights, conditional variance (3) of $\hat{θ}$	x	x
SSU model	Equation (9), effective‐sample‐size weights, model‐based estimate of $p_{iT}$ in unconditional variance (7) of $\hat{θ}$		x
SSU naïve	Equation (9), effective‐sample‐size weights, naïve estimate of $p_{iT}$ in unconditional variance (7) of $\hat{θ}$		x
SMC	Median‐unbiased, Equation (11), effective‐sample‐size weights, conditional variance (3) of $\hat{θ}$	x	x
SMU model	Median‐unbiased, Equation (11), effective‐sample‐size weights, model‐based estimate of $p_{iT}$ in unconditional variance (7) of $\hat{θ}$		x
SMU naïve	Median‐unbiased, Equation (11), effective‐sample‐size weights, naïve estimate of $p_{iT}$ in unconditional variance (7) of $\hat{θ}$		x
Confidence intervals
QP	Q‐profile, Viechtbauer, ²⁰ Appendix S1.2	x	x
PL	Profile Likelihood, Hardy & Thompson, ²¹ Appendix S1.1	x	x
KD	Kulinskaya–Dollinger, ¹⁰ Bakbergenuly et al., ⁴ Appendix S1.3, profiled improved Gamma approximation to distribution of $Q_{IV}$		x
FPC	Farebrother Profile, that is, profiled Farebrother approximation to distribution of $Q_{F}$ , effective‐sample‐size weights, conditional variance (3) of $\hat{θ}$	x	x
FPU model	Farebrother Profile, effective‐sample‐size weights, model‐based estimate of $p_{iT}$ in unconditional variance (7) of $\hat{θ}$		x
FPU naïve	Farebrother Profile, effective‐sample‐size weights, naïve estimate of $p_{iT}$ in unconditional variance (7) of $\hat{θ}$		x


6.2. Interval estimators

Straightforward use of the cumulative distribution function $F (\cdot \mid τ^{2})$ also yields a $100 (1 - α) \%$ confidence interval for $τ^{2}$:

$$\{τ^{2} \geq 0 : F (Q \mid τ^{2}) \in [α / 2, 1 - α / 2]\}.$$

We use both the conditional estimated variances and the unconditional estimated variances in the Farebrother approximation to $Q_{F}$ (Section 5); we refer to the resulting profile estimators as FPC and FPU (“Farebrother Profile (Un)Conditional”) intervals. Jackson ²² introduced a similar approach using conditional variances. The FPC interval can be obtained from the confint procedure in metafor ⁷ for “GENQ” or “GENQM” objects that used $nbar$ weights. For the FPU intervals, we further distinguish between a “model‐based” version and a “naïve” version, according to whether we use ${\bar{p}}_{iT}$ or ${\hat{p}}_{iT}$ in $M_{2 i}$ (8). Kulinskaya and Hoaglin ¹³ give the higher unconditional moments of $\hat{θ}$ required for the FPU intervals.
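The profiling step can be sketched in Python as two root searches, one for each tail. As assumptions of this sketch, the cdf is continuous and decreasing in $τ^{2}$, an empty set is reported as `(0.0, 0.0)`, and the cdf in the example is a toy closed form for $K = 3$ equal studies rather than the Farebrother approximation. All names are hypothetical.

```python
import math


def solve_decreasing(f, target, upper, tol=1e-10):
    """Solve f(tau2) = target on [0, upper] for decreasing f; 0.0 if f(0) <= target."""
    if f(0.0) <= target:
        return 0.0
    lo, hi = 0.0, upper
    if f(hi) > target:
        raise ValueError("root lies above `upper`; increase it")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def profile_interval(q_obs, cdf, alpha=0.05, upper=1000.0):
    """{tau2 >= 0 : cdf(q_obs, tau2) in [alpha/2, 1 - alpha/2]} as (lower, upper)."""
    def f(tau2):
        return cdf(q_obs, tau2)
    lower = solve_decreasing(f, 1 - alpha / 2, upper)   # cdf falls to 1 - alpha/2
    upper_limit = solve_decreasing(f, alpha / 2, upper)  # cdf falls to alpha/2
    return lower, upper_limit


def toy_cdf(q, tau2, w=10.0, v=0.2):
    """Toy cdf of Q for K = 3 equal studies: chi-square with 2 df, rescaled."""
    return 1 - math.exp(-q / (2 * w * (v + tau2)))


interval = profile_interval(100.0, toy_cdf)
```

When the observed $Q$ is small, the lower limit is zero because the cdf at $τ^{2} = 0$ is already below $1 - α / 2$.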

Our simulations (Section 7) also include the profile‐likelihood interval, ²¹ the Q‐profile interval, ²⁰ and the KD interval. ¹⁰ Table 1 gives the full list, and Section S1 in the Supporting Information gives further details.

7. SIMULATION DESIGN

Our simulation design followed that described in Bakbergenuly et al. ⁴ Briefly, we varied five parameters: the overall true effect ( $θ$ ), the between‐studies variance ( $τ^{2}$ ), the number of studies $(K)$ , the studies' total sample size ( $n$ or $\bar{n}$ , the average sample size), and the probability in the control arm ( $p_{iC}$ ). We kept the proportion of observations in the control arm ( $f$ ) at $1 / 2$ .

The values of $θ$ (0, 0.1, 0.5, 1, 1.5, and 2) aim to represent the range containing most values encountered in practice. LOR is a symmetric effect measure, so the sign of $θ$ is not relevant.

The values of $τ^{2}$ (0 to 1 in steps of 0.1) systematically cover a reasonable range.

The numbers of studies ( $K = 5$ , 10, and 30) reflect the sizes of many meta‐analyses and have yielded valuable insights in previous work.

In practice, many studies' total sample sizes fall in the ranges covered by our choices $(n = 20$ , 40, 100, and 250 when all studies have the same $n$ , and $\bar{n} = 30$ , 60, 100, and 160 when sample sizes vary among studies). The choices of sample sizes corresponding to $\bar{n}$ follow a suggestion of Sánchez‐Meca and Marín‐Martínez, ²³ who constructed the studies' sample sizes to have skewness 1.464, which they regarded as typical in behavioral and health sciences. For $K = 5,$ Table 2 lists the sets of five sample sizes. The simulations for $K = 10$ and $K = 30$ used each set of unequal sample sizes twice and six times, respectively.

TABLE 2.

Values of parameters in the simulations.

Parameter	Equal study sizes	Unequal study sizes
$K$ (number of studies)	5, 10, 30
$n$ or $\bar{n}$ (average (individual) study size, total of the two arms)	20, 40, 100, 250	30 (12,16,18,20,84), 60 (24,32,36,40,168), 100 (64,72,76,80,208), 160 (124,132,136,140,268)
$f$ (proportion of observations in the control arm)	1/2
$p_{iC}$ (probability in the control arm)	0.1, 0.2, 0.5
$θ$ (true value of LOR)	0, 0.1, 0.5, 1, 1.5, 2
$τ^{2}$ (variance of random effects)	0 (0.1) 1
Note: For $K = 10$ and $K = 30$, the same set of unequal study sizes is used twice or six times, respectively.


The values of $p_{iC}$ , 0.1, 0.2, and 0.5, provide a typical range of small to medium risks.

The values of $p_{iC}$ and the true effect $θ_{i}$ defined the probabilities $p_{iT}$ , and the counts $X_{iC}$ and $X_{iT}$ were generated from the respective binomial distributions. We used a total of $10,000$ repetitions for each combination of parameters (which we also call a situation). We discarded “double‐zero” and “double‐n” studies and reduced the observed value of $K$ accordingly. Next, we discarded repetitions with $K < 3$ and used the observed number of repetitions for analysis.
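The data-generation step for one repetition with equal study sizes can be sketched in Python as follows. This is not the R code used for the simulations. The function names are hypothetical, binomial counts are drawn as sums of Bernoulli trials to keep the sketch dependency-free, and "double-n" is taken here to mean that every subject in both arms had the event, which is an assumption about a term the text does not define.

```python
import math
import random


def expit(x):
    return 1 / (1 + math.exp(-x))


def logit(p):
    return math.log(p / (1 - p))


def binomial(rng, n, p):
    """Binomial(n, p) draw as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_meta_analysis(rng, k, n, p_c, theta, tau2, f=0.5):
    """Return the retained studies as (x_t, n_t, x_c, n_c) tuples."""
    n_c = round(f * n)
    n_t = n - n_c
    studies = []
    for _ in range(k):
        theta_i = rng.gauss(theta, math.sqrt(tau2))  # random effect
        p_t = expit(logit(p_c) + theta_i)
        x_c = binomial(rng, n_c, p_c)
        x_t = binomial(rng, n_t, p_t)
        double_zero = x_c == 0 and x_t == 0
        double_n = x_c == n_c and x_t == n_t
        if not (double_zero or double_n):
            studies.append((x_t, n_t, x_c, n_c))
    return studies


rng = random.Random(2023)
studies = simulate_meta_analysis(rng, k=5, n=100, p_c=0.5, theta=0.0, tau2=0.0)
usable = len(studies) >= 3  # repetitions with fewer than 3 studies are discarded
```

The length of the returned list plays the role of the reduced $K$ for that repetition.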

The simulations used R statistical software. ²⁴ We used metafor for all methods of interest that it implemented. Table 1 gives the full list of point and interval estimators of $τ^{2}$ . The user‐friendly R programs implementing our methods are available on OSF. ²⁵

8. SIMULATION RESULTS

Our eprint ²⁶ reports the full simulation results. Here we describe the most important findings. We also refer to Figures S1 through S10 in the Supporting Information.

8.1. Bias in point estimation of $τ^{2}$ for LOR

The relation of each estimator's bias to $τ^{2}$ is roughly linear, with variation in intercept and, especially, in slope among the situations in the simulation. The slope varies most with $n$ and $K$ and to a lesser extent with $p_{iC}$, $θ$, and whether sample sizes are equal or unequal. In one of the more extreme examples, when $p_{iC} = 0.1$, $θ = 0$, and $K = 5$, the bias of SMC “only” when $n = 20$ is $+ 0.29$ at $τ^{2} = 0$ and $- 0.22$ at $τ^{2} = 1$, whereas when $n = 250$, it is $+ 0.09$ at $τ^{2} = 0$ and $+ 0.35$ at $τ^{2} = 1$. The estimators' traces combine to form various patterns (Figure 1).

FIGURE 1.

Bias of estimators of between‐study variance of LOR (the “only” versions of DL, REML, MP, and SMC; SSC “always”; KD; and the model versions of SMU and SSU) versus $τ^{2}$, for equal sample sizes $n = 20, 40, 100$, and $250$, $p_{iC} = 0.1$, $θ = 0$, and $f = 0.5$.

For small sample sizes, all estimators of $τ^{2}$ have considerable bias, positive at $τ^{2} = 0$ and negative for larger $τ^{2}$. The traces are roughly parallel. The pattern becomes tighter as $K$ increases. When $p_{iC} = 0.1$, a few estimators have bias that is almost constant in $τ^{2}$ when $n \geq 100$. When $n = 250$, the traces form a fan‐shaped pattern, in which the bias ranges from 0.02 to 0.07 when $τ^{2} = 0$ and from $- 0.2$ to $+ 0.35$ when $τ^{2} = 1$. The flattening and fan‐shaped pattern occur at somewhat smaller $n$ as $θ$ increases. The fan‐shaped pattern appears by $n = 100$ when $p_{iC} = 0.2$ and by $n = 40$ when $p_{iC} = 0.5$. For larger sample sizes, some estimators have negative trends (or no trend), and others have positive trends.

The best estimators, almost unbiased when $n \geq 250$ and $p_{iC} = 0.1$ , are MP “only,” KD, SSU model, and SSC “always.” The same estimators are recommended for larger values of $p_{iC}$ , where they generally become less biased earlier. When $p_{iC} = 0.2$ or $p_{iC} = 0.5$ , various transitions occur at smaller sample sizes (Figures S1 and S2).

For comparison, our simulations included the popular point estimators DL and REML. We focus on their “only” version. The trace for the bias of DL “only” is in the middle of the traces for the collection of estimators, or slightly lower. When $n = 250$ or $\bar{n} = 160$ , its negative bias, increasing in size as $τ^{2}$ increases, stands out (usually alone).

The trace for REML “only” lies very close to that for DL, except that when $n = 250$ (or, less often, $\bar{n} = 160$ ), it is close to zero instead of trending negative.

To summarize, we recommend MP “only,” KD, SSU model, and SSC “always”; and we emphatically recommend against using DL and REML.

8.2. Median bias of estimators of $τ^{2}$ for LOR

We define median bias as $P ({\hat{τ}}^{2} \geq τ^{2}) - P ({\hat{τ}}^{2} \leq τ^{2})$. For a median‐unbiased estimator, the median bias is zero.
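In a simulation, the two probabilities in this definition are replaced by the corresponding proportions among the repetitions. A minimal Python sketch, with a hypothetical function name and made-up estimates:

```python
def median_bias(estimates, tau2):
    """Proportion of estimates >= tau2 minus proportion of estimates <= tau2."""
    n = len(estimates)
    at_or_above = sum(e >= tau2 for e in estimates) / n
    at_or_below = sum(e <= tau2 for e in estimates) / n
    return at_or_above - at_or_below


# Made-up estimates from four repetitions with true tau2 = 0.2.
balanced = median_bias([0.0, 0.1, 0.3, 0.5], 0.2)
too_low = median_bias([0.0, 0.0, 0.1, 0.5], 0.2)
```

A negative value means that the estimates fall below $τ^{2}$ more often than above it. Estimates exactly equal to $τ^{2}$ count on both sides and cancel.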

For $0.1 \leq τ^{2} \leq 1$, all estimators have negative median bias for small to medium sample sizes, $n \leq 40$ (Figure 2). Interestingly, the median bias of most estimators is almost constant across the range of nonzero $τ^{2}$ values. That level, however, varies with $n$ and $K$. As $n$ increases, the bias becomes less negative (but not small), but increasing $K$ tends to make it more negative. DL “only,” REML “only,” and MP “only” depart from the almost constant pattern. As $K$ increases, the trace of DL “only” declines steeply (e.g., from $- 0.20$ at $τ^{2} = 0.1$ to $- 0.68$ at $τ^{2} = 1$ when $p_{iC} = 0.2$, $θ = 0$, $n = 100$, and $K = 30$, Figure S3), and the trace of REML “only” declines less steeply (and only when $p_{iC} = 0.5$, Figure S4). The trace of MP “only” tends to have a positive slope when $n = 40$ or $n = 100$, $θ \geq 1$, and $p_{iC} = 0.1$, and a negative slope when $n = 20$ or $n = 40$ and $p_{iC} = 0.5$.

FIGURE 2.

Median bias of estimators of between‐study variance of LOR (the “only” versions of DL, REML, MP, and SSC; SMC “always”; KD; and the model versions of SMU and SSU) versus $τ^{2}$, for equal sample sizes $n = 20, 40, 100$, and $250$, $p_{iC} = 0.1$, $θ = 0$, and $f = 0.5$.

When $n = 20$ or $\bar{n} = 30$ and $p_{iC} = 0.1$ , none of the estimators are satisfactory. KD comes closest, with median bias around $- 0.25$ at best. For $p_{iC} = 0.2$ and $p_{iC} = 0.5$ , SMC “always” and SMC “only” are better; in a few situations one or both have small positive or negative bias.

The majority of the standard estimators have negative bias for larger values of $n$ . This means that their median values of ${\hat{τ}}^{2}$ are too low. When $n \geq 100$ for $p_{iC} = 0.1$ and $n \geq 40$ for $p_{iC} \geq 0.2$ , the new median‐unbiased estimators perform well. We recommend SMC “always” and SMU model. Both consistently result in almost median‐unbiased estimation across the range of $p_{iC}$ values.

8.3. Coverage of interval estimators of $τ^{2}$ for LOR

The results of our simulations show a few general patterns, but specific choices of interval estimators depend on values of parameters or combinations of parameters. In particular, we often separate $K = 30$ from $K = 5$ and $K = 10$ .

At $τ^{2} = 0$ coverage is always too high (essentially 1.00). For small $n$ , it is roughly flat when $τ^{2} \geq 0.1$ and $K = 5$ or $K = 10$ .

When $p_{iC} = 0.1$ and $θ = 0$, coverage of most estimators remains too high when $n \leq 100$ (Figure 3). KD and FPU model are close to 0.95 when $n = 100$. When $K = 30$ and $n = 20$ or $n = 40$, all estimators except KD break down: most have coverage $< 0.90$ at $τ^{2} = 0.1$, and their coverage declines steeply as $τ^{2}$ increases. Some form of breakdown persists for all $θ$. As $θ$ increases, and $n = 20$ or $n = 40$, coverage at $0.1 \leq τ^{2} \leq 1$ moves toward 0.95. KD and FPU model (for $n \geq 40$) seem the best choices.

FIGURE 3.

Coverage of 95% confidence intervals for between‐study variance of LOR (the “only” versions of PL, QP, and FPC; KD; and the model version of FPU) versus $τ^{2}$, for equal sample sizes $n = 20, 40, 100$, and $250$, $p_{iC} = 0.1$, $θ = 0$, and $f = 0.5$.

When $p_{iC} = 0.2$ or $p_{iC} = 0.5$ (Figures S5 and S6), the picture is simpler. Coverage when $n = 20$ or $n = 40$ becomes closer to (or equals) 0.95 as $p_{iC}$ increases, except for the “always” versions of PL, QP, and FPC when $p_{iC} = 0.2$ . For $K = 30$ , KD seems the best single choice.

Coverage of PL is usually too high, and in some situations it seems trapped at 1.00.

8.4. Left and right coverage error

It is often informative to approach estimation of coverage by separating its complement, the coverage error or “miscoverage,” into two parts, corresponding to whether the value of the parameter is to the left of the lower confidence limit or to the right of the upper confidence limit. Efron and Tibshirani ²⁷ (section 13.5) denote these by “miss left” and “miss right.” A CI with a miss‐left percentage below the nominal level overcovers on the left; one with a miss‐left percentage above the nominal level undercovers on the left. The confidence intervals that we studied aim to have miscoverage equal to 2.5% on each side (Section 6.2), but an approximation for the distribution of $Q$ may not provide the desired balance, even if the overall coverage is close to 95%. Thus, our simulations included the miss‐left and miss‐right percentages.
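The split of miscoverage into its two parts can be sketched in a few lines. This is a minimal illustration, not the authors' simulation code; the function name `miscoverage` and the five intervals are hypothetical.

```python
def miscoverage(intervals, true_value):
    """Return (miss_left, miss_right, coverage) as fractions.

    miss_left:  the true value lies to the left of the lower limit.
    miss_right: the true value lies to the right of the upper limit.
    """
    n = len(intervals)
    miss_left = sum(1 for lo, hi in intervals if true_value < lo) / n
    miss_right = sum(1 for lo, hi in intervals if true_value > hi) / n
    return miss_left, miss_right, 1.0 - miss_left - miss_right


# Hypothetical confidence intervals for tau^2 from five simulated meta-analyses.
intervals = [(0.0, 0.35), (0.0, 0.15), (0.05, 0.60), (0.25, 0.90), (0.0, 0.40)]
left, right, cover = miscoverage(intervals, true_value=0.2)
print(left, right, cover)
```

An interval procedure can reach 95% overall coverage while these two fractions are far from 2.5% each, which is why they are reported separately.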

The miss‐left percentages are typically lower than 2.5% for small sample sizes and/or control‐arm probabilities, but they improve for $n \geq 100$ and for larger $p_{iC}$ (Figures 4, S7, S8). A low miss‐left percentage means that the confidence interval includes an excess of low $τ^{2}$ values.

FIGURE 4.

Miss‐left probability of PL, QP, KD, FPC, and FPU 95% confidence intervals for between‐study variance of LOR versus $τ^{2}$ , for equal sample sizes $n = 20, 40, 100$ and $250$ , $p_{iC} = 0.1$ , $θ = 0$ , and $f = 0.5$ . Solid lines: the “only” versions of PL, QP and FPC; KD; and the model version of FPU. Dashed lines: the “always” version of FPC and the naïve version of FPU. [Colour figure can be viewed at wileyonlinelibrary.com]

The miss‐left percentages of KD, FPU naïve, and FPC “always” are closer to 2.5% when $n \geq 100$ , and those of QP are typically lower than nominal when $p_{iC} = 0.1$ but improve for larger $p_{iC}$ .

The only exceptions to the pattern of lower‐than‐nominal miss‐left percentages are the FPC “only” and FPU model intervals, whose percentages for $n \geq 40$ are often higher than nominal when $K = 30$ and sometimes when $K = 10$ . KD occasionally has high percentages for high values of $θ$ when $K = 30$ . PL miss‐left percentages are especially low for all sample sizes.

The miss‐right percentages are often higher than nominal, but they improve for larger $n$ and $p_{iC}$ (Figures 5, S9, S10). $K = 30$ is especially challenging.

FIGURE 5.

Miss‐right probability of PL, QP, KD, FPC, and FPU 95% confidence intervals for between‐study variance of LOR versus $τ^{2}$ , for equal sample sizes $n = 20, 40, 100$ and $250$ , $p_{iC} = 0.1$ , $θ = 0$ , and $f = 0.5$ . Solid lines: the “only” versions of PL, QP and FPC; KD; and the model version of FPU. Dashed lines: the “always” version of FPC and the naïve version of FPU. [Colour figure can be viewed at wileyonlinelibrary.com]

For $p_{iC} = 0.1$ , the miss‐right percentages of KD, QP “always,” and FPC “always” are closest to 2.5% when $n = 20$ and $K = 5$ , but increase for larger $n$ . When $K = 5$ or $10$ , the percentages of FPC “only” and QP “only” are reasonably close to 2.5% when $n \geq 40$ . The percentages of KD are consistently close to 2.5% for $K = 10$ . For $K = 30$ , the range of results is larger. The percentages of KD, QP “only,” and FPC “always” are close to 2.5% when $n \geq 100$ . The percentages for all other estimators are typically higher, with the exception of FPU model, which has lower than nominal percentages. PL produces erratic percentages, from 0 to above 10%. For larger values of $p_{iC}$ , the miss‐right percentages of all estimators, with the exception of PL, improve earlier, so that those of KD, QP “only,” and FPC “always” are all close to 2.5% when $n = 40$ and $p_{iC} = 0.5$ , even when $K = 30$ .

Lack of balance in the miss‐left and miss‐right percentages for small sample sizes and/or control‐arm probabilities agrees with our findings ¹³ that the chi‐square approximation for $Q_{IV}$ and the Farebrother approximation for $Q_{F}$ are accurate only for larger sample sizes.

9. EXAMPLE: SMOKING CESSATION

Stead et al. ²⁸ conducted a systematic review of clinical trials on the use of physician advice for smoking cessation. We use the data from the subgroup of interventions in which the treatment involved only one visit (Comparison 3.1.4, p. 54). The first version of the report was published in 2001. In an update, published in 2004, 17 studies included this comparison. The 2013 update included one more study, by Unrod (2007). For each study, Table S1 gives the number of subjects in the treatment and control arms and the number who were nonsmokers at the longest follow‐up time reported (either 6 months or 12 months). The definition of “nonsmoker” varied among the studies. Some studies required sustained abstinence, and others only asked about smoking status at that time. Stead et al. ²⁸ analyzed relative risk. Kulinskaya and Hoaglin ⁶ showed that both OR and RR are reasonable effect measures for these data. Here we consider estimation of $τ^{2}$ in two meta‐analyses of LOR.

The studies were mostly balanced, though two studies had substantially more subjects in the treatment arm. Sample sizes varied from 182 to 3128, with an average of 836 patients per study. The mean probabilities of smoking cessation in both arms were rather low, at $0.058$ in the treatment arm and $0.043$ in the control arm.

The standard IV‐based meta‐analysis of LOR for the original 17 studies gives $\hat{θ} = 0.4774$ with standard error $0.1148$ and $p < 0.0001$ for the intervention effect ( ${\hat{τ}}_{MP}^{2} = 0.0754$ ). The fixed‐weights estimate of $θ$ is higher, at $0.7127$ . In testing for heterogeneity, $Q_{IV} = 24.84$ , and the chi‐square approximation on 16 df provides a p‐value of $.079$ ( $I^{2} = 38.18\%$ ). The p‐values for the KD and F SSW naïve methods (which Kulinskaya and Hoaglin ⁶ recommend) are very close, at 0.035 and 0.038, respectively.
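As a quick check on the scale of the reported intervention effect, the sketch below converts the pooled LOR and its standard error into a z statistic, a two‐sided p‐value, and an odds ratio. It assumes a normal (Wald) reference distribution, which is the usual choice in a standard IV‐based analysis but is not spelled out in this section.

```python
import math

theta_hat = 0.4774   # pooled LOR reported for the 17 studies
se = 0.1148          # its reported standard error

z = theta_hat / se
# Two-sided p-value under a standard normal reference distribution (assumption).
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
odds_ratio = math.exp(theta_hat)

print(round(z, 2), p_two_sided < 1e-4, round(odds_ratio, 2))
```

Under this assumption the p‐value falls below 0.0001, in line with the reported result.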

Table 3 shows striking differences: the SSC and SSU estimates of $τ^{2}$ are more than twice as large as the standard IV estimates. This agrees with our simulations, which showed large positive biases of SSC and SSU for low values of $τ^{2}$ . All confidence intervals have zero as the lower limit (Table 4). This result contradicts the significant p‐values from the KD and F SSW naïve tests, but it can be explained by the inflated confidence level of all six confidence intervals near zero. Adding $1 / 2$ to the numbers of events somewhat reduces the estimated values and the upper confidence limits. This would somewhat reduce the positive biases and the inflation in coverage. Because of the large sample sizes, there are only minor differences between conditional and unconditional model‐based or naïve estimators of $τ^{2}$ and confidence intervals.

TABLE 3.

Estimated values of $τ^{2}$ in the meta‐analysis of LOR for the data of Stead et al. on the use of physician advice for smoking cessation.

Example	Add 1/2	DL	REML	MP	KD	SSC	SSU model	SSU naïve	SMC	SMU model	SMU naïve
Stead et al. (17 studies)	Always	0.0627	0.0696	0.0701	0.0887	0.1715	0.1725	0.1680	0.2014	0.2043	0.1972
Stead et al. (17 studies)	Only	0.0675	0.0769	0.0754		0.1776			0.2098
Stead et al. (18 studies)	Always	0.0514	0.0537	0.0576	0.0740	0.1634	0.1642	0.1601	0.1913	0.1939	0.1873
Stead et al. (18 studies)	Only	0.0555	0.0597	0.0623		0.1697			0.1997


TABLE 4.

95% confidence intervals for the between‐study variance in the meta‐analysis of LOR for the data of Stead et al. on the use of physician advice for smoking cessation.

Example	Add 1/2	QP	PL	KD	FPC	FPU model	FPU naïve
Stead et al. (17 studies)	Always	[0, 0.3745]	[0, 0.3445]	[0, 0.3910]	[0, 0.7583]	[0, 0.7554]	[0, 0.7389]
Stead et al. (17 studies)	Only	[0, 0.3957]	[0, 0.3713]		[0, 0.8075]
Stead et al. (18 studies)	Always	[0, 0.3267]	[0, 0.2915]	[0, 0.3419]	[0, 0.6992]	[0, 0.6969]	[0, 0.6820]
Stead et al. (18 studies)	Only	[0, 0.3455]	[0, 0.3139]		[0, 0.7458]


Addition of the 18th study somewhat increased the p‐value for the standard $Q$ test, to $0.107$ , and the KD p‐value to $0.052$ , but hardly affected the p‐values of the SSW‐based methods. The recommended F SSW naïve test rejects homogeneity at the 5% significance level, with $p = 0.033$ . The estimated $τ^{2}$ values are somewhat smaller, and all confidence intervals still start at zero.

To better understand the properties of the estimation methods for very small values of $p_{iC}$ in this example, we performed additional simulations with 1000 repetitions in the relevant intervals of values, $τ^{2} \in [0, 0.30]$ and $θ \in [0.4, 0.8]$ , keeping the sample sizes as in the example and using the probabilities $p_{iC} = X_{iC} / n_{iC}$ . Figure 6 shows the results. In agreement with our main series of simulations, KD, MP “only,” SSC “always,” and SSU model are the least biased estimators of $τ^{2}$ , whereas SMC “only” is the least median‐biased. Table S2 gives summary statistics for these five estimators when $τ^{2} = 0.05$ and $0.2$ , and $θ = 0.5$ and $0.7$ . Additionally, Table S2 provides summary statistics when $θ = 0.5$ and $τ^{2} = 0.2$ , but the control arm probabilities are less extreme, at $p_{iC} = (X_{iC} / n_{iC}) + 0.1$ .
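One repetition of such a simulation might be generated as follows. This is a sketch under an assumed binomial‐normal model with fixed control‐arm probabilities; the data‐generation scheme actually used is described in the simulation design section and is not reproduced here. The function name and arguments are hypothetical.

```python
import math
import random


def simulate_meta_analysis(n_c, n_t, p_c, theta, tau2, rng):
    """Draw one meta-analysis as a list of (x_t, n_t, x_c, n_c) tuples.

    Assumed model: theta_i ~ N(theta, tau2),
    logit(p_iT) = logit(p_iC) + theta_i, binomial counts in each arm.
    """
    tables = []
    for nc, nt, pc in zip(n_c, n_t, p_c):
        theta_i = rng.gauss(theta, math.sqrt(tau2))
        logit_t = math.log(pc / (1.0 - pc)) + theta_i
        pt = 1.0 / (1.0 + math.exp(-logit_t))
        # Binomial draws as sums of Bernoulli trials (standard library only).
        x_c = sum(rng.random() < pc for _ in range(nc))
        x_t = sum(rng.random() < pt for _ in range(nt))
        tables.append((x_t, nt, x_c, nc))
    return tables


rng = random.Random(2023)
# Hypothetical design: 5 studies, 400 subjects per arm, low control-arm risk.
tables = simulate_meta_analysis([400] * 5, [400] * 5, [0.04] * 5,
                                theta=0.5, tau2=0.2, rng=rng)
print(tables)
```

Repeating the draw many times and applying each estimator to every simulated set of tables yields the bias, median bias, and coverage summaries.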

FIGURE 6.

Bias, median bias, and coverage at 95% nominal confidence level of point and interval estimators of $τ^{2}$ for the example of meta‐analysis by Stead et al. Solid lines: the “only” versions of DL, REML, MP, SSC, and SMC; KD; the model versions of SSU and SMU point estimators; the “only” versions of PL, QP, and FPC; FPU model, and KD intervals. Dashed lines: the “always” versions of SSC and SMC point estimators. [Colour figure can be viewed at wileyonlinelibrary.com]

Both lower and upper quartiles for the first four estimators of $τ^{2}$ are comparable, though the upper quartile of SMC “only” is somewhat higher. However, the three effective‐sample‐size estimators have a much longer right tail than KD and MP “only.” This shape explains their much higher mean values. Comparing the medians, SMC “only” is almost median‐unbiased in all five scenarios, whereas SSC “always” and SSU model have the lowest medians, followed by MP “only” and then KD. This pattern is especially noticeable when the control arm probabilities are low. The lower the median, the more often these estimators would underestimate true heterogeneity. The differences between distributions decrease quite considerably for larger control arm probabilities, so the KD and MP “only” densities practically coincide, as do the densities of SSC “always” and SSU model (Figure 7). Overall, KD is the best mean‐unbiased estimator, and SMC “only” is the best median‐unbiased estimator. Which one to prefer depends very much on the context of the research. All confidence intervals provide reasonable coverage, though KD is sometimes too liberal, and PL too conservative (Figure 6).
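The contrast between mean bias and median bias for a right‐skewed estimator can be seen in a toy calculation. The estimates below are hypothetical values chosen for illustration, not simulation output.

```python
import statistics


def bias_summary(estimates, true_tau2):
    """Return (mean bias, median bias) of a set of tau^2 estimates."""
    mean_bias = statistics.fmean(estimates) - true_tau2
    median_bias = statistics.median(estimates) - true_tau2
    return mean_bias, median_bias


# Hypothetical estimates: truncated at zero, with a long right tail.
estimates = [0.0, 0.0, 0.05, 0.10, 0.15, 0.20, 0.90]
mean_bias, median_bias = bias_summary(estimates, true_tau2=0.2)
print(round(mean_bias, 3), round(median_bias, 3))
```

Here the single large value pulls the mean up to the true value while the median stays well below it, so an estimator of this shape can be nearly mean‐unbiased and still underestimate $τ^{2}$ more than half of the time.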

FIGURE 7.

Density plots of the point estimators of $τ^{2}$ for the example of meta‐analysis by Stead et al.: the “only” versions of MP and SMC; the “always” version of SSC; the model version of SMU; and KD. [Colour figure can be viewed at wileyonlinelibrary.com]

10. DISCUSSION

Estimation of heterogeneity variance $τ^{2}$ is an important part of any meta‐analysis. Apart from its importance per se, it also affects the value of an estimated pooled effect and its estimated variability. In this paper, we studied estimation of $τ^{2}$ based on $Q_{F}$ , a generalized $Q$ statistic with fixed effective‐sample‐size weights, and compared four proposed new point estimators of $τ^{2}$ (SSC, SMC, SSU, and SMU) and two new interval estimators (FPC and FPU) with traditional estimators based on the $Q$ statistic with inverse‐variance weights, $Q_{IV}$ .

The new point estimators based on the expected value of $Q_{F}$ involve estimates ( ${\hat{v}}_{i}$ ) of the variances of the ${\hat{θ}}_{i}$ . We obtained SSC from the conditional variances and SSU from the unconditional variances of the ${\hat{θ}}_{i}$ . Similarly, our novel point estimators based on the median of the Farebrother approximation to the distribution of $Q_{F}$ are SMC and SMU, and our profile interval estimators are FPC and FPU.
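The moment‐based idea can be sketched generically. For fixed weights $w_{i}$ with $W = \sum w_{i}$ , the expected value of $Q = \sum w_{i}({\hat{θ}}_{i} - \bar{θ}_{w})^{2}$ is $\sum w_{i}(1 - w_{i}/W)(v_{i} + τ^{2})$ , and equating it to the observed $Q$ gives an estimator of $τ^{2}$ . The code below is a simplified illustration that treats the ${\hat{v}}_{i}$ as given; it does not reproduce the conditional or unconditional variance calculations behind SSC and SSU. The effective‐sample‐size formula $n_{iT} n_{iC} / (n_{iT} + n_{iC})$ and all data are assumptions made for the example.

```python
def generalized_q(theta_hat, weights):
    """Q = sum_i w_i * (theta_i - weighted mean)^2 for arbitrary weights."""
    total = sum(weights)
    theta_bar = sum(w * t for w, t in zip(weights, theta_hat)) / total
    return sum(w * (t - theta_bar) ** 2 for w, t in zip(weights, theta_hat))


def moment_tau2(theta_hat, v_hat, weights):
    """Solve E[Q] = Q for tau^2 (weights treated as fixed), truncated at 0."""
    total = sum(weights)
    q = generalized_q(theta_hat, weights)
    a = sum(w * (1.0 - w / total) * v for w, v in zip(weights, v_hat))
    b = sum(w * (1.0 - w / total) for w in weights)
    return max(0.0, (q - a) / b)


def effective_sample_size(n_t, n_c):
    """Assumed form of the effective sample size of a two-arm study."""
    return n_t * n_c / (n_t + n_c)


# Hypothetical study-level LORs, variance estimates, and arm sizes.
theta_hat = [0.10, 0.55, 0.90, 0.30]
v_hat = [0.04, 0.09, 0.05, 0.02]
n_t = [100, 60, 80, 200]
n_c = [100, 40, 120, 200]

w_f = [effective_sample_size(t, c) for t, c in zip(n_t, n_c)]
w_iv = [1.0 / v for v in v_hat]

tau2_f = moment_tau2(theta_hat, v_hat, w_f)     # fixed (sample-size) weights
tau2_iv = moment_tau2(theta_hat, v_hat, w_iv)   # inverse-variance weights
print(round(tau2_f, 4), round(tau2_iv, 4))
```

With inverse‐variance weights the same formula reduces to the familiar DerSimonian‐Laird expression, although in that case the weights are themselves estimated, which is one motivation for weights that depend only on sample sizes.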

For each unconditional estimator, we investigated two approaches to estimation of $p_{iT}$ (which is used in the calculation of the second and fourth central moments of ${\hat{θ}}_{i}$ ): “naïve” estimation of $p_{iT}$ from $X_{iT}$ and $n_{iT}$ , and “model‐based” estimation using the fixed‐effects meta‐analysis estimate of the overall effect to obtain ${\bar{p}}_{iT}$ from ${\hat{p}}_{iC}$ . Additionally, we considered two versions of the traditional and the conditional estimators of $τ^{2}$ : adding $1 / 2$ to the four cell counts always, or only when one of those counts is zero. Thus, our simulation study examined a total of 15 point estimators of $τ^{2}$ and 9 interval estimators (Table 1).
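The “always” and “only” rules for adding $1 / 2$ can be illustrated with a study‐level LOR. This is a minimal sketch: the variance shown is the familiar sum of reciprocal cell counts, which may differ from the variance estimates used for the estimators studied here, and the function name and counts are hypothetical.

```python
import math


def log_odds_ratio(x_t, n_t, x_c, n_c, rule="only"):
    """Return (LOR, variance estimate) for one 2x2 table.

    rule="always": add 1/2 to all four cells.
    rule="only":   add 1/2 to all four cells only if some cell is zero.
    """
    if rule not in ("always", "only"):
        raise ValueError("rule must be 'always' or 'only'")
    cells = [x_t, n_t - x_t, x_c, n_c - x_c]
    if rule == "always" or min(cells) == 0:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    lor = math.log((a * d) / (b * c))
    var = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return lor, var


# Hypothetical study: 12/100 events under treatment, 8/100 under control.
lor_only, var_only = log_odds_ratio(12, 100, 8, 100, rule="only")
lor_always, var_always = log_odds_ratio(12, 100, 8, 100, rule="always")
print(round(lor_only, 4), round(lor_always, 4))

# With a zero cell, both rules apply the correction and give the same result.
print(log_odds_ratio(0, 50, 3, 50, "only") == log_odds_ratio(0, 50, 3, 50, "always"))
```

For a table without zero cells and a positive LOR, the “always” rule pulls the estimate toward zero and lowers its estimated variance, so the two rules can lead to different values of ${\hat{τ}}^{2}$ .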

For mean bias, the best estimators are MP “only,” KD, SSU model, and SSC “always.” The DL and REML estimators have the worst bias. For median bias, which was not studied previously, an important finding is that the majority of the standard estimators have negative bias for larger values of $n$ . This means that their median values of ${\hat{τ}}^{2}$ are too low. We recommend SMC “always” and SMU model. Both consistently result in almost median‐unbiased estimation across the range of $p_{iC}$ values for moderate to large $n$ .

Coverage of the confidence intervals depends on $p_{iC}$ , $n$ , and $K$ . There is also some dependence on $θ$ and $τ^{2}$ . However, KD, FPC “always,” and FPU model are the best choices overall. PL is the worst choice. We also considered left and right coverage errors separately. Typically, the miss‐left rates are below, and the miss‐right rates are above, the nominal 2.5% level, especially so for small sample sizes and/or control arm probabilities. Arguably, this is not bad, as it would reduce the width of the confidence intervals. ²⁹ Lack of balance in the miss‐left and miss‐right rates agrees with our findings ¹³ that the chi‐square approximation for $Q_{IV}$ and the Farebrother approximation for $Q_{F}$ are accurate only for larger sample sizes.

Alongside the novel two‐stage estimators using sample‐size‐based weights, our simulation study involved a variety of established inverse‐variance‐based two‐stage estimators of heterogeneity. With very few exceptions, such as MP and, especially, KD, their performance is not impressive. One‐stage meta‐analysis is often suggested as a better choice. However, we previously considered the quality of one‐stage estimation in GLMM‐based binomial‐normal models ¹¹ , ³⁰ and found it lacking. Also, it is useful to remember that GLMMs are still asymptotic methods that use normal likelihood.

Ideally, for meta‐analyses of log‐odds‐ratio, a single point estimator of $τ^{2}$ would have acceptably low bias (and, perhaps, median bias) for a substantial region of $p_{iC}$ , $θ$ , $n$ , $K$ , and $τ^{2}$ . Similarly, a single confidence interval would have close to nominal coverage of $τ^{2}$ . Our results show some progress toward those goals, by demonstrating advantages of new estimators in some situations and, importantly, by demonstrating that some popular estimators have unacceptable bias or coverage. The goals, however, require further research. In the interim, effective education can help users avoid methods that perform poorly.

AUTHOR CONTRIBUTIONS

Elena Kulinskaya: Conceptualization; funding acquisition; methodology; software; visualization; writing – original draft; writing – review and editing. David C. Hoaglin: Conceptualization; investigation; methodology; writing – original draft; writing – review and editing.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

Supporting information

Data S1: Supporting Information.

JRSM-14-671-s001.pdf (310.8KB, pdf)

ACKNOWLEDGMENTS

The work by E. Kulinskaya was supported by the Economic and Social Research Council [grant number ES/L011859/1].

Kulinskaya E, Hoaglin DC. Estimation of heterogeneity variance based on a generalized Q statistic in meta‐analysis of log‐odds‐ratio. Res Syn Meth. 2023;14(5):671‐688. doi: 10.1002/jrsm.1647

DATA AVAILABILITY STATEMENT

Our full simulation results are available as an arXiv e‐print (arXiv:2208.00707v1). The user‐friendly R program implementing all studied estimators of heterogeneity variance $τ^{2}$ in meta‐analysis of log‐odds‐ratio with related confidence intervals is available at https://osf.io/5n3vd

REFERENCES

1. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10(1):101‐129.
2. DerSimonian R, Kacker R. Random‐effects model for meta‐analysis of clinical trials: an update. Contemp Clin Trials. 2007;28(2):105‐114.
3. Bakbergenuly I, Hoaglin DC, Kulinskaya E. Estimation in meta‐analyses of mean difference and standardized mean difference. Stat Med. 2020;39(2):171‐191.
4. Bakbergenuly I, Hoaglin DC, Kulinskaya E. Methods for estimating between‐study variance and overall effect in meta‐analysis of odds ratios. Res Synth Methods. 2020;11(3):426‐442. doi: 10.1002/jrsm.1404
5. Kulinskaya E, Hoaglin DC, Bakbergenuly I, Newman J. A Q statistic with constant weights for assessing heterogeneity in meta‐analysis. Res Synth Methods. 2021;12:711‐730. doi: 10.1002/jrsm.1491
6. Kulinskaya E, Hoaglin DC. Simulations for the Q statistic with constant and inverse variance weights for binary effect measures. 2022. arXiv:2206.08907v1 [stat.ME].
7. Viechtbauer W. Conducting meta‐analyses in R with the metafor package. J Stat Softw. 2010;36:3. https://www.metafor-project.org
8. Gart JJ, Pettigrew HM, Thomas DG. The effect of bias, variance estimation, skewness and kurtosis of the empirical logit on weighted least squares analyses. Biometrika. 1985;72(1):179‐190.
9. DerSimonian R, Laird N. Meta‐analysis in clinical trials. Control Clin Trials. 1986;7(3):177‐188.
10. Kulinskaya E, Dollinger MB. An accurate test for homogeneity of odds ratios based on Cochran's Q‐statistic. BMC Med Res Methodol. 2015;15(1):49.
11. Kulinskaya E, Hoaglin DC, Bakbergenuly I. Exploring consequences of simulation design for apparent performance of methods of meta‐analysis. Stat Methods Med Res. 2021;30(7):1667‐1690. PMID: 34110941. doi: 10.1177/09622802211013065
12. Farebrother RW. Algorithm AS 204: the distribution of a positive linear combination of χ² random variables. J R Stat Soc Ser C. 1984;33(3):332‐339.
13. Kulinskaya E, Hoaglin DC. On the Q statistic with constant weights in meta‐analysis of binary outcomes. 2022. Available at Research Square. doi: 10.21203/rs.3.rs-2121915/v1
14. Viechtbauer W. Hypothesis tests for population heterogeneity in meta‐analysis. Br J Math Stat Psychol. 2007;60:29‐60.
15. Bakbergenuly I, Hoaglin DC, Kulinskaya E. On the Q statistic with constant weights for standardized mean difference. Br J Math Stat Psychol. 2021;75:444‐465. doi: 10.1111/bmsp.12263
16. Brown GW. On small‐sample estimation. Ann Math Stat. 1947;18:582‐585.
17. Viechtbauer W. Median‐unbiased estimators for the amount of heterogeneity in meta‐analysis. Paper presented at: 9th European Congress of Methodology. European Association of Methodology. 2021.
18. Mandel J, Paule RC. Interlaboratory evaluation of a material with unequal numbers of replicates. Anal Chem. 1970;42(11):1194‐1197.
19. Veroniki AA, Jackson D, Viechtbauer W, et al. Methods to estimate the between‐study variance and its uncertainty in meta‐analysis. Res Synth Methods. 2016;7:55‐79.
20. Viechtbauer W. Confidence intervals for the amount of heterogeneity in meta‐analysis. Stat Med. 2007;26(1):37‐52.
21. Hardy RJ, Thompson SG. A likelihood approach to meta‐analysis with random effects. Stat Med. 1996;15:619‐629.
22. Jackson D. Confidence intervals for the between‐study variance in random effects meta‐analysis using generalised Cochran heterogeneity statistics. Res Synth Methods. 2013;4(3):220‐229. doi: 10.1002/jrsm.1081
23. Sánchez‐Meca J, Marín‐Martínez F. Testing the significance of a common risk difference in meta‐analysis. Comput Stat Data Anal. 2000;33(3):299‐313.
24. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2016.
25. Kulinskaya E, Hoaglin DC. R programs for estimation of heterogeneity variance $τ^{2}$ for log‐odds‐ratio using the generalised Q statistic with constant and inverse variance weights. OSF. 2022. https://osf.io/5n3vd
26. Kulinskaya E, Hoaglin DC. Simulations for estimation of heterogeneity variance $τ^{2}$ in constant and inverse variance weights meta‐analysis of log‐odds‐ratios. 2022. arXiv:2208.00707v1 [stat.ME]. doi: 10.48550/arxiv.2208.00707
27. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Boca Raton, Florida: Chapman & Hall/CRC. 1993.
28. Stead L, Buitrago D, Preciado N, Sanchez G, Hartmann‐Boyce J, Lancaster T. Physician advice for smoking cessation. Cochrane Database Syst Rev. 2013. Art. No. CD000165.
29. Jackson D, Bowden J. Confidence intervals for the between‐study variance in random‐effects meta‐analysis using generalised heterogeneity statistics: should we use unequal tails? BMC Med Res Methodol. 2016;16:116.
30. Bakbergenuly I, Kulinskaya E. Meta‐analysis of binary outcomes via generalized linear mixed models: a simulation study. BMC Med Res Methodol. 2018;18:70. doi: 10.1186/s12874-018-0531-9
