Summary
Meta-analyses of a treatment’s effect compared to a control frequently calculate the meta-effect from standardized mean differences (SMDs). SMDs are usually estimated by Cohen’s d or Hedges’ g. Cohen’s d divides the difference between sample means of a continuous response by the pooled standard deviation, but is subject to non-negligible bias for small sample sizes. Hedges’ g removes this bias with a correction factor. The current literature (including meta-analysis books and software packages) is confusingly inconsistent about methods for synthesizing SMDs, potentially making reproducibility a problem. Using conventional methods, the variance estimate of SMD is associated with the point estimate of SMD, so Hedges’ g is not guaranteed to be unbiased in meta-analyses. This article comprehensively reviews and evaluates available methods for synthesizing SMDs. Their performance is compared using extensive simulation studies and analyses of actual datasets. We find that because of the intrinsic association between point estimates and standard errors, the usual version of Hedges’ g can result in more biased meta-estimation than Cohen’s d. We recommend using average-adjusted variance estimators to obtain an unbiased meta-estimate, and the Hartung–Knapp–Sidik–Jonkman method for accurate estimation of its confidence interval.
Keywords: bias, Cohen’s d, confidence interval, Hedges’ g, meta-analysis, standardized mean difference
1 |. INTRODUCTION
Meta-analysis is a statistical tool for combining sources of evidence from multiple related but independent studies. It was introduced decades ago and is now increasingly used across a wide range of disciplines, especially in medical sciences.1,2,3 Despite the rapid development of meta-analysis methods and their increasing use to address real-world questions, the practices of meta-analyses often overlook fundamental problems, including conventional assumptions about within-study variances and normality,4,5 as well as discrepancies in statistical methods used to perform the analyses.6 These practices may lead to biased and conflicting results from different meta-analyses on the same topic.7 Specifically, most meta-analyses treat the sample within-study variance as the true variance; this assumption leads to misunderstanding about properties of the commonly-used Q test for heterogeneity8 and, more importantly, considerable biases in meta-estimates when sample sizes are small.9 This article focuses on problems in the meta-analysis of standardized mean differences (SMDs), including inconsistent notation and estimators for the SMD and biases caused by conventional methods for synthesizing SMDs. We use simulation studies and analyses of empirical data to compare the performance of various SMD estimators in meta-analysis.
When the outcome of a study is a continuous measure, it is common to compare the treatment and control groups using the difference between their sample means, i.e., the mean difference. Because a meta-analysis synthesizes results from all available studies, different studies commonly use different scales for the continuous measure.10 For example, in a meta-analysis of the body-mass response to an intervention, body mass may be measured in some studies as body weight and in other studies as body mass index (calculated as body weight divided by the square of body height).11 In such cases, the mean difference can be transformed to a common scale by dividing by the measure’s standard deviation (SD); the resulting effect size is referred to as the SMD, which is considered to be more comparable across studies.12,13 An SMD can be obtained readily from most published studies with continuous outcomes; for example, the SMD is proportional to the t-statistic, and p-values from t-tests are commonly reported.14 Compared to an SMD, the raw mean differences (or the SDs) may be less often present in published data, limiting their availability in systematic reviews.
Despite the flexibility offered by the SMD, several problems attend their use in practice. First, various estimators are available to calculate the SMD, including Glass’ Δ, Cohen’s d, and Hedges’ g. This article focuses on the last two estimators because they are more widely used than other estimators. Second, notation used to denote the estimator of the SMD may be inconsistent in publications in different disciplines. For example, in the landmark book of Hedges and Olkin15 (pp. 78 and 81) on statistical methods for meta-analysis, the estimator that is commonly called Hedges’ g in current medical meta-analyses was originally denoted by d, while Cohen’s d was denoted by g. This inconsistent notation may lead to misuse of statistical formulas to calculate the estimators and their variances.7 Third, from a practical perspective, many clinicians may find the SMD difficult to interpret,16 and understanding estimation of the SMD and its variance requires non-trivial statistical knowledge. Fourth, various methods are available to estimate the variance of the SMD, leading to different choices of weights for the studies in a meta-analysis. Consequently, practitioners may be confused about choosing a proper variance estimator or weight. Inconsistent use of these methods may impact the reproducibility of meta-analyses of SMDs,17 and may lead to substantial disagreement when a meta-analysis is validated by other researchers.18
In addition, commonly-used estimators of the SMD’s variance depend on the point estimate of the SMD, and this intrinsic association within each study may introduce considerable bias in meta-estimates.9,19,20 This bias was noted in early work by Hedges,21,22,23 but it is frequently neglected in recent meta-analysis practice. Approaches to combining SMDs therefore need to be clearly reviewed and examined by meta-analysts.
This article is organized as follows. Section 2 comprehensively reviews estimators of the SMD and its variance. Section 3 explains the bias in the synthesized SMD in conventional meta-analysis methods and summarizes alternative methods designed to reduce this bias. It also gives approaches to constructing a confidence interval (CI) for the synthesized SMD. Section 4 presents extensive simulation studies to compare various estimators of the SMD and its variance. Section 5 gives two examples of empirical datasets. Section 6 provides a summary of popular software packages for meta-analyses of SMDs, and Section 7 closes with a brief discussion.
2 |. STANDARDIZED MEAN DIFFERENCE
2.1 |. Notation
Consider a meta-analysis of N independent studies with continuous outcomes; each study compares the same two groups (say, control and treatment). Let ni0 and ni1 be the numbers of subjects in the control and treatment groups in study i, respectively. Define $m_i = n_{i0} + n_{i1} - 2$ and $q_i = n_{i0} n_{i1} / (n_{i0} + n_{i1})$.
Assume the outcome measures of the subjects in each group independently follow a normal distribution; the distributions in the control and treatment groups within each study have population means μi0 and μi1, respectively, and share a common SD σi. Let $\bar{x}_{i0}$ and $s_{i0}^2$ be the sample mean and sample variance in the control group, respectively, and define $\bar{x}_{i1}$ and $s_{i1}^2$ analogously for the treatment group. These estimates are commonly reported in published articles and can be used to calculate the SMD.
The true value of the SMD in study i is defined as
$$\theta_i = \frac{\mu_{i1} - \mu_{i0}}{\sigma_i}.$$
A straightforward estimator for this estimand plugs in the point estimates of the three parameters μi0, μi1, and σi. The common SD σi of the two groups is estimated as the pooled sample SD, i.e.,
$$s_{iP} = \sqrt{\frac{(n_{i0} - 1) s_{i0}^2 + (n_{i1} - 1) s_{i1}^2}{n_{i0} + n_{i1} - 2}}.$$
Thus, one estimator of the SMD is
$$d_i = \frac{\bar{x}_{i1} - \bar{x}_{i0}}{s_{iP}}, \tag{1}$$
which is called Cohen’s d, referring to Cohen24 (p. 66). This estimator can be shown to be biased; a bias-corrected estimator, called Hedges’ g, is widely used as an alternative and is described below.21 Sections 2.2 and 2.3 introduce properties of the two estimators in detail.
2.2 |. Cohen’s d
To examine the bias of Cohen’s d as an estimate of SMD for a single study, we first focus on its distribution. Although conventional meta-analysis methods model the SMDs as normal random variables, Cohen’s d actually follows a t-distribution (after multiplying by the constant $\sqrt{q_i}$), which may or may not be a central t, depending on the true SMD. Specifically, consider $\sqrt{q_i}\, d_i = \sqrt{q_i}(\bar{x}_{i1} - \bar{x}_{i0}) / s_{iP}$, where the numerator follows $N(\sqrt{q_i}(\mu_{i1} - \mu_{i0}),\ \sigma_i^2)$ and thus can be written as $\sigma_i (Z_i + \sqrt{q_i}\,\theta_i)$, for Zi a standard normal random variable with mean 0 and variance 1. Note that $(n_{i0} - 1) s_{i0}^2 / \sigma_i^2$ and $(n_{i1} - 1) s_{i1}^2 / \sigma_i^2$ follow χ2 distributions with ni0 − 1 and ni1 − 1 degrees of freedom, respectively. They are mutually independent because they are based on different subjects in the control and treatment groups; they are also independent of the sample means $\bar{x}_{i0}$ and $\bar{x}_{i1}$ (see, e.g., Theorem 5.3.1 in Casella and Berger25). The denominator, siP/σi (after scaling both the numerator and denominator by 1/σi), can be written as $\sqrt{V_i / m_i}$, where $V_i = m_i s_{iP}^2 / \sigma_i^2$ follows a χ2 distribution with mi degrees of freedom, and it is independent of the numerator. Consequently, $\sqrt{q_i}\, d_i = (Z_i + \sqrt{q_i}\,\theta_i) / \sqrt{V_i / m_i}$, so it follows a t-distribution with mi degrees of freedom and noncentrality parameter $\sqrt{q_i}\,\theta_i$.
Using properties of the noncentral t-distribution, the expectation and variance of Cohen’s d are (Hedges and Olkin,15 p. 104):
$$E[d_i] = \frac{\theta_i}{J(m_i)}, \tag{2}$$
$$\mathrm{Var}(d_i) = \frac{m_i}{(m_i - 2)\, q_i} + \theta_i^2 \left[ \frac{m_i}{m_i - 2} - \frac{1}{J(m_i)^2} \right], \tag{3}$$
where $J(m_i) = \Gamma(m_i/2) / \left[\sqrt{m_i/2}\; \Gamma\big((m_i - 1)/2\big)\right]$ and Γ(·) is the gamma function. The expectation in Equation (2) is valid for mi > 1 and the variance in Equation (3) is valid for mi > 2; otherwise, they do not exist. Equation (2) indicates that the bias of Cohen’s d is [1/J(mi) − 1]θi. Section A1 in the Supplemental Material shows that J(mi) increases in mi, so the bias converges to 0 as the total sample size increases. Cohen’s d is unbiased if the true SMD θi is 0, but the bias increases (in absolute magnitude) as the true value θi departs from 0.
In practice, J(mi) is frequently approximated as $1 - \frac{3}{4 m_i - 1}$. Figure 1A compares these two forms. Their values are nearly identical for mi > 5, and their difference at mi = 3 is less than 0.004, so the approximation is considered excellent. Using the approximation became a convention because of limited computational capacity, but the exact form J(mi) can now be calculated accurately and quickly using various software; Section 6 gives more details. To avoid any inaccuracy caused by the approximation, throughout this article we use the exact form J(mi) in analyses regarding properties of the SMDs.
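To illustrate this comparison, the following R sketch (our own illustration, not code from the article; the function names j_exact and j_approx are ours) evaluates the exact correction factor, as reconstructed above via gamma functions, against the approximation 1 − 3/(4m − 1):

```r
# Exact small-sample correction factor J(m) and its common approximation.
# lgamma() is used for numerical stability of the gamma-function ratio.
j_exact  <- function(m) exp(lgamma(m / 2) - lgamma((m - 1) / 2)) / sqrt(m / 2)
j_approx <- function(m) 1 - 3 / (4 * m - 1)

m <- 3:30
round(cbind(m, exact = j_exact(m), approx = j_approx(m)), 4)
max(abs(j_exact(m) - j_approx(m)))  # difference is tiny once m exceeds about 5
```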
A meta-analysis synthesizes the SMDs by using their within-study variances to determine the studies’ weights. The exact variance in Equation (3) cannot be used directly, because it contains the unknown true SMD θi. Various estimators are available for the variance of Cohen’s d. For example, an unbiased estimator is26
$$\widehat{\mathrm{Var}}_{\mathrm{UB}}(d_i) = \frac{1}{q_i\, J(m_i)^2} + d_i^2 \left[ 1 - \frac{m_i - 2}{m_i\, J(m_i)^2} \right]. \tag{4}$$
This unbiased estimator is seldom used in meta-analysis applications. Instead, the variance is commonly approximated using large-sample properties, while various meta-analysis guidebooks suggest different approximations. For example, Egger et al.27 (p. 290) and Hartung et al.28 (p. 15) suggested
$$\widehat{\mathrm{Var}}(d_i) \approx \frac{1}{n_{i0}} + \frac{1}{n_{i1}} + \frac{d_i^2}{2(n_{i0} + n_{i1} - 2)}, \tag{5}$$
while Cooper et al.29 (p. 226), Borenstein et al.30 (p. 27), and Stangl and Berry31 (p. 37) suggested
$$\widehat{\mathrm{Var}}(d_i) \approx \frac{1}{n_{i0}} + \frac{1}{n_{i1}} + \frac{d_i^2}{2(n_{i0} + n_{i1})}. \tag{6}$$
Essentially, the first two terms of Equations (5) and (6), $1/n_{i0} + 1/n_{i1}$, reflect the variance of the numerator of Cohen’s d, and the third term, a function of $d_i^2$, reflects the variance of the denominator of Cohen’s d. The two estimators have slightly different denominators due to different assumptions used for their large-sample approximation (see p. 28 of Borenstein et al.30).
Sections A2–A4 in the Supplemental Material give derivations of Equations (2)–(6).
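As a concrete illustration, here is a minimal R sketch of Cohen’s d and one commonly used large-sample variance (Equation (6) as reconstructed above); the summary statistics and function names are hypothetical, not taken from the article:

```r
# Cohen's d and a large-sample variance from two-group summary statistics.
cohen_d <- function(mean1, mean0, sd1, sd0, n1, n0) {
  sp <- sqrt(((n0 - 1) * sd0^2 + (n1 - 1) * sd1^2) / (n0 + n1 - 2))  # pooled SD
  (mean1 - mean0) / sp
}

var_d_ls <- function(d, n1, n0) {
  # large-sample form 1/n0 + 1/n1 + d^2 / (2(n0 + n1))
  1 / n0 + 1 / n1 + d^2 / (2 * (n0 + n1))
}

d <- cohen_d(mean1 = 10.2, mean0 = 9.1, sd1 = 2.1, sd0 = 1.9, n1 = 25, n0 = 25)
c(d = d, var = var_d_ls(d, n1 = 25, n0 = 25))
```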
2.3 |. Hedges’ g
Because Cohen’s d is biased when the true SMD θi ≠ 0, Hedges21 proposed a bias-corrected estimator
$$g_i = J(m_i)\, d_i, \tag{7}$$
multiplying Cohen’s d by J(mi). This estimator is called Hedges’ g. Equations (2) and (3) give the expectation and variance of Hedges’ g as
$$E[g_i] = \theta_i, \tag{8}$$
$$\mathrm{Var}(g_i) = J(m_i)^2 \left\{ \frac{m_i}{(m_i - 2)\, q_i} + \theta_i^2 \left[ \frac{m_i}{m_i - 2} - \frac{1}{J(m_i)^2} \right] \right\}. \tag{9}$$
Equation (8) shows Hedges’ g is unbiased.
Based on the unbiased estimator of the variance of Cohen’s d in Equation (4), note that $g_i = J(m_i)\, d_i$ implies $\mathrm{Var}(g_i) = J(m_i)^2\, \mathrm{Var}(d_i)$, so $J(m_i)^2\, \widehat{\mathrm{Var}}_{\mathrm{UB}}(d_i)$ is an unbiased estimator of the variance of Hedges’ g. Specifically, this estimator is26
$$\widehat{\mathrm{Var}}_{\mathrm{UB}}(g_i) = \frac{1}{q_i} + g_i^2 \left[ 1 - \frac{m_i - 2}{m_i\, J(m_i)^2} \right]. \tag{10}$$
As for Cohen’s d, meta-analysts seldom use this unbiased estimator in practice; instead, they usually use the large-sample approximation of the variance of Cohen’s d in Equation (5) or (6) to estimate the variance of Hedges’ g:
$$\widehat{\mathrm{Var}}(g_i) \approx \frac{1}{n_{i0}} + \frac{1}{n_{i1}} + \frac{d_i^2}{2(n_{i0} + n_{i1} - 2)} \tag{11}$$
or
$$\widehat{\mathrm{Var}}(g_i) \approx \frac{1}{n_{i0}} + \frac{1}{n_{i1}} + \frac{d_i^2}{2(n_{i0} + n_{i1})}. \tag{12}$$
Besides the above estimators, because J(mi) converges to 1 when the sample sizes are large, Hedges and Olkin15 (p. 86) suggested estimating the variance using
$$\widehat{\mathrm{Var}}(g_i) \approx \frac{1}{n_{i0}} + \frac{1}{n_{i1}} + \frac{g_i^2}{2(n_{i0} + n_{i1})}. \tag{13}$$
Koricheva et al.32 (p. 63), Pigott33 (p. 10), and Lipsey and Wilson34 (p. 49) also used this variance estimator.
Some books (e.g., p. 25 in Schwarzer et al.35 and p. 212 in Schlattmann36) used another variance estimator:
$$\widehat{\mathrm{Var}}(g_i) \approx \frac{1}{n_{i0}} + \frac{1}{n_{i1}} + \frac{g_i^2}{2(n_{i0} + n_{i1} - 3.94)}. \tag{14}$$
This likely originated from Equation (8) in the seminal work of Hedges and Olkin15 (p. 80); however, readers should be aware that the original equation contained the true SMD, and Hedges and Olkin15 did not directly suggest replacing the true value with the estimated SMD to obtain the above Equation (14). More importantly, the symbol g used by Hedges and Olkin15 actually refers to Cohen’s d, while the symbol d in that book refers to Hedges’ g; see Section 2.5 for more discussion. Therefore, the rationale of the variance estimator of Hedges’ g in Equation (14) has never been well justified.
Figure 1B compares the coefficient of $g_i^2$ in Equation (14), $1/[2(n_{i0} + n_{i1} - 3.94)]$, with that in the unbiased estimator in Equation (10), $1 - (m_i - 2)/[m_i J(m_i)^2]$. It shows that the two coefficients are very similar when mi ≥ 5, so the estimator in Equation (14) may perform similarly to the unbiased estimator in Equation (10).
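A matching R sketch for Hedges’ g, assuming the exact correction factor and the Hedges–Olkin large-sample variance form reconstructed above; all names and numbers are illustrative:

```r
# Hedges' g and the Hedges-Olkin large-sample variance (Equation (13) above).
j_m <- function(m) exp(lgamma(m / 2) - lgamma((m - 1) / 2)) / sqrt(m / 2)

hedges_g <- function(mean1, mean0, sd1, sd0, n1, n0) {
  m  <- n0 + n1 - 2
  sp <- sqrt(((n0 - 1) * sd0^2 + (n1 - 1) * sd1^2) / m)   # pooled SD
  j_m(m) * (mean1 - mean0) / sp                           # bias-corrected SMD
}

var_g_ho <- function(g, n1, n0) 1 / n0 + 1 / n1 + g^2 / (2 * (n0 + n1))

g <- hedges_g(mean1 = 10.2, mean0 = 9.1, sd1 = 2.1, sd0 = 1.9, n1 = 25, n0 = 25)
c(g = g, var = var_g_ho(g, n1 = 25, n0 = 25))
```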
2.4 |. Alternative estimators of standardized mean difference
Besides Cohen’s d and Hedges’ g, alternative estimators for the SMD are available. For example, Glass37 and Glass et al.38 (p. 29) suggested standardizing the mean difference by the sample SD of the control group. This estimator is called Glass’ Δ. Specifically,
$$\Delta_i = \frac{\bar{x}_{i1} - \bar{x}_{i0}}{s_{i0}}, \tag{15}$$
where si0 is the sample SD in study i’s control group. The rationale for Glass’ Δ is that the control group represents a general population, and it may be preferred if the true standard deviations in the two groups differ greatly (e.g., based on Levene’s test).39
Cohen’s d in Equation (1) uses the unbiased pooled variance $s_{iP}^2$. The maximum-likelihood variance estimator, $\tilde{s}_{iP}^2 = \left[(n_{i0} - 1) s_{i0}^2 + (n_{i1} - 1) s_{i1}^2\right] / (n_{i0} + n_{i1})$, may alternatively be used to estimate the SMD (Hedges and Olkin,15 pp. 81 and 82). We denote this estimator by
$$d_i' = \frac{\bar{x}_{i1} - \bar{x}_{i0}}{\tilde{s}_{iP}}. \tag{16}$$
Hedges and Olkin15 (p. 82) also introduced a shrunken estimator with smaller mean squared error (MSE) than Cohen’s d and Hedges’ g. Cohen40 (p. 44) gave an SMD estimator that uses the average of the variances from the treatment and control groups, without assuming homoscedasticity. Because Cohen’s d and Hedges’ g are applied much more commonly than other estimators in practice, the analyses in this article focus on these two estimators.
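For completeness, a small R sketch of the two alternative estimators defined above; the function names and inputs are hypothetical:

```r
# Glass' delta standardizes by the control group's SD only.
glass_delta <- function(mean1, mean0, sd0) (mean1 - mean0) / sd0

# SMD standardized by the maximum-likelihood pooled SD (divisor n0 + n1).
cohen_d_ml <- function(mean1, mean0, sd1, sd0, n1, n0) {
  sp_ml <- sqrt(((n0 - 1) * sd0^2 + (n1 - 1) * sd1^2) / (n0 + n1))
  (mean1 - mean0) / sp_ml
}

c(glass = glass_delta(10.2, 9.1, 1.9),
  d_ml  = cohen_d_ml(10.2, 9.1, 2.1, 1.9, n1 = 25, n0 = 25))
```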
2.5 |. Inconsistent notation in different publications
Most meta-analysis books introduce only some of the above SMD estimators, and their notation is frequently inconsistent. Table 1 summarizes notation for four SMD estimators in 13 books. Rigorous meta-analysis methods were developed in educational, psychological, and ecological sciences in the 1980s; the estimator in Equation (7), which we call Hedges’ g, is often called d in these disciplines (e.g., Hedges and Olkin15 and Koricheva et al.32). In the medical and epidemiological literature, meta-analysis was extensively debated and began to thrive in the 1990s,41,42,43,44,45,46 and researchers prefer the name Hedges’ g for the estimator in Equation (7) (e.g., Egger et al.27). Consequently, when referring to books from different disciplines, meta-analysts should be aware of the definitions of the estimators and apply the corresponding formulas with great caution.
TABLE 1. Notation used for four SMD estimators in 13 meta-analysis books.
Book | Glass’ Δ in Equation (15) | Cohen’s d in Equation (1) | Cohen’s d′ in Equation (16) | Hedges’ g in Equation (7)
---|---|---|---|---
Hedges and Olkin15 (1985) | g′ | g | | d
Hunter and Schmidt81 (1990) | dG | d | NA | d⋆ |
Stangl and Berry31 (2000) | NA | z | NA | NA |
Egger et al.27 (2001) | Δ | d | NA | g |
Lipsey and Wilson34 (2001) | NA | ESsm | NA | |
Hartung et al.28 (2008) | Δ | g | d | g⋆ |
Borenstein et al.30 (2009) | NA | d | NA | g |
Cooper et al.29 (2009) | NA | d | NA | g |
Schlattmann36 (2009) | NA | d | NA | g |
Card82 (2012) | gGlass | g | d | gadjusted |
Pigott33 (2012) | NA | NA | NA | d |
Koricheva et al.32 (2013) | NA | NA | NA | d |
Schwarzer et al.35 (2015) | NA | NA | NA | g |
Note: NA, not available.
3 |. META-ANALYSIS OF STANDARDIZED MEAN DIFFERENCES
3.1 |. Conventional methods
A meta-analysis synthesizes SMDs from the N studies based on their summary data, i.e., $\{(y_i, \widehat{\mathrm{Var}}(y_i));\ i = 1, \ldots, N\}$, where yi = di and $\widehat{\mathrm{Var}}(y_i) = \widehat{\mathrm{Var}}(d_i)$ for Cohen’s d or yi = gi and $\widehat{\mathrm{Var}}(y_i) = \widehat{\mathrm{Var}}(g_i)$ for Hedges’ g. The estimated within-study variances are treated as if they are the true values, but this assumption is valid only for sufficiently large sample sizes. The synthesized SMD is $\hat{\theta} = \sum_{i=1}^{N} w_i y_i / \sum_{i=1}^{N} w_i$, where the study-specific weights are $w_i = 1/\widehat{\mathrm{Var}}(y_i)$ in the common-effect model and $w_i = 1/[\widehat{\mathrm{Var}}(y_i) + \hat{\tau}^2]$ in the random-effects model.47 Here, $\hat{\tau}^2$ is the estimate of the between-study variance τ2 under the random-effects model. The most widely-used estimator of τ2 is the method-of-moments estimator proposed by DerSimonian and Laird.48 Several alternatives have since been shown to outperform it, such as the restricted maximum-likelihood (REML) estimator and the Paule–Mandel estimator.49,50,51,52,53,54,55
Note that the conventional variance estimators reviewed in Section 2 can be expressed as $\widehat{\mathrm{Var}}(y_i) = a_i + b_i y_i^2$ for either Cohen’s d or Hedges’ g, where ai and bi are constants (i.e., functions of the sample sizes ni0 and ni1) in each study. For example, $a_i = 1/q_i$ and $b_i = 1 - (m_i - 2)/[m_i J(m_i)^2]$ in Equation (10). Because the conventional variance estimators always include the term $b_i y_i^2$, the SMD estimates yi and their weights wi are associated within studies for both Cohen’s d and Hedges’ g. Without treating $\widehat{\mathrm{Var}}(y_i)$ as fixed values as in conventional meta-analysis methods, it is invalid to simply decompose the expectation of the synthesized SMD as $\sum_{i=1}^{N} w_i E[y_i] / \sum_{i=1}^{N} w_i$. For example, consider the common-effect setting where the true SMDs θi in all studies equal a common value θ. The expectation of the synthesized SMD is
$$E[\hat{\theta}] = E\left[ \frac{\sum_{i=1}^{N} y_i / (a_i + b_i y_i^2)}{\sum_{i=1}^{N} 1 / (a_i + b_i y_i^2)} \right] = \int \cdots \int \frac{\sum_{i=1}^{N} y_i / (a_i + b_i y_i^2)}{\sum_{i=1}^{N} 1 / (a_i + b_i y_i^2)}\, \prod_{i=1}^{N} f_i(y_i)\; dy_1 \cdots dy_N,$$
where $f_i(\cdot)$ is the probability density function (pdf) of yi (Cohen’s d or Hedges’ g) in study i. Recall that $\sqrt{q_i}\, d_i$ (equivalently, $\sqrt{q_i}\, g_i / J(m_i)$) follows a (noncentral) t-distribution with mi degrees of freedom and noncentrality parameter $\sqrt{q_i}\,\theta$, with pdf denoted by $t(x;\, m_i,\, \sqrt{q_i}\,\theta)$. The pdf of the SMD estimate yi is then
$$f_i(y_i) = \sqrt{q_i}\; t\big(\sqrt{q_i}\, y_i;\ m_i,\ \sqrt{q_i}\,\theta\big) \ \ \text{for Cohen's } d, \qquad f_i(y_i) = \frac{\sqrt{q_i}}{J(m_i)}\; t\!\left(\frac{\sqrt{q_i}\, y_i}{J(m_i)};\ m_i,\ \sqrt{q_i}\,\theta\right) \ \ \text{for Hedges' } g.$$
The exact expectation of $\hat{\theta}$ cannot be derived in closed form. Even if each study-specific random variable follows a central t-distribution with θ = 0, it is still very challenging to derive the distribution of their linear combination.56
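The following R sketch illustrates the conventional synthesis described above, using hypothetical study-level estimates; the random-effects fit uses the metafor package with REML as one possible implementation:

```r
# Inverse-variance synthesis of SMDs from study-level summaries.
yi <- c(0.41, 0.15, 0.62, 0.28)       # hypothetical SMD estimates
vi <- c(0.090, 0.055, 0.120, 0.071)   # hypothetical within-study variances

# common-effect model: weights are the inverse within-study variances
w_ce     <- 1 / vi
theta_ce <- sum(w_ce * yi) / sum(w_ce)

# random-effects model with REML estimation of the between-study variance
library(metafor)
fit_re <- rma(yi = yi, vi = vi, method = "REML")

c(common_effect = theta_ce, random_effects = as.numeric(fit_re$beta))
```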
3.2 |. Illustration of asymptotic bias produced by conventional methods
Because yi is associated with $\widehat{\mathrm{Var}}(y_i)$ (equivalently, with the weights wi), an estimator (e.g., Hedges’ g) that is unbiased at the individual-study level may not give a synthesized estimator that is unbiased. To briefly illustrate this problem, we focus on the common-effect setting with θi = θ as in Section 3.1, and assume that each study has the same sample sizes ni0 = n0 and ni1 = n1 in the control and treatment groups. Then the study-specific SMD estimates yi are independent and identically distributed (iid). The constants ai and bi in the variance estimators depend only on ni0 and ni1, so they are the same in all studies and can be denoted by ai = a and bi = b. Similarly, mi = m = n0 + n1 − 2 and qi = q = n0n1/(n0 + n1). We consider the asymptotic case in which the number of studies N → ∞. Although these settings are unrealistic in practice, they simplify expressions and suffice for our illustrative purpose.
Under the above assumptions,
$$\hat{\theta} = \frac{\sum_{i=1}^{N} w_i y_i}{\sum_{i=1}^{N} w_i} = \frac{N^{-1} \sum_{i=1}^{N} y_i / (a + b y_i^2)}{N^{-1} \sum_{i=1}^{N} 1 / (a + b y_i^2)},$$
where the numerator and denominator converge to $E\!\left[\frac{y_1}{a + b y_1^2}\right]$ and $E\!\left[\frac{1}{a + b y_1^2}\right]$ in probability, respectively, by the law of large numbers. Consequently, by the continuous mapping theorem,57
$$\hat{\theta} \;\xrightarrow{\,p\,}\; E\!\left[\frac{y_1}{a + b y_1^2}\right] \Big/ E\!\left[\frac{1}{a + b y_1^2}\right], \tag{17}$$
where $\xrightarrow{\,p\,}$ denotes the convergence in probability.
If the true SMD θ = 0, then yi multiplied by a constant c follows a central t-distribution with m degrees of freedom. Here, $c = \sqrt{q}$ for Cohen’s d and $c = \sqrt{q}/J(m)$ for Hedges’ g. Let ti = cyi be N such t-distributed random variables. Because a > 0 and b > 0,
$$\hat{\theta} \;\xrightarrow{\,p\,}\; \frac{1}{c}\, E\!\left[\frac{t_1}{a c^2 + b\, t_1^2}\right] \Big/ E\!\left[\frac{1}{a c^2 + b\, t_1^2}\right].$$
Applying properties of the central t-distribution, E[|ti|] < ∞ if the degrees of freedom m > 1, which is generally true in practice; thus, $E\!\left[\frac{t_1}{a c^2 + b\, t_1^2}\right]$ exists. Because the pdf of the central t-distribution is symmetric around 0, i.e., t(x; m, 0) = t(−x; m, 0), we have
$$E\!\left[\frac{t_1}{a c^2 + b\, t_1^2}\right] = 0.$$
Consequently, $\hat{\theta} \xrightarrow{\,p\,} 0 = \theta$, so the synthesized SMD is asymptotically unbiased when the true SMD θ = 0.
The argument above does not generalize to the situation of θ ≠ 0, where the random variables ti have noncentral t-distributions. It is difficult to theoretically calculate the asymptotic bias by solving the integrals in Equation (17)’s expectations. Instead, Monte Carlo methods can be used to numerically approximate those expectations and thus derive the asymptotic bias. Figure 2 presents the (approximated) asymptotic biases of the synthesized SMD produced by the conventional common-effect meta-analysis using Cohen’s d and Hedges’ g with different variance estimators, including those in Equations (4)–(6) and (10)–(14). These biases are approximated by sampling 107 iid random variables to approximate the expectations in Equation (17). In the panels of Figure 2, the sample size ratio r = n0/n1 is 1 or 2, and the true SMD θ is ±0.2, ±0.5, or ±0.8, representing different treatment effects (Cohen,40 pp. 25–27).
Figure 2 implies that both Cohen’s d and Hedges’ g are biased asymptotically for all variance estimators introduced in Sections 2.2 and 2.3, but the magnitude of bias differs for different estimators. Although Hedges’ g is an unbiased estimator of the SMD within each individual study, the synthesized Hedges’ g is noticeably more biased than the synthesized Cohen’s d. Generally, in meta-analyses Hedges’ g with the variance estimator in Equation (14) produces the largest bias in absolute magnitude, while Cohen’s d with the variance estimator in Equation (6) produces the smallest bias. A larger true SMD leads to a larger bias in absolute magnitude. As the total sample size in each study increases, the bias converges to 0. For θ = ±0.8, the asymptotic bias can be influential (around 0.2) when the total sample size is very small (e.g., 6), and it still reaches as high as 0.03 for a moderate total sample size of 30. The sample size ratio r (varying from 1 to 2) has little impact on trends in the asymptotic bias.
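The Monte Carlo approximation of Equation (17) can be sketched in R as follows, assuming the reconstructed limit E[y/(a + by²)]/E[1/(a + by²)] and using the Hedges–Olkin variance form for the weights; this is our illustration, not the authors’ code (the article used 10^7 draws):

```r
# Asymptotic bias of the synthesized Hedges' g via Monte Carlo integration.
set.seed(1)
n0 <- 5; n1 <- 5; theta <- 0.8
m  <- n0 + n1 - 2
q  <- n0 * n1 / (n0 + n1)
jm <- exp(lgamma(m / 2) - lgamma((m - 1) / 2)) / sqrt(m / 2)

M <- 1e6
d <- rt(M, df = m, ncp = sqrt(q) * theta) / sqrt(q)   # Cohen's d draws
g <- jm * d                                           # Hedges' g draws

# weights from the variance form 1/n0 + 1/n1 + g^2 / (2(n0 + n1))
a <- 1 / n0 + 1 / n1
b <- 1 / (2 * (n0 + n1))
v <- a + b * g^2

limit <- mean(g / v) / mean(1 / v)   # probability limit of the synthesized SMD
limit - theta                        # asymptotic bias of the synthesized Hedges' g
```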
In the conventional random-effects meta-analysis, the estimated between-study variance $\hat{\tau}^2$ depends on the data from all studies and is incorporated in the studies’ weights $w_i = 1/[\widehat{\mathrm{Var}}(y_i) + \hat{\tau}^2]$. As in the common-effect meta-analysis, these weights are associated with the study-specific SMD estimates yi, so the synthesized SMD is also subject to bias.
These illustrative examples show that the rationale for using Hedges’ g with conventional variance estimators needs to be carefully examined. Although Hedges’ g is designed to reduce bias in individual studies, in meta-analysis it may not perform as researchers expect. Section 4 presents extensive simulation studies under more realistic settings to evaluate the performance of different meta-analysis methods for synthesizing SMDs. Before that, Section 3.3 introduces several attempts to reduce bias of the synthesized SMD in meta-analyses.
3.3 |. Alternative methods for reducing bias in meta-analysis
The bias in the synthesized SMD was noted by Hedges,22,23 but this problem is neglected in many current meta-analyses. Doncaster and Spake9 showed that even without bias, the variability in the variance estimate may lead to underestimation of the meta-variance and thus low coverage of the CI. To eliminate the bias in the common-effect setting, Hedges22 proposed to remove from the variance of Hedges’ g in Equation (9) the part associated with θi, and thus use
$$\widetilde{\mathrm{Var}}(g_i) = \frac{J(m_i)^2\, m_i}{(m_i - 2)\, q_i} \tag{18}$$
to determine the studies’ weights. Because weights based on the above variances depend only on sample sizes and are not associated with the study-specific Hedges’ g, the synthesized Hedges’ g is guaranteed to be unbiased. However, these weights are not optimal because they ignore the reality that the within-study variances increase with effect sizes. The synthesized SMD may have a larger MSE than an estimator weighted by the more accurate variance estimators in Equations (10)–(13), especially when the true SMD departs from 0.20
In the random-effects meta-analysis, Hedges23 used the unbiased estimator of the variance of Hedges’ g in Equation (10) to derive the weights $w_i = 1/[\widehat{\mathrm{Var}}_{\mathrm{UB}}(g_i) + \hat{\tau}^2]$, but he acknowledged the bias produced by this weighting. He suggested alternative weights based on study-specific total sample sizes, i.e., wi = ni0 + ni1. The resulting sample-size-weighted SMD estimate is23
$$\bar{g}_{\mathrm{SS}} = \frac{\sum_{i=1}^{N} (n_{i0} + n_{i1})\, g_i}{\sum_{i=1}^{N} (n_{i0} + n_{i1})}. \tag{19}$$
This weighting helps remove the bias in the meta-estimator but it may still be inefficient.58
Rather than using weights computed by completely removing the terms associated with θi in the variance estimators, we may instead consider replacing these study-specific terms with a cross-study averaged term. This approach reduces the association between the SMD estimates gi and their estimated variances within studies, and thus reduces the bias in the synthesized estimator of SMD. Of note, this approach is not limited to the SMD but can also apply to other effect measures including the mean difference, response ratio, and odds ratio.9,59
Specifically, we may use the sample-size-weighted SMD estimate in Equation (19) to calculate adjusted within-study variances as
$$\widehat{\mathrm{Var}}_{\mathrm{adj}}(g_i) = \frac{1}{q_i} + \bar{g}_{\mathrm{SS}}^2 \left[ 1 - \frac{m_i - 2}{m_i\, J(m_i)^2} \right]. \tag{20}$$
Alternatively, Hedges22 suggested first calculating a synthesized SMD with weights based on the variance estimator in Equation (13) and then plugging it into Equation (20) to better adjust the within-study variances. Hedges and Olkin15 (p. 129) suggested replacing the study-specific true SMD θi in Equation (9) with the arithmetic mean of the study-specific estimates, leading to this variance estimator:
$$\widehat{\mathrm{Var}}_{\mathrm{adj}}'(g_i) = J(m_i)^2 \left\{ \frac{m_i}{(m_i - 2)\, q_i} + \bar{g}^2 \left[ \frac{m_i}{m_i - 2} - \frac{1}{J(m_i)^2} \right] \right\}, \qquad \bar{g} = \frac{1}{N} \sum_{j=1}^{N} g_j. \tag{21}$$
Doncaster and Spake9 proposed a variance adjustment based on Equation (12). They replaced the study-specific squared Cohen’s d, $d_i^2$, in Equation (12) with its average across studies, $N^{-1} \sum_{j=1}^{N} d_j^2$, that is,
$$\widehat{\mathrm{Var}}_{\mathrm{DS}}(g_i) = \frac{1}{n_{i0}} + \frac{1}{n_{i1}} + \frac{N^{-1} \sum_{j=1}^{N} d_j^2}{2(n_{i0} + n_{i1})}. \tag{22}$$
The foregoing approaches were originally considered for Hedges’ g and they can also be applied to synthesize Cohen’s d. However, because Cohen’s d is biased within studies, it may be of less interest to apply such bias-reduction procedures to Cohen’s d. Due to the bias at the individual-study level, the synthesized Cohen’s d is still biased even if the association between the study-specific Cohen’s d and its corresponding weight can be reduced.
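A brief R sketch of this idea with hypothetical data: the sample-size-weighted SMD of Equation (19) and an average-adjusted variance in the spirit of Equation (22) (here applied to Hedges’ g for simplicity, which is an assumption on our part):

```r
# Reducing the within-study association between estimates and weights.
gi <- c(0.41, 0.15, 0.62, 0.28)        # study-specific Hedges' g (hypothetical)
n0 <- c(12, 30,  9, 20)
n1 <- c(12, 28, 11, 20)

g_ss <- sum((n0 + n1) * gi) / sum(n0 + n1)   # sample-size-weighted SMD, Eq. (19)

# replace each study's own squared estimate with a cross-study average
# in the large-sample variance 1/n0 + 1/n1 + (.)^2 / (2(n0 + n1))
v_adj <- 1 / n0 + 1 / n1 + mean(gi^2) / (2 * (n0 + n1))

w <- 1 / v_adj                               # weights no longer track gi
sum(w * gi) / sum(w)                         # average-adjusted synthesized SMD
```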
3.4 |. Confidence interval for the synthesized standardized mean difference
Besides the point estimates of the synthesized SMD, several methods are available to construct a CI. Recall that the synthesized SMD is $\hat{\theta} = \sum_{i=1}^{N} w_i y_i / \sum_{i=1}^{N} w_i$, where yi is Cohen’s d or Hedges’ g and wi is the inverse of the marginal variance of yi; $w_i = 1/\widehat{\mathrm{Var}}(y_i)$ under the common-effect setting, and $w_i = 1/[\widehat{\mathrm{Var}}(y_i) + \hat{\tau}^2]$ under the random-effects setting. A conventional (1 − α) × 100% CI is:
$$\hat{\theta} \pm z_{1 - \alpha/2} \left( \sum_{i=1}^{N} w_i \right)^{-1/2}, \tag{23}$$
where z1−α/2 is the 1 − α/2 quantile of the standard normal distribution. This CI is applicable under both the common- and random-effects settings. Hartung and Knapp62,61 and Sidik and Jonkman60,63 proposed an alternative CI for the random-effects setting:
$$\hat{\theta} \pm t_{N-1,\, 1 - \alpha/2} \sqrt{ \frac{\sum_{i=1}^{N} w_i (y_i - \hat{\theta})^2}{(N - 1) \sum_{i=1}^{N} w_i} }, \tag{24}$$
where tN−1,1−α/2 is the 1 − α/2 quantile of the t-distribution with N − 1 degrees of freedom.
The CI in Equation (23) corresponds to conventional meta-analysis methods, in which the variance of the synthesized SMD is calculated as $1 / \sum_{i=1}^{N} w_i$. The $\widehat{\mathrm{Var}}(y_i)$ used in the weights wi can be any study-specific variance estimator from Sections 2.2 and 2.3 or an alternative variance estimator in Equation (18), (20), or (22) in Section 3.3.
Two key assumptions are required for this CI to be valid: (i) $\hat{\theta}$ follows a normal distribution; and (ii) the weights wi are fixed, known values. Asymptotically, both assumptions are approximately true if the sample sizes in each study, ni0 and ni1, are sufficiently large. Strictly speaking, assumption (i), normality, can never hold exactly because the synthesized SMD is a linear combination of t-distributed random variables. Assumption (ii) may not be exactly true because the study-specific variance estimators in Sections 2.2 and 2.3 involve the point estimate of the SMD, and the estimated between-study variance under the random-effects setting may be highly variable. If either assumption is seriously violated, the CI in Equation (23) may have poor coverage probability.
The CI in Equation (24) partly relaxes assumption (ii). Specifically, in the random-effects setting the weights wi depend on $\hat{\tau}^2$, while the variation in $\hat{\tau}^2$ is ignored in the conventional CI in Equation (23). The Hartung–Knapp–Sidik–Jonkman CI in Equation (24) is designed to account for variation in $\hat{\tau}^2$. This alternative CI has been shown to have better coverage probability in meta-analyses generally (i.e., not just meta-analyses of SMDs), especially when the number of studies is small.51,64
Variation in the study-specific sample variances impacts the validity of assumption (ii). When combining Hedges’ g using the variance estimator in Equation (18) or using the sample-size-weighted estimator in Equation (19), the resulting variances (and hence the weights) depend only on sample sizes and thus can be treated validly as fixed values. The variance estimators in Equations (20)–(22) use the averaged SMD to adjust the study-specific variance; a single study’s estimate of variance may considerably exceed the averaged estimate. Assumption (ii) may be better satisfied by the techniques in Section 3.3 than when using the variance estimators in Sections 2.2 and 2.3.
Although the average-adjusted variance estimators in Section 3.3 are more suitable for assumption (ii), they may be inaccurate for calculating the variance of the synthesized SMD and thus constructing the CI in Equation (23), especially if the effect sizes substantially differ across studies. We may borrow the idea of Henmi and Copas,65 in the context of publication bias, to refine the CI of $\hat{\theta}$ in Equation (23). Specifically, they proposed to calculate the point estimate of the overall effect size under the common-effect setting but derive its CI under the random-effects setting. Given that studies with larger sample sizes may be of higher quality and less subject to risk of bias, the common-effect point estimate may be preferable because it assigns larger weights to larger studies compared with the random-effects point estimate. On the other hand, by allowing potential heterogeneity when deriving the variance in the random-effects setting, the CI may have a better coverage probability.
When synthesizing SMDs we may use a similar idea, applying two types of variance estimators, to construct the CI. That is, use a variance estimator from Section 3.3—which is less associated with study-specific estimates and better satisfies assumption (ii)—to derive study-specific weights, and use a different variance estimator from Section 2.3—which is more accurate for a single study—to calculate the variance of the synthesized SMD. Specifically, let $\widehat{\mathrm{Var}}_W(y_i)$ be the within-study variance estimator used to derive the weights, and $\widehat{\mathrm{Var}}_V(y_i)$ be the estimator used to calculate the variance of the synthesized SMD. In the random-effects setting, $\widehat{\mathrm{Var}}_W(y_i)$ is used to estimate the between-study variance and the weights are $w_i = 1/[\widehat{\mathrm{Var}}_W(y_i) + \hat{\tau}^2]$; in the common-effect setting the weights are simply $w_i = 1/\widehat{\mathrm{Var}}_W(y_i)$. Under assumption (ii), the variance of $\hat{\theta}$ is
$$\mathrm{Var}(\hat{\theta}) = \frac{\sum_{i=1}^{N} w_i^2\, \mathrm{Var}(y_i)}{\left( \sum_{i=1}^{N} w_i \right)^2}.$$
Note that Var(yi) is no longer 1/wi as in the CI construction in Equation (23). Instead, in the common-effect setting, the study-specific variance is estimated as $\widehat{\mathrm{Var}}_V(y_i)$, so the (1 − α) × 100% CI of the synthesized SMD can be constructed as
$$\hat{\theta} \pm z_{1 - \alpha/2}\, \frac{\sqrt{ \sum_{i=1}^{N} w_i^2\, \widehat{\mathrm{Var}}_V(y_i) }}{\sum_{i=1}^{N} w_i}. \tag{25}$$
In the random-effects setting, the study-specific marginal variance is estimated as $\widehat{\mathrm{Var}}_V(y_i) + \hat{\tau}^2$, so the (1 − α) × 100% CI is
$$\hat{\theta} \pm z_{1 - \alpha/2}\, \frac{\sqrt{ \sum_{i=1}^{N} w_i^2 \left[ \widehat{\mathrm{Var}}_V(y_i) + \hat{\tau}^2 \right] }}{\sum_{i=1}^{N} w_i}. \tag{26}$$
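In practice, the conventional and Hartung–Knapp–Sidik–Jonkman intervals in Equations (23) and (24) can be obtained with metafor, as in the following sketch with hypothetical data; test = "knha" requests the Hartung–Knapp adjustment:

```r
# Conventional (Wald-type) vs. Hartung-Knapp-Sidik-Jonkman 95% CIs.
library(metafor)
yi <- c(0.41, 0.15, 0.62, 0.28)       # hypothetical SMD estimates
vi <- c(0.090, 0.055, 0.120, 0.071)   # hypothetical within-study variances

fit_wald <- rma(yi = yi, vi = vi, method = "REML")                 # Eq. (23)
fit_hksj <- rma(yi = yi, vi = vi, method = "REML", test = "knha")  # Eq. (24)

rbind(conventional = c(fit_wald$ci.lb, fit_wald$ci.ub),
      hksj         = c(fit_hksj$ci.lb, fit_hksj$ci.ub))
```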
4 |. SIMULATION STUDIES
4.1 |. Simulation designs
We devised simulation studies to evaluate the performance of the various methods for synthesizing SMDs. The simulation settings were in a factorial design. Specifically, the number of studies in a simulated meta-analysis was set to N = 5 or 20, the true between-study SD τ was 0.1 or 0.5, and the true overall SMD θ was 0, 0.3, or 0.8, representing no, moderate, or large treatment effects, respectively. The study-specific true SMDs were sampled as θi ~ N(θ, τ2). For simulated meta-analyses with N = 5 studies, the sample size within each study was ni = {6, 8, 10, 12, 14} (small sample sizes) or ni = {30, 40, 50, 60, 70} (moderately large sample sizes). The treatment allocation ratio was 1:1; thus, each study’s control and treatment groups had ni0 = ni1 = ni/2 individuals. For simulated meta-analyses with N = 20 studies, each of these 5 sample sizes was repeated four times. The study-specific true SDs of the continuous outcome measures were σi = {1, 2, 3, 4, 5} in both the control and treatment groups for meta-analyses with N = 5 studies; for meta-analyses with N = 20 studies, each of these 5 study-specific SDs was repeated 4 times. Without loss of generality, the true population mean of the outcome measures in each study’s control group was μi0 = 0, and in study i’s treatment group was μi1 = μi0 + θiσi. In study i, the outcome measures for individuals in the control group were generated as $x_{i0j} \sim N(\mu_{i0}, \sigma_i^2)$ for j = 1, …, ni0, and those in the treatment group were generated as $x_{i1k} \sim N(\mu_{i1}, \sigma_i^2)$ for k = 1, …, ni1. From these outcome measures, we obtained the sample means, $\bar{x}_{i0}$ and $\bar{x}_{i1}$, and sample variances, $s_{i0}^2$ and $s_{i1}^2$, in the control and treatment groups, respectively. Cohen’s d, Hedges’ g, and their estimated variances based on the various methods were then computed in each study. The above factorial design led to a total of 24 settings (resulting from 2 values of the number of studies, 2 values of the between-study variance, 3 values of the true overall SMD, and 2 sets of study-specific sample sizes). We simulated 10,000 meta-analysis datasets for each setting. We used the REML method to estimate the between-study variance; for some datasets, optimization of the restricted likelihood did not converge, leading to computational errors. The error rate depended on the simulation settings. We omitted such datasets and simulated new meta-analysis datasets until we obtained 10,000 replicates for which the REML maximization converged.
For each simulated meta-analysis, we considered Cohen’s d with variance estimators in Equations (4)–(6) and Hedges’ g with variance estimators in Equations (10)–(14) and (20)–(22), i.e., a total of 11 methods. Recall that the variance estimators in Equations (20)–(22) use an averaged SMD to reduce the association between Hedges’ g and its variance. We used the method suggested by Hedges22 to obtain the estimator in Equation (20), as detailed in Section 3.3. All meta-analyses were performed using the R package “metafor” under the random-effects setting, because heterogeneity is generally expected.66
We calculated the point estimate specified by each method. As described in Section 3.4, we derived the 95% CI for each method using the conventional interval in Equation (23) and the Hartung–Knapp–Sidik–Jonkman interval in Equation (24).62,60,64 For the three methods based on the average-adjusted variance estimators in Equations (20)–(22), we also computed 95% CIs using Equation (26). The performance of each method was evaluated by the point estimate’s bias and root mean squared error (RMSE), and by the CI’s coverage probability, which were estimated as $M^{-1} \sum_{m=1}^{M} (\hat{\theta}^{(m)} - \theta)$, $\sqrt{M^{-1} \sum_{m=1}^{M} (\hat{\theta}^{(m)} - \theta)^2}$, and $M^{-1} \sum_{m=1}^{M} I(\hat{\theta}_{L}^{(m)} \le \theta \le \hat{\theta}_{U}^{(m)})$, respectively. Here, M = 10,000 is the number of replicates under each simulation setting, $\hat{\theta}^{(m)}$ denotes the synthesized SMD in the mth simulated meta-analysis, and $\hat{\theta}_{L}^{(m)}$ and $\hat{\theta}_{U}^{(m)}$ denote the lower and upper bounds of its 95% CI.
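For concreteness, the following R sketch generates and analyzes a single replicate under one of the settings described above (N = 5 studies, small sample sizes, θ = 0.8, τ = 0.5); it is our reimplementation of the stated design, not the authors’ simulation code:

```r
# One simulated meta-analysis: generate raw data, compute Hedges' g, and fit.
library(metafor)
set.seed(123)
theta <- 0.8; tau <- 0.5
ni     <- c(6, 8, 10, 12, 14)        # total sample sizes (1:1 allocation)
sigmai <- c(1, 2, 3, 4, 5)           # study-specific true SDs
thetai <- rnorm(5, theta, tau)       # study-specific true SMDs

dat <- t(sapply(seq_len(5), function(i) {
  n0 <- ni[i] / 2; n1 <- ni[i] / 2
  x0 <- rnorm(n0, 0, sigmai[i])
  x1 <- rnorm(n1, thetai[i] * sigmai[i], sigmai[i])
  m  <- n0 + n1 - 2
  sp <- sqrt(((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / m)
  jm <- exp(lgamma(m / 2) - lgamma((m - 1) / 2)) / sqrt(m / 2)
  g  <- jm * (mean(x1) - mean(x0)) / sp
  c(g = g, v = 1 / n0 + 1 / n1 + g^2 / (2 * (n0 + n1)))   # Eq. (13) variance
}))

rma(yi = dat[, "g"], vi = dat[, "v"], method = "REML", test = "knha")
```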
4.2 |. Simulation results
Different methods for synthesizing SMDs led to noticeably different biases, RMSEs, and coverage probabilities; as study-specific sample sizes decreased, biases tended to depart from 0 (Tables 2 and 3). The Monte Carlo standard errors of all biases were less than 0.005, those of all RMSEs were less than 0.003, and those of all coverage probabilities were less than 0.4%. Specifically, as N increased from 5 to 20, biases became generally larger in magnitude (with some exceptions) while RMSEs became smaller by about half. When N = 5, the coverage probabilities of the conventional CIs derived from Equation (23) were less than 90% for some methods, while those using the Hartung–Knapp–Sidik–Jonkman method in Equation (24) were generally close to 95% with some exceptions for small samples and large θ. As τ increased from 0.1 to 0.5, both biases (in magnitude) and RMSEs became slightly larger, and most CIs had slightly lower coverage probabilities. As study-specific sample sizes ni increased, both biases (in magnitude) and RMSEs became smaller, but the coverage probabilities seemed to interact with other factors. For example, when τ = 0.5, larger ni generally had lower coverage probabilities when N = 5 but had higher coverage probabilities when N = 20 and θ = 0.8.
TABLE 2. Simulation results (bias, RMSE, and coverage probability of 95% CIs) for meta-analyses with N = 5 studies.
Method | Bias (τ = 0.1, ni = 6–14) | RMSE | CP, % | Bias (τ = 0.1, ni = 30–70) | RMSE | CP, % | Bias (τ = 0.5, ni = 6–14) | RMSE | CP, % | Bias (τ = 0.5, ni = 30–70) | RMSE | CP, %
---|---|---|---|---|---|---|---|---|---|---|---|---
θ = 0: | ||||||||||||
Cohen’s d, Eq. (4) | 0.000 | 0.299 | 97.6 (95.4) | −0.001 | 0.137 | 95.2 (94.6) | 0.001 | 0.367 | 95.4 (95.8) | 0.002 | 0.260 | 89.6 (95.4) |
Cohen’s d, Eq. (5) | 0.000 | 0.303 | 96.3 (96.0) | −0.001 | 0.137 | 95.2 (94.6) | 0.001 | 0.371 | 94.0 (96.1) | 0.002 | 0.260 | 89.6 (95.4) |
Cohen’s d, Eq. (6) | 0.000 | 0.312 | 96.0 (95.9) | −0.001 | 0.138 | 95.2 (94.6) | 0.000 | 0.383 | 93.6 (96.0) | 0.002 | 0.260 | 89.6 (95.4) |
Hedges’ g, Eq. (10) | 0.000 | 0.267 | 97.6 (95.6) | −0.001 | 0.135 | 95.3 (94.6) | 0.001 | 0.327 | 95.5 (96.0) | 0.002 | 0.255 | 89.6 (95.4) |
Hedges’ g, Eq. (11) | 0.000 | 0.271 | 96.3 (95.9) | −0.001 | 0.135 | 95.2 (94.7) | 0.001 | 0.330 | 94.1 (96.2) | 0.002 | 0.255 | 89.6 (95.4) |
Hedges’ g, Eq. (12) | 0.000 | 0.279 | 96.0 (95.8) | −0.001 | 0.135 | 95.2 (94.7) | 0.000 | 0.341 | 93.7 (96.0) | 0.002 | 0.256 | 89.6 (95.4) |
Hedges’ g, Eq. (13) | 0.000 | 0.281 | 97.1 (95.5) | −0.001 | 0.135 | 95.3 (94.6) | 0.000 | 0.345 | 94.7 (95.8) | 0.002 | 0.256 | 89.6 (95.4) |
Hedges’ g, Eq. (14) | 0.000 | 0.266 | 97.6 (95.6) | −0.001 | 0.135 | 95.3 (94.6) | 0.001 | 0.326 | 95.5 (96.0) | 0.002 | 0.255 | 89.7 (95.4) |
Hedges’ g, Eq. (20) | −0.001 | 0.313 | 96.2 (95.2) [96.9] | −0.001 | 0.137 | 95.2 (94.5) [95.3] | −0.001 | 0.382 | 93.7 (95.4) [94.8] | 0.002 | 0.260 | 89.3 (95.2) [89.5] |
Hedges’ g, Eq. (21) | 0.000 | 0.309 | 96.9 (94.9) [96.6] | −0.001 | 0.137 | 95.2 (94.5) [95.2] | 0.000 | 0.381 | 94.4 (95.1) [94.0] | 0.002 | 0.260 | 89.4 (95.2) [89.4] |
Hedges’ g, Eq. (22) | −0.001 | 0.315 | 95.1 (95.5) [97.4] | −0.001 | 0.137 | 95.0 (94.6) [95.3] | −0.001 | 0.382 | 92.6 (95.6) [95.5] | 0.002 | 0.260 | 89.3 (95.2) [89.6] |
θ = 0.3: | ||||||||||||
Cohen’s d, Eq. (4) | −0.001 | 0.300 | 97.7 (95.6) | 0.000 | 0.138 | 95.6 (95.0) | −0.007 | 0.372 | 95.2 (95.3) | 0.000 | 0.263 | 88.5 (94.8) |
Cohen’s d, Eq. (5) | 0.002 | 0.303 | 96.4 (95.9) | 0.000 | 0.138 | 95.4 (95.0) | −0.001 | 0.378 | 93.8 (95.8) | 0.000 | 0.263 | 88.4 (94.8) |
Cohen’s d, Eq. (6) | 0.010 | 0.312 | 96.1 (95.8) | 0.000 | 0.138 | 95.3 (95.0) | 0.007 | 0.391 | 93.5 (95.8) | 0.000 | 0.264 | 88.4 (94.8) |
Hedges’ g, Eq. (10) | −0.032 | 0.270 | 97.5 (95.7) | −0.005 | 0.135 | 95.6 (95.0) | −0.037 | 0.334 | 95.1 (95.5) | −0.006 | 0.259 | 88.5 (94.8) |
Hedges’ g, Eq. (11) | −0.030 | 0.272 | 96.4 (96.0) | −0.005 | 0.136 | 95.4 (95.0) | −0.033 | 0.338 | 93.9 (95.7) | −0.005 | 0.259 | 88.5 (94.8) |
Hedges’ g, Eq. (12) | −0.023 | 0.280 | 96.0 (95.9) | −0.005 | 0.136 | 95.4 (95.0) | −0.026 | 0.349 | 93.5 (95.7) | −0.005 | 0.259 | 88.5 (94.8) |
Hedges’ g, Eq. (13) | −0.020 | 0.281 | 97.0 (95.6) | −0.005 | 0.136 | 95.6 (95.0) | −0.023 | 0.352 | 94.6 (95.4) | −0.005 | 0.259 | 88.6 (94.8) |
Hedges’ g, Eq. (14) | −0.033 | 0.269 | 97.6 (95.6) | −0.005 | 0.135 | 95.6 (95.0) | −0.039 | 0.333 | 95.2 (95.5) | −0.006 | 0.259 | 88.5 (94.8) |
Hedges’ g, Eq. (20) | 0.003 | 0.311 | 96.3 (95.4) [97.1] | −0.001 | 0.137 | 95.3 (94.9) [95.4] | 0.002 | 0.393 | 93.6 (95.0) [94.6] | 0.000 | 0.264 | 88.3 (94.6) [88.4] |
Hedges’ g, Eq. (21) | 0.002 | 0.307 | 97.0 (95.2) [96.6] | −0.001 | 0.137 | 95.4 (94.9) [95.4] | 0.001 | 0.391 | 94.2 (94.8) [93.9] | 0.000 | 0.264 | 88.3 (94.6) [88.3] |
Hedges’ g, Eq. (22) | 0.002 | 0.313 | 95.3 (95.6) [97.5] | −0.001 | 0.138 | 95.1 (94.9) [95.5] | 0.001 | 0.393 | 92.7 (95.3) [95.4] | 0.000 | 0.264 | 88.2 (94.6) [88.5] |
θ = 0.8: | ||||||||||||
Cohen’s d, Eq. (4) | −0.006 | 0.308 | 98.0 (95.8) | 0.000 | 0.141 | 95.8 (95.0) | −0.023 | 0.386 | 94.6 (95.0) | 0.000 | 0.261 | 89.2 (95.2) |
Cohen’s d, Eq. (5) | 0.004 | 0.313 | 96.4 (95.9) | 0.000 | 0.141 | 95.6 (95.0) | −0.009 | 0.390 | 93.5 (95.4) | 0.001 | 0.261 | 89.2 (95.2) |
Cohen’s d, Eq. (6) | 0.023 | 0.324 | 96.1 (95.9) | 0.000 | 0.141 | 95.6 (95.0) | 0.013 | 0.402 | 93.4 (95.4) | 0.002 | 0.261 | 89.2 (95.2) |
Hedges’ g, Eq. (10) | −0.087 | 0.289 | 96.7 (94.9) | −0.013 | 0.139 | 95.6 (94.9) | −0.103 | 0.359 | 93.5 (94.3) | −0.013 | 0.257 | 89.2 (95.1) |
Hedges’ g, Eq. (11) | −0.082 | 0.292 | 95.1 (95.2) | −0.013 | 0.139 | 95.3 (94.9) | −0.093 | 0.360 | 92.0 (94.7) | −0.012 | 0.257 | 89.1 (95.1) |
Hedges’ g, Eq. (12) | −0.065 | 0.296 | 95.0 (95.2) | −0.013 | 0.139 | 95.3 (94.9) | −0.075 | 0.365 | 92.2 (94.8) | −0.012 | 0.257 | 89.1 (95.1) |
Hedges’ g, Eq. (13) | −0.056 | 0.294 | 96.5 (95.1) | −0.012 | 0.139 | 95.5 (94.9) | −0.068 | 0.366 | 93.5 (94.6) | −0.012 | 0.257 | 89.1 (95.0) |
Hedges’ g, Eq. (14) | −0.090 | 0.289 | 96.7 (94.8) | −0.013 | 0.139 | 95.6 (94.9) | −0.107 | 0.359 | 93.4 (94.2) | −0.013 | 0.257 | 89.2 (95.1) |
Hedges’ g, Eq. (20) | 0.005 | 0.334 | 96.3 (95.0) [97.0] | −0.002 | 0.141 | 95.5 (94.9) [95.7] | −0.001 | 0.405 | 93.3 (94.4) [94.3] | 0.002 | 0.261 | 89.0 (94.9) [89.2] |
Hedges’ g, Eq. (21) | 0.002 | 0.327 | 97.1 (94.8) [96.8] | −0.002 | 0.141 | 95.7 (94.9) [95.6] | −0.004 | 0.403 | 93.9 (94.1) [93.7] | 0.002 | 0.261 | 89.1 (94.9) [89.1] |
Hedges’ g, Eq. (22) | 0.003 | 0.337 | 95.0 (95.3) [97.4] | −0.002 | 0.141 | 95.3 (94.9) [95.7] | −0.004 | 0.405 | 92.1 (94.7) [95.0] | 0.002 | 0.261 | 89.1 (95.0) [89.3] |
Abbreviation: CP, coverage probability of 95% confidence interval; RMSE, root mean squared error.
Notation: N, no. of studies; τ, between-study standard deviation; ni, study-specific sample size; θ, true overall standardized mean difference.
Note: coverage probabilities include those based on conventional meta-analysis method in Equation (23) (outside parentheses and square brackets), based on the Hartung–Knapp–Sidik–Jonkman method in Equation (24) (inside parentheses), and based on the method in Equation (26) (inside square brackets).
TABLE 3. Simulation results (bias, RMSE, and coverage probability of 95% CIs) for meta-analyses with N = 20 studies.
Method | Bias (τ = 0.1, ni = 6–14) | RMSE | CP, % | Bias (τ = 0.1, ni = 30–70) | RMSE | CP, % | Bias (τ = 0.5, ni = 6–14) | RMSE | CP, % | Bias (τ = 0.5, ni = 30–70) | RMSE | CP, %
---|---|---|---|---|---|---|---|---|---|---|---|---
θ = 0: | ||||||||||||
Cohen’s d, Eq. (4) | 0.001 | 0.144 | 97.7 (95.6) | 0.000 | 0.068 | 95.2 (95.0) | 0.000 | 0.177 | 95.4 (95.9) | 0.002 | 0.129 | 93.4 (95.3) |
Cohen’s d, Eq. (5) | 0.001 | 0.144 | 96.3 (96.1) | 0.000 | 0.068 | 95.2 (95.0) | 0.000 | 0.178 | 94.6 (96.4) | 0.002 | 0.129 | 93.4 (95.3) |
Cohen’s d, Eq. (6) | 0.001 | 0.148 | 95.9 (96.1) | 0.000 | 0.068 | 95.1 (95.0) | −0.001 | 0.184 | 94.4 (96.3) | 0.002 | 0.129 | 93.4 (95.2) |
Hedges’ g, Eq. (10) | 0.001 | 0.129 | 97.7 (95.8) | 0.000 | 0.067 | 95.2 (95.0) | 0.000 | 0.157 | 95.6 (96.1) | 0.002 | 0.127 | 93.4 (95.3) |
Hedges’ g, Eq. (11) | 0.001 | 0.129 | 96.4 (96.2) | 0.000 | 0.067 | 95.2 (95.0) | 0.000 | 0.159 | 94.7 (96.5) | 0.002 | 0.127 | 93.4 (95.2) |
Hedges’ g, Eq. (12) | 0.001 | 0.133 | 95.9 (96.1) | 0.000 | 0.067 | 95.2 (95.0) | −0.001 | 0.164 | 94.4 (96.3) | 0.002 | 0.127 | 93.4 (95.2) |
Hedges’ g, Eq. (13) | 0.001 | 0.135 | 97.0 (95.8) | 0.000 | 0.067 | 95.3 (95.0) | 0.000 | 0.167 | 95.1 (95.9) | 0.002 | 0.127 | 93.4 (95.2) |
Hedges’ g, Eq. (14) | 0.001 | 0.128 | 97.8 (95.8) | 0.000 | 0.067 | 95.2 (95.0) | 0.000 | 0.157 | 95.7 (96.1) | 0.002 | 0.127 | 93.4 (95.3) |
Hedges’ g, Eq. (20) | 0.001 | 0.154 | 95.8 (95.1) [96.5] | 0.000 | 0.069 | 95.1 (94.7) [95.2] | −0.001 | 0.192 | 94.1 (95.1) [95.2] | 0.002 | 0.130 | 93.4 (95.0) [93.5] |
Hedges’ g, Eq. (21) | 0.001 | 0.153 | 96.3 (94.9) [96.2] | 0.000 | 0.069 | 95.1 (94.7) [95.1] | −0.001 | 0.192 | 94.3 (94.8) [94.4] | 0.002 | 0.130 | 93.4 (95.0) [93.5] |
Hedges’ g, Eq. (22) | 0.001 | 0.156 | 94.9 (95.3) [97.2] | 0.000 | 0.069 | 94.9 (94.8) [95.2] | −0.001 | 0.193 | 93.9 (95.3) [96.0] | 0.002 | 0.130 | 93.4 (95.0) [93.6] |
θ = 0.3: | ||||||||||||
Cohen’s d, Eq. (4) | −0.012 | 0.145 | 97.6 (95.8) | −0.001 | 0.068 | 95.2 (94.9) | −0.023 | 0.179 | 95.5 (95.8) | −0.003 | 0.130 | 93.3 (95.1) |
Cohen’s d, Eq. (5) | −0.010 | 0.145 | 96.3 (96.2) | −0.001 | 0.068 | 94.9 (94.9) | −0.016 | 0.179 | 94.7 (96.3) | −0.003 | 0.130 | 93.3 (95.1) |
Cohen’s d, Eq. (6) | −0.002 | 0.148 | 96.0 (96.1) | 0.000 | 0.069 | 95.0 (94.9) | −0.006 | 0.184 | 94.7 (96.2) | −0.003 | 0.131 | 93.3 (95.1) |
Hedges’ g, Eq. (10) | −0.042 | 0.136 | 96.9 (94.9) | −0.006 | 0.068 | 95.1 (94.9) | −0.051 | 0.166 | 94.6 (95.4) | −0.008 | 0.128 | 93.3 (95.0) |
Hedges’ g, Eq. (11) | −0.041 | 0.136 | 95.2 (95.2) | −0.006 | 0.068 | 94.9 (94.9) | −0.046 | 0.166 | 93.9 (95.8) | −0.008 | 0.128 | 93.3 (95.0) |
Hedges’ g, Eq. (12) | −0.034 | 0.137 | 95.0 (95.5) | −0.005 | 0.068 | 94.9 (94.9) | −0.038 | 0.169 | 94.0 (96.0) | −0.008 | 0.129 | 93.3 (95.0) |
Hedges’ g, Eq. (13) | −0.029 | 0.138 | 96.5 (95.4) | −0.005 | 0.068 | 95.0 (94.9) | −0.035 | 0.170 | 94.7 (95.7) | −0.008 | 0.129 | 93.3 (95.0) |
Hedges’ g, Eq. (14) | −0.043 | 0.136 | 96.9 (94.8) | −0.006 | 0.068 | 95.1 (94.9) | −0.053 | 0.166 | 94.6 (95.3) | −0.008 | 0.128 | 93.3 (95.0) |
Hedges’ g, Eq. (20) | 0.000 | 0.155 | 95.7 (95.2) [96.5] | 0.000 | 0.069 | 95.0 (94.7) [95.1] | −0.001 | 0.193 | 94.6 (95.4) [95.6] | −0.001 | 0.131 | 93.3 (94.9) [93.5] |
Hedges’ g, Eq. (21) | 0.000 | 0.154 | 96.3 (95.2) [96.2] | 0.000 | 0.069 | 95.0 (94.7) [95.0] | −0.002 | 0.192 | 94.8 (95.2) [94.9] | −0.001 | 0.131 | 93.3 (94.9) [93.4] |
Hedges’ g, Eq. (22) | 0.000 | 0.157 | 95.1 (95.3) [97.1] | 0.000 | 0.069 | 94.8 (94.7) [95.2] | −0.002 | 0.194 | 94.4 (95.7) [96.3] | −0.001 | 0.131 | 93.3 (94.9) [93.4] |
θ = 0.8: | ||||||||||||
Cohen’s d, Eq. (4) | −0.030 | 0.152 | 97.2 (95.3) | −0.002 | 0.070 | 95.4 (95.0) | −0.054 | 0.191 | 94.5 (95.1) | −0.006 | 0.131 | 93.6 (95.3) |
Cohen’s d, Eq. (5) | −0.023 | 0.151 | 95.9 (96.0) | −0.002 | 0.070 | 95.3 (95.1) | −0.037 | 0.188 | 94.0 (95.8) | −0.005 | 0.131 | 93.6 (95.3) |
Cohen’s d, Eq. (6) | −0.004 | 0.153 | 96.0 (96.1) | −0.001 | 0.070 | 95.2 (95.1) | −0.013 | 0.190 | 94.3 (96.3) | −0.004 | 0.131 | 93.6 (95.4) |
Hedges’ g, Eq. (10) | −0.108 | 0.171 | 92.3 (89.2) | −0.015 | 0.071 | 94.7 (94.4) | −0.131 | 0.209 | 87.8 (89.5) | −0.019 | 0.130 | 93.4 (95.2) |
Hedges’ g, Eq. (11) | −0.105 | 0.170 | 89.7 (90.1) | −0.015 | 0.071 | 94.5 (94.4) | −0.118 | 0.203 | 87.6 (91.0) | −0.019 | 0.130 | 93.3 (95.2) |
Hedges’ g, Eq. (12) | −0.088 | 0.163 | 91.1 (91.8) | −0.014 | 0.071 | 94.6 (94.5) | −0.097 | 0.195 | 89.8 (92.6) | −0.018 | 0.130 | 93.4 (95.2) |
Hedges’ g, Eq. (13) | −0.076 | 0.158 | 94.0 (92.6) | −0.014 | 0.071 | 94.6 (94.5) | −0.089 | 0.193 | 91.4 (93.0) | −0.018 | 0.130 | 93.4 (95.2) |
Hedges’ g, Eq. (14) | −0.111 | 0.173 | 92.1 (88.6) | −0.015 | 0.071 | 94.7 (94.4) | −0.135 | 0.211 | 87.3 (89.0) | −0.019 | 0.130 | 93.3 (95.2) |
Hedges’ g, Eq. (20) | 0.001 | 0.162 | 95.7 (95.1) [96.8] | 0.001 | 0.071 | 95.2 (94.9) [95.4] | 0.001 | 0.200 | 94.4 (95.3) [95.6] | −0.001 | 0.131 | 93.5 (95.1) [93.6] |
Hedges’ g, Eq. (21) | 0.000 | 0.160 | 96.4 (94.7) [96.3] | 0.001 | 0.071 | 95.3 (94.9) [95.3] | 0.000 | 0.199 | 94.5 (95.0) [94.5] | −0.001 | 0.131 | 93.4 (95.1) [93.5] |
Hedges’ g, Eq. (22) | 0.001 | 0.164 | 95.0 (95.2) [97.1] | 0.001 | 0.071 | 95.2 (95.0) [95.5] | 0.000 | 0.201 | 94.2 (95.4) [96.3] | −0.001 | 0.131 | 93.5 (95.1) [93.6] |
Abbreviation: CP, coverage probability of 95% confidence interval; RMSE, root mean squared error.
Notation: N, no. of studies; τ, between-study standard deviation; ni, study-specific sample size; θ, true overall standardized mean difference.
Note: coverage probabilities include those based on conventional meta-analysis method in Equation (23) (outside parentheses and square brackets), based on the Hartung–Knapp–Sidik–Jonkman method in Equation (24) (inside parentheses), and based on the method in Equation (26) (inside square brackets).
When the true overall SMD θ = 0, the point estimates produced by all methods were nearly unbiased. This was expected because the true variances of Cohen’s d and Hedges’ g depend only on sample sizes as shown in Equations (3) and (9). Although the estimated variances might be still correlated with Cohen’s d or Hedges’ g, the correlation is likely negligible. Cohen’s d was also nearly unbiased when θ = 0 because by Equation (2) its marginal expectation is E[θi/J(mi)] = θ/J(mi) = 0 = θ.
Biases departed slightly from 0 as θ increased to 0.3. When θ = 0.8, many conventional methods led to noticeable or even substantial biases. Bias reached as high as 0.135 when N = 20, τ = 0.5, and ni = 6–14. Note that Cohen’s d with variance estimators in Equations (4)–(6) generally led to smaller biases than Hedges’ g with variance estimators in Equations (10)–(14), although with larger RMSE in most but not all cases. Under the aforementioned setting (N = 20, τ = 0.5, and ni = 6–14), Cohen’s d based on Equations (4)–(6) outperformed Hedges’ g based on Equations (10)–(14) for all three performance measures (smaller biases, smaller RMSEs, and higher coverage probabilities). On the other hand, Hedges’ g with average-adjusted variance estimators in Equations (20)–(22) led to nearly unbiased point estimates because these methods were specifically designed to reduce the association between Hedges’ g and its variance. This generally led to better coverage probabilities than methods using the variance estimators in Equations (10)–(14) under several settings, especially when N = 20 and ni = 6–14. The CIs derived from Equation (26) may have better coverage probabilities than the conventional CIs derived from Equation (23) but they were generally outperformed by the Hartung–Knapp–Sidik–Jonkman method in Equation (24). Nevertheless, RMSEs based on the average-adjusted variance estimators were sometimes larger than those produced by other methods, likely because weights based on them were not optimal for individual studies.
5 |. EMPIRICAL DATA ANALYSES
We applied the various methods for synthesizing SMDs to two published datasets, each having two subgroups of studies. The first dataset was reported by Gracie et al.67 It consisted of 14 studies investigating the effect of psychological therapy on disease activity indices in quiescent inflammatory bowel disease. These studies were classified into a subgroup studying patients with Crohn’s disease and another studying patients with ulcerative colitis; each subgroup had 7 studies. The second dataset was from Köhler et al.,68 who investigated the effect of anti-inflammatory intervention on depression and depressive symptoms. A total of 13 studies were collected, which were classified into a subgroup studying depression and another studying depressive symptoms.
Figure 3 shows forest plots of these two datasets, where the individual studies’ SMDs were estimated using Hedges’ g and their 95% CIs were based on the variance estimator in Equation (13). The studies in the first dataset had relatively small sample sizes; the study-specific sample size had a median of 37 (minimum, 11; maximum, 73; interquartile range [IQR], 18–51). In the second dataset, the depression subgroup also had relatively small sample sizes (minimum, 18; maximum, 60); the depressive-symptoms subgroup had large sample sizes, with a median of 353 (minimum, 96; maximum, 2233; IQR, 291–377). The last two studies in this dataset had much larger sample sizes (over 1200) than the other studies. This scenario is common in practice, where a few studies are large while most are much smaller. The first dataset also tended to have smaller SMDs (in magnitude) with more homogeneous effects compared to the second dataset. Specifically, based on a preliminary analysis using the estimated SMDs presented in Figure 3, in the first dataset the synthesized overall SMD (across both subgroups) was −0.032, the estimated between-study SD was $\hat{\tau} = 0$, and the heterogeneity measure I2 statistic was 0%. For the Crohn’s disease subgroup, the synthesized SMD was −0.120 with $\hat{\tau} = 0$ and I2 = 0%; for the ulcerative colitis subgroup, the synthesized SMD was 0.061 with $\hat{\tau} = 0$ and I2 = 0%. In the second dataset, the synthesized overall SMD (across both subgroups) was −0.324 with I2 = 84.3%, implying substantial heterogeneity. For the depression subgroup, the synthesized SMD was −0.539 with I2 = 64.7%; for the depressive symptoms subgroup, the synthesized SMD was −0.265 with I2 = 86.4%. Each dataset contributed three meta-analyses (within each subgroup and combining the two subgroups) and represented different settings of meta-analyses of SMDs.
We applied the 11 methods considered in the simulation studies to both datasets. Again, all meta-analyses were performed under the random-effects setting using the R package “metafor”; the between-study variance was estimated using the REML method. Both the conventional and the Hartung–Knapp–Sidik–Jonkman methods in Equations (23) and (24) respectively were used to compute a 95% CI. The CI in Equation (26) was also used for the synthesized Hedges’ g based on the variance estimators in Equations (20)–(22).
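For readers who wish to reproduce such analyses, the following sketch shows one way to do so with metafor from group-level summary statistics; the data frame and its values are hypothetical, not the data of Gracie et al.67 or Köhler et al.68:

```r
# Random-effects meta-analysis of SMDs from group-level summary statistics.
library(metafor)
dat <- data.frame(m1i = c(3.1, 2.8, 4.0), sd1i = c(1.2, 1.0, 1.5), n1i = c(20, 15, 30),
                  m2i = c(3.6, 3.0, 4.4), sd2i = c(1.1, 1.2, 1.4), n2i = c(18, 16, 28))

# measure = "SMD" returns bias-corrected SMDs (Hedges' g) and their variances
es <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
             m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)

rma(yi, vi, data = es, method = "REML", test = "knha")   # REML + HKSJ interval
```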
Table 4 presents the synthesized SMDs and their 95% CIs using the 11 methods. For the first dataset, the meta-analysis of the Crohn’s disease subgroup had noticeable differences in both point and interval estimates when the SMDs were estimated by the various methods. For example, Hedges’ g with Equation (10) gave estimated overall SMD −0.117 with 95% CI (−0.355, 0.122) using the conventional CI, while using Equation (22) gave −0.142 with 95% CI (−0.381, 0.097). As the study-specific sample sizes in this subgroup were moderately small, our simulation studies in Section 4 indicate that the variance estimator in Equation (10) generally produced larger biases than the variance estimator in Equation (22); that is, the point estimate of −0.142 was likely subject to less bias. For the meta-analysis of the ulcerative colitis subgroup, the SMD estimators differed only slightly because the SMDs of these individual studies were close to 0. This is consistent with our simulation studies with θ = 0, in which the point estimates were generally similar. Moreover, the overall meta-analyses (combining the two subgroups) had noticeable differences between methods.
TABLE 4. Synthesized SMDs and their 95% CIs produced by the various methods in the two empirical datasets.
Method | Crohn’s disease | Ulcerative colitis | Overall (disease activity indices) | Depression | Depressive symptoms | Overall (depression and depressive symptoms)
---|---|---|---|---|---|---
Cohen’s d, Eq. (4) | −0.115 | 0.062 | −0.028 | −0.552 | −0.265 | −0.326 |
(−0.358, 0.129) | (−0.187, 0.311) | (−0.202, 0.146) | (−1.071, −0.034) | (−0.456, −0.075) | (−0.516, −0.136) | |
[−0.405, 0.176] | [−0.078, 0.203] | [−0.176, 0.119] | [−1.236, 0.131] | [−0.484, −0.046] | [−0.542, −0.111] | |
Cohen’s d, Eq. (5) | −0.124 | 0.062 | −0.033 | −0.554 | −0.266 | −0.329 |
(−0.363, 0.115) | (−0.182, 0.306) | (−0.203, 0.138) | (−1.068, −0.040) | (−0.456, −0.075) | (−0.519, −0.138) | |
[−0.430, 0.182] | [−0.080, 0.204] | [−0.187, 0.121] | [−1.234, 0.126] | [−0.484, −0.047] | [−0.545, −0.112] | |
Cohen’s d, Eq. (6) | −0.126 | 0.062 | −0.034 | −0.554 | −0.266 | −0.329 |
(−0.365, 0.113) | (−0.182, 0.306) | (−0.204, 0.136) | (−1.068, −0.041) | (−0.456, −0.075) | (−0.520, −0.139) | |
[−0.436, 0.184] | [−0.080, 0.204] | [−0.190, 0.122] | [−1.234, 0.125] | [−0.485, −0.047] | [−0.546, −0.113] | |
Hedges’ g, Eq. (10) | −0.117 | 0.061 | −0.030 | −0.539 | −0.265 | −0.324 |
(−0.355, 0.122) | (−0.183, 0.304) | (−0.200, 0.141) | (−1.044, −0.033) | (−0.455, −0.075) | (−0.512, −0.136) | |
[−0.400, 0.166] | [−0.077, 0.198] | [−0.174, 0.115] | [−1.207, 0.130] | [−0.483, −0.046] | [−0.536, −0.111] | |
Hedges’ g, Eq. (11) | −0.126 | 0.060 | −0.035 | −0.540 | −0.265 | −0.326 |
(−0.360, 0.108) | (−0.178, 0.299) | (−0.202, 0.132) | (−1.041, −0.039) | (−0.455, −0.075) | (−0.515, −0.138) | |
[−0.425, 0.172] | [−0.078, 0.199] | [−0.186, 0.116] | [−1.205, 0.125] | [−0.483, −0.047] | [−0.540, −0.113] | |
Hedges’ g, Eq. (12) | −0.128 | 0.060 | −0.036 | −0.541 | −0.265 | −0.327 |
(−0.362, 0.105) | (−0.178, 0.299) | (−0.203, 0.131) | (−1.041, −0.040) | (−0.455, −0.075) | (−0.515, −0.138) | |
[−0.431, 0.174] | [−0.078, 0.199] | [−0.189, 0.117] | [−1.206, 0.124] | [−0.483, −0.047] | [−0.540, −0.113] | |
Hedges’ g, Eq. (13) | −0.120 | 0.061 | −0.032 | −0.539 | −0.265 | −0.324 |
(−0.359, 0.118) | (−0.183, 0.304) | (−0.202, 0.139) | (−1.044, −0.034) | (−0.455, −0.075) | (−0.512, −0.136) | |
[−0.411, 0.170] | [−0.077, 0.198] | [−0.179, 0.116] | [−1.207, 0.129] | [−0.483, −0.046] | [−0.537, −0.112] | |
Hedges’ g, Eq. (14) | −0.116 | 0.061 | −0.030 | −0.539 | −0.265 | −0.324 |
(−0.355, 0.122) | (−0.183, 0.304) | (−0.200, 0.141) | (−1.044, −0.033) | (−0.455, −0.075) | (−0.512, −0.136) | |
[−0.399, 0.166] | [−0.077, 0.198] | [−0.174, 0.115] | [−1.207, 0.130] | [−0.483, −0.046] | [−0.536, −0.111] | |
Hedges’ g, Eq. (20) | −0.130 | 0.061 | −0.037 | −0.545 | −0.266 | −0.330 |
(−0.368, 0.107) | (−0.182, 0.304) | (−0.207, 0.133) | (−1.046, −0.044) | (−0.456, −0.075) | (−0.519, −0.140) | |
[−0.440, 0.179] | [−0.077, 0.199] | [−0.193, 0.119] | [−1.210, 0.120] | [−0.484, −0.047] | [−0.544, −0.115] | |
{−0.370, 0.109} | {−0.183, 0.305} | {−0.208, 0.134} | {−1.050, −0.040} | {−0.456, −0.075} | {−0.520, −0.139} | |
Hedges’ g, Eq. (21) | −0.126 | 0.061 | −0.035 | −0.544 | −0.265 | −0.328 |
(−0.367, 0.115) | (−0.184, 0.306) | (−0.206, 0.136) | (−1.048, −0.040) | (−0.456, −0.075) | (−0.517, −0.139) | |
[−0.429, 0.176] | [−0.077, 0.199] | [−0.188, 0.118] | [−1.211, 0.123] | [−0.484, −0.047] | [−0.542, −0.114] | |
{−0.365, 0.113} | {−0.182, 0.305} | {−0.206, 0.136} | {−1.048, −0.039} | {−0.456, −0.075} | {−0.518, −0.139} | |
Hedges’ g, Eq. (22) | −0.142 | 0.061 | −0.043 | −0.545 | −0.266 | −0.331 |
(−0.381, 0.097) | (−0.178, 0.299) | (−0.212, 0.126) | (−1.043, −0.047) | (−0.456, −0.075) | (−0.520, −0.141) | |
[−0.468, 0.184] | [−0.079, 0.200] | [−0.206, 0.120] | [−1.209, 0.118] | [−0.484, −0.047] | [−0.545, −0.116] | |
{−0.382, 0.098} | {−0.183, 0.304} | {−0.214, 0.128} | {−1.048, −0.042} | {−0.456, −0.075} | {−0.521, −0.140} |
Note: the confidence intervals inside parentheses are derived using the conventional method in Equation (23); those inside square brackets are derived using the Hartung–Knapp–Sidik–Jonkman method in Equation (24); and those inside braces are derived using the method in Equation (26).
The confidence intervals not encompassing 0 are in boldface.
For the depression subgroup in the second dataset, the point and interval estimates of Cohen’s d and Hedges’ g produced by the various methods differed noticeably. The differences between several estimators exceeded 0.01; because the upper bound of the CI for the synthesized SMD was close to 0, such differences could affect interpretation of the treatment effect. As this subgroup had relatively small sample sizes, these differences were likely due to biases in the meta-estimates produced by the conventional estimators. The depressive-symptoms subgroup had large sample sizes, so the various methods produced nearly identical estimates. The overall meta-analysis combined both the small studies on depression and the large studies on depressive symptoms; the results from the various methods differed only slightly.
The CIs produced by the Hartung–Knapp–Sidik–Jonkman method differed markedly from those produced by the other methods because the number of studies was small in both datasets. Our simulation studies indicated that the CIs from the Hartung–Knapp–Sidik–Jonkman method likely had higher coverage probabilities. For example, in the second dataset’s depression subgroup, all Hartung–Knapp–Sidik–Jonkman CIs contained 0, indicating no significant effect of the anti-inflammatory intervention. The upper bounds of the CIs produced by the other methods, however, were slightly below 0, implying significant effects.
6 |. SOFTWARE PACKAGES
Practitioners with little statistical background may rely on software packages to perform meta-analyses of SMDs. Just as meta-analysis books use inconsistent notation for SMD estimators (as discussed in Section 2.5), software packages may differ in how they calculate the SMD and its within-study variance.
We investigated how popular meta-analysis software packages handle the SMD, including the two core R packages for meta-analysis, “metafor” (version 2.4–0)69 and “meta” (version 4.14–0),70 the commands “metan” and “meta” in Stata 16, Review Manager 5 (RevMan 5, version 5.4.1),71 and Comprehensive Meta-Analysis (CMA, version 3).30 The R packages “metafor” and “meta” are freely available on the Comprehensive R Archive Network (CRAN, https://cran.r-project.org/). Stata is a commercial general-purpose statistical package, available at https://www.stata.com/, that offers several commands for meta-analysis, chiefly “metan” and “meta.” Instructions for meta-analysis of SMDs using the command “metan” can be found in Palmer and Sterne72 (pp. 35, 36, and 53). The Stata command “meta” is a new feature in Stata 16; instructions about SMDs can be found in the manual at https://www.stata.com/manuals/meta.pdf (pp. 83 and 84). RevMan is software specially designed for preparing and maintaining systematic reviews in the Cochrane Library; it is available at https://community.cochrane.org/ and can be used freely to prepare Cochrane reviews or for purely academic purposes. CMA is a commercial program for meta-analysis that can be ordered at https://www.meta-analysis.com/. Neither RevMan nor CMA requires statistical coding, so both can be used easily by practitioners with no statistical training.
Table 5 describes how the aforementioned packages compute the SMD. Most packages use the approximate form of the correction coefficient, whereas the R package “metafor” uses the exact form J(⋅) by default and the “meta” package offers the exact form as an option. To use the exact form in the “meta” package, one specifies the argument exact.smd = TRUE in the function metacont(), as illustrated in the sketch after Table 5. The “metafor” package and RevMan do not provide options to calculate Cohen’s d.
TABLE 5. How the investigated software packages compute the SMD and its within-study variance.

| Software package | Default SMD estimate | Within-study variance of Cohen’s d: default | Within-study variance of Cohen’s d: other options | Within-study variance of Hedges’ g: default | Within-study variance of Hedges’ g: other options | Bias correction coefficient: exact J(⋅) | Bias correction coefficient: approximate |
|---|---|---|---|---|---|---|---|
| R package “metafor” (version 2.4–0) | Hedges’ g | NA | NA | Eq. (13) | Eq. (10); Eq. (20) with the quantity in Eq. (19) | Default | NA |
| R package “meta” (version 4.14–0) | Hedges’ g | Eq. (4) × [J(⋅)]² with exact J(⋅); Eq. (6) with the approximate correction | NA | Eq. (10) with exact J(⋅); Eq. (14) with the approximate correction | NA | Available | Default |
| Command “metan” in Stata 16 | Cohen’s d | Eq. (5) | NA | Eq. (14) | NA | NA | Default |
| Command “meta” in Stata 16 | Hedges’ g | Eq. (6) | Eq. (5) | Eq. (12) | Eq. (14) | NA | Default |
| RevMan 5 (version 5.4.1) | Hedges’ g | NA | NA | Eq. (14) | NA | NA | Default |
| Comprehensive Meta-Analysis (version 3) | Both Cohen’s d and Hedges’ g | Eq. (6) | NA | Eq. (12) | NA | NA | Default |
Note: NA, not available.
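To make the “meta” option concrete, the following minimal sketch (not the authors’ code) requests Hedges’ g with the exact correction factor; the data frame dat and its column names are hypothetical placeholders, and the argument method.smd shown here reflects our assumption about how the estimator is selected.

```r
## Minimal sketch of requesting the exact correction factor J(.) in the
## R package "meta"; "dat" and its column names are hypothetical placeholders.
library(meta)

m <- metacont(n.e = n.trt, mean.e = mean.trt, sd.e = sd.trt,
              n.c = n.ctl, mean.c = mean.ctl, sd.c = sd.ctl,
              data = dat, sm = "SMD",
              method.smd = "Hedges",  # Hedges' g (assumed option for selecting the estimator)
              exact.smd = TRUE)       # exact J(.) rather than the approximation
summary(m)
```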
Different packages also use different estimators of the variance of the SMD. The “meta” package, the commands “metan” and “meta” in Stata 16, and RevMan 5 use the estimator in Equation (14) by default or provide it as an option, possibly because of the confusion between the notation d and g used by Hedges and Olkin.15 In the “metafor” package, the default variance estimator of Hedges’ g is that in Equation (13), specified by the argument vtype = “LS” (for “large-sample”) together with measure = “SMD” in the function escalc(). The unbiased estimator in Equation (10) can be requested with vtype = “UB” (for “unbiased”), and the sample-size-averaged estimator in Equation (20), with the quantity in Equation (19), can be requested with vtype = “AV” (for “average”).
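For illustration, the following minimal sketch (not the authors’ code) computes Hedges’ g with each of these three variance options in “metafor”; the data frame dat and its columns (m1, sd1, n1 for the treatment group and m2, sd2, n2 for the control group) are hypothetical placeholders.

```r
## Minimal sketch of the vtype options for Hedges' g in metafor::escalc();
## "dat" and its column names are hypothetical placeholders.
library(metafor)

## Default large-sample variance estimator, Equation (13)
g_ls <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
               m2i = m2, sd2i = sd2, n2i = n2, data = dat, vtype = "LS")

## Unbiased variance estimator, Equation (10)
g_ub <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
               m2i = m2, sd2i = sd2, n2i = n2, data = dat, vtype = "UB")

## Sample-size-averaged (average-adjusted) variance estimator, Equation (20)
g_av <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
               m2i = m2, sd2i = sd2, n2i = n2, data = dat, vtype = "AV")
```

Each call returns the effect sizes (yi) and the chosen variance estimates (vi), which can then be passed to rma() as in the sketch in Section 5.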
Because the various software packages have different default options, practitioners who apply these packages should be fully aware of the specific formulas used in the analysis and should report sufficient information about the package and its settings. Doing so avoids reproducibility problems when other researchers want to validate the results with different software.
7 |. DISCUSSION
This article reviewed methods to estimate the SMD and evaluated their performance in meta-analysis. Cohen’s d and Hedges’ g are widely used as SMD estimators; their variances can be estimated in many different ways. We found that meta-analysis books and software packages differ substantially in their notation and use of these estimators. As a result, practitioners must be especially careful when they perform meta-analyses of SMDs. We recommend that they report sufficient details about their analyses, including the software packages and estimator types used, so that their results can be readily reproduced.
Hedges’ g has been used frequently to reduce bias in studies with small sample sizes. However, both the asymptotic illustrations in Section 3.2 and the finite-sample simulation studies in Section 4 show that, with the conventional methods reviewed in Sections 2.2 and 2.3, Hedges’ g actually has larger bias (in magnitude) than Cohen’s d. In other words, the bias reduction of Hedges’ g in individual studies is not inherited by the meta-analysis. This phenomenon is caused mainly by the association between the study-specific point estimates and their variance estimates. To maintain the bias reduction of Hedges’ g in meta-analysis, the variance estimators reviewed in Section 3.3 should be used, including those in Equations (20)–(22). These estimators use an averaged SMD to adjust the sample variance of Hedges’ g within individual studies, thus weakening the association between the study-specific Hedges’ g and its variance estimate and reducing bias. The simulation studies in Section 4 indicated that these average-adjusted variance estimators performed similarly. Although they reduce the bias of the synthesized SMD, the average-adjusted variance estimators may not be optimal for individual studies, so the synthesized SMD might have a larger MSE in certain situations.
Our findings suggest that researchers should be careful when making statements like “because the sample sizes are small, we use the unbiased Hedges’ g as the effect measure in meta-analysis.” Such statements are true for individual studies or for meta-analyses using the average-adjusted estimators of the variance of Hedges’ g, but not for conventional methods that are currently used in most applications. Our simulation studies also showed that the Hartung–Knapp–Sidik–Jonkman method gives CIs with satisfactory coverage probabilities. This method is highly recommended in general meta-analysis practice.
We chose the SMD estimators considered in this article by focusing on bias reduction; other estimators are available for other purposes. For example, Hedges and Olkin15 (p. 82) introduced shrunken estimators, and Van Den Noortgate and Onghena73 introduced an empirical Bayes estimator, to reduce the MSE. We have considered only a general framework of meta-analysis; other issues, such as subgrouping or nesting of patients, may need specific methods to better synthesize SMDs.74 Also, the between-study variance was estimated using the REML method in all analyses in this article because it has generally been shown to perform better than other methods (such as the DerSimonian–Laird estimator48).55 For meta-analyses of SMDs, measure-specific variance estimators are also available.75 In future studies, it would be worthwhile to investigate their performance with the different point and variance estimators of the SMD reviewed in this article.
Cohen’s d or Hedges’ g and the corresponding sample variance use aggregated data (AD) from each study. If available, individual patient data (IPD) might help standardize the statistical analyses (the choice of SMD and variance estimator) across studies.76 More appropriate or advanced methods, such as one-step approaches via Bayesian hierarchical models or mixed-effects models that make fewer or weaker assumptions, may be used for synthesizing IPD with continuous outcomes.77,78 Meta-analyses of IPD may offer considerable benefits over AD when sample sizes are small, while their performance is known to be similar for large sample sizes.79,80
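As a concrete (hypothetical) illustration of such a one-step approach, the sketch below fits a linear mixed-effects model to IPD with a random treatment effect across studies; it uses the R package “lme4”, which is not part of the analyses in this article, and it models the raw continuous outcome rather than the SMD. The data frame ipd and its columns are placeholders.

```r
## Minimal sketch of a one-step IPD analysis via a linear mixed-effects model;
## "ipd" with columns y (continuous outcome), trt (0/1 treatment indicator),
## and study (study identifier) is a hypothetical placeholder.
library(lme4)

## Random intercept and random treatment effect by study, broadly analogous
## to a random-effects meta-analysis of the (unstandardized) mean difference.
fit <- lmer(y ~ trt + (1 + trt | study), data = ipd)
summary(fit)
```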
ACKNOWLEDGMENTS
We thank Professor James S. Hodges of the Division of Biostatistics, University of Minnesota, for helping us with writing. We also thank an Associate Editor and two anonymous referees for constructive comments that have substantially improved this article. This research was supported in part by the U.S. National Institutes of Health/National Library of Medicine grant R01 LM012982 and National Institutes of Health/National Center for Advancing Translational Sciences grant UL1 TR001427. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
SUPPORTING INFORMATION
Additional supplemental material that contains theoretical results and R code for all analyses can be found online in the supporting information tab for this article.
References
1. Gurevitch J, Koricheva J, Nakagawa S, Stewart G. Meta-analysis and the science of research synthesis. Nature. 2018;555(7695):175–182.
2. Berlin JA, Golub RM. Meta-analysis as evidence: building a better pyramid. JAMA. 2014;312(6):603–606.
3. Murad MH, Montori VM, Ioannidis JPA, et al. How to read a systematic review and meta-analysis and apply the results to patient care: users’ guides to the medical literature. JAMA. 2014;312(2):171–179.
4. Hoaglin DC. We know less than we should about methods of meta-analysis. Res Synth Methods. 2015;6(3):287–289.
5. Jackson D, White IR. When should meta-analysis avoid making hidden normality assumptions? Biom J. 2018;60(6):1040–1058.
6. Hacke C, Nunan D. Discrepancies in meta-analyses answering the same clinical question were hard to explain: a meta-epidemiological study. J Clin Epidemiol. 2020;119:47–56.
7. Ioannidis JPA. Massive citations to misleading methods and research tools: Matthew effect, quotation error and citation copying. Eur J Epidemiol. 2018;33(11):1021–1023.
8. Hoaglin DC. Misunderstandings about Q and ‘Cochran’s Q test’ in meta-analysis. Stat Med. 2016;35(4):485–495.
9. Doncaster CP, Spake R. Correction for bias in meta-analysis of little-replicated studies. Methods Ecol Evol. 2018;9(3):634–644.
10. Murad MH, Wang Z, Chu H, Lin L. When continuous outcomes are measured using different scales: guide for meta-analysis and interpretation. BMJ. 2019;364:k4817.
11. Boulé NG, Haddad E, Kenny GP, Wells GA, Sigal RJ. Effects of exercise on glycemic control and body mass in type 2 diabetes mellitus: a meta-analysis of controlled clinical trials. JAMA. 2001;286(10):1218–1227.
12. Kirsch I, Deacon BJ, Huedo-Medina TB, Scoboria A, Moore TJ, Johnson BT. Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration. PLoS Med. 2008;5(2):e45.
13. Takeshima N, Sozu T, Tajika A, Ogawa Y, Hayasaka Y, Furukawa TA. Which is more generalizable, powerful and interpretable in meta-analyses, mean difference or standardized mean difference? BMC Med Res Methodol. 2014;14(1):30.
14. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008;358(3):252–260.
15. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press; 1985.
16. da Costa BR, Rutjes AW, Johnston BC, et al. Methods to convert continuous outcomes into odds ratios of treatment response and numbers needed to treat: meta-epidemiological study. Int J Epidemiol. 2012;41(5):1445–1459.
17. Gøtzsche PC, Hróbjartsson A, Marić K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA. 2007;298(4):430–437.
18. Tendal B, Higgins JP, Jüni P, et al. Disagreements in meta-analyses using outcomes measured on continuous or rating scales: observer agreement study. BMJ. 2009;339:b3128.
19. Lin L. Bias caused by sampling error in meta-analysis with small sample sizes. PLoS One. 2018;13(9):e0204056.
20. Hamman EA, Pappalardo P, Bence JR, Peacor SD, Osenberg CW. Bias in meta-analyses using Hedges’ d. Ecosphere. 2018;9(9):e02419.
21. Hedges LV. Distribution theory for Glass’s estimator of effect size and related estimators. J Educ Stat. 1981;6(2):107–128.
22. Hedges LV. Estimation of effect size from a series of independent experiments. Psychol Bull. 1982;92(2):490–499.
23. Hedges LV. A random effects model for effect sizes. Psychol Bull. 1983;93(2):388–395.
24. Cohen J. Statistical Power Analysis for the Behavioral Sciences. New York, NY: Academic Press; 1977.
25. Casella G, Berger RL. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Press; 2001.
26. White IR, Thomas J. Standardized mean differences in individually-randomized and cluster-randomized trials, with applications to meta-analysis. Clin Trials. 2005;2(2):141–151.
27. Egger M, Davey Smith G, Altman D. Systematic Reviews in Health Care: Meta-Analysis in Context. 2nd ed. London, UK: BMJ Publishing Group; 2001.
28. Hartung J, Knapp G, Sinha BK. Statistical Meta-Analysis with Applications. Hoboken, NJ: John Wiley & Sons; 2008.
29. Cooper H, Hedges LV, Valentine JC. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York, NY: Russell Sage Foundation; 2009.
30. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to Meta-Analysis. Chichester, UK: John Wiley & Sons; 2009.
31. Stangl DK, Berry DA. Meta-Analysis in Medicine and Health Policy. New York, NY: Marcel Dekker; 2000.
32. Koricheva J, Gurevitch J, Mengersen K. Handbook of Meta-Analysis in Ecology and Evolution. Princeton, NJ: Princeton University Press; 2013.
33. Pigott TD. Advances in Meta-Analysis. New York, NY: Springer; 2012.
34. Lipsey MW, Wilson DB. Practical Meta-Analysis. Thousand Oaks, CA: Sage Publications; 2001.
35. Schwarzer G, Carpenter JR, Rücker G. Meta-Analysis with R. Cham, Switzerland: Springer; 2015.
36. Schlattmann P. Medical Applications of Finite Mixture Models. Berlin, Germany: Springer; 2009.
37. Glass GV. Primary, secondary, and meta-analysis of research. Educ Res. 1976;5(10):3–8.
38. Glass GV, McGaw B, Smith ML. Meta-Analysis in Social Research. Newbury Park, CA: Sage Publications; 1981.
39. Cruz N, Desai SC, Dewitt S, et al. Widening access to Bayesian problem solving. Front Psychol. 2020;11:660.
40. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
41. Thompson SG, Pocock SJ. Can meta-analyses be trusted? Lancet. 1991;338(8775):1127–1130.
42. Eysenck HJ. Meta-analysis and its problems. BMJ. 1994;309(6957):789–792.
43. Thompson SG. Why sources of heterogeneity in meta-analysis should be investigated. BMJ. 1994;309(6965):1351–1355.
44. Cappelleri JC, Ioannidis JPA, Schmid CH, et al. Large trials vs meta-analysis of smaller trials: how do their results compare? JAMA. 1996;276(16):1332–1338.
45. LeLorier J, Gregoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med. 1997;337(8):536–542.
46. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Lancet. 1999;354(9193):1896–1900.
47. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods. 2010;1(2):97–111.
48. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–188.
49. Paule RC, Mandel J. Consensus values and weighting factors. J Res Natl Bur Stand. 1982;87(5):377–385.
50. Normand SLT. Meta-analysis: formulating, evaluating, combining, and reporting. Stat Med. 1999;18(3):321–359.
51. Cornell JE, Mulrow CD, Localio R, et al. Random-effects meta-analysis of inconsistent effects: a time for change. Ann Intern Med. 2014;160(4):267–270.
52. Langan D, Higgins JPT, Simmonds M. An empirical comparison of heterogeneity variance estimators in 12894 meta-analyses. Res Synth Methods. 2015;6(2):195–205.
53. Veroniki AA, Jackson D, Viechtbauer W, et al. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res Synth Methods. 2016;7(1):55–79.
54. van Aert RCM, Jackson D. Multistep estimators of the between-study variance: the relationship with the Paule-Mandel estimator. Stat Med. 2018;37(17):2616–2629.
55. Langan D, Higgins JPT, Jackson D, et al. A comparison of heterogeneity variance estimators in simulated random-effects meta-analyses. Res Synth Methods. 2019;10(1):83–98.
56. Walker GA, Saw JG. The distribution of linear combinations of t-variables. J Am Stat Assoc. 1978;73(364):876–878.
57. Boos DD, Stefanski LA. Essential Statistical Inference: Theory and Methods. New York, NY: Springer; 2013.
58. Sánchez-Meca J, Marín-Martínez F. Weighting by inverse variance or by sample size in meta-analysis: a simulation study. Educ Psychol Meas. 1998;58(2):211–220.
59. Berkey CS, Hoaglin DC, Mosteller F, Colditz GA. A random-effects regression model for meta-analysis. Stat Med. 1995;14(4):395–411.
60. Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Stat Med. 2002;21(21):3153–3159.
61. Knapp G, Hartung J. Improved tests for a random effects meta-regression with a single covariate. Stat Med. 2003;22(17):2693–2710.
62. Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Stat Med. 2001;20(24):3875–3889.
63. Sidik K, Jonkman JN. On constructing confidence intervals for a standardized mean difference in meta-analysis. Commun Stat Simul Comput. 2003;32(4):1191–1203.
64. IntHout J, Ioannidis JPA, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol. 2014;14(1):25.
65. Henmi M, Copas JB. Confidence intervals for random effects meta-analysis and robustness to publication bias. Stat Med. 2010;29(29):2969–2983.
66. Higgins JPT. Commentary: Heterogeneity in meta-analysis should be expected and appropriately quantified. Int J Epidemiol. 2008;37(5):1158–1160.
67. Gracie DJ, Irvine AJ, Sood R, Mikocka-Walus A, Hamlin PJ, Ford AC. Effect of psychological therapy on disease activity, psychological comorbidity, and quality of life in inflammatory bowel disease: a systematic review and meta-analysis. Lancet Gastroenterol Hepatol. 2017;2(3):189–199.
68. Köhler O, Benros ME, Nordentoft M, et al. Effect of anti-inflammatory treatment on depression, depressive symptoms, and adverse effects: a systematic review and meta-analysis of randomized clinical trials. JAMA Psychiatry. 2014;71(12):1381–1391.
69. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.
70. Schwarzer G. meta: An R package for meta-analysis. R News. 2007;7(3):40–45.
71. Higgins JPT, Thomas J, Chandler J, et al. Cochrane Handbook for Systematic Reviews of Interventions. 2nd ed. Chichester, UK: John Wiley & Sons; 2019.
72. Palmer TM, Sterne JAC. Meta-Analysis in Stata: An Updated Collection from the Stata Journal. 2nd ed. College Station, TX: Stata Press; 2016.
73. Van Den Noortgate W, Onghena P. Estimating the mean effect size in meta-analysis: bias, precision, and mean squared error of different weighting methods. Behav Res Methods Instrum Comput. 2003;35(4):504–511.
74. Walwyn R, Roberts C. Meta-analysis of standardised mean differences from randomised trials with treatment-related clustering associated with care providers. Stat Med. 2017;36(7):1043–1067.
75. Bakbergenuly I, Hoaglin DC, Kulinskaya E. Estimation in meta-analyses of mean difference and standardized mean difference. Stat Med. 2020;39(2):171–191.
76. Riley RD, Lambert PC, Abo-Zaid G. Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ. 2010;340:c221.
77. Higgins JPT, Whitehead A, Turner RM, Omar RZ, Thompson SG. Meta-analysis of continuous outcome data from individual patients. Stat Med. 2001;20(15):2219–2241.
78. Riley RD, Lambert PC, Staessen JA, et al. Meta-analysis of continuous outcomes combining individual patient data and aggregate data. Stat Med. 2008;27(11):1870–1893.
79. Zeng D, Lin DY. On random-effects meta-analysis. Biometrika. 2015;102(2):281–294.
80. Tierney JF, Fisher DJ, Burdett S, Stewart LA, Parmar MKB. Comparison of aggregate and individual participant data approaches to meta-analysis of randomised trials: an observational study. PLoS Med. 2020;17(1):e1003019.
81. Hunter JE, Schmidt FL. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: SAGE Publications; 1990.
82. Card NA. Applied Meta-Analysis for Social Science Research. New York, NY: Guilford Press; 2012.