Author manuscript; available in PMC: 2019 Dec 10.
Published in final edited form as: Stat Med. 2018 Jul 18;37(28):4114–4125. doi: 10.1002/sim.7905

Quantifying and presenting overall evidence in network meta-analysis

Lifeng Lin 1,*
PMCID: PMC6235692  NIHMSID: NIHMS976196  PMID: 30019428

Summary

Network meta-analysis (NMA) has become an increasingly used tool in clinical research for comparing multiple treatments simultaneously by synthesizing direct and indirect evidence. However, many existing studies did not properly report the evidence of treatment comparisons or show the comparison structure to readers. Also, nearly all treatment networks presented only direct evidence, not the overall evidence that reflects the benefit of performing NMAs. This article classifies treatment networks into three types under different assumptions: networks in which each treatment comparison's edge width is proportional to the corresponding number of studies, sample size, or precision. Three new measures (the effective number of studies, the effective sample size, and the effective precision) are proposed to preliminarily quantify the overall evidence gained in NMAs. They permit readers to intuitively evaluate the benefit of performing an NMA compared with pairwise meta-analyses based on only direct evidence. We use four case studies, including one illustrative example, to demonstrate their derivations and interpretations. Treatment networks may look quite different when different measures are used to present the evidence. The proposed measures provide clear information about the overall evidence of all treatment comparisons, and they also indicate the additional number of studies, sample size, and precision obtained from indirect evidence. Some comparisons may benefit little from NMAs. Researchers are encouraged to present the overall evidence of all treatment comparisons, so that readers can preliminarily evaluate the quality of NMAs.

Keywords: direct and indirect evidence, effective number of studies, effective sample size, effective precision, network meta-analysis

1 | INTRODUCTION

Systematic reviews and meta-analyses have become commonly used statistical tools in recent decades for synthesizing evidence of treatment comparisons in clinical research and many other scientific areas.1 As new treatments are continuously developed for various diseases, network meta-analysis (NMA), also known as mixed-treatment comparison or multiple-treatment meta-analysis, is increasingly popular in this era of data sharing.2 As an extension of the traditional pairwise meta-analysis, an NMA compares multiple treatments at one time and synthesizes both direct and indirect evidence for the comparisons. Owing to these appealing features, researchers can obtain more precise treatment effect estimates, compare treatments that lack direct evidence, and rank all available treatments for better decision making.3,4,5,6

So far, much attention has been given to modeling NMAs. Various methods have been proposed for NMAs from different statistical perspectives, such as Bayesian hierarchical models,4,5,7 frequentist methods,8,9 and graph theory.10 Extensive research has also focused on checking critical assumptions (e.g., transitivity, consistency, and heterogeneity) and conducting diagnostics in NMAs.11,12,13,14,15,16 However, less attention has been paid to reporting NMAs, and results have usually been reported rather inconsistently.17

We may distinguish the reporting of an NMA into two classes. One is pre-analysis reporting, before rigorous statistical models are applied to perform the NMA. This includes presenting the characteristics of the collected studies and treatments and showing the structure of treatment comparisons. Such pre-analysis evidence allows readers to preliminarily evaluate the NMA's quality. Currently, comparisons are usually presented using a treatment network, also known as an evidence network, in which each node represents a treatment and each edge shows a comparison between the corresponding treatments.6,18 For example, if all treatments are compared with only a common comparator, they form a star-shaped network, which may be poorly connected and contain few sources of evidence. If all treatments are mutually compared, their network is often referred to as a fully connected network, which contains many sources of direct and indirect evidence for treatment comparisons. Reporting the network is simple, straightforward, and intuitive. However, according to the survey by Bafeta et al.,17 among 121 NMAs, 83% did not report treatment networks. The other class is post-analysis reporting, which presents the results derived from direct evidence, indirect evidence, and their synthesis by an NMA, and provides information about treatment ranking.9,19,20,21,22 In the same survey by Bafeta et al.,17 however, many NMAs still did not report these post-analysis results that are important for decision making. Recently, the PRISMA-NMA statement23 appeared in the literature; hopefully it will provide better guidance for properly conducting and reporting future NMAs.

This article focuses on the pre-analysis reporting of NMAs. Although treatment networks seem simple, they are usually reported inconsistently. For example, in the PRISMA-NMA statement,23 the width of a network edge was proportional to the number of studies that directly compared the corresponding treatments, whereas in an NMA of non-acute coronary artery disease,24 the edge width was proportional to the cumulative number of patients for each comparison. Both types of networks have certain advantages and flexibility. Since the number of studies for each comparison is always available in an NMA, the former type can be presented in all cases; some papers may not report the total number of patients, so the latter type may be infeasible in this situation. Nevertheless, because in practice the collected studies differ in size and thus contribute different strengths of evidence, the latter network presents the evidence more accurately than the former. In fact, when we use the treatment network with each edge width proportional to the number of studies, the presented evidence is based on the condition that each study contributes equal evidence; with each edge width proportional to the cumulative number of patients, the evidence is based on the condition that each patient in all studies contributes equal evidence. Although both conditions may be unrealistic, these treatment networks are essential for researchers to preliminarily understand the structures of treatment comparisons in NMAs.

Nearly all current treatment networks present direct evidence. However, preliminarily assessing the overall evidence obtained in an NMA is equally important, because it provides insight into how much the NMA improves treatment effect estimates compared with the much simpler pairwise meta-analyses. Motivated by the foregoing networks of direct evidence, this article proposes the effective number of studies, the effective sample size, and the effective precision to quantify the overall evidence in NMAs at the pre-analysis stage.

2 | METHODS

2.1 | The effective number of studies: an example

Consider an NMA with N studies; each study compares a subset of K treatments. Let N_{hk} be the number of studies that directly compare treatment k vs. h for 1 ≤ h < k ≤ K, so N_{hk} = N_{kh} and N = Σ_{h<k} N_{hk}.

We take the NMA of non-acute coronary artery disease by Trikalinos et al.24 as an illustrative example. It has a binary outcome (death) and compares K = 4 treatments: medical therapy, percutaneous transluminal balloon coronary angioplasty, bare-metal stents, and drug-eluting stents. Originally, 61 trials were collected. Among them, one trial compares percutaneous transluminal balloon coronary angioplasty vs. medical therapy in patients stratified by single- and double-vessel coronary artery disease, and another trial compares slow- and moderate-release paclitaxel-eluting stents vs. bare-metal stents. Trikalinos et al.24 treated each of these two trials as two data entries in the NMA, and this article follows their analysis. Therefore, the NMA effectively contains N = 63 unique studies; N_{12} = 7, N_{13} = 4, N_{23} = 34, and N_{34} = 18. Its treatment network is shown in Figure 1(a), with edge width proportional to the corresponding N_{hk}. Treatments 1, 2, and 3 form an evidence cycle, but the comparison 4 vs. 3 is not in any cycle.

FIGURE 1

Treatment networks of the NMA by Trikalinos et al.24 Each network’s edge width is proportional to the measure shown in the subfigure’s title. The treatment IDs are: (1) medical therapy; (2) percutaneous transluminal balloon coronary angioplasty; (3) bare-metal stents; and (4) drug-eluting stents.

In this convenient example, each of the 63 studies compares two treatment arms. However, an NMA may contain a few multi-arm studies. To derive the effective number of studies, we treat the treatment comparisons in a multi-arm study as separate comparisons from different studies, so the effective number of studies may be overestimated (see more details in Section 4). For example, if a study compares treatments 1, 2, and 3, then it is decomposed as three studies of 2 vs. 1, 3 vs. 2, and 3 vs. 1. In addition, we tentatively assume that all studies contribute the same strength of evidence to the NMA and their estimated effect sizes (e.g., the log odds ratio for binary outcomes) share a common within-study variance σ². This assumption was also used to derive the popular I² statistic for assessing between-study heterogeneity.25
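The decomposition rule above can be sketched in a few lines (a minimal illustration; the function name is our own, not from the article's supplementary R code):

```python
from itertools import combinations

def decompose_multi_arm(arms):
    """Decompose a multi-arm study into all pairwise two-arm comparisons.

    For example, a three-arm study of treatments 1, 2, and 3 is treated
    as three separate two-arm studies: 2 vs. 1, 3 vs. 1, and 3 vs. 2.
    """
    return list(combinations(sorted(arms), 2))

# A three-arm study comparing treatments 1, 2, and 3:
print(decompose_multi_arm([1, 2, 3]))  # [(1, 2), (1, 3), (2, 3)]
```

This is also why the effective number of studies is an upper bound: the decomposed comparisons are treated as if they came from independent studies.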

First, we focus on the overall evidence of 2 vs. 1. Let y_{12,0} be the relative effect estimate using the direct evidence. Since N_{12} studies directly compare 2 vs. 1, the variance of y_{12,0} is υ_{12,0} = σ² N_{12}^{-1} under the fixed-effects setting. The NMA incorporates the indirect evidence from the comparisons 3 vs. 1 and 2 vs. 3 into the overall evidence of 2 vs. 1, and we denote the relative effect estimate using such indirect evidence as y_{12,1} = y_{13,0} + y_{32,0}. Here, y_{13,0} and y_{32,0} are the relative effect estimates using the direct evidence of 3 vs. 1 and 2 vs. 3, respectively, so the variance of y_{12,1} is υ_{12,1} = σ² (N_{13}^{-1} + N_{23}^{-1}). Using the traditional inverse-variance method, the overall relative effect of 2 vs. 1 is estimated as

y_{12} = (υ_{12,0}^{-1} y_{12,0} + υ_{12,1}^{-1} y_{12,1}) / (υ_{12,0}^{-1} + υ_{12,1}^{-1}),

and its variance is

υ_{12} = (υ_{12,0}^{-1} + υ_{12,1}^{-1})^{-1} = σ² [N_{12} + (N_{13}^{-1} + N_{23}^{-1})^{-1}]^{-1}.

The effective number of studies for the overall evidence of 2 vs. 1, denoted as E_{12}, is defined as the quantity that satisfies υ_{12} = σ² E_{12}^{-1}. Therefore,

E_{12} = N_{12} + (N_{13}^{-1} + N_{23}^{-1})^{-1} = 10.6; (1)

that is, approximately 10.6 studies effectively contribute to the overall evidence of 2 vs. 1. Similarly, the effective numbers of studies for 3 vs. 1 and 3 vs. 2 are E_{13} = 9.8 and E_{23} = 36.5.
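Equation (1) and its analogues for the other two edges of the cycle can be checked numerically (a sketch using the study counts from the illustrative example):

```python
# Study counts from the Trikalinos et al. example
N12, N13, N23 = 7, 4, 34

# Effective number of studies: direct studies plus the indirect
# contribution from the remaining two edges of the cycle
E12 = N12 + 1 / (1 / N13 + 1 / N23)
E13 = N13 + 1 / (1 / N12 + 1 / N23)
E23 = N23 + 1 / (1 / N12 + 1 / N13)

print(round(E12, 1), round(E13, 1), round(E23, 1))  # 10.6 9.8 36.5
```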

Equation (1) indicates that the indirect evidence effectively contributes an additional (N_{13}^{-1} + N_{23}^{-1})^{-1} = 3.6 studies to the overall effect estimate for 2 vs. 1. The indirect evidence depends on each of its component comparisons (3 vs. 1 and 2 vs. 3). Because (N_{13}^{-1} + N_{23}^{-1})^{-1} does not exceed the minimum of N_{13} and N_{23}, any component with weak evidence dilutes the whole indirect evidence. In an extreme case, if N_{13} = 0, then (N_{13}^{-1} + N_{23}^{-1})^{-1} = 0.

Second, consider the comparison 4 vs. 3. Because the overall relative effect y_{34} is informed only by the direct evidence y_{34,0} with variance σ² N_{34}^{-1}, its effective number of studies is E_{34} = N_{34} = 18.

Finally, for the comparison 4 vs. 1, its overall relative effect y_{14} is informed only by the indirect evidence, which has two sources: a) 4 vs. 3 plus 3 vs. 1; and b) 4 vs. 3 plus 3 vs. 2 plus 2 vs. 1. Equivalently, we may view the overall relative effect as a combination of the direct evidence of 4 vs. 3 and the overall evidence of 3 vs. 1; that is, y_{14} = y_{13} + y_{34,0}. Its variance is υ_{14} = σ² (E_{13}^{-1} + N_{34}^{-1}), so the effective number of studies for 4 vs. 1 is E_{14} = (E_{13}^{-1} + N_{34}^{-1})^{-1} = 6.3. Similarly, the effective number of studies for 4 vs. 2 is E_{24} = (E_{23}^{-1} + N_{34}^{-1})^{-1} = 12.1. Figure 1(d) presents the network based on the E_{hk}'s.
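The chained computation for the two comparisons involving treatment 4 follows directly (a sketch continuing the study counts above):

```python
N12, N13, N23, N34 = 7, 4, 34, 18

# Overall evidence within the 1-2-3 cycle (as derived above)
E13 = N13 + 1 / (1 / N12 + 1 / N23)
E23 = N23 + 1 / (1 / N12 + 1 / N13)

# Comparisons involving treatment 4 are informed only via edge 4 vs. 3,
# so the effective numbers combine like inverse-variance weights in series
E14 = 1 / (1 / E13 + 1 / N34)
E24 = 1 / (1 / E23 + 1 / N34)

print(round(E14, 1), round(E24, 1))  # 6.3 12.1
```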

2.2 | Generalizing the effective number of studies

The above calculation is simple and straightforward for networks with a few treatments and evidence cycles; however, it becomes much more complicated when more cycles appear. This section explores the general formula for the effective number of studies. Again, E_{hk} is the quantity that satisfies υ_{hk} = σ² E_{hk}^{-1}, where υ_{hk} is the variance of the overall relative effect estimate of k vs. h. Clearly, E_{hk} = E_{kh}. Let θ_{hk} be the true relative effect of k vs. h. If the two treatments are directly compared, let y_{hk,0} be the relative effect estimate using the direct evidence, whose population mean is θ_{hk}. Also, suppose that there are M sources of indirect evidence for k vs. h. In the illustrative example, M = 0 for 4 vs. 3; M = 1 for 2 vs. 1, 3 vs. 1, and 3 vs. 2; and M = 2 for 4 vs. 1 and 4 vs. 2. If M > 0, denote y_{hk,m} as the relative effect estimate using the mth source (0 < m ≤ M) of the indirect evidence. Each source is a combination of the direct evidence of certain comparisons; that is, there is a path via treatment nodes l_1, l_2, …, l_q that links treatments h and k in the network, so y_{hk,m} = y_{h l_1,0} + y_{l_1 l_2,0} + ⋯ + y_{l_{q−1} l_q,0} + y_{l_q k,0}. Under the assumption of consistency between direct and indirect evidence, the relative effect estimate using each source of the indirect evidence has the same population mean θ_{hk}. Consequently, we consider a fixed-effect model:

y = (y_{hk,0}, y_{hk,1}, …, y_{hk,M})^T ~ N(θ_{hk} 1, V_{hk}),

where 1 is an (M + 1)-dimensional vector of ones, and V_{hk} = (υ_{hk,rs}) is the variance-covariance matrix of the relative effect estimates using the different sources of evidence for k vs. h. The maximum likelihood method yields the overall relative effect estimate

θ̂_{hk} = (1^T V_{hk}^{-1} 1)^{-1} (1^T V_{hk}^{-1} y)

with variance

υ_{hk} = (1^T V_{hk}^{-1} 1)^{-1}.

Under the tentative assumption that all studies have equal variance σ², the variance of the relative effect estimate using the direct evidence is υ_{hk,0} = σ² N_{hk}^{-1}, and the variance of that using the mth source of the indirect evidence is υ_{hk,m} = σ² (N_{h l_1}^{-1} + N_{l_1 l_2}^{-1} + ⋯ + N_{l_{q−1} l_q}^{-1} + N_{l_q k}^{-1}). If both direct and indirect evidence are available, the correlation between their relative effect estimates is zero; that is, the first row and column of V_{hk} are all zero except the first diagonal element υ_{hk,0}. If the relative effect estimates y_{hk,r} and y_{hk,s} (0 < r ≠ s ≤ M) using different sources of the indirect evidence share some common treatment comparisons, say l_j vs. l_i, then their covariance is υ_{hk,rs} = σ² Σ_{(i,j)} N_{l_i l_j}^{-1}, where the sum is over the shared comparisons. Consequently, the variance-covariance matrix can be written as V_{hk} = σ² S_{hk}, where the matrix S_{hk} depends only on the N_{hk}'s. Because the variance of the overall relative effect estimate is υ_{hk} = σ² E_{hk}^{-1} = σ² (1^T S_{hk}^{-1} 1)^{-1}, the effective number of studies for the overall evidence of k vs. h is calculated as

E_{hk} = 1^T S_{hk}^{-1} 1.

When a comparison is informed by many sources of evidence, some sources may be linear combinations of others; therefore, the matrix S_{hk} may be positive semidefinite, and thus y may follow a degenerate multivariate normal distribution.26 Using the eigen-decomposition of S_{hk}, we can project y to a lower dimension that retains all evidence and calculate the effective number of studies as E_{hk} = 1^T Q_{hk}^{+} (Λ_{hk}^{+})^{-1} (Q_{hk}^{+})^T 1, where Λ_{hk}^{+} is the diagonal matrix of all positive eigenvalues of S_{hk} and their corresponding eigenvectors form the columns of Q_{hk}^{+}.

We revisit the illustrative example to demonstrate the general formula for E_{hk}. Consider the comparison 2 vs. 1; it is informed by the direct evidence, providing the relative effect estimate y_{12,0}, and one source of indirect evidence, providing the estimate y_{12,1} = y_{13,0} + y_{32,0}. Therefore, the matrix S_{12} is

S_{12} = [ N_{12}^{-1}    0
           0              N_{13}^{-1} + N_{23}^{-1} ],

and the effective number of studies for 2 vs. 1 is E_{12} = 1^T S_{12}^{-1} 1 = 10.6. The comparison 4 vs. 1 is informed by no direct evidence but two sources of indirect evidence, providing the relative effect estimates y_{14,1} = y_{13,0} + y_{34,0} and y_{14,2} = y_{12,0} + y_{23,0} + y_{34,0}. Then, the matrix S_{14} is

S_{14} = [ N_{13}^{-1} + N_{34}^{-1}    N_{34}^{-1}
           N_{34}^{-1}                  N_{12}^{-1} + N_{23}^{-1} + N_{34}^{-1} ];

thus, E_{14} = 1^T S_{14}^{-1} 1 = 6.3.
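The matrix formula is easy to evaluate numerically. A sketch with NumPy follows (the helper name is our own); the Moore–Penrose pseudoinverse reproduces the eigen-projection when S is positive semidefinite and reduces to the ordinary inverse otherwise:

```python
import numpy as np

def effective_number(S):
    """Effective number of studies E = 1^T S^{-1} 1.

    np.linalg.pinv inverts only the positive eigenvalues, so a degenerate
    (positive-semidefinite) S is handled by the same call.
    """
    one = np.ones(S.shape[0])
    return float(one @ np.linalg.pinv(S) @ one)

N12, N13, N23, N34 = 7, 4, 34, 18

# Direct plus one indirect source for 2 vs. 1 (block-diagonal S_12)
S12 = np.diag([1 / N12, 1 / N13 + 1 / N23])
# Two correlated indirect sources for 4 vs. 1 (shared edge 4 vs. 3)
S14 = np.array([[1 / N13 + 1 / N34, 1 / N34],
                [1 / N34, 1 / N12 + 1 / N23 + 1 / N34]])

print(round(effective_number(S12), 1))  # 10.6
print(round(effective_number(S14), 1))  # 6.3
```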

2.3 | The effective sample size

The assumption of equal variance in each study is unrealistic in practice. If the sample size of each study is available, then we may relax this assumption: the precision of the relative effect estimate provided by each study is proportional to its sample size. Specifically, denote n_i as the total sample size across all treatment groups in study i (i = 1, 2, …, N). Suppose the variance of the relative effect estimate provided by study i is υ_i = c n_i^{-1}, where c is a common constant across studies. For example, the constant c can reflect the variance of the participants' responses within each study when the effect size is the mean difference; it differs from the σ² in Section 2.1, which denotes the variance of study i's summary result.

Let A_{hk} be the set of studies that directly compare treatments h and k. Using the inverse-variance method, the variance of the relative effect estimate using the direct evidence of k vs. h is υ_{hk,0} = (Σ_{i∈A_{hk}} υ_i^{-1})^{-1} = c (Σ_{i∈A_{hk}} n_i)^{-1}. That is, for each treatment comparison, the sample size that effectively contributes to the direct evidence is the cumulative number of patients in the studies that directly make the comparison. Denote this cumulative sample size for the comparison k vs. h as n_{hk} = Σ_{i∈A_{hk}} n_i. This was used by Trikalinos et al.24 to present the network of the direct evidence; Figure 1(b) gives the network with edge width proportional to the n_{hk}'s.

To derive the sample size that effectively contributes to the overall evidence for each comparison, we follow the foregoing calculation of the effective number of studies. Now, the variance of the relative effect estimate using the direct evidence is υ_{hk,0} = c n_{hk}^{-1}, instead of σ² N_{hk}^{-1} as in Section 2.2. Denote the effective sample size for the overall evidence of k vs. h as ESS_{hk}; similarly, it can be calculated as ESS_{hk} = 1^T S_{hk}^{-1} 1. The matrix S_{hk} here is based on the cumulative numbers of patients n_{hk}, not the numbers of studies N_{hk}. In the illustrative example, 25,388 patients were included in the NMA, and the cumulative numbers of patients for the direct comparisons are n_{12} = 1991, n_{13} = 4619, n_{23} = 11,020, and n_{34} = 7758. Consequently, the effective sample size for the overall evidence of 2 vs. 1 is ESS_{12} = n_{12} + (n_{13}^{-1} + n_{23}^{-1})^{-1} = 5246; similarly, ESS_{13} = 6305 and ESS_{23} = 12,411. As 4 vs. 3 is informed only by direct evidence, ESS_{34} = n_{34} = 7758. For the comparison 4 vs. 1,

S_{14} = [ n_{13}^{-1} + n_{34}^{-1}    n_{34}^{-1}
           n_{34}^{-1}                  n_{12}^{-1} + n_{23}^{-1} + n_{34}^{-1} ],

so ESS_{14} = 1^T S_{14}^{-1} 1 = 3478. Similarly, ESS_{24} = 4774 for 4 vs. 2. Figure 1(e) presents the network with edge width proportional to the effective sample size.
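Replacing the study counts with the cumulative sample sizes yields the effective sample sizes under the same matrix formula (a numerical sketch):

```python
import numpy as np

# Cumulative numbers of patients for the direct comparisons
n12, n13, n23, n34 = 1991, 4619, 11020, 7758

# 2 vs. 1: direct evidence plus one indirect source
ESS12 = n12 + 1 / (1 / n13 + 1 / n23)

# 4 vs. 1: two correlated indirect sources sharing edge 4 vs. 3
S14 = np.array([[1 / n13 + 1 / n34, 1 / n34],
                [1 / n34, 1 / n12 + 1 / n23 + 1 / n34]])
one = np.ones(2)
ESS14 = float(one @ np.linalg.inv(S14) @ one)

print(round(ESS12), round(ESS14))  # 5246 3478
```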

2.4 | The effective precision

We can further refine the measure of the overall evidence in an NMA if more information is available for each study. For example, for binary outcomes, suppose that we know all four cells of the 2 × 2 table of each study, and denote them as n_{i00}, n_{i01}, n_{i10}, and n_{i11} for study i. Here, the first 0/1 index in the subscript indicates the treatment group (control/treatment) and the second indicates the outcome (no/yes). When the log odds ratio is used as the effect size, the within-study variance is usually approximated as υ_i = n_{i00}^{-1} + n_{i01}^{-1} + n_{i10}^{-1} + n_{i11}^{-1}; a correction of 0.5 can be added to each cell if any cell in the 2 × 2 table is zero.27 Therefore, as in Section 2.3, the variance of the relative effect estimate using the direct evidence of k vs. h is υ_{hk,0} = (Σ_{i∈A_{hk}} υ_i^{-1})^{-1}, where A_{hk} contains the studies that directly compare treatments h and k. Thus, the precision of the direct evidence is P_{hk} = υ_{hk,0}^{-1}, and Figure 1(c) shows the network with edge width proportional to this precision. Furthermore, we can use these variances/precisions to obtain the matrix S_{hk} as in the calculation of the effective number of studies, and the effective precision of the overall evidence is similarly defined as EP_{hk} = 1^T S_{hk}^{-1} 1. Note that, to derive the effective precision, the variances υ_{hk,0} do not involve other parameters such as σ² and c in Sections 2.1–2.3, so the matrix S_{hk} equals the variance-covariance matrix V_{hk}. In the NMA by Trikalinos et al.,24 which has a binary outcome (death), the effective precisions of the overall evidence are EP_{12} = 67.4, EP_{13} = 102, EP_{23} = 65.2, EP_{34} = 25.3, EP_{14} = 20.3, and EP_{24} = 18.2. Figure 1(f) shows the comparisons with edge width proportional to the effective precision.
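The within-study variance of the log odds ratio, with the 0.5 continuity correction, can be sketched as follows (the cell counts and function name are hypothetical, for illustration only):

```python
def log_or_variance(n00, n01, n10, n11):
    """Approximate within-study variance of the log odds ratio.

    Adds a 0.5 continuity correction to every cell when any cell is zero.
    """
    cells = [n00, n01, n10, n11]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    return sum(1 / c for c in cells)

# Hypothetical 2x2 table: 40 controls (10 events), 50 treated (8 events)
v = log_or_variance(30, 10, 42, 8)
print(round(1 / v, 1))  # 3.5 -- precision contributed by this study
```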

When the outcome is continuous, suppose that we know the sample sizes of both treatment groups in each study, denoted as n_{i0} and n_{i1} in study i. Assuming that the treatment groups share a common population variance within each study, the mean difference and the standardized mean difference (including Cohen's d and Hedges' g) are widely used as the effect size.28,29 The mean difference's within-study variance is estimated as υ_i = (n_{i0}^{-1} + n_{i1}^{-1}) s_{ip}², where s_{ip}² is the pooled sample variance of the two groups of study i. The standardized mean difference's within-study variance is often approximated as υ_i = n_{i0}^{-1} + n_{i1}^{-1} + y_i² / [2(n_{i0} + n_{i1})], where y_i is the estimated standardized mean difference.29 Based on these, the effective precision for continuous outcomes can be calculated similarly.
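Both continuous-outcome variance formulas can be sketched as small helpers (the study numbers and function names are our own, for illustration):

```python
def md_variance(n0, n1, s2_pooled):
    """Within-study variance of the mean difference."""
    return (1 / n0 + 1 / n1) * s2_pooled

def smd_variance(n0, n1, smd):
    """Approximate within-study variance of the standardized mean difference."""
    return 1 / n0 + 1 / n1 + smd ** 2 / (2 * (n0 + n1))

# Hypothetical study: 50 vs. 60 patients, pooled variance 4, SMD 0.3
print(round(md_variance(50, 60, 4.0), 4))   # 0.1467
print(round(smd_variance(50, 60, 0.3), 4))  # 0.0371
```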

3 | MORE COMPLICATED EXAMPLES

We apply the proposed measures of overall evidence to three NMAs with more complicated structures; all have binary outcomes. The first NMA was conducted by Welton et al.30 to compare psychological interventions in coronary heart disease, with total mortality as the outcome. The second was conducted by Elliott and Meyer31 to investigate the effect of antihypertensive drugs on incident diabetes mellitus. The last was collected by Picard and Tramèr32 to compare interventions to prevent pain on injection with propofol.

The treatment networks in the upper panels of Figures 2–4 show the direct evidence of all treatment comparisons in the three NMAs; those in the lower panels demonstrate the overall evidence using the proposed measures. The exact results of the effective number of studies, the effective sample size, and the effective precision are in the Supplementary Information.

FIGURE 2

Treatment networks of the NMA by Welton et al.30 Each network’s edge width is proportional to the measure shown in the subfigure’s title. The treatment IDs are: (1) usual control; (2) educational; (3) behavioral; (4) cognitive; (5) support; (6) educational + behavioral; (7) educational + cognitive; (8) educational + relaxation; (9) behavioral + cognitive; (10) behavioral + relaxation; (11) cognitive + relaxation; (12) cognitive + support; (13) educational + behavioral + cognitive; (14) educational + behavioral + relaxation; (15) educational + cognitive + relaxation; and (16) behavioral + cognitive + support.

FIGURE 4

Treatment networks of the NMA by Picard and Tramèr.32 Each network’s edge width is proportional to the measure shown in the subfigure’s title. The treatment IDs are: (1) placebo; (2) no treatment; (3) lidocaine before; (4) lidocaine mixed; (5) lidocaine + tourniquet; (6) opioids; (7) metoclopramide; and (8) temperature.

The NMA by Welton et al.30 contains 35 studies, including 4 three-arm studies, and compares a total of 16 treatments. The treatment names are shown in Figure 2's legend. Although its network seems large, most pairs of treatments are compared by no more than one study. The large networks based on the direct evidence in Figures 2(a)–(c) might lead readers to believe that the NMA could greatly improve the effect estimate of each comparison. However, among all 16 × 15/2 = 120 possible pairs of treatments, only 8 (7%) pairs are effectively compared by at least three studies. The effective sample size and the effective precision also indicate that the overall evidence of most comparisons is limited. Figures 2(d)–(f) show that the edge of 4 vs. 1 is the thickest in the networks, so this comparison has the strongest overall evidence from the NMA; the edges of all other comparisons are much thinner. Also, the networks look different under different measures of evidence. For example, the comparisons 6 vs. 1, 7 vs. 1, and 14 vs. 1 have noticeably thicker edges in Figure 2(d) than in Figures 2(e) and 2(f), while the edge of 11 vs. 1 in Figure 2(e) is thicker than in the other two networks.

Figure 3 presents the direct and overall evidence in the NMA by Elliott and Meyer,31 which contains 22 studies, comparing a total of six treatments. Among the 22 studies, 4 are three-armed. Figure 3’s legend gives the treatment names. All treatments are directly compared except 5 vs. 3, so the network is well-connected and contains many cycles that can synthesize the direct and indirect evidence. The upper panel in Figure 3 shows that the comparison 6 vs. 4 has the strongest direct evidence; the edges of 2 vs. 1, 3 vs. 1, and 6 vs. 3 are also thicker than most other comparisons. Using the proposed measures of overall evidence, Figure 3’s lower panel demonstrates that the edges of nearly all comparisons are noticeably thickened compared with the networks of the direct evidence. They indicate that much indirect evidence is gained from the NMA beyond the direct evidence.

FIGURE 3

Treatment networks of the NMA by Elliott and Meyer.31 Each network’s edge width is proportional to the measure shown in the subfigure’s title. The treatment IDs are: (1) placebo; (2) thiazide diuretic; (3) converting-enzyme inhibitor; (4) calcium-channel blocker; (5) angiotensin-receptor blocker; and (6) β blocker.

For the NMA by Picard and Tramèr,32 43 studies were collected to compare eight treatments, which are detailed in Figure 4's legend. It contains 12 three-arm studies and 3 four-arm studies. Like the NMA by Elliott and Meyer,31 this NMA is well-connected; only five pairs of treatments among all 8 × 7/2 = 28 possible pairs are not directly compared. The networks in Figure 4's upper panel based on the different measures indicate similar strengths of the direct evidence; the comparison 4 vs. 1 has the strongest direct evidence. The NMA greatly improves the evidence of all comparisons, because most edges in Figure 4's lower panel of the overall evidence are much thicker than those in the networks of the direct evidence. For example, although only two studies directly compare treatments 1 and 2, they are effectively compared by 14 studies after incorporating the indirect evidence. Also, for the same comparison, the sample size is increased from 133 to 907 and the precision is increased from 5.1 to 41.3 by performing the NMA.

4 | DISCUSSION

Because many current NMAs do not properly report the structure of treatment comparisons and the existing methods present only direct evidence,17 this article focused on strategies for presenting overall evidence in NMAs and proposed three new measures for this purpose. The case studies of four real NMAs showed that the new measures intuitively quantify the overall evidence, so that the potential benefit of performing an NMA can be easily evaluated at the preliminary stage. Some NMAs, such as the one by Welton et al.,30 may not greatly outperform traditional pairwise meta-analyses, primarily because their networks contain few cycles and thus the synthesis of direct and indirect evidence is limited. Prestigious journals tend to accept meta-analyses with many studies, because they may contain more evidence and produce more precise estimates than small meta-analyses. However, evaluating an NMA by the traditional concept of "the number of studies" is a black box for journal editors and reviewers: the NMA may collect many studies, but most may focus on comparing a few frequently investigated or well-established treatments. As NMAs claim to synthesize all available evidence, editors and reviewers may have few concerns about too few studies for certain comparisons. For example, although the NMA by Welton et al.30 contains 35 studies, most of them compare three treatments with the control among the total of 16 treatments, and the effective numbers of studies for many comparisons are less than two. The NMA may provide little benefit for these comparisons, because meta-analysis is usually considered a method to synthesize multiple (i.e., at least two) studies. The new measures of overall evidence may be good additions to reporting treatment networks for preliminarily evaluating NMAs.

The three new measures of overall evidence are based on different assumptions. The effective number of studies requires the strongest assumption that each study contributes equal evidence, and the effective sample size assumes that the precision of each study is proportional to its total sample size. The effective precision requires a weaker assumption by using estimated within-study variances, but its interpretation may not be as intuitive as the former two measures. Researchers may choose a proper measure based on how they want to interpret the evidence in an NMA.

The measures of overall evidence are designed to help researchers preliminarily understand and evaluate NMAs; they are not intended to supersede formal statistical analyses. To be more rigorous, we may view the new measures as upper bounds on the number of studies, sample size, or precision that effectively informs the overall evidence, for the three reasons below. First, a fixed-effect model was used to derive the new measures, so the variance of the overall effect estimate of each treatment comparison may be underestimated if heterogeneity is not ignorable in an NMA.33 Since the new measures of overall evidence are proportional to the inverse of the variance, they may be overestimated. Second, we also assumed consistency between direct and indirect evidence for all comparisons; nevertheless, a noticeable number of NMAs in the literature have been found to suffer from evidence inconsistency.16 If the NMA model accounts for inconsistency, the overall evidence may be weakened because the variances of the overall effect estimates increase due to the additional uncertainty introduced by inconsistency. Third, some NMAs may contain several multi-arm studies, which compare more than two treatments. The derivations of the new measures simply treated them as separate two-arm studies, leading the proposed measures to be overestimated.

In summary, because the new measures may serve as upper bounds to quantify overall evidence, if they indicate that many treatment comparisons in an NMA have weak overall evidence (e.g., the effective number of studies is less than two), then researchers may gain little benefit from the NMA. Table 1 briefly summarizes the interpretations, assumptions, limitations, and advantages of the three proposed measures of overall evidence.

TABLE 1.

The three proposed measures of overall evidence for NMAs.

Measure of overall evidence | Interpretation | Assumption | Limitation & advantage
The effective number of studies | The maximum number of studies that effectively contribute to the overall evidence | All studies share a common within-study variance; that is, all studies contribute the same strength of evidence | Its assumption is unlikely to hold in practice, so it is a rough measure; however, it has an intuitive interpretation and provides important information for meta-analysts
The effective sample size | The maximum number of samples that effectively contribute to the overall evidence | Each study's variance is proportional to the inverse of its sample size with a common proportionality coefficient; that is, all samples contribute the same strength of evidence | Its assumption may be violated in some cases (e.g., when the studies have different treatment allocation ratios), while its interpretation is straightforward and informative
The effective precision | The best precision (the inverse of the variance) of the overall evidence | The within-study variance for each treatment comparison in each study is available | Some papers may not report all within-study variances required for calculating this measure, and its interpretation is less intuitive than the other two measures'; however, it is more accurate

In the literature, the statistical methods for NMAs may be classified into two groups: the contrast-based methods, which focus on estimating relative effects of treatment contrasts,4 and the arm-based methods, which directly estimate absolute effects of treatment arms.5,34 The contrast-based methods are currently the most popular tools for NMAs, while the arm-based methods have recently become increasingly accepted.35,36 This article derived the measures of overall evidence from the contrast-based perspective. Similar measures of overall evidence for arm-based NMAs may require different assumptions, and we leave these as future work.

In addition, the three proposed methods may be considered absolute measures of overall evidence, and we can use them to further derive relative measures to assess the benefits of performing NMAs. For example, we may define the proportion of information provided by indirect evidence among overall evidence as (1 − N_hk/E_hk) × 100%, (1 − n_hk/ESS_hk) × 100%, or (1 − P_hk/EP_hk) × 100%. These proportions range from 0% to 100%, and they also have attractive interpretations for assessing the indirect evidence at the pre-analysis stage. They are similar to the borrowing of strength (BoS) statistic proposed by Jackson et al.,37 which is calculated as BoS = (1 − E) × 100%, where the efficiency E is the ratio of the variance of the effect estimate using the NMA to that using only the direct evidence. However, the BoS statistic uses the results of fitted NMA models at the post-analysis stage, and it varies across statistical models. In future work, we will empirically investigate the magnitude of the relative measures based on published NMAs and provide a guideline to assess the quality of NMAs.
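The relative measures above are simple arithmetic on a direct quantity and its effective counterpart. A minimal sketch, using hypothetical counts for a comparison h-k, is:

```python
def indirect_proportion(direct, effective):
    # Share of the overall evidence contributed by indirect evidence,
    # e.g. (1 - N_hk / E_hk) * 100%. `direct` is the direct quantity
    # (number of studies, sample size, or precision) and `effective`
    # is the corresponding effective measure from the NMA.
    return (1.0 - direct / effective) * 100.0

# Hypothetical example: 3 direct studies, and an effective number of
# studies of 4.8 after borrowing indirect evidence.
prop = indirect_proportion(3, 4.8)  # 37.5% of the evidence is indirect
```

When a comparison gains nothing from the network, the effective measure equals the direct one and the proportion is 0%; comparisons with no direct evidence give 100%.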

The R code to calculate the proposed measures of overall evidence is in the Supplementary Information. An important step in the calculation is to find all sources of indirect evidence for each treatment comparison. We used the R package "igraph" to implement this step.38 When a network contains many treatments and evidence cycles, there may be exponentially many paths between two treatment nodes, leading to computational difficulties. Although the number of sources of indirect evidence may be huge, most of them can be expressed as combinations of certain sources in a small basis set. This is closely related to the concept of independent cycles for assessing inconsistency.11 Future work also includes efficiently identifying the basis set of indirect evidence so that the new measures of overall evidence can be calculated much faster and applied easily to large networks.
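The Supplementary Information uses the R package "igraph"; an analogous sketch in Python with the networkx package (an illustration on a toy network, not the paper's code) enumerates the simple paths between two treatments as candidate sources of evidence and extracts a cycle basis of the network:

```python
import networkx as nx

# Toy treatment network: nodes are treatments, edges are pairs of
# treatments that have been compared directly in at least one study.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"),
                  ("A", "D"), ("C", "D")])

# All simple paths from A to B; the two-node path (A, B) is the direct
# evidence, and longer paths are sources of indirect evidence.
paths = list(nx.all_simple_paths(G, "A", "B"))
indirect = [p for p in paths if len(p) > 2]  # e.g. A-C-B and A-D-C-B

# A cycle basis: every evidence cycle in the network can be expressed
# as a combination of these independent cycles, which is why a small
# basis set can summarize exponentially many indirect paths.
basis = nx.cycle_basis(G)
```

In this toy network there are three simple paths from A to B (one direct, two indirect) but only two independent cycles, illustrating how the basis set stays small even as the number of paths grows.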

Supplementary Material

Acknowledgments

We thank the associate editor and two anonymous referees for their helpful comments that substantially improved the quality of this article.

Financial disclosure

This work was partially supported by AHRQ R03 HS024743.

Footnotes

SUPPORTING INFORMATION

Additional Supporting Information can be found online in the supporting information tab for this article.

Conflict of interest

The author declares no potential conflicts of interest.

References

1. Sutton AJ, Higgins JPT. Recent developments in meta-analysis. Statistics in Medicine. 2008;27(5):625–650. doi: 10.1002/sim.2934.
2. Salanti G. Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Research Synthesis Methods. 2012;3(2):80–97. doi: 10.1002/jrsm.1037.
3. Lumley T. Network meta-analysis for indirect treatment comparisons. Statistics in Medicine. 2002;21(16):2313–2324. doi: 10.1002/sim.1201.
4. Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Statistics in Medicine. 2004;23(20):3105–3124. doi: 10.1002/sim.1875.
5. Zhang J, Carlin BP, Neaton JD, et al. Network meta-analysis of randomized clinical trials: reporting the proper summaries. Clinical Trials. 2014;11(2):246–262. doi: 10.1177/1740774513498322.
6. Lin L, Zhang J, Hodges JS, Chu H. Performing arm-based network meta-analysis in R with the pcnetmeta package. Journal of Statistical Software. 2017;80(1):1–25. doi: 10.18637/jss.v080.i05.
7. Lu G, Ades AE. Modeling between-trial variance structure in mixed treatment comparisons. Biostatistics. 2009;10(4):792–805. doi: 10.1093/biostatistics/kxp032.
8. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. Journal of Clinical Epidemiology. 1997;50(6):683–691. doi: 10.1016/s0895-4356(97)00049-8.
9. Rücker G, Schwarzer G. Ranking treatments in frequentist network meta-analysis works without resampling methods. BMC Medical Research Methodology. 2015;15:58. doi: 10.1186/s12874-015-0060-8.
10. Rücker G. Network meta-analysis, electrical networks and graph theory. Research Synthesis Methods. 2012;3(4):312–324. doi: 10.1002/jrsm.1058.
11. Lu G, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. Journal of the American Statistical Association. 2006;101(474):447–459.
12. Cipriani A, Higgins JPT, Geddes JR, Salanti G. Conceptual and technical challenges in network meta-analysis. Annals of Internal Medicine. 2013;159(2):130–137. doi: 10.7326/0003-4819-159-2-201307160-00008.
13. Zhang J, Fu H, Carlin BP. Detecting outlying trials in network meta-analysis. Statistics in Medicine. 2015;34(19):2695–2707. doi: 10.1002/sim.6509.
14. Zhang J, Yuan Y, Chu H. The impact of excluding trials from network meta-analyses – an empirical study. PLOS ONE. 2016;11(12):e0165889. doi: 10.1371/journal.pone.0165889.
15. Lin L, Chu H, Hodges JS. Sensitivity to excluding treatments in network meta-analysis. Epidemiology. 2016;27(4):562–569. doi: 10.1097/EDE.0000000000000482.
16. Veroniki AA, Vasiliadis HS, Higgins JPT, Salanti G. Evaluation of inconsistency in networks of interventions. International Journal of Epidemiology. 2013;42(1):332–345. doi: 10.1093/ije/dys222.
17. Bafeta A, Trinquart L, Seror R, Ravaud P. Reporting of results from network meta-analyses: methodological systematic review. BMJ. 2014;348:g1741. doi: 10.1136/bmj.g1741.
18. Chaimani A, Higgins JPT, Mavridis D, Spyridonos P, Salanti G. Graphical tools for network meta-analysis in STATA. PLOS ONE. 2013;8(10):e76654. doi: 10.1371/journal.pone.0076654.
19. Tan SH, Cooper NJ, Bujkiewicz S, Welton NJ, Caldwell DM, Sutton AJ. Novel presentational approaches were developed for reporting network meta-analysis. Journal of Clinical Epidemiology. 2014;67(6):672–680. doi: 10.1016/j.jclinepi.2013.11.006.
20. Salanti G, Ades AE, Ioannidis JPA. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. Journal of Clinical Epidemiology. 2011;64(2):163–171. doi: 10.1016/j.jclinepi.2010.03.016.
21. Noma H, Tanaka S, Matsui S, Cipriani A, Furukawa TA. Quantifying indirect evidence in network meta-analysis. Statistics in Medicine. 2017;36(6):917–927. doi: 10.1002/sim.7187.
22. Riley RD, Jackson D, Salanti G, et al. Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples. BMJ. 2017;358:j3932. doi: 10.1136/bmj.j3932.
23. Hutton B, Salanti G, Caldwell DM, et al. The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Annals of Internal Medicine. 2015;162(11):777–784. doi: 10.7326/M14-2385.
24. Trikalinos TA, Alsheikh-Ali AA, Tatsioni A, Nallamothu BK, Kent DM. Percutaneous coronary interventions for non-acute coronary artery disease: a quantitative 20-year synopsis and a network meta-analysis. The Lancet. 2009;373(9667):911–918. doi: 10.1016/S0140-6736(09)60319-6.
25. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine. 2002;21(11):1539–1558. doi: 10.1002/sim.1186.
26. Rao CR. Linear Statistical Inference and Its Applications. 2nd ed. New York, NY: John Wiley & Sons; 2002.
27. Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions. Chichester, UK: John Wiley & Sons; 2008.
28. Cooper H, Hedges LV, Valentine JC. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York, NY: Russell Sage Foundation; 2009.
29. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press; 1985.
30. Welton NJ, Caldwell DM, Adamopoulos E, Vedhara K. Mixed treatment comparison meta-analysis of complex interventions: psychological interventions in coronary heart disease. American Journal of Epidemiology. 2009;169(9):1158–1165. doi: 10.1093/aje/kwp014.
31. Elliott WJ, Meyer PM. Incident diabetes in clinical trials of antihypertensive drugs: a network meta-analysis. The Lancet. 2007;369(9557):201–207. doi: 10.1016/S0140-6736(07)60108-1.
32. Picard P, Tramèr MR. Prevention of pain on injection with propofol: a quantitative systematic review. Anesthesia & Analgesia. 2000;90(4):963–969. doi: 10.1097/00000539-200004000-00035.
33. Mills EJ, Thorlund K, Ioannidis JPA. Demystifying trial networks and network meta-analysis. BMJ. 2013;346:f2914. doi: 10.1136/bmj.f2914.
34. Hong H, Chu H, Zhang J, Carlin BP. A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Research Synthesis Methods. 2016;7(1):6–22. doi: 10.1002/jrsm.1153.
35. Hong H, Chu H, Zhang J, Carlin BP. Rejoinder to the discussion of "a Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons," by S. Dias and A.E. Ades. Research Synthesis Methods. 2016;7(1):29–33. doi: 10.1002/jrsm.1186.
36. Efthimiou O, Debray TPA, van Valkenhoef G, et al. GetReal in network meta-analysis: a review of the methodology. Research Synthesis Methods. 2016;7(3):236–263. doi: 10.1002/jrsm.1195.
37. Jackson D, White IR, Price M, Copas J, Riley RD. Borrowing of strength and study weights in multivariate and network meta-analysis. Statistical Methods in Medical Research. 2017;26(6):2853–2868. doi: 10.1177/0962280215611702.
38. Csárdi G, Nepusz T. The igraph software package for complex network research. InterJournal. 2006;Complex Systems:1695.
