Biostatistics (Oxford, England). 2017 Jun 6;19(1):87–102. doi: 10.1093/biostatistics/kxx025

A Bayesian hierarchical model for network meta-analysis of multiple diagnostic tests

Xiaoye Ma 1,✉,#, Qinshu Lian 1,#, Haitao Chu 1, Joseph G Ibrahim 2, Yong Chen 3
PMCID: PMC6454495  PMID: 28586407

SUMMARY

To compare the accuracy of multiple diagnostic tests in a single study, three designs are commonly used: (i) the multiple test comparison design; (ii) the randomized design; and (iii) the non-comparative design. Existing meta-analysis methods of diagnostic tests (MA-DT) have focused on evaluating the performance of a single test by comparing it with a reference test. The increasing number of available diagnostic instruments for a disease condition and the different study designs being used have generated the need for an efficient and flexible meta-analysis framework that combines all designs for simultaneous inference. In this article, we develop a missing data framework and a Bayesian hierarchical model for network MA-DT (NMA-DT) that offers important advantages over traditional MA-DT: (i) it combines studies using all three designs; (ii) it pools studies with or without a gold standard; (iii) it combines studies with different sets of candidate tests; and (iv) it accounts for heterogeneity across studies and the complex correlation structure among multiple tests. We illustrate our method through a case study: a network meta-analysis of deep vein thrombosis tests.

Keywords: Diagnostic test, Hierarchical model, Missing data, Multiple test comparison, Network meta-analysis

1. Introduction

Comparative effectiveness research relies fundamentally on accurate assessment of clinical outcomes. The growing number of assessment instruments, as well as the rapid escalation in costs, have generated an increasing need for scientifically rigorous comparisons of multiple diagnostic tests in clinical practice. To compare the accuracy of multiple diagnostic tests in a single study, three designs are commonly used (Takwoingi, Leeflang and Deeks, 2013): (i) the multiple test comparison design, where all subjects are diagnosed by all candidate tests and verified by a gold standard; (ii) the randomized design, where subjects are randomly assigned to one of the candidate tests, and all subjects are verified by a gold standard; and (iii) the non-comparative design, where different sets of subjects are used to compare a candidate test to a gold standard or to another candidate test. Systematic reviews and meta-analysis methods have been developed as useful tools to improve the estimation of diagnostic test accuracy by combining information from multiple studies (Rutter and Gatsonis, 2001; Reitsma and others, 2005). Thus, a flexible meta-analysis framework is needed to combine information from all three designs for effectively ranking all candidate tests.

However, the methodological literature on meta-analysis of diagnostic tests has devoted a great deal of attention to estimating the performance of one candidate test compared to a reference test. When the reference test is a gold standard, multivariate random effects models have been developed to account for the heterogeneity of test performance across studies and correlations among test accuracy indices (such as sensitivity and specificity) (Rutter and Gatsonis, 2001; Reitsma and others, 2005; Chu and Cole, 2006; Harbord and others, 2007; Chu, Chen and Louis, 2009; Ma and others, 2016b; Ma and others, 2016a; Chen and others, 2015). When the reference test cannot perfectly distinguish diseased and non-diseased subjects (i.e., a non-gold standard), latent class random effects models (Chu, Chen and Louis, 2009; Dendukuri and others, 2012; Liu, Chen and Chu, 2015) have been proposed to estimate the diagnostic accuracy of both candidate and reference tests.

Very few papers have discussed how to simultaneously compare multiple candidate tests in meta-analysis. A naive procedure is to conduct separate meta-analyses of diagnostic tests (MA-DT) for each candidate test and then compare their summary estimates, which is valid only under the missing completely at random (MCAR) assumption. However, this procedure has some important drawbacks. First, for studies that compared multiple diagnostic tests, the accuracy estimates of each candidate test from separate MA-DT are typically correlated, as the multiple test comparison design may be used for some studies and some subjects may be evaluated by multiple tests. Ignoring such correlations can lead to efficiency loss. Second, current methods are not able to combine studies comparing a candidate test with a gold standard and studies comparing a candidate test with a non-gold standard reference. Third, when candidate tests are evaluated one at a time, the number of studies is typically small, which can potentially lead to issues of model fitting (Hamza and others, 2008) and difficulty in estimating between-study heterogeneity. In addition, as different studies that represent heterogeneous populations are synthesized, the candidate tests are not directly comparable without certain strong assumptions, thus limiting the generalizability of results. Finally, separate MA-DT does not allow for "borrowing of information," which can potentially lead to statistical efficiency loss.

To address these limitations, we develop a network MA-DT (NMA-DT) framework from the perspective of missing data analysis to simultaneously compare multiple tests. The proposed framework is motivated by the literature on network meta-analysis of randomized clinical trials, which extends the scope of traditional pairwise meta-analysis by synthesizing both direct and indirect comparisons of multiple treatments across randomized controlled trials (Lu and Ades, 2004; Salanti and others, 2011; Zhang and others, 2014). Specifically, we view studies using the randomized design and the non-comparative design as if they had been designed using the multiple test comparison design, such that all subjects in all studies were evaluated by all candidate tests and a gold-standard test. However, most of the studies only include a subset of the whole set of tests of interest. The test outcomes from non-included tests are considered as missing data. By simultaneously comparing all candidate tests and a gold standard, the proposed approach can make use of all available information, allow for borrowing of information across studies, and rank diagnostic tests through full posterior inference. It effectively handles three critical challenges in traditional MA-DT by (i) combining information from studies with all three designs; (ii) pooling studies with or without a gold standard; and (iii) allowing different sets of candidate tests in different studies or in different subsets of subjects within a study. This model also accounts for potential heterogeneity across studies (due to differences in study population, design and lab technical issues) as in conventional MA-DT models, as well as the complex correlation structure among multiple diagnostic tests.

The rest of this article is organized as follows. In Section 2, we describe our motivating case study: NMA of deep vein thrombosis (DVT) tests. We present the proposed NMA-DT model and the Bayesian inference method in Section 3 and apply the proposed method to the motivating study in Section 4. Simulation studies are conducted in Section 5, and Section 6 provides a brief discussion. A directed graphical model of the proposed model, an additional case study, data and some additional results are provided in the supplementary material available at Biostatistics online.

2. Motivating study: NMA of DVT tests

DVT develops when blood clots form in one or more deep veins of the human body. If DVT is left untreated, the blood clot can cause a pulmonary embolism and result in death (Venta and Venta, 1987). The gold standard diagnostic test for DVT, contrast venography, is an invasive procedure and can cause allergic reactions. Therefore, ultrasonography is a commonly used surrogate test because it is noninvasive and has good accuracy. Alternatively, D-dimer is a small protein fragment present in the blood when there is a blood clot, and thus testing its concentration in a screening blood test can also be used to diagnose DVT.

A recent paper by Kang and others (2013) presented a meta-analysis that included 12 studies comparing the accuracy of diagnostic tests for DVT. Among the 12 studies, four compared the D-dimer test to venography, three compared ultrasonography to venography and five compared the D-dimer test to ultrasonography (Kang and others, 2013). None of the studies compared all three tests together. A mixed-effects log-linear model was applied, and random effects were incorporated to account for the heterogeneity in test accuracies of D-dimer but not of ultrasonography. In addition, the log-linear model for test accuracies made the model parameters difficult to interpret and hard to generalize when comparing more diagnostic tests.

3. A unified statistical framework

We present a Bayesian hierarchical NMA-DT model to compare multiple tests simultaneously. In this article, we focus on modeling a commonly used pair of test accuracy indices, sensitivity (Se) and specificity (Sp), where sensitivity is the probability of a candidate test being positive given a diseased subject and specificity is the probability of a candidate test being negative given a non-diseased subject (Pepe, 2003). In addition, other test accuracy indices such as the positive and negative likelihood ratios (LR+ and LR−) and the positive and negative predictive values (PPV and NPV) can be useful in practice. LR+ (LR−) is the likelihood that a positive (negative) test result would be expected in a patient with the target disease compared to the likelihood that the same result would be expected in a patient without the disease. PPV (NPV) describes the chance of being truly diseased (non-diseased) given a positive (negative) test result. However, PPV and NPV are closely related to disease prevalence, and estimation of these quantities requires information on prevalence. Furthermore, disease prevalence has been argued to be potentially correlated with Se and Sp, and meta-analysis models accounting for such correlation have been proposed (Chu and others, 2009; Leeflang and others, 2009). Therefore, the Bayesian hierarchical NMA-DT approach also models disease prevalence to account for these correlations and to provide inference for other test accuracy indices. In this section, we first present the hierarchical model with random effects, then describe the prior distributions of the parameters. Next we provide the likelihood and posterior estimates.

3.1 Hierarchical model

We view different studies as if they all adopted a multiple test comparison design, such that all studies should undergo a whole set of tests containing all candidate tests and a gold standard. However, each of the studies includes a subset of the whole set, and the test outcomes from non-included tests are considered as missing data (Little and Rubin, 2002). We assume that the missing test outcomes are missing at random (MAR). Under MAR, the presence of a test does not depend on any unobserved characteristics, which in our case means that missingness is independent of a test's sensitivity and specificity. In Section 3.4, we will provide a method for sensitivity analysis under the missing not at random (MNAR) assumption.

Let 𝒯 = {T_0, T_1, …, T_K} be a set of K + 1 binary diagnostic tests, where T_0 denotes a gold standard and T_1, …, T_K stand for candidate tests under evaluation. Suppose we have a collection of N studies, where each of them reports outcomes of tests in a subset of 𝒯. In the ith study, for j = 1, …, J_i, let y_ijk be the test outcome of T_k on subject j (y_ijk = 1 if positive and y_ijk = 0 if negative) and let δ_ijk be the missing data indicator (δ_ijk = 1 if T_k is conducted on the jth subject and 0 if not). Let π_i be the study-specific disease prevalence: π_i = P(y_ij0 = 1), i = 1, …, N. For k = 1, …, K, let Se_ik and Sp_ik denote the study-specific sensitivity and specificity for the kth test, respectively: Se_ik = P(y_ijk = 1 | y_ij0 = 1) and Sp_ik = P(y_ijk = 0 | y_ij0 = 0). Denote 𝒦_ij as the set of candidate tests conducted on subject j (j = 1, …, J_i) in the ith study, and y_ij as the collection of candidate test outcomes for this subject.

Multivariate random effects are used to account for potential across study heterogeneities in prevalence, sensitivities and specificities and correlations among them. Specifically, we write

π_i = Φ(η + ε_i),  Se_ik = Φ(α_k + μ_ik),  Sp_ik = Φ(β_k + ν_ik),  i = 1, …, N, k = 1, …, K, (3.1)

where Φ(·) is the standard normal cumulative distribution function for probit transformations. We note that other link functions can be specified as well. The parameter η is the fixed effect for prevalence. The parameters α_k and β_k are the fixed effects for sensitivity and specificity of T_k, respectively. The random effects ε_i, μ_ik and ν_ik are the study-specific effects for prevalence, sensitivity and specificity of T_k, respectively. It is straightforward to incorporate meta-regression covariates in (3.1) by

π_i = Φ(η + η̃^T X_i + ε_i),  Se_ik = Φ(α_k + α̃^T W_i + μ_ik),  Sp_ik = Φ(β_k + β̃^T Z_i + ν_ik),

for i = 1, …, N and k = 1, …, K, where X_i, W_i and Z_i are study-level covariates such as study population characteristics, and η̃, α̃ and β̃ are the corresponding coefficient vectors. In this paper, we focus on models without covariates for simplicity.
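The probit model in (3.1) maps probit-scale fixed and random effects to probabilities through the standard normal CDF. A minimal sketch, using hypothetical effect values (not estimates from this paper):

```python
from math import erf, sqrt

def probit_inv(x):
    """Standard normal CDF Phi(x), i.e., the inverse of the probit link."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical fixed and random effects for one study i and one test k.
alpha_k = 1.2   # fixed effect for sensitivity of T_k (probit scale)
beta_k = 0.8    # fixed effect for specificity of T_k
mu_ik = -0.3    # study-specific random effect on sensitivity
nu_ik = 0.2     # study-specific random effect on specificity

se_ik = probit_inv(alpha_k + mu_ik)  # study-specific sensitivity Phi(alpha_k + mu_ik)
sp_ik = probit_inv(beta_k + nu_ik)   # study-specific specificity Phi(beta_k + nu_ik)
```

On the probit scale the random effects enter additively, which is what lets the multivariate normal distribution in (3.2) induce correlations among prevalence, sensitivities and specificities.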

We introduce the within-study dependency structure of multiple test parameters by assuming that the random effect vector follows a multivariate normal distribution. Furthermore, this distribution also accounts for potential correlations between prevalence and the test accuracy parameters (Leeflang and others, 2009; Chu, Chen and Louis, 2009):

θ_i = (ε_i, μ_i1, ν_i1, …, μ_iK, ν_iK)^T ∼ MVN(0, Σ),  i = 1, …, N. (3.2)

The covariance matrix Σ can be written as Σ = D^{1/2} R D^{1/2}, where D is a (2K + 1) × (2K + 1) diagonal matrix with diagonal elements (σ_ε², σ_μ1², σ_ν1², …, σ_μK², σ_νK²) capturing the between-study heterogeneities, and R is a positive definite correlation matrix whose diagonal elements are 1 and whose off-diagonal elements measure potential correlations among disease prevalence and the test accuracy parameters. We assume the same correlation structure for all studies. Therefore, studies reporting outcomes of all tests in 𝒯 contribute to estimating the full Σ, and studies with missing test outcomes directly contribute to estimating a submatrix of Σ. By assuming MAR and the same covariance matrix across all studies, which is equivalent to assuming all studies apply the multiple test comparison design, the NMA-DT model can combine studies reporting different sets of candidate tests and make inferences on the relative test performances.
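To make the decomposition concrete, the following sketch (illustrative numbers only; K = 2, so the dimension is 2K + 1 = 5) builds Σ from standard deviations and a correlation matrix, and extracts the submatrix to which a study missing test T_2 would contribute:

```python
dim = 5          # 2K + 1 with K = 2; theta_i = (eps_i, mu_i1, nu_i1, mu_i2, nu_i2)
sd = [0.3] * dim # illustrative between-study standard deviations
# Illustrative exchangeable correlation matrix R (diagonal 1, off-diagonal 0.2):
R = [[1.0 if a == b else 0.2 for b in range(dim)] for a in range(dim)]

# Sigma = D^{1/2} R D^{1/2}, where D holds the variances sd[a] ** 2:
Sigma = [[sd[a] * R[a][b] * sd[b] for b in range(dim)] for a in range(dim)]

# A study that never conducts T_2 informs only the block over (eps_i, mu_i1, nu_i1):
keep = [0, 1, 2]
Sigma_sub = [[Sigma[a][b] for b in keep] for a in keep]
```

This marginalization property of the multivariate normal is what lets studies with different test sets contribute directly to different blocks of the same Σ.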

3.2 Likelihood specification

To derive the likelihood for the jth subject in the ith study, we first consider a subject that is tested by the gold standard test (δ_ij0 = 1) such that the true disease status is known. Conditional independence is assumed such that candidate test results are independent given the disease status. This assumption has been used in latent class models assessing accuracy of diagnostic tests without a gold standard (Chu, Chen and Louis, 2009). Denote α = (α_1, …, α_K)^T, β = (β_1, …, β_K)^T, μ_i = (μ_i1, …, μ_iK)^T and ν_i = (ν_i1, …, ν_iK)^T. The probability of the test outcomes for a diseased subject, given the random effects, is calculated as

P(y_ij, y_ij0 = 1 | η, α, ε_i, μ_i) = P(y_ij0 = 1) P(y_ij | y_ij0 = 1, η, α, ε_i, μ_i) = π_i ∏_{k∈𝒦_ij} (Se_ik)^{y_ijk} (1 − Se_ik)^{1 − y_ijk} = π_i h_ij1, (3.3)

where h_ij1 = ∏_{k∈𝒦_ij} (Se_ik)^{y_ijk} (1 − Se_ik)^{1 − y_ijk}. Similarly, the probability for a non-diseased subject j in study i is given by

P(y_ij, y_ij0 = 0 | η, β, ε_i, ν_i) = P(y_ij0 = 0) P(y_ij | y_ij0 = 0, η, β, ε_i, ν_i) = (1 − π_i) ∏_{k∈𝒦_ij} (Sp_ik)^{1 − y_ijk} (1 − Sp_ik)^{y_ijk} = (1 − π_i) h_ij0, (3.4)

where h_ij0 = ∏_{k∈𝒦_ij} (Sp_ik)^{1 − y_ijk} (1 − Sp_ik)^{y_ijk}.

Now we consider the setting where subject j has not been tested by the gold standard T_0 (i.e., δ_ij0 = 0). By the law of total probability, the probability of the test outcomes y_ij, given the random effects, can be written as

P(y_ij, y_ij0 = missing | η, α, β, ε_i, μ_i, ν_i) = P(y_ij, y_ij0 = 1 | η, α, ε_i, μ_i) + P(y_ij, y_ij0 = 0 | η, β, ε_i, ν_i) = π_i h_ij1 + (1 − π_i) h_ij0. (3.5)

In general, the probability of test outcomes for subject j (with or without the gold standard) can be written in the following unified form:

(π_i h_ij1)^{δ_ij0 y_ij0} [(1 − π_i) h_ij0]^{δ_ij0 (1 − y_ij0)} [π_i h_ij1 + (1 − π_i) h_ij0]^{1 − δ_ij0}. (3.6)
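A small sketch of the unified per-subject contribution (3.6), with hypothetical study-specific values: when the gold standard is observed it reduces to (3.3) or (3.4), and otherwise to the mixture (3.5).

```python
def subject_likelihood(y, delta0, y0, pi_i, se, sp):
    """Per-subject likelihood contribution (3.6), given the random effects.

    y: dict mapping candidate test index k in K_ij to its binary outcome y_ijk
    delta0, y0: gold-standard missingness indicator delta_ij0 and outcome y_ij0
    pi_i: study-specific prevalence; se, sp: dicts of study-specific Se_ik, Sp_ik
    """
    h1 = 1.0  # h_ij1: product over conducted tests of Se^y (1 - Se)^(1 - y)
    h0 = 1.0  # h_ij0: product over conducted tests of Sp^(1 - y) (1 - Sp)^y
    for k, yk in y.items():
        h1 *= se[k] ** yk * (1 - se[k]) ** (1 - yk)
        h0 *= sp[k] ** (1 - yk) * (1 - sp[k]) ** yk
    if delta0 == 1:  # gold standard observed: (3.3) or (3.4)
        return pi_i * h1 if y0 == 1 else (1 - pi_i) * h0
    return pi_i * h1 + (1 - pi_i) * h0  # gold standard missing: mixture (3.5)

# Subject positive on test 1, negative on test 2, gold standard not conducted:
lik = subject_likelihood({1: 1, 2: 0}, delta0=0, y0=None,
                         pi_i=0.4, se={1: 0.8, 2: 0.6}, sp={1: 0.9, 2: 0.7})
```

Here h1 = 0.8 × 0.4 and h0 = 0.1 × 0.7, so the mixture gives 0.4 × 0.32 + 0.6 × 0.07 = 0.17.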

3.3 Prior specifications and posterior estimations

In this subsection, we describe specifications of prior distributions for Σ and (η, α, β). A conjugate Wishart prior can be assumed for the precision matrix: Σ^{−1} ∼ Wishart(V, ν). Taking the degrees of freedom ν equal to the dimension of θ_i, i.e., ν = 2K + 1, gives an approximately uniform prior on the correlation coefficients. Different choices of V can give relatively informative or non-informative priors on the variance parameters. Specific choices of V are discussed in the case studies.

Vague normal priors with mean 0 and variance 10 are assumed for η, α_k and β_k (k = 1, …, K), which correspond to equal-tail 95% prior credible intervals (CIs) of approximately (0, 1) for π_i, Se_ik and Sp_ik, i = 1, …, N, k = 1, …, K.
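As a quick check of this claim, a N(0, 10) prior on the probit scale has an equal-tail 95% interval of about ±1.96√10 ≈ ±6.2, which the probit transformation maps to essentially the full (0, 1) range:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

sd = sqrt(10)                   # N(0, 10) prior on the probit scale
lo, hi = -1.96 * sd, 1.96 * sd  # equal-tail 95% prior interval for eta (or alpha_k, beta_k)
# After the probit transformation the induced interval for pi (or Se, Sp)
# spans nearly (0, 1):
print(round(Phi(lo), 6), round(Phi(hi), 6))
```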

Denote the above prior distributions for η, α_k, β_k and Σ by f_η(·), f_{α_k}(·), f_{β_k}(·) and f(Σ), respectively. With these specifications, the joint posterior distribution is:

Pr(η, α, β, ε_i, μ_i, ν_i | y) ∝ ∏_{i=1}^{N} {[∏_{j=1}^{J_i} P(y_ij, y_ij0 | η, α, β, ε_i, μ_i, ν_i)] P(ε_i, μ_i, ν_i | Σ)} × {∏_{k=1}^{K} [f_{α_k}(α_k) f_{β_k}(β_k)]} f_η(η) f(Σ). (3.7)

Given the likelihood specification in Section 3.2, the posterior can be written as

Pr(η, α, β, ε_i, μ_i, ν_i | y) ∝ ∏_{i=1}^{N} {[∏_{j=1}^{J_i} (π_i h_ij1)^{δ_ij0 y_ij0} [(1 − π_i) h_ij0]^{δ_ij0 (1 − y_ij0)} [π_i h_ij1 + (1 − π_i) h_ij0]^{1 − δ_ij0}] |Σ|^{−1/2} exp(−θ_i^T Σ^{−1} θ_i / 2)} × {∏_{k=1}^{K} [f_{α_k}(α_k) f_{β_k}(β_k)]} f_η(η) f(Σ), (3.8)

where h_ij1 and h_ij0 are defined after equations (3.3) and (3.4), respectively, and θ_i is defined in equation (3.2).

We use the JAGS software via the rjags package in R to sample from the joint posterior distribution using Markov chain Monte Carlo (MCMC) methods (Lunn and others, 2000; Plummer and others, 2003). The posterior samples are drawn by Gibbs and Metropolis–Hastings algorithms. Posterior estimates are similar to the maximum likelihood estimates when the priors are non-informative, and the Bayesian approach allows for full posterior inference, so that asymptotic approximations are not required. Convergence is assessed using trace plots, sample autocorrelations and the Gelman–Rubin statistic (Gelman and Rubin, 1992).
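For intuition, a minimal sketch of the Gelman–Rubin potential scale reduction factor for a scalar parameter, assuming equal-length chains (this simplified form omits the sampling-variability correction that some implementations add):

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for a scalar parameter.

    chains: list of equal-length lists of posterior draws, one list per chain.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    W = mean(variance(c) for c in chains)  # within-chain variance
    B = n * variance(chain_means)          # between-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled posterior variance estimate
    return (var_hat / W) ** 0.5

# Two well-mixed chains targeting the same distribution give R-hat near 1:
random.seed(1)
chains = [[random.gauss(0, 1) for _ in range(2000)] for _ in range(2)]
rhat = gelman_rubin(chains)
```

Values of R-hat well above 1 indicate that the chains have not yet mixed, which is why it is examined alongside trace plots and autocorrelations.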

Posterior samples of the median disease prevalence, sensitivity and specificity of T_k can be obtained by MCMC sampling: π = Φ(η), Se_k = Φ(α_k) and Sp_k = Φ(β_k). Posterior medians of other measures of clinical interest, such as PPV, NPV, the positive likelihood ratio (LR+) and the negative likelihood ratio (LR−) of T_k, can also be obtained:

PPV_k = Se_k π / [Se_k π + (1 − Sp_k)(1 − π)],  NPV_k = Sp_k (1 − π) / [Sp_k (1 − π) + (1 − Se_k) π],
LR+_k = Se_k / (1 − Sp_k),  LR−_k = (1 − Se_k) / Sp_k.
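These identities are applied draw by draw to the posterior samples; as a standalone sketch with hypothetical summary values (not the case-study posteriors):

```python
def derived_indices(se, sp, pi):
    """PPV, NPV, LR+ and LR- from sensitivity, specificity and prevalence."""
    ppv = se * pi / (se * pi + (1 - sp) * (1 - pi))
    npv = sp * (1 - pi) / (sp * (1 - pi) + (1 - se) * pi)
    lr_pos = se / (1 - sp)
    lr_neg = (1 - se) / sp
    return ppv, npv, lr_pos, lr_neg

# Hypothetical values: Se = 0.9, Sp = 0.8, prevalence = 0.4
ppv, npv, lr_pos, lr_neg = derived_indices(se=0.9, sp=0.8, pi=0.4)
```

Because PPV and NPV depend on the prevalence π, a model that draws π jointly with Se_k and Sp_k (as the NMA-DT model does) propagates all three sources of uncertainty into these indices.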

The surface under the cumulative ranking (SUCRA) can be used to rank test performance, accounting for the uncertainty in ranking (Salanti and others, 2011). The SUCRA for each candidate test can be calculated as follows.

SUCRA_k = Σ_{c=1}^{K−1} cum_{k,c} / (K − 1),

where cum_{k,c} is the cumulative probability of T_k ranking as the cth best test: cum_{k,c} = Σ_{b=1}^{c} p_{k,b}, where p_{k,b}, the probability that T_k ranks bth, can be calculated from the posterior samples. A larger value of SUCRA indicates better test performance, and SUCRA = 1 (SUCRA = 0) if the test always ranks first (last). Therefore, we can pick the best test by comparing the SUCRA values. This approach is most useful when the difference in preference between successive ranks is the same across the entire ranking scale; otherwise it can be misleading.
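A small sketch of the SUCRA computation from a vector of rank probabilities p_{k,b} (hypothetical values; in practice these are estimated from the posterior samples of the ranks):

```python
def sucra(rank_probs):
    """SUCRA for one test from its rank probabilities p_{k,b}, b = 1..K (best to worst).

    SUCRA_k = sum_{c=1}^{K-1} cum_{k,c} / (K - 1), with cum_{k,c} = sum_{b<=c} p_{k,b}.
    """
    K = len(rank_probs)
    cums, total = [], 0.0
    for p in rank_probs[:-1]:  # c = 1, ..., K - 1
        total += p
        cums.append(total)
    return sum(cums) / (K - 1)

# A test that always ranks first among K = 3 tests has SUCRA = 1,
# and one that always ranks last has SUCRA = 0:
assert sucra([1.0, 0.0, 0.0]) == 1.0
assert sucra([0.0, 0.0, 1.0]) == 0.0
s = sucra([0.5, 0.3, 0.2])  # a hypothetical mixed ranking
```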

3.4 A sensitivity analysis for missingness not at random

The NMA-DT model is built upon the assumption of MAR. However, this assumption may be questionable in some applications. For example, researchers may select candidate tests that are believed to have better performance, so that missing test outcomes are related to the unknown test accuracy parameters (i.e., MNAR). In this subsection, we present a model of missingness that can be incorporated into the NMA-DT model to account for a known MNAR mechanism. However, in practice, whether the MAR assumption holds is rarely known and is not testable. Thus, different models of missingness can be used in sensitivity analyses to evaluate the impact on parameter estimates if the MAR assumption is violated.

Let the N × K matrix M denote the study-level missingness of an NMA-DT dataset containing N studies and K candidate tests. The entries of M are m_ik, i = 1, …, N and k = 1, …, K, such that m_ik = 1 if T_k is missing in study i and m_ik = 0 otherwise. We assume m_ik ∼ Bernoulli(p_ik), where p_ik is the probability of missing T_k in study i. We specify a model of missingness for p_ik as

logit(p_ik) = γ_k + γ_1k × logit(Se_ik) + γ_0k × logit(Sp_ik),

where γ_1k (γ_0k) controls the degree of association between the missingness and the study-specific sensitivity (specificity). We assume non-positive γ_1k and γ_0k such that T_k is prone to be missing when its accuracy is low. When γ_1k = γ_0k = 0, the outcomes of T_k are MAR with respect to its sensitivity and specificity. The model of missingness can then be incorporated in the likelihood in (3.6) under different pre-specified values of γ_1k and γ_0k. It should be noted that the model of missingness described here is not for general MNAR scenarios, but is specific to the NMA-DT problem, and we only consider scenarios in which missingness is related to the test accuracy indices.
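A sketch of this missingness model with hypothetical accuracy values; note that exp(γ) is the odds ratio of missingness per one-unit increase on the logit accuracy scale (e.g., exp(−0.5) ≈ 0.61):

```python
from math import exp, log

def logit(p):
    return log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + exp(-x))

def missing_prob(se_ik, sp_ik, gamma_k=0.0, gamma1=-0.5, gamma0=-0.5):
    """Probability that test k is missing in study i under the missingness model."""
    return inv_logit(gamma_k + gamma1 * logit(se_ik) + gamma0 * logit(sp_ik))

# With negative gamma1 and gamma0, a less accurate test is more often missing:
p_low = missing_prob(se_ik=0.6, sp_ik=0.7)   # hypothetical low-accuracy test
p_high = missing_prob(se_ik=0.9, sp_ik=0.9)  # hypothetical high-accuracy test
assert p_low > p_high
```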

4. Case study results and sensitivity analyses: NMA of DVT tests

We analyze the NMA of DVT tests in Section 2 using the proposed NMA-DT model. In this study, we have K = 2. We adopt a relatively moderate informative Wishart prior with ν = 5 and V with diagonal elements equal to 5 and off-diagonal elements equal to 0.05. This Wishart prior corresponds to a 95% prior CI of (0.2, 15) for the standard deviation components σ. We fit the model by assuming vague N(0, 10) priors for η, α_k and β_k. Here the assumed conditional independence between candidate tests given disease status implies that any agreement between D-dimer and ultrasound test results for a specific subject is only a result of the subject's disease status.

After 10,000 burn-in samples, 1,000,000 posterior samples are obtained. Table 1 shows the results from the proposed NMA-DT model. Figure 1 plots joint posterior distributions and study-specific posterior medians and 95% CIs for the prevalence, sensitivity and specificity parameters. We report posterior medians followed by 95% CIs in brackets for the rest of this article. The NMA-DT model concludes that ultrasonography has a median Se of 0.90 (0.77, 0.96) and a median Sp of 0.80 (0.54, 0.97). The D-dimer test is estimated to have moderate ability in diagnosing DVT, with a median Se of 0.83 (0.68, 0.92) and a median Sp of 0.88 (0.75, 0.97). The SUCRA value for ultrasonography is 0.2 and that for D-dimer is 0.13. Overall, ultrasonography is favored in detecting the diseased with higher sensitivity, whereas D-dimer performs better in ruling out the non-diseased with higher specificity.

Table 1.

Meta-analysis of DVT tests: Posterior median estimates and 95% CIs

Ultrasonography D-dimer
Sensitivity 0.90 (0.77, 0.96) 0.83 (0.68, 0.92)
Specificity 0.80 (0.54, 0.97) 0.88 (0.75, 0.97)
PPV 0.84 (0.68, 0.96) 0.84 (0.68, 0.96)
NPV 0.91 (0.80, 0.97) 0.87 (0.77, 0.94)
LR+ 4.39 (1.89, 27.90) 7.00 (3.10, 33.49)
LR− 0.13 (0.05, 0.33) 0.20 (0.09, 0.38)
Prevalence 0.43 (0.36, 0.50)

Fig. 1.


Meta-analysis of DVT tests: forest plots and contour plots. (a) is the forest plot for prevalence; (b) and (c) are forest plots for sensitivity and specificity of D-dimer, respectively; and (e) and (f) are forest plots for sensitivity and specificity of ultrasound, respectively. The solid (dashed) lines denote the corresponding 95% credible intervals when the test is included (not included) in the study. (d) and (g) are the quantile contours of posterior sensitivity versus specificity at quantile levels 0.25, 0.5, 0.75, 0.9 and 0.95.

4.1 Sensitivity analyses to the prior distribution of Σ

Sensitivity analyses with respect to the prior distribution of Σ are conducted to evaluate the effect of the prior on the posterior prevalence, sensitivity and specificity. A relatively more informative Wishart prior with ν = 5 and V with diagonal elements equal to 20 and off-diagonal elements equal to 0.05 is used to repeat the analysis. This Wishart prior corresponds to a 95% prior CI of (0.1, 7.5) for the standard deviation components σ. The posterior median disease prevalence is estimated to be 0.43 (0.37, 0.49). Ultrasonography has a posterior median sensitivity of 0.89 (0.78, 0.96) and specificity of 0.79 (0.54, 0.96). The D-dimer test has a posterior median sensitivity of 0.82 (0.67, 0.92) and specificity of 0.88 (0.75, 0.97). Posterior medians and 95% CIs similar to those in Table 1 are obtained using this more informative prior.

A vague prior taking ν = 5 and V with diagonal elements equal to 1 and off-diagonal elements equal to 0.05 is also used to repeat the analysis. This prior distribution corresponds to a 95% prior CI of (0.4, 35) for the standard deviation components. The posterior median of disease prevalence is 0.43 (0.33, 0.53). Ultrasonography has a posterior median sensitivity of 0.90 (0.74, 0.97) and specificity of 0.82 (0.56, 0.98). D-dimer has a posterior median sensitivity of 0.83 (0.65, 0.93) and specificity of 0.89 (0.74, 0.98). Compared to Table 1, this prior leads to wider CIs for all parameters and a slightly higher posterior median for the Sp of ultrasonography.

Overall, different choices of the Wishart prior for Σ have little effect on the posterior medians of prevalence, sensitivity and specificity, but have a slight influence on the width of their CIs.

4.2 Sensitivity analysis to the MAR assumption

The MAR assumption is untestable, but the observed data may inform the plausibility of this assumption. For example, in Figure 1(b) and (c), the Se and Sp estimates for the D-dimer test are generally higher in studies 1–4, which include both the D-dimer test and the gold standard, than in the other studies, suggesting mechanisms that may lead to MNAR.

In this section, we conduct sensitivity analyses to explore the influence on parameter estimates when the MAR assumption is violated. We incorporate the model of missingness in Section 3.4 under different values of γ_1k and γ_0k: 0, -0.5, -1, and -2, which correspond to MAR and odds ratios of missingness of 0.61, 0.37, and 0.13 (with respect to a 1 unit increase in the logit scale of the accuracy parameters), respectively.

The posterior medians of prevalence, sensitivities, and specificities are presented in Table 2 under different missingness assumptions: MAR, missingness related to the accuracy of ultrasonography or the D-dimer test only, missingness related to the sensitivities or the specificities of both tests only, and missingness related to the sensitivities and specificities of both tests. Compared to MAR, the estimates of π are barely affected under the different assumptions. When missingness is negatively correlated with the accuracy of one of the tests, assuming MAR overestimates its Se and Sp while underestimating the other test's performance. When the missingness probabilities are negatively correlated with the specificities, assuming MAR overestimates the specificities, but a similar phenomenon is only observed for the sensitivity of ultrasonography when missingness is related to the sensitivities. When the missingness probabilities are negatively correlated with all parameters, assuming MAR generally overestimates all test accuracies (except for the sensitivity estimates when the relation is weak). The differences between the estimates under the MAR and MNAR assumptions generally grow as the association between missingness and test accuracy becomes stronger. In general, when missingness is negatively correlated with the test accuracy parameters, ignoring the model of missingness will overestimate the test performance. Note that, as shown in this example, due to the complex dependency structure of multiple test parameters, it is hard to tell whether the other tests will be over- or underestimated when one of the tests is MNAR.

Table 2.

Meta-analysis of DVT tests: median parameter estimates and 95% CIs under different missingness assumptions. MNAR=“None” is equivalent to MAR; MNAR=“D-dimer” (“Ultrasonography”) means missingness related to sensitivity and specificity of D-dimer test (ultrasonography); MNAR=“Se”(“Sp”) means missingness related to the sensitivities (specificities) of both the D-dimer test and ultrasonography; MNAR=“All” means missingness related to sensitivities and specificities of both tests. Bold numbers indicate parameters directly related to missingness

MNAR γ1 (D-dimer) γ1 (ultrasonography) γ0 (D-dimer) γ0 (ultrasonography) Prevalence Se: D-dimer Se: ultrasonography Sp: D-dimer Sp: ultrasonography
None 0 0 0 0 0.43 (0.36,0.50) 0.83 (0.68,0.92) 0.90 (0.77,0.96) 0.88 (0.75,0.97) 0.80 (0.54,0.97)
D-dimer -0.5 0 -0.5 0 0.44 (0.37,0.51) 0.81 (0.58,0.95) 0.94 (0.84,1) 0.84 (0.61,0.96) 0.8 (0.56,0.98)
Ultrasonography 0 -0.5 0 -0.5 0.43 (0.36,0.51) 0.89 (0.75,0.99) 0.89 (0.66,0.99) 0.91 (0.78,1) 0.61 (0.15,0.91)
Se -0.5 -0.5 0 0 0.44 (0.37, 0.52) 0.86 (0.72,0.96) 0.93 (0.83,0.99) 0.88 (0.68,0.99) 0.73 (0.38,0.96)
Sp 0 0 -0.5 -0.5 0.43 (0.37,0.51) 0.85 (0.67,0.97) 0.91 (0.7,0.99) 0.88 (0.74,0.98) 0.76 (0.43,0.96)
All -0.5 -0.5 -0.5 -0.5 0.43 (0.36,0.51) 0.85 (0.68,0.97) 0.92 (0.79,0.99) 0.87 (0.71,0.97) 0.72 (0.4,0.93)
D-dimer -1 0 -1 0 0.44 (0.37,0.51) 0.78 (0.49,0.94) 0.95 (0.85,1) 0.80 (0.45,0.96) 0.83 (0.60, 1)
Ultrasonography 0 -1 0 -1 0.43 (0.36,0.50) 0.91 (0.78,1) 0.88 (0.64,0.98) 0.92 (0.79,1) 0.54 (0.11,0.89)
Se -1 -1 0 0 0.43 (0.37,0.51) 0.84 (0.63,0.96) 0.89 (0.63,0.99) 0.90 (0.77,0.99) 0.79 (0.51,0.99)
Sp 0 0 -1 -1 0.43 (0.36,0.51) 0.86 (0.69,0.98) 0.93 (0.83,0.99) 0.86 (0.63,0.98) 0.70 (0.37,0.92)
All -1 -1 -1 -1 0.44 (0.37,0.51) 0.85 (0.68,0.96) 0.90 (0.78,0.98) 0.87 (0.72,0.98) 0.71 (0.38,0.91)
D-dimer -2 0 -2 0 0.44 (0.37,0.52) 0.77 (0.46,0.93) 0.95 (0.85,1) 0.81 (0.49,0.96) 0.85 (0.62,1)
Ultrasonography 0 -2 0 -2 0.43 (0.36,0.50) 0.91 (0.77,1) 0.87 (0.56,0.99) 0.92 (0.79,1) 0.53 (0.11, 0.87)
Se -2 -2 0 0 0.44 (0.37, 0.52) 0.81 (0.56, 0.94) 0.84 (0.53, 0.97) 0.93 (0.81, 0.99) 0.83 (0.57, 0.98)
Sp 0 0 -2 -2 0.43 (0.36,0.51) 0.88 (0.74,0.98) 0.96 (0.86,1) 0.83 (0.61,0.96) 0.68 (0.38,0.89)
All -2 -2 -2 -2 0.44 (0.37,0.51) 0.82 (0.55,0.96) 0.88 (0.65,0.98) 0.86 (0.58,0.99) 0.69 (0.25,0.92)

In summary, the estimates of diagnostic test accuracies were fairly robust under the various MNAR models that we considered. The relative ranks in test accuracies among these tests were also preserved under the different MNAR models. Additional model fitting results when the γ parameters are treated as random are presented in the supplementary material available at Biostatistics online, showing that the γs are weakly identified in this model.

5. Simulation

5.1 Simulation setups

Simulation studies were conducted to assess how the NMA-DT model performs under different assumptions. As in the case study, we assume K = 2; i.e., the whole test set contains two candidate tests (T1 and T2) and a gold standard (T0). The Se (Sp) of T1 is 0.8 (0.9) and the Se (Sp) of T2 is 0.6 (0.7). The overall true disease prevalence is 0.4. We assume all random effects have standard deviation 0.3. The correlations between prevalence and sensitivities are set to 0.5; the correlations between prevalence and specificities, and between sensitivities and specificities, are all set to -0.5.
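The parameter setup just described can be sketched as follows, shown for a single candidate test for brevity. The seed, study count, and use of NumPy/SciPy are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

# Probit-scale means for (prevalence, Se, Sp) of candidate test T1:
# pi = 0.4, Se = 0.8, Sp = 0.9, as in the simulation setup.
mu = norm.ppf([0.4, 0.8, 0.9])  # approximately (-0.25, 0.84, 1.28)

# All random-effect standard deviations are 0.3; corr(pi, Se) = 0.5
# and corr(pi, Sp) = corr(Se, Sp) = -0.5, as stated above.
R = np.array([[ 1.0,  0.5, -0.5],
              [ 0.5,  1.0, -0.5],
              [-0.5, -0.5,  1.0]])
sd = np.full(3, 0.3)
Sigma = np.outer(sd, sd) * R

# Study-specific probit-scale parameters for 20 studies, mapped back
# to probabilities with the standard normal CDF.
eta = rng.multivariate_normal(mu, Sigma, size=20)
theta = norm.cdf(eta)  # columns: prevalence, Se, Sp
```

The probit-scale means reproduce the true values listed in Table 3 for T1.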

We create partially missing data under the MCAR, MAR, and MNAR assumptions, respectively. Under each scenario, we simulate 1000 replicates of NMA-DT datasets. Each dataset comprises 20 studies in which 100 subjects are tested by both candidate tests and the gold standard. To generate test outcomes in each study, study-specific prevalences, sensitivities, and specificities are sampled from the multivariate normal distribution of the probit-transformed parameters in Section 3.2, with the mean vector and covariance matrix specified above. Under the MCAR assumption, the missing indicators for all studies are prespecified such that the first five studies have no missing test outcomes, while each remaining group of five studies is missing one of the three tests (T0, T1, or T2). In addition to the moderate-correlation (0.5) assumption, we also investigate model performance under weak (0.3) and strong (0.8) correlations for the probit-transformed parameters. To generate partially missing data under the MAR and MNAR assumptions, we only prespecify the missing indicator for the gold standard such that T0 is missing in five studies. We then apply a logit model with the observed (unobserved) data as covariates to calculate the missing probabilities of T1 and T2 under the MAR (MNAR) scenario, which are then used to generate the random missing indicators that drop candidate tests in each study. The model for the missing indicators is as follows:
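Given a study's prevalence and test accuracies, its cross-classified counts can be generated along these lines; the sketch assumes conditionally independent test outcomes given the true disease status (as the model does), and the function name and seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_study(n, prev, se, sp):
    """Cross-classified counts for one study in which n subjects take
    two candidate tests plus the gold standard; candidate-test results
    are conditionally independent given the true disease status."""
    d = rng.binomial(1, prev, size=n)  # true status, from the gold standard
    # P(test positive): Se for diseased subjects, 1 - Sp for healthy ones.
    p_pos = np.where(d[:, None] == 1, se, 1 - np.asarray(sp))
    t = rng.binomial(1, p_pos)         # outcomes of the two candidate tests
    counts = {}
    for dd in (0, 1):
        for t1 in (0, 1):
            for t2 in (0, 1):
                counts[(dd, t1, t2)] = int(
                    np.sum((d == dd) & (t[:, 0] == t1) & (t[:, 1] == t2)))
    return counts

# One study of 100 subjects with the true parameter values from Section 5.1.
cells = simulate_study(100, 0.4, se=[0.8, 0.6], sp=[0.9, 0.7])
```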

For both MAR and MNAR:

\[
\begin{aligned}
m_{ik} &\sim \mathrm{Bernoulli}(p_{ik}),\\
\text{MAR:}\quad \operatorname{logit}(p_{ik}) &= \beta_{0k}^{\mathrm{MAR}} + \beta_{1k}^{\mathrm{MAR}}\,\operatorname{logit}(\pi_{ik}),\\
\text{MNAR:}\quad \operatorname{logit}(p_{ik}) &= \beta_{0k}^{\mathrm{MNAR}} + \beta_{1k}^{\mathrm{MNAR}}\,\operatorname{logit}(Se_{ik}) + \beta_{2k}^{\mathrm{MNAR}}\,\operatorname{logit}(Sp_{ik}),
\end{aligned}
\]

where k = 1, 2. Since each study performs at least two tests, at most one test can be missing in a study, and once one test is missing the other two are automatically observed. The intercept terms β0k^MAR and β0k^MNAR are chosen so that the average number of studies missing test k is approximately five in each scenario, matching the MCAR scenario. To demonstrate the robustness of our model under the MAR assumption, we create a case in which the missing probabilities depend strongly on the observed prevalences through a large slope β1k^MAR. Under MNAR, both methods are biased; we choose nonzero slopes β1k^MNAR and β2k^MNAR as an example to investigate the impact of MNAR on the estimates from each method. Cross-classified cell counts are collected for each study to present the observed data, as in the case studies.
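The missingness mechanism can be sketched as follows. The intercept and slope values below are hypothetical placeholders: the paper tunes its intercepts so that roughly five of the 20 studies miss each test, and the actual slope values are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def logit(p):
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Study-specific probabilities (one entry per study); illustrative values.
pi_i = rng.uniform(0.3, 0.5, size=20)   # prevalence (observed under MAR)
se_i = rng.uniform(0.7, 0.9, size=20)   # sensitivity (unobserved)
sp_i = rng.uniform(0.8, 0.95, size=20)  # specificity (unobserved)

# Hypothetical coefficients; in the paper the intercepts are calibrated.
b0, b1, b2 = -2.0, 3.0, 1.0
p_miss_mar = expit(b0 + b1 * logit(pi_i))                      # MAR mechanism
p_miss_mnar = expit(b0 + b1 * logit(se_i) + b2 * logit(sp_i))  # MNAR mechanism

m_mar = rng.binomial(1, p_miss_mar)    # 1 = test outcome missing in that study
m_mnar = rng.binomial(1, p_miss_mnar)
```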

We compare the performance of the NMA-DT model with a “naive” approach, which applies the trivariate generalized linear mixed model (TGLMM) (Chu and others, 2009) to the studies reporting both T1 and T0, accounting for potential correlations between disease prevalence and the test accuracy parameters. Specifically, studies reporting only T2 and T0 and studies reporting only T1 and T2 are excluded from the naive analysis. The naive analysis is not applied to T2 because T1 and T2 are exchangeable. Test outcomes of T2 in studies reporting all three tests are ignored, and only the 2 × 2 tables cross-classifying the outcomes of T1 and T0 are used to fit the TGLMM. In total, 10 of the 20 studies in each dataset are used to evaluate the performance of T1 in the naive approach. The estimates of the fixed effects for the prevalence and for the sensitivity and specificity of T1 are compared with those from the NMA-DT model.

5.2 Simulation results

Table 3 summarizes the bias, mean squared error (MSE), and 95% CI coverage probability (CP) of the fixed-effects estimates from the proposed NMA-DT model (column “NMA-DT”). Across the different assumptions, the NMA-DT model provides nearly unbiased estimates for all parameters with small MSE. The estimates are generally more biased under the MAR and MNAR assumptions, or as the correlation becomes stronger. The coverage probabilities remain close to the nominal level of 0.95 in most scenarios, decreasing somewhat under the MNAR assumption.

Table 3.

Simulation results: bias, mean squared error (MSE), and 95% CI coverage probability (CP) of the estimates of the probit-scale fixed effects. Estimates from the proposed NMA-DT model and the “naive” method are compared for the prevalence and the accuracy parameters of T1

                                NMA-DT                     Naive
Parameter (true)          Bias     MSE     CP        Bias     MSE     CP
MCAR, weak correlation (0.3)
probit(π)   (-0.25)       0.001    0.008   0.957     0.001    0.011   0.976
probit(Se1) (0.84)        0.005    0.015   0.965     0.008    0.017   0.975
probit(Sp1) (1.28)       -0.001    0.013   0.966     0.003    0.015   0.978
probit(Se2) (0.25)        0.006    0.012   0.963     –        –       –
probit(Sp2) (0.52)        0.003    0.010   0.966     –        –       –
MCAR, moderate correlation (0.5)
probit(π)   (-0.25)       0.001    0.005   0.967     0.001    0.011   0.962
probit(Se1) (0.84)        0.008    0.014   0.961     0.011    0.017   0.956
probit(Sp1) (1.28)        0.008    0.014   0.957     0.013    0.017   0.958
probit(Se2) (0.25)        0.007    0.010   0.955     –        –       –
probit(Sp2) (0.52)        0.007    0.009   0.959     –        –       –
MCAR, strong correlation (0.8)
probit(π)   (-0.25)      -0.006    0.007   0.964    -0.005    0.011   0.972
probit(Se1) (0.84)        0.010    0.013   0.972     0.014    0.016   0.973
probit(Sp1) (1.28)        0.021    0.014   0.972     0.023    0.016   0.971
probit(Se2) (0.25)        0.007    0.011   0.971     –        –       –
probit(Sp2) (0.52)        0.010    0.010   0.969     –        –       –
MAR, moderate correlation (0.5)
probit(π)   (-0.25)      -0.009    0.006   0.962     0.230    0.059   0.534
probit(Se1) (0.84)        0.055    0.021   0.972     0.116    0.033   0.920
probit(Sp1) (1.28)       -0.040    0.020   0.967    -0.105    0.030   0.928
probit(Se2) (0.25)        0.012    0.007   0.967     –        –       –
probit(Sp2) (0.52)        0.005    0.007   0.955     –        –       –
MNAR, moderate correlation (0.5)
probit(π)   (-0.25)      -0.009    0.006   0.954     0.047    0.012   0.959
probit(Se1) (0.84)        0.073    0.018   0.932     0.104    0.024   0.926
probit(Sp1) (1.28)       -0.015    0.013   0.966    -0.036    0.016   0.968
probit(Se2) (0.25)        0.032    0.010   0.959     –        –       –
probit(Sp2) (0.52)        0.007    0.009   0.955     –        –       –

The fixed-effects estimates for the prevalence and for the sensitivity and specificity of T1 from the “naive” approach are also summarized in Table 3, column “Naive”. Under the MCAR assumption, both methods perform reasonably well, although NMA-DT is slightly more efficient because it borrows strength from the indirect evidence. Under the MAR and MNAR assumptions, however, the estimates from the naive method are substantially more biased than those from the NMA-DT model, and the coverage probabilities can fall well below 0.95 when the missingness is strongly associated with the observed or unobserved data. These observations are expected because the underlying assumption is MAR for NMA-DT, whereas it is MCAR for the naive approach.
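The bias, MSE, and CP columns of Table 3 can be computed from replicate-level output along the following lines; the replicate draws below are synthetic and for illustration only.

```python
import numpy as np

def summarize(estimates, lower, upper, truth):
    """Bias, MSE, and 95% CI coverage probability across simulation
    replicates, as reported in Table 3."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - truth
    mse = np.mean((estimates - truth) ** 2)
    cp = np.mean((np.asarray(lower) <= truth) & (truth <= np.asarray(upper)))
    return bias, mse, cp

# Toy check: 1000 synthetic replicate estimates around a true value of 0.84
# (the probit-scale sensitivity of T1), with CIs built from the true SD.
rng = np.random.default_rng(2)
est = rng.normal(0.84, 0.1, size=1000)
lo, hi = est - 1.96 * 0.1, est + 1.96 * 0.1
bias, mse, cp = summarize(est, lo, hi, truth=0.84)
```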

6. Discussion

There is growing interest in simultaneously comparing the performance of multiple diagnostic tests in a network meta-analysis setting. However, the mixture of study designs, the varying sets of reported test outcomes, between-study heterogeneity, and the complex correlation structure of multiple test outcomes make methodological development for NMA-DT challenging. In this article, we presented a Bayesian hierarchical NMA-DT framework that unifies all three types of study designs within the multiple test comparison design through a missing data framework. In addition, it can rank the diagnostic tests to guide clinical decision making. Through simulation studies, we have shown that the proposed method provides unbiased estimates of prevalence and test accuracy, and that it is more efficient than a commonly used “naive” approach that performs separate meta-analyses for each candidate test.

The NMA-DT model relies on a “consistency” assumption: a candidate test is assumed to perform consistently on subjects assigned and not assigned to the test. Inconsistency could arise, however, when studies that do not include a particular test enroll a population for whom that test is inappropriate, so that its performance differs systematically from that in studies that do include the test. In this situation, the MAR assumption is questionable and borrowing information across studies must be done with caution. A similar concern arises in contrast-based NMA methods (Lu and Ades, 2006), where indirect evidence may be inconsistent with direct evidence. White and others (2012) proposed frequentist approaches to estimating consistency and inconsistency models by expressing them as multivariate random-effects meta-regressions. Lu and Ades (2006) proposed using inconsistency degrees of freedom to estimate the degree of inconsistency in evidence cycles. However, this method cannot be applied directly in NMA-DT because it is restricted to relative effects (e.g., log odds ratios), whereas NMA-DT estimates marginal test accuracies (e.g., Se and Sp). Researchers have also been developing methods to detect inconsistency in arm-based NMA models (Hong and others, 2016a; Zhao and others, 2016), and the comparison of contrast-based and arm-based NMA methods has been discussed at length (Dias and Ades, 2016; Hong and others, 2016b). Further research is needed to develop a formal test of inconsistency in NMA-DT.

The proposed model assumes that test results are conditionally independent given the true disease status and all study-specific diagnostic accuracy parameters. This assumption may be violated when two candidate tests are based on a similar biological mechanism (Vacek, 1985). Attempts have been made to account for such dependence through a correlation parameter (Chu, Chen and others, 2009), an additional latent-class random effect (Qu and others, 1996), or multivariate probit models (Xu and Craig, 2009). However, these cannot be applied directly to the NMA-DT model, because correlation parameters are suitable only for pairwise comparisons and only a small portion of the studies in an NMA-DT may be subject to conditional dependence. Specifically, for studies adopting the randomized design, each candidate test is compared with the gold standard, so the conditional independence assumption is not required. For studies adopting the multiple test comparison design, conditional dependence may become a concern, since several candidate tests are compared simultaneously. Similarly, non-comparative designs may also suffer from conditional dependence, but only when the gold standard is not involved and subjects are tested by two candidate tests. How to adjust for conditional dependence in NMA-DT is therefore a topic for future research.

A concern raised by combining studies in a systematic review is how to correctly measure between-study heterogeneity. In this article, generalized linear mixed models are used to account for heterogeneity in a Bayesian framework, where the posterior estimate of the random-effects covariance matrix measures the extent of heterogeneity. An inverse-Wishart prior is used for the covariance matrix, but it is limited in that the variance components are constrained to be strictly positive. Another limitation is that, as the dimension of the covariance matrix grows, the inverse-Wishart prior imposes an unstructured covariance matrix, whereas a structured correlation assumption may be more efficient. Applications of alternative priors for the covariance matrix (Daniels and Kass, 1999) in NMA-DT deserve further research.
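As a concrete illustration of the inverse-Wishart prior discussed here, the following sketch draws one covariance matrix and inspects its implied correlations. The dimension and degrees of freedom are illustrative choices, not the paper's values.

```python
import numpy as np
from scipy.stats import invwishart

# A weakly informative inverse-Wishart prior for a 5x5 random-effects
# covariance matrix; p and df are illustrative, not the paper's.
p = 5
Sigma = invwishart.rvs(df=p + 2, scale=np.eye(p), random_state=42)

# Every draw is symmetric positive definite, so all variance components
# are forced to be strictly positive -- the limitation noted above.
sd = np.sqrt(np.diag(Sigma))
corr = Sigma / np.outer(sd, sd)
```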

Finally, we note that although a multivariate normal distribution is assumed for the random effects in the proposed model, it is straightforward to extend the model to other multivariate distributions, including distributions generated from copulas.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Acknowledgments

We thank the associate editor and an anonymous reviewer for many constructive comments. Conflict of Interest: None declared.

Funding

Research reported in this publication was supported in part by NIAID R21 AI103012 (H.C., X.M.), NIDCR R03 DE024750 (H.C.), NLM R21 LM012197 (H.C.), NIDDK U01 DK106786 (H.C.), and NHLBI T32HL129956 (Q.L). The content is solely the responsibility of the authors and does not necessarily represent official views of the National Institutes of Health.

References

1. Chen Y., Liu Y., Ning J., Cormier J. and Chu H. (2015). A hybrid model for combining case-control and cohort studies in systematic reviews of diagnostic tests. Journal of the Royal Statistical Society: Series C (Applied Statistics) 64, 469–489.
2. Chu H., Chen S. and Louis T. A. (2009). Random effects models in a meta-analysis of the accuracy of two diagnostic tests without a gold standard. Journal of the American Statistical Association 104, 512–523.
3. Chu H. and Cole S. R. (2006). Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. Journal of Clinical Epidemiology 59, 1331–1332.
4. Chu H., Nie L., Cole S. R. and Poole C. (2009). Meta-analysis of diagnostic accuracy studies accounting for disease prevalence: alternative parameterizations and model selection. Statistics in Medicine 28, 2384–2399.
5. Daniels M. J. and Kass R. E. (1999). Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. Journal of the American Statistical Association 94, 1254–1263.
6. Dendukuri N., Schiller I., Joseph L. and Pai M. (2012). Bayesian meta-analysis of the accuracy of a test for tuberculous pleuritis in the absence of a gold standard reference. Biometrics 68, 1285–1293.
7. Dias S. and Ades A. E. (2016). Absolute or relative effects? Arm-based synthesis of trial data. Research Synthesis Methods 7, 23–28.
8. Gelman A. and Rubin D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7, 457–472.
9. Hamza T. H., Reitsma J. B. and Stijnen T. (2008). Meta-analysis of diagnostic studies: a comparison of random intercept, normal-normal, and binomial-normal bivariate summary ROC approaches. Medical Decision Making 28, 639–649.
10. Harbord R. M., Deeks J. J., Egger M., Whiting P. and Sterne J. A. (2007). A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 8, 239–251.
11. Hong H., Chu H., Zhang J. and Carlin B. P. (2016a). A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Research Synthesis Methods 7, 6–22.
12. Hong H., Chu H., Zhang J. and Carlin B. P. (2016b). Rejoinder to the discussion of a Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons, by S. Dias and A. E. Ades. Research Synthesis Methods 7, 29–33.
13. Kang J., Brant R. and Ghali W. A. (2013). Statistical methods for the meta-analysis of diagnostic tests must take into account the use of surrogate standards. Journal of Clinical Epidemiology 66, 566–574.
14. Leeflang M. M., Bossuyt P. M. and Irwig L. (2009). Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis. Journal of Clinical Epidemiology 62, 5–12.
15. Little R. J. and Rubin D. (2002). Statistical Analysis with Missing Data, 2nd edition. Hoboken, NJ: John Wiley & Sons.
16. Liu Y., Chen Y. and Chu H. (2015). A unification of models for meta-analysis of diagnostic accuracy studies without a gold standard. Biometrics 71, 538–547.
17. Lu G. and Ades A. (2004). Combination of direct and indirect evidence in mixed treatment comparisons. Statistics in Medicine 23, 3105–3124.
18. Lu G. and Ades A. (2006). Assessing evidence inconsistency in mixed treatment comparisons. Journal of the American Statistical Association 101, 447–459.
19. Lunn D. J., Thomas A., Best N. and Spiegelhalter D. (2000). WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 10, 325–337.
20. Ma X., Chen Y., Cole S. R. and Chu H. (2016a). A hybrid Bayesian hierarchical model combining cohort and case-control studies for meta-analysis of diagnostic tests: accounting for partial verification bias. Statistical Methods in Medical Research 25, 3015–3037.
21. Ma X., Nie L., Cole S. R. and Chu H. (2016b). Statistical methods for multivariate meta-analysis of diagnostic tests: an overview and tutorial. Statistical Methods in Medical Research 25, 1596–1619.
22. Pepe M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction, Chapter 2. Oxford: Oxford University Press.
23. Plummer M. (2003). JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), pp. 20–22.
24. Qu Y., Tan M. and Kutner M. H. (1996). Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics 52, 797–810.
25. Reitsma J. B., Glas A. S., Rutjes A. W., Scholten R. J., Bossuyt P. M. and Zwinderman A. H. (2005). Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 58, 982–990.
26. Rutter C. M. and Gatsonis C. A. (2001). A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Statistics in Medicine 20, 2865–2884.
27. Salanti G., Ades A. and Ioannidis J. (2011). Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. Journal of Clinical Epidemiology 64, 163–171.
28. Takwoingi Y., Leeflang M. M. and Deeks J. J. (2013). Empirical evidence of the importance of comparative studies of diagnostic test accuracy. Annals of Internal Medicine 158, 544–554.
29. Vacek P. M. (1985). The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics 41, 959–968.
30. Venta E. R. and Venta L. A. (1987). The diagnosis of deep-vein thrombosis: an application of decision analysis. Journal of the Operational Research Society 38, 615–624.
31. White I. R., Barrett J. K., Jackson D. and Higgins J. (2012). Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression. Research Synthesis Methods 3, 111–125.
32. Xu H. and Craig B. A. (2009). A probit latent class model with general correlation structures for evaluating accuracy of diagnostic tests. Biometrics 65, 1145–1155.
33. Zhang J., Carlin B. P., Neaton J. D., Soon G. G., Nie L., Kane R., Virnig B. A. and Chu H. (2014). Network meta-analysis of randomized clinical trials: reporting the proper summaries. Clinical Trials 11, 246–262.
34. Zhao H., Hodges J. S., Ma H., Jiang Q. and Carlin B. P. (2016). Hierarchical Bayesian approaches for detecting inconsistency in network meta-analysis. Statistics in Medicine 35, 3524–3536.

