Author manuscript; available in PMC: 2025 Sep 23.
Published before final editing as: Bayesian Anal. 2025 Feb 20:10.1214/25-ba1511. doi: 10.1214/25-ba1511

Exploiting Multivariate Network Meta-Analysis: A Calibrated Bayesian Composite Likelihood Inference

Yifei Wang*, Lifeng Lin, Yu-Lun Liu
PMCID: PMC12453069  NIHMSID: NIHMS2111791  PMID: 40989826

Abstract

Multivariate network meta-analysis has emerged as a powerful tool for evidence synthesis by incorporating multiple outcomes and treatments. Despite its advantages, this method comes with methodological challenges, such as the issue of unreported within-study correlations among treatments and outcomes, which can lead to biased estimates and misleading conclusions. In this paper, we propose a calibrated Bayesian composite likelihood approach to overcome this limitation. The proposed method eliminates the need for a fully specified likelihood function while allowing for the unavailability of within-study correlations among treatments and outcomes. Additionally, we developed a hybrid Gibbs sampler algorithm along with the Open-Faced Sandwich post-sampling adjustment to enable robust posterior inference. Through comprehensive simulation studies, we demonstrated that the proposed approach yields unbiased estimates while maintaining coverage probabilities close to the nominal levels. We applied the proposed method to two real-world network meta-analysis datasets: one comparing treatment procedures for root coverage and the other comparing treatments for anemia in patients with chronic kidney disease.

Keywords: Bayesian composite likelihood, Gibbs sampling, multivariate network meta-analysis, Open-Faced Sandwich adjustment, unknown within-study correlations

1. Introduction

Network meta-analysis (NMA), also known as multiple treatments meta-analysis or mixed treatment comparisons, has emerged as a pivotal tool for synthesizing the effectiveness and safety of multiple treatment regimens in medical and healthcare domains (Lumley, 2002; Lu and Ades, 2006; White et al., 2012; Salanti, 2012; Riley et al., 2017). This is evident from the growing number of NMAs, as a recent comprehensive review by Petropoulou et al. (2017) identified 456 NMAs from randomized trials across different domains. The rationale behind NMA lies in estimating pooled treatment effects across at least three interventions by combining both direct and indirect evidence into a coherent framework. This approach overcomes the limitations of scarce head-to-head trials and allows the borrowing of indirect evidence through a common comparator. NMA enables the establishment of a treatment hierarchy, ranking candidate interventions from most to least effective (Cipriani et al., 2013). Numerous approaches for treatment ranking have been developed and widely adopted in healthcare-related research, such as the surface under the cumulative ranking curve (SUCRA) (Salanti et al., 2011) and P-score (Rücker and Schwarzer, 2015), among others.

While NMAs typically focus on single outcomes, real-world scenarios often require evaluating multiple outcomes simultaneously, such as in benefit-risk assessments and health economic evaluations. Since these outcomes of interest are measured within the same population in the same study, they are likely to be correlated. As suggested by Jackson et al. (2011), correlations among multiple outcomes are potentially informative and worth leveraging. Multivariate NMAs allow for the incorporation of correlation estimations, in which correlations between multiple outcomes can occur at both the between-study and within-study levels (Riley et al., 2017). However, obtaining within-study correlations poses a challenge due to their infrequent reporting in trials (Riley, 2009). Ignoring the possibility of within-study correlations tends to distort estimates of relative treatment effects, leading to increased mean-square errors and standard errors of pooled effect estimates (Riley, 2009). Standard multivariate NMAs or meta-analyses typically assume either known or zero within-study correlations. Alternatively, assumed correlation coefficients, possibly combined with sensitivity analysis (Waddingham et al., 2020), can be used to model within-study correlations. Riley et al. (2008) proposed a single overall correlation parameter by combining within-study and between-study correlations in the multivariate meta-analysis framework. Furthermore, Wei and Higgins (2013) derived approximate formulations for assessing within-study correlations among different pairs of treatment effect measurements.

Recent advancements have expanded the methodologies for multivariate NMAs. Efthimiou et al. (2014) introduced a joint model for two correlated dichotomous outcomes within a network and elicited expert opinions to inform within-study correlations. Building upon their work, Efthimiou et al. (2015) further incorporated both within-study and between-study correlations into a Bayesian framework. Additionally, they employed the overall correlations proposed by Riley et al. (2008) as an alternative model to tackle the absence of within-study correlations in multivariate NMAs. Hong et al. (2016) presented a Bayesian framework, extending the work of Lu and Ades (2004), for modeling multiple correlated outcomes; however, their method did not account for within-study correlations. Jackson et al. (2018) proposed a matrix-based method of moments approach for multivariate NMAs, accommodating between-study heterogeneity and inconsistency. Furthermore, Liu et al. (2018) devised a Bayesian hierarchical approach using the bivariate Clayton copula model to leverage information across multiple correlated outcomes and treatments, partially mitigating the impact of outcome reporting bias. A recent contribution by Duan et al. (2023) introduced personalized treatment ranking procedures which enabled the consideration of individual preferences across multiple outcomes.

In this paper, we propose a Bayesian composite likelihood-based method for multivariate NMAs to address two potential challenges: the unavailability of within-study correlations and computational burden. While composite likelihood methods (Lindsay, 1988; Varin et al., 2011) are commonly used in spatial extremes research (Pauli et al., 2011; Ribatet et al., 2012), they have also been applied in diagnostic test accuracy evaluations within multivariate meta-analysis settings, as demonstrated by Chen et al. (2014a), Chen et al. (2014b), and Guolo and To (2020). However, their application to multivariate NMAs remains relatively under-explored. The proposed method offers several advantages over existing Bayesian methods for multivariate NMAs. First, it enables valid statistical inference without requiring knowledge of within-study correlations among outcomes and treatments. This is achieved by calibrating the composite likelihood to approximate the posterior distribution of the parameters using Markov chain Monte Carlo. Second, while it is acknowledged that the posterior distribution derived from a composite likelihood method may lead to overly precise estimates (Pauli et al., 2011; Ribatet et al., 2012), we address this issue by employing the Open-Faced Sandwich adjustment on posterior samples to ensure accurate asymptotic variance and preserve the proper shapes of posterior distributions. Finally, the proposed method significantly improves computational efficiency through the implementation of a parallel sampling strategy.

The remainder of this paper is organized as follows. In Section 2, we present two motivating NMA examples. In Section 3, we formalize the Bayesian composite likelihood approach for multivariate NMAs using the Markov chain Monte Carlo algorithm and introduce the calibration of posterior samples via the Open-Faced Sandwich adjustment. In Section 4, we evaluate the performance and statistical properties of the proposed method using simulation studies under various scenarios. In Section 5, we demonstrate the application of the proposed method with two motivating NMA examples. Finally, Section 6 concludes the paper with a discussion and provides recommendations for future research.

2. Motivating examples

2.1. Comparative analysis of coronally advanced flap procedures for root coverage

Several procedures combined with the coronally advanced flap (CAF) have been proposed to treat gingival recessions. Cairo et al. (2008) showed that connective tissue grafts (CTGs) or enamel matrix derivatives (EMDs), in conjunction with CAF, increase complete root coverage (CRC) rates in Miller Class I and II single gingival recessions. Cheng et al. (2007) found that EMD enhances the predictability of CAF outcomes. However, these systematic reviews did not comprehensively compare all treatment options. To address this, Buti et al. (2013) conducted a Bayesian network meta-analysis (Lu and Ades, 2006) to compare and rank the efficacy of seven CAF-based treatments, including CAF, CAF+CTG, CAF+EMD, CAF plus barrier membrane (CAF+BM), CAF plus acellular dermal matrix (CAF+ADM), CAF plus platelet-rich plasma (CAF+PRP), CAF plus collagen matrix (CAF+CM), and CAF plus human fibroblast-derived dermal substitute (CAF+HF-DDS). The analysis included 29 trials with a focus on four clinical outcomes: recession reduction (RecRed), clinical attachment gain (CALgain), keratinized tissue gain (KTgain), and complete root coverage (CRC). The authors concluded that CAF combined with CTG is the most efficacious treatment procedure. However, within-study correlations between treatments and outcomes were not reported in this NMA. The evidence networks are illustrated in Figure S1 of Appendix A in the Supplementary Material (Wang et al., 2025).

2.2. Comparative effectiveness of treatments for anemia in adults with chronic kidney disease

Anemia, characterized by a deficiency in red blood cells, is a frequent complication among patients with chronic kidney disease (CKD). Symptoms include fatigue, shortness of breath, and, in severe cases, the need for blood transfusions. Erythropoiesis-stimulating agents (ESAs) are commonly administered to treat anemia in CKD patients, but recent studies have raised concerns about potential cardiovascular risks associated with ESA use (Hanna et al., 2021). To further assess the effectiveness and safety of various ESAs, Chung et al. (2023) conducted an NMA of 74 published trials up to April 29, 2022. This analysis compared six ESA treatments (including epoetin alfa, epoetin beta, darbepoetin alfa, methoxy polyethylene glycol-epoetin beta, biosimilar epoetin, and biosimilar darbepoetin) across nine clinically relevant outcomes, such as blood transfusion, cardiovascular mortality, and kidney failure. The networks of evidence are illustrated in Figure S2 of Appendix A in the Supplementary Material. They employed a multivariate meta-regression method for NMAs, as proposed by White et al. (2012), assuming a common heterogeneity variance structure and known within-study correlations. Chung et al. (2023) concluded that the relative efficacy of different ESAs remains unclear.

3. Methodology

3.1. Notation and model specifications

Suppose that $I$ studies ($i = 1, \ldots, I$) have compared a total of $T$ treatments, denoted by $\mathcal{T} = \{A, B, C, \ldots\}$. Each study compares at least two treatments from $\mathcal{T}$. Let $d = 1, \ldots, D$ index the designs or sets of treatments compared within a study. Define $n_d$ as the number of studies for the $d$th design, each involving a comparison of $T_d$ treatments. For example, if $d = 1$ corresponds to a three-arm design comparing treatments $A$, $B$, and $C$ (thus $T_1 = 3$), and only one study follows this three-arm design, then $n_1 = 1$. Assume that a set of $L$ outcomes is of interest, where the $L$ continuous outcomes consist of standardized mean differences or other transformed continuous scales, such as log odds ratios or log risk ratios. Let $y_{i,l}^{XZ}$ represent the observed contrast of treatment $Z$ ($Z \in \mathcal{T}$) with treatment $X$ ($X \in \mathcal{T}$) for the $i$th study and the $l$th outcome ($l = 1, \ldots, L$), along with a standard error $s_{i,l}^{XZ}$. We note that the corresponding standard errors, $s_{i,l}^{XZ}$, are assumed to be known and equal to the estimates provided by each study. While this assumption is standard in network meta-analysis, it may not always hold, especially when study-specific standard error estimates are imprecise or unavailable. Our proposed method can also estimate within-study variances when they are unknown, accounting for additional uncertainty and improving the robustness of the analysis.

For illustrative purposes and without loss of generality, we consider a network with three treatments ($A$, $B$, $C$) and two distinct yet potentially correlated continuous outcomes, denoted by $y_{i,1}^{XZ}$ and $y_{i,2}^{XZ}$, where $X, Z \in \mathcal{T} = \{A, B, C\}$. Any treatment can be chosen as a reference; for the sake of simplicity, treatment $A$ is chosen as a reference throughout this section. When comparing the treatments $A$, $B$, and $C$, possible designs include two-arm designs $AB$, $BC$, $AC$, as well as a three-arm design $ABC$. Let $y_i = (y_{i,1}^{AB}, y_{i,1}^{BC}, y_{i,1}^{AC}, y_{i,2}^{AB}, y_{i,2}^{BC}, y_{i,2}^{AC})^{\top}$ and $s_i = (s_{i,1}^{AB}, s_{i,1}^{BC}, s_{i,1}^{AC}, s_{i,2}^{AB}, s_{i,2}^{BC}, s_{i,2}^{AC})^{\top}$ represent the observed treatment effects and their corresponding standard errors for the $i$th study, respectively. The random-effects model for the observed data is specified as follows,

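$$y_i \sim \mathrm{MVN}(\mu, \Omega_i + \Sigma), \tag{1}$$

$$\Omega_i = \begin{pmatrix}
(s_{i,1}^{AB})^2 & \rho_{i,12}^{w} s_{i,1}^{AB} s_{i,1}^{BC} & \cdots & \rho_{i,16}^{w} s_{i,1}^{AB} s_{i,2}^{AC} \\
\rho_{i,12}^{w} s_{i,1}^{AB} s_{i,1}^{BC} & (s_{i,1}^{BC})^2 & \cdots & \rho_{i,26}^{w} s_{i,1}^{BC} s_{i,2}^{AC} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{i,16}^{w} s_{i,1}^{AB} s_{i,2}^{AC} & \rho_{i,26}^{w} s_{i,1}^{BC} s_{i,2}^{AC} & \cdots & (s_{i,2}^{AC})^2
\end{pmatrix},$$

$$\Sigma = \begin{pmatrix}
(\tau_{1}^{AB})^2 & \rho_{12}^{b} \tau_{1}^{AB} \tau_{1}^{BC} & \cdots & \rho_{16}^{b} \tau_{1}^{AB} \tau_{2}^{AC} \\
\rho_{12}^{b} \tau_{1}^{AB} \tau_{1}^{BC} & (\tau_{1}^{BC})^2 & \cdots & \rho_{26}^{b} \tau_{1}^{BC} \tau_{2}^{AC} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{16}^{b} \tau_{1}^{AB} \tau_{2}^{AC} & \rho_{26}^{b} \tau_{1}^{BC} \tau_{2}^{AC} & \cdots & (\tau_{2}^{AC})^2
\end{pmatrix},$$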

where $\mu = (\mu_1, \mu_2)$, and $\mu_l = (\mu_l^{AB}, \mu_l^{BC}, \mu_l^{AC})$ represents the overall mean estimates for outcome $l$. $\Omega_i$ is the 6 × 6 within-study variance-covariance matrix, with elements $\rho_{i,st}^{w}$ ($1 \le s < t \le 6$) capturing the within-study correlations. Estimating these correlations is often challenging due to limited available data (Riley et al., 2008; Riley, 2009; Kirkham et al., 2012). We denote the within-study correlation matrix for each study as $R_i^w$. Furthermore, $\Sigma$ is the 6 × 6 between-study variance-covariance matrix; $(\tau_l^{XZ})^2$ ($l = 1, 2$) and $\rho_{st}^{b}$ ($1 \le s < t \le 6$) describe the between-study heterogeneity variance and correlation, respectively. The between-study heterogeneity variances are denoted by $\tau = (\tau_1, \tau_2)$, with $\tau_l = ((\tau_l^{AB})^2, (\tau_l^{BC})^2, (\tau_l^{AC})^2)$ for $l = 1, 2$, and the between-study correlation matrix is represented as $R^b$.
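
One consequence of Equation (1) is worth spelling out. Restricting a multivariate normal vector to a single coordinate keeps only the matching diagonal entries of the covariance matrix, so each observed contrast has the following marginal distribution (a standard property of the multivariate normal, written in the notation above):

```latex
y_{i,l}^{XZ} \sim N\!\left(\mu_l^{XZ},\; (s_{i,l}^{XZ})^2 + (\tau_l^{XZ})^2\right),
\qquad l = 1, 2, \quad XZ \in \{AB, BC, AC\}.
```

None of the correlations $\rho_{i,st}^{w}$ or $\rho_{st}^{b}$ appears in these margins, so they can be written down even when within-study correlations are not reported.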

3.2. Proposed method

The full likelihood function in Equation (1) often encounters computational challenges due to the high-dimensional parameter space $\Theta = (\theta, R_i^w, R^b)$. This space includes parameters of primary interest, $\theta = (\mu, \tau)$, and nuisance parameters, $(R_i^w, R^b)$. This complexity motivates the exploration of computationally efficient approximations under a Bayesian framework using the composite likelihood function. More specifically, this strategy involves replacing the full likelihood function with a surrogate composite likelihood (Lindsay, 1988; Varin et al., 2011), which shares similar properties with the standard full likelihood. The concept behind the composite likelihood paradigm (Lindsay, 1988) lies in constructing a set of pseudo-likelihood functions based on either conditional or marginal likelihood components. Let $\mathcal{A}_l^{XZ}$ denote a set of studies containing effect sizes along with their associated standard errors for outcome $l$ when comparing treatments $X$ and $Z$. It is worth noting that studies in each $\mathcal{A}_l^{XZ}$ are likely to overlap due to the inclusion of multiple outcomes and treatment comparisons in single studies. Considering all $y_{i,l}^{XZ}$ for studies in $\mathcal{A}_l^{XZ}$, the log composite likelihood function for the observed data is given by

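$$\ell_c(\mu, \tau) = -\frac{1}{2} \sum_{l=1}^{2} \sum_{X, Z \in \mathcal{T}} \sum_{i \in \mathcal{A}_l^{XZ}} w_l^{XZ} \left[ \log\left\{ (s_{i,l}^{XZ})^2 + (\tau_l^{XZ})^2 \right\} + \frac{(y_{i,l}^{XZ} - \mu_l^{XZ})^2}{(s_{i,l}^{XZ})^2 + (\tau_l^{XZ})^2} \right], \tag{2}$$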

where $w_l^{XZ}$ represents non-negative weights. We initially set $w_l^{XZ} = 1$ for all $XZ$ and subsequently apply a post-sampling adjustment to implicitly incorporate different weights. The composite likelihood function is constructed under the assumption of a working independence covariance structure. This implies that outcomes are treated as independent both within and between studies. While this assumption simplifies computations, potential dependencies are partially accounted for using a sandwich-type variance estimator. The sandwich-type variance estimator offers two advantages in multivariate NMAs. First, it is robust to dependence, as explained in the following subsections. Second, the sandwich-type estimator is generally robust to misspecification of covariance structures, which is beneficial for handling complex data structures commonly encountered in multivariate NMAs. It is essential to emphasize that these methods remain valid only if the marginal model is accurately specified.
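
To make the structure of Equation (2) concrete, the following is a minimal Python sketch of the working-independence log composite likelihood. The function name, the dictionary layout keyed by (outcome, comparison), and the toy numbers are illustrative assumptions, not part of the original implementation.

```python
import math

def log_composite_likelihood(data, mu, tau2, weights=None):
    """Working-independence log composite likelihood in the form of Equation (2).

    data:    dict mapping (outcome, comparison) to a list of (y, s) pairs
    mu:      dict mapping the same keys to the pooled effect
    tau2:    dict mapping the same keys to the between-study variance
    weights: optional dict of non-negative weights; defaults to 1 for every key
    """
    total = 0.0
    for key, pairs in data.items():
        w = 1.0 if weights is None else weights[key]
        for y, s in pairs:
            v = s * s + tau2[key]  # marginal variance s^2 + tau^2
            total += w * (math.log(v) + (y - mu[key]) ** 2 / v)
    return -0.5 * total

# Hypothetical toy data: two outcomes, comparison AB only.
data = {(1, "AB"): [(0.9, 0.3), (1.2, 0.4)], (2, "AB"): [(-1.1, 0.5)]}
mu = {(1, "AB"): 1.0, (2, "AB"): -1.0}
tau2 = {(1, "AB"): 0.36, (2, "AB"): 0.36}
print(log_composite_likelihood(data, mu, tau2))
```

Because each term involves only one outcome and one comparison, the sum splits into independent pieces, which is what later permits component-wise and parallel sampling.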

Prior and posterior distributions

In the log composite likelihood function for observed data (as shown in Equation (2)), the parameters to be estimated consist of the treatment effect $\mu_l^{XZ}$ and the between-study heterogeneity variance $(\tau_l^{XZ})^2$ for outcome $l$ between treatments $X$ and $Z$. Assuming no prior knowledge, we assign non-informative prior distributions to these parameters. Specifically, for outcome $l$ and treatments $X$ and $Z$, we adopt a normal prior distribution, $N(0, 10000)$, for the treatment effect $\mu_l^{XZ}$ and an inverse-gamma distribution, $\mathrm{IG}(0.001, 0.001)$, for the between-study variance $(\tau_l^{XZ})^2$.

To obtain samples of the parameters of interest from their posterior distributions, we leverage the observed data to derive posterior kernel functions from the composite likelihood function and specified priors. Given the observed data $y$ and the vector of between-study variances $\tau$, the posterior kernel function for the treatment effect $\mu_l^{XZ}$ is

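$$P(\mu_l^{XZ} \mid \tau, y) \propto L\big((\tau_l^{XZ})^2, \mu_l^{XZ}\big)\, P(\mu_l^{XZ}) \propto \exp\left\{ -\frac{(\mu_l^{XZ})^2}{20000} - \sum_{i \in \mathcal{A}_l^{XZ}} \frac{(y_{i,l}^{XZ} - \mu_l^{XZ})^2}{2\{(s_{i,l}^{XZ})^2 + (\tau_l^{XZ})^2\}} \right\} \propto \exp\left\{ -\frac{\big(\mu_l^{XZ} - \frac{B}{2A}\big)^2}{1/A} \right\}. \tag{3}$$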

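Thus, the posterior distribution, $P(\mu_l^{XZ} \mid \tau, y)$, is a normal distribution,

$$\mu_l^{XZ} \mid \tau, y \sim N\left( \frac{B}{2A}, \frac{1}{2A} \right),$$

where

$$A = \frac{1}{20000} + \sum_{i \in \mathcal{A}_l^{XZ}} \frac{1}{2\{(s_{i,l}^{XZ})^2 + (\tau_l^{XZ})^2\}},$$

$$B = \sum_{i \in \mathcal{A}_l^{XZ}} \frac{y_{i,l}^{XZ}}{(s_{i,l}^{XZ})^2 + (\tau_l^{XZ})^2}.$$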
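
The last step of Equation (3) is a completion of the square. As a worked check, write $\mu$ for the treatment effect, $y_i$ for the observed contrasts, and $v_i$ for the sum of the within-study and between-study variances of study $i$ (shorthand introduced here only for brevity); the exponent then rearranges as

```latex
-\frac{\mu^2}{20000} - \sum_i \frac{(y_i - \mu)^2}{2 v_i}
  = -A\mu^2 + B\mu - \sum_i \frac{y_i^2}{2 v_i}
  = -A\left(\mu - \frac{B}{2A}\right)^2 + \frac{B^2}{4A} - \sum_i \frac{y_i^2}{2 v_i}.
```

The last two terms do not involve $\mu$ and are absorbed into the proportionality constant. Matching the remaining term with a normal kernel $\exp\{-(\mu - m)^2 / (2\sigma^2)\}$ gives $m = B/(2A)$ and $2\sigma^2 = 1/A$, that is, a variance of $1/(2A)$.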

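Similarly, conditioning on the observed data $y$ and the treatment effect $\mu$, the posterior kernel function for the between-study heterogeneity variance $(\tau_l^{XZ})^2$ is

$$P\big((\tau_l^{XZ})^2 \mid \mu, y\big) \propto L\big((\tau_l^{XZ})^2, \mu_l^{XZ}\big)\, P\big((\tau_l^{XZ})^2\big) \propto \prod_{i \in \mathcal{A}_l^{XZ}} \left\{ (s_{i,l}^{XZ})^2 + (\tau_l^{XZ})^2 \right\}^{-1/2} \times \exp\left\{ \sum_{i \in \mathcal{A}_l^{XZ}} -\frac{(y_{i,l}^{XZ} - \mu_l^{XZ})^2}{2\{(s_{i,l}^{XZ})^2 + (\tau_l^{XZ})^2\}} \right\} \times \left( \frac{1}{(\tau_l^{XZ})^2} \right)^{1.001} \times \exp\left\{ -\frac{0.001}{(\tau_l^{XZ})^2} \right\}. \tag{4}$$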

Note that the above posterior kernel function does not conform to any standard distribution, resulting in a non-conjugate posterior distribution. To tackle this challenge, we tailor the Metropolis–Hastings step within the Gibbs sampler to sample from the posterior distribution of the between-study variance $(\tau_l^{XZ})^2$ in the following subsection. Detailed derivations are provided in Appendix B of the Supplementary Material (Wang et al., 2025).

Markov chain Monte Carlo algorithm: a hybrid of Gibbs samplers and Metropolis–Hastings

To approximate posterior distributions of model parameters, we utilize a Markov chain Monte Carlo (MCMC) algorithm. This method efficiently generates a sequence of random observations to approximate complex posterior distributions. We propose a Gibbs sampler that exploits the conditional dependence among model parameters within the MCMC framework. This iterative sampler sequentially draws samples from the composite posterior distribution of each parameter in turn, conditioned on the current values of other parameters and the observed data, as outlined in Algorithm 1. Recognizing the limitations of standard MCMC methods in high-dimensional problems with complex correlation structures, as often encountered in multivariate NMAs, we propose a hybrid approach. By combining a hybrid Gibbs sampler with an Open-Faced Sandwich post-sampling adjustment, we enhance robustness and computational efficiency in this context. The hybrid Gibbs sampler facilitates efficient sampling in high-dimensional parameter spaces, while the Open-Faced Sandwich adjustment corrects potential biases introduced by the composite likelihood approximation, ensuring reliable posterior inference.

Algorithm 1.

Gibbs sampler algorithm.

for simulation iteration $t$ from 1 to $N$ do
  for each outcome $l \in \{1, 2\}$ do
    for each treatment comparison $XZ \in \{AB, BC, AC\}$ do
      Sample $\mu_l^{(t)XZ} \sim P(\mu_l^{XZ} \mid y, \tau^{(t-1)})$
    end for
  end for
  Sample $\tau^{(t)} \sim P(\tau \mid y, \mu^{(t)})$
end for

Each iteration of the hybrid Gibbs sampler employs a combination of Gibbs and Metropolis–Hastings steps to overcome the lack of a closed-form solution for the full conditional distribution of the between-study heterogeneity variance $(\tau_l^{XZ})^2$. For each iteration, we leverage a truncated normal distribution conditioned on the values of between-study variance obtained in the previous iteration. To mitigate the computational burden associated with an increasing number of treatments, we group the between-study variances $(\tau_l^{XZ})^2$ into blocks and sample them jointly, leveraging the posterior kernel function in Equation (4) and a truncated normal distribution. The Metropolis–Hastings step implementation is detailed in Algorithm 2.

Algorithm 2.

Metropolis–Hastings step for τ.

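Propose $q\big((\tau_l^{(t)XZ})^2 \mid (\tau_l^{(t-1)XZ})^2\big) = N\big((\tau_l^{(t-1)XZ})^2, 1\big)\, I\big((\tau_l^{(t-1)XZ})^2 > 0\big)$
Set the initial value $(\tau_l^{(0)XZ})^2$
Sample $(\tau_l^{(*)XZ})^2 \sim N\big((\tau_l^{(t-1)XZ})^2, 1\big)\, I\big((\tau_l^{(t-1)XZ})^2 > 0\big)$
Set $a = \prod_{XZ \in \{AB, BC, AC\}} \dfrac{P\big((\tau_l^{(*)XZ})^2 \mid y, \mu\big) \big/ q\big((\tau_l^{(*)XZ})^2 \mid (\tau_l^{(t-1)XZ})^2\big)}{P\big((\tau_l^{(t-1)XZ})^2 \mid y, \mu\big) \big/ q\big((\tau_l^{(t-1)XZ})^2 \mid (\tau_l^{(*)XZ})^2\big)}$
if $\mathrm{Unif}(0, 1) < \min\{a, 1\}$ then:
  $\tau^{(t)} = (\tau_l^{(*)XZ})^2$
else:
  $\tau^{(t)} = \tau^{(t-1)}$
end if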
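
As a rough illustration of Algorithm 2, the sketch below updates a single between-study variance with a Metropolis–Hastings step. It makes several simplifying assumptions that are not in the original: one component is updated at a time rather than a block, the truncated normal proposal is drawn by rejection, and all function names and toy values are hypothetical. For a unit-variance normal truncated to positive values, the Gaussian parts of the two proposal densities cancel, so only the truncation constants remain in the ratio.

```python
import math
import random

def log_kernel_tau2(tau2, mu, pairs, a=0.001, b=0.001):
    """Log of the Equation (4) kernel for one (outcome, comparison) pair."""
    if tau2 <= 0.0:
        return -math.inf
    out = -(a + 1.0) * math.log(tau2) - b / tau2  # IG(a, b) prior kernel
    for y, s in pairs:
        v = s * s + tau2
        out += -0.5 * math.log(v) - (y - mu) ** 2 / (2.0 * v)
    return out

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_truncated_normal(mean, rng):
    """Draw from N(mean, 1) restricted to (0, inf) by simple rejection."""
    while True:
        x = rng.gauss(mean, 1.0)
        if x > 0.0:
            return x

def mh_step_tau2(tau2_prev, mu, pairs, rng):
    """One Metropolis-Hastings update; returns the new (or retained) value."""
    proposal = sample_truncated_normal(tau2_prev, rng)
    # log q(prev | proposal) - log q(proposal | prev): only the truncation
    # normalising constants Phi(.) differ between the two directions.
    log_q_ratio = (math.log(std_normal_cdf(tau2_prev))
                   - math.log(std_normal_cdf(proposal)))
    log_a = (log_kernel_tau2(proposal, mu, pairs)
             - log_kernel_tau2(tau2_prev, mu, pairs) + log_q_ratio)
    if rng.random() < math.exp(min(log_a, 0.0)):
        return proposal
    return tau2_prev

# Hypothetical toy run for one comparison with three studies.
rng = random.Random(1)
pairs = [(0.9, 0.3), (1.4, 0.4), (0.6, 0.5)]
tau2 = 0.5
for _ in range(1000):
    tau2 = mh_step_tau2(tau2, 1.0, pairs, rng)
print(tau2)
```

Working on the log scale avoids numerical underflow when the kernel values are very small.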

We sequentially sample each individual treatment effect. Because the posterior distribution of each treatment effect is normal, we draw it directly, conditional on the parameter values obtained in the previous step. Details of the Gibbs sampling step are provided in Algorithm 3.

Algorithm 3.

Gibbs step for μ.

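Sample $\mu_l^{(*)XZ} \sim N\left( \frac{B}{2A}, \frac{1}{2A} \right)$, where
$A = \frac{1}{20000} + \sum_{i \in \mathcal{A}_l^{XZ}} \frac{1}{2\{(s_{i,l}^{XZ})^2 + (\tau_l^{(t)XZ})^2\}}$
$B = \sum_{i \in \mathcal{A}_l^{XZ}} \frac{y_{i,l}^{XZ}}{(s_{i,l}^{XZ})^2 + (\tau_l^{(t)XZ})^2}$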
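
The conjugate draw in Algorithm 3 can be sketched in a few lines of Python. The function name, argument layout, and returned tuple are illustrative assumptions; the quantities A and B follow the definitions given with Equation (3), with the prior variance of 10000 entering A as 1/20000.

```python
import math
import random

def gibbs_draw_mu(pairs, tau2, rng, prior_var=10000.0):
    """Draw mu from N(B / (2A), 1 / (2A)) for one (outcome, comparison) pair.

    pairs: list of (y, s) with observed contrast y and standard error s
    tau2:  current value of the between-study variance
    Returns the draw together with the conditional mean and variance.
    """
    A = 1.0 / (2.0 * prior_var)  # prior contribution, 1/20000 by default
    B = 0.0
    for y, s in pairs:
        v = s * s + tau2
        A += 1.0 / (2.0 * v)
        B += y / v
    mean = B / (2.0 * A)
    var = 1.0 / (2.0 * A)
    return rng.gauss(mean, math.sqrt(var)), mean, var

# Hypothetical toy call for one comparison with three studies.
rng = random.Random(7)
draw, mean, var = gibbs_draw_mu([(0.9, 0.3), (1.4, 0.4), (0.6, 0.5)], 0.36, rng)
print(draw, mean, var)
```

With such a diffuse prior, the conditional mean is essentially the inverse-variance weighted average of the observed contrasts.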

Posterior inference with the Open-Faced Sandwich adjustment

Our proposed approach substitutes the full likelihood function with a composite likelihood function within the posterior density to generate posterior samples. However, it is well-acknowledged that posterior distributions derived from composite likelihood functions can yield overly precise estimates (Pauli et al., 2011; Ribatet et al., 2012). To address this issue, we employ the Open-Faced Sandwich (OFS) adjustment on posterior samples to ensure asymptotically valid parameter estimation. This method, building upon sandwich-type estimators, applies to objective functions involving both observed data and unknown parameters, providing asymptotically valid estimations (Shaby, 2014). As our initial composite posterior estimates, $\hat{\theta} = (\hat{\mu}, \hat{\tau})$, for $\theta = (\mu, \tau)$, correspond to the maximum a posteriori (MAP) estimates, we construct the log composite posterior as the objective function by summing the log composite likelihood in Equation (2) and the logarithm of the prior distributions

$$f(\mu, \tau) = \ell_c(\mu, \tau) + \log \pi(\mu) + \log \pi(\tau),$$

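where $\pi(\mu)$ and $\pi(\tau)$ denote the respective prior distributions for $\mu$ and $\tau$. Following the principles of the OFS adjustment, we compute the score squares and Hessian matrix at the initial composite posterior estimates, $\hat{\theta}$, using the Newton–Raphson method

$$H(\hat{\theta}) = \left. \frac{\partial^2 f(\mu, \tau)}{\partial \theta^2} \right|_{\theta = \hat{\theta}},$$

$$J(\hat{\theta}) = \sum_{i=1}^{N} \left( \left. \frac{\partial f_i(\mu, \tau)}{\partial \theta} \right|_{\theta = \hat{\theta}} \right) \left( \left. \frac{\partial f_i(\mu, \tau)}{\partial \theta} \right|_{\theta = \hat{\theta}} \right)^{\top},$$

where $f_i(\mu, \tau)$ is the objective function given the data of study $i$. While various numerical methods are feasible, we opt for the Newton–Raphson method due to its flexibility in handling scenarios with varying numbers of treatments and outcomes. After obtaining the score vector and Hessian matrix, the adjustment is applied to the posterior sample instead of directly altering the composite posterior. The adjustment is computed using the following formula:

$$\theta_{\mathrm{OFS}} = \hat{\theta} + W(\theta - \hat{\theta}), \quad \text{where } W = H(\hat{\theta})^{-1} J(\hat{\theta})^{1/2} H(\hat{\theta})^{1/2}.$$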

Importantly, the initial MAP estimate remains unaffected by the OFS adjustment. Compared to in-sampling adjustment methods such as the Moment-type adjustment (Chandler and Bate, 2007), the OFS method offers three key advantages. First, by performing post-sampling adjustments, the OFS method eliminates the need to calculate the maximum likelihood estimate at each sampling step, thereby significantly improving computational efficiency. Second, unlike in-sampling methods, which adjust the posterior kernel during the sampling process, the OFS method avoids distorting the sampling process from its stationary distribution. Third, the composite likelihood framework assumes a working independence covariance structure to simplify computation. The sandwich-type variance estimator, including the OFS adjustment, accounts for potential dependencies through its two main components: $J(\hat{\theta})$, which reflects empirical variability in the score function, and $H(\hat{\theta})$, which accounts for the curvature of the composite likelihood. Together, these components ensure consistent variance estimation, even when the independence assumption is misspecified. This property of sandwich-type estimators has been demonstrated in prior work (Lindsay, 1988; Varin et al., 2011) and forms the theoretical basis for the robustness of the OFS adjustment.
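
To see what the adjustment does to a set of draws, the following one-parameter Python sketch may help. It rests on assumptions that go beyond the text: the parameter is a scalar, H is taken as the negated second derivative of the log composite posterior (so that it is positive and its square root is real), and the draws and the values of H and J are made up for illustration.

```python
import math

def ofs_adjust_scalar(draws, theta_hat, H, J):
    """One-parameter Open-Faced Sandwich adjustment (illustrative only).

    H: negated second derivative of the log composite posterior at theta_hat,
       assumed positive here (a sign convention chosen for this sketch)
    J: sum over studies of squared per-study scores at theta_hat
    """
    W = math.sqrt(J) * math.sqrt(H) / H  # scalar form of H^{-1} J^{1/2} H^{1/2}
    return [theta_hat + W * (d - theta_hat) for d in draws]

draws = [0.8, 0.9, 1.0, 1.1, 1.2]  # hypothetical unadjusted posterior draws
adjusted = ofs_adjust_scalar(draws, theta_hat=1.0, H=25.0, J=100.0)
# Here W = 2, so distances from theta_hat double and the centre stays put.
print(adjusted)
```

In this scalar case the variance of the draws is multiplied by J/H, which turns a composite-posterior variance of 1/H into the sandwich form J/H^2, while the location is left unchanged.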

Parallel sampling for computational efficiency gains

By incorporating the composite likelihood function into the posterior distribution, we can efficiently generate independent posterior samples for each outcome. This allows us to leverage a parallel sampling strategy using the doParallel and foreach R packages, where posterior distributions for all outcomes are sampled simultaneously. The feasibility of parallelizing these computations relies on the assumption of independent priors for the outcomes – a reasonable assumption given the use of noninformative priors. Once the sampling procedure is complete, the posterior samples are combined across outcomes and the OFS adjustment is applied. This parallel sampling strategy significantly improves computational efficiency, particularly in scenarios with multiple outcomes. Execution time comparisons are illustrated in the subsequent simulation section.

4. Simulation

4.1. Data-generating mechanisms

The simulation study was designed to evaluate the performance of the proposed methods under various network meta-analysis scenarios. Treatment effect sizes were selected based on examples from the relevant literature (Duan et al., 2023), reflecting small, moderate, and large datasets typically observed in NMAs. To examine both finite-sample performance and asymptotic properties, the number of studies was varied to simulate networks of different sizes. For computational evaluation, we replicated the structure of an existing NMA (Witkowski et al., 2018) and used bootstrapping to expand the number of studies, treatments, and outcomes. The parameter settings are detailed below.

The simulations employed a contrast-based multivariate NMA with a three-arm design (i.e., $A$, $B$, and $C$, with $A$ as a reference) for two continuous outcomes of primary interest. Of note, ratio measures such as odds ratios and risk ratios, along with their log transformations, can be treated as continuous variables. Within the simulated network, consistency was assumed to be satisfied, implying $\mu_1^{BC} = \mu_1^{AC} - \mu_1^{AB}$ and $\mu_2^{BC} = \mu_2^{AC} - \mu_2^{AB}$. Data were drawn from a multivariate normal distribution following Equation (1), with specified treatment effects: $\mu_1^{AB} = 1$, $\mu_1^{AC} = 2$, $\mu_2^{AB} = -1$, and $\mu_2^{AC} = -2$. We assessed the performance of the proposed method without and with the OFS adjustment (i.e., BCL and BCL-OFS), and the Bayesian full-likelihood method (i.e., BFL) across various scenarios, including (i) equality of between-study heterogeneity variance ($(\tau_1^{AB})^2 = (\tau_1^{AC})^2 = (\tau_2^{AB})^2 = (\tau_2^{AC})^2 = 0.36$ under common between-study heterogeneity variance; $(\tau_1^{AB})^2 = 0.25$, $(\tau_1^{AC})^2 = 0.36$, $(\tau_2^{AB})^2 = 0.36$, $(\tau_2^{AC})^2 = 0.49$ under unequal between-study heterogeneity variance), (ii) varying numbers of studies ($I = 5, 10, 20, 40$, and $100$), and (iii) various within-study and between-study correlation structures. Specifically, we evaluated the magnitude of correlations, from low ($\rho^w = \rho^b = 0.2$), moderate ($\rho^w = \rho^b = 0.5$), to high ($\rho^w = \rho^b = 0.8$), as well as varying correlation structures

$$R^b = R^w = \begin{pmatrix}
1.0 & 0.5 & 0.4 & 0.3 \\
0.5 & 1.0 & 0.5 & 0.4 \\
0.4 & 0.5 & 1.0 & 0.3 \\
0.3 & 0.4 & 0.3 & 1.0
\end{pmatrix}.$$

For each scenario, 1000 NMA datasets were generated. Detailed specifications of the simulation design are provided in Appendix C of the Supplementary Material (Wang et al., 2025).
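
A data-generating step of this kind could be sketched as follows, using only the Python standard library. The treatment effects, heterogeneity variances, and varying correlation matrix are taken from the text; the common standard error of 0.3, the seed, and all function names are assumptions made purely for illustration and do not reproduce the settings of Appendix C.

```python
import math
import random

R = [[1.0, 0.5, 0.4, 0.3],
     [0.5, 1.0, 0.5, 0.4],
     [0.4, 0.5, 1.0, 0.3],
     [0.3, 0.4, 0.3, 1.0]]        # varying-correlation structure from the text
MU = [1.0, 2.0, -1.0, -2.0]       # mu_1^AB, mu_1^AC, mu_2^AB, mu_2^AC
TAU2 = [0.36, 0.36, 0.36, 0.36]   # common heterogeneity scenario

def covariance(sd, corr):
    """Covariance matrix from standard deviations and a correlation matrix."""
    n = len(sd)
    return [[corr[r][c] * sd[r] * sd[c] for c in range(n)] for r in range(n)]

def cholesky(m):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            acc = m[r][c] - sum(low[r][k] * low[c][k] for k in range(c))
            low[r][c] = math.sqrt(acc) if r == c else acc / low[c][c]
    return low

def simulate_study(se, rng):
    """One study's four contrasts from MVN(MU, Omega_i + Sigma)."""
    omega = covariance(se, R)                              # within-study part
    sigma = covariance([math.sqrt(t) for t in TAU2], R)    # between-study part
    total = [[omega[r][c] + sigma[r][c] for c in range(4)] for r in range(4)]
    low = cholesky(total)
    z = [rng.gauss(0.0, 1.0) for _ in range(4)]
    return [MU[r] + sum(low[r][k] * z[k] for k in range(r + 1)) for r in range(4)]

rng = random.Random(2025)
studies = [simulate_study([0.3, 0.3, 0.3, 0.3], rng) for _ in range(5)]  # I = 5
# Under consistency, the BC effects follow from the AB and AC effects.
mu_1_BC = MU[1] - MU[0]
mu_2_BC = MU[3] - MU[2]
print(studies[0], mu_1_BC, mu_2_BC)
```

Only the marginal variances of these draws enter the composite likelihood; the correlations built into the simulated data are what the OFS adjustment is meant to account for after sampling.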

To evaluate the computational efficiency of the BCL method, we compared its execution time against that of the BFL methods using the simulated data. The BFL methods employed the full-likelihood function as described in Equation (1), assuming a moderate within-study correlation of $\rho^w = 0.5$. Both BCL and BFL methods utilized the same prior distributions. The Markov chain was run for 3000 iterations for every parameter, discarding the first 1500 as burn-in and collecting posterior samples every 5th iteration thereafter.
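
As a quick arithmetic check of these settings, the snippet below counts the retained draws per parameter. The exact offset at which thinning starts is an assumption; under the convention used here, the first retained iteration is the fifth one after burn-in.

```python
n_iter, burn_in, thin = 3000, 1500, 5

# 1-based iteration numbers kept after burn-in: 1505, 1510, ..., 3000.
kept = list(range(burn_in + thin, n_iter + 1, thin))
print(len(kept))  # 300 retained draws per parameter
```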

4.2. Simulation results

We evaluated the performance of the BCL, BCL-OFS, and BFL methods by examining bias and the coverage probability of 95% credible interval (CrI) for estimated treatment effects. Table 1 presents the bias of treatment effect estimates for both outcomes under a common between-study heterogeneity variance. We note that the OFS adjustment did not shift the location of the initial composite likelihood distribution, indicating consistent treatment effect estimates regardless of the OFS application. As shown in Table 1, the proposed BCL method generally produced nearly unbiased pooled estimates for treatment comparisons AB and AC across most scenarios and both outcomes. High within-study correlations had minimal impact on treatment effect estimates; however, scenarios with a limited number of studies (I=5 or 10), especially those with moderate to varying within-study correlations, exhibited slightly larger bias. Increasing the number of studies reduced bias for both outcomes, particularly in scenarios with varying within-study correlations. Similar patterns were observed for the BCL method under unequal between-study heterogeneity variance (Tables S1 and S2 in the Supplementary Material) (Wang et al., 2025).
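
For readers who want to reproduce these summaries, a minimal Python sketch of the two performance measures is given below. The function name and inputs are hypothetical, and the bias is computed here as a signed difference; whether the tables report signed or absolute bias is not stated in this section.

```python
def bias_and_coverage(estimates, intervals, truth):
    """Bias and empirical coverage over simulation replicates.

    estimates: pooled estimates, one per replicate
    intervals: (lower, upper) bounds of the 95% CrI, one per replicate
    truth:     true value used in the data-generating mechanism
    """
    n = len(estimates)
    bias = sum(estimates) / n - truth
    covered = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return bias, covered / n

# Hypothetical toy input with three replicates and a true effect of 1.
bias, cp = bias_and_coverage(
    [0.9, 1.1, 1.2],
    [(0.5, 1.5), (1.05, 1.6), (0.8, 1.4)],
    truth=1.0,
)
print(bias, cp)
```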

Table 1:

Summary of 1,000 simulations with $I = 5, 10, 20, 40$, and $100$: bias and coverage probability (CP) of pooled estimates for $AB$ and $AC$ treatment comparisons across two outcomes. The data-generating mechanism assumes a common between-study heterogeneity variance ($(\tau_1^{AB})^2 = (\tau_1^{AC})^2 = (\tau_2^{AB})^2 = (\tau_2^{AC})^2 = 0.36$) and true treatment effects ($\mu_1^{AB} = 1$, $\mu_1^{AC} = 2$, $\mu_2^{AB} = -1$, and $\mu_2^{AC} = -2$). Correlation scenarios include high ($\rho^w = \rho^b = 0.8$), moderate ($\rho^w = \rho^b = 0.5$), low ($\rho^w = \rho^b = 0.2$), and varying correlations. Results are based on the proposed BCL method.

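| I | Metric | High $\mu_1^{AB}$ | High $\mu_1^{AC}$ | High $\mu_2^{AB}$ | High $\mu_2^{AC}$ | Moderate $\mu_1^{AB}$ | Moderate $\mu_1^{AC}$ | Moderate $\mu_2^{AB}$ | Moderate $\mu_2^{AC}$ | Low $\mu_1^{AB}$ | Low $\mu_1^{AC}$ | Low $\mu_2^{AB}$ | Low $\mu_2^{AC}$ | Varying $\mu_1^{AB}$ | Varying $\mu_1^{AC}$ | Varying $\mu_2^{AB}$ | Varying $\mu_2^{AC}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Bias | 0.008 | 0.001 | 0.007 | 0.002 | 0.006 | 0.008 | 0.011 | 0.008 | 0.018 | 0.016 | 0.009 | 0.010 | 0.005 | 0.007 | 0.010 | 0.006 |
| 5 | CP | 0.882 | 0.882 | 0.881 | 0.884 | 0.912 | 0.907 | 0.901 | 0.924 | 0.906 | 0.908 | 0.909 | 0.921 | 0.909 | 0.895 | 0.921 | 0.912 |
| 10 | Bias | 0.009 | 0.000 | 0.005 | 0.005 | 0.003 | 0.005 | 0.007 | 0.003 | 0.009 | 0.003 | 0.005 | 0.004 | 0.006 | 0.001 | 0.013 | 0.010 |
| 10 | CP | 0.869 | 0.900 | 0.882 | 0.895 | 0.891 | 0.896 | 0.911 | 0.897 | 0.899 | 0.904 | 0.919 | 0.911 | 0.895 | 0.909 | 0.909 | 0.906 |
| 20 | Bias | 0.002 | 0.002 | 0.004 | 0.002 | 0.001 | 0.001 | 0.002 | 0.005 | 0.002 | 0.001 | 0.003 | 0.010 | 0.006 | 0.002 | 0.001 | 0.006 |
| 20 | CP | 0.886 | 0.903 | 0.899 | 0.879 | 0.904 | 0.884 | 0.898 | 0.886 | 0.910 | 0.893 | 0.906 | 0.891 | 0.899 | 0.895 | 0.916 | 0.888 |
| 40 | Bias | 0.006 | 0.009 | 0.004 | 0.006 | 0.001 | 0.002 | 0.001 | 0.005 | 0.002 | 0.001 | 0.002 | 0.001 | 0.002 | 0.005 | 0.003 | 0.002 |
| 40 | CP | 0.889 | 0.884 | 0.893 | 0.889 | 0.888 | 0.912 | 0.908 | 0.888 | 0.901 | 0.916 | 0.919 | 0.915 | 0.887 | 0.903 | 0.909 | 0.900 |
| 100 | Bias | 0.002 | 0.000 | 0.001 | 0.000 | 0.004 | 0.002 | 0.000 | 0.003 | 0.001 | 0.002 | 0.001 | 0.000 | 0.000 | 0.001 | 0.000 | 0.003 |
| 100 | CP | 0.906 | 0.907 | 0.914 | 0.912 | 0.918 | 0.910 | 0.917 | 0.922 | 0.911 | 0.922 | 0.905 | 0.915 | 0.901 | 0.897 | 0.911 | 0.906 |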

Figure 1 displays the coverage probabilities of 95% CrIs for the proposed BCL and BCL-OFS methods under a common between-study heterogeneity variance. Overall, the BCL method yielded coverage probabilities somewhat below the nominal level, ranging from 86.6% to 92.8% for both outcomes. This undercoverage was particularly evident in scenarios with strong within-study correlations (ranging from 88.1% to 91.4%). These findings align with previous research by Pauli et al. (2011) and Ribatet et al. (2012), which suggested that Bayesian composite likelihood methods can underestimate variance, leading to overly precise posterior distributions. Applying the OFS adjustment to the BCL method effectively mitigated this undercoverage, bringing coverage probabilities closer to the 95% nominal level. For scenarios with low within-study correlations, the BCL-OFS method improved coverage probabilities by 1% to 6%, achieving a range of 90.9% to 96.6%. In scenarios with high within-study correlations, the BCL-OFS method yielded coverage probabilities ranging from 89.8% to 95.7%. Similar patterns were observed under unequal between-study heterogeneity variance (Figure S3 in the Supplementary Material) (Wang et al., 2025).

Figure 1:

Coverage probabilities of pooled estimates for treatment comparisons AB and AC across both outcomes using the proposed BCL and BCL-OFS methods. Results are shown for low ($\rho_w = \rho_b = 0.2$), moderate ($\rho_w = \rho_b = 0.5$), high ($\rho_w = \rho_b = 0.8$), and varying correlation structures. The data-generating mechanism assumes a common between-study heterogeneity variance, with $\tau_{1AB}^2 = \tau_{1AC}^2 = \tau_{2AB}^2 = \tau_{2AC}^2 = 0.36$.

We compared the proposed BCL-OFS method with the standard BFL method assuming moderate correlations under a common between-study heterogeneity variance, as shown in Table 2. Overall, both the BCL-OFS and BFL methods produced nearly unbiased results. However, when the number of studies was small (e.g., I=5), the BFL method tended to yield coverage probabilities below the nominal level: 88.1% in the high correlation scenario, 85.1% in the moderate correlation scenario, 87.9% in the low correlation scenario, and 86.2% in the varying correlation scenario. In contrast, the BCL-OFS method consistently had acceptable coverage probabilities above 90%. Additionally, we calculated the relative efficiency (RE) of the BCL-OFS method, defined as the ratio of the squared empirical standard errors of the BCL-OFS and BFL estimates. The RE ranged from 78.3% to 112.7% across correlation structures under a common between-study heterogeneity variance. While the BFL method was asymptotically most efficient, the BCL-OFS method demonstrated comparable efficiency while avoiding estimation of within-study correlation parameters. As the number of studies increased, both methods exhibited coverage probabilities close to the 95% nominal level. Similar patterns were observed in scenarios with unequal between-study heterogeneity variance (Tables S3–S6 in the Supplementary Material) (Wang et al., 2025).
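The three summary measures used in this comparison can be computed from simulation replicates as in the following minimal sketch. It is illustrative only: the function names and toy numbers are hypothetical, bias is taken as an absolute value because the tabulated biases are non-negative (an assumption), and the direction of the RE ratio (method of interest over reference) is also an assumption, since the text states only that it is a ratio of squared empirical standard errors.

```python
import statistics


def summarize(estimates, lowers, uppers, truth):
    """Return (absolute bias, coverage probability, squared empirical SE)."""
    bias = abs(statistics.fmean(estimates) - truth)
    # A replicate "covers" when its interval contains the true value.
    covered = sum(lo <= truth <= hi for lo, hi in zip(lowers, uppers))
    cp = covered / len(estimates)
    emp_var = statistics.variance(estimates)  # squared empirical standard error
    return bias, cp, emp_var


def relative_efficiency(var_method, var_reference):
    """Ratio of squared empirical standard errors (direction is an assumption)."""
    return var_method / var_reference


# Hypothetical replicates for one pooled effect whose true value is 1.0.
truth = 1.0
estimates = [0.9, 1.1, 1.0, 1.2]
lowers = [0.5, 0.7, 0.6, 1.05]
uppers = [1.3, 1.5, 1.4, 1.6]

bias, cp, emp_var = summarize(estimates, lowers, uppers, truth)
re = relative_efficiency(emp_var, emp_var)  # a method compared with itself
print(bias, cp, emp_var, re)
```

In an actual simulation study, `summarize` would be called once per method and parameter, and `relative_efficiency` would combine the two resulting empirical variances.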

Table 2:

Summary of 1,000 simulations with I = 5, 10, 20, 40, and 100: bias, coverage probability (CP), and relative efficiency (RE) of pooled estimates for AB and AC treatment comparisons across two outcomes. The data-generating mechanism assumes a common between-study heterogeneity variance ($\tau_{1AB}^2 = \tau_{1AC}^2 = \tau_{2AB}^2 = \tau_{2AC}^2 = 0.36$) and true treatment effects ($\mu_{1AB} = 1$, $\mu_{1AC} = 2$, $\mu_{2AB} = -1$, and $\mu_{2AC} = -2$). Correlation scenarios include high ($\rho_w = \rho_b = 0.8$), moderate ($\rho_w = \rho_b = 0.5$), low ($\rho_w = \rho_b = 0.2$), and varying correlations. Results are based on the proposed BCL-OFS method and the BFL method under a moderate correlation of 0.5.

Column groups (left to right): μ1AB, μ1AC, μ2AB, μ2AC; each group reports Bias, CP, and RE.
# Studies Method Bias CP RE Bias CP RE Bias CP RE Bias CP RE
High correlation
5 BFL 0.000 0.898 1.000 0.003 0.844 1.000 0.012 0.911 1.000 0.004 0.866 1.000
BCL-OFS 0.008 0.900 1.103 0.001 0.907 0.922 0.007 0.898 1.090 0.002 0.909 0.999
10 BFL 0.002 0.906 1.000 0.006 0.849 1.000 0.005 0.913 1.000 0.005 0.854 1.000
BCL-OFS 0.009 0.927 1.084 0.000 0.938 0.868 0.005 0.933 1.028 0.005 0.937 0.866
20 BFL 0.002 0.942 1.000 0.001 0.900 1.000 0.003 0.941 1.000 0.003 0.912 1.000
BCL-OFS 0.002 0.947 1.017 0.002 0.957 0.898 0.004 0.955 0.970 0.002 0.951 1.024
40 BFL 0.001 0.956 1.000 0.001 0.929 1.000 0.003 0.949 1.000 0.004 0.936 1.000
BCL-OFS 0.006 0.939 1.031 0.009 0.933 0.969 0.004 0.950 0.985 0.006 0.933 1.030
100 BFL 0.002 0.954 1.000 0.001 0.938 1.000 0.001 0.950 1.000 0.002 0.934 1.000
BCL-OFS 0.002 0.932 0.976 0.000 0.932 0.936 0.001 0.937 0.980 0.000 0.928 0.915
Moderate correlation
5 BFL 0.006 0.916 1.000 0.004 0.880 1.000 0.002 0.922 1.000 0.000 0.851 1.000
BCL-OFS 0.006 0.922 0.960 0.008 0.909 0.999 0.011 0.908 1.081 0.008 0.932 0.783
10 BFL 0.000 0.928 1.000 0.009 0.890 1.000 0.003 0.921 1.000 0.006 0.867 1.000
BCL-OFS 0.003 0.945 1.025 0.005 0.948 0.991 0.007 0.948 0.983 0.003 0.950 0.832
20 BFL 0.004 0.955 1.000 0.005 0.927 1.000 0.001 0.953 1.000 0.000 0.914 1.000
BCL-OFS 0.001 0.966 1.080 0.001 0.963 1.050 0.002 0.944 1.068 0.005 0.945 0.961
40 BFL 0.006 0.959 1.000 0.000 0.952 1.000 0.000 0.956 1.000 0.001 0.948 1.000
BCL-OFS 0.001 0.946 1.143 0.002 0.957 1.016 0.001 0.958 1.042 0.005 0.952 1.127
100 BFL 0.002 0.959 1.000 0.000 0.935 1.000 0.001 0.948 1.000 0.002 0.942 1.000
BCL-OFS 0.004 0.947 1.039 0.002 0.930 0.984 0.000 0.940 0.944 0.003 0.945 0.935
Low correlation
5 BFL 0.009 0.938 1.000 0.013 0.879 1.000 0.001 0.927 1.000 0.007 0.894 1.000
BCL-OFS 0.018 0.929 1.017 0.016 0.933 0.913 0.009 0.927 0.984 0.010 0.939 0.836
10 BFL 0.005 0.943 1.000 0.012 0.905 1.000 0.010 0.935 1.000 0.005 0.908 1.000
BCL-OFS 0.009 0.947 1.071 0.003 0.949 1.000 0.005 0.968 0.976 0.004 0.958 0.872
20 BFL 0.007 0.947 1.000 0.003 0.935 1.000 0.004 0.961 1.000 0.002 0.927 1.000
BCL-OFS 0.002 0.962 1.011 0.001 0.957 1.029 0.003 0.959 1.051 0.010 0.948 0.957
40 BFL 0.008 0.956 1.000 0.000 0.942 1.000 0.002 0.968 1.000 0.002 0.953 1.000
BCL-OFS 0.002 0.957 1.027 0.001 0.964 0.944 0.002 0.965 0.920 0.001 0.961 0.925
100 BFL 0.005 0.954 1.000 0.001 0.944 1.000 0.000 0.952 1.000 0.003 0.949 1.000
BCL-OFS 0.001 0.940 0.993 0.002 0.946 0.938 0.001 0.931 1.066 0.000 0.941 0.938
Varying correlation
5 BFL 0.003 0.917 1.000 0.009 0.869 1.000 0.027 0.927 1.000 0.012 0.862 1.000
BCL-OFS 0.005 0.926 0.945 0.007 0.915 0.969 0.010 0.936 1.004 0.006 0.924 0.856
10 BFL 0.000 0.932 1.000 0.016 0.897 1.000 0.001 0.918 1.000 0.001 0.911 1.000
BCL-OFS 0.006 0.940 1.068 0.001 0.943 0.931 0.013 0.961 0.966 0.010 0.943 1.018
20 BFL 0.005 0.930 1.000 0.007 0.917 1.000 0.001 0.940 1.000 0.002 0.930 1.000
BCL-OFS 0.006 0.958 0.978 0.002 0.960 0.972 0.001 0.964 1.056 0.006 0.947 1.025
40 BFL 0.003 0.951 1.000 0.004 0.936 1.000 0.006 0.959 1.000 0.001 0.937 1.000
BCL-OFS 0.002 0.953 1.084 0.005 0.958 0.890 0.003 0.964 1.071 0.002 0.958 1.066
100 BFL 0.004 0.951 1.000 0.003 0.941 1.000 0.001 0.961 1.000 0.002 0.952 1.000
BCL-OFS 0.000 0.928 1.022 0.001 0.929 0.980 0.000 0.934 1.114 0.003 0.933 1.025

Figure 2 and Table S7 report the running times of four methods across different scenarios. The proposed BCL method with the parallel sampling strategy consistently had the shortest running times across all configurations, demonstrating the substantial performance gain from parallel sampling. In contrast, the BFL methods, especially those assuming unequal correlations, required longer running times because the number of parameters needed to estimate variance-covariance structures increased as the number of treatments expanded. The simpler parameterization of the BCL method mitigated this issue and proved to be more computationally efficient, particularly for NMA datasets with large numbers of treatments and studies.

Figure 2:

Running times (in minutes) are presented for four methods: the proposed BCL method without parallel sampling (BCL), the BCL method with parallel sampling (BCL Parallel), the Bayesian full-likelihood method with equal within-study correlations (BFL Equal Correlation), and the Bayesian full-likelihood method with unequal within-study correlations (BFL Unequal Correlation).

5. Application

5.1. Application to root coverage

The details of the MCMC setup and the potential scale reduction factor are provided in Appendices E and F of the Supplementary Material (Wang et al., 2025). The NMA estimates and 90% CrIs for the four outcomes (CRC, RecRed, CAL gain, and KT gain) are displayed in Figures S6–S9 of the Supplementary Material. For CRC, RecRed, and KT gain, the NMA results were consistent across both the BCL-OFS method and the standard Bayesian NMA using Lu and Ades’ approach. However, several discrepancies were observed for CAL gain. These discrepancies were exemplified by the results for CAF+CM versus CAF+EMD (BCL-OFS: mean difference (MD) = −0.26; 90% CrI, −1.20 to 0.68; the standard NMA: MD = 0.56; 90% CrI, −2.04 to 3.27), CAF versus CAF+EMD (BCL-OFS: MD = −0.21; 90% CrI, −0.73 to 0.31; the standard NMA: MD = 0.41; 90% CrI, −1.11 to 2.02), and CAF+CM versus CAF (BCL-OFS: MD = 0.06; 90% CrI, −0.50 to 0.60; the standard NMA: MD = −0.15; 90% CrI, −2.77 to 2.42). Moreover, the standard Bayesian method exhibited wider 90% CrIs. These discrepancies might be attributed to two explanations: first, the original analysis identified network inconsistency, leading to large variability in both direct and indirect comparisons; and second, the proposed BCL-OFS method was more robust to misspecified within-study correlations than the standard NMA method.

Figure 3 shows two-dimensional concordance plots of Z-values, comparing the proposed BCL and BCL-OFS methods to the standard NMA using Lu and Ades’ method across four relevant outcomes. The Z-value is the Z statistic for the overall treatment effect, calculated as $Z = \hat{\mu} / \mathrm{sd}(\hat{\mu})$, where $\hat{\mu}$ is the estimated treatment effect and $\mathrm{sd}(\hat{\mu})$ is the corresponding estimated standard error. An absolute Z-value exceeding 1.96 corresponds to a p-value less than 0.05, indicating statistical significance. Most points fell within the diagonal quadrants, indicating agreement between the proposed and standard NMA methods. Nevertheless, some points displayed discordant evidence in treatment comparisons for CAL gain. These discrepancies in the 90% CrIs between the proposed BCL and BCL-OFS methods are likely due to the correction of standard errors for estimated pooled treatment effects in the BCL-OFS method.
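The Z-value calculation and the agreement check that underlies a concordance plot can be sketched as follows. This is an illustrative sketch rather than the authors' plotting code: the function names, the three category labels, and the numeric inputs are hypothetical, and the 1.96 threshold is the two-sided 5% critical value mentioned in the text.

```python
def z_value(estimate, sd):
    """Z statistic: estimated treatment effect divided by its standard error."""
    return estimate / sd


def is_significant(z, critical=1.96):
    """Two-sided significance at the 5% level."""
    return abs(z) > critical


def concordance(z_proposed, z_standard, critical=1.96):
    """Classify one treatment comparison by agreement in significance."""
    sig_proposed = is_significant(z_proposed, critical)
    sig_standard = is_significant(z_standard, critical)
    if sig_proposed and sig_standard:
        return "both significant"
    if not sig_proposed and not sig_standard:
        return "neither significant"
    return "discordant"


# Hypothetical estimates and standard errors for one comparison.
z_a = z_value(0.5, 0.2)   # 2.5
z_b = z_value(0.3, 0.4)   # 0.75
label = concordance(z_a, z_b)
print(z_a, z_b, label)
```

A comparison labeled "discordant" corresponds to a point that falls outside the diagonal quadrants of the plot.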

Figure 3:

Concordance plots comparing Z-values from the proposed BCL and BCL-OFS methods (x-axis) with those from the standard Bayesian NMA using Lu and Ades’ method (y-axis) across four outcomes for the root coverage data. Cross marks indicate results from the BCL method, while open circles represent results from the BCL-OFS method, both compared with the standard Bayesian NMA. Red dashed lines represent the critical values for the 95% confidence level.

Figure S5 of the Supplementary Material (Wang et al., 2025) presents the ranking of treatments obtained by integrating posterior estimates into the SUCRA calculation for four outcomes related to the root coverage data. As expected, the most effective treatment was CAF+CTG for RecRed (SUCRA = 0.87) and CAL gain (SUCRA = 0.94). For the outcome CRC, the most effective treatment was CAF+EMD (SUCRA = 0.99), while for KT gain, it was CAF+CM (SUCRA = 0.99). These results align with the findings of Buti et al. (2013). However, the BCL-OFS method revealed a key difference for the outcome CRC: the comparison between CAF+CTG and CAF+EMD was not statistically significant (odds ratio (OR) = 0.61; 90% CrI, 0.24 to 1.52) in the standard NMA method but was significant (OR = 0.63; 90% CrI, 0.40 to 0.93) in the BCL-OFS method. This implies that CAF+CTG may not be the gold standard for the outcome CRC.
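For readers unfamiliar with how SUCRA is obtained from posterior samples, the following minimal sketch ranks treatments within each posterior draw and converts the mean rank to a SUCRA value using the identity SUCRA = (a − mean rank) / (a − 1), where a is the number of treatments and rank 1 is best. The function name, the treatment labels A, B, and C, and the draws are hypothetical; whether larger or smaller effects are better depends on the outcome and is passed in as an argument.

```python
def sucra(draws, higher_is_better=True):
    """SUCRA per treatment from posterior draws.

    `draws` is a list of dicts, each mapping treatment name -> effect value
    in one posterior draw.
    """
    treatments = list(draws[0])
    a = len(treatments)
    rank_sums = dict.fromkeys(treatments, 0)
    for draw in draws:
        # Rank 1 is the best treatment within this draw.
        ordered = sorted(treatments, key=lambda t: draw[t], reverse=higher_is_better)
        for rank, t in enumerate(ordered, start=1):
            rank_sums[t] += rank
    n = len(draws)
    # SUCRA = (a - mean rank) / (a - 1); 1 means always best, 0 always worst.
    return {t: (a - rank_sums[t] / n) / (a - 1) for t in treatments}


# Four hypothetical posterior draws for three hypothetical treatments.
posterior = [
    {"A": 0.1, "B": 0.5, "C": 0.9},
    {"A": 0.2, "B": 0.8, "C": 0.6},
    {"A": 0.0, "B": 0.4, "C": 0.7},
    {"A": 0.3, "B": 0.2, "C": 0.9},
]
scores = sucra(posterior)
print(scores)
```

SUCRA values across all treatments always sum to a/2, which gives a quick sanity check on any implementation.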

5.2. Application to anemia in adults with chronic kidney disease

The league tables presenting NMA estimates and 95% CrIs (or CIs) for the two primary outcomes (among a total of nine outcomes, including preventing blood transfusion and all-cause death) are summarized in Figures S12 and S13 of the Supplementary Material (Wang et al., 2025). These estimates were obtained using two methods: the standard NMA employing White’s approach (White et al., 2012) and the proposed BCL-OFS method. Both methods yielded largely consistent NMA estimates for the two primary outcomes. For the outcome of preventing blood transfusion, as shown in the upper triangular matrix of Figure S12, four ESAs demonstrated significant risk reductions compared to placebo: darbepoetin alfa (OR = 0.37; 95% CrI, 0.13 to 0.82), epoetin beta (OR = 0.25; 95% CrI, 0.10 to 0.52), methoxy polyethylene glycol-epoetin beta (OR = 0.35; 95% CrI, 0.10 to 0.91), and epoetin alfa (OR = 0.35; 95% CrI, 0.12 to 0.72). However, a discrepancy was observed for methoxy polyethylene glycol-epoetin beta versus placebo: the proposed method identified a significant result (OR = 0.35; 95% CrI, 0.10 to 0.91), while White’s approach reported a non-significant result (OR = 0.33; 95% CI, 0.11 to 1.02). For the outcome of all-cause death, NMA estimates revealed high consistency between the two methods, with only three exceptions in treatment comparisons, such as methoxy polyethylene glycol-epoetin beta versus placebo and darbepoetin alfa versus methoxy polyethylene glycol-epoetin beta (Figure S13). These discrepancies may be interpreted as follows. First, the original analysis reported an I2 statistic of 0% when comparing darbepoetin alfa to methoxy polyethylene glycol-epoetin beta, indicating that the total variance was not completely dominated by the between-study variance. Some unexplained variability likely arose from within-study correlations. If these within-study correlations were not properly specified, they might result in overestimated standard errors for pooled estimates of treatment comparisons. Second, methoxy polyethylene glycol-epoetin beta had very low-certainty evidence in the original study, which might have contributed to the observed inconsistencies.

Figure 4 displays two-dimensional concordance plots of statistical significance based on the Z-values obtained from these methods, including the standard NMA using White’s method and the proposed BCL and BCL-OFS methods. These results revealed a high degree of concordance between the methods across nine outcomes, with most data points clustering around the diagonal quadrants. However, some points from the proposed BCL-OFS method deviated from the diagonal for the outcome of major cardiovascular events.

Figure 4:

Concordance plots comparing Z-values from the proposed BCL and BCL-OFS methods (x-axis) with those from the NMA using White’s approach (y-axis) across all nine outcomes for the anemia data. Cross marks indicate results from the BCL method, while open circles represent results from the BCL-OFS method, both compared with the NMA using White’s approach. Red dashed lines represent the critical values for the 95% confidence level.

In Figure S11 of the Supplementary Material (Wang et al., 2025), epoetin beta was identified as the top-ranked treatment based on SUCRA values, achieving the highest scores of 0.76 for preventing blood transfusion and 0.87 for all-cause death. These results are consistent with the previous research by Chung et al. (2023), suggesting potential clinical benefits associated with epoetin beta. Interestingly, epoetin beta exhibited the lowest SUCRA value of 0.26 for the outcome of hypertension, which aligns with earlier findings by Chung et al. (2023). Although the previous study did not definitively establish the superiority or safety of one epoetin drug over another, the highest SUCRA values of 0.99 in these outcomes suggest that epoetin beta offers distinct benefits in preventing cardiovascular death and myocardial infarction.

6. Discussion

This paper introduced a new method for analyzing multiple outcomes in NMAs, particularly addressing the challenge of specifying within-study correlations. The proposed BCL method provides unbiased estimates of relative treatment effects without requiring knowledge of within-study correlations among treatments and outcomes, thereby simplifying the analysis. A hybrid Gibbs sampler was implemented to efficiently draw samples from the posterior distributions, allowing us to capture uncertainty in the estimated treatment effects. Additionally, we introduced the OFS adjustment, which refines posterior samples to better approximate the true posterior distributions of treatment effects. Through extensive simulations, we demonstrated the robustness of the proposed BCL and BCL-OFS methods under a range of scenarios, including varying numbers of studies, magnitudes of within-study and between-study correlations, and both common and heterogeneous between-study variances. Based on the simulation findings, the BCL-OFS method is recommended for settings with at least low to moderate correlation structures and those involving varying correlation structures.

The proposed BCL method offers several key advantages. First, it eliminates the need to specify within-study correlations, which guarantees valid statistical inferences regardless of the underlying correlation structures, as demonstrated in our simulation study. This feature greatly enhances the BCL method’s applicability to real-world settings, where estimating within-study correlations can be difficult or prone to misspecification. Second, the BCL method simplifies parameterization compared to the BFL methods by focusing solely on estimating treatment effects and between-study variances. This reduction in model complexity also improves computational efficiency for large-scale and complex NMAs. Third, the proposed BCL method avoids the computational burden of enforcing positive semi-definiteness constraints on variance-covariance matrices, which is a common requirement in existing multivariate approaches (Nam et al., 2003). Lastly, the OFS adjustment improves the calibration of posterior samples and better approximates true posterior distributions. Unlike existing in-sampling location and curvature adjustment methods (Pauli et al., 2011), which can increase computational cost and potentially distort MCMC sampling, the OFS adjustment achieves improved accuracy without interfering with the sampling process.

Despite these advantages, several areas warrant further exploration. As the network structure becomes more complex, the computational cost associated with the OFS adjustment – particularly for calculating the score vector and Hessian matrix – can become substantial. Investigating computationally efficient extensions to the OFS adjustment could increase its scalability for large-scale and complex network meta-analyses. Additionally, the current implementation of the Gibbs sampler employs a uniform step size across all outcomes. By incorporating an adaptive Metropolis-Hastings algorithm (Brooks et al., 2011), sampler efficiency could be improved through self-adaptive step sizes for each outcome, leading to faster convergence and more accurate estimates.
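The idea of self-adaptive step sizes can be illustrated with a generic adaptive random-walk Metropolis sampler. The sketch below is not the hybrid Gibbs sampler used in this paper: it tunes the log step size with a Robbins-Monro update toward a target acceptance rate, and the function name, the 0.44 target, the gain exponent 0.6, and the two normal targets standing in for two outcomes are all assumptions made for illustration.

```python
import math
import random


def adaptive_rw_metropolis(log_density, n_iter, rng, init=0.0, init_step=1.0,
                           target_accept=0.44):
    """Random-walk Metropolis with a Robbins-Monro step-size update.

    Returns the final step size and the overall acceptance rate.
    """
    x, log_step = init, math.log(init_step)
    logp = log_density(x)
    n_accepted = 0
    for i in range(n_iter):
        proposal = x + rng.gauss(0.0, math.exp(log_step))
        logp_prop = log_density(proposal)
        accept = rng.random() < math.exp(min(0.0, logp_prop - logp))
        if accept:
            x, logp = proposal, logp_prop
            n_accepted += 1
        # Diminishing gain: grow the step after acceptances, shrink it otherwise.
        log_step += ((1.0 if accept else 0.0) - target_accept) / (i + 1) ** 0.6
    return math.exp(log_step), n_accepted / n_iter


# Two hypothetical outcomes whose posteriors differ in scale (sd 1 and sd 5).
targets = {
    "outcome_1": lambda x: -0.5 * (x / 1.0) ** 2,
    "outcome_2": lambda x: -0.5 * (x / 5.0) ** 2,
}
rng = random.Random(2025)
results = {name: adaptive_rw_metropolis(f, 5000, rng) for name, f in targets.items()}
for name, (step, rate) in results.items():
    print(name, round(step, 2), round(rate, 2))
```

Each outcome ends up with its own step size, larger for the more diffuse target, whereas a single shared step size would be too large for one outcome or too small for the other.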

Another potential limitation concerns the use of normal approximations for binary outcomes via log transformations, such as log odds ratios or log risk ratios. While this approximation is widely adopted for its computational efficiency, it may not fully capture the complexities of binary data, especially in scenarios with rare or zero events. Bayesian methods for binary outcomes can face issues, such as convergence problems in the presence of sparse data (Al Amer et al., 2021) and sensitivity to prior specifications for covariance matrices (Wang et al., 2020). Exact models for binary outcomes provide more precise estimates, but they often impose significant computational burdens, particularly in large-scale network meta-analyses. Future work could explore hybrid approaches that combine the simplicity of normal approximations with more accurate exact models for sensitivity analyses. For example, methods that adaptively switch between normal and exact models, based on event rates or data sparsity, could improve robustness. Additionally, developing more efficient Bayesian algorithms with robust prior specifications tailored to binary outcomes may mitigate convergence issues without sacrificing computational feasibility.

Our simulation study employs aggregated-level data generated from a multivariate random-effects model, as specified in Equation (1). This approach provides a practical and computationally efficient framework for evaluating the proposed method. However, it may not fully capture the complexities inherent in real-world data, especially for binary outcomes. Several empirical studies have demonstrated that results from individual participant data (IPD) and aggregated data can be equivalent under certain conditions (Tudur Smith et al., 2016; Tierney et al., 2020), but we acknowledge that IPD-based simulations offer a more realistic representation of data-generating processes. Challenges also arise in multi-arm simulations when dealing with binary outcomes. In multi-arm trials, the assumption of independence among treatments may be violated due to repeated comparisons involving the same baseline arm (Seide et al., 2019). Overcoming this challenge requires developing a simulation framework that integrates both independent two-arm and multi-arm trials while appropriately accounting for these dependencies (Seide et al., 2019). Such advancements would improve the robustness and generalizability of the proposed methods for analyzing binary outcomes in multivariate network meta-analyses.

Finally, we note that the motivating examples in this study may be subject to potential outcome reporting bias, a common concern in meta-analyses and network meta-analyses (Marks-Anglin and Chen, 2020). Extending the proposed methods to incorporate sensitivity analyses (Hutton and Williamson, 2000) would allow for adjustments to this bias, thereby improving the reliability of the findings in the presence of selective outcome reporting.

Supplementary Material

Supplementary Material for “Exploiting Multivariate Network Meta-Analysis: A Calibrated Bayesian Composite Likelihood Inference” (DOI: 10.1214/25-BA1511SUPP; .pdf).

  • Appendix A: Evidence network diagrams. Diagrams illustrating the evidence networks for two datasets: one comparing treatment procedures for root coverage and the other comparing treatments for anemia in chronic kidney disease patients.

  • Appendix B: Derivation of posterior distributions for treatment effect parameters. A detailed derivation of the posterior kernel function is provided.

  • Appendix C: Simulation design. Comprehensive details of the data generation process used in the simulations.

  • Appendix D: Additional simulation results. Supplementary results from simulation studies based on the proposed methods are presented.

  • Appendix E: MCMC setup for real applications. A detailed description of the MCMC setup used in two real applications.

  • Appendix F: Potential scale reduction factor. A description and discussion of the potential scale reduction factor are provided.

  • Appendix G: Additional results from real applications. Supplementary results, including treatment rankings and league tables, are provided.

Acknowledgments

The authors sincerely thank the anonymous referees, the Associate Editor, and the Editor for their insightful and constructive comments, which have significantly enhanced the quality of this paper.

Funding

This work was supported by the National Institutes of Health - NIDDK grant: R01DK128237 (YW and YL).

References

  1. Al Amer FM, Thompson CG, and Lin L (2021). “Bayesian methods for meta-analyses of binary outcomes: implementations, examples, and impact of priors.” International Journal of Environmental Research and Public Health, 18(7): 3492. doi: 10.3390/ijerph18073492.
  2. Brooks S, Gelman A, Jones G, and Meng X-L (2011). Handbook of Markov Chain Monte Carlo. Boca Raton, FL: CRC Press. doi: 10.1201/b10905.21
  3. Buti J, Baccini M, Nieri M, La Marca M, and Pini-Prato GP (2013). “Bayesian network meta-analysis of root coverage procedures: ranking efficacy and identification of best treatment.” Journal of Clinical Periodontology, 40(4): 372–386. doi: 10.1111/jcpe.12028.
  4. Cairo F, Pagliaro U, and Nieri M (2008). “Treatment of gingival recession with coronally advanced flap procedures: a systematic review.” Journal of Clinical Periodontology, 35(s8): 136–162. doi: 10.1111/j.1600-051X.2008.01267.x.
  5. Chandler RE and Bate S (2007). “Inference for clustered data using the independence loglikelihood.” Biometrika, 94(1): 167–183. doi: 10.1093/biomet/asm015.
  6. Chen Y, Hong C, and Riley RD (2014a). “An alternative pseudolikelihood method for multivariate random-effects meta-analysis.” Statistics in Medicine, 34(3): 361–380. doi: 10.1002/sim.6350.
  7. Chen Y, Liu Y, Ning J, Nie L, Zhu H, and Chu H (2014b). “A composite likelihood method for bivariate meta-analysis in diagnostic systematic reviews.” Statistical Methods in Medical Research, 26(2): 914–930. doi: 10.1177/0962280214562146.
  8. Cheng Y, Chen J, Lin S, and Lu H (2007). “Is coronally positioned flap procedure adjunct with enamel matrix derivative or root conditioning a relevant predictor for achieving root coverage? A systemic review.” Journal of Periodontal Research, 42(5): 474–485. doi: 10.1111/j.1600-0765.2007.00971.x.
  9. Chung EY, Palmer SC, Saglimbene VM, Craig JC, Tonelli M, and Strippoli GF (2023). “Erythropoiesis-stimulating agents for anaemia in adults with chronic kidney disease: a network meta-analysis.” Cochrane Database of Systematic Reviews, 2: Art. No.: CD010590. doi: 10.1002/14651858.CD010590.pub3.
  10. Cipriani A, Higgins JPT, Geddes JR, and Salanti G (2013). “Conceptual and technical challenges in network meta-analysis.” Annals of Internal Medicine, 159(2): 130–137.
  11. Duan R, Tong J, Lin L, Levine L, Sammel M, Stoddard J, Li T, Schmid CH, Chu H, and Chen Y (2023). “PALM: Patient-centered treatment ranking via large-scale multivariate network meta-analysis.” The Annals of Applied Statistics, 17(1): 815–837. doi: 10.1214/22-aoas1652.
  12. Efthimiou O, Mavridis D, Cipriani A, Leucht S, Bagos P, and Salanti G (2014). “An approach for modelling multiple correlated outcomes in a network of interventions using odds ratios.” Statistics in Medicine, 33(13): 2275–2287. doi: 10.1002/sim.6117.
  13. Efthimiou O, Mavridis D, Riley RD, Cipriani A, and Salanti G (2015). “Joint synthesis of multiple correlated outcomes in networks of interventions.” Biostatistics, 16(1): 84–97. doi: 10.1093/biostatistics/kxu030.
  14. Guolo A and To D-K (2020). “A pseudo-likelihood approach for multivariate meta-analysis of test accuracy studies with multiple thresholds.” Statistical Methods in Medical Research, 30(1): 204–220. doi: 10.1177/0962280220948085.
  15. Hanna RM, Streja E, and Kalantar-Zadeh K (2021). “Burden of anemia in chronic kidney disease: beyond erythropoietin.” Advances in Therapy, 38(1): 52–75.
  16. Hong H, Chu H, Zhang J, and Carlin BP (2016). “A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons.” Research Synthesis Methods, 7(1): 6–22.
  17. Hutton JL and Williamson PR (2000). “Bias in meta-analysis due to outcome variable selection within studies.” Journal of the Royal Statistical Society. Series C (Applied Statistics), 49(3): 359–370. URL http://www.jstor.org/stable/2680770 doi: 10.1111/1467-9876.00197.
  18. Jackson D, Bujkiewicz S, Law M, Riley RD, and White IR (2018). “A matrix-based method of moments for fitting multivariate network meta-analysis models with multiple outcomes and random inconsistency effects.” Biometrics, 74(2): 548–556. doi: 10.1111/biom.12762.
  19. Jackson D, Riley R, and White IR (2011). “Multivariate meta-analysis: potential and promise.” Statistics in Medicine, 30(20): 2481–2498. doi: 10.1002/sim.4172.
  20. Kirkham JJ, Riley RD, and Williamson PR (2012). “A multivariate meta-analysis approach for reducing the impact of outcome reporting bias in systematic reviews.” Statistics in Medicine, 31(20): 2179–2195. doi: 10.1002/sim.5356.
  21. Lindsay BG (1988). “Composite likelihood methods.” Contemporary Mathematics, 80(1): 221–239. doi: 10.1090/conm/080/999014.
  22. Liu Y, DeSantis SM, and Chen Y (2018). “Bayesian mixed treatment comparisons meta-analysis for correlated outcomes subject to reporting bias.” Journal of the Royal Statistical Society. Series C, Applied Statistics, 67(1): 127–144. doi: 10.1111/rssc.12220.
  23. Lu G and Ades AE (2004). “Combination of direct and indirect evidence in mixed treatment comparisons.” Statistics in Medicine, 23(20): 3105–3124. doi: 10.1002/sim.1875.
  24. Lu G and Ades AE (2006). “Assessing evidence inconsistency in mixed treatment comparisons.” Journal of the American Statistical Association, 101(474): 447–459. doi: 10.1198/016214505000001302. [DOI] [Google Scholar]
  25. Lumley T (2002). “Network meta-analysis for indirect treatment comparisons.” Statistics in Medicine, 21(16): 2313–2324. [DOI] [PubMed] [Google Scholar]
  26. Marks-Anglin A and Chen Y (2020). “A historical review of publication bias.” Research Synthesis Methods, 11(6): 725–742. doi: 10.1002/jrsm.1452. [DOI] [PubMed] [Google Scholar]
  27. Nam I-S, Mengersen K, and Garthwaite P (2003). “Multivariate meta-analysis.” Statistics in Medicine, 22(14): 2309–2333. doi: 10.1002/sim.4223. [DOI] [PubMed] [Google Scholar]
  28. Pauli F, Racugno W, and Ventura L (2011). “Bayesian composite marginal likelihoods.” Statistica Sinica, 21(1): 149–164. [Google Scholar]
  29. Petropoulou M, Nikolakopoulou A, Veroniki A-A, Rios P, Vafaei A, Zarin W, Giannatsi M, Sullivan S, Tricco AC, Chaimani A, et al. (2017). “Bibliographic study showed improving statistical methodology of network meta-analyses published between 1999 and 2015.” Journal of Clinical Epidemiology, 82: 20–28. [DOI] [PubMed] [Google Scholar]
  30. Ribatet M, Cooley D, and Davison AC (2012). “Bayesian inference from composite likelihoods, with an application to spatial extremes.” Statistica Sinica, 22(2): 813–845.
  31. Riley RD (2009). “Multivariate meta-analysis: the effect of ignoring within-study correlation.” Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(4): 789–811. doi: 10.1111/j.1467-985X.2008.00593.x.
  32. Riley RD, Jackson D, Salanti G, Burke DL, Price M, Kirkham J, and White IR (2017). “Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples.” BMJ, 358: j3932.
  33. Riley RD, Thompson JR, and Abrams KR (2008). “An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown.” Biostatistics, 9(1): 172–186.
  34. Rücker G and Schwarzer G (2015). “Ranking treatments in frequentist network meta-analysis works without resampling methods.” BMC Medical Research Methodology, 15(1): 58.
  35. Salanti G (2012). “Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool.” Research Synthesis Methods, 3(2): 80–97.
  36. Salanti G, Ades AE, and Ioannidis JPA (2011). “Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial.” Journal of Clinical Epidemiology, 64(2): 163–171.
  37. Seide SE, Jensen K, and Kieser M (2019). “Simulation and data-generation for random-effects network meta-analysis of binary outcome.” Statistics in Medicine, 38(17): 3288–3303. doi: 10.1002/sim.8193.
  38. Shaby BA (2014). “The open-faced sandwich adjustment for MCMC using estimating functions.” Journal of Computational and Graphical Statistics, 23(3): 853–876. doi: 10.1080/10618600.2013.842174.
  39. Tierney JF, Fisher DJ, Burdett S, Stewart LA, and Parmar MKB (2020). “Comparison of aggregate and individual participant data approaches to meta-analysis of randomised trials: An observational study.” PLOS Medicine, 17(1): e1003019. doi: 10.1371/journal.pmed.1003019.
  40. Tudur Smith C, Marcucci M, Nolan SJ, Iorio A, Sudell M, Riley R, Rovers MM, and Williamson PR (2016). “Individual participant data meta-analyses compared with meta-analyses based on aggregate data.” Cochrane Database of Systematic Reviews, 2016(9). doi: 10.1002/14651858.MR000007.pub3.
  41. Varin C, Reid N, and Firth D (2011). “An overview of composite likelihood methods.” Statistica Sinica, 21(1): 5–42.
  42. Waddingham E, Matthews PM, and Ashby D (2020). “Exploiting relationships between outcomes in Bayesian multivariate network meta-analysis with an application to relapsing-remitting multiple sclerosis.” Statistics in Medicine, 39(24): 3329–3346. doi: 10.1002/sim.8668.
  43. Wang Z, Lin L, Hodges JS, and Chu H (2020). “The impact of covariance priors on arm-based Bayesian network meta-analyses with binary outcomes.” Statistics in Medicine, 39(22): 2883–2900. doi: 10.1002/sim.8580.
  44. Wang Y, Lin L, and Liu Y-L (2025). “Supplementary Material for ‘Exploiting Multivariate Network Meta-Analysis: A Calibrated Bayesian Composite Likelihood Inference’.” doi: 10.1214/25-BA1511SUPP.
  45. Wei Y and Higgins JPT (2013). “Estimating within-study covariances in multivariate meta-analysis with multiple outcomes.” Statistics in Medicine, 32(7): 1191–1205. doi: 10.1002/sim.5679.
  46. White IR, Barrett JK, Jackson D, and Higgins JPT (2012). “Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression.” Research Synthesis Methods, 3(2): 111–125.
  47. Witkowski M, Wilkinson L, Webb N, Weids A, Glah D, and Vrazic H (2018). “A systematic literature review and network meta-analysis comparing once-weekly semaglutide with other GLP-1 receptor agonists in patients with type 2 diabetes previously receiving basal insulin.” Diabetes Therapy, 9: 1233–1251.
