Abstract
Network meta-analysis (NMA) is gaining popularity in evidence synthesis and network meta-regression (NMR) allows us to incorporate potentially important covariates into network meta-analysis. In this paper, we propose a Bayesian network meta-regression hierarchical model and assume a general multivariate t distribution for the random treatment effects. The multivariate t distribution is desired for heavy-tailed random effects and converges to the multivariate normal distribution when the degrees of freedom go to infinity. Moreover, in NMA, some treatments are compared only in a single study. To overcome such sparsity, we propose a log-linear regression model for the variances of the random effects and incorporate aggregate covariates into modeling the variance components. We develop a Markov chain Monte Carlo sampling algorithm to sample from the posterior distribution via the collapsed Gibbs technique. We further use the Deviance Information Criterion (DIC) and the logarithm of the Pseudo-marginal likelihood (LPML) for model comparison. A simulation study is conducted and a detailed analysis from our motivating case study is carried out to further demonstrate the proposed methodology.
Keywords: Arm-based model, Collapsed Gibbs sampling, Multivariate t distribution, Surface under the cumulative ranking curve (SUCRA), Triglycerides (TG)
1 |. INTRODUCTION
Triglycerides (TG) are a type of fat found in the blood that the body uses for energy. Some TG are needed for good health, but high levels might raise the risk of heart disease. A TG level below 150 mg/dL is considered normal, from 150 mg/dL to 199 mg/dL borderline-high, from 200 mg/dL to 499 mg/dL high, and over 500 mg/dL very high. Diet and lifestyle changes are recommended as first-line therapy for hypertriglyceridemia, but drug therapy is often required. The drugs currently available for the treatment of hypertriglyceridemia include niacin, fibrates, and omega-3 fatty acids. In a recent study, adding ezetimibe to ongoing lipid-lowering therapies reduced TG levels by 33%, and ezetimibe monotherapy reduced them by 8%.1
According to the National Vital Statistics Reports,2 cardiovascular disease (CVD) continues to be the leading cause of death for both men and women. This is the case in the U.S. and worldwide. Although low density lipoprotein cholesterol (LDL-C) is a primary cause of CVD, other risk factors, for example, triglycerides, contribute as well. The 2011 statement cited epidemiological evidence that a moderate elevation in the TG level is often associated with increased atherosclerotic cardiovascular disease (ASCVD) risk.3 Several studies have demonstrated that elevated TG associated with genetic variants may be a causal factor for ASCVD and possibly for premature all-cause mortality.4,5,6 A long-standing association exists between elevated TG levels and CVD.7,8 In a meta-analysis of 17 studies, increased TG levels were associated with increased coronary disease risk in both men and women, after adjustment for high density lipoprotein cholesterol (HDL-C) and other risk factors.9 The randomized, controlled clinical trial REDUCE-IT10 has shown that intervention to lower TG levels is associated with reduced CVD events.
In the process of synthesizing evidence from multiple studies, study-level covariates might be taken into account since the settings of studies may differ.11,12,13,14 Network meta-regression, by incorporating categorical or continuous covariates, can reduce heterogeneity between studies and summarize treatment effects after adjusting for study-level differences. The importance of adjusting for covariate effects in the presence of treatment-covariate interactions and adding individual baseline characteristics as covariates to the random effects logistic model are illustrated in Batson et al.14 Categorical study-level covariates can also split the data into subgroups and be used to investigate subgroup effects.12 Mainstream modeling methods for NMR focus on synthesis of relative treatment effects data from studies (contrast-based models). In cases where the effect of each treatment arm is available, an alternative method of modeling the absolute treatment effects (arm-based models) has been investigated.15,16,17 In the arm-based framework, NMR allows us to include treatment-by-study-level covariates. For both modeling methods, an exchangeability assumption is usually made to allow for variation in the true treatment effects across trials, which induces a generalized linear mixed effects model. A detailed discussion of generalized linear mixed effects models is provided for implementing arm-based network meta-regression within the frequentist framework.13 In addition to Markov chain Monte Carlo (MCMC) sampling, which is widely used in NMA or NMR, integrated nested Laplace approximations are proposed to carry out Bayesian inference of the NMR model and compare the results to those obtained by MCMC.18 In this paper, we consider 10 baseline characteristic covariates in our arm-based NMR model in order to reduce the bias in estimating the true treatment effects. The covariate effects are assumed fixed while the treatment effects are assumed random. 
In all of the aforementioned work, regardless of the types of outcomes and modeling methods, the common distribution for the trial-specific random treatment effects is usually assumed to be a normal distribution. Here, we assume a general multivariate t distribution for the random treatment effects. The random effects converge to a commonly used multivariate normal distribution when the degrees of freedom in the multivariate t distribution go to infinity. The use of the multivariate t distribution creates more challenges in sampling from posterior distributions and carrying out Bayesian inference. We develop a generalized collapsed Gibbs sampling algorithm to implement the posterior computations.
While the methodology for network meta-analysis has become more and more sophisticated, the complexity of the NMA structure keeps increasing. The number of treatments compared in one study is typically smaller than the complete collection of treatments in the NMA data. This complicates the adoption of non-informative prior distributions and inference for the parameters in the covariance matrix of the random effects. In a contrast-based model, the covariance matrix of the random effects is often assumed to have certain structures under the homogeneous variances (the variances of the random effects are equal) and consistency (no apparent discrepancy between direct and indirect comparisons) assumptions.18,19,20,21 Lu and Ades have proposed spherical parameterizations to generate a positive-definite matrix to facilitate the unstructured covariance matrix.22 Li et al.11 propose an alternative approach based on partial correlations to generate the correlation matrix, which results in the same posterior inference under non-informative priors for the correlations. In addition, it is also common that some treatments are included in single trials so that the variances of the random effects of these treatments cannot be estimated. To resolve this issue, a grouping approach for the variances of the random effects based on the mechanism of the corresponding treatments has been investigated.11 In this paper, we extend the grouping approach to a modeling approach and assume a log-linear model for the variances of the random effects so that the variances can depend on the treatment-by-study-level aggregate covariates. The modeling approach is a natural extension of the grouping approach and also offers more flexibility.
We further use the DIC and LPML to compare the performance of different degrees of freedom of the multivariate t distribution and different sets of covariates for modeling the variances of the random effects. The rest of this article is organized as follows. In Section 2, we describe the motivating network meta-data, which have continuous outcomes in each study arm for 29 studies involving 11 different treatments. In Section 3, we thoroughly develop the network meta-regression hierarchical model. This includes specifying a multivariate t distribution for the random effects and a log-linear model for heterogeneous variances. The complete-data likelihood and the observed-data likelihood are also given in this section. The priors and posteriors, the MCMC sampling algorithm, and the Bayesian model comparison criteria are presented in Sections 4.1, 4.2, and 4.3, respectively. In Section 5, we carry out a simulation study to examine the empirical performance of the proposed methodology. In Section 6, we give detailed analyses and Bayesian inference for our motivating case study. Finally, we conclude the article with some discussion for future research in Section 7.
2 |. MOTIVATING CASE STUDY: THE TNM DATA
A systematic online search yielded 78 double-blind, randomized, active or placebo-controlled clinical trials. From these trials, 37 second-line studies (i.e., studies with patients on statin at study entry) were excluded. From the remaining 41 first-line trials (i.e., studies with patients who were drug-naive or rendered drug-naive by wash-out at study entry), 12 trials with missing information (5 trials having the response variable missing, 2 trials having the study covariates missing and 5 trials having both the response variable and covariates missing) were further excluded. The flow diagram of data collection can be found in Figure 1(a) of Li et al. (2019).11 The remaining 29 studies compose the motivating dataset, the Triglycerides Network Meta (TNM) data. These studies reported the TG lowering effects of placebo (PBO), five statins (simvastatin (S), atorvastatin (A), lovastatin (L), rosuvastatin (R), pravastatin (P)), ezetimibe (E), or statins plus ezetimibe (S and E (SE), A and E (AE), L and E (LE), and P and E (PE)). Figure 1 exhibits the study network among the cholesterol lowering treatments for patients with primary hypercholesterolemia. The outcome variable of our data is the mean percent change from baseline in TG. Our continuous outcome and the sample standard deviation for each trial arm are presented in Table 1. Figure 2 is the boxplot of the outcome variable grouped by treatments. We can see from Table 1 and Figure 2 that some treatments have a wide range of effects in lowering TG. For example, the mean TG lowering effect ranges from −10.20 to −27.00 for treatment A, and ranges from −8.50 to −29.20 for treatment R. Particularly, two data points fall outside the boxplot of treatment R and they are labelled with trial IDs. Treatments A and R are compared in the most trials and have the widest ranges in effects. The potential heavy-tails in the outcome variable prompt us to assume heavy-tailed random effects for the treatment effects.
FIGURE 1.
TG network meta-data diagram. Each node represents one treatment in the network. Node sizes are proportional to the total number of subjects treated in the arms across the network. Each edge represents at least one direct comparison that exists for two connected treatments. The involved trial IDs are listed on the edges. Thinner lines represent fewer direct comparisons. Red, green or blue colors represent 1, 2–4, or 5+ direct comparisons available for the connected treatments.
TABLE 1.
TG network meta-data: mean (SD) percent change from baseline in TG
| Treatment Study ID | PBO | S | A | L | R | P | E | SE | AE | LE | PE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −2.20 (33.00) | −15.20 (34.10) | −13.20 (27.80) | −28.00 (28.00) | |||||||
| 2 | 2.40 (23.26) | −16.60 (22.70) | −8.30 (23.24) | −24.10 (23.17) | |||||||
| 3 | −1.90 (31.42) | −20.80 (29.69) | −10.70 (31.63) | −24.30 (27.03) | |||||||
| 4 | −19.00 (29.92) | −22.82 (25.49) | |||||||||
| 5 | −14.97 (27.36) | −22.38 (27.96) | |||||||||
| 6 | −22.90 (24.18) | −28.60 (25.29) | |||||||||
| 7 | −22.50 (28.23) | −25.45 (28.08) | |||||||||
| 8 | −25.50 (29.51) | −27.40 (29.45) | |||||||||
| 9 | −25.75 (25.17) | −27.72 (26.78) | |||||||||
| 10 | −26.41 (29.25) | −26.38 (30.78) | |||||||||
| 11 | −24.70 (27.80) | −25.15 (22.09) | |||||||||
| 12 | −6.40 (42.21) | −24.50 (24.41) | −5.10 (25.46) | −32.80 (24.43) | |||||||
| 13 | 4.00 (24.00) | −11.00 (29.66) | −3.00 (25.46) | −22.00 (27.71) | |||||||
| 14 | 2.00 (30.64) | −7.60 (30.07) | −2.10 (30.40) | −17.60 (29.99) | |||||||
| 15 | −24.60 (25.06) | −26.20 (24.09) | |||||||||
| 16 | −2.80 (27.38) | −20.90 (27.39) | −19.10 (28.17) | ||||||||
| 17 | −1.00 (26.42) | −19.00 (27.05) | −18.00 (27.17) | ||||||||
| 18 | −14.00 (23.04) | −18.00 (22.98) | −20.00 (23.15) | ||||||||
| 19 | −18.35 (25.38) | −18.47 (26.10) | |||||||||
| 20 | −19.10 (26.32) | −17.90 (26.64) | |||||||||
| 21 | −27.00 (27.04) | −22.20 (29.55) | |||||||||
| 22 | −23.90 (27.36) | −29.20 (27.30) | |||||||||
| 23 | −17.80 (24.79) | −18.60 (24.82) | |||||||||
| 24 | −16.00 (28.30) | −16.98 (28.90) | |||||||||
| 25 | −22.10 (25.98) | −22.90 (29.20) | |||||||||
| 26 | −26.11 (29.11) | −21.88 (28.37) | |||||||||
| 27 | −11.80 (34.48) | −13.50 (36.35) | |||||||||
| 28 | −10.20 (40.36) | −8.50 (41.70) | |||||||||
| 29 | −14.30 (31.31) | −12.00 (32.17) | |||||||||
FIGURE 2.
Boxplot of the mean percent change in TG for each treatment arm. The width of each box is proportional to the number of trials that compared the corresponding treatment.
In our data, ezetimibe is available at only one dose of 10 mg while the statins are available at multiple doses. We follow the recommendation from the Cochrane Handbook23 to form the TG lowering effects of each statin with different doses. We consider 10 common covariates from the 29 studies, which are aggregated on the treatment-by-study level. An examination of the covariates shows that there is heterogeneity among the baseline characteristics of the included studies. For example, the trial durations were not uniform and ranged between 6 and 12 months. Further, the proportions of white race and male gender also varied across trials. In particular, the proportions of white race were 0% in trials 13 and 16 and almost 100% in trials 18 and 21. Furthermore, the proportions of high statin potency were 100% in trials 8, 15, and 21. These informal inspections encourage us to incorporate covariates into our meta-regression model. A detailed summary of the aggregate covariates and the sample size of each treatment arm in each study can be found in Tables A1a and A1b of Appendix A of the Supplementary Materials.
3 |. NETWORK META-REGRESSION HIERARCHICAL MODEL
3.1 |. Model
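Assume that $\mathcal{T} = \{1, 2, \ldots, T\}$ is the complete collection of treatments across a total of $K$ trials in the data. Within a multiple treatment comparison framework, although it is rarely the case that the complete collection of treatments is compared in all $K$ trials, we momentarily assume this is true for ease of exposition. Thus, for the $t$th treatment in the $k$th trial, let $y_{kt}$ and $s^2_{kt}$ denote the aggregate responses, which generally represent the sample mean and the sample variance, respectively. Following Zhang et al.16, Yao et al.24, and Hong et al.25, we propose a general hierarchical random effects model for the network meta-regression as
$$y_{kt} = x_{kt}'\beta + \tau_{kt}\gamma_{kt} + \epsilon_{kt}, \quad \epsilon_{kt} \sim N\left(0, \sigma^2_{kt}/n_{kt}\right), \tag{1}$$
and
$$\frac{(n_{kt}-1)\,s^2_{kt}}{\sigma^2_{kt}} \sim \chi^2_{n_{kt}-1}, \tag{2}$$
where $n_{kt}$ is the sample size and $\sigma^2_{kt}$ is the error variance of the $t$th arm in the $k$th trial. The aggregate variables, $y_{kt}$ and $s^2_{kt}$, are independent.26 The aggregate covariate vector, $x_{kt}$, is $p$-dimensional and $\beta$ is the regression coefficient vector corresponding to $x_{kt}$.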
The random effect of the $t$th treatment in the $k$th trial, $\gamma_{kt}$, is assumed to be independent of $\epsilon_{kt}$. It captures the dependence of the $y_{kt}$'s within the trial as well as the heterogeneity across trials. We let $\gamma_k = (\gamma_{k1}, \ldots, \gamma_{kT})'$ be the $T$-dimensional vector of scaled random effects, which would be observed in the $k$th trial, and let $\rho$ denote the $T \times T$ correlation matrix. We include the overall treatment effects in the coefficients $\beta$ in (1) so that the first $p - T$ components are the regression coefficients corresponding to the $p - T$ aggregate covariates and the remaining $T$ components are the overall treatment effects. We assume a general multivariate t distribution for the scaled random effects $\gamma_k$, that is
$$\gamma_k \sim t_T(0, \rho, v), \tag{3}$$
where $v$ is the degrees of freedom. When the degrees of freedom $v \to \infty$, the scaled random effects $\gamma_k$ converge to a multivariate normal distribution, which is a common assumption for random effects in NMA. The multivariate t distribution is preferable for heavy-tailed random effects. As shown in Figure 2, the mean TG lowering effects for both treatments A and R exhibit substantial variation from trial to trial, with wide ranges of −10.20 to −27.00 for treatment A and −8.50 to −29.20 for treatment R. Thus, for the TNM data, the heavy-tailed random effects in (3) may yield a better fit than light-tailed random effects.
The variance of the random effect for the $t$th treatment in the $k$th trial is captured by $\tau^2_{kt}$. We further assume that
$$\log \tau_{kt} = z_{kt}'\phi, \tag{4}$$
where $z_{kt}$ is the $q$-dimensional vector of aggregate covariates and $\phi$ is the corresponding coefficient vector in the log-linear regression model. This means that the variances of the random effects in the log-linear regression model are treatment-by-study-based and can be determined by the aggregate covariates from each trial arm. The variance of the random effect, $\tau^2_{kt}$, is generally more difficult to estimate than the regression coefficients $\beta$ in (1) for the aggregate response. Also, certain covariates may be highly associated with the outcome variable while they may not affect the variance of the random effect, since the random effect quantifies the difference between the trial-specific treatment effect and the corresponding overall treatment effect. For these reasons, we assume that $z_{kt}$ may be different from $x_{kt}$ and $q \le p$. If $z_{kt} = 1$ in (4), the random effects for the $T$ treatments across all trials share the same variance. By letting $\phi = (\phi_1, \ldots, \phi_G)'$ and $z_{kt} = (0, \ldots, 0, 1, 0, \ldots, 0)'$, with the $g$th element equal to 1 and all other elements equal to 0 if the $t$th treatment is from the $g$th group, we are able to divide these $T$ treatments into $G$ groups and assume that the random effects for treatments within the same group share the same variance. This is analogous to the grouping approach11 for resolving the issue that some variances of the random effects cannot be estimated due to insufficient data in a heterogeneous variances model. Our log-linear regression model can easily accommodate the grouping approach, and therefore is an extension of that approach.
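The way the log-linear model reproduces the grouping approach can be illustrated with a minimal sketch. It assumes that (4) has the form log τkt = zkt′ϕ, which is consistent with the prior interval quoted in Section 4.1; the grouping, the coefficient values, and the helper names `tau` and `group_indicator` are hypothetical and chosen only for illustration.

```python
import math

def tau(z, phi):
    """Random-effect standard deviation under the assumed form log(tau) = z' phi."""
    return math.exp(sum(zj * pj for zj, pj in zip(z, phi)))

def group_indicator(g, n_groups):
    """Indicator vector with a 1 in position g (0-based) and 0 elsewhere."""
    return [1.0 if j == g else 0.0 for j in range(n_groups)]

# Hypothetical example: T = 4 treatments split into G = 2 variance groups.
treatment_group = [0, 0, 1, 1]   # illustrative grouping, not taken from the TNM data
phi = [0.5, -0.3]                # illustrative coefficients
taus = [tau(group_indicator(g, 2), phi) for g in treatment_group]
variances = [t ** 2 for t in taus]   # treatments in the same group share a variance

# A single shared variance corresponds to z = 1 with q = 1.
shared_tau = tau([1.0], [0.5])
```

With indicator covariates, each ϕg is simply the log standard deviation of group g, so continuous aggregate covariates can be added to z without changing the code.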
3.2 |. Likelihood Function
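In practice, we can only observe $T_k$ treatments in the $k$th trial. Let $\mathcal{T}_k = \{t_{k1}, \ldots, t_{kT_k}\}$ denote the set of treatments compared in the $k$th trial. Define a collection of unit vectors $E_k = (e_{t_{k1}}, \ldots, e_{t_{kT_k}})$, where $e_{t_{k\ell}}$ is the $T$-dimensional vector with $t_{k\ell}$th element equal to 1 and all other elements equal to 0. Thus, $E_k$ is a $T \times T_k$ matrix. Let $E_k^c$ be a $T \times (T - T_k)$ matrix, which consists of the columns of the $T \times T$ identity matrix $I_T$ that are not included in $E_k$. We let $\gamma_{k,o} = E_k'\gamma_k$ be the $T_k$-dimensional vector of the scaled random effects of the treatments that are actually observed in the $k$th trial, while $\gamma_{k,m} = (E_k^c)'\gamma_k$ is the $(T - T_k)$-dimensional vector of the scaled random effects of the treatments that are missing in the $k$th trial. It is easy to show that $\gamma_{k,o} \sim t_{T_k}(0, E_k'\rho E_k, v)$ and $\gamma_{k,m} \sim t_{T-T_k}(0, (E_k^c)'\rho E_k^c, v)$, for $k = 1, 2, \ldots, K$.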
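The selection matrix Ek described above can be illustrated with a small sketch. It is a plain-Python illustration under the assumption that the observed random effects and their correlation matrix are obtained as Ek′γk and Ek′ρEk; the trial, the numerical values, and the helper names are hypothetical.

```python
def selection_matrix(T, observed):
    """T x T_k matrix E_k whose columns are unit vectors of the observed treatments (1-based labels)."""
    return [[1.0 if row + 1 == t else 0.0 for t in observed] for row in range(T)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

T = 4
observed = [1, 3]                        # hypothetical trial comparing treatments 1 and 3
E_k = selection_matrix(T, observed)
gamma_k = [[0.2], [-1.1], [0.7], [0.4]]  # illustrative T x 1 vector of scaled random effects
rho = [[1.0 if i == j else 0.7 for j in range(T)] for i in range(T)]

gamma_ko = matmul(transpose(E_k), gamma_k)          # E_k' gamma_k: observed random effects
rho_o = matmul(matmul(transpose(E_k), rho), E_k)    # E_k' rho E_k: their correlation matrix
```

Here `gamma_ko` picks out the first and third entries of `gamma_k`, and `rho_o` is the corresponding 2 × 2 submatrix of `rho`.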
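Now, we replace the subscript $kt$ used in the last subsection with the subscript $kt_{k\ell}$ to indicate the $t_{k\ell}$th treatment in the $k$th trial. Let $y_k = (y_{kt_{k1}}, \ldots, y_{kt_{kT_k}})'$ denote the vector of the response variables and also let $\epsilon_k = (\epsilon_{kt_{k1}}, \ldots, \epsilon_{kt_{kT_k}})'$ denote the vector of the random errors for the $k$th trial. We have $\epsilon_k \sim N_{T_k}(0, \Sigma_k)$, where $\Sigma_k = \mathrm{diag}(\sigma^2_{kt_{k1}}/n_{kt_{k1}}, \ldots, \sigma^2_{kt_{kT_k}}/n_{kt_{kT_k}})$ is a $T_k \times T_k$ diagonal matrix and $n_{kt_{k\ell}}$ is the sample size of the corresponding arm. Let $X_k = (x_{kt_{k1}}, \ldots, x_{kt_{kT_k}})'$ denote the $T_k \times p$ matrix of covariates for the $k$th trial and let $W_k(\phi) = \mathrm{diag}(\tau_{kt_{k1}}, \ldots, \tau_{kt_{kT_k}})$ be a $T_k \times T_k$ diagonal matrix. Then, the vector form of (1) for the $k$th trial is
$$y_k = X_k\beta + W_k(\phi)\gamma_{k,o} + \epsilon_k. \tag{5}$$
Let $D_{oy} = \{(y_k, X_k): k = 1, \ldots, K\}$ and $D_{os} = \{s^2_{kt_{k\ell}}: \ell = 1, \ldots, T_k,\ k = 1, \ldots, K\}$, and let $D_o = D_{oy} \cup D_{os}$ denote the observed data. Let $\theta = (\beta, \phi, \rho, \Sigma)$, where $\Sigma = (\Sigma_1, \ldots, \Sigma_K)$, and thus $\theta$ is the collection of all model parameters. Also, write $\gamma_o = \{\gamma_{k,o}: k = 1, \ldots, K\}$, which is the collection of random effects. Using (2) and (5) and the independence between $y_{kt_{k\ell}}$ and $s^2_{kt_{k\ell}}$, the likelihood function can be written as
$$L(\theta, \gamma_o \mid D_o) = \prod_{k=1}^{K} \Big[ (2\pi)^{-T_k/2} |\Sigma_k|^{-1/2} \exp\Big\{ -\tfrac{1}{2} \big(y_k - X_k\beta - W_k(\phi)\gamma_{k,o}\big)' \Sigma_k^{-1} \big(y_k - X_k\beta - W_k(\phi)\gamma_{k,o}\big) \Big\} \prod_{\ell=1}^{T_k} f\big(s^2_{kt_{k\ell}} \mid \sigma^2_{kt_{k\ell}}\big)\, f(\gamma_{k,o} \mid \rho, v) \Big], \tag{6}$$
where $f(s^2_{kt_{k\ell}} \mid \sigma^2_{kt_{k\ell}})$ is the density of $s^2_{kt_{k\ell}}$ implied by (2) and $f(\gamma_{k,o} \mid \rho, v)$ is the probability density function of $t_{T_k}(0, E_k'\rho E_k, v)$. The scaled random effects $\gamma_{k,o}$ cannot be directly integrated out from (6) because of the multivariate t distribution. We use the fact that a multivariate t random vector can be represented in terms of a multivariate normal vector whose covariance matrix is scaled by the reciprocal of a gamma random variable. In (5), the scaled random effects are conditionally distributed as a multivariate normal distribution, that is, $\gamma_{k,o} \mid \lambda_k \sim N_{T_k}(0, \lambda_k^{-1} E_k'\rho E_k)$, where $\lambda_k \sim \mathcal{G}(v/2, v/2)$ for $v > 0$. Since, given $\lambda_k$, $\gamma_{k,o}$ and $\epsilon_k$ are independent and normally distributed, we have
$$y_k \mid \lambda_k \sim N_{T_k}\big(X_k\beta,\ \Sigma_k + \lambda_k^{-1} W_k(\phi) E_k'\rho E_k W_k(\phi)\big). \tag{7}$$
The following proposition gives the expression of the likelihood function after integrating out the random effects $\gamma_o$.
Proposition 1. Let $\pi(\lambda_k \mid v) = \frac{(v/2)^{v/2}}{\Gamma(v/2)} \lambda_k^{v/2 - 1} \exp(-v\lambda_k/2)$, which is the density function of a $\mathcal{G}(v/2, v/2)$ random variable. After integrating out $\gamma_o$, the likelihood function in (6) reduces to
$$L(\theta \mid D_o) = \prod_{k=1}^{K} \Big[ \prod_{\ell=1}^{T_k} f\big(s^2_{kt_{k\ell}} \mid \sigma^2_{kt_{k\ell}}\big) \int_0^\infty (2\pi)^{-T_k/2} |V_k|^{-1/2} \exp\Big\{ -\tfrac{1}{2} (y_k - X_k\beta)' V_k^{-1} (y_k - X_k\beta) \Big\} \pi(\lambda_k \mid v)\, d\lambda_k \Big], \tag{8}$$
where
$$V_k = \Sigma_k + \lambda_k^{-1} W_k(\phi) E_k'\rho E_k W_k(\phi)$$
and $f(s^2_{kt_{k\ell}} \mid \sigma^2_{kt_{k\ell}})$ is the density implied by (2).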
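The scale-mixture representation used above can be sketched as follows. This is a minimal standard-library illustration, assuming the mixing variable is λ ~ Gamma(shape v/2, rate v/2) and γ | λ ~ N(0, ρ/λ), which is the usual normal/gamma representation of the multivariate t distribution; the function names `cholesky` and `rmvt` and the chosen ρ and v are hypothetical.

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with L L' = A for a symmetric positive definite matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def rmvt(L, v, rng):
    """One draw from t_T(0, rho, v), where L is the Cholesky factor of rho."""
    T = len(L)
    lam = rng.gammavariate(v / 2.0, 2.0 / v)       # shape v/2, scale 2/v (i.e. rate v/2)
    z = [rng.gauss(0.0, 1.0) for _ in range(T)]
    x = [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(T)]   # N(0, rho)
    return [xi / math.sqrt(lam) for xi in x]       # N(0, rho / lam) given lam

rho = [[1.0, 0.7], [0.7, 1.0]]
L = cholesky(rho)
rng = random.Random(2024)
draws = [rmvt(L, 10.0, rng) for _ in range(20000)]
var0 = sum(d[0] ** 2 for d in draws) / len(draws)  # close to v / (v - 2) = 1.25 for v = 10
```

Conditioning on λ is what makes the random effects Gaussian, so they can be integrated out analytically while λ is kept as an auxiliary variable in the sampler.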
4 |. BAYESIAN INFERENCE
4.1 |. Priors and Posteriors
We assume independent non-informative priors for $\beta$, $\phi$, $\Sigma$ and $\rho$. Specifically, we assume that $\beta \sim N_p(0, c_{01}I_p)$, $\phi \sim N_q(0, c_{02}I_q)$, and $\pi(\sigma^2_{kt}) \propto 1/\sigma^2_{kt}$. The hyperparameter $c_{01}$ is set to 100,000. For our TG network meta-data, we use $c_{02} = 4$. When $q = 1$, $c_{02} = 4$ yields a 95% prior credible interval from 0.00039 to 2540.20 for the expected variance of the random effects, which is fairly non-informative while still facilitating MCMC convergence. We also assume that $\pi(\rho) \propto 1$ subject to the constraint of positive definiteness. The degrees of freedom $v$ can also be treated as random. We consider the following hierarchical prior: $v \mid v_a, v_b \sim \mathcal{G}(v_a, v_a/v_b)$, $v_a \sim \mathcal{G}(a_4, b_4)$, and $v_b \sim IG(a_5, b_5)$ with density proportional to $v_b^{-(a_5+1)}\exp(-b_5/v_b)$. This prior specification places the conditional mean of $v$ at $v_b$, i.e., $E(v \mid v_a, v_b) = v_b$. The hyperparameters are set as $a_4 = 10$, $b_4 = 10$, $a_5 = 11$, $b_5 = 10$.
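The quoted prior interval can be checked with a short calculation. The sketch assumes that, for q = 1, the variance of the random effects is exp(2ϕ) (that is, log τ = ϕ) and uses the normal quantile 1.96; both are assumptions made here to reproduce the reported numbers.

```python
import math

c02 = 4.0
sd_phi = math.sqrt(c02)          # prior standard deviation of phi
z975 = 1.96                      # approximate 97.5% standard normal quantile
lo_phi, hi_phi = -z975 * sd_phi, z975 * sd_phi

# Variance of the random effect under the assumed form tau^2 = exp(2 * phi).
lo_var, hi_var = math.exp(2.0 * lo_phi), math.exp(2.0 * hi_phi)
```

The endpoints exp(±7.84) agree with the reported interval of 0.00039 to 2540.20.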
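Write $\lambda = (\lambda_1, \ldots, \lambda_K)'$. From our previous discussion and (6), the augmented posterior distribution is given by
$$\pi(\theta, \gamma_o, \lambda \mid D_o) \propto \prod_{k=1}^{K} \Big[ f(y_k \mid \beta, \phi, \Sigma_k, \gamma_{k,o}) \prod_{\ell=1}^{T_k} f\big(s^2_{kt_{k\ell}} \mid \sigma^2_{kt_{k\ell}}\big)\, f(\gamma_{k,o} \mid \rho, \lambda_k)\, \pi(\lambda_k \mid v) \Big] \pi(\beta)\,\pi(\phi)\,\pi(\Sigma)\,\pi(\rho), \tag{9}$$
where $f(y_k \mid \beta, \phi, \Sigma_k, \gamma_{k,o})$ is the normal density of $y_k$ implied by (5), $f(s^2_{kt_{k\ell}} \mid \sigma^2_{kt_{k\ell}})$ is the density implied by (2), $f(\gamma_{k,o} \mid \rho, \lambda_k)$ is the $N_{T_k}(0, \lambda_k^{-1}E_k'\rho E_k)$ density, and $\pi(\lambda_k \mid v)$ is the $\mathcal{G}(v/2, v/2)$ density.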
4.2 |. Computational Development
The analytical evaluation of the posterior distribution of θ = (β, ϕ, ρ, Σ) given in (9) is not available. However, we can develop a Metropolis-within-Gibbs sampling algorithm to sample from (9). The algorithm requires sampling the following parameters in turn from the following conditional distributions: (i) [Σ|β, λ, ϕ, ρ, γo, Do] and (ii) [β, λ, ϕ, ρ, γo |Σ, Do].
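For (i), under the prior $\pi(\sigma^2_{kt}) \propto 1/\sigma^2_{kt}$, the full conditional distribution for $\sigma^2_{kt}$ of each observed arm is
$$\sigma^2_{kt} \mid \beta, \lambda, \phi, \rho, \gamma_o, D_o \sim IG\left( \frac{n_{kt}}{2},\ \frac{n_{kt}\,(y_{kt} - x_{kt}'\beta - \tau_{kt}\gamma_{kt})^2 + (n_{kt}-1)\,s^2_{kt}}{2} \right).$$
For (ii), we use the modified collapsed Gibbs sampling technique in Chen, Shao and Ibrahim27 via the identity
$$[\beta, \lambda, \phi, \rho, \gamma_o \mid \Sigma, D_o] = [\beta, \lambda, \phi, \rho \mid \Sigma, D_o]\,[\gamma_o \mid \beta, \lambda, \phi, \rho, \Sigma, D_o],$$
and further [β, λ, ϕ, ρ | Σ, Do] is sampled in turn from the conditional distributions: (ii_a) [β | Σ, λ, ϕ, ρ, Do]; (ii_b) [λ | Σ, β, ϕ, ρ, Do]; (ii_c) [ϕ | Σ, β, λ, ρ, Do]; and (ii_d) [ρ | Σ, β, λ, ϕ, Do]. Write $V_k = \Sigma_k + \lambda_k^{-1} W_k(\phi) E_k'\rho E_k W_k(\phi)$ for the covariance matrix of $y_k$ given $\lambda_k$, where $\Sigma_k$ is the covariance matrix of the random errors and $W_k(\phi)$ is the diagonal matrix of the $\tau_{kt_{k\ell}}$'s in (5). The full conditional distribution for $\beta$ is
$$\beta \mid \Sigma, \lambda, \phi, \rho, D_o \sim N_p\big(\mu_\beta, \Omega_\beta^{-1}\big),$$
where $\Omega_\beta = \sum_{k=1}^{K} X_k' V_k^{-1} X_k + c_{01}^{-1} I_p$ and $\mu_\beta = \Omega_\beta^{-1} \sum_{k=1}^{K} X_k' V_k^{-1} y_k$. Given $\Sigma$, $\beta$, $\phi$, $\rho$, and $D_o$, the $\lambda_k$'s are conditionally independent with density function
$$\pi^*(\lambda_k \mid \Sigma, \beta, \phi, \rho, D_o) \propto |V_k|^{-1/2} \exp\Big\{ -\tfrac{1}{2} (y_k - X_k\beta)' V_k^{-1} (y_k - X_k\beta) \Big\}\, \lambda_k^{v/2-1} \exp(-v\lambda_k/2)$$
for $k = 1, 2, \ldots, K$. The lack of log-concavity of $\pi^*(\lambda_k \mid \Sigma, \beta, \phi, \rho, D_o)$ motivates the use of a localized Metropolis algorithm from Chen et al.27. We make a log transformation on $\lambda_k$ and denote $\eta_k = \log \lambda_k$. Thus, the conditional density function of $\eta_k$ is proportional to
$$|V_k|^{-1/2} \exp\Big\{ -\tfrac{1}{2} (y_k - X_k\beta)' V_k^{-1} (y_k - X_k\beta) \Big\} \exp\Big\{ \frac{v\,\eta_k}{2} - \frac{v\,e^{\eta_k}}{2} \Big\}, \quad \text{with } V_k \text{ evaluated at } \lambda_k = e^{\eta_k}.$$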
Instead of generating λk, the following steps give the algorithm for generating ηk.
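The Localized Metropolis Algorithm for Generating $\eta_k$
Step 1. Let $\eta_k$ be the current value.
Step 2. Compute $\hat\eta_k = \arg\max_{\eta_k} \log \pi^*(\eta_k \mid \Sigma, \beta, \phi, \rho, D_o)$.
Step 3. Compute $\hat\sigma^2_{\eta_k} = \Big( -\dfrac{d^2 \log \pi^*(\eta_k \mid \Sigma, \beta, \phi, \rho, D_o)}{d\eta_k^2} \Big|_{\eta_k = \hat\eta_k} \Big)^{-1}$.
Step 4. Draw $\eta_k^*$ from $N(\hat\eta_k, \hat\sigma^2_{\eta_k})$.
Step 5. Compute $a = \min\left\{ \dfrac{\pi^*(\eta_k^* \mid \Sigma, \beta, \phi, \rho, D_o)\, \varphi\big((\eta_k - \hat\eta_k)/\hat\sigma_{\eta_k}\big)}{\pi^*(\eta_k \mid \Sigma, \beta, \phi, \rho, D_o)\, \varphi\big((\eta_k^* - \hat\eta_k)/\hat\sigma_{\eta_k}\big)},\ 1 \right\}$,
where $\varphi$ is the standard normal density function.
Step 6. Draw $u$ from $U(0, 1)$. Set $\eta_k = \eta_k^*$ if $u \le a$ and leave $\eta_k$ unchanged if $u > a$.
Remark 4.2. In Step 4, $N(\hat\eta_k, \hat\sigma^2_{\eta_k})$ is the normal proposal density, where $\hat\eta_k$ is the maximizer of $\log \pi^*(\eta_k \mid \Sigma, \beta, \phi, \rho, D_o)$, and $\hat\sigma^2_{\eta_k}$ is minus the inverse of the second derivative of $\log \pi^*(\eta_k \mid \Sigma, \beta, \phi, \rho, D_o)$ evaluated at $\hat\eta_k$. To avoid direct computation of the second derivative of $\log \pi^*(\eta_k \mid \Sigma, \beta, \phi, \rho, D_o)$, we fit a quadratic polynomial regression model to approximate $\log \pi^*(\eta_k \mid \Sigma, \beta, \phi, \rho, D_o)$ around its mode $\hat\eta_k$. The variance $\hat\sigma^2_{\eta_k}$ can thus be approximated by $-1/(2\hat{c}_2)$, where $\hat{c}_2$ is the least squares estimate of the quadratic coefficient. Specifically, we select $B$ points $\eta_{k,1}, \ldots, \eta_{k,B}$ in the neighborhood of $\hat\eta_k$, compute $\log \pi^*(\eta_{k,b} \mid \Sigma, \beta, \phi, \rho, D_o)$ for $b = 1, \ldots, B$, fit a second order polynomial regression using the points $\{(\eta_{k,b}, \log \pi^*(\eta_{k,b} \mid \Sigma, \beta, \phi, \rho, D_o)): b = 1, \ldots, B\}$, and then set the inverse of the variance to be $-2$ times the least squares estimate of the second order coefficient.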
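The steps above, together with the quadratic-fit approximation of Remark 4.2, can be sketched for a generic one-dimensional log density. This is an illustrative standard-library implementation rather than the sampler used in the paper: the mode search by golden-section, the symmetric grid of B = 11 points, and the gamma-type target are all assumptions made for the example, and the function names are hypothetical.

```python
import math
import random

def golden_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

def quad_variance(logf, mode, half_width=0.1, B=11):
    """Approximate -1/(2 * c2), with c2 the least-squares quadratic coefficient
    fitted on a grid symmetric about the mode (odd terms then drop out)."""
    ds = [half_width * (2.0 * b / (B - 1) - 1.0) for b in range(B)]
    ys = [logf(mode + d) for d in ds]
    s2 = sum(d ** 2 for d in ds)
    s4 = sum(d ** 4 for d in ds)
    sy = sum(ys)
    s2y = sum(d ** 2 * y for d, y in zip(ds, ys))
    c2 = (B * s2y - s2 * sy) / (B * s4 - s2 ** 2)
    return -1.0 / (2.0 * c2)

def localized_metropolis_step(logf, cur, mode, var, rng):
    """One update with the normal proposal N(mode, var) centred at the mode."""
    sd = math.sqrt(var)
    prop = rng.gauss(mode, sd)
    def log_q(x):                       # log proposal density up to a constant
        return -0.5 * ((x - mode) / sd) ** 2
    log_ratio = (logf(prop) - log_q(prop)) - (logf(cur) - log_q(cur))
    if rng.random() <= math.exp(min(0.0, log_ratio)):
        return prop
    return cur

# Hypothetical target: lambda ~ Gamma(shape 3, rate 2), sampled on eta = log(lambda).
shape, rate = 3.0, 2.0
def log_target(eta):
    return shape * eta - rate * math.exp(eta)

mode = golden_max(log_target, -5.0, 5.0)
var = quad_variance(log_target, mode)
rng = random.Random(7)
eta, chain = 0.0, []
for _ in range(5000):
    eta = localized_metropolis_step(log_target, eta, mode, var, rng)
    chain.append(eta)
mean_lambda = sum(math.exp(e) for e in chain) / len(chain)   # target mean is 1.5
```

For this target the mode is log(1.5) and the exact curvature gives a variance of 1/3, so the quadratic fit can be compared against a known answer before applying the same routine to the conditional density of ηk.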
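With $V_k = \Sigma_k + \lambda_k^{-1} W_k(\phi) E_k'\rho E_k W_k(\phi)$ denoting the covariance matrix of $y_k$ given $\lambda_k$, the full conditional distribution for $\phi$ is
$$\pi(\phi \mid \Sigma, \beta, \lambda, \rho, D_o) \propto \prod_{k=1}^{K} |V_k|^{-1/2} \exp\Big\{ -\tfrac{1}{2} \sum_{k=1}^{K} (y_k - X_k\beta)' V_k^{-1} (y_k - X_k\beta) - \frac{\phi'\phi}{2c_{02}} \Big\}.$$
Starting from the current value of $\phi$, we use a modified localized Metropolis algorithm to generate a new value of $\phi$. The steps are similar to the algorithm for generating $\eta_k$. For (ii_d), let $\rho_{ij}$ denote the $(i, j)$th element of $\rho$ and also let $\rho_{ij \mid i+1, \ldots, j-1}$ denote the partial correlations. Under a uniform prior on $\rho$, the full conditional distribution for $\rho_{ij}$, $1 \le i < j \le T$, is
$$\pi(\rho_{ij} \mid \Sigma, \beta, \lambda, \phi, D_o) \propto \prod_{k=1}^{K} |V_k|^{-1/2} \exp\Big\{ -\tfrac{1}{2} \sum_{k=1}^{K} (y_k - X_k\beta)' V_k^{-1} (y_k - X_k\beta) \Big\}, \quad \rho \text{ positive definite}.$$
We generate the positive definite correlation matrix $\rho = (\rho_{ij})$ based on the correlations $\rho_{i,i+1}$ for $i = 1, \ldots, T - 1$ and partial correlations $\rho_{ij \mid i+1, \ldots, j-1}$ for $j - i \ge 2$.28 These $T(T-1)/2$ parameters can independently take values on $(−1, 1)$. Following this approach, we first transform $\{\rho_{ij}\}$ to $\{\rho_{i,i+\ell \mid i+1, \ldots, i+\ell-1}\}$, for which the determinant $|J_T|$ of the Jacobian of this transformation is given by
$$|J_T| = \prod_{\ell=1}^{T-2} \prod_{i=1}^{T-\ell} \big( 1 - \rho^2_{i,i+\ell \mid i+1, \ldots, i+\ell-1} \big)^{(T-1-\ell)/2}.$$
Write $\rho_{i,i+\ell \mid i+1, \ldots, i+\ell-1} = \rho_{i,i+1}$ when $\ell = 1$, since the indices $i+1, \ldots, i+\ell-1$ form an empty set if $\ell = 1$. Then the full conditional joint density of $\rho_{i,i+\ell \mid i+1, \ldots, i+\ell-1}$ for $\ell = 1, 2, \ldots, T - 1$ and $i = 1, \ldots, T - \ell$ can be written as
$$\prod_{k=1}^{K} |V_k|^{-1/2} \exp\Big\{ -\tfrac{1}{2} \sum_{k=1}^{K} (y_k - X_k\beta)' V_k^{-1} (y_k - X_k\beta) \Big\} \prod_{\ell=1}^{T-2} \prod_{i=1}^{T-\ell} \big( 1 - \rho^2_{i,i+\ell \mid i+1, \ldots, i+\ell-1} \big)^{(T-1-\ell)/2}$$
up to a normalizing constant. We further apply Fisher's z transformation to $\rho_{i,i+\ell \mid i+1, \ldots, i+\ell-1}$ and denote $z_{i,i+\ell} = \frac{1}{2} \log \frac{1 + \rho_{i,i+\ell \mid i+1, \ldots, i+\ell-1}}{1 - \rho_{i,i+\ell \mid i+1, \ldots, i+\ell-1}}$. The Jacobian for this transformation is $1 - \rho^2_{i,i+\ell \mid i+1, \ldots, i+\ell-1} = 4e^{2z_{i,i+\ell}}/(1 + e^{2z_{i,i+\ell}})^2$. Thus, the full conditional joint density of $z_{i,i+\ell}$ with $\ell = 1, 2, \ldots, T - 1$, $i = 1, \ldots, T - \ell$, can be written as
$$\prod_{k=1}^{K} |V_k|^{-1/2} \exp\Big\{ -\tfrac{1}{2} \sum_{k=1}^{K} (y_k - X_k\beta)' V_k^{-1} (y_k - X_k\beta) \Big\} \prod_{\ell=1}^{T-1} \prod_{i=1}^{T-\ell} \big( 1 - \rho^2_{i,i+\ell \mid i+1, \ldots, i+\ell-1} \big)^{(T+1-\ell)/2}, \quad \rho_{i,i+\ell \mid i+1, \ldots, i+\ell-1} = \tanh(z_{i,i+\ell}),$$
up to a normalizing constant. Again, $z_{i,i+\ell}$ can be generated by the localized Metropolis algorithm. The full conditional distribution for $\gamma_{k,o}$, $k = 1, \ldots, K$, is
$$\gamma_{k,o} \mid \beta, \lambda, \phi, \rho, \Sigma, D_o \sim N_{T_k}\big( \Omega_k^{-1} W_k(\phi) \Sigma_k^{-1} (y_k - X_k\beta),\ \Omega_k^{-1} \big), \quad \Omega_k = \lambda_k (E_k'\rho E_k)^{-1} + W_k(\phi) \Sigma_k^{-1} W_k(\phi).$$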
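Under the hierarchical prior $v \mid v_a, v_b \sim \mathcal{G}(v_a, v_a/v_b)$, $v_a \sim \mathcal{G}(a_4, b_4)$, and $v_b \sim IG(a_5, b_5)$, the full conditional distribution $[v_b \mid v, v_a]$ is $IG(a_5 + v_a,\ b_5 + v_a v)$. Finally,
$$\pi(v \mid v_a, v_b, \lambda) \propto \left[ \frac{(v/2)^{v/2}}{\Gamma(v/2)} \right]^{K} \Big( \prod_{k=1}^{K} \lambda_k \Big)^{v/2} \exp\Big( -\frac{v}{2} \sum_{k=1}^{K} \lambda_k \Big)\, v^{v_a - 1} \exp\Big( -\frac{v_a v}{v_b} \Big)$$
and
$$\pi(v_a \mid v, v_b) \propto \frac{(v_a/v_b)^{v_a}}{\Gamma(v_a)}\, v^{v_a} \exp\Big( -\frac{v_a v}{v_b} \Big)\, v_a^{a_4 - 1} \exp(-b_4 v_a).$$
We make the transformation $\xi = \log v$ and $\xi_a = \log v_a$. The conditional densities of $\xi$ and $\xi_a$ are proportional to
$$\exp\Big\{ K \Big[ \frac{e^{\xi}}{2} \log \frac{e^{\xi}}{2} - \log \Gamma\Big( \frac{e^{\xi}}{2} \Big) \Big] + \frac{e^{\xi}}{2} \sum_{k=1}^{K} (\log \lambda_k - \lambda_k) + v_a \xi - \frac{v_a e^{\xi}}{v_b} \Big\}$$
and
$$\exp\Big\{ e^{\xi_a} \Big( \xi_a + \log v - \log v_b - \frac{v}{v_b} - b_4 \Big) - \log \Gamma(e^{\xi_a}) + a_4 \xi_a \Big\},$$
respectively. We use the localized Metropolis algorithm for $\xi$ and $\xi_a$, where the proposal distributions have variances
$$\hat\sigma^2_{\xi} = -\Big[ \hat{v} \Big\{ \frac{K}{2} \Big( \log \frac{\hat{v}}{2} + 1 - \psi\Big( \frac{\hat{v}}{2} \Big) \Big) + \frac{1}{2} \sum_{k=1}^{K} (\log \lambda_k - \lambda_k) - \frac{v_a}{v_b} \Big\} + \frac{K}{2} \hat{v} - \frac{K}{4} \hat{v}^2 \psi'\Big( \frac{\hat{v}}{2} \Big) \Big]^{-1}, \quad \hat{v} = e^{\hat\xi},$$
and
$$\hat\sigma^2_{\xi_a} = -\Big[ \hat{v}_a \Big\{ \log \frac{\hat{v}_a v}{v_b} + 1 - \psi(\hat{v}_a) - \frac{v}{v_b} - b_4 \Big\} + \hat{v}_a - \hat{v}_a^2 \psi'(\hat{v}_a) \Big]^{-1}, \quad \hat{v}_a = e^{\hat\xi_a},$$
where $\psi(\cdot)$ and $\psi'(\cdot)$ indicate the digamma function and the trigamma function, respectively, and $\hat\xi$ and $\hat\xi_a$ indicate the values where their full conditionals are maximized.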
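The construction of ρ from adjacent correlations and partial correlations used for step (ii_d) can be sketched as follows. The recursion is the standard one for building a correlation from a partial correlation given the intermediate indices, and Fisher's z is inverted with tanh; the helper names `solve` and `corr_from_partials` and the z values are hypothetical, and the code is an illustration rather than the implementation used in the paper.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def corr_from_partials(pc):
    """pc[i][j], i < j: correlation if j = i + 1, otherwise the partial
    correlation of (i, j) given i+1, ..., j-1. Returns the correlation matrix."""
    T = len(pc)
    R = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for lag in range(1, T):
        for i in range(T - lag):
            j = i + lag
            if lag == 1:
                r = pc[i][j]
            else:
                mid = list(range(i + 1, j))
                R2 = [[R[a][b] for b in mid] for a in mid]
                r1 = [R[i][m] for m in mid]
                r3 = [R[j][m] for m in mid]
                w1 = solve(R2, r1)          # R2^{-1} r1
                w3 = solve(R2, r3)          # R2^{-1} r3
                d1 = 1.0 - sum(a * b for a, b in zip(r1, w1))
                d3 = 1.0 - sum(a * b for a, b in zip(r3, w3))
                r = sum(a * b for a, b in zip(r1, w3)) + pc[i][j] * math.sqrt(d1 * d3)
            R[i][j] = R[j][i] = r
    return R

# Illustrative unconstrained z values (0-based indices), mapped to (-1, 1) by tanh.
T = 4
z = {(0, 1): 0.5, (1, 2): -0.3, (2, 3): 0.8, (0, 2): 1.0, (1, 3): -0.6, (0, 3): 0.2}
pc = [[0.0] * T for _ in range(T)]
for (i, j), zij in z.items():
    pc[i][j] = math.tanh(zij)
R = corr_from_partials(pc)
```

Because every z value is unconstrained, a Metropolis proposal on the z scale always maps back to a valid correlation matrix, which is the practical appeal of this parameterization.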
4.3 |. Model Comparison
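It is essential to determine the degrees of freedom when using the multivariate t distribution for the random effects. In this paper, we consider several choices for fixed $v$, including $\infty$, for which the random effects would follow a multivariate normal distribution, and random $v$ with a hierarchical prior. Likewise, the choices of the covariate vector $z_{kt}$ need to be compared. These two issues call for carrying out Bayesian model comparisons, for which two criterion-based measures are considered. We first define the deviance function only for the observed data likelihood using the response variable $y_k$, which is given by $D(\theta) = -2 \log L(\theta \mid D_{oy})$. Thus, the deviance information criterion (DIC)29 is given by $\mathrm{DIC} = D(\bar\theta) + 2p_D$, where $\bar\theta = E(\theta \mid D_o)$ and $p_D = \overline{D(\theta)} - D(\bar\theta)$ with $\overline{D(\theta)} = E[D(\theta) \mid D_o]$. Secondly, for the $k$th trial, the conditional predictive ordinate is defined as $\mathrm{CPO}_k = f(y_k \mid D_{oy}^{(-k)}) = \int f(y_k \mid \theta)\, \pi(\theta \mid D_{oy}^{(-k)})\, d\theta$, where $D_{oy}^{(-k)}$ is $D_{oy}$ with the $k$th trial deleted, and $\pi(\theta \mid D_{oy}^{(-k)})$ is the posterior distribution based on the data $D_{oy}^{(-k)}$. Another model comparison criterion is the logarithm of the Pseudo-marginal likelihood (LPML), which is a summary statistic of the $\mathrm{CPO}_k$'s: $\mathrm{LPML} = \sum_{k=1}^{K} \log \mathrm{CPO}_k$. A smaller DIC and a larger LPML indicate a better-fitting model.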
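Both criteria can be computed from MCMC output, as in the following sketch. It assumes the usual definitions DIC = D(θ̄) + 2pD and LPML = Σ log CPOk, and it estimates each CPOk by the harmonic mean of the trial-level likelihood over posterior draws, which is a common Monte Carlo estimator but is an assumption here and not a description of the paper's implementation. The function names and inputs are hypothetical.

```python
import math

def dic(deviance_draws, deviance_at_mean):
    """Return (DIC, p_D), with p_D = mean posterior deviance - deviance at the posterior mean."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_mean
    return deviance_at_mean + 2.0 * p_d, p_d

def lpml(loglik):
    """loglik[m][k] = log f(y_k | theta^(m)) for draw m and trial k.
    log CPO_k = -log( mean_m exp(-loglik[m][k]) ), computed with log-sum-exp."""
    M, K = len(loglik), len(loglik[0])
    total = 0.0
    for k in range(K):
        neg = [-loglik[m][k] for m in range(M)]
        mx = max(neg)
        log_mean_inv = mx + math.log(sum(math.exp(x - mx) for x in neg) / M)
        total += -log_mean_inv
    return total

# Toy inputs for illustration only.
dic_value, p_d = dic([10.0, 12.0, 14.0], 11.0)
lpml_value = lpml([[-1.0, -2.0], [-1.0, -2.0], [-1.0, -2.0]])
```

The log-sum-exp step matters in practice because the inverse likelihoods can overflow when a trial is poorly fitted under some draws.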
5 |. A SIMULATION STUDY
To examine the empirical performance of the proposed methods, we conduct two sets of simulation studies, in which the true distributions of the random effects are a multivariate t-distribution with v = 3 and a multivariate normal distribution, respectively. The simulation setting is as follows: K = 30, p = 9, T = 4, β = (−0.06, 1.40, 1.83, 0.11, −0.81, 1.03, −18.88, −24.53, −11.70)′, and ϕ = (1.38, −1.5)′. The K = 30 trials are divided into three blocks of the same size, having two, three, and four arms, respectively. The covariate vectors are generated from a multivariate normal distribution. The correlation matrix ρ is fixed as a 4×4 matrix with off-diagonal elements of 0.7. zkt is set to be a 2-dimensional vector whose first element is 1 if the treatment is 1 and whose second element is 1 if the treatment is 3; all other elements are zero. The error variances, σ²kt’s, are set to values that range from 366.1473 to 439.7980, with the first, second (median), and third quartiles being 392.0314, 399.9440, and 412.9699, respectively. These specific values for the true parameters were chosen to reflect the real data in our paper. However, to make the simulation model simpler than that of the real data analysis, we reduced the number of covariates from 10 to 5 and the treatments from 11 to 4. The true values for the treatment effects (β6, …, β9) were chosen to include significant effect sizes and a close-to-null effect. The near-zero effect was included to examine whether our models can detect the effect even when it is small. The measures for evaluating the accuracy of the posterior estimates are the bias, the relative bias (Rel. Bias), which is defined as bias/|true value of the parameter|, the simulation error (SE), the average of the posterior standard deviations (SD), the root mean squared error (rMSE), and the coverage probabilities (CP) of the 95% HPD intervals. The MCMC chains were run for 12,500 iterations each, with the first 2,500 discarded.
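The evaluation measures listed above can be computed from the replicate-level output as in the following sketch. It assumes that SE is the standard deviation of the posterior estimates across replicates and that SD is the average posterior standard deviation; the function name `summarize` and the toy inputs are hypothetical.

```python
import math

def summarize(estimates, post_sds, intervals, truth):
    """Bias, relative bias, SE, SD, rMSE, and CP over simulation replicates."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - truth
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n - 1))
    return {
        "bias": bias,
        "rel_bias": bias / abs(truth),
        "se": se,
        "sd": sum(post_sds) / n,
        "rmse": math.sqrt(sum((e - truth) ** 2 for e in estimates) / n),
        "cp": sum(1 for lo, hi in intervals if lo <= truth <= hi) / n,
    }

# Toy replicates for illustration only.
out = summarize(
    estimates=[1.0, 2.0, 3.0],
    post_sds=[1.0, 1.0, 1.0],
    intervals=[(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)],
    truth=2.5,
)
```

Under these definitions rMSE² is approximately bias² + SE², which agrees with the reported values: for ϕ2 under v = 3 in Table 2, √(0.537² + 0.445²) ≈ 0.697.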
Table 2 contains the summary of 500 simulations for which the true model assumes the random effects follow a multivariate t-distribution with 3 degrees of freedom. The posterior estimates of the parameters under the true model have small biases and the CPs are close to 95% for most of them. The posterior estimates for β are less affected by the distributional assumption of the random effects than by the variance structure. However, as the fitted model departs from the true model, the coverage probabilities start to deviate from 95%, with the exception that ϕ2 has a high CP under all models. The CP for ϕ1 drops to 80% for v = 10 while the CPs of β7 and β9 are around 85% when v = ∞. Under v = ∞, the covariate effects captured by (β1, β2, …, β5) have CPs between 90% and 93%. The departure from the true model takes the greatest toll on the CP for ϕ1, which falls to 80% when we fit the multivariate t distribution with v = 10 and as low as 41.4% when normal random effects are assumed. Assuming a single variance is worse than assuming an incorrect distribution for the random effects, as it lowers the CP to 79.4% for β6. Furthermore, the CPs for (β7, β8, β9)′ range from 99.4% to 100%, which are far too high, indicating a lack of power.
TABLE 2.
Results for 500 simulations from the multivariate t-distribution with 3 degrees of freedom. v indicates the degrees of freedom of the fitted model with v = ∞ implying a normal distribution. “Single Variance” assumes zkt = 1 for all k and t.
| Bias | Rel. Bias | SE | SD | rMSE | CP | Bias | Rel. Bias | SE | SD | rMSE | CP | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| v =3 | |||||||||||||
| β1 | 0.006 | 0.100 | 0.154 | 0.171 | 0.154 | 0.978 | β6 | −0.022 | −0.021 | 1.001 | 1.019 | 1.000 | 0.946 |
| β2 | 0.004 | 0.003 | 0.168 | 0.187 | 0.168 | 0.976 | β7 | 0.026 | 0.001 | 0.351 | 0.357 | 0.352 | 0.960 |
| β3 | −0.001 | −0.001 | 0.173 | 0.180 | 0.173 | 0.952 | β8 | 0.003 | 0.000 | 0.214 | 0.229 | 0.214 | 0.962 |
| β4 | −0.019 | −0.173 | 0.243 | 0.248 | 0.244 | 0.954 | β9 | −0.000 | 0.000 | 0.364 | 0.375 | 0.364 | 0.954 |
| β5 | −0.015 | −0.019 | 0.231 | 0.238 | 0.231 | 0.952 | ϕ1 | −0.027 | −0.020 | 0.176 | 0.188 | 0.178 | 0.966 |
| ϕ2 | −0.537 | −0.358 | 0.445 | 1.087 | 0.697 | 0.992 | |||||||
| v =10 | |||||||||||||
| β1 | 0.008 | 0.133 | 0.158 | 0.165 | 0.158 | 0.972 | β6 | −0.022 | −0.021 | 1.075 | 1.084 | 1.074 | 0.946 |
| β2 | 0.002 | 0.001 | 0.170 | 0.179 | 0.170 | 0.956 | β7 | 0.026 | 0.001 | 0.364 | 0.329 | 0.365 | 0.936 |
| β3 | −0.001 | −0.001 | 0.176 | 0.175 | 0.176 | 0.942 | β8 | 0.001 | 0.000 | 0.214 | 0.229 | 0.214 | 0.962 |
| β4 | −0.020 | −0.182 | 0.245 | 0.238 | 0.246 | 0.944 | β9 | 0.001 | 0.000 | 0.376 | 0.346 | 0.375 | 0.930 |
| β5 | −0.012 | −0.015 | 0.232 | 0.229 | 0.232 | 0.946 | ϕ1 | 0.150 | 0.109 | 0.180 | 0.163 | 0.234 | 0.800 |
| ϕ2 | −0.366 | −0.244 | 0.462 | 1.105 | 0.589 | 0.990 | |||||||
| v =∞ | |||||||||||||
| β1 | 0.011 | 0.183 | 0.178 | 0.160 | 0.178 | 0.928 | β6 | 0.026 | 0.025 | 1.389 | 1.214 | 1.388 | 0.922 |
| β2 | 0.002 | 0.001 | 0.200 | 0.172 | 0.200 | 0.928 | β7 | 0.034 | 0.002 | 0.421 | 0.309 | 0.422 | 0.846 |
| β3 | −0.005 | −0.003 | 0.202 | 0.172 | 0.202 | 0.900 | β8 | 0.002 | 0.000 | 0.215 | 0.233 | 0.215 | 0.966 |
| β4 | −0.021 | −0.191 | 0.273 | 0.230 | 0.274 | 0.916 | β9 | 0.020 | 0.002 | 0.439 | 0.322 | 0.439 | 0.858 |
| β5 | −0.007 | −0.009 | 0.265 | 0.220 | 0.265 | 0.908 | ϕ1 | 0.335 | 0.243 | 0.235 | 0.141 | 0.409 | 0.414 |
| ϕ2 | −0.198 | −0.132 | 0.475 | 1.103 | 0.514 | 0.978 | |||||||
| Single Variance | |||||||||||||
| β1 | 0.014 | 0.233 | 0.227 | 0.237 | 0.227 | 0.958 | β6 | 0.048 | 0.047 | 0.814 | 0.517 | 0.814 | 0.794 |
| β2 | −0.002 | −0.001 | 0.228 | 0.268 | 0.228 | 0.978 | β7 | 0.029 | 0.002 | 0.325 | 0.529 | 0.326 | 0.994 |
| β3 | −0.022 | −0.012 | 0.265 | 0.286 | 0.266 | 0.962 | β8 | 0.003 | 0.000 | 0.230 | 0.529 | 0.230 | 1.000 |
| β4 | 0.017 | 0.155 | 0.357 | 0.313 | 0.357 | 0.916 | β9 | −0.008 | −0.001 | 0.338 | 0.572 | 0.337 | 0.998 |
| β5 | −0.004 | −0.005 | 0.370 | 0.313 | 0.370 | 0.886 | ϕ1 | N/A | N/A | N/A | N/A | N/A | N/A |
| ϕ2 | N/A | N/A | N/A | N/A | N/A | N/A | |||||||
| Random v | |||||||||||||
| β1 | 0.007 | 0.117 | 0.153 | 0.171 | 0.153 | 0.980 | β6 | −0.033 | −0.032 | 0.999 | 1.012 | 0.998 | 0.938 |
| β2 | 0.004 | 0.003 | 0.168 | 0.188 | 0.168 | 0.974 | β7 | 0.022 | 0.001 | 0.351 | 0.358 | 0.351 | 0.964 |
| β3 | 0.000 | 0.000 | 0.173 | 0.181 | 0.173 | 0.958 | β8 | 0.000 | 0.000 | 0.215 | 0.231 | 0.214 | 0.962 |
| β4 | −0.030 | −0.273 | 0.243 | 0.248 | 0.245 | 0.950 | β9 | 0.007 | 0.001 | 0.364 | 0.375 | 0.363 | 0.960 |
| β5 | −0.022 | −0.027 | 0.232 | 0.240 | 0.233 | 0.952 | ϕ1 | −0.048 | −0.035 | 0.183 | 0.201 | 0.189 | 0.968 |
| v | 0.229 | 0.076 | 0.988 | 1.550 | 1.014 | 0.966 | ϕ2 | −0.556 | −0.371 | 0.440 | 1.062 | 0.709 | 0.994 |
Table 3 summarizes 500 simulations in which the true model has normally distributed random effects. The same pattern persists in this setting: the further the assumed model departs from the true normal distribution, the worse the CPs. The effect of the incorrect assumption is most conspicuous for ϕ1, whose CP drops to 85% under the model assuming v = 3. The CP of ϕ2 under v = 3 is also affected: it attains 100% coverage, implying a lack of power. However, as noted earlier, the CPs are worst when a single variance is assumed for all random effects: the CP for β6 falls to 73.6%, whereas those of β7, β8, and β9 rise to 98.4%, 100%, and 99.2%, respectively.
TABLE 3.
Results for 500 simulations from the multivariate normal distribution. v indicates the degrees of freedom of the random effects for the fitted model with v = ∞ implying a normal distribution. “Single Variance” assumes zkt = 1 for all k and t.
| | Bias | Rel. Bias | SE | SD | rMSE | CP | | Bias | Rel. Bias | SE | SD | rMSE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| v =3 | |||||||||||||
| β1 | 0.002 | 0.037 | 0.157 | 0.166 | 0.157 | 0.956 | β6 | 0.040 | 0.038 | 0.853 | 0.830 | 0.853 | 0.928 |
| β2 | 0.002 | 0.002 | 0.164 | 0.182 | 0.164 | 0.978 | β7 | 0.035 | 0.002 | 0.319 | 0.344 | 0.321 | 0.964 |
| β3 | −0.009 | −0.005 | 0.169 | 0.177 | 0.169 | 0.954 | β8 | 0.003 | 0.000 | 0.204 | 0.228 | 0.203 | 0.972 |
| β4 | 0.018 | 0.170 | 0.215 | 0.239 | 0.216 | 0.982 | β9 | −0.010 | −0.001 | 0.333 | 0.361 | 0.333 | 0.970 |
| β5 | 0.002 | 0.002 | 0.225 | 0.230 | 0.225 | 0.948 | ϕ1 | −0.231 | −0.167 | 0.175 | 0.193 | 0.290 | 0.850 |
| ϕ2 | −0.460 | −0.307 | 0.407 | 1.117 | 0.614 | 1.000 | |||||||
| v =10 | |||||||||||||
| β1 | 0.004 | 0.060 | 0.155 | 0.161 | 0.155 | 0.952 | β6 | 0.037 | 0.036 | 0.817 | 0.826 | 0.817 | 0.934 |
| β2 | 0.003 | 0.002 | 0.161 | 0.175 | 0.161 | 0.976 | β7 | 0.035 | 0.002 | 0.316 | 0.322 | 0.317 | 0.952 |
| β3 | −0.008 | −0.005 | 0.169 | 0.172 | 0.169 | 0.944 | β8 | −0.000 | 0.000 | 0.203 | 0.228 | 0.203 | 0.972 |
| β4 | 0.018 | 0.167 | 0.214 | 0.231 | 0.214 | 0.978 | β9 | −0.010 | −0.001 | 0.330 | 0.338 | 0.330 | 0.964 |
| β5 | 0.001 | 0.001 | 0.221 | 0.222 | 0.221 | 0.948 | ϕ1 | −0.121 | −0.088 | 0.163 | 0.169 | 0.203 | 0.908 |
| ϕ2 | −0.332 | −0.221 | 0.376 | 1.112 | 0.501 | 0.998 | |||||||
| v = ∞ | |||||||||||||
| β1 | 0.003 | 0.045 | 0.155 | 0.157 | 0.155 | 0.946 | β6 | 0.048 | 0.047 | 0.809 | 0.825 | 0.810 | 0.940 |
| β2 | 0.004 | 0.003 | 0.160 | 0.171 | 0.160 | 0.976 | β7 | 0.034 | 0.002 | 0.316 | 0.310 | 0.317 | 0.948 |
| β3 | −0.008 | −0.004 | 0.169 | 0.169 | 0.169 | 0.940 | β8 | 0.001 | 0.000 | 0.203 | 0.229 | 0.203 | 0.976 |
| β4 | 0.016 | 0.145 | 0.215 | 0.226 | 0.215 | 0.972 | β9 | −0.005 | 0.000 | 0.330 | 0.322 | 0.330 | 0.950 |
| β5 | −0.001 | −0.001 | 0.221 | 0.217 | 0.220 | 0.946 | ϕ1 | −0.043 | −0.031 | 0.159 | 0.155 | 0.164 | 0.922 |
| ϕ2 | −0.284 | −0.189 | 0.428 | 1.137 | 0.513 | 0.998 | |||||||
| Single Variance | |||||||||||||
| β1 | 0.017 | 0.283 | 0.214 | 0.216 | 0.214 | 0.940 | β6 | 0.034 | 0.033 | 0.917 | 0.540 | 0.917 | 0.736 |
| β2 | −0.001 | −0.001 | 0.211 | 0.241 | 0.211 | 0.974 | β7 | 0.030 | 0.002 | 0.338 | 0.443 | 0.339 | 0.984 |
| β3 | −0.018 | −0.010 | 0.243 | 0.261 | 0.243 | 0.966 | β8 | −0.002 | 0.000 | 0.231 | 0.424 | 0.230 | 1.000 |
| β4 | 0.013 | 0.118 | 0.331 | 0.293 | 0.331 | 0.910 | β9 | −0.011 | −0.001 | 0.346 | 0.471 | 0.346 | 0.992 |
| β5 | −0.007 | −0.009 | 0.333 | 0.289 | 0.333 | 0.892 | ϕ1 | N/A | N/A | N/A | N/A | N/A | N/A |
| ϕ2 | N/A | N/A | N/A | N/A | N/A | N/A | |||||||
| Random v | |||||||||||||
| β1 | 0.003 | 0.045 | 0.156 | 0.165 | 0.156 | 0.958 | β6 | 0.044 | 0.042 | 0.842 | 0.824 | 1.012 | 0.928 |
| β2 | 0.005 | 0.004 | 0.164 | 0.181 | 0.164 | 0.978 | β7 | 0.035 | 0.002 | 0.318 | 0.338 | 0.358 | 0.958 |
| β3 | −0.008 | −0.004 | 0.169 | 0.176 | 0.169 | 0.954 | β8 | −0.003 | 0.000 | 0.203 | 0.229 | 0.231 | 0.974 |
| β4 | 0.017 | 0.150 | 0.214 | 0.238 | 0.215 | 0.980 | β9 | −0.010 | −0.001 | 0.332 | 0.355 | 0.375 | 0.970 |
| β5 | 0.001 | 0.002 | 0.224 | 0.230 | 0.224 | 0.950 | ϕ1 | −0.208 | −0.151 | 0.175 | 0.195 | 0.201 | 0.868 |
| ϕ2 | −0.437 | −0.291 | 0.389 | 1.119 | 1.062 | 1.000 | |||||||
The simulation error (SE) measures the across-simulation variation of the posterior mean, whereas the average of the posterior standard deviations (SD) measures the within-simulation variation. Tables 2 and 3 show that SE and SD are close to each other, as desired, for most parameters. The results for v = 3 in Table 2 show that the SE and SD of β1 are 0.154 and 0.171, respectively. This implies that the posterior means vary less across simulations than the posterior sample does within a given simulation, which, provided the estimate is unbiased, typically shows up as overcoverage in the CP. The greater the discrepancy, the larger the impact on the CP. For example, this is observed for ϕ2 under v = 3 in Table 2, whose SE, SD, and CP are 0.445, 1.087, and 0.992, respectively.
Meanwhile, the relative biases of β1, β4, and β6 are notably larger than those of others across the board. It is apparent that the relative bias reflects the strength of the signal in its true value since (β1, β4, β6) are exactly the elements with effect sizes that are hard to detect. Moreover, the rMSEs indicate the average variability of the posterior sample with respect to the true value. The rMSE for β6 corresponding to v = 3 in Table 2 is 1.000, which is the largest among all the parameters. This indicates that a given posterior sample may signal a fair amount of uncertainty for β6 due to its small size, which is 1.03, while all other treatment effects have magnitudes over 11.00.
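The summary columns of Tables 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `summarize_simulation` and its arguments are hypothetical, the relative bias is assumed to be the bias divided by the true value, and the SE is assumed to use the population (ddof = 0) standard deviation so that rMSE² = Bias² + SE² holds exactly.

```python
import numpy as np

def summarize_simulation(truth, post_mean, post_sd, lower, upper):
    """Summaries for one scalar parameter across simulated data sets.

    post_mean, post_sd, lower, upper are 1-D arrays with one entry per
    simulation: posterior mean, posterior SD, and the interval end points.
    """
    post_mean = np.asarray(post_mean, dtype=float)
    post_sd = np.asarray(post_sd, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    bias = post_mean.mean() - truth
    return {
        "bias": bias,
        "rel_bias": bias / truth,  # assumed definition; undefined if truth == 0
        "se": post_mean.std(ddof=0),  # across-simulation spread of posterior means
        "sd": post_sd.mean(),  # average within-simulation posterior SD
        "rmse": np.sqrt(np.mean((post_mean - truth) ** 2)),
        "cp": np.mean((lower <= truth) & (truth <= upper)),  # coverage probability
    }

# Toy usage with synthetic replicates (not the paper's simulation output).
rng = np.random.default_rng(1)
pm = rng.normal(1.0, 0.2, size=500)
print(summarize_simulation(1.03, pm, np.full(500, 0.2), pm - 0.4, pm + 0.4))
```

Under these definitions the reported values are internally consistent: for ϕ2 with v = 3 in Table 2, √(0.537² + 0.445²) ≈ 0.697, which matches the tabulated rMSE.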
The results when v was assigned a prior distribution and sampled in the MCMC iterations are promising. Although the model does not assume that the degrees of freedom are fixed at the true value, it does not sacrifice the CPs for almost all of the parameters, with β6 being a minor exception. This implies that treating v as random has similar statistical power to that of the true model specification. Table 2 also contains the summary of the posterior distribution of v, which shows that its CP is 0.966 with a relative bias of 0.076. It is worth noting that treating v as random does not resolve the lack of power for ϕ2, since its CP remains too high (0.994).
Figures 3 and 4 show the differences between each pair of DICs and LPMLs, respectively. A summary of the DIC (LPML) differences is provided for each boxplot in Table 4, in which the medians and IQRs (interquartile ranges) are reported. Both DICs and LPMLs clearly select the true model most of the time when the variance structure is correctly specified. However, it is evident that the magnitude of lost information in the DICs and the LPMLs is much greater when the variance structure is misspecified than when a wrong distribution is assigned for the random effects. On the other hand, LPML and DIC both favor treating v as random even when compared to the model assuming the true value for v, provided that the true model assumes t-distributed random effects. When the true random effects come from a multivariate normal distribution, the random degrees of freedom model resulted in worse performance, as demonstrated in Figures 3 and 4. Table 4 indicates, however, that the true model and the fitted model with random degrees of freedom have comparable fit since the sizes of the median differences are around 2 and the widths of the IQRs do not exceed 10.
FIGURE 3.
The DIC difference is defined as the DIC under the true model minus the DIC under the fitted model. The true model for the left panel has multivariate t random effects with v = 3 and that of the right panel is multivariate normal random effects. Differences below −100 are not shown in the boxplots.
FIGURE 4.
The LPML difference is defined as the LPML under the true model minus the LPML under the fitted model. The true model for the left panel is the multivariate t random effects with v = 3 and that of the right panel is the multivariate normal random effects.
TABLE 4.
Summary of DIC (LPML) differences. The DIC (LPML) difference is defined as the DIC (LPML) under the true model minus the DIC (LPML) under the fitted model. The row label indicates the fitted model.
| | DIC Difference | | LPML Difference | |
|---|---|---|---|---|
| Fitted Model | Median | IQR | Median | IQR |
| (True) v = 3 | ||||
| v = 10 | −0.19 | (−1.94, 1.80) | 0.97 | (−0.26, 1.91) |
| v = ∞ | −1.83 | (−6.21, 1.48) | 2.53 | (0.72, 5.63) |
| Single Variance | −43.40 | (−48.97, −38.31) | 22.74 | (20.91, 27.12) |
| Random v | 1.47 | (−2.40, 8.41) | −2.07 | (−6.41, 0.19) |
| (True) v = ∞ | ||||
| v = 3 | −5.23 | (−6.43, −3.86) | 3.03 | (2.34, 3.44) |
| v = 10 | −1.19 | (−1.68, −0.55) | 0.54 | (0.34, 0.74) |
| Single Variance | −47.44 | (−53.07, −40.87) | 27.44 | (21.74, 36.97) |
| Random v | −4.74 | (−5.23, −4.11) | 1.68 | (1.20, 2.00) |
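The entries of Table 4 are medians and interquartile ranges of per-simulation criterion differences. A minimal sketch of that summary is given below; the function name `summarize_differences` is hypothetical, and the quartiles use NumPy's default linear interpolation, which may differ slightly from the rule used for the published table.

```python
import numpy as np

def summarize_differences(true_model, fitted_model):
    """Median and IQR of (criterion under true model) - (criterion under fitted model).

    Both inputs hold one criterion value (DIC or LPML) per simulated data set.
    """
    diff = np.asarray(true_model, dtype=float) - np.asarray(fitted_model, dtype=float)
    q1, med, q3 = np.percentile(diff, [25, 50, 75])
    return med, (q1, q3)

# Toy usage: five simulated data sets with made-up DIC values.
print(summarize_differences([401, 402, 403, 404, 405], [400, 400, 400, 400, 400]))
```

Because a smaller DIC and a larger LPML indicate better fit, a negative DIC difference and a positive LPML difference both favor the true model, which is the sign pattern seen for the misspecified fits in Table 4.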
6 |. ANALYSIS OF THE TNM DATA
In this section, we use our model (1) - (4) to analyze the TG network meta-data introduced in Section 2. Specifically, our data include T = 11 treatments {PBO, S, A, L, R, P, E, SE, AE, LE, PE}, each of which is assigned a treatment ID from 1 to 11, sequentially. Among the 11 treatments, {S, A, L, R, P} are different types of statins and {SE, AE, LE, PE} are the combination treatments of E with different types of statins. It is worth noting from Section 2 that the combination treatment {RE} is missing in our TG network meta-data and the five treatments {L, P, AE, LE, PE} are only included in single trials. Let the response in (1) be the mean percent change in TG from the baseline value for the tkℓ th treatment in the kth trial. In model (1), the covariate vector includes the ten baseline covariates, namely, baseLDLC, baseHDLC, baseTG, age, white, male, BMI, potency_med, potency_high, and duration, and the eleven treatment indicators 1{PBO}, 1{S}, 1{A}, 1{L}, 1{R}, 1{P}, 1{E}, 1{SE}, 1{AE}, 1{LE}, and 1{PE}, where the indicator function 1{B} takes a value of “1” if B is true and a value of “0” if B is false.
In model (4), we compare eight sets of group indicator variables, one set of selected baseline variables, and a mixed set of group indicator variables and selected baseline variables. For the eight sets of group indicator variables, we replicate the eight different groupings in Li et al.,11 where they first formulate possible grouping sets based on the clinical mechanism of action of treatments and then compare them using model comparison criteria. Briefly, the sets of group indicator variables are constructed as follows. The first set divides the treatments into four groups, in which PBO alone is the first group, all statins {S, A, L, R, P} are the second group, E is the third group, and all statins with E {SE, AE, LE, PE} are the fourth group. Similarly, further sets are created by separating one statin (S, A, or R) from the other statins, and others by separating two statins (S and R, S and A, or A and R) from the other statins. The set of selected baseline variables is chosen based on the plausibility that baseline cholesterol measures are related to the variability of treatment effects in lowering TG. In addition, we include a calibrated covariate set with a mixture of treatment indicators and baseline variables. The reasoning for the inclusion of each covariate is as follows. We have a binary covariate for PBO since PBO has the most different mechanism of action compared to other treatments. Also, from the discussion in Section 2, we include a binary covariate for R since treatment R has the widest range of variability in effects. Baseline TG is included to adjust for baseline effects on variances, and correspondingly baseline LDL-C is included since LDL-C and TG are highly related. Therefore, the calibrated set consists of the indicators for PBO and R together with baseline LDL-C and baseline TG.
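To make the four-group indicator set concrete, the following sketch builds the group indicators for each treatment and maps them to a variance through a log-linear form. Everything beyond the grouping itself is an assumption made for illustration: the one-hot coding without an intercept, the helper names `group_indicators` and `random_effect_variance`, and the parameterization log(variance) = z'ϕ, which should be checked against model (4).

```python
import numpy as np

TREATMENTS = ["PBO", "S", "A", "L", "R", "P", "E", "SE", "AE", "LE", "PE"]

# Four-group set described in the text: PBO | statins | E | statin + E.
GROUPS = [["PBO"], ["S", "A", "L", "R", "P"], ["E"], ["SE", "AE", "LE", "PE"]]

def group_indicators(treatment, groups=GROUPS):
    """One-hot vector marking the group that contains `treatment`."""
    z = np.zeros(len(groups))
    for g, members in enumerate(groups):
        if treatment in members:
            z[g] = 1.0
            return z
    raise ValueError(f"unknown treatment: {treatment}")

def random_effect_variance(z, phi):
    """Variance implied by an assumed log-linear form log(variance) = z'phi."""
    return float(np.exp(np.dot(z, phi)))

# Design matrix with one row per treatment and one column per group.
Z = np.vstack([group_indicators(t) for t in TREATMENTS])
phi = np.array([-1.0, 0.0, 0.5, 0.2])  # made-up values, not estimates
for t, z in zip(TREATMENTS, Z):
    print(t, z, round(random_effect_variance(z, phi), 3))
```

Under this coding, every treatment in a group shares one variance, so treatments observed in a single trial borrow information from the other members of their group.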
In all of the computations, we standardized the covariates in order to achieve better MCMC convergence. We took every third sample and generated 20000 MCMC samples after a 2000-iteration burn-in. We checked MCMC convergence using standard Bayesian model assessment, including trace plots and sample autocorrelation plots. We see from Table 1 that some treatments in trials 12, 13, and 14 are not included in other trials. Thus, the calculation of the LPML excluded these trials. Table 5 compares the DIC and LPML values for the 10 sets of covariates in model (4) combined with six fixed values of v as well as a random v. Different covariate sets might give different parameter sizes in ϕ. This information is additionally given in the first column of Table 5. We see from Table 5 that (i) for each of these 10 sets of covariates, the largest DIC and the smallest LPML are attained at v = ∞ (the normal distribution); (ii) the calibrated covariate set has the smallest DIC and the largest LPML among all 10 covariate sets; (iii) the values of DICs and LPMLs for a random v are close to those with v = 3 or v = 4 for most of these 10 sets of covariates; (iv) the combination of the calibrated set with v = 2 fits the data best under DIC; and (v) the combination of the calibrated set with v = 3 fits the data best under LPML. Without including the covariates in model (1), the DIC under the calibrated set with v = 3 is 423.21. We compare the posterior estimates of the covariates-included and covariates-not-included models in the discussion below. We also fit the data using a diagonal covariance matrix, that is, assuming no correlations among the random treatment effects. With the calibrated covariate set and v = 3, the DIC under the diagonal covariance matrix is 387.86, which is larger than the DIC under our specified covariance matrix.
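For readers who want to reproduce criteria of this kind from MCMC output, the sketch below computes DIC and LPML from pointwise log-likelihood values using their standard generic definitions. It is not the paper's implementation: the function names are hypothetical, the "units" may be trials rather than arms, and the likelihood in the paper may be evaluated after integrating out the random effects.

```python
import numpy as np

def dic(loglik_draws, loglik_at_mean):
    """DIC = D(theta_bar) + 2 * p_D, with p_D = mean deviance - D(theta_bar).

    loglik_draws: (S, n) log-likelihood of each unit at each retained draw.
    loglik_at_mean: (n,) log-likelihood of each unit at the posterior mean.
    """
    loglik_draws = np.asarray(loglik_draws, dtype=float)
    dev_bar = -2.0 * loglik_draws.sum(axis=1).mean()
    dev_hat = -2.0 * np.sum(loglik_at_mean)
    p_d = dev_bar - dev_hat
    return dev_hat + 2.0 * p_d, p_d

def lpml(loglik_draws):
    """LPML = sum_i log CPO_i, with CPO_i estimated by the harmonic mean
    of the unit-i likelihood over the S retained draws."""
    loglik_draws = np.asarray(loglik_draws, dtype=float)
    n_draws = loglik_draws.shape[0]
    # log CPO_i = -log( (1/S) * sum_s exp(-loglik[s, i]) ), computed stably.
    log_cpo = -(np.logaddexp.reduce(-loglik_draws, axis=0) - np.log(n_draws))
    return log_cpo.sum()

# Toy usage: thin a made-up chain as in the text (drop burn-in, keep every third draw).
rng = np.random.default_rng(0)
chain = rng.normal(-1.5, 0.1, size=(62000, 3))  # fake pointwise log-likelihoods
kept = chain[2000::3]
print(kept.shape, dic(kept, kept.mean(axis=0)), lpml(kept))
```

Smaller DIC and larger LPML indicate better fit, which is the direction used when reading Table 5.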
TABLE 5.
Model comparisons for the TG network meta-data using DIC and LPML.
| Criterion | DIC | | | | | | | LPML | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Covariate set (dim(ϕ))b, by va | 2 | 3 | 5 | 10 | 20 | ∞ | Random v | 2 | 3 | 5 | 10 | 20 | ∞ | Random v |
| | 385.81 | 387.09 | 389.52 | 392.88 | 395.50 | 397.70 | 387.0137 | −160.97 | −161.90 | −163.33 | −165.22 | −166.88 | −171.17 | −161.4639 |
| | 387.82 | 388.77 | 390.85 | 393.78 | 395.97 | 397.56 | 389.5591 | −162.48 | −163.15 | −164.27 | −166.43 | −168.88 | −169.12 | −163.2149 |
| | 389.10 | 389.89 | 391.78 | 394.07 | 395.82 | 397.06 | 398.2215 | −163.47 | −164.28 | −165.68 | −167.78 | −168.84 | −169.21 | −163.8937 |
| | 386.42 | 386.81 | 387.95 | 389.40 | 390.35 | 391.19 | 388.6203 | −162.15 | −163.16 | −163.95 | −164.45 | −165.65 | −167.85 | −163.1581 |
| | 388.44 | 389.09 | 390.79 | 393.24 | 395.20 | 396.82 | 389.9858 | −162.36 | −163.25 | −164.42 | −166.03 | −167.85 | −169.45 | −162.9580 |
| | 388.26 | 388.37 | 389.24 | 391.00 | 391.63 | 392.82 | 390.0239 | −164.43 | −164.33 | −164.94 | −167.09 | −166.62 | −167.17 | −164.0532 |
| | 389.14 | 389.64 | 390.88 | 392.48 | 393.62 | 394.33 | 390.9370 | −163.30 | −164.25 | −164.68 | −166.88 | −166.51 | −167.65 | −164.1565 |
| | 388.55 | 388.62 | 389.57 | 390.87 | 392.23 | 392.78 | 390.4394 | −163.24 | −163.39 | −164.70 | −166.49 | −167.37 | −168.03 | −164.1556 |
| | 385.75 | 386.47 | 387.75 | 390.16 | 391.71 | 393.10 | 387.1559 | −161.36 | −162.03 | −162.84 | −164.09 | −166.52 | −166.21 | −162.0838 |
| | 383.63 | 384.03 | 384.88 | 386.15 | 387.23 | 387.88 | 385.2640 | −160.48 | −160.17 | −160.92 | −161.84 | −162.13 | −164.13 | −161.1602 |
a Degrees of freedom.
b Dimension of ϕ.
Table 6 presents the posterior means, posterior standard deviations (SDs), and 95% HPD intervals of the parameters under the best covariate set with v = 2 and v = 3. The posterior estimates for v = 2 are similar to those for v = 3. The posterior means (SDs and HPD intervals) for the baseline TG and race proportion of white are 1.86 (0.68 and [0.49, 3.18]) and −2.08 (0.42 and [−2.88, −1.24]), respectively, for v = 2; and 1.83 (0.70 and [0.43, 3.20]) and −2.01 (0.45 and [−2.86, −1.10]), respectively, for v = 3. Neither set of 95% HPD intervals includes 0, indicating a significant influence on the mean percent change from baseline in TG. As we have mentioned, the covariates were standardized in all computations. It is worth noting that one should not read too much into the individual covariate coefficients from the fitted model given in Table 6. These coefficients are dependent upon the trial populations and the terms included in the model. Instead, the model should be looked at and interpreted in its totality.
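The HPD intervals reported in Table 6 can be approximated from MCMC draws by taking the shortest interval that contains the required fraction of the sorted sample, an approach that is appropriate for unimodal posteriors. The sketch below is illustrative only; the function name `hpd_interval` is hypothetical and the paper does not state which implementation was used.

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing at least `prob` of the sorted draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    m = int(np.ceil(prob * n))  # number of draws the interval must cover
    if m < 1 or m > n:
        raise ValueError("prob must lie in (0, 1]")
    # Width of every window of m consecutive order statistics.
    widths = x[m - 1:] - x[: n - m + 1]
    j = int(np.argmin(widths))
    return x[j], x[j + m - 1]

# Toy usage: a skewed sample, where the HPD interval differs from the
# equal-tailed interval by hugging the high-density region near zero.
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=20000)
print(hpd_interval(draws), np.percentile(draws, [2.5, 97.5]))
```

A coefficient is then flagged in the text as significant when the resulting interval excludes 0.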
TABLE 6.
Posterior estimates of the parameters under the best covariate set with v = 2 and v = 3.
| | | v = 2 | | | v = 3 | | |
|---|---|---|---|---|---|---|---|
| | | Posterior | | | Posterior | | |
| Variable | Parameter | Mean | SD | 95% HPD Interval | Mean | SD | 95% HPD Interval |
| baseLDLC | β1 | 0.005 | 0.48 | (−0.92, 0.96) | −0.06 | 0.49 | (−1.01, 0.91) |
| baseHDLC | β2 | 1.58 | 0.92 | (−0.26, 3.34) | 1.40 | 0.94 | (−0.48, 3.20) |
| baseTG | β3 | 1.86 | 0.68 | (0.49, 3.18) | 1.83 | 0.70 | (0.43, 3.20) |
| age | β4 | 0.14 | 0.42 | (−0.69, 0.96) | 0.11 | 0.43 | (−0.77, 0.93) |
| white | β5 | −2.08 | 0.42 | (−2.88, −1.24) | −2.01 | 0.45 | (−2.86, −1.10) |
| male | β6 | 0.50 | 0.74 | (−0.97, 1.95) | 0.34 | 0.76 | (−1.16, 1.82) |
| BMI | β7 | −0.79 | 0.51 | (−1.78, 0.24) | −0.81 | 0.53 | (−1.81, 0.26) |
| potency_med | β8 | 2.30 | 3.19 | (−3.95, 8.64) | 2.52 | 3.25 | (−3.77, 9.07) |
| potency_high | β9 | −0.79 | 3.40 | (−7.65, 5.75) | −0.61 | 3.46 | (−7.55, 6.14) |
| duration | β10 | 0.13 | 0.49 | (−0.81, 1.08) | 0.21 | 0.51 | (−0.78, 1.22) |
| PBO | β11 | 0.79 | 5.75 | (−10.51, 12.08) | 1.03 | 5.85 | (−10.88, 12.26) |
| S | β12 | −18.78 | 1.20 | (−21.13, −16.41) | −18.88 | 1.23 | (−21.28, −16.40) |
| A | β13 | −24.54 | 1.68 | (−27.86, −21.26) | −24.53 | 1.72 | (−27.87, −21.10) |
| L | β14 | −11.70 | 4.30 | (−20.05, −3.53) | −11.70 | 4.25 | (−20.09, −3.49) |
| R | β15 | −18.63 | 1.77 | (−21.96, −15.03) | −18.50 | 1.83 | (−22.06, −14.79) |
| P | β16 | −8.53 | 4.39 | (−17.07, −0.17) | −8.41 | 4.35 | (−16.74, 0.39) |
| E | β17 | −5.66 | 5.80 | (−17.15, 5.70) | −5.47 | 5.90 | (−16.93, 6.37) |
| SE | β18 | −21.75 | 1.78 | (−25.23, −18.26) | −21.82 | 1.83 | (−25.60, −18.38) |
| AE | β19 | −29.75 | 2.86 | (−35.31, −24.22) | −29.94 | 2.96 | (−35.61, −24.00) |
| LE | β20 | −25.95 | 3.50 | (−32.83, −19.28) | −26.24 | 3.52 | (−33.27, −19.52) |
| PE | β21 | −21.77 | 3.82 | (−29.17, −14.62) | −22.13 | 3.71 | (−29.47, −14.86) |
| Intercept | ϕ1 | −0.26 | 0.39 | (−1.06, 0.46) | −0.38 | 0.51 | (−1.41, 0.51) |
| PBO | ϕ2 | −0.70 | 1.07 | (−2.83, 1.29) | −1.38 | 1.19 | (−3.76, 0.41) |
| R | ϕ3 | 0.50 | 0.42 | (−0.30, 1.34) | 0.22 | 0.16 | (−0.11, 0.54) |
| baseLDLC | ϕ4 | −0.41 | 0.47 | (−1.38, 0.44) | −0.41 | 0.46 | (−1.32, 0.42) |
| baseTG | ϕ5 | 0.27 | 0.30 | (−0.31, 0.88) | 0.23 | 0.28 | (−0.31, 0.79) |
We also calculated the posterior mean ranks for all treatments. For v = 3, the mean ranks of {PBO, S, A, L, R, P, E, SE, AE, LE, PE} are {10.97, 6.33, 2.97, 7.96, 6.63, 8.95, 9.58, 4.54, 1.21, 2.38, 4.50}, respectively. This suggests that the descending order of treatments, after adjusting for the 10 aggregate covariates, is AE, LE, A, PE, SE, S, R, L, P, E, PBO. This order reflects the same order of the posterior means of the overall treatment effects. The treatments have the same order for v = 2. We also plot the absolute and cumulative ranking probabilities for all treatments and calculate the surface under the cumulative ranking curve (SUCRA) in Figure 5. The higher the probability for a treatment having top ranks, the larger the area under the cumulative ranking line. Hence, larger SUCRA values represent better treatments on a scale from 0 to 1. It is worth noting that a SUCRA of value p is related to the mean rank r through the transformation r = 1 + (1 − p)(T − 1), where T is the number of treatments in the data30. The mean ranks of the treatments for v = 3, without adjusting for the covariates, are {10.92, 6.29, 3.80, 8.22, 4.90, 9.16, 9.24, 2.22, 1.09, 4.09, 6.06}. The suggested descending order becomes AE, SE, A, LE, R, PE, S, L, P, E, PBO. The best treatment is always E combined with A and the worst two treatments are PBO and E alone. The effects of the combination treatments SE, LE, PE and the statin R differ substantially with and without covariate adjustment.
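The link between ranking probabilities, SUCRA, and mean ranks can be checked numerically. The sketch below is a hypothetical illustration, not the authors' code: it assumes an array of posterior draws of the treatment effects in which a smaller value (a larger TG reduction) is better, and the function names are invented for this example.

```python
import numpy as np

def rank_summaries(effect_draws):
    """Ranking probabilities, cumulative probabilities, SUCRA, and mean ranks.

    effect_draws: (S, T) posterior draws, one column per treatment;
    smaller values are treated as better, so rank 1 is the best treatment.
    """
    draws = np.asarray(effect_draws, dtype=float)
    n_trt = draws.shape[1]
    ranks = draws.argsort(axis=1).argsort(axis=1) + 1  # rank within each draw
    # rank_prob[t, r - 1] = P(treatment t has rank r)
    rank_prob = np.stack(
        [(ranks == r).mean(axis=0) for r in range(1, n_trt + 1)], axis=1
    )
    cum_prob = rank_prob.cumsum(axis=1)
    sucra = cum_prob[:, :-1].sum(axis=1) / (n_trt - 1)
    return rank_prob, cum_prob, sucra, ranks.mean(axis=0)

def sucra_from_mean_rank(mean_rank, n_trt):
    """Invert r = 1 + (1 - p)(T - 1) to get p = (T - r) / (T - 1)."""
    return (n_trt - np.asarray(mean_rank, dtype=float)) / (n_trt - 1)

# Mean ranks reported in the text for AE and PBO (v = 3, covariate-adjusted).
print(sucra_from_mean_rank([1.21, 10.97], 11))
```

Applied to the reported mean ranks with T = 11, the transformation gives a SUCRA of about 0.979 for AE and about 0.003 for PBO.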
FIGURE 5.
Plots of ranking probabilities for all treatment arms: (a) v = 2, (b) v = 3. The green dashed line represents the absolute probability and the orange solid line represents the cumulative probability. Treatment abbreviations and the SUCRA values are on the top of each sub-plot.
Figure 6 exhibits the posterior means and 95% HPD intervals of pairwise treatment differences under the best covariate set for v = 2 and v = 3 with and without adjusting for covariates. We observe from this figure that (i) without covariate adjustment, treatment P does not provide a substantially higher reduction in TG than PBO, whereas with covariate adjustment, all active treatments provide substantially higher reductions in TG than PBO; (ii) without covariate adjustment, SE and AE have significantly higher TG reductions than their respective statins, while LE and PE do not; (iii) with covariate adjustment, AE, LE and PE provide significantly higher TG reductions than their respective statins while SE does not; (iv) except for PE, all combination treatments provide significantly higher TG reductions than E alone with or without adjusting for covariates; (v) AE provides a substantially higher TG reduction than PE without adjusting for covariates and a substantially higher TG reduction than SE with adjusting for covariates; (vi) the four combination treatments do not have substantial differences although they do have a treatment order; and (vii) the results for v = 2 are similar to those for v = 3. Finally, we note that covariate adjustment is critical when the studies are not sufficiently similar in clinical characteristics.
FIGURE 6.
Plot of posterior means and 95% HPD intervals of treatment differences under the models with v = 2 and v = 3. The reference treatment being compared in each sub-plot is listed at the bottom. The red color represents estimates with covariate adjustment, while the blue color represents estimates without covariate adjustment.
7 |. DISCUSSION
In this paper, we have proposed a general multivariate t distribution for the random treatment effects. The multivariate t distribution is a natural distribution to use and is a natural extension of the multivariate normal distribution. The Cochrane Handbook23 has an independent section about meta-analysis of skewed data and gives instructions on diagnosing and fixing skewed outcomes in meta-analysis. Some other authors31,32 also make contributions to this topic. However, none of their work has been applied to the network meta-regression model.
In (1), we assume a normal distribution for the random error ϵkt. The rationale for such an assumption is that since our outcome variable is aggregate, which generally represents the sample mean, the Central Limit Theorem guarantees that the aggregate response approximately follows a normal distribution. However, for individual patient data, the random error should be assumed to follow a multivariate t distribution. In this paper, we consider both fixed and random v (the degrees of freedom in the multivariate t distribution), and use DIC and LPML to guide the choice of model. Based on our empirical results, the posterior estimates of the regression coefficients, which include the overall treatment effects, are relatively robust with respect to moderate misspecification of the degrees of freedom v. In clinical practice, it may be sufficient to default to the hierarchical prior for v as a way of data-driven model selection. Ideally, however, it is advised to compare a heavy-tailed t distribution (2 or 3 degrees of freedom), a moderate-tailed t distribution (e.g., 8–10 degrees of freedom), which is approximately a logistic distribution, and a light-tailed t distribution (20 or greater degrees of freedom).
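The role of v can be illustrated by drawing multivariate t random effects through the standard scale-mixture-of-normals representation: a multivariate normal draw divided by the square root of an independent Gamma(v/2, rate v/2) variable. The sketch below is a generic illustration under that representation; the function name `sample_mvt` and the example scale matrix are assumptions, not quantities from the paper.

```python
import numpy as np

def sample_mvt(rng, scale_matrix, df, size):
    """Draw `size` vectors from a zero-mean multivariate t distribution.

    df = np.inf returns multivariate normal draws, the limiting case.
    """
    scale_matrix = np.asarray(scale_matrix, dtype=float)
    dim = scale_matrix.shape[0]
    z = rng.multivariate_normal(np.zeros(dim), scale_matrix, size=size)
    if np.isinf(df):
        return z
    # lam ~ Gamma(shape = df/2, rate = df/2), i.e. a chi-square(df) divided by df.
    lam = rng.gamma(shape=df / 2.0, scale=2.0 / df, size=size)
    return z / np.sqrt(lam)[:, None]

# Compare tail mass beyond 4 scale units for heavy, moderate, and normal tails.
scale = np.array([[1.0, 0.5], [0.5, 1.0]])
for v in (3.0, 10.0, np.inf):
    x = sample_mvt(np.random.default_rng(0), scale, v, 20000)
    print(v, np.mean(np.abs(x[:, 0]) > 4.0))
```

For v > 2 the covariance of the draws is v/(v − 2) times the scale matrix, so comparisons across values of v should keep in mind that the scale matrix and the covariance matrix coincide only in the normal limit.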
In addition to the network meta-regression model, we have developed an innovative log-linear regression model for the variances of the random effects and used an unstructured correlation matrix. The variances of the random effects can be estimated through the log-linear regression and thus depend on the treatment-by-study-level covariates. The log-linear regression relaxes the strong assumption of homogeneous variances. Our specified correlation matrix also exhibited better fit to the data than the diagonal covariance matrix. Although our results show that the correlations of treatment effects are necessary, it is difficult to accurately estimate all correlations because the arms observed in each study are incomplete. In this situation, more data need to be collected or stronger assumptions need to be applied. Potential future work would be to explore the weakest assumptions for the correlations so that the correlation parameters can be fully determined by the data and the posterior estimates can be close to the true correlations.
Supplementary Material
ACKNOWLEDGEMENTS
We would like to thank the Editor, the Associate Editor, and two reviewers for their very helpful comments and suggestions, which have led to a much improved version of the paper. Dr. Chen and Dr. Ibrahim’s research was partially supported by NIH grants #GM70335 and #P01CA142538, and Merck & Co., Inc., Kenilworth, NJ, USA. Dr. Kim’s research was supported by the Intramural Research Program of National Institutes of Health, National Cancer Institute.
References
- 1. Vrablik M, Holmes D, Forer B, Juren A, Martinka P, Frohlich J. Use of ezetimibe results in more patients reaching lipid targets without side effects. Cor Et Vasa 2014; 56(2): e128–e132.
- 2. Heron M. Deaths: Leading Causes for 2017. National Vital Statistics Reports 2019; 68(6): June 24.
- 3. Miller M, Stone N, Ballantyne C, et al. Triglycerides and cardiovascular disease: a scientific statement from the American Heart Association. Circulation 2011; 123: 2292–2333.
- 4. Nordestgaard B, Varbo A. Triglycerides and cardiovascular disease. Lancet 2014; 384: 626–635.
- 5. Dewey F, Gusarova V, Dunbar R, et al. Genetic and pharmacologic inactivation of ANGPTL3 and cardiovascular disease. New England Journal of Medicine 2017; 377: 211–221.
- 6. Jorgensen A, Frikke-Schmidt R, Nordestgaard B, Tybjarg-Hansen A. Loss-of-function mutations in APOC3 and risk of ischemic vascular disease. New England Journal of Medicine 2014; 371: 32–41.
- 7. Austin M, Hokanson J, Edwards K. Hypertriglyceridemia as a cardiovascular risk factor. American Journal of Cardiology 1998; 81: 7B–12B.
- 8. Sarwar N, Danesh J, Eiriksdottir G, et al. Triglycerides and the risk of coronary heart disease: 10,158 incident cases among 262,525 participants in 29 Western prospective studies. Circulation 2007; 115: 450–458.
- 9. Hokanson J, Austin M. Plasma triglyceride level is a risk factor for cardiovascular disease independent of high-density lipoprotein cholesterol level: a meta-analysis of population-based prospective studies. Journal of Cardiovasc Risk 1996; 3: 213–219.
- 10. Bhatt D, Steg P, Miller M, et al. Cardiovascular Risk Reduction with Icosapent Ethyl for Hypertriglyceridemia. New England Journal of Medicine 2019; 380: 11–22.
- 11. Li H, Chen MH, Ibrahim JG, et al. Bayesian inference for network meta-regression using multivariate random effects with applications to cholesterol lowering drugs. Biostatistics 2019; 20(3): 499–516. 10.1093/biostatistics/kxy014.
- 12. Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 3: heterogeneity—subgroups, meta-regression, bias, and bias-adjustment. Medical Decision Making 2013; 33(5): 618–640.
- 13. Tu YK. Use of generalized linear mixed models for network meta-analysis. Medical Decision Making 2014; 34(7): 911–918.
- 14. Batson S, Sutton A, Abrams K. Exploratory Network Meta Regression Analysis of Stroke Prevention in Atrial Fibrillation Fails to Identify Any Interactions with Treatment Effect. PloS one 2016; 11(8): e0161864.
- 15. Zhang J, Carlin BP, Neaton JD, et al. Network Meta-analysis of Randomized Clinical Trials: Reporting the Proper Summaries. Clinical Trials 2014; 11(2): 246–262.
- 16. Zhang J, Fu H, Carlin BP. Detecting outlying trials in network meta-analysis. Statistics in Medicine 2015; 34(19): 2695–2707.
- 17. Hong H, Chu H, Zhang J, Carlin BP. A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Research Synthesis Methods 2016; 7(1): 6–22.
- 18. Gunhan BK, Friede T, Held L. A design-by-treatment interaction model for network meta-analysis and meta-regression with integrated nested Laplace approximations. Research Synthesis Methods 2018; 9(2): 179–194.
- 19. Donegan S, Dias S, Tudur-Smith C, Marinho V, Welton NJ. Graphs of study contributions and covariate distributions for network meta-regression. Research Synthesis Methods 2018; 9(2): 243–260.
- 20. Higgins JPT, Jackson D, Barrett JK, Lu G, Ades AE, White IR. Consistency and inconsistency in network meta-analysis: concepts and models for multi-arm studies. Research Synthesis Methods 2012; 3(2): 98–110.
- 21. Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Statistics in Medicine 2004; 23(20): 3105–3124.
- 22. Lu G, Ades AE. Modeling between-trial variance structure in mixed treatment comparisons. Biostatistics 2009; 10(4): 792–805.
- 23. Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0. 2011. http://handbook.cochrane.org, The Cochrane Collaboration.
- 24. Yao H, Chen M, Qiu C. Bayesian Modeling and Inference for Meta Data with Applications in Efficacy Evaluation of an Allergic Rhinitis Drug. Journal of Biopharmaceutical Statistics 2011; 21(5): 992–1005.
- 25. Hong H, Price KL, Fu H, Carlin BP. Bayesian network meta-analysis for multiple endpoints. In: Morton S, Gatsonis C, eds. Methods in Comparative Effectiveness Research. Boca Raton, FL: CRC Press. 2017.
- 26. Yao H, Kim S, Chen MH, Ibrahim JG, Shah AK, Lin J. Bayesian Inference for Multivariate Meta-regression with Partially Observed Within-Study Sample Covariance Matrix. Journal of the American Statistical Association 2015; 110(510): 528–544.
- 27. Chen MH, Shao QM, Ibrahim JG. Monte Carlo methods in Bayesian computation. New York: Springer. 2000. ISBN 978-1-4612-1276-8.
- 28. Joe H. Generating random correlation matrices based on partial correlations. Journal of Multivariate Analysis 2006; 97(10): 2177–2189.
- 29. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002; 64(4): 583–639.
- 30. Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. Journal of Clinical Epidemiology 2011; 64(2): 163–171.
- 31. Higgins J, White IR, Anzures-Cabrera J. Meta-analysis of skewed data: Combining results reported on log-transformed or raw scales. Statistics in Medicine 2008; 27(29): 6072–6092.
- 32. Greco T, Biondi-Zoccai G, Gemma M, Guérin C, Zangrillo A, Landoni G. How to impute study-specific standard deviations in meta-analyses of skewed continuous endpoints?. World Journal of Meta-Analysis 2000; 3(5): 215–224.