Statistical Methods for Multivariate Meta-analysis of Diagnostic Tests: An Overview and Tutorial

Xiaoye Ma; Lei Nie; Stephen R Cole; Haitao Chu

doi:10.1177/0962280213492588

. Author manuscript; available in PMC: 2014 Dec 26.

Published in final edited form as: Stat Methods Med Res. 2013 Jun 26;25(4):1596–1619. doi: 10.1177/0962280213492588

Statistical Methods for Multivariate Meta-analysis of Diagnostic Tests: An Overview and Tutorial

Xiaoye Ma ¹, Lei Nie ^2,^a, Stephen R Cole ³, Haitao Chu ^1,^*

PMCID: PMC3883791 NIHMSID: NIHMS536823 PMID: 23804970

Summary

In this article, we present an overview and tutorial of statistical methods for meta-analysis of diagnostic tests under two scenarios: 1) when the reference test can be considered a gold standard; and 2) when the reference test cannot be considered a gold standard. In the first scenario, we first review the conventional summary receiver operating characteristics (ROC) approach and a bivariate approach using linear mixed models (BLMM). Both approaches require direct calculations of study-specific sensitivities and specificities. We next discuss the hierarchical summary ROC curve approach for jointly modeling positivity criteria and accuracy parameters, and the bivariate generalized linear mixed models (GLMM) for jointly modeling sensitivities and specificities. We further discuss the trivariate GLMM for jointly modeling prevalence, sensitivities and specificities, which allows us to assess the correlations among the three parameters. These approaches are based on the exact binomial distribution and thus do not require an ad hoc continuity correction. Last, we discuss a latent class random effects model for meta-analysis of diagnostic tests when the reference test itself is imperfect for the second scenario. A number of case studies with detailed annotated SAS code in procedures MIXED and NLMIXED are presented to facilitate the implementation of these approaches.

Keywords: meta-analysis, diagnostic test, gold standard, generalized linear mixed models

1. Introduction

In the medical literature, a diagnostic test commonly refers to a medical test to classify subjects with respect to a (disease) state of interest. Accurate diagnosis plays an important role in the disease control and prevention. Diagnostic test outcomes could be dichotomous, ordinal or continuous. This article only focuses on the dichotomous outcome. The performance of a binary test is commonly measured by a pair of indices such as sensitivity and specificity. Sensitivity is defined as the probability of testing positive given a person being diseased and specificity is defined as the probability of testing negative given a person being disease-free.^{1, 2} Other frequently used indices include positive and negative predictive values, and positive and negative diagnostic likelihood ratios.^{1, 2}

In meta-analysis of diagnostic tests, there is a great potential for heterogeneity due to differences in such things as disease prevalence, study population characteristics, laboratory methods, and study designs. While some study level covariates such as mean age may explain some variation, random effects models are commonly recommended to account for other unobserved sources of variation. When a reference test can be considered a gold standard, a few methods are available to account for this heterogeneity.^3–12 Specifically, random effects models including the hierarchical summary receiver operating characteristic model³ and bivariate random effects meta-analysis on sensitivities and specificities are recommended.^{5, 11, 12} These approaches are identical in some situations.^{6, 9, 13} Some examples and extensive simulations demonstrated that bivariate random-effects meta-analysis offers numerous advantages over separate univariate meta-analysis.^{14, 15} In general, generalized linear mixed models, which use the exact binomial likelihood, often perform better than the linear mixed models which use a normal approximation.^{12, 16} In addition, a trivariate generalized linear random-effects model were proposed to jointly models the disease prevalence, sensitivities and specificities.¹⁷

In practice, disease status is often measured by a reference test that is subject to nontrivial measurement error. This leads to a setting without a gold standard. When the reference test is subject to measurement error, the evaluation of diagnostic tests in a meta-analysis setting becomes more challenging. To the best of our knowledge, only a few articles have considered meta-analysis methods for diagnostic tests in the absence of a gold standard. Walter et al. discussed a latent class model for a meta-analysis of two diagnostic tests assuming varying prevalence, but constant sensitivity and specificity across studies.¹⁸ A more general latent class random effects model by Chu et al. assumes sensitivity and specificity of both tests as well as prevalence to be random effects.¹⁹ Sadatsafavi et al. presented a model where conditional dependence between tests is allowed, but beyond prevalence, only one of the sensitivity or specificity can be implemented using a random effect.²⁰ Dendukuri et al. presented a Bayesian method for the meta-analysis of a tuberculous pleuritis diagnostic test in the absence of a gold standard.²¹

In this article, we present an overview and tutorial summarizing the pros and cons of these approaches and provide detailed case studies with annotated SAS code. The outline of this article is as follows. In Section 2, we summarize and compare different models when the referent test can be considered a gold standard. In Section 3, we introduce models in the absence of a gold standard. In Section 4, we present case studies to illustrate the approaches described in Sections 2 and 3. The annotated SAS code to implement these approaches is presented in the appendix.

The following notation is used throughout this paper:

π	Disease prevalence
Se (Sp)	Sensitivity (Specificity)
TPR (FPR)	True positive rate (false positive rate)
ROC	Receiver operating characteristic
AIC	Akaike information criterion
BIC	Bayesian information criterion
GLMM	Generalized Linear Mixed Model
BLMM	Bivariate Linear Mixed Model
SE	Standard Error

Open in a new tab

2. Statistical methods when the reference test is a gold standard

When the reference test can be considered a gold standard, let n_i₁₁, n_i₀₀, n_i₀₁, and n_i₁₀ be the number of true positives, true negatives, false positives and false negatives for the i^th study (i = 1, 2, …, N), respectively. Let n_i₁₊ = n_i₁₁ + n_i₁₀ and n_i₀₊ = n_i₀₁ + n_i₀₀ be the study-specific numbers of diseased and disease-free subjects. Then the study-specific sensitivity and specificity can be estimated as ${\hat{S e}}_{i} = n_{i 11} / n_{i 1 +}$ , and ${\hat{S p}}_{i} = n_{i 00} / n_{i 0 +}$ . See Table 1 for a typical 2 by 2 table.

Table 1.

2 by 2 table for i^th study

		Reference test		total
		Positive (+)	Negative (−)	total
Diagnostic Test	Positive (+)	n_i₁₁	n_i₀₁
Diagnostic Test	Negative (−)	n_i₁₀	n_i₀₀

Total		n_i₁₊	n_i₀₊	n_i₊₊

Open in a new tab

In this section, we will first discuss the conventional summary ROC approach and a bivariate approach using linear mixed models (LMM). Both methods require direct calculations of study-specific sensitivities and specificities, and an ad hoc continuity correction when there are empty cells. Second, we will discuss the hierarchical summary ROC approach for jointly modeling positivity criteria and accuracy parameters, and a bivariate approach using generalized linear mixed models (GLMM) for jointly modeling sensitivities and specificities. At last, we will discuss a trivariate approach using GLMM for jointly modeling prevalence, sensitivities and specificities to account for the correlations among the three parameters. The hierarchical summary ROC approach, and the bivariate and trivariate approaches are based on the exact binomial distribution and thus do not require any ad hoc continuity correction.

2.1 The summary ROC method

The summary ROC curve method was first proposed by Moses et al.²² Reflecting the trade-off between sensitivity and specificity caused by implicit thresholds, this method had been widely used in diagnostic tests studies. As test threshold varies, the observed Se and Sp estimates can form a concave shape for the ROC curve. Such a curve can be fitted by back-transforming the linear relationship between the logit transformations of Se and Sp to the ROC space: First, if some studies have n_i₁₁ =0 or n_i₀₀ =0, an ad hoc continuity correction is applied by adding 0.5 to each of the 4 cells of such studies. After the correction, sensitivity is estimated as ${\hat{S e}}_{i} = (n_{i 11} + 0.5) / (n_{i 1 +} + 1)$ and specificity is estimated as ${\hat{S p}}_{i} = (n_{i 00} + 0.5) / (n_{i 0 +} + 1)$ for the i^th study. Second, define variables S and D as the sum and the difference of logit transformed sensitivity and specificity, such that $S_{i} = logit ({\hat{S e}}_{i}) + logit ({\hat{S p}}_{i})$ and $D_{i} = logit ({\hat{S e}}_{i}) - logit ({\hat{S p}}_{i})$ , where logit(p) = log(p / (1– p)). This notation is slightly different than Moses et al.²² because the original transformation is on Se and one minus Sp (1– Sp). One can see that $S_{i} = log ({\hat{O R}}_{i})$ , where ${\hat{O R}}_{i} = \frac{n_{i 11}}{n_{i 10}} / \frac{n_{i 01}}{n_{i 00}}$ is the diagnostic odds ratio for the i^th study. Third, for N studies, fit a linear regression line S = a + bD either by an ordinary least squares or by a weighted least squares method weighing by the inverse of within-study variance $var {(log ({\hat{O R}}_{i}))}^{- 1}$ , where $var (log ({\hat{O R}}_{i})) = 1 / n_{i 11} + 1 / n_{i 10} + 1 / n_{i 01} + 1 / n_{i 00}$ .⁵ After fitting the regression line using either un-weighted or weighted method, one can plot the summary ROC curve by the two estimated coefficients (i.e., intercept â and slope b̂),

S e = {[1 + e^{- a / (1 - \hat{b})} \times {(S p / (1 - S p))}^{(1 + \hat{b}) / (1 - \hat{b})}]}^{- 1},

(1)

with Se on the y-axis and 1– Sp on the x-axis. To adjust for study-level covariates Z (e.g., different anatomical sites from which the diagnostic tests were obtained), one can fit a model with S_i = a +bD_i + cZ_i. We can then have S_i = â + b̂D_i +ĉZ_i = (â + ĉZ_i) + b̂D_i = â′ + b̂′D_i. The summary ROC curve can be plotted according to new estimates â′ and b̂′ given Z.

The summary ROC method is easy to perform but suffers limitations. First, its interpretation is known to be problematic. Walter discussed the interpretation of area under the curve (AUC).²³ A summary ROC curve located closer to the left upper corner of the ROC space will have a larger AUC, indicating better predictive accuracy of a test.²³ However, the conclusion becomes unreliable when comparing tests whose summary ROC curves may cross each other. Alternative statistics, such as the partial AUC²⁴ and the Q point²⁵ also have limited application. Second, the model setting has some drawbacks. First, because $S_{i} = log ({\hat{O R}}_{i})$ , the data are reduced to one outcome measure per study: diagnostic odds ratio. Independent summaries of sensitivity and specificity are not available, which could be important in test evaluation. Also, the model is restricted in that the between-study heterogeneity can only be adjusted by study level covariates, such that some components of the variance might not be explained. This is the reason why both Moses et al.²² and Irwig et al.²⁶ recommended the unweighted least squares rather than the weighted, as in a fixed effect model, a few large studies may dominate the result if the between-study variation is present. In addition, in practice, study characteristics besides the cut-point effect contribute to the trade-off between sensitivity and specificity within a study,^{22, 27} which are not incorporated in the summary ROC curves. Finally, an arbitrary continuity correction is needed to handle zero cells. Moses showed that it can push the summary ROC curve far from the ideal upper left corner of the ROC space, giving biased results.²⁴ Moreover, there is a long-standing debate on what arbitrary number should be added to handle zero cells.^{28, 29}

2.2 A Bivariate Approach Based on Linear Mixed Models

To improve upon the summary ROC method, Reitsma et al. proposed a bivariate LMM.¹¹ The model proceeds as follows. First, logit transforms of the sensitivity and specificity are applied to each study. Different from the summary ROC method, they are considered as random by allowing variation according to normal distributions, that is $logit ({S e}_{i}) ~ N (μ_{0}, σ_{μ}^{2})$ and $logit ({S p}_{i}) ~ N (ν_{0}, σ_{ν}^{2})$ . A bivariate normal distribution can include possible correlation between sensitivity and specificity within study: $(\begin{matrix} logit ({S e}_{i}) \\ logit ({S p}_{i}) \end{matrix}) ~ N ((\begin{matrix} μ_{0} \\ ν_{0} \end{matrix}), \sum)$ , where $\sum = (\begin{matrix} σ_{μ}^{2} & σ_{μ ν} \\ σ_{μ ν} & σ_{ν}^{2} \end{matrix})$ and σ_μν denotes the covariance between logit sensitivity and specificity.

Second, to account for the sampling variation, the estimated logit sensitivity and specificity are assumed to be normally distributed as $(\begin{matrix} logit ({\hat{S e}}_{i}) \\ logit ({\hat{S p}}_{i}) \end{matrix}) ~ N ((\begin{matrix} logit ({S e}_{i}) \\ logit ({S p}_{i}) \end{matrix}), C_{i})$ for study i, where C_i is a diagonal matrix with components of $var (logit ({\hat{S e}}_{i})) = {n_{i 11}}^{- 1} + {n_{i 10}}^{- 1}$ and $var (log it ({\hat{S p}}_{i})) = {n_{i 01}}^{- 1} + {n_{i 00}}^{- 1}$ . Note that, the general rule that $n_{i 1 +} \hat{{S e}_{i}}, n_{i 1 +} (1 - \hat{{S e}_{i}}), n_{i 0 +} \hat{{S p}_{i}}$ , and $n_{i 0 +} (1 - \hat{{S p}_{i}})$ are at least five need to hold for normal approximation to be valid. Consequently, $logit ({\hat{S e}}_{i})$ and $logit ({\hat{S p}}_{i})$ are assumed to have the following bivariate normal distribution:

(\begin{matrix} logit ({\hat{S e}}_{i}) \\ logit ({\hat{S p}}_{i}) \end{matrix}) ~ N ((\begin{matrix} μ_{0} \\ ν_{0} \end{matrix}), \sum + C_{i}) .

(2)

Because the distributions of sensitivity and specificity are often skewed, one may prefer inference based on the medians rather than means as overall diagnostic test performance summaries. Based on parameter estimates, the median sensitivity and specificity can be back-transformed as ${\hat{S e}}_{M} = {logit}^{- 1} ({\hat{μ}}_{0})$ and ${\hat{S p}}_{M} = {logit}^{- 1} ({\hat{ν}}_{0})$ . Similarly, confidence intervals for ${\hat{S e}}_{M}$ and ${\hat{S p}}_{M}$ can be transformed from the confidence intervals of μ̂₀ and ν̂₀. The correlation between sensitivity and specificity can be estimated as $\frac{{\hat{σ}}_{μ ν}}{{\hat{σ}}_{μ} \times {\hat{σ}}_{ν}}$ . The standard errors are $S E ({\hat{S e}}_{M}) = \frac{S E (\hat{μ})}{1 / {\hat{S e}}_{M} + 1 / (1 - {\hat{S e}}_{M})}$ and $S E ({\hat{S p}}_{M}) = \frac{S E (\hat{ν})}{1 / {\hat{S p}}_{M} + 1 / (1 - {\hat{S p}}_{M})}$ based on the Delta method. A summary ROC curve can be constructed by

logit (S e) = {\hat{μ}}_{0} + \frac{{\hat{σ}}_{μ ν}}{{\hat{σ}}_{ν}^{2}} (logit (S p) - {\hat{ν}}_{0}) .

(3)

In general, this approach is superior to the summary ROC model by analyzing sensitivity and specificity jointly in a bivariate linear mixed model. However, the bivariate approach estimates the degree of correlation between sensitivity and specificity, as well as both within- and between-study variation in the two indexes separately. A drawback of this approach is that an ad hoc continuity correction is required in the presence of zero cells, as with the summary ROC approach. In addition, the normal approximation is sometimes violated in practice¹². The bivariate model can adjust for covariates by regression model for the mean vector of the bivariate normal distribution: $(\begin{matrix} logit ({S e}_{i}) \\ logit ({S p}_{i}) \end{matrix}) ~ N ((\begin{matrix} μ_{0} + γ Z_{i} \\ ν_{0} + λ Z_{i} \end{matrix}), \sum)$ , where Z_i is the study-level covariate and γ, λ are the corresponding coefficient parameters.⁵ Adjusting for individual level covariates is also straightforward.

2.3 The Hierarchical summary ROC Approach

Rutter and Gatsonis proposed a hierarchical summary ROC approach,³ which is a simplification of the ordinal regression model by Tosteson and Begg: g(γ_j (x)) =(θ_j –α′x)e^β^′^x, where g(.) is a link function, γ_j (x) is the probability of a response being in one of the ordered categories given covariates x, θ_j is the cutoff values of each category, α is the location parameters and β is the scale parameter.³⁰ The hierarchical summary ROC approach reduces the ordinal regression model to two categories (j=1,2), with x indicates true disease status (coded as 0.5 for D+ and −0.5 for D−) and γ_j (x) correspond to positive test rates: Se_i and 1– Sp_i (FPR).³

The first stage of this model assumes binomial distributions of the number of positive outcomes in the i^th study, i.e., n_i₁₁ ~ Bin(n_i₁₊, Se_i) and n_i₀₁ ~ Bin(n_i₀₊,1 – Sp_i). Choose g(.) to be a logit link, the model is written as,

logit ({S e}_{i}) = (θ_{i} + 0.5 α_{i}) e^{- 0.5 β}, logit (1 - {S p}_{i}) = (θ_{i} - 0.5 α_{i}) e^{0.5 β},

(4)

where the latter is the same as logit(Sp_i) = −(θ_i – 0.5α_i)e^0.5^β. The positivity criterion θ_i models the tradeoff between sensitivity and specificity in each study. Direct interpretations of the accuracy parameters α_i are that when β = 0, α_i =logit(Se_i) + logit(Sp_i) = log(DOR_i), which is independent of θ_i. In the second stage, Rutter and Gatsonis allow θ_i and α_i to vary across studies.³ Thus, θ_i and α_i are assumed independently and normally distributed as: $(\begin{matrix} θ_{i} \\ α_{i} \end{matrix}) ~ N ((\begin{matrix} θ_{0} \\ α_{0} \end{matrix}), (\begin{matrix} σ_{θ}^{2} & 0 \\ 0 & σ_{α}^{2} \end{matrix}))$ .

A summary ROC curve can be derived based on solving functions in (4) as

logit ({S e}_{i}) = α_{i} e^{- β / 2} + e^{- β} logit (1 - {S p}_{i}) .

Another possible construction of a summary ROC curve pointed out by Chu et al.¹³ is based on the bivariate normal distribution of θ_i and α_i as

logit (S e) = e^{- 0.5 \hat{β}} (0.5 {\hat{α}}_{0} + {\hat{θ}}_{0}) + \frac{0.25 {\hat{σ}}_{α}^{2} - {\hat{σ}}_{θ}^{2}}{0.25 {\hat{σ}}_{α}^{2} + {\hat{σ}}_{θ}^{2}} \times e^{- \hat{β}} [logit (S p) - e^{0.5 \hat{β}} (0.5 {\hat{α}}_{0} - {\hat{θ}}_{0})] .

(5)

In addition, Arends et al. discussed several choices of SROC curves.¹⁰ Median sensitivity and specificity estimates are ${\hat{S e}}_{M} = {1 + exp [- ({\hat{θ}}_{0} + 0.5 {\hat{α}}_{0}) e^{- 0.5 \hat{β}}]}^{- 1}$ and ${\hat{S p}}_{M} = {1 + exp [({\hat{θ}}_{0} - 0.5 {\hat{α}}_{0}) e^{0.5 \hat{β}}]}^{- 1}$ . Also, similar as the previous models, the hierarchical summary ROC approach can incorporate study level covariates by $(\begin{matrix} θ_{i} \\ α_{i} \end{matrix}) ~ N ((\begin{matrix} θ_{0} + γ Z_{i} \\ α_{0} + λ Z_{i} \end{matrix}), (\begin{matrix} σ_{θ}^{2} & 0 \\ 0 & σ_{α}^{2} \end{matrix}))$ .

The hierarchical summary ROC approach incorporates both within- and between-study variability and the correlation between the summary statistics by random effects θ_i and α_i. Because sparse data is common in meta-analysis of diagnostic tests, an important advantage over the previous models is that the hierarchical summary ROC approach avoids the continuity correction by assuming the exact binomial distributions.³ A practical limitation of this model is that originally it was fitted using Bayesian Markov Chain Monte Carlo approach implemented in BUGS, which requires some programming expertise. This approach is found to be the same as the following bivariate GLMM with alternative parameterizations in some situations.

2.4 The Bivariate Generalized Linear Mixed Model

Chu and Cole presented a bivariate GLMM to jointly analyze sensitivity and specificity using logit link.¹² Later, the bivariate GLMM was broadened to a general link function.³¹ The model starts with binomial distribution assumptions and applies link functions on the probability parameters:

n_{i 11} ~ Bin (n_{i 1 +}, {S e}_{i}), n_{i 00} ~ Bin (n_{i 0 +}, {S p}_{i}), g ({S e}_{i}) = μ_{0} + μ_{i}, g ({S p}_{i}) = ν_{0} + ν_{i},

(6)

where μ_i and ν_i are random effects follow bivariate normal distribution $(\begin{matrix} μ_{i} \\ ν_{i} \end{matrix}) ~ N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{μ}^{2} & ρ σ_{μ} σ_{ν} \\ ρ σ_{μ} σ_{ν} & σ_{ν}^{2} \end{matrix})]$ , and g(.) is a link function such as the logit, probit, or complimentary log-log link. Different link functions can be applied to sensitivity and specificity. Though to date the logit link is the most widely used in meta-analysis, Chu et al. argued that, for some meta-analyses, the choice of the link may affect model fit and inference.³¹ The parameters $σ_{ε}^{2}, σ_{μ}^{2}, σ_{ν}^{2}$ estimate the between-study variances and ρ_εμ, ρ_εν, ρ_μν explain possible correlations.

The model gives median estimates as ${\hat{S e}}_{M} = {logit}^{- 1} ({\hat{μ}}_{0})$ and ${\hat{S p}}_{M} = {logit}^{- 1} ({\hat{ν}}_{0})$ . Similarly, confidence intervals for ${\hat{S e}}_{M}$ and ${\hat{S p}}_{M}$ can be transformed from the confidence intervals of μ̂₀ and ν̂₀. Study-level covariate Z can be included as g(Se_i) = μ₀ + μ_i + γZ_i and g(Sp_i) = ν₀ +ν_i + λZ_i, where γ, λ are corresponding coefficient parameters. Different covariates could be used for sensitivity and specificity. A regression line of g(Se) on g(Sp), $g (S e) = {\hat{μ}}_{0} + \hat{ρ} \frac{{\hat{σ}}_{μ}}{{\hat{σ}}_{ν}} [g (S p) - {\hat{ν}}_{0}]$ , gives the summary ROC curve by transforming to the ROC space. Also, alternative choices of the regression lines can construct different summary ROC curves with corresponding interpretations.¹⁰

In addition to estimating the heterogeneity and correlation parameters, both hierarchical summary ROC and bivariate GLMM approaches have advantages over the bivariate LMM. First, the bivariate GLMM does not require the normal approximation to estimate $var (logit (\hat{{S e}_{i}}))$ and $var (logit (\hat{{S p}_{i}}))$ . Second, neither of the two approaches requires a continuity correction because direct calculation of study-specific sensitivities and specificities is not involved. In the absence of study-level covariates, the two approaches are equivalent (with alternative parameterizations).⁶

Both hierarchical summary ROC and bivariate GLMM can be fitted using maximum likelihood. Several numerical methods might be used, for instance, the dual quasi-Newton optimization techniques, as implemented in the SAS procedure NLMIXED. The standard errors and confidence intervals for parameters are estimated by the Delta method and are reported automatically if specified in the ESTIMATE statement. To restrict the correlation coefficient ρ in the range [−1, 1] in the bivariate GLMM, one can use the Fisher’s z transformation of ρ. AUC for both hierarchical summary ROC and bivariate GLMM can be computed by numerical integration implemented in a SAS macro, which is available upon request from the first author.

2.5 The Trivariate Generalized Linear Mixed Model

The above approaches involving only sensitivities and specificities work best if all or the majority of the studies use case-control designs. When disease prevalence estimation is allowed, as in cohort study designs, we can derive other clinically interesting indices such as positive and negative predictive values. In this case, the test performance indexes Se and Sp can be correlated with the prevalence, which is commonly termed ‘spectrum bias’.³² Such dependence is particularly of concern when the binary diagnostic outcome is based on a cut-off point on a continuous trait, thus misclassification rates could be higher among subjects with true value near the cut point.³³ To account for this potential dependence, Chu et al. extended the bivariate GLMM to a trivariate GLMM jointly modeling the disease prevalence, sensitivity and specificity.¹⁷ Recently, Li and Fine proposed a Pearson-type correlation coefficient to assess this dependence by an estimating equation-based regression framework.³⁴

Here, we consider a trivariate GLMM based on the parameterization of π_i, Se_i and Sp_i, where π_i is the disease prevalence in the i^th study. The first level of this model assumes binomial distributions:

n_{i 1 +} ~ Bin (n_{i + +}, π_{i}), n_{i 11} ~ Bin (n_{i 1 +}, {S e}_{i}), n_{i 00} ~ Bin (n_{i 0 +}, {S p}_{i}) .

(7)

The parameters are modeled via link functions: g(π_i) = ε₀ + ε_i, g (Se_i) = μ₀ + μ_i and g (Sp_i) =ν₀ +ν_i. See Table 2 a two by two table accounting for disease prevalence.

Table 2.

2 by 2 table for ith study accounting for disease prevalence

Diagnostic Test	Reference Test		Total
Diagnostic Test	Positive (+)	Negative (−)	Total
Positive (+)	n_i₁₁	n_i₀₁
Positive (+)	π_i Se_i	(1−π_i)(1− Sp_i)
Negative (−)	n_i₁₀	n_i₀₀
Negative (−)	π_i (1−Se_i)	(1−π_i)(Sp_i)

Total	n_i₁₊	n_i₀₊	n_i₊₊
	π_i	1−π_i	1

Open in a new tab

To consider heterogeneity and potential correlations of the 3 parameters, ε_i, μ_i and ν_i are assumed to be random effects with trivariate normal distribution:

(\begin{matrix} ε_{i} \\ μ_{i} \\ ν_{i} \end{matrix}) ~ N ((\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), \sum), where \sum = (\begin{matrix} σ_{ε}^{2} & ρ_{ε μ} σ_{μ} σ_{ε} & ρ_{ε ν} σ_{ν} σ_{ε} \\ σ_{μ}^{2} & ρ_{μ ν} σ_{μ} σ_{ν} \\ σ_{ν}^{2} \end{matrix}) .

The parameters $σ_{ε}^{2}, σ_{μ}^{2}, σ_{ν}^{2}$ capture the between-study variance of the disease prevalence, sensitivity and specificity while ρ_εμ, ρ_εν, ρ_μν represent correlations.

Standard software such as SAS NLMIXED can maximize this likelihood. To avoid including unnecessary parameters, model selection criteria such as AIC can be used. The medians are derived as π̂_M =g⁻¹(ε̂₀), ${\hat{S e}}_{M} = g^{- 1} ({\hat{μ}}_{0})$ and ${\hat{S p}}_{M} = g^{- 1} ({\hat{ν}}_{0})$ . In this model, covariates can be incorporated for sensitivities, specificities and disease prevalence as was done for the bivariate GLMM.

3. Statistical methods when the reference test is not a gold standard

Limited meta-analysis tools are available when the reference test is imperfect. Walter et al. discussed the latent class model for a meta-analysis of two diagnostic tests.¹⁸ Sadatsafavi et al. presented a latent class random effects model.²⁰ However, beyond prevalence, only one of the sensitivity and specificity can be implemented as a random effect. Dendukuri et al. presented a Bayesian approach, which is an extension of the hierarchical summary ROC model, to adjust for different reference standards.²¹ We describe the latent class random effects model by Chu et al. using random effects to allow variation and correlation in sensitivity, specificity and prevalence between studies.¹⁹

Let (Se_Bi, Sp_Bi) be the pair of diagnostic accuracy parameters for the reference test while (Se_Ai, Sp_Ai) be the pair for the diagnostic test of interest. To construct the 2 by 2 table (Table 3) for such studies, both the above pairs of statistics and the disease prevalence are needed.

Table 3.

2 by 2 table when the reference test is not a gold standard

		Reference test		Total
		+	−	Total
Diagnostic test	+	n_i₁₁	n_i₀₁
	+	p_i ₁₁ =π_iSe_AiSe_Bi + (1−π_i)(1−Sp_Ai)(1−Sp_Bi)	p_i ₀₁ = π_iSe_Ai (1−Se_Bi) + (1−π_i)(1−Sp_Ai)Sp_Bi
	−	n_i₁₀	n_i₀₀
	−	p_i ₁₀ =π_i (1− Se_Ai)Se_Bi + (1−π_i)Sp_Ai (1−Sp_Bi)	p_i ₀₀ = π_i (1− Se_Ai)(1 − Se_Bi) + (1 − π_i)Sp_Ai Sp_Bi

Total		n_i₁₊	n_i₀₊	n_i₊₊
Total		p_i_{1+ =} π_iSe_Bi + (1 −π_i)(1 − Sp_Bi)	p_i ₀₊ = π_i (1 − Se_Bi) + (1−π_i)Sp_Bi	1

Open in a new tab

Se_Ai, Sp_Ai are the sensitivity and specificity for the diagnostic test; Se_Bi, Sp_Bi are sensitivity and specificity for the reference test. π_i is the disease prevalence in the i^th study

The four counts in Table 3 follow a multinomial distribution, with the log-likelihood being:

log L = \sum_{i} {n_{i 11} log (p_{i 11}) + n_{i 10} log (p_{i 10}) + n_{i 01} log (p_{i 01}) + n_{i 00} log (p_{i 00})} .

(8)

Chu et al. used random effects to model between and within study heterogeneity and potential correlations.¹⁹ We write this model in a form suitable for a general link function:

g (π_{i}) = ε_{0} + ε_{i}; g ({S e}_{A i}) = μ_{A 0} + μ_{A i}; g ({S p}_{A i}) = ν_{A 0} + ν_{A i}; g ({S e}_{B i}) = μ_{B 0} + μ_{B i}; g ({S p}_{B i}) = ν_{B 0} + v_{B i},

where random effects follow a multivariate normal distribution: (ε_i, μ_Ai, ν_Ai, μ_Bi, ν_Bi)′ ~ N (0, Σ) with variance-covariance matrix $\sum = (\begin{matrix} σ_{ε}^{2} & ρ_{ε μ_{A}} σ_{ε} σ_{μ_{A}} & ρ_{ε ν_{A}} σ_{ε} σ_{ν_{A}} & ρ_{ε μ_{B}} σ_{ε} σ_{μ_{B}} & ρ_{ε ν_{B}} σ_{ε} σ_{ν_{B}} \\ σ_{μ_{A}}^{2} & ρ_{μ_{A} ν_{A}} σ_{μ_{A}} σ_{ν_{A}} & ρ_{μ_{A} μ_{B}} σ_{μ_{A}} σ_{μ_{B}} & ρ_{μ_{A} ν_{B}} σ_{μ_{A}} σ_{ν_{B}} \\ σ_{ν_{A}}^{2} & ρ_{ν_{A} μ_{B}} σ_{ν_{A}} σ_{μ_{B}} & ρ_{ν_{A} ν_{B}} σ_{ν_{A}} σ_{ν_{B}} \\ σ_{μ_{B}}^{2} & ρ_{μ_{B} ν_{B}} σ_{μ_{B}} σ_{ν_{B}} \\ σ_{ν_{B}}^{2} \end{matrix})$ .

Median estimates of prevalence, sensitivities and specificities can be constructed as π̂_M = g⁻¹(ε̂₀), ${\hat{S e}}_{A M} = g^{- 1} ({\hat{μ}}_{A 0}), {\hat{S p}}_{A M} = g^{- 1} ({\hat{ν}}_{A 0}), {\hat{S e}}_{B M} = g^{- 1} ({\hat{μ}}_{B 0})$ and ${\hat{S p}}_{B M} = g^{- 1} ({\hat{ν}}_{B 0})$ . Variance and correlation parameter estimates can be derived from Σ̂. Covariates Z_i can be adjusted by linear regressions for the mean vectors, for instance g(π_i) =ε₀ +ε_i +γZ_i.

This latent class random effects model fills a gap in the existing models for meta-analysis with imperfect reference tests. This model can be used to evaluate the performance of both the diagnostic test of interest and the reference test while retaining all the advantages of the GLMMs. A limitation applies when fitting this model by SAS NLMIXED. One may encounter convergence problems because of the limited number of studies and relatively large number of parameters. Possible simplification of model assumptions may include letting disease prevalence be independent of sensitivities and specificities. Also, to avoid including unnecessary random effects whose variance approaches zero, one can apply a forward selection based on AIC. We will illustrate this process in Section 4.2 with an example.

4. Case Study

4.1 A meta-analysis of rotator cuff tears diagnosis using ultra-sound

4.1.1 Study background

We demonstrate an application of the methods in Section 2 using data on ultra-sound diagnosis of rotator cuff tears. Rotator cuff tears are a common reason for shoulder pain, which is the third most common musculoskeletal complaint. The incidence of partial rotator cuff tears is reported to be 13% to 32% in cadaveric studies, yet much of this incidence goes undiagnosed.³⁵ Among the diagnostic tests for this disease, ultrasound is non-invasive and less expensive. However, it has lower sensitivity and specificity in detecting the disease than MRI or arthroscopic evaluation.³⁶ We will re-analyze the data from a meta-analysis of 30 studies of diagnostic accuracy of ultrasound for rotator cuff tears in adults, performed by Smith et al.³⁷ The studies compared the accuracy of ultrasound with either arthroscopic or open surgical findings as a gold standard test. The data is presented in Appendix A1. Figure 1 and 2 present the forest plots of sensitivity and specificity, respectively. In the rest of this section, we explore this example using the models discussed in Section 2. The corresponding SAS code can be found in Appendix B1-B6.

Forest plot for sensitivity in rotator cuff tears study

Forest plot for specificity in rotator cuff tears study

4.1.2 Summary ROC method

Applying the summary ROC method, we analyze the data first by unweighted least squares, then by weighted least squares. The un-weighted method gives estimates â =3.39, b̂ =0.131 and AUC=0.911. The AUC can be interpreted as a likelihood of 91.11% that a randomly selected diseased subject will receive a more suspicious rating than a non-diseased subject. The weighted method give estimates â_w =3.573, b̂_w =0.400 and AUC_w=0.910. To build the summary ROC curve, we plug in â and b̂ (â_w and b̂_w) into equation (1) then plot Se against 1–Sp. The summary ROC curves are presented in Figure 3.

Summary median estimates and ROC curves from some of the introduced models. Panel A presents summary median Se and Sp estimates with confidence and predictive regions and summary ROC curve from the bivariate GLMM using logit link. Panel B presents summary median Se and Sp estimates with confidence and predictive regions and summary ROC curve from the BLMM and the summary ROC curve from the unweighted summary ROC method.

4.1.3 Bivariate LMM

To fit the bivariate linear mixed model, we use the SAS procedure MIXED. The bivariate LMM method can provide summary estimates of sensitivity and specificity other than the summary ROC curve. Parameter estimates are: μ̂ =1.351, ν̂ = 1.853, ${\hat{σ}}_{μ}^{2} = 1.040, {\hat{σ}}_{ν}^{2} = 0.399$ , σ̂_μν = −0.116. The sensitivity and specificity are estimated as ${\hat{S e}}_{M} = 0.794$ and ${\hat{S p}}_{M} = 0.865$ . Correlation estimate is ρ̂ = −0.18. The standard errors (SE) can be calculated by delta method: $S E (\hat{S e}) = 0.043$ and $S E (\hat{S p}) = 0.023$ . Plugging in the estimates into the equation (3), one can draw the summary ROC curve as presented in Figure 3. This model gives an AUC of 0.858. With the estimated medians, standard errors and correlation coefficients, one can draw confidence and prediction regions around the median estimates. Compared with the summary ROC method, the Bivariate LMM can provide summary estimates of overall sensitivity and specificity and their confidence regions. It may be more intuitive for investigators to compare different diagnostic tests.

4.1.4 Hierarchical summary ROC model

The hierarchical summary ROC model is fitted using the SAS procedure NLMIXED. Estimates of the parameters are: θ̂₀ = −0.738, σ̂_θ = 0.708, α̂₀ = 3.887, σ̂_α = 1.045 and β̂ = −0.522. The median sensitivity and specificity are ${\hat{S e}}_{M} = 0.827$ with SE 0.042 and ${\hat{S p}}_{M} = 0.888$ with SE 0.021. To draw the summary ROC curve, plug in the estimates into the expected logit sensitivity given specificity as in equation (5), then transform to ROC space, as presented in Figure 3. The AUC is 0.908.

4.1.5 Bivariate GLMM method

The bivariate GLMM models are fitted using the SAS procedure NLMIXED under three link functions: logit, probit and complementary log-log. The ‘estimate’ statements in the NLMIXED procedure can transform the parameter estimates to median sensitivity and specificity and carry out the estimation of standard errors via delta method. Table 4 reports summary indexes with standard errors. When dependence is assumed in the model, the three links give comparable summary estimates. The logit link provides the smallest AIC (214.8), and thus selected as the best fitted model. However, the negative correlation estimate has a large standard error. In fact, if one fit a logit link GLMM assuming independence, the AIC (213.5) is slightly smaller than the correlated model. This example does not strongly support correlation between sensitivity and specificity.

Table 4.

GLMM method estimates and standard errors (SE)

Model	Sensitivity	Specificity	ρ	AUC	σ_μ(SE)	σ_ν(SE)	AIC	-2logL
logit link	0.827 (0.042)	0.888 (0.021)	−0.298 (0.330)	0.908 (0.049)	1.143 (0.243)	0.679 (0.208)	214.8	204.8
logit link-independence	0.826 (0.042)	0.887 (0.021)	0	0.902 (0.026)	1.138 (0.241)	0.678 (0.206)	213.5	205.5
probit link	0.817 (0.042)	0.885 (0.021)	−0.312 (0.325)	0.915 (0.051)	0.636 (0.132)	0.359 (0.111)	215.2	205.2
c-log-log link	0.801 (0.045)	0.882 (0.021)	−0.329 (0.321)	0.925 (0.041)	0.560 (0.119)	0.284 (0.090)	215.9	205.9

Open in a new tab

AUC denoted the area under the summary ROC curve. The boldfaced cells represent the best chosen model based on AIC. The ‘logit link-independece’ model uses logit link and assumes independence between sensitivity and specificity while the other models assume dependence.

To summarize estimates from bivariate models, we compare the bivariate LMM method, hierarchical summary ROC model and GLMM model using logit link. The summary ROC curves and confidence and prediction ellipses of these models are presented in Figure 3. Hierarchical summary ROC and GLMM models achieve same sensitivity and specificity median estimates and standard errors, which agrees with the argument by Harbord et al. that the two models are the same with different parameterizations.⁶ The bivariate LMM model has lower estimates of sensitivity and specificity. The differences may be due to the continuity correction applied in bivariate LMM and the some degrees of approximation involved in the MIXED procedure when study size is small.⁶ A simulation study from Chu and Cole demonstrated that the GLMM method provides unbiased estimates while the bivariate LMM model has biased estimates of Se_M, Sp_M and ρ.¹²

4.1.6 Trivariate GLMM

When the prevalence of disease is involved as in a trivariate model, case-control studies need to be excluded. All our studies included satisfy the 1st criterion in the QUADAS checklist which requires random selection of the sample.³⁷

To successfully capture possible correlations without including unnecessary correlations, we fit models with all possible correlation combinations. The parameters and desired estimates, AIC and log-likelihoods are summarized in Table 5. The best model with the smallest AIC of 2653.8 is model I with no correlations (boldfaced estimates in Table 5). This suggests no correlations among disease prevalence, sensitivity and specificity in this example. This conclusion agrees with the bivariate GLMM and the estimated median sensitivity and specificity are similar as the estimates from bivariate GLMM method using logit link in Section 4.1.5. This example shows that, when the prevalence is weakly correlated with sensitivity and specificity, the bivariate GLMM gives very similar estimates to that from the trivariate GLMM.

Table 5.

Trivariate model parameter estimates and standard errors

	Se (SE)	Sp (SE)	Disease prevalence (SE)	Test prevalence (SE)	σ_μ (SE)	σ_ν (SE)	σ_ε (SE)	ρ_μν (SE)	ρ_εμ (SE)	ρ_εμ (SE)	AIC	-2logL
model I	0.826 (0.041)	0.887 (0.021)	0.448 (0.075)	0.433 (0.058)	1.123 (0.236)	0.671 (0.203)	1.387 (0.260)	0	0	0	2653.8	2641.8
model II	0.827 (0.041)	0.888 (0.021)	0.448 (0.075)	0.433 (0.059)	1.130 (0.238)	0.673 (0.205)	1.387 (0.260)	−0.311 (0.328)	0	0	2655.0	2641
model III	0.826 (0.042)	0.877 (0.026)	0.448 (0.076)	0.438 (0.062)	1.138 (0.241)	0.715 (0.235)	1.396 (0.263)	0	0	−0.401 (0.471)	2654.5	2640.5
model IV	0.822 (0.042)	0.887 (0.021)	0.447 (0.075)	0.430 (0.060)	1.109 (0.237)	0.678 (0.206)	1.383 (0.260)	0	0.176 (0.256)	0	2654.5	2640.5
model V	0.823 (0.042)	0.879 (0.026)	0.447 (0.075)	0.435 (0.062)	1.117 (0.240)	0.702 (0.229)	1.381 (0.260)	0	0.122 (0.270)	−0.328 (0.547)	2656.2	2640.2
model VI	0.828 (0.042)	0.880 (0.025)	0.448 (0.076)	0.437 (0.062)	1.146 (0.244)	0.697 (0.223)	1.396 (0.263)	−0.247 (0.340)	0	−0.325 (0.516)	2655.9	2639.9
model VII	0.823 (0.042)	0.888 (0.021)	0.447 (0.075)	0.429 (0.061)	1.114 (0.238)	0.681 (0.208)	1.384 (0.259)	−0.284 (0.329)	0.165 (0.256)	0	2655.9	2639.9
model VIII	0.823 (0.042)	0.881 (0.025)	0.447 (0.075)	0.434 (0.064)	1.117 (0.239)	0.703 (0.226)	1.387 (0.261)	−0.298 (0.323)	0.175 (0.256)	−0.343 (0.502)	2657.5	2639.5

Open in a new tab

Model I–VIII is trivariate GLMM with all possible combinations of correlation parameters. Model I assumes all correlation coefficient ρ_μν,ρ_εμ and ρ_εν equal to 0. Model II, III, IV assume only one of the correlation coeffecients is not 0: ρ_μν ≠ 0, ρ_εν ≠ 0 and ρ_εμ ≠ 0, respectively. Model V–VII assumes two of the correlation coefficients are not 0: ρ_εμ ≠0 and ρ_εν ≠ 0, ρ_μν ≠0 and ρ_εν ≠ 0, ρ_μν ≠0 and ρ_εμ ≠ 0, respectively. Model VII assumes none of ρ_μν, ρ_εμ and ρ_εν are 0.

4.2 A meta-analysis of cervical cancer diagnosis using Pap smears test

In this section, we re-visit the example used by Walter et al. ¹⁸ and apply the latent class random effects models. The data is collected from a meta-analysis of Papanicolau (Pap) smears test accuracy by Fahey et al. The Pap smear is a quick, noninvasive and relatively inexpensive test for cervical cancer.³⁸ Fahey’s analysis consists of 59 cross-sectional studies using Pap smears as the diagnostic test and histology as the gold standard. However, Walter’s model argued that the histology test has sensitivity of 0.97 and specificity of 0.62, revealing lack of a perfect gold standard.¹⁸ Hence we will treat histology as an imperfect reference test then fit the data by the latent class random effects models in Section 3. The data is listed in Appendix A2 and corresponding SAS code is included in Appendix B7.

When fitting the model using the SAS procedure NLMIXED, convergence problems appeared as more random effects were added. Thus we assume prevalence to be independent of sensitivities and specificities for ease of fitting and apply a forward-selection procedure to select random effects. We begin with a fixed effects model, and add random effects sequentially. The process of selection is outlined in Table 6. The final model obtained is IVe, in which random effects are considered for the disease prevalence, Pap smear test sensitivity and specificity and the specificity of the histology test. The parameter estimates of the best fitted models at each step are provided in Table 7.

Table 6.

Pap test example – model selection procedure

Models	Random effects	-2logL	AIC	BIC
I	NA	45277	45287	45297

IIa	ε	41882	41894	41906
IIb	μ_A	43329	43341	43353
IIc	μ_B	43398	43410	43423
IId	ν_A	43520	43532	43544
IIe	ν_B	42838	42850	42863

IIIa	ε & μ_A	40510	40524	40539
IIIb	ε & μ_B	40888	40902	40917
IIIc	ε & ν_A	40894	40908	40922
IIId	ε & ν_B	40520	40534	40548

IVa	ε, μ_A & μ_B	39777	39793	39810
IVb	ε, μ_A & ν_A	39762	39778	39795
IVc	ε, μ_A & ν_B	40506	40522	40539
IVd	ε, μ_A, μ_B & ρ_{μ_Aμ_B}	39777	39795	39814
IVe	*ε, μ_A,* ν_A & ρ_{μ_Aμ_B}	39752	39770	39789
IVf	ε, μ_A, ν_B & _{ρμ_Aν_B}	40503	40521	40540

Open in a new tab

Models in level I–IV include random effects and possible correlations denoted in the corresponding ‘random effects’ column. The procedure starts from the fixed effects model I. In Level 2, five possible random effects are added one at a time. Model IIa with random effect ε (prevalence) has smallest AIC, thus ε is carried to models in level 3. The same process continued until level IV because model fitting became unstable with more random effects than level IV and AIC was not significantly reduced anymore. The bold faced estimates represents the best model with smallest AIC in each level.

Table 7.

Pap test example—fitted estimates and standard errors

	Model Parameter Estimates (standard error)
	I	IIa	IIIa	IVe
Se_pap test( ${\hat{S e}}_{A M}$ )	0.815(1.420)	0.750(0.006)	0.664(0.043)	0.655(0.042)
Sp_pap testa( ${\hat{S p}}_{A M}$ )	0.810(1.531)	0.795(0.010)	0.822(0.010)	0.835(0.032)
Se_reference( ${\hat{S e}}_{B M}$ )	0.842(1.418)	0.858(0.010)	0.829(0.009)	0.903(0.013)
Sp_reference( ${\hat{S p}}_{B M}$ )	0.803(1.629)	0.900(0.009)	0.977(0.012)	0.989(0.014)
Prevalence(π̂_M)	0.527(1.708)	0.588(0.061)	0.712(0.050)	0.636(0.048)
σ_ε	NA	1.819(0.195)	1.727(0.194)	1.467(0.164)
σ_{μ_A}	NA	NA	1.367(0.147)	1.292(0.136)
σ_{ν_A}	NA	NA	NA	1.269(0.164)
σ_{μ_B}	NA	NA	NA	NA
σ_{ν_B}	NA	NA	NA	NA
ρ_{μ_Aν_A}	NA	NA	NA	−0.509(0.136)

Open in a new tab

Model I, IIa, IIIa and IVe are the same models specified in Table 6.

After adjustment for possible variation and correlations by random effects in our method, the final model IVe shows a low sensitivity for the Pap smears of 0.655 (SE=0.042) and a specificity of 0.835 (SE=0.032). However, the histology test outperforms the Pap smears with sensitivity of 0.903 (SE=0.013) and specificity of 0.989 (SE=0.014). Moreover, our estimates of the histology test differ from the estimates in Walter’s, suggesting a somewhat different interpretation in practice.¹⁸

5. Discussion

In this paper, we discussed methods for evaluating the performance of diagnostic tests for situations when the reference test can be considered a gold standard, as well as situations when it is error-prone. Under the scenario with a gold standard, we reviewed the traditional summary ROC method, bivariate LMM and the hierarchical summary ROC model. Then we focused on the random effect GLMM, because it has several advantages over the simpler methods. We showed how the bivariate GLMM can be fitted using a variety of link functions including logit, probit and complementary log-log, and extended the approach to a trivariate GLMM to jointly model prevalence, sensitivity and specificity. Under the situation with no gold standard, we built upon the latent class model proposed by Walter et al.¹⁸ by adding random effects to quantify possible correlation and variation following the methods by Chu et al..¹⁹ We worked through two empirical examples to illustrate the application of our models. We used the SAS procedures MIXED and NLMIXED to fit all models, and provide SAS code with detailed explanation in the Appendix. The SAS macro METADAS may assist in automating the fitting of bivariate and hierarchical summary ROC models for meta-analysis of diagnostic tests.³⁹

Several extensive simulation studies have been conducted in the literature to compare different methods. Hamza et al.⁴⁰ studied the univariate exact binomial likelihood approach against the univariate approximate normal likelihood approach in different simulation settings. The size of meta-analysis varied from 10 to 100 studies and the true median sensitivity values ranged from 0.6 to 0.93. Overall the simulations showed that the exact likelihood approach performs superior than the approximate approach in terms of bias and coverage probabilities. Riley et al.⁴¹ compared the bivariate random-effects meta-analysis dealing with dependence between two outcomes to the univariate random-effects meta-analysis. Simulation studies showed that the bivariate approach has smaller mean-square error and is recommended over the univariate approach. Chu et al.¹² conducted simulations to study the bivariate GLMM and the BLMM approaches. Size of meta-analysis varied from 25 to 250, and Se/Sp was either relatively low (0.7/0.8) or relatively high (0.9/0.95). The bivariate GLMM was shown to yield unbiased estimates of Se, Sp and their correlation, while the BLMM gave biased results. Another paper of Chu et al.³¹ conducted simulations to compare different links used in bivariate GLMM with 40 meta-studies, 200 subjects in each study and median Se/Sp as 0.8/0.9. It suggested that the AUC and median Se/Sp estimates are relatively robust to the choice of link functions. The trivariate GLMM and bivariate GLMM were compared in Chu et al.¹⁷ under different correlation assumptions. The results suggested that misspecification resulting from AIC-based model selection is reasonably low in studied settings. When the reference test is imperfect, Chu et al.¹⁹ used different selection criteria DIC, AIC and BIC on selecting the appropriate random effects. The simulation results recommended including random effects because omitting important variability can cause inflated variance and decreased coverage.

Among the models presented, the summary ROC approach is simple and widely used. However, it is limited as it does not assess the within- and between-study variations and possible correlations between Se and Sp. The bivariate LMM improves over the summary ROC approach by assuming random effects to explain both within- and between-study variations and possible correlations. The bivariate LMM can provide inferences both in terms of summary ROC curves and summary statistics of overall test performance. However, it has limitations due to the use of a continuity correction and a normal approximation. The GLMMs do not have the limitations of the above models because they assume exact binomial distributions. The bivariate GLMM, which is essentially the same as the hierarchical summary ROC model in certain situations, is recommended when research interests focus on sensitivity and specificity and there’s strong suggestion of independence with disease prevalence. The trivariate GLMM will be most appropriate when there’s interest in estimating PPV or NPV, because estimation of disease prevalence is required and correlation among prevalence and Se, Sp should not be ignored. Besides, the trivariate GLMM is most reliable when most of the studies are cohorts. When the reference test is not a gold standard, the latent class random effects model should be used to avoid biased estimates.

A limitation related to the GLMMs is that the meta-analysis reported often includes a mixture of case-control and cohort studies designs. Thus using either the bivariate or the trivariate GLMM for all the studies can lead to problems. Another issue arises when fitting the trivariate GLMM and the latent class random effects models in the SAS procedure NLMIXED. The more random effects included, the longer it takes to converge. Under such situations, one can first get raw estimates of the desired parameters by fitting the data in models with fewer random effects. The raw estimates can then be used as starting values to improve convergence in a more complex model. For the latent class random effects model, one may need to apply simpler assumptions for ease of fitting. For instance, our example assumes independence between prevalence and the paired indices. However, as discussed, dependence between the indices may be expected.

In the example of rotator cuff tears, we excluded seven studies having the partial verification problem to avoid biased results, though these studies might still be able to contribute to our analysis. To the best of our knowledge, multivariate methods to correct publication bias in a meta-analysis of diagnostic test settings still await for further development. A recent Bayesian approach to correct such bias by de Groot et al. may be applied to diagnostic tests with nominal outcomes.⁴² In summary, sensitivity analysis methods for meta-analysis of diagnostic tests investigating the impact of publication bias through a selection or pattern mixture model framework are yet to be developed.

Acknowledgments

This research was supported in part by the U.S. Department of Health and Human Services Agency for Healthcare Research and Quality Grant R03HS020666, P01CA142538 and P30CA77598 from the U.S. National Cancer Institute, and R21AI103012 from the U.S. National Institute of Allergy and Infectious Diseases.

Appendix A. Data for case studies

Appendix A1.

Partial rotator cuff tears meta-analysis data

study	year	True positive	False positive	False negative	True negative
Al-Shawi	2008	65	12	1	65
Alasaarela	1998	1	0	0	19
Brenneke and Morgan	1992	11	8	14	45
Cullen	2007	11	2	3	21
Ferrari	2002	8	1	10	25
Friedman	1993	2	0	2	0
Hedtmann and Fett	1995	121	0	12	0
Iannotti	2005	26	7	2	16
Kang	2009	2	5	2	5
Kayser	2005	41	16	11	171
Labanauskaite	2002	11	3	2	9
Milosavljevic	2005	17	0	7	6
Naqvi	2009	4	2	0	11
Read et al	1998	6	1	7	28
Roberts et al	2001	5	0	2	7
Rutten et al	2010	8	12	0	24
Takagishi	1996	10	7	10	57
Teefey	2000	10	3	5	17
Teefey	2005	13	4	2	52
van Holsbeeck et al	1995	14	3	1	47
Vlychou et al	2009	44	2	3	7
Wiener and Seitz	1993	64	4	3	71
Yen et al	2004	9	1	1	9

Open in a new tab

Appendix A2.

Pap smear test meta-analysis data

Study	Type^*	n_i11	n_i10	n_i00	n_i01
Alloub et al	SC	8	23	84	3
Alons-van Kordelaar and Boon	SC	31	43	14	3
Anderson et al	SC	70	121	25	12
Anderson et al	FU	65	6	6	10
Anderson et al	FU	20	19	4	3
Andrews et al	FU	35	20	156	92
August	FU	39	111	271	7
Bigrigg et al	SC	567	140	157	117
Bolger and Lewis	SC	26	12	18	37
Byme et al	FU	38	17	37	28
Chomet	SC	45	15	48	35
Engineer and Misra	SC	71	10	306	87
Fletcher et al	FU	4	36	5	0
Frisch et al	SC	2	3	21	2
Giles et al	SC	5	3	182	9
Giles et al	FU	38	7	62	21
Gunderson et al	SC	4	16	31	2
Haddad et al	SC	87	12	9	13
Hellberg et al	SC	15	65	15	3
Helmerhorst et al	FU	41	61	29	1
Hirschowitz et al	FU	76	11	12	12
Jones DED et al	FU	10	48	174	4
Jones MH et al	FU	28	28	77	11
Kashimura et al	SC	3	5	1	0
Kealy	FU	79	13	182	26
Koonlng-1 et al	FU	61	27	35	20
Koonlng-2 et al	FU	62	16	49	20
Kwikkel et al	FU	284	68	68	31
Lozowski et al	FU	66	20	44	25
Maggi et al	FU	40	12	47	43
Morrison EAB et al	FU	11	1	2	1
Morrison BW et al	SC	23	10	44	50
Nyirjesy	SC	65	42	13	13
Okagaki and Zelterman	SC	1270	263	1085	927
Oyer and Hanjanl	FU	223	74	83	22
Parker	SC	154	20	237	30
Pearlstone et al	FU	6	12	81	2
Ramlrez et al	SC	7	3	4	4
Reld et al	SC	12	11	60	5
Robertson et al	FU	348	212	103	41
Schauberger et al	SC	8	11	34	4
Shaw	FU	12	6	0	2
Singh et al	FU	95	2	1	9
Skehan et al	FU	40	20	19	18
Smith et al	FU	71	20	18	13
Soost et al	SC	1205	454	241	186
Soutter-1 et al	SC	5	52	27	20
Soutter-2 et al	SC	35	12	12	9
Spitzer et al	FU	10	5	32	31
Stafi	SC	3	3	15	5
Syrjanen et al	FU	118	44	183	40
Szarewski	SC	13	82	17	3
Tait et al	SC	38	13	62	14
Tawa et al	SC	14	67	291	25
Tay et al	FU	12	6	12	14
Upadhyay et al	SC	238	2	16	52
Walker et al	FU	111	20	39	44
Wetrich	FU	491	250	702	164
Wheelock and Kamlnlski	FU	49	39	31	16

Open in a new tab

type of the study denotes the usage of the test clinically, SC as screening and FU as follow up.

Appendix B. SAS codes for fitting models

B1. Unweighted summary ROC

data partial1;                                   | /*‘tp’ stands for true positive, ‘fp’ for
 set partial;                                    | false positive, ‘fn’ for false negative, ‘tn’
 if tp = 0 or fp = 0 or fn = 0 or tn = 0 then do;         | for true negative */
 tp=tp+0.5; fp=fp+0.5; fn=fn+ 0.5; tn=tn+ 0.5;        | /*continuity correction on zero cells*/
 n0=n0+1; n1=n1+1; end;
 se= tp/n1; sp=tn/n0;                             | /*calculate Se and Sp for each study*/
 logitse = log(se/(1-se)); var logitse=1/(se*(1-se)*n1);  | /* logit(Se) and logit(Sp) and their
 logitsp = log(sp/(1-sp)); var logitsp=1/(sp*(1 -sp)*n0); | variances*/
 D=logitse+logitsp; S=logitse-logitsp;                | /* D and S*/
proc reg data=partial1; model D=S; run;             | /*fit linear regression model D=a+bS*/

B2. Weighted summary ROC

data partial2; set partial1;
 w=1/(1/tp+1/fp+1/tn+1/fn);                        | /*calculate the weight for each study*/
proc reg data=partial2; model D=S; weight w; run;     | /*fit weighted regression using the created
                                               | weights*/

B3. SAS MIXED procedure to fit bivariate LMM

data partial3; set partial1; id= n_;          | /* make each study have two observations, one for
 dis=1; non dis=0; logit=logitse;          | sensitivity, the other for specificity.*/
var logit=var logitse; rec+ 1; output;
 dis=0; non dis=1; logit=logitsp;
var logit=var_logitsp; rec+1 ; output; run;
data cov;                             | /* build the data containing variable ‘est’ with 3 starting
 if n eq 1 then do;                     | values for the covariance parameters of the random effects
  est=0; output; est=0; output; est=0 ; output; | and 60 within study arm variances.*/
 end;
 set partial3; est = var_logit; output;
 keep est; run;
proc mixed data=partial3 method=reml cl;   | /*choose the residual (restricted) method(reml), ‘cl’ asks
class id;                               | for confidence limits for covariance parameter estimates.*/
model logit= dis non_dis / noint s cl covb     | /*indicator variables ‘dis’ and ‘non_dis’ are explanatory
df=1000, 1000;                          | variables for logit(Se) and logit(Sp). ‘covb’ asks for
                                      | covariance matrix of fixed effects parameters. Large ‘df”
                                      | approximate a t distribution to a normal distribution.*/
random dis non_dis / subject=id type=un s;    | /* random effects corresponds to disease and non_disease
                                      | status. An unstructured working covariance structure is
                                      | stated to assume possible correlation of ‘dis’ and ‘non_dis’
                                      | within the same study.*/
repeated / group=rec;                     | /* ‘group=rec’ statement specifies the with-in study-arm
                                      | variance in each study.*/
parms / parmsdata=cov hold=4 to 63 ;        | /*‘parmsdata’ option reads in variable ‘est’ from the cov
run;                                   | data. 60 within study-arm variances are kept constant.*/

B4. SAS NLMIXED procedure to fit the hierarchical summary ROC Model

data partial4; set partial; id= n ;                        | /* make each study has two records */
 dis=0.5; ny=tp; n=n1; se=1; output;                     | /*code ‘dis’ as 0.5 and ‘se’=1 for disease
 dis=-0.5; ny=tn; n=n0; se=0 ; output;                    | patients, ‘dis’ as -0.5 and ‘se’=0 for non-
keep id ny n dis se; run;                               | disease subjects*/
proc nlmixed data=partial4;
parms theta=-1 alpha=4 beta=-0.6 sigtheta=0.7 sigalpha=1.7;  | /* assign starting values for parameters.*/
logitp=2*dis*((theta+ut)+(alpha+ua)*dis)*exp(-beta*dis);     | /* code ‘logitp’ as logit(Se) and logit(Sp)
                                                   | */
p=exp(logitp)/(1+exp(logitp));                           | /*logit transform is applied to the
                                                   | probabilities*/
model ny~binomial(n,p);                               | /* number of tp and tn are Bin(n, p)
                                                   | distributed */
random ut ua ~                                       | /* ‘ut’ and ‘ua’ are random effects
normal([0,0],[exp(2*sigtheta),0,exp(2*sigalpha)]) subject=id;  | clustered within study. Independence is
                                                    | assumed between random effects.
                                                    | Exponential formed variance is to ensure
 estimate "se" 1/(1+exp(-((theta+0.5*alpha)*exp(-0.5*beta))));  | positivity.*/
 estimate "sp" 1/(1+exp((theta-0.5*alpha)*exp( 0.5*beta)));     | /*use estimate statement to get
 estimate "sigtheta" exp(sigtheta); estimate "sigalpha"         | estimates of desired indices with standard
exp(sigalpha); run;                                     | errors. */

B5. SAS NLMIXED procedure to fit bivariate GLMM with logit link

proc nlmixed data=partial4 fd cov corr df=1000 gtol=1e-11;  | /* ‘fd’ specifies that all derivatives be
parms mu0=1.5 nu0=-2.2 fz= 0.23 sigse=0.37 sigsp=-0.26;   | computed using finite difference
                                                  | approximations. /
rho= (exp(2*fz)-1)/(1+exp(2*fz));                       | /* use fisher’s z transformation instead of
                                                   | the correlation coefficient ρ directly to
 if Se=1 then beta=mu0+mu; if Se=0 then beta=nu0+nv;      | ensure –1 ≤ ρ 1*/
 pred=exp(beta)/(1+exp(beta));
 model ny~binomial(n, pred);                           | /* ‘tp’ and ‘tn’ are binomially distributed
                                                   | condition on random effects ‘mu’ and
                                                   | ‘nv’.*/
random mu nv ~ normal([0, 0 ],                         | /*random effects ‘mu’ and ‘nv’ are
              [exp(2*sigse), rho*exp(sigse)*exp(sigsp),   | bivariate normally distributed; ‘subject=id’
              exp(2*sigsp)]) subject=id;                | indicates possible correlation of random
estimate "Se" exp(mu0)/(1+exp(mu0));                   | effects within a study*/
estimate "Sp" exp(nu0)/(1+exp(nu0)); run;

B6. SAS NLMIXED procedure to fit trivariate GLMM

proc nlmixed data=partial fd df=1000 gtol=1e-10;           | /* model I is the best fitted model with
parms mu0=0 nv0=3 eta0=-1 sigse=0 sigsp=0 sigpi=-1 ;       | smallest AIC*/
logitsei = mu0 + mu; logitspi = nv0 + nv;
logitpi = eta0 + eta;                                     | /*model prevalence (‘pi’) together with
 Sei= 1 /(1+exp(-logitsei)); Spi=1/(1+exp(-logitspi));          | Se and Sp*/
 pi=1 /( 1+exp(-logitpi));
logL= tp * (log(pi) + log(Sei )) + fp * (log(1-pi) + log( 1-Spi))   | /*log-likelihood for trivariate model*/
+ fn * (log(pi) + log(1-Sei)) + tn * (log(1-pi) + log(Spi ));       | /*specify general log-likelihood
model Y ~ general(logL);                                 | function. Any variable can be used as the
                                                     | dependent variable in this situation.*/
random mu nv eta~normal([0, 0, 0],                         | /*‘mu’, ‘nv’ and ‘eta’ are the random
                    [exp(2*sigse),                      | effects corresponding to Se, Sp and
                    0, exp(2*sigsp),                     | prevalence. Possible correlation could
                    0, 0, exp(2*sigpi)])   subject=id;      | exist within studies. The best model is
estimate "sigse" exp(sigse);                                | achieved when all the correlation
estimate "sigsp" exp(sigsp);                                | coefficients among the random effects
estimate "sigpi" exp(sigpi);                                | ‘mu’, ‘nv’, ‘eta’ are zero.*/
estimate "Se" 1/(1+exp(-mu0));
estimate "Sp" 1/(1+exp(-nv0));   run;

B7. SAS NLMIXED procedure to fit the latent class random effect model IVe for Pap Smears test

proc nlmixed data=walter1999 cov fd;
 parms mua0=0.6 nva0=1.6 mub0=2 nvb0=4.5 eta0=0.6
       sigpi=0.4 sigmua=0.3 signva=0.2 sigmub=0 fz2=-0.5;         | /* five parameters are
                                                          | modeled: ‘SeA’ and ‘SpA’ for
       SeA=exp(mua0+mua)/(1+exp(mua0+mua));                | the Pap smear test, ‘SeB’ and
       SpA =exp(nva0+nva)/(1+exp(nva0+nva));                  | ‘SpB’ for the histology test
       SeB=exp(mub0)/(1+exp(mub0));                         | and ‘pi’ for disease
       SpB =exp(nvb0)/(1+exp(nvb0));                          | prevalence*/
       pi=exp(eta0+eta)/(1+exp(eta0+eta));                       | /*fisher’s z transformation;
                                                           | model V with correlation only
       rho2=(exp(2*fz2)-1)/(exp(2*fz2)+1);                      | between Se and Sp of the Pap
                                                           | smear test*/
       p11=pi*SeA*SeB+(1-pi)*(1-SpA)*(1 -SpB);                | /*expected probabilities in 2*2
       p01=pi*SeA*(1-SeB)+(1-pi)*(1-SpA)*SpB;                 | table*/
       p10=pi*(1-SeA)*SeB+(1-pi)*SpA*(1-SpB);
       p00=pi*(1-SeA)*(1-SeB)+(1-pi)*SpA*SpB;
                                                           | /*log likelihood*/
       logl=n11*log(p11)+n01*log(p01)+n10*log(p10)+n00*log(p00);
model y ~ general(logl);                                        | /*the best model selected
random eta mua nva~normal([0,0,0],[exp(2 *sigpi),                  | evolves four random effects.
             0, exp(2*sigmua),                                | model IVe assumes only
             0, rho2*exp(sigmua)*exp(signva),exp(2*signva)])       | correlation between ‘mua’ and
       subject=id;                                            | ‘nva’, i.e. sensitivity and
estimate       "SeA" exp(mua0)/(1+exp(mua0))                    | specificity of the Pap sme test
estimate       "SpA" exp(nva0)/(1+exp(nva0));                    | are correlated.*/
estimate       "pi" exp(eta0)/(1+exp(eta0));
estimate       "rhomuanva" (exp(2*fz2)-1)/(exp(2*fz2)+1);
estimate       "SeB" exp(mub0)/(1+exp(mub0));
estimate       "SpB" exp(nvb0)/(1+exp(nvb0));  run;

References

1.Zhou XH, Obuchowski NA, McClish DK. Statistical methods in diagnostic medicine. New York: John Wiley & Sons; 2002. [Google Scholar]
2.Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press; 2003. [Google Scholar]
3.Rutter CA, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Statistics in Medicine. 2001;20:2865–84. doi: 10.1002/sim.942. [DOI] [PubMed] [Google Scholar]
4.Song F, Khan KS, Dinnes J, Sutton AJ. Asymmetric funnel plots and publication bias in meta-analyses of diagnostic accuracy. International Journal of Epidemiology. 2002;31:88–95. doi: 10.1093/ije/31.1.88. [DOI] [PubMed] [Google Scholar]
5.van Houwelingen HC. Advanced methods in meta-analysis: multivariate approach and meta-regression. Statistics in Medicine. 2002;21:589–624. doi: 10.1002/sim.1040. [DOI] [PubMed] [Google Scholar]
6.Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JAC. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics. 2007;8:239–51. doi: 10.1093/biostatistics/kxl004. [DOI] [PubMed] [Google Scholar]
7.Macaskill P. Empirical Bayes estimates generated in a hierarchical summary ROC analysis agreed closely with those of a full Bayesian analysis. Journal of Clinical Epidemiology. 2004;57:925–32. doi: 10.1016/j.jclinepi.2003.12.019. [DOI] [PubMed] [Google Scholar]
8.Mallett S, Deeks JJ, Halligan S, Hopewell S, Cornelius V, Altman DG. Systematic reviews of diagnostic tests in cancer: review of methods and reporting. BMJ. 2006;333:413. doi: 10.1136/bmj.38895.467130.55. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Zwinderman AH, Bossuyt PM. We should not pool diagnostic likelihood ratios in systematic reviews. Statistics in Medicine. 2008;27:687–97. doi: 10.1002/sim.2992. [DOI] [PubMed] [Google Scholar]
10.Arends LR, Hamza TH, van Houwelingen JC, Heijenbrok-Kal MH, Hunink MGM, Stijnen T. Bivariate Random Effects Meta-Analysis of ROC Curves. Medical Decision Making. 2008;28:621–38. doi: 10.1177/0272989X08319957. [DOI] [PubMed] [Google Scholar]
11.Reitsma JB. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology. 2005;58:982–90. doi: 10.1016/j.jclinepi.2005.02.022. [DOI] [PubMed] [Google Scholar]
12.Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. Journal of clinical epidemiology. 2006;59:1331–2. doi: 10.1016/j.jclinepi.2006.06.011. [DOI] [PubMed] [Google Scholar]
13.Chu H, Guo H. Letter to the editor: a unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics. 2009;10:201–3. doi: 10.1093/biostatistics/kxn040. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Riley R, Abrams K, Sutton A, Lambert P, Thompson J. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Medical Research Methodology. 2007;7:3. doi: 10.1186/1471-2288-7-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR. An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes. Statistics in Medicine. 2007;26:78–97. doi: 10.1002/sim.2524. [DOI] [PubMed] [Google Scholar]
16.Hamza TH, van Houwelingen HC, Stijnen T. The binomial distribution of meta-analysis was preferred to model within-study variability. Journal of Clinical Epidemiology. 2008;61:41–51. doi: 10.1016/j.jclinepi.2007.03.016. [DOI] [PubMed] [Google Scholar]
17.Chu H, Nie L, Cole SR, Poole C. Meta-analysis of diagnostic accuracy studies accounting for disease prevalence: Alternative parameterizations and model selection. Statistics in Medicine. 2009;28:2384–99. doi: 10.1002/sim.3627. [DOI] [PubMed] [Google Scholar]
18.Walter SD, Irwig L, Glasziou PP. Meta-analysis of diagnostic tests with imperfect reference standards. Journal of Clinical Epidemiology. 1999;52:943–51. doi: 10.1016/s0895-4356(99)00086-4. [DOI] [PubMed] [Google Scholar]
19.Chu H, Chen SN, Louis TA. Random Effects Models in a Meta-Analysis of the Accuracy of Two Diagnostic Tests Without a Gold Standard. Journal of the American Statistical Association. 2009;104:512–23. doi: 10.1198/jasa.2009.0017. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Sadatsafavi M, Shahidi N, Marra F, et al. A statistical method was used for the meta-analysis of tests for latent TB in the absence of a gold standard, combining random-effect and latent-class methods to estimate test accuracy. JClinEpidemiol. 2010;63:257–69. doi: 10.1016/j.jclinepi.2009.04.008. [DOI] [PubMed] [Google Scholar]
21.Dendukuri N, Schiller I, Joseph L, Pai M. Bayesian Meta-Analysis of the Accuracy of a Test for Tuberculous Pleuritis in the Absence of a Gold Standard Reference. Biometrics. 2012 doi: 10.1111/j.1541-0420.2012.01773.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Moses LE, Shapiro D, Littenberg B. Combining Independent Studies of A Diagnostic-Test Into A Summary Roc Curve - Data-Analytic Approaches and Some Additional Considerations. Statistics in Medicine. 1993;12:1293–316. doi: 10.1002/sim.4780121403. [DOI] [PubMed] [Google Scholar]
23.Walter SD. Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Statistics in Medicine. 2002;21:1237–56. doi: 10.1002/sim.1099. [DOI] [PubMed] [Google Scholar]
24.Walter SD. The partial area under the summary ROC curve. Statistics in Medicine. 2005;24:2025–40. doi: 10.1002/sim.2103. [DOI] [PubMed] [Google Scholar]
25.Sutton AJ, Cooper NJ, Goodacre S, Stevenson M. Integration of Meta-analysis and Economic Decision Modeling for Evaluating Diagnostic Tests. Medical Decision Making. 2008;28:650–67. doi: 10.1177/0272989X08324036. [DOI] [PubMed] [Google Scholar]
26.Irwig L, Tosteson ANA, Gatsonis C, et al. Guidelines for Meta-analyses Evaluating Diagnostic Tests. Annals of Internal Medicine. 1994;120:667–76. doi: 10.7326/0003-4819-120-8-199404150-00008. [DOI] [PubMed] [Google Scholar]
27.Deeks JJ. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. British Medical Journal. 2001;323:157–62. doi: 10.1136/bmj.323.7305.157. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med. 2004;23:1351–75. doi: 10.1002/sim.1761. [DOI] [PubMed] [Google Scholar]
29.Starmer CF, Grizzle JE, Sen PK. Some reasons for not using the Yates continuity corrections in meta-analysis of sparse data. Journal of the American Statistical Association. 1974;69:376–8. [Google Scholar]
30.Tosteson AN, Begg CB. A general regression methodology for ROC curve estimation. MedDecisMaking. 1988;8:204–15. doi: 10.1177/0272989X8800800309. [DOI] [PubMed] [Google Scholar]
31.Chu H, Guo H, Zhou Y. Bivariate Random Effects Meta-Analysis of Diagnostic Studies Using Generalized Linear Mixed Models. Medical Decision Making. 2010;30:499–508. doi: 10.1177/0272989X09353452. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. NEnglJMed. 1978;299:926–30. doi: 10.1056/NEJM197810262991705. [DOI] [PubMed] [Google Scholar]
33.Brenner H, Gefeller O. Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Statistics in Medicine. 1997;16:981–91. doi: 10.1002/(sici)1097-0258(19970515)16:9<981::aid-sim510>3.0.co;2-n. [DOI] [PubMed] [Google Scholar]
34.Li J, Fine JP. Assessing the dependence of sensitivity and specificity on prevalence in meta-analysis. Biostatistics. 2011 doi: 10.1093/biostatistics/kxr008. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Shin KM. Partial-thickness rotator cuff tears. Korean J Pain. 2011;24:69–73. doi: 10.3344/kjp.2011.24.2.69. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Singisetti K. Shoulder ultrasonography versus arthroscopy for the detection of rotator cuff tears: analysis of errors. J Orthop Surg (Hong Kong) 2011 Apr;19(1):76–9. doi: 10.1177/230949901101900118. [DOI] [PubMed] [Google Scholar]
37.Smith TO. Diagnostic accuracy of ultrasound for rotator cuff tears in adults: A systematic review and meta-analysis. Clinical Radiology. 2011 Jul 5; doi: 10.1016/j.crad.2011.05.007. [DOI] [PubMed] [Google Scholar]
38.Fahey MT, Irwig L, Macaskill P. Meta-analysis of Pap test accuracy. American Journal of Epidemiology. 1995;141:680–9. doi: 10.1093/oxfordjournals.aje.a117485. [DOI] [PubMed] [Google Scholar]
39.Takwoingi Y, Deeks JJ. METADAS: an SAS macro for meta-analysis of diagnostic accuracy studies. 2011. [Google Scholar]
40.Hamza TH, van Houwelingen HC, Stijnen T. The binomial distribution of meta-analysis was preferred to model within-study variability. Journal of clinical epidemiology. 2008;61:41–51. doi: 10.1016/j.jclinepi.2007.03.016. [DOI] [PubMed] [Google Scholar]
41.Riley RD, Thompson JR, Abrams KR. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics (Oxford, England) 2008;9:172–86. doi: 10.1093/biostatistics/kxm023. [DOI] [PubMed] [Google Scholar]
42.de Groot JA, Dendukuri N, Janssen KJ, et al. Adjusting for partial verification or workup bias in meta-analyses of diagnostic accuracy studies. American Journal of Epidemiology. 2012;175:847–53. doi: 10.1093/aje/kwr383. [DOI] [PubMed] [Google Scholar]

[R1] 1.Zhou XH, Obuchowski NA, McClish DK. Statistical methods in diagnostic medicine. New York: John Wiley & Sons; 2002. [Google Scholar]

[R2] 2.Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press; 2003. [Google Scholar]

[R3] 3.Rutter CA, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Statistics in Medicine. 2001;20:2865–84. doi: 10.1002/sim.942. [DOI] [PubMed] [Google Scholar]

[R4] 4.Song F, Khan KS, Dinnes J, Sutton AJ. Asymmetric funnel plots and publication bias in meta-analyses of diagnostic accuracy. International Journal of Epidemiology. 2002;31:88–95. doi: 10.1093/ije/31.1.88. [DOI] [PubMed] [Google Scholar]

[R5] 5.van Houwelingen HC. Advanced methods in meta-analysis: multivariate approach and meta-regression. Statistics in Medicine. 2002;21:589–624. doi: 10.1002/sim.1040. [DOI] [PubMed] [Google Scholar]

[R6] 6.Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JAC. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics. 2007;8:239–51. doi: 10.1093/biostatistics/kxl004. [DOI] [PubMed] [Google Scholar]

[R7] 7.Macaskill P. Empirical Bayes estimates generated in a hierarchical summary ROC analysis agreed closely with those of a full Bayesian analysis. Journal of Clinical Epidemiology. 2004;57:925–32. doi: 10.1016/j.jclinepi.2003.12.019. [DOI] [PubMed] [Google Scholar]

[R8] 8.Mallett S, Deeks JJ, Halligan S, Hopewell S, Cornelius V, Altman DG. Systematic reviews of diagnostic tests in cancer: review of methods and reporting. BMJ. 2006;333:413. doi: 10.1136/bmj.38895.467130.55. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Zwinderman AH, Bossuyt PM. We should not pool diagnostic likelihood ratios in systematic reviews. Statistics in Medicine. 2008;27:687–97. doi: 10.1002/sim.2992. [DOI] [PubMed] [Google Scholar]

[R10] 10.Arends LR, Hamza TH, van Houwelingen JC, Heijenbrok-Kal MH, Hunink MGM, Stijnen T. Bivariate Random Effects Meta-Analysis of ROC Curves. Medical Decision Making. 2008;28:621–38. doi: 10.1177/0272989X08319957. [DOI] [PubMed] [Google Scholar]

[R11] 11.Reitsma JB. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology. 2005;58:982–90. doi: 10.1016/j.jclinepi.2005.02.022. [DOI] [PubMed] [Google Scholar]

[R12] 12.Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. Journal of clinical epidemiology. 2006;59:1331–2. doi: 10.1016/j.jclinepi.2006.06.011. [DOI] [PubMed] [Google Scholar]

[R13] 13.Chu H, Guo H. Letter to the editor: a unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics. 2009;10:201–3. doi: 10.1093/biostatistics/kxn040. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Riley R, Abrams K, Sutton A, Lambert P, Thompson J. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Medical Research Methodology. 2007;7:3. doi: 10.1186/1471-2288-7-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] 15.Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR. An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes. Statistics in Medicine. 2007;26:78–97. doi: 10.1002/sim.2524. [DOI] [PubMed] [Google Scholar]

[R16] 16.Hamza TH, van Houwelingen HC, Stijnen T. The binomial distribution of meta-analysis was preferred to model within-study variability. Journal of Clinical Epidemiology. 2008;61:41–51. doi: 10.1016/j.jclinepi.2007.03.016. [DOI] [PubMed] [Google Scholar]

[R17] 17.Chu H, Nie L, Cole SR, Poole C. Meta-analysis of diagnostic accuracy studies accounting for disease prevalence: Alternative parameterizations and model selection. Statistics in Medicine. 2009;28:2384–99. doi: 10.1002/sim.3627. [DOI] [PubMed] [Google Scholar]

[R18] 18.Walter SD, Irwig L, Glasziou PP. Meta-analysis of diagnostic tests with imperfect reference standards. Journal of Clinical Epidemiology. 1999;52:943–51. doi: 10.1016/s0895-4356(99)00086-4. [DOI] [PubMed] [Google Scholar]

[R19] 19.Chu H, Chen SN, Louis TA. Random Effects Models in a Meta-Analysis of the Accuracy of Two Diagnostic Tests Without a Gold Standard. Journal of the American Statistical Association. 2009;104:512–23. doi: 10.1198/jasa.2009.0017. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] 20.Sadatsafavi M, Shahidi N, Marra F, et al. A statistical method was used for the meta-analysis of tests for latent TB in the absence of a gold standard, combining random-effect and latent-class methods to estimate test accuracy. JClinEpidemiol. 2010;63:257–69. doi: 10.1016/j.jclinepi.2009.04.008. [DOI] [PubMed] [Google Scholar]

[R21] 21.Dendukuri N, Schiller I, Joseph L, Pai M. Bayesian Meta-Analysis of the Accuracy of a Test for Tuberculous Pleuritis in the Absence of a Gold Standard Reference. Biometrics. 2012 doi: 10.1111/j.1541-0420.2012.01773.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.Moses LE, Shapiro D, Littenberg B. Combining Independent Studies of A Diagnostic-Test Into A Summary Roc Curve - Data-Analytic Approaches and Some Additional Considerations. Statistics in Medicine. 1993;12:1293–316. doi: 10.1002/sim.4780121403. [DOI] [PubMed] [Google Scholar]

[R23] 23.Walter SD. Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Statistics in Medicine. 2002;21:1237–56. doi: 10.1002/sim.1099. [DOI] [PubMed] [Google Scholar]

[R24] 24.Walter SD. The partial area under the summary ROC curve. Statistics in Medicine. 2005;24:2025–40. doi: 10.1002/sim.2103. [DOI] [PubMed] [Google Scholar]

[R25] 25.Sutton AJ, Cooper NJ, Goodacre S, Stevenson M. Integration of Meta-analysis and Economic Decision Modeling for Evaluating Diagnostic Tests. Medical Decision Making. 2008;28:650–67. doi: 10.1177/0272989X08324036. [DOI] [PubMed] [Google Scholar]

[R26] 26.Irwig L, Tosteson ANA, Gatsonis C, et al. Guidelines for Meta-analyses Evaluating Diagnostic Tests. Annals of Internal Medicine. 1994;120:667–76. doi: 10.7326/0003-4819-120-8-199404150-00008. [DOI] [PubMed] [Google Scholar]

[R27] 27.Deeks JJ. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. British Medical Journal. 2001;323:157–62. doi: 10.1136/bmj.323.7305.157. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] 28.Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med. 2004;23:1351–75. doi: 10.1002/sim.1761. [DOI] [PubMed] [Google Scholar]

[R29] 29.Starmer CF, Grizzle JE, Sen PK. Some reasons for not using the Yates continuity corrections in meta-analysis of sparse data. Journal of the American Statistical Association. 1974;69:376–8. [Google Scholar]

[R30] 30.Tosteson AN, Begg CB. A general regression methodology for ROC curve estimation. MedDecisMaking. 1988;8:204–15. doi: 10.1177/0272989X8800800309. [DOI] [PubMed] [Google Scholar]

[R31] 31.Chu H, Guo H, Zhou Y. Bivariate Random Effects Meta-Analysis of Diagnostic Studies Using Generalized Linear Mixed Models. Medical Decision Making. 2010;30:499–508. doi: 10.1177/0272989X09353452. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] 32.Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. NEnglJMed. 1978;299:926–30. doi: 10.1056/NEJM197810262991705. [DOI] [PubMed] [Google Scholar]

[R33] 33.Brenner H, Gefeller O. Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Statistics in Medicine. 1997;16:981–91. doi: 10.1002/(sici)1097-0258(19970515)16:9<981::aid-sim510>3.0.co;2-n. [DOI] [PubMed] [Google Scholar]

[R34] 34.Li J, Fine JP. Assessing the dependence of sensitivity and specificity on prevalence in meta-analysis. Biostatistics. 2011 doi: 10.1093/biostatistics/kxr008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R35] 35.Shin KM. Partial-thickness rotator cuff tears. Korean J Pain. 2011;24:69–73. doi: 10.3344/kjp.2011.24.2.69. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] 36.Singisetti K. Shoulder ultrasonography versus arthroscopy for the detection of rotator cuff tears: analysis of errors. J Orthop Surg (Hong Kong) 2011 Apr;19(1):76–9. doi: 10.1177/230949901101900118. [DOI] [PubMed] [Google Scholar]

[R37] 37.Smith TO. Diagnostic accuracy of ultrasound for rotator cuff tears in adults: A systematic review and meta-analysis. Clinical Radiology. 2011 Jul 5; doi: 10.1016/j.crad.2011.05.007. [DOI] [PubMed] [Google Scholar]

[R38] 38.Fahey MT, Irwig L, Macaskill P. Meta-analysis of Pap test accuracy. American Journal of Epidemiology. 1995;141:680–9. doi: 10.1093/oxfordjournals.aje.a117485. [DOI] [PubMed] [Google Scholar]

[R39] 39.Takwoingi Y, Deeks JJ. METADAS: an SAS macro for meta-analysis of diagnostic accuracy studies. 2011. [Google Scholar]

[R40] 40.Hamza TH, van Houwelingen HC, Stijnen T. The binomial distribution of meta-analysis was preferred to model within-study variability. Journal of clinical epidemiology. 2008;61:41–51. doi: 10.1016/j.jclinepi.2007.03.016. [DOI] [PubMed] [Google Scholar]

[R41] 41.Riley RD, Thompson JR, Abrams KR. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics (Oxford, England) 2008;9:172–86. doi: 10.1093/biostatistics/kxm023. [DOI] [PubMed] [Google Scholar]

[R42] 42.de Groot JA, Dendukuri N, Janssen KJ, et al. Adjusting for partial verification or workup bias in meta-analyses of diagnostic accuracy studies. American Journal of Epidemiology. 2012;175:847–53. doi: 10.1093/aje/kwr383. [DOI] [PubMed] [Google Scholar]

PERMALINK

Statistical Methods for Multivariate Meta-analysis of Diagnostic Tests: An Overview and Tutorial

Xiaoye Ma

Lei Nie

Stephen R Cole

Haitao Chu

Summary

1. Introduction

2. Statistical methods when the reference test is a gold standard

Table 1.

2.1 The summary ROC method

2.2 A Bivariate Approach Based on Linear Mixed Models

2.3 The Hierarchical summary ROC Approach

2.4 The Bivariate Generalized Linear Mixed Model

2.5 The Trivariate Generalized Linear Mixed Model

Table 2.

3. Statistical methods when the reference test is not a gold standard

Table 3.

4. Case Study

4.1 A meta-analysis of rotator cuff tears diagnosis using ultra-sound

4.1.1 Study background

Figure 1.

Figure 2.

4.1.2 Summary ROC method

Figure 3.

4.1.3 Bivariate LMM

4.1.4 Hierarchical summary ROC model

4.1.5 Bivariate GLMM method

Table 4.

4.1.6 Trivariate GLMM

Table 5.

4.2 A meta-analysis of cervical cancer diagnosis using Pap smears test

Table 6.

Table 7.

5. Discussion

Acknowledgments

Appendix A. Data for case studies

Appendix A1.

Appendix A2.

Appendix B. SAS codes for fitting models

B1. Unweighted summary ROC

B2. Weighted summary ROC

B3. SAS MIXED procedure to fit bivariate LMM

B4. SAS NLMIXED procedure to fit the hierarchical summary ROC Model

B5. SAS NLMIXED procedure to fit bivariate GLMM with logit link

B6. SAS NLMIXED procedure to fit trivariate GLMM

B7. SAS NLMIXED procedure to fit the latent class random effect model IVe for Pap Smears test

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases