Recent Developments in the Dorfman-Berbaum-Metz Procedure for Multireader ROC Study Analysis

Stephen L Hillis; Kevin S Berbaum; Charles E Metz

doi:10.1016/j.acra.2007.12.015

Author manuscript; available in PMC: 2009 Sep 3.

Published in final edited form as: Acad Radiol. 2008 May;15(5):647–661. doi: 10.1016/j.acra.2007.12.015

PMCID: PMC2737462 NIHMSID: NIHMS48019 PMID: 18423323

Abstract

Rationale and Objectives

The Dorfman-Berbaum-Metz (DBM) method has been one of the most popular methods for analyzing multireader receiver operating characteristic (ROC) studies since it was proposed in 1992. Despite its popularity, the original procedure has several drawbacks: it is limited to jackknife accuracy estimates, it is substantially conservative, and it is not based on a satisfactory conceptual or theoretical model. Recently, solutions to these problems have been presented in three papers. Our purpose is to summarize and provide an overview of these recent developments.

Materials and Methods

We present and discuss the recently proposed solutions for the various drawbacks of the original DBM method.

Results

We compare the solutions in a simulation study and find that they result in improved performance for the DBM procedure. We also compare the solutions using two real data studies and find that the modified DBM procedure that incorporates these solutions yields more significant results and clearer interpretations of the variance component parameters than the original DBM procedure.

Conclusions

We recommend using the modified DBM procedure that incorporates the recent developments.

Keywords: receiver operating characteristic (ROC) curve, DBM, diagnostic radiology, jackknife, area under the curve (AUC)

Introduction

There are several different statistical methods for analyzing multireader receiver operating characteristic (ROC) studies, with the Dorfman-Berbaum-Metz (DBM) method [1–3] being one of the most frequently used methods. The DBM method involves an analysis of variance (ANOVA) of pseudovalues computed with the Quenouille-Tukey jackknife [4–6]. The basic data for the analysis are pseudovalues corresponding to test-reader ROC accuracy measures, such as the area under the ROC curve (AUC), computed by jackknifing cases separately for each test-reader combination. Throughout we use the term test to refer to a diagnostic test, modality, or treatment. A mixed-effects ANOVA is performed on the pseudovalues to test the null hypothesis that the average accuracy of readers is the same for all of the diagnostic tests studied. Accuracy can be characterized using any accuracy measure, such as sensitivity, specificity, area under the ROC curve, partial area under the ROC curve, sensitivity at a fixed specificity, or specificity at a fixed sensitivity. Furthermore, these measures of accuracy can be estimated parametrically, semiparametrically or nonparametrically; the DBM method accuracy estimates are the corresponding jackknife estimates.

Although the DBM method may be the most frequently used analysis method for multireader ROC studies since it was proposed in 1992, having been used in over 100 published studies [7], the original procedure has several drawbacks: it requires that the analysis be based on jackknife accuracy estimates, it is substantially conservative, and it is not based on a satisfactory conceptual or theoretical model. Recently, solutions to these problems have been presented in three papers [8–10]. We summarize these recent developments and compare the solutions in a simulation study and in two examples.

Materials and Methods

Original DBM Method

The DBM method is typically used with the test×reader × case factorial study design where each case (i.e., patient) undergoes each of several diagnostic tests and the resulting images are interpreted once by each reader. Throughout this paper, we assume that the data have been collected using this factorial design. The competing modalities can be compared using the DBM method; in particular, the null hypothesis of no test effect can be tested and confidence intervals for test differences can be computed. Results generalize to both the population of cases and the population of readers. To simplify the narration here, we assume that the outcome is AUC.

For the original DBM method, AUC pseudovalues are computed using the Quenouille-Tukey jackknife separately for each test-reader combination as described in Dorfman et al [1]. Let Y_ijk denote the AUC pseudovalue for test i, reader j, and case k; by definition $Y_{ijk} = c {\hat{θ}}_{i j} - (c - 1) {\hat{θ}}_{i j (k)}$ , where c denotes the number of cases, θ̂_ij denotes the AUC estimate based on all of the data for the ith test and jth reader, and ${\hat{θ}}_{i j (k)}$ denotes the AUC estimate based on the same data but with data for the kth case removed. Thus, in effect, Y_ijk represents the contribution of the kth case to the accuracy estimate for the ith test and jth reader, θ̂_ij. Then, using the Y_ijk as the data to be evaluated by conventional statistical analysis, the DBM procedure tests for a test effect using a fully-crossed three-factor ANOVA with test treated as a fixed factor and reader and case as random factors. A “jackknife estimate” of AUC for the ith test and jth reader is given by the mean of the corresponding pseudovalues:

{\bar{Y}}_{i j \cdot} = \frac{1}{c} \sum_{k = 1}^{c} Y_{ijk} .

(1)

We refer to θ̂_ij as the original AUC estimate, ${\bar{Y}}_{i j \cdot}$ as the jackknife AUC estimate, and the Y_ijk as the raw pseudovalues.
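As an illustration of the pseudovalue definition, the following Python sketch computes raw pseudovalues for a single test-reader combination using the trapezoidal-rule (empirical) AUC. The function names and the toy ratings are hypothetical and are not taken from the DBM software; the sketch only assumes that each case is deleted in turn and the AUC is recomputed.

```python
def trapezoid_auc(normal, diseased):
    """Empirical (trapezoidal-rule) AUC: P(diseased > normal) + 0.5 * P(tie)."""
    total = 0.0
    for x in normal:
        for y in diseased:
            total += 1.0 if y > x else 0.5 if y == x else 0.0
    return total / (len(normal) * len(diseased))


def jackknife_pseudovalues(normal, diseased, auc=trapezoid_auc):
    """Return (original AUC estimate, list of c raw pseudovalues)."""
    c = len(normal) + len(diseased)          # total number of cases
    theta = auc(normal, diseased)            # estimate from all of the data
    pseudo = []
    for k in range(len(normal)):             # delete each normal case in turn
        theta_k = auc(normal[:k] + normal[k + 1:], diseased)
        pseudo.append(c * theta - (c - 1) * theta_k)
    for k in range(len(diseased)):           # delete each diseased case in turn
        theta_k = auc(normal, diseased[:k] + diseased[k + 1:])
        pseudo.append(c * theta - (c - 1) * theta_k)
    return theta, pseudo


# Hypothetical 5-category ratings for one test-reader combination.
normal = [1, 2, 2, 3, 1]
diseased = [3, 4, 2, 5, 4]
theta_hat, y = jackknife_pseudovalues(normal, diseased)
jackknife_auc = sum(y) / len(y)              # equation (1)
```

For the trapezoidal-rule estimate the mean of the pseudovalues reproduces the original estimate; for other estimation methods the two generally differ.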

The analysis model is expressed by

Y_{ijk} = μ + τ_{i} + R_{j} + C_{k} + {(τ R)}_{i j} + {(τ C)}_{i k} + {(R C)}_{j k} + {(τ R C)}_{ijk} + ε_{ijk},

(2)

i=1,…,t; j=1,…,r;k=1,…,c; where τ_i denotes the fixed effect of test i, R_j denotes the random effect of reader j, C_k denotes the random effect of case k, the multiple symbols in parentheses denote interactions, and ε_ijk is the error term. The interaction terms are all random effects. The random effects are assumed to be mutually independent and normally distributed with zero means and respective variances $σ_{R}^{2}, σ_{C}^{2}, σ_{τ R}^{2}, σ_{τ C}^{2}, σ_{R C}^{2}, σ_{τ R C}^{2}$ and $σ_{ε}^{2}$ . Since there are no replications, $σ_{τ R C}^{2}$ and $σ_{ε}^{2}$ are inseparable.

The DBM F statistic for testing for a test effect is the conventional mixed-model ANOVA F statistic based on the pseudovalues. Letting MS(T), MS(T*R), MS(T*C), and MS(T*R*C) denote the mean squares corresponding to the test, test×reader, test×case and test×reader × case effects, respectively, the F statistic for testing for a test effect for model (2) is given by

F = \frac{MS (T)}{MS (T * R) + MS (T * C) - MS (T * R * C)} .

(3)

Under the null hypothesis of no test effect, F has an approximate F_df₁,df₂ distribution, where df₁ = t−1 and df₂ is the Satterthwaite [11, 12] degrees of freedom approximation given by

{df}_{2} = \frac{{[MS (T * R) + MS (T * C) - MS (T * R * C)]}^{2}}{\frac{MS {(T * R)}^{2}}{(t - 1) (r - 1)} + \frac{MS {(T * C)}^{2}}{(t - 1) (c - 1)} + \frac{MS {(T * R * C)}^{2}}{(t - 1) (r - 1) (c - 1)}} .

(4)
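Equations (3) and (4) can be sketched in Python as follows. This is a minimal illustration that assumes the pseudovalues are held in a nested list `Y[i][j][k]` (test, reader, case); the function names and the numerical mean squares in the example are hypothetical.

```python
def mean_squares(Y):
    """Test-related mean squares for a fully crossed t x r x c layout
    with one pseudovalue per cell; Y[i][j][k] = test i, reader j, case k."""
    t, r, c = len(Y), len(Y[0]), len(Y[0][0])
    T, R, C = range(t), range(r), range(c)
    g = sum(Y[i][j][k] for i in T for j in R for k in C) / (t * r * c)
    mi = [sum(Y[i][j][k] for j in R for k in C) / (r * c) for i in T]
    mj = [sum(Y[i][j][k] for i in T for k in C) / (t * c) for j in R]
    mk = [sum(Y[i][j][k] for i in T for j in R) / (t * r) for k in C]
    mij = [[sum(Y[i][j]) / c for j in R] for i in T]
    mik = [[sum(Y[i][j][k] for j in R) / r for k in C] for i in T]
    mjk = [[sum(Y[i][j][k] for i in T) / t for k in C] for j in R]
    ss_t = r * c * sum((mi[i] - g) ** 2 for i in T)
    ss_tr = c * sum((mij[i][j] - mi[i] - mj[j] + g) ** 2 for i in T for j in R)
    ss_tc = r * sum((mik[i][k] - mi[i] - mk[k] + g) ** 2 for i in T for k in C)
    ss_trc = sum((Y[i][j][k] - mij[i][j] - mik[i][k] - mjk[j][k]
                  + mi[i] + mj[j] + mk[k] - g) ** 2
                 for i in T for j in R for k in C)
    return {"T": ss_t / (t - 1),
            "TR": ss_tr / ((t - 1) * (r - 1)),
            "TC": ss_tc / ((t - 1) * (c - 1)),
            "TRC": ss_trc / ((t - 1) * (r - 1) * (c - 1))}


def f_and_df2(ms, t, r, c):
    """F statistic of equation (3) and Satterthwaite df2 of equation (4)."""
    denom = ms["TR"] + ms["TC"] - ms["TRC"]
    df2 = denom ** 2 / (ms["TR"] ** 2 / ((t - 1) * (r - 1))
                        + ms["TC"] ** 2 / ((t - 1) * (c - 1))
                        + ms["TRC"] ** 2 / ((t - 1) * (r - 1) * (c - 1)))
    return ms["T"] / denom, df2


# Purely additive toy data: the interaction mean squares should vanish.
a, b, d = [0.0, 1.0], [0.0, 0.1, 0.2], [0.0, 0.5, 1.0, 1.5]
Y = [[[ai + bj + dk for dk in d] for bj in b] for ai in a]
ms_additive = mean_squares(Y)

# Hypothetical mean squares with t = 2 tests, r = 5 readers, c = 50 cases.
f_stat, df2 = f_and_df2({"T": 6.0, "TR": 2.0, "TC": 1.5, "TRC": 1.0}, 2, 5, 50)
```

The denominator of equation (3) can be zero or negative for some data sets, which is the situation that the model simplification described next is designed to avoid.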

In the original DBM formulation, extensive model-based simplification is performed to prevent the F statistic (3) from becoming negative (due to a negative denominator). Specifically, model (2) is simplified by omitting (or equivalently, setting to zero) the test×reader and the test×case variance components if the corresponding ANOVA estimates are not positive. For the simplified model the appropriate F statistic and denominator degrees of freedom (ddf) are used; the appropriate F statistic for each simplified model contains only one mean square in the denominator and hence cannot be negative. Thus equations (3–4) are used only when both of the variance component estimates are positive.

The test×reader and the test×case variance component ANOVA estimates are

\begin{array}{l} {\hat{σ}}_{τ R}^{2} = \frac{1}{c} [MS (T * R) - MS (T * R * C)] \\ {\hat{σ}}_{τ C}^{2} = \frac{1}{r} [MS (T * C) - MS (T * R * C)] . \end{array}

(5)

Taking into account possible model simplification, the F statistic and ddf for the original DBM method are given by

F_{orig} = \begin{cases} \frac{MS (T)}{MS (T * R) + MS (T * C) - MS (T * R * C)} & {\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} > 0 \\ MS (T) / MS (T * R) & {\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} \leq 0 \\ MS (T) / MS (T * C) & {\hat{σ}}_{τ R}^{2} \leq 0, {\hat{σ}}_{τ C}^{2} > 0 \\ MS (T) / MS (T * R * C) & {\hat{σ}}_{τ R}^{2} \leq 0, {\hat{σ}}_{τ C}^{2} \leq 0 \end{cases}

(6)

and

{ddf}_{orig} = \begin{cases} \text{equation (4)} & {\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} > 0 \\ (t - 1) (r - 1) & {\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} \leq 0 \\ (t - 1) (c - 1) & {\hat{σ}}_{τ R}^{2} \leq 0, {\hat{σ}}_{τ C}^{2} > 0 \\ (t - 1) (r - 1) (c - 1) & {\hat{σ}}_{τ R}^{2} \leq 0, {\hat{σ}}_{τ C}^{2} \leq 0 \end{cases} .

(7)

The numerator degrees of freedom for F in equation (6) is t−1. We refer to this approach, using F_orig and ddf_orig, as original DBM. Note that the conditions in equations (6) and (7) can also be written in terms of the mean squares; e.g., ${\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} > 0$ is equivalent to MS(T*R) > MS(T*R*C), MS(T*C) > MS(T*R*C).
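The case selection in equations (6) and (7) can be written as a short function. The sketch below is illustrative only: the function name and the numerical mean squares are hypothetical, and the conditions are expressed through the mean squares as noted above.

```python
def original_dbm(ms_t, ms_tr, ms_tc, ms_trc, t, r, c):
    """Return (F_orig, ddf_orig) following equations (6) and (7)."""
    tr_positive = ms_tr > ms_trc   # test x reader variance estimate > 0
    tc_positive = ms_tc > ms_trc   # test x case variance estimate > 0
    if tr_positive and tc_positive:
        denom = ms_tr + ms_tc - ms_trc
        ddf = denom ** 2 / (ms_tr ** 2 / ((t - 1) * (r - 1))
                            + ms_tc ** 2 / ((t - 1) * (c - 1))
                            + ms_trc ** 2 / ((t - 1) * (r - 1) * (c - 1)))
    elif tr_positive:
        denom, ddf = ms_tr, (t - 1) * (r - 1)
    elif tc_positive:
        denom, ddf = ms_tc, (t - 1) * (c - 1)
    else:
        denom, ddf = ms_trc, (t - 1) * (r - 1) * (c - 1)
    return ms_t / denom, ddf


# Hypothetical mean squares for t = 2 tests, r = 5 readers, c = 50 cases.
both_positive = original_dbm(6.0, 2.0, 1.5, 1.0, 2, 5, 50)
only_tr = original_dbm(6.0, 2.0, 0.9, 1.0, 2, 5, 50)
only_tc = original_dbm(6.0, 0.8, 1.5, 1.0, 2, 5, 50)
neither = original_dbm(6.0, 0.8, 0.9, 1.0, 2, 5, 50)
```

In each simplified branch the denominator is a single mean square, so the statistic cannot be negative.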

Problem 1: DBM is limited to jackknife accuracy estimates

One problem with original DBM is that it requires that the analysis be based on jackknife AUC estimates. Although it is possible for the jackknife AUC estimator to perform better than the corresponding original AUC estimator, clearly it would be preferable to have the flexibility to base the analysis on either the jackknife or original accuracy estimator, especially if (as is typically the case) it has not been shown that the jackknife AUC estimator performs as well as the original AUC estimator. For trapezoidal-rule (trapezoid) AUC estimates [13] this is not a problem, since the trapezoid and corresponding jackknife AUC estimates are equal [8].

Hillis et al [8] provide a solution to this problem by showing that the DBM method can be based on normalized pseudovalues $Y_{ijk}^{*}$ , defined by $Y_{ijk}^{*} = Y_{ijk} + ({\hat{θ}}_{i j} - {\bar{Y}}_{i j \cdot})$ . That is, the normalized pseudovalue for patient k, reader j, and test i is equal to the sum of the raw pseudovalue Y_ijk and the difference between the ijth test-reader original and jackknife AUC estimates. The estimate for θ_ij based on the normalized pseudovalues, given by ${\bar{Y}}_{i j \cdot}^{*} = \frac{1}{c} \sum_{k = 1}^{c} Y_{ijk}^{*}$ , is equal to the original AUC estimate θ̂_ij. Thus, the DBM procedure with normalized pseudovalues yields single test and test-difference confidence intervals centered on the original accuracy estimates and their differences, averaged across readers.
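The normalization amounts to a constant shift within each test-reader combination, as the following sketch shows. The function name and the numbers are hypothetical and serve only to illustrate the definition.

```python
def normalize_pseudovalues(raw, theta_hat):
    """Shift raw pseudovalues so that their mean equals the original estimate."""
    jackknife_estimate = sum(raw) / len(raw)     # mean of raw pseudovalues
    shift = theta_hat - jackknife_estimate       # original minus jackknife
    return [y + shift for y in raw]


raw = [0.95, 0.80, 1.10, 0.70, 0.90]   # hypothetical raw pseudovalues
theta_hat = 0.88                        # hypothetical original AUC estimate
normalized = normalize_pseudovalues(raw, theta_hat)
normalized_mean = sum(normalized) / len(normalized)
```

Because every pseudovalue within a test-reader combination is shifted by the same amount, differences between cases are unchanged; only the test-reader means move.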

Problem 2: DBM is substantially conservative

Another problem with original DBM is that it is substantially conservative. Dorfman et al [3] conclude from simulations that the DBM method provides a “moderately conservative statistical test of modality differences,” with the degree of conservatism greatest with very large ROC areas and decreasing as the number of cases increases. Using the Roe and Metz [2] simulation structure, Hillis and Berbaum [14] report that, using semiparametric estimation with either normalized or raw pseudovalues, the average type I error across 144 combinations of reader-sample size, case-sample size, AUC, and variance components is 0.036, considerably lower than the nominal 0.05 significance level. The downside of a conservative test is that power is diminished compared to the same test with the critical value adjusted to yield significance levels closer to the nominal level.

In simulations Hillis [10] shows that the DBM procedure attains a type I error much closer to the nominal level when two modifications are incorporated: (1) less data-based model simplification is performed, and (2) a different ddf formula is used. We now discuss these two modifications.

Less data-based model simplification

Hillis et al [8] propose that, similar to original DBM, the test×case variance component be omitted if its ANOVA estimate is not positive; however, they stipulate that the test×reader variance component should never be omitted, even when its estimate is zero or negative. We refer to this approach as new model simplification. Like original DBM, new model simplification ensures that the F test statistic will not be negative. However, an important advantage of new model simplification is that it results in a less conservative test, with the type I error rate considerably closer to the nominal level [9]. Another advantage is that this approach avoids making inferences under the unrealistic assumption that differences between tests are the same for all readers in the population, which is implied when the test×reader variance component is omitted [14].

Using new model simplification, the F statistic for testing the null hypothesis of no test effect is the same as that given by equation (3) when ${\hat{σ}}_{τ C}^{2} > 0$ , whereas it is set equal to MS(T)/MS(T*R) when ${\hat{σ}}_{τ C}^{2} \leq 0$ . We denote this F statistic using new model simplification by F_DBM. Thus,

F_{DBM} = \begin{cases} \frac{MS (T)}{MS (T * R) + MS (T * C) - MS (T * R * C)} & {\hat{σ}}_{τ C}^{2} > 0 \\ MS (T) / MS (T * R) & {\hat{σ}}_{τ C}^{2} \leq 0 \end{cases} .

(8)

Since ${\hat{σ}}_{τ C}^{2} \leq 0$ is equivalent to MS(T*C)− MS(T*R*C) ≤ 0, this F statistic can be succinctly written in the following form that takes model simplification into account:

F_{DBM} = \frac{MS (T)}{MS (T * R) + max [MS (T * C) - MS (T * R * C), 0]} .

(9)

The corresponding conventional ANOVA ddf is given by

{ddf}_{D} = \begin{cases} \text{equation (4)} & {\hat{σ}}_{τ C}^{2} > 0 \\ (t - 1) (r - 1) & {\hat{σ}}_{τ C}^{2} \leq 0 \end{cases} .

(10)

Thus, new model simplification uses F_DBM and ddf_D.

In Appendix A we derive the following relationships: (1) if ${\hat{σ}}_{τ R}^{2} > 0$ then F_DBM = F_orig and ddf_D = ddf_orig; and (2) if ${\hat{σ}}_{τ R}^{2} \leq 0$ then F_DBM ≥ F_orig but ddf_D < ddf_orig. However, we have found that typically the larger F statistic under new model simplification, when ${\hat{σ}}_{τ R}^{2} < 0$ , will result in a more significant conclusion (smaller p-value), compared to that obtained using original DBM, even though the ddf is smaller under new model simplification. In this way new model simplification produces a less conservative test.

New denominator degrees of freedom

Hillis [10] proposes a new ddf given by

{ddf}_{H} = \begin{cases} \frac{{[MS (T * R) + MS (T * C) - MS (T * R * C)]}^{2}}{MS {(T * R)}^{2} / [(t - 1) (r - 1)]} & {\hat{σ}}_{τ C}^{2} > 0 \\ (t - 1) (r - 1) & {\hat{σ}}_{τ C}^{2} \leq 0 \end{cases} .

(11)

Equation (11) can be written more compactly in the form

{ddf}_{H} = \frac{{\{MS (T * R) + max [MS (T * C) - MS (T * R * C), 0]\}}^{2}}{\frac{MS {(T * R)}^{2}}{(t - 1) (r - 1)}} .

(12)

The quantity ddf_H is derived by assuming that new model simplification is used – that is, it is to be used with F_DBM (9). We refer to this approach, using F_DBM and ddf_H, as new model simplification plus ddf_H.

In Appendix A we show that ddf_H > ddf_D if ${\hat{σ}}_{τ C}^{2} > 0$ , whereas ddf_H = ddf_D if ${\hat{σ}}_{τ C}^{2} \leq 0$ . Since new model simplification and new model simplification plus ddf_H both use F_DBM, it follows that new model simplification plus ddf_H results in a lower p-value when ${\hat{σ}}_{τ C}^{2} > 0$ and the same p-value when ${\hat{σ}}_{τ C}^{2} \leq 0$ ; hence, it is less conservative than new model simplification.
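The two newer approaches differ only in the ddf, which the following sketch makes explicit by returning F_DBM from equation (9) together with ddf_D from equation (10) and ddf_H from equation (12). The function name and the numerical mean squares are hypothetical.

```python
def new_dbm(ms_t, ms_tr, ms_tc, ms_trc, t, r, c):
    """Return (F_DBM, ddf_D, ddf_H) per equations (9), (10), and (12)."""
    extra = max(ms_tc - ms_trc, 0.0)     # test x case term, truncated at zero
    denom = ms_tr + extra                # MS(T*R) is never dropped
    base = (t - 1) * (r - 1)
    if extra > 0:                        # test x case variance estimate > 0
        ddf_d = denom ** 2 / (ms_tr ** 2 / base
                              + ms_tc ** 2 / ((t - 1) * (c - 1))
                              + ms_trc ** 2 / (base * (c - 1)))
    else:
        ddf_d = float(base)
    ddf_h = denom ** 2 / (ms_tr ** 2 / base)
    return ms_t / denom, ddf_d, ddf_h


# Hypothetical mean squares for t = 2 tests, r = 5 readers, c = 50 cases.
f1, ddf_d1, ddf_h1 = new_dbm(6.0, 2.0, 1.5, 1.0, 2, 5, 50)   # tau*C estimate > 0
f2, ddf_d2, ddf_h2 = new_dbm(6.0, 2.0, 0.9, 1.0, 2, 5, 50)   # tau*C estimate <= 0
```

In the first example ddf_H exceeds ddf_D, and in the second the two coincide at (t−1)(r−1), in line with the relationships stated above.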

Table 1 presents a summary of the three different DBM approaches – original DBM, new model simplification, and new model simplification plus ddf_H – and Table 2 presents their relationships.

Table 1.

Summary of the different DBM approaches

a) Original DBM

| F_orig | ddf_orig | Condition |
| --- | --- | --- |
| $\frac{MS (T)}{MS (T * R) + MS (T * C) - MS (T * R * C)}$ | Equation (4) | ${\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} > 0$ |
| MS(T)/MS(T*R) | (t−1)(r−1) | ${\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} \leq 0$ |
| MS(T)/MS(T*C) | (t−1)(c−1) | ${\hat{σ}}_{τ R}^{2} \leq 0, {\hat{σ}}_{τ C}^{2} > 0$ |
| MS(T)/MS(T*R*C) | (t−1)(r−1)(c−1) | ${\hat{σ}}_{τ R}^{2} \leq 0, {\hat{σ}}_{τ C}^{2} \leq 0$ |

b) New model simplification

F_{DBM} = \frac{MS (T)}{MS (T * R) + max [MS (T * C) - MS (T * R * C), 0]}

{ddf}_{D} = \begin{cases} \text{equation (4)} & {\hat{σ}}_{τ C}^{2} > 0 \\ (t - 1) (r - 1) & {\hat{σ}}_{τ C}^{2} \leq 0 \end{cases}

c) New model simplification plus ddf_H

F_{DBM} = \frac{MS (T)}{MS (T * R) + max [MS (T * C) - MS (T * R * C), 0]} [same as in (b)]

{ddf}_{H} = \frac{{\{MS (T * R) + max [MS (T * C) - MS (T * R * C), 0]\}}^{2}}{\frac{MS {(T * R)}^{2}}{(t - 1) (r - 1)}}

These approaches can be used with raw, normalized, or quasi pseudovalues. See Table 6 for computational formulas for ${\hat{σ}}_{τ R}^{2}$ and ${\hat{σ}}_{τ C}^{2}$ .

Table 2.

Relationships between the DBM F statistics and between the DBM denominator degrees of freedom.

| ${\hat{σ}}_{τ R}^{2}$ | ${\hat{σ}}_{τ C}^{2}$ | F relationship | ddf relationship |
| --- | --- | --- | --- |
| >0 | >0 | F_orig = F_DBM | ddf_orig = ddf_D < ddf_H |
| ≤0 | >0 | F_orig ≤ F_DBM (equality iff ${\hat{σ}}_{τ R}^{2} = 0$ ) | ddf_D < ddf_orig, ddf_D < ddf_H |
| >0 | ≤0 | F_orig = F_DBM | ddf_orig = ddf_D = ddf_H |
| ≤0 | ≤0 | F_orig ≤ F_DBM (equality iff ${\hat{σ}}_{τ R}^{2} = 0$ ) | ddf_D = ddf_H < ddf_orig |

These relationships are derived in Appendix A. Iff: if and only if.

Problem 3: DBM model is unsatisfactory conceptually and theoretically

The original DBM procedure does not provide a satisfactory conceptual model since the model parameters are expressed in terms of pseudovalues rather than AUC values. The model is also unsatisfactory theoretically since it assumes that the pseudovalues are independent and normally distributed, but they are neither. Thus, desirable statistical properties of the DBM procedure do not directly follow from the model assumptions, since the assumptions are not true; rather, the validity of the model must be determined through simulation studies.

Hillis et al [8] provide a solution to this problem by showing that the DBM procedure is equivalent to another procedure that is based on an acceptable conceptual and theoretical model. Specifically, they show that the DBM model can be viewed as a “working” model that produces the same inferences as obtained using the test×reader ANOVA model with correlated errors proposed by Obuchowski and Rockette (OR) [15, 16]. The OR model is given by

{\hat{θ}}_{i j} = \tilde{μ} + {\tilde{τ}}_{i} + R_{j} + {(τ R)}_{i j} + ε_{i j},

(13)

i=1,…,t; j=1,…,r; where θ̂_ij is the AUC estimate (or other accuracy estimate) for the ith test and jth reader, τ̃_i denotes the fixed effect of test i, R_j denotes the random effect of reader j, (τR)_ij denotes the random test×reader interaction, and ε_ij is the error term having mean zero and variance ${\tilde{σ}}_{ε}^{2}$ . The random effects R_j and (τR)_ij are assumed independent and normally distributed with zero means and variances ${\tilde{σ}}_{R}^{2}$ and ${\tilde{σ}}_{τ R}^{2}$ , respectively, and are assumed independent of the ε_ij. We use the tilde symbol “~” to distinguish OR model parameters from analogous DBM model parameters. Since the same cases are read by each reader using each test, the error terms are not assumed to be independent. Instead, equi-covariance of the errors between readers and tests is assumed, resulting in three possible covariances given by

Cov (ε_{i j}, ε_{i^{'} j^{'}}) = \begin{cases} {Cov}_{1} & i \neq i^{'}, j = j^{'} \text{ (different test, same reader)} \\ {Cov}_{2} & i = i^{'}, j \neq j^{'} \text{ (same test, different reader)} \\ {Cov}_{3} & i \neq i^{'}, j \neq j^{'} \text{ (different test, different reader)} \end{cases} .

(14)

Obuchowski and Rockette [15] suggest the following ordering: Cov₁ ≥ Cov₂ ≥ Cov₃.

Conditional on the reader and test×reader effects (that is, treating readers as fixed), it follows from model (13) that Cov₁, Cov₂, and Cov₃ are also the corresponding covariances of the AUC estimates; for example, Cov₂ is the covariance between the AUCs for two fixed readers using the same test, while Cov₃ is the covariance between the AUCs for two fixed readers using different modalities.

The OR F statistic for testing for a test difference is given by

F_{OR} = \frac{MS {(T)}_{{\hat{θ}}_{i j}}}{MS {(T * R)}_{{\hat{θ}}_{i j}} + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]},

(15)

where MS(T)_{θ̂_ij} and MS(T*R)_{θ̂_ij} are the test and test×reader mean squares corresponding to the OR model (13), and where ${\hat{Cov}}_{2}$ and ${\hat{Cov}}_{3}$ are covariance estimates; the subscript “θ̂_ij” is used here to indicate that the mean squares are computed from the AUCs rather than the pseudovalues. The quantities ${\hat{Cov}}_{2}$ and ${\hat{Cov}}_{3}$ are estimated by averaging corresponding covariance estimates for pairs of AUCs, estimated using covariance estimation methods that treat readers as fixed. For example, ${\hat{Cov}}_{2} = \frac{2}{t r (r - 1)} \sum_{i = 1}^{t} \sum_{j < j^{'}} \hat{Cov} ({\hat{θ}}_{i j}, {\hat{θ}}_{i j^{'}})$ , where $\hat{Cov} ({\hat{θ}}_{i j}, {\hat{θ}}_{i j^{'}})$ is an estimate of the covariance between AUCs for fixed readers j and j′ using test i, estimated using a fixed reader method such as bootstrapping or jackknifing.
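The computation of F_OR in equation (15) can be sketched as follows. This is an illustration under stated assumptions: the AUC estimates are supplied as `theta[i][j]`, the fixed-reader covariance estimates are supplied through a hypothetical callable `cov_fn(i, j, i2, j2)`, and Ĉov₃ is taken as the simple average over all pairs with different tests and different readers. All names and numbers are hypothetical.

```python
from itertools import combinations


def or_f_statistic(theta, cov_fn):
    """Return (F_OR, MS(T), MS(T*R), Cov2_hat, Cov3_hat) from AUC estimates."""
    t, r = len(theta), len(theta[0])
    grand = sum(map(sum, theta)) / (t * r)
    test_mean = [sum(row) / r for row in theta]
    reader_mean = [sum(theta[i][j] for i in range(t)) / t for j in range(r)]
    ms_t = r * sum((m - grand) ** 2 for m in test_mean) / (t - 1)
    ms_tr = sum((theta[i][j] - test_mean[i] - reader_mean[j] + grand) ** 2
                for i in range(t) for j in range(r)) / ((t - 1) * (r - 1))
    # Cov2: same test, different readers (average over t*r*(r-1)/2 pairs).
    cov2_terms = [cov_fn(i, j, i, j2)
                  for i in range(t) for j, j2 in combinations(range(r), 2)]
    # Cov3: different tests, different readers.
    cov3_terms = [cov_fn(i, j, i2, j2)
                  for i, i2 in combinations(range(t), 2)
                  for j in range(r) for j2 in range(r) if j != j2]
    cov2 = sum(cov2_terms) / len(cov2_terms)
    cov3 = sum(cov3_terms) / len(cov3_terms)
    f_or = ms_t / (ms_tr + max(r * (cov2 - cov3), 0.0))
    return f_or, ms_t, ms_tr, cov2, cov3


def example_cov(i, j, i2, j2):
    """Hypothetical equi-covariance estimates for illustration."""
    if i != i2 and j == j2:
        return 0.0005    # different test, same reader
    if i == i2 and j != j2:
        return 0.0004    # same test, different reader
    return 0.0002        # different test, different reader


theta = [[0.80, 0.85, 0.90],    # test 1, readers 1 to 3 (hypothetical AUCs)
         [0.84, 0.87, 0.95]]    # test 2, readers 1 to 3
f_or, ms_t, ms_tr, cov2_hat, cov3_hat = or_f_statistic(theta, example_cov)
```

In practice `cov_fn` would be replaced by estimates from a fixed-reader method such as jackknifing or bootstrapping.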

The DBM and OR procedures are related as follows [8]. Note that the jackknife procedure provides both AUC point estimates, defined by equation (1), and covariance and variance estimates for the AUCs, as discussed in Reference [8]. The DBM and OR F statistics, F_DBM and F_OR defined by equations (9) and (15), are equal if ${\hat{Cov}}_{2}$ and ${\hat{Cov}}_{3}$ are jackknife covariance estimates and normalized pseudovalues are used with the DBM procedure. This relationship does not require any particular estimation method for the θ̂_ij in equation (13). On the other hand, if raw pseudovalues are used, then the relationship still holds if, additionally, the θ̂_ij in equation (13) are jackknife estimates. More generally, for any given AUC estimation method and any given method of estimating Cov₂ and Cov₃, F_DBM = F_OR if the DBM procedure is used with quasi pseudovalues, as defined in Reference [8]. These conditions which ensure that F_DBM = F_OR are summarized in Table 3. The appropriate ddf to use with either the DBM or OR procedure is ddf_H, given by equation (12) for the DBM procedure. In terms of the OR procedure mean squares, Reference [10] shows that ddf_H is given by

{ddf}_{H} = \frac{{\{MS {(T * R)}_{{\hat{θ}}_{i j}} + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}}^{2}}{\frac{MS {(T * R)}_{{\hat{θ}}_{i j}}^{2}}{(t - 1) (r - 1)}} .

(16)

Table 3.

Conditions which result in F_DBM = F_OR as defined by equations (9) and (15)

Normalized pseudovalues are used with DBM and ${\hat{\tilde{σ}}}_{ε}^{2}, {\hat{Cov}}_{1}, {\hat{Cov}}_{2}$ and ${\hat{Cov}}_{3}$ are jackknife variance and covariance estimates.

or
Raw pseudovalues are used with DBM, ${\hat{\tilde{σ}}}_{ε}^{2}, {\hat{Cov}}_{1}, {\hat{Cov}}_{2}$ and ${\hat{Cov}}_{3}$ are jackknife variance and covariance estimates, and θ̂_ij are jackknife accuracy estimates.

or
Quasi pseudovalues are used with DBM.

Note: any one of the above conditions results in F_DBM = F_OR.

Under any of the conditions described above that result in F_DBM = F_OR, the same value for ddf_H is obtained using either equation (12) or (16), and there is a one-to-one correspondence between the DBM and OR computed quantities, as shown in Table 4.

Table 4.

Relationship between DBM and OR computed quantities.

| OR computed quantity | Equivalent function of DBM computed quantities |
| --- | --- |
| $MS {(T)}_{{\hat{θ}}_{i j}}$ | $\frac{1}{c} MS (T)$ |
| $MS {(R)}_{{\hat{θ}}_{i j}}$ | $\frac{1}{c} MS (R)$ |
| $MS {(T * R)}_{{\hat{θ}}_{i j}}$ | $\frac{1}{c} MS (T * R)$ |
| ${\hat{\tilde{σ}}}_{ε}^{2}$ | $\frac{1}{trc} [MS (C) + (t - 1) MS (T * C) + (r - 1) MS (R * C) + (t - 1) (r - 1) MS (T * R * C)]$ |
| ${\hat{Cov}}_{1}$ | $\frac{1}{trc} \{MS (C) - MS (T * C) + (r - 1) [MS (R * C) - MS (T * R * C)]\}$ |
| ${\hat{Cov}}_{2}$ | $\frac{1}{trc} \{MS (C) - MS (R * C) + (t - 1) [MS (T * C) - MS (T * R * C)]\}$ |
| ${\hat{Cov}}_{3}$ | $\frac{1}{trc} [MS (C) - MS (T * C) - MS (R * C) + MS (T * R * C)]$ |

| DBM computed quantity | Equivalent function of OR computed quantities |
| --- | --- |
| MS(T) | $c \, MS {(T)}_{{\hat{θ}}_{i j}}$ |
| MS(R) | $c \, MS {(R)}_{{\hat{θ}}_{i j}}$ |
| MS(T*R) | $c \, MS {(T * R)}_{{\hat{θ}}_{i j}}$ |
| MS(C) | $c [{\hat{\tilde{σ}}}_{ε}^{2} + (t - 1) {\hat{Cov}}_{1} + (r - 1) {\hat{Cov}}_{2} + (t - 1) (r - 1) {\hat{Cov}}_{3}]$ |
| MS(T*C) | $c [{\hat{\tilde{σ}}}_{ε}^{2} - {\hat{Cov}}_{1} + (r - 1) ({\hat{Cov}}_{2} - {\hat{Cov}}_{3})]$ |
| MS(R*C) | $c [{\hat{\tilde{σ}}}_{ε}^{2} + (t - 1) {\hat{Cov}}_{1} - {\hat{Cov}}_{2} - (t - 1) {\hat{Cov}}_{3}]$ |
| MS(T*R*C) | $c [{\hat{\tilde{σ}}}_{ε}^{2} - {\hat{Cov}}_{1} - {\hat{Cov}}_{2} + {\hat{Cov}}_{3}]$ |

These relationships assume one of the three conditions given in Table 3.
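The correspondence between the case-related DBM mean squares and the OR error variance and covariance estimates can be checked numerically. The sketch below is illustrative: the function names and values are hypothetical, and the MS(C) expression is written with a plus sign on the (t − 1)Ĉov₁ term, which is what makes the two mappings mutual inverses.

```python
def dbm_to_or(ms, t, r, c):
    """OR error variance and covariances from DBM mean squares."""
    k = 1.0 / (t * r * c)
    return {
        "var_eps": k * (ms["C"] + (t - 1) * ms["TC"] + (r - 1) * ms["RC"]
                        + (t - 1) * (r - 1) * ms["TRC"]),
        "cov1": k * (ms["C"] - ms["TC"] + (r - 1) * (ms["RC"] - ms["TRC"])),
        "cov2": k * (ms["C"] - ms["RC"] + (t - 1) * (ms["TC"] - ms["TRC"])),
        "cov3": k * (ms["C"] - ms["TC"] - ms["RC"] + ms["TRC"]),
    }


def or_to_dbm(o, t, r, c):
    """Case-related DBM mean squares from OR error variance and covariances."""
    v, c1, c2, c3 = o["var_eps"], o["cov1"], o["cov2"], o["cov3"]
    return {
        "C": c * (v + (t - 1) * c1 + (r - 1) * c2 + (t - 1) * (r - 1) * c3),
        "TC": c * (v - c1 + (r - 1) * (c2 - c3)),
        "RC": c * (v + (t - 1) * c1 - c2 - (t - 1) * c3),
        "TRC": c * (v - c1 - c2 + c3),
    }


# Hypothetical OR estimates with t = 2 tests, r = 5 readers, c = 100 cases.
t, r, c = 2, 5, 100
or_est = {"var_eps": 0.0010, "cov1": 0.0005, "cov2": 0.0004, "cov3": 0.0002}
ms = or_to_dbm(or_est, t, r, c)
round_trip = dbm_to_or(ms, t, r, c)
```

A useful consequence is that MS(T*C) − MS(T*R*C) equals c·r·(Ĉov₂ − Ĉov₃), which is why the truncated terms in equations (9) and (15) correspond after scaling by c.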

The OR model is a satisfactory conceptual model since it is expressed in terms of meaningful reader-level accuracy outcomes (e.g., AUC values). In addition, the model assumptions are reasonable. The assumed independence of the reader effects follows from the independent selection of readers, and the assumption of independent test×reader interactions and equi-covariant errors allows for a fairly general covariance structure. Normality for the error terms is reasonable since typically there are many cases for each reader, and normality for the reader and test×reader effects is a typical assumption for generalizing from a sample to a population when we do not know the exact population distribution. Of course, these assumptions may not always hold, and topics for future research include the robustness of the DBM and OR procedures to violations of these assumptions and generalization of the procedures to accommodate less restrictive assumptions.

The equivalence of the DBM and OR procedures allows for interpretation of the DBM parameters in terms of the meaningful OR parameters. Table 5 shows the relationships between the DBM and OR parameters. We see that the DBM parameters μ, τ_i, $σ_{R}^{2}$ , and $σ_{τ R}^{2}$ have the same interpretation as the analogous OR parameters μ̃, τ̃_i, ${\tilde{σ}}_{R}^{2}$ and ${\tilde{σ}}_{τ R}^{2}$ , while $σ_{C}^{2}, σ_{τ C}^{2}, σ_{R C}^{2}$ and $σ_{τ R C}^{2} + σ_{ε}^{2}$ are equal to linear functions of ${\tilde{σ}}_{ε}^{2}$ , Cov₁, Cov₂ and Cov₃, and vice versa. For example, we see from Table 5 that $σ_{τ C}^{2} = c ({Cov}_{2} - {Cov}_{3})$ ; hence, setting $σ_{τ C}^{2} = 0$ , as is done with new model simplification when ${\hat{σ}}_{τ C}^{2} \leq 0$ , is equivalent to assuming that Cov₂ = Cov₃, which is a reasonable assumption. On the other hand, we see that setting $σ_{τ R}^{2} = 0$ , as is done with original DBM when ${\hat{σ}}_{τ R}^{2} \leq 0$ , is equivalent to assuming that the test×reader variance component of the OR model ( ${\tilde{σ}}_{τ R}^{2}$ ) is zero, implying that differences between tests are the same for all readers in the population. As mentioned earlier, this is an unreasonable assumption and is one reason why we no longer recommend original DBM.

Table 5.

Relationship between DBM and OR model parameters

| OR model parameter | Equivalent function of DBM model parameters |
| --- | --- |
| μ̃ | μ |
| τ̃_i | τ_i |
| ${\tilde{σ}}_{R}^{2}$ | $σ_{R}^{2}$ |
| ${\tilde{σ}}_{τ R}^{2}$ | $σ_{τ R}^{2}$ |
| ${\tilde{σ}}_{ε}^{2}$ | $(σ_{C}^{2} + σ_{τ C}^{2} + σ_{R C}^{2} + σ_{τ R C}^{2} + σ_{ε}^{2}) / c$ |
| Cov₁ | $(σ_{C}^{2} + σ_{R C}^{2}) / c$ |
| Cov₂ | $(σ_{C}^{2} + σ_{τ C}^{2}) / c$ |
| Cov₃ | $σ_{C}^{2} / c$ |

| DBM model parameter | Equivalent function of OR model parameters |
| --- | --- |
| μ | μ̃ |
| τ_i | τ̃_i |
| $σ_{R}^{2}$ | ${\tilde{σ}}_{R}^{2}$ |
| $σ_{τ R}^{2}$ | ${\tilde{σ}}_{τ R}^{2}$ |
| $σ_{C}^{2}$ | c Cov₃ |
| $σ_{τ C}^{2}$ | c(Cov₂ − Cov₃) |
| $σ_{R C}^{2}$ | c(Cov₁ − Cov₃) |
| $σ_{τ R C}^{2} + σ_{ε}^{2}$ | $c ({\tilde{σ}}_{ε}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3})$ |

These relationships assume that the constraints for the OR model parameters are those implied by the DBM model: ${\tilde{σ}}_{ε}^{2} \geq {Cov}_{1} + {Cov}_{2} - {Cov}_{3}$ , Cov₁ ≥ Cov₃, Cov₂ ≥ Cov₃, and Cov₃ ≥ 0. They also assume the same linear constraint for the τ_i (e.g., Στ_i = 0) for both models and that either (1) normalized or quasi pseudovalues are used; or (2) if raw pseudovalues are used, then the OR model outcome is the jackknife accuracy estimate.

Note: Adapted and reprinted, with permission, from Reference [8]

Other examples of interpreting functions of OR parameters are the following. The expected accuracy measure across readers for the ith test is given by μ+ τ_i; the variance of the inherent (or latent) reader accuracy measure is given by ${\tilde{σ}}_{R}^{2} + {\tilde{σ}}_{τ R}^{2}$ , with ${\tilde{σ}}_{R}^{2}$ denoting the component due to the main effect of readers and ${\tilde{σ}}_{τ R}^{2}$ the component due to test×reader interaction; the variance of the reader accuracy measure estimate is given by ${\tilde{σ}}_{R}^{2} + {\tilde{σ}}_{τ R}^{2} + {\tilde{σ}}_{ε}^{2}$ ; and the measurement error variance that is attributable to cases and within-reader variability that describes how a reader interprets the same image in different ways on different occasions is given by ${\tilde{σ}}_{ε}^{2}$ . The interpretations of Cov₁, Cov₂ and Cov₃ have been discussed earlier. Various correlations are functions of the parameters. For example, define $ρ_{BR} = {Cov}_{2} / ({\tilde{σ}}_{R}^{2} + {\tilde{σ}}_{τ R}^{2} + {\tilde{σ}}_{ε}^{2})$ and $ρ_{BR ∣ readers} = {Cov}_{2} / {\tilde{σ}}_{ε}^{2}$ ; then ρ_BR is the correlation between AUC estimates for two different readers using the same test, and ρ_BR|readers is the analogous correlation but treating readers as fixed. See Appendix B for derivations of these last two correlations.

Formulas for computing the DBM variance components are presented in Table 6. Estimates for the OR variance components and covariances result from using Table 5 with the DBM variance components replaced by their estimates.

Table 6.

ANOVA estimates for DBM variance components

| DBM model parameter | Estimate |
| --- | --- |
| $σ_{R}^{2}$ | $\frac{1}{t c} [MS (R) - MS (T * R) - MS (R * C) + MS (T * R * C)]$ |
| $σ_{C}^{2}$ | $\frac{1}{t r} [MS (C) - MS (T * C) - MS (R * C) + MS (T * R * C)]$ |
| $σ_{τ R}^{2}$ | $\frac{1}{c} [MS (T * R) - MS (T * R * C)]$ |
| $σ_{τ C}^{2}$ | $\frac{1}{r} [MS (T * C) - MS (T * R * C)]$ |
| $σ_{R C}^{2}$ | $\frac{1}{t} [MS (R * C) - MS (T * R * C)]$ |
| $σ_{τ R C}^{2} + σ_{ε}^{2}$ | MS(T*R*C) |

Note: These estimates, except for the last, can be negative.
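To illustrate how the estimates are chained together, the sketch below computes the DBM variance component estimates from mean squares, converts them to OR quantities using the Table 5 relationships, and then evaluates the two between-reader correlations defined earlier. The function names and the mean squares are hypothetical values chosen only so that the arithmetic is easy to follow.

```python
def dbm_variance_components(ms, t, r, c):
    """ANOVA estimates of the DBM variance components from mean squares."""
    return {
        "R": (ms["R"] - ms["TR"] - ms["RC"] + ms["TRC"]) / (t * c),
        "C": (ms["C"] - ms["TC"] - ms["RC"] + ms["TRC"]) / (t * r),
        "TR": (ms["TR"] - ms["TRC"]) / c,
        "TC": (ms["TC"] - ms["TRC"]) / r,
        "RC": (ms["RC"] - ms["TRC"]) / t,
        "TRC+E": ms["TRC"],          # sigma^2_{tau R C} + sigma^2_eps
    }


def or_parameters(vc, c):
    """OR variance components and covariances from DBM components."""
    return {
        "var_R": vc["R"],
        "var_TR": vc["TR"],
        "var_eps": (vc["C"] + vc["TC"] + vc["RC"] + vc["TRC+E"]) / c,
        "cov1": (vc["C"] + vc["RC"]) / c,
        "cov2": (vc["C"] + vc["TC"]) / c,
        "cov3": vc["C"] / c,
    }


# Hypothetical mean squares with t = 2 tests, r = 5 readers, c = 100 cases.
t, r, c = 2, 5, 100
ms = {"R": 0.8, "C": 0.7, "TR": 0.3, "TC": 0.3, "RC": 0.3, "TRC": 0.2}
vc = dbm_variance_components(ms, t, r, c)
orp = or_parameters(vc, c)

# Correlation between AUC estimates of two readers using the same test.
rho_br = orp["cov2"] / (orp["var_R"] + orp["var_TR"] + orp["var_eps"])
# The analogous correlation treating readers as fixed.
rho_br_fixed = orp["cov2"] / orp["var_eps"]
```

With real data some of the estimates can be negative, in which case the resulting covariances and correlations need to be interpreted with care.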

Summary of related papers

The relationship between the DBM and OR methods is described by Hillis et al [8]. They generalize the DBM method, using new model simplification, to include the use of normalized and quasi pseudovalues and determine the conditions under which the DBM and OR methods produce equal test statistics. They also show how the DBM method can be used when readers are treated as fixed and show the relationship between the DBM and OR methods for fixed readers. Hillis and Berbaum [9] show empirically that new model simplification performs better than original DBM, as well as showing that use of normalized pseudovalues has little effect on the type I error compared to raw pseudovalues. Hillis [10] derives ddf_H for both the DBM and OR procedures and empirically shows that new model simplification plus ddf_H performs better than new model simplification. Hillis and Berbaum [14] show how to compute the power for the DBM method using new model simplification; updated power software using new model simplification plus ddf_H can be downloaded from http://perception.radiology.uiowa.edu

Results

Simulation Study

In a simulation study we examined the performance of the three DBM approaches –original DBM, new model simplification, and new model simplification plus ddf_H – with respect to the empirical type I error rate for testing the null hypothesis of no test effect. The simulation model of Roe and Metz [2] provided continuous decision-variable outcomes generated from a conventional binormal model that treats both cases and readers as random. We used this simulation model to create discrete rating data by computer simulation. The discrete rating data, taking integer values from one to five, were created by transforming the continuous outcomes using the cutpoints reported by Dorfman et al [3]. The combinations of reader and case sample sizes, AUC values, and variance components were the same as those used in Roe and Metz [2] and Dorfman et al [3]. Briefly, rating data were simulated for 144 combinations of three reader-sample sizes (readers = 3, 5, and 10); four case sample sizes (10+/90−, 25+/25−, 50+/50−, and 100+/100−, where “+” indicates a diseased case and “−” indicates a normal case); three AUC values (AUC = 0.702, 0.855, and 0.961) that describe the separation between the normal and diseased case populations, averaged across readers; and four combinations of reader and case variance components. Two thousand samples were generated for each of the 144 combinations; within each simulation, all Monte Carlo readers read the same cases for each of two equal tests.

The data from each simulated sample were analyzed by all three approaches. Both maximum likelihood (semiparametric) estimation assuming a latent binormal model [17, 18] and the trapezoidal-rule (nonparametric) method were used to estimate AUC from the 5-category discrete rating data. Analyses that employed semiparametric AUC estimation were performed using both raw and normalized pseudovalues, while for nonparametric AUC estimation no distinction was made since raw and normalized pseudovalues produce the same AUC estimates. For each of the 144 combinations, the empirical type I error rate was taken as the proportion of samples for which the null hypothesis was rejected at the alpha = 0.05 level. Data simulation was performed using the IML procedure in SAS [19]. The semiparametric AUC pseudovalues were computed using a dynamic link library (DLL), written in Fortran 90 by Don Dorfman and Kevin Schartz, that was accessed from within the IML procedure; this DLL, as well as a SAS macro that performs the different analyses used in this paper, can be downloaded from http://perception.radiology.uiowa.edu.
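The statement that raw and normalized pseudovalues give the same nonparametric AUC estimate can be checked numerically. The following is a minimal sketch, not the Fortran DLL or SAS macro used in the study: it assumes the usual Quenouille-Tukey pseudovalue Y_k = c·θ̂ − (c−1)·θ̂_(k) with one case deleted at a time, and the rating vectors are hypothetical values chosen only for illustration.

```python
from itertools import product


def trapezoid_auc(diseased, normal):
    """Trapezoidal-rule AUC: Mann-Whitney statistic with ties counted as 1/2."""
    total = 0.0
    for x, y in product(diseased, normal):
        if x > y:
            total += 1.0
        elif x == y:
            total += 0.5
    return total / (len(diseased) * len(normal))


def jackknife_pseudovalues(diseased, normal):
    """Raw pseudovalues Y_k = c*theta - (c-1)*theta_(k), deleting one case at a time."""
    c = len(diseased) + len(normal)
    theta = trapezoid_auc(diseased, normal)
    pseudo = []
    for k in range(len(diseased)):
        reduced = diseased[:k] + diseased[k + 1:]
        pseudo.append(c * theta - (c - 1) * trapezoid_auc(reduced, normal))
    for k in range(len(normal)):
        reduced = normal[:k] + normal[k + 1:]
        pseudo.append(c * theta - (c - 1) * trapezoid_auc(diseased, reduced))
    return theta, pseudo


# Hypothetical 5-category ratings for a single test-reader combination.
diseased_ratings = [5, 4, 4, 3, 5, 2, 4, 5]
normal_ratings = [1, 2, 3, 1, 2, 4, 1, 3, 2, 1]

theta_hat, pseudovalues = jackknife_pseudovalues(diseased_ratings, normal_ratings)
jackknife_estimate = sum(pseudovalues) / len(pseudovalues)
print(theta_hat, jackknife_estimate)
```

For the trapezoidal AUC the mean of the pseudovalues agrees with the original estimate up to floating-point error, so re-centering the pseudovalues on the original estimate would leave them unchanged.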

From the results, summarized in Tables 7 and 8, we draw the following conclusions. (1) New model simplification plus ddf_H has the mean empirical type I error rate closest to the nominal .05 level: 0.049 (raw pseudovalues) and 0.051 (normalized pseudovalues) for semiparametric estimation, and 0.053 for nonparametric estimation. (2) Original DBM has the most conservative type I error rates: 0.036 (raw and normalized pseudovalues) for semiparametric estimation and 0.041 for nonparametric estimation. (3) New model simplification gives type I error rates midway between those obtained from the other two approaches. (4) With semiparametric estimation, the mean type I error rates for raw and normalized pseudovalues differ only slightly for each approach. (5) New model simplification confidence intervals can be extremely wide, due to a small proportion of samples where ddf_D approaches zero [10]. We note that new model simplification plus ddf_H does not have this problem, since ddf_H is bounded below by (t−1)(r−1). (6) For semiparametric estimation using either original DBM or new model simplification plus ddf_H, normalized pseudovalue confidence interval widths are 4% smaller, on average, than those for raw pseudovalues. For new model simplification the confidence interval widths are 32% smaller, although here outliers are affecting the results as noted above. These results suggest that the original AUC estimator has more precision and power for semiparametric estimation than the jackknife AUC estimator.

Table 7.

Semiparametric estimation results of the simulation study for discrete rating data.

	Type I error rates
Approach	Pseudovalues	N	Mean	Min	Max	Range	SD	CI width mean
Original	raw	144	0.036	0.009	0.063	0.054	0.0124	0.196
	normalized	144	0.036	0.011	0.062	0.052	0.0111	0.188
New	raw	144	0.042	0.011	0.070	0.060	0.0123	4.05E+121
	normalized	144	0.043	0.017	0.067	0.050	0.0108	2.74E+121
New plus ddf_H	raw	144	0.049	0.016	0.075	0.060	0.0124	0.192
	normalized	144	0.051	0.025	0.077	0.052	0.0105	0.184


Original: original DBM; New: new model simplification; New plus ddf_H: new model simplification plus ddf_H; Min: minimum; Max: maximum; SD: standard deviation; CI width: width of a 95% confidence interval for the difference of the AUC estimates.

Table 8.

Nonparametric estimation results of the simulation study for discrete rating data.

	Type I error rates
Approach	N	Mean	Min	Max	Range	SD	CI width mean
original	144	0.041	0.014	0.069	0.055	0.0098	0.177
new	144	0.046	0.024	0.072	0.049	0.0100	4.55E+121
new plus ddf_H	144	0.053	0.029	0.079	0.050	0.0097	0.174


No distinction is made between raw and normalized pseudovalues since the trapezoid estimate is the same for either type of pseudovalues. Original: original DBM; new: new model simplification; new plus ddf_H: new model simplification plus ddf_H; min: minimum; max: maximum; SD: standard deviation; CI width: width of a 95% confidence interval for the difference of the AUC estimates.

Example 1: Spin-Echo versus CINE MRI for Detection of Aortic Dissection

The data for this example were provided by Carolyn Van Dyke, MD, who had obtained them in a study [20] that compared the relative performance of single Spin-Echo Magnetic Resonance Imaging (SE MRI) and CINE MRI in detecting thoracic aortic dissection. There were 45 patients with an aortic dissection and 69 patients without a dissection imaged with both SE MRI and CINE MRI. Five radiologists independently interpreted all of the images using a 5-point ordinal scale.

Table 9 presents the analysis results for raw and normalized pseudovalues obtained with semiparametric AUC estimation. We note that the jackknife and original semiparametric AUC estimates are similar, so there is little difference in the population estimates: the test AUC estimates based on the raw pseudovalues are .920 for CINE and .951 for Spin Echo, whereas the estimates based on normalized pseudovalues are .911 for CINE and .952 for Spin Echo. Since ${\hat{σ}}_{τ R}^{2} > 0$ for both types of pseudovalues, original DBM and new model simplification yield the same results. For the normalized pseudovalues, F_orig = F_DBM = 2.619, ddf_orig = ddf_D = 10.31 and p = 0.1358 in assessing the difference in AUC. (We note that results for this and the following example differ slightly from those in References [8, 9, 14] because we have used an updated AUC algorithm.) From equation (12) we have ddf_H = 10.99, resulting in p = 0.1339 with new model simplification plus ddf_H. Hence, the latter approach produces a slightly more significant result, illustrating a point made earlier: if ${\hat{σ}}_{τ C}^{2} > 0$ , then new model simplification plus ddf_H will yield a more significant result than new model simplification, since ddf_H > ddf_D. We note that the raw pseudovalues analysis produced less significant results, with p = .2579 for new model simplification and p = .2563 for new model simplification plus ddf_H.

Table 9.

DBM procedure analyses for Van Dyke et al [20] data

Semiparametric and corresponding jackknife AUC estimates:
Reader (j)	Test 1 (CINE): θ̂_1j (semiparametric)	Test 1 (CINE): Y_1j· (jackknife)	Test 2 (Spin Echo): θ̂_2j (semiparametric)	Test 2 (Spin Echo): Y_2j· (jackknife)
1	0.933	0.947	0.951	0.950
2	0.890	0.909	0.935	0.933
3	0.929	0.929	0.928	0.928
4	0.970	0.981	1.000	0.999
5	0.833	0.836	0.945	0.943
Mean	θ̂_1· = .911	Y_1·· = .920	θ̂_2· = .952	Y_2·· = .951

ANOVA table:
Source	ddf	Raw pseudovalue mean square	Normalized pseudovalue mean square
T	1	0.264166	0.468996
R	4	0.315637	0.297310
C	113	0.392538	0.392538
T×R	4	0.112560	0.108062
T×C	113	0.143095	0.143095
R×C	452	0.098771	0.098771
T×R×C	452	0.072068	0.072068


T: tests; R: readers; C: cases.

Raw pseudovalues results:

Original DBM: F_orig = 1.439, ddf_orig = 10.03, p = 0.2579

New model simplification: F_DBM = 1.439, ddf_D = 10.03, p = 0.2579

New model simplification plus ddf_H: F_DBM = 1.439, ddf_H = 10.64, p = 0.2563

Normalized pseudovalues results:

Original DBM: F_orig = 2.619, ddf_orig = 10.31, p = 0.1358

New model simplification: F_DBM = 2.619, ddf_D = 10.31, p = 0.1358

New model simplification plus ddf_H: F_DBM = 2.619, ddf_H = 10.99, p = 0.1339
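The F statistics and denominator degrees of freedom listed above can be reproduced from the mean squares in Table 9. The sketch below is only an arithmetic check under the formulas stated in Appendix A for the case where both interaction variance component estimates are positive; the function and variable names are our own, and p values are omitted because they require an F-distribution routine.

```python
def satterthwaite_ddf(ms_tr, ms_tc, ms_trc, t, r, c):
    """ddf_D: Satterthwaite approximation using all three mean squares."""
    denom_ms = ms_tr + ms_tc - ms_trc
    return denom_ms ** 2 / (
        ms_tr ** 2 / ((t - 1) * (r - 1))
        + ms_tc ** 2 / ((t - 1) * (c - 1))
        + ms_trc ** 2 / ((t - 1) * (r - 1) * (c - 1))
    )


def hillis_ddf(ms_tr, ms_tc, ms_trc, t, r):
    """ddf_H: same numerator, but only the MS(T*R) term in the denominator."""
    denom_ms = ms_tr + ms_tc - ms_trc
    return denom_ms ** 2 / (ms_tr ** 2 / ((t - 1) * (r - 1)))


t, r, c = 2, 5, 114  # tests, readers, cases (45 + 69) in the Van Dyke data

# Mean squares copied from the ANOVA table in Table 9.
table9 = {
    "raw": {"T": 0.264166, "TR": 0.112560, "TC": 0.143095, "TRC": 0.072068},
    "normalized": {"T": 0.468996, "TR": 0.108062, "TC": 0.143095, "TRC": 0.072068},
}

results = {}
for kind, ms in table9.items():
    f_dbm = ms["T"] / (ms["TR"] + ms["TC"] - ms["TRC"])
    ddf_d = satterthwaite_ddf(ms["TR"], ms["TC"], ms["TRC"], t, r, c)
    ddf_h = hillis_ddf(ms["TR"], ms["TC"], ms["TRC"], t, r)
    results[kind] = (f_dbm, ddf_d, ddf_h)
    print(kind, round(f_dbm, 3), round(ddf_d, 2), round(ddf_h, 2))
```

Because ddf_H drops two positive terms from the denominator of ddf_D, it is the larger of the two whenever the test-by-case estimate is positive, which is why the p value decreases slightly.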

Table 10 presents the DBM and OR variance components obtained on the basis of normalized pseudovalues. The DBM variance components were computed using the equations in Table 6, whereas the OR variance components and covariances were computed by replacing the DBM variance components in Table 5 with their estimates. The OR parameter estimates allow us to make statements such as the following about the variability in the reader-level AUC outcomes. The estimated variance of the inherent reader accuracy measures is ${\hat{\tilde{σ}}}_{R}^{2} + {\hat{\tilde{σ}}}_{τ R}^{2} = 0.000713 + 0.000316 = 0.001029$ ; thus, we estimate that, with probability .95, the inherent (or latent) AUC of a randomly selected reader lies within $1.96 \sqrt{0.001029} = .063$ of the population test AUC. The estimated measurement error variance due to cases and within-reader variability is ${\hat{\tilde{σ}}}_{ε}^{2} = 0.001069$ . The estimated variance of the observed reader accuracy measures is ${\hat{\tilde{σ}}}_{R}^{2} + {\hat{\tilde{σ}}}_{τ R}^{2} + {\hat{\tilde{σ}}}_{ε}^{2} = 0.001029 + 0.001069 = 0.002098$ . The estimated correlation between observed AUC values for two randomly selected readers reading the same cases with the same test is given by ${\hat{ρ}}_{BR} = {\hat{Cov}}_{2} / ({\hat{\tilde{σ}}}_{R}^{2} + {\hat{\tilde{σ}}}_{τ R}^{2} + {\hat{\tilde{σ}}}_{ε}^{2}) = 0.000320 / 0.002098 = 0.153$ , and the analogous correlation for two given (or fixed) readers is ${\hat{ρ}}_{BR ∣ R, τ R} = {\hat{Cov}}_{2} / {\hat{\tilde{σ}}}_{ε}^{2} = 0.000320 / 0.001069 = 0.300$ .
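The arithmetic behind these statements can be traced with a short script. This is a sketch under an assumption: Table 5 is not reproduced in this section, so the DBM-to-OR relations coded below (each OR error variance or covariance equals a sum of DBM case-related components divided by the number of cases c) are our reading, adopted because they reproduce the Table 10 values to rounding. The dictionary keys are our own labels.

```python
import math

c = 114  # number of cases in the Van Dyke data (45 + 69)

# DBM variance component estimates from Table 10 (normalized pseudovalues).
dbm = {
    "R": 0.000713,
    "TR": 0.000316,
    "C": 0.022274,
    "TC": 0.014205,
    "RC": 0.013351,
    "TRC_plus_error": 0.072068,
}

# Assumed DBM-to-OR relations (labelled assumption, see lead-in).
or_est = {
    "var_R": dbm["R"],
    "var_TR": dbm["TR"],
    "cov1": (dbm["C"] + dbm["RC"]) / c,
    "cov2": (dbm["C"] + dbm["TC"]) / c,
    "cov3": dbm["C"] / c,
    "var_error": (dbm["C"] + dbm["TC"] + dbm["RC"] + dbm["TRC_plus_error"]) / c,
}

# Quantities discussed in the text.
latent_var = or_est["var_R"] + or_est["var_TR"]       # inherent reader variance
half_width = 1.96 * math.sqrt(latent_var)             # 95% range for a reader's latent AUC
observed_var = latent_var + or_est["var_error"]       # variance of observed reader AUCs
rho_br = or_est["cov2"] / observed_var                # two random readers, same test
rho_br_fixed = or_est["cov2"] / or_est["var_error"]   # two fixed readers, same test

for name, value in or_est.items():
    print(name, round(value, 6))
print(round(half_width, 3), round(rho_br, 3), round(rho_br_fixed, 3))
```

Small differences in the last digit relative to the text are expected, since the inputs here are the rounded table entries.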

Table 10.

Variance component estimates for Van Dyke et al [20] data based on normalized pseudovalues

DBM		OR
Variance component	Estimate	Variance component	Estimate
σ_{R}^{2}	0.000713	{\tilde{σ}}_{R}^{2}	0.000713
σ_{τ R}^{2}	0.000316	{\tilde{σ}}_{τ R}^{2}	0.000316
σ_{C}^{2}	0.022274	Cov₁	0.000313
σ_{τ C}^{2}	0.014205	Cov₂	0.000320
σ_{R C}^{2}	0.013351	Cov₃	0.000195
σ_{τ R C}^{2} + σ_{ε}^{2}	0.072068	{\tilde{σ}}_{ε}^{2}	0.001069

Example 2: Picture archiving communication system versus plain film interpretation of neonatal examinations

Franken et al [21] compared the diagnostic accuracy of interpreting clinical neonatal radiographs using a picture archiving and communication system (PACS) workstation versus plain film. The case sample consisted of 100 chest or abdominal radiographs (67 abnormal and 33 normal). The readers were four radiologists with considerable experience in interpreting neonatal examinations. The readers indicated whether each patient had normal or abnormal findings and their degree of confidence in this judgment using a five-point ordinal scale.

Table 11 presents the ANOVA tables for the raw and normalized pseudovalues using semiparametric AUC estimation. For either type of pseudovalue we have MS(T*R) < MS(T*R*C) and MS(T*C) < MS(T*R*C); thus ${\hat{σ}}_{τ R}^{2} < 0$ and ${\hat{σ}}_{τ C}^{2} < 0$ from equation (5). Hence for original DBM we assume $σ_{τ R}^{2} = σ_{τ C}^{2} = 0$ and use MS(T*R*C) as the denominator for F_orig with ddf_orig = (t−1)(r−1)(c−1) = 297; in contrast, for new model simplification and new model simplification plus ddf_H we only assume $σ_{τ C}^{2} = 0$ and use MS(T*R) as the denominator for F_DBM with ddf_D = ddf_H = (t−1)(r−1) = 3. Using the normalized pseudovalues with original DBM yields F_orig = 0.796, ddf_orig = 297 and p = 0.3729, while new model simplification and new model simplification plus ddf_H yield F_DBM = 8.888, ddf_D = ddf_H = 3 and p = 0.0585. The raw pseudovalues analysis produces less significant results, with p = 0.0647 for both new model simplification and new model simplification plus ddf_H.

Table 11.

DBM procedure analyses for Franken et al [21] data.

ANOVA table:
Source	ddf	Raw pseudovalue Mean square	Normalized pseudovalue Mean square
T	1	0.063574	0.066606
R	3	0.088782	0.097686
C	99	0.547734	0.547734
T×R	3	0.007781	0.007494
T×C	99	0.078071	0.078071
R×C	297	0.127582	0.127582
T×R×C	297	0.083643	0.083643


T: tests; R: readers; C: cases.

Raw pseudovalues results:

Original DBM: F_orig = 0.760, ddf_orig = 297, p = 0.3840

New model simplification: F_DBM = 8.171, ddf_D = 3, p = 0.0647

New model simplification plus ddf_H: F_DBM = 8.171, ddf_H = 3, p = 0.0647

Normalized pseudovalues results:

Original DBM: F_orig = 0.796, ddf_orig = 297, p = 0.3729

New model simplification: F_DBM = 8.888, ddf_D = 3, p = 0.0585

New model simplification plus ddf_H: F_DBM = 8.888, ddf_H = 3, p = 0.0585
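The contrast between the two denominators in this example can be reproduced from Table 11. The sketch below uses the relations c·σ̂²_τR = MS(T*R) − MS(T*R*C) and r·σ̂²_τC = MS(T*C) − MS(T*R*C) quoted in Appendix A; variable names are our own, and results computed from the rounded table entries may differ from the reported values in the last digit.

```python
t, r, c = 2, 4, 100  # tests, readers, cases in the Franken data

# Mean squares copied from Table 11.
table11 = {
    "raw": {"T": 0.063574, "TR": 0.007781, "TC": 0.078071, "TRC": 0.083643},
    "normalized": {"T": 0.066606, "TR": 0.007494, "TC": 0.078071, "TRC": 0.083643},
}

summary = {}
for kind, ms in table11.items():
    var_tr = (ms["TR"] - ms["TRC"]) / c  # test-by-reader estimate
    var_tc = (ms["TC"] - ms["TRC"]) / r  # test-by-case estimate

    # Both estimates are negative here, so original DBM drops both terms
    # and falls back on MS(T*R*C) as the error term.
    f_orig = ms["T"] / ms["TRC"]
    ddf_orig = (t - 1) * (r - 1) * (c - 1)

    # New model simplification only drops the test-by-case term,
    # so MS(T*R) remains the denominator.
    f_dbm = ms["T"] / ms["TR"]
    ddf_d = (t - 1) * (r - 1)

    summary[kind] = (var_tr, var_tc, f_orig, ddf_orig, f_dbm, ddf_d)
    print(kind, round(f_orig, 3), ddf_orig, round(f_dbm, 3), ddf_d)
```

The roughly tenfold change in the F statistic comes entirely from replacing MS(T*R*C) with the much smaller MS(T*R) in the denominator, at the cost of a drop in denominator degrees of freedom from 297 to 3.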

Discussion

We have summarized recently proposed solutions for the various drawbacks of the original DBM method and examined the performance of these solutions in a simulation study. The solutions include using normalized pseudovalues which allow DBM results to be based on either the original or the jackknife accuracy estimates; using less data-based model reduction and ddf_H to make DBM less conservative with a type I error rate much closer to the nominal level; and showing that the DBM model can be viewed as a “working” model that produces the same inferences as obtained using the acceptable conceptual and theoretical OR model. This last solution is especially important, since it establishes a solid theoretical justification for using DBM, allows us to make meaningful statements about the variability and covariances of the accuracy estimates by computing OR model parameter estimates from the DBM model parameter estimates, and allows for generalization in future research. Thus we recommend the revised DBM procedure (“new model simplification plus ddf_H”) that incorporates these recent developments. Stand-alone software as well as a SAS macro that incorporates these modifications are available to the public [22–24].

The DBM and OR approaches complement each other. We can think of each approach as consisting of a model and a procedure, where procedure denotes the computational algorithm steps and model denotes the statistical model used to motivate the procedure and justify inferences. The OR model is conceptually and theoretically more acceptable. However, the DBM procedure is easier to implement, because after computing the pseudovalues (for each test-reader combination) the F statistic is easily obtained by subjecting the pseudovalues to a conventional three-way ANOVA. Furthermore, the DBM model, though not statistically acceptable, makes the DBM procedure easier to comprehend initially, especially for users without an extensive statistical background.

Finally, we note that the choice between using the original or corresponding jackknife AUC estimator should depend on which estimator has superior performance properties. For the trapezoidal method AUC this is not an issue, since the original and jackknife estimates are equal; however, for semiparametric estimation our simulation study and examples (both examples had a smaller p value using normalized pseudovalues) suggest that the original estimator has higher precision and power.

Acknowledgments

The authors thank Carolyn Van Dyke, M.D. for sharing her data set. This research was supported by the National Institutes of Health, grant R01EB000863. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.


APPENDIX A

In this section we derive the relationships given in Table 2 between F_orig and F_DBM, as defined by equations (6) and (8), respectively, and between ddf_orig, ddf_D, and ddf_H, as defined by equations (7), (10), and (11), respectively. We do this for the four possible situations corresponding to the test-by-reader and test-by-case variance component estimates being either positive or nonpositive. We make the reasonable assumptions that none of the mean squares are zero (and hence must be positive) and that the number of cases exceeds two (c>2).

First we derive the relationship between ddf_D and ddf_H. If ${\hat{σ}}_{τ C}^{2} > 0$ then

\begin{array}{l} {ddf}_{D} = \frac{{[MS (T * R) + MS (T * C) - MS (T * R * C)]}^{2}}{\frac{MS {(T * R)}^{2}}{(t - 1) (r - 1)} + \underbrace{\frac{MS {(T * C)}^{2}}{(t - 1) (c - 1)} + \frac{MS {(T * R * C)}^{2}}{(t - 1) (r - 1) (c - 1)}}_{> 0}} \\ < \frac{{[MS (T * R) + MS (T * C) - MS (T * R * C)]}^{2}}{MS {(T * R)}^{2} / [(t - 1) (r - 1)]} = {ddf}_{H} . \end{array}

If ${\hat{σ}}_{τ C}^{2} \leq 0$ then

{ddf}_{D} = (t - 1) (r - 1) = {ddf}_{H} .

Thus ddf_D < ddf_H if ${\hat{σ}}_{τ C}^{2} > 0$ and ddf_D = ddf_H if ${\hat{σ}}_{τ C}^{2} \leq 0$ . These relationships hold regardless of the value of ${\hat{σ}}_{τ R}^{2}$ . Now we consider each of the four situations separately for the other relationships.

Situation 1

${\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} > 0$ . For this situation we have

\begin{array}{l} F_{orig} = F_{DBM} = \frac{MS (T)}{MS (T * R) + MS (T * C) - MS (T * R * C)} \\ {ddf}_{orig} = {ddf}_{D} = \frac{{[MS (T * R) + MS (T * C) - MS (T * R * C)]}^{2}}{\frac{MS {(T * R)}^{2}}{(t - 1) (r - 1)} + \frac{MS {(T * C)}^{2}}{(t - 1) (c - 1)} + \frac{MS {(T * R * C)}^{2}}{(t - 1) (r - 1) (c - 1)}} . \end{array}

Situation 2

${\hat{σ}}_{τ R}^{2} \leq 0, {\hat{σ}}_{τ C}^{2} > 0$ . From equation (5) we have $c {\hat{σ}}_{τ R}^{2} = MS (T * R) - MS (T * R * C)$ . Hence

F_{orig} = \frac{MS (T)}{MS (T * C)} \leq \frac{MS (T)}{\underbrace{MS (T * R) - MS (T * R * C)}_{c {\hat{σ}}_{τ R}^{2} \leq 0} + MS (T * C)} = F_{DBM},

with F_orig = F_DBM if and only if ${\hat{σ}}_{τ R}^{2} = 0$ . Also,

\begin{array}{l} {ddf}_{D} = \frac{{[\underbrace{MS (T * R) - MS (T * R * C)}_{c {\hat{σ}}_{τ R}^{2} \leq 0} + MS (T * C)]}^{2}}{\frac{MS {(T * R)}^{2}}{(t - 1) (r - 1)} + \frac{MS {(T * C)}^{2}}{(t - 1) (c - 1)} + \frac{MS {(T * R * C)}^{2}}{(t - 1) (r - 1) (c - 1)}} \\ \leq \frac{MS {(T * C)}^{2}}{\frac{MS {(T * R)}^{2}}{(t - 1) (r - 1)} + \frac{MS {(T * C)}^{2}}{(t - 1) (c - 1)} + \frac{MS {(T * R * C)}^{2}}{(t - 1) (r - 1) (c - 1)}} \\ < \frac{MS {(T * C)}^{2}}{\frac{MS {(T * C)}^{2}}{(t - 1) (c - 1)}} = (t - 1) (c - 1) = {ddf}_{orig} . \end{array}

That is, ddf_D<ddf_orig. In the proof we have utilized the relationship MS(T*R) −MS(T*R*C) + MS(T*C)>0, since from equation (5) we have $MS (T * C) - MS (T * R * C) = r {\hat{σ}}_{τ C}^{2} > 0$ .

Situation 3

${\hat{σ}}_{τ R}^{2} > 0, {\hat{σ}}_{τ C}^{2} \leq 0$ . For this situation we have

\begin{array}{l} F_{orig} = F_{DBM} = \frac{MS (T)}{MS (T * R)} \\ {ddf}_{orig} = {ddf}_{D} = (t - 1) (r - 1) \end{array}

Situation 4

${\hat{σ}}_{τ R}^{2} \leq 0, {\hat{σ}}_{τ C}^{2} \leq 0$ . From equation (5) it follows that MS(T*R)≤MS(T*R*C), with equality if and only if ${\hat{σ}}_{τ R}^{2} = 0$ . Thus

F_{orig} = \frac{MS (T)}{MS (T * R * C)} \leq \frac{MS (T)}{MS (T * R)} = F_{DBM},

with equality if and only if ${\hat{σ}}_{τ R}^{2} = 0$ . Also,

{ddf}_{orig} = (t - 1) (r - 1) (c - 1) > (t - 1) (r - 1) = {ddf}_{D} .

Note that we require the assumption that c>2 for this last relationship.
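The relationships derived in this appendix can also be checked numerically. The sketch below encodes the four situations as we read them from the derivations above and counts how often the stated inequalities fail for randomly drawn positive mean squares; the function names, the ranges of the random draws, and the small floating-point tolerance are our own choices.

```python
import random


def dbm_tests(ms_t, ms_tr, ms_tc, ms_trc, t, r, c):
    """F statistics and denominator df for original DBM and new model simplification."""
    df_tr = (t - 1) * (r - 1)
    df_tc = (t - 1) * (c - 1)
    df_trc = (t - 1) * (r - 1) * (c - 1)
    tr_positive = ms_tr > ms_trc  # test-by-reader estimate > 0
    tc_positive = ms_tc > ms_trc  # test-by-case estimate > 0
    full = ms_tr + ms_tc - ms_trc
    satterthwaite = full ** 2 / (
        ms_tr ** 2 / df_tr + ms_tc ** 2 / df_tc + ms_trc ** 2 / df_trc
    )

    # Original DBM: drop every interaction term whose estimate is nonpositive.
    if tr_positive and tc_positive:      # Situation 1
        f_orig, ddf_orig = ms_t / full, satterthwaite
    elif tc_positive:                    # Situation 2
        f_orig, ddf_orig = ms_t / ms_tc, df_tc
    elif tr_positive:                    # Situation 3
        f_orig, ddf_orig = ms_t / ms_tr, df_tr
    else:                                # Situation 4
        f_orig, ddf_orig = ms_t / ms_trc, df_trc

    # New model simplification: only the test-by-case term may be dropped.
    if tc_positive:
        f_dbm, ddf_d = ms_t / full, satterthwaite
        ddf_h = full ** 2 / (ms_tr ** 2 / df_tr)
    else:
        f_dbm, ddf_d, ddf_h = ms_t / ms_tr, df_tr, df_tr

    return {"f_orig": f_orig, "ddf_orig": ddf_orig,
            "f_dbm": f_dbm, "ddf_d": ddf_d, "ddf_h": ddf_h}


def count_violations(n_trials=20000, seed=1):
    """Count random configurations that contradict the Appendix A relationships."""
    rng = random.Random(seed)
    eps = 1e-9  # relative tolerance for floating-point ties
    violations = 0
    for _ in range(n_trials):
        t, r, c = 2, rng.randint(3, 10), rng.randint(10, 200)
        ms_t, ms_tr, ms_tc, ms_trc = (rng.uniform(0.01, 1.0) for _ in range(4))
        out = dbm_tests(ms_t, ms_tr, ms_tc, ms_trc, t, r, c)
        ok = out["f_orig"] <= out["f_dbm"] * (1 + eps)
        ok = ok and out["ddf_d"] <= out["ddf_h"] * (1 + eps)
        if ms_tr <= ms_trc:  # Situations 2 and 4
            ok = ok and out["ddf_d"] < out["ddf_orig"]
        violations += not ok
    return violations


print(count_violations())
```

A random check of this kind does not replace the derivations; it only guards against transcription errors in the formulas.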

APPENDIX B

In this section we show how to derive AUC correlations assuming the OR model (13). Let ${\hat{AUC}}_{i j}$ and ${\hat{AUC}}_{i^{'} j^{'}}$ denote two AUC estimates, with the first subscript denoting test and the second reader. Their correlation is defined by

Corr ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i^{'} j^{'}}) = \frac{Cov ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i^{'} j^{'}})}{\sqrt{Var ({\hat{AUC}}_{i j}) Var ({\hat{AUC}}_{i^{'} j^{'}})}}

where $Cov ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i^{'} j^{'}})$ is the covariance. To find the covariance and variances, we write ${\hat{AUC}}_{i j}$ and ${\hat{AUC}}_{i^{'} j^{'}}$ as functions of random and fixed effects using the OR model (13). It follows from well known statistical properties that the variance for each AUC estimate is the sum of the OR model variance components corresponding to the random effects, and $Cov ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i^{'} j^{'}})$ is the sum of the variance components corresponding to the reader or test×reader random effects that the AUC estimates have in common (i.e., they have the same subscript values for each AUC estimate), plus the covariance between the error terms.

For example, the between-reader correlation between AUC estimates for two different readers using the same test is given by

ρ_{BR} = Corr ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i j^{'}}) = \frac{Cov ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i j^{'}})}{\sqrt{Var ({\hat{AUC}}_{i j}) Var ({\hat{AUC}}_{i j^{'}})}}

(17)

where j≠j′. From equation (13), with ${\hat{AUC}}_{i j}$ taking the place of θ̂_ij, we have

\begin{array}{l} {\hat{AUC}}_{i j} = \tilde{μ} + {\tilde{τ}}_{i} + R_{j} + {(τ R)}_{i j} + ε_{i j} \\ {\hat{AUC}}_{i j^{'}} = \tilde{μ} + {\tilde{τ}}_{i} + R_{j^{'}} + {(τ R)}_{i j^{'}} + ε_{i j^{'}} . \end{array}

(18)

Each AUC estimate has the same variance, equal to the sum of all of the variance components corresponding to the random effects; that is,

Var ({\hat{AUC}}_{i j}) = Var ({\hat{AUC}}_{i j^{'}}) = {\tilde{σ}}_{R}^{2} + {\tilde{σ}}_{τ R}^{2} + {\tilde{σ}}_{ε}^{2} .

Examination of equations (18) shows that the AUCs do not have any reader or test×reader random effects in common since j≠j′. Thus the covariance is equal to Cov₂, the covariance between the error terms for different readers using the same test:

Cov ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i j^{'}}) = {Cov}_{2} .

(19)

It follows from equations (17), (18) and (19) that

ρ_{BR} = \frac{{Cov}_{2}}{{\tilde{σ}}_{R}^{2} + {\tilde{σ}}_{τ R}^{2} + {\tilde{σ}}_{ε}^{2}} .

Now we derive the between-reader correlation between AUC estimates for two different readers using the same test, but this time treating readers as fixed. In this case the correlation is a measure of the association between the deviation of one reader’s AUC estimate from that reader’s underlying AUC, due to case variation and reader error, with the deviation of the other reader’s AUC estimate from that reader’s underlying AUC. In contrast, ρ_BR is a measure of association between deviations of randomly chosen readers’ AUC estimates from the reader population AUC.

To derive this correlation we treat the reader and test×reader effects as fixed in model (13) by conditioning on them; thus these effects do not have corresponding variance components, but rather are treated like constants. We denote this correlation by ρ_BR|readers to indicate that it is for two fixed readers. The correlation is defined as before, except now the covariance and variances are conditional on the reader and test×reader random effects:

\begin{array}{l} ρ_{BR ∣ readers} = Corr ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i j^{'}} ∣ R_{j}, R_{j^{'}}, {(τ R)}_{i j}, {(τ R)}_{i j^{'}}) \\ = \frac{Cov ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i j^{'}} ∣ R_{j}, R_{j^{'}}, {(τ R)}_{i j}, {(τ R)}_{i j^{'}})}{\sqrt{Var ({\hat{AUC}}_{i j} ∣ R_{j}, R_{j^{'}}, {(τ R)}_{i j}, {(τ R)}_{i j^{'}}) Var ({\hat{AUC}}_{i j^{'}} ∣ R_{j}, R_{j^{'}}, {(τ R)}_{i j}, {(τ R)}_{i j^{'}})}} \end{array}

(20)

When we condition on the reader and test×reader random effects, the only random effects in equations (18) are the error terms. Thus each AUC has the same variance, equal to ${\tilde{σ}}_{ε}^{2}$ :

Var ({\hat{AUC}}_{i j} ∣ R_{j}, R_{j^{'}}, {(τ R)}_{i j}, {(τ R)}_{i j^{'}}) = Var ({\hat{AUC}}_{i j^{'}} ∣ R_{j}, R_{j^{'}}, {(τ R)}_{i j}, {(τ R)}_{i j^{'}}) = {\tilde{σ}}_{ε}^{2} .

(21)

Similarly, the covariance is equal to Cov₂, the covariance between the error terms:

Cov ({\hat{AUC}}_{i j}, {\hat{AUC}}_{i j^{'}} ∣ R_{j}, R_{j^{'}}, {(τ R)}_{i j}, {(τ R)}_{i j^{'}}) = {Cov}_{2} .

(22)

It follows from equations (20), (21) and (22) that

ρ_{BR ∣ readers} = \frac{{Cov}_{2}}{{\tilde{σ}}_{ε}^{2}} .

These correlations can be written in terms of the DBM model parameters using the relationships in Table 5. For example, since ${Cov}_{2} = (σ_{C}^{2} + σ_{τ C}^{2}) / c$ and ${\tilde{σ}}_{ε}^{2} = (σ_{C}^{2} + σ_{τ C}^{2} + σ_{R C}^{2} + σ_{τ R C}^{2} + σ_{ε}^{2}) / c$ , where $σ_{C}^{2}, σ_{τ C}^{2}, σ_{R C}^{2}, σ_{τ R C}^{2}$ and $σ_{ε}^{2}$ denote the DBM model variance components and c is the number of cases, the factor 1/c cancels and $ρ_{BR ∣ readers} = (σ_{C}^{2} + σ_{τ C}^{2}) / (σ_{C}^{2} + σ_{τ C}^{2} + σ_{R C}^{2} + σ_{τ R C}^{2} + σ_{ε}^{2})$ in terms of the DBM variance components. This last expression is also given in equation (4) of Reference [2].
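The two correlation formulas can be illustrated by Monte Carlo simulation of the OR model for two readers using the same test. The sketch below makes assumptions beyond the text: effects and errors are drawn from normal distributions, the error pair is built from a shared component with variance Cov₂ plus independent residuals, and the parameter values are the Table 10 estimates treated as if they were true values. The fixed effects μ̃ and τ̃_i are omitted because constants do not affect a correlation.

```python
import math
import random


def correlation(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pair(rng, var_r, var_tr, var_eps, cov2, fixed_readers):
    """AUC deviations for readers j and j' on the same test."""
    shared = rng.gauss(0.0, math.sqrt(cov2))   # induces Cov(eps_ij, eps_ij') = cov2
    resid_sd = math.sqrt(var_eps - cov2)
    pair = []
    for _ in range(2):
        if fixed_readers:
            reader_part = 0.0                  # conditioning: reader effects are constants
        else:
            reader_part = (rng.gauss(0.0, math.sqrt(var_r))
                           + rng.gauss(0.0, math.sqrt(var_tr)))
        pair.append(reader_part + shared + rng.gauss(0.0, resid_sd))
    return pair


def simulated_rho(fixed_readers, n=100000, seed=0):
    rng = random.Random(seed)
    # Table 10 estimates, used here as assumed true parameter values.
    var_r, var_tr, var_eps, cov2 = 0.000713, 0.000316, 0.001069, 0.000320
    first, second = [], []
    for _ in range(n):
        a, b = simulate_pair(rng, var_r, var_tr, var_eps, cov2, fixed_readers)
        first.append(a)
        second.append(b)
    return correlation(first, second)


rho_random_readers = simulated_rho(fixed_readers=False)
rho_fixed_readers = simulated_rho(fixed_readers=True)
print(round(rho_random_readers, 3), round(rho_fixed_readers, 3))
```

Up to simulation error, the first value should be close to Cov₂ divided by the total variance and the second close to Cov₂ divided by the error variance, in line with the expressions for ρ_BR and ρ_BR|readers.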


Contributor Information

Stephen L. Hillis, Center for Research in the Implementation of Innovative Strategies in Practice (CRIISP) Iowa City VA Medical Center, Iowa City, IA, U.S.A. Department of Biostatistics, University of Iowa, Iowa City, IA, U.S.A

Kevin S. Berbaum, Department of Radiology, University of Iowa, Iowa City, IA, U.S.A

Charles E. Metz, Department of Radiology, University of Chicago Medical Center, Chicago, IL, U.S.A

References

1. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Investigative Radiology. 1992;27:723–731.
2. Roe CA, Metz CE. Dorfman-Berbaum-Metz method for statistical analysis of multireader, multimodality receiver operating characteristic data: validation with computer simulation. Academic Radiology. 1997;4:298–303. doi: 10.1016/s1076-6332(97)80032-3.
3. Dorfman DD, Berbaum KS, Lenth RV, Chen YF, Donaghy BA. Monte Carlo validation of a multireader method for receiver operating characteristic discrete rating data: factorial experimental design. Academic Radiology. 1998;5:591–602. doi: 10.1016/s1076-6332(98)80294-8.
4. Quenouille MH. Approximate tests of correlation in time series. Journal of the Royal Statistical Society, Series B. 1949;11:68–84.
5. Quenouille MH. Notes on bias in estimation. Biometrika. 1956;43:353–360.
6. Tukey JW. Bias and confidence in not quite large samples (abstract). Annals of Mathematical Statistics. 1958;29:614.
7. Berbaum KS. God, like the devil, is in the details. Academic Radiology. 2006;13:1311–1316. doi: 10.1016/j.acra.2006.09.053.
8. Hillis SL, Obuchowski NA, Schartz KM, Berbaum KS. A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette Methods for receiver operating characteristic (ROC) data. Statistics in Medicine. 2005;24:1579–1607. doi: 10.1002/sim.2024.
9. Hillis SL, Berbaum KS. Monte Carlo validation of the Dorfman-Berbaum-Metz method using normalized pseudovalues and less data-based model simplification. Academic Radiology. 2005;12:1534–1542. doi: 10.1016/j.acra.2005.07.012.
10. Hillis SL. A comparison of denominator degrees of freedom methods for multiple observer ROC analysis. Statistics in Medicine. 2007;26:596–619. doi: 10.1002/sim.2532.
11. Satterthwaite FE. Synthesis of variance. Psychometrika. 1941;6:309–316.
12. Satterthwaite FE. An approximate distribution of estimates of variance components. Biometric Bulletin. 1946;2:110–114.
13. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747.
14. Hillis SL, Berbaum KS. Power estimation for the Dorfman-Berbaum-Metz method. Academic Radiology. 2004;11:1260–1273. doi: 10.1016/j.acra.2004.08.009.
15. Obuchowski NA, Rockette HE. Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations. Communications in Statistics-Simulation and Computation. 1995;24:285–308.
16. Obuchowski NA. Multireader, multimodality receiver operating characteristic curve studies: hypothesis testing and sample size estimation using an analysis of variance approach with dependent observations. Academic Radiology. 1995;2(Suppl 1):S22–S29.
17. Dorfman DD, Alf E Jr. Maximum likelihood estimation of parameters of signal-detection theory and determination of confidence intervals: rating method data. Journal of Mathematical Psychology. 1969;6:487–496.
18. Dorfman DD. RSCORE II. In: Swets JA, Pickett RM, editors. Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press; San Diego, CA: 1982. pp. 212–232.
19. The SAS System for Windows, Version 9.1. SAS Institute Inc; Cary, NC: 2002.
20. Van Dyke CW, White RD, Obuchowski NA, Geisinger MA, Lorig RJ, Meziane MA. Cine MRI in the diagnosis of thoracic aortic dissection; 79th RSNA Meetings; Chicago, IL. 1993.
21. Franken EA Jr, Berbaum KS, Marley SM, Smith WL, Sato Y, Kao SC, Milam SG. Evaluation of a digital workstation for interpreting neonatal examinations: a receiver operating characteristic study. Investigative Radiology. 1992;27:732–737. doi: 10.1097/00004424-199209000-00016.
22. Berbaum KS, Schartz KM, Pesce LL, Hillis SL. DBM MRMC 2.1 (Computer software). 2006. Available for download from http://perception.radiology.uiowa.edu.
23. Berbaum KS, Metz CE, Pesce LL, Schartz KM. DBM MRMC 2.1 User’s Guide (Software manual). 2006. Available for download from http://perception.radiology.uiowa.edu.
24. Hillis SL, Schartz KM, Pesce LL, Berbaum KS. DBM MRMC 2.1 for SAS (Computer software). 2007. Available for download from http://perception.radiology.uiowa.edu.

[R1] 1.Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Investigative Radiology. 1992;27:723–731. [PubMed] [Google Scholar]

[R2] 2.Roe CA, Metz CE. Dorfman-Berbaum-Metz method for statistical analysis of multireader, multimodality receiver operating characteristic data: validation with computer simulation. Academic Radiology. 1997;4:298–303. doi: 10.1016/s1076-6332(97)80032-3. [DOI] [PubMed] [Google Scholar]

[R3] 3.Dorfman DD, Berbaum KS, Lenth RV, Chen YF, Donaghy BA. Monte Carlo validation of a multireader method for receiver operating characteristic discrete rating data: factorial experimental design. Academic Radiology. 1998;5:591–602. doi: 10.1016/s1076-6332(98)80294-8. [DOI] [PubMed] [Google Scholar]

[R4] 4.Quenoille MH. Approximate tests of correlation in time series. Journal of the Royal Statistical Society, Series B. 1949;11:68–84. [Google Scholar]

[R5] 5.Quenoille MH. Notes on bias in estimation. Biometrika. 1956;43:353–360. [Google Scholar]

[R6] 6.Tukey JW. Bias and confidence in not quite large samples (abstract) Annals of Mathematical Statistics. 1958;29:614. [Google Scholar]

[R7] 7.Berbaum KS. God, like the devil, is in the details. Academic Radiology. 2006;13:1311–1316. doi: 10.1016/j.acra.2006.09.053. [DOI] [PubMed] [Google Scholar]

[R8] 8.Hillis SL, Obuchowski NA, Schartz KM, Berbaum KS. A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette Methods for receiver operating characteristic (ROC) data. Statistics in Medicine. 2005;24:1579–1607. doi: 10.1002/sim.2024. [DOI] [PubMed] [Google Scholar]

[R9] 9.Hillis SL, Berbaum KS. Monte Carlo validation of the Dorfman-Berbaum-Metz method using normalized pseudovalues and less data-based model simplification. Academic Radiology. 2005;12:1534–1542. doi: 10.1016/j.acra.2005.07.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Hillis SL. A comparison of denominator degrees of freedom methods for multiple observer ROC analysis. Statistics in Medicine. 2007;26:596–619. doi: 10.1002/sim.2532. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Satterthwaite FE. Synthesis of variance. Psychometrika. 1941;6:309–316. [Google Scholar]

[R12] 12.Satterthwaite FE. An approximate distribution of estimates of variance components. Biometric Bulletin. 1946;2:110–114. [PubMed] [Google Scholar]

[R13] 13.Hanley JA, Mcneil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) Curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747. [DOI] [PubMed] [Google Scholar]

[R14] 14.Hillis SL, Berbaum KS. Power estimation for the Dorfman-Berbaum-Metz method. Academic Radiology. 2004;11:1260–1273. doi: 10.1016/j.acra.2004.08.009.

[R15] 15.Obuchowski NA, Rockette HE. Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations. Communications in Statistics-Simulation and Computation. 1995;24:285–308.

[R16] 16.Obuchowski NA. Multireader, multimodality receiver operating characteristic curve studies: hypothesis testing and sample size estimation using an analysis of variance approach with dependent observations. Academic Radiology. 1995;2(Suppl 1):S22–S29.

[R17] 17.Dorfman DD, Alf E., Jr Maximum likelihood estimation of parameters of signal-detection theory and determination of confidence intervals: rating method data. Journal of Mathematical Psychology. 1969;6:487–496.

[R18] 18.Dorfman DD. RSCORE II. In: Swets JA, Pickett RM, editors. Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press; San Diego, CA: 1982. pp. 212–232.

[R19] 19.The SAS System for Windows, Version 9.1. SAS Institute Inc; Cary, NC: 2002.

[R20] 20.Van Dyke CW, White RD, Obuchowski NA, Geisinger MA, Lorig RJ, Meziane MA. Cine MRI in the diagnosis of thoracic aortic dissection; 79th RSNA Meetings; Chicago, IL. 1993.

[R21] 21.Franken EA, Jr, Berbaum KS, Marley SM, Smith WL, Sato Y, Kao SC, Milam SG. Evaluation of a digital workstation for interpreting neonatal examinations: a receiver operating characteristic study. Invest Radiol. 1992;27:732–737. doi: 10.1097/00004424-199209000-00016.

[R22] 22.Berbaum KS, Schartz KM, Pesce LL, Hillis SL. DBM MRMC 2.1 (Computer software) 2006 Available for download from http://perception.radiology.uiowa.edu.

[R23] 23.Berbaum KS, Metz CE, Pesce LL, Schartz KM. DBM MRMC 2.1 User’s Guide (Software manual) 2006 Available for download from http://perception.radiology.uiowa.edu.

[R24] 24.Hillis SL, Schartz KM, Pesce LL, Berbaum KS. DBM MRMC 2.1 for SAS (Computer software) 2007 Available for download from http://perception.radiology.uiowa.edu.
