A Marginal-Mean ANOVA Approach for Analyzing Multireader Multicase Radiological Imaging Data

Stephen L Hillis

doi:10.1002/sim.5926

Author manuscript; available in PMC: 2015 Nov 10.

Published in final edited form as: Stat Med. 2013 Aug 23;33(2):330–360. doi: 10.1002/sim.5926


PMCID: PMC4640471 NIHMSID: NIHMS733736 PMID: 24038071

Abstract

The correlated-error ANOVA method proposed by Obuchowski and Rockette (OR) has been a useful procedure for analyzing reader-performance outcomes, such as the area under the receiver-operating-characteristic curve, resulting from multireader multicase radiological imaging data. This approach, however, has only been formally derived for the test-by-reader-by-case factorial study design. In this paper I show that the OR model can be viewed as a marginal-mean ANOVA model. Viewing the OR model within this marginal-mean ANOVA framework is the basis for the marginal-mean ANOVA approach, the topic of this paper. This approach (1) provides an intuitive motivation for the OR model, including its covariance-parameter constraints; (2) provides easy derivations of OR test statistics and parameter estimates, as well as their distributions and confidence intervals; and (3) allows for easy generalization of the OR procedure to other study designs. In particular, I show how one can easily derive OR-type analysis formulas for any balanced study design by following an algorithm which only requires an understanding of conventional ANOVA methods.

Keywords: Receiver operating characteristic (ROC) curve, correlated ANOVA, diagnostic radiology

1. INTRODUCTION

Receiver operating characteristic (ROC) curve analysis is a well established method for evaluating and comparing the performance of diagnostic tests. In radiological imaging studies such tests typically involve a human reader (usually a radiologist) evaluating an image or images resulting from an imaging modality (such as mammography for breast cancer) for a case (i.e., subject) with respect to confidence of disease. In such situations it is important that conclusions generalize to both the case and reader populations. A typical design for comparing diagnostic tests is the balanced test×reader×case factorial study design where each image is assigned a disease-confidence rating by each reader using each diagnostic test. Throughout I use test to refer to a diagnostic test, modality, or treatment.

The methods proposed by Obuchowski and Rockette (OR) [1, 2] and Dorfman, Berbaum, and Metz (DBM) [3, 4] are the most commonly used methods for analyzing such multireader multicase studies (often referred to as MRMC studies) and have performed well in simulations. The OR procedure fits a correlated-error test×reader ANOVA to reader-performance outcomes such as the area under the ROC curve (AUC), while the DBM procedure fits a test×reader×case conventional ANOVA to case-specific pseudovalues. Although the two methods have been shown to be equivalent [5, 6] when based on the same procedural parameters, I find the OR procedure more intuitive and its parameters more interpretable because it models observed reader-performance outcomes rather than pseudovalues. For this reason the OR procedure will be the focus of this paper.

Previously published derivations of OR model statistical properties [6] are tedious, do not provide motivation for the model, and have been given only for the balanced test×reader×case factorial study design. In this paper I show that the OR model is the same as the model for the marginal mean of a conventional ANOVA model with independent errors, where the mean is computed across cases. Viewing the OR model within this marginal-mean ANOVA framework is the basis for the marginal-mean ANOVA approach (mm-ANOVA approach), the topic of this paper. This approach (1) provides an intuitive motivation for the OR model, including its covariance-parameter constraints; (2) provides easy derivations of OR test statistics and parameter estimates, as well as their distributions and confidence intervals; and (3) allows for easy generalization of the OR procedure to other study designs.

In particular, I show how one can easily derive OR-type analysis formulas for any balanced study design by following an algorithm which only requires an understanding of conventional ANOVA methods. This development is important because for many situations other designs are more suitable than the test×reader×case factorial study design. For example, diagnostic tests may be mutually exclusive for various reasons, such as high radiation dose or invasiveness of the test, and thus cannot be given to each patient; readers may be trained to read under only one of the tests; or power considerations may show that it is advantageous to have replicated readings or to have groups of readers read different cases.

The outline of this paper is as follows. I review the OR method in Section 2. In Sections 3–4 and Appendices A–C I describe and justify steps of an algorithm for motivating the OR model and deriving its properties using the marginal-mean ANOVA approach. Steps are stated in a general form so that analogous OR-type procedures can be formulated for other study designs. In Section 5 I summarize the algorithm and illustrate how the algorithm can be used to develop OR-type procedures for six other study designs. A discussion and concluding remarks are given in Section 6.

2. THE OBUCHOWSKI-ROCKETTE (OR) METHOD

2.1. Design and notation

Throughout this section I assume the data have been collected using a balanced test×reader×case factorial study design. This commonly used diagnostic-radiology study design specifies that each case be subjected to each test, with the resulting images evaluated once by each reader. In addition, each case is classified as diseased or nondiseased according to an available reference standard. Typically the number of cases is 25–200 while the number of readers is 3–15. Let Z_ijk denote a confidence-of-disease rating assigned to the kth case by the jth reader using the ith test. For example, often a five-level ordinal integer scale or a quasi-continuous 0% to 100% confidence scale is used. The observed rating data consist of the Z_ijk, with i = 1, …, t, j = 1, …, r, k = 1, …, c, where t is the number of tests, r the number of readers, and c the number of cases.

2.2. Model and test statistic

Let θ̂_ij denote the AUC estimate (or other ROC-curve accuracy estimate) for the ith test and jth reader. Obuchowski and Rockette [1] use a test × reader factorial ANOVA model for the AUC estimates, but unlike a conventional ANOVA model they allow the errors to be correlated to account for correlation due to each reader evaluating the same cases. Their model, which I refer to as the OR model, can be written as

{\hat{θ}}_{i j} = μ + τ_{i} + R_{j} + {(τ R)}_{i j} + ε_{i j}

(1)

i = 1, …, t, j = 1, …, r, where τ_i denotes the fixed effect of test i, R_j denotes the random effect of reader j, (τR)_ij denotes the random test × reader interaction, and ε_ij is the error term. Without loss of generality I assume $\sum_{i = 1}^{t} τ_{i} = 0$ . The R_j and (τR)_ij are assumed to be mutually independent and normally distributed with zero means and respective variances $σ_{R}^{2}$ and $σ_{T R}^{2}$ . The ε_ij are assumed to be normally distributed with zero mean and variance $σ_{ε}^{2}$ and are assumed independent of the R_j and (τR)_ij. Equi-covariance of the errors between readers and tests is assumed, resulting in three possible covariances given by

Cov (ε_{i j}, ε_{i' j'}) = \begin{cases} {Cov}_{1} & i \neq i', j = j' \text{ (different test, same reader)} \\ {Cov}_{2} & i = i', j \neq j' \text{ (same test, different reader)} \\ {Cov}_{3} & i \neq i', j \neq j' \text{ (different test, different reader)} \end{cases}

It follows from model (1) that $σ_{ε}^{2}$ , Cov₁, Cov₂, and Cov₃ are also the variance and corresponding covariances of the AUC estimates, conditional on the reader and test × reader effects. Based on clinical considerations Obuchowski and Rockette [1] suggest the following ordering for the covariances:

{Cov}_{1} \geq {Cov}_{2} \geq {Cov}_{3} \geq 0 .

(2)

In Section 3.4 I show that these constraints can be replaced by the less restrictive constraints

{Cov}_{1} \geq {Cov}_{3}, {Cov}_{2} \geq {Cov}_{3}, {Cov}_{3} \geq 0

(3)

Alternatively, the model can be described in terms of the error correlations, defined by $ρ_{i} = {Cov}_{i} / σ_{ε}^{2}, i = 1, 2, 3$ .

When Cov₂ and Cov₃ are known, the OR statistic for testing the null hypothesis of no test effect (H₀: τ_i = 0; i = 1, …, t) is given by

F_{O R}^{*} = \frac{M S (T)}{M S (T * R) + r ({Cov}_{2} - {Cov}_{3})}

(4)

where MS(T) and MS(T * R) are the test and test × reader mean squares; i.e., $M S (T) = \frac{r}{t - 1} \sum_{i = 1}^{t} {({\hat{θ}}_{i •} - {\hat{θ}}_{• •})}^{2}$ and $M S (T * R) = \frac{1}{(t - 1) (r - 1)} \sum_{i = 1}^{t} \sum_{j = 1}^{r} {({\hat{θ}}_{i j} - {\hat{θ}}_{i •} - {\hat{θ}}_{• j} + {\hat{θ}}_{• •})}^{2}$ . A subscript replaced by a dot indicates that values are averaged across the missing subscript index; for example, ${\hat{θ}}_{• •} = \frac{1}{t r} \sum_{i = 1}^{t} \sum_{j = 1}^{r} {\hat{θ}}_{i j}$ .

In practice the statistic actually used is

F_{O R} = \frac{M S (T)}{M S (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]}

(5)

where ${\hat{Cov}}_{2}$ and ${\hat{Cov}}_{3}$ denote estimates for Cov₂ and Cov₃, respectively. Note that (5) incorporates the constraints specified by (3) by setting ${\hat{Cov}}_{2} - {\hat{Cov}}_{3}$ to zero if it is negative. Since Cov₂ and Cov₃ are also the corresponding covariances of the AUC estimates conditional on the reader and test × reader effects, they can be estimated using methods that treat cases as random but readers as fixed, such as jackknifing, bootstrapping, parametric methods, or the method proposed by DeLong et al [7] for trapezoidal-rule (or empirical) AUC estimates [8]. The OR estimates obtained from averaging corresponding fixed-reader AUC variances and covariances are denoted by ${\hat{σ}}_{ε}^{2}, {\hat{Cov}}_{1}, {\hat{Cov}}_{2}$ , and ${\hat{Cov}}_{3}$ . Hillis [6] shows that F_OR has an approximate F_{t−1;ddf_H} null distribution, where

d d f_{H} = \frac{{\{M S (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}}^{2}}{{[M S (T * R)]}^{2} / [(t - 1) (r - 1)]}

(6)

More generally, F_OR has an approximate noncentral F_{t−1,df₂;λ} distribution, where the noncentrality parameter is $λ = \frac{r \sum_{i = 1}^{t} τ_{i}^{2}}{σ_{T R}^{2} + σ_{ε}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})}$ and $d f_{2} = \frac{{[σ_{T R}^{2} + σ_{ε}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})]}^{2}}{{[σ_{T R}^{2} + σ_{ε}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3}]}^{2} / [(t - 1) (r - 1)]}$ .
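Statistic (5) and the denominator degrees of freedom (6) involve only the two mean squares and the two covariance estimates, so they can be sketched in a few lines. The following is a minimal illustration, not the author's software; the function name `or_f_statistic` is an assumption of this sketch, and the inputs are the rounded values reported in Table 2 of Section 2.3.

```python
def or_f_statistic(ms_t, ms_tr, cov2_hat, cov3_hat, t, r):
    """Return (F_OR, ddf_H) following equations (5) and (6)."""
    # Constraint (3): a negative estimate of Cov2 - Cov3 is truncated to zero.
    adjustment = max(r * (cov2_hat - cov3_hat), 0.0)
    denominator = ms_tr + adjustment
    f_or = ms_t / denominator
    ddf_h = denominator ** 2 / (ms_tr ** 2 / ((t - 1) * (r - 1)))
    return f_or, ddf_h


# Inputs taken from Table 2 (t = 2 tests, r = 4 readers).
f_or, ddf_h = or_f_statistic(
    ms_t=0.00281054,
    ms_tr=0.00046797,
    cov2_hat=0.0008438255,
    cov3_hat=0.0008871752,
    t=2,
    r=4,
)
print(round(f_or, 3), round(ddf_h, 3))
```

Because the estimate of Cov₂ is smaller than that of Cov₃ in this example, the adjustment term is zero and ddf_H reduces to (t − 1)(r − 1).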

Letting θ_i denote the expected reader performance measure for test i (i.e., θ_i = E(θ̂_i•)), an approximate (1 − α) 100% confidence interval for the contrast $\sum_{i = 1}^{t} l_{i} θ_{i}$ (with $\sum_{i = 1}^{t} l_{i} = 0$ ) is given by $\sum_{i = 1}^{t} l_{i} {\hat{θ}}_{i •} \pm t_{α / 2; d d f_{H}} \sqrt{\hat{V}}$ , where $\hat{V} = \frac{1}{r} (\sum_{i = 1}^{t} l_{i}^{2}) \{M S (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}$ . An approximate (1 − α) 100% confidence interval for θ_i, using a standard error computed from all of the data, is given by ${\hat{θ}}_{i •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where $\hat{V} = \frac{1}{t r} [M S (R) + (t - 1) M S (T * R) + t r max ({\hat{Cov}}_{2}, 0)]$ and $d f_{2} = \frac{{[M S (R) + (t - 1) M S (T * R) + t r max ({\hat{Cov}}_{2}, 0)]}^{2}}{{[M S (R)]}^{2} / (r - 1) + {[(t - 1) M S (T * R)]}^{2} / [(t - 1) (r - 1)]}$ . Alternatively, an approximate (1 − α) 100% confidence interval for θ_i, using a standard error computed only from data for test i, is given by ${\hat{θ}}_{i •} \pm t_{α / 2; d f_{2}^{(i)}} \sqrt{{\hat{V}}^{(i)}}$ , where ${\hat{V}}^{(i)} = \frac{1}{r} [M S {(R)}^{(i)} + r max ({\hat{Cov}}_{2}^{(i)}, 0)]$ and $d f_{2}^{(i)} = \frac{{[M S {(R)}^{(i)} + r max ({\hat{Cov}}_{2}^{(i)}, 0)]}^{2}}{{[M S {(R)}^{(i)}]}^{2} / (r - 1)}$ ; here MS (R)⁽ⁱ⁾ and ${\hat{Cov}}_{2}^{(i)}$ are computed only from test i data. I recommend this latter formula for single AUC confidence intervals, since it does not depend on assuming equal error covariances and variances for each test. All of these results have been previously presented [6].
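The two single-test standard errors and their degrees of freedom can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the inputs are the rounded values reported in Table 2 of Section 2.3, and the t quantile needed to turn a standard error into interval endpoints is left to a statistical library.

```python
import math


def single_test_se_all_data(ms_r, ms_tr, cov2_hat, t, r):
    """Standard error and df_2 for one test's AUC, using data from all tests."""
    total = ms_r + (t - 1) * ms_tr + t * r * max(cov2_hat, 0.0)
    v_hat = total / (t * r)
    df2 = total ** 2 / (
        ms_r ** 2 / (r - 1) + ((t - 1) * ms_tr) ** 2 / ((t - 1) * (r - 1))
    )
    return math.sqrt(v_hat), df2


def single_test_se_own_data(ms_r_i, cov2_hat_i, r):
    """Standard error and df_2^(i) using only the data for test i."""
    total = ms_r_i + r * max(cov2_hat_i, 0.0)
    return math.sqrt(total / r), total ** 2 / (ms_r_i ** 2 / (r - 1))


# Rounded inputs from Table 2 (t = 2, r = 4).
se_all, df2_all = single_test_se_all_data(0.00238351, 0.00046797, 0.0008438255, t=2, r=4)
se_soft, df_soft = single_test_se_own_data(0.000735, 0.000880, r=4)
se_hard, df_hard = single_test_se_own_data(0.002116, 0.000808, r=4)
print(round(se_all, 4), round(df2_all, 1))
print(round(se_soft, 4), round(df_soft, 1))
print(round(se_hard, 4), round(df_hard, 1))
```

Small discrepancies from the tabled degrees of freedom are expected because the inputs used here are rounded.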

Expected mean squares are given in Table 1a; proofs for these results are given by Hillis [6]. Expressions for the variance components, in terms of the expected mean squares and covariances are presented in Table 1b; these relationships follow directly from Table 1a. Estimated variance components result by replacing expected mean squares by mean squares and covariance parameters by estimates; for example,

{\hat{σ}}_{T R}^{2} = M S (T * R) - {\hat{σ}}_{ε}^{2} + {\hat{Cov}}_{1} + max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)

Typically the variance component estimates are changed to zero if the computed values are negative.

Table 1.

Expected mean square and variance component formulas for the Obuchowski-Rockette model.

(a) Expected mean squares

| Mean square | Expected mean square |
| --- | --- |
| MS(T) | $\frac{r}{t - 1} \sum_{i = 1}^{t} τ_{i}^{2} + σ_{T R}^{2} + σ_{ε}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})$ |
| MS(R) | $t σ_{R}^{2} + σ_{T R}^{2} + σ_{ε}^{2} - {Cov}_{2} + (t - 1) ({Cov}_{1} - {Cov}_{3})$ |
| MS(T * R) | $σ_{T R}^{2} + σ_{ε}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3}$ |

(b) Variance components

| Variance component | Equivalent function of expected mean squares and covariances |
| --- | --- |
| $σ_{R}^{2}$ | $\frac{1}{t} E [M S (R) - M S (T * R)] - {Cov}_{1} + {Cov}_{3}$ |
| $σ_{T R}^{2}$ | $E [M S (T * R)] - σ_{ε}^{2} + {Cov}_{1} + ({Cov}_{2} - {Cov}_{3})$ |
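The claim that the Table 1b relationships follow directly from Table 1a can be checked numerically: compute the expected mean squares from a set of parameters, then recover the variance components from them. The sketch below does this; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
def expected_mean_squares(tau, var_r, var_tr, var_eps, cov1, cov2, cov3, r):
    """Expected mean squares of Table 1a for the OR model."""
    t = len(tau)
    common = var_tr + var_eps
    ems_t = (r / (t - 1)) * sum(x * x for x in tau) + common - cov1 + (r - 1) * (cov2 - cov3)
    ems_r = t * var_r + common - cov2 + (t - 1) * (cov1 - cov3)
    ems_tr = common - cov1 - cov2 + cov3
    return ems_t, ems_r, ems_tr


def variance_components(ems_r, ems_tr, var_eps, cov1, cov2, cov3, t):
    """Variance components recovered with the Table 1b relationships."""
    var_r = (ems_r - ems_tr) / t - cov1 + cov3
    var_tr = ems_tr - var_eps + cov1 + (cov2 - cov3)
    return var_r, var_tr


# Hypothetical parameter values; the test effects sum to zero.
tau = [0.02, -0.02]
params = dict(var_r=0.0007, var_tr=0.0002, var_eps=0.0022,
              cov1=0.0011, cov2=0.0009, cov3=0.0008)
ems_t, ems_r, ems_tr = expected_mean_squares(tau, r=4, **params)
var_r_back, var_tr_back = variance_components(
    ems_r, ems_tr, params["var_eps"], params["cov1"], params["cov2"], params["cov3"], t=len(tau)
)
print(var_r_back, var_tr_back)
```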

2.3. Real-data example

To illustrate the OR method for the factorial design, I compare reader AUCs for hard- and soft-copy computed radiography chest images selected randomly from a medical intensive care unit. In the study [9] four radiologists blindly read both hard- and soft-copy images obtained with computed radiography from the same patients. Six months separated the end of the hard-copy readings and the start of the soft-copy readings. A five-point ordinal scale was used to rate the likelihood of presence of the condition (which I will refer to as “disease”) implied by the reason for requesting the corresponding examination. Ninety-five images, consisting of 29 diseased and 66 nondiseased images, were read under each test condition.

The analysis of this study using empirical AUC estimates and jackknife covariance estimates is displayed in Table 2. The AUCs for soft- and hard-copy images, averaged across the four readers, are 0.804 and 0.841, respectively. The test for the null hypothesis of no test effect (i.e., the population average AUC across readers is the same for soft- and hard-copy images) is not significant (F_OR = 6.01, ddf_H = 3, p = .092); the 95% confidence interval for the difference of the population AUCs (hard- minus soft-copy) is (−0.011, 0.086). Parts (i) and (j) give 95% confidence intervals for the single-test AUCs, based on all of the data and only on data for the specific test, respectively. The confidence intervals from the two methods are similar; this is expected because the AUCs are similar.

Table 2.

Obuchowski-Rockette analysis of Kundel et al [9] data for soft- and hard-copy computed radiographs using trapezoid AUC estimation and jackknife covariance estimation for t = 2 tests, r = 4 readers, c = 95 cases (66 nondiseased, 29 diseased).

(a) Trapezoid AUCs:

| Reader (j) | Test 1 (Soft-copy), θ̂_1j | Test 2 (Hard-copy), θ̂_2j |
| --- | --- | --- |
| 1 | 0.815 | 0.854 |
| 2 | 0.767 | 0.812 |
| 3 | 0.831 | 0.900 |
| 4 | 0.803 | 0.798 |
| | θ̂_1· = .804 | θ̂_2· = .841 |

(b) ANOVA table:

| Source | df | Sum of squares | Mean square |
| --- | --- | --- | --- |
| T | 1 | 0.00281054 | 0.00281054 |
| R | 3 | 0.00715054 | 0.00238351 |
| T*R | 3 | 0.00140392 | 0.00046797 |

(c) Fixed-reader covariance and corresponding correlation estimates computed from jackknife covariance matrix:
${\hat{σ}}_{ε}^{2} = 0.0022034331, {\hat{Cov}}_{1} = 0.0011163046, {\hat{Cov}}_{2} = 0.0008438255, {\hat{Cov}}_{3} = 0.0008871752, {\hat{ρ}}_{1} = 0.507, {\hat{ρ}}_{2} = 0.383, {\hat{ρ}}_{3} = 0.403$

(d) Variance component estimates using Table 1b formulas:
${\hat{σ}}_{R}^{2} = \frac{1}{t} \{M S (R) - M S (T * R)\} - {\hat{Cov}}_{1} + {\hat{Cov}}_{3} = 0.0007286397$

${\hat{σ}}_{T R}^{2} = M S (T * R) - {\hat{σ}}_{ε}^{2} + {\hat{Cov}}_{1} + max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0) = - 0.000662504$ (typically this would be changed to zero)

(e) Test statistic:
$F_{O R} = \frac{M S (T)}{M S (T * R) + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)} = 6.00576$

(f) Denominator degrees of freedom:
$d d f_{H} = \frac{{\{M S (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}}^{2}}{{[M S (T * R)]}^{2} / [(t - 1) (r - 1)]} = 3$

(g) P-value for H₀: θ₁ = θ₂: p = Pr (F_{(t−1), ddf_H} ≥ F_OR) = .092

(h) 95% CI for θ₂ − θ₁: ${\hat{θ}}_{2 \cdot} - {\hat{θ}}_{1 \cdot} \pm t_{d d f_{H}} \sqrt{\frac{2}{r} \{M S (T * R) + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)\}} = (- 0.0111940, 0.086168)$

(i) Single-test 95% confidence intervals based on all of the data. Note: $StdErr = \sqrt{\frac{1}{t r} [M S (R) + (t - 1) M S (T * R) + t r max ({\hat{Cov}}_{2}, 0)]}$ .

| i | θ̂_i | StdErr | df₂ | 95% CI |
| --- | --- | --- | --- | --- |
| 1 (Soft-copy) | 0.804 | .0346 | 46.9 | 0.734, 0.874 |
| 2 (Hard-copy) | 0.841 | .0346 | 46.9 | 0.772, 0.911 |

(j) Single-test 95% confidence intervals using only corresponding test data. Note: ${StdErr}^{(i)} = \sqrt{\frac{1}{r} [M S {(R)}^{(i)} + r max ({\hat{Cov}}_{2}^{(i)}, 0)]}$ .

| i | θ̂_i | ${\hat{Cov}}_{2}^{(i)}$ | MS(R)⁽ⁱ⁾ | StdErr⁽ⁱ⁾ | $d f_{2}^{(i)}$ | 95% CI |
| --- | --- | --- | --- | --- | --- | --- |
| 1 (Soft-copy) | 0.804 | 0.000880 | 0.000735 | 0.0326 | 100.4 | 0.739, 0.867 |
| 2 (Hard-copy) | 0.841 | 0.000808 | 0.002116 | 0.0366 | 19.2 | 0.765, 0.918 |

Although this study showed a nonsignificant difference between soft- and hard-copy image reader performance, the confidence interval for the difference of the AUCs shows that a difference as large as 0.086 is consistent with the data. In such a situation, the researcher may decide to design a future study that would produce a more precise estimate of the difference. Increased precision could result from an increase in the number of cases, the number of readers, or from replicated readings where each reader reads each image two or more times. If increasing the number of cases and readers is not feasible, then a replicated study is a natural choice for increasing power; however, OR analysis methodology has been developed only for the nonreplicated factorial design. I use the algorithm described in this paper to derive the OR-type procedure for the replicated factorial design, including the test-statistic nonnull distribution, which allows for power and sample size estimation. Using this result, I illustrate efficiency computations comparing the nonreplicated and replicated designs in Section 5.6.

In this study the same radiologists also similarly rated 95 hard-copy chest images obtained with screen-film; these images were from different patients than the computed radiographs. Because the original OR method assumes a factorial study design with readers reading the same cases under each test, it cannot be used to compare the screen-film AUC outcomes with the AUC outcomes from either the soft- or hard-copy computed radiograph images. In Section 5.3 I show how the OR approach can be adapted for this situation, which represents a split-plot study design with cases nested within test, and illustrate the analysis of these data.

2.4. Previous derivations of OR properties

Properties of the OR procedure have previously been derived starting with the OR model (1, 2). For what is essentially the OR model, Pavur and Nath [10] show that, for testing the null hypothesis of equal tests, the F statistic that is appropriate when the errors are independent can be used if corrected by a multiplicative factor. The multiplicative factor is a function of the correlations, which are assumed known, and the distribution for this corrected F statistic is the same as for the uncorrected F statistic when the errors are independent. The approach taken by Obuchowski and Rockette [1] was to modify this result by replacing the assumed-known correlations by estimated correlations. This approach yielded valid ANOVA statistics but unsatisfactory degrees of freedom, resulting in overly conservative tests [6]. Alternatively, Hillis [6] directly derived properties, but the proofs are tedious and nonintuitive.

3. MM-ANOVA APPROACH – STEP 1: DERIVE THE MM-ANOVA MODEL

In Sections 3–4 and Appendices A–C I show how the properties of the OR model can easily be derived using an algorithm, based on the mm-ANOVA approach, that only requires knowing how to determine conventional ANOVA test statistics and expected mean squares. I describe and illustrate the steps in the algorithm for the typical balanced test×reader×case study design discussed in the previous section. The steps are stated in a general form so that they can be applied to other balanced study designs. The mm-ANOVA approach and corresponding algorithm have not been previously described and are the main contribution of this paper.

3.1. Step 1a: Define the conventional ANOVA model that corresponds to the study design as if each reader-performance measure was the mean of case outcomes

Let Y_ijk denote a hypothetical outcome for test i, reader j, and case k. For our purposes Y_ijk is used only to illustrate the marginal-mean ANOVA approach; i.e., it does not represent an actual study outcome and should be distinguished from the observed rating Z_ijk. I assume that the Y_ijk follow a three-way conventional ANOVA model that corresponds to the study design.

Thus the distribution of Y_ijk is given by the following test × reader × case ANOVA model that treats test as a fixed factor and reader and case as random factors:

Y_{i j k} = μ + τ_{i} + R_{j} + C_{k} + {(τ R)}_{i j} + {(τ C)}_{i k} + {(R C)}_{j k} + {(τ R C)}_{i j k} + ε_{i j k}

(7)

i = 1, …, t, j = 1, …, r, k = 1, …, c, where τ_i denotes the fixed effect of test i with $\sum_{i = 1}^{t} τ_{i} = 0$ , R_j denotes the random effect of reader j, C_k denotes the random effect of case k, the multiple symbols in parentheses denote random interactions, and ε_ijk is the error term. The random effects are assumed to be mutually independent and normally distributed with zero means and respective variances $σ_{R}^{2}, σ_{C}^{2}, σ_{T R}^{2}, σ_{T C}^{2}, σ_{R C}^{2}, σ_{T R C}^{2}$ , and $σ_{ε}^{2}$ . Because there are no replications, for estimation purposes $σ_{T R C}^{2}$ and $σ_{ε}^{2}$ are inseparable; hence I define

σ^{2} = σ_{T R C}^{2} + σ_{ε}^{2}

Results for this model, such as mean square distributional properties and ANOVA test statistics, are well known (e.g., [11]) and will be stated without references.

3.2. Step 1b: From the conventional ANOVA model defined in step 1a, derive the mm-ANOVA model by averaging across cases and defining the mm-ANOVA model error term equal to the mean, across cases, of the sum of the conventional ANOVA model error term and random effects involving case

I say that a random effect “involves case” if it is subscripted according to case. Let Ỹ_ij denote the marginal mean resulting from averaging over cases; i.e.,

Ỹ_{i j} = Y_{i j •}

(8)

I use the term marginal-mean ANOVA model (mm-ANOVA model) to refer to the model implied by the conventional 3-way ANOVA model (7) for the marginal mean (8). It follows from (7) that

Ỹ_{i j} = μ + τ_{i} + R_{j} + {(τ R)}_{i j} + {\tilde{ε}}_{i j}

(9)

where

{\tilde{ε}}_{i j} = C_{•} + {(τ C)}_{i •} + {(R C)}_{j •} + {(τ R C)}_{i j •} + ε_{i j •}

(10)

the R_j and (τR)_ij are mutually independent and normally distributed with zero means and respective variances $σ_{R}^{2}$ and $σ_{T R}^{2}$ , and the ε̃_ij are independent of the R_j and (τR)_ij.

3.3. Step 1c: Express the mm-ANOVA model error variance and covariances in terms of the conventional ANOVA model variance components

From (10) it follows that the ε̃_ij are normally distributed with mean 0, variance

σ_{\tilde{ε}}^{2} = \frac{1}{c} (σ_{C}^{2} + σ_{T C}^{2} + σ_{R C}^{2} + σ_{T R C}^{2} + σ_{ε}^{2})

(11)

and equi-correlated with

{Cov}_{1} \equiv cov ({\tilde{ε}}_{i j}, {\tilde{ε}}_{i' j}) = \frac{1}{c} (σ_{C}^{2} + σ_{R C}^{2})

(12)

{Cov}_{2} \equiv cov ({\tilde{ε}}_{i j}, {\tilde{ε}}_{i j'}) = \frac{1}{c} (σ_{C}^{2} + σ_{T C}^{2})

(13)

and

{Cov}_{3} \equiv cov ({\tilde{ε}}_{i j}, {\tilde{ε}}_{i' j'}) = \frac{1}{c} σ_{C}^{2}

(14)

where i ≠ i′ and j ≠ j′.
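Relationships (11)–(14) can be checked by simulation: draw the case-related random effects of model (7), average them over cases as in (10), and compare sample variances and covariances of the resulting error means with the formulas. The sketch below is only an illustration; the variance-component values, the number of cases, the seed, and the function names are assumptions, and the residual term combines (τRC)_ijk and ε_ijk with variance σ².

```python
import random


def draw_error_means(c, var_c, var_tc, var_rc, var_res, rng):
    """One draw of the error means in (10) for a 2 x 2 test-by-reader layout."""
    case = [rng.gauss(0.0, var_c ** 0.5) for _ in range(c)]
    test_case = [[rng.gauss(0.0, var_tc ** 0.5) for _ in range(c)] for _ in range(2)]
    reader_case = [[rng.gauss(0.0, var_rc ** 0.5) for _ in range(c)] for _ in range(2)]
    means = {}
    for i in range(2):
        for j in range(2):
            total = 0.0
            for k in range(c):
                # Residual combines (tau R C)_ijk and eps_ijk, with variance sigma^2.
                residual = rng.gauss(0.0, var_res ** 0.5)
                total += case[k] + test_case[i][k] + reader_case[j][k] + residual
            means[(i, j)] = total / c
    return means


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


rng = random.Random(12345)
c, var_c, var_tc, var_rc, var_res = 4, 2.0, 1.2, 0.4, 0.8
draws = [draw_error_means(c, var_c, var_tc, var_rc, var_res, rng) for _ in range(20000)]
e = {key: [d[key] for d in draws] for key in draws[0]}

estimated = {
    "var": sample_cov(e[(0, 0)], e[(0, 0)]),
    "cov1": sample_cov(e[(0, 0)], e[(1, 0)]),  # different test, same reader
    "cov2": sample_cov(e[(0, 0)], e[(0, 1)]),  # same test, different reader
    "cov3": sample_cov(e[(0, 0)], e[(1, 1)]),  # different test, different reader
}
theoretical = {
    "var": (var_c + var_tc + var_rc + var_res) / c,  # equation (11)
    "cov1": (var_c + var_rc) / c,                    # equation (12)
    "cov2": (var_c + var_tc) / c,                    # equation (13)
    "cov3": var_c / c,                               # equation (14)
}
for name in theoretical:
    print(name, round(estimated[name], 3), round(theoretical[name], 3))
```

The simulated values should agree with the formulas up to Monte Carlo error, which shrinks as the number of draws increases.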

3.4. Step 1d: Determine the mm-ANOVA model covariance constraints implied by step 1c

The covariance constraints given by (3) follow from (12–14). Thus the mm-ANOVA model for Ỹ_ij is defined by (9) and (3). It also follows from (11–14) that $σ_{\tilde{ε}}^{2} \geq {Cov}_{1} + {Cov}_{2} - {Cov}_{3}$ , but I do not include this constraint as part of the definition of the mm-ANOVA model because it is already implied by the relationship Var(ε̃₁₁ − ε̃₁₂ − ε̃₂₁ + ε̃₂₂) ≥ 0, this variance being equal to $4 (σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3})$ .

3.5. Remarks

3.5.1. One-to-one relationship between parameters of the 3-way conventional ANOVA and corresponding mm-ANOVA models

In terms of the mm-ANOVA model parameters (μ, τ_i, $σ_{R}^{2}, σ_{T R}^{2}, σ_{\tilde{ε}}^{2}$ , Cov₁, Cov₂, and Cov₃), the parameters for the corresponding three-way ANOVA model (7) are given by μ, τ_i, $σ_{R}^{2}, σ_{T R}^{2}, σ^{2} \equiv σ_{T R C}^{2} + σ_{ε}^{2} = c (σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3}), σ_{C}^{2} = c {Cov}_{3}, σ_{T C}^{2} = c ({Cov}_{2} - {Cov}_{3})$ , and $σ_{R C}^{2} = c ({Cov}_{1} - {Cov}_{3})$ . Thus there is a one-to-one relationship between the parameters of the two models. Hence for any mm-ANOVA model, defined by (9) and (3), there is a corresponding conventional 3-way ANOVA model (7) that implies that model for the marginal means. These relationships between the two models are presented in Table 3.

Table 3.

Relationships between the 3-way ANOVA (7) and corresponding mm-ANOVA (9, 3) model parameters

| 3-way ANOVA parameter | Equivalent function of mm-ANOVA parameters |
| --- | --- |
| μ | μ |
| τ_i | τ_i |
| $σ_{R}^{2}$ | $σ_{R}^{2}$ |
| $σ_{T R}^{2}$ | $σ_{T R}^{2}$ |
| $σ_{C}^{2}$ | $c {Cov}_{3}$ |
| $σ_{T C}^{2}$ | $c ({Cov}_{2} - {Cov}_{3})$ |
| $σ_{R C}^{2}$ | $c ({Cov}_{1} - {Cov}_{3})$ |
| $σ^{2} \equiv σ_{T R C}^{2} + σ_{ε}^{2}$ | $c (σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3})$ |

| mm-ANOVA parameter | Equivalent function of 3-way ANOVA parameters |
| --- | --- |
| μ | μ |
| τ_i | τ_i |
| $σ_{R}^{2}$ | $σ_{R}^{2}$ |
| $σ_{T R}^{2}$ | $σ_{T R}^{2}$ |
| $σ_{\tilde{ε}}^{2}$ | $\frac{1}{c} (σ_{C}^{2} + σ_{T C}^{2} + σ_{R C}^{2} + σ^{2})$ |
| Cov₁ | $\frac{1}{c} (σ_{C}^{2} + σ_{R C}^{2})$ |
| Cov₂ | $\frac{1}{c} (σ_{C}^{2} + σ_{T C}^{2})$ |
| Cov₃ | $\frac{1}{c} σ_{C}^{2}$ |

These relationships assume covariance constraints (3) for the mm-ANOVA model and the same linear constraints for the τ_i (i.e., ∑ τ_i = 0) for both models.
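The one-to-one relationship in Table 3 can be illustrated by mapping a set of three-way ANOVA variance components to the mm-ANOVA error variance and covariances and back again. This is a minimal sketch; the function names and the numerical values are hypothetical, and `var_res` stands for σ² = σ²_TRC + σ²_ε.

```python
def mm_from_three_way(c, var_c, var_tc, var_rc, var_res):
    """Equations (11)-(14): mm-ANOVA error variance and covariances."""
    var_err = (var_c + var_tc + var_rc + var_res) / c
    cov1 = (var_c + var_rc) / c
    cov2 = (var_c + var_tc) / c
    cov3 = var_c / c
    return var_err, cov1, cov2, cov3


def three_way_from_mm(c, var_err, cov1, cov2, cov3):
    """Inverse mapping from the first half of Table 3."""
    var_c = c * cov3
    var_tc = c * (cov2 - cov3)
    var_rc = c * (cov1 - cov3)
    var_res = c * (var_err - cov1 - cov2 + cov3)
    return var_c, var_tc, var_rc, var_res


def satisfies_constraints(cov1, cov2, cov3):
    """Covariance constraints (3)."""
    return cov1 >= cov3 and cov2 >= cov3 and cov3 >= 0


# Hypothetical variance components with c = 95 cases.
original = (0.05, 0.01, 0.02, 0.10)
mm = mm_from_three_way(95, *original)
recovered = three_way_from_mm(95, *mm)
constraints_ok = satisfies_constraints(*mm[1:])
print(mm, recovered, constraints_ok)
```

Nonnegative variance components always yield covariances that satisfy (3), which is why those constraints are part of the mm-ANOVA model definition.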

3.5.2. Equivalence of the OR and mm-ANOVA models

Note that the mm-ANOVA model (9, 3) has the same form as the OR model (1, 2), with the only difference being that the mm-ANOVA model covariance constraints (3) are less restrictive. Since the OR covariance constraints (2) were suggested by Obuchowski and Rockette [1] based only on clinical considerations, to simplify comparison of the models I now modify the definition of the OR model to include the less restrictive mm-ANOVA model constraints (3); i.e., the OR model is now considered to be defined by equations (1) and (3). With this change the OR and the mm-ANOVA models become equivalent.

3.5.3. Definition of the mm-ANOVA approach

Because the OR and mm-ANOVA models are identical, statistical properties for the ROC accuracy estimates, the θ̂_ij, are the same as for the marginal means, the Ỹ_ij, for an mm-ANOVA model having the same parameter values as the OR model. The mm-ANOVA approach consists of deriving statistical properties for the OR model (1, 3) by recognizing that it is equivalent to the mm-ANOVA model (9, 3), and then deriving properties of the mm-ANOVA model by utilizing its relationship with the conventional three-way ANOVA model. The advantage of this approach is that properties of the conventional three-way ANOVA model are well known.

3.5.4. Motivation for the OR model

The mm-ANOVA approach provides an intuitive motivation for the OR model (1, 3) as follows. Suppose, hypothetically, that the reader performance outcome θ̂_ij is the mean of case-specific outcomes; that is, suppose that θ̂_ij = Y_ij• for some outcome Y_ijk, with k = 1, …, c. A typical way to account for variation in θ̂_ij due to readers and cases would be to assume the three-way ANOVA model (7), which implies the mm-ANOVA model (9, 3) and hence also the equivalent OR model (1, 3) for θ̂_ij. Of course, in practice θ̂_ij is not a marginal mean, but rather a nonlinear function of the case-specific confidence-of-disease ratings and truth-state (i.e., reference standard) indicator values. However, the mm-ANOVA approach shows that the OR model accounts for reader and case variation using the covariance structure implied by a conventional three-way ANOVA model, as if the accuracy estimate was a marginal mean.

4. MM-ANOVA APPROACH – STEP 2: DERIVE THE MM-ANOVA MODEL TEST STATISTIC AND ITS NULL DISTRIBUTION FOR A HYPOTHESIS EXPRESSED IN TERMS OF TEST ACCURACIES

In this section I show how to derive the mm-ANOVA model test statistic and its null distribution for testing the null hypothesis of equal test accuracies. I define test accuracy as the expected reader-performance measure for a particular test level. However, more generally these steps can be applied to any hypothesis that can be expressed in terms of linear functions of expected reader-performance outcomes.

4.1. Step 2a: State the hypothesis of interest in terms of the mm-ANOVA model

For the mm-ANOVA model (9, 3) let θ_i denote the test accuracy for test i; i.e., θ_i = E (Ỹ_i•) is the expected reader-performance outcome for test i across the population of readers. The hypothesis of interest is the global null hypothesis of equal test accuracies, i.e., H₀ : θ₁ = … = θ_t, or equivalently, H₀ : τ₁ = … = τ_t = 0.

4.2. Step 2b: Express the hypothesis from step 2a in terms of the conventional ANOVA model

Noting that

θ_{i} = E (Ỹ_{i •}) = E (Y_{i • •}) = μ + τ_{i}

it follows that H₀ : θ₁ = … = θ_t is equivalent to H₀ : τ₁ = … = τ_t = 0 for the conventional ANOVA model (7).

4.3. Step 2c: Create the expected-mean-square table for the conventional ANOVA model

Let MS(T), MS(R), and MS(C) denote the conventional ANOVA mean squares due to test, reader, and case, respectively, with interaction mean squares notated in the usual manner. The expected mean squares for the conventional ANOVA model are presented in Table 4. These relationships will be utilized in other steps.

Table 4.

Expected mean squares for the conventional test-by-reader-by-case factorial ANOVA model (7).

| Mean square | Expected mean square |
| --- | --- |
| MS (T) | $\frac{r c}{(t - 1)} \sum_{i = 1}^{t} τ_{i}^{2} + c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}$ |
| MS (R) | $t c σ_{R}^{2} + c σ_{T R}^{2} + t σ_{R C}^{2} + σ^{2}$ |
| MS (C) | $t r σ_{C}^{2} + r σ_{T C}^{2} + t σ_{R C}^{2} + σ^{2}$ |
| MS (T * R) | $c σ_{T R}^{2} + σ^{2}$ |
| MS (T * C) | $r σ_{T C}^{2} + σ^{2}$ |
| MS (R * C) | $t σ_{R C}^{2} + σ^{2}$ |
| MS (T * R * C) | $σ^{2} \equiv σ_{T R C}^{2} + σ_{ε}^{2}$ |

4.4. Step 2d: Determine the conventional ANOVA F statistic corresponding to the step 2b hypothesis

The conventional ANOVA test statistic for testing for H₀ : τ₁ = … = τ_t = 0 is given by

F = \frac{M S (T)}{M S (T * R) + M S (T * C) - M S (T * R * C)}

(15)

I refer to F as an ANOVA statistic because its numerator and denominator have the same expectation under H₀, but the numerator has a larger expectation than the denominator under H₁ : τ_i ≠ τ_j for some i ≠ j.

4.5. Step 2e: Express mm-ANOVA mean squares in terms of conventional ANOVA mean squares

For the mm-ANOVA model let $\tilde{M S} (T), \tilde{M S} (R)$ , and $\tilde{M S} (T * R)$ denote the test, reader, and test×reader mean squares; i.e., $\tilde{M S} (T) = \frac{r}{t - 1} \sum_{i = 1}^{t} {(Ỹ_{i •} - Ỹ_{• •})}^{2}, \tilde{M S} (R) = \frac{t}{r - 1} \sum_{j = 1}^{r} {(Ỹ_{• j} - Ỹ_{• •})}^{2}$ and $\tilde{M S} (T * R) = \frac{1}{(t - 1) (r - 1)} \sum_{i = 1}^{t} \sum_{j = 1}^{r} {(Ỹ_{i j} - Ỹ_{i •} - Ỹ_{• j} + Ỹ_{• •})}^{2}$ . Noting that $M S (T) = \frac{r c}{t - 1} \sum_{i = 1}^{t} {(Y_{i • •} - Y_{• • •})}^{2}, M S (R) = \frac{t c}{r - 1} \sum_{j = 1}^{r} {(Y_{• j •} - Y_{• • •})}^{2}, M S (T * R) = \frac{c}{(t - 1) (r - 1)} \sum_{i = 1}^{t} \sum_{j = 1}^{r} {(Y_{i j •} - Y_{i • •} - Y_{• j •} + Y_{• • •})}^{2}$ , and that Ỹ_ij = Y_ij• so that the corresponding marginal means coincide, it follows that

\tilde{M S} (T) = \frac{1}{c} M S (T)

(16)

\tilde{M S} (R) = \frac{1}{c} M S (R)

\tilde{M S} (T * R) = \frac{1}{c} M S (T * R)

(17)
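Relationships (16) and (17) can be confirmed numerically. The sketch below is illustrative only: it simulates an arbitrary balanced t×r×c array (the data carry no meaning), computes the conventional mean squares from the case-level values and the mm-ANOVA mean squares from the case-averaged values, and compares them. The function names are hypothetical.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def conventional_ms(y):
    """MS(T), MS(R), MS(T*R) from a balanced array y[i][j][k] (test, reader, case)."""
    t, r, c = len(y), len(y[0]), len(y[0][0])
    y_ij = [[mean(y[i][j]) for j in range(r)] for i in range(t)]
    y_i = [mean(v for row in y[i] for v in row) for i in range(t)]
    y_j = [mean(y[i][j][k] for i in range(t) for k in range(c)) for j in range(r)]
    y_all = mean(v for plane in y for row in plane for v in row)
    ms_t = r * c / (t - 1) * sum((m - y_all) ** 2 for m in y_i)
    ms_r = t * c / (r - 1) * sum((m - y_all) ** 2 for m in y_j)
    ms_tr = c / ((t - 1) * (r - 1)) * sum(
        (y_ij[i][j] - y_i[i] - y_j[j] + y_all) ** 2
        for i in range(t) for j in range(r))
    return ms_t, ms_r, ms_tr


def marginal_ms(y_tilde):
    """mm-ANOVA mean squares from the t x r table of case-averaged outcomes."""
    t, r = len(y_tilde), len(y_tilde[0])
    m_i = [mean(row) for row in y_tilde]
    m_j = [mean(y_tilde[i][j] for i in range(t)) for j in range(r)]
    m_all = mean(v for row in y_tilde for v in row)
    ms_t = r / (t - 1) * sum((m - m_all) ** 2 for m in m_i)
    ms_r = t / (r - 1) * sum((m - m_all) ** 2 for m in m_j)
    ms_tr = 1 / ((t - 1) * (r - 1)) * sum(
        (y_tilde[i][j] - m_i[i] - m_j[j] + m_all) ** 2
        for i in range(t) for j in range(r))
    return ms_t, ms_r, ms_tr


# Arbitrary simulated data; only the algebraic identity is of interest.
rng = random.Random(0)
t, r, c = 2, 3, 5
y = [[[rng.gauss(0, 1) for _ in range(c)] for _ in range(r)] for _ in range(t)]
y_tilde = [[mean(y[i][j]) for j in range(r)] for i in range(t)]
conv = conventional_ms(y)
marg = marginal_ms(y_tilde)
print([abs(m - v / c) < 1e-12 for m, v in zip(marg, conv)])
```

The two sets of mean squares are computed independently here, so agreement up to floating-point error illustrates the factor 1/c rather than assuming it.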

4.6. Step 2f: Express F from step 2d in terms of mm-ANOVA model mean squares and U, where U is a linear function of conventional ANOVA model mean squares that involve case

It follows from (16–17) that (15) can be written in the form

F = \frac{\tilde{M S} (T)}{\tilde{M S} (T * R) + U}

(18)

where

U = \frac{1}{c} [M S (T * C) - M S (T * R * C)]

Note that U is a linear function of conventional ANOVA model mean squares involving case and (18) is an ANOVA statistic.

4.7. Step 2g: Express E (U) in terms of conventional ANOVA model variance components, and then in terms of mm-ANOVA model error covariance parameters using the relationships from step 1c

From Table 4 we have $E [M S (T * C)] = r σ_{T C}^{2} + σ^{2}$ and E [MS (T * R * C)] = σ². It follows that

E (U) = \frac{1}{c} E [M S (T * C) - M S (T * R * C)] = \frac{r}{c} σ_{T C}^{2}

(19)

Using (13) and (14) we can write the right side of (19) in terms of the mm-ANOVA covariances: $\frac{r}{c} σ_{T C}^{2} = r ({Cov}_{2} - {Cov}_{3})$ . Hence

E (U) = r ({Cov}_{2} - {Cov}_{3})

(20)

4.8. Step 2h: Modify F (18) from step 2f to produce the mm-ANOVA statistic $F_{O R}^{*}$ by replacing U by E (U), expressed as a linear function of mm-ANOVA covariance parameters

Replacing U in equation (18) by its expectation (20) results in

F_{O R}^{*} = \frac{\tilde{M S} (T)}{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})}

(21)

which is the OR test statistic $F_{O R}^{*}$ (4) when we treat the Ỹ_ij as the OR model outcomes θ̂_ij. Because (18) is an ANOVA statistic, it follows that $F_{O R}^{*}$ (21) is also an ANOVA statistic.

4.9. Step 2i: Derive F_OR by replacing covariance parameters in $F_{O R}^{*}$ by estimates that take into account the constraints from step 1d

An obvious estimate of Cov₂−Cov₃ that takes into account covariance constraints (3) is given by $max [({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]$ , where ${\hat{Cov}}_{2}$ and ${\hat{Cov}}_{3}$ are estimates as discussed in Section 2.2. Replacing Cov₂−Cov₃ in (21) by this estimate results in

F_{O R} = \frac{\tilde{M S} (T)}{\tilde{M S} (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]}

(22)

which is the OR statistic F_OR (5) when we replace the Ỹ_ij by the OR model outcomes θ̂_ij.

4.10. Step 2j: Determine the approximate null distribution of F_OR

Null-distribution result

Write the denominator of F_OR in the form

b (\sum_{i = 1}^{I} a_{i} {\tilde{M S}}_{i} + \hat{d})

(23)

where the ${\tilde{M S}}_{i}$ , i = 1, …, I are mm-ANOVA mean squares, d̂ is a function of the covariance parameter estimates and the a_i and b are constants. Then F_OR will have an approximate F_df₁,df₂ null distribution, where df₁ is the numerator degrees of freedom for the conventional ANOVA model test statistic in step 2d and df₂ is given by

d f_{2} = \frac{{[\sum_{i = 1}^{I} a_{i} {\tilde{M S}}_{i} + \hat{d}]}^{2}}{\sum_{i = 1}^{I} \frac{{[a_{i} {\tilde{M S}}_{i}]}^{2}}{d f ({\tilde{M S}}_{i})}}

(24)

where $d f ({\tilde{M S}}_{i})$ is the degrees of freedom for ${\tilde{M S}}_{i}$ , and hence also for MS_i. I have stated this result generally so that it can be easily applied to other designs. See Appendix A for a derivation of this result.

To apply this result to the balanced test×reader×case factorial study design, note that the denominator of F_OR (22) is given by (23) with I = 1, a₁ = 1, b = 1, ${\tilde{M S}}_{1} = \tilde{M S} (T * R)$ , and $\hat{d} = max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]$ . Using (24), the null-distribution result states that F_OR (22) has an approximate F_t−1,df₂ null distribution, where

d f_{2} = \frac{{\{\tilde{M S} (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}}^{2}}{\frac{{[\tilde{M S} (T * R)]}^{2}}{(t - 1) (r - 1)}}

(25)

Note that the equation for df₂ (25), with Ỹ_ij replaced by θ̂_ij, is the same as the equation for ddf_H (6) for the OR model.
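The general degrees-of-freedom formula (24) and its factorial special case (25) translate directly into a few lines of code. This is a sketch under the stated result only; the function names and the numeric inputs are hypothetical.

```python
def denominator_df(a, ms, df, d_hat):
    """Equation (24).

    a, ms, df: equal-length sequences holding the constants a_i, the mm-ANOVA
    mean squares, and their degrees of freedom; d_hat is the covariance term.
    The constant b in (23) cancels and is therefore not needed.
    """
    numerator = (sum(ai * mi for ai, mi in zip(a, ms)) + d_hat) ** 2
    denominator = sum((ai * mi) ** 2 / dfi for ai, mi, dfi in zip(a, ms, df))
    return numerator / denominator


def factorial_df2(ms_tr, cov2_hat, cov3_hat, t, r):
    """Equation (25): I = 1, a_1 = 1, mean square MS(T*R)."""
    d_hat = max(r * (cov2_hat - cov3_hat), 0.0)
    return denominator_df([1.0], [ms_tr], [(t - 1) * (r - 1)], d_hat)


# Hypothetical inputs for illustration.
print(factorial_df2(ms_tr=0.002, cov2_hat=0.001, cov3_hat=0.003, t=2, r=4))
print(factorial_df2(ms_tr=0.002, cov2_hat=0.003, cov3_hat=0.001, t=2, r=4))
```

When the covariance difference estimate is truncated at zero, df₂ reduces to (t − 1)(r − 1), the usual test×reader degrees of freedom; a positive covariance term makes df₂ larger.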

4.11. Remark: Derivation of mm-ANOVA expected mean square and variance component expressions

For the mm-ANOVA model an expected mean square table, such as Table 1a, can be created as follows. Write the mm-ANOVA expected mean squares in terms of the conventional ANOVA variance components and fixed effects using the relationships given in steps 2c and 2e. For example, for the factorial model we have

E [\tilde{M S} (T)] = \frac{1}{c} E [M S (T)] = \frac{1}{c} [\frac{r c}{(t - 1)} \sum_{i = 1}^{t} τ_{i}^{2} + c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}]

(26)

From step 1c it follows that the conventional ANOVA variance components in (26) involving case (i.e., the corresponding random effects are subscripted according to case) can be written in terms of the mm-ANOVA covariances: $σ_{T C}^{2} = c ({Cov}_{2} - {Cov}_{3})$ and $σ^{2} = c (σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3})$ . Replacing these variance components in (26) by their corresponding mm-ANOVA covariance expressions yields $E [\tilde{M S} (T)] = \frac{r}{(t - 1)} \sum_{i = 1}^{t} τ_{i}^{2} + σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})$ , the first line in Table 1a. Similarly, the other expressions in Table 1a can be derived. A table of mm-ANOVA variance component formulas, such as Table 1b, can then be created from the mm-ANOVA expected mean square table by solving for the variance components.
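The substitution described in this remark can be verified with exact arithmetic. The sketch below maps hypothetical conventional variance components to the mm-ANOVA error variance and covariances (using the step 1c relationships for the factorial design) and checks that the right side of (26) equals the mm-ANOVA form of the expected test mean square. All names and numeric values are illustrative assumptions.

```python
from fractions import Fraction


def mm_covariances(c, var_C, var_TC, var_RC, var_err):
    """Step 1c relationships for the factorial design (var_err is sigma^2)."""
    c = Fraction(c)
    return {
        "var_eps": (var_C + var_TC + var_RC + var_err) / c,
        "cov1": (var_C + var_RC) / c,
        "cov2": (var_C + var_TC) / c,
        "cov3": var_C / c,
    }


def ems_t_conventional(t, r, c, tau, var_TR, var_TC, var_err):
    """Right side of (26): (1/c) * E[MS(T)] in conventional ANOVA parameters."""
    fixed = Fraction(r * c, t - 1) * sum(x * x for x in tau)
    return (fixed + c * var_TR + r * var_TC + var_err) / Fraction(c)


def ems_t_marginal(t, r, tau, var_TR, cov):
    """Expected mm-ANOVA test mean square in mm-ANOVA parameters."""
    fixed = Fraction(r, t - 1) * sum(x * x for x in tau)
    return (fixed + var_TR + cov["var_eps"] - cov["cov1"]
            + (r - 1) * (cov["cov2"] - cov["cov3"]))


# Hypothetical values for illustration.
t, r, c, tau = 3, 4, 50, [1, -1, 0]
cov = mm_covariances(c, var_C=3, var_TC=5, var_RC=4, var_err=7)
lhs = ems_t_conventional(t, r, c, tau, var_TR=1, var_TC=5, var_err=7)
rhs = ems_t_marginal(t, r, tau, var_TR=1, cov=cov)
print(lhs == rhs, cov["cov1"] >= cov["cov3"], cov["cov2"] >= cov["cov3"])
```

Because the covariances are built from nonnegative variance components, the constraints Cov₁ ≥ Cov₃, Cov₂ ≥ Cov₃ and Cov₃ ≥ 0 hold automatically in this construction.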

5. Mm-ANOVA algorithm summary and examples

In Sections 3–4 steps 1 and 2 of the mm-ANOVA algorithm were presented. These two steps illustrated the essence of the mm-ANOVA approach. Steps 3 and 4, which are presented later in Appendices B and C, extend this approach by showing how to derive confidence intervals and the non-null distribution of the test statistic.

Table 5 presents a succinct summary of the mm-ANOVA algorithm. This summary is intended to make it easy to use the algorithm to determine the properties of OR-type models corresponding to other study designs. Note that Table 5 shows the steps for deriving the confidence interval formula, not only for a linear combination of test accuracy parameters, but also for a single accuracy parameter. Table 6 illustrates the application of Table 5 to the typical test×reader×case study design previously discussed in Sections 3 and 4.

Table 5.

Algorithm for deriving mm-ANOVA formulas

Step 1: Derive the mm-ANOVA model
(a) Define the conventional ANOVA model that corresponds to the study design as if each reader-performance measure were the mean of case-level outcomes. (Note: Since reader-performance measures are measures of discrimination between diseased and nondiseased cases, disease status should not be included as a factor.)
(b) From the conventional ANOVA model defined in step 1a, derive the mm-ANOVA model by averaging across cases. Define the mm-ANOVA model error term as the mean, across cases, of the sum of the conventional ANOVA model error term and the random effects involving case.
(c) Express the mm-ANOVA model error variance and covariances in terms of the conventional ANOVA model variance components.
(d) Determine the mm-ANOVA model covariance constraints implied by step 1c.
Step 2: Derive the mm-ANOVA model test statistic and its null distribution for a hypothesis expressed in terms of test accuracies (i.e., expected reader-performance measures)
(a) State the hypothesis of interest in terms of the mm-ANOVA model.
(b) Express the hypothesis from step 2a in terms of the conventional ANOVA model.
(c) Create the expected-mean-square table for the conventional ANOVA model.
(d) Determine the conventional ANOVA F statistic corresponding to the step 2b hypothesis.
(e) Express mm-ANOVA mean squares in terms of conventional ANOVA mean squares.
(f) Express F from step 2d in terms of the mm-ANOVA model mean squares and U, where U is a linear function of conventional ANOVA model mean squares that involve case.
(g) Express E (U) in terms of conventional ANOVA model variance components, and then in terms of mm-ANOVA model error covariance parameters using the relationships from step 1c.
(h) Modify F from step 2f to produce the mm-ANOVA statistic $F_{O R}^{*}$ by replacing U by E (U), expressed as a linear function of mm-ANOVA covariance parameters.
(i) Derive F_OR by replacing covariance parameters in $F_{O R}^{*}$ by estimates that take into account the constraints from step 1d.
(j) Determine the approximate null distribution of F_OR in the following way: Write the denominator of F_OR in the form $b (\sum_{i} a_{i} {\tilde{M S}}_{i} + \hat{d})$ where the ${\tilde{M S}}_{i}$ are mm-ANOVA model mean squares, d̂ is a function of the covariance parameter estimates, and the a_i and b are constants. Then F_OR will have an approximate F_df₁,df₂ null distribution, where df₁ is the numerator degrees of freedom for the conventional ANOVA model test statistic in step 2d and df₂ is given by
  $d f_{2} = \frac{{[\sum_{i} a_{i} {\tilde{M S}}_{i} + \hat{d}]}^{2}}{\sum_{i} \frac{{[a_{i} {\tilde{M S}}_{i}]}^{2}}{d f ({\tilde{M S}}_{i})}}$
  where $d f ({\tilde{M S}}_{i})$ is the degrees of freedom for ${\tilde{M S}}_{i}$ , and hence also for MS_i.
Step 3: Derive confidence intervals for a linear function g (θ) of test accuracy parameters
(a) Write the test accuracy parameter vector θ in terms of the mm-ANOVA model.
(b) Write θ in terms of the conventional ANOVA model.
(c) Determine the conventional ANOVA estimate for θ, denoted by θ̂.
(d) Determine the variance V of g (θ̂) in terms of conventional ANOVA parameters.
(e) Write V from step 3d in the form V = bE (∑ a_iMS_i) for constants b and a_i.
(f) Write V from step 3e in the form $V = \tilde{b} E (\sum ã_{i} {\tilde{M S}}_{i} + U)$ where b̃ and ã_i are constants and U is a linear function of conventional ANOVA mean squares that involve case.
(g) Express E (U) in terms of conventional ANOVA model variance components and then in terms of mm-ANOVA model error covariance parameters, using the relationships from step 1c; then rewrite V using this expression for E (U).
(h) Derive the variance estimate V̂ from V by replacing expected mean squares by mean squares and replacing covariances by estimates that take into account the constraints from step 1d.
(i) Derive the degrees of freedom df₂ for V̂ using the general formula for df₂ given in step 2j.
(j) Write θ̂ from step 3c in terms of the mm-ANOVA model.
(k) An approximate (1 − α) 100% confidence interval for g (θ) is given by $g (\hat{θ}) \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where V̂ is determined in step 3h, df₂ in step 3i and θ̂ in step 3j.
Step 4: Derive the non-null distribution of F_OR from step 2i
(a) Compute the noncentrality parameter in terms of the conventional ANOVA model: $λ = \frac{d f (M S_{num}) M S_{num} |_{Y = E (Y)}}{E (M S_{num} | H_{0})}$ where MS_num is the numerator mean square from the conventional ANOVA F statistic given in step 2d.
(b) Express λ in terms of mm-ANOVA parameters by replacing variance components involving case by mm-ANOVA covariances.
(c) Determine the denominator degrees of freedom in terms of mm-ANOVA parameters using $d f_{2} = \frac{{[\sum_{i} a_{i} E ({\tilde{M S}}_{i}) + d]}^{2}}{\sum_{i} {[a_{i} E ({\tilde{M S}}_{i})]}^{2} / d f ({\tilde{M S}}_{i})}$ where $b (\sum_{i} a_{i} {\tilde{M S}}_{i} + d)$ is the denominator of $F_{O R}^{*}$ from step 2h.
(d) The non-null distribution is given by F_{df₁,df₂;λ}, where df₁ = df (MS_num), df₂ is determined in step 4c and λ in step 4b.


Table 6.

Mm-ANOVA approach for typical test×reader×case factorial study design

Step 1: Derive the mm-ANOVA model
(a) Conventional ANOVA model: Y_ijk = μ + τ_i + R_j + C_k + (τR)_ij + (τC)_ik + (RC)_jk + (τRC)_ijk + ε_ijk, i = 1, …, t; j = 1, …, r; k = 1, …, c, with variance components $σ_{R}^{2}, σ_{C}^{2}, σ_{T R}^{2}, σ_{T C}^{2}, σ_{R C}^{2}, σ_{T R C}^{2}$ , and $σ_{ε}^{2}$ and constraint $\sum_{i = 1}^{t} τ_{i} = 0$ . Define $σ^{2} = σ_{T R C}^{2} + σ_{ε}^{2}$ .
(b) Mm-ANOVA model (note: Ỹ_ij = Y_ij•): $Ỹ_{i j} = μ + τ_{i} + R_{j} + {(τ R)}_{i j} + {\tilde{ε}}_{i j}$ where ${\tilde{ε}}_{i j} = C_{•} + {(τ C)}_{i •} + {(R C)}_{j •} + {(τ R C)}_{i j •} + ε_{i j •}$ and $\sum_{i = 1}^{t} τ_{i} = 0$
(c) Mm-ANOVA error variance and covariances expressed in terms of conventional ANOVA variance components: $σ_{\tilde{ε}}^{2} = \frac{1}{c} (σ_{C}^{2} + σ_{T C}^{2} + σ_{R C}^{2} + σ^{2})$ , ${Cov}_{1} \equiv cov ({\tilde{ε}}_{i j}, {\tilde{ε}}_{i' j}) = \frac{1}{c} (σ_{C}^{2} + σ_{R C}^{2})$ , ${Cov}_{2} \equiv cov ({\tilde{ε}}_{i j}, {\tilde{ε}}_{i j'}) = \frac{1}{c} (σ_{C}^{2} + σ_{T C}^{2})$ , ${Cov}_{3} \equiv cov ({\tilde{ε}}_{i j}, {\tilde{ε}}_{i' j'}) = \frac{1}{c} σ_{C}^{2}$ , where i ≠ i′, j ≠ j′
(d) Covariance constraints: Cov₁ ≥ Cov₃; Cov₂ ≥ Cov₃; Cov₃ ≥ 0

Step 2: Derive the mm-ANOVA test statistic and its null distribution
(a) Mm-ANOVA model hypothesis of equal test accuracies: H₀ : θ₁ = ⋯ = θ_t where θ_i = E (Ỹ_i•)
(b) Conventional ANOVA model hypothesis: θ_i = E (Y_i••) = μ + τ_i ⇒ H₀ : τ₁ = ⋯ = τ_t = 0
(c) Conventional ANOVA expected mean squares:

Mean square	Expected mean square
MS(T)	$\frac{r c}{(t - 1)} \sum_{i = 1}^{t} τ_{i}^{2} + c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}$
MS(R)	$t c σ_{R}^{2} + c σ_{T R}^{2} + t σ_{R C}^{2} + σ^{2}$
MS(C)	$t r σ_{C}^{2} + r σ_{T C}^{2} + t σ_{R C}^{2} + σ^{2}$
MS(T * R)	$c σ_{T R}^{2} + σ^{2}$
MS(T * C)	$r σ_{T C}^{2} + σ^{2}$
MS(R * C)	$t σ_{R C}^{2} + σ^{2}$
MS(T * R * C)	$σ^{2} \equiv σ_{T R C}^{2} + σ_{ε}^{2}$

(d) Conventional ANOVA test statistic: $F = \frac{M S (T)}{M S (T * R) + M S (T * C) - M S (T * R * C)}$
(e) $\tilde{M S} (T) = \frac{1}{c} M S (T), \tilde{M S} (T * R) = \frac{1}{c} M S (T * R), \tilde{M S} (R) = \frac{1}{c} M S (R)$
(f) $F = \frac{\tilde{M S} (T)}{\tilde{M S} (T * R) + U}$ where $U = \frac{1}{c} [M S (T * C) - M S (T * R * C)]$
(g) $E [M S (T * C)] = r σ_{T C}^{2} + σ^{2}, E [M S (T * R * C)] = σ^{2} \Rightarrow E (U) = \frac{1}{c} (r σ_{T C}^{2}) = r ({Cov}_{2} - {Cov}_{3})$
(h) $F_{O R}^{*} = \frac{\tilde{M S} (T)}{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})}$
(i) $F_{O R} = \frac{\tilde{M S} (T)}{\tilde{M S} (T * R) + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)}$
(j) Under H₀, F_OR ≈ F_t−1,df₂ where $d f_{2} = \frac{{[\tilde{M S} (T * R) + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)]}^{2}}{{[\tilde{M S} (T * R)]}^{2} / [(t - 1) (r - 1)]}$

Step 3: Derive confidence intervals
- (a)
  Mm-ANOVA test accuracy parameters: θ = (θ₁, …, θ_t)′, with θ_i = E (Ỹ_i•), i = 1, …, t
- (b)
  Corresponding conventional ANOVA parameters: θ_i = E (Y_i••) = μ + τ_i
- (c)
  Conventional ANOVA estimate: θ̂_i = Y_i••
  - CI for l′ (θ) with l = (l₁, …, l_t)′, $\sum_{i = 1}^{t} l_{i} = 0$ :
- (d)
  $l' (\hat{θ}) = \sum_{i = 1}^{t} l_{i} {\hat{θ}}_{i} = \sum_{i = 1}^{t} l_{i} Y_{i • •} = \sum_{i = 1}^{t} l_{i} τ_{i} + \sum_{i = 1}^{t} l_{i} [{(τ R)}_{i •} + {(τ C)}_{i •} + {(τ R C)}_{i • •} + ε_{i • •}] \Rightarrow V = \sum_{i = 1}^{t} l_{i}^{2} [\frac{σ_{T R}^{2}}{r} + \frac{σ_{T C}^{2}}{c} + \frac{σ^{2}}{r c}] = \frac{1}{r c} \sum_{i = 1}^{t} l_{i}^{2} [c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}]$
- (e)
  $V = \frac{1}{r c} \sum_{i = 1}^{t} l_{i}^{2} E [M S (T * R) + M S (T * C) - M S (T * R * C)]$
- (f)
  $V = \frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} E [\tilde{M S} (T * R) + U]$ where $U = \frac{1}{c} [M S (T * C) - M S (T * R * C)]$
- (g)
  $E (U) = \frac{r σ_{T C}^{2}}{c} = r ({Cov}_{2} - {Cov}_{3}) \Rightarrow V = \frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{E [\tilde{M S} (T * R)] + r ({Cov}_{2} - {Cov}_{3})\}$
- (h)
  $\hat{V} = \frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{\tilde{M S} (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}$
- (i)
  $d f_{2} = \frac{{[\tilde{M S} (T * R) + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)]}^{2}}{{[\tilde{M S} (T * R)]}^{2} / [(t - 1) (r - 1)]}$ (same as df₂ in step 2j)
- (j)
  θ̂_i = Ỹ_i•
- (k)
  $C I : \sum_{i = 1}^{t} l_{i} Ỹ_{i •} \pm t_{α / 2; d f_{2}} \sqrt{\frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{\tilde{M S} (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}}$
  - CI for θ_i
- (d)
  ${\hat{θ}}_{i} = Y_{i • •} = μ + τ_{i} + R_{•} + C_{•} + {(τ R)}_{i •} + {(τ C)}_{i •} + {(R C)}_{• •} + {(τ R C)}_{i • •} + ε_{i • •} \Rightarrow V = \frac{σ_{R}^{2}}{r} + \frac{σ_{C}^{2}}{c} + \frac{σ_{T R}^{2}}{r} + \frac{σ_{T C}^{2}}{c} + \frac{σ_{R C}^{2}}{r c} + \frac{σ^{2}}{r c} = \frac{1}{r c} (c σ_{R}^{2} + r σ_{C}^{2} + c σ_{T R}^{2} + r σ_{T C}^{2} + σ_{R C}^{2} + σ^{2})$
- (e)
  $V = \frac{1}{t r c} E [M S (R) + (t - 1) M S (T * R) + M S (C) - M S (R * C) + (t - 1) M S (T * C) - (t - 1) M S (T * R * C)]$
- (f)
  
  $V = \frac{1}{t r} E [\tilde{M S} (R) + (t - 1) \tilde{M S} (T * R) + U]$
  where
  $U = \frac{1}{c} [M S (C) - M S (R * C) + (t - 1) M S (T * C) - (t - 1) M S (T * R * C)]$
- (g)
  $E (U) = \frac{t r}{c} (σ_{C}^{2} + σ_{T C}^{2}) = t r {Cov}_{2} \Rightarrow V = \frac{1}{t r} \{E [\tilde{M S} (R) + (t - 1) \tilde{M S} (T * R)] + t r {Cov}_{2}\}$
- (h)
  $\hat{V} = \frac{1}{t r} [\tilde{M S} (R) + (t - 1) \tilde{M S} (T * R) + t r max ({\hat{Cov}}_{2}, 0)]$
- (i)
  $d f_{2} = \frac{{[\tilde{M S} (R) + (t - 1) \tilde{M S} (T * R) + t r max ({\hat{Cov}}_{2}, 0)]}^{2}}{\frac{{[\tilde{M S} (R)]}^{2}}{r - 1} + \frac{{[(t - 1) \tilde{M S} (T * R)]}^{2}}{(t - 1) (r - 1)}}$
- (j)
  θ̂_i = Ỹ_i•
- (k)
  $C I : Ỹ_{i •} \pm t_{α / 2; d f_{2}} \sqrt{\frac{1}{t r} [\tilde{M S} (R) + (t - 1) \tilde{M S} (T * R) + t r max ({\hat{Cov}}_{2}, 0)]}$
Step 4: Derive the non-null distribution F_{df₁,df₂;λ} of the step-2 F statistic
(a) Step 2d F numerator: MS_num = MS(T), $E [M S (T)] = \frac{r c}{(t - 1)} \sum_{i = 1}^{t} τ_{i}^{2} + c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}$ , df (MS (T)) = t − 1, $E (Y_{i j k}) = μ + τ_{i} \Rightarrow λ = \frac{d f (M S_{num}) {M S_{num} |}_{Y = E (Y)}}{E (M S_{num} | H_{0})} = \frac{r c \sum_{i = 1}^{t} τ_{i}^{2}}{c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}}$
(b) $r σ_{T C}^{2} + σ^{2} = c [σ_{\tilde{ε}}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})] \Rightarrow λ = \frac{r \sum_{i = 1}^{t} τ_{i}^{2}}{σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})}$
(c) Step 2h $F_{O R}^{*}$ denominator $= \tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})$ , $E [\tilde{M S} (T * R)] = \frac{1}{c} E [M S (T * R)] = \frac{1}{c} (c σ_{T R}^{2} + σ^{2}) = σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3} \Rightarrow d f_{2} = \frac{{[σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})]}^{2}}{\frac{{[σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3}]}^{2}}{(t - 1) (r - 1)}}$
(d) F_OR ~˙ F_{t−1,df₂;λ}
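
The equivalence of the two expressions for the noncentrality parameter, and the non-null denominator degrees of freedom, can be illustrated as follows. This is a hedged sketch for the factorial design only; the function names are hypothetical and the parameter values are arbitrary, chosen so that the mm-ANOVA covariances correspond to the conventional variance components through the step 1c relationships.

```python
from fractions import Fraction


def noncentrality_conventional(r, c, tau, var_TR, var_TC, var_err):
    """lambda in conventional ANOVA parameters (var_err is sigma^2)."""
    return r * c * sum(x * x for x in tau) / (c * var_TR + r * var_TC + var_err)


def noncentrality_marginal(r, tau, var_TR, var_eps, cov1, cov2, cov3):
    """lambda after replacing case-related variance components by covariances."""
    denom = var_TR + var_eps - cov1 + (r - 1) * (cov2 - cov3)
    return r * sum(x * x for x in tau) / denom


def nonnull_df2(t, r, var_TR, var_eps, cov1, cov2, cov3):
    """Denominator degrees of freedom of the non-null distribution."""
    e_ms_tr = var_TR + var_eps - cov1 - cov2 + cov3
    denom = e_ms_tr + r * (cov2 - cov3)
    return denom ** 2 / (e_ms_tr ** 2 / ((t - 1) * (r - 1)))


# Hypothetical parameters: sigma_C^2 = 3, sigma_TC^2 = 5, sigma_RC^2 = 4,
# sigma^2 = 7, sigma_TR^2 = 1, with c = 50 cases.
t, r, c = 3, 4, 50
tau = [Fraction(1, 10), Fraction(-1, 10), Fraction(0)]
var_eps, cov1, cov2, cov3 = (Fraction(19, 50), Fraction(7, 50),
                             Fraction(8, 50), Fraction(3, 50))
lam_conv = noncentrality_conventional(r, c, tau, 1, 5, 7)
lam_mm = noncentrality_marginal(r, tau, 1, var_eps, cov1, cov2, cov3)
print(lam_conv == lam_mm, float(nonnull_df2(t, r, 1, var_eps, cov1, cov2, cov3)))
```

Expressing λ and df₂ in mm-ANOVA parameters is what allows power computations to be carried out from covariance estimates alone, without case-level variance components.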


Using the algorithm in Table 5, I derive results for several other study designs and summarize these results in the remainder of this section. For each study design the corresponding algorithm results, in a format similar to Table 6, are presented in the referenced supplementary tables that are available in the online version of this article. Note that in the summaries below the reader performance measure is denoted by θ̂_ij instead of Ỹ_ij to make it clear that, although these are mm-ANOVA models, the outcome is not restricted to a marginal mean but can be any reader-performance measure. In addition, I omit the tilde symbol over the mean squares and error term since it is clear that they are for the mm-ANOVA model rather than the corresponding conventional ANOVA model. Standard nesting notation is used; e.g., subscript (i) j denotes that the factor indexed by j is nested within the factor indexed by i, and MS[R (T)] is the mean square for reader nested within test.

5.1. Example 1: Reader×case study design (one test)

In this study design there is only one test and each reader reads each case. Derivation of results using the mm-ANOVA algorithm is presented in Supplementary Table S1. The derivation begins with a conventional reader×case study-design ANOVA model that treats reader and case as random factors and includes their interaction. Averaging across cases produces the corresponding mm-ANOVA model: a one-way ANOVA model with reader as its only factor.

This mm-ANOVA model is given by θ̂_j = μ + R_j + ε_j, j = 1, …, r, where r is the number of readers. The R_j are mutually independent and normally distributed with zero mean and variance $σ_{R}^{2}$ ; the ε_j are normally distributed with zero mean and variance $σ_{ε}^{2}$ and are independent of the R_j; and Cov₂ ≡ Cov (ε_j, ε_j′) ≥ 0, j ≠ j′. Thus reader is a random factor and the covariance between error terms is assumed constant. Because there is only one test, only the formula for computing a confidence interval for the single test accuracy is presented.

An approximate (1 − α) 100% confidence interval for a single test accuracy, θ = E (θ̂_j), is given by ${\hat{θ}}_{•} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where $\hat{V} = \frac{1}{r} [M S (R) + r max ({\hat{Cov}}_{2}, 0)], d f_{2} = \frac{{[M S (R) + r max ({\hat{Cov}}_{2}, 0)]}^{2}}{{[M S (R)]}^{2} / (r - 1)}$ , and $M S (R) = \frac{1}{r - 1} \sum_{j = 1}^{r} {({\hat{θ}}_{j} - {\hat{θ}}_{•})}^{2}$ . A hypothesis test for the single test accuracy can be based on this confidence interval. Although Hillis [6] discusses this single-test confidence interval formula, he does not provide a derivation of the result.

This confidence interval result can also be used with the test×reader×case study design to yield single test confidence intervals, each based only on data for the corresponding test, as was illustrated in the analysis of the example data in Section 2.3. Because properties of this confidence interval do not depend on assumptions about the variance components and covariances corresponding to the other tests, we expect these single-test confidence intervals to be more robust than those where the standard error is based on all of the data.
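
The single-test interval can be computed in a few lines once the reader-level estimates and a covariance estimate are available. The sketch below is illustrative: the function names and the input values are hypothetical, MS(R) is taken to be the sample variance of the reader estimates (divisor r − 1, matching its r − 1 degrees of freedom), and the t critical value is passed in by the caller rather than computed, to keep the example within the standard library.

```python
import math


def single_test_variance(theta_hat, cov2_hat):
    """Return (mean accuracy, V-hat, df2) for one test.

    theta_hat: reader-level accuracy estimates for the test.
    cov2_hat:  estimated covariance between errors of different readers.
    Note: df2 is undefined if all reader estimates are identical (MS(R) = 0).
    """
    r = len(theta_hat)
    theta_bar = sum(theta_hat) / r
    ms_r = sum((x - theta_bar) ** 2 for x in theta_hat) / (r - 1)
    total = ms_r + r * max(cov2_hat, 0.0)
    v_hat = total / r
    df2 = total ** 2 / (ms_r ** 2 / (r - 1))
    return theta_bar, v_hat, df2


def confidence_interval(theta_bar, v_hat, t_crit):
    """Interval theta_bar +/- t_crit * sqrt(V-hat); t_crit supplied by the caller."""
    half_width = t_crit * math.sqrt(v_hat)
    return theta_bar - half_width, theta_bar + half_width


# Hypothetical reader AUCs and covariance estimate.
theta_bar, v_hat, df2 = single_test_variance([0.80, 0.85, 0.75, 0.80], 0.0005)
print(theta_bar, v_hat, df2)
```

In practice the critical value would be the upper α/2 quantile of a t distribution with df2 degrees of freedom, obtained from statistical software or tables.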

5.2. Example 2: Reader-nested-within-test study design

In this study design readers read images from only one test; i.e., readers are nested within test. This study design is natural when readers are trained to read under only one of the tests. The study design is balanced with an equal number of readers reading all cases using each test. Thus reader is nested within test and is crossed with case. Obuchowski [12] discusses this design and refers to this as a paired-case, unpaired-reader design. This can be viewed as a split-plot design with readers being the “whole plots,” case the split-plot (or within-plot) factor, and test the whole-plot (or between-plot) factor. This design is schematically illustrated in Table 7a.

Table 7.

Split-plot design layouts. For nested factors, the level of the nesting factor is given in parentheses; e.g., reader (t) 1 in (a) denotes reader 1 nested within test t.

a) Reader nested within test. Y_ijk = rating for test i from reader j for case k, with readers nested within test i; i = 1, …, t, j = 1, …, r, k = 1, …, c.
		case

test	reader	1	…	c

1	(1)1	Y₁₁₁	⋯	Y_11c
⋮	⋮	⋮	⋱	⋮
1	(1)r	Y_1r1	⋯	Y_1rc

⋮	⋮	⋮	⋮	⋮

t	(t)1	Y_t11	⋯	Y_t1c
⋮	⋮	⋮	⋱	⋮
t	(t)r	Y_tr1	⋯	Y_trc

b) Case nested within test. Y_ijk = rating for test i from reader j for case k, with cases nested within test i; i = 1, …, t, j = 1, …, r, k = 1, …, c.
		reader

test	case	1	⋯	r

1	(1)1	Y₁₁₁	⋯	Y_1r1
⋮	⋮	⋮	⋱	⋮
1	(1)c	Y_11c	⋯	Y_1rc

⋮	⋮	⋮	⋮	⋮

t	(t)1	Y_t11	⋯	Y_tr1
⋮	⋮	⋮	⋱	⋮
t	(t)c	Y_t1c	⋯	Y_trc

c) Case nested within reader. Y_ijk = rating for test i from reader j for case k, with cases nested within reader j; i = 1, …, t, j = 1, …, r, k = 1, …, c.
		test

reader	case	1	⋯	t

1	(1)1	Y₁₁₁	⋯	Y_t11
⋮	⋮	⋮	⋱	⋮
1	(1)c	Y_11c	⋯	Y_t1c

⋮	⋮	⋮	⋮	⋮

r	(r)1	Y_1r1	⋯	Y_tr1
⋮	⋮	⋮	⋱	⋮
r	(r)c	Y_1rc	⋯	Y_trc

d) Reader and case crossed and nested within group. Y_hijk = rating assigned by the jth reader in group h to the kth case in group h using test i; h = 1, …, g, i = 1, …, t, j = 1, …, r, k = 1, …, c. Each reader and case is included in only one group.
			test

group	reader	case	1	⋯	t

1	(1)1	(1)1	Y₁₁₁₁	⋯	Y_1t11
⋮	⋮	⋮	⋮	⋱	⋮
1	(1)1	(1)c	Y_111c	⋯	Y_1t1c

⋮	⋮	⋮	⋮	⋮	⋮

1	(1)r	(1)1	Y_11r1	⋯	Y_1tr1
⋮	⋮	⋮	⋮	⋱	⋮
1	(1)r	(1)c	Y_11rc	⋯	Y_1trc

⋮	⋮	⋮	⋮	⋮	⋮

g	(g)1	(g)1	Y_g111	⋯	Y_gt11
⋮	⋮	⋮	⋮	⋱	⋮
g	(g)1	(g)c	Y_g11c	⋯	Y_gt1c

⋮	⋮	⋮	⋮	⋮	⋮

g	(g)r	(g)1	Y_g1r1	⋯	Y_gtr1
⋮	⋮	⋮	⋮	⋱	⋮
g	(g)r	(g)c	Y_g1rc	⋯	Y_gtrc


Derivation of results using the mm-ANOVA algorithm is presented in Supplementary Table S2. The derivation begins with a conventional split-plot ANOVA model corresponding to the study design (i.e., with reader nested within test and crossed with case) that treats reader and case as random factors and includes all possible interactions. Averaging across cases produces the corresponding mm-ANOVA model: a reader-nested-within-test ANOVA model with reader as a random factor.

The mm-ANOVA model is given by θ̂_ij = μ + τ_i + R_(i)j + ε_ij, i = 1, …, t, j = 1, …, r where t is the number of tests, r is the number of readers per test, τ_i denotes the fixed effect of test, and $\sum_{i = 1}^{t} τ_{i} = 0$ . The reader effects, the R_(i)j, are mutually independent and normally distributed with zero mean and variance $σ_{R (T)}^{2}$ , where “R(T)” is read “reader nested within test”. The ε_ij are normally distributed with zero mean and variance $σ_{ε}^{2}$ . The ε_ij are independent of the R_(i)j; Cov₂ = Cov (ε_ij, ε_ij′) with j ≠ j′ and Cov₃ = Cov (ε_ij, ε_i′j′) with i ≠ i′, with Cov₂ ≥ Cov₃ ≥ 0.

Thus there are two error covariances, Cov₂ and Cov₃, Cov₂ ≥ Cov₃ ≥ 0, defined as the covariances between errors for the same test and different readers, and for different tests and different readers, respectively. Note that the definition Cov₃ ≡ Cov (ε_ij, ε_i′j′), i ≠ i′ does not require j ≠ j′ because i ≠ i′ implies different readers. There is no Cov₁ parameter because the design does not allow for one reader reading under two tests.

Let θ_i ≡ E (θ̂_i•) denote the expected reader performance measure for test i. The test statistic for the null hypothesis of equal test accuracies (H₀ : θ₁ = … = θ_t) is

F_{O R} = \frac{M S (T)}{M S [R (T)] + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)}

where MS(T) is defined as for the factorial model and $M S [R (T)] = \frac{1}{t (r - 1)} \sum_{i = 1}^{t} \sum_{j = 1}^{r} {({\hat{θ}}_{i j} - {\hat{θ}}_{i •})}^{2}$ . Under H₀, F_OR ~˙ F_t−1,df₂ where

d f_{2} = \frac{{[M S [R (T)] + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)]}^{2}}{{M S [R (T)]}^{2} / [t (r - 1)]}

(27)

More generally, F_OR ~˙ F_{t−1,df₂;λ}, where $λ = \frac{r \sum_{i = 1}^{t} τ_{i}^{2}}{σ_{R (T)}^{2} + σ_{ε}^{2} + (r - 1) {Cov}_{2} - r {Cov}_{3}}$ and $d f_{2} = \frac{{[σ_{R (T)}^{2} + σ_{ε}^{2} + (r - 1) {Cov}_{2} - r {Cov}_{3}]}^{2}}{{(σ_{R (T)}^{2} + σ_{ε}^{2} - {Cov}_{2})}^{2} / [t (r - 1)]}$ .

An approximate (1 − α) 100% confidence interval for contrast $\sum_{i = 1}^{t} l_{i} θ_{i}$ is given by $\sum_{i = 1}^{t} l_{i} {\hat{θ}}_{i •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ where $\hat{V} = \frac{1}{r} (\sum_{i = 1}^{t} l_{i}^{2}) \{M S [R (T)] + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)\}$ and df₂ is given by (27). An approximate (1 − α) 100% confidence interval for θ_i is given by ${\hat{θ}}_{i •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where $\hat{V} = \frac{1}{r} \{M S [R (T)] + max (r {\hat{Cov}}_{2}, 0)\}$ and $d f_{2} = \frac{{\{M S [R (T)] + max (r {\hat{Cov}}_{2}, 0)\}}^{2}}{{\{M S [R (T)]\}}^{2} / [t (r - 1)]}$ . Alternatively, an approximate (1 − α) 100% confidence interval for θ_i, using a standard error computed only from data for test i, is given by ${\hat{θ}}_{i •} \pm t_{α / 2; d f_{2}^{(i)}} \sqrt{{\hat{V}}^{(i)}}$ , where ${\hat{V}}^{(i)} = \frac{1}{r} [M S {(R)}^{(i)} + r max ({\hat{Cov}}_{2}^{(i)}, 0)]$ and $d f_{2}^{(i)} = \frac{{[M S {(R)}^{(i)} + r max ({\hat{Cov}}_{2}^{(i)}, 0)]}^{2}}{{[M S {(R)}^{(i)}]}^{2} / (r - 1)}$ , where MS (R)⁽ⁱ⁾ and ${\hat{Cov}}_{2}^{(i)}$ are computed only from test i data; note that this is the result from Section 5.1.
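
For the reader-nested-within-test design, the test statistic and its null denominator degrees of freedom (27) can be computed from the t×r table of reader-level estimates as sketched below. The function name and the example values are hypothetical; the covariance estimates are taken as given inputs rather than computed.

```python
def nested_or_statistic(theta_hat, cov2_hat, cov3_hat):
    """F_OR and df2 for a balanced reader-nested-within-test design.

    theta_hat[i][j]: accuracy estimate for reader j nested within test i.
    cov2_hat: same test, different readers; cov3_hat: different tests.
    """
    t, r = len(theta_hat), len(theta_hat[0])
    test_means = [sum(row) / r for row in theta_hat]
    grand_mean = sum(test_means) / t
    ms_t = r / (t - 1) * sum((m - grand_mean) ** 2 for m in test_means)
    ms_rt = sum((x - m) ** 2
                for row, m in zip(theta_hat, test_means)
                for x in row) / (t * (r - 1))
    denom = ms_rt + r * max(cov2_hat - cov3_hat, 0.0)
    f_or = ms_t / denom
    df2 = denom ** 2 / (ms_rt ** 2 / (t * (r - 1)))
    return f_or, df2


# Hypothetical data: two tests, three readers nested within each test.
theta_hat = [[0.80, 0.84, 0.82],
             [0.86, 0.90, 0.88]]
f_or, df2 = nested_or_statistic(theta_hat, cov2_hat=0.0006, cov3_hat=0.0002)
print(f_or, df2)
```

The only structural differences from the factorial computation are that MS[R(T)] replaces MS(T*R) and that its degrees of freedom are t(r − 1) rather than (t − 1)(r − 1).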

5.3. Example 3: Case-nested-within-test split-plot study design

In this study design each case is imaged under only one test, with the same number of cases imaged for each test. Each reader interprets all of the images from each test. This is often called a paired-reader, unpaired-case design. Obuchowski [12] notes that this design is needed when the diagnostic tests are mutually exclusive, e.g., if they are invasive, administer a high radiation dose, or carry a risk of contrast reactions. This can be viewed as a split-plot design with cases being the whole plots, reader the split-plot factor, and test the whole-plot factor. This design is schematically illustrated in Table 7b.

Derivation of results using the mm-ANOVA algorithm is presented in Supplementary Table S3. The derivation begins with a conventional split-plot ANOVA model corresponding to the study design that treats reader and case as random factors and includes all possible interactions. Averaging across cases produces the corresponding mm-ANOVA model, which is the same as the factorial mm-ANOVA model but with Cov₁ and Cov₃ constrained to zero; i.e., the model is defined by equation (1) and constraints Cov₂ ≥ 0, Cov₁ = Cov₃ = 0. It follows that hypothesis-test, confidence-interval and sample-size formulas can be derived from those for the factorial model by setting Cov₁ = Cov₃ = 0.

Thus the test statistic for the null hypothesis of equal test accuracies is

F_{O R} = \frac{M S (T)}{M S (T * R) + max (r {\hat{Cov}}_{2}, 0)}

Under H₀, F_OR ~˙ F_t−1,df₂ where

d f_{2} = \frac{{\{M S (T * R) + max (r {\hat{Cov}}_{2}, 0)\}}^{2}}{{[M S (T * R)]}^{2} / [(t - 1) (r - 1)]}

(28)

More generally, F_OR ~˙ F_{t−1,df₂;λ}, where $λ = \frac{r \sum_{i = 1}^{t} τ_{i}^{2}}{σ_{T R}^{2} + σ_{ε}^{2} + (r - 1) ({Cov}_{2})}$ and $d f_{2} = \frac{{[σ_{T R}^{2} + σ_{ε}^{2} + (r - 1) ({Cov}_{2})]}^{2}}{{[σ_{T R}^{2} + σ_{ε}^{2} - {Cov}_{2}]}^{2} / [(t - 1) (r - 1)]}$ .

Letting θ_i denote E (θ̂_i•), an approximate (1 − α) 100% confidence interval for contrast $\sum_{i = 1}^{t} l_{i} θ_{i}$ is given by $\sum_{i = 1}^{t} l_{i} {\hat{θ}}_{i •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where df₂ is given by (28) and $\hat{V} = \frac{1}{r} (\sum_{i = 1}^{t} l_{i}^{2}) \{M S (T * R) + max [r {\hat{Cov}}_{2}, 0]\}$ . An approximate (1 − α) 100% confidence interval for θ_i is given by ${\hat{θ}}_{i •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where $\hat{V} = \frac{1}{tr} [M S (R) + (t - 1) M S (T * R) + tr max ({\hat{Cov}}_{2}, 0)]$ and $d f_{2} = \frac{{[M S (R) + (t - 1) M S (T * R) + tr max ({\hat{Cov}}_{2}, 0)]}^{2}}{{[M S (R)]}^{2} / (r - 1) + {[(t - 1) M S (T * R)]}^{2} / [(t - 1) (r - 1)]}$ . Alternatively, an approximate (1 − α) 100% confidence interval for θ_i, using a standard error computed only from data for test i, is given by ${\hat{θ}}_{i •} \pm t_{α / 2; d f_{2}^{(i)}} \sqrt{{\hat{V}}^{(i)}}$ , where ${\hat{V}}^{(i)} = \frac{1}{r} [M S {(R)}^{(i)} + r max ({\hat{Cov}}_{2}^{(i)}, 0)]$ and $d f_{2}^{(i)} = \frac{{[M S {(R)}^{(i)} + r max ({\hat{Cov}}_{2}^{(i)}, 0)]}^{2}}{{[M S {(R)}^{(i)}]}^{2} / (r - 1)}$ , where MS (R)⁽ⁱ⁾ and ${\hat{Cov}}_{2}^{(i)}$ are computed only from test i data. Note that these single-test confidence-interval formulas are the same as those for the factorial design.

5.3.1. Real-data example

Using the Kundel et al [9] data that were discussed in Section 2.3, I now compare soft-copy computed radiographs with screen-film radiographs. The images are from different patients for each type of radiograph, with 95 images in each group (soft-copy computed radiograph: 66 nondiseased, 29 diseased; screen-film radiograph: 68 nondiseased, 27 diseased). Because the images for each method are from different patients, this is an example of a case-nested-within-test study design. The analysis of this study using empirical AUC estimates and jackknife covariance estimates is displayed in Table 8. The AUCs for soft-copy and screen-film images, averaged across the four readers, are 0.804 and 0.829, respectively. The test for the null hypothesis of no AUC difference between soft-copy and screen-film is not significant (F_OR = 0.31, df₂ = 164.4, p = 0.58); the 95% confidence interval for the difference of the population AUCs (screen-film minus soft-copy) is (−0.064, 0.114). Part (h) gives 95% confidence intervals for the single-test AUCs based only on data for the specific test.

Table 8.

Obuchowski-Rockette split-plot (cases nested within test) analysis of Kundel et al [9] data for soft-copy computed radiographs and screen-film radiographs using trapezoid AUC estimation and jackknife covariance estimation for t = 2 tests, r = 4 readers. The images were from different patients for each type of radiograph, with 95 images in each group (soft-copy computed radiograph: 66 nondiseased, 29 diseased; screen-film radiograph: 68 nondiseased, 27 diseased).

Trapezoid AUCs:

Reader (j)	Test 1 (soft-copy computed radiograph), θ̂_1j	Test 2 (screen-film), θ̂_2j
1	0.815	0.818
2	0.767	0.836
3	0.831	0.828
4	0.803	0.834
Reader average	θ̂_1• = 0.804	θ̂_2• = 0.829

ANOVA table:

Source	df	Sum of squares	Mean square
T	1	0.00125969	0.00125969
R	3	0.00076530	0.00025510
T*R	3	0.00164974	0.00054991

Fixed-reader covariance estimates computed from jackknife covariance matrix: ${\hat{σ}}_{ε}^{2} = 0.0023651313, {\hat{Cov}}_{2} = 0.0008800774$

$F_{OR} = \frac{M S (T)}{M S (T * R) + max (r {\hat{Cov}}_{2}, 0)} = 0.31$

Denominator degrees of freedom: $d f_{2} = \frac{{[M S (T * R) + max (r {\hat{Cov}}_{2}, 0)]}^{2}}{\frac{{[M S (T * R)]}^{2}}{(t - 1) (r - 1)}} = 164.4$

P-value for H₀: θ₁ = θ₂: p = Pr (F_{(t−1), df₂} ≥ F_OR) = 0.579

95% CI for θ₂ − θ₁: ${\hat{θ}}_{2 •} - {\hat{θ}}_{1 •} \pm t_{d f_{2}} \sqrt{\frac{2}{r} [M S (T * R) + r max ({\hat{Cov}}_{2}, 0)]} = (- 0.064, 0.114)$

Single-test 95% confidence intervals using only corresponding data. Note: ${StdErr}^{(i)} = \sqrt{\frac{1}{r} [M S {(R)}^{(i)} + r max ({\hat{Cov}}_{2}^{(i)}, 0)]}$

Test (i)	θ̂_i•	${\hat{Cov}}_{2}^{(i)}$	MS(R)⁽ⁱ⁾	StdErr⁽ⁱ⁾	${d f}_{2}^{(i)}$	95% CI
1 (Soft-copy)	0.804	0.000880	0.000735	0.0326	100.4	(0.739, 0.867)
2 (Screen-film)	0.829	0.000881	0.000070	0.0300	7997.2	(0.770, 0.888)

5.4. Example 4: Case-nested-within-reader split-plot study design

In this study design each reader interprets a different set of cases using all of the diagnostic tests. The study design is balanced with each reader reading the same number of cases under each test. This can be viewed as a split-plot design with cases being the whole plots, reader the whole-plot factor, and test the split-plot factor. Obuchowski [12] refers to this as a hybrid design. The advantage of this design is that for equivalent power each reader must interpret fewer cases than for the factorial design, but the disadvantage is that the total number of cases is higher [13]. Thus this design is appropriate when a large number of verified cases are available and reading time per reader is limited or relatively expensive. This design is schematically illustrated in Table 7c.

Derivation of results using the mm-ANOVA algorithm is presented in Supplementary Table S4. The derivation begins with a conventional split-plot ANOVA model corresponding to the study design that treats reader and case as random factors and includes all possible interactions. Averaging across cases produces the corresponding mm-ANOVA model, which is the same as the factorial model except with Cov₂ and Cov₃ constrained to zero; i.e., the model is defined by (1) and the constraints Cov₁ ≥ 0, Cov₂ = Cov₃ = 0. Consequently, hypothesis-test, confidence-interval, and sample-size formulas can be derived from those for the factorial model by setting Cov₂ = Cov₃ = 0.

Thus the test statistic for the null hypothesis of equal test accuracies is

F_{O R} = \frac{M S (T)}{M S (T * R)}

Under H₀, F_OR ~˙ F_{t−1,df₂} where

d f_{2} = (t - 1) (r - 1)

(29)

More generally, F_OR ~˙ F_{t−1,df₂;λ}, where $λ = \frac{r \sum_{i = 1}^{t} τ_{i}^{2}}{σ_{T R}^{2} + σ_{ε}^{2} - {Cov}_{1}}$ and df₂ is given by (29).
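As a check, the formulas of this section can be read off from the factorial-design results by substitution. The display below is a sketch that assumes the factorial forms of F_OR, df₂, and λ used elsewhere in this paper (the forms that appear in Appendix A and in (32) with n = 1) and sets Cov₂ = Cov₃ = 0 throughout, so that the estimated term $r \max(\widehat{Cov}_2 - \widehat{Cov}_3, 0)$ is replaced by zero.

```latex
\begin{align*}
F_{OR} &= \frac{MS(T)}{MS(T*R) + r\max(\widehat{Cov}_2 - \widehat{Cov}_3,\, 0)}
  \;\longrightarrow\; \frac{MS(T)}{MS(T*R)}, \\[4pt]
df_2 &= \frac{\left[MS(T*R) + r\max(\widehat{Cov}_2 - \widehat{Cov}_3,\, 0)\right]^2}
             {\left[MS(T*R)\right]^2 / \left[(t-1)(r-1)\right]}
  \;\longrightarrow\; (t-1)(r-1), \\[4pt]
\lambda &= \frac{r\sum_{i=1}^{t}\tau_i^2}
                {\sigma_{TR}^2 + \sigma_\varepsilon^2 - Cov_1 + (r-1)(Cov_2 - Cov_3)}
  \;\longrightarrow\; \frac{r\sum_{i=1}^{t}\tau_i^2}
                           {\sigma_{TR}^2 + \sigma_\varepsilon^2 - Cov_1}.
\end{align*}
```

Because no covariance estimate remains in the denominator of F_OR, the denominator degrees of freedom reduce to those of the conventional test-by-reader mean square.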

5.5. Example 5: Reader-and-case-crossed-and-nested-within-group split-plot study design

In this study design there are several groups (or blocks) of readers and cases such that (1) each reader and each case belongs to only one group and (2) within each group all readers read all cases under each test. I assume a balanced design where each group has the same number of readers and cases. Obuchowski [13] discusses this design and refers to it as a mixed design; I will refer to it as a mixed split-plot design. The motivation for this study design is to reduce the number of reader interpretations for each reader, compared to the factorial study, without requiring as many cases to be verified as the hybrid design. This design is schematically illustrated in Table 7d. Although not explicitly stated, Obuchowski [13] assumes that there is no group effect for this design; e.g., cases and readers are randomly assigned to the groups (personal communication, Nancy Obuchowski, 2012). In contrast, I allow for a group effect; e.g., readers are assigned to groups according to experience level. Obuchowski et al [14] provide a real-data example that shows how this design can be particularly useful for studying multiple imaging tests.

Derivation of results using the mm-ANOVA algorithm is presented in Supplementary Table S5. The derivation begins with a conventional split-plot ANOVA model corresponding to the study design (reader and case crossed and nested within group) that treats reader and case as random factors and group and test as fixed factors. All possible interactions are included. Averaging across cases produces the corresponding mm-ANOVA model: a three-way ANOVA model with group, test, and reader as factors.

Let θ̂_hij denote the reader-performance estimate for reader j under test i, with both belonging to group h. The mm-ANOVA model is given by θ̂_hij = μ + γ_h + τ_i + (γτ)_hi + R_(h)j + (τR)_(h)ij + ε_hij, h = 1, …, g, i = 1, …, t, j = 1, …, r, where g is the number of groups, t is the number of tests, r is the number of readers in each group, τ_i denotes the fixed effect of test i, γ_h denotes the fixed effect of group h, and (γτ)_hi denotes the fixed group-by-test interaction with $\sum_{i = 1}^{t} τ_{i} = \sum_{h = 1}^{g} γ_{h} = \sum_{h = 1}^{g} {(γ τ)}_{h i} = \sum_{i = 1}^{t} {(γ τ)}_{h i} = 0$ . The R_(h)j and (τR)_(h)ij are random reader and test-by-reader effects, nested within group; they are mutually independent and normally distributed with zero means and respective variances $σ_{R (G)}^{2}$ and $σ_{τ R (G)}^{2}$ . The ε_hij are normally distributed with zero mean and variance $σ_{ε}^{2}$ . The ε_hij are independent of the R_(h)j and (τR)_(h)ij. In summary, the mm-ANOVA model contains fixed effects for group, test, and their interaction, and random effects for reader nested within group and the test-by-reader interaction nested within group.

Cov₁, Cov₂, and Cov₃ are defined and constrained similarly to the corresponding covariances for the typical test×reader×case factorial design, with one difference: here they are not defined between errors corresponding to different groups, because the covariance of those errors is zero. Specifically, Cov₁ ≡ Cov (ε_hij, ε_hi′j), Cov₂ ≡ Cov (ε_hij, ε_hij′), and Cov₃ ≡ Cov (ε_hij, ε_hi′j′) where i ≠ i′, j ≠ j′ and Cov₁ ≥ Cov₃, Cov₂ ≥ Cov₃, and Cov₃ ≥ 0.

The null hypothesis of equal test accuracies is H₀ : θ₁ =…= θ_t, where θ_i = E (θ̂_•i•). The corresponding test statistic is

F_{O R} = \frac{M S (T)}{M S [T * R (G)] + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]}

Under H₀, F_OR ~˙ F_{t−1,df₂} where

d f_{2} = \frac{{\{M S [T * R (G)] + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}}^{2}}{{\{M S [T * R (G)]\}}^{2} / [g (t - 1) (r - 1)]}

(30)

and MS[T * R(G)] denotes the mean square for test-by-reader interaction nested within group. More generally, F_OR has an approximate F_{t−1,df₂;λ} distribution, where $λ = \frac{g r \sum_{i = 1}^{t} τ_{i}^{2}}{σ_{T R (G)}^{2} + σ_{ε}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})}$ and $d f_{2} = \frac{{[σ_{T R (G)}^{2} + σ_{ε}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})]}^{2}}{{[σ_{T R (G)}^{2} + σ_{ε}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3}]}^{2} / [g (t - 1) (r - 1)]}$ .

An approximate (1 − α) 100% confidence interval for contrast $\sum_{i = 1}^{t} l_{i} θ_{i}$ is given by $\sum_{i = 1}^{t} l_{i} {\hat{θ}}_{• i •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where df₂ is given by (30) and $\hat{V} = \frac{1}{g r} (\sum_{i = 1}^{t} l_{i}^{2}) \{M S [T * R (G)] + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)\}$ . An approximate (1 − α) 100% confidence interval for θ_i is given by ${\hat{θ}}_{• i •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where $\hat{V} = \frac{1}{g tr} \{M S [R (G)] + (t - 1) M S [T * R (G)] + t r max ({\hat{Cov}}_{2}, 0)\}$ and $d f_{2} = \frac{{\{M S [R (G)] + (t - 1) M S [T * R (G)] + t r max ({\hat{Cov}}_{2}, 0)\}}^{2}}{{\{M S [R (G)]\}}^{2} / [g (r - 1)] + {\{(t - 1) M S [T * R (G)]\}}^{2} / [g (t - 1) (r - 1)]}$ .

5.6. Example 6: Replicated factorial study design

This study design is the same as the factorial study design except that each reader reads each case n times. Typically sessions corresponding to different readings are separated by a suitable period of time to reduce the probability that the reader will recognize cases from the earlier session. This study design has two advantages over the factorial design with one replication: it allows for estimation of within-reader reliability between two readings of the same cases, and it provides more power for the same number of cases and readers. This last aspect can be important if the number of available cases and readers is limited. In the example later in this section, I show how to estimate the gain in power based on pilot data.

Derivation of results using the mm-ANOVA algorithm is presented in Supplementary Table S6. The derivation begins with a conventional three-way replicated factorial ANOVA model with reader and case as random factors and test as a fixed factor. There are n replications. All possible interactions are included between reader, case and test. Averaging across cases for each replication produces the corresponding mm-ANOVA model: a two-way replicated factorial ANOVA model with test and reader as factors.

Let θ̂_ijm denote the reader-performance estimate for reader j under test i based on the mth reading of the data. The mm-ANOVA model is given by θ̂_ijm = μ + τ_i + R_j + (τR)_ij + ε_ijm, i = 1, …, t, j = 1, …, r, m = 1, …, n, where t is the number of tests, r is the number of readers, n is the number of replications, τ_i denotes the fixed effect of test i, R_j denotes the random effect of reader j, (τR)_ij denotes the random test×reader interaction, ε_ijm is the error term, and $\sum_{i = 1}^{t} τ_{i} = 0$ . The R_j and (τR)_ij are assumed to be mutually independent and normally distributed with zero means and respective variances $σ_{R}^{2}$ and $σ_{T R}^{2}$ . The ε_ijm are assumed to be normally distributed with zero mean and variance $σ_{ε}^{2}$ and are assumed independent of the R_j and (τR)_ij. The errors are equi-covariant with four possible covariances given by

Cov (ε_{i j m}, ε_{i' j' m'}) = \begin{cases} {Cov}_{0} & i = i', j = j', m \neq m' \text{ (same test and reader, different replication)} \\ {Cov}_{1} & i \neq i', j = j' \text{ (different test, same reader)} \\ {Cov}_{2} & i = i', j \neq j' \text{ (same test, different reader)} \\ {Cov}_{3} & i \neq i', j \neq j' \text{ (different test, different reader)} \end{cases}

and subject to the following constraints:

{Cov}_{0} \geq {Cov}_{1} \geq {Cov}_{3}; {Cov}_{0} \geq {Cov}_{2} \geq {Cov}_{3}; {Cov}_{3} \geq 0

Let θ_i ≡ E (θ̂_i••) denote the expected reader performance measure for test i. The test statistic for the null hypothesis of equal test accuracies (H₀ : θ₁ =…= θ_t) is

F_{O R} = \frac{M S (T)}{M S (T * R) + n r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)}

where $M S (T) = \frac{n r}{t - 1} \sum_{i = 1}^{t} {({\hat{θ}}_{i • •} - {\hat{θ}}_{• • •})}^{2}$ and $M S (T * R) = \frac{n}{(t - 1) (r - 1)} \sum_{i = 1}^{t} \sum_{j = 1}^{r} {({\hat{θ}}_{i j •} - {\hat{θ}}_{i • •} - {\hat{θ}}_{• j •} + {\hat{θ}}_{• • •})}^{2}$ . Under H₀, F_OR ~˙ F_{t−1,df₂} where

d f_{2} = \frac{{[M S (T * R) + n r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)]}^{2}}{{M S (T * R)}^{2} / [(t - 1) (r - 1)]}

(31)
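The test statistic and (31) can be computed directly from the array of estimates θ̂_ijm. The following is a minimal Python sketch under stated assumptions: the function name `replicated_or_test`, the nested-list layout `theta[i][j][m]`, and the toy numbers are hypothetical and are used only to illustrate the formulas; the covariance estimates are supplied by the caller.

```python
def replicated_or_test(theta, cov2, cov3):
    """F_OR and df2 for the replicated factorial design.

    theta[i][j][m]: estimate for test i, reader j, replication m.
    cov2, cov3: estimates of Cov2 and Cov3 supplied by the caller.
    """
    t, r, n = len(theta), len(theta[0]), len(theta[0][0])
    # Cell means (averaged over replications) and marginal means.
    cell = [[sum(theta[i][j]) / n for j in range(r)] for i in range(t)]
    test_mean = [sum(cell[i]) / r for i in range(t)]
    reader_mean = [sum(cell[i][j] for i in range(t)) / t for j in range(r)]
    grand = sum(test_mean) / t
    # Mean squares as defined in the text (note the factor n).
    ms_t = n * r / (t - 1) * sum((m - grand) ** 2 for m in test_mean)
    ss_tr = sum((cell[i][j] - test_mean[i] - reader_mean[j] + grand) ** 2
                for i in range(t) for j in range(r))
    ms_tr = n / ((t - 1) * (r - 1)) * ss_tr
    # Denominator of F_OR and the denominator degrees of freedom (31).
    denom = ms_tr + n * r * max(cov2 - cov3, 0.0)
    f_or = ms_t / denom
    df2 = denom ** 2 / (ms_tr ** 2 / ((t - 1) * (r - 1)))
    return ms_t, ms_tr, f_or, df2

# Toy data (made up): t = 2 tests, r = 2 readers, n = 2 replications.
theta = [[[0.79, 0.81], [0.89, 0.91]],
         [[0.69, 0.71], [0.59, 0.61]]]
ms_t, ms_tr, f_or, df2 = replicated_or_test(theta, cov2=0.010, cov3=0.005)
print(ms_t, ms_tr, f_or, df2)
```

For these toy inputs the mean squares are 0.08 and 0.02, the covariance term adds n·r·0.005 = 0.02 to the denominator, and the statistic and degrees of freedom come out near 2 and 4.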

More generally, F_OR ~˙ F_{t−1,df₂;λ}, where

λ = \frac{r \sum_{i = 1}^{t} τ_{i}^{2}}{σ_{T R}^{2} + σ_{ε}^{2} / n - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3}) + [(n - 1) / (n)] {Cov}_{0}}

(32)

and

d f_{2} = \frac{{[σ_{T R}^{2} + σ_{ε}^{2} / n - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3}) + [(n - 1) / n] {Cov}_{0}]}^{2}}{{[σ_{T R}^{2} + σ_{ε}^{2} / n - {Cov}_{1} - ({Cov}_{2} - {Cov}_{3}) + [(n - 1) / n] {Cov}_{0}]}^{2} / [(t - 1) (r - 1)]}

(33)

An approximate (1 − α) 100% confidence interval for contrast $\sum_{i = 1}^{t} l_{i} θ_{i}$ is given by $\sum_{i = 1}^{t} l_{i} {\hat{θ}}_{i • •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ where $\hat{V} = \frac{1}{n r} (\sum_{i = 1}^{t} l_{i}^{2}) [M S (T * R) + n r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)]$ and df₂ is given by (31). An approximate (1 − α) 100% confidence interval for θ_i is given by ${\hat{θ}}_{i • •} \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ , where $\hat{V} = \frac{1}{n tr} [M S (R) + (t - 1) M S (T * R) + n t r max ({\hat{Cov}}_{2}, 0)]$ and $d f_{2} = \frac{{[M S (R) + (t - 1) M S (T * R) + n t r max ({\hat{Cov}}_{2}, 0)]}^{2}}{{[M S (R)]}^{2} / (r - 1) + {[(t - 1) M S (T * R)]}^{2} / [(t - 1) (r - 1)]}$ .

Consider Cov₂ ≡ cov (θ̂_ijm, θ̂_ij′m′) where j ≠ j′ and either m = m′ or m ≠ m′. It follows that Cov₂ can be computed from one set of replications (m = m′) or from different sets of replications (m ≠ m′). For example, for test i and readers j and j′, with n = 2 we have Cov₂ = cov (θ̂_ij1, θ̂_ij′1) = cov (θ̂_ij1, θ̂_ij′2) = cov (θ̂_ij2, θ̂_ij′1) = cov (θ̂_ij2, θ̂_ij′2). Thus an obvious estimate for Cov₂ that utilizes all of the data is given by

{\hat{Cov}}_{2} = \frac{2}{n^{2} tr (r - 1)} \sum_{i = 1}^{t} \sum_{j < j'} \sum_{1 \leq m \leq n, 1 \leq m' \leq n} \hat{cov} ({\hat{θ}}_{i j m}, {\hat{θ}}_{i j' m'})

where $\hat{cov} ({\hat{θ}}_{i j m}, {\hat{θ}}_{i j' m'})$ is a fixed-reader covariance estimate, as discussed in Section 2.2. Similarly, Cov₁ and Cov₃ can be estimated by averaging fixed-reader covariance estimates, computed for each of the n² possible (m, m′) pairs of replications, across corresponding test-reader combinations. Obvious estimates for Cov₀ and $σ_{ε}^{2}$ are ${\hat{Cov}}_{0} = \frac{2}{n (n - 1) tr} \sum_{i = 1}^{t} \sum_{j = 1}^{r} \sum_{m < m'} \hat{cov} ({\hat{θ}}_{i j m}, {\hat{θ}}_{i j m'})$ and ${\hat{σ}}_{ε}^{2} = \frac{1}{n tr} \sum_{i = 1}^{t} \sum_{j = 1}^{r} \sum_{m = 1}^{n} \hat{var} ({\hat{θ}}_{i j m})$ , where $\hat{var} ({\hat{θ}}_{i j m}) = \hat{cov} ({\hat{θ}}_{i j m}, {\hat{θ}}_{i j m})$ .
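The averaging described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's software: `average_covariances` and `toy_cov` are hypothetical names, the fixed-reader covariance estimates are assumed to be available through a callable `cov_hat(a, b)` that takes two (test, reader, replication) index triples, and the Cov₁ and Cov₃ averages are written by analogy with the Cov₂ formula, which the text describes only in words.

```python
from itertools import combinations, product

def average_covariances(cov_hat, t, r, n):
    """Average fixed-reader covariance estimates (requires n >= 2, r >= 2, t >= 2)."""
    reps = list(product(range(n), repeat=2))  # all n^2 (m, m') pairs
    # Error variance: average of the n*t*r variance estimates.
    var_eps = sum(cov_hat((i, j, m), (i, j, m))
                  for i in range(t) for j in range(r) for m in range(n)) / (n * t * r)
    # Cov0: same test and reader, different replications (m < m').
    s0 = sum(cov_hat((i, j, m), (i, j, q))
             for i in range(t) for j in range(r)
             for m, q in combinations(range(n), 2))
    cov0 = 2.0 * s0 / (n * (n - 1) * t * r)
    # Cov2: same test, different readers (j < j'), any pair of replications.
    s2 = sum(cov_hat((i, j, m), (i, k, q))
             for i in range(t)
             for j, k in combinations(range(r), 2)
             for m, q in reps)
    cov2 = 2.0 * s2 / (n ** 2 * t * r * (r - 1))
    # Cov1 (assumed analogue): different tests (i < i'), same reader.
    s1 = sum(cov_hat((i, j, m), (h, j, q))
             for i, h in combinations(range(t), 2)
             for j in range(r) for m, q in reps)
    cov1 = 2.0 * s1 / (n ** 2 * t * (t - 1) * r)
    # Cov3 (assumed analogue): different tests (i < i'), different readers.
    s3 = sum(cov_hat((i, j, m), (h, k, q))
             for i, h in combinations(range(t), 2)
             for j in range(r) for k in range(r) if j != k
             for m, q in reps)
    cov3 = 2.0 * s3 / (n ** 2 * t * (t - 1) * r * (r - 1))
    return {"var_eps": var_eps, "cov0": cov0, "cov1": cov1,
            "cov2": cov2, "cov3": cov3}

def toy_cov(a, b):
    """Made-up, exactly equi-covariant estimates used only as a check."""
    (i, j, m), (h, k, q) = a, b
    if i == h and j == k:
        return 1.0 if m == q else 0.6   # variance, Cov0
    if j == k:
        return 0.5                      # Cov1: different test, same reader
    if i == h:
        return 0.4                      # Cov2: same test, different reader
    return 0.3                          # Cov3: different test and reader

est = average_covariances(toy_cov, t=2, r=3, n=2)
print(est)
```

When the inputs are exactly equi-covariant, as in the toy function, each average returns the corresponding constant, which is a convenient sanity check on the index ranges and divisors.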

5.6.1. Real-data example

In Section 2.3 I compared AUCs for hard- and soft-copy computed radiography chest images. Both types of images were obtained for each patient and were read by each of the readers. Thus this was a factorial study design, which could be analyzed by the standard OR procedure. Although there was not a significant difference between the two types of images, the resulting confidence interval showed that an AUC difference as large as 0.086 was commensurate with the data. In such a situation the researcher might want to plan a similar experiment that is sized to have more power.

Increased power can be obtained by increasing the number of readers, the number of cases, or the number of replications. I now compute the number of cases needed to obtain .80 power to detect an AUC difference of .04 with alpha = .05. Because F_OR ~˙ F_{t−1,df₂;λ}, power is approximated by Pr (F_{1,df₂;λ} > F_{.95;1,df₂}) where λ and df₂ are defined by (32) and (33) and F_{.95;1,df₂} is the 95th percentile of a central F distribution with degrees of freedom 1 and df₂.

For the power computations I use the following estimates, obtained from Section 2.3: ${\hat{σ}}_{ε}^{2} = 0.0022034331, {\hat{Cov}}_{1} = 0.0011163046, {\hat{Cov}}_{2} = 0.0008438255, {\hat{Cov}}_{3} = 0.0008871752$ , and ${\hat{σ}}_{T R}^{2} = 0$ . An estimate of Cov₀ is not available from the data because there are no replicated readings; however, the similarity of the two tests (hard- and soft-copy) suggests that the within-reader correlation between replications for the same test and reader, $ρ_{0} = {Cov}_{0} / σ_{ε}^{2}$ , should be only slightly higher than the within-reader correlation based on one replication between two tests, given by ${\hat{ρ}}_{1} = {\hat{Cov}}_{1} / {\hat{σ}}_{ε}^{2} = 0.507$ from Table 2. Thus I set ρ₀ = 0.60 for the power computations; it follows that ${\hat{Cov}}_{0} = 0.6 {\hat{σ}}_{ε}^{2} = 0.00132206$ . Following Hillis et al [15] I assume that the covariances are inversely proportional to the number of cases c, and hence multiply ${\hat{σ}}_{ε}^{2}, {\hat{Cov}}_{1}, {\hat{Cov}}_{2}$ and ${\hat{Cov}}_{3}$ by the factor $\frac{95}{c}$ (recall that 95 is the number of cases for the example); the resulting values are used in place of $σ_{ε}^{2}$ , Cov₁, Cov₂, and Cov₃ in (32) and (33) when computing power for c cases.
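The inputs to the power calculation, λ from (32) and df₂ from (33), are simple functions of n, r, and c. The following Python sketch evaluates them for two tests using the pilot estimates quoted above. The function name `replicated_power_inputs` and its arguments are hypothetical. Two points are assumptions of this sketch rather than statements from the text: because the pilot estimates have Ĉov₂ < Ĉov₃, the difference Cov₂ − Cov₃ is truncated at zero to respect the model constraint Cov₂ ≥ Cov₃, and Cov₀ is taken as ρ₀ times the rescaled error variance so that it shrinks with c like the other covariances.

```python
def replicated_power_inputs(n, r, c, delta=0.04, c_pilot=95,
                            var_eps=0.0022034331, cov1=0.0011163046,
                            cov2=0.0008438255, cov3=0.0008871752,
                            var_tr=0.0, rho0=0.60):
    """Noncentrality (32) and denominator df (33) for t = 2 tests."""
    scale = c_pilot / c                      # covariances assumed proportional to 1/c
    var_eps_c = var_eps * scale
    cov1_c = cov1 * scale
    cov0_c = rho0 * var_eps_c                # assumption: Cov0 = rho0 * sigma_eps^2
    diff_c = max(cov2 - cov3, 0.0) * scale   # assumption: truncate Cov2 - Cov3 at 0
    # Two tests with AUC difference delta: tau = (+delta/2, -delta/2).
    sum_tau_sq = delta ** 2 / 2.0
    base = var_tr + var_eps_c / n - cov1_c + (n - 1) / n * cov0_c
    lam = r * sum_tau_sq / (base + (r - 1) * diff_c)
    # (t - 1)(r - 1) = r - 1 because t = 2.
    df2 = (base + (r - 1) * diff_c) ** 2 / ((base - diff_c) ** 2 / (r - 1))
    return lam, df2

lam_1, df2_1 = replicated_power_inputs(n=1, r=8, c=173)
lam_2, df2_2 = replicated_power_inputs(n=2, r=8, c=103)
print(round(lam_1, 2), round(df2_1, 1), round(lam_2, 2), round(df2_2, 1))
```

Under these assumptions the two designs (8 readers with one replication and 173 cases, or with two replications and 103 cases) give nearly identical λ and the same df₂, consistent with their nearly identical power. Power itself is the upper tail of a noncentral F distribution beyond the central-F critical value; a statistics library would be needed for that step (for example, `scipy.stats.ncf.sf` together with `scipy.stats.f.ppf`, if SciPy is available).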

The numbers of cases needed to achieve 0.80 power for combinations of 4–8 readers and 1–2 replications are presented in Table 9. For example, achieving 0.80 power with 8 readers and one replication requires 173 cases versus 103 cases with two replications. Thus if cases are expensive to obtain or validate and it is difficult to obtain more than 8 readers, then using two replications appears to be an attractive option.

Table 9.

Number of replications, readers, and cases needed to achieve .80 power to detect a .04 AUC difference between soft- and hard-copy radiographs using a factorial study design, based on estimates from the Kundel et al [9] data, an assumed within-reader between-replication correlation of 0.60, and alpha = .05.

replications (n)	readers (r)	cases (c)	power
1	4	585	0.800
1	5	366	0.801
1	6	266	0.800
1	7	210	0.802
1	8	173	0.801
2	4	348	0.800
2	5	218	0.801
2	6	158	0.800
2	7	125	0.802
2	8	103	0.802


6. Discussion

The mm-ANOVA approach allows for analysis of ROC and other reader-performance outcomes that result from any balanced study design that has reader and case as random factors and any number of fixed factors. In addition, by providing the non-null distribution of the test statistic it allows for sample size estimation for such studies and efficiency comparisons between different types of studies. Although steps were fully justified only for the test×reader×case factorial study design, justification can be similarly established for other designs. Until now researchers have been limited to using the test×reader×case study design with the OR method because analysis methods were not developed for other designs. This work allows researchers to choose designs that are most appropriate for their study. A SAS macro for fitting some of these designs using the mm-ANOVA approach is available on request from the author.

As noted in Section 2.4, Obuchowski and Rockette [1] derived their F statistic by modifying the F statistic described by Pavur and Nath [10]. Although Pavur and Nath [10] give results only for two-factor models, their approach, which is based on results given by Pavur and Lewis [19], could conceivably be applied to other correlated-error ANOVA models; as such it would provide an alternative to the approach described in this paper. However, the results of Pavur and Lewis do not extend beyond specifying the correct form for the F test when correlations are known; in particular, they do not indicate how to implement their approach when the correlations must be estimated, do not discuss derivation of confidence interval formulas for contrasts, give little motivation for the correlated error models, and do not discuss power computations.

Explicit formulas can be derived [20, 21, 22] for the variances of reader-performance outcomes that are U-statistics [23], such as reader empirical-AUC averages and their differences. Replacing parameters in these formulas by sample estimates yields variance estimates with excellent statistical properties. However, this approach is limited to U-statistic estimators, such as the empirical AUC, and presently incorporates an adaptation of the OR degrees-of-freedom formula. Advantages include explicit variance formulas and applicability to a wide variety of multireader study designs, including unbalanced designs.

Another alternative approach for analyzing multireader data is the marginal model approach proposed by Song and Zhou [24] for empirical AUC estimates. An advantage of their approach is that case-specific covariates can be included; disadvantages are that it is limited to empirical AUC outcomes, is based on large-sample inference, and has thus far been developed only for the factorial model.

Limitations of the mm-ANOVA approach include the following: (1) It is presently limited to balanced study designs; i.e., the number of levels for each factor does not depend on any other factor. However, because case is treated as one factor it is possible to have different numbers of normal and abnormal cases. I am currently investigating models that are not balanced with regard to case. (2) It assumes that the number of cases is large enough so that covariance estimates can be treated like known values for computing the denominator degrees of freedom. (3) It assumes that the fixed-reader measurement errors, the ε_ij, are normally distributed. This is a reasonable assumption when the number of cases is moderate because most typical reader-performance outcomes, such as AUC, have asymptotic normal distributions for a fixed reader. (4) It assumes that the latent reader-performance outcomes (i.e., R_j + (τR)_ij) have a normal distribution. If these normal distribution assumptions do not appear to be reasonable, one possible remedy is to transform the outcome, e.g., using a logarithmic or logit transformation for AUC. (5) It assumes the errors have an equi-covariance structure. I am currently investigating the robustness of the mm-ANOVA approach to this assumption.

Supplementary Material

Web-based Supporting Materials

NIHMS733736-supplement-Web-based_Supporting_Materials.pdf (143.9 KB, pdf)

ACKNOWLEDGEMENTS

This research was supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB), grants R01EB000863 and R01EB013667. I thank Dr. Harold Kundel for sharing his data set.

Appendix

A. DERIVATION OF THE NULL-DISTRIBUTION RESULT USED IN STEP 2J

To derive the null-distribution result given in step 2j, I approximate the distribution of F_OR (22) by deriving an approximate distribution for $F_{O R}^{*}$ (21), where Cov₂ and Cov₃ are known. Each ${\tilde{M S}}_{i}$ is equal to its corresponding conventional three-way ANOVA model mean square, denoted by MS_i, multiplied by $\frac{1}{c}$ , with $M S_{i} \sim E (M S_{i}) χ_{d f (M S_{i})}^{2} / d f (M S_{i})$ under H₀ : θ₁ = … = θ_t. It follows that the ${\tilde{M S}}_{i}$ are mutually independent, each ${\tilde{M S}}_{i}$ has the same degrees of freedom as its corresponding MS_i, and ${\tilde{M S}}_{i} \sim E ({\tilde{M S}}_{i}) χ_{d f ({\tilde{M S}}_{i})}^{2} / d f ({\tilde{M S}}_{i})$ .

In general, a chi-squared-distribution approximation [25, 26] for a random variable X is given by

E (X) χ_{d f}^{2} / d f

where

d f = \frac{2 {[E (X)]}^{2}}{var (X)}

It follows that a chi-square approximation for

X = b (\sum_{i = 1}^{I} a_{i} {\tilde{M S}}_{i} + d)

where the a_i, b and d are constants, is given by

b (\sum_{i = 1}^{I} a_{i} E ({\tilde{M S}}_{i}) + d) χ_{d f}^{2} / d f

(A1)

where

d f = \frac{{[\sum_{i = 1}^{I} a_{i} E ({\tilde{M S}}_{i}) + d]}^{2}}{\sum_{i = 1}^{I} \frac{{[a_{i} E ({\tilde{M S}}_{i})]}^{2}}{d f ({\tilde{M S}}_{i})}}

(A2)

Replacing $E ({\tilde{M S}}_{i})$ by ${\tilde{M S}}_{i}$ and d by an estimate d̂ in (A2) results in the approximation for df given by df₂ (24).
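Formula (A2), with expected mean squares replaced by observed mean squares and d by an estimate, is easy to evaluate numerically. The following Python sketch is one way to do so; the function name `satterthwaite_df` and its argument names are illustrative and not taken from the paper. The usage example plugs in the rounded single-test values for test 1 from Table 8, so the result matches the tabled degrees of freedom only approximately.

```python
def satterthwaite_df(a, ms, df, d=0.0):
    """Approximate degrees of freedom for sum_i a_i * MS_i + d, as in (A2).

    a:  constants a_i
    ms: mean squares (estimates standing in for the expected mean squares)
    df: degrees of freedom of each mean square
    d:  constant term (for example, an estimated covariance term), treated as known
    """
    numerator = (sum(ai * msi for ai, msi in zip(a, ms)) + d) ** 2
    denominator = sum((ai * msi) ** 2 / dfi for ai, msi, dfi in zip(a, ms, df))
    return numerator / denominator

# Single-test interval for test 1 in Table 8: one mean square MS(R)^(1) with
# r - 1 = 3 degrees of freedom, and d = r * max(Cov2^(1), 0) with r = 4.
df_test1 = satterthwaite_df(a=[1.0], ms=[0.000735], df=[3],
                            d=4 * max(0.000880, 0.0))
print(round(df_test1, 1))

# With no constant term and a single mean square, the formula returns that
# mean square's own degrees of freedom.
df_plain = satterthwaite_df(a=[1.0], ms=[2.0], df=[5])
```

The constant d inflates the numerator but not the denominator, which is why the approximate degrees of freedom can greatly exceed those of the mean squares involved.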

It follows using (A1) with I = 1, a₁ = 1, ${\tilde{M S}}_{1} = \tilde{M S} (T * R)$ , d = r (Cov₂ − Cov₃) and (A2) estimated by (24) that a chi-squared approximation for $\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})$ , the denominator of $F_{O R}^{*}$ (21), is given by

\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3}) \dot{\sim} \{E [\tilde{M S} (T * R)] + r ({Cov}_{2} - {Cov}_{3})\} χ_{d f_{2}}^{2} / d f_{2}

(A3)

where df₂ is given by (25) and “~˙” stands for “is approximately distributed as.” See Reference [6] for a more detailed derivation and justification of df₂ (referred to as ddf_H in the reference.)

Because $F_{O R}^{*}$ (21) is an ANOVA statistic, $E [\tilde{M S} (T)] = E [\tilde{M S} (T * R)] + r ({Cov}_{2} - {Cov}_{3})$ under H₀. Combining this result with the chi-squared approximation (A3) for $\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})$ and the independence of $\tilde{M S} (T)$ and $\tilde{M S} (T * R)$ , it follows under H₀ that

F_{O R}^{*} = \frac{\tilde{M S} (T)}{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})} = \frac{\frac{\tilde{M S} (T)}{E [\tilde{M S} (T)]}}{\frac{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})}{E [\tilde{M S} (T * R)] + r ({Cov}_{2} - {Cov}_{3})}} = \frac{U / (t - 1)}{W / d f_{2}}

where $U \sim χ_{t - 1}^{2}$ , W is approximately $χ_{d f_{2}}^{2}$ , and U and W are independent. Thus $F_{O R}^{*}$ has an approximate F_{(t−1),df₂} null distribution, with df₂ given by (25). Because F_OR (22) approximates $F_{O R}^{*}$ (21), it is reasonable to approximate the null distribution of F_OR by F_{(t−1),df₂}, which is the null distribution derived by Hillis [6] for F_OR, discussed in Section 2.2.

B. MM-ANOVA APPROACH STEP 3: DERIVE CONFIDENCE INTERVALS FOR A LINEAR FUNCTION g(θ) OF TEST ACCURACIES

In this section I show how to compute a confidence interval for a linear function of test accuracy parameters. Specifically, for the balanced test×reader×case factorial study design with θ_i ≡ E(θ̂_i•) denoting the expected reader-performance outcome for test i across readers, θ = (θ₁, …, θ_t)′, and l = (l₁, …, l_t)′ denoting a t-dimensional contrast vector (i.e., $\sum_{i = 1}^{t} l_{i} = 0$ ), I illustrate how to derive a confidence interval for g (θ) ≡ l′θ. More generally this step can be used to determine a confidence interval for g (θ), where g (·) is any linear function and θ any vector of test accuracy parameters; this general result is given in step 3k.

B.1. Step 3a: Write the test accuracy parameter vector θ in terms of the mm-ANOVA model

In terms of the mm-ANOVA model parameterization, treating Ỹ_ij as θ̂_ij, we have θ_i = E (Ỹ_i•) = μ + τ_i.

B.2. Step 3b: Write θ in terms of the conventional ANOVA model

Since θ_i = E (Ỹ_i•) = E (Y_i••) = μ + τ_i, then in terms of the conventional ANOVA model we also have θ_i = μ + τ_i.

B.3. Step 3c: Determine the conventional ANOVA estimate for θ, denoted by θ̂

The conventional unbiased ANOVA estimate for θ is given by θ̂ = (θ̂₁, …, θ̂_t)′ with θ̂_i = Y_i••.

B.4. Step 3d: Determine the variance V of g (θ̂) in terms of conventional ANOVA parameters

From (7) it follows that

g (\hat{θ}) = \sum_{i = 1}^{t} l_{i} {\hat{θ}}_{i} = \sum_{i = 1}^{t} l_{i} Y_{i • •} = \sum_{i = 1}^{t} l_{i} τ_{i} + \sum_{i = 1}^{t} l_{i} [{(τ R)}_{i •} + {(τ C)}_{i •} + {(τ R C)}_{i • •} + ε_{i • •}]

Thus

V \equiv Var (g (\hat{θ})) = \sum_{i = 1}^{t} l_{i}^{2} [\frac{σ_{T R}^{2}}{r} + \frac{σ_{T C}^{2}}{c} + \frac{σ^{2}}{r c}] = \frac{1}{r c} \sum_{i = 1}^{t} l_{i}^{2} [c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}]

Because θ̂ has a multivariate normal distribution, it follows that

g (\hat{θ}) \sim N (l' θ, V)

where

V = \frac{1}{r c} \sum_{i = 1}^{t} l_{i}^{2} (c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2})

B.5. Step 3e: Write V from step 3d in the form V = bE (∑a_iMS_i) for constants b and a_i

Expected values of the conventional ANOVA mean squares are given in Table 4. It follows that

V = \frac{1}{r c} \sum_{i = 1}^{t} l_{i}^{2} E [M S (T * R) + M S (T * C) - M S (T * R * C)]

B.6. Step 3f: Write V from step 3e in the form $V = \tilde{b} E (\sum ã_{i} {\tilde{M S}}_{i} + U)$ where b̃ and ã_i are constants and U is a linear function of conventional ANOVA mean squares that involve case

We have

V = \frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} E [\tilde{M S} (T * R) + U]

where

U = \frac{1}{c} [M S (T * C) - M S (T * R * C)]

B.7. Step 3g: Express E (U) in terms of conventional ANOVA model variance components and then in terms of mm-ANOVA model error covariance parameters, using the relationships from step 1c; then rewrite V using this expression for E (U)

We did the first part of this step in step 2g where we showed

E (U) = r ({Cov}_{2} - {Cov}_{3})

Using this expression we have

V = \frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{E [\tilde{M S} (T * R)] + r ({Cov}_{2} - {Cov}_{3})\}

B.8. Step 3h: Derive the variance estimate V̂ from V by replacing expected mean squares by mean squares and replacing covariances by estimates that take into account the constraints from step 1d

We have

\hat{V} = \frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{\tilde{M S} (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}

B.9. Step 3i: Derive the degrees of freedom df₂ for V̂ using the general formula for df₂ (24) given in step 2j

It follows that the degrees of freedom is given by (25), which is the same as ddf_H (6).

B.10. Step 3j: Write θ̂ from step 3c in terms of the mm-ANOVA model

Since θ̂_i = Y_i•• = Ỹ_i•, then in terms of the mm-ANOVA model θ̂_i = Ỹ_i•.

B.11. Step 3k: General confidence-interval result: In terms of the mm-ANOVA model, an approximate (1 − α) 100% confidence interval for g (θ) is given by $g (\hat{θ}) \pm t_{α / 2; d f_{2}} \sqrt{\hat{V}}$ where V̂ is determined in step 3h, df₂ in step 3i and θ̂ in step 3j

This result yields the following (1 − α) 100% confidence interval for l′θ:

\sum_{i = 1}^{t} l_{i} Ỹ_{i •} \pm t_{α / 2; d d f_{H}} \sqrt{\frac{1}{r} (\sum_{i = 1}^{t} l_{i}^{2}) [\tilde{M S} (T * R) + r max ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}, 0)]}

(B1)

where ddf_H is given by (25). Letting “F_OR-test denominator” denote the denominator of the F_OR statistic (22) for testing H₀ : θ₁ = … = θ_t, we can write (B1) as

\sum_{i = 1}^{t} l_{i} Ỹ_{i •} \pm t_{α / 2; d d f_{H}} \sqrt{\frac{1}{r} (\sum_{i = 1}^{t} l_{i}^{2}) \{F_{O R}\text{-test denominator}\}}

B.12. Derivation of the general confidence-interval result given in step 3k

I now derive the step 3k result for the test×reader×case factorial study design with g (θ) ≡ l′θ and l = (l₁, …, l_t)′ denoting a t-dimensional contrast vector (i.e., $\sum_{i = 1}^{t} l_{i} = 0$ ). We have shown in the previous steps that g (θ̂) ~ N [g (θ), V], where

V = \frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{E [\tilde{M S} (T * R)] + r ({Cov}_{2} - {Cov}_{3})\}

Define V* by replacing $E [\tilde{M S} (T * R)]$ by $\tilde{M S} (T * R)$ :

V^{*} = \frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})\}

Using the same argument as given in Appendix A and noting that V = E (V*), we can show that a chi-squared-distribution approximation for V* is given by $V χ_{d f_{2}}^{2} / d f_{2}$ with df₂ given by (25). Furthermore, independence of g (θ̂) and $\tilde{M S} (T * R)$ for the mm-ANOVA model, and hence independence of g (θ̂) and V*, follows from the independence of g (θ̂) and MS(T * R) for the conventional ANOVA model (7). Thus for the mm-ANOVA model

t = \frac{g (\hat{θ}) - g (θ)}{\sqrt{\frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})\}}} = \frac{g (\hat{θ}) - g (θ)}{\sqrt{V^{*}}} = \frac{\frac{g (\hat{θ}) - g (θ)}{\sqrt{V}}}{\sqrt{\frac{(V^{*}) d f_{2}}{V} / d f_{2}}} = \frac{Z}{\sqrt{W / d f_{2}}}

where Z ∼ N (0, 1), W is approximately $χ_{d f_{2}}^{2}$ , and Z and W are independent. Thus

t = \frac{g (\hat{θ}) - g (θ)}{\sqrt{\frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})\}}}

has an approximate t_df₂ distribution with df₂ given by (25). In practice we replace r (Cov₂ − Cov₃) by $max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]$ and base tests and confidence intervals on

t = \frac{g (\hat{θ}) - g (θ)}{\sqrt{\frac{1}{r} \sum_{i = 1}^{t} l_{i}^{2} \{\tilde{M S} (T * R) + max [r ({\hat{Cov}}_{2} - {\hat{Cov}}_{3}), 0]\}}} = \frac{g (\hat{θ}) - g (θ)}{\sqrt{\hat{V}}}

(B2)

which we treat as having an approximate t_df₂ distribution; the confidence interval result in step 3k follows.

The general result, with g (·) being any linear function, can be proved similarly; the main difference is the formula for V.
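As an illustration only, the following Python sketch evaluates V̂ and the interval implied by (B2) for a contrast in the test×reader×case factorial design. The function name, argument names and numerical inputs are hypothetical and are not taken from the paper; the critical value t_{α/2;df₂} is supplied by the caller because df₂ comes from (25), which is not reproduced in this appendix.

```python
import math


def or_contrast_interval(theta_hat, contrast, r, ms_tr, cov2_hat, cov3_hat, t_crit):
    """Sketch of the (B2) interval for g(theta) = l' theta (hypothetical helper).

    theta_hat : reader-averaged accuracy estimates, one per test
    contrast  : contrast coefficients l_1, ..., l_t (must sum to zero)
    r         : number of readers
    ms_tr     : mm-ANOVA mean square for the test-by-reader interaction
    cov2_hat, cov3_hat : estimated error covariances
    t_crit    : critical value t_{alpha/2; df2}, supplied by the caller
    """
    if len(theta_hat) != len(contrast):
        raise ValueError("theta_hat and contrast must have the same length")
    if abs(sum(contrast)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")

    estimate = sum(l * th for l, th in zip(contrast, theta_hat))
    # Covariance term truncated at zero, as in (B2).
    cov_term = max(r * (cov2_hat - cov3_hat), 0.0)
    v_hat = (1.0 / r) * sum(l * l for l in contrast) * (ms_tr + cov_term)
    half_width = t_crit * math.sqrt(v_hat)
    return estimate, v_hat, estimate - half_width, estimate + half_width


# Hypothetical inputs: two tests, five readers, caller-supplied critical value.
result = or_contrast_interval([0.90, 0.85], [1.0, -1.0], 5,
                              ms_tr=0.0008, cov2_hat=0.0004, cov3_hat=0.0003,
                              t_crit=2.5)
```

Passing the critical value in keeps the sketch free of third-party dependencies; in practice it would come from a t-distribution quantile function evaluated at the step 3i degrees of freedom.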

C. MM-ANOVA APPROACH – STEP 4: DERIVE THE NON-NULL DISTRIBUTION OF F_OR

Power and sample size estimation for the step 2a hypothesis requires specification of the distribution of the F_OR statistic, derived in step 2i, when the null hypothesis is not true. A noncentral F distribution approximation for the non-null distribution is specified by steps 4a–d below. These steps are justified in Section C.5.

C.1. Step 4a: Compute the noncentrality parameter in terms of the conventional ANOVA model

Express the noncentrality parameter in terms of the conventional ANOVA model using

λ = \frac{d f (M S_{num}) \, M S_{num} |_{Y = E (Y)}}{E (M S_{num} | H_{0})}

(C1)

where MS_num is the numerator mean square from the conventional ANOVA F statistic given in step 2d, df(MS_num) is its degrees of freedom, E (MS_num |H₀) is its expected value under H₀, and MS_num|_{Y= E(Y)} is the mean square evaluated with outcomes replaced by their expected values.

For the balanced test×reader×case factorial design we have MS_num = MS (T) from step 2d. From Table 4 we have $E [M S (T)] = \frac{r c}{t - 1} \sum_{i = 1}^{t} τ_{i}^{2} + c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}$ . Thus $E [M S (T) | H_{0}] = c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}$ under H₀: τ₁ = … = τ_t = 0. Noting that E (Y_ijk) = μ + τ_i, we have $M S (T) |_{Y = E (Y)} = \frac{r c}{t - 1} \sum_{i = 1}^{t} {(Y_{i • •} - Y_{• • •})}^{2} |_{Y_{i j k} = μ + τ_{i}} = \frac{r c}{t - 1} \sum_{i = 1}^{t} τ_{i}^{2}$ . Because df[MS (T)] = t − 1, it follows from (C1) that

λ = \frac{r c \sum_{i = 1}^{t} τ_{i}^{2}}{c σ_{T R}^{2} + r σ_{T C}^{2} + σ^{2}}

(C2)
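The equality of (C1) and (C2) for this design can be checked numerically. The sketch below is illustrative only: it builds the array of expected outcomes μ + τ_i, computes MS (T) from it directly, and compares the resulting (C1) value with the closed form (C2). The function names and parameter values are hypothetical, and the check assumes the test effects satisfy τ₁ + … + τ_t = 0, which is needed for the grand mean of the expected outcomes to equal μ.

```python
def ms_t_at_expected_values(mu, tau, r, c):
    """MS(T) evaluated with every Y_ijk replaced by E(Y_ijk) = mu + tau_i."""
    t = len(tau)
    # Full t x r x c array of expected outcomes.
    y = [[[mu + tau[i] for _ in range(c)] for _ in range(r)] for i in range(t)]
    test_means = [sum(sum(row) for row in y[i]) / (r * c) for i in range(t)]
    grand_mean = sum(test_means) / t
    return r * c / (t - 1) * sum((m - grand_mean) ** 2 for m in test_means)


def lambda_c1(mu, tau, r, c, var_tr, var_tc, var_err):
    """Noncentrality parameter from the general definition (C1)."""
    t = len(tau)
    ems_under_h0 = c * var_tr + r * var_tc + var_err
    return (t - 1) * ms_t_at_expected_values(mu, tau, r, c) / ems_under_h0


def lambda_c2(tau, r, c, var_tr, var_tc, var_err):
    """Noncentrality parameter from the closed form (C2)."""
    return r * c * sum(x * x for x in tau) / (c * var_tr + r * var_tc + var_err)


# Hypothetical values; the tau_i are chosen to sum to zero (assumed constraint).
args = dict(tau=[0.02, -0.02], r=5, c=100, var_tr=0.0002, var_tc=0.01, var_err=0.05)
lam_from_c1 = lambda_c1(mu=0.8, **args)
lam_from_c2 = lambda_c2(**args)
```

If the τ_i did not sum to zero, the direct computation would return the sum of squared deviations of the τ_i about their mean, and the two values would differ.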

C.2. Step 4b: Express λ in terms of mm-ANOVA parameters

Replace variance components in (C2) corresponding to random effects involving case by mm-ANOVA covariances. From the relationships determined in step 1c and presented in Table 3 we have

r σ_{T C}^{2} + σ^{2} = c [σ_{\tilde{ε}}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})]

(Recall that $σ_{\tilde{ε}}^{2}$ is the error variance for the mm-ANOVA model.) Thus in terms of mm-ANOVA parameters

λ = \frac{r \sum_{i = 1}^{t} τ_{i}^{2}}{σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})}

(C3)
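The step from (C2) to (C3) amounts to substituting the displayed relationship and dividing numerator and denominator by c. The following sketch, offered only as an illustration, confirms this numerically. It uses just the two relationships quoted in this appendix, namely rσ²_TC + σ² = c[σ²_ε̃ − Cov₁ + (r − 1)(Cov₂ − Cov₃)] and σ² = c(σ²_ε̃ − Cov₁ − Cov₂ + Cov₃); solving them gives Cov₂ − Cov₃ = σ²_TC / c and σ²_ε̃ − Cov₁ = (σ² + σ²_TC) / c. All function and variable names are hypothetical.

```python
def mm_anova_from_conventional(c, var_tc, var_err):
    """Solve the two quoted relationships for the mm-ANOVA quantities.

    Returns (sigma_eps_tilde^2 - Cov1, Cov2 - Cov3).
    """
    cov2_minus_cov3 = var_tc / c
    err_minus_cov1 = (var_err + var_tc) / c
    return err_minus_cov1, cov2_minus_cov3


def lambda_c2(tau, r, c, var_tr, var_tc, var_err):
    """Noncentrality parameter in conventional ANOVA terms, (C2)."""
    return r * c * sum(x * x for x in tau) / (c * var_tr + r * var_tc + var_err)


def lambda_c3(tau, r, var_tr, err_minus_cov1, cov2_minus_cov3):
    """Noncentrality parameter in mm-ANOVA terms, (C3)."""
    denom = var_tr + err_minus_cov1 + (r - 1) * cov2_minus_cov3
    return r * sum(x * x for x in tau) / denom


# Hypothetical parameter values.
tau, r, c = [0.02, -0.02], 5, 100
var_tr, var_tc, var_err = 0.0002, 0.01, 0.05
d1, d23 = mm_anova_from_conventional(c, var_tc, var_err)
lam_conventional = lambda_c2(tau, r, c, var_tr, var_tc, var_err)
lam_mm_anova = lambda_c3(tau, r, var_tr, d1, d23)
```

Note that (C3) does not involve c explicitly; the number of cases enters only through the mm-ANOVA error variance and covariances.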

C.3. Step 4c: Determine the denominator degrees of freedom in terms of mm-ANOVA parameters

Write the denominator of $F_{O R}^{*}$ from step 2h in the form $b (\sum_{i = 1}^{I} a_{i} {\tilde{M S}}_{i} + d)$ . The denominator degrees of freedom is given by

d f_{2} = \frac{{[\sum_{i = 1}^{I} a_{i} E ({\tilde{M S}}_{i}) + d]}^{2}}{\sum_{i = 1}^{I} {[a_{i} E ({\tilde{M S}}_{i})]}^{2} / d f ({\tilde{M S}}_{i})}

(C4)

which is the same as (A2). Note that (C4) contains the expected mean square values and the true value of d, in contrast to approximation (24), which replaces these values by sample estimates. The reason for this difference is that approximation (24) will be used for hypothesis testing and confidence intervals for a study data set; in contrast, (C4) will be used for sample-size and power estimation for a future study and will be based on parameter values that are either conjectured or estimated from pilot data.

Express the expected mean squares in (C4) in terms of mm-ANOVA model parameters by determining their expected values in terms of the conventional ANOVA parameters and then replacing variance components that involve case by mm-ANOVA covariances. For example, for the balanced test×reader×case factorial study design, the denominator of $F_{O R}^{*}$ from step 2h is given by $\sum_{i = 1}^{I} a_{i} {\tilde{M S}}_{i} + d = \tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})$ . From (17) and Tables 3–4 it follows that

E (\tilde{M S} (T * R)) = \frac{1}{c} E (M S (T * R)) = \frac{1}{c} (c σ_{T R}^{2} + σ^{2})

with $σ^{2} = c (σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3})$ . Thus

E (\tilde{M S} (T * R)) = σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3}

and hence, using (C4),

d f_{2} = \frac{{[σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} + (r - 1) ({Cov}_{2} - {Cov}_{3})]}^{2}}{\frac{{[σ_{T R}^{2} + σ_{\tilde{ε}}^{2} - {Cov}_{1} - {Cov}_{2} + {Cov}_{3}]}^{2}}{(t - 1) (r - 1)}}

(C5)

Hillis et al. [15] illustrate how these formulas can be used in practice to estimate power and sample size using pilot-data or conjectured parameter estimates.
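To make the relationship between (C4) and (C5) concrete, the sketch below implements the general formula (C4) for a denominator of the form ∑ a_i M̃S_i + d and checks that, for the factorial design, it reproduces (C5). It is a minimal illustration with hypothetical function names and parameter values, not code from the paper or from reference [15].

```python
def df2_general(coeffs, expected_ms, ms_dfs, d):
    """Denominator degrees of freedom from the general formula (C4).

    coeffs      : constants a_i
    expected_ms : expected mm-ANOVA mean squares E(MS_i)
    ms_dfs      : degrees of freedom of each mean square
    d           : the (true) covariance term added to the mean squares
    """
    numerator = (sum(a * e for a, e in zip(coeffs, expected_ms)) + d) ** 2
    denominator = sum((a * e) ** 2 / df
                      for a, e, df in zip(coeffs, expected_ms, ms_dfs))
    return numerator / denominator


def df2_factorial(t, r, var_tr, err_minus_cov1, cov2_minus_cov3):
    """Denominator degrees of freedom for the factorial design, (C5)."""
    ems_tr = var_tr + err_minus_cov1 - cov2_minus_cov3
    expected_denominator = var_tr + err_minus_cov1 + (r - 1) * cov2_minus_cov3
    return expected_denominator ** 2 / (ems_tr ** 2 / ((t - 1) * (r - 1)))


# Hypothetical parameter values for t = 2 tests and r = 5 readers.
t, r = 2, 5
var_tr, err_minus_cov1, cov2_minus_cov3 = 0.0002, 0.0006, 0.0001

# Factorial design: one mean square, MS(T*R), with a_1 = 1 and
# d = r (Cov2 - Cov3).
ems_tr = var_tr + err_minus_cov1 - cov2_minus_cov3
df2_from_c4 = df2_general([1.0], [ems_tr], [(t - 1) * (r - 1)], r * cov2_minus_cov3)
df2_from_c5 = df2_factorial(t, r, var_tr, err_minus_cov1, cov2_minus_cov3)
```

Keeping the general function separate from the design-specific one mirrors the algorithm: other balanced designs change only the list of mean squares, their coefficients and d.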

C.4. Step 4d: General non-null distribution result

An approximation for the non-null distribution of F_OR is given by

F_{d f_{1}, d f_{2}; λ}

where λ is given in step 4b, df₁ is the degrees of freedom for the numerator mean square from the conventional ANOVA F statistic given in step 2d, and df₂ is given by (C4), expressed in terms of the mm-ANOVA parameters. Thus for the balanced test×reader×case factorial study design, λ is given by (C3), df₁ = t − 1, and df₂ is given by (C5).
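One way to see how this approximation translates into a power estimate is by simulation. The sketch below is an illustration under stated assumptions rather than the procedure used in the paper: it draws from the approximating noncentral F distribution by Monte Carlo, using only the Python standard library, instead of evaluating a noncentral F distribution function from a statistics library. The function names, simulation size and seed are hypothetical, and df₁ is assumed to be a positive integer.

```python
import math
import random


def simulate_f(df1, df2, lam, rng):
    """One draw from the approximating noncentral F with parameters (df1, df2; lam)."""
    # Noncentral chi-square: shift one of the df1 standard normals by sqrt(lam).
    u = (rng.gauss(0.0, 1.0) + math.sqrt(lam)) ** 2
    u += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df1 - 1))
    # Central chi-square with (possibly non-integer) df2 is Gamma(df2 / 2, scale 2).
    w = rng.gammavariate(df2 / 2.0, 2.0)
    return (u / df1) / (w / df2)


def monte_carlo_power(df1, df2, lam, alpha=0.05, n_sim=20000, seed=1):
    """Monte Carlo estimate of P(F > central-F critical value) under lam."""
    rng = random.Random(seed)
    # Empirical (1 - alpha) quantile of the central F serves as the critical value.
    null_draws = sorted(simulate_f(df1, df2, 0.0, rng) for _ in range(n_sim))
    critical_value = null_draws[int(math.ceil((1.0 - alpha) * n_sim)) - 1]
    rejections = sum(simulate_f(df1, df2, lam, rng) > critical_value
                     for _ in range(n_sim))
    return rejections / n_sim


# Hypothetical inputs: t = 2 tests (df1 = 1) with conjectured df2 and lambda.
estimated_power = monte_carlo_power(df1=1, df2=576.0 / 49.0, lam=10.0 / 3.0)
```

The estimate carries Monte Carlo error of order 1/√n_sim, so an exact noncentral F calculation is preferable whenever a suitable distribution function is available.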

C.5. Justification of steps 4a–d

The non-null distribution result given in step 4d can be derived for the test×reader×case study design along the same lines as the derivation of the null distribution result given in Section A. One difference is that $\tilde{M S} (T)$ , the numerator mean square in F_OR (22), has a noncentral chi-square distribution when appropriately normalized under H₁. The distribution for MS(T) is given by

(t - 1) \frac{M S (T)}{E [M S (T) | H_{0}]} \sim χ_{t - 1; λ}^{2}

where λ is given by (C3). Because $\tilde{M S} (T) = \frac{1}{c} M S (T)$ , it follows that

(t - 1) \frac{\tilde{M S} (T)}{E [\tilde{M S} (T) | H_{0}]} \sim χ_{t - 1; λ}^{2}

Using the Section A approach but with this one difference, we can show that

F_{O R}^{*} = \frac{\tilde{M S} (T)}{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})} = \frac{\frac{\tilde{M S} (T)}{E [\tilde{M S} (T) | H_{0}]}}{\frac{\tilde{M S} (T * R) + r ({Cov}_{2} - {Cov}_{3})}{E [\tilde{M S} (T * R)] + r ({Cov}_{2} - {Cov}_{3})}} = \frac{U / (t - 1)}{W / d f_{2}}

where $U \sim χ_{t - 1; λ}^{2}$ , W is approximately $χ_{d f_{2}}^{2}$ with df₂ given by (C5), and U and W are independent. Thus $F_{O R}^{*}$ has an approximate F_{(t − 1),df₂;λ} distribution. Because F_OR (22) approximates $F_{O R}^{*}$ (21), it is reasonable to approximate the non-null distribution of F_OR by F_{(t − 1),df₂;λ}.

References

1. Obuchowski NA, Rockette HE. Hypothesis testing of the diagnostic accuracy for multiple diagnostic tests: an ANOVA approach with dependent observations. Communications in Statistics: Simulation and Computation. 1995;24:285–308.
2. Obuchowski NA. Multi-reader multi-modality ROC studies: hypothesis testing and sample size estimation using an ANOVA approach with dependent observations. With rejoinder. Academic Radiology. 1995;2(Suppl 1):S22–S29.
3. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Investigative Radiology. 1992;27:723–731.
4. Dorfman DD, Berbaum KS, Lenth RV, Chen YF, Donaghy BA. Monte Carlo validation of a multireader method for receiver operating characteristic discrete rating data: factorial experimental design. Academic Radiology. 1998;5:591–602. doi: 10.1016/s1076-6332(98)80294-8.
5. Hillis SL, Obuchowski NA, Schartz KM, Berbaum KS. A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette Methods for receiver operating characteristic (ROC) data. Statistics in Medicine. 2005;24:1579–1607. doi: 10.1002/sim.2024.
6. Hillis SL. A comparison of denominator degrees of freedom methods for multiple observer ROC analysis. Statistics in Medicine. 2007;26:596–619. doi: 10.1002/sim.2532.
7. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–844.
8. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747.
9. Kundel HL, Gefter W, Aronchick J, Miller W, Hatabu H, Whitfill CH. Accuracy of bedside chest hard-copy screen-film versus hard-and soft-copy computed radiographs in a medical intensive care unit: receiver operating characteristic analysis. Radiology. 1997;205:859–863. doi: 10.1148/radiology.205.3.9393548.
10. Pavur R, Nath R. Exact F tests in an ANOVA procedure for dependent observations. Multivariate Behavioral Research. 1984;19:408–420. doi: 10.1207/s15327906mbr1904_3.
11. Searle SR. Linear Models. New York: Wiley; 1971. pp. 55–59.
12. Obuchowski NA. Multireader, multimodality receiver operating characteristic curve studies: hypothesis testing and sample size estimation using an analysis of variance approach with dependent observations. Academic Radiology. 1995;2(Suppl 1):S22–S29.
13. Obuchowski NA. Reducing the number of reader interpretations in MRMC studies. Academic Radiology. 2009;16:209–217. doi: 10.1016/j.acra.2008.05.014.
14. Obuchowski NA, Gallas BD, Hillis SL. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods. Academic Radiology. 2012;19:1508–1517. doi: 10.1016/j.acra.2012.09.012.
15. Hillis SL, Obuchowski NA, Berbaum KS. Power estimation for multireader ROC methods: An updated and unified approach. Academic Radiology. 2011;18:129–142. doi: 10.1016/j.acra.2010.09.007.
16. Obuchowski NA, Lieber ML, Powell KA. Data analysis for detection and localization of multiple abnormalities with application to mammography. Academic Radiology. 2000;7:516–525. doi: 10.1016/s1076-6332(00)80324-4.
17. Chakraborty DP, Berbaum KS. Observer studies involving detection and localization: Modeling, analysis, and validation. Medical Physics. 2004;31:2313–2330. doi: 10.1118/1.1769352.
18. Bunch PC, Hamilton JF, Sanderson GK, Simmons AH. Free-response approach to the measurement and characterization of radiographic-observer performance. Journal of Applied Photographic Engineering. 1978;4:166–171.
19. Pavur RJ, Lewis TO. Unbiased F-tests for factorial-experiments for correlated data. Communications in Statistics-Theory and Methods. 1983;12:829–840.
20. Gallas BD. One-shot estimate of MRMC variance: AUC. Academic Radiology. 2006;13:353–362. doi: 10.1016/j.acra.2005.11.030.
21. Gallas BD, Pennelo GA, Myers KJ. Multireader multicase variance analysis for binary data. JOSA A. 2007;24:B70–B80. doi: 10.1364/josaa.24.000b70.
22. Gallas BD, Bandos A, Samuelson FW, Wagner RF. A framework for random-effects ROC analysis: biases with the bootstrap and other variance estimators. Communications in Statistics-Theory and Methods. 2009;38:2586–2603.
23. Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 1948;19:293–325.
24. Song X, Zhou XH. A marginal model approach for analysis of multi-reader multi-test receiver operating characteristic (ROC) data. Biostatistics. 2005;6:303–312. doi: 10.1093/biostatistics/kxi011.
25. Satterthwaite FE. Synthesis of variance. Psychometrika. 1941;6:309–316.
26. Satterthwaite FE. An approximate distribution of estimates of variance components. Biometric Bulletin. 1946;2:110–114.

Supplementary Materials

Web-based Supporting Materials

NIHMS733736-supplement-Web-based_Supporting_Materials.pdf (143.9 KB, PDF)
