Abstract
A common study design for comparing the performances of diagnostic imaging tests is to obtain ratings from multiple readers of multiple cases whose true statuses are known. Typically, there is overlap between the tests, readers, and/or cases, and special analytical methods are needed to perform statistical comparisons. We present our new MATLAB MRMCaov toolbox, which is designed for multi-reader multi-case comparisons of two or more diagnostic tests. The toolbox allows for statistical comparison of reader performance metrics, such as area under the receiver operating characteristic curve (ROC AUC), with analysis of variance methods originally proposed by Obuchowski and Rockette (1995) and later unified and improved by Hillis and colleagues (2005, 2007, 2008, 2018). MRMCaov is open-source software with an integrated command-line interface for performing multi-reader multi-case statistical analysis, plotting, and presenting results. Its features include (1) ROC AUC, likelihood ratios of positive or negative ratings, sensitivity, specificity, and expected utility reader performance metrics; (2) reader-specific ROC curves; (3) user-definable performance metrics; (4) test-specific estimates of mean performance along with confidence intervals and p-values for statistical comparisons; (5) support for factorial, nested, or partially paired study designs; (6) inference for random or fixed readers and cases; (7) DeLong, jackknife, or unbiased covariance estimation; and (8) compatibility with Microsoft Windows, Mac OS, and Linux.
Keywords: multi-reader multi-case, ANOVA, ROC analysis, diagnostic radiology, software
1. INTRODUCTION
A common study design for comparing the diagnostic performance of imaging modalities, or diagnostic tests, is to obtain modality-specific ratings from multiple readers of multiple cases (MRMC) whose true statuses are known. In such a design, receiver operating characteristic (ROC) metrics, such as area under the ROC curve (ROC AUC), can be used to quantify correspondence between reader ratings and case status. Metrics can then be compared statistically to determine if there are differences between modalities. Special statistical methods are needed when readers or cases represent a random sample from a larger population of interest and there is overlap in readers and/or cases across modalities. An ANOVA model designed for the characteristics of MRMC studies was initially proposed by Dorfman et al.1 and Obuchowski and Rockette2 and later unified and improved by Hillis and colleagues.3–6 Their models are implemented in the MRMCaov MATLAB toolbox.7
MRMCaov performs multi-reader multi-case analysis of variance for the comparison of reader performance across imaging modalities. This software is the first MATLAB implementation of the Hillis unified methodology and builds upon his OR-DBM MRMC SAS software.8 It is designed to be user-friendly, integrate with the MATLAB programming and graphics environment, and offer new features and methodologies. Current features of the toolbox are summarized below. Usage of the software is illustrated with a medical imaging example in the subsequent sections.
MRMCaov provides both graphical and tabular analysis results, including reader-specific ROC curves and AUC estimates, modality-specific estimates, confidence intervals, and p-values for statistical comparisons. The toolbox includes a new method for unbiased covariance estimation as well as other features not collectively available in any other existing MATLAB toolbox.
MRMCaov MATLAB toolbox features.
Empirical ROC curves.
Reader-specific ROC curves and performance metrics.
Performance metric functions to compute area under the ROC curve, expected utility, sensitivity for a specified specificity, and specificity for a specified sensitivity.
User-definable performance metrics.
Modality-specific estimates of mean performance along with confidence intervals and p-values for statistical comparisons.
Comparison of two or more modalities.
Support for factorial, nested, and partially paired study designs.
Inference for random readers and cases, random readers and fixed cases, or fixed readers and random cases.
DeLong, jackknife, or unbiased covariance estimation.
Compatibility with Microsoft Windows, Mac OS, and Linux.
1.1. Installation
The MATLAB toolbox is currently available for download from https://github.com/brian-j-smith/MRMCaov.m. Installation instructions are provided at the download site. Once installed, the toolbox may be used in the MATLAB desktop environment.9
1.2. Data
Input data for MRMCaov analysis should be given as MATLAB vectors for reader, test, and case identifiers as well as true event statuses and reader ratings. A table of example data, named VanDyke, is provided with the toolbox and displayed in example 1.1. The data come from a study in which the relative performance of cinematic presentation of MRI (1 = CINE MRI) was compared to single spin-echo magnetic resonance imaging (2 = SE MRI) for the detection of thoracic aortic dissection.10 Forty-five patients with aortic dissection and 69 without dissection were imaged with both modalities. Based on the images, five radiologists rated patient disease status as 1 = definitely no aortic dissection, 2 = probably no aortic dissection, 3 = unsure about aortic dissection, 4 = probably aortic dissection, or 5 = definitely aortic dissection. Interest lies in estimating ROC curves for each combination of reader and modality and in comparing modalities with respect to summary statistics from the curves.
Descriptions of the table variables are as follows.
reader: unique identifiers for the five radiologists.
treatment: identifiers for the imaging modality (1 = CINE MRI, 2 = SE MRI).
case: identifiers for the 114 cases.
truth: indicator for thoracic aortic dissection (1 = dissection, 0 = no dissection).
rating: five-point ratings given to case images by the readers.
case2: example identifiers representing nesting of cases within readers.
case3: example identifiers representing nesting of cases within treatments.
MATLAB Example 1.1: First 10 observations in VanDyke table.
>> load VanDyke.mat
>> head(VanDyke, 10)

    reader    treatment    case    truth    rating    case2    case3
    ______    _________    ____    _____    ______    _____    _____
      1           1          1       0        1        1.1      1.1
      1           2          1       0        3        1.1      2.1
      2           1          1       0        2        2.1      1.1
      2           2          1       0        3        2.1      2.1
      3           1          1       0        2        3.1      1.1
      3           2          1       0        2        3.1      2.1
      4           1          1       0        1        4.1      1.1
      4           2          1       0        2        4.1      2.1
      5           1          1       0        3        5.1      1.1
      5           2          1       0        2        5.1      2.1
2. READER PERFORMANCE METRICS
In an MRMCaov analysis, true case statuses, reader ratings, and a reader performance metric quantifying correspondence between them are specified with objects based on the PerformanceVariate class. Several PerformanceVariate classes are provided and include area under an ROC curve (ROCAUCVariate), expected utility of an ROC curve (ROCEUVariate), specificity (SpecificityVariate), sensitivity (SensitivityVariate), and likelihood ratio of a positive (ROCLRposVariate) or a negative (ROCLRnegVariate) rating. In addition, users may create their own PerformanceVariate classes to define other performance metrics to analyze. Functions that create PerformanceVariate objects take column vectors of true case statuses (truth) and reader ratings (rating) as the first two arguments. Performance metrics measure the degree to which higher case ratings are associated with positive case statuses, where positive status is taken to be the highest level of values in the truth vector.
PerformanceVariate class.
Syntax
obj = PerformanceVariate(truth, rating, metric)
Description
Returns a PerformanceVariate class object that contains true case statuses and reader ratings with which to compute a reader performance metric for multi-reader, multi-case analysis. This function is available for users who wish to define their own performance metrics.
Input Arguments
truth: column array of true binary statuses.
rating: column array of numeric ratings.
metric: handle to a function defined with arguments truth and rating to compute a performance metric on them.
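Conceptually, the metric argument is just a function of truth and rating that returns a scalar. The idea of a user-defined metric can be sketched language-agnostically (shown here in Python rather than MATLAB; youden_j and its threshold argument are hypothetical names, not part of the toolbox):

```python
def youden_j(truth, rating, threshold=0.5):
    """Hypothetical custom metric: Youden's J = sensitivity + specificity - 1
    computed at a fixed rating threshold."""
    pos = [r for t, r in zip(truth, rating) if t == 1]
    neg = [r for t, r in zip(truth, rating) if t == 0]
    sens = sum(r >= threshold for r in pos) / len(pos)
    spec = sum(r < threshold for r in neg) / len(neg)
    return sens + spec - 1

# Example: two positives rated above threshold, one of two negatives below
youden_j([0, 0, 1, 1], [0.1, 0.6, 0.7, 0.9])  # 1.0 + 0.5 - 1 = 0.5
```

Any scalar-valued function with this (truth, rating) signature could, in principle, serve as the performance metric passed to PerformanceVariate.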
2.1. Area Under the ROC Curve
Area under the ROC curve (ROC AUC) is a measure of concordance between numerical reader ratings and true case statuses. It provides an estimate of the probability that a randomly selected positive case will have a higher rating than a negative case. ROC AUC values range from 0 to 1, with 0.5 representing chance-level concordance and 1 perfect concordance. ROC curves in the ROCAUCVariate class are estimated empirically as described by Pepe.11 ROCAUCVariate also has name-value arguments (options) to allow for calculation of partial area under the curve over a range of sensitivities or specificities.
ROCAUCVariate class.
Syntax
obj = ROCAUCVariate(truth, rating, options)
Description
Returns an ROCAUCVariate class object that inherits from PerformanceVariate and defines area under the ROC curve as the reader performance metric for MRMC analysis.
Input Arguments
truth, rating: see PerformanceVariate.
Name-Value Arguments
partial: character vector specifying whether to compute area under the entire curve (''), over a range of sensitivities ('sensitivity'), or over a range of specificities ('specificity') [default: ''].
min: numeric value from 0 to 1 for the minimum sensitivity or specificity if computing partial area [default: 0].
max: numeric value from 0 to 1 for the maximum sensitivity or specificity if computing partial area [default: 1].
normalize: logical indicating whether to divide partial area by max - min [default: false].
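For intuition, the empirical AUC is equivalent to the Mann-Whitney concordance statistic: the proportion of positive-negative case pairs in which the positive case receives the higher rating, with ties counted as one half. A minimal sketch (Python for illustration only; this is not the toolbox's implementation):

```python
def empirical_auc(truth, rating):
    """Empirical ROC AUC: probability that a randomly chosen positive case
    outrates a randomly chosen negative case, counting ties as 1/2."""
    pos = [r for t, r in zip(truth, rating) if t == 1]
    neg = [r for t, r in zip(truth, rating) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Example: one tied pair among the four positive-negative pairs
empirical_auc([1, 1, 0, 0], [3, 2, 2, 1])  # (1 + 1 + 0.5 + 1) / 4 = 0.875
```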
2.2. Expected Utility of the ROC Curve
As an alternative to AUC as a summary of ROC curves, Abbey et al.12 propose an expected utility metric defined as

EU = max over the ROC curve of [ TPR(FPR) − β × FPR ],

where TPR(FPR) are true positive rates on the ROC curve and FPR are false positive rates ranging from 0 to 1. This expected utility can be viewed as a generalization of Youden's J statistic,13 which Perkins and Schisterman14 describe as having an optimal weighting factor of β = (1 − p)/(r × p), where p is the population prevalence of positive cases and r is the cost associated with a false negative classification relative to a false positive one.
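Under this max-over-operating-points form, the metric can be sketched over an empirical ROC curve as follows (Python for illustration only; sweeping thresholds over the observed rating values is an assumption about the empirical curve, not toolbox code):

```python
def expected_utility(truth, rating, slope=1.0):
    """Expected utility: maximum of TPR - slope * FPR over the empirical
    ROC operating points (a generalized Youden index)."""
    pos = [r for t, r in zip(truth, rating) if t == 1]
    neg = [r for t, r in zip(truth, rating) if t == 0]
    best = 0.0  # the (FPR, TPR) = (0, 0) corner of the ROC curve
    for c in sorted(set(rating)):
        tpr = sum(r >= c for r in pos) / len(pos)
        fpr = sum(r >= c for r in neg) / len(neg)
        best = max(best, tpr - slope * fpr)
    return best

# Perfectly separated ratings give the maximal utility of 1
expected_utility([0, 0, 1, 1], [1, 2, 3, 4])  # 1.0 at the threshold c = 3
```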
ROCEUVariate class.
Syntax
obj = ROCEUVariate(truth, rating, slope)
Description
Returns an ROCEUVariate class object that inherits from PerformanceVariate and defines expected utility of an ROC curve12 as the reader performance metric for MRMC analysis.
Input Arguments
truth, rating: see PerformanceVariate.
slope: numeric slope (β) value at which to compute expected utility [default: 1].
2.3. Sensitivity and Specificity
Sensitivity is the probability of a positive rating (T+) for a positive case (D+), and specificity the probability of a negative rating (T−) for a negative case (D−); i.e.,

Se = Pr(T+ | D+),    Sp = Pr(T− | D−).

These metrics are estimated from the subset of cases that are either positive or negative. For instance, sensitivity is calculated from data as the proportion of positive cases that have positive ratings. As such, sensitivity/specificity will not be estimable for reader-test combinations that do not have any positive/negative cases.
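With binary (0/1) ratings, these estimators are simple proportions. A sketch (Python for illustration; the error guards mirror the estimability caveat above):

```python
def sensitivity(truth, rating):
    """Proportion of positive cases (truth == 1) with positive ratings."""
    pos = [r for t, r in zip(truth, rating) if t == 1]
    if not pos:
        raise ValueError("sensitivity is not estimable without positive cases")
    return sum(pos) / len(pos)

def specificity(truth, rating):
    """Proportion of negative cases (truth == 0) with negative ratings."""
    neg = [r for t, r in zip(truth, rating) if t == 0]
    if not neg:
        raise ValueError("specificity is not estimable without negative cases")
    return sum(1 - r for r in neg) / len(neg)

# Example: 2 of 3 positives rated positive; 1 of 2 negatives rated negative
sensitivity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])  # 2/3
specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])  # 1/2
```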
SensitivityVariate and SpecificityVariate classes.
Syntax
obj = SensitivityVariate(truth, rating)
obj = SpecificityVariate(truth, rating)
Description
Return a SensitivityVariate or SpecificityVariate class object that inherits from PerformanceVariate and defines sensitivity or specificity as the reader performance metric for MRMC analysis.
Input Arguments
truth: see PerformanceVariate.
rating: column array of binary ratings.
2.4. Likelihood Ratios of Positive and Negative Ratings
The likelihood ratio of a positive/negative rating (LR+/LR−) is defined as the probability of a positive/negative rating (T+/T−) for a positive case (D+) relative to a negative case (D−). Mathematically, the ratios are

LR+ = Pr(T+ | D+) / Pr(T+ | D−) = Se / (1 − Sp),
LR− = Pr(T− | D+) / Pr(T− | D−) = (1 − Se) / Sp.

Estimability of these metrics requires both positive and negative cases as well as tests that are positive (LR+) or negative (LR−).
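Given sensitivity and specificity estimates, the ratios follow directly (an illustrative Python sketch; the infinite values flag the non-estimable situations noted above):

```python
def likelihood_ratios(sens, spec):
    """LR+ = Se / (1 - Sp) and LR- = (1 - Se) / Sp.
    A zero denominator (no positive or no negative test results among the
    relevant cases) makes the corresponding ratio non-estimable."""
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return lr_pos, lr_neg

# Example: Se = 0.9, Sp = 0.8 gives LR+ = 0.9/0.2 and LR- = 0.1/0.8
likelihood_ratios(0.9, 0.8)  # (4.5, 0.125)
```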
ROCLRposVariate and ROCLRnegVariate classes.
Syntax
obj = ROCLRposVariate(truth, rating)
obj = ROCLRnegVariate(truth, rating)
Description
Return an ROCLRposVariate or ROCLRnegVariate class object that inherits from PerformanceVariate and defines the likelihood ratio of a positive (LR+) or negative (LR−) rating as the reader performance metric for MRMC analysis. LR+ and LR− are computed as sensitivity / (1 − specificity) and (1 − sensitivity) / specificity, respectively.
Input Arguments
truth: see PerformanceVariate.
rating: column array of binary ratings.
3. MRMC ANALYSIS
3.1. Analysis Specification
The first step in conducting a multi-reader, multi-case analysis with the toolbox is a call to the function mrmc to specify the data inputs, performance metric of interest, and covariance estimation method. A summary of the function is given below.
Multi-reader multi-case analysis function.
Syntax
fit = mrmc(y, test, reader, id, options)
Description
Returns an MRMCFit class object containing data that can be used to estimate and compare reader performance metrics in a multi-reader, multi-case statistical analysis.
Input Arguments
y: PerformanceVariate object defining true case statuses, corresponding reader ratings, and a reader performance metric to compute on them.
test, reader, id: column arrays of grouping variables that identify the test modality, reader, and case for the observations in y. Each variable can be a categorical, numeric, character, or string array and must have the same number of observations as y.
By default, all grouping variables are treated as random effects in the analysis. One, but not both, of the reader or case variables may be designated as fixed by wrapping it as FixedVariate(reader) or FixedVariate(id).
Name-Value Argument
cov: character vector specifying the method of estimating within-reader rating covariances as 'DeLong', 'jackknife', or 'unbiased' [default: 'jackknife'].
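The 'jackknife' method estimates the needed covariances from delete-one-case replicates of the performance metric. A simplified generic sketch (Python for illustration; jackknife_cov and its arguments are hypothetical names, not the toolbox API):

```python
def jackknife_cov(metric, truth, rating_a, rating_b):
    """Jackknife covariance between a metric computed on two tests' ratings
    of the same cases: delete one case at a time, recompute the metric for
    each test, and scale the centered cross-products by (n - 1) / n."""
    n = len(truth)
    drop = lambda xs, k: xs[:k] + xs[k + 1:]
    a = [metric(drop(truth, k), drop(rating_a, k)) for k in range(n)]
    b = [metric(drop(truth, k), drop(rating_b, k)) for k in range(n)]
    ma, mb = sum(a) / n, sum(b) / n
    return (n - 1) / n * sum((x - ma) * (y - mb) for x, y in zip(a, b))
```

With rating_a equal to rating_b, this reduces to the jackknife variance of the metric; this works for any scalar metric, which is why the jackknife option is compatible with every performance metric.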
Study design is automatically inferred from the case identifiers. If the identifiers are the same across readers and tests, a factorial design is assumed. Cases are assumed to be nested within readers if their identifiers differ across readers, and to be nested within tests if their identifiers differ across tests. Examples of the latter two designs are given with the case2 and case3 variables in the VanDyke table. The case variable is coded for the full factorial design originally employed in the study. Methods for calculating covariances include 'DeLong',15 'jackknife',16 and 'unbiased'.6 The methods that can be used in an analysis depend on the specified performance metric: jackknife will work with any metric, whereas DeLong and unbiased can be used only with empirically estimated AUC. The mrmc function automatically checks for and informs users of an incompatible combination of covariance method and performance metric.
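One way such design inference could work is sketched below (a hypothetical Python helper, not the toolbox's implementation): a design is treated as nested within readers when every case identifier appears under exactly one reader, nested within tests when every identifier appears under exactly one test, and factorial otherwise.

```python
def infer_design(case, reader, test):
    """Classify the study design from case/reader/test identifier overlap."""
    readers_per_case, tests_per_case = {}, {}
    for c, r, t in zip(case, reader, test):
        readers_per_case.setdefault(c, set()).add(r)
        tests_per_case.setdefault(c, set()).add(t)
    if all(len(s) == 1 for s in readers_per_case.values()):
        return "cases nested within readers"
    if all(len(s) == 1 for s in tests_per_case.values()):
        return "cases nested within tests"
    return "factorial"

# Mirrors the VanDyke case2 coding: each reader rates its own cases
infer_design(["1.1", "1.1", "2.1", "2.1"], [1, 1, 2, 2], [1, 2, 1, 2])
# -> "cases nested within readers"
```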
In example 3.1, mrmc is called to specify an analysis in which the VanDyke imaging modalities are compared with respect to area under the empirically estimated ROC curve, with unbiased covariance estimation. Fit information is returned as an MRMCFit class object containing the fields summarized in example 3.2. Results of the function call are saved to a new variable fit. As can be seen from the displayed results, the function calculates and returns several quantities needed for statistical comparison of the imaging modalities, including areas under the ROC curves, analysis of variance sums of squares, and estimated error variance and covariances. Covariances are provided for metrics from different tests used by the same reader (Cov1), different readers using the same test (Cov2), and different readers using different tests (Cov3). Plots of the mrmc fit are generated in example 3.3 and show the ROC curves for each reader and test.
MATLAB Example 3.1: mrmc function call.
>> y = ROCAUCVariate(VanDyke.truth, VanDyke.rating);
>> fit = mrmc(y, VanDyke.treatment, VanDyke.reader, VanDyke.case, 'cov', 'unbiased');
>> disp(fit)

ROCAUCVariate ANOVA data:

    reader    test       y        N
    ______    ____    _______    ___
      1        1      0.91965    114
      1        2      0.94783    114
      2        1      0.85878    114
      2        2      0.90531    114
      3        1      0.90386    114
      3        2      0.92174    114
      4        1      0.97311    114
      4        2      0.99936    114
      5        1      0.82979    114
      5        2      0.92995    114

ANOVA Table:

    Source         d.f.     Sum Sq.      Mean Sq.
    reader          4      0.015345     0.0038
    test            1      0.0047962    0.0048
    reader*test     4      0.0022041    5.5103e-04

Obuchowski-Rockette error variance and covariance estimates:

             Estimate      Correlation
    Error    0.00078839        NaN
    Cov1     0.00034167      0.43338
    Cov2     0.00033906      0.43007
    Cov3     0.00023561      0.29885
MATLAB Example 3.2: MRMCFit fields.
>> fields(fit)

7×1 cell array
    {'y'       }
    {'factors' }
    {'design'  }
    {'data'    }
    {'anova'   }
    {'cov'     }
    {'testfits'}
Descriptions
y: PerformanceVariate class object defining the response variable for the analysis.
factors: table of the test, reader, and case grouping levels for each rating.
design: structure containing information about the study design.
data: table of performance metrics, reader levels, and test levels for the ANOVA model.
anova: structure of results from the ANOVA.
cov: matrix of covariances between metrics by reader and test.
testfits: array of MRMCTestFit class objects containing results from test-specific ANOVAs.
MATLAB Example 3.3: Plot of mrmc ROC AUC fit.
>> plot(fit)
By default, readers and cases are assumed to be random factors in the MRMC analysis. As illustrated in example 3.4, either one of the factors may be designated as fixed in calls to mrmc with the syntax FixedVariate(<variable name>), where <variable name> is the name of the corresponding reader or case variable.
MATLAB Example 3.4: Designation of fixed readers or cases.
% Fixed readers
mrmc(y, VanDyke.treatment, FixedVariate(VanDyke.reader), VanDyke.case, ...
     'cov', 'unbiased');

% Fixed cases
mrmc(y, VanDyke.treatment, VanDyke.reader, FixedVariate(VanDyke.case), ...
     'cov', 'unbiased');
3.2. Statistical Analysis Summary
Statistical comparisons of treatment modalities are obtained by calling the summary function with output returned by mrmc. The summary call produces ANOVA results for a global test of equality of ROC AUC means across all treatment modalities and tests of pairwise differences, along with confidence intervals for the differences and intervals for the individual modalities. Study design information and analysis results are available individually in the fields of the MRMCSummary class object returned by summary (see example 3.6).
MRMC statistical analysis summary function.
Syntax
res = summary(obj, options)
Description
Returns an MRMCSummary class object of statistical results from a multi-reader, multi-case analysis.
Input Arguments
obj: MRMCFit object from mrmc().
Name-Value Arguments
alpha: numeric value between 0 and 1 specifying the significance level α; confidence intervals are constructed at the 100 × (1 − α)% level [default: 0.05].
In the summary given in example 3.5 for the present analysis, mean ROC AUC does not differ significantly between the two imaging modalities at the 5% level (p = 0.0512). Estimated means are 0.90 (95% CI: 0.83, 0.97) for CINE MRI and 0.94 (95% CI: 0.89, 0.99) for SE MRI.
MATLAB Example 3.5: MRMC statistical analysis summary (factorial design).
>> res = summary(fit);
>> disp(res)

Multi-Reader Multi-Case Analysis of Variance

Experimental design: factorial
Factor types: random readers and random cases
Response: ROCAUCVariate
Covariance method: unbiased
Confidence interval level: 95%

Obuchowski-Rockette variance component and covariance estimates:

                    Estimate      Correlation
    reader         0.0015365          NaN
    reader*test    0.00020776         NaN
    Error          0.00078839         NaN
    Cov1           0.00034167       0.43338
    Cov2           0.00033906       0.43007
    Cov3           0.00023561       0.29885

ANOVA global statistical test of equal tests:

      MS(T)       MS(T:R)       Cov2          Cov3       Denominator      F       df1     df2      p-value
    0.0047962    0.00055103    0.00033906    0.00023561    0.0010683    4.4896     1     15.034    0.051162

Pairwise test differences:

    Comparison    Estimate     StdErr       df               CI                 t        p-value
     "1 - 2"      -0.0438     0.020672    15.034    (-0.087852, 0.0002513)    -2.1189    0.051162

Test means based only on the data for each one:

    Test    Estimate      MS(R)        Cov2        StdErr       df            CI
     1      0.89704     0.0030826    0.00047718    0.033071    12.588    (0.82535, 0.96872)
     2      0.94084     0.0013046    0.00020095    0.021491    12.534    (0.89423, 0.98744)
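Under the Hillis procedure, the global F statistic reported in example 3.5 can be reproduced by hand from the Obuchowski-Rockette quantities in its tables. A numerical check (Python for illustration only, not toolbox code):

```python
# Quantities reported by summary() for the factorial VanDyke analysis
ms_t, ms_tr = 0.0047962, 0.00055103  # MS(T) and MS(T:R)
cov2, cov3 = 0.00033906, 0.00023561  # covariance estimates
t, r = 2, 5                          # numbers of tests and readers

# Denominator of the F statistic: MS(T:R) + r * max(Cov2 - Cov3, 0)
denom = ms_tr + r * max(cov2 - cov3, 0.0)

# Global test of equal tests: F = MS(T) / denominator, with df1 = t - 1
F = ms_t / denom

# Hillis denominator degrees of freedom:
# df2 = denom^2 / (MS(T:R)^2 / ((t - 1) * (r - 1)))
df2 = denom ** 2 / (ms_tr ** 2 / ((t - 1) * (r - 1)))

print(denom, F, df2)  # approx. 0.0010683, 4.4896, 15.034
```

These values match the Denominator, F, and df2 columns of the global test table above.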
MATLAB Example 3.6: MRMCSummary fields.
>> fields(res)

9×1 cell array
    {'response'         }
    {'design'           }
    {'alpha'            }
    {'vcov'             }
    {'test_equality'    }
    {'test_means'       }
    {'test_diffs'       }
    {'reader_test_diffs'}
    {'reader_means'     }
Descriptions
response: class of the response variable used in the analysis.
design: structure containing information about the study design.
alpha: significance level used to construct confidence intervals.
vcov: table of variances for main effects and covariances between metrics for the following combinations: 1) same reader/different test, 2) different reader/same test, and 3) different reader/different test.
test_equality: table of the global test of equal tests.
test_means: table of test means.
test_diffs: table of pairwise test differences.
reader_test_diffs: table of reader-specific pairwise test differences.
reader_means: table of reader-specific test means.
Finally, examples 3.7 and 3.8 illustrate summary statistical results for hypothetical study designs in which cases are nested within readers and cases are nested within tests, respectively.
MATLAB Example 3.7: MRMC statistical analysis summary (cases nested within readers).
>> fit2 = mrmc(y, VanDyke.treatment, VanDyke.reader, VanDyke.case2, 'cov', 'unbiased');
>> summary(fit2)

Multi-Reader Multi-Case Analysis of Variance

Experimental design: cases nested within readers
Factor types: random readers and random cases
Response: ROCAUCVariate
Covariance method: unbiased
Confidence interval level: 95%

Obuchowski-Rockette variance component and covariance estimates:

                    Estimate      Correlation
    reader         0.0015743          NaN
    reader*test    0.00046223         NaN
    Error          0.00015708         NaN
    Cov1           6.828e-05        0.43468
    Cov2           0                0
    Cov3           0                0

ANOVA global statistical test of equal tests:

      MS(T)       MS(T:R)      Cov2    Cov3    Denominator      F      df1    df2    p-value
    0.0047962    0.00055103     0       0      0.00055103     8.704     1      4     0.041959

Pairwise test differences:

    Comparison    Estimate     StdErr     df              CI                 t        p-value
     "1 - 2"      -0.0438     0.014846     4     (-0.08502, -0.0025804)    -2.9503    0.041959

Test means based only on the data for each one:

    Test    Estimate      MS(R)       Cov2    StdErr      df           CI
     1      0.89704     0.0030826     NaN     0.02483      4     (0.8281, 0.96598)
     2      0.94084     0.0013046     NaN     0.016153     4     (0.89599, 0.98569)
MATLAB Example 3.8: MRMC statistical analysis summary (cases nested within tests).
>> fit3 = mrmc(y, VanDyke.treatment, VanDyke.reader, VanDyke.case3, 'cov', 'unbiased');
>> summary(fit3)

Multi-Reader Multi-Case Analysis of Variance

Experimental design: cases nested within tests
Factor types: random readers and random cases
Response: ROCAUCVariate
Covariance method: unbiased
Confidence interval level: 95%

Obuchowski-Rockette variance component and covariance estimates:

                    Estimate      Correlation
    reader         0.0016426          NaN
    reader*test    0.00032719         NaN
    Error          0.00039326         NaN
    Cov1           0                0
    Cov2           0.00016942       0.4308
    Cov3           0                0

ANOVA global statistical test of equal tests:

      MS(T)       MS(T:R)       Cov2       Cov3    Denominator      F       df1     df2      p-value
    0.0047962    0.00055103    0.00016942    0      0.0013981     3.4305     1     25.751    0.075502

Pairwise test differences:

    Comparison    Estimate     StdErr       df               CI                t        p-value
     "1 - 2"      -0.0438     0.023648    25.751    (-0.092433, 0.0048325)   -1.8521    0.075502

Test means based only on the data for each one:

    Test    Estimate      MS(R)        Cov2        StdErr       df            CI
     1      0.89704     0.0030826    0.0002385     0.029241    7.6934    (0.82914, 0.96494)
     2      0.94084     0.0013046    0.00010033    0.019007    7.6677    (0.89668, 0.985)
4. CONCLUSIONS
MRMCaov brings a new statistical toolbox for MRMC analysis to the MATLAB software environment and its large community of users. The toolbox enables comparison of imaging modalities with an interactive interface and flexible options for performance metrics, study designs, covariance estimation methods, and statistical estimation and testing of performance. A demonstration of features currently implemented in the toolbox is provided in this paper. Proper statistical methods and readily available software are crucial for the evaluation and comparison of multi-reader multi-case studies of imaging modalities. The MRMCaov software is designed to help ensure the application of such methods.
ACKNOWLEDGMENTS
This research was supported by the National Institutes of Health grant R01 EB025174.
REFERENCES
- [1].Dorfman DD, Berbaum KS, and Metz CE, “Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method.,” Investigative Radiology 27, 723–731 (1992). [PubMed] [Google Scholar]
- [2].Obuchowski NA and Rockette HE, “Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations,” Communications in Statistics–Simulation and Computation 24, 285–308 (1995). [Google Scholar]
- [3].Hillis SL, Obuchowski NA, Schartz KM, and Berbaum KS, “A comparison of the DorfmanBerbaum-Metz and Obuchowski-Rockette methods for receiver operating characteristic (ROC) data,” Statistics in Medicine 24, 1579–1607 (2005). [DOI] [PubMed] [Google Scholar]
- [4].Hillis SL, “A comparison of denominator degrees of freedom methods for multiple observer ROC analysis,” Statistics in Medicine 26, 596–619 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Hillis SL, Berbaum KS, and Metz CE, “Recent developments in the Dorfman-Berbaum-Metz procedure for multireader ROC study analysis,” Academic Radiology 15, 647–661 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Hillis SL, “Relationship between Roe and Metz simulation model for multireader diagnostic data and Obuchowski-Rockette model parameters,” Statistics in Medicine 37, 2067–2093 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Smith BJ, Hillis SL, and Pesce LL, MRMCaov: Multi-Reader Multi-Case Analysis of Variance (2022). MATLAB toolbox version 0.2.
- [8].Schartz KM, Hillis SL, Pesce LL, Berbaum KS, and Metz CE, OR-DBM MRMC (2019). version 2.52.
- [9].MATLAB, [Version 9.11.0 (R2021b)], The MathWorks Inc., Natick, Massachusetts: (2021). [Google Scholar]
- [10].VanDyke CW, White RD, Obuchowski NA, Geisinger MA, Lorig RJ, and Meziane MA, “Cine MRI in the diagnosis of thoracic aortic dissection,” 79th Radiological Society of North America Meetings (1993). [Google Scholar]
- [11].Pepe MS, [The Statistical Evaluation of Medical Tests for Classification and Prediction], Oxford University Press, New York: (2003). [Google Scholar]
- [12].Abbey CK, Samuelson FW, and Gallas BD, “Statistical power considerations for a utility endpoint in observer performance studies,” Academic Radiology 207, 798–806 (2013). [DOI] [PubMed] [Google Scholar]
- [13].Youden WJ, “Index for rating diagnostic tests,” Cancer 3(1), 32–35 (1950). [DOI] [PubMed] [Google Scholar]
- [14].Perkins NJ and Schisterman EF, “The inconsistency of ‘optimal’ cutpoints obtained using two criteria based on the receiver operating characteristic curve,” American Journal of Epidemiology 163(7), 670–675 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].DeLong ER, DeLong DM, and Clarke-Pearson DL, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics 44, 837–845 (1988). [PubMed] [Google Scholar]
- [16].Efron B, [The Jackknife, the bootstrap and other resampling plans], SIAM, Philadelphia: (1982). [Google Scholar]