Multi-reader multi-case analysis of variance software for diagnostic performance comparison of imaging modalities

Brian J Smith; Stephen L Hillis

doi:10.1117/12.2549075

Author manuscript; available in PMC: 2020 Apr 29.

Published in final edited form as: Proc SPIE Int Soc Opt Eng. 2020 Mar 16;11316:113160K. doi: 10.1117/12.2549075

PMCID: PMC7190386 NIHMSID: NIHMS1580099 PMID: 32351258

Abstract

A common study design for comparing the diagnostic performance of imaging modalities is to obtain modality-specific ratings from multiple readers of multiple cases whose true statuses are known. Typically, there is overlap between the modalities, readers, and/or cases for which special analytical methods are needed to perform statistical comparisons. We describe our new R software package MRMCaov, which is designed for multi-reader multi-case comparisons of two or more imaging modalities. The software allows for the comparison of reader performance metrics, such as area under the receiver operating characteristic curve (ROC AUC), with analysis of variance methods originally proposed by Obuchowski and Rockette (1995) and later unified and improved by Hillis and colleagues (2005, 2007, 2008, 2018). MRMCaov is an open-source package with an integrated command-line interface for performing multi-reader multi-case statistical analysis, plotting, and presenting results. Features of the package include (1) ROC curves estimated parametrically or non-parametrically; (2) reader-specific ROC curves and performance metrics; (3) user-definable performance metrics; (4) modality-specific estimates of mean performance along with confidence intervals and p-values for statistical comparisons; (5) support for factorial, nested, or partially paired study designs; (6) inference for random readers and cases, random readers and fixed cases, or fixed readers and random cases; (7) DeLong, jackknife, or unbiased covariance estimation; and (8) compatibility with Microsoft Windows, Mac OS, and Linux.

Keywords: multi-reader multi-case, ANOVA, ROC analysis, diagnostic radiology, software

1. INTRODUCTION

A common study design for comparing the diagnostic performance of imaging modalities, or diagnostic tests, is to obtain modality-specific ratings from multiple readers of multiple cases (MRMC) whose true statuses are known. In such a design, receiver operating characteristic (ROC) indices, such as area under the ROC curve (ROC AUC), can be used to quantify correspondence between reader ratings and case status. Indices can then be compared statistically to determine whether there are differences between modalities. However, special statistical methods are needed when readers or cases represent a random sample from a larger population of interest and there is overlap between modalities, readers, and/or cases. An ANOVA model designed for these characteristics of MRMC studies was initially proposed by Dorfman et al.¹ and Obuchowski and Rockette² and later unified and improved by Hillis and colleagues.³⁻⁶ Their models are implemented in the MRMCaov R package.⁷

MRMCaov performs multi-reader multi-case analysis of variance for the performance comparison of imaging modalities. This software is the first R implementation of Dr. Hillis's unified methodology and builds upon his OR-DBM MRMC SAS software.⁸ It is designed to be user friendly, to integrate with the R statistical computing and graphics environment, and to offer new features and methodologies. Current features of the package are summarized below. Usage of the software is illustrated with a medical imaging example in the next section.

[Graphic nihms-1580099-f0002.jpg: summary of current package features]

2. USAGE

MRMCaov provides both graphical and tabular analysis results, including reader-specific ROC curves and AUC estimates, modality-specific estimates, confidence intervals, and p-values for statistical comparisons. It was released in July 2019 as a completely new R package, which includes a new method for unbiased covariance estimation as well as other features not collectively available in any other existing R package.

2.1. Installation

The R package is currently available for download from https://github.com/brian-j-smith/MRMCaov. Installation instructions are provided at the download site. Once installed, the package may be loaded for use in an R software⁹ session with the command library(MRMCaov).

2.2. Data

Input data for an MRMCaov analysis should be given as an R data frame with one row per rating and with columns for the reader, treatment, and case identifiers, the true event status, and the rating. An example data frame named VanDyke is provided with the package. The data come from a study in which the relative performance of cinematic presentation of MRI (CINE MRI) was compared to single spin-echo magnetic resonance imaging (SE MRI) for the detection of thoracic aortic dissection.¹⁰ Forty-five patients with aortic dissection and 69 without dissection were imaged with both modalities. Based on the images, five radiologists rated patient disease status as 1 = definitely no aortic dissection, 2 = probably no aortic dissection, 3 = unsure about aortic dissection, 4 = probably aortic dissection, or 5 = definitely aortic dissection. Interest lies in estimating ROC curves for each combination of reader and modality and in comparing modalities with respect to summary statistics from the curves.

[Graphic nihms-1580099-f0003.jpg: VanDyke example data]

Descriptions of the data frame variables are as follows.

reader: unique identifiers for the five radiologists.
treatment: identifiers for the imaging modality (1 = CINE MRI, 2 = SE MRI).
case: identifiers for the 114 cases.
truth: indicator for thoracic aortic dissection (1 = present, 0 = absent).
rating: five-point ratings given to case images by the readers.
case2: example identifiers representing nesting of cases within readers.
case3: example identifiers representing nesting of cases within treatments.
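
The long data layout described above can be illustrated with a small synthetic data frame. This is a minimal sketch in base R that assumes a fully crossed toy study of three readers, two treatments, and six cases; the values are simulated for illustration and are not the VanDyke data.

```r
# Toy data in the long format described above (synthetic, not VanDyke).
set.seed(1)
n_cases <- 6

# One row per reader-by-treatment-by-case combination.
toy <- expand.grid(case = 1:n_cases, treatment = 1:2, reader = 1:3)

# One true status per case, repeated across readers and treatments.
status <- c(0, 0, 0, 1, 1, 1)
toy$truth <- status[toy$case]

# Five-point ordinal ratings; positive cases tend to receive higher scores.
toy$rating <- ifelse(
  toy$truth == 1,
  sample(3:5, nrow(toy), replace = TRUE),
  sample(1:3, nrow(toy), replace = TRUE)
)
toy <- toy[, c("reader", "treatment", "case", "truth", "rating")]

head(toy)

# Each reader-by-treatment cell holds one rating per case.
table(toy$reader, toy$treatment)
```

The cross-tabulation at the end is a quick check that every reader rated every case under every treatment, which is what a factorial layout requires.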

2.3. MRMC Analysis Specification

The first step in conducting a multi-reader, multi-case analysis with the package is an R command-line call to the function mrmc to specify the data inputs, performance metric of interest, and covariance estimation method. Below is a summary of the function arguments.

[Graphic nihms-1580099-f0004.jpg: summary of mrmc function arguments]

Study design is automatically inferred from the case identifiers. If the same identifiers appear across all readers and tests, a factorial design (design = 1) is assumed. Cases are assumed to be nested within readers if their identifiers are unique to each reader and are assumed to be nested within tests if their identifiers are unique to each test. Examples of the latter two designs are given with the case2 and case3 variables in the VanDyke data frame. The case variable is coded for the full factorial design originally employed in the study.

For the specification of response in mrmc, several performance metric functions are supplied by the package. Metrics include area under ROC curves, specificity, and sensitivity estimated either empirically or parametrically. Other, user-defined functions are also supported.

[Graphic nihms-1580099-f0005.jpg: performance metric functions supplied by the package]
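
One way to picture how a study design could be read off the case identifiers is to count the readers and treatments paired with each case. The sketch below is a simplified base R illustration; infer_design is a hypothetical helper written for this example and is not the package's internal code, and the case2 and case3 columns are constructed here only in the spirit of the VanDyke variables of the same names.

```r
# Hypothetical helper: classify a design from the identifiers alone.
infer_design <- function(reader, treatment, case) {
  # Number of distinct readers and treatments paired with each case.
  readers_per_case <- tapply(reader, case, function(x) length(unique(x)))
  treatments_per_case <- tapply(treatment, case, function(x) length(unique(x)))
  n_readers <- length(unique(reader))
  n_treatments <- length(unique(treatment))

  if (all(readers_per_case == n_readers) &&
      all(treatments_per_case == n_treatments)) {
    "factorial"
  } else if (all(readers_per_case == 1) &&
             all(treatments_per_case == n_treatments)) {
    "cases nested within readers"
  } else if (all(treatments_per_case == 1) &&
             all(readers_per_case == n_readers)) {
    "cases nested within treatments"
  } else {
    "other (for example, partially paired)"
  }
}

# Synthetic identifiers: 3 readers, 2 treatments, 4 cases.
d <- expand.grid(case = 1:4, treatment = 1:2, reader = 1:3)
d$case2 <- paste(d$reader, d$case, sep = ".")     # unique to each reader
d$case3 <- paste(d$treatment, d$case, sep = ".")  # unique to each treatment

infer_design(d$reader, d$treatment, d$case)
infer_design(d$reader, d$treatment, d$case2)
infer_design(d$reader, d$treatment, d$case3)
```

The three calls classify the same ratings differently only because the case identifiers are coded differently, which is why the coding of the case variable matters when calling mrmc.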

Performance metrics measure the degree to which higher case ratings are associated with positive case statuses, where positive status is taken to be the highest level of the truth variable. AUC is derived from ROC curves estimated empirically by empirical_auc and trapezoidal_auc and parametrically with a proper binormal model by proproc_auc. Details of empirical and parametric ROC AUC estimation and derived performance metrics can be found in the book of Pepe.¹¹

Methods for calculating covariances include DeLong,¹² jackknife,¹³ and unbiased.⁶ The methods that can be used in an analysis depend on the performance metric specified. Jackknife will work with any metric. DeLong and unbiased can be used with empirically estimated AUC. The mrmc function automatically checks for an incompatible combination of covariance method and performance metric and informs users if one is specified.

In the next code example, mrmc is called to specify an analysis in which the VanDyke imaging modalities are compared with respect to area under the empirically estimated ROC curve and with unbiased covariance estimation. Results of the function call are saved to a new variable est. As can be seen from printing the results, the function calculates and returns several quantities needed for statistical comparison of the imaging modalities, including ROC estimates of false and true positive rates, areas under the ROC curves, analysis of variance sums of squares, and covariance estimates.

[Graphic nihms-1580099-f0006.jpg: R example of an mrmc call with printed results]
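
To make the empirical AUC and jackknife covariance ideas concrete, the following base R sketch computes both from first principles for a single reader who rates the same cases under two modalities. The function names auc_mw, auc_trapezoid, and jackknife_cov are hypothetical helpers for this illustration, the ratings are made up, and the code is not the package's implementation.

```r
# Empirical AUC as the Mann-Whitney statistic: the proportion of
# (positive, negative) case pairs in which the positive case has the
# higher rating, with ties counted as one half.
auc_mw <- function(truth, rating) {
  pos <- rating[truth == 1]
  neg <- rating[truth == 0]
  mean(outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n)))
}

# The same quantity from the trapezoidal rule on the empirical ROC points.
auc_trapezoid <- function(truth, rating) {
  cuts <- c(Inf, sort(unique(rating), decreasing = TRUE))
  tpr <- sapply(cuts, function(k) mean(rating[truth == 1] >= k))
  fpr <- sapply(cuts, function(k) mean(rating[truth == 0] >= k))
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Leave-one-case-out jackknife covariance matrix of two AUC estimates
# computed on the same cases.
jackknife_cov <- function(truth, rating_a, rating_b) {
  n <- length(truth)
  loo <- t(sapply(seq_len(n), function(i) {
    c(a = auc_mw(truth[-i], rating_a[-i]),
      b = auc_mw(truth[-i], rating_b[-i]))
  }))
  # (n - 1) / n times the cross-products of deviations from the mean.
  (n - 1) / n * crossprod(sweep(loo, 2, colMeans(loo)))
}

# Made-up ratings for eight cases under two modalities.
truth    <- c(0, 0, 0, 0, 1, 1, 1, 1)
rating_a <- c(1, 2, 2, 3, 3, 4, 5, 5)
rating_b <- c(1, 1, 3, 4, 2, 4, 4, 5)

auc_mw(truth, rating_a)
auc_trapezoid(truth, rating_a)
jackknife_cov(truth, rating_a, rating_b)
```

The Mann-Whitney and trapezoidal forms agree for ordinal ratings, and the off-diagonal element of the jackknife matrix is the kind of between-modality covariance that enters the ANOVA-based comparison.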

By default, readers and cases are assumed to be random factors in the MRMC analysis. Either one of the factors may be designated as fixed in calls to mrmc with the syntax fixed(<variable name>), where <variable name> is the name of the corresponding reader or case variable.

[Graphic nihms-1580099-f0007.jpg: R example using fixed()]

2.4. Statistical Analysis

Statistical comparisons of treatment modalities are performed by calling the summary function with output returned by mrmc. The summary call produces ANOVA results for a global test of equality of ROC AUC means across all treatment modalities and tests of pairwise differences, along with confidence intervals for the differences and intervals for the individual modalities.

[Graphic nihms-1580099-f0008.jpg: R example of a summary call with ANOVA results]

The confidence interval level (default: 95%) can be changed with the conf.int argument to summary.

[Graphic nihms-1580099-f0009.jpg: R example of summary with the conf.int argument]
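
For readers who want to see what sits behind the global test and the confidence interval for a difference, the sketch below computes the Obuchowski-Rockette F statistic with the Hillis denominator degrees of freedom for random readers and random cases, following the formulas in the cited methodological papers.²,⁴ It is a simplified base R illustration: or_compare is a hypothetical helper, the AUC values and the covariances cov2 (different readers, same modality) and cov3 (different readers, different modalities) are made-up inputs, and the code is not the package's implementation.

```r
# auc: matrix of reader-level AUC estimates, modalities in rows and
# readers in columns. cov2 and cov3: error covariance estimates.
or_compare <- function(auc, cov2, cov3, conf_level = 0.95) {
  t_n <- nrow(auc)  # number of modalities
  r_n <- ncol(auc)  # number of readers
  grand <- mean(auc)
  test_means <- rowMeans(auc)
  reader_means <- colMeans(auc)

  # Mean squares for modality and for the modality-by-reader interaction.
  ms_t <- r_n * sum((test_means - grand)^2) / (t_n - 1)
  resid <- auc - outer(test_means, reader_means, "+") + grand
  ms_tr <- sum(resid^2) / ((t_n - 1) * (r_n - 1))

  # F statistic and Hillis denominator degrees of freedom.
  denom <- ms_tr + r_n * max(cov2 - cov3, 0)
  f_stat <- ms_t / denom
  ddf <- denom^2 / (ms_tr^2 / ((t_n - 1) * (r_n - 1)))
  p_value <- pf(f_stat, t_n - 1, ddf, lower.tail = FALSE)

  # Confidence interval for the difference between modalities 1 and 2.
  diff <- unname(test_means[1] - test_means[2])
  half <- qt(1 - (1 - conf_level) / 2, ddf) * sqrt(2 / r_n * denom)

  list(f = f_stat, df1 = t_n - 1, df2 = ddf, p = p_value,
       diff = diff, ci = diff + c(-1, 1) * half)
}

# Hypothetical reader-level AUCs for two modalities and five readers.
auc <- rbind(
  modality1 = c(0.80, 0.85, 0.78, 0.90, 0.82),
  modality2 = c(0.84, 0.86, 0.83, 0.91, 0.88)
)

res <- or_compare(auc, cov2 = 4e-4, cov3 = 3e-4)
res

# A different confidence level changes only the interval width.
or_compare(auc, cov2 = 4e-4, cov3 = 3e-4, conf_level = 0.90)$ci
```

When both covariances are set to zero and there are two modalities, this calculation reduces to a paired t-test on the reader-level AUCs, which is a convenient sanity check. A positive difference between cov2 and cov3 inflates the denominator to account for the correlation induced by readers rating the same cases.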

3. CONCLUSIONS

MRMCaov brings a new statistical package for MRMC analysis to the R software environment and its large community of users. The package enables comparison of imaging modalities with an interactive interface and flexible options for performance metrics, study designs, covariance estimation methods, and statistical estimation and testing of performance. A demonstration of features currently implemented in the package is provided in this paper. New features, such as partial ROC AUC performance metrics and methodological extensions, are under development and planned for inclusion in future versions. Proper statistical methods and readily available software are crucial for the evaluation and comparison of multi-reader multi-case studies of imaging modalities. The MRMCaov software is designed to help ensure the application of such methods.

Figure 1. R Example: Plots of ROC estimates from mrmc function call.

ACKNOWLEDGMENTS

This research was supported by the National Institutes of Health grant R01 EB025174.

REFERENCES

[1] Dorfman DD, Berbaum KS, and Metz CE, “Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method,” Investigative Radiology 27, 723–731 (1992).
[2] Obuchowski NA and Rockette HE, “Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations,” Communications in Statistics–Simulation and Computation 24, 285–308 (1995).
[3] Hillis SL, Obuchowski NA, Schartz KM, and Berbaum KS, “A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette methods for receiver operating characteristic (ROC) data,” Statistics in Medicine 24, 1579–1607 (2005).
[4] Hillis SL, “A comparison of denominator degrees of freedom methods for multiple observer ROC analysis,” Statistics in Medicine 26, 596–619 (2007).
[5] Hillis SL, Berbaum KS, and Metz CE, “Recent developments in the Dorfman-Berbaum-Metz procedure for multireader ROC study analysis,” Academic Radiology 15, 647–661 (2008).
[6] Hillis SL, “Relationship between Roe and Metz simulation model for multireader diagnostic data and Obuchowski-Rockette model parameters,” Statistics in Medicine 37, 2067–2093 (2018).
[7] Smith BJ and Hillis SL, MRMCaov: Multi-Reader Multi-Case Analysis of Variance (2020). R package version 0.1.3.
[8] Schartz KM, Hillis SL, Pesce LL, Berbaum KS, and Metz CE, OR-DBM MRMC (2019). Version 2.52.
[9] R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2019).
[10] VanDyke CW, White RD, Obuchowski NA, Geisinger MA, Lorig RJ, and Meziane MA, “Cine MRI in the diagnosis of thoracic aortic dissection,” 79th Radiological Society of North America Meetings (1993).
[11] Pepe MS, [The Statistical Evaluation of Medical Tests for Classification and Prediction], Oxford University Press, New York (2003).
[12] DeLong ER, DeLong DM, and Clarke-Pearson DL, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics 44, 837–845 (1988).
[13] Efron B, [The Jackknife, the Bootstrap and Other Resampling Plans], SIAM, Philadelphia (1982).

