Abstract
Rationale and Objectives
A basic assumption for a meaningful diagnostic decision variable is that there is a monotone relationship between the decision variable and the likelihood of disease. This relationship, however, generally does not hold for the binormal model. As a result, ROC-curve estimation based on the binormal model produces improper ROC curves that are not concave over the entire domain and cross the chance line. Although in practice the “improperness” is typically not noticeable, there are situations where the improperness is evident. Presently, standard statistical software does not provide diagnostics for assessing the magnitude of the improperness.
Materials and Methods
We show how the mean-to-sigma ratio can be a useful, easy-to-understand and easy-to-use measure for assessing the magnitude of the improperness of a binormal ROC curve by showing how it is related to the chance-line crossing. We suggest an improperness criterion based on the mean-to-sigma ratio.
Results
Using a real-data example we illustrate how the mean-to-sigma ratio can be used to assess the improperness of binormal ROC curves, compare the binormal method with an alternative proper method, and describe uncertainty in a fitted ROC curve with respect to improperness.
Conclusions
By providing a quantitative and easily computable improperness measure, the mean-to-sigma ratio provides an easy way to identify improper binormal ROC curves and facilitates comparison of analysis strategies according to improperness categories in simulation and real-data studies.
Keywords: receiver operating characteristic (ROC) curve, diagnostic radiology, mean-to-sigma ratio, binormal model, proper ROC model
Introduction
For diagnostic studies that evaluate and compare medical imaging modalities (e.g., mammography) that require a human reader (typically a radiologist) to interpret generated images with respect to disease likelihood or severity, a commonly used method for estimating a receiver operating characteristic (ROC) curve is to use maximum likelihood estimation based on the assumption of a latent binormal model [1–4]; we refer to this method as the binormal method. The latent binormal model assumption states that there exists a monotone transformation that, when applied to the decision variable of interest, results in a latent decision variable that is normally distributed for nondiseased cases as well as for diseased cases, with the means and variances allowed to differ for the two distributions. For example, consider a study where a radiologist is asked to assign likelihood-of-disease confidence levels to images using a discrete five-level ordinal integer scale (e.g., 1 = “definitely not diseased”, …, 5 = “definitely diseased”); for this situation it is typical to assume that these ratings represent the binning of values of a latent (i.e., unobserved) continuous decision variable representing the reader’s likelihood-of-disease perception. Often the ROC-curve summary measure of interest is the area under the curve (AUC).
For large samples the binormal method has been shown to perform well for decision variable distributions that can vary greatly from the binormal distribution [5–8]. We refer to the ROC curve corresponding to a latent binormal model decision variable as the binormal ROC curve. Throughout we assume that the decision variable of interest is continuous and that larger values of it are more indicative of disease.
In most practical situations a meaningful decision variable should be an increasing function of the likelihood ratio (likelihood of being diseased divided by likelihood of not being diseased) [9, p 94]. A decision variable model having this property and its corresponding ROC curve are said to be proper [10, p 37]. A function whose first derivative is decreasing throughout an open interval is called concave or concave downward in that interval, and a function whose first derivative is increasing throughout an open interval is called convex or concave upward in that interval [11, pp 144–145]. Since the slope of an ROC curve for a continuous decision variable is equal to the likelihood ratio at the corresponding threshold, it follows that the slope of a proper ROC curve decreases as the false positive fraction (fpf) increases, that is, a proper ROC curve will be concave everywhere (0 ≤ fpf ≤ 1) [9, pp 70–71]. If the decision variable is not an increasing function of the likelihood ratio, then its model and corresponding ROC curve are said to be improper [10, p 37].
The latent binormal model is improper if the nondiseased and diseased distribution variances differ; furthermore, there is a single fpf value such that the ROC curve is concave on one side and convex on the other side [9, p 83]. In addition, as we show later, there is a single fpf value where the ROC curve crosses the chance line, implying that for a range of fpf values the decision variable performs worse than guessing.
Although binormal model ROC curves are improper unless the diseased and nondiseased variances are equal, in practice the “improperness” is so small that it is not apparent when looking at the ROC curve. However, there are situations when the improperness is apparent, with the ROC curve visibly crossing below the chance line and having an obvious “hook”. For these situations we deem the ROC curve and its corresponding binormal model to be noticeably or slightly improper, depending on how easily the improperness can be seen. Pan and Metz [12, p 381] note that “because ROC curves do not show shapes of this kind when they are estimated from reliable data sets, hooks and degeneracy can be considered artifacts of the conventional binormal ROC model.”
Presently, researchers often ignore or do not check for improperness in fitted binormal ROC curves, even though there can be situations where the magnitude of the improperness is large enough to make the validity of conclusions based on the improper ROC curve questionable. Furthermore, standard statistical software packages do not provide any diagnostics for assessing the magnitude of the improperness; thus the researcher can only know the extent of the improperness from visually examining ROC-curve plots, which often is not done when the researcher is primarily interested in an ROC-curve summary index, such as the AUC.
There is not general agreement on an appropriate analysis strategy for ROC data that will satisfactorily account for the inherent improperness of binormal ROC curves. At one end of the spectrum is the strategy of using the binormal method and ignoring any improperness in resulting ROC curves, and at the other end is the strategy of always using a proper method that never results in improper ROC curves. In between are other strategies, such as using a proper method only when the binormal method produces a clearly visible improper ROC curve. Although improperness can be visually assessed from graphs, a discussion of the different analysis strategies requires a quantitative improperness measure that is easy to compute and interpret. Our purpose is to investigate the properties of the mean-to-sigma ratio as a quantitative measure of improperness. However, we do not attempt to discuss which analysis strategy should be used, since that would require separate treatment.
In summary, our main purpose is to show how the mean-to-sigma ratio can be a useful, easy-to-understand, and easy-to-use measure for assessing the magnitude of the improperness of binormal ROC curves. The outline of the paper is as follows. We illustrate the inherent improperness of the binormal model with an example, show how the mean-to-sigma ratio can be used as a measure for assessing the degree of improperness, and discuss alternative proper models. Using data from a multi-reader multi-modality study, we illustrate the usefulness of the mean-to-sigma ratio for assessing improperness and for comparing the binormal method with an alternative method based on a proper model.
Materials and Methods
Example of a noticeably improper binormal ROC curve
To illustrate the inherent improperness in binormal ROC curve estimation, consider Table 1, which shows the rating data for one reader from a study [13] that will be described in more detail in the Results section. Figure 1 shows the corresponding fitted binormal ROC curve; note that there is a visible hook and chance-line crossing near the upper right-hand corner of the unit square. In Figure 1 the ROC curve crosses the chance line at the point (fpf, tpf) = (0.976, 0.976), where tpf stands for true positive fraction, shown by the intersection of the “crossing” reference line with the ROC curve. Furthermore, this ROC curve is concave for fpf < 0.735, but is convex for fpf > 0.735. Letting ROC(t) denote the tpf corresponding to fpf = t, the point (.735, ROC(.735)) on the ROC curve separates the concave and convex portions of the curve and thus is an inflection point; this point is shown by the intersection of the “inflection” reference line and the ROC curve in Figure 1.
Table 1.
Rating data for a radiologist from Van Dyke et al [13].
| | Rating 1 | Rating 2 | Rating 3 | Rating 4 | Rating 5 | Total |
|---|---|---|---|---|---|---|
| Normals | 39 | 19 | 9 | 1 | 1 | 69 |
| Diseased | 7 | 7 | 3 | 5 | 23 | 45 |
| Total | 46 | 26 | 12 | 6 | 24 | 114 |
Figure 1.
Binormal ROC curve for the data in Table 1. The curve is concave to the left of the inflection reference line (fpf < t1) and convex to the right (fpf > t1), and drops below the chance line to the right of the crossing reference line (fpf > t0). Thus t0 is the chance-line crossing fpf and t1 is the inflection-point fpf.
In general we will denote the fpf = tpf value where the ROC curve crosses the chance line by t0 and refer to t0 as the chance-line crossing fpf; similarly, we will denote the fpf value where the ROC curve changes from convex to concave (for either increasing or decreasing fpf) by t1 and refer to t1 as the inflection-point fpf. (More precisely, t1 is the fpf-coordinate of the inflection point.) For example, in Figure 1 the chance-line crossing fpf is t0 = .976 and the inflection-point fpf is t1 = .735.
Figure 2 shows the latent nondiseased and diseased distribution densities that yield the ROC curve in Figure 1, with y denoting the latent binormal-model decision variable with the standard parameter constraints imposed: μ1 = 0 and σ1 = 1. The log-likelihood ratio (the logarithm of the diseased density divided by the nondiseased density) is shown in the upper part of the figure. The crossing and inflection reference lines in Figure 2 mark the decision variable thresholds, c0 = −1.98 and c1 = −0.628, that correspond to t0 = 0.976 and t1 = 0.735, respectively, in Figure 1. We refer to c0 and c1 as the chance-line crossing threshold and the inflection-point threshold, respectively. In Figure 2, Pr(Y > c0) is the same for both the nondiseased and diseased distributions; correspondingly, the ROC curve crosses the chance line at (t0, t0) in Figure 1, with t0 = Pr(Y > c0 | nondiseased) = 1 − Φ(c0) = Φ(−c0), where Φ(·) is the cumulative standard normal distribution function. In Figure 2 the log-likelihood ratio slope is zero at y = c1, with the log-likelihood ratio decreasing with increasing y for y ≤ c1 and increasing with increasing y for y ≥ c1; correspondingly, the ROC curve is concave in (0, t1) and convex in (t1, 1), where t1 = Φ(−c1).
Figure 2.
The latent binormal distribution that yields the ROC curve in Figure 1 and its corresponding log-likelihood ratio. The binormal distribution parameters are μ1 = 0, σ1 = 1, μ2 = 2.29337, σ2 = 2.15743 (a = 1.06301, b = 0.46351). The inflection and crossing reference lines correspond to those in Figure 1, with the thresholds c0 and c1 corresponding to the fpf values t0 and t1, respectively, in Figure 1. Thus c0 is the chance-line crossing threshold and c1 is the inflection-point threshold.
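As a numerical companion to Figures 1 and 2, the following Python sketch evaluates the fitted curve at a few fpf values. It assumes the standard closed forms ROC(t) = Φ(a + bΦ⁻¹(t)) and AUC = Φ(a/√(1 + b²)), which follow from the binormal parameters a = (μ2 − μ1)/σ2 and b = σ1/σ2 but are not written out in the text; the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def binormal_roc(fpf, a, b):
    """tpf on the binormal ROC curve at a given fpf, for 0 < fpf < 1."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpf))

def binormal_auc(a, b):
    """Area under the binormal ROC curve."""
    return STD_NORMAL.cdf(a / sqrt(1.0 + b * b))

# Parameter estimates quoted in the Figure 2 caption.
a, b = 1.06301, 0.46351

for fpf in (0.10, 0.50, 0.90, 0.99, 0.999):
    tpf = binormal_roc(fpf, a, b)
    side = "above" if tpf > fpf else "below"
    print(f"fpf = {fpf:.3f}  tpf = {tpf:.4f}  ({side} the chance line)")

print(f"AUC = {binormal_auc(a, b):.3f}")
```

Under these assumptions the curve lies above the chance line at fpf = 0.90 and below it at fpf = 0.99, which is consistent with a crossing near 0.976.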
Although it is clear from Figure 1 that the ROC curve is not concave everywhere, it is not possible to visually identify the point where the ROC curve ceases to be concave without the reference line. In general, one cannot discern the point where the ROC curve changes from concave to convex, even for visibly improper curves. However, it is possible without the reference line to see where the ROC curve crosses the chance line for visibly improper curves. For this reason we believe that specifying the chance-line crossing fpf conveys an easier-to-interpret assessment of the improperness of the binormal ROC curve than specifying the inflection-point fpf.
Mean-to-sigma ratio as a measure of improperness
The main point of this paper is that the degree of improperness of a binormal ROC curve can easily be assessed by the mean-to-sigma ratio, which we denote by r. The mean-to-sigma ratio is defined by
$$ r = \frac{\mu_2 - \mu_1}{\sigma_2 - \sigma_1} \tag{1} $$
where μ2 and σ2 are the mean and standard deviation of the diseased latent decision variable distribution, and μ1 and σ1 are the corresponding parameters for the nondiseased latent decision variable distribution. Defining Δm = μ2 − μ1 and Δσ = σ2 − σ1, we can write
$$ r = \frac{\Delta_m}{\Delta_\sigma}. $$
Letting a and b denote the usual parameters of the binormal model, i.e., a = (μ2 − μ1)/σ2 and b = σ1/σ2, it follows from Equation (1) that
$$ r = \frac{a}{1 - b}. \tag{2} $$
For σ2 − σ1 = 0 we define r = ∞ if μ2 − μ1 > 0 and r = −∞ if μ2 − μ1 < 0. If σ2 − σ1 = 0 and μ2 − μ1 = 0, then r is undefined.
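The definition and its boundary conventions can be sketched in Python as follows. The function names are illustrative, and the exact comparison with zero is a simplification that a production implementation would likely replace with a tolerance.

```python
import math

def mean_to_sigma_ratio(mu1, sigma1, mu2, sigma2):
    """Equation (1), with the conventions for equal standard deviations."""
    d_mean = mu2 - mu1
    d_sigma = sigma2 - sigma1
    if d_sigma == 0:
        if d_mean == 0:
            return math.nan  # r is undefined
        return math.copysign(math.inf, d_mean)  # +inf or -inf
    return d_mean / d_sigma

def ratio_from_ab(a, b):
    """Equation (2): r = a / (1 - b), with the same conventions when b = 1."""
    if b == 1:
        return math.nan if a == 0 else math.copysign(math.inf, a)
    return a / (1.0 - b)

# Parameter estimates for the Table 1 data (see the Figure 2 caption).
r_musigma = mean_to_sigma_ratio(0.0, 1.0, 2.29337, 2.15743)
r_ab = ratio_from_ab(1.06301, 0.46351)
print(f"r from means and sigmas: {r_musigma:.4f}")
print(f"r from a and b:          {r_ab:.4f}")
```

Both parameterizations give the same ratio up to rounding of the reported estimates.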
The mean-to-sigma ratio was first introduced by Swets et al [14], who noticed that it seemed to be approximately constant for a variety of experiments. Some support for this conclusion was provided by later analyses [8, 15, 16]. For example, Green and Swets [16, p 95] note that r ≈ 4 describes the relationship between the latent means and standard deviations for many studies. These references only discuss using r to describe the relationship between the latent diseased and nondiseased distribution means and standard deviations.
However, a useful characteristic of the mean-to-sigma ratio that has received virtually no attention is its relationship to the binormal ROC-curve chance-line crossing fpf. Specifically, we show in the Appendix (available online at www.academicradiology.org) that for a binormal ROC curve with σ1 ≠ σ2 there exists a unique chance-line crossing fpf given by
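$$ t_0 = \Phi(r), \tag{3} $$
with corresponding chance-line crossing threshold
$$ c_0 = \frac{\mu_1 \sigma_2 - \mu_2 \sigma_1}{\sigma_2 - \sigma_1}. \tag{4} $$
It follows from Equation (4) that for μ1 = 0 and σ1 = 1 the corresponding threshold can be expressed in terms of the mean-to-sigma ratio:
$$ c_0 = -r. \tag{5} $$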
The only previous reference to relationship (3) of which we are aware is a brief mention by Dorfman et al [17, p 141], but they do not prove the result or suggest using r as a measure for assessing improperness. We also show in the Appendix that for increasing fpf the ROC curve crosses the chance-line from above if b < 1 and from below if b > 1. These results are summarized in part (a) of Table 2.
Table 2.
Summary of chance-line crossing and inflection-point results for the binormal model with σ1 ≠ σ2. The results are proved in the Appendix.
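| a) Chance-line crossing results | |
|---|---|
| fpf: | t0 = Φ(r) |
| threshold: | c0 = (μ1σ2 − μ2σ1)/(σ2 − σ1); c0 = −r if μ1 = 0 and σ1 = 1 |
| direction (for increasing fpf): | from above if b < 1; from below if b > 1 |
| b) Inflection-point results | |
| fpf: | t1 = Φ(ab/(1 − b²)) |
| threshold: | c1 = (μ1σ2² − μ2σ1²)/(σ2² − σ1²); c1 = −ab/(1 − b²) if μ1 = 0 and σ1 = 1 |
| Concave and convex segments of ROC curve: | b < 1: concave over (0, t1), convex over (t1, 1); b > 1: convex over (0, t1), concave over (t1, 1) |
| Slope of the likelihood and log-likelihood ratio functions: | zero at y = c1; b < 1: negative for y < c1, positive for y > c1; b > 1: positive for y < c1, negative for y > c1 |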
Note: r = (μ2 − μ1)/(σ2 − σ1) is the mean-to-sigma ratio.
For example, the latent binormal distribution in Figure 2 is based on the parameter estimates for the Table 1 data, given by μ1 = 0, σ1 = 1, μ2 = 2.29337, and σ2 = 2.15743 (a = 1.06301, b = 0.46351). From Equations (2), (3), and (5) we have r = a/(1−b) = 1.9814, t0 = Φ(r) = .976, and c0 = −r = −1.9814; hence it follows that the chance line crossing occurs at (.976, .976), as indicated in Figure 1, and Pr(Y > −1.9814) = .976 for both nondiseased and diseased distributions, as indicated in Figure 2.
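The worked example can be checked numerically with a short Python sketch. It uses the threshold form c0 = μ1 − rσ1, obtained here by solving Pr(Y > c0 | nondiseased) = Φ(r) for c0 (it reduces to c0 = −r under the standard constraints); the function name is illustrative.

```python
from statistics import NormalDist

def chance_line_crossing(mu1, sigma1, mu2, sigma2):
    """Return (r, t0, c0) for a binormal model with sigma1 != sigma2."""
    r = (mu2 - mu1) / (sigma2 - sigma1)
    t0 = NormalDist().cdf(r)   # chance-line crossing fpf
    c0 = mu1 - r * sigma1      # chance-line crossing threshold
    return r, t0, c0

# Parameter estimates for the Table 1 data.
mu1, sigma1, mu2, sigma2 = 0.0, 1.0, 2.29337, 2.15743
r, t0, c0 = chance_line_crossing(mu1, sigma1, mu2, sigma2)

# At the crossing threshold both distributions have the same upper-tail area.
fpf_at_c0 = 1.0 - NormalDist(mu1, sigma1).cdf(c0)
tpf_at_c0 = 1.0 - NormalDist(mu2, sigma2).cdf(c0)

print(f"r = {r:.4f}, t0 = {t0:.3f}, c0 = {c0:.4f}")
print(f"fpf at c0 = {fpf_at_c0:.4f}, tpf at c0 = {tpf_at_c0:.4f}")
```

The equality of the two tail areas at c0 is what places the point (t0, t0) on the chance line.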
Table 3 shows the chance-line crossing fpf t0 for various values of r, computed using Equation (3). For example, we see that for r = 1 the chance-line crossing occurs at (.841, .841) and for r = −1 it occurs at (.159, .159); for r = 2 it occurs at (.977, .977) and for r = −2 it occurs at (.023, .023). Note from Equation (3) that t0 > .5 if r > 0 and t0 < .5 if r < 0.
Table 3.
Mean-to-sigma ratio r and the corresponding chance-line crossing fpf t0 for the binormal model.
| r | t0 |
|---|---|
| −4.0 | 0.00003 |
| −3.5 | 0.00023 |
| −3.0 | 0.00135 |
| −2.5 | 0.00621 |
| −2.0 | 0.02275 |
| −1.5 | 0.06681 |
| −1.0 | 0.15866 |
| −0.5 | 0.30854 |
| 0.0 | 0.50000 |
| 0.5 | 0.69146 |
| 1.0 | 0.84134 |
| 1.5 | 0.93319 |
| 2.0 | 0.97725 |
| 2.5 | 0.99379 |
| 3.0 | 0.99865 |
| 3.5 | 0.99977 |
| 4.0 | 0.99997 |
Figure 3 shows ROC curves for combinations of μ2 − μ1 = 1, 2, 3 and r = .5, 1, 1.5, 2, 2.5, 3, 3.5, 4, with σ1 = 1. Vertical crossing reference lines that intersect the fpf axis at (t0, 0) have been added to clearly show the chance-line crossing fpfs. The value of σ2 (not shown) varies for each combination; from Equation (1) it is given by
$$ \sigma_2 = \sigma_1 + \frac{\mu_2 - \mu_1}{r}. \tag{6} $$
Figure 3.
ROC curves for positive values of the mean-to-sigma ratio r and μ2 − μ1 = 1, 2, 3, with σ1 = 1. The intersection of the ROC curve with the chance line occurs at (t0, t0), where t0 = Φ(r). The vertical crossing reference lines intersect the fpf axis at fpf = t0. The value of σ2 (not shown) varies for each combination and is given by σ2 = σ1 + (μ2 − μ1)/r.
Figure 4 shows ROC curves for combinations of μ2 − μ1 = .25, .75, 1.25 with r = −.5, −1, −1.5, −2, −2.5, −3, −3.5, −4, with σ1 = 1. As in Figure 3, the value of σ2 varies for each combination according to Equation (6). Since r < 0 and a = (μ2 − μ1)/σ2 > 0, it follows from Equation (2) that b > 1. Writing Equation (1) in the form
$$ r = \frac{\mu_2 - \mu_1}{\sigma_1} \cdot \frac{b}{1 - b} $$
shows that
$$ |r| > \mu_2 - \mu_1 \tag{7} $$
for the Figure 4 combinations, since σ1 = 1, b > 1 and hence |b/(1 − b)| > 1. Thus there is only one ROC curve in Figure 4 for r = −.5, which corresponds to μ2 − μ1 = .25, since r cannot be equal to −.5 if μ2 − μ1 = .75 or μ2 − μ1 = 1.25 due to constraint (7). Similarly, there are only two curves in Figure 4 for r = −1.0 since r cannot be equal to −1.0 if μ2 − μ1 = 1.25.
Figure 4.
ROC curves for negative values of the mean-to-sigma ratio r and μ2 − μ1 = .25, .75, 1.25, with σ1 = 1. The intersection of the ROC curve with the chance line occurs at (t0, t0), where t0 = Φ(r). The vertical crossing reference lines intersect the fpf axis at fpf = t0. The value of σ2 (not shown) varies for each combination and is given by σ2 = 1 + (μ2 − μ1)/r.
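The count of admissible curves for each negative r can be reproduced with a small Python sketch. It relies on the observation that, for σ1 = 1 and r < 0, constraint (7) is equivalent to requiring a positive σ2 in Equation (6); the helper names are illustrative.

```python
def sigma2_from_ratio(delta_mean, r, sigma1=1.0):
    """Equation (6): sigma2 = sigma1 + (mu2 - mu1) / r."""
    return sigma1 + delta_mean / r

def is_admissible(delta_mean, r, sigma1=1.0):
    """A combination is admissible only if the implied sigma2 is positive."""
    return sigma2_from_ratio(delta_mean, r, sigma1) > 0.0

delta_means = (0.25, 0.75, 1.25)
ratios = (-0.5, -1.0, -1.5, -2.0, -2.5, -3.0, -3.5, -4.0)

# For each r, list the mean differences that yield a valid binormal model.
curves = {r: [d for d in delta_means if is_admissible(d, r)] for r in ratios}

for r, admissible in curves.items():
    print(f"r = {r:5.1f}: mu2 - mu1 in {admissible}")
```

The listing shows one admissible curve for r = −.5, two for r = −1.0, and three for every r ≤ −1.5.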
Figures 3 and 4 illustrate that, assuming μ2 − μ1 > 0, the ROC curve will cross the chance line from above in the upper right quadrant for increasing fpf if 0 < r < ∞, while if −∞ < r < 0 the ROC curve will cross the chance line from below in the lower left quadrant. For example, in Figure 3 the crossing must be in the upper right quadrant since r > 0 implies t0 = Φ(r) > .5, and the crossing must be from above since b < 1(σ2 − σ1 > 0 follows from r > 0, μ2 − μ1 > 0 and Equation (1)). Similarly, in Figure 4 the ROC curves cross from below in the lower left quadrant since r < 0 and b > 1.
What is important to note from Figures 3 and 4 is that we can see the crossing of the chance line if |r| = 2, we can barely see it if |r| = 2.5, and it is not discernible for |r| = 3. From these observations we classify the improperness of a binormal ROC curve as indiscernible if |r| ≥ 3, noticeable if |r| ≤ 2, and slight if 2 < |r| < 3, as summarized in Table 4. However, we note that these boundaries are somewhat arbitrary.
Table 4.
Improperness classification of binormal ROC curves based on the mean-to-sigma ratio r.
| Criteria | Improperness Classification |
|---|---|
| \|r\| ≤ 2 | Noticeable |
| 2 < \|r\| < 3 | Slight |
| \|r\| ≥ 3 | Indiscernible |
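The Table 4 rule translates directly into code. The following Python sketch is one possible rendering; the function name and the lowercase labels are illustrative choices.

```python
import math

def classify_improperness(r):
    """Classify a binormal ROC curve by |r| using the Table 4 boundaries."""
    abs_r = abs(r)
    if math.isnan(abs_r):
        raise ValueError("r is undefined when both differences are zero")
    if abs_r <= 2.0:
        return "noticeable"
    if abs_r < 3.0:
        return "slight"
    return "indiscernible"  # includes |r| = infinity (equal variances)

for r in (1.98, -2.5, 2.99, 3.0, math.inf):
    print(f"r = {r:6.2f}: {classify_improperness(r)}")
```

Because the boundaries are somewhat arbitrary, a practical implementation might expose them as parameters rather than constants.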
For the special case of the equal variance binormal model, if μ2 − μ1 > 0 then r = ∞ and if μ2 − μ1 < 0 then r = −∞. For either situation the ROC curve intersects the chance line only at the right and left sides of the unit square; hence the ROC curve does not “cross” the chance line.
Inflection-point fpf and threshold
We show in the Appendix the following results for the inflection-point fpf and threshold. The unique inflection-point fpf is given by
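$$ t_1 = \Phi\!\left(\frac{ab}{1 - b^2}\right), \tag{8} $$
and the corresponding inflection-point threshold is given by
$$ c_1 = \frac{\mu_1 \sigma_2^2 - \mu_2 \sigma_1^2}{\sigma_2^2 - \sigma_1^2}. $$
If μ1 = 0 and σ1 = 1 then c1 can be written as
$$ c_1 = -\frac{ab}{1 - b^2}. \tag{9} $$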
The derivative of the likelihood ratio and log-likelihood ratio function is zero only at c1; otherwise it is nonzero, having the same sign for all thresholds less than c1 and the opposite sign for all thresholds greater than c1. If b < 1 then the binormal ROC curve is concave over (0, t1) and convex over (t1, 1); if b > 1 then the binormal ROC curve is convex over (0, t1) and concave over (t1, 1). These results are listed in part (b) of Table 2. Formulas (8) and (9) were used to compute the inflection-point threshold and fpf in Figures 1 and 2.
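The inflection-point values reported for Figures 1 and 2 can be reproduced with the Python sketch below. The closed form for c1 used here is obtained by setting the derivative of the log-likelihood ratio of two normal densities to zero; it is a reconstruction that matches the reported values, not a transcription of the Appendix, and the function names are illustrative.

```python
from statistics import NormalDist

def log_lr_slope(y, mu1, sigma1, mu2, sigma2):
    """Derivative with respect to y of log(diseased / nondiseased density)."""
    return (y - mu1) / sigma1**2 - (y - mu2) / sigma2**2

def inflection_point(mu1, sigma1, mu2, sigma2):
    """Return (c1, t1): the threshold where the slope is zero and its fpf."""
    c1 = (mu1 * sigma2**2 - mu2 * sigma1**2) / (sigma2**2 - sigma1**2)
    t1 = 1.0 - NormalDist(mu1, sigma1).cdf(c1)
    return c1, t1

# Parameter estimates for the Table 1 data (b < 1).
params = (0.0, 1.0, 2.29337, 2.15743)
c1, t1 = inflection_point(*params)

print(f"c1 = {c1:.3f}, t1 = {t1:.3f}")
print(f"slope below c1: {log_lr_slope(c1 - 1.0, *params):+.3f}")
print(f"slope at c1:    {log_lr_slope(c1, *params):+.3f}")
print(f"slope above c1: {log_lr_slope(c1 + 1.0, *params):+.3f}")
```

For this b < 1 example the slope is negative below c1 and positive above it, in agreement with the behavior of the log-likelihood ratio shown in Figure 2.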
As mentioned earlier, we believe that the mean-to-sigma ratio is a more meaningful measure of improperness than the inflection-point fpf since, unlike the inflection-point fpf, the chance-line crossing fpf can be visually discerned for noticeably or slightly improper ROC curves.
Proper methods
We refer to an estimation method that always produces a proper ROC-curve estimate as a proper method. One approach that results in a proper method is maximum likelihood estimation based on a proper model. Some examples of proper models are the equal variance binormal model, the gamma model [5, 10, pp 195–201, 17], the contaminated binormal model [18–20], and the model proposed by Pan and Metz [12]. Although proper methods have the advantage of always producing a proper estimated ROC curve, the binormal method remains the standard method for estimating ROC curves, with proper methods used only occasionally.
Of these proper models, the model proposed by Pan and Metz [12] is most closely related to the binormal model; for this reason we use it in Example 2 in the Results section to illustrate the usefulness of the mean-to-sigma ratio for comparing the performance of the binormal method with a proper method. Briefly, for every binormal model a corresponding proper model is defined by using the likelihood ratio of the binormal model as the new decision variable; the resulting model is always proper. Although References [12, 21] refer to this as the “‘proper’ binormal model”, we prefer to call it the binormal likelihood-ratio model (binormal-LR model) because we believe this name better describes it. Although the binormal-LR model is derived from the binormal model, we note that it is not a binormal model because its decision variable is not a monotone function of the decision variable of a binormal model (except for the case when the binormal-LR model is the same as an equal-variance binormal model). The binormal-LR model can be estimated using the PROPROC procedure [21, p 13].
Improperness classification rule proposed by Pan and Metz
Pan and Metz [12, pp 1–2] define a hook as “the proportion of an ROC curve that lies below the +45° diagonal ‘guessing line’ of the ROC plot.” Metz and Pan [21] state that empirically the “hook” in a binormal ROC curve is evident if
$$ \frac{d_a}{|c|} \le 6 \tag{10} $$
with
$$ d_a = \frac{\sqrt{2}\,a}{\sqrt{1 + b^2}}, \qquad c = \frac{b - 1}{b + 1}, \tag{11} $$
where a and b are the binormal distribution parameters, and da and c represent a parameterization of the corresponding binormal-LR model. (Actually Metz and Pan use “≲” instead of “≤”; we have substituted the latter symbol to facilitate comparison with r.) We assume b ≠ 1 to avoid a zero denominator in da/|c|. We show in the Appendix that
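$$ \frac{d_a}{|c|} = M\,|r|, \tag{12} $$
where
$$ M = \frac{\sqrt{2}\,(1 + b)}{\sqrt{1 + b^2}}. $$
From equation (12) it follows that condition (10) is equivalent to
$$ |r| \le \frac{6}{M}. $$
It can be shown that
$$ M \le 2, \tag{13} $$
with M = 2 if and only if b = 1. It follows from Equations (12) and (13) that
$$ |r| \le 3 \;\Longrightarrow\; \frac{d_a}{|c|} \le 6, $$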
but the reverse implication does not hold; that is, our rule is more conservative in classifying a binormal curve as having an evident hook. For typical values of b, M is close to 2; in particular, if .5 ≤ b ≤ 2 then 1.90 ≤ M ≤ 2. For example, for b = 0.5 or 2.0, M = 1.90 and hence da/|c| ≤ 6 if and only if |r| ≤ 3.16; more generally, for .5 ≤ b ≤ 2 the boundary of 6 in condition (10) corresponds to a boundary between 3 and 3.16 for |r| for a given b. Thus we see that for typical values of b the boundary da/|c| = 6 is similar to our proposed boundary of |r| = 3 separating slight and indiscernible improperness. On the other hand, if b = 0.1 or 10, then M = 1.548 and hence da/|c| ≤ 6 if and only if |r| ≤ 3.88. For instance, if a = 3.0 and b = .1, then da/|c| = 5.16 (from Eq (11)) and r = 3.33; here the hook should be evident according to rule (10) but our rule says it is not evident, which we believe is a reasonable assertion since the chance line crossing occurs at the fpf value Φ (3.33) = .99957. This last example illustrates an important advantage of the mean-to-sigma ratio – it has a clear interpretation of improperness in terms of the chance line crossing.
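The numerical comparisons in this subsection can be checked with the following Python sketch. It assumes the definitions da = √2·a/√(1 + b²), c = (b − 1)/(b + 1), and M = √2·(1 + b)/√(1 + b²), which reproduce the values quoted above; the function names are illustrative.

```python
from math import sqrt

def pan_metz_ratio(a, b):
    """d_a / |c| for binormal parameters a and b (b != 1)."""
    d_a = sqrt(2.0) * a / sqrt(1.0 + b * b)
    c = (b - 1.0) / (b + 1.0)
    return d_a / abs(c)

def m_factor(b):
    """Multiplier M linking the two criteria: d_a / |c| = M * |r|."""
    return sqrt(2.0) * (1.0 + b) / sqrt(1.0 + b * b)

a, b = 3.0, 0.1
r = a / (1.0 - b)
ratio = pan_metz_ratio(a, b)

print(f"d_a/|c| = {ratio:.2f}, r = {r:.2f}, M = {m_factor(b):.3f}")
print(f"|r| boundary implied by d_a/|c| = 6: {6.0 / m_factor(b):.2f}")
for b_value in (0.5, 1.0, 2.0):
    print(f"b = {b_value}: M = {m_factor(b_value):.3f}")
```

For a = 3.0 and b = .1 the sketch gives da/|c| below 6 while |r| exceeds 3, which is the case where the two rules disagree.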
Metz and Pan [21, p 13] note that the binormal and corresponding binormal-LR ROC curves “are indistinguishable if and only if no ‘hook’ is evident in the conventional binormal ROC.” Using our rule this statement suggests that a binormal ROC curve having indiscernible improperness (|r| ≥ 3) is indistinguishable from the corresponding binormal-LR ROC curve.
Simulation study implications
For a simulation study that simulates ROC data using the binormal model, we recommend that the mean-to-sigma ratio be computed for each binormal model and that only models that are indiscernibly improper (|r| ≥ 3) be included in the study.
Results
Example 1: Assessing the degree of improperness using the mean-to-sigma ratio
The data for this example were provided by Carolyn Van Dyke, MD, who had obtained them in a study [13] that compared the relative performance of single spin-echo magnetic resonance imaging (MRI) and cine MRI in detecting thoracic aortic dissection. There were 45 patients with an aortic dissection and 69 patients without a dissection imaged with both spin-echo and cine MRI. Five radiologists independently interpreted all of the images using a five-point ordinal scale. We will refer to the cine modality and reader 1 combination as CINE1, the spin-echo modality and reader 1 combination as SPIN1, etc.
For each of the ten modality-reader combinations, Table 5 presents the parameter estimates for the latent binormal model and the corresponding mean-to-sigma ratio, chance-line crossing fpf, and AUC; the binormal-LR AUC computed using the PROPROC method is also included. Applying the Table 4 criteria to the ratios in Table 5, we see that CINE5 is the only combination with a noticeably improper ROC curve, with r = 1.98. This is the curve previously discussed and displayed in Figure 1. Two of the curves have slight improperness: CINE4 (r = 2.41) and SPIN2 (r = 2.99). The other seven curves have indiscernible improperness, with 3.00 ≤ r ≤ 4.76 for six of them and r = 33.19 for the other.
Table 5.
Van Dyke et al [13] data parameter estimates. All parameter estimates are based on the latent binormal model except for the PROPROC AUC, i.e., the AUC for the binormal likelihood-ratio model estimated using the PROPROC procedure.
| Modality | Reader | μ2 | σ2 | r | t0 | AUC (binormal) | AUC (PROPROC) | \|r\| bootstrap: Min | \|r\| bootstrap: 5th pct | \|r\| bootstrap: 95th pct |
|---|---|---|---|---|---|---|---|---|---|---|
| Cine | 1 | 3.17 | 1.86 | 3.67 | 0.9999 | 0.933 | 0.934 | 0.82 | 1.30 | 28.43 |
| Cine | 2 | 2.50 | 1.78 | 3.19 | 0.9993 | 0.890 | 0.891 | 0.53 | 0.88 | 29.41 |
| Cine | 3 | 2.74 | 1.58 | 4.76 | 1.0000 | 0.929 | 0.908 | 2.29 | 2.95 | 15.03 |
| Cine | 4 | 9.56 | 4.96 | 2.41 | 0.9921 | 0.970 | 0.977 | 1.00 | 1.43 | 16.45 |
| Cine | 5 | 2.29 | 2.16 | 1.98 | 0.9762 | 0.833 | 0.841 | 0.47 | 1.02 | 5.60 |
| Spin echo | 1 | 3.68 | 1.99 | 3.72 | 0.9999 | 0.951 | 0.952 | 1.33 | 1.95 | 42.69 |
| Spin echo | 2 | 3.70 | 2.24 | 2.99 | 0.9986 | 0.935 | 0.926 | 1.05 | 2.38 | 4.35 |
| Spin echo | 3 | 3.32 | 2.05 | 3.17 | 0.9992 | 0.928 | 0.930 | 1.17 | 1.68 | 17.57 |
| Spin echo | 4 | 6.93 | 1.21 | 33.19 | 1.0000 | 1.000 | 1.000 | 3.89 | 10.07 | 23.89 |
| Spin echo | 5 | 4.11 | 2.37 | 3.00 | 0.9986 | 0.945 | 0.943 | 1.32 | 1.82 | 8.04 |
Notes: μ2 and σ2 are the mean and standard deviation for the latent diseased distribution, μ1 = 0 and σ1 = 1 for the nondiseased distribution, r = (μ2 − μ1)/(σ2 − σ1) is the mean-to-sigma ratio, and t0 = Φ(r) is the chance-line crossing fpf; Min, 5th pct, and 95th pct are the minimum, 5th and 95th percentile values for |r| based on 1000 bootstrap samples.
The ten binormal ROC curves are displayed in Figure 5. The improperness of the CINE5 ROC curve (r = 1.98) is easy to discern while that of the CINE4 curve (r = 2.41) can be seen upon close examination. On the other hand, the improperness of the other curves (r ≥ 2.99) is not visible.
Figure 5.
Binormal ROC curves for the Van Dyke et al [13] data by reader.
Example 2: Comparing the binormal and binormal-LR methods using the mean-to-sigma ratio
In this example we use the same data as in Example 1, but now use the mean-to-sigma ratio as a tool for comparing the binormal and binormal-LR methods. Figure 6 shows the binormal and binormal-LR ROC curves for five of the modality-reader combinations.
Figure 6.
Binormal and binormal likelihood-ratio (binormal-LR) ROC curves for the Van Dyke et al [13] data. Figures (a) and (b) compare the curves for the two modality-reader combinations having the lowest mean-to-sigma ratios, figures (c) and (d) show two indiscernibly improper combinations where the two methods give similar results, and figure (e) is an indiscernibly improper combination where the two methods give visibly different results. Notes: “AUC(binormal)” and “AUC(binormal-LR)” indicate binormal and binormal-LR AUCs, respectively;“CINE5” indicates the cine modality and reader 5 combination, etc.
Figures 6(a) and 6(b) compare the methods for the two combinations having the lowest mean-to-sigma ratios: CINE5 (r = 1.98) and CINE4 (r = 2.41). Here we see that the binormal and binormal-LR ROC curves are similar except in the region defined by fpf values larger than the highest fpf of any of the operating points (i.e., the empirical (fpf, tpf) points). In this region we see that each binormal-LR ROC curve, unlike the binormal ROC curve, does not have a hook or chance-line crossing. Thus the binormal-LR AUC is higher than the binormal AUC for both combinations: 0.841 versus 0.833 for CINE5, and 0.977 versus 0.970 for CINE4.
Figures 6(c) and 6(d) compare the methods for two indiscernibly improper combinations, SPIN5 (r = 3.00) and SPIN3 (r = 3.17), that have similar ROC curves with approximately equal AUCs: 0.943 (binormal-LR) versus 0.945 for SPIN5, and 0.930 (binormal-LR) versus 0.928 for SPIN3. Note that for both combinations the binormal and binormal-LR curves are virtually indistinguishable. These results are not surprising, considering that the curves are indiscernibly improper and the binormal-LR family of curves contains curves very similar to any of the indiscernibly improper binormal curves, as previously discussed.
Figure 6(e) compares the methods for an indiscernibly improper combination, CINE3 (r = 4.76), where the binormal-LR and binormal curves are notably different and have moderately different AUCs: 0.908 (binormal-LR) versus 0.929. We see from the plot that the binormal-LR curve visibly passes closer to the first 3 operating points than the binormal curve, while the binormal curve passes closer to the last operating point. The important point to note from Figure 6(e) is that the binormal-LR method, as implemented by the PROPROC procedure, can produce a visibly different curve than the binormal method, even when the binormal method produces a curve that is indiscernibly improper. We note that this finding contradicts the assertion by Metz and Pan [21, p 29] that the binormal LR model provides fitted curves that are virtually identical to those of the binormal model when no hooks are evident with the binormal model fitted curves.
In this example the mean-to-sigma ratio has enabled us to compare the binormal and binormal-LR procedures for different levels of binormal improperness. We found that the two methods not only can produce visibly different curves when the binormal curves are noticeably or slightly improper, but also when the binormal curves are indiscernibly improper.
Example 3: Assessing improperness uncertainty
In this example we give a 90% confidence interval for the degree of improperness by constructing a confidence interval for the absolute mean-to-sigma ratio. Mean-to-sigma ratios were computed for each of 1000 bootstrap samples from the data set used for the previous examples, with each bootstrap sample having the same number of diseased and nondiseased cases as the original data. The minimum and 5th and 95th percentiles of the bootstrap absolute-value mean-to-sigma ratios are reported in Table 5 (the 5th and 95th percentiles define the 90% confidence interval). The median of the ten minimum values is 1.11, with eight of them less than 1.33. The median of the ten 5th percentiles is 1.75, with eight of them less than 1.95. The 95th percentiles all exceed 4.35. Thus both noticeably improper and indiscernibly improper curves are included in the confidence intervals for all of the test-reader combinations. This example shows how the mean-to-sigma ratio can be useful for describing uncertainty in a fitted ROC curve with respect to improperness.
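The resampling scheme described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the paper obtains each mean-to-sigma ratio from a maximum likelihood binormal fit to ordinal ratings, whereas the hypothetical helper `mean_to_sigma_ratio` below substitutes sample means and standard deviations of continuous scores so that the example is self-contained. The function names, the simulated scores and the interpolation rule for percentiles are all illustrative choices, not part of the original analysis.

```python
import random
import statistics


def mean_to_sigma_ratio(nondiseased, diseased):
    """Moment-based stand-in for r = (mu2 - mu1) / (sigma2 - sigma1).

    ASSUMPTION: a real analysis would take r from a fitted binormal model.
    """
    mu1, mu2 = statistics.fmean(nondiseased), statistics.fmean(diseased)
    s1, s2 = statistics.stdev(nondiseased), statistics.stdev(diseased)
    return (mu2 - mu1) / (s2 - s1)


def percentile(sorted_values, q):
    """Linearly interpolated percentile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def bootstrap_abs_ratio_summary(nondiseased, diseased, n_boot=1000, seed=0):
    """Return (minimum, 5th percentile, 95th percentile) of bootstrap |r| values."""
    rng = random.Random(seed)
    abs_ratios = []
    for _ in range(n_boot):
        # Resample within each group so every bootstrap sample keeps the
        # original numbers of nondiseased and diseased cases.
        nd = rng.choices(nondiseased, k=len(nondiseased))
        d = rng.choices(diseased, k=len(diseased))
        try:
            abs_ratios.append(abs(mean_to_sigma_ratio(nd, d)))
        except ZeroDivisionError:
            continue  # equal standard deviations: the ratio is undefined
    if not abs_ratios:
        raise ValueError("no bootstrap sample produced a defined ratio")
    abs_ratios.sort()
    return abs_ratios[0], percentile(abs_ratios, 0.05), percentile(abs_ratios, 0.95)


# Simulated scores, for illustration only (not the study data).
_rng = random.Random(1)
nondiseased_scores = [_rng.gauss(0.0, 1.0) for _ in range(50)]
diseased_scores = [_rng.gauss(2.0, 2.0) for _ in range(50)]
summary = bootstrap_abs_ratio_summary(nondiseased_scores, diseased_scores, n_boot=200, seed=3)
```

The 5th and 95th percentiles in `summary` play the role of the 90% confidence limits reported in Table 5; the minimum is included because the example also reports it.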
The ROC curves, corresponding summary indices and data for creating ROC-curve plots for the first two examples, as well as the mean-to-sigma ratios for the third example, were obtained using a SAS (SAS for Windows, Version 9.2, Copyright © 2002–2008 by SAS Institute Inc., Cary, NC, USA) macro that is available to the public [22]. Freely available stand-alone software that performs the same functions is also available for download in the multi-reader program DBM MRMC 2.2 [23, 24] and in the single-reader programs PROPROC and ROCKIT, written by Charles Metz and colleagues and available at http://xray.bsd.uchicago.edu/krl/index.htm. Graphical displays were created using STATA/IC 10.1 for Windows (Copyright © 1985–2009, College Station, TX, USA).
Discussion
Although the standard method for estimating an ROC curve is maximum likelihood estimation based on the latent binormal model, the binormal model implies that the decision variable is not a monotone function of the likelihood ratio; hence this method produces improper ROC curves that are not concave everywhere and cross the chance line, implying that the test performs worse than chance for a range of fpf values. Although in most situations the degree of improperness is so small that it cannot be seen, it is important to be able to easily identify those ROC curves where the improperness is visible. We have shown how the mean-to-sigma ratio, by having a clear interpretation in terms of the chance-line crossing fpf, provides an easy way to assess the degree of improperness in binormal ROC curves.
We recommend that ROC analysis software print out the value of the mean-to-sigma ratio for each computed binormal ROC curve, along with a warning message when the improperness is slight or noticeable. While we have suggested classification categories for assessing improperness based on the mean-to-sigma ratio, we emphasize that these categories are somewhat arbitrary and encourage researchers to formulate their own guidelines.
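A report of this kind might look like the following sketch. It assumes the usual binormal parameterization a = (μ2 − μ1)/σ2 and b = σ1/σ2, so that r = a/(1 − b) and the chance-line crossing fpf is Φ(r). The function name `improperness_report` and its cutoff arguments are hypothetical; the cutoffs are deliberately left for the analyst to supply, and the numbers in the usage lines are placeholders rather than the criterion suggested in the paper.

```python
from statistics import NormalDist


def improperness_report(a, b, noticeable_cutoff, slight_cutoff):
    """Summarize improperness of a binormal ROC curve with parameters a and b.

    noticeable_cutoff < slight_cutoff are analyst-chosen bounds on |r|:
      |r| <= noticeable_cutoff                  -> "noticeably improper"
      noticeable_cutoff < |r| <= slight_cutoff  -> "slightly improper"
      |r| > slight_cutoff                       -> "indiscernibly improper"
    """
    if noticeable_cutoff >= slight_cutoff:
        raise ValueError("noticeable_cutoff must be smaller than slight_cutoff")
    if b == 1:
        # Equal variances: the binormal curve is proper and r is undefined.
        return {"r": None, "crossing_fpf": None, "label": "proper", "warn": False}

    r = a / (1.0 - b)                      # mean-to-sigma ratio
    crossing_fpf = NormalDist().cdf(r)     # fpf at which the curve meets the chance line
    if abs(r) <= noticeable_cutoff:
        label = "noticeably improper"
    elif abs(r) <= slight_cutoff:
        label = "slightly improper"
    else:
        label = "indiscernibly improper"
    return {
        "r": r,
        "crossing_fpf": crossing_fpf,
        "label": label,
        "warn": label != "indiscernibly improper",
    }


# Placeholder cutoffs, chosen only to exercise the three branches.
report_low = improperness_report(a=1.0, b=0.5, noticeable_cutoff=2.5, slight_cutoff=3.5)
report_high = improperness_report(a=3.0, b=0.5, noticeable_cutoff=2.5, slight_cutoff=3.5)
```

Returning the label together with a `warn` flag keeps the numerical value of r visible, so that readers who prefer different cutoffs can still apply their own guidelines.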
Although improperness can be visually assessed from plots, the mean-to-sigma ratio provides an objective measurement of improperness that does not depend on individual judgment. Furthermore, it is useful when plots are not easily obtained from software, for reporting results in journal articles when space requirements restrict the use of plots, and when the number of ROC curves renders examination of plots impractical (e.g., a multi-reader study with many readers and modalities, or a simulation study). On the other hand, the mean-to-sigma ratio measures only one aspect of improperness; e.g., it does not assess the area of the hook. Thus we recommend also looking at ROC curves, when feasible, to visually assess improperness.
In addition to allowing easy assessment of the degree of improperness for binormal ROC curves, we saw in an example how the mean-to-sigma ratio can help clarify differences between the binormal and other methods by allowing for comparison of the methods according to different levels of binormal improperness. In particular, we saw that the binormal-LR method, as implemented by the PROPROC procedure, can produce visibly different ROC curves regardless of the level of improperness of the binormal curve. Furthermore, we saw how the ratio could be useful in describing uncertainty in a fitted binormal ROC curve with respect to improperness.
Since we interpret the mean-to-sigma ratio in terms of the chance-line crossing, we could have alternatively used the chance-line crossing fpf as a measure of improperness instead of the mean-to-sigma ratio. However, there were several reasons why we chose the mean-to-sigma ratio: (1) Our interest was primarily in assessing improperness for the binormal model, for which the mean-to-sigma ratio is easy to compute from the model parameters. (2) In general, an improper ROC curve does not necessarily cross the chance line; for such curves the chance-line crossing fpf would be misleading. (3) The mean-to-sigma ratio is an established measure for assessing the relationship between the means and standard deviations; viewing it as a measure of improperness gives it a dual function.
We have not attempted to discuss the advantages and disadvantages of various strategies to circumvent the binormal improperness problem since that would require separate treatment. What we have shown, however, is how the mean-to-sigma ratio, by providing a quantitative and easily computable improperness measure, facilitates such a discussion and allows for the comparison of the analysis methods according to improperness categories in simulation and real-data studies.
Acknowledgments
Grant support: This research was supported by the National Institutes of Health, grant R01EB000863.
Acknowledgements and Disclaimer
The authors thank Carolyn Van Dyke, M.D. for sharing her data set. The authors also thank the reviewers for their very helpful comments and suggestions. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.
Appendix
For completeness we include in this section proofs of the chance-line crossing and inflection-point results for the binormal model summarized in Table 2, and we show the relationship between da/|c| and r. Let Y denote a binormal decision variable corresponding to a nondiseased population with mean μ1 and standard deviation σ1, and a diseased population with mean μ2 and standard deviation σ2, with σ1 ≠ σ2. Let D denote disease status, with D = 1 denoting diseased and D = 0 denoting nondiseased. Thus Y | D = 0 ~ N(μ1, σ1²) and Y | D = 1 ~ N(μ2, σ2²). Let r denote the mean-to-sigma ratio: r = (μ2 − μ1)/(σ2 − σ1). Let ROC(·) denote the ROC curve function for Y; i.e., for a given fpf t, the corresponding tpf is given by ROC(t). Let LR(·) denote the likelihood-ratio function for Y given by
$$\mathrm{LR}(y) = \frac{\sigma_1\,\phi[(y-\mu_2)/\sigma_2]}{\sigma_2\,\phi[(y-\mu_1)/\sigma_1]}, \tag{A1}$$
where φ(·) is the standard normal density function.
Chance-Line Crossing Theorem
Theorem 1
There exists a unique chance-line crossing fpf, t0 = Φ(r), with corresponding unique chance-line crossing threshold, c0 = (σ2μ1 − σ1μ2)/(σ2 − σ1). That is, the ROC curve corresponding to Y crosses the chance line once and only once at the point (t0, t0), and c0 is the corresponding threshold such that Pr(Y > c0 | D = 0) = t0.
Proof
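Let c denote a threshold corresponding to a chance-line crossing fpf. Then
$$\Pr(Y > c \mid D = 0) = \Pr(Y > c \mid D = 1),$$
or equivalently,
$$1 - \Phi\!\left(\frac{c-\mu_1}{\sigma_1}\right) = 1 - \Phi\!\left(\frac{c-\mu_2}{\sigma_2}\right).$$
Because Φ is strictly increasing, it follows that
$$\frac{c-\mu_1}{\sigma_1} = \frac{c-\mu_2}{\sigma_2}. \tag{A2}$$
Solving Equation (A2) for c yields the unique solution
$$c_0 = \frac{\sigma_2\mu_1 - \sigma_1\mu_2}{\sigma_2 - \sigma_1}. \tag{A3}$$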
Note that c0 is defined since σ1 ≠ σ2.
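Let t0 denote the chance-line crossing fpf corresponding to c0. Then
$$t_0 = \Pr(Y > c_0 \mid D = 0) = 1 - \Phi\!\left(\frac{c_0-\mu_1}{\sigma_1}\right) = \Phi\!\left(\frac{\mu_1-c_0}{\sigma_1}\right) = \Phi\!\left(\frac{\mu_2-\mu_1}{\sigma_2-\sigma_1}\right).$$
Thus
$$t_0 = \Phi(r).$$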
Uniqueness of t0 follows from the uniqueness of c0.
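As an informal numerical check of Theorem 1, the sketch below evaluates fpf and tpf at the threshold c0 and compares both with Φ(r). The parameter values are arbitrary illustrative choices (not estimates from the study data), and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def fpf(c, mu1, sigma1):
    """False positive fraction Pr(Y > c | D = 0)."""
    return 1.0 - STD_NORMAL.cdf((c - mu1) / sigma1)


def tpf(c, mu2, sigma2):
    """True positive fraction Pr(Y > c | D = 1)."""
    return 1.0 - STD_NORMAL.cdf((c - mu2) / sigma2)


def chance_line_crossing(mu1, sigma1, mu2, sigma2):
    """Return (r, c0, t0) for a binormal model with sigma1 != sigma2."""
    r = (mu2 - mu1) / (sigma2 - sigma1)                        # mean-to-sigma ratio
    c0 = (sigma2 * mu1 - sigma1 * mu2) / (sigma2 - sigma1)     # crossing threshold
    t0 = STD_NORMAL.cdf(r)                                     # crossing fpf
    return r, c0, t0


# Illustrative parameters: nondiseased N(0, 1), diseased N(1.5, 2^2).
mu1, sigma1, mu2, sigma2 = 0.0, 1.0, 1.5, 2.0
r, c0, t0 = chance_line_crossing(mu1, sigma1, mu2, sigma2)
fpf_at_c0 = fpf(c0, mu1, sigma1)
tpf_at_c0 = tpf(c0, mu2, sigma2)
# fpf_at_c0 and tpf_at_c0 both agree with t0, so (t0, t0) lies on the chance line.
```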
Corollary
If b < 1 (b > 1), where b = σ1/σ2, then the ROC curve crosses from above (below) the chance line for increasing fpf.
Proof
Since the slope of the chance line is unity, then for increasing fpf the ROC curve will cross the chance line from above if its derivative at t0 is less than unity, and from below if its derivative is greater than unity. The derivative of the ROC curve evaluated at t0 is equal to the likelihood ratio evaluated at c0 [9, p 70]; thus
$$\mathrm{ROC}'(t_0) = \mathrm{LR}(c_0). \tag{A4}$$
Substitution into the right-hand side of Equation (A4) using Equations (A1) and (A3), and noting that the two density terms cancel because (c0 − μ1)/σ1 = (c0 − μ2)/σ2, yields
$$\mathrm{ROC}'(t_0) = \frac{\sigma_1}{\sigma_2} = b,$$
and the corollary follows.
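The slope result can also be checked numerically. The sketch below, using the same arbitrary illustrative parameters as before and hypothetical function names, compares the likelihood ratio at c0 with a central finite-difference estimate of the ROC slope at the crossing point; both should be close to σ1/σ2.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def likelihood_ratio(c, mu1, sigma1, mu2, sigma2):
    """Diseased density over nondiseased density at threshold c."""
    diseased_density = STD_NORMAL.pdf((c - mu2) / sigma2) / sigma2
    nondiseased_density = STD_NORMAL.pdf((c - mu1) / sigma1) / sigma1
    return diseased_density / nondiseased_density


def roc_slope_finite_difference(c, mu1, sigma1, mu2, sigma2, h=1e-4):
    """Approximate d(tpf)/d(fpf) at threshold c using thresholds c - h and c + h."""
    def fpf(x):
        return 1.0 - STD_NORMAL.cdf((x - mu1) / sigma1)

    def tpf(x):
        return 1.0 - STD_NORMAL.cdf((x - mu2) / sigma2)

    return (tpf(c + h) - tpf(c - h)) / (fpf(c + h) - fpf(c - h))


# Illustrative parameters only: here sigma1 / sigma2 = 0.5 < 1.
mu1, sigma1, mu2, sigma2 = 0.0, 1.0, 1.5, 2.0
c0 = (sigma2 * mu1 - sigma1 * mu2) / (sigma2 - sigma1)
slope_from_lr = likelihood_ratio(c0, mu1, sigma1, mu2, sigma2)
slope_numeric = roc_slope_finite_difference(c0, mu1, sigma1, mu2, sigma2)
# Both slopes are below 1, so the curve crosses the chance line from above.
```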
Inflection-Point Theorem
Theorem 2
If b < 1 the ROC curve is concave over (0, t1) and convex over (t1, 1), where t1 = Φ[σ1(μ2 − μ1)/(σ2² − σ1²)] = Φ[rb/(1 + b)]. If b > 1 the ROC curve is convex over (0, t1) and concave over (t1, 1).
Proof
The ROC curve for a binormal decision variable is twice differentiable. From basic calculus results concerning concave functions it follows that the binormal ROC curve is concave (convex) over an open interval if its second derivative is negative (positive) throughout the interval [25, pp 282–283]. Our approach is to show that the second derivative of the binormal ROC curve is negative throughout (0, t1) and positive throughout (t1, 1) if b < 1, and positive throughout (0, t1) and negative throughout (t1, 1) if b > 1.
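Let t denote an fpf with corresponding threshold c. The derivative of the ROC curve evaluated at t is equal to the likelihood ratio evaluated at c, i.e.,
$$\mathrm{ROC}'(t) = \mathrm{LR}(c).$$
It follows, using the chain rule, that
$$\mathrm{ROC}''(t) = \frac{d\,\mathrm{LR}(c)}{dc}\,\frac{dc}{dt}. \tag{A5}$$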
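Since
$$t = \Pr(Y > c \mid D = 0) = 1 - \Phi\!\left(\frac{c-\mu_1}{\sigma_1}\right), \tag{A6}$$
then t is a strictly decreasing function of c and dc/dt = −σ1/φ[(c − μ1)/σ1]. It follows that Equation (A5) is equivalent to
$$\mathrm{ROC}''(t) = -\frac{\sigma_1}{\phi[(c-\mu_1)/\sigma_1]}\,\frac{d\,\mathrm{LR}(c)}{dc}. \tag{A7}$$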
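Since φ[(c − μ1)/σ1] > 0, it follows from Equation (A7) that the second derivative of the ROC curve and the derivative of the likelihood ratio have opposite signs when evaluated at t and c, respectively. That is,
$$\operatorname{sign}\!\left[\mathrm{ROC}''(t)\right] = -\operatorname{sign}\!\left[\frac{d\,\mathrm{LR}(c)}{dc}\right],$$
where
$$\operatorname{sign}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0. \end{cases}$$
Since the logarithmic function is strictly increasing, the likelihood ratio and log-likelihood ratio derivatives have the same sign; hence it follows that
$$\operatorname{sign}\!\left[\mathrm{ROC}''(t)\right] = -\operatorname{sign}\!\left[\frac{d\log \mathrm{LR}(c)}{dc}\right]. \tag{A8}$$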
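It is straightforward to show that
$$\log \mathrm{LR}(c) = \log\frac{\sigma_1}{\sigma_2} + \frac{(c-\mu_1)^2}{2\sigma_1^2} - \frac{(c-\mu_2)^2}{2\sigma_2^2}$$
and
$$\frac{d\log \mathrm{LR}(c)}{dc} = \frac{c-\mu_1}{\sigma_1^2} - \frac{c-\mu_2}{\sigma_2^2}. \tag{A9}$$
It follows from Equation (A9) that the derivative of the log-likelihood ratio evaluated at c1 + Δ, where c1 = (σ2²μ1 − σ1²μ2)/(σ2² − σ1²), is given by
$$\left.\frac{d\log \mathrm{LR}(c)}{dc}\right|_{c = c_1 + \Delta} = \Delta\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right). \tag{A10}$$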
Using Equation (A6) it is easy to show that c1 is the threshold corresponding to t1.
From Equation (A10) it follows that for b < 1 (hence σ1 < σ2) the derivative of the log-likelihood ratio evaluated at c is positive for c > c1 and negative for c < c1; and for b > 1 (hence σ1 > σ2) the derivative of the log-likelihood ratio evaluated at c is negative for c > c1 and positive for c < c1. Since the derivative of the log-likelihood ratio has the opposite sign of the second derivative of the ROC curve evaluated at the corresponding fpf (A8), and since thresholds less than c1 correspond to fpfs greater than t1 and vice versa, it follows that for b < 1 the second derivative of the ROC curve evaluated at t is positive for t1 < t < 1 and negative for 0 < t < t1; and for b > 1 the second derivative of the ROC curve evaluated at t is negative for t1 < t < 1 and positive for 0 < t < t1.
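The concavity pattern in Theorem 2 can be illustrated numerically. The sketch below assumes the standard binormal form ROC(t) = Φ(a + bΦ⁻¹(t)) with a = (μ2 − μ1)/σ2 and b = σ1/σ2, computes the inflection-point threshold and fpf, and uses second differences to inspect the sign of the curvature on either side of t1. Parameter values and function names are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binormal_roc(t, a, b):
    """Binormal ROC curve: tpf as a function of fpf t, for 0 < t < 1."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(t))


def second_difference(f, t, h=1e-3):
    """Central second-difference approximation to f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h ** 2


# Illustrative parameters only (b = 0.5 < 1).
mu1, sigma1, mu2, sigma2 = 0.0, 1.0, 1.5, 2.0
a, b = (mu2 - mu1) / sigma2, sigma1 / sigma2
r = (mu2 - mu1) / (sigma2 - sigma1)

# Inflection-point threshold and the corresponding fpf, via Equation (A6).
c1 = (sigma2 ** 2 * mu1 - sigma1 ** 2 * mu2) / (sigma2 ** 2 - sigma1 ** 2)
t1 = 1.0 - STD_NORMAL.cdf((c1 - mu1) / sigma1)
t1_from_r = STD_NORMAL.cdf(r * b / (1.0 + b))  # same fpf expressed through r and b


def roc(t):
    return binormal_roc(t, a, b)


curvature_below_t1 = second_difference(roc, 0.3)  # 0.3 < t1: expect negative (concave)
curvature_above_t1 = second_difference(roc, 0.9)  # 0.9 > t1: expect positive (convex)
```

For these parameters t1 is roughly 0.69, so the two evaluation points 0.3 and 0.9 fall on opposite sides of the inflection point.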
Corollary 1
The fpf value t1 = Φ[rb/(1 + b)] is the unique inflection-point fpf, and c1 = (σ2²μ1 − σ1²μ2)/(σ2² − σ1²) is its corresponding inflection-point threshold.
Proof
It follows immediately from Theorem 2 that t1 is an inflection-point fpf, and uniqueness follows from the Theorem 2 result that the ROC curve does not change from concave to convex at any other point. That c1 is the corresponding inflection-point threshold is easy to show using Equation (A6).
Corollary 2
The derivatives of the likelihood-ratio and log-likelihood-ratio functions are zero only at the inflection-point threshold c1; otherwise they are positive for c > c1 and negative for c < c1 if b < 1, and negative for c > c1 and positive for c < c1 if b > 1.
Proof
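From the previous corollary the inflection-point threshold is given by c1 = (σ2²μ1 − σ1²μ2)/(σ2² − σ1²). Using Equation (A9) we have
$$\left.\frac{d\log \mathrm{LR}(c)}{dc}\right|_{c = c_1} = \frac{c_1-\mu_1}{\sigma_1^2} - \frac{c_1-\mu_2}{\sigma_2^2} = 0.$$
Since the log-likelihood ratio function is a strictly increasing function of the likelihood ratio function, it follows that
$$\left.\frac{d\,\mathrm{LR}(c)}{dc}\right|_{c = c_1} = 0.$$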
The last part of the corollary was shown in the proof for Theorem 2.
Relationship between da/|c| and r
Theorem 3
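Let
$$d_a = \frac{\sqrt{2}\,a}{\sqrt{1+b^2}}, \qquad c = \frac{b-1}{b+1},$$
where a and b are binormal distribution parameters, da and c represent a parameterization of the corresponding binormal-LR model, a ≥ 0 and b ≠ 1. Then
$$\frac{d_a}{|c|} = g(b)\,|r|,$$
where
$$g(b) = \frac{\sqrt{2}\,(1+b)}{\sqrt{1+b^2}}.$$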
Proof
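Suppose b > 1. Then |c| = (b − 1)/(b + 1) and it follows that
$$\frac{d_a}{|c|} = \frac{\sqrt{2}\,a\,(b+1)}{\sqrt{1+b^2}\,(b-1)} = -\frac{\sqrt{2}\,(1+b)}{\sqrt{1+b^2}}\,\frac{a}{1-b} = \frac{\sqrt{2}\,(1+b)}{\sqrt{1+b^2}}\,|r|$$
since r is negative if b > 1 by Equation (2). Similarly, if b < 1 then |c| = −(b − 1)/(b + 1) and it follows that
$$\frac{d_a}{|c|} = \frac{\sqrt{2}\,a\,(b+1)}{\sqrt{1+b^2}\,(1-b)} = \frac{\sqrt{2}\,(1+b)}{\sqrt{1+b^2}}\,\frac{a}{1-b} = \frac{\sqrt{2}\,(1+b)}{\sqrt{1+b^2}}\,|r|$$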
since r is positive if b < 1.
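The relationship in Theorem 3 can be spot-checked numerically. The sketch below assumes the usual binormal-LR parameterization, da = √2·a/√(1 + b²) and c = (b − 1)/(b + 1), together with r = a/(1 − b); these definitions are stated here as assumptions because the displayed equations are not reproduced in this text. The function name and the (a, b) pairs are illustrative.

```python
import math


def da_over_abs_c_and_scaled_r(a, b):
    """Return (da/|c|, multiplier * |r|) for binormal parameters a >= 0 and b != 1.

    ASSUMPTION: da = sqrt(2) * a / sqrt(1 + b^2), c = (b - 1) / (b + 1),
    r = a / (1 - b), and multiplier = sqrt(2) * (1 + b) / sqrt(1 + b^2).
    """
    d_a = math.sqrt(2.0) * a / math.sqrt(1.0 + b * b)
    c = (b - 1.0) / (b + 1.0)
    r = a / (1.0 - b)
    multiplier = math.sqrt(2.0) * (1.0 + b) / math.sqrt(1.0 + b * b)
    return d_a / abs(c), multiplier * abs(r)


# Arbitrary parameter pairs covering both b < 1 and b > 1.
checks = [da_over_abs_c_and_scaled_r(a, b)
          for a, b in [(0.75, 0.5), (2.0, 0.8), (1.2, 1.5), (3.0, 2.0)]]
all_equal = all(math.isclose(lhs, rhs, rel_tol=1e-12) for lhs, rhs in checks)
```

Under these assumptions the multiplier lies between √2 and 2 for any b > 0, so da/|c| and |r| carry essentially the same information about improperness.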
Contributor Information
Stephen L. Hillis, Center for Research in the Implementation of Innovative Strategies in Practice (CRIISP) Iowa City VA Medical Center, Iowa City, IA, U.S.A. Department of Biostatistics University of Iowa, Iowa City, IA, U.S.A
Kevin S. Berbaum, Department of Radiology University of Iowa, Iowa City, IA, U.S.A
References
- 1. Dorfman DD, Alf E Jr. Maximum likelihood estimation of parameters of signal-detection theory and determination of confidence intervals: rating method data. Journal of Mathematical Psychology. 1969;6:487–496.
- 2. Dorfman DD. RSCORE II. In: Swets JA, Pickett RM, editors. Evaluation of diagnostic systems: methods from signal detection theory. Academic Press; San Diego, CA: 1982. pp. 212–232.
- 3. Dorfman DD, Berbaum KS. Degeneracy and discrete receiver operating characteristic rating data. Academic Radiology. 1995;2:907–915. doi: 10.1016/s1076-6332(05)80073-x.
- 4. Metz CE, Herman BA, Shen JH. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Statistics in Medicine. 1998;17:1033–1053. doi: 10.1002/(sici)1097-0258(19980515)17:9<1033::aid-sim784>3.0.co;2-z.
- 5. Hanley JA. The robustness of the binormal assumptions used in fitting ROC curves. Medical Decision Making. 1988;8:197–203. doi: 10.1177/0272989X8800800308.
- 6. Hanley JA. The use of the ‘binormal’ model for parametric ROC analysis of quantitative diagnostic tests. Statistics in Medicine. 1996;15:1575–1585. doi: 10.1002/(SICI)1097-0258(19960730)15:14<1575::AID-SIM283>3.0.CO;2-2.
- 7. HajianTilaki KO, Hanley JA, Joseph L, Collet JP. A comparison of parametric and nonparametric approaches to ROC analysis of quantitative diagnostic tests. Medical Decision Making. 1997;17:94–102. doi: 10.1177/0272989X9701700111.
- 8. Swets JA. Form of empirical ROCs in discrimination and diagnostic tasks: implications for theory and measurement of performance. Psychological Bulletin. 1986;99:181–198.
- 9. Pepe M. The statistical evaluation of medical tests for classification and prediction. Oxford; New York: 2003.
- 10. Egan JP. Signal detection theory and ROC analysis. Academic; New York: 1975.
- 11. Stein SK. Calculus and analytic geometry. 4. McGraw-Hill; New York: 1987.
- 12. Pan XC, Metz CE. The “proper” binormal model: parametric receiver operating characteristic curve estimation with degenerate data. Academic Radiology. 1997;4:380–389. doi: 10.1016/s1076-6332(97)80121-3.
- 13. Van Dyke CW, White RD, Obuchowski NA, Geisinger MA, Lorig RJ, Meziane MA. Cine MRI in the diagnosis of thoracic aortic dissection. 79th RSNA Meetings; Chicago, IL. November 28–December 3, 1993.
- 14. Swets JA, Tanner WP, Birdsall TG. Decision processes in perception. Psychological Review. 1961;68:301–340.
- 15. Swets JA. Indices of discrimination or diagnostic accuracy: their ROCs and implied models. Psychological Bulletin. 1986;99:100–117.
- 16. Green DM, Swets JA. Signal detection theory and psychophysics. Peninsula Publishing; Los Altos: 1988. Original work: Green DM, Swets JA. Signal detection theory and psychophysics. New York: Wiley, 1966.
- 17. Dorfman DD, Berbaum KS, Metz CE, Lenth RV, Hanley JA, AbuDagga H. Proper receiver operating characteristic analysis: The bigamma model. Academic Radiology. 1997;4:138–149. doi: 10.1016/s1076-6332(97)80013-x.
- 18. Dorfman DD, Berbaum KS, Brandser EA. A contaminated binormal model for ROC data - Part I. Some interesting examples of binormal degeneracy. Academic Radiology. 2000;7:420–426. doi: 10.1016/s1076-6332(00)80382-7.
- 19. Dorfman DD, Berbaum KS. A contaminated binormal model for ROC data - Part II. A formal model. Academic Radiology. 2000;7:427–437. doi: 10.1016/s1076-6332(00)80383-9.
- 20. Dorfman DD, Berbaum KS. A contaminated binormal model for ROC data - Part III. Initial evaluation with detection ROC data. Academic Radiology. 2000;7:438–447. doi: 10.1016/s1076-6332(00)80384-0.
- 21. Metz CE, Pan XC. “Proper” binormal ROC curves: theory and maximum-likelihood estimation. Journal of Mathematical Psychology. 1999;43:1–33. doi: 10.1006/jmps.1998.1218.
- 22. Hillis SL, Schartz KM, Pesce LL, Berbaum KS, Metz CE. DBM MRMC procedure for SAS (computer software). Available for download from http://perception.radiology.uiowa.edu. Accessed August 1, 2009.
- 23. Berbaum KS, Schartz KM, Pesce LL, Hillis SL. DBM MRMC 2.2 (computer software). Available for download from http://perception.radiology.uiowa.edu. Accessed August 1, 2009.
- 24. Berbaum KS, Metz CE, Pesce LL, Schartz KM. DBM MRMC 2.1 User’s Guide (software manual). Available for download from http://perception.radiology.uiowa.edu. Accessed August 1, 2009.
- 25. Stein SK. Calculus in the first three dimensions. McGraw-Hill; New York: 1967.






