Abstract
Hepatocellular carcinoma (HCC) is the most common primary cancer of the liver. Finding new biomarkers for its early detection is of high clinical importance. As with many other diseases, cancer has a progressive nature. In cancer biomarker studies, it is often the case that the true disease status of the recruited individuals exhibits more than two classes. The receiver operating characteristic (ROC) surface is a well-known statistical tool for assessing a biomarker's discriminatory ability in trichotomous settings. The volume under the ROC surface (VUS) is an overall measure of the discriminatory ability of a marker. In practice, clinicians are often in need of cutoffs for decision-making purposes. A popular approach for computing such cutoffs is the Youden index and its recent three-class generalization. A drawback of that method is that it treats the data in a pairwise fashion rather than considering all the data simultaneously. The use of the minimized Euclidean distance from the ROC surface to the perfection corner (also known as the closest-to-perfection method) is an alternative to the Youden index that may be preferable in some settings. When such a method is employed, there is a need for inferences around the resulting true class rates/fractions that correspond to the optimal operating point. In this paper, we provide an inferential framework for the derivation of marginal confidence intervals (CIs) and joint confidence spaces (CSs) around the corresponding true class fractions when dealing with trichotomous settings. We explore parametric and nonparametric approaches for the construction of such CIs and CSs. We evaluate our approaches through extensive simulations and apply them to a real data set that refers to HCC patients.
Keywords: closest to perfection corner, cutoff, Euclidean distance, ROC, true class fraction
1 |. INTRODUCTION
In several progressive disorders, clinicians characterize patients into one of three ordered categories (classes): healthy (“control”) group, “benign stage” (or “middle”) group, and diseased group (or “aggressive stage”) (Mossman, 1999; Nakas & Yiannoutsos, 2004; Xiong et al., 2006; Xu et al., 2010). One example involves liver cancer studies in which a patient could be diagnosed to be either in a “Healthy,” a “Chronic Liver Disease,” or a “Hepatoma” group (Xu et al., 2010). The receiver operating characteristic (ROC) curve is the most popular tool for evaluating the discriminatory ability of a continuous marker when the true disease status is dichotomous. Its three-class generalization has been discussed in the literature (Mossman, 1999; Nakas & Yiannoutsos, 2004).
The ROC curve illustrates the tradeoff between the sensitivity and the false positive fraction (or rate), FPF = 1 − specificity, of the biomarker of interest. Its three-class generalization, the ROC surface, is a three-dimensional (3D) plot that illustrates the tradeoff among all three true classification fractions (TCFs). Denote with Yi the random variable corresponding to the biomarker scores for group i = 1, 2, 3, where Y1 ~ F1, Y2 ~ F2, and Y3 ~ F3 under the stochastic ordering Y1 < Y2 < Y3. To categorize a patient into one of the underlying three groups, two ordered thresholds/cutoffs are needed, c1 and c2, with c1 < c2. A given pair (c1, c2) corresponds to a triplet of true class fractions, TCFi, i = 1, 2, 3, which are defined as follows (Nakas & Yiannoutsos, 2004):
TCF1 = P(Y1 ≤ c1) = F1(c1), TCF2 = P(c1 < Y2 ≤ c2) = F2(c2) − F2(c1), TCF3 = P(Y3 > c2) = 1 − F3(c2).

TCF1 is the probability of correctly classifying a healthy subject to the first group. The interpretation is analogous for TCF2 and TCF3. After scanning all possible values of c1 and c2 (c1 < c2), the 3D plot of all corresponding triplets (TCF1, TCF2, TCF3) represents the ROC surface, which can be denoted as ROC(c1, c2) = (TCF1(c1), TCF2(c1, c2), TCF3(c2)), with −∞ < c1 < c2 < +∞. The ROC surface can alternatively be expressed through the underlying cumulative distribution functions (cdfs) as follows (Nakas & Yiannoutsos, 2004):
ROCs(p1, p3) = F2(F3⁻¹(1 − p3)) − F2(F1⁻¹(p1)),

where p1 = TCF1, p3 = TCF3, and ROCs(p1, p3) corresponds to TCF2. In such settings, the volume under the ROC surface (VUS) is often used to measure the overall discriminatory ability of a biomarker (Mossman, 1999). Assuming that the biomarker is continuous, such a measure can be shown to be equal to P(Y1 < Y2 < Y3). An uninformative biomarker leads to an ROC surface with VUS equal to 1/6, while a perfect biomarker has a VUS equal to 1.
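As a simple illustration of this overall measure, the following minimal Python sketch computes the empirical (nonparametric) VUS estimate, that is, the proportion of cross-group triplets satisfying the ordering Y1 < Y2 < Y3. The simulated scores and sample sizes are purely hypothetical and serve only to show the calculation.

```python
import numpy as np

def empirical_vus(y1, y2, y3):
    """Empirical estimate of VUS = P(Y1 < Y2 < Y3).

    Counts, over all cross-group triplets, how often a score from the healthy
    group is below one from the middle group, which in turn is below one from
    the diseased group.
    """
    y1, y2, y3 = np.asarray(y1), np.asarray(y2), np.asarray(y3)
    lt_12 = y1[:, None] < y2[None, :]   # n1 x n2 pairwise indicators
    lt_23 = y2[:, None] < y3[None, :]   # n2 x n3 pairwise indicators
    # number of concordant triplets: sum_j #{i: y1i < y2j} * #{k: y2j < y3k}
    concordant = (lt_12.sum(axis=0) * lt_23.sum(axis=1)).sum()
    return concordant / (len(y1) * len(y2) * len(y3))

# hypothetical data for illustration only
rng = np.random.default_rng(1)
y1 = rng.normal(10.0, 1.0, 50)
y2 = rng.normal(11.5, 1.0, 50)
y3 = rng.normal(13.0, 1.0, 50)
print(round(empirical_vus(y1, y2, y3), 3))  # well above the uninformative value 1/6
```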
Once the ROC surface is constructed, a common way of selecting optimal cutoff points is to maximize the Youden index. In the three-class setting, the maximized Youden index is defined as follows (Nakas et al., 2010, 2013):

J3 = max_{c1<c2} {TCF1(c1) + TCF2(c1, c2) + TCF3(c2) − 1} = max_{c1<c2} {F1(c1) − F2(c1) + F2(c2) − F3(c2)}.
We note that some researchers define the three-class Youden index as J3/2 (Luo & Xiong, 2013). This scaling is used so that the Youden index lies within the interval (0, 1). This discrepancy does not affect the derivation of the optimized cutoffs. The expression above indicates that only F1 and F2 contribute to the estimation of the first optimized cutoff c1. Similarly, for the second cutoff c2, only the contributions of F2 and F3 are involved. As pointed out in Nakas et al. (2013), the expression above implies that maximizing the three-class Youden index is equivalent to maximizing the two pairwise two-class Youden indices, and thus the Youden index cannot accommodate all data simultaneously. As a result, this may have implications when doing inferences around the optimal operating point due to the inflated variances, which are caused by the underutilization of the samples/marker scores. This will, in turn, result in inflated marginal confidence intervals (CIs) for the TCFs and inflated confidence spaces (CSs) around them. An additional implication is that when the cutoffs are estimated (and not considered fixed and known), the estimated triplet of TCFs will exhibit correlation that needs to be taken into consideration when doing inferences (Bantis et al., 2014, 2017).
While alternative approaches for cutoff estimation have been discussed in the three-class setting (Attwood et al., 2014; Mosier & Bantis, 2021), there are no available methods for the corresponding inferences around the underlying optimal TCF triplet. In this paper, we develop such inferences when the optimized cutoffs are based on: (1) the minimized Euclidean distance of the ROC surface from the perfection corner (1, 1, 1), which is also known as the closest-to-perfection criterion (Attwood et al., 2014; Mosier & Bantis, 2021), and (2) the so-called Maximized Volume (MV) approach that refers to a maximized volume of a cuboid under the ROC surface (Attwood et al., 2014). We compare these two criteria with the corresponding results of the three-class Youden index J3. Our methods for both the Euclidean distance criterion and the MV are able to: (1) accommodate the underlying correlations of the optimized TCFs and (2) accommodate all available biomarker scores simultaneously. The Youden criterion only enjoys the first of these two properties.
The Euclidean distance criterion, which involves the minimized Euclidean distance from the ROC surface to the perfection corner (Attwood et al., 2014; Mosier & Bantis, 2021), is given by:

d = min_{c1<c2} √{(1 − TCF1(c1))² + (1 − TCF2(c1, c2))² + (1 − TCF3(c2))²}.
The MV approach involves maximizing the volume of an underlying rectangular cuboid whose sides are determined by the TCFs:

MV = max_{c1<c2} {TCF1(c1) · TCF2(c1, c2) · TCF3(c2)}.
Note that in the expressions above, the Euclidean distance and the MV criteria, unlike the Youden index, do not imply a pairwise derivation of the cutoffs. That is, all three groups contribute simultaneously to the derivation of both c1 and c2.
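To make the distinction between the three criteria concrete, the sketch below evaluates all three objective functions over a crude grid of candidate cutoff pairs using empirical cdf estimates. It is only an illustration under simplifying assumptions (empirical cdfs, a grid restricted to the observed scores, hypothetical helper names); it is not the optimization routine used in this paper, where smoother estimators and numerical optimizers are employed.

```python
import numpy as np

def ecdf(y):
    """Return the empirical cdf of the scores in y as a callable."""
    y = np.asarray(y, dtype=float)
    return lambda t: np.mean(y <= t)

def tcf_triplet(c1, c2, F1, F2, F3):
    """(TCF1, TCF2, TCF3) for cutoffs c1 < c2, given cdf callables F1, F2, F3."""
    return F1(c1), F2(c2) - F2(c1), 1.0 - F3(c2)

def optimize_cutoffs(y1, y2, y3, criterion="euclidean"):
    """Grid search over the pooled observed scores for the pair (c1, c2), c1 < c2,
    optimizing the chosen objective (three-class Youden index, Euclidean distance
    to the perfection corner, or maximized cuboid volume)."""
    F1, F2, F3 = ecdf(y1), ecdf(y2), ecdf(y3)
    grid = np.unique(np.concatenate([y1, y2, y3]))
    best_pair, best_val = None, -np.inf
    for i, c1 in enumerate(grid[:-1]):
        for c2 in grid[i + 1:]:
            t1, t2, t3 = tcf_triplet(c1, c2, F1, F2, F3)
            if criterion == "youden":        # maximize TCF1 + TCF2 + TCF3 - 1
                val = t1 + t2 + t3 - 1.0
            elif criterion == "euclidean":   # minimize the distance to (1, 1, 1)
                val = -np.sqrt((1 - t1) ** 2 + (1 - t2) ** 2 + (1 - t3) ** 2)
            else:                            # "mv": maximize TCF1 * TCF2 * TCF3
                val = t1 * t2 * t3
            if val > best_val:
                best_val, best_pair = val, (c1, c2)
    return best_pair
```

Note that, for the Youden index, the same search could be split into two separate one-dimensional searches (one for c1 over groups 1 and 2, one for c2 over groups 2 and 3), whereas the Euclidean and MV objectives genuinely couple the two cutoffs.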
This paper is organized as follows: In Section 2, we propose a parametric approach for the construction of the joint CSs around the TCFs under the assumption of normality. In Section 3, we explore more flexible techniques based on power transformations. In Section 4, we explore nonparametric techniques that are kernel based and logspline based. In Section 5, we evaluate our approaches through extensive simulations. In Section 6, we apply our approaches to a real data set that involves hepatocellular carcinoma (HCC) patients. We end with a discussion.
2 |. ASSUMPTION OF NORMALITY (DELTA-BASED APPROACH)
Under the assumption of normality we have: Y1 ~ N(μ1, σ1²), Y2 ~ N(μ2, σ2²), Y3 ~ N(μ3, σ3²). The parameter vector of interest is θ = (μ1, σ1, μ2, σ2, μ3, σ3). The TCFs are given by:

TCF1 = Φ((c1 − μ1)/σ1), TCF2 = Φ((c2 − μ2)/σ2) − Φ((c1 − μ2)/σ2), TCF3 = 1 − Φ((c2 − μ3)/σ3), (1)

where Φ(·) denotes the standard normal cdf.
The Euclidean distance of the ROC surface from the perfection corner, for any pair of cutoff points (c1, c2), is defined as:

d(c1, c2) = √{(1 − TCF1)² + (1 − TCF2)² + (1 − TCF3)²}.
The cuboid volume under the ROC surface, for any pair of cutoff points (c1, c2), is defined as:

V(c1, c2) = TCF1 · TCF2 · TCF3.
By plugging in the maximum likelihood estimates (MLEs) of the involved parameters, we can obtain the MLE of the full ROC surface. Optimized cutoff points can be derived so that they minimize the Euclidean distance or maximize the cuboid volume defined above (MV). Using either one of these objective functions, the corresponding estimated optimal cutoff points, (ĉ1, ĉ2), do not have a closed form and can be derived numerically. The implied estimated TCF triplet is:

(TCF̂1, TCF̂2, TCF̂3) = (Φ((ĉ1 − μ̂1)/σ̂1), Φ((ĉ2 − μ̂2)/σ̂2) − Φ((ĉ1 − μ̂2)/σ̂2), 1 − Φ((ĉ2 − μ̂3)/σ̂3)).
Since each TCFi, i = 1, 2, 3, is a probability, it is bounded within [0, 1] and the TCF triplet of interest lies within the unit cube. We need to account for this when constructing marginal CIs and joint CSs using the normal approximation. For this reason, we employ the probit transformation, Φ⁻¹(·), on the TCFs to project them onto the real line. This drives us to define the following δi's, i = 1, 2, 3:

δi = Φ⁻¹(TCFi), i = 1, 2, 3. (2)
To estimate the δs, we can simply substitute the parameters involved in (2) with the corresponding MLEs (the cutoff points can be numerically derived by optimizing the Euclidean distance or the MV after plugging in the MLEs of the parameters). The corresponding MLEs of the δs are denoted as δ̂1, δ̂2, and δ̂3. Using the delta method, we can approximate the variance of δ̂i by:

Var(δ̂i) ≈ (∂δi/∂θ)′ Σ (∂δi/∂θ),

where ∂δi/∂θ denotes the vector of partial derivatives of δi with respect to the components of θ, Σ denotes the variance–covariance matrix of the MLE θ̂, and i = 1, 2, 3.
The covariance between δ̂i and δ̂k is:

Cov(δ̂i, δ̂k) ≈ (∂δi/∂θ)′ Σ (∂δk/∂θ),

where i = 1, 2, 3, k = 1, 2, 3, i ≠ k.
Note that when using the three-class Youden index, the expressions above can be written in closed form (Bantis et al., 2014). When we aim to minimize the Euclidean distance from the perfection corner or to maximize the cuboid volume, these partial derivatives do not have closed forms and so we proceed numerically.
To obtain the 95% rectangular parallelepiped space, we use the Bonferroni adjustment. The CIs for the δi have the following form:

(δ̂i − z1−0.05/6 √Var(δ̂i), δ̂i + z1−0.05/6 √Var(δ̂i)), i = 1, 2, 3, (3)

where z1−0.05/6 denotes the 1 − 0.05/6 percentile of the standard normal distribution.
In practice, we use the estimated variance Var(δ̂i), i = 1, 2, 3, in the expression above, which involves replacement of the parameters with their corresponding MLEs. To obtain the desired CS around (TCF1, TCF2, TCF3), we back-transform the endpoints of expression (3) to the ROC space with the inverse probit. The Bonferroni-based CIs for each TCFi (i = 1, 2, 3) are:

(TCFi,L, TCFi,U) = (Φ(δ̂i − z1−0.05/6 √Var(δ̂i)), Φ(δ̂i + z1−0.05/6 √Var(δ̂i))). (4)
Note that (TCFi,L, TCFi,U) is a 98.33% CI for TCFi. The 3D rectangle

(TCF1,L, TCF1,U) × (TCF2,L, TCF2,U) × (TCF3,L, TCF3,U) (5)
is a joint 95% CS around (TCF1, TCF2, TCF3). However, such a CS cannot accommodate the correlations between the estimated TCFs, which stem from the estimated cutoffs (Bantis et al., 2014, 2017). To accommodate such correlations, we focus on an ellipsoidal CS for the δs. Note that the estimated variance–covariance matrix of the δ̂i, i = 1, 2, 3, has the following form:

Ŝ = [ Var(δ̂1)        Cov(δ̂1, δ̂2)   Cov(δ̂1, δ̂3)
      Cov(δ̂1, δ̂2)   Var(δ̂2)        Cov(δ̂2, δ̂3)
      Cov(δ̂1, δ̂3)   Cov(δ̂2, δ̂3)   Var(δ̂3) ].
If we denote with q3;0.95 the 95th percentile of a chi-squared distribution with 3 degrees of freedom, then the ellipsoid defined by

(δ − δ̂)′ Ŝ⁻¹ (δ − δ̂) ≤ q3;0.95 (6)

is an approximate 95% joint CS for (δ1, δ2, δ3), where δ = (δ1, δ2, δ3)′ and δ̂ = (δ̂1, δ̂2, δ̂3)′. Back-transforming this ellipsoid to the ROC space results in an “egg-shaped” CS that is restricted to lie within the unit cube.
As pointed out by a referee, one could alternatively use the logit transformation in an analogous way, by now defining the δ's as:

δi = log(TCFi / (1 − TCFi)), i = 1, 2, 3. (7)
The delta-based variances and covariances can be analogously approximated using the delta method. After deriving the corresponding partial derivatives, back-transforming to the ROC space would imply:
TCFi = exp(δi) / (1 + exp(δi)), (8)
where i = 1, 2, 3. We refer to the above approach in the simulation study section as “Delta.”
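The following sketch illustrates the Delta-based construction of this section under normality: the cutoffs minimizing the Euclidean distance are found numerically, the probit-transformed TCFs (the δ's) are computed, and their covariance matrix is approximated by combining numerical gradients with a textbook asymptotic covariance of the normal MLEs. The helper names, the finite-difference step, and the choice of optimizer are illustrative assumptions rather than the exact implementation used in this paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, chi2

def tcfs(c, th):
    """TCF triplet under normality; th = (mu1, s1, mu2, s2, mu3, s3), c = (c1, c2)."""
    mu1, s1, mu2, s2, mu3, s3 = th
    c1, c2 = c
    return np.array([
        norm.cdf((c1 - mu1) / s1),
        norm.cdf((c2 - mu2) / s2) - norm.cdf((c1 - mu2) / s2),
        1.0 - norm.cdf((c2 - mu3) / s3),
    ])

def optimal_cutoffs(th):
    """Cutoffs minimizing the Euclidean distance from the perfection corner (1, 1, 1)."""
    obj = lambda c: np.sqrt(np.sum((1.0 - tcfs(c, th)) ** 2))
    start = np.array([(th[0] + th[2]) / 2.0, (th[2] + th[4]) / 2.0])  # between group means
    return minimize(obj, start, method="Nelder-Mead",
                    options={"xatol": 1e-8, "fatol": 1e-12}).x

def deltas(th):
    """Probit-transformed TCFs evaluated at the optimal cutoffs for parameter vector th."""
    return norm.ppf(tcfs(optimal_cutoffs(th), th))

def delta_cov(y1, y2, y3, eps=1e-4):
    """Delta-method covariance matrix of (delta1, delta2, delta3) under normality."""
    groups = [np.asarray(g, dtype=float) for g in (y1, y2, y3)]
    th = np.array([v for g in groups for v in (g.mean(), g.std(ddof=0))])  # MLEs
    # textbook asymptotic covariance of the normal MLEs: Var(mu)=s^2/n, Var(s)=s^2/(2n)
    avar = [v for g, (_, s) in zip(groups, th.reshape(3, 2))
            for v in (s ** 2 / len(g), s ** 2 / (2 * len(g)))]
    sigma = np.diag(avar)
    grad = np.zeros((3, 6))
    for j in range(6):  # numerical gradients; cutoffs are re-optimized at each perturbation
        up, lo = th.copy(), th.copy()
        up[j] += eps
        lo[j] -= eps
        grad[:, j] = (deltas(up) - deltas(lo)) / (2 * eps)
    return deltas(th), grad @ sigma @ grad.T

# toy usage with hypothetical data
rng = np.random.default_rng(7)
y1, y2, y3 = rng.normal(10, 1, 50), rng.normal(11.5, 1, 50), rng.normal(13, 1.2, 50)
d_hat, S = delta_cov(y1, y2, y3)
q95 = chi2.ppf(0.95, 3)  # threshold defining the ellipsoid of Equation (6)
```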
3 |. BOX–COX TRANSFORMATION-BASED APPROACH
When the normality assumption cannot be justified by the data, a transformation may be used to achieve normality. A popular choice is the Box–Cox transformation, named after Box and Cox (Box & Cox, 1964). Since the ROC is invariant to monotone transformations, the usefulness of the Box–Cox transformation has been discussed before under the ROC framework (Fluss et al., 2005; Zou & Hall, 2000, among others). It is defined by:

Y^(λ) = (Y^λ − 1)/λ, if λ ≠ 0; Y^(λ) = log(Y), if λ = 0.
Note that the expression above assumes that all biomarker scores are positive. Under this setting, we assume that the Box–Cox transformed scores attain approximate normality, namely:

Y1^(λ) ~ N(μ1(λ), σ1²(λ)), Y2^(λ) ~ N(μ2(λ), σ2²(λ)), Y3^(λ) ~ N(μ3(λ), σ3²(λ)).
Since the transformation parameter λ needs to be estimated from the data, it exhibits variability. This variability needs to be taken into account. The full log-likelihood under the Box–Cox transformation is given by:

l(θ) = Σ_{i=1}^{3} [ −ni log σi(λ) − (1/(2σi²(λ))) Σ_{j=1}^{ni} (yij^(λ) − μi(λ))² + (λ − 1) Σ_{j=1}^{ni} log yij ] + k,

where θ = (μ1(λ), σ1(λ), μ2(λ), σ2(λ), μ3(λ), σ3(λ), λ) and k is a constant. The corresponding 7 × 7 observed information matrix is provided by Bantis et al. (2017) and can be evaluated at θ̂. The underlying 7 × 7 variance–covariance matrix can be derived by inversion. By denoting its upper-left 6 × 6 part with Σ̂6, we derive:

Var(δ̂i) ≈ (∂δi/∂θ6)′ Σ̂6 (∂δi/∂θ6),
where ∂δi/∂θ6 denotes the vector of partial derivatives of δi with respect to θ6 = (μ1(λ), σ1(λ), μ2(λ), σ2(λ), μ3(λ), σ3(λ)) and i = 1, 2, 3. The covariance between δ̂i and δ̂k can be approximated by:

Cov(δ̂i, δ̂k) ≈ (∂δi/∂θ6)′ Σ̂6 (∂δk/∂θ6),

where i = 1, 2, 3, k = 1, 2, 3, and i ≠ k. Based on the estimates of Var(δ̂i) and Cov(δ̂i, δ̂k), we can derive the 3D rectangular and egg-shaped CSs using Equations (5) and (6). We refer to this approach as "Box-Cox" in the simulation study section.
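A minimal sketch of how the transformation parameter λ can be estimated by profiling the pooled three-group log-likelihood given above over a grid (with the group-specific means and variances profiled out) is shown below. It assumes strictly positive scores; the grid range and helper names are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform; assumes all scores are strictly positive."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam

def profile_loglik(lam, groups):
    """Pooled profile log-likelihood of lambda: each group gets its own normal
    mean/variance (MLEs plugged in) after transformation, plus the Jacobian term."""
    ll = 0.0
    for y in groups:
        z = boxcox(y, lam)
        ll += -0.5 * len(y) * np.log(z.var())     # -n_i * log(sigma_hat_i(lambda))
        ll += (lam - 1.0) * np.log(y).sum()       # Jacobian of the transformation
    return ll

def fit_lambda(y1, y2, y3, grid=np.linspace(-2, 2, 401)):
    """Grid search for the common lambda maximizing the pooled profile log-likelihood."""
    groups = [np.asarray(g, dtype=float) for g in (y1, y2, y3)]
    lls = [profile_loglik(lam, groups) for lam in grid]
    return grid[int(np.argmax(lls))]
```

Once λ̂ is obtained, the transformed scores boxcox(yi, λ̂) can be analyzed exactly as in Section 2, keeping in mind that the extra variability of λ̂ is what the 7 × 7 information matrix above is meant to capture.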
4 |. SMOOTH NONPARAMETRIC ESTIMATION AND INFERENCE
In practice, it is often the case that normality is violated even after the Box–Cox transformation. In such situations, more flexible nonparametric approaches are preferable.
4.1 |. Kernel-based method
Here, we explore a kernel-based approach that employs Gaussian (normal) kernels (Lloyd & Yong, 1999; Wand & Jones, 1994; Zhou & Harezlak, 2002). Under such a framework, the cdf of each group can be estimated by:

F̂i(t) = (1/ni) Σ_{j=1}^{ni} Φ((t − Yij)/hi),

where i = 1, 2, 3. The quantity hi is the so-called bandwidth of each group. The bandwidth we utilize is:

hi = 0.9 min(sdi, iqri/1.34) ni^(−1/5),
where sdi and iqri refer to the standard deviation and the interquartile range of the ith group, respectively (Silverman, 1986; Zou et al., 1998). After exploring a standard bootstrap scheme, we observed in our simulations that the coverage is not as expected, especially for scenarios in which the marker exhibits high VUS values. For that reason, we consider a corrected bootstrap approach that adds noise (an error term) to the resampled biomarker scores (see Step 2 of the algorithm below). Such a bootstrap scheme has been used before under a different ROC setting (Yin & Tian, 2014). This implies the following proposed algorithm:
Step 0: Calculate h1, h2, h3 based on Y1, Y2, Y3, respectively. Derive the normal kernel estimates F̂1, F̂2, F̂3 of F1, F2, F3. With these distribution estimates, derive the estimated optimal cutoffs and the corresponding estimated TCFi, i = 1, 2, 3, by applying the user's choice of the objective function (Youden index, Euclidean distance, or MV). Then the corresponding delta estimates δ̂i, i = 1, 2, 3, can also be derived.
Step 1: Sample Y1, Y2, Y3 separately with replacement, and denote the corresponding bootstrap samples with Y1i*, Y2j*, Y3k*, where i = 1, …, n1, j = 1, …, n2, k = 1, …, n3.
Step 2: Set Y1i** = Y1i* + h1 ε1i, where ε1i ~ N(0, 1); set Y2j** = Y2j* + h2 ε2j, where ε2j ~ N(0, 1); and set Y3k** = Y3k* + h3 ε3k, where ε3k ~ N(0, 1). The hi, i = 1, 2, 3, are calculated in Step 0.
Step 3: Based on the Y1i**, Y2j**, and Y3k**, estimate the bootstrap-based bandwidths and construct the kernel distribution estimates, namely (F̂1*, F̂2*, F̂3*), based on which the optimal cutoff points (ĉ1*, ĉ2*) can be obtained by the same choice of the objective function used in Step 0. If ĉ1* ≥ ĉ2*, discard this bootstrap iteration and replace it with the next one for which ĉ1* < ĉ2*.
Step 4: Obtain the δ̂i* estimates using the logit transformation of the estimated TCF̂i*:

δ̂i* = log(TCF̂i* / (1 − TCF̂i*)), i = 1, 2, 3.
Step 5: Repeat Steps 1–4 m times to obtain m bootstrap estimates of each δi.
Step 6: Based on the results of Step 5, derive an estimate of the 3 × 3 variance–covariance matrix of the δs, denoted with Ŝ*. The construction of the ellipsoid can then be done as implied by Equation (6), with Ŝ* in place of Ŝ.
Step 7: Back-transform the ellipsoid to the ROC space by using the inverse logit function.
The approach proposed above is denoted as “Kernel” in the following sections. In our simulations, we also explore this algorithm with the use of probit instead of logit in Steps 0, 4, and 7.
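The sketch below outlines Steps 0–6 of the proposed algorithm for the Euclidean distance criterion: Gaussian-kernel cdf estimates with the Silverman-type bandwidth given above, resampling with added kernel noise, re-estimation of the cutoffs, and collection of the logit-transformed TCFs across bootstrap replicates. The coarse quantile grid used for the cutoff search, the number of replicates, and the helper names are simplifying assumptions made to keep the example short; they are not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def bandwidth(y):
    """Silverman-type bandwidth: 0.9 * min(sd, iqr/1.34) * n^(-1/5)."""
    y = np.asarray(y, dtype=float)
    iqr = np.subtract(*np.percentile(y, [75, 25]))
    return 0.9 * min(y.std(ddof=1), iqr / 1.34) * len(y) ** (-0.2)

def kernel_cdf(y, h):
    """Gaussian-kernel cdf estimate: F(t) = mean Phi((t - Y_j)/h)."""
    y = np.asarray(y, dtype=float)
    return lambda t: norm.cdf((t - y) / h).mean()

def euclid_cutoffs(F1, F2, F3, grid):
    """Grid search for (c1, c2), c1 < c2, minimizing the distance to (1, 1, 1)."""
    best_pair, best_t, best_d = None, None, np.inf
    for i, c1 in enumerate(grid[:-1]):
        for c2 in grid[i + 1:]:
            t = np.array([F1(c1), F2(c2) - F2(c1), 1.0 - F3(c2)])
            d = np.sqrt(np.sum((1.0 - t) ** 2))
            if d < best_d:
                best_d, best_pair, best_t = d, (c1, c2), t
    return best_pair, best_t

def bootstrap_delta_cov(y1, y2, y3, m=500, seed=0):
    """Steps 1-6: smoothed bootstrap estimate of the covariance of the logit-TCFs."""
    rng = np.random.default_rng(seed)
    groups = [np.asarray(g, dtype=float) for g in (y1, y2, y3)]
    hs = [bandwidth(g) for g in groups]                                     # Step 0
    # coarse quantile grid over the pooled original scores (kept fixed for speed)
    grid = np.quantile(np.concatenate(groups), np.linspace(0.02, 0.98, 41))
    deltas = []
    while len(deltas) < m:
        boot = [rng.choice(g, size=len(g), replace=True) for g in groups]   # Step 1
        boot = [b + h * rng.standard_normal(len(b))                         # Step 2
                for b, h in zip(boot, hs)]
        Fs = [kernel_cdf(b, bandwidth(b)) for b in boot]                    # Step 3
        (c1, c2), t = euclid_cutoffs(*Fs, grid)
        if c1 >= c2 or np.any(t <= 0) or np.any(t >= 1):   # discard invalid iterations
            continue
        deltas.append(np.log(t / (1 - t)))                                  # Step 4
    deltas = np.array(deltas)                                               # Steps 5-6
    return deltas.mean(axis=0), np.cov(deltas, rowvar=False)
```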
4.2 |. Logspline-based method
Another alternative smooth estimator of interest for the densities of the three underlying groups is the logspline estimator. It is a density estimator that was first introduced by Kooperberg and Stone (1992). The logspline approach employs the following model for the cdf:

F(y; θ) = ∫_{L}^{y} f(t; θ) dt,
where f(y; θ) = exp(θ1B1(y) + ⋯ + θpBp(y) − C(θ)) with L < y < U. In this expression, C(θ) is a normalization factor and the functions Bi(y) are cubic B-splines. Similar to the kernel-based method discussed in Section 4.1, the logspline technique can provide a smooth ROC surface estimate. The corresponding optimal pair of cutoffs and the underlying TCF triplet can be numerically derived. Use of our aforementioned algorithm is straightforward after omitting Step 2 and replacing the kernel-based cdf estimates with the logspline-based cdf estimates. We refer to this approach as "Logspline" in our simulation section. Similar to the proposed approaches so far, a probit transformation can be used as an alternative to the logit in Steps 0, 4, and 7 of the proposed algorithm.
5 |. SIMULATION STUDY
We conduct a simulation study to evaluate our methods in terms of coverage and volume of the discussed CSs. The theoretical targeted coverage is 0.95. Different distributional scenarios are considered involving the normal, lognormal, and gamma distributions (the utilized parameters can be found in Table 1). The parameters chosen for these distributions correspond to VUS values of 0.4, 0.5, 0.6, and 0.7. To test whether our proposed approaches are robust, we further consider generating data from a bimodal mixture model. For the mixture model, we consider parameters that correspond to a VUS value equal to 0.6. We choose the underlying parameters so that they correspond to the same cutoffs when using the Youden index, the Euclidean distance, and the maximum cuboid volume criterion. We choose such scenarios to make the comparison between the Youden, the Euclidean distance, and the cuboid volume criteria fairer. Regarding the sample sizes, we consider (50, 50, 50), (100, 100, 100), and (200, 200, 200). The parameters used in all scenarios are given in Table 1.
TABLE 1.
Scenarios considered in our simulation study. We generate data from normal, lognormal, and gamma distributions. We further consider an additional scenario where groups 1 and 2 are normally distributed while group 3 is generated from a two-component normal mixture. The VUS values are set to be 0.4, 0.5, 0.6, and 0.7. For the scenario involving the mixture distribution, we only consider a VUS of 0.6
| Distribution | VUS | μ1 | μ2 | μ3 | σ1 | σ2 | σ3 |
|---|---|---|---|---|---|---|---|
| Normal | 0.4 | 9.6992 | 10.3077 | 10.8000 | 1.0105 | 0.7513 | 0.9100 |
| | 0.5 | 10.2530 | 11.2025 | 12.6000 | 1.3010 | 1.0775 | 1.7150 |
| | 0.6 | 11.4060 | 12.7810 | 13.9340 | 1.3785 | 0.8897 | 1.1790 |
| | 0.7 | 10.2550 | 12.2010 | 14.4400 | 1.5430 | 1.1010 | 1.7930 |
| Lognormal | 0.4 | 0.6764 | 0.9200 | 1.2200 | 0.4050 | 0.3300 | 0.4550 |
| | 0.5 | 0.7225 | 1.1500 | 1.4500 | 0.5197 | 0.3300 | 0.4021 |
| | 0.6 | 0.8375 | 1.3700 | 1.8100 | 0.5197 | 0.3300 | 0.4370 |
| | 0.7 | 0.8800 | 1.5600 | 2.1100 | 0.5582 | 0.3300 | 0.4465 |
| | VUS | shape 1 | shape 2 | shape 3 | scale 1 | scale 2 | scale 3 |
| Gamma | 0.4 | 3.2600 | 4.9045 | 3.1079 | 0.9599 | 0.7978 | 1.9300 |
| | 0.5 | 3.2000 | 5.4170 | 2.9954 | 0.9599 | 0.7838 | 2.5000 |
| | 0.6 | 2.5900 | 5.4530 | 2.8800 | 0.9900 | 0.7853 | 3.1790 |
| | 0.7 | 3.4276 | 8.000 | 3.7500 | 0.8958 | 0.6765 | 3.1300 |
| Mixture | 0.6 | Y1 ∼ N(11.9, 0.65²) | Y2 ∼ N(12.5, 0.55²) | Y3 ∼ 0.5N(12.9, 0.50²) + 0.5N(14.5, 0.50²) | | | |
In our simulations, we focus on the coverage of the egg-shaped CSs and their associated egg volumes. The egg volume is analogous to the width of a CI in a univariate setting. Smaller egg volumes imply tighter CSs around the estimated TCF triplet. We contrast both the coverage and the egg volumes derived from the Youden index, the Euclidean distance, and the MV criteria. We note here that we conducted additional simulations involving the corresponding results for a parallelepiped (rectangular) CS. We observe that the egg-shaped CSs dramatically outperform the rectangular-based ones, and hence we only refer to the former for brevity. This is especially the case when the correlations of the TCFs are high. This is in line with the findings of Bantis et al. (2014, 2017) (Figure 1).
FIGURE 1.
Volume of egg-shaped confidence spaces (CSs) using the logit transformation. Data are generated from normal distributions (top row), gamma distributions (middle row), and lognormal distributions (bottom row), as presented in Table 1. Note that there are substantial differences in terms of the volume of the explored CSs. The traditional Youden-based approaches yield up to 7.5 times larger egg volumes compared to the proposed Euclidean-distance-based ones. For the corresponding ratio plots, see Figure 2 and Figures S2, S3, S7, S8, and S9 of the Supporting Information
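For readers who wish to reproduce the egg-volume metric, the following sketch computes the volume of the back-transformed (logit-scale) ellipsoid by Monte Carlo: points are drawn uniformly inside the ellipsoid of Equation (6) and the Jacobian of the inverse-logit map is averaged (a change of variables). This is one way to obtain such a volume under the stated assumptions; it is not necessarily the exact computation behind the reported tables.

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import expit

def egg_volume(delta_hat, S, n_mc=20000, seed=0):
    """Monte Carlo volume of the back-transformed (logit-scale) 95% ellipsoid.

    delta_hat: estimated (delta1, delta2, delta3); S: their 3x3 covariance matrix
    (assumed positive definite).
    """
    rng = np.random.default_rng(seed)
    q = chi2.ppf(0.95, 3)
    L = np.linalg.cholesky(S)                       # S = L L'
    # uniform points in the unit ball, then map them onto the ellipsoid
    z = rng.standard_normal((n_mc, 3))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    z *= rng.uniform(size=(n_mc, 1)) ** (1.0 / 3.0)
    d = delta_hat + np.sqrt(q) * z @ L.T
    ellipsoid_vol = (4.0 / 3.0) * np.pi * q ** 1.5 * np.sqrt(np.linalg.det(S))
    p = expit(d)                                    # back to the ROC (TCF) scale
    jac = np.prod(p * (1.0 - p), axis=1)            # Jacobian of the inverse logit
    return ellipsoid_vol * jac.mean()
```

Combined with the bootstrap sketch of Section 4.1, a call such as egg_volume(d_hat, S) returns the quantity that is compared across criteria in the tables and figures of this section.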
5.1 |. Results that refer to the scenario of normal distributions
First, we discuss the results for the scenarios that relate to the normal distributions. For the proposed Delta-based CSs, the corresponding egg volume based on the Euclidean distance is substantially smaller compared to the one based on the Youden index. In particular, when the VUS = 0.4 and n = (50, 50, 50), we see that the Youden-based egg volume is around six times larger than the proposed Euclidean-based one (see Tables S1 and S2 and Figures S1–S3 in the Supporting Information). We see analogous results for n = (200, 200, 200). Results remain similar when using the probit instead of the logit, as shown in Tables S1 and S2 and Figures S2 and S3 (negligible differences). As expected, the egg volumes decrease as the biomarker becomes more accurate (i.e., for larger VUS values). This is because the variation of the optimal operating point for all approaches gets smaller when the VUS increases. Loosely speaking, the operating point itself has a smaller “leeway” when “squeezed” toward the perfection corner of the unit cube due to a well-performing marker. An interesting result is that the Euclidean-based egg consistently outperforms the MV-based egg in terms of volume. We observe that for VUS = 0.7, the MV-based egg volume is approximately two times larger than the Euclidean-based one.
For the Box–Cox approach, we observe similar results to the ones discussed in the previous paragraph. The Box–Cox approach provides overall satisfactory coverage, similar to the Delta-based approach (see Figure S4 of the Supporting Information for both the logit- and the probit-related results). The volumes of the CSs obtained by the Youden index remain six times larger compared to our proposed Euclidean-based ones (see Tables S1 and S2 and Figures S5 and S6 in the Supporting Information). We note that the Box–Cox–based egg volumes are slightly larger than the corresponding Delta-based ones. Furthermore, the Box–Cox approach provides CSs that are substantially smaller compared to the kernel- and the logspline-based ones (see Tables S1 and S2 and Figure S5 in the Supporting Information). For example, when VUS = 0.4 and n = (200, 200, 200), the kernel-based egg exhibits a volume that is 3.6 times larger than the Box–Cox–based one. The corresponding logspline-based egg volume is four times larger than the Box–Cox–based one (see Table S1 of the Supporting Information). As before, we observe that the Euclidean-based egg consistently outperforms the MV-based egg in terms of volume (see Tables S1 and S2 and Figures S2 and S3 in the Supporting Information).
For the kernel-based and the logspline-based approaches, we observe that even though both perform satisfactorily in terms of coverage, the kernel-based method yields better coverage, with the logspline being somewhat conservative for all scenarios (see Figures S2 and S3 and Tables S1 and S2 in the Supporting Information). In addition, we observe that the kernel-based egg volumes are consistently smaller than the corresponding logspline-based ones, regardless of the criterion used (Youden, Euclidean distance, or MV). As expected, the kernel- and the logspline-based egg volumes are larger compared to the ones obtained by the Box–Cox or the normality assumption. This is a general observation across all the simulation scenarios (see Table 2 and Tables S1–S5 in the Supporting Information). As before, the Euclidean-based CSs outperform the Youden-based ones in terms of egg volume. This is also the case when we compare the Euclidean distance to the MV, excluding cases where the VUS is small (VUS = 0.4 and 0.5). The corresponding results are given in Table 2 and Tables S1–S5 in the Supporting Information (see also Figures S4–S6 in the Supporting Information).
TABLE 2.
Simulation results for the gamma-related scenarios using the logit transformation. We generate data from gamma distributions of sample sizes 50, 100, and 200 for each VUS value. This table shows the coverage and the egg volume for each method. The average of the volumes of the confidence spaces (CSs) based on 1000 Monte Carlo iterations is shown
| (n1, n2, n3) | VUS | Method | Youden volume | Youden coverage | Euclidean volume | Euclidean coverage | MV volume | MV coverage |
|---|---|---|---|---|---|---|---|---|
| (50, 50, 50) | 0.4 | Box–Cox | 0.0483 | 0.939 | 0.0080 | 0.948 | 0.0082 | 0.953 |
| Kernel | 0.1103 | 0.972 | 0.0191 | 0.971 | 0.0191 | 0.969 | ||
| Logspline | 0.1267 | 0.947 | 0.0396 | 0.975 | 0.0283 | 0.925 | ||
| 0.5 | Box–Cox | 0.0318 | 0.945 | 0.0063 | 0.945 | 0.0083 | 0.948 | |
| Kernel | 0.0703 | 0.981 | 0.0152 | 0.967 | 0.0186 | 0.965 | ||
| Logspline | 0.0975 | 0.962 | 0.0330 | 0.974 | 0.0278 | 0.936 | ||
| 0.6 | Box–Cox | 0.0186 | 0.947 | 0.0044 | 0.944 | 0.0074 | 0.950 | |
| Kernel | 0.0410 | 0.983 | 0.0105 | 0.965 | 0.0162 | 0.966 | ||
| Logspline | 0.0673 | 0.981 | 0.0226 | 0.979 | 0.0246 | 0.950 | ||
| 0.7 | Box–Cox | 0.0124 | 0.947 | 0.0032 | 0.945 | 0.0062 | 0.952 | |
| Kernel | 0.0281 | 0.963 | 0.0077 | 0.956 | 0.0137 | 0.953 | ||
| Logspline | 0.0451 | 0.983 | 0.0150 | 0.983 | 0.0208 | 0.979 | ||
| (100, 100, 100) | 0.4 | Box–Cox | 0.0193 | 0.932 | 0.0029 | 0.939 | 0.0029 | 0.943 |
| Kernel | 0.0534 | 0.969 | 0.0079 | 0.969 | 0.0078 | 0.962 | ||
| Logspline | 0.0771 | 0.966 | 0.0188 | 0.981 | 0.0105 | 0.914 | ||
| 0.5 | Box–Cox | 0.0122 | 0.936 | 0.0023 | 0.937 | 0.0030 | 0.950 | |
| Kernel | 0.0315 | 0.975 | 0.0060 | 0.961 | 0.0075 | 0.956 | ||
| Logspline | 0.0554 | 0.972 | 0.0147 | 0.979 | 0.0104 | 0.920 | ||
| 0.6 | Box–Cox | 0.0070 | 0.944 | 0.0016 | 0.945 | 0.0027 | 0.947 | |
| Kernel | 0.0169 | 0.953 | 0.0041 | 0.956 | 0.0063 | 0.953 | ||
| Logspline | 0.0333 | 0.973 | 0.0095 | 0.975 | 0.0092 | 0.933 | ||
| 0.7 | Box–Cox | 0.0046 | 0.947 | 0.0012 | 0.948 | 0.0022 | 0.947 | |
| Kernel | 0.0112 | 0.946 | 0.0029 | 0.942 | 0.0053 | 0.943 | ||
| Logspline | 0.0201 | 0.976 | 0.0060 | 0.979 | 0.0079 | 0.962 | ||
| (200, 200, 200) | 0.4 | Box–Cox | 0.0072 | 0.943 | 0.0010 | 0.945 | 0.0010 | 0.943 |
| Kernel | 0.0242 | 0.967 | 0.0032 | 0.950 | 0.0032 | 0.948 | ||
| Logspline | 0.0406 | 0.981 | 0.0071 | 0.976 | 0.0036 | 0.909 | ||
| 0.5 | Box–Cox | 0.0045 | 0.944 | 0.0008 | 0.943 | 0.0011 | 0.946 | |
| Kernel | 0.0129 | 0.959 | 0.0024 | 0.948 | 0.0030 | 0.946 | ||
| Logspline | 0.0245 | 0.976 | 0.0052 | 0.973 | 0.0035 | 0.916 | ||
| 0.6 | Box–Cox | 0.0025 | 0.945 | 0.0006 | 0.945 | 0.0010 | 0.945 | |
| Kernel | 0.0068 | 0.934 | 0.0016 | 0.933 | 0.0025 | 0.930 | ||
| Logspline | 0.0134 | 0.974 | 0.0032 | 0.977 | 0.0033 | 0.932 | ||
| 0.7 | Box–Cox | 0.0016 | 0.948 | 0.0004 | 0.949 | 0.0008 | 0.948 | |
| Kernel | 0.0044 | 0.930 | 0.0011 | 0.924 | 0.0021 | 0.921 | ||
| Logspline | 0.0078 | 0.970 | 0.0021 | 0.971 | 0.0028 | 0.942 | ||
5.2 |. Results that refer to the scenarios of nonnormal distributions
We also consider simulations for data that are not generated from normal distributions. When using the Box–Cox approach, we observe good coverage for both the gamma and lognormal scenarios (see Table 2 and Tables S3–S5 of the Supporting Information). We point out that the Box–Cox approach is robust and substantially outperforms both the kernel- and the logspline-based approaches in terms of egg volume, even for the gamma scenario that lies outside the Box–Cox family (see Table 2 and Tables S3–S5 of the Supporting Information). For example, for VUS = 0.4 and n = (200, 200, 200), the kernel-based approach yields an egg volume that is over three times larger than that of the Box–Cox approach. The corresponding egg volume of the logspline is over 5.5 times larger than the Box–Cox egg volume (see Table 2). As before, we observe that the kernel-based egg volumes are consistently smaller than the logspline-based ones (see Table 2 and Table S3 of the Supporting Information). Therein, we also see that the Euclidean-based egg volumes are substantially smaller than the Youden-based ones. The Euclidean criterion also outperforms the MV in terms of egg volume, apart from cases where the VUS is 0.4. We visualize the results for the gamma distributions in Figures 2–5 and Figure S7 of the Supporting Information. Results for the lognormal scenarios are analogous; see Figures S8–S12 of the Supporting Information.
FIGURE 2.
Ratio of egg-shaped confidence space (CS) volumes for the gamma-related scenarios using the logit transformation. The subplot on the left shows the ratio of the Youden-based egg volume divided by the Euclidean-based egg volume. The subplot in the middle shows the corresponding ratio of the Youden-based egg volume divided by the MV-based egg volume. The subplot on the right shows the corresponding ratio of the MV-based egg volume divided by the Euclidean-based egg volume. The traditional Youden-based approaches yield CSs that are up to 11 times larger than the ones from the Euclidean- and MV-based approaches. The Euclidean-based approach yields smaller egg volumes compared to the MV-based ones, except for VUS values of 0.4 and 0.5
FIGURE 5.
Volume of egg-shaped confidence spaces for the gamma-related scenarios. Each box plot summarizes the results of our simulations broken down by VUS values and the different criteria (Youden, Euclidean, MV) after combining (merging) the results over all other factors (different sample sizes and different modeling approaches: "BoxCox," "Kernel," "Logspline")
We further consider a scenario for which group 3 is generated from a bimodal density. The Box–Cox approach collapses for such a scenario yielding a coverage of 0.315 when the VUS is 0.6 and n = (200, 200, 200). This indicates that the Box–Cox approach is not robust enough under severe violations of normality. In this configuration, we observe that both the kernel and the logspline approach provide satisfactory coverage, with the kernel approach outperforming the logspline approach in terms of egg volume (see Tables S6 and S7 in the Supporting Information).
As guidance, we recommend the following: (1) If all three groups conform with the assumption of normality, then the Delta-based method discussed in Section 2 is recommended. (2) If normality is violated for any of the groups, then apply the Box–Cox transformation as discussed in Section 3 and test for normality using the transformed scores. If the transformed scores conform with the assumption of normality, then proceed to inferences using the Box–Cox–based framework. (3) If normality is violated for the transformed scores (for any of the groups), proceed with the kernel-based framework discussed in Section 4. A sketch of this workflow is given below. Generally, Euclidean-based CSs are substantially smaller/tighter around the optimal operating point compared to the Youden-based ones. In addition, Euclidean-based CSs outperform the MV-based ones in almost all scenarios.
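A compact sketch of this recommended workflow is given below. It uses the Shapiro–Wilk test in place of the KS test used in Section 6 and, for simplicity, a single Box–Cox λ estimated from the pooled scores rather than from the full three-group likelihood of Section 3; both are illustrative substitutions rather than the authors' exact procedure.

```python
import numpy as np
from scipy.stats import shapiro, boxcox

def recommend_method(y1, y2, y3, alpha=0.05):
    """Sketch of the recommended workflow: Delta if all groups look normal,
    Box-Cox if the transformed scores look normal, otherwise kernel-based.
    Assumes strictly positive scores (required by the Box-Cox step)."""
    groups = [np.asarray(g, dtype=float) for g in (y1, y2, y3)]
    if all(shapiro(g).pvalue > alpha for g in groups):
        return "Delta (Section 2)"
    # pooled lambda is a simplification; the paper estimates lambda from the
    # full three-group Box-Cox likelihood
    pooled_lambda = boxcox(np.concatenate(groups))[1]
    transformed = [boxcox(g, lmbda=pooled_lambda) for g in groups]
    if all(shapiro(g).pvalue > alpha for g in transformed):
        return "Box-Cox (Section 3)"
    return "Kernel (Section 4.1)"
```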
6 |. APPLICATION
We apply our approaches to a data set that involves patients with HCC. The biomarkers were generated using mass-spectrometry (surface-enhanced laser desorption/ionization time of flight mass spectrometer—SELDI). The serum samples were collected and processed at Shanghai Chang-zheng Hospital, China. For additional details, we refer to Wang and Chang (2011). The data set involves 145 subjects: 54 hepatoma (H) patients, 39 chronic liver disease (LD) patients, and 52 “normal” individuals (N). The original data set has measurements for 236 biomarkers. The biomarkers used in this section are: 4271.368397, 11646.52674, and 11675.92167. All numerical results are presented in Table 3.
TABLE 3.
Results that refer to the application section (liver data). Three markers are used (first column of the table). We present results for all modeling methods (column 2) and all criteria (column 3). The optimized estimated cutoff pair is given in column 4 and the associated estimated TCF triplet in column 5. In the last two columns (6 and 7), we present the egg volumes when using the probit and the logit transformations
| Biomarker | Method | Criterion | (ĉ1, ĉ2) | Estimated (TCF1, TCF2, TCF3) | Egg vol. (probit) | Egg vol. (logit) |
|---|---|---|---|---|---|---|
| 4271.368397 | Delta | Youden | (14.4494, 20.0438) | (0.4499, 0.6922, 0.4843) | 0.0544 | 0.0520 |
| Euclidean | (14.8701, 20.9864) | (0.4859, 0.6287, 0.5028) | 0.0083 | 0.0082 | ||
| MV | (14.8378, 20.8474) | (0.4831, 0.6383, 0.4983) | 0.0092 | 0.0091 | ||
| Box–Cox | Youden | (13.8263,19.1679) | (0.4311, 0.7245, 0.4817) | 0.0465 | 0.0449 | |
| Euclidean | (14.3251, 20.3480) | (0.4770, 0.6425, 0.5045) | 0.0076 | 0.0076 | ||
| MV | (14.2889, 20.1617) | (0.4737, 0.6558, 0.4981) | 0.0086 | 0.0086 | ||
| Kernel | Youden | (13.2093, 18.9097) | (0.3757, 0.7602, 0.5567) | 0.0733 | 0.0799 | |
| Euclidean | (14.0926,19.9497) | (0.4431, 0.6935, 0.5361) | 0.0188 | 0.0189 | ||
| MV | (14.0575, 19.6464) | (0.4404, 0.7143, 0.5250) | 0.0197 | 0.0197 | ||
| Logspline | Youden | (12.9267,18.9986) | (0.3186, 0.7703, 0.6107) | 0.0944 | 0.1094 | |
| Euclidean | (14.1404, 20.0615) | (0.4261, 0.7034, 0.5425) | 0.0290 | 0.0293 | ||
| MV | (14.1205,19.7501) | (0.4243, 0.7241, 0.5306) | 0.0276 | 0.0282 | ||
| 11646.52674 | Box–Cox | Youden | (7.6707, 9.9296) | (0.6552, 0.6639, 0.6706) | 0.0193 | 0.0192 |
| Euclidean | (7.6831, 9.9265) | (0.6601, 0.6642, 0.6654) | 0.0048 | 0.0048 | ||
| MV | (7.6800, 9.9263) | (0.6589, 0.6642, 0.6666) | 0.0082 | 0.0082 | ||
| Kernel | Youden | (7.4849,10.5942) | (0.6592, 0.6326, 0.7253) | 0.0301 | 0.0302 | |
| Euclidean | (7.5245, 10.4816) | (0.6762, 0.6369, 0.7028) | 0.0095 | 0.0096 | ||
| MV | (7.5115,10.5327) | (0.6707, 0.6350, 0.7109) | 0.0147 | 0.0149 | ||
| Logspline | Youden | (7.3810, 9.6379) | (0.6397, 0.6037, 0.7870) | 0.0369 | 0.0382 | |
| Euclidean | (7.4235, 9.2168) | (0.6569, 0.6470, 0.7193) | 0.0127 | 0.0131 | ||
| MV | (7.4084, 9.3403) | (0.6509, 0.6335, 0.7428) | 0.0211 | 0.0217 | ||
| 11675.92167 | Kernel | Youden | (8.1168,13.4878) | (0.7223, 0.5815, 0.6544) | 0.0413 | 0.0427 |
| Euclidean | (8.0680, 13.2900) | (0.7050, 0.5882, 0.6632) | 0.0148 | 0.0148 | ||
| MV | (8.0889,13.3803) | (0.7126, 0.5852, 0.6599) | 0.0217 | 0.0220 | ||
| Logspline | Youden | (8.0725, 9.8901) | (0.7137, 0.7201, 0.5499) | 0.0927 | 0.0967 | |
| Euclidean | (7.9981,10.1975) | (0.6837, 0.6761, 0.6177) | 0.0208 | 0.0211 | ||
| MV | (8.0173, 10.1248) | (0.6918, 0.6862, 0.6021) | 0.0237 | 0.0240 | ||
Biomarker 4271.368397 complies with the normality assumption, since the Kolmogorov–Smirnov (KS) test p-values for groups H, LD, and N are 0.9626, 0.3035, and 0.7565, respectively. The Youden-based egg volume is equal to 0.0544, which is about 6.5 times larger than the Euclidean-based egg volume (0.0083) when using the probit transformation. When using the MV, the corresponding egg volume is equal to 0.0092 (see Table 3 and Figures 6 and 7).
FIGURE 6.
Estimated ROC surface with the corresponding confidence spaces (CSs) for the liver data set. Top left: The ROC surface is obtained using the measurements of biomarker 4271.368397. The outer, smoothed colored CS is obtained by the Delta-based method utilizing the Youden index and has a volume of 0.0544. The inner smaller CS (plotted with a grid) is obtained by the Delta-based method using the Euclidean distance from the perfection corner and has a volume of 0.0083. Top right: the description is analogous and refers to biomarker 11646.52674 when analyzed using the Box–Cox approach with the probit transformation. Bottom row: refers to biomarker 11675.92167. The description is analogous and the estimation is based on the kernel approach using the logit (left) and probit (right) transformations. In all subplots, the outer smooth colored egg refers to the Youden index, and the smaller grid-based one refers to the Euclidean distance approach
FIGURE 7.
Same illustration as in Figure 6 but focusing only on the confidence spaces (CSs) for better visualization. Top left: the outer larger egg-shaped CS has a volume of 0.0544. The inner smaller egg-shaped CS has a volume of 0.0083. The results for the remaining three subplots are similar and can be found in Table 3
Biomarker 11646.52674 does not conform to the normality assumption (KS test p-values are 0.0161, 0.1555, and < 0.001 for groups N, LD, and H, respectively). For that marker, we explore the use of the Box–Cox transformation, as well as the kernel- and logspline-based approaches. The corresponding KS test p-values after the Box–Cox transformation are 0.2266, 0.3620, and 0.2825, respectively. The egg-shaped CS using the Youden index exhibits a volume of 0.0193, while the Euclidean-based one exhibits a volume of 0.0048.
For biomarker 11675.92167, the normality test yields p-values < 0.001 for all three groups, and normality is also violated after the Box–Cox transformation (KS test p-value < 0.001 for the H group). Therefore, for this marker, we proceed with the kernel- and the logspline-based approaches. When using the logit transformation in combination with the Euclidean distance method, the kernel-based CS exhibits a volume of 0.0148, while the Youden index approach yields an egg volume of 0.0427, that is, nearly three times larger.
We note that after Box–Cox transforming biomarker 11646.52674, the underlying cutoffs are almost identical for both the Youden and the Euclidean methods. The results for this marker are very similar to what we have seen in our simulation studies. The volume of the egg-shaped CS when using the minimized Euclidean distance is about four times smaller compared to the Youden-based egg volume. For all three markers, the Youden-based approach yields CSs that are roughly three or more times larger than those from the proposed Euclidean-based approach.
In Figure 6 we visualize the obtained egg-shaped CSs for the discussed markers. For biomarker 4271.368397, we assume normality and we proceed with the delta method. The corresponding result is in the top left subplot. For biomarker 11646.52674, we proceed with the Box–Cox approach (top-right subplot). For biomarker 11675.92167, we proceed nonparametrically using the kernel-based approach (bottom subplots). In all cases, our approaches attain smaller CSs, as further shown in Table 3.
Figure 7 provides a more focused visualization than what is illustrated in Figure 6. In Figure 7 we plot only the CSs without the ROC surfaces.
7 |. CONCLUSION AND DISCUSSION
Even though inferences for the volume under the ROC surface (VUS) have been studied in the literature (see Nakas & Yiannoutsos, 2004; Yin et al., 2018), it is important to have available methods for making inferences around the corresponding optimal operating point. A generalization of the Youden index to the three-class case has been proposed by Nakas et al. (2010, 2013). Such a criterion, while popular, cannot simultaneously accommodate the biomarker measurements from all three groups when performing cutoff estimation. Thus, when interest lies in joint inferences around the optimal TCFs, such an underutilization of the samples may lead to inflated volumes of the CSs. In addition, as noted in Bantis et al. (2014), when dealing with joint CSs for the optimized TCFs, one needs to take into account their correlations (induced by the estimation of the cutoffs).
In this paper we take a different direction to tackle both of these issues. We consider two alternative criteria: the minimization of the Euclidean distance of the ROC surface from the perfection corner of the unit cube and the maximization of the volume of the cuboid under the ROC surface (MV). Both criteria allow simultaneous utilization of all biomarker scores. We explore parametric methods based on normality, flexible parametric methods based on power transformations, as well as kernel- and logspline-based methods for obtaining CSs around the optimized TCF triplet. Our study shows that the Euclidean-based CSs substantially outperform the corresponding Youden-based ones. In addition, the Euclidean distance criterion seems to provide improved inferences compared to the MV as well. In terms of estimation, the Box–Cox approach seems to operate well even in settings where mild normality violations occur, but collapses when they become severe. In such situations, the kernel-based approach should be preferred, as it consistently outperforms the logspline-based approach.
One thing to note is that the choice of the objective function (Youden index, Euclidean distance, or MV) may be driven by the underlying clinical setting. Even though the Euclidean distance criterion seems to provide improved inferences around the optimized TCFs, the Youden index could be preferred in settings where it corresponds to a more clinically appealing TCF triplet. The Euclidean distance criterion is known to provide a more balanced TCF triplet, as indicated by Hua and Tian (2020). In some settings, this may or may not be desirable from a clinical standpoint.
Supplementary Material
FIGURE 3.
Coverage of egg-shaped confidence spaces for the gamma-related scenarios. Each box plot summarizes the results of our simulations broken down by VUS values and assumed modeling approach ("BoxCox," "Kernel," "Logspline") after combining (merging) the results over all other factors (different sample sizes and different criteria: Youden, Euclidean, MV)
FIGURE 4.
Volume of egg-shaped confidence spaces for the gamma-related scenarios. Each box plot summarizes the results of our simulations broken down by VUS values and assumed modeling approach ("BoxCox," "Kernel," "Logspline") after combining (merging) the results over all other factors (different sample sizes and different criteria: Youden, Euclidean, MV)
ACKNOWLEDGMENTS
This research has been partially supported by the COBRE grant (P20GM130423), NIH Clinical Translational Science Award (UL1TR002366) to the University of Kansas—BERD Trailblazer award (2018–2019). Support has also been received by two National Cancer Institute (NCI) grants R01CA243445 and R33CA214333. This work was also supported in part by two Masonic Cancer Alliance (MCA) Partners Advisory Board grants from The University of Kansas Cancer Center (KUCC) and Children’s Mercy (CM). The contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH/NCI, MCA, KUCC, or CM. The authors would like to thank the anonymous referees and the associate editor for their valuable comments that significantly improved this paper. We would also like to thank Brian Mosier and Kate Young for proofreading the paper.
Funding information
Center for Scientific Review, Grant/Award Numbers: P20GM130423, R01CA243445, R33CA214333, UL1TR002366; Masonic Cancer Alliance
Footnotes
CONFLICT OF INTEREST
The authors have declared no conflict of interest.
OPEN RESEARCH BADGES
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section.
This article has earned an open data badge “Reproducible Research” for making publicly available the code necessary to reproduce the reported results. The results reported in this article were reproduced partially due to their computational complexity.
SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.
DATA AVAILABILITY STATEMENT
We include the simulation results that are discussed in Section 5 in the Supporting Information that can be found at the journal's website. Codes for using the discussed methods can be found at www.leobantis.net
REFERENCES
- Attwood K, Tian L, & Xiong C (2014). Diagnostic thresholds with three ordinal groups. Journal of Biopharmaceutical Statistics, 24(3), 608–633.
- Bantis L, Nakas C, & Reiser B (2014). Construction of confidence regions in the ROC space after the estimation of the optimal Youden index-based cut-off point. Biometrics, 70(1), 212–223.
- Bantis L, Nakas C, Reiser B, Myall D, & Dalrymple-Alford JC (2017). Construction of joint confidence regions for the optimal true class fractions of receiver operating characteristic (ROC) surfaces and manifolds. Statistical Methods in Medical Research, 26(3), 1429–1442.
- Box GE, & Cox DR (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B (Methodological), 26(2), 211–243.
- Fluss R, Faraggi D, & Reiser B (2005). Estimation of the Youden index and its associated cutoff point. Biometrical Journal, 47(4), 458–472.
- Hua J, & Tian L (2020). A comprehensive and comparative review of optimal cut-points selection methods for diseases with multiple ordinal stages. Journal of Biopharmaceutical Statistics, 30(1), 46–68.
- Kooperberg C, & Stone CJ (1992). Logspline density estimation for censored data. Journal of Computational and Graphical Statistics, 1(4), 301–328.
- Kosinski A, Chen Y, & Lyles R (2010). Sample size calculations for evaluating a diagnostic test when the gold standard is missing at random. Statistics in Medicine, 29(15), 1572–1579.
- Lloyd CJ, & Yong Z (1999). Kernel estimators of the ROC curve are better than empirical. Statistics & Probability Letters, 44(3), 221–228.
- Luo J, & Xiong C (2013). Youden index and associated cut-points for three ordinal diagnostic groups. Communications in Statistics—Simulation and Computation, 42(6), 1213–1234.
- Mosier BR, & Bantis LE (2021). Estimation and construction of confidence intervals for biomarker cutoff-points under the shortest Euclidean distance from the ROC surface to the perfection corner. Statistics in Medicine, 40(20), 4522–4539.
- Mossman D (1999). Three-way ROCs. Medical Decision Making, 19(1), 78–89.
- Nakas C, Alonzo T, & Yiannoutsos C (2010). Accuracy and cut-off point selection in three-class classification problems using a generalization of the Youden index. Statistics in Medicine, 29(28), 2946–2955.
- Nakas C, Dalrymple-Alford J, Anderson T, & Alonzo T (2013). Generalization of Youden index for multiple-class classification problems applied to the assessment of externally validated cognition in Parkinson disease screening. Statistics in Medicine, 32(6), 995–1003.
- Nakas C, & Yiannoutsos C (2004). Ordered multiple-class ROC analysis with continuous measurements. Statistics in Medicine, 23, 3437–3449.
- Silverman BW (1986). Density estimation for statistics and data analysis. Chapman and Hall.
- Wand MP, & Jones MC (1994). Kernel smoothing. CRC Press.
- Wang Z, & Chang YCI (2011). Marker selection via maximizing the partial area under the ROC curve of linear risk scores. Biostatistics, 12(2), 369–385.
- Xiong C, Bell G, Miller J, & Morris J (2006). Measuring and estimating diagnostic accuracy when there are three ordinal diagnostic groups. Statistics in Medicine, 25(7), 1251–1273.
- Xu J, Wu C, Che X, Wang L, Yu D, Zhang T, Huang L, Li H, Tam W, Wang C, & Lin D (2010). Circulating microRNAs, miR-21, miR-122, and miR-223, in patients with hepatocellular carcinoma or chronic hepatitis. Digestive Diseases and Sciences, 57(11), 2910–2916.
- Yin J, Nakas CT, Tian L, & Reiser B (2018). Confidence intervals for differences between volumes under receiver operating characteristic surfaces (VUS) and generalized Youden indices (GYIs). Statistical Methods in Medical Research, 27(3), 675–688.
- Yin J, & Tian L (2014). Joint inference about sensitivity and specificity at the optimal cut-off point associated with Youden index. Computational Statistics & Data Analysis, 77, 1–13.
- Zhou X, & Harezlak J (2002). Comparison of bandwidth selection methods for kernel smoothing of ROC curves. Statistics in Medicine, 21(14), 2045–2055.
- Zou K, & Hall W (2000). Two transformation models for estimating an ROC curve derived from continuous data. Journal of Applied Statistics, 27(5), 621–631.
- Zou K, Tempany C, Fielding J, & Silverman S (1998). Original smooth receiver operating characteristic curve estimation from continuous data: Statistical methods for analyzing the predictive value of spiral CT of ureteral stones. Academic Radiology, 5(10), 680–687.